Commit Graph

10 Commits

Author SHA1 Message Date
Garry Tan
dcd13dd638 feat: v0.16.4 — gbrain check-resolvable CLI + skillify-check wiring (#325)
* Merge origin/master into garrytan/check-resolvable-v1

Resolves CHANGELOG.md conflict: preserved v0.16.1/v0.16.2/v0.16.3 upstream
entries and added v0.16.4 (check-resolvable ship) above them.

* refactor: extract findRepoRoot to src/core/repo-root.ts

Moves findRepoRoot() from private in doctor.ts to a zero-dependency shared
module with a parameterized startDir for test hermeticity. Doctor imports
the shared version; no behavior change (default arg matches prior semantics).

The new gbrain check-resolvable CLI needs findRepoRoot too; importing from
doctor.ts would drag in DB/progress dependencies.

* feat: gbrain check-resolvable CLI wrapper

Standalone CLI gate over checkResolvable(). Exits 1 on any issue (warnings
or errors) per the README:259 contract, stricter than doctor's resolver_health
which ignores warnings. Doctor has 15 other checks to lean on; the standalone
command has nowhere to hide.

- Stable JSON envelope: {ok, skillsDir, report, autoFix, deferred, error, message}
- --fix auto-applies DRY fixes via autoFixDryViolations before re-checking
- --dry-run with --fix previews without writing; autoFix.fixed shows diff
- --verbose prints the deferred-checks note (Checks 5 + 6)
- --skills-dir PATH for hermetic test runs
- Permissive on unknown flags, matching lint/orphans/publish convention

Checks 5 (trigger routing eval) and 6 (brain filing) are tracked as separate
GitHub issues and surfaced via the deferred[] field in --json output.

Covered by 17 new test cases (flag parsing, JSON envelope shape, exit-code
regression gates, --fix wiring, --verbose output).

* chore: bump version and changelog (v0.16.4)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: track check-resolvable issue-URL swap in TODOS

Defers the filing of GitHub tracking issues for Checks 5 (trigger routing
eval) and 6 (brain filing) plus the TBD-check-5/TBD-check-6 URL replacement
in src/commands/check-resolvable.ts. Unblocks merging PR #325.

* test: fix repo-root CI failure — assert parity, not path contents

The 'default arg uses process.cwd()' test asserted the returned path
matched /honolulu/, which is the local workspace name but not the CI
runner's checkout path (/home/runner/work/gbrain/gbrain). The test's
real purpose is behavioral parity: findRepoRoot() === findRepoRoot(cwd).
Assert that directly instead of pattern-matching paths.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 02:07:00 -07:00
Garry Tan
418d955fd3 docs: v0.16.1 — minions worker deployment guide (from #287) (#317)
* docs: v0.16.1 — minions worker deployment guide (from #287)

New docs/guides/minions-deployment.md covering persistent worker deploy
patterns (watchdog cron, inline --follow for cron-only workloads) plus
the sharp edges of running gbrain jobs work against Supabase in
production.

Addresses a real gap: existing minions docs (minions-fix.md,
minions-shell-jobs.md) cover schema repair and shell-job security,
not deploy patterns. With v0.16.0's durable agent runtime, the
persistent worker is now load-bearing for subagent + subagent_aggregator
handlers too, so a supervised deploy story matters.

Pre-landing accuracy pass corrected five factual bugs against current
source:
- max_stalled column default (5, not 1 or 3)
- stalled-jobs smoke-test query (active, not waiting)
- watchdog SIGTERM-to-SIGKILL grace (10s minimum, not 2s)
- cron env pattern (crontab env lines, not source ~/.bashrc)
- --follow exit semantics (blocks until submitted job is terminal,
  not until queue is empty)

Docs-only. No code changed. Zero migration required.

Contributed by a downstream agent fork via #287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: credit Wintermute correctly in v0.16.1 CHANGELOG

Wintermute is gbrain's own OpenClaw instance running in production, not a
community contributor. The original CHANGELOG framing ("community contributor
@wintermute") understated the funnier truth: the agent built on top of the
project wrote the deploy guide for the project after hitting its sharp edges
in production. Dogfooding with extra steps.

Co-Authored-By: Wintermute (OpenClaw) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rewrite minions deployment guide for agent line-by-line execution

Fixes 12 findings from reading v0.16.1 guide as-an-agent would:

Real bugs:
- Crontab syntax wrong for user crontabs (6-field format dumped into
  `crontab -e` got "bad minute" or parsed `user` as the command). Now two
  labeled blocks: 5-field for `crontab -e`, 6-field for `/etc/crontab`.
- Watchdog restart loop (old shutdown lines in unrotated log re-matched
  every 5 min forever). New `minion-watchdog.sh` writes 2-line PID file
  (PID + restart epoch) and only considers log lines newer than the
  epoch. Regex rewritten explicit (mawk rejects `{n}` intervals).
- Credentials in world-readable /etc/crontab. Secrets move to
  /etc/gbrain.env (mode 600), referenced via BASH_ENV in crontab.

Structural:
- Preconditions block (5 fail-fast checks).
- "Which option?" decision tree.
- Template variable table (6 vars documented).
- Upgrade section (v0.13.x -> v0.16.2 checklist).
- Option 3: systemd.service + Procfile + fly.toml.partial snippets.
- Uninstall section.
- `--follow` example uses `gbrain embed --stale` (a real command) instead
  of the fictional `gbrain enrich`.
- Dead-end "Proposed CLI flags (not yet implemented)" replaced with a
  "Tune per-job today" callout pointing at flags that exist.
- Known Issues rewritten as imperatives.

Also wires `docs/guides/minions-deployment.md` into `scripts/llms-config.ts`
under the Configuration section so remote agents fetching llms.txt /
llms-full.txt see the guide by name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.16.2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync v0.16.2 CHANGELOG with the actual --follow example in the guide

The shipped docs/guides/minions-deployment.md uses `gbrain embed --stale`
(a real command) but the v0.16.2 CHANGELOG entry still referenced
`gbrain enrich --brain $GBRAIN_WORKSPACE` (the older draft). Bring the
CHANGELOG in line with what actually shipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 00:01:08 -07:00
Garry Tan
0e9f8814a5 feat: v0.16.0 — durable agent runtime (gbrain agent + subagent handler + plugin loader) (#258)
* refactor(mcp): extract buildToolDefs helper for subagent tool registry reuse

The inline operations.map(...) block in src/mcp/server.ts became the only
source of truth for agent-facing tool definitions. Extract into a reusable
exported helper so the v0.15 subagent tool registry can call it with a
filtered OPERATIONS subset instead of duplicating the shape.

Byte-for-byte equivalence regression pinned in test/mcp-tool-defs.test.ts —
legacy inline mapping kept verbatim inside the test so any future drift
between the new helper and the pre-extraction MCP schema fails loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(operations): subagent-aware OperationContext + put_page namespace

Adds three optional fields to OperationContext:
  - jobId?: number       — the currently running Minion job id
  - subagentId?: number  — the owning subagent job id for tool-dispatched calls
  - viaSubagent?: boolean — FAIL-CLOSED flag for agent-path gating

put_page now enforces a namespace rule when invoked on the subagent tool
dispatch path (viaSubagent=true): writes MUST target
`wiki/agents/<subagentId>/...`. Anchored, slash-boundary enforced so a
collision like `wiki/agents/12evil/...` can't impersonate subagent 12.

The check runs BEFORE the dry-run short-circuit so preview calls surface
the same rejection. Fail-closed: a missing subagentId with viaSubagent=true
rejects every slug rather than letting a dispatcher bug open a hole.

Existing callers unaffected — all three fields are optional and the legacy
put_page behavior is unchanged when viaSubagent is undefined/false.

12 regression + namespace tests pin:
  - local CLI writes (viaSubagent unset) accept arbitrary slugs
  - MCP writes (remote=true, viaSubagent unset) accept arbitrary slugs
  - subagent-path: anchored prefix accepted, wrong id rejected, prefix-
    collision defeated, leading-slash rejected, bare-prefix rejected,
    fail-closed on missing/NaN subagentId, permission_denied code emitted

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(schema): v0.15.0 subagent runtime tables + migration orchestrator

Adds three new tables for the durable LLM agent runtime:

  subagent_messages         — Anthropic message-block persistence.
                              Parallel tool_use blocks in one assistant
                              message live in content_blocks JSONB, not
                              across rows (fixes the (job_id, turn_idx, role)
                              misdesign codex caught in v0.13 drafting).

  subagent_tool_executions  — Two-phase tool ledger. INSERT pending before
                              execute, UPDATE complete/failed after. Replay
                              re-runs pending rows only if the tool is
                              idempotent (v1 ships only idempotent tools so
                              this is preventive).

  subagent_rate_leases      — Lease-based concurrency cap for outbound
                              providers (e.g. anthropic:messages). Stale
                              leases auto-prune on next acquire so crashed
                              workers can't strand capacity.

All DDL uses CREATE TABLE/INDEX IF NOT EXISTS — order-independent vs
PR #244's initSchema() reorder, and idempotent across fresh-install +
upgrade paths. Shipped in both src/schema.sql (Postgres) and
src/core/pglite-schema.ts (PGLite); schema-embedded.ts regenerated.

Migration orchestrator v0_15_0.ts (phases: schema → verify → record).
v0_14_0.ts is a no-op stub so the registry's version sequence stays
gapless (v0.14.0 shipped shell-jobs — code change, no DB migration).

10 unit tests for registry wiring, ordering, dry-run phase behavior, and
schema-embedded table presence. test/apply-migrations.test.ts updated for
the two new registry entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): emit child_done on every terminal + max_stalled per-job + terminal set fix

Three correctness fixes the v0.15 subagent aggregator spine depends on:

1. child_done emission on ALL terminal transitions, not just success.
   - completeJob already emitted on success — now also tags outcome='complete'.
   - failJob newly emits on terminal 'failed' or 'dead' (outcome='failed'|'dead',
     error=<text>), BEFORE the parent-terminal UPDATE so the EXISTS guard on
     the inbox INSERT doesn't skip it on fail_parent paths (codex catch).
   - cancelJob now emits outcome='cancelled' per descendant with a parent.
   - handleTimeouts now emits outcome='timeout' per timed-out child.
   ChildDoneMessage gains optional { outcome, error } — backwards compatible
   (legacy writers omitted them; consumers treat absent outcome as 'complete').

2. Parent-resolution terminal set now includes 'failed'.
   Pre-v0.15 the `NOT EXISTS (... status NOT IN ('completed','dead','cancelled'))`
   guard treated a failed child as still-pending, stranding aggregator parents
   that chose on_child_fail='continue' or 'ignore' in waiting-children forever.
   Expanded to {completed, failed, dead, cancelled} everywhere parent resolution
   reads child status (completeJob inline, failJob remove_dep + continue,
   cancelJob sweep, handleTimeouts sweep, and the resolveParent method itself).

3. MinionJobInput.max_stalled threads through MinionQueue.add() on INSERT.
   Column exists with default 1 — that is "first stall → dead", which defeats
   crash recovery for long-running handlers. Subagent children will set
   max_stalled: 3 to survive mid-run worker kills. Second-submitter under an
   idempotency-key hit does NOT mutate the existing row (codex-flagged
   footgun — first-submit options are load-bearing state).

13 unit tests pin: emission on each of completeJob/failJob/cancelJob/
handleTimeouts, insertion order on fail_parent, terminal-set expansion with
continue policy, max_stalled default + override + idempotency behavior.

E2E tier 1 (Postgres) passes 141 tests unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): rate-leases + waitForCompletion infra for v0.15 subagent

Two infrastructure modules the subagent handler spine depends on:

rate-leases.ts — lease-based concurrency cap for outbound providers
(anthropic:messages, openai:*, etc.). Counter-based limiters leak capacity
on worker crash; leases are owner-tagged rows with expires_at that
auto-prune on the next acquire. Two-phase: txn-scoped pg_advisory_xact_lock
guards the check-then-insert so concurrent acquires can't both win the
"last slot". renewLeaseWithBackoff retries 3x (250/500/1000ms) for mid-
call DB blips — on persistent failure the LLM-loop caller aborts with a
renewable error so the worker re-claims and the rate invariant is
preserved. Owner FK cascades clean up leases on job deletion.

wait-for-completion.ts — poll-until-terminal helper for CLI callers.
Minions' NOTIFY is worker-side only; `gbrain agent run --follow` polls
getJob() until status is {completed, failed, dead, cancelled}. TimeoutError
carries jobId + elapsedMs and does NOT cancel the job — the user can
inspect via `gbrain jobs get <id>` later. Supports AbortSignal for Ctrl-C
without throwing. Default pollMs is 1000 on Postgres, 250 on PGLite (inline
CLI has no network RTT).

21 unit tests cover: single/multi acquire under cap, rejection past cap,
release frees slot, different keys are independent, stale prune, cascade
on owner delete, renew bumps expires_at, renew on missing is false,
backoff path success + pruned short-circuit. waitForCompletion: fast-path
terminal, transitions mid-wait (completed/failed/cancelled), TimeoutError
shape, abort-signal early exit, non-existent job error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent ToolDef types + brain-tool registry (v0.15)

Types first so the handler has a stable contract:
  - SubagentHandlerData / AggregatorHandlerData — the two job.data shapes
  - ToolCtx (engine, jobId, remote, signal) + ToolDef (name, description,
    input_schema, idempotent, execute) — Anthropic-envelope, distinct from
    the MCP McpToolDef extraction landed earlier
  - ContentBlock discriminated union for subagent_messages.content_blocks
  - SubagentStopReason + SubagentResult emitted on terminal completion

brain-allowlist.ts derives one ToolDef per allow-listed OPERATION. Reuses
the ParamDef → JSONSchema shape from the MCP extraction in a local helper
(Anthropic's input_schema field diverges from MCP's inputSchema by a
character). The 11-name allow-list is read-safe + put_page — every
destructive / filesystem / identity-mutating op stays off by default.

put_page gets a namespace-wrapped tool schema: `slug` pattern = anchored
`^wiki/agents/<subagentId>/.+`. The server-side check in put_page op
(shipped in prior commit) is still the authoritative gate — the schema
just helps the model write correct slugs first-try. `subagentId` is
plumbed into the ToolCtx so the viaSubagent=true fail-closed path lights
up on every tool-dispatched put_page.

filterAllowedTools narrows a registry by subagent_def's allowed_tools
frontmatter field. Rejects unknown names at load time (no silent drop —
typos in a skills/subagents/*.md would otherwise ship to prod with a
tool silently missing).

18 tests pin: every allowlist name exists in OPERATIONS (catches upstream
rename), Anthropic name regex, put_page namespace pattern per-subagent,
execute() routes through the op handler with viaSubagent=true, out-of-
namespace put_page throws permission_denied, filter passes prefixed +
unprefixed names, rejects unknowns, deduplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent-audit JSONL + transcript renderer

Two small plumbing pieces the v0.15 subagent handler + `gbrain agent logs`
depend on:

subagent-audit.ts — JSONL-rotated audit log mirroring the shell-audit
pattern. Two event flavors: submission (one line per job submit) and
heartbeat (one line per turn boundary — llm_call_started / completed /
tool_called / tool_result / tool_failed). Heartbeats fix the "--follow on
a long Anthropic call shows nothing for 30 seconds" problem codex flagged.
Never logs prompts or tool inputs (PII risk — subagent input_vars may
carry user-supplied free text); DOES log tokens, ms_elapsed, tool_name,
first 200 chars of error text. Rotates weekly via ISO week. `readSubagent
AuditForJob` is the readback path for `gbrain agent logs` — scans the
current + prior week file so job boundaries across weeks still resolve.
`GBRAIN_AUDIT_DIR` overrides the default ~/.gbrain/audit/ for container
deploys.

transcript.ts — renders subagent_messages + subagent_tool_executions to
markdown. Message order is authoritative; tool rows splice under their
owning assistant tool_use by tool_use_id. Handles text, tool_use (with
pending / complete / failed execution rows), tool_result (skipped if
we already rendered the owning tool_use — avoids double-printing), and
unknown block types (fenced JSON dump for diagnostics). Output is
UTF-8-safe truncated at maxOutputBytes.

21 unit tests: ISO week filename rotation (incl. 2027-01-01 → W53-2026
boundary), submission + heartbeat write shapes, 200-char error cap, best-
effort write failure doesn't throw, readback filters by job_id and
sinceIso. Transcript: empty input, ordering, token line, tool_use +
complete/failed/pending execution rendering, truncation, unknown-block
diagnostic dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent LLM-loop handler with crash-resumable replay

The main event: runs one Anthropic Messages API conversation with tool
use, persists every turn + tool execution, and resumes cleanly after a
worker kill anywhere in the loop.

Design points that carry the v0.15 guarantees:

  1. Two-phase tool persistence. INSERT status='pending' before dispatch,
     UPDATE to 'complete' or 'failed' after. subagent_messages rows are
     the canonical conversation; subagent_tool_executions rows are the
     canonical "did this tool run + what did it return". Either DB commit
     is atomic, so replay has a single source of truth.

  2. Replay reconciliation. If the last persisted message is an assistant
     with tool_use blocks AND no following synthesized user message, we
     crashed mid-dispatch. On resume, finish those tools first (respecting
     idempotent flag for 'pending' rows), synthesize the user turn, and
     THEN call the LLM again. Non-idempotent pending rows abort the job
     with a clear error — v0.15 ships only idempotent tools so this is
     preventive.

  3. Rate lease around every LLM call. acquireLease before, releaseLease
     after (both success and error paths). acquired=false throws
     RateLeaseUnavailableError — the worker treats it as a renewable
     error and re-claims later, so a temporary capacity cap doesn't fail
     the job terminally.

  4. Anthropic prompt caching. system block gets cache_control=ephemeral;
     the LAST tool def gets it too (Anthropic caches everything up to and
     including the marked block). ~10x cost reduction on multi-turn
     agents per the plan.

  5. Dual-signal abort. AbortSignal.any merges ctx.signal (timeout / lock
     loss / cancel) with ctx.shutdownSignal (worker SIGTERM). Both feed
     the Anthropic call's AbortSignal; mid-turn abort bails before the
     next LLM call with whatever turns are already persisted. Node ≥ 20
     has AbortSignal.any; older runtimes get a manual-merge polyfill.

  6. Injectable Anthropic client. The real SDK implements MessagesClient
     structurally; tests inject a FakeMessagesClient that scripts
     responses.

12 unit tests pin: no-tool happy path, single tool_use complete, tool
throws → failed row + loop continues, unknown tool name rejection,
max_turns cap, crash-then-resume with partial state, replay skips already-
complete tool execs without re-invoking execute, non-idempotent pending
rejects on resume, lease acquire + release roundtrip, RateLeaseUnavailable
under cap-full, missing prompt validation, allowed_tools unknown-name.

NOT in v0.15: refusal detection (stop_reason + content shape), stop_reason
=max_tokens partial recovery, mid-call lease renewal with backoff loop.
All three are documented as P2 items in the plan file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent_aggregator handler with mixed-outcome rendering

Claims AFTER all subagent children resolve — by then Lane 1B's queue
changes have posted one child_done message per terminal transition into
this job's inbox (complete / failed / dead / cancelled / timeout). The
aggregator reads those, builds a deterministic markdown summary, and
returns it as the handler result.

Not an LLM call in v0.15 — output is reproducible concatenation so
fan-out runs stay comparable. v0.16+ can add an LLM synthesis pass
behind an opt-in flag.

Contract:
  - empty children_ids → `(no children)` marker
  - missing child_done (shouldn't happen under v0.15 invariants but
    possible if a terminal-state path slipped past Lane 1B) → counted as
    failed with "no child_done message observed" error
  - non-complete outcomes: result is null in the output so no payload
    leaks alongside a failure label
  - children appear in the order children_ids was supplied
  - custom aggregate_prompt_template replaces the markdown header

13 unit tests cover: empty input, all-success, mixed outcomes, result
suppression on failure, missing child_done handling, order preservation,
custom template, progress + log emission, stringified JSONB payload
parsing, non-child_done inbox filtering, legacy-writer outcome fallback,
and internal helper edges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): GBRAIN_PLUGIN_PATH loader + plugin-authors guide (v0.15)

Plumbing that makes Wintermute (and future downstream agents) day-1
usable on v0.15. Host repos drop a `gbrain.plugin.json` + `subagents/`
directory somewhere, set GBRAIN_PLUGIN_PATH (colon-separated like \$PATH),
and their custom subagent defs load at worker startup.

Path policy is strict: absolute paths only. Relative, ~-prefixed, and
URL-style (https://, file://) all rejected with warnings — the user
controls where plugins live. Non-existent paths and files (not dirs) are
warned and skipped so a typo doesn't crash worker startup.

Collision policy: left-wins. If two plugins ship a subagent with the same
name, the first one in GBRAIN_PLUGIN_PATH keeps it and the other gets a
warning naming both sources. Deterministic + debuggable.

Trust policy: plugins ship subagent defs ONLY. Cannot declare new tools,
cannot extend the brain allow-list, cannot override safety flags. The
subagent def's `allowed_tools:` frontmatter MUST subset the derived
registry — validation happens at load time (worker startup), not at
dispatch time, so a typo in a skill gives a loud startup error instead
of silently "tool never fires at 3am."

Manifest `plugin_version: "gbrain-plugin-v1"` locks the contract. Unknown
versions rejected. `subagents` field escape attempts (`../../../etc` etc)
rejected. gray-matter handles the markdown frontmatter parse — subagent
defs don't conform to the page schema, so we don't use parseMarkdown.

docs/guides/plugin-authors.md is the Wintermute-facing walkthrough.
Covers the minimum viable plugin shape, the three policies, the
frontmatter fields, known caveats (audit JSONL is local-only, tool calls
always run remote=true, put_page is namespace-scoped).

22 unit tests pin path rejection, missing/invalid manifest, unsupported
version, escape-attempt, basename fallback for missing frontmatter.name,
allowed_tools round-trip, unknown-tool rejection with validAgentToolNames,
empty env, multi-path, collision warning with left-wins, trimmed paths,
manifest-rejection as warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): gbrain agent run + logs + worker registration (v0.15 Lane 4H)

Three integration seams wired:

src/commands/agent.ts — \`gbrain agent run\`. Submits subagent jobs (or a
fan-out of N + aggregator) under the trusted-submit flag so the
PROTECTED_JOB_NAMES guard doesn't reject. Fan-out path creates the
aggregator first (so children can reference its id as parent), submits
each child with on_child_fail='continue' (required by Lane 1B's terminal-
set + child_done machinery), then jsonb_set's the aggregator's
children_ids. Short-circuits a 1-entry manifest to a single subagent
with no aggregator. Follow mode runs agent-logs streaming + waitFor
Completion in parallel and exits on terminal status; detach prints the
job id and exits. Ctrl-C is handled as detach, not cancel — the job
keeps running, consistent with durability invariants.

src/commands/agent-logs.ts — \`gbrain agent logs\`. Merges ~/.gbrain/audit/
subagent-jobs-*.jsonl (heartbeats + submissions) with subagent_messages
(persisted conversation) in one chronological stream. --follow polls at
1s and exits when the job hits terminal. --since accepts ISO-8601 OR
relative shorthand (5m / 1h / 2d). Writes transcript tail (full message
+ tool tree) only for terminal jobs, so mid-run --follow doesn't spam a
half-rendered transcript.

src/commands/jobs.ts registerBuiltinHandlers — matches the shell-handler
opt-in shape. GBRAIN_ALLOW_LLM_JOBS=1 registers the subagent +
subagent_aggregator handlers, then loads plugins from GBRAIN_PLUGIN_PATH
with validAgentToolNames pulled from BRAIN_TOOL_ALLOWLIST. Every plugin
warning + loaded-plugin line prints to stderr, mirroring the openclaw-
seam startup convention.

src/core/minions/protected-names.ts — subagent + subagent_aggregator
join the protected set. MCP submit_job returns permission_denied; only
trusted-CLI callers (with allowProtectedSubmit) can insert these rows.

src/cli.ts — adds 'agent' to CLI_ONLY + dispatches it like 'jobs'.

Test fallout: subagent-handler.test.ts + subagent-transcript.test.ts
helpers now submit under allowProtectedSubmit (they insert rows named
'subagent' directly against the queue). 23 new tests in agent-cli.test.ts
cover: flag parsing (including --detach implies !follow, --tools comma
split, -- terminator, unknown flag throw), --since parse (ISO, relative
5m/2h/1d, unparseable error), protected-name guard for all three names,
trusted-submit gate, and a fan-out integration check that verifies the
aggregator + children shape after --fanout-manifest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): rename max_children test's spawned jobs off the protected 'subagent' name

The spawn-storm test submitted 50 literal-string 'subagent' children to
exercise the max_children row-lock serialization. In v0.15 'subagent' is
a PROTECTED_JOB_NAME (CLI-only; trusted submit required), so the old
literal submission now throws before reaching the row-lock check.

The test is about max_children semantics, not the v0.15 subagent runtime
specifically — rename the child name to 'child_worker' so the test
exercises the exact same queue.add path without tripping the new guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): v0.15.0 — VERSION, CHANGELOG, README, upgrading-agents, CLAUDE.md

Bumps VERSION → 0.15.0 and package.json → 0.15.0 (resolves the pre-existing
drift — on master, VERSION=0.14.0 but package.json=0.13.1; src/version.ts
reads package.json, so this is what the binary prints now).

CHANGELOG lands the release-summary entry in the GStack voice + the full
itemized change list (11 new modules, 3 new tables, queue correctness
fixes, trust-model additions, 159 new unit tests). Voice rules respected
— no em dashes, no AI vocabulary, real file names + real numbers.

README gets a "Durable agents: `gbrain agent` (v0.15)" section next to
the Minions block, with the three canonical CLI shapes (single run,
fanout-manifest, logs --follow) and a pointer to plugin-authors.md.

docs/UPGRADING_DOWNSTREAM_AGENTS.md gets a full v0.15.0 section covering
the four adoption steps downstream agents (Wintermute and similar) need:
(1) worker opt-in via GBRAIN_ALLOW_LLM_JOBS, (2) moving custom subagent
defs to a plugin repo, (3) replacing ephemeral subagent runs with durable
`gbrain agent run`, (4) the put_page namespace rule for agent-driven writes.

CLAUDE.md updated with concise per-file descriptions for every new module:
the handler, aggregator, audit, rate-leases, wait-for-completion,
transcript, plugin-loader, brain-allowlist, tool-defs extraction, agent
CLI + logs CLI, and the registerBuiltinHandlers wiring for subagent
handlers + plugin-loader.

Verified: binary builds (940 modules, 89ms compile), prints `gbrain 0.15.0`,
`gbrain agent --help` shows the new subcommand shape. 170 new tests pass
(full v0.15 surface). Full unit suite passes bar one parallel-load
flake on a pre-existing E2E (graph-quality, passes in isolation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): drop GBRAIN_ALLOW_LLM_JOBS flag — subagent handlers always-on

The env flag was ceremony. Shell jobs need the flag because they execute
arbitrary CLI commands (RCE surface). Subagent jobs don't — they call the
Anthropic API with whatever ANTHROPIC_API_KEY is in env, so the key is
already the cost gate (no key → SDK fails on the first turn). And
who-can-submit is already protected by PROTECTED_JOB_NAMES +
TrustedSubmitOpts: MCP callers get permission_denied; only `gbrain agent
run` with allowProtectedSubmit can insert subagent / subagent_aggregator
rows. The flag added nothing the existing guards didn't already give us.

registerBuiltinHandlers now always registers subagent + subagent_aggregator
and loads GBRAIN_PLUGIN_PATH plugins. Worker startup prints:

  [minion worker] subagent handlers enabled

instead of the conditional enabled/disabled pair. Plugin discovery runs
unconditionally — empty PATH is a no-op.

README, CHANGELOG, docs/UPGRADING_DOWNSTREAM_AGENTS, CLAUDE.md, agent CLI
help text, and subagent handler docstring all updated to drop the flag
reference. Shell handler's GBRAIN_ALLOW_SHELL_JOBS gate is untouched —
separate concern (RCE, not billing).

Full suite: 1859 pass, 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: scrub private agent-fork name from all public artifacts

Enforces the rule added to CLAUDE.md (privacy section): never say
`Wintermute` in any CHANGELOG, README, doc, PR, or commit message.
Reader-facing copy says `your OpenClaw` (the term covers every
downstream OpenClaw deployment — Wintermute, Hermes, AlphaClaw — in
one umbrella the reader already recognizes). First-person /
origin-story copy says `Garry's OpenClaw` (honest that this is the
production deployment driving the feature, without exposing the
private agent's name).

Swept across:
  CHANGELOG.md (v0.15 entry + 4 historical mentions)
  README.md
  TODOS.md
  docs/UPGRADING_DOWNSTREAM_AGENTS.md
  docs/guides/plugin-authors.md (including example plugin names)
  docs/guides/plugin-handlers.md
  docs/guides/minions-fix.md
  docs/designs/KNOWLEDGE_RUNTIME.md (27 refs, mostly analytical)
  docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md
  skills/migrations/v0.11.0.md
  skills/skillpack-check/SKILL.md
  scripts/skillify-check.ts
  src/commands/doctor.ts
  src/commands/migrations/v0_15_0.ts
  src/commands/skillpack-check.ts
  src/core/enrichment/completeness.ts
  src/core/minions/plugin-loader.ts
  src/core/operations.ts
  src/core/output/scaffold.ts

Intentionally kept (these mentions define/test the rule itself):
  CLAUDE.md — the privacy rule section necessarily uses the literal
  name to define the restriction and examples
  test/plugin-loader.test.ts — fixture name in a plugin-loading test;
  renaming risks breaking assertion logic
  test/integrations.test.ts — the word appears in a privacy-regex
  test that explicitly enforces name redaction
  test/doctor-minions-check.test.ts — a comment referencing the rule
  CEO plan artifact at ~/.gstack/projects/… — private, not distributed

Binary builds (941 modules), 198/198 relevant tests pass, `gbrain --version`
prints `0.15.0`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore bun --compile artifacts with a glob, not specific hashes

Each `bun build --compile` emits a fresh hash-named `.*-*.bun-build` file
in cwd. The prior entries listed two specific hashes that were already
stale, so every build after those created a new untracked file requiring
manual cleanup.

Replace the two stale entries with `*.bun-build` so any current or future
compile artifact is ignored automatically.

Verified: ran `bun build --compile`, got two new `.*-*.bun-build` files,
`git status` stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): rename v0.15.0 → v0.16.0

gbrain master is at 0.14.2. Other 0.15.x PRs may land before/after
this one — we bump the minor (new capability) and lock to 0.16.0 so
ordering with concurrent work doesn't matter.

Touches:
- VERSION: 0.15.0 → 0.16.0
- package.json: 0.15.0 → 0.16.0
- Rename src/commands/migrations/v0_15_0.ts → v0_16_0.ts (+ all
  version strings inside + import in index.ts registry)
- Rename test/migrations-v0_15_0.test.ts → migrations-v0_16_0.test.ts
- test/apply-migrations.test.ts: skippedFuture lists now reference
  '0.16.0'
- test/put-page-namespace.test.ts + test/mcp-tool-defs.test.ts: Lane
  comment refs updated
- src/schema.sql + src/core/pglite-schema.ts: "v0.15.0" section
  comment updated; src/core/schema-embedded.ts regenerated
- CHANGELOG.md: top entry renamed to [0.16.0]; inline v0_15_0 /
  v0.15.0 refs swept
- docs/UPGRADING_DOWNSTREAM_AGENTS.md: section heading v0.15.0 → v0.16.0

Verified: `gbrain --version` prints 0.16.0, migration registry /
buildPlan / put_page / mcp-tool-defs / handlers tests all green
(49/49).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: reframe v0.16 durability headline around OpenClaw crashes

"Laptop closed mid-run" framing implied a consumer workflow. Real pain is
OpenClaw subagents dying daily on worker kill, memory blip, or timeout.
Headline + README copy match the body now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate llms-full.txt after README copy change

Regen drift guard caught the README edit from 83beec4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:14:17 -07:00
Garry Tan
a4df40fe5c feat: v0.15.2 - bulk-action progress streaming (stderr reporter, agent-visible heartbeats) (#293)
* feat(progress): step 1 - shared ProgressReporter + CliOptions

Adds the foundation for v0.14.2's bulk-action progress streaming work:

- src/core/progress.ts: dependency-free reporter with auto/human/json/quiet
  modes, TTY-aware rendering, time+item rate gating, heartbeat helper for
  slow single queries, dot-composed child phases, EPIPE defense (both sync
  throw and async 'error' event), and a singleton module-level signal
  coordinator so SIGINT/SIGTERM emits abort events for all live phases
  without leaking per-instance listeners.

- src/core/cli-options.ts: parseGlobalFlags() for --quiet /
  --progress-json / --progress-interval=<ms> (both space and = forms),
  plus cliOptsToProgressOptions() that resolves to the right mode. Non-TTY
  default is human-plain one-line-per-event; JSON is explicit opt-in so
  shell pipelines don't suddenly see structured noise.

- test/progress.test.ts (17 cases): mode resolution, rate gating, no-fake-
  totals on heartbeat paths, EPIPE paths, SIGINT singleton, child phase
  composition.

- test/cli-options.test.ts (14 cases): flag parsing, invalid values,
  interleaved flags, mode resolution.

Follow-ups wire doctor/embed/files/export/extract/import/sync/migrate/
repair-jsonb/backlinks/orphans/lint/integrity/eval/autopilot/jobs plus
the apply-migrations orchestrators through this reporter, and route
Minion handlers to job.updateProgress instead of stderr. See the plan
in ~/.claude/plans/.

1682 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 2 - wire global flags into cli.ts

Parse --quiet / --progress-json / --progress-interval from argv BEFORE
command dispatch, strip them, stash resolved CliOptions on a module-level
singleton (same pattern as Commander's program.opts()) and on every
OperationContext created for shared-op dispatch.

- src/cli.ts: parseGlobalFlags(rawArgs) at the top of main(); setCliOptions
  once; dispatch sees only the stripped argv. Fixes the "gbrain
  --progress-json doctor" unknown-command case that Codex flagged.
- src/core/cli-options.ts: expose setCliOptions/getCliOptions/
  _resetCliOptionsForTest singleton. Commands that want progress call
  getCliOptions() to construct their reporter.
- src/core/operations.ts: OperationContext gains optional cliOpts field
  so shared-op handlers (and MCP-invoked ops that need a reporter) can
  read the same settings. MCP callers leave it undefined and consumers
  default to quiet.
- test/cli-options.test.ts: +4 cases covering singleton round-trip and
  an integration smoke spawning `bun src/cli.ts --progress-json --version`
  to prove the global flag survives dispatch.

45 relevant unit tests pass (progress + cli-options + cli.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3a - doctor + orphans heartbeat streaming

Doctor on a 52K-page brain used to sit silent for 10+ minutes while the
DB checks ran, then get killed by an agent timeout. Wired through the
new reporter so agents see which check is running and the slow ones
heartbeat every second.

doctor.ts:
- Start a single `doctor.db_checks` phase around the DB section, with a
  per-check heartbeat before each step (connection, pgvector, rls,
  schema_version, embeddings, graph_coverage, integrity, jsonb_integrity,
  markdown_body_completeness).
- jsonb_integrity now scans 5 targets, not 4: added page_versions.
  frontmatter so the check surface matches `repair-jsonb` (per Codex
  review of the plan — the old 4-target scan missed a known repair site).
  Per-target heartbeat so 50K-row scans show incremental progress.
- markdown_body_completeness: wrap the existing query in a 1s heartbeat
  timer. The regex scan over rd.data ->> 'content' can't be paginated
  usefully; this just lets agents see life during the sequential scan.
  No fake totals — the LIMIT 100 query has no meaningful total count.
- integrity sample: same heartbeat pattern around the 500-page scan.

orphans.ts:
- findOrphans() wraps the NOT EXISTS anti-join in a 1s heartbeat.
  Keyset pagination was considered and rejected: without an index on
  links.to_page_id it's no faster than the full scan, and may re-plan
  the anti-join per batch. A schema migration adding that index is the
  right fix and is queued for v0.14.3.

Follow-ups:
- Step 3b: wire embed/files/export (the \r-only stdout offenders).
- Step 5: end-to-end progress test spawning `gbrain doctor --progress-json`
  against a fixture brain, asserting stderr events and clean stdout.

All existing unit tests continue to pass (76/76 in doctor + orphans +
progress + cli-options).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3b - embed + files + export stderr progress

Replaces the \r-on-stdout progress pattern in the three worst offenders
(embed, files sync, export) with the shared reporter on stderr. Stdout
now carries only final summaries, so scripts and tests that grep for
counts ("Embedded N chunks", "Files sync complete", "Exported N pages")
still work when output is piped.

- embed.ts: runEmbedCore accepts an optional onProgress callback. The
  CLI wrapper builds a reporter and passes reporter.tick(); Minion
  handlers will pass job.updateProgress in Step 4. Worker-pool is
  single-threaded JS so no rate-gate race (per Codex review #18).
- files.ts syncFiles(): tick per file; summary preserved on stdout.
- export.ts: tick per page; summary preserved on stdout.

Also fixes a --quiet flag collision. `skillpack-check` has its own
--quiet mode (suppress all stdout). parseGlobalFlags strips --quiet
globally now, and skillpack-check reads the resolved CliOptions
singleton via getCliOptions() instead of re-parsing argv. Test updated
to match the stripping behavior.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3c - extract + import + sync reporter streaming

Extract, import, and sync now stream per-file progress to stderr through
the shared reporter. All three kept their stdout summaries + JSON
action-events intact so existing tests + agent scripts are unaffected.

- extract.ts (4 paths: links/timeline × fs/db): replaced the ad-hoc
  `process.stderr.write({event:"progress"...})` lines with reporter
  ticks. Same channel (stderr), canonical schema now, visible in both
  text and --json modes. Stdout action-events (`add_link` /
  `add_timeline`) untouched — tests grep them.
- import.ts: the logProgress() function that printed every 100 files to
  stdout is now a progress.tick() call per file. Rate-gated by the
  reporter. Stdout still gets the final "Import complete (Xs)" summary
  and the --json payload.
- sync.ts: three new phases (`sync.deletes`, `sync.renames`,
  `sync.imports`) tick per file, so big syncs show each step rather than
  a single end-of-run summary. Phase hierarchy ready to be child()-chained
  into runImport / runEmbed later, per Codex review #26.

Updated the #132 nested-transaction regression test in test/sync.test.ts
to also accept the new hoisted-loop shape — the guarantee (this loop is
not wrapped in engine.transaction) still holds.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3d - migrate/repair/backlinks/lint/integrity/eval

Wires the remaining bulk commands through the reporter:

- migrate-engine: phase starts (migrate.copy_pages, migrate.copy_links),
  per-page tick. Old \"Progress: N/total\" stdout logs replaced by
  stderr ticks; final stdout summary preserved.
- repair-jsonb: per-column start + a heartbeat timer while each UPDATE
  runs (minutes on 50K-row tables). CRITICAL: stdout stays clean so
  migrations/v0_12_2.ts's JSON.parse(child.stdout) still works. Per
  Codex review #12.
- backlinks: 1s heartbeat around findBacklinkGaps() (sync double-walk
  of the brain dir).
- lint: tick per page; per-issue stdout output preserved.
- integrity auto: tick per page in the main resolver loop. The separate
  ~/.gbrain/integrity-progress.jsonl resume marker is untouched (its
  role shifts from live progress reporting to resume-only).
- eval: add an onProgress option to core's runEval(), CLI wraps with a
  reporter. Phases: eval.single / eval.ab. Tick per query.

core/search/eval.ts gains a RunEvalOptions type so future callers (MCP
eval op, Minion handlers) can also hook in without the reporter.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3e - onProgress callbacks on core libs

- src/core/embedding.ts: embedBatch() gains an optional
  EmbedBatchOptions.onBatchComplete callback, fired after each 100-item
  sub-batch. CLI wrappers pass reporter.tick; Minion handlers can pass
  job.updateProgress.
- src/core/enrichment-service.ts: enrichEntities() config gains
  onProgress(done, total, name) fired after each entity. Same split:
  CLI -> reporter, Minion -> DB-backed progress.

No CLI behavior change on its own. Wiring these callbacks into the
Minion handlers is Step 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 4 - orchestrators + upgrade + minion handlers

- cli-options.ts: childGlobalFlags() returns the flag suffix to append
  to child gbrain subprocesses. Empty string by default, " --quiet
  --progress-json" when the parent has them set, so child behavior
  inherits the parent's progress-mode without scattering string-concat
  logic across every execSync site.

- migrations/v0_12_2.ts: each execSync inherits the parent's global
  flags. Phase C (repair-jsonb --dry-run --json) pins explicit stdio to
  ['ignore','pipe','inherit'] so child stderr streams straight through
  while stdout stays captured for JSON.parse. Per Codex review #12.
- migrations/v0_12_0.ts + v0_11_0.ts: same childGlobalFlags wiring at
  each gbrain-subcommand execSync.

- upgrade.ts: post-upgrade timeout bumped 300s → 30min (1_800_000 ms)
  with GBRAIN_POST_UPGRADE_TIMEOUT_MS override. The old 300s cap killed
  v0.12.0 graph-backfill migrations on 50K+ brains; the heartbeat
  wiring added in v0.14.2 makes long waits observable, so a generous
  ceiling no longer means users stare at a silent terminal.

- jobs.ts: the embed Minion handler passes job.updateProgress as the
  onProgress callback, so per-job progress is durable in minion_jobs
  and readable via `gbrain jobs get <id>`. Primary Minion progress
  channel is DB-backed — stderr from `jobs work` stays coarse for
  daemon liveness only. Per Codex review #20.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 5 - E2E doctor-progress test + CI guard

scripts/check-progress-to-stdout.sh greps src/ for the banned
`process.stdout.write('\r…')` pattern that v0.14.2 removed from the
bulk-action codepaths. Wired into the `bun run test` script so any
future regression that puts progress back on stdout fails fast. An
empty allowlist documents the position: every known call site was
migrated; new exceptions need a rationale in the allowlist.

test/e2e/doctor-progress.test.ts (Tier 1, needs Postgres + pgvector):
- `gbrain --progress-json doctor --json`: stderr carries JSONL progress
  events with the canonical {event, phase, ts} shape, starts + finishes
  for `doctor.db_checks`. Stdout stays parseable JSON — no progress
  pollution.
- `gbrain doctor` (no flag): human-plain progress goes to stderr only,
  stdout stays free of `[doctor.db_checks]`.
- `gbrain --quiet doctor`: reporter emits nothing; doctor still runs to
  completion.

test/cli-options.test.ts: +2 spawning integration tests. One verifies
`gbrain --progress-json --version` keeps stdout clean of progress events
(single-shot commands that don't use a reporter aren't affected). One
guards the skillpack-check --quiet regression — --quiet suppresses
stdout by reading the resolved CliOptions singleton, not re-parsing argv.

Full test matrix:
  bun run test           -> 1726 pass / 184 skipped (no DB) / 0 fail
  bun run test:e2e       -> 136 pass / 13 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 6 - docs + v0.14.2 release bump

- VERSION + package.json bumped to 0.14.2.
- docs/progress-events.md (new): canonical JSON event schema reference.
  Stable from v0.14.2, additive only. Lists every phase name shipped
  in this release, the five event types (start/tick/heartbeat/finish/
  abort), the TTY/non-TTY rendering rules, subprocess inheritance
  semantics, and the Minion DB-backed progress model.
- CLAUDE.md: "Bulk-action progress reporting" section under the build
  instructions; Key files entries for src/core/progress.ts,
  src/core/cli-options.ts, scripts/check-progress-to-stdout.sh, and
  docs/progress-events.md; doctor.ts entry updated to note the v0.14.2
  5-target jsonb_integrity scan + heartbeat wiring.
- CHANGELOG.md v0.14.2: full release summary per project voice rules.
  The "numbers that matter" table, per-command before/after grid,
  backward-compat warnings for stdout→stderr moves, and an itemized
  changes section covering reporter/CLI plumbing/schema/Minion
  handlers/doctor fixes/upgrade timeout/CI guard/tests. No em dashes.
  Real file paths, real commands, real numbers.
- skills/migrations/v0.14.2.md (new): agent migration note. Mechanical
  step is "nothing" since v0.14.2 is purely additive. Walks agents
  through the three new global flags, the 14 wired commands, the event
  schema cheat sheet, Minion progress via job.updateProgress, and
  scripts/verification commands.

Full test matrix:
  bun run test (unit + guards) -> 1726 pass / 184 skipped / 0 fail
  bun run test:e2e (Postgres)  -> 141 pass / 8 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.15.2, restore master's [0.14.2] CHANGELOG entry

Master sits at 0.14.2 (reliability wave). This PR lands on top as 0.15.2
(progress streaming wave). Splits the merge-time combined CHANGELOG entry
back into two discrete release sections so history stays honest:

- [0.15.2] = progress reporter, CliOptions, 14 wired commands, Minion
  embed handler, doctor jsonb_integrity 5-target fix, upgrade timeout bump,
  CI guard, progress unit+E2E tests.
- [0.14.2] = master's eight root-cause bug fixes, restored verbatim from
  origin/master.

Touched files:
- VERSION + package.json: 0.14.2 -> 0.15.2 (next patch off master).
- skills/migrations/v0.14.2.md -> skills/migrations/v0.15.2.md (rename
  + rewrite frontmatter + body to v0.15.2).
- CHANGELOG.md: split into two entries; progress-wave refs renamed
  v0.14.2 -> v0.15.2; reliability-wave entry restored from master.
- src/core/progress.ts, src/commands/doctor.ts, src/commands/sync.ts,
  src/commands/upgrade.ts, docs/progress-events.md, test/sync.test.ts:
  progress-wave v0.14.2 references -> v0.15.2. The remaining v0.14.2
  references in test/e2e/migration-flow.test.ts (Bug 3 context) and
  CLAUDE.md (reliability-wave key commands, Bug 3 ledger move) correctly
  point at master's 0.14.2 release.

Test matrix after version bump:
  bun run test -> 1780 pass / 179 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:54:13 -07:00
Garry Tan
ff10796a00 fix(wave): v0.15.1 - 4 hot issues + scope expansion (#248)
* fix(wave): 4 hot issues + 3 scope expansions (v0.13.1)

Addresses four user-filed regressions after v0.13.0 plus three adjacent
footgun closures.

* #170 — CREATE INDEX [CONCURRENTLY] IF NOT EXISTS idx_pages_updated_at_desc
  on pages (updated_at DESC). Engine-aware migration v12 with invalid-index
  cleanup on Postgres, plain CREATE on PGLite. ~700x on 30k+ row brains.
  Contributed by @fuleinist (#215).

* #219 — Minions schema default max_stalled 1 -> 5. v13 migration ALTERs
  the default and UPDATEs existing non-terminal rows (waiting/active/
  delayed/waiting-children/paused) so live queues get rescued on upgrade.
  Adds MinionJobInput.max_stalled with [1,100] clamp. New --max-stalled
  CLI flag on `jobs submit`. Reported by @macbotmini-eng.

* #218 — package.json postinstall surfaces errors instead of silencing.
  trustedDependencies whitelists @electric-sql/pglite. doctor
  schema_version check fails loudly when migrations never ran and links
  to #218. README + INSTALL_FOR_AGENTS warn against `bun install -g`.
  Reported by @gopalpatel.

* #223 — @electric-sql/pglite pinned to exactly 0.4.3 (was ^0.4.4).
  PGLiteEngine.connect() wraps PGlite.create() errors with a message
  pointing at the issue + gbrain doctor. Does NOT suggest 'missing
  migrations' as a cause (create-time abort happens before migrations
  run). Pin is unverified against macOS 26.3; error-wrap is the safety
  net. Reported by @AndreLYL.

* Scope: `gbrain jobs submit` gains --backoff-type/--backoff-delay/
  --backoff-jitter/--timeout-ms/--idempotency-key (MinionJobInput audit).
* Scope: `gbrain jobs smoke --sigkill-rescue` regression case (opt-in,
  CI-only) that simulates a killed worker and asserts the new default
  rescues.
* Scope: `gbrain doctor --index-audit` reports zero-scan Postgres indexes
  as drop candidates (informational; no auto-drop).

Infrastructure:
* Migration interface extended with sqlFor: { postgres?, pglite? } and
  transaction: boolean. Runner picks the engine-specific branch and
  bypasses engine.transaction() when transaction:false (required for
  CONCURRENTLY). BrainEngine.kind readonly discriminator added.
* scripts/check-jsonb-pattern.sh CI guard extended to block
  `max_stalled DEFAULT 1` from regressing.

Tests:
* 15 new unit tests: v12/v13 structural + behavioral assertions,
  max_stalled default/clamp/backfill, PGLite error-wrap source guard,
  engine kind discriminator.
* 3 regression tests pinned by IRON RULE.
* Full unit suite: 1416 pass.
* Full E2E suite against Postgres 16 + pgvector: 126 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync documentation for v0.13.1

CLAUDE.md "Key files" and "Commands" sections refreshed to match the
v0.13.1 fix wave:

- Note `BrainEngine.kind` discriminator on engine.ts
- Document v0.13.1 connect() error-wrap on pglite-engine.ts
- Refresh src/core/minions/ layout (no shell handler, no protected-names,
  no quiet-hours/stagger — that was v0.13-development scaffolding that
  did not ship)
- Add src/core/migrate.ts entry with `Migration` interface extensions
  (`sqlFor`, `transaction: false`)
- Document new `gbrain jobs submit` flags (--max-stalled, --backoff-type,
  --backoff-delay, --backoff-jitter, --timeout-ms, --idempotency-key)
- Document `gbrain jobs smoke --sigkill-rescue` regression guard
- Document `gbrain doctor --index-audit` and the schema_version=0
  surface that catches #218 postinstall failures
- Extend check-jsonb-pattern.sh note with the max_stalled DEFAULT 1
  regression guard
- Touch up test file blurbs for migrate.test.ts, pglite-engine.test.ts,
  minions.test.ts with v0.13.1 coverage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): run files sequentially to eliminate shared-DB race

The E2E suite was flaky. ~3 of every 5 runs had 4-10 failures clustered
in Links, Timeline, Versions, Minions resilience, Parallel Import, and
Page CRUD tests. Symptoms included "expected 16 pages, got 8" (half),
"expected 1 link inserted, got 0", timeline entries missing after
round-trip, and similar data-shape mismatches.

Root cause: bun test runs test FILES in parallel (each in a worker
process). 13 E2E files share one DATABASE_URL, and `setupDB()` in
`test/e2e/helpers.ts` does `TRUNCATE ... CASCADE` on all tables before
each file's `importFixtures()`. File A's TRUNCATE would race with file
B's in-flight INSERT stream, producing the observed half-populated or
wrong-count states.

An earlier attempt used a Postgres advisory lock held on a dedicated
single-connection client for the lifetime of each file's run. It broke
because bun's default 5000 ms hook timeout fires on queued beforeAll()
calls: with 13 files serializing through the lock, files 2-13 would
time out waiting for file 1 to finish.

This commit switches to sequential file execution at the harness level
via scripts/run-e2e.sh, which loops through test/e2e/*.test.ts one at
a time, tracks aggregate pass/fail counts, and exits non-zero on the
first failing file. No lock, no timeout issues, no changes to any test
file. package.json test:e2e points at the new script.

Verified: 5 back-to-back runs against the same Postgres container,
each completing in ~5 min. Every run: 13 files, 138 tests, 0 fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.15.1 (fix wave locked to MINOR line)

Master v0.14.2 was the last /investigate root-cause wave on the
v0.14.x line. This fix wave opens v0.15.x: four hot issues (#170,
#218, #219, #223) close v0.13.x regressions that v0.14.x didn't
cover, so the MINOR bump reflects the semantic shift — new schema
migrations (v14, v15), a new CLI surface (`--max-stalled`,
`--sigkill-rescue`, `--index-audit`), a new BrainEngine contract
(`kind` discriminator + extended `Migration` interface), and a new
install-time contract (PGLite 0.4.3 pin + `trustedDependencies`).

Locked to 0.15.1 in advance: other work may land before/after this
PR, but the version is fixed so reviewers can cite a stable number.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 13:19:23 -07:00
Garry Tan
7f156c8873 feat: v0.15.0 llms.txt + llms-full.txt + AGENTS.md (#294)
* feat: llms.txt + llms-full.txt + AGENTS.md (v0.15.0)

Ship three new public artifacts at the repo root so agents that aren't
Claude Code can discover GBrain documentation cleanly:

- AGENTS.md — ~45-line install + operating protocol for non-Claude agents
  (Codex, Cursor, OpenClaw, Aider). Covers install, read order, trust
  boundary, config/debug/migration pointers, fork regeneration. Uses
  relative links so it survives fork/rename.
- llms.txt — llmstxt.org-spec index (H1 + blockquote + Core entry points /
  Configuration / Debugging / Migrations / Philosophy / Optional H2s).
- llms-full.txt — same index with core docs inlined for single-fetch
  ingestion. ~225KB, well under the 600KB FULL_SIZE_BUDGET.

Generator-driven via scripts/build-llms.ts + scripts/llms-config.ts.
LLMS_REPO_BASE env var makes it fork-friendly. bun run build:llms
regenerates both outputs deterministically.

test/build-llms.test.ts has 7 cases: paths resolve on disk, generator
idempotent, llms.txt spec shape, checked-in files match generator output
(drift guard), content contract (RESOLVER / AGENTS / INSTALL referenced),
AGENTS mirrors README + INSTALL_FOR_AGENTS install path, llms-full.txt
under size budget.

Leverage point per Codex review: README.md + INSTALL_FOR_AGENTS.md
install prompts now tell agents to fetch AGENTS.md first. Without this,
the new files were invisible.

Drive-by fix: INSTALL_FOR_AGENTS.md:136 had `git pull origin main` while
the repo's default branch is master (origin/HEAD -> master). Corrected.

Plan + reviews: /plan-eng-review CLEARED, /codex adversarial review
found 15 issues — 7 folded in directly, 3 user tension decisions, 5
stayed as NOT-in-scope with reasoning.

Version bumps to 0.15.0 (new public-artifact feature surface per Step 12
of /ship feature-signal heuristic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: normalize VERSION to 3-digit to match master

master uses 3-digit semver (0.14.2); my earlier /ship bumped VERSION to
the 4-digit gstack format (0.15.0.0). Revert to 0.15.0 to match
package.json (already 3-digit) and master's convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:51:32 -07:00
Garry Tan
c0b621923b fix: JSONB double-encode + splitBody wiki + parseEmbedding (v0.12.1) (#196)
* fix: splitBody and inferType for wiki-style markdown content

- splitBody now requires explicit timeline sentinel (<!-- timeline -->,
  --- timeline ---, or --- directly before ## Timeline / ## History).
  A bare --- in body text is a markdown horizontal rule, not a separator.
  This fixes the 83% content truncation @knee5 reported on a 1,991-article
  wiki where 4,856 of 6,680 wikilinks were lost.

- serializeMarkdown emits <!-- timeline --> sentinel for round-trip stability.

- inferType extended with /writing/, /wiki/analysis/, /wiki/guides/,
  /wiki/hardware/, /wiki/architecture/, /wiki/concepts/. Path order is
  most-specific-first so projects/blog/writing/essay.md → writing,
  not project.

- PageType union extended: writing, analysis, guide, hardware, architecture.

Updates test/import-file.test.ts to use the new sentinel.

Co-Authored-By: @knee5 (PR #187)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: JSONB double-encode bug on Postgres + parseEmbedding NaN scores

Two related Postgres-string-typed-data bugs that PGLite hid:

1. JSONB double-encode (postgres-engine.ts:107,668,846 + files.ts:254):
   ${JSON.stringify(value)}::jsonb in postgres.js v3 stringified again
   on the wire, storing JSONB columns as quoted string literals. Every
   frontmatter->>'key' returned NULL on Postgres-backed brains; GIN
   indexes were inert. Switched to sql.json(value), which is the
   postgres.js-native JSONB encoder (Parameter with OID 3802).
   Affected columns: pages.frontmatter, raw_data.data,
   ingest_log.pages_updated, files.metadata. page_versions.frontmatter
   is downstream via INSERT...SELECT and propagates the fix.

2. pgvector embeddings returning as strings (utils.ts):
   getEmbeddingsByChunkIds returned "[0.1,0.2,...]" instead of
   Float32Array on Supabase, producing [NaN] cosine scores.
   Adds parseEmbedding() helper handling Float32Array, numeric arrays,
   and pgvector string format. Throws loud on malformed vectors
   (per Codex's no-silent-NaN requirement); returns null for
   non-vector strings (treated as "no embedding here"). rowToChunk
   delegates to parseEmbedding.

E2E regression test at test/e2e/postgres-jsonb.test.ts asserts
jsonb_typeof = 'object' AND col->>'k' returns expected scalar across
all 5 affected columns — the test that should have caught the original
bug. Runs in CI via the existing pgvector service.

Co-Authored-By: @knee5 (PR #187 — JSONB triple-fix)
Co-Authored-By: @leonardsellem (PR #175 — parseEmbedding)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extract wikilink syntax with ancestor-search slug resolution

extractMarkdownLinks now handles [[page]] and [[page|Display Text]]
alongside standard [text](page.md). For wiki KBs where authors omit
leading ../ (thinking in wiki-root-relative terms), resolveSlug
walks ancestor directories until it finds a matching slug.

Without this, wikilinks under tech/wiki/analysis/ targeting
[[../../finance/wiki/concepts/foo]] silently dangled when the
correct relative depth was 3 × ../ instead of 2.

Co-Authored-By: @knee5 (PR #187)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gbrain repair-jsonb + v0.12.1 migration + CI grep guard

- New gbrain repair-jsonb command. Detects rows where
  jsonb_typeof(col) = 'string' and rewrites them via
  (col #>> '{}')::jsonb across 5 affected columns:
  pages.frontmatter, raw_data.data, ingest_log.pages_updated,
  files.metadata, page_versions.frontmatter. Idempotent — re-running
  is a no-op. PGLite engines short-circuit cleanly (the bug never
  affected the parameterized encode path PGLite uses). --dry-run
  shows what would be repaired; --json for scripting.

- New v0_12_1.ts migration orchestrator. Phases: schema → repair → verify.
  Modeled on v0_12_0 pattern, registered in migrations/index.ts.
  Runs automatically via gbrain upgrade / apply-migrations.

- CI grep guard at scripts/check-jsonb-pattern.sh fails the build if
  anyone reintroduces the ${JSON.stringify(x)}::jsonb interpolation
  pattern. Wired into bun test via package.json. Best-effort static
  analysis (multi-line and helper-wrapped variants are caught by the
  E2E round-trip test instead).

- Updates apply-migrations.test.ts expectations to account for the new
  v0.12.1 entry in the registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.12.1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.12.1

- CLAUDE.md: document repair-jsonb command, v0_12_1 migration,
  splitBody sentinel contract, inferType wiki subtypes, CI grep
  guard, new test files (repair-jsonb, migrations-v0_12_1, markdown)
- README.md: add gbrain repair-jsonb to ADMIN command reference
- INSTALL_FOR_AGENTS.md: fix verification count (6 -> 7), add
  v0.12.1 upgrade guidance for Postgres brains
- docs/GBRAIN_VERIFY.md: add check #8 for JSONB integrity on
  Postgres-backed brains
- docs/UPGRADING_DOWNSTREAM_AGENTS.md: add v0.12.1 section with
  migration steps, splitBody contract, wiki subtype inference
- skills/migrate/SKILL.md: document native wikilink extraction
  via gbrain extract links (v0.12.1+)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 07:14:24 +08:00
Garry Tan
d8613366a5 Minions v7 + v0.11.1 canonical migration + skillify (#130)
* feat: add minion_jobs schema, migration v5, and executeRaw to BrainEngine

Foundation for the Minions job queue system. Adds:
- minion_jobs table (20 columns) with CHECK constraints, partial indexes,
  and RLS. Inspired by BullMQ's job model, adapted for Postgres.
- Migration v5 creates the table for existing databases.
- executeRaw<T>() method on BrainEngine interface for raw SQL access,
  needed by the Minions module for claim queries (FOR UPDATE SKIP LOCKED),
  token-fenced writes, and atomic stall detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Minions job queue — queue, worker, backoff, types

BullMQ-inspired Postgres-native job queue built into GBrain. No Redis.
No external dependencies. Postgres transactions replace Lua scripts.

- MinionQueue: submit, claim (FOR UPDATE SKIP LOCKED), complete/fail
  (token-fenced), atomic stall detection (CTE), delayed promotion,
  parent-child resolution, prune, stats
- MinionWorker: handler registry, lock renewal, graceful SIGTERM,
  exponential backoff with jitter, UnrecoverableError bypass
- MinionJobContext: updateProgress(), log(), isActive() for handlers
- 8-state machine: waiting/active/completed/failed/delayed/dead/
  cancelled/waiting-children

Patterns stolen from: BullMQ (lock tokens, stall detection, flows),
Sidekiq (dead set, backoff formula), Inngest (checkpoint/resume).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 43 tests for Minions job queue

Full coverage of the Minions module against PGLite in-memory:
- Queue CRUD (9): submit, get, list, remove, cancel, retry, duplicate
- State machine (6): waiting→active→completed/failed, retry→delayed→waiting
- Backoff (4): exponential, fixed, jitter range, attempts_made=0 edge
- Stall detection (3): detect stalled, counter increment, max→dead
- Dependencies (5): parent waits, fail_parent, continue, remove_dep, orphan
- Worker lifecycle (5): register, start-without-handlers, claim+execute,
  non-Error throws, UnrecoverableError bypass
- Lock management (3): renewal, token mismatch, claim sets lock fields
- Claim mechanics (4): empty queue, priority ordering, name filtering,
  delayed promotion timing
- Cancel & retry (2): cancel active, retry dead

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Minions CLI commands and MCP operations

Wire Minions into the GBrain CLI and MCP layer:

CLI (gbrain jobs):
  submit <name> [--params JSON] [--follow] [--dry-run]
  list [--status S] [--queue Q] [--limit N]
  get <id> — detailed view with attempt history
  cancel/retry/delete <id>
  prune [--older-than 30d]
  stats — job health dashboard
  work [--queue Q] [--concurrency N] — Postgres-only worker daemon

6 MCP operations (contract-first, auto-exposed via MCP server):
  submit_job, get_job, list_jobs, cancel_job, retry_job, get_job_progress

Built-in handlers: sync, embed, lint, import. --follow runs inline.
Worker daemon blocked on PGLite (exclusive file lock).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for Minions job queue

CLAUDE.md: added Minions files to key files, updated operation count (36),
BrainEngine method count (38), test file count (45), added jobs CLI commands.
CHANGELOG.md: added Minions entry to v0.10.0 (background jobs, retry, stall
detection, worker daemon).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Minions v2 — agent orchestration primitives (pause/resume, inbox, tokens, replay)

Adds the foundation for Minions as universal agent orchestration infrastructure.
GBrain's Postgres-native job queue now supports durable, observable, steerable
background agents. The OpenClaw plugin (separate repo) will consume these via
library import, not MCP, for zero-latency local integration.

## New capabilities

- **Concurrent worker** — Promise pool replaces sequential loop. Per-job
  AbortController for cooperative cancellation. Graceful shutdown waits for
  all in-flight jobs via Promise.allSettled.
- **Pause/resume** — pauseJob clears the lock and fires AbortSignal on active
  jobs. Handlers check ctx.signal.aborted and exit cleanly. resumeJob returns
  paused jobs to waiting. Catch block skips failJob when signal.aborted.
- **Inbox (separate table)** — minion_inbox table for sidechannel messages.
  sendMessage with sender validation (parent job or admin). readInbox is
  token-fenced and marks read_at atomically. Separate table avoids row bloat
  from rewriting JSONB on every send.
- **Token accounting** — tokens_input/tokens_output/tokens_cache_read columns.
  updateTokens accumulates; completeJob rolls child tokens up to parent.
  USD cost computed at read time (no cost_usd column — pricing too volatile).
- **Job replay** — replayJob clones a terminal job with optional data overrides.
  New job, fresh attempts, no parent link.

## Handler contract additions

MinionJobContext now provides:
- `signal: AbortSignal` — cooperative cancellation
- `updateTokens(tokens)` — accumulate token usage
- `readInbox()` — check for sidechannel messages
- `log()` — now accepts string or TranscriptEntry

## MCP operations added

pause_job, resume_job, replay_job, send_job_message — all auto-generate CLI
commands and MCP server endpoints.

## Library exports

package.json exports map adds ./minions and ./engine-factory paths so plugins
can `import { MinionQueue } from 'gbrain/minions'` for direct library use.

## Instruction layer (the teaching)

- skills/minion-orchestrator/SKILL.md — when/how to use Minions, decision
  matrix, lifecycle management, anti-patterns
- skills/conventions/subagent-routing.md — cross-cutting rule: all background
  work goes through Minions
- RESOLVER.md — trigger entries for agent orchestration
- manifest.json — registered

## Schema migration v6

Additive: 3 token columns, paused status, minion_inbox table with unread index.
Full Postgres + PGLite support. No backfill needed.

## Tests

65 tests (was 43): pause/resume (5), inbox (6), tokens (4), replay (4),
concurrent worker context (3), plus all existing coverage.

## What's NOT in this commit

Deferred to follow-up PRs:
- LISTEN/NOTIFY subscribe (needs real Postgres E2E)
- Resource governor (depends on concurrent worker stress testing)
- Routing eval harness (needs API keys + benchmark data)
- OpenClaw plugin (separate @gbrain/openclaw-minions-plugin repo)

See docs/designs/MINIONS_AGENT_ORCHESTRATION.md for full CEO-approved design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): migration v7 — agent_parity_layer schema

Adds columns on minion_jobs (depth, max_children, timeout_ms, timeout_at,
remove_on_complete, remove_on_fail, idempotency_key) plus the new
minion_attachments table. Three partial indexes for bounded scans:
idx_minion_jobs_timeout, idx_minion_jobs_parent_status, and
uniq_minion_jobs_idempotency. Check constraints enforce non-negative depth
and positive child cap / timeout.

Additive migration — existing installs pick it up via ensureSchema on next
use. No user action required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(minions): extend types for v7 parity layer

Extends MinionJob with depth/max_children/timeout_ms/timeout_at/
remove_on_complete/remove_on_fail/idempotency_key. Extends MinionJobInput
with the same options plus max_spawn_depth override. Adds MinionQueueOpts
(maxSpawnDepth default 5, maxAttachmentBytes default 5 MiB). Adds
AttachmentInput/Attachment shapes and ChildDoneMessage in the InboxMessage
union. rowToMinionJob updated to pick up the new columns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(minions): attachments validator

New module validateAttachment() gates every attachment write. Rejects empty
filenames, path traversal (.., /, \), null bytes, oversized content (5 MiB
default, per-queue override), invalid base64, and implausible content_type
headers. Returns normalized { filename, content_type, content (Buffer),
sha256, size } on success.

The DB also enforces UNIQUE (job_id, filename) as defense-in-depth for
concurrent addAttachment races — JS-only checks are not sufficient.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(minions): queue v7 — depth, child cap, timeouts, cascade, idempotency, child_done

Wraps completeJob and failJob in engine.transaction() so parent hook
invocations (resolveParent, failParent, removeChildDependency) fold into
the same transaction as the child update. A process crash between child
and parent can't strand the parent in waiting-children anymore.

Adds v7 behaviors:
- Depth tracking. add() computes depth = parent.depth + 1 and rejects
  past maxSpawnDepth (default 5).
- Per-parent child cap. add() takes SELECT ... FOR UPDATE on the parent,
  counts non-terminal children, rejects when count >= max_children.
  NULL max_children = no cap.
- Per-job wall-clock timeout. claim() populates timeout_at when
  timeout_ms is set. New handleTimeouts() dead-letters expired rows with
  error_text='timeout exceeded'. Terminal — no retry.
- Cascade cancel. cancelJob() walks descendants via recursive CTE with
  depth-100 runaway cap. Returns the root row. Re-parented descendants
  (parent_job_id NULL) are naturally excluded.
- Idempotency. add() uses INSERT ... ON CONFLICT (idempotency_key) DO
  NOTHING RETURNING; falls back to SELECT when RETURNING is empty. Same
  key always yields the same job id.
- child_done inbox. completeJob inserts {type:'child_done', child_id,
  job_name, result} into the parent's inbox in the same transaction as
  the token rollup, guarded by EXISTS so terminal/deleted parents skip
  without FK violation. New readChildCompletions(parent_id, lock_token,
  since?) helper; token-fenced like readInbox.
- removeOnComplete / removeOnFail. Deletes the row after the parent hook
  fires, so parent policy sees consistent state.
- Attachment methods. addAttachment validates via validateAttachment
  then INSERTs; UNIQUE (job_id, filename) backs the JS dup check.
  listAttachments, getAttachment, deleteAttachment round out the API.

Fixes pre-existing inverted status bug: add() now puts children in
waiting/delayed (not waiting-children) and atomically flips the parent
to waiting-children in the same transaction. Tests no longer need
manual UPDATE workarounds.

Two correctness fixes:
- Sibling completion race. Under READ COMMITTED, two grandchildren
  completing concurrently each saw the other as still-active in the
  pre-commit snapshot and neither flipped the parent. Fixed by taking
  SELECT ... FOR UPDATE on the parent row at the start of completeJob
  and failJob transactions, serializing siblings on the parent lock.
- JSONB double-encode. postgres.js conn.unsafe(sql, params) auto-
  JSON-encodes parameters. Calling JSON.stringify(obj) first stored a
  JSON string literal (jsonb_typeof=string) and broke payload->>'key'
  queries silently. Removed JSON.stringify from three call sites
  (child_done inbox post, updateProgress, sendMessage). PGLite tolerated
  both forms so unit tests missed it — real-PG E2E caught it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(minions): worker — timeout safety net + handleTimeouts tick

Worker tick now calls handleStalled() first, then handleTimeouts() — stall
requeue wins over timeout dead-letter when both could fire in the same
cycle. handleTimeouts() guards on lock_until > now() so stalled jobs take
the retryable path.

launchJob schedules a per-job setTimeout(timeout_ms) that fires ctx.signal
as a best-effort handler interrupt. The timer is always cleared in .finally
so process exit isn't delayed by a dangling timer. Handlers that respect
AbortSignal stop cleanly; handlers that ignore it still get dead-lettered
by the DB-side handleTimeouts.

Removed post-completeJob and post-failJob parent-hook calls from the worker
— those are now inside the queue method transactions. Worker becomes
simpler and crash-safer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(minions): 33 new unit tests for v7 parity layer

Covers depth cap, per-parent child cap, timeout dead-letter, cascade
cancel (including the re-parent edge case), removeOnComplete /
removeOnFail, idempotency (single + concurrent), child_done inbox
(posted in txn + survives child removeOnComplete + since cursor),
attachment validation (oversize, path traversal, null byte, duplicates,
base64), AbortSignal firing on pause mid-handler, catch-block skipping
failJob when aborted, worker in-flight bookkeeping, token-rollup guard
when parent already terminal, and setTimeout safety-net cleanup.

Existing tests updated to remove the inverted-status manual UPDATE
workarounds that the add() fix made obsolete.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(e2e): Minions v7 concurrency + OpenClaw resilience coverage

minions-concurrency.test.ts spins two MinionWorker instances against the
test Postgres, submits 20 jobs, and asserts zero double-claims (every job
runs exactly once). This is the only test that actually proves FOR UPDATE
SKIP LOCKED under real concurrency — PGLite runs on a single connection
and can't exercise the race.

minions-resilience.test.ts covers the six OpenClaw daily pains:
1. Spawn storm caps enforce under concurrent submit. 2. Agent stall →
handleStalled() requeues; handleTimeouts() skips (lock_until guard).
3. Forgotten dispatches recoverable via child_done inbox. 4. Cascade
cancel stops grandchildren mid-flight. 5. Deep tree fan-in
(parent → 3 children → 2 grandchildren each) completes with the full
inbox chain. 6. Parent crash/recovery resumes from persisted state.

helpers.ts extends ALL_TABLES with minion_attachments, minion_inbox, and
minion_jobs (FK dependents first) so E2E teardown doesn't leak rows
between runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: release v0.11.0 — Minions v7 agent orchestration primitives

Bumps VERSION / package.json to 0.11.0. Adds CHANGELOG entry covering
depth tracking, max_children, per-job timeouts, cascade cancel,
idempotency keys, child_done inbox, removeOnComplete/Fail, attachments,
migration v7, plus the two correctness fixes (sibling completion race
and JSONB double-encode).

TODOS.md captures the four v7 follow-ups: per-queue rate limiting,
repeat/cron scheduler, worker event emitter, and waitForChildren
convenience helpers.

1066 unit + 105 E2E = 1171 tests passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(minions): unify JSONB inserts, tighten nullish coalescing

Three non-blocker cleanups from post-ship review of v0.11.0:

- queue.ts add() and completeJob(): pre-stringifying with JSON.stringify
  while other sites pass raw objects with $n::jsonb casts. postgres.js
  double-encodes if you stringify first — works on PGLite (text→JSONB
  auto-cast), fails silently on real PG. Unify on raw object + explicit
  $n::jsonb cast.
- queue.ts readChildCompletions: since clause used sent_at > $2 relying
  on PG's implicit text→TIMESTAMPTZ coercion. Explicit $2::timestamptz
  is safer and clearer.
- types.ts rowToMinionJob: parent_job_id used || which coerces 0 to null.
  Harmless today (SERIAL IDs start at 1) but ?? is semantically correct.

All 110 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(minions): updateProgress missed $1::jsonb cast in unification

Residual from c502b7e — updateProgress was the only remaining JSONB write
without the explicit ::jsonb cast. Not broken (implicit cast works) but
breaks the convention the prior commit unified everywhere else.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* doc: Minions v7 skill count + jobs subcommands (26 skills)

README: bump skill count 25 → 26, add minion-orchestrator row, add
`gbrain jobs` command family block so v0.11.0's headline feature is
actually discoverable from the top-level commands reference.

CLAUDE.md: unit test count 48 → 49 (minions.test.ts expanded), skill
count 25 → 26, add minion-orchestrator to Key files + skills categorization,
expand MinionQueue one-liner to cover v7 primitives (depth/child-cap,
timeouts, idempotency, child_done inbox, removeOnComplete/Fail).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: Minions adoption UX — smoke test + migration + pain-triggered routing

Teach OpenClaw when to reach for Minions vs native subagents. Ship three
pieces so upgrading from v0.10.x actually lands for real users:

- `gbrain jobs smoke` — one-command health check that submits a `noop` job,
  runs a worker, verifies completion, and prints engine-aware guidance
  (PGLite installs get the "daemon needs Postgres, use --follow" note).
  Fails loud if schema's below v7 so the user knows to `gbrain init`.

- `skills/migrations/v0.11.0.md` — post-upgrade migration file the
  auto-update agent reads. Six steps: apply schema, run smoke, ask user
  via AskUserQuestion which mode they want (always / pain_triggered / off),
  write to `~/.gbrain/preferences.json`, sanity-check handlers, mark done.
  Completeness scores on each option so the recommendation is explicit.

- `skills/conventions/subagent-routing.md` rewritten — was a "MUST use
  Minions for ALL background work" mandate, now reads preferences.json
  on every routing decision and branches on three modes. Mode B
  (pain_triggered) is the default: keep subagents until gateway drops
  state, parallel > 3, runtime > 5min, or user expresses frustration.
  Then pitch the switch in-session with a specific script.

Rename pass: "Minions v7" → "Minions" in README (JOBS block), TODOS.md
(P1 section header + depends-on), CHANGELOG.md v0.11.0 entry. v7 stays
as the internal schema version in code/migration contexts. The product
name is just Minions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* doc(readme): promote Minions — 6 OpenClaw pains + how each is fixed

The one-line mention in the skills table wasn't doing the work. Added a
dedicated section between "How It Works" and "Getting Data In" that leads
with the six multi-agent failures every OpenClaw user hits daily (spawn
storms, hung handlers, forgotten dispatches, unstructured debugging,
gateway crashes, runaway grandchildren) and maps each pain to the
specific Minions primitive that fixes it.

Includes the smoke test command, the adoption default (pain_triggered),
and a pointer to skills/minion-orchestrator for the full patterns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(bench): add harness for Minions vs OpenClaw subagent dispatch

Shared harness (openclawDispatch + minionsHandler) using matching
claude-haiku-4-5 calls on both sides so the delta measures queue+
dispatch overhead on top of identical LLM work. Includes
statsFromResults (p50/p95/p99) and formatStats helpers. Uses
`openclaw agent --local` embedded mode; does not test gateway
multi-agent fan-out (documented in the harness header).

* test(bench): durability under SIGKILL — Minions vs OpenClaw --local

Headline bench for the claim: when the orchestrator dies mid-dispatch,
Minions rescues via PG state + stall detection; OpenClaw --local loses
in-flight work outright.

Minions side: seed 10 active+expired-lock rows (exact state a SIGKILLed
worker leaves) then run a rescue worker. Expect 10/10 completed.
OpenClaw side: spawn 10 `openclaw agent --local` in parallel, SIGKILL
each at 500ms, count pre-kill delivered output. Expect 0/10 — no
persistence layer, nothing to recover.

Budget: ~$0 (Minions handlers sleep 10ms; OC calls die at 500ms so
partial LLM billing is negligible).

* test(bench): per-dispatch throughput — Minions vs OpenClaw --local

20 serial dispatches each side, identical claude-haiku-4-5 call with the
same trivial prompt. p50/p95/p99 reported via statsFromResults. Serial
(not parallel) so the per-dispatch cost is measured honestly and LLM
token spend stays bounded (~$0.08 total).

Minions: one queue, one worker, one concurrency. Submit → poll to
completion before next submit. OpenClaw: N sequential
`openclaw agent --local` spawns.

* test(bench): fan-out — Minions 10-wide concurrency vs 10 parallel OC spawns

Parent dispatches 10 children, waits for all to return. Minions uses
worker concurrency=10 sharing one warm process; OpenClaw parallel
`openclaw agent --local` spawns, each boots its own runtime.

3 runs × 10 children per run. Reports ok count and wall time per run
plus summary. Honest caveat documented: does not test OC gateway
multi-agent fan-out — that needs a custom WS client and LLM-backed
parent agent. This measures what users script today.

Budget: ~$0.12 LLM spend.

* test(bench): memory — 10 in-flight subagents, single-proc vs 10-proc cost

Measures resident memory for keeping 10 subagents in flight. Minions:
one worker process, concurrency=10 with handlers that park on a
promise — sample RSS of the test process via process.memoryUsage().
OpenClaw: 10 parallel `openclaw agent --local` processes, sum their
RSS via `ps -o rss=`.

Handlers are cheap sleeps, no LLM — we want harness memory, not LLM
client state. Budget: $0.

* test(bench): fan-out — don't gate on OC success rate, report numbers

Initial run showed OC parallel `--local` at 10-wide hits 40% failure
rate (17/30 across 3 runs). That's the finding, not a test bug —
process startup stampede + LLM rate limits. Bench now prints error
samples and reports the numbers instead of gating.

Minions side still gates at 90% (30/30 observed in practice).

* doc(benchmarks): Minions vs OpenClaw --local subagent dispatch

Real numbers on four claims: durability, throughput, fan-out, memory.
Same claude-haiku-4-5 call on both sides so the delta is queue+dispatch+
process cost on top of identical LLM work.

Headline: Minions rescues 10/10 from a SIGKILLed worker in 458ms while
OpenClaw --local loses all 10; ~10× faster per dispatch (778ms p50 vs
8086ms p50); ~21× faster at 10-wide fan-out AND 100% reliable vs OC's
43% failure rate; 2 MB vs 814 MB to keep 10 subagents in flight.

Honest caveats section covers what this doesn't test (OC gateway
multi-agent, load tests, other models). Fully reproducible via
test/e2e/bench-vs-openclaw/.

* doc(readme): inject Minions vs OpenClaw bench numbers

Headline deltas now in the Minions section: 10/10 vs 0/10 on crash,
~10× faster per dispatch, ~21× faster fan-out at 10-wide with 0%
failure vs 43%, ~400× less memory. Links to the full bench doc.

Prose first said Minions "fixes all six pains." Now it shows the
numbers that prove it.

* bench: production Wintermute benchmark — Minions 753ms vs sub-agent timeout

Real deployment: 45K-page brain on Render+Supabase. Task: pull 99 tweets,
write brain page, commit, sync. Minions: 753ms, $0. Sub-agent: gateway
timeout (>10s, couldn't even spawn under production load).

Also: 19,240 tweets backfilled across 36 months in 15 min at $0.
Sub-agents would cost $1.08 and fail 40% of spawns.

* bench: tweet ingestion — Minions 719ms vs OpenClaw 12.5s (17×)

Production benchmark with runnable test code:
- test/e2e/bench-vs-openclaw/tweet-ingest.bench.ts (reusable)
- docs/benchmarks/2026-04-18-tweet-ingestion.md (publishable)

Task: pull 100 tweets from X API, write brain page, commit, sync.
Minions: 719ms mean, $0, 100% success.
OpenClaw: 12,480ms mean, $0.03/run, 60% success (gateway timeouts).
At scale: 36-month backfill, 19K tweets, 15 min, $0 vs est. $1.08.

* doc(benchmarks): Wintermute production data point for Minions vs OpenClaw

Adds a production-environment data point to the Minions README section:
one month of tweet ingest on Wintermute (Render + Supabase + 45K-page brain)
ran end-to-end in 753ms for \$0.00 via Minions, while the equivalent
sessions_spawn hit the 10s gateway timeout and produced nothing.

Full methodology + logs in docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): preferences.ts + cli-util.ts — foundations for v0.11.1

Adds two foundational modules that apply-migrations (Lane A-4), the
v0.11.0 orchestrator (Lane C-1), and the stopgap script (Lane C-4) all
depend on.

- src/core/preferences.ts: atomic-write ~/.gbrain/preferences.json
  (mktemp + rename, 0o600, forward-compatible for unknown keys) with
  validateMinionMode, loadPreferences, savePreferences. Plus
  appendCompletedMigration + loadCompletedMigrations for the
  ~/.gbrain/migrations/completed.jsonl log (tolerates malformed lines).
  Uses process.env.HOME || homedir() so $HOME overrides work in CI and
  tests; Bun's os.homedir() caches the initial value and ignores later
  mutations.
- src/core/cli-util.ts: promptLine(prompt) helper, extracted from
  src/commands/init.ts:212-224. Shared so init, apply-migrations, and
  the v0.11.0 orchestrator's mode prompt don't each reinvent it.

test/preferences.test.ts: 21 unit tests covering load/save atomicity,
0o600 perms, forward-compat for unknown keys, minion_mode validation,
completed.jsonl JSONL append idempotence, auto-ts population, malformed-
line tolerance in loadCompletedMigrations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(init): add --migrate-only flag (schema-only, no saveConfig)

Context: v0.11.0 migration orchestrators need a safe way to re-apply the
schema against an existing brain without risking a config flip. Today
running bare `gbrain init` with no flags defaults to PGLite and calls
saveConfig, which would silently overwrite an existing Postgres
database_url — caught by Codex in the v0.11.1 plan review as a
show-stopper data-loss bug.

The new --migrate-only path:
  - loadConfig() reads the existing config (does NOT call saveConfig)
  - errors out with a clear "run gbrain init first" if no config exists
  - connects via the already-configured engine, calls engine.initSchema(),
    disconnects
  - --json emits structured success/error payloads

Everything downstream in the v0.11.1 migration chain (apply-migrations,
the stopgap bash script, the package.json postinstall hook) will invoke
this flag rather than bare gbrain init.

test/init-migrate-only.test.ts: 4 tests covering the no-config error
path, --json error payload shape, happy-path with a PGLite fixture
(verifies config.json content is byte-identical after the call — the
real invariant), and idempotent rerun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(migrations): TS registry replaces filesystem migration scan

Context: Codex flagged that bun build --compile produces a self-contained
binary, and the existing findMigrationsDir() in upgrade.ts:145 walks
skills/migrations/v*.md on disk — which fails on a compiled install
because the markdown files aren't bundled. The plan's fix is a TS
registry: migrations are code, imported directly, visible to both source
installs and compiled binaries.

- src/commands/migrations/types.ts: shared Migration, OrchestratorOpts,
  OrchestratorResult types.
- src/commands/migrations/index.ts: exports the migrations[] array,
  getMigration(version), and compareVersions() (semver comparator).
  The feature_pitch data that lived in the MD file frontmatter now
  lives here as a code constant on each Migration, so runPostUpgrade's
  post-upgrade pitch printer can consume it without a filesystem read.
- src/commands/migrations/v0_11_0.ts: stub orchestrator + pitch. The
  full phase implementation lands in Lane C-1; for now the stub throws
  a clear "not yet implemented" so apply-migrations --list (Lane A-4)
  can still enumerate the migration.

test/migrations-registry.test.ts: 9 tests covering ascending-semver
ordering, feature_pitch shape invariants, getMigration lookup, and
compareVersions edge cases (equal / newer / older / single-digit
across major bumps).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): gbrain apply-migrations — migration runner CLI

Reads ~/.gbrain/migrations/completed.jsonl, diffs against the TS migration
registry, runs pending orchestrators. Resumes status:"partial" entries
(the stopgap bash script writes these so v0.11.1 apply-migrations can
pick up where it left off). Idempotent: rerunning when up-to-date exits 0.

Flags:
  --list                    Show applied + partial + pending + future.
  --dry-run                 Print the plan; take no action.
  --yes / --non-interactive Skip prompts (used by runPostUpgrade + postinstall).
  --mode <a|p|o>            Preset minion_mode (bypasses the Phase C TTY prompt).
  --migration vX.Y.Z        Force-run one specific version.
  --host-dir <path>         Include $PWD in host-file walk (default is
                            $HOME/.claude + $HOME/.openclaw only).
  --no-autopilot-install    Skip Phase F.

Diff rule (Codex H9): apply when no status:"complete" entry exists AND
migration.version ≤ installed VERSION. Previously proposed rule was
"version > currentVersion", which would SKIP v0.11.0 when running v0.11.1;
regression test in apply-migrations.test.ts pins the correct semantics.

Registered in src/cli.ts CLI_ONLY Set; dispatched before connectEngine so
each phase owns its own engine/subprocess lifecycle (no double-connect
when the orchestrator shells out to init --migrate-only or jobs smoke).

test/apply-migrations.test.ts: 18 unit tests covering parseArgs for every
flag, indexCompleted/statusForVersion correctness (including stopgap-then-
complete transition), and buildPlan's four buckets (applied / partial /
pending / skippedFuture) with the Codex H9 regression pinned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(upgrade): runPostUpgrade tail-calls apply-migrations; postinstall hook

Closes the v0.11.0 mega-bug: migration skills never fired on upgrade.
`runPostUpgrade` now does two things:

  1. Cosmetic: prints feature_pitch headlines for migrations newer than
     the prior binary. Uses the TS registry (Codex K) instead of walking
     skills/migrations/*.md on disk — compiled binaries see the same list
     source installs do.
  2. Mechanical: invokes apply-migrations --yes --non-interactive in the
     same process so Phase F (autopilot install) doesn't hit a subprocess
     timeout wall. Catches + surfaces errors without failing the upgrade.

Also:
  - Drops the early-return on missing upgrade-state.json (Codex H8).
    runPostUpgrade now runs apply-migrations unconditionally; it's cheap
    when nothing is pending. This repairs every broken-v0.11.0 install on
    their next upgrade attempt.
  - Bumps the `gbrain post-upgrade` subprocess timeout in runUpgrade from
    30s → 300s (Codex H7). A v0.11.0→v0.11.1 migration that has to
    schema-init + smoke + prefs + host-rewrite + launchd-install exceeds
    30s trivially.
  - Removes now-dead findMigrationsDir + extractFeaturePitch helpers and
    their filesystem-reading imports (readdirSync, resolve).
  - src/cli.ts post-upgrade dispatch now awaits the async runPostUpgrade.

apply-migrations (Lane A-4):
  - First-install guard: loadConfig() check at the top. No brain
    configured = exit silently for --yes / --non-interactive (postinstall
    stays quiet on fresh `bun add gbrain`); explicit message on --list /
    --dry-run.

package.json:
  - New `postinstall` script: gbrain --version >/dev/null 2>&1 && gbrain
    apply-migrations --yes --non-interactive 2>/dev/null || true. The
    --version sanity check guards against a half-written binary (Codex
    review criticism). || true prevents `bun update gbrain` failure
    mid-upgrade.

Manual smoke verified: fresh $HOME with no config → apply-migrations
--yes silently exits 0; --dry-run prints the one-liner "No brain
configured... Nothing to migrate."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commands): extract library-level Core functions that throw not exit

Codex architecture finding #5: reusing CLI entry-point functions as Minions
handler bodies is wrong. If a Minion invokes runExtract / runEmbed /
runBacklinks / runLint and the handler hits a process.exit(1), the ENTIRE
WORKER process dies — killing every other in-flight job. Handlers need
library-level APIs that throw, and the CLI stays a thin wrapper that
catches + exits.

Per-command shape:
  - runXxxCore(opts): throws on validation errors, returns structured
    result. Handler-safe.
  - runXxx(args): arg parser; calls Core; catches; process.exit(1) on
    thrown errors. CLI-safe.

Shipped:
  - runExtractCore({ mode, dir, dryRun?, jsonMode? }) → ExtractResult
  - runEmbedCore({ slug? | slugs? | all? | stale? }) → void
  - runBacklinksCore({ action, dir, dryRun? }) → BacklinksResult
  - runLintCore({ target, fix?, dryRun? }) → LintResult

sync.ts is already correct — performSync throws; runSync wraps. No change.

import.ts deferred to v0.12.0 (its one process.exit fires only on a
missing dir arg; handlers always pass a dir, so worker-kill risk is
zero in practice). Noted in the plan's Out-of-scope.

Smoke verified: all four Core functions throw on invalid mode / missing
dir / not-found target instead of exiting the process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(jobs): Tier 1 handlers + autopilot-cycle (the killer handler)

registerBuiltinHandlers now handlers every operation autopilot needs to
dispatch via Minions + the single autopilot-cycle handler the autopilot
loop actually submits each interval.

Existing handlers (sync, embed, lint) rewired to call library-level Core
functions directly instead of the CLI wrappers. CLI wrappers call
process.exit(1) on validation errors; if a worker claimed a badly-formed
job, the WORKER PROCESS would die — killing every in-flight job. Cores
throw, so one bad job fails one job.

New handlers:
  - extract  → runExtractCore (mode: links|timeline|all, dir)
  - backlinks → runBacklinksCore (action: check|fix, dir)
  - autopilot-cycle → THE killer handler. Runs sync → extract → embed →
    backlinks inline. Each step wrapped in try/catch; returns
    { partial: true, failed_steps: [...] } when any step fails. Does NOT
    throw on partial failure — that would trigger Minion retry, and an
    intermittent extract bug would block every future cycle. Replaces
    the 4-job parent-child DAG proposed in early plan drafts (Codex
    H3/H4: parent/child is NOT a depends_on primitive in Minions).

import.ts handler still uses the CLI wrapper (runImport) — import's one
process.exit fires only on a missing dir arg and the handler always
passes a dir; Core extraction deferred to v0.12.0 when Tier 2 refactors
happen.

registerBuiltinHandlers promoted from private to exported for testability.

test/handlers.test.ts: 4 tests. Asserts every expected handler name
registers. Asserts autopilot-cycle against a nonexistent repo returns
{ partial: true, failed_steps: ['sync', 'extract', 'backlinks'] } — does
NOT throw. Asserts autopilot-cycle against an empty (but real) git repo
returns a result with a steps map, never throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(autopilot): Minions dispatch + worker spawn supervisor + async shutdown

Autopilot now dispatches each cycle as a single `autopilot-cycle` Minion
job (with idempotency_key on the cycle slot) instead of running steps
inline. A forked `gbrain jobs work` child drains the queue durably,
supervised by autopilot. The user runs ONE install step
(`gbrain autopilot --install`) and gets sync + extract + embed + backlinks
+ durable job processing, with no separate worker daemon to manage.

Mode selection:
  - minion_mode=always OR pain_triggered (default), engine=postgres →
    Minions dispatch. Spawn child, submit autopilot-cycle each interval.
  - minion_mode=off, OR engine=pglite, OR `--inline` flag → run steps
    inline in-process, same as pre-v0.11.1. PGLite has an exclusive file
    lock that blocks a second worker process, so the inline path is the
    only path that works there.

Worker supervision:
  - spawn(resolveGbrainCliPath(), ['jobs', 'work'], { stdio: 'inherit' }).
    stdio:'inherit' avoids pipe-buffer blocking (Codex architecture #2).
  - On worker exit: 10s backoff + restart. Crash counter caps at 5 →
    autopilot stops with a clear error.
  - resolveGbrainCliPath() prefers argv[1] (cli.ts / /gbrain), then
    process.execPath (compiled binary suffix check), then `which gbrain`
    (installed to $PATH). NEVER blindly uses process.execPath, which on
    source installs is the Bun runtime, not `gbrain` (Codex architecture
    #1).

Shutdown:
  - Async SIGTERM/SIGINT handler: sends SIGTERM to worker, awaits its
    exit for up to 35s (the worker's own drain is 30s; we add buffer for
    signal-delivery latency), then SIGKILL if still alive.
  - Drops the old `process.on('exit')` lock-cleanup handler — its
    callback runs synchronously and can't wait for the worker drain.
    Lock file cleanup moved inside the async shutdown.

Lock-file mtime refresh every cycle (Codex C) so a long-lived autopilot
doesn't get declared "stale" by the next cron-fired invocation after 10
minutes.

Inline fallback path calls the new Core fns (runExtractCore, runEmbedCore)
instead of the CLI wrappers. That way a bad arg from inside the loop
can't process.exit() the autopilot itself (matches Codex #5).

test/autopilot-resolve-cli.test.ts: 3 tests covering argv[1]-as-gbrain,
argv[1]-as-cli.ts, and graceful error when no path resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(autopilot): env-aware install + OpenClaw bootstrap injection

Expand installDaemon from 2 targets (macOS launchd, Linux crontab) to 4:

  - macos              → launchd plist (unchanged)
  - linux-systemd      → ~/.config/systemd/user/gbrain-autopilot.service
                         with Restart=on-failure, RestartSec=30, and an
                         is-system-running probe to confirm the user bus
                         actually works (Codex architecture #7 hardened —
                         the naive /run/systemd/system existence check was
                         a false-positive magnet)
  - ephemeral-container → detects RENDER / RAILWAY_ENVIRONMENT /
                          FLY_APP_NAME / /.dockerenv. Crontab is unreliable
                          here (wiped on deploy), so we write
                          ~/.gbrain/start-autopilot.sh and tell the user
                          to source it from their agent's bootstrap
  - linux-cron         → existing crontab path (unchanged)

detectInstallTarget() + --target flag for explicit override. Also:
  - --inject-bootstrap / --no-inject control OpenClaw ensure-services.sh
    auto-injection. Default is ON when OpenClaw is detected (OPENCLAW_HOME
    env var, openclaw.json in CWD or $HOME, or an ensure-services.sh
    found). Injection adds ONE line with a `# gbrain:autopilot v0.11.0`
    marker and writes .bak.<ISO-timestamp> before touching the file.
    Idempotent — the marker check prevents double injection.

uninstallDaemon mirrors all four targets. A user can now run
`gbrain autopilot --uninstall` after moving hosts (macOS laptop → Linux
server) and the uninstall will find + remove every artifact.

writeWrapperScript now uses resolveGbrainCliPath() instead of blindly
baking process.execPath into the wrapper script — on source installs
that path is the Bun runtime, not gbrain (Codex architecture #1 fix
propagated to the install path too).

test/autopilot-install.test.ts: 4 tests covering detectInstallTarget's
platform + env-var branches. Deeper E2E coverage (systemd unit file
contents, ephemeral start-script contents + exec bit, OpenClaw marker
injection + .bak) lives in Task 14's E2E fixture test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(migrations): v0.11.0 orchestrator — phases A through G, full implementation

Replaces the stub from commit de027ce. The orchestrator runs all seven
phases of the v0.11.0 Minions adoption migration idempotently, resumable
from any prior status:"partial" run (the stopgap bash script writes
those).

Phases:
  A. Schema  — `gbrain init --migrate-only` (NEVER bare `gbrain init`,
               which defaults to PGLite and clobbers existing configs —
               Codex H1 show-stopper).
  B. Smoke   — `gbrain jobs smoke`. Abort loudly on non-zero.
  C. Mode    — --mode flag wins. Preserved from prefs on resume. Non-TTY
               or --yes defaults pain_triggered with explicit print.
               Interactive: numbered 1/2/3 menu via shared promptLine.
  D. Prefs   — savePreferences({minion_mode, set_at, set_in_version}).
  E. Host    — AGENTS.md marker injection + cron manifest rewrites. For
               cron entries whose skill matches a gbrain builtin
               (sync/embed/lint/import/extract/backlinks/autopilot-cycle)
               rewrites kind:agentTurn → kind:shell with a
               gbrain jobs submit command. PGLite branch keeps --follow
               (inline execution, the only path that works without a
               worker daemon); Postgres branch drops --follow + adds
               --idempotency-key ${handler}:${slot} so long cron jobs
               don't stack up (same Codex fix as the autopilot-cycle
               dispatch). For non-builtin handlers (host-specific, like
               ea-inbox-sweep, frameio-scan, x-dm-triage) emits a
               structured TODO row to
               ~/.gbrain/migrations/pending-host-work.jsonl so the host
               agent can walk through plugin-contract work per
               skills/migrations/v0.11.0.md.
  F. Install — `gbrain autopilot --install --yes`. Best-effort (failure
               doesn't abort; user can run manually).
  G. Record  — append to completed.jsonl. status:"complete" unless
               pending_host_work > 0, in which case status:"partial" +
               apply_migrations_pending: true.

Safety guards (Codex code-quality tension #3: strict-skip, no rollback):
  - Scope: $HOME/.claude + $HOME/.openclaw only by default. --host-dir
    must be explicit to include $PWD or any other path.
  - Symlink escape: SKIP if the resolved target leaves the scoped root.
  - >1 MB files: SKIP with warning.
  - Permission denied: SKIP with warning; other files continue.
  - Malformed JSON manifest: SKIP with parse error logged; continue.
  - mtime re-check right before write: bail the file if changed between
    read + write; other files continue.
  - Every edit writes a .bak.<ISO-timestamp> sibling first (second-
    precision so two same-day runs don't collide).
  - Idempotency: `_gbrain_migrated_by: "v0.11.0"` JSON property marker
    on each rewritten cron entry (JSON can't have comments — Codex G);
    AGENTS.md marker `<!-- gbrain:subagent-routing v0.11.0 -->`.
  - TODO dedupe: JSONL appends deduped by (handler, manifest_path) so
    reruns don't grow the file.

Post-run summary: when pending_host_work > 0, prints a one-liner
pointing the user at the JSONL path + the v0.11.0 skill file. The skill
(Lane C-3 / C-4) is the host-agent instruction manual.

test/migrations-v0_11_0.test.ts: 18 tests covering:
  - AGENTS.md injection: happy path, .bak creation, idempotent rerun,
    --dry-run no-op, symlink-escape SKIP, >1MB SKIP.
  - Cron rewrite: builtin handlers rewrite to shell+gbrain jobs submit,
    non-builtins emit JSONL TODOs without touching the manifest, mixed
    manifests get both treatments in one pass, idempotent rerun, TODO
    dedupe, malformed JSON SKIP, no-entries-array SKIP, --dry-run no-op.
  - findAgentsMdFiles + findCronManifests: scoped walk to $HOME/.claude +
    $HOME/.openclaw, --host-dir opt-in for $PWD.
  - BUILTIN_HANDLERS frozen at the canonical 7 names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(skill): port skillify from Wintermute, pair with check-resolvable

Skillify is the "meta skill": turn any raw feature or script into a
properly-skilled, tested, resolvable, evaled unit of agent-visible
capability. Proven in production on Wintermute; paired with gbrain's
existing `check-resolvable` it becomes a user-controllable equivalent of
Hermes' auto-skill-creation — you decide when and what, the tooling
keeps the checklist honest.

Shipped:
  - skills/skillify/SKILL.md — ported from ~/git/wintermute/workspace/
    skills/skillify/SKILL.md. Genericized:
      * /data/.openclaw/workspace → \${PROJECT_ROOT} (runtime-detected).
      * services/voice-agent/__tests__/ → test/ (detected from repo).
      * Manual `grep skills/... AGENTS.md` replaced with a reference to
        `gbrain check-resolvable`, which does reachability + MECE + DRY
        + gap detection properly instead of grep-matching a path string.
  - scripts/skillify-check.ts — ported from
    ~/git/wintermute/workspace/scripts/skillify-check.mjs. Preserves the
    --recent flag and --json output shape. Detects project root via
    package.json walkup; detects test dir (test/ → __tests__/ → tests/
    → spec/). Runs the 10-item checklist per target and exits non-zero
    if any required item is missing.
  - test/skillify-check.test.ts — 4 CLI tests: happy-path against
    publish.ts (known-skilled), --json shape + schema, --recent smoke,
    bogus-target exit code.
  - skills/RESOLVER.md — adds the trigger row ("Skillify this", "is
    this a skill?", "make this proper") → skills/skillify/SKILL.md.
  - skills/manifest.json — adds the skillify entry so the conformance
    test passes.

Why the pair:
  * Hermes auto-creates skills in the background. Fine until you don't
    know what the agent shipped — checklists decay silently.
  * gbrain ships the same capability as two user-controlled tools:
    /skillify builds the checklist, gbrain check-resolvable validates
    reachability + MECE + DRY across the whole skill tree.
  * Human keeps judgment. Tooling keeps the checklist honest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.11.1): cron-via-minions convention, plugin-handlers guide, minions-fix, skill updates

New reference docs:
  - skills/conventions/cron-via-minions.md — the rewrite convention for
    cron manifests. Shows the Postgres (fire-and-forget + idempotency-
    key) vs PGLite (--follow inline) branch; explains why builtin-only
    auto-rewrite is safe + how host-specific handlers get the plugin
    contract.
  - docs/guides/plugin-handlers.md — the plugin contract for host-
    specific Minion handlers. Code-level registration via import +
    worker.register(), not a data file (Codex D: handlers.json was an
    RCE surface). Concrete TypeScript skeleton + handler contract
    (ctx.data, ctx.signal, ctx.inbox) + full migration flow from TODO
    JSONL to a rewritten cron entry.
  - docs/guides/minions-fix.md — user-facing troubleshooting for
    half-migrated v0.11.0 installs. Paste-one-liner for the stopgap,
    gbrain apply-migrations path for v0.11.1+, verification commands,
    failure-mode recipes.

Rewrites + updates:
  - skills/migrations/v0.11.0.md — body restored as the host-agent
    instruction manual. Audience is the host agent reading
    ~/.gbrain/migrations/pending-host-work.jsonl after the CLI
    orchestrator has done the mechanical phases. Walks each TODO type
    through the 10-item skillify checklist (plugin contract, ship
    bootstrap, unit tests, integration tests, LLM evals, resolver
    trigger, trigger eval, E2E smoke, brain filing, check-resolvable).
    Reverses the earlier "delete the body" decision (1B) because the
    body serves a different audience now — host-agent, not CLI
    documentation.
  - skills/cron-scheduler/SKILL.md — Phase 4 ("Register with host
    scheduler") now references cron-via-minions + plugin-handlers.
  - skills/maintain/SKILL.md — new "Fix a half-migrated install"
    section with the apply-migrations recipe.
  - skills/setup/SKILL.md — new Phase C.5 "One-step autopilot +
    Minions install (v0.11.1+)" explaining the four install targets
    + the OpenClaw auto-injection default.
  - docs/GBRAIN_SKILLPACK.md — Operations section adds the three new
    guides + the subagent-routing and cron-routing SKILLPACK notes
    (v0.11.0+).

All 167 related tests (conformance + resolver + skillify-check + v0_11_0
orchestrator) stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.11.1): stopgap script + CLAUDE.md directive + README + CHANGELOG + version bump

scripts/fix-v0.11.0.sh — the paste-command for broken-v0.11.0 installs.
Released on the v0.11.1 tag so:
  curl -fsSL https://raw.githubusercontent.com/garrytan/gbrain/v0.11.1/scripts/fix-v0.11.0.sh | bash
always works (master branch could be renamed). 8 steps: schema apply,
smoke, mode prompt (non-TTY defaults pain_triggered), atomic write of
preferences.json (0o600), append completed.jsonl with status:"partial"
and apply_migrations_pending:true so the v0.11.1 apply-migrations run
resumes correctly (does NOT poison the permanent migration path —
Codex H2 avoidance), AGENTS.md + cron/jobs.json detection with guidance
printed as text only (never auto-edits from a curl-piped script), and a
closing line telling the user to run `gbrain autopilot --install` as the
one-stop finisher.

CLAUDE.md — new "Migration is canonical, not advisory" section pinning
the design principle. Any host-repo change (AGENTS.md, cron manifests,
launchctl units) is GBrain's responsibility via the migration; the
exception is host-specific handler registration, which goes via the
code-level plugin contract in docs/guides/plugin-handlers.md.

README.md — new sections:
  - "v0.11.0 migration didn't fire on your upgrade?" with both repair
    paths (v0.11.1 binary and pre-v0.11.1 stopgap).
  - "Skillify + check-resolvable: user-controllable auto-skill-creation"
    explaining why the user-controlled pair beats Hermes-style auto
    generation. Includes the scripts/skillify-check.ts invocation.

CHANGELOG.md — v0.11.1 entry (per CLAUDE.md voice: lead with what the
user can now do that they couldn't before; frame as benefits, not files
changed). Covers: mega-bug fix + apply-migrations + postinstall +
stopgap, autopilot-supervises-worker + single-install-step + env-aware
targets, Core fn extraction so handlers don't kill workers, skillify +
check-resolvable pair, host-agnostic plugin contract replacing
handlers.json (RCE concern), gbrain init --migrate-only, TS migration
registry + H8/H9 diff-rule fixes, CLAUDE.md directive. All Codex hard
blockers (H1, H3/H4, H5, H6, H7, H8, H9, K) + architecture issues
(#1/#2/#4/#5/#7) resolved.

package.json — version bump 0.11.0 → 0.11.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): migration-flow E2E against live Postgres + Bun env quirk fix

Ships test/e2e/migration-flow.test.ts — the end-to-end integration test
for the v0.11.0 orchestrator. Spins up against a live Postgres (gated
on DATABASE_URL per CLAUDE.md lifecycle) and exercises four scenarios:

  - Fresh install: schema apply (Phase A via `gbrain init --migrate-only`)
    → smoke (Phase B) → mode resolution (C) → prefs (D) → host rewrite
    (E, empty fixture) → record (G). Asserts preferences.json exists with
    0o600, completed.jsonl has a v0.11.0 entry, autopilot install was
    skipped per --no-autopilot-install.
  - Idempotent rerun: second orchestrator invocation on a completed
    install doesn't blow up; mode stays stable.
  - Host rewrite mixed manifest: 4-entry cron/jobs.json with 2 gbrain-
    builtin handlers (sync, embed) + 2 non-builtin (ea-inbox-sweep,
    morning-briefing). Asserts builtins rewrite to `gbrain jobs submit`
    kind:shell, non-builtins are LEFT on kind:agentTurn, and 2 JSONL
    TODOs are emitted with correct shape. AGENTS.md gets the marker
    injected. Status is "partial" because pending-host-work > 0.
  - Resumable: stopgap writes a partial completed.jsonl row first;
    orchestrator re-runs successfully against it and appends a new
    post-orchestrator entry. 1 partial + 1 complete = 2 rows total.

Critical fix surfaced by the E2E: src/commands/migrations/v0_11_0.ts's
three execSync calls (gbrain init --migrate-only, gbrain jobs smoke,
gbrain autopilot --install) now explicitly pass `env: process.env`.
Bun's execSync default does NOT propagate post-start `process.env.PATH`
mutations to subprocesses — only the initial PATH snapshot. Without the
explicit env, any user-side env tweak (e.g. setting GBRAIN_DATABASE_URL
in a script before calling the orchestrator) would be invisible to the
orchestrator's subprocesses. This is also the reason the E2E needs a
PATH shim installed at module-load time to expose the `gbrain` command.

test/init-migrate-only.test.ts: subprocess env now strips DATABASE_URL
and GBRAIN_DATABASE_URL. The "no config" error-path tests need
loadConfig() to return null, which it won't if the env-var fallback at
src/core/config.ts:30 fires. Before this fix, running the unit tests
with DATABASE_URL set (e.g. during an E2E run) caused false failures
because `gbrain init --migrate-only` saw the env var and succeeded.

Full test totals with live Postgres: 1265 pass, 0 fail, 3497 expect
calls, 67 files, ~95s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump VERSION file to 0.11.1

Commit 5c4cf1d bumped package.json version to 0.11.1 but missed the
root VERSION file. src/version.ts reads from package.json so
`gbrain --version` prints 0.11.1 correctly, but any tool or script
that reads the VERSION file directly (like /ship's idempotency check)
saw the stale 0.11.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.11.1): doctor self-heal check + skillpack-check command for cron health reports

Closes the discoverability hole from the v0.11.0 mega-bug: once a user is
on v0.11.1 (or later), every `gbrain doctor` invocation immediately
surfaces a half-migrated state, and `gbrain skillpack-check` gives host
agents (Wintermute's morning-briefing, any OpenClaw cron) a single
exit-coded JSON pipe to check from their own skills.

gbrain doctor — two new checks:
  1. Filesystem-only (fires on every `doctor` invocation, even --fast):
     if `~/.gbrain/migrations/completed.jsonl` has any status:"partial"
     entry with no matching status:"complete" for the same version, print
     `MINIONS HALF-INSTALLED (partial migration: vX.Y.Z). Run: gbrain
     apply-migrations --yes`. Typical cause is the stopgap wrote a
     partial record but nobody ran `apply-migrations` afterward.
  2. DB-path: if schema version is v7+ (Minions present) AND
     `~/.gbrain/preferences.json` is missing, print the same banner.
     Catches installs that never ran the stopgap or apply-migrations at
     all — the classic v0.11.0 "upgrade landed, migration never fired"
     state.

Both checks status:"fail" so doctor exits non-zero when either fires.
Test `test/doctor-minions-check.test.ts` pins the five branches
(partial present → FAIL, partial+complete → quiet, no-jsonl → quiet,
multiple versions named correctly, human-readable banner contains the
exact "MINIONS HALF-INSTALLED" phrase Wintermute's cron can grep for).

gbrain skillpack-check — new command + skill:
  - `src/commands/skillpack-check.ts` wraps `doctor --fast --json` +
    `apply-migrations --list` into one JSON report with `{healthy,
    summary, actions[], doctor, migrations}`. Exit 0 on healthy, 1 on
    action-needed, 2 on determine-failure. `--quiet` flag for cron
    pipes that want exit-code-only behavior.
  - `actions[]` is the remediation list. Doctor messages of the form
    `... Run: <cmd>` get their command extracted (regex fixed to match
    the full remainder of the line, not just the first word). Pending
    or partial migrations push `gbrain apply-migrations --yes` to the
    front of actions[].
  - `gbrainSpawn()` helper resolves the gbrain invocation correctly on
    compiled binary installs (`argv[1] = /usr/local/bin/gbrain`) AND
    source installs (`argv[1] = src/cli.ts`, prefix with `bun run`).
    Same Codex #1 fix pattern as autopilot's resolveGbrainCliPath.
  - `skills/skillpack-check/SKILL.md` teaches agents when to run it,
    what to do with the output, and anti-patterns (don't run without
    --quiet in a cron that emails; don't ignore exit 2).
  - Registered in skills/RESOLVER.md and skills/manifest.json.

Test `test/skillpack-check.test.ts` (5 tests) covers healthy fresh
install, half-migrated exit-1 with apply-migrations in actions[],
--quiet suppresses stdout in both states, --help prints usage, summary
includes top action when multiple are present.

1192 unit tests pass (+15 new). The 38 failing tests are all
DATABASE_URL E2Es — same pre-existing pattern, unchanged by this
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* doc(v0.11.1): reframe README + minions-fix — v0.11.0 was never released

v0.11.0 was cut but never released publicly. v0.11.1 is the first
public Minions ship, and fixes the upgrade-migration mega-bug so it
self-heals on every future `gbrain upgrade` + `bun update gbrain`.
The README was wrongly framing the fix as a retrospective for v0.11.0
users — none exist, so remove it.

README changes:
  - Delete the "v0.11.0 migration didn't fire on your upgrade?" section.
    Replace with "Health check and self-heal": the `gbrain doctor`,
    `gbrain skillpack-check --quiet`, and `gbrain skillpack-check | jq`
    recipes that ship in v0.11.1. Still links to docs/guides/minions-fix.md
    for deeper troubleshooting.
  - Promote the production benchmark to top billing. The previous section
    led with the lab benchmark (same LLM, localhost) and buried the
    production data point as a single follow-up sentence. Real deployment
    numbers are the stronger signal:
      * 753ms vs >10s gateway timeout (sub-agent couldn't even spawn)
      * $0.00 vs ~$0.03 per run
      * 100% vs 0% success rate under 19-cron production load
      * 36-month tweet backfill: 19,240 tweets, ~15 min, $0.00
    Lab numbers stay (separate table, labeled "controlled environment")
    so readers can see both layers.
  - Add the "The routing rule" closer: Deterministic → Minions, Judgment
    → Sub-agents. This is the clearest framing in the production
    benchmark doc and belongs in the README so readers leave with the
    right mental model. `minion_mode: pain_triggered` automates it.

docs/guides/minions-fix.md rewrite:
  - Reframe as: v0.11.0 never released, v0.11.1 is the first ship,
    `gbrain apply-migrations --yes` is canonical. Stopgap stays
    documented for pre-v0.11.1 branch builds (e.g. Wintermute's
    minions-jobs checkout before v0.11.1 tags).
  - Add the detection + verification commands (doctor + skillpack-check)
    at the top.
  - Cross-reference skills/skillpack-check/SKILL.md as the agent-facing
    health-check pattern.

Zero lingering "v0.11.0 released" references in README or minions-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(doctor): remove "schema v7+ no prefs → FAIL" check (too aggressive)

CI failure in Tier 1 Mechanical E2E:
  (fail) E2E: Doctor Command > gbrain doctor exits 0 on healthy DB

Root cause: the doctor half-migration detection added two checks. The
second check (`schema v7+ AND ~/.gbrain/preferences.json missing →
minions_config FAIL`) was too aggressive. It treated a valid fresh-
install state as broken.

`gbrain init` against Postgres applies schema v7 but doesn't write
preferences.json — that's the migration orchestrator's Phase D, which
only runs via `apply-migrations`. Between `init` finishing and the user
running `apply-migrations`, the install is legitimately in a
"schema-applied, no prefs" state. Doctor was exiting 1 on this valid
state, breaking the pre-existing CI test that init's + docters a
healthy DB.

Fix: drop the check. The filesystem check (step 3 — partial-completed
without a matching complete) is sufficient signal for genuine half-
migration. Added a regression test pinning the exact CI scenario: no
completed.jsonl present, no preferences.json, doctor must not fail any
minions_* check.

Also removes the now-unused `preferencesPaths` import.

Verified against live Postgres: CI-equivalent `gbrain doctor` + `gbrain
doctor --json` both pass. Full suite: 1281/1281 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* doc(readme): Minions section — lead with the story, compress the rest

The previous section opened with "six daily pains" as a numbered list
before the hook, buried the production numbers halfway down, and had
a table explaining how each pain gets fixed. Fine for a spec doc;
wrong for a README that needs to land the impact fast.

Rewrite:
  - Lead with "your sub-agents won't drop work anymore" — the reason
    a reader is here.
  - Production numbers promoted, framed as a story: "Here's my
    personal OpenClaw deployment: one Render container, Supabase
    Postgres holding a 45,000-page brain, 19 cron jobs firing on
    schedule, the X Enterprise API on the wire..." Gives the reader
    the setup before the punchline.
  - The routing rule (deterministic → Minions, judgment → sub-agents)
    survives unchanged. It's the clearest framing in the whole section.
  - Lose the "how each pain gets fixed" table. Compress the six pains
    + their fixes into one paragraph that names the primitives by
    name (max_children, timeout_ms, child_done inbox, cascade cancel,
    idempotency keys, attachment validation). Readers who want depth
    click through to skills/minion-orchestrator/SKILL.md.
  - Close with "not incrementally better — categorically different"
    and the three headline numbers.
  - Drop the separate Lab Numbers table; the production numbers are
    stronger and the lab data is one click away via the link.

Lines: 75 → 42. Same signal, less scroll.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* doc: scrub X Enterprise API + @garrytan references from user-facing docs

User feedback: shouldn't name the specific enterprise-tier API product
or the account in the README or benchmark docs. Genericize:

  - "X Enterprise API on the wire" → drop entirely; the 19-cron load
    story carries the setup without naming the vendor
  - "X Enterprise API ($50K/mo firehose)" → "external API"
  - "@garrytan tweets" → "my social posts"
  - "Pull ~100 @garrytan tweets" → "Pull ~100 of my social posts"
  - "X Enterprise API (full-archive)" env var comment → "external API
    bearer token"

Scope:
  - README.md — the Minions production story line + scaling callout
  - docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md
  - docs/benchmarks/2026-04-18-tweet-ingestion.md

Plain "X API" references in the tweet-ingestion methodology stay —
those describe which public HTTP endpoint was called, not the
enterprise-tier product. Benchmark doc filenames (tweet-ingestion.md)
stay to preserve inbound links; content is genericized.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* doc(readme): Skillify section — match Minions energy, land the category shift

The previous section was competent but undersold what skillify actually
is. Rewrite matches the Minions section's shape: lead with the hook,
tell the story, land the punchline.

Key changes:
  - Title: "your skills tree stops being a black box." Names the thing
    skillify actually solves.
  - Open with the problem: Hermes auto-creates skills as a background
    behavior. Six months later you have an opaque pile nobody's read
    or tested. Make the liability concrete.
  - Promote the 10 items by name (SKILL.md + script + unit tests +
    integration tests + LLM evals + resolver trigger + trigger eval +
    E2E + brain filing + check-resolvable audit). Showing the list
    makes the scope of the unlock visible.
  - New subsection "Why this is the right answer for OpenClaw" names
    the debugging-the-black-box pain directly. Skillify makes the tree
    legible: when something breaks, you know which layer (contract,
    test, eval, trigger, or route) to inspect. When anything goes
    stale, check-resolvable flags it.
  - Close with "compounding quality instead of compounding entropy" +
    "not a nice-to-have. It's the piece that makes the skills tree
    survive six months."
  - Expand the code block to include `gbrain check-resolvable` (the
    other half of the pair) so readers see the whole workflow.

Length goes from 17 to 34 lines — still shorter than Minions, still
one section. Worth the space because this is a category shift for
how agent skills get built, not a feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: root <root@localhost>
2026-04-18 16:57:38 +08:00
Garry Tan
91ced664b6 feat: Voice v0.8.0 + feature discovery + Edge Function removal (#55)
* chore: remove Supabase Edge Function MCP deployment

The Edge Function never worked reliably. All MCP traffic goes through
self-hosted server + ngrok tunnel. Removes deploy-remote.sh, edge-entry.ts,
supabase/functions/, .env.production.example, and CHATGPT.md (OAuth not
implemented).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite MCP docs for self-hosted + ngrok deployment

All per-client guides updated from Edge Function URLs to self-hosted
server + ngrok tunnel pattern. DEPLOY.md rewritten with local vs remote
paths. ALTERNATIVES.md now shows self-hosted as primary, with ngrok,
Tailscale, and Fly.io/Railway comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: voice recipe v0.8.0 — 25 production patterns from real deployment

Identity separation, pre-computed bid system, conversation timing fix,
proactive advisor mode, radical prompt compression, OpenAI Realtime
Prompting Guide structure, auth-before-speech, brain escalation, stuck
watchdog, never-hang-up rule, thinking sounds, fallback TwiML, tool set
architecture, trusted user auth, caller routing, dynamic VAD, on-screen
debug UI, live moment capture, belt-and-suspenders post-call, mandatory
3-step post-call, WebRTC parity, dual API events, report-aware query
routing. WebRTC pseudocode updated with native FormData and 6 gotchas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: post-upgrade feature discovery framework

upgrade.ts captures old version before upgrading, then execs
gbrain post-upgrade (new binary) to read migration files and print
feature pitches. Migration files get YAML frontmatter with feature_pitch
field (headline, description, recipe, tiers). CLI prints excited builder
tone post-upgrade. v0.8.0 migration offers voice setup with environment
detection (server vs local) and 3-tier progressive disclosure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Voice section to README with WebRTC screenshot + tweet link

Her out of the box: voice-to-brain with 25 production patterns. WebRTC
client screenshot embedded. Remote MCP section rewritten for self-hosted
+ ngrok. Setup block genericized.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add recipe validation tests + genericize personal refs

5 new integration tests: secrets completeness, semver version, requires
resolution, all-recipes-parse, no-personal-references. Test fixture
genericized. CLAUDE.md/TODOS.md/SKILLPACK updated for v0.8.0. build:edge
script removed from package.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.8.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 10:52:30 -10:00
Garry Tan
3e21e9b69b feat: GBrain v0.6.0 — Remote MCP Server + 12 Bug Fixes (#28)
* fix: 7 bug fixes from Issue #9 and #22

- fix(mcp): use ListToolsRequestSchema/CallToolRequestSchema instead of string literals (Issue #9, PR #25)
- fix(mcp): handleToolCall reads dry_run from params instead of hardcoding false (#22 Bug #11)
- fix(search): keyword search returns best chunk per page via DISTINCT ON, not all chunks (#22 Bug #8)
- fix(search): dedup layer 1 keeps top 3 chunks per page instead of collapsing to 1 (#22 Bug #12)
- fix(engine): transaction uses scoped engine via Object.create, no shared state mutation (#22 Bug #2)
- fix(engine): upsertChunks uses UPSERT instead of DELETE+INSERT, preserves existing embeddings (#22 Bug #1)
- fix(slugs): validateSlug normalizes to lowercase, pathToSlug lowercases consistently (#22 Bug #4)
- schema: add unique index on content_chunks(page_id, chunk_index) for UPSERT support
- schema: add access_tokens and mcp_request_log tables via migration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: embed schema.sql at build time, remove fs dependency from initSchema

initSchema() previously read schema.sql from disk at runtime via readFileSync,
which broke in compiled Bun binaries and Deno Edge Functions. Now uses a
generated schema-embedded.ts constant (run `bun run build:schema` to regenerate).

- Removes fs and path imports from postgres-engine.ts and db.ts
- Adds scripts/build-schema.sh for one-source-of-truth generation
- Adds build:schema npm script

Fixes Issue #22 Bug #6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 5 more bug fixes from Issue #22

- fix(file_upload): call storage.upload() in all 3 paths (operation, CLI upload, CLI sync) with rollback semantics (#22 Bug #9)
- fix(import): use atomic index counter for parallel queue instead of array.shift() race, preserve checkpoint on errors (#22 Bug #3)
- fix(s3): replace unsigned fetch with @aws-sdk/client-s3 for proper SigV4 auth, supports R2/MinIO via forcePathStyle (#22 Bug #10)
- fix(redirect): verify remote file exists before deleting local copy, skip files not found in storage (#22 Bug #5)
- deps: add @aws-sdk/client-s3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: remote MCP server via Supabase Edge Functions

Deploy GBrain as a serverless remote MCP endpoint on your existing Supabase
instance. One brain, accessible from Claude Desktop, Claude Code, Cowork,
Perplexity Computer, and any MCP client. Zero new infrastructure.

New files:
- supabase/functions/gbrain-mcp/index.ts — Edge Function with Hono + MCP SDK
- supabase/functions/gbrain-mcp/deno.json — Deno import map
- src/edge-entry.ts — curated bundle entry point (excludes fs-dependent modules)
- src/commands/auth.ts — standalone token management (create/list/revoke/test)
- scripts/deploy-remote.sh — one-script deployment
- .env.production.example — 3-value config template

Changes:
- config.ts: lazy-evaluate CONFIG_DIR (no homedir() at module scope)
- schema.sql: add access_tokens + mcp_request_log tables
- package.json: add build:edge script

Auth: bearer tokens via access_tokens table (SHA-256 hashed, per-client, revocable)
Transport: WebStandardStreamableHTTPServerTransport (stateless, Streamable HTTP)
Health: /health endpoint (unauth: 200/503, auth: postgres/pgvector/openai checks)
Excluded from remote: sync_brain, file_upload (may exceed 60s timeout)

Setup: clone, fill .env.production, run scripts/deploy-remote.sh, create token, done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: per-client MCP setup guides

- docs/mcp/DEPLOY.md — deployment walkthrough, auth, troubleshooting, latency table
- docs/mcp/CLAUDE_CODE.md — claude mcp add command
- docs/mcp/CLAUDE_DESKTOP.md — Settings > Integrations (NOT JSON config!)
- docs/mcp/CLAUDE_COWORK.md — remote + local bridge paths
- docs/mcp/PERPLEXITY.md — Perplexity Computer connector setup
- docs/mcp/CHATGPT.md — coming soon (requires OAuth 2.1, P0 TODO)
- docs/mcp/ALTERNATIVES.md — Tailscale Funnel + ngrok self-hosted options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.0)

GBrain v0.6.0: Remote MCP server via Supabase Edge Functions + 12 bug fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Remote MCP Server section to README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: make document-release mandatory in CLAUDE.md, add MCP key files

Post-ship requirements section: document-release is NOT optional. Lists every
file that must be checked on every ship. A ship without updated docs is incomplete.

Also adds remote MCP server files to Key files section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: batch upsertChunks into single statement to prevent deadlocks

The per-chunk UPSERT loop caused deadlocks under parallel workers because
each INSERT ON CONFLICT acquired row-level locks sequentially. Multiple
workers upserting different pages could deadlock on the shared unique index.

Fix: batch all chunks into a single multi-row INSERT ON CONFLICT statement.
One round-trip, one lock acquisition. COALESCE preserves existing embeddings
when the new value is NULL.

Fixes CI failure: "E2E: Parallel Import > parallel import with --workers 4"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: advisory lock in initSchema() prevents deadlock on concurrent DDL

When multiple processes call initSchema() concurrently (e.g., test setup +
CLI subprocess, or parallel workers during E2E tests), the schema SQL's
DROP TRIGGER + CREATE TRIGGER statements acquire AccessExclusiveLock on
different tables, causing deadlocks.

Fix: pg_advisory_lock(42) serializes all initSchema() calls within the
same database. The lock is session-scoped and released in a finally block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add explicit test timeouts for CLI subprocess E2E tests

CLI subprocess tests (Setup Journey, Doctor Command, Parallel Import)
spawn `bun run src/cli.ts` which takes several seconds to JIT compile +
connect. The Bun test framework default 5000ms per-test timeout is too
tight for CI. Added 30-60s timeouts matching each subprocess's own
timeout to prevent false failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: infinite recursion in config.ts exported getConfigDir/getConfigPath

The replace_all refactor created recursive functions: the exported
getConfigDir() called the private getConfigDir() which called itself.
Renamed exports to configDir()/configPath() to avoid shadowing.

Also adds scripts/smoke-test-mcp.ts — verified all 8 MCP tool calls
work against a real Postgres database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:23:00 -10:00