Files
gbrain/docs/guides/minions-fix.md
Garry Tan 0e9f8814a5 feat: v0.16.0 — durable agent runtime (gbrain agent + subagent handler + plugin loader) (#258)
* refactor(mcp): extract buildToolDefs helper for subagent tool registry reuse

The inline operations.map(...) block in src/mcp/server.ts became the only
source of truth for agent-facing tool definitions. Extract into a reusable
exported helper so the v0.15 subagent tool registry can call it with a
filtered OPERATIONS subset instead of duplicating the shape.

Byte-for-byte equivalence regression pinned in test/mcp-tool-defs.test.ts —
legacy inline mapping kept verbatim inside the test so any future drift
between the new helper and the pre-extraction MCP schema fails loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(operations): subagent-aware OperationContext + put_page namespace

Adds three optional fields to OperationContext:
  - jobId?: number       — the currently running Minion job id
  - subagentId?: number  — the owning subagent job id for tool-dispatched calls
  - viaSubagent?: boolean — FAIL-CLOSED flag for agent-path gating

put_page now enforces a namespace rule when invoked on the subagent tool
dispatch path (viaSubagent=true): writes MUST target
`wiki/agents/<subagentId>/...`. Anchored, slash-boundary enforced so a
collision like `wiki/agents/12evil/...` can't impersonate subagent 12.

The check runs BEFORE the dry-run short-circuit so preview calls surface
the same rejection. Fail-closed: a missing subagentId with viaSubagent=true
rejects every slug rather than letting a dispatcher bug open a hole.

Existing callers unaffected — all three fields are optional and the legacy
put_page behavior is unchanged when viaSubagent is undefined/false.

12 regression + namespace tests pin:
  - local CLI writes (viaSubagent unset) accept arbitrary slugs
  - MCP writes (remote=true, viaSubagent unset) accept arbitrary slugs
  - subagent-path: anchored prefix accepted, wrong id rejected, prefix-
    collision defeated, leading-slash rejected, bare-prefix rejected,
    fail-closed on missing/NaN subagentId, permission_denied code emitted

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(schema): v0.15.0 subagent runtime tables + migration orchestrator

Adds three new tables for the durable LLM agent runtime:

  subagent_messages         — Anthropic message-block persistence.
                              Parallel tool_use blocks in one assistant
                              message live in content_blocks JSONB, not
                              across rows (fixes the (job_id, turn_idx, role)
                              misdesign codex caught in v0.13 drafting).

  subagent_tool_executions  — Two-phase tool ledger. INSERT pending before
                              execute, UPDATE complete/failed after. Replay
                              re-runs pending rows only if the tool is
                              idempotent (v1 ships only idempotent tools so
                              this is preventive).

  subagent_rate_leases      — Lease-based concurrency cap for outbound
                              providers (e.g. anthropic:messages). Stale
                              leases auto-prune on next acquire so crashed
                              workers can't strand capacity.

All DDL uses CREATE TABLE/INDEX IF NOT EXISTS — order-independent vs
PR #244's initSchema() reorder, and idempotent across fresh-install +
upgrade paths. Shipped in both src/schema.sql (Postgres) and
src/core/pglite-schema.ts (PGLite); schema-embedded.ts regenerated.

Migration orchestrator v0_15_0.ts (phases: schema → verify → record).
v0_14_0.ts is a no-op stub so the registry's version sequence stays
gapless (v0.14.0 shipped shell-jobs — code change, no DB migration).

10 unit tests for registry wiring, ordering, dry-run phase behavior, and
schema-embedded table presence. test/apply-migrations.test.ts updated for
the two new registry entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): emit child_done on every terminal + max_stalled per-job + terminal set fix

Three correctness fixes the v0.15 subagent aggregator spine depends on:

1. child_done emission on ALL terminal transitions, not just success.
   - completeJob already emitted on success — now also tags outcome='complete'.
   - failJob newly emits on terminal 'failed' or 'dead' (outcome='failed'|'dead',
     error=<text>), BEFORE the parent-terminal UPDATE so the EXISTS guard on
     the inbox INSERT doesn't skip it on fail_parent paths (codex catch).
   - cancelJob now emits outcome='cancelled' per descendant with a parent.
   - handleTimeouts now emits outcome='timeout' per timed-out child.
   ChildDoneMessage gains optional { outcome, error } — backwards compatible
   (legacy writers omitted them; consumers treat absent outcome as 'complete').

2. Parent-resolution terminal set now includes 'failed'.
   Pre-v0.15 the `NOT EXISTS (... status NOT IN ('completed','dead','cancelled'))`
   guard treated a failed child as still-pending, stranding aggregator parents
   that chose on_child_fail='continue' or 'ignore' in waiting-children forever.
   Expanded to {completed, failed, dead, cancelled} everywhere parent resolution
   reads child status (completeJob inline, failJob remove_dep + continue,
   cancelJob sweep, handleTimeouts sweep, and the resolveParent method itself).

3. MinionJobInput.max_stalled threads through MinionQueue.add() on INSERT.
   Column exists with default 1 — that is "first stall → dead", which defeats
   crash recovery for long-running handlers. Subagent children will set
   max_stalled: 3 to survive mid-run worker kills. Second-submitter under an
   idempotency-key hit does NOT mutate the existing row (codex-flagged
   footgun — first-submit options are load-bearing state).

13 unit tests pin: emission on each of completeJob/failJob/cancelJob/
handleTimeouts, insertion order on fail_parent, terminal-set expansion with
continue policy, max_stalled default + override + idempotency behavior.

E2E tier 1 (Postgres) passes 141 tests unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): rate-leases + waitForCompletion infra for v0.15 subagent

Two infrastructure modules the subagent handler spine depends on:

rate-leases.ts — lease-based concurrency cap for outbound providers
(anthropic:messages, openai:*, etc.). Counter-based limiters leak capacity
on worker crash; leases are owner-tagged rows with expires_at that
auto-prune on the next acquire. Two-phase: txn-scoped pg_advisory_xact_lock
guards the check-then-insert so concurrent acquires can't both win the
"last slot". renewLeaseWithBackoff retries 3x (250/500/1000ms) for mid-
call DB blips — on persistent failure the LLM-loop caller aborts with a
renewable error so the worker re-claims and the rate invariant is
preserved. Owner FK cascades clean up leases on job deletion.

wait-for-completion.ts — poll-until-terminal helper for CLI callers.
Minions' NOTIFY is worker-side only; `gbrain agent run --follow` polls
getJob() until status is {completed, failed, dead, cancelled}. TimeoutError
carries jobId + elapsedMs and does NOT cancel the job — the user can
inspect via `gbrain jobs get <id>` later. Supports AbortSignal for Ctrl-C
without throwing. Default pollMs is 1000 on Postgres, 250 on PGLite (inline
CLI has no network RTT).

21 unit tests cover: single/multi acquire under cap, rejection past cap,
release frees slot, different keys are independent, stale prune, cascade
on owner delete, renew bumps expires_at, renew on missing is false,
backoff path success + pruned short-circuit. waitForCompletion: fast-path
terminal, transitions mid-wait (completed/failed/cancelled), TimeoutError
shape, abort-signal early exit, non-existent job error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent ToolDef types + brain-tool registry (v0.15)

Types first so the handler has a stable contract:
  - SubagentHandlerData / AggregatorHandlerData — the two job.data shapes
  - ToolCtx (engine, jobId, remote, signal) + ToolDef (name, description,
    input_schema, idempotent, execute) — Anthropic-envelope, distinct from
    the MCP McpToolDef extraction landed earlier
  - ContentBlock discriminated union for subagent_messages.content_blocks
  - SubagentStopReason + SubagentResult emitted on terminal completion

brain-allowlist.ts derives one ToolDef per allow-listed OPERATION. Reuses
the ParamDef → JSONSchema shape from the MCP extraction in a local helper
(Anthropic's input_schema field diverges from MCP's inputSchema by a
character). The 11-name allow-list is read-safe + put_page — every
destructive / filesystem / identity-mutating op stays off by default.

put_page gets a namespace-wrapped tool schema: `slug` pattern = anchored
`^wiki/agents/<subagentId>/.+`. The server-side check in put_page op
(shipped in prior commit) is still the authoritative gate — the schema
just helps the model write correct slugs first-try. `subagentId` is
plumbed into the ToolCtx so the viaSubagent=true fail-closed path lights
up on every tool-dispatched put_page.

filterAllowedTools narrows a registry by subagent_def's allowed_tools
frontmatter field. Rejects unknown names at load time (no silent drop —
typos in a skills/subagents/*.md would otherwise ship to prod with a
tool silently missing).

18 tests pin: every allowlist name exists in OPERATIONS (catches upstream
rename), Anthropic name regex, put_page namespace pattern per-subagent,
execute() routes through the op handler with viaSubagent=true, out-of-
namespace put_page throws permission_denied, filter passes prefixed +
unprefixed names, rejects unknowns, deduplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent-audit JSONL + transcript renderer

Two small plumbing pieces the v0.15 subagent handler + `gbrain agent logs`
depend on:

subagent-audit.ts — JSONL-rotated audit log mirroring the shell-audit
pattern. Two event flavors: submission (one line per job submit) and
heartbeat (one line per turn boundary — llm_call_started / completed /
tool_called / tool_result / tool_failed). Heartbeats fix the "--follow on
a long Anthropic call shows nothing for 30 seconds" problem codex flagged.
Never logs prompts or tool inputs (PII risk — subagent input_vars may
carry user-supplied free text); DOES log tokens, ms_elapsed, tool_name,
first 200 chars of error text. Rotates weekly via ISO week. `readSubagent
AuditForJob` is the readback path for `gbrain agent logs` — scans the
current + prior week file so job boundaries across weeks still resolve.
`GBRAIN_AUDIT_DIR` overrides the default ~/.gbrain/audit/ for container
deploys.

transcript.ts — renders subagent_messages + subagent_tool_executions to
markdown. Message order is authoritative; tool rows splice under their
owning assistant tool_use by tool_use_id. Handles text, tool_use (with
pending / complete / failed execution rows), tool_result (skipped if
we already rendered the owning tool_use — avoids double-printing), and
unknown block types (fenced JSON dump for diagnostics). Output is
UTF-8-safe truncated at maxOutputBytes.

21 unit tests: ISO week filename rotation (incl. 2027-01-01 → W53-2026
boundary), submission + heartbeat write shapes, 200-char error cap, best-
effort write failure doesn't throw, readback filters by job_id and
sinceIso. Transcript: empty input, ordering, token line, tool_use +
complete/failed/pending execution rendering, truncation, unknown-block
diagnostic dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent LLM-loop handler with crash-resumable replay

The main event: runs one Anthropic Messages API conversation with tool
use, persists every turn + tool execution, and resumes cleanly after a
worker kill anywhere in the loop.

Design points that carry the v0.15 guarantees:

  1. Two-phase tool persistence. INSERT status='pending' before dispatch,
     UPDATE to 'complete' or 'failed' after. subagent_messages rows are
     the canonical conversation; subagent_tool_executions rows are the
     canonical "did this tool run + what did it return". Either DB commit
     is atomic, so replay has a single source of truth.

  2. Replay reconciliation. If the last persisted message is an assistant
     with tool_use blocks AND no following synthesized user message, we
     crashed mid-dispatch. On resume, finish those tools first (respecting
     idempotent flag for 'pending' rows), synthesize the user turn, and
     THEN call the LLM again. Non-idempotent pending rows abort the job
     with a clear error — v0.15 ships only idempotent tools so this is
     preventive.

  3. Rate lease around every LLM call. acquireLease before, releaseLease
     after (both success and error paths). acquired=false throws
     RateLeaseUnavailableError — the worker treats it as a renewable
     error and re-claims later, so a temporary capacity cap doesn't fail
     the job terminally.

  4. Anthropic prompt caching. system block gets cache_control=ephemeral;
     the LAST tool def gets it too (Anthropic caches everything up to and
     including the marked block). ~10x cost reduction on multi-turn
     agents per the plan.

  5. Dual-signal abort. AbortSignal.any merges ctx.signal (timeout / lock
     loss / cancel) with ctx.shutdownSignal (worker SIGTERM). Both feed
     the Anthropic call's AbortSignal; mid-turn abort bails before the
     next LLM call with whatever turns are already persisted. Node ≥ 20
     has AbortSignal.any; older runtimes get a manual-merge polyfill.

  6. Injectable Anthropic client. The real SDK implements MessagesClient
     structurally; tests inject a FakeMessagesClient that scripts
     responses.

12 unit tests pin: no-tool happy path, single tool_use complete, tool
throws → failed row + loop continues, unknown tool name rejection,
max_turns cap, crash-then-resume with partial state, replay skips already-
complete tool execs without re-invoking execute, non-idempotent pending
rejects on resume, lease acquire + release roundtrip, RateLeaseUnavailable
under cap-full, missing prompt validation, allowed_tools unknown-name.

NOT in v0.15: refusal detection (stop_reason + content shape), stop_reason
=max_tokens partial recovery, mid-call lease renewal with backoff loop.
All three are documented as P2 items in the plan file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent_aggregator handler with mixed-outcome rendering

Claims AFTER all subagent children resolve — by then Lane 1B's queue
changes have posted one child_done message per terminal transition into
this job's inbox (complete / failed / dead / cancelled / timeout). The
aggregator reads those, builds a deterministic markdown summary, and
returns it as the handler result.

Not an LLM call in v0.15 — output is reproducible concatenation so
fan-out runs stay comparable. v0.16+ can add an LLM synthesis pass
behind an opt-in flag.

Contract:
  - empty children_ids → `(no children)` marker
  - missing child_done (shouldn't happen under v0.15 invariants but
    possible if a terminal-state path slipped past Lane 1B) → counted as
    failed with "no child_done message observed" error
  - non-complete outcomes: result is null in the output so no payload
    leaks alongside a failure label
  - children appear in the order children_ids was supplied
  - custom aggregate_prompt_template replaces the markdown header

13 unit tests cover: empty input, all-success, mixed outcomes, result
suppression on failure, missing child_done handling, order preservation,
custom template, progress + log emission, stringified JSONB payload
parsing, non-child_done inbox filtering, legacy-writer outcome fallback,
and internal helper edges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): GBRAIN_PLUGIN_PATH loader + plugin-authors guide (v0.15)

Plumbing that makes Wintermute (and future downstream agents) day-1
usable on v0.15. Host repos drop a `gbrain.plugin.json` + `subagents/`
directory somewhere, set GBRAIN_PLUGIN_PATH (colon-separated like \$PATH),
and their custom subagent defs load at worker startup.

Path policy is strict: absolute paths only. Relative, ~-prefixed, and
URL-style (https://, file://) all rejected with warnings — the user
controls where plugins live. Non-existent paths and files (not dirs) are
warned and skipped so a typo doesn't crash worker startup.

Collision policy: left-wins. If two plugins ship a subagent with the same
name, the first one in GBRAIN_PLUGIN_PATH keeps it and the other gets a
warning naming both sources. Deterministic + debuggable.

Trust policy: plugins ship subagent defs ONLY. Cannot declare new tools,
cannot extend the brain allow-list, cannot override safety flags. The
subagent def's `allowed_tools:` frontmatter MUST subset the derived
registry — validation happens at load time (worker startup), not at
dispatch time, so a typo in a skill gives a loud startup error instead
of silently "tool never fires at 3am."

Manifest `plugin_version: "gbrain-plugin-v1"` locks the contract. Unknown
versions rejected. `subagents` field escape attempts (`../../../etc` etc)
rejected. gray-matter handles the markdown frontmatter parse — subagent
defs don't conform to the page schema, so we don't use parseMarkdown.

docs/guides/plugin-authors.md is the Wintermute-facing walkthrough.
Covers the minimum viable plugin shape, the three policies, the
frontmatter fields, known caveats (audit JSONL is local-only, tool calls
always run remote=true, put_page is namespace-scoped).

22 unit tests pin path rejection, missing/invalid manifest, unsupported
version, escape-attempt, basename fallback for missing frontmatter.name,
allowed_tools round-trip, unknown-tool rejection with validAgentToolNames,
empty env, multi-path, collision warning with left-wins, trimmed paths,
manifest-rejection as warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): gbrain agent run + logs + worker registration (v0.15 Lane 4H)

Three integration seams wired:

src/commands/agent.ts — \`gbrain agent run\`. Submits subagent jobs (or a
fan-out of N + aggregator) under the trusted-submit flag so the
PROTECTED_JOB_NAMES guard doesn't reject. Fan-out path creates the
aggregator first (so children can reference its id as parent), submits
each child with on_child_fail='continue' (required by Lane 1B's terminal-
set + child_done machinery), then jsonb_set's the aggregator's
children_ids. Short-circuits a 1-entry manifest to a single subagent
with no aggregator. Follow mode runs agent-logs streaming + waitFor
Completion in parallel and exits on terminal status; detach prints the
job id and exits. Ctrl-C is handled as detach, not cancel — the job
keeps running, consistent with durability invariants.

src/commands/agent-logs.ts — \`gbrain agent logs\`. Merges ~/.gbrain/audit/
subagent-jobs-*.jsonl (heartbeats + submissions) with subagent_messages
(persisted conversation) in one chronological stream. --follow polls at
1s and exits when the job hits terminal. --since accepts ISO-8601 OR
relative shorthand (5m / 1h / 2d). Writes transcript tail (full message
+ tool tree) only for terminal jobs, so mid-run --follow doesn't spam a
half-rendered transcript.

src/commands/jobs.ts registerBuiltinHandlers — matches the shell-handler
opt-in shape. GBRAIN_ALLOW_LLM_JOBS=1 registers the subagent +
subagent_aggregator handlers, then loads plugins from GBRAIN_PLUGIN_PATH
with validAgentToolNames pulled from BRAIN_TOOL_ALLOWLIST. Every plugin
warning + loaded-plugin line prints to stderr, mirroring the openclaw-
seam startup convention.

src/core/minions/protected-names.ts — subagent + subagent_aggregator
join the protected set. MCP submit_job returns permission_denied; only
trusted-CLI callers (with allowProtectedSubmit) can insert these rows.

src/cli.ts — adds 'agent' to CLI_ONLY + dispatches it like 'jobs'.

Test fallout: subagent-handler.test.ts + subagent-transcript.test.ts
helpers now submit under allowProtectedSubmit (they insert rows named
'subagent' directly against the queue). 23 new tests in agent-cli.test.ts
cover: flag parsing (including --detach implies !follow, --tools comma
split, -- terminator, unknown flag throw), --since parse (ISO, relative
5m/2h/1d, unparseable error), protected-name guard for all three names,
trusted-submit gate, and a fan-out integration check that verifies the
aggregator + children shape after --fanout-manifest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): rename max_children test's spawned jobs off the protected 'subagent' name

The spawn-storm test submitted 50 literal-string 'subagent' children to
exercise the max_children row-lock serialization. In v0.15 'subagent' is
a PROTECTED_JOB_NAME (CLI-only; trusted submit required), so the old
literal submission now throws before reaching the row-lock check.

The test is about max_children semantics, not the v0.15 subagent runtime
specifically — rename the child name to 'child_worker' so the test
exercises the exact same queue.add path without tripping the new guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): v0.15.0 — VERSION, CHANGELOG, README, upgrading-agents, CLAUDE.md

Bumps VERSION → 0.15.0 and package.json → 0.15.0 (resolves the pre-existing
drift — on master, VERSION=0.14.0 but package.json=0.13.1; src/version.ts
reads package.json, so this is what the binary prints now).

CHANGELOG lands the release-summary entry in the GStack voice + the full
itemized change list (11 new modules, 3 new tables, queue correctness
fixes, trust-model additions, 159 new unit tests). Voice rules respected
— no em dashes, no AI vocabulary, real file names + real numbers.

README gets a "Durable agents: `gbrain agent` (v0.15)" section next to
the Minions block, with the three canonical CLI shapes (single run,
fanout-manifest, logs --follow) and a pointer to plugin-authors.md.

docs/UPGRADING_DOWNSTREAM_AGENTS.md gets a full v0.15.0 section covering
the four adoption steps downstream agents (Wintermute and similar) need:
(1) worker opt-in via GBRAIN_ALLOW_LLM_JOBS, (2) moving custom subagent
defs to a plugin repo, (3) replacing ephemeral subagent runs with durable
`gbrain agent run`, (4) the put_page namespace rule for agent-driven writes.

CLAUDE.md updated with concise per-file descriptions for every new module:
the handler, aggregator, audit, rate-leases, wait-for-completion,
transcript, plugin-loader, brain-allowlist, tool-defs extraction, agent
CLI + logs CLI, and the registerBuiltinHandlers wiring for subagent
handlers + plugin-loader.

Verified: binary builds (940 modules, 89ms compile), prints `gbrain 0.15.0`,
`gbrain agent --help` shows the new subcommand shape. 170 new tests pass
(full v0.15 surface). Full unit suite passes bar one parallel-load
flake on a pre-existing E2E (graph-quality, passes in isolation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): drop GBRAIN_ALLOW_LLM_JOBS flag — subagent handlers always-on

The env flag was ceremony. Shell jobs need the flag because they execute
arbitrary CLI commands (RCE surface). Subagent jobs don't — they call the
Anthropic API with whatever ANTHROPIC_API_KEY is in env, so the key is
already the cost gate (no key → SDK fails on the first turn). And
who-can-submit is already protected by PROTECTED_JOB_NAMES +
TrustedSubmitOpts: MCP callers get permission_denied; only `gbrain agent
run` with allowProtectedSubmit can insert subagent / subagent_aggregator
rows. The flag added nothing the existing guards didn't already give us.

registerBuiltinHandlers now always registers subagent + subagent_aggregator
and loads GBRAIN_PLUGIN_PATH plugins. Worker startup prints:

  [minion worker] subagent handlers enabled

instead of the conditional enabled/disabled pair. Plugin discovery runs
unconditionally — empty PATH is a no-op.

README, CHANGELOG, docs/UPGRADING_DOWNSTREAM_AGENTS, CLAUDE.md, agent CLI
help text, and subagent handler docstring all updated to drop the flag
reference. Shell handler's GBRAIN_ALLOW_SHELL_JOBS gate is untouched —
separate concern (RCE, not billing).

Full suite: 1859 pass, 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: scrub private agent-fork name from all public artifacts

Enforces the rule added to CLAUDE.md (privacy section): never say
`Wintermute` in any CHANGELOG, README, doc, PR, or commit message.
Reader-facing copy says `your OpenClaw` (the term covers every
downstream OpenClaw deployment — Wintermute, Hermes, AlphaClaw — in
one umbrella the reader already recognizes). First-person /
origin-story copy says `Garry's OpenClaw` (honest that this is the
production deployment driving the feature, without exposing the
private agent's name).

Swept across:
  CHANGELOG.md (v0.15 entry + 4 historical mentions)
  README.md
  TODOS.md
  docs/UPGRADING_DOWNSTREAM_AGENTS.md
  docs/guides/plugin-authors.md (including example plugin names)
  docs/guides/plugin-handlers.md
  docs/guides/minions-fix.md
  docs/designs/KNOWLEDGE_RUNTIME.md (27 refs, mostly analytical)
  docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md
  skills/migrations/v0.11.0.md
  skills/skillpack-check/SKILL.md
  scripts/skillify-check.ts
  src/commands/doctor.ts
  src/commands/migrations/v0_15_0.ts
  src/commands/skillpack-check.ts
  src/core/enrichment/completeness.ts
  src/core/minions/plugin-loader.ts
  src/core/operations.ts
  src/core/output/scaffold.ts

Intentionally kept (these mentions define/test the rule itself):
  CLAUDE.md — the privacy rule section necessarily uses the literal
  name to define the restriction and examples
  test/plugin-loader.test.ts — fixture name in a plugin-loading test;
  renaming risks breaking assertion logic
  test/integrations.test.ts — the word appears in a privacy-regex
  test that explicitly enforces name redaction
  test/doctor-minions-check.test.ts — a comment referencing the rule
  CEO plan artifact at ~/.gstack/projects/… — private, not distributed

Binary builds (941 modules), 198/198 relevant tests pass, `gbrain --version`
prints `0.15.0`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore bun --compile artifacts with a glob, not specific hashes

Each `bun build --compile` emits a fresh hash-named `.*-*.bun-build` file
in cwd. The prior entries listed two specific hashes that were already
stale, so every build after those created a new untracked file requiring
manual cleanup.

Replace the two stale entries with `*.bun-build` so any current or future
compile artifact is ignored automatically.

Verified: ran `bun build --compile`, got two new `.*-*.bun-build` files,
`git status` stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): rename v0.15.0 → v0.16.0

gbrain master is at 0.14.2. Other 0.15.x PRs may land before/after
this one — we bump the minor (new capability) and lock to 0.16.0 so
ordering with concurrent work doesn't matter.

Touches:
- VERSION: 0.15.0 → 0.16.0
- package.json: 0.15.0 → 0.16.0
- Rename src/commands/migrations/v0_15_0.ts → v0_16_0.ts (+ all
  version strings inside + import in index.ts registry)
- Rename test/migrations-v0_15_0.test.ts → migrations-v0_16_0.test.ts
- test/apply-migrations.test.ts: skippedFuture lists now reference
  '0.16.0'
- test/put-page-namespace.test.ts + test/mcp-tool-defs.test.ts: Lane
  comment refs updated
- src/schema.sql + src/core/pglite-schema.ts: "v0.15.0" section
  comment updated; src/core/schema-embedded.ts regenerated
- CHANGELOG.md: top entry renamed to [0.16.0]; inline v0_15_0 /
  v0.15.0 refs swept
- docs/UPGRADING_DOWNSTREAM_AGENTS.md: section heading v0.15.0 → v0.16.0

Verified: `gbrain --version` prints 0.16.0, migration registry /
buildPlan / put_page / mcp-tool-defs / handlers tests all green
(49/49).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: reframe v0.16 durability headline around OpenClaw crashes

"Laptop closed mid-run" framing implied a consumer workflow. Real pain is
OpenClaw subagents dying daily on worker kill, memory blip, or timeout.
Headline + README copy match the body now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate llms-full.txt after README copy change

Regen drift guard caught the README edit from 83beec4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:14:17 -07:00

5.5 KiB

Minions fix — repairing a half-migrated install

tl;dr: on v0.11.1+ everything should self-heal. If Minions is partially set up (no ~/.gbrain/preferences.json, autopilot still inline, cron jobs still on agentTurn), run:

gbrain apply-migrations --yes

It's idempotent. On v0.11.1 installs that already migrated it's a cheap no-op.

Context

v0.11.0 shipped the Minions schema, queue, worker, and migration skill — but the migration skill itself never fired on upgrade. runPostUpgrade printed the feature pitch and stopped. v0.11.0 was never released publicly; v0.11.1 is the first public Minions ship and fixes the mega-bug (migration fires automatically on gbrain upgrade and via the postinstall hook).

If you're on a pre-v0.11.1 branch build (e.g. running the minions-jobs branch before v0.11.1 tagged), Minions may be installed but not wired: schema is v7, but no ~/.gbrain/preferences.json, autopilot still runs inline, cron jobs still call agentTurn.

This guide covers both paths: the canonical v0.11.1+ fix, and the stopgap for pre-v0.11.1 binaries that don't have apply-migrations.

Detecting the half-migrated state

gbrain doctor

If the install is half-migrated, you'll see:

[FAIL] minions_migration: MINIONS HALF-INSTALLED (partial migration: 0.11.0). Run: gbrain apply-migrations --yes

or

[FAIL] minions_config: MINIONS HALF-INSTALLED (schema v7+ but no ~/.gbrain/preferences.json). Run: gbrain apply-migrations --yes

For a machine-readable report (cron-friendly):

gbrain skillpack-check --quiet && echo healthy || echo needs_action
gbrain skillpack-check | jq -r '.actions[]'    # prints the exact commands to run

The fix (v0.11.1 or later)

gbrain apply-migrations --yes

Reads ~/.gbrain/migrations/completed.jsonl, diffs against the TS migration registry, runs whatever's pending. Seven phases:

A. Schema        gbrain init --migrate-only
B. Smoke         gbrain jobs smoke
C. Mode          prompt (or --yes default pain_triggered)
D. Prefs         write ~/.gbrain/preferences.json
E. Host          AGENTS.md marker injection + cron rewrites for gbrain
                 builtins; JSONL TODOs for host-specific handlers
F. Install       gbrain autopilot --install (env-aware)
G. Record        append completed.jsonl status:"complete"

If Phase E emits TODOs for host-specific handlers (e.g. your OpenClaw's ~29 non-gbrain crons), the migration finishes with status: "partial". Your host agent walks the TODOs using skills/migrations/v0.11.0.md + docs/guides/plugin-handlers.md, ships handler registrations in the host repo, then re-runs gbrain apply-migrations --yes. Newly registerable cron entries get rewritten and the JSONL rows mark status: "complete".

The stopgap (pre-v0.11.1 binary, no apply-migrations yet)

If you're stuck on a branch build that doesn't have apply-migrations:

curl -fsSL https://raw.githubusercontent.com/garrytan/gbrain/v0.11.1/scripts/fix-v0.11.0.sh | bash

This bash script does what apply-migrations does from a shell environment:

  1. gbrain init --migrate-only — schema v7.
  2. gbrain jobs smoke — verify Minions health.
  3. Prompt for minion_mode (defaults pain_triggered on non-TTY).
  4. Write ~/.gbrain/preferences.json atomically.
  5. Append ~/.gbrain/migrations/completed.jsonl with status: "partial" and apply_migrations_pending: true. That partial record is the signal to v0.11.1's apply-migrations to pick up remaining phases after the user upgrades.
  6. Detect host agent repos and PRINT rewrite instructions (never auto-edits from a curl-piped script).
  7. Print the next step: Run: gbrain autopilot --install.

Once v0.11.1 is installed, re-run gbrain apply-migrations --yes to finish the remaining phases (host rewrites + autopilot install). The stopgap's status: "partial" record is designed to resume cleanly (it doesn't poison the permanent migration path).

Verify the fix landed

# 1. Preferences exist and are readable
cat ~/.gbrain/preferences.json

# 2. Migration recorded
cat ~/.gbrain/migrations/completed.jsonl

# 3. Autopilot is supervising a Minions worker child
gbrain autopilot --status
ps aux | grep 'jobs work'

# 4. Jobs show up in the queue
gbrain jobs list

# 5. Any host-specific TODOs still pending
cat ~/.gbrain/migrations/pending-host-work.jsonl 2>/dev/null || echo "(none — all host work is done)"

# 6. Doctor + skillpack-check should both be clean
gbrain doctor
gbrain skillpack-check --quiet && echo ok

If the fix fails

Each phase is idempotent. Re-running is safe. Common failure modes:

  • Phase B smoke fails: the schema didn't apply. Check ~/.gbrain/config.json has a valid database_url (or database_path for PGLite). Run gbrain init --migrate-only directly and look at the error.
  • Phase F install fails: your host environment doesn't match any detected target. Pass --target <macos|linux-systemd|ephemeral-container|linux-cron> explicitly.
  • Pending host work never clears: your host agent hasn't shipped handler registrations yet. Read ~/.gbrain/migrations/pending-host-work.jsonl, open skills/migrations/v0.11.0.md, and follow the host-agent instruction manual.
  • skills/migrations/v0.11.0.md — full migration skill for host agents.
  • skills/skillpack-check/SKILL.md — when and how to run the health check.
  • docs/guides/plugin-handlers.md — plugin contract for host-specific handlers.
  • skills/conventions/cron-via-minions.md — the canonical cron rewrite pattern.