feat: v0.16.0 — durable agent runtime (gbrain agent + subagent handler + plugin loader) (#258)
* refactor(mcp): extract buildToolDefs helper for subagent tool registry reuse

The inline operations.map(...) block in src/mcp/server.ts became the only source of truth for agent-facing tool definitions. Extract it into a reusable exported helper so the v0.15 subagent tool registry can call it with a filtered OPERATIONS subset instead of duplicating the shape.

Byte-for-byte equivalence regression pinned in test/mcp-tool-defs.test.ts: the legacy inline mapping is kept verbatim inside the test so any future drift between the new helper and the pre-extraction MCP schema fails loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(operations): subagent-aware OperationContext + put_page namespace

Adds three optional fields to OperationContext:
- jobId?: number - the currently running Minion job id
- subagentId?: number - the owning subagent job id for tool-dispatched calls
- viaSubagent?: boolean - FAIL-CLOSED flag for agent-path gating

put_page now enforces a namespace rule when invoked on the subagent tool dispatch path (viaSubagent=true): writes MUST target `wiki/agents/<subagentId>/...`. Anchored, slash-boundary enforced so a collision like `wiki/agents/12evil/...` can't impersonate subagent 12. The check runs BEFORE the dry-run short-circuit so preview calls surface the same rejection. Fail-closed: a missing subagentId with viaSubagent=true rejects every slug rather than letting a dispatcher bug open a hole.

Existing callers unaffected: all three fields are optional and the legacy put_page behavior is unchanged when viaSubagent is undefined/false.
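A minimal sketch of the namespace rule described above, assuming a plain string-prefix check. `SubagentWriteCtx` and `checkSubagentSlug` are hypothetical names for illustration; the real check lives inside the put_page op and may be shaped differently.

```typescript
// Hypothetical names: SubagentWriteCtx and checkSubagentSlug are illustrative,
// not the identifiers used in the real put_page operation.
interface SubagentWriteCtx {
  viaSubagent?: boolean;
  subagentId?: number;
}

type SlugCheck = { ok: true } | { ok: false; code: "permission_denied" };

const DENIED: SlugCheck = { ok: false, code: "permission_denied" };

function checkSubagentSlug(slug: string, ctx: SubagentWriteCtx): SlugCheck {
  // Legacy callers (local CLI, MCP) never set viaSubagent, so they pass through.
  if (!ctx.viaSubagent) return { ok: true };

  // Fail closed: a missing or non-integer id (including NaN) rejects every slug.
  const id = ctx.subagentId;
  if (typeof id !== "number" || !Number.isInteger(id)) return DENIED;

  // The prefix ends in "/" so "wiki/agents/12evil/..." cannot match id 12.
  const prefix = `wiki/agents/${id}/`;
  // startsWith anchors at position 0 (a leading "/" fails); the length check
  // rejects the bare prefix with nothing after it.
  const inNamespace = slug.startsWith(prefix) && slug.length > prefix.length;
  return inNamespace ? { ok: true } : DENIED;
}
```

Returning a result object rather than throwing is a choice made for the sketch; it keeps the dry-run and real-write paths able to share one check.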
12 regression + namespace tests pin:
- local CLI writes (viaSubagent unset) accept arbitrary slugs
- MCP writes (remote=true, viaSubagent unset) accept arbitrary slugs
- subagent-path: anchored prefix accepted, wrong id rejected, prefix-collision defeated, leading-slash rejected, bare-prefix rejected, fail-closed on missing/NaN subagentId, permission_denied code emitted

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(schema): v0.15.0 subagent runtime tables + migration orchestrator

Adds three new tables for the durable LLM agent runtime:

subagent_messages - Anthropic message-block persistence. Parallel tool_use blocks in one assistant message live in content_blocks JSONB, not across rows (fixes the (job_id, turn_idx, role) misdesign codex caught in v0.13 drafting).

subagent_tool_executions - Two-phase tool ledger. INSERT pending before execute, UPDATE complete/failed after. Replay re-runs pending rows only if the tool is idempotent (v1 ships only idempotent tools so this is preventive).

subagent_rate_leases - Lease-based concurrency cap for outbound providers (e.g. anthropic:messages). Stale leases auto-prune on next acquire so crashed workers can't strand capacity.

All DDL uses CREATE TABLE/INDEX IF NOT EXISTS: order-independent vs PR #244's initSchema() reorder, and idempotent across fresh-install + upgrade paths. Shipped in both src/schema.sql (Postgres) and src/core/pglite-schema.ts (PGLite); schema-embedded.ts regenerated.

Migration orchestrator v0_15_0.ts (phases: schema → verify → record). v0_14_0.ts is a no-op stub so the registry's version sequence stays gapless (v0.14.0 shipped shell-jobs: code change, no DB migration).

10 unit tests for registry wiring, ordering, dry-run phase behavior, and schema-embedded table presence. test/apply-migrations.test.ts updated for the two new registry entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): emit child_done on every terminal + max_stalled per-job + terminal set fix

Three correctness fixes the v0.15 subagent aggregator spine depends on:

1. child_done emission on ALL terminal transitions, not just success.
   - completeJob already emitted on success; it now also tags outcome='complete'.
   - failJob newly emits on terminal 'failed' or 'dead' (outcome='failed'|'dead', error=<text>), BEFORE the parent-terminal UPDATE so the EXISTS guard on the inbox INSERT doesn't skip it on fail_parent paths (codex catch).
   - cancelJob now emits outcome='cancelled' per descendant with a parent.
   - handleTimeouts now emits outcome='timeout' per timed-out child.
   ChildDoneMessage gains optional { outcome, error }, which is backwards compatible (legacy writers omitted them; consumers treat absent outcome as 'complete').

2. Parent-resolution terminal set now includes 'failed'. Pre-v0.15 the `NOT EXISTS (... status NOT IN ('completed','dead','cancelled'))` guard treated a failed child as still-pending, stranding aggregator parents that chose on_child_fail='continue' or 'ignore' in waiting-children forever. Expanded to {completed, failed, dead, cancelled} everywhere parent resolution reads child status (completeJob inline, failJob remove_dep + continue, cancelJob sweep, handleTimeouts sweep, and the resolveParent method itself).

3. MinionJobInput.max_stalled threads through MinionQueue.add() on INSERT. Column exists with default 1, which means "first stall → dead" and defeats crash recovery for long-running handlers. Subagent children will set max_stalled: 3 to survive mid-run worker kills. Second-submitter under an idempotency-key hit does NOT mutate the existing row (codex-flagged footgun: first-submit options are load-bearing state).
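The first two fixes can be sketched as below, assuming the status and outcome strings quoted in this message. The `child_id` field and both helper names are hypothetical; only the optional `outcome` / `error` fields and the four-member terminal set come from the text above.

```typescript
type ChildOutcome = "complete" | "failed" | "dead" | "cancelled" | "timeout";

interface ChildDoneMessage {
  child_id: number;        // hypothetical field name, for illustration only
  outcome?: ChildOutcome;  // optional: legacy writers omit it
  error?: string;
}

// The expanded terminal set: 'failed' now counts as resolved.
const TERMINAL_STATUSES = new Set(["completed", "failed", "dead", "cancelled"]);

// A parent may leave waiting-children only when every child is terminal.
function allChildrenTerminal(childStatuses: string[]): boolean {
  return childStatuses.every((status) => TERMINAL_STATUSES.has(status));
}

// Consumers treat an absent outcome as 'complete' so legacy messages still parse.
function effectiveOutcome(msg: ChildDoneMessage): ChildOutcome {
  return msg.outcome ?? "complete";
}
```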
13 unit tests pin: emission on each of completeJob/failJob/cancelJob/handleTimeouts, insertion order on fail_parent, terminal-set expansion with continue policy, max_stalled default + override + idempotency behavior. E2E tier 1 (Postgres) passes 141 tests unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): rate-leases + waitForCompletion infra for v0.15 subagent

Two infrastructure modules the subagent handler spine depends on:

rate-leases.ts - lease-based concurrency cap for outbound providers (anthropic:messages, openai:*, etc.). Counter-based limiters leak capacity on worker crash; leases are owner-tagged rows with expires_at that auto-prune on the next acquire. Two-phase: txn-scoped pg_advisory_xact_lock guards the check-then-insert so concurrent acquires can't both win the "last slot". renewLeaseWithBackoff retries 3x (250/500/1000ms) for mid-call DB blips; on persistent failure the LLM-loop caller aborts with a renewable error so the worker re-claims and the rate invariant is preserved. Owner FK cascades clean up leases on job deletion.

wait-for-completion.ts - poll-until-terminal helper for CLI callers. Minions' NOTIFY is worker-side only; `gbrain agent run --follow` polls getJob() until status is {completed, failed, dead, cancelled}. TimeoutError carries jobId + elapsedMs and does NOT cancel the job, so the user can inspect via `gbrain jobs get <id>` later. Supports AbortSignal for Ctrl-C without throwing. Default pollMs is 1000 on Postgres, 250 on PGLite (inline CLI has no network RTT).

21 unit tests cover: single/multi acquire under cap, rejection past cap, release frees slot, different keys are independent, stale prune, cascade on owner delete, renew bumps expires_at, renew on missing is false, backoff path success + pruned short-circuit. waitForCompletion: fast-path terminal, transitions mid-wait (completed/failed/cancelled), TimeoutError shape, abort-signal early exit, non-existent job error.
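To illustrate why leases do not leak capacity the way counters do, here is a single-process, in-memory sketch. It is an assumption-laden stand-in: the real module stores leases as Postgres rows and serializes check-then-insert with pg_advisory_xact_lock, neither of which is modeled here. `InMemoryLeases` and its method signatures are hypothetical.

```typescript
interface Lease {
  key: string;        // provider key, e.g. "anthropic:messages"
  ownerJobId: number; // owner tag
  expiresAt: number;  // epoch milliseconds
}

// Hypothetical in-memory stand-in for the lease table.
class InMemoryLeases {
  private leases: Lease[] = [];

  acquire(key: string, ownerJobId: number, maxConcurrent: number, ttlMs: number, now: number): boolean {
    // Prune stale leases first so a crashed owner cannot strand capacity.
    this.leases = this.leases.filter((l) => l.expiresAt > now);
    const active = this.leases.filter((l) => l.key === key).length;
    if (active >= maxConcurrent) return false;
    this.leases.push({ key, ownerJobId, expiresAt: now + ttlMs });
    return true;
  }

  // Returns false when the lease is missing or already expired.
  renew(key: string, ownerJobId: number, ttlMs: number, now: number): boolean {
    const lease = this.leases.find(
      (l) => l.key === key && l.ownerJobId === ownerJobId && l.expiresAt > now,
    );
    if (!lease) return false;
    lease.expiresAt = now + ttlMs;
    return true;
  }

  release(key: string, ownerJobId: number): void {
    this.leases = this.leases.filter((l) => !(l.key === key && l.ownerJobId === ownerJobId));
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; a counter-based limiter has no equivalent of the prune step, which is the point of the design.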
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent ToolDef types + brain-tool registry (v0.15)

Types first so the handler has a stable contract:
- SubagentHandlerData / AggregatorHandlerData - the two job.data shapes
- ToolCtx (engine, jobId, remote, signal) + ToolDef (name, description, input_schema, idempotent, execute) - Anthropic-envelope, distinct from the MCP McpToolDef extraction landed earlier
- ContentBlock discriminated union for subagent_messages.content_blocks
- SubagentStopReason + SubagentResult emitted on terminal completion

brain-allowlist.ts derives one ToolDef per allow-listed OPERATION. Reuses the ParamDef → JSONSchema shape from the MCP extraction in a local helper (Anthropic's input_schema field diverges from MCP's inputSchema by a character). The 11-name allow-list is read-safe + put_page: every destructive / filesystem / identity-mutating op stays off by default.

put_page gets a namespace-wrapped tool schema: `slug` pattern = anchored `^wiki/agents/<subagentId>/.+`. The server-side check in put_page op (shipped in prior commit) is still the authoritative gate; the schema just helps the model write correct slugs first-try. `subagentId` is plumbed into the ToolCtx so the viaSubagent=true fail-closed path lights up on every tool-dispatched put_page.

filterAllowedTools narrows a registry by subagent_def's allowed_tools frontmatter field. Rejects unknown names at load time (no silent drop: typos in a skills/subagents/*.md would otherwise ship to prod with a tool silently missing).

18 tests pin: every allowlist name exists in OPERATIONS (catches upstream rename), Anthropic name regex, put_page namespace pattern per-subagent, execute() routes through the op handler with viaSubagent=true, out-of-namespace put_page throws permission_denied, filter passes prefixed + unprefixed names, rejects unknowns, deduplicates.
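A rough sketch of the reject-unknown-and-deduplicate behavior described for filterAllowedTools. The `ToolDefSketch` type is trimmed to two fields, the function name carries a `Sketch` suffix to mark it as illustrative, and `get_page` is a placeholder tool name; prefixed-name handling is not modeled.

```typescript
// Trimmed, hypothetical stand-in for the real ToolDef (which also carries
// description, input_schema, and execute).
interface ToolDefSketch {
  name: string;
  idempotent: boolean;
}

function filterAllowedToolsSketch(registry: ToolDefSketch[], allowed: string[]): ToolDefSketch[] {
  const byName = new Map<string, ToolDefSketch>();
  for (const tool of registry) byName.set(tool.name, tool);

  // Reject unknown names loudly instead of silently dropping them.
  const unknown = allowed.filter((name) => !byName.has(name));
  if (unknown.length > 0) {
    throw new Error(`unknown tool name(s): ${unknown.join(", ")}`);
  }

  // Deduplicate while keeping first-seen order.
  const unique = Array.from(new Set(allowed));
  return unique.map((name) => byName.get(name) as ToolDefSketch);
}
```

Throwing at load time matches the stated intent: a typo in `allowed_tools` surfaces when the def is loaded, not when the tool fails to fire.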
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent-audit JSONL + transcript renderer

Two small plumbing pieces the v0.15 subagent handler + `gbrain agent logs` depend on:

subagent-audit.ts - JSONL-rotated audit log mirroring the shell-audit pattern. Two event flavors: submission (one line per job submit) and heartbeat (one line per turn boundary: llm_call_started / completed / tool_called / tool_result / tool_failed). Heartbeats fix the "--follow on a long Anthropic call shows nothing for 30 seconds" problem codex flagged. Never logs prompts or tool inputs (PII risk: subagent input_vars may carry user-supplied free text); DOES log tokens, ms_elapsed, tool_name, first 200 chars of error text. Rotates weekly via ISO week. `readSubagentAuditForJob` is the readback path for `gbrain agent logs`; it scans the current + prior week file so job boundaries across weeks still resolve. `GBRAIN_AUDIT_DIR` overrides the default ~/.gbrain/audit/ for container deploys.

transcript.ts - renders subagent_messages + subagent_tool_executions to markdown. Message order is authoritative; tool rows splice under their owning assistant tool_use by tool_use_id. Handles text, tool_use (with pending / complete / failed execution rows), tool_result (skipped if we already rendered the owning tool_use, which avoids double-printing), and unknown block types (fenced JSON dump for diagnostics). Output is UTF-8-safe truncated at maxOutputBytes.

21 unit tests: ISO week filename rotation (incl. 2027-01-01 → W53-2026 boundary), submission + heartbeat write shapes, 200-char error cap, best-effort write failure doesn't throw, readback filters by job_id and sinceIso. Transcript: empty input, ordering, token line, tool_use + complete/failed/pending execution rendering, truncation, unknown-block diagnostic dump.
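The ISO-week rotation, including the year-boundary case named in the tests, can be sketched with the standard "Thursday of the same week" calculation. The file name layout below is an assumption (the text only shows the `subagent-jobs-*.jsonl` glob), and `isoWeek` / `auditFileName` are hypothetical helper names.

```typescript
// ISO 8601 week number: a week belongs to the year that contains its Thursday.
function isoWeek(date: Date): { year: number; week: number } {
  const t = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = t.getUTCDay() || 7;            // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - dayOfWeek);    // move to that week's Thursday
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const dayOfYear = (t.getTime() - yearStart) / 86400000 + 1;
  return { year: t.getUTCFullYear(), week: Math.ceil(dayOfYear / 7) };
}

// Assumed file name layout; the real module may order the parts differently.
function auditFileName(date: Date): string {
  const { year, week } = isoWeek(date);
  const ww = week < 10 ? `0${week}` : String(week);
  return `subagent-jobs-${year}-W${ww}.jsonl`;
}
```

2027-01-01 is a Friday, so its week's Thursday is 2026-12-31 and the entry lands in week 53 of 2026, which is the boundary the tests pin.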
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent LLM-loop handler with crash-resumable replay

The main event: runs one Anthropic Messages API conversation with tool use, persists every turn + tool execution, and resumes cleanly after a worker kill anywhere in the loop.

Design points that carry the v0.15 guarantees:

1. Two-phase tool persistence. INSERT status='pending' before dispatch, UPDATE to 'complete' or 'failed' after. subagent_messages rows are the canonical conversation; subagent_tool_executions rows are the canonical "did this tool run + what did it return". Either DB commit is atomic, so replay has a single source of truth.

2. Replay reconciliation. If the last persisted message is an assistant with tool_use blocks AND no following synthesized user message, we crashed mid-dispatch. On resume, finish those tools first (respecting idempotent flag for 'pending' rows), synthesize the user turn, and THEN call the LLM again. Non-idempotent pending rows abort the job with a clear error; v0.15 ships only idempotent tools so this is preventive.

3. Rate lease around every LLM call. acquireLease before, releaseLease after (both success and error paths). acquired=false throws RateLeaseUnavailableError; the worker treats it as a renewable error and re-claims later, so a temporary capacity cap doesn't fail the job terminally.

4. Anthropic prompt caching. system block gets cache_control=ephemeral; the LAST tool def gets it too (Anthropic caches everything up to and including the marked block). ~10x cost reduction on multi-turn agents per the plan.

5. Dual-signal abort. AbortSignal.any merges ctx.signal (timeout / lock loss / cancel) with ctx.shutdownSignal (worker SIGTERM). Both feed the Anthropic call's AbortSignal; mid-turn abort bails before the next LLM call with whatever turns are already persisted. Node ≥ 20 has AbortSignal.any; older runtimes get a manual-merge polyfill.

6. Injectable Anthropic client.
The real SDK implements MessagesClient structurally; tests inject a FakeMessagesClient that scripts responses.

12 unit tests pin: no-tool happy path, single tool_use complete, tool throws → failed row + loop continues, unknown tool name rejection, max_turns cap, crash-then-resume with partial state, replay skips already-complete tool execs without re-invoking execute, non-idempotent pending rejects on resume, lease acquire + release roundtrip, RateLeaseUnavailable under cap-full, missing prompt validation, allowed_tools unknown-name.

NOT in v0.15: refusal detection (stop_reason + content shape), stop_reason=max_tokens partial recovery, mid-call lease renewal with backoff loop. All three are documented as P2 items in the plan file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent_aggregator handler with mixed-outcome rendering

Claims AFTER all subagent children resolve. By then Lane 1B's queue changes have posted one child_done message per terminal transition into this job's inbox (complete / failed / dead / cancelled / timeout). The aggregator reads those, builds a deterministic markdown summary, and returns it as the handler result. Not an LLM call in v0.15: output is reproducible concatenation so fan-out runs stay comparable. v0.16+ can add an LLM synthesis pass behind an opt-in flag.
Contract:
- empty children_ids → `(no children)` marker
- missing child_done (shouldn't happen under v0.15 invariants but possible if a terminal-state path slipped past Lane 1B) → counted as failed with "no child_done message observed" error
- non-complete outcomes: result is null in the output so no payload leaks alongside a failure label
- children appear in the order children_ids was supplied
- custom aggregate_prompt_template replaces the markdown header

13 unit tests cover: empty input, all-success, mixed outcomes, result suppression on failure, missing child_done handling, order preservation, custom template, progress + log emission, stringified JSONB payload parsing, non-child_done inbox filtering, legacy-writer outcome fallback, and internal helper edges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): GBRAIN_PLUGIN_PATH loader + plugin-authors guide (v0.15)

Plumbing that makes Wintermute (and future downstream agents) day-1 usable on v0.15. Host repos drop a `gbrain.plugin.json` + `subagents/` directory somewhere, set GBRAIN_PLUGIN_PATH (colon-separated like $PATH), and their custom subagent defs load at worker startup.

Path policy is strict: absolute paths only. Relative, ~-prefixed, and URL-style (https://, file://) all rejected with warnings; the user controls where plugins live. Non-existent paths and files (not dirs) are warned and skipped so a typo doesn't crash worker startup.

Collision policy: left-wins. If two plugins ship a subagent with the same name, the first one in GBRAIN_PLUGIN_PATH keeps it and the other gets a warning naming both sources. Deterministic + debuggable.

Trust policy: plugins ship subagent defs ONLY. Cannot declare new tools, cannot extend the brain allow-list, cannot override safety flags.
The subagent def's `allowed_tools:` frontmatter MUST subset the derived registry. Validation happens at load time (worker startup), not at dispatch time, so a typo in a skill gives a loud startup error instead of silently "tool never fires at 3am."

Manifest `plugin_version: "gbrain-plugin-v1"` locks the contract. Unknown versions rejected. `subagents` field escape attempts (`../../../etc` etc) rejected. gray-matter handles the markdown frontmatter parse; subagent defs don't conform to the page schema, so we don't use parseMarkdown.

docs/guides/plugin-authors.md is the Wintermute-facing walkthrough. Covers the minimum viable plugin shape, the three policies, the frontmatter fields, known caveats (audit JSONL is local-only, tool calls always run remote=true, put_page is namespace-scoped).

22 unit tests pin path rejection, missing/invalid manifest, unsupported version, escape-attempt, basename fallback for missing frontmatter.name, allowed_tools round-trip, unknown-tool rejection with validAgentToolNames, empty env, multi-path, collision warning with left-wins, trimmed paths, manifest-rejection as warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): gbrain agent run + logs + worker registration (v0.15 Lane 4H)

Three integration seams wired:

src/commands/agent.ts - `gbrain agent run`. Submits subagent jobs (or a fan-out of N + aggregator) under the trusted-submit flag so the PROTECTED_JOB_NAMES guard doesn't reject. Fan-out path creates the aggregator first (so children can reference its id as parent), submits each child with on_child_fail='continue' (required by Lane 1B's terminal-set + child_done machinery), then jsonb_set's the aggregator's children_ids. Short-circuits a 1-entry manifest to a single subagent with no aggregator. Follow mode runs agent-logs streaming + waitForCompletion in parallel and exits on terminal status; detach prints the job id and exits.
Ctrl-C is handled as detach, not cancel: the job keeps running, consistent with durability invariants.

src/commands/agent-logs.ts - `gbrain agent logs`. Merges ~/.gbrain/audit/subagent-jobs-*.jsonl (heartbeats + submissions) with subagent_messages (persisted conversation) in one chronological stream. --follow polls at 1s and exits when the job hits terminal. --since accepts ISO-8601 OR relative shorthand (5m / 1h / 2d). Writes transcript tail (full message + tool tree) only for terminal jobs, so mid-run --follow doesn't spam a half-rendered transcript.

src/commands/jobs.ts registerBuiltinHandlers - matches the shell-handler opt-in shape. GBRAIN_ALLOW_LLM_JOBS=1 registers the subagent + subagent_aggregator handlers, then loads plugins from GBRAIN_PLUGIN_PATH with validAgentToolNames pulled from BRAIN_TOOL_ALLOWLIST. Every plugin warning + loaded-plugin line prints to stderr, mirroring the openclaw-seam startup convention.

src/core/minions/protected-names.ts - subagent + subagent_aggregator join the protected set. MCP submit_job returns permission_denied; only trusted-CLI callers (with allowProtectedSubmit) can insert these rows.

src/cli.ts - adds 'agent' to CLI_ONLY + dispatches it like 'jobs'.

Test fallout: subagent-handler.test.ts + subagent-transcript.test.ts helpers now submit under allowProtectedSubmit (they insert rows named 'subagent' directly against the queue).

23 new tests in agent-cli.test.ts cover: flag parsing (including --detach implies !follow, --tools comma split, -- terminator, unknown flag throw), --since parse (ISO, relative 5m/2h/1d, unparseable error), protected-name guard for all three names, trusted-submit gate, and a fan-out integration check that verifies the aggregator + children shape after --fanout-manifest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): rename max_children test's spawned jobs off the protected 'subagent' name

The spawn-storm test submitted 50 literal-string 'subagent' children to exercise the max_children row-lock serialization. In v0.15 'subagent' is a PROTECTED_JOB_NAME (CLI-only; trusted submit required), so the old literal submission now throws before reaching the row-lock check. The test is about max_children semantics, not the v0.15 subagent runtime specifically, so rename the child name to 'child_worker' and the test exercises the exact same queue.add path without tripping the new guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): v0.15.0 - VERSION, CHANGELOG, README, upgrading-agents, CLAUDE.md

Bumps VERSION → 0.15.0 and package.json → 0.15.0 (resolves the pre-existing drift: on master, VERSION=0.14.0 but package.json=0.13.1; src/version.ts reads package.json, so this is what the binary prints now).

CHANGELOG lands the release-summary entry in the GStack voice + the full itemized change list (11 new modules, 3 new tables, queue correctness fixes, trust-model additions, 159 new unit tests). Voice rules respected: no em dashes, no AI vocabulary, real file names + real numbers.

README gets a "Durable agents: `gbrain agent` (v0.15)" section next to the Minions block, with the three canonical CLI shapes (single run, fanout-manifest, logs --follow) and a pointer to plugin-authors.md.

docs/UPGRADING_DOWNSTREAM_AGENTS.md gets a full v0.15.0 section covering the four adoption steps downstream agents (Wintermute and similar) need:
(1) worker opt-in via GBRAIN_ALLOW_LLM_JOBS,
(2) moving custom subagent defs to a plugin repo,
(3) replacing ephemeral subagent runs with durable `gbrain agent run`,
(4) the put_page namespace rule for agent-driven writes.
CLAUDE.md updated with concise per-file descriptions for every new module: the handler, aggregator, audit, rate-leases, wait-for-completion, transcript, plugin-loader, brain-allowlist, tool-defs extraction, agent CLI + logs CLI, and the registerBuiltinHandlers wiring for subagent handlers + plugin-loader.

Verified: binary builds (940 modules, 89ms compile), prints `gbrain 0.15.0`, `gbrain agent --help` shows the new subcommand shape. 170 new tests pass (full v0.15 surface). Full unit suite passes bar one parallel-load flake on a pre-existing E2E (graph-quality, passes in isolation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): drop GBRAIN_ALLOW_LLM_JOBS flag - subagent handlers always-on

The env flag was ceremony. Shell jobs need the flag because they execute arbitrary CLI commands (RCE surface). Subagent jobs don't: they call the Anthropic API with whatever ANTHROPIC_API_KEY is in env, so the key is already the cost gate (no key → SDK fails on the first turn). And who-can-submit is already protected by PROTECTED_JOB_NAMES + TrustedSubmitOpts: MCP callers get permission_denied; only `gbrain agent run` with allowProtectedSubmit can insert subagent / subagent_aggregator rows. The flag added nothing the existing guards didn't already give us.

registerBuiltinHandlers now always registers subagent + subagent_aggregator and loads GBRAIN_PLUGIN_PATH plugins. Worker startup prints:

    [minion worker] subagent handlers enabled

instead of the conditional enabled/disabled pair. Plugin discovery runs unconditionally; empty PATH is a no-op.

README, CHANGELOG, docs/UPGRADING_DOWNSTREAM_AGENTS, CLAUDE.md, agent CLI help text, and subagent handler docstring all updated to drop the flag reference. Shell handler's GBRAIN_ALLOW_SHELL_JOBS gate is untouched; that is a separate concern (RCE, not billing).

Full suite: 1859 pass, 0 fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: scrub private agent-fork name from all public artifacts

Enforces the rule added to CLAUDE.md (privacy section): never say `Wintermute` in any CHANGELOG, README, doc, PR, or commit message. Reader-facing copy says `your OpenClaw` (the term covers every downstream OpenClaw deployment, including Wintermute, Hermes, and AlphaClaw, in one umbrella the reader already recognizes). First-person / origin-story copy says `Garry's OpenClaw` (honest that this is the production deployment driving the feature, without exposing the private agent's name).

Swept across:
- CHANGELOG.md (v0.15 entry + 4 historical mentions)
- README.md
- TODOS.md
- docs/UPGRADING_DOWNSTREAM_AGENTS.md
- docs/guides/plugin-authors.md (including example plugin names)
- docs/guides/plugin-handlers.md
- docs/guides/minions-fix.md
- docs/designs/KNOWLEDGE_RUNTIME.md (27 refs, mostly analytical)
- docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md
- skills/migrations/v0.11.0.md
- skills/skillpack-check/SKILL.md
- scripts/skillify-check.ts
- src/commands/doctor.ts
- src/commands/migrations/v0_15_0.ts
- src/commands/skillpack-check.ts
- src/core/enrichment/completeness.ts
- src/core/minions/plugin-loader.ts
- src/core/operations.ts
- src/core/output/scaffold.ts

Intentionally kept (these mentions define/test the rule itself):
- CLAUDE.md: the privacy rule section necessarily uses the literal name to define the restriction and examples
- test/plugin-loader.test.ts: fixture name in a plugin-loading test; renaming risks breaking assertion logic
- test/integrations.test.ts: the word appears in a privacy-regex test that explicitly enforces name redaction
- test/doctor-minions-check.test.ts: a comment referencing the rule
- CEO plan artifact at ~/.gstack/projects/…: private, not distributed

Binary builds (941 modules), 198/198 relevant tests pass, `gbrain --version` prints `0.15.0`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore bun --compile artifacts with a glob, not specific hashes

Each `bun build --compile` emits a fresh hash-named `.*-*.bun-build` file in cwd. The prior entries listed two specific hashes that were already stale, so every build after those created a new untracked file requiring manual cleanup. Replace the two stale entries with `*.bun-build` so any current or future compile artifact is ignored automatically.

Verified: ran `bun build --compile`, got two new `.*-*.bun-build` files, `git status` stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): rename v0.15.0 → v0.16.0

gbrain master is at 0.14.2. Other 0.15.x PRs may land before/after this one, so we bump the minor (new capability) and lock to 0.16.0 so ordering with concurrent work doesn't matter.

Touches:
- VERSION: 0.15.0 → 0.16.0
- package.json: 0.15.0 → 0.16.0
- Rename src/commands/migrations/v0_15_0.ts → v0_16_0.ts (+ all version strings inside + import in index.ts registry)
- Rename test/migrations-v0_15_0.test.ts → migrations-v0_16_0.test.ts
- test/apply-migrations.test.ts: skippedFuture lists now reference '0.16.0'
- test/put-page-namespace.test.ts + test/mcp-tool-defs.test.ts: Lane comment refs updated
- src/schema.sql + src/core/pglite-schema.ts: "v0.15.0" section comment updated; src/core/schema-embedded.ts regenerated
- CHANGELOG.md: top entry renamed to [0.16.0]; inline v0_15_0 / v0.15.0 refs swept
- docs/UPGRADING_DOWNSTREAM_AGENTS.md: section heading v0.15.0 → v0.16.0

Verified: `gbrain --version` prints 0.16.0, migration registry / buildPlan / put_page / mcp-tool-defs / handlers tests all green (49/49).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: reframe v0.16 durability headline around OpenClaw crashes

"Laptop closed mid-run" framing implied a consumer workflow.
Real pain is OpenClaw subagents dying daily on worker kill, memory blip, or timeout. Headline + README copy match the body now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate llms-full.txt after README copy change

Regen drift guard caught the README edit from 83beec4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5
.gitignore
vendored
@@ -5,8 +5,9 @@ bin/
 .env
 .env.*
 !.env.*.example
-.18a49dfd730ff378-00000000.bun-build
-.18a49f9dfb996f70-00000000.bun-build
+# Bun --compile temp artifacts. Each build emits a new hash-named .bun-build
+# file in cwd; glob catches all of them.
+*.bun-build
 .gstack/
 supabase/.temp/
 .claude/skills/
86
CHANGELOG.md
@@ -2,6 +2,82 @@

All notable changes to GBrain will be documented in this file.

## [0.16.0] - 2026-04-20

## **Durable agents land. Your LLM loops survive crashes, timeouts, and worker restarts now.**
## **OpenClaw died mid-run? Come back, resume from the last committed turn.**

Your OpenClaw crashes daily. Not "sometimes." Daily. An 8-turn OpenClaw subagent fires a tool call, the worker dies on a memory blip, all eight turns of context are gone, and there's nothing to do but start over from turn zero. This release kills that. `gbrain agent run` submits an Anthropic Messages API conversation as a first-class Minion job: every turn persists to `subagent_messages`, every tool call is a two-phase ledger row (`pending` → `complete | failed`), and replay on worker restart picks up from exactly the last committed turn. Crash-safe by construction, not by hope.

Fan-out works the same way. `--fanout-manifest` splits N prompts across N subagent children plus one aggregator. Children run `on_child_fail: 'continue'` so one failing run doesn't cascade, and the aggregator claims after all children reach ANY terminal state (complete, failed, dead, cancelled, timeout) and writes a mixed-outcome summary. No polling loop, no dead parents stranded in `waiting-children`.
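A rough sketch of how a mixed-outcome summary could be built deterministically from per-child messages. The line format, the `child_id` field, and the `summarizeChildren` name are assumptions for illustration; the shipped aggregator's markdown layout is not shown in this entry.

```typescript
type Outcome = "complete" | "failed" | "dead" | "cancelled" | "timeout";

interface ChildDone {
  child_id: number;   // hypothetical field name
  outcome?: Outcome;  // absent on legacy messages, treated as 'complete'
  result?: unknown;
  error?: string;
}

function summarizeChildren(childrenIds: number[], inbox: ChildDone[]): string {
  if (childrenIds.length === 0) return "(no children)";

  const byId = new Map<number, ChildDone>();
  for (const msg of inbox) byId.set(msg.child_id, msg);

  // Iterate in the order the ids were supplied so output is reproducible.
  const lines = childrenIds.map((id) => {
    const msg = byId.get(id);
    if (!msg) return `- child ${id}: failed (no child_done message observed)`;
    const outcome = msg.outcome ?? "complete";
    if (outcome !== "complete") {
      // Suppress the result so no payload leaks alongside a failure label.
      return `- child ${id}: ${outcome}${msg.error ? ` (${msg.error})` : ""}`;
    }
    return `- child ${id}: complete: ${JSON.stringify(msg.result ?? null)}`;
  });
  return lines.join("\n");
}
```

Because the output is plain concatenation with no LLM call, two runs over the same inbox produce the same text.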

Plugins work. Host repos drop a `gbrain.plugin.json` + `subagents/*.md` dir somewhere on `GBRAIN_PLUGIN_PATH`, and their custom subagent defs load at worker startup. Your OpenClaw ships its meeting-ingestion, signal-detector, and daily-task-prep subagents in its own repo now; gbrain discovers them day one. Collision rule is deterministic (left-wins with a loud warning). Trust boundary is strict on purpose: plugins ship DEFS, not tools. Tool allow-list stays here.
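A sketch of the left-wins collision rule, plus a path check assuming POSIX-style paths and the absolute-only policy from the loader's commit notes. `classifyPluginPathEntry`, `resolveLeftWins`, and the verdict labels are hypothetical; the sketch classifies one already-split entry and does not touch the filesystem.

```typescript
type PathVerdict = "ok" | "relative" | "home-prefixed" | "url";

// Classifies a single GBRAIN_PLUGIN_PATH entry. POSIX-only assumption:
// "absolute" means the entry starts with "/".
function classifyPluginPathEntry(entry: string): PathVerdict {
  const p = entry.trim();
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(p)) return "url";
  if (p.startsWith("~")) return "home-prefixed";
  if (!p.startsWith("/")) return "relative";
  return "ok";
}

interface LoadedPlugin {
  source: string;       // plugin directory, in GBRAIN_PLUGIN_PATH order
  subagents: string[];  // subagent def names the plugin ships
}

// Left-wins: the first plugin to claim a name keeps it; later claims only warn.
function resolveLeftWins(plugins: LoadedPlugin[]): { owners: Map<string, string>; warnings: string[] } {
  const owners = new Map<string, string>();
  const warnings: string[] = [];
  for (const plugin of plugins) {
    for (const name of plugin.subagents) {
      const existing = owners.get(name);
      if (existing !== undefined) {
        warnings.push(`subagent "${name}" from ${plugin.source} ignored; already provided by ${existing}`);
      } else {
        owners.set(name, plugin.source);
      }
    }
  }
  return { owners, warnings };
}
```

Naming both sources in the warning is what makes a collision debuggable from the startup log alone.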
### The numbers that matter

Measured on the v0.15 branch against real Postgres via `bun run test:e2e`, plus the 159 new unit tests across 10 new test files. Coverage: 12 new runtime modules, 53+ code paths + user flows traced, 3 critical regression tests for the shell-jobs queue surface.

| Metric | BEFORE v0.15 | AFTER v0.15 | Δ |
|---|---|---|---|
| Your OpenClaw run survives worker kill mid-tool-call | No (start over) | Yes (resume from last committed turn) | crash-recovery unlocked |
| Fan-out run with 1 failed child out of N | Aggregator fails | Aggregator still claims + summarizes | mixed-outcome aggregation works |
| `gbrain agent logs --follow` during long Anthropic call | Silent (looks frozen) | Heartbeat line per turn boundary | visible progress |
| Tool-use replay on resume | N/A (no resume) | Idempotent re-run, non-idempotent aborts | two-phase protocol |
| `put_page` exposure to agent-driven writes | Full write surface | Namespace-scoped `wiki/agents/<id>/…` | fail-closed, server-enforced |
| Plugin subagent defs for downstream hosts | Not supported | `GBRAIN_PLUGIN_PATH` + validated at startup | OpenClaw day-1 usable |
| Rate-lease capacity leaks on worker crash | Counter-based (leaks) | Lease-based (auto-prune on next acquire) | no starvation after SIGKILL |
| Anthropic prompt cache on 40-turn agent | Per-turn cold | `cache_control: ephemeral` on system + tools | ~10x cost reduction (best-case) |
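The prompt-cache row can be illustrated with a small request-shaping helper: mark the system block and the last tool definition with `cache_control: { type: "ephemeral" }`. The interfaces are local, simplified stand-ins for the Messages API request shape, `withPromptCaching` is a hypothetical name, and no SDK call is made.

```typescript
interface CacheControl { type: "ephemeral" }

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolParam {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: CacheControl;
}

// Returns the `system` and `tools` request fields with cache markers applied.
// Only the LAST tool is marked; the cached prefix extends up to and including
// the marked block, so earlier tools are covered by it.
function withPromptCaching(systemText: string, tools: ToolParam[]): { system: SystemBlock[]; tools: ToolParam[] } {
  const system: SystemBlock[] = [
    { type: "text", text: systemText, cache_control: { type: "ephemeral" } },
  ];
  const lastIndex = tools.length - 1;
  const marked = tools.map((tool, i): ToolParam =>
    i === lastIndex ? { ...tool, cache_control: { type: "ephemeral" } } : tool,
  );
  return { system, tools: marked };
}
```

The helper copies the last tool instead of mutating it, so a shared tool registry stays unmarked between jobs.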
|
||||
|
||||
### What this means for your OpenClaw
|
||||
|
||||
You stop rerunning from zero. A crash at 3am that used to lose two hours of turns now costs you whatever fraction of one turn was in-flight when the worker died. The rest of the conversation is rows in `subagent_messages` and `subagent_tool_executions`, and the next worker claim replays from there. `gbrain agent logs <job>` shows you where it died, which tool it was running, and what came back from the last successful call. Real debugging, not guessing.
|
||||
|
||||
Credit: shell-jobs (v0.14) established every pattern v0.15 reuses — handler signature, dual-signal abort, ctx.updateTokens, protected-names, trusted-submit, JSONL audit log, timeout_ms. Codex caught the Mode A "transparent Agent() interception" impossibility during plan review and saved the shape of this work. The v0.15 handler is what survives on the other side of that review.
|
||||
|
||||
### Itemized changes
|
||||
|
||||
**New capability: `gbrain agent` CLI**
|
||||
- `gbrain agent run <prompt> [--subagent-def|--model|--max-turns|--tools|--timeout-ms|--fanout-manifest|--follow|--detach]` — submits a subagent job (or fan-out of N subagents + aggregator) under the trusted-submit flag. Follow mode tails status + logs until terminal; detach prints the job id and exits. Ctrl-C detaches (job keeps running), does not cancel.
|
||||
- `gbrain agent logs <job_id> [--follow] [--since ISO-or-relative]` — merges the JSONL heartbeat audit with persisted `subagent_messages` into one chronological timeline. `--since 5m` / `1h` / `2d` shorthand supported. Transcript tail renders the full message + tool tree only after the job is terminal.
|
||||
- Always registered on the worker (no separate env flag). `ANTHROPIC_API_KEY` is the natural cost gate — no key, the SDK call fails immediately. Who-can-submit is already gated by `PROTECTED_JOB_NAMES` + `TrustedSubmitOpts` so only the trusted-CLI path can insert `subagent` / `subagent_aggregator` rows.
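
The `--since` shorthand above can be read as "now minus N units". A minimal sketch, assuming only the three suffixes shown (`m`, `h`, `d`) and falling back to `Date.parse` for ISO input; `parseSinceSketch` is a hypothetical name and is not the real `parseSince` in `src/commands/agent-logs.ts`:

```typescript
// Hypothetical sketch of a --since parser; not the actual gbrain implementation.
const UNIT_MS = {
  m: 60_000,     // minutes
  h: 3_600_000,  // hours
  d: 86_400_000, // days
};

function parseSinceSketch(input: string, now: Date = new Date()): Date {
  // Relative form: an integer followed by one unit letter, e.g. "5m".
  const rel = /^(\d+)([mhd])$/.exec(input.trim());
  if (rel) {
    const amount = Number(rel[1]);
    const unitMs = UNIT_MS[rel[2] as "m" | "h" | "d"];
    return new Date(now.getTime() - amount * unitMs);
  }
  // Otherwise treat the input as an absolute timestamp.
  const ms = Date.parse(input);
  if (Number.isNaN(ms)) {
    throw new Error(`unrecognized --since value: ${input}`);
  }
  return new Date(ms);
}
```

Passing `now` explicitly keeps the function deterministic in tests.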
|
||||
|
||||
**New durability primitives**
|
||||
- `src/core/minions/handlers/subagent.ts` — the LLM-loop handler. Two-phase tool persistence, replay reconciliation for mid-dispatch crashes, dual-signal abort (`ctx.signal` + `ctx.shutdownSignal`), Anthropic prompt caching on system + tool defs, injectable `MessagesClient` for mocking.
|
||||
- `src/core/minions/handlers/subagent-aggregator.ts` — claims AFTER all children resolve (Lane 1B's queue changes guarantee each terminal child posts a `child_done` inbox message), produces deterministic mixed-outcome markdown summary.
|
||||
- `src/core/minions/rate-leases.ts` — lease-based concurrency cap for outbound providers. Owner-tagged rows with `expires_at` auto-prune on acquire, so a crashed worker can't strand capacity. `pg_advisory_xact_lock` guards the check-then-insert.
|
||||
- `src/core/minions/wait-for-completion.ts` — poll-until-terminal helper for CLI callers. `TimeoutError` does NOT cancel the job; AbortSignal exits cleanly. Default `pollMs`: 1000 on Postgres, 250 on PGLite inline.
|
||||
- `src/core/minions/handlers/subagent-audit.ts` — JSONL audit + heartbeat writer. Rotates weekly via ISO week. `readSubagentAuditForJob` is the readback path for `gbrain agent logs`.
|
||||
- `src/core/minions/transcript.ts` — messages + tool executions → markdown renderer. UTF-8-safe truncation; unknown block types fall through to JSON for diagnostics.
|
||||
- `src/core/minions/tools/brain-allowlist.ts` — derives the subagent tool registry from `src/core/operations.ts`. 11-name allow-list (read-only + deterministic `put_page`). `put_page` schema is namespace-wrapped per subagent so the model writes correct slugs first-try; the server-side check in `put_page` is the authoritative gate.
|
||||
- `src/core/minions/plugin-loader.ts` — `GBRAIN_PLUGIN_PATH` (colon-separated absolute paths like `PATH`) + `gbrain.plugin.json` manifest + `subagents/*.md` defs. Strict path policy, left-wins collision, plugins ship DEFS only (no new tools), `allowed_tools:` validated at load time.
|
||||
- `src/mcp/tool-defs.ts` — extracted from an inline `operations.map(...)` block in the MCP server so subagent + MCP use the same source of truth. Byte-for-byte equivalence pinned by regression test.
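
To make the lease idea from the `rate-leases.ts` entry above concrete, here is a minimal in-memory sketch, assuming a fixed TTL per lease. `LeasePoolSketch` and its methods are hypothetical; the real module stores owner-tagged rows in Postgres and serializes the check-then-insert with `pg_advisory_xact_lock` rather than using an array:

```typescript
// In-memory model only; illustrates why expiring leases cannot strand capacity.
interface LeaseSketch {
  owner: string;     // e.g. a job id rendered as a string
  expiresAt: number; // epoch milliseconds
}

class LeasePoolSketch {
  private leases: LeaseSketch[] = [];
  private readonly max: number;
  private readonly ttlMs: number;

  constructor(max: number, ttlMs: number) {
    this.max = max;
    this.ttlMs = ttlMs;
  }

  // Returns true when a lease was granted to `owner`.
  acquire(owner: string, nowMs: number): boolean {
    // Prune first: leases held by crashed owners simply expire here.
    this.leases = this.leases.filter((l) => l.expiresAt > nowMs);
    if (this.leases.length >= this.max) return false;
    this.leases.push({ owner, expiresAt: nowMs + this.ttlMs });
    return true;
  }

  // Explicit release for owners that finish normally.
  release(owner: string): void {
    this.leases = this.leases.filter((l) => l.owner !== owner);
  }

  inFlight(): number {
    return this.leases.length;
  }
}
```

A plain counter would stay incremented forever if its owner were killed before decrementing; with leases, the slot frees itself once `expiresAt` passes.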
|
||||
|
||||
**Schema (3 new tables + OperationContext fields + migration orchestrator)**
|
||||
- `subagent_messages` — Anthropic message-block persistence. `(job_id, message_idx)` UNIQUE; `content_blocks JSONB` holds parallel tool_use blocks in one assistant message.
|
||||
- `subagent_tool_executions` — two-phase ledger. `(job_id, tool_use_id)` UNIQUE; status: `pending | complete | failed`.
|
||||
- `subagent_rate_leases` — lease-based concurrency control. CASCADE deletes on owning job removal so no leaked rows.
|
||||
- `OperationContext` gains `jobId?`, `subagentId?`, and `viaSubagent?` (fail-closed signal for agent-path gating). Added to `src/core/operations.ts`.
|
||||
- `src/commands/migrations/v0_15_0.ts` — post-upgrade orchestrator (phases: schema → verify → record). `v0_14_0.ts` noop stub keeps the registry version sequence gapless.
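
The replay rule behind `subagent_tool_executions` (idempotent tools re-run, non-idempotent ones abort) can be sketched as a pure decision function. This is an illustration under assumptions: `ToolRowSketch`, `replayAction`, and the `"reuse"` outcome for already-settled rows are hypothetical and not taken from the handler source:

```typescript
type ToolStatus = "pending" | "complete" | "failed";

// Hypothetical row shape mirroring the ledger's (tool_use_id, status) idea.
interface ToolRowSketch {
  toolUseId: string;
  tool: string;
  status: ToolStatus;
}

type ReplayAction = "reuse" | "rerun" | "abort";

// Decide what a resuming worker does with one ledger row.
function replayAction(
  row: ToolRowSketch,
  idempotentTools: ReadonlySet<string>,
): ReplayAction {
  // Settled rows already carry an outcome, so nothing is executed again (assumption).
  if (row.status !== "pending") return "reuse";
  // A pending row means the worker died mid-call: only idempotent tools are safe to repeat.
  return idempotentTools.has(row.tool) ? "rerun" : "abort";
}
```

Writing the `pending` row before executing is what makes the mid-call crash visible at all; without it, a resuming worker could not tell "never started" from "started and lost".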
|
||||
|
||||
**Queue correctness fixes**
|
||||
- `failJob`, `cancelJob`, and `handleTimeouts` all emit `child_done` inbox messages with `outcome: 'complete' | 'failed' | 'dead' | 'cancelled' | 'timeout'`. Pre-v0.15 only `completeJob` emitted; failed/cancelled/timed-out children silently stranded aggregator-style parents.
|
||||
- Parent-resolution terminal set expanded from `{completed, dead, cancelled}` to include `'failed'` everywhere parent-state is checked. A failed child with `on_child_fail: 'continue'` now correctly unblocks the parent.
|
||||
- `failJob` emits `child_done` BEFORE the parent-terminal UPDATE. Without insertion ordering, the EXISTS guard on the inbox INSERT would skip the row on `fail_parent` paths (caught by codex iteration 3).
|
||||
- `MinionJobInput.max_stalled` threads through `MinionQueue.add()` as INSERT param (not UPDATE on idempotency replay — that would mutate first-submitter state).
|
||||
|
||||
**Trust model**
|
||||
- `subagent` and `subagent_aggregator` join `PROTECTED_JOB_NAMES`. MCP `submit_job` returns `permission_denied`; only `gbrain agent run` (with `allowProtectedSubmit`) can insert these rows.
|
||||
- `put_page` gains a server-side fail-closed namespace check: when `ctx.viaSubagent === true`, `slug` MUST match `^wiki/agents/<subagentId>/.+` — even if `subagentId` is undefined (dispatcher bug must not open a hole).
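
The anchored, fail-closed namespace rule can be sketched as a single predicate. This is a minimal illustration, assuming `subagentId` is an integer job id; `isAllowedSubagentSlug` is a hypothetical name, not the function used inside `put_page`:

```typescript
// Hypothetical sketch of the namespace gate applied when ctx.viaSubagent === true.
function isAllowedSubagentSlug(
  slug: string,
  subagentId: number | undefined,
): boolean {
  // Fail closed: a missing or non-integer id rejects every slug.
  if (subagentId === undefined || !Number.isInteger(subagentId)) return false;
  // Anchored at the start, with a slash right after the id so that
  // "wiki/agents/12evil/..." cannot pass as subagent 12, and ".+" so the
  // bare prefix "wiki/agents/12/" is not a valid page slug.
  const pattern = new RegExp(`^wiki/agents/${subagentId}/.+`);
  return pattern.test(slug);
}
```

The slash boundary is the important detail: a plain `startsWith("wiki/agents/12")` check would accept the `12evil` collision.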
|
||||
|
||||
**Docs**
|
||||
- `docs/guides/plugin-authors.md` — downstream-OpenClaw-facing walkthrough (minimum viable plugin, path + collision + trust policies, frontmatter fields, caveats).
|
||||
- 12 bisectable commits on `garrytan/minions-seam`, each PR-worthy on its own; the full series lands v0.15.0 end-to-end.
|
||||
|
||||
**Tests**
|
||||
- 159 new unit tests across 10 new files: `mcp-tool-defs`, `put-page-namespace`, `migrations-v0_15_0`, `queue-child-done`, `rate-leases`, `wait-for-completion`, `brain-allowlist`, `subagent-audit`, `subagent-transcript`, `subagent-handler`, `subagent-aggregator`, `plugin-loader`, `agent-cli`.
|
||||
- 3 critical regression tests pin the shell-jobs queue surface: `failJob` child_done behavior, `put_page` namespace path for non-subagent callers, MCP `buildToolDefs` byte-equivalence.
|
||||
- E2E `minions-resilience.test.ts` updated: the max_children test renames its spawned children off the now-protected `subagent` name.
|
||||
|
||||
## [0.15.4] - 2026-04-21
|
||||
|
||||
## **PgBouncer transaction-mode prepared statements, fixed at the pool.**
|
||||
@@ -565,7 +641,7 @@ Three new migrations, all idempotent, apply automatically on `gbrain init` / upg
|
||||
- **Strict-mode default flip.** BrainWriter ships with `strict_mode=lint`. The flip to strict requires a 7-day soak + BrainBench regression ≤1pt + zero false-positive count.
|
||||
- **Sandboxed user plugins.** v0.13 ships builtins only. User-provided TS modules deferred pending a real isolation story (worker_threads or vm2) in a follow-on release.
|
||||
- **`openai_embedding` refactor.** Deferred to PR 1.5 post-flip; embedding is a hot path.
|
||||
- **Wintermute `claw-bridge`.** Adoption path is documentation-only this release.
|
||||
- **OpenClaw `claw-bridge`.** Adoption path is documentation-only this release.
|
||||
|
||||
### Tests
|
||||
|
||||
@@ -585,7 +661,7 @@ Three new migrations, all idempotent, apply automatically on `gbrain init` / upg
|
||||
Four subcommands: `check` (read-only report with `--json`, `--type`, `--limit`), `auto` (three-bucket repair with `--confidence`, `--review-lower`, `--dry-run`, `--fresh`, `--limit`), `review` (prints queue path + count), `reset-progress`. Nine bare-tweet phrase regexes. External-link extraction for optional dead-link probing. Repairs route through `BrainWriter.transaction`.
|
||||
|
||||
#### BudgetLedger + CompletenessScorer (`src/core/enrichment/`)
|
||||
`BudgetLedger.reserve` returns `{kind:'held'}` or `{kind:'exhausted'}`. FOR UPDATE serializes concurrent reserves. `commit`, `rollback`, `cleanupExpired`. Midnight rollover via `Intl.DateTimeFormat` en-CA in configured IANA tz. Seven per-type rubrics + default (weights sum to 1.0). Person rubric's `non_redundancy` and `recency_score` kill Wintermute's length-only heuristic + 30-day-re-enrich-forever pathologies.
|
||||
`BudgetLedger.reserve` returns `{kind:'held'}` or `{kind:'exhausted'}`. FOR UPDATE serializes concurrent reserves. `commit`, `rollback`, `cleanupExpired`. Midnight rollover via `Intl.DateTimeFormat` en-CA in configured IANA tz. Seven per-type rubrics + default (weights sum to 1.0). Person rubric's `non_redundancy` and `recency_score` kill Garry's OpenClaw's length-only heuristic + 30-day-re-enrich-forever pathologies.
|
||||
|
||||
#### Minions scheduler polish (`src/core/minions/`)
|
||||
`quiet-hours.ts` — pure `evaluateQuietHours(cfg, now?)`. Wrap-around windows. Unknown tz fails open. `stagger.ts` — FNV-1a → 0–59 deterministic across runtimes. `worker.ts` integrated: post-claim evaluation, defer → `delayed/+15m`, skip → `cancelled`.
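
As an illustration of the FNV-1a → 0–59 stagger, here is a minimal sketch of a 32-bit FNV-1a hash reduced to a minute offset. It assumes the job key is ASCII (it hashes UTF-16 code units, which equals the byte-wise definition only for ASCII); `fnv1a32` and `staggerMinute` are hypothetical names and may not match `stagger.ts`:

```typescript
// 32-bit FNV-1a. Math.imul keeps the multiply in 32-bit integer space,
// which is what makes the result identical across JavaScript runtimes.
function fnv1a32(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

// Map a job key to a stable minute-of-hour offset in [0, 59].
function staggerMinute(jobKey: string): number {
  return fnv1a32(jobKey) % 60;
}
```

The same key always lands on the same minute, so jobs spread across the hour without any shared state.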
|
||||
@@ -952,7 +1028,7 @@ Your brain now wires itself. Every page write automatically extracts entity refe
|
||||
|
||||
- **Auto-link on every page write.** When you `gbrain put` a page that mentions `[Alice](people/alice)` or `[Acme](companies/acme)`, those links land in the graph automatically. Stale links (refs no longer in the page text) are removed in the same call. Run a quick `gbrain put` and the brain knows who's connected to whom. To opt out: `gbrain config set auto_link false`.
|
||||
- **Typed relationships.** Inferred from context using deterministic regex (zero LLM calls): `attended` (meeting -> person), `works_at` (CEO of, VP at, joined as), `invested_in` (invested in, backed by), `founded` (founded, co-founded), `advises` (advises, board member), `source` (frontmatter), `mentions` (default). On an 80-page benchmark brain: 94% type accuracy.
|
||||
- **`gbrain extract --source db`.** New mode for the existing `gbrain extract <links|timeline|all>` command that walks pages from the engine instead of from disk. Works for live brains backed by Postgres or PGLite without a local markdown checkout — exactly what an MCP-driven Wintermute or OpenClaw setup needs. Filesystem mode (`--source fs`) is unchanged and still the default.
|
||||
- **`gbrain extract --source db`.** New mode for the existing `gbrain extract <links|timeline|all>` command that walks pages from the engine instead of from disk. Works for live brains backed by Postgres or PGLite without a local markdown checkout — exactly what an MCP-driven OpenClaw setup needs. Filesystem mode (`--source fs`) is unchanged and still the default.
|
||||
- **`gbrain graph-query <slug>` for relationship traversal.** "Who works at Acme?" → `gbrain graph-query companies/acme --type works_at --direction in`. "Who attended meetings with Alice?" → `gbrain graph-query people/alice --type attended --depth 2`. Returns typed edges with depth, not just nodes. Backed by a new `traversePaths()` engine method on both PGLite and Postgres with cycle prevention (no exponential blowup on cyclic subgraphs).
|
||||
- **Graph-powered search ranking.** Hybrid search now applies a small backlink boost after cosine re-scoring (`score *= 1 + 0.05 * log(1 + backlink_count)`). Well-connected entities surface higher in results. Works in both keyword-only and full hybrid paths. Tested on the new `test/benchmark-graph-quality.ts` (80 pages, 35 queries, A/B/C comparison) — relational query recall jumps from ~30% (search alone) to 100% (graph traversal).
|
||||
- **Graph health metrics in `gbrain health`.** New `link_coverage` and `timeline_coverage` percentages on entity pages (person/company), plus `most_connected` top-5 list. The `dead_links` field is dropped (always 0 under ON DELETE CASCADE — was a phantom metric). The `brain_score` composite formula stays but now reflects a sharper graph signal.
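
The backlink boost quoted above, `score *= 1 + 0.05 * log(1 + backlink_count)`, is small by design. A minimal sketch, assuming `log` is the natural logarithm (the entry does not say which base is used); `applyBacklinkBoost` is a hypothetical name:

```typescript
// Hypothetical helper mirroring the formula in the changelog entry.
// Assumption: natural log. A page with zero backlinks keeps its score unchanged.
function applyBacklinkBoost(score: number, backlinkCount: number): number {
  return score * (1 + 0.05 * Math.log(1 + backlinkCount));
}
```

Under that assumption the boost grows logarithmically, so even heavily linked pages gain only a modest multiplier and cannot swamp the underlying relevance score.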
|
||||
@@ -1015,7 +1091,7 @@ CLI wrappers (`runExtract`, `runEmbed`, etc.) stay as thin arg-parsers that catc
|
||||
|
||||
### Added — skillify ships as a first-class gbrain skill
|
||||
|
||||
Ported from Wintermute, proven in production. Paired with `gbrain check-resolvable` gives a user-controllable equivalent of Hermes' auto-skill-creation — you decide when and what, the tooling keeps the 10-item checklist honest.
|
||||
Ported from Garry's OpenClaw, proven in production. Paired with `gbrain check-resolvable` gives a user-controllable equivalent of Hermes' auto-skill-creation — you decide when and what, the tooling keeps the 10-item checklist honest.
|
||||
|
||||
- `skills/skillify/SKILL.md` — the meta skill. Triggers: "skillify this", "is this a skill?", "make this proper".
|
||||
- `scripts/skillify-check.ts` — machine-readable audit. `--json` for CI, `--recent` to check files modified in the last 7 days.
|
||||
@@ -1183,7 +1259,7 @@ Wave 3 fixes were contributed by **@garagon** (PRs #105-#109) and **@Hybirdss**
|
||||
| **cron-scheduler** | Schedule staggering (5-min offsets), quiet hours (timezone-aware with wake-up override), thin job prompts. | 21 cron jobs at :00 is a thundering herd. Staggering prevents it. Quiet hours mean no 3 AM notifications. Wake-up override releases the backlog. |
|
||||
| **reports** | Timestamped reports with keyword routing. "What's the latest briefing?" maps to the right report directory. | Cheap replacement for vector search on frequent queries. Don't embed. Load the file. |
|
||||
| **testing** | Validates every skill has SKILL.md with frontmatter, manifest coverage, resolver coverage. The CI for your skill system. | 3 skills and you need validation. 24 skills and you need it yesterday. Catches dead references, missing sections, MECE violations. |
|
||||
| **soul-audit** | 6-phase interview that generates SOUL.md, USER.md, ACCESS_POLICY.md, HEARTBEAT.md. Your agent's identity, built from your answers. | What makes Wintermute feel like Wintermute. Without personality and access control, every agent feels the same. |
|
||||
| **soul-audit** | 6-phase interview that generates SOUL.md, USER.md, ACCESS_POLICY.md, HEARTBEAT.md. Your agent's identity, built from your answers. | What makes your OpenClaw feel like yours. Without personality and access control, every agent feels the same. |
|
||||
| **webhook-transforms** | External events (SMS, meetings, social mentions) converted into brain pages with entity extraction. Dead-letter queue for failures. | Your brain ingests signals from everywhere. Not just conversations, but every webhook, every notification, every external event. |
|
||||
|
||||
### Infrastructure (new in v0.10.0)
|
||||
|
||||
28
CLAUDE.md
@@ -59,8 +59,19 @@ strict behavior when unset.
|
||||
- `src/core/minions/protected-names.ts` — side-effect-free constant module exporting `PROTECTED_JOB_NAMES` + `isProtectedJobName()`. Kept pure so queue core can import without loading handler modules.
|
||||
- `src/core/minions/handlers/shell.ts` — `shell` job handler. Spawns `/bin/sh -c cmd` (absolute path, PATH-override-safe) or `argv[0] argv[1..]` (no shell). Env allowlist: `PATH, HOME, USER, LANG, TZ, NODE_ENV` + caller `env:` overrides. UTF-8-safe stdout/stderr tail via `string_decoder.StringDecoder`. Abort (either `ctx.signal` or `ctx.shutdownSignal`) fires SIGTERM → 5s grace → SIGKILL on child. Requires `GBRAIN_ALLOW_SHELL_JOBS=1` on worker (gated by `registerBuiltinHandlers`).
|
||||
- `src/core/minions/handlers/shell-audit.ts` — per-submission JSONL audit trail at `~/.gbrain/audit/shell-jobs-YYYY-Www.jsonl` (ISO-week rotation; override via `GBRAIN_AUDIT_DIR`). Best-effort: `mkdirSync(recursive)` + `appendFileSync`; failures logged to stderr, submission not blocked. Logs cmd (first 80 chars) or argv (JSON array). Never logs env values.
|
||||
- `src/core/minions/handlers/subagent.ts` (v0.15) — LLM-loop handler. Two-phase tool persistence (pending → complete/failed), replay reconciliation for mid-dispatch crashes, dual-signal abort (`ctx.signal` + `ctx.shutdownSignal`), Anthropic prompt caching on system + tool defs. `makeSubagentHandler({engine, client?, ...})` factory; `MessagesClient` is an injectable interface the real SDK implements structurally. Throws `RateLeaseUnavailableError` (renewable) when rate-lease capacity is full.
|
||||
- `src/core/minions/handlers/subagent-aggregator.ts` (v0.15) — `subagent_aggregator` handler. Claims AFTER all children resolve (queue changes guarantee every terminal child posts a `child_done` inbox message with outcome). Reads inbox via `ctx.readInbox()`, builds deterministic mixed-outcome markdown summary. No LLM call in v0.15.
|
||||
- `src/core/minions/handlers/subagent-audit.ts` (v0.15) — JSONL audit + heartbeat writer at `~/.gbrain/audit/subagent-jobs-YYYY-Www.jsonl`. Events: `submission` (one line per submit) + `heartbeat` (per turn boundary: `llm_call_started | llm_call_completed | tool_called | tool_result | tool_failed`). Never logs prompts or tool inputs. `readSubagentAuditForJob(jobId, {sinceIso})` is the readback path for `gbrain agent logs`.
|
||||
- `src/core/minions/rate-leases.ts` (v0.15) — lease-based concurrency cap for outbound providers (default key `anthropic:messages`, max via `GBRAIN_ANTHROPIC_MAX_INFLIGHT`). Owner-tagged rows with `expires_at` auto-prune on acquire; `pg_advisory_xact_lock` guards check-then-insert; CASCADE on owning job deletion. `renewLeaseWithBackoff` retries 3x (250/500/1000ms).
|
||||
- `src/core/minions/wait-for-completion.ts` (v0.15) — poll-until-terminal helper for CLI callers. `TimeoutError` does NOT cancel the job; `AbortSignal` exits without throwing. Default `pollMs`: 1000 on Postgres, 250 on PGLite inline.
|
||||
- `src/core/minions/transcript.ts` (v0.15) — renders `subagent_messages` + `subagent_tool_executions` to markdown. Tool rows splice under their owning assistant `tool_use` by `tool_use_id`. UTF-8-safe truncation; unknown block types fall through to fenced JSON.
|
||||
- `src/core/minions/plugin-loader.ts` (v0.15) — `GBRAIN_PLUGIN_PATH` discovery. Absolute paths only, left-wins collision, `gbrain.plugin.json` with `plugin_version: "gbrain-plugin-v1"`, plugins ship DEFS only (no new tools), `allowed_tools:` validated at load time against the derived registry.
|
||||
- `src/core/minions/tools/brain-allowlist.ts` (v0.15) — derives subagent tool registry from `src/core/operations.ts`. 11-name allow-list: `query`, `search`, `get_page`, `list_pages`, `file_list`, `file_url`, `get_backlinks`, `traverse_graph`, `resolve_slugs`, `get_ingest_log`, `put_page`. `put_page` schema is namespace-wrapped per subagent (`^wiki/agents/<subagentId>/.+`); the `put_page` op's server-side check is the authoritative gate via `ctx.viaSubagent` fail-closed.
|
||||
- `src/mcp/tool-defs.ts` (v0.15) — extracted `buildToolDefs(ops)` helper. MCP server + subagent tool registry both call it; byte-for-byte equivalence pinned by `test/mcp-tool-defs.test.ts`.
|
||||
- `src/core/minions/attachments.ts` — Attachment validation (path traversal, null byte, oversize, base64, duplicate detection)
|
||||
- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. v0.13.1 surfaces the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219.
|
||||
- `src/commands/agent.ts` (v0.16) — `gbrain agent run <prompt> [flags]` CLI. Submits `subagent` (or N children + 1 aggregator) under `{allowProtectedSubmit: true}`. Single-entry `--fanout-manifest` short-circuits. Children get `on_child_fail: 'continue'` + `max_stalled: 3`. `--follow` is the default on TTY; streams logs + polls `waitForCompletion` in parallel. Ctrl-C detaches, does not cancel.
|
||||
- `src/commands/agent-logs.ts` (v0.16) — `gbrain agent logs <job> [--follow] [--since]`. Merges JSONL heartbeat audit + `subagent_messages` into a chronological timeline. `parseSince` accepts ISO-8601 or relative (`5m`, `1h`, `2d`). Transcript tail renders only for terminal jobs.
|
||||
- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. v0.13.1 surfaces the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219. v0.16 wires `registerBuiltinHandlers` to always register `subagent` + `subagent_aggregator` (no env flag — `ANTHROPIC_API_KEY` is the natural cost gate, trust is via `PROTECTED_JOB_NAMES`) and loads `GBRAIN_PLUGIN_PATH` plugins at worker startup with a loud startup-line per plugin. `shell` handler still gated by `GBRAIN_ALLOW_SHELL_JOBS=1` (RCE surface, separate concern).
|
||||
- `src/commands/features.ts` — `gbrain features --json --auto-fix`: usage scan + feature adoption salesman
|
||||
- `src/commands/autopilot.ts` — `gbrain autopilot --install`: self-maintaining brain daemon (sync+extract+embed)
|
||||
- `src/mcp/server.ts` — MCP stdio server (generated from operations)
|
||||
@@ -557,6 +568,21 @@ search engines, surfaced in cross-references, and distributed with every release
|
||||
- Example meeting → `meetings/2026-04-03` (generic date is fine)
|
||||
- Example user → `you` or `the user`, never a proper name
|
||||
|
||||
**Specific rule: never say `Wintermute` in any CHANGELOG, README, doc, PR, or
commit message.** When the temptation is to illustrate with the real fork name:
- Reader-facing copy → `your OpenClaw` (covers Wintermute, Hermes, AlphaClaw,
  and any other downstream OpenClaw deployment in one term the reader already
  recognizes).
- First-person / origin-story copy → `Garry's OpenClaw` (honest that this is
  the production deployment driving the feature, without exposing the private
  agent's name).

`Wintermute` may appear in private artifacts (scratch plans under
`~/.gstack/projects/…`, memory files, conversation transcripts, CEO-review
plans); those aren't distributed. Anything checked into this repo or shipped
in a release must use the OpenClaw phrasing above. Sweeping a stale reference
is a small clean-up PR, not a debate.

**When in doubt, ask yourself:** "Would this query reveal private information
about the user's contacts, investments, or portfolio if it were read by a
stranger?" If yes, replace with generic placeholders.
|
||||
|
||||
19
README.md
@@ -230,6 +230,25 @@ If anything's off, `actions[]` tells you the exact command to run. For deeper tr
|
||||
|
||||
Moving gateway crons to Minions (deterministic scripts, zero LLM tokens per fire): [`docs/guides/minions-shell-jobs.md`](docs/guides/minions-shell-jobs.md).
|
||||
|
||||
## Durable agents: `gbrain agent` (v0.15)
|
||||
|
||||
Your subagent runs survive crashes now. OpenClaw died mid-run? The worker re-claims on restart and replays from the last committed turn. Fan-out across 50 shards, one shard crashes — the aggregator still claims after every child reaches a terminal state and writes a mixed-outcome summary. Tool calls persist as a two-phase ledger (`pending` → `complete | failed`) so replay is safe by construction, not by hope.
|
||||
|
||||
```bash
# Submit a single-subagent run
gbrain agent run "summarize my last 10 journal pages"

# Fan out N prompts across N subagent children + 1 aggregator
gbrain agent run "analyze every page" \
  --fanout-manifest manifests/pages.json \
  --subagent-def analyzer

# Tail a running job (heartbeat per turn + full transcript on completion)
gbrain agent logs 1247 --follow --since 5m
```
|
||||
|
||||
Durability is the point: every Anthropic turn commits to `subagent_messages`, every tool call to `subagent_tool_executions`. Worker kills, OpenClaw crashes, timeouts — all resumable. Host repos (your OpenClaw, etc.) ship their own subagent definitions via `GBRAIN_PLUGIN_PATH` + a `gbrain.plugin.json` manifest: see [`docs/guides/plugin-authors.md`](docs/guides/plugin-authors.md). Requires `ANTHROPIC_API_KEY` on the worker.
|
||||
|
||||
## Skillify: your skills tree stops being a black box
|
||||
|
||||
Hermes and similar agent frameworks auto-create skills as a background behavior. Fine until you don't know what the agent shipped. Checklists decay. Tests drift. Resolver entries get stale. Six months later you've got an opaque pile of "skills" that nobody has read, nobody has tested, and nobody is sure still work.
|
||||
|
||||
2
TODOS.md
@@ -173,7 +173,7 @@ board" — likely an advisor-role page prior plus verb-pattern combinations.
|
||||
|
||||
**Cons:** Requires adding `sender_id` or `access_tier` to `OperationContext`. Each mutating operation needs a permission check. Medium implementation effort.
|
||||
|
||||
**Context:** From CEO review + Codex outside voice (2026-04-13). Prompt-layer access control works in practice (same model as Wintermute) but is not sufficient for remote MCP where direct tool calls bypass the agent's prompt.
|
||||
**Context:** From CEO review + Codex outside voice (2026-04-13). Prompt-layer access control works in practice (same model as Garry's OpenClaw) but is not sufficient for remote MCP where direct tool calls bypass the agent's prompt.
|
||||
|
||||
**Depends on:** v0.10.0 GStackBrain skill layer (shipped).
|
||||
|
||||
|
||||
@@ -358,6 +358,106 @@ upcoming `gbrain crontab-to-minions <file>` helper is P1 in TODOS.
|
||||
|
||||
---
|
||||
|
||||
## v0.16.0: durable agent runtime
|
||||
|
||||
v0.15 ships `gbrain agent run` / `gbrain agent logs`, a new `subagent` handler
type in Minions, and a plugin contract for host-repo subagent defs. None of the
existing skills need surgery. The question for downstream agents is *how* to
adopt the new runtime, not how to patch around a breaking change.
|
||||
|
||||
### 1. Run a worker with an Anthropic key
|
||||
|
||||
The subagent handlers (`subagent` and `subagent_aggregator`) are always
registered on the worker. No separate opt-in flag: `ANTHROPIC_API_KEY` is
the natural cost gate (no key, the SDK call fails on the first turn), and
who-can-submit is already protected (`PROTECTED_JOB_NAMES` + trusted-submit:
MCP callers get `permission_denied`; only `gbrain agent run` can insert
these rows).

```bash
ANTHROPIC_API_KEY=sk-ant-... gbrain jobs work
```

Worker startup prints:

```
[minion worker] subagent handlers enabled
```
|
||||
|
||||
### 2. Ship your subagents as a plugin (OpenClaw + similar)
|
||||
|
||||
Move your custom subagent definitions out of your gbrain fork and into your own
repo as a plugin. Concretely:

```
~/<your-agent>/gbrain-plugin/
├── gbrain.plugin.json
└── subagents/
    ├── meeting-ingestion.md
    ├── signal-detector.md
    └── daily-task-prep.md
```

`gbrain.plugin.json`:

```json
{
  "name": "your-openclaw",
  "version": "2026.4.20",
  "plugin_version": "gbrain-plugin-v1"
}
```

Each `subagents/*.md` is a plain-text agent definition: YAML frontmatter +
body-as-system-prompt. Recognized frontmatter fields: `name`, `model`,
`max_turns`, `allowed_tools` (must subset the derived brain-tool registry).

Turn it on:

```bash
export GBRAIN_PLUGIN_PATH="$HOME/<your-agent>/gbrain-plugin"
```

Worker startup prints `[plugin-loader] loaded '<name>' v<ver> (N subagents)`
per plugin; any rejection (bad manifest, unknown tool in `allowed_tools`,
version mismatch) shows up as a loud warning at startup, not a silent
dispatch-time failure. See `docs/guides/plugin-authors.md` for the full contract.
|
||||
|
||||
### 3. Replace ephemeral subagent runs with durable ones
|
||||
|
||||
If your agent currently spawns ephemeral subagents (OpenClaw `Agent()`, ad-hoc
Anthropic API calls, etc.) for work that should survive crashes, sleeps, or
worker restarts, migrate those to `gbrain agent run`. The durability is free:

```bash
gbrain agent run "analyze my last 50 journal pages for recurring themes" \
  --subagent-def analyzer --fanout-manifest manifests/journal-pages.json
```

Every turn persists to `subagent_messages`, every tool call is a two-phase
ledger, and `gbrain agent logs <job>` shows where it died + what the last
successful call returned. No more "re-run from scratch because the session
context evaporated."
|
||||
|
||||
### 4. `put_page` from subagents writes under an agent namespace
|
||||
|
||||
If you adopted the v0.15 subagent runtime, note that `put_page` calls
originating from a subagent's tool dispatch MUST target
`wiki/agents/<subagent_id>/...`. The schema shown to the model enforces this
on first try; a server-side fail-closed check rejects anything else. This
does NOT affect your skill files, CLI put_page calls, or MCP put_page; it
applies only to tool-dispatched writes from inside an LLM loop.

Aggregation output (the final "here's what all N children found" brain page)
goes via a separate trusted CLI path, not through a subagent tool call, so
it can write anywhere you want.

Iron rule: **never grant an agent write access beyond its namespace**. The
server-side check exists because dispatcher bugs happen; treat it as defense
in depth, not the primary boundary.
|
||||
|
||||
---
|
||||
|
||||
## Future versions
|
||||
|
||||
When gbrain ships a new version, this doc will be updated with the diffs for that
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# Production Benchmark: Minions vs OpenClaw Sub-agents (Real Deployment)
|
||||
|
||||
**Date:** 2026-04-18
|
||||
**Environment:** Wintermute on Render (ephemeral container, Supabase Postgres)
|
||||
**Environment:** Garry's OpenClaw on Render (ephemeral container, Supabase Postgres)
|
||||
**GBrain:** v0.11.0 (minions-jobs branch)
|
||||
**OpenClaw:** 2026.4.10
|
||||
**Brain:** 45,798 pages, 98K chunks, 25K links, 79K timeline entries
|
||||
|
||||
@@ -8,9 +8,9 @@
|
||||
|
||||
## 0. Context
|
||||
|
||||
During a CEO review of a narrow two-feature plan (bare-tweet citation repair + completeness score, borrowed from Feynman), the scope was reframed. The narrow plan duplicated work Wintermute already does and missed the real leverage point: **the bespoke abstractions hiding inside Wintermute — resolvers, enrichment orchestration, scheduling, deterministic output — should live in GBrain as first-class primitives.**
|
||||
During a CEO review of a narrow two-feature plan (bare-tweet citation repair + completeness score, borrowed from Feynman), the scope was reframed. The narrow plan duplicated work Garry's OpenClaw already does and missed the real leverage point: **the bespoke abstractions hiding inside OpenClaw — resolvers, enrichment orchestration, scheduling, deterministic output — should live in GBrain as first-class primitives.**
|
||||
|
||||
North star: *"When Wintermute's Claw upgrades to this version of GBrain, it should immediately recognize brilliance and completeness and say 'It's time to switch to these abstractions.'"*
|
||||
North star: *"When Garry's OpenClaw's Claw upgrades to this version of GBrain, it should immediately recognize brilliance and completeness and say 'It's time to switch to these abstractions.'"*
|
||||
|
||||
That is the test this document is designed against. Everything else is downstream.
|
||||
|
||||
@@ -67,7 +67,7 @@ An earlier implementation could ship L1 + L4 first (the two "purest" layers) and
|
||||
|
||||
### 3.1 What's broken today
|
||||
|
||||
Wintermute has **69 distinct external-lookup patterns** across X API (14 shapes), Perplexity, Mistral OCR, Gmail, Calendar, Slack, GitHub, YouTube, Diarize.io, YC tools, OSINT collectors, and brain-local lookups. Each one is a bespoke script under `scripts/` with its own error handling, retry logic, and output shape. GBrain has 3 ad-hoc wrappers (`embedding.ts`, `transcription.ts`, `enrichment-service.ts`) that don't share an interface.
|
||||
Garry's OpenClaw has **69 distinct external-lookup patterns** across X API (14 shapes), Perplexity, Mistral OCR, Gmail, Calendar, Slack, GitHub, YouTube, Diarize.io, YC tools, OSINT collectors, and brain-local lookups. Each one is a bespoke script under `scripts/` with its own error handling, retry logic, and output shape. GBrain has 3 ad-hoc wrappers (`embedding.ts`, `transcription.ts`, `enrichment-service.ts`) that don't share an interface.
|
||||
|
||||
Common consequences:
|
||||
- No uniform retry/backoff strategy (some scripts retry, most don't)
|
||||
@@ -187,7 +187,7 @@ Existing `src/core/fail-improve.ts` is the deterministic-first/LLM-fallback patt
|
||||
|
||||
### 3.7 Reference implementations to ship
|
||||
|
||||
The Wintermute survey inventoried 69 resolver shapes. Shipping all of them is wrong (over-scoped); shipping zero is under-scoped. The dogfood set:
|
||||
The OpenClaw survey inventoried 69 resolver shapes. Shipping all of them is wrong (over-scoped); shipping zero is under-scoped. The dogfood set:
|
||||
|
||||
| # | Resolver | Purpose | Used by |
|
||||
|---|---|---|---|
|
||||
@@ -198,7 +198,7 @@ The Wintermute survey inventoried 69 resolver shapes. Shipping all of them is wr
|
||||
| 5 | `perplexity_query` | Query → synthesis + citations | Enrichment Orchestrator |
|
||||
| 6 | `text_to_entities` | LLM entity extraction (structured JSON) | Enrichment Orchestrator |
|
||||
|
||||
The remaining 63 Wintermute patterns port incrementally, driven by user need. Each port is a new YAML + module under `recipes/` or `~/.gbrain/resolvers/` with no framework changes.
|
||||
The remaining 63 OpenClaw patterns port incrementally, driven by user need. Each port is a new YAML + module under `recipes/` or `~/.gbrain/resolvers/` with no framework changes.
|
||||
|
||||
---
|
||||
|
||||
@@ -206,7 +206,7 @@ The remaining 63 Wintermute patterns port incrementally, driven by user need. Ea
|
||||
|
||||
### 4.1 What's broken today
|
||||
|
||||
Wintermute's enrichment is **polished at the data layer, hacky at the control layer**:
|
||||
Garry's OpenClaw's enrichment is **polished at the data layer, hacky at the control layer**:
|
||||
|
||||
- **Completeness = "length > 500 chars + no `needs-enrichment` tag"** (`lib/enrich.mjs:351-355`). Naïve. A rich page of repetitive Perplexity summaries (see `brain/people/0interestrates.md` — 38 repeating blocks) passes this check.
|
||||
- **30-day auto-re-enrichment** runs forever. No "done" state. A person met once in 2023 still gets re-researched monthly.
|
||||
@@ -342,9 +342,9 @@ await writer.transaction(async (tx) => {
|
||||
|
||||
### 5.1 What's broken today
|
||||
|
||||
Wintermute's cron is **externally-driven JSON** (`cron/jobs.json`) with ~30 jobs manually stagger-offset at different minutes. GBrain has **zero native scheduling** — `src/commands/autopilot.ts` is a single daemon loop, and `docs/guides/cron-schedule.md` is architectural guidance, not code.
|
||||
Garry's OpenClaw's cron is **externally-driven JSON** (`cron/jobs.json`) with ~30 jobs manually stagger-offset at different minutes. GBrain has **zero native scheduling** — `src/commands/autopilot.ts` is a single daemon loop, and `docs/guides/cron-schedule.md` is architectural guidance, not code.
|
||||
|
||||
Failures observed in Wintermute's actual state:
|
||||
Failures observed in Garry's OpenClaw's actual state:
|
||||
- `X OAuth2 Token Refresh`: 11 consecutive timeouts (critical-path silent failure)
|
||||
- `flight-tracker daily scan`: 5 consecutive timeouts
|
||||
- `morning-briefing`: 4 consecutive timeouts
|
||||
@@ -378,9 +378,9 @@ export interface ScheduledResolver extends Resolver<void, ScheduledResult> {
|
||||
}
|
||||
```
|
||||
|
||||
### 5.3 Enforcement vs convention (the key delta from Wintermute)
|
||||
### 5.3 Enforcement vs convention (the key delta from Garry's OpenClaw)
|
||||
|
||||
| Concern | Wintermute today | Knowledge Runtime |
|
||||
| Concern | Garry's OpenClaw today | Knowledge Runtime |
|
||||
|---|---|---|
|
||||
| Quiet hours | Checked inside each skill (trust-based) | Enforced at scheduler, skill cannot override |
|
||||
| Staggering | Manual minute-offset in `jobs.json` | Scheduler assigns slots via hashed staggerKey |
|
||||
@@ -405,7 +405,7 @@ Every scheduled run emits structured events: `started`, `skipped-quiet-hours`, `
|
||||
- `engine.logIngest` (audit trail in brain DB)
|
||||
- Optional webhook (Slack/Telegram for the user)
|
||||
|
||||
`gbrain doctor` reads the event log and reports: current circuit-breaker state, any resolver with > 3 consecutive failures, any resolver that hasn't fired within 3× its interval (freshness SLA like Wintermute's `freshness-check.mjs` but built-in).
|
||||
`gbrain doctor` reads the event log and reports: current circuit-breaker state, any resolver with > 3 consecutive failures, any resolver that hasn't fired within 3× its interval (freshness SLA like Garry's OpenClaw's `freshness-check.mjs` but built-in).
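
The two threshold checks described above can be sketched as a pure function over per-resolver state. The `ResolverRunState` shape and the function name are hypothetical; the design does not say how `gbrain doctor` derives these values from the event log, so the sketch assumes they are already aggregated.

```typescript
// Hypothetical per-resolver summary, assumed to be derived from the event log.
interface ResolverRunState {
  name: string;
  consecutiveFailures: number;
  lastFiredAtMs: number;
  intervalMs: number;
}

interface DoctorFinding {
  name: string;
  problem: "consecutive-failures" | "stale";
}

function findUnhealthyResolvers(
  states: ResolverRunState[],
  nowMs: number,
): DoctorFinding[] {
  const findings: DoctorFinding[] = [];
  for (const state of states) {
    // "> 3 consecutive failures": exactly 3 is still considered healthy.
    if (state.consecutiveFailures > 3) {
      findings.push({ name: state.name, problem: "consecutive-failures" });
    }
    // Freshness SLA: flag anything that has not fired within 3x its interval.
    if (nowMs - state.lastFiredAtMs > 3 * state.intervalMs) {
      findings.push({ name: state.name, problem: "stale" });
    }
  }
  return findings;
}
```

A resolver can appear twice in the result if it is both failing and stale, which keeps the two signals independent in the report.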
|
||||
|
||||
---
|
||||
|
||||
@@ -415,9 +415,9 @@ Every scheduled run emits structured events: `started`, `skipped-quiet-hours`, `
|
||||
|
||||
**Iron Law: LLM picks WHAT. Code guarantees WHERE and HOW.**
|
||||
|
||||
Wintermute's existing `lib/enrich.mjs:buildTweetEntry` is close to this — tweet URLs are built from `tweet.id` returned by the X API, never from LLM memory. But:
|
||||
Garry's OpenClaw's existing `lib/enrich.mjs:buildTweetEntry` is close to this — tweet URLs are built from `tweet.id` returned by the X API, never from LLM memory. But:
|
||||
|
||||
- A past incident: *"Sub-agent test #2 FAILED — hallucinated 'Philip Leung' entity links across all daily files. LLM rewriting of daily files is too error-prone."* (Wintermute memory log, 2026-04-13.)
|
||||
- A past incident: *"Sub-agent test #2 FAILED — hallucinated 'Philip Leung' entity links across all daily files. LLM rewriting of daily files is too error-prone."* (Garry's OpenClaw memory log, 2026-04-13.)
|
||||
- Back-links depend on `appendTimeline` being called everywhere; skips are silent.
|
||||
- Slug collisions are unchecked (no conflict detection on `slugify`).
|
||||
- Citation format is post-hoc linted weekly, not pre-write enforced.
|
||||
@@ -461,7 +461,7 @@ export class Scaffolder {
|
||||
// "[Source: [X/garrytan, 2026-04-18](https://x.com/garrytan/status/123456)]"
|
||||
}
|
||||
emailCitation(account: string, messageId: string, subject: string): string {
|
||||
// deterministic Gmail URL per Wintermute pattern
|
||||
// deterministic Gmail URL per OpenClaw pattern
|
||||
}
|
||||
sourceCitation(resolverResult: ResolverResult<unknown>): string {
|
||||
// pulls .source, .fetchedAt, .raw from the result
|
||||
@@ -563,7 +563,7 @@ Each phase ships independently, passes full E2E, is feature-flagged, and is reve
|
||||
- L4 core: `BrainWriter.transaction`, `Scaffolder`, `SlugRegistry` with conflict detection.
|
||||
- Pre-write validators: citation, link, back-link, triple-HR.
|
||||
- Migrate `src/commands/publish.ts` + `src/commands/backlinks.ts` to route through BrainWriter.
|
||||
- **Now** Wintermute's "Philip Leung" hallucination is structurally impossible — LLM output passes through JSON-Schema validator before reaching Scaffolder.
|
||||
- **Now** Garry's OpenClaw's "Philip Leung" hallucination is structurally impossible — LLM output passes through JSON-Schema validator before reaching Scaffolder.
|
||||
|
||||
### Phase 3 — `gbrain integrity` command (human: ~0.5 wk / CC: ~2 h)
|
||||
- Ship the originally-scoped user-facing feature on top of the new foundation.
|
||||
@@ -582,14 +582,14 @@ Each phase ships independently, passes full E2E, is feature-flagged, and is reve
|
||||
- Migrate `src/commands/autopilot.ts` to a ScheduledResolver set.
|
||||
- Ship `gbrain schedule list|run|pause|tail` CLI for observability.
|
||||
|
||||
### Phase 6 — Port 5–8 Wintermute resolvers (human: ~1.5 wk / CC: ~6 h)
|
||||
### Phase 6 — Port 5–8 OpenClaw resolvers (human: ~1.5 wk / CC: ~6 h)
|
||||
- `perplexity_query`, `text_to_entities`, `mistral_ocr_pdf`, `x_search_all`, `x_user_to_tweets`, `gmail_query_to_threads`, `calendar_date_to_events`.
|
||||
- Each ships as YAML + TS module under `resolvers/builtin/` — **proof of the plugin format.**
|
||||
|
||||
### Phase 7 — Wintermute Claw Adoption Integration (human: ~1 wk / CC: ~4 h)
|
||||
- Write `docs/wintermute/ADOPTION.md` showing Wintermute how to replace its 69 bespoke scripts with calls to `gbrain registry.resolve(...)`.
|
||||
- Ship a `gbrain claw-bridge` subcommand that proxies Wintermute's current script invocations to the resolver registry — zero-edit adoption path.
|
||||
- **This is the test of the north star.** If Wintermute can stand up a 1-line shim and drop `scripts/x-api-client.mjs`, the abstraction succeeded.
|
||||
### Phase 7 — OpenClaw Adoption Integration (human: ~1 wk / CC: ~4 h)
|
||||
- Write `docs/openclaw/ADOPTION.md` showing your OpenClaw how to replace its 69 bespoke scripts with calls to `gbrain registry.resolve(...)`.
|
||||
- Ship a `gbrain claw-bridge` subcommand that proxies Garry's OpenClaw's current script invocations to the resolver registry — zero-edit adoption path.
|
||||
- **This is the test of the north star.** If your OpenClaw can stand up a 1-line shim and drop `scripts/x-api-client.mjs`, the abstraction succeeded.
|
||||
|
||||
Total: human: ~10 weeks / CC: ~42 hours / calendar with single implementer: ~3–4 weeks.
|
||||
|
||||
@@ -649,7 +649,7 @@ src/commands/
|
||||
integrity.ts # ships in Phase 3, replaces Feynman Phase A/B
|
||||
schedule.ts # gbrain schedule list|run|pause|tail (Phase 5)
|
||||
|
||||
docs/wintermute/
|
||||
docs/openclaw/
|
||||
ADOPTION.md # written in Phase 7
|
||||
```
|
||||
|
||||
@@ -685,19 +685,19 @@ Every Resolver implementation tested against the interface spec. Table-driven: r
|
||||
- Simulate API timeout mid-transaction; transaction must roll back completely.
|
||||
- Corrupted state file; scheduler must escalate, not silently skip.
|
||||
|
||||
### Regression tests vs. Wintermute behavior
|
||||
For each Wintermute pattern we port (e.g. X-handle → tweet URL), a regression test proves the new resolver produces the same answer on real-world inputs from the brain audit. This is the "Wintermute would adopt" proof.
|
||||
### Regression tests vs. Garry's OpenClaw behavior
|
||||
For each OpenClaw pattern we port (e.g. X-handle → tweet URL), a regression test proves the new resolver produces the same answer on real-world inputs from the brain audit. This is the "your OpenClaw would adopt" proof.
|
||||
|
||||
---
|
||||
|
||||
## 11. Open Questions (flagged for CEO re-review)
|
||||
|
||||
1. **Scope shape.** Is this the right four-layer decomposition, or are some layers better left to Wintermute (e.g. Scheduling lives above GBrain, not in it)?
|
||||
1. **Scope shape.** Is this the right four-layer decomposition, or are some layers better left to OpenClaw (e.g. Scheduling lives above GBrain, not in it)?
|
||||
2. **Phase 3 user-value break.** Does Phase 3 (user-visible `gbrain integrity`) ship early enough, or do we need an even smaller MVP?
|
||||
3. **LLM-as-resolver.** Should `text_to_entities` be a Resolver, or does that blur the "code vs LLM" line the invariant relies on?
|
||||
4. **Plugin format.** YAML + TS module (§3.5) vs. pure TS module with decorator-style metadata. Latter is more type-safe; former is more discoverable.
|
||||
5. **Cross-resolver transactions.** Do we support "atomic fetch-from-Perplexity + write-to-brain" at the L2 layer? Current design says yes; implementation is tricky (Perplexity call isn't rollbackable).
|
||||
6. **Wintermute bridge scope.** Phase 7 `gbrain claw-bridge` — is that worth a phase of its own, or should adoption be documentation-only?
|
||||
6. **OpenClaw bridge scope.** Phase 7 `gbrain claw-bridge` — is that worth a phase of its own, or should adoption be documentation-only?
|
||||
7. **Completeness rubric coverage.** Do we define rubrics for all 9 PageTypes upfront, or ship people/company/meeting first and extend incrementally?
|
||||
8. **Budget config UX.** Hard daily cap is strict; should we also expose a soft-cap warning mode, and how is the cap set (env var? config file? prompt on first use?)
|
||||
9. **Backwards compat.** `src/commands/publish.ts` and `src/commands/backlinks.ts` have been running cleanly for weeks. Refactoring through BrainWriter carries migration risk. Acceptable?
|
||||
@@ -705,12 +705,12 @@ For each Wintermute pattern we port (e.g. X-handle → tweet URL), a regression
|
||||
|
||||
---
|
||||
|
||||
## 12. Verification (the "Wintermute would adopt" test)
|
||||
## 12. Verification (the "your OpenClaw would adopt" test)
|
||||
|
||||
The design succeeds iff:
|
||||
|
||||
- [ ] A user can add a new resolver by dropping a YAML + TS module in `~/.gbrain/resolvers/` without editing GBrain source.
|
||||
- [ ] Wintermute can delete `scripts/x-api-client.mjs` and replace all callers with 1-line `await registry.resolve('x_handle_to_tweet', ...)`.
|
||||
- [ ] Your OpenClaw can delete `scripts/x-api-client.mjs` and replace all callers with 1-line `await registry.resolve('x_handle_to_tweet', ...)`.
|
||||
- [ ] No brain page can be written with a bare tweet reference, a missing back-link, or an unverified URL (validators catch it pre-commit).
|
||||
- [ ] Running `gbrain integrity --auto --confidence 0.8` over a real brain fixes ≥1,000 of the 1,424 known bare-tweet citations without human review.
|
||||
- [ ] Full E2E test suite passes on both PGLite + Postgres engines.
|
||||
|
||||
@@ -73,7 +73,7 @@ F. Install gbrain autopilot --install (env-aware)
|
||||
G. Record append completed.jsonl status:"complete"
|
||||
```
|
||||
|
||||
If Phase E emits TODOs for host-specific handlers (e.g. Wintermute's
|
||||
If Phase E emits TODOs for host-specific handlers (e.g. your OpenClaw's
|
||||
~29 non-gbrain crons), the migration finishes with `status: "partial"`.
|
||||
Your host agent walks the TODOs using `skills/migrations/v0.11.0.md` +
|
||||
`docs/guides/plugin-handlers.md`, ships handler registrations in the
|
||||
|
||||
163
docs/guides/plugin-authors.md
Normal file
@@ -0,0 +1,163 @@
# Plugin authors guide (v0.15)

`gbrain` discovers subagent definitions from outside this repo via
`GBRAIN_PLUGIN_PATH`. If you maintain a downstream agent (your OpenClaw
deployment, a workflow host, a private tool) and want to ship custom
subagents alongside it, drop a plugin directory on that env path.

This guide is for plugin authors. The CLI user doesn't need to read it.

## Minimum viable plugin

```
/path/to/my-plugin/
├── gbrain.plugin.json
└── subagents/
    └── my-summarizer.md
```

`gbrain.plugin.json`:

```json
{
  "name": "my-plugin",
  "version": "1.0.0",
  "plugin_version": "gbrain-plugin-v1"
}
```

`subagents/my-summarizer.md`:

```markdown
---
name: my-summarizer
model: claude-sonnet-4-6
allowed_tools:
  - brain_search
  - brain_get_page
---

You are a brain page summarizer. Given a slug, fetch the page and produce
a 3-sentence summary.
```

## Turning it on

```bash
export GBRAIN_PLUGIN_PATH="/path/to/my-plugin"
gbrain jobs work # worker startup prints the plugin load line
gbrain agent run "summarize meetings/2026-04-20" --subagent-def my-summarizer
```

Multiple plugins: colon-separated, just like `$PATH`.

```bash
export GBRAIN_PLUGIN_PATH="/path/to/plugin-a:/path/to/plugin-b"
```

## Rules (strict by design)

**Path policy.** Absolute paths only. Relative paths, `~`-prefixed paths,
and URL-style paths (`https://`, `file://`) are rejected with a warning.
You control where your plugin lives on disk; `gbrain` doesn't guess.
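
As an illustration of the path policy, the sketch below classifies a single path entry. The name `classifyPluginPathEntry` and the reason strings are hypothetical, and "absolute" is assumed to mean a POSIX path starting with `/`. The guide does not say how a URL containing `:` interacts with colon-splitting, so the sketch deliberately handles one entry at a time.

```typescript
type PathVerdict =
  | { ok: true; path: string }
  | { ok: false; reason: string };

// Hypothetical classifier for one GBRAIN_PLUGIN_PATH entry (POSIX assumed).
function classifyPluginPathEntry(entry: string): PathVerdict {
  // URL-style: a scheme followed by "://", e.g. https:// or file://
  if (/^[A-Za-z][A-Za-z0-9+.-]*:\/\//.test(entry)) {
    return { ok: false, reason: "URL-style path" };
  }
  // No home-directory expansion is attempted.
  if (entry.startsWith("~")) {
    return { ok: false, reason: "~-prefixed path" };
  }
  // Anything else that does not start at the filesystem root is relative.
  if (!entry.startsWith("/")) {
    return { ok: false, reason: "relative path" };
  }
  return { ok: true, path: entry };
}

const absoluteVerdict = classifyPluginPathEntry("/opt/plugins/my-plugin");
const relativeVerdict = classifyPluginPathEntry("./my-plugin");
```

Returning a verdict instead of throwing lets a loader warn about the rejected entry and keep going with the remaining ones, which matches the "rejected with a warning" wording.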

**Collision policy.** If two plugins ship a subagent with the same `name`,
the one listed FIRST in `GBRAIN_PLUGIN_PATH` wins. The other is dropped
with a warning naming both sources.
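
The left-wins rule can be sketched as a single pass over plugins in path order. `LoadedDef`, `mergeLeftWins`, and the warning text are hypothetical names for illustration, not the loader's real API.

```typescript
// Hypothetical shape: one subagent definition plus the plugin it came from.
interface LoadedDef {
  name: string;
  source: string;
}

function mergeLeftWins(pluginsInPathOrder: LoadedDef[][]): {
  defs: Map<string, LoadedDef>;
  warnings: string[];
} {
  const defs = new Map<string, LoadedDef>();
  const warnings: string[] = [];
  for (const plugin of pluginsInPathOrder) {
    for (const def of plugin) {
      const winner = defs.get(def.name);
      if (winner !== undefined) {
        // Later plugins never override; the warning names both sources.
        warnings.push(
          `subagent "${def.name}" from ${def.source} dropped; ` +
            `${winner.source} is listed first`,
        );
        continue;
      }
      defs.set(def.name, def);
    }
  }
  return { defs, warnings };
}

const merged = mergeLeftWins([
  [{ name: "analyzer", source: "/path/to/plugin-a" }],
  [{ name: "analyzer", source: "/path/to/plugin-b" }],
]);
```

Here `merged.defs.get("analyzer")` comes from `/path/to/plugin-a`, and `merged.warnings` holds one entry mentioning both plugin paths.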

**Trust policy.** Plugins ship subagent definitions ONLY in v0.15:

- You **cannot** declare new tools.
- You **cannot** extend the brain tool allow-list.
- You **cannot** override any `agentSafe` or similar flag.
- Your `allowed_tools:` frontmatter field MUST subset the derived brain
  tool registry. Names not in the registry are rejected at plugin load
  time (worker startup), NOT at subagent dispatch time, so a typo in
  your plugin gives you a loud startup error, not a silent "tool never
  fires" at 3am.
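
The load-time subset check from the last bullet might look like the sketch below. `validateAllowedTools` is a hypothetical name, and the registry contents are illustrative values taken from the example frontmatter in this guide, not the real derived registry.

```typescript
// Hypothetical load-time check: throw as soon as a definition names an
// unknown tool, so the failure surfaces at worker startup.
function validateAllowedTools(
  defName: string,
  allowedTools: string[],
  registry: ReadonlySet<string>,
): void {
  const unknown = allowedTools.filter((tool) => !registry.has(tool));
  if (unknown.length > 0) {
    throw new Error(
      `subagent "${defName}": unknown tool(s) in allowed_tools: ${unknown.join(", ")}`,
    );
  }
}

// Illustrative registry only; the real one is derived by gbrain.
const exampleRegistry: ReadonlySet<string> = new Set([
  "brain_search",
  "brain_get_page",
]);

validateAllowedTools("my-summarizer", ["brain_search"], exampleRegistry);
```

A misspelled entry such as `brain_serach` would make the call throw with the definition name and the offending tool in the message.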

v0.16+ may open up plugin-declared tools with a separate contract. Don't
expect it.

## `gbrain.plugin.json`

| field            | type   | required | notes                                                              |
|------------------|--------|----------|--------------------------------------------------------------------|
| `name`           | string | yes      | Human-readable plugin id. Shows up in warnings and collision logs. |
| `version`        | string | yes      | Your plugin's semver. Informational.                               |
| `plugin_version` | string | yes      | Contract lock. Must equal `"gbrain-plugin-v1"` for v0.15.          |
| `subagents`      | string | no       | Subdir name (default `subagents`). Escape-attempts are rejected.   |
| `description`    | string | no       | Shown in future `gbrain plugin list`.                              |
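
The table above translates into a small validator. This is a sketch under assumptions: `parsePluginManifest` is a hypothetical name, and "escape-attempt" is interpreted here as an absolute path or any `..` segment, which the guide does not spell out.

```typescript
interface PluginManifest {
  name: string;
  version: string;
  plugin_version: "gbrain-plugin-v1";
  subagents: string;
  description?: string;
}

// Hypothetical validator mirroring the field table; not the real loader.
function parsePluginManifest(jsonText: string): PluginManifest {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("manifest must be a JSON object");
  }
  const m = raw as Record<string, unknown>;
  const requireString = (key: string): string => {
    const value = m[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`manifest field "${key}" must be a non-empty string`);
    }
    return value;
  };
  const name = requireString("name");
  const version = requireString("version");
  if (requireString("plugin_version") !== "gbrain-plugin-v1") {
    throw new Error('plugin_version must equal "gbrain-plugin-v1"');
  }
  // Optional field with a documented default of "subagents".
  const subagents = m["subagents"] === undefined ? "subagents" : m["subagents"];
  if (
    typeof subagents !== "string" ||
    subagents.length === 0 ||
    subagents.startsWith("/") || // assumption: absolute paths count as escapes
    subagents.split("/").some((segment) => segment === "..")
  ) {
    throw new Error("subagents must name a subdirectory inside the plugin");
  }
  const description = m["description"];
  if (description !== undefined && typeof description !== "string") {
    throw new Error("description must be a string when present");
  }
  return { name, version, plugin_version: "gbrain-plugin-v1", subagents, description };
}

const minimalManifest = parsePluginManifest(
  '{"name":"my-plugin","version":"1.0.0","plugin_version":"gbrain-plugin-v1"}',
);
```

With the minimal manifest from earlier in this guide, `minimalManifest.subagents` falls back to `"subagents"`.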

## Subagent definition files

Plain markdown with YAML frontmatter. The body is the system prompt. The
frontmatter controls runtime behavior.

Recognized frontmatter fields:

| field           | type     | required | notes                                                                                  |
|-----------------|----------|----------|----------------------------------------------------------------------------------------|
| `name`          | string   | no       | Subagent identifier used as `--subagent-def`. Defaults to the file basename.           |
| `model`         | string   | no       | Anthropic model id. Defaults to the handler default (sonnet).                          |
| `max_turns`     | number   | no       | Cap on assistant turns. Defaults to 20.                                                |
| `allowed_tools` | string[] | no       | Whitelist of tool names. Must subset the derived brain registry. Rejected on mismatch. |

Unknown frontmatter fields are preserved but ignored by the handler. v0.16
may consume more of them.
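
The documented defaults can be applied in one place once the frontmatter is parsed. In this sketch the type and function names are hypothetical, YAML parsing is assumed to have happened already, and "file basename" is assumed to mean the file name without its `.md` extension.

```typescript
// Frontmatter as it might look after YAML parsing (all fields optional).
interface SubagentFrontmatter {
  name?: string;
  model?: string;
  max_turns?: number;
  allowed_tools?: string[];
}

interface ResolvedSubagentDef {
  name: string;
  model: string | undefined; // undefined means "use the handler default"
  maxTurns: number;
  allowedTools: string[] | undefined;
  systemPrompt: string;
}

function resolveSubagentDef(
  filePath: string,
  frontmatter: SubagentFrontmatter,
  body: string,
): ResolvedSubagentDef {
  // Assumption: the basename drops the directory and a trailing ".md".
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const baseName = fileName.endsWith(".md") ? fileName.slice(0, -3) : fileName;
  return {
    name: frontmatter.name ?? baseName,
    model: frontmatter.model,
    maxTurns: frontmatter.max_turns ?? 20,
    allowedTools: frontmatter.allowed_tools,
    systemPrompt: body.trim(), // the markdown body is the system prompt
  };
}

const resolvedDef = resolveSubagentDef(
  "/path/to/my-plugin/subagents/my-summarizer.md",
  { allowed_tools: ["brain_search", "brain_get_page"] },
  "You are a brain page summarizer.\n",
);
```

With no `name` or `max_turns` in the frontmatter, `resolvedDef.name` is `"my-summarizer"` and `resolvedDef.maxTurns` is `20`.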

## Caveats that will bite you

1. **Plugin definitions can't change during a run.** The loader reads the
   disk once at worker startup. Editing a subagent def doesn't take
   effect until you restart the worker. This is deliberate: live
   reloads would break crash-resumable replay.

2. **`~/.gbrain/audit/subagent-jobs-*.jsonl` is local only.** If your
   worker runs on a different host than the `gbrain agent logs` caller,
   the CLI won't see heartbeats from that worker. v0.16 will unify this;
   for now assume worker + CLI share a filesystem.

3. **Tool calls always run with `ctx.remote = true`.** Even on local CLI
   invocation. Gates keyed on `remote=true` (such as `file_upload`'s strict
   confinement) therefore apply, and `put_page`'s namespace check applies
   as well. Good default; a subagent definition that wants local-filesystem
   reach beyond the brain can't have it.

4. **`put_page` writes are namespace-scoped.** A subagent with id 42 can
   only write under `wiki/agents/42/...`. This is enforced both in the
   tool schema (the slug pattern shown to the model) AND server-side in
   the `put_page` operation (fail-closed if `viaSubagent=true`). Don't
   try to route around it; you'll get `permission_denied`.

## Example: a downstream-OpenClaw plugin

```
~/your-openclaw/
└── gbrain-plugin/
    ├── gbrain.plugin.json
    └── subagents/
        ├── meeting-ingestion.md
        ├── signal-detector.md
        └── daily-task-prep.md
```

`~/your-openclaw/gbrain-plugin/gbrain.plugin.json`:

```json
{
  "name": "your-openclaw",
  "version": "2026.4.20",
  "plugin_version": "gbrain-plugin-v1",
  "description": "Your OpenClaw's personal-brain subagents"
}
```

Environment:

```bash
export GBRAIN_PLUGIN_PATH="$HOME/your-openclaw/gbrain-plugin"
```

Then your OpenClaw calls `gbrain agent run --subagent-def meeting-ingestion
--fanout-by transcript ...` and its definitions load automatically.
@@ -4,8 +4,8 @@ GBrain's Minion worker ships with seven built-in handlers: `sync`,
|
||||
`embed`, `lint`, `import`, `extract`, `backlinks`, `autopilot-cycle`.
|
||||
These cover every background operation the gbrain CLI itself performs.
|
||||
|
||||
Host platforms (Wintermute, other OpenClaw deployments, future hosts)
|
||||
register their own handlers via a plugin bootstrap that imports
|
||||
Host platforms (OpenClaw deployments, future hosts) register their own
|
||||
handlers via a plugin bootstrap that imports
|
||||
`gbrain/minions`. No `handlers.json`-style data file — handlers are
|
||||
code, loaded by the worker, with the same trust model as any other
|
||||
code in the host's repo.
|
||||
@@ -58,7 +58,7 @@ async function main() {
|
||||
main().catch(err => { console.error(err); process.exit(1); });
|
||||
```
|
||||
|
||||
Ship this as a separate binary in the host repo (e.g. `wintermute-worker`)
|
||||
Ship this as a separate binary in the host repo (e.g. `your-openclaw-worker`)
|
||||
or as a side-effect module that the stock `gbrain jobs work` command
|
||||
auto-loads on startup (configurable via a host-provided entry point).
|
||||
|
||||
|
||||
149
llms-full.txt
@@ -138,8 +138,19 @@ strict behavior when unset.
|
||||
- `src/core/minions/protected-names.ts` — side-effect-free constant module exporting `PROTECTED_JOB_NAMES` + `isProtectedJobName()`. Kept pure so queue core can import without loading handler modules.
|
||||
- `src/core/minions/handlers/shell.ts` — `shell` job handler. Spawns `/bin/sh -c cmd` (absolute path, PATH-override-safe) or `argv[0] argv[1..]` (no shell). Env allowlist: `PATH, HOME, USER, LANG, TZ, NODE_ENV` + caller `env:` overrides. UTF-8-safe stdout/stderr tail via `string_decoder.StringDecoder`. Abort (either `ctx.signal` or `ctx.shutdownSignal`) fires SIGTERM → 5s grace → SIGKILL on child. Requires `GBRAIN_ALLOW_SHELL_JOBS=1` on worker (gated by `registerBuiltinHandlers`).
|
||||
- `src/core/minions/handlers/shell-audit.ts` — per-submission JSONL audit trail at `~/.gbrain/audit/shell-jobs-YYYY-Www.jsonl` (ISO-week rotation; override via `GBRAIN_AUDIT_DIR`). Best-effort: `mkdirSync(recursive)` + `appendFileSync`; failures logged to stderr, submission not blocked. Logs cmd (first 80 chars) or argv (JSON array). Never logs env values.
|
||||
- `src/core/minions/handlers/subagent.ts` (v0.15) — LLM-loop handler. Two-phase tool persistence (pending → complete/failed), replay reconciliation for mid-dispatch crashes, dual-signal abort (`ctx.signal` + `ctx.shutdownSignal`), Anthropic prompt caching on system + tool defs. `makeSubagentHandler({engine, client?, ...})` factory; `MessagesClient` is an injectable interface the real SDK implements structurally. Throws `RateLeaseUnavailableError` (renewable) when rate-lease capacity is full.
|
||||
- `src/core/minions/handlers/subagent-aggregator.ts` (v0.15) — `subagent_aggregator` handler. Claims AFTER all children resolve (queue changes guarantee every terminal child posts a `child_done` inbox message with outcome). Reads inbox via `ctx.readInbox()`, builds deterministic mixed-outcome markdown summary. No LLM call in v0.15.
|
||||
- `src/core/minions/handlers/subagent-audit.ts` (v0.15) — JSONL audit + heartbeat writer at `~/.gbrain/audit/subagent-jobs-YYYY-Www.jsonl`. Events: `submission` (one line per submit) + `heartbeat` (per turn boundary: `llm_call_started | llm_call_completed | tool_called | tool_result | tool_failed`). Never logs prompts or tool inputs. `readSubagentAuditForJob(jobId, {sinceIso})` is the readback path for `gbrain agent logs`.
|
||||
- `src/core/minions/rate-leases.ts` (v0.15) — lease-based concurrency cap for outbound providers (default key `anthropic:messages`, max via `GBRAIN_ANTHROPIC_MAX_INFLIGHT`). Owner-tagged rows with `expires_at` auto-prune on acquire; `pg_advisory_xact_lock` guards check-then-insert; CASCADE on owning job deletion. `renewLeaseWithBackoff` retries 3x (250/500/1000ms).
|
||||
- `src/core/minions/wait-for-completion.ts` (v0.15) — poll-until-terminal helper for CLI callers. `TimeoutError` does NOT cancel the job; `AbortSignal` exits without throwing. Default `pollMs`: 1000 on Postgres, 250 on PGLite inline.
|
||||
- `src/core/minions/transcript.ts` (v0.15) — renders `subagent_messages` + `subagent_tool_executions` to markdown. Tool rows splice under their owning assistant `tool_use` by `tool_use_id`. UTF-8-safe truncation; unknown block types fall through to fenced JSON.
|
||||
- `src/core/minions/plugin-loader.ts` (v0.15) — `GBRAIN_PLUGIN_PATH` discovery. Absolute paths only, left-wins collision, `gbrain.plugin.json` with `plugin_version: "gbrain-plugin-v1"`, plugins ship DEFS only (no new tools), `allowed_tools:` validated at load time against the derived registry.
|
||||
- `src/core/minions/tools/brain-allowlist.ts` (v0.15) — derives subagent tool registry from `src/core/operations.ts`. 11-name allow-list: `query`, `search`, `get_page`, `list_pages`, `file_list`, `file_url`, `get_backlinks`, `traverse_graph`, `resolve_slugs`, `get_ingest_log`, `put_page`. `put_page` schema is namespace-wrapped per subagent (`^wiki/agents/<subagentId>/.+`); the `put_page` op's server-side check is the authoritative gate via `ctx.viaSubagent` fail-closed.
|
||||
- `src/mcp/tool-defs.ts` (v0.15) — extracted `buildToolDefs(ops)` helper. MCP server + subagent tool registry both call it; byte-for-byte equivalence pinned by `test/mcp-tool-defs.test.ts`.
|
||||
- `src/core/minions/attachments.ts` — Attachment validation (path traversal, null byte, oversize, base64, duplicate detection)
|
||||
- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. v0.13.1 surfaces the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219.
|
||||
- `src/commands/agent.ts` (v0.16) — `gbrain agent run <prompt> [flags]` CLI. Submits `subagent` (or N children + 1 aggregator) under `{allowProtectedSubmit: true}`. Single-entry `--fanout-manifest` short-circuits. Children get `on_child_fail: 'continue'` + `max_stalled: 3`. `--follow` is the default on TTY; streams logs + polls `waitForCompletion` in parallel. Ctrl-C detaches, does not cancel.
|
||||
- `src/commands/agent-logs.ts` (v0.16) — `gbrain agent logs <job> [--follow] [--since]`. Merges JSONL heartbeat audit + `subagent_messages` into a chronological timeline. `parseSince` accepts ISO-8601 or relative (`5m`, `1h`, `2d`). Transcript tail renders only for terminal jobs.
|
||||
- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. v0.13.1 surfaces the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219. v0.16 wires `registerBuiltinHandlers` to always register `subagent` + `subagent_aggregator` (no env flag — `ANTHROPIC_API_KEY` is the natural cost gate, trust is via `PROTECTED_JOB_NAMES`) and loads `GBRAIN_PLUGIN_PATH` plugins at worker startup with a loud startup-line per plugin. `shell` handler still gated by `GBRAIN_ALLOW_SHELL_JOBS=1` (RCE surface, separate concern).
|
||||
- `src/commands/features.ts` — `gbrain features --json --auto-fix`: usage scan + feature adoption salesman
|
||||
- `src/commands/autopilot.ts` — `gbrain autopilot --install`: self-maintaining brain daemon (sync+extract+embed)
|
||||
- `src/mcp/server.ts` — MCP stdio server (generated from operations)
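
The `--since` formats listed for `src/commands/agent-logs.ts` above (ISO-8601 or relative `5m`, `1h`, `2d`) can be illustrated with a small parser. This is a sketch only: `parseSinceSketch` is a hypothetical stand-in, the real `parseSince` signature is not shown in this document, and passing the current time as a parameter is a choice made here for testability.

```typescript
// Hypothetical stand-in for illustration; not the real parseSince.
function parseSinceSketch(input: string, nowMs: number): Date {
  const relative = /^(\d+)([mhd])$/.exec(input);
  if (relative !== null) {
    const unitMs: Record<string, number> = {
      m: 60_000,
      h: 3_600_000,
      d: 86_400_000,
    };
    // Relative values count backwards from "now".
    return new Date(nowMs - Number(relative[1]) * unitMs[relative[2]]);
  }
  // Otherwise treat the input as an absolute timestamp. Note that Date
  // parsing is more lenient than strict ISO-8601.
  const absolute = new Date(input);
  if (Number.isNaN(absolute.getTime())) {
    throw new Error(`unrecognized --since value: ${input}`);
  }
  return absolute;
}

const fiveMinutesAgo = parseSinceSketch("5m", Date.UTC(2026, 3, 20, 10, 0, 0));
```

Here `fiveMinutesAgo` is 09:55:00 UTC on 2026-04-20, five minutes before the supplied reference time.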
|
||||
@@ -636,6 +647,21 @@ search engines, surfaced in cross-references, and distributed with every release
|
||||
- Example meeting → `meetings/2026-04-03` (generic date is fine)
|
||||
- Example user → `you` or `the user`, never a proper name
|
||||
|
||||
**Specific rule: never say `Wintermute` in any CHANGELOG, README, doc, PR, or
|
||||
commit message.** When the temptation is to illustrate with the real fork name:
|
||||
- Reader-facing copy → `your OpenClaw` (covers Wintermute, Hermes, AlphaClaw,
|
||||
and any other downstream OpenClaw deployment in one term the reader already
|
||||
recognizes).
|
||||
- First-person / origin-story copy → `Garry's OpenClaw` (honest that this is
|
||||
the production deployment driving the feature, without exposing the private
|
||||
agent's name).
|
||||
|
||||
`Wintermute` may appear in private artifacts (scratch plans under
|
||||
`~/.gstack/projects/…`, memory files, conversation transcripts, CEO-review
|
||||
plans) — those aren't distributed. Anything checked into this repo or shipped
|
||||
in a release must use the OpenClaw phrasing above. Sweeping a stale reference
|
||||
is a small clean-up PR, not a debate.
|
||||
|
||||
**When in doubt, ask yourself:** "Would this query reveal private information
|
||||
about the user's contacts, investments, or portfolio if it were read by a
|
||||
stranger?" If yes, replace with generic placeholders.
|
||||
@@ -1258,6 +1284,25 @@ If anything's off, `actions[]` tells you the exact command to run. For deeper tr
|
||||
|
||||
Moving gateway crons to Minions (deterministic scripts, zero LLM tokens per fire): [`docs/guides/minions-shell-jobs.md`](docs/guides/minions-shell-jobs.md).
|
||||
|
||||
## Durable agents: `gbrain agent` (v0.15)
|
||||
|
||||
Your subagent runs survive crashes now. OpenClaw died mid-run? The worker re-claims on restart and replays from the last committed turn. Fan-out across 50 shards, one shard crashes — the aggregator still claims after every child reaches a terminal state and writes a mixed-outcome summary. Tool calls persist as a two-phase ledger (`pending` → `complete | failed`) so replay is safe by construction, not by hope.
|
||||
|
||||
```bash
|
||||
# Submit a single-subagent run
|
||||
gbrain agent run "summarize my last 10 journal pages"
|
||||
|
||||
# Fan out N prompts across N subagent children + 1 aggregator
|
||||
gbrain agent run "analyze every page" \
|
||||
--fanout-manifest manifests/pages.json \
|
||||
--subagent-def analyzer
|
||||
|
||||
# Tail a running job (heartbeat per turn + full transcript on completion)
|
||||
gbrain agent logs 1247 --follow --since 5m
|
||||
```
|
||||
|
||||
Durability is the point: every Anthropic turn commits to `subagent_messages`, every tool call to `subagent_tool_executions`. Worker kills, OpenClaw crashes, timeouts — all resumable. Host repos (your OpenClaw, etc.) ship their own subagent definitions via `GBRAIN_PLUGIN_PATH` + a `gbrain.plugin.json` manifest: see [`docs/guides/plugin-authors.md`](docs/guides/plugin-authors.md). Requires `ANTHROPIC_API_KEY` on the worker.
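
The two-phase ledger idea (`pending` → `complete | failed`) can be sketched in a few lines. This is an illustrative in-memory model, not the shipped handler: `ToolLedger`, `LedgerRow`, and the `toolu_1` id are hypothetical names, the `Map` stands in for the `subagent_tool_executions` table, and tools are synchronous here only for brevity. It assumes, as the runtime does for v1, that every tool is idempotent, so a row left `pending` by a crash may be re-run.

```typescript
type ToolStatus = 'pending' | 'complete' | 'failed';

interface LedgerRow {
  toolUseId: string;
  toolName: string;
  status: ToolStatus;
  output?: unknown;
  error?: string;
}

// Hypothetical stand-in for the subagent_tool_executions table.
class ToolLedger {
  private rows = new Map<string, LedgerRow>();

  execute(toolUseId: string, toolName: string, run: () => unknown): LedgerRow {
    const existing = this.rows.get(toolUseId);
    // Replay: a finished row is returned as-is, so the tool is not re-run.
    if (existing && existing.status !== 'pending') return existing;

    // Phase 1: record intent before doing any work.
    const row: LedgerRow = { toolUseId, toolName, status: 'pending' };
    this.rows.set(toolUseId, row);

    // Phase 2: record the outcome, success or failure.
    try {
      row.output = run();
      row.status = 'complete';
    } catch (e) {
      row.error = e instanceof Error ? e.message : String(e);
      row.status = 'failed';
    }
    return row;
  }
}

const ledger = new ToolLedger();
let calls = 0;
const first = ledger.execute('toolu_1', 'search', () => { calls += 1; return 'hit'; });
const replayed = ledger.execute('toolu_1', 'search', () => { calls += 1; return 'hit again'; });
console.log(first.status, replayed.output, calls); // prints: complete hit 1
```

The second call returns the stored row instead of invoking the tool again, which is what makes replay after a crash safe in this model.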
|
||||
|
||||
## Skillify: your skills tree stops being a black box
|
||||
|
||||
Hermes and similar agent frameworks auto-create skills as a background behavior. Fine until you don't know what the agent shipped. Checklists decay. Tests drift. Resolver entries get stale. Six months later you've got an opaque pile of "skills" that nobody has read, nobody has tested, and nobody is sure still work.
|
||||
@@ -3918,7 +3963,7 @@ F. Install gbrain autopilot --install (env-aware)
|
||||
G. Record append completed.jsonl status:"complete"
|
||||
```
|
||||
|
||||
-If Phase E emits TODOs for host-specific handlers (e.g. Wintermute's
+If Phase E emits TODOs for host-specific handlers (e.g. your OpenClaw's
|
||||
~29 non-gbrain crons), the migration finishes with `status: "partial"`.
|
||||
Your host agent walks the TODOs using `skills/migrations/v0.11.0.md` +
|
||||
`docs/guides/plugin-handlers.md`, ships handler registrations in the
|
||||
@@ -4444,6 +4489,106 @@ upcoming `gbrain crontab-to-minions <file>` helper is P1 in TODOS.
|
||||
|
||||
---
|
||||
|
||||
## v0.16.0: durable agent runtime
|
||||
|
||||
v0.15 ships `gbrain agent run` / `gbrain agent logs`, a new `subagent` handler
|
||||
type in Minions, and a plugin contract for host-repo subagent defs. None of the
|
||||
existing skills need surgery. The question for downstream agents is *how* to
|
||||
adopt the new runtime, not how to patch around a breaking change.
|
||||
|
||||
### 1. Run a worker with an Anthropic key
|
||||
|
||||
The subagent handlers (`subagent` and `subagent_aggregator`) are always
|
||||
registered on the worker. No separate opt-in flag — `ANTHROPIC_API_KEY` is
|
||||
the natural cost gate (no key, the SDK call fails on the first turn), and
|
||||
who-can-submit is already protected (`PROTECTED_JOB_NAMES` + trusted-submit:
|
||||
MCP callers get `permission_denied`; only `gbrain agent run` can insert
|
||||
these rows).
|
||||
|
||||
```bash
|
||||
ANTHROPIC_API_KEY=sk-ant-... gbrain jobs work
|
||||
```
|
||||
|
||||
Worker startup prints:
|
||||
|
||||
```
|
||||
[minion worker] subagent handlers enabled
|
||||
```
|
||||
|
||||
### 2. Ship your subagents as a plugin (OpenClaw + similar)
|
||||
|
||||
Move your custom subagent definitions out of your gbrain fork and into your own
|
||||
repo as a plugin. Concretely:
|
||||
|
||||
```
|
||||
~/<your-agent>/gbrain-plugin/
|
||||
├── gbrain.plugin.json
|
||||
└── subagents/
|
||||
├── meeting-ingestion.md
|
||||
├── signal-detector.md
|
||||
└── daily-task-prep.md
|
||||
```
|
||||
|
||||
`gbrain.plugin.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "your-openclaw",
|
||||
"version": "2026.4.20",
|
||||
"plugin_version": "gbrain-plugin-v1"
|
||||
}
|
||||
```
|
||||
|
||||
Each `subagents/*.md` is a plain-text agent definition — YAML frontmatter +
|
||||
body-as-system-prompt. Recognized frontmatter fields: `name`, `model`,
|
||||
`max_turns`, `allowed_tools` (must subset the derived brain-tool registry).
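
The "must subset the registry" rule for `allowed_tools` could be checked along these lines. A sketch only: `SubagentDef`, `unknownTools`, and the example registry contents (`search`, `get_page`) are hypothetical, not the plugin loader's real API; only `put_page` and the four frontmatter field names come from this doc.

```typescript
// Frontmatter fields named above; everything but `name` is treated as optional here.
interface SubagentDef {
  name: string;
  model?: string;
  max_turns?: number;
  allowed_tools?: string[];
}

/** Return the tools a definition asks for that the registry does not know. */
function unknownTools(def: SubagentDef, registry: ReadonlySet<string>): string[] {
  return (def.allowed_tools ?? []).filter((tool) => !registry.has(tool));
}

// Hypothetical registry contents for illustration.
const registry = new Set(['search', 'get_page', 'put_page']);

const ok: SubagentDef = { name: 'signal-detector', allowed_tools: ['search', 'put_page'] };
const bad: SubagentDef = { name: 'meeting-ingestion', allowed_tools: ['search', 'shell'] };

console.log(unknownTools(ok, registry));  // prints: []
console.log(unknownTools(bad, registry)); // prints: [ 'shell' ]
```

A non-empty result is the kind of condition that would be reported as a startup warning rather than discovered at dispatch time.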
|
||||
|
||||
Turn it on:
|
||||
|
||||
```bash
|
||||
export GBRAIN_PLUGIN_PATH="$HOME/<your-agent>/gbrain-plugin"
|
||||
```
|
||||
|
||||
Worker startup prints `[plugin-loader] loaded '<name>' v<ver> (N subagents)`
|
||||
per plugin; any rejection (bad manifest, unknown tool in `allowed_tools`,
|
||||
version mismatch) shows up as a loud warning at startup, not a silent dispatch-
|
||||
time failure. See `docs/guides/plugin-authors.md` for the full contract.
|
||||
|
||||
### 3. Replace ephemeral subagent runs with durable ones
|
||||
|
||||
If your agent currently spawns ephemeral subagents (OpenClaw `Agent()`, ad-hoc
|
||||
Anthropic API calls, etc.) for work that should survive crashes, sleeps, or
|
||||
worker restarts, migrate those to `gbrain agent run`. The durability is free:
|
||||
|
||||
```bash
|
||||
gbrain agent run "analyze my last 50 journal pages for recurring themes" \
|
||||
--subagent-def analyzer --fanout-manifest manifests/journal-pages.json
|
||||
```
|
||||
|
||||
Every turn persists to `subagent_messages`, every tool call is a two-phase
|
||||
ledger, and `gbrain agent logs <job>` shows where it died + what the last
|
||||
successful call returned. No more "re-run from scratch because the session
|
||||
context evaporated."
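
A fan-out manifest is a JSON array with one entry per child, each carrying an optional `prompt` and optional `input_vars`. One way to generate such a file from a list of pages is sketched below. The `buildManifest` helper, the slugs, and the prompt wording are hypothetical, and how a given subagent definition consumes `input_vars` is an assumption to verify against your own definitions.

```typescript
// Entry shape accepted by --fanout-manifest: both fields are optional.
interface FanoutEntry {
  prompt?: string;
  input_vars?: Record<string, unknown>;
}

/** Build one manifest entry per page slug (hypothetical helper). */
function buildManifest(slugs: string[]): FanoutEntry[] {
  return slugs.map((slug) => ({
    prompt: `Find recurring themes in ${slug}`,
    input_vars: { slug },
  }));
}

const manifest = buildManifest(['journal/2026-04-01', 'journal/2026-04-02']);

// Write this output to a file such as manifests/journal-pages.json.
console.log(JSON.stringify(manifest, null, 2));
```

Entries that omit `prompt` fall back to the prompt given on the command line, so a manifest can also consist of `input_vars` alone.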
|
||||
|
||||
### 4. `put_page` from subagents writes under an agent namespace
|
||||
|
||||
If you adopted the v0.15 subagent runtime, note that `put_page` calls
|
||||
originating from a subagent's tool dispatch MUST target
|
||||
`wiki/agents/<subagent_id>/...`. The schema shown to the model enforces this
|
||||
on first try; a server-side fail-closed check rejects anything else. This
|
||||
does NOT affect your skill files, CLI put_page calls, or MCP put_page —
|
||||
only tool-dispatched writes from inside an LLM loop.
|
||||
|
||||
Aggregation output (the final "here's what all N children found" brain page)
|
||||
goes via a separate trusted CLI path, not through a subagent tool call, so
|
||||
it can write anywhere you want.
|
||||
|
||||
Iron rule: **never grant an agent write access beyond its namespace**. The
|
||||
server-side check exists because dispatcher bugs happen; treat it as defense
|
||||
in depth, not the primary boundary.
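
The namespace rule can be pictured as an anchored prefix check with a slash boundary. This is a minimal sketch under stated assumptions, not the server's actual implementation: `isAllowedSubagentSlug` is a hypothetical name, and treating non-positive or non-integer ids as invalid is an assumption made here to keep the check fail-closed.

```typescript
/** True only for slugs strictly under wiki/agents/<subagentId>/ (hypothetical helper). */
function isAllowedSubagentSlug(slug: string, subagentId: number | undefined): boolean {
  // Fail closed: without a usable id, no slug can pass.
  if (subagentId === undefined || !Number.isInteger(subagentId) || subagentId <= 0) {
    return false;
  }
  // The trailing slash is the boundary that defeats prefix collisions like "12evil".
  const prefix = `wiki/agents/${subagentId}/`;
  // Anchored at the start, and something must follow the prefix.
  return slug.startsWith(prefix) && slug.length > prefix.length;
}

console.log(isAllowedSubagentSlug('wiki/agents/12/notes', 12));      // prints: true
console.log(isAllowedSubagentSlug('wiki/agents/12evil/notes', 12));  // prints: false
console.log(isAllowedSubagentSlug('/wiki/agents/12/notes', 12));     // prints: false
console.log(isAllowedSubagentSlug('wiki/agents/12/notes', undefined)); // prints: false
```

Running a check like this before any dry-run short-circuit means preview calls surface the same rejection as real writes.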
|
||||
|
||||
---
|
||||
|
||||
## Future versions
|
||||
|
||||
When gbrain ships a new version, this doc will be updated with the diffs for that
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "gbrain",
|
||||
"version": "0.15.4",
|
||||
"version": "0.16.0",
|
||||
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
|
||||
"type": "module",
|
||||
"main": "src/core/index.ts",
|
||||
|
||||
@@ -15,7 +15,7 @@
|
||||
* Returns JSON when --json is passed: { path, score, total, items,
|
||||
* recommendation }. Exit code is 0 when score == total, 1 otherwise.
|
||||
*
|
||||
-* Ported from ~/git/wintermute/workspace/scripts/skillify-check.mjs
+* Ported from ~/git/your-openclaw/workspace/scripts/skillify-check.mjs
|
||||
* (genericized: paths computed from $PROJECT_ROOT + runtime test-dir
|
||||
* detection; replaces the manual `grep AGENTS.md` check with a reference
|
||||
* to `gbrain check-resolvable` which validates the resolver better).
|
||||
|
||||
@@ -9,7 +9,7 @@ feature_pitch:
|
||||
|
||||
# v0.11.0 Migration: Minions — host-agent instruction manual
|
||||
|
||||
-**Audience: host agents (Wintermute, other OpenClaw deployments, future
+**Audience: host agents (OpenClaw deployments, future
|
||||
hosts) reading this AFTER `gbrain apply-migrations` has run its
|
||||
mechanical phases.** The orchestrator in
|
||||
`src/commands/migrations/v0_11_0.ts` is the runtime source of truth for
|
||||
@@ -32,7 +32,7 @@ Non-empty? Each line is a TODO. Each `type` routes to a section below.
|
||||
Gbrain rewrites cron entries whose handler name matches a gbrain
|
||||
builtin (`sync`, `embed`, `lint`, `import`, `extract`, `backlinks`,
|
||||
`autopilot-cycle`). For host-specific handlers (e.g. `ea-inbox-sweep`,
|
||||
-`frameio-scan`, `x-dm-triage`, `calendar-sync` on Wintermute), gbrain
+`frameio-scan`, `x-dm-triage`, `calendar-sync` on your OpenClaw), gbrain
|
||||
leaves the manifest alone and emits a TODO with shape:
|
||||
|
||||
```json
|
||||
@@ -69,7 +69,7 @@ await worker.start();
|
||||
### (b) Ship the bootstrap in your host repo
|
||||
|
||||
Autopilot already spawns `gbrain jobs work` as a child. Configure it to
|
||||
-spawn your custom worker binary (e.g. `wintermute-worker`) instead, or
+spawn your custom worker binary (e.g. `your-openclaw-worker`) instead, or
|
||||
register handlers as a side-effect module that the stock worker loads on
|
||||
startup. Either path is documented in `plugin-handlers.md`.
|
||||
|
||||
|
||||
@@ -4,7 +4,7 @@ version: 1.0.0
|
||||
description: |
|
||||
Run `gbrain skillpack-check` to produce an agent-readable JSON health report
|
||||
for the gbrain install. Wraps `gbrain doctor` + `gbrain apply-migrations
|
||||
---list` so a host agent (Wintermute's morning-briefing, any OpenClaw cron)
+--list` so a host agent (your OpenClaw's morning-briefing, any OpenClaw cron)
|
||||
can see at a glance whether the skillpack needs attention.
|
||||
|
||||
Use when the user asks "is gbrain healthy?", when a cron fires a morning
|
||||
@@ -40,7 +40,7 @@ Exit code:
|
||||
|
||||
## When to run
|
||||
|
||||
-- **Daily cron** (e.g. Wintermute's `morning-briefing`): `gbrain skillpack-check --quiet`.
+- **Daily cron** (e.g. your OpenClaw's `morning-briefing`): `gbrain skillpack-check --quiet`.
|
||||
Exit code alone tells you if anything is wrong; surface a one-liner in the
|
||||
briefing only when exit != 0. No JSON noise in happy-path briefings.
|
||||
- **On demand**: `gbrain skillpack-check` for the full JSON when debugging.
|
||||
|
||||
@@ -19,7 +19,7 @@ for (const op of operations) {
|
||||
}
|
||||
|
||||
// CLI-only commands that bypass the operation layer
|
||||
-const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans']);
+const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'agent', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans']);
|
||||
|
||||
async function main() {
|
||||
// Parse global flags (--quiet / --progress-json / --progress-interval)
|
||||
@@ -413,6 +413,11 @@ async function handleCliOnly(command: string, args: string[]) {
|
||||
await runJobs(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'agent': {
|
||||
const { runAgent } = await import('./commands/agent.ts');
|
||||
await runAgent(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'sync': {
|
||||
const { runSync } = await import('./commands/sync.ts');
|
||||
await runSync(engine, args);
|
||||
|
||||
185
src/commands/agent-logs.ts
Normal file
@@ -0,0 +1,185 @@
|
||||
/**
|
||||
* `gbrain agent logs <job_id> [--follow] [--since <spec>]`
|
||||
*
|
||||
* Reads two sources and merges them chronologically:
|
||||
* - ~/.gbrain/audit/subagent-jobs-*.jsonl (heartbeat + submission events
|
||||
* — lives on the WORKER's filesystem, so this CLI's effectiveness is
|
||||
* host-local today; see docs/guides/plugin-authors.md caveat #2)
|
||||
* - subagent_messages (DB rows, authoritative for persisted conversation)
|
||||
*
|
||||
* No new DB tables; all the infrastructure landed in prior Lane commits.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { readSubagentAuditForJob } from '../core/minions/handlers/subagent-audit.ts';
|
||||
import type { SubagentAuditEvent } from '../core/minions/handlers/subagent-audit.ts';
|
||||
import { loadTranscriptRows, renderTranscript } from '../core/minions/transcript.ts';
|
||||
import type { SubagentMessageRow } from '../core/minions/transcript.ts';
|
||||
|
||||
export interface AgentLogsOpts {
|
||||
follow?: boolean;
|
||||
/** ISO-8601 timestamp OR relative like "5m" / "1h" / "2d". */
|
||||
since?: string;
|
||||
/** Override poll interval for --follow. Default 1000ms. */
|
||||
pollMs?: number;
|
||||
/** Injectable writer for testing; default process.stdout.write. */
|
||||
write?: (s: string) => void;
|
||||
/** Abort to cut off a --follow loop cleanly (tests + Ctrl-C). */
|
||||
signal?: AbortSignal;
|
||||
}
|
||||
|
||||
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'dead', 'cancelled']);
|
||||
|
||||
export async function runAgentLogs(
|
||||
engine: BrainEngine,
|
||||
jobId: number,
|
||||
opts: AgentLogsOpts = {},
|
||||
): Promise<void> {
|
||||
const write = opts.write ?? ((s: string) => { process.stdout.write(s); });
|
||||
const sinceIso = parseSince(opts.since);
|
||||
|
||||
// Seeded render: dump everything we have right now.
|
||||
let lastTs: string | undefined = sinceIso;
|
||||
lastTs = await dumpSince(engine, jobId, lastTs, write);
|
||||
|
||||
if (!opts.follow) return;
|
||||
|
||||
const pollMs = opts.pollMs ?? 1000;
|
||||
while (!opts.signal?.aborted) {
|
||||
await sleep(pollMs, opts.signal);
|
||||
lastTs = await dumpSince(engine, jobId, lastTs, write);
|
||||
// Break on terminal job status so --follow exits once the run is done.
|
||||
const status = await readJobStatus(engine, jobId);
|
||||
if (status && TERMINAL_STATUSES.has(status)) {
|
||||
write(`\n[gbrain agent] job ${jobId} reached terminal state: ${status}\n`);
|
||||
return;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Dump events with ts > sinceIso. Returns the max ts seen so the next
|
||||
* poll round filters cleanly. When `sinceIso` is undefined on first call,
|
||||
* everything is dumped.
|
||||
*/
|
||||
async function dumpSince(
|
||||
engine: BrainEngine,
|
||||
jobId: number,
|
||||
sinceIso: string | undefined,
|
||||
write: (s: string) => void,
|
||||
): Promise<string | undefined> {
|
||||
const audit = readSubagentAuditForJob(jobId, sinceIso ? { sinceIso } : {});
|
||||
const { messages, tools } = await loadTranscriptRows(engine, jobId);
|
||||
|
||||
// Merge audit events + message rows into one timeline ordered by ts.
|
||||
const merged: Array<{ ts: string; line: string }> = [];
|
||||
|
||||
for (const e of audit) {
|
||||
if (sinceIso && e.ts <= sinceIso) continue;
|
||||
merged.push({ ts: e.ts, line: formatAudit(e) });
|
||||
}
|
||||
for (const m of messages) {
|
||||
const ts = m.ended_at.toISOString();
|
||||
if (sinceIso && ts <= sinceIso) continue;
|
||||
merged.push({ ts, line: formatMessage(m) });
|
||||
}
|
||||
|
||||
merged.sort((a, b) => a.ts.localeCompare(b.ts));
|
||||
|
||||
let maxTs = sinceIso;
|
||||
for (const item of merged) {
|
||||
write(`${item.ts} ${item.line}\n`);
|
||||
if (!maxTs || item.ts > maxTs) maxTs = item.ts;
|
||||
}
|
||||
|
||||
// Transcript tail (renders the full message/tool tree) only if we
|
||||
// actually have messages and the job is in a terminal state. This
|
||||
// avoids spamming a half-rendered transcript mid-run.
|
||||
if (messages.length > 0 && !sinceIso) {
|
||||
const status = await readJobStatus(engine, jobId);
|
||||
if (status && TERMINAL_STATUSES.has(status)) {
|
||||
write('\n');
|
||||
write(renderTranscript(messages, tools));
|
||||
write('\n');
|
||||
}
|
||||
}
|
||||
|
||||
return maxTs;
|
||||
}
|
||||
|
||||
function formatAudit(e: SubagentAuditEvent): string {
|
||||
if (e.type === 'submission') {
|
||||
return `[submission] ${e.caller} model=${e.model ?? '?'} tools=${e.tools_count ?? 0}`;
|
||||
}
|
||||
// heartbeat
|
||||
const parts = [`[${e.event}]`, `turn=${e.turn_idx}`];
|
||||
if (e.tool_name) parts.push(`tool=${e.tool_name}`);
|
||||
if (e.ms_elapsed != null) parts.push(`${e.ms_elapsed}ms`);
|
||||
if (e.tokens) {
|
||||
const t = e.tokens;
|
||||
const tokStr = [
|
||||
t.in ? `in=${t.in}` : null,
|
||||
t.out ? `out=${t.out}` : null,
|
||||
t.cache_read ? `cache_read=${t.cache_read}` : null,
|
||||
t.cache_create ? `cache_create=${t.cache_create}` : null,
|
||||
].filter(Boolean).join(' ');
|
||||
if (tokStr) parts.push(`tokens(${tokStr})`);
|
||||
}
|
||||
if (e.error) parts.push(`error="${e.error.slice(0, 100)}"`);
|
||||
return parts.join(' ');
|
||||
}
|
||||
|
||||
function formatMessage(m: SubagentMessageRow): string {
|
||||
const blockTypes = m.content_blocks.map(b => b.type).join(',');
|
||||
return `[message #${m.message_idx} ${m.role}] blocks=${blockTypes || '(empty)'}`;
|
||||
}
|
||||
|
||||
async function readJobStatus(engine: BrainEngine, jobId: number): Promise<string | null> {
|
||||
const rows = await engine.executeRaw<{ status: string }>(
|
||||
`SELECT status FROM minion_jobs WHERE id = $1`,
|
||||
[jobId],
|
||||
);
|
||||
return rows[0]?.status ?? null;
|
||||
}
|
||||
|
||||
const RELATIVE_RE = /^(\d+)\s*(s|m|h|d)$/i;
|
||||
|
||||
/** Parse `--since`. Accepts ISO-8601 or relative ("5m", "1h", "2d"). */
|
||||
export function parseSince(input: string | undefined): string | undefined {
|
||||
if (!input) return undefined;
|
||||
const trimmed = input.trim();
|
||||
if (!trimmed) return undefined;
|
||||
const rel = RELATIVE_RE.exec(trimmed);
|
||||
if (rel) {
|
||||
const [, nStr, unitRaw] = rel;
|
||||
const unit = unitRaw!.toLowerCase();
|
||||
const n = parseInt(nStr!, 10);
|
||||
const mult = unit === 's' ? 1000
|
||||
: unit === 'm' ? 60_000
|
||||
: unit === 'h' ? 3_600_000
|
||||
: 86_400_000; // 'd'
|
||||
return new Date(Date.now() - n * mult).toISOString();
|
||||
}
|
||||
// Assume ISO. `new Date(input).toISOString()` both validates and
|
||||
// normalizes; invalid ISO throws.
|
||||
const d = new Date(trimmed);
|
||||
if (isNaN(d.getTime())) {
|
||||
throw new Error(`--since: could not parse "${input}" as ISO-8601 or relative (e.g. "5m", "1h")`);
|
||||
}
|
||||
return d.toISOString();
|
||||
}
|
||||
|
||||
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
|
||||
return new Promise((resolve) => {
|
||||
const t = setTimeout(() => { signal?.removeEventListener('abort', onAbort); resolve(); }, ms);
|
||||
const onAbort = () => { clearTimeout(t); resolve(); };
|
||||
signal?.addEventListener('abort', onAbort, { once: true });
|
||||
});
|
||||
}
|
||||
|
||||
export const __testing = {
|
||||
parseSince,
|
||||
formatAudit,
|
||||
formatMessage,
|
||||
dumpSince,
|
||||
};
|
||||
333
src/commands/agent.ts
Normal file
@@ -0,0 +1,333 @@
|
||||
/**
|
||||
* `gbrain agent` CLI: the user-facing entry point for the v0.15 subagent
|
||||
* runtime.
|
||||
*
|
||||
* gbrain agent run <prompt> [flags]
|
||||
* gbrain agent logs <job_id> [--follow] [--since <spec>]
|
||||
*
|
||||
* `run` submits a subagent job (or fan-out of N subagents + aggregator)
|
||||
* under the trusted-submit flag so the PROTECTED_JOB_NAMES guard doesn't
|
||||
* reject. It does NOT execute the loop here — the handler runs in a
|
||||
* `gbrain jobs work` process. `--follow` tails status until terminal;
|
||||
* without `--follow` (or with `--detach`) the CLI prints the job id and
|
||||
* exits, leaving the user to check back with `gbrain agent logs`.
|
||||
*/
|
||||
|
||||
import * as fs from 'node:fs';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { MinionQueue } from '../core/minions/queue.ts';
|
||||
import { waitForCompletion, TimeoutError } from '../core/minions/wait-for-completion.ts';
|
||||
import type { MinionJobInput, SubagentHandlerData, AggregatorHandlerData } from '../core/minions/types.ts';
|
||||
import { runAgentLogs } from './agent-logs.ts';
|
||||
|
||||
// ── arg parsing helpers ────────────────────────────────────
|
||||
|
||||
function parseFlag(args: string[], flag: string): string | undefined {
|
||||
const idx = args.indexOf(flag);
|
||||
return idx >= 0 && idx + 1 < args.length ? args[idx + 1] : undefined;
|
||||
}
|
||||
function hasFlag(args: string[], flag: string): boolean { return args.includes(flag); }
|
||||
|
||||
/** Keep CLI args that look like flags from being eaten as the prompt. */
|
||||
function isKnownFlag(s: string): boolean {
|
||||
return s.startsWith('--');
|
||||
}
|
||||
|
||||
// ── command dispatcher ────────────────────────────────────
|
||||
|
||||
export async function runAgent(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const sub = args[0];
|
||||
if (!sub || sub === '--help' || sub === '-h') {
|
||||
printHelp();
|
||||
return;
|
||||
}
|
||||
|
||||
switch (sub) {
|
||||
case 'run':
|
||||
await runAgentRun(engine, args.slice(1));
|
||||
return;
|
||||
case 'logs':
|
||||
await runAgentLogsCmd(engine, args.slice(1));
|
||||
return;
|
||||
default:
|
||||
console.error(`gbrain agent: unknown subcommand "${sub}"`);
|
||||
printHelp();
|
||||
process.exit(2);
|
||||
}
|
||||
}
|
||||
|
||||
function printHelp(): void {
|
||||
console.log(`gbrain agent — durable LLM agent runs (v0.15)
|
||||
|
||||
USAGE
|
||||
gbrain agent run <prompt> [flags]
|
||||
gbrain agent logs <job_id> [--follow] [--since <spec>]
|
||||
|
||||
SUBMITTING
|
||||
gbrain agent run <prompt>
|
||||
--subagent-def <name> Named plugin subagent (from GBRAIN_PLUGIN_PATH)
|
||||
--model <id> Anthropic model id (defaults to sonnet)
|
||||
--max-turns <n> Max assistant turns (default 20)
|
||||
--tools a,b,c Subset of registered tool names (comma list)
|
||||
--timeout-ms <n> Per-job wall-clock timeout
|
||||
--fanout-manifest <path> JSON array of {prompt, input_vars?} — one child each
|
||||
--follow Tail status until terminal (default on TTY)
|
||||
--detach Submit + print job id, exit immediately
|
||||
|
||||
Flags after \`run\` up to the first unrecognized token are parsed; the
|
||||
remainder is the prompt. Use \`--\` to explicitly terminate flag parsing.
|
||||
|
||||
VIEWING
|
||||
gbrain agent logs <job_id>
|
||||
--follow Keep polling until the job reaches terminal
|
||||
--since <spec> ISO-8601 timestamp OR relative ("5m","1h","2d")
|
||||
|
||||
NOTES
|
||||
Submitting subagent jobs is trusted-only; MCP submitters receive
|
||||
permission_denied. The worker needs ANTHROPIC_API_KEY set, or the
|
||||
first LLM turn of a claimed job fails.
|
||||
`);
|
||||
}
|
||||
|
||||
// ── `gbrain agent run` ────────────────────────────────────
|
||||
|
||||
interface RunFlags {
|
||||
subagentDef?: string;
|
||||
model?: string;
|
||||
maxTurns?: number;
|
||||
tools?: string[];
|
||||
timeoutMs?: number;
|
||||
fanoutManifest?: string;
|
||||
follow: boolean;
|
||||
detach: boolean;
|
||||
}
|
||||
|
||||
function parseRunFlags(args: string[]): { flags: RunFlags; rest: string[] } {
|
||||
const flags: RunFlags = {
|
||||
follow: process.stdout.isTTY === true,
|
||||
detach: false,
|
||||
};
|
||||
let i = 0;
|
||||
while (i < args.length) {
|
||||
const a = args[i];
|
||||
if (a === '--') { i++; break; }
|
||||
if (!isKnownFlag(a!)) break;
|
||||
switch (a) {
|
||||
case '--subagent-def': flags.subagentDef = args[++i]; i++; break;
|
||||
case '--model': flags.model = args[++i]; i++; break;
|
||||
case '--max-turns': flags.maxTurns = parseInt(args[++i] ?? '', 10); i++; break;
|
||||
case '--tools': flags.tools = (args[++i] ?? '').split(',').map(s => s.trim()).filter(Boolean); i++; break;
|
||||
case '--timeout-ms': flags.timeoutMs = parseInt(args[++i] ?? '', 10); i++; break;
|
||||
case '--fanout-manifest': flags.fanoutManifest = args[++i]; i++; break;
|
||||
case '--follow': flags.follow = true; i++; break;
|
||||
case '--no-follow': flags.follow = false; i++; break;
|
||||
case '--detach': flags.detach = true; flags.follow = false; i++; break;
|
||||
default:
|
||||
throw new Error(`unknown flag: ${a}. Run \`gbrain agent run --help\` for usage.`);
|
||||
}
|
||||
}
|
||||
return { flags, rest: args.slice(i) };
|
||||
}
|
||||
|
||||
export async function runAgentRun(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const { flags, rest } = parseRunFlags(args);
|
||||
const queue = new MinionQueue(engine);
|
||||
|
||||
// Fan-out path: --fanout-manifest supplies explicit child inputs. The
|
||||
// aggregator submits first (so its id is available as parent for each
|
||||
// child); children submit with on_child_fail='continue' so mixed
|
||||
// outcomes don't cascade; aggregator waits in waiting-children until
|
||||
// Lane 1B's terminal-set check unblocks it.
|
||||
if (flags.fanoutManifest) {
|
||||
await runFanout(engine, queue, flags, rest.join(' '));
|
||||
return;
|
||||
}
|
||||
|
||||
const prompt = rest.join(' ').trim();
|
||||
if (!prompt) {
|
||||
console.error('gbrain agent run: prompt is required');
|
||||
process.exit(2);
|
||||
}
|
||||
|
||||
const data: SubagentHandlerData = { prompt };
|
||||
if (flags.subagentDef) data.subagent_def = flags.subagentDef;
|
||||
if (flags.model) data.model = flags.model;
|
||||
if (flags.maxTurns) data.max_turns = flags.maxTurns;
|
||||
if (flags.tools && flags.tools.length > 0) data.allowed_tools = flags.tools;
|
||||
|
||||
const submitOpts: Partial<MinionJobInput> = { max_stalled: 3 };
|
||||
if (flags.timeoutMs) submitOpts.timeout_ms = flags.timeoutMs;
|
||||
|
||||
const job = await queue.add('subagent', data as unknown as Record<string, unknown>, submitOpts, {
|
||||
allowProtectedSubmit: true,
|
||||
});
|
||||
|
||||
process.stderr.write(`submitted: job ${job.id} (subagent)\n`);
|
||||
|
||||
if (flags.detach || !flags.follow) {
|
||||
process.stdout.write(String(job.id) + '\n');
|
||||
return;
|
||||
}
|
||||
|
||||
await followJob(engine, queue, job.id, flags.timeoutMs);
|
||||
}
|
||||
|
||||
// ── fan-out ───────────────────────────────────────────────
|
||||
|
||||
async function runFanout(engine: BrainEngine, queue: MinionQueue, flags: RunFlags, promptTemplate: string): Promise<void> {
|
||||
const manifestPath = flags.fanoutManifest!;
|
||||
let manifest: Array<{ prompt?: string; input_vars?: Record<string, unknown> }>;
|
||||
try {
|
||||
const raw = fs.readFileSync(manifestPath, 'utf8');
|
||||
const parsed = JSON.parse(raw);
|
||||
if (!Array.isArray(parsed)) throw new Error('manifest must be a JSON array');
|
||||
manifest = parsed as typeof manifest;
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
console.error(`gbrain agent run: invalid --fanout-manifest ${manifestPath}: ${msg}`);
|
||||
process.exit(2);
|
||||
}
|
||||
|
||||
if (manifest.length === 0) {
|
||||
console.error('gbrain agent run: --fanout-manifest is empty; nothing to run');
|
||||
process.exit(2);
|
||||
}
|
||||
|
||||
// Short-circuit: 1 entry → single subagent, no aggregator.
|
||||
if (manifest.length === 1) {
|
||||
const entry = manifest[0]!;
|
||||
const data: SubagentHandlerData = {
|
||||
prompt: entry.prompt ?? promptTemplate,
|
||||
...(entry.input_vars ? { input_vars: entry.input_vars } : {}),
|
||||
...(flags.subagentDef ? { subagent_def: flags.subagentDef } : {}),
|
||||
...(flags.model ? { model: flags.model } : {}),
|
||||
...(flags.maxTurns ? { max_turns: flags.maxTurns } : {}),
|
||||
...(flags.tools && flags.tools.length > 0 ? { allowed_tools: flags.tools } : {}),
|
||||
};
|
||||
const submitOpts: Partial<MinionJobInput> = { max_stalled: 3 };
|
||||
if (flags.timeoutMs) submitOpts.timeout_ms = flags.timeoutMs;
|
||||
const job = await queue.add('subagent', data as unknown as Record<string, unknown>, submitOpts, {
|
||||
allowProtectedSubmit: true,
|
||||
});
|
||||
process.stderr.write(`submitted: job ${job.id} (single-entry manifest short-circuit)\n`);
|
||||
if (flags.detach || !flags.follow) { process.stdout.write(`${job.id}\n`); return; }
|
||||
await followJob(engine, queue, job.id, flags.timeoutMs);
|
||||
return;
|
||||
}
|
||||
|
||||
// N-entry fan-out: aggregator first (so we have its id as parent), then
|
||||
// N children, then flip the aggregator's children_ids to include them.
|
||||
const aggregatorSeed: AggregatorHandlerData = { children_ids: [] };
|
||||
const aggregator = await queue.add(
|
||||
'subagent_aggregator',
|
||||
aggregatorSeed as unknown as Record<string, unknown>,
|
||||
{ max_stalled: 3 },
|
||||
{ allowProtectedSubmit: true },
|
||||
);
|
||||
|
||||
const childIds: number[] = [];
|
||||
for (const entry of manifest) {
|
||||
const data: SubagentHandlerData = {
|
||||
prompt: entry.prompt ?? promptTemplate,
|
||||
...(entry.input_vars ? { input_vars: entry.input_vars } : {}),
|
||||
...(flags.subagentDef ? { subagent_def: flags.subagentDef } : {}),
|
||||
...(flags.model ? { model: flags.model } : {}),
|
||||
...(flags.maxTurns ? { max_turns: flags.maxTurns } : {}),
|
||||
...(flags.tools && flags.tools.length > 0 ? { allowed_tools: flags.tools } : {}),
|
||||
};
|
||||
const submitOpts: Partial<MinionJobInput> = {
|
||||
parent_job_id: aggregator.id,
|
||||
on_child_fail: 'continue', // mixed-outcome aggregation
|
||||
max_stalled: 3,
|
||||
};
|
||||
if (flags.timeoutMs) submitOpts.timeout_ms = flags.timeoutMs;
|
||||
const child = await queue.add('subagent', data as unknown as Record<string, unknown>, submitOpts, {
|
||||
allowProtectedSubmit: true,
|
||||
});
|
||||
childIds.push(child.id);
|
||||
}
|
||||
|
||||
// Update the aggregator's data with the final children_ids. We have to
|
||||
// do this after submission because each add() returns the committed
|
||||
// row's id; the aggregator's seed started with an empty array.
|
||||
await engine.executeRaw(
|
||||
`UPDATE minion_jobs SET data = jsonb_set(data, '{children_ids}', $1::jsonb) WHERE id = $2`,
|
||||
[JSON.stringify(childIds), aggregator.id],
|
||||
);
|
||||
|
||||
process.stderr.write(
|
||||
`submitted: aggregator job ${aggregator.id} + ${childIds.length} subagent children ` +
|
||||
`(${childIds[0]}..${childIds[childIds.length - 1]})\n`,
|
||||
);
|
||||
|
||||
if (flags.detach || !flags.follow) {
|
||||
process.stdout.write(`${aggregator.id}\n`);
|
||||
return;
|
||||
}
|
||||
await followJob(engine, queue, aggregator.id, flags.timeoutMs);
|
||||
}
|
||||
|
||||
// ── follow ────────────────────────────────────────────────
|
||||
|
||||
async function followJob(engine: BrainEngine, queue: MinionQueue, jobId: number, timeoutMs?: number): Promise<void> {
|
||||
process.stderr.write(`[gbrain agent] following job ${jobId} (Ctrl-C to detach)...\n`);
|
||||
const ac = new AbortController();
|
||||
const onSigint = () => ac.abort();
|
||||
process.once('SIGINT', onSigint);
|
||||
try {
|
||||
// Streaming logs happen in the background; we poll the terminal state
|
||||
// in parallel so the function returns as soon as the job completes.
|
||||
const logsP = runAgentLogs(engine, jobId, { follow: true, signal: ac.signal, pollMs: 1000 });
|
||||
try {
|
||||
const job = await waitForCompletion(queue, jobId, {
|
||||
timeoutMs: timeoutMs ?? 24 * 60 * 60 * 1000,
|
||||
pollMs: 1000,
|
||||
signal: ac.signal,
|
||||
});
|
||||
ac.abort();
|
||||
await logsP.catch(() => {});
|
||||
process.stderr.write(`[gbrain agent] job ${jobId} terminal: ${job.status}\n`);
|
||||
if (job.result != null) process.stdout.write(JSON.stringify(job.result, null, 2) + '\n');
|
||||
if (job.status !== 'completed') process.exit(1);
|
||||
} catch (e) {
|
||||
if (e instanceof TimeoutError) {
|
||||
process.stderr.write(`[gbrain agent] timeout after ${e.elapsedMs}ms — job is still running. Check with: gbrain jobs get ${jobId}\n`);
|
||||
process.exit(3);
|
||||
}
|
||||
throw e;
|
||||
}
|
||||
} finally {
|
||||
process.removeListener('SIGINT', onSigint);
|
||||
}
|
||||
}
|
||||
|
||||
// ── `gbrain agent logs` ────────────────────────────────────
|
||||
|
||||
async function runAgentLogsCmd(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const jobIdStr = args.find(a => !isKnownFlag(a));
|
||||
if (!jobIdStr) {
|
||||
console.error('gbrain agent logs: <job_id> is required');
|
||||
process.exit(2);
|
||||
}
|
||||
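// parseInt alone would accept trailing junk such as "12abc", so require digits only.
const jobId = /^\d+$/.test(jobIdStr) ? parseInt(jobIdStr, 10) : NaN;
if (!Number.isFinite(jobId) || jobId <= 0) {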
|
||||
console.error(`gbrain agent logs: "${jobIdStr}" is not a valid job id`);
|
||||
process.exit(2);
|
||||
}
|
||||
const follow = hasFlag(args, '--follow');
|
||||
const since = parseFlag(args, '--since');
|
||||
|
||||
const ac = new AbortController();
|
||||
const onSigint = () => ac.abort();
|
||||
process.once('SIGINT', onSigint);
|
||||
try {
|
||||
await runAgentLogs(engine, jobId, { follow, since, signal: ac.signal });
|
||||
} finally {
|
||||
process.removeListener('SIGINT', onSigint);
|
||||
}
|
||||
}
|
||||
|
||||
// Expose for tests.
|
||||
export const __testing = {
|
||||
parseRunFlags,
|
||||
};
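The two commands above use three distinct non-zero exit codes: 2 for a missing or invalid job id, 3 when following times out while the job is still running, and 1 when the job reaches a terminal state other than `completed`. A minimal sketch of that mapping as a pure function follows; the `FollowOutcome` type and `exitCodeFor` name are hypothetical, since the actual code calls `process.exit` inline.

```typescript
// Hypothetical summary type for illustration; not part of the diff above.
type FollowOutcome =
  | { kind: 'usage_error' }               // missing or invalid <job_id>
  | { kind: 'timeout' }                   // follow timed out, job still running
  | { kind: 'terminal'; status: string }; // job reached a terminal state

// Maps an outcome to the exit code the CLI code above would use.
function exitCodeFor(o: FollowOutcome): number {
  switch (o.kind) {
    case 'usage_error':
      return 2;
    case 'timeout':
      return 3;
    case 'terminal':
      // Only 'completed' counts as success; every other terminal status exits 1.
      return o.status === 'completed' ? 0 : 1;
  }
}
```

Keeping the timeout code separate from the failure code lets a calling script distinguish "the job failed" from "the job is still running, check again later".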
|
||||
@@ -96,7 +96,7 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
|
||||
// status:"complete" for the same version, the install is mid-migration.
|
||||
// Typical cause: v0.11.0 stopgap wrote a partial record but nobody ran
|
||||
// `gbrain apply-migrations --yes` afterward. This check fires on every
|
||||
// `gbrain doctor` invocation so Wintermute's health skill catches it.
|
||||
// `gbrain doctor` invocation so your OpenClaw's health skill catches it.
|
||||
try {
|
||||
const completed = loadCompletedMigrations();
|
||||
const byVersion = new Map<string, { complete: boolean; partial: boolean }>();
|
||||
|
||||
@@ -634,4 +634,37 @@ export async function registerBuiltinHandlers(worker: MinionWorker, engine: Brai
|
||||
} else {
|
||||
process.stderr.write('[minion worker] shell handler disabled (set GBRAIN_ALLOW_SHELL_JOBS=1 to enable)\n');
|
||||
}
|
||||
|
||||
// v0.15 subagent handlers: always-on. Unlike shell (which needs an env
|
||||
// flag because of RCE surface), subagent only calls the Anthropic API
|
||||
// with the operator's own ANTHROPIC_API_KEY; without a key, the SDK call
|
||||
// fails immediately. Who-can-submit is already gated by
|
||||
// PROTECTED_JOB_NAMES + TrustedSubmitOpts (MCP can't submit subagent
|
||||
// jobs; only the CLI path with allowProtectedSubmit can). No separate
|
||||
// cost-ceremony env flag needed.
|
||||
const { makeSubagentHandler } = await import('../core/minions/handlers/subagent.ts');
|
||||
const { subagentAggregatorHandler } = await import('../core/minions/handlers/subagent-aggregator.ts');
|
||||
worker.register('subagent', makeSubagentHandler({ engine }));
|
||||
worker.register('subagent_aggregator', subagentAggregatorHandler);
|
||||
process.stderr.write('[minion worker] subagent handlers enabled\n');
|
||||
|
||||
// Plugin discovery — one line per discovered plugin (mirrors the
|
||||
// openclaw-seam startup line convention from v0.11+). Loaded
|
||||
// unconditionally; empty GBRAIN_PLUGIN_PATH is a no-op.
|
||||
try {
|
||||
const { loadPluginsFromEnv } = await import('../core/minions/plugin-loader.ts');
|
||||
const { BRAIN_TOOL_ALLOWLIST } = await import('../core/minions/tools/brain-allowlist.ts');
|
||||
const validNames = new Set<string>();
|
||||
for (const n of BRAIN_TOOL_ALLOWLIST) validNames.add(`brain_${n}`);
|
||||
const loaded = loadPluginsFromEnv({ validAgentToolNames: validNames });
|
||||
for (const w of loaded.warnings) process.stderr.write(w + '\n');
|
||||
for (const p of loaded.plugins) {
|
||||
process.stderr.write(
|
||||
`[plugin-loader] loaded '${p.manifest.name}' v${p.manifest.version} (${p.subagents.length} subagents)\n`,
|
||||
);
|
||||
}
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
process.stderr.write(`[plugin-loader] discovery failed: ${msg}\n`);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -17,6 +17,7 @@ import { v0_12_2 } from './v0_12_2.ts';
|
||||
import { v0_13_0 } from './v0_13_0.ts';
|
||||
import { v0_13_1 } from './v0_13_1.ts';
|
||||
import { v0_14_0 } from './v0_14_0.ts';
|
||||
import { v0_16_0 } from './v0_16_0.ts';
|
||||
|
||||
export const migrations: Migration[] = [
|
||||
v0_11_0,
|
||||
@@ -25,6 +26,7 @@ export const migrations: Migration[] = [
|
||||
v0_13_0,
|
||||
v0_13_1,
|
||||
v0_14_0,
|
||||
v0_16_0,
|
||||
];
|
||||
|
||||
/** Look up a migration by exact version string. */
|
||||
|
||||
141
src/commands/migrations/v0_16_0.ts
Normal file
@@ -0,0 +1,141 @@
|
||||
/**
|
||||
* v0.16.0 migration orchestrator — Subagent runtime schema.
|
||||
*
|
||||
* Adds three tables for durable LLM agent loops:
|
||||
* - subagent_messages Anthropic message-block persistence
|
||||
* - subagent_tool_executions Two-phase tool ledger (pending/complete/failed)
|
||||
* - subagent_rate_leases Lease-based concurrency cap
|
||||
*
|
||||
* All DDL is `CREATE TABLE IF NOT EXISTS` and ships in src/schema.sql +
|
||||
* src/core/pglite-schema.ts (both Postgres and PGLite fresh-install paths).
|
||||
* This orchestrator's job is therefore only to VERIFY the tables exist after
|
||||
* `gbrain init --migrate-only` has run, so an upgrade that somehow skipped
|
||||
* the schema step fails loudly instead of silently.
|
||||
*
|
||||
* Phases (all idempotent):
|
||||
* A. Schema — gbrain init --migrate-only (creates tables via SCHEMA_SQL).
|
||||
* B. Verify — confirm all three tables exist.
|
||||
* C. Record — append completed.jsonl.
|
||||
*/
|
||||
|
||||
import { execSync } from 'child_process';
|
||||
import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhaseResult } from './types.ts';
|
||||
import { appendCompletedMigration } from '../../core/preferences.ts';
|
||||
import { loadConfig, toEngineConfig } from '../../core/config.ts';
|
||||
import { createEngine } from '../../core/engine-factory.ts';
|
||||
|
||||
const REQUIRED_TABLES = ['subagent_messages', 'subagent_tool_executions', 'subagent_rate_leases'] as const;
|
||||
|
||||
// ── Phase A — Schema ────────────────────────────────────────
|
||||
|
||||
function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'schema', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
execSync('gbrain init --migrate-only', { stdio: 'inherit', timeout: 60_000, env: process.env });
|
||||
return { name: 'schema', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
return { name: 'schema', status: 'failed', detail: msg };
|
||||
}
|
||||
}
|
||||
|
||||
// ── Phase B — Verify tables exist ───────────────────────────
|
||||
|
||||
async function phaseBVerify(opts: OrchestratorOpts): Promise<OrchestratorPhaseResult> {
|
||||
if (opts.dryRun) return { name: 'verify', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
const config = loadConfig();
|
||||
if (!config) {
|
||||
return { name: 'verify', status: 'skipped', detail: 'no brain configured' };
|
||||
}
|
||||
const engine = await createEngine(toEngineConfig(config));
|
||||
await engine.connect(toEngineConfig(config));
|
||||
try {
|
||||
const rows = await engine.executeRaw<{ table_name: string }>(
|
||||
`SELECT table_name FROM information_schema.tables
|
||||
WHERE table_schema = current_schema()
|
||||
AND table_name IN ('subagent_messages','subagent_tool_executions','subagent_rate_leases')`,
|
||||
);
|
||||
const found = new Set(rows.map(r => r.table_name));
|
||||
const missing = REQUIRED_TABLES.filter(t => !found.has(t));
|
||||
if (missing.length > 0) {
|
||||
return {
|
||||
name: 'verify',
|
||||
status: 'failed',
|
||||
detail: `missing tables: ${missing.join(', ')}`,
|
||||
};
|
||||
}
|
||||
return { name: 'verify', status: 'complete', detail: `${REQUIRED_TABLES.length} tables present` };
|
||||
} finally {
|
||||
try { await engine.disconnect(); } catch {}
|
||||
}
|
||||
} catch (e) {
|
||||
return {
|
||||
name: 'verify',
|
||||
status: 'failed',
|
||||
detail: e instanceof Error ? e.message : String(e),
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
// ── Orchestrator ────────────────────────────────────────────
|
||||
|
||||
async function orchestrator(opts: OrchestratorOpts): Promise<OrchestratorResult> {
|
||||
console.log('');
|
||||
console.log('=== v0.16.0 — Subagent runtime schema ===');
|
||||
if (opts.dryRun) console.log(' (dry-run; no side effects)');
|
||||
console.log('');
|
||||
|
||||
const phases: OrchestratorPhaseResult[] = [];
|
||||
|
||||
const a = phaseASchema(opts);
|
||||
phases.push(a);
|
||||
if (a.status === 'failed') return finalize(phases, 'failed');
|
||||
|
||||
const b = await phaseBVerify(opts);
|
||||
phases.push(b);
|
||||
|
||||
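// Phase A failures return early above, so only Phase B can downgrade the status.
const status: 'complete' | 'partial' = b.status === 'failed' ? 'partial' : 'complete';

// A dry run promises no side effects, so skip finalize(), which appends to completed.jsonl.
if (opts.dryRun) return { version: '0.16.0', status, phases };

return finalize(phases, status);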
|
||||
}
|
||||
|
||||
function finalize(phases: OrchestratorPhaseResult[], status: 'complete' | 'partial' | 'failed'): OrchestratorResult {
|
||||
if (status !== 'failed') {
|
||||
try {
|
||||
appendCompletedMigration({
|
||||
version: '0.16.0',
|
||||
completed_at: new Date().toISOString(),
|
||||
status: status as 'complete' | 'partial',
|
||||
phases: phases.map(p => ({ name: p.name, status: p.status })),
|
||||
});
|
||||
} catch {
|
||||
// Recording is best-effort.
|
||||
}
|
||||
}
|
||||
return { version: '0.16.0', status, phases };
|
||||
}
|
||||
|
||||
export const v0_16_0: Migration = {
|
||||
version: '0.16.0',
|
||||
featurePitch: {
|
||||
headline: 'Durable LLM agents land in the brain — survive crashes, sleeps, and worker restarts.',
|
||||
description:
|
||||
'v0.16.0 adds the subagent runtime: run long-running, fan-out Anthropic LLM loops ' +
|
||||
'as first-class Minion jobs. Crash-resumable turn persistence, two-phase tool ledger, ' +
|
||||
'lease-based rate limit, parent-child fan-out with aggregation. Entry points: `gbrain ' +
|
||||
'agent run` and `gbrain agent logs`. See docs/guides/plugin-authors.md for shipping ' +
|
||||
'custom subagent defs from a host repo (your OpenClaw etc.).',
|
||||
},
|
||||
orchestrator,
|
||||
};
|
||||
|
||||
/** Exported for unit tests. */
|
||||
export const __testing = {
|
||||
phaseASchema,
|
||||
phaseBVerify,
|
||||
REQUIRED_TABLES,
|
||||
};
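The verify phase and the status roll-up in this migration come down to two small pure computations: which required tables are absent from the query result, and how the two phase statuses combine. A minimal sketch under that reading follows; `missingTables` and `overallStatus` are hypothetical helper names, not exports of `v0_16_0.ts`.

```typescript
const REQUIRED = ['subagent_messages', 'subagent_tool_executions', 'subagent_rate_leases'] as const;

type PhaseStatus = 'complete' | 'skipped' | 'failed';

// Required table names that are absent from the names returned by information_schema.
function missingTables(found: Iterable<string>): string[] {
  const seen = new Set(found);
  return REQUIRED.filter((t) => !seen.has(t));
}

// Roll-up used by the orchestrator: a schema failure fails the run,
// a verify failure leaves it 'partial', anything else is 'complete'.
function overallStatus(schema: PhaseStatus, verify: PhaseStatus): 'complete' | 'partial' | 'failed' {
  if (schema === 'failed') return 'failed';
  return verify === 'failed' ? 'partial' : 'complete';
}
```

For example, `missingTables(['subagent_messages'])` reports the other two tables, which the verify phase would surface as `missing tables: ...` in its `detail` field.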
|
||||
@@ -2,7 +2,7 @@
|
||||
* `gbrain skillpack-check` — agent-readable health report.
|
||||
*
|
||||
* Wraps `gbrain doctor --json` + `gbrain apply-migrations --list` into a
|
||||
* single JSON blob a host agent (Wintermute's morning-briefing, any
|
||||
* single JSON blob a host agent (your OpenClaw's morning-briefing, any
|
||||
* OpenClaw cron) can consume without parsing two subcommands.
|
||||
*
|
||||
* Usage:
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
/**
|
||||
* CompletenessScorer — per-entity-type rubrics, 0.0–1.0 score per page.
|
||||
*
|
||||
* Replaces Wintermute's length-based heuristic ("compiled_truth > 500 chars")
|
||||
* Replaces Garry's OpenClaw's length-based heuristic ("compiled_truth > 500 chars")
|
||||
* with a weighted rubric that actually reflects whether a page would be
|
||||
* useful to answer a query. Runs on demand; BrainWriter invokes it on
|
||||
* write to cache the score in frontmatter.
|
||||
|
||||
169
src/core/minions/handlers/subagent-aggregator.ts
Normal file
@@ -0,0 +1,169 @@
|
||||
/**
|
||||
* subagent_aggregator handler (v0.15).
|
||||
*
|
||||
* This job is claimed only after every subagent child has resolved, and it
* produces the final aggregated output. It is not a polling parent: Lane 1B's
* queue changes make every terminal child transition (complete/failed/
* dead/cancelled/timeout) emit a child_done message into this job's inbox,
* and flip this job out of waiting-children once all children are terminal.
* By the time the job is claimed, all N child_done messages are already in
* minion_inbox.
|
||||
*
|
||||
* The aggregator does NOT re-call Anthropic in v0.15. It reads child
|
||||
* results from child_done messages, builds a markdown summary, and
|
||||
* returns it as the handler result. If children produced brain pages
|
||||
* under wiki/agents/<child_id>/..., those are referenced by slug — not
|
||||
* re-embedded into the summary blob.
|
||||
*
|
||||
* v0.16+ will add an LLM synthesis pass for richer summaries. The v0.15
|
||||
* output is deterministic string concatenation so fan-out runs stay
|
||||
* reproducible.
|
||||
*/
|
||||
|
||||
import type { MinionJobContext, ChildDoneMessage, ChildOutcome } from '../types.ts';
|
||||
import type { AggregatorHandlerData } from '../types.ts';
|
||||
|
||||
export interface AggregatorResult {
|
||||
/** Per-child record in the order children_ids was supplied. */
|
||||
children: Array<{
|
||||
child_id: number;
|
||||
job_name: string;
|
||||
outcome: ChildOutcome;
|
||||
error: string | null;
|
||||
/** JSON-parsed result payload for successful children. null on failure/cancel/timeout. */
|
||||
result: unknown;
|
||||
}>;
|
||||
/** Counts by outcome — quick shape for logs + tests. */
|
||||
summary: Record<ChildOutcome, number>;
|
||||
/** Rendered markdown, suitable for attaching to the job row or writing as a brain page. */
|
||||
markdown: string;
|
||||
}
|
||||
|
||||
/** v0.15 aggregator: synchronous read from inbox, no LLM call. */
|
||||
export async function subagentAggregatorHandler(ctx: MinionJobContext): Promise<AggregatorResult> {
|
||||
const data = (ctx.data ?? {}) as AggregatorHandlerData;
|
||||
const expectedIds = Array.isArray(data.children_ids) ? data.children_ids : [];
|
||||
|
||||
if (expectedIds.length === 0) {
|
||||
return {
|
||||
children: [],
|
||||
summary: emptySummary(),
|
||||
markdown: '# Aggregated subagent results\n\n_(no children)_',
|
||||
};
|
||||
}
|
||||
|
||||
// Read every child_done inbox message addressed to this job. By the time
|
||||
// we're claimed, the queue layer has posted one per child terminal
|
||||
// transition. The `readInbox` method marks messages as read so future
|
||||
// claims don't re-process them.
|
||||
const messages = await ctx.readInbox();
|
||||
const childDoneByChildId = new Map<number, ChildDoneMessage>();
|
||||
for (const m of messages) {
|
||||
const payload = parseChildDone(m.payload);
|
||||
if (!payload) continue;
|
||||
childDoneByChildId.set(payload.child_id, payload);
|
||||
}
|
||||
|
||||
const summary = emptySummary();
|
||||
const children: AggregatorResult['children'] = expectedIds.map(childId => {
|
||||
const msg = childDoneByChildId.get(childId);
|
||||
if (!msg) {
|
||||
// Missing — shouldn't happen under the v0.15 invariants (every
|
||||
// terminal path emits child_done). Surface as a failure row so the
|
||||
// aggregator is honest about what it knows.
|
||||
summary.failed = (summary.failed ?? 0) + 1;
|
||||
return {
|
||||
child_id: childId,
|
||||
job_name: '',
|
||||
outcome: 'failed',
|
||||
error: 'no child_done message observed in inbox',
|
||||
result: null,
|
||||
};
|
||||
}
|
||||
const outcome: ChildOutcome = msg.outcome ?? 'complete';
|
||||
summary[outcome] = (summary[outcome] ?? 0) + 1;
|
||||
return {
|
||||
child_id: childId,
|
||||
job_name: msg.job_name,
|
||||
outcome,
|
||||
error: msg.error ?? null,
|
||||
result: outcome === 'complete' ? msg.result : null,
|
||||
};
|
||||
});
|
||||
|
||||
const markdown = renderMarkdown(children, summary, data.aggregate_prompt_template);
|
||||
|
||||
await ctx.updateProgress({ total: expectedIds.length, summary });
|
||||
await ctx.log(`aggregated ${expectedIds.length} children — ${formatSummary(summary)}`);
|
||||
|
||||
return { children, summary, markdown };
|
||||
}
|
||||
|
||||
// ── internal ────────────────────────────────────────────────
|
||||
|
||||
function emptySummary(): Record<ChildOutcome, number> {
|
||||
return { complete: 0, failed: 0, dead: 0, cancelled: 0, timeout: 0 };
|
||||
}
|
||||
|
||||
function formatSummary(s: Record<ChildOutcome, number>): string {
|
||||
return Object.entries(s)
|
||||
.filter(([, n]) => n > 0)
|
||||
.map(([k, n]) => `${k}=${n}`)
|
||||
.join(', ');
|
||||
}
|
||||
|
||||
function parseChildDone(payload: unknown): ChildDoneMessage | null {
|
||||
const obj = typeof payload === 'string' ? safeParse(payload) : payload;
|
||||
if (!obj || typeof obj !== 'object') return null;
|
||||
const rec = obj as Record<string, unknown>;
|
||||
if (rec.type !== 'child_done' || typeof rec.child_id !== 'number') return null;
|
||||
return {
|
||||
type: 'child_done',
|
||||
child_id: rec.child_id,
|
||||
job_name: typeof rec.job_name === 'string' ? rec.job_name : '',
|
||||
result: rec.result,
|
||||
outcome: typeof rec.outcome === 'string' ? rec.outcome as ChildOutcome : undefined,
|
||||
error: typeof rec.error === 'string' ? rec.error : null,
|
||||
};
|
||||
}
|
||||
|
||||
function safeParse(raw: string): unknown {
|
||||
try { return JSON.parse(raw); } catch { return null; }
|
||||
}
|
||||
|
||||
function renderMarkdown(
|
||||
children: AggregatorResult['children'],
|
||||
summary: Record<ChildOutcome, number>,
|
||||
template?: string,
|
||||
): string {
|
||||
const header = template && template.trim().length > 0
|
||||
? template
|
||||
: '# Aggregated subagent results';
|
||||
|
||||
const parts: string[] = [header, ''];
|
||||
parts.push(`- total: ${children.length}`);
|
||||
for (const [outcome, n] of Object.entries(summary)) {
|
||||
if (n > 0) parts.push(`- ${outcome}: ${n}`);
|
||||
}
|
||||
parts.push('');
|
||||
|
||||
for (const c of children) {
|
||||
parts.push(`## child ${c.child_id} (${c.job_name || 'unknown'}) — ${c.outcome}`);
|
||||
if (c.error) parts.push(`> error: ${c.error}`);
|
||||
if (c.outcome === 'complete' && c.result !== undefined) {
|
||||
parts.push('```json', JSON.stringify(c.result, null, 2), '```');
|
||||
}
|
||||
parts.push('');
|
||||
}
|
||||
|
||||
return parts.join('\n').replace(/\n{3,}/g, '\n\n');
|
||||
}
|
||||
|
||||
// ── Testing surface ─────────────────────────────────────────
|
||||
|
||||
export const __testing = {
|
||||
emptySummary,
|
||||
formatSummary,
|
||||
parseChildDone,
|
||||
renderMarkdown,
|
||||
};
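The aggregator's counting rules can be shown in isolation: a child with no `child_done` message is counted as `failed`, and a message without an `outcome` defaults to `complete`. The sketch below is a simplified stand-in for the handler, with a hypothetical `tally` helper and a trimmed-down message type; it leaves out inbox reading and markdown rendering.

```typescript
type Outcome = 'complete' | 'failed' | 'dead' | 'cancelled' | 'timeout';

// Trimmed-down stand-in for ChildDoneMessage (illustration only).
interface ChildDone {
  child_id: number;
  outcome?: Outcome;
}

function tally(expectedIds: number[], messages: ChildDone[]): Record<Outcome, number> {
  const summary: Record<Outcome, number> = { complete: 0, failed: 0, dead: 0, cancelled: 0, timeout: 0 };
  const byId = new Map<number, ChildDone>();
  for (const m of messages) byId.set(m.child_id, m); // a later message for the same child wins
  for (const id of expectedIds) {
    const msg = byId.get(id);
    // No message observed: count as failed. Message without outcome: count as complete.
    const outcome: Outcome = msg ? (msg.outcome ?? 'complete') : 'failed';
    summary[outcome] += 1;
  }
  return summary;
}

// Same shape as the handler's log line: non-zero counts only, in key order.
function formatSummary(s: Record<Outcome, number>): string {
  return Object.entries(s)
    .filter(([, n]) => n > 0)
    .map(([k, n]) => `${k}=${n}`)
    .join(', ');
}

// Child 3 never reported, so it shows up under "failed".
console.log(formatSummary(tally([1, 2, 3], [{ child_id: 1 }, { child_id: 2, outcome: 'timeout' }])));
// prints "complete=1, failed=1, timeout=1"
```

Iterating over the expected ids rather than over the messages is what keeps the output in submission order and makes a missing child visible instead of silently dropped.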
|
||||
137
src/core/minions/handlers/subagent-audit.ts
Normal file
@@ -0,0 +1,137 @@
|
||||
/**
|
||||
* Subagent audit + heartbeat log. JSONL, file-rotated weekly, best-effort.
|
||||
*
|
||||
* Two event flavors:
|
||||
* - submission: one line per subagent job submit (mirrors shell-audit).
|
||||
* - heartbeat: one line per LLM turn boundary (started / completed) so
|
||||
* `gbrain agent logs <job> --follow` has fresh content to
|
||||
* show during long Anthropic calls. Without these, a
|
||||
* 30-second model call produces zero output between turns
|
||||
* and --follow looks frozen.
|
||||
*
|
||||
* Never logs prompts, tool inputs, or full tool outputs (PII risk — input
|
||||
* vars may contain emails, free text from the user, etc.). DO log
|
||||
* non-identifying operational fields: tokens, duration, model, tool_name.
|
||||
*
|
||||
* `GBRAIN_AUDIT_DIR` overrides the default ~/.gbrain/audit/ path — useful
|
||||
* for container deploys with a read-only $HOME.
|
||||
*/
|
||||
|
||||
import * as fs from 'node:fs';
|
||||
import * as path from 'node:path';
|
||||
import { resolveAuditDir } from './shell-audit.ts';
|
||||
|
||||
export interface SubagentSubmissionEvent {
|
||||
ts: string;
|
||||
type: 'submission';
|
||||
caller: 'cli' | 'mcp' | 'worker';
|
||||
remote: boolean;
|
||||
job_id: number;
|
||||
parent_job_id?: number | null;
|
||||
model?: string;
|
||||
tools_count?: number;
|
||||
allowed_tools?: string[];
|
||||
}
|
||||
|
||||
export interface SubagentHeartbeatEvent {
|
||||
ts: string;
|
||||
type: 'heartbeat';
|
||||
job_id: number;
|
||||
event: 'llm_call_started' | 'llm_call_completed' | 'tool_called' | 'tool_result' | 'tool_failed';
|
||||
turn_idx: number;
|
||||
/** Tool name for tool_* events. Never the input — that may contain secrets. */
|
||||
tool_name?: string;
|
||||
/** ms elapsed for *_completed / tool_result / tool_failed. */
|
||||
ms_elapsed?: number;
|
||||
/** Token rollup for llm_call_completed. Per-turn, not cumulative. */
|
||||
tokens?: { in?: number; out?: number; cache_read?: number; cache_create?: number };
|
||||
/** Short error text for tool_failed. First 200 chars. */
|
||||
error?: string;
|
||||
}
|
||||
|
||||
export type SubagentAuditEvent = SubagentSubmissionEvent | SubagentHeartbeatEvent;
|
||||
|
||||
/** File name, rotated by ISO week. `subagent-jobs-YYYY-Www.jsonl`. */
|
||||
export function computeSubagentAuditFilename(now: Date = new Date()): string {
|
||||
const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
|
||||
const dayNum = (d.getUTCDay() + 6) % 7;
|
||||
d.setUTCDate(d.getUTCDate() - dayNum + 3);
|
||||
const isoYear = d.getUTCFullYear();
|
||||
const firstThursday = new Date(Date.UTC(isoYear, 0, 4));
|
||||
const firstThursdayDayNum = (firstThursday.getUTCDay() + 6) % 7;
|
||||
firstThursday.setUTCDate(firstThursday.getUTCDate() - firstThursdayDayNum + 3);
|
||||
const weekNum = Math.round((d.getTime() - firstThursday.getTime()) / (7 * 86400000)) + 1;
|
||||
const ww = String(weekNum).padStart(2, '0');
|
||||
return `subagent-jobs-${isoYear}-W${ww}.jsonl`;
|
||||
}
|
||||
|
||||
/** Low-level append. Best-effort: a write failure is reported on stderr and the job keeps running. */
|
||||
function append(event: SubagentAuditEvent): void {
|
||||
const dir = resolveAuditDir();
|
||||
const file = path.join(dir, computeSubagentAuditFilename());
|
||||
const line = JSON.stringify(event) + '\n';
|
||||
try {
|
||||
fs.mkdirSync(dir, { recursive: true });
|
||||
fs.appendFileSync(file, line, { encoding: 'utf8' });
|
||||
} catch (err) {
|
||||
const msg = err instanceof Error ? err.message : String(err);
|
||||
process.stderr.write(`[subagent-audit] write failed (${msg}); job continues\n`);
|
||||
}
|
||||
}
|
||||
|
||||
export function logSubagentSubmission(event: Omit<SubagentSubmissionEvent, 'ts' | 'type'>): void {
|
||||
append({ ...event, ts: new Date().toISOString(), type: 'submission' });
|
||||
}
|
||||
|
||||
export function logSubagentHeartbeat(event: Omit<SubagentHeartbeatEvent, 'ts' | 'type'>): void {
|
||||
// Defensive: trim error text to avoid accidentally writing huge stack traces.
|
||||
const trimmed = event.error ? { ...event, error: event.error.slice(0, 200) } : event;
|
||||
append({ ...trimmed, ts: new Date().toISOString(), type: 'heartbeat' });
|
||||
}
|
||||
|
||||
/**
|
||||
* Read back all audit events for a job id from the current + prior week
|
||||
* files. Used by `gbrain agent logs <job>`. Returns chronological order.
|
||||
*
|
||||
* `sinceIso` (if present) filters to events with ts >= sinceIso.
|
||||
*/
|
||||
export function readSubagentAuditForJob(jobId: number, opts: { sinceIso?: string } = {}): SubagentAuditEvent[] {
|
||||
const dir = resolveAuditDir();
|
||||
if (!fs.existsSync(dir)) return [];
|
||||
|
||||
const now = new Date();
|
||||
const thisWeek = computeSubagentAuditFilename(now);
|
||||
const weekAgo = computeSubagentAuditFilename(new Date(now.getTime() - 7 * 86400000));
|
||||
const candidates = [...new Set([weekAgo, thisWeek])];
|
||||
|
||||
const out: SubagentAuditEvent[] = [];
|
||||
for (const name of candidates) {
|
||||
const file = path.join(dir, name);
|
||||
if (!fs.existsSync(file)) continue;
|
||||
let raw: string;
|
||||
try {
|
||||
raw = fs.readFileSync(file, 'utf8');
|
||||
} catch {
|
||||
continue;
|
||||
}
|
||||
for (const line of raw.split('\n')) {
|
||||
if (!line) continue;
|
||||
let ev: SubagentAuditEvent;
|
||||
try {
|
||||
ev = JSON.parse(line) as SubagentAuditEvent;
|
||||
} catch {
|
||||
continue;
|
||||
}
|
||||
// Submission events have job_id at top level; heartbeats too. Both safe.
|
||||
if ((ev as { job_id?: number }).job_id !== jobId) continue;
|
||||
if (opts.sinceIso && ev.ts < opts.sinceIso) continue;
|
||||
out.push(ev);
|
||||
}
|
||||
}
|
||||
return out.sort((a, b) => a.ts.localeCompare(b.ts));
|
||||
}
|
||||
|
||||
/** Exported for unit tests. */
|
||||
export const __testing = {
|
||||
append,
|
||||
};
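The weekly rotation uses ISO week numbering, so the year in the file name can differ from the calendar year around New Year. The sketch below repeats the same arithmetic under a hypothetical name, `isoWeekFilename`, and runs it on three boundary dates to show the effect.

```typescript
// Same arithmetic as computeSubagentAuditFilename, renamed for a standalone example.
function isoWeekFilename(now: Date): string {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const dayNum = (d.getUTCDay() + 6) % 7;      // Monday = 0 ... Sunday = 6
  d.setUTCDate(d.getUTCDate() - dayNum + 3);   // move to the Thursday of this ISO week
  const isoYear = d.getUTCFullYear();          // the ISO year is the year of that Thursday
  const firstThursday = new Date(Date.UTC(isoYear, 0, 4)); // Jan 4 is always in week 1
  const firstDayNum = (firstThursday.getUTCDay() + 6) % 7;
  firstThursday.setUTCDate(firstThursday.getUTCDate() - firstDayNum + 3);
  const weekNum = Math.round((d.getTime() - firstThursday.getTime()) / (7 * 86400000)) + 1;
  return `subagent-jobs-${isoYear}-W${String(weekNum).padStart(2, '0')}.jsonl`;
}

console.log(isoWeekFilename(new Date(Date.UTC(2021, 0, 1))));   // subagent-jobs-2020-W53.jsonl
console.log(isoWeekFilename(new Date(Date.UTC(2025, 11, 29)))); // subagent-jobs-2026-W01.jsonl
console.log(isoWeekFilename(new Date(Date.UTC(2026, 0, 1))));   // subagent-jobs-2026-W01.jsonl
```

This is also why `readSubagentAuditForJob` derives the prior file by subtracting seven days and recomputing the name rather than decrementing the week number: the recomputation handles the W53 to W01 rollover without special cases.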
|
||||
698
src/core/minions/handlers/subagent.ts
Normal file
@@ -0,0 +1,698 @@
|
||||
/**
|
||||
* Subagent LLM-loop handler (v0.15).
|
||||
*
|
||||
* Runs one Anthropic Messages API conversation with tool use. The loop is
|
||||
* crash-resumable: subagent_messages + subagent_tool_executions together
|
||||
* are the single source of truth about where the conversation is. On
|
||||
* resume after a worker kill, we load all committed rows, trust any tool
|
||||
* execution marked 'complete' or 'failed', and re-run 'pending' ones only
|
||||
* for idempotent tools.
|
||||
*
|
||||
* Safety rails:
|
||||
* - rate leases around every LLM call (acquire → call → release). Mid-
|
||||
* call renewal with backoff. Persistent renewal failure aborts as a
|
||||
* renewable error so the worker re-claims.
|
||||
* - dual-signal abort wiring (ctx.signal + ctx.shutdownSignal) drains
|
||||
* the in-flight call and commits whatever turns are already persisted.
|
||||
* - Anthropic prompt cache markers on system + tools blocks.
|
||||
* - token rollup via ctx.updateTokens per turn.
|
||||
*
|
||||
* NOT in v0.15: refusal detection, stop_reason=max_tokens partial
|
||||
* recovery, parallel tool-use dispatch (runs tools sequentially; the
|
||||
* Messages API allows parallel tool_use blocks and the replay tolerates
|
||||
* them, but v1 dispatches serially for simplicity). All three are tracked
|
||||
* as P2 items in the plan file.
|
||||
*/
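The resume rule described in the header comment (trust `complete` and `failed` ledger rows, re-run `pending` rows only for idempotent tools) can be read as a small decision function. The sketch below is one interpretation, not the handler's code: `replayAction` and its action names are hypothetical, and the `run_fresh` and `refuse` branches are assumptions, since the comment does not say what happens when no ledger row exists or when a pending tool is not idempotent.

```typescript
type ExecStatus = 'pending' | 'complete' | 'failed';

// Trimmed-down stand-in for a subagent_tool_executions row (illustration only).
interface ToolExec {
  tool_use_id: string;
  tool_name: string;
  status: ExecStatus;
}

type ReplayAction = 'reuse_output' | 'reuse_error' | 'rerun' | 'run_fresh' | 'refuse';

function replayAction(
  prior: ToolExec | undefined,
  isIdempotent: (toolName: string) => boolean,
): ReplayAction {
  if (!prior) return 'run_fresh';                           // assumption: no ledger row means the tool never started
  if (prior.status === 'complete') return 'reuse_output';   // trust the recorded output
  if (prior.status === 'failed') return 'reuse_error';      // trust the recorded error
  // 'pending': the worker died between the INSERT and the UPDATE.
  return isIdempotent(prior.tool_name) ? 'rerun' : 'refuse'; // 'refuse' is an assumption
}
```

Because v1 is described as shipping only idempotent tools, the `refuse` branch would not be reached in practice; it is included only to make the rule's boundary explicit.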
|
||||
|
||||
import Anthropic from '@anthropic-ai/sdk';
|
||||
import type { MinionJobContext, MinionJob } from '../types.ts';
|
||||
import type {
|
||||
ContentBlock,
|
||||
SubagentHandlerData,
|
||||
SubagentResult,
|
||||
SubagentStopReason,
|
||||
ToolDef,
|
||||
} from '../types.ts';
|
||||
import type { BrainEngine } from '../../engine.ts';
|
||||
import type { GBrainConfig } from '../../config.ts';
|
||||
import { loadConfig } from '../../config.ts';
|
||||
import { buildBrainTools, filterAllowedTools } from '../tools/brain-allowlist.ts';
|
||||
import {
|
||||
acquireLease,
|
||||
releaseLease,
|
||||
renewLeaseWithBackoff,
|
||||
} from '../rate-leases.ts';
|
||||
import {
|
||||
logSubagentSubmission,
|
||||
logSubagentHeartbeat,
|
||||
} from './subagent-audit.ts';
|
||||
|
||||
// ── Defaults ────────────────────────────────────────────────
|
||||
|
||||
const DEFAULT_MODEL = 'claude-sonnet-4-6';
|
||||
const DEFAULT_MAX_TURNS = 20;
|
||||
const DEFAULT_RATE_KEY = 'anthropic:messages';
|
||||
const DEFAULT_MAX_CONCURRENT = Number(process.env.GBRAIN_ANTHROPIC_MAX_INFLIGHT ?? '8');
|
||||
const DEFAULT_LEASE_TTL_MS = 120_000;
|
||||
const DEFAULT_SYSTEM = 'You are a helpful assistant running as a gbrain subagent.';
|
||||
|
||||
// ── Injectable surfaces (for tests) ─────────────────────────
|
||||
|
||||
/**
|
||||
* Anthropic Messages client. The real Anthropic SDK implements this
|
||||
* structurally; tests can substitute a mock without the SDK import.
|
||||
*/
|
||||
export interface MessagesClient {
|
||||
create(params: Anthropic.MessageCreateParamsNonStreaming, opts?: { signal?: AbortSignal }): Promise<Anthropic.Message>;
|
||||
}
|
||||
|
||||
export interface SubagentDeps {
|
||||
/** Engine for DB-backed ops (tools + message persistence + rate leases). */
|
||||
engine: BrainEngine;
|
||||
/** Anthropic client. Defaults to the SDK-constructed client. */
|
||||
client?: MessagesClient;
|
||||
/** Config (MCP, brain, etc.). Defaults to loadConfig(). */
|
||||
config?: GBrainConfig;
|
||||
/** Rate-lease key. Defaults to `anthropic:messages`. */
|
||||
rateLeaseKey?: string;
|
||||
/** Max concurrent inflight calls on that key. Defaults to GBRAIN_ANTHROPIC_MAX_INFLIGHT or 8. */
|
||||
maxConcurrent?: number;
|
||||
/** Lease TTL. Defaults to 120s. */
|
||||
leaseTtlMs?: number;
|
||||
/**
|
||||
* Override tool registry. When omitted, buildBrainTools is called with
|
||||
* the caller's subagentId at dispatch time.
|
||||
*/
|
||||
toolRegistry?: ToolDef[];
|
||||
}
|
||||
|
||||
// ── Types for internal state ────────────────────────────────
|
||||
|
||||
interface PersistedMessage {
|
||||
message_idx: number;
|
||||
role: 'user' | 'assistant';
|
||||
content_blocks: ContentBlock[];
|
||||
tokens_in: number | null;
|
||||
tokens_out: number | null;
|
||||
tokens_cache_read: number | null;
|
||||
tokens_cache_create: number | null;
|
||||
model: string | null;
|
||||
}
|
||||
|
||||
interface PersistedToolExec {
|
||||
message_idx: number;
|
||||
tool_use_id: string;
|
||||
tool_name: string;
|
||||
input: unknown;
|
||||
status: 'pending' | 'complete' | 'failed';
|
||||
output: unknown;
|
||||
error: string | null;
|
||||
}
|
||||
|
||||
// ── Public handler factory ──────────────────────────────────
|
||||
|
||||
/**
|
||||
* Build a subagent handler bound to a specific engine.
* `registerBuiltinHandlers` wires this up as
* `worker.register('subagent', handler)` at worker startup. Always
* registered: `ANTHROPIC_API_KEY` is the natural cost gate and
* `PROTECTED_JOB_NAMES` gates submission.
|
||||
*/
|
||||
export function makeSubagentHandler(deps: SubagentDeps) {
|
||||
const engine = deps.engine;
|
||||
const client: MessagesClient =
|
||||
deps.client ?? (new Anthropic() as unknown as MessagesClient);
|
||||
const config = deps.config ?? loadConfig() ?? ({ engine: 'postgres' } as GBrainConfig);
|
||||
const rateLeaseKey = deps.rateLeaseKey ?? DEFAULT_RATE_KEY;
|
||||
const maxConcurrent = deps.maxConcurrent ?? DEFAULT_MAX_CONCURRENT;
|
||||
const leaseTtlMs = deps.leaseTtlMs ?? DEFAULT_LEASE_TTL_MS;
|
||||
|
||||
return async function subagentHandler(ctx: MinionJobContext): Promise<SubagentResult> {
|
||||
const data = (ctx.data ?? {}) as SubagentHandlerData;
|
||||
if (!data.prompt || typeof data.prompt !== 'string') {
|
||||
throw new Error('subagent job data.prompt is required (string)');
|
||||
}
|
||||
|
||||
const model = data.model ?? DEFAULT_MODEL;
|
||||
const maxTurns = data.max_turns ?? DEFAULT_MAX_TURNS;
|
||||
const systemPrompt = data.system ?? DEFAULT_SYSTEM;
|
||||
|
||||
// Build the tool registry bound to THIS job as the owning subagent.
|
||||
const registry = deps.toolRegistry ?? buildBrainTools({
|
||||
subagentId: ctx.id,
|
||||
engine,
|
||||
config,
|
||||
});
|
||||
const toolDefs = data.allowed_tools && data.allowed_tools.length > 0
|
||||
? filterAllowedTools(registry, data.allowed_tools)
|
||||
: registry;
|
||||
|
||||
logSubagentSubmission({
|
||||
caller: 'worker',
|
||||
remote: true,
|
||||
job_id: ctx.id,
|
||||
model,
|
||||
tools_count: toolDefs.length,
|
||||
allowed_tools: toolDefs.map(t => t.name),
|
||||
});
|
||||
|
||||
// ── Load prior state (replay) ───────────────────────────
|
||||
const priorMessages = await loadPriorMessages(engine, ctx.id);
|
||||
const priorTools = await loadPriorTools(engine, ctx.id);
|
||||
const priorToolByUseId = new Map(priorTools.map(t => [t.tool_use_id, t]));
|
||||
|
||||
// Rebuild the Anthropic messages array from persisted rows.
|
||||
const anthroMessages: Anthropic.MessageParam[] = priorMessages.length > 0
|
||||
? priorMessages.map(m => ({ role: m.role, content: m.content_blocks as any }))
|
||||
: [{ role: 'user', content: data.prompt }];
|
||||
|
||||
// If we had no prior messages, persist the seed user message.
|
||||
let nextMessageIdx = priorMessages.length;
|
||||
if (priorMessages.length === 0) {
|
||||
await persistMessage(engine, ctx.id, {
|
||||
message_idx: 0,
|
||||
role: 'user',
|
||||
content_blocks: [{ type: 'text', text: data.prompt }],
|
||||
tokens_in: null,
|
||||
tokens_out: null,
|
||||
tokens_cache_read: null,
|
||||
tokens_cache_create: null,
|
||||
model: null,
|
||||
});
|
||||
nextMessageIdx = 1;
|
||||
}
|
||||
|
||||
// Token rollup.
|
||||
const tokenTotals = { in: 0, out: 0, cache_read: 0, cache_create: 0 };
|
||||
for (const m of priorMessages) {
|
||||
if (m.tokens_in) tokenTotals.in += m.tokens_in;
|
||||
if (m.tokens_out) tokenTotals.out += m.tokens_out;
|
||||
if (m.tokens_cache_read) tokenTotals.cache_read += m.tokens_cache_read;
|
||||
if (m.tokens_cache_create) tokenTotals.cache_create += m.tokens_cache_create;
|
||||
}
|
||||
|
||||
// Count assistant messages already persisted toward max_turns.
|
||||
let assistantTurns = priorMessages.filter(m => m.role === 'assistant').length;
|
||||
|
||||
// ── Replay reconciliation ───────────────────────────────
|
||||
//
|
||||
// If the last persisted message is an assistant with tool_use blocks
|
||||
// AND no subsequent user message has been synthesized yet, we crashed
|
||||
// mid-tool-dispatch. Finish those tools now so the next LLM call sees
|
||||
// a consistent conversation.
|
||||
const last = priorMessages[priorMessages.length - 1];
|
||||
if (last && last.role === 'assistant') {
|
||||
const pendingToolUses = last.content_blocks.filter(
|
||||
(b): b is { type: 'tool_use'; id: string; name: string; input: unknown } & Record<string, unknown> =>
|
||||
b.type === 'tool_use',
|
||||
);
|
||||
if (pendingToolUses.length > 0) {
|
||||
const synthesizedResults: ContentBlock[] = [];
|
||||
for (const use of pendingToolUses) {
|
||||
const prior = priorToolByUseId.get(use.id);
|
||||
if (prior?.status === 'complete') {
|
||||
synthesizedResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: asStringIfNotObject(prior.output),
|
||||
} as ContentBlock);
|
||||
continue;
|
||||
}
|
||||
if (prior?.status === 'failed') {
|
||||
synthesizedResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: prior.error ?? 'tool failed',
|
||||
is_error: true,
|
||||
} as ContentBlock);
|
||||
continue;
|
||||
}
|
||||
// pending or no row yet — try to dispatch.
|
||||
const toolDef = toolDefs.find(t => t.name === use.name);
|
||||
if (!toolDef) {
|
||||
await persistToolExecFailed(
|
||||
engine, ctx.id, last.message_idx, use.id, use.name, use.input,
|
||||
`tool "${use.name}" is not in the registry for this subagent`,
|
||||
);
|
||||
synthesizedResults.push({
|
||||
type: 'tool_result', tool_use_id: use.id,
|
||||
content: `tool "${use.name}" is not available`, is_error: true,
|
||||
} as ContentBlock);
|
||||
continue;
|
||||
}
|
||||
if (prior?.status === 'pending' && !toolDef.idempotent) {
|
||||
throw new Error(`non-idempotent tool "${use.name}" pending on resume; cannot safely re-run`);
|
||||
}
|
||||
await persistToolExecPending(engine, ctx.id, last.message_idx, use.id, use.name, use.input);
|
||||
try {
|
||||
const output = await toolDef.execute(use.input, {
|
||||
engine, jobId: ctx.id, remote: true, signal: ctx.signal,
|
||||
});
|
||||
await persistToolExecComplete(engine, ctx.id, use.id, output);
|
||||
synthesizedResults.push({
|
||||
type: 'tool_result', tool_use_id: use.id,
|
||||
content: asStringIfNotObject(output),
|
||||
} as ContentBlock);
|
||||
} catch (e) {
|
||||
const errText = e instanceof Error ? (e.stack ?? e.message) : String(e);
|
||||
await persistToolExecFailed(engine, ctx.id, last.message_idx, use.id, use.name, use.input, errText);
|
||||
synthesizedResults.push({
|
||||
type: 'tool_result', tool_use_id: use.id,
|
||||
content: errText, is_error: true,
|
||||
} as ContentBlock);
|
||||
}
|
||||
}
|
||||
// Persist the synthesized user turn so next-resume picks up here.
|
||||
const userIdx = nextMessageIdx++;
|
||||
await persistMessage(engine, ctx.id, {
|
||||
message_idx: userIdx,
|
||||
role: 'user',
|
||||
content_blocks: synthesizedResults,
|
||||
tokens_in: null, tokens_out: null, tokens_cache_read: null, tokens_cache_create: null, model: null,
|
||||
});
|
||||
anthroMessages.push({ role: 'user', content: synthesizedResults as any });
|
||||
}
|
||||
}
|
||||
|
||||
// ── Main loop ───────────────────────────────────────────
|
||||
let stopReason: SubagentStopReason = 'error';
|
||||
let finalText = '';
|
||||
|
||||
while (true) {
|
||||
if (assistantTurns >= maxTurns) {
|
||||
stopReason = 'max_turns';
|
||||
break;
|
||||
}
|
||||
if (ctx.signal.aborted || ctx.shutdownSignal.aborted) {
|
||||
stopReason = 'error';
|
||||
throw new Error('subagent aborted before turn');
|
||||
}
|
||||
|
||||
// 1. Acquire rate lease for the outbound call.
|
||||
const lease = await acquireLease(engine, rateLeaseKey, ctx.id, maxConcurrent, { ttlMs: leaseTtlMs });
|
||||
if (!lease.acquired) {
|
||||
// No slots — treat as a renewable error so the worker re-claims
|
||||
// the job later. Don't fail terminally.
|
||||
throw new RateLeaseUnavailableError(rateLeaseKey, lease.activeCount, lease.maxConcurrent);
|
||||
}
|
||||
|
||||
let assistantMsg: Anthropic.Message;
|
||||
const turnIdx = assistantTurns;
|
||||
const t0 = Date.now();
|
||||
logSubagentHeartbeat({ job_id: ctx.id, event: 'llm_call_started', turn_idx: turnIdx });
|
||||
|
||||
// Renewal is short-lived; for single-call turns the initial TTL
|
||||
// covers the whole request. A mid-call renewal loop would add
|
||||
// complexity; for v0.15 we lean on the 120s TTL + abort-on-signal.
|
||||
try {
|
||||
const params: Anthropic.MessageCreateParamsNonStreaming = {
|
||||
model,
|
||||
max_tokens: 4096,
|
||||
system: [
|
||||
{ type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } },
|
||||
] as any,
|
||||
messages: anthroMessages,
|
||||
...(toolDefs.length > 0
|
||||
? {
|
||||
tools: toolDefs.map((t, i) => {
|
||||
const def: any = {
|
||||
name: t.name,
|
||||
description: t.description,
|
||||
input_schema: t.input_schema,
|
||||
};
|
||||
// Cache only the last tool def — Anthropic treats cache_control
|
||||
// as "cache everything up to and including this block".
|
||||
if (i === toolDefs.length - 1) def.cache_control = { type: 'ephemeral' };
|
||||
return def;
|
||||
}),
|
||||
}
|
||||
: {}),
|
||||
};
|
||||
|
||||
const combinedSignal = mergeSignals(ctx.signal, ctx.shutdownSignal);
|
||||
assistantMsg = await client.create(params, { signal: combinedSignal });
|
||||
} catch (err) {
|
||||
// Release lease eagerly on error so we don't starve capacity.
|
||||
await releaseLease(engine, lease.leaseId!).catch(() => {});
|
||||
throw err;
|
||||
}
|
||||
|
||||
// 2. Release lease as soon as the call returns. Tool execution runs
|
||||
// outside the lease — tool calls use their own capacity.
|
||||
await releaseLease(engine, lease.leaseId!).catch(() => {});
|
||||
|
||||
const ms = Date.now() - t0;
|
||||
const inTokens = assistantMsg.usage?.input_tokens ?? 0;
|
||||
const outTokens = assistantMsg.usage?.output_tokens ?? 0;
|
||||
const cacheRead = (assistantMsg.usage as any)?.cache_read_input_tokens ?? 0;
|
||||
const cacheCreate = (assistantMsg.usage as any)?.cache_creation_input_tokens ?? 0;
|
||||
|
||||
tokenTotals.in += inTokens;
|
||||
tokenTotals.out += outTokens;
|
||||
tokenTotals.cache_read += cacheRead;
|
||||
tokenTotals.cache_create += cacheCreate;
|
||||
|
||||
logSubagentHeartbeat({
|
||||
job_id: ctx.id,
|
||||
event: 'llm_call_completed',
|
||||
turn_idx: turnIdx,
|
||||
ms_elapsed: ms,
|
||||
tokens: { in: inTokens, out: outTokens, cache_read: cacheRead, cache_create: cacheCreate },
|
||||
});
|
||||
|
||||
// Update job-level token rollup (best-effort; may throw if lock lost).
|
||||
await ctx.updateTokens({
|
||||
input: inTokens,
|
||||
output: outTokens,
|
||||
cache_read: cacheRead,
|
||||
});
|
||||
|
||||
const blocks = assistantMsg.content as ContentBlock[];
|
||||
|
||||
// 3. Persist the assistant message BEFORE tool dispatch so replay
|
||||
// sees a consistent state.
|
||||
const assistantIdx = nextMessageIdx++;
|
||||
await persistMessage(engine, ctx.id, {
|
||||
message_idx: assistantIdx,
|
||||
role: 'assistant',
|
||||
content_blocks: blocks,
|
||||
tokens_in: inTokens,
|
||||
tokens_out: outTokens,
|
||||
tokens_cache_read: cacheRead,
|
||||
tokens_cache_create: cacheCreate,
|
||||
model,
|
||||
});
|
||||
anthroMessages.push({ role: 'assistant', content: blocks as any });
|
||||
assistantTurns++;
|
||||
|
||||
// 4. Collect tool_use blocks. If none, we're done.
|
||||
const toolUses = blocks.filter(
|
||||
(b): b is { type: 'tool_use'; id: string; name: string; input: unknown } & Record<string, unknown> =>
|
||||
b.type === 'tool_use',
|
||||
);
|
||||
if (toolUses.length === 0) {
|
||||
stopReason = 'end_turn';
|
||||
// Concatenate text blocks as the final answer.
|
||||
finalText = blocks
|
||||
.filter(b => b.type === 'text' && typeof b.text === 'string')
|
||||
.map(b => b.text as string)
|
||||
.join('\n');
|
||||
break;
|
||||
}
|
||||
|
||||
// 5. Dispatch each tool_use. Two-phase persist (pending → complete/failed).
|
||||
const toolResults: ContentBlock[] = [];
|
||||
for (const use of toolUses) {
|
||||
if (ctx.signal.aborted || ctx.shutdownSignal.aborted) {
|
||||
throw new Error('subagent aborted during tool dispatch');
|
||||
}
|
||||
|
||||
const toolName = use.name;
|
||||
const toolDef = toolDefs.find(t => t.name === toolName);
|
||||
if (!toolDef) {
|
||||
// Model called a tool we didn't expose. Mark execution failed
|
||||
// with a clear error and feed the error back in the next turn.
|
||||
await persistToolExecFailed(
|
||||
engine, ctx.id, assistantIdx, use.id, toolName, use.input,
|
||||
`tool "${toolName}" is not in the registry for this subagent`,
|
||||
);
|
||||
toolResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: `tool "${toolName}" is not available`,
|
||||
is_error: true,
|
||||
} as ContentBlock);
|
||||
logSubagentHeartbeat({
|
||||
job_id: ctx.id,
|
||||
event: 'tool_failed',
|
||||
turn_idx: turnIdx,
|
||||
tool_name: toolName,
|
||||
error: 'not in registry',
|
||||
});
|
||||
continue;
|
||||
}
|
||||
|
||||
// Replay: if we already have a row for this tool_use_id, trust it
|
||||
// unless status='pending' and the tool is idempotent (re-run).
|
||||
const prior = priorToolByUseId.get(use.id);
|
||||
if (prior && prior.status === 'complete') {
|
||||
toolResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: asStringIfNotObject(prior.output),
|
||||
} as ContentBlock);
|
||||
continue;
|
||||
}
|
||||
if (prior && prior.status === 'failed') {
|
||||
toolResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: prior.error ?? 'tool failed',
|
||||
is_error: true,
|
||||
} as ContentBlock);
|
||||
continue;
|
||||
}
|
||||
if (prior && prior.status === 'pending' && !toolDef.idempotent) {
|
||||
// Non-idempotent and we don't know the outcome — fail the job.
|
||||
throw new Error(`non-idempotent tool "${toolName}" pending on resume; cannot safely re-run`);
|
||||
}
|
||||
|
||||
// Fresh or idempotent-replay dispatch.
|
||||
await persistToolExecPending(engine, ctx.id, assistantIdx, use.id, toolName, use.input);
|
||||
logSubagentHeartbeat({ job_id: ctx.id, event: 'tool_called', turn_idx: turnIdx, tool_name: toolName });
|
||||
|
||||
const toolStart = Date.now();
|
||||
try {
|
||||
const output = await toolDef.execute(use.input, {
|
||||
engine,
|
||||
jobId: ctx.id,
|
||||
remote: true,
|
||||
signal: ctx.signal,
|
||||
});
|
||||
await persistToolExecComplete(engine, ctx.id, use.id, output);
|
||||
logSubagentHeartbeat({
|
||||
job_id: ctx.id,
|
||||
event: 'tool_result',
|
||||
turn_idx: turnIdx,
|
||||
tool_name: toolName,
|
||||
ms_elapsed: Date.now() - toolStart,
|
||||
});
|
||||
toolResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: asStringIfNotObject(output),
|
||||
} as ContentBlock);
|
||||
} catch (e) {
|
||||
const errText = e instanceof Error
|
||||
? (e.stack ?? e.message)
|
||||
: String(e);
|
||||
await persistToolExecFailed(engine, ctx.id, assistantIdx, use.id, toolName, use.input, errText);
|
||||
logSubagentHeartbeat({
|
||||
job_id: ctx.id,
|
||||
event: 'tool_failed',
|
||||
turn_idx: turnIdx,
|
||||
tool_name: toolName,
|
||||
ms_elapsed: Date.now() - toolStart,
|
||||
error: errText,
|
||||
});
|
||||
toolResults.push({
|
||||
type: 'tool_result',
|
||||
tool_use_id: use.id,
|
||||
content: errText,
|
||||
is_error: true,
|
||||
} as ContentBlock);
|
||||
}
|
||||
}
|
||||
|
||||
// 6. Append the synthesized user turn (tool_result wrappers) to the
|
||||
// conversation and persist it so replay picks it up.
|
||||
const userIdx = nextMessageIdx++;
|
||||
await persistMessage(engine, ctx.id, {
|
||||
message_idx: userIdx,
|
||||
role: 'user',
|
||||
content_blocks: toolResults,
|
||||
tokens_in: null,
|
||||
tokens_out: null,
|
||||
tokens_cache_read: null,
|
||||
tokens_cache_create: null,
|
||||
model: null,
|
||||
});
|
||||
anthroMessages.push({ role: 'user', content: toolResults as any });
|
||||
}
|
||||
|
||||
return {
|
||||
result: finalText,
|
||||
turns_count: assistantTurns,
|
||||
stop_reason: stopReason,
|
||||
tokens: tokenTotals,
|
||||
};
|
||||
};
|
||||
}
|
||||
|
||||
// ── Internal: persistence ───────────────────────────────────
|
||||
|
||||
async function loadPriorMessages(engine: BrainEngine, jobId: number): Promise<PersistedMessage[]> {
|
||||
const rows = await engine.executeRaw<Record<string, unknown>>(
|
||||
`SELECT message_idx, role, content_blocks, tokens_in, tokens_out,
|
||||
tokens_cache_read, tokens_cache_create, model
|
||||
FROM subagent_messages
|
||||
WHERE job_id = $1
|
||||
ORDER BY message_idx ASC`,
|
||||
[jobId],
|
||||
);
|
||||
return rows.map(r => ({
|
||||
message_idx: r.message_idx as number,
|
||||
role: r.role as 'user' | 'assistant',
|
||||
content_blocks: (typeof r.content_blocks === 'string'
|
||||
? JSON.parse(r.content_blocks as string)
|
||||
: r.content_blocks) as ContentBlock[],
|
||||
tokens_in: (r.tokens_in as number) ?? null,
|
||||
tokens_out: (r.tokens_out as number) ?? null,
|
||||
tokens_cache_read: (r.tokens_cache_read as number) ?? null,
|
||||
tokens_cache_create: (r.tokens_cache_create as number) ?? null,
|
||||
model: (r.model as string) ?? null,
|
||||
}));
|
||||
}
|
||||
|
||||
async function loadPriorTools(engine: BrainEngine, jobId: number): Promise<PersistedToolExec[]> {
|
||||
const rows = await engine.executeRaw<Record<string, unknown>>(
|
||||
`SELECT message_idx, tool_use_id, tool_name, input, status, output, error
|
||||
FROM subagent_tool_executions
|
||||
WHERE job_id = $1`,
|
||||
[jobId],
|
||||
);
|
||||
return rows.map(r => ({
|
||||
message_idx: r.message_idx as number,
|
||||
tool_use_id: r.tool_use_id as string,
|
||||
tool_name: r.tool_name as string,
|
||||
input: typeof r.input === 'string' ? JSON.parse(r.input) : r.input,
|
||||
status: r.status as 'pending' | 'complete' | 'failed',
|
||||
output: r.output == null
|
||||
? null
|
||||
: (typeof r.output === 'string' ? JSON.parse(r.output) : r.output),
|
||||
error: (r.error as string) ?? null,
|
||||
}));
|
||||
}
|
||||
|
||||
async function persistMessage(engine: BrainEngine, jobId: number, msg: PersistedMessage): Promise<void> {
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks,
|
||||
tokens_in, tokens_out, tokens_cache_read, tokens_cache_create, model)
|
||||
VALUES ($1, $2, $3, $4::jsonb, $5, $6, $7, $8, $9)
|
||||
ON CONFLICT (job_id, message_idx) DO NOTHING`,
|
||||
[
|
||||
jobId,
|
||||
msg.message_idx,
|
||||
msg.role,
|
||||
JSON.stringify(msg.content_blocks),
|
||||
msg.tokens_in,
|
||||
msg.tokens_out,
|
||||
msg.tokens_cache_read,
|
||||
msg.tokens_cache_create,
|
||||
msg.model,
|
||||
],
|
||||
);
|
||||
}
|
||||
|
||||
async function persistToolExecPending(
|
||||
engine: BrainEngine,
|
||||
jobId: number,
|
||||
messageIdx: number,
|
||||
toolUseId: string,
|
||||
toolName: string,
|
||||
input: unknown,
|
||||
): Promise<void> {
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status)
|
||||
VALUES ($1, $2, $3, $4, $5::jsonb, 'pending')
|
||||
ON CONFLICT (job_id, tool_use_id) DO NOTHING`,
|
||||
[jobId, messageIdx, toolUseId, toolName, JSON.stringify(input)],
|
||||
);
|
||||
}
|
||||
|
||||
async function persistToolExecComplete(
|
||||
engine: BrainEngine,
|
||||
jobId: number,
|
||||
toolUseId: string,
|
||||
output: unknown,
|
||||
): Promise<void> {
|
||||
await engine.executeRaw(
|
||||
`UPDATE subagent_tool_executions
|
||||
SET status = 'complete', output = $3::jsonb, ended_at = now()
|
||||
WHERE job_id = $1 AND tool_use_id = $2`,
|
||||
[jobId, toolUseId, JSON.stringify(output)],
|
||||
);
|
||||
}
|
||||
|
||||
async function persistToolExecFailed(
|
||||
engine: BrainEngine,
|
||||
jobId: number,
|
||||
messageIdx: number,
|
||||
toolUseId: string,
|
||||
toolName: string,
|
||||
input: unknown,
|
||||
error: string,
|
||||
): Promise<void> {
|
||||
// INSERT-or-UPDATE to failed — covers both "no pending row yet" (tool
|
||||
// rejected upfront) and "pending row exists" (tool threw mid-execute).
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status, error, ended_at)
|
||||
VALUES ($1, $2, $3, $4, $5::jsonb, 'failed', $6, now())
|
||||
ON CONFLICT (job_id, tool_use_id) DO UPDATE
|
||||
SET status = 'failed', error = EXCLUDED.error, ended_at = now()`,
|
||||
[jobId, messageIdx, toolUseId, toolName, JSON.stringify(input), error],
|
||||
);
|
||||
}
|
||||
|
||||
// ── Internal: helpers ───────────────────────────────────────
|
||||
|
||||
function asStringIfNotObject(value: unknown): string {
|
||||
if (typeof value === 'string') return value;
|
||||
try {
|
||||
return JSON.stringify(value);
|
||||
} catch {
|
||||
return String(value);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Merge two AbortSignals into one. Fires when either source aborts. Falls
|
||||
* back to a manual merge when AbortSignal.any isn't available (Node ≥ 20.3 has it).
|
||||
*/
|
||||
function mergeSignals(a: AbortSignal, b: AbortSignal): AbortSignal {
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
const anyFn = (AbortSignal as any).any;
|
||||
if (typeof anyFn === 'function') return anyFn([a, b]) as AbortSignal;
|
||||
// Manual merge.
|
||||
const ac = new AbortController();
|
||||
if (a.aborted || b.aborted) ac.abort();
|
||||
else {
|
||||
a.addEventListener('abort', () => ac.abort(), { once: true });
|
||||
b.addEventListener('abort', () => ac.abort(), { once: true });
|
||||
}
|
||||
return ac.signal;
|
||||
}
|
||||
|
||||
/**
|
||||
* Error thrown when acquireLease returns acquired=false. The worker
|
||||
* treats this as a renewable error — job goes back to waiting with
|
||||
* backoff, no terminal fail.
|
||||
*/
|
||||
export class RateLeaseUnavailableError extends Error {
|
||||
constructor(public key: string, public active: number, public max: number) {
|
||||
super(`rate lease "${key}" full (${active}/${max})`);
|
||||
this.name = 'RateLeaseUnavailableError';
|
||||
}
|
||||
}
|
||||
|
||||
// ── Testing surface ─────────────────────────────────────────
|
||||
|
||||
export const __testing = {
|
||||
loadPriorMessages,
|
||||
loadPriorTools,
|
||||
persistMessage,
|
||||
persistToolExecPending,
|
||||
persistToolExecComplete,
|
||||
persistToolExecFailed,
|
||||
asStringIfNotObject,
|
||||
DEFAULT_MODEL,
|
||||
};
|
||||
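The replay rules the handler applies to each `tool_use` block (reuse a `complete` or `failed` ledger row, refuse to re-run a `pending` non-idempotent tool, otherwise dispatch) can be condensed into one pure function. The sketch below is illustrative only: `decideReplay`, `LedgerRow`, and `ReplayDecision` are hypothetical names that do not appear in the diff, and the real handler interleaves these checks with persistence and heartbeat logging.

```typescript
// Hypothetical condensed view of the two-phase ledger replay rules.
// None of these names exist in the PR; they are for illustration only.
type LedgerStatus = 'pending' | 'complete' | 'failed';

interface LedgerRow {
  status: LedgerStatus;
  output?: unknown;
  error?: string | null;
}

type ReplayDecision =
  | { kind: 'reuse_output'; output: unknown }
  | { kind: 'reuse_error'; error: string }
  | { kind: 'dispatch' }
  | { kind: 'abort'; reason: string };

function decideReplay(prior: LedgerRow | undefined, idempotent: boolean): ReplayDecision {
  // A finished row is trusted as-is: the tool is never executed twice.
  if (prior?.status === 'complete') return { kind: 'reuse_output', output: prior.output };
  if (prior?.status === 'failed') return { kind: 'reuse_error', error: prior.error ?? 'tool failed' };
  // A pending row means the outcome is unknown; only idempotent tools may re-run.
  if (prior?.status === 'pending' && !idempotent) {
    return { kind: 'abort', reason: 'non-idempotent tool pending on resume' };
  }
  // No row yet, or a pending row for an idempotent tool.
  return { kind: 'dispatch' };
}
```

Keeping the decision separate from the side effects makes the crash-recovery matrix easy to test without a database.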
235
src/core/minions/plugin-loader.ts
Normal file
@@ -0,0 +1,235 @@
|
||||
/**
|
||||
* GBRAIN_PLUGIN_PATH loader for host-repo subagent definitions (v0.15).
|
||||
*
|
||||
* Your OpenClaw (and future downstream agents) ship custom subagent defs
|
||||
* from their own repos. gbrain discovers them at worker startup via
|
||||
* GBRAIN_PLUGIN_PATH = colon-separated absolute paths (like $PATH). Each
|
||||
* path must contain a gbrain.plugin.json manifest describing the plugin
|
||||
* and a subagents/ subdirectory holding `*.md` definition files.
|
||||
*
|
||||
* Path policy is strict on purpose:
|
||||
* - ABSOLUTE paths only. Relative paths and `~` prefixes are rejected
|
||||
* (no implicit cwd or home expansion — too easy to pick up a tampered
|
||||
* sibling directory).
|
||||
* - URL-style paths (any scheme://, e.g. http://, https://, file://) rejected. Plugin loading
|
||||
* must go through the filesystem so the user controls what's there.
|
||||
* - Non-existent paths logged and skipped (do not fail worker startup).
|
||||
*
|
||||
* Collision policy: left-to-right wins. A warning goes to stderr naming
|
||||
* both sides of the collision.
|
||||
*
|
||||
* Trust policy: plugins ship subagent *defs* only. They cannot declare
|
||||
* new tools, cannot extend the brain-allowlist, cannot override
|
||||
* agent-safe flags. The `allowed_tools:` frontmatter field of a subagent
|
||||
* def must subset the derived registry — validation happens at plugin
|
||||
* load time, NOT at subagent dispatch time, so a typo in a plugin skill
|
||||
* fails loudly at worker startup instead of silently disabling a tool.
|
||||
*
|
||||
* Manifest version (`plugin_version`) locks the contract shape. Unknown
|
||||
* versions are rejected so the authoritative definition is whatever this
|
||||
* version of gbrain understands.
|
||||
*/
|
||||
|
||||
import * as fs from 'node:fs';
|
||||
import * as path from 'node:path';
|
||||
import matter from 'gray-matter';
|
||||
|
||||
export const SUPPORTED_PLUGIN_VERSION = 'gbrain-plugin-v1';
|
||||
|
||||
export interface PluginManifest {
|
||||
name: string;
|
||||
version: string;
|
||||
plugin_version: string;
|
||||
subagents?: string;
|
||||
description?: string;
|
||||
}
|
||||
|
||||
export interface SubagentDefinition {
|
||||
/** The plugin that shipped this def. */
|
||||
plugin_name: string;
|
||||
/** Stable agent name used as `subagent_def` by CLI callers. */
|
||||
name: string;
|
||||
/** Full path to the .md file on disk, for debug surfaces. */
|
||||
source_path: string;
|
||||
frontmatter: Record<string, unknown>;
|
||||
/** Markdown body (system prompt content). */
|
||||
body: string;
|
||||
/** Optional allowed_tools list (frontmatter). Subset of registry. */
|
||||
allowed_tools?: string[];
|
||||
}
|
||||
|
||||
export interface PluginLoadResult {
|
||||
/** Successfully loaded plugins with their subagents. */
|
||||
plugins: Array<{ manifest: PluginManifest; rootDir: string; subagents: SubagentDefinition[] }>;
|
||||
/** Per-path warnings (rejected, missing, malformed) collected during load. */
|
||||
warnings: string[];
|
||||
}
|
||||
|
||||
export interface LoadOpts {
|
||||
/**
|
||||
* Registry names the plugin's subagent `allowed_tools` must subset. When
|
||||
* present, any frontmatter entry not in this set fails the plugin load.
|
||||
* Pass `undefined` to skip validation (early worker startup before the
|
||||
* registry is built — but production callers should always pass it).
|
||||
*/
|
||||
validAgentToolNames?: ReadonlySet<string>;
|
||||
/** Override the GBRAIN_PLUGIN_PATH value (for tests). */
|
||||
envPath?: string;
|
||||
}
|
||||
|
||||
/** Public entry point: load every plugin directory from GBRAIN_PLUGIN_PATH. */
|
||||
export function loadPluginsFromEnv(opts: LoadOpts = {}): PluginLoadResult {
|
||||
const raw = opts.envPath ?? process.env.GBRAIN_PLUGIN_PATH ?? '';
|
||||
const paths = raw.split(':').map(s => s.trim()).filter(Boolean);
|
||||
const result: PluginLoadResult = { plugins: [], warnings: [] };
|
||||
|
||||
// Left-wins collision tracking.
|
||||
const subagentByName = new Map<string, { pluginName: string; pathLeft: string }>();
|
||||
|
||||
for (const p of paths) {
|
||||
const rejection = rejectIfNotAbsolute(p);
|
||||
if (rejection) { result.warnings.push(rejection); continue; }
|
||||
if (!fs.existsSync(p)) {
|
||||
result.warnings.push(`[plugin-loader] path does not exist, skipping: ${p}`);
|
||||
continue;
|
||||
}
|
||||
if (!fs.statSync(p).isDirectory()) {
|
||||
result.warnings.push(`[plugin-loader] not a directory, skipping: ${p}`);
|
||||
continue;
|
||||
}
|
||||
|
||||
try {
|
||||
const loaded = loadSinglePlugin(p, opts);
|
||||
if ('error' in loaded) {
|
||||
result.warnings.push(`[plugin-loader] rejected ${p}: ${loaded.error}`);
|
||||
continue;
|
||||
}
|
||||
|
||||
const accepted: SubagentDefinition[] = [];
|
||||
for (const sa of loaded.subagents) {
|
||||
const prior = subagentByName.get(sa.name);
|
||||
if (prior) {
|
||||
result.warnings.push(
|
||||
`[plugin-loader] collision: subagent '${sa.name}' from '${loaded.manifest.name}' at ${p} ` +
|
||||
`shadowed by earlier '${prior.pluginName}' at ${prior.pathLeft} (first wins)`,
|
||||
);
|
||||
continue;
|
||||
}
|
||||
subagentByName.set(sa.name, { pluginName: loaded.manifest.name, pathLeft: p });
|
||||
accepted.push(sa);
|
||||
}
|
||||
|
||||
result.plugins.push({ manifest: loaded.manifest, rootDir: p, subagents: accepted });
|
||||
} catch (err) {
|
||||
const msg = err instanceof Error ? err.message : String(err);
|
||||
result.warnings.push(`[plugin-loader] unexpected error loading ${p}: ${msg}`);
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
function rejectIfNotAbsolute(p: string): string | null {
|
||||
if (/^[a-z][a-z0-9+.-]*:\/\//i.test(p)) {
|
||||
return `[plugin-loader] remote URL rejected: ${p}`;
|
||||
}
|
||||
if (p.startsWith('~')) {
|
||||
return `[plugin-loader] ~-prefixed path rejected (expand explicitly): ${p}`;
|
||||
}
|
||||
if (!path.isAbsolute(p)) {
|
||||
return `[plugin-loader] relative path rejected: ${p}`;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
export interface LoadedPlugin {
|
||||
manifest: PluginManifest;
|
||||
subagents: SubagentDefinition[];
|
||||
}
|
||||
|
||||
/**
|
||||
* Load one plugin directory. Returns a union so callers can tell a
|
||||
* rejection (`{ error }`, loud but non-fatal) apart from a loaded plugin,
|
||||
* which may be empty (the manifest parsed but contributes no subagents).
|
||||
*/
|
||||
export function loadSinglePlugin(
|
||||
rootDir: string,
|
||||
opts: LoadOpts = {},
|
||||
): LoadedPlugin | { error: string } {
|
||||
const manifestPath = path.join(rootDir, 'gbrain.plugin.json');
|
||||
if (!fs.existsSync(manifestPath)) {
|
||||
return { error: 'missing gbrain.plugin.json' };
|
||||
}
|
||||
|
||||
let manifest: PluginManifest;
|
||||
try {
|
||||
const raw = fs.readFileSync(manifestPath, 'utf8');
|
||||
manifest = JSON.parse(raw) as PluginManifest;
|
||||
} catch (e) {
|
||||
return { error: `invalid manifest JSON: ${e instanceof Error ? e.message : String(e)}` };
|
||||
}
|
||||
|
||||
if (typeof manifest.name !== 'string' || manifest.name.length === 0) {
|
||||
return { error: 'manifest missing required "name" field' };
|
||||
}
|
||||
if (manifest.plugin_version !== SUPPORTED_PLUGIN_VERSION) {
|
||||
return {
|
||||
error: `unsupported plugin_version "${manifest.plugin_version}" (gbrain supports "${SUPPORTED_PLUGIN_VERSION}")`,
|
||||
};
|
||||
}
|
||||
|
||||
const subagentsDirRel = manifest.subagents ?? 'subagents';
|
||||
const subagentsDir = path.resolve(rootDir, subagentsDirRel);
|
||||
// Prevent `../` escape via the manifest's `subagents` field.
|
||||
if (!subagentsDir.startsWith(path.resolve(rootDir) + path.sep) && subagentsDir !== path.resolve(rootDir)) {
|
||||
return { error: `subagents path escapes plugin root: ${subagentsDirRel}` };
|
||||
}
|
||||
|
||||
const subagents: SubagentDefinition[] = [];
|
||||
if (fs.existsSync(subagentsDir) && fs.statSync(subagentsDir).isDirectory()) {
|
||||
for (const entry of fs.readdirSync(subagentsDir)) {
|
||||
if (!entry.endsWith('.md')) continue;
|
||||
const sourcePath = path.join(subagentsDir, entry);
|
||||
try {
|
||||
const raw = fs.readFileSync(sourcePath, 'utf8');
|
||||
const parsed = matter(raw);
|
||||
const frontmatter = (parsed.data ?? {}) as Record<string, unknown>;
|
||||
const body = parsed.content ?? '';
|
||||
const name = typeof frontmatter.name === 'string'
|
||||
? frontmatter.name
|
||||
: entry.replace(/\.md$/, '');
|
||||
const allowed = Array.isArray(frontmatter.allowed_tools)
|
||||
? (frontmatter.allowed_tools as unknown[]).filter(x => typeof x === 'string') as string[]
|
||||
: undefined;
|
||||
|
||||
if (allowed && opts.validAgentToolNames) {
|
||||
const missing = allowed.filter(t => !opts.validAgentToolNames!.has(t));
|
||||
if (missing.length > 0) {
|
||||
return {
|
||||
error: `subagent '${name}' allowed_tools references unknown tools: ${missing.join(', ')}`,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
subagents.push({
|
||||
plugin_name: manifest.name,
|
||||
name,
|
||||
source_path: sourcePath,
|
||||
frontmatter,
|
||||
body,
|
||||
allowed_tools: allowed,
|
||||
});
|
||||
} catch (e) {
|
||||
return { error: `could not parse ${sourcePath}: ${e instanceof Error ? e.message : String(e)}` };
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return { manifest, subagents };
|
||||
}
|
||||
|
||||
/** Testing surface. */
|
||||
export const __testing = {
|
||||
rejectIfNotAbsolute,
|
||||
SUPPORTED_PLUGIN_VERSION,
|
||||
};
|
||||
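The loader rejects a manifest whose `subagents` field resolves outside the plugin root. A minimal sketch of that containment check follows, assuming POSIX-style paths (it uses `path.posix` so the results do not depend on the host platform). `isInsideRoot` is a hypothetical helper, not part of the PR; it normalizes the root with `path.posix.resolve` before comparing so that a trailing slash on the root does not cause a false rejection.

```typescript
import * as path from 'node:path';

// Hypothetical helper, not part of the PR: true when `relative`, resolved
// against `rootDir`, stays at or below `rootDir`.
function isInsideRoot(rootDir: string, relative: string): boolean {
  const root = path.posix.resolve(rootDir);
  const target = path.posix.resolve(root, relative);
  // The separator suffix enforces a slash boundary, so a sibling such as
  // "/plugins/foo-evil" does not pass as a child of "/plugins/foo".
  return target === root || target.startsWith(root + path.posix.sep);
}
```

With a root of `/plugins/foo`, the value `subagents` is accepted, while `../foo-evil`, `../../etc`, and an absolute `/etc` are all rejected.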
@@ -12,7 +12,15 @@
|
||||
* pay them at module load.
|
||||
*/
|
||||
|
||||
export const PROTECTED_JOB_NAMES: ReadonlySet<string> = new Set(['shell']);
|
||||
export const PROTECTED_JOB_NAMES: ReadonlySet<string> = new Set([
|
||||
'shell',
|
||||
// v0.15: subagent + aggregator are protected because they call the
|
||||
// Anthropic API. MCP callers can't submit them directly; only the
|
||||
// `gbrain agent run` CLI path (which sets allowProtectedSubmit) or a
|
||||
// trusted local `submit_job` (ctx.remote=false) can insert these rows.
|
||||
'subagent',
|
||||
'subagent_aggregator',
|
||||
]);
|
||||
|
||||
/** Check a job name against the protected set. Normalizes whitespace first. */
|
||||
export function isProtectedJobName(name: string): boolean {
|
||||
|
||||
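The comment in the hunk above states who may submit the two subagent job names: the `gbrain agent run` CLI path (which sets `allowProtectedSubmit`) or a trusted local `submit_job` with `ctx.remote=false`. The sketch below restates that rule as a predicate. It is an assumption-laden illustration: `maySubmit` and `SubmitCtx` are hypothetical names, "normalizes whitespace" is assumed to mean `trim()`, and `shell` is left out because its gating is not described in this hunk.

```typescript
// Hypothetical restatement of the submit rule; not the PR's implementation.
const PROTECTED_SUBAGENT_NAMES: ReadonlySet<string> = new Set([
  'subagent',
  'subagent_aggregator',
]);

interface SubmitCtx {
  remote: boolean;                 // true for MCP callers
  allowProtectedSubmit?: boolean;  // set by the CLI path, per the comment
}

function maySubmit(name: string, ctx: SubmitCtx): boolean {
  const normalized = name.trim(); // assumption: whitespace normalization is a trim
  if (!PROTECTED_SUBAGENT_NAMES.has(normalized)) return true;
  return ctx.allowProtectedSubmit === true || ctx.remote === false;
}
```

Under these assumptions a remote MCP caller cannot enqueue `subagent` even with padded whitespace in the name, while the CLI path and local callers can.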
@@ -134,11 +134,17 @@ export class MinionQueue {
|
||||
|
||||
// 3. Insert child. Use ON CONFLICT for idempotency; if a concurrent submit
|
||||
// raced past the fast-path SELECT, the unique index catches it here.
|
||||
// v13 quiet_hours + stagger_key always present (null fallback; schema
|
||||
// stores NULL). v15 max_stalled is conditional: provided values get
|
||||
// clamped to [1, 100] and included in the INSERT; omitted values
|
||||
// skip the column so the schema DEFAULT (5 as of v0.14.1) kicks in.
|
||||
// Keeps the app layer from hardcoding the schema default constant.
|
||||
// quiet_hours + stagger_key always present (null fallback; schema
|
||||
// stores NULL). max_stalled is conditional: provided values get
|
||||
// clamped to [1, 100] and included in the INSERT; omitted values
|
||||
// skip the column so the schema DEFAULT (5 as of v0.14.1) kicks in.
|
||||
// Keeps the app layer from hardcoding the schema default constant.
|
||||
//
|
||||
// Footgun note (codex iter 3): threading max_stalled on INSERT only is
|
||||
// deliberate. An idempotency-key hit returns the EXISTING row via the
|
||||
// fast-path SELECT above — we do NOT UPDATE max_stalled on a re-submit,
|
||||
// because letting a second submitter mutate the first submitter's
|
||||
// durability semantics is a nasty surprise.
|
||||
const hasMaxStalled = opts?.max_stalled !== undefined && opts.max_stalled !== null;
|
||||
const clampedMaxStalled = hasMaxStalled
|
||||
? Math.max(1, Math.min(100, Math.floor(opts!.max_stalled as number)))
|
||||
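The clamping described in the comment above (provided values floored and clamped to [1, 100], omitted values left out so the schema default applies) can be sketched as a small helper. `clampMaxStalled` is a hypothetical name; the else branch of the ternary is not visible in this hunk, so returning `undefined` for omitted values is an assumption.

```typescript
// Hypothetical helper mirroring the visible half of the ternary in the hunk.
function clampMaxStalled(value: number | null | undefined): number | undefined {
  // Assumption: omitted values yield undefined so the INSERT can skip the column.
  if (value === undefined || value === null) return undefined;
  // Floor first, then clamp into the inclusive range [1, 100].
  return Math.max(1, Math.min(100, Math.floor(value)));
}
```

For example, `clampMaxStalled(0)` gives `1`, `clampMaxStalled(250)` gives `100`, and `clampMaxStalled(7.9)` gives `7`.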
@@ -286,29 +292,83 @@ export class MinionQueue {
|
||||
* Returns the *root* (the job matching id), not an arbitrary descendant.
|
||||
*/
|
||||
async cancelJob(id: number): Promise<MinionJob | null> {
|
||||
const rows = await this.engine.executeRaw<Record<string, unknown>>(
|
||||
`WITH RECURSIVE descendants AS (
|
||||
SELECT id, 0 AS d FROM minion_jobs WHERE id = $1
|
||||
UNION ALL
|
||||
SELECT m.id, descendants.d + 1
|
||||
FROM minion_jobs m
|
||||
JOIN descendants ON m.parent_job_id = descendants.id
|
||||
WHERE descendants.d < 100
|
||||
)
|
||||
UPDATE minion_jobs SET
|
||||
status = 'cancelled',
|
||||
lock_token = NULL,
|
||||
lock_until = NULL,
|
||||
finished_at = now(),
|
||||
updated_at = now()
|
||||
WHERE id IN (SELECT id FROM descendants)
|
||||
AND status IN ('waiting','active','delayed','waiting-children','paused')
|
||||
RETURNING *`,
|
||||
[id]
|
||||
);
|
||||
if (rows.length === 0) return null;
|
||||
const root = rows.find(r => (r.id as number) === id);
|
||||
return root ? rowToMinionJob(root) : null;
|
||||
return this.engine.transaction(async (tx) => {
|
||||
const rows = await tx.executeRaw<Record<string, unknown>>(
|
||||
`WITH RECURSIVE descendants AS (
|
||||
SELECT id, 0 AS d FROM minion_jobs WHERE id = $1
|
||||
UNION ALL
|
||||
SELECT m.id, descendants.d + 1
|
||||
FROM minion_jobs m
|
||||
JOIN descendants ON m.parent_job_id = descendants.id
|
||||
WHERE descendants.d < 100
|
||||
)
|
||||
UPDATE minion_jobs SET
|
||||
status = 'cancelled',
|
||||
lock_token = NULL,
|
||||
lock_until = NULL,
|
||||
finished_at = now(),
|
||||
updated_at = now()
|
||||
WHERE id IN (SELECT id FROM descendants)
|
||||
AND status IN ('waiting','active','delayed','waiting-children','paused')
|
||||
RETURNING *`,
|
||||
[id]
|
||||
);
|
||||
if (rows.length === 0) return null;
|
||||
|
||||
// v0.15: emit child_done(outcome='cancelled') for every cancelled row
|
||||
// that had a parent. Without this, an aggregator waiting for N
|
||||
// child_done messages hangs forever when a child is cancelled (codex
|
||||
// iteration 3). Also unblock any aggregator parents whose last
|
||||
// non-terminal child we just cancelled.
|
||||
const parentIds = new Set<number>();
|
||||
for (const r of rows) {
|
||||
const childId = r.id as number;
|
||||
const parentJobId = r.parent_job_id as number | null;
|
||||
const name = r.name as string;
|
||||
// Skip the root if it's the caller's cancel target AND has no parent.
|
||||
// Descendants whose parent got cancelled in the same sweep still
|
||||
// benefit from the inbox message — their parent exits waiting-children
|
||||
// via the resolve sweep below even though the parent is itself
|
||||
// cancelled (EXISTS guard on inbox INSERT handles it).
|
||||
if (parentJobId == null) continue;
|
||||
parentIds.add(parentJobId);
|
||||
const childDone: ChildDoneMessage = {
|
||||
type: 'child_done',
|
||||
child_id: childId,
|
||||
job_name: name,
|
||||
result: null,
|
||||
outcome: 'cancelled',
|
||||
error: 'cancelled',
|
||||
};
|
||||
await tx.executeRaw(
|
||||
`INSERT INTO minion_inbox (job_id, sender, payload)
|
||||
SELECT $1, 'minions', $2::jsonb
|
||||
WHERE EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE id = $1 AND status NOT IN ('completed','failed','dead','cancelled')
|
||||
)`,
|
||||
[parentJobId, childDone]
|
||||
);
|
||||
}
|
||||
|
||||
// Resolve any non-cancelled aggregator parents sitting on
|
||||
// waiting-children whose last open child we just cancelled.
|
||||
for (const parentId of parentIds) {
|
||||
await tx.executeRaw(
|
||||
`UPDATE minion_jobs SET status = 'waiting', updated_at = now()
|
||||
WHERE id = $1 AND status = 'waiting-children'
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE parent_job_id = $1
|
||||
AND status NOT IN ('completed', 'failed', 'dead', 'cancelled')
|
||||
)`,
|
||||
[parentId]
|
||||
);
|
||||
}
|
||||
|
||||
const root = rows.find(r => (r.id as number) === id);
|
||||
return root ? rowToMinionJob(root) : null;
|
||||
});
|
||||
}
|
||||
|
||||
/** Re-queue a failed or dead job for retry. */
|
||||
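The recursive CTE in `cancelJob` walks the job tree from the root with a depth cap of 100. The same traversal can be sketched in memory as follows. The `JobRow` shape and the `collectDescendants` name are illustrative assumptions; only `id` and `parent_job_id` come from the diff.

```typescript
// Illustrative row shape; only these two columns are taken from the diff.
interface JobRow {
  id: number;
  parent_job_id: number | null;
}

// Breadth-first walk mirroring the CTE: the root sits at depth 0 and children
// are expanded only while the current depth is below maxDepth, so rows at
// depths 0..maxDepth are collected.
function collectDescendants(rows: JobRow[], rootId: number, maxDepth = 100): number[] {
  const collected: number[] = [];
  let frontier: number[] = rows.filter(r => r.id === rootId).map(r => r.id);
  let depth = 0;
  while (frontier.length > 0) {
    collected.push(...frontier);
    if (depth >= maxDepth) break; // same role as `WHERE descendants.d < 100`
    const next: number[] = [];
    for (const r of rows) {
      if (r.parent_job_id !== null && frontier.indexOf(r.parent_job_id) !== -1) {
        next.push(r.id);
      }
    }
    frontier = next;
    depth += 1;
  }
  return collected;
}

const sampleJobs: JobRow[] = [
  { id: 1, parent_job_id: null },
  { id: 2, parent_job_id: 1 },
  { id: 3, parent_job_id: 2 },
  { id: 4, parent_job_id: null },
];
console.log(collectDescendants(sampleJobs, 1)); // [ 1, 2, 3 ]
```

The depth cap bounds the walk even if the parent links were ever to form a cycle, which is presumably why the SQL carries the `d < 100` guard.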
@@ -440,21 +500,67 @@ export class MinionQueue {
|
||||
* but will be caught the next one (after re-claim). Never double-handled.
|
||||
*/
|
||||
async handleTimeouts(): Promise<MinionJob[]> {
|
||||
const rows = await this.engine.executeRaw<Record<string, unknown>>(
|
||||
`UPDATE minion_jobs SET
|
||||
status = 'dead',
|
||||
error_text = 'timeout exceeded',
|
||||
lock_token = NULL,
|
||||
lock_until = NULL,
|
||||
finished_at = now(),
|
||||
updated_at = now()
|
||||
WHERE status = 'active'
|
||||
AND timeout_at IS NOT NULL
|
||||
AND timeout_at < now()
|
||||
AND lock_until > now()
|
||||
RETURNING *`
|
||||
);
|
||||
return rows.map(rowToMinionJob);
|
||||
return this.engine.transaction(async (tx) => {
|
||||
const rows = await tx.executeRaw<Record<string, unknown>>(
|
||||
`UPDATE minion_jobs SET
|
||||
status = 'dead',
|
||||
error_text = 'timeout exceeded',
|
||||
lock_token = NULL,
|
||||
lock_until = NULL,
|
||||
finished_at = now(),
|
||||
updated_at = now()
|
||||
WHERE status = 'active'
|
||||
AND timeout_at IS NOT NULL
|
||||
AND timeout_at < now()
|
||||
AND lock_until > now()
|
||||
RETURNING *`
|
||||
);
|
||||
|
||||
// v0.15: emit child_done(outcome='timeout') for every timed-out job that
|
||||
// had a parent. Without this, an aggregator waiting for N child_done
|
||||
// messages hangs forever when a child times out (codex iteration 3).
|
||||
// Outcome 'timeout' is distinct from 'dead' so consumers can distinguish
|
||||
// "timed out during run" from "died via max-stall".
|
||||
const parentIds = new Set<number>();
|
||||
for (const r of rows) {
|
||||
const parentJobId = r.parent_job_id as number | null;
|
||||
if (parentJobId == null) continue;
|
||||
parentIds.add(parentJobId);
|
||||
const childDone: ChildDoneMessage = {
|
||||
type: 'child_done',
|
||||
child_id: r.id as number,
|
||||
job_name: r.name as string,
|
||||
result: null,
|
||||
outcome: 'timeout',
|
||||
error: 'timeout exceeded',
|
||||
};
|
||||
await tx.executeRaw(
|
||||
`INSERT INTO minion_inbox (job_id, sender, payload)
|
||||
SELECT $1, 'minions', $2::jsonb
|
||||
WHERE EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE id = $1 AND status NOT IN ('completed','failed','dead','cancelled')
|
||||
)`,
|
||||
[parentJobId, childDone]
|
||||
);
|
||||
}
|
||||
|
||||
// Unblock any aggregator parents whose last open child we just killed.
|
||||
for (const parentId of parentIds) {
|
||||
await tx.executeRaw(
|
||||
`UPDATE minion_jobs SET status = 'waiting', updated_at = now()
|
||||
WHERE id = $1 AND status = 'waiting-children'
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE parent_job_id = $1
|
||||
AND status NOT IN ('completed', 'failed', 'dead', 'cancelled')
|
||||
)`,
|
||||
[parentId]
|
||||
);
|
||||
}
|
||||
|
||||
return rows.map(rowToMinionJob);
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
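With `cancelJob` and `handleTimeouts` now emitting `child_done`, an aggregator parent can count one message per child regardless of how the child ended. The consumer side can be sketched as below. The `ChildDoneMessage` fields are inferred from the object literals in this diff; the real type lives elsewhere and may differ, and `summarizeChildDone` is a hypothetical helper.

```typescript
// Outcome values as they appear in this diff's child_done literals.
type ChildOutcome = 'complete' | 'failed' | 'dead' | 'cancelled' | 'timeout';

// Assumed shape, reconstructed from the literals above (not the real type).
interface ChildDoneMessage {
  type: 'child_done';
  child_id: number;
  job_name: string;
  result: unknown;
  outcome: ChildOutcome;
  error?: string;
}

interface AggregateSummary {
  allReported: boolean;
  counts: Record<ChildOutcome, number>;
}

// Hypothetical aggregator helper: keeps the last message per child, then
// tallies outcomes and checks whether every expected child has reported.
function summarizeChildDone(inbox: ChildDoneMessage[], expectedChildren: number): AggregateSummary {
  const latest: { [childId: number]: ChildOutcome } = {};
  for (const msg of inbox) latest[msg.child_id] = msg.outcome;

  const counts: Record<ChildOutcome, number> = {
    complete: 0, failed: 0, dead: 0, cancelled: 0, timeout: 0,
  };
  const childIds = Object.keys(latest);
  for (const id of childIds) counts[latest[Number(id)]] += 1;

  return { allReported: childIds.length >= expectedChildren, counts };
}
```

Keeping `'timeout'` separate from `'dead'` lets a consumer treat "ran too long" differently from "stalled out", as the comment in `handleTimeouts` notes.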
@@ -524,6 +630,7 @@ export class MinionQueue {
|
||||
child_id: completed.id,
|
||||
job_name: completed.name,
|
||||
result: result ?? null,
|
||||
outcome: 'complete',
|
||||
};
|
||||
await tx.executeRaw(
|
||||
`INSERT INTO minion_inbox (job_id, sender, payload)
|
||||
@@ -535,14 +642,17 @@ export class MinionQueue {
|
||||
[completed.parent_job_id, childDone]
|
||||
);
|
||||
|
||||
// Fold-in resolveParent: flip parent to waiting once all children done.
|
||||
// Fold-in resolveParent: flip parent to waiting once all children are
|
||||
// in ANY terminal state. Terminal set includes 'failed' so a failed
|
||||
// child with on_child_fail='continue'/'ignore' doesn't strand the
|
||||
// parent in waiting-children forever (v0.15 aggregator fix).
|
||||
await tx.executeRaw(
|
||||
`UPDATE minion_jobs SET status = 'waiting', updated_at = now()
|
||||
WHERE id = $1 AND status = 'waiting-children'
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE parent_job_id = $1
|
||||
AND status NOT IN ('completed', 'dead', 'cancelled')
|
||||
AND status NOT IN ('completed', 'failed', 'dead', 'cancelled')
|
||||
)`,
|
||||
[completed.parent_job_id]
|
||||
);
|
||||
@@ -616,6 +726,29 @@ export class MinionQueue {
|
||||
|
||||
// Parent hook on terminal failure.
|
||||
if (terminal && failed.parent_job_id) {
|
||||
// v0.15: emit child_done(outcome='failed') BEFORE any parent-terminal
|
||||
// update. Insertion order matters because `completeJob`'s inbox-write
|
||||
// EXISTS guard skips writes once the parent is 'failed' — if we let
|
||||
// the fail_parent UPDATE run first, this inbox row would be dropped
|
||||
// for aggregator-style parents that still want to count it (codex).
|
||||
const childDone: ChildDoneMessage = {
|
||||
type: 'child_done',
|
||||
child_id: failed.id,
|
||||
job_name: failed.name,
|
||||
result: null,
|
||||
outcome: newStatus === 'dead' ? 'dead' : 'failed',
|
||||
error: errorText,
|
||||
};
|
||||
await tx.executeRaw(
|
||||
`INSERT INTO minion_inbox (job_id, sender, payload)
|
||||
SELECT $1, 'minions', $2::jsonb
|
||||
WHERE EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE id = $1 AND status NOT IN ('completed','failed','dead','cancelled')
|
||||
)`,
|
||||
[failed.parent_job_id, childDone]
|
||||
);
|
||||
|
||||
if (failed.on_child_fail === 'fail_parent') {
|
||||
await tx.executeRaw(
|
||||
`UPDATE minion_jobs SET status = 'failed',
|
||||
@@ -628,19 +761,37 @@ export class MinionQueue {
|
||||
`UPDATE minion_jobs SET parent_job_id = NULL, updated_at = now() WHERE id = $1`,
|
||||
[failed.id]
|
||||
);
|
||||
// After dropping the dep, try to resolve the parent if all OTHER kids are done.
|
||||
// After dropping the dep, try to resolve the parent if all OTHER
|
||||
// kids are terminal. Terminal set includes 'failed' (v0.15).
|
||||
await tx.executeRaw(
|
||||
`UPDATE minion_jobs SET status = 'waiting', updated_at = now()
|
||||
WHERE id = $1 AND status = 'waiting-children'
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE parent_job_id = $1
|
||||
AND status NOT IN ('completed', 'dead', 'cancelled')
|
||||
AND status NOT IN ('completed', 'failed', 'dead', 'cancelled')
|
||||
)`,
|
||||
[failed.parent_job_id]
|
||||
);
|
||||
} else {
|
||||
// 'ignore' / 'continue': parent stays in waiting-children waiting on
|
||||
// siblings. With v0.15 terminal-set expansion + child_done emission
|
||||
// above, an aggregator sibling-count model now works: all N children
|
||||
// reach terminal → completeJob on a sibling (or the LAST terminal
|
||||
// transition here) flips parent → waiting once no non-terminal kids
|
||||
// remain. Run the resolve check here so the last child transitioning
|
||||
// via THIS code path still unblocks the parent.
|
||||
await tx.executeRaw(
|
||||
`UPDATE minion_jobs SET status = 'waiting', updated_at = now()
|
||||
WHERE id = $1 AND status = 'waiting-children'
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE parent_job_id = $1
|
||||
AND status NOT IN ('completed', 'failed', 'dead', 'cancelled')
|
||||
)`,
|
||||
[failed.parent_job_id]
|
||||
);
|
||||
}
|
||||
// 'ignore' / 'continue' → parent stays in waiting-children waiting on siblings
|
||||
}
|
||||
|
||||
// remove_on_fail cleanup AFTER parent hook.
|
||||
@@ -725,7 +876,13 @@ export class MinionQueue {
|
||||
return { requeued, dead };
|
||||
}
|
||||
|
||||
/** Check if all children of a parent are done. If so, unblock parent. */
|
||||
/**
|
||||
* Check if all children of a parent are in ANY terminal state. If so,
|
||||
* unblock parent (flip waiting-children → waiting).
|
||||
*
|
||||
* v0.15: terminal set includes 'failed' so a child failing with
|
||||
* on_child_fail='continue'/'ignore' doesn't strand the parent.
|
||||
*/
|
||||
async resolveParent(parentId: number): Promise<MinionJob | null> {
|
||||
const rows = await this.engine.executeRaw<Record<string, unknown>>(
|
||||
`UPDATE minion_jobs SET status = 'waiting', updated_at = now()
|
||||
@@ -733,7 +890,7 @@ export class MinionQueue {
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM minion_jobs
|
||||
WHERE parent_job_id = $1
|
||||
AND status NOT IN ('completed', 'dead', 'cancelled')
|
||||
AND status NOT IN ('completed', 'failed', 'dead', 'cancelled')
|
||||
)
|
||||
RETURNING *`,
|
||||
[parentId]
|
||||
|
||||
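The resolve rule repeated across these hunks (flip the parent from `waiting-children` to `waiting` once no non-terminal child remains) can be modeled as a pure function. This is a sketch for reasoning about the change, not the queue's implementation: `resolveParentStatus` and the two constant names are hypothetical, while the status strings are the ones used in the SQL above.

```typescript
// Status strings as they appear in the SQL in this diff.
type JobStatus =
  | 'waiting' | 'active' | 'delayed' | 'waiting-children' | 'paused'
  | 'completed' | 'failed' | 'dead' | 'cancelled';

// Terminal set after this change ('failed' included) and before it.
const TERMINAL_V015: JobStatus[] = ['completed', 'failed', 'dead', 'cancelled'];
const TERMINAL_LEGACY: JobStatus[] = ['completed', 'dead', 'cancelled'];

// Mirrors `WHERE status = 'waiting-children' AND NOT EXISTS (non-terminal child)`.
function resolveParentStatus(
  parent: JobStatus,
  children: JobStatus[],
  terminal: JobStatus[] = TERMINAL_V015,
): JobStatus {
  if (parent !== 'waiting-children') return parent;
  const allTerminal = children.every(s => terminal.indexOf(s) !== -1);
  return allTerminal ? 'waiting' : parent;
}

// One child completed, one failed with on_child_fail='continue':
console.log(resolveParentStatus('waiting-children', ['completed', 'failed']));
// waiting
console.log(resolveParentStatus('waiting-children', ['completed', 'failed'], TERMINAL_LEGACY));
// waiting-children (the stranded-parent case this change addresses)
```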
152
src/core/minions/rate-leases.ts
Normal file
@@ -0,0 +1,152 @@
|
||||
/**
|
||||
* Lease-based rate limiter for outbound providers (e.g. anthropic:messages).
|
||||
*
|
||||
* Counter-based limiters leak capacity when a worker crashes mid-call
|
||||
* (counter never decrements). Leases are owner-tagged rows with an expires_at
|
||||
* timestamp — crash recovery is free: any row past expires_at is considered
|
||||
* dead on the next acquire and pruned before the active-count check.
|
||||
*
|
||||
* Two-phase acquire:
|
||||
* 1. Pre-prune: DELETE expired leases for this key (same txn).
|
||||
* 2. Check-then-insert under a txn-scoped advisory lock so two concurrent
|
||||
* acquires can't both see "one slot left".
|
||||
*
|
||||
* The owner is always a Minion job id; the lease is CASCADE-tied to
|
||||
* minion_jobs so an out-of-band row DELETE (prune, cancel) doesn't leave
|
||||
* stale leases. Mid-call renewal bumps expires_at in-place.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from '../engine.ts';
|
||||
|
||||
/**
|
||||
* Acquisition result. If `acquired=false`, the caller should back off and
|
||||
* retry — we don't queue, we just reject.
|
||||
*/
|
||||
export interface LeaseAcquireResult {
|
||||
acquired: boolean;
|
||||
/** The lease row id, present only when acquired=true. */
|
||||
leaseId?: number;
|
||||
/** Active count seen at acquire time (for diagnostics). */
|
||||
activeCount: number;
|
||||
/** max_concurrent that was checked against. */
|
||||
maxConcurrent: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* Convert a key string to a stable int64 for pg_advisory_xact_lock. Simple
|
||||
* FNV-1a is fine — the lock space is per-transaction and we only need
|
||||
* different keys to (usually) hash to different locks.
|
||||
*/
|
||||
function hashKey(key: string): bigint {
|
||||
// FNV-1a 64-bit
|
||||
let h = 0xcbf29ce484222325n;
|
||||
const prime = 0x100000001b3n;
|
||||
for (let i = 0; i < key.length; i++) {
|
||||
h ^= BigInt(key.charCodeAt(i));
|
||||
h = (h * prime) & 0xffffffffffffffffn;
|
||||
}
|
||||
// Fit into signed int64 for PG bigint. The mask above only keeps h within
// unsigned 64 bits, so clear the top bit here to keep the value non-negative.
|
||||
const signBit = 0x8000000000000000n;
|
||||
return h & (signBit - 1n);
|
||||
}
|
||||
|
||||
const DEFAULT_TTL_MS = 120_000;
|
||||
|
||||
export interface AcquireOpts {
|
||||
ttlMs?: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* Attempt to acquire a lease on `key`. Returns `{acquired: false}` when the
|
||||
* active count (after pre-pruning stale rows) would exceed maxConcurrent.
|
||||
*
|
||||
* The advisory lock, prune, count, and insert must share one transaction to
* be atomic. Pass in the engine; the helper opens that transaction internally.
|
||||
*/
|
||||
export async function acquireLease(
|
||||
engine: BrainEngine,
|
||||
key: string,
|
||||
ownerJobId: number,
|
||||
maxConcurrent: number,
|
||||
opts: AcquireOpts = {},
|
||||
): Promise<LeaseAcquireResult> {
|
||||
const ttlMs = opts.ttlMs ?? DEFAULT_TTL_MS;
|
||||
const lockKey = hashKey(key);
|
||||
|
||||
return engine.transaction(async (tx) => {
|
||||
// txn-scoped advisory lock keyed on the rate-lease key name. Released
|
||||
// automatically when the txn commits/rolls back.
|
||||
await tx.executeRaw(`SELECT pg_advisory_xact_lock($1::bigint)`, [lockKey.toString()]);
|
||||
|
||||
// Pre-prune stale leases for this key.
|
||||
await tx.executeRaw(
|
||||
`DELETE FROM subagent_rate_leases WHERE key = $1 AND expires_at <= now()`,
|
||||
[key],
|
||||
);
|
||||
|
||||
const countRows = await tx.executeRaw<{ count: string | number }>(
|
||||
`SELECT count(*)::text AS count FROM subagent_rate_leases WHERE key = $1`,
|
||||
[key],
|
||||
);
|
||||
const activeCount = parseInt(String(countRows[0]?.count ?? '0'), 10);
|
||||
|
||||
if (activeCount >= maxConcurrent) {
|
||||
return { acquired: false, activeCount, maxConcurrent };
|
||||
}
|
||||
|
||||
const rows = await tx.executeRaw<{ id: number }>(
|
||||
`INSERT INTO subagent_rate_leases (key, owner_job_id, expires_at)
|
||||
VALUES ($1, $2, now() + ($3::double precision * interval '1 millisecond'))
|
||||
RETURNING id`,
|
||||
[key, ownerJobId, ttlMs],
|
||||
);
|
||||
const leaseId = rows[0]!.id;
|
||||
return { acquired: true, leaseId, activeCount: activeCount + 1, maxConcurrent };
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Renew a lease's expires_at (mid-call). Returns true if the lease still
|
||||
* exists (was renewed), false if it was pruned (caller must re-acquire or
|
||||
* abort).
|
||||
*/
|
||||
export async function renewLease(engine: BrainEngine, leaseId: number, ttlMs = DEFAULT_TTL_MS): Promise<boolean> {
|
||||
const rows = await engine.executeRaw<{ id: number }>(
|
||||
`UPDATE subagent_rate_leases
|
||||
SET expires_at = now() + ($2::double precision * interval '1 millisecond')
|
||||
WHERE id = $1
|
||||
RETURNING id`,
|
||||
[leaseId, ttlMs],
|
||||
);
|
||||
return rows.length > 0;
|
||||
}
|
||||
|
||||
/**
|
||||
* Release a lease explicitly. Idempotent — a missing lease returns silently
|
||||
* (it was pruned or the owning job row cascade-deleted it).
|
||||
*/
|
||||
export async function releaseLease(engine: BrainEngine, leaseId: number): Promise<void> {
|
||||
await engine.executeRaw(`DELETE FROM subagent_rate_leases WHERE id = $1`, [leaseId]);
|
||||
}
|
||||
|
||||
/**
|
||||
* Attempt to renew immediately, then retry up to three times with
* exponential backoff (250ms / 500ms / 1s). Used mid-LLM-call when a renewal
* attempt hits a DB blip. Returns false if every attempt throws or if the
* lease row is gone; the caller must then abort with a renewable error so
* the worker re-claims the job.
|
||||
*/
|
||||
export async function renewLeaseWithBackoff(engine: BrainEngine, leaseId: number, ttlMs = DEFAULT_TTL_MS): Promise<boolean> {
|
||||
const delays = [0, 250, 500, 1000]; // first attempt immediate, then 250/500/1000
|
||||
for (const delay of delays) {
|
||||
if (delay > 0) await new Promise(r => setTimeout(r, delay));
|
||||
try {
|
||||
if (await renewLease(engine, leaseId, ttlMs)) return true;
|
||||
// Lease is gone (pruned). No point retrying — caller must abort.
|
||||
return false;
|
||||
} catch {
|
||||
// DB blip; fall through to next delay.
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
|
||||
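The prune, count, insert sequence in `acquireLease` can be exercised without a database. The sketch below is a single-threaded, in-memory stand-in, so it needs no advisory lock; the class `InMemoryLeases` and its explicit `now` parameter are assumptions made for testability and are not part of this PR.

```typescript
interface Lease {
  id: number;
  key: string;
  ownerJobId: number;
  expiresAt: number; // epoch millis
}

interface LeaseAcquireResult {
  acquired: boolean;
  leaseId?: number;
  activeCount: number;
  maxConcurrent: number;
}

// Hypothetical in-memory model of the subagent_rate_leases table.
class InMemoryLeases {
  private rows: Lease[] = [];
  private nextId = 1;

  acquire(key: string, ownerJobId: number, maxConcurrent: number, ttlMs: number, now: number): LeaseAcquireResult {
    // 1. Pre-prune: drop expired leases for this key (expires_at <= now).
    this.rows = this.rows.filter(l => !(l.key === key && l.expiresAt <= now));
    // 2. Count what is left and reject when the cap is already reached.
    const activeCount = this.rows.filter(l => l.key === key).length;
    if (activeCount >= maxConcurrent) return { acquired: false, activeCount, maxConcurrent };
    // 3. Insert the new lease.
    const lease: Lease = { id: this.nextId++, key, ownerJobId, expiresAt: now + ttlMs };
    this.rows.push(lease);
    return { acquired: true, leaseId: lease.id, activeCount: activeCount + 1, maxConcurrent };
  }

  // Like renewLease: succeeds only if the row still exists.
  renew(leaseId: number, ttlMs: number, now: number): boolean {
    const lease = this.rows.filter(l => l.id === leaseId)[0];
    if (!lease) return false;
    lease.expiresAt = now + ttlMs;
    return true;
  }

  // Like releaseLease: idempotent delete.
  release(leaseId: number): void {
    this.rows = this.rows.filter(l => l.id !== leaseId);
  }
}
```

A crashed owner never calls `release`, but its lease disappears at the next `acquire` after `expiresAt`, which is the crash-recovery property the file header describes.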
225
src/core/minions/tools/brain-allowlist.ts
Normal file
@@ -0,0 +1,225 @@
|
||||
/**
|
||||
* Derive the subagent brain-tool registry from src/core/operations.ts.
|
||||
*
|
||||
* Single source of truth: the MCP server already maps OPERATIONS → tool defs.
|
||||
* We reuse the same ParamDef-shape → JSONSchema conversion (lives in
|
||||
* buildToolDefs for MCP) and wrap each allowed op with an execute() that
|
||||
* invokes its handler under a subagent-tagged OperationContext.
|
||||
*
|
||||
* Filtering is NAME-based (not by OperationContext.remote, which is a
|
||||
* call-time flag, not operation metadata — codex catch). The allow-list
|
||||
* below is reviewed manually; adding a new op here is an explicit security
|
||||
* decision.
|
||||
*
|
||||
* put_page: allowed, but the subagent tool-schema wraps its `slug` with a
|
||||
* per-subagent namespace regex so the model can only write under
|
||||
* `wiki/agents/<subagentId>/...`. The put_page operation also has a server-
|
||||
* side fail-closed check (see src/core/operations.ts) that catches any
|
||||
* dispatcher bug where viaSubagent=true but subagentId is missing.
|
||||
*
|
||||
* In v0.15 every allow-list op is treated as idempotent for the two-phase
|
||||
* replay path. put_page with a deterministic slug is idempotent at the row
|
||||
* level; repeats re-derive the same embedding over identical content.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from '../../engine.ts';
|
||||
import type { GBrainConfig } from '../../config.ts';
|
||||
import { operations } from '../../operations.ts';
|
||||
import type { Operation, OperationContext } from '../../operations.ts';
|
||||
import type { ToolCtx, ToolDef } from '../types.ts';
|
||||
|
||||
/**
|
||||
* v0.15 brain-tool allow-list. Review carefully when extending. Op names
|
||||
* verified against origin/master:src/core/operations.ts (post shell-jobs +
|
||||
* Knowledge Runtime).
|
||||
*
|
||||
* Read-only (all safe):
|
||||
* query, search, get_page, list_pages, file_list, file_url,
|
||||
* get_backlinks, traverse_graph, resolve_slugs, get_ingest_log
|
||||
*
|
||||
* Conditional write:
|
||||
* put_page (namespace-enforced by the tool schema + server-side check)
|
||||
*
|
||||
* Every name below MUST exist in src/core/operations.ts OPERATIONS; the
|
||||
* brain-allowlist test pins this invariant so an upstream rename fails CI
|
||||
* instead of silently dropping a tool.
|
||||
*/
|
||||
export const BRAIN_TOOL_ALLOWLIST: ReadonlySet<string> = new Set([
|
||||
'query',
|
||||
'search',
|
||||
'get_page',
|
||||
'list_pages',
|
||||
'file_list',
|
||||
'file_url',
|
||||
'get_backlinks',
|
||||
'traverse_graph',
|
||||
'resolve_slugs',
|
||||
'get_ingest_log',
|
||||
'put_page',
|
||||
]);
|
||||
|
||||
/** Matches Anthropic's tool-name constraint. No dots. */
|
||||
const ANTHROPIC_NAME_RE = /^[a-zA-Z0-9_-]{1,64}$/;
|
||||
|
||||
function sanitizeToolName(opName: string): string {
|
||||
// Prefix with brain_ and replace any non-conforming char. For the v0.15
|
||||
// allow-list, every op name is already a valid simple identifier, so this
|
||||
// is defense-in-depth.
|
||||
const prefixed = `brain_${opName}`.replace(/[^a-zA-Z0-9_-]/g, '_');
|
||||
return prefixed.slice(0, 64);
|
||||
}
|
||||
|
||||
/**
|
||||
* Convert an Operation.params (ParamDef) map to an Anthropic-compatible
|
||||
* JSONSchema.input_schema. Same shape MCP uses inline — ParamDef.type
|
||||
* narrows to a subset of JSONSchema types.
|
||||
*/
|
||||
function paramsToInputSchema(op: Operation): Record<string, unknown> {
|
||||
return {
|
||||
type: 'object' as const,
|
||||
properties: Object.fromEntries(
|
||||
Object.entries(op.params).map(([k, v]) => [k, {
|
||||
type: v.type === 'array' ? 'array' : v.type,
|
||||
...(v.description ? { description: v.description } : {}),
|
||||
...(v.enum ? { enum: v.enum } : {}),
|
||||
...(v.items ? { items: { type: v.items.type } } : {}),
|
||||
}]),
|
||||
),
|
||||
required: Object.entries(op.params).filter(([, v]) => v.required).map(([k]) => k),
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* For put_page specifically, the tool schema shown to the model constrains
|
||||
* `slug` to `wiki/agents/<subagentId>/...`. The server-side check in
|
||||
* operations.ts is the authoritative gate; this just helps the model write
|
||||
* correct slugs on the first try.
|
||||
*/
|
||||
function namespacedPutPageSchema(op: Operation, subagentId: number): Record<string, unknown> {
|
||||
const base = paramsToInputSchema(op);
|
||||
const props = (base.properties as Record<string, Record<string, unknown>>) ?? {};
|
||||
if (props.slug) {
|
||||
props.slug = {
|
||||
...props.slug,
|
||||
description: `Page slug. MUST start with "wiki/agents/${subagentId}/" (agents can only write under their own namespace).`,
|
||||
pattern: `^wiki/agents/${subagentId}/.+`,
|
||||
};
|
||||
}
|
||||
return { ...base, properties: props };
|
||||
}
|
||||
|
||||
/** Args required to build the registry for a given subagent job. */
|
||||
export interface BuildBrainToolsOpts {
|
||||
subagentId: number;
|
||||
engine: BrainEngine;
|
||||
config: GBrainConfig;
|
||||
/** Optional filter: only include names in this set. */
|
||||
allowedNames?: ReadonlySet<string>;
|
||||
}
|
||||
|
||||
interface OpContextDeps {
|
||||
engine: BrainEngine;
|
||||
config: GBrainConfig;
|
||||
subagentId: number;
|
||||
jobId: number;
|
||||
signal?: AbortSignal;
|
||||
}
|
||||
|
||||
function buildOpContext(deps: OpContextDeps): OperationContext {
|
||||
return {
|
||||
engine: deps.engine,
|
||||
config: deps.config,
|
||||
logger: {
|
||||
info: (msg: string) => process.stderr.write(`[subagent-tool:${deps.jobId}] ${msg}\n`),
|
||||
warn: (msg: string) => process.stderr.write(`[subagent-tool:${deps.jobId}] WARN: ${msg}\n`),
|
||||
error: (msg: string) => process.stderr.write(`[subagent-tool:${deps.jobId}] ERROR: ${msg}\n`),
|
||||
},
|
||||
dryRun: false,
|
||||
remote: true, // match MCP trust boundary
|
||||
jobId: deps.jobId,
|
||||
subagentId: deps.subagentId,
|
||||
viaSubagent: true, // FAIL-CLOSED: put_page etc. enforce namespace
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the subagent brain-tool registry. One ToolDef per allow-listed op,
|
||||
* with a namespace-wrapped schema for put_page.
|
||||
*
|
||||
* Call this once per subagent-job claim; the registry is keyed to the job's
|
||||
* subagentId + engine handle, so it's not shareable across jobs.
|
||||
*/
|
||||
export function buildBrainTools(opts: BuildBrainToolsOpts): ToolDef[] {
|
||||
const filter = opts.allowedNames ?? BRAIN_TOOL_ALLOWLIST;
|
||||
const picked: Operation[] = operations.filter(
|
||||
op => BRAIN_TOOL_ALLOWLIST.has(op.name) && filter.has(op.name),
|
||||
);
|
||||
|
||||
return picked.map<ToolDef>(op => {
|
||||
const schema = op.name === 'put_page'
|
||||
? namespacedPutPageSchema(op, opts.subagentId)
|
||||
: paramsToInputSchema(op);
|
||||
|
||||
const toolName = sanitizeToolName(op.name);
|
||||
if (!ANTHROPIC_NAME_RE.test(toolName)) {
|
||||
throw new Error(`brain tool name ${toolName} does not match Anthropic constraint`);
|
||||
}
|
||||
|
||||
return {
|
||||
name: toolName,
|
||||
description: op.description,
|
||||
input_schema: schema,
|
||||
// v0.15 ships only idempotent brain tools (every allow-listed op is
|
||||
// deterministic over its input; put_page re-writes the same slug).
|
||||
idempotent: true,
|
||||
async execute(input: unknown, ctx: ToolCtx): Promise<unknown> {
|
||||
const opCtx = buildOpContext({
|
||||
engine: ctx.engine,
|
||||
config: opts.config,
|
||||
subagentId: opts.subagentId,
|
||||
jobId: ctx.jobId,
|
||||
signal: ctx.signal,
|
||||
});
|
||||
const params = (input && typeof input === 'object') ? input as Record<string, unknown> : {};
|
||||
return op.handler(opCtx, params);
|
||||
},
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply the caller's `allowed_tools` subset to a registry. Unknown tool
|
||||
* names throw a clear error at load time (NOT silently ignored) so
|
||||
* subagent defs with a typo don't ship to prod wondering why a tool
|
||||
* never fires.
|
||||
*/
|
||||
export function filterAllowedTools(registry: ToolDef[], allowedToolNames: string[]): ToolDef[] {
|
||||
const indexByName = new Map(registry.map(t => [t.name, t]));
|
||||
// Also index by the un-prefixed op name (for friendlier allowed_tools entries).
|
||||
const indexByShort = new Map(
|
||||
registry.map(t => [t.name.replace(/^brain_/, ''), t]),
|
||||
);
|
||||
const seen = new Set<string>();
|
||||
const picked: ToolDef[] = [];
|
||||
for (const requested of allowedToolNames) {
|
||||
const match = indexByName.get(requested) ?? indexByShort.get(requested);
|
||||
if (!match) {
|
||||
throw new Error(
|
||||
`subagent allowed_tools references unknown tool "${requested}". ` +
|
||||
`Known: ${[...indexByName.keys()].join(', ')}`,
|
||||
);
|
||||
}
|
||||
if (seen.has(match.name)) continue;
|
||||
seen.add(match.name);
|
||||
picked.push(match);
|
||||
}
|
||||
return picked;
|
||||
}
|
||||
|
||||
/** Exported for unit tests (stable surface). */
|
||||
export const __testing = {
|
||||
sanitizeToolName,
|
||||
paramsToInputSchema,
|
||||
namespacedPutPageSchema,
|
||||
ANTHROPIC_NAME_RE,
|
||||
};
|
||||
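The `pattern` that `namespacedPutPageSchema` attaches to `slug` can be checked directly against the cases the commit message lists. This sketch only exercises the schema-level hint; the authoritative server-side check in `src/core/operations.ts` is not shown in this diff and is not reproduced here. The helper names `slugPatternFor` and `slugMatchesNamespace` are hypothetical.

```typescript
// Same pattern string the schema uses: anchored at the start, with a slash
// boundary after the id and at least one character after the slash.
function slugPatternFor(subagentId: number): RegExp {
  return new RegExp(`^wiki/agents/${subagentId}/.+`);
}

function slugMatchesNamespace(slug: string, subagentId: number): boolean {
  return slugPatternFor(subagentId).test(slug);
}

console.log(slugMatchesNamespace('wiki/agents/12/notes', 12));     // true
console.log(slugMatchesNamespace('wiki/agents/12evil/notes', 12)); // false
console.log(slugMatchesNamespace('/wiki/agents/12/notes', 12));    // false
console.log(slugMatchesNamespace('wiki/agents/12/', 12));          // false
```

The leading `^` matters because JSON Schema `pattern` values are not implicitly anchored; without it, any slug that merely contains the prefix would satisfy the schema.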
229
src/core/minions/transcript.ts
Normal file
@@ -0,0 +1,229 @@
|
||||
/**
|
||||
* Render a subagent conversation to markdown.
|
||||
*
|
||||
* Two inputs:
|
||||
* - subagent_messages rows (persisted Anthropic message-block arrays)
|
||||
* - subagent_tool_executions rows (two-phase tool ledger — used to show
|
||||
* tool outputs alongside the model's tool_use calls)
|
||||
*
|
||||
* The output is suitable for:
|
||||
* - an attachment on the completed subagent job row
|
||||
* - inline display in `gbrain agent logs <job>` after the heartbeat stream
|
||||
* - committing as a brain page under wiki/agents/<subagentId>/transcript-*
|
||||
*
|
||||
* Does NOT redact anything — the caller writes to a location they control.
|
||||
* For PII-sensitive deployments, pass through a sanitizer before persisting.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from '../engine.ts';
|
||||
import type { ContentBlock } from './types.ts';
|
||||
|
||||
export interface SubagentMessageRow {
|
||||
id: number;
|
||||
job_id: number;
|
||||
message_idx: number;
|
||||
role: 'user' | 'assistant';
|
||||
content_blocks: ContentBlock[];
|
||||
tokens_in: number | null;
|
||||
tokens_out: number | null;
|
||||
tokens_cache_read: number | null;
|
||||
tokens_cache_create: number | null;
|
||||
model: string | null;
|
||||
ended_at: Date;
|
||||
}
|
||||
|
||||
export interface SubagentToolExecRow {
|
||||
id: number;
|
||||
job_id: number;
|
||||
message_idx: number;
|
||||
tool_use_id: string;
|
||||
tool_name: string;
|
||||
input: unknown;
|
||||
status: 'pending' | 'complete' | 'failed';
|
||||
output: unknown;
|
||||
error: string | null;
|
||||
}
|
||||
|
||||
/** Fetch both row sets for a job in one shot. */
|
||||
export async function loadTranscriptRows(
|
||||
engine: BrainEngine,
|
||||
jobId: number,
|
||||
): Promise<{ messages: SubagentMessageRow[]; tools: SubagentToolExecRow[] }> {
|
||||
const msgRows = await engine.executeRaw<Record<string, unknown>>(
|
||||
`SELECT id, job_id, message_idx, role, content_blocks, tokens_in, tokens_out,
|
||||
tokens_cache_read, tokens_cache_create, model, ended_at
|
||||
FROM subagent_messages
|
||||
WHERE job_id = $1
|
||||
ORDER BY message_idx ASC`,
|
||||
[jobId],
|
||||
);
|
||||
const toolRows = await engine.executeRaw<Record<string, unknown>>(
|
||||
`SELECT id, job_id, message_idx, tool_use_id, tool_name, input, status, output, error
|
||||
FROM subagent_tool_executions
|
||||
WHERE job_id = $1
|
||||
ORDER BY id ASC`,
|
||||
[jobId],
|
||||
);
|
||||
return {
|
||||
messages: msgRows.map(normalizeMessage),
|
||||
tools: toolRows.map(normalizeTool),
|
||||
};
|
||||
}
|
||||
|
||||
function normalizeMessage(row: Record<string, unknown>): SubagentMessageRow {
|
||||
const blocks = row.content_blocks;
|
||||
const parsedBlocks: ContentBlock[] = typeof blocks === 'string'
|
||||
? (JSON.parse(blocks) as ContentBlock[])
|
||||
: (blocks as ContentBlock[]) ?? [];
|
||||
return {
|
||||
id: row.id as number,
|
||||
job_id: row.job_id as number,
|
||||
message_idx: row.message_idx as number,
|
||||
role: row.role as 'user' | 'assistant',
|
||||
content_blocks: parsedBlocks,
|
||||
tokens_in: (row.tokens_in as number) ?? null,
|
||||
tokens_out: (row.tokens_out as number) ?? null,
|
||||
tokens_cache_read: (row.tokens_cache_read as number) ?? null,
|
||||
tokens_cache_create: (row.tokens_cache_create as number) ?? null,
|
||||
model: (row.model as string) ?? null,
|
||||
ended_at: new Date(row.ended_at as string),
|
||||
};
|
||||
}
|
||||
|
||||
function normalizeTool(row: Record<string, unknown>): SubagentToolExecRow {
|
||||
const input = typeof row.input === 'string' ? JSON.parse(row.input) : row.input;
|
||||
const output = row.output == null
|
||||
? null
|
||||
: (typeof row.output === 'string' ? JSON.parse(row.output) : row.output);
|
||||
return {
|
||||
id: row.id as number,
|
||||
job_id: row.job_id as number,
|
||||
message_idx: row.message_idx as number,
|
||||
tool_use_id: row.tool_use_id as string,
|
||||
tool_name: row.tool_name as string,
|
||||
input,
|
||||
status: row.status as 'pending' | 'complete' | 'failed',
|
||||
output,
|
||||
error: (row.error as string) ?? null,
|
||||
};
|
||||
}
|
||||
|
||||
export interface RenderTranscriptOpts {
|
||||
/** Trim long tool outputs in the markdown. Default: 4 KiB per output. */
|
||||
maxOutputBytes?: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* Render messages + tool executions to markdown. Message order is
|
||||
* authoritative; tool rows are spliced under their owning assistant message
|
||||
* by tool_use_id.
|
||||
*/
|
||||
export function renderTranscript(
|
||||
messages: SubagentMessageRow[],
|
||||
tools: SubagentToolExecRow[],
|
||||
opts: RenderTranscriptOpts = {},
|
||||
): string {
|
||||
const maxOut = opts.maxOutputBytes ?? 4096;
|
||||
const toolById = new Map<string, SubagentToolExecRow>(
|
||||
tools.map(t => [t.tool_use_id, t]),
|
||||
);
|
||||
|
||||
const out: string[] = [];
|
||||
out.push('# Subagent transcript', '');
|
||||
if (messages.length === 0) {
|
||||
out.push('_(no messages)_');
|
||||
return out.join('\n');
|
||||
}
|
||||
|
||||
const first = messages[0]!;
|
||||
out.push(`- job_id: ${first.job_id}`);
|
||||
out.push(`- messages: ${messages.length}`);
|
||||
if (first.model) out.push(`- model: ${first.model}`);
|
||||
out.push('');
|
||||
|
||||
for (const msg of messages) {
|
||||
out.push(`## Message ${msg.message_idx} — ${msg.role}`);
|
||||
if (msg.tokens_in != null || msg.tokens_out != null) {
|
||||
const parts: string[] = [];
|
||||
if (msg.tokens_in) parts.push(`in=${msg.tokens_in}`);
|
||||
if (msg.tokens_out) parts.push(`out=${msg.tokens_out}`);
|
||||
if (msg.tokens_cache_read) parts.push(`cache_read=${msg.tokens_cache_read}`);
|
||||
if (msg.tokens_cache_create) parts.push(`cache_create=${msg.tokens_cache_create}`);
|
||||
if (parts.length > 0) out.push(`> tokens: ${parts.join(' ')}`);
|
||||
}
|
||||
out.push('');
|
||||
|
||||
for (const block of msg.content_blocks) {
|
||||
renderBlock(block, toolById, maxOut, out);
|
||||
}
|
||||
out.push('');
|
||||
}
|
||||
|
||||
return out.join('\n').replace(/\n{3,}/g, '\n\n');
|
||||
}
|
||||
|
||||
function renderBlock(
|
||||
block: ContentBlock,
|
||||
toolById: Map<string, SubagentToolExecRow>,
|
||||
maxOutputBytes: number,
|
||||
out: string[],
|
||||
): void {
|
||||
if (block.type === 'text' && typeof block.text === 'string') {
|
||||
out.push(block.text);
|
||||
out.push('');
|
||||
return;
|
||||
}
|
||||
|
||||
if (block.type === 'tool_use') {
|
||||
const name = typeof block.name === 'string' ? block.name : '<unknown>';
|
||||
const inputStr = safeJson(block.input, 2);
|
||||
out.push(`**tool_use** \`${name}\` (id=\`${block.id ?? '?'}\`)`);
|
||||
out.push('```json', inputStr, '```');
|
||||
const toolRow = block.id && typeof block.id === 'string' ? toolById.get(block.id) : undefined;
|
||||
if (toolRow) {
|
||||
out.push(`→ status: **${toolRow.status}**`);
|
||||
if (toolRow.status === 'complete') {
|
||||
out.push('```json', truncate(safeJson(toolRow.output, 2), maxOutputBytes), '```');
|
||||
} else if (toolRow.status === 'failed') {
|
||||
out.push(`> error: ${toolRow.error ?? '(no error text)'}`);
|
||||
} else if (toolRow.status === 'pending') {
|
||||
out.push('> pending (no resolution recorded yet)');
|
||||
}
|
||||
}
|
||||
out.push('');
|
||||
return;
|
||||
}
|
||||
|
||||
if (block.type === 'tool_result') {
|
||||
// Most tool_result blocks live inside user messages echoing back the
|
||||
// assistant's tool_use. We skip them here because the owning tool_use
|
||||
// block already rendered the execution row. If the user message carries
|
||||
// a raw tool_result with no matching tool_use (rare), dump it raw.
|
||||
if (!block.tool_use_id || !toolById.has(block.tool_use_id as string)) {
|
||||
out.push('**tool_result** (no matching tool_use in this transcript)');
|
||||
out.push('```json', truncate(safeJson(block.content, 2), maxOutputBytes), '```');
|
||||
out.push('');
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
// Unknown block type — dump as a fenced JSON block for diagnostics.
|
||||
out.push(`**${block.type}**`);
|
||||
out.push('```json', truncate(safeJson(block, 2), maxOutputBytes), '```');
|
||||
out.push('');
|
||||
}
|
||||
|
||||
function safeJson(value: unknown, indent = 0): string {
|
||||
try {
|
||||
return JSON.stringify(value, null, indent);
|
||||
} catch {
|
||||
return String(value);
|
||||
}
|
||||
}
|
||||
|
||||
function truncate(s: string, maxBytes: number): string {
|
||||
if (Buffer.byteLength(s, 'utf8') <= maxBytes) return s;
|
||||
// Slice bytewise via Buffer. A cut that lands inside a multibyte char decodes to U+FFFD.
|
||||
const buf = Buffer.from(s, 'utf8').slice(0, maxBytes);
|
||||
return buf.toString('utf8') + `\n... [truncated at ${maxBytes} bytes]`;
|
||||
}
|
||||
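Review note on `truncate`: `Buffer.slice(0, maxBytes)` cuts at an arbitrary byte offset, so a cut that lands inside a multibyte sequence decodes to U+FFFD at the tail. If that matters for the rendered markdown, one option is to back up to a character boundary first. The sketch below is a hypothetical variant, not code from this PR; the name `truncateUtf8` and the use of `TextEncoder`/`TextDecoder` instead of `Buffer` are assumptions made to keep it self-contained.

```typescript
// Hypothetical boundary-aware variant of truncate(); not part of the PR.
function truncateUtf8(s: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(s);
  if (bytes.length <= maxBytes) return s;

  let end = Math.max(0, maxBytes);
  // bytes[end] is the first byte that would be dropped. If it is a UTF-8
  // continuation byte (10xxxxxx), the cut is mid-character, so back up
  // until the slice ends on a complete character.
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end--;

  const head = new TextDecoder().decode(bytes.subarray(0, end));
  return `${head}\n... [truncated at ${maxBytes} bytes]`;
}
```

The trade-off is that the kept prefix can be up to three bytes shorter than `maxBytes`, which seems acceptable for a diagnostic transcript.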
@@ -104,9 +104,11 @@ export interface MinionJobInput {
|
||||
backoff_delay?: number;
|
||||
backoff_jitter?: number;
|
||||
/**
|
||||
* Max number of stall windows before dead-letter. Default is the schema
|
||||
* default (5 as of v0.13.1). Clamped to [1, 100] on insert — values
|
||||
* outside that range are silently coerced. See migration v13.
|
||||
* Per-job override for how many stall windows are tolerated before the
|
||||
* queue dead-letters the job. When omitted, the schema column DEFAULT
|
||||
* applies (bumped 1 → 3 in v0.14, now 5 as of v0.13.1's audit). Clamped
|
||||
* to [1, 100] on insert. For long-running handlers (LLM loops etc.) that
|
||||
* should survive a worker kill mid-run, set max_stalled: 3+.
|
||||
*/
|
||||
max_stalled?: number;
|
||||
delay?: number; // ms delay before eligible
|
||||
@@ -208,14 +210,37 @@ export function rowToInboxMessage(row: Record<string, unknown>): InboxMessage {
|
||||
};
|
||||
}
|
||||
|
||||
// --- Child-done inbox message (auto-posted on completeJob) ---
|
||||
// --- Child-done inbox message (auto-posted on every terminal transition) ---
|
||||
|
||||
/**
|
||||
* Posted into the parent's inbox when a child reaches a terminal state.
|
||||
*
|
||||
* Pre-v0.15: only success paths (completeJob) emitted this. Failed/dead/
|
||||
* cancelled children produced no payload, which stranded aggregator-style
|
||||
* parents that needed to wait for N children regardless of outcome.
|
||||
*
|
||||
* v0.15: failJob, cancelJob, and handleTimeouts also emit child_done with
|
||||
* the appropriate `outcome`, so the aggregator handler can count "N children
|
||||
* resolved" without worrying about which rail each one took.
|
||||
*
|
||||
* Backwards compatible: old ChildDoneMessage consumers only read child_id,
|
||||
* job_name, and result (non-null on success). Outcome and error are additive.
|
||||
*/
|
||||
export type ChildOutcome = 'complete' | 'failed' | 'dead' | 'cancelled' | 'timeout';
|
||||
|
||||
/** Posted into the parent's inbox when a child completes successfully. */
|
||||
export interface ChildDoneMessage {
|
||||
type: 'child_done';
|
||||
child_id: number;
|
||||
job_name: string;
|
||||
result: unknown;
|
||||
/**
|
||||
* Terminal outcome. When absent (from a pre-v0.15 writer that didn't set
|
||||
* it), consumers should treat the message as 'complete' — the legacy writer
|
||||
* only emitted on success paths.
|
||||
*/
|
||||
outcome?: ChildOutcome;
|
||||
/** Set when outcome !== 'complete'. Mirrors minion_jobs.error_text. */
|
||||
error?: string | null;
|
||||
}
|
||||
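To make the backwards-compatibility rule concrete: a consumer that counts "N children resolved" can default a missing `outcome` to `'complete'`, as the doc comment above prescribes. The sketch below is illustrative only. `tallyChildren` and its return shape are hypothetical names, and the types are re-declared locally so the block stands alone.

```typescript
type ChildOutcome = 'complete' | 'failed' | 'dead' | 'cancelled' | 'timeout';

interface ChildDoneMessage {
  type: 'child_done';
  child_id: number;
  job_name: string;
  result: unknown;
  outcome?: ChildOutcome;
  error?: string | null;
}

interface ChildTally {
  resolved: number;
  pending: number[];
  byOutcome: Partial<Record<ChildOutcome, number>>;
}

// Hypothetical helper (not part of the PR): tally resolved children by outcome.
function tallyChildren(childrenIds: number[], inbox: ChildDoneMessage[]): ChildTally {
  const outcomes = new Map<number, ChildOutcome>();
  for (const msg of inbox) {
    // A missing outcome means a pre-v0.15 writer, which only emitted on success.
    outcomes.set(msg.child_id, msg.outcome ?? 'complete');
  }

  const byOutcome: Partial<Record<ChildOutcome, number>> = {};
  const pending: number[] = [];
  for (const id of childrenIds) {
    const outcome = outcomes.get(id);
    if (outcome === undefined) {
      pending.push(id); // no child_done seen yet for this child
      continue;
    }
    byOutcome[outcome] = (byOutcome[outcome] ?? 0) + 1;
  }
  return { resolved: childrenIds.length - pending.length, pending, byOutcome };
}
```

An aggregator built this way can proceed once `pending` is empty, regardless of which terminal rail each child took.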
|
||||
// --- Attachments (v7) ---
|
||||
@@ -336,3 +361,118 @@ export function rowToMinionJob(row: Record<string, unknown>): MinionJob {
|
||||
updated_at: new Date(row.updated_at as string),
|
||||
};
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Subagent runtime (v0.15+)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Input payload for the 'subagent' handler. Shape is intentionally narrow —
|
||||
* tool registry and provider config resolve via handler-side defaults + env,
|
||||
* not per-job data, so restart/replay uses the same behavior.
|
||||
*/
|
||||
export interface SubagentHandlerData {
|
||||
/** Top-level user turn kicking off the loop. */
|
||||
prompt: string;
|
||||
/** Optional subagent definition path (skills/subagents/*.md or plugin). */
|
||||
subagent_def?: string;
|
||||
/** Anthropic model id. Defaults to sonnet at handler resolution time. */
|
||||
model?: string;
|
||||
/** Max assistant turns before the loop fails with stop_reason='max_turns'. */
|
||||
max_turns?: number;
|
||||
/**
|
||||
* Whitelist of tool names the agent may call. MUST be a subset of the
|
||||
* derived registry names — invalid entries are rejected at tool-dispatch
|
||||
* time, not silently ignored. Empty array = no tools.
|
||||
*/
|
||||
allowed_tools?: string[];
|
||||
/** System prompt override. When omitted, the handler builds one. */
|
||||
system?: string;
|
||||
/** Template variables for subagent_def. Arbitrary JSON-serializable. */
|
||||
input_vars?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Input for the 'subagent_aggregator' handler. Claims AFTER all children
|
||||
* resolve and aggregates their results into a brain page.
|
||||
*/
|
||||
export interface AggregatorHandlerData {
|
||||
/** The subagent child job ids this aggregator is waiting on. */
|
||||
children_ids: number[];
|
||||
/**
|
||||
* Optional template for the synthesis prompt. When omitted, the handler
|
||||
* uses a generic "summarize these N results" prompt.
|
||||
*/
|
||||
aggregate_prompt_template?: string;
|
||||
/**
|
||||
* Target slug for the aggregated brain page. When present, a trusted-CLI
|
||||
* put_page (viaSubagent=false) writes the final aggregation there.
|
||||
*/
|
||||
output_slug?: string;
|
||||
}
|
||||
|
||||
/** Tool execution context passed to every ToolDef.execute. */
|
||||
export interface ToolCtx {
|
||||
/** Engine for DB-backed tools (brain_query, put_page, etc.). */
|
||||
engine: import('../engine.ts').BrainEngine;
|
||||
/** The subagent job id (used for audit + put_page namespace enforcement). */
|
||||
jobId: number;
|
||||
/** Always true for LLM-invoked tools — matches MCP trust boundary. */
|
||||
remote: true;
|
||||
/** Fired on cooperative abort (timeout, lock loss, cancel, SIGTERM). */
|
||||
signal?: AbortSignal;
|
||||
}
|
||||
|
||||
/**
|
||||
* A tool the subagent can call. Names match Anthropic's constraint
|
||||
* `^[a-zA-Z0-9_-]{1,64}$` — no dots. The input_schema is the JSONSchema
|
||||
* shipped to the Anthropic Messages API verbatim; ToolDef is the single
|
||||
* Anthropic-compatible envelope, not an MCP McpToolDef (those have a
|
||||
* different shape — ".inputSchema" vs ".input_schema").
|
||||
*
|
||||
* `idempotent: true` is required for the two-phase replay path: on resume,
|
||||
* a 'pending' row can be re-executed. Non-idempotent tools need a separate
|
||||
* resume policy and are not supported in v0.15.
|
||||
*/
|
||||
export interface ToolDef {
|
||||
name: string;
|
||||
description: string;
|
||||
input_schema: Record<string, unknown>;
|
||||
idempotent: boolean;
|
||||
execute(input: unknown, ctx: ToolCtx): Promise<unknown>;
|
||||
}
|
||||
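For reviewers who want to see the `ToolDef` contract and the `allowed_tools` whitelist together, here is a minimal sketch. It is not the registry from this PR: `selectTools`, `echoTool`, and the error messages are hypothetical, `ToolCtx` is reduced to two fields so the block compiles on its own, and validation happens eagerly here, whereas the doc comment on `allowed_tools` says the real handler rejects invalid entries at tool-dispatch time.

```typescript
// Reduced stand-in for the PR's ToolCtx (engine and signal omitted).
interface ToolCtx {
  jobId: number;
  remote: true;
}

interface ToolDef {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  idempotent: boolean;
  execute(input: unknown, ctx: ToolCtx): Promise<unknown>;
}

// Name constraint quoted in the ToolDef doc comment: no dots allowed.
const TOOL_NAME_RE = /^[a-zA-Z0-9_-]{1,64}$/;

// Hypothetical filter: returns the tools an agent may call.
function selectTools(registry: ToolDef[], allowed?: string[]): ToolDef[] {
  const byName = new Map<string, ToolDef>();
  for (const tool of registry) {
    if (!TOOL_NAME_RE.test(tool.name)) throw new Error(`invalid tool name: ${tool.name}`);
    byName.set(tool.name, tool);
  }
  if (allowed === undefined) return registry; // no whitelist given
  // An empty array falls through to an empty result: "no tools".
  return allowed.map((name) => {
    const tool = byName.get(name);
    if (!tool) throw new Error(`unknown tool in allowed_tools: ${name}`);
    return tool;
  });
}

// Illustrative tool; idempotent because it has no side effects.
const echoTool: ToolDef = {
  name: 'echo_input',
  description: 'Returns its input unchanged.',
  input_schema: { type: 'object', properties: {}, required: [] },
  idempotent: true,
  execute: async (input) => input,
};
```

Note the field is `input_schema`, not `inputSchema`, which is the difference from `McpToolDef` that the comment above calls out.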
|
||||
/**
|
||||
* Anthropic content-block subset we persist in subagent_messages.content_blocks.
|
||||
* This is structural — we don't gatekeep on unknown block types (future SDK
|
||||
* additions pass through). Use the string-literal discriminant on 'type'.
|
||||
*/
|
||||
export type ContentBlock =
|
||||
| { type: 'text'; text: string; [k: string]: unknown }
|
||||
| { type: 'tool_use'; id: string; name: string; input: unknown; [k: string]: unknown }
|
||||
| { type: 'tool_result'; tool_use_id: string; content: unknown; is_error?: boolean; [k: string]: unknown }
|
||||
| { type: string; [k: string]: unknown };
|
||||
|
||||
/** Stop reason reported to the caller when the subagent loop terminates. */
|
||||
export type SubagentStopReason =
|
||||
| 'end_turn' // Anthropic says end_turn and last message has no tool_use
|
||||
| 'max_turns' // hit max_turns budget before end_turn
|
||||
| 'refusal' // detected via stop_reason + content shape
|
||||
| 'error'; // unrecoverable (empty response retry exhausted, etc.)
|
||||
|
||||
/** Terminal result payload emitted by the subagent handler. */
|
||||
export interface SubagentResult {
|
||||
/** Concatenated text from the final assistant message. */
|
||||
result: string;
|
||||
/** Number of assistant turns consumed. */
|
||||
turns_count: number;
|
||||
/** Why the loop stopped. */
|
||||
stop_reason: SubagentStopReason;
|
||||
/** Rollup of tokens across all turns. */
|
||||
tokens: {
|
||||
in: number;
|
||||
out: number;
|
||||
cache_read: number;
|
||||
cache_create: number;
|
||||
};
|
||||
}
|
||||
|
||||
94
src/core/minions/wait-for-completion.ts
Normal file
@@ -0,0 +1,94 @@
|
||||
/**
|
||||
* Poll-until-terminal helper for CLI callers. Minions doesn't ship a
|
||||
* notification stream for arbitrary callers (the NOTIFY trigger is worker-
|
||||
* side), so `gbrain agent run --follow` on the CLI side polls getJob() until
|
||||
* the job reaches a terminal state.
|
||||
*
|
||||
* On timeout, the job is NOT cancelled — the user can `gbrain jobs get <id>`
|
||||
* later to check. Explicit cancellation is the user's call via `gbrain jobs
|
||||
* cancel <id>`.
|
||||
*/
|
||||
|
||||
import type { MinionQueue } from './queue.ts';
|
||||
import type { MinionJob, MinionJobStatus } from './types.ts';
|
||||
|
||||
export class TimeoutError extends Error {
|
||||
constructor(public readonly jobId: number, public readonly elapsedMs: number) {
|
||||
super(`timeout after ${elapsedMs}ms waiting for job ${jobId}`);
|
||||
this.name = 'TimeoutError';
|
||||
}
|
||||
}
|
||||
|
||||
const TERMINAL_STATES: readonly MinionJobStatus[] = ['completed', 'failed', 'dead', 'cancelled'] as const;
|
||||
const TERMINAL_SET = new Set<MinionJobStatus>(TERMINAL_STATES);
|
||||
|
||||
export interface WaitOpts {
|
||||
/** Abort after this many ms. Default: 24h (long enough for most durable runs). */
|
||||
timeoutMs?: number;
|
||||
/**
|
||||
* Poll interval. Defaults:
|
||||
* - 1000ms on Postgres (lighter load, concurrent followers scale)
|
||||
* - 250ms when the caller knows it's on PGLite inline (single process,
|
||||
* no network RTT)
|
||||
* Callers pass the appropriate value explicitly — this module doesn't
|
||||
* introspect the engine.
|
||||
*/
|
||||
pollMs?: number;
|
||||
/** Optional AbortSignal — on abort, the poll loop exits early (no TimeoutError). */
|
||||
signal?: AbortSignal;
|
||||
}
|
||||
|
||||
export async function waitForCompletion(
|
||||
queue: MinionQueue,
|
||||
jobId: number,
|
||||
opts: WaitOpts = {},
|
||||
): Promise<MinionJob> {
|
||||
const timeoutMs = opts.timeoutMs ?? 24 * 60 * 60 * 1000;
|
||||
const pollMs = opts.pollMs ?? 1000;
|
||||
const started = Date.now();
|
||||
|
||||
// Fast-path first read (don't wait pollMs just to learn it's already done).
|
||||
let job = await queue.getJob(jobId);
|
||||
if (!job) throw new Error(`job ${jobId} not found`);
|
||||
if (TERMINAL_SET.has(job.status)) return job;
|
||||
|
||||
while (true) {
|
||||
if (opts.signal?.aborted) {
|
||||
// Caller aborted. Return the last-seen snapshot rather than throwing —
|
||||
// the job itself is still alive queue-side, and the caller knows they
|
||||
// aborted.
|
||||
return job;
|
||||
}
|
||||
const elapsed = Date.now() - started;
|
||||
if (elapsed >= timeoutMs) {
|
||||
throw new TimeoutError(jobId, elapsed);
|
||||
}
|
||||
const remaining = timeoutMs - elapsed;
|
||||
const sleep = Math.min(pollMs, remaining);
|
||||
await delay(sleep, opts.signal);
|
||||
|
||||
job = await queue.getJob(jobId);
|
||||
if (!job) throw new Error(`job ${jobId} disappeared mid-wait`);
|
||||
if (TERMINAL_SET.has(job.status)) return job;
|
||||
}
|
||||
}
|
||||
|
||||
function delay(ms: number, signal?: AbortSignal): Promise<void> {
|
||||
if (ms <= 0) return Promise.resolve();
|
||||
return new Promise((resolve) => {
|
||||
const t = setTimeout(() => {
|
||||
signal?.removeEventListener('abort', onAbort);
|
||||
resolve();
|
||||
}, ms);
|
||||
const onAbort = () => {
|
||||
clearTimeout(t);
|
||||
resolve();
|
||||
};
|
||||
signal?.addEventListener('abort', onAbort, { once: true });
|
||||
});
|
||||
}
|
||||
|
||||
// Exported for unit tests.
|
||||
export const __testing = {
|
||||
TERMINAL_STATES,
|
||||
};
|
||||
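The timeout arithmetic in the poll loop is easy to misread, so here it is isolated as a pure function. This is a sketch for illustration: the PR inlines the logic inside `waitForCompletion`, and `nextSleepMs` is a hypothetical name. `TimeoutError` is re-declared so the block stands alone.

```typescript
class TimeoutError extends Error {
  readonly jobId: number;
  readonly elapsedMs: number;
  constructor(jobId: number, elapsedMs: number) {
    super(`timeout after ${elapsedMs}ms waiting for job ${jobId}`);
    this.name = 'TimeoutError';
    this.jobId = jobId;
    this.elapsedMs = elapsedMs;
  }
}

// Hypothetical helper: how long to sleep before the next getJob() poll.
function nextSleepMs(jobId: number, elapsedMs: number, timeoutMs: number, pollMs: number): number {
  // Once the budget is spent, stop polling. The job itself is not cancelled.
  if (elapsedMs >= timeoutMs) throw new TimeoutError(jobId, elapsedMs);
  // Clamp the final sleep so the wait never overshoots the timeout by a
  // full poll interval.
  return Math.min(pollMs, timeoutMs - elapsedMs);
}
```

With a 5000 ms timeout and a 1000 ms poll interval, a caller that has already waited 4700 ms sleeps only the remaining 300 ms before the last read.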
@@ -167,6 +167,19 @@ export interface OperationContext {
|
||||
* When unset, operations MUST default to the stricter (remote=true) behavior.
|
||||
*/
|
||||
remote?: boolean;
|
||||
/**
|
||||
* Subagent runtime context (v0.16+). Set by the subagent tool dispatcher when
|
||||
* dispatching an op as a tool call from an LLM loop. Used to enforce per-op
|
||||
* agent policy (e.g. put_page namespace rule).
|
||||
*
|
||||
* `viaSubagent` is the FAIL-CLOSED flag: when true, agent-facing policy MUST
|
||||
* be enforced even if `subagentId` happens to be undefined (a bug in the
|
||||
* dispatcher must not bypass the guard). `subagentId` is the owning subagent
|
||||
* job id; `jobId` is the current Minion job id (aggregator or subagent).
|
||||
*/
|
||||
jobId?: number;
|
||||
subagentId?: number;
|
||||
viaSubagent?: boolean;
|
||||
/**
|
||||
* Resolved global CLI options (--quiet / --progress-json / --progress-interval).
|
||||
* CLI callers populate this from `getCliOptions()`. MCP / library callers
|
||||
@@ -235,8 +248,28 @@ const put_page: Operation = {
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'put_page', slug: p.slug };
|
||||
const slug = p.slug as string;
|
||||
|
||||
// Subagent namespace enforcement (v0.15+). Runs BEFORE the dry-run
|
||||
// short-circuit so preview calls surface the same rejection. Confines
|
||||
// LLM-driven writes to wiki/agents/<subagentId>/... — no leading slash
|
||||
// (slug grammar rejects that), anchored, slash-boundary to defeat prefix
|
||||
// collisions like `wiki/agents/12evil/*` impersonating subagent 12.
|
||||
//
|
||||
// FAIL-CLOSED: `viaSubagent=true` enforces the check even if the
|
||||
// dispatcher forgot to populate `subagentId`. Agent-originated writes
|
||||
// without an owning subagent id are rejected outright.
|
||||
if (ctx.viaSubagent === true) {
|
||||
if (typeof ctx.subagentId !== 'number' || Number.isNaN(ctx.subagentId)) {
|
||||
throw new OperationError('permission_denied', 'put_page via subagent requires ctx.subagentId');
|
||||
}
|
||||
const prefix = `wiki/agents/${ctx.subagentId}/`;
|
||||
if (!slug.startsWith(prefix) || slug.length === prefix.length) {
|
||||
throw new OperationError('permission_denied', `put_page via subagent must write under '${prefix}...'`);
|
||||
}
|
||||
}
|
||||
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'put_page', slug: p.slug };
|
||||
// Skip embedding when no OpenAI key is configured. importFromContent's existing
|
||||
// try/catch around embed only catches; without a key the OpenAI client would
|
||||
// attempt 5 retries with exponential backoff (up to ~2 minutes total) before
|
||||
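The namespace rule in the `put_page` hunk above can be exercised in isolation. The sketch below restates the guard as a boolean predicate so the accepted and rejected cases from the commit message are easy to check by hand. `subagentSlugAllowed` is a hypothetical name; the real handler throws `OperationError('permission_denied', ...)` rather than returning `false`.

```typescript
// Hypothetical standalone version of the put_page subagent guard.
function subagentSlugAllowed(slug: string, subagentId: number | undefined): boolean {
  // Fail closed: a missing or NaN id rejects every slug.
  if (typeof subagentId !== 'number' || Number.isNaN(subagentId)) return false;
  // The trailing slash anchors the prefix at a path boundary, so
  // 'wiki/agents/12evil/...' cannot pass as subagent 12.
  const prefix = `wiki/agents/${subagentId}/`;
  // The bare prefix (nothing after the slash) is rejected as well.
  return slug.startsWith(prefix) && slug.length > prefix.length;
}
```

Because the check uses `startsWith` on the full prefix including the slash, a leading-slash slug such as `/wiki/agents/12/x` is also rejected.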
@@ -675,7 +708,7 @@ const get_backlinks: Operation = {
|
||||
* grows a `visited` array per path; in `direction=both` the join is `OR`-based and
|
||||
* fans out exponentially. Without a cap, a remote MCP caller can pass depth=1e6
|
||||
* and burn memory/CPU on the database. 10 hops is well beyond any realistic
|
||||
* relationship query (Wintermute's "people who attended meetings with Alice"
|
||||
* relationship query (your OpenClaw's "people who attended meetings with Alice"
|
||||
* is 2 hops; the deepest meaningful chain in our test data is 4).
|
||||
*/
|
||||
const TRAVERSE_DEPTH_CAP = 10;
|
||||
|
||||
@@ -5,7 +5,7 @@
|
||||
* WHERE and HOW. Every user-visible URL, every citation, every wikilink is
|
||||
* assembled from resolver outputs or structured IDs — never from LLM text.
|
||||
*
|
||||
* Example (from the Wintermute memory log, 2026-04-13): an agent was asked
|
||||
* Example (from Garry's OpenClaw memory log, 2026-04-13): an agent was asked
|
||||
* to rewrite daily files and it invented a "Philip Leung" entity that didn't
|
||||
* exist. With the Scaffolder, the LLM writes "the attendee was mentioned
|
||||
* again" and code writes the actual `[Philip Leung](people/philip-leung.md)`
|
||||
@@ -65,7 +65,7 @@ export interface EmailCitationInput {
|
||||
* Canonical email citation with a deep link that opens the actual thread:
|
||||
* [Source: email "Subject line", 2026-04-18](https://mail.google.com/mail/u/?authuser=...#inbox/...)
|
||||
*
|
||||
* URL shape matches the pattern Wintermute's ingest pipeline builds from API
|
||||
* URL shape matches the pattern Garry's OpenClaw's ingest pipeline builds from API
|
||||
* responses, so brain-page links and agent-generated links use the same
|
||||
* format (cross-tool consistency).
|
||||
*/
|
||||
|
||||
@@ -264,6 +264,52 @@ CREATE INDEX IF NOT EXISTS idx_minion_attachments_job ON minion_attachments (job
|
||||
-- NOTE: SET STORAGE EXTERNAL is omitted on PGLite; it's a Postgres TOAST optimization
|
||||
-- and PGLite may not support it. Postgres path applies it via migration v7.
|
||||
|
||||
-- ============================================================
|
||||
-- Subagent runtime (v0.16.0) — durable LLM loops
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS subagent_messages (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
message_idx INTEGER NOT NULL,
|
||||
role TEXT NOT NULL,
|
||||
content_blocks JSONB NOT NULL,
|
||||
tokens_in INTEGER,
|
||||
tokens_out INTEGER,
|
||||
tokens_cache_read INTEGER,
|
||||
tokens_cache_create INTEGER,
|
||||
model TEXT,
|
||||
ended_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
CONSTRAINT uniq_subagent_messages_idx UNIQUE (job_id, message_idx),
|
||||
CONSTRAINT chk_subagent_messages_role CHECK (role IN ('user','assistant'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_subagent_messages_job ON subagent_messages (job_id, message_idx);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS subagent_tool_executions (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
message_idx INTEGER NOT NULL,
|
||||
tool_use_id TEXT NOT NULL,
|
||||
tool_name TEXT NOT NULL,
|
||||
input JSONB NOT NULL,
|
||||
status TEXT NOT NULL,
|
||||
output JSONB,
|
||||
error TEXT,
|
||||
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
ended_at TIMESTAMPTZ,
|
||||
CONSTRAINT uniq_subagent_tools_use_id UNIQUE (job_id, tool_use_id),
|
||||
CONSTRAINT chk_subagent_tools_status CHECK (status IN ('pending','complete','failed'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_subagent_tools_job ON subagent_tool_executions (job_id, status);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS subagent_rate_leases (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
key TEXT NOT NULL,
|
||||
owner_job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
acquired_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
expires_at TIMESTAMPTZ NOT NULL
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_rate_leases_key_expires ON subagent_rate_leases (key, expires_at);
|
||||
|
||||
-- ============================================================
|
||||
-- Trigger-based search_vector (spans pages + timeline_entries)
|
||||
-- ============================================================
|
||||
|
||||
@@ -356,6 +356,62 @@ CREATE TABLE IF NOT EXISTS minion_attachments (
|
||||
CREATE INDEX IF NOT EXISTS idx_minion_attachments_job ON minion_attachments (job_id);
|
||||
ALTER TABLE minion_attachments ALTER COLUMN content SET STORAGE EXTERNAL;
|
||||
|
||||
-- ============================================================
|
||||
-- Subagent runtime (v0.16.0) — durable LLM loops
|
||||
-- ============================================================
|
||||
-- Anthropic-native message blocks, one row per Messages API message. Parallel
|
||||
-- tool_use blocks in one assistant message live in content_blocks JSONB,
|
||||
-- not across rows.
|
||||
CREATE TABLE IF NOT EXISTS subagent_messages (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
message_idx INTEGER NOT NULL,
|
||||
role TEXT NOT NULL,
|
||||
content_blocks JSONB NOT NULL,
|
||||
tokens_in INTEGER,
|
||||
tokens_out INTEGER,
|
||||
tokens_cache_read INTEGER,
|
||||
tokens_cache_create INTEGER,
|
||||
model TEXT,
|
||||
ended_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
CONSTRAINT uniq_subagent_messages_idx UNIQUE (job_id, message_idx),
|
||||
CONSTRAINT chk_subagent_messages_role CHECK (role IN ('user','assistant'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_subagent_messages_job ON subagent_messages (job_id, message_idx);
|
||||
|
||||
-- Two-phase tool execution ledger. Before tool call: INSERT status='pending'.
|
||||
-- After success: UPDATE to 'complete' + output. On failure: 'failed' + error.
|
||||
-- Replay re-runs 'pending' rows only if the tool is idempotent.
|
||||
CREATE TABLE IF NOT EXISTS subagent_tool_executions (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
message_idx INTEGER NOT NULL,
|
||||
tool_use_id TEXT NOT NULL,
|
||||
tool_name TEXT NOT NULL,
|
||||
input JSONB NOT NULL,
|
||||
status TEXT NOT NULL,
|
||||
output JSONB,
|
||||
error TEXT,
|
||||
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
ended_at TIMESTAMPTZ,
|
||||
CONSTRAINT uniq_subagent_tools_use_id UNIQUE (job_id, tool_use_id),
|
||||
CONSTRAINT chk_subagent_tools_status CHECK (status IN ('pending','complete','failed'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_subagent_tools_job ON subagent_tool_executions (job_id, status);
|
||||
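The two-phase protocol described in the comment above `subagent_tool_executions` can be modelled in memory to show which rows a resumed run would re-execute. This is a sketch under stated assumptions, not the handler's code: `ToolLedger` and its method names are hypothetical, the map stands in for one job's rows (mirroring `UNIQUE (job_id, tool_use_id)`), and the idempotency lookup is passed in as a callback.

```typescript
type ToolStatus = 'pending' | 'complete' | 'failed';

interface LedgerRow {
  tool_use_id: string;
  tool_name: string;
  input: unknown;
  status: ToolStatus;
  output: unknown;
  error: string | null;
}

// In-memory stand-in for one job's rows in subagent_tool_executions.
class ToolLedger {
  private rows = new Map<string, LedgerRow>();

  // Phase 1: record the call as 'pending' before executing it.
  begin(toolUseId: string, toolName: string, input: unknown): LedgerRow {
    const existing = this.rows.get(toolUseId);
    if (existing) return existing; // replay: the row was already recorded
    const row: LedgerRow = {
      tool_use_id: toolUseId, tool_name: toolName, input,
      status: 'pending', output: null, error: null,
    };
    this.rows.set(toolUseId, row);
    return row;
  }

  // Phase 2a: the tool returned.
  complete(toolUseId: string, output: unknown): void {
    const row = this.mustGet(toolUseId);
    row.status = 'complete';
    row.output = output;
  }

  // Phase 2b: the tool threw.
  fail(toolUseId: string, error: string): void {
    const row = this.mustGet(toolUseId);
    row.status = 'failed';
    row.error = error;
  }

  // On resume, only 'pending' rows of idempotent tools may be re-run.
  replayable(isIdempotent: (toolName: string) => boolean): LedgerRow[] {
    return Array.from(this.rows.values())
      .filter((r) => r.status === 'pending' && isIdempotent(r.tool_name));
  }

  private mustGet(toolUseId: string): LedgerRow {
    const row = this.rows.get(toolUseId);
    if (!row) throw new Error(`no ledger row for ${toolUseId}`);
    return row;
  }
}
```

A worker killed between `begin` and `complete` leaves a `pending` row behind, which is the case the replay rule exists for.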
|
||||
-- Rate-lease table — concurrency cap on outbound providers (e.g.
|
||||
-- anthropic:messages). Acquire: INSERT if active < max_concurrent under
|
||||
-- advisory lock. Release: DELETE. Stale leases (expires_at past) auto-prune
|
||||
-- on next acquire so crashed workers can't strand capacity.
|
||||
CREATE TABLE IF NOT EXISTS subagent_rate_leases (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
key TEXT NOT NULL,
|
||||
owner_job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
acquired_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
expires_at TIMESTAMPTZ NOT NULL
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_rate_leases_key_expires ON subagent_rate_leases (key, expires_at);
|
||||
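The acquire and release cycle described in the rate-lease comment can be sketched without a database. Everything below is an assumption made for illustration: `LeaseTable`, the `maxConcurrent` and `ttlMs` parameters, and the explicit `now` argument are hypothetical, and the real implementation runs as SQL under an advisory lock rather than in process memory. This sketch also prunes stale leases for every key on acquire, which may differ from the PR.

```typescript
interface Lease {
  id: number;
  key: string;
  ownerJobId: number;
  expiresAt: number; // epoch ms, stands in for expires_at
}

// In-memory stand-in for the subagent_rate_leases table.
class LeaseTable {
  private leases: Lease[] = [];
  private nextId = 1;

  acquire(key: string, ownerJobId: number, maxConcurrent: number, ttlMs: number, now: number): Lease | null {
    // Prune stale leases first so crashed owners cannot strand capacity.
    this.leases = this.leases.filter((l) => l.expiresAt > now);
    const active = this.leases.filter((l) => l.key === key).length;
    if (active >= maxConcurrent) return null; // at the cap, caller retries later
    const lease: Lease = { id: this.nextId++, key, ownerJobId, expiresAt: now + ttlMs };
    this.leases.push(lease);
    return lease;
  }

  // Release corresponds to DELETE on the lease row.
  release(id: number): void {
    this.leases = this.leases.filter((l) => l.id !== id);
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would rely on the database clock.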
|
||||
-- NOTIFY trigger for real-time job events (Postgres only, not PGLite)
|
||||
CREATE OR REPLACE FUNCTION notify_minion_job_change() RETURNS trigger AS \$\$
|
||||
BEGIN
|
||||
|
||||
@@ -6,6 +6,7 @@ import { operations, OperationError } from '../core/operations.ts';
|
||||
import type { Operation, OperationContext } from '../core/operations.ts';
|
||||
import { loadConfig } from '../core/config.ts';
|
||||
import { VERSION } from '../version.ts';
|
||||
import { buildToolDefs } from './tool-defs.ts';
|
||||
|
||||
/** Validate required params exist and have the expected type */
|
||||
function validateParams(op: Operation, params: Record<string, unknown>): string | null {
|
||||
@@ -32,26 +33,11 @@ export async function startMcpServer(engine: BrainEngine) {
|
||||
{ capabilities: { tools: {} } },
|
||||
);
|
||||
|
||||
// Generate tool definitions from operations
|
||||
// Generate tool definitions from operations. Extracted to buildToolDefs so
|
||||
// the subagent tool registry (v0.15+) can call the same mapper against a
|
||||
// filtered OPERATIONS subset instead of duplicating this shape.
|
||||
server.setRequestHandler(ListToolsRequestSchema, async () => ({
|
||||
tools: operations.map(op => ({
|
||||
name: op.name,
|
||||
description: op.description,
|
||||
inputSchema: {
|
||||
type: 'object' as const,
|
||||
properties: Object.fromEntries(
|
||||
Object.entries(op.params).map(([k, v]) => [k, {
|
||||
type: v.type === 'array' ? 'array' : v.type,
|
||||
...(v.description ? { description: v.description } : {}),
|
||||
...(v.enum ? { enum: v.enum } : {}),
|
||||
...(v.items ? { items: { type: v.items.type } } : {}),
|
||||
}]),
|
||||
),
|
||||
required: Object.entries(op.params)
|
||||
.filter(([, v]) => v.required)
|
||||
.map(([k]) => k),
|
||||
},
|
||||
})),
|
||||
tools: buildToolDefs(operations),
|
||||
}));
|
||||
|
||||
// Dispatch tool calls to operation handlers
|
||||
|
||||
32
src/mcp/tool-defs.ts
Normal file
@@ -0,0 +1,32 @@
|
||||
import type { Operation } from '../core/operations.ts';
|
||||
|
||||
export interface McpToolDef {
|
||||
name: string;
|
||||
description: string;
|
||||
inputSchema: {
|
||||
type: 'object';
|
||||
properties: Record<string, unknown>;
|
||||
required: string[];
|
||||
};
|
||||
}
|
||||
|
||||
export function buildToolDefs(ops: Operation[]): McpToolDef[] {
|
||||
return ops.map(op => ({
|
||||
name: op.name,
|
||||
description: op.description,
|
||||
inputSchema: {
|
||||
type: 'object' as const,
|
||||
properties: Object.fromEntries(
|
||||
Object.entries(op.params).map(([k, v]) => [k, {
|
||||
type: v.type === 'array' ? 'array' : v.type,
|
||||
...(v.description ? { description: v.description } : {}),
|
||||
...(v.enum ? { enum: v.enum } : {}),
|
||||
...(v.items ? { items: { type: v.items.type } } : {}),
|
||||
}]),
|
||||
),
|
||||
required: Object.entries(op.params)
|
||||
.filter(([, v]) => v.required)
|
||||
.map(([k]) => k),
|
||||
},
|
||||
}));
|
||||
}
|
||||
@@ -352,6 +352,62 @@ CREATE TABLE IF NOT EXISTS minion_attachments (
|
||||
CREATE INDEX IF NOT EXISTS idx_minion_attachments_job ON minion_attachments (job_id);
|
||||
ALTER TABLE minion_attachments ALTER COLUMN content SET STORAGE EXTERNAL;
|
||||
|
||||
-- ============================================================
|
||||
-- Subagent runtime (v0.16.0) — durable LLM loops
|
||||
-- ============================================================
|
||||
-- Anthropic-native message blocks, one row per Messages API message. Parallel
|
||||
-- tool_use blocks in one assistant message live in content_blocks JSONB,
|
||||
-- not across rows.
|
||||
CREATE TABLE IF NOT EXISTS subagent_messages (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
message_idx INTEGER NOT NULL,
|
||||
role TEXT NOT NULL,
|
||||
content_blocks JSONB NOT NULL,
|
||||
tokens_in INTEGER,
|
||||
tokens_out INTEGER,
|
||||
tokens_cache_read INTEGER,
|
||||
tokens_cache_create INTEGER,
|
||||
model TEXT,
|
||||
ended_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
CONSTRAINT uniq_subagent_messages_idx UNIQUE (job_id, message_idx),
|
||||
CONSTRAINT chk_subagent_messages_role CHECK (role IN ('user','assistant'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_subagent_messages_job ON subagent_messages (job_id, message_idx);
|
||||
|
||||
-- Two-phase tool execution ledger. Before tool call: INSERT status='pending'.
|
||||
-- After success: UPDATE to 'complete' + output. On failure: 'failed' + error.
|
||||
-- Replay re-runs 'pending' rows only if the tool is idempotent.
|
||||
CREATE TABLE IF NOT EXISTS subagent_tool_executions (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
|
||||
message_idx INTEGER NOT NULL,
|
||||
tool_use_id TEXT NOT NULL,
|
||||
tool_name TEXT NOT NULL,
|
||||
input JSONB NOT NULL,
|
||||
status TEXT NOT NULL,
|
||||
output JSONB,
|
||||
error TEXT,
|
||||
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
ended_at TIMESTAMPTZ,
|
||||
CONSTRAINT uniq_subagent_tools_use_id UNIQUE (job_id, tool_use_id),
|
||||
CONSTRAINT chk_subagent_tools_status CHECK (status IN ('pending','complete','failed'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_subagent_tools_job ON subagent_tool_executions (job_id, status);
|
||||
|
||||
-- Rate-lease table: concurrency cap on outbound providers (e.g.
-- anthropic:messages). Acquire: INSERT if active < max_concurrent, under an
-- advisory lock. Release: DELETE. Stale leases (expires_at in the past) are
-- pruned on the next acquire so crashed workers can't strand capacity.
CREATE TABLE IF NOT EXISTS subagent_rate_leases (
  id BIGSERIAL PRIMARY KEY,
  key TEXT NOT NULL,
  owner_job_id BIGINT NOT NULL REFERENCES minion_jobs(id) ON DELETE CASCADE,
  acquired_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  expires_at TIMESTAMPTZ NOT NULL
);
|
||||
CREATE INDEX IF NOT EXISTS idx_rate_leases_key_expires ON subagent_rate_leases (key, expires_at);
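The acquire/release protocol in the comment above (prune stale leases, count active ones, insert only if under the cap) can be sketched against an in-memory array. This is an illustration under assumptions: `LeaseTableSketch` and its method names are invented, and the advisory lock is omitted because a single-threaded array needs none.

```typescript
// Hypothetical in-memory stand-in for subagent_rate_leases.
interface LeaseSketch {
  id: number;
  key: string;
  ownerJobId: number;
  expiresAt: number; // epoch milliseconds
}

class LeaseTableSketch {
  private leases: LeaseSketch[] = [];
  private nextId = 1;

  acquire(
    key: string,
    ownerJobId: number,
    maxConcurrent: number,
    ttlMs: number,
    now: number = Date.now(),
  ): LeaseSketch | null {
    // Prune stale leases for this key first, so crashed owners free their slot.
    this.leases = this.leases.filter(l => l.key !== key || l.expiresAt > now);
    const active = this.leases.filter(l => l.key === key).length;
    if (active >= maxConcurrent) return null; // cap reached, caller must retry later
    const lease: LeaseSketch = { id: this.nextId++, key, ownerJobId, expiresAt: now + ttlMs };
    this.leases.push(lease);
    return lease;
  }

  // Release corresponds to DELETE by lease id.
  release(id: number): void {
    this.leases = this.leases.filter(l => l.id !== id);
  }
}
```

Passing `now` explicitly keeps expiry deterministic in tests; a database-backed version would presumably compare `expires_at` against the server clock instead.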
|
||||
|
||||
-- NOTIFY trigger for real-time job events (Postgres only, not PGLite)
|
||||
CREATE OR REPLACE FUNCTION notify_minion_job_change() RETURNS trigger AS $$
|
||||
BEGIN
|
||||
|
||||
228
test/agent-cli.test.ts
Normal file
@@ -0,0 +1,228 @@
|
||||
/**
|
||||
* `gbrain agent` CLI tests. Covers arg parsing, --since parser, and the
|
||||
* submit path end-to-end against PGLite so we verify trusted submission,
|
||||
* protected-name guard, and fan-out wiring.
|
||||
*
|
||||
* The full handler-run loop is NOT exercised here (tested in subagent-
|
||||
* handler.test.ts). This file checks the CLI's submission + orchestration
|
||||
* glue.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
|
||||
import * as fs from 'node:fs';
|
||||
import * as path from 'node:path';
|
||||
import * as os from 'node:os';
|
||||
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
|
||||
import { MinionQueue } from '../src/core/minions/queue.ts';
|
||||
import { __testing as agentTesting } from '../src/commands/agent.ts';
|
||||
import { parseSince } from '../src/commands/agent-logs.ts';
|
||||
import { isProtectedJobName, PROTECTED_JOB_NAMES } from '../src/core/minions/protected-names.ts';
|
||||
|
||||
let engine: PGLiteEngine;
|
||||
let queue: MinionQueue;
|
||||
|
||||
beforeAll(async () => {
|
||||
engine = new PGLiteEngine();
|
||||
await engine.connect({ databaseUrl: '' });
|
||||
await engine.initSchema();
|
||||
queue = new MinionQueue(engine);
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
await engine.disconnect();
|
||||
});
|
||||
|
||||
beforeEach(async () => {
|
||||
await engine.executeRaw('DELETE FROM minion_jobs');
|
||||
});
|
||||
|
||||
describe('parseRunFlags', () => {
|
||||
test('follow defaults off when stdout is non-TTY (test env)', () => {
|
||||
const { flags, rest } = agentTesting.parseRunFlags(['hello', 'world']);
|
||||
expect(flags.follow).toBe(process.stdout.isTTY === true);
|
||||
expect(rest).toEqual(['hello', 'world']);
|
||||
});
|
||||
|
||||
test('flags before prompt are parsed, unknown token ends flag parsing', () => {
|
||||
const { flags, rest } = agentTesting.parseRunFlags([
|
||||
'--model', 'claude-opus-4-7', '--max-turns', '30', 'summarize', 'everything',
|
||||
]);
|
||||
expect(flags.model).toBe('claude-opus-4-7');
|
||||
expect(flags.maxTurns).toBe(30);
|
||||
expect(rest).toEqual(['summarize', 'everything']);
|
||||
});
|
||||
|
||||
test('--tools comma-split', () => {
|
||||
const { flags } = agentTesting.parseRunFlags(['--tools', 'brain_search, brain_get_page', 'prompt']);
|
||||
expect(flags.tools).toEqual(['brain_search', 'brain_get_page']);
|
||||
});
|
||||
|
||||
test('--detach implies !follow', () => {
|
||||
const { flags } = agentTesting.parseRunFlags(['--detach', 'x']);
|
||||
expect(flags.detach).toBe(true);
|
||||
expect(flags.follow).toBe(false);
|
||||
});
|
||||
|
||||
test('double-dash ends flag parsing explicitly', () => {
|
||||
const { flags, rest } = agentTesting.parseRunFlags(['--model', 'm', '--', '--not-a-flag']);
|
||||
expect(flags.model).toBe('m');
|
||||
expect(rest).toEqual(['--not-a-flag']);
|
||||
});
|
||||
|
||||
test('unknown flag throws', () => {
|
||||
expect(() => agentTesting.parseRunFlags(['--what', 'x'])).toThrow(/unknown flag/);
|
||||
});
|
||||
|
||||
test('--subagent-def + --timeout-ms parsed', () => {
|
||||
const { flags } = agentTesting.parseRunFlags([
|
||||
'--subagent-def', 'researcher', '--timeout-ms', '60000', 'hello',
|
||||
]);
|
||||
expect(flags.subagentDef).toBe('researcher');
|
||||
expect(flags.timeoutMs).toBe(60000);
|
||||
});
|
||||
|
||||
test('--fanout-manifest parsed', () => {
|
||||
const { flags } = agentTesting.parseRunFlags(['--fanout-manifest', '/tmp/m.json']);
|
||||
expect(flags.fanoutManifest).toBe('/tmp/m.json');
|
||||
});
|
||||
});
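The parsing rules these tests pin down (flags come before the prompt, the first non-flag token or `--` ends flag parsing, unknown flags throw, `--detach` forces `follow` off) can be reproduced in a small sketch. `parseRunFlagsSketch` is a hypothetical stand-in, not the code in src/commands/agent.ts, and the `isTTY` parameter is an assumption added to make the default testable.

```typescript
// Hypothetical re-implementation inferred from the tests above.
interface RunFlagsSketch {
  model?: string;
  maxTurns?: number;
  tools?: string[];
  timeoutMs?: number;
  subagentDef?: string;
  fanoutManifest?: string;
  detach: boolean;
  follow: boolean;
}

function parseRunFlagsSketch(
  argv: string[],
  isTTY: boolean = false,
): { flags: RunFlagsSketch; rest: string[] } {
  const flags: RunFlagsSketch = { detach: false, follow: isTTY };
  let i = 0;

  // Consumes the flag at argv[i] plus its value and advances the cursor.
  const value = (flag: string): string => {
    const v = argv[i + 1];
    if (v === undefined) throw new Error(`missing value for ${flag}`);
    i += 2;
    return v;
  };

  while (i < argv.length) {
    const tok = argv[i]!;
    if (tok === '--') { i += 1; break; }  // explicit end of flags
    if (!tok.startsWith('--')) break;     // first prompt token ends flag parsing
    switch (tok) {
      case '--model': flags.model = value(tok); break;
      case '--max-turns': flags.maxTurns = Number(value(tok)); break;
      case '--timeout-ms': flags.timeoutMs = Number(value(tok)); break;
      case '--subagent-def': flags.subagentDef = value(tok); break;
      case '--fanout-manifest': flags.fanoutManifest = value(tok); break;
      case '--tools':
        flags.tools = value(tok).split(',').map(s => s.trim()).filter(s => s.length > 0);
        break;
      case '--detach':
        flags.detach = true;
        flags.follow = false; // detaching implies not following
        i += 1;
        break;
      default:
        throw new Error(`unknown flag: ${tok}`);
    }
  }
  return { flags, rest: argv.slice(i) };
}
```

Stopping at the first non-flag token means prompt text never needs quoting to avoid being read as a flag value, which is presumably why the tests insist on it.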
|
||||
|
||||
describe('parseSince', () => {
|
||||
test('returns undefined on empty input', () => {
|
||||
expect(parseSince(undefined)).toBeUndefined();
|
||||
expect(parseSince('')).toBeUndefined();
|
||||
});
|
||||
|
||||
test('parses ISO-8601 timestamps', () => {
|
||||
const iso = '2026-04-20T12:00:00.000Z';
|
||||
expect(parseSince(iso)).toBe(iso);
|
||||
});
|
||||
|
||||
test('parses relative 5m', () => {
|
||||
const out = parseSince('5m')!;
|
||||
const parsed = new Date(out).getTime();
|
||||
const now = Date.now();
|
||||
expect(now - parsed).toBeGreaterThanOrEqual(5 * 60 * 1000 - 1000);
|
||||
expect(now - parsed).toBeLessThan(5 * 60 * 1000 + 1000);
|
||||
});
|
||||
|
||||
test('parses relative 2h', () => {
|
||||
const out = parseSince('2h')!;
|
||||
const delta = Date.now() - new Date(out).getTime();
|
||||
expect(delta).toBeGreaterThanOrEqual(2 * 3600 * 1000 - 1000);
|
||||
});
|
||||
|
||||
test('parses relative 1d', () => {
|
||||
const out = parseSince('1d')!;
|
||||
const delta = Date.now() - new Date(out).getTime();
|
||||
expect(delta).toBeGreaterThanOrEqual(86_400_000 - 1000);
|
||||
});
|
||||
|
||||
test('throws on unparseable input', () => {
|
||||
expect(() => parseSince('not-a-date')).toThrow(/could not parse/);
|
||||
});
|
||||
});
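For orientation, a parser satisfying the `parseSince` tests above might look like the following sketch. `parseSinceSketch` is hypothetical, supports only the `m`, `h`, and `d` units the tests exercise, and takes an injectable `now` (an assumption) so results are deterministic.

```typescript
// Hypothetical sketch of a --since parser; the real one lives in agent-logs.ts.
type SinceUnit = 'm' | 'h' | 'd';

const UNIT_MS: Record<SinceUnit, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function parseSinceSketch(input: string | undefined, now: number = Date.now()): string | undefined {
  if (!input) return undefined; // undefined and '' both mean "no lower bound"

  // Relative form: an integer followed by a unit, e.g. "5m", "2h", "1d".
  const rel = /^(\d+)([mhd])$/.exec(input.trim());
  if (rel) {
    const [, amount, unit] = rel;
    return new Date(now - Number(amount) * UNIT_MS[unit as SinceUnit]).toISOString();
  }

  // Absolute form: anything Date can parse, normalized to ISO-8601.
  const abs = new Date(input);
  if (Number.isNaN(abs.getTime())) {
    throw new Error(`could not parse --since value: ${input}`);
  }
  return abs.toISOString();
}
```

Checking the relative pattern before falling back to `Date` avoids relying on engine-specific parsing of short strings such as `1d`.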
|
||||
|
||||
describe('protected-name guard includes subagent + aggregator', () => {
|
||||
test('shell stays protected', () => {
|
||||
expect(isProtectedJobName('shell')).toBe(true);
|
||||
expect(PROTECTED_JOB_NAMES.has('shell')).toBe(true);
|
||||
});
|
||||
|
||||
test('subagent is protected (v0.15)', () => {
|
||||
expect(isProtectedJobName('subagent')).toBe(true);
|
||||
});
|
||||
|
||||
test('subagent_aggregator is protected (v0.15)', () => {
|
||||
expect(isProtectedJobName('subagent_aggregator')).toBe(true);
|
||||
});
|
||||
|
||||
test('a random non-protected name is not protected', () => {
|
||||
expect(isProtectedJobName('sync')).toBe(false);
|
||||
});
|
||||
|
||||
test('trim normalization still blocks " subagent "', () => {
|
||||
expect(isProtectedJobName(' subagent ')).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
describe('queue.add trusted-submit gate for subagent', () => {
|
||||
test('subagent without allowProtectedSubmit throws', async () => {
|
||||
await expect(queue.add('subagent', { prompt: 'hi' })).rejects.toThrow();
|
||||
});
|
||||
|
||||
test('subagent with allowProtectedSubmit succeeds', async () => {
|
||||
const job = await queue.add('subagent', { prompt: 'hi' }, {}, { allowProtectedSubmit: true });
|
||||
expect(job.name).toBe('subagent');
|
||||
expect(job.status).toBe('waiting');
|
||||
});
|
||||
|
||||
test('subagent_aggregator gated the same way', async () => {
|
||||
await expect(queue.add('subagent_aggregator', { children_ids: [] })).rejects.toThrow();
|
||||
const ok = await queue.add('subagent_aggregator', { children_ids: [1] }, {}, {
|
||||
allowProtectedSubmit: true,
|
||||
});
|
||||
expect(ok.name).toBe('subagent_aggregator');
|
||||
});
|
||||
});
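The guard and the trusted-submit gate tested above fit in a few lines. This sketch is hypothetical: the set contains only the three names these tests confirm (the real `PROTECTED_JOB_NAMES` may hold more), and `assertSubmitAllowedSketch` stands in for whatever check `queue.add` performs internally.

```typescript
// Hypothetical sketch of the protected-name guard and trusted-submit gate.
const PROTECTED_SKETCH = new Set<string>(['shell', 'subagent', 'subagent_aggregator']);

function isProtectedSketch(name: string): boolean {
  // Trim first so padded names like " subagent " cannot slip past the guard.
  return PROTECTED_SKETCH.has(name.trim());
}

interface TrustedOptsSketch {
  allowProtectedSubmit?: boolean;
}

function assertSubmitAllowedSketch(name: string, trusted: TrustedOptsSketch = {}): void {
  // Fail closed: anything other than an explicit `true` is rejected.
  if (isProtectedSketch(name) && trusted.allowProtectedSubmit !== true) {
    throw new Error(`protected job name '${name.trim()}' requires a trusted submit`);
  }
}
```

Keeping the trust flag in a separate options argument, as the tests' fourth parameter suggests, means job options copied from untrusted input cannot accidentally carry it.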
|
||||
|
||||
describe('fan-out manifest shape (integration)', () => {
|
||||
test('fanout-manifest with 3 entries creates 3 subagent children + 1 aggregator', async () => {
|
||||
// Manually replicate what runAgentRun does for a --fanout-manifest with more
// than one entry. We don't invoke runAgentRun (it calls process.exit on
// error); instead we assert that the plumbing works via direct queue calls
// with the same flags it uses.
|
||||
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'fanout-'));
|
||||
try {
|
||||
const manifestPath = path.join(tmp, 'm.json');
|
||||
fs.writeFileSync(manifestPath, JSON.stringify([
|
||||
{ prompt: 'chunk 1' }, { prompt: 'chunk 2' }, { prompt: 'chunk 3' },
|
||||
]));
|
||||
|
||||
// Aggregator first.
|
||||
const agg = await queue.add(
|
||||
'subagent_aggregator',
|
||||
{ children_ids: [] },
|
||||
{ max_stalled: 3 },
|
||||
{ allowProtectedSubmit: true },
|
||||
);
|
||||
const kids: number[] = [];
|
||||
for (const p of ['chunk 1', 'chunk 2', 'chunk 3']) {
|
||||
const c = await queue.add(
|
||||
'subagent',
|
||||
{ prompt: p },
|
||||
{ parent_job_id: agg.id, on_child_fail: 'continue', max_stalled: 3 },
|
||||
{ allowProtectedSubmit: true },
|
||||
);
|
||||
kids.push(c.id);
|
||||
}
|
||||
await engine.executeRaw(
|
||||
`UPDATE minion_jobs SET data = jsonb_set(data, '{children_ids}', $1::jsonb) WHERE id = $2`,
|
||||
[JSON.stringify(kids), agg.id],
|
||||
);
|
||||
|
||||
// Aggregator should be in waiting-children since kids were submitted
|
||||
// with parent_job_id = agg.id (Lane 1B behavior).
|
||||
const aggNow = await queue.getJob(agg.id);
|
||||
expect(aggNow?.status).toBe('waiting-children');
|
||||
|
||||
// Aggregator's data.children_ids reflects the spawned children.
|
||||
const dataRow = await engine.executeRaw<{ data: unknown }>(
|
||||
`SELECT data FROM minion_jobs WHERE id = $1`, [agg.id],
|
||||
);
|
||||
const data = typeof dataRow[0]!.data === 'string'
|
||||
? JSON.parse(dataRow[0]!.data as string)
|
||||
: dataRow[0]!.data as Record<string, unknown>;
|
||||
expect(data.children_ids).toEqual(kids);
|
||||
|
||||
// Each child should have on_child_fail = 'continue'.
|
||||
const childRows = await engine.executeRaw<{ on_child_fail: string }>(
|
||||
`SELECT on_child_fail FROM minion_jobs WHERE parent_job_id = $1`, [agg.id],
|
||||
);
|
||||
expect(childRows.length).toBe(3);
|
||||
expect(childRows.every(r => r.on_child_fail === 'continue')).toBe(true);
|
||||
} finally {
|
||||
fs.rmSync(tmp, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
});
|
||||
@@ -103,10 +103,10 @@ describe('buildPlan — diff against completed + installed VERSION', () => {
|
||||
expect(plan.partial).toEqual([]);
|
||||
expect(plan.pending.map(m => m.version)).toContain('0.11.0');
|
||||
// Future migrations (registered but newer than installed VERSION) land in
|
||||
- // skippedFuture until the binary catches up. v0.13.0 = frontmatter graph
- // (master), v0.13.1 = Knowledge Runtime grandfather, v0.14.0 = shell
- // jobs + autopilot cooperative.
- expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.0', '0.12.2', '0.13.0', '0.13.1', '0.14.0']);
+ // skippedFuture until the binary catches up. v0.13.0 = frontmatter graph,
+ // v0.13.1 = Knowledge Runtime grandfather, v0.14.0 = shell jobs +
+ // autopilot cooperative, v0.16.0 = subagent runtime (this branch).
+ expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.0', '0.12.2', '0.13.0', '0.13.1', '0.14.0', '0.16.0']);
|
||||
});
|
||||
|
||||
test('already applied → v0.11.0 lands in `applied` bucket, not pending', () => {
|
||||
@@ -142,10 +142,11 @@ describe('buildPlan — diff against completed + installed VERSION', () => {
|
||||
const idx = indexCompleted([]);
|
||||
const plan = buildPlan(idx, '0.12.0');
|
||||
expect(plan.pending.map(m => m.version)).toContain('0.11.0');
|
||||
- // v0.12.2, v0.13.0, v0.13.1, and v0.14.0 were added later; installed=0.12.0
- // means they belong in skippedFuture, not pending. v0.11.0 and v0.12.0
- // stay pending despite being ≤ installed; that is the H9 invariant.
- expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.2', '0.13.0', '0.13.1', '0.14.0']);
+ // v0.12.2, v0.13.0, v0.13.1, v0.14.0, and v0.16.0 were added later;
+ // installed=0.12.0 means they belong in skippedFuture, not pending. v0.11.0
+ // and v0.12.0 stay pending despite being ≤ installed; that is the H9
+ // invariant.
+ expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.2', '0.13.0', '0.13.1', '0.14.0', '0.16.0']);
|
||||
});
|
||||
|
||||
test('--migration filter narrows to one version', () => {
|
||||
|
||||
177
test/brain-allowlist.test.ts
Normal file
@@ -0,0 +1,177 @@
|
||||
/**
|
||||
* Subagent brain-tool registry tests. Covers:
|
||||
* - every allow-list name exists in OPERATIONS (catches renames upstream)
|
||||
* - Anthropic tool-name constraint enforced
|
||||
* - put_page schema is namespace-wrapped per subagent
|
||||
* - execute() invokes the op handler with viaSubagent=true + subagentId
|
||||
* - filterAllowedTools narrows registry + rejects unknown names
|
||||
* - denied ops (file_upload etc.) do NOT appear in the registry
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
|
||||
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
|
||||
import { operations, OperationError } from '../src/core/operations.ts';
|
||||
import {
|
||||
BRAIN_TOOL_ALLOWLIST,
|
||||
buildBrainTools,
|
||||
filterAllowedTools,
|
||||
__testing,
|
||||
} from '../src/core/minions/tools/brain-allowlist.ts';
|
||||
import type { GBrainConfig } from '../src/core/config.ts';
|
||||
import type { ToolCtx } from '../src/core/minions/types.ts';
|
||||
|
||||
let engine: PGLiteEngine;
|
||||
const config: GBrainConfig = { engine: 'pglite' } as GBrainConfig;
|
||||
|
||||
beforeAll(async () => {
|
||||
engine = new PGLiteEngine();
|
||||
await engine.connect({ databaseUrl: '' });
|
||||
await engine.initSchema();
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
await engine.disconnect();
|
||||
});
|
||||
|
||||
beforeEach(async () => {
|
||||
await engine.executeRaw('DELETE FROM pages');
|
||||
});
|
||||
|
||||
describe('BRAIN_TOOL_ALLOWLIST', () => {
|
||||
test('every name exists in src/core/operations.ts OPERATIONS', () => {
|
||||
const opNames = new Set(operations.map(o => o.name));
|
||||
const missing = [...BRAIN_TOOL_ALLOWLIST].filter(n => !opNames.has(n));
|
||||
expect(missing).toEqual([]);
|
||||
});
|
||||
|
||||
test('contains the read-only 10 + put_page', () => {
|
||||
expect(BRAIN_TOOL_ALLOWLIST.size).toBe(11);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('query')).toBe(true);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('search')).toBe(true);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('get_page')).toBe(true);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('list_pages')).toBe(true);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('put_page')).toBe(true);
|
||||
});
|
||||
|
||||
test('does NOT contain destructive ops', () => {
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('file_upload')).toBe(false);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('delete_page')).toBe(false);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('delete_file')).toBe(false);
|
||||
expect(BRAIN_TOOL_ALLOWLIST.has('sync')).toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
describe('buildBrainTools', () => {
|
||||
test('produces one ToolDef per allow-listed op that exists in operations.ts', () => {
|
||||
const tools = buildBrainTools({ subagentId: 42, engine, config });
|
||||
const opNames = new Set(operations.map(o => o.name));
|
||||
const expected = [...BRAIN_TOOL_ALLOWLIST].filter(n => opNames.has(n)).length;
|
||||
expect(tools.length).toBe(expected);
|
||||
});
|
||||
|
||||
test('tool names are brain_<op> and match Anthropic constraint', () => {
|
||||
const tools = buildBrainTools({ subagentId: 7, engine, config });
|
||||
for (const t of tools) {
|
||||
expect(t.name).toMatch(__testing.ANTHROPIC_NAME_RE);
|
||||
expect(t.name.startsWith('brain_')).toBe(true);
|
||||
}
|
||||
});
|
||||
|
||||
test('tools are flagged idempotent in v0.15', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
expect(tools.every(t => t.idempotent === true)).toBe(true);
|
||||
});
|
||||
|
||||
test('tools carry the op description verbatim', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
const getPage = tools.find(t => t.name === 'brain_get_page');
|
||||
const op = operations.find(o => o.name === 'get_page');
|
||||
expect(getPage?.description).toBe(op!.description);
|
||||
});
|
||||
|
||||
test('put_page schema is namespace-wrapped per subagent', () => {
|
||||
const tools42 = buildBrainTools({ subagentId: 42, engine, config });
|
||||
const putPage42 = tools42.find(t => t.name === 'brain_put_page');
|
||||
const slug42 = ((putPage42!.input_schema as any).properties as any).slug;
|
||||
expect(slug42.pattern).toBe('^wiki/agents/42/.+');
|
||||
expect(slug42.description).toContain('wiki/agents/42/');
|
||||
|
||||
const tools7 = buildBrainTools({ subagentId: 7, engine, config });
|
||||
const putPage7 = tools7.find(t => t.name === 'brain_put_page');
|
||||
const slug7 = ((putPage7!.input_schema as any).properties as any).slug;
|
||||
expect(slug7.pattern).toBe('^wiki/agents/7/.+');
|
||||
});
|
||||
|
||||
test('non-put_page tools do NOT get a pattern on slug', () => {
|
||||
const tools = buildBrainTools({ subagentId: 42, engine, config });
|
||||
const getPage = tools.find(t => t.name === 'brain_get_page');
|
||||
const slug = ((getPage!.input_schema as any).properties as any).slug;
|
||||
expect(slug).toBeDefined();
|
||||
expect(slug.pattern).toBeUndefined();
|
||||
});
|
||||
|
||||
test('execute() on put_page with valid namespace slug succeeds', async () => {
|
||||
const tools = buildBrainTools({ subagentId: 42, engine, config });
|
||||
const putPage = tools.find(t => t.name === 'brain_put_page');
|
||||
const ctx: ToolCtx = { engine, jobId: 1, remote: true };
|
||||
const res = await putPage!.execute(
|
||||
{ slug: 'wiki/agents/42/notes', content: '---\ntitle: Notes\n---\nbody' },
|
||||
ctx,
|
||||
);
|
||||
expect(res).toBeTruthy();
|
||||
});
|
||||
|
||||
test('execute() on put_page with out-of-namespace slug throws permission_denied', async () => {
|
||||
const tools = buildBrainTools({ subagentId: 42, engine, config });
|
||||
const putPage = tools.find(t => t.name === 'brain_put_page');
|
||||
const ctx: ToolCtx = { engine, jobId: 1, remote: true };
|
||||
await expect(
|
||||
putPage!.execute(
|
||||
{ slug: 'wiki/analysis/stomp', content: '---\ntitle: x\n---\nb' },
|
||||
ctx,
|
||||
),
|
||||
).rejects.toBeInstanceOf(OperationError);
|
||||
});
|
||||
});
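The namespace rule exercised above (writes must land under `wiki/agents/<subagentId>/`, anchored at a slash boundary, fail-closed on a missing or NaN id) can be expressed as a plain prefix check. Both function names below are hypothetical; only the `^wiki/agents/<id>/.+` pattern string comes from the tests.

```typescript
// Hypothetical sketch of the per-subagent put_page namespace rule.
function slugPatternForSubagent(subagentId: number): string {
  // Same shape the tests expect in the tool's JSON schema.
  return `^wiki/agents/${subagentId}/.+`;
}

function isInSubagentNamespace(slug: string, subagentId: number | undefined): boolean {
  // Fail closed: a missing or non-integer id (including NaN) rejects every slug.
  if (subagentId === undefined || !Number.isInteger(subagentId)) return false;
  // The trailing slash is the boundary that defeats "wiki/agents/12evil/...".
  const prefix = `wiki/agents/${subagentId}/`;
  // Require at least one character after the prefix (bare prefix is rejected).
  return slug.startsWith(prefix) && slug.length > prefix.length;
}
```

The schema pattern only guides the model; the runtime check is what enforces the rule, so it should not depend on the schema having been honored.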
|
||||
|
||||
describe('filterAllowedTools', () => {
|
||||
test('passes prefixed names through', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
const filtered = filterAllowedTools(tools, ['brain_get_page', 'brain_search']);
|
||||
expect(filtered.map(t => t.name)).toEqual(['brain_get_page', 'brain_search']);
|
||||
});
|
||||
|
||||
test('accepts un-prefixed names as a convenience', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
const filtered = filterAllowedTools(tools, ['get_page', 'search']);
|
||||
expect(filtered.map(t => t.name)).toEqual(['brain_get_page', 'brain_search']);
|
||||
});
|
||||
|
||||
test('rejects unknown tool names (no silent ignore)', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
expect(() => filterAllowedTools(tools, ['brain_typo_nope'])).toThrow(/unknown tool/);
|
||||
});
|
||||
|
||||
test('deduplicates when both prefixed + unprefixed given', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
const filtered = filterAllowedTools(tools, ['brain_get_page', 'get_page']);
|
||||
expect(filtered.length).toBe(1);
|
||||
});
|
||||
|
||||
test('empty array yields empty registry', () => {
|
||||
const tools = buildBrainTools({ subagentId: 1, engine, config });
|
||||
expect(filterAllowedTools(tools, [])).toEqual([]);
|
||||
});
|
||||
});
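A filter with the behavior tested above (prefixed or un-prefixed names, deduplication, loud failure on unknown names) could be sketched as below. `filterAllowedToolsSketch` is hypothetical, and returning tools in request order is an assumption; the tests are equally consistent with registry order.

```typescript
// Hypothetical sketch of allow-list filtering over a tool registry.
interface NamedToolSketch {
  name: string;
}

function filterAllowedToolsSketch<T extends NamedToolSketch>(tools: T[], allowed: string[]): T[] {
  const byName = new Map<string, T>();
  for (const t of tools) byName.set(t.name, t);

  // A Map keyed by canonical name deduplicates "get_page" vs "brain_get_page".
  const picked = new Map<string, T>();
  for (const raw of allowed) {
    const name = raw.startsWith('brain_') ? raw : `brain_${raw}`;
    const tool = byName.get(name);
    // No silent ignore: a typo in the allow-list should fail the submission.
    if (tool === undefined) throw new Error(`unknown tool: ${raw}`);
    picked.set(name, tool);
  }
  return Array.from(picked.values());
}
```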
|
||||
|
||||
describe('sanitizeToolName', () => {
|
||||
test('returns within 64 chars', () => {
|
||||
// Synthetic: simulate an op name long enough to need slicing.
|
||||
const long = 'a'.repeat(100);
|
||||
expect(__testing.sanitizeToolName(long).length).toBeLessThanOrEqual(64);
|
||||
});
|
||||
|
||||
test('replaces non-conforming chars with _', () => {
|
||||
expect(__testing.sanitizeToolName('foo.bar')).toBe('brain_foo_bar');
|
||||
});
|
||||
});
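As an illustration of the two `sanitizeToolName` properties tested above, here is a minimal sketch. The regex is the tool-name constraint as I understand Anthropic documents it (`^[a-zA-Z0-9_-]{1,64}$`); the commit's own `ANTHROPIC_NAME_RE` is not shown in this diff, so treat both the constant and `sanitizeToolNameSketch` as assumptions.

```typescript
// Assumed tool-name constraint; the real ANTHROPIC_NAME_RE may differ.
const NAME_RE_SKETCH = /^[a-zA-Z0-9_-]{1,64}$/;

// Hypothetical sketch: prefix with brain_, replace disallowed chars, cap at 64.
function sanitizeToolNameSketch(opName: string): string {
  const cleaned = opName.replace(/[^a-zA-Z0-9_-]/g, '_');
  return `brain_${cleaned}`.slice(0, 64);
}
```

Slicing after prefixing keeps the `brain_` marker intact even for very long operation names, at the cost of possible collisions between names that share their first 58 characters.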
|
||||
@@ -61,9 +61,12 @@ describeE2E('E2E: Minions resilience (OpenClaw real-world patterns)', () => {
|
||||
|
||||
// 50 concurrent submits racing through SELECT ... FOR UPDATE on parent.
|
||||
// The PG row lock serializes them; only the first 10 see live_count < 10.
|
||||
+ // Use a non-protected name: this test is about max_children semantics,
+ // not the v0.15 subagent runtime specifically. `subagent` became a
+ // PROTECTED_JOB_NAME in v0.15 (CLI-only; trusted submit required).
  const results = await Promise.allSettled(
  Array.from({ length: 50 }, (_, i) =>
- queue.add(`subagent`, { i }, { parent_job_id: parent.id })
+ queue.add(`child_worker`, { i }, { parent_job_id: parent.id })
|
||||
)
|
||||
);
|
||||
|
||||
|
||||
67
test/mcp-tool-defs.test.ts
Normal file
@@ -0,0 +1,67 @@
|
||||
/**
|
||||
* Regression test for the MCP tool-def extraction (v0.16.0 Lane 1A).
|
||||
*
|
||||
* Before v0.15 the mapping lived inline in src/mcp/server.ts. After the
|
||||
* extraction, buildToolDefs is the single source of truth; the subagent tool
|
||||
* registry calls it with a filtered OPERATIONS subset. This test pins the
|
||||
* extracted output to the pre-extraction shape byte-for-byte so we don't
|
||||
* silently drift the MCP-facing tool schema.
|
||||
*/
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { operations } from '../src/core/operations.ts';
|
||||
import { buildToolDefs } from '../src/mcp/tool-defs.ts';
|
||||
|
||||
// Pre-extraction inline shape — lifted verbatim from the original
|
||||
// src/mcp/server.ts block so any future drift fails this test loudly.
|
||||
function legacyInlineMap(ops: typeof operations) {
|
||||
return ops.map(op => ({
|
||||
name: op.name,
|
||||
description: op.description,
|
||||
inputSchema: {
|
||||
type: 'object' as const,
|
||||
properties: Object.fromEntries(
|
||||
Object.entries(op.params).map(([k, v]) => [k, {
|
||||
type: v.type === 'array' ? 'array' : v.type,
|
||||
...(v.description ? { description: v.description } : {}),
|
||||
...(v.enum ? { enum: v.enum } : {}),
|
||||
...(v.items ? { items: { type: v.items.type } } : {}),
|
||||
}]),
|
||||
),
|
||||
required: Object.entries(op.params)
|
||||
.filter(([, v]) => v.required)
|
||||
.map(([k]) => k),
|
||||
},
|
||||
}));
|
||||
}
|
||||
|
||||
describe('buildToolDefs', () => {
|
||||
test('output equals pre-extraction inline mapping byte-for-byte', () => {
|
||||
const extracted = buildToolDefs(operations);
|
||||
const inline = legacyInlineMap(operations);
|
||||
expect(JSON.stringify(extracted)).toBe(JSON.stringify(inline));
|
||||
});
|
||||
|
||||
test('preserves operation count', () => {
|
||||
expect(buildToolDefs(operations).length).toBe(operations.length);
|
||||
});
|
||||
|
||||
test('accepts an arbitrary Operation subset (for subagent tool registry)', () => {
|
||||
const subset = operations.slice(0, 3);
|
||||
const defs = buildToolDefs(subset);
|
||||
expect(defs.length).toBe(3);
|
||||
expect(defs.map(d => d.name)).toEqual(subset.map(o => o.name));
|
||||
});
|
||||
|
||||
test('empty input returns empty array', () => {
|
||||
expect(buildToolDefs([])).toEqual([]);
|
||||
});
|
||||
|
||||
test('every def has object inputSchema with properties + required array', () => {
|
||||
for (const def of buildToolDefs(operations)) {
|
||||
expect(def.inputSchema.type).toBe('object');
|
||||
expect(typeof def.inputSchema.properties).toBe('object');
|
||||
expect(Array.isArray(def.inputSchema.required)).toBe(true);
|
||||
}
|
||||
});
|
||||
});
|
||||
97
test/migrations-v0_16_0.test.ts
Normal file
@@ -0,0 +1,97 @@
|
||||
/**
|
||||
* Unit tests for v0.16.0 migration orchestrator + registry.
|
||||
*
|
||||
* The full schema verification lives in E2E (Postgres). These unit tests
|
||||
* cover:
|
||||
* - registry wiring (v0.16.0 registered + lookup works)
|
||||
* - v0.14.0 noop stub wired (gapless version sequence)
|
||||
* - migration metadata (version, pitch)
|
||||
* - dry-run short-circuits both phases
|
||||
* - required table names constant
|
||||
*/
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { migrations, getMigration } from '../src/commands/migrations/index.ts';
|
||||
import { __testing } from '../src/commands/migrations/v0_16_0.ts';
|
||||
|
||||
describe('v0.16.0 migration', () => {
|
||||
test('is registered in the migrations registry', () => {
|
||||
const v0_16_0 = getMigration('0.16.0');
|
||||
expect(v0_16_0).not.toBeNull();
|
||||
expect(v0_16_0?.version).toBe('0.16.0');
|
||||
});
|
||||
|
||||
test('v0.14.0 noop stub is registered (gapless sequence)', () => {
|
||||
const v0_14_0 = getMigration('0.14.0');
|
||||
expect(v0_14_0).not.toBeNull();
|
||||
expect(v0_14_0?.version).toBe('0.14.0');
|
||||
});
|
||||
|
||||
test('migrations array has no version gaps through 0.16.0', () => {
|
||||
const versions = migrations.map(m => m.version);
|
||||
expect(versions).toContain('0.13.1');
|
||||
expect(versions).toContain('0.14.0');
|
||||
expect(versions).toContain('0.16.0');
|
||||
// Order check: the registry is semver-sorted in the source.
const v16Idx = versions.indexOf('0.16.0');
const v14Idx = versions.indexOf('0.14.0');
const v131Idx = versions.indexOf('0.13.1');
expect(v131Idx).toBeLessThan(v14Idx);
expect(v14Idx).toBeLessThan(v16Idx);
|
||||
});
|
||||
|
||||
test('feature pitch has headline and description', () => {
|
||||
const m = getMigration('0.16.0');
|
||||
expect(m?.featurePitch.headline).toBeTruthy();
|
||||
expect(m?.featurePitch.description).toBeTruthy();
|
||||
});
|
||||
|
||||
test('REQUIRED_TABLES lists all three subagent tables', () => {
|
||||
expect(__testing.REQUIRED_TABLES).toEqual([
|
||||
'subagent_messages',
|
||||
'subagent_tool_executions',
|
||||
'subagent_rate_leases',
|
||||
]);
|
||||
});
|
||||
|
||||
test('phaseASchema skips on dry-run', () => {
|
||||
const r = __testing.phaseASchema({ dryRun: true });
|
||||
expect(r.status).toBe('skipped');
|
||||
expect(r.detail).toBe('dry-run');
|
||||
});
|
||||
|
||||
test('phaseBVerify skips on dry-run', async () => {
|
||||
const r = await __testing.phaseBVerify({ dryRun: true });
|
||||
expect(r.status).toBe('skipped');
|
||||
expect(r.detail).toBe('dry-run');
|
||||
});
|
||||
|
||||
test('orchestrator in dry-run returns complete with both phases skipped', async () => {
|
||||
const m = getMigration('0.16.0');
|
||||
const result = await m!.orchestrator({ dryRun: true });
|
||||
expect(result.version).toBe('0.16.0');
|
||||
expect(result.phases.length).toBe(2);
|
||||
expect(result.phases.every(p => p.status === 'skipped')).toBe(true);
|
||||
});
|
||||
});
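The dry-run behavior asserted above (every phase reports `skipped` with detail `dry-run`, and no work is done) can be sketched with a tiny phase runner. All names here are hypothetical and the phases are synchronous for brevity; the real orchestrator in v0_16_0.ts is async and its result shape is only partly visible in these tests.

```typescript
// Hypothetical sketch of a phased migration orchestrator with dry-run support.
type PhaseStatusSketch = 'complete' | 'skipped' | 'failed';

interface PhaseResultSketch {
  name: string;
  status: PhaseStatusSketch;
  detail?: string;
}

interface PhaseSketch {
  name: string;
  apply: () => void;
}

function runPhaseSketch(phase: PhaseSketch, opts: { dryRun: boolean }): PhaseResultSketch {
  // Dry-run short-circuits before any side effect.
  if (opts.dryRun) return { name: phase.name, status: 'skipped', detail: 'dry-run' };
  try {
    phase.apply();
    return { name: phase.name, status: 'complete' };
  } catch (err) {
    return { name: phase.name, status: 'failed', detail: err instanceof Error ? err.message : String(err) };
  }
}

function orchestrateSketch(
  version: string,
  phases: PhaseSketch[],
  opts: { dryRun: boolean },
): { version: string; phases: PhaseResultSketch[] } {
  const results: PhaseResultSketch[] = [];
  for (const phase of phases) {
    const result = runPhaseSketch(phase, opts);
    results.push(result);
    if (result.status === 'failed') break; // assumption: later phases depend on earlier ones
  }
  return { version, phases: results };
}
```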
|
||||
|
||||
describe('schema-embedded.ts contains subagent tables', () => {
|
||||
test('embedded schema references all three subagent tables', async () => {
|
||||
const { SCHEMA_SQL } = await import('../src/core/schema-embedded.ts');
|
||||
expect(SCHEMA_SQL).toContain('CREATE TABLE IF NOT EXISTS subagent_messages');
|
||||
expect(SCHEMA_SQL).toContain('CREATE TABLE IF NOT EXISTS subagent_tool_executions');
|
||||
expect(SCHEMA_SQL).toContain('CREATE TABLE IF NOT EXISTS subagent_rate_leases');
|
||||
expect(SCHEMA_SQL).toContain('idx_subagent_messages_job');
|
||||
expect(SCHEMA_SQL).toContain('idx_subagent_tools_job');
|
||||
expect(SCHEMA_SQL).toContain('idx_rate_leases_key_expires');
|
||||
});
|
||||
});
|
||||
|
||||
describe('pglite-schema.ts contains subagent tables', () => {
|
||||
test('embedded PGLite schema references all three subagent tables', async () => {
|
||||
const { PGLITE_SCHEMA_SQL } = await import('../src/core/pglite-schema.ts');
|
||||
expect(PGLITE_SCHEMA_SQL).toContain('CREATE TABLE IF NOT EXISTS subagent_messages');
|
||||
expect(PGLITE_SCHEMA_SQL).toContain('CREATE TABLE IF NOT EXISTS subagent_tool_executions');
|
||||
expect(PGLITE_SCHEMA_SQL).toContain('CREATE TABLE IF NOT EXISTS subagent_rate_leases');
|
||||
});
|
||||
});
|
||||
253
test/plugin-loader.test.ts
Normal file
@@ -0,0 +1,253 @@
|
||||
/**
|
||||
* plugin-loader tests. Exercise the full path/manifest/validation surface
|
||||
* using ephemeral tmp dirs so no repo content is touched.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
|
||||
import * as fs from 'node:fs';
|
||||
import * as path from 'node:path';
|
||||
import * as os from 'node:os';
|
||||
import {
|
||||
loadPluginsFromEnv,
|
||||
loadSinglePlugin,
|
||||
SUPPORTED_PLUGIN_VERSION,
|
||||
__testing,
|
||||
} from '../src/core/minions/plugin-loader.ts';
|
||||
|
||||
let tmpRoot: string;
|
||||
|
||||
beforeAll(() => {
|
||||
tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'plugin-loader-test-'));
|
||||
});
|
||||
|
||||
afterAll(() => {
|
||||
fs.rmSync(tmpRoot, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
beforeEach(() => {
|
||||
for (const f of fs.readdirSync(tmpRoot)) {
|
||||
fs.rmSync(path.join(tmpRoot, f), { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
// Helper: build a plugin directory with a manifest + a subagents/ tree.
|
||||
function writePlugin(
|
||||
name: string,
|
||||
opts: {
|
||||
plugin_version?: string;
|
||||
subagents?: Record<string, string>;
|
||||
subagents_field?: string;
|
||||
omit_manifest?: boolean;
|
||||
bad_manifest_json?: boolean;
|
||||
} = {},
|
||||
): string {
|
||||
const dir = path.join(tmpRoot, name);
|
||||
fs.mkdirSync(dir, { recursive: true });
|
||||
|
||||
if (!opts.omit_manifest) {
|
||||
const manifest = {
|
||||
name,
|
||||
version: '1.0.0',
|
||||
plugin_version: opts.plugin_version ?? SUPPORTED_PLUGIN_VERSION,
|
||||
...(opts.subagents_field ? { subagents: opts.subagents_field } : {}),
|
||||
};
|
||||
fs.writeFileSync(
|
||||
path.join(dir, 'gbrain.plugin.json'),
|
||||
opts.bad_manifest_json ? '{not valid json' : JSON.stringify(manifest, null, 2),
|
||||
);
|
||||
}
|
||||
|
||||
if (opts.subagents) {
|
||||
const sadir = path.join(dir, opts.subagents_field ?? 'subagents');
|
||||
fs.mkdirSync(sadir, { recursive: true });
|
||||
for (const [file, content] of Object.entries(opts.subagents)) {
|
||||
fs.writeFileSync(path.join(sadir, file), content);
|
||||
}
|
||||
}
|
||||
|
||||
return dir;
|
||||
}
|
||||
|
||||
describe('path policy', () => {
|
||||
test('relative paths rejected', () => {
|
||||
expect(__testing.rejectIfNotAbsolute('relative/path')).toMatch(/relative path rejected/);
|
||||
});
|
||||
|
||||
test('~-prefixed paths rejected (no implicit expansion)', () => {
|
||||
expect(__testing.rejectIfNotAbsolute('~/subagents')).toMatch(/~-prefixed/);
|
||||
});
|
||||
|
||||
test('remote URLs rejected', () => {
|
||||
expect(__testing.rejectIfNotAbsolute('https://example.com/plugins')).toMatch(/remote URL/);
|
||||
expect(__testing.rejectIfNotAbsolute('file:///abs/p')).toMatch(/remote URL/);
|
||||
});
|
||||
|
||||
test('absolute POSIX path accepted', () => {
|
||||
expect(__testing.rejectIfNotAbsolute('/abs/path')).toBeNull();
|
||||
});
|
||||
});
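A path policy matching the four cases tested above might be sketched as follows. `rejectIfNotAbsoluteSketch` is a hypothetical stand-in that returns an error string or `null`, mirroring the shape the tests imply; the exact messages in plugin-loader.ts are unknown beyond the fragments the tests match.

```typescript
import * as path from 'node:path';

// Hypothetical sketch: returns a rejection reason, or null when the path is acceptable.
function rejectIfNotAbsoluteSketch(p: string): string | null {
  // Anything shaped like scheme://... is treated as remote, including file://.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(p)) return `remote URL rejected: ${p}`;
  // No implicit home-directory expansion.
  if (p.startsWith('~')) return `~-prefixed path rejected (no implicit expansion): ${p}`;
  if (!path.isAbsolute(p)) return `relative path rejected: ${p}`;
  return null;
}
```

Checking the URL form first matters because a string like `https://example.com/plugins` would otherwise be reported as a relative path, which is a less useful error.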
|
||||
|
||||
describe('loadSinglePlugin', () => {
  test('loads a minimal manifest + one subagent def', () => {
    const dir = writePlugin('wintermute', {
      subagents: {
        'meeting-ingestion.md': `---\nname: meeting-ingestion\nmodel: sonnet\n---\n\nYou are a meeting ingester.\n`,
      },
    });
    const res = loadSinglePlugin(dir);
    expect('error' in res).toBe(false);
    if ('error' in res) return;
    expect(res.manifest.name).toBe('wintermute');
    expect(res.subagents.length).toBe(1);
    expect(res.subagents[0]!.name).toBe('meeting-ingestion');
    expect(res.subagents[0]!.body.trim()).toBe('You are a meeting ingester.');
  });

  test('missing manifest returns error', () => {
    const dir = writePlugin('empty', { omit_manifest: true });
    const res = loadSinglePlugin(dir);
    expect('error' in res).toBe(true);
    if ('error' in res) expect(res.error).toMatch(/missing gbrain\.plugin\.json/);
  });

  test('invalid manifest JSON returns error', () => {
    const dir = writePlugin('bad-json', { bad_manifest_json: true });
    const res = loadSinglePlugin(dir);
    expect('error' in res).toBe(true);
    if ('error' in res) expect(res.error).toMatch(/invalid manifest JSON/);
  });

  test('unsupported plugin_version rejected', () => {
    const dir = writePlugin('future', { plugin_version: 'gbrain-plugin-v999' });
    const res = loadSinglePlugin(dir);
    expect('error' in res).toBe(true);
    if ('error' in res) expect(res.error).toMatch(/unsupported plugin_version/);
  });

  test('escape-attempt subagents field rejected', () => {
    const dir = writePlugin('escape', { subagents_field: '../../../etc' });
    const res = loadSinglePlugin(dir);
    expect('error' in res).toBe(true);
    if ('error' in res) expect(res.error).toMatch(/escapes plugin root/);
  });

  test('falls back to file basename when frontmatter.name is missing', () => {
    const dir = writePlugin('nameless', {
      subagents: {
        'implicit-name.md': `---\nmodel: sonnet\n---\nbody\n`,
      },
    });
    const res = loadSinglePlugin(dir);
    if ('error' in res) throw new Error(res.error);
    expect(res.subagents[0]!.name).toBe('implicit-name');
  });

  test('allowed_tools frontmatter list of strings survives round-trip', () => {
    const dir = writePlugin('tools', {
      subagents: {
        'researcher.md': `---\nname: researcher\nallowed_tools:\n - brain_search\n - brain_get_page\n---\nbody\n`,
      },
    });
    const res = loadSinglePlugin(dir);
    if ('error' in res) throw new Error(res.error);
    expect(res.subagents[0]!.allowed_tools).toEqual(['brain_search', 'brain_get_page']);
  });

  test('allowed_tools referencing unknown tool names fails load', () => {
    const dir = writePlugin('rogue', {
      subagents: {
        'typo.md': `---\nname: typo\nallowed_tools:\n - brain_seerch\n---\nbody\n`,
      },
    });
    const res = loadSinglePlugin(dir, {
      validAgentToolNames: new Set(['brain_search', 'brain_get_page']),
    });
    expect('error' in res).toBe(true);
    if ('error' in res) expect(res.error).toMatch(/unknown tools: brain_seerch/);
  });

  test('validation passes when allowed_tools are all in the registry', () => {
    const dir = writePlugin('clean', {
      subagents: {
        'ok.md': `---\nname: ok\nallowed_tools:\n - brain_search\n---\nbody\n`,
      },
    });
    const res = loadSinglePlugin(dir, {
      validAgentToolNames: new Set(['brain_search']),
    });
    expect('error' in res).toBe(false);
  });

  test('skipping validation (no validAgentToolNames) allows any allowed_tools', () => {
    const dir = writePlugin('no-validate', {
      subagents: {
        'anything.md': `---\nname: anything\nallowed_tools:\n - tool_we_have_not_shipped_yet\n---\nbody\n`,
      },
    });
    const res = loadSinglePlugin(dir);
    expect('error' in res).toBe(false);
  });
});

describe('loadPluginsFromEnv', () => {
  test('empty env returns no plugins, no warnings', () => {
    const r = loadPluginsFromEnv({ envPath: '' });
    expect(r.plugins).toEqual([]);
    expect(r.warnings).toEqual([]);
  });

  test('multi-path: colon-separated PATH loads both', () => {
    const a = writePlugin('a', { subagents: { 'x.md': `---\nname: x\n---\nbody` } });
    const b = writePlugin('b', { subagents: { 'y.md': `---\nname: y\n---\nbody` } });
    const r = loadPluginsFromEnv({ envPath: `${a}:${b}` });
    expect(r.plugins.length).toBe(2);
    expect(r.plugins[0]!.manifest.name).toBe('a');
    expect(r.plugins[1]!.manifest.name).toBe('b');
  });

  test('collision: left-wins with a warning', () => {
    const left = writePlugin('left', { subagents: { 'shared.md': `---\nname: shared\n---\nleft body` } });
    const right = writePlugin('right', { subagents: { 'shared.md': `---\nname: shared\n---\nright body` } });
    const r = loadPluginsFromEnv({ envPath: `${left}:${right}` });
    expect(r.plugins.length).toBe(2);
    // Only the left plugin contributes the `shared` subagent.
    const leftSubs = r.plugins[0]!.subagents.map(s => s.name);
    const rightSubs = r.plugins[1]!.subagents.map(s => s.name);
    expect(leftSubs).toContain('shared');
    expect(rightSubs).not.toContain('shared');
    expect(r.warnings.some(w => /collision.*shared/.test(w))).toBe(true);
  });

  test('non-existent path is warned + skipped', () => {
    const r = loadPluginsFromEnv({ envPath: '/definitely/does/not/exist/here' });
    expect(r.plugins.length).toBe(0);
    expect(r.warnings.some(w => /does not exist/.test(w))).toBe(true);
  });

  test('relative path in env is warned + skipped', () => {
    const r = loadPluginsFromEnv({ envPath: 'relative/dir' });
    expect(r.plugins.length).toBe(0);
    expect(r.warnings.some(w => /relative path rejected/.test(w))).toBe(true);
  });

  test('a file (not a directory) is warned + skipped', () => {
    const file = path.join(tmpRoot, 'not-a-dir.txt');
    fs.writeFileSync(file, 'x');
    const r = loadPluginsFromEnv({ envPath: file });
    expect(r.plugins.length).toBe(0);
    expect(r.warnings.some(w => /not a directory/.test(w))).toBe(true);
  });

  test('trims whitespace around paths', () => {
    const a = writePlugin('trimmed', { subagents: { 'x.md': `---\nname: x\n---\nbody` } });
    const r = loadPluginsFromEnv({ envPath: ` ${a} ` });
    expect(r.plugins.length).toBe(1);
  });

  test('manifest rejection shows up as a warning (not a throw)', () => {
    const bad = writePlugin('futurep', { plugin_version: 'gbrain-plugin-v999' });
    const r = loadPluginsFromEnv({ envPath: bad });
    expect(r.plugins.length).toBe(0);
    expect(r.warnings.some(w => /unsupported plugin_version/.test(w))).toBe(true);
  });
});
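
The path-policy and env-splitting cases pinned by these tests can be read as a small decision function. The following is a minimal sketch, assuming POSIX-style paths only; `classifyPluginPath` and `splitPluginPath` are hypothetical names, not the loader's actual `rejectIfNotAbsolute` or `loadPluginsFromEnv` implementation, and the returned messages merely mirror the substrings the tests match on.

```typescript
// Hypothetical sketch of the path policy exercised by the tests.
// Returns a rejection reason, or null when the path is acceptable.
function classifyPluginPath(p: string): string | null {
  const trimmed = p.trim();
  // Anything with a URI scheme (https://, file://, ...) is treated as remote.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(trimmed)) return 'remote URL rejected';
  // No implicit home-directory expansion.
  if (trimmed.startsWith('~')) return '~-prefixed path rejected';
  // Assumption: POSIX only, so "absolute" means a leading slash.
  if (!trimmed.startsWith('/')) return 'relative path rejected';
  return null;
}

// Split a colon-separated env value, trimming whitespace and dropping empties.
function splitPluginPath(envPath: string): string[] {
  return envPath
    .split(':')
    .map(s => s.trim())
    .filter(s => s.length > 0);
}
```

Returning a reason string instead of throwing matches the behaviour the tests expect from the env loader, where a bad entry becomes a warning and the remaining entries still load.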
114
test/put-page-namespace.test.ts
Normal file
@@ -0,0 +1,114 @@
/**
 * Regression + namespace tests for put_page (v0.16.0 Lane 1D).
 *
 * The namespace rule confines subagent-originated writes to
 * `wiki/agents/<subagentId>/...`. This test pins:
 *   - regression: local CLI and standard MCP paths (ctx.viaSubagent != true)
 *     continue to accept ANY slug — the rule is opt-in by the dispatcher.
 *   - namespace: anchored prefix, slash boundary, wrong id, leading-slash fail,
 *     prefix-collision defeated, and fail-closed when subagentId is missing.
 */

import { describe, test, expect } from 'bun:test';
import { operations, OperationError } from '../src/core/operations.ts';
import type { OperationContext, Operation } from '../src/core/operations.ts';
import type { BrainEngine } from '../src/core/engine.ts';

const put_page = operations.find(o => o.name === 'put_page') as Operation;
if (!put_page) throw new Error('put_page op missing');

function makeCtx(overrides: Partial<OperationContext> = {}): OperationContext {
  const engine = {} as BrainEngine; // dry_run short-circuits before touching the engine
  return {
    engine,
    config: { engine: 'postgres' } as any,
    logger: { info: () => {}, warn: () => {}, error: () => {} },
    dryRun: true,
    remote: true,
    ...overrides,
  };
}

describe('put_page namespace (v0.15 subagent rule)', () => {
  describe('regression: non-subagent callers unchanged', () => {
    test('local CLI write (viaSubagent undefined) accepts arbitrary slug', async () => {
      const ctx = makeCtx({ remote: false });
      const result = await put_page.handler(ctx, { slug: 'people/alice', content: 'stub' });
      expect(result).toMatchObject({ dry_run: true, action: 'put_page', slug: 'people/alice' });
    });

    test('MCP write (remote=true, viaSubagent=undefined) accepts arbitrary slug', async () => {
      const ctx = makeCtx({ remote: true });
      const result = await put_page.handler(ctx, { slug: 'wiki/analysis/foo', content: 'stub' });
      expect(result).toMatchObject({ dry_run: true, action: 'put_page', slug: 'wiki/analysis/foo' });
    });

    test('viaSubagent=false is the same as unset', async () => {
      const ctx = makeCtx({ remote: true, viaSubagent: false, subagentId: 42 });
      const result = await put_page.handler(ctx, { slug: 'anything/goes', content: 'stub' });
      expect(result).toMatchObject({ dry_run: true });
    });
  });

  describe('subagent namespace rule', () => {
    test('accepts wiki/agents/<subagentId>/ prefix', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 42 });
      const result = await put_page.handler(ctx, { slug: 'wiki/agents/42/notes', content: 'stub' });
      expect(result).toMatchObject({ dry_run: true, slug: 'wiki/agents/42/notes' });
    });

    test('accepts deep paths under the prefix', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 42 });
      const result = await put_page.handler(ctx, { slug: 'wiki/agents/42/runs/2026-04-20/summary', content: 'stub' });
      expect(result).toMatchObject({ dry_run: true });
    });

    test('rejects leading slash (slug grammar + anchor)', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 42 });
      const p = put_page.handler(ctx, { slug: '/wiki/agents/42/foo', content: 'stub' });
      await expect(p).rejects.toBeInstanceOf(OperationError);
    });

    test('rejects wrong subagentId', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 42 });
      const p = put_page.handler(ctx, { slug: 'wiki/agents/12/foo', content: 'stub' });
      await expect(p).rejects.toBeInstanceOf(OperationError);
    });

    test('rejects prefix-collision attempt (wiki/agents/12evil/* with subagentId=12)', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 12 });
      const p = put_page.handler(ctx, { slug: 'wiki/agents/12evil/foo', content: 'stub' });
      await expect(p).rejects.toBeInstanceOf(OperationError);
    });

    test('rejects bare prefix with no suffix (slug.length === prefix.length)', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 42 });
      const p = put_page.handler(ctx, { slug: 'wiki/agents/42/', content: 'stub' });
      await expect(p).rejects.toBeInstanceOf(OperationError);
    });

    test('FAIL-CLOSED: viaSubagent=true with undefined subagentId rejects any slug', async () => {
      const ctx = makeCtx({ viaSubagent: true });
      const p = put_page.handler(ctx, { slug: 'wiki/agents/42/foo', content: 'stub' });
      await expect(p).rejects.toBeInstanceOf(OperationError);
      await expect(p).rejects.toThrow(/subagentId/);
    });

    test('FAIL-CLOSED: viaSubagent=true with NaN subagentId rejects', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: Number.NaN });
      const p = put_page.handler(ctx, { slug: 'wiki/agents/NaN/foo', content: 'stub' });
      await expect(p).rejects.toBeInstanceOf(OperationError);
    });

    test('error code is permission_denied (not validation)', async () => {
      const ctx = makeCtx({ viaSubagent: true, subagentId: 42 });
      try {
        await put_page.handler(ctx, { slug: 'people/alice', content: 'stub' });
        throw new Error('should have thrown');
      } catch (e) {
        expect(e).toBeInstanceOf(OperationError);
        expect((e as OperationError).code).toBe('permission_denied');
      }
    });
  });
});
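
For readers who want the rule in one place, the namespace check these tests pin can be sketched as a standalone guard. This is an illustration under stated assumptions, not the code in `src/core/operations.ts`: `NamespaceError` and `assertSubagentSlug` are hypothetical names, and the only behaviour taken from the tests is the anchored `wiki/agents/<subagentId>/` prefix, the fail-closed handling of a missing or NaN id, and the `permission_denied` code.

```typescript
// Hypothetical error type; the real code throws OperationError.
class NamespaceError extends Error {
  readonly code = 'permission_denied';
}

// Hypothetical guard mirroring the cases in the tests.
function assertSubagentSlug(slug: string, viaSubagent?: boolean, subagentId?: number): void {
  // The rule is opt-in: callers that are not on the subagent path are untouched.
  if (viaSubagent !== true) return;

  // Fail closed: a missing, NaN, or non-integer id rejects every slug.
  if (typeof subagentId !== 'number' || !Number.isSafeInteger(subagentId)) {
    throw new NamespaceError('subagentId is required when viaSubagent=true');
  }

  // The trailing slash is the boundary that defeats `wiki/agents/12evil/...`.
  const prefix = `wiki/agents/${subagentId}/`;
  if (!slug.startsWith(prefix) || slug.length === prefix.length) {
    throw new NamespaceError(`slug must live under ${prefix}`);
  }
}
```

Building the prefix with its trailing slash and comparing with `startsWith` keeps the check anchored at the start of the slug, so a leading slash or a longer id that shares the same digits cannot pass.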
261
test/queue-child-done.test.ts
Normal file
@@ -0,0 +1,261 @@
/**
 * Lane 1B regression + coverage for the v0.15 queue changes:
 *
 * - failJob emits child_done(outcome='failed'|'dead') on terminal transition,
 *   BEFORE the parent-terminal UPDATE (insertion order matters so the EXISTS
 *   guard on inbox writes doesn't drop the row on fail_parent paths).
 * - cancelJob emits child_done(outcome='cancelled') to every descendant's
 *   parent inbox.
 * - handleTimeouts emits child_done(outcome='timeout') to the parent inbox.
 * - Parent-resolution terminal set includes 'failed' so a failed child with
 *   on_child_fail='continue' unblocks the aggregator.
 * - MinionJobInput.max_stalled threads through MinionQueue.add() on INSERT.
 */

import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { MinionQueue } from '../src/core/minions/queue.ts';
import type { ChildDoneMessage } from '../src/core/minions/types.ts';

let engine: PGLiteEngine;
let queue: MinionQueue;

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({ databaseUrl: '' });
  await engine.initSchema();
  queue = new MinionQueue(engine);
});

afterAll(async () => {
  await engine.disconnect();
});

beforeEach(async () => {
  await engine.executeRaw('DELETE FROM minion_jobs');
});

// Helper: read all child_done payloads from a parent's inbox.
async function readChildDoneInbox(parentId: number): Promise<ChildDoneMessage[]> {
  const rows = await engine.executeRaw<{ payload: unknown }>(
    `SELECT payload FROM minion_inbox WHERE job_id = $1 ORDER BY id`,
    [parentId]
  );
  return rows
    .map(r => (typeof r.payload === 'string' ? JSON.parse(r.payload) : r.payload) as ChildDoneMessage)
    .filter(p => p?.type === 'child_done');
}

let tokenSeq = 0;
function nextToken() { return `tok-${++tokenSeq}`; }

// Claim + fail the next job on the default queue for the given name.
async function claimAndFail(name: string, newStatus: 'failed' | 'dead', errorText = 'boom') {
  const token = nextToken();
  const claimed = await queue.claim(token, 30000, 'default', [name]);
  if (!claimed) throw new Error(`nothing to claim for ${name}`);
  return queue.failJob(claimed.id, token, errorText, newStatus);
}

// Claim + complete the next job on the default queue for the given name.
async function claimAndComplete(name: string, result: Record<string, unknown> = {}) {
  const token = nextToken();
  const claimed = await queue.claim(token, 30000, 'default', [name]);
  if (!claimed) throw new Error(`nothing to claim for ${name}`);
  return queue.completeJob(claimed.id, token, result);
}

describe('v0.15 child_done emission', () => {
  test('completeJob emits child_done with outcome=complete (regression)', async () => {
    const parent = await queue.add('parent', {});
    const child = await queue.add('child', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    await claimAndComplete('child', { ok: 1 });

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(1);
    expect(msgs[0].outcome).toBe('complete');
    expect(msgs[0].child_id).toBe(child.id);
    expect(msgs[0].result).toEqual({ ok: 1 });
    expect(msgs[0].error).toBeUndefined();
  });

  test('failJob emits child_done(outcome=failed) on terminal failure with on_child_fail=continue', async () => {
    const parent = await queue.add('parent', {});
    const child = await queue.add('child', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    await claimAndFail('child', 'failed', 'kaboom');

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(1);
    expect(msgs[0].outcome).toBe('failed');
    expect(msgs[0].error).toBe('kaboom');
  });

  test('failJob emits child_done(outcome=dead) when newStatus=dead', async () => {
    const parent = await queue.add('parent', {});
    const child = await queue.add('child', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    await claimAndFail('child', 'dead', 'exceeded attempts');

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(1);
    expect(msgs[0].outcome).toBe('dead');
  });

  test('failJob does NOT emit child_done on a delayed retry (not terminal)', async () => {
    const parent = await queue.add('parent', {});
    const child = await queue.add('child', {}, { parent_job_id: parent.id });

    const token = nextToken();
    const claimed = await queue.claim(token, 30000, 'default', ['child']);
    if (!claimed) throw new Error('no claim');
    await queue.failJob(claimed.id, token, 'transient', 'delayed', 1000);

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(0);
  });

  test('failJob with fail_parent emits child_done BEFORE parent-terminal UPDATE (insertion order)', async () => {
    // Regression: if the parent-UPDATE ran first, the EXISTS guard on the
    // child_done INSERT would skip the row once parent.status='failed'. The
    // aggregator would then be unable to see the failure in its inbox.
    const parent = await queue.add('parent', {});
    const child = await queue.add('child', {}, { parent_job_id: parent.id, on_child_fail: 'fail_parent' });

    await claimAndFail('child', 'failed', 'parent kill');

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(1);
    expect(msgs[0].outcome).toBe('failed');

    // And the parent-terminal UPDATE still ran.
    const parentNow = await queue.getJob(parent.id);
    expect(parentNow?.status).toBe('failed');
  });

  test('cancelJob on an individual child emits child_done(outcome=cancelled) to its aggregator parent', async () => {
    // This is the real codex scenario: the aggregator (parent) is alive in
    // waiting-children, and a sibling child gets cancelled. The aggregator
    // must see the child_done so it can count "N children resolved" and
    // eventually produce its summary.
    const parent = await queue.add('parent', {});
    const c1 = await queue.add('child1', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    await queue.cancelJob(c1.id);

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(1);
    expect(msgs[0].outcome).toBe('cancelled');
    expect(msgs[0].child_id).toBe(c1.id);

    // And the aggregator parent itself was unblocked (no non-terminal kids).
    const p = await queue.getJob(parent.id);
    expect(p?.status).toBe('waiting');
  });

  test('cancelJob cascading from parent is a no-op for the terminal parent\'s inbox (by design)', async () => {
    // When the aggregator itself is cancelled, cascading also cancels its
    // children. The child_done writes for those children would target the
    // (now-terminal) parent's inbox — the EXISTS guard drops them, which is
    // correct: a cancelled aggregator won't process its inbox anyway.
    const parent = await queue.add('parent', {});
    await queue.add('child1', {}, { parent_job_id: parent.id });
    await queue.add('child2', {}, { parent_job_id: parent.id });

    await queue.cancelJob(parent.id);

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(0);

    // But the cancellation itself succeeded.
    const p = await queue.getJob(parent.id);
    expect(p?.status).toBe('cancelled');
  });

  test('handleTimeouts emits child_done(outcome=timeout) to parent inbox', async () => {
    const parent = await queue.add('parent', {});
    const child = await queue.add('child', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    const token = nextToken();
    const claimed = await queue.claim(token, 30000, 'default', ['child']);
    if (!claimed) throw new Error('no claim');
    // Force a past timeout_at for this claimed job.
    await engine.executeRaw(
      `UPDATE minion_jobs SET timeout_at = now() - interval '1 second' WHERE id = $1`,
      [claimed.id]
    );
    const timed = await queue.handleTimeouts();
    expect(timed.length).toBe(1);

    const msgs = await readChildDoneInbox(parent.id);
    expect(msgs.length).toBe(1);
    expect(msgs[0].outcome).toBe('timeout');
  });
});

describe('v0.15 parent-resolution terminal set', () => {
  test('failed child with on_child_fail=continue unblocks aggregator parent', async () => {
    const parent = await queue.add('parent', {});
    const c1 = await queue.add('child1', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });
    const c2 = await queue.add('child2', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    // Parent should be waiting-children after fan-out.
    let p = await queue.getJob(parent.id);
    expect(p?.status).toBe('waiting-children');

    // Fail c1.
    await claimAndFail('child1', 'failed');
    // Parent still waiting-children (c2 open).
    p = await queue.getJob(parent.id);
    expect(p?.status).toBe('waiting-children');

    // Complete c2.
    await claimAndComplete('child2', { ok: 1 });
    // Parent unblocked.
    p = await queue.getJob(parent.id);
    expect(p?.status).toBe('waiting');
  });

  test('all-failed children still unblock the parent', async () => {
    const parent = await queue.add('parent', {});
    const c1 = await queue.add('child1', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });
    const c2 = await queue.add('child2', {}, { parent_job_id: parent.id, on_child_fail: 'continue' });

    await claimAndFail('child1', 'failed');
    await claimAndFail('child2', 'failed');

    const p = await queue.getJob(parent.id);
    expect(p?.status).toBe('waiting');
  });
});

describe('v0.16 MinionJobInput.max_stalled', () => {
  test('default max_stalled picks up schema DEFAULT when omitted (regression)', async () => {
    // v0.14.3 bumped the schema column DEFAULT from 1 → 5 (max_stalled becomes
    // tolerant of short-lock blips for long-running LLM handlers). The v0.16
    // queue.add conditional-insert skips the column when the caller omits it,
    // so the schema DEFAULT is what actually gets stored. Sanity-check the
    // range first, then pin the current default explicitly.
    const job = await queue.add('child', {});
    expect(job.max_stalled).toBeGreaterThanOrEqual(1);
    expect(job.max_stalled).toBeLessThanOrEqual(100);
    // As of v0.14.3 the default is 5. If a later migration changes the default,
    // this assertion fails and can be updated intentionally.
    expect(job.max_stalled).toBe(5);
  });

  test('per-job max_stalled override threads through INSERT', async () => {
    const job = await queue.add('durable', {}, { max_stalled: 3 });
    expect(job.max_stalled).toBe(3);
  });

  test('idempotency-key replay does NOT mutate existing max_stalled', async () => {
    const first = await queue.add('job', {}, { idempotency_key: 'k1', max_stalled: 3 });
    const second = await queue.add('job', {}, { idempotency_key: 'k1', max_stalled: 7 });
    expect(second.id).toBe(first.id);
    // First submitter wins; second submitter's override is silently ignored
    // (per codex iteration 3 finding — mutation would be a footgun).
    expect(second.max_stalled).toBe(3);
  });
});
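
The parent-resolution behaviour pinned in this file reduces to a set-membership check: a parent leaves `waiting-children` once every child is in a terminal status, and `failed` counts as terminal. The sketch below is an in-memory illustration only; `TERMINAL_STATUSES` and `resolveParentStatus` are hypothetical names, and the `'completed'` and `'active'` status strings are assumptions, since the tests only show `failed`, `dead`, `cancelled`, `delayed`, `waiting`, and `waiting-children`.

```typescript
// Assumed terminal statuses; 'completed' is a guess at the success status name.
const TERMINAL_STATUSES: ReadonlySet<string> = new Set([
  'completed',
  'failed',
  'dead',
  'cancelled',
]);

// Hypothetical helper: decide whether a parent may leave waiting-children.
function resolveParentStatus(childStatuses: string[]): 'waiting' | 'waiting-children' {
  // Every child must be terminal; a failed child no longer blocks the parent.
  const allTerminal = childStatuses.every(s => TERMINAL_STATUSES.has(s));
  return allTerminal ? 'waiting' : 'waiting-children';
}
```

With `failed` left out of the set, an aggregator whose children all failed under `on_child_fail='continue'` would never be unblocked, which is the case the tests guard against.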
148
test/rate-leases.test.ts
Normal file
@@ -0,0 +1,148 @@
/**
 * Lease-based rate limiter tests. Runs against PGLite in-memory.
 */

import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { MinionQueue } from '../src/core/minions/queue.ts';
import {
  acquireLease,
  renewLease,
  releaseLease,
  renewLeaseWithBackoff,
} from '../src/core/minions/rate-leases.ts';

let engine: PGLiteEngine;
let queue: MinionQueue;
let owner: number; // a minion_jobs.id to own leases (FK target)

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({ databaseUrl: '' });
  await engine.initSchema();
  queue = new MinionQueue(engine);
});

afterAll(async () => {
  await engine.disconnect();
});

beforeEach(async () => {
  await engine.executeRaw('DELETE FROM subagent_rate_leases');
  await engine.executeRaw('DELETE FROM minion_jobs');
  const j = await queue.add('owner', {});
  owner = j.id;
});

describe('acquireLease / releaseLease', () => {
  test('single acquire under cap returns lease id', async () => {
    const r = await acquireLease(engine, 'anthropic:messages', owner, 2);
    expect(r.acquired).toBe(true);
    expect(r.leaseId).toBeGreaterThan(0);
    expect(r.activeCount).toBe(1);
  });

  test('acquires up to max_concurrent', async () => {
    const a = await acquireLease(engine, 'k', owner, 2);
    const b = await acquireLease(engine, 'k', owner, 2);
    expect(a.acquired).toBe(true);
    expect(b.acquired).toBe(true);
    expect(b.activeCount).toBe(2);
  });

  test('rejects beyond max_concurrent', async () => {
    await acquireLease(engine, 'k', owner, 2);
    await acquireLease(engine, 'k', owner, 2);
    const third = await acquireLease(engine, 'k', owner, 2);
    expect(third.acquired).toBe(false);
    expect(third.leaseId).toBeUndefined();
    expect(third.activeCount).toBe(2);
  });

  test('releaseLease frees a slot', async () => {
    const a = await acquireLease(engine, 'k', owner, 1);
    expect(a.acquired).toBe(true);
    const blocked = await acquireLease(engine, 'k', owner, 1);
    expect(blocked.acquired).toBe(false);

    await releaseLease(engine, a.leaseId!);

    const after = await acquireLease(engine, 'k', owner, 1);
    expect(after.acquired).toBe(true);
  });

  test('different keys have independent capacity', async () => {
    const a = await acquireLease(engine, 'k1', owner, 1);
    const b = await acquireLease(engine, 'k2', owner, 1);
    expect(a.acquired).toBe(true);
    expect(b.acquired).toBe(true);
  });

  test('stale leases auto-prune on next acquire', async () => {
    const a = await acquireLease(engine, 'k', owner, 1, { ttlMs: 10 });
    expect(a.acquired).toBe(true);
    // Force the lease to be stale.
    await engine.executeRaw(
      `UPDATE subagent_rate_leases SET expires_at = now() - interval '1 minute' WHERE id = $1`,
      [a.leaseId!],
    );
    const b = await acquireLease(engine, 'k', owner, 1);
    expect(b.acquired).toBe(true);
    // Only the fresh lease should remain.
    const rows = await engine.executeRaw<{ count: string }>(
      `SELECT count(*)::text AS count FROM subagent_rate_leases WHERE key = $1`,
      ['k'],
    );
    expect(parseInt(rows[0]!.count, 10)).toBe(1);
  });

  test('owner job deletion cascades lease rows', async () => {
    const a = await acquireLease(engine, 'k', owner, 1);
    expect(a.acquired).toBe(true);
    await engine.executeRaw(`DELETE FROM minion_jobs WHERE id = $1`, [owner]);
    const rows = await engine.executeRaw<{ count: string }>(
      `SELECT count(*)::text AS count FROM subagent_rate_leases WHERE key = $1`,
      ['k'],
    );
    expect(parseInt(rows[0]!.count, 10)).toBe(0);
  });

  test('releaseLease on a missing id is a no-op (idempotent)', async () => {
    await expect(releaseLease(engine, 99_999)).resolves.toBeUndefined();
  });
});

describe('renewLease', () => {
  test('renewLease bumps expires_at and returns true', async () => {
    const a = await acquireLease(engine, 'k', owner, 1, { ttlMs: 50 });
    const before = await engine.executeRaw<{ expires_at: string }>(
      `SELECT expires_at FROM subagent_rate_leases WHERE id = $1`,
      [a.leaseId!],
    );
    await new Promise(r => setTimeout(r, 5));
    const ok = await renewLease(engine, a.leaseId!, 120_000);
    expect(ok).toBe(true);
    const after = await engine.executeRaw<{ expires_at: string }>(
      `SELECT expires_at FROM subagent_rate_leases WHERE id = $1`,
      [a.leaseId!],
    );
    expect(new Date(after[0]!.expires_at).getTime()).toBeGreaterThan(new Date(before[0]!.expires_at).getTime());
  });

  test('renewLease on a missing lease returns false', async () => {
    expect(await renewLease(engine, 99_999)).toBe(false);
  });
});

describe('renewLeaseWithBackoff', () => {
  test('returns true on live lease', async () => {
    const a = await acquireLease(engine, 'k', owner, 1);
    expect(await renewLeaseWithBackoff(engine, a.leaseId!)).toBe(true);
  });

  test('returns false on pruned lease (no retry loop)', async () => {
    const a = await acquireLease(engine, 'k', owner, 1);
    await releaseLease(engine, a.leaseId!);
    expect(await renewLeaseWithBackoff(engine, a.leaseId!)).toBe(false);
  });
});
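
The lease semantics these tests exercise (cap per key, prune-on-acquire, renew, idempotent release) can be illustrated without a database. The class below is a minimal in-memory sketch, not the implementation in `src/core/minions/rate-leases.ts`: `InMemoryLeases` and its method signatures are hypothetical, time is passed in explicitly as `now` to keep the example deterministic, and it prunes expired leases for all keys at once, which the real table-backed version may not do.

```typescript
interface Lease { id: number; key: string; expiresAt: number; }
interface AcquireResult { acquired: boolean; leaseId?: number; activeCount: number; }

// Hypothetical in-memory stand-in for the subagent_rate_leases table.
class InMemoryLeases {
  private leases: Lease[] = [];
  private nextId = 1;

  acquire(key: string, maxConcurrent: number, ttlMs: number, now: number): AcquireResult {
    // Prune expired leases first so a crashed holder cannot strand capacity.
    this.leases = this.leases.filter(l => l.expiresAt > now);
    const active = this.leases.filter(l => l.key === key).length;
    if (active >= maxConcurrent) return { acquired: false, activeCount: active };
    const lease: Lease = { id: this.nextId++, key, expiresAt: now + ttlMs };
    this.leases.push(lease);
    return { acquired: true, leaseId: lease.id, activeCount: active + 1 };
  }

  // Returns false when the lease is gone (released or pruned).
  renew(leaseId: number, ttlMs: number, now: number): boolean {
    const lease = this.leases.find(l => l.id === leaseId);
    if (!lease) return false;
    lease.expiresAt = now + ttlMs;
    return true;
  }

  // Releasing an unknown id is a no-op, so release is safe to call twice.
  release(leaseId: number): void {
    this.leases = this.leases.filter(l => l.id !== leaseId);
  }
}
```

A caller would typically acquire before an outbound provider call, renew periodically while the call is in flight, and release in a `finally` block; a `false` from `renew` signals that the slot was lost and should not be assumed held.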
190
test/subagent-aggregator.test.ts
Normal file
@@ -0,0 +1,190 @@
/**
 * subagent_aggregator handler tests.
 *
 * The handler's contract is:
 *   - read child_done messages from the inbox (already posted by Lane 1B's
 *     queue changes on every terminal child transition)
 *   - render a markdown summary
 *   - return {children, summary, markdown}
 *
 * Tests use a synthetic MinionJobContext that serves a scripted inbox,
 * tracks progress/log writes, and records them so assertions can check
 * that the handler does the right bookkeeping.
 */

import { describe, test, expect } from 'bun:test';
import {
  subagentAggregatorHandler,
  __testing,
} from '../src/core/minions/handlers/subagent-aggregator.ts';
import type { MinionJobContext, ChildDoneMessage, InboxMessage, ChildOutcome } from '../src/core/minions/types.ts';

function ctxWithInbox(
  jobId: number,
  data: Record<string, unknown>,
  inbox: ChildDoneMessage[],
): MinionJobContext & { _progress: unknown[]; _logs: string[] } {
  const progress: unknown[] = [];
  const logs: string[] = [];
  const inboxMessages: InboxMessage[] = inbox.map((payload, i) => ({
    id: i + 1,
    job_id: jobId,
    sender: 'minions',
    payload: payload as unknown,
    sent_at: new Date(),
    read_at: null,
  }));
  const ctx = {
    id: jobId,
    name: 'subagent_aggregator',
    data,
    attempts_made: 0,
    signal: new AbortController().signal,
    shutdownSignal: new AbortController().signal,
    async updateProgress(p: unknown) { progress.push(p); },
    async updateTokens() {},
    async log(m: string | unknown) { logs.push(typeof m === 'string' ? m : JSON.stringify(m)); },
    async isActive() { return true; },
    async readInbox() { return inboxMessages; },
    _progress: progress,
    _logs: logs,
  };
  return ctx as MinionJobContext & { _progress: unknown[]; _logs: string[] };
}

function done(child_id: number, outcome: ChildOutcome, overrides: Partial<ChildDoneMessage> = {}): ChildDoneMessage {
  return {
    type: 'child_done',
    child_id,
    job_name: `child_${child_id}`,
    result: overrides.result !== undefined ? overrides.result : (outcome === 'complete' ? { ok: true } : null),
    outcome,
    error: overrides.error ?? (outcome === 'complete' ? null : `${outcome}`),
  };
}

describe('subagent_aggregator happy paths', () => {
|
||||
test('missing children_ids returns no-children marker', async () => {
|
||||
const ctx = ctxWithInbox(1, {}, []);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.children).toEqual([]);
|
||||
expect(res.markdown).toContain('_(no children)_');
|
||||
});
|
||||
|
||||
test('all children succeed → complete count + bracketed results', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [10, 11] }, [
|
||||
done(10, 'complete', { result: { finding: 'a' } }),
|
||||
done(11, 'complete', { result: { finding: 'b' } }),
|
||||
]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.children.length).toBe(2);
|
||||
expect(res.summary.complete).toBe(2);
|
||||
expect(res.summary.failed).toBe(0);
|
||||
expect(res.markdown).toContain('## child 10');
|
||||
expect(res.markdown).toContain('"finding": "a"');
|
||||
});
|
||||
|
||||
test('mixed outcomes tallied correctly', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [1, 2, 3, 4] }, [
|
||||
done(1, 'complete'),
|
||||
done(2, 'failed', { error: 'boom' }),
|
||||
done(3, 'cancelled'),
|
||||
done(4, 'timeout'),
|
||||
]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.summary).toEqual({ complete: 1, failed: 1, dead: 0, cancelled: 1, timeout: 1 });
|
||||
expect(res.markdown).toContain('child 2');
|
||||
expect(res.markdown).toContain('error: boom');
|
||||
});
|
||||
|
||||
test('result is null for non-complete outcomes (no leaked payload)', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [42] }, [
|
||||
done(42, 'failed', { result: 'should-be-suppressed', error: 'x' }),
|
||||
]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.children[0]!.result).toBeNull();
|
||||
});
|
||||
|
||||
test('missing child_done is counted as failed with clear error', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [10, 11] }, [done(10, 'complete')]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
const missing = res.children.find(c => c.child_id === 11);
|
||||
expect(missing?.outcome).toBe('failed');
|
||||
expect(missing?.error).toContain('no child_done message observed');
|
||||
expect(res.summary.failed).toBe(1);
|
||||
});
|
||||
|
||||
test('preserves children_ids order in the output', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [3, 1, 2] }, [
|
||||
done(1, 'complete'),
|
||||
done(2, 'complete'),
|
||||
done(3, 'complete'),
|
||||
]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.children.map(c => c.child_id)).toEqual([3, 1, 2]);
|
||||
});
|
||||
|
||||
test('custom aggregate_prompt_template becomes the markdown header', async () => {
|
||||
const ctx = ctxWithInbox(1, {
|
||||
children_ids: [1],
|
||||
aggregate_prompt_template: '# My synthesis of the shard runs',
|
||||
}, [done(1, 'complete')]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.markdown.startsWith('# My synthesis of the shard runs')).toBe(true);
|
||||
});
|
||||
|
||||
test('updateProgress + log emit once per run', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [1, 2] }, [done(1, 'complete'), done(2, 'complete')]);
|
||||
await subagentAggregatorHandler(ctx);
|
||||
expect(ctx._progress.length).toBe(1);
|
||||
expect(ctx._logs.length).toBe(1);
|
||||
expect(ctx._logs[0]).toContain('aggregated 2 children');
|
||||
});
|
||||
});
|
||||
|
||||
describe('subagent_aggregator payload parsing', () => {
|
||||
test('handles stringified child_done payloads (from JSONB fetch)', async () => {
|
||||
const ctx = ctxWithInbox(1, { children_ids: [5] }, [
|
||||
// JSON round-trip of the payload. PG returns JSONB as a string in some paths;
// note the value is parsed back here, so the handler receives a plain object.
|
||||
JSON.parse(JSON.stringify(done(5, 'complete'))) as ChildDoneMessage,
|
||||
]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.summary.complete).toBe(1);
|
||||
});
|
||||
|
||||
test('ignores non-child_done inbox messages', async () => {
|
||||
const inboxHybrid = [
|
||||
done(1, 'complete'),
|
||||
// unrelated payload (e.g. from a future message type)
|
||||
{ type: 'ping', echo: 'nope' } as unknown as ChildDoneMessage,
|
||||
];
|
||||
const ctx = ctxWithInbox(1, { children_ids: [1] }, inboxHybrid);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.summary.complete).toBe(1);
|
||||
});
|
||||
|
||||
test('falls back to complete when outcome field is absent (legacy writer)', async () => {
|
||||
const legacy: ChildDoneMessage = {
|
||||
type: 'child_done', child_id: 99, job_name: 'legacy', result: { ok: true },
|
||||
};
|
||||
const ctx = ctxWithInbox(1, { children_ids: [99] }, [legacy]);
|
||||
const res = await subagentAggregatorHandler(ctx);
|
||||
expect(res.children[0]!.outcome).toBe('complete');
|
||||
});
|
||||
});
|
||||
|
||||
describe('internal helpers', () => {
|
||||
test('formatSummary skips zero counts', () => {
|
||||
const s = __testing.emptySummary();
|
||||
s.complete = 3;
|
||||
s.failed = 1;
|
||||
expect(__testing.formatSummary(s)).toBe('complete=3, failed=1');
|
||||
});
|
||||
|
||||
test('parseChildDone rejects obviously-bogus payloads', () => {
|
||||
expect(__testing.parseChildDone(null)).toBeNull();
|
||||
expect(__testing.parseChildDone({ type: 'not_child_done' })).toBeNull();
|
||||
expect(__testing.parseChildDone({ type: 'child_done' })).toBeNull(); // missing child_id
|
||||
expect(__testing.parseChildDone('not json')).toBeNull();
|
||||
});
|
||||
});
|
||||
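The aggregator contract these tests pin down (parse `child_done` payloads, ignore anything else, keep `children_ids` order, count a missing child as failed, and suppress `result` for non-complete outcomes) can be sketched as follows. This is a simplified illustration, not the handler's code: `parseChildDoneSketch` and `aggregateSketch` are hypothetical names, markdown rendering is left out, and accepting JSON strings is an assumption.

```typescript
type Outcome = 'complete' | 'failed' | 'dead' | 'cancelled' | 'timeout';

interface ChildDone {
  type: 'child_done';
  child_id: number;
  outcome?: Outcome;
  result?: unknown;
  error?: string | null;
}

interface ChildRow {
  child_id: number;
  outcome: Outcome;
  result: unknown;
  error: string | null;
}

// Accepts an object or a JSON string (assumption); returns null for anything
// that is not a child_done message with a numeric child_id.
function parseChildDoneSketch(payload: unknown): ChildDone | null {
  let p: unknown = payload;
  if (typeof p === 'string') {
    try {
      p = JSON.parse(p);
    } catch {
      return null;
    }
  }
  if (typeof p !== 'object' || p === null) return null;
  const o = p as Record<string, unknown>;
  if (o.type !== 'child_done' || typeof o.child_id !== 'number') return null;
  return o as unknown as ChildDone;
}

function aggregateSketch(childrenIds: number[], inbox: unknown[]) {
  const byId = new Map<number, ChildDone>();
  inbox.forEach((p) => {
    const m = parseChildDoneSketch(p);
    if (m) byId.set(m.child_id, m);
  });
  const summary: Record<Outcome, number> = { complete: 0, failed: 0, dead: 0, cancelled: 0, timeout: 0 };
  // Walk children_ids, not the inbox, so output order follows the job's input.
  const children = childrenIds.map((id): ChildRow => {
    const m = byId.get(id);
    if (!m) return { child_id: id, outcome: 'failed', result: null, error: 'no child_done message observed' };
    const outcome: Outcome = m.outcome ?? 'complete'; // legacy writers omit outcome
    return {
      child_id: id,
      outcome,
      result: outcome === 'complete' ? (m.result ?? null) : null,
      error: m.error ?? null,
    };
  });
  children.forEach((c) => {
    summary[c.outcome]++;
  });
  return { children, summary };
}
```

Driving the loop from `children_ids` rather than from the inbox is what makes both the ordering guarantee and the missing-child case fall out naturally.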
151
test/subagent-audit.test.ts
Normal file
@@ -0,0 +1,151 @@
|
||||
/**
 * subagent-audit tests. Exercises filename rotation, best-effort writes, and
 * the readback path used by `gbrain agent logs`. No real engine; purely
 * filesystem.
 */
|
||||
|
||||
import { describe, test, expect, beforeEach, afterAll, beforeAll } from 'bun:test';
|
||||
import * as fs from 'node:fs';
|
||||
import * as path from 'node:path';
|
||||
import * as os from 'node:os';
|
||||
import {
|
||||
computeSubagentAuditFilename,
|
||||
logSubagentSubmission,
|
||||
logSubagentHeartbeat,
|
||||
readSubagentAuditForJob,
|
||||
} from '../src/core/minions/handlers/subagent-audit.ts';
|
||||
|
||||
let tmpDir: string;
|
||||
const savedAuditDir = process.env.GBRAIN_AUDIT_DIR;
|
||||
|
||||
beforeAll(() => {
|
||||
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'subagent-audit-test-'));
|
||||
process.env.GBRAIN_AUDIT_DIR = tmpDir;
|
||||
});
|
||||
|
||||
afterAll(() => {
|
||||
if (savedAuditDir === undefined) delete process.env.GBRAIN_AUDIT_DIR;
|
||||
else process.env.GBRAIN_AUDIT_DIR = savedAuditDir;
|
||||
fs.rmSync(tmpDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
beforeEach(() => {
|
||||
for (const f of fs.readdirSync(tmpDir)) {
|
||||
fs.rmSync(path.join(tmpDir, f), { force: true });
|
||||
}
|
||||
});
|
||||
|
||||
describe('computeSubagentAuditFilename', () => {
|
||||
test('formats as subagent-jobs-YYYY-Www.jsonl', () => {
|
||||
const name = computeSubagentAuditFilename(new Date('2026-04-20T12:00:00Z'));
|
||||
expect(name).toMatch(/^subagent-jobs-2026-W\d{2}\.jsonl$/);
|
||||
});
|
||||
|
||||
test('ISO year-boundary: 2027-01-01 is W53 of 2026', () => {
|
||||
// 2027-01-01 is a Friday; ISO week containing that day is W53 of 2026.
|
||||
const name = computeSubagentAuditFilename(new Date('2027-01-01T00:00:00Z'));
|
||||
expect(name).toBe('subagent-jobs-2026-W53.jsonl');
|
||||
});
|
||||
|
||||
test('mid-year dates carry the same year', () => {
|
||||
const name = computeSubagentAuditFilename(new Date('2026-06-15T12:00:00Z'));
|
||||
expect(name.startsWith('subagent-jobs-2026-W')).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
describe('logSubagentSubmission', () => {
|
||||
test('writes a JSONL line with submission type', () => {
|
||||
logSubagentSubmission({ caller: 'cli', remote: false, job_id: 42, model: 'sonnet' });
|
||||
const files = fs.readdirSync(tmpDir);
|
||||
expect(files.length).toBe(1);
|
||||
const raw = fs.readFileSync(path.join(tmpDir, files[0]!), 'utf8').trim();
|
||||
const parsed = JSON.parse(raw);
|
||||
expect(parsed.type).toBe('submission');
|
||||
expect(parsed.caller).toBe('cli');
|
||||
expect(parsed.job_id).toBe(42);
|
||||
expect(parsed.model).toBe('sonnet');
|
||||
expect(parsed.ts).toMatch(/^20\d\d-/);
|
||||
});
|
||||
});
|
||||
|
||||
describe('logSubagentHeartbeat', () => {
|
||||
test('writes heartbeat type with turn_idx', () => {
|
||||
logSubagentHeartbeat({
|
||||
job_id: 1,
|
||||
event: 'llm_call_completed',
|
||||
turn_idx: 3,
|
||||
ms_elapsed: 1250,
|
||||
tokens: { in: 1000, out: 200, cache_read: 500 },
|
||||
});
|
||||
const files = fs.readdirSync(tmpDir);
|
||||
const raw = fs.readFileSync(path.join(tmpDir, files[0]!), 'utf8').trim();
|
||||
const parsed = JSON.parse(raw);
|
||||
expect(parsed.type).toBe('heartbeat');
|
||||
expect(parsed.event).toBe('llm_call_completed');
|
||||
expect(parsed.turn_idx).toBe(3);
|
||||
expect(parsed.tokens.in).toBe(1000);
|
||||
});
|
||||
|
||||
test('truncates long error text to 200 chars', () => {
|
||||
const long = 'x'.repeat(500);
|
||||
logSubagentHeartbeat({
|
||||
job_id: 1,
|
||||
event: 'tool_failed',
|
||||
turn_idx: 0,
|
||||
tool_name: 'brain_put_page',
|
||||
error: long,
|
||||
});
|
||||
const files = fs.readdirSync(tmpDir);
|
||||
const raw = fs.readFileSync(path.join(tmpDir, files[0]!), 'utf8').trim();
|
||||
const parsed = JSON.parse(raw);
|
||||
expect(parsed.error.length).toBe(200);
|
||||
});
|
||||
|
||||
test('best-effort: write failure does not throw', () => {
|
||||
const bogus = '/dev/null/not-a-dir';
|
||||
process.env.GBRAIN_AUDIT_DIR = bogus;
|
||||
try {
|
||||
expect(() => logSubagentHeartbeat({
|
||||
job_id: 1,
|
||||
event: 'llm_call_started',
|
||||
turn_idx: 0,
|
||||
})).not.toThrow();
|
||||
} finally {
|
||||
process.env.GBRAIN_AUDIT_DIR = tmpDir;
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe('readSubagentAuditForJob', () => {
|
||||
test('returns events for the target job in chronological order', () => {
|
||||
logSubagentSubmission({ caller: 'cli', remote: false, job_id: 100 });
|
||||
logSubagentHeartbeat({ job_id: 100, event: 'llm_call_started', turn_idx: 0 });
|
||||
logSubagentHeartbeat({ job_id: 100, event: 'llm_call_completed', turn_idx: 0, ms_elapsed: 500 });
|
||||
|
||||
const events = readSubagentAuditForJob(100);
|
||||
expect(events.length).toBe(3);
|
||||
// chronological
|
||||
for (let i = 1; i < events.length; i++) {
|
||||
expect(events[i]!.ts >= events[i - 1]!.ts).toBe(true);
|
||||
}
|
||||
});
|
||||
|
||||
test('filters to the requested job_id', () => {
|
||||
logSubagentSubmission({ caller: 'cli', remote: false, job_id: 1 });
|
||||
logSubagentSubmission({ caller: 'cli', remote: false, job_id: 2 });
|
||||
const justOne = readSubagentAuditForJob(1);
|
||||
expect(justOne.length).toBe(1);
|
||||
expect((justOne[0] as { job_id: number }).job_id).toBe(1);
|
||||
});
|
||||
|
||||
test('honors sinceIso filter', () => {
|
||||
logSubagentSubmission({ caller: 'cli', remote: false, job_id: 1 });
|
||||
// Use a threshold in the future so every event logged so far is filtered out.
|
||||
const future = new Date(Date.now() + 60_000).toISOString();
|
||||
expect(readSubagentAuditForJob(1, { sinceIso: future })).toEqual([]);
|
||||
});
|
||||
|
||||
test('returns [] when no audit files exist', () => {
|
||||
expect(readSubagentAuditForJob(999)).toEqual([]);
|
||||
});
|
||||
});
|
||||
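The filename tests above rely on ISO 8601 week numbering, where the ISO year of a date is the calendar year of the Thursday in its week. A minimal sketch of that computation follows. `isoWeekFilenameSketch` is a hypothetical name, and working in UTC is an assumption; the real `computeSubagentAuditFilename` may differ in detail.

```typescript
// Hypothetical sketch: derive a weekly audit filename from a date using
// ISO 8601 week numbering, computed in UTC (assumption).
function isoWeekFilenameSketch(d: Date): string {
  // Truncate to midnight UTC so the time of day cannot shift the week.
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // ISO weekdays run Monday=1 .. Sunday=7; getUTCDay() reports Sunday as 0.
  const dow = t.getUTCDay() || 7;
  // Move to the Thursday of the same ISO week; its calendar year is the ISO year.
  t.setUTCDate(t.getUTCDate() + 4 - dow);
  const isoYear = t.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const dayOfYear = (t.getTime() - yearStart) / 86400000 + 1;
  const week = Math.ceil(dayOfYear / 7);
  const ww = (week < 10 ? '0' : '') + week;
  return 'subagent-jobs-' + isoYear + '-W' + ww + '.jsonl';
}
```

For 2027-01-01 (a Friday) the Thursday of that week is 2026-12-31, so the file lands in `2026-W53`, matching the year-boundary test.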
422
test/subagent-handler.test.ts
Normal file
@@ -0,0 +1,422 @@
|
||||
/**
 * Subagent handler tests with a mocked Anthropic Messages client.
 *
 * Strategy: every test scripts a sequence of Messages API responses, hands
 * them to a FakeMessagesClient, and inspects (a) the SubagentResult the
 * handler returns and (b) the persisted rows in subagent_messages +
 * subagent_tool_executions. Replay tests simulate a crash by constructing
 * a fresh handler bound to the same job row with partial state already
 * written.
 *
 * PGLite in-memory so the schema, ON CONFLICT, and two-phase persistence
 * all exercise real SQL.
 */
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
|
||||
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
|
||||
import { MinionQueue } from '../src/core/minions/queue.ts';
|
||||
import {
|
||||
makeSubagentHandler,
|
||||
RateLeaseUnavailableError,
|
||||
type MessagesClient,
|
||||
} from '../src/core/minions/handlers/subagent.ts';
|
||||
import type { ToolDef, MinionJobContext } from '../src/core/minions/types.ts';
|
||||
import type Anthropic from '@anthropic-ai/sdk';
|
||||
|
||||
let engine: PGLiteEngine;
|
||||
let queue: MinionQueue;
|
||||
|
||||
beforeAll(async () => {
|
||||
engine = new PGLiteEngine();
|
||||
await engine.connect({ databaseUrl: '' });
|
||||
await engine.initSchema();
|
||||
queue = new MinionQueue(engine);
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
await engine.disconnect();
|
||||
});
|
||||
|
||||
beforeEach(async () => {
|
||||
await engine.executeRaw('DELETE FROM subagent_tool_executions');
|
||||
await engine.executeRaw('DELETE FROM subagent_messages');
|
||||
await engine.executeRaw('DELETE FROM subagent_rate_leases');
|
||||
await engine.executeRaw('DELETE FROM minion_jobs');
|
||||
});
|
||||
|
||||
// ── FakeMessagesClient ──────────────────────────────────────
|
||||
|
||||
type FakeResponse = Partial<Anthropic.Message> & { content: Anthropic.Message['content'] };
|
||||
|
||||
class FakeMessagesClient implements MessagesClient {
|
||||
public calls: Anthropic.MessageCreateParamsNonStreaming[] = [];
|
||||
constructor(private responses: FakeResponse[]) {}
|
||||
async create(
|
||||
params: Anthropic.MessageCreateParamsNonStreaming,
|
||||
): Promise<Anthropic.Message> {
|
||||
this.calls.push(params);
|
||||
if (this.responses.length === 0) throw new Error('FakeMessagesClient: out of scripted responses');
|
||||
const r = this.responses.shift()!;
|
||||
return {
|
||||
id: `msg_${this.calls.length}`,
|
||||
type: 'message',
|
||||
role: 'assistant',
|
||||
model: params.model,
|
||||
stop_reason: 'end_turn',
|
||||
stop_sequence: null,
|
||||
usage: { input_tokens: 10, output_tokens: 5, cache_read_input_tokens: 0, cache_creation_input_tokens: 0 } as any,
|
||||
...r,
|
||||
} as Anthropic.Message;
|
||||
}
|
||||
}
|
||||
|
||||
// Build a synthetic MinionJobContext around a real minion_jobs row. The
|
||||
// handler only reads data/id/signal/shutdownSignal/updateTokens — we stub
|
||||
// the rest. `subagent` is a protected job name (Lane 4H) so tests submit
|
||||
// under the trusted-submit flag.
|
||||
async function makeCtx(input: unknown): Promise<MinionJobContext> {
|
||||
const job = await queue.add(
|
||||
'subagent',
|
||||
input as Record<string, unknown>,
|
||||
{},
|
||||
{ allowProtectedSubmit: true },
|
||||
);
|
||||
const ac = new AbortController();
|
||||
const shutdown = new AbortController();
|
||||
return {
|
||||
id: job.id,
|
||||
name: job.name,
|
||||
data: (input as Record<string, unknown>) ?? {},
|
||||
attempts_made: 0,
|
||||
signal: ac.signal,
|
||||
shutdownSignal: shutdown.signal,
|
||||
async updateProgress() {},
|
||||
async updateTokens() {},
|
||||
async log() {},
|
||||
async isActive() { return true; },
|
||||
async readInbox() { return []; },
|
||||
};
|
||||
}
|
||||
|
||||
// ── Tiny tool registry for tests ────────────────────────────
|
||||
|
||||
function makeEchoTool(name = 'echo', idempotent = true): ToolDef {
|
||||
return {
|
||||
name,
|
||||
description: 'echo input',
|
||||
input_schema: { type: 'object', properties: { value: { type: 'string' } }, required: [] },
|
||||
idempotent,
|
||||
async execute(input) { return { echoed: input }; },
|
||||
};
|
||||
}
|
||||
|
||||
function makeThrowingTool(name = 'broken'): ToolDef {
|
||||
return {
|
||||
name,
|
||||
description: 'always throws',
|
||||
input_schema: { type: 'object', properties: {}, required: [] },
|
||||
idempotent: true,
|
||||
async execute() { throw new Error('tool broken'); },
|
||||
};
|
||||
}
|
||||
|
||||
// ── Tests ───────────────────────────────────────────────────
|
||||
|
||||
describe('subagent handler happy path', () => {
|
||||
test('no-tool end_turn: returns text response + persists user + assistant rows', async () => {
|
||||
const client = new FakeMessagesClient([
|
||||
{ content: [{ type: 'text', text: 'hello world' }] as any, stop_reason: 'end_turn' },
|
||||
]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [] });
|
||||
const ctx = await makeCtx({ prompt: 'hi' });
|
||||
|
||||
const result = await handler(ctx);
|
||||
|
||||
expect(result.result).toBe('hello world');
|
||||
expect(result.turns_count).toBe(1);
|
||||
expect(result.stop_reason).toBe('end_turn');
|
||||
expect(result.tokens.in).toBe(10);
|
||||
expect(result.tokens.out).toBe(5);
|
||||
|
||||
const msgs = await engine.executeRaw<{ count: string }>(
|
||||
`SELECT count(*)::text AS count FROM subagent_messages WHERE job_id = $1`,
|
||||
[ctx.id],
|
||||
);
|
||||
expect(parseInt(msgs[0]!.count, 10)).toBe(2); // user seed + assistant
|
||||
});
|
||||
|
||||
test('single tool_use turn: tool executes, two-phase row goes complete', async () => {
|
||||
const tool = makeEchoTool();
|
||||
const client = new FakeMessagesClient([
|
||||
{
|
||||
content: [
|
||||
{ type: 'tool_use', id: 'tu_1', name: 'echo', input: { value: 'v1' } } as any,
|
||||
],
|
||||
stop_reason: 'tool_use' as any,
|
||||
},
|
||||
{
|
||||
content: [{ type: 'text', text: 'done' }] as any,
|
||||
stop_reason: 'end_turn',
|
||||
},
|
||||
]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
|
||||
const ctx = await makeCtx({ prompt: 'go' });
|
||||
|
||||
const result = await handler(ctx);
|
||||
expect(result.stop_reason).toBe('end_turn');
|
||||
expect(result.result).toBe('done');
|
||||
expect(client.calls.length).toBe(2);
|
||||
|
||||
// tool_executions row complete with echoed output
|
||||
const rows = await engine.executeRaw<{ status: string; output: unknown }>(
|
||||
`SELECT status, output FROM subagent_tool_executions WHERE job_id = $1`,
|
||||
[ctx.id],
|
||||
);
|
||||
expect(rows.length).toBe(1);
|
||||
expect(rows[0]!.status).toBe('complete');
|
||||
const out = typeof rows[0]!.output === 'string' ? JSON.parse(rows[0]!.output as string) : rows[0]!.output;
|
||||
expect(out).toEqual({ echoed: { value: 'v1' } });
|
||||
});
|
||||
|
||||
test('tool throws: row goes failed, model sees error, loop continues', async () => {
|
||||
const tool = makeThrowingTool();
|
||||
const client = new FakeMessagesClient([
|
||||
{
|
||||
content: [{ type: 'tool_use', id: 'tu_1', name: 'broken', input: {} } as any],
|
||||
stop_reason: 'tool_use' as any,
|
||||
},
|
||||
{
|
||||
content: [{ type: 'text', text: 'recovered' }] as any,
|
||||
stop_reason: 'end_turn',
|
||||
},
|
||||
]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
|
||||
const ctx = await makeCtx({ prompt: 'try' });
|
||||
|
||||
const result = await handler(ctx);
|
||||
expect(result.stop_reason).toBe('end_turn');
|
||||
expect(result.result).toBe('recovered');
|
||||
|
||||
const rows = await engine.executeRaw<{ status: string; error: string | null }>(
|
||||
`SELECT status, error FROM subagent_tool_executions WHERE job_id = $1`,
|
||||
[ctx.id],
|
||||
);
|
||||
expect(rows[0]!.status).toBe('failed');
|
||||
expect(rows[0]!.error).toContain('tool broken');
|
||||
});
|
||||
|
||||
test('unknown tool name fails execution but loop continues', async () => {
|
||||
const client = new FakeMessagesClient([
|
||||
{
|
||||
content: [{ type: 'tool_use', id: 'tu_nope', name: 'no_such_tool', input: {} } as any],
|
||||
stop_reason: 'tool_use' as any,
|
||||
},
|
||||
{ content: [{ type: 'text', text: 'ok' }] as any, stop_reason: 'end_turn' },
|
||||
]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [] });
|
||||
const ctx = await makeCtx({ prompt: 'x' });
|
||||
|
||||
const result = await handler(ctx);
|
||||
expect(result.stop_reason).toBe('end_turn');
|
||||
|
||||
const rows = await engine.executeRaw<{ status: string; error: string | null }>(
|
||||
`SELECT status, error FROM subagent_tool_executions WHERE job_id = $1`,
|
||||
[ctx.id],
|
||||
);
|
||||
expect(rows[0]!.status).toBe('failed');
|
||||
expect(rows[0]!.error).toContain('not in the registry');
|
||||
});
|
||||
|
||||
test('max_turns exceeded returns stop_reason=max_turns', async () => {
|
||||
// Model keeps calling tool_use forever; we cap at 2 turns.
|
||||
const echoing: FakeResponse[] = Array.from({ length: 5 }).map((_, i) => ({
|
||||
content: [{ type: 'tool_use', id: `tu_${i}`, name: 'echo', input: {} } as any],
|
||||
stop_reason: 'tool_use' as any,
|
||||
}));
|
||||
const client = new FakeMessagesClient(echoing);
|
||||
const tool = makeEchoTool();
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
|
||||
const ctx = await makeCtx({ prompt: 'loop', max_turns: 2 });
|
||||
|
||||
const result = await handler(ctx);
|
||||
expect(result.stop_reason).toBe('max_turns');
|
||||
expect(result.turns_count).toBe(2);
|
||||
});
|
||||
});
|
||||
|
||||
describe('subagent handler replay (crash recovery)', () => {
|
||||
test('resumes from persisted messages when prior rows exist', async () => {
|
||||
    // Seed an in-progress conversation by running a first handler whose
    // client has only ONE scripted response, then run a second handler on
    // the SAME job with responses starting at turn 2. No duplicate
    // user-seed row (ON CONFLICT DO NOTHING).
    const tool = makeEchoTool();
    const ctx = await makeCtx({ prompt: 'start' });

    // The interrupted handler persists the first assistant message and its
    // tool result, then throws when it asks the client for a second LLM
    // response that was never scripted.
    try {
      const client1 = new FakeMessagesClient([
        {
          content: [{ type: 'tool_use', id: 'tu_1', name: 'echo', input: { v: 1 } } as any],
          stop_reason: 'tool_use' as any,
        },
      ]);
      const interrupted = makeSubagentHandler({ engine, client: client1, toolRegistry: [tool] });
      await interrupted(ctx);
    } catch {
      // Out of scripted responses: simulates a worker kill before turn 2.
    }
|
||||
|
||||
// Confirm partial state: 1 user + 1 assistant + 1 synthesized user
|
||||
// (tool_result) + 1 tool_exec complete.
|
||||
const preRows = await engine.executeRaw<{ c: string }>(
|
||||
`SELECT count(*)::text AS c FROM subagent_messages WHERE job_id = $1`,
|
||||
[ctx.id],
|
||||
);
|
||||
const preCount = parseInt(preRows[0]!.c, 10);
|
||||
expect(preCount).toBeGreaterThanOrEqual(1);
|
||||
|
||||
// Resume with a fresh handler + client that supplies ONE more response.
|
||||
const client2 = new FakeMessagesClient([
|
||||
{ content: [{ type: 'text', text: 'resumed ok' }] as any, stop_reason: 'end_turn' },
|
||||
]);
|
||||
const handler2 = makeSubagentHandler({ engine, client: client2, toolRegistry: [tool] });
|
||||
const result = await handler2(ctx);
|
||||
|
||||
expect(result.result).toBe('resumed ok');
|
||||
expect(result.stop_reason).toBe('end_turn');
|
||||
// Second client should see the prior conversation in the messages
|
||||
// array — at minimum the user seed + prior assistant + tool_result.
|
||||
expect(client2.calls[0]!.messages.length).toBeGreaterThan(1);
|
||||
});
|
||||
|
||||
test('prior completed tool exec is replayed without re-invoking execute', async () => {
|
||||
// Prior state: a completed tool row. We assert the tool's execute is
|
||||
// NOT called on resume. Use a tool that throws if invoked — passing
|
||||
// means we used the replay path.
|
||||
const throwingTool = makeThrowingTool('pre_done');
|
||||
const ctx = await makeCtx({ prompt: 'start' });
|
||||
|
||||
// Seed prior state manually: user, assistant with tool_use, tool_exec complete.
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks)
|
||||
VALUES ($1, 0, 'user', $2::jsonb)`,
|
||||
[ctx.id, JSON.stringify([{ type: 'text', text: 'start' }])],
|
||||
);
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks, model)
|
||||
VALUES ($1, 1, 'assistant', $2::jsonb, 'claude-sonnet-4-6')`,
|
||||
[
|
||||
ctx.id,
|
||||
JSON.stringify([
|
||||
{ type: 'tool_use', id: 'tu_seeded', name: 'pre_done', input: {} },
|
||||
]),
|
||||
],
|
||||
);
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status, output)
|
||||
VALUES ($1, 1, 'tu_seeded', 'pre_done', '{}'::jsonb, 'complete', $2::jsonb)`,
|
||||
[ctx.id, JSON.stringify({ replayed: true })],
|
||||
);
|
||||
|
||||
// Handler MUST NOT call the throwing execute and MUST end the loop on
|
||||
// the next LLM response.
|
||||
const client = new FakeMessagesClient([
|
||||
{ content: [{ type: 'text', text: 'finished after replay' }] as any, stop_reason: 'end_turn' },
|
||||
]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [throwingTool] });
|
||||
const result = await handler(ctx);
|
||||
|
||||
expect(result.stop_reason).toBe('end_turn');
|
||||
expect(result.result).toBe('finished after replay');
|
||||
    // Only one LLM call is made on this resume: the two persisted messages
    // are loaded, the tool_result is synthesized from the completed ledger
    // row, and then the model responds.
|
||||
expect(client.calls.length).toBe(1);
|
||||
});
|
||||
|
||||
test('pending non-idempotent tool exec rejects on resume', async () => {
|
||||
const nonIdempotent = { ...makeEchoTool('do_once'), idempotent: false };
|
||||
const ctx = await makeCtx({ prompt: 'start' });
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks)
|
||||
VALUES ($1, 0, 'user', $2::jsonb)`,
|
||||
[ctx.id, JSON.stringify([{ type: 'text', text: 'start' }])],
|
||||
);
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks)
|
||||
VALUES ($1, 1, 'assistant', $2::jsonb)`,
|
||||
[
|
||||
ctx.id,
|
||||
JSON.stringify([{ type: 'tool_use', id: 'tu_x', name: 'do_once', input: {} }]),
|
||||
],
|
||||
);
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status)
|
||||
VALUES ($1, 1, 'tu_x', 'do_once', '{}'::jsonb, 'pending')`,
|
||||
[ctx.id],
|
||||
);
|
||||
|
||||
const client = new FakeMessagesClient([]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [nonIdempotent] });
|
||||
await expect(handler(ctx)).rejects.toThrow(/non-idempotent/);
|
||||
});
|
||||
});
|
||||
|
||||
describe('subagent handler lease behavior', () => {
|
||||
test('acquires + releases a lease around the LLM call', async () => {
|
||||
const client = new FakeMessagesClient([
|
||||
{ content: [{ type: 'text', text: 'ok' }] as any, stop_reason: 'end_turn' },
|
||||
]);
|
||||
const handler = makeSubagentHandler({
|
||||
engine, client, toolRegistry: [], maxConcurrent: 1, rateLeaseKey: 'k1',
|
||||
});
|
||||
const ctx = await makeCtx({ prompt: 'hi' });
|
||||
await handler(ctx);
|
||||
// No leases should remain after completion.
|
||||
const rows = await engine.executeRaw<{ c: string }>(
|
||||
`SELECT count(*)::text AS c FROM subagent_rate_leases`,
|
||||
);
|
||||
expect(parseInt(rows[0]!.c, 10)).toBe(0);
|
||||
});
|
||||
|
||||
test('throws RateLeaseUnavailableError when cap full', async () => {
|
||||
    // Preload the cap with a live lease (expires in one minute) owned by a
    // different job.
|
||||
const owner = await queue.add('holder', {});
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_rate_leases (key, owner_job_id, expires_at)
|
||||
VALUES ('k_cap', $1, now() + interval '1 minute')`,
|
||||
[owner.id],
|
||||
);
|
||||
const client = new FakeMessagesClient([]);
|
||||
const handler = makeSubagentHandler({
|
||||
engine, client, toolRegistry: [], maxConcurrent: 1, rateLeaseKey: 'k_cap',
|
||||
});
|
||||
const ctx = await makeCtx({ prompt: 'blocked' });
|
||||
await expect(handler(ctx)).rejects.toBeInstanceOf(RateLeaseUnavailableError);
|
||||
});
|
||||
});
|
||||
|
||||
describe('subagent handler input validation', () => {
|
||||
test('missing prompt throws', async () => {
|
||||
const client = new FakeMessagesClient([]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [] });
|
||||
const ctx = await makeCtx({});
|
||||
await expect(handler(ctx)).rejects.toThrow(/prompt/);
|
||||
});
|
||||
|
||||
test('allowed_tools unknown name rejected at dispatch', async () => {
|
||||
const tool = makeEchoTool('real');
|
||||
const client = new FakeMessagesClient([]);
|
||||
const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
|
||||
const ctx = await makeCtx({ prompt: 'x', allowed_tools: ['real', 'ghost_tool'] });
|
||||
await expect(handler(ctx)).rejects.toThrow(/unknown tool/);
|
||||
});
|
||||
});
|
||||
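The replay tests above depend on the two-phase tool ledger: a row is written as pending before the tool runs and settled as complete or failed afterwards. The sketch below illustrates that flow under stated assumptions. `runToolTwoPhase`, `ExecRow`, and `SketchTool` are hypothetical, the ledger is an in-memory `Map` rather than the `subagent_tool_executions` table, `execute` is synchronous for brevity (the real tools are async), and returning failed rows as-is on replay is an assumption.

```typescript
type ExecStatus = 'pending' | 'complete' | 'failed';

interface ExecRow {
  toolUseId: string;
  status: ExecStatus;
  output?: unknown;
  error?: string;
}

interface SketchTool {
  name: string;
  idempotent: boolean;
  execute(input: unknown): unknown; // synchronous in this sketch only
}

function runToolTwoPhase(
  ledger: Map<string, ExecRow>,
  tool: SketchTool,
  toolUseId: string,
  input: unknown,
): ExecRow {
  const prior = ledger.get(toolUseId);
  // A settled row is replayed as-is; execute() is not called again.
  if (prior && prior.status !== 'pending') return prior;
  // A pending row means a crash happened mid-execution. Only idempotent
  // tools are safe to run a second time.
  if (prior && !tool.idempotent) {
    throw new Error('cannot replay pending non-idempotent tool ' + tool.name);
  }
  // Phase 1: record intent before executing.
  const row: ExecRow = { toolUseId, status: 'pending' };
  ledger.set(toolUseId, row);
  // Phase 2: execute, then settle the row either way so the loop can continue.
  try {
    row.output = tool.execute(input);
    row.status = 'complete';
  } catch (e) {
    row.error = e instanceof Error ? e.message : String(e);
    row.status = 'failed';
  }
  return row;
}
```

Catching the tool error and settling the row as failed, instead of rethrowing, mirrors the tested behavior where the model sees the error and the loop continues.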
167
test/subagent-transcript.test.ts
Normal file
@@ -0,0 +1,167 @@
|
||||
/**
 * transcript renderer tests. Uses PGLite in-memory to round-trip messages +
 * tool executions through the actual schema so the loader path is exercised.
 */
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
|
||||
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
|
||||
import { MinionQueue } from '../src/core/minions/queue.ts';
|
||||
import {
|
||||
loadTranscriptRows,
|
||||
renderTranscript,
|
||||
} from '../src/core/minions/transcript.ts';
|
||||
import type { ContentBlock } from '../src/core/minions/types.ts';
|
||||
|
||||
let engine: PGLiteEngine;
|
||||
let queue: MinionQueue;
|
||||
let jobId: number;
|
||||
|
||||
beforeAll(async () => {
|
||||
engine = new PGLiteEngine();
|
||||
await engine.connect({ databaseUrl: '' });
|
||||
await engine.initSchema();
|
||||
queue = new MinionQueue(engine);
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
await engine.disconnect();
|
||||
});
|
||||
|
||||
beforeEach(async () => {
|
||||
await engine.executeRaw('DELETE FROM subagent_messages');
|
||||
await engine.executeRaw('DELETE FROM subagent_tool_executions');
|
||||
await engine.executeRaw('DELETE FROM minion_jobs');
|
||||
const j = await queue.add(
|
||||
'subagent',
|
||||
{ prompt: 'hi' },
|
||||
{},
|
||||
{ allowProtectedSubmit: true },
|
||||
);
|
||||
jobId = j.id;
|
||||
});
|
||||
|
||||
async function insertMessage(
|
||||
idx: number,
|
||||
role: 'user' | 'assistant',
|
||||
blocks: ContentBlock[],
|
||||
tokens: { in?: number; out?: number; cache_read?: number; cache_create?: number } = {},
|
||||
model = 'claude-sonnet-4-6',
|
||||
) {
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks, tokens_in, tokens_out, tokens_cache_read, tokens_cache_create, model)
|
||||
VALUES ($1, $2, $3, $4::jsonb, $5, $6, $7, $8, $9)`,
|
||||
[jobId, idx, role, JSON.stringify(blocks), tokens.in ?? null, tokens.out ?? null, tokens.cache_read ?? null, tokens.cache_create ?? null, model],
|
||||
);
|
||||
}
|
||||
|
||||
async function insertTool(
|
||||
idx: number,
|
||||
toolUseId: string,
|
||||
toolName: string,
|
||||
input: unknown,
|
||||
status: 'pending' | 'complete' | 'failed',
|
||||
output: unknown = null,
|
||||
error: string | null = null,
|
||||
) {
|
||||
await engine.executeRaw(
|
||||
`INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status, output, error)
|
||||
VALUES ($1, $2, $3, $4, $5::jsonb, $6, $7::jsonb, $8)`,
|
||||
[jobId, idx, toolUseId, toolName, JSON.stringify(input), status, output == null ? null : JSON.stringify(output), error],
|
||||
);
|
||||
}
|
||||
|
||||
describe('loadTranscriptRows', () => {
  test('empty job returns empty arrays', async () => {
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    expect(messages).toEqual([]);
    expect(tools).toEqual([]);
  });

  test('returns messages in message_idx order', async () => {
    await insertMessage(1, 'assistant', [{ type: 'text', text: 'second' }]);
    await insertMessage(0, 'user', [{ type: 'text', text: 'first' }]);
    const { messages } = await loadTranscriptRows(engine, jobId);
    expect(messages.map(m => m.message_idx)).toEqual([0, 1]);
  });

  test('parses content_blocks from JSONB', async () => {
    const block: ContentBlock = { type: 'tool_use', id: 'tu_1', name: 'brain_search', input: { q: 'x' } };
    await insertMessage(0, 'assistant', [block]);
    const { messages } = await loadTranscriptRows(engine, jobId);
    expect(messages[0]!.content_blocks[0]!.type).toBe('tool_use');
  });
});

describe('renderTranscript', () => {
  test('empty messages produce a "no messages" placeholder', () => {
    const md = renderTranscript([], []);
    expect(md).toContain('# Subagent transcript');
    expect(md).toContain('_(no messages)_');
  });

  test('renders text content under role headers', async () => {
    await insertMessage(0, 'user', [{ type: 'text', text: 'hello' }]);
    await insertMessage(1, 'assistant', [{ type: 'text', text: 'hi back' }], { in: 5, out: 3 });
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    const md = renderTranscript(messages, tools);
    expect(md).toContain('## Message 0 — user');
    expect(md).toContain('hello');
    expect(md).toContain('## Message 1 — assistant');
    expect(md).toContain('hi back');
    expect(md).toContain('tokens:');
    expect(md).toContain('in=5');
  });

  test('renders tool_use with matching execution row', async () => {
    await insertMessage(0, 'assistant', [
      { type: 'tool_use', id: 'tu_42', name: 'brain_get_page', input: { slug: 'foo' } },
    ]);
    await insertTool(0, 'tu_42', 'brain_get_page', { slug: 'foo' }, 'complete', { title: 'Foo' });
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    const md = renderTranscript(messages, tools);
    expect(md).toContain('**tool_use** `brain_get_page`');
    expect(md).toContain('status: **complete**');
    expect(md).toContain('"title": "Foo"');
  });

  test('renders tool_use with failed execution row shows error', async () => {
    await insertMessage(0, 'assistant', [
      { type: 'tool_use', id: 'tu_43', name: 'brain_put_page', input: { slug: 'bad' } },
    ]);
    await insertTool(0, 'tu_43', 'brain_put_page', { slug: 'bad' }, 'failed', null, 'permission_denied');
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    const md = renderTranscript(messages, tools);
    expect(md).toContain('status: **failed**');
    expect(md).toContain('permission_denied');
  });

  test('pending tool execution is shown as pending', async () => {
    await insertMessage(0, 'assistant', [
      { type: 'tool_use', id: 'tu_44', name: 'brain_search', input: { q: 'x' } },
    ]);
    await insertTool(0, 'tu_44', 'brain_search', { q: 'x' }, 'pending');
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    const md = renderTranscript(messages, tools);
    expect(md).toContain('pending (no resolution recorded yet)');
  });

  test('truncates huge tool outputs per maxOutputBytes', async () => {
    await insertMessage(0, 'assistant', [
      { type: 'tool_use', id: 'tu_big', name: 'brain_search', input: {} },
    ]);
    const huge = 'x'.repeat(8000);
    await insertTool(0, 'tu_big', 'brain_search', {}, 'complete', { body: huge });
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    const md = renderTranscript(messages, tools, { maxOutputBytes: 1024 });
    expect(md).toContain('[truncated at 1024 bytes]');
    expect(md.length).toBeLessThan(huge.length);
  });

  test('unknown block types fall through to a JSON dump', async () => {
    await insertMessage(0, 'assistant', [{ type: 'some_future_block_type', extra: 42 } as any]);
    const { messages, tools } = await loadTranscriptRows(engine, jobId);
    const md = renderTranscript(messages, tools);
    expect(md).toContain('**some_future_block_type**');
    expect(md).toContain('"extra": 42');
  });
});
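Reviewer note: the `truncates huge tool outputs per maxOutputBytes` test pins only the marker text and the fact that the rendered output is shorter than the raw payload. The sketch below shows one way a renderer could meet that contract. It is not the code in `src/core/minions/transcript.ts`; the helper name `renderToolOutput`, the 4096-byte default, and the choice to cut on UTF-8 bytes are all assumptions made for illustration.

```typescript
// Hypothetical helper (not from the PR): pretty-print a tool output and cap
// it at maxOutputBytes UTF-8 bytes, appending the marker the test looks for.
function renderToolOutput(output: unknown, maxOutputBytes = 4096): string {
  // Two-space indentation yields the `"title": "Foo"` shape the tests match.
  const json = JSON.stringify(output, null, 2);
  const bytes = new TextEncoder().encode(json);
  if (bytes.length <= maxOutputBytes) return json;

  // Keep only the first maxOutputBytes bytes. If the cut lands inside a
  // multi-byte character, TextDecoder substitutes U+FFFD instead of throwing.
  const head = new TextDecoder().decode(bytes.slice(0, maxOutputBytes));
  return `${head}\n[truncated at ${maxOutputBytes} bytes]`;
}

// Usage: an 8000-character body capped at 1024 bytes.
const rendered = renderToolOutput({ body: 'x'.repeat(8000) }, 1024);
console.log(rendered.endsWith('[truncated at 1024 bytes]'));
```

Cutting on bytes rather than characters keeps the number in the marker honest for non-ASCII payloads, at the cost of a possible replacement character at the boundary.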
112
test/wait-for-completion.test.ts
Normal file
@@ -0,0 +1,112 @@
/**
 * waitForCompletion tests. Uses PGLite in-memory so the poll path exercises
 * a real getJob over a real engine.
 */

import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { MinionQueue } from '../src/core/minions/queue.ts';
import { waitForCompletion, TimeoutError, __testing } from '../src/core/minions/wait-for-completion.ts';

let engine: PGLiteEngine;
let queue: MinionQueue;

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({ databaseUrl: '' });
  await engine.initSchema();
  queue = new MinionQueue(engine);
});

afterAll(async () => {
  await engine.disconnect();
});

beforeEach(async () => {
  await engine.executeRaw('DELETE FROM minion_jobs');
});

describe('waitForCompletion terminal states', () => {
  test('TERMINAL_STATES covers every terminal MinionJobStatus value', () => {
    expect(__testing.TERMINAL_STATES).toEqual(['completed', 'failed', 'dead', 'cancelled']);
  });

  test('returns immediately when job already completed (fast path)', async () => {
    const j = await queue.add('t', {});
    const claimed = await queue.claim('tok', 30000, 'default', ['t']);
    await queue.completeJob(claimed!.id, 'tok', { ok: true });

    const t0 = Date.now();
    const res = await waitForCompletion(queue, j.id, { pollMs: 500 });
    expect(res.status).toBe('completed');
    expect(Date.now() - t0).toBeLessThan(300); // no full poll cycle
  });

  test('returns when job transitions to failed mid-wait', async () => {
    const j = await queue.add('t', {});
    const p = waitForCompletion(queue, j.id, { pollMs: 25, timeoutMs: 5000 });
    // Transition the job to failed after a brief delay.
    setTimeout(async () => {
      const claimed = await queue.claim('tok', 30000, 'default', ['t']);
      await queue.failJob(claimed!.id, 'tok', 'boom', 'failed');
    }, 60);
    const res = await p;
    expect(res.status).toBe('failed');
  });

  test('returns when job transitions to cancelled', async () => {
    const j = await queue.add('t', {});
    const p = waitForCompletion(queue, j.id, { pollMs: 25, timeoutMs: 5000 });
    setTimeout(() => { queue.cancelJob(j.id); }, 60);
    const res = await p;
    expect(res.status).toBe('cancelled');
  });

  test('throws TimeoutError when job stays non-terminal past timeoutMs', async () => {
    const j = await queue.add('t', {});
    await expect(
      waitForCompletion(queue, j.id, { pollMs: 25, timeoutMs: 100 })
    ).rejects.toBeInstanceOf(TimeoutError);
  });

  test('TimeoutError carries the jobId and elapsedMs', async () => {
    const j = await queue.add('t', {});
    try {
      await waitForCompletion(queue, j.id, { pollMs: 25, timeoutMs: 80 });
      throw new Error('should have thrown');
    } catch (e) {
      expect(e).toBeInstanceOf(TimeoutError);
      const te = e as TimeoutError;
      expect(te.jobId).toBe(j.id);
      expect(te.elapsedMs).toBeGreaterThanOrEqual(80);
    }
  });

  test('TimeoutError does NOT cancel the job', async () => {
    const j = await queue.add('t', {});
    try {
      await waitForCompletion(queue, j.id, { pollMs: 25, timeoutMs: 80 });
    } catch {}
    const still = await queue.getJob(j.id);
    expect(still?.status).toBe('waiting');
  });

  test('AbortSignal exits loop early without throwing', async () => {
    const j = await queue.add('t', {});
    const ac = new AbortController();
    setTimeout(() => ac.abort(), 50);
    const res = await waitForCompletion(queue, j.id, {
      pollMs: 25,
      timeoutMs: 5000,
      signal: ac.signal,
    });
    expect(res.id).toBe(j.id);
    // Still waiting — we just stopped polling.
    expect(res.status).toBe('waiting');
  });

  test('throws when job id does not exist', async () => {
    await expect(waitForCompletion(queue, 99_999, { pollMs: 10, timeoutMs: 100 }))
      .rejects.toThrow(/not found/);
  });
});
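Reviewer note: taken together, the tests above describe a poll loop with four exits: a terminal status returns the job, a missing job throws, an aborted signal returns the last-seen job without throwing, and an expired timeout throws `TimeoutError` without touching the job. The following is a minimal sketch of a loop with that shape, assuming nothing about the real `src/core/minions/wait-for-completion.ts`. The `JobSource` interface, the `waitForCompletionSketch` name, the local `TimeoutError` class, and both default values are hypothetical stand-ins.

```typescript
// Hypothetical stand-ins for the queue and job types used by the real module.
type JobStatus = 'waiting' | 'active' | 'completed' | 'failed' | 'dead' | 'cancelled';
interface Job { id: number; status: JobStatus }
interface JobSource { getJob(id: number): Promise<Job | null> }

// Same list the first test pins via __testing.TERMINAL_STATES.
const TERMINAL_STATES: readonly JobStatus[] = ['completed', 'failed', 'dead', 'cancelled'];

class TimeoutError extends Error {
  readonly jobId: number;
  readonly elapsedMs: number;
  constructor(jobId: number, elapsedMs: number) {
    super(`job ${jobId} not terminal after ${elapsedMs}ms`);
    this.name = 'TimeoutError';
    this.jobId = jobId;
    this.elapsedMs = elapsedMs;
  }
}

async function waitForCompletionSketch(
  source: JobSource,
  jobId: number,
  opts: { pollMs?: number; timeoutMs?: number; signal?: AbortSignal } = {},
): Promise<Job> {
  const pollMs = opts.pollMs ?? 1000;        // assumed default
  const timeoutMs = opts.timeoutMs ?? 60000; // assumed default
  const start = Date.now();
  for (;;) {
    const job = await source.getJob(jobId);
    if (!job) throw new Error(`job ${jobId} not found`);
    // Checking before the first sleep gives the fast path for finished jobs.
    if (TERMINAL_STATES.includes(job.status)) return job;
    // Abort stops polling and hands back the last-seen job; it does not throw.
    if (opts.signal?.aborted) return job;
    const elapsed = Date.now() - start;
    // Timeout only stops the waiter. The job itself is never cancelled here.
    if (elapsed >= timeoutMs) throw new TimeoutError(jobId, elapsed);
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In this sketch the abort signal is only checked once per poll, so an abort can take up to `pollMs` to be noticed; a production loop might race the sleep against the signal instead.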