* refactor(mcp): extract buildToolDefs helper for subagent tool registry reuse The inline operations.map(...) block in src/mcp/server.ts became the only source of truth for agent-facing tool definitions. Extract into a reusable exported helper so the v0.15 subagent tool registry can call it with a filtered OPERATIONS subset instead of duplicating the shape. Byte-for-byte equivalence regression pinned in test/mcp-tool-defs.test.ts — legacy inline mapping kept verbatim inside the test so any future drift between the new helper and the pre-extraction MCP schema fails loudly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(operations): subagent-aware OperationContext + put_page namespace Adds three optional fields to OperationContext: - jobId?: number — the currently running Minion job id - subagentId?: number — the owning subagent job id for tool-dispatched calls - viaSubagent?: boolean — FAIL-CLOSED flag for agent-path gating put_page now enforces a namespace rule when invoked on the subagent tool dispatch path (viaSubagent=true): writes MUST target `wiki/agents/<subagentId>/...`. Anchored, slash-boundary enforced so a collision like `wiki/agents/12evil/...` can't impersonate subagent 12. The check runs BEFORE the dry-run short-circuit so preview calls surface the same rejection. Fail-closed: a missing subagentId with viaSubagent=true rejects every slug rather than letting a dispatcher bug open a hole. Existing callers unaffected — all three fields are optional and the legacy put_page behavior is unchanged when viaSubagent is undefined/false. 
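The namespace rule above can be sketched as a small predicate. This is a minimal illustration, not the shipped put_page code: `SubagentWriteContext` and `isSlugAllowed` are hypothetical names, and the real operation emits a `permission_denied` error rather than returning a boolean.

```typescript
// Hypothetical shape: only the two context fields the check needs.
interface SubagentWriteContext {
  viaSubagent?: boolean;
  subagentId?: number;
}

// Returns true when the write may proceed under the namespace rule.
function isSlugAllowed(slug: string, ctx: SubagentWriteContext): boolean {
  // Legacy callers (local CLI, MCP) never set viaSubagent, so nothing changes for them.
  if (!ctx.viaSubagent) return true;

  // Fail closed: without a usable id there is no namespace to write into.
  const id = ctx.subagentId;
  if (id === undefined || !Number.isInteger(id) || id < 0) return false;

  // Anchored prefix ending in a slash, followed by at least one more character.
  // The trailing slash is what stops "wiki/agents/12evil/..." from matching id 12.
  const prefix = `wiki/agents/${id}/`;
  return slug.startsWith(prefix) && slug.length > prefix.length;
}
```

Running the check before any dry-run short-circuit, as the commit describes, would mean a preview call sees the same rejection as a real write.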
12 regression + namespace tests pin:

- local CLI writes (viaSubagent unset) accept arbitrary slugs
- MCP writes (remote=true, viaSubagent unset) accept arbitrary slugs
- subagent-path: anchored prefix accepted, wrong id rejected, prefix-collision defeated, leading-slash rejected, bare-prefix rejected, fail-closed on missing/NaN subagentId, permission_denied code emitted

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(schema): v0.15.0 subagent runtime tables + migration orchestrator

Adds three new tables for the durable LLM agent runtime:

- subagent_messages: Anthropic message-block persistence. Parallel tool_use blocks in one assistant message live in content_blocks JSONB, not across rows (fixes the (job_id, turn_idx, role) misdesign codex caught in v0.13 drafting).
- subagent_tool_executions: Two-phase tool ledger. INSERT pending before execute, UPDATE complete/failed after. Replay re-runs pending rows only if the tool is idempotent (v1 ships only idempotent tools so this is preventive).
- subagent_rate_leases: Lease-based concurrency cap for outbound providers (e.g. anthropic:messages). Stale leases auto-prune on next acquire so crashed workers can't strand capacity.

All DDL uses CREATE TABLE/INDEX IF NOT EXISTS, so it is order-independent vs PR #244's initSchema() reorder and idempotent across fresh-install + upgrade paths. Shipped in both src/schema.sql (Postgres) and src/core/pglite-schema.ts (PGLite); schema-embedded.ts regenerated.

Migration orchestrator v0_15_0.ts (phases: schema → verify → record). v0_14_0.ts is a no-op stub so the registry's version sequence stays gapless (v0.14.0 shipped shell-jobs: code change, no DB migration).

10 unit tests for registry wiring, ordering, dry-run phase behavior, and schema-embedded table presence. test/apply-migrations.test.ts updated for the two new registry entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(minions): emit child_done on every terminal + max_stalled per-job + terminal set fix Three correctness fixes the v0.15 subagent aggregator spine depends on: 1. child_done emission on ALL terminal transitions, not just success. - completeJob already emitted on success — now also tags outcome='complete'. - failJob newly emits on terminal 'failed' or 'dead' (outcome='failed'|'dead', error=<text>), BEFORE the parent-terminal UPDATE so the EXISTS guard on the inbox INSERT doesn't skip it on fail_parent paths (codex catch). - cancelJob now emits outcome='cancelled' per descendant with a parent. - handleTimeouts now emits outcome='timeout' per timed-out child. ChildDoneMessage gains optional { outcome, error } — backwards compatible (legacy writers omitted them; consumers treat absent outcome as 'complete'). 2. Parent-resolution terminal set now includes 'failed'. Pre-v0.15 the `NOT EXISTS (... status NOT IN ('completed','dead','cancelled'))` guard treated a failed child as still-pending, stranding aggregator parents that chose on_child_fail='continue' or 'ignore' in waiting-children forever. Expanded to {completed, failed, dead, cancelled} everywhere parent resolution reads child status (completeJob inline, failJob remove_dep + continue, cancelJob sweep, handleTimeouts sweep, and the resolveParent method itself). 3. MinionJobInput.max_stalled threads through MinionQueue.add() on INSERT. Column exists with default 1 — that is "first stall → dead", which defeats crash recovery for long-running handlers. Subagent children will set max_stalled: 3 to survive mid-run worker kills. Second-submitter under an idempotency-key hit does NOT mutate the existing row (codex-flagged footgun — first-submit options are load-bearing state). 
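The terminal-set fix in item 2 can be illustrated without a database. The sketch below is an assumption-laden stand-in for the SQL `NOT EXISTS` guard: `allChildrenTerminal` and `outcomeOf` are hypothetical helper names, and child statuses are passed in as plain strings.

```typescript
// Terminal sets as described in the commit: the old guard omitted 'failed'.
const PRE_V015_TERMINAL: ReadonlySet<string> = new Set(["completed", "dead", "cancelled"]);
const V015_TERMINAL: ReadonlySet<string> = new Set(["completed", "failed", "dead", "cancelled"]);

// A parent may resolve only when every child status is in the terminal set.
function allChildrenTerminal(childStatuses: string[], terminal: ReadonlySet<string>): boolean {
  return childStatuses.every((status) => terminal.has(status));
}

// Legacy child_done writers omitted `outcome`; consumers treat that as 'complete'.
function outcomeOf(message: { outcome?: string }): string {
  return message.outcome ?? "complete";
}
```

With children `["completed", "failed"]`, the old set reports the parent as still waiting while the expanded set lets it resolve, which is the stranding case the commit describes for on_child_fail='continue'.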
13 unit tests pin: emission on each of completeJob/failJob/cancelJob/handleTimeouts, insertion order on fail_parent, terminal-set expansion with continue policy, max_stalled default + override + idempotency behavior. E2E tier 1 (Postgres) passes 141 tests unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): rate-leases + waitForCompletion infra for v0.15 subagent

Two infrastructure modules the subagent handler spine depends on:

rate-leases.ts: lease-based concurrency cap for outbound providers (anthropic:messages, openai:*, etc.). Counter-based limiters leak capacity on worker crash; leases are owner-tagged rows with expires_at that auto-prune on the next acquire. Two-phase: txn-scoped pg_advisory_xact_lock guards the check-then-insert so concurrent acquires can't both win the "last slot". renewLeaseWithBackoff retries 3x (250/500/1000ms) for mid-call DB blips; on persistent failure the LLM-loop caller aborts with a renewable error so the worker re-claims and the rate invariant is preserved. Owner FK cascades clean up leases on job deletion.

wait-for-completion.ts: poll-until-terminal helper for CLI callers. Minions' NOTIFY is worker-side only; `gbrain agent run --follow` polls getJob() until status is {completed, failed, dead, cancelled}. TimeoutError carries jobId + elapsedMs and does NOT cancel the job, so the user can inspect via `gbrain jobs get <id>` later. Supports AbortSignal for Ctrl-C without throwing. Default pollMs is 1000 on Postgres, 250 on PGLite (inline CLI has no network RTT).

21 unit tests cover: single/multi acquire under cap, rejection past cap, release frees slot, different keys are independent, stale prune, cascade on owner delete, renew bumps expires_at, renew on missing is false, backoff path success + pruned short-circuit. waitForCompletion: fast-path terminal, transitions mid-wait (completed/failed/cancelled), TimeoutError shape, abort-signal early exit, non-existent job error.
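To make the lease idea concrete, here is a single-process sketch of a lease-based cap with stale pruning on acquire. It is an in-memory stand-in, not rate-leases.ts: `InMemoryLeaseLimiter` is a hypothetical name, the clock is passed in as `now` for determinism, and nothing here models the Postgres rows or the `pg_advisory_xact_lock` guard the commit relies on for concurrent acquires.

```typescript
interface Lease {
  owner: number;     // job id that holds the slot
  expiresAt: number; // epoch ms after which the lease counts as stale
}

class InMemoryLeaseLimiter {
  private readonly leases = new Map<string, Lease[]>();
  private readonly maxConcurrent: number;
  private readonly ttlMs: number;

  constructor(maxConcurrent: number, ttlMs: number) {
    this.maxConcurrent = maxConcurrent;
    this.ttlMs = ttlMs;
  }

  // Prune stale leases for this key, then take a slot if one is free.
  acquire(key: string, owner: number, now: number): boolean {
    const live = (this.leases.get(key) ?? []).filter((l) => l.expiresAt > now);
    const acquired = live.length < this.maxConcurrent;
    if (acquired) live.push({ owner, expiresAt: now + this.ttlMs });
    this.leases.set(key, live);
    return acquired;
  }

  // Extend a live lease; returns false if it is missing or already expired.
  renew(key: string, owner: number, now: number): boolean {
    const lease = (this.leases.get(key) ?? []).find(
      (l) => l.owner === owner && l.expiresAt > now,
    );
    if (lease === undefined) return false;
    lease.expiresAt = now + this.ttlMs;
    return true;
  }

  // Give the slot back explicitly (the success and error paths both call this).
  release(key: string, owner: number): void {
    const rest = (this.leases.get(key) ?? []).filter((l) => l.owner !== owner);
    this.leases.set(key, rest);
  }
}
```

The point of the design is visible even in this toy: a crashed owner never calls `release`, but its slot still comes back once `expiresAt` passes, which a plain counter cannot offer.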
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent ToolDef types + brain-tool registry (v0.15)

Types first so the handler has a stable contract:

- SubagentHandlerData / AggregatorHandlerData: the two job.data shapes
- ToolCtx (engine, jobId, remote, signal) + ToolDef (name, description, input_schema, idempotent, execute): Anthropic-envelope, distinct from the MCP McpToolDef extraction landed earlier
- ContentBlock discriminated union for subagent_messages.content_blocks
- SubagentStopReason + SubagentResult emitted on terminal completion

brain-allowlist.ts derives one ToolDef per allow-listed OPERATION. Reuses the ParamDef → JSONSchema shape from the MCP extraction in a local helper (Anthropic's input_schema field diverges from MCP's inputSchema by a character). The 11-name allow-list is read-safe + put_page: every destructive / filesystem / identity-mutating op stays off by default.

put_page gets a namespace-wrapped tool schema: `slug` pattern = anchored `^wiki/agents/<subagentId>/.+`. The server-side check in put_page op (shipped in prior commit) is still the authoritative gate; the schema just helps the model write correct slugs first-try. `subagentId` is plumbed into the ToolCtx so the viaSubagent=true fail-closed path lights up on every tool-dispatched put_page.

filterAllowedTools narrows a registry by subagent_def's allowed_tools frontmatter field. Rejects unknown names at load time (no silent drop: typos in a skills/subagents/*.md would otherwise ship to prod with a tool silently missing).

18 tests pin: every allowlist name exists in OPERATIONS (catches upstream rename), Anthropic name regex, put_page namespace pattern per-subagent, execute() routes through the op handler with viaSubagent=true, out-of-namespace put_page throws permission_denied, filter passes prefixed + unprefixed names, rejects unknowns, deduplicates.
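The "reject unknowns loudly, deduplicate" behaviour of filterAllowedTools can be sketched as follows. This is an illustration under assumptions: `ToolDefSketch` is a trimmed hypothetical type, the tool name `get_page` is invented for the example, and the prefixed-name handling mentioned in the tests is left out because the commit does not spell out the prefix.

```typescript
// Trimmed, hypothetical stand-in for the real ToolDef.
interface ToolDefSketch {
  name: string;
  description: string;
}

// Narrow a registry to the names a subagent def allows.
// Unknown names throw instead of being dropped, so a typo fails at load time.
function filterAllowedToolsSketch(
  registry: ToolDefSketch[],
  allowed: string[],
): ToolDefSketch[] {
  const byName = new Map<string, ToolDefSketch>();
  for (const tool of registry) byName.set(tool.name, tool);

  const unknown = allowed.filter((name) => !byName.has(name));
  if (unknown.length > 0) {
    throw new Error(`unknown tool name(s): ${unknown.join(", ")}`);
  }

  // Deduplicate while keeping the order the def listed them in.
  const result: ToolDefSketch[] = [];
  for (const name of Array.from(new Set(allowed))) {
    const tool = byName.get(name);
    if (tool !== undefined) result.push(tool);
  }
  return result;
}
```

Throwing at load time trades a noisier worker startup for never shipping a subagent whose tool silently does not exist.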
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent-audit JSONL + transcript renderer

Two small plumbing pieces the v0.15 subagent handler + `gbrain agent logs` depend on:

subagent-audit.ts: JSONL-rotated audit log mirroring the shell-audit pattern. Two event flavors: submission (one line per job submit) and heartbeat (one line per turn boundary: llm_call_started / completed / tool_called / tool_result / tool_failed). Heartbeats fix the "--follow on a long Anthropic call shows nothing for 30 seconds" problem codex flagged. Never logs prompts or tool inputs (PII risk: subagent input_vars may carry user-supplied free text); DOES log tokens, ms_elapsed, tool_name, first 200 chars of error text. Rotates weekly via ISO week. `readSubagentAuditForJob` is the readback path for `gbrain agent logs`; it scans the current + prior week file so job boundaries across weeks still resolve. `GBRAIN_AUDIT_DIR` overrides the default ~/.gbrain/audit/ for container deploys.

transcript.ts: renders subagent_messages + subagent_tool_executions to markdown. Message order is authoritative; tool rows splice under their owning assistant tool_use by tool_use_id. Handles text, tool_use (with pending / complete / failed execution rows), tool_result (skipped if we already rendered the owning tool_use, which avoids double-printing), and unknown block types (fenced JSON dump for diagnostics). Output is UTF-8-safe truncated at maxOutputBytes.

21 unit tests: ISO week filename rotation (incl. 2027-01-01 → W53-2026 boundary), submission + heartbeat write shapes, 200-char error cap, best-effort write failure doesn't throw, readback filters by job_id and sinceIso. Transcript: empty input, ordering, token line, tool_use + complete/failed/pending execution rendering, truncation, unknown-block diagnostic dump.
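Weekly rotation by ISO week has one well-known trap, the year boundary that the tests above pin. A minimal sketch of the week computation follows; `isoWeek` and `auditFileName` are hypothetical names, and the exact file name layout is an assumption (the commit only shows the `subagent-jobs-*.jsonl` glob).

```typescript
// ISO-8601 week: weeks start on Monday and belong to the year of their Thursday.
function isoWeek(date: Date): { year: number; week: number } {
  // Work on a UTC midnight copy so local time zones cannot shift the day.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek); // move to this week's Thursday
  const year = d.getUTCFullYear();
  const yearStart = Date.UTC(year, 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return { year, week };
}

// Hypothetical file name layout matching the subagent-jobs-*.jsonl glob.
function auditFileName(date: Date): string {
  const { year, week } = isoWeek(date);
  return `subagent-jobs-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}
```

Under this scheme a heartbeat written on 2027-01-01 lands in the week-53 file for 2026, so a reader that only checks the calendar year of a timestamp would look in the wrong file.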
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(minions): subagent LLM-loop handler with crash-resumable replay The main event: runs one Anthropic Messages API conversation with tool use, persists every turn + tool execution, and resumes cleanly after a worker kill anywhere in the loop. Design points that carry the v0.15 guarantees: 1. Two-phase tool persistence. INSERT status='pending' before dispatch, UPDATE to 'complete' or 'failed' after. subagent_messages rows are the canonical conversation; subagent_tool_executions rows are the canonical "did this tool run + what did it return". Either DB commit is atomic, so replay has a single source of truth. 2. Replay reconciliation. If the last persisted message is an assistant with tool_use blocks AND no following synthesized user message, we crashed mid-dispatch. On resume, finish those tools first (respecting idempotent flag for 'pending' rows), synthesize the user turn, and THEN call the LLM again. Non-idempotent pending rows abort the job with a clear error — v0.15 ships only idempotent tools so this is preventive. 3. Rate lease around every LLM call. acquireLease before, releaseLease after (both success and error paths). acquired=false throws RateLeaseUnavailableError — the worker treats it as a renewable error and re-claims later, so a temporary capacity cap doesn't fail the job terminally. 4. Anthropic prompt caching. system block gets cache_control=ephemeral; the LAST tool def gets it too (Anthropic caches everything up to and including the marked block). ~10x cost reduction on multi-turn agents per the plan. 5. Dual-signal abort. AbortSignal.any merges ctx.signal (timeout / lock loss / cancel) with ctx.shutdownSignal (worker SIGTERM). Both feed the Anthropic call's AbortSignal; mid-turn abort bails before the next LLM call with whatever turns are already persisted. Node ≥ 20 has AbortSignal.any; older runtimes get a manual-merge polyfill. 6. Injectable Anthropic client. 
The real SDK implements MessagesClient structurally; tests inject a FakeMessagesClient that scripts responses.

12 unit tests pin: no-tool happy path, single tool_use complete, tool throws → failed row + loop continues, unknown tool name rejection, max_turns cap, crash-then-resume with partial state, replay skips already-complete tool execs without re-invoking execute, non-idempotent pending rejects on resume, lease acquire + release roundtrip, RateLeaseUnavailable under cap-full, missing prompt validation, allowed_tools unknown-name.

NOT in v0.15: refusal detection (stop_reason + content shape), stop_reason=max_tokens partial recovery, mid-call lease renewal with backoff loop. All three are documented as P2 items in the plan file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(minions): subagent_aggregator handler with mixed-outcome rendering

Claims AFTER all subagent children resolve. By then Lane 1B's queue changes have posted one child_done message per terminal transition into this job's inbox (complete / failed / dead / cancelled / timeout). The aggregator reads those, builds a deterministic markdown summary, and returns it as the handler result. Not an LLM call in v0.15: output is reproducible concatenation so fan-out runs stay comparable. v0.16+ can add an LLM synthesis pass behind an opt-in flag.
Contract: - empty children_ids → `(no children)` marker - missing child_done (shouldn't happen under v0.15 invariants but possible if a terminal-state path slipped past Lane 1B) → counted as failed with "no child_done message observed" error - non-complete outcomes: result is null in the output so no payload leaks alongside a failure label - children appear in the order children_ids was supplied - custom aggregate_prompt_template replaces the markdown header 13 unit tests cover: empty input, all-success, mixed outcomes, result suppression on failure, missing child_done handling, order preservation, custom template, progress + log emission, stringified JSONB payload parsing, non-child_done inbox filtering, legacy-writer outcome fallback, and internal helper edges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(minions): GBRAIN_PLUGIN_PATH loader + plugin-authors guide (v0.15) Plumbing that makes Wintermute (and future downstream agents) day-1 usable on v0.15. Host repos drop a `gbrain.plugin.json` + `subagents/` directory somewhere, set GBRAIN_PLUGIN_PATH (colon-separated like \$PATH), and their custom subagent defs load at worker startup. Path policy is strict: absolute paths only. Relative, ~-prefixed, and URL-style (https://, file://) all rejected with warnings — the user controls where plugins live. Non-existent paths and files (not dirs) are warned and skipped so a typo doesn't crash worker startup. Collision policy: left-wins. If two plugins ship a subagent with the same name, the first one in GBRAIN_PLUGIN_PATH keeps it and the other gets a warning naming both sources. Deterministic + debuggable. Trust policy: plugins ship subagent defs ONLY. Cannot declare new tools, cannot extend the brain allow-list, cannot override safety flags. 
The subagent def's `allowed_tools:` frontmatter MUST subset the derived registry. Validation happens at load time (worker startup), not at dispatch time, so a typo in a skill gives a loud startup error instead of silently "tool never fires at 3am."

Manifest `plugin_version: "gbrain-plugin-v1"` locks the contract. Unknown versions rejected. `subagents` field escape attempts (`../../../etc` etc) rejected. gray-matter handles the markdown frontmatter parse; subagent defs don't conform to the page schema, so we don't use parseMarkdown.

docs/guides/plugin-authors.md is the Wintermute-facing walkthrough. Covers the minimum viable plugin shape, the three policies, the frontmatter fields, known caveats (audit JSONL is local-only, tool calls always run remote=true, put_page is namespace-scoped).

22 unit tests pin path rejection, missing/invalid manifest, unsupported version, escape-attempt, basename fallback for missing frontmatter.name, allowed_tools round-trip, unknown-tool rejection with validAgentToolNames, empty env, multi-path, collision warning with left-wins, trimmed paths, manifest-rejection as warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): gbrain agent run + logs + worker registration (v0.15 Lane 4H)

Three integration seams wired:

src/commands/agent.ts: `gbrain agent run`. Submits subagent jobs (or a fan-out of N + aggregator) under the trusted-submit flag so the PROTECTED_JOB_NAMES guard doesn't reject. Fan-out path creates the aggregator first (so children can reference its id as parent), submits each child with on_child_fail='continue' (required by Lane 1B's terminal-set + child_done machinery), then jsonb_set's the aggregator's children_ids. Short-circuits a 1-entry manifest to a single subagent with no aggregator. Follow mode runs agent-logs streaming + waitForCompletion in parallel and exits on terminal status; detach prints the job id and exits.
Ctrl-C is handled as detach, not cancel: the job keeps running, consistent with durability invariants.

src/commands/agent-logs.ts: `gbrain agent logs`. Merges ~/.gbrain/audit/subagent-jobs-*.jsonl (heartbeats + submissions) with subagent_messages (persisted conversation) in one chronological stream. --follow polls at 1s and exits when the job hits terminal. --since accepts ISO-8601 OR relative shorthand (5m / 1h / 2d). Writes transcript tail (full message + tool tree) only for terminal jobs, so mid-run --follow doesn't spam a half-rendered transcript.

src/commands/jobs.ts registerBuiltinHandlers: matches the shell-handler opt-in shape. GBRAIN_ALLOW_LLM_JOBS=1 registers the subagent + subagent_aggregator handlers, then loads plugins from GBRAIN_PLUGIN_PATH with validAgentToolNames pulled from BRAIN_TOOL_ALLOWLIST. Every plugin warning + loaded-plugin line prints to stderr, mirroring the openclaw-seam startup convention.

src/core/minions/protected-names.ts: subagent + subagent_aggregator join the protected set. MCP submit_job returns permission_denied; only trusted-CLI callers (with allowProtectedSubmit) can insert these rows.

src/cli.ts: adds 'agent' to CLI_ONLY + dispatches it like 'jobs'.

Test fallout: subagent-handler.test.ts + subagent-transcript.test.ts helpers now submit under allowProtectedSubmit (they insert rows named 'subagent' directly against the queue).

23 new tests in agent-cli.test.ts cover: flag parsing (including --detach implies !follow, --tools comma split, -- terminator, unknown flag throw), --since parse (ISO, relative 5m/2h/1d, unparseable error), protected-name guard for all three names, trusted-submit gate, and a fan-out integration check that verifies the aggregator + children shape after --fanout-manifest.
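The `--since` behaviour described above (ISO-8601 or relative shorthand, with an error on anything else) can be sketched like this. `parseSinceSketch` is a hypothetical name and the error wording is invented; the clock is injected so the result is deterministic.

```typescript
// Resolve a --since value to an absolute Date.
// Accepts relative shorthand (<n>m, <n>h, <n>d) or anything Date.parse understands as ISO-8601.
function parseSinceSketch(input: string, now: Date): Date {
  const trimmed = input.trim();

  // Relative form: digits followed by a single unit letter.
  const relative = /^(\d+)([mhd])$/.exec(trimmed);
  if (relative !== null) {
    const [, amount = "0", unit = "m"] = relative;
    const unitMs = unit === "m" ? 60000 : unit === "h" ? 3600000 : 86400000;
    return new Date(now.getTime() - Number(amount) * unitMs);
  }

  // Absolute form: fall back to the built-in parser and reject NaN.
  const parsed = Date.parse(trimmed);
  if (Number.isNaN(parsed)) {
    throw new Error(`unparseable --since value: ${input}`);
  }
  return new Date(parsed);
}
```

Trying the relative pattern first keeps short inputs like `5m` away from `Date.parse`, whose handling of non-ISO strings varies between runtimes.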
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): rename max_children test's spawned jobs off the protected 'subagent' name The spawn-storm test submitted 50 literal-string 'subagent' children to exercise the max_children row-lock serialization. In v0.15 'subagent' is a PROTECTED_JOB_NAME (CLI-only; trusted submit required), so the old literal submission now throws before reaching the row-lock check. The test is about max_children semantics, not the v0.15 subagent runtime specifically — rename the child name to 'child_worker' so the test exercises the exact same queue.add path without tripping the new guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ship): v0.15.0 — VERSION, CHANGELOG, README, upgrading-agents, CLAUDE.md Bumps VERSION → 0.15.0 and package.json → 0.15.0 (resolves the pre-existing drift — on master, VERSION=0.14.0 but package.json=0.13.1; src/version.ts reads package.json, so this is what the binary prints now). CHANGELOG lands the release-summary entry in the GStack voice + the full itemized change list (11 new modules, 3 new tables, queue correctness fixes, trust-model additions, 159 new unit tests). Voice rules respected — no em dashes, no AI vocabulary, real file names + real numbers. README gets a "Durable agents: `gbrain agent` (v0.15)" section next to the Minions block, with the three canonical CLI shapes (single run, fanout-manifest, logs --follow) and a pointer to plugin-authors.md. docs/UPGRADING_DOWNSTREAM_AGENTS.md gets a full v0.15.0 section covering the four adoption steps downstream agents (Wintermute and similar) need: (1) worker opt-in via GBRAIN_ALLOW_LLM_JOBS, (2) moving custom subagent defs to a plugin repo, (3) replacing ephemeral subagent runs with durable `gbrain agent run`, (4) the put_page namespace rule for agent-driven writes. 
CLAUDE.md updated with concise per-file descriptions for every new module: the handler, aggregator, audit, rate-leases, wait-for-completion, transcript, plugin-loader, brain-allowlist, tool-defs extraction, agent CLI + logs CLI, and the registerBuiltinHandlers wiring for subagent handlers + plugin-loader. Verified: binary builds (940 modules, 89ms compile), prints `gbrain 0.15.0`, `gbrain agent --help` shows the new subcommand shape. 170 new tests pass (full v0.15 surface). Full unit suite passes bar one parallel-load flake on a pre-existing E2E (graph-quality, passes in isolation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(minions): drop GBRAIN_ALLOW_LLM_JOBS flag — subagent handlers always-on The env flag was ceremony. Shell jobs need the flag because they execute arbitrary CLI commands (RCE surface). Subagent jobs don't — they call the Anthropic API with whatever ANTHROPIC_API_KEY is in env, so the key is already the cost gate (no key → SDK fails on the first turn). And who-can-submit is already protected by PROTECTED_JOB_NAMES + TrustedSubmitOpts: MCP callers get permission_denied; only `gbrain agent run` with allowProtectedSubmit can insert subagent / subagent_aggregator rows. The flag added nothing the existing guards didn't already give us. registerBuiltinHandlers now always registers subagent + subagent_aggregator and loads GBRAIN_PLUGIN_PATH plugins. Worker startup prints: [minion worker] subagent handlers enabled instead of the conditional enabled/disabled pair. Plugin discovery runs unconditionally — empty PATH is a no-op. README, CHANGELOG, docs/UPGRADING_DOWNSTREAM_AGENTS, CLAUDE.md, agent CLI help text, and subagent handler docstring all updated to drop the flag reference. Shell handler's GBRAIN_ALLOW_SHELL_JOBS gate is untouched — separate concern (RCE, not billing). Full suite: 1859 pass, 0 fail. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: scrub private agent-fork name from all public artifacts Enforces the rule added to CLAUDE.md (privacy section): never say `Wintermute` in any CHANGELOG, README, doc, PR, or commit message. Reader-facing copy says `your OpenClaw` (the term covers every downstream OpenClaw deployment — Wintermute, Hermes, AlphaClaw — in one umbrella the reader already recognizes). First-person / origin-story copy says `Garry's OpenClaw` (honest that this is the production deployment driving the feature, without exposing the private agent's name). Swept across: CHANGELOG.md (v0.15 entry + 4 historical mentions) README.md TODOS.md docs/UPGRADING_DOWNSTREAM_AGENTS.md docs/guides/plugin-authors.md (including example plugin names) docs/guides/plugin-handlers.md docs/guides/minions-fix.md docs/designs/KNOWLEDGE_RUNTIME.md (27 refs, mostly analytical) docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md skills/migrations/v0.11.0.md skills/skillpack-check/SKILL.md scripts/skillify-check.ts src/commands/doctor.ts src/commands/migrations/v0_15_0.ts src/commands/skillpack-check.ts src/core/enrichment/completeness.ts src/core/minions/plugin-loader.ts src/core/operations.ts src/core/output/scaffold.ts Intentionally kept (these mentions define/test the rule itself): CLAUDE.md — the privacy rule section necessarily uses the literal name to define the restriction and examples test/plugin-loader.test.ts — fixture name in a plugin-loading test; renaming risks breaking assertion logic test/integrations.test.ts — the word appears in a privacy-regex test that explicitly enforces name redaction test/doctor-minions-check.test.ts — a comment referencing the rule CEO plan artifact at ~/.gstack/projects/… — private, not distributed Binary builds (941 modules), 198/198 relevant tests pass, `gbrain --version` prints `0.15.0`. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: gitignore bun --compile artifacts with a glob, not specific hashes Each `bun build --compile` emits a fresh hash-named `.*-*.bun-build` file in cwd. The prior entries listed two specific hashes that were already stale, so every build after those created a new untracked file requiring manual cleanup. Replace the two stale entries with `*.bun-build` so any current or future compile artifact is ignored automatically. Verified: ran `bun build --compile`, got two new `.*-*.bun-build` files, `git status` stays clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ship): rename v0.15.0 → v0.16.0 gbrain master is at 0.14.2. Other 0.15.x PRs may land before/after this one — we bump the minor (new capability) and lock to 0.16.0 so ordering with concurrent work doesn't matter. Touches: - VERSION: 0.15.0 → 0.16.0 - package.json: 0.15.0 → 0.16.0 - Rename src/commands/migrations/v0_15_0.ts → v0_16_0.ts (+ all version strings inside + import in index.ts registry) - Rename test/migrations-v0_15_0.test.ts → migrations-v0_16_0.test.ts - test/apply-migrations.test.ts: skippedFuture lists now reference '0.16.0' - test/put-page-namespace.test.ts + test/mcp-tool-defs.test.ts: Lane comment refs updated - src/schema.sql + src/core/pglite-schema.ts: "v0.15.0" section comment updated; src/core/schema-embedded.ts regenerated - CHANGELOG.md: top entry renamed to [0.16.0]; inline v0_15_0 / v0.15.0 refs swept - docs/UPGRADING_DOWNSTREAM_AGENTS.md: section heading v0.15.0 → v0.16.0 Verified: `gbrain --version` prints 0.16.0, migration registry / buildPlan / put_page / mcp-tool-defs / handlers tests all green (49/49). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: reframe v0.16 durability headline around OpenClaw crashes "Laptop closed mid-run" framing implied a consumer workflow. 
Real pain is OpenClaw subagents dying daily on worker kill, memory blip, or timeout. Headline + README copy match the body now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate llms-full.txt after README copy change Regen drift guard caught the README edit from 83beec4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
619 lines
35 KiB
Markdown
# GBrain

Your AI agent is smart but forgetful. GBrain gives it a brain.

Built by the President and CEO of Y Combinator to run his actual AI agents. The production brain powering his OpenClaw and Hermes deployments: **17,888 pages, 4,383 people, 723 companies**, 21 cron jobs running autonomously, built in 12 days. The agent ingests meetings, emails, tweets, voice calls, and original ideas while you sleep. It enriches every person and company it encounters. It fixes its own citations and consolidates memory overnight. You wake up and the brain is smarter than when you went to bed.

The brain wires itself. Every page write extracts entity references and creates typed links (`attended`, `works_at`, `invested_in`, `founded`, `advises`) with zero LLM calls. Hybrid search. Self-wiring knowledge graph. Structured timeline. Backlink-boosted ranking. Ask "who works at Acme AI?" or "what did Bob invest in this quarter?" and get answers vector search alone can't reach. Benchmarked end-to-end: **Recall@5 jumps from 83% to 95%, Precision@5 from 39% to 45%, +30 more correct answers in the agent's top-5 reads** on a 240-page Opus-generated rich-prose corpus. Graph-only F1: **86.6% vs grep's 57.8%** (+28.8 pts). [Full report](docs/benchmarks/2026-04-18-brainbench-v1.md).

GBrain is those patterns, generalized. 26 skills. Install in 30 minutes. Your agent does the work. As Garry's personal agent gets smarter, so does yours.

> **~30 minutes to a fully working brain.** Database ready in 2 seconds (PGLite, no server). You just answer questions about API keys.

> **LLMs:** fetch [`llms.txt`](llms.txt) for the documentation map, or [`llms-full.txt`](llms-full.txt) for the same map with core docs inlined in one fetch. **Agents:** start with [`AGENTS.md`](AGENTS.md) (or [`CLAUDE.md`](CLAUDE.md) if you're Claude Code).

## Install

### On an agent platform (recommended)

GBrain is designed to be installed and operated by an AI agent. If you don't have one running yet:

- **[OpenClaw](https://openclaw.ai)** ... Deploy [AlphaClaw on Render](https://render.com/deploy?repo=https://github.com/chrysb/alphaclaw) (one click, 8GB+ RAM)
- **[Hermes Agent](https://github.com/NousResearch/hermes-agent)** ... Deploy on [Railway](https://github.com/praveen-ks-2001/hermes-agent-template) (one click)

Paste this into your agent:

```
Retrieve and follow the instructions at:
https://raw.githubusercontent.com/garrytan/gbrain/master/INSTALL_FOR_AGENTS.md
```

That's it. The agent clones the repo, installs GBrain, sets up the brain, loads 26 skills, and configures recurring jobs. You answer a few questions about API keys. ~30 minutes.

If your agent doesn't auto-read `AGENTS.md`, point it at that file first:
`https://raw.githubusercontent.com/garrytan/gbrain/master/AGENTS.md` is the non-Claude
agent operating protocol (install, read order, trust boundary, common tasks). For
the full doc map, use `llms.txt` at the same URL root.

### Standalone CLI (no agent)
|
||
|
||
```bash
|
||
git clone https://github.com/garrytan/gbrain.git && cd gbrain && bun install && bun link
|
||
gbrain init # local brain, ready in 2 seconds
|
||
gbrain import ~/notes/ # index your markdown
|
||
gbrain query "what themes show up across my notes?"
|
||
```
|
||
|
||
**Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level
|
||
postinstall hook on global installs, so schema migrations never run and the CLI
|
||
aborts with `Aborted()` the first time it opens PGLite. Use `git clone + bun install
|
||
&& bun link` as shown above. See [#218](https://github.com/garrytan/gbrain/issues/218).
|
||
|
||
```
|
||
3 results (hybrid search, 0.12s):
|
||
|
||
1. concepts/do-things-that-dont-scale (score: 0.94)
|
||
PG's argument that unscalable effort teaches you what users want.
|
||
[Source: paulgraham.com, 2013-07-01]
|
||
|
||
2. originals/founder-mode-observation (score: 0.87)
|
||
Deep involvement isn't micromanagement if it expands the team's thinking.
|
||
|
||
3. concepts/build-something-people-want (score: 0.81)
|
||
The YC motto. Connected to 12 other brain pages.
|
||
```
|
||
|
### MCP server (Claude Code, Cursor, Windsurf)

GBrain exposes 30+ MCP tools via stdio:

```json
{
  "mcpServers": {
    "gbrain": { "command": "gbrain", "args": ["serve"] }
  }
}
```

Add to `~/.claude/server.json` (Claude Code), Settings > MCP Servers (Cursor), or your client's MCP config.

### Remote MCP (Claude Desktop, Cowork, Perplexity)

```bash
ngrok http 8787 --url your-brain.ngrok.app
bun run src/commands/auth.ts create "claude-desktop"
claude mcp add gbrain -t http https://your-brain.ngrok.app/mcp -H "Authorization: Bearer TOKEN"
```

Per-client guides: [`docs/mcp/`](docs/mcp/DEPLOY.md). ChatGPT requires OAuth 2.1 (not yet implemented).

## The 26 Skills

GBrain ships 26 skills organized by `skills/RESOLVER.md`. The resolver tells your agent which skill to read for any task.

[Skill files are code.](https://x.com/garrytan/status/2042925773300908103) They're the most powerful way to get knowledge work done. A skill file is a fat markdown document that encodes an entire workflow: when to fire, what to check, how to chain with other skills, what quality bar to enforce. The agent reads the skill and executes it. Skills can also call deterministic TypeScript code bundled in GBrain (search, import, embed, sync) for the parts that shouldn't be left to LLM judgment. [Thin harness, fat skills](docs/ethos/THIN_HARNESS_FAT_SKILLS.md): the intelligence lives in the skills, not the runtime.

### Always-on

| Skill | What it does |
|-------|-------------|
| **signal-detector** | Fires on every message. Spawns a cheap model in parallel to capture original thinking and entity mentions. The brain compounds on autopilot. |
| **brain-ops** | Brain-first lookup before any external API. The read-enrich-write loop that makes every response smarter. |

### Content ingestion

| Skill | What it does |
|-------|-------------|
| **ingest** | Thin router. Detects input type and delegates to the right ingestion skill. |
| **idea-ingest** | Links, articles, tweets become brain pages with analysis, author people pages, and cross-linking. |
| **media-ingest** | Video, audio, PDF, books, screenshots, GitHub repos. Transcripts, entity extraction, backlink propagation. |
| **meeting-ingestion** | Transcripts become brain pages. Every attendee gets enriched. Every company gets a timeline entry. |

### Brain operations

| Skill | What it does |
|-------|-------------|
| **enrich** | Tiered enrichment (Tier 1/2/3). Creates and updates person/company pages with compiled truth and timelines. |
| **query** | 3-layer search with synthesis and citations. Says "the brain doesn't have info on X" instead of hallucinating. |
| **maintain** | Periodic health: stale pages, orphans, dead links, citation audit, back-link enforcement, tag consistency. |
| **citation-fixer** | Scans pages for missing or malformed citations. Fixes format to match the standard. |
| **repo-architecture** | Where new brain files go. Decision protocol: primary subject determines directory, not format. |
| **publish** | Share brain pages as password-protected HTML. Zero LLM calls. |
| **data-research** | Structured data research with parameterized YAML recipes. Extract investor updates, expenses, company metrics from email. |

### Operational

| Skill | What it does |
|-------|-------------|
| **daily-task-manager** | Task lifecycle with priority levels (P0-P3). Stored as searchable brain pages. |
| **daily-task-prep** | Morning prep: calendar lookahead with brain context per attendee, open threads, task review. |
| **cron-scheduler** | Schedule staggering (5-min offsets), quiet hours (timezone-aware with wake-up override), idempotency. |
| **reports** | Timestamped reports with keyword routing. "What's the latest briefing?" finds it instantly. |
| **cross-modal-review** | Quality gate via second model. Refusal routing: if one model refuses, silently switch. |
| **webhook-transforms** | External events (SMS, meetings, social mentions) converted into brain pages with entity extraction. |
| **testing** | Validates every skill has SKILL.md with frontmatter, manifest coverage, resolver coverage. |
| **skill-creator** | Create new skills following the conformance standard. MECE check against existing skills. |
| **minion-orchestrator** | Long-running agent work as background jobs. Submit, fan out children with depth/cap/timeouts, collect results via child_done inbox. |

### Identity and setup

| Skill | What it does |
|-------|-------------|
| **soul-audit** | 6-phase interview generating SOUL.md (agent identity), USER.md (user profile), ACCESS_POLICY.md (4-tier privacy), HEARTBEAT.md (operational cadence). |
| **setup** | Auto-provision PGLite or Supabase. First import. GStack detection. |
| **migrate** | Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam. |
| **briefing** | Daily briefing with meeting context, active deals, and citation tracking. |

### Conventions

Cross-cutting rules in `skills/conventions/`:
- **quality.md** ... citations, back-links, notability gate, source attribution
- **brain-first.md** ... 5-step lookup before any external API call
- **model-routing.md** ... which model for which task
- **test-before-bulk.md** ... test 3-5 items before any batch operation
- **cross-modal.yaml** ... review pairs and refusal routing chain

## How It Works

```
Signal arrives (meeting, email, tweet, link)
  -> Signal detector captures ideas + entities (parallel, never blocks)
  -> Brain-ops: check the brain first (gbrain search, gbrain get)
  -> Respond with full context
  -> Write: update brain pages with new information + citations
  -> Auto-link: typed relationships extracted on every write (zero LLM calls)
  -> Sync: gbrain indexes changes for next query
```

Every cycle adds knowledge. The agent enriches a person page after a meeting. Next time that person comes up, the agent already has context. The difference compounds daily.

The system gets smarter on its own. Entity enrichment auto-escalates: a person mentioned once gets a stub page (Tier 3). After 3 mentions across different sources, they get web + social enrichment (Tier 2). After a meeting or 8+ mentions, full pipeline (Tier 1). The brain learns who matters without being told. Deterministic classifiers improve over time via a fail-improve loop that logs every LLM fallback and generates better regex patterns from the failures. `gbrain doctor` shows the trajectory: "intent classifier: 87% deterministic, up from 40% in week 1."

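The tier escalation rule can be sketched as a small pure function. This is only an illustration: `MentionStats` and `enrichmentTier` are hypothetical names, not GBrain's actual API, and reading "3 mentions across different sources" as "3+ mentions from at least 2 sources" is an assumption.

```typescript
// Hypothetical types: the README does not show GBrain's real enrichment API.
type Tier = 1 | 2 | 3;

interface MentionStats {
  mentions: number;        // total mentions seen so far
  distinctSources: number; // number of different sources those mentions came from
  hadMeeting: boolean;     // true once the person shows up in a meeting
}

// Thresholds come from the paragraph above. Treating "3 mentions across
// different sources" as "3+ mentions from at least 2 sources" is an assumption.
function enrichmentTier(stats: MentionStats): Tier {
  if (stats.hadMeeting || stats.mentions >= 8) return 1; // full pipeline
  if (stats.mentions >= 3 && stats.distinctSources >= 2) return 2; // web + social
  return 3; // stub page
}
```

Checking the Tier 1 condition first means a single meeting escalates a person immediately, regardless of mention count.
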
> "Prep me for my meeting with Jordan in 30 minutes"
> ... pulls dossier, shared history, recent activity, open threads

> "What have I said about the relationship between shame and founder performance?"
> ... searches YOUR thinking, not the internet

## Minions: your sub-agents won't drop work anymore

A durable, Postgres-native job queue built into the brain. Every long-running agent task is now a job that survives gateway restarts, streams progress, gets paused / resumed / steered mid-flight, and shows up in `gbrain jobs list`. Zero infra beyond your existing brain.

### The production numbers that matter

Here's my personal OpenClaw deployment: one Render container. Supabase Postgres holding a 45,000-page brain. 19 cron jobs firing on schedule. Real gateway load from real daily work. The task: pull a month of my social posts from an external API and ingest them end-to-end into the brain as a structured page.

| | Minions | `sessions_spawn` |
|---|---|---|
| Wall time | **753ms** | **>10,000ms** (gateway timeout) |
| Token cost | **$0.00** | ~$0.03 per run |
| Success rate | **100%** | **0%** (couldn't even spawn) |
| Memory/job | ~2 MB | ~80 MB |

Under that 19-cron load, sub-agent spawn couldn't clear the 10-second gateway wall. Minions landed it in under a second for zero tokens. **Scaling:** 19,240 posts across 36 months, single bash loop, ~15 min total, $0.00. Sub-agents: ~9 min best case, ~$1.08 in tokens, ~40% spawn failure. **Lab:** durability ∞ (SIGKILL mid-flight, 10/10 rescued), throughput ~10× faster, fan-out ~21× with no failure wall, memory ~400× less.

Full benchmarks: [production](docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md) and [lab](docs/benchmarks/2026-04-18-minions-vs-openclaw-subagents.md).

### The routing rule

> **Deterministic** (same input → same steps → same output) → **Minions**
> **Judgment** (input requires assessment or decision) → **Sub-agents**

Pull posts, parse JSON, write a brain page, run a sync: deterministic. $0 tokens, survives restart, millisecond runtime. Triage the inbox, assess meeting priority, decide if a cold email deserves a reply: judgment. What sub-agents are actually good at. `minion_mode: pain_triggered` (the default) automates the routing.

### What's fixed

The six daily pains (spawn storms, agents that stop responding, forgotten dispatches, gateway crashes mid-run, runaway grandchildren, debugging soup) all belonged to the "deterministic work through a reasoning model" mistake. Minions fixes them by not making that mistake: `max_children` cap, `timeout_ms` + AbortSignal, `child_done` inbox, full `parent_job_id`/`depth`/transcript per job, Postgres durability with stall detection, cascade cancel via recursive CTE. Plus idempotency keys, attachment validation, `removeOnComplete`, and `gbrain jobs smoke` that proves the install in half a second.

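To make "cascade cancel" concrete, here is an in-memory sketch of the same idea: walk the `parent_job_id` tree from a root job and cancel every descendant that is still live. GBrain does this in Postgres with a recursive CTE; the `Job` shape, the status names, and `cascadeCancel` below are hypothetical stand-ins, not the real schema.

```typescript
// Hypothetical job shape: only parent_job_id is named in the README.
type JobStatus = "waiting" | "active" | "completed" | "cancelled";

interface Job {
  id: number;
  parentJobId: number | null;
  status: JobStatus;
}

// Cancel a job and every descendant that has not already finished.
// Returns the ids that were actually cancelled.
function cascadeCancel(jobs: Job[], rootId: number): number[] {
  const byId = new Map<number, Job>();
  const childrenOf = new Map<number, number[]>();
  for (const job of jobs) {
    byId.set(job.id, job);
    if (job.parentJobId !== null) {
      const siblings = childrenOf.get(job.parentJobId) ?? [];
      siblings.push(job.id);
      childrenOf.set(job.parentJobId, siblings);
    }
  }

  const cancelled: number[] = [];
  const seen = new Set<number>(); // guards against malformed parent cycles
  const stack: number[] = [rootId];
  while (stack.length > 0) {
    const id = stack.pop() as number;
    if (seen.has(id)) continue;
    seen.add(id);
    const job = byId.get(id);
    if (job && (job.status === "waiting" || job.status === "active")) {
      job.status = "cancelled";
      cancelled.push(id);
    }
    // Keep walking even through finished jobs: their children may still be live.
    for (const childId of childrenOf.get(id) ?? []) stack.push(childId);
  }
  return cancelled;
}
```

Finished jobs keep their status, but the walk continues through them, so a runaway grandchild under a completed child still gets cancelled.
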
```bash
gbrain jobs smoke                        # verify install
gbrain jobs submit sync --params '{}'    # fire a background job
gbrain jobs stats                        # health dashboard
gbrain jobs work --concurrency 4         # start a worker (Postgres only)
```

Read [`skills/minion-orchestrator/SKILL.md`](skills/minion-orchestrator/SKILL.md) for parent-child DAGs, fan-in collection, steering via inbox.

**Minions is not incrementally better than sub-agents for background work. It's categorically different.** 753ms vs gateway timeout. $0 vs tokens. 100% vs couldn't-spawn. If your agent does deterministic work on a schedule, it runs on Minions now.

### Health check and self-heal

Minions is canonical as of v0.11.1: every `gbrain upgrade` runs the migration automatically (schema → smoke → prefs → host rewrites → env-aware autopilot install). If you ever want to verify manually or wire a cron into your morning briefing:

```bash
gbrain doctor                     # half-migrated state? prints loud banner + exits non-zero
gbrain skillpack-check --quiet    # exit 0/1/2 for pipeline gating
gbrain skillpack-check | jq       # full JSON: {healthy, summary, actions[], doctor, migrations}
```

If anything's off, `actions[]` tells you the exact command to run. For deeper troubleshooting: [`docs/guides/minions-fix.md`](docs/guides/minions-fix.md).

Moving gateway crons to Minions (deterministic scripts, zero LLM tokens per fire): [`docs/guides/minions-shell-jobs.md`](docs/guides/minions-shell-jobs.md).

## Durable agents: `gbrain agent` (v0.15)

Your subagent runs survive crashes now. OpenClaw died mid-run? The worker re-claims on restart and replays from the last committed turn. Fan-out across 50 shards, one shard crashes: the aggregator still claims after every child reaches a terminal state and writes a mixed-outcome summary. Tool calls persist as a two-phase ledger (`pending` → `complete | failed`) so replay is safe by construction, not by hope.

```bash
# Submit a single-subagent run
gbrain agent run "summarize my last 10 journal pages"

# Fan out N prompts across N subagent children + 1 aggregator
gbrain agent run "analyze every page" \
  --fanout-manifest manifests/pages.json \
  --subagent-def analyzer

# Tail a running job (heartbeat per turn + full transcript on completion)
gbrain agent logs 1247 --follow --since 5m
```

Durability is the point: every Anthropic turn commits to `subagent_messages`, every tool call to `subagent_tool_executions`. Worker kills, OpenClaw crashes, timeouts: all resumable. Host repos (your OpenClaw, etc.) ship their own subagent definitions via `GBRAIN_PLUGIN_PATH` + a `gbrain.plugin.json` manifest: see [`docs/guides/plugin-authors.md`](docs/guides/plugin-authors.md). Requires `ANTHROPIC_API_KEY` on the worker.

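The two-phase ledger is easier to picture with a toy version. The sketch below keeps rows in memory instead of `subagent_tool_executions`; the `ToolLedger` class, its field names, and the replay policy are assumptions for illustration, not the shipped implementation.

```typescript
type ToolStatus = "pending" | "complete" | "failed";

interface ToolExecution {
  toolUseId: string;
  status: ToolStatus;
  output?: string;
  error?: string;
}

// In-memory stand-in for the subagent_tool_executions table. The class,
// field names, and replay policy are assumptions for illustration.
class ToolLedger {
  private rows = new Map<string, ToolExecution>();

  // Real tools would be async; kept synchronous here for brevity.
  run(toolUseId: string, execute: () => string): ToolExecution {
    const existing = this.rows.get(toolUseId);
    // Replay: a row that already reached a terminal state is returned as-is,
    // so the tool never runs twice for the same tool_use id.
    if (existing && existing.status !== "pending") return existing;

    // Phase 1: record intent before the tool runs.
    const row: ToolExecution = { toolUseId, status: "pending" };
    this.rows.set(toolUseId, row);

    // Phase 2: record the outcome.
    try {
      row.output = execute();
      row.status = "complete";
    } catch (err) {
      row.error = err instanceof Error ? err.message : String(err);
      row.status = "failed";
    }
    return row;
  }
}
```

In this sketch a row left `pending` by a crash gets re-executed on replay, which is only safe when the tool itself is idempotent.
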
## Skillify: your skills tree stops being a black box

Hermes and similar agent frameworks auto-create skills as a background behavior. Fine until you don't know what the agent shipped. Checklists decay. Tests drift. Resolver entries get stale. Six months later you've got an opaque pile of "skills" that nobody has read, nobody has tested, and nobody is sure still work.

GBrain ships the same capability. Except the human stays in the loop.

- **`/skillify`** turns raw code into a properly-skilled feature: SKILL.md + deterministic script + unit tests + integration tests + LLM evals + resolver trigger + resolver trigger eval + E2E smoke + brain filing. Ten items. Every one required.
- **`gbrain check-resolvable`** walks the whole skills tree: reachability, MECE overlap, DRY violations, gap detection, orphaned skills. Exits non-zero if anything is off.
- **`scripts/skillify-check.ts`** is the machine-readable audit. `--json` for CI, `--recent` for last-7-days files.

You decide when and what. The tooling keeps the checklist honest.

### Why this is the right answer for OpenClaw

Auto-generated skills are a liability the first time a behavior breaks. Was it the skill? The test? The resolver trigger? The eval? You don't know, because you never read it. Debugging a black box is pure guesswork.

Skillify makes the black box legible. Every skill in your tree has: a contract (SKILL.md), tests that exercise that contract, an eval that grades LLM output against a rubric, a resolver trigger the user actually types, and a test that confirms the trigger routes right. If something breaks, you know which layer to look at. If anything goes stale, `check-resolvable` says so.

In practice this combo produces **zero orphaned skills, every feature with tests + evals + resolver triggers + evals of the triggers.** Compounding quality instead of compounding entropy.

```bash
# Audit a feature's skill completeness (10-item checklist)
bun run scripts/skillify-check.ts src/commands/publish.ts

# In CI: fail the build when a new feature isn't properly skilled
bun run scripts/skillify-check.ts --json --recent

# Validate the whole skills tree before shipping
gbrain check-resolvable
```

**Skillify is not a nice-to-have. It's the piece that makes the skills tree survive six months of compounding work.** Read [`skills/skillify/SKILL.md`](skills/skillify/SKILL.md) for the full 10-item checklist and the anti-patterns it catches.

## Getting Data In

GBrain ships integration recipes that your agent sets up for you. Each recipe tells the agent what credentials to ask for, how to validate, and what cron to register.

| Recipe | Requires | What It Does |
|--------|----------|-------------|
| [Public Tunnel](recipes/ngrok-tunnel.md) | none | Fixed URL for MCP + voice (ngrok Hobby $8/mo) |
| [Credential Gateway](recipes/credential-gateway.md) | none | Gmail + Calendar access |
| [Voice-to-Brain](recipes/twilio-voice-brain.md) | ngrok-tunnel | Phone calls to brain pages (Twilio + OpenAI Realtime) |
| [Email-to-Brain](recipes/email-to-brain.md) | credential-gateway | Gmail to entity pages |
| [X-to-Brain](recipes/x-to-brain.md) | none | Twitter timeline + mentions + deletions |
| [Calendar-to-Brain](recipes/calendar-to-brain.md) | credential-gateway | Google Calendar to searchable daily pages |
| [Meeting Sync](recipes/meeting-sync.md) | none | Circleback transcripts to brain pages with attendees |

**Data research recipes** extract structured data from email into tracked brain pages. Built-in recipes for investor updates (MRR, ARR, runway, headcount), expense tracking, and company metrics. Create your own with `gbrain research init`.

Run `gbrain integrations` to see status.

## GBrain + GStack

[GStack](https://github.com/garrytan/gstack) is the engine. GBrain is the mod.

- **[GStack](https://github.com/garrytan/gstack)** = coding skills (ship, review, QA, investigate, office-hours, retro). 70,000+ stars, 30,000 developers per day. When your agent codes on itself, it uses GStack.
- **GBrain** = everything-else skills (brain ops, signal detection, ingestion, enrichment, cron, reports, identity). When your agent remembers, thinks, and operates, it uses GBrain.
- **`hosts/gbrain.ts`** = the bridge. Tells GStack's coding skills to check the brain before coding.

`gbrain init` detects if GStack is installed and reports mod status. If GStack isn't there, it tells you how to get it.

## Architecture

```
┌──────────────────┐    ┌───────────────┐    ┌──────────────────┐
│   Brain Repo     │    │    GBrain     │    │    AI Agent      │
│   (git)          │    │  (retrieval)  │    │  (read/write)    │
│                  │    │               │    │                  │
│  markdown files  │───>│  Postgres +   │<──>│  26 skills       │
│  = source of     │    │  pgvector     │    │  define HOW to   │
│    truth         │    │               │    │  use the brain   │
│                  │<───│  hybrid       │    │                  │
│  human can       │    │  search       │    │  RESOLVER.md     │
│  always read     │    │  (vector +    │    │  routes intent   │
│  & edit          │    │   keyword +   │    │  to skill        │
│                  │    │   RRF)        │    │                  │
└──────────────────┘    └───────────────┘    └──────────────────┘
```

The repo is the system of record. GBrain is the retrieval layer. The agent reads and writes through both. Human always wins... edit any markdown file and `gbrain sync` picks up the changes.

## The Knowledge Model

Every page follows the compiled truth + timeline pattern:

```markdown
---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---

Paul Graham's argument that startups should do unscalable things early on.
The key insight: the unscalable effort teaches you what users actually
want, which you can't learn any other way.

---

- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
```

Above the `---`: **compiled truth**. Your current best understanding. Gets rewritten when new evidence changes the picture. Below: **timeline**. Append-only evidence trail. Never edited, only added to.

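Splitting a page into those two zones takes only a few lines. The sketch below is a minimal illustration, not GBrain's parser: `BrainPage` and `parseBrainPage` are hypothetical names, and it assumes the file starts with frontmatter and that the third `---` line is the compiled-truth/timeline separator.

```typescript
interface BrainPage {
  frontmatter: string;   // raw YAML between the first two `---` lines
  compiledTruth: string; // current best understanding
  timeline: string[];    // append-only entries, one per `- ` bullet
}

// Assumes: line 1 is `---`, the second `---` closes the frontmatter, and the
// third `---` separates compiled truth from the timeline.
function parseBrainPage(markdown: string): BrainPage {
  const lines = markdown.split("\n");
  const rules: number[] = [];
  lines.forEach((line, i) => {
    if (line.trim() === "---") rules.push(i);
  });
  if (rules.length < 3 || rules[0] !== 0) {
    throw new Error("expected frontmatter followed by a `---` separator");
  }
  const closeFrontmatter = rules[1];
  const separator = rules[2];
  return {
    frontmatter: lines.slice(1, closeFrontmatter).join("\n"),
    compiledTruth: lines.slice(closeFrontmatter + 1, separator).join("\n").trim(),
    timeline: lines
      .slice(separator + 1)
      .filter((line) => line.startsWith("- "))
      .map((line) => line.slice(2).trim()),
  };
}
```

A horizontal rule inside the compiled-truth zone would confuse this sketch; a real parser would need a stricter way to find the separator.
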
## Knowledge Graph

Pages aren't just text. Every mention of a person, company, or concept becomes a typed link in a structured graph. The brain wires itself.

```
Write a meeting page mentioning Alice and Acme AI
  -> Auto-link extracts entity refs from content (zero LLM calls)
  -> Infers types: meeting page + person ref => `attended`
                   "CEO of X" pattern        => `works_at`
                   "invested in"             => `invested_in`
                   "advises", "advisor"      => `advises`
                   "founded", "co-founded"   => `founded`
  -> Reconciles stale links: edits remove links no longer in content
  -> Backlinks rank well-connected entities higher in search
```

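The type inference above is a first-match-wins cascade over the text around each entity reference. A minimal sketch, assuming illustrative regexes and a hypothetical `mentions` fallback (neither is taken from GBrain's source):

```typescript
type LinkType = "founded" | "invested_in" | "advises" | "works_at" | "attended" | "mentions";

// Ordered cascade: the first pattern that matches wins. The regexes are
// illustrative assumptions based on the phrases listed above.
const CASCADE: Array<[RegExp, LinkType]> = [
  [/\b(?:co-)?founded\b/i, "founded"],
  [/\binvested in\b/i, "invested_in"],
  [/\b(?:advises|advisor)\b/i, "advises"],
  [/\bCEO of\b/i, "works_at"],
];

// `pageType` is the type of the page being written; `context` is the text
// surrounding the entity reference. "mentions" is a hypothetical fallback.
function inferLinkType(pageType: string, context: string): LinkType {
  for (const [pattern, linkType] of CASCADE) {
    if (pattern.test(context)) return linkType;
  }
  return pageType === "meeting" ? "attended" : "mentions";
}
```

Ordering matters: a sentence that says someone is an advisor who also co-founded the company resolves to `founded`, because that pattern sits earlier in the cascade.
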
```bash
gbrain graph-query people/alice --type attended --depth 2
# returns who Alice met with, transitively
```

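Conceptually, a typed traversal like this is a breadth-first walk that only follows edges of one type, tracks visited nodes so cycles terminate, and stops at a depth cap. The in-memory sketch below shows that shape for outbound edges only; `Edge` and `traverse` are hypothetical names, and the real query runs in Postgres rather than over an array.

```typescript
interface Edge {
  from: string;
  to: string;
  type: string;
}

// Returns every slug reachable from `start` via outbound edges of `type`,
// mapped to the depth at which it was first reached.
function traverse(edges: Edge[], start: string, type: string, maxDepth: number): Map<string, number> {
  const depthCap = Math.min(maxDepth, 10); // hard cap, mirroring the remote-MCP limit of 10
  const found = new Map<string, number>();
  const visited = new Set<string>([start]); // cycle prevention
  let frontier: string[] = [start];

  for (let depth = 1; depth <= depthCap && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const edge of edges) {
        if (edge.type !== type || edge.from !== node || visited.has(edge.to)) continue;
        visited.add(edge.to);
        found.set(edge.to, depth);
        next.push(edge.to);
      }
    }
    frontier = next;
  }
  return found;
}
```

The `visited` set is what keeps a loop like A → B → C → A from running forever; the depth cap bounds the work even on very wide graphs.
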
The graph powers questions vector search can't: "who works at Acme AI?", "what has Bob invested in?", "find the connection between Alice and Carol". Backfill an existing brain in one command:

```bash
gbrain extract links --source db       # wire up the existing 29K pages
gbrain extract timeline --source db    # extract dated events from markdown timelines
```

Then ask graph questions or watch the search ranking improve. Benchmarked: **Recall@5 jumps from 83% to 95%, Precision@5 from 39% to 45%, +30 more correct answers in the agent's top-5 reads** on a 240-page Opus-generated rich-prose corpus. Graph-only F1 hits 86.6% vs grep's 57.8% (+28.8 pts). See [docs/benchmarks/2026-04-18-brainbench-v1.md](docs/benchmarks/2026-04-18-brainbench-v1.md).

## Search

Hybrid search: vector + keyword + RRF fusion + multi-query expansion + 4-layer dedup.

```
Query
  -> Intent classifier (entity? temporal? event? general?)
  -> Multi-query expansion (Claude Haiku)
  -> Vector search (HNSW cosine) + Keyword search (tsvector)
  -> RRF fusion: score = sum(1/(60 + rank))
  -> Cosine re-scoring + compiled truth boost
  -> 4-layer dedup + compiled truth guarantee
  -> Results
```

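The RRF step in that pipeline is small enough to show in full. This is a generic Reciprocal Rank Fusion sketch using the `score = sum(1/(60 + rank))` formula from the diagram; `rrfFuse` is a hypothetical name, and treating ranks as 1-based is an assumption.

```typescript
// Fuse several ranked lists of document ids into one list ordered by RRF score.
// Each list contributes 1 / (k + rank) for every id it contains.
function rrfFuse(rankings: string[][], k: number = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // assumption: ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Example: one list from vector search, one from keyword search.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const fused = rrfFuse([vectorHits, keywordHits]);
```

Here `b` ends up first because it ranks near the top of both lists, and `a` comes second; documents found by only one retriever (`d`, `c`) trail behind. That is the behavior the "RRF gets both" claim relies on.
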
Keyword alone misses conceptual matches. Vector alone misses exact phrases. RRF gets both. Search quality is benchmarked and reproducible: `gbrain eval --qrels queries.json` measures P@k, Recall@k, MRR, and nDCG@k. A/B test config changes before deploying them.

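For readers who haven't met these metrics, here is a minimal sketch of three of them for a single query. The function names are hypothetical, the qrels file format is not shown in this README, and nDCG is left out to keep the example short.

```typescript
// Count how many of the top-k ranked ids are in the relevant set.
function hitsAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return ranked.slice(0, k).filter((id) => relevant.has(id)).length;
}

// Precision@k: fraction of the top-k results that are relevant.
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return hitsAtK(ranked, relevant, k) / k;
}

// Recall@k: fraction of all relevant documents that appear in the top-k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return relevant.size === 0 ? 0 : hitsAtK(ranked, relevant, k) / relevant.size;
}

// Reciprocal rank: 1 / position of the first relevant result (0 if none).
// MRR is the mean of this value across all queries.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}
```

Averaging each function over every query in the qrels set gives the headline P@k, Recall@k, and MRR numbers.
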
## Why it works: many strategies in concert

The brain isn't one trick. Every retrieval question goes through ~20 deterministic
techniques layered together. No single one is magic; the win comes from stacking
them so each layer covers what the others miss.

```
Question
│
├─ INGESTION (every put_page)
│   ├─ Recursive markdown chunking (or semantic / LLM-guided)
│   ├─ Embedding cache invalidation on edit
│   └─ Idempotent imports (content-hash dedup)
│
├─ GRAPH EXTRACTION (auto-link post-hook, zero LLM)
│   ├─ Entity-ref regex (markdown links + bare slugs)
│   ├─ Code-fence stripping (no false-positive slugs in code blocks)
│   ├─ Typed inference cascade (FOUNDED → INVESTED → ADVISES → WORKS_AT)
│   ├─ Page-role priors (partner-bio language → invested_in)
│   ├─ Within-page dedup (same target collapses to one link)
│   ├─ Stale-link reconciliation (edits remove dropped refs)
│   └─ Multi-type link constraint (same person can works_at AND advises)
│
├─ SEARCH PIPELINE (every query)
│   ├─ Intent classifier (entity / temporal / event / general; auto-routes)
│   ├─ Multi-query expansion (Haiku rephrases the question 3 ways)
│   ├─ Vector search (HNSW cosine over OpenAI embeddings)
│   ├─ Keyword search (Postgres tsvector + websearch_to_tsquery)
│   ├─ Reciprocal Rank Fusion (score = sum 1/(60+rank) across both)
│   ├─ Cosine re-scoring (re-rank chunks against actual query embedding)
│   ├─ Compiled-truth boost (assessments outrank timeline noise)
│   ├─ Backlink boost (well-connected entities rank higher)
│   └─ Source-aware dedup (one CT chunk per page guaranteed)
│
├─ GRAPH TRAVERSAL (relational queries)
│   ├─ Recursive CTE with cycle prevention (visited-array check)
│   ├─ Type-filtered edges (--type works_at, attended, etc.)
│   ├─ Direction control (in / out / both)
│   └─ Depth-capped (≤10 for remote MCP; DoS prevention)
│
└─ AGENT WORKFLOW (graph-confident hybrid)
    ├─ Graph-query first (high-precision typed answers)
    ├─ Grep fallback when graph returns nothing
    └─ Graph hits ranked first in top-K (better P@K and R@K)
```

End-to-end on the BrainBench v1 corpus (240 rich-prose pages, before/after PR #188):

| Metric                   | BEFORE PR #188 | AFTER PR #188 | Δ             |
|--------------------------|----------------|---------------|---------------|
| **Precision@5**          | 39.2%          | **44.7%**     | **+5.4 pts**  |
| **Recall@5**             | 83.1%          | **94.6%**     | **+11.5 pts** |
| Correct in top-5         | 217            | 247           | **+30**       |
| Graph-only F1 (ablation) | 57.8% (grep)   | **86.6%**     | **+28.8 pts** |

Plus 5 orthogonal capability checks (identity resolution, temporal queries,
performance at 10K-page scale, robustness to malformed input, MCP operation
contract). All pass. [Full report.](docs/benchmarks/2026-04-18-brainbench-v1.md)

The point: each technique handles a class of inputs the others miss. Vector
search misses exact slug refs; keyword catches them. Keyword misses conceptual
matches; vector catches them. RRF picks the best of both. Compiled-truth boost
keeps assessments above timeline noise. Auto-link extraction wires the graph
that lets backlink boost rank well-connected entities higher. Graph traversal
answers questions search alone can't reach. The agent picks graph-first for
precision and falls back to keyword for recall. **All deterministic, all in
concert, all measured.**

## Voice

Call a phone number. Your AI answers. It knows who's calling, pulls their full context from the brain, and responds like someone who actually knows your world. When the call ends, a brain page appears with the transcript, entity detection, and cross-references.

<p align="center">
  <img src="docs/images/voice-client.png" alt="Voice client connected" width="300" />
</p>

> [See it in action](https://x.com/garrytan/status/2043022208512172263)

The voice recipe ships with GBrain: [Voice-to-Brain](recipes/twilio-voice-brain.md). WebRTC works in a browser tab with zero setup. A real phone number is optional.

## Engine Architecture

```
          CLI / MCP Server
(thin wrappers, identical operations)
                  |
  BrainEngine interface (pluggable)
                  |
         +--------+--------+
         |                 |
   PGLiteEngine     PostgresEngine
     (default)        (Supabase)
         |                 |
    ~/.gbrain/      Supabase Pro ($25/mo)
   brain.pglite     Postgres + pgvector
 embedded PG 17.5

 gbrain migrate --to supabase|pglite
      (bidirectional migration)
```

PGLite: embedded Postgres, no server, zero config. When your brain outgrows local (1000+ files, multi-device), `gbrain migrate --to supabase` moves everything.

## File Storage

Brain repos accumulate binaries. GBrain has a three-stage migration:

```bash
gbrain files mirror <dir>      # copy to cloud, local untouched
gbrain files redirect <dir>    # replace local with .redirect pointers
gbrain files clean <dir>       # remove pointers, cloud only
gbrain files restore <dir>     # download everything back (undo)
```

Storage backends: S3-compatible (AWS, R2, MinIO), Supabase Storage, or local.

## Commands

```
SETUP
  gbrain init [--supabase|--url]          Create brain (PGLite default)
  gbrain migrate --to supabase|pglite     Bidirectional engine migration
  gbrain upgrade                          Self-update with feature discovery

PAGES
  gbrain get <slug>                       Read a page (fuzzy slug matching)
  gbrain put <slug> [< file.md]           Write/update (auto-versions)
  gbrain delete <slug>                    Delete a page
  gbrain list [--type T] [--tag T]        List with filters

SEARCH
  gbrain search <query>                   Keyword search (tsvector)
  gbrain query <question>                 Hybrid search (vector + keyword + RRF)

IMPORT
  gbrain import <dir> [--no-embed]        Import markdown (idempotent)
  gbrain sync [--repo <path>]             Git-to-brain incremental sync
  gbrain export [--dir ./out/]            Export to markdown

FILES
  gbrain files list|upload|sync|verify    File storage operations

EMBEDDINGS
  gbrain embed [<slug>|--all|--stale]     Generate/refresh embeddings

LINKS + GRAPH
  gbrain link|unlink|backlinks            Cross-reference management
  gbrain extract links|timeline|all       Batch backfill from existing pages
                                          (--source db|fs, --type, --since, --dry-run)
  gbrain graph-query <slug>               Typed traversal (--type T --depth N
                                          --direction in|out|both)

JOBS (Minions)
  gbrain jobs submit <name> [--params JSON] [--follow]  Submit a background job
  gbrain jobs list [--status S] [--queue Q]             List jobs with filters
  gbrain jobs get|cancel|retry|delete <id>              Manage job lifecycle
  gbrain jobs prune [--older-than 30d]                  Clean completed/dead jobs
  gbrain jobs stats                                     Job health dashboard
  gbrain jobs smoke                                     One-command health check
  gbrain jobs work [--queue Q] [--concurrency N]        Start worker daemon

ADMIN
  gbrain doctor [--json] [--fast]         Health checks (resolver, skills, DB, embeddings)
  gbrain doctor --fix [--dry-run]         Auto-fix DRY violations (delegate inlined rules to conventions)
  gbrain stats                            Brain statistics
  gbrain serve                            MCP server (stdio)
  gbrain integrations                     Integration recipe dashboard
  gbrain check-backlinks check|fix        Back-link enforcement
  gbrain lint [--fix]                     LLM artifact detection
  gbrain repair-jsonb [--dry-run]         Repair v0.12.0 double-encoded JSONB (Postgres)
  gbrain orphans [--json] [--count]       Find pages with zero inbound wikilinks
  gbrain transcribe <audio>               Transcribe audio (Groq Whisper)
  gbrain research init <name>             Scaffold a data-research recipe
  gbrain research list                    Show available recipes
```

Run `gbrain --help` for the full reference.

## Origin Story

I was setting up my [OpenClaw](https://openclaw.ai) agent and started a markdown brain repo. One page per person, one page per company, compiled truth on top, timeline on the bottom. Within a week: 10,000+ files, 3,000+ people, 13 years of calendar data, 280+ meeting transcripts, 300+ captured ideas.

The agent runs while I sleep. The dream cycle scans every conversation, enriches missing entities, fixes broken citations, consolidates memory. I wake up and the brain is smarter than when I went to sleep.

The skills in this repo are those patterns, generalized. What took 11 days to build by hand ships as a mod you install in 30 minutes.

## Docs

**For agents:**
- **[skills/RESOLVER.md](skills/RESOLVER.md)** ... Start here. The skill dispatcher.
- [Individual skill files](skills/) ... 25 standalone instruction sets
- [GBRAIN_SKILLPACK.md](docs/GBRAIN_SKILLPACK.md) ... Legacy reference architecture
- [Getting Data In](docs/integrations/README.md) ... Integration recipes and data flow
- [GBRAIN_VERIFY.md](docs/GBRAIN_VERIFY.md) ... Installation verification

**For humans:**
- [GBRAIN_RECOMMENDED_SCHEMA.md](docs/GBRAIN_RECOMMENDED_SCHEMA.md) ... Brain repo directory structure
- [Thin Harness, Fat Skills](docs/ethos/THIN_HARNESS_FAT_SKILLS.md) ... Architecture philosophy
- [ENGINES.md](docs/ENGINES.md) ... Pluggable engine interface

**Reference:**
- [GBRAIN_V0.md](docs/GBRAIN_V0.md) ... Full product spec
- [CHANGELOG.md](CHANGELOG.md) ... Version history

**Benchmarks:**
- [BrainBench v1 (PR #188)](docs/benchmarks/2026-04-18-brainbench-v1.md) ... single comprehensive before/after report on a 240-page Opus-generated corpus. 7 categories: relational queries, identity resolution, temporal queries, performance, robustness, MCP contract.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Run `bun test` for unit tests. E2E tests: spin up Postgres with pgvector, run `bun run test:e2e`, tear down.

PRs welcome for: new enrichment APIs, performance optimizations, additional engine backends, new skills following the conformance standard in `skills/skill-creator/SKILL.md`.

## License

MIT