gbrain/llms-full.txt
Garry Tan 275158137a fix: v0.18.1 — RLS hardening + schema backfill (supersedes #336) (#343)
* fix(doctor): check ALL public tables for RLS, not just gbrain's own

The RLS check was hardcoded to only verify 10 gbrain-managed tables:
pages, content_chunks, links, tags, raw_data, page_versions,
timeline_entries, ingest_log, config, files.

Any other table in the public schema (created by the application,
extensions, or manually) was invisible to the check. This allowed
12 tables to exist without RLS for months — publicly readable by
anyone with the Supabase anon key.

Changes:
- Query ALL tables in public schema, not a hardcoded list
- Upgrade severity from 'warn' to 'fail' — missing RLS is a security
  issue, not a suggestion
- Include table count in success message for visibility
- Include remediation SQL in failure message

Supabase exposes the public schema via PostgREST. Any table without
RLS is readable/writable by the anon key by default.

* fix(schema): enable RLS on 10 gbrain-managed public tables

The base schema and prior migrations shipped 10 public tables
without Row Level Security enabled: access_tokens, mcp_request_log,
minion_inbox, minion_attachments, subagent_messages,
subagent_tool_executions, subagent_rate_leases, gbrain_cycle_locks,
budget_ledger, budget_reservations.

Supabase exposes the public schema via PostgREST, so tables without
RLS are readable and writable by anyone holding the anon key.
access_tokens and the subagent conversation history tables carry
the most sensitive data in the set.

Fix: add the missing ENABLE RLS statements to src/schema.sql
(inside the existing BYPASSRLS-gated DO block, so dev sessions
without bypass don't get locked out). Add a new schema migration
v17 rls_backfill_missing_tables that does the same on existing
brains. budget_ledger and budget_reservations were previously
migration-only (v12); promoted to the base schema so fresh installs
pick up RLS from the standard gate.

Regenerated src/core/schema-embedded.ts.

* fix(doctor): widen RLS check to all public tables, add GBRAIN:RLS_EXEMPT escape hatch

The RLS check was hardcoded to 10 gbrain-managed tables; any other
table in the public schema (plugin-created, user-created, extension-
created) was invisible to the check. Widen the scan to every
pg_tables row in the public schema.

Upgrade severity from warn to fail. Missing RLS is a security issue, not
a suggestion. gbrain doctor now exits 1 when any public table lacks
RLS. Cron and CI wrappers that call gbrain doctor should be aware
of the exit-code flip.

Add an explicit escape hatch for tables that should stay readable
by the anon key on purpose (analytics, public materialized views,
plugin tables). The doctor reads pg_description for each non-RLS
table and treats a comment matching GBRAIN:RLS_EXEMPT reason=<why>
as an intentional exemption. Doctor enumerates exempt tables by
name on every successful run so they never go invisible.

There is no gbrain rls-exempt CLI subcommand by design. The escape
hatch is deliberately painful: operators drop to psql and type the
justification as raw SQL. Comment lives in pg_description, survives
pg_dump, shows up in schema diffs, and appears in shell history.
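
A minimal sketch of how the exemption check described above might
classify tables. The row shape (`PublicTable`), the function name
`classifyRls`, and the exact regular expression are assumptions made
for illustration; only the `GBRAIN:RLS_EXEMPT reason=<why>` comment
format comes from this change.

```typescript
// Hypothetical row shape: one entry per table in the public schema,
// joined with its pg_description comment (null when there is none).
interface PublicTable {
  name: string;
  rlsEnabled: boolean;
  comment: string | null;
}

interface RlsReport {
  status: "ok" | "fail";
  missing: string[]; // no RLS and no valid exemption
  exempt: string[];  // no RLS, but carries a valid exemption comment
}

// Assumed pattern: the marker must be followed by a non-empty reason.
const EXEMPT_RE = /GBRAIN:RLS_EXEMPT\s+reason=\S+/;

function classifyRls(tables: PublicTable[]): RlsReport {
  const missing: string[] = [];
  const exempt: string[] = [];
  for (const t of tables) {
    if (t.rlsEnabled) continue; // covered, nothing to report
    if (t.comment !== null && EXEMPT_RE.test(t.comment)) {
      exempt.push(t.name);
    } else {
      missing.push(t.name);
    }
  }
  return { status: missing.length === 0 ? "ok" : "fail", missing, exempt };
}
```

Under these assumptions a bare `GBRAIN:RLS_EXEMPT` marker without a
`reason=` segment, or an unrelated comment, still counts as missing.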

PGLite is now explicitly skipped with an ok status (embedded and
single-user, no PostgREST exposure). Previously it hit the
db.getConnection() throw-catch path and surfaced a misleading warn.

Remediation SQL now quotes identifiers (ALTER TABLE "public"."<name>"
...) so it works on tables with hyphens, reserved words, or mixed
case.

See docs/guides/rls-and-you.md for the full user-facing guide.

* test: coverage for RLS hardening (doctor + migration + e2e)

Four layers of guard for the v0.18 RLS changes:

test/doctor.test.ts: source-grep structural regression guards on
the doctor RLS block — absence of the old tablename IN filter,
presence of status=fail on the gap branch, quoted-identifier
remediation SQL, PGLite skip wrapper, GBRAIN:RLS_EXEMPT parsing
with required reason=. Fast, no DB needed. Mirrors the
statement_timeout regression pattern in test/postgres-engine.test.ts.

test/migrate.test.ts: structural guard for migration v17. Asserts
the migration exists with the expected name, all 10 ALTER TABLE
statements are present, BYPASSRLS gating is in place, and
LATEST_VERSION has caught up.

test/e2e/mechanical.test.ts: rewrote the E2E RLS Verification
block. The old hardcoded-allowlist query is replaced with an
every-public-table-has-RLS assertion. Four new CLI-spawn cases
verify real end-to-end behavior: (a) no-RLS public table makes
gbrain doctor --json return status=fail with ALTER TABLE in the
message and exit code 1, (b) a GBRAIN:RLS_EXEMPT comment with a
valid reason makes doctor report the table as explicitly exempt
and keep status=ok, (c) a GBRAIN:RLS_EXEMPT prefix without a
reason= segment still fails doctor, (d) an unrelated comment on
a no-RLS table still fails doctor.

All helpers use try/finally with unique-per-run suffixes
(gbrain_rls_..._<pid>_<timestamp>) so assertion failures don't
pollute subsequent tests.

* docs: one-page guide for RLS and GBRAIN:RLS_EXEMPT escape hatch

Covers why RLS matters on Supabase (PostgREST exposes the public
schema to the anon key), what to do when gbrain doctor fails, the
exact SQL template for an intentional exemption, how to audit
exemptions later, and how the check behaves on PGLite vs
self-hosted Postgres.

Emphasizes that the escape hatch is deliberately painful: there is
no gbrain rls-exempt CLI subcommand and no
config-file allowlist. The operator drops to psql and writes the
justification in SQL, which makes the action visible in shell
history, pg_dump, schema diffs, and doctor output on every run.

Referenced from gbrain doctor's failure message when any public
table lacks RLS.

* chore: bump version and changelog (v0.18.0)

Reconciles VERSION and package.json (were drifting: 0.17.0 vs
0.16.4). Runtime gbrain --version reads from package.json via
src/version.ts, so prior ships were reporting 0.16.4. Both now
land on 0.18.0.

Minor bump (not patch) because gbrain doctor's exit code semantics
change: missing RLS on a public table was warn+exit-0, is now
fail+exit-1. Any external cron, CI, or skillpack-check wrapper
around gbrain doctor needs to be aware. skillpack-check.ts itself
is unaffected (uses --fast, skips DB checks).

CHANGELOG entry follows the release-summary format from CLAUDE.md:
headline, lead paragraph, numbers-that-matter table,
what-this-means-for-your-workflow, a "To take advantage of v0.18.0"
block with remediation SQL + exemption format, and itemized changes.

Also sweeps a stale @Wintermute reference in the 0.17.0 entry to
"Garry's OpenClaw" per the CLAUDE.md privacy rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.18.1): address codex review (orchestrator wiring + fail-closed + identifier escape)

Four fixes from `/codex` review of the merged diff:

1. HIGH — wire migration v24 into the `gbrain apply-migrations`
   upgrade path. Without an orchestrator entry, `gbrain upgrade`'s
   post-upgrade step runs `apply-migrations --yes`, which walks the
   registry in `src/commands/migrations/index.ts`. The registry
   stopped at v0_18_0, so v24 never fired on upgrade (connectEngine
   and doctor do not call initSchema). New `v0_18_1.ts` orchestrator
   mirrors v0.18.0's Phase A: shells out to `gbrain init
   --migrate-only`, which triggers initSchema → runMigrations → v24
   applies. Registered in the migrations array.

2. HIGH — fail loudly when v24 runs under a non-BYPASSRLS role
   instead of RAISE WARNING-then-silently-bumping-version. The
   runner at migrate.ts:773 unconditionally calls
   `setConfig('version', String(m.version))` when a migration
   completes without throwing, so a WARNING-and-continue path would
   permanently lock the backfill out: schema_version=24 on the next
   run means `m.version > current` is false and v24 is skipped
   forever, even after the role gets BYPASSRLS. Changed `RAISE
   WARNING` → `RAISE EXCEPTION` so the transaction aborts,
   schema_version stays at 23, and a subsequent initSchema retries
   cleanly after the role is fixed. Test asserts the SQL uses
   EXCEPTION and does not use WARNING.

3. MEDIUM — escape double-quote characters in the remediation SQL
   output. doctor.ts was building `ALTER TABLE "public"."${n}"`
   with `n` un-escaped, so a pathological table name containing a
   literal `"` would break out of the quoted identifier and produce
   invalid copy-paste SQL. Double the `"` before interpolating,
   matching Postgres quoted-identifier escaping rules. Extremely
   rare in practice, cheap to get right.

4. LOW — CHANGELOG cleanup: corrected the upgrade-behavior claim
   (v24 runs via `apply-migrations --yes` through the new
   orchestrator, not during `gbrain doctor`) and split the "tables
   with RLS" row into two metrics (21 base-schema tables + 2
   migration-only budget_* tables = 23 managed total, all covered).
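
The identifier escaping in item 3 can be sketched as follows. The
helper names `quoteIdent` and `remediationSql` are hypothetical and
not taken from doctor.ts; the doubling of embedded double quotes is
the standard Postgres rule for quoted identifiers.

```typescript
// Quote a Postgres identifier: wrap it in double quotes and double any
// embedded double-quote character so it cannot terminate the identifier.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Hypothetical helper that builds the copy-paste remediation statement.
function remediationSql(table: string): string {
  return `ALTER TABLE "public".${quoteIdent(table)} ENABLE ROW LEVEL SECURITY;`;
}
```

With this shape, hyphenated, mixed-case, and reserved-word table names
pass through unchanged inside the quotes, and a name containing a
literal `"` stays inside a single quoted identifier.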

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: add v0.18.1 to apply-migrations skippedFuture expectations

CI-only failure: test/apply-migrations.test.ts hardcodes the
orchestrator-migration version list in two `skippedFuture` expectations.
The v0.18.1 orchestrator I added in the prior commit pushed the list to
8 entries. Both assertions now include 0.18.1 at the tail.

Caught by the gbrain CI run on the merged branch — locally the rest of
the unit suite (dream/orphans) is flaky due to unrelated PGLite
parallelism, but `bun test test/apply-migrations.test.ts` now passes
18/18. CI should follow.

* docs: scrub v0.18.1 CHANGELOG — remove specific-table attack surface

Responsible-disclosure pass on the public-facing release notes. The
prior CHANGELOG entry enumerated which gbrain-managed public tables
had shipped without RLS and highlighted the most sensitive ones by
name. That gives anyone reading the CHANGELOG a directed probe list
for unpatched Supabase installs before operators have had a chance
to run `gbrain upgrade`.

Rewritten to describe the change at a functional level (what doctor
does now, what the upgrade path does, what the escape hatch is)
without naming the specific tables or quantifying the gap. The actual
SQL remains in the binary — anyone reverse-engineering can find it
there — but we shouldn't put it on the release page with a banner.

User-facing content kept intact: the "To take advantage of" block,
the upgrade commands, the exemption SQL template, the breaking
exit-code note.

* docs(CLAUDE.md): add responsible-disclosure rule for release notes

Prior incident on this branch: the original v0.18.1 CHANGELOG entry
enumerated the specific public tables that had shipped without RLS,
quantified the exposure duration, and highlighted the most sensitive
ones by name. Garry caught it. Scrubbed in ecd06a0.

This directive codifies the rule so future sessions (or other agents
working in this repo) don't repeat the mistake:

- Describe security fixes functionally, not by attack surface.
- Public artifacts (CHANGELOG, README, docs/, PR titles/bodies,
  commit messages, release pages) get the functional description.
- Private artifacts (plan files under ~/.claude/plans/ or
  ~/.gstack/projects/) keep the detailed before/after tables.
- Source code will disclose the specifics to reverse engineers
  anyway — that's intrinsic. The concern is the broadcast-channel
  asymmetry of a release page.

Also added a corresponding feedback memory at
~/.claude/projects/.../feedback_responsible_disclosure.md so the rule
carries across sessions and other projects, not just gbrain.

Placed right after the existing privacy rule (scrub real names) since
they share the same "public artifact hygiene" posture.

* chore: regenerate llms.txt + llms-full.txt (CLAUDE.md drift)

Adding the responsible-disclosure rule to CLAUDE.md in ffe340d
diverged the committed llms-full.txt from the generator output.
The build-llms drift-guard test caught it in CI. Regenerated.

* fix(v24): guard budget_ledger + budget_reservations with IF EXISTS

Garry flagged: migration v24 fires `ALTER TABLE budget_ledger ENABLE
ROW LEVEL SECURITY` unconditionally. budget_ledger and
budget_reservations are migration-only (v12) — not in schema.sql,
not re-created on every initSchema. In the normal flow v12 runs
before v24 so they exist, but two edge cases break that assumption:

  1. An operator manually dropped them (budget data is regenerable
     from resolver call logs, so `DROP TABLE` is a reasonable
     cleanup move).
  2. A brain was somehow running an old gbrain that lacked v12, and
     is only catching up now.

Bare ALTER hits 42P01 (relation does not exist), aborts the
transaction, and leaves schema_version at 23. On next initSchema,
v24 retries and hits the same error — stuck in a loop.
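
The retry behavior can be illustrated with a toy runner. This is a
sketch under stated assumptions, not the code in migrate.ts: the
`Migration` and `VersionStore` shapes are hypothetical, and the
transaction is modeled only as "a throw stops before the version
bump".

```typescript
interface Migration {
  version: number;
  run: () => void; // throws to abort
}

// Hypothetical in-memory stand-in for the stored schema version.
interface VersionStore {
  version: number;
}

// Apply every migration newer than the stored version, in order.
// The version is bumped only after a migration returns without throwing.
function runMigrations(store: VersionStore, migrations: Migration[]): void {
  const pending = migrations
    .filter((m) => m.version > store.version)
    .sort((a, b) => a.version - b.version);
  for (const m of pending) {
    m.run();                   // an exception propagates and stops the loop
    store.version = m.version; // reached only on success
  }
}
```

A migration that throws on every run leaves the version where it was
and is retried on each subsequent call; a migration that returns
normally is recorded as applied and never runs again.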

Fix: wrap each of the two budget ALTERs in
    IF EXISTS (SELECT 1 FROM information_schema.tables
                WHERE table_schema = 'public'
                  AND table_name = '<tbl>') THEN ... END IF;

The other 8 tables are not guarded. schema.sql creates them
idempotently on every initSchema run before migrations fire, so
they are guaranteed to exist by the time v24 runs. Adding guards
there would be unnecessary and make the SQL noisier.

Also simplified the DECLARE/BEGIN structure: moved the
non-BYPASSRLS early-exit to the top so the happy path reads
cleanly without the outer IF.

Tests:
  - test/migrate.test.ts: new assertion that both budget_* ALTERs
    are wrapped in information_schema.tables IF EXISTS blocks;
    BYPASSRLS gate assertion relaxed to match either phrasing.
  - Manual e2e: fresh Postgres init (v0→v24), then DROP TABLE
    budget_ledger + budget_reservations, reset version=23, re-run
    init. v24 applied cleanly, version advanced to 24, budget_*
    stayed dropped. Without the guard this would have errored out.

* test(e2e): v24 self-heals when budget_* tables are missing

Behavioral e2e proof for the IF EXISTS guard added in 2fc7780. Scenario:

  1. Fresh Postgres init to v24 (setupDB in beforeAll).
  2. DROP TABLE budget_ledger + budget_reservations.
  3. Roll config.version back to '23'.
  4. CLI-spawn `gbrain init --non-interactive` to re-trigger initSchema.
  5. Assert: exit 0, no 42P01 in stderr, version advances to 24,
     budget_* stay dropped (v12 does not re-run because the current
     version, 23, is already past 12).

Without the guard, step 4 hits 42P01 (relation does not exist),
aborts the transaction, leaves version at 23, and the next
initSchema re-runs v24 forever — an infinite retry loop. This test
catches any future regression that strips the guard.

Cleanup (finally block) restores budget_* with the exact migration
v12 schema so downstream tests that reference these tables see the
original shape. Version is restored from the pre-test snapshot.

Runs with the rest of the E2E: RLS Verification block. 78/78 in
test/e2e/mechanical.test.ts with the addition.

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 07:17:40 -07:00

4992 lines
259 KiB
Plaintext
# GBrain — Full Context
> GBrain is a personal knowledge brain and GStack mod for agent platforms. Pluggable engines (PGLite default, Postgres+pgvector for scale), contract-first operations, 26 fat-markdown skills. Teaches agents brain ops, ingestion, enrichment, scheduling, identity, and access control.
This file concatenates core GBrain documentation for single-fetch ingestion.
For the link-only index, see `llms.txt`. Source of truth: https://github.com/garrytan/gbrain.
# Core entry points
## AGENTS.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/AGENTS.md
# Agents working on GBrain
This is your install + operating protocol. Claude Code reads `./CLAUDE.md` automatically.
Everyone else (Codex, Cursor, OpenClaw, Aider, Continue, or an LLM fetching via URL):
start here.
## Install (5 min)
1. Clone: `git clone https://github.com/garrytan/gbrain ~/gbrain && cd ~/gbrain`
2. Install: `bun install`
3. Init the brain: `gbrain init` (defaults to PGLite, zero-config). For 1000+ files or
multi-machine sync, init suggests Postgres + pgvector via Supabase.
4. Read [`./INSTALL_FOR_AGENTS.md`](./INSTALL_FOR_AGENTS.md) for the full 9-step flow
(API keys, identity, cron, verification).
## Read in this order
1. `./AGENTS.md` (this file) — install + operating protocol.
2. [`./CLAUDE.md`](./CLAUDE.md) — architecture reference, key files, trust boundaries,
test layout.
3. [`./skills/RESOLVER.md`](./skills/RESOLVER.md) — skill dispatcher. Read before any task.
## Trust boundary (critical)
GBrain distinguishes **trusted local CLI callers** (`OperationContext.remote = false`,
set by `src/cli.ts`) from **untrusted agent-facing callers** (`remote = true`, set by
`src/mcp/server.ts`). Security-sensitive operations like `file_upload` tighten filesystem
confinement when `remote = true` and default to strict behavior when unset. If you are
writing or reviewing an operation, consult `src/core/operations.ts` for the contract.
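
A minimal sketch of the "strict when unset" rule, assuming a trimmed-down context type. The `isStrict` helper is hypothetical and not part of `src/core/operations.ts`; it only illustrates treating a missing `remote` flag the same as `remote = true`.

```typescript
// Hypothetical, trimmed-down context: only the field discussed above.
interface OperationContext {
  remote?: boolean;
}

// Strict confinement applies unless the caller is explicitly marked local.
// An unset flag is treated the same as remote = true (fail closed).
function isStrict(ctx: OperationContext): boolean {
  return ctx.remote !== false;
}
```

Comparing against `false` rather than checking for `true` means a caller that forgets to set the flag gets the tighter behavior.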
## Common tasks
- **Configure:** [`docs/ENGINES.md`](./docs/ENGINES.md),
[`docs/guides/live-sync.md`](./docs/guides/live-sync.md),
[`docs/mcp/DEPLOY.md`](./docs/mcp/DEPLOY.md).
- **Debug:** [`docs/GBRAIN_VERIFY.md`](./docs/GBRAIN_VERIFY.md),
[`docs/guides/minions-fix.md`](./docs/guides/minions-fix.md), `gbrain doctor --fix`.
- **Migrate:** [`docs/UPGRADING_DOWNSTREAM_AGENTS.md`](./docs/UPGRADING_DOWNSTREAM_AGENTS.md),
[`skills/migrations/`](./skills/migrations/), `gbrain apply-migrations`.
- **Everything else:** [`./llms.txt`](./llms.txt) is the full documentation map.
[`./llms-full.txt`](./llms-full.txt) is the same map with core docs inlined for
single-fetch ingestion.
## Before shipping
Run `bun test` plus the E2E lifecycle described in `./CLAUDE.md` (spin up the test
Postgres container, run `bun run test:e2e`, tear it down). Ship via the `/ship` skill,
not by hand.
## Privacy
Never commit real names of people, companies, or funds into public artifacts. See the
Privacy rule in `./CLAUDE.md`. GBrain pages reference real contacts; public docs must
use generic placeholders (`alice-example`, `acme-example`, `fund-a`).
## Forks
If you are a fork, regenerate `llms.txt` + `llms-full.txt` with your own URL base before
publishing: `LLMS_REPO_BASE=https://raw.githubusercontent.com/your-org/your-fork/main bun run build:llms`.
---
## CLAUDE.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/CLAUDE.md
# CLAUDE.md
GBrain is a personal knowledge brain and GStack mod for agent platforms. Pluggable
engines: PGLite (embedded Postgres via WASM, zero-config default) or Postgres + pgvector
+ hybrid search in a managed Supabase instance. `gbrain init` defaults to PGLite;
suggests Supabase for 1000+ files. GStack teaches agents how to code. GBrain teaches
agents everything else: brain ops, signal detection, content ingestion, enrichment,
cron scheduling, reports, identity, and access control.
## Architecture
Contract-first: `src/core/operations.ts` defines ~41 shared operations (adds `find_orphans` in v0.12.3). CLI and MCP
server are both generated from this single source. Engine factory (`src/core/engine-factory.ts`)
dynamically imports the configured engine (`'pglite'` or `'postgres'`). Skills are fat
markdown files (tool-agnostic, work with both CLI and plugin contexts).
**Trust boundary:** `OperationContext.remote` distinguishes trusted local CLI callers
(`remote: false` set by `src/cli.ts`) from untrusted agent-facing callers
(`remote: true` set by `src/mcp/server.ts`). Security-sensitive operations like
`file_upload` tighten filesystem confinement when `remote=true` and default to
strict behavior when unset.
## Key files
- `src/core/operations.ts` — Contract-first operation definitions (the foundation). Also exports upload validators: `validateUploadPath`, `validatePageSlug`, `validateFilename`. `OperationContext.remote` flags untrusted callers.
- `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). As of v0.13.1, `BrainEngine` has a `readonly kind: 'postgres' | 'pglite'` discriminator so migrations (`src/core/migrate.ts`) and other consumers can branch on engine without `instanceof` + dynamic imports.
- `src/core/engine-factory.ts` — Engine factory with dynamic imports (`'pglite'` | `'postgres'`)
- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 40 BrainEngine methods. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders. As of v0.13.1, `connect()` wraps `PGlite.create()` in a try/catch that emits an actionable error naming the macOS 26.3 WASM bug (#223) and pointing at `gbrain doctor`; the lock is released on failure so the next process can retry cleanly.
- `src/core/pglite-schema.ts` — PGLite-specific DDL (pgvector, pg_trgm, triggers)
- `src/core/postgres-engine.ts` — Postgres + pgvector implementation (Supabase / self-hosted). `addLinksBatch` / `addTimelineEntriesBatch` use `INSERT ... SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` — 4-5 array params regardless of batch size, sidesteps the 65535-parameter cap. As of v0.12.3, `searchKeyword` / `searchVector` scope `statement_timeout` via `sql.begin` + `SET LOCAL` so the GUC dies with the transaction instead of leaking across the pooled postgres.js connection (contributed by @garagon). `getEmbeddingsByChunkIds` uses `tryParseEmbedding` so one corrupt row skips+warns instead of killing the query.
- `src/core/utils.ts` — Shared SQL utilities extracted from postgres-engine.ts. Exports `parseEmbedding(value)` (throws on unknown input, used by migration + ingest paths where data integrity matters) and as of v0.12.3 `tryParseEmbedding(value)` (returns `null` + warns once per process, used by search/rescore paths where availability matters more than strictness).
- `src/core/db.ts` — Connection management, schema initialization
- `src/commands/migrate-engine.ts` — Bidirectional engine migration (`gbrain migrate --to supabase/pglite`)
- `src/core/import-file.ts` — importFromFile + importFromContent (chunk + embed + tags)
- `src/core/sync.ts` — Pure sync functions (manifest parsing, filtering, slug conversion)
- `src/core/storage.ts` — Pluggable storage interface (S3, Supabase Storage, local)
- `src/core/supabase-admin.ts` — Supabase admin API (project discovery, pgvector check)
- `src/core/file-resolver.ts` — File resolution with fallback chain (local -> .redirect.yaml -> .redirect -> .supabase)
- `src/core/chunkers/` — 3-tier chunking (recursive, semantic, LLM-guided)
- `src/core/search/` — Hybrid search: vector + keyword + RRF + multi-query expansion + dedup
- `src/core/search/intent.ts` — Query intent classifier (entity/temporal/event/general → auto-selects detail level)
- `src/core/search/eval.ts` — Retrieval eval harness: P@k, R@k, MRR, nDCG@k metrics + runEval() orchestrator
- `src/commands/eval.ts` — `gbrain eval` command: single-run table + A/B config comparison
- `src/core/embedding.ts` — OpenAI text-embedding-3-large, batch, retry, backoff
- `src/core/check-resolvable.ts` — Resolver validation: reachability, MECE overlap, DRY checks, structured fix objects. v0.14.1: `CROSS_CUTTING_PATTERNS.conventions` is an array (notability gate accepts both `conventions/quality.md` and `_brain-filing-rules.md`). New `extractDelegationTargets()` parses `> **Convention:**`, `> **Filing rule:**`, and inline backtick references. DRY suppression is proximity-based via `DRY_PROXIMITY_LINES = 40`.
- `src/core/repo-root.ts` — Shared `findRepoRoot(startDir?)` (v0.16.4): walks up from `startDir` (default `process.cwd()`) looking for `skills/RESOLVER.md`. Zero-dependency module imported by both `doctor.ts` and `check-resolvable.ts`. Parameterized `startDir` makes tests hermetic.
- `src/commands/check-resolvable.ts` — Standalone CLI wrapper (v0.16.4) over `checkResolvable()`. Exports `parseFlags`, `resolveSkillsDir`, `DEFERRED`, `runCheckResolvable`. Exit rule: **1 on any issue (warnings OR errors)**, stricter than doctor's `ok` flag — honors README:259. Stable JSON envelope `{ok, skillsDir, report, autoFix, deferred, error, message}` — same shape on success and error paths. `--fix` path runs `autoFixDryViolations` BEFORE `checkResolvable` (same ordering as doctor). `deferred[]` array surfaces pending Checks 5 (trigger routing eval) and 6 (brain filing) with issue URLs. `scripts/skillify-check.ts` subprocess-calls `gbrain check-resolvable --json` (cached per process) and fails loud on binary-missing — no silent false-pass.
- `src/core/dry-fix.ts` — `gbrain doctor --fix` engine. `autoFixDryViolations(fixes, {dryRun})` rewrites inlined rules to `> **Convention:** see [path](path).` callouts via three shape-aware expanders (bullet / blockquote / paragraph). Five guards: working-tree-dirty (`getWorkingTreeStatus()` returns 3-state `'clean' | 'dirty' | 'not_a_repo'`), no-git-backup, inside-code-fence, already-delegated (40-line proximity, consistent with detector), ambiguous-multi-match, block-is-callout. `execFileSync` array args (no shell — no injection surface). EOF newline preserved.
- `src/core/backoff.ts` — Adaptive load-aware throttling: CPU/memory checks, exponential backoff, active hours multiplier
- `src/core/fail-improve.ts` — Deterministic-first, LLM-fallback loop with JSONL failure logging and auto-test generation
- `src/core/transcription.ts` — Audio transcription: Groq Whisper (default), OpenAI fallback, ffmpeg segmentation for >25MB
- `src/core/enrichment-service.ts` — Global enrichment service: entity slug generation, tier auto-escalation, batch throttling
- `src/core/data-research.ts` — Recipe validation, field extraction (MRR/ARR regex), dedup, tracker parsing, HTML stripping
- `src/commands/extract.ts` — `gbrain extract links|timeline|all [--source fs|db]`: batch link/timeline extraction. fs walks markdown files, db walks pages from the engine (mutation-immune snapshot iteration; use this for live brains with no local checkout). As of v0.12.1 there is no in-memory dedup pre-load — candidates are buffered 100 at a time and flushed via `addLinksBatch` / `addTimelineEntriesBatch`; `ON CONFLICT DO NOTHING` enforces uniqueness at the DB layer, and the `created` counter returns real rows inserted (truthful on re-runs).
- `src/commands/graph-query.ts` — `gbrain graph-query <slug> [--type T] [--depth N] [--direction in|out|both]`: typed-edge relationship traversal (renders indented tree)
- `src/core/link-extraction.ts` — shared library for the v0.12.0 graph layer. extractEntityRefs (canonical, replaces backlinks.ts duplicate) matches both `[Name](people/slug)` markdown links and Obsidian `[[people/slug|Name]]` wikilinks as of v0.12.3. extractPageLinks, inferLinkType heuristics (attended/works_at/invested_in/founded/advises/source/mentions), parseTimelineEntries, isAutoLinkEnabled config helper. `DIR_PATTERN` covers `people`, `companies`, `deals`, `topics`, `concepts`, `projects`, `entities`, `tech`, `finance`, `personal`, `openclaw`. Used by extract.ts, operations.ts auto-link post-hook, and backlinks.ts.
- `src/core/minions/` — Minions job queue: BullMQ-inspired, Postgres-native (queue, worker, backoff, types, protected-names, quiet-hours, stagger, handlers/shell).
- `src/core/minions/queue.ts` — MinionQueue class (submit, claim, complete, fail, stall detection, parent-child, depth/child-cap, per-job timeouts, cascade-kill, attachments, idempotency keys, child_done inbox, removeOnComplete/Fail). `add()` takes a 4th `trusted` arg (separate from `opts` to prevent spread leakage); protected names in `PROTECTED_JOB_NAMES` require `{allowProtectedSubmit: true}` and the check runs trim-normalized (whitespace-bypass safe). v0.14.1 #219: `add()` plumbs `max_stalled` through with a `[1, 100]` clamp; omitted values let the schema DEFAULT (5) kick in.
- `src/core/minions/worker.ts` — MinionWorker class (handler registry, lock renewal, graceful shutdown, timeout safety net). v0.14.0 abort-path fix: aborted jobs now call `failJob` with reason (`timeout`/`cancel`/`lock-lost`/`shutdown`) instead of returning silently. `shutdownAbort` (instance field) fires on process SIGTERM/SIGINT and propagates to `ctx.shutdownSignal` — shell handler listens to it; non-shell handlers don't.
- `src/core/minions/types.ts` — `MinionJobInput` + `MinionJobStatus` + handler context types. `MinionJobInput.max_stalled` (new in v0.14.1) is optional; omitted values let the schema DEFAULT (5) kick in, provided values are clamped to `[1, 100]`.
- `src/core/minions/protected-names.ts` — side-effect-free constant module exporting `PROTECTED_JOB_NAMES` + `isProtectedJobName()`. Kept pure so queue core can import without loading handler modules.
- `src/core/minions/handlers/shell.ts` — `shell` job handler. Spawns `/bin/sh -c cmd` (absolute path, PATH-override-safe) or `argv[0] argv[1..]` (no shell). Env allowlist: `PATH, HOME, USER, LANG, TZ, NODE_ENV` + caller `env:` overrides. UTF-8-safe stdout/stderr tail via `string_decoder.StringDecoder`. Abort (either `ctx.signal` or `ctx.shutdownSignal`) fires SIGTERM → 5s grace → SIGKILL on child. Requires `GBRAIN_ALLOW_SHELL_JOBS=1` on worker (gated by `registerBuiltinHandlers`).
- `src/core/minions/handlers/shell-audit.ts` — per-submission JSONL audit trail at `~/.gbrain/audit/shell-jobs-YYYY-Www.jsonl` (ISO-week rotation; override via `GBRAIN_AUDIT_DIR`). Best-effort: `mkdirSync(recursive)` + `appendFileSync`; failures logged to stderr, submission not blocked. Logs cmd (first 80 chars) or argv (JSON array). Never logs env values.
- `src/core/minions/handlers/subagent.ts` (v0.15) — LLM-loop handler. Two-phase tool persistence (pending → complete/failed), replay reconciliation for mid-dispatch crashes, dual-signal abort (`ctx.signal` + `ctx.shutdownSignal`), Anthropic prompt caching on system + tool defs. `makeSubagentHandler({engine, client?, ...})` factory; `MessagesClient` is an injectable interface the real SDK implements structurally. Throws `RateLeaseUnavailableError` (renewable) when rate-lease capacity is full.
- `src/core/minions/handlers/subagent-aggregator.ts` (v0.15) — `subagent_aggregator` handler. Claims AFTER all children resolve (queue changes guarantee every terminal child posts a `child_done` inbox message with outcome). Reads inbox via `ctx.readInbox()`, builds deterministic mixed-outcome markdown summary. No LLM call in v0.15.
- `src/core/minions/handlers/subagent-audit.ts` (v0.15) — JSONL audit + heartbeat writer at `~/.gbrain/audit/subagent-jobs-YYYY-Www.jsonl`. Events: `submission` (one line per submit) + `heartbeat` (per turn boundary: `llm_call_started | llm_call_completed | tool_called | tool_result | tool_failed`). Never logs prompts or tool inputs. `readSubagentAuditForJob(jobId, {sinceIso})` is the readback path for `gbrain agent logs`.
- `src/core/minions/rate-leases.ts` (v0.15) — lease-based concurrency cap for outbound providers (default key `anthropic:messages`, max via `GBRAIN_ANTHROPIC_MAX_INFLIGHT`). Owner-tagged rows with `expires_at` auto-prune on acquire; `pg_advisory_xact_lock` guards check-then-insert; CASCADE on owning job deletion. `renewLeaseWithBackoff` retries 3x (250/500/1000ms).
- `src/core/minions/wait-for-completion.ts` (v0.15) — poll-until-terminal helper for CLI callers. `TimeoutError` does NOT cancel the job; `AbortSignal` exits without throwing. Default `pollMs`: 1000 on Postgres, 250 on PGLite inline.
- `src/core/minions/transcript.ts` (v0.15) — renders `subagent_messages` + `subagent_tool_executions` to markdown. Tool rows splice under their owning assistant `tool_use` by `tool_use_id`. UTF-8-safe truncation; unknown block types fall through to fenced JSON.
- `src/core/minions/plugin-loader.ts` (v0.15) — `GBRAIN_PLUGIN_PATH` discovery. Absolute paths only, left-wins collision, `gbrain.plugin.json` with `plugin_version: "gbrain-plugin-v1"`, plugins ship DEFS only (no new tools), `allowed_tools:` validated at load time against the derived registry.
- `src/core/minions/tools/brain-allowlist.ts` (v0.15) — derives subagent tool registry from `src/core/operations.ts`. 11-name allow-list: `query`, `search`, `get_page`, `list_pages`, `file_list`, `file_url`, `get_backlinks`, `traverse_graph`, `resolve_slugs`, `get_ingest_log`, `put_page`. `put_page` schema is namespace-wrapped per subagent (`^wiki/agents/<subagentId>/.+`); the `put_page` op's server-side check is the authoritative gate via `ctx.viaSubagent` fail-closed.
- `src/mcp/tool-defs.ts` (v0.15) — extracted `buildToolDefs(ops)` helper. MCP server + subagent tool registry both call it; byte-for-byte equivalence pinned by `test/mcp-tool-defs.test.ts`.
- `src/core/minions/attachments.ts` — Attachment validation (path traversal, null byte, oversize, base64, duplicate detection)
- `src/commands/agent.ts` (v0.16) — `gbrain agent run <prompt> [flags]` CLI. Submits `subagent` (or N children + 1 aggregator) under `{allowProtectedSubmit: true}`. Single-entry `--fanout-manifest` short-circuits. Children get `on_child_fail: 'continue'` + `max_stalled: 3`. `--follow` is the default on TTY; streams logs + polls `waitForCompletion` in parallel. Ctrl-C detaches, does not cancel.
- `src/commands/agent-logs.ts` (v0.16) — `gbrain agent logs <job> [--follow] [--since]`. Merges JSONL heartbeat audit + `subagent_messages` into a chronological timeline. `parseSince` accepts ISO-8601 or relative (`5m`, `1h`, `2d`). Transcript tail renders only for terminal jobs.
- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. v0.13.1 surfaces the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219. v0.16 wires `registerBuiltinHandlers` to always register `subagent` + `subagent_aggregator` (no env flag — `ANTHROPIC_API_KEY` is the natural cost gate, trust is via `PROTECTED_JOB_NAMES`) and loads `GBRAIN_PLUGIN_PATH` plugins at worker startup with a loud startup-line per plugin. `shell` handler still gated by `GBRAIN_ALLOW_SHELL_JOBS=1` (RCE surface, separate concern).
- `src/commands/features.ts` — `gbrain features --json --auto-fix`: usage scan + feature adoption salesman
- `src/commands/autopilot.ts` — `gbrain autopilot --install`: self-maintaining brain daemon (sync+extract+embed)
- `src/mcp/server.ts` — MCP stdio server (generated from operations)
- `src/commands/auth.ts` — Standalone token management (create/list/revoke/test)
- `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally.
- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry <version>` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed.
- `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent.
- `src/commands/orphans.ts` — `gbrain orphans [--json] [--count] [--include-pseudo]`: surfaces pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. Shipped in v0.12.3 (contributed by @knee5).
- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`.
- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows.
- `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2.
- `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=<ms>` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers.
- `src/core/cycle.ts` — v0.17 brain maintenance cycle primitive. `runCycle(engine: BrainEngine | null, opts: CycleOpts): Promise<CycleReport>` composes 6 phases in semantically-driven order (lint → backlinks → sync → extract → embed → orphans). Three callers: `gbrain dream` CLI, `gbrain autopilot` daemon's inline path, and the Minions `autopilot-cycle` handler (`src/commands/jobs.ts`). One source of truth for what the brain does overnight. Coordination via `gbrain_cycle_locks` DB table (TTL-based; works through PgBouncer transaction pooling, unlike session-scoped `pg_try_advisory_lock`) + `~/.gbrain/cycle.lock` file lock with PID-liveness for PGLite / engine=null mode. `CycleReport.schema_version: "1"` is the stable agent-consumable shape. `PhaseResult.error: { class, code, message, hint?, docs_url? }` is Stripe-API-tier structured failure info. `yieldBetweenPhases` hook awaited between every phase — Minions handler uses this to renew its job lock and prevent v0.14 stall-death regression. Engine nullable: filesystem phases (lint, backlinks) run without DB; DB phases skip with `status: "skipped", reason: "no_database"`. Lock-skip: read-only phase selections (`--phase orphans`) bypass the cycle lock.
- `src/commands/dream.ts` — v0.17 `gbrain dream` CLI. ~80-line thin alias over `runCycle`. brainDir resolution requires explicit `--dir` OR `sync.repo_path` config (no more walk-up-cwd-for-.git footgun). Flags: `--dry-run`, `--json`, `--phase <name>`, `--pull`, `--dir <path>`. Exit code 1 on status=failed (partial/warn not fatal — don't page on warnings).
- `scripts/check-progress-to-stdout.sh` — CI guard against regressing to `\r`-on-stdout progress. Wired into `bun run test` via `scripts/check-progress-to-stdout.sh && bun test` in package.json.
- `docs/progress-events.md` — Canonical JSON event schema reference. Stable from v0.15.2, additive only.
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).
- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces (a) the `${JSON.stringify(x)}::jsonb` interpolation pattern (postgres.js v3 double-encodes it), or (b) `max_stalled INTEGER NOT NULL DEFAULT 1` in any schema source file (v0.15.1 #219 regression guard — must be DEFAULT 5 to preserve SIGKILL-rescue). Wired into `bun test`.
- `scripts/llms-config.ts` + `scripts/build-llms.ts` — Generator for `llms.txt` (llmstxt.org-spec web index) + `llms-full.txt` (inlined single-fetch bundle). Curated config drives both. Run `bun run build:llms` after adding a new doc. `LLMS_REPO_BASE` env var lets forks regenerate with their own URL base. `FULL_SIZE_BUDGET` (600KB) caps the inline bundle; generator WARNs if exceeded. Committed output is not analogous to `schema-embedded.ts` (no runtime consumer); we commit for GitHub browsing and fork-safe fetching.
- `AGENTS.md` — Local-clone entry point for non-Claude agents (Codex, Cursor, OpenClaw, Aider). Mirrors `CLAUDE.md` intent via relative links. Claude Code keeps using `CLAUDE.md`.
- `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich.
- `src/core/schema-embedded.ts` — AUTO-GENERATED from schema.sql (run `bun run build:schema`)
- `src/schema.sql` — Full Postgres + pgvector DDL (source of truth, generates schema-embedded.ts)
- `src/commands/integrations.ts` — Standalone integration recipe management (no DB needed). Exports `getRecipeDirs()` (trust-tagged recipe sources), SSRF helpers (`isInternalUrl`, `parseOctet`, `hostnameToOctets`, `isPrivateIpv4`). Only package-bundled recipes are `embedded=true`; `$GBRAIN_RECIPES_DIR` and cwd `./recipes/` are untrusted and cannot run `command`/`http`/string health checks.
- `src/core/search/expansion.ts` — Multi-query expansion via Haiku. Exports `sanitizeQueryForPrompt` + `sanitizeExpansionOutput` (prompt-injection defense-in-depth). Sanitized query is only used for the LLM channel; original query still drives search.
- `recipes/` — Integration recipe files (YAML frontmatter + markdown setup instructions)
- `docs/guides/` — Individual SKILLPACK guides (broken out from monolith)
- `docs/integrations/` — "Getting Data In" guides and integration docs
- `docs/architecture/infra-layer.md` — Shared infrastructure documentation
- `docs/ethos/THIN_HARNESS_FAT_SKILLS.md` — Architecture philosophy essay
- `docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md` — "Homebrew for Personal AI" essay
- `docs/guides/repo-architecture.md` — Two-repo pattern (agent vs brain)
- `docs/guides/sub-agent-routing.md` — Model routing table for sub-agents
- `docs/guides/skill-development.md` — 5-step skill development cycle + MECE
- `docs/guides/idea-capture.md` — Originality distribution, depth test, cross-linking
- `docs/guides/quiet-hours.md` — Notification hold + timezone-aware delivery
- `docs/guides/diligence-ingestion.md` — Data room to brain pages pipeline
- `docs/designs/HOMEBREW_FOR_PERSONAL_AI.md` — 10-star vision for integration system
- `docs/mcp/` — Per-client setup guides (Claude Desktop, Code, Cowork, Perplexity)
- `docs/benchmarks/` — Search quality benchmark results (reproducible, fictional data)
- `skills/_brain-filing-rules.md` — Cross-cutting brain filing rules (referenced by all brain-writing skills)
- `skills/RESOLVER.md` — Skill routing table (based on the agent-fork AGENTS.md pattern)
- `skills/conventions/` — Cross-cutting rules (quality, brain-first, model-routing, test-before-bulk, cross-modal)
- `skills/_output-rules.md` — Output quality standards (deterministic links, no slop, exact phrasing)
- `skills/signal-detector/SKILL.md` — Always-on idea+entity capture on every message
- `skills/brain-ops/SKILL.md` — Brain-first lookup, read-enrich-write loop, source attribution
- `skills/idea-ingest/SKILL.md` — Links/articles/tweets with author people page mandatory
- `skills/media-ingest/SKILL.md` — Video/audio/PDF/book with entity extraction
- `skills/meeting-ingestion/SKILL.md` — Transcripts with attendee enrichment chaining
- `skills/citation-fixer/SKILL.md` — Citation format auditing and fixing
- `skills/repo-architecture/SKILL.md` — Filing rules by primary subject
- `skills/skill-creator/SKILL.md` — Create conforming skills with MECE check
- `skills/daily-task-manager/SKILL.md` — Task lifecycle with priority levels
- `skills/daily-task-prep/SKILL.md` — Morning prep with calendar context
- `skills/cross-modal-review/SKILL.md` — Quality gate via second model
- `skills/cron-scheduler/SKILL.md` — Schedule staggering, quiet hours, idempotency
- `skills/reports/SKILL.md` — Timestamped reports with keyword routing
- `skills/testing/SKILL.md` — Skill validation framework
- `skills/soul-audit/SKILL.md` — 6-phase interview for SOUL.md, USER.md, ACCESS_POLICY.md, HEARTBEAT.md
- `skills/webhook-transforms/SKILL.md` — External events to brain signals
- `skills/data-research/SKILL.md` — Structured data research: email-to-tracker pipeline with parameterized YAML recipes
- `skills/minion-orchestrator/SKILL.md` — Background job orchestration: submit, fan out children with depth/cap/timeouts, collect results via child_done inbox
- `templates/` — SOUL.md, USER.md, ACCESS_POLICY.md, HEARTBEAT.md templates
- `skills/migrations/` — Version migration files with feature_pitch YAML frontmatter
- `src/commands/publish.ts` — Deterministic brain page publisher (code+skill pair, zero LLM calls)
- `src/commands/backlinks.ts` — Back-link checker and fixer (enforces Iron Law)
- `src/commands/lint.ts` — Page quality linter (catches LLM artifacts, placeholder dates)
- `src/commands/report.ts` — Structured report saver (audit trail for maintenance/enrichment)
- `openclaw.plugin.json` — ClawHub bundle plugin manifest
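The ISO-week rotation used by the shell and subagent audit trails listed above (`shell-jobs-YYYY-Www.jsonl`, `subagent-jobs-YYYY-Www.jsonl`) can be sketched as follows. This is a minimal illustration, not the code in `shell-audit.ts` or `subagent-audit.ts`: `isoWeekStamp` and `auditFilePath` are hypothetical names, and computing the week in UTC is an assumption, since this file does not say which timezone drives rotation.

```typescript
// Hypothetical helpers for illustration; not the actual audit writers.

/** Returns the ISO-8601 week stamp "YYYY-Www" for a date (computed in UTC, an assumption). */
function isoWeekStamp(date: Date): string {
  // Work on a UTC-midnight copy so the time of day cannot shift the result.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // ISO weekdays run Monday=1 .. Sunday=7.
  const weekday = d.getUTCDay() || 7;
  // Move to the Thursday of this week; its calendar year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - weekday);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return d.getUTCFullYear() + "-W" + (week < 10 ? "0" : "") + week;
}

/** Builds the audit file path; `auditDirOverride` stands in for GBRAIN_AUDIT_DIR. */
function auditFilePath(
  kind: "shell-jobs" | "subagent-jobs",
  now: Date,
  homeDir: string,
  auditDirOverride?: string,
): string {
  const dir = auditDirOverride ?? homeDir + "/.gbrain/audit";
  return dir + "/" + kind + "-" + isoWeekStamp(now) + ".jsonl";
}
```

Note that the week-year can differ from the calendar year around January 1, so a file written on a Sunday in early January may carry the previous year's stamp.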
## Commands
Run `gbrain --help` or `gbrain --tools-json` for full command reference.
Key commands added in v0.7:
- `gbrain init` — defaults to PGLite (no Supabase needed), scans repo size, suggests Supabase for 1000+ files
- `gbrain migrate --to supabase` / `gbrain migrate --to pglite` — bidirectional engine migration
Key commands added for Minions (job queue):
- `gbrain jobs submit <name> [--params JSON] [--follow] [--dry-run]` — submit a background job. v0.13.1 adds first-class flags for every `MinionJobInput` tuning knob: `--max-stalled N`, `--backoff-type fixed|exponential`, `--backoff-delay Nms`, `--backoff-jitter 0..1`, `--timeout-ms N`, `--idempotency-key K`.
- `gbrain jobs list [--status S] [--queue Q]` — list jobs with filters
- `gbrain jobs get <id>` — job details with attempt history
- `gbrain jobs cancel/retry/delete <id>` — manage job lifecycle
- `gbrain jobs prune [--older-than 30d]` — clean old completed/dead jobs
- `gbrain jobs stats` — job health dashboard
- `gbrain jobs smoke [--sigkill-rescue]` — health smoke test. `--sigkill-rescue` is the v0.13.1 regression guard for #219: simulates a killed worker and asserts the stalled job is requeued instead of dead-lettered on first stall.
- `gbrain jobs work [--queue Q] [--concurrency N]` — start worker daemon (Postgres only)
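To make the `--backoff-type`, `--backoff-delay`, and `--backoff-jitter` knobs concrete, here is a generic sketch of how such settings are commonly turned into a retry delay. The doubling rule and the symmetric jitter window are assumptions for illustration only; this file does not specify the formula the Minions queue uses, and `computeBackoffMs` is a hypothetical name.

```typescript
type BackoffType = "fixed" | "exponential";

interface BackoffOptions {
  type: BackoffType; // mirrors --backoff-type
  delayMs: number;   // mirrors --backoff-delay
  jitter: number;    // mirrors --backoff-jitter, a fraction in 0..1
}

// Hypothetical helper: generic backoff math, not gbrain's actual implementation.
function computeBackoffMs(
  attempt: number, // 1 for the first retry
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  if (attempt < 1) throw new RangeError("attempt must be >= 1");
  if (opts.jitter < 0 || opts.jitter > 1) throw new RangeError("jitter must be in 0..1");
  // Fixed: same delay every time. Exponential (assumed): double per attempt.
  const base = opts.type === "fixed" ? opts.delayMs : opts.delayMs * 2 ** (attempt - 1);
  // Spread the delay uniformly over [base * (1 - jitter), base * (1 + jitter)].
  const spread = base * opts.jitter;
  return Math.round(base - spread + 2 * spread * random());
}
```

Injecting `random` keeps the function deterministic under test; jitter exists so that many jobs failing at once do not all retry at the same instant.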
Key commands added in v0.12.2:
- `gbrain repair-jsonb [--dry-run] [--json]` — repair double-encoded JSONB rows left over from v0.12.0-and-earlier Postgres writes. Idempotent; PGLite no-ops. The `v0_12_2` migration runs this automatically on `gbrain upgrade`.
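Double-encoding here means an object was serialized to a JSON string and that string was then stored as a JSONB string value, so `jsonb_typeof` reports `'string'` where `'object'` was intended. The sketch below shows the detect-and-unwrap idea in plain TypeScript. `repairDoubleEncoded` is a hypothetical helper operating on in-memory values; how the real command performs its in-place rewrite is not shown in this file.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical helper: illustrates the repair rule, not the actual command.
function repairDoubleEncoded(value: Json): { value: Json; repaired: boolean } {
  // Only string values are candidates, mirroring the jsonb_typeof = 'string' filter.
  if (typeof value !== "string") return { value, repaired: false };
  try {
    const parsed: Json = JSON.parse(value);
    // Unwrap only when the string was hiding an object or array;
    // a string that happens to decode to a scalar is left alone.
    if (parsed !== null && typeof parsed === "object") {
      return { value: parsed, repaired: true };
    }
  } catch {
    // Not JSON at all: a legitimate string value.
  }
  return { value, repaired: false };
}
```

Running the helper on an already-repaired value returns `repaired: false`, which is the same idempotency property the command advertises.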
Key commands added in v0.12.3:
- `gbrain orphans [--json] [--count] [--include-pseudo]` — surface pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. The natural consumer of the v0.12.0 knowledge graph layer: once edges are captured, find the gaps.
- `gbrain doctor` gains two new reliability detection checks: `jsonb_integrity` (v0.12.0 Postgres double-encode damage) and `markdown_body_completeness` (pages truncated by the old splitBody bug). Detection only; fix hints point at `gbrain repair-jsonb` and `gbrain sync --force`.
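The orphan check reduces to a set difference: every page slug that never appears as a link target. A minimal sketch under stated assumptions: `findOrphansByDomain` is a hypothetical name, "domain" is taken to be the first path segment of the slug, and the auto-generated/raw/pseudo filtering that the real command applies by default is left out.

```typescript
interface Link {
  from: string; // slug of the page containing the wikilink
  to: string;   // slug of the page being linked to
}

// Hypothetical helper: groups pages with zero inbound links by domain.
function findOrphansByDomain(slugs: string[], links: Link[]): Map<string, string[]> {
  // Every slug that is the target of at least one link.
  const linked = new Set(links.map((l) => l.to));
  const byDomain = new Map<string, string[]>();
  for (const slug of slugs) {
    if (linked.has(slug)) continue;
    // Assumption: the domain is the first path segment, e.g. "people/alice" -> "people".
    const slash = slug.indexOf("/");
    const domain = slash === -1 ? "(root)" : slug.slice(0, slash);
    const bucket = byDomain.get(domain) ?? [];
    bucket.push(slug);
    byDomain.set(domain, bucket);
  }
  return byDomain;
}
```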
Key commands added in v0.14.2:
- `gbrain sync --skip-failed` — acknowledge the current set of failed-parse files recorded in `~/.gbrain/sync-failures.jsonl` so the sync bookmark advances past them. Doctor's `sync_failures` check shows previously-skipped as "all acknowledged" instead of warning.
- `gbrain sync --retry-failed` — re-walk the unacknowledged failures and re-attempt parsing. If the files now succeed, they clear from the set and the bookmark advances naturally.
- `gbrain apply-migrations --force-retry <version>` — reset a wedged migration (3 consecutive partials with no completion) by appending a `'retry'` marker. Next `apply-migrations --yes` treats the version as fresh. `complete` status never regresses to `partial` either before or after a retry marker.
- `GBRAIN_POOL_SIZE` env var — honored by both the singleton pool (`src/core/db.ts`) and the parallel-import worker pool (`src/commands/import.ts`). Default is 10; lower to 2 for Supabase transaction pooler to avoid MaxClients crashes during `gbrain upgrade` subprocess spawns. Read at call time via `resolvePoolSize()`.
- `gbrain doctor` gains two new checks: `sync_failures` (surfaces unacknowledged parse failures with exact paths + fix hints) and `brain_score` (renders the 5-component breakdown when score < 100: embed coverage / 35, link density / 25, timeline coverage / 15, orphans / 15, dead links / 10 — sum equals total).
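The wedge and retry rules for `apply-migrations` described above can be read as a small reducer over an append-only ledger. The sketch below is one interpretation of those rules, not the code in `apply-migrations.ts`: the entry shape is reduced to `{version, status}`, and `resolveMigrationStatus` and the `"pending"` label are hypothetical.

```typescript
type LedgerStatus = "complete" | "partial" | "retry";

interface LedgerEntry {
  version: string;
  status: LedgerStatus;
}

type Resolved = "complete" | "wedged" | "partial" | "pending";

const WEDGE_THRESHOLD = 3; // 3 consecutive partials with no completion

// Hypothetical helper: resolves the effective status of one version.
function resolveMigrationStatus(ledger: LedgerEntry[], version: string): Resolved {
  const statuses = ledger.filter((e) => e.version === version).map((e) => e.status);
  // A completion anywhere in the history wins: complete never regresses,
  // whether it was written before or after a retry marker.
  if (statuses.indexOf("complete") !== -1) return "complete";
  // Only partials recorded after the most recent retry marker count toward wedging.
  const partials = statuses.slice(statuses.lastIndexOf("retry") + 1).length;
  if (partials >= WEDGE_THRESHOLD) return "wedged";
  return partials > 0 ? "partial" : "pending";
}
```

Because the ledger is only ever appended to, `--force-retry` does not need to delete history; the marker simply moves the point from which partials are counted.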
Key commands added in v0.14.3 (fix wave):
- `gbrain doctor --index-audit` — opt-in Postgres-only check reporting zero-scan indexes from `pg_stat_user_indexes`. Informational only; never auto-drops.
- `gbrain doctor` schema_version check fails loudly when `version=0` — catches `bun install -g github:...` postinstall failures (#218) and routes users to `gbrain apply-migrations --yes`.
- `gbrain jobs submit` gains `--max-stalled`, `--backoff-type`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key` — exposing existing `MinionJobInput` fields as first-class CLI flags.
- `gbrain jobs smoke --sigkill-rescue` — opt-in regression smoke case simulating a killed worker; asserts the v0.14.3 schema default (`max_stalled=5`) actually rescues on first stall.
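Why the `max_stalled` default matters for SIGKILL rescue can be seen in a toy model of stall handling. The comparison below (dead-letter once the stall count reaches `maxStalled`) is an assumption inferred from the description above, where a default of 1 dead-letters on the first stall and a default of 5 rescues. `handleStall` is a hypothetical name and the real queue logic may differ.

```typescript
interface StalledJob {
  id: string;
  stalledCount: number; // how many times the job has stalled so far
  maxStalled: number;   // mirrors minion_jobs.max_stalled
}

type StallOutcome = "requeued" | "dead";

// Hypothetical model of one stall event, e.g. after a worker was SIGKILLed.
function handleStall(job: StalledJob): { job: StalledJob; outcome: StallOutcome } {
  const stalledCount = job.stalledCount + 1;
  // Assumed rule: dead-letter when the count reaches the cap, requeue otherwise.
  const outcome: StallOutcome = stalledCount >= job.maxStalled ? "dead" : "requeued";
  return { job: { ...job, stalledCount }, outcome };
}
```

Under this model a job with `maxStalled: 1` is lost the first time its worker dies, while `maxStalled: 5` leaves room for four rescues.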
## Testing
`bun test` runs all tests. After the v0.12.1 release: ~75 unit test files + 8 E2E test files (1412 unit tests pass; 119 E2E tests pass when `DATABASE_URL` is set). Unit tests run
without a database. E2E tests skip gracefully when `DATABASE_URL` is not set.
Unit tests: `test/markdown.test.ts` (frontmatter parsing), `test/chunkers/recursive.test.ts`
(chunking), `test/parity.test.ts` (operations contract
parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redaction),
`test/files.test.ts` (MIME/hash), `test/import-file.test.ts` (import pipeline),
`test/upgrade.test.ts` (schema migrations),
`test/file-migration.test.ts` (file migration), `test/file-resolver.test.ts` (file resolution),
`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration; v8/v9 helper-btree-index SQL structural assertions + 1000-row wall-clock fixtures that guard the O(n²)→O(n log n) fix + v0.13.1 assertions on v12/v13 SQL shape, `sqlFor` + `transaction:false` runner semantics, and the `max_stalled DEFAULT 1` regression guard),
`test/setup-branching.test.ts` (setup flow), `test/slug-validation.test.ts` (slug validation),
`test/storage.test.ts` (storage backends), `test/supabase-admin.test.ts` (Supabase admin),
`test/yaml-lite.test.ts` (YAML parsing), `test/check-update.test.ts` (version check + update CLI),
`test/pglite-engine.test.ts` (PGLite engine, all 40 BrainEngine methods including 11 cases for `addLinksBatch` / `addTimelineEntriesBatch`: empty batch, missing optionals, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch, batch of 100 + v0.13.1 `connect()` error-wrap assertion (original error nested, #223 link in message, lock released)),
`test/engine-factory.test.ts` (engine factory + dynamic imports),
`test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation),
`test/publish.test.ts` (content stripping, encryption, password generation, HTML output),
`test/backlinks.test.ts` (entity extraction, back-link detection, timeline entry generation),
`test/lint.test.ts` (LLM artifact detection, code fence stripping, frontmatter validation),
`test/report.test.ts` (report format, directory structure),
`test/skills-conformance.test.ts` (skill frontmatter + required sections validation),
`test/resolver.test.ts` (RESOLVER.md coverage, routing validation),
`test/search.test.ts` (RRF normalization, compiled truth boost, cosine similarity, dedup key),
`test/dedup.test.ts` (source-aware dedup, compiled truth guarantee, layer interactions),
`test/intent.test.ts` (query intent classification: entity/temporal/event/general),
`test/eval.test.ts` (retrieval metrics: precisionAtK, recallAtK, mrr, ndcgAtK, parseQrels),
`test/check-resolvable.test.ts` (resolver reachability, MECE overlap, gap detection, DRY checks + v0.14.1 proximity-based DRY detection + `extractDelegationTargets` coverage — 13 DRY cases),
`test/dry-fix.test.ts` (v0.14.1 auto-fix: three shape-aware expander pure-function tests, five guards — working-tree-dirty, no-git-backup, inside-code-fence, already-delegated within 40 lines, ambiguous-multi-match, block-is-callout — 28 cases),
`test/doctor-fix.test.ts` (v0.14.1 `gbrain doctor --fix` CLI integration: dry-run preview, apply path, JSON output shape — 3 cases),
`test/backoff.test.ts` (load-aware throttling, concurrency limits, active hours),
`test/fail-improve.test.ts` (deterministic/LLM cascade, JSONL logging, test generation, rotation),
`test/transcription.test.ts` (provider detection, format validation, API key errors),
`test/enrichment-service.test.ts` (entity slugification, extraction, tier escalation),
`test/data-research.test.ts` (recipe validation, MRR/ARR extraction, dedup, tracker parsing, HTML stripping),
`test/minions.test.ts` (Minions job queue v7: CRUD, state machine, backoff, stall detection, dependencies, worker lifecycle, lock management, claim mechanics, depth/child-cap, timeouts, cascade kill, idempotency, child_done inbox, attachments, removeOnComplete/Fail + v0.13.1 `max_stalled` clamp/default/plumbing coverage),
`test/extract.test.ts` (link extraction, timeline extraction, frontmatter parsing, directory type inference),
`test/extract-db.test.ts` (gbrain extract --source db: typed link inference, idempotency, --type filter, --dry-run JSON output),
`test/extract-fs.test.ts` (gbrain extract --source fs: first-run inserts + second-run reports zero, dry-run dedups candidates across files, second-run perf regression guard — the v0.12.1 N+1 dedup bug),
`test/link-extraction.test.ts` (canonical extractEntityRefs both formats, extractPageLinks dedup, inferLinkType heuristics, parseTimelineEntries date variants, isAutoLinkEnabled config),
`test/graph-query.test.ts` (direction in/out/both, type filter, indented tree output),
`test/features.test.ts` (feature scanning, brain_score calculation, CLI routing, persistence),
`test/file-upload-security.test.ts` (symlink traversal, cwd confinement, slug + filename allowlists, remote vs local trust),
`test/query-sanitization.test.ts` (prompt-injection stripping, output sanitization, structural boundary),
`test/search-limit.test.ts` (clampSearchLimit default/cap behavior across list_pages and get_ingest_log),
`test/repair-jsonb.test.ts` (v0.12.2 JSONB repair: TARGETS list, idempotency, engine-awareness),
`test/migrations-v0_12_2.test.ts` (v0.12.2 orchestrator phases: schema → repair → verify → record),
`test/markdown.test.ts` (splitBody sentinel precedence, horizontal-rule preservation, inferType wiki subtypes),
`test/orphans.test.ts` (v0.12.3 orphans command: detection, pseudo filtering, text/json/count outputs, MCP op),
`test/postgres-engine.test.ts` (v0.12.3 statement_timeout scoping: `sql.begin` + `SET LOCAL` shape, source-level grep guardrail against reintroduced bare `SET statement_timeout`),
`test/sync.test.ts` (sync logic + v0.12.3 regression guard asserting top-level `engine.transaction` is not called),
`test/doctor.test.ts` (doctor command + v0.12.3 assertions that `jsonb_integrity` scans the four v0.12.0 write sites and `markdown_body_completeness` is present),
`test/utils.test.ts` (shared SQL utilities + `tryParseEmbedding` null-return and single-warn semantics),
`test/build-llms.test.ts` (llms.txt/llms-full.txt generator: path resolution, idempotence, spec shape, regen-drift guard, content contract, AGENTS.md install-path mirror, size-budget enforcement — 7 cases).
E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.
- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys). Includes 9 dedicated cases for the postgres-engine `addLinksBatch` / `addTimelineEntriesBatch` bind path — postgres-js's `unnest()` binding is structurally different from PGLite's and gets its own coverage.
- `test/e2e/search-quality.test.ts` runs search quality E2E against PGLite (no API keys, in-memory)
- `test/e2e/graph-quality.test.ts` runs the v0.10.3 knowledge graph pipeline (auto-link via put_page, reconciliation, traversePaths) against PGLite in-memory
- `test/e2e/postgres-jsonb.test.ts` — v0.12.2 regression test. Round-trips all 5 JSONB write sites (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter) against real Postgres and asserts `jsonb_typeof='object'` plus `->>'key'` returns the expected scalar. The test that should have caught the original double-encode bug.
- `test/e2e/jsonb-roundtrip.test.ts` — v0.12.3 companion regression against the 4 doctor-scanned JSONB sites. Assertion-level overlap with `postgres-jsonb.test.ts` is intentional defense-in-depth: if doctor's scan surface ever drifts from the actual write surface, one of these tests catches it.
- `test/e2e/upgrade.test.ts` runs check-update E2E against real GitHub API (network required)
- Tier 2 (`skills.test.ts`) requires OpenClaw + API keys, runs nightly in CI
- If `.env.testing` doesn't exist in this directory, check sibling worktrees for one:
`find ../ -maxdepth 2 -name .env.testing -print -quit` and copy it here if found.
- Always run E2E tests when they exist. Do not skip them just because DATABASE_URL
is not set. Start the test DB, run the tests, then tear it down.
### API keys and running ALL tests
ALWAYS source the user's shell profile before running tests:
```bash
source ~/.zshrc 2>/dev/null || true
```
This loads `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`. Without these, Tier 2 tests
skip silently. Do NOT skip Tier 2 tests just because they require API keys — load
the keys and run them.
When asked to "run all E2E tests" or "run tests", that means ALL tiers:
- Tier 1: `bun run test:e2e` (mechanical, sync, upgrade — no API keys needed)
- Tier 2: `test/e2e/skills.test.ts` (requires OpenAI + Anthropic + openclaw CLI)
- Always spin up the test DB, source zshrc, run everything, tear down.
### E2E test DB lifecycle (ALWAYS follow this)
You are responsible for spinning up and tearing down the test Postgres container.
Do not leave containers running after tests. Do not skip E2E tests.
1. **Check for `.env.testing`** — if missing, copy from sibling worktree.
Read it to get the DATABASE_URL (it has the port number).
2. **Check if the port is free:**
`docker ps --filter "publish=PORT"` — if another container is on that port,
pick a different port (try 5435, 5436, 5437) and start on that one instead.
3. **Start the test DB:**
```bash
docker run -d --name gbrain-test-pg \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=gbrain_test \
  -p PORT:5432 pgvector/pgvector:pg16
```
Wait for ready: `docker exec gbrain-test-pg pg_isready -U postgres`
4. **Run E2E tests:**
`DATABASE_URL=postgresql://postgres:postgres@localhost:PORT/gbrain_test bun run test:e2e`
5. **Tear down immediately after tests finish (pass or fail):**
`docker stop gbrain-test-pg && docker rm gbrain-test-pg`
Never leave `gbrain-test-pg` running. If you find a stale one from a previous run,
stop and remove it before starting a new one.
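The lifecycle above is essentially a try/finally around the test run. A minimal sketch, assuming the command runner is injected so the flow can be exercised without Docker: `pickPort`, `databaseUrl`, `withTestDb`, and the `Runner` type are hypothetical names, while the container name, image, credentials, and fallback ports come from the steps above.

```typescript
// Injected command runner; a real script might wrap spawnSync from node:child_process.
type Runner = (cmd: string, args: string[]) => void;

const CONTAINER = "gbrain-test-pg";

/** Step 2: use the preferred port if free, otherwise try 5435, 5436, 5437. */
function pickPort(preferred: number, inUse: Set<number>): number {
  const candidates = [preferred, 5435, 5436, 5437];
  const free = candidates.find((p) => !inUse.has(p));
  if (free === undefined) throw new Error("no free port among " + candidates.join(", "));
  return free;
}

function databaseUrl(port: number): string {
  return "postgresql://postgres:postgres@localhost:" + port + "/gbrain_test";
}

/** Steps 3-5: start the container, run the tests, and always tear down. */
function withTestDb(port: number, run: Runner, runTests: (url: string) => void): void {
  run("docker", [
    "run", "-d", "--name", CONTAINER,
    "-e", "POSTGRES_USER=postgres", "-e", "POSTGRES_PASSWORD=postgres",
    "-e", "POSTGRES_DB=gbrain_test",
    "-p", port + ":5432", "pgvector/pgvector:pg16",
  ]);
  try {
    // A real script would poll this readiness check until it succeeds.
    run("docker", ["exec", CONTAINER, "pg_isready", "-U", "postgres"]);
    runTests(databaseUrl(port));
  } finally {
    // Tear down whether the tests passed or failed.
    run("docker", ["stop", CONTAINER]);
    run("docker", ["rm", CONTAINER]);
  }
}
```

Putting the teardown in `finally` is what enforces the "pass or fail" rule: a thrown test failure still propagates, but only after the container is stopped and removed.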
## Skills
Read the skill files in `skills/` before doing brain operations. GBrain ships 26 skills
organized by `skills/RESOLVER.md`:
**Original 8 (conformance-migrated):** ingest (thin router), query, maintain, enrich,
briefing, migrate, setup, publish.
**Brain skills (ported from an upstream agent fork):** signal-detector, brain-ops, idea-ingest, media-ingest,
meeting-ingestion, citation-fixer, repo-architecture, skill-creator, daily-task-manager.
**Operational + identity:** daily-task-prep, cross-modal-review, cron-scheduler, reports,
testing, soul-audit, webhook-transforms, data-research, minion-orchestrator.
**Conventions:** `skills/conventions/` has cross-cutting rules (quality, brain-first,
model-routing, test-before-bulk, cross-modal). `skills/_brain-filing-rules.md` and
`skills/_output-rules.md` are shared references.
## Bulk-action progress reporting
All bulk commands (doctor, embed, import, export, sync, extract, migrate,
repair-jsonb, orphans, check-backlinks, lint, integrity auto, eval, files
sync, and apply-migrations) stream progress through the shared reporter
at `src/core/progress.ts`. Agents get heartbeats within 1 second of every
iteration regardless of how slow the underlying work is.
Rules:
- Progress always writes to **stderr**. Stdout stays clean for data output
(`--json` payloads, final summaries, JSON action events from `extract`).
- Non-TTY default: plain one-line-per-event human text. JSON requires the
explicit `--progress-json` flag.
- Global flags (`--quiet`, `--progress-json`, `--progress-interval=<ms>`)
are parsed by `src/core/cli-options.ts` BEFORE command dispatch.
- Phase names are machine-stable `snake_case.dot.path` (e.g.
`doctor.db_checks`, `sync.imports`). Documented in
`docs/progress-events.md`; additive changes only.
- `scripts/check-progress-to-stdout.sh` is a CI guard that fails the build
if any new code writes `\r` progress to stdout. Wired into `bun run test`.
- Minion handlers pass `job.updateProgress` as the `onProgress` callback
to core functions (DB-backed primary progress channel); stderr from
`jobs work` stays coarse for daemon liveness only.
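As a rough illustration of the phase-name rule, a validator might look like the sketch below. The exact grammar lives in `docs/progress-events.md`, which is not reproduced here, so the pattern (lowercase snake_case segments joined by dots) and the name `isValidPhaseName` are assumptions.

```typescript
// Assumed grammar: lowercase snake_case segments joined by single dots.
// The authoritative definition is in docs/progress-events.md.
const PHASE_NAME =
  /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*(?:\.[a-z][a-z0-9]*(?:_[a-z0-9]+)*)*$/;

// Hypothetical helper, not part of gbrain.
function isValidPhaseName(phase: string): boolean {
  return PHASE_NAME.test(phase);
}
```

Under this assumed grammar, `doctor.db_checks` and `sync.imports` are accepted, while `Doctor.DbChecks` and `sync..imports` are rejected.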
When wiring a new bulk command: `import { createProgress } from '../core/progress.ts'`
and `import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'`.
Create a reporter with `createProgress(cliOptsToProgressOptions(getCliOptions()))`,
`start(phase, total?)` before the loop, `tick()` inside it, `finish()` after.
For single long-running queries, use `startHeartbeat(reporter, note)` with a
try/finally to guarantee cleanup. Never call `process.stdout.write('\r...')`
in bulk paths; the CI guard will fail the build.
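The call pattern can be sketched with a stand-in reporter. Only the `start` / `tick` / `finish` sequence and the stderr-versus-stdout split come from the text above; `createStubProgress`, `importAll`, the output format, and the try/finally around the loop are assumptions made for this example, and the real reporter in `src/core/progress.ts` may differ.

```typescript
// Stand-in for the reporter created by createProgress() in
// src/core/progress.ts. Only the start/tick/finish call pattern comes from
// the text above; this stub's internals and output format are assumptions.
interface Reporter {
  start(phase: string, total?: number): void;
  tick(): void;
  finish(): void;
}

function createStubProgress(write: (line: string) => void): Reporter {
  let phase = "";
  let total: number | undefined;
  let done = 0;
  return {
    start(p: string, t?: number): void {
      phase = p;
      total = t;
      done = 0;
      write(`${phase} start`);
    },
    tick(): void {
      done += 1;
      const suffix = total === undefined ? "" : `/${total}`;
      write(`${phase} ${done}${suffix}`);
    },
    finish(): void {
      write(`${phase} done`);
    },
  };
}

// A bulk loop: start() before the loop, tick() inside it, finish() after.
function importAll(pages: string[], progress: Reporter): number {
  progress.start("sync.imports", pages.length);
  let imported = 0;
  try {
    for (const page of pages) {
      if (page.length > 0) imported += 1; // placeholder for the real work
      progress.tick();
    }
  } finally {
    progress.finish(); // runs even if the loop throws
  }
  return imported;
}

// Progress goes to stderr; only the final payload goes to stdout.
const stderrProgress = createStubProgress((line) => console.error(line));
console.log(JSON.stringify({ imported: importAll(["a.md", "b.md"], stderrProgress) }));
```

Passing the sink as a parameter keeps the stub testable and makes it obvious that nothing in the loop writes to stdout.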
## Build
`bun build --compile --outfile bin/gbrain src/cli.ts`
## Pre-ship requirements
Before shipping (/ship) or reviewing (/review), always run the full test suite:
- `bun test` — unit tests (no database required)
- Follow the "E2E test DB lifecycle" steps above to spin up the test DB,
run `bun run test:e2e`, then tear it down.
Both must pass. Do not ship with failing E2E tests. Do not skip E2E tests.
## Post-ship requirements (MANDATORY)
After EVERY /ship, you MUST run /document-release. This is NOT optional. Do NOT
skip it. Do NOT say "docs look fine" without running it. The skill reads every .md
file in the project, cross-references the diff, and updates anything that drifted.
If /ship's Step 8.5 triggers document-release automatically, that counts. But if
it gets skipped for ANY reason (timeout, error, oversight), you MUST run it manually
before considering the ship complete.
Files that MUST be checked on every ship:
- README.md — does it reflect new features, commands, or setup steps?
- CLAUDE.md — does it reflect new files, test files, or architecture changes?
- CHANGELOG.md — does it cover every commit?
- TODOS.md — are completed items marked done?
- docs/ — do any guides need updating?
A ship without updated docs is an incomplete ship. Period.
## CHANGELOG voice + release-summary format
Every version entry in `CHANGELOG.md` MUST start with a release-summary section in
the GStack/Garry voice — one viewport's worth of prose + tables that lands like a
verdict, not marketing. The itemized changelog (subsections, bullets, files) goes
BELOW that summary, separated by a `### Itemized changes` header.
The release-summary section gets read by humans, by the auto-update agent, and by
anyone deciding whether to upgrade. The itemized list is for agents that need to
know exactly what changed.
### Release-summary template
Use this structure for the top of every `## [X.Y.Z]` entry:
1. **Two-line bold headline** (10-14 words total) ... should land like a verdict, not
marketing. Sound like someone who shipped today and cares whether it works.
2. **Lead paragraph** (3-5 sentences) ... what shipped, what changed for the user.
Specific, concrete, no AI vocabulary, no em dashes, no hype.
3. **A "The X numbers that matter" section** with:
- One short setup paragraph naming the source of the numbers (real production
deployment OR a reproducible benchmark ... name the file/command to run).
- A table of 3-6 key metrics with BEFORE / AFTER / Δ columns.
- A second optional table for per-category breakdown if relevant.
- 1-2 sentences interpreting the most striking number in concrete user terms.
4. **A "What this means for [audience]" closing paragraph** (2-4 sentences) tying
the metrics to a real workflow shift. End with what to do.
Voice rules:
- No em dashes (use commas, periods, "...").
- No AI vocabulary (delve, robust, comprehensive, nuanced, fundamental, etc.) or
banned phrases ("here's the kicker", "the bottom line", etc.).
- Real numbers, real file names, real commands. Not "fast" but "~30s on 30K pages."
- Short paragraphs, mix one-sentence punches with 2-3 sentence runs.
- Connect to user outcomes: "the agent does ~3x less reading" beats "improved
precision."
- Be direct about quality. "Well-designed" or "this is a mess." No dancing.
Source material to pull from:
- CHANGELOG.md previous entry for prior context
- `docs/benchmarks/[latest].md` for the headline numbers
- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped
- Don't make up numbers. If a metric isn't in a benchmark or production data, don't
include it. Say "no measurement yet" if asked.
Target length: ~250-350 words for the summary. Should render as one viewport.
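Some of the voice rules are mechanical enough to check before a human reads the draft. The sketch below is a hypothetical lint helper, not an existing gbrain tool: `lintReleaseSummary` and the rule names are invented for illustration, and the word and phrase lists contain only the examples named above (the "etc." in those lists is not filled in).

```typescript
interface LintIssue {
  rule: string;
  detail: string;
}

// Only the examples named in the voice rules; the full lists are not given.
const BANNED_WORDS = ["delve", "robust", "comprehensive", "nuanced", "fundamental"];
const BANNED_PHRASES = ["here's the kicker", "the bottom line"];
const EM_DASH = "\u2014";

// Hypothetical helper: flags em dashes, banned vocabulary, and length drift.
function lintReleaseSummary(text: string): LintIssue[] {
  const issues: LintIssue[] = [];
  const lower = text.toLowerCase();

  if (text.indexOf(EM_DASH) !== -1) {
    issues.push({ rule: "no-em-dash", detail: "use commas, periods, or \"...\"" });
  }
  for (const word of BANNED_WORDS) {
    // Word boundaries keep "robust" from matching inside longer words.
    if (new RegExp("\\b" + word + "\\b").test(lower)) {
      issues.push({ rule: "banned-word", detail: word });
    }
  }
  for (const phrase of BANNED_PHRASES) {
    if (lower.indexOf(phrase) !== -1) {
      issues.push({ rule: "banned-phrase", detail: phrase });
    }
  }
  const wordCount = text.split(/\s+/).filter((w) => w.length > 0).length;
  if (wordCount < 250 || wordCount > 350) {
    issues.push({ rule: "length", detail: `${wordCount} words, target is roughly 250-350` });
  }
  return issues;
}
```

A check like this only catches surface violations; whether the summary "lands like a verdict" still needs a reader.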
### "To take advantage of v[version]" block (required, v0.13+)
After the release-summary and BEFORE `### Itemized changes`, every `## [X.Y.Z]`
entry MUST include a human-readable self-repair block under the heading
`## To take advantage of v[version]`.
Why: `gbrain upgrade` runs `gbrain post-upgrade` which runs `gbrain apply-migrations`.
This chain has a known weak link — `upgrade.ts` catches post-upgrade failures as
best-effort (so the binary still works). When that chain silently fails, users end
up with half-upgraded brains. The self-repair block gives them a paste-ready
recovery path; the v0.13+ `~/.gbrain/upgrade-errors.jsonl` trail + `gbrain doctor`
integration close the loop.
Template (adapt the verify commands per release):
````markdown
## To take advantage of v[version]
`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor`
warns about a partial migration:
1. **Run the orchestrator manually:**
```bash
gbrain apply-migrations --yes
```
2. **Your agent reads `skills/migrations/v[version].md` the next time you interact with it.**
[One sentence on whether headless agents need manual action, or whether the
orchestrator already handled the mechanical side.]
3. **Verify the outcome:**
```bash
[release-specific verify commands, e.g. `gbrain graph ... --depth 2`]
gbrain stats
```
4. **If any step fails or the numbers look wrong,** please file an issue:
https://github.com/garrytan/gbrain/issues with:
- output of `gbrain doctor`
- contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
- which step broke
This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
````
**Skip this block** for patches that are pure bug fixes with zero user-facing action
(rare). If the release has a schema migration, data backfill, or new feature the
user needs to verify, the block is required.
The v0.13.0 entry in CHANGELOG.md is the canonical example of this block.
### Itemized changes (the existing rules)
Below the release summary, write `### Itemized changes` and continue with the
detailed subsections (Knowledge Graph Layer, Schema migrations, Security hardening,
Tests, etc.). Same rules as before:
- Lead with what the user can now DO that they couldn't before
- Frame as benefits and capabilities, not files changed or code written
- Make the user think "hell yeah, I want that"
- Bad: "Added GBRAIN_VERIFY.md installation verification runbook"
- Good: "Your agent now verifies the entire GBrain installation end-to-end, catching
silent sync failures and stale embeddings before they bite you"
- Bad: "Setup skill Phase H and Phase I added"
- Good: "New installs automatically set up live sync so your brain never falls behind"
- **Always credit community contributions.** When a CHANGELOG entry includes work from
a community PR, name the contributor with `Contributed by @username`. Contributors
did real work. Thank them publicly every time, no exceptions.
### Reference: v0.12.0 entry as canonical example
The v0.12.0 entry in CHANGELOG.md is the canonical example of the format. Match its
structure for every future version: bold headline, lead paragraph, "numbers that
matter" with BrainBench-style before/after table, "what this means" closer, then
`### Itemized changes` with the detailed sections below.
## Version migrations
Create a migration file at `skills/migrations/v[version].md` when a release
includes changes that existing users need to act on. The auto-update agent
reads these files post-upgrade (SKILLPACK Section 17, Step 4) and executes them.
**You need a migration file when:**
- New setup step that existing installs don't have (e.g., v0.5.0 added live sync,
existing users need to set it up, not just new installs)
- New SKILLPACK section with a MUST ADD setup requirement
- Schema changes that require `gbrain init` or manual SQL
- Changed defaults that affect existing behavior
- Deprecated commands or flags that need replacement
- New verification steps that should run on existing installs
- New cron jobs or background processes that should be registered
**You do NOT need a migration file when:**
- Bug fixes with no behavior changes
- Documentation-only improvements (the agent re-reads docs automatically)
- New optional features that don't affect existing setups
- Performance improvements that are transparent
**The key test:** if an existing user upgrades and does nothing else, will their
brain work worse than before? If yes, migration file. If no, skip it.
Write migration files as agent instructions, not technical notes. Tell the agent
what to do, step by step, with exact commands. See `skills/migrations/v0.5.0.md`
for the pattern.
## Migration is canonical, not advisory
GBrain's job is to deliver a canonical, working setup to every user on upgrade.
Anything that looks like a "host-repo change" — AGENTS.md, cron manifests,
launchctl units, config files outside `~/.gbrain/` — is a GBrain migration
step, not a nudge we leave for the host-repo maintainer. Migrations edit host
files (with backups) to make the canonical setup real. Exceptions: changes
that require human judgment (content edits, renames that break semantics,
host-specific handler registration where shell-exec would be an RCE surface).
Everything mechanical ships in the migration.
**Test:** if shipping a feature requires a sentence that starts with "in
your AGENTS.md, add…" or "in your cron/jobs.json, rewrite…", the migration
orchestrator should be doing that edit, not the user.
**The exception is host-specific code.** For custom Minion handlers
(host-specific integrations like inbox sweeps or third-party API scanners), shipping them as a
data file the worker would exec is an RCE surface. Those get registered in
the host's own repo via the plugin contract (`docs/guides/plugin-handlers.md`);
the migration orchestrator emits a structured TODO to
`~/.gbrain/migrations/pending-host-work.jsonl`, and the host agent walks the
TODOs using `skills/migrations/v0.11.0.md`. The flow stays host-agnostic and
still canonical.
## Privacy rule: scrub real names from public docs
**Never reference real people, companies, funds, or private agent names in any
public-facing artifact.** Public artifacts include: `CHANGELOG.md`, `README.md`,
`docs/`, `skills/`, PR titles + bodies, commit messages, and comments in checked-in
code. Query examples, benchmark stories, and migration guides MUST use generic
placeholders.
Why: gbrain runs a personal knowledge brain containing notes on real people and
real companies (YC founders, portfolio companies, funds, investors, meeting
attendees). When a doc copies a query like `gbrain graph diana-hu --depth 2` or
names a specific agent fork like `Wintermute`, that real name gets indexed by
search engines, surfaced in cross-references, and distributed with every release.
**Name mapping** to use in examples:
- Agent forks → `your agent fork`, `a downstream agent`, or `agent-fork`
- Example person → `alice-example`, `charlie-example`, or `a-founder`
- Example company → `acme-example`, `widget-co`, or `a-company`
- Example fund → `fund-a`, `fund-b`, `fund-c`
- Example deal → `acme-seed`, `widget-series-a`
- Example meeting → `meetings/2026-04-03` (generic date is fine)
- Example user → `you` or `the user`, never a proper name
**Specific rule: never say `Wintermute` in any CHANGELOG, README, doc, PR, or
commit message.** When the temptation is to illustrate with the real fork name:
- Reader-facing copy → `your OpenClaw` (covers Wintermute, Hermes, AlphaClaw,
and any other downstream OpenClaw deployment in one term the reader already
recognizes).
- First-person / origin-story copy → `Garry's OpenClaw` (honest that this is
the production deployment driving the feature, without exposing the private
agent's name).
`Wintermute` may appear in private artifacts (scratch plans under
`~/.gstack/projects/…`, memory files, conversation transcripts, CEO-review
plans) — those aren't distributed. Anything checked into this repo or shipped
in a release must use the OpenClaw phrasing above. Sweeping a stale reference
is a small clean-up PR, not a debate.
**When in doubt, ask yourself:** "Would this query reveal private information
about the user's contacts, investments, or portfolio if it were read by a
stranger?" If yes, replace with generic placeholders.
**Illustrative API examples with household-brand companies** (Stripe, Brex, OpenAI,
GitHub, etc.) are fine — they're public entities, not contacts in anyone's brain.
Do not confuse illustrative API examples with queries that reveal real
relationships.
## Responsible-disclosure rule: don't broadcast attack surface in release notes
**When a release fixes a security gap or a user-impacting bug, describe the fix
functionally. Do not enumerate the attack surface, quantify the exposure window,
or highlight the most sensitive records by name in public-facing artifacts.**
Public-facing artifacts include: `CHANGELOG.md`, `README.md`, `docs/`, PR titles
and bodies, commit messages, GitHub issue titles and comments, release pages,
tweets, blog posts.
**Don't write:**
- "10 tables were publicly readable by the anon key for months, including X, Y, Z"
- "X and Y are the most sensitive ones"
- "N tables exposed. Fix: enable RLS on these specific tables: ..."
**Do write:**
- "Security hardening pass. Fresh installs secure by default. Existing brains
brought to the same bar automatically on upgrade."
- "If `gbrain doctor` still flags anything after upgrade, the message names each
table and gives the exact fix."
Why: anyone reading the release page before they've upgraded now has a directed
probe list for unpatched installs. The source code ships the specifics anyway
(`src/schema.sql`, `src/core/migrate.ts`, test fixtures) — reverse engineers can
get them. But the release page is a broadcast channel. Don't hand attackers a
curated list with a banner.
**The test:** if a reader with no prior context could read the release note and
walk away knowing "gbrain at version X has table Y readable by anon key until
they patch," the note is too specific. Rewrite until that's no longer possible.
**What IS fine in public artifacts:**
- The mechanism of the fix ("the check now scans every public table instead of
a hardcoded allowlist").
- User-facing operator ergonomics (the escape-hatch SQL template, the upgrade
commands, the breaking-change flag).
- Credit to contributors.
- Generic framing of severity ("security posture tightening pass") without
quantification.
**What stays in private artifacts (plan files, private memories, internal docs):**
- Specific table names, record counts, exposure duration.
- Which records stand out as highest-risk.
- Detailed before/after tables in the "numbers that matter" format.
If the CEO/Eng review of a plan produces a detailed exposure table, keep it in
the plan file under `~/.claude/plans/` or `~/.gstack/projects/`. Don't copy it
into the CHANGELOG or PR body.
Applies retroactively: if you see a prior CHANGELOG entry naming attack-surface
specifics, scrub it as a small cleanup commit, the same way a stale Wintermute
reference gets swept.
## Schema state tracking
`~/.gbrain/update-state.json` tracks which recommended schema directories the user
adopted, declined, or added custom. The auto-update agent (SKILLPACK Section 17)
reads this during upgrades to suggest new schema additions without re-suggesting
things the user already declined. The setup skill writes the initial state during
Phase C/E. Never modify a user's custom directories or re-suggest declined ones.
## GitHub Actions SHA maintenance
All GitHub Actions in `.github/workflows/` are pinned to commit SHAs. Before shipping
(`/ship`) or reviewing (`/review`), check for stale pins and update them:
```bash
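for action in actions/checkout oven-sh/setup-bun actions/upload-artifact actions/download-artifact softprops/action-gh-release gitleaks/gitleaks-action; do
  # Read the version comment (e.g. "# v4") from the first workflow line that pins this action.
  tag=$(grep -r "$action@" .github/workflows/ | head -1 | grep -o '#.*' | tr -d '# ')
  # Print the SHA the tag currently resolves to, for comparison with the pinned SHA.
  [ -n "$tag" ] && echo "$action@$tag: $(gh api "repos/$action/git/ref/tags/$tag" --jq .object.sha 2>/dev/null)"
done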
```
If any SHA differs from what's in the workflow files, update the pin and version comment.
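The comparison step can also be sketched in code. The helper below assumes workflow lines of the form `uses: owner/repo@<40-hex-sha> # <tag>`, which is what the shell loop above implies; `parsePins` and `stalePins` are hypothetical names, and the `resolved` map stands in for whatever the `gh api` lookups returned.

```typescript
interface ActionPin {
  action: string;
  sha: string;
  tag: string | null;
}

// Assumed pin format: "uses: owner/repo@<40-hex-sha> # <tag>".
const PIN_LINE = /uses:\s*([\w.-]+\/[\w.\/-]+)@([0-9a-f]{40})(?:\s*#\s*(\S+))?/;

// Hypothetical helper: collect every SHA-pinned action in a workflow file.
function parsePins(workflowYaml: string): ActionPin[] {
  const pins: ActionPin[] = [];
  for (const line of workflowYaml.split("\n")) {
    const m = PIN_LINE.exec(line);
    if (m) pins.push({ action: m[1], sha: m[2], tag: m[3] || null });
  }
  return pins;
}

// `resolved` maps "owner/repo@tag" to the SHA that tag currently points at.
function stalePins(pins: ActionPin[], resolved: Record<string, string>): ActionPin[] {
  const stale: ActionPin[] = [];
  for (const pin of pins) {
    if (pin.tag === null) continue; // no version comment, nothing to compare
    const current = resolved[`${pin.action}@${pin.tag}`];
    if (current !== undefined && current !== pin.sha) stale.push(pin);
  }
  return stale;
}
```

Lines that reference an action by tag alone (for example `@v4`) are skipped by design, since they are not SHA pins.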
## PR descriptions cover the whole branch
Pull request titles and bodies must describe **everything in the PR diff against the
base branch**, not just the most recent commit you made. When you open or update a
PR, walk the full commit range with `git log --oneline <base>..<head>` and write the
body to cover all of it. Group by feature area (schema, code, tests, docs) — not
chronologically by commit.
This matters because reviewers read the PR body to understand what's shipping. If
the body only covers your last commit, they miss everything else and can't review
properly. A 7-commit PR with a body that describes commit 7 is worse than no body
at all — it actively misleads.
When in doubt, run `gh pr view <N> --json commits --jq '[.commits[].messageHeadline]'`
to see what's actually in the PR before writing the body.
## Community PR wave process
Never merge external PRs directly into master. Instead, use the "fix wave" workflow:
1. **Categorize** — group PRs by theme (bug fixes, features, infra, docs)
2. **Deduplicate** — if two PRs fix the same thing, pick the one that changes fewer
lines. Close the other with a note pointing to the winner.
3. **Collector branch** — create a feature branch (e.g. `garrytan/fix-wave-N`), cherry-pick
or manually re-implement the best fixes from each PR. Do NOT merge PR branches directly —
read the diff, understand the fix, and write it yourself if needed.
4. **Test the wave** — verify with `bun test && bun run test:e2e` (full E2E lifecycle).
Every fix in the wave must have test coverage.
5. **Close with context** — every closed PR gets a comment explaining why and what (if
anything) supersedes it. Contributors did real work; respect that with clear communication
and thank them.
6. **Ship as one PR** — single PR to master with all attributions preserved via
`Co-Authored-By:` trailers. Include a summary of what merged and what closed.
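For step 6, attribution trailers follow the usual git convention: a blank line after the message body, then one `Co-Authored-By: Name <email>` line per contributor. A minimal sketch, with `withCoAuthors` and the `Contributor` shape as hypothetical names:

```typescript
interface Contributor {
  name: string;
  email: string;
}

// Hypothetical helper: append one trailer per unique contributor email.
function withCoAuthors(message: string, contributors: Contributor[]): string {
  const seen: string[] = [];
  const trailers: string[] = [];
  for (const c of contributors) {
    const key = c.email.toLowerCase();
    if (seen.indexOf(key) !== -1) continue; // same person credited twice
    seen.push(key);
    trailers.push(`Co-Authored-By: ${c.name} <${c.email}>`);
  }
  if (trailers.length === 0) return message;
  // Trailers must form the last paragraph, separated by a blank line.
  return message.replace(/\s+$/, "") + "\n\n" + trailers.join("\n") + "\n";
}
```

Deduplicating by email keeps a contributor with several cherry-picked commits from appearing more than once in the wave commit.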
**Community PR guardrails:**
- Always AskUserQuestion before accepting commits that touch voice, tone, or
promotional material (README intro, CHANGELOG voice, skill templates).
- Never auto-merge PRs that remove YC references or "neutralize" the founder perspective.
- Preserve contributor attribution in commit messages.
## Skill routing
When the user's request matches an available skill, ALWAYS invoke it using the Skill
tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
The skill has specialized workflows that produce better results than ad-hoc answers.
**NEVER hand-roll ship operations.** Do not manually run git commit + push + gh pr
create when /ship is available. /ship handles VERSION bump, CHANGELOG, document-release,
pre-landing review, test coverage audit, and adversarial review. Manually creating a PR
skips all of these. If the user says "commit and ship", "push and ship", "bisect and
ship", or any combination that ends with shipping — invoke /ship and let it handle
everything including the commits. If the branch name contains a version (e.g.
`v0.5-live-sync`), /ship should use that version for the bump.
Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke investigate
- Ship, deploy, push, create PR, "commit and ship", "push and ship" → invoke ship
- QA, test the site, find bugs → invoke qa
- Code review, check my diff → invoke review
- Update docs after shipping → invoke document-release
- Weekly retro → invoke retro
- Design system, brand → invoke design-consultation
- Visual audit, design polish → invoke design-review
- Architecture review → invoke plan-eng-review
- Save progress, checkpoint, resume → invoke checkpoint
- Code quality, health check → invoke health
---
## INSTALL_FOR_AGENTS.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/INSTALL_FOR_AGENTS.md
# GBrain Installation Guide for AI Agents
Read this entire file, then follow the steps. Ask the user for API keys when needed.
Target: ~30 minutes to a fully working brain.
## Step 0: If you are not Claude Code
Read `AGENTS.md` at the repo root first. It's the non-Claude-agent operating
protocol (install, read order, trust boundary, common tasks). Claude Code reads
`CLAUDE.md` automatically and can skip ahead.
If you fetched this file by URL without cloning yet, the companion files live at:
- `https://raw.githubusercontent.com/garrytan/gbrain/master/AGENTS.md` — start here
- `https://raw.githubusercontent.com/garrytan/gbrain/master/llms.txt` — full doc map
- `https://raw.githubusercontent.com/garrytan/gbrain/master/llms-full.txt` — same map, inlined
## Step 1: Install GBrain
```bash
git clone https://github.com/garrytan/gbrain.git ~/gbrain && cd ~/gbrain
curl -fsSL https://bun.sh/install | bash
export PATH="$HOME/.bun/bin:$PATH"
bun install && bun link
```
Verify: `gbrain --version` should print a version number. If `gbrain` is not found,
restart the shell or add the PATH export to the shell profile.
> **Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level
> postinstall hook on global installs, so schema migrations never run and the CLI
> aborts with `Aborted()` when it opens PGLite. Use the `git clone + bun link` path
> above. Tracking issue: [#218](https://github.com/garrytan/gbrain/issues/218).
## Step 2: API Keys
Ask the user for these:
```bash
export OPENAI_API_KEY=sk-... # required for vector search
export ANTHROPIC_API_KEY=sk-ant-... # optional, improves search quality
```
Save them to the shell profile or `.env`. Without the OpenAI key, keyword search
still works. Without the Anthropic key, search works but skips query expansion.
## Step 3: Create the Brain
```bash
gbrain init # PGLite, no server needed
gbrain doctor --json # verify all checks pass
```
The user's markdown files (notes, docs, brain repo) are SEPARATE from this tool repo.
Ask the user where their files are, or create a new brain repo:
```bash
mkdir -p ~/brain && cd ~/brain && git init
```
Read `~/gbrain/docs/GBRAIN_RECOMMENDED_SCHEMA.md` and set up the MECE directory
structure (people/, companies/, concepts/, etc.) inside the user's brain repo,
NOT inside ~/gbrain.
## Step 4: Import and Index
```bash
gbrain import ~/brain/ --no-embed # import markdown files
gbrain embed --stale # generate vector embeddings
gbrain query "key themes across these documents?"
```
## Step 4.5: Wire the Knowledge Graph
If the user already had a brain repo (Step 3 imported existing markdown), backfill
the typed-link graph and structured timeline. This populates the `links` and
`timeline_entries` tables that future writes will maintain automatically.
```bash
gbrain extract links --source db --dry-run | head -20 # preview
gbrain extract links --source db # commit
gbrain extract timeline --source db # dated events
gbrain stats # verify links > 0
```
For brand-new empty brains, skip this step — auto-link populates the graph as the
agent writes pages going forward. There is nothing to backfill yet.
After this step:
- `gbrain graph-query <slug> --depth 2` works (relationship traversal)
- Search ranks well-connected entities higher (backlink boost)
- Every future `put_page` auto-creates typed links and reconciles stale ones
For very large brains (>10K pages), `extract --source db` is idempotent and
supports `--since YYYY-MM-DD`, so backfills can run incrementally.
## Step 5: Load Skills
Read `~/gbrain/skills/RESOLVER.md`. This is the skill dispatcher. It tells you which
skill to read for any task. Save this to your memory permanently.
The three most important skills to adopt immediately:
1. **Signal detector** (`skills/signal-detector/SKILL.md`) — fire this on EVERY
inbound message. It captures ideas and entities in parallel. The brain compounds.
2. **Brain-ops** (`skills/brain-ops/SKILL.md`) — brain-first lookup on every response.
Check the brain before any external API call.
3. **Conventions** (`skills/conventions/quality.md`) — citation format, back-linking
iron law, source attribution. These are non-negotiable quality rules.
## Step 6: Identity (optional)
Run the soul-audit skill to customize the agent's identity:
```
Read skills/soul-audit/SKILL.md and follow it.
```
This generates SOUL.md (agent identity), USER.md (user profile), ACCESS_POLICY.md
(who sees what), and HEARTBEAT.md (operational cadence) from the user's answers.
If skipped, minimal defaults are installed automatically.
## Step 7: Recurring Jobs
Set up using your platform's scheduler (OpenClaw cron, Railway cron, crontab):
- **Live sync** (every 15 min): `gbrain sync --repo ~/brain && gbrain embed --stale`
- **Auto-update** (daily): `gbrain check-update --json` (tell user, never auto-install)
- **Dream cycle** (nightly): read `docs/guides/cron-schedule.md` for the full protocol.
Entity sweep, citation fixes, memory consolidation. This is what makes the brain
compound. Do not skip it.
- **Weekly**: `gbrain doctor --json && gbrain embed --stale`
## Step 8: Integrations
Run `gbrain integrations list`. Each recipe in `~/gbrain/recipes/` is a self-contained
installer. It tells you what credentials to ask for, how to validate, and what cron
to register. Ask the user which integrations they want (email, calendar, voice, Twitter).
Verify: `gbrain integrations doctor` (after at least one is configured)
## Step 9: Verify
Read `docs/GBRAIN_VERIFY.md` and run all 7 verification checks. Check #4 (live sync
actually works) is the most important.
## Upgrade
```bash
cd ~/gbrain && git pull origin master && bun install
gbrain init # apply schema migrations (idempotent)
gbrain post-upgrade # show migration notes for the version range
```
Then read `~/gbrain/skills/migrations/v<NEW_VERSION>.md` (and any intermediate
versions you skipped) and run any backfill or verification steps it lists. Skipping
this is how features ship in the binary but stay dormant in the user's brain.
For v0.12.0+ specifically: if your brain was created before v0.12.0, run
`gbrain extract links --source db && gbrain extract timeline --source db` to
backfill the new graph layer (see Step 4.5 above).
For v0.12.2+ specifically: if your brain is Postgres- or Supabase-backed and
predates v0.12.2, the `v0_12_2` migration runs `gbrain repair-jsonb`
automatically during `gbrain post-upgrade` to fix the double-encoded JSONB
columns. PGLite brains no-op. If wiki-style imports were truncated by the old
`splitBody` bug, run `gbrain sync --full` after upgrading to rebuild
`compiled_truth` from source markdown.
---
## skills/RESOLVER.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/skills/RESOLVER.md
# GBrain Skill Resolver
This is the dispatcher. Skills are the implementation. **Read the skill file before acting.** If two skills could match, read both. They are designed to chain (e.g., ingest then enrich for each entity).
## Always-on (every message)
| Trigger | Skill |
|---------|-------|
| Every inbound message (spawn parallel, don't block) | `skills/signal-detector/SKILL.md` |
| Any brain read/write/lookup/citation | `skills/brain-ops/SKILL.md` |
## Brain operations
| Trigger | Skill |
|---------|-------|
| "What do we know about", "tell me about", "search for" | `skills/query/SKILL.md` |
| "Who knows who", "relationship between", "connections", "graph query" | `skills/query/SKILL.md` (use graph-query) |
| Creating/enriching a person or company page | `skills/enrich/SKILL.md` |
| Where does a new file go? Filing rules | `skills/repo-architecture/SKILL.md` |
| Fix broken citations in brain pages | `skills/citation-fixer/SKILL.md` |
| "Research", "track", "extract from email", "investor updates", "donations" | `skills/data-research/SKILL.md` |
| Share a brain page as a link | `skills/publish/SKILL.md` |
## Content & media ingestion
| Trigger | Skill |
|---------|-------|
| User shares a link, article, tweet, or idea | `skills/idea-ingest/SKILL.md` |
| Video, audio, PDF, book, YouTube, screenshot | `skills/media-ingest/SKILL.md` |
| Meeting transcript received | `skills/meeting-ingestion/SKILL.md` |
| Generic "ingest this" (auto-routes to above) | `skills/ingest/SKILL.md` |
## Thinking skills (from GStack)
| Trigger | Skill |
|---------|-------|
| "Brainstorm", "I have an idea", "office hours" | GStack: office-hours |
| "Review this plan", "CEO review", "poke holes" | GStack: ceo-review |
| "Debug", "fix", "broken", "investigate" | GStack: investigate |
| "Retro", "what shipped", "retrospective" | GStack: retro |
> These skills come from GStack. If GStack is installed, the agent reads them directly.
> If not, brain-only mode still works (brain skills function without thinking skills).
## Operational
| Trigger | Skill |
|---------|-------|
| Task add/remove/complete/defer/review | `skills/daily-task-manager/SKILL.md` |
| Morning prep, meeting context, day planning | `skills/daily-task-prep/SKILL.md` |
| Daily briefing, "what's happening today" | `skills/briefing/SKILL.md` |
| Cron scheduling, quiet hours, job staggering | `skills/cron-scheduler/SKILL.md` |
| Save or load reports | `skills/reports/SKILL.md` |
| "Create a skill", "improve this skill" | `skills/skill-creator/SKILL.md` |
| "Skillify this", "is this a skill?", "make this proper" | `skills/skillify/SKILL.md` |
| "Is gbrain healthy?", morning health check, skillpack-check | `skills/skillpack-check/SKILL.md` |
| Cross-modal review, second opinion | `skills/cross-modal-review/SKILL.md` |
| "Validate skills", skill health check | `skills/testing/SKILL.md` |
| Webhook setup, external event processing | `skills/webhook-transforms/SKILL.md` |
| "Spawn agent", "background task", "parallel tasks", "steer agent", "pause/resume agent" | `skills/minion-orchestrator/SKILL.md` |
## Setup & migration
| Trigger | Skill |
|---------|-------|
| "Set up GBrain", first boot | `skills/setup/SKILL.md` |
| "Migrate from Obsidian/Notion/Logseq" | `skills/migrate/SKILL.md` |
| Brain health check, maintenance run | `skills/maintain/SKILL.md` |
| "Extract links", "build link graph", "populate timeline" | `skills/maintain/SKILL.md` (extraction sections) |
| "Brain health", "what features am I missing", "brain score" | Run `gbrain features --json` |
| "Set up autopilot", "run brain maintenance", "keep brain updated" | Run `gbrain autopilot --install --repo ~/brain` |
| Agent identity, "who am I", customize agent | `skills/soul-audit/SKILL.md` |
| "Populate links", "extract links", "backfill graph" | `skills/maintain/SKILL.md` (graph population phase) |
| "Populate timeline", "extract timeline entries" | `skills/maintain/SKILL.md` (graph population phase) |
## Identity & access (always-on)
| Trigger | Skill |
|---------|-------|
| Non-owner sends a message | Check `ACCESS_POLICY.md` before responding |
| Agent needs to know its identity/vibe | Read `SOUL.md` |
| Agent needs user context | Read `USER.md` |
| Operational cadence (what to check and when) | Read `HEARTBEAT.md` |
## Disambiguation rules
When multiple skills could match:
1. Prefer the most specific skill (meeting-ingestion over ingest)
2. If the user mentions a URL, route by content type (link → idea-ingest, video → media-ingest)
3. If the user mentions a person/company, check if enrich or query fits better
4. Chaining is explicit in each skill's Phases section
5. When in doubt, ask the user
## Conventions (cross-cutting)
These apply to ALL brain-writing skills:
- `skills/conventions/quality.md` — citations, back-links, notability gate
- `skills/conventions/brain-first.md` — check brain before external APIs
- `skills/conventions/subagent-routing.md` — when to use Minions vs inline work
- `skills/_brain-filing-rules.md` — where files go
- `skills/_output-rules.md` — output quality standards
---
## README.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/README.md
# GBrain
Your AI agent is smart but forgetful. GBrain gives it a brain.
Built by the President and CEO of Y Combinator to run his actual AI agents. The production brain powering his OpenClaw and Hermes deployments: **17,888 pages, 4,383 people, 723 companies**, 21 cron jobs running autonomously, built in 12 days. The agent ingests meetings, emails, tweets, voice calls, and original ideas while you sleep. It enriches every person and company it encounters. It fixes its own citations and consolidates memory overnight. You wake up and the brain is smarter than when you went to bed.
The brain wires itself. Every page write extracts entity references and creates typed links (`attended`, `works_at`, `invested_in`, `founded`, `advises`) with zero LLM calls. Hybrid search. Self-wiring knowledge graph. Structured timeline. Backlink-boosted ranking. Ask "who works at Acme AI?" or "what did Bob invest in this quarter?" and get answers vector search alone can't reach. Benchmarked end-to-end: **Recall@5 jumps from 83% to 95%, Precision@5 from 39% to 45%, +30 more correct answers in the agent's top-5 reads** on a 240-page Opus-generated rich-prose corpus. Graph-only F1: **86.6% vs grep's 57.8%** (+28.8 pts). [Full report](docs/benchmarks/2026-04-18-brainbench-v1.md).
GBrain is those patterns, generalized. 26 skills. Install in 30 minutes. Your agent does the work. As Garry's personal agent gets smarter, so does yours.
> **~30 minutes to a fully working brain.** Database ready in 2 seconds (PGLite, no server). You just answer questions about API keys.
> **LLMs:** fetch [`llms.txt`](llms.txt) for the documentation map, or [`llms-full.txt`](llms-full.txt) for the same map with core docs inlined in one fetch. **Agents:** start with [`AGENTS.md`](AGENTS.md) (or [`CLAUDE.md`](CLAUDE.md) if you're Claude Code).
## Install
### On an agent platform (recommended)
GBrain is designed to be installed and operated by an AI agent. If you don't have one running yet:
- **[OpenClaw](https://openclaw.ai)** ... Deploy [AlphaClaw on Render](https://render.com/deploy?repo=https://github.com/chrysb/alphaclaw) (one click, 8GB+ RAM)
- **[Hermes Agent](https://github.com/NousResearch/hermes-agent)** ... Deploy on [Railway](https://github.com/praveen-ks-2001/hermes-agent-template) (one click)
Paste this into your agent:
```
Retrieve and follow the instructions at:
https://raw.githubusercontent.com/garrytan/gbrain/master/INSTALL_FOR_AGENTS.md
```
That's it. The agent clones the repo, installs GBrain, sets up the brain, loads 26 skills, and configures recurring jobs. You answer a few questions about API keys. ~30 minutes.
If your agent doesn't auto-read `AGENTS.md`, point it at that file first:
`https://raw.githubusercontent.com/garrytan/gbrain/master/AGENTS.md` is the non-Claude
agent operating protocol (install, read order, trust boundary, common tasks). For
the full doc map, use `llms.txt` at the same URL root.
### Standalone CLI (no agent)
```bash
git clone https://github.com/garrytan/gbrain.git && cd gbrain && bun install && bun link
gbrain init # local brain, ready in 2 seconds
gbrain import ~/notes/ # index your markdown
gbrain query "what themes show up across my notes?"
```
**Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level
postinstall hook on global installs, so schema migrations never run and the CLI
aborts with `Aborted()` the first time it opens PGLite. Use `git clone` followed by
`bun install && bun link`, as shown above. See [#218](https://github.com/garrytan/gbrain/issues/218).
Example output from the `gbrain query` command above:
```
3 results (hybrid search, 0.12s):

1. concepts/do-things-that-dont-scale (score: 0.94)
   PG's argument that unscalable effort teaches you what users want.
   [Source: paulgraham.com, 2013-07-01]

2. originals/founder-mode-observation (score: 0.87)
   Deep involvement isn't micromanagement if it expands the team's thinking.

3. concepts/build-something-people-want (score: 0.81)
   The YC motto. Connected to 12 other brain pages.
```
### MCP server (Claude Code, Cursor, Windsurf)
GBrain exposes 30+ MCP tools via stdio:
```json
{
  "mcpServers": {
    "gbrain": { "command": "gbrain", "args": ["serve"] }
  }
}
```
Add to `~/.claude/server.json` (Claude Code), Settings > MCP Servers (Cursor), or your client's MCP config.
### Remote MCP (Claude Desktop, Cowork, Perplexity)
```bash
ngrok http 8787 --url your-brain.ngrok.app
bun run src/commands/auth.ts create "claude-desktop"
claude mcp add gbrain -t http https://your-brain.ngrok.app/mcp -H "Authorization: Bearer TOKEN"
```
Per-client guides: [`docs/mcp/`](docs/mcp/DEPLOY.md). ChatGPT requires OAuth 2.1 (not yet implemented).
## The 26 Skills
GBrain ships 26 skills organized by `skills/RESOLVER.md`. The resolver tells your agent which skill to read for any task.
[Skill files are code.](https://x.com/garrytan/status/2042925773300908103) They're the most powerful way to get knowledge work done. A skill file is a fat markdown document that encodes an entire workflow: when to fire, what to check, how to chain with other skills, what quality bar to enforce. The agent reads the skill and executes it. Skills can also call deterministic TypeScript code bundled in GBrain (search, import, embed, sync) for the parts that shouldn't be left to LLM judgment. [Thin harness, fat skills](docs/ethos/THIN_HARNESS_FAT_SKILLS.md): the intelligence lives in the skills, not the runtime.
### Always-on
| Skill | What it does |
|-------|-------------|
| **signal-detector** | Fires on every message. Spawns a cheap model in parallel to capture original thinking and entity mentions. The brain compounds on autopilot. |
| **brain-ops** | Brain-first lookup before any external API. The read-enrich-write loop that makes every response smarter. |
### Content ingestion
| Skill | What it does |
|-------|-------------|
| **ingest** | Thin router. Detects input type and delegates to the right ingestion skill. |
| **idea-ingest** | Links, articles, tweets become brain pages with analysis, author people pages, and cross-linking. |
| **media-ingest** | Video, audio, PDF, books, screenshots, GitHub repos. Transcripts, entity extraction, backlink propagation. |
| **meeting-ingestion** | Transcripts become brain pages. Every attendee gets enriched. Every company gets a timeline entry. |
### Brain operations
| Skill | What it does |
|-------|-------------|
| **enrich** | Tiered enrichment (Tier 1/2/3). Creates and updates person/company pages with compiled truth and timelines. |
| **query** | 3-layer search with synthesis and citations. Says "the brain doesn't have info on X" instead of hallucinating. |
| **maintain** | Periodic health: stale pages, orphans, dead links, citation audit, back-link enforcement, tag consistency. |
| **citation-fixer** | Scans pages for missing or malformed citations. Fixes format to match the standard. |
| **repo-architecture** | Where new brain files go. Decision protocol: primary subject determines directory, not format. |
| **publish** | Share brain pages as password-protected HTML. Zero LLM calls. |
| **data-research** | Structured data research with parameterized YAML recipes. Extract investor updates, expenses, company metrics from email. |
### Operational
| Skill | What it does |
|-------|-------------|
| **daily-task-manager** | Task lifecycle with priority levels (P0-P3). Stored as searchable brain pages. |
| **daily-task-prep** | Morning prep: calendar lookahead with brain context per attendee, open threads, task review. |
| **cron-scheduler** | Schedule staggering (5-min offsets), quiet hours (timezone-aware with wake-up override), idempotency. |
| **reports** | Timestamped reports with keyword routing. "What's the latest briefing?" finds it instantly. |
| **cross-modal-review** | Quality gate via second model. Refusal routing: if one model refuses, silently switch. |
| **webhook-transforms** | External events (SMS, meetings, social mentions) converted into brain pages with entity extraction. |
| **testing** | Validates every skill has SKILL.md with frontmatter, manifest coverage, resolver coverage. |
| **skill-creator** | Create new skills following the conformance standard. MECE check against existing skills. |
| **minion-orchestrator** | Long-running agent work as background jobs. Submit, fan out children with depth/cap/timeouts, collect results via child_done inbox. |
### Identity and setup
| Skill | What it does |
|-------|-------------|
| **soul-audit** | 6-phase interview generating SOUL.md (agent identity), USER.md (user profile), ACCESS_POLICY.md (4-tier privacy), HEARTBEAT.md (operational cadence). |
| **setup** | Auto-provision PGLite or Supabase. First import. GStack detection. |
| **migrate** | Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam. |
| **briefing** | Daily briefing with meeting context, active deals, and citation tracking. |
### Conventions
Cross-cutting rules in `skills/conventions/`:
- **quality.md** ... citations, back-links, notability gate, source attribution
- **brain-first.md** ... 5-step lookup before any external API call
- **model-routing.md** ... which model for which task
- **test-before-bulk.md** ... test 3-5 items before any batch operation
- **cross-modal.yaml** ... review pairs and refusal routing chain
## How It Works
```
Signal arrives (meeting, email, tweet, link)
  -> Signal detector captures ideas + entities (parallel, never blocks)
  -> Brain-ops: check the brain first (gbrain search, gbrain get)
  -> Respond with full context
  -> Write: update brain pages with new information + citations
  -> Auto-link: typed relationships extracted on every write (zero LLM calls)
  -> Sync: gbrain indexes changes for next query
```
Every cycle adds knowledge. The agent enriches a person page after a meeting. Next time that person comes up, the agent already has context. The difference compounds daily.
The system gets smarter on its own. Entity enrichment auto-escalates: a person mentioned once gets a stub page (Tier 3). After 3 mentions across different sources, they get web + social enrichment (Tier 2). After a meeting or 8+ mentions, full pipeline (Tier 1). The brain learns who matters without being told. Deterministic classifiers improve over time via a fail-improve loop that logs every LLM fallback and generates better regex patterns from the failures. `gbrain doctor` shows the trajectory: "intent classifier: 87% deterministic, up from 40% in week 1."
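The escalation thresholds above can be written as a small decision function. This is a minimal sketch, not GBrain's implementation: the `EntityStats` fields and the `enrichment_tier` name are hypothetical, and "3 mentions across different sources" is read here as at least three mentions from at least two distinct sources, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EntityStats:
    """Hypothetical per-entity counters; GBrain's real bookkeeping may differ."""
    mentions: int          # total mentions seen so far
    distinct_sources: int  # number of different sources that mentioned the entity
    met_in_meeting: bool   # True once the entity has appeared in a meeting


def enrichment_tier(stats: EntityStats) -> int:
    """Return 1, 2, or 3 using the thresholds described in the text."""
    if stats.met_in_meeting or stats.mentions >= 8:
        return 1  # full pipeline
    if stats.mentions >= 3 and stats.distinct_sources >= 2:
        return 2  # web + social enrichment
    return 3      # stub page


stub = enrichment_tier(EntityStats(mentions=1, distinct_sources=1, met_in_meeting=False))
enriched = enrichment_tier(EntityStats(mentions=3, distinct_sources=3, met_in_meeting=False))
full = enrichment_tier(EntityStats(mentions=2, distinct_sources=1, met_in_meeting=True))
print(stub, enriched, full)
```

Checking the strongest condition first means an entity can only move toward Tier 1 as counters grow.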
> "Prep me for my meeting with Jordan in 30 minutes"
> ... pulls dossier, shared history, recent activity, open threads
> "What have I said about the relationship between shame and founder performance?"
> ... searches YOUR thinking, not the internet
## Minions: your sub-agents won't drop work anymore
A durable, Postgres-native job queue built into the brain. Every long-running agent task is now a job that survives gateway restarts, streams progress, gets paused / resumed / steered mid-flight, and shows up in `gbrain jobs list`. Zero infra beyond your existing brain.
### The production numbers that matter
Here's my personal OpenClaw deployment: one Render container. Supabase Postgres holding a 45,000-page brain. 19 cron jobs firing on schedule. Real gateway load from real daily work. The task: pull a month of my social posts from an external API and ingest them end-to-end into the brain as a structured page.
| | Minions | `sessions_spawn` |
|--- |--- |--- |
| Wall time | **753ms** | **>10,000ms** (gateway timeout) |
| Token cost | **$0.00** | ~$0.03 per run |
| Success rate | **100%** | **0%** (couldn't even spawn) |
| Memory/job | ~2 MB | ~80 MB |
Under that 19-cron load, sub-agent spawn couldn't clear the 10-second gateway wall. Minions landed it in under a second for zero tokens. **Scaling:** 19,240 posts across 36 months, single bash loop, ~15 min total, $0.00. Sub-agents: ~9 min best case, ~$1.08 in tokens, ~40% spawn failure. **Lab:** durability ∞ (SIGKILL mid-flight, 10/10 rescued), throughput ~10× faster, fan-out ~21× with no failure wall, memory ~400× less.
Full benchmarks: [production](docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md) and [lab](docs/benchmarks/2026-04-18-minions-vs-openclaw-subagents.md).
### The routing rule
> **Deterministic** (same input → same steps → same output) → **Minions**
> **Judgment** (input requires assessment or decision) → **Sub-agents**
Pull posts, parse JSON, write a brain page, run a sync — deterministic. $0 tokens, survives restart, millisecond runtime. Triage the inbox, assess meeting priority, decide if a cold email deserves a reply — judgment. What sub-agents are actually good at. `minion_mode: pain_triggered` (the default) automates the routing.
### What's fixed
The six daily pains — spawn storms, agents that stop responding, forgotten dispatches, gateway crashes mid-run, runaway grandchildren, debugging soup — all belonged to the "deterministic work through a reasoning model" mistake. Minions fixes them by not making that mistake: `max_children` cap, `timeout_ms` + AbortSignal, `child_done` inbox, full `parent_job_id`/`depth`/transcript per job, Postgres durability with stall detection, cascade cancel via recursive CTE. Plus idempotency keys, attachment validation, `removeOnComplete`, and `gbrain jobs smoke` that proves the install in half a second.
```bash
gbrain jobs smoke # verify install
gbrain jobs submit sync --params '{}' # fire a background job
gbrain jobs stats # health dashboard
gbrain jobs work --concurrency 4 # start a worker (Postgres only)
```
Read [`skills/minion-orchestrator/SKILL.md`](skills/minion-orchestrator/SKILL.md) for parent-child DAGs, fan-in collection, steering via inbox.
**Minions is not incrementally better than sub-agents for background work. It's categorically different.** 753ms vs gateway timeout. $0 vs tokens. 100% vs couldn't-spawn. If your agent does deterministic work on a schedule, it runs on Minions now.
### Health check and self-heal
Minions is canonical as of v0.11.1 — every `gbrain upgrade` runs the migration automatically (schema → smoke → prefs → host rewrites → env-aware autopilot install). If you ever want to verify manually or wire a cron into your morning briefing:
```bash
gbrain doctor # half-migrated state? prints loud banner + exits non-zero
gbrain skillpack-check --quiet # exit 0/1/2 for pipeline gating
gbrain skillpack-check | jq # full JSON: {healthy, summary, actions[], doctor, migrations}
```
If anything's off, `actions[]` tells you the exact command to run. For deeper troubleshooting: [`docs/guides/minions-fix.md`](docs/guides/minions-fix.md).
Moving gateway crons to Minions (deterministic scripts, zero LLM tokens per fire): [`docs/guides/minions-shell-jobs.md`](docs/guides/minions-shell-jobs.md).
## Durable agents: `gbrain agent` (v0.15)
Your subagent runs survive crashes now. OpenClaw died mid-run? The worker re-claims on restart and replays from the last committed turn. Fan-out across 50 shards, one shard crashes — the aggregator still claims after every child reaches a terminal state and writes a mixed-outcome summary. Tool calls persist as a two-phase ledger (`pending` → `complete | failed`) so replay is safe by construction, not by hope.
```bash
# Submit a single-subagent run
gbrain agent run "summarize my last 10 journal pages"

# Fan out N prompts across N subagent children + 1 aggregator
gbrain agent run "analyze every page" \
  --fanout-manifest manifests/pages.json \
  --subagent-def analyzer

# Tail a running job (heartbeat per turn + full transcript on completion)
gbrain agent logs 1247 --follow --since 5m
```
Durability is the point: every Anthropic turn commits to `subagent_messages`, every tool call to `subagent_tool_executions`. Worker kills, OpenClaw crashes, timeouts — all resumable. Host repos (your OpenClaw, etc.) ship their own subagent definitions via `GBRAIN_PLUGIN_PATH` + a `gbrain.plugin.json` manifest: see [`docs/guides/plugin-authors.md`](docs/guides/plugin-authors.md). Requires `ANTHROPIC_API_KEY` on the worker.
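The two-phase ledger idea can be sketched in a few lines. This is an in-memory illustration, not GBrain's code: the dict stands in for `subagent_tool_executions`, and `run_tool_once`, the call-id format, and the decision to re-run entries left `pending` are all assumptions (re-running is only safe when the tool itself is idempotent).

```python
from typing import Any, Callable, Dict, List

# In-memory stand-in for the `subagent_tool_executions` table.
Ledger = Dict[str, Dict[str, Any]]


def run_tool_once(ledger: Ledger, call_id: str, tool: Callable[[], Any]) -> Any:
    """Run `tool` unless call_id already reached `complete`; then reuse its result."""
    entry = ledger.get(call_id)
    if entry is not None and entry["status"] == "complete":
        return entry["result"]  # replay path: no second execution

    # Phase 1: record intent before doing any work.
    ledger[call_id] = {"status": "pending", "result": None}
    try:
        result = tool()
    except Exception as exc:
        # Phase 2 (failure): commit the outcome so a replay can see it.
        ledger[call_id] = {"status": "failed", "result": repr(exc)}
        raise
    # Phase 2 (success): commit the result.
    ledger[call_id] = {"status": "complete", "result": result}
    return result


calls: List[str] = []
ledger: Ledger = {}


def fetch_page() -> str:
    calls.append("fetch")  # side effect we do not want repeated on replay
    return "page body"


first = run_tool_once(ledger, "turn-3/call-1", fetch_page)
replayed = run_tool_once(ledger, "turn-3/call-1", fetch_page)  # simulated replay
print(first == replayed, len(calls))
```

Because the result is committed before the next turn starts, a worker that restarts and replays the turn reads the stored result instead of calling the tool again.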
## Skillify: your skills tree stops being a black box
Hermes and similar agent frameworks auto-create skills as a background behavior. Fine until you don't know what the agent shipped. Checklists decay. Tests drift. Resolver entries get stale. Six months later you've got an opaque pile of "skills" that nobody has read, nobody has tested, and nobody is sure still work.
GBrain ships the same capability. Except the human stays in the loop.
- **`/skillify`** turns raw code into a properly-skilled feature: SKILL.md + deterministic script + unit tests + integration tests + LLM evals + resolver trigger + resolver trigger eval + E2E smoke + brain filing. Ten items. Every one required.
- **`gbrain check-resolvable`** walks the whole skills tree: reachability, MECE overlap, DRY violations, gap detection, orphaned skills. Exits non-zero if anything is off.
- **`scripts/skillify-check.ts`** — machine-readable audit. `--json` for CI, `--recent` for last-7-days files.
You decide when and what. The tooling keeps the checklist honest.
### Why this is the right answer for OpenClaw
Auto-generated skills are a liability the first time a behavior breaks. Was it the skill? The test? The resolver trigger? The eval? You don't know, because you never read it. Debugging a black box is pure guesswork.
Skillify makes the black box legible. Every skill in your tree has: a contract (SKILL.md), tests that exercise that contract, an eval that grades LLM output against a rubric, a resolver trigger the user actually types, and a test that confirms the trigger routes right. If something breaks, you know which layer to look at. If anything goes stale, `check-resolvable` says so.
In practice this combo produces **zero orphaned skills, every feature with tests + evals + resolver triggers + evals of the triggers.** Compounding quality instead of compounding entropy.
```bash
# Audit a feature's skill completeness (10-item checklist)
bun run scripts/skillify-check.ts src/commands/publish.ts
# In CI: fail the build when a new feature isn't properly skilled
bun run scripts/skillify-check.ts --json --recent
# Validate the whole skills tree before shipping
gbrain check-resolvable
```
**Skillify is not a nice-to-have. It's the piece that makes the skills tree survive six months of compounding work.** Read [`skills/skillify/SKILL.md`](skills/skillify/SKILL.md) for the full 10-item checklist and the anti-patterns it catches.
## Getting Data In
GBrain ships integration recipes that your agent sets up for you. Each recipe tells the agent what credentials to ask for, how to validate, and what cron to register.
| Recipe | Requires | What It Does |
|--------|----------|-------------|
| [Public Tunnel](recipes/ngrok-tunnel.md) | — | Fixed URL for MCP + voice (ngrok Hobby $8/mo) |
| [Credential Gateway](recipes/credential-gateway.md) | — | Gmail + Calendar access |
| [Voice-to-Brain](recipes/twilio-voice-brain.md) | ngrok-tunnel | Phone calls to brain pages (Twilio + OpenAI Realtime) |
| [Email-to-Brain](recipes/email-to-brain.md) | credential-gateway | Gmail to entity pages |
| [X-to-Brain](recipes/x-to-brain.md) | — | Twitter timeline + mentions + deletions |
| [Calendar-to-Brain](recipes/calendar-to-brain.md) | credential-gateway | Google Calendar to searchable daily pages |
| [Meeting Sync](recipes/meeting-sync.md) | — | Circleback transcripts to brain pages with attendees |
**Data research recipes** extract structured data from email into tracked brain pages. Built-in recipes for investor updates (MRR, ARR, runway, headcount), expense tracking, and company metrics. Create your own with `gbrain research init`.
Run `gbrain integrations` to see status.
## GBrain + GStack
[GStack](https://github.com/garrytan/gstack) is the engine. GBrain is the mod.
- **[GStack](https://github.com/garrytan/gstack)** = coding skills (ship, review, QA, investigate, office-hours, retro). 70,000+ stars, 30,000 developers per day. When your agent codes on itself, it uses GStack.
- **GBrain** = everything-else skills (brain ops, signal detection, ingestion, enrichment, cron, reports, identity). When your agent remembers, thinks, and operates, it uses GBrain.
- **`hosts/gbrain.ts`** = the bridge. Tells GStack's coding skills to check the brain before coding.
`gbrain init` detects if GStack is installed and reports mod status. If GStack isn't there, it tells you how to get it.
## Architecture
```
┌──────────────────┐    ┌───────────────┐    ┌──────────────────┐
│  Brain Repo      │    │    GBrain     │    │    AI Agent      │
│  (git)           │    │  (retrieval)  │    │  (read/write)    │
│                  │    │               │    │                  │
│  markdown files  │───>│  Postgres +   │<──>│  26 skills       │
│  = source of     │    │  pgvector     │    │  define HOW to   │
│    truth         │    │               │    │  use the brain   │
│                  │<───│  hybrid       │    │                  │
│  human can       │    │  search       │    │  RESOLVER.md     │
│  always read     │    │  (vector +    │    │  routes intent   │
│  & edit          │    │  keyword +    │    │  to skill        │
│                  │    │  RRF)         │    │                  │
└──────────────────┘    └───────────────┘    └──────────────────┘
```
The repo is the system of record. GBrain is the retrieval layer. The agent reads and writes through both. The human always wins: edit any markdown file and `gbrain sync` picks up the changes.
## The Knowledge Model
Every page follows the compiled truth + timeline pattern:
```markdown
---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---
Paul Graham's argument that startups should do unscalable things early on.
The key insight: the unscalable effort teaches you what users actually
want, which you can't learn any other way.
---
- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
```
Above the `---`: **compiled truth**. Your current best understanding. Gets rewritten when new evidence changes the picture. Below: **timeline**. Append-only evidence trail. Never edited, only added to.
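As a rough illustration of the layout, the sketch below splits a page into frontmatter, compiled truth, and timeline entries. It assumes the separator is a bare `---` line and keeps frontmatter values as raw strings instead of parsing YAML; `split_page` is a hypothetical helper, not a GBrain API.

```python
from typing import Dict, List, Tuple

PAGE = """---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---
Paul Graham's argument that startups should do unscalable things early on.
---
- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
"""


def split_page(text: str) -> Tuple[Dict[str, str], str, List[str]]:
    """Split a page into (frontmatter, compiled_truth, timeline_entries)."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "---":
        raise ValueError("page must start with a '---' frontmatter fence")
    end = lines.index("---", 1)  # closing frontmatter fence
    frontmatter: Dict[str, str] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        frontmatter[key.strip()] = value.strip()

    body = lines[end + 1:]
    if "---" in body:
        sep = body.index("---")  # first separator after the frontmatter
        truth, timeline = body[:sep], body[sep + 1:]
    else:
        truth, timeline = body, []  # page without a timeline yet
    entries = [line[2:].strip() for line in timeline if line.startswith("- ")]
    return frontmatter, "\n".join(truth).strip(), entries


frontmatter, compiled_truth, timeline = split_page(PAGE)
print(frontmatter["type"], len(timeline))
```

Keeping the two halves separate is what lets a writer rewrite the compiled truth freely while only ever appending to the timeline.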
## Knowledge Graph
Pages aren't just text. Every mention of a person, company, or concept becomes a typed link in a structured graph. The brain wires itself.
```
Write a meeting page mentioning Alice and Acme AI
  -> Auto-link extracts entity refs from content (zero LLM calls)
  -> Infers types: meeting page + person ref => `attended`
                   "CEO of X" pattern        => `works_at`
                   "invested in"             => `invested_in`
                   "advises", "advisor"      => `advises`
                   "founded", "co-founded"   => `founded`
  -> Reconciles stale links: edits remove links no longer in content
  -> Backlinks rank well-connected entities higher in search
```
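The type inference step above is pattern matching, which is why it needs no LLM call. The sketch below shows one plausible shape for it. The regexes, the `infer_link_type` name, the first-match-wins ordering, and returning `None` when nothing matches are assumptions for illustration; GBrain's actual patterns are likely richer.

```python
import re
from typing import Optional

# Checked in order; the first matching pattern wins.
PATTERNS = [
    ("founded", re.compile(r"\b(?:co-)?founded\b", re.IGNORECASE)),
    ("invested_in", re.compile(r"\binvested in\b", re.IGNORECASE)),
    ("advises", re.compile(r"\b(?:advises|advisor)\b", re.IGNORECASE)),
    ("works_at", re.compile(r"\bCEO of\b", re.IGNORECASE)),
]


def infer_link_type(page_type: str, target_type: str, context: str) -> Optional[str]:
    """Guess a typed link from the text surrounding an entity reference."""
    for link_type, pattern in PATTERNS:
        if pattern.search(context):
            return link_type
    # No phrase matched: fall back to the page-level rule from the text.
    if page_type == "meeting" and target_type == "person":
        return "attended"
    return None


examples = [
    ("person", "company", "Alice co-founded Acme AI in 2021"),
    ("person", "company", "Bob invested in Acme AI"),
    ("person", "company", "Carol is CEO of Acme AI"),
    ("meeting", "person", "Sync with Alice about the roadmap"),
]
inferred = [infer_link_type(*example) for example in examples]
print(inferred)
```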
```bash
gbrain graph-query people/alice --type attended --depth 2
# returns who Alice met with, transitively
```
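For intuition about what a typed, depth-limited traversal does, here is an in-memory sketch. It is not how GBrain runs the query (the README describes a recursive CTE in Postgres); the edge list, the slugs, the `graph_query` function, and the assumption that `attended` edges point from a meeting page to a person are all illustrative.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source_slug, link_type, target_slug)

# Made-up edges; `attended` is assumed to point from a meeting page to a person.
EDGES = [
    ("meetings/roadmap-sync", "attended", "people/alice"),
    ("meetings/roadmap-sync", "attended", "people/bob"),
    ("meetings/board-prep", "attended", "people/bob"),
    ("meetings/board-prep", "attended", "people/carol"),
    ("people/alice", "works_at", "companies/acme-ai"),
]


def graph_query(edges: Iterable[Edge], start: str, link_type: Optional[str] = None,
                max_depth: int = 2, direction: str = "both") -> Dict[str, int]:
    """Breadth-first traversal returning {slug: depth} for pages reachable from start."""
    neighbours: Dict[str, Set[str]] = {}
    for src, kind, dst in edges:
        if link_type is not None and kind != link_type:
            continue  # type filter
        if direction in ("out", "both"):
            neighbours.setdefault(src, set()).add(dst)
        if direction in ("in", "both"):
            neighbours.setdefault(dst, set()).add(src)

    depth_of = {start: 0}  # doubles as the visited set, which prevents cycles
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth_of[node] >= max_depth:
            continue  # depth cap
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in depth_of:
                depth_of[nxt] = depth_of[node] + 1
                queue.append(nxt)
    del depth_of[start]
    return depth_of


reachable = graph_query(EDGES, "people/alice", link_type="attended", max_depth=2)
print(reachable)
```

With these sample edges, depth 2 reaches the shared meeting and Bob; Carol only appears at depth 4, which shows how the depth cap bounds the work.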
The graph powers questions vector search can't: "who works at Acme AI?", "what has Bob invested in?", "find the connection between Alice and Carol". Backfill an existing brain in one command:
```bash
gbrain extract links --source db # wire up the existing 29K pages
gbrain extract timeline --source db # extract dated events from markdown timelines
```
Then ask graph questions or watch the search ranking improve. Benchmarked: **Recall@5 jumps from 83% to 95%, Precision@5 from 39% to 45%, +30 more correct answers in the agent's top-5 reads** on a 240-page Opus-generated rich-prose corpus. Graph-only F1 hits 86.6% vs grep's 57.8% (+28.8 pts). See [docs/benchmarks/2026-04-18-brainbench-v1.md](docs/benchmarks/2026-04-18-brainbench-v1.md).
## Search
Hybrid search: vector + keyword + Reciprocal Rank Fusion (RRF) + multi-query expansion + 4-layer dedup.
```
Query
  -> Intent classifier (entity? temporal? event? general?)
  -> Multi-query expansion (Claude Haiku)
  -> Vector search (HNSW cosine) + Keyword search (tsvector)
  -> RRF fusion: score = sum(1/(60 + rank))
  -> Cosine re-scoring + compiled truth boost
  -> 4-layer dedup + compiled truth guarantee
  -> Results
```
Keyword alone misses conceptual matches. Vector alone misses exact phrases. RRF gets both. Search quality is benchmarked and reproducible: `gbrain eval --qrels queries.json` measures P@k, Recall@k, MRR, and nDCG@k. A/B test config changes before deploying them.
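The fusion step is small enough to show in full. This sketch applies `score = sum(1/(60 + rank))` to two ranked lists; it assumes ranks start at 1 and breaks ties alphabetically, neither of which the README specifies, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

RRF_K = 60  # the constant from the formula above


def rrf_fuse(rankings: List[List[str]], k: int = RRF_K) -> List[str]:
    """Merge ranked lists with Reciprocal Rank Fusion; best document first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # assumed 1-based ranks
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # e.g. semantic matches
keyword_hits = ["b", "d", "a"]  # e.g. exact-phrase matches
fused = rrf_fuse([vector_hits, keyword_hits])
print(fused)
```

Documents found by both retrievers ("a" and "b") rise above those found by only one, and because RRF uses ranks instead of raw scores, the vector and keyword scores never need to be put on the same scale.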
## Why it works: many strategies in concert
The brain isn't one trick. Every retrieval question goes through ~20 deterministic
techniques layered together. No single one is magic; the win comes from stacking
them so each layer covers what the others miss.
```
Question
├─ INGESTION (every put_page)
│ ├─ Recursive markdown chunking (or semantic / LLM-guided)
│ ├─ Embedding cache invalidation on edit
│ └─ Idempotent imports (content-hash dedup)
├─ GRAPH EXTRACTION (auto-link post-hook, zero LLM)
│ ├─ Entity-ref regex (markdown links + bare slugs)
│ ├─ Code-fence stripping (no false-positive slugs in code blocks)
│ ├─ Typed inference cascade (FOUNDED → INVESTED → ADVISES → WORKS_AT)
│ ├─ Page-role priors (partner-bio language → invested_in)
│ ├─ Within-page dedup (same target collapses to one link)
│ ├─ Stale-link reconciliation (edits remove dropped refs)
│ └─ Multi-type link constraint (same person can works_at AND advises)
├─ SEARCH PIPELINE (every query)
│ ├─ Intent classifier (entity / temporal / event / general — auto-routes)
│ ├─ Multi-query expansion (Haiku rephrases the question 3 ways)
│ ├─ Vector search (HNSW cosine over OpenAI embeddings)
│ ├─ Keyword search (Postgres tsvector + websearch_to_tsquery)
│ ├─ Reciprocal Rank Fusion (score = sum 1/(60+rank) across both)
│ ├─ Cosine re-scoring (re-rank chunks against actual query embedding)
│ ├─ Compiled-truth boost (assessments outrank timeline noise)
│ ├─ Backlink boost (well-connected entities rank higher)
│ └─ Source-aware dedup (one CT chunk per page guaranteed)
├─ GRAPH TRAVERSAL (relational queries)
│ ├─ Recursive CTE with cycle prevention (visited-array check)
│ ├─ Type-filtered edges (--type works_at, attended, etc.)
│ ├─ Direction control (in / out / both)
│ └─ Depth-capped (≤10 for remote MCP; DoS prevention)
└─ AGENT WORKFLOW (graph-confident hybrid)
├─ Graph-query first (high-precision typed answers)
├─ Grep fallback when graph returns nothing
└─ Graph hits ranked first in top-K (better P@K and R@K)
```
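Of the ingestion techniques listed above, content-hash dedup is the simplest to illustrate. The sketch below skips a page whose content hash has not changed since the last import. The use of SHA-256, the `store` dict, and the `import_page` helper are assumptions; the README only says imports are idempotent via content-hash dedup.

```python
import hashlib
from typing import Dict


def content_hash(markdown: str) -> str:
    """Stable fingerprint of a page's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def import_page(store: Dict[str, str], slug: str, markdown: str) -> bool:
    """Return True if the page was (re)indexed, False if skipped as unchanged."""
    digest = content_hash(markdown)
    if store.get(slug) == digest:
        return False  # identical content: nothing to re-chunk or re-embed
    store[slug] = digest
    return True


store: Dict[str, str] = {}
results = [
    import_page(store, "people/alice", "Alice runs Acme AI."),
    import_page(store, "people/alice", "Alice runs Acme AI."),            # re-import
    import_page(store, "people/alice", "Alice runs Acme AI. Met 2026."),  # edited
]
print(results)
```

Running the same import twice does no extra work, which is what makes repeated syncs cheap.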
End-to-end on the BrainBench v1 corpus (240 rich-prose pages, before/after PR #188):
| Metric | BEFORE PR #188 | AFTER PR #188 | Δ |
|-------------------------|----------------|---------------|-------------|
| **Precision@5** | 39.2% | **44.7%** | **+5.4 pts**|
| **Recall@5** | 83.1% | **94.6%** | **+11.5 pts**|
| Correct in top-5 | 217 | 247 | **+30** |
| Graph-only F1 (ablation)| 57.8% (grep) | **86.6%** | **+28.8 pts**|
Plus 5 orthogonal capability checks (identity resolution, temporal queries,
performance at 10K-page scale, robustness to malformed input, MCP operation
contract). All pass. [Full report.](docs/benchmarks/2026-04-18-brainbench-v1.md)
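For readers unfamiliar with the two headline metrics, here is a minimal sketch of how Precision@k and Recall@k are conventionally computed for a single query. The function names and sample data are illustrative and not taken from the BrainBench harness.

```python
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant."""
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


retrieved = ["a", "x", "b", "y", "z"]  # ranked results for one query
relevant = {"a", "b", "c"}             # ground-truth answers for that query
p5 = precision_at_k(retrieved, relevant)
r5 = recall_at_k(retrieved, relevant)
print(p5, round(r5, 3))
```

Precision@5 is capped by how many relevant pages exist: a query with only two correct answers can score at most 40%, which is one reason precision can sit well below recall.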
The point: each technique handles a class of inputs the others miss. Vector
search misses exact slug refs; keyword catches them. Keyword misses conceptual
matches; vector catches them. RRF picks the best of both. Compiled-truth boost
keeps assessments above timeline noise. Auto-link extraction wires the graph
that lets backlink boost rank well-connected entities higher. Graph traversal
answers questions search alone can't reach. The agent picks graph-first for
precision and falls back to keyword for recall. **All deterministic, all in
concert, all measured.**
## Voice
Call a phone number. Your AI answers. It knows who's calling, pulls their full context from the brain, and responds like someone who actually knows your world. When the call ends, a brain page appears with the transcript, entity detection, and cross-references.
<p align="center">
<img src="docs/images/voice-client.png" alt="Voice client connected" width="300" />
</p>
> [See it in action](https://x.com/garrytan/status/2043022208512172263)
The voice recipe ships with GBrain: [Voice-to-Brain](recipes/twilio-voice-brain.md). WebRTC works in a browser tab with zero setup. A real phone number is optional.
## Engine Architecture
```
             CLI / MCP Server
   (thin wrappers, identical operations)
                     |
     BrainEngine interface (pluggable)
                     |
            +--------+--------+
            |                 |
      PGLiteEngine     PostgresEngine
        (default)        (Supabase)
            |                 |
      ~/.gbrain/       Supabase Pro ($25/mo)
      brain.pglite     Postgres + pgvector
      embedded PG 17.5

    gbrain migrate --to supabase|pglite
         (bidirectional migration)
```
PGLite: embedded Postgres, no server, zero config. When your brain outgrows local (1000+ files, multi-device), `gbrain migrate --to supabase` moves everything.
## File Storage
Brain repos accumulate binaries. GBrain moves them to cloud storage in three stages (`mirror`, `redirect`, `clean`), with `restore` as the undo:
```bash
gbrain files mirror <dir> # copy to cloud, local untouched
gbrain files redirect <dir> # replace local with .redirect pointers
gbrain files clean <dir> # remove pointers, cloud only
gbrain files restore <dir> # download everything back (undo)
```
Storage backends: S3-compatible (AWS, R2, MinIO), Supabase Storage, or local.
## Commands
```
SETUP
  gbrain init [--supabase|--url]        Create brain (PGLite default)
  gbrain migrate --to supabase|pglite   Bidirectional engine migration
  gbrain upgrade                        Self-update with feature discovery

PAGES
  gbrain get <slug>                     Read a page (fuzzy slug matching)
  gbrain put <slug> [< file.md]         Write/update (auto-versions)
  gbrain delete <slug>                  Delete a page
  gbrain list [--type T] [--tag T]      List with filters

SEARCH
  gbrain search <query>                 Keyword search (tsvector)
  gbrain query <question>               Hybrid search (vector + keyword + RRF)

IMPORT
  gbrain import <dir> [--no-embed]      Import markdown (idempotent)
  gbrain sync [--repo <path>]           Git-to-brain incremental sync
  gbrain export [--dir ./out/]          Export to markdown

FILES
  gbrain files list|upload|sync|verify  File storage operations

EMBEDDINGS
  gbrain embed [<slug>|--all|--stale]   Generate/refresh embeddings

LINKS + GRAPH
  gbrain link|unlink|backlinks          Cross-reference management
  gbrain extract links|timeline|all     Batch backfill from existing pages
                                        (--source db|fs, --type, --since, --dry-run)
  gbrain graph-query <slug>             Typed traversal (--type T --depth N
                                        --direction in|out|both)

JOBS (Minions)
  gbrain jobs submit <name> [--params JSON] [--follow]  Submit a background job
  gbrain jobs list [--status S] [--queue Q]             List jobs with filters
  gbrain jobs get|cancel|retry|delete <id>              Manage job lifecycle
  gbrain jobs prune [--older-than 30d]                  Clean completed/dead jobs
  gbrain jobs stats                                     Job health dashboard
  gbrain jobs smoke                                     One-command health check
  gbrain jobs work [--queue Q] [--concurrency N]        Start worker daemon

ADMIN
  gbrain doctor [--json] [--fast]       Health checks (resolver, skills, DB, embeddings)
  gbrain doctor --fix [--dry-run]       Auto-fix DRY violations (delegate inlined rules to conventions)
  gbrain stats                          Brain statistics
  gbrain serve                          MCP server (stdio)
  gbrain integrations                   Integration recipe dashboard
  gbrain check-backlinks check|fix      Back-link enforcement
  gbrain lint [--fix]                   LLM artifact detection
  gbrain repair-jsonb [--dry-run]       Repair v0.12.0 double-encoded JSONB (Postgres)
  gbrain orphans [--json] [--count]     Find pages with zero inbound wikilinks
  gbrain transcribe <audio>             Transcribe audio (Groq Whisper)
  gbrain research init <name>           Scaffold a data-research recipe
  gbrain research list                  Show available recipes
```
Run `gbrain --help` for the full reference.
## Origin Story
I was setting up my [OpenClaw](https://openclaw.ai) agent and started a markdown brain repo. One page per person, one page per company, compiled truth on top, timeline on the bottom. Within a week: 10,000+ files, 3,000+ people, 13 years of calendar data, 280+ meeting transcripts, 300+ captured ideas.
The agent runs while I sleep. The dream cycle scans every conversation, enriches missing entities, fixes broken citations, consolidates memory. I wake up and the brain is smarter than when I went to sleep.
The skills in this repo are those patterns, generalized. What took 11 days to build by hand ships as a mod you install in 30 minutes.
## Docs
**For agents:**
- **[skills/RESOLVER.md](skills/RESOLVER.md)** ... Start here. The skill dispatcher.
- [Individual skill files](skills/) ... 25 standalone instruction sets
- [GBRAIN_SKILLPACK.md](docs/GBRAIN_SKILLPACK.md) ... Legacy reference architecture
- [Getting Data In](docs/integrations/README.md) ... Integration recipes and data flow
- [GBRAIN_VERIFY.md](docs/GBRAIN_VERIFY.md) ... Installation verification
**For humans:**
- [GBRAIN_RECOMMENDED_SCHEMA.md](docs/GBRAIN_RECOMMENDED_SCHEMA.md) ... Brain repo directory structure
- [Thin Harness, Fat Skills](docs/ethos/THIN_HARNESS_FAT_SKILLS.md) ... Architecture philosophy
- [ENGINES.md](docs/ENGINES.md) ... Pluggable engine interface
**Reference:**
- [GBRAIN_V0.md](docs/GBRAIN_V0.md) ... Full product spec
- [CHANGELOG.md](CHANGELOG.md) ... Version history
**Benchmarks:**
- [BrainBench v1 (PR #188)](docs/benchmarks/2026-04-18-brainbench-v1.md) ... single comprehensive before/after report on a 240-page Opus-generated corpus. Covers 7 categories, including relational queries, identity resolution, temporal queries, performance, robustness, and MCP contract.
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md). Run `bun test` for unit tests. E2E tests: spin up Postgres with pgvector, run `bun run test:e2e`, tear down.
PRs welcome for: new enrichment APIs, performance optimizations, additional engine backends, new skills following the conformance standard in `skills/skill-creator/SKILL.md`.
## License
MIT
---
# Configuration
## docs/ENGINES.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/ENGINES.md
# Pluggable Engine Architecture
## The idea
Every GBrain operation goes through `BrainEngine`. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.
v0 shipped `PostgresEngine` backed by Supabase. v0.7 adds `PGLiteEngine` -- embedded Postgres 17.5 via WASM (@electric-sql/pglite), zero-config default. The interface is designed so a `DuckDBEngine`, `TursoEngine`, or any custom backend could slot in without touching the CLI, MCP server, skills, or any consumer code.
## Why this matters
Different users have different constraints:
| User | Needs | Best engine |
|------|-------|-------------|
| Getting started | Zero-config, no accounts, no server | PGLiteEngine (default since v0.7) |
| Power user (you) | World-class search, 7K+ pages, zero-ops | PostgresEngine + Supabase |
| Open source hacker | Single file, no server, git-friendly | PGLiteEngine |
| Team/enterprise | Multi-user, RLS, audit trail | PostgresEngine + self-hosted |
| Researcher | Analytics, bulk exports, embeddings | DuckDBEngine (someday) |
| Edge/mobile | Offline-first, sync later | PGLiteEngine + sync (someday) |
The engine interface means we don't have to choose. PGLite is the zero-friction default. Supabase is the production scale path. `gbrain migrate --to supabase` and `gbrain migrate --to pglite` move between them.
## The interface
```typescript
// src/core/engine.ts
export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
  disconnect(): Promise<void>;
  initSchema(): Promise<void>;
  transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;
  // Pages CRUD
  getPage(slug: string): Promise<Page | null>;
  putPage(slug: string, page: PageInput): Promise<Page>;
  deletePage(slug: string): Promise<void>;
  listPages(filters: PageFilters): Promise<Page[]>;
  // Search
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;
  // Chunks
  upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
  getChunks(slug: string): Promise<Chunk[]>;
  // Links
  addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
  removeLink(from: string, to: string): Promise<void>;
  getLinks(slug: string): Promise<Link[]>;
  getBacklinks(slug: string): Promise<Link[]>;
  traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;
  // Tags
  addTag(slug: string, tag: string): Promise<void>;
  removeTag(slug: string, tag: string): Promise<void>;
  getTags(slug: string): Promise<string[]>;
  // Timeline
  addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
  getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;
  // Raw data
  putRawData(slug: string, source: string, data: object): Promise<void>;
  getRawData(slug: string, source?: string): Promise<RawData[]>;
  // Versions
  createVersion(slug: string): Promise<PageVersion>;
  getVersions(slug: string): Promise<PageVersion[]>;
  revertToVersion(slug: string, versionId: number): Promise<void>;
  // Stats + health
  getStats(): Promise<BrainStats>;
  getHealth(): Promise<BrainHealth>;
  // Ingest log
  logIngest(entry: IngestLogInput): Promise<void>;
  getIngestLog(opts?: IngestLogOpts): Promise<IngestLogEntry[]>;
  // Config
  getConfig(key: string): Promise<string | null>;
  setConfig(key: string, value: string): Promise<void>;
  // Migration + advanced (added v0.7)
  runMigration(sql: string): Promise<void>;
  getChunksWithEmbeddings(slug: string): Promise<ChunkWithEmbedding[]>;
}
```
### Key design choices
**Slug-based API, not ID-based.** Every method takes slugs, not numeric IDs. The engine resolves slugs to IDs internally. This keeps the interface portable... slugs are strings, IDs are database-specific.
**Embedding is NOT in the engine.** The engine stores embeddings and searches by vector, but it doesn't generate embeddings. `src/core/embedding.ts` handles that. This is intentional: embedding is an external API call (OpenAI), not a storage concern. All engines share the same embedding service.
**Chunking is NOT in the engine.** Same logic. `src/core/chunkers/` handles chunking. The engine stores and retrieves chunks. All engines share the same chunkers.
**Search returns `SearchResult[]`, not raw rows.** The engine is responsible for its own search implementation (tsvector vs FTS5, pgvector vs sqlite-vss) but must return a uniform result type. RRF fusion and dedup happen above the engine, in `src/core/search/hybrid.ts`.
**`traverseGraph` exists but is engine-specific.** Postgres uses recursive CTEs. SQLite would use a loop with depth tracking. The interface is the same: give me a slug and max depth, return the graph.
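For an engine without recursive CTEs, the loop-with-depth-tracking approach might look like the sketch below. It runs a breadth-first walk over an in-memory adjacency map. The `GraphNodeSketch` shape, the `traverse` function, and the sample slugs are assumptions made for illustration, not the actual `GraphNode` type or any engine's implementation.

```typescript
// Hypothetical node shape for illustration only.
interface GraphNodeSketch {
  slug: string;
  depth: number;
}

// Breadth-first traversal that stops expanding at maxDepth.
// `links` maps a slug to the slugs it links to.
function traverse(
  links: Map<string, string[]>,
  start: string,
  maxDepth: number,
): GraphNodeSketch[] {
  const visited = new Set<string>([start]);
  const result: GraphNodeSketch[] = [{ slug: start, depth: 0 }];
  let frontier: string[] = [start];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const slug of frontier) {
      for (const target of links.get(slug) ?? []) {
        if (visited.has(target)) continue; // guards against cycles
        visited.add(target);
        result.push({ slug: target, depth });
        next.push(target);
      }
    }
    frontier = next;
  }
  return result;
}

const demoLinks = new Map<string, string[]>([
  ["people/jane-doe", ["companies/acme"]],
  ["companies/acme", ["deals/acme-series-a", "people/jane-doe"]],
  ["deals/acme-series-a", ["people/john-roe"]],
]);
const reachable = traverse(demoLinks, "people/jane-doe", 2);
```

The `visited` set is what keeps a cycle (Jane links to Acme, Acme links back to Jane) from looping forever, which is the same job the depth bound and cycle handling do in a recursive CTE.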
## How search works across engines
```
                             +-------------------+
                             |     hybrid.ts     |
                             |   (RRF fusion +   |
                             |   dedup, shared)  |
                             +--------+----------+
                                      |
                    +-----------------+-----------------+
                    |                                   |
           +--------v--------+                 +--------v--------+
           | engine.search   |                 | engine.search   |
           | Keyword()       |                 | Vector()        |
           +-----------------+                 +-----------------+
                    |                                   |
        +-----------+-----------+             +---------+---------+
        |                       |             |                   |
+-------v-------+       +-------v---+ +-------v---+          +----v--------+
| Postgres:     |       | PGLite:   | | Postgres: |          | PGLite:     |
| tsvector +    |       | tsvector +| | pgvector  |          | pgvector    |
| ts_rank +     |       | ts_rank   | | HNSW      |          | HNSW        |
| websearch_to_ |       | (same SQL)| | cosine    |          | cosine      |
| tsquery       |       |           | |           |          | (same SQL)  |
+---------------+       +-----------+ +-----------+          +-------------+
```
RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They operate on `SearchResult[]` arrays. Only the raw keyword and vector searches are engine-specific.
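To illustrate why fusion can sit above the engine, here is a minimal sketch of reciprocal rank fusion over two ranked lists. The `Hit` shape, the `rrfFuse` name, and the constant `k = 60` are assumptions for this example; it is not the contents of `src/core/search/hybrid.ts`, and it omits multi-query expansion and the dedup layers.

```typescript
// Hypothetical result shape; the real SearchResult type may carry more fields.
interface Hit {
  slug: string;
  score: number;
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per result,
// where rank is the 1-based position in that list. k = 60 is a commonly
// used default and is assumed here.
function rrfFuse(lists: Hit[][], k = 60): Hit[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const contribution = 1 / (k + index + 1);
      fused.set(hit.slug, (fused.get(hit.slug) ?? 0) + contribution);
    });
  }
  return Array.from(fused.entries())
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score);
}

// Sample data: the raw scores come from different scales and are ignored;
// only each hit's position in its list matters.
const keywordHits: Hit[] = [
  { slug: "people/jane-doe", score: 0.9 },
  { slug: "companies/acme", score: 0.4 },
];
const vectorHits: Hit[] = [
  { slug: "companies/acme", score: 0.82 },
  { slug: "concepts/rrf", score: 0.5 },
  { slug: "people/jane-doe", score: 0.3 },
];
const fusedHits = rrfFuse([keywordHits, vectorHits]);
```

Because the function only reads positions in each `Hit[]`, it works the same way whichever engine produced the lists.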
## PostgresEngine (v0, ships)
**Dependencies:** `postgres` (porsager/postgres), `pgvector`
**Postgres-specific features used:**
- `tsvector` + `GIN` index for full-text search with `ts_rank` weighting
- `pgvector` HNSW index for cosine similarity vector search
- `pg_trgm` + `GIN` for fuzzy slug resolution
- Recursive CTEs for graph traversal
- Trigger-based search_vector (spans pages + timeline_entries)
- JSONB for frontmatter with GIN index
- Connection pooling via Supabase Supavisor (port 6543)
**Hosting:** Supabase Pro ($25/mo). Zero-ops. Managed Postgres with pgvector built in.
**Why not self-hosted for v0:** The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.
## PGLiteEngine (v0.7, ships)
**Dependencies:** `@electric-sql/pglite` (v0.4.4+)
**What it is:** Embedded Postgres 17.5 compiled to WASM via ElectricSQL's PGLite. Runs in-process, no server, no Docker, no accounts. Same SQL as PostgresEngine -- not a separate dialect. All 37 BrainEngine methods implemented.
**PGLite-specific details:**
- Uses `pglite-schema.ts` for DDL (pgvector extension, pg_trgm, triggers, indexes)
- Parameterized queries throughout (shared utilities in `src/core/utils.ts`)
- `hybridSearch` keyword-only fallback when `OPENAI_API_KEY` is not set
- Data stored at `~/.gbrain/brain.db` (configurable)
- pgvector HNSW index for cosine similarity vector search (same as Postgres)
- tsvector + ts_rank for full-text search (same as Postgres)
- pg_trgm for fuzzy slug resolution (same as Postgres)
**When to use PGLite vs Postgres:**
| Factor | PGLite | PostgresEngine + Supabase |
|--------|--------|--------------------------|
| Setup | `gbrain init` (zero-config) | Account + connection string |
| Scale | Good for < 1,000 files | Production-proven at 10K+ |
| Multi-device | Single machine only | Any device via remote MCP |
| Cost | Free | Supabase Pro ($25/mo) |
| Concurrency | Single process | Connection pooling |
| Backups | Manual (file copy) | Managed by Supabase |
**Migration:** `gbrain migrate --to supabase` exports everything (pages, chunks, embeddings, links, tags, timeline) and imports into Supabase. `gbrain migrate --to pglite` goes the other direction. Bidirectional, lossless.
## Adding a new engine
1. Create `src/core/<name>-engine.ts` implementing `BrainEngine`
2. Add to engine factory in `src/core/engine-factory.ts`:
```typescript
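// Simplified sketch: constructors are shown inline for readability. Per the
// note below, the actual factory loads each engine with a dynamic import.
export function createEngine(type: string): BrainEngine {
  switch (type) {
    case 'pglite': return new PGLiteEngine();
    case 'postgres': return new PostgresEngine();
    case 'myengine': return new MyEngine(); // your new engine
    default: throw new Error(`Unknown engine: ${type}`);
  }
}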
```
The factory uses dynamic imports so engines are only loaded when selected.
3. Store engine type in `~/.gbrain/config.json`: `{ "engine": "myengine", ... }`
4. Add tests. The test suite should be engine-agnostic where possible... same test cases, different engine constructor.
5. Document in this file + add a design doc in `docs/`
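One way to read step 4 is as a contract suite that takes an engine constructor and runs the same checks against it. The sketch below assumes a trimmed, hypothetical `MiniEngine` slice of the interface (with `putPage` taking a plain string body) and a hypothetical `MemoryEngine` stand-in; it is not the project's actual test suite.

```typescript
// Trimmed, hypothetical slice of BrainEngine, just enough for the example.
interface MiniPage {
  slug: string;
  body: string;
}
interface MiniEngine {
  getPage(slug: string): Promise<MiniPage | null>;
  putPage(slug: string, body: string): Promise<MiniPage>;
  deletePage(slug: string): Promise<void>;
}

// Stand-in engine backed by a Map; a real engine would talk to a database.
class MemoryEngine implements MiniEngine {
  private pages = new Map<string, MiniPage>();
  async getPage(slug: string): Promise<MiniPage | null> {
    return this.pages.get(slug) ?? null;
  }
  async putPage(slug: string, body: string): Promise<MiniPage> {
    const page = { slug, body };
    this.pages.set(slug, page);
    return page;
  }
  async deletePage(slug: string): Promise<void> {
    this.pages.delete(slug);
  }
}

// The same checks run against whatever the factory function returns.
// Returns a list of failure messages; an empty list means the contract holds.
async function runPageContract(makeEngine: () => MiniEngine): Promise<string[]> {
  const failures: string[] = [];
  const engine = makeEngine();

  await engine.putPage("people/jane-doe", "v1");
  const first = await engine.getPage("people/jane-doe");
  if (first?.body !== "v1") failures.push("put then get should return the page");

  await engine.putPage("people/jane-doe", "v2");
  const second = await engine.getPage("people/jane-doe");
  if (second?.body !== "v2") failures.push("put should overwrite by slug");

  await engine.deletePage("people/jane-doe");
  const gone = await engine.getPage("people/jane-doe");
  if (gone !== null) failures.push("get after delete should return null");

  return failures;
}

const contractRun = runPageContract(() => new MemoryEngine());
```

Swapping `() => new MemoryEngine()` for another constructor reuses every check unchanged, which is the point of keeping the cases engine-agnostic.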
### What you DON'T need to touch
- `src/cli.ts` (dispatches to engine, doesn't know which one)
- `src/mcp/server.ts` (same)
- `src/core/chunkers/*` (shared across engines)
- `src/core/embedding.ts` (shared across engines)
- `src/core/search/hybrid.ts`, `expansion.ts`, `dedup.ts` (shared, operate on SearchResult[])
- `skills/*` (fat markdown, engine-agnostic)
### What you DO need to implement
Every method in `BrainEngine`. The full interface. No optional methods, no feature flags. If your engine can't do vector search (e.g., a pure-text engine), implement `searchVector` to return `[]` and document the limitation.
## Capability matrix
| Capability | PostgresEngine | PGLiteEngine | Notes |
|-----------|---------------|-------------|-------|
| CRUD | Full | Full | Same SQL |
| Keyword search | tsvector + ts_rank | tsvector + ts_rank | Identical (real Postgres) |
| Vector search | pgvector HNSW | pgvector HNSW | Identical (real Postgres) |
| Fuzzy slug | pg_trgm | pg_trgm | Identical (real Postgres) |
| Graph traversal | Recursive CTE | Recursive CTE | Same SQL |
| Transactions | Full ACID | Full ACID | Both support this |
| JSONB queries | GIN index | GIN index | Identical |
| Concurrent access | Connection pooling | Single process | PGLite limitation |
| Hosting | Supabase, self-hosted, Docker | Local file | |
| Migration methods | runMigration, getChunksWithEmbeddings | Same | Added v0.7 |
## Future engine ideas
**TursoEngine.** libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.
**DuckDBEngine.** Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.
**Custom/Remote.** The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. Apart from `runMigration(sql)`, the interface doesn't assume SQL.
Note: The original SQLite engine plan (`docs/SQLITE_ENGINE.md`) was superseded by PGLite. PGLite uses the same SQL as Postgres, eliminating the need for a separate SQLite dialect with FTS5/sqlite-vss translation.
---
## docs/GBRAIN_RECOMMENDED_SCHEMA.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/GBRAIN_RECOMMENDED_SCHEMA.md
<!-- schema-version: 0.5.0 -->
<!-- source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/GBRAIN_RECOMMENDED_SCHEMA.md -->
# Brain: The LLM-Maintained Knowledge Base
A system prompt for any AI agent that wants to build and maintain a personal knowledge base. This describes the pattern, the architecture, and the operational discipline that makes it work.
Drop this into your agent's workspace as a skill or system prompt. Your agent will build the rest.
---
## What this is
A personal intelligence system where your AI agent builds and maintains an interlinked wiki of everything you know about your world — people, companies, deals, projects, meetings, ideas — as structured, cross-referenced markdown files. The agent writes and maintains all of it. You direct, curate, and think.
This is Karpathy's LLM wiki pattern, but extended from research notes into a full operational knowledge base — one that integrates with your calendar, email, meetings, social media, and contacts to stay continuously current.
The key insight: **knowledge management has failed for 30 years because maintenance falls on humans. LLM agents change the equation — they don't get bored, don't forget to update cross-references, and can touch 50 files in one pass.** Your wiki stays alive because the cost of maintenance is near zero.
## Three Founding Principles
### 1. Every Piece of Knowledge Has a Primary Home (MECE Directories)
Every piece of knowledge passes through a decision tree and lands in exactly one directory. No duplicated pages, no ambiguity about where something goes.
This is the single most important structural decision. Without it, knowledge bases rot — the same fact lives in three places with three different versions, nobody knows which is current, and the agent (or human) stops trusting the system. MECE directories with explicit resolver rules prevent this.
Every directory has a `README.md` (the resolver) that answers two questions:
1. **What goes here** — a positive definition with a concrete test
2. **What does NOT go here** — the key distinctions from neighboring directories that the agent might confuse
The brain also has a top-level `RESOLVER.md` — a numbered decision tree the agent walks when filing anything. When two directories seem to fit, disambiguation rules break the tie. When nothing fits, the item goes in `inbox/` — which is itself a signal the schema needs to evolve.
**The agent must read the resolver before creating any new page.** This is not optional.
**Important nuance: MECE applies to directories, not to reality.** Real people and entities are multi-faceted — a political founder can also be a friend, donor, media actor, and hiring candidate. The resolver picks the *primary home* for their page (people/), but the page itself uses typed backlinks and cross-references to surface all their facets. The MECE rule prevents duplicate pages, not duplicate relationships. Cross-references are how adjacency is preserved without breaking the one-page-per-entity rule.
### 2. Compiled Truth + Timeline (Two-Layer Pages)
Every brain page has two layers, separated by a horizontal rule (`---`):
**Above the line — Compiled Truth.** Always current, always rewritten when new information arrives. Starts with a one-paragraph executive summary. If you read only this, you know the state of play. Followed by structured State fields, Open Threads (active items — removed when resolved), and See Also (cross-links).
**Below the line — Timeline.** Append-only, never rewritten. Reverse-chronological evidence log. Each entry: date, source, what happened. When an open thread gets resolved, it moves here with its resolution.
If someone asks "what's the current state?" — read above the line. If someone asks "what happened?" — read below the line. The top is the current summary. The bottom is the source log.
This is the Karpathy wiki pattern's killer feature: **the synthesis is pre-computed.** Unlike RAG, where the LLM re-derives knowledge from scratch every query, your brain has already done the work. The cross-references are already there. The contradictions have already been flagged.
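Reading one layer or the other comes down to splitting the page at the horizontal rule. The sketch below is one possible way to do that. The `splitPage` name and `PageLayers` shape are hypothetical, and it assumes any YAML frontmatter has already been stripped, since frontmatter uses the same `---` delimiter.

```typescript
interface PageLayers {
  compiledTruth: string;
  timeline: string;
}

// Splits a page body at the first line that is exactly `---`.
// Assumes frontmatter was removed beforehand (it uses the same delimiter).
function splitPage(body: string): PageLayers {
  const lines = body.split("\n");
  const ruleIndex = lines.findIndex((line) => line.trim() === "---");
  if (ruleIndex === -1) {
    // No rule found: treat the whole page as compiled truth.
    return { compiledTruth: body.trim(), timeline: "" };
  }
  return {
    compiledTruth: lines.slice(0, ruleIndex).join("\n").trim(),
    timeline: lines.slice(ruleIndex + 1).join("\n").trim(),
  };
}

// Sample page content, invented for the example.
const samplePage = [
  "# Jane Doe",
  "> CTO of Acme.",
  "---",
  "## Timeline",
  "- **2026-04-07** | Meeting: discussed pricing.",
].join("\n");
const layers = splitPage(samplePage);
```

An agent answering "what's the current state?" would read `layers.compiledTruth`; one answering "what happened?" would read `layers.timeline`.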
### 3. Enrichment Fires on Every Signal
Every time any signal touches a person or company — meeting, email, tweet, calendar event, contact sync, conversation mention — the enrichment pipeline fires. The brain grows as a side effect of normal operations, not as a separate task you remember to do.
This is what distinguishes an operational brain from Karpathy's research wiki. He describes ingesting sources you manually add. An operational brain goes further — every pipeline (meetings, email, social media, contacts) automatically triggers enrichment on every entity it touches. You never have to remember to update someone's page. The system does it because the plumbing is wired correctly.
## Wiring It Into Your Agent
The brain must be referenced in your agent's configuration (AGENTS.md or equivalent) as a hard rule, not a suggestion. Specifically:
1. **Before creating any brain page → read RESOLVER.md.** This should be in your agent's operational rules, not buried in documentation.
2. **Before answering any question about people, companies, deals, or strategy → search the brain first.** Even if the agent thinks it knows the answer. File contents are current; the agent's memory of them goes stale.
3. **The enrich skill fires on every signal.** Every ingest pathway — meeting processing, email triage, social monitoring, contact sync — should call the enrichment pipeline when it encounters a person or company. This is wiring, not discipline. If it depends on the agent remembering, it will eventually be forgotten.
4. **Corrections are the highest-value data.** If the user corrects the agent about a person, company, deal, or decision — it gets written to the brain immediately. No batching, no deferring.
The chain of authority: **Agent config (AGENTS.md) says "read RESOLVER.md" → RESOLVER.md is the decision tree → each directory README.md is the local resolver → schema.md defines page structure → the enrich skill defines the enrichment protocol.**
## Architecture
Three layers:
**Raw sources** — meeting transcripts, emails, tweets, web research, API responses, calendar events, contact data. Immutable. The agent reads from these but never modifies them. Stored in `sources/` and `.raw/` sidecar directories.
**The brain** — a directory of interlinked markdown files. People pages, company pages, deal pages, meeting pages, project pages, concept pages. The agent owns this layer entirely. It creates pages, updates them when new information arrives, maintains cross-references, and keeps everything consistent. You read it; the agent writes it.
**The schema** — a document (this one, plus `schema.md` and `RESOLVER.md`) that tells the agent how the brain is structured, what the conventions are, and what workflows to follow. This is the key configuration file — it makes your agent a disciplined knowledge maintainer rather than a generic chatbot.
## The Database + Markdown Architecture
The markdown wiki is the human-facing layer — the primary interface for humans and LLMs. But it's not the sole source of truth. A structured database layer provides the foundation, and the markdown is generated from it.
### The Four Database Primitives
**Entity registry** — canonical ID, all aliases, all external IDs (LinkedIn member ID, X user ID, email addresses, phone numbers) in one table. This is the single source of truth for "is this the same person?" When you merge two entities, it's a database operation (point both IDs at the same canonical record), not a file-merge operation with cross-reference fixups.
**Event ledger** — every signal that touches the brain is an immutable event: meeting attended, email received, tweet published, enrichment completed, user correction applied. Events have provenance: source, timestamp, confidence, raw payload reference. The timeline section of markdown pages is generated from this ledger. You never lose events because a page rewrite went wrong.
**Fact store** — structured claims with provenance. "Jane Doe is CTO of Acme" with `source=crustdata, confidence=high, observed_at=2026-04-07`. When two sources disagree (LinkedIn says CTO, company website says VP Engineering), the conflict is visible as two facts for the same field with different values. The compiled truth section above the line is generated from the fact store's latest-confident values. Contradictions become data, not bugs.
**Relationship graph** — typed edges between entities. Person→Company (role: CTO, started: 2024-01), Person→Person (relationship: co-founded company together), Company→Deal (type: Series A, date: 2025-03). Enables graph queries that markdown grep can't answer: "who do I know who's invested in AI infrastructure companies?" becomes a traversal, not a prayer.
### Why This Matters
- **Identity resolution** becomes a database operation (merge entity IDs), not a file-merge operation with manual cross-reference fixups
- **Contradictions are structural** (two facts with different values for the same field and different sources) rather than textual (hoping the LLM notices a discrepancy buried in prose)
- **Concurrency is solved** — events append to a ledger, facts upsert to a store, markdown is rebuilt. No more merge conflicts on shared files
- **Graph queries work** — "who do I know at this company?" and "what companies has this investor backed that I also know the founders of?" become database queries, not impossible grep chains
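The "contradictions are structural" point can be made concrete with a small sketch. The `Fact` record and `findConflicts` function below are hypothetical; field names loosely follow the `source`, `confidence`, `observed_at` example above, and the sample values are invented.

```typescript
// Hypothetical fact record; not the project's actual schema.
interface Fact {
  entity: string; // canonical slug
  field: string; // e.g. "role"
  value: string;
  source: string;
  confidence: "high" | "medium" | "low";
  observedAt: string; // ISO date, YYYY-MM-DD
}

// Groups facts by entity + field and reports groups with more than one
// distinct value. Disagreeing sources stay side by side instead of one
// silently overwriting the other.
function findConflicts(facts: Fact[]): Map<string, Fact[]> {
  const groups = new Map<string, Fact[]>();
  for (const fact of facts) {
    const key = `${fact.entity}#${fact.field}`;
    const group = groups.get(key) ?? [];
    group.push(fact);
    groups.set(key, group);
  }
  const conflicts = new Map<string, Fact[]>();
  groups.forEach((group, key) => {
    const distinct = new Set(group.map((f) => f.value));
    if (distinct.size > 1) conflicts.set(key, group);
  });
  return conflicts;
}

const sampleFacts: Fact[] = [
  { entity: "people/jane-doe", field: "role", value: "CTO", source: "linkedin", confidence: "high", observedAt: "2026-04-07" },
  { entity: "people/jane-doe", field: "role", value: "VP Engineering", source: "company-website", confidence: "medium", observedAt: "2026-03-01" },
  { entity: "people/jane-doe", field: "company", value: "Acme", source: "linkedin", confidence: "high", observedAt: "2026-04-07" },
];
const conflicts = findConflicts(sampleFacts);
```

Here `conflicts` holds a single entry for `people/jane-doe#role` with both claims attached, so a later step can decide which value the compiled truth should show.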
### File-Layer Conventions
The markdown layer uses conventions that map directly to the database primitives:
1. **Use frontmatter for structured metadata** — anything you'd want to query (role, company, stage, score, tags) goes in YAML frontmatter, not buried in prose. These map to the fact store.
2. **Use `.raw/` for provenance** — save every API response with source and timestamp. These map to provenance records in the fact store.
3. **Treat the timeline as an event stream** — dated, sourced, append-only. These map to the event ledger.
4. **Keep compiled truth conceptually separate from evidence** — above the line is synthesis; below the line is evidence. The synthesis is a generated view; the evidence is queryable records.
5. **Use canonical slugs consistently** — every cross-reference uses the filename slug. These are the entity IDs in the registry.
## Directory Structure
```
brain/
├── RESOLVER.md — master decision tree for filing (agent reads this first)
├── schema.md — page conventions, templates, workflows
├── index.md — content catalog with one-line summaries
├── log.md — chronological record of all ingests/updates
├── people/ — one page per human being
│ ├── README.md — resolver: what goes here, what doesn't
│ └── .raw/ — raw API responses per person (JSON sidecars)
├── companies/ — one page per organization
│ ├── README.md
│ └── .raw/
├── deals/ — financial transactions with terms and decisions
│ └── README.md
├── meetings/ — records of specific events with transcripts
│ └── README.md
├── projects/ — things being actively built (has a repo, spec, or team)
│ └── README.md
├── ideas/ — raw possibilities nobody is building yet
│ └── README.md
├── concepts/ — mental models and frameworks you'd teach
│ └── README.md
├── writing/ — prose artifacts (essays, philosophy, drafts)
│ └── README.md
├── programs/ — major life workstreams (the forest, not the trees)
│ └── README.md
├── org/ — your institution's strategy and operations
│ └── README.md
├── civic/ — political landscape, policy, government
│ └── README.md
├── media/ — public narrative, content ops, social monitoring
│ └── README.md
├── personal/ — private notes, health, personal reflections
│ └── README.md
├── household/ — domestic operations, properties, logistics
│ └── README.md
├── hiring/ — candidate pipelines and evaluations
│ └── README.md
├── sources/ — raw data imports and archived snapshots
│ └── README.md
├── prompts/ — reusable LLM prompt library
├── inbox/ — unsorted quick captures (temporary)
└── archive/ — dead pages, historical record
```
Every directory has a README.md resolver. Adapt directories to your life — add or remove domains as needed. Not everyone needs civic/ or hiring/ or household/. The invariant is: **one directory per knowledge domain, one file per entity, every directory has a resolver, and RESOLVER.md is the master decision tree that guarantees MECE filing.**
## Entity Identity and Deduplication
In a system fed by meetings, email, social media, contacts, and APIs, **entity identity is the first real failure mode.** Without a canonical identity layer, you will end up with subtle split-brain pages — "Jane Smith" from a meeting transcript and "J. Smith" from an email and "jsmith" from Twitter all creating separate pages for the same person.
### Canonical slugs
Every entity gets a canonical slug that serves as its stable ID:
- People: `first-last.md` (all lowercase, hyphens for spaces)
- Companies: `company-name.md`
- If collisions arise, disambiguate: `david-liu-crustdata.md`, `david-liu-meta.md`
The filename IS the identity. All references, cross-links, and .raw/ sidecars use this slug.
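A slug helper along these lines could enforce the convention. This is a sketch: `canonicalSlug` is a hypothetical name, and the accent-stripping step is an assumption that goes beyond the rules listed above. It returns the slug without the `.md` extension.

```typescript
// Lowercase, strip accents and punctuation, join words with hyphens.
// The optional qualifier handles collisions such as two people named David Liu.
function canonicalSlug(name: string, qualifier?: string): string {
  const toPart = (text: string): string =>
    text
      .normalize("NFKD")
      .replace(/[\u0300-\u036f]/g, "") // drop combining accents (assumption)
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
      .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  const parts = [toPart(name)];
  if (qualifier) parts.push(toPart(qualifier));
  return parts.filter((part) => part.length > 0).join("-");
}

const plainSlug = canonicalSlug("Jane Doe"); // "jane-doe"
const disambiguatedSlug = canonicalSlug("David Liu", "Crustdata"); // "david-liu-crustdata"
```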
### Aliases
People have many names across sources. The frontmatter `aliases` field captures all known variants:
```yaml
aliases: ["Jenny Shao", "Jenny G. Shao", "JennyGShao", "jennifer.shao@company.com"]
```
Aliases include: misspellings from meeting transcripts, maiden names, nicknames, email addresses, social handles, and phonetic variants. When the enrich skill encounters a new name variant for a known entity, it adds the variant to aliases — it does NOT create a new page.
### Deduplication protocol
Before creating any new page, the agent must:
1. Search existing pages by name (exact and fuzzy)
2. Search aliases across all pages: `grep -rl "NAME_VARIANT" /data/brain/people/ --include="*.md"`
3. Check .raw/ sidecars for matching email addresses or social handles
4. If a match is found → UPDATE the existing page (add alias if the name variant is new)
5. If no match → CREATE a new page
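The update-or-create decision in steps 4 and 5 can be sketched as a lookup over names and aliases. Everything here is hypothetical (`PageIdentity`, `normalizeName`, `findExisting`, `resolveVariant`), and the sketch covers only normalized exact matching; fuzzy search and the `.raw/` sidecar check from steps 1 and 3 are left out.

```typescript
// Hypothetical in-memory view of a page's identity fields.
interface PageIdentity {
  slug: string;
  name: string;
  aliases: string[];
}

// Normalizes a name or handle for comparison: lowercase, alphanumerics only.
function normalizeName(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Returns the existing page that matches by name or alias, or null when the
// caller should create a new page.
function findExisting(pages: PageIdentity[], variant: string): PageIdentity | null {
  const needle = normalizeName(variant);
  for (const page of pages) {
    const known = [page.name, ...page.aliases].map(normalizeName);
    if (known.indexOf(needle) !== -1) return page;
  }
  return null;
}

// Update-or-create decision: a match means update (recording the new
// spelling as an alias); no match means create.
function resolveVariant(pages: PageIdentity[], variant: string): "update" | "create" {
  const match = findExisting(pages, variant);
  if (match === null) return "create";
  if (match.name !== variant && match.aliases.indexOf(variant) === -1) {
    match.aliases.push(variant);
  }
  return "update";
}

const knownPages: PageIdentity[] = [
  { slug: "jenny-shao", name: "Jenny Shao", aliases: ["Jenny G. Shao", "JennyGShao"] },
];
const seenBefore = resolveVariant(knownPages, "jenny g shao"); // "update"
const brandNew = resolveVariant(knownPages, "Jane Smith"); // "create"
```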
### Merge protocol
When you discover two pages are the same person:
1. Pick the more complete page as the survivor
2. Merge all timeline entries from the duplicate into the survivor (chronological order)
3. Merge all aliases
4. Update all cross-references that pointed to the duplicate
5. Delete the duplicate
6. Commit with message: `merge: [duplicate] into [survivor]`
During weekly lint, actively look for potential duplicates: similar names, same company, same email across different pages.
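Steps 2 and 3 of the merge protocol are mechanical enough to sketch. The `PersonPage` and `TimelineItem` shapes and the `mergePages` function are hypothetical, and the sketch leaves cross-reference updates, deletion, and the commit (steps 4 to 6) to the caller.

```typescript
interface TimelineItem {
  date: string; // YYYY-MM-DD
  source: string;
  text: string;
}
interface PersonPage {
  slug: string;
  aliases: string[];
  timeline: TimelineItem[];
}

// Merges `duplicate` into `survivor` without mutating either input.
// Covers timeline entries and aliases only.
function mergePages(survivor: PersonPage, duplicate: PersonPage): PersonPage {
  const seen = new Set<string>();
  const timeline = [...survivor.timeline, ...duplicate.timeline]
    .filter((item) => {
      const key = `${item.date}|${item.source}|${item.text}`;
      if (seen.has(key)) return false; // drop exact duplicates
      seen.add(key);
      return true;
    })
    // ISO dates compare correctly as plain strings: chronological order.
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
  const aliases = Array.from(new Set([...survivor.aliases, ...duplicate.aliases]));
  return { slug: survivor.slug, aliases, timeline };
}

// Sample pages, invented for the example.
const survivorPage: PersonPage = {
  slug: "jenny-shao",
  aliases: ["Jenny Shao"],
  timeline: [{ date: "2026-03-15", source: "meeting", text: "Discussed pricing" }],
};
const duplicatePage: PersonPage = {
  slug: "jenny-g-shao",
  aliases: ["Jenny G. Shao", "Jenny Shao"],
  timeline: [
    { date: "2025-11-02", source: "email", text: "Intro from Sam" },
    { date: "2026-03-15", source: "meeting", text: "Discussed pricing" },
  ],
};
const mergedPage = mergePages(survivorPage, duplicatePage);
```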
## Key Disambiguation Rules
The most common filing confusions and how to resolve them:
- **Concept vs. Idea:** Could you *teach* it as a framework? → concept. Could you *build* it? → idea.
- **Concept vs. Personal:** Would you share it in a professional talk? → concept. Is it private reflection? → personal.
- **Idea vs. Project:** Is anyone working on it? Yes → project. No → idea. The graduation moment is when work starts.
- **Writing vs. Media:** Writing is the *artifact* (the essay). Media is the *production and distribution infrastructure* (content pipeline, social monitoring).
- **Writing vs. Concepts:** A concept page is distilled (200 words of compiled truth). An essay is developed prose (argument, narrative, story).
- **Person vs. Company:** Is it about *them as a human*? → people/. Is it about *the organization*? → companies/. Both pages link to each other.
- **Household vs. Personal:** Would a PA execute on it? → household (operational). Is it private reflection? → personal.
- **Sources vs. .raw/ sidecars:** Per-entity enrichment data → .raw/ sidecar. Bulk multi-entity imports → sources/.
When nothing fits, file in inbox/ and flag it. That's a signal the schema needs to evolve.
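A resolver of this kind behaves like an ordered rule list where the first match wins and `inbox` is the fallback. The sketch below encodes only a few of the rules above; the `Item` shape, its `kind` values, and the `fileItem` function are hypothetical simplifications, since a real resolver would be a markdown decision tree the agent reads rather than code.

```typescript
// Hypothetical description of an item waiting to be filed.
interface Item {
  kind: "human" | "organization" | "possibility" | "framework" | "unknown";
  someoneIsWorkingOnIt?: boolean;
}

type Directory = "people" | "companies" | "projects" | "ideas" | "concepts" | "inbox";

interface Rule {
  name: string;
  matches: (item: Item) => boolean;
  directory: Directory;
}

// Ordered rules: the first match wins, mirroring a numbered decision tree.
const rules: Rule[] = [
  { name: "person", matches: (i) => i.kind === "human", directory: "people" },
  { name: "company", matches: (i) => i.kind === "organization", directory: "companies" },
  // Idea vs. Project: the graduation moment is when work starts.
  {
    name: "project",
    matches: (i) => i.kind === "possibility" && i.someoneIsWorkingOnIt === true,
    directory: "projects",
  },
  { name: "idea", matches: (i) => i.kind === "possibility", directory: "ideas" },
  { name: "concept", matches: (i) => i.kind === "framework", directory: "concepts" },
];

function fileItem(item: Item): Directory {
  for (const rule of rules) {
    if (rule.matches(item)) return rule.directory;
  }
  return "inbox"; // nothing fit: flag it and revisit the schema
}
```

Rule order does the disambiguation: the `project` rule sits before `idea`, so an item someone is already working on never lands in `ideas/`.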
## Page Types and Templates
### Person
The most important page type. A great person page is a well-researched briefing — not a LinkedIn scrape.
```markdown
# Person Name
> Executive summary: who they are, why they matter, what you should
> know walking into any interaction with them.
## State
- **Role:** Current title
- **Company:** Current org
- **Relationship:** To you (friend, colleague, investor, etc.)
- **Key context:** 2-4 bullets of what matters right now
## What They Believe
Worldview, positions, first principles. The hills they die on.
Every claim must cite its source and type:
- [Belief] — observed: [tweet/meeting/article, date]
- [Belief] — self-described: [interview/bio, date]
- [Belief] — inferred: [pattern across N interactions, confidence: high/medium/low]
## What They're Building
Current projects, recent ships, product direction.
## What Motivates Them
Ambition drivers, career arc, what gets them out of bed.
Distinguish between what they say motivates them (self-described) and
what their behavior suggests (observed/inferred).
## Communication Style
How they prefer to communicate. How they handle disagreement.
What energizes them in conversation.
This section is high-value but requires careful sourcing.
Rules: only write here from direct observation (meeting behavior,
language in emails/tweets, visible patterns). Never generalize
from a single data point. Mark confidence level.
## Hobby Horses
Topics they return to obsessively. Recurring themes in their public voice.
## Assessment
- **Strengths:** What they're great at. Be specific.
- **Gaps:** Where they could grow. Be specific and fair.
- **Net read:** One-line synthesis.
- **Confidence:** high (5+ interactions) / medium (2-4) / low (1 or inferred)
- **Last assessed:** YYYY-MM-DD
## Trajectory
Ascending, plateauing, pivoting, declining? Evidence.
## Relationship
History of interactions, temperature, dynamic.
## Contact
- Email, phone, LinkedIn, X handle, location
## Network
- **Close to:** People they're frequently seen with
- **Crew:** Which cluster they belong to
## Open Threads
- Active items, pending intros, follow-ups
---
## Timeline
- **YYYY-MM-DD** | Source — What happened.
```
No section has to be filled in up front. Include what you have, and leave empty sections as `[No data yet]` rather than omitting them. **The structure itself is a prompt for future enrichment.** When a section says `[No data yet]`, the agent knows what to look for next time it encounters this person.
The principle: facts are table stakes. Context is the value.
### Epistemic discipline on people pages
The context sections (Beliefs, Motivations, Communication Style, Assessment) are the highest-value parts of the system but also the most prone to hallucination. An agent can over-generalize from sparse evidence or overfit to one recent interaction. Rules:
- **Every claim cites its source.** Not "she's aggressive" but "she pushed back hard on pricing in the March 15 meeting (observed)."
- **Three source types:** `observed` (you saw it happen), `self-described` (they said it about themselves), `inferred` (you're reading between lines). Label each.
- **Confidence tracks interaction count.** One meeting = low confidence. Five meetings = high. Don't write definitive assessments from thin data.
- **Recency matters.** A belief from 2 years ago may not be current. Mark dates.
- **Never generalize from a single data point.** "She seemed frustrated in one meeting" is a timeline entry. Patterns require multiple observations.
- **The user's corrections override everything.** If the user says "that's wrong about her," update immediately — that correction is the highest-confidence signal in the system.
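The labeling rules above can be made mechanical. The following is a minimal sketch, not part of any gbrain API: the `Claim` record and the `confidence_for` helper are hypothetical names, and the thresholds are taken from the Confidence line of the person template (5+ interactions is high, 2-4 is medium, 1 or inferred is low).

```python
from dataclasses import dataclass
from datetime import date

# The three source types named in the rules above.
SOURCE_TYPES = {"observed", "self-described", "inferred"}


@dataclass(frozen=True)
class Claim:
    """One sourced statement on a people page (hypothetical structure)."""
    text: str
    source_type: str  # observed / self-described / inferred
    source: str       # where the claim comes from, e.g. a specific meeting
    noted_on: date    # recency matters, so every claim carries a date

    def __post_init__(self) -> None:
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.source_type!r}")
        if not self.source.strip():
            raise ValueError("every claim must cite its source")


def confidence_for(interactions: int, inferred_only: bool = False) -> str:
    """Map an interaction count to the template's confidence levels."""
    if inferred_only or interactions <= 1:
        return "low"
    if interactions < 5:
        return "medium"
    return "high"
```

Rejecting a claim with an empty source at construction time is one way to keep unsourced assertions from reaching the page at all.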
### Company
```markdown
# Company Name
> What they do, stage, why they matter.
## State
- **What:** One-line description
- **Stage:** Seed / Series A / Growth / Public
- **Key people:** Names with links to people pages
- **Key metrics:** Revenue, headcount, funding
- **Connection:** How they relate to your world
## Open Threads
---
## Timeline
```
### Meeting
```markdown
# Meeting Title
> YOUR analysis — not a copy of the AI meeting notes.
> What matters given everything else going on.
> What was decided. What was left unsaid.
## Attendees
## Key Decisions
## Action Items
## Connections to other brain pages
---
## Full Transcript
```
### Deal, Project, Concept
Same pattern: compiled truth on top, timeline on bottom.
## The Enrichment Pipeline
**This is the most important operational pattern.** Every time your agent encounters a person or company — in a meeting, email, tweet, calendar event, contact sync — it should enrich the corresponding brain page.
Enrichment is not just "look up their LinkedIn." It's:
- **What they believe** — positions, worldview, public stances
- **What they're building** — current projects, what's shipping
- **What motivates them** — ambition, career trajectory
- **Their communication style** — how they engage, what energizes them
- **Their relationship to you** — history, context, open threads
- **Hard facts** — role, company, contact info, funding (table stakes)
Facts are table stakes. Context is the value.
### When to enrich
**Any time** a person or company signal appears:
- Someone is mentioned in a meeting transcript → enrich
- Someone emails you → enrich
- Someone interacts with you on social media → enrich
- A new contact appears → enrich
- You mention someone in conversation and their page is thin → enrich
- A company announces funding, ships a product, makes news → enrich
### Enrichment sources (in order of value)
1. **Your own interactions** — what you said about them, what they said to you (highest signal)
2. **Meeting transcripts** — richest context source
3. **Email threads** — tone, urgency, relationship dynamics
4. **Social media** — beliefs, public positioning, who they engage with
5. **Web search** — background, press, talks
6. **People APIs** — structured profile data (career history, education, skills, contact info)
7. **Company APIs** — funding, investors, valuations, headcount, financials
8. **Contact data** — email, phone, location
### Data source skills
Each external data source should be its own named skill with full API documentation, auth patterns, and usage notes. The enrich skill orchestrates them — it decides *which* sources to call based on tier, then delegates to the individual skill for *how* to call the API.
This keeps things DRY: the enrich skill owns the logic (when to enrich, what tier, what to extract), and each data source skill owns the API contract (endpoints, auth, rate limits, gotchas, validation rules).
Recommended data source skills:
- **Web search** — broad keyword search (Brave, Google, etc.). Quick background, press, funding.
- **Semantic search** — better than keyword for finding specific people, LinkedIn URLs, personal writing. (Exa, Perplexity, etc.)
- **Social search** — X/Twitter, Bluesky, etc. for public voice: beliefs, projects, engagement patterns.
- **People enrichment** — structured LinkedIn-like data: career history, education, skills, contact info. (Crustdata, Proxycurl, People Data Labs, etc.)
- **Network search** — search your professional network for warm intros and connections. (Happenstance, Clay, etc.)
- **Company intelligence** — Pitchbook-grade data: funding rounds, investors, valuations, headcount, financials. (Captain API, Crunchbase, etc.)
- **Meeting history** — search past meetings for interactions with this entity. (Circleback, Otter, Fireflies, etc.)
- **Contact data** — email, phone, location from your contacts. (Google Contacts, etc.)
The typical enrichment flow for a new person:
1. **Network search** → find LinkedIn URL, career arc, alternate names
2. **People enrichment** → deep structured data (skills, work history, education, contact info)
3. **Semantic search** → find personal sites, talks, writing that reveal beliefs and perspective
4. **Social search** → their public voice, who they engage with, hobby horses
5. **Web search** → press coverage, recent news, talks
6. **Meeting history** → past interactions with you
For a new company:
1. **Company intelligence** → funding, investors, headcount, financials
2. **Web search** → product, press, traction
3. **Social search** → company's public positioning
4. **People enrichment** → enrich founders/key team members (each triggers person enrichment)
### Enrichment tiers (don't over-enrich)
- **Tier 1 (key people):** Full pipeline — all sources. Inner circle, business partners, important collaborators.
- **Tier 2 (notable):** Web search + social + brain cross-reference. People you interact with occasionally.
- **Tier 3 (minor mentions):** Extract signal from source only, append to timeline. Everyone else worth tracking.
A thin page with real interaction data is better than a fat page stuffed with generic web results. Don't waste 10 API calls on someone with no public presence.
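The tier rules can be expressed as a lookup from tier to the data source skills to call. This is a sketch under assumptions: the source identifiers are hypothetical labels, and the Tier 1 order simply mirrors the typical flow for a new person listed earlier.

```python
from typing import Dict, List

# Hypothetical identifiers for the data source skills described above.
TIER_SOURCES: Dict[int, List[str]] = {
    1: [  # full pipeline, in the order of the typical new-person flow
        "network_search",
        "people_enrichment",
        "semantic_search",
        "social_search",
        "web_search",
        "meeting_history",
    ],
    2: ["web_search", "social_search", "brain_cross_reference"],
    3: [],  # source extraction only: no external calls
}


def sources_for_tier(tier: int) -> List[str]:
    """Return the data sources to call for a tier, in call order."""
    if tier not in TIER_SOURCES:
        raise ValueError(f"unknown tier: {tier}")
    return list(TIER_SOURCES[tier])  # copy so callers cannot mutate the table
```

Keeping Tier 3 as an empty list makes the "no API calls" rule explicit rather than implied.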
### Raw data sidecars
Every enrichment API response gets saved as a JSON sidecar:
```
people/jane-doe.md          ← brain page (curated, readable)
people/.raw/jane-doe.json   ← raw API responses
```
The JSON is keyed by source with fetch timestamps:
```json
{
  "sources": {
    "crustdata": { "fetched_at": "2026-04-05T...", "data": { ... } },
    "web_search": { "fetched_at": "...", "data": { ... } }
  }
}
```
The brain page is the distilled version. Raw data is the archive.
What goes in the brain page (distilled): location, current title, company, headline, education (one line), career arc (condensed), top skills, social handles, profile picture permalink.
What stays in .raw/ only: full work history with job descriptions, complete skill lists, company descriptions for each employer, platform-specific IDs, follower/connection counts, full API response bodies.
When re-enriching: overwrite the source key with fresh data + new timestamp. Don't append — replace.
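The replace-don't-append rule can be sketched as a small helper that rewrites one source key and leaves the others untouched. The function name `save_raw` is hypothetical, and the sketch assumes the sidecar layout shown above (`sources` keyed by source name, each entry holding `fetched_at` and `data`).

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_raw(sidecar: Path, source: str, data: dict) -> dict:
    """Replace one source's entry in a .raw/ sidecar, keeping the others."""
    doc = {"sources": {}}
    if sidecar.exists():
        doc = json.loads(sidecar.read_text(encoding="utf-8"))
    # Overwrite the source key with fresh data and a new timestamp.
    doc.setdefault("sources", {})[source] = {
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }
    sidecar.parent.mkdir(parents=True, exist_ok=True)
    sidecar.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return doc
```

Calling `save_raw(Path("people/.raw/jane-doe.json"), "crustdata", response)` twice leaves a single `crustdata` entry holding the second response.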
### Validation rules
When auto-enriching from people/company APIs:
- **Low connection/follower count (e.g., <20):** Likely wrong person. Save to .raw/ with a `"validation": "low_connections"` flag. Don't auto-write to the brain page.
- **Name mismatch:** If the returned name doesn't share a last name with the entity, skip.
- **Obviously joke profiles:** Career arcs mentioning absurd titles — skip.
- **When in doubt:** Save raw data but don't update the brain page. Wrong data is worse than no data.
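Two of these rules are easy to check in code. The sketch below assumes a hypothetical profile dict with `name` and `connections` fields (real API responses will differ), and the `name_mismatch` flag is an illustrative label; only `low_connections` comes from the rules above. Joke-profile detection is left out because it needs judgment rather than a threshold.

```python
from typing import Optional

MIN_CONNECTIONS = 20  # the "<20" example threshold from the rules above


def validation_flag(entity_name: str, profile: dict) -> Optional[str]:
    """Return why a profile should not be auto-written, or None if it passes."""
    returned = str(profile.get("name", "")).split()
    expected = entity_name.split()
    # Name mismatch: the returned profile must share a last name with the entity.
    if not returned or not expected or returned[-1].lower() != expected[-1].lower():
        return "name_mismatch"
    # Low connection count: likely the wrong person.
    connections = profile.get("connections")
    if connections is not None and connections < MIN_CONNECTIONS:
        return "low_connections"
    return None
```

A caller would still save the raw response in either case and only skip the brain page write when a flag comes back.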
### Browser budget
If enrichment involves browser-based lookups (LinkedIn, authenticated pages), set a daily budget (e.g., 20 lookups/day) to avoid account flagging. Prefer API-based enrichment services for bulk work — they don't touch the user's browser session.
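A daily budget is a counter that resets when the date changes. This is an in-memory sketch with a hypothetical `DailyBudget` class; because cron jobs run as separate sessions, a real setup would need to persist the counter (for example in a state file), which is not shown here.

```python
from datetime import date
from typing import Optional


class DailyBudget:
    """Allow at most `limit` browser lookups per calendar day."""

    def __init__(self, limit: int = 20) -> None:
        self.limit = limit
        self._day: Optional[date] = None
        self._used = 0

    def try_spend(self, today: date) -> bool:
        """Consume one lookup if budget remains; return False when exhausted."""
        if today != self._day:
            # New day: reset the counter.
            self._day, self._used = today, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

The enrich skill would call `try_spend` before each browser lookup and fall back to API-based sources when it returns `False`.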
## Entry Criteria — Who Gets a Page
Not everyone deserves a brain page. Scale page creation to relationship importance:
**Always create a page for:**
- Anyone you've had a 1:1 or small-group meeting with
- Key colleagues, partners, and direct collaborators
- Anyone with a strong working relationship or better
- Family, close friends, inner circle
**Create if signal exists:**
- People from contacts with recent interaction
- Anyone mentioned by name in conversation with context
- Event contacts with multiple shared events
**Do NOT create:**
- Random names from mass event guest lists with no interaction
- Single-name entries with no identifying context
- Contacts with no relationship signal at all
When in doubt: does the user benefit from this entry existing? If no, skip it.
## The Skill Architecture
Skills are the modular building blocks of the system. There are three types, and understanding how they compose is critical.
### 1. Data source skills (leaf nodes)
Each external API or data source gets its own named skill. The skill owns the API contract: endpoints, authentication, rate limits, error handling, validation rules, and what the response looks like.
Examples:
- **People enrichment** (Crustdata, Proxycurl, People Data Labs) — structured LinkedIn-like data
- **Network search** (Happenstance, Clay) — search professional network, find mutual connections
- **Company intelligence** (Captain API/Pitchbook, Crunchbase) — funding, investors, financials
- **Semantic search** (Exa, Perplexity) — find LinkedIn URLs, personal sites, writing
- **Meeting history** (Circleback, Otter, Fireflies) — past meeting transcripts and notes
- **Calendar/contacts** (Google Calendar, Google Contacts via integration tools) — schedule, contact info
- **Social media** (X API, Bluesky API) — public posts, engagement, follower data
- **Workspace tools** (Gmail, Slack, Drive via integration tools) — email threads, messages, documents
Data source skills are **never called directly by the user.** They're called by orchestration skills (below).
### 2. Orchestration skills (coordinators)
These skills contain the *logic* — they decide what to do, then delegate to data source skills for how to do it.
**The enrich skill** is the most important orchestration skill. It decides:
- Is this a CREATE (new page) or UPDATE (new signal)?
- What tier is this entity? (determines which data sources to call)
- What signal types to extract from the source material?
- Which data source skills to call, in what order?
- How to write the results to the brain?
Other orchestration skills:
- **Meeting ingestion** — pulls meetings from a meeting tool, creates brain meeting pages with analysis, then calls enrich for every attendee and company discussed
- **Email triage / executive assistant** — processes inbox, handles scheduling, then calls enrich when it encounters people or companies
- **Social monitoring** — scans public social media for mentions and engagement, then calls enrich for notable accounts
### 3. Pipeline skills (end-to-end workflows)
These are the user-facing skills that chain multiple orchestration and data source skills together:
- **Morning briefing** — reads calendar + tasks + brain state + recent signals → produces a briefing
- **Person research** — given a name, runs full Tier 1 enrichment and presents the result
- **Weekly brain maintenance** — runs lint, flags stale pages, suggests enrichment targets
### How they compose
```
User says "tell me about Jane Doe"
  → Agent searches brain (grep/index)
  → Page is thin → calls enrich skill (orchestration)
    → enrich determines Tier 1
    → calls happenstance skill (data source) → gets LinkedIn URL
    → calls crustdata skill (data source) → gets full profile
    → calls exa skill (data source) → finds personal writing
    → calls web_search (built-in tool) → gets press coverage
    → calls meeting history (data source) → finds past meetings
    → writes brain page, saves .raw/ sidecar, cross-references
  → Agent presents the enriched page to user
```
```
Cron fires "meeting ingestion" every afternoon
  → meeting-ingestion skill (orchestration) pulls new meetings
  → for each meeting: creates brain meeting page
    → for each attendee: calls enrich skill (orchestration)
      → enrich calls relevant data source skills based on tier
    → for each company discussed: calls enrich skill
  → extracts tasks, commits brain repo
```
The key insight: **data source skills are stateless and reusable.** The enrich skill can call Crustdata whether the trigger was a meeting, an email, a social mention, or a direct user request. The data source skill doesn't care where the request came from.
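One way to picture "stateless and reusable" is a shared interface that every data source skill satisfies. The sketch below is an assumption, not the system's actual contract: `DataSourceSkill`, `fetch`, and the stub provider are hypothetical names, and the stub returns canned data instead of calling a real API.

```python
from typing import Dict, Iterable, Protocol


class DataSourceSkill(Protocol):
    """Contract every data source skill exposes (hypothetical interface)."""

    name: str

    def fetch(self, query: str) -> dict:
        ...


class StubPeopleEnrichment:
    """Stand-in for a real provider; returns canned data, makes no API call."""

    name = "people_enrichment"

    def fetch(self, query: str) -> dict:
        return {"query": query, "provider": "stub"}


def call_sources(skills: Iterable[DataSourceSkill], query: str) -> Dict[str, dict]:
    """Call each skill the same way, regardless of what triggered the request."""
    return {skill.name: skill.fetch(query) for skill in skills}
```

Because `call_sources` takes no information about the trigger, the same skills serve a meeting, an email, or a direct user request without modification.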
## How Enrich Wires Into Everything
The enrich skill is the central hub. Every ingest pathway converges on it:
```
Meeting ingestion ───────┬─────────────────────────┬─── people enrichment API
Email triage ────────────┤                         ├─── company intelligence API
Social monitoring ───────┤      ENRICH SKILL       ├─── network search API
Contact sync ────────────┤     (orchestration)     ├─── semantic search API
Manual conversation ─────┤                         ├─── social search API
Calendar events ─────────┤                         ├─── web search
Webhooks ────────────────┴─────────────────────────┴─── meeting history API
                                 BRAIN REPO
                            (people/, companies/,
                             meetings/, deals/)
```
Every arrow into the enrich skill carries a **signal** (the raw information from the source) and an **entity** (the person or company to enrich). The enrich skill:
1. **Checks brain state** — does a page exist? Is it thin?
2. **Determines tier** — Tier 1 (full pipeline), Tier 2 (web + social + cross-ref), Tier 3 (source extraction only)
3. **Extracts signal** from the source material (beliefs, motivations, trajectory, facts)
4. **Calls data source skills** based on tier (each skill is a named, documented module)
5. **Writes to brain** — CREATE (via RESOLVER.md) or UPDATE (append timeline, update compiled truth)
6. **Cross-references** — updates all linked entity pages
7. **Saves raw data** to `.raw/` sidecar
8. **Commits** to the brain repo
The critical wiring rule: **every ingest skill must call enrich.** This is not optional or aspirational. It's structural. If a new ingest pathway is added (say, a Slack monitoring skill), its implementation must include "for each person/company mentioned, call the enrich skill." If that line is missing, the brain stops compounding from that source.
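The wiring rule can be enforced structurally by routing every ingest pathway through one loop that always calls enrich. This is a minimal sketch with hypothetical names (`run_ingest`, `extract_entities`, and the `Entity` tuple); the enrich function itself is passed in and not implemented here.

```python
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (kind, name), e.g. ("person", "Sarah Chen")


def run_ingest(
    signals: Iterable[str],
    extract_entities: Callable[[str], List[Entity]],
    enrich: Callable[[Entity, str], None],
) -> int:
    """Process each signal and call enrich for every entity it mentions."""
    calls = 0
    for signal in signals:
        for entity in extract_entities(signal):
            # The structural line: each entity is paired with its signal.
            enrich(entity, signal)
            calls += 1
    return calls
```

A new pathway such as Slack monitoring would only supply its own `signals` and `extract_entities`, so the enrich call cannot be forgotten.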
## Automated Cron Jobs
The brain doesn't just grow when you're actively using it. Cron jobs make the system autonomous — the brain is maintained, the inbox is triaged, meetings are ingested, and mentions are monitored even while you sleep.
### The cron architecture
Cron jobs run as **isolated agent sessions** — they get their own context, read their own skills, and don't block the main conversation thread. They can post to specific notification channels (Telegram topics, Slack channels, Discord threads) or work silently.
Each cron job is essentially: "wake up, read a skill, do the work, post results (or stay silent if nothing happened), go back to sleep."
### Recommended cron jobs for a brain-powered system
**High frequency (every 10-30 minutes):**
- **Email monitor** — scan inbox, classify by priority, post digest to a notification channel. Handle low-risk items (scheduling, acknowledgments) directly.
- **Message monitor** — check key communication channels for unreplied messages from important contacts. Surface them with suggested responses.
**Medium frequency (every 1-3 hours):**
- **Social radar** — scan public social media for mentions of you or your organization, engagement, emerging narratives. Alert on items that need attention. Call enrich for notable new accounts engaging with you.
- **Heartbeat** — the omnibus check. Calendar lookahead, task review, email scan, brain state review. Post if something needs attention; stay silent if not.
**Daily:**
- **Morning briefing** — calendar + tasks + urgent items + overnight signals → one notification. The "here's your day" message.
- **Task prep** — archive yesterday's completed tasks, build today's list from calendar + backlog + recurring items.
- **Meeting ingestion** — pull all new meetings from your meeting tool, run full ingestion (create meeting pages, propagate to entity pages, extract tasks). This is the heaviest cron job — it touches the most brain pages.
- **Social media collection** — archive your own posts, track engagement velocity, detect deletions. Feed into media/ pages.
**Weekly:**
- **Brain lint** — run the full maintenance pass: contradictions, stale pages, orphans, missing cross-references, MECE filing violations. Post a report.
- **Enrichment sweep** — find brain pages that haven't been enriched in 90+ days, or pages with many `[No data yet]` sections. Queue them for re-enrichment.
- **Contact sync** — pull recent additions from your contacts, cross-reference with brain. Create pages for significant new contacts.
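The enrichment sweep's selection rule can be sketched as a single predicate. Assumptions are labeled in the code: how the last-enriched date is tracked is not specified here (a sidecar `fetched_at` timestamp is one option), and "many" placeholder sections is interpreted as three or more.

```python
from datetime import date, timedelta

PLACEHOLDER = "[No data yet]"


def needs_reenrichment(
    page_text: str,
    last_enriched: date,
    today: date,
    max_age_days: int = 90,     # the "90+ days" rule from the sweep description
    max_placeholders: int = 3,  # assumption: what counts as "many" empty sections
) -> bool:
    """Return True when a page is stale or mostly empty."""
    stale = (today - last_enriched) >= timedelta(days=max_age_days)
    sparse = page_text.count(PLACEHOLDER) >= max_placeholders
    return stale or sparse
```

The sweep would run this over every page and queue the matches for enrich rather than enriching inline.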
### How crons feed the brain
The key insight: **cron jobs are the autonomous enrichment engine.** Without them, the brain only grows when you're actively talking to the agent. With them:
- The email monitor encounters a person → calls enrich → brain grows
- The meeting ingestion processes a transcript → calls enrich for every attendee → brain grows
- The social radar detects a new notable account → calls enrich → brain grows
- The contact sync finds a new contact → calls enrich → brain grows
- The enrichment sweep finds stale pages → calls enrich with fresh data → brain stays current
The brain compounds 24/7 because the cron jobs are wired to call enrich. The user sleeps; the brain doesn't.
### Cron job design rules
1. **Silent when nothing happens.** If a cron finds nothing new, it should produce no output. No "nothing to report" messages. This is critical — noisy crons get disabled.
2. **Post to specific channels.** Each cron posts to its designated notification channel (e.g., email cron → Emails topic, social radar → Social Alerts topic). Don't mix signals.
3. **Spawn sub-agents for heavy work.** The cron session should stay lightweight. If meeting ingestion needs to process 5 meetings and update 30 entity pages, spawn sub-agents for the entity propagation.
4. **Idempotent and checkpoint-aware.** Every cron should track what it's already processed (in a state file like `meeting-notes-state.json`) so it doesn't redo work on the next run.
5. **Respect quiet hours.** Don't post between 11 PM and 7 AM unless something is genuinely urgent. Crons should check the time before posting.
6. **Every ingest cron must call enrich.** This is the structural rule. A cron that processes meetings but doesn't enrich attendees is a bug, not a feature.
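Rules 1, 4, and 5 translate directly into small checks. The sketch below uses hypothetical helper names, and it assumes the state file stores a list of processed IDs under a `processed_ids` key; a system that tracks only the last processed ID would need a different comparison.

```python
from datetime import time
from typing import List

QUIET_START = time(23, 0)  # 11 PM
QUIET_END = time(7, 0)     # 7 AM


def should_post(now: time, has_news: bool, urgent: bool = False) -> bool:
    """Stay silent when nothing happened or during quiet hours, unless urgent."""
    if not has_news:
        return False  # rule 1: no "nothing to report" messages
    in_quiet_hours = now >= QUIET_START or now < QUIET_END
    return urgent or not in_quiet_hours  # rule 5


def unprocessed(item_ids: List[str], state: dict) -> List[str]:
    """Return only the items a previous run has not handled (rule 4)."""
    seen = set(state.get("processed_ids", []))
    return [item for item in item_ids if item not in seen]
```

After a successful run the cron would append the newly handled IDs to the state file so the next run skips them.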
### Example: how it all fits together
A typical afternoon in an autonomous brain system:
1. **3:00 PM** — Email monitor cron fires. Scans inbox. Finds 3 new emails: a scheduling request, a funding announcement, and a founder asking for advice.
   - Handles the scheduling request directly (checks calendar, replies with available times)
   - Calls enrich on the company in the funding announcement → updates company page with new round
   - Posts the founder's email to notification channel for the user to handle
2. **3:15 PM** — Meeting ingestion cron fires. Finds 2 new meetings from today.
   - Creates 2 brain meeting pages with analysis
   - Calls enrich for 8 attendees across both meetings → updates 8 people pages
   - Calls enrich for 3 companies discussed → updates 3 company pages
   - Extracts 4 action items → adds to task list
3. **3:30 PM** — Social radar cron fires. Detects a journalist writing a thread about the user's organization.
   - Posts alert to Social Alerts channel
   - Calls enrich on the journalist → creates/updates their people page with recent activity
4. **4:00 PM** — Heartbeat fires. Calendar shows a meeting in 1 hour. Brain page for the attendee was last enriched 3 months ago.
   - Triggers a fresh enrichment pass on the attendee
   - Posts a prep note: "Meeting with X in 1 hour. Here's what's changed since you last met."
The user didn't ask for any of this. The brain grew by 12 pages and the user walked into their 5:00 PM meeting fully prepared, because the plumbing is wired correctly.
## Worked Examples From a Production System
These examples show how the architecture operates end-to-end. Names and specifics are genericized, but the skill chains are exact — every skill call, every file write, every cron trigger is how it actually works.
### Example 1: Meeting Ingestion — The Full Chain
A cron job fires at 3:00 PM daily with the prompt: "Read skills/meeting-ingestion/SKILL.md and process today's meetings."
**Step 1: Skill chain loads.** The meeting-ingestion skill's preamble says "Read skills/enrich/SKILL.md" — so the agent loads the enrichment protocol before touching any data. This is critical: it means the agent knows how to handle every person and company it encounters.
**Step 2: Pull new meetings.** The agent calls the meeting history data source skill (in this system, Circleback). It checks a state file (`memory/meeting-notes-state.json`) that tracks the last processed meeting ID. Finds 2 new meetings since last run.
**Step 3: Process Meeting 1 — "Product Review with Sarah Chen and Mike Torres."**
The agent creates `brain/meetings/2026-04-07-product-review.md` with:
- Its own analysis above the line (not a copy of the AI summary — reframed through what the brain already knows about the attendees and the project)
- Key decisions, action items, and connections to other brain pages
- Full transcript below the line
**Step 4: Enrich attendees.**
For **Sarah Chen** — the agent searches the brain: `grep -rl "Sarah Chen" /data/brain/people/`. Finds `people/sarah-chen.md`. Reads it. Page was last enriched 2 weeks ago and has good coverage. → **Tier 3**: extract signal from this meeting only. Appends to her timeline: "2026-04-07 | Meeting — Pushed back on timeline for launch, wants more QA. Concerned about API stability." Updates her Open Threads with the new follow-up item.
For **Mike Torres** — brain search finds `people/mike-torres.md`. Page exists but is thin: just a name, title, and one previous meeting entry. → **Tier 2**: web search + social + brain cross-reference. Agent finds his recent blog posts (feeds into What They Believe), his X activity (feeds into Hobby Horses), and cross-references him with two other brain pages that mention him. Updates compiled truth above the line.
For **"Alex from Meridian Labs"** (mentioned in the meeting but not an attendee) — brain search finds nothing. → **CREATE path**:
1. Reads RESOLVER.md: "a specific named person" → `people/`
2. Creates `people/alex-rivera.md` using the person template from schema.md
3. Runs **Tier 1 enrichment** (full pipeline): network search → finds LinkedIn URL. People enrichment API → full structured profile. Semantic search → finds a conference talk. Web search → finds press coverage of Meridian Labs' recent funding.
4. Saves raw API responses to `people/.raw/alex-rivera.json`
5. Cross-references: updates `companies/meridian-labs.md` to link to Alex's page
**Step 5: Enrich companies discussed.** Meridian Labs was discussed extensively. Agent checks `companies/meridian-labs.md` — exists but funding data is 4 months stale. Calls company intelligence API → gets fresh round data. Updates the page.
**Step 6: Extract action items.** Finds 3 action items in the transcript → appends to `ops/tasks.md`.
**Step 7: Repeat for Meeting 2.** Same flow.
**Step 8: Commit and notify.**
```bash
cd /data/brain && git add -A && git commit -m "meetings: 2026-04-07 product review, investor sync" && git push
```
Posts summary to the Meetings notification channel: "Processed 2 meetings. Created 1 new person page (Alex Rivera). Updated 3 entity pages. 5 action items extracted."
**Files touched in this run:**
```
brain/
├── meetings/
│   ├── 2026-04-07-product-review.md   (CREATED)
│   └── 2026-04-07-investor-sync.md    (CREATED)
├── people/
│   ├── sarah-chen.md                  (UPDATED — timeline + open threads)
│   ├── mike-torres.md                 (UPDATED — Tier 2 enrichment)
│   ├── alex-rivera.md                 (CREATED — Tier 1 enrichment)
│   └── .raw/
│       └── alex-rivera.json           (CREATED — raw API responses)
└── companies/
    └── meridian-labs.md               (UPDATED — fresh funding data)
ops/
└── tasks.md                           (UPDATED — 5 new action items)
memory/
└── meeting-notes-state.json           (UPDATED — checkpoint)
```
### Example 2: Email Triage — Resolver + Enrichment in Action
An email monitor cron fires at 12:00 PM. Its prompt: "Read skills/executive-assistant/SKILL.md and skills/gmail/SKILL.md. Triage the inbox."
**Step 1: Pull inbox.** The agent calls the Gmail data source skill via its workspace integration. Gets 8 new emails since last check.
**Step 2: Classify and handle.** Most emails are routine: 2 scheduling confirmations (handled directly — checks calendar, sends confirmations), 3 newsletters (archived), 1 internal FYI (noted). But one stands out:
**An email from "David Park, GP at Ridgeline Ventures"** — subject: "Series A for NovaTech — co-invest opportunity." The agent has never seen this person before.
**Step 3: Enrich the unknown sender.**
The agent calls the enrich skill. Enrich searches the brain:
```bash
grep -rl "David Park" /data/brain/people/ --include="*.md" # no results
grep -rl "Ridgeline" /data/brain/companies/ --include="*.md" # no results
grep -rl "david.park@ridgeline" /data/brain/people/ --include="*.md" # no results (alias search)
```
No match. → **CREATE path.**
1. Reads RESOLVER.md: "a specific named person" → `people/`
2. Runs **Tier 2 enrichment** (this is an unsolicited email, not a key relationship yet):
   - Web search: finds David Park's profile on Ridgeline's website. GP, focuses on enterprise SaaS. Previously at two other funds.
   - Social search: finds his X account. Recent posts about AI infrastructure, developer tools. Reposted an article about NovaTech last week.
   - Brain cross-reference: searches for NovaTech → finds `companies/novatech.md` exists (from a meeting 2 months ago). Cross-links.
3. Creates `people/david-park.md` with what it found — role, fund, investment focus, public voice, connection to NovaTech.
4. Also checks `companies/ridgeline-ventures.md` — doesn't exist. Creates a thin page with what's known from the web search.
**Step 4: Back in the EA skill.** Now the agent has context. It classifies the email:
- Priority: Medium (co-invest opportunity, not urgent)
- Context: David Park is a GP at a fund that focuses on enterprise SaaS. NovaTech is already in the brain from a previous meeting.
- Action needed: User should review
Posts to the Emails notification channel:
> **Co-invest opportunity — NovaTech Series A**
> From: David Park, GP at Ridgeline Ventures
> He's reaching out about co-investing in NovaTech's Series A. Ridgeline focuses on enterprise SaaS.
> NovaTech is already in the brain — you met their founder in February.
> [Open in Gmail](link)
**The email monitor didn't just triage — it grew the brain by two pages** (one person, one company) and cross-linked them to an existing entity.
### Example 3: The Compound Effect — How Context Builds Before a Meeting
This example shows how a completely unknown person becomes a rich brain page across 4 autonomous cron runs over 48 hours, with zero manual intervention. The result: you walk into a meeting fully prepared.
**Hour 0 — Social radar cron (Tuesday, 3:00 PM)**
The social radar cron scans for mentions and engagement on X. It detects a reply to one of the user's posts from an account named `@lena_builds` — a thoughtful, technical response about developer tooling that got 50+ likes.
The agent calls enrich. Brain search: no match for "Lena" or "lena_builds." → **CREATE, Tier 3** (minor mention — just a social interaction, not a relationship yet).
Creates `people/lena-kovac.md` with minimal data: X handle, display name, the reply text, and a note that she seems technical. No API calls — Tier 3 is source-extraction only.
```markdown
# Lena Kovac
> Technical builder. Engaged with a post about developer tooling on X.
## State
- **X:** @lena_builds
- **Relationship:** None yet — social interaction only
- **Confidence:** low (1 interaction)
---
## Timeline
- **2026-04-07** | X reply — Replied to post about developer tools.
Thoughtful technical take on compiler-driven UX. 50+ likes.
```
**Hour 18 — Email monitor cron (Wednesday, 9:00 AM)**
The morning email sweep finds an email from `lena@kovac.dev` — subject: "Loved your talk at the devtools summit — would love to chat about what we're building."
The agent calls enrich. Searches the brain:
```bash
grep -rl "lena" /data/brain/people/ --include="*.md" # finds people/lena-kovac.md
grep -rl "kovac.dev" /data/brain/people/ --include="*.md" # no alias match yet
```
Finds the existing page. Reads it — it's thin (Tier 3, just the X reply). The email adds a new signal AND an email address. → **Upgrade to Tier 2.**
- Adds `lena@kovac.dev` to aliases in frontmatter
- Web search: finds her personal site (`kovac.dev`) — she's building a developer tools startup called Lattice. Previously at a major tech company on their compiler team.
- Social search: deeper X dive. She posts regularly about developer experience, compilers, and Rust. Has 3K followers.
- Brain cross-reference: searches for "Lattice" and "compiler" — finds a concept page about developer tooling that links to 2 companies in the same space.
- Updates `people/lena-kovac.md` with real substance: career history, what she's building, what she believes about developer tooling, her public voice.
**Hour 26 — Executive assistant cron (Wednesday, 5:00 PM)**
The afternoon EA sweep processes scheduling requests. One of the emails it triages is Lena's: she asked to chat.
Before proposing times, the EA skill checks whether a calendar event with this person already exists. It searches the calendar and finds Lena's email (`lena@kovac.dev`) on an event for Thursday at 2 PM (she booked through the user's public booking link).
The EA skill sees the meeting is tomorrow. Calls enrich again. Page exists and is now Tier 2 with decent coverage, but there's a meeting tomorrow. → **Upgrade to Tier 1.**
- Network search: finds her LinkedIn URL. She has 2 mutual connections with the user.
- People enrichment API: full structured profile — Stanford CS, 4 years at a major tech company, founded Lattice 8 months ago.
- Semantic search: finds a conference talk she gave on "Why Developer Tools Are Stuck in 2015."
- Saves everything to `people/.raw/lena-kovac.json`
- Updates the brain page with full Tier 1 depth: beliefs, trajectory, what she's building, assessment, network connections.
**Hour 40 — Morning briefing cron (Thursday, 7:30 AM)**
The morning briefing cron builds the daily prep. It reads the calendar: meeting with Lena Kovac at 2 PM. It reads `people/lena-kovac.md` — which is now a rich page.
Produces a prep note in the daily briefing:
> **2:00 PM — Lena Kovac (Lattice)**
> Building a developer tools startup focused on compiler-driven UX. Stanford CS, 4 years on compilers at [major tech co]. Founded Lattice 8 months ago.
> She replied to your devtools post on X on Tuesday (the technical one about compiler-driven UX that got traction), then emailed the next morning: "loved your talk, want to chat about what we're building."
> Her public writing argues that developer tools are stuck in a 2015 paradigm and that compiler intelligence should drive the entire editing experience. She gave a talk on this at DevTools Summit.
> 2 mutual connections. She's technical, has founder energy, and is building in a space you care about.
**The compound effect:** Lena went from unknown → thin Tier 3 page → substantive Tier 2 page → rich Tier 1 page → meeting prep note. Four cron runs over 48 hours. Zero manual enrichment requests. The user walks into the meeting knowing exactly who Lena is, what she cares about, and why she reached out — because every pipeline is wired to call enrich, and enrich knows how to escalate tier based on relationship signals.
This is the core insight of the brain system: **knowledge compounds autonomously when the plumbing is wired correctly.** Each cron job doesn't just do its own job — it feeds the enrichment pipeline, which feeds every future cron job. The meeting ingestion cron creates pages that the morning briefing cron reads. The email monitor enriches people that the social radar first detected. The whole system is a flywheel.
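The tier escalation in this example can be sketched as a function that only ever moves a page toward Tier 1. The signal names and the mapping are assumptions drawn from the Lena walkthrough (social mention, direct email, upcoming meeting); a real enrich skill would weigh more signals than these three.

```python
from typing import Optional

# Hypothetical mapping from relationship signal to the tier it justifies.
SIGNAL_TIER = {
    "social_mention": 3,
    "direct_email": 2,
    "upcoming_meeting": 1,
}


def escalate(current_tier: Optional[int], signal: str) -> int:
    """Return the tier after a new signal; tiers never move back down."""
    target = SIGNAL_TIER.get(signal, 3)  # unknown signals default to Tier 3
    if current_tier is None:
        return target  # no page yet: CREATE at the signal's tier
    return min(current_tier, target)  # lower number means deeper enrichment
```

Replaying Lena's 48 hours gives `escalate(None, "social_mention")`, then `escalate(3, "direct_email")`, then `escalate(2, "upcoming_meeting")`, which walks the page from Tier 3 to Tier 1.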
## Ingest Workflows
These are the specific ingest patterns. Each one calls enrich as its terminal step.
### Meeting ingestion
After every meeting (via Circleback, Otter, Fireflies, or manual notes):
1. Pull meeting notes + full transcript
2. Create a brain meeting page with **your own analysis** (not just a regurgitated AI summary), reframed through what you know about the attendees' world
3. **Propagate to entity pages** — call enrich for every person and company discussed. A meeting is NOT fully ingested until entity pages are updated.
4. Extract action items to task list
5. Commit
### Email ingestion
When processing email:
- Extract people and companies mentioned
- Call enrich with email context (tone, requests, relationship signals)
- Note scheduling, commitments, follow-ups
### Social media ingestion
When monitoring social media:
- Capture what people you track are saying publicly (beliefs, projects, opinions)
- Detect engagement patterns (who's replying to you, who's amplifying you)
- Call enrich for notable accounts → feed into "What They Believe" and "Hobby Horses" sections
### Manual ingestion
When you mention someone or something in conversation:
- Your own comments are the highest-value signal — always capture these
- "Really sharp on the technical side, could be a good advisor for the infra project" → that goes in the person's page immediately
- If the brain page is thin, trigger a full enrichment
## Navigation and Concurrency
**index.md** — content catalog. Every page listed with a one-line summary. Useful for navigation and query routing.
**log.md** — chronological record of ingests and updates. Append-only.
At scale (500+ pages), add search tooling (embeddings, BM25, or tools like gbrain). At moderate scale, grep works well.
### Write hotspots and concurrency
Once you have cron jobs, ingest jobs, and sub-agents all touching the brain repo, **index.md and log.md become merge-conflict magnets.** Every workflow wants to append to log.md and update index.md on every commit.
Practical mitigations:
- **Treat index.md as derived, not hand-maintained.** Instead of updating it in every ingest workflow, rebuild it periodically (daily or on-demand) by scanning the directory tree. This eliminates it as a write hotspot.
- **Make log.md append-safe.** Each entry is a self-contained line with a timestamp prefix. Concurrent appends to the end of the file rarely conflict. If they do, both sides are correct — just keep both lines.
- **Commit in batches, not per-page.** When an ingest job updates 10 entity pages, commit once at the end, not 10 times. This reduces conflict surface.
- **Pull before push.** Every workflow should `git pull --rebase` before pushing. With append-only log and independent entity pages, rebases almost always auto-resolve.
- **Entity pages rarely conflict.** Two workflows updating `people/jane-doe.md` at the same time is rare because they're triggered by different signals about different people. The real conflict hotspots are the shared files (index.md, log.md), which is why those should be append-only or derived.
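The first mitigation, treating index.md as derived, could look roughly like the sketch below. It is a minimal illustration, not part of gbrain: `build_index` and `one_line_summary` are hypothetical names, and it assumes the summary is simply the first non-empty, non-heading line of each page (pages with frontmatter would need a smarter rule).

```python
from pathlib import Path

# Files that should not list themselves in the catalog (assumption).
META_FILES = {"index.md", "log.md", "README.md", "schema.md"}


def one_line_summary(text: str) -> str:
    """Return the first non-empty line that is not a heading or a '---' fence."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and stripped != "---":
            return stripped[:120]
    return "(no summary)"


def build_index(root: Path) -> str:
    """Scan the tree and return fresh index.md content, sorted for stable diffs."""
    entries = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        # Skip hidden paths such as .git/ and .raw/, plus the meta files.
        if any(part.startswith(".") for part in rel.parts) or rel.name in META_FILES:
            continue
        summary = one_line_summary(path.read_text(encoding="utf-8"))
        entries.append(f"- [{rel.as_posix()}]({rel.as_posix()}) - {summary}")
    return "# Index\n\n" + "\n".join(entries) + "\n"
```

Because the output is sorted and fully regenerated, a single periodic job can overwrite index.md and no ingest workflow ever has to touch it.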
## Maintenance (Lint)
Periodically (weekly), the agent should:
- **Deduplication scan:** Look for potential duplicate pages — similar names, same company, same email across different pages. Merge when confirmed.
- **Contradictions:** Check for conflicting facts between pages (e.g., two pages listing different roles for the same person at the same company).
- **Staleness:** Flag State sections superseded by newer Timeline entries.
- **Orphans:** Find pages with no inbound links.
- **Open Threads:** Check for items that seem resolved but weren't moved to Timeline.
- **Missing cross-references:** Entity A mentions Entity B but doesn't link to their page.
- **Missing pages:** Entities mentioned frequently but lacking their own page.
- **MECE filing:** Flag any pages that seem to be in the wrong directory.
- **Source audit:** Check people pages for unsourced claims in high-value sections (Beliefs, Motivations, Assessment). Flag claims without source type or date.
- **Alias coverage:** Check if recent meeting transcripts or emails contain name variants not yet in any page's aliases field.
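As one example of a lint check, the orphan scan can be sketched in a few lines. This is an illustration under assumptions, not the agent's actual implementation: it assumes pages link to each other with ordinary relative markdown links (`[text](../people/jane-doe.md)`), and `find_orphans` is a hypothetical helper name.

```python
import posixpath
import re

# Matches [text](target.md) and [text](target.md#anchor); captures the target path.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def find_orphans(pages: dict[str, str]) -> list[str]:
    """pages maps repo-relative path -> markdown text. Returns pages with no inbound links."""
    linked = set()
    for path, text in pages.items():
        base = posixpath.dirname(path)
        for target in LINK_RE.findall(text):
            # Resolve the link relative to the page that contains it.
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved != path:  # a self-link does not count as inbound
                linked.add(resolved)
    return sorted(p for p in pages if p not in linked)
```

The same link map can back the missing cross-references check: an entity name that appears in the text without a matching entry in `linked` is a candidate for a new link.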
## What makes this different from RAG
RAG re-derives knowledge from scratch on every query. The brain pre-computes synthesis and keeps it current. Specifically:
- **Cross-references are pre-built.** You don't need the LLM to discover that Person A works at Company B and was in Meeting C — that's already linked.
- **Contradictions are pre-flagged.** When new data conflicts with old data, the agent resolves or flags it during ingest, not at query time.
- **The compilation is persistent.** Each source ingested makes the brain richer. Nothing is thrown away or re-derived.
- **The structure itself is a prompt.** Empty sections ("What They Believe: [No data yet]") tell the agent what to look for next.
## Page Lifecycle
Brain pages can have implicit lifecycle states:
- **Active:** Current, recently updated, ongoing relationship or relevance
- **Dormant:** Not updated in 6+ months, relationship cooled, but still potentially relevant
- **Archived:** Moved to `archive/` — dead companies, ended relationships, resolved deals. Historical record only.
- **Graduated:** For ideas that became projects, or projects that became programs — the old page links to the new one
During lint passes, flag pages that haven't been updated in 6+ months for review. Some should be archived; others just need a fresh enrichment pass.
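A lint pass could classify pages with something like the sketch below. It is only an illustration: `lifecycle_state` is a hypothetical helper, "6+ months" is approximated as 183 days, and the last-updated date is assumed to come from wherever you track it (for example, the page's last commit date).

```python
from datetime import date, timedelta

DORMANT_AFTER = timedelta(days=183)  # "6+ months", approximated (assumption)


def lifecycle_state(path: str, last_updated: date, today: date) -> str:
    """Classify a page as archived, dormant, or active."""
    if path.startswith("archive/"):
        return "archived"
    if today - last_updated >= DORMANT_AFTER:
        return "dormant"  # flag for review: archive it or re-enrich it
    return "active"
```

Graduated pages are not detected here; that state depends on a link to the successor page rather than on dates.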
## What makes a great brain
A great brain lets you walk into any meeting, call, or decision already knowing:
1. Who this person is and what they care about (30 seconds of reading)
2. What the company's actual state is (not what they said 6 months ago)
3. What open threads exist between you (promises, follow-ups, deals)
4. What changed recently (latest timeline entries)
5. What to watch for (patterns, concerns, opportunities)
A bad brain is a pile of LinkedIn scrapes and meeting transcripts nobody reads. A good brain is compiled context that makes you more effective in every interaction.
## The Resolver
When creating or filing a new page, walk this decision tree. Every piece of knowledge has exactly one home.
### Decision Tree
**Start here: what is the primary subject?**
1. **A specific named person** → `people/`
2. **A specific organization** (company, fund, nonprofit, government body) → `companies/`
3. **A financial transaction** with terms and a decision to make → `deals/`
4. **A record of a specific meeting/call** that happened at a specific time → `meetings/`
5. **Something being actively built** (has a repo, spec, team, or active work) → `projects/`
6. **A raw possibility** that nobody is building yet → `ideas/`
7. **A reusable mental model or thesis** about how the world works → `concepts/`
8. **A piece of prose** that could be published as a standalone work → `writing/`
9. **Your institution's strategy, org, processes, internal dynamics** → `org/`
10. **Political or civic landscape** — policy, legislation, elections, government → `civic/`
11. **Public narrative or content operations** — social monitoring, content pipeline, published posts → `media/`
12. **A major life program** — an enduring domain of commitment containing multiple projects → `programs/`
13. **Domestic operations** — properties, logistics, household management → `household/`
14. **Private notes** — health, personal reflections, inner life → `personal/`
15. **A hiring pipeline** — candidate evaluations, role specs, interview notes → `hiring/`
16. **A reusable LLM prompt** — templates for getting specific outputs from models → `prompts/`
17. **A raw data import or snapshot** — bulk exports, API dumps, periodic captures → `sources/`
18. **Agent deliverables** — briefings, digests, and research produced by your agent → `agent/`
19. **Unsorted / quick capture** — you don't know where it goes yet → `inbox/`
20. **Dead / no longer relevant** — historical pages with no active references → `archive/`
### Disambiguation Rules
When two directories seem to fit, apply these tiebreakers:
- **Person vs. Company:** If the page is about *them as a human* (beliefs, relationship, trajectory), it's people/. If it's about *the organization they run*, it's companies/. Both pages link to each other.
- **Concept vs. Idea:** Could you *teach* it to someone as a framework? Concept. Could you *build* it? Idea.
- **Concept vs. Personal:** Would you share it in a professional talk? Concept. Is it private reflection? Personal.
- **Idea vs. Project:** Is anyone working on it? If yes, project. If no, idea. The graduation moment is when work starts.
- **Writing vs. Concepts:** Concepts are distilled (200 words of compiled truth). Writing is developed prose (argument, narrative, story).
- **Writing vs. Media:** Writing is the *artifact*. Media is the *production and distribution infrastructure*.
- **Org vs. Programs:** org/ is institutional knowledge *about* your organization. programs/ is about your personal role and priorities within it.
- **Civic vs. People:** Political figures get people/ pages. Their legislative agenda and political positioning as civic actors goes in civic/.
- **Household vs. Personal:** If a PA would execute on it, it's household (operational). If it's private reflection, it's personal (inner life).
- **Sources vs. .raw/ sidecars:** Per-entity enrichment data → .raw/ sidecar next to the entity. Bulk multi-entity imports → sources/.
- **Agent vs. Sources:** Sources feed *into* the brain. Agent deliverables are synthesized output that feeds *into your reading*.
### Special directories (not knowledge)
These exist in the brain repo but aren't knowledge directories:
- **templates/** — page templates for each type (structural, not content)
- **attachments/** — binary attachments (images, PDFs). Managed by your editor, not by the agent.
### MECE Check
Every piece of knowledge should pass through the decision tree above and land in exactly one directory. If you find something that genuinely doesn't fit any category, file it in inbox/ and flag it — that's a signal the schema needs to evolve.
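The decision tree and the inbox fallback can be expressed as a lookup. This is a minimal sketch: the subject labels on the left are hypothetical shorthand for the numbered questions above, and only one tiebreaker (Idea vs. Project) is encoded.

```python
# Hypothetical subject labels mapped to the directories from the decision tree.
RESOLVER = {
    "person": "people/",
    "organization": "companies/",
    "deal": "deals/",
    "meeting": "meetings/",
    "project": "projects/",
    "idea": "ideas/",
    "concept": "concepts/",
    "writing": "writing/",
    "org": "org/",
    "civic": "civic/",
    "media": "media/",
    "program": "programs/",
    "household": "household/",
    "personal": "personal/",
    "hiring": "hiring/",
    "prompt": "prompts/",
    "source": "sources/",
    "agent": "agent/",
    "archive": "archive/",
}


def resolve(subject: str, being_built: bool = False) -> str:
    """Return the single home directory for a page."""
    # Idea vs. Project tiebreaker: an idea graduates once work starts.
    if subject == "idea" and being_built:
        return "projects/"
    # Anything unclassified lands in inbox/ so it can be flagged for schema review.
    return RESOLVER.get(subject, "inbox/")
```

In practice the agent applies the tree with judgment rather than a literal lookup, but writing it down this way makes the MECE property easy to check: every label maps to exactly one directory.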
## Getting started
1. Create the directory structure above (or let your agent create it)
2. Write a `RESOLVER.md` decision tree and a `README.md` resolver for each directory
3. Write a `schema.md` with your page conventions and templates
4. Add the brain rules to your agent's config (AGENTS.md or equivalent) as hard rules
5. Start with one meeting transcript or one person you want to track
6. Let the agent build the first few pages, review them, and iterate on the schema
7. Wire up your meeting tool to trigger ingestion
8. Wire up enrichment to fire on every new person/company signal
9. The brain compounds from there
The human's job: curate sources, direct analysis, ask good questions, and think about what it all means. The agent's job: everything else.
---
## docs/guides/live-sync.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/guides/live-sync.md
# Live Sync: Keep the Index Current
## Goal
Every markdown change in the brain repo is searchable within minutes, automatically, with no manual intervention.
## What the User Gets
Without this: you correct a hallucination in a brain page, but the vector DB
keeps serving the old text because nobody ran `gbrain sync`. Stale search
results erode trust. The brain becomes unreliable.
With this: edits show up in search within minutes. The vector DB stays current
with the brain repo automatically. You never have to remember to run sync.
## Implementation
### Prerequisite: Session Mode Pooler
Sync uses `engine.transaction()` on every import. If `DATABASE_URL` points to
Supabase's **Transaction mode** pooler, sync will throw `.begin() is not a
function` and **silently skip most pages**. This is the number one cause of
"sync ran but nothing happened."
Fix: use the **Session mode** pooler string (port 6543, Session mode) or the
direct connection (port 5432, IPv6-only). Verify by running `gbrain sync` and
checking that the page count in `gbrain stats` matches the syncable file count
in the repo.
### The Primitives
Always chain sync + embed:
```bash
gbrain sync --repo /path/to/brain && gbrain embed --stale
```
- `gbrain sync --repo <path>` -- one-shot incremental sync. Detects changes via
`git diff`, imports only what changed. For small changesets (<= 100 files),
embeddings are generated inline during import.
- `gbrain embed --stale` -- backfill embeddings for any chunks that don't have
them. Safety net for large syncs (>100 files) or prior `--no-embed` runs.
- `gbrain sync --watch --repo <path>` -- foreground polling loop, every 60s
(configurable with `--interval N`). Embeds inline for small changesets. Exits
after 5 consecutive failures, so run under a process manager or pair with a
cron fallback.
### Approach 1: Cron Job (recommended)
Run every 5-30 minutes. Works with any cron scheduler.
```bash
gbrain sync --repo /data/brain && gbrain embed --stale
```
**OpenClaw:**
```
Name: gbrain-auto-sync
Schedule: */15 * * * *
Prompt: "Run: gbrain sync --repo /data/brain && gbrain embed --stale
Log the result. If sync fails with .begin() is not a function,
the DATABASE_URL is using Transaction mode pooler."
```
**Hermes:**
```
/cron add "*/15 * * * *" "Run gbrain sync --repo /data/brain &&
gbrain embed --stale. Log the result." --name "gbrain-auto-sync"
```
### Approach 2: Long-Lived Watcher
For near-instant sync (60s polling). Run under a process manager that
auto-restarts on exit. Pair with a cron fallback since `--watch` exits
on repeated failures.
```bash
gbrain sync --watch --repo /data/brain
```
### Approach 3: Git Hook / Webhook
Triggers sync on push events for instant sync (<5s).
- **GitHub webhook:** Set up the webhook to call
`gbrain sync --repo /data/brain && gbrain embed --stale`.
Verify `X-Hub-Signature-256` against a shared secret.
- **Git post-receive hook:** If the brain repo is on the same machine.
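For the GitHub webhook path, the signature check can be sketched as follows. GitHub sends `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The function name `verify_github_signature` is hypothetical, and the HTTP server around it is left out.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not header_value or not header_value.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode("utf-8")
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

Only after this returns `True` should the handler run `gbrain sync --repo /data/brain && gbrain embed --stale`. Verify against the raw bytes of the body, not a re-serialized JSON object, or the digest will not match.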
### What Gets Synced
Sync only indexes "syncable" markdown files. These are excluded by design:
- Hidden paths (`.git/`, `.raw/`, etc.)
- The `ops/` directory
- Meta files: `README.md`, `index.md`, `schema.md`, `log.md`
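To count syncable files yourself (for comparison against `gbrain stats`), the exclusion rules above can be approximated like this. It is a sketch, not gbrain's actual filter: it assumes `ops/` is excluded only at the repo root and that meta files are excluded at any depth.

```python
from pathlib import PurePosixPath

META_FILES = {"README.md", "index.md", "schema.md", "log.md"}


def is_syncable(rel_path: str) -> bool:
    """Approximate the sync filter for a repo-relative path."""
    p = PurePosixPath(rel_path)
    if p.suffix != ".md":
        return False
    # Hidden paths: any component starting with a dot (.git/, .raw/, ...).
    if any(part.startswith(".") for part in p.parts):
        return False
    # The ops/ directory (assumed to mean the top-level one).
    if p.parts[0] == "ops":
        return False
    return p.name not in META_FILES


def count_syncable(rel_paths: list[str]) -> int:
    """Number of files that sync would be expected to index."""
    return sum(1 for path in rel_paths if is_syncable(path))
```

Feed it the output of `git ls-files` split into lines. If the count differs from the page count by a wide margin, check the pooler mode before suspecting the filter.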
### Sync is Idempotent
Concurrent runs are safe. Two syncs on the same commit no-op because content
hashes match. If both a cron and `--watch` fire simultaneously, no conflict.
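The idea behind the no-op can be illustrated with a toy in-memory index. This is not gbrain's import code: the hash function (SHA-256 over the file content) and the name `import_page` are assumptions chosen for the illustration.

```python
import hashlib


def import_page(index: dict[str, str], path: str, content: str) -> bool:
    """Import a page unless its content hash is already recorded. Returns True if written."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if index.get(path) == digest:
        return False  # same content as last time: nothing to do
    index[path] = digest
    return True
```

Running the same import twice changes nothing the second time, which is why overlapping cron and `--watch` runs are harmless.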
## Tricky Spots
1. **Always chain sync + embed.** Running `gbrain sync` without
`gbrain embed --stale` leaves new chunks without embeddings. They exist
in the database but are invisible to vector search. Always run both
commands together. The `&&` ensures embed only runs if sync succeeds.
2. **--watch polls, it doesn't stream.** The `--watch` flag polls every 60s
(configurable). It is not a filesystem watcher or git hook. It exits after
5 consecutive failures, so it needs a process manager (systemd, pm2) or a
cron fallback to stay alive. Don't assume it runs forever.
3. **Webhook needs the server running.** If you use a GitHub webhook for
instant sync, the receiving server must be running and reachable. If the
server is down when a push happens, that sync is missed. Pair webhooks
with a cron fallback that catches anything the webhook missed.
## How to Verify
1. **Edit a file and search for the change.** Edit a brain markdown file,
commit, and push. Wait for the next sync cycle (cron interval or `--watch`
poll). Run `gbrain search "<text from the edit>"`. The updated content
should appear in results. If it returns old content, sync failed.
2. **Compare page count to file count.** Run `gbrain stats` and count the
syncable markdown files in the brain repo. The page count in the database
should match. If they diverge, files are being silently skipped (likely
a Transaction mode pooler issue).
3. **Check embedded chunk count.** In `gbrain stats`, the embedded chunk
count should be close to the total chunk count. A large gap means
`gbrain embed --stale` isn't running after sync, leaving chunks invisible
to vector search.
---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
---
## docs/guides/cron-schedule.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/guides/cron-schedule.md
# Reference Cron Schedule
## Goal
A production brain runs 20+ recurring jobs that keep it alive, current, and
compounding. This guide shows the schedule, the patterns, and how to set it up.
## What the User Gets
Without this: the brain only updates when you manually ingest data. Pages go
stale, entities are thin, citations break, and the agent answers from old context.
With this: the brain maintains itself. Email, social, calendar, and meetings
flow in automatically. Thin pages get enriched overnight. Broken citations get
fixed. You wake up and the brain is smarter than when you went to sleep.
## The Schedule
| Frequency | Job | Brain Interaction | Recipe |
|-----------|-----|-------------------|--------|
| Every 30 min | Email monitoring | Search sender, update people pages | [email-to-brain](../../recipes/email-to-brain.md) |
| Every 30 min | X/Twitter collection | Create/update media pages, entity extraction | [x-to-brain](../../recipes/x-to-brain.md) |
| 3x/day (weekdays) | Meeting sync | Full ingestion + attendee propagation | [meeting-sync](../../recipes/meeting-sync.md) |
| Weekly | Calendar sync | Daily files + attendee enrichment | [calendar-to-brain](../../recipes/calendar-to-brain.md) |
| Daily AM | Morning briefing | Search calendar attendees, deal status, active threads | [briefing skill](../../skills/briefing/SKILL.md) |
| Weekly | Brain maintenance | `gbrain doctor`, embed stale, orphan detection | [maintain skill](../../skills/maintain/SKILL.md) |
| Nightly | Dream cycle | Entity sweep, enrich thin spots, fix citations | See below |
## Implementation: Setting Up Cron Jobs
```bash
# Email collector — every 30 minutes
*/30 * * * * cd /path/to/email-collector && node email-collector.mjs collect && node email-collector.mjs digest
# X/Twitter collector — every 30 minutes
*/30 * * * * cd /path/to/x-collector && node x-collector.mjs collect >> /tmp/x-collector.log 2>&1
# Meeting sync — 10 AM, 4 PM, 9 PM on weekdays
0 10,16,21 * * 1-5 cd /path/to/meeting-sync && node meeting-sync.mjs >> /tmp/meeting-sync.log 2>&1
# Calendar sync — Sundays at 10 AM
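# In a crontab, % must be escaped as \% or cron treats it as a newline.
# date -v-7d is BSD/macOS syntax; with GNU date use: date -d '7 days ago' +\%Y-\%m-\%d
0 10 * * 0 cd /path/to/calendar-sync && node calendar-sync.mjs --start $(date -v-7d +\%Y-\%m-\%d) --end $(date +\%Y-\%m-\%d)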
# Brain health — weekly Mondays at 6 AM
0 6 * * 1 gbrain doctor --json >> /tmp/gbrain-health.log 2>&1 && gbrain embed --stale
# Dream cycle — nightly at 2 AM
0 2 * * * /path/to/dream-cycle.sh
```
### Quiet Hours Gate (MANDATORY)
Every cron job that sends notifications MUST check quiet hours first.
See [Quiet Hours](quiet-hours.md) for the full pattern.
```bash
# In every cron script:
if ! bash scripts/quiet-hours-gate.sh; then
mkdir -p /tmp/cron-held
echo "$OUTPUT" > /tmp/cron-held/$(basename "$0" .sh).md
exit 0
fi
# Not quiet hours — send normally
```
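The core of a quiet-hours check is a time-window test that handles windows crossing midnight. The sketch below shows that logic in isolation. It is not the contents of `scripts/quiet-hours-gate.sh`, and the 23:00 to 08:00 window is an assumed default.

```python
from datetime import time


def in_quiet_hours(now: time, start: time = time(23, 0), end: time = time(8, 0)) -> bool:
    """Return True if `now` falls inside the quiet window [start, end)."""
    if start <= end:
        # Window within a single day, e.g. 01:00 to 06:00.
        return start <= now < end
    # Window wraps past midnight, e.g. 23:00 to 08:00.
    return now >= start or now < end
```

`now` should be the user's local time, not the server's, or the gate will hold messages at the wrong hours.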
### Travel-Aware Timezone Handling
The agent reads your calendar for flights, hotels, and out-of-office blocks to
infer your current location and timezone. All times are shown in YOUR local timezone.
```
// Example: user flew to Tokyo
// 2 PM Pacific (PDT) = 6 AM the next day in Tokyo = quiet hours
// Hold the notification, fold into morning briefing
get_user_timezone():
    recent_flight = gbrain search "flight" --type calendar --recent 7d
    if recent_flight:
        return infer_timezone(recent_flight.destination)
    return config.default_timezone   // fallback: US/Pacific
```
When you travel: cron jobs that would fire during your waking hours at home but
hit your sleeping hours at the destination get held and folded into the next
morning briefing. Zero config change needed.
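The conversion itself is a one-liner with the standard library. This sketch assumes the destination has already been mapped to an IANA zone name (the `infer_timezone` step), and `local_time_at` is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def local_time_at(when: datetime, destination_tz: str) -> datetime:
    """Convert a timezone-aware datetime to the traveler's current timezone."""
    return when.astimezone(ZoneInfo(destination_tz))


# A job scheduled for 2 PM Pacific on a summer day, seen from Tokyo.
home = datetime(2024, 7, 1, 14, 0, tzinfo=ZoneInfo("America/Los_Angeles"))
tokyo = local_time_at(home, "Asia/Tokyo")
print(tokyo.strftime("%Y-%m-%d %H:%M"))  # 2024-07-02 06:00
```

The result is what should be checked against the quiet-hours window. Using named zones rather than fixed offsets keeps daylight saving time correct: the same 2 PM job lands at 7 AM Tokyo time in January.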
## The Dream Cycle
The most important cron job. Runs while you sleep.
### What It Does
```
dream_cycle():
    // Phase 1: Entity Sweep
    conversations = get_todays_conversations()
    for message in conversations:
        entities = detect_entities(message)
        for entity in entities:
            page = gbrain search "{entity.name}"
            if not page:
                create_page(entity)       // new entity, create + enrich
            elif page.is_thin():
                enrich_page(entity)       // thin page, fill it out
            else:
                update_timeline(entity)   // existing page, add today's mentions
    // Phase 2: Fix Broken Citations
    pages = gbrain list --type person --limit 100
    for page in pages:
        for entry in page.timeline:
            if not entry.has_source_attribution():
                fix_citation(entry)       // add [Source: ...] where missing
            if entry.has_tweet_url() and not entry.url_is_valid():
                fix_url(entry)            // broken tweet links
    // Phase 3: Consolidate Memory
    patterns = detect_patterns_across_conversations()
    for pattern in patterns:
        promote_to_memory(pattern)        // ephemeral → durable knowledge
    // Phase 4: Sync
    gbrain sync --no-pull --no-embed
    gbrain embed --stale
```
### Setting Up the Dream Cycle
**OpenClaw:** Ships with DREAMS.md as a default skill. Three phases (light,
deep, REM) run automatically during quiet hours.
**Hermes Agent:**
```bash
/cron add "0 2 * * *" "Dream cycle: search today's sessions for
entities I mentioned. For each person, company, or idea: check
if a brain page exists (gbrain search), create or update it if
thin. Fix any broken citations. Then consolidate: read MEMORY.md,
promote important signals, remove stale entries."
--name "nightly-dream-cycle"
```
**Claude Code / Custom agents:** Create a script:
```bash
#!/bin/bash
# dream-cycle.sh
# Check quiet hours (should be quiet — that's when we run)
echo "Dream cycle starting at $(date)"
# Phase 1: Entity sweep (spawn sub-agent)
# Read today's conversation logs, extract entities, update brain
# Phase 2: Citation hygiene
gbrain doctor --json | jq '.checks[] | select(.status=="warn")'
# Phase 3: Embed any stale content
gbrain embed --stale
echo "Dream cycle complete at $(date)"
```
## Tricky Spots
1. **The dream cycle is NOT optional.** Without it, signal leaks out of every
conversation. With it, nothing is lost. This is the difference between an
agent that forgets and one that remembers.
2. **Quiet hours gate on EVERY notification job.** If you skip it, the user
gets pinged at 3 AM. One 3 AM ping and they'll disable the whole system.
3. **Don't over-cron.** 20+ jobs sounds like a lot. Start with: email (30 min),
dream cycle (nightly), brain health (weekly). Add more as you add
integration recipes.
4. **Timezone changes are automatic.** Don't make the user reconfigure cron
when they travel. Read the calendar, infer the timezone, adjust delivery.
5. **Held messages MUST be picked up.** If quiet hours hold a notification,
the morning briefing MUST include it. Otherwise information is lost.
## How to Verify
1. **Quiet hours:** Set quiet hours to the current hour. Run a notification cron.
Verify output went to `/tmp/cron-held/`, not to messaging.
2. **Dream cycle:** Run the dream cycle manually. Check that thin entity pages
got enriched and broken citations were fixed.
3. **Email collector cron:** Wait 30 minutes. Check `data/digests/` for a new digest.
4. **Morning briefing:** Check that held messages appear in the briefing.
5. **Health check:** Run `gbrain doctor --json`. All checks should pass.
---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md). See also: [Quiet Hours](quiet-hours.md), [Operational Disciplines](operational-disciplines.md)*
---
## docs/guides/minions-deployment.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/guides/minions-deployment.md
# Minions Worker Deployment Guide
Deploy `gbrain jobs work` so it stays running across crashes, reboots, and
Postgres connection blips. Written for agents to execute line-by-line.
## The problem
The persistent worker can die silently from:
- Database connection drops (Supabase/Postgres maintenance or network blips).
- Lock-renewal failures → the stall detector eventually dead-letters jobs.
- Bun process crashes with no automatic restart.
- Internal event-loop death (PID alive, worker loop stopped).
When the worker dies, submitted jobs sit in `waiting` forever. Nothing in
gbrain core auto-restarts the worker — that's what this guide wires up.
## Variables used in this guide
Substitute these once before copy-pasting any snippet.
| Variable | Meaning | Typical value |
|---|---|---|
| `$GBRAIN_BIN` | Absolute path to the `gbrain` binary | `$(command -v gbrain)` — often `/usr/local/bin/gbrain` or `~/.bun/bin/gbrain` |
| `$GBRAIN_WORKER_USER` | OS user that owns the worker process | the same user that ran `gbrain init`; never `root` |
| `$GBRAIN_WORKER_PID_FILE` | Worker PID + restart-epoch file | `/tmp/gbrain-worker.pid` (or `/var/run/gbrain/worker.pid` for systemd) |
| `$GBRAIN_WORKER_LOG_FILE` | Worker log sink (stdout + stderr merged) | `/tmp/gbrain-worker.log` (or `/var/log/gbrain/worker.log`) |
| `$GBRAIN_WORKSPACE` | `cwd` for shell jobs submitted by this deployment | absolute path, e.g. `/srv/my-brain` |
| `$GBRAIN_ENV_FILE` | Secrets file sourced by crontab / systemd | `/etc/gbrain.env` (mode 600) |
## Preconditions
Run these before Step 1 of any option. Fail fast if something is wrong.
```bash
# 1. gbrain is on PATH and resolves to an absolute location.
command -v gbrain || { echo "gbrain not on PATH. Install, then retry."; exit 1; }
# 2. DATABASE_URL points at reachable Postgres (or PGLite path exists).
gbrain doctor --fast --json | jq '.checks[] | select(.name=="db_connectivity")'
# 3. Schema is up to date. If version=0 or status=="fail", fix it first:
# gbrain apply-migrations --yes
gbrain doctor --fast --json | jq '.checks[] | select(.name=="schema_version")'
# 4. You have write access to at least one crontab mechanism.
crontab -l >/dev/null 2>&1 && echo "user crontab OK"
[ -w /etc/crontab ] && echo "/etc/crontab OK"
# 5. If you plan to submit `shell` jobs, the WORKER process needs
# GBRAIN_ALLOW_SHELL_JOBS=1 (submitters do not). The handler is gated
# in registerBuiltinHandlers(); without the flag the worker startup
# line reads "shell handler disabled (...)".
```
## Which option?
- Your workload runs LLM subagents (`gbrain agent run`) or jobs that take
longer than 30 s → **Option 1** (watchdog cron + persistent worker).
- Your workload is short deterministic scripts on a fixed schedule (every
3 h, daily, weekly) → **Option 2** (inline `--follow`).
- You don't have shell access to a long-running box (Fly/Render/Railway),
or the host runs systemd → **Option 3** (service manager, which replaces cron).
## Option 1: watchdog cron + persistent worker
A 5-minute cron checks whether the worker process is alive **and** whether
it has logged an internal shutdown since its last start. Restarts if either
condition fails.
### 1a. Install the env file (secrets stay out of crontab)
Never paste `DATABASE_URL` or API keys into crontab. `/etc/crontab` is
mode 644 (world-readable); user crontabs under `/var/spool/cron/` are
readable by `root`. Use the shipped env-file template:
```bash
sudo install -m 600 -o $GBRAIN_WORKER_USER -g $GBRAIN_WORKER_USER \
docs/guides/minions-deployment-snippets/gbrain.env.example /etc/gbrain.env
sudoedit /etc/gbrain.env
```
Fill in the connection string and `GBRAIN_ALLOW_SHELL_JOBS=1` (if
applicable). See
[`gbrain.env.example`](./minions-deployment-snippets/gbrain.env.example)
for the full list.
### 1b. Install the watchdog script
The [`minion-watchdog.sh`](./minions-deployment-snippets/minion-watchdog.sh)
ships in-repo and writes a two-line PID file (PID on line 1, restart epoch
on line 2). The restart-epoch marker is how the watchdog distinguishes
stale shutdown lines in the log from current ones — without it, every tick
after the first restart would match an old `worker shutting down` line and
loop forever.
Requires GNU coreutils (Linux default). On macOS/BSD install via
`brew install coreutils` and alias `date` to `gdate` in the cron env if you
want to test the watchdog locally; production Linux boxes work as-is.
```bash
sudo install -m 755 -o $GBRAIN_WORKER_USER -g $GBRAIN_WORKER_USER \
docs/guides/minions-deployment-snippets/minion-watchdog.sh \
/usr/local/bin/minion-watchdog.sh
```
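The restart-epoch idea can be shown in isolation. The shipped `minion-watchdog.sh` is the authoritative implementation; this sketch only illustrates the comparison. The log line format (a leading ISO-8601 UTC timestamp) is an assumption, and both function names are hypothetical. The `worker shutting down` marker is the one described above.

```python
from datetime import datetime, timezone


def parse_pid_file(text: str) -> tuple[int, int]:
    """Read the two-line PID file: PID on line 1, restart epoch on line 2."""
    fields = text.split()
    return int(fields[0]), int(fields[1])


def shutdown_since_restart(log_lines: list[str], restart_epoch: int) -> bool:
    """True if a shutdown marker was logged after the last restart."""
    for line in log_lines:
        if "worker shutting down" not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            # Assumed format: 2024-01-01T00:10:00Z <message>
            logged = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # line without a parseable timestamp: ignore it
        if logged.timestamp() > restart_epoch:
            return True
    return False
```

Shutdown lines older than the epoch are ignored, which is what stops an old marker in an unrotated log from triggering a restart on every tick.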
### 1c. Wire into cron
Pick the form that matches the crontab you're editing.
**If you ran `crontab -e`** (user crontab — 5-field, no user column):
```
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
BASH_ENV=/etc/gbrain.env
*/5 * * * * /usr/local/bin/minion-watchdog.sh
```
**If you edited `/etc/crontab` directly** (system crontab — 6-field, with
user column):
```
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
BASH_ENV=/etc/gbrain.env
*/5 * * * * gbrain /usr/local/bin/minion-watchdog.sh
```
In both forms, `BASH_ENV=/etc/gbrain.env` tells non-interactive bash to
source the env file before running the watchdog — that's how the
connection string and `GBRAIN_ALLOW_SHELL_JOBS` reach the worker without
landing in the world-readable crontab itself.
### 1d. Log rotation
The watchdog appends to the worker log across restarts. If you expect the
file to grow unbounded, rotate it externally with `logrotate`:
```
# /etc/logrotate.d/gbrain-worker
/tmp/gbrain-worker.log {
daily
rotate 7
missingok
notifempty
copytruncate
}
```
`copytruncate` is important — the watchdog's restart-epoch check survives
it (the epoch is compared against in-log timestamps, not file inode).
## Option 2: inline `--follow` (no persistent worker)
Each cron run brings its own temporary worker. `--follow` starts one on
the queue and blocks until the just-submitted job reaches a terminal state
(`completed` / `failed` / `dead` / `cancelled`). 2-3 s startup overhead
per job; negligible vs job duration for scheduled work.
Example: nightly brain enrichment as a shell job.
```bash
GBRAIN_ALLOW_SHELL_JOBS=1 gbrain jobs submit shell \
--queue nightly-enrich \
--params "{\"cmd\":\"$GBRAIN_BIN embed --stale\",\"cwd\":\"$GBRAIN_WORKSPACE\"}" \
--follow \
--timeout-ms 600000
```
Replace `gbrain embed --stale` with whichever gbrain subcommand you're
scheduling (`sync`, `extract`, `orphans`, `doctor`, `check-backlinks`,
`lint`, `autopilot`). If you're shelling out to a non-gbrain binary,
keep its absolute path in the `cmd`.
**Shared-queue gotcha.** If other jobs are already waiting on the same
queue with higher priority or earlier `created_at`, the temporary worker
processes those first before reaching yours. `--follow` still exits only
when YOUR job finishes. For strict single-job semantics on shared queues,
use a dedicated queue name like `nightly-enrich` above.
## Option 3: service manager (systemd / Fly / Render / Railway)
Replaces the watchdog entirely. No cron, no PID file, no restart-loop.
The service manager owns liveness.
### systemd (Linux hosts with shell access)
```bash
# Create the worker user if it doesn't exist.
sudo useradd --system --home "$GBRAIN_WORKSPACE" --shell /usr/sbin/nologin gbrain \
2>/dev/null || true
sudo mkdir -p "$GBRAIN_WORKSPACE" && sudo chown gbrain:gbrain "$GBRAIN_WORKSPACE"
# Install the unit file, substituting /srv/gbrain → your workspace path.
sudo install -m 644 docs/guides/minions-deployment-snippets/systemd.service \
/etc/systemd/system/gbrain-worker.service
sudo sed -i "s|/srv/gbrain|$GBRAIN_WORKSPACE|g" \
/etc/systemd/system/gbrain-worker.service
# See 1a above for /etc/gbrain.env install.
sudo systemctl daemon-reload
sudo systemctl enable --now gbrain-worker
sudo systemctl status gbrain-worker
journalctl -u gbrain-worker -n 50
```
`Restart=always` + `RestartSec=10s` give you crash-loop recovery. The unit
runs as an unprivileged `gbrain` user with `PrivateTmp`, `ProtectSystem=strict`,
and `ReadWritePaths=$GBRAIN_WORKSPACE`. `LimitNOFILE=65535` in the shipped
unit covers Bun + Postgres pool + concurrent LLM subagent calls without
hitting the default 1024 cap.
### Fly.io
Merge the `[processes]` block from
[`fly.toml.partial`](./minions-deployment-snippets/fly.toml.partial) into
your existing `fly.toml`. Set secrets with `fly secrets set` —
Fly auto-restarts the process on crash.
### Render / Railway / Heroku
Drop [`Procfile`](./minions-deployment-snippets/Procfile) at the repo root.
Set the connection string and `GBRAIN_ALLOW_SHELL_JOBS=1` via the
platform's env UI or CLI.
## Upgrading an existing deployment
If you deployed on v0.13.x or earlier, walk this checklist:
1. **Stop the worker before upgrading.**
`kill $(head -n1 /tmp/gbrain-worker.pid)` and wait for the process to
exit. Skipping this risks an in-flight job running against a half-migrated schema.
2. **Run `gbrain upgrade`**. Then `gbrain apply-migrations --yes` if
`gbrain doctor` reports any migration as `partial` or `pending`.
3. **If you run shell jobs:** from v0.14 onward, the worker requires
`GBRAIN_ALLOW_SHELL_JOBS=1` to register the `shell` handler. Add it to
`/etc/gbrain.env`. Submitters don't need the flag; only the worker does.
4. **If you tuned your watchdog for `max_stalled=1`:** v0.14.3 migration
v15 raised the schema default to 5 and backfilled existing non-terminal
rows. A watchdog tuned around 1-strike dead-lettering will now
over-restart because it takes 5 misses to dead-letter. Switch to the
shipped watchdog (which keys on log markers, not job state).
5. **If your v0.16.1 watchdog is still running:** it has a restart-loop
bug (old shutdown lines in the unrotated log re-match every 5 min
forever). Install the current `minion-watchdog.sh` from this guide's
snippets — it writes a restart epoch into the PID file and only
considers log lines newer than that epoch.
6. **Verify.** `gbrain doctor` should report zero `pending` or `partial`
migrations. `gbrain jobs stats` should show no unexplained growth in
`dead` between pre- and post-upgrade.
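Step 1 of the checklist can be sketched as a small helper that sends `SIGTERM` and then waits for the worker to exit before you upgrade. This is a minimal sketch: the function name `stop_worker` and the 30-second default timeout are assumptions for illustration, and it relies only on the PID file layout this guide already uses (PID on the first line).

```bash
#!/usr/bin/env bash
# Sketch: stop the worker named in a PID file and wait until it is gone.
# stop_worker and its default timeout are illustrative, not part of gbrain.
stop_worker() {
  local pid_file="${1:-/tmp/gbrain-worker.pid}"
  local timeout="${2:-30}"
  local pid waited=0
  pid="$(head -n1 "$pid_file" 2>/dev/null)" || return 0   # no PID file
  [ -n "$pid" ] || return 0                               # empty PID file
  kill "$pid" 2>/dev/null || return 0                     # already gone
  # Poll until the process disappears or the timeout is reached.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "worker $pid still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Run `stop_worker && gbrain upgrade` so the upgrade only starts once the worker has actually exited.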
## Known issues
### Supabase connection drops
The worker uses a single Postgres connection. If Supabase drops it
(maintenance, connection limits, network blip), lock renewal fails
silently. The stall detector then dead-letters the job after
`max_stalled` misses.
**Current defaults that make this worse:**
- `lockDuration: 30000` (30 s) — too short for long jobs during connection blips.
- `max_stalled: 5` (schema column default on master — see `src/schema.sql`
and `src/core/pglite-schema.ts`). Five missed heartbeats before dead-letter.
- `stalledInterval: 30000` (30 s) — checks too aggressively.
**Tune per-job today.** `gbrain jobs submit` accepts `--max-stalled N`,
`--backoff-type fixed|exponential`, `--backoff-delay <ms>`,
`--backoff-jitter 0..1`, and `--timeout-ms N` as first-class flags
(since v0.13.1). These write onto the job row at submit time — which is
what `handleStalled()` reads — so per-job tuning is the real knob today.
Worker-level `--lock-duration` / `--stall-interval` are on the roadmap;
until they land, rely on per-job `--max-stalled` plus the watchdog (or
systemd) for worker health.
### DO NOT pass `maxStalledCount` to `MinionWorker`
It's a no-op. The stall detector reads the row's `max_stalled` column
(set at submit time), not the worker option in `src/core/minions/worker.ts:74`.
Use `gbrain jobs submit --max-stalled N` per-job instead.
### Zombie shell children
When the Bun worker crashes hard, child processes from shell jobs can
become zombies. The watchdog's 10 s `SIGTERM → SIGKILL` window covers the
shell handler's 5 s child-kill grace (`KILL_GRACE_MS`). For long-running
shell jobs, bump the watchdog's `sleep 10` to `sleep 30` so the worker
has time to flush in-flight jobs before the kill.
## Smoke test
```bash
# Worker alive?
kill -0 $(head -n1 /tmp/gbrain-worker.pid) 2>/dev/null && echo ALIVE || echo DEAD
# Aggregate queue health.
gbrain jobs stats
# Jobs currently stalled (still `active` with expired lock_until, pre-requeue).
gbrain jobs list --status active --limit 10
# Dead-lettered jobs.
gbrain jobs list --status dead --limit 10
# Shell handler registered? (stderr banner merged into log via 2>&1.)
grep "shell handler enabled" /tmp/gbrain-worker.log
```
## Uninstall
- **Option 1 (watchdog cron):** `crontab -e`, delete the watchdog line.
`kill $(head -n1 /tmp/gbrain-worker.pid) && rm /tmp/gbrain-worker.pid`.
Optionally `sudo rm /etc/gbrain.env /usr/local/bin/minion-watchdog.sh`.
- **Option 2 (inline `--follow`):** remove the cron entry. Nothing else to
clean up — temporary workers exit with their jobs.
- **Option 3 (systemd):** `sudo systemctl disable --now gbrain-worker`,
then `sudo rm /etc/systemd/system/gbrain-worker.service /etc/gbrain.env`,
then `sudo systemctl daemon-reload`.
- **Option 3 (Fly/Render/Railway):** delete the `worker` process from
`fly.toml` / `Procfile` and redeploy. Secrets set via `fly secrets`
persist until `fly secrets unset`.
---
## docs/guides/quiet-hours.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/guides/quiet-hours.md
# Quiet Hours and Timezone-Aware Delivery
## Goal
Hold all notifications during sleep hours, merge held messages into the morning briefing, and adjust automatically when the user travels.
## What the User Gets
Without this: 3 AM pings from cron jobs. One bad notification and the user
disables the entire system.
With this: the brain works overnight (dream cycle, collectors, enrichment)
but notifications are held until morning. Travel to Tokyo? The system adjusts
automatically from your calendar, no config change needed.
## Implementation
### Quiet Hours Gate
Every cron job that sends notifications must check quiet hours FIRST.
```
QUIET_START = 23 // 11 PM local time
QUIET_END = 8 // 8 AM local time
is_quiet(local_hour):
    return local_hour >= QUIET_START OR local_hour < QUIET_END
```
**Before sending any notification:**
1. Determine user's current timezone (from config or heartbeat state)
2. Convert current UTC time to local time
3. If quiet hours: hold the message, don't send
### Held Messages
During quiet hours, output goes to a held directory instead of being sent:
```
if is_quiet(local_hour):
    mkdir -p /tmp/cron-held/
    write("/tmp/cron-held/{job-name}.md", output)
    exit // don't send
else:
    send(output)
```
The morning briefing picks up held messages:
```
morning_briefing():
    held_files = list("/tmp/cron-held/*.md")
    if held_files:
        briefing += "## Overnight Updates\n\n"
        for file in held_files:
            briefing += read(file)
            delete(file)
```
This way nothing is lost. Overnight cron results get folded into the
first thing the user sees in the morning.
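The pickup step can be sketched in shell as follows. This is a hedged sketch, not the briefing skill's actual code: `collect_held` and its arguments are hypothetical names, and the briefing is assumed to be a plain file that the function appends to.

```bash
# Sketch: append held messages to a briefing file, then clear them.
# collect_held and its arguments are illustrative names.
collect_held() {
  local held_dir="${1:-/tmp/cron-held}" briefing="$2"
  local f found=0
  for f in "$held_dir"/*.md; do
    [ -e "$f" ] || continue                 # glob matched nothing
    if [ "$found" -eq 0 ]; then
      # Write the section heading once, before the first held message.
      printf '## Overnight Updates\n\n' >> "$briefing"
      found=1
    fi
    # Delete a held file only after its content was appended successfully.
    cat "$f" >> "$briefing" && rm -- "$f"
  done
  return 0
}
```

Deleting each file only after a successful append means a failed write leaves the held message in place for the next run.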
### Timezone Awareness
The agent should know what timezone the user is in. Store it in
the agent's operational state:
```json
{
  "currentLocation": {
    "timezone": "US/Pacific",
    "city": "San Francisco"
  }
}
```
**Update the timezone when:**
- Calendar shows the user flying somewhere (check for airline/hotel events)
- User mentions being in a different city
- User's active hours shift (they're responding at 3 AM PT = they're probably traveling)
**All times shown to the user should be in their LOCAL timezone.** Never
show UTC or a timezone the user isn't in.
### Shell Implementation
```bash
#!/bin/bash
# quiet-hours-gate.sh — run before any notification
TIMEZONE="${USER_TIMEZONE:-US/Pacific}"
LOCAL_HOUR=$(TZ="$TIMEZONE" date +%H)
if [ "$LOCAL_HOUR" -ge 23 ] || [ "$LOCAL_HOUR" -lt 8 ]; then
  echo "QUIET_HOURS=true"
  exit 1 # don't send
fi
echo "QUIET_HOURS=false"
exit 0 # ok to send
```
**In cron job scripts:**
```bash
# Check quiet hours first
if ! bash scripts/quiet-hours-gate.sh; then
  mkdir -p /tmp/cron-held
  # Quote the target path so job names with spaces don't split the redirect.
  echo "$OUTPUT" > "/tmp/cron-held/$(basename "$0" .sh).md"
  exit 0
fi
# Not quiet hours — send normally
send_notification "$OUTPUT"
```
### Configurable Hours
Some users want different quiet hours. Store the config:
```json
{
  "quiet_hours": {
    "start": 23,
    "end": 8,
    "enabled": true
  }
}
```
Set `enabled: false` to disable quiet hours entirely (e.g., for 24/7 monitoring).
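Once the hours are configurable, the gate has to handle two window shapes. The `>= start OR < end` comparison is only correct when the window wraps past midnight (23 to 8); a same-day window such as 13 to 15 needs `AND` instead. A minimal sketch, assuming the three config values have already been read into shell arguments (the function signature is illustrative):

```bash
# Sketch: is_quiet HOUR START END [ENABLED]
# Returns 0 (quiet) or 1 (not quiet). Handles windows that wrap past midnight
# (23 -> 8) as well as same-day windows (13 -> 15).
is_quiet() {
  # 10# forces base 10 so zero-padded hours like "08" are not read as octal.
  local hour=$((10#$1)) start=$((10#$2)) end=$((10#$3)) enabled="${4:-true}"
  [ "$enabled" = "true" ] || return 1      # quiet hours disabled
  if [ "$start" -le "$end" ]; then
    # Same-day window: quiet when start <= hour < end.
    [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
  else
    # Wraps midnight: quiet when hour >= start OR hour < end.
    [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
  fi
}
```

For example, `is_quiet "$(TZ="$TIMEZONE" date +%H)" 23 8` reproduces the default gate. With `start` equal to `end`, this sketch treats the window as empty, so nothing is held.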
## Tricky Spots
1. **Gate on EVERY job.** The quiet hours check must run before every single
cron job that produces notifications. If even one job skips the gate, the
user gets a 3 AM ping and loses trust in the entire system. No exceptions.
2. **Held messages MUST be picked up.** If the morning briefing doesn't read
`/tmp/cron-held/`, overnight results vanish silently. Verify the briefing
skill reads and clears the held directory. Orphaned held files mean the
pickup integration is broken.
3. **Timezone auto-detection is fragile.** Calendar-based timezone detection
relies on the user having airline/hotel events with location data. If the
user books travel without calendar entries, the system won't detect the
move. Fall back to activity-hour analysis (responding at 3 AM PT = probably
not in PT anymore) and ask the user if uncertain.
## How to Verify
1. **Set quiet hours to the current hour.** Temporarily set `QUIET_START` to
one hour before now and `QUIET_END` to one hour after. Trigger a cron job.
Verify the output goes to `/tmp/cron-held/` instead of being sent.
2. **Check held message pickup.** After step 1, run or simulate the morning
briefing. Verify the held message appears in the "Overnight Updates"
section and the file is deleted from `/tmp/cron-held/`.
3. **Verify timezone adjustment.** Change the timezone config to a zone where
it's currently quiet hours. Trigger a notification. Verify it's held. Change
back to your real timezone during active hours. Trigger again. Verify it sends.
---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
---
## docs/mcp/DEPLOY.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/mcp/DEPLOY.md
# Deploy GBrain Remote MCP Server
Access your brain from any device, any AI client. GBrain's MCP server runs locally
via `gbrain serve` (stdio). For remote access, wrap it in an HTTP server behind a
public tunnel.
## Two Paths
### Local (zero setup)
```bash
gbrain serve
```
Works with Claude Code, Cursor, Windsurf, and any MCP client that supports stdio.
No server, no tunnel, no token needed.
### Remote (any device, any AI client)
```
Your AI client (Claude Desktop, Perplexity, etc.)
  → ngrok tunnel (https://YOUR-DOMAIN.ngrok.app)
    → Your HTTP server (wraps gbrain serve)
      → Supabase Postgres (via pooler connection string)
```
This requires:
1. A machine running `gbrain serve` behind an HTTP wrapper
2. A public tunnel (ngrok, Tailscale, or cloud host)
3. Bearer token auth for security
## Remote Setup
### 1. Set up the tunnel
See the [ngrok-tunnel recipe](../../recipes/ngrok-tunnel.md) for full setup.
Quick version:
```bash
brew install ngrok
ngrok config add-authtoken YOUR_TOKEN
ngrok http 8787 --url your-brain.ngrok.app # Hobby tier for fixed domain
```
### 2. Create access tokens
```bash
# Create a token for each client
bun run src/commands/auth.ts create "claude-desktop"
# List all tokens
bun run src/commands/auth.ts list
# Revoke a token
bun run src/commands/auth.ts revoke "claude-desktop"
```
Tokens are per-client. Create one for each device/app. Revoke individually
if compromised. Tokens are stored SHA-256 hashed in your database.
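Because only a hash is stored, a lost token cannot be read back from the database; you revoke it and create a new one. The sketch below shows what a SHA-256 digest of a token looks like. It assumes a hex-encoded, unsalted digest of the raw token string, which may differ from how gbrain actually encodes it, and `hash_token` is a hypothetical helper.

```bash
# Sketch: hex SHA-256 digest of a token string (no trailing newline is hashed).
# hash_token is a hypothetical helper; gbrain's exact storage format may differ.
hash_token() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1     # GNU coreutils
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1 # macOS fallback
  fi
}
```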
### 3. Connect your AI client
- **Claude Code:** [setup guide](CLAUDE_CODE.md)
- **Claude Desktop:** [setup guide](CLAUDE_DESKTOP.md) (must use GUI, not JSON config)
- **Claude Cowork:** [setup guide](CLAUDE_COWORK.md)
- **Perplexity:** [setup guide](PERPLEXITY.md)
### 4. Verify
```bash
bun run src/commands/auth.ts test \
https://YOUR-DOMAIN.ngrok.app/mcp \
--token YOUR_TOKEN
```
## Operations
All 30 GBrain operations are available remotely, including `sync_brain` and
`file_upload` (no timeout limits with a self-hosted server).
**Security note on `file_upload`:** remote MCP callers are confined to the working
directory where `gbrain serve` was launched. Symlinks, `..` traversal, and absolute
paths outside cwd are rejected. Page slugs and filenames are allowlist-validated
(alphanumeric + hyphens; no control chars, RTL overrides, or backslashes). Local
CLI callers (`gbrain file upload ...`) keep unrestricted filesystem access since
the user owns the machine.
## Deployment Options
See [ALTERNATIVES.md](ALTERNATIVES.md) for a comparison of ngrok, Tailscale
Funnel, and cloud hosts (Fly.io, Railway).
## Troubleshooting
**"missing_auth" error**
Include the Authorization header: `Authorization: Bearer YOUR_TOKEN`
**"invalid_token" error**
Run `bun run src/commands/auth.ts list` to see active tokens.
**"service_unavailable" error**
Database connection failed. Check your Supabase dashboard for outages.
**Claude Desktop doesn't connect**
Remote servers must be added via Settings > Integrations, NOT
`claude_desktop_config.json`. See [CLAUDE_DESKTOP.md](CLAUDE_DESKTOP.md).
## Expected Latencies
| Operation | Typical Latency | Notes |
|-----------|----------------|-------|
| get_page | < 100ms | Single DB query |
| list_pages | < 200ms | DB query with filters |
| search (keyword) | 100-300ms | Full-text search |
| query (hybrid) | 1-3s | Embedding + vector + keyword + RRF |
| put_page | 100-500ms | Write + trigger search_vector update |
| get_stats | < 100ms | Aggregate query |
**Note:** `gbrain serve --http` (built-in HTTP transport) is planned but not yet
implemented. Currently, remote MCP requires a custom HTTP wrapper. See the
production deployment pattern in the [voice recipe](../../recipes/twilio-voice-brain.md)
for a reference implementation.
---
# Debugging
## docs/GBRAIN_VERIFY.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/GBRAIN_VERIFY.md
# GBrain Installation Verification Runbook
Run these checks after install to confirm every part of GBrain is working.
Each check includes the command, expected output, and what to do if it fails.
The most important check is #4 (live sync). "Sync ran" is not the same as
"sync worked." A sync that silently skips pages because of a pooler bug is
worse than no sync at all, because you think it's working.
---
## 1. Schema Verification
**Command:**
```bash
gbrain doctor --json
```
**Expected:** All checks return `"ok"`:
- `connection`: connected, N pages
- `pgvector`: extension installed
- `rls`: enabled on all tables
- `schema_version`: current
- `embeddings`: coverage percentage
**If it fails:** The doctor output includes specific fix instructions for each
check. See `skills/setup/SKILL.md` Error Recovery table.
---
## 2. Skillpack Loaded
**Check:** Ask the agent: "What is the brain-agent loop?"
**Expected:** The agent references GBRAIN_SKILLPACK.md Section 2 and describes
the read-write cycle: detect entities, read brain, respond with context, write
brain, sync.
**If it fails:** The agent hasn't loaded the skillpack. Run step 6 from the
install paste (read `docs/GBRAIN_SKILLPACK.md`).
---
## 3. Auto-Update Configured
**Command:**
```bash
gbrain check-update --json
```
**Expected:** Returns JSON with `current_version`, `latest_version`,
`update_available` (boolean). The cron `gbrain-update-check` is registered.
**If it fails:** Run step 7 from the install paste. See GBRAIN_SKILLPACK.md
Section 17.
---
## 4. Live Sync Actually Works
This is the most important check. Three parts.
### 4a. Coverage Check
Compare page count in the DB against syncable file count in the repo:
```bash
gbrain stats
```
Then count syncable files:
```bash
find /data/brain -name '*.md' \
-not -path '*/.*' \
-not -path '*/.raw/*' \
-not -path '*/ops/*' \
-not -name 'README.md' \
-not -name 'index.md' \
-not -name 'schema.md' \
-not -name 'log.md' \
| wc -l
```
**Expected:** Page count in `gbrain stats` should be close to the file count.
Some difference is normal (files added since last sync), but if page count is
less than half the file count, sync is silently skipping pages.
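The comparison can be scripted. The sketch below reuses the `find` filters from above and applies the "less than half" rule. Both function names are illustrative, and the page count is passed in by hand because the output format of `gbrain stats` is not assumed here.

```bash
# Sketch: count syncable markdown files using the same filters as above.
count_syncable() {
  find "$1" -name '*.md' \
    -not -path '*/.*' \
    -not -path '*/.raw/*' \
    -not -path '*/ops/*' \
    -not -name 'README.md' \
    -not -name 'index.md' \
    -not -name 'schema.md' \
    -not -name 'log.md' \
    | wc -l | tr -d ' '
}

# coverage_ok PAGES FILES: succeeds unless PAGES is under half of FILES.
coverage_ok() {
  [ $(( $1 * 2 )) -ge "$2" ]
}
```

For example, `coverage_ok 1200 "$(count_syncable /data/brain)" || echo "sync is skipping pages"`, where `1200` stands in for the page count you read from `gbrain stats`.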
**If page count is way too low:** The #1 cause is the connection pooler bug.
Check your `DATABASE_URL`:
- If it contains `pooler.supabase.com:6543`, verify it's using **Session mode**,
not Transaction mode.
- Transaction mode breaks `engine.transaction()` and causes `.begin() is not a
function` errors.
- Fix: switch to Session mode pooler string, then run `gbrain sync --full`
to reimport everything.
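A quick way to see whether your connection string points at the pooler endpoint named above is a pattern match on the URL. This sketch only inspects the string; it cannot tell which pool mode the endpoint is configured for, so treat a match as a prompt to check the mode in your Supabase settings. The function name is an assumption.

```bash
# Sketch: succeed if the connection string targets pooler.supabase.com:6543.
# This checks the URL text only, not the pool mode the endpoint is running in.
uses_supabase_pooler_6543() {
  case "$1" in
    *pooler.supabase.com:6543*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `uses_supabase_pooler_6543 "$DATABASE_URL" && echo "check pool mode before syncing"`.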
### 4b. Embed Check
```bash
gbrain stats
```
**Expected:** Embedded chunk count should be close to total chunk count.
**If embedded is much lower than total:**
```bash
gbrain embed --stale
```
If `OPENAI_API_KEY` is not set, embeddings can't be generated. Keyword search
still works without embeddings, but hybrid/semantic search won't.
### 4c. End-to-End Test
This is the real test. Edit a brain page, push, wait, search.
1. Edit a page in the brain repo (e.g., correct a fact on a person's page):
```bash
# Example: fix a line in Gustaf's page
cd /data/brain
# Make a small edit to any .md file
git add -A && git commit -m "test: verify live sync" && git push
```
2. Wait for the next sync cycle (cron interval or `--watch` poll).
3. Search for the corrected text:
```bash
gbrain search "<text from the correction>"
```
**Expected:** The search returns the **corrected** text, not the old version.
**If it returns old text:** Sync failed silently. Check:
- Is the sync cron registered and running?
- Is `gbrain sync --watch` still alive (if using watch mode)?
- Run `gbrain config get sync.last_run` to see when sync last ran.
- Run `gbrain sync --repo /data/brain` manually and check for errors.
- If you see `.begin() is not a function`, fix the pooler (see 4a above).
---
## 5. Embedding Coverage
**Command:**
```bash
gbrain stats
```
**Expected:** Embedded chunk count matches (or is close to) total chunk count.
**If zero or very low:** `OPENAI_API_KEY` may be missing or invalid. Check:
```bash
echo "$OPENAI_API_KEY" | head -c 10
```
If blank, set the key. Then:
```bash
gbrain embed --stale
```
---
## 6. Brain-First Lookup Protocol
**Check:** Ask the agent about a person or concept that exists in the brain.
**Expected:** The agent uses `gbrain search` or `gbrain query` FIRST, not grep
or external APIs. The response includes brain-sourced context with source
attribution.
**If it fails:** The brain-first lookup protocol isn't injected into the agent's
system context. See `skills/setup/SKILL.md` Phase D.
---
## 7. Knowledge Graph Wired
The v0.12.0 graph layer needs to be populated for existing brains. New writes are
auto-linked, but historical pages need a one-time backfill.
**Command:**
```bash
gbrain stats | grep -E 'links|timeline'
```
**Expected:** Both `links` and `timeline_entries` are non-zero (assuming the brain
has content with entity references and dated markdown).
**If it's zero on a brain with imported content:** Run the backfill.
```bash
gbrain extract links --source db --dry-run | head -5 # preview
gbrain extract links --source db # commit
gbrain extract timeline --source db
gbrain stats # confirm > 0
```
**Bonus check** — graph traversal works:
```bash
# Pick any well-connected slug from your brain
gbrain graph-query people/<some-person-slug> --depth 2
```
**Expected:** Indented tree of typed edges (`--attended-->`, `--works_at-->`, etc.).
If the slug has no inbound or outbound links, try a different one or run extract
again.
**If extract finds nothing:** Your pages may not use entity-reference syntax. The
extractor matches `[Name](people/slug)`, `[Name](../people/slug.md)`, and bare
`people/slug` references. If your brain uses a different format, the auto-link
heuristics won't find them — file an issue with a sample page.
---
## 8. JSONB Frontmatter Integrity (v0.12.2)
Postgres-backed brains created before v0.12.2 had double-encoded JSONB columns
(`frontmatter->>'key'` returned NULL, GIN indexes were inert). `gbrain upgrade`
runs `gbrain repair-jsonb` automatically via the `v0_12_2` orchestrator.
Verify the repair succeeded.
**Command:**
```bash
gbrain repair-jsonb --dry-run --json
```
**Expected:** `totalRepaired: 0` across all 5 columns (`pages.frontmatter`,
`raw_data.data`, `ingest_log.pages_updated`, `files.metadata`,
`page_versions.frontmatter`). A zero count means every row holds a properly typed
JSON object, not string-encoded JSON.
**If the count is > 0:** The repair didn't run or was interrupted. Re-run
without `--dry-run`:
```bash
gbrain repair-jsonb
```
Idempotent. PGLite brains always report 0 (unaffected by the original bug).
**Bonus check** — frontmatter-keyed queries actually resolve:
```bash
gbrain call list_pages '{"frontmatterKey": "type", "frontmatterValue": "person"}'
```
If this returns rows on a brain with person pages, the JSONB path is healthy.
---
## Quick Verification (all checks in one pass)
```bash
# 1. Schema
gbrain doctor --json
# 2. Sync recency
gbrain config get sync.last_run
# 3. Page count + embed coverage
gbrain stats
# 4. Search works
gbrain search "test query from your brain content"
# 5. Catch any unembedded chunks
gbrain embed --stale
# 6. Auto-update
gbrain check-update --json
# 7. Knowledge graph populated (links + timeline > 0)
gbrain stats | grep -E 'links|timeline'
# 8. JSONB integrity (v0.12.2 — Postgres only, PGLite always 0)
gbrain repair-jsonb --dry-run --json
```
If all eight return successfully, the installation is healthy. For the full
end-to-end sync test (4c), push a real change and verify it appears in search.
---
## docs/guides/minions-fix.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/guides/minions-fix.md
# Minions fix — repairing a half-migrated install
**tl;dr:** on v0.11.1+ everything should self-heal. If Minions is partially
set up (no `~/.gbrain/preferences.json`, autopilot still inline, cron jobs
still on `agentTurn`), run:
```bash
gbrain apply-migrations --yes
```
It's idempotent. On v0.11.1 installs that already migrated it's a cheap
no-op.
## Context
v0.11.0 shipped the Minions schema, queue, worker, and migration skill —
but the migration skill itself never fired on upgrade. `runPostUpgrade`
printed the feature pitch and stopped. v0.11.0 was never released
publicly; v0.11.1 is the first public Minions ship and fixes the
mega-bug (migration fires automatically on `gbrain upgrade` and via
the `postinstall` hook).
If you're on a pre-v0.11.1 branch build (e.g. running the
`minions-jobs` branch before v0.11.1 tagged), Minions may be installed
but not wired: schema is v7, but no `~/.gbrain/preferences.json`,
autopilot still runs inline, cron jobs still call `agentTurn`.
This guide covers both paths: the canonical v0.11.1+ fix, and the
stopgap for pre-v0.11.1 binaries that don't have `apply-migrations`.
## Detecting the half-migrated state
```bash
gbrain doctor
```
If the install is half-migrated, you'll see:
```
[FAIL] minions_migration: MINIONS HALF-INSTALLED (partial migration: 0.11.0). Run: gbrain apply-migrations --yes
```
or
```
[FAIL] minions_config: MINIONS HALF-INSTALLED (schema v7+ but no ~/.gbrain/preferences.json). Run: gbrain apply-migrations --yes
```
For a machine-readable report (cron-friendly):
```bash
gbrain skillpack-check --quiet && echo healthy || echo needs_action
gbrain skillpack-check | jq -r '.actions[]' # prints the exact commands to run
```
## The fix (v0.11.1 or later)
```bash
gbrain apply-migrations --yes
```
Reads `~/.gbrain/migrations/completed.jsonl`, diffs against the TS
migration registry, runs whatever's pending. Seven phases:
```
A. Schema    gbrain init --migrate-only
B. Smoke     gbrain jobs smoke
C. Mode      prompt (or --yes default pain_triggered)
D. Prefs     write ~/.gbrain/preferences.json
E. Host      AGENTS.md marker injection + cron rewrites for gbrain
             builtins; JSONL TODOs for host-specific handlers
F. Install   gbrain autopilot --install (env-aware)
G. Record    append completed.jsonl status:"complete"
```
If Phase E emits TODOs for host-specific handlers (e.g. your OpenClaw's
~29 non-gbrain crons), the migration finishes with `status: "partial"`.
Your host agent walks the TODOs using `skills/migrations/v0.11.0.md` +
`docs/guides/plugin-handlers.md`, ships handler registrations in the
host repo, then re-runs `gbrain apply-migrations --yes`. Newly
registerable cron entries get rewritten and the JSONL rows mark
`status: "complete"`.
## The stopgap (pre-v0.11.1 binary, no apply-migrations yet)
If you're stuck on a branch build that doesn't have `apply-migrations`:
```bash
curl -fsSL https://raw.githubusercontent.com/garrytan/gbrain/v0.11.1/scripts/fix-v0.11.0.sh | bash
```
This bash script does what apply-migrations does from a shell environment:
1. `gbrain init --migrate-only` — schema v7.
2. `gbrain jobs smoke` — verify Minions health.
3. Prompt for `minion_mode` (defaults to `pain_triggered` on non-TTY).
4. Write `~/.gbrain/preferences.json` atomically.
5. Append `~/.gbrain/migrations/completed.jsonl` with `status: "partial"`
and `apply_migrations_pending: true`. That partial record is the
signal to v0.11.1's `apply-migrations` to pick up remaining phases
after the user upgrades.
6. Detect host agent repos and PRINT rewrite instructions (never
auto-edits from a curl-piped script).
7. Print the next step: `Run: gbrain autopilot --install`.
Once v0.11.1 is installed, re-run `gbrain apply-migrations --yes` to
finish the remaining phases (host rewrites + autopilot install). The
stopgap's `status: "partial"` record is designed to resume cleanly
(it doesn't poison the permanent migration path).
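The atomic write in step 4 follows a common pattern: stage the content in a temporary file next to the target, then rename it into place. The sketch below illustrates that pattern only; `write_atomic` is a hypothetical helper, not the stopgap script's code, and the JSON shape in the usage line is an assumption.

```bash
# Sketch: write a file atomically by staging it beside the target and renaming.
# write_atomic is a hypothetical helper, not taken from fix-v0.11.0.sh.
write_atomic() {
  local target="$1" content="$2" tmp
  mkdir -p "$(dirname "$target")" || return 1
  # Stage in the same directory so the rename stays on one filesystem.
  tmp="$(mktemp "${target}.XXXXXX")" || return 1
  printf '%s\n' "$content" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Readers see either the old file or the new one, never a partial write.
  mv -f "$tmp" "$target"
}
```

Usage: `write_atomic "$HOME/.gbrain/preferences.json" '{"minion_mode":"pain_triggered"}'`.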
## Verify the fix landed
```bash
# 1. Preferences exist and are readable
cat ~/.gbrain/preferences.json
# 2. Migration recorded
cat ~/.gbrain/migrations/completed.jsonl
# 3. Autopilot is supervising a Minions worker child
gbrain autopilot --status
ps aux | grep 'jobs work'
# 4. Jobs show up in the queue
gbrain jobs list
# 5. Any host-specific TODOs still pending
cat ~/.gbrain/migrations/pending-host-work.jsonl 2>/dev/null || echo "(none — all host work is done)"
# 6. Doctor + skillpack-check should both be clean
gbrain doctor
gbrain skillpack-check --quiet && echo ok
```
## If the fix fails
Each phase is idempotent. Re-running is safe. Common failure modes:
- **Phase B smoke fails:** the schema didn't apply. Check
`~/.gbrain/config.json` has a valid `database_url` (or `database_path`
for PGLite). Run `gbrain init --migrate-only` directly and look at
the error.
- **Phase F install fails:** your host environment doesn't match any
detected target. Pass `--target <macos|linux-systemd|ephemeral-container|linux-cron>`
explicitly.
- **Pending host work never clears:** your host agent hasn't shipped
handler registrations yet. Read
`~/.gbrain/migrations/pending-host-work.jsonl`, open
`skills/migrations/v0.11.0.md`, and follow the host-agent instruction
manual.
## Related
- `skills/migrations/v0.11.0.md` — full migration skill for host agents.
- `skills/skillpack-check/SKILL.md` — when and how to run the health check.
- `docs/guides/plugin-handlers.md` — plugin contract for host-specific
handlers.
- `skills/conventions/cron-via-minions.md` — the canonical cron rewrite
pattern.
---
## docs/integrations/reliability-repair.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/integrations/reliability-repair.md
# Reliability repair (v0.12.2)
If you ran v0.12.0 on real Postgres or Supabase, two bugs may have corrupted
data already in your brain. v0.12.1 fixed the code going forward.
v0.12.2 adds detection in `gbrain doctor` and a standalone `gbrain repair-jsonb`
command for the mechanically fixable class. PGLite users are not affected.
## What got corrupted
**JSONB double-encode.** Four write sites used
`${JSON.stringify(x)}::jsonb` with postgres.js, which stored a JSONB
*string literal* instead of an object. `frontmatter ->> 'key'` returns NULL;
GIN indexes are ineffective. Affected: `pages.frontmatter`,
`raw_data.data`, `ingest_log.pages_updated`, `files.metadata`.
**Markdown body truncation.** `splitBody()` treated `---` horizontal rules
as a body/timeline delimiter, dropping everything after the first rule.
Wiki-style pages with multiple `##`/`###` sections lost the bulk of their
content at import time.
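If you still have the source markdown, you can list the files that were exposed to the truncation bug by looking for a `---` line in the body. A rough sketch, assuming frontmatter (when present) opens on line 1 and closes at the next `---`; the function name is illustrative and this is independent of how `splitBody()` is implemented.

```bash
# Sketch: succeed if a markdown file has a `---` line after its frontmatter.
# Assumes frontmatter, when present, starts on line 1 and ends at the next `---`.
has_body_rule() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # frontmatter opens
    in_fm && $0 == "---"   { in_fm = 0; next }   # frontmatter closes
    !in_fm && $0 == "---"  { found = 1; exit }   # horizontal rule in the body
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

For example, `find /data/brain -name '*.md' | while read -r f; do has_body_rule "$f" && echo "$f"; done` prints candidate pages to cross-check against the `gbrain doctor` report.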
## Detect
```bash
gbrain doctor
```
Reports two new checks:
- `jsonb_integrity` — counts double-encoded rows per table and points you
at `gbrain repair-jsonb`.
- `markdown_body_completeness` — heuristic for pages whose `compiled_truth`
is suspiciously short compared to `raw_data.data ->> 'content'`.
## Repair
For JSONB (mechanically fixable):
```bash
gbrain repair-jsonb
```
Runs `UPDATE <table> SET <col> = (<col>#>>'{}')::jsonb WHERE jsonb_typeof(<col>) = 'string'`
across every affected column. Idempotent. Second run reports 0 rows. Use
`--dry-run` to preview, `--json` for structured output. The `v0_12_2`
migration runs this automatically on `gbrain upgrade`.
For truncated markdown bodies (source-dependent):
```bash
gbrain sync --force
# or per-page
gbrain import <slug> --force
```
v0.12.2 cannot recover content that was already lost if you no longer have
the source markdown file. `gbrain doctor` tells you which pages look short;
you decide whether to re-import from source or accept the truncation.
## Verify
```bash
gbrain doctor
```
All four `jsonb_integrity` rows should read zero. `markdown_body_completeness`
should match your expectations for the corpus.
---
# Migrations
## docs/UPGRADING_DOWNSTREAM_AGENTS.md
Source: https://raw.githubusercontent.com/garrytan/gbrain/master/docs/UPGRADING_DOWNSTREAM_AGENTS.md
# Upgrading Downstream Agents
GBrain ships skills in `skills/`. Downstream agents (custom OpenClaw deployments,
agent forks of any kind) often **copy** these skill files into their own workspace and
diverge over time — adding agent-specific phases, removing irrelevant ones, tightening
language. Once that happens, gbrain can't push updates to those forks. The agent has
to apply the diffs by hand.
This doc lists the exact diffs each downstream agent needs to apply when upgrading.
Cross-reference against your fork's local skill files.
## Why this exists
`gbrain upgrade` ships the new binary. `gbrain post-upgrade [--execute --yes]` runs
the schema migrations and backfills the data. But the **skill files themselves**
that tell the agent how to behave — those are user-owned. If your `~/git/<your-agent>/workspace/skills/brain-ops/SKILL.md`
says `# Based on gbrain v0.10.0` at the top, it doesn't know about v0.12.0 features.
The agent will keep manually calling `gbrain link` after every `put_page` (now redundant —
auto-link does it), miss out on `gbrain graph-query` for relationship questions, and
not know to backfill the structured timeline.
## How to apply
1. Identify your forked skill files. Typically at `~/git/<your-agent>/workspace/skills/` or wherever your agent's skill directory lives.
2. For each skill listed below, find the matching phase/section in your fork.
3. Apply the diff (paste the new block in the indicated location).
4. Update the version banner at the top of your fork (`# Based on gbrain v0.12.0`).
5. Verify: ask the agent to write a test page and confirm the response includes
`auto_links: { created, removed, errors }`.
Total time: ~10 minutes for all four skills.
---
## 1. brain-ops/SKILL.md
**Where:** Insert a new `### Phase 2.5` section immediately after `### Phase 2: On Every Inbound Signal`.
**Why:** Phase 2.5 declares that auto-link runs automatically. Without this, the
agent's mental model says it must call `gbrain link` after every `put_page`, which
is now redundant and can cause double-add warnings.
```markdown
### Phase 2.5: Structured Graph Updates (automatic)
Every `put_page` call automatically extracts entity references and writes them
to the graph (`links` table) with inferred relationship types. Stale links
(refs no longer in the page text) are removed in the same call. This is
"auto-link" reconciliation.
- No manual `add_link` calls needed for ordinary page writes.
- Inferred link types: `attended` (meeting -> person), `works_at`, `invested_in`,
`founded`, `advises`, `source` (frontmatter), `mentions` (default).
- The `put_page` MCP response includes `auto_links: { created, removed, errors }`
so the agent can verify outcomes.
- To disable: `gbrain config set auto_link false`. Default is on.
- Timeline entries with specific dates still need explicit `gbrain timeline-add`
(or batch via `gbrain extract timeline --source db`).
```
**Also update the Iron Law section.** If your fork still says "Back-links maintained
on every brain write (Iron Law)" without qualification, append:
```markdown
**v0.12.0 update:** Auto-link satisfies the Iron Law for entity-reference links
on every `put_page`. The agent's Iron Law obligation is now: include the
entity reference in the page content (e.g., `[Alice](people/alice)`); auto-link
handles the structured row. Manual `add_link` calls are reserved for
relationships you can't express in markdown content.
```
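The `auto_links` field described in Phase 2.5 can also be checked in code rather than by eye. The following is a minimal sketch, assuming the `put_page` response is already parsed JSON and that `created`, `removed`, and `errors` are numeric counts. The `PutPageResponse` type and the `describeAutoLinks` helper are hypothetical names for illustration, not part of gbrain.

```typescript
// Hypothetical shape of the relevant part of a put_page response.
// Only the auto_links field comes from this guide; the rest is assumed.
interface AutoLinks {
  created: number;
  removed: number;
  errors: number;
}

interface PutPageResponse {
  auto_links?: AutoLinks;
}

// Returns a one-line summary, or throws when the field is missing
// or auto-link reported errors.
function describeAutoLinks(response: PutPageResponse): string {
  const links = response.auto_links;
  if (links === undefined) {
    throw new Error("put_page response has no auto_links field");
  }
  if (links.errors > 0) {
    throw new Error("auto-link reported " + links.errors + " error(s)");
  }
  return "auto-link ok: " + links.created + " created, " + links.removed + " removed";
}

console.log(describeAutoLinks({ auto_links: { created: 1, removed: 0, errors: 0 } }));
// prints "auto-link ok: 1 created, 0 removed"
```

Throwing on a missing field is a deliberate choice in this sketch: an agent that silently ignores the absence would not notice if auto-link had been turned off.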
---
## 2. meeting-ingestion/SKILL.md
**Where:** Append to the end of `### Phase 3: Attendee enrichment`.
**Why:** Eliminates redundant `gbrain link` calls per attendee (auto-link handles them
when the meeting page references attendees as `[Name](people/slug)`).
```markdown
**Note (v0.12.0):** Once the meeting page is written via `gbrain put`, the
auto-link post-hook automatically creates `attended` links from the meeting
to each attendee whose page is referenced as `[Name](people/slug)`. You don't
need to call `gbrain link` for attendees. You DO still need `gbrain timeline-add`
for dated events (auto-link only handles links, not timeline entries).
```
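To preview which attendees a meeting page would link under the `[Name](people/slug)` convention, a small regex pass over the markdown is enough. This is an illustrative sketch only: gbrain's real extractor is not shown in this guide, and `findPeopleRefs` and `EntityRef` are hypothetical names.

```typescript
// Illustrative only: not gbrain's extractor.
interface EntityRef {
  label: string; // link text, e.g. "Alice"
  slug: string;  // link target, e.g. "people/alice"
}

// Finds markdown links whose target starts with "people/".
function findPeopleRefs(markdown: string): EntityRef[] {
  const pattern = /\[([^\]]+)\]\((people\/[^)\s]+)\)/g;
  const refs: EntityRef[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    refs.push({ label: match[1], slug: match[2] });
  }
  return refs;
}

console.log(findPeopleRefs("Sync with [Alice](people/alice) about [the deck](wiki/deck)."));
```

Attendees mentioned only as plain text (no markdown link) would not appear in this list, which is a quick way to spot names that need a link before the page is written.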
**Where:** In `### Phase 4: Entity propagation`, the line "Back-link from entity page
to meeting page" can be replaced with:
```markdown
4. Entity references in the meeting page body auto-create the link via auto-link.
For incoming references on the entity page (entity page → meeting page), edit
the entity page to mention the meeting and `put_page` it — auto-link handles
the rest.
```
---
## 3. signal-detector/SKILL.md
**Where:** Append to the end of `### Phase 2: Entity Detection`.
**Why:** Same logic as brain-ops — eliminates manual `gbrain link` after writing
originals/ideas pages that reference people or companies.
```markdown
**Auto-link (v0.12.0):** When you write/update an originals or ideas page that
references a person or company, the auto-link post-hook on `put_page`
automatically creates the link from the new page to that entity. You don't
need to call `gbrain link` manually. Timeline entries still need explicit calls.
```
---
## 4. enrich/SKILL.md
**Where:** Replace `### Step 7: Cross-reference` with the v0.12.0 version.
**Why:** Step 7 used to be primarily about creating links between related entity
pages. With auto-link, that's automatic. Step 7 is now about content updates,
not link creation.
Old (delete):
```markdown
### Step 7: Cross-reference
- Update company pages from person enrichment (and vice versa)
- Update related project/deal pages if relevant context surfaced
- Check index files if the brain uses them
- Add back-links manually via `gbrain link` for any new entity references
```
New (paste):
```markdown
### Step 7: Cross-reference
- Update company pages from person enrichment (and vice versa)
- Update related project/deal pages if relevant context surfaced
- Check index files if the brain uses them
**Note (v0.12.0):** Links between brain pages are auto-created on every
`put_page` call (auto-link post-hook). Step 7 focuses on content
cross-references (updating related pages' compiled truth with new signal
from this enrichment), not on creating links. Verify via the `auto_links`
field in the put_page response (`{ created, removed, errors }`).
Timeline entries still need explicit `gbrain timeline-add` calls.
```
---
## After all four diffs are applied
1. **Bump the version banner** at the top of each forked file:
```
# Based on gbrain v0.12.0 skills/<skill-name>, extended with <your-agent>-specific config
```
2. **Run the v0.12.0 backfill** (this populates the graph for your existing brain):
```bash
gbrain post-upgrade
```
The v0.12.0 release wires post-upgrade to call `apply-migrations --yes`
automatically, which runs the v0_12_0 orchestrator (schema → config check →
`extract links --source db` → `extract timeline --source db` → verify).
Idempotent; cheap when nothing is pending.
3. **Verify auto-link works:** ask the agent to write a test page that references
`[Some Person](people/some-person)`. Confirm the put_page response includes
`auto_links: { created: 1, removed: 0, errors: 0 }`.
4. **Verify graph traversal works:**
```bash
gbrain graph-query people/some-well-connected-person --depth 2
```
Should return an indented tree of typed edges.
---
## v0.12.2 hotfix (data-correctness, no skill edits)
v0.12.2 is a Postgres data-correctness hotfix. No forked skill files need to
change — the skill contracts are unchanged. But you DO need to run the migration,
and you should know about one behavior change in markdown parsing.
### 1. Run the migration (Postgres-backed brains)
```bash
gbrain upgrade
```
The `v0_12_2` orchestrator runs `gbrain repair-jsonb` automatically. It rewrites
rows where `jsonb_typeof = 'string'` across `pages.frontmatter`, `raw_data.data`,
`ingest_log.pages_updated`, `files.metadata`, and `page_versions.frontmatter`.
Idempotent, safe to re-run. PGLite brains no-op cleanly.
Verify after upgrade:
```bash
gbrain repair-jsonb --dry-run --json # expect totalRepaired: 0
```
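For intuition about what a `jsonb_typeof = 'string'` row looks like from application code, the sketch below assumes the bad values are JSON that was encoded twice, so the column decodes to a string instead of an object. That cause is an assumption; this guide only states the `jsonb_typeof` condition. `unwrapDoubleEncoded` is a hypothetical helper, not the `repair-jsonb` implementation.

```typescript
// Assumption: the affected rows hold a JSON string whose content is itself JSON.
function unwrapDoubleEncoded(value: unknown): unknown {
  if (typeof value !== "string") {
    return value; // already an object, array, number, etc.
  }
  try {
    const parsed: unknown = JSON.parse(value);
    // Only unwrap when the inner value is an object or array;
    // a plain string such as "hello" is left untouched.
    return typeof parsed === "object" && parsed !== null ? parsed : value;
  } catch {
    return value; // not JSON at all; leave as-is
  }
}

const decodedStringRow = JSON.stringify({ title: "Alice" });
console.log(unwrapDoubleEncoded(decodedStringRow));
```

Like the real command, the sketch is idempotent: passing an already-unwrapped object through it returns the same object.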
### 2. Recover any truncated wiki articles
If your brain imported wiki-style markdown before v0.12.2, some pages were
silently truncated (any standalone `---` in body content was treated as a
timeline separator). Re-import from source:
```bash
gbrain sync --full
```
The new `splitBody` rebuilds `compiled_truth` correctly.
### 3. Know the splitBody contract going forward
`splitBody` now requires an explicit timeline sentinel. Recognized markers
(priority order):
1. `<!-- timeline -->` (preferred — what `serializeMarkdown` emits)
2. `--- timeline ---` (decorated separator)
3. `---` directly before `## Timeline` or `## History` heading (backward-compat)
A bare `---` in body text is now a markdown horizontal rule, not a timeline
separator. If your agent writes pages with a bare `---` delimiter, migrate to
`<!-- timeline -->` — the `serializeMarkdown` helper already does this.
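The priority order above can be sketched as a small line-based splitter. This is not gbrain's `splitBody`; it is a minimal illustration that assumes sentinels sit on their own line and that "directly before" allows blank lines between the `---` and the heading. `splitBodySketch` and `SplitResult` are hypothetical names.

```typescript
interface SplitResult {
  compiledTruth: string;
  timeline: string;
}

function splitBodySketch(body: string): SplitResult {
  const lines = body.split("\n");

  // Next non-blank line after index i, or undefined at end of input.
  const nextNonBlank = (i: number): string | undefined => {
    for (let j = i + 1; j < lines.length; j++) {
      if (lines[j].trim() !== "") return lines[j];
    }
    return undefined;
  };
  const isTimelineHeading = (line: string | undefined): boolean =>
    line !== undefined && /^## (Timeline|History)\s*$/.test(line);

  // Matchers in priority order; each returns true when line i is a sentinel.
  const matchers: Array<(i: number) => boolean> = [
    (i) => lines[i].trim() === "<!-- timeline -->",
    (i) => lines[i].trim() === "--- timeline ---",
    (i) => lines[i].trim() === "---" && isTimelineHeading(nextNonBlank(i)),
  ];

  for (const matches of matchers) {
    for (let i = 0; i < lines.length; i++) {
      if (matches(i)) {
        return {
          compiledTruth: lines.slice(0, i).join("\n").trim(),
          timeline: lines.slice(i + 1).join("\n").trim(),
        };
      }
    }
  }
  // No sentinel: a bare `---` stays in the body as a horizontal rule.
  return { compiledTruth: body, timeline: "" };
}

console.log(splitBodySketch("Intro\n\n---\n\nMore\n\n<!-- timeline -->\n- 2024-01-01: met"));
```

The outer loop runs over matchers rather than lines, so a preferred marker later in the page wins over a lower-priority one that appears earlier.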
### 4. Wiki subtypes now auto-typed
`inferType` now auto-detects five additional directory patterns as their own
page types (previously they all defaulted to `concept`):
| Path pattern | New type |
|------------------------|----------------|
| `/wiki/analysis/` | `analysis` |
| `/wiki/guides/` | `guide` |
| `/wiki/hardware/` | `hardware` |
| `/wiki/architecture/` | `architecture` |
| `/writing/` | `writing` |
If your skills or queries filter by `type=concept` and expect wiki content in
that bucket, update them to include the new types.
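If a fork needs to predict the new type for a path, the table can be expressed as a lookup. This is a sketch under stated assumptions: the path fragments and types are copied from the table, but the function name `inferWikiSubtype`, the leading-slash normalization, and the `concept` fallback for every other path are simplifications, not the behavior of gbrain's `inferType`.

```typescript
// Path fragments and types copied from the table above.
const WIKI_SUBTYPES: Array<[string, string]> = [
  ["/wiki/analysis/", "analysis"],
  ["/wiki/guides/", "guide"],
  ["/wiki/hardware/", "hardware"],
  ["/wiki/architecture/", "architecture"],
  ["/writing/", "writing"],
];

function inferWikiSubtype(path: string): string {
  // Prefix a slash so relative paths like "wiki/guides/x.md" also match.
  const normalized = path.charAt(0) === "/" ? path : "/" + path;
  for (const [fragment, pageType] of WIKI_SUBTYPES) {
    if (normalized.indexOf(fragment) !== -1) {
      return pageType;
    }
  }
  return "concept"; // simplified fallback for this sketch
}

console.log(inferWikiSubtype("wiki/guides/setup.md")); // prints "guide"
```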
---
## v0.13.0 — Frontmatter Relationship Indexing
**Verdict: no action required for most skills.** v0.13 projects YAML frontmatter fields into the graph as typed edges. The ingestion API is unchanged — keep calling `put_page` with frontmatter the way you do today; the graph auto-populates behind the scenes.
Three skills get an optional new phase if you want to consume the new `auto_links.unresolved` response field. Without this, unresolvable frontmatter names are silently skipped (same as v0.12 behavior).
### 1. meeting-ingestion/SKILL.md (optional)
**Where:** Add a new section after "Phase 3: Write Meeting Page".
```markdown
### Phase 3.5: Check for unresolved attendees (v0.13+)
After `put_page`, inspect `response.auto_links.unresolved` — an array of frontmatter
references that did not resolve to existing pages. For meetings, this usually means
attendees you haven't created a person page for yet.
If `unresolved.length > 0`:
- Option 1 (create pages now): trigger an enrichment pass to build the missing people pages.
- Option 2 (defer): log the unresolved names to the enrichment queue for later.
- Option 3 (accept the gap): the attendee edge will not be created until a page exists.
Re-running `gbrain extract links --source db --include-frontmatter` after creating
the page fills in the missing edges.
```
### 2. enrich/SKILL.md (optional)
**Where:** Add to the enrichment trigger list.
```markdown
### Drain unresolved frontmatter names (v0.13+)
If any `put_page` response includes `auto_links.unresolved` entries, the enrichment
tier should pick up those (field, name) pairs and try to create the missing entity
pages. Example flow:
1. signal-detector captures a meeting with `attendees: [Alice Known, Unknown Person]`
2. put_page returns `auto_links.unresolved = [{field: 'attendees', name: 'Unknown Person'}]`
3. enrichment tier consumes `Unknown Person` → web search → creates `people/unknown-person.md`
4. The next put_page (or a backfill run) wires up the `attended` edge automatically
```
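The drain step can be sketched as a function that collects unique names from a batch of `put_page` responses. The `{field, name}` shape of each unresolved entry follows the example flow above; the `PutPageResult` type and `namesToEnrich` helper are hypothetical, and how the names reach the enrichment queue is left to the host agent.

```typescript
interface UnresolvedRef {
  field: string;
  name: string;
}

// Hypothetical response type; only auto_links.unresolved comes from this guide.
interface PutPageResult {
  auto_links?: { unresolved?: UnresolvedRef[] };
}

// Collects each unresolved name once, in first-seen order.
function namesToEnrich(responses: PutPageResult[]): string[] {
  const names: string[] = [];
  for (const response of responses) {
    const unresolved = response.auto_links?.unresolved ?? [];
    for (const ref of unresolved) {
      if (names.indexOf(ref.name) === -1) {
        names.push(ref.name);
      }
    }
  }
  return names;
}

console.log(
  namesToEnrich([
    { auto_links: { unresolved: [{ field: "attendees", name: "Unknown Person" }] } },
  ])
); // prints [ 'Unknown Person' ]
```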
### 3. idea-ingest/SKILL.md (optional)
**Where:** Same pattern as meeting-ingestion — check `auto_links.unresolved` after `put_page`, route names to enrichment.
### Unchanged skills (no diffs needed)
- **brain-ops/SKILL.md** — auto-link mechanics are internal; the write path stays the same.
- **signal-detector/SKILL.md** — signal capture path unchanged.
- **query/SKILL.md** — `traverse_graph` now returns richer results automatically.
- **daily-task-manager/SKILL.md**, **briefing/SKILL.md**, **citation-fixer/SKILL.md**, **media-ingest/SKILL.md** — unchanged.
### New edge types you can filter in graph queries
v0.13 edges carry new `link_type` values. If your fork has graph-query skills that filter by type, these are now available:
- `works_at` (person → company) — from `company:`, `companies:`, or `key_people:`
- `founded` (person → company) — from `founded:`
- `invested_in` (investor → deal/company) — from `investors:` or `lead:`
- `led_round` (lead → deal) — from `lead:`
- `yc_partner` (partner → company) — from `partner:`
- `attended` (person → meeting) — from `attendees:`
- `discussed_in` (source → page) — from `sources:`
- `source` (page → source) — from `source:`
- `related_to` (page → target) — from `related:` or `see_also:`
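For graph-query skills that start from a frontmatter field, the list above can be restated as a lookup table. The field-to-type pairs are copied from the list (note that `lead:` yields two edge types); the constant and function names are hypothetical and the table is not read from gbrain itself.

```typescript
// Frontmatter field -> link_type values, copied from the list above.
const FRONTMATTER_EDGE_TYPES: { [field: string]: string[] } = {
  company: ["works_at"],
  companies: ["works_at"],
  key_people: ["works_at"],
  founded: ["founded"],
  investors: ["invested_in"],
  lead: ["invested_in", "led_round"],
  partner: ["yc_partner"],
  attendees: ["attended"],
  sources: ["discussed_in"],
  source: ["source"],
  related: ["related_to"],
  see_also: ["related_to"],
};

// Returns the edge types a field produces, or an empty list for other fields.
function edgeTypesForField(field: string): string[] {
  return Object.prototype.hasOwnProperty.call(FRONTMATTER_EDGE_TYPES, field)
    ? FRONTMATTER_EDGE_TYPES[field]
    : [];
}

console.log(edgeTypesForField("lead")); // prints [ 'invested_in', 'led_round' ]
```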
### Migration timing
`gbrain upgrade` takes 2-5 min on a 46K-page brain (one-time). Runs out-of-process via `gbrain post-upgrade`. If your agent holds a DB connection during the upgrade, reconnect after; otherwise keep serving.
### Type normalization NOT in v0.13
Legacy rows with `link_type='attendee'` or `link_type='mention'` coexist with new `'attended'` / `'mentions'` rows. Your queries filtering on old type names keep working. A separate opt-in `gbrain normalize-types` command in v0.14 handles the rename.
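Until the rename runs, a query that wants every `attended` edge has to match both spellings. A minimal sketch of that expansion, using only the two pairs named above; `linkTypesToMatch` is a hypothetical helper and the way the result is passed to a graph query is up to the fork.

```typescript
// v0.13 name -> legacy name, from the paragraph above.
const LEGACY_LINK_TYPES: { [current: string]: string } = {
  attended: "attendee",
  mentions: "mention",
};

// Returns every link_type value a query should match for a v0.13 type name.
function linkTypesToMatch(current: string): string[] {
  if (Object.prototype.hasOwnProperty.call(LEGACY_LINK_TYPES, current)) {
    return [current, LEGACY_LINK_TYPES[current]];
  }
  return [current];
}

console.log(linkTypesToMatch("attended")); // prints [ 'attended', 'attendee' ]
```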
---
## v0.14.0 shell jobs (optional adoption, no skill edits)
Adds a `shell` job type to Minions so deterministic cron scripts (API fetch, token
refresh, scrape + write) move off the LLM gateway. Zero tokens per fire. ~60%
gateway CPU headroom at typical scale. The feature is **off by default**, so existing
installs keep running exactly as they did before. Nothing breaks.
To adopt, follow `skills/migrations/v0.14.0.md`. The short version:
1. Set `GBRAIN_ALLOW_SHELL_JOBS=1` on the worker process, then `gbrain jobs work`
(Postgres). On PGLite, every crontab invocation uses `--follow` for inline
execution; no persistent worker.
2. Classify each of your host's cron entries: LLM-requiring (keep on gateway) vs
deterministic (candidate for shell). Typical splits:
- **Deterministic → shell:** `ycli-token-refresh`, `x-oauth2-refresh`,
`x-garrytan-unified`, `calendar-sync-to-brain`, `github-pulse`,
`frameio-scan`, `flight-tracker`, `x-raw-json-backfill`.
- **LLM-requiring → stay:** `social-radar`, `content-ideas`, `adversary-vacuum`,
`ea-inbox-sweep`, `morning-briefing`, `brain-maintenance`.
3. For each deterministic cron, rewrite as:
```cron
# A crontab entry must fit on one line: cron does not support backslash continuation.
3 13,16,19,22,1,4,7,10 * * * gbrain jobs submit shell --params '{"cmd":"node scripts/your-script.mjs","cwd":"/data/.openclaw/workspace"}' --max-attempts 3 --timeout-ms 300000
```
4. Watch `gbrain jobs get <id>` for exit_code / stdout_tail / stderr_tail on each
fire. Compare against pre-migration behavior before approving the next batch.
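When rewriting several crons, generating the `gbrain jobs submit shell` line from data avoids hand-quoting mistakes in the `--params` JSON. The sketch below uses only the flags shown in step 3; `buildSubmitCommand`, `shellQuote`, and `ShellJobParams` are hypothetical helper names, and the output is meant to be reviewed by a human before it goes into a crontab.

```typescript
interface ShellJobParams {
  cmd: string;
  cwd: string;
}

// Wraps a string in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Builds the command portion of a crontab entry (schedule not included).
function buildSubmitCommand(
  params: ShellJobParams,
  maxAttempts: number,
  timeoutMs: number
): string {
  return [
    "gbrain jobs submit shell",
    "--params " + shellQuote(JSON.stringify(params)),
    "--max-attempts " + maxAttempts,
    "--timeout-ms " + timeoutMs,
  ].join(" ");
}

console.log(
  buildSubmitCommand(
    { cmd: "node scripts/your-script.mjs", cwd: "/data/.openclaw/workspace" },
    3,
    300000
  )
);
```

One caveat when pasting the result into a crontab: cron treats an unescaped `%` in the command as a newline, so any `%` in the JSON needs to be written as `\%`.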
**No skill edits required.** The handler runs worker-side; skill files don't
change. If your host exposed custom handlers via the plugin contract (v0.11.0),
they still work the same way.
Iron rule: **never auto-rewrite the operator's crontab.** Every rewrite is
per-cron, human-approved, with a diff. If you want automation later, the
upcoming `gbrain crontab-to-minions <file>` helper is P1 in TODOS.
---
## v0.16.0: durable agent runtime
v0.15 ships `gbrain agent run` / `gbrain agent logs`, a new `subagent` handler
type in Minions, and a plugin contract for host-repo subagent defs. None of the
existing skills need surgery. The question for downstream agents is *how* to
adopt the new runtime, not how to patch around a breaking change.
### 1. Run a worker with an Anthropic key
The subagent handlers (`subagent` and `subagent_aggregator`) are always
registered on the worker. There is no separate opt-in flag: `ANTHROPIC_API_KEY` is
the natural cost gate (without a key, the SDK call fails on the first turn), and
who-can-submit is already protected (`PROTECTED_JOB_NAMES` + trusted-submit:
MCP callers get `permission_denied`; only `gbrain agent run` can insert
these rows).
```bash
ANTHROPIC_API_KEY=sk-ant-... gbrain jobs work
```
Worker startup prints:
```
[minion worker] subagent handlers enabled
```
### 2. Ship your subagents as a plugin (OpenClaw + similar)
Move your custom subagent definitions out of your gbrain fork and into your own
repo as a plugin. Concretely:
```
~/<your-agent>/gbrain-plugin/
├── gbrain.plugin.json
└── subagents/
├── meeting-ingestion.md
├── signal-detector.md
└── daily-task-prep.md
```
`gbrain.plugin.json`:
```json
{
"name": "your-openclaw",
"version": "2026.4.20",
"plugin_version": "gbrain-plugin-v1"
}
```
Each `subagents/*.md` is a plain-text agent definition — YAML frontmatter +
body-as-system-prompt. Recognized frontmatter fields: `name`, `model`,
`max_turns`, `allowed_tools` (must be a subset of the derived brain-tool registry).
Turn it on:
```bash
export GBRAIN_PLUGIN_PATH="$HOME/<your-agent>/gbrain-plugin"
```
Worker startup prints `[plugin-loader] loaded '<name>' v<ver> (N subagents)`
per plugin; any rejection (bad manifest, unknown tool in `allowed_tools`,
version mismatch) shows up as a loud warning at startup, not a silent dispatch-
time failure. See `docs/guides/plugin-authors.md` for the full contract.
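A plugin author can catch the "unknown tool in `allowed_tools`" rejection before startup with a small pre-flight check. This is a sketch, not the plugin loader: the frontmatter field names come from the list above, but the `SubagentDef` type, the `unknownTools` helper, and the registry contents in the example are assumptions. The real registry is derived by gbrain.

```typescript
// Frontmatter fields as listed above; parsing the YAML is out of scope here.
interface SubagentDef {
  name: string;
  model?: string;
  max_turns?: number;
  allowed_tools?: string[];
}

// Returns the tools in allowed_tools that are not present in the registry.
function unknownTools(def: SubagentDef, registry: string[]): string[] {
  const requested = def.allowed_tools ?? [];
  return requested.filter((tool) => registry.indexOf(tool) === -1);
}

// Example registry is hypothetical; use the names your gbrain version exposes.
const exampleRegistry = ["put_page", "traverse_graph"];
console.log(
  unknownTools({ name: "analyzer", allowed_tools: ["put_page", "run_shell"] }, exampleRegistry)
); // prints [ 'run_shell' ]
```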
### 3. Replace ephemeral subagent runs with durable ones
If your agent currently spawns ephemeral subagents (OpenClaw `Agent()`, ad-hoc
Anthropic API calls, etc.) for work that should survive crashes, sleeps, or
worker restarts, migrate those to `gbrain agent run`. The durability is free:
```bash
gbrain agent run "analyze my last 50 journal pages for recurring themes" \
--subagent-def analyzer --fanout-manifest manifests/journal-pages.json
```
Every turn persists to `subagent_messages`, every tool call is a two-phase
ledger, and `gbrain agent logs <job>` shows where it died + what the last
successful call returned. No more "re-run from scratch because the session
context evaporated."
### 4. `put_page` from subagents writes under an agent namespace
If you adopted the v0.15 subagent runtime, note that `put_page` calls
originating from a subagent's tool dispatch MUST target
`wiki/agents/<subagent_id>/...`. The schema shown to the model enforces this
on first try; a server-side fail-closed check rejects anything else. This
does NOT affect your skill files, CLI put_page calls, or MCP put_page —
only tool-dispatched writes from inside an LLM loop.
Aggregation output (the final "here's what all N children found" brain page)
goes via a separate trusted CLI path, not through a subagent tool call, so
it can write anywhere you want.
Iron rule: **never grant an agent write access beyond its namespace**. The
server-side check exists because dispatcher bugs happen; treat it as defense
in depth, not the primary boundary.
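A fail-closed namespace check of the kind described here can be sketched in a few lines. This is not gbrain's server-side check; `isInAgentNamespace` is a hypothetical helper that assumes slugs are `/`-separated and that the subagent id is a single path segment.

```typescript
// Returns true only when slug is strictly inside wiki/agents/<subagentId>/.
function isInAgentNamespace(slug: string, subagentId: string): boolean {
  // The id must be one non-empty path segment.
  if (subagentId === "" || subagentId.indexOf("/") !== -1) {
    return false;
  }
  const prefix = "wiki/agents/" + subagentId + "/";
  if (slug.indexOf(prefix) !== 0) {
    return false;
  }
  // Reject empty segments and traversal such as "." or "..".
  for (const segment of slug.split("/")) {
    if (segment === "" || segment === "." || segment === "..") {
      return false;
    }
  }
  return true;
}

console.log(isInAgentNamespace("wiki/agents/42/notes/summary", "42")); // prints true
console.log(isInAgentNamespace("wiki/agents/42/../43/x", "42"));       // prints false
```

Matching on the prefix with its trailing slash is what keeps agent `42` out of `wiki/agents/421/...`, and the segment scan closes the `..` escape that a prefix test alone would miss.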
---
## Future versions
When gbrain ships a new version, this doc will be updated with the diffs for that
version. Each new version appends a section; old sections stay so you can catch up
multiple versions at once.
To check what your fork is missing:
```bash
# grep cannot read a directory without -r, so list the migration files instead.
diff <(grep -A3 "Based on gbrain" ~/<your-fork>/skills/brain-ops/SKILL.md) \
<(ls ~/gbrain/skills/migrations/ | grep "v[0-9]" | tail -3)
```
---