triton6564685/gbrain

Garry Tan 08b3698e90

v0.18.2: migration hardening, integrity fix + reserved-connection primitive (#356)

* fix: migration hardening — timeout handling, lock detection, diagnostics

Addresses all 8 issues from the v0.18.0 production upgrade field report:

1. LATEST_VERSION now uses Math.max() instead of array-last (was wrong
   when MIGRATIONS array is out of order: [.., 23, 22, 21, 20, 15, 16])

2. Pre-flight lock check: runMigrations() queries pg_stat_activity for
   idle-in-transaction connections >5min before attempting DDL, prints
   PIDs and kill advice

3. SET LOCAL statement_timeout = 600s inside migration transactions for
   Supabase compatibility (server-enforced timeout overrides session SET)

4. Catches Postgres error 57014 (statement_timeout) with actionable
   diagnostics instead of raw stack trace

5. Better progress output: prints schema version range, migration names
   before/after, checkmarks on success

6. Migration 21 fix: drops files_page_slug_fkey before swapping the
   pages unique constraint (guarded for PGLite, which has no files table)

7. idle_in_transaction_session_timeout = 5min on all Postgres connections
   (both instance-level and module-level) to prevent 24h stale locks

8. apply-migrations CLI warns when schema migrations are pending, since
   it only runs orchestrator migrations (System B) not schema DDL (System A)

All 34 migrate tests pass. Typecheck clean.
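Items 1 and 4 above are small enough to sketch directly. The TypeScript below is an illustration, not the shipped code: the `Migration` shape and the helper names `latestVersion` and `isStatementTimeout` are assumptions. It relies only on the `Math.max()` idea from item 1 and on Postgres reporting a statement_timeout cancellation as SQLSTATE `57014` (query_canceled).

```typescript
// Hypothetical migration shape; the real MIGRATIONS entries may carry more fields.
interface Migration {
  version: number;
  name: string;
}

// Item 1: take the highest version instead of the last array element, so an
// out-of-order array like [.., 23, 22, 21, 20, 15, 16] still reports 23.
function latestVersion(migrations: readonly Migration[]): number {
  return migrations.reduce((max, m) => Math.max(max, m.version), 0);
}

// Item 4: Postgres cancels a statement that exceeds statement_timeout with
// SQLSTATE 57014; drivers such as postgres.js expose it as `err.code`.
function isStatementTimeout(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "57014"
  );
}
```

Folding with `reduce` avoids spreading a large array into `Math.max`; with a few dozen migrations either form works.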

* feat(engine): BrainEngine.withReservedConnection() primitive + DRY session defaults

Adds a ReservedConnection interface and withReservedConnection(fn) method to
BrainEngine. Postgres uses postgres-js sql.reserve() to pin a single backend for
the callback; PGLite passes through its single backing connection. Used
immediately for non-transactional DDL timeout handling (next commit) and
foundation for the future write-quiesce design.

Extracts setSessionDefaults(sql) helper in db.ts, absorbing the duplicated
idle_in_transaction_session_timeout block that was copy-pasted between db.ts and
postgres-engine.ts (Gap 5 / ER-C1). Single write site, both connect paths call
the helper now.

Codex plan-review flagged that advisory-lock designs on postgres.js pools
require a reserved-connection primitive; this is that primitive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(migrate): close v21/v23 integrity window + non-transactional DDL timeout

Two Codex-caught issues that both the initial review and the engineering review
missed:

1. Migration 21 integrity window. Original v21 dropped files_page_slug_fkey and
   persisted config.version=21, leaving files WITHOUT any FK to pages until v23
   ran and added the replacement files.page_id. Process death between v21 and
   v23 left files unconstrained while file_upload / `gbrain files` kept
   accepting writes. Fix: v21 uses sqlFor to split engines (Postgres gets
   additive-only, PGLite gets the full UNIQUE swap since it has no concurrent
   writers). v23's handler now wraps the FK drop + UNIQUE swap + page_id
   addition + backfill + ledger creation in one engine.transaction(). Atomic.

2. Non-transactional DDL timeout gap. runMigrationSQL's else-branch (for
   migrations with transaction:false, like CREATE INDEX CONCURRENTLY) ran the
   DDL on the shared pool with no timeout override. Supabase's 2-min server
   statement_timeout would abort a CONCURRENTLY index on any large table.
   Fix: use engine.withReservedConnection + SET statement_timeout='600000'
   inside the isolated connection.
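The reserved-connection fix in item 2 amounts to: pin one backend, raise the timeout on that session only, run the DDL, then release. The sketch below models the client loosely on postgres.js (whose `sql.reserve()` returns a pinned connection with a `release()` method, and whose `unsafe()` runs raw SQL), but the `Pool` / `ReservedConn` interfaces and the function names are hypothetical. Resetting `statement_timeout` before release is an added assumption, so the pooled backend does not carry the 10-minute limit into unrelated queries.

```typescript
// Minimal stand-ins for a driver's pool and reserved connection (hypothetical).
interface ReservedConn {
  unsafe(query: string): Promise<unknown>;
  release(): void;
}
interface Pool {
  reserve(): Promise<ReservedConn>;
}

// Pin one backend for the callback and always hand it back afterwards.
async function withReservedConnection<T>(
  pool: Pool,
  fn: (conn: ReservedConn) => Promise<T>,
): Promise<T> {
  const conn = await pool.reserve();
  try {
    return await fn(conn);
  } finally {
    conn.release();
  }
}

// Run DDL that cannot live inside a transaction (e.g. CREATE INDEX
// CONCURRENTLY) with a 10-minute session-level statement_timeout.
async function runNonTransactionalDDL(pool: Pool, ddl: string): Promise<void> {
  await withReservedConnection(pool, async (conn) => {
    await conn.unsafe("SET statement_timeout = '600000'"); // milliseconds
    try {
      await conn.unsafe(ddl);
    } finally {
      // Assumption: restore the default before the backend returns to the pool.
      await conn.unsafe("RESET statement_timeout");
    }
  });
}
```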

Also: extracted getIdleBlockers(engine) helper — single source of truth for the
pg_stat_activity query. Shared by the DDL pre-flight warning and the new
`gbrain doctor --locks` CLI (next commit).

57014 diagnostic rewritten to the 4-part "what / why / fix / verify" pattern.
No longer references a non-existent CLI flag.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(doctor): gbrain doctor --locks CLI flag

The v0.18.0 57014 diagnostic referenced `gbrain doctor --locks` but the flag
didn't exist. Users hitting statement_timeout would run the suggested command
and get "unknown option". Implemented now.

On Postgres: queries pg_stat_activity via the new getIdleBlockers() helper,
prints each blocker's PID, state, query_start, truncated query, and the exact
`SELECT pg_terminate_backend(<pid>);` command. Exits 1 on blockers, 0 on clean.

On PGLite: prints "not applicable" (no pool, no idle-in-tx concept) and exits
0. The flag is a safe no-op there.

--json emits structured output: {status, blockers: [...]}.
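A sketch of how the pre-flight query and the `--locks` output could fit together. The SQL uses standard `pg_stat_activity` columns (`pid`, `state`, `state_change`, `query_start`, `query`); the `Blocker` shape, the injected `runQuery` function (the real helper takes the engine), the 80-character truncation, and `renderLocksReport` are assumptions for illustration.

```typescript
interface Blocker {
  pid: number;
  state: string;
  query_start: string | null;
  query: string;
}

// Sessions that have been idle inside an open transaction for over 5 minutes.
const IDLE_BLOCKERS_SQL = `
  SELECT pid, state, query_start::text AS query_start, query
  FROM pg_stat_activity
  WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
    AND state_change < now() - interval '5 minutes'
    AND pid <> pg_backend_pid()`;

// Never throws: a failing diagnostic query should not mask the real problem.
async function getIdleBlockers(
  runQuery: (sql: string) => Promise<Blocker[]>,
): Promise<Blocker[]> {
  try {
    return await runQuery(IDLE_BLOCKERS_SQL);
  } catch {
    return [];
  }
}

// Human-readable report plus exit code: 1 if any blockers, 0 if clean.
function renderLocksReport(blockers: Blocker[]): { text: string; exitCode: number } {
  if (blockers.length === 0) {
    return { text: "No idle-in-transaction blockers found.", exitCode: 0 };
  }
  const sections = blockers.map((b) => {
    const q = b.query.length > 80 ? `${b.query.slice(0, 80)}...` : b.query;
    return [
      `PID ${b.pid} (${b.state}) since ${b.query_start ?? "unknown"}`,
      `  query: ${q}`,
      `  fix:   SELECT pg_terminate_backend(${b.pid});`,
    ].join("\n");
  });
  return { text: sections.join("\n"), exitCode: 1 };
}
```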

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: migration hardening regression guards (unit + E2E)

test/migrate.test.ts — 10 new regression guards:
- LATEST_VERSION equals max(versions) under any array order. Guards against
  regression to array[-1] (the field report's "told I'm at v16 while 7
  migrations behind" bug).
- getIdleBlockers shape: pglite returns [], postgres returns rows, query
  failure returns [] (not throw).
- 57014 catch path: mocked engine throws err.code='57014', assert the 4-part
  diagnostic hits stderr with what/why/fix/verify markers.
- apply-migrations pre-flight warning structural check.
- setSessionDefaults DRY check: helper defined once in db.ts, postgres-engine
  calls it, neither path inlines the SET.
- runMigrationSQL reserved-connection usage structural check.
- Migration 21 test updates for engine-split sqlFor (codex restructure).
- Migration 23 atomic-transaction assertion.

test/e2e/migrate-chain.test.ts (new): 11 E2E tests against real Postgres:
- Post-chain schema invariants (composite UNIQUE exists, old pages_slug_key
  gone, files_page_slug_fkey gone, files.page_id column present,
  file_migration_ledger table populated).
- doctor --locks real-PG integration (second connection + BEGIN + idle,
  assert the PID appears in pg_stat_activity).
- runMigrationsUpTo advances config.version to target, not past.
- withReservedConnection round-trip (executes queries, session GUC visible
  inside callback).

test/e2e/helpers.ts: new runMigrationsUpTo(engine, targetVersion) and
setConfigVersion(version) helpers. The v15→v23 chain E2E needed a way to stop
at intermediate schema versions; neither `gbrain init --migrate-only` nor the
existing setupDB() supported this. Codex caught that the proposed E2E wasn't
implementable without new harness work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v0.18.2)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): rewrite v0.18.2 entry to match gstack CLAUDE.md format

Applied the gstack CHANGELOG style rules from ~/git/gstack/CLAUDE.md:

- Two-line bold headline lands a verdict, not a feature list.
- Single coherent lead story instead of "Second headline... Third headline..."
- "The numbers that matter" table with BEFORE / AFTER / Δ columns, counted
  against the v0.18.0 field report (the concrete source).
- "What this means for your workflow" closing paragraph with the 4-command
  recovery path.
- TODOS.md references removed from user-facing body (explicit rule: never
  mention TODOS, internal tracking, or contributor-facing details in the
  user-read portion).
- Contributor-only detail (helper extraction, test file paths, interface
  specifics) moved to a "For contributors" subsection.
- Itemized changes reorganized as Added / Changed / Fixed / For contributors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): v0.18.2 voice-rule audit — headline, em dashes

Audit against ~/git/gstack/CLAUDE.md voice rules:

- Headline tightened from 32 words to 19 (the rule says 10-14; the repo's
  v0.18.1 headline was 22, so 19 is closer).
- Em dashes removed from 7 lines. Replaced with commas, colons, or periods
  per the "no em dashes" rule.
- AI vocabulary audit: clean.
- Banned phrases audit: clean.

Content unchanged. Only voice/punctuation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: root <root@localhost>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-23 10:39:28 -07:00

TODOS

check-resolvable

File tracking issues for Checks 5 + 6 (deferred in PR #325)

Priority: P2

What: src/commands/check-resolvable.ts currently points DEFERRED[].issue at GitHub issue search URLs (?q=TBD-check-5, ?q=TBD-check-6). File real tracking issues and grep-replace both placeholders with the real URLs.

Why: v0.16.4 shipped gbrain check-resolvable with 4 of the 6 checks from the original spec. Checks 5 (trigger routing eval) and 6 (brain filing) were explicitly deferred during plan-ceo-review because they each need new detection logic. The CLI's deferred[] JSON field is meant to surface these to agents so they know the coverage boundary — the TBD placeholders do the right thing mechanically but aren't clickable.

How:

gh issue create -t "check-resolvable Check 5: trigger routing eval" -b "..." — detection: every skill's own frontmatter trigger should match the RESOLVER.md entry pointing at that skill. Needs new issue type (e.g. mis_route).
gh issue create -t "check-resolvable Check 6: brain filing validation" -b "..." — detection: scan SKILL.md body for brain paths (e.g., brain/people/, brain/companies/), cross-reference with skills/_brain-filing-rules.md. Flag mutating skills missing entries.
Replace TBD-check-5 and TBD-check-6 in src/commands/check-resolvable.ts with the real issue URLs.

Effort: ~15 min mechanical (issue filing + grep-replace). Implementation of the checks themselves is a separate, larger piece of work — the TODO here is just the issue filing + URL swap.

P1 (BrainBench v1.1 — categories deferred from PR #188)

BrainBench Cat 5: Source Attribution / Provenance

What: Eval that gbrain correctly cites the right page when claiming fact F, and resolves source-conflict cases (e.g., 3 sources disagree on a $5M raise: which one wins?). 200 queries across citation/provenance/conflict sub-categories on a 300-entity dataset with deliberately-conflicting sources.

Why deferred from PR #188: Needs ~$100-200 of Opus tokens to generate the conflict-graph dataset. v1 scope was procedural-only.

Threshold: citation_recall > 90%, citation_precision > 85%, conflict_resolution > 70%.
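One plausible way to score the two citation thresholds, treating each query's expected and cited page slugs as sets. The metric definitions (set overlap, micro-averaged across queries) and the `CitationCase` shape are assumptions; the TODO only names the metrics and their cutoffs.

```typescript
interface CitationCase {
  expected: string[]; // slugs that should be cited for the fact
  cited: string[];    // slugs the system actually cited
}

// Micro-averaged citation precision and recall across all queries.
function citationScores(cases: CitationCase[]): { precision: number; recall: number } {
  let truePositives = 0;
  let citedTotal = 0;
  let expectedTotal = 0;
  for (const c of cases) {
    const expected = new Set(c.expected);
    const cited = new Set(c.cited);
    for (const slug of cited) if (expected.has(slug)) truePositives++;
    citedTotal += cited.size;
    expectedTotal += expected.size;
  }
  return {
    precision: citedTotal === 0 ? 0 : truePositives / citedTotal,
    recall: expectedTotal === 0 ? 0 : truePositives / expectedTotal,
  };
}

// Cutoffs from the Threshold line above (strict inequalities).
function meetsCitationThresholds(s: { precision: number; recall: number }): boolean {
  return s.recall > 0.9 && s.precision > 0.85;
}
```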

Depends on: Identity Resolution (Cat 3) shipped — uses same world generator pattern.

BrainBench Cat 6: Auto-link Precision under Prose (at scale)

What: Cat 10 (Robustness/Adversarial) covered code-fence leak and false-positive substrings on 22 hand-crafted cases. v1.1 extends this to 500+ prose-heavy pages with realistic narrative noise. Tests link precision in the wild, not just edge cases.

Why deferred from PR #188: Needs prose-heavy generated corpus (~$100-150 Opus). Existing 22-case eval already caught + fixed the code-fence leak bug.

Threshold: link_precision > 95% on prose, type_accuracy > 80% on varied phrasing.

BrainBench Cat 8: Skill Behavior Compliance

What: Replays 100 inbound signals through a real LLM agent loop with gbrain skills loaded. Measures: brain-first lookup compliance, back-link iron-law adherence, citation format compliance, tier escalation correctness.

Why deferred: Needs real LLM API loop (~$2K total — most expensive single category).

Threshold: brain_first_compliance > 95%, back_link_compliance > 90%, citation_format > 95%.

BrainBench Cat 9: End-to-End Workflows

What: 50 end-to-end scenarios across meeting ingestion, email-to-brain, daily-task-prep, briefing generation, sync cycle. Rubric-graded (10-15 criteria each).

Why deferred: Needs LLM agent loop (~$1K). Plus 50 hand-built rubrics.

Threshold: 80% scenario pass rate per workflow.

BrainBench: Multimodal Ingestion

What: PDF/image/audio/video ingestion accuracy. 50 PDFs, 30 images, 20 audio files, 10 videos, 30 HTML pages. Per-modality recall and fidelity metrics.

Why deferred: Needs licensed real datasets (Common Voice for audio etc.). Dataset curation is the bulk of the work.

Threshold: PDF text fidelity > 95% (text-based) / > 80% (scanned), audio WER < 15%, entity_recall > 80% post-ingestion.
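The audio threshold is stated as word error rate. The standard definition is the word-level edit distance divided by the reference length, WER = (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference words. A minimal version follows; whitespace tokenization and lowercasing are simplifying assumptions, since real scorers usually also normalize punctuation and numbers.

```typescript
// Word error rate: Levenshtein distance over words / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) throw new Error("reference must contain at least one word");

  // prev[j] = edit distance between the first i-1 ref words and first j hyp words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;      // reference word missing from hypothesis
      const insertion = curr[j - 1] + 1; // extra word in hypothesis
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Under this definition, "audio WER < 15%" means fewer than 15 word errors per 100 reference words.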

BrainBench Cat 1+2 at full scale

What: Existing benchmark-search-quality.ts (29 pages, 20 queries) and benchmark-graph-quality.ts (80 pages, 5 queries) currently pass at small scale. v1.1 extends both to 2-3K rich-prose pages generated via Opus to surface scale-dependent failures (tied keyword clusters, hub-node fan-out, prose-noise extraction precision).

Why deferred from PR #188: Needs ~$200-300 of Opus tokens for the rich corpus. The 80-page version already proves algorithmic correctness; scale-up proves it survives real-world load.

Threshold: maintain v1 metrics at 30x scale.

v0.10.4: inferLinkType prose precision fix

Shipped in PR #188. BrainBench Cat 2 rich-corpus type accuracy went from 70.7% → 88.5%. The fix:

Widened the verb regexes (added "led the seed/Series A", "early investor", "invests in", "portfolio company", etc.).
Tightened ADVISES_RE to require explicit advisor rooting, since a generic "board member" matches investors too.
Widened the context window from 80 to 240 chars.
Added a person-page role prior: partner-bio language → invested_in, for outbound company refs only.

Per-type accuracy after the fix: invested_in 91.7% (was 0%), mentions 100%, attended 100%. works_at (58%) and advises (41%) are the residuals for the next iteration.
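The context-window approach can be sketched as below. This is not gbrain's `inferLinkType`: the regexes are simplified stand-ins built from the phrases quoted above, and the signature, rule order, `mentions` fallback, and applying the 240-char window on each side of the mention are all assumptions.

```typescript
type LinkType = "invested_in" | "advises" | "works_at" | "attended" | "mentions";

// Simplified, hypothetical patterns; the real ones are broader.
const INVESTED_IN_RE = /\b(led the (seed|series [a-z])|early investor|invests? in|portfolio company)\b/i;
// Requires explicit advisor rooting: a bare "board member" does NOT match.
const ADVISES_RE = /\badvis(or|er|es|ing)\b/i;
const WORKS_AT_RE = /\b(works? at|joined|employed at|engineer at)\b/i;
const ATTENDED_RE = /\b(attended|graduated from|studied at)\b/i;

const WINDOW = 240; // assumed: characters of context on each side of the mention

function inferLinkType(text: string, mentionIndex: number): LinkType {
  const start = Math.max(0, mentionIndex - WINDOW);
  const context = text.slice(start, mentionIndex + WINDOW);
  // First matching rule wins, so order encodes precedence.
  if (INVESTED_IN_RE.test(context)) return "invested_in";
  if (ADVISES_RE.test(context)) return "advises";
  if (WORKS_AT_RE.test(context)) return "works_at";
  if (ATTENDED_RE.test(context)) return "attended";
  return "mentions";
}
```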

v0.10.5: inferLinkType residuals (works_at, advises)

What: After the v0.10.4 fix, two link types still underperform on rich prose. The goal is to drive both to >85% type accuracy in the next iteration.

works_at: 58% type accuracy. Engineer/employee pages use varied phrasings the regex doesn't catch ("spent some time at", "joined the team", or a narrative "is currently at" with no action verb). Approach: extend WORKS_AT_RE; consider an employee-role page prior similar to the partner prior.

advises: 41% type accuracy. Advisor pages often describe board roles without using the word "advisor" explicitly ("on Beta Health's board", "joined Beta as a board member"). The v0.10.4 fix tightened ADVISES_RE to require "advisor" rooting to avoid false positives from investors. Need a tighter signal that distinguishes "advisor on board" from "investor on board" — likely an advisor-role page prior plus verb-pattern combinations.

Threshold: Cat 2 rich-prose type accuracy > 92% (currently 88.5%).

v0.10.4: gbrain alias resolution feature (driven by Cat 3)

What: Add an alias table to gbrain so "Sarah Chen" / "S. Chen" / "@schen" / "sarah.chen@example.com" resolve to one canonical entity. Schema: aliases (id, slug, alias_text) with a unique index. Search blends alias matches into hybrid scoring.
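A minimal in-memory sketch of the alias idea, with a `Map` keyed on normalized `alias_text` standing in for the unique index. The normalization rule (trim, lowercase, strip a leading `@`), the `AliasTable` class, and the example slug are assumptions; in gbrain this would be a database table, and matches would feed hybrid scoring rather than return a single slug.

```typescript
// Normalizes alias text so "@SChen" and "schen" collide in the unique index.
function normalizeAlias(text: string): string {
  return text.trim().toLowerCase().replace(/^@/, "");
}

class AliasTable {
  // normalized alias_text -> canonical slug (the Map key acts as the unique index)
  private byText = new Map<string, string>();

  add(slug: string, aliasText: string): void {
    const key = normalizeAlias(aliasText);
    const existing = this.byText.get(key);
    if (existing !== undefined && existing !== slug) {
      throw new Error(`alias "${aliasText}" already maps to ${existing}`);
    }
    this.byText.set(key, slug);
  }

  resolve(text: string): string | undefined {
    return this.byText.get(normalizeAlias(text));
  }
}
```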

Why: BrainBench Cat 3 measured 31% recall on undocumented aliases — that's the v0.10.x baseline. With alias table, should jump to 80%+.

Depends on: Cat 3 baseline (shipped in PR #188).

P1

Minions shell jobs — Phase 2 scheduling (deferred from v0.13.0)

What: minion_schedules table + autopilot-cycle scanner that submits due shell jobs.

Why: v0.13.0 moves shell scripts to Minions but still leaves scheduling in the host crontab. Your OpenClaw's scripts/service-manager.sh + crontab is the only piece left on the host side. A DB-driven scheduler would mean a single gbrain autopilot --install replaces the host crontab entirely, scheduling is visible via gbrain jobs list --scheduled, and downtime-on-one-machine tolerance improves (schedule is shared DB state, not per-host crontab).

Pros: Canonical host-agnostic deployment. No more host-specific crontab.

Cons: Cross-engine migration complexity (new table on both PGLite + Postgres). Autopilot-cycle scanner needs to handle missed-schedule semantics (fire-once-on-startup or skip-if-past-now), and this is where every other cron-like system has historically accrued bugs.
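The missed-schedule choice is the subtle part. Below is a sketch of the per-row decision a scanner could make, assuming a hypothetical row with a precomputed `nextRunAt`, a fixed `intervalMs` standing in for a cron expression, and a per-schedule `onMissed` policy; none of these field names come from the TODO, and real cron evaluation (DST, irregular schedules) would come from a library.

```typescript
type MissedPolicy = "fire-once" | "skip";

interface ScheduleRow {
  name: string;
  nextRunAt: Date;    // next scheduled fire time (hypothetical column)
  intervalMs: number; // fixed interval standing in for a cron expression
  onMissed: MissedPolicy;
}

interface ScanResult {
  submit: boolean; // submit one job on this scan?
  nextRunAt: Date; // value to write back to the row
}

function scanSchedule(row: ScheduleRow, now: Date): ScanResult {
  if (row.nextRunAt > now) return { submit: false, nextRunAt: row.nextRunAt };

  // How many fire times are at or before `now`? 1 = simply due; more = the
  // scanner was down. Computed arithmetically so long outages don't loop.
  const lateBy = now.getTime() - row.nextRunAt.getTime();
  const missed = Math.floor(lateBy / row.intervalMs) + 1;
  const next = new Date(row.nextRunAt.getTime() + missed * row.intervalMs);

  // Never fire more than once per scan, whatever the policy.
  const submit = missed === 1 || row.onMissed === "fire-once";
  return { submit, nextRunAt: next };
}
```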

Depends on: v0.13.0 shell jobs shipped. ✅

`gbrain crontab-to-minions <file>` migration helper (deferred from v0.13.0)

What: Parse an existing crontab file, emit a proposed rewrite using gbrain jobs submit shell ... for each deterministic entry, keep LLM-requiring entries as-is.

Why: Hand-rewriting ~14 OpenClaw cron entries is error-prone and one-shot. A helper would make the migration reversible and auditable (diff the before/after crontab, dry-run the first N, commit).

Pros: Removes the "rewrite 14 lines by hand" tax every agent operator pays on adoption.

Cons: Crontab parsing is historically fiddly (5-field vs 6-field, @hourly aliases, Vixie extensions, env vars in crontab). Could misrewrite entries with shell substitution.
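A conservative first pass could classify each crontab line before attempting any rewrite. The sketch below handles blank lines, comments, `NAME=value` environment lines, `@hourly`-style aliases, and 5-field entries, and assumes a user crontab (no user column, so 6-field system crontabs are out of scope). Commands containing `%`, backticks, or `$(` are flagged for human review rather than rewritten. The `CronLine` shape and the review rule are assumptions.

```typescript
type CronLine =
  | { kind: "blank" | "comment" | "unparsed"; raw: string }
  | { kind: "env"; name: string; value: string }
  | { kind: "job"; schedule: string; command: string; needsReview: boolean };

const SCHEDULE_ALIASES = new Set([
  "@reboot", "@yearly", "@annually", "@monthly",
  "@weekly", "@daily", "@midnight", "@hourly",
]);

// Parses one line of a *user* crontab: 5 schedule fields, no user column.
function parseCrontabLine(raw: string): CronLine {
  const line = raw.trim();
  if (line === "") return { kind: "blank", raw };
  if (line.startsWith("#")) return { kind: "comment", raw };

  const env = /^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
  if (env) return { kind: "env", name: env[1], value: env[2] };

  let schedule: string;
  let command: string;
  const alias = /^(@[a-z]+)\s+(.+)$/.exec(line);
  const fields = /^(\S+\s+\S+\s+\S+\s+\S+\s+\S+)\s+(.+)$/.exec(line);
  if (alias) {
    if (!SCHEDULE_ALIASES.has(alias[1])) return { kind: "unparsed", raw };
    schedule = alias[1];
    command = alias[2];
  } else if (fields) {
    schedule = fields[1];
    command = fields[2];
  } else {
    return { kind: "unparsed", raw };
  }
  // cron turns an unescaped `%` into a newline, and shell substitutions may
  // not survive a naive rewrite, so flag these for a human.
  const needsReview = /%|`|\$\(/.test(command);
  return { kind: "job", schedule, command, needsReview };
}
```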

Depends on: v0.13.0 shell jobs shipped. ✅

Batch the DB-source extract read path (deferred from v0.12.1)

What: extractLinksFromDB and extractTimelineFromDB at src/commands/extract.ts:447, 504 issue one engine.getPage(slug) per slug after engine.getAllSlugs(). On a 47K-page brain that's still 47K serial reads over the Supabase pooler.

Why: v0.12.1 fixed the write-side N+1 with batched INSERTs (~100x fewer round-trips). The read side still does serial getPage() calls — each fetches compiled_truth + timeline + frontmatter (tens of KB per page). On a 47K-page Supabase brain that's ~10-20 minutes of read latency before any work happens. The v0.12.0 orchestrator's backfill uses --source db, so this stays slow until fixed.

Pros: Mirrors the write-side fix on the read path. Combined with batched writes, full re-extract on a 47K-page brain should drop from "minutes" to "seconds" end-to-end. Eliminates the implicit listPages-pagination-mutation learning risk by giving you a snapshot read.

Cons: New engine method (getPagesBatch(slugs: string[]) → Promise<Page[]> or a streaming cursor) needs to land on both PGLite and Postgres. Memory budget — a 47K-page brain with ~30KB/page is ~1.4GB if loaded all at once; needs chunked iteration (e.g., 500 slugs/query, stream-process).

Context: Codex's plan-time review and the testing/performance specialists at ship time both flagged this. Filed during v0.12.1 to ship the bug fix without scope creep. Approach: add getPagesBatch(slugs) returning chunked results, then update the 4 DB-source extract paths to consume it.
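The chunked-iteration shape could look like the async generator below, which hands the caller at most 500 pages at a time so memory is bounded by the chunk rather than the whole brain. `getPagesBatch` is the proposed method and does not exist yet; the `PageBatchEngine` interface, the `Page` fields, and `getAllSlugs()` returning an array are placeholders.

```typescript
interface Page {
  slug: string;
  compiled_truth: string;
}

// Hypothetical engine surface for the proposed batch read.
interface PageBatchEngine {
  getAllSlugs(): Promise<string[]>;
  getPagesBatch(slugs: string[]): Promise<Page[]>;
}

// One query per chunk instead of one getPage() round-trip per slug.
async function* iteratePagesInChunks(
  engine: PageBatchEngine,
  chunkSize = 500,
): AsyncGenerator<Page[]> {
  const slugs = await engine.getAllSlugs();
  for (let i = 0; i < slugs.length; i += chunkSize) {
    yield await engine.getPagesBatch(slugs.slice(i, i + chunkSize));
  }
}
```

At the TODO's ~30KB/page, a 500-slug chunk is about 15 MB in flight, versus ~1.4 GB for all 47K pages at once.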

Depends on: v0.12.1 ships first.

Batch embedding queue across files

What: Shared embedding queue that collects chunks from all parallel import workers and flushes to OpenAI in batches of 100, instead of each worker batching independently.

Why: With 4 workers importing files that average 5 chunks each, you get 4 concurrent OpenAI API calls with small batches (5-10 chunks). A shared queue would batch 100 chunks across workers into one API call, cutting embedding cost and latency roughly in half.

Pros: Fewer API calls (500 chunks = 5 calls instead of ~100), lower cost, faster embedding.

Cons: Adds coordination complexity: backpressure when queue is full, error attribution back to source file, worker pausing. Medium implementation effort.

Context: Deferred during eng review because per-worker embedding is simpler and the parallel workers themselves are the bigger speed win (network round-trips). Revisit after profiling real import workloads to confirm embedding is actually the bottleneck. If most imports use --no-embed, this matters less.

Implementation sketch: src/core/embedding-queue.ts with a Promise-based semaphore. Workers await queue.submit(chunks) which resolves when the queue has room. Queue flushes to OpenAI in batches of 100 with max 2-3 concurrent API calls. Track source file per chunk for error propagation.
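One hedged reading of that sketch in code: workers call `submit()` and get back one vector per chunk, full batches of 100 go out immediately with at most `maxConcurrent` calls in flight, and each chunk remembers its source file so a failed call can be attributed. Two deliberate departures, both assumptions: `submit()` resolves when the vectors arrive (not merely when the queue has room), and a short linger timer flushes partial batches so a lone worker is never stuck waiting for 100 chunks. The `embed` callback stands in for the OpenAI request; class and parameter names are hypothetical.

```typescript
type EmbedFn = (texts: string[]) => Promise<number[][]>;

interface PendingChunk {
  text: string;
  sourceFile: string;
  resolve: (vector: number[]) => void;
  reject: (error: Error) => void;
}

class EmbeddingQueue {
  private pending: PendingChunk[] = [];
  private inFlight = 0;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private lingerExpired = false;

  constructor(
    private readonly embed: EmbedFn,
    private readonly batchSize = 100,
    private readonly maxConcurrent = 2,
    private readonly lingerMs = 50,
  ) {}

  // Resolves with one vector per text, in order, once its batch is embedded.
  submit(sourceFile: string, texts: string[]): Promise<number[][]> {
    const results = texts.map(
      (text) =>
        new Promise<number[]>((resolve, reject) => {
          this.pending.push({ text, sourceFile, resolve, reject });
        }),
    );
    this.pump();
    return Promise.all(results);
  }

  private pump(): void {
    while (this.inFlight < this.maxConcurrent && this.pending.length > 0) {
      // Full batches go out at once; a partial batch waits for the linger timer.
      if (this.pending.length < this.batchSize && !this.lingerExpired) break;
      this.send(this.pending.splice(0, this.batchSize));
    }
    if (this.pending.length === 0) {
      this.lingerExpired = false;
    } else if (!this.lingerExpired && this.timer === undefined) {
      this.timer = setTimeout(() => {
        this.timer = undefined;
        this.lingerExpired = true;
        this.pump();
      }, this.lingerMs);
    }
  }

  private send(batch: PendingChunk[]): void {
    this.inFlight++;
    Promise.resolve()
      .then(() => this.embed(batch.map((c) => c.text)))
      .then(
        (vectors) => batch.forEach((c, i) => c.resolve(vectors[i])),
        (err: unknown) => {
          // Attribute the failure to every source file in the batch.
          const files = Array.from(new Set(batch.map((c) => c.sourceFile))).join(", ");
          const error = new Error(`embedding failed for ${files}: ${String(err)}`);
          batch.forEach((c) => c.reject(error));
        },
      )
      .finally(() => {
        this.inFlight--;
        this.pump();
      });
  }
}
```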

Depends on: Part 5 (parallel import with per-worker engines) -- already shipped.

P0

Fix `bun build --compile` WASM embedding for PGLite

What: Submit PR to oven-sh/bun fixing WASM file embedding in bun build --compile (issue oven-sh/bun#15032).

Why: PGLite's WASM files (~3MB) can't be embedded in the compiled binary. Users who install via bun install -g gbrain are fine (WASM resolves from node_modules), but the compiled binary can't use PGLite. Jarred Sumner (Bun founder, YC W22) would likely be receptive.

Pros: Single-binary distribution includes PGLite. No sidecar files needed.

Cons: Requires understanding Bun's bundler internals. May be a large PR.

Context: Issue has been open since Nov 2024. The root cause is that bun build --compile generates virtual filesystem paths (/$bunfs/root/...) that PGLite can't resolve. Multiple users have reported this. A fix would benefit any WASM-dependent package, not just PGLite.

Depends on: PGLite engine shipping (to have a real use case for the PR).

ChatGPT MCP support (OAuth 2.1)

What: Add OAuth 2.1 with Dynamic Client Registration to the self-hosted MCP server so ChatGPT can connect.

Why: ChatGPT requires OAuth 2.1 for MCP connectors. Bearer token auth is NOT supported. This is the only major AI client that can't use GBrain remotely.

Pros: Completes the "every AI client" promise. ChatGPT has the largest user base.

Cons: OAuth 2.1 is a significant implementation: authorization endpoint, token endpoint, PKCE flow, dynamic client registration. Estimated CC: ~3-4 hours.
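Of the pieces listed, PKCE is the most mechanical. OAuth 2.1 builds it into the authorization-code flow, and its `S256` method (RFC 7636) sends base64url(SHA-256(code_verifier)), unpadded, as the `code_challenge`. A minimal sketch using the built-in `node:crypto` module (available in Node and Bun); the helper names are illustrative.

```typescript
import { Buffer } from "node:buffer";
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// RFC 7636: 43-128 characters from the unreserved set [A-Za-z0-9-._~].
const VERIFIER_RE = /^[A-Za-z0-9\-._~]{43,128}$/;

// Client side: 32 random bytes encode to a 43-character base64url string.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 transform: BASE64URL(SHA256(ASCII(code_verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Token-endpoint side: recompute the challenge and compare in constant time.
function verifyPkce(verifier: string, storedChallenge: string): boolean {
  if (!VERIFIER_RE.test(verifier)) return false;
  const expected = Buffer.from(codeChallengeS256(verifier));
  const actual = Buffer.from(storedChallenge);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```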

Context: Discovered during DX review (2026-04-10). All other clients (Claude Desktop/Code/Cowork, Perplexity) work with bearer tokens. The Edge Function deployment was removed in v0.8.0. OAuth needs to be added to the self-hosted HTTP MCP server (or gbrain serve --http when implemented).

Depends on: gbrain serve --http (not yet implemented).

Runtime MCP access control

What: Add sender identity checking to MCP operations. Brain ops return filtered data based on access tier (Full/Work/Family/None).

Why: ACCESS_POLICY.md is prompt-layer enforcement (agent reads policy before responding). A direct MCP caller can bypass it. Runtime enforcement in the MCP server is the real security boundary for multi-user and remote deployments.

Pros: Real security boundary. ACCESS_POLICY.md becomes enforceable, not advisory.

Cons: Requires adding sender_id or access_tier to OperationContext. Each mutating operation needs a permission check. Medium implementation effort.
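A sketch of the per-operation gate once `OperationContext` carries a tier. For simplicity it treats the tiers as a linear ladder (Full > Work > Family > None) and requires Full for every mutating operation; both are assumptions, and a real policy might make Work and Family disjoint and filter returned data per tier rather than only allow or deny.

```typescript
type AccessTier = "full" | "work" | "family" | "none";

// Assumed linear ordering of tiers.
const TIER_RANK: Record<AccessTier, number> = { none: 0, family: 1, work: 2, full: 3 };

interface OperationContext {
  sender_id: string;
  access_tier: AccessTier;
}

interface OperationSpec {
  name: string;
  mutating: boolean;
  minReadTier: AccessTier; // lowest tier allowed to call a read operation
}

class AccessDenied extends Error {}

// Throws unless the caller's tier is high enough for this operation.
function checkAccess(ctx: OperationContext, op: OperationSpec): void {
  const required: AccessTier = op.mutating ? "full" : op.minReadTier;
  if (TIER_RANK[ctx.access_tier] < TIER_RANK[required]) {
    throw new AccessDenied(`${ctx.sender_id} (${ctx.access_tier}) may not call ${op.name}`);
  }
}
```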

Context: From CEO review + Codex outside voice (2026-04-13). Prompt-layer access control works in practice (same model as Garry's OpenClaw) but is not sufficient for remote MCP where direct tool calls bypass the agent's prompt.

Depends on: v0.10.0 GStackBrain skill layer (shipped).

P1 (new from v0.7.0)

Constrained health_check DSL for third-party recipes

Completed: v0.9.3 (2026-04-12). Typed DSL with 4 check types (http, env_exists, command, any_of). All 7 first-party recipes migrated. String health checks are still accepted, with a deprecation warning plus shell-metacharacter validation for non-embedded recipes.
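The four check types map naturally onto a discriminated union. The field names (`url`, `name`, `argv`, `checks`) and the injected probes below are assumptions rather than the shipped schema; the point is that a typed DSL can be evaluated without ever handing a string to a shell.

```typescript
type HealthCheck =
  | { type: "http"; url: string }
  | { type: "env_exists"; name: string }
  | { type: "command"; argv: string[] }
  | { type: "any_of"; checks: HealthCheck[] };

// Side effects are injected so the evaluator stays pure and testable.
interface Probes {
  env: Record<string, string | undefined>;
  httpOk(url: string): Promise<boolean>;
  commandOk(argv: string[]): Promise<boolean>; // runs argv directly, no shell
}

async function evaluate(check: HealthCheck, probes: Probes): Promise<boolean> {
  switch (check.type) {
    case "http":
      return probes.httpOk(check.url);
    case "env_exists":
      return probes.env[check.name] !== undefined;
    case "command":
      return probes.commandOk(check.argv);
    case "any_of":
      // Short-circuits on the first passing sub-check.
      for (const inner of check.checks) {
        if (await evaluate(inner, probes)) return true;
      }
      return false;
    default: {
      const unreachable: never = check;
      throw new Error(`unknown health check: ${JSON.stringify(unreachable)}`);
    }
  }
}
```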

P1 (new from v0.11.0 — Minions)

Per-queue rate limiting for Minions

What: Token-bucket rate limiting per queue via a new minion_rate_limits table (queue, capacity, refill_rate, tokens, updated_at), with acquire/release in claim().

Why: The #1 daily OpenClaw pain is spawn storms hitting OpenAI/Anthropic rate limits. max_children caps fan-out per parent, but a queue with 50 ready jobs will still slam the API. Every Minions consumer currently reinvents token-bucket in user code.

Pros: First-class rate limiting means no consumer has to roll their own. Composes with max_children (which is per-parent) to give two orthogonal throttles.

Cons: Adds a write hotspot on the rate-limit row. Mitigate by keeping it a simple UPDATE ... WHERE tokens > 0 RETURNING that fails fast and puts the claim back in the pool.
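The bucket arithmetic those columns imply, sketched in memory: refill `tokens` by `refill_rate` per second since `updated_at`, cap at `capacity`, then take one token or fail fast. Tokens-per-second units and the function name are assumptions. In Postgres the same computation would sit inside the single conditional `UPDATE ... RETURNING` described above, so two concurrent claimers cannot both spend the last token.

```typescript
interface RateLimitRow {
  queue: string;
  capacity: number;
  refill_rate: number; // tokens per second (assumed unit)
  tokens: number;
  updated_at: number;  // epoch milliseconds
}

// Returns the updated row if a token was taken, or null to fail fast
// (the caller then puts the claim back in the pool).
function tryAcquire(row: RateLimitRow, nowMs: number): RateLimitRow | null {
  const elapsedSec = Math.max(0, nowMs - row.updated_at) / 1000;
  const refilled = Math.min(row.capacity, row.tokens + elapsedSec * row.refill_rate);
  if (refilled < 1) return null;
  return { ...row, tokens: refilled - 1, updated_at: nowMs };
}
```

On failure nothing is written, which is safe: the next attempt recomputes the refill from the unchanged `updated_at`.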

Effort: ~2 hours. Deferred from v0.11.0 to keep the parity PR at a reviewable size.

Depends on: Minions (shipped in v0.11.0).

Minions repeat/cron scheduler

What: BullMQ-style repeatable jobs. queue.add(name, data, { repeat: { cron: '0 * * * *' } }).

Why: Idempotency keys (shipped in v0.11.0) are the foundation. Consumers currently use launchd/cron to fire gbrain jobs submit, but a native scheduler inside the worker would be cleaner and portable across deployments.

Pros: One mental model for both immediate and scheduled work. Idempotency prevents double-fire.

Cons: Every cron library has edge cases (DST, missed intervals on worker restart). Use a battle-tested parser.

Effort: ~1 day.

Depends on: Idempotency keys (shipped in v0.11.0).

Minions worker event emitter

What: worker.on('job:completed', handler) / worker.on('job:failed', ...) instead of polling.

Why: Consumers currently poll getJob(id) to watch state changes. An event API is an ergonomic feature BullMQ has and Minions doesn't.
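A typed wrapper over Node's `EventEmitter` (also available in Bun) is one low-cost way to offer this. The event names come from the TODO; the payload shapes, the `WorkerEvents` class, and the `emitJobEvent` hook (which the worker would call wherever it already records a state transition) are assumptions.

```typescript
import { EventEmitter } from "node:events";

interface JobEvents {
  "job:completed": { jobId: number; name: string; result: unknown };
  "job:failed": { jobId: number; name: string; error: string; willRetry: boolean };
}

class WorkerEvents {
  private readonly emitter = new EventEmitter();

  // Subscribe with a typed payload; returns an unsubscribe function.
  on<K extends keyof JobEvents>(event: K, handler: (payload: JobEvents[K]) => void): () => void {
    this.emitter.on(event, handler);
    return () => {
      this.emitter.off(event, handler);
    };
  }

  // Called by the worker right after it persists the state change.
  emitJobEvent<K extends keyof JobEvents>(event: K, payload: JobEvents[K]): void {
    this.emitter.emit(event, payload);
  }
}
```

Avoiding an event literally named `error` sidesteps `EventEmitter`'s special case of throwing when no listener is attached.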

Effort: ~4 hours.

`waitForChildren(parent_id, n)` / `collectResults(parent_id)` helpers

What: Convenience wrappers over readChildCompletions for common fan-in patterns.

Why: The child_done inbox primitive shipped in v0.11.0. Now add the ergonomic API on top so orchestrators don't have to write the polling loop.
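The polling loop these helpers would hide might look like the sketch below. `readChildCompletions` exists per the note above, but its real signature is not shown here, so the injected function type, the `ChildCompletion` shape, and the poll/timeout options are assumptions. It also assumes each call returns every completion recorded so far, not only unread ones.

```typescript
interface ChildCompletion {
  childId: number;
  result: unknown;
}

type ReadChildCompletions = (parentId: number) => Promise<ChildCompletion[]>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until at least `n` children of `parentId` have reported completion.
async function waitForChildren(
  read: ReadChildCompletions,
  parentId: number,
  n: number,
  { pollMs = 1000, timeoutMs = 10 * 60_000 } = {},
): Promise<ChildCompletion[]> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const done = await read(parentId);
    if (done.length >= n) return done;
    if (Date.now() >= deadline) {
      throw new Error(`timed out: ${done.length}/${n} children of ${parentId} finished`);
    }
    await sleep(pollMs);
  }
}

// Results of all children that have completed so far.
async function collectResults(read: ReadChildCompletions, parentId: number): Promise<unknown[]> {
  return (await read(parentId)).map((c) => c.result);
}
```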

Effort: ~2 hours.

Depends on: child_done inbox primitive (shipped in v0.11.0).

P2

Orchestrator + runner double-write to migrations ledger (deferred from v0.18.2 codex review)

What: src/commands/migrations/v0_18_0.ts:200-208 appends an entry to ~/.gbrain/migrations/completed.jsonl while src/commands/apply-migrations.ts:374-386 also appends one for the same orchestrator run. The dedupe guard in src/core/preferences.ts:120-131 only suppresses duplicate complete entries, not partial entries. Result: distorted wedge counting (3-consecutive-partials-triggers-wedge logic sees 6 partials when it should see 3).
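To see why the double write inflates the count, here is a hypothetical wedge counter that counts trailing `partial` entries for one migration version. The `LedgerEntry` shape is an assumption, not the real completed.jsonl format.

```typescript
interface LedgerEntry {
  version: string;
  status: "complete" | "partial";
}

// Counts consecutive trailing "partial" entries for `version`, stopping at
// the most recent non-partial entry for that version.
function trailingPartials(entries: LedgerEntry[], version: string): number {
  let count = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    const e = entries[i];
    if (e.version !== version) continue;
    if (e.status !== "partial") break;
    count++;
  }
  return count;
}
```

With one writer, three failed runs produce three entries; with two writers appending per run, the same three runs produce six, which is why the fix removes an append site rather than changing the counter.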

Why: Codex plan-review caught this during PR #356 while verifying the two-migration-systems resume boundary. Not blocking v0.18.2 shipping because it only affects the wedge detection threshold, not correctness of the migration itself.

Fix: Pick one writer (prefer apply-migrations.ts runner as the single source of truth, remove the orchestrator-side append). Fold into feat/agent-migration-devex follow-up PR, which already touches both files for the migrate-command consolidation work.

Depends on: v0.18.2 shipped. ✅

22K-page resync is 30+ minutes on large brains (deferred from v0.18.2 codex review)

What: When a schema migration requires data backfill (e.g., computing page_id from page_slug across all files rows), src/commands/sync.ts:248-251, 311-337 iterates per-file. None of v0.18.2's hardening work shrinks this path. On a 22K-page brain the resync takes 30+ minutes; at 500K pages it would be several hours.

Why: Codex explicitly called out that none of PR #356 or the two follow-up PRs addresses the resync execution model. This is a separate performance-design problem.

Options to explore:

(a) Parallel page import via worker pool (Minions-based).
(b) Bulk COPY-based import replacing the per-file INSERT.
(c) Incremental resync that only rewrites changed rows (needs content hash or updated_at gating).
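Option (c) hinges on a cheap "did this change?" test. A sketch of content-hash gating, assuming a hypothetical stored hash per page (no such column is named above): hash the current content and rewrite only the slugs whose hash differs or is missing.

```typescript
import { createHash } from "node:crypto";

function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Slugs whose current content no longer matches the hash stored at last sync.
function changedSlugs(
  files: Map<string, string>,        // slug -> current file content
  storedHashes: Map<string, string>, // slug -> hash recorded at last sync
): string[] {
  const changed: string[] = [];
  for (const [slug, content] of files) {
    if (storedHashes.get(slug) !== contentHash(content)) changed.push(slug);
  }
  return changed;
}
```

This only pays off when most rows are unchanged; on its own it would not speed up a backfill that must touch every row.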

Priority: P2 now, upgrade to P1 if another heavy migration ships that needs backfill at this scale.

Depends on: v0.18.2 shipped. ✅

Minions: `gbrain jobs stats --orphaned` (deferred from v0.13.0)

What: New CLI flag / output column surfacing jobs that are waiting with no registered handler on any live worker.

Why: v0.13.0 adds shell jobs that require GBRAIN_ALLOW_SHELL_JOBS=1 on the worker. If an operator submits a shell job but no worker with the flag is running, the row sits in waiting silently. The CLI's starvation warning + docs help at submit time; this TODO surfaces the problem at operational-check time.

Pros: Closes the "did my cron actually run" ambiguity for multi-machine deployments.

Cons: Knowing "no worker has this handler registered" requires worker heartbeat tracking, which Minions doesn't have yet (it's stateless at DB level beyond lock_token). Could be approximated by "no jobs of this name have completed in last N minutes AND count of waiting is > 0."
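The approximation at the end of the Cons line can be expressed as a pure function over per-name stats. The `JobNameStats` shape, the 30-minute default window, and the function name are hypothetical; only the rule itself (no completions in the last N minutes and waiting > 0) comes from the TODO.

```typescript
interface JobNameStats {
  name: string;
  waiting: number;
  lastCompletedAt: Date | null; // most recent completion of any job with this name
}

// Names that have waiting jobs but no completion inside the window.
function likelyOrphaned(stats: JobNameStats[], now: Date, windowMinutes = 30): string[] {
  const cutoff = now.getTime() - windowMinutes * 60_000;
  return stats
    .filter(
      (s) =>
        s.waiting > 0 &&
        (s.lastCompletedAt === null || s.lastCompletedAt.getTime() < cutoff),
    )
    .map((s) => s.name);
}
```

It is a heuristic: a healthy handler stuck on one long-running job could also be flagged.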

Depends on: v0.13.0 shell jobs shipped. ✅

Minions: AbortReason plumbing on MinionJobContext (deferred from v0.13.0)

What: Handlers today can't distinguish whether ctx.signal.aborted fired due to timeout, cancel, or lock-loss. v0.13.0 derives this at worker-catch-time from abort.signal.reason, but the handler can't see it directly. Expose ctx.abortReason?: 'timeout' | 'cancel' | 'lock-lost' | 'shutdown' on the context.

Why: Shell handler's kill-sequence today can't decide "retry this" (lock-lost) vs "don't retry, user cancelled" (cancel) — they look the same. A typed AbortReason lets handlers make that decision for themselves.
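A sketch of the typed reason and the decision it would enable. It relies on the standard `AbortController.abort(reason)` / `AbortSignal.reason` API (Node 17.2+, Bun); the `abortReasonOf` helper and the example retry policy are assumptions, and the worker's existing retry/dead logic would stay authoritative.

```typescript
type AbortReason = "timeout" | "cancel" | "lock-lost" | "shutdown";

const REASONS: readonly AbortReason[] = ["timeout", "cancel", "lock-lost", "shutdown"];

// Narrows signal.reason (typed `any`) to a known reason, if it is one.
function abortReasonOf(signal: AbortSignal): AbortReason | undefined {
  if (!signal.aborted) return undefined;
  const reason: unknown = signal.reason;
  return REASONS.find((r) => r === reason);
}

// Example policy a shell handler might apply: retry when the job was
// interrupted by infrastructure, not when a user cancelled it.
function shouldRetry(reason: AbortReason | undefined): boolean {
  return reason === "lock-lost" || reason === "shutdown";
}
```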

Pros: Handlers get richer signals.

Cons: Small surface-area addition to the handler API. Not strictly required since the worker already makes the retry/dead decision for them.

Depends on: v0.13.0 shell jobs shipped. ✅

Minions: blocking-mode audit log for true forensic integrity (deferred from v0.13.0)

What: Opt-in mode for shell-audit where appendFileSync failures DO block submission instead of logging-and-continuing.

Why: v0.13.0 ships the audit log in best-effort mode, which means a disk-full attacker can silently disable the forensic trail. Acceptable for v0.13.0 because the primary use is operational ("what did this cron do last Tuesday"), not security forensics. Operators who want fail-closed semantics should have a flag.
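The two modes differ only in what happens when the append throws. A sketch using `appendFileSync` from `node:fs`; the `AuditMode` option, the function name, and the JSON-lines record shape are hypothetical.

```typescript
import { appendFileSync } from "node:fs";

type AuditMode = "best-effort" | "fail-closed";

// Appends one audit record. In fail-closed mode a write failure aborts the
// submission; in best-effort mode it is logged and the submission proceeds.
function auditShellSubmission(path: string, record: Record<string, unknown>, mode: AuditMode): void {
  const line = JSON.stringify({ ...record, at: new Date().toISOString() }) + "\n";
  try {
    appendFileSync(path, line);
  } catch (err) {
    if (mode === "fail-closed") {
      throw new Error(`audit log write failed; refusing shell submission: ${String(err)}`);
    }
    console.warn(`audit log write failed (continuing): ${String(err)}`);
  }
}
```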

Pros: Enables true forensic integrity for deployments that need it.

Cons: Fail-closed means a transient disk issue blocks shell submissions, which can be worse than a missing log line for most operators. Opt-in is the right shape but adds surface area.

Depends on: v0.13.0 shell jobs shipped. ✅

Minions: configurable per-job output buffer sizes (deferred from v0.13.0)

What: Add max_stdout_bytes / max_stderr_bytes to ShellJobParams; override the 64KB/16KB defaults.

Why: 64KB/16KB covers typical OpenClaw scripts today but a verbose benchmark or a debug-dump script could need more.
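Whatever the knob is called, enforcement is a byte-capped accumulator per stream. The sketch keeps the first `maxBytes` and counts what was dropped; keeping the head rather than the tail, reading KB as KiB, and the class name are assumptions. A cut can split a multi-byte UTF-8 character, which `toString()` renders as U+FFFD.

```typescript
import { Buffer } from "node:buffer";

const DEFAULT_MAX_STDOUT_BYTES = 64 * 1024;
const DEFAULT_MAX_STDERR_BYTES = 16 * 1024;

class CappedBuffer {
  private chunks: Buffer[] = [];
  private size = 0;
  droppedBytes = 0;

  constructor(private readonly maxBytes: number) {}

  // Keeps bytes until the cap is reached; counts everything after that.
  push(chunk: Buffer): void {
    const room = this.maxBytes - this.size;
    if (room <= 0) {
      this.droppedBytes += chunk.length;
      return;
    }
    const kept = chunk.length <= room ? chunk : chunk.subarray(0, room);
    this.chunks.push(kept);
    this.size += kept.length;
    this.droppedBytes += chunk.length - kept.length;
  }

  toString(): string {
    return Buffer.concat(this.chunks).toString("utf8");
  }
}
```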

Depends on: First shell-job author who actually needs it. Don't pre-build the flag.

Security hardening follow-ups (deferred from security-wave-3)

What: Close remaining security gaps identified during the v0.9.4 Codex outside-voice review that didn't make the wave's in-scope cut.

Why: Wave 3 closed 5 blockers + 4 mediums. These are the known residuals. Each is an independent hardening item that becomes trivial as Runtime MCP access control (P0 above) lands.

Items (each a separate small task):

DNS rebinding protection for HTTP health_checks. Current isInternalUrl validates the hostname string; DNS resolution happens later inside fetch. A malicious DNS server can return a public IP on first lookup and an internal IP on the actual request. Fix: resolve hostname via dns.lookup before fetch, pin the IP with a custom http.Agent lookup override, re-validate post-resolution. Alternative: use ssrf-req-filter library.
Extended IPv6 private-range coverage. Block fc00::/7 (Unique Local Addresses), fe80::/10 (link-local), 2002::/16 (6to4), 2001::/32 (Teredo), ::/128. Current code covers ::1, ::, and IPv4-mapped (::ffff:*) via hex hextet parsing.
IPv4 shorthand parsing. 127.1 (legacy 2-octet form = 127.0.0.1), 127.0.1 (3-octet), mixed-radix with trailing dots. Current code handles hex/octal/decimal integer-form IPs but not these shorthand variants.
Broader operation-layer limit caps. traverse_graph depth param, plus get_chunks, get_links, get_backlinks, get_timeline, get_versions, get_raw_data, resolve_slugs — all currently accept unbounded limit/depth. Wave 3 only clamped list_pages and get_ingest_log.
sync_brain repo path validation. The repo parameter accepts an arbitrary filesystem path. Same threat model as file_upload before wave 3. Add validateUploadPath (strict) for remote callers.
file_upload size limit. readFileSync loads the entire file into memory. Trivial memory-DoS from MCP. Add ~100MB cap (matches CLI's TUS routing threshold) and stream for larger files.
file_upload regular-file check. Reject directories, devices, FIFOs, Unix sockets via stat.isFile() before readFileSync.
Explicit confinement root (H2). file_upload strict mode currently uses process.cwd(). Move to ctx.config.upload_root (or derive from where the brain's schema lives) so MCP server cwd can't be the wrong anchor.
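For the IPv4 shorthand item, a common approach is to parse roughly the way `inet_aton` and the WHATWG URL parser do: 1 to 4 dot-separated parts, each decimal, octal (leading `0`), or hex (`0x`), with the last part filling all remaining bytes and one trailing dot allowed. The sketch normalizes to a 32-bit number, then checks a partial list of private ranges; the function names and range list are illustrative, and hostnames still need the separate resolve-then-pin check.

```typescript
// Parses one part: "0x" hex, leading-zero octal, or plain decimal.
function parsePart(part: string): number | null {
  if (/^0x[0-9a-f]*$/i.test(part)) return part.length === 2 ? 0 : parseInt(part.slice(2), 16);
  if (/^0[0-7]*$/.test(part)) return parseInt(part, 8);
  if (/^[1-9][0-9]*$/.test(part)) return parseInt(part, 10);
  return null;
}

// Returns the address as an unsigned 32-bit number, or null if `host` is not
// an IPv4 literal in any accepted form.
function parseIPv4Loose(host: string): number | null {
  const parts = host.split(".");
  if (parts.length > 1 && parts[parts.length - 1] === "") parts.pop(); // one trailing dot
  if (parts.length > 4) return null;
  const values: number[] = [];
  for (const part of parts) {
    const v = parsePart(part);
    if (v === null) return null;
    values.push(v);
  }
  // All parts but the last are single bytes; the last fills the remaining bytes.
  const head = values.slice(0, -1);
  const last = values[values.length - 1];
  if (head.some((v) => v > 255) || last >= 2 ** (8 * (5 - values.length))) return null;
  return head.reduce((addr, v, i) => addr + v * 2 ** (8 * (3 - i)), 0) + last;
}

// Partial list: "this network", RFC 1918, loopback, link-local.
const PRIVATE_V4: Array<[string, number]> = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["172.16.0.0", 12], ["192.168.0.0", 16],
];

// True if `host` is an IPv4 literal inside a listed range. Hostnames return
// false here and must go through DNS resolution before being checked.
function isPrivateIPv4(host: string): boolean {
  const addr = parseIPv4Loose(host);
  if (addr === null) return false;
  return PRIVATE_V4.some(([base, prefix]) => {
    const size = 2 ** (32 - prefix);
    return Math.floor(addr / size) === Math.floor((parseIPv4Loose(base) as number) / size);
  });
}
```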

Effort: M total (human: ~1 day / CC: ~1-2 hrs).

Priority: P2 — deferred consciously. Wave 3 closed the easily-exploitable paths. These are the defense-in-depth follow-ups.

Depends on: Security wave 3 shipped. None are blockers for Runtime MCP access control, but all three security workstreams (this one, the Runtime MCP access control P0, and the health-check DSL) converge on the same zero-trust MCP goal.
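For the extended IPv6 item in the list above, parsing the literal into a 128-bit integer turns each range into a simple prefix comparison. This is a sketch under that assumption. `ipv6ToBigInt`, `matchesExtendedIPv6Block` and the range table are illustrative. `::1` and IPv4-mapped addresses are deliberately left to the existing checks, since the item says those are already covered.

```typescript
// Sketch only: prefix matching for the extra IPv6 ranges. ipv6ToBigInt and
// matchesExtendedIPv6Block are hypothetical; ::1 and IPv4-mapped addresses
// are assumed to stay with the existing isInternalUrl checks.

// Parses an IPv6 literal (optionally bracketed, optional %zone) to a 128-bit bigint.
function ipv6ToBigInt(input: string): bigint | null {
  const addr = input.replace(/^\[(.*)\]$/, "$1").split("%")[0].toLowerCase();
  const halves = addr.split("::");
  if (halves.length > 2) return null;

  // Parses one side of "::"; a dotted IPv4 tail counts as two hextets.
  const parseSide = (side: string, allowV4Tail: boolean): number[] | null => {
    if (side === "") return [];
    const out: number[] = [];
    const pieces = side.split(":");
    for (let i = 0; i < pieces.length; i++) {
      const p = pieces[i];
      const v4 = p.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
      if (v4 && allowV4Tail && i === pieces.length - 1) {
        const b = v4.slice(1).map(Number);
        if (b.some((x) => x > 255)) return null;
        out.push((b[0] << 8) | b[1], (b[2] << 8) | b[3]);
      } else if (/^[0-9a-f]{1,4}$/.test(p)) {
        out.push(parseInt(p, 16));
      } else {
        return null;
      }
    }
    return out;
  };

  const compressed = halves.length === 2;
  const head = parseSide(halves[0], !compressed);
  const tail = compressed ? parseSide(halves[1], true) : [];
  if (head === null || tail === null) return null;

  // "::" stands for one or more zero hextets; without it we need exactly 8.
  const zeros = compressed ? 8 - head.length - tail.length : 0;
  if (compressed ? zeros < 1 : head.length !== 8) return null;
  const groups = [...head, ...new Array<number>(zeros).fill(0), ...tail];
  return groups.reduce((acc, g) => (acc << 16n) | BigInt(g), 0n);
}

// [network, prefix length] pairs from the extended-coverage item.
const EXTENDED_V6_BLOCKS: Array<[string, number]> = [
  ["fc00::", 7], // Unique Local Addresses
  ["fe80::", 10], // link-local
  ["2002::", 16], // 6to4
  ["2001::", 32], // Teredo (2001:0000::/32)
  ["::", 128], // unspecified address
];

// True when the top prefixLen bits of addr and net agree.
function inPrefix(addr: bigint, net: bigint, prefixLen: number): boolean {
  const shift = BigInt(128 - prefixLen);
  return (addr >> shift) === (net >> shift);
}

function matchesExtendedIPv6Block(host: string): boolean {
  const addr = ipv6ToBigInt(host);
  if (addr === null) return false;
  return EXTENDED_V6_BLOCKS.some(([net, len]) => inPrefix(addr, ipv6ToBigInt(net)!, len));
}
```

Keeping the ranges as a `[network, prefixLength]` table means further blocks are one-line additions rather than new parsing code.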

Community recipe submission (`gbrain integrations submit`)

What: Package a user's custom integration recipe as a PR to the GBrain repo. Validates frontmatter, checks constrained DSL health_checks, creates PR with template.

Why: Turns GBrain from a single-author integration set into a community ecosystem. The recipe format IS the contribution format.

Pros: Community-driven integration library. Users build Slack-to-brain, RSS-to-brain, Discord-to-brain.

Cons: Support burden. Need constrained DSL (P1) before accepting third-party recipes. Need review process for recipe quality.

Context: From CEO review (2026-04-11). User explicitly deferred due to bandwidth constraints. Target v0.9.0.

Depends on: Constrained health_check DSL (P1) — SHIPPED in v0.9.3.

Always-on deployment recipes (Fly.io, Railway)

What: Alternative deployment recipes for voice-to-brain and future integrations that run on cloud servers instead of local + ngrok.

Why: ngrok free URLs are ephemeral (change on restart). Always-on deployment eliminates the watchdog complexity and gives a stable webhook URL.

Pros: Stable URLs, no ngrok dependency, production-grade uptime.

Cons: Costs $5-10/mo per integration. Requires cloud account.

Context: From DX review (2026-04-11). v0.7.0 ships local+ngrok as v1 deployment path.

Depends on: v0.7.0 recipe format (shipped).

`gbrain serve --http` + Fly.io/Railway deployment

What: Add gbrain serve --http as a thin HTTP wrapper around the stdio MCP server. Include a Dockerfile/fly.toml for cloud deployment.

Why: The Edge Function deployment was removed in v0.8.0, so remote MCP now requires a custom HTTP wrapper around gbrain serve. A built-in --http flag would make this zero-effort. Running directly on Bun also avoids the Edge Function's constraints: no bundling seam, no 60s timeout, no cold start.

Pros: Simpler remote MCP setup. Users run gbrain serve --http behind ngrok instead of building a custom server. Supports all 30 operations remotely (including sync_brain and file_upload).

Cons: Users need ngrok ($8/mo) or a cloud host (Fly.io $5/mo, Railway $5/mo). Not zero-infra.

Context: Production deployments use a custom Hono server wrapping gbrain serve. This TODO would formalize that pattern into the CLI. ChatGPT OAuth 2.1 support depends on this.

Depends on: v0.8.0 (Edge Function removal shipped).

P2 (knowledge graph follow-ups)

Auto-link skipped writes generate redundant SQL

What: When gbrain put is called with identical content (status=skipped), runAutoLink still does a full getLinks plus a per-candidate addLink loop. On N identical writes of a 50-entity page, that's roughly 50N round trips (one getLinks plus 50 addLink calls per write).

Why: Defensive reconciliation catches drift between page text and links table, but on truly idempotent writes it's wasted work.

Pros: Lower DB load on cron-style re-syncs. Keeps put_page latency tight under bulk MCP usage.

Cons: Need to track whether links could have drifted independent of content (e.g., a target page was deleted). Conservative approach: only skip auto-link reconciliation if status=skipped AND existing links match desired set (which still requires the getLinks call).
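The conservative approach above amounts to a set difference between the rows getLinks returns and the links the page text implies. This minimal sketch makes three assumptions:

- links can be keyed by (link type, target slug);
- reconciliation may both add and remove rows;
- status values other than `skipped` are placeholders.

`LinkRef`, `diffLinks` and `planAutoLink` are hypothetical names.

```typescript
// Sketch only: LinkRef, diffLinks and planAutoLink are hypothetical; the
// real links table may key rows on more columns than these two.
interface LinkRef {
  toSlug: string;
  linkType: string;
}

interface LinkDiff {
  toAdd: LinkRef[];
  toRemove: LinkRef[];
}

const linkKey = (l: LinkRef): string => `${l.linkType}\u0000${l.toSlug}`;

// Set difference between what getLinks returned and what the page text implies.
function diffLinks(existing: LinkRef[], desired: LinkRef[]): LinkDiff {
  const have = new Map(existing.map((l): [string, LinkRef] => [linkKey(l), l]));
  const want = new Map(desired.map((l): [string, LinkRef] => [linkKey(l), l]));
  return {
    toAdd: [...want].filter(([k]) => !have.has(k)).map(([, l]) => l),
    toRemove: [...have].filter(([k]) => !want.has(k)).map(([, l]) => l),
  };
}

// Skip only when the put was a no-op AND stored links already equal the
// desired set; otherwise the caller writes just the delta.
function planAutoLink(status: string, existing: LinkRef[], desired: LinkRef[]) {
  const diff = diffLinks(existing, desired);
  const inSync = diff.toAdd.length === 0 && diff.toRemove.length === 0;
  return { skip: status === "skipped" && inSync, diff };
}
```

With the diff in hand, an unchanged re-put of a 50-entity page costs one getLinks read and no addLink calls. A drifted page writes only the delta.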

Context: Caught in /ship adversarial review (2026-04-18). Acceptable for v0.10.3 because auto-link runs in a transaction with row locks, so amplification cost is bounded.

Effort estimate: S (CC: ~10min)

Priority: P2

Depends on: Nothing.

Audit `extract --source db` against auto_link config flag

What: gbrain extract links --source db writes to the same links table that auto_link=false is supposed to opt out of. The two are conceptually distinct (extract is intentional batch op, auto_link is implicit on write), but a user who turned off auto_link expecting "no automatic link writes" might be surprised.

Why: Either the behavior should match (extract checks auto_link too) or the docs should explicitly state extract is a superset.

Pros: Less surprise for users who treat auto_link as a master switch.

Cons: Some users want extract to work even when auto_link is off (e.g. one-time backfill).

Context: Caught in /ship adversarial review (2026-04-18). Documenting for now.

Effort estimate: S (CC: ~10min for docs OR ~20min for code change)

Priority: P2

Depends on: Nothing.

Doctor --fix polish from v0.14.1 adversarial review

What: Six deferred findings from v0.14.1 ship-time adversarial review on src/core/dry-fix.ts:

TOCTOU between read and write. attemptFix reads once and writes later, so concurrent editor saves are silently overwritten. Fix: re-read immediately before writing and compare against the snapshot, or write an O_EXCL tempfile and rename it over the original.
Fence detection misses 4-backtick and ~~~ fences. isInsideCodeFence only catches ^```$. CommonMark-legal alternates slip through.
expandBullet walk-up is dead code. Loop breaks immediately because baseIndent matches the current line. Remove or make it actually walk up.
Multi-match guard too strict. Skills with the pattern in a table-of-contents AND body get ambiguous_multiple_matches forever. Consider: fix first, re-scan, repeat until fixed-point.
Subprocess spam. getWorkingTreeStatus spawns git status N×M times per doctor --fix. Cache per-skill per-invocation.
doctor --fix --json swallows the auto-fix report. printAutoFixReport returns early on jsonOutput; agents don't see fix outcomes. Emit auto_fix as a top-level key.
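For the fence-detection item, CommonMark's rules are:

- an opening fence is 3 or more backticks or tildes, indented by at most 3 spaces;
- it closes only on a line of the same character that is at least as long.

The sketch below implements that rule. It assumes `isInsideCodeFence(lines, index)` can take the whole file as an array of lines; the current signature may differ. `fenceMask` is a hypothetical helper, and fence delimiter lines are counted as inside.

```typescript
// Sketch only: CommonMark-style fenced-code detection (backtick and tilde
// fences of length >= 3, up to 3 spaces of indent). fenceMask is hypothetical.

const OPEN_FENCE = /^ {0,3}(`{3,}|~{3,})(.*)$/;
const CLOSE_FENCE = /^ {0,3}(`{3,}|~{3,})[ \t]*$/;

// One flag per line: true if the line is a fence delimiter or fenced content.
// An unclosed fence runs to the end of the file, as in CommonMark.
function fenceMask(lines: string[]): boolean[] {
  const mask: boolean[] = [];
  let open: { char: string; len: number } | null = null;
  for (const line of lines) {
    if (open) {
      mask.push(true);
      const close = line.match(CLOSE_FENCE);
      // Closes only with the same character and at least as many of them.
      if (close && close[1][0] === open.char && close[1].length >= open.len) open = null;
      continue;
    }
    const m = line.match(OPEN_FENCE);
    // A backtick fence's info string may not itself contain a backtick.
    if (m && !(m[1][0] === "`" && m[2].includes("`"))) {
      open = { char: m[1][0], len: m[1].length };
      mask.push(true);
    } else {
      mask.push(false);
    }
  }
  return mask;
}

function isInsideCodeFence(lines: string[], index: number): boolean {
  return fenceMask(lines)[index] ?? false;
}
```

Computing the mask once per file also avoids rescanning from the top for every candidate line.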

Why: None are ship-blockers; all surfaced during v0.14.1 Codex adversarial review. Bundle into one follow-up PR.

Pros: Closes the adversarial findings loop. Better correctness under concurrent edits and JSON-consumer agents.

Cons: Concurrent-edit test is finicky.

Context: v0.14.1 shipped with the 4 critical fixes (shell-injection via execFileSync, no-git-backup detection, EOF newline preservation, proximity-window consistency). These six are the deferred remainder.

Effort estimate: M (CC: ~45min for all six + tests)

Priority: P2

Depends on: Nothing.
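For the TOCTOU item, here is a minimal sketch of the snapshot-compare-then-rename option. It assumes attemptFix still holds the text it originally read; `writeIfUnchanged` is a hypothetical name. The approach narrows the race rather than closing it. An editor that saves between the re-read and the rename can still lose that save, but readers never observe a half-written file.

```typescript
import { readFileSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

type FixResult = "written" | "conflict" | "unchanged";

// Hypothetical helper: write `updated` only if the file still equals `snapshot`.
function writeIfUnchanged(path: string, snapshot: string, updated: string): FixResult {
  if (updated === snapshot) return "unchanged";
  // Re-read right before writing; bail out if an editor saved in the meantime.
  if (readFileSync(path, "utf8") !== snapshot) return "conflict";
  // Temp file in the same directory so rename() stays on one filesystem.
  const tmp = join(dirname(path), `.${basename(path)}.${process.pid}.tmp`);
  // "wx" = O_CREAT | O_EXCL: throw rather than clobber a leftover temp file.
  writeFileSync(tmp, updated, { encoding: "utf8", flag: "wx" });
  try {
    renameSync(tmp, path); // atomic replace on POSIX: readers see old or new
  } catch (err) {
    unlinkSync(tmp);
    throw err;
  }
  return "written";
}
```

The sketch does not carry over the original file's permissions. A real version would probably copy the original mode onto the temp file before renaming.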

Completed

Implement AWS Signature V4 for S3 storage backend

Completed: v0.6.0 (2026-04-10) — replaced with @aws-sdk/client-s3 for proper SigV4 signing.

