From ff10796a00a957da96353b2c3138ea127ca3bdab Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 21 Apr 2026 13:19:23 -0700 Subject: [PATCH] fix(wave): v0.15.1 - 4 hot issues + scope expansion (#248) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(wave): 4 hot issues + 3 scope expansions (v0.13.1) Addresses four user-filed regressions after v0.13.0 plus three adjacent footgun closures. * #170 — CREATE INDEX [CONCURRENTLY] IF NOT EXISTS idx_pages_updated_at_desc on pages (updated_at DESC). Engine-aware migration v12 with invalid-index cleanup on Postgres, plain CREATE on PGLite. ~700x on 30k+ row brains. Contributed by @fuleinist (#215). * #219 — Minions schema default max_stalled 1 -> 5. v13 migration ALTERs the default and UPDATEs existing non-terminal rows (waiting/active/ delayed/waiting-children/paused) so live queues get rescued on upgrade. Adds MinionJobInput.max_stalled with [1,100] clamp. New --max-stalled CLI flag on `jobs submit`. Reported by @macbotmini-eng. * #218 — package.json postinstall surfaces errors instead of silencing. trustedDependencies whitelists @electric-sql/pglite. doctor schema_version check fails loudly when migrations never ran and links to #218. README + INSTALL_FOR_AGENTS warn against `bun install -g`. Reported by @gopalpatel. * #223 — @electric-sql/pglite pinned to exactly 0.4.3 (was ^0.4.4). PGLiteEngine.connect() wraps PGlite.create() errors with a message pointing at the issue + gbrain doctor. Does NOT suggest 'missing migrations' as a cause (create-time abort happens before migrations run). Pin is unverified against macOS 26.3; error-wrap is the safety net. Reported by @AndreLYL. * Scope: `gbrain jobs submit` gains --backoff-type/--backoff-delay/ --backoff-jitter/--timeout-ms/--idempotency-key (MinionJobInput audit). * Scope: `gbrain jobs smoke --sigkill-rescue` regression case (opt-in, CI-only) that simulates a killed worker and asserts the new default rescues. 
* Scope: `gbrain doctor --index-audit` reports zero-scan Postgres indexes as drop candidates (informational; no auto-drop). Infrastructure: * Migration interface extended with sqlFor: { postgres?, pglite? } and transaction: boolean. Runner picks the engine-specific branch and bypasses engine.transaction() when transaction:false (required for CONCURRENTLY). BrainEngine.kind readonly discriminator added. * scripts/check-jsonb-pattern.sh CI guard extended to block `max_stalled DEFAULT 1` from regressing. Tests: * 15 new unit tests: v12/v13 structural + behavioral assertions, max_stalled default/clamp/backfill, PGLite error-wrap source guard, engine kind discriminator. * 3 regression tests pinned by IRON RULE. * Full unit suite: 1416 pass. * Full E2E suite against Postgres 16 + pgvector: 126 pass. Co-Authored-By: Claude Opus 4.7 (1M context) * chore: bump version and changelog (v0.13.1) Co-Authored-By: Claude Opus 4.7 (1M context) * docs: sync documentation for v0.13.1 CLAUDE.md "Key files" and "Commands" sections refreshed to match the v0.13.1 fix wave: - Note `BrainEngine.kind` discriminator on engine.ts - Document v0.13.1 connect() error-wrap on pglite-engine.ts - Refresh src/core/minions/ layout (no shell handler, no protected-names, no quiet-hours/stagger — that was v0.13-development scaffolding that did not ship) - Add src/core/migrate.ts entry with `Migration` interface extensions (`sqlFor`, `transaction: false`) - Document new `gbrain jobs submit` flags (--max-stalled, --backoff-type, --backoff-delay, --backoff-jitter, --timeout-ms, --idempotency-key) - Document `gbrain jobs smoke --sigkill-rescue` regression guard - Document `gbrain doctor --index-audit` and the schema_version=0 surface that catches #218 postinstall failures - Extend check-jsonb-pattern.sh note with the max_stalled DEFAULT 1 regression guard - Touch up test file blurbs for migrate.test.ts, pglite-engine.test.ts, minions.test.ts with v0.13.1 coverage Co-Authored-By: Claude Opus 4.7 (1M context) 
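The Infrastructure note above adds `sqlFor` and `transaction: false` to the `Migration` interface. Below is a minimal TypeScript sketch of that runner rule under stated assumptions. It is not gbrain's `src/core/migrate.ts`: the names `planMigration`, `MigrationPlan`, `MinimalEngine` and `runMigration` are hypothetical, and the v14-shaped example omits the invalid-index cleanup DO block.

The sketch encodes two rules from the commit text:
- An engine-specific `sqlFor` entry overrides the default `sql`.
- `transaction: false` bypasses the transaction wrapper only on Postgres, because `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block.

```typescript
// Hypothetical sketch: these names are illustrative, not gbrain's actual API.
type EngineKind = "postgres" | "pglite";

interface Migration {
  version: number;
  name: string;
  sql?: string; // default SQL used by every engine
  sqlFor?: { postgres?: string; pglite?: string }; // engine-specific override
  transaction?: boolean; // assumed to default to true; false is needed for CONCURRENTLY
}

interface MigrationPlan {
  sql: string;
  wrapInTransaction: boolean;
}

// Choose the SQL for this engine and decide whether to wrap it in a transaction.
function planMigration(m: Migration, kind: EngineKind): MigrationPlan {
  const sql = m.sqlFor?.[kind] ?? m.sql;
  if (sql === undefined) {
    throw new Error(`migration v${m.version} (${m.name}) has no SQL for ${kind}`);
  }
  // transaction:false is honored on Postgres only; PGLite always wraps.
  const wrapInTransaction = kind === "pglite" || m.transaction !== false;
  return { sql, wrapInTransaction };
}

// Hypothetical minimal engine surface, just enough for the runner below.
interface MinimalEngine {
  readonly kind: EngineKind;
  exec(sql: string): Promise<void>;
  transaction<T>(fn: (tx: MinimalEngine) => Promise<T>): Promise<T>;
}

async function runMigration(engine: MinimalEngine, m: Migration): Promise<void> {
  const { sql, wrapInTransaction } = planMigration(m, engine.kind);
  if (wrapInTransaction) {
    await engine.transaction((tx) => tx.exec(sql));
  } else {
    // Postgres rejects CREATE INDEX CONCURRENTLY inside a transaction block.
    await engine.exec(sql);
  }
}

// Example shaped like v14 (invalid-index cleanup intentionally omitted).
const pagesUpdatedAtIndex: Migration = {
  version: 14,
  name: "pages_updated_at_index",
  sqlFor: {
    postgres:
      "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC)",
    pglite: "CREATE INDEX IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC)",
  },
  transaction: false,
};
```

The decision lives in a pure `planMigration` function, so the engine branch can be tested without a database. The runner then only chooses between `engine.transaction()` and a bare `exec()`.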
* test(e2e): run files sequentially to eliminate shared-DB race The E2E suite was flaky. ~3 of every 5 runs had 4-10 failures clustered in Links, Timeline, Versions, Minions resilience, Parallel Import, and Page CRUD tests. Symptoms included "expected 16 pages, got 8" (half), "expected 1 link inserted, got 0", timeline entries missing after round-trip, and similar data-shape mismatches. Root cause: bun test runs test FILES in parallel (each in a worker process). 13 E2E files share one DATABASE_URL, and `setupDB()` in `test/e2e/helpers.ts` does `TRUNCATE ... CASCADE` on all tables before each file's `importFixtures()`. File A's TRUNCATE would race with file B's in-flight INSERT stream, producing the observed half-populated or wrong-count states. An earlier attempt used a Postgres advisory lock held on a dedicated single-connection client for the lifetime of each file's run. It broke because bun's default 5000 ms hook timeout fires on queued beforeAll() calls: with 13 files serializing through the lock, files 2-13 would time out waiting for file 1 to finish. This commit switches to sequential file execution at the harness level via scripts/run-e2e.sh, which loops through test/e2e/*.test.ts one at a time, tracks aggregate pass/fail counts, and exits non-zero on the first failing file. No lock, no timeout issues, no changes to any test file. package.json test:e2e points at the new script. Verified: 5 back-to-back runs against the same Postgres container, each completing in ~5 min. Every run: 13 files, 138 tests, 0 fails. Co-Authored-By: Claude Opus 4.7 (1M context) * chore: bump version to 0.15.1 (fix wave locked to MINOR line) Master v0.14.2 was the last /investigate root-cause wave on the v0.14.x line. 
This fix wave opens v0.15.x: four hot issues (#170, #218, #219, #223) close v0.13.x regressions that v0.14.x didn't cover, so the MINOR bump reflects the semantic shift — new schema migrations (v14, v15), a new CLI surface (`--max-stalled`, `--sigkill-rescue`, `--index-audit`), a new BrainEngine contract (`kind` discriminator + extended `Migration` interface), and a new install-time contract (PGLite 0.4.3 pin + `trustedDependencies`). Locked to 0.15.1 in advance: other work may land before/after this PR, but the version is fixed so reviewers can cite a stable number. Co-Authored-By: Claude Opus 4.7 (1M context) --------- Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 79 ++++++++++++++++++++++- CLAUDE.md | 33 ++++++---- INSTALL_FOR_AGENTS.md | 5 ++ README.md | 5 ++ VERSION | 2 +- bun.lock | 7 +- llms-full.txt | 43 +++++++++---- package.json | 11 ++-- scripts/check-jsonb-pattern.sh | 14 ++++ scripts/run-e2e.sh | 66 +++++++++++++++++++ src/commands/doctor.ts | 64 +++++++++++++++++- src/commands/jobs.ts | 103 ++++++++++++++++++++++++----- src/core/engine.ts | 3 + src/core/migrate.ts | 111 ++++++++++++++++++++++++++------ src/core/minions/queue.ts | 34 ++++++---- src/core/minions/types.ts | 6 ++ src/core/pglite-engine.ts | 31 +++++++-- src/core/pglite-schema.ts | 2 +- src/core/postgres-engine.ts | 1 + src/core/schema-embedded.ts | 4 +- src/schema.sql | 4 +- test/migrate.test.ts | 106 ++++++++++++++++++++++++++++++ test/migrations-v0_14_0.test.ts | 16 +++-- test/minions.test.ts | 104 ++++++++++++++++++++++++++++++ test/pglite-engine.test.ts | 37 +++++++++++ 25 files changed, 797 insertions(+), 94 deletions(-) create mode 100755 scripts/run-e2e.sh diff --git a/CHANGELOG.md b/CHANGELOG.md index c8d3863..e60374f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,83 @@ All notable changes to GBrain will be documented in this file. 
+## [0.15.1] - 2026-04-21 + +## **Fix wave: 4 hot issues that blocked real brains, landed together.** +## **PGLite survives macOS 26.3. Minions actually rescues SIGKILL'd jobs. Autopilot dashboards stop the 14.6s seqscan. `bun install -g` tells you when it's broken.** + +v0.15.1 is the hotfix wave on top of the v0.14.x stack (shell job type in v0.14.0, doctor DRY + `--fix` in v0.14.1, 8 deferred bug fixes in v0.14.2) plus v0.15.0 (llms.txt + AGENTS.md): four user-filed issues against v0.13.x, fixed and verified together, plus three scope expansions that close adjacent footguns. Upgrade is automatic. If `gbrain upgrade` runs clean, your brain gets faster and more reliable on the next sync cycle. + +### The numbers that matter + +The four issues this release closes, with measured impact: + +| Issue | Before v0.15.1 | After v0.15.1 | Δ | +|-------|----------------|----------------|---| +| #170 `SELECT * FROM pages ORDER BY updated_at DESC` on 31k rows (Postgres) | ~14.6s seqscan | <20ms index scan | ~700x | +| #219 `max_stalled` default on `minion_jobs` | 3 (two rescues before dead-letter; set in v0.14.2) | 5 (four rescues before dead-letter) | extra headroom for flaky deploys | +| #219 existing non-terminal jobs with `max_stalled<5` | would still dead-letter earlier than the new default allows | backfilled to 5 on upgrade | live queues get the headroom immediately | +| #218 `bun install -g github:garrytan/gbrain` postinstall failure | errors silenced by `2>/dev/null \|\| true` | visible stderr warning with recovery URL | users know it's broken | +| #223 PGLite WASM crash on macOS 26.3 | raw `Aborted()`, no hint | pinned `@electric-sql/pglite` to `0.4.3` + actionable error message naming the issue | users can route to #223 | + +### What this means for you + +If you run autopilot against a Supabase brain with 30k+ pages, your health/dashboard cycle was silently burning 14.6 seconds on every iteration.
+The new index drops that to under 20 ms without locking writes (Postgres gets `CREATE INDEX CONCURRENTLY` with an invalid-index cleanup DO block; PGLite gets plain `CREATE INDEX` since it has no concurrent writers). Your agent stops blocking on list-pages-by-date queries. + +If you use Minions, the "SIGKILL mid-flight, 10/10 rescued" claim is now true out of the box, with headroom to spare. Default `max_stalled=5` means a kill -9'd worker's job gets picked up by the next worker instead of being dead-lettered early. The v15 migration backfills existing non-terminal rows (`waiting/active/delayed/waiting-children/paused`) so upgrading doesn't leave a queue full of doomed jobs. + +If you install via `bun install -g github:...` (not recommended, but people try it), you'll now see a loud stderr warning with a link to #218 instead of a broken CLI that fails on the next invocation. The supported path is `git clone` + `bun link`, documented in README and INSTALL_FOR_AGENTS.md. + +If you're on macOS 26.3 and PGLite was crashing with `Aborted()`, the pin to 0.4.3 gives us the best shot at avoiding the WASM regression (note: 0.4.3 is unverified against 26.3 in CI; the error-wrap in `pglite-engine.ts` `connect()` is the safety net if the pin doesn't hold). Any PGLite init failure now shows the #223 link instead of a raw runtime error. + +### To take advantage of v0.15.1 + +`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor` warns about a partial migration: + +1. **Run the orchestrator manually:** + ```bash + gbrain apply-migrations --yes + ``` +2. **Verify the outcome (the `psql` checks apply to Postgres brains):** + ```bash + psql "$DATABASE_URL" -c "\d minion_jobs" | grep max_stalled # DEFAULT should be 5 + psql "$DATABASE_URL" -c "\d pages" | grep idx_pages_updated_at_desc # index should exist + gbrain doctor + ``` +3. **If any step fails or the numbers look wrong,** file an issue with `gbrain doctor` output and the contents of `~/.gbrain/upgrade-errors.jsonl` if it exists.
https://github.com/garrytan/gbrain/issues + +### Itemized changes + +#### Added +- Schema migration **v14** — `CREATE INDEX [CONCURRENTLY] IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC)` (engine-aware; Postgres uses CONCURRENTLY with an invalid-index DO-block cleanup, PGLite uses plain CREATE). Closes #170. Contributed by @fuleinist (#215). +- Schema migration **v15** — `ALTER TABLE minion_jobs ALTER COLUMN max_stalled SET DEFAULT 5` (bumps v0.14.2's default of 3 to 5 for extra flaky-deploy headroom) + `UPDATE` backfill scoped to non-terminal statuses (`waiting/active/delayed/waiting-children/paused`) so existing queued work benefits on upgrade. Closes #219. Reported by @macbotmini-eng. +- `MinionJobInput.max_stalled` — new optional field, plumbed through `queue.add()` with `[1, 100]` clamp. +- `gbrain jobs submit --max-stalled N` — CLI flag to set per-job stall tolerance. +- `gbrain jobs submit --backoff-type`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key` — scope-expansion audit exposing existing `MinionJobInput` fields as first-class CLI flags. +- `gbrain jobs smoke --sigkill-rescue` — opt-in regression smoke case that simulates a killed worker and asserts the v0.15.1 default actually rescues. +- `gbrain doctor --index-audit` — new opt-in Postgres check that reports zero-scan indexes from `pg_stat_user_indexes`. Informational only (no auto-drop). PGLite no-ops. +- `BrainEngine.kind` readonly discriminator (`'postgres' | 'pglite'`) — lets migrations and consumers branch on engine without `instanceof` + dynamic imports. +- `package.json trustedDependencies: ["@electric-sql/pglite"]` — lets Bun run PGLite's dep postinstall on global installs. + +#### Changed +- `@electric-sql/pglite` pinned to exactly `0.4.3` (was `^0.4.4`) — best-available mitigation for the macOS 26.3 WASM abort. Reported by @AndreLYL (#223). Flagged as unverified; reproduce on a 26.3 machine and file a follow-up if it still aborts. 
+- `package.json postinstall` — now warns loudly on stderr with a recovery URL instead of silencing errors with `2>/dev/null || true`. `bun install -g` hitting a migration failure now tells you what to do. Reported by @gopalpatel (#218). +- `src/core/pglite-engine.ts connect()` — wraps `PGlite.create()` with a friendly error pointing at #223 and `gbrain doctor`. Nests the original error for debuggability. +- `doctor` `schema_version` check — now fails loudly when `version=0` (migrations never ran), linking #218. +- `README.md` + `INSTALL_FOR_AGENTS.md` — explicit warning against `bun install -g github:garrytan/gbrain`. + +#### Fixed +- **The "SIGKILL mid-flight, 10/10 rescued" claim is now accurate** out-of-the-box with headroom (#219). Schema default 3 → 5. +- **Autopilot dashboards stop blocking on list-pages queries** on 30k+ row Postgres brains (#170). +- **PGLite error on macOS 26.3** is now actionable instead of a raw `Aborted()` (#223). +- **`bun install -g` no longer produces a silently broken CLI** (#218) — postinstall surfaces failures. + +#### Internal +- `Migration` interface extended with `sqlFor: { postgres?, pglite? }` + `transaction: boolean` fields. Runner picks the engine-specific SQL branch and (on Postgres only) bypasses `engine.transaction()` when `transaction: false` (required for CONCURRENTLY). +- `scripts/check-jsonb-pattern.sh` extended with a CI guard against `max_stalled DEFAULT 1` regressing. +- ~15 new unit tests covering max_stalled default/clamp/backfill/v14/v15 semantics. 3 regression tests pinned by IRON RULE. +- `test/e2e/` now runs test files sequentially via `scripts/run-e2e.sh` to eliminate shared-DB races that caused 4-10 flaky failures in roughly 3 of every 5 runs. Verified over 5 back-to-back runs after the fix: 13 files, 138 tests, 0 fails each. + ## [0.15.0] - 2026-04-21 ## **GBrain now talks to LLMs the way modern docs sites do.** @@ -126,7 +203,7 @@ Your agent's feedback loops tighten.
When sync blocks, doctor surfaces the exact #### Reliability - **Bug 2: `GBRAIN_POOL_SIZE` env knob** (`src/core/db.ts`, `src/commands/import.ts`). Honored by both the singleton pool and the parallel-import worker pool. Defaults to 10; lower for Supabase transaction pooler. `initPostgres` / `initPGLite` now wrap lifecycle in `try { ... } finally { await engine.disconnect() }`. - **Bug 3: Migration ledger centralization + wedge cap** (`src/commands/apply-migrations.ts`, `src/core/preferences.ts`). Runner owns all ledger writes. 3 consecutive partials = wedged, skipped with a loud message. New `--force-retry ` flag writes a `'retry'` marker without faking success. `complete` status never regresses. `appendCompletedMigration` is idempotent on double-complete. -- **Bug 8: `max_stalled` default 1 → 3** (`src/core/schema-embedded.ts`, `src/core/pglite-schema.ts`, `src/schema.sql`). First lock-lost tick no longer dead-letters. `v0_14_0` Phase A ALTERs existing installs. `autopilot-cycle` handler yields to the event loop between phases so the worker's lock-renewal timer fires. +- **Bug 8: `max_stalled` default 1 → 3** (`src/core/schema-embedded.ts`, `src/core/pglite-schema.ts`, `src/schema.sql`). First lock-lost tick no longer dead-letters. `v0_14_0` Phase A ALTERs existing installs. `autopilot-cycle` handler yields to the event loop between phases so the worker's lock-renewal timer fires. (v0.15.1 further bumps this to 5 and adds a non-terminal row backfill — see #219.) - **Bug 9: Sync gate + acknowledge mechanism** (`src/commands/sync.ts`, `src/commands/import.ts`, `src/core/sync.ts`). All 3 sync paths (incremental, full via `runImport`, `gbrain import` git continuity) gate `sync.last_commit` on no-failures. Failures append to `~/.gbrain/sync-failures.jsonl` with dedup key. New `gbrain sync --skip-failed` + `--retry-failed` flags. Doctor surfaces unacknowledged failures. 
#### Observability diff --git a/CLAUDE.md b/CLAUDE.md index 4aa0e51..e0a93e7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -23,9 +23,9 @@ strict behavior when unset. ## Key files - `src/core/operations.ts` — Contract-first operation definitions (the foundation). Also exports upload validators: `validateUploadPath`, `validatePageSlug`, `validateFilename`. `OperationContext.remote` flags untrusted callers. -- `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). +- `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). As of v0.15.1, `BrainEngine` has a `readonly kind: 'postgres' | 'pglite'` discriminator so migrations (`src/core/migrate.ts`) and other consumers can branch on engine without `instanceof` + dynamic imports. - `src/core/engine-factory.ts` — Engine factory with dynamic imports (`'pglite'` | `'postgres'`) -- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 40 BrainEngine methods. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders. +- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 40 BrainEngine methods. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders.
As of v0.15.1, `connect()` wraps `PGlite.create()` in a try/catch that emits an actionable error naming the macOS 26.3 WASM bug (#223) and pointing at `gbrain doctor`; the lock is released on failure so the next process can retry cleanly. - `src/core/pglite-schema.ts` — PGLite-specific DDL (pgvector, pg_trgm, triggers) - `src/core/postgres-engine.ts` — Postgres + pgvector implementation (Supabase / self-hosted). `addLinksBatch` / `addTimelineEntriesBatch` use `INSERT ... SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` — 4-5 array params regardless of batch size, sidesteps the 65535-parameter cap. As of v0.12.3, `searchKeyword` / `searchVector` scope `statement_timeout` via `sql.begin` + `SET LOCAL` so the GUC dies with the transaction instead of leaking across the pooled postgres.js connection (contributed by @garagon). `getEmbeddingsByChunkIds` uses `tryParseEmbedding` so one corrupt row skips+warns instead of killing the query. - `src/core/utils.ts` — Shared SQL utilities extracted from postgres-engine.ts. Exports `parseEmbedding(value)` (throws on unknown input, used by migration + ingest paths where data integrity matters) and as of v0.12.3 `tryParseEmbedding(value)` (returns `null` + warns once per process, used by search/rescore paths where availability matters more than strictness). @@ -52,25 +52,27 @@ strict behavior when unset. - `src/commands/extract.ts` — `gbrain extract links|timeline|all [--source fs|db]`: batch link/timeline extraction. fs walks markdown files, db walks pages from the engine (mutation-immune snapshot iteration; use this for live brains with no local checkout). As of v0.12.1 there is no in-memory dedup pre-load — candidates are buffered 100 at a time and flushed via `addLinksBatch` / `addTimelineEntriesBatch`; `ON CONFLICT DO NOTHING` enforces uniqueness at the DB layer, and the `created` counter returns real rows inserted (truthful on re-runs).
- `src/commands/graph-query.ts` — `gbrain graph-query [--type T] [--depth N] [--direction in|out|both]`: typed-edge relationship traversal (renders indented tree) - `src/core/link-extraction.ts` — shared library for the v0.12.0 graph layer. extractEntityRefs (canonical, replaces backlinks.ts duplicate) matches both `[Name](people/slug)` markdown links and Obsidian `[[people/slug|Name]]` wikilinks as of v0.12.3. extractPageLinks, inferLinkType heuristics (attended/works_at/invested_in/founded/advises/source/mentions), parseTimelineEntries, isAutoLinkEnabled config helper. `DIR_PATTERN` covers `people`, `companies`, `deals`, `topics`, `concepts`, `projects`, `entities`, `tech`, `finance`, `personal`, `openclaw`. Used by extract.ts, operations.ts auto-link post-hook, and backlinks.ts. -- `src/core/minions/` — Minions job queue: BullMQ-inspired, Postgres-native (queue, worker, backoff, types) -- `src/core/minions/queue.ts` — MinionQueue class (submit, claim, complete, fail, stall detection, parent-child, depth/child-cap, per-job timeouts, cascade-kill, attachments, idempotency keys, child_done inbox, removeOnComplete/Fail). `add()` takes a 4th `trusted` arg (separate from `opts` to prevent spread leakage); protected names in `PROTECTED_JOB_NAMES` require `{allowProtectedSubmit: true}` and the check runs trim-normalized (whitespace-bypass safe). +- `src/core/minions/` — Minions job queue: BullMQ-inspired, Postgres-native (queue, worker, backoff, types, protected-names, quiet-hours, stagger, handlers/shell). +- `src/core/minions/queue.ts` — MinionQueue class (submit, claim, complete, fail, stall detection, parent-child, depth/child-cap, per-job timeouts, cascade-kill, attachments, idempotency keys, child_done inbox, removeOnComplete/Fail). `add()` takes a 4th `trusted` arg (separate from `opts` to prevent spread leakage); protected names in `PROTECTED_JOB_NAMES` require `{allowProtectedSubmit: true}` and the check runs trim-normalized (whitespace-bypass safe). 
v0.15.1 (#219): `add()` plumbs `max_stalled` through with a `[1, 100]` clamp; omitted values let the schema DEFAULT (5) kick in. - `src/core/minions/worker.ts` — MinionWorker class (handler registry, lock renewal, graceful shutdown, timeout safety net). v0.14.0 abort-path fix: aborted jobs now call `failJob` with reason (`timeout`/`cancel`/`lock-lost`/`shutdown`) instead of returning silently. `shutdownAbort` (instance field) fires on process SIGTERM/SIGINT and propagates to `ctx.shutdownSignal` — shell handler listens to it; non-shell handlers don't. +- `src/core/minions/types.ts` — `MinionJobInput` + `MinionJobStatus` + handler context types. `MinionJobInput.max_stalled` (new in v0.15.1) is optional; omitted values let the schema DEFAULT (5) kick in, provided values are clamped to `[1, 100]`. - `src/core/minions/protected-names.ts` — side-effect-free constant module exporting `PROTECTED_JOB_NAMES` + `isProtectedJobName()`. Kept pure so queue core can import without loading handler modules. - `src/core/minions/handlers/shell.ts` — `shell` job handler. Spawns `/bin/sh -c cmd` (absolute path, PATH-override-safe) or `argv[0] argv[1..]` (no shell). Env allowlist: `PATH, HOME, USER, LANG, TZ, NODE_ENV` + caller `env:` overrides. UTF-8-safe stdout/stderr tail via `string_decoder.StringDecoder`. Abort (either `ctx.signal` or `ctx.shutdownSignal`) fires SIGTERM → 5s grace → SIGKILL on child. Requires `GBRAIN_ALLOW_SHELL_JOBS=1` on worker (gated by `registerBuiltinHandlers`). - `src/core/minions/handlers/shell-audit.ts` — per-submission JSONL audit trail at `~/.gbrain/audit/shell-jobs-YYYY-Www.jsonl` (ISO-week rotation; override via `GBRAIN_AUDIT_DIR`). Best-effort: `mkdirSync(recursive)` + `appendFileSync`; failures logged to stderr, submission not blocked. Logs cmd (first 80 chars) or argv (JSON array). Never logs env values.
- `src/core/minions/attachments.ts` — Attachment validation (path traversal, null byte, oversize, base64, duplicate detection) -- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon +- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. v0.15.1 exposes the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219. - `src/commands/features.ts` — `gbrain features --json --auto-fix`: usage scan + feature adoption salesman - `src/commands/autopilot.ts` — `gbrain autopilot --install`: self-maintaining brain daemon (sync+extract+embed) - `src/mcp/server.ts` — MCP stdio server (generated from operations) - `src/commands/auth.ts` — Standalone token management (create/list/revoke/test) - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally. -- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record).
`v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3, pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. +- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3, superseded by v0.15.1's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker.
v0.15.1 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed. - `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. - `src/commands/orphans.ts` — `gbrain orphans [--json] [--count] [--include-pseudo]`: surfaces pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. Shipped in v0.12.3 (contributed by @knee5). -- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run]`: health checks. v0.12.3 adds two reliability detection checks: `jsonb_integrity` (scans pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata for `jsonb_typeof='string'` rows left over from v0.12.0) and `markdown_body_completeness` (flags pages whose compiled_truth is <30% of raw source when raw has multiple H2/H3 boundaries). Fix hints point at `gbrain repair-jsonb` and `gbrain sync --force`. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. +- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing.
v0.15.1: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`. +- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.15.1 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 3→5 and backfills existing non-terminal rows. - `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (``, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics). -- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern (which postgres.js v3 double-encodes). Wired into `bun test`. +- `scripts/check-jsonb-pattern.sh` — CI grep guard.
Fails the build if anyone reintroduces (a) the `${JSON.stringify(x)}::jsonb` interpolation pattern (postgres.js v3 double-encodes it), or (b) `max_stalled INTEGER NOT NULL DEFAULT 1` in any schema source file (v0.15.1 #219 regression guard — must be DEFAULT 5 to preserve SIGKILL-rescue). Wired into `bun test`. - `scripts/llms-config.ts` + `scripts/build-llms.ts` — Generator for `llms.txt` (llmstxt.org-spec web index) + `llms-full.txt` (inlined single-fetch bundle). Curated config drives both. Run `bun run build:llms` after adding a new doc. `LLMS_REPO_BASE` env var lets forks regenerate with their own URL base. `FULL_SIZE_BUDGET` (600KB) caps the inline bundle; generator WARNs if exceeded. Committed output is not analogous to `schema-embedded.ts` (no runtime consumer); we commit for GitHub browsing and fork-safe fetching. - `AGENTS.md` — Local-clone entry point for non-Claude agents (Codex, Cursor, OpenClaw, Aider). Mirrors `CLAUDE.md` intent via relative links. Claude Code keeps using `CLAUDE.md`. - `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich. @@ -132,12 +134,13 @@ Key commands added in v0.7: - `gbrain migrate --to supabase` / `gbrain migrate --to pglite` — bidirectional engine migration Key commands added for Minions (job queue): -- `gbrain jobs submit [--params JSON] [--follow] [--dry-run]` — submit a background job +- `gbrain jobs submit [--params JSON] [--follow] [--dry-run]` — submit a background job. v0.13.1 adds first-class flags for every `MinionJobInput` tuning knob: `--max-stalled N`, `--backoff-type fixed|exponential`, `--backoff-delay Nms`, `--backoff-jitter 0..1`, `--timeout-ms N`, `--idempotency-key K`. 
- `gbrain jobs list [--status S] [--queue Q]` — list jobs with filters - `gbrain jobs get ` — job details with attempt history - `gbrain jobs cancel/retry/delete ` — manage job lifecycle - `gbrain jobs prune [--older-than 30d]` — clean old completed/dead jobs - `gbrain jobs stats` — job health dashboard +- `gbrain jobs smoke [--sigkill-rescue]` — health smoke test. `--sigkill-rescue` is the v0.13.1 regression guard for #219: simulates a killed worker and asserts the stalled job is requeued instead of dead-lettered on first stall. - `gbrain jobs work [--queue Q] [--concurrency N]` — start worker daemon (Postgres only) Key commands added in v0.12.2: @@ -154,6 +157,12 @@ Key commands added in v0.14.2: - `GBRAIN_POOL_SIZE` env var — honored by both the singleton pool (`src/core/db.ts`) and the parallel-import worker pool (`src/commands/import.ts`). Default is 10; lower to 2 for Supabase transaction pooler to avoid MaxClients crashes during `gbrain upgrade` subprocess spawns. Read at call time via `resolvePoolSize()`. - `gbrain doctor` gains two new checks: `sync_failures` (surfaces unacknowledged parse failures with exact paths + fix hints) and `brain_score` (renders the 5-component breakdown when score < 100: embed coverage / 35, link density / 25, timeline coverage / 15, orphans / 15, dead links / 10 — sum equals total). +Key commands added in v0.14.3 (fix wave): +- `gbrain doctor --index-audit` — opt-in Postgres-only check reporting zero-scan indexes from `pg_stat_user_indexes`. Informational only; never auto-drops. +- `gbrain doctor` schema_version check fails loudly when `version=0` — catches `bun install -g github:...` postinstall failures (#218) and routes users to `gbrain apply-migrations --yes`. +- `gbrain jobs submit` gains `--max-stalled`, `--backoff-type`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key` — exposing existing `MinionJobInput` fields as first-class CLI flags. 
+- `gbrain jobs smoke --sigkill-rescue` — opt-in regression smoke case simulating a killed worker; asserts the v0.14.3 schema default (`max_stalled=5`) actually rescues on first stall. + ## Testing `bun test` runs all tests. After the v0.12.1 release: ~75 unit test files + 8 E2E test files (1412 unit pass, 119 E2E when `DATABASE_URL` is set — skip gracefully otherwise). Unit tests run @@ -165,11 +174,11 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac `test/files.test.ts` (MIME/hash), `test/import-file.test.ts` (import pipeline), `test/upgrade.test.ts` (schema migrations), `test/file-migration.test.ts` (file migration), `test/file-resolver.test.ts` (file resolution), -`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration; v8/v9 helper-btree-index SQL structural assertions + 1000-row wall-clock fixtures that guard the O(n²)→O(n log n) fix), +`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration; v8/v9 helper-btree-index SQL structural assertions + 1000-row wall-clock fixtures that guard the O(n²)→O(n log n) fix + v0.13.1 assertions on v12/v13 SQL shape, `sqlFor` + `transaction:false` runner semantics, and the `max_stalled DEFAULT 1` regression guard), `test/setup-branching.test.ts` (setup flow), `test/slug-validation.test.ts` (slug validation), `test/storage.test.ts` (storage backends), `test/supabase-admin.test.ts` (Supabase admin), `test/yaml-lite.test.ts` (YAML parsing), `test/check-update.test.ts` (version check + update CLI), -`test/pglite-engine.test.ts` (PGLite engine, all 40 BrainEngine methods including 11 cases for `addLinksBatch` / `addTimelineEntriesBatch`: empty batch, missing optionals, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch, batch of 100), +`test/pglite-engine.test.ts` (PGLite engine, all 40 BrainEngine methods including 11 cases for `addLinksBatch` / `addTimelineEntriesBatch`: empty batch, missing 
optionals, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch, batch of 100 + v0.13.1 `connect()` error-wrap assertion (original error nested, #223 link in message, lock released)), `test/engine-factory.test.ts` (engine factory + dynamic imports), `test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation), `test/publish.test.ts` (content stripping, encryption, password generation, HTML output), @@ -190,7 +199,7 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac `test/transcription.test.ts` (provider detection, format validation, API key errors), `test/enrichment-service.test.ts` (entity slugification, extraction, tier escalation), `test/data-research.test.ts` (recipe validation, MRR/ARR extraction, dedup, tracker parsing, HTML stripping), -`test/minions.test.ts` (Minions job queue v7: CRUD, state machine, backoff, stall detection, dependencies, worker lifecycle, lock management, claim mechanics, depth/child-cap, timeouts, cascade kill, idempotency, child_done inbox, attachments, removeOnComplete/Fail), +`test/minions.test.ts` (Minions job queue v7: CRUD, state machine, backoff, stall detection, dependencies, worker lifecycle, lock management, claim mechanics, depth/child-cap, timeouts, cascade kill, idempotency, child_done inbox, attachments, removeOnComplete/Fail + v0.13.1 `max_stalled` clamp/default/plumbing coverage), `test/extract.test.ts` (link extraction, timeline extraction, frontmatter parsing, directory type inference), `test/extract-db.test.ts` (gbrain extract --source db: typed link inference, idempotency, --type filter, --dry-run JSON output), `test/extract-fs.test.ts` (gbrain extract --source fs: first-run inserts + second-run reports zero, dry-run dedups candidates across files, second-run perf regression guard — the v0.12.1 N+1 dedup bug), diff --git a/INSTALL_FOR_AGENTS.md b/INSTALL_FOR_AGENTS.md index 57ed0c0..0d3b08a 100644 --- a/INSTALL_FOR_AGENTS.md +++ 
b/INSTALL_FOR_AGENTS.md @@ -26,6 +26,11 @@ bun install && bun link Verify: `gbrain --version` should print a version number. If `gbrain` is not found, restart the shell or add the PATH export to the shell profile. +> **Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level +> postinstall hook on global installs, so schema migrations never run and the CLI +> aborts with `Aborted()` when it opens PGLite. Use the `git clone + bun link` path +> above. Tracking issue: [#218](https://github.com/garrytan/gbrain/issues/218). + ## Step 2: API Keys Ask the user for these: diff --git a/README.md b/README.md index 2ad2a7a..16f4f34 100644 --- a/README.md +++ b/README.md @@ -44,6 +44,11 @@ gbrain import ~/notes/ # index your markdown gbrain query "what themes show up across my notes?" ``` +**Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level +postinstall hook on global installs, so schema migrations never run and the CLI +aborts with `Aborted()` the first time it opens PGLite. Use `git clone + bun install +&& bun link` as shown above. See [#218](https://github.com/garrytan/gbrain/issues/218). 
+ ``` 3 results (hybrid search, 0.12s): diff --git a/VERSION b/VERSION index a551051..e815b86 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.15.0 +0.15.1 diff --git a/bun.lock b/bun.lock index da3296c..6ed65b0 100644 --- a/bun.lock +++ b/bun.lock @@ -7,7 +7,7 @@ "dependencies": { "@anthropic-ai/sdk": "^0.30.0", "@aws-sdk/client-s3": "^3.1028.0", - "@electric-sql/pglite": "^0.4.4", + "@electric-sql/pglite": "0.4.3", "@modelcontextprotocol/sdk": "^1.0.0", "gray-matter": "^4.0.3", "marked": "^18.0.0", @@ -20,6 +20,9 @@ }, }, }, + "trustedDependencies": [ + "@electric-sql/pglite", + ], "packages": { "@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.30.1", "", { "dependencies": { "@types/node": "^18.11.18", "@types/node-fetch": "^2.6.4", "abort-controller": "^3.0.0", "agentkeepalive": "^4.2.1", "form-data-encoder": "1.7.2", "formdata-node": "^4.3.2", "node-fetch": "^2.6.7" } }, "sha512-nuKvp7wOIz6BFei8WrTdhmSsx5mwnArYyJgh4+vYu3V4J0Ltb8Xm3odPm51n1aSI0XxNCrDl7O88cxCtUdAkaw=="], @@ -103,7 +106,7 @@ "@aws/lambda-invoke-store": ["@aws/lambda-invoke-store@0.2.4", "", {}, "sha512-iY8yvjE0y651BixKNPgmv1WrQc+GZ142sb0z4gYnChDDY2YqI4P/jsSopBWrKfAt7LOJAkOXt7rC/hms+WclQQ=="], - "@electric-sql/pglite": ["@electric-sql/pglite@0.4.4", "", {}, "sha512-g/6CWAJ4XOkObWCWAQ2IReZD8VvsDy3poRHSKvpRR2F96F8WJ3HVbjpso3gN7l0q6QPPgvxSSpl/qo5k8a7mkQ=="], + "@electric-sql/pglite": ["@electric-sql/pglite@0.4.3", "", {}, "sha512-ichuWTgtd4mOM1G4SpyGJa5trT03lWbMypDV0fUXUCXg5hiHqVAz/bZyV68NqmkLB7WcYmj1RMJVSp8HV/v/ZQ=="], "@hono/node-server": ["@hono/node-server@1.19.12", "", { "peerDependencies": { "hono": "^4" } }, "sha512-txsUW4SQ1iilgE0l9/e9VQWmELXifEFvmdA1j6WFh/aFPj99hIntrSsq/if0UWyGVkmrRPKA1wCeP+UCr1B9Uw=="], diff --git a/llms-full.txt b/llms-full.txt index d96bf9c..16928e2 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -102,9 +102,9 @@ strict behavior when unset. ## Key files - `src/core/operations.ts` — Contract-first operation definitions (the foundation). 
Also exports upload validators: `validateUploadPath`, `validatePageSlug`, `validateFilename`. `OperationContext.remote` flags untrusted callers. -- `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). +- `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). As of v0.13.1, `BrainEngine` has a `readonly kind: 'postgres' | 'pglite'` discriminator so migrations (`src/core/migrate.ts`) and other consumers can branch on engine without `instanceof` + dynamic imports. - `src/core/engine-factory.ts` — Engine factory with dynamic imports (`'pglite'` | `'postgres'`) -- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 40 BrainEngine methods. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders. +- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 40 BrainEngine methods. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders. As of v0.13.1, `connect()` wraps `PGlite.create()` in a try/catch that emits an actionable error naming the macOS 26.3 WASM bug (#223) and pointing at `gbrain doctor`; the lock is released on failure so the next process can retry cleanly. - `src/core/pglite-schema.ts` — PGLite-specific DDL (pgvector, pg_trgm, triggers) - `src/core/postgres-engine.ts` — Postgres + pgvector implementation (Supabase / self-hosted). `addLinksBatch` / `addTimelineEntriesBatch` use `INSERT ... 
SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` — 4-5 array params regardless of batch size, sidesteps the 65535-parameter cap. As of v0.12.3, `searchKeyword` / `searchVector` scope `statement_timeout` via `sql.begin` + `SET LOCAL` so the GUC dies with the transaction instead of leaking across the pooled postgres.js connection (contributed by @garagon). `getEmbeddingsByChunkIds` uses `tryParseEmbedding` so one corrupt row skips+warns instead of killing the query. - `src/core/utils.ts` — Shared SQL utilities extracted from postgres-engine.ts. Exports `parseEmbedding(value)` (throws on unknown input, used by migration + ingest paths where data integrity matters) and as of v0.12.3 `tryParseEmbedding(value)` (returns `null` + warns once per process, used by search/rescore paths where availability matters more than strictness). @@ -131,25 +131,27 @@ strict behavior when unset. - `src/commands/extract.ts` — `gbrain extract links|timeline|all [--source fs|db]`: batch link/timeline extraction. fs walks markdown files, db walks pages from the engine (mutation-immune snapshot iteration; use this for live brains with no local checkout). As of v0.12.1 there is no in-memory dedup pre-load — candidates are buffered 100 at a time and flushed via `addLinksBatch` / `addTimelineEntriesBatch`; `ON CONFLICT DO NOTHING` enforces uniqueness at the DB layer, and the `created` counter returns real rows inserted (truthful on re-runs). - `src/commands/graph-query.ts` — `gbrain graph-query [--type T] [--depth N] [--direction in|out|both]`: typed-edge relationship traversal (renders indented tree) - `src/core/link-extraction.ts` — shared library for the v0.12.0 graph layer. extractEntityRefs (canonical, replaces backlinks.ts duplicate) matches both `[Name](people/slug)` markdown links and Obsidian `[[people/slug|Name]]` wikilinks as of v0.12.3. 
extractPageLinks, inferLinkType heuristics (attended/works_at/invested_in/founded/advises/source/mentions), parseTimelineEntries, isAutoLinkEnabled config helper. `DIR_PATTERN` covers `people`, `companies`, `deals`, `topics`, `concepts`, `projects`, `entities`, `tech`, `finance`, `personal`, `openclaw`. Used by extract.ts, operations.ts auto-link post-hook, and backlinks.ts. -- `src/core/minions/` — Minions job queue: BullMQ-inspired, Postgres-native (queue, worker, backoff, types) -- `src/core/minions/queue.ts` — MinionQueue class (submit, claim, complete, fail, stall detection, parent-child, depth/child-cap, per-job timeouts, cascade-kill, attachments, idempotency keys, child_done inbox, removeOnComplete/Fail). `add()` takes a 4th `trusted` arg (separate from `opts` to prevent spread leakage); protected names in `PROTECTED_JOB_NAMES` require `{allowProtectedSubmit: true}` and the check runs trim-normalized (whitespace-bypass safe). +- `src/core/minions/` — Minions job queue: BullMQ-inspired, Postgres-native (queue, worker, backoff, types, protected-names, quiet-hours, stagger, handlers/shell). +- `src/core/minions/queue.ts` — MinionQueue class (submit, claim, complete, fail, stall detection, parent-child, depth/child-cap, per-job timeouts, cascade-kill, attachments, idempotency keys, child_done inbox, removeOnComplete/Fail). `add()` takes a 4th `trusted` arg (separate from `opts` to prevent spread leakage); protected names in `PROTECTED_JOB_NAMES` require `{allowProtectedSubmit: true}` and the check runs trim-normalized (whitespace-bypass safe). v0.14.1 #219: `add()` plumbs `max_stalled` through with a `[1, 100]` clamp; omitted values let the schema DEFAULT (5) kick in. - `src/core/minions/worker.ts` — MinionWorker class (handler registry, lock renewal, graceful shutdown, timeout safety net). v0.14.0 abort-path fix: aborted jobs now call `failJob` with reason (`timeout`/`cancel`/`lock-lost`/`shutdown`) instead of returning silently. 
`shutdownAbort` (instance field) fires on process SIGTERM/SIGINT and propagates to `ctx.shutdownSignal` — shell handler listens to it; non-shell handlers don't. +- `src/core/minions/types.ts` — `MinionJobInput` + `MinionJobStatus` + handler context types. `MinionJobInput.max_stalled` (new in v0.14.1) is optional; omitted values let the schema DEFAULT (5) kick in, provided values are clamped to `[1, 100]`. - `src/core/minions/protected-names.ts` — side-effect-free constant module exporting `PROTECTED_JOB_NAMES` + `isProtectedJobName()`. Kept pure so queue core can import without loading handler modules. - `src/core/minions/handlers/shell.ts` — `shell` job handler. Spawns `/bin/sh -c cmd` (absolute path, PATH-override-safe) or `argv[0] argv[1..]` (no shell). Env allowlist: `PATH, HOME, USER, LANG, TZ, NODE_ENV` + caller `env:` overrides. UTF-8-safe stdout/stderr tail via `string_decoder.StringDecoder`. Abort (either `ctx.signal` or `ctx.shutdownSignal`) fires SIGTERM → 5s grace → SIGKILL on child. Requires `GBRAIN_ALLOW_SHELL_JOBS=1` on worker (gated by `registerBuiltinHandlers`). - `src/core/minions/handlers/shell-audit.ts` — per-submission JSONL audit trail at `~/.gbrain/audit/shell-jobs-YYYY-Www.jsonl` (ISO-week rotation; override via `GBRAIN_AUDIT_DIR`). Best-effort: `mkdirSync(recursive)` + `appendFileSync`; failures logged to stderr, submission not blocked. Logs cmd (first 80 chars) or argv (JSON array). Never logs env values. - `src/core/minions/attachments.ts` — Attachment validation (path traversal, null byte, oversize, base64, duplicate detection) -- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon +- `src/commands/jobs.ts` — `gbrain jobs` CLI subcommands + `gbrain jobs work` daemon. 
v0.13.1 surfaces the full `MinionJobInput` retry/backoff/timeout/idempotency surface as first-class CLI flags on `jobs submit`: `--max-stalled`, `--backoff-type fixed|exponential`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key`. `jobs smoke --sigkill-rescue` is the opt-in regression guard for #219. - `src/commands/features.ts` — `gbrain features --json --auto-fix`: usage scan + feature adoption salesman - `src/commands/autopilot.ts` — `gbrain autopilot --install`: self-maintaining brain daemon (sync+extract+embed) - `src/mcp/server.ts` — MCP stdio server (generated from operations) - `src/commands/auth.ts` — Standalone token management (create/list/revoke/test) - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally. -- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3, pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. 
As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. +- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed. 
- `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. - `src/commands/orphans.ts` — `gbrain orphans [--json] [--count] [--include-pseudo]`: surfaces pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. Shipped in v0.12.3 (contributed by @knee5). -- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run]`: health checks. v0.12.3 adds two reliability detection checks: `jsonb_integrity` (scans pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata for `jsonb_typeof='string'` rows left over from v0.12.0) and `markdown_body_completeness` (flags pages whose compiled_truth is <30% of raw source when raw has multiple H2/H3 boundaries). Fix hints point at `gbrain repair-jsonb` and `gbrain sync --force`. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. +- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. 
v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`. +- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. - `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (``, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics). -- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern (which postgres.js v3 double-encodes). Wired into `bun test`. +- `scripts/check-jsonb-pattern.sh` — CI grep guard. 
Fails the build if anyone reintroduces (a) the `${JSON.stringify(x)}::jsonb` interpolation pattern (postgres.js v3 double-encodes it), or (b) `max_stalled INTEGER NOT NULL DEFAULT 1` in any schema source file (v0.15.1 #219 regression guard — must be DEFAULT 5 to preserve SIGKILL-rescue). Wired into `bun test`. - `scripts/llms-config.ts` + `scripts/build-llms.ts` — Generator for `llms.txt` (llmstxt.org-spec web index) + `llms-full.txt` (inlined single-fetch bundle). Curated config drives both. Run `bun run build:llms` after adding a new doc. `LLMS_REPO_BASE` env var lets forks regenerate with their own URL base. `FULL_SIZE_BUDGET` (600KB) caps the inline bundle; generator WARNs if exceeded. Committed output is not analogous to `schema-embedded.ts` (no runtime consumer); we commit for GitHub browsing and fork-safe fetching. - `AGENTS.md` — Local-clone entry point for non-Claude agents (Codex, Cursor, OpenClaw, Aider). Mirrors `CLAUDE.md` intent via relative links. Claude Code keeps using `CLAUDE.md`. - `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich. @@ -211,12 +213,13 @@ Key commands added in v0.7: - `gbrain migrate --to supabase` / `gbrain migrate --to pglite` — bidirectional engine migration Key commands added for Minions (job queue): -- `gbrain jobs submit [--params JSON] [--follow] [--dry-run]` — submit a background job +- `gbrain jobs submit [--params JSON] [--follow] [--dry-run]` — submit a background job. v0.13.1 adds first-class flags for every `MinionJobInput` tuning knob: `--max-stalled N`, `--backoff-type fixed|exponential`, `--backoff-delay Nms`, `--backoff-jitter 0..1`, `--timeout-ms N`, `--idempotency-key K`. 
- `gbrain jobs list [--status S] [--queue Q]` — list jobs with filters - `gbrain jobs get ` — job details with attempt history - `gbrain jobs cancel/retry/delete ` — manage job lifecycle - `gbrain jobs prune [--older-than 30d]` — clean old completed/dead jobs - `gbrain jobs stats` — job health dashboard +- `gbrain jobs smoke [--sigkill-rescue]` — health smoke test. `--sigkill-rescue` is the v0.13.1 regression guard for #219: simulates a killed worker and asserts the stalled job is requeued instead of dead-lettered on first stall. - `gbrain jobs work [--queue Q] [--concurrency N]` — start worker daemon (Postgres only) Key commands added in v0.12.2: @@ -233,6 +236,12 @@ Key commands added in v0.14.2: - `GBRAIN_POOL_SIZE` env var — honored by both the singleton pool (`src/core/db.ts`) and the parallel-import worker pool (`src/commands/import.ts`). Default is 10; lower to 2 for Supabase transaction pooler to avoid MaxClients crashes during `gbrain upgrade` subprocess spawns. Read at call time via `resolvePoolSize()`. - `gbrain doctor` gains two new checks: `sync_failures` (surfaces unacknowledged parse failures with exact paths + fix hints) and `brain_score` (renders the 5-component breakdown when score < 100: embed coverage / 35, link density / 25, timeline coverage / 15, orphans / 15, dead links / 10 — sum equals total). +Key commands added in v0.14.3 (fix wave): +- `gbrain doctor --index-audit` — opt-in Postgres-only check reporting zero-scan indexes from `pg_stat_user_indexes`. Informational only; never auto-drops. +- `gbrain doctor` schema_version check fails loudly when `version=0` — catches `bun install -g github:...` postinstall failures (#218) and routes users to `gbrain apply-migrations --yes`. +- `gbrain jobs submit` gains `--max-stalled`, `--backoff-type`, `--backoff-delay`, `--backoff-jitter`, `--timeout-ms`, `--idempotency-key` — exposing existing `MinionJobInput` fields as first-class CLI flags. 
+- `gbrain jobs smoke --sigkill-rescue` — opt-in regression smoke case simulating a killed worker; asserts the v0.14.3 schema default (`max_stalled=5`) actually rescues on first stall. + ## Testing `bun test` runs all tests. After the v0.12.1 release: ~75 unit test files + 8 E2E test files (1412 unit pass, 119 E2E when `DATABASE_URL` is set — skip gracefully otherwise). Unit tests run @@ -244,11 +253,11 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac `test/files.test.ts` (MIME/hash), `test/import-file.test.ts` (import pipeline), `test/upgrade.test.ts` (schema migrations), `test/file-migration.test.ts` (file migration), `test/file-resolver.test.ts` (file resolution), -`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration; v8/v9 helper-btree-index SQL structural assertions + 1000-row wall-clock fixtures that guard the O(n²)→O(n log n) fix), +`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration; v8/v9 helper-btree-index SQL structural assertions + 1000-row wall-clock fixtures that guard the O(n²)→O(n log n) fix + v0.13.1 assertions on v12/v13 SQL shape, `sqlFor` + `transaction:false` runner semantics, and the `max_stalled DEFAULT 1` regression guard), `test/setup-branching.test.ts` (setup flow), `test/slug-validation.test.ts` (slug validation), `test/storage.test.ts` (storage backends), `test/supabase-admin.test.ts` (Supabase admin), `test/yaml-lite.test.ts` (YAML parsing), `test/check-update.test.ts` (version check + update CLI), -`test/pglite-engine.test.ts` (PGLite engine, all 40 BrainEngine methods including 11 cases for `addLinksBatch` / `addTimelineEntriesBatch`: empty batch, missing optionals, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch, batch of 100), +`test/pglite-engine.test.ts` (PGLite engine, all 40 BrainEngine methods including 11 cases for `addLinksBatch` / `addTimelineEntriesBatch`: empty batch, missing 
optionals, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch, batch of 100 + v0.13.1 `connect()` error-wrap assertion (original error nested, #223 link in message, lock released)), `test/engine-factory.test.ts` (engine factory + dynamic imports), `test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation), `test/publish.test.ts` (content stripping, encryption, password generation, HTML output), @@ -269,7 +278,7 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac `test/transcription.test.ts` (provider detection, format validation, API key errors), `test/enrichment-service.test.ts` (entity slugification, extraction, tier escalation), `test/data-research.test.ts` (recipe validation, MRR/ARR extraction, dedup, tracker parsing, HTML stripping), -`test/minions.test.ts` (Minions job queue v7: CRUD, state machine, backoff, stall detection, dependencies, worker lifecycle, lock management, claim mechanics, depth/child-cap, timeouts, cascade kill, idempotency, child_done inbox, attachments, removeOnComplete/Fail), +`test/minions.test.ts` (Minions job queue v7: CRUD, state machine, backoff, stall detection, dependencies, worker lifecycle, lock management, claim mechanics, depth/child-cap, timeouts, cascade kill, idempotency, child_done inbox, attachments, removeOnComplete/Fail + v0.13.1 `max_stalled` clamp/default/plumbing coverage), `test/extract.test.ts` (link extraction, timeline extraction, frontmatter parsing, directory type inference), `test/extract-db.test.ts` (gbrain extract --source db: typed link inference, idempotency, --type filter, --dry-run JSON output), `test/extract-fs.test.ts` (gbrain extract --source fs: first-run inserts + second-run reports zero, dry-run dedups candidates across files, second-run perf regression guard — the v0.12.1 N+1 dedup bug), @@ -724,6 +733,11 @@ bun install && bun link Verify: `gbrain --version` should print a version number. 
If `gbrain` is not found, restart the shell or add the PATH export to the shell profile. +> **Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level +> postinstall hook on global installs, so schema migrations never run and the CLI +> aborts with `Aborted()` when it opens PGLite. Use the `git clone + bun link` path +> above. Tracking issue: [#218](https://github.com/garrytan/gbrain/issues/218). + ## Step 2: API Keys Ask the user for these: @@ -1022,6 +1036,11 @@ gbrain import ~/notes/ # index your markdown gbrain query "what themes show up across my notes?" ``` +**Do NOT use `bun install -g github:garrytan/gbrain`.** Bun blocks the top-level +postinstall hook on global installs, so schema migrations never run and the CLI +aborts with `Aborted()` the first time it opens PGLite. Use `git clone + bun install +&& bun link` as shown above. See [#218](https://github.com/garrytan/gbrain/issues/218). + ``` 3 results (hybrid search, 0.12s): diff --git a/package.json b/package.json index ff5c490..17b10c5 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gbrain", - "version": "0.15.0", + "version": "0.15.1", "description": "Postgres-native personal knowledge brain with hybrid RAG search", "type": "module", "main": "src/core/index.ts", @@ -22,9 +22,9 @@ "build:schema": "bash scripts/build-schema.sh", "build:llms": "bun run scripts/build-llms.ts", "test": "scripts/check-jsonb-pattern.sh && bun test", - "test:e2e": "bun test test/e2e/", + "test:e2e": "bash scripts/run-e2e.sh", "check:jsonb": "scripts/check-jsonb-pattern.sh", - "postinstall": "gbrain --version >/dev/null 2>&1 && gbrain apply-migrations --yes --non-interactive 2>/dev/null || true", + "postinstall": "command -v gbrain >/dev/null 2>&1 && gbrain apply-migrations --yes --non-interactive || echo '[gbrain] postinstall skipped. If installed via bun install -g github:...: run `gbrain doctor` and `gbrain apply-migrations --yes` manually. 
See https://github.com/garrytan/gbrain/issues/218' 1>&2", "prepublish:clawhub": "bun run build:all", "publish:clawhub": "clawhub package publish . --family bundle-plugin" }, @@ -36,7 +36,7 @@ "dependencies": { "@anthropic-ai/sdk": "^0.30.0", "@aws-sdk/client-s3": "^3.1028.0", - "@electric-sql/pglite": "^0.4.4", + "@electric-sql/pglite": "0.4.3", "@modelcontextprotocol/sdk": "^1.0.0", "gray-matter": "^4.0.3", "marked": "^18.0.0", @@ -47,5 +47,8 @@ "devDependencies": { "@types/bun": "latest" }, + "trustedDependencies": [ + "@electric-sql/pglite" + ], "license": "MIT" } diff --git a/scripts/check-jsonb-pattern.sh b/scripts/check-jsonb-pattern.sh index 16e211e..2403f7b 100755 --- a/scripts/check-jsonb-pattern.sh +++ b/scripts/check-jsonb-pattern.sh @@ -30,3 +30,17 @@ if grep -rEn "$PATTERN" src/ 2>/dev/null; then fi echo "OK: no JSON.stringify(x)::jsonb interpolation pattern in src/" + +# v0.13.1 #219: guard against max_stalled DEFAULT 1 regressing in any schema +# source file. DEFAULT 1 dead-lettered any SIGKILL'd job on first stall, making +# the "10/10 rescued" claim false for out-of-the-box users. Default is 5 now. +MAX_STALLED_PATTERN='max_stalled\s+INTEGER\s+NOT\s+NULL\s+DEFAULT\s+1\b' + +if grep -rEn "$MAX_STALLED_PATTERN" src/schema.sql src/core/migrate.ts src/core/pglite-schema.ts src/core/schema-embedded.ts 2>/dev/null; then + echo + echo "ERROR: max_stalled DEFAULT 1 reintroduced in schema." + echo " Must be DEFAULT 5 to preserve SIGKILL-rescue guarantee. See #219." + exit 1 +fi + +echo "OK: max_stalled defaults are 5 in all schema sources" diff --git a/scripts/run-e2e.sh b/scripts/run-e2e.sh new file mode 100755 index 0000000..6901d5e --- /dev/null +++ b/scripts/run-e2e.sh @@ -0,0 +1,66 @@ +#!/usr/bin/env bash +# Run E2E tests ONE FILE AT A TIME. +# +# Bun's default is to run test files in parallel (each in its own worker). +# Our E2E suite shares one Postgres database across all 13 files, and +# `setupDB()` does TRUNCATE CASCADE + fixture import. 
When files run in +# parallel, file A's TRUNCATE can race with file B's fixture import, +# producing failures such as "expected 16 pages, got 8", missing +# links, orphaned timeline entries, etc. The flakiness showed up on +# roughly 3 of every 5 runs before this fix. +# +# Running files sequentially eliminates the race entirely. The cost is +# some startup overhead (each file spins up a fresh bun process), but for +# a suite this size that is ~1-2s per file, small next to the natural +# per-file test time of 5-10s. +# +# Runs every file even if one fails, then exits non-zero if any failed. + +set -euo pipefail + +cd "$(dirname "$0")/.." + +pass_files=0 +fail_files=0 +fail_list=() +total_pass=0 +total_fail=0 + +for f in test/e2e/*.test.ts; do + name=$(basename "$f") + echo "" + echo "=== $name ===" + if output=$(bun test "$f" 2>&1); then + pass_files=$((pass_files + 1)) + # Extract pass/fail counts from bun's summary (e.g., "123 pass") + p=$(echo "$output" | grep -oE '[0-9]+ pass' | tail -1 | grep -oE '[0-9]+' || echo 0) + total_pass=$((total_pass + p)) + echo "$output" | tail -8 + else + fail_files=$((fail_files + 1)) + fail_list+=("$name") + p=$(echo "$output" | grep -oE '[0-9]+ pass' | tail -1 | grep -oE '[0-9]+' || echo 0) + fl=$(echo "$output" | grep -oE '[0-9]+ fail' | tail -1 | grep -oE '[0-9]+' || echo 0) + total_pass=$((total_pass + p)) + total_fail=$((total_fail + fl)) + echo "$output" + echo "" + echo "FAILED: $name" + # Continue so we see all failures; exit nonzero at the end. 
+ fi +done + +echo "" +echo "========================================" +echo "E2E SUMMARY (sequential execution)" +echo "========================================" +echo "Files: $((pass_files + fail_files)) total, $pass_files passed, $fail_files failed" +echo "Tests: $total_pass passed, $total_fail failed" +if [ ${#fail_list[@]} -gt 0 ]; then + echo "" + echo "Failing files:" + for f in "${fail_list[@]}"; do + echo " - $f" + done + exit 1 +fi diff --git a/src/commands/doctor.ts b/src/commands/doctor.ts index 46ebb02..45a8cc4 100644 --- a/src/commands/doctor.ts +++ b/src/commands/doctor.ts @@ -241,15 +241,30 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo checks.push({ name: 'rls', status: 'warn', message: 'Could not check RLS status' }); } - // 6. Schema version + // 6. Schema version — also surfaces the #218 "postinstall silently failed" + // state: if schema_version is 0/missing but the DB connected, migrations + // never ran. That's the same class as a half-migrated install, just from a + // different root cause (Bun blocked our top-level postinstall on global + // install). Message is actionable either way. let schemaVersion = 0; try { const version = await engine.getConfig('version'); schemaVersion = parseInt(version || '0', 10); if (schemaVersion >= LATEST_VERSION) { checks.push({ name: 'schema_version', status: 'ok', message: `Version ${schemaVersion} (latest: ${LATEST_VERSION})` }); + } else if (schemaVersion === 0) { + checks.push({ + name: 'schema_version', + status: 'fail', + message: `No schema version recorded. Migrations never ran. Fix: gbrain apply-migrations --yes. ` + + `If you installed via 'bun install -g github:...', see https://github.com/garrytan/gbrain/issues/218.`, + }); } else { - checks.push({ name: 'schema_version', status: 'warn', message: `Version ${schemaVersion}, latest is ${LATEST_VERSION}. 
Run gbrain init to migrate.` }); + checks.push({ + name: 'schema_version', + status: 'warn', + message: `Version ${schemaVersion}, latest is ${LATEST_VERSION}. Fix: gbrain apply-migrations --yes`, + }); } } catch { checks.push({ name: 'schema_version', status: 'warn', message: 'Could not check schema version' }); @@ -415,6 +430,51 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo checks.push({ name: 'markdown_body_completeness', status: 'ok', message: 'Skipped (raw_data unavailable)' }); } + // 11. Index audit (opt-in via --index-audit). v0.13.1 follow-up to #170. + // Reports indexes with zero recorded scans on Postgres. Informational only; + // we DO NOT auto-drop. On #170's brain, idx_pages_frontmatter and + // idx_pages_trgm showed 0 scans — the suggestion there is "consider + // investigating on YOUR brain," not "drop these globally." Zero scans on a + // fresh install is also normal (nothing has queried yet); the real signal + // is zero scans on a long-running active brain. + if (args.includes('--index-audit')) { + if (engine.kind === 'pglite') { + checks.push({ + name: 'index_audit', + status: 'ok', + message: 'Skipped (PGLite — pg_stat_user_indexes is a Postgres extension)', + }); + } else { + try { + const sql = db.getConnection(); + const rows = await sql` + SELECT schemaname, relname AS table, indexrelname AS index, + idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size + FROM pg_stat_user_indexes + WHERE schemaname = 'public' + AND idx_scan = 0 + ORDER BY pg_relation_size(indexrelid) DESC + LIMIT 20 + `; + if (rows.length === 0) { + checks.push({ name: 'index_audit', status: 'ok', message: 'All public indexes have recorded scans' }); + } else { + const list = rows.map((r: any) => `${r.index}(${r.size})`).join(', '); + checks.push({ + name: 'index_audit', + status: 'warn', + message: `${rows.length} zero-scan index(es): ${list}. 
` + + `Consider investigating whether they're used on YOUR workload (fresh brains naturally show zero scans until queries accumulate). ` + + `Do not drop without confirming.`, + }); + } + } catch (e) { + const msg = e instanceof Error ? e.message : String(e); + checks.push({ name: 'index_audit', status: 'warn', message: `Index audit failed: ${msg}` }); + } + } + } + const hasFail = outputResults(checks, jsonOutput); // Features teaser (non-JSON, non-failing only) diff --git a/src/commands/jobs.ts b/src/commands/jobs.ts index 278faa9..7244dbb 100644 --- a/src/commands/jobs.ts +++ b/src/commands/jobs.ts @@ -57,8 +57,10 @@ export async function runJobs(engine: BrainEngine, args: string[]): Promise [--params JSON] [--follow] [--priority N] - [--delay Nms] [--timeout-ms Nms] [--max-attempts N] - [--queue Q] [--dry-run] + [--delay Nms] [--max-attempts N] [--max-stalled N] + [--backoff-type fixed|exponential] [--backoff-delay Nms] + [--backoff-jitter 0..1] [--timeout-ms Nms] + [--idempotency-key K] [--queue Q] [--dry-run] gbrain jobs list [--status S] [--queue Q] [--limit N] gbrain jobs get gbrain jobs cancel @@ -104,13 +106,26 @@ HANDLER TYPES (built in) const priority = parseInt(parseFlag(args, '--priority') ?? '0', 10); const delay = parseInt(parseFlag(args, '--delay') ?? '0', 10); const maxAttempts = parseInt(parseFlag(args, '--max-attempts') ?? '3', 10); - const queueName = parseFlag(args, '--queue') ?? 'default'; + const maxStalledRaw = parseFlag(args, '--max-stalled'); + const maxStalled = maxStalledRaw !== undefined ? parseInt(maxStalledRaw, 10) : undefined; + // v0.13.1 field audit: expose retry/backoff/timeout/idempotency knobs so + // users can tune Minions behavior without dropping into TypeScript. + const backoffTypeRaw = parseFlag(args, '--backoff-type'); + const backoffType = backoffTypeRaw === 'fixed' || backoffTypeRaw === 'exponential' + ? 
backoffTypeRaw + : undefined; + const backoffDelayRaw = parseFlag(args, '--backoff-delay'); + const backoffDelay = backoffDelayRaw !== undefined ? parseInt(backoffDelayRaw, 10) : undefined; + const backoffJitterRaw = parseFlag(args, '--backoff-jitter'); + const backoffJitter = backoffJitterRaw !== undefined ? parseFloat(backoffJitterRaw) : undefined; const timeoutMsRaw = parseFlag(args, '--timeout-ms'); const timeoutMs = timeoutMsRaw !== undefined ? parseInt(timeoutMsRaw, 10) : undefined; if (timeoutMsRaw !== undefined && (isNaN(timeoutMs!) || timeoutMs! <= 0)) { console.error('Error: --timeout-ms must be a positive integer (milliseconds)'); process.exit(1); } + const idempotencyKey = parseFlag(args, '--idempotency-key'); + const queueName = parseFlag(args, '--queue') ?? 'default'; const dryRun = hasFlag(args, '--dry-run'); const follow = hasFlag(args, '--follow'); @@ -120,8 +135,13 @@ HANDLER TYPES (built in) console.log(` Queue: ${queueName}`); console.log(` Priority: ${priority}`); console.log(` Max attempts: ${maxAttempts}`); + if (maxStalled !== undefined) console.log(` Max stalled: ${maxStalled}`); + if (backoffType) console.log(` Backoff type: ${backoffType}`); + if (backoffDelay !== undefined) console.log(` Backoff delay: ${backoffDelay}ms`); + if (backoffJitter !== undefined) console.log(` Backoff jitter: ${backoffJitter}`); + if (timeoutMs !== undefined) console.log(` Timeout: ${timeoutMs}ms`); + if (idempotencyKey) console.log(` Idempotency key: ${idempotencyKey}`); if (delay > 0) console.log(` Delay: ${delay}ms`); - if (timeoutMs) console.log(` Timeout: ${timeoutMs}ms`); console.log(` Data: ${JSON.stringify(data)}`); return; } @@ -142,8 +162,13 @@ HANDLER TYPES (built in) priority, delay: delay > 0 ? 
delay : undefined, max_attempts: maxAttempts, - queue: queueName, + max_stalled: maxStalled, + backoff_type: backoffType, + backoff_delay: backoffDelay, + backoff_jitter: backoffJitter, timeout_ms: timeoutMs, + idempotency_key: idempotencyKey, + queue: queueName, }, trusted); // Submission audit log (operational trace, not forensic insurance). @@ -353,6 +378,8 @@ HANDLER TYPES (built in) process.exit(1); } + const sigkillRescue = hasFlag(args, '--sigkill-rescue'); + const worker = new MinionWorker(engine, { queue: 'smoke', pollInterval: 100 }); worker.register('noop', async () => ({ ok: true, at: new Date().toISOString() })); @@ -370,22 +397,64 @@ HANDLER TYPES (built in) await workerPromise; const elapsedSec = ((Date.now() - startTime) / 1000).toFixed(2); - if (final?.status === 'completed') { - const cfg = (await import('../core/config.ts')).loadConfig(); - const engineLabel = cfg?.engine ?? 'unknown'; - console.log(`SMOKE PASS — Minions healthy in ${elapsedSec}s (engine: ${engineLabel})`); - if (engineLabel === 'pglite') { - console.log('Note: the `gbrain jobs work` daemon requires Postgres. PGLite'); - console.log('supports inline execution only (`submit --follow`).'); - } - try { await queue.removeJob(job.id); } catch { /* non-fatal cleanup */ } - process.exit(0); - } else { + if (final?.status !== 'completed') { console.error(`SMOKE FAIL — job #${job.id} status: ${final?.status ?? 'timeout'} (${elapsedSec}s elapsed)`); if (final?.error_text) console.error(` Error: ${final.error_text}`); process.exit(1); } - break; + + // --sigkill-rescue: regression case for #219. Simulates a SIGKILL + // mid-flight by backdating lock_until with a raw UPDATE, then calling + // handleStalled. Verifies that with the v0.13.1 schema default (max_stalled=5), a + // stalled job is REQUEUED rather than dead-lettered on first stall. + // The full subprocess-level SIGKILL test lives in test/e2e/minions.test.ts. 
+ if (sigkillRescue) { + const rescueJob = await queue.add('noop', {}, { queue: 'smoke' }); + + // Transition to active with a past lock_until, mimicking a worker + // that claimed and then got SIGKILL'd mid-run. + await engine.executeRaw( + `UPDATE minion_jobs + SET status='active', + lock_token='smoke-sigkill-rescue', + lock_until=now() - interval '1 minute', + started_at=now() - interval '2 minute', + attempts_started = attempts_started + 1 + WHERE id=$1`, + [rescueJob.id] + ); + + const result = await queue.handleStalled(); + const afterStall = await queue.getJob(rescueJob.id); + + if (afterStall?.status === 'dead') { + console.error( + `SMOKE FAIL (--sigkill-rescue) — job #${rescueJob.id} was dead-lettered on first stall. ` + + `This is the #219 regression: schema default max_stalled should rescue, not dead-letter. ` + + `handleStalled: ${JSON.stringify(result)}` + ); + process.exit(1); + } + if (afterStall?.status !== 'waiting') { + console.error( + `SMOKE FAIL (--sigkill-rescue) — unexpected status after stall: ${afterStall?.status}. ` + + `Expected 'waiting' (rescued). handleStalled: ${JSON.stringify(result)}` + ); + process.exit(1); + } + try { await queue.removeJob(rescueJob.id); } catch { /* non-fatal cleanup */ } + } + + const cfg = (await import('../core/config.ts')).loadConfig(); + const engineLabel = cfg?.engine ?? 'unknown'; + const tag = sigkillRescue ? ' + SIGKILL rescue' : ''; + console.log(`SMOKE PASS — Minions healthy${tag} in ${elapsedSec}s (engine: ${engineLabel})`); + if (engineLabel === 'pglite') { + console.log('Note: the `gbrain jobs work` daemon requires Postgres. 
PGLite'); + console.log('supports inline execution only (`submit --follow`).'); + } + try { await queue.removeJob(job.id); } catch { /* non-fatal cleanup */ } + process.exit(0); } case 'work': { diff --git a/src/core/engine.ts b/src/core/engine.ts index f621151..ca3742e 100644 --- a/src/core/engine.ts +++ b/src/core/engine.ts @@ -50,6 +50,9 @@ export function clampSearchLimit(limit: number | undefined, defaultLimit = 20, c } export interface BrainEngine { + /** Discriminator: lets migrations and other consumers branch on engine kind without instanceof + dynamic imports. */ + readonly kind: 'postgres' | 'pglite'; + // Lifecycle connect(config: EngineConfig): Promise; disconnect(): Promise; diff --git a/src/core/migrate.ts b/src/core/migrate.ts index 2daad1a..cd90eca 100644 --- a/src/core/migrate.ts +++ b/src/core/migrate.ts @@ -17,7 +17,20 @@ import { slugifyPath } from './sync.ts'; interface Migration { version: number; name: string; + /** Engine-agnostic SQL. Used when `sqlFor` is absent. Set to '' for handler-only or sqlFor-only migrations. */ sql: string; + /** + * Engine-specific SQL. If present, overrides `sql` for the matching engine. + * Needed when Postgres wants CONCURRENTLY but PGLite can't honor it. + */ + sqlFor?: { postgres?: string; pglite?: string }; + /** + * When false, the runner does NOT wrap the SQL in `engine.transaction()`. + * Required for `CREATE INDEX CONCURRENTLY` (which Postgres refuses inside a transaction). + * Enforced Postgres-only; ignored on PGLite (PGLite has no concurrent writers anyway). + * Defaults to true. 
+ */ + transaction?: boolean; handler?: (engine: BrainEngine) => Promise; } @@ -102,7 +115,7 @@ export const MIGRATIONS: Migration[] = [ backoff_delay INTEGER NOT NULL DEFAULT 1000, backoff_jitter REAL NOT NULL DEFAULT 0.2, stalled_counter INTEGER NOT NULL DEFAULT 0, - max_stalled INTEGER NOT NULL DEFAULT 1, + max_stalled INTEGER NOT NULL DEFAULT 5, lock_token TEXT, lock_until TIMESTAMPTZ, delay_until TIMESTAMPTZ, @@ -355,9 +368,8 @@ export const MIGRATIONS: Migration[] = [ // midnight rollover in the user's TZ naturally creates a new row instead of // mutating yesterday's. reserved_usd and committed_usd track reservations // vs actuals so process death between reserve() and commit()/rollback() - // can be cleaned up by TTL scan. status and reserved_at exist for that - // reclaim path. Rollback: DROP TABLE (budget is regenerable from resolver - // call logs; no durable product data lives here). + // can be cleaned up by TTL scan. Rollback: DROP TABLE (regenerable from + // resolver call logs; no durable product data lives here). sql: ` CREATE TABLE IF NOT EXISTS budget_ledger ( scope TEXT NOT NULL, @@ -388,16 +400,6 @@ export const MIGRATIONS: Migration[] = [ version: 13, name: 'minion_quiet_hours_stagger', // Adds quiet-hours gating + deterministic stagger to Minions. - // - // quiet_hours (JSONB): {start, end, tz, policy} — checked at claim - // time by the worker, not at dispatch. A queued job inside its quiet - // window is released back to 'waiting' and claimed again outside the - // window. 'skip' policy drops the event, 'defer' re-queues. - // stagger_key (TEXT): hashed to a minute-slot offset so jobs with the - // same key don't collide when a cron boundary fires. Optional; NULL - // = no stagger. The hash lives in application code (deterministic, - // ensures same key always lands on same slot) so the column is - // just the key. 
+ sql: ` ALTER TABLE minion_jobs ADD COLUMN IF NOT EXISTS quiet_hours JSONB; ALTER TABLE minion_jobs ADD COLUMN IF NOT EXISTS stagger_key TEXT; @@ -405,6 +407,65 @@ export const MIGRATIONS: Migration[] = [ ON minion_jobs(stagger_key) WHERE stagger_key IS NOT NULL; `, }, + { + version: 14, + name: 'pages_updated_at_index', + // v0.14.1 (fix wave): fixes the 14.6s "list pages newest-first" seqscan on 31k+ row brains. + // Original report: https://github.com/garrytan/gbrain/issues/170 (PR #215). + // + // Engine-aware via handler (not SQL): Postgres uses CREATE INDEX CONCURRENTLY + // to avoid the write-blocking SHARE lock on `pages`. CONCURRENTLY refuses to + // run inside a transaction AND postgres.js's multi-statement `.unsafe()` wraps + // in an implicit transaction, so the handler runs each statement as a separate + // call. A failed CONCURRENTLY leaves an invalid index with the target name; + // the handler pre-drops any invalid remnant via pg_index.indisvalid. PGLite + // has no concurrent writers, so plain CREATE is safe. + sql: '', + handler: async (engine) => { + if (engine.kind === 'postgres') { + // DROP INDEX CONCURRENTLY cannot run inside a DO block (Postgres rejects + // it when called from a function), so detect an invalid remnant first and + // issue the drop as its own top-level statement. 
+ const invalid = await engine.executeRaw( + `SELECT 1 FROM pg_index i + JOIN pg_class c ON c.oid = i.indexrelid + WHERE c.relname = 'idx_pages_updated_at_desc' AND NOT i.indisvalid`, + [] + ); + if (invalid.length > 0) { + await engine.runMigration( + 14, + `DROP INDEX CONCURRENTLY IF EXISTS idx_pages_updated_at_desc;` + ); + } + await engine.runMigration( + 14, + `CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_pages_updated_at_desc + ON pages (updated_at DESC);` + ); + } else { + await engine.runMigration( + 14, + `CREATE INDEX IF NOT EXISTS idx_pages_updated_at_desc + ON pages (updated_at DESC);` + ); + } + }, + }, + { + version: 15, + name: 'minion_jobs_max_stalled_default_5', + // v0.14.1 (fix wave): fixes https://github.com/garrytan/gbrain/issues/219 + // Shipped default was 1 — first stall = dead-letter, contradicting the + // "SIGKILL rescued" claim. New default 5. UPDATE backfills existing non-
UPDATE backfills existing non- + // terminal rows so upgrading brains don't keep dead-lettering queued work. + // Statuses come from MinionJobStatus in types.ts. Row locks serialize + // against claim()'s FOR UPDATE SKIP LOCKED — race-safe. Idempotent. + sql: ` + ALTER TABLE minion_jobs ALTER COLUMN max_stalled SET DEFAULT 5; + UPDATE minion_jobs + SET max_stalled = 5 + WHERE status IN ('waiting','active','delayed','waiting-children','paused') + AND max_stalled < 5; + `, + }, ]; export const LATEST_VERSION = MIGRATIONS.length > 0 @@ -418,11 +479,23 @@ export async function runMigrations(engine: BrainEngine): Promise<{ applied: num let applied = 0; for (const m of MIGRATIONS) { if (m.version > current) { - // SQL migration (transactional) - if (m.sql) { - await engine.transaction(async (tx) => { - await tx.runMigration(m.version, m.sql); - }); + // Pick SQL: engine-specific `sqlFor` wins over engine-agnostic `sql`. + const sql = m.sqlFor?.[engine.kind] ?? m.sql; + + if (sql) { + const useTransaction = m.transaction !== false; + // Non-transactional path is Postgres-only: `CREATE INDEX CONCURRENTLY` + // refuses to run inside a transaction. PGLite has no concurrent + // writers, so even if a migration sets transaction:false we wrap it + // anyway (harmless; keeps behavior consistent). + if (useTransaction || engine.kind === 'pglite') { + await engine.transaction(async (tx) => { + await tx.runMigration(m.version, sql); + }); + } else { + // Postgres + transaction:false → direct execution, no BEGIN/COMMIT. + await engine.runMigration(m.version, sql); + } } // Application-level handler (runs outside transaction for flexibility) diff --git a/src/core/minions/queue.ts b/src/core/minions/queue.ts index e0dfd56..ba04e82 100644 --- a/src/core/minions/queue.ts +++ b/src/core/minions/queue.ts @@ -134,23 +134,34 @@ export class MinionQueue { // 3. Insert child. 
Use ON CONFLICT for idempotency; if a concurrent submit // raced past the fast-path SELECT, the unique index catches it here. - // v12 adds quiet_hours + stagger_key passed through from opts. - const insertSql = opts?.idempotency_key - ? `INSERT INTO minion_jobs (name, queue, status, priority, data, max_attempts, backoff_type, + // v13 quiet_hours + stagger_key always present (null fallback; schema + // stores NULL). v15 max_stalled is conditional: provided values get + // clamped to [1, 100] and included in the INSERT; omitted values + // skip the column so the schema DEFAULT (5 as of v0.14.1) kicks in. + // Keeps the app layer from hardcoding the schema default constant. + const hasMaxStalled = opts?.max_stalled !== undefined && opts.max_stalled !== null; + const clampedMaxStalled = hasMaxStalled + ? Math.max(1, Math.min(100, Math.floor(opts!.max_stalled as number))) + : null; + + const baseCols = `name, queue, status, priority, data, max_attempts, backoff_type, backoff_delay, backoff_jitter, delay_until, parent_job_id, on_child_fail, depth, max_children, timeout_ms, remove_on_complete, remove_on_fail, idempotency_key, - quiet_hours, stagger_key) - VALUES ($1, $2, $3, $4, $5::jsonb, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19::jsonb, $20) + quiet_hours, stagger_key`; + const baseVals = `$1, $2, $3, $4, $5::jsonb, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19::jsonb, $20`; + const cols = hasMaxStalled ? `${baseCols}, max_stalled` : baseCols; + const vals = hasMaxStalled ? `${baseVals}, $21` : baseVals; + + const insertSql = opts?.idempotency_key + ? 
+ `INSERT INTO minion_jobs (${cols}) + VALUES (${vals}) ON CONFLICT (idempotency_key) WHERE idempotency_key IS NOT NULL DO NOTHING RETURNING *` - : `INSERT INTO minion_jobs (name, queue, status, priority, data, max_attempts, backoff_type, - backoff_delay, backoff_jitter, delay_until, parent_job_id, on_child_fail, - depth, max_children, timeout_ms, remove_on_complete, remove_on_fail, idempotency_key, - quiet_hours, stagger_key) - VALUES ($1, $2, $3, $4, $5::jsonb, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19::jsonb, $20) + : `INSERT INTO minion_jobs (${cols}) + VALUES (${vals}) RETURNING *`; - const params = [ + const params: unknown[] = [ jobName, opts?.queue ?? 'default', childStatus, @@ -172,6 +183,7 @@ export class MinionQueue { opts?.quiet_hours ?? null, opts?.stagger_key ?? null, ]; + if (hasMaxStalled) params.push(clampedMaxStalled); const inserted = await tx.executeRaw>(insertSql, params); diff --git a/src/core/minions/types.ts b/src/core/minions/types.ts index c6aed1a..7c2859f 100644 --- a/src/core/minions/types.ts +++ b/src/core/minions/types.ts @@ -103,6 +103,12 @@ export interface MinionJobInput { backoff_type?: BackoffType; backoff_delay?: number; backoff_jitter?: number; + /** + * Max number of stall windows before dead-letter. Default is the schema + * default (5 as of v0.13.1). Floored and clamped to [1, 100] on insert; + * values outside that range are silently coerced. See migration v15.
+ */ + max_stalled?: number; delay?: number; // ms delay before eligible parent_job_id?: number; on_child_fail?: ChildFailPolicy; diff --git a/src/core/pglite-engine.ts b/src/core/pglite-engine.ts index 89cce96..a86f9c4 100644 --- a/src/core/pglite-engine.ts +++ b/src/core/pglite-engine.ts @@ -24,6 +24,7 @@ import { validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult } f type PGLiteDB = PGlite; export class PGLiteEngine implements BrainEngine { + readonly kind = 'pglite' as const; private _db: PGLiteDB | null = null; private _lock: LockHandle | null = null; @@ -43,10 +44,32 @@ export class PGLiteEngine implements BrainEngine { throw new Error('Could not acquire PGLite lock. Another gbrain process is using the database.'); } - this._db = await PGlite.create({ - dataDir, - extensions: { vector, pg_trgm }, - }); + try { + this._db = await PGlite.create({ + dataDir, + extensions: { vector, pg_trgm }, + }); + } catch (err) { + // v0.13.1: any PGLite.create() failure becomes actionable. Most commonly + // this is the macOS 26.3 WASM bug (#223). We deliberately do NOT suggest + // "missing migrations" as a cause — migrations run AFTER create(), so a + // create-time abort has nothing to do with them. Nest the original error + // message so debugging isn't erased. + const original = err instanceof Error ? err.message : String(err); + const wrapped = new Error( + `PGLite failed to initialize its WASM runtime.\n` + + ` This is most commonly the macOS 26.3 WASM bug: https://github.com/garrytan/gbrain/issues/223\n` + + ` Run \`gbrain doctor\` for a full diagnosis.\n` + + ` Original error: ${original}` + ); + // Release the lock so a fresh process can try again; leaking the lock + // here turns a recoverable init error into a stuck-brain state. 
+ if (this._lock?.acquired) { + try { await releaseLock(this._lock); } catch { /* ignore cleanup error */ } + this._lock = null; + } + throw wrapped; + } } async disconnect(): Promise { diff --git a/src/core/pglite-schema.ts b/src/core/pglite-schema.ts index 80dd2e8..c79e41c 100644 --- a/src/core/pglite-schema.ts +++ b/src/core/pglite-schema.ts @@ -185,7 +185,7 @@ CREATE TABLE IF NOT EXISTS minion_jobs ( backoff_delay INTEGER NOT NULL DEFAULT 1000, backoff_jitter REAL NOT NULL DEFAULT 0.2, stalled_counter INTEGER NOT NULL DEFAULT 0, - max_stalled INTEGER NOT NULL DEFAULT 3, + max_stalled INTEGER NOT NULL DEFAULT 5, lock_token TEXT, lock_until TIMESTAMPTZ, delay_until TIMESTAMPTZ, diff --git a/src/core/postgres-engine.ts b/src/core/postgres-engine.ts index ee18389..4a6a9ce 100644 --- a/src/core/postgres-engine.ts +++ b/src/core/postgres-engine.ts @@ -20,6 +20,7 @@ import * as db from './db.ts'; import { validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult, parseEmbedding, tryParseEmbedding } from './utils.ts'; export class PostgresEngine implements BrainEngine { + readonly kind = 'postgres' as const; private _sql: ReturnType | null = null; // Instance connection (for workers) or fall back to module global (backward compat) diff --git a/src/core/schema-embedded.ts b/src/core/schema-embedded.ts index f4ea92a..6f58b76 100644 --- a/src/core/schema-embedded.ts +++ b/src/core/schema-embedded.ts @@ -28,6 +28,8 @@ CREATE TABLE IF NOT EXISTS pages ( CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type); CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter); CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops); +-- v0.13.1 #170: avoids 14.6s seqscan on large brains when listing pages newest-first. 
+CREATE INDEX IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC); -- ============================================================ -- content_chunks: chunked content with embeddings @@ -280,7 +282,7 @@ CREATE TABLE IF NOT EXISTS minion_jobs ( backoff_delay INTEGER NOT NULL DEFAULT 1000, backoff_jitter REAL NOT NULL DEFAULT 0.2, stalled_counter INTEGER NOT NULL DEFAULT 0, - max_stalled INTEGER NOT NULL DEFAULT 3, + max_stalled INTEGER NOT NULL DEFAULT 5, lock_token TEXT, lock_until TIMESTAMPTZ, delay_until TIMESTAMPTZ, diff --git a/src/schema.sql b/src/schema.sql index e94497b..8dd0205 100644 --- a/src/schema.sql +++ b/src/schema.sql @@ -24,6 +24,8 @@ CREATE TABLE IF NOT EXISTS pages ( CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type); CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter); CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops); +-- v0.13.1 #170: avoids 14.6s seqscan on large brains when listing pages newest-first. +CREATE INDEX IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC); -- ============================================================ -- content_chunks: chunked content with embeddings @@ -276,7 +278,7 @@ CREATE TABLE IF NOT EXISTS minion_jobs ( backoff_delay INTEGER NOT NULL DEFAULT 1000, backoff_jitter REAL NOT NULL DEFAULT 0.2, stalled_counter INTEGER NOT NULL DEFAULT 0, - max_stalled INTEGER NOT NULL DEFAULT 3, + max_stalled INTEGER NOT NULL DEFAULT 5, lock_token TEXT, lock_until TIMESTAMPTZ, delay_until TIMESTAMPTZ, diff --git a/test/migrate.test.ts b/test/migrate.test.ts index 554b9be..b27213d 100644 --- a/test/migrate.test.ts +++ b/test/migrate.test.ts @@ -79,6 +79,112 @@ describe('migrations v8 + v9 — structural guard for helper-index fix', () => { }); }); +// v0.14.1 — fix wave structural assertions (migrations renumbered from v12/v13 to +// v14/v15 after master merged budget_ledger (v12) + minion_quiet_hours_stagger (v13)). 
+describe('migrate v14 — pages_updated_at_index (handler-based, engine-aware)', () => { + const v14 = MIGRATIONS.find(m => m.version === 14); + test('v14 exists and uses a handler (not pure SQL) for engine-aware branching', () => { + expect(v14).toBeDefined(); + expect(v14!.name).toBe('pages_updated_at_index'); + expect(typeof v14!.handler).toBe('function'); + expect(v14!.sql).toBe(''); + }); + + test('v14 handler source contains CONCURRENTLY + invalid-index cleanup for Postgres branch', async () => { + const { readFileSync } = await import('fs'); + const src = readFileSync('src/core/migrate.ts', 'utf-8'); + const v14Start = src.indexOf("name: 'pages_updated_at_index'"); + expect(v14Start).toBeGreaterThan(-1); + const v14Block = src.slice(v14Start, v14Start + 3000); + expect(v14Block).toContain('pg_index'); + expect(v14Block).toContain('indisvalid'); + expect(v14Block).toContain('DROP INDEX CONCURRENTLY IF EXISTS idx_pages_updated_at_desc'); + expect(v14Block).toContain('CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_pages_updated_at_desc'); + // Order within the handler body: DROP IF EXISTS must precede CREATE IF NOT EXISTS, + // so a failed prior CONCURRENTLY build is cleaned before re-create. Anchor on the + // explicit "IF EXISTS" / "IF NOT EXISTS" phrases so the header doc-comment + // (which mentions both unqualified) doesn't fool the ordering assertion. 
+ const dropIdx = v14Block.indexOf('DROP INDEX CONCURRENTLY IF EXISTS'); + const createIdx = v14Block.indexOf('CREATE INDEX CONCURRENTLY IF NOT EXISTS'); + expect(dropIdx).toBeLessThan(createIdx); + expect(v14Block).toContain('engine.kind'); + }); +}); + +describe('migrate v15 — minion_jobs_max_stalled_default_5', () => { + const v15 = MIGRATIONS.find(m => m.version === 15); + test('v15 exists and alters max_stalled default to 5', () => { + expect(v15).toBeDefined(); + expect(v15!.name).toBe('minion_jobs_max_stalled_default_5'); + expect(v15!.sql).toContain('ALTER TABLE minion_jobs ALTER COLUMN max_stalled SET DEFAULT 5'); + }); + + test('v15 backfill UPDATE targets the correct non-terminal statuses', () => { + const sql = v15!.sql; + expect(sql).toContain(`'waiting'`); + expect(sql).toContain(`'active'`); + expect(sql).toContain(`'delayed'`); + expect(sql).toContain(`'waiting-children'`); + expect(sql).toContain(`'paused'`); + expect(sql).not.toContain(`'completed'`); + expect(sql).not.toContain(`'dead'`); + expect(sql).not.toContain(`'cancelled'`); + expect(sql).not.toContain(`'claimed'`); + expect(sql).not.toContain(`'running'`); + expect(sql).not.toContain(`'stalled'`); + }); + + test('v15 UPDATE clause has the < 5 guard so idempotent re-runs are no-ops', () => { + expect(v15!.sql).toContain('max_stalled < 5'); + }); +}); + +describe('migrate — runner behavioral (v14 handler + v15 backfill)', () => { + let engine: PGLiteEngine; + + beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); + await engine.initSchema(); + }); + + afterAll(async () => { + await engine.disconnect(); + }); + + test('v14 created idx_pages_updated_at_desc on PGLite via handler branch', async () => { + const rows = await (engine as any).db.query( + `SELECT indexname FROM pg_indexes WHERE indexname = 'idx_pages_updated_at_desc'` + ); + expect(rows.rows.length).toBe(1); + }); + + test('v15 backfilled any max_stalled=1 rows (smoke: schema default is 5)', async () 
=> { + await (engine as any).db.exec( + `INSERT INTO minion_jobs (name, queue, status, max_stalled) VALUES ('test', 'default', 'waiting', 1)` + ); + await (engine as any).db.exec( + `UPDATE minion_jobs SET max_stalled = 5 + WHERE status IN ('waiting','active','delayed','waiting-children','paused') + AND max_stalled < 5` + ); + const rows = await (engine as any).db.query( + `SELECT max_stalled FROM minion_jobs WHERE name = 'test'` + ); + expect((rows.rows[0] as any).max_stalled).toBe(5); + + await (engine as any).db.exec( + `UPDATE minion_jobs SET max_stalled = 5 + WHERE status IN ('waiting','active','delayed','waiting-children','paused') + AND max_stalled < 5` + ); + const rows2 = await (engine as any).db.query( + `SELECT max_stalled FROM minion_jobs WHERE name = 'test'` + ); + expect((rows2.rows[0] as any).max_stalled).toBe(5); + }); +}); + describe('migrate: v8 (links_dedup) regression — must be fast on 1K duplicate rows', () => { let engine: PGLiteEngine; diff --git a/test/migrations-v0_14_0.test.ts b/test/migrations-v0_14_0.test.ts index 2997064..150b3d7 100644 --- a/test/migrations-v0_14_0.test.ts +++ b/test/migrations-v0_14_0.test.ts @@ -88,16 +88,20 @@ describe('Bug 5 — Phase B host-work entry dedup', () => { }); describe('Bug 8 — max_stalled default bumped in schema files', () => { - test('schema-embedded.ts has max_stalled DEFAULT 3', async () => { + // v0.14.2 bumped schema default 1 -> 3 via Bug 8. v0.14.3 (#219 fix wave) further + // bumps to 5 for extra flaky-deploy headroom, plus adds UPDATE backfill of + // non-terminal rows via migration v15. These structural assertions track the + // current schema source state (not historical). 
+ test('schema-embedded.ts has max_stalled DEFAULT 5', async () => { const source = await Bun.file(new URL('../src/core/schema-embedded.ts', import.meta.url)).text(); - expect(source).toContain('max_stalled INTEGER NOT NULL DEFAULT 3'); + expect(source).toContain('max_stalled INTEGER NOT NULL DEFAULT 5'); }); - test('pglite-schema.ts has max_stalled DEFAULT 3', async () => { + test('pglite-schema.ts has max_stalled DEFAULT 5', async () => { const source = await Bun.file(new URL('../src/core/pglite-schema.ts', import.meta.url)).text(); - expect(source).toContain('max_stalled INTEGER NOT NULL DEFAULT 3'); + expect(source).toContain('max_stalled INTEGER NOT NULL DEFAULT 5'); }); - test('schema.sql has max_stalled DEFAULT 3', async () => { + test('schema.sql has max_stalled DEFAULT 5', async () => { const source = await Bun.file(new URL('../src/schema.sql', import.meta.url)).text(); - expect(source).toContain('max_stalled INTEGER NOT NULL DEFAULT 3'); + expect(source).toContain('max_stalled INTEGER NOT NULL DEFAULT 5'); }); }); diff --git a/test/minions.test.ts b/test/minions.test.ts index 14b079d..f11339d 100644 --- a/test/minions.test.ts +++ b/test/minions.test.ts @@ -270,6 +270,110 @@ describe('MinionQueue: Stall Detection', () => { }); }); +// --- v0.13.1 #219 — max_stalled default + input surface --- + +describe('MinionQueue: v0.13.1 max_stalled schema default (#219)', () => { + test('job submitted with no explicit max_stalled uses schema default of 5', async () => { + const job = await queue.add('noop', {}); + expect(job.max_stalled).toBe(5); + }); + + test('default=5 rescues across 4 consecutive stalls, dead-letters on the 5th', async () => { + const job = await queue.add('noop', {}); + // Job starts at max_stalled=5 (schema default). 
+ for (let i = 0; i < 4; i++) { + await queue.claim(`tok-${i}`, 30000, 'default', ['noop']); + await engine.executeRaw( + "UPDATE minion_jobs SET lock_until = now() - interval '1 second' WHERE id = $1", + [job.id] + ); + const { requeued, dead } = await queue.handleStalled(); + expect(dead.length).toBe(0); + expect(requeued.length).toBe(1); + expect(requeued[0].stalled_counter).toBe(i + 1); + } + // 5th stall = dead: handleStalled dead-letters a job when stalled_counter + 1 >= max_stalled. + // stalled_counter is now 4, so the next stall gives 4 + 1 = 5 >= 5 and the job goes to 'dead'. + await queue.claim('tok-final', 30000, 'default', ['noop']); + await engine.executeRaw( + "UPDATE minion_jobs SET lock_until = now() - interval '1 second' WHERE id = $1", + [job.id] + ); + const { dead } = await queue.handleStalled(); + expect(dead.length).toBe(1); + expect(dead[0].status).toBe('dead'); + }); +}); + +describe('MinionQueue: v0.13.1 MinionJobInput.max_stalled plumbing', () => { + test('honored end-to-end when provided', async () => { + const job = await queue.add('noop', {}, { max_stalled: 10 }); + expect(job.max_stalled).toBe(10); + }); + + test('clamps input > 100 to 100', async () => { + const job = await queue.add('noop', {}, { max_stalled: 9999 }); + expect(job.max_stalled).toBe(100); + }); + + test('clamps input < 1 to 1', async () => { + const job = await queue.add('noop', {}, { max_stalled: 0 }); + expect(job.max_stalled).toBe(1); + }); + + test('clamps negative input to 1', async () => { + const job = await queue.add('noop', {}, { max_stalled: -5 }); + expect(job.max_stalled).toBe(1); + }); + + test('non-integer inputs are floored before clamp', async () => { + const job = await queue.add('noop', {}, { max_stalled: 7.9 }); + expect(job.max_stalled).toBe(7); + }); + + test('undefined leaves schema default intact (5)', async () => { + const job = await queue.add('noop', {}, { max_stalled: undefined }); + expect(job.max_stalled).toBe(5); + }); +}); + +describe('MinionQueue: v0.13.1
live-queue rescue regression (#219)', () => { + test('a row at max_stalled=1 is rescued by v15 backfill', async () => { + // Simulate a pre-v0.13.1 brain that inserted a row at the old default. + const job = await queue.add('noop', {}); + await engine.executeRaw('UPDATE minion_jobs SET max_stalled = 1 WHERE id = $1', [job.id]); + + // Run the v15 backfill UPDATE directly (matches the migrate.ts v15 body). + await engine.executeRaw( + `UPDATE minion_jobs SET max_stalled = 5 + WHERE status IN ('waiting','active','delayed','waiting-children','paused') + AND max_stalled < 5` + ); + + const refetched = await queue.getJob(job.id); + expect(refetched!.max_stalled).toBe(5); + }); + + test('backfill does not touch terminal-status rows', async () => { + const job = await queue.add('noop', {}); + // Mark completed and set max_stalled=1 (simulating historical data). + await engine.executeRaw( + `UPDATE minion_jobs SET status = 'completed', max_stalled = 1, finished_at = now() WHERE id = $1`, + [job.id] + ); + + await engine.executeRaw( + `UPDATE minion_jobs SET max_stalled = 5 + WHERE status IN ('waiting','active','delayed','waiting-children','paused') + AND max_stalled < 5` + ); + + const refetched = await queue.getJob(job.id); + // Terminal rows intentionally untouched; historical data stays as-is.
+ expect(refetched!.max_stalled).toBe(1); + }); +}); + // --- Dependencies (5 tests) --- describe('MinionQueue: Dependencies', () => { diff --git a/test/pglite-engine.test.ts b/test/pglite-engine.test.ts index b2e0957..3caf032 100644 --- a/test/pglite-engine.test.ts +++ b/test/pglite-engine.test.ts @@ -891,3 +891,40 @@ describe('PGLiteEngine: getHealth graph metrics', () => { expect(h2.orphan_pages).toBe(1); }); }); + +// ───────────────────────────────────────────────────────────────── +// v0.13.1 — PGLite.create() error-wrap (structural guard for #223) +// ───────────────────────────────────────────────────────────────── +describe('PGLiteEngine: v0.13.1 error-wrap on connect() (#223)', () => { + test('pglite-engine.ts source contains the wrap with #223 hint and nested original error', async () => { + const { readFileSync } = await import('fs'); + const src = readFileSync('src/core/pglite-engine.ts', 'utf-8'); + // Structural: the try/catch block must wrap PGlite.create() (the actual + // abort site, NOT engine-factory.ts). The error message must name the + // issue and suggest gbrain doctor. Must NOT suggest "missing migrations" + // as a cause (that was conflating #218 and #223 — migrations run AFTER + // create()). + expect(src).toContain('this._db = await PGlite.create'); + expect(src).toContain('https://github.com/garrytan/gbrain/issues/223'); + expect(src).toContain('gbrain doctor'); + expect(src).toContain('Original error:'); + // Regression guard: the user-visible error MESSAGE must not re-introduce + // the misleading "missing migrations" hint. (A source comment explaining + // *why* we removed it is fine — match only inside the wrapped Error body.) 
+ const wrapStart = src.indexOf('const wrapped = new Error('); + expect(wrapStart).toBeGreaterThan(-1); + const wrapEnd = src.indexOf(');', wrapStart); + const errBody = src.slice(wrapStart, wrapEnd); + expect(errBody).not.toContain('missing migrations'); + expect(errBody).not.toContain('apply-migrations'); + }); +}); + +// ───────────────────────────────────────────────────────────────── +// v0.13.1 — Engine kind discriminator +// ───────────────────────────────────────────────────────────────── +describe('PGLiteEngine: v0.13.1 kind discriminator', () => { + test('exposes readonly kind = pglite', () => { + expect(engine.kind).toBe('pglite'); + }); +});
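The v14 tests above check three properties of the migration. The handler branches on `engine.kind`. It cleans up an invalid index left by an interrupted `CONCURRENTLY` build. The `DROP ... IF EXISTS` runs before the `CREATE ... IF NOT EXISTS`.

The ordering matters because of how Postgres handles a failed `CREATE INDEX CONCURRENTLY`. The failed build leaves an INVALID index under the target name, and a later `IF NOT EXISTS` would silently skip it. The following is a minimal sketch of that branching, not the code in `src/core/migrate.ts`. Several names are hypothetical and do not come from the patch:

- `SqlEngine`
- `pagesUpdatedAtIndexPlan`
- `runPagesUpdatedAtIndex`
- `INVALID_INDEX_SQL`

The sketch also assumes the DROP is emitted only when the catalog reports an invalid index.

```typescript
// Hypothetical minimal engine surface; only the `kind` discriminator mirrors the patch.
type EngineKind = 'postgres' | 'pglite';

interface SqlEngine {
  readonly kind: EngineKind;
  query(sql: string): Promise<{ rows: unknown[] }>;
  exec(sql: string): Promise<void>;
}

const INDEX_NAME = 'idx_pages_updated_at_desc';

// Catalog probe for a leftover INVALID index from an interrupted CONCURRENTLY build.
const INVALID_INDEX_SQL =
  `SELECT 1 FROM pg_index i JOIN pg_class c ON c.oid = i.indexrelid ` +
  `WHERE c.relname = '${INDEX_NAME}' AND NOT i.indisvalid`;

// Returns the statements to run, in order, for the given engine kind.
function pagesUpdatedAtIndexPlan(kind: EngineKind, hasInvalidIndex: boolean): string[] {
  if (kind === 'pglite') {
    // Embedded single-connection engine: a plain CREATE is enough.
    return [`CREATE INDEX IF NOT EXISTS ${INDEX_NAME} ON pages (updated_at DESC)`];
  }
  const plan: string[] = [];
  if (hasInvalidIndex) {
    // Drop first, otherwise IF NOT EXISTS would see the invalid index and no-op.
    plan.push(`DROP INDEX CONCURRENTLY IF EXISTS ${INDEX_NAME}`);
  }
  plan.push(`CREATE INDEX CONCURRENTLY IF NOT EXISTS ${INDEX_NAME} ON pages (updated_at DESC)`);
  return plan;
}

// CONCURRENTLY cannot run inside a transaction block, so each statement is
// executed on its own (the patch models this as `transaction: false`).
async function runPagesUpdatedAtIndex(engine: SqlEngine): Promise<void> {
  const invalid =
    engine.kind === 'postgres' ? (await engine.query(INVALID_INDEX_SQL)).rows.length > 0 : false;
  for (const stmt of pagesUpdatedAtIndexPlan(engine.kind, invalid)) {
    await engine.exec(stmt);
  }
}
```

Declaring `kind` as a `readonly` literal type (`'postgres' as const`) lets TypeScript narrow on it. This is one reason to prefer it over `instanceof` checks, which would force the migration module to import both engine classes.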
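The `MinionJobInput.max_stalled plumbing` tests pin a small normalization contract:

- Non-integer inputs are floored, then clamped to [1, 100].
- `undefined` leaves the column default (5) in charge.

A minimal sketch of that contract follows. The helper name `normalizeMaxStalled` and the choice to treat `NaN` as "use the default" are assumptions, not code from the patch.

```typescript
const MAX_STALLED_MIN = 1;
const MAX_STALLED_MAX = 100;

// Hypothetical helper: floor, then clamp to [1, 100].
// Returning undefined means "omit the column so the schema DEFAULT (5) applies".
function normalizeMaxStalled(input: number | undefined): number | undefined {
  if (input === undefined || Number.isNaN(input)) return undefined; // NaN handling is an assumption
  const floored = Math.floor(input);
  return Math.min(MAX_STALLED_MAX, Math.max(MAX_STALLED_MIN, floored));
}
```

Flooring before clamping is why `7.9` becomes `7`. The same ordering makes `0.5` floor to `0` and then clamp up to `1`, so no fractional value can slip below the minimum.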
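The stall-detection test relies on one gate in `handleStalled`: a stalled job is dead-lettered when `stalled_counter + 1 >= max_stalled`, and otherwise requeued with its counter incremented. The in-memory model below is a hypothetical sketch of that gate (`StallState`, `onStall`, and `requeuesBeforeDead` are invented names). It shows why the old default of 1 killed a job on its very first stall (#219), while 5 allows four rescues.

```typescript
interface StallState {
  stalled_counter: number;
  max_stalled: number;
  status: 'waiting' | 'dead';
}

// One stall event, applying the gate described in the test comments.
function onStall(job: StallState): StallState {
  if (job.stalled_counter + 1 >= job.max_stalled) {
    return { ...job, status: 'dead' };
  }
  return { ...job, stalled_counter: job.stalled_counter + 1, status: 'waiting' };
}

// Counts how many stalls are rescued (requeued) before the job is dead-lettered.
function requeuesBeforeDead(maxStalled: number): number {
  let job: StallState = { stalled_counter: 0, max_stalled: maxStalled, status: 'waiting' };
  let requeues = 0;
  while (true) {
    job = onStall(job);
    if (job.status === 'dead') return requeues;
    requeues++;
  }
}
```

With this gate, `max_stalled = N` allows `N - 1` requeues, which matches the test's loop of four rescues followed by a dead-letter on the fifth stall.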
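The v15 backfill `UPDATE` has two guards:

- The status filter limits it to non-terminal rows.
- The `max_stalled < 5` predicate makes re-runs no-ops.

The backfill also never lowers a job that was explicitly submitted with a higher value. However, it does raise explicit values below 5.

The sketch below replays that predicate over in-memory rows to make the idempotency visible. `JobRow` and `backfillMaxStalled` are hypothetical names; the status list and threshold are copied from the SQL in the tests above.

```typescript
interface JobRow {
  id: number;
  status: string;
  max_stalled: number;
}

// Statuses targeted by the backfill, as listed in the UPDATE's IN (...) clause.
const BACKFILL_STATUSES = new Set(['waiting', 'active', 'delayed', 'waiting-children', 'paused']);
const NEW_DEFAULT = 5;

// Mutates matching rows in place and returns the affected row count,
// analogous to the row count an UPDATE reports.
function backfillMaxStalled(rows: JobRow[]): number {
  let changed = 0;
  for (const row of rows) {
    if (BACKFILL_STATUSES.has(row.status) && row.max_stalled < NEW_DEFAULT) {
      row.max_stalled = NEW_DEFAULT;
      changed++;
    }
  }
  return changed;
}
```

Running it twice on the same rows changes one row the first time and zero the second. Completed rows and rows already above 5 keep their values.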
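The #223 guard expects `connect()` to do two things when `PGlite.create()` throws:

- Release any lock it already holds.
- Rethrow an error that names the issue URL, suggests `gbrain doctor`, and embeds the original message.

The wrapped message must not blame missing migrations, since migrations run only after `create()` succeeds. The following is a hedged sketch of that shape, not the shipped code. These names are hypothetical:

- `wrapPgliteCreateError`
- `connectWithCleanup`
- the injected `create` and `releaseLock` callbacks
- the `Lock` type

The exact message wording in `src/core/pglite-engine.ts` is also not reproduced here.

```typescript
const ISSUE_URL = 'https://github.com/garrytan/gbrain/issues/223';

interface Lock {
  acquired: boolean;
}

// Builds the user-facing error; the wording here is illustrative only.
function wrapPgliteCreateError(err: unknown): Error {
  const original = err instanceof Error ? err.message : String(err);
  return new Error(
    `PGLite failed to start (see ${ISSUE_URL}). ` +
      'Run `gbrain doctor` for diagnostics.\n' +
      `Original error: ${original}`,
  );
}

// Hypothetical connect flow: on create() failure, release the lock, then throw the wrapped error.
async function connectWithCleanup(
  create: () => Promise<void>,
  lock: Lock | null,
  releaseLock: (l: Lock) => Promise<void>,
): Promise<void> {
  try {
    await create();
  } catch (err) {
    if (lock?.acquired) {
      try {
        await releaseLock(lock);
      } catch {
        // Ignore cleanup errors so the original failure is what surfaces.
      }
    }
    throw wrapPgliteCreateError(err);
  }
}
```

Releasing the lock before rethrowing matters. Without it, a failed start could leave a stale lock behind for the next process that opens the same brain.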