feat: v0.17.0 — gbrain dream + runCycle primitive (one cycle, two CLIs) (#321)

* fix(sync): honor --dry-run in full-sync path + expose embedded count

Precondition for v0.17 brain maintenance cycle (runCycle primitive).

The full-sync path (performFullSync) previously called runImport() even
when opts.dryRun was true, silently writing to the DB and advancing
sync.last_commit. `gbrain sync --dry-run` on a fresh brain (or with
--full) would mutate state without warning.

Fix:
  - performFullSync now early-returns a `dry_run` SyncResult when
    opts.dryRun is set. Walks the repo via collectMarkdownFiles +
    isSyncable to count what WOULD be imported. No writes, no git
    state advance.
  - SyncResult gains an `embedded: number` field (required). Tracks
    pages re-embedded during the sync's auto-embed step. Existing
    return sites set 0; the synced + first_sync paths set real counts
    (best-estimate until commit 2 sharpens runEmbedCore's return type).
  - first_sync path now returns real added + chunksCreated counts
    from runImport instead of hardcoded zeros.
  - printSyncResult shows embedded count in human output.
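
A minimal sketch of the dry-run guard described above, under simplified assumptions. The `SyncResult` here carries only three fields, and `listMarkdown`, `isSyncable`, and `runImport` are hypothetical stand-ins passed in as parameters, not the real gbrain functions. Reporting the would-be count in `added` is also an assumption.

```typescript
// Hypothetical, simplified shapes; the real SyncResult has more fields.
type SyncStatus = "dry_run" | "first_sync";

interface SyncResult {
  status: SyncStatus;
  added: number;
  embedded: number; // required field; 0 when nothing was embedded
}

interface FullSyncDeps {
  listMarkdown: () => string[];           // stand-in for collectMarkdownFiles
  isSyncable: (path: string) => boolean;  // stand-in for isSyncable
  runImport: (paths: string[]) => number; // stand-in; returns pages imported
}

function performFullSyncSketch(
  deps: FullSyncDeps,
  opts: { dryRun?: boolean },
): SyncResult {
  const candidates = deps.listMarkdown().filter((p) => deps.isSyncable(p));
  if (opts.dryRun) {
    // Early return: count what WOULD be imported, never call runImport.
    return { status: "dry_run", added: candidates.length, embedded: 0 };
  }
  const added = deps.runImport(candidates);
  return { status: "first_sync", added, embedded: 0 };
}
```

The point of the guard is ordering: the dry-run check sits before any call that can write, so no later code path can mutate state by accident.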

Tests (test/sync.test.ts, new `performSync dry-run never writes`
block, PGLite + temp git repo, no DATABASE_URL required):
  - first-sync --dry-run: no pages, no sync.last_commit
  - incremental --dry-run after real sync: bookmark unchanged
  - --full --dry-run: no reimport, bookmark unchanged
  - SyncResult.embedded is a number

Codex outside-voice caught this. Would have shipped silent DB writes
on dry-run for anyone using `gbrain sync --dry-run --full`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(embed): add dry-run mode + return EmbedResult with counts

Precondition for v0.17 brain maintenance cycle (runCycle primitive).

runEmbedCore previously returned Promise<void> and had no dry-run mode.
That made it impossible for runCycle to (a) report accurate embedded
counts or (b) honor --dry-run without also skipping the entire embed
phase (which would have required runCycle to know embed's internal
semantics — a layering violation).

Changes:
  - EmbedOpts gains `dryRun?: boolean`. When set, embedPage and
    embedAll enumerate stale chunks (or would-be-created chunks for
    unchunked pages, via local chunkText without engine.upsertChunks)
    but never call embedBatch and never write to the engine.
  - runEmbedCore: Promise<void> -> Promise<EmbedResult>. Result shape:
    { embedded, skipped, would_embed, total_chunks, pages_processed,
      dryRun }.
    embedded = chunks newly embedded (0 in dryRun).
    would_embed = chunks that WOULD be embedded (0 in non-dryRun).
    skipped = chunks with pre-existing embeddings.
  - runEmbed CLI wrapper honors the --dry-run flag and passes the
    result through. `gbrain embed --stale --dry-run` is now a safe preview.
  - Callers ignoring the return value (sync auto-embed, autopilot
    inline fallback, jobs.ts handlers, CLI) keep compiling — the new
    return type is additive for `await` callers.
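
The counting rules above can be sketched as follows. The `EmbedResult` field names come from this commit message; the `ChunkInfo` shape and the `tallyEmbed` helper are hypothetical, and a real run would call embedBatch where the comment indicates.

```typescript
// Field names follow the commit message; everything else is illustrative.
interface EmbedResult {
  embedded: number;
  skipped: number;
  would_embed: number;
  total_chunks: number;
  pages_processed: number;
  dryRun: boolean;
}

// Hypothetical chunk shape: `hasEmbedding` marks a pre-existing vector.
interface ChunkInfo {
  hasEmbedding: boolean;
}

function tallyEmbed(pages: ChunkInfo[][], dryRun: boolean): EmbedResult {
  const result: EmbedResult = {
    embedded: 0,
    skipped: 0,
    would_embed: 0,
    total_chunks: 0,
    pages_processed: 0,
    dryRun,
  };
  for (const chunks of pages) {
    result.pages_processed += 1;
    for (const chunk of chunks) {
      result.total_chunks += 1;
      if (chunk.hasEmbedding) {
        result.skipped += 1;     // already embedded, nothing to do
      } else if (dryRun) {
        result.would_embed += 1; // preview only, no embedBatch call
      } else {
        result.embedded += 1;    // a real run would embed and write here
      }
    }
  }
  return result;
}
```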

Tests (test/embed.test.ts, new `runEmbedCore --dry-run` block, uses
the existing mock.module embedBatch pattern, no API key required):
  - dry-run --all: zero embedBatch calls, zero upsertChunks calls,
    would_embed matches stale chunk total
  - dry-run --stale correctly splits stale vs already-embedded counts
  - dry-run --slugs on a single page tallies per-chunk counts
  - non-dry-run regression guard: embedded count matches across
    concurrent workers

Codex outside-voice flagged the Promise<void> return as a blocker for
accurate CycleReport.totals.pages_embedded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(orphans): engine-injected queries, drop db.getConnection() global

Precondition for v0.17 brain maintenance cycle (runCycle primitive).

findOrphans + queryOrphanPages previously reached into the postgres-js
singleton via db.getConnection(), which (a) didn't compose with
runCycle's explicit-engine contract and (b) was wrong for PGLite test
fixtures and for any caller not using the default global connection.
Codex outside-voice flagged this as a blocker.

Changes:
  - BrainEngine interface gains findOrphanPages() — returns pages with
    no inbound links via the same NOT EXISTS anti-join. Implemented on
    both postgres-engine (sql tag) and pglite-engine (db.query).
  - findOrphans signature: findOrphans(engine, { includePseudo }).
    Engine is required. Uses engine.findOrphanPages() and
    engine.getStats().page_count instead of raw SQL + global counts.
  - queryOrphanPages signature: queryOrphanPages(engine). Delegates to
    engine.findOrphanPages().
  - src/commands/orphans.ts drops the `import * as db` — no more
    global-state coupling.
  - Callers updated: src/core/operations.ts find_orphans handler now
    passes ctx.engine through; runOrphans CLI entry uses its engine arg.
  - No signature change needed in cli.ts (it was already passing engine
    via CLI_ONLY dispatch).
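
A minimal sketch of the engine-injected shape, assuming a hypothetical two-method slice of `BrainEngine`. Treating a leading underscore as the marker for pseudo pages, and `getStats()` being async, are assumptions made for illustration only.

```typescript
// Hypothetical minimal slice of the BrainEngine interface.
interface OrphanPage {
  slug: string;
}

interface EngineSlice {
  findOrphanPages(): Promise<OrphanPage[]>;
  getStats(): Promise<{ page_count: number }>;
}

async function findOrphansSketch(
  engine: EngineSlice,
  opts: { includePseudo?: boolean } = {},
): Promise<{ orphans: OrphanPage[]; total_pages: number }> {
  const all = await engine.findOrphanPages();
  // Assumption: pseudo pages are recognizable by a leading underscore.
  const orphans = opts.includePseudo
    ? all
    : all.filter((page) => !page.slug.startsWith("_"));
  const { page_count } = await engine.getStats();
  return { orphans, total_pages: page_count };
}
```

Because the engine arrives as an argument, a test can hand in an in-memory fake and never touch a global connection.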

Tests (test/orphans.test.ts, new `findOrphans (engine-injected)`
describe block, PGLite in-memory, no DATABASE_URL required):
  - links correctly scope orphans (alice links to bob -> bob not
    an orphan; alice is)
  - includePseudo:true surfaces _atlas-style pages
  - queryOrphanPages delegates to passed engine
  - empty brain returns {orphans: [], total_pages: 0} without crashing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cycle): add runCycle primitive in src/core/cycle.ts

The brain maintenance cycle as a single function. Six phases in
semantically-driven order (lint → backlinks → sync → extract → embed →
orphans: fix files first, then index them). Pure composition of existing
library calls: no execSync, no subprocess anti-patterns, no regex-parsed output.

    ┌───────────────────────────────────────────────────┐
    │ runCycle(engine, opts) → CycleReport              │
    │   Phase 1: lint --fix         (fs writes)         │
    │   Phase 2: backlinks --fix    (fs writes)         │
    │   Phase 3: sync               (DB picks up 1+2)   │
    │   Phase 4: extract            (DB picks up links) │
    │   Phase 5: embed --stale      (DB writes)         │
    │   Phase 6: orphans            (DB read, report)   │
    └───────────────────────────────────────────────────┘

Why this commit (commit 4) introduces the primitive:

  - CEO + Eng + Codex reviews all converged on "extract one cycle
    function, wire both dream and autopilot through it." Two CLIs,
    one definition of what the brain does overnight.
  - Phase order was wrong in PR #309's original dream.ts (sync
    before lint+backlinks lost the "fix files, then index them"
    semantic).
  - This commit is the bisectable foundation; commit 5 (dream)
    and commit 6 (autopilot+jobs) just call into it.

Coordination — the codex-flagged blocker:

Session-scoped pg_try_advisory_lock does not survive PgBouncer
transaction pooling (the v0.15.4 fix made pooled connections the
default). Replaced with a DB lock table (gbrain_cycle_locks) that
works through every pooler:

  - Acquire: INSERT ... ON CONFLICT DO UPDATE ... WHERE ttl < NOW()
  - Refresh: UPDATE ttl_expires_at between phases via hook
  - Release: DELETE in finally{}
  - TTL: 30 min; crashed holders auto-release
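
The acquire / refresh / release semantics above can be modeled in memory as follows. This is a sketch of the behavior only, not the SQL: the `TtlLock` class and its field names are hypothetical, and time is passed in explicitly so the expiry rule is easy to follow.

```typescript
// In-memory model of a TTL lock row; the real implementation is a SQL table.
interface LockRow {
  holder: string;
  ttlExpiresAt: number; // epoch milliseconds
}

class TtlLock {
  private row: LockRow | null = null;
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Acquire succeeds when no row exists or the existing row has expired.
  acquire(holder: string, now: number): boolean {
    if (this.row !== null && this.row.ttlExpiresAt >= now) return false;
    this.row = { holder, ttlExpiresAt: now + this.ttlMs };
    return true;
  }

  // Refresh pushes the expiry forward, but only for the current holder.
  refresh(holder: string, now: number): boolean {
    if (this.row === null || this.row.holder !== holder) return false;
    this.row.ttlExpiresAt = now + this.ttlMs;
    return true;
  }

  // Release deletes the row if this holder owns it (call from finally{}).
  release(holder: string): void {
    if (this.row !== null && this.row.holder === holder) this.row = null;
  }
}
```

Because the lock state lives in a row and not in a session, it does not depend on which pooled connection serves each statement, and a crashed holder is reclaimed once its TTL passes.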

PGLite / engine=null path uses a file lock at ~/.gbrain/cycle.lock
with PID liveness check. kill(pid, 0) with EPERM treated as alive
(so init/launchd-pid holders aren't mis-classified as stale).
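
A minimal sketch of the PID liveness check, assuming Node-style `process.kill` semantics (signal 0 probes for existence without delivering a signal). The `isPidAlive` name and the injectable `kill` parameter are hypothetical; treating every error other than EPERM as "not alive" is an assumption.

```typescript
// `kill` is injectable so the logic can be exercised without real processes.
type KillFn = (pid: number, signal: number) => void;

function isPidAlive(
  pid: number,
  kill: KillFn = (p, s) => {
    process.kill(p, s);
  },
): boolean {
  try {
    kill(pid, 0); // signal 0: existence and permission check only
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    // EPERM: the process exists but belongs to another user, so it is alive.
    if (code === "EPERM") return true;
    // ESRCH (no such process) and anything else: treat the holder as gone.
    return false;
  }
}
```

Without the EPERM branch, a lock held by a process owned by another user (PID 1, for example) would look dead and be stolen.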

Lock-skip: only phases that mutate state (lint, backlinks, sync,
extract, embed) trigger lock acquisition. orphans is read-only.
Single-phase --phase orphans runs never block on a held lock.

Engine-null mode preserved: filesystem phases run, DB phases skip
with {status:'skipped', reason:'no_database'}. This keeps a capability
of the current dream command that would have been lost if runCycle
required a connected engine.

Contract details:

  - CycleReport has schema_version:"1" (stable, additive) so agents
    consuming --json can rely on the shape
  - status: 'ok' | 'clean' | 'partial' | 'skipped' | 'failed'.
    'clean' = ran successfully with zero activity; agents trivially
    detect a healthy brain.
  - PhaseResult.error: { class, code, message, hint?, docs_url? }
    (Stripe-API-tier structured failure info) when status='fail'
  - yieldBetweenPhases hook: awaited between EVERY phase and before
    return, runs even after phase failure, exceptions logged but
    non-fatal. Required so the Minions autopilot-cycle handler can
    renew its job lock between phases (prevents the v0.14 stall-death
    regression codex flagged).
  - git pull explicit: opts.pull defaults to false (cron-safe).
    Autopilot daemon callers opt in if user configured it.
  - extract phase doesn't have a dry-run mode in the underlying
    library function, so runCycle honestly skips extract when
    dryRun=true (status:'skipped', reason:'no_dry_run_support').

Schema migration v16: gbrain_cycle_locks table + idx_cycle_locks_ttl.
Also appended to src/schema.sql and src/core/pglite-schema.ts for
fresh installs. schema-embedded.ts regenerated via build:schema.

Tests (test/core/cycle.test.ts, PGLite in-memory + mocked library
functions, no DATABASE_URL required):

  - dryRun × phases matrix: dryRun:true reaches lint/backlinks/sync/
    embed; extract is honestly skipped
  - Phase selection: default runs all 6 in order; --phase lint runs
    only lint; --phase orphans runs only orphans
  - Lock semantics: acquire + release on mutating phases, skip
    entirely for read-only selections
  - cycle_already_running: seeded live-holder lock → status:skipped,
    zero phase runs; TTL-expired holder → auto-claimed
  - Engine null: filesystem phases run, DB phases skip
  - File lock (engine=null) blocks when PID 1 holds lock with fresh
    mtime — exercises the PID liveness branch including EPERM
  - Status derivation: 'ok' vs 'clean' vs 'partial' vs 'skipped'
  - yieldBetweenPhases called N times, hook exceptions non-fatal

Next: commit 5 rewrites dream.ts as a thin CLI alias over runCycle,
commit 6 migrates autopilot daemon + jobs.ts handler to delegate to
runCycle too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dream): add gbrain dream CLI as a thin alias over runCycle

`gbrain dream` is the README brand-promise command: "the agent runs
while I sleep, the dream cycle ... I wake up and the brain is smarter."
Cron-friendly, JSON-reportable, phase-selectable. Same maintenance
cycle as `gbrain autopilot`, just scheduled differently — both
converge on runCycle (added in commit 4) so there's one source of
truth for what happens overnight.

Contract:
  gbrain dream                       # full 6-phase cycle
  gbrain dream --dry-run             # preview, no writes
  gbrain dream --json                # CycleReport JSON (agent-readable)
  gbrain dream --phase <name>        # single-phase run
  gbrain dream --pull                # git pull before syncing
  gbrain dream --dir /path/to/brain  # explicit brain location

Cron: 0 2 * * * gbrain dream --json >> /var/log/gbrain-dream.log

Behavior details:
  - Brain-dir resolution: requires explicit --dir OR sync.repo_path
    in engine config. No more walk-up-cwd-for-.git footgun that
    PR #309's original dream.ts had (would lint unrelated git repos).
  - engine=null mode preserved via cli.ts's try/catch around
    connectEngine — filesystem phases (lint, backlinks) still run
    without a DB, DB phases report skipped/no_database in the output.
  - status=clean prints "Brain is healthy. N phase(s) checked in Ns."
    status=skipped prints the reason (cycle_already_running, etc.).
    Partial/failed prints the phase-by-phase detail.
  - Exit code 1 when status=failed (cron spots real problems).
    'partial' is not a failure — warnings shouldn't page you.
  - --help text cross-references `autopilot --install` for users
    who want continuous maintenance as a daemon.
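
A small sketch of the exit-code and clean-output rules above. Both helper names are hypothetical; mapping 'skipped' to exit code 0 and formatting the duration to one decimal place are assumptions.

```typescript
type CycleStatus = "ok" | "clean" | "partial" | "skipped" | "failed";

// Only "failed" maps to a non-zero exit code; "skipped" as 0 is an assumption.
function exitCodeFor(status: CycleStatus): number {
  return status === "failed" ? 1 : 0;
}

// Hypothetical formatter for the clean-status line quoted above.
function healthyLine(phaseCount: number, elapsedMs: number): string {
  const seconds = (elapsedMs / 1000).toFixed(1);
  return `Brain is healthy. ${phaseCount} phase(s) checked in ${seconds}s.`;
}
```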

CLI registration (src/cli.ts):
  - 'dream' added to CLI_ONLY
  - handleCliOnly has a pre-engine branch mirroring doctor's pattern:
    try connectEngine() → ok path; catch → runDream(null, args) so
    filesystem phases still run when DB is down
  - Help text updated with one-line dream entry and autopilot cross-ref
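
The pre-engine branch can be sketched as a generic wrapper. `withOptionalEngine` is a hypothetical name; `connect` and `run` stand in for connectEngine and runDream, whose real signatures are not shown here.

```typescript
// Try to connect; on any failure fall back to running with engine = null
// so filesystem-only phases still execute.
async function withOptionalEngine<E, R>(
  connect: () => Promise<E>,
  run: (engine: E | null) => Promise<R>,
): Promise<R> {
  let engine: E | null = null;
  try {
    engine = await connect();
  } catch {
    engine = null; // DB is down or unconfigured; continue without it
  }
  return run(engine);
}
```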

Tests (test/dream.test.ts, real PGLite + real library calls, no mocks
to avoid `mock.module` leakage across test files):
  - brainDir resolution: explicit --dir wins, engine config fallback,
    missing + nonexistent errors
  - phase selection: --phase lint|orphans produces single-phase report
  - phase validation: --phase garbage exits 1
  - output: --json parses as CycleReport with schema_version:"1"
  - human output mentions "Brain is healthy" on clean status
  - dry-run: cycle runs but DB stays untouched
  - exit code: clean/ok/partial do not call process.exit

Also (test/core/cycle.test.ts): refactored to use beforeAll/afterAll
with one shared PGLite engine per describe + truncateCycleLocks
between tests. Cuts test time from ~11s to ~4s; avoids the 15-migration
penalty per test that was causing parallel-suite timeout flakes.

Co-Authored-By: Wintermute <wintermute@garrytan.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v0.17.0 — autopilot + jobs delegate to runCycle (unifies the cycle)

Autopilot daemon (`--inline` path) and Minions `autopilot-cycle`
handler both now delegate to `runCycle` (introduced in commit 4).
Three callers, one cycle definition:

  1. `gbrain dream`                        — one-shot cron cycle
  2. `gbrain autopilot` daemon inline path — scheduled cycles
  3. `autopilot-cycle` Minions handler     — durable queue with retry

All three share:
  - Same 6 phases in same order (lint → backlinks → sync → extract →
    embed → orphans)
  - Same DB lock table coordination (`gbrain_cycle_locks`)
  - Same yieldBetweenPhases discipline (prevents v0.14 stall-death)
  - Same structured CycleReport output

Autopilot inline path gains lint + orphan sweep that the old path
skipped. Minions autopilot-cycle handler also gains lint + orphans.
Users who run `gbrain autopilot --install` see 6-phase reports in
`gbrain jobs get <id>` starting on next interval. No config change
required.

Changes:
  - `src/commands/autopilot.ts`: inline fallback path (~20 lines)
    replaces the ~22-line sync+extract+embed sequence with a single
    runCycle call. Uses pull:true (matches pre-v0.17 autopilot
    behavior). Uses setImmediate yield hook. Status/failure reporting
    derives from CycleReport.status. `--help` cross-references `gbrain
    dream` for one-shot use.
  - `src/commands/jobs.ts:579` (`autopilot-cycle` handler): replaces
    the 4-step try/catch sequence with a runCycle call. Returns
    `{ partial, status, report }` so `gbrain jobs get <id>` shows the
    full structured CycleReport. Preserves partial-failure semantic
    (one phase failing does NOT throw; next cycle still runs).
    yieldBetweenPhases yields the event loop between phases for the
    worker's lock-renewal timer.

Release scaffolding:
  - VERSION: 0.16.0 → 0.17.0
  - CHANGELOG.md: v0.17.0 entry in GStack voice — headline, numbers
    table, "what this means" paragraph, "To take advantage" block
    per CLAUDE.md post-ship rules. Itemized changes below the fold.
    Credit to @Wintermute for the original PR #309 thesis.
  - skills/migrations/v0.17.0.md: documents what changed for
    upgrading users. No mechanical action required — schema migration
    v16 (cycle locks table) + handler delegation both apply
    automatically. Includes opt-out paths for users who don't want
    their daemon modifying files (use `dream --phase orphans` in cron
    and skip autopilot-install, or other explicit configs).
  - CLAUDE.md: new entries for `src/core/cycle.ts` and
    `src/commands/dream.ts` with contract details.

Tests: no new test file needed for this commit — the cycle primitive
is extensively tested in test/core/cycle.test.ts (18 cases), dream
in test/dream.test.ts (11), and autopilot's delegation is mechanical
(calls runCycle with specific opts). The handler contract is covered
implicitly: if runCycle returns a CycleReport, the handler wraps it
in `{ partial, status, report }` — nothing else to assert.

Verified:
  - `bun test test/autopilot-install.test.ts test/autopilot-resolve-cli.test.ts test/core/cycle.test.ts test/dream.test.ts` → 37 pass, 0 fail

Completes the v0.17.0 feature: 6 bisectable commits on one branch
(garrytan/v0.17-dream-cycle), ready to push as one PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): add runCycle + dream E2E coverage against real Postgres

Gap from the v0.17 commit series: PR #321 shipped unit-level tests
for runCycle (test/core/cycle.test.ts) and dream (test/dream.test.ts)
but no E2E coverage that exercises the real Postgres paths. Filling
that in before merge.

  test/e2e/cycle.test.ts (6 cases):
    - schema migration v16 created gbrain_cycle_locks + index
    - dry-run full cycle: zero DB writes + lock table empty after
    - live cycle: pages + chunks materialize, sync.last_commit set
    - concurrent cycle blocked by lock → status:'skipped'
    - TTL-expired lock auto-claimed (crashed-holder recovery)
    - --phase orphans skips lock entirely (read-only optimization)

  test/e2e/dream.test.ts (3 cases):
    - dream --dry-run --json emits valid CycleReport + DB stays empty
    - dream (no --dry-run) syncs pages into real DB
    - dream --phase orphans doesn't touch the cycle-lock table

Both files mock embedBatch via mock.module so the embed phase never
calls OpenAI even when the full 6-phase cycle runs (zero API cost,
zero flakiness from network calls).

Verified locally:
  - `docker run pgvector/pgvector:pg16` on port 5434
  - `DATABASE_URL=... bun test test/e2e/cycle.test.ts test/e2e/dream.test.ts` → 9 pass, 0 fail
  - Full E2E suite (`bun run test:e2e`): 16 files, 150 tests, 0 fail
  - Container torn down after: `docker stop + rm gbrain-test-pg`

Per CLAUDE.md E2E test DB lifecycle. These tests skip gracefully when
DATABASE_URL isn't set (via hasDatabase() helper + describe.skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Wintermute <wintermute@garrytan.com>
Garry Tan
2026-04-22 08:23:24 -07:00
committed by GitHub
parent 35967645f3
commit 55ca4984b2
29 changed files with 3112 additions and 150 deletions


@@ -2,6 +2,116 @@
All notable changes to GBrain will be documented in this file.
## [0.17.0] - 2026-04-22
## **`gbrain dream`. Run the brain maintenance cycle while you sleep.**
## **One primitive, two CLIs. Autopilot gains lint + orphan sweep automatically.**
The README has promised "the dream cycle" for a year. v0.17 makes it real as a first-class command. `gbrain dream` runs one maintenance cycle and exits, designed for cron. Same six phases as `gbrain autopilot` — they both delegate to the new `runCycle` primitive in `src/core/cycle.ts`. One source of truth for what your brain does overnight.
Phase order is semantically driven: **fix files first, then index them**. Lint and backlinks write to disk. Sync picks them up into the DB. Extract links the graph. Embed refreshes vectors. Orphan sweep reports the gaps. If your autopilot daemon was doing sync-before-lint (which PR #309's original dream.ts also got wrong), your fixes landed the next cycle instead of the current one. Fixed.
Autopilot users upgrading get lint + orphan sweep for free. No config change. `gbrain jobs get <id>` shows the full 6-phase report now. If you don't want the daemon modifying files, `gbrain dream --phase orphans` in cron keeps autopilot for embed+sync and gives you manual control over the writes.
### The numbers that matter
Measured against a v0.16 baseline. Lines-of-code delta is net-small: runCycle adds ~500 lines, but the new dream.ts is 80 lines (vs the 446-line original in PR #309), and autopilot's two-path branching collapses to one delegated call.
| Metric | BEFORE v0.17 | AFTER v0.17 | Δ |
|--------|--------------|-------------|---|
| `gbrain dream --dry-run` mutates DB | Yes (full-sync + embed silently wrote) | No (every phase honors dry-run) | correctness |
| Sources of truth for "the cycle" | 3-4 (dream inline, dream shell-outs, autopilot inline, Minions handler) | 1 (`runCycle`) | DRY win |
| Phase order: fix-then-index | No (sync before lint) | Yes (lint → backlinks → sync → extract → embed → orphans) | semantics |
| Coordination across daemon + cron + Minions worker | Lockfile heuristic with 6 known holes | DB lock table + PID-liveness file lock | primitive upgrade |
| Works under PgBouncer transaction pooling | No (session-scoped `pg_try_advisory_lock`) | Yes (TTL row, refreshed between phases) | Supabase-safe |
| `findRepoRoot` walks into wrong git repo | Yes (10 levels of cwd) | No (explicit --dir OR configured sync.repo_path) | footgun fixed |
| Autopilot daemon phase count | 4 (sync+extract+embed+backlinks in Minions mode; no backlinks inline) | 6 (+lint +orphans) | feature parity |
| CycleReport shape stability for agents | N/A | `schema_version: "1"` (stable, additive only) | API contract |
### What this means for your workflow
Cron users: one line. `0 2 * * * gbrain dream --json >> /var/log/gbrain-dream.log`. You get a structured `CycleReport` every morning with per-phase timing, counts, and any errors tagged with `{class, code, message, hint, docs_url}`.
Autopilot users: nothing to do. Your daemon picks up the new phases on next cycle. If you want to see them: `gbrain jobs get <autopilot-cycle-id>` shows the full report.
Reviewers/codex caught three plan-breakers during multi-round review that would have shipped silent DB writes on dry-run: (1) `performSync`'s full-sync path was ignoring `opts.dryRun`, (2) `runEmbedCore` had no dry-run mode and returned void, (3) `findOrphans` used `db.getConnection()` global and didn't compose with a passed engine. All three are fixed as preconditions (commits 1-3 of the 6-commit bisectable series).
Credit: @Wintermute for the original `gbrain dream` thesis (PR #309). The brand-promise framing survived; the implementation got redesigned from scratch around the runCycle primitive after CEO + Eng + Codex + DX review found structural issues.
## To take advantage of v0.17.0
`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor` warns about a partial migration:
1. **Run the migration orchestrator manually:**
```bash
gbrain apply-migrations --yes
```
2. **Your agent reads `skills/migrations/v0.17.0.md` the next time you interact with it.** No mechanical host-repo action required; the schema migration (v16 cycle-lock table) and the behavior shift in autopilot's inline path both apply automatically.
3. **Verify the outcome:**
```bash
gbrain dream --help # new command exists
gbrain dream --dry-run --json # safe preview
gbrain doctor # should show no pending migrations
```
Autopilot users: `gbrain jobs list --status complete | head -5` and inspect an `autopilot-cycle` job with `gbrain jobs get <id>` — the report now includes 6 phases.
4. **If any step fails or the numbers look wrong,** please file an issue: https://github.com/garrytan/gbrain/issues with:
- output of `gbrain doctor`
- contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
- which step broke
This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
### Itemized changes
**New CLI command: `gbrain dream`**
- One-shot maintenance cycle for cron. Exits when done. Flags: `--dry-run`, `--json`, `--phase <name>`, `--pull`, `--dir <path>`, `--help`.
- `--help` shows cron example + cross-reference to `autopilot --install` for continuous daemon.
- Empty-state output is intentionally satisfying: `Brain is healthy. 6 phase(s) checked in 2.3s.` Agents detect it via `status: "clean"`.
- Exit code 1 on `status: "failed"`. Warnings (`status: "partial"`) are not failures — don't page someone.
- `--dir` OR `sync.repo_path` config required. No more walk-up-cwd-for-.git footgun.
**New primitive: `src/core/cycle.ts`**
- `runCycle(engine: BrainEngine | null, opts: CycleOpts): Promise<CycleReport>`.
- Six phases in order: lint → backlinks → sync → extract → embed → orphans.
- `CycleReport` has `schema_version: "1"` (stable, additive). `status: 'ok' | 'clean' | 'partial' | 'skipped' | 'failed'` with `reason` field on skipped.
- `PhaseResult.error: { class, code, message, hint?, docs_url? }` on fail. Stripe-API-tier structured errors.
- `yieldBetweenPhases` hook awaited between every phase + before return. Required for Minions worker lock renewal. Exceptions non-fatal.
- Engine nullable — filesystem phases run without DB; DB phases skip with `reason: "no_database"`.
- Lock-skip: read-only phase selections (`--phase orphans`) skip lock acquisition.
**New schema: `gbrain_cycle_locks` (migration v16)**
- DB lock table with TTL (30 min), replaces session-scoped `pg_try_advisory_lock` which the v0.15.4 PgBouncer-transaction-pooler fix silently broke.
- Refreshed between phases via the yield hook. Crashed holders auto-release on TTL expiry.
- PGLite + engine=null use a file-based fallback at `~/.gbrain/cycle.lock` with PID-liveness check (EPERM treated as alive so PID 1 holders aren't mis-classified).
**Autopilot + Minions integration**
- Autopilot's inline fallback path (`--inline` flag + PGLite mode) now delegates to `runCycle`. Gains lint + orphan phases it didn't run before. Uses `pull: true` by default (preserves pre-v0.17 pull semantics).
- Minions `autopilot-cycle` handler (in `src/commands/jobs.ts`) also delegates to `runCycle`. Returns `{ partial, status, report }` so `gbrain jobs get <id>` surfaces the full structured report.
- `gbrain autopilot --install` install/uninstall/launchd/systemd/crontab machinery untouched.
- `gbrain autopilot --help` now cross-references `gbrain dream`.
**Precondition fixes (required for the runCycle primitive to compose cleanly)**
- `src/commands/sync.ts`: `performFullSync` honors `opts.dryRun` in first-sync + `--full` paths. Was silently calling `runImport` regardless. `SyncResult.embedded: number` field added; `first_sync` path now returns real counts from `runImport` (was hardcoded to 0).
- `src/commands/embed.ts`: `runEmbedCore` adds `dryRun?: boolean` opt and returns `EmbedResult { embedded, skipped, would_embed, total_chunks, pages_processed, dryRun }` instead of `void`. `gbrain embed --stale --dry-run` is now a safe preview.
- `src/commands/orphans.ts`: `findOrphans(engine, opts)` takes a `BrainEngine` parameter. Added `findOrphanPages()` method to `BrainEngine` interface + implementations on both `postgres-engine` and `pglite-engine`. Drops `db.getConnection()` global — findOrphans now composes with test-injected engines and works on PGLite.
**Tests (all run in CI, no DATABASE_URL or API keys required)**
- `test/sync.test.ts`: 4 new cases. First-sync dry-run, incremental dry-run, `--full` dry-run, SyncResult.embedded shape. PGLite + temp git repo.
- `test/embed.test.ts`: 4 new cases. Dry-run with stale chunks, dry-run stale-vs-fresh split, dry-run --slugs, non-dry-run regression guard. Mocked `embedBatch`.
- `test/orphans.test.ts`: 4 new cases. Engine-injected findOrphans, includePseudo flag, queryOrphanPages delegation, empty-brain edge. PGLite.
- `test/core/cycle.test.ts` (new): 18 cases covering dryRun × phases × lock_held × engine-null. Shared PGLite engine per describe via beforeAll + truncateCycleLocks (cuts test time ~3x vs per-test init).
- `test/dream.test.ts` (rewritten, 11 cases): brainDir resolution, phase selection, phase validation, JSON output shape, dry-run propagation, exit-code semantics. Real PGLite + real library calls (no `mock.module` to avoid leakage).
**Docs**
- `skills/migrations/v0.17.0.md`: new. Informational, no mechanical action required.
- `CHANGELOG.md` + `CLAUDE.md`: updated.
**PR #309 disposition**
- Closed with credit to @Wintermute. Their thesis ("`gbrain dream` as first-class CLI verb") was right; the implementation got redesigned around the runCycle primitive after deep review surfaced structural issues in the fold approach.
- `Co-Authored-By: Wintermute` preserved on commit 5 (the dream.ts rewrite).
---
## [0.16.4] - 2026-04-22
## **`gbrain check-resolvable` ships. The command the README promised for weeks.**
@@ -59,6 +169,8 @@ gbrain check-resolvable || exit 1 # fails the build on any warning/error
**Contract note for CI users**
- `gbrain check-resolvable` exits 1 on warnings AND errors. `gbrain doctor`'s resolver_health block still exits 0 on warnings-only. If you scripted against doctor's looser gate, `check-resolvable` will bite harder ... on purpose. This honors the README:259 contract: "Exits non-zero if anything is off."
---
## [0.16.3] - 2026-04-22
## **`gbrain agent run` actually runs now. The subagent SDK wiring that shipped broken in v0.16.0 is fixed.**
@@ -140,6 +252,8 @@ So `tsc --noEmit` actually stays green. All mechanical, zero behavior change. Gr
### Itemized changes
---
## [0.16.2] - 2026-04-22
## **The deployment guide now reads like a runbook an agent can execute line-by-line.**
@@ -233,7 +347,6 @@ Docs-only release. No code changed. Zero migration required.
**Added**
- **Minions worker deployment guide** — new `docs/guides/minions-deployment.md` covering watchdog cron patterns, inline `--follow` for cron-only workloads, and the sharp edges of running `gbrain jobs work` against Supabase in production. Addresses a real gap: existing Minions docs (`minions-fix.md`, `minions-shell-jobs.md`) cover schema repair and shell-job security, not deploy patterns. Contributed by your OpenClaw via #287. Pre-landing accuracy pass corrected five factual bugs against current source: the `max_stalled` column default (5, not 1 or 3), the stalled-jobs smoke-test query (`active`, not `waiting`), the SIGTERM-to-SIGKILL grace window (10s minimum, not 2s), the cron env pattern (crontab env lines, not `source ~/.bashrc`), and the `--follow` exit semantics (blocks until submitted job is terminal, not until queue is empty).
## [0.16.0] - 2026-04-20
## **Durable agents land. Your LLM loops survive crashes, timeouts, and worker restarts now.**


@@ -86,6 +86,8 @@ strict behavior when unset.
- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows.
- `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2.
- `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=<ms>` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers.
- `src/core/cycle.ts` — v0.17 brain maintenance cycle primitive. `runCycle(engine: BrainEngine | null, opts: CycleOpts): Promise<CycleReport>` composes 6 phases in semantically-driven order (lint → backlinks → sync → extract → embed → orphans). Three callers: `gbrain dream` CLI, `gbrain autopilot` daemon's inline path, and the Minions `autopilot-cycle` handler (`src/commands/jobs.ts`). One source of truth for what the brain does overnight. Coordination via `gbrain_cycle_locks` DB table (TTL-based; works through PgBouncer transaction pooling, unlike session-scoped `pg_try_advisory_lock`) + `~/.gbrain/cycle.lock` file lock with PID-liveness for PGLite / engine=null mode. `CycleReport.schema_version: "1"` is the stable agent-consumable shape. `PhaseResult.error: { class, code, message, hint?, docs_url? }` is Stripe-API-tier structured failure info. `yieldBetweenPhases` hook awaited between every phase — Minions handler uses this to renew its job lock and prevent v0.14 stall-death regression. Engine nullable: filesystem phases (lint, backlinks) run without DB; DB phases skip with `status: "skipped", reason: "no_database"`. Lock-skip: read-only phase selections (`--phase orphans`) bypass the cycle lock.
- `src/commands/dream.ts` — v0.17 `gbrain dream` CLI. ~80-line thin alias over `runCycle`. brainDir resolution requires explicit `--dir` OR `sync.repo_path` config (no more walk-up-cwd-for-.git footgun). Flags: `--dry-run`, `--json`, `--phase <name>`, `--pull`, `--dir <path>`. Exit code 1 on status=failed (partial/warn not fatal — don't page on warnings).
- `scripts/check-progress-to-stdout.sh` — CI guard against regressing to `\r`-on-stdout progress. Wired into `bun run test` via `scripts/check-progress-to-stdout.sh && bun test` in package.json.
- `docs/progress-events.md` — Canonical JSON event schema reference. Stable from v0.15.2, additive only.
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).


@@ -1 +1 @@
0.16.4
0.17.0


@@ -165,6 +165,8 @@ strict behavior when unset.
- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows.
- `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2.
- `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=<ms>` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers.
- `src/core/cycle.ts` — v0.17 brain maintenance cycle primitive. `runCycle(engine: BrainEngine | null, opts: CycleOpts): Promise<CycleReport>` composes 6 phases in semantically-driven order (lint → backlinks → sync → extract → embed → orphans). Three callers: `gbrain dream` CLI, `gbrain autopilot` daemon's inline path, and the Minions `autopilot-cycle` handler (`src/commands/jobs.ts`). One source of truth for what the brain does overnight. Coordination via `gbrain_cycle_locks` DB table (TTL-based; works through PgBouncer transaction pooling, unlike session-scoped `pg_try_advisory_lock`) + `~/.gbrain/cycle.lock` file lock with PID-liveness for PGLite / engine=null mode. `CycleReport.schema_version: "1"` is the stable agent-consumable shape. `PhaseResult.error: { class, code, message, hint?, docs_url? }` is Stripe-API-tier structured failure info. `yieldBetweenPhases` hook awaited between every phase — Minions handler uses this to renew its job lock and prevent v0.14 stall-death regression. Engine nullable: filesystem phases (lint, backlinks) run without DB; DB phases skip with `status: "skipped", reason: "no_database"`. Lock-skip: read-only phase selections (`--phase orphans`) bypass the cycle lock.
- `src/commands/dream.ts` — v0.17 `gbrain dream` CLI. ~80-line thin alias over `runCycle`. brainDir resolution requires explicit `--dir` OR `sync.repo_path` config (no more walk-up-cwd-for-.git footgun). Flags: `--dry-run`, `--json`, `--phase <name>`, `--pull`, `--dir <path>`. Exit code 1 on status=failed (partial/warn not fatal — don't page on warnings).
- `scripts/check-progress-to-stdout.sh` — CI guard against regressing to `\r`-on-stdout progress. Wired into `bun run test` via `scripts/check-progress-to-stdout.sh && bun test` in package.json.
- `docs/progress-events.md` — Canonical JSON event schema reference. Stable from v0.15.2, additive only.
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).


@@ -0,0 +1,167 @@
---
version: 0.17.0
feature_pitch:
headline: "One brain maintenance cycle, two CLIs. `gbrain dream` delivers the README promise."
description: |
The README has said "the agent runs while I sleep, the dream cycle
scans every conversation, enriches missing entities, fixes broken
citations, consolidates memory" for a year. v0.17 makes that real
as a first-class command (`gbrain dream`) backed by one shared
primitive (`runCycle`). Autopilot users get lint + orphan sweep
added to their nightly cycle automatically — no config change.
Cron users get a single legible verb: `0 2 * * * gbrain dream`.
Both converge on the same phase order (lint → backlinks → sync →
extract → embed → orphans) so file fixes land in the DB the same
night, not the next.
recipe: null
tiers: null
---
# v0.17.0 Migration: `gbrain dream` + unified maintenance cycle
**Audience: agents + humans upgrading from v0.16.x. There is no
mechanical migration step required — the schema migration (v16
cycle-lock table) and behavior changes all apply automatically on
upgrade. This file documents what changed, how to verify it, and
the one opt-out users may care about.**
## What changed
### New command: `gbrain dream`
The brand-promise one-liner. Runs one brain maintenance cycle and
exits. Designed for cron.
```
gbrain dream # full 6-phase cycle
gbrain dream --dry-run --json # preview, agent-readable
gbrain dream --phase lint # single-phase (fast, targeted)
gbrain dream --pull # git pull before syncing
0 2 * * * gbrain dream --json # nightly cron
```
See `gbrain dream --help` for the full flag reference.
### Autopilot now runs lint + orphan sweep
`gbrain autopilot --install` users: on upgrade, your daemon's cycle
gains two phases it didn't run before:
- **lint --fix** — auto-fixes LLM artifacts, placeholder dates, bad
citations across the brain. Modifies files on disk.
- **orphan sweep** — reports (read-only) pages with no inbound
wikilinks. Visible in `gbrain jobs list` output for each
`autopilot-cycle` job.
No action required. The new phases run on the daemon's existing
interval.
### Shared primitive: `src/core/cycle.ts`
Three callers (dream CLI, autopilot inline path, autopilot-cycle
Minions handler) now all delegate to `runCycle(engine, opts)`. One
source of truth for what happens overnight.
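As a rough mental model of the shared primitive, the sketch below runs
phases in the documented order, skips database phases when no engine is
available, and awaits a hook between phases. It is not the actual
`runCycle` implementation: `runPhases`, `PhaseOutcome`, and `FS_PHASES`
are hypothetical names, and treating every phase other than lint and
backlinks as a database phase is an assumption.

```typescript
// Illustrative only; the names below are hypothetical, not the cycle.ts API.
const PHASE_ORDER = ['lint', 'backlinks', 'sync', 'extract', 'embed', 'orphans'] as const;
type Phase = (typeof PHASE_ORDER)[number];

// Assumption: every phase other than lint and backlinks needs the database.
const FS_PHASES: ReadonlySet<Phase> = new Set<Phase>(['lint', 'backlinks']);

interface PhaseOutcome {
  phase: Phase;
  status: 'ok' | 'skipped';
  reason?: string;
}

async function runPhases(
  hasEngine: boolean,
  runOne: (phase: Phase) => Promise<void>,
  yieldBetweenPhases: () => Promise<void> = async () => {},
): Promise<PhaseOutcome[]> {
  const outcomes: PhaseOutcome[] = [];
  for (const phase of PHASE_ORDER) {
    if (!hasEngine && !FS_PHASES.has(phase)) {
      // No database: report the phase as skipped instead of failing.
      outcomes.push({ phase, status: 'skipped', reason: 'no_database' });
    } else {
      await runOne(phase);
      outcomes.push({ phase, status: 'ok' });
    }
    // Give pending timers (such as a job-lock renewal) a chance to fire.
    await yieldBetweenPhases();
  }
  return outcomes;
}

// Usage: with no database, only the filesystem phases actually run.
const ran: Phase[] = [];
const done = runPhases(
  false,
  async (phase) => { ran.push(phase); },
  () => new Promise<void>((resolve) => setTimeout(resolve, 0)),
).then((outcomes) => {
  console.log(ran, outcomes.map((o) => `${o.phase}:${o.status}`));
  return outcomes;
});
```

The hook is awaited after every phase so that a long cycle never holds
the event loop for more than one phase at a time.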
### Cycle coordination via a DB lock table
`gbrain_cycle_locks` (new table, migration v16) replaces the
session-scoped `pg_try_advisory_lock`, which the v0.15.4
PgBouncer-transaction-pooler fix silently broke. Each lock carries a
30-minute TTL that is refreshed between phases, so a crashed holder's
lock expires on its own.
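The TTL behavior can be sketched in memory. This is an illustration
only, not the code in `src/core/cycle.ts`: the `TtlLock` class, its
method names, and the injected clock are hypothetical. Only the
30-minute TTL and the refresh-between-phases idea come from the notes
above.

```typescript
// Hypothetical in-memory model of a TTL-based lock. The real lock lives in
// the `gbrain_cycle_locks` table; names and shapes here are assumptions.
const TTL_MS = 30 * 60 * 1000; // 30-minute TTL, per the migration notes

interface LockRow {
  holder: string;
  expiresAt: number; // epoch milliseconds
}

class TtlLock {
  private row: LockRow | null = null;
  private readonly now: () => number;

  // The clock is injected so expiry can be exercised without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Take the lock if it is free, expired, or already ours. */
  acquire(holder: string): boolean {
    const t = this.now();
    if (this.row && this.row.expiresAt > t && this.row.holder !== holder) {
      return false; // a live holder exists; the caller should skip the cycle
    }
    this.row = { holder, expiresAt: t + TTL_MS };
    return true;
  }

  /** Push the expiry forward; intended to run between phases. */
  refresh(holder: string): boolean {
    const t = this.now();
    if (!this.row || this.row.holder !== holder || this.row.expiresAt <= t) {
      return false; // the lock was lost (expired or taken over)
    }
    this.row.expiresAt = t + TTL_MS;
    return true;
  }

  release(holder: string): void {
    if (this.row?.holder === holder) this.row = null;
  }
}

// Usage: a crashed holder never calls release(), yet the lock frees itself.
let clock = 0;
const lock = new TtlLock(() => clock);
const first = lock.acquire('daemon'); // daemon takes the lock
const blocked = lock.acquire('cron'); // cron is refused while the lock is live
clock += TTL_MS + 1;                  // daemon "crashes" and the TTL elapses
const takeover = lock.acquire('cron'); // cron can now take over
console.log({ first, blocked, takeover });
```

An expiry timestamp stored in a row does not depend on a database
session staying open, which is why this style of lock can work through
a transaction pooler where a session-scoped advisory lock cannot.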
## Verify after upgrade
```bash
# 1. Dream command exists:
gbrain dream --help
# 2. Run a dry cycle (safe, no writes):
gbrain dream --dry-run --json
# 3. If you run autopilot --install:
gbrain jobs list --status complete | head -5
# Each `autopilot-cycle` entry now has 6 phases in its report,
# not 4. Check a recent one with `gbrain jobs get <id>`.
# 4. Schema migration landed:
gbrain doctor # should show no pending migrations
```
Expected `gbrain dream --dry-run` output on a healthy brain:
```
Brain is healthy. 6 phase(s) checked in 1.3s.
```
Or with `--json`:
```json
{
"schema_version": "1",
"status": "clean",
"phases": [...],
"totals": { "lint_fixes": 0, "backlinks_added": 0, ... }
}
```
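For agents or cron wrappers that consume the `--json` output, a minimal
sketch of a report check follows. The `MinimalReport` type, the
`classify` helper, and its verdict labels are hypothetical, and the
real `CycleReport` carries more fields than shown. The field names and
status values used here are the ones that appear in this commit.

```typescript
// Hypothetical subset of the report shape; the real CycleReport has more fields.
interface MinimalReport {
  schema_version: string;
  status: string; // e.g. "clean", "partial", "failed", "skipped"
  reason?: string; // e.g. "cycle_already_running" when status is "skipped"
  totals?: Record<string, number>;
}

type Verdict = 'ok' | 'skipped' | 'alert' | 'unknown_schema';

/** Map a raw JSON report to a coarse verdict for a cron wrapper. */
function classify(raw: string): Verdict {
  const report = JSON.parse(raw) as MinimalReport;
  // Refuse to interpret a schema version this wrapper does not know.
  if (report.schema_version !== '1') return 'unknown_schema';
  if (report.status === 'failed') return 'alert'; // the case where dream exits 1
  if (report.status === 'skipped') return 'skipped'; // e.g. lock held by the daemon
  return 'ok'; // "clean" and "partial" are not treated as fatal
}

const sample = JSON.stringify({
  schema_version: '1',
  status: 'skipped',
  reason: 'cycle_already_running',
});
console.log(classify(sample));
```

Checking `schema_version` first keeps the wrapper from misreading a
future report shape.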
## Opt-outs for autopilot-installed users
If you explicitly do NOT want autopilot's daemon modifying files
(lint + backlinks phases write to disk):
**Option 1: disable those phases in cron-dream but keep autopilot
running.** Since dream is separate, you can run just the phases
you want from cron without touching autopilot:
```bash
# e.g. run only the read-only orphan sweep nightly, skipping file mutations:
0 2 * * * gbrain dream --phase orphans
```
**Option 2: uninstall autopilot and use cron-dream only.**
```bash
gbrain autopilot --uninstall
# Then add to your crontab:
0 2 * * * gbrain dream --pull
```
**Option 3: accept the default.** The new phases are conservative:
lint only fixes known-safe artifacts (em dashes, placeholder dates),
never destructive. Back-link fills are additive. If something does
go wrong, `gbrain dream --dry-run` always tells you what WOULD
change before you run it for real.
## Troubleshooting
**"cycle_already_running" in dream output:**
Another cycle (probably autopilot's daemon) is holding the lock.
Expected behavior — dream skipped to avoid racing the daemon. The
daemon's next interval will pick up the work.
**`gbrain dream --dry-run` reports changes when you expected none:**
Check `gbrain doctor` for drift: lint issues, stale embeddings,
missing back-links. Dream's dry-run is the honest preview of what
autopilot's daemon will do on its next cycle.
**Minion `autopilot-cycle` jobs failing after upgrade:**
Open a GitHub issue with the output of `gbrain jobs get <id>` for a
failing job. The new runCycle-backed handler preserves the
partial-failure semantic (one phase failing doesn't block future
cycles), but specific phases may surface new error classes.
## What did NOT change
- `gbrain autopilot --install` machinery (launchd / systemd /
crontab generators). Existing installs keep working.
- `~/.gbrain/autopilot.lock` daemon-singleton lockfile. Separate
concern from the new per-cycle lock.
- `gbrain jobs` interface. `gbrain jobs get <id>` now shows a
richer report structure (schema_version:"1"), but the surface
API is stable.
---
*This migration file is informational only. No mechanical step is
required — all changes apply automatically on `gbrain upgrade`.*


@@ -19,7 +19,7 @@ for (const op of operations) {
}
// CLI-only commands that bypass the operation layer
const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'agent', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans', 'check-resolvable']);
const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'agent', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans', 'dream', 'check-resolvable']);
async function main() {
// Parse global flags (--quiet / --progress-json / --progress-interval)
@@ -363,6 +363,25 @@ async function handleCliOnly(command: string, args: string[]) {
return;
}
if (command === 'dream') {
// Dream mirrors doctor's pattern: filesystem phases run without a DB,
// so an engine connection failure is non-fatal. runCycle honestly
// reports DB phases as skipped when engine is null.
const { runDream } = await import('./commands/dream.ts');
let eng: BrainEngine | null = null;
try {
eng = await connectEngine();
} catch {
// DB unavailable — lint + backlinks still run against the brain dir.
}
try {
await runDream(eng, args);
} finally {
if (eng) await eng.disconnect();
}
return;
}
// All remaining CLI-only commands need a DB connection
const engine = await connectEngine();
try {
@@ -562,6 +581,8 @@ TOOLS
check-backlinks <check|fix> [dir] Find/fix missing back-links across brain
lint <dir|file> [--fix] Catch LLM artifacts, placeholder dates, bad frontmatter
orphans [--json] [--count] Find pages with no inbound wikilinks
dream [--dry-run] [--json] Run the overnight maintenance cycle once (cron-friendly).
See also: autopilot --install (continuous daemon).
check-resolvable [--json] [--fix] Validate skill tree (reachability/MECE/DRY)
report --type <name> --content ... Save timestamped report to brain/reports/


@@ -77,7 +77,15 @@ export function resolveGbrainCliPath(): string {
export async function runAutopilot(engine: BrainEngine, args: string[]) {
if (args.includes('--help') || args.includes('-h')) {
console.log('Usage: gbrain autopilot [--repo <path>] [--interval N] [--json]\n gbrain autopilot --install [--repo <path>]\n gbrain autopilot --uninstall\n gbrain autopilot --status [--json]\n\nSelf-maintaining brain daemon. Runs sync + extract + embed + backlinks in a loop.');
console.log(
'Usage: gbrain autopilot [--repo <path>] [--interval N] [--json]\n' +
' gbrain autopilot --install [--repo <path>]\n' +
' gbrain autopilot --uninstall\n' +
' gbrain autopilot --status [--json]\n\n' +
'Self-maintaining brain daemon. Runs the full maintenance cycle\n' +
'(lint + backlinks + sync + extract + embed + orphans) on an interval.\n\n' +
'For a one-shot cron-triggered cycle, see `gbrain dream`.',
);
return;
}
@@ -233,27 +241,32 @@ export async function runAutopilot(engine: BrainEngine, args: string[]) {
}
} catch (e) { logError('dispatch', e); cycleOk = false; }
} else {
// Inline fallback — same as pre-v0.11.1 behavior.
// 1. Sync
// Inline fallback — delegate to runCycle so lint + backlinks +
// orphan sweep run too (previously this path only did sync +
// extract + embed, which didn't match the Minions-dispatch
// path's phase set). Now both converge on the same primitive.
try {
const { performSync } = await import('./sync.ts');
const result = await performSync(engine, { repoPath, noEmbed: true });
if (result.status === 'synced') {
console.log(`[sync] +${result.added} ~${result.modified} -${result.deleted}`);
const { runCycle } = await import('../core/cycle.ts');
const report = await runCycle(engine, {
brainDir: repoPath,
// Autopilot daemon path: pulls by default (matches
// pre-v0.17 autopilot behavior). CLI dream defaults false
// for cron safety; that choice is scoped to dream only.
pull: true,
yieldBetweenPhases: async () => {
await new Promise(r => setImmediate(r));
},
});
if (report.status === 'failed' || report.status === 'partial') {
cycleOk = false;
}
} catch (e) { logError('sync', e); cycleOk = false; }
// 2. Extract (full brain, incremental dedup handles repeats)
try {
const { runExtractCore } = await import('./extract.ts');
await runExtractCore(engine, { mode: 'all', dir: repoPath });
} catch (e) { logError('extract', e); cycleOk = false; }
// 3. Embed stale
try {
const { runEmbedCore } = await import('./embed.ts');
await runEmbedCore(engine, { stale: true });
} catch (e) { logError('embed', e); cycleOk = false; }
if (jsonMode) {
process.stderr.write(JSON.stringify({ event: 'cycle-inline', status: report.status, duration_ms: report.duration_ms, totals: report.totals }) + '\n');
} else {
const t = report.totals;
console.log(`[cycle-inline ${report.status}] lint=${t.lint_fixes} backlinks=${t.backlinks_added} synced=${t.pages_synced} extracted=${t.pages_extracted} embedded=${t.pages_embedded} orphans=${t.orphans_found}`);
}
} catch (e) { logError('cycle-inline', e); cycleOk = false; }
}
// 4. Health check + adaptive interval (same for both paths)

src/commands/dream.ts Normal file

@@ -0,0 +1,209 @@
/**
* gbrain dream — run one brain maintenance cycle.
*
* The README brand promise: "the agent runs while I sleep, the dream
* cycle ... I wake up and the brain is smarter." Cron-friendly, JSON
* report, phase-selectable.
*
* Thin alias over runCycle (src/core/cycle.ts). Both this command and
* `gbrain autopilot` converge on the same primitive so there's one
* source of truth for what "overnight maintenance" means.
*
* Usage:
* gbrain dream # full 6-phase cycle
* gbrain dream --dry-run # preview, no writes
* gbrain dream --json # CycleReport JSON (for agents)
* gbrain dream --phase lint # run a single phase
* gbrain dream --pull # also git pull the brain repo
* gbrain dream --dir /path/to/brain # explicit brain location
*
* Cron: 0 2 * * * gbrain dream --json >> /var/log/gbrain-dream.log
*
* Related: `gbrain autopilot --install` for continuous daemonized
* maintenance. dream is the one-shot, autopilot is the scheduler.
*/
import type { BrainEngine } from '../core/engine.ts';
import {
runCycle,
ALL_PHASES,
type CyclePhase,
type CycleReport,
} from '../core/cycle.ts';
import { existsSync } from 'fs';
interface DreamArgs {
json: boolean;
dryRun: boolean;
pull: boolean;
phase: CyclePhase | null;
dir: string | null;
help: boolean;
}
function parseArgs(args: string[]): DreamArgs {
const phaseIdx = args.indexOf('--phase');
const rawPhase = phaseIdx !== -1 ? args[phaseIdx + 1] : null;
const phase = rawPhase && (ALL_PHASES as string[]).includes(rawPhase)
? (rawPhase as CyclePhase)
: null;
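if (phaseIdx !== -1 && !phase) {
// Also rejects a bare `--phase` with no value, which would otherwise
// fall through to a full cycle.
console.error(`Unknown or missing phase "${rawPhase ?? ''}". Valid: ${ALL_PHASES.join(', ')}`);
process.exit(1);
}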
const dirIdx = args.indexOf('--dir');
const dir = dirIdx !== -1 ? args[dirIdx + 1] : null;
return {
json: args.includes('--json'),
dryRun: args.includes('--dry-run'),
pull: args.includes('--pull'),
phase,
dir,
help: args.includes('--help') || args.includes('-h'),
};
}
/**
* Resolve the brain directory without the `findRepoRoot` footgun.
*
* Prior dream.ts walked up 10 levels of cwd looking for `.git` and would
* happily run lint + sync against an unrelated git repo the user happened
* to be cd'd into. This resolver only trusts two sources:
* 1. An explicit --dir argument.
* 2. The `sync.repo_path` config key set by `gbrain init` (engine-backed).
*
* If neither is available, we error out instead of guessing.
*/
async function resolveBrainDir(
engine: BrainEngine | null,
explicit: string | null,
): Promise<string> {
if (explicit) {
if (!existsSync(explicit)) {
console.error(`--dir path does not exist: ${explicit}`);
process.exit(1);
}
return explicit;
}
if (engine) {
const configured = await engine.getConfig('sync.repo_path');
if (configured && existsSync(configured)) {
return configured;
}
}
console.error(
'No brain directory found. Pass --dir <path> or configure one via `gbrain init`.',
);
process.exit(1);
}
function printHelp() {
console.log(`Usage: gbrain dream [options]
Run one brain maintenance cycle: lint, backlinks, sync, extract,
embed, and orphan sweep. Designed for cron (exits when done).
Options:
--dry-run Preview all fixes without writing (fs or DB)
--json Emit the CycleReport as JSON (agent-readable)
--phase <name> Run a single phase: ${ALL_PHASES.join(' | ')}
--pull git pull the brain repo before syncing (default: no pull)
--dir <path> Brain directory (default: configured brain)
--help, -h Show this help
Examples:
gbrain dream
gbrain dream --dry-run --json
gbrain dream --phase lint
0 2 * * * gbrain dream --json # nightly via cron
Related:
gbrain autopilot --install # continuous maintenance as a daemon
gbrain autopilot # same maintenance cycle, scheduled
`);
}
// ─── Human-friendly report printing ────────────────────────────────
function printHuman(report: CycleReport) {
if (report.status === 'skipped') {
if (report.reason === 'cycle_already_running') {
console.log(`Skipped: another cycle is already running. (locked)`);
} else if (report.reason === 'no_database') {
console.log(`Skipped: no database available.`);
} else {
console.log(`Skipped: ${report.reason ?? 'unknown reason'}.`);
}
return;
}
if (report.status === 'clean') {
console.log(
`Brain is healthy. ${report.phases.length} phase(s) checked in ${(report.duration_ms / 1000).toFixed(1)}s.`,
);
return;
}
console.log(`Dream cycle (${report.status}) in ${(report.duration_ms / 1000).toFixed(1)}s:`);
for (const p of report.phases) {
const icon =
p.status === 'ok' ? '✓' :
p.status === 'warn' ? '!' :
p.status === 'skipped' ? '-' : '✗';
const line = ` ${icon} ${p.phase.padEnd(10)} ${p.summary}`;
console.log(line);
if (p.error) {
const hint = p.error.hint ? ` (${p.error.hint})` : '';
console.log(` [${p.error.class}/${p.error.code}] ${p.error.message}${hint}`);
}
}
const t = report.totals;
const hasTotals =
t.lint_fixes > 0 || t.backlinks_added > 0 || t.pages_synced > 0 ||
t.pages_extracted > 0 || t.pages_embedded > 0 || t.orphans_found > 0;
if (hasTotals) {
console.log(
` totals: lint=${t.lint_fixes} backlinks=${t.backlinks_added} synced=${t.pages_synced} extracted=${t.pages_extracted} embedded=${t.pages_embedded} orphans=${t.orphans_found}`,
);
}
}
// ─── CLI entry ─────────────────────────────────────────────────────
export async function runDream(engine: BrainEngine | null, args: string[]): Promise<CycleReport | void> {
const opts = parseArgs(args);
if (opts.help) {
printHelp();
return;
}
const brainDir = await resolveBrainDir(engine, opts.dir);
const phases: CyclePhase[] | undefined = opts.phase ? [opts.phase] : undefined;
const report = await runCycle(engine, {
brainDir,
dryRun: opts.dryRun,
pull: opts.pull,
phases,
});
if (opts.json) {
console.log(JSON.stringify(report, null, 2));
} else {
printHuman(report);
}
// Exit non-zero when the cycle failed overall (helps cron spot real problems).
// 'partial' is not a failure — it means some phase warned but the cycle ran.
if (report.status === 'failed') {
process.exit(1);
}
return report;
}


@@ -14,6 +14,12 @@ export interface EmbedOpts {
slugs?: string[];
/** Embed a single page. */
slug?: string;
/**
* Dry run: enumerate what WOULD be embedded (stale chunk counts)
* without calling the embedding model or writing to the engine.
* Safe to call with no API key. Used by runCycle's dryRun propagation.
*/
dryRun?: boolean;
/**
* Optional progress callback. Called after each page. CLI wrappers
* supply a reporter.tick()-backed implementation; Minion handlers
@@ -23,48 +29,86 @@ export interface EmbedOpts {
onProgress?: (done: number, total: number, embedded: number) => void;
}
/**
* Structured result from a library-level embed run.
*
* In dryRun mode, `embedded = 0` and `would_embed` holds the count of
* stale chunks that WOULD have been sent to the embedding model. In
* non-dryRun mode, `embedded` holds the real count and `would_embed = 0`.
* `skipped` counts chunks that already had embeddings (nothing to do).
*/
export interface EmbedResult {
/** Chunks newly embedded in this run (0 in dryRun). */
embedded: number;
/** Chunks with pre-existing embeddings, skipped. */
skipped: number;
/** Chunks that would be embedded if not for dryRun (0 in non-dryRun). */
would_embed: number;
/** Total chunks considered across all processed pages. */
total_chunks: number;
/** Number of pages processed (whether or not they had stale chunks). */
pages_processed: number;
/** True if this run was a dry-run. */
dryRun: boolean;
}
/**
* Library-level embed. Throws on validation errors; per-page embed failures
* are logged to stderr but do not throw (matches the existing CLI semantics
* for batch runs). Safe to call from Minions handlers — no process.exit.
*
* Returns EmbedResult with accurate counts so callers (runCycle, sync
* auto-embed step) can report embeddings in their own structured output.
*/
export async function runEmbedCore(engine: BrainEngine, opts: EmbedOpts): Promise<void> {
export async function runEmbedCore(engine: BrainEngine, opts: EmbedOpts): Promise<EmbedResult> {
const result: EmbedResult = {
embedded: 0,
skipped: 0,
would_embed: 0,
total_chunks: 0,
pages_processed: 0,
dryRun: !!opts.dryRun,
};
if (opts.slugs && opts.slugs.length > 0) {
for (const s of opts.slugs) {
try { await embedPage(engine, s); } catch (e: unknown) {
try {
await embedPage(engine, s, !!opts.dryRun, result);
} catch (e: unknown) {
console.error(` Error embedding ${s}: ${e instanceof Error ? e.message : e}`);
}
}
return;
return result;
}
if (opts.all || opts.stale) {
await embedAll(engine, !!opts.stale, opts.onProgress);
return;
await embedAll(engine, !!opts.stale, !!opts.dryRun, result, opts.onProgress);
return result;
}
if (opts.slug) {
await embedPage(engine, opts.slug);
return;
await embedPage(engine, opts.slug, !!opts.dryRun, result);
return result;
}
throw new Error('No embed target specified. Pass { slug }, { slugs }, { all }, or { stale }.');
}
export async function runEmbed(engine: BrainEngine, args: string[]) {
export async function runEmbed(engine: BrainEngine, args: string[]): Promise<EmbedResult | undefined> {
const slugsIdx = args.indexOf('--slugs');
const all = args.includes('--all');
const stale = args.includes('--stale');
const dryRun = args.includes('--dry-run');
let opts: EmbedOpts;
if (slugsIdx >= 0) {
opts = { slugs: args.slice(slugsIdx + 1).filter(a => !a.startsWith('--')) };
opts = { slugs: args.slice(slugsIdx + 1).filter(a => !a.startsWith('--')), dryRun };
} else if (all || stale) {
opts = { all, stale };
opts = { all, stale, dryRun };
} else {
const slug = args.find(a => !a.startsWith('--'));
if (!slug) {
console.error('Usage: gbrain embed [<slug>|--all|--stale|--slugs s1 s2 ...]');
console.error('Usage: gbrain embed [<slug>|--all|--stale|--slugs s1 s2 ...] [--dry-run]');
process.exit(1);
}
opts = { slug };
opts = { slug, dryRun };
}
// CLI path: wire a reporter so --progress-json / --quiet / TTY rendering
@@ -81,8 +125,9 @@ export async function runEmbed(engine: BrainEngine, args: string[]) {
};
try {
await runEmbedCore(engine, opts);
const result = await runEmbedCore(engine, opts);
if (progressStarted) progress.finish();
return result;
} catch (e) {
if (progressStarted) progress.finish();
console.error(e instanceof Error ? e.message : String(e));
@@ -90,16 +135,22 @@ export async function runEmbed(engine: BrainEngine, args: string[]) {
}
}
async function embedPage(engine: BrainEngine, slug: string) {
async function embedPage(
engine: BrainEngine,
slug: string,
dryRun: boolean,
result: EmbedResult,
) {
const page = await engine.getPage(slug);
if (!page) {
throw new Error(`Page not found: ${slug}`);
}
// Get existing chunks or create new ones
// Get existing chunks or create new ones.
// In dryRun, we still chunk the text locally to count what WOULD be
// embedded — but we never write chunks or call the embedding model.
let chunks = await engine.getChunks(slug);
if (chunks.length === 0) {
// Create chunks first
const inputs: ChunkInput[] = [];
if (page.compiled_truth.trim()) {
for (const c of chunkText(page.compiled_truth)) {
@@ -111,6 +162,15 @@ async function embedPage(engine: BrainEngine, slug: string) {
inputs.push({ chunk_index: inputs.length, chunk_text: c.text, chunk_source: 'timeline' });
}
}
if (dryRun) {
// Count what chunking WOULD produce, without writing.
result.total_chunks += inputs.length;
result.would_embed += inputs.length;
result.pages_processed++;
return;
}
if (inputs.length > 0) {
await engine.upsertChunks(slug, inputs);
chunks = await engine.getChunks(slug);
@@ -119,8 +179,18 @@ async function embedPage(engine: BrainEngine, slug: string) {
// Embed chunks without embeddings
const toEmbed = chunks.filter(c => !c.embedded_at);
result.total_chunks += chunks.length;
result.skipped += chunks.length - toEmbed.length;
if (toEmbed.length === 0) {
console.log(`${slug}: all ${chunks.length} chunks already embedded`);
result.pages_processed++;
return;
}
if (dryRun) {
result.would_embed += toEmbed.length;
result.pages_processed++;
return;
}
@@ -138,17 +208,19 @@ async function embedPage(engine: BrainEngine, slug: string) {
}));
await engine.upsertChunks(slug, updated);
result.embedded += toEmbed.length;
result.pages_processed++;
console.log(`${slug}: embedded ${toEmbed.length} chunks`);
}
async function embedAll(
engine: BrainEngine,
staleOnly: boolean,
dryRun: boolean,
result: EmbedResult,
onProgress?: (done: number, total: number, embedded: number) => void,
) {
const pages = await engine.listPages({ limit: 100000 });
let total = 0;
let embedded = 0;
let processed = 0;
// Concurrency limit for parallel page embedding.
@@ -167,9 +239,21 @@ async function embedAll(
? chunks.filter(c => !c.embedded_at)
: chunks;
result.total_chunks += chunks.length;
result.skipped += chunks.length - toEmbed.length;
if (toEmbed.length === 0) {
processed++;
onProgress?.(processed, pages.length, embedded);
result.pages_processed++;
onProgress?.(processed, pages.length, result.embedded);
return;
}
if (dryRun) {
result.would_embed += toEmbed.length;
processed++;
result.pages_processed++;
onProgress?.(processed, pages.length, result.embedded);
return;
}
@@ -189,14 +273,14 @@ async function embedAll(
token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
}));
await engine.upsertChunks(page.slug, updated);
embedded += toEmbed.length;
result.embedded += toEmbed.length;
} catch (e: unknown) {
console.error(`\n Error embedding ${page.slug}: ${e instanceof Error ? e.message : e}`);
}
total += toEmbed.length;
processed++;
onProgress?.(processed, pages.length, embedded);
result.pages_processed++;
onProgress?.(processed, pages.length, result.embedded);
}
// Sliding worker pool: N workers share a queue and each pulls the
@@ -216,5 +300,9 @@ async function embedAll(
await Promise.all(Array.from({ length: numWorkers }, () => worker()));
// Stdout summary preserved for scripts/tests that grep for counts.
console.log(`Embedded ${embedded} chunks across ${pages.length} pages`);
if (dryRun) {
console.log(`[dry-run] Would embed ${result.would_embed} chunks across ${pages.length} pages`);
} else {
console.log(`Embedded ${result.embedded} chunks across ${pages.length} pages`);
}
}


@@ -569,58 +569,39 @@ export async function registerBuiltinHandlers(worker: MinionWorker, engine: Brai
return await runBacklinksCore({ action, dir, dryRun: !!job.data.dryRun });
});
// The killer handler. Autopilot submits ONE `autopilot-cycle` per cycle
// (idempotency_key on cycle slot) instead of a 4-job parent-child DAG,
// because Minions' parent/child is NOT a depends_on primitive (Codex
// H3/H4). Each step is wrapped in its own try/catch; the handler returns
// `{ partial: true, failed_steps: [...] }` when any step fails. It does
// NOT throw on partial failure — that would cause the Minion to retry,
// and an intermittent extract bug would block every future cycle.
// Autopilot-cycle handler: delegates to runCycle. Shares the exact same
// phase set and ordering as `gbrain dream` and autopilot's inline path —
// one source of truth for what the brain does overnight.
//
// Yields the event loop between phases so the worker's lock-renewal
// timer (src/core/minions/worker.ts) can fire. Without this the v0.14
// stall-death regression returns: long CPU-bound phases starve the
// renewal callback and the stalled-sweeper kills the job.
//
// Phase failures surface as report.status='partial' (via runCycle's
// derivation); the handler returns { partial, status, report } so
// `gbrain jobs get <id>` shows the full structured report. Does NOT
// throw on partial: a flaky phase must not block every future cycle.
worker.register('autopilot-cycle', async (job) => {
const { performSync } = await import('./sync.ts');
const { runExtractCore } = await import('./extract.ts');
const { runEmbedCore } = await import('./embed.ts');
const { runBacklinksCore } = await import('./backlinks.ts');
const { runCycle } = await import('../core/cycle.ts');
const repoPath = typeof job.data.repoPath === 'string'
? job.data.repoPath
: (await engine.getConfig('sync.repo_path')) ?? '.';
const steps: Record<string, unknown> = {};
const failed: string[] = [];
const report = await runCycle(engine, {
brainDir: repoPath,
pull: true, // autopilot daemon opts into git pull
yieldBetweenPhases: async () => {
// Yield to the event loop so worker lock-renewal can fire.
await new Promise<void>(r => setImmediate(r));
},
});
// Bug 8 — Between phases, yield to the event loop. The worker's lock
// renewal runs on a timer (src/core/minions/worker.ts); without a
// periodic yield, long CPU-bound phases starve the renewal callback
// and the job gets killed by the stalled-sweeper. A single
// `await new Promise(r => setImmediate(r))` gives the timer a chance
// to fire. The per-phase body is async+await already, so each phase
// internally yields on its own I/O boundaries — this is a belt for
// the gap between phases.
//
// Follow-up (deferred to v0.15): thread ctx.signal / ctx.shutdownSignal
// through each core fn so mid-phase cancellation works on huge brains.
const yieldToLoop = () => new Promise<void>(r => setImmediate(r));
try { steps.sync = await performSync(engine, { repoPath, noEmbed: true }); }
catch (e) { steps.sync = { error: e instanceof Error ? e.message : String(e) }; failed.push('sync'); }
await yieldToLoop();
try { steps.extract = await runExtractCore(engine, { mode: 'all', dir: repoPath }); }
catch (e) { steps.extract = { error: e instanceof Error ? e.message : String(e) }; failed.push('extract'); }
await yieldToLoop();
try { await runEmbedCore(engine, { stale: true }); steps.embed = { embedded: true }; }
catch (e) { steps.embed = { error: e instanceof Error ? e.message : String(e) }; failed.push('embed'); }
await yieldToLoop();
try { steps.backlinks = await runBacklinksCore({ action: 'fix', dir: repoPath }); }
catch (e) { steps.backlinks = { error: e instanceof Error ? e.message : String(e) }; failed.push('backlinks'); }
if (failed.length > 0) {
return { partial: true, failed_steps: failed, steps };
}
return { partial: false, steps };
return {
partial: report.status === 'partial' || report.status === 'failed',
status: report.status,
report,
};
});
// Shell handler: registered ONLY when GBRAIN_ALLOW_SHELL_JOBS=1 is set on the
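
The handler contract above (run every phase, yield between phases, never throw on partial failure) can be sketched in isolation. This is a simplified illustration under stated assumptions, not runCycle itself: `StatusSketch`, `ReportSketch`, `runPhasesSketch`, and `toJobResult` are hypothetical names, the status derivation is reduced to three values, and `setTimeout(..., 0)` stands in for the handler's `setImmediate` so the sketch does not depend on Node globals.

```typescript
type StatusSketch = 'ok' | 'partial' | 'failed';

interface ReportSketch {
  status: StatusSketch;
  failed_phases: string[];
}

// Stand-in for the handler's setImmediate-based yield (assumption: a
// zero-delay timer is close enough for illustration).
const yieldToLoop = (): Promise<void> =>
  new Promise<void>(resolve => { setTimeout(resolve, 0); });

async function runPhasesSketch(
  phases: Array<[string, () => Promise<void>]>,
  hook: () => Promise<void> = yieldToLoop,
): Promise<ReportSketch> {
  const failed_phases: string[] = [];
  for (const [name, run] of phases) {
    try {
      await run();
    } catch {
      failed_phases.push(name); // record the failure and keep going
    }
    try {
      await hook(); // awaited even after a failed phase
    } catch {
      /* hook errors are non-fatal */
    }
  }
  const status: StatusSketch =
    failed_phases.length === 0 ? 'ok'
    : failed_phases.length === phases.length ? 'failed'
    : 'partial';
  return { status, failed_phases };
}

// Mirrors the handler's return shape: a partial cycle is data, not an exception.
function toJobResult(report: ReportSketch) {
  return {
    partial: report.status === 'partial' || report.status === 'failed',
    status: report.status,
    report,
  };
}
```

Returning the failure as data rather than throwing is what keeps the queue from retrying the job, which is the property the handler comment calls out.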

@@ -13,7 +13,6 @@
*/
import type { BrainEngine } from '../core/engine.ts';
import * as db from '../core/db.ts';
import { createProgress, startHeartbeat } from '../core/progress.ts';
import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
@@ -99,30 +98,30 @@ export function deriveDomain(frontmatterDomain: string | null | undefined, slug:
// --- Core query ---
/**
* Find pages with no inbound links.
* Returns raw rows from the DB (all pages regardless of filter).
* Find pages with no inbound links via the engine's built-in helper.
* Returns raw rows (all pages regardless of filter).
*
* As of v0.17: takes an engine argument. Composes with runCycle which
* passes an explicit engine. No more db.getConnection() global — fixes
* the PGLite-vs-Postgres + test-fixture coupling codex flagged.
*/
export async function queryOrphanPages(): Promise<{ slug: string; title: string; domain: string | null }[]> {
const sql = db.getConnection();
const rows = await sql`
SELECT
p.slug,
COALESCE(p.title, p.slug) AS title,
p.frontmatter->>'domain' AS domain
FROM pages p
WHERE NOT EXISTS (
SELECT 1 FROM links l WHERE l.to_page_id = p.id
)
ORDER BY p.slug
`;
return rows as unknown as { slug: string; title: string; domain: string | null }[];
export async function queryOrphanPages(
engine: BrainEngine,
): Promise<{ slug: string; title: string; domain: string | null }[]> {
return engine.findOrphanPages();
}
/**
* Find orphan pages, with optional pseudo-page filtering.
* Returns structured OrphanResult with totals.
*
* As of v0.17: `engine` is required. See queryOrphanPages for rationale.
*/
export async function findOrphans(includePseudo: boolean = false): Promise<OrphanResult> {
export async function findOrphans(
engine: BrainEngine,
opts: { includePseudo?: boolean } = {},
): Promise<OrphanResult> {
const includePseudo = !!opts.includePseudo;
// The NOT EXISTS anti-join over pages × links can take seconds on 50K-page
// brains. Heartbeat every second so agents see the scan is alive. Keyset
// pagination was considered and rejected: without an index on
@@ -134,12 +133,10 @@ export async function findOrphans(includePseudo: boolean = false): Promise<Orpha
let allOrphans: { slug: string; title: string; domain: string | null }[];
let total: number;
try {
allOrphans = await queryOrphanPages();
allOrphans = await engine.findOrphanPages();
// Count total pages in DB for the summary line
const sql = db.getConnection();
const [{ count: totalPagesCount }] = await sql`SELECT count(*)::int AS count FROM pages`;
total = Number(totalPagesCount);
const stats = await engine.getStats();
total = stats.page_count;
} finally {
stopHb();
progress.finish();
@@ -206,7 +203,7 @@ export function formatOrphansText(result: OrphanResult): string {
// --- CLI entry point ---
export async function runOrphans(_engine: BrainEngine, args: string[]) {
export async function runOrphans(engine: BrainEngine, args: string[]) {
const json = args.includes('--json');
const count = args.includes('--count');
const includePseudo = args.includes('--include-pseudo');
@@ -228,7 +225,7 @@ Summary line: N orphans out of M linkable pages (K total; K-M excluded)
return;
}
const result = await findOrphans(includePseudo);
const result = await findOrphans(engine, { includePseudo });
if (count) {
console.log(String(result.total_orphans));
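
Passing the engine explicitly is what lets the orphan scan run against a test double instead of a global connection. A minimal sketch of that idea follows, assuming only the two engine methods the diff uses (`findOrphanPages` and `getStats().page_count`); `OrphanEngineLike`, `summarizeOrphans`, and the `isPseudo` predicate are hypothetical and do not reproduce the real pseudo-page rules.

```typescript
interface OrphanRow {
  slug: string;
  title: string;
  domain: string | null;
}

// Hypothetical narrow view of BrainEngine: just what the orphan scan needs.
interface OrphanEngineLike {
  findOrphanPages(): Promise<OrphanRow[]>;
  getStats(): Promise<{ page_count: number }>;
}

async function summarizeOrphans(
  engine: OrphanEngineLike,
  opts: { includePseudo?: boolean; isPseudo?: (slug: string) => boolean } = {},
) {
  const includePseudo = !!opts.includePseudo;
  const isPseudo = opts.isPseudo ?? (() => false); // assumed predicate
  const all = await engine.findOrphanPages();
  const stats = await engine.getStats();
  const orphans = includePseudo ? all : all.filter(r => !isPseudo(r.slug));
  return {
    orphans,
    total_orphans: orphans.length,
    total_pages: stats.page_count,
    excluded: all.length - orphans.length,
  };
}

// In-memory fake: no database, no global connection.
const fakeEngine: OrphanEngineLike = {
  findOrphanPages: async () => [
    { slug: 'notes/a', title: 'A', domain: null },
    { slug: '_index', title: '_index', domain: null },
  ],
  getStats: async () => ({ page_count: 10 }),
};
```

A caller such as a test would pass `fakeEngine` directly, e.g. `summarizeOrphans(fakeEngine, { isPseudo: s => s.startsWith('_') })`; the underscore rule here is only an example.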

@@ -24,6 +24,8 @@ export interface SyncResult {
deleted: number;
renamed: number;
chunksCreated: number;
/** Pages re-embedded during this sync's auto-embed step. 0 if --no-embed or skipped. */
embedded: number;
pagesAffected: string[];
failedFiles?: number; // count of parse failures (Bug 9)
}
@@ -116,6 +118,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
toCommit: headCommit,
added: 0, modified: 0, deleted: 0, renamed: 0,
chunksCreated: 0,
embedded: 0,
pagesAffected: [],
};
}
@@ -165,6 +168,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
deleted: filtered.deleted.length,
renamed: filtered.renamed.length,
chunksCreated: 0,
embedded: 0,
pagesAffected: [],
};
}
@@ -179,6 +183,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
toCommit: headCommit,
added: 0, modified: 0, deleted: 0, renamed: 0,
chunksCreated: 0,
embedded: 0,
pagesAffected: [],
};
}
@@ -301,6 +306,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
deleted: filtered.deleted.length,
renamed: filtered.renamed.length,
chunksCreated,
embedded: 0,
pagesAffected,
failedFiles: failedFiles.length,
};
@@ -338,10 +344,15 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
}
// Auto-embed (skip for large syncs — embedding calls OpenAI)
let embedded = 0;
if (!noEmbed && pagesAffected.length > 0 && pagesAffected.length <= 100) {
try {
const { runEmbed } = await import('./embed.ts');
await runEmbed(engine, ['--slugs', ...pagesAffected]);
// Before commit 2 lands: runEmbed is void. Best estimate is pagesAffected,
// since runEmbed re-embeds every requested slug. Commit 2 sharpens this
// with EmbedResult.embedded.
embedded = pagesAffected.length;
} catch { /* embedding is best-effort */ }
} else if (noEmbed || totalChanges > 100) {
console.log(`Text imported. Run 'gbrain embed --stale' to generate embeddings.`);
@@ -356,6 +367,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
deleted: filtered.deleted.length,
renamed: filtered.renamed.length,
chunksCreated,
embedded,
pagesAffected,
};
}
@@ -366,6 +378,33 @@ async function performFullSync(
headCommit: string,
opts: SyncOpts,
): Promise<SyncResult> {
// Dry-run: walk the repo, count syncable files, return without writing.
// Fixes the silent-write-on-dry-run bug where performFullSync called
// runImport unconditionally regardless of opts.dryRun.
if (opts.dryRun) {
const { collectMarkdownFiles } = await import('./import.ts');
const allFiles = collectMarkdownFiles(repoPath);
const syncableRelPaths = allFiles
.map(abs => relative(repoPath, abs))
.filter(rel => isSyncable(rel));
console.log(
`Full-sync dry run: ${syncableRelPaths.length} file(s) would be imported ` +
`from ${repoPath} @ ${headCommit.slice(0, 8)}.`,
);
return {
status: 'dry_run',
fromCommit: null,
toCommit: headCommit,
added: syncableRelPaths.length,
modified: 0,
deleted: 0,
renamed: 0,
chunksCreated: 0,
embedded: 0,
pagesAffected: [],
};
}
console.log(`Running full import of ${repoPath}...`);
const { runImport } = await import('./import.ts');
const importArgs = [repoPath];
@@ -391,6 +430,7 @@ async function performFullSync(
toCommit: headCommit,
added: 0, modified: 0, deleted: 0, renamed: 0,
chunksCreated: result.chunksCreated,
embedded: 0,
pagesAffected: [],
failedFiles: result.failures.length,
};
@@ -404,11 +444,15 @@ async function performFullSync(
await engine.setConfig('sync.last_run', new Date().toISOString());
await engine.setConfig('sync.repo_path', repoPath);
// Full sync doesn't track pagesAffected, so fall back to embed --stale
// Full sync doesn't track pagesAffected, so fall back to embed --stale.
// Before commit 2: runEmbed is void; use result.imported as best estimate of
// pages touched. Commit 2 sharpens this with real EmbedResult counts.
let embedded = 0;
if (!opts.noEmbed) {
try {
const { runEmbed } = await import('./embed.ts');
await runEmbed(engine, ['--stale']);
embedded = result.imported;
} catch { /* embedding is best-effort */ }
}
@@ -416,8 +460,12 @@ async function performFullSync(
status: 'first_sync',
fromCommit: null,
toCommit: headCommit,
added: 0, modified: 0, deleted: 0, renamed: 0,
chunksCreated: 0,
added: result.imported,
modified: 0,
deleted: 0,
renamed: 0,
chunksCreated: result.chunksCreated,
embedded,
pagesAffected: [],
};
}
@@ -489,10 +537,11 @@ function printSyncResult(result: SyncResult) {
case 'synced':
console.log(`Synced ${result.fromCommit?.slice(0, 8)}..${result.toCommit.slice(0, 8)}:`);
console.log(` +${result.added} added, ~${result.modified} modified, -${result.deleted} deleted, R${result.renamed} renamed`);
console.log(` ${result.chunksCreated} chunks created`);
console.log(` ${result.chunksCreated} chunks created${result.embedded > 0 ? `, ${result.embedded} pages embedded` : ''}`);
break;
case 'first_sync':
console.log(`First sync complete. Checkpoint: ${result.toCommit.slice(0, 8)}`);
console.log(` ${result.added} file(s) imported, ${result.chunksCreated} chunks${result.embedded > 0 ? `, ${result.embedded} pages embedded` : ''}`);
break;
case 'dry_run':
break; // already printed in performSync
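
The full-sync dry-run path above reduces to "filter, count, return" with no writes. A minimal sketch of that shape, assuming the caller has already collected repo-relative paths: the result fields mirror the `dry_run` SyncResult in the hunk, while `SyncResultSketch`, `planFullSyncDryRun`, and the example predicate are hypothetical (the real `isSyncable` rules are not shown in this diff).

```typescript
interface SyncResultSketch {
  status: 'dry_run';
  fromCommit: null;
  toCommit: string;
  added: number;
  modified: number;
  deleted: number;
  renamed: number;
  chunksCreated: number;
  embedded: number;
  pagesAffected: string[];
}

// Counts what WOULD be imported. Takes no engine and touches no state,
// so it cannot advance the sync bookmark by construction.
function planFullSyncDryRun(
  relPaths: string[],
  headCommit: string,
  isSyncable: (rel: string) => boolean,
): SyncResultSketch {
  const syncable = relPaths.filter(isSyncable);
  return {
    status: 'dry_run',
    fromCommit: null,
    toCommit: headCommit,
    added: syncable.length,
    modified: 0,
    deleted: 0,
    renamed: 0,
    chunksCreated: 0,
    embedded: 0,
    pagesAffected: [],
  };
}

// Example predicate (assumption): markdown files outside dot-directories.
const exampleIsSyncable = (rel: string): boolean =>
  rel.endsWith('.md') && !rel.split('/').some(part => part.startsWith('.'));

const plan = planFullSyncDryRun(
  ['people/ada.md', '.git/HEAD', 'notes/todo.md', 'assets/logo.png'],
  '0123456789abcdef',
  exampleIsSyncable,
);
console.log(`${plan.added} file(s) would be imported @ ${plan.toCommit.slice(0, 8)}`);
```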

src/core/cycle.ts Normal file
@@ -0,0 +1,817 @@
/**
* src/core/cycle.ts — The brain maintenance cycle primitive.
*
* Composes lint, backlinks, sync, extract, embed, and orphans into
* one honest unit of work. Called from:
* - `gbrain dream` (CLI alias; one-shot cron-triggered cycle)
* - `gbrain autopilot` (daemon; scheduled on an interval)
* - Minions `autopilot-cycle` handler (durable queue; retry + observability)
*
* All three converge on runCycle() so there's one source of truth for
* what "overnight maintenance" means.
*
* PHASE ORDER (semantically driven — fix files first, then index):
*
* ┌───────────────────────────────────────────────────────────┐
* │ Phase 1: lint --fix (filesystem writes, no DB) │
* │ Phase 2: backlinks --fix (filesystem writes, no DB) │
* │ Phase 3: sync (DB picks up phases 1+2) │
* │ Phase 4: extract (DB picks up links from sync) │
* │ Phase 5: embed --stale (DB writes) │
* │ Phase 6: orphans (DB read, report only) │
* └───────────────────────────────────────────────────────────┘
*
* COORDINATION:
*
* Postgres: a row in gbrain_cycle_locks with a TTL (30 min). Refreshed
* between phases via yieldBetweenPhases. Works through PgBouncer
* transaction pooling (session-scoped pg_try_advisory_lock does not).
*
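* PGLite: the same gbrain_cycle_locks row, written through db.query
* (see acquirePostgresLock).
*
* engine=null: a file lock at ~/.gbrain/cycle.lock holding the PID and
* an ISO timestamp; staleness is judged by file mtime and PID liveness.
* Same 30-min TTL semantics.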
*
* LOCK-SKIP:
*
* Read-only phase selections (orphans alone) skip the lock. Any phase
* that mutates state, whether the filesystem (lint, backlinks) or the
* DB (sync, extract, embed), triggers lock acquisition. See
* NEEDS_LOCK_PHASES.
*/
import { existsSync, readFileSync, writeFileSync, unlinkSync, mkdirSync, statSync } from 'fs';
import { join } from 'path';
import { homedir, hostname } from 'os';
import type { BrainEngine } from './engine.ts';
import { createProgress, type ProgressReporter } from './progress.ts';
import { getCliOptions, cliOptsToProgressOptions } from './cli-options.ts';
// ─── Types ─────────────────────────────────────────────────────────
export type CyclePhase = 'lint' | 'backlinks' | 'sync' | 'extract' | 'embed' | 'orphans';
export const ALL_PHASES: CyclePhase[] = [
'lint',
'backlinks',
'sync',
'extract',
'embed',
'orphans',
];
/**
* Phases that mutate state (filesystem or DB) and therefore should
* coordinate via the cycle lock. Only orphans is truly read-only
* and skips the lock.
*/
const NEEDS_LOCK_PHASES: ReadonlySet<CyclePhase> = new Set([
'lint',
'backlinks',
'sync',
'extract',
'embed',
]);
export type PhaseStatus = 'ok' | 'warn' | 'fail' | 'skipped';
export interface PhaseError {
/** Error class for machine branching — e.g., 'DatabaseConnection', 'Timeout', 'LLMError', 'FilesystemError', 'InternalError'. */
class: string;
/** System error code or short identifier, e.g., 'ECONNREFUSED', 'ETIMEDOUT', 'UNKNOWN'. */
code: string;
/** Human-readable single-line message. */
message: string;
/** Optional suggestion of what to try next. */
hint?: string;
/** Optional link to a troubleshooting doc. */
docs_url?: string;
}
export interface PhaseResult {
phase: CyclePhase;
status: PhaseStatus;
duration_ms: number;
summary: string;
details: Record<string, unknown>;
error?: PhaseError;
}
export type CycleStatus = 'ok' | 'clean' | 'partial' | 'skipped' | 'failed';
export interface CycleReport {
/** Additive schema. Bumped on breaking changes. */
schema_version: '1';
timestamp: string;
duration_ms: number;
/**
* Overall status derived from phase results:
* - 'clean' : ran successfully, zero fixes/writes across every phase
* - 'ok' : ran successfully, some work was done
* - 'partial' : at least one phase warned or failed, others ran
* - 'skipped' : cycle did not run (lock held by another holder)
* - 'failed' : all attempted phases failed, or the cycle lock could not be acquired
*/
status: CycleStatus;
/** Present when status is 'skipped', or 'failed' due to a lock error. E.g., 'cycle_already_running', 'no_database', 'lock_acquisition_error'. */
reason?: string;
brain_dir: string | null;
phases: PhaseResult[];
totals: {
lint_fixes: number;
backlinks_added: number;
pages_synced: number;
pages_extracted: number;
pages_embedded: number;
orphans_found: number;
};
}
export interface CycleOpts {
/** If true, no writes to filesystem or DB. All phases honor this. */
dryRun?: boolean;
/** Defaults to ALL_PHASES. Pass a subset for --phase lint etc. */
phases?: CyclePhase[];
/** Brain directory (git repo). Required for filesystem phases. */
brainDir: string;
/** Whether sync should run `git pull`. Default false (cron-safe). */
pull?: boolean;
/**
* Called between phases AND before runCycle returns. Awaited even
* after phase failure. Hook exceptions are logged, never fatal.
* Minions handlers pass a function that yields + renews the job lock
* + refreshes the cycle-lock-table TTL.
*/
yieldBetweenPhases?: () => Promise<void>;
}
// ─── Lock primitives ───────────────────────────────────────────────
const CYCLE_LOCK_ID = 'gbrain-cycle';
const LOCK_TTL_MS = 30 * 60 * 1000; // 30 minutes
const LOCK_FILE_PATH_DEFAULT = join(homedir(), '.gbrain', 'cycle.lock');
interface LockHandle {
release: () => Promise<void>;
refresh: () => Promise<void>;
}
/**
* Acquire the Postgres-backed cycle lock.
* Returns a LockHandle on success, or null if another live holder has it.
*
* Uses INSERT ... ON CONFLICT (id) DO UPDATE ... WHERE ttl_expires_at < NOW()
* RETURNING *. An empty RETURNING means the existing row is still live.
* Crashed holders auto-release: when their TTL expires, the next
* acquirer's UPDATE branch fires and takes over.
*/
async function acquirePostgresLock(engine: BrainEngine): Promise<LockHandle | null> {
const pid = process.pid;
const host = hostname();
// BrainEngine exposes helpers such as findOrphanPages but no raw SQL, so
// this lock operation reaches through to the engine's internal connection:
// the Postgres engine exposes `sql` (postgres-js tag), PGLite exposes `db.query`.
const maybePG = engine as unknown as { sql?: (...args: unknown[]) => Promise<unknown> };
const maybePGLite = engine as unknown as { db?: { query: (sql: string, params?: unknown[]) => Promise<{ rows: unknown[] }> } };
if (engine.kind === 'postgres' && maybePG.sql) {
const sql = maybePG.sql as any;
const rows: Array<{ id: string }> = await sql`
INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES (${CYCLE_LOCK_ID}, ${pid}, ${host}, NOW(), NOW() + INTERVAL '30 minutes')
ON CONFLICT (id) DO UPDATE
SET holder_pid = ${pid},
holder_host = ${host},
acquired_at = NOW(),
ttl_expires_at = NOW() + INTERVAL '30 minutes'
WHERE gbrain_cycle_locks.ttl_expires_at < NOW()
RETURNING id
`;
if (rows.length === 0) return null; // live holder
return {
refresh: async () => {
await sql`
UPDATE gbrain_cycle_locks
SET ttl_expires_at = NOW() + INTERVAL '30 minutes'
WHERE id = ${CYCLE_LOCK_ID} AND holder_pid = ${pid}
`;
},
release: async () => {
await sql`
DELETE FROM gbrain_cycle_locks
WHERE id = ${CYCLE_LOCK_ID} AND holder_pid = ${pid}
`;
},
};
}
if (engine.kind === 'pglite' && maybePGLite.db) {
// PGLite is single-writer; the DB row is belt-and-braces on top of the
// file lock. Callers always hold the file lock first, so this UPSERT
// is race-free against other processes.
const db = maybePGLite.db;
const { rows } = await db.query(
`INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES ($1, $2, $3, NOW(), NOW() + INTERVAL '30 minutes')
ON CONFLICT (id) DO UPDATE
SET holder_pid = $2,
holder_host = $3,
acquired_at = NOW(),
ttl_expires_at = NOW() + INTERVAL '30 minutes'
WHERE gbrain_cycle_locks.ttl_expires_at < NOW()
RETURNING id`,
[CYCLE_LOCK_ID, pid, host],
);
if (rows.length === 0) return null;
return {
refresh: async () => {
await db.query(
`UPDATE gbrain_cycle_locks
SET ttl_expires_at = NOW() + INTERVAL '30 minutes'
WHERE id = $1 AND holder_pid = $2`,
[CYCLE_LOCK_ID, pid],
);
},
release: async () => {
await db.query(
`DELETE FROM gbrain_cycle_locks WHERE id = $1 AND holder_pid = $2`,
[CYCLE_LOCK_ID, pid],
);
},
};
}
throw new Error(`Unknown engine kind: ${engine.kind}`);
}
/**
* Acquire the file-based cycle lock (used when engine === null).
* Returns a LockHandle on success, or null if a live holder has it.
*
* The file contains `{pid}\n{iso-timestamp}`. Staleness = mtime older
* than LOCK_TTL_MS OR the PID is no longer alive on this host.
*/
function acquireFileLock(lockPath = LOCK_FILE_PATH_DEFAULT): LockHandle | null {
mkdirSync(join(lockPath, '..'), { recursive: true });
const pid = process.pid;
if (existsSync(lockPath)) {
// Check TTL.
try {
const st = statSync(lockPath);
const ageMs = Date.now() - st.mtimeMs;
const existingContent = readFileSync(lockPath, 'utf-8').trim();
const existingPid = parseInt(existingContent.split('\n')[0] || '0', 10);
// PID liveness check (same host only). kill(pid, 0) distinguishes:
// - success → process exists, caller can signal it
// - error ESRCH → no such process (truly dead)
// - error EPERM → process exists but caller can't signal it
// (e.g., PID 1/init on unix) → still alive
// Any error code OTHER than ESRCH means the PID is alive.
let pidAlive = false;
if (existingPid > 0 && existingPid !== pid) {
try {
process.kill(existingPid, 0);
pidAlive = true;
} catch (e) {
const code = (e as NodeJS.ErrnoException).code;
pidAlive = code !== 'ESRCH';
}
} else if (existingPid === pid) {
// Our own stale lock (same pid, previous run) — treat as stale.
pidAlive = false;
}
if (pidAlive && ageMs < LOCK_TTL_MS) {
return null; // live holder
}
// Stale lock — fall through to overwrite.
} catch {
// Any read/stat error: treat as stale.
}
}
writeFileSync(lockPath, `${pid}\n${new Date().toISOString()}\n`);
return {
refresh: async () => {
try {
writeFileSync(lockPath, `${pid}\n${new Date().toISOString()}\n`);
} catch {
/* non-fatal — a next-run stale check will notice */
}
},
release: async () => {
try {
const content = readFileSync(lockPath, 'utf-8').trim();
const heldPid = parseInt(content.split('\n')[0] || '0', 10);
if (heldPid === pid) unlinkSync(lockPath);
} catch {
/* already gone */
}
},
};
}
// ─── Helpers ───────────────────────────────────────────────────────
function makeErrorFromException(e: unknown, fallbackClass = 'InternalError'): PhaseError {
const err = e instanceof Error ? e : new Error(String(e));
// Node errors often have .code (e.g., 'ECONNREFUSED').
const code = (err as NodeJS.ErrnoException).code || 'UNKNOWN';
let className = fallbackClass;
if (code === 'ECONNREFUSED' || code === 'ENOTFOUND') className = 'DatabaseConnection';
if (code === 'ETIMEDOUT') className = 'Timeout';
if (/OpenAI|embed/i.test(err.message)) className = 'LLMError';
if (/ENOENT|EACCES|EISDIR|ENOTDIR/.test(code)) className = 'FilesystemError';
return {
class: className,
code,
message: err.message.slice(0, 200),
};
}
async function timePhase<T>(fn: () => Promise<T>): Promise<{ result: T; duration_ms: number }> {
const start = performance.now();
const result = await fn();
return { result, duration_ms: Math.round(performance.now() - start) };
}
async function safeYield(hook?: () => Promise<void>) {
if (!hook) return;
try {
await hook();
} catch (e) {
console.warn(`[cycle] yieldBetweenPhases hook error (non-fatal): ${e instanceof Error ? e.message : String(e)}`);
}
}
// ─── Phase runners ─────────────────────────────────────────────────
async function runPhaseLint(brainDir: string, dryRun: boolean): Promise<PhaseResult> {
try {
const { runLintCore } = await import('../commands/lint.ts');
const result = await runLintCore({ target: brainDir, fix: true, dryRun });
const issues = result.total_issues ?? 0;
const fixed = result.total_fixed ?? 0;
const remaining = Math.max(0, issues - fixed);
// 'ok' when nothing noteworthy remains:
// - no issues at all, or
// - non-dry-run and everything fixable was fixed.
// 'warn' when issues remain after the run.
const status: PhaseStatus =
issues === 0 || (!dryRun && remaining === 0) ? 'ok' : 'warn';
return {
phase: 'lint',
status,
duration_ms: 0, // set by caller
summary: dryRun
? `${issues} issue(s) found (dry-run, no writes)`
: `${fixed} fix(es) applied, ${remaining} remaining`,
details: { issues, fixed, pages_scanned: result.pages_scanned, dryRun },
};
} catch (e) {
return {
phase: 'lint',
status: 'fail',
duration_ms: 0,
summary: 'lint phase failed',
details: {},
error: makeErrorFromException(e),
};
}
}
async function runPhaseBacklinks(brainDir: string, dryRun: boolean): Promise<PhaseResult> {
try {
// Library path: backlinks.ts (as of v0.15) exports runBacklinksCore;
// action: 'fix' is the library equivalent of the CLI's --fix.
const { runBacklinksCore } = await import('../commands/backlinks.ts');
const result = await runBacklinksCore({
action: 'fix',
dir: brainDir,
dryRun,
});
const gaps = result.gaps_found ?? 0;
const added = result.fixed ?? 0;
const remaining = Math.max(0, gaps - added);
const status: PhaseStatus =
gaps === 0 || (!dryRun && remaining === 0) ? 'ok' : 'warn';
return {
phase: 'backlinks',
status,
duration_ms: 0,
summary: dryRun
? `${gaps} missing back-link(s) (dry-run)`
: `${added} back-link(s) added, ${remaining} remaining`,
details: { gaps, added, pages_affected: result.pages_affected, dryRun },
};
} catch (e) {
return {
phase: 'backlinks',
status: 'fail',
duration_ms: 0,
summary: 'backlinks phase failed',
details: {},
error: makeErrorFromException(e),
};
}
}
async function runPhaseSync(
engine: BrainEngine,
brainDir: string,
dryRun: boolean,
pull: boolean,
): Promise<PhaseResult> {
try {
const { performSync } = await import('../commands/sync.ts');
const result = await performSync(engine, {
repoPath: brainDir,
dryRun,
noPull: !pull,
noEmbed: true, // embed is a separate phase
});
const syncedCount = result.added + result.modified;
return {
phase: 'sync',
status: result.status === 'blocked_by_failures' ? 'warn' : 'ok',
duration_ms: 0,
summary: dryRun
? `${syncedCount} page(s) would sync, ${result.deleted} would delete`
: `+${result.added} added, ~${result.modified} modified, -${result.deleted} deleted`,
details: {
added: result.added,
modified: result.modified,
deleted: result.deleted,
renamed: result.renamed,
chunksCreated: result.chunksCreated,
failedFiles: result.failedFiles ?? 0,
syncStatus: result.status,
dryRun,
},
};
} catch (e) {
return {
phase: 'sync',
status: 'fail',
duration_ms: 0,
summary: 'sync phase failed',
details: {},
error: makeErrorFromException(e),
};
}
}
async function runPhaseExtract(
engine: BrainEngine,
brainDir: string,
dryRun: boolean,
): Promise<PhaseResult> {
try {
const { runExtractCore } = await import('../commands/extract.ts');
// Extract reads the filesystem and writes to the links table. It has no
// clean dry-run mode yet, so under dryRun the phase is reported as
// 'skipped' rather than pretending to have run.
if (dryRun) {
return {
phase: 'extract',
status: 'skipped',
duration_ms: 0,
summary: 'dry-run: extract phase skipped (no dry-run mode yet)',
details: { dryRun: true, reason: 'no_dry_run_support' },
};
}
const result = await runExtractCore(engine, { mode: 'all', dir: brainDir });
const linksCreated = result?.links_created ?? 0;
const timelineCreated = result?.timeline_entries_created ?? 0;
return {
phase: 'extract',
status: 'ok',
duration_ms: 0,
summary: `${linksCreated} link(s), ${timelineCreated} timeline entries`,
details: { linksCreated, timelineCreated, pages_processed: result?.pages_processed ?? 0 },
};
} catch (e) {
return {
phase: 'extract',
status: 'fail',
duration_ms: 0,
summary: 'extract phase failed',
details: {},
error: makeErrorFromException(e),
};
}
}
async function runPhaseEmbed(engine: BrainEngine, dryRun: boolean): Promise<PhaseResult> {
try {
const { runEmbedCore } = await import('../commands/embed.ts');
const result = await runEmbedCore(engine, { stale: true, dryRun });
const embeddedCount = dryRun ? result.would_embed : result.embedded;
return {
phase: 'embed',
status: 'ok',
duration_ms: 0,
summary: dryRun
? `${result.would_embed} chunk(s) would be embedded (dry-run)`
: `${result.embedded} chunk(s) newly embedded (${result.skipped} already had embeddings)`,
details: {
embedded: result.embedded,
skipped: result.skipped,
would_embed: result.would_embed,
total_chunks: result.total_chunks,
pages_processed: result.pages_processed,
dryRun,
// Convenience field used by CycleReport.totals.pages_embedded.
// In dry-run, this counts pages with stale chunks that would
// have been processed (same semantic as a real run).
pages_embedded_count: dryRun ? result.pages_processed : embeddedCount > 0 ? result.pages_processed : 0,
},
};
} catch (e) {
return {
phase: 'embed',
status: 'fail',
duration_ms: 0,
summary: 'embed phase failed',
details: {},
error: makeErrorFromException(e),
};
}
}
async function runPhaseOrphans(engine: BrainEngine): Promise<PhaseResult> {
try {
const { findOrphans } = await import('../commands/orphans.ts');
const result = await findOrphans(engine);
const count = result.total_orphans;
return {
phase: 'orphans',
status: count > 20 ? 'warn' : 'ok',
duration_ms: 0,
summary: `${count} orphan page(s) out of ${result.total_pages} total`,
details: {
total_orphans: count,
total_pages: result.total_pages,
excluded: result.excluded,
},
};
} catch (e) {
return {
phase: 'orphans',
status: 'fail',
duration_ms: 0,
summary: 'orphans phase failed',
details: {},
error: makeErrorFromException(e),
};
}
}
// ─── Main ──────────────────────────────────────────────────────────
/**
* Run the brain maintenance cycle.
*
* Engine may be null: filesystem phases (lint, backlinks) still run;
* DB-dependent phases skip with status='skipped', reason='no_database'.
*
* Acquires the cycle lock whenever the selection includes a state-mutating
* phase (lint, backlinks, sync, extract, embed). Read-only selections
* (e.g., --phase orphans) skip the lock so they stay responsive even if
* another cycle is live.
*/
export async function runCycle(
engine: BrainEngine | null,
opts: CycleOpts,
): Promise<CycleReport> {
const start = performance.now();
const phases = opts.phases ?? ALL_PHASES;
const dryRun = !!opts.dryRun;
const pull = !!opts.pull;
const timestamp = new Date().toISOString();
const phaseResults: PhaseResult[] = [];
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
// Decide if we need the cycle lock: any state-mutating phase in the selection.
const needsLock = phases.some(p => NEEDS_LOCK_PHASES.has(p));
let lock: LockHandle | null = null;
if (needsLock) {
if (engine) {
try {
lock = await acquirePostgresLock(engine);
} catch (e) {
// Lock acquisition failed catastrophically (e.g., migration missing).
// Return a failed report rather than silently running without a lock.
return {
schema_version: '1',
timestamp,
duration_ms: Math.round(performance.now() - start),
status: 'failed',
reason: 'lock_acquisition_error',
brain_dir: opts.brainDir,
phases: [
{
phase: 'sync',
status: 'fail',
duration_ms: 0,
summary: 'could not acquire cycle lock',
details: {},
error: makeErrorFromException(e, 'DatabaseConnection'),
},
],
totals: emptyTotals(),
};
}
} else {
lock = acquireFileLock();
}
if (lock === null) {
return {
schema_version: '1',
timestamp,
duration_ms: Math.round(performance.now() - start),
status: 'skipped',
reason: 'cycle_already_running',
brain_dir: opts.brainDir,
phases: [],
totals: emptyTotals(),
};
}
}
try {
// ── Phase 1: lint ────────────────────────────────────────────
if (phases.includes('lint')) {
progress.start('cycle.lint');
const { result, duration_ms } = await timePhase(() => runPhaseLint(opts.brainDir, dryRun));
result.duration_ms = duration_ms;
phaseResults.push(result);
progress.finish();
await safeYield(opts.yieldBetweenPhases);
}
// ── Phase 2: backlinks ──────────────────────────────────────
if (phases.includes('backlinks')) {
progress.start('cycle.backlinks');
const { result, duration_ms } = await timePhase(() => runPhaseBacklinks(opts.brainDir, dryRun));
result.duration_ms = duration_ms;
phaseResults.push(result);
progress.finish();
await safeYield(opts.yieldBetweenPhases);
}
// ── Phase 3: sync ───────────────────────────────────────────
if (phases.includes('sync')) {
if (!engine) {
phaseResults.push({
phase: 'sync',
status: 'skipped',
duration_ms: 0,
summary: 'no database connected',
details: { reason: 'no_database' },
});
} else {
progress.start('cycle.sync');
const { result, duration_ms } = await timePhase(() => runPhaseSync(engine, opts.brainDir, dryRun, pull));
result.duration_ms = duration_ms;
phaseResults.push(result);
progress.finish();
}
await safeYield(opts.yieldBetweenPhases);
}
// ── Phase 4: extract ────────────────────────────────────────
if (phases.includes('extract')) {
if (!engine) {
phaseResults.push({
phase: 'extract',
status: 'skipped',
duration_ms: 0,
summary: 'no database connected',
details: { reason: 'no_database' },
});
} else {
progress.start('cycle.extract');
const { result, duration_ms } = await timePhase(() => runPhaseExtract(engine, opts.brainDir, dryRun));
result.duration_ms = duration_ms;
phaseResults.push(result);
progress.finish();
}
await safeYield(opts.yieldBetweenPhases);
}
// ── Phase 5: embed ──────────────────────────────────────────
if (phases.includes('embed')) {
if (!engine) {
phaseResults.push({
phase: 'embed',
status: 'skipped',
duration_ms: 0,
summary: 'no database connected',
details: { reason: 'no_database' },
});
} else {
progress.start('cycle.embed');
const { result, duration_ms } = await timePhase(() => runPhaseEmbed(engine, dryRun));
result.duration_ms = duration_ms;
phaseResults.push(result);
progress.finish();
}
await safeYield(opts.yieldBetweenPhases);
}
// ── Phase 6: orphans ────────────────────────────────────────
if (phases.includes('orphans')) {
if (!engine) {
phaseResults.push({
phase: 'orphans',
status: 'skipped',
duration_ms: 0,
summary: 'no database connected',
details: { reason: 'no_database' },
});
} else {
progress.start('cycle.orphans');
const { result, duration_ms } = await timePhase(() => runPhaseOrphans(engine));
result.duration_ms = duration_ms;
phaseResults.push(result);
progress.finish();
}
await safeYield(opts.yieldBetweenPhases);
}
} finally {
if (lock) {
try { await lock.release(); } catch { /* best-effort */ }
}
}
const duration_ms = Math.round(performance.now() - start);
const totals = extractTotals(phaseResults);
const status = deriveStatus(phaseResults, totals);
return {
schema_version: '1',
timestamp,
duration_ms,
status,
brain_dir: opts.brainDir,
phases: phaseResults,
totals,
};
}
// ─── Totals + status derivation ────────────────────────────────────
function emptyTotals(): CycleReport['totals'] {
return {
lint_fixes: 0,
backlinks_added: 0,
pages_synced: 0,
pages_extracted: 0,
pages_embedded: 0,
orphans_found: 0,
};
}
function extractTotals(phases: PhaseResult[]): CycleReport['totals'] {
const t = emptyTotals();
for (const p of phases) {
if (p.phase === 'lint' && p.details) {
t.lint_fixes = Number(p.details.fixed ?? 0);
} else if (p.phase === 'backlinks' && p.details) {
t.backlinks_added = Number(p.details.added ?? 0);
} else if (p.phase === 'sync' && p.details) {
t.pages_synced = Number(p.details.added ?? 0) + Number(p.details.modified ?? 0);
} else if (p.phase === 'extract' && p.details) {
t.pages_extracted = Number(p.details.linksCreated ?? 0);
} else if (p.phase === 'embed' && p.details) {
// In dry-run, use would_embed as the "activity" measure; else embedded.
const dryRun = p.details.dryRun === true;
t.pages_embedded = dryRun
? Number(p.details.would_embed ?? 0)
: Number(p.details.embedded ?? 0);
} else if (p.phase === 'orphans' && p.details) {
t.orphans_found = Number(p.details.total_orphans ?? 0);
}
}
return t;
}
function deriveStatus(phases: PhaseResult[], totals: CycleReport['totals']): CycleStatus {
if (phases.length === 0) return 'failed';
const anyFailed = phases.some(p => p.status === 'fail');
const allFailed = phases.every(p => p.status === 'fail');
const anyWarn = phases.some(p => p.status === 'warn');
if (allFailed) return 'failed';
if (anyFailed || anyWarn) return 'partial';
// All phases 'ok' or 'skipped'. Distinguish clean (no activity) from ok (work done).
const anyWork =
totals.lint_fixes > 0 ||
totals.backlinks_added > 0 ||
totals.pages_synced > 0 ||
totals.pages_extracted > 0 ||
totals.pages_embedded > 0;
return anyWork ? 'ok' : 'clean';
}
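
To make the status rules above concrete, here is a minimal standalone sketch of the same decision order. The trimmed-down types and the `sketchDeriveStatus` name are assumptions made for illustration; the real `PhaseResult` and `CycleReport` carry more fields than shown here.

```typescript
// Hypothetical, reduced types for illustration only.
type PhaseStatus = 'ok' | 'warn' | 'fail' | 'skipped';
type CycleStatus = 'ok' | 'clean' | 'partial' | 'failed';

interface Totals {
  lint_fixes: number;
  backlinks_added: number;
  pages_synced: number;
  pages_extracted: number;
  pages_embedded: number;
  orphans_found: number;
}

function zeroTotals(): Totals {
  return {
    lint_fixes: 0,
    backlinks_added: 0,
    pages_synced: 0,
    pages_extracted: 0,
    pages_embedded: 0,
    orphans_found: 0,
  };
}

// Same decision order as deriveStatus: empty or all-failed => failed,
// any fail/warn => partial, otherwise ok vs clean depending on activity.
function sketchDeriveStatus(statuses: PhaseStatus[], totals: Totals): CycleStatus {
  if (statuses.length === 0) return 'failed';
  if (statuses.every(s => s === 'fail')) return 'failed';
  if (statuses.some(s => s === 'fail' || s === 'warn')) return 'partial';
  const anyWork =
    totals.lint_fixes > 0 ||
    totals.backlinks_added > 0 ||
    totals.pages_synced > 0 ||
    totals.pages_extracted > 0 ||
    totals.pages_embedded > 0;
  return anyWork ? 'ok' : 'clean';
}

// Orphans alone do not count as work, so this cycle is 'clean'.
const orphansOnly = sketchDeriveStatus(['ok', 'skipped'], { ...zeroTotals(), orphans_found: 3 });
// A single lint fix is enough to report 'ok'.
const withFixes = sketchDeriveStatus(['ok'], { ...zeroTotals(), lint_fixes: 2 });
console.log(orphansOnly, withFixes);
```

Note that `orphans_found` is deliberately absent from the activity check, so a cycle whose only finding is orphaned pages still reports `clean`.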

@@ -152,6 +152,13 @@ export interface BrainEngine {
* Slugs with zero inbound links are present in the map with value 0.
*/
getBacklinkCounts(slugs: string[]): Promise<Map<string, number>>;
/**
* Return every page with no inbound links (from any source).
* Domain comes from the frontmatter `domain` field (null if unset).
* The caller filters pseudo-pages + derives display domain.
* Used by `gbrain orphans` and `runCycle`'s orphan sweep phase.
*/
findOrphanPages(): Promise<Array<{ slug: string; title: string; domain: string | null }>>;
// Tags
addTag(slug: string, tag: string): Promise<void>;

@@ -466,6 +466,32 @@ export const MIGRATIONS: Migration[] = [
AND max_stalled < 5;
`,
},
{
version: 16,
name: 'cycle_locks_table',
// v0.17 brain maintenance cycle (runCycle primitive).
// PgBouncer transaction pooling strips session-scoped advisory locks
// (pg_try_advisory_lock) across connection checkouts, so we can't use
// them as the cycle-coordination primitive. A row with a TTL works
// through every pooler: any backend can SELECT/UPDATE/DELETE it, no
// session state required.
//
// Acquire: INSERT ... ON CONFLICT (id) DO UPDATE ... WHERE ttl_expires_at < NOW()
// returning ... — empty RETURNING = lock held by live holder.
// Refresh: UPDATE ... SET ttl_expires_at = NOW() + interval '30 min'
// WHERE id = 'gbrain-cycle' AND holder_pid = <my pid> — between phases.
// Release: DELETE WHERE id = 'gbrain-cycle' AND holder_pid = <my pid>.
sql: `
CREATE TABLE IF NOT EXISTS gbrain_cycle_locks (
id TEXT PRIMARY KEY,
holder_pid INT NOT NULL,
holder_host TEXT,
acquired_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ttl_expires_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_cycle_locks_ttl ON gbrain_cycle_locks(ttl_expires_at);
`,
},
];
export const LATEST_VERSION = MIGRATIONS.length > 0
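
The acquire/refresh/release rules described in the migration comment can be modeled without a database. The in-memory sketch below mirrors those three statements so the TTL takeover behavior is easy to see. `TtlLockTable` and its method names are hypothetical; the real implementation issues SQL against `gbrain_cycle_locks`. Whether the real acquire is re-entrant for the same PID is not shown in this diff, so the sketch assumes any unexpired row counts as held.

```typescript
// Hypothetical in-memory model of the gbrain_cycle_locks row semantics.
interface LockRow {
  holderPid: number;
  ttlExpiresAt: number; // epoch milliseconds
}

class TtlLockTable {
  private rows = new Map<string, LockRow>();

  // Models INSERT ... ON CONFLICT (id) DO UPDATE ... WHERE ttl_expires_at < NOW().
  // Returns false when a live (unexpired) holder already owns the row.
  tryAcquire(id: string, pid: number, now: number, ttlMs: number): boolean {
    const row = this.rows.get(id);
    if (row && row.ttlExpiresAt >= now) return false;
    this.rows.set(id, { holderPid: pid, ttlExpiresAt: now + ttlMs });
    return true;
  }

  // Models UPDATE ... SET ttl_expires_at = ... WHERE id = ... AND holder_pid = pid.
  refresh(id: string, pid: number, now: number, ttlMs: number): boolean {
    const row = this.rows.get(id);
    if (!row || row.holderPid !== pid) return false;
    row.ttlExpiresAt = now + ttlMs;
    return true;
  }

  // Models DELETE WHERE id = ... AND holder_pid = pid.
  release(id: string, pid: number): boolean {
    const row = this.rows.get(id);
    if (!row || row.holderPid !== pid) return false;
    return this.rows.delete(id);
  }
}

const THIRTY_MIN = 30 * 60 * 1000;
const table = new TtlLockTable();
const first = table.tryAcquire('gbrain-cycle', 100, 0, THIRTY_MIN);               // fresh row
const blocked = table.tryAcquire('gbrain-cycle', 200, 1000, THIRTY_MIN);          // live holder
const takeover = table.tryAcquire('gbrain-cycle', 200, THIRTY_MIN + 1, THIRTY_MIN); // TTL expired
console.log(first, blocked, takeover);
```

The holder check on refresh and release is what keeps a crashed-then-replaced holder from deleting the new owner's row.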

@@ -1290,9 +1290,9 @@ const find_orphans: Operation = {
description: 'Include auto-generated and pseudo pages (default: false)',
},
},
-handler: async (_ctx, p) => {
+handler: async (ctx, p) => {
const { findOrphans } = await import('../commands/orphans.ts');
-return findOrphans((p.include_pseudo as boolean) || false);
+return findOrphans(ctx.engine, { includePseudo: (p.include_pseudo as boolean) || false });
},
cliHints: { name: 'orphans', hidden: true },
};

@@ -655,6 +655,21 @@ export class PGLiteEngine implements BrainEngine {
return result;
}
async findOrphanPages(): Promise<Array<{ slug: string; title: string; domain: string | null }>> {
const { rows } = await this.db.query(
`SELECT
p.slug,
COALESCE(p.title, p.slug) AS title,
p.frontmatter->>'domain' AS domain
FROM pages p
WHERE NOT EXISTS (
SELECT 1 FROM links l WHERE l.to_page_id = p.id
)
ORDER BY p.slug`
);
return rows as Array<{ slug: string; title: string; domain: string | null }>;
}
// Tags
async addTag(slug: string, tag: string): Promise<void> {
await this.db.query(
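
The `NOT EXISTS` query in `findOrphanPages` is an anti-join: keep every page that no link points at. A rough in-memory equivalent is sketched below for readers who want to check the semantics without a database. `findOrphansInMemory` and the `Page`/`Link` shapes are hypothetical. Two simplifications are assumed: a non-string `domain` becomes `null` (Postgres `->>` would return the text form of any scalar), and ordering uses plain code-unit comparison rather than the database collation.

```typescript
// Hypothetical row shapes, reduced to the columns the query touches.
interface Page {
  id: number;
  slug: string;
  title: string | null;
  frontmatter: Record<string, unknown>;
}
interface Link {
  from_page_id: number;
  to_page_id: number;
}
interface OrphanRow {
  slug: string;
  title: string;
  domain: string | null;
}

function findOrphansInMemory(pages: Page[], links: Link[]): OrphanRow[] {
  // Every page id that appears as a link target has at least one inbound link.
  const linked = new Set(links.map(l => l.to_page_id));
  return pages
    .filter(p => !linked.has(p.id))
    .map(p => ({
      slug: p.slug,
      title: p.title ?? p.slug, // COALESCE(p.title, p.slug)
      domain: typeof p.frontmatter.domain === 'string' ? p.frontmatter.domain : null,
    }))
    .sort((a, b) => (a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0));
}

const samplePages: Page[] = [
  { id: 3, slug: 'c', title: null, frontmatter: {} },
  { id: 1, slug: 'a', title: 'Alpha', frontmatter: { domain: 'people' } },
  { id: 2, slug: 'b', title: 'Beta', frontmatter: {} },
];
const sampleLinks: Link[] = [{ from_page_id: 1, to_page_id: 2 }];
const sampleOrphans = findOrphansInMemory(samplePages, sampleLinks);
console.log(sampleOrphans.map(o => o.slug));
```

As in the SQL, a page that links out but receives no links (`a` here) is still an orphan; only inbound links matter.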

@@ -310,6 +310,22 @@ CREATE TABLE IF NOT EXISTS subagent_rate_leases (
);
CREATE INDEX IF NOT EXISTS idx_rate_leases_key_expires ON subagent_rate_leases (key, expires_at);
-- ============================================================
-- Cycle coordination lock — v0.17 runCycle primitive
-- ============================================================
-- See src/schema.sql for full rationale. One row per active cycle.
-- PGLite is single-writer, so protection is doubled here: the DB-level
-- row plus the file lock at ~/.gbrain/cycle.lock keep concurrent CLI
-- invocations from racing.
CREATE TABLE IF NOT EXISTS gbrain_cycle_locks (
id TEXT PRIMARY KEY,
holder_pid INT NOT NULL,
holder_host TEXT,
acquired_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ttl_expires_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_cycle_locks_ttl ON gbrain_cycle_locks(ttl_expires_at);
-- ============================================================
-- Trigger-based search_vector (spans pages + timeline_entries)
-- ============================================================

@@ -699,6 +699,22 @@ export class PostgresEngine implements BrainEngine {
return result;
}
async findOrphanPages(): Promise<Array<{ slug: string; title: string; domain: string | null }>> {
const sql = this.sql;
const rows = await sql`
SELECT
p.slug,
COALESCE(p.title, p.slug) AS title,
p.frontmatter->>'domain' AS domain
FROM pages p
WHERE NOT EXISTS (
SELECT 1 FROM links l WHERE l.to_page_id = p.id
)
ORDER BY p.slug
`;
return rows as unknown as Array<{ slug: string; title: string; domain: string | null }>;
}
// Tags
async addTag(slug: string, tag: string): Promise<void> {
const sql = this.sql;

@@ -412,6 +412,24 @@ CREATE TABLE IF NOT EXISTS subagent_rate_leases (
);
CREATE INDEX IF NOT EXISTS idx_rate_leases_key_expires ON subagent_rate_leases (key, expires_at);
-- ============================================================
-- Cycle coordination lock — v0.17 runCycle primitive
-- ============================================================
-- One row per active cycle. Any caller (autopilot daemon, Minions
-- autopilot-cycle handler, gbrain dream CLI) tries to acquire this
-- row before running a DB-write phase. Holders refresh ttl_expires_at
-- between phases; crashed holders auto-release once TTL expires.
-- Works through PgBouncer transaction pooling, unlike session-scoped
-- pg_try_advisory_lock.
CREATE TABLE IF NOT EXISTS gbrain_cycle_locks (
id TEXT PRIMARY KEY,
holder_pid INT NOT NULL,
holder_host TEXT,
acquired_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ttl_expires_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_cycle_locks_ttl ON gbrain_cycle_locks(ttl_expires_at);
-- NOTIFY trigger for real-time job events (Postgres only, not PGLite)
CREATE OR REPLACE FUNCTION notify_minion_job_change() RETURNS trigger AS \$\$
BEGIN

@@ -408,6 +408,24 @@ CREATE TABLE IF NOT EXISTS subagent_rate_leases (
);
CREATE INDEX IF NOT EXISTS idx_rate_leases_key_expires ON subagent_rate_leases (key, expires_at);
-- ============================================================
-- Cycle coordination lock — v0.17 runCycle primitive
-- ============================================================
-- One row per active cycle. Any caller (autopilot daemon, Minions
-- autopilot-cycle handler, gbrain dream CLI) tries to acquire this
-- row before running a DB-write phase. Holders refresh ttl_expires_at
-- between phases; crashed holders auto-release once TTL expires.
-- Works through PgBouncer transaction pooling, unlike session-scoped
-- pg_try_advisory_lock.
CREATE TABLE IF NOT EXISTS gbrain_cycle_locks (
id TEXT PRIMARY KEY,
holder_pid INT NOT NULL,
holder_host TEXT,
acquired_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ttl_expires_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_cycle_locks_ttl ON gbrain_cycle_locks(ttl_expires_at);
-- NOTIFY trigger for real-time job events (Postgres only, not PGLite)
CREATE OR REPLACE FUNCTION notify_minion_job_change() RETURNS trigger AS $$
BEGIN

test/core/cycle.test.ts Normal file
@@ -0,0 +1,394 @@
/**
* Unit tests for src/core/cycle.ts — runCycle primitive.
*
* Tests use mock.module to replace each phase's library function with
* deterministic stubs. Zero fixtures, zero DB, zero network. Covers
* the dryRun × phases × lock_held × engine-null matrix.
*
* The lock primitives are tested against an in-memory PGLite engine
* so they exercise real SQL paths.
*/
import { describe, test, expect, mock, beforeEach, beforeAll, afterAll, afterEach } from 'bun:test';
import { existsSync, unlinkSync } from 'fs';
// ─── Mocks ──────────────────────────────────────────────────────────
// Track what each phase was called with so tests can assert.
let lintCalls: Array<{ target: string; fix: boolean; dryRun: boolean | undefined }> = [];
let backlinksCalls: Array<{ action: string; dir: string; dryRun: boolean | undefined }> = [];
let syncCalls: Array<{ dryRun: boolean | undefined; noPull: boolean | undefined }> = [];
let extractCalls: Array<{ mode: string; dir: string }> = [];
let embedCalls: Array<{ stale: boolean | undefined; dryRun: boolean | undefined }> = [];
let orphansCalls: number = 0;
// Mock lint
mock.module('../../src/commands/lint.ts', () => ({
runLintCore: async (opts: any) => {
lintCalls.push({ target: opts.target, fix: opts.fix, dryRun: opts.dryRun });
return { total_issues: 2, total_fixed: opts.dryRun ? 0 : 2, pages_scanned: 5 };
},
}));
// Mock backlinks
mock.module('../../src/commands/backlinks.ts', () => ({
runBacklinksCore: async (opts: any) => {
backlinksCalls.push({ action: opts.action, dir: opts.dir, dryRun: opts.dryRun });
return { action: opts.action, gaps_found: 3, fixed: opts.dryRun ? 0 : 3, pages_affected: 2, dryRun: !!opts.dryRun };
},
// keep other exports present so import doesn't error
extractEntityRefs: () => [],
extractPageTitle: () => '',
hasBacklink: () => false,
buildBacklinkEntry: () => '',
findBacklinkGaps: () => [],
fixBacklinkGaps: () => 0,
runBacklinks: async () => {},
}));
// Mock sync
mock.module('../../src/commands/sync.ts', () => ({
performSync: async (_engine: any, opts: any) => {
syncCalls.push({ dryRun: opts.dryRun, noPull: opts.noPull });
return {
status: opts.dryRun ? 'dry_run' : 'synced',
fromCommit: 'abcd',
toCommit: 'efgh',
added: opts.dryRun ? 0 : 4,
modified: opts.dryRun ? 0 : 2,
deleted: 0,
renamed: 0,
chunksCreated: opts.dryRun ? 0 : 10,
embedded: 0,
pagesAffected: opts.dryRun ? [] : ['a', 'b'],
};
},
runSync: async () => {},
buildSyncManifest: () => ({ added: [], modified: [], deleted: [], renamed: [] }),
isSyncable: () => true,
pathToSlug: (s: string) => s,
}));
// Mock extract
mock.module('../../src/commands/extract.ts', () => ({
runExtractCore: async (_engine: any, opts: any) => {
extractCalls.push({ mode: opts.mode, dir: opts.dir });
return { links_created: 7, timeline_entries_created: 3, pages_processed: 5 };
},
walkMarkdownFiles: () => [],
extractMarkdownLinks: () => [],
resolveSlug: () => null,
}));
// Mock embed
mock.module('../../src/commands/embed.ts', () => ({
runEmbedCore: async (_engine: any, opts: any) => {
embedCalls.push({ stale: opts.stale, dryRun: opts.dryRun });
return {
embedded: opts.dryRun ? 0 : 8,
skipped: 2,
would_embed: opts.dryRun ? 8 : 0,
total_chunks: 10,
pages_processed: 3,
dryRun: !!opts.dryRun,
};
},
runEmbed: async () => {},
}));
// Mock orphans
mock.module('../../src/commands/orphans.ts', () => ({
findOrphans: async () => {
orphansCalls++;
return {
orphans: [],
total_orphans: 1,
total_linkable: 20,
total_pages: 20,
excluded: 0,
};
},
queryOrphanPages: async () => [],
shouldExclude: () => false,
deriveDomain: () => 'root',
formatOrphansText: () => '',
}));
// Import after mocks.
const { runCycle, ALL_PHASES } = await import('../../src/core/cycle.ts');
const { PGLiteEngine } = await import('../../src/core/pglite-engine.ts');
// One shared PGLite engine for the whole file. Creating a fresh engine
// per describe (15 migrations each) was causing the parallel test suite
// to hit beforeAll timeouts. `truncateCycleLocks` clears the cycle lock
// row between tests so state doesn't leak across assertions.
let sharedEngine: InstanceType<typeof PGLiteEngine>;
async function truncateCycleLocks(engine: InstanceType<typeof PGLiteEngine>) {
// Use the engine passed in rather than reaching for the module-level one.
await (engine as any).db.query('DELETE FROM gbrain_cycle_locks');
}
beforeAll(async () => {
sharedEngine = new PGLiteEngine();
await sharedEngine.connect({});
await sharedEngine.initSchema();
});
afterAll(async () => {
await sharedEngine.disconnect();
});
beforeEach(() => {
lintCalls = [];
backlinksCalls = [];
syncCalls = [];
extractCalls = [];
embedCalls = [];
orphansCalls = 0;
});
// ─── dryRun propagation (regression guards) ────────────────────────
describe('runCycle — dryRun propagates to every phase', () => {
beforeEach(async () => {
await truncateCycleLocks(sharedEngine);
});
test('dryRun:true reaches lint, backlinks, sync, embed', async () => {
await runCycle(sharedEngine,{ brainDir: '/tmp/brain', dryRun: true });
expect(lintCalls.at(-1)?.dryRun).toBe(true);
expect(backlinksCalls.at(-1)?.dryRun).toBe(true);
expect(syncCalls.at(-1)?.dryRun).toBe(true);
expect(embedCalls.at(-1)?.dryRun).toBe(true);
});
test('dryRun:false writes in every phase', async () => {
await runCycle(sharedEngine,{ brainDir: '/tmp/brain', dryRun: false });
expect(lintCalls.at(-1)?.dryRun).toBe(false);
expect(backlinksCalls.at(-1)?.dryRun).toBe(false);
expect(syncCalls.at(-1)?.dryRun).toBe(false);
expect(embedCalls.at(-1)?.dryRun).toBe(false);
});
test('dryRun skips extract phase (no dry-run support)', async () => {
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain', dryRun: true });
expect(extractCalls.length).toBe(0);
const extractPhase = report.phases.find(p => p.phase === 'extract');
expect(extractPhase?.status).toBe('skipped');
expect(extractPhase?.details.reason).toBe('no_dry_run_support');
});
});
// ─── Phase selection ──────────────────────────────────────────────
describe('runCycle — phase selection', () => {
beforeEach(async () => {
await truncateCycleLocks(sharedEngine);
});
test('default: all 6 phases run in order', async () => {
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain' });
expect(report.phases.map(p => p.phase)).toEqual(ALL_PHASES);
});
test('--phase lint only runs lint', async () => {
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain', phases: ['lint'] });
expect(report.phases.map(p => p.phase)).toEqual(['lint']);
expect(lintCalls.length).toBe(1);
expect(backlinksCalls.length).toBe(0);
expect(syncCalls.length).toBe(0);
});
test('--phase orphans only runs orphans', async () => {
await runCycle(sharedEngine,{ brainDir: '/tmp/brain', phases: ['orphans'] });
expect(orphansCalls).toBe(1);
expect(syncCalls.length).toBe(0);
});
});
// ─── Lock-skip for non-DB-write phase selections ──────────────────
describe('runCycle — cycle lock acquire/release semantics', () => {
beforeEach(async () => {
await truncateCycleLocks(sharedEngine);
});
test('phases: [orphans] (read-only) skips the lock entirely', async () => {
// A read-only selection should never touch the lock table, so we assert
// that no lock row exists after the run. On its own this cannot tell
// "never acquired" apart from "acquired, then released"; seeding a
// holder row and verifying it survives the run would be a stricter check.
await runCycle(sharedEngine,{ brainDir: '/tmp/brain', phases: ['orphans'] });
const { rows } = await (sharedEngine as any).db.query('SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks');
expect(rows[0].n).toBe(0);
});
test('phases including lint DOES acquire + release (table empty after run)', async () => {
await runCycle(sharedEngine,{ brainDir: '/tmp/brain', phases: ['lint'] });
// Lock is released in finally, so no rows survive the run.
const { rows } = await (sharedEngine as any).db.query('SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks');
expect(rows[0].n).toBe(0);
});
test('phases including sync DOES acquire + release the lock', async () => {
await runCycle(sharedEngine,{ brainDir: '/tmp/brain', phases: ['sync'] });
const { rows } = await (sharedEngine as any).db.query('SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks');
expect(rows[0].n).toBe(0);
});
});
// ─── Lock held by another live holder ──────────────────────────────
describe('runCycle — cycle_already_running skip', () => {
beforeEach(async () => {
await truncateCycleLocks(sharedEngine);
});
test('returns status=skipped when lock is held by a live holder (TTL in the future)', async () => {
// Seed a lock row that looks live (far-future TTL, different PID).
await (sharedEngine as any).db.query(
`INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES ('gbrain-cycle', 99999, 'other-host', NOW(), NOW() + INTERVAL '1 hour')`
);
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain' });
expect(report.status).toBe('skipped');
expect(report.reason).toBe('cycle_already_running');
expect(report.phases.length).toBe(0);
// None of the phase runners were called.
expect(lintCalls.length).toBe(0);
expect(syncCalls.length).toBe(0);
});
test('TTL-expired lock is auto-claimed (crashed holder)', async () => {
// Seed a lock row that looks stale (TTL already past).
await (sharedEngine as any).db.query(
`INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES ('gbrain-cycle', 99999, 'crashed-host', NOW() - INTERVAL '2 hours', NOW() - INTERVAL '1 hour')`
);
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain' });
expect(report.status).not.toBe('skipped');
expect(syncCalls.length).toBe(1); // cycle ran
});
});
// ─── Engine null path ─────────────────────────────────────────────
describe('runCycle — engine = null (filesystem-only mode)', () => {
const lockFile = require('path').join(require('os').homedir(), '.gbrain', 'cycle.lock');
afterEach(() => {
if (existsSync(lockFile)) { try { unlinkSync(lockFile); } catch { /* */ } }
});
test('filesystem phases still run when engine is null', async () => {
const report = await runCycle(null, { brainDir: '/tmp/brain' });
// Lint and backlinks ran.
expect(lintCalls.length).toBe(1);
expect(backlinksCalls.length).toBe(1);
// DB phases skipped with reason:no_database.
const syncPhase = report.phases.find(p => p.phase === 'sync');
expect(syncPhase?.status).toBe('skipped');
expect(syncPhase?.details.reason).toBe('no_database');
const embedPhase = report.phases.find(p => p.phase === 'embed');
expect(embedPhase?.status).toBe('skipped');
// syncCalls + embedCalls are empty because DB-required phases skipped.
expect(syncCalls.length).toBe(0);
expect(embedCalls.length).toBe(0);
});
test('file lock blocks concurrent engine=null cycles', async () => {
// Seed a lock file pointing at PID 1 (init/launchd — always alive on
// unix, and never equals our test PID). Fresh mtime means "live holder".
// With engine=null + the default phases selection, lint + backlinks
// trigger NEEDS_LOCK_PHASES → acquireFileLock sees the live holder and
// returns null → runCycle returns skipped/cycle_already_running.
const { writeFileSync, mkdirSync } = require('fs');
const path = require('path');
mkdirSync(path.dirname(lockFile), { recursive: true });
writeFileSync(lockFile, `1\n${new Date().toISOString()}\n`);
const report = await runCycle(null, { brainDir: '/tmp/brain' });
expect(report.status).toBe('skipped');
expect(report.reason).toBe('cycle_already_running');
// None of the filesystem phases ran because the lock blocked entry.
expect(lintCalls.length).toBe(0);
expect(backlinksCalls.length).toBe(0);
});
});
// ─── Status derivation ─────────────────────────────────────────────
describe('runCycle — status derivation', () => {
beforeEach(async () => {
await truncateCycleLocks(sharedEngine);
});
test('ok when work was done (non-dry-run)', async () => {
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain' });
// Non-dry-run mocks report work (lint fixes: 2, sync added: 4, etc.), so:
expect(report.status).toBe('ok');
expect(report.totals.lint_fixes).toBe(2);
expect(report.totals.backlinks_added).toBe(3);
expect(report.totals.pages_synced).toBe(6); // added + modified from sync mock
expect(report.totals.pages_embedded).toBe(8);
expect(report.totals.orphans_found).toBe(1);
});
test('schema_version is stable at "1"', async () => {
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain' });
expect(report.schema_version).toBe('1');
});
test('CycleReport shape includes all required top-level fields', async () => {
const report = await runCycle(sharedEngine,{ brainDir: '/tmp/brain' });
expect(report).toHaveProperty('schema_version');
expect(report).toHaveProperty('timestamp');
expect(report).toHaveProperty('duration_ms');
expect(report).toHaveProperty('status');
expect(report).toHaveProperty('brain_dir');
expect(report).toHaveProperty('phases');
expect(report).toHaveProperty('totals');
});
});
// ─── yieldBetweenPhases hook ─────────────────────────────────────
describe('runCycle — yieldBetweenPhases hook', () => {
beforeEach(async () => {
await truncateCycleLocks(sharedEngine);
});
test('hook is called after every phase', async () => {
let hookCalls = 0;
await runCycle(sharedEngine,{
brainDir: '/tmp/brain',
yieldBetweenPhases: async () => {
hookCalls++;
},
});
// 6 phases → 6 yield calls (one after each).
expect(hookCalls).toBe(6);
});
test('hook exceptions do not abort the cycle', async () => {
const report = await runCycle(sharedEngine,{
brainDir: '/tmp/brain',
yieldBetweenPhases: async () => {
throw new Error('synthetic hook error');
},
});
// Cycle still completed all phases.
expect(report.phases.length).toBe(6);
});
});
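
The two hook tests above rely on the yield hook being invoked once after each phase and on its exceptions being swallowed. The real `safeYield` is not shown in this diff, so the following is only a sketch of behavior consistent with those tests; `safeYieldSketch` and `runPhasesSketch` are hypothetical names.

```typescript
type YieldHook = (() => Promise<void>) | undefined;

// Call the hook if one was supplied; never let its failure escape.
async function safeYieldSketch(hook: YieldHook): Promise<void> {
  if (!hook) return;
  try {
    await hook();
  } catch {
    // Best-effort: a misbehaving hook must not abort the cycle.
  }
}

// Run each phase in order, yielding once after every phase.
async function runPhasesSketch(phases: string[], hook: YieldHook): Promise<string[]> {
  const completed: string[] = [];
  for (const phase of phases) {
    completed.push(phase); // stand-in for the real phase work
    await safeYieldSketch(hook);
  }
  return completed;
}
```

With six phases and a hook that always throws, this sketch still completes all six phases and calls the hook six times, which is the behavior the tests assert.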

test/dream.test.ts Normal file
@@ -0,0 +1,243 @@
/**
* Unit tests for src/commands/dream.ts — CLI alias over runCycle.
*
* dream is intentionally thin. These tests exercise the CLI surface
* (argv parsing, brainDir resolution, output format, exit codes)
* against a REAL runCycle + real library calls, backed by an
* in-memory PGLite engine.
*
* Why no mocks: `mock.module` in bun is process-global and leaks
* across test files (a stub of ../src/commands/orphans.ts breaks
* every test that imports shouldExclude/deriveDomain/formatOrphansText).
* Testing against real calls is honest and mock-leak-free.
*
* What this test file does NOT cover: the exhaustive dryRun × phases ×
* lock matrix, which test/core/cycle.test.ts handles (in isolation).
* Here we only verify that dream.ts routes args correctly.
*/
import { describe, test, expect, beforeEach, afterEach, spyOn } from 'bun:test';
import { mkdtempSync, rmSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';
import { execSync } from 'child_process';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { runDream } from '../src/commands/dream.ts';
// ─── Helpers ───────────────────────────────────────────────────────
/** Make an empty in-memory PGLite engine with the schema applied. */
async function makePGLite() {
const engine = new PGLiteEngine();
await engine.connect({});
await engine.initSchema();
return engine;
}
/** Make an empty git repo. Lint/backlinks have nothing to scan → status=clean. */
function makeGitRepo(): string {
const dir = mkdtempSync(join(tmpdir(), 'gbrain-dream-repo-'));
execSync('git init', { cwd: dir, stdio: 'pipe' });
execSync('git config user.email t@t.co', { cwd: dir, stdio: 'pipe' });
execSync('git config user.name t', { cwd: dir, stdio: 'pipe' });
// Commit an empty .gitkeep so rev-parse HEAD succeeds.
require('fs').writeFileSync(join(dir, '.gitkeep'), '');
execSync('git add -A && git commit -m init', { cwd: dir, stdio: 'pipe' });
return dir;
}
// ─── brainDir resolution ───────────────────────────────────────────
describe('runDream — brainDir resolution', () => {
let repo: string;
let engine: InstanceType<typeof PGLiteEngine>;
beforeEach(async () => {
repo = makeGitRepo();
engine = await makePGLite();
});
afterEach(async () => {
await engine.disconnect();
rmSync(repo, { recursive: true, force: true });
});
test('explicit --dir takes precedence over engine config', async () => {
await engine.setConfig('sync.repo_path', '/configured/dir');
const report = await runDream(engine, ['--dir', repo, '--json']);
expect(report).toBeTruthy();
if (report) expect(report.brain_dir).toBe(repo);
});
test('no --dir + engine-configured: uses engine.getConfig("sync.repo_path")', async () => {
await engine.setConfig('sync.repo_path', repo);
const report = await runDream(engine, ['--json']);
expect(report).toBeTruthy();
if (report) expect(report.brain_dir).toBe(repo);
});
test('no --dir + engine=null exits 1', async () => {
const spy = spyOn(process, 'exit').mockImplementation(() => { throw new Error('EXIT'); });
const errSpy = spyOn(console, 'error').mockImplementation(() => {});
try {
await runDream(null, []);
} catch (e: any) {
expect(e.message).toBe('EXIT');
}
expect(spy).toHaveBeenCalledWith(1);
spy.mockRestore();
errSpy.mockRestore();
});
test('--dir pointing at nonexistent path exits 1', async () => {
const spy = spyOn(process, 'exit').mockImplementation(() => { throw new Error('EXIT'); });
const errSpy = spyOn(console, 'error').mockImplementation(() => {});
try {
await runDream(null, ['--dir', '/does/not/exist/hopefully']);
} catch (e: any) {
expect(e.message).toBe('EXIT');
}
expect(spy).toHaveBeenCalledWith(1);
spy.mockRestore();
errSpy.mockRestore();
});
});
// ─── Phase selection (single-phase runs stay fast) ─────────────────
describe('runDream — --phase <name> restricts the cycle', () => {
let repo: string;
let engine: InstanceType<typeof PGLiteEngine>;
beforeEach(async () => {
repo = makeGitRepo();
engine = await makePGLite();
});
afterEach(async () => {
await engine.disconnect();
rmSync(repo, { recursive: true, force: true });
});
test('--phase lint produces a report with exactly one phase = lint', async () => {
const report = await runDream(engine, ['--dir', repo, '--phase', 'lint', '--json']);
expect(report).toBeTruthy();
if (report) {
expect(report.phases.length).toBe(1);
expect(report.phases[0].phase).toBe('lint');
}
});
test('--phase orphans produces a report with exactly one phase = orphans', async () => {
const report = await runDream(engine, ['--dir', repo, '--phase', 'orphans', '--json']);
expect(report).toBeTruthy();
if (report) {
expect(report.phases.length).toBe(1);
expect(report.phases[0].phase).toBe('orphans');
}
});
test('--phase garbage exits 1 with an error message', async () => {
const spy = spyOn(process, 'exit').mockImplementation(() => { throw new Error('EXIT'); });
const errSpy = spyOn(console, 'error').mockImplementation(() => {});
try {
await runDream(null, ['--dir', repo, '--phase', 'garbage']);
} catch (e: any) {
expect(e.message).toBe('EXIT');
}
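// The test name promises both an exit code of 1 and an error message.
expect(spy).toHaveBeenCalledWith(1);
expect(errSpy).toHaveBeenCalled();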
spy.mockRestore();
errSpy.mockRestore();
});
});
// ─── Output format ─────────────────────────────────────────────────
describe('runDream — output format', () => {
let repo: string;
let engine: InstanceType<typeof PGLiteEngine>;
beforeEach(async () => {
repo = makeGitRepo();
engine = await makePGLite();
});
afterEach(async () => {
await engine.disconnect();
rmSync(repo, { recursive: true, force: true });
});
test('--json emits parsable CycleReport JSON with schema_version', async () => {
const lines: string[] = [];
const logSpy = spyOn(console, 'log').mockImplementation((msg: string) => { lines.push(String(msg)); });
await runDream(engine, ['--dir', repo, '--phase', 'lint', '--json']);
logSpy.mockRestore();
const parsed = JSON.parse(lines.join('\n'));
expect(parsed.schema_version).toBe('1');
expect(parsed).toHaveProperty('status');
expect(parsed).toHaveProperty('phases');
expect(parsed).toHaveProperty('totals');
});
test('human output for clean status mentions "Brain is healthy"', async () => {
const lines: string[] = [];
const logSpy = spyOn(console, 'log').mockImplementation((msg: string) => { lines.push(String(msg)); });
// Single-phase lint run on a clean repo → status=clean.
await runDream(engine, ['--dir', repo, '--phase', 'lint']);
logSpy.mockRestore();
expect(lines.join('\n')).toContain('Brain is healthy');
});
});
// ─── Dry-run propagation ───────────────────────────────────────────
describe('runDream — dry-run propagates through to runCycle', () => {
let repo: string;
let engine: InstanceType<typeof PGLiteEngine>;
beforeEach(async () => {
repo = makeGitRepo();
engine = await makePGLite();
});
afterEach(async () => {
await engine.disconnect();
rmSync(repo, { recursive: true, force: true });
});
test('--dry-run produces a report where no DB-mutating work happened', async () => {
// Before: empty pages table.
const { rows: before } = await (engine as any).db.query('SELECT COUNT(*)::int AS n FROM pages');
expect(before[0].n).toBe(0);
await runDream(engine, ['--dir', repo, '--dry-run', '--json']);
// After dry-run: still 0 pages. The cycle ran but wrote nothing.
const { rows: after } = await (engine as any).db.query('SELECT COUNT(*)::int AS n FROM pages');
expect(after[0].n).toBe(0);
});
});
// ─── Exit-code semantics ───────────────────────────────────────────
describe('runDream — exit-code semantics', () => {
let repo: string;
let engine: InstanceType<typeof PGLiteEngine>;
beforeEach(async () => {
repo = makeGitRepo();
engine = await makePGLite();
});
afterEach(async () => {
await engine.disconnect();
rmSync(repo, { recursive: true, force: true });
});
test('clean/ok/partial statuses do not call process.exit', async () => {
const spy = spyOn(process, 'exit').mockImplementation(() => { throw new Error('UNEXPECTED_EXIT'); });
try {
await runDream(engine, ['--dir', repo, '--phase', 'lint', '--json']);
expect(spy).not.toHaveBeenCalled();
} finally {
// Always restore, so a failure here cannot leak the mocked process.exit into later tests.
spy.mockRestore();
}
});
});

221
test/e2e/cycle.test.ts Normal file

@@ -0,0 +1,221 @@
/**
* E2E cycle tests — Tier 1 (no API keys required).
*
* Exercises runCycle against REAL Postgres (via the E2E helpers' setupDB /
* teardownDB lifecycle) with a real git repo and a mocked embedBatch.
* Covers what the unit tests can't: the gbrain_cycle_locks table's
* INSERT...ON CONFLICT...WHERE semantics under a real postgres-js client,
* the v0.17 schema migration applying cleanly to a fresh Postgres, and the
* dry-run regression guard asserting zero writes when flag is set.
*
* Run: DATABASE_URL=... bun test test/e2e/cycle.test.ts
*/
import { describe, test, expect, mock, beforeAll, afterAll } from 'bun:test';
import { mkdtempSync, writeFileSync, rmSync, mkdirSync } from 'fs';
import { join } from 'path';
import { execSync } from 'child_process';
import { tmpdir } from 'os';
import { hasDatabase, setupDB, teardownDB, getEngine, getConn } from './helpers.ts';
// Mock embedBatch BEFORE importing runCycle so no real OpenAI calls happen
// even when the full cycle's embed phase runs.
mock.module('../../src/core/embedding.ts', () => ({
embedBatch: async (texts: string[]) => {
// Deterministic fake vector for each chunk.
return texts.map(() => new Float32Array(1536));
},
}));
const { runCycle } = await import('../../src/core/cycle.ts');
const skip = !hasDatabase();
const describeE2E = skip ? describe.skip : describe;
if (skip) {
console.log('Skipping E2E cycle tests (DATABASE_URL not set)');
}
function makeGitRepo(): string {
const dir = mkdtempSync(join(tmpdir(), 'gbrain-e2e-cycle-'));
execSync('git init', { cwd: dir, stdio: 'pipe' });
execSync('git config user.email test@test.co', { cwd: dir, stdio: 'pipe' });
execSync('git config user.name test', { cwd: dir, stdio: 'pipe' });
mkdirSync(join(dir, 'people'), { recursive: true });
writeFileSync(
join(dir, 'people/alice.md'),
'---\ntype: person\ntitle: Alice\n---\n\nAlice collaborates with Bob.\n',
);
writeFileSync(
join(dir, 'people/bob.md'),
'---\ntype: person\ntitle: Bob\n---\n\nBob is a person.\n',
);
execSync('git add -A && git commit -m init', { cwd: dir, stdio: 'pipe' });
return dir;
}
describeE2E('E2E: runCycle against real Postgres', () => {
let repo: string;
beforeAll(async () => {
await setupDB();
repo = makeGitRepo();
});
afterAll(async () => {
await teardownDB();
if (repo) rmSync(repo, { recursive: true, force: true });
});
test('v0.17 migration v16 created gbrain_cycle_locks table', async () => {
const conn = getConn();
const rows = await conn.unsafe(
`SELECT tablename FROM pg_tables WHERE tablename = 'gbrain_cycle_locks'`,
);
expect(rows.length).toBe(1);
// idx_cycle_locks_ttl index also exists.
const idx = await conn.unsafe(
`SELECT indexname FROM pg_indexes WHERE indexname = 'idx_cycle_locks_ttl'`,
);
expect(idx.length).toBe(1);
});
test('dry-run full cycle: zero DB writes + zero filesystem changes', async () => {
const conn = getConn();
// Baseline: track initial state.
const beforePages = await conn.unsafe(`SELECT count(*)::int AS n FROM pages`);
const beforeSync = await conn.unsafe(
`SELECT value FROM config WHERE key = 'sync.last_commit'`,
);
const report = await runCycle(getEngine(), {
brainDir: repo,
dryRun: true,
pull: false,
});
expect(report.schema_version).toBe('1');
// Cycle ran all 6 phases (or skipped the ones that don't support dry-run).
expect(report.phases.length).toBe(6);
// Nothing got written.
const afterPages = await conn.unsafe(`SELECT count(*)::int AS n FROM pages`);
expect(afterPages[0].n).toBe(beforePages[0].n);
// sync.last_commit unchanged (wasn't set before, isn't set now).
const afterSync = await conn.unsafe(
`SELECT value FROM config WHERE key = 'sync.last_commit'`,
);
expect(afterSync.length).toBe(beforeSync.length);
// Cycle lock was acquired + released; table should be empty after.
const locks = await conn.unsafe(`SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks`);
expect(locks[0].n).toBe(0);
});
test('live cycle: pages get synced + chunks created + cycle lock cleaned up', async () => {
const conn = getConn();
const report = await runCycle(getEngine(), {
brainDir: repo,
dryRun: false,
pull: false,
});
expect(report.schema_version).toBe('1');
// The sync phase should have run and imported real pages.
const syncPhase = report.phases.find(p => p.phase === 'sync');
expect(syncPhase).toBeDefined();
expect(syncPhase?.status).not.toBe('fail');
// Pages exist in the DB.
const pages = await conn.unsafe(`SELECT slug FROM pages ORDER BY slug`);
const slugs = (pages as unknown as Array<{ slug: string }>).map(p => p.slug);
expect(slugs).toContain('people/alice');
expect(slugs).toContain('people/bob');
// sync.last_commit bookmark is now set.
const sync = await conn.unsafe(
`SELECT value FROM config WHERE key = 'sync.last_commit'`,
);
expect(sync.length).toBe(1);
expect((sync[0] as any).value.length).toBeGreaterThanOrEqual(7);
// Cycle lock is released.
const locks = await conn.unsafe(`SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks`);
expect(locks[0].n).toBe(0);
}, 60_000);
test('concurrent cycle is blocked by the lock (status:skipped)', async () => {
const conn = getConn();
// Seed a fresh-TTL lock held by a different (fake) PID.
await conn.unsafe(
`INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES ('gbrain-cycle', 99999, 'other-host', NOW(), NOW() + INTERVAL '1 hour')`,
);
try {
const report = await runCycle(getEngine(), {
brainDir: repo,
dryRun: true,
pull: false,
});
expect(report.status).toBe('skipped');
expect(report.reason).toBe('cycle_already_running');
expect(report.phases.length).toBe(0);
} finally {
// Clean up the seeded lock.
await conn.unsafe(`DELETE FROM gbrain_cycle_locks WHERE id = 'gbrain-cycle'`);
}
});
test('TTL-expired lock is auto-claimed (crashed holder recovery)', async () => {
const conn = getConn();
// Seed a stale lock (TTL in the past).
await conn.unsafe(
`INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES ('gbrain-cycle', 99999, 'crashed-host', NOW() - INTERVAL '2 hours', NOW() - INTERVAL '1 hour')`,
);
const report = await runCycle(getEngine(), {
brainDir: repo,
dryRun: true,
pull: false,
});
// Crashed holder's stale TTL lets the new run acquire the lock.
expect(report.status).not.toBe('skipped');
// Lock released after the run.
const locks = await conn.unsafe(`SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks`);
expect(locks[0].n).toBe(0);
});
test('--phase orphans skips the lock entirely (read-only optimization)', async () => {
const conn = getConn();
// Seed a fresh-TTL lock held by someone else. A read-only phase
// selection should succeed anyway (orphans never acquires the lock).
await conn.unsafe(
`INSERT INTO gbrain_cycle_locks (id, holder_pid, holder_host, acquired_at, ttl_expires_at)
VALUES ('gbrain-cycle', 99999, 'other-host', NOW(), NOW() + INTERVAL '1 hour')`,
);
try {
const report = await runCycle(getEngine(), {
brainDir: repo,
phases: ['orphans'],
pull: false,
});
// Status is NOT skipped — orphans ran despite the held lock.
expect(report.status).not.toBe('skipped');
const orphansPhase = report.phases.find(p => p.phase === 'orphans');
expect(orphansPhase).toBeDefined();
} finally {
await conn.unsafe(`DELETE FROM gbrain_cycle_locks WHERE id = 'gbrain-cycle'`);
}
});
});
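
The lock tests above seed rows directly and rely on the INSERT...ON CONFLICT...WHERE upsert to decide who wins. As a rough sketch of that decision, assuming the WHERE clause only lets the upsert overwrite a row whose ttl_expires_at is already in the past (the actual SQL lives in cycle.ts and is not shown in this diff), an in-memory model might look like the following. CycleLockTable, tryAcquire and release are hypothetical names used only for illustration.

```typescript
// Hypothetical in-memory model of the cycle lock; NOT the real cycle.ts code.
interface LockHolder {
  pid: number;
  host: string;
}

interface LockRow {
  holder: LockHolder;
  ttlExpiresAt: number; // epoch milliseconds
}

class CycleLockTable {
  private rows = new Map<string, LockRow>();

  // Assumed semantics: the insert succeeds when no row exists, and the
  // conflicting update is applied only when the existing row's TTL has lapsed.
  tryAcquire(id: string, holder: LockHolder, ttlMs: number, now: number): boolean {
    const existing = this.rows.get(id);
    if (existing !== undefined && existing.ttlExpiresAt >= now) {
      return false; // a live holder keeps the lock; the caller reports "skipped"
    }
    this.rows.set(id, { holder, ttlExpiresAt: now + ttlMs });
    return true;
  }

  // Only the current holder may release; returns whether a row was deleted.
  release(id: string, holder: LockHolder): boolean {
    const existing = this.rows.get(id);
    if (existing === undefined) return false;
    if (existing.holder.pid !== holder.pid || existing.holder.host !== holder.host) return false;
    return this.rows.delete(id);
  }

  size(): number {
    return this.rows.size;
  }
}
```

Under this model, the "concurrent cycle is blocked" test corresponds to tryAcquire returning false against a fresh row, and the "TTL-expired lock is auto-claimed" test corresponds to it returning true against a stale one.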

139
test/e2e/dream.test.ts Normal file

@@ -0,0 +1,139 @@
/**
* E2E dream tests — Tier 1 (no API keys required).
*
* Drives the dream CLI entry point through a real Postgres engine with
* a real git repo. Complements test/dream.test.ts (which exercises the
* code paths via the library call) by testing the actual CLI output
* shape and exit-code semantics against real DB state.
*
* Run: DATABASE_URL=... bun test test/e2e/dream.test.ts
*/
import { describe, test, expect, mock, beforeAll, afterAll } from 'bun:test';
import { mkdtempSync, writeFileSync, rmSync, mkdirSync } from 'fs';
import { join } from 'path';
import { execSync } from 'child_process';
import { tmpdir } from 'os';
import { hasDatabase, setupDB, teardownDB, getEngine, getConn } from './helpers.ts';
// Mock embedBatch so embed phase doesn't call OpenAI.
mock.module('../../src/core/embedding.ts', () => ({
embedBatch: async (texts: string[]) => texts.map(() => new Float32Array(1536)),
}));
const { runDream } = await import('../../src/commands/dream.ts');
const skip = !hasDatabase();
const describeE2E = skip ? describe.skip : describe;
if (skip) {
console.log('Skipping E2E dream tests (DATABASE_URL not set)');
}
function makeGitRepo(): string {
const dir = mkdtempSync(join(tmpdir(), 'gbrain-e2e-dream-'));
execSync('git init', { cwd: dir, stdio: 'pipe' });
execSync('git config user.email test@test.co', { cwd: dir, stdio: 'pipe' });
execSync('git config user.name test', { cwd: dir, stdio: 'pipe' });
mkdirSync(join(dir, 'concepts'), { recursive: true });
writeFileSync(
join(dir, 'concepts/testing.md'),
'---\ntype: concept\ntitle: Testing Philosophy\n---\n\nEvery untested path is a path where bugs hide.\n',
);
execSync('git add -A && git commit -m init', { cwd: dir, stdio: 'pipe' });
return dir;
}
// Captures console.log output while fn runs; console.log is restored even if fn rejects.
async function captureLog<T>(fn: () => Promise<T>): Promise<{ result: T; output: string }> {
const lines: string[] = [];
const origLog = console.log;
console.log = (...args: unknown[]) => { lines.push(args.map(String).join(' ')); };
try {
const result = await fn();
return { result, output: lines.join('\n') };
} finally {
console.log = origLog;
}
}
describeE2E('E2E: gbrain dream CLI against real Postgres', () => {
let repo: string;
beforeAll(async () => {
await setupDB();
repo = makeGitRepo();
});
afterAll(async () => {
await teardownDB();
if (repo) rmSync(repo, { recursive: true, force: true });
});
test('dream --dry-run --json emits a valid CycleReport + DB stays empty', async () => {
const conn = getConn();
const beforePages = await conn.unsafe(`SELECT count(*)::int AS n FROM pages`);
const { output } = await captureLog(() =>
runDream(getEngine(), ['--dir', repo, '--dry-run', '--json']),
);
// dream prints a CycleReport as pretty-printed JSON. It may be
// preceded by inline phase-runner log lines (e.g. sync's
// "Full-sync dry run: N files"). Extract the JSON object.
const jsonStart = output.indexOf('{');
expect(jsonStart).toBeGreaterThanOrEqual(0);
const parsed = JSON.parse(output.slice(jsonStart));
expect(parsed.schema_version).toBe('1');
expect(parsed).toHaveProperty('status');
expect(parsed).toHaveProperty('phases');
expect(parsed).toHaveProperty('totals');
expect(parsed.brain_dir).toBe(repo);
// No pages were written.
const afterPages = await conn.unsafe(`SELECT count(*)::int AS n FROM pages`);
expect(afterPages[0].n).toBe(beforePages[0].n);
});
test('dream (no --dry-run) syncs pages into the real DB', async () => {
const conn = getConn();
await captureLog(() => runDream(getEngine(), ['--dir', repo, '--json']));
const pages = await conn.unsafe(`SELECT slug FROM pages ORDER BY slug`);
const slugs = (pages as unknown as Array<{ slug: string }>).map(p => p.slug);
expect(slugs).toContain('concepts/testing');
// sync.last_commit bookmark set.
const sync = await conn.unsafe(
`SELECT value FROM config WHERE key = 'sync.last_commit'`,
);
expect(sync.length).toBe(1);
}, 60_000);
test('dream --phase orphans only reports orphans + no cycle-lock footprint', async () => {
const conn = getConn();
const before = await conn.unsafe(
`SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks`,
);
const { result } = await captureLog(() =>
runDream(getEngine(), ['--dir', repo, '--phase', 'orphans', '--json']),
);
expect(result).toBeTruthy();
if (result) {
expect(result.phases.length).toBe(1);
expect(result.phases[0].phase).toBe('orphans');
}
const after = await conn.unsafe(
`SELECT COUNT(*)::int AS n FROM gbrain_cycle_locks`,
);
// Read-only phase selection doesn't touch the lock table.
expect(after[0].n).toBe(before[0].n);
});
});
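
The --json test above finds the report with output.indexOf('{'), which would pick the wrong offset if an earlier log line ever contained a brace. A slightly more defensive variant is sketched below. It assumes the report is pretty-printed so that its first line is exactly "{" and that nothing is logged after it; parseReportFromOutput is a hypothetical helper, not part of gbrain.

```typescript
// Hypothetical helper for tests; assumes pretty-printed JSON is the last thing logged.
function parseReportFromOutput(output: string): unknown {
  const lines = output.split('\n');
  // A pretty-printed, non-empty object opens on a line containing only "{".
  const start = lines.findIndex(line => line.trim() === '{');
  if (start < 0) {
    throw new Error('no JSON object found in output');
  }
  return JSON.parse(lines.slice(start).join('\n'));
}

// Usage: a log line containing a brace precedes the report.
const sampleOutput = [
  'Full-sync dry run: 1 files {estimate}',
  JSON.stringify({ schema_version: '1', phases: [] }, null, 2),
].join('\n');
const sampleReport = parseReportFromOutput(sampleOutput) as { schema_version: string };
console.log(sampleReport.schema_version);
```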


@@ -126,3 +126,130 @@ describe('runEmbed --all (parallel)', () => {
expect(totalEmbedCalls).toBe(1);
});
});
// ────────────────────────────────────────────────────────────────
// runEmbedCore dry-run mode (v0.17 regression guard)
// ────────────────────────────────────────────────────────────────
describe('runEmbedCore --dry-run never calls the embedding model', () => {
test('dry-run --all with stale chunks: no embedBatch calls, accurate would_embed', async () => {
const { runEmbedCore } = await import('../src/commands/embed.ts');
const pages = Array.from({ length: 3 }, (_, i) => ({ slug: `page-${i}` }));
// All 3 pages have 2 stale chunks each (none embedded).
const chunksBySlug = new Map<string, any[]>(
pages.map(p => [
p.slug,
[
{ chunk_index: 0, chunk_text: 'a', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
{ chunk_index: 1, chunk_text: 'b', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
],
]),
);
const upserts: string[] = [];
const engine = mockEngine({
listPages: async () => pages,
getChunks: async (slug: string) => chunksBySlug.get(slug) || [],
upsertChunks: async (slug: string) => { upserts.push(slug); },
});
const result = await runEmbedCore(engine, { stale: true, dryRun: true });
// No OpenAI calls.
expect(totalEmbedCalls).toBe(0);
// No DB writes.
expect(upserts).toEqual([]);
// Accurate counts.
expect(result.dryRun).toBe(true);
expect(result.embedded).toBe(0);
expect(result.would_embed).toBe(6); // 3 pages * 2 chunks each
expect(result.skipped).toBe(0);
expect(result.total_chunks).toBe(6);
expect(result.pages_processed).toBe(3);
});
test('dry-run --stale correctly separates stale from already-embedded', async () => {
const { runEmbedCore } = await import('../src/commands/embed.ts');
const pages = [{ slug: 'fresh' }, { slug: 'partial' }, { slug: 'all-stale' }];
const chunksBySlug = new Map<string, any[]>([
['fresh', [
{ chunk_index: 0, chunk_text: 'a', chunk_source: 'compiled_truth', embedded_at: '2026-01-01', token_count: 1 },
{ chunk_index: 1, chunk_text: 'b', chunk_source: 'compiled_truth', embedded_at: '2026-01-01', token_count: 1 },
]],
['partial', [
{ chunk_index: 0, chunk_text: 'a', chunk_source: 'compiled_truth', embedded_at: '2026-01-01', token_count: 1 },
{ chunk_index: 1, chunk_text: 'b', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
]],
['all-stale', [
{ chunk_index: 0, chunk_text: 'a', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
{ chunk_index: 1, chunk_text: 'b', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
]],
]);
const engine = mockEngine({
listPages: async () => pages,
getChunks: async (slug: string) => chunksBySlug.get(slug) || [],
upsertChunks: async () => {},
});
const result = await runEmbedCore(engine, { stale: true, dryRun: true });
expect(totalEmbedCalls).toBe(0);
expect(result.dryRun).toBe(true);
expect(result.would_embed).toBe(3); // 1 from 'partial' + 2 from 'all-stale'
expect(result.skipped).toBe(3); // 2 from 'fresh' + 1 from 'partial'
expect(result.total_chunks).toBe(6);
expect(result.pages_processed).toBe(3);
});
test('dry-run --slugs on a single page counts stale chunks, no API calls', async () => {
const { runEmbedCore } = await import('../src/commands/embed.ts');
const chunks = [
{ chunk_index: 0, chunk_text: 'a', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
{ chunk_index: 1, chunk_text: 'b', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
{ chunk_index: 2, chunk_text: 'c', chunk_source: 'compiled_truth', embedded_at: '2026-01-01', token_count: 1 },
];
const engine = mockEngine({
getPage: async () => ({ slug: 'my-page', compiled_truth: 'text', timeline: '' }),
getChunks: async () => chunks,
upsertChunks: async () => {},
});
const result = await runEmbedCore(engine, { slugs: ['my-page'], dryRun: true });
expect(totalEmbedCalls).toBe(0);
expect(result.dryRun).toBe(true);
expect(result.would_embed).toBe(2);
expect(result.skipped).toBe(1);
expect(result.total_chunks).toBe(3);
expect(result.pages_processed).toBe(1);
});
test('non-dry-run path reports accurate embedded count (regression guard)', async () => {
const { runEmbedCore } = await import('../src/commands/embed.ts');
const pages = [{ slug: 'a' }, { slug: 'b' }];
const chunksBySlug = new Map<string, any[]>([
['a', [{ chunk_index: 0, chunk_text: 'a', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 }]],
['b', [
{ chunk_index: 0, chunk_text: 'x', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
{ chunk_index: 1, chunk_text: 'y', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 },
]],
]);
const engine = mockEngine({
listPages: async () => pages,
getChunks: async (slug: string) => chunksBySlug.get(slug) || [],
upsertChunks: async () => {},
});
process.env.GBRAIN_EMBED_CONCURRENCY = '2';
const result = await runEmbedCore(engine, { stale: true });
expect(result.dryRun).toBe(false);
expect(result.embedded).toBe(3); // 1 from a + 2 from b
expect(result.would_embed).toBe(0);
expect(result.pages_processed).toBe(2);
});
});
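
The dry-run assertions above all follow one counting rule: a chunk with embedded_at === null counts toward would_embed, any other chunk counts toward skipped. A minimal sketch of that rule, assuming it is the whole of the classification (the real runEmbedCore is not shown in this diff; countDryRunEmbeds and both interfaces are hypothetical), could be:

```typescript
// Hypothetical model of the dry-run counting rule exercised by the tests above.
interface ChunkRow {
  chunk_index: number;
  embedded_at: string | null;
}

interface DryRunCounts {
  would_embed: number;
  skipped: number;
  total_chunks: number;
  pages_processed: number;
}

function countDryRunEmbeds(chunksBySlug: Map<string, ChunkRow[]>): DryRunCounts {
  const counts: DryRunCounts = { would_embed: 0, skipped: 0, total_chunks: 0, pages_processed: 0 };
  chunksBySlug.forEach(chunks => {
    counts.pages_processed += 1;
    for (const chunk of chunks) {
      counts.total_chunks += 1;
      if (chunk.embedded_at === null) {
        counts.would_embed += 1; // stale: would be sent to the embedding model
      } else {
        counts.skipped += 1; // already embedded: left alone
      }
    }
  });
  return counts;
}
```

Feeding it the fresh / partial / all-stale fixture from the second test gives the same 3 / 3 / 6 / 3 split that the test asserts.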


@@ -3,9 +3,10 @@
*
* Covers:
* - Every expected handler name is registered.
- * - autopilot-cycle handler returns { partial: true, failed_steps: [...] }
- * when any step throws — does NOT throw itself (critical for preventing
- * intermittent extract bugs from blocking every future cycle via retry).
+ * - autopilot-cycle handler returns { partial, status, report } (v0.17
+ * runCycle-backed shape) when any step fails — does NOT throw itself
+ * (critical invariant: an intermittent phase failure must not cause
+ * the Minion to retry and block every future cycle).
*/
import { describe, test, expect, beforeAll, afterAll, mock } from 'bun:test';
@@ -22,7 +23,7 @@ beforeAll(async () => {
await engine.initSchema();
worker = new MinionWorker(engine, { queue: 'test' });
await registerBuiltinHandlers(worker, engine);
- });
+ }, 30_000);
afterAll(async () => {
await engine.disconnect();
@@ -48,15 +49,16 @@ describe('registerBuiltinHandlers', () => {
});
describe('autopilot-cycle handler — partial failure does NOT throw', () => {
- test('step failure returns partial:true + failed_steps, no throw', async () => {
- // Call the handler directly with a context that points at a nonexistent
- // repo. Every step will fail (sync throws on missing .git, extract
- // throws on missing dir, embed tries to list pages which is fine against
- // the test engine, backlinks throws on missing dir). The handler should
- // STILL return successfully — never throw.
+ test('phase failure returns partial:true + structured report, no throw', async () => {
+ // Call the handler directly with a job pointing at a nonexistent repo.
+ // Filesystem-dependent phases (lint, backlinks, sync) all fail because
+ // the dir / .git repo isn't there. DB-dependent phases (extract,
+ // embed, orphans) run fine against the in-memory test engine.
//
- // This is the critical invariant: an intermittent bug in one step must
- // not cause the Minion to retry + block every future cycle.
+ // CRITICAL INVARIANT: the handler must return successfully even when
+ // phases fail. Throwing would cause the Minion to retry, blocking
+ // every future cycle on an intermittent bug. v0.17 moves this
+ // guarantee into runCycle itself (per-phase try/catch in cycle.ts).
const handler = (worker as any).handlers.get('autopilot-cycle');
expect(handler).toBeDefined();
@@ -68,24 +70,34 @@ describe('autopilot-cycle handler — partial failure does NOT throw', () => {
expect(result).toBeDefined();
expect((result as any).partial).toBe(true);
- expect(Array.isArray((result as any).failed_steps)).toBe(true);
- // sync + extract + backlinks all fail on missing repo (embed operates
- // on the DB directly and doesn't touch the repo path, so it doesn't fail).
- expect((result as any).failed_steps).toContain('sync');
- expect((result as any).failed_steps).toContain('extract');
- expect((result as any).failed_steps).toContain('backlinks');
+ // v0.17 shape: { partial, status, report }. The report's phases array
+ // replaces the old failed_steps list.
+ expect(['partial', 'failed']).toContain((result as any).status);
+ const report = (result as any).report;
+ expect(report).toBeDefined();
+ expect(report.schema_version).toBe('1');
+ expect(Array.isArray(report.phases)).toBe(true);
+ // The filesystem-dependent phases should have failed on a missing dir.
+ const failedPhases = report.phases
+ .filter((p: any) => p.status === 'fail')
+ .map((p: any) => p.phase);
+ expect(failedPhases).toContain('lint');
+ expect(failedPhases).toContain('backlinks');
+ expect(failedPhases).toContain('sync');
});
- test('all steps succeed → partial:false', async () => {
- // Smoke: invoke against a real (if empty) brain dir. If every step
- // completes, partial is false.
+ test('all phases succeed → result has structured report (smoke)', async () => {
+ // Smoke: invoke against a real (if empty) git repo. If every phase
+ // completes (or gracefully skips), the handler returns a result
+ // object with the full runCycle report. Some phases may still warn
+ // (empty repo has nothing to lint/sync) — the invariant is that the
+ // handler never throws.
const fs = await import('fs');
const { execSync } = await import('child_process');
const { tmpdir } = await import('os');
const { join } = await import('path');
const dir = fs.mkdtempSync(join(tmpdir(), 'gbrain-autopilot-cycle-'));
try {
// Initialize as a git repo so sync doesn't fail on .git lookup.
execSync('git init', { cwd: dir, stdio: 'pipe' });
execSync('git config user.email test@example.com', { cwd: dir, stdio: 'pipe' });
execSync('git config user.name Test', { cwd: dir, stdio: 'pipe' });
@@ -97,11 +109,12 @@ describe('autopilot-cycle handler — partial failure does NOT throw', () => {
signal: { aborted: false } as any,
job: { id: 2, name: 'autopilot-cycle' } as any,
});
- // Empty repo: some steps may still fail (backlinks needs .md files)
- // but the handler MUST return a result object, never throw.
+ // The handler MUST return a result object, never throw, regardless
+ // of individual phase outcomes.
expect(result).toBeDefined();
expect(typeof (result as any).partial).toBe('boolean');
- expect('steps' in (result as any)).toBe(true);
+ expect('report' in (result as any)).toBe(true);
+ expect((result as any).report.schema_version).toBe('1');
} finally {
fs.rmSync(dir, { recursive: true, force: true });
}


@@ -1,11 +1,14 @@
- import { describe, test, expect } from 'bun:test';
+ import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import {
shouldExclude,
deriveDomain,
formatOrphansText,
findOrphans,
queryOrphanPages,
type OrphanPage,
type OrphanResult,
} from '../src/commands/orphans.ts';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
// --- shouldExclude ---
@@ -201,3 +204,89 @@ describe('formatOrphansText', () => {
expect(out).toContain('2 orphans out of 100 linkable pages (120 total; 20 excluded)');
});
});
// ────────────────────────────────────────────────────────────────
// findOrphans + queryOrphanPages with explicit engine (v0.17 change)
// ────────────────────────────────────────────────────────────────
describe('findOrphans (engine-injected)', () => {
let engine: PGLiteEngine;
beforeEach(async () => {
engine = new PGLiteEngine();
await engine.connect({});
await engine.initSchema();
});
afterEach(async () => {
await engine.disconnect();
});
test('returns pages with no inbound links, excluding pseudo-pages', async () => {
// Build a tiny brain: alice links to bob. alice is an orphan (nothing
// points to her), bob is not (alice points to him). _atlas is a pseudo
// page that should be excluded by default.
await engine.putPage('people/alice', {
type: 'person',
title: 'Alice',
compiled_truth: 'Alice works with Bob.',
timeline: '',
});
await engine.putPage('people/bob', {
type: 'person',
title: 'Bob',
compiled_truth: 'Bob.',
timeline: '',
});
await engine.putPage('_atlas', {
type: 'concept',
title: 'Atlas',
compiled_truth: 'pseudo-page',
timeline: '',
});
// Create the link alice -> bob.
await engine.addLink('people/alice', 'people/bob', 'mentioned', 'references', 'markdown');
const result = await findOrphans(engine);
const slugs = result.orphans.map(o => o.slug).sort();
expect(slugs).toEqual(['people/alice']); // _atlas excluded by default; bob has a backlink
expect(result.total_orphans).toBe(1);
expect(result.total_pages).toBe(3);
expect(result.excluded).toBeGreaterThanOrEqual(1); // _atlas was filtered
});
test('includePseudo: true surfaces pseudo-pages too', async () => {
await engine.putPage('_atlas', {
type: 'concept',
title: 'Atlas',
compiled_truth: 'pseudo',
timeline: '',
});
const result = await findOrphans(engine, { includePseudo: true });
const slugs = result.orphans.map(o => o.slug).sort();
expect(slugs).toContain('_atlas');
});
test('queryOrphanPages delegates to the passed engine (no global db)', async () => {
await engine.putPage('topic/standalone', {
type: 'concept',
title: 'Standalone',
compiled_truth: 'no inbound links',
timeline: '',
});
const rows = await queryOrphanPages(engine);
const slugs = rows.map(r => r.slug);
expect(slugs).toContain('topic/standalone');
});
test('zero pages: empty result (no crash on empty brain)', async () => {
const result = await findOrphans(engine);
expect(result.orphans).toEqual([]);
expect(result.total_orphans).toBe(0);
expect(result.total_pages).toBe(0);
});
});
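
The findOrphans tests above boil down to one rule: a page is an orphan when no link targets it, and pseudo-pages are dropped unless includePseudo is set. The sketch below models that rule over plain arrays. It assumes a pseudo-page is any slug starting with "_" (the only case the tests show; the real shouldExclude may cover more), and findOrphansInMemory with its types is hypothetical rather than the orphans.ts implementation.

```typescript
// Hypothetical in-memory model of the orphan rule; not the orphans.ts implementation.
interface PageLink {
  from: string;
  to: string;
}

interface OrphanSummary {
  orphans: string[];
  total_pages: number;
  excluded: number;
}

function findOrphansInMemory(
  slugs: string[],
  links: PageLink[],
  includePseudo: boolean = false,
): OrphanSummary {
  // Every slug that appears as a link target has at least one inbound link.
  const linked = new Set(links.map(link => link.to));
  const orphans: string[] = [];
  let excluded = 0;
  for (const slug of slugs) {
    if (linked.has(slug)) continue;
    // Assumption: pseudo-pages are identified by a leading underscore.
    if (!includePseudo && slug.startsWith('_')) {
      excluded += 1;
      continue;
    }
    orphans.push(slug);
  }
  return { orphans: orphans.sort(), total_pages: slugs.length, excluded };
}
```

With the alice -> bob fixture from the first test, this yields alice as the only orphan and counts _atlas as excluded.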


@@ -1,5 +1,10 @@
- import { describe, test, expect } from 'bun:test';
+ import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { buildSyncManifest, isSyncable, pathToSlug } from '../src/core/sync.ts';
import { mkdtempSync, writeFileSync, rmSync, mkdirSync } from 'fs';
import { join } from 'path';
import { execSync } from 'child_process';
import { tmpdir } from 'os';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
describe('buildSyncManifest', () => {
test('parses A/M/D entries from single commit', () => {
@@ -191,6 +196,162 @@ describe('buildSyncManifest edge cases', () => {
});
});
// ────────────────────────────────────────────────────────────────
// performSync dry-run (v0.17 regression guard for full-sync silent writes)
// ────────────────────────────────────────────────────────────────
describe('performSync dry-run never writes', () => {
let engine: PGLiteEngine;
let repoPath: string;
beforeEach(async () => {
engine = new PGLiteEngine();
await engine.connect({});
await engine.initSchema();
repoPath = mkdtempSync(join(tmpdir(), 'gbrain-sync-dryrun-'));
execSync('git init', { cwd: repoPath, stdio: 'pipe' });
execSync('git config user.email "test@test.com"', { cwd: repoPath, stdio: 'pipe' });
execSync('git config user.name "Test"', { cwd: repoPath, stdio: 'pipe' });
mkdirSync(join(repoPath, 'people'), { recursive: true });
writeFileSync(join(repoPath, 'people/alice.md'), [
'---',
'type: person',
'title: Alice',
'---',
'',
'Alice is a person.',
].join('\n'));
writeFileSync(join(repoPath, 'people/bob.md'), [
'---',
'type: person',
'title: Bob',
'---',
'',
'Bob is another person.',
].join('\n'));
execSync('git add -A && git commit -m "initial"', { cwd: repoPath, stdio: 'pipe' });
});
afterEach(async () => {
await engine.disconnect();
if (repoPath) rmSync(repoPath, { recursive: true, force: true });
});
test('first-sync dry-run does NOT write to DB or advance the bookmark', async () => {
const { performSync } = await import('../src/commands/sync.ts');
const result = await performSync(engine, {
repoPath,
dryRun: true,
noPull: true,
noEmbed: true,
});
// Status + counts reflect what WOULD be imported.
expect(result.status).toBe('dry_run');
expect(result.added).toBe(2); // alice + bob, both syncable
expect(result.chunksCreated).toBe(0);
expect(result.embedded).toBe(0);
// DB is clean: no pages written.
expect(await engine.getPage('people/alice')).toBeNull();
expect(await engine.getPage('people/bob')).toBeNull();
// Bookmark NOT set — this is the regression the guard enforces.
expect(await engine.getConfig('sync.last_commit')).toBeNull();
expect(await engine.getConfig('sync.repo_path')).toBeNull();
});
test('incremental dry-run does NOT write to DB or advance the bookmark', async () => {
const { performSync } = await import('../src/commands/sync.ts');
// First do a real sync to seed the bookmark.
const real = await performSync(engine, {
repoPath,
noPull: true,
noEmbed: true,
});
expect(real.status).toBe('first_sync');
const bookmarkAfterReal = await engine.getConfig('sync.last_commit');
expect(bookmarkAfterReal).not.toBeNull();
// Add a third file.
writeFileSync(join(repoPath, 'people/carol.md'), [
'---',
'type: person',
'title: Carol',
'---',
'',
'Carol joins the cast.',
].join('\n'));
execSync('git add -A && git commit -m "add carol"', { cwd: repoPath, stdio: 'pipe' });
// Incremental sync in dry-run mode.
const result = await performSync(engine, {
repoPath,
dryRun: true,
noPull: true,
noEmbed: true,
});
expect(result.status).toBe('dry_run');
expect(result.added).toBe(1); // carol only
expect(result.chunksCreated).toBe(0);
expect(result.embedded).toBe(0);
// carol is NOT in the DB.
expect(await engine.getPage('people/carol')).toBeNull();
// alice + bob still present from the real sync.
expect(await engine.getPage('people/alice')).not.toBeNull();
expect(await engine.getPage('people/bob')).not.toBeNull();
// Bookmark unchanged — still at the pre-carol commit.
const bookmarkAfterDry = await engine.getConfig('sync.last_commit');
expect(bookmarkAfterDry).toBe(bookmarkAfterReal);
});
test('full-sync (--full) dry-run does NOT write to DB or advance the bookmark', async () => {
const { performSync } = await import('../src/commands/sync.ts');
// Seed the bookmark so we hit the full-sync-with-bookmark path when --full is set.
await performSync(engine, { repoPath, noPull: true, noEmbed: true });
// Clear DB so we can observe that a --full dry-run doesn't re-import.
await (engine as any).db.exec(`DELETE FROM content_chunks; DELETE FROM pages;`);
const bookmarkBefore = await engine.getConfig('sync.last_commit');
expect(bookmarkBefore).not.toBeNull();
const result = await performSync(engine, {
repoPath,
full: true, // force full-sync path
dryRun: true,
noPull: true,
noEmbed: true,
});
expect(result.status).toBe('dry_run');
expect(result.added).toBe(2); // alice + bob would be imported
expect(result.chunksCreated).toBe(0);
// DB empty — full-sync dry-run did not reimport.
expect(await engine.getPage('people/alice')).toBeNull();
expect(await engine.getPage('people/bob')).toBeNull();
// Bookmark unchanged.
const bookmarkAfter = await engine.getConfig('sync.last_commit');
expect(bookmarkAfter).toBe(bookmarkBefore);
});
test('SyncResult exposes embedded count field', async () => {
const { performSync } = await import('../src/commands/sync.ts');
const result = await performSync(engine, {
repoPath,
dryRun: true,
noPull: true,
noEmbed: true,
});
// Structural assertion: the contract includes `embedded: number`.
expect(typeof result.embedded).toBe('number');
});
});
describe('sync regression — #132 nested transaction deadlock', () => {
test('src/commands/sync.ts does not wrap the add/modify loop in engine.transaction()', async () => {
const source = await Bun.file(new URL('../src/commands/sync.ts', import.meta.url)).text();