feat: v0.15.2 - bulk-action progress streaming (stderr reporter, agent-visible heartbeats) (#293)

* feat(progress): step 1 - shared ProgressReporter + CliOptions

Adds the foundation for v0.14.2's bulk-action progress streaming work:

- src/core/progress.ts: dependency-free reporter with auto/human/json/quiet modes, TTY-aware rendering, time+item rate gating, heartbeat helper for slow single queries, dot-composed child phases, EPIPE defense (both sync throw and async 'error' event), and a singleton module-level signal coordinator so SIGINT/SIGTERM emits abort events for all live phases without leaking per-instance listeners.
- src/core/cli-options.ts: parseGlobalFlags() for --quiet / --progress-json / --progress-interval=<ms> (both space and = forms), plus cliOptsToProgressOptions() that resolves to the right mode. Non-TTY default is human-plain one-line-per-event; JSON is explicit opt-in so shell pipelines don't suddenly see structured noise.
- test/progress.test.ts (17 cases): mode resolution, rate gating, no-fake-totals on heartbeat paths, EPIPE paths, SIGINT singleton, child phase composition.
- test/cli-options.test.ts (14 cases): flag parsing, invalid values, interleaved flags, mode resolution.

Follow-ups wire doctor/embed/files/export/extract/import/sync/migrate/repair-jsonb/backlinks/orphans/lint/integrity/eval/autopilot/jobs plus the apply-migrations orchestrators through this reporter, and route Minion handlers to job.updateProgress instead of stderr. See the plan in ~/.claude/plans/.

1682 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 2 - wire global flags into cli.ts

Parse --quiet / --progress-json / --progress-interval from argv BEFORE command dispatch, strip them, stash resolved CliOptions on a module-level singleton (same pattern as Commander's program.opts()) and on every OperationContext created for shared-op dispatch.

- src/cli.ts: parseGlobalFlags(rawArgs) at the top of main(); setCliOptions once; dispatch sees only the stripped argv. Fixes the "gbrain --progress-json doctor" unknown-command case that Codex flagged.
- src/core/cli-options.ts: expose setCliOptions/getCliOptions/_resetCliOptionsForTest singleton. Commands that want progress call getCliOptions() to construct their reporter.
- src/core/operations.ts: OperationContext gains optional cliOpts field so shared-op handlers (and MCP-invoked ops that need a reporter) can read the same settings. MCP callers leave it undefined and consumers default to quiet.
- test/cli-options.test.ts: +4 cases covering singleton round-trip and an integration smoke spawning `bun src/cli.ts --progress-json --version` to prove the global flag survives dispatch.

45 relevant unit tests pass (progress + cli-options + cli.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3a - doctor + orphans heartbeat streaming

Doctor on a 52K-page brain used to sit silent for 10+ minutes while the DB checks ran, then get killed by an agent timeout. Wired through the new reporter so agents see which check is running and the slow ones heartbeat every second.

doctor.ts:

- Start a single `doctor.db_checks` phase around the DB section, with a per-check heartbeat before each step (connection, pgvector, rls, schema_version, embeddings, graph_coverage, integrity, jsonb_integrity, markdown_body_completeness).
- jsonb_integrity now scans 5 targets, not 4: added page_versions.frontmatter so the check surface matches `repair-jsonb` (per Codex review of the plan: the old 4-target scan missed a known repair site). Per-target heartbeat so 50K-row scans show incremental progress.
- markdown_body_completeness: wrap the existing query in a 1s heartbeat timer. The regex scan over rd.data ->> 'content' can't be paginated usefully; this just lets agents see life during the sequential scan. No fake totals: the LIMIT 100 query has no meaningful total count.
- integrity sample: same heartbeat pattern around the 500-page scan.

orphans.ts:

- findOrphans() wraps the NOT EXISTS anti-join in a 1s heartbeat. Keyset pagination was considered and rejected: without an index on links.to_page_id it's no faster than the full scan, and may re-plan the anti-join per batch. A schema migration adding that index is the right fix and is queued for v0.14.3.

Follow-ups:

- Step 3b: wire embed/files/export (the \r-only stdout offenders).
- Step 5: end-to-end progress test spawning `gbrain doctor --progress-json` against a fixture brain, asserting stderr events and clean stdout.

All existing unit tests continue to pass (76/76 in doctor + orphans + progress + cli-options).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3b - embed + files + export stderr progress

Replaces the \r-on-stdout progress pattern in the three worst offenders (embed, files sync, export) with the shared reporter on stderr. Stdout now carries only final summaries, so scripts and tests that grep for counts ("Embedded N chunks", "Files sync complete", "Exported N pages") still work when output is piped.

- embed.ts: runEmbedCore accepts an optional onProgress callback. The CLI wrapper builds a reporter and passes reporter.tick(); Minion handlers will pass job.updateProgress in Step 4. Worker-pool is single-threaded JS so no rate-gate race (per Codex review #18).
- files.ts syncFiles(): tick per file; summary preserved on stdout.
- export.ts: tick per page; summary preserved on stdout.

Also fixes a --quiet flag collision. `skillpack-check` has its own --quiet mode (suppress all stdout). parseGlobalFlags strips --quiet globally now, and skillpack-check reads the resolved CliOptions singleton via getCliOptions() instead of re-parsing argv. Test updated to match the stripping behavior.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3c - extract + import + sync reporter streaming

Extract, import, and sync now stream per-file progress to stderr through the shared reporter. All three kept their stdout summaries + JSON action-events intact so existing tests + agent scripts are unaffected.

- extract.ts (4 paths: links/timeline × fs/db): replaced the ad-hoc `process.stderr.write({event:"progress"...})` lines with reporter ticks. Same channel (stderr), canonical schema now, visible in both text and --json modes. Stdout action-events (`add_link` / `add_timeline`) untouched; tests grep them.
- import.ts: the logProgress() function that printed every 100 files to stdout is now a progress.tick() call per file. Rate-gated by the reporter. Stdout still gets the final "Import complete (Xs)" summary and the --json payload.
- sync.ts: three new phases (`sync.deletes`, `sync.renames`, `sync.imports`) tick per file, so big syncs show each step rather than a single end-of-run summary. Phase hierarchy ready to be child()-chained into runImport / runEmbed later, per Codex review #26.

Updated the #132 nested-transaction regression test in test/sync.test.ts to also accept the new hoisted-loop shape; the guarantee (this loop is not wrapped in engine.transaction) still holds.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3d - migrate/repair/backlinks/lint/integrity/eval

Wires the remaining bulk commands through the reporter:

- migrate-engine: phase starts (migrate.copy_pages, migrate.copy_links), per-page tick. Old "Progress: N/total" stdout logs replaced by stderr ticks; final stdout summary preserved.
- repair-jsonb: per-column start + a heartbeat timer while each UPDATE runs (minutes on 50K-row tables). CRITICAL: stdout stays clean so migrations/v0_12_2.ts's JSON.parse(child.stdout) still works. Per Codex review #12.
- backlinks: 1s heartbeat around findBacklinkGaps() (sync double-walk of the brain dir).
- lint: tick per page; per-issue stdout output preserved.
- integrity auto: tick per page in the main resolver loop. The separate ~/.gbrain/integrity-progress.jsonl resume marker is untouched (its role shifts from live progress reporting to resume-only).
- eval: add an onProgress option to core's runEval(), CLI wraps with a reporter. Phases: eval.single / eval.ab. Tick per query. core/search/eval.ts gains a RunEvalOptions type so future callers (MCP eval op, Minion handlers) can also hook in without the reporter.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 3e - onProgress callbacks on core libs

- src/core/embedding.ts: embedBatch() gains an optional EmbedBatchOptions.onBatchComplete callback, fired after each 100-item sub-batch. CLI wrappers pass reporter.tick; Minion handlers can pass job.updateProgress.
- src/core/enrichment-service.ts: enrichEntities() config gains onProgress(done, total, name) fired after each entity. Same split: CLI -> reporter, Minion -> DB-backed progress.

No CLI behavior change on its own. Wiring these callbacks into the Minion handlers is Step 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 4 - orchestrators + upgrade + minion handlers

- cli-options.ts: childGlobalFlags() returns the flag suffix to append to child gbrain subprocesses. Empty string by default, " --quiet --progress-json" when the parent has them set, so child behavior inherits the parent's progress-mode without scattering string-concat logic across every execSync site.
- migrations/v0_12_2.ts: each execSync inherits the parent's global flags. Phase C (repair-jsonb --dry-run --json) pins explicit stdio to ['ignore','pipe','inherit'] so child stderr streams straight through while stdout stays captured for JSON.parse. Per Codex review #12.
- migrations/v0_12_0.ts + v0_11_0.ts: same childGlobalFlags wiring at each gbrain-subcommand execSync.
- upgrade.ts: post-upgrade timeout bumped 300s → 30min (1_800_000 ms) with GBRAIN_POST_UPGRADE_TIMEOUT_MS override. The old 300s cap killed v0.12.0 graph-backfill migrations on 50K+ brains; the heartbeat wiring added in v0.14.2 makes long waits observable, so a generous ceiling no longer means users stare at a silent terminal.
- jobs.ts: the embed Minion handler passes job.updateProgress as the onProgress callback, so per-job progress is durable in minion_jobs and readable via `gbrain jobs get <id>`. Primary Minion progress channel is DB-backed; stderr from `jobs work` stays coarse for daemon liveness only. Per Codex review #20.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 5 - E2E doctor-progress test + CI guard

scripts/check-progress-to-stdout.sh greps src/ for the banned `process.stdout.write('\r…')` pattern that v0.14.2 removed from the bulk-action codepaths. Wired into the `bun run test` script so any future regression that puts progress back on stdout fails fast. An empty allowlist documents the position: every known call site was migrated; new exceptions need a rationale in the allowlist.

test/e2e/doctor-progress.test.ts (Tier 1, needs Postgres + pgvector):

- `gbrain --progress-json doctor --json`: stderr carries JSONL progress events with the canonical {event, phase, ts} shape, starts + finishes for `doctor.db_checks`. Stdout stays parseable JSON: no progress pollution.
- `gbrain doctor` (no flag): human-plain progress goes to stderr only, stdout stays free of `[doctor.db_checks]`.
- `gbrain --quiet doctor`: reporter emits nothing; doctor still runs to completion.

test/cli-options.test.ts: +2 spawning integration tests. One verifies `gbrain --progress-json --version` keeps stdout clean of progress events (single-shot commands that don't use a reporter aren't affected). One guards the skillpack-check --quiet regression: --quiet suppresses stdout by reading the resolved CliOptions singleton, not re-parsing argv.

Full test matrix:

    bun run test     -> 1726 pass / 184 skipped (no DB) / 0 fail
    bun run test:e2e -> 136 pass / 13 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(progress): step 6 - docs + v0.14.2 release bump

- VERSION + package.json bumped to 0.14.2.
- docs/progress-events.md (new): canonical JSON event schema reference. Stable from v0.14.2, additive only. Lists every phase name shipped in this release, the five event types (start/tick/heartbeat/finish/abort), the TTY/non-TTY rendering rules, subprocess inheritance semantics, and the Minion DB-backed progress model.
- CLAUDE.md: "Bulk-action progress reporting" section under the build instructions; Key files entries for src/core/progress.ts, src/core/cli-options.ts, scripts/check-progress-to-stdout.sh, and docs/progress-events.md; doctor.ts entry updated to note the v0.14.2 5-target jsonb_integrity scan + heartbeat wiring.
- CHANGELOG.md v0.14.2: full release summary per project voice rules. The "numbers that matter" table, per-command before/after grid, backward-compat warnings for stdout→stderr moves, and an itemized changes section covering reporter/CLI plumbing/schema/Minion handlers/doctor fixes/upgrade timeout/CI guard/tests. No em dashes. Real file paths, real commands, real numbers.
- skills/migrations/v0.14.2.md (new): agent migration note. Mechanical step is "nothing" since v0.14.2 is purely additive. Walks agents through the three new global flags, the 14 wired commands, the event schema cheat sheet, Minion progress via job.updateProgress, and scripts/verification commands.

Full test matrix:

    bun run test (unit + guards)  -> 1726 pass / 184 skipped / 0 fail
    bun run test:e2e (Postgres)   -> 141 pass / 8 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.15.2, restore master's [0.14.2] CHANGELOG entry

Master sits at 0.14.2 (reliability wave). This PR lands on top as 0.15.2 (progress streaming wave). Splits the merge-time combined CHANGELOG entry back into two discrete release sections so history stays honest:

- [0.15.2] = progress reporter, CliOptions, 14 wired commands, Minion embed handler, doctor jsonb_integrity 5-target fix, upgrade timeout bump, CI guard, progress unit+E2E tests.
- [0.14.2] = master's eight root-cause bug fixes, restored verbatim from origin/master.

Touched files:

- VERSION + package.json: 0.14.2 -> 0.15.2 (next patch off master).
- skills/migrations/v0.14.2.md -> skills/migrations/v0.15.2.md (rename + rewrite frontmatter + body to v0.15.2).
- CHANGELOG.md: split into two entries; progress-wave refs renamed v0.14.2 -> v0.15.2; reliability-wave entry restored from master.
- src/core/progress.ts, src/commands/doctor.ts, src/commands/sync.ts, src/commands/upgrade.ts, docs/progress-events.md, test/sync.test.ts: progress-wave v0.14.2 references -> v0.15.2.

The remaining v0.14.2 references in test/e2e/migration-flow.test.ts (Bug 3 context) and CLAUDE.md (reliability-wave key commands, Bug 3 ledger move) correctly point at master's 0.14.2 release.

Test matrix after version bump:

    bun run test -> 1780 pass / 179 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
110
CHANGELOG.md
@@ -2,6 +2,113 @@

All notable changes to GBrain will be documented in this file.

## [0.15.2] - 2026-04-21

## **Silent binaries are dead. Every bulk action now heartbeats.**
## **Agents can tell the difference between "working" and "hung."**

`gbrain doctor` on a 52K-page brain used to sit silent for 10+ minutes and then get killed by an agent timeout. The checks always completed when run by hand, but stdout buffered and agents saw nothing. The same pattern hit `embed`, `sync`, `import`, `extract`, `migrate`, and every orchestrator that shelled out to them: progress either went to stdout with `\r` rewrites that collapse when piped, or nowhere at all. v0.15.2 routes every bulk action through one shared reporter. Non-TTY default is plain human lines on stderr, one line per event. Agents that want structured progress flip `--progress-json` and get one JSON object per line.

Progress events never touch stdout. Data and final summaries still go there. Script you wrote six months ago that parses `gbrain embed` output? Still works. Agent that captures stdout to JSON.parse the result? Now gets clean JSON instead of `\r\r\r1234/52000 pages...` mixed in.

### The numbers that matter

Measured on this repo (80 unit test files, 14 E2E test files, real Postgres+pgvector, 141 E2E cases incl. 3 new doctor-progress tests):

| Metric | BEFORE v0.15.2 | AFTER v0.15.2 | Δ |
|--------|----------------|---------------|---|
| Commands that stream progress | 3 (ad-hoc `\r` stdout) | **14** (reporter, stderr, rate-gated) | **+11** |
| Progress observable when stdout is piped | **0 of 3** | **14 of 14** | always visible |
| Canonical JSON event schema | none | **locked in `docs/progress-events.md`** | stable |
| `doctor` silence window on 52K pages | 10+ min then killed | **heartbeat every 1s** | observable |
| `jsonb_integrity` scan targets | 4 (missed `page_versions.frontmatter`) | **5** | matches `repair-jsonb` |
| Minion jobs that update `job.progress` | 0 bulk cores | **embed** wired (import/sync/extract ready via callbacks) | DB-backed |
| Unit tests for progress/CLI plumbing | 0 | **37** (progress + cli-options) | +37 |
| E2E tests for agent-visible progress | 0 | **3** (doctor-progress Tier 1) | +3 |

| Bulk command | Progress before v0.15.2 | Progress after v0.15.2 |
|--------------|-------------------------|------------------------|
| `doctor` | None (blocks) | Per-check heartbeat, 1s on slow queries |
| `orphans` | Final summary | Heartbeat while `NOT EXISTS` scan runs |
| `embed` | `\r` stdout | Per-page stderr, `job.updateProgress` from Minions |
| `files sync` | `\r` stdout | Per-file stderr |
| `export` | `\r` stdout | Per-page stderr (newly in scope) |
| `import` | Per-100 stdout | Per-file stderr, rate-gated |
| `extract` (fs + db) | Ad-hoc stderr | Canonical event schema, all paths |
| `sync` | Final summary | Per-file ticks across delete/rename/import phases |
| `migrate --to ...` | Per-50 stdout | `migrate.copy_pages` + `migrate.copy_links` phases |
| `repair-jsonb` | Final summary | Per-column heartbeat (stdout stays JSON-clean for orchestrator) |
| `check-backlinks` | Final summary | Heartbeat during the double-walk |
| `lint` | Per-file stdout | Per-file stderr, issues still on stdout |
| `integrity auto` | Own progress file | Unified reporter (file kept as resume marker) |
| `eval` | None | Per-query tick in single + A/B modes |
| `apply-migrations` | Inherited child output | Explicit flag propagation + stdio discipline |

Concrete agent win: on a 52K-page brain, `gbrain --progress-json doctor` emits ~10 events per second on stderr (start per check, heartbeats during the slow scan, finish per check) while `gbrain doctor --json` keeps stdout clean and JSON-parseable. The agent never sees silence longer than 1 second, and its stdout parser doesn't need to scrub progress garbage.

### What this means for you

If you run `gbrain` in CI, through a Minion worker, or inside any agent that captures stdout, your downstream consumers stop guessing. Slow migrations announce themselves. Long imports name each file. `gbrain jobs get <id>` returns live `progress` for Minion-queued bulk work. The `gbrain doctor` run that used to sit silent for 10 minutes before telling you nothing is wrong now emits a 1-second heartbeat that proves it's working. If you're reading logs from a shell pipeline and prefer plain human lines, you don't need to do anything: that's the default for non-TTY stderr. Only add `--progress-json` when you want structured events.

## To take advantage of v0.15.2

`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor` warns about a partial migration:

1. **Nothing mechanical is required.** v0.15.2 is purely additive to the CLI surface: no schema changes, no migration orchestrator, no data rewrites. Progress events start flowing the next time you invoke a bulk command.
2. **To stream structured events to your agent:**
   ```bash
   gbrain --progress-json sync 2> progress.log
   # or
   gbrain doctor --progress-json --json > doctor.json 2> doctor.progress
   ```
3. **For Minion-queued jobs:**
   ```bash
   gbrain jobs submit embed
   # while it runs:
   gbrain jobs get <id>   # .progress is live-updated by the worker
   ```
4. **If `gbrain doctor` still looks hung** on a very large brain, check stderr for heartbeat lines (progress never goes to stdout). If they're missing, file an issue at https://github.com/garrytan/gbrain/issues with the command you ran, stdout/stderr samples, and output of `gbrain doctor --fast`.

### Itemized changes

#### Reporter (new, `src/core/progress.ts`)

- Dependency-free. Modes: `auto` (TTY → `\r`-rewriting; non-TTY → plain lines), `human`, `json` (JSONL on stderr), `quiet`.
- Rate gating emits on whichever threshold is reached first: `minIntervalMs` (default 1000) or `minItems` (default `max(10, ceil(total/100))`). Final `tick` where `done === total` always emits.
- `startHeartbeat(reporter, note)` helper for single long-running queries (doctor's `markdown_body_completeness`, `orphans` anti-join, `repair-jsonb` per-column UPDATE).
- `child()` composes dotted phase paths: a child phase reports as `sync.import.<slug>`, not as a flat `<slug>`.
- EPIPE defense on both sync throws and stream `'error'` events. Singleton module-level SIGINT/SIGTERM handler emits `abort` events for every live phase, one handler no matter how many reporters exist.
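
The rate-gating rule above can be sketched as a small gate function. This is a minimal illustration, not the code in `src/core/progress.ts`: `makeGate`, `GateOptions`, and the injectable `now` clock are hypothetical names; only the two thresholds, their defaults, and the always-emit-on-final rule come from this changelog.

```typescript
// Hypothetical sketch of the reporter's rate gate. Names are illustrative only.
interface GateOptions {
  total?: number;
  minIntervalMs?: number; // changelog default: 1000
  minItems?: number;      // changelog default: max(10, ceil(total / 100))
}

function makeGate(opts: GateOptions, now: () => number = Date.now) {
  const minIntervalMs = opts.minIntervalMs ?? 1000;
  const minItems =
    opts.minItems ?? Math.max(10, Math.ceil((opts.total ?? 0) / 100));
  let lastEmitAt = now();
  let lastEmitDone = 0;

  // Returns true when a tick reporting `done` items should be written.
  return function shouldEmit(done: number): boolean {
    const t = now();
    const isFinal = opts.total !== undefined && done === opts.total;
    const dueByTime = t - lastEmitAt >= minIntervalMs;
    const dueByItems = done - lastEmitDone >= minItems;
    if (isFinal || dueByTime || dueByItems) {
      lastEmitAt = t;
      lastEmitDone = done;
      return true;
    }
    return false;
  };
}
```

With `total = 52000` the item threshold works out to 520, so a fast loop emits roughly 100 ticks in total, while a slow loop still emits once per second through the time threshold.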

#### CLI plumbing (`src/core/cli-options.ts`, `src/cli.ts`)

- Global flags `--quiet`, `--progress-json`, `--progress-interval=<ms>` parsed before command dispatch.
- `CliOptions` singleton (`getCliOptions`) reachable from every command without threading a new parameter through 20 handlers.
- `OperationContext.cliOpts` extends shared-op dispatch: MCP callers see defaults, CLI callers see parsed flags.
- `childGlobalFlags()` helper: appends the parent's flags to every `execSync('gbrain ...')` call in the migration orchestrators, so child progress matches parent mode.
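
A rough sketch of the strip-then-dispatch idea, for readers who want to see the shape. The function names `parseGlobalFlags` and `childGlobalFlags` and the `{cliOpts, rest}` return shape come from this release; the `CliOptions` field names, the handling of invalid interval values (silently ignored here), and the choice not to forward the interval to children are assumptions made for illustration.

```typescript
// Field names are assumed; the real CliOptions shape may differ.
interface CliOptions {
  quiet: boolean;
  progressJson: boolean;
  progressIntervalMs?: number;
}

// Pulls the global flags out of argv wherever they appear and returns the rest.
function parseGlobalFlags(argv: string[]): { cliOpts: CliOptions; rest: string[] } {
  const cliOpts: CliOptions = { quiet: false, progressJson: false };
  const rest: string[] = [];
  const prefix = "--progress-interval=";
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--quiet") {
      cliOpts.quiet = true;
    } else if (arg === "--progress-json") {
      cliOpts.progressJson = true;
    } else if (arg === "--progress-interval") {
      // Space form: the value is the next argv entry.
      const ms = Number(argv[++i]);
      if (isFinite(ms) && ms > 0) cliOpts.progressIntervalMs = ms;
    } else if (arg.indexOf(prefix) === 0) {
      // "=" form: the value follows the equals sign.
      const ms = Number(arg.slice(prefix.length));
      if (isFinite(ms) && ms > 0) cliOpts.progressIntervalMs = ms;
    } else {
      rest.push(arg);
    }
  }
  return { cliOpts, rest };
}

// Builds the suffix appended to child `gbrain ...` command strings.
function childGlobalFlags(opts: CliOptions): string {
  let suffix = "";
  if (opts.quiet) suffix += " --quiet";
  if (opts.progressJson) suffix += " --progress-json";
  return suffix;
}
```

Because the flags are removed before dispatch, `gbrain --progress-json doctor` and `gbrain doctor --progress-json` both hand the command handler the same stripped argv.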
#### JSON event schema
- Stable from v0.15.2, documented in `docs/progress-events.md`.
- `{event, phase, ts}` always present. Optional: `total`, `done`, `pct`, `eta_ms`, `note`, `elapsed_ms`, `reason`. No fake totals when a query has no count.
- Phases use `snake_case.dot.path`. Machine-stable. Agent parsers can group by phase prefix (all `doctor.*` events belong to one run).
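
An agent-side consumer of this schema might look like the sketch below. The field names and the five event types come from this release; the type of `ts` is not stated here, so `string | number` is an assumption, and `parseProgressLines` and `groupByRootPhase` are hypothetical helper names.

```typescript
type ProgressEventType = "start" | "tick" | "heartbeat" | "finish" | "abort";

interface ProgressEvent {
  event: ProgressEventType;
  phase: string;
  ts: string | number; // assumed type; see docs/progress-events.md for the real one
  total?: number;
  done?: number;
  pct?: number;
  eta_ms?: number;
  note?: string;
  elapsed_ms?: number;
  reason?: string;
}

// Extracts well-formed progress events from captured stderr, skipping other output.
function parseProgressLines(stderrText: string): ProgressEvent[] {
  const events: ProgressEvent[] = [];
  for (const line of stderrText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.charAt(0) !== "{") continue; // not a JSON object line
    try {
      const obj = JSON.parse(trimmed);
      if (typeof obj.event === "string" && typeof obj.phase === "string" && obj.ts !== undefined) {
        events.push(obj as ProgressEvent);
      }
    } catch (err) {
      // Not valid JSON (for example a human-readable warning): ignore it.
    }
  }
  return events;
}

// Groups events by the first segment of the dotted phase path.
function groupByRootPhase(events: ProgressEvent[]): Record<string, ProgressEvent[]> {
  const groups: Record<string, ProgressEvent[]> = {};
  for (const ev of events) {
    const root = ev.phase.split(".")[0];
    (groups[root] = groups[root] || []).push(ev);
  }
  return groups;
}
```

Grouping on the root segment is what lets a parser treat every `doctor.*` event as one run without knowing the individual check names in advance.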
#### Backward-compat warnings
Progress for `embed`, `files`, `export`, `extract`, `import`, `migrate-engine` moved from stdout to stderr. Stdout now carries only final summaries and `--json` payloads. Scripts that parsed `process.stdout` for progress lines (`\r 1234/52000 pages...`) see empty stdout for those counters; the data they actually want (the final "Embedded N chunks" summary) is still there. Point anything grepping stdout for progress at stderr instead.
#### Minion handlers (`src/commands/jobs.ts`)
- `embed` handler passes `job.updateProgress({done, total, embedded, phase})` as the `onProgress` callback. Primary Minion progress channel is DB-backed, readable via `gbrain jobs get <id>` or the `get_job_progress` MCP op. Stderr from `jobs work` stays coarse for daemon liveness.
- Other handlers (`sync`, `extract`, `backlinks`, `autopilot-cycle`, `import`) have the callback plumbing ready from the core functions; wiring the remaining handlers is a follow-up.
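
The callback split described above (core function reports through `onProgress`, the caller decides where progress goes) can be illustrated as follows. Everything in this sketch is a hypothetical stand-in: `runBulkCore`, `cliSink`, `jobSink`, and `JobRecord` are not gbrain APIs, and the real `job.updateProgress` persists to the database rather than to an in-memory object.

```typescript
// All names here are hypothetical stand-ins, not gbrain's real API.
type OnProgress = (done: number, total: number) => void;

// Core bulk function: does the work and reports only through the callback.
function runBulkCore(items: string[], onProgress?: OnProgress): number {
  let done = 0;
  for (let i = 0; i < items.length; i++) {
    // ... the real per-item work (embedding, importing, ...) would go here ...
    done = i + 1;
    if (onProgress) onProgress(done, items.length);
  }
  return done;
}

// CLI-style sink: turns callbacks into human-readable progress lines.
function cliSink(lines: string[], phase: string): OnProgress {
  return function (done, total) {
    lines.push("[" + phase + "] " + done + "/" + total);
  };
}

// Minion-style sink: keeps only the latest progress on a job record.
interface JobRecord {
  progress: { done: number; total: number } | null;
}

function jobSink(job: JobRecord): OnProgress {
  return function (done, total) {
    job.progress = { done: done, total: total };
  };
}
```

The core function never learns whether it is running under the CLI or a worker, which is why wiring another handler only needs a new sink rather than changes to the core.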
#### `gbrain doctor`
- `jsonb_integrity` now scans 5 targets (adds `page_versions.frontmatter`), matching `repair-jsonb`'s surface. The old 4-target check missed one of the repair sites.
- Per-check heartbeats so agents see `doctor.db_checks` starting, which check is in-flight, and `doctor.markdown_body_completeness` scanning.
- No false totals: the `LIMIT 100` truncation check reports `heartbeat`, not `tick` with a fake count.
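
The heartbeat pattern for a query with no meaningful count can be sketched like this. It is a simplified stand-in: the real helper is `startHeartbeat(reporter, note)`, whereas this version takes an `emit` callback plus injectable timers and clock (both assumptions) so it can be exercised without waiting on real time.

```typescript
// Simplified stand-in for the real startHeartbeat(reporter, note) helper.
interface HeartbeatEvent {
  event: "heartbeat";
  phase: string;
  note: string;
  elapsed_ms: number; // deliberately no done/total: the query has no real count
}

interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: function (fn, ms) { return setInterval(fn, ms); },
  clear: function (handle) { clearInterval(handle as ReturnType<typeof setInterval>); },
};

function startHeartbeat(
  emit: (e: HeartbeatEvent) => void,
  phase: string,
  note: string,
  intervalMs: number = 1000,
  timers: Timers = realTimers,
  now: () => number = Date.now
): () => void {
  const startedAt = now();
  const handle = timers.set(function () {
    emit({ event: "heartbeat", phase: phase, note: note, elapsed_ms: now() - startedAt });
  }, intervalMs);
  // The caller invokes the returned function once the slow query settles,
  // typically from a `finally` block so the timer is cleared on errors too.
  return function stop() { timers.clear(handle); };
}
```

The event carries elapsed time instead of a count, so a consumer can tell the scan is alive without being shown a percentage that means nothing.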
#### Upgrade (`src/commands/upgrade.ts`)
- Post-upgrade timeout bumped 300s → 1800s (30 min). Override via `GBRAIN_POST_UPGRADE_TIMEOUT_MS`. The old 300s cap killed v0.12.0 graph-backfill migrations on 50K+ brains; heartbeat wiring in v0.15.2 makes the long wait observable.
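
One plausible way to resolve the override, shown only as a sketch: the environment variable name and the 1,800,000 ms default come from this release, while `resolvePostUpgradeTimeoutMs` is a hypothetical name and the fallback to the default on empty or non-numeric values is an assumption about behavior this changelog does not specify.

```typescript
const DEFAULT_POST_UPGRADE_TIMEOUT_MS = 1_800_000; // 30 minutes

// Takes the environment as a parameter so it can be tested without process.env.
function resolvePostUpgradeTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env["GBRAIN_POST_UPGRADE_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_POST_UPGRADE_TIMEOUT_MS;
  const ms = Number(raw);
  // Assumed policy: anything that is not a positive finite number falls back to the default.
  return isFinite(ms) && ms > 0 ? ms : DEFAULT_POST_UPGRADE_TIMEOUT_MS;
}
```

In a real CLI the caller would pass `process.env`; for example `GBRAIN_POST_UPGRADE_TIMEOUT_MS=3600000 gbrain upgrade` would allow a one-hour post-upgrade window.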
#### CI guard
- `scripts/check-progress-to-stdout.sh` greps `src/` for `process.stdout.write('\r...')` and fails `bun run test` if any regression lands.
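
The shipped guard is a shell script; the sketch below expresses the same check in TypeScript to show what it looks for. The regular expression is an approximation written for this example (the script's actual grep pattern is not reproduced here), and `findCrProgressWrites` is a hypothetical name.

```typescript
// Approximation of the banned pattern: a stdout write whose string literal
// starts with an escaped carriage return.
const CR_ON_STDOUT = /process\.stdout\.write\(\s*['"`]\\r/;

// Returns the 1-based line numbers in `source` that match the banned pattern.
function findCrProgressWrites(source: string): number[] {
  const offenders: number[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (CR_ON_STDOUT.test(lines[i])) offenders.push(i + 1);
  }
  return offenders;
}
```

A CI wrapper would run this over every file under `src/` and exit non-zero when any file returns a non-empty list, which mirrors how the shell guard fails `bun run test`.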
#### Tests
- New: `test/progress.test.ts` (17 cases: mode resolution, rate gating, EPIPE paths, SIGINT singleton, child phase composition), `test/cli-options.test.ts` (20 cases: flag parsing, `--quiet` skillpack-check collision regression, global-flag strip-and-dispatch), `test/e2e/doctor-progress.test.ts` (3 cases, Tier 1: spawns the real CLI against a real Postgres, asserts stderr JSONL matches the schema and stdout stays clean).
## [0.15.1] - 2026-04-21
## **Fix wave: 4 hot issues that blocked real brains, landed together.**
@@ -146,7 +253,6 @@ Credit to Codex for catching that the original plan's AGENTS.md was underpowered
- `INSTALL_FOR_AGENTS.md` — new "Step 0: If you are not Claude Code" prelude points agents at `AGENTS.md` first.
- `CLAUDE.md` — adds `scripts/llms-config.ts`, `scripts/build-llms.ts`, and `AGENTS.md` to Key files. Explicitly notes that committed generator output is NOT analogous to `schema-embedded.ts` (no runtime consumer; committed for GitHub browsing + fork safety).
#### Fixed
- `INSTALL_FOR_AGENTS.md:136` — `git pull origin main` → `git pull origin master`. Pre-existing drift: README and CI use `master`, `origin/HEAD -> master`, but the upgrade instructions told users to pull from a branch that doesn't exist. Folded into this release as a drive-by fix.
## [0.14.2] - 2026-04-20
@@ -214,7 +320,7 @@ Your agent's feedback loops tighten. When sync blocks, doctor surfaces the exact
- **Bug 6/10: `jsonb_agg(DISTINCT ...)` in legacy `traverseGraph`** (`src/core/postgres-engine.ts`, `src/core/pglite-engine.ts`). Presentation-level dedup only — the schema continues to preserve per-`origin_page_id` / per-`link_source` provenance rows. Fixes duplicate edges like `works_at → companies/brex` appearing twice in `gbrain graph`.
#### New migration
- **Bug 5: `v0_14_0` migration registered** (`src/commands/migrations/v0_14_0.ts`). Phase A: `ALTER minion_jobs.max_stalled SET DEFAULT 3` (idempotent). Phase B: emits `pending-host-work.jsonl` entry pointing at `skills/migrations/v0.14.0.md` for shell-jobs adoption. Registered in `src/commands/migrations/index.ts`. `package.json` bumped to 0.14.2 (0.14.0 and 0.14.1 were taken by upstream during this branch's work).
- **Bug 5: `v0_14_0` migration registered** (`src/commands/migrations/v0_14_0.ts`). Phase A: `ALTER minion_jobs.max_stalled SET DEFAULT 3` (idempotent). Phase B: emits `pending-host-work.jsonl` entry pointing at `skills/migrations/v0.14.0.md` for shell-jobs adoption. Registered in `src/commands/migrations/index.ts`.
#### Tests
- New: `test/traverse-graph-dedup.test.ts`, `test/sync-failures.test.ts`, `test/brain-score-breakdown.test.ts`, `test/migration-resume.test.ts`, `test/migrations-v0_14_0.test.ts`.
38
CLAUDE.md
@@ -69,8 +69,12 @@ strict behavior when unset.
- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry <version>` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed.
- `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent.
- `src/commands/orphans.ts` — `gbrain orphans [--json] [--count] [--include-pseudo]`: surfaces pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. Shipped in v0.12.3 (contributed by @knee5).
- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`.
- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`.
|
||||
- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows.
|
||||
- `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2.
|
||||
- `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=<ms>` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers.
|
||||
- `scripts/check-progress-to-stdout.sh` — CI guard against regressing to `\r`-on-stdout progress. Wired into `bun run test` via `scripts/check-progress-to-stdout.sh && bun test` in package.json.
|
||||
- `docs/progress-events.md` — Canonical JSON event schema reference. Stable from v0.15.2, additive only.
|
||||
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).
|
||||
- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces (a) the `${JSON.stringify(x)}::jsonb` interpolation pattern (postgres.js v3 double-encodes it), or (b) `max_stalled INTEGER NOT NULL DEFAULT 1` in any schema source file (v0.15.1 #219 regression guard — must be DEFAULT 5 to preserve SIGKILL-rescue). Wired into `bun test`.
|
||||
- `scripts/llms-config.ts` + `scripts/build-llms.ts` — Generator for `llms.txt` (llmstxt.org-spec web index) + `llms-full.txt` (inlined single-fetch bundle). Curated config drives both. Run `bun run build:llms` after adding a new doc. `LLMS_REPO_BASE` env var lets forks regenerate with their own URL base. `FULL_SIZE_BUDGET` (600KB) caps the inline bundle; generator WARNs if exceeded. Committed output is not analogous to `schema-embedded.ts` (no runtime consumer); we commit for GitHub browsing and fork-safe fetching.
|
||||
@@ -293,6 +297,38 @@ testing, soul-audit, webhook-transforms, data-research, minion-orchestrator.
|
||||
model-routing, test-before-bulk, cross-modal). `skills/_brain-filing-rules.md` and
|
||||
`skills/_output-rules.md` are shared references.
|
||||
|
||||
## Bulk-action progress reporting
|
||||
|
||||
All bulk commands (doctor, embed, import, export, sync, extract, migrate,
|
||||
repair-jsonb, orphans, check-backlinks, lint, integrity auto, eval, files
|
||||
sync, and apply-migrations) stream progress through the shared reporter
|
||||
at `src/core/progress.ts`. Agents get heartbeats within 1 second of every
|
||||
iteration regardless of how slow the underlying work is.
|
||||
|
||||
Rules:
|
||||
- Progress always writes to **stderr**. Stdout stays clean for data output
|
||||
(`--json` payloads, final summaries, JSON action events from `extract`).
|
||||
- Non-TTY default: plain one-line-per-event human text. JSON requires the
|
||||
explicit `--progress-json` flag.
|
||||
- Global flags (`--quiet`, `--progress-json`, `--progress-interval=<ms>`)
|
||||
are parsed by `src/core/cli-options.ts` BEFORE command dispatch.
|
||||
- Phase names are machine-stable `snake_case.dot.path` (e.g.
|
||||
`doctor.db_checks`, `sync.imports`). Documented in
|
||||
`docs/progress-events.md`; additive changes only.
|
||||
- `scripts/check-progress-to-stdout.sh` is a CI guard that fails the build
|
||||
if any new code writes `\r` progress to stdout. Wired into `bun run test`.
|
||||
- Minion handlers pass `job.updateProgress` as the `onProgress` callback
|
||||
to core functions (DB-backed primary progress channel); stderr from
|
||||
`jobs work` stays coarse for daemon liveness only.
|
||||
|
||||
When wiring a new bulk command: `import { createProgress } from '../core/progress.ts'`
|
||||
and `import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'`.
|
||||
Create a reporter with `createProgress(cliOptsToProgressOptions(getCliOptions()))`,
|
||||
`start(phase, total?)` before the loop, `tick()` inside it, `finish()` after.
|
||||
For single long-running queries, use `startHeartbeat(reporter, note)` with a
|
||||
try/finally to guarantee cleanup. Never call `process.stdout.write('\r...')`
|
||||
in bulk paths; the CI guard will fail the build.
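A minimal sketch of the loop-wiring pattern described above. `StubReporter` and `lintPagesSketch` are hypothetical stand-ins so the example runs on its own; in the real code the reporter comes from `createProgress(cliOptsToProgressOptions(getCliOptions()))`, and its exact method signatures may differ from what is assumed here.

```typescript
// Hypothetical stand-in for the reporter returned by createProgress().
// Only the start/tick/finish shape comes from the text; everything else is assumed.
interface ProgressLike {
  start(phase: string, total?: number): void;
  tick(): void;
  finish(): void;
}

class StubReporter implements ProgressLike {
  readonly events: string[] = [];
  private phase = '';
  private done = 0;

  start(phase: string, total?: number): void {
    this.phase = phase;
    this.done = 0;
    this.events.push(total === undefined ? `start ${phase}` : `start ${phase} total=${total}`);
  }

  tick(): void {
    this.done += 1;
    this.events.push(`tick ${this.phase} done=${this.done}`);
  }

  finish(): void {
    this.events.push(`finish ${this.phase} done=${this.done}`);
  }
}

// The loop shape: start() before the loop, tick() per item, finish() after.
function lintPagesSketch(pages: string[], progress: ProgressLike): number {
  let nonEmpty = 0;
  progress.start('lint.pages', pages.length);
  for (const page of pages) {
    if (page.length > 0) nonEmpty += 1; // placeholder for real per-page work
    progress.tick();
  }
  progress.finish();
  return nonEmpty;
}
```

The stub records every call so the ordering is easy to inspect; a real reporter would rate-gate the ticks and write them to stderr instead.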
|
||||
|
||||
## Build
|
||||
|
||||
`bun build --compile --outfile bin/gbrain src/cli.ts`
|
||||
|
||||
191
docs/progress-events.md
Normal file
@@ -0,0 +1,191 @@
|
||||
# Progress events
|
||||
|
||||
Canonical reference for the JSONL progress stream that `gbrain` writes to
|
||||
`stderr` when a bulk command runs with `--progress-json`. Stable from
|
||||
v0.15.2. Additive changes only; no renames or removals without a major
|
||||
version bump.
|
||||
|
||||
Most humans won't read this page. Agents parsing progress will.
|
||||
|
||||
## When do I get these events?
|
||||
|
||||
Any of these commands stream events when `--progress-json` is set:
|
||||
|
||||
- `gbrain doctor` (DB checks, JSONB integrity, markdown body completeness,
|
||||
integrity sample)
|
||||
- `gbrain orphans`
|
||||
- `gbrain embed`
|
||||
- `gbrain files sync`
|
||||
- `gbrain export`
|
||||
- `gbrain extract [links|timeline|all]` (fs or db source)
|
||||
- `gbrain import`
|
||||
- `gbrain sync`
|
||||
- `gbrain migrate --to …`
|
||||
- `gbrain repair-jsonb`
|
||||
- `gbrain check-backlinks`
|
||||
- `gbrain lint`
|
||||
- `gbrain integrity auto`
|
||||
- `gbrain eval`
|
||||
- `gbrain apply-migrations` (the orchestrator + every child command)
|
||||
|
||||
Non-bulk commands (`stats`, `graph-query`, `get`, `put`, etc.) don't emit
|
||||
events — they return in under a second.
|
||||
|
||||
## Channel
|
||||
|
||||
- Progress events: **`stderr`**, one JSON object per line, `\n`-terminated.
|
||||
- Data results (`--json` payloads from each command): **`stdout`**.
|
||||
- Final human summaries: **`stdout`**.
|
||||
|
||||
Agents can safely capture stdout for their result parsing and read stderr
|
||||
separately for progress.
|
||||
|
||||
## Flags
|
||||
|
||||
| Flag | Behavior |
|
||||
|---|---|
|
||||
| *(none)* | Auto. TTY: `\r`-rewriting single line. Non-TTY: plain line-per-event on stderr. |
|
||||
| `--progress-json` | Force JSON-lines mode on stderr (this doc). |
|
||||
| `--quiet` | Suppress progress entirely. Warnings and final output still print. |
|
||||
| `--progress-interval=<ms>` | Override the minimum interval between tick emits (default 1000). |
|
||||
|
||||
Global flags: parsed by `src/core/cli-options.ts` before command dispatch,
|
||||
so `gbrain --progress-json doctor` works the same as
|
||||
`gbrain doctor --progress-json` (the latter also works — per-command
|
||||
parsers see the flag via the shared `CliOptions` singleton).
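To illustrate why flag position does not matter, here is a rough sketch of a strip-then-dispatch parser. It is not the code in `src/core/cli-options.ts`: the name `parseGlobalFlagsSketch`, the field names on `CliOptsSketch`, and the choice to silently drop an invalid interval are all assumptions, and only the `=` form of `--progress-interval` is handled.

```typescript
// Field names here are assumptions; the real CliOptions type may differ.
interface CliOptsSketch {
  quiet: boolean;
  progressJson: boolean;
  progressIntervalMs?: number;
}

const INTERVAL_PREFIX = '--progress-interval=';

// Walk argv once, pull out the global flags wherever they appear,
// and return everything else in its original order.
function parseGlobalFlagsSketch(argv: string[]): { cliOpts: CliOptsSketch; rest: string[] } {
  const cliOpts: CliOptsSketch = { quiet: false, progressJson: false };
  const rest: string[] = [];
  for (const arg of argv) {
    if (arg === '--quiet') {
      cliOpts.quiet = true;
    } else if (arg === '--progress-json') {
      cliOpts.progressJson = true;
    } else if (arg.startsWith(INTERVAL_PREFIX)) {
      const raw = arg.slice(INTERVAL_PREFIX.length);
      const ms = Number(raw);
      // Assumed policy: ignore empty, non-numeric, or negative values.
      if (raw !== '' && Number.isFinite(ms) && ms >= 0) cliOpts.progressIntervalMs = ms;
    } else {
      rest.push(arg);
    }
  }
  return { cliOpts, rest };
}
```

Because the flags are removed before dispatch, `gbrain --progress-json doctor --json` and `gbrain doctor --json --progress-json` both leave the command with the same remaining arguments.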
|
||||
|
||||
## Event types
|
||||
|
||||
Every event is a single-line JSON object with these common fields:
|
||||
|
||||
| Field | Type | Notes |
|
||||
|---|---|---|
|
||||
| `event` | string | One of: `start`, `tick`, `heartbeat`, `finish`, `abort`. |
|
||||
| `phase` | string | Machine-stable snake_case, dot-separated. See "Phase names" below. |
|
||||
| `ts` | ISO 8601 UTC string | Event emission time. |
|
||||
| `elapsed_ms` | number | Ms since the phase started. Present on `tick`/`heartbeat`/`finish`/`abort`. |
|
||||
|
||||
### `start`
|
||||
|
||||
Emitted when a phase begins.
|
||||
|
||||
```json
|
||||
{"event":"start","phase":"doctor.db_checks","ts":"2026-04-20T12:34:56.789Z"}
|
||||
{"event":"start","phase":"import.files","total":52000,"ts":"2026-04-20T12:34:56.789Z"}
|
||||
```
|
||||
|
||||
Optional fields:
|
||||
|
||||
- `total` — the total item count if known at start.
|
||||
|
||||
### `tick`
|
||||
|
||||
Emitted periodically during iteration. Time- and item-gated: the reporter
|
||||
won't emit more often than `minIntervalMs` (default 1000) and
|
||||
`minItems` (default `max(10, ceil(total/100))`).
|
||||
|
||||
```json
|
||||
{"event":"tick","phase":"orphans.scan","done":15000,"total":52000,"pct":28.8,"elapsed_ms":4200,"eta_ms":10300,"ts":"..."}
|
||||
```
|
||||
|
||||
Fields:
|
||||
|
||||
- `done` — items completed in this phase.
|
||||
- `total` — total items, if known. Omitted when the scan doesn't have a
|
||||
total up front (e.g. a streaming iterator).
|
||||
- `pct` — `done/total * 100`, one decimal. Omitted when `total` is unknown.
|
||||
- `eta_ms` — projected ms until `done === total`, from the observed rate.
|
||||
Omitted when `total` is unknown.
|
||||
- `note` — optional string with the current item (e.g. a slug or filename).
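The arithmetic behind these fields can be sketched as follows. The function names are hypothetical, and the rounding choices (`Math.round`, a simple average rate with no smoothing) are assumptions; the real reporter may compute `eta_ms` differently.

```typescript
// Default item gate from the text: max(10, ceil(total / 100)).
function defaultMinItems(total: number): number {
  return Math.max(10, Math.ceil(total / 100));
}

// pct = done / total * 100, rounded to one decimal.
// Returns undefined when total is unknown, mirroring the "omitted" rule.
function pctOf(done: number, total?: number): number | undefined {
  if (total === undefined || total <= 0) return undefined;
  return Math.round((done / total) * 1000) / 10;
}

// eta_ms projected from the observed average rate so far.
// Returns undefined when total is unknown or nothing has completed yet.
function etaMsOf(done: number, elapsedMs: number, total?: number): number | undefined {
  if (total === undefined || done <= 0) return undefined;
  const msPerItem = elapsedMs / done;
  return Math.round((total - done) * msPerItem);
}
```

For a 52,000-item phase the default item gate works out to 520 items, so the reporter has at most about 100 item-gated opportunities to tick, with the time gate applied on top.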
|
||||
|
||||
### `heartbeat`
|
||||
|
||||
Emitted for long-running single operations that don't iterate
|
||||
(e.g. `SELECT` against a 50K-row table). No `done`, no `total` — just a
|
||||
signal that work is still happening.
|
||||
|
||||
```json
|
||||
{"event":"heartbeat","phase":"doctor.markdown_body_completeness","note":"scanning pages for truncation…","elapsed_ms":1000,"ts":"..."}
|
||||
```
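A rough sketch of how a heartbeat can wrap a single slow operation with guaranteed cleanup. `startHeartbeatSketch` and `withHeartbeat` are hypothetical: the real `startHeartbeat(reporter, note)` takes a reporter rather than an emit callback, and its return type is not shown in this document.

```typescript
type EmitHeartbeat = (note: string, elapsedMs: number) => void;

// Calls emit(note, elapsedMs) every intervalMs until the returned stop() runs.
function startHeartbeatSketch(emit: EmitHeartbeat, note: string, intervalMs = 1000): () => void {
  const startedAt = Date.now();
  const timer = setInterval(() => emit(note, Date.now() - startedAt), intervalMs);
  return () => clearInterval(timer);
}

// Runs one long operation under a heartbeat.
// The finally block clears the timer whether the work resolves or throws.
async function withHeartbeat<T>(
  emit: EmitHeartbeat,
  note: string,
  work: () => Promise<T>,
  intervalMs = 1000,
): Promise<T> {
  const stop = startHeartbeatSketch(emit, note, intervalMs);
  try {
    return await work();
  } finally {
    stop();
  }
}
```

Without the `finally`, a query that throws would leave the interval running, and the process would keep emitting heartbeats for work that has already failed.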
|
||||
|
||||
### `finish`
|
||||
|
||||
Emitted when a phase completes normally.
|
||||
|
||||
```json
|
||||
{"event":"finish","phase":"import.files","done":52000,"total":52000,"elapsed_ms":187000,"ts":"..."}
|
||||
```
|
||||
|
||||
### `abort`
|
||||
|
||||
Emitted by a single process-level SIGINT/SIGTERM handler that tracks every
|
||||
live phase. After `abort`, no further events emit for that phase.
|
||||
|
||||
```json
|
||||
{"event":"abort","phase":"doctor.markdown_body_completeness","reason":"SIGINT","elapsed_ms":5300,"ts":"..."}
|
||||
```
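One way to picture the single-handler design is a registry of live phases that one signal listener drains. The `LivePhaseRegistry` class below is a hypothetical illustration, not the coordinator in `src/core/progress.ts`; timestamps are passed in explicitly so the sketch stays deterministic.

```typescript
interface AbortEventSketch {
  event: 'abort';
  phase: string;
  reason: string;
  elapsed_ms: number;
  ts: string;
}

class LivePhaseRegistry {
  // phase name -> start time in epoch ms
  private readonly live = new Map<string, number>();

  begin(phase: string, nowMs: number): void {
    this.live.set(phase, nowMs);
  }

  end(phase: string): void {
    this.live.delete(phase);
  }

  // Builds one abort event per live phase, then clears the registry
  // so no further events can be produced for those phases.
  abortAll(reason: string, nowMs: number): AbortEventSketch[] {
    const events: AbortEventSketch[] = [];
    this.live.forEach((startedAt, phase) => {
      events.push({
        event: 'abort',
        phase,
        reason,
        elapsed_ms: nowMs - startedAt,
        ts: new Date(nowMs).toISOString(),
      });
    });
    this.live.clear();
    return events;
  }
}
```

In a real process, a single `process.on('SIGINT', ...)` listener could call `abortAll('SIGINT', Date.now())`, which avoids registering one listener per reporter instance.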
|
||||
|
||||
## Phase names
|
||||
|
||||
Phases use `snake_case.dot.path` naming. A fresh reporter starts at the
|
||||
root; `child()` composition appends to the parent's current phase, so a
|
||||
sync that calls import emits `sync.import.<file>`, not `import.<file>`.
|
||||
|
||||
Stable phase names shipped in v0.15.2:
|
||||
|
||||
- `doctor.db_checks` (umbrella for all DB-side doctor checks)
|
||||
- `orphans.scan`
|
||||
- `embed.pages`
|
||||
- `extract.links_fs`, `extract.timeline_fs`, `extract.links_db`, `extract.timeline_db`
|
||||
- `import.files`
|
||||
- `sync.deletes`, `sync.renames`, `sync.imports`
|
||||
- `migrate.copy_pages`, `migrate.copy_links`
|
||||
- `repair_jsonb.run`, `repair_jsonb.<table>.<column>`
|
||||
- `backlinks.scan`
|
||||
- `lint.pages`
|
||||
- `integrity.auto`
|
||||
- `eval.single`, `eval.ab`
|
||||
- `export.pages`
|
||||
- `files.sync`
|
||||
|
||||
Sub-phases exposed via `child()`:
|
||||
|
||||
- `sync.import.files` — nested inside a sync
|
||||
- `apply_migrations.v0_12_2.jsonb_repair` — nested inside the orchestrator
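The dot-path composition can be sketched with a small immutable path type. `PhasePath` is a hypothetical illustration of the naming rule only; the real `child()` lives on the reporter and also carries its output options.

```typescript
// Immutable phase path: child() returns a new path with one more segment.
class PhasePath {
  private readonly segments: readonly string[];

  private constructor(segments: readonly string[]) {
    this.segments = segments;
  }

  static root(): PhasePath {
    return new PhasePath([]);
  }

  child(name: string): PhasePath {
    return new PhasePath([...this.segments, name]);
  }

  toString(): string {
    return this.segments.join('.');
  }
}

// A sync that calls import composes "sync" + "import" + "files".
const syncPhase = PhasePath.root().child('sync');
const nestedImport = syncPhase.child('import').child('files');
```

Keeping the parent path immutable means two sibling children never see each other's segments, so `nestedImport` is `sync.import.files` while `syncPhase` stays `sync`.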
|
||||
|
||||
## Subprocess inheritance
|
||||
|
||||
When a parent CLI spawns `gbrain …` child processes (mostly in
|
||||
`src/commands/migrations/*`), global flags (`--quiet`, `--progress-json`,
|
||||
`--progress-interval`) are propagated to the child's argv via the
|
||||
`childGlobalFlags()` helper in `src/core/cli-options.ts`. Child stderr
|
||||
passes straight through `stdio: 'inherit'` so the event stream is one
|
||||
merged JSONL feed on the parent's stderr.
|
||||
|
||||
One exception: the orchestrator phase in `migrations/v0_12_2.ts` that
|
||||
captures child stdout (`repair-jsonb --dry-run --json` for verification)
|
||||
does not pass `--progress-json` to avoid any risk of stdout pollution
|
||||
breaking the orchestrator's `JSON.parse`. Its stdio is explicit:
|
||||
`['ignore', 'pipe', 'inherit']` so stderr still flows through.
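A hedged sketch of what a flag-suffix helper might look like. `childGlobalFlagsSketch`, the `InheritedFlags` field names, and the leading-space convention are assumptions for illustration; the real `childGlobalFlags()` reads the `CliOptions` singleton and may format its output differently.

```typescript
// Assumed shape of the options that need to reach child processes.
interface InheritedFlags {
  quiet?: boolean;
  progressJson?: boolean;
  progressIntervalMs?: number;
}

// Returns a suffix such as " --progress-json" that can be appended to a
// child command line, or an empty string when nothing needs propagating.
function childGlobalFlagsSketch(opts: InheritedFlags): string {
  const flags: string[] = [];
  if (opts.quiet) flags.push('--quiet');
  if (opts.progressJson) flags.push('--progress-json');
  if (opts.progressIntervalMs !== undefined) {
    flags.push(`--progress-interval=${opts.progressIntervalMs}`);
  }
  return flags.length > 0 ? ' ' + flags.join(' ') : '';
}

// Building the command string an orchestrator would hand to execSync.
const childCommand = 'gbrain repair-jsonb' + childGlobalFlagsSketch({ progressJson: true, progressIntervalMs: 500 });
```

For the stdout-capturing exception described above, an orchestrator would simply skip the suffix for that one call.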
|
||||
|
||||
## Minion jobs
|
||||
|
||||
`gbrain jobs work` (the Minion worker daemon) keeps progress in the DB,
|
||||
not on stderr. Each Minion handler that runs a bulk core (embed, sync,
|
||||
extract, import, backlinks) calls `job.updateProgress({done, total,
|
||||
…})` per iteration. Agents read per-job progress via the
|
||||
`get_job_progress` MCP operation or `gbrain jobs get <id>`.
|
||||
|
||||
The `jobs work` daemon itself emits coarse one-line-per-job stderr output
|
||||
for liveness only. Per-page detail lives in the DB.
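The callback split can be sketched as one core function with an optional `onProgress` parameter. Everything named here (`embedAllSketch`, `FakeJob`, the `{done, total}` payload shape beyond those two fields) is hypothetical; the real handlers and `job.updateProgress` are not shown in this document.

```typescript
type OnProgress = (p: { done: number; total: number }) => void;

// Placeholder bulk core: reports after each item if a callback was supplied.
function embedAllSketch(slugs: string[], onProgress?: OnProgress): number {
  let done = 0;
  for (const _slug of slugs) {
    done += 1; // placeholder for real per-page embedding work
    if (onProgress) onProgress({ done, total: slugs.length });
  }
  return done;
}

// Stand-in for a Minion job whose progress is persisted to the DB.
class FakeJob {
  progress: { done: number; total: number } | undefined = undefined;
  updates = 0;

  updateProgress(p: { done: number; total: number }): void {
    this.progress = p;
    this.updates += 1;
  }
}

// Handler side: route core progress into the job record instead of stderr.
function embedHandlerSketch(job: FakeJob, slugs: string[]): number {
  return embedAllSketch(slugs, (p) => job.updateProgress(p));
}
```

The arrow wrapper keeps `this` bound to the job; the same core function can be driven from the CLI by passing a callback that ticks a stderr reporter instead.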
|
||||
|
||||
## Compatibility
|
||||
|
||||
- **Added**: only. A new event type, a new field, a new phase name — all
|
||||
safe. Agents must ignore unknown fields and unknown event types.
|
||||
- **Removed/renamed**: never without a major version bump.
|
||||
- **Schema changes**: announced in `CHANGELOG.md` and in
|
||||
`skills/migrations/v<next>.md`.
|
||||
|
||||
If your agent depends on this schema and something surprises you, open
|
||||
an issue with the event you received and what you expected.
|
||||
@@ -148,8 +148,12 @@ strict behavior when unset.
|
||||
- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry <version>` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed.
|
||||
- `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent.
|
||||
- `src/commands/orphans.ts` — `gbrain orphans [--json] [--count] [--include-pseudo]`: surfaces pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. Shipped in v0.12.3 (contributed by @knee5).
|
||||
- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`.
|
||||
- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, and `gbrain apply-migrations`.
|
||||
- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows.
|
||||
- `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2.
|
||||
- `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=<ms>` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers.
|
||||
- `scripts/check-progress-to-stdout.sh` — CI guard against regressing to `\r`-on-stdout progress. Wired into `bun run test` via `scripts/check-progress-to-stdout.sh && bun test` in package.json.
|
||||
- `docs/progress-events.md` — Canonical JSON event schema reference. Stable from v0.15.2, additive only.
|
||||
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).
|
||||
- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces (a) the `${JSON.stringify(x)}::jsonb` interpolation pattern (postgres.js v3 double-encodes it), or (b) `max_stalled INTEGER NOT NULL DEFAULT 1` in any schema source file (v0.15.1 #219 regression guard — must be DEFAULT 5 to preserve SIGKILL-rescue). Wired into `bun test`.
|
||||
- `scripts/llms-config.ts` + `scripts/build-llms.ts` — Generator for `llms.txt` (llmstxt.org-spec web index) + `llms-full.txt` (inlined single-fetch bundle). Curated config drives both. Run `bun run build:llms` after adding a new doc. `LLMS_REPO_BASE` env var lets forks regenerate with their own URL base. `FULL_SIZE_BUDGET` (600KB) caps the inline bundle; generator WARNs if exceeded. Committed output is not analogous to `schema-embedded.ts` (no runtime consumer); we commit for GitHub browsing and fork-safe fetching.
|
||||
@@ -372,6 +376,38 @@ testing, soul-audit, webhook-transforms, data-research, minion-orchestrator.
|
||||
model-routing, test-before-bulk, cross-modal). `skills/_brain-filing-rules.md` and
|
||||
`skills/_output-rules.md` are shared references.
|
||||
|
||||
## Bulk-action progress reporting
|
||||
|
||||
All bulk commands (doctor, embed, import, export, sync, extract, migrate,
|
||||
repair-jsonb, orphans, check-backlinks, lint, integrity auto, eval, files
|
||||
sync, and apply-migrations) stream progress through the shared reporter
|
||||
at `src/core/progress.ts`. Agents get heartbeats within 1 second of every
|
||||
iteration regardless of how slow the underlying work is.
|
||||
|
||||
Rules:
|
||||
- Progress always writes to **stderr**. Stdout stays clean for data output
|
||||
(`--json` payloads, final summaries, JSON action events from `extract`).
|
||||
- Non-TTY default: plain one-line-per-event human text. JSON requires the
|
||||
explicit `--progress-json` flag.
|
||||
- Global flags (`--quiet`, `--progress-json`, `--progress-interval=<ms>`)
|
||||
are parsed by `src/core/cli-options.ts` BEFORE command dispatch.
|
||||
- Phase names are machine-stable `snake_case.dot.path` (e.g.
|
||||
`doctor.db_checks`, `sync.imports`). Documented in
|
||||
`docs/progress-events.md`; additive changes only.
|
||||
- `scripts/check-progress-to-stdout.sh` is a CI guard that fails the build
|
||||
if any new code writes `\r` progress to stdout. Wired into `bun run test`.
|
||||
- Minion handlers pass `job.updateProgress` as the `onProgress` callback
|
||||
to core functions (DB-backed primary progress channel); stderr from
|
||||
`jobs work` stays coarse for daemon liveness only.
|
||||
|
||||
When wiring a new bulk command: `import { createProgress } from '../core/progress.ts'`
|
||||
and `import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'`.
|
||||
Create a reporter with `createProgress(cliOptsToProgressOptions(getCliOptions()))`,
|
||||
`start(phase, total?)` before the loop, `tick()` inside it, `finish()` after.
|
||||
For single long-running queries, use `startHeartbeat(reporter, note)` with a
|
||||
try/finally to guarantee cleanup. Never call `process.stdout.write('\r...')`
|
||||
in bulk paths; the CI guard will fail the build.
|
||||
|
||||
## Build
|
||||
|
||||
`bun build --compile --outfile bin/gbrain src/cli.ts`
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "gbrain",
|
||||
"version": "0.15.1",
|
||||
"version": "0.15.2",
|
||||
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
|
||||
"type": "module",
|
||||
"main": "src/core/index.ts",
|
||||
@@ -21,9 +21,10 @@
|
||||
"build:all": "bun build --compile --target=bun-darwin-arm64 --outfile bin/gbrain-darwin-arm64 src/cli.ts && bun build --compile --target=bun-linux-x64 --outfile bin/gbrain-linux-x64 src/cli.ts",
|
||||
"build:schema": "bash scripts/build-schema.sh",
|
||||
"build:llms": "bun run scripts/build-llms.ts",
|
||||
"test": "scripts/check-jsonb-pattern.sh && bun test",
|
||||
"test": "scripts/check-jsonb-pattern.sh && scripts/check-progress-to-stdout.sh && bun test",
|
||||
"test:e2e": "bash scripts/run-e2e.sh",
|
||||
"check:jsonb": "scripts/check-jsonb-pattern.sh",
|
||||
"check:progress": "scripts/check-progress-to-stdout.sh",
|
||||
"postinstall": "command -v gbrain >/dev/null 2>&1 && gbrain apply-migrations --yes --non-interactive || echo '[gbrain] postinstall skipped. If installed via bun install -g github:...: run `gbrain doctor` and `gbrain apply-migrations --yes` manually. See https://github.com/garrytan/gbrain/issues/218' 1>&2",
|
||||
"prepublish:clawhub": "bun run build:all",
|
||||
"publish:clawhub": "clawhub package publish . --family bundle-plugin"
|
||||
|
||||
63
scripts/check-progress-to-stdout.sh
Executable file
@@ -0,0 +1,63 @@
|
||||
#!/usr/bin/env bash
|
||||
# CI guard: fail if any new code emits \r-progress to stdout.
|
||||
#
|
||||
# Since v0.15.2, bulk-action progress lives on stderr via the shared
|
||||
# src/core/progress.ts reporter. \r-rewriting on stdout breaks every
|
||||
# piped-output scenario: agents that capture stdout for structured
|
||||
# results see progress garbage mixed with the data, and CI logs show
|
||||
# a single line per command because everything after the last \r
|
||||
# is truncated by the terminal emulator when played back.
|
||||
#
|
||||
# This script greps for the anti-pattern. Legitimate uses of \r inside
|
||||
# string literals (e.g. Windows line-ending normalization, regex
|
||||
# patterns) are expected to contain \r without being preceded by
|
||||
# `process.stdout.write`. We match the full write-call form only.
|
||||
#
|
||||
# Usage: scripts/check-progress-to-stdout.sh
|
||||
# Exit: 0 when clean, 1 when a banned pattern is found.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
|
||||
cd "$ROOT"
|
||||
|
||||
# The banned pattern: process.stdout.write('\r... or process.stdout.write("\r...
|
||||
# The quote character class matches backticks, single quotes, and double quotes.
|
||||
PATTERN="process\.stdout\.write\([\`'\"]\\\\r"
|
||||
|
||||
# Files allowed to use this pattern historically. Empty allowlist — the point
|
||||
# of v0.15.2 was to remove every one of them. Add entries only if you really
|
||||
# need a \r on stdout (if so, add the rationale as a comment at the call site
|
||||
# and list the file here).
|
||||
ALLOWLIST=()
|
||||
|
||||
matches=""
|
||||
if command -v rg >/dev/null 2>&1; then
|
||||
matches="$(rg -n --no-heading "$PATTERN" src/ 2>/dev/null || true)"
|
||||
else
|
||||
matches="$(grep -rEn "$PATTERN" src/ 2>/dev/null || true)"
|
||||
fi
|
||||
|
||||
if [ -n "$matches" ]; then
|
||||
# Filter out allowlisted files.
|
||||
filtered="$matches"
|
||||
for f in "${ALLOWLIST[@]:-}"; do
|
||||
[ -z "$f" ] && continue
|
||||
filtered="$(echo "$filtered" | grep -v "^${f}:" || true)"
|
||||
done
|
||||
|
||||
if [ -n "$filtered" ]; then
|
||||
echo "ERROR: found process.stdout.write('\\r…') pattern(s) in src/:"
|
||||
echo
|
||||
echo "$filtered"
|
||||
echo
|
||||
echo "Bulk-action progress must go through src/core/progress.ts"
|
||||
echo "(writes to stderr, handles TTY vs non-TTY, honors --quiet /"
|
||||
echo " --progress-json / --progress-interval). If you genuinely"
|
||||
echo "need a \\r on stdout, add the file to the ALLOWLIST at the"
|
||||
echo "top of this script and explain why at the call site."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "check-progress-to-stdout: OK (no banned stdout \\r patterns)"
|
||||
164
skills/migrations/v0.15.2.md
Normal file
@@ -0,0 +1,164 @@
|
||||
---
|
||||
version: 0.15.2
|
||||
feature_pitch:
|
||||
headline: "Silent binaries are dead. Every bulk action now heartbeats."
|
||||
description: |
|
||||
`gbrain doctor` on a 52K-page brain used to sit silent for 10+
|
||||
minutes before an agent timeout killed it. Same pattern on embed,
|
||||
sync, import, extract, migrate, and every orchestrator. v0.15.2
|
||||
routes 14 bulk commands through one shared reporter that writes
|
||||
to stderr. Non-TTY default is plain human lines; agents that
|
||||
want structured events add `--progress-json` and get one JSON
|
||||
object per line. Stdout stays clean for data output. Event
|
||||
schema is locked in docs/progress-events.md.
|
||||
recipe: docs/progress-events.md
|
||||
tiers: null
|
||||
---
|
||||
|
||||
# v0.15.2 Migration: Bulk-action progress streaming
|
||||
|
||||
**Audience: host agents reading this after `gbrain apply-migrations`
|
||||
has run. v0.15.2 is purely additive to the CLI surface: there is no
|
||||
schema change, no data rewrite, and no orchestrator for this release.**
|
||||
Your binaries just got observable. This file tells you how to use it.
|
||||
|
||||
## Mechanical migration: nothing
|
||||
|
||||
There is no mechanical step. If `gbrain upgrade` completed, progress
|
||||
events are already flowing the next time you invoke a bulk command.
|
||||
Read on to know what's there and how to consume it.
|
||||
|
||||
## What's new at the CLI
|
||||
|
||||
### Three new global flags
|
||||
|
||||
These work on any `gbrain` subcommand:
|
||||
|
||||
- `--progress-json` — emit one JSON event per line on stderr.
|
||||
- `--quiet` — suppress progress output entirely.
|
||||
- `--progress-interval=<ms>` — minimum ms between progress emits
|
||||
(default 1000).
|
||||
|
||||
Parsed before command dispatch, so both work:
|
||||
|
||||
```
|
||||
gbrain --progress-json doctor --json
|
||||
gbrain doctor --json --progress-json
|
||||
```
|
||||
|
||||
### Per-TTY behavior
|
||||
|
||||
Without `--progress-json`:
|
||||
|
||||
- **TTY:** `\r`-rewriting single-line progress on stderr (fancy).
|
||||
- **Non-TTY (pipe, CI, agent):** one plain-text line per event on
|
||||
stderr. No JSON, no noise. Human-readable.
|
||||
|
||||
The default was deliberately NOT JSON-on-non-TTY. Shell pipelines
|
||||
that just pipe `gbrain ... | less` should get readable logs, not a
|
||||
JSON blob. Agents opt in to JSON explicitly.

## What's new per command

Fourteen commands now stream progress through the shared reporter:

| Command | What you'll see |
|---------|-----------------|
| `doctor` | `doctor.db_checks` phase + per-check heartbeats, including a 1s heartbeat while `markdown_body_completeness` scans |
| `orphans` | `orphans.scan` heartbeat while the anti-join runs |
| `embed` | `embed.pages` with per-page ticks |
| `files sync` | `files.sync` with per-file ticks |
| `export` | `export.pages` with per-page ticks |
| `import` | `import.files` with per-file ticks (replaces per-100 stdout logs) |
| `extract [links\|timeline\|all]` (fs + db) | `extract.links_fs` / `extract.timeline_db` etc. |
| `sync` | `sync.deletes`, `sync.renames`, `sync.imports` phases |
| `migrate --to ...` | `migrate.copy_pages`, `migrate.copy_links` |
| `repair-jsonb` | `repair_jsonb.run` + per-column heartbeats |
| `check-backlinks` | `backlinks.scan` heartbeat |
| `lint` | `lint.pages` per-page ticks |
| `integrity auto` | `integrity.auto` per-page ticks |
| `eval` | `eval.single` / `eval.ab` per-query ticks |
| `apply-migrations` (v0_11/v0_12_0/v0_12_2) | Child processes inherit the parent's progress mode |

## JSON event schema

Documented in `docs/progress-events.md` (canonical reference). Stable
from v0.15.2, additive changes only.

Quick agent cheat sheet:

```json
{"event":"start","phase":"doctor.db_checks","ts":"..."}
{"event":"tick","phase":"orphans.scan","done":15000,"total":52000,"pct":28.8,"elapsed_ms":4200,"eta_ms":10300,"ts":"..."}
{"event":"heartbeat","phase":"doctor.markdown_body_completeness","note":"scanning pages for truncation...","elapsed_ms":1000,"ts":"..."}
{"event":"finish","phase":"doctor.db_checks","elapsed_ms":187000,"ts":"..."}
{"event":"abort","phase":"orphans.scan","reason":"SIGINT","elapsed_ms":5300,"ts":"..."}
```

Parser rules:

1. One JSON object per line on stderr.
2. Ignore unknown event types and unknown fields. Schema is additive.
3. Group by `phase` prefix to track one run: all `doctor.*` events
   belong to the same `doctor` invocation.
4. `total` / `pct` / `eta_ms` are absent when the scan doesn't have a
   total up front (e.g. heartbeat-only paths). Don't assume they exist.
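
The four parser rules can be sketched as a tolerant line parser. The field names come from the cheat sheet above; the `ProgressEvent` interface and the `parseProgressLine` / `describeEvent` helpers are hypothetical names for this example, and `docs/progress-events.md` remains the canonical schema.

```typescript
// Field names mirror the cheat sheet; everything except event/phase is optional.
interface ProgressEvent {
  event: string;
  phase: string;
  done?: number;
  total?: number;
  pct?: number;
  eta_ms?: number;
  elapsed_ms?: number;
  note?: string;
  reason?: string;
  ts?: string;
}

const KNOWN_EVENTS = new Set(['start', 'tick', 'heartbeat', 'finish', 'abort']);

// Rules 1 and 2: one object per line; skip anything unparseable or unknown.
function parseProgressLine(line: string): ProgressEvent | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith('{')) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const rec = parsed as Record<string, unknown>;
  if (typeof rec.event !== 'string' || typeof rec.phase !== 'string') return null;
  if (!KNOWN_EVENTS.has(rec.event)) return null;
  // Unknown extra fields are simply carried along and ignored by callers.
  return rec as unknown as ProgressEvent;
}

function describeEvent(ev: ProgressEvent): string {
  // Rule 3: the first dot-separated segment identifies the invocation.
  const command = ev.phase.split('.')[0];
  // Rule 4: only use done/total when they are actually present.
  if (ev.event === 'tick' && typeof ev.done === 'number' && typeof ev.total === 'number') {
    return `${command}: ${ev.done}/${ev.total}`;
  }
  return `${command}: ${ev.event}`;
}
```

Returning `null` for unknown event types, rather than throwing, is what lets a consumer written against this schema version keep working when later versions add events.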

## Minion jobs

`gbrain jobs work` (the Minion worker daemon) writes progress to the
DB via `job.updateProgress`, not to stderr. Read per-job progress via
the `get_job_progress` MCP op or:

```bash
gbrain jobs submit embed
# while it runs:
gbrain jobs get <id>    # .progress updates live as the handler ticks
```

The `embed` Minion handler is wired as of v0.15.2. Other bulk cores
(`sync`, `extract`, `backlinks`, `import`, `autopilot-cycle`) have the
callback plumbing ready and will follow.
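
The "callback plumbing" idea is that a bulk core reports through a plain callback and each caller decides where progress goes. The sketch below is a simplified, synchronous stand-in: `runBulkCore`, `lineSink`, `jobSink`, and `JobLike` are hypothetical names, and the real handler calls `job.updateProgress` rather than assigning a field.

```typescript
type OnProgress = (done: number, total: number) => void;

// Hypothetical bulk core: it knows about the callback, not about stderr or the DB.
function runBulkCore(items: string[], onProgress?: OnProgress): number {
  let done = 0;
  for (const _item of items) {
    // ... real per-item work would happen here ...
    done++;
    onProgress?.(done, items.length);
  }
  return done;
}

// CLI-style sink: collects one line per tick (stand-in for a stderr reporter).
function lineSink(lines: string[]): OnProgress {
  return (done, total) => {
    lines.push(`${done}/${total}`);
  };
}

// Minion-style sink: keeps only the latest snapshot (stand-in for a DB write).
interface JobLike {
  progress: { done: number; total: number } | null;
}

function jobSink(job: JobLike): OnProgress {
  return (done, total) => {
    job.progress = { done, total };
  };
}

const job: JobLike = { progress: null };
runBulkCore(['a', 'b', 'c'], jobSink(job));
console.log(job.progress);
```

With this split the same core runs unchanged under the CLI and under the worker daemon; only the sink differs.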

## Backward-compatibility warnings

Five commands moved per-page progress from stdout to stderr:

- `embed` (was `\r`-on-stdout)
- `files sync` (was `\r`-on-stdout)
- `export` (was `\r`-on-stdout, newly in scope)
- `migrate-engine` (was per-50 `console.log` to stdout)
- `import` (was per-100 `console.log` to stdout)

If you have scripts that grep `stdout` for progress strings like
`Progress: 1234/52000` or `\r 1234/52000 pages...` — those strings
now live on stderr. The final data summaries (`Embedded N chunks
across M pages`, `Import complete`, etc.) remain on stdout so the
"did it finish" signal is unchanged.

`integrity auto` still writes `~/.gbrain/integrity-progress.jsonl`,
but its role is now "resume marker only" — live progress goes through
the reporter. If you depended on tailing that file for real-time
progress, switch to the stderr stream.

## Verification

```bash
# Your agent sees structured events; stdout stays JSON-parseable:
gbrain --progress-json doctor --json > doctor.json 2> doctor.progress.log
wc -l doctor.progress.log   # should be non-zero
jq . doctor.json            # should parse cleanly

# For a very large brain, watch the heartbeat:
gbrain --progress-json doctor 2>&1 >/dev/null | grep '"event"'
```
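
An agent that reads stderr programmatically, rather than through `grep`, receives arbitrary chunks that may end mid-line, so it needs to reassemble whole lines before parsing each one as JSON. The `LineBuffer` class below is a hypothetical helper sketching that step; it is not part of gbrain.

```typescript
// Hypothetical helper: turns arbitrary stream chunks into complete lines.
class LineBuffer {
  private pending = '';

  // Append a chunk and return every line completed by it.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split('\n');
    // The last element is either '' (chunk ended on a newline) or a partial line.
    this.pending = parts.pop() ?? '';
    return parts.filter((line) => line.length > 0);
  }

  // Call once the stream closes to recover a final line with no trailing newline.
  flush(): string[] {
    const rest = this.pending;
    this.pending = '';
    return rest.length > 0 ? [rest] : [];
  }
}

const buf = new LineBuffer();
console.log(buf.push('{"event":"st'));
console.log(buf.push('art","phase":"doctor.db_checks"}\n'));
```

Each string returned by `push()` is one complete line, ready to hand to whatever JSON parsing the agent applies per event.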

If you see silence for more than a second or two on a non-trivial
command, file an issue with the exact command and the first 100 lines
of stderr.

## That's the whole migration

No mechanical step. No config change. Agents that parse `stdout` keep
working; agents that want progress now have it on a clean stderr
channel with a documented schema.

10
src/cli.ts
@@ -6,6 +6,7 @@ import type { BrainEngine } from './core/engine.ts';
 import { operations, OperationError } from './core/operations.ts';
 import type { Operation, OperationContext } from './core/operations.ts';
 import { serializeMarkdown } from './core/markdown.ts';
+import { parseGlobalFlags, setCliOptions, getCliOptions } from './core/cli-options.ts';
 import { VERSION } from './version.ts';
 
 // Build CLI name -> operation lookup
@@ -21,7 +22,13 @@ for (const op of operations) {
 const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans']);
 
 async function main() {
-  const args = process.argv.slice(2);
+  // Parse global flags (--quiet / --progress-json / --progress-interval)
+  // BEFORE command dispatch, so `gbrain --progress-json doctor` works.
+  // The stripped argv is what the command sees.
+  const rawArgs = process.argv.slice(2);
+  const { cliOpts, rest: args } = parseGlobalFlags(rawArgs);
+  setCliOptions(cliOpts);
+
   let command = args[0];
 
   if (!command || command === '--help' || command === '-h') {
@@ -148,6 +155,7 @@ function makeContext(engine: BrainEngine, params: Record<string, unknown>): Oper
     // Local CLI invocation — the user owns the machine; do not apply remote-caller
     // confinement (e.g., cwd-locked file_upload).
     remote: false,
+    cliOpts: getCliOptions(),
   };
 }
 

@@ -13,6 +13,8 @@
 import { readFileSync, writeFileSync, readdirSync, statSync, lstatSync, existsSync } from 'fs';
 import { join, relative, basename } from 'path';
 import { extractEntityRefs as canonicalExtractEntityRefs } from '../core/link-extraction.ts';
+import { createProgress, startHeartbeat } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 
 interface BacklinkGap {
   /** The page that mentions the entity */
@@ -201,7 +203,18 @@ export async function runBacklinksCore(opts: BacklinksOpts): Promise<BacklinksRe
     throw new Error(`Directory not found: ${opts.dir}`);
   }
 
-  const gaps = findBacklinkGaps(opts.dir);
+  // findBacklinkGaps is a sync double-walk of the brain dir. On 50K-page
+  // brains that can take seconds — heartbeat so agents see we're working.
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('backlinks.scan');
+  const stopHb = startHeartbeat(progress, 'walking pages for missing back-links…');
+  let gaps: BacklinkGap[];
+  try {
+    gaps = findBacklinkGaps(opts.dir);
+  } finally {
+    stopHb();
+    progress.finish();
+  }
   const pagesAffected = new Set(gaps.map(g => g.targetPage)).size;
 
   if (opts.action === 'fix' && gaps.length > 0) {

@@ -4,6 +4,8 @@ import { LATEST_VERSION } from '../core/migrate.ts';
 import { checkResolvable } from '../core/check-resolvable.ts';
 import { autoFixDryViolations, type AutoFixReport, type FixOutcome } from '../core/dry-fix.ts';
 import { loadCompletedMigrations } from '../core/preferences.ts';
+import { createProgress, startHeartbeat, type ProgressReporter } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 import type { DbUrlSource } from '../core/config.ts';
 import { join } from 'path';
 import { existsSync, readFileSync, readdirSync } from 'fs';
@@ -33,6 +35,12 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   const checks: Check[] = [];
   let autoFixReport: AutoFixReport | null = null;
 
+  // Progress reporter. `--json` is doctor's own JSON output (list of checks);
+  // progress events stay on stderr regardless, gated by the global --quiet /
+  // --progress-json flags. On a 52K-page brain the DB checks can take minutes,
+  // and without a heartbeat agents can't tell doctor from a hang.
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+
   // --- Filesystem checks (always run, no DB needed) ---
 
   // 1. Resolver health
@@ -196,19 +204,27 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
     return;
   }
 
+  // DB checks phase — start a single reporter phase so agents see which
+  // check is running (several take seconds on 50K-page brains; without a
+  // heartbeat the binary looks hung when stdout is piped).
+  progress.start('doctor.db_checks');
+
   // 3. Connection
+  progress.heartbeat('connection');
   try {
     const stats = await engine.getStats();
     checks.push({ name: 'connection', status: 'ok', message: `Connected, ${stats.page_count} pages` });
   } catch (e: unknown) {
     const msg = e instanceof Error ? e.message : String(e);
     checks.push({ name: 'connection', status: 'fail', message: msg });
+    progress.finish();
     const earlyFail2 = outputResults(checks, jsonOutput);
     process.exit(earlyFail2 ? 1 : 0);
     return;
   }
 
   // 4. pgvector extension
+  progress.heartbeat('pgvector');
   try {
     const sql = db.getConnection();
     const ext = await sql`SELECT extname FROM pg_extension WHERE extname = 'vector'`;
@@ -222,6 +238,7 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   }
 
   // 5. RLS
+  progress.heartbeat('rls');
   try {
     const sql = db.getConnection();
     const tables = await sql`
@@ -246,6 +263,7 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   // never ran. That's the same class as a half-migrated install, just from a
   // different root cause (Bun blocked our top-level postinstall on global
   // install). Message is actionable either way.
+  progress.heartbeat('schema_version');
   let schemaVersion = 0;
   try {
     const version = await engine.getConfig('version');
@@ -278,6 +296,7 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   // but `apply-migrations` didn't follow up.
 
   // 7. Embedding health
+  progress.heartbeat('embeddings');
   try {
     const health = await engine.getHealth();
     const pct = (health.embed_coverage * 100).toFixed(0);
@@ -293,6 +312,8 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   }
 
   // 8. Graph health (link + timeline coverage on entity pages).
+  // dead_links removed in v0.10.1: ON DELETE CASCADE on link FKs makes it always 0.
+  progress.heartbeat('graph_coverage');
   try {
     const health = await engine.getHealth();
     const linkPct = ((health.link_coverage ?? 0) * 100).toFixed(0);
@@ -335,6 +356,8 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   // Read-only — no network, no writes, no resolver calls. Samples the first
   // 500 pages by slug order and surfaces bare-tweet + dead-link counts as a
   // warning. Full-brain scan: `gbrain integrity check`.
+  progress.heartbeat('integrity_sample');
+  const integrityHb = startHeartbeat(progress, 'scanning 500-page integrity sample…');
   try {
     const { scanIntegrity } = await import('./integrity.ts');
     const res = await scanIntegrity(engine, { limit: 500 });
@@ -360,24 +383,31 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
     }
   } catch (e) {
     checks.push({ name: 'integrity', status: 'warn', message: `integrity scan skipped: ${e instanceof Error ? e.message : String(e)}` });
+  } finally {
+    integrityHb();
   }
 
   // 10. JSONB integrity (v0.12.3 reliability wave).
   // v0.12.0's JSON.stringify()::jsonb pattern stored JSONB string literals
   // instead of objects on real Postgres. PGLite masked this; Supabase did not.
-  // Scan the 4 known sites (pages.frontmatter, raw_data.data, ingest_log.pages_updated,
-  // files.metadata) for rows whose top-level jsonb_typeof is 'string'.
+  // Scan 5 known write sites for rows whose top-level jsonb_typeof is
+  // 'string'. `page_versions.frontmatter` added in v0.15.2 so doctor's
+  // surface matches `repair-jsonb` (the previous 4-target scan missed a
+  // repair target, per #254/Codex review).
+  progress.heartbeat('jsonb_integrity');
   try {
     const sql = db.getConnection();
     const targets: Array<{ table: string; col: string; expected: 'object' | 'array' }> = [
-      { table: 'pages', col: 'frontmatter', expected: 'object' },
-      { table: 'raw_data', col: 'data', expected: 'object' },
-      { table: 'ingest_log', col: 'pages_updated', expected: 'array' },
-      { table: 'files', col: 'metadata', expected: 'object' },
+      { table: 'pages', col: 'frontmatter', expected: 'object' },
+      { table: 'raw_data', col: 'data', expected: 'object' },
+      { table: 'ingest_log', col: 'pages_updated', expected: 'array' },
+      { table: 'files', col: 'metadata', expected: 'object' },
+      { table: 'page_versions', col: 'frontmatter', expected: 'object' },
     ];
     let totalBad = 0;
     const breakdown: string[] = [];
     for (const { table, col } of targets) {
+      progress.heartbeat(`jsonb_integrity.${table}.${col}`);
       const rows = await sql.unsafe(
         `SELECT count(*)::int AS n FROM ${table} WHERE jsonb_typeof(${col}) = 'string'`,
       );
@@ -401,6 +431,12 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   // v0.12.0's splitBody ate everything after the first `---` horizontal rule,
   // truncating wiki-style pages. Heuristic: pages whose body is <30% of the
   // raw source content length when raw has multiple H2/H3 boundaries.
+  //
+  // No total on this check: the regex scan over rd.data -> 'content' is a
+  // sequential scan that LIMIT 100 bounds only the output, not the scan
+  // work. We heartbeat every second so agents see life, no fake totals.
+  progress.heartbeat('markdown_body_completeness');
+  const mbcHb = startHeartbeat(progress, 'scanning pages for truncation…');
   try {
     const sql = db.getConnection();
     const rows = await sql`
@@ -428,9 +464,11 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   } catch {
     // pages_raw.raw_data may not exist on older schemas; best-effort.
     checks.push({ name: 'markdown_body_completeness', status: 'ok', message: 'Skipped (raw_data unavailable)' });
+  } finally {
+    mbcHb();
   }
 
-  // 11. Index audit (opt-in via --index-audit). v0.13.1 follow-up to #170.
+  // 12. Index audit (opt-in via --index-audit). v0.13.1 follow-up to #170.
   // Reports indexes with zero recorded scans on Postgres. Informational only;
   // we DO NOT auto-drop. On #170's brain, idx_pages_frontmatter and
   // idx_pages_trgm showed 0 scans — the suggestion there is "consider
@@ -438,6 +476,7 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
   // fresh install is also normal (nothing has queried yet); the real signal
   // is zero scans on a long-running active brain.
   if (args.includes('--index-audit')) {
+    progress.heartbeat('index_audit');
     if (engine.kind === 'pglite') {
       checks.push({
         name: 'index_audit',
@@ -475,6 +514,8 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo
     }
   }
 
+  progress.finish();
+
   const hasFail = outputResults(checks, jsonOutput);
 
   // Features teaser (non-JSON, non-failing only)

@@ -2,6 +2,8 @@ import type { BrainEngine } from '../core/engine.ts';
 import { embedBatch } from '../core/embedding.ts';
 import type { ChunkInput } from '../core/types.ts';
 import { chunkText } from '../core/chunkers/recursive.ts';
+import { createProgress, type ProgressReporter } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 
 export interface EmbedOpts {
   /** Embed ALL pages (every chunk). */
@@ -12,6 +14,13 @@ export interface EmbedOpts {
   slugs?: string[];
   /** Embed a single page. */
   slug?: string;
+  /**
+   * Optional progress callback. Called after each page. CLI wrappers
+   * supply a reporter.tick()-backed implementation; Minion handlers
+   * supply a job.updateProgress()-backed one so per-job progress lives
+   * in the DB where `gbrain jobs get` can read it.
+   */
+  onProgress?: (done: number, total: number, embedded: number) => void;
 }
 
 /**
@@ -29,7 +38,7 @@ export async function runEmbedCore(engine: BrainEngine, opts: EmbedOpts): Promis
     return;
   }
   if (opts.all || opts.stale) {
-    await embedAll(engine, !!opts.stale);
+    await embedAll(engine, !!opts.stale, opts.onProgress);
     return;
   }
   if (opts.slug) {
@@ -58,9 +67,24 @@ export async function runEmbed(engine: BrainEngine, args: string[]) {
     opts = { slug };
   }
 
+  // CLI path: wire a reporter so --progress-json / --quiet / TTY rendering
+  // all work. Minion handlers call runEmbedCore directly with their own
+  // onProgress (see jobs.ts).
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  let progressStarted = false;
+  opts.onProgress = (done, total, _embedded) => {
+    if (!progressStarted) {
+      progress.start('embed.pages', total);
+      progressStarted = true;
+    }
+    progress.tick(1);
+  };
+
   try {
     await runEmbedCore(engine, opts);
+    if (progressStarted) progress.finish();
   } catch (e) {
+    if (progressStarted) progress.finish();
     console.error(e instanceof Error ? e.message : String(e));
     process.exit(1);
   }
@@ -117,7 +141,11 @@ async function embedPage(engine: BrainEngine, slug: string) {
   console.log(`${slug}: embedded ${toEmbed.length} chunks`);
 }
 
-async function embedAll(engine: BrainEngine, staleOnly: boolean) {
+async function embedAll(
+  engine: BrainEngine,
+  staleOnly: boolean,
+  onProgress?: (done: number, total: number, embedded: number) => void,
+) {
   const pages = await engine.listPages({ limit: 100000 });
   let total = 0;
   let embedded = 0;
@@ -141,7 +169,7 @@ async function embedAll(engine: BrainEngine, staleOnly: boolean) {
 
     if (toEmbed.length === 0) {
       processed++;
-      process.stdout.write(`\r ${processed}/${pages.length} pages, ${embedded} chunks embedded`);
+      onProgress?.(processed, pages.length, embedded);
       return;
     }
 
@@ -168,7 +196,7 @@ async function embedAll(engine: BrainEngine, staleOnly: boolean) {
 
     total += toEmbed.length;
     processed++;
-    process.stdout.write(`\r ${processed}/${pages.length} pages, ${embedded} chunks embedded`);
+    onProgress?.(processed, pages.length, embedded);
   }
 
   // Sliding worker pool: N workers share a queue and each pulls the
@@ -187,5 +215,6 @@ async function embedAll(engine: BrainEngine, staleOnly: boolean) {
   const numWorkers = Math.min(CONCURRENCY, pages.length);
   await Promise.all(Array.from({ length: numWorkers }, () => worker()));
 
-  console.log(`\n\nEmbedded ${embedded} chunks across ${pages.length} pages`);
+  // Stdout summary preserved for scripts/tests that grep for counts.
+  console.log(`Embedded ${embedded} chunks across ${pages.length} pages`);
 }

@@ -50,17 +50,28 @@ export async function runEvalCommand(engine: BrainEngine, args: string[]): Promi
   const k = opts.k ?? 5;
   const configA = buildConfig(opts, 'a');
 
+  const { createProgress } = await import('../core/progress.ts');
+  const { getCliOptions, cliOptsToProgressOptions } = await import('../core/cli-options.ts');
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+
   if (opts.configB || opts.configBPath) {
     // A/B comparison mode
     const configB = buildConfig(opts, 'b');
+    progress.start('eval.ab', qrels.length * 2);
+    const onProgress = (_done: number, _total: number, q: string) => progress.tick(1, q);
     const [reportA, reportB] = await Promise.all([
-      runEval(engine, qrels, configA, k),
-      runEval(engine, qrels, configB, k),
+      runEval(engine, qrels, configA, k, { onProgress }),
+      runEval(engine, qrels, configB, k, { onProgress }),
     ]);
+    progress.finish();
     printABTable(reportA, reportB, k);
   } else {
     // Single-run mode
-    const report = await runEval(engine, qrels, configA, k);
+    progress.start('eval.single', qrels.length);
+    const report = await runEval(engine, qrels, configA, k, {
+      onProgress: (_done, _total, q) => progress.tick(1, q),
+    });
+    progress.finish();
     printSingleTable(report);
   }
 }

@@ -2,6 +2,8 @@ import { writeFileSync, mkdirSync } from 'fs';
 import { join, dirname } from 'path';
 import type { BrainEngine } from '../core/engine.ts';
 import { serializeMarkdown } from '../core/markdown.ts';
+import { createProgress } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 
 export async function runExport(engine: BrainEngine, args: string[]) {
   const dirIdx = args.indexOf('--dir');
@@ -10,6 +12,10 @@ export async function runExport(engine: BrainEngine, args: string[]) {
   const pages = await engine.listPages({ limit: 100000 });
   console.log(`Exporting ${pages.length} pages to ${outDir}/`);
 
+  // Progress on stderr so stdout stays clean for scripts parsing counts.
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('export.pages', pages.length);
+
   let exported = 0;
 
   for (const page of pages) {
@@ -41,10 +47,10 @@ export async function runExport(engine: BrainEngine, args: string[]) {
     }
 
     exported++;
-    if (exported % 100 === 0) {
-      process.stdout.write(`\r ${exported}/${pages.length} exported`);
-    }
+    progress.tick();
   }
 
-  console.log(`\nExported ${exported} pages to ${outDir}/`);
+  progress.finish();
+  // Stdout summary preserved so scripts that grep for "Exported N pages" keep working.
+  console.log(`Exported ${exported} pages to ${outDir}/`);
 }

@@ -26,6 +26,8 @@ import {
   extractFrontmatterLinks,
   type UnresolvedFrontmatterRef,
 } from '../core/link-extraction.ts';
+import { createProgress } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 
 // Batch size for addLinksBatch / addTimelineEntriesBatch.
 // Postgres bind-parameter limit is 65535. Links use 4 cols/row → 16K hard ceiling;
@@ -415,6 +417,12 @@ async function extractLinksFromDir(
   const files = walkMarkdownFiles(brainDir);
   const allSlugs = new Set(files.map(f => f.relPath.replace('.md', '')));
 
+  // Progress stream on stderr (separate from the action-events --json writes
+  // to stdout, which tests grep for). Rate-gated; respects global --quiet /
+  // --progress-json flags.
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('extract.links_fs', files.length);
+
   // Dedup in dry-run only — DB enforces uniqueness via ON CONFLICT in batch writes.
   // Without this, the same link extracted from N files would print N times in --dry-run.
   const dryRunSeen = dryRun ? new Set<string>() : null;
@@ -454,11 +462,10 @@ async function extractLinksFromDir(
         }
       }
     } catch { /* skip unreadable */ }
-    if (jsonMode && !dryRun && (i % 100 === 0 || i === files.length - 1)) {
-      process.stderr.write(JSON.stringify({ event: 'progress', phase: 'extracting_links', done: i + 1, total: files.length }) + '\n');
-    }
+    progress.tick(1);
   }
   await flush();
+  progress.finish();
 
   if (!jsonMode) {
     const label = dryRun ? '(dry run) would create' : 'created';
@@ -472,6 +479,9 @@ async function extractTimelineFromDir(
 ): Promise<{ created: number; pages: number }> {
   const files = walkMarkdownFiles(brainDir);
 
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('extract.timeline_fs', files.length);
+
   // Dedup in dry-run only — DB enforces uniqueness via ON CONFLICT in batch writes.
   const dryRunSeen = dryRun ? new Set<string>() : null;
 
@@ -510,11 +520,10 @@ async function extractTimelineFromDir(
         }
       }
     } catch { /* skip unreadable */ }
-    if (jsonMode && !dryRun && (i % 100 === 0 || i === files.length - 1)) {
-      process.stderr.write(JSON.stringify({ event: 'progress', phase: 'extracting_timeline', done: i + 1, total: files.length }) + '\n');
-    }
+    progress.tick(1);
   }
   await flush();
+  progress.finish();
 
   if (!jsonMode) {
     const label = dryRun ? '(dry run) would create' : 'created';
@@ -586,6 +595,9 @@ async function extractLinksFromDB(
   const slugList = Array.from(allSlugs);
   let processed = 0, created = 0;
 
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('extract.links_db', slugList.length);
+
   // Dedup in dry-run only — DB enforces uniqueness via ON CONFLICT in batch writes.
   const dryRunSeen = dryRun ? new Set<string>() : null;
 
@@ -661,11 +673,10 @@ async function extractLinksFromDB(
       }
     }
     processed++;
-    if (jsonMode && !dryRun && (processed % 500 === 0 || i === slugList.length - 1)) {
-      process.stderr.write(JSON.stringify({ event: 'progress', phase: 'extracting_links_db', done: processed, total: slugList.length }) + '\n');
-    }
+    progress.tick(1);
   }
   await flush();
+  progress.finish();
 
   if (!jsonMode) {
     const label = dryRun ? '(dry run) would create' : 'created';
@@ -699,6 +710,9 @@ async function extractTimelineFromDB(
   const slugList = Array.from(allSlugs);
   let processed = 0, created = 0;
 
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('extract.timeline_db', slugList.length);
+
   // Dedup in dry-run only — DB enforces uniqueness via ON CONFLICT in batch writes.
   const dryRunSeen = dryRun ? new Set<string>() : null;
 
@@ -753,11 +767,10 @@ async function extractTimelineFromDB(
       }
     }
     processed++;
-    if (jsonMode && !dryRun && (processed % 500 === 0 || i === slugList.length - 1)) {
-      process.stderr.write(JSON.stringify({ event: 'progress', phase: 'extracting_timeline_db', done: processed, total: slugList.length }) + '\n');
-    }
+    progress.tick(1);
   }
   await flush();
+  progress.finish();
 
   if (!jsonMode) {
     const label = dryRun ? '(dry run) would create' : 'created';

@@ -4,6 +4,8 @@ import { createHash } from 'crypto';
 import type { BrainEngine } from '../core/engine.ts';
 import * as db from '../core/db.ts';
 import { humanSize } from '../core/file-resolver.ts';
+import { createProgress } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 
 /** Size threshold: files >= 100 MB use TUS resumable upload */
 const SIZE_THRESHOLD = 100 * 1024 * 1024;
@@ -306,13 +308,14 @@ async function syncFiles(dir?: string) {
   let uploaded = 0;
   let skipped = 0;
 
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('files.sync', files.length);
+
   for (let i = 0; i < files.length; i++) {
     const filePath = files[i];
     const relativePath = relative(dir, filePath);
 
-    if ((i + 1) % 50 === 0 || i === files.length - 1) {
-      process.stdout.write(`\r ${i + 1}/${files.length} processed, ${uploaded} uploaded, ${skipped} skipped`);
-    }
+    progress.tick(1);
 
     const hash = fileHash(filePath);
     const filename = basename(filePath);
@@ -343,7 +346,9 @@ async function syncFiles(dir?: string) {
     uploaded++;
   }
 
-  console.log(`\n\nFiles sync complete: ${uploaded} uploaded, ${skipped} skipped (unchanged)`);
+  progress.finish();
+  // Stdout summary preserved for scripts/tests that grep for it.
+  console.log(`Files sync complete: ${uploaded} uploaded, ${skipped} skipped (unchanged)`);
 }
 
 async function verifyFiles() {

@@ -5,6 +5,8 @@ import { cpus, totalmem, homedir } from 'os';
 import type { BrainEngine } from '../core/engine.ts';
 import { importFile } from '../core/import-file.ts';
 import { loadConfig } from '../core/config.ts';
+import { createProgress } from '../core/progress.ts';
+import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
 
 function defaultWorkers(): number {
   const cpuCount = cpus().length;
@@ -81,12 +83,12 @@ export async function runImport(engine: BrainEngine, args: string[], opts: { com
   const failures: Array<{ path: string; error: string }> = []; // Bug 9
   const startTime = Date.now();
 
-  function logProgress() {
-    const elapsed = (Date.now() - startTime) / 1000;
-    const rate = elapsed > 0 ? Math.round(processed / elapsed) : 0;
-    const remaining = rate > 0 ? Math.round((files.length - processed) / rate) : 0;
-    const pct = Math.round((processed / files.length) * 100);
-    console.log(`[gbrain import] ${processed}/${files.length} (${pct}%) | ${rate} files/sec | imported: ${imported} | skipped: ${skipped} | errors: ${errors} | ETA: ${remaining}s`);
+  // Progress on stderr so stdout stays clean for the final summary / --json payload.
+  const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
+  progress.start('import.files', files.length);
+
+  function tickProgress() {
+    progress.tick(1, `imported=${imported} skipped=${skipped} errors=${errors}`);
   }
 
   async function processFile(eng: BrainEngine, filePath: string) {
@@ -119,8 +121,8 @@ export async function runImport(engine: BrainEngine, args: string[], opts: { com
       failures.push({ path: relativePath, error: msg });
     }
     processed++;
+    tickProgress();
     if (processed % 100 === 0 || processed === files.length) {
-      logProgress();
       // Save checkpoint every 100 files — track completed file set, not just a counter
       if (processed % 100 === 0) {
         try {
@@ -180,6 +182,8 @@ export async function runImport(engine: BrainEngine, args: string[], opts: { com
     }
   }
 
+  progress.finish();
+
   // Error summary
   for (const [err, count] of Object.entries(errorCounts)) {
     if (count > 5) {

@@ -361,8 +361,14 @@ async function cmdAuto(args: string[]): Promise<void> {
|
||||
let bucketErr = 0;
|
||||
let pagesProcessed = 0;
|
||||
|
||||
const { createProgress } = await import('../core/progress.ts');
|
||||
const { getCliOptions, cliOptsToProgressOptions } = await import('../core/cli-options.ts');
|
||||
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
|
||||
|
||||
try {
|
||||
const allSlugs = [...(await engine.getAllSlugs())].sort();
|
||||
const toScan = allSlugs.filter(s => !seen.has(s));
|
||||
progress.start('integrity.auto', toScan.length);
|
||||
for (const slug of allSlugs) {
|
||||
if (pagesProcessed >= limit) break;
|
||||
if (seen.has(slug)) continue;
|
||||
@@ -371,6 +377,7 @@ async function cmdAuto(args: string[]): Promise<void> {
|
||||
if (!page) continue;
|
||||
|
||||
pagesProcessed++;
|
||||
progress.tick(1, slug);
|
||||
|
||||
// Bare-tweet handling
|
||||
if (!skipTweet) {
|
||||
@@ -456,6 +463,8 @@ async function cmdAuto(args: string[]): Promise<void> {
|
||||
}
|
||||
}
|
||||
|
||||
progress.finish();
|
||||
|
||||
// Summary
|
||||
console.log('');
|
||||
console.log(`=== integrity auto summary${dryRun ? ' (DRY RUN)' : ''} ===`);
|
||||
|
||||
@@ -511,11 +511,20 @@ export async function registerBuiltinHandlers(worker: MinionWorker, engine: Brai
|
||||
|
||||
worker.register('embed', async (job) => {
|
||||
const { runEmbedCore } = await import('./embed.ts');
|
||||
// Primary Minion progress channel is job.updateProgress (DB-backed,
|
||||
// readable via `gbrain jobs get <id>`). Stderr from the worker daemon
|
||||
// only emits coarse job-start / job-done lines; per-page detail lives
|
||||
// in the DB. Per Codex review #20.
|
||||
await runEmbedCore(engine, {
|
||||
slug: typeof job.data.slug === 'string' ? job.data.slug : undefined,
|
||||
slugs: Array.isArray(job.data.slugs) ? (job.data.slugs as string[]) : undefined,
|
||||
all: !!job.data.all,
|
||||
stale: job.data.all ? false : (job.data.stale !== false),
|
||||
onProgress: (done, total, embedded) => {
|
||||
// Fire-and-forget: progress updates are best-effort and must not
|
||||
// block the worker loop.
|
||||
job.updateProgress({ done, total, embedded, phase: 'embed.pages' }).catch(() => {});
|
||||
},
|
||||
});
|
||||
return { embedded: true };
|
||||
});
|
||||
|
||||
@@ -268,10 +268,17 @@ export async function runLint(args: string[]) {
|
||||
const isSingleFile = statSync(target).isFile();
|
||||
const pages = isSingleFile ? [target] : collectPages(target);
|
||||
|
||||
// Progress on stderr. Stdout keeps the per-issue human output it always had.
|
||||
const { createProgress } = await import('../core/progress.ts');
|
||||
const { getCliOptions, cliOptsToProgressOptions } = await import('../core/cli-options.ts');
|
||||
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
|
||||
progress.start('lint.pages', pages.length);
|
||||
|
||||
for (const page of pages) {
|
||||
const content = readFileSync(page, 'utf-8');
|
||||
const relPath = isSingleFile ? page : relative(target, page);
|
||||
const issues = lintContent(content, relPath);
|
||||
progress.tick(1);
|
||||
if (issues.length === 0) continue;
|
||||
|
||||
console.log(`\n${relPath}:`);
|
||||
@@ -292,6 +299,8 @@ export async function runLint(args: string[]) {
|
||||
}
|
||||
}
|
||||
|
||||
progress.finish();
|
||||
|
||||
// Re-run core for the aggregate counts (cheap; re-parses contents but
|
||||
// produces canonical numbers for the summary line).
|
||||
const result = await runLintCore({ target, fix: doFix, dryRun });
|
||||
|
||||
@@ -14,6 +14,8 @@ import type { EngineConfig } from '../core/types.ts';
|
||||
import { homedir } from 'os';
|
||||
import { join } from 'path';
|
||||
import { writeFileSync, readFileSync, existsSync, unlinkSync } from 'fs';
|
||||
import { createProgress } from '../core/progress.ts';
|
||||
import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
|
||||
|
||||
interface MigrateOpts {
|
||||
targetEngine: 'postgres' | 'pglite';
|
||||
@@ -146,6 +148,9 @@ export async function runMigrateEngine(sourceEngine: BrainEngine, args: string[]
|
||||
|
||||
console.log(`Migrating ${pagesToMigrate.length} pages (${allPages.length} total, ${completedSet.size} already done)...`);
|
||||
|
||||
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
|
||||
progress.start('migrate.copy_pages', pagesToMigrate.length);
|
||||
|
||||
let migrated = 0;
|
||||
for (const page of pagesToMigrate) {
|
||||
// Copy page
|
||||
@@ -203,20 +208,21 @@ export async function runMigrateEngine(sourceEngine: BrainEngine, args: string[]
|
||||
manifest!.completed_slugs.push(page.slug);
|
||||
saveManifest(manifest!);
|
||||
migrated++;
|
||||
|
||||
if (migrated % 50 === 0 || migrated === pagesToMigrate.length) {
|
||||
console.log(` Progress: ${migrated}/${pagesToMigrate.length} pages`);
|
||||
}
|
||||
progress.tick(1, page.slug);
|
||||
}
|
||||
progress.finish();
|
||||
|
||||
// Copy links (after all pages exist in target)
|
||||
console.log('Copying links...');
|
||||
progress.start('migrate.copy_links', allPages.length);
|
||||
for (const page of allPages) {
|
||||
const links = await sourceEngine.getLinks(page.slug);
|
||||
for (const link of links) {
|
||||
await targetEngine.addLink(link.from_slug, link.to_slug, link.context, link.link_type);
|
||||
}
|
||||
progress.tick(1);
|
||||
}
|
||||
progress.finish();
|
||||
|
||||
// Copy config (selective)
|
||||
const configKeys = ['embedding_model', 'embedding_dimensions', 'chunk_strategy'];
|
||||
|
||||
@@ -23,6 +23,7 @@
|
||||
import { existsSync, readFileSync, writeFileSync, mkdirSync, appendFileSync, lstatSync, statSync, realpathSync } from 'fs';
|
||||
import { join, resolve, dirname } from 'path';
|
||||
import { execSync } from 'child_process';
|
||||
import { childGlobalFlags } from '../../core/cli-options.ts';
|
||||
import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhaseResult } from './types.ts';
|
||||
import { savePreferences, loadPreferences } from '../../core/preferences.ts';
|
||||
// Bug 3 — appendCompletedMigration moved to the runner (apply-migrations.ts).
|
||||
@@ -60,7 +61,7 @@ export interface PendingHostWorkEntry {
|
||||
function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'schema', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
execSync('gbrain init --migrate-only', { stdio: 'inherit', timeout: 60_000, env: process.env });
|
||||
execSync('gbrain init --migrate-only' + childGlobalFlags(), { stdio: 'inherit', timeout: 60_000, env: process.env });
|
||||
return { name: 'schema', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
|
||||
@@ -32,6 +32,7 @@
|
||||
|
||||
import { execSync } from 'child_process';
|
||||
import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhaseResult } from './types.ts';
|
||||
import { childGlobalFlags } from '../../core/cli-options.ts';
|
||||
// Bug 3 — ledger writes moved to the runner (apply-migrations.ts).
|
||||
|
||||
// ── Phase A — Schema ────────────────────────────────────────
|
||||
@@ -42,7 +43,7 @@ function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
// 10-minute budget. Migrations v8/v9 dedup with helper-index should be sub-second
|
||||
// even on 80K-duplicate brains, but the outer wall-clock cap shouldn't be the
|
||||
// failure mode (the prior 60s ceiling tripped Garry's production upgrade).
|
||||
execSync('gbrain init --migrate-only', { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
execSync('gbrain init --migrate-only' + childGlobalFlags(), { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
return { name: 'schema', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
@@ -92,7 +93,7 @@ function phaseCBackfillLinks(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
// --source db is idempotent: the UNIQUE constraint on
|
||||
// (from_page_id, to_page_id, link_type) and ON CONFLICT DO NOTHING
|
||||
// make re-runs cheap. Empty brains return 0/0 quickly.
|
||||
execSync('gbrain extract links --source db', { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
execSync('gbrain extract links --source db' + childGlobalFlags(), { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
return { name: 'backfill_links', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
@@ -103,7 +104,7 @@ function phaseCBackfillLinks(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
function phaseDBackfillTimeline(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'backfill_timeline', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
execSync('gbrain extract timeline --source db', { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
execSync('gbrain extract timeline --source db' + childGlobalFlags(), { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
return { name: 'backfill_timeline', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
|
||||
@@ -22,6 +22,7 @@
|
||||
|
||||
import { execSync } from 'child_process';
|
||||
import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhaseResult } from './types.ts';
|
||||
import { childGlobalFlags } from '../../core/cli-options.ts';
|
||||
// Bug 3 — ledger writes moved to the runner (apply-migrations.ts).
|
||||
|
||||
// ── Phase A — Schema ────────────────────────────────────────
|
||||
@@ -29,7 +30,9 @@ import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhase
|
||||
function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'schema', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
execSync('gbrain init --migrate-only', { stdio: 'inherit', timeout: 60_000, env: process.env });
|
||||
// Propagate global progress flags so the child shows the same mode the
|
||||
// parent orchestrator is running in.
|
||||
execSync('gbrain init --migrate-only' + childGlobalFlags(), { stdio: 'inherit', timeout: 60_000, env: process.env });
|
||||
return { name: 'schema', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
@@ -42,7 +45,8 @@ function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
function phaseBRepair(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'jsonb_repair', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
execSync('gbrain repair-jsonb', { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
// stdio: 'inherit' — child's stderr progress streams straight through.
|
||||
execSync('gbrain repair-jsonb' + childGlobalFlags(), { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
return { name: 'jsonb_repair', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
@@ -55,8 +59,14 @@ function phaseBRepair(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
function phaseCVerify(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'verify', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
// Explicit stdio discipline: we must parse JSON off child.stdout, so
|
||||
// pipe stdout but let child.stderr (progress) pass straight through.
|
||||
// Any accidental stdout progress from the child would break JSON.parse
|
||||
// (per Codex review #12). NOTE: we deliberately do NOT pass
|
||||
// --progress-json here — this child is parsed, not watched.
|
||||
const out = execSync('gbrain repair-jsonb --dry-run --json', {
|
||||
encoding: 'utf-8', timeout: 60_000, env: process.env,
|
||||
stdio: ['ignore', 'pipe', 'inherit'],
|
||||
});
|
||||
const parsed = JSON.parse(out) as { total_repaired?: number; engine?: string };
|
||||
const remaining = parsed.total_repaired ?? 0;
|
||||
|
||||
@@ -14,6 +14,8 @@
|
||||
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import * as db from '../core/db.ts';
|
||||
import { createProgress, startHeartbeat } from '../core/progress.ts';
|
||||
import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
|
||||
|
||||
// --- Types ---
|
||||
|
||||
@@ -121,13 +123,28 @@ export async function queryOrphanPages(): Promise<{ slug: string; title: string;
|
||||
* Returns structured OrphanResult with totals.
|
||||
*/
|
||||
export async function findOrphans(includePseudo: boolean = false): Promise<OrphanResult> {
|
||||
const allOrphans = await queryOrphanPages();
|
||||
const totalPages = allOrphans.length; // pages with no inbound links
|
||||
// The NOT EXISTS anti-join over pages × links can take seconds on 50K-page
|
||||
// brains. Heartbeat every second so agents see the scan is alive. Keyset
|
||||
// pagination was considered and rejected: without an index on
|
||||
// links.to_page_id it does no useful work. Adding that index is a
|
||||
// follow-up (v0.14.3 schema migration).
|
||||
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
|
||||
progress.start('orphans.scan');
|
||||
const stopHb = startHeartbeat(progress, 'scanning pages for missing inbound links…');
|
||||
let allOrphans: { slug: string; title: string; domain: string | null }[];
|
||||
let total: number;
|
||||
try {
|
||||
allOrphans = await queryOrphanPages();
|
||||
|
||||
// Count total pages in DB for the summary line
|
||||
const sql = db.getConnection();
|
||||
const [{ count: totalPagesCount }] = await sql`SELECT count(*)::int AS count FROM pages`;
|
||||
const total = Number(totalPagesCount);
|
||||
// Count total pages in DB for the summary line
|
||||
const sql = db.getConnection();
|
||||
const [{ count: totalPagesCount }] = await sql`SELECT count(*)::int AS count FROM pages`;
|
||||
total = Number(totalPagesCount);
|
||||
} finally {
|
||||
stopHb();
|
||||
progress.finish();
|
||||
}
|
||||
const _totalPages = allOrphans.length; // pages with no inbound links (preserved for ref)
|
||||
|
||||
const filtered = includePseudo
|
||||
? allOrphans
|
||||
|
||||
@@ -31,6 +31,8 @@
|
||||
import { loadConfig, toEngineConfig } from '../core/config.ts';
|
||||
import type { EngineConfig } from '../core/types.ts';
|
||||
import * as db from '../core/db.ts';
|
||||
import { createProgress, startHeartbeat } from '../core/progress.ts';
|
||||
import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
|
||||
|
||||
interface RepairTarget {
|
||||
table: string;
|
||||
@@ -97,28 +99,44 @@ export async function repairJsonb(opts: RepairOpts = { dryRun: false }): Promise
|
||||
await db.connect(engineCfg);
|
||||
const sql = db.getConnection();
|
||||
|
||||
// Progress on stderr only. Stdout is reserved for the JSON summary that
|
||||
// migrations/v0_12_2.ts parses via JSON.parse — stray progress lines on
|
||||
// stdout would break the orchestrator (per Codex review #12).
|
||||
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
|
||||
progress.start('repair_jsonb.run', TARGETS.length);
|
||||
|
||||
for (const t of TARGETS) {
|
||||
const phase = `repair_jsonb.${t.table}.${t.column}`;
|
||||
progress.heartbeat(phase);
|
||||
// Heartbeat the caller while each UPDATE runs (minutes on 50K-row tables).
|
||||
const stopHb = startHeartbeat(progress, `${t.table}.${t.column}`);
|
||||
let repaired = 0;
|
||||
|
||||
if (opts.dryRun) {
|
||||
const rows = await sql.unsafe(
|
||||
`SELECT count(*)::int AS n FROM ${t.table} WHERE jsonb_typeof(${t.column}) = 'string'`,
|
||||
);
|
||||
repaired = (rows[0] as { n: number }).n;
|
||||
} else {
|
||||
const rows = await sql.unsafe(
|
||||
`UPDATE ${t.table}
|
||||
SET ${t.column} = (${t.column} #>> '{}')::jsonb
|
||||
WHERE jsonb_typeof(${t.column}) = 'string'
|
||||
RETURNING 1`,
|
||||
);
|
||||
repaired = rows.length;
|
||||
try {
|
||||
if (opts.dryRun) {
|
||||
const rows = await sql.unsafe(
|
||||
`SELECT count(*)::int AS n FROM ${t.table} WHERE jsonb_typeof(${t.column}) = 'string'`,
|
||||
);
|
||||
repaired = (rows[0] as { n: number }).n;
|
||||
} else {
|
||||
const rows = await sql.unsafe(
|
||||
`UPDATE ${t.table}
|
||||
SET ${t.column} = (${t.column} #>> '{}')::jsonb
|
||||
WHERE jsonb_typeof(${t.column}) = 'string'
|
||||
RETURNING 1`,
|
||||
);
|
||||
repaired = rows.length;
|
||||
}
|
||||
} finally {
|
||||
stopHb();
|
||||
}
|
||||
|
||||
progress.tick(1, `${t.table}.${t.column}=${repaired}`);
|
||||
result.per_target.push({ table: t.table, column: t.column, rows_repaired: repaired });
|
||||
result.total_repaired += repaired;
|
||||
}
|
||||
|
||||
progress.finish();
|
||||
return result;
|
||||
}
|
||||
|
||||
|
||||
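The hunks above wrap each slow query in `startHeartbeat(...)` and call the returned stop function in a `finally` block. The real helper lives in `src/core/progress.ts` and is not shown in this part of the diff, so the following is only a minimal sketch of that pattern under assumptions: `startHeartbeatSketch`, `withHeartbeat`, and the `Emit` callback type are hypothetical names, and the timer-based implementation is a guess at the general shape, not the commit's code.

```typescript
// Hypothetical sketch: a heartbeat that fires on an interval until stopped.
type Emit = (note: string) => void;

function startHeartbeatSketch(emit: Emit, note: string, intervalMs = 1000): () => void {
  const timer = setInterval(() => emit(note), intervalMs);
  let stopped = false;
  // The returned stop function is idempotent, so a double call is harmless.
  return () => {
    if (stopped) return;
    stopped = true;
    clearInterval(timer);
  };
}

// Assumed convenience wrapper: run slow work with a heartbeat attached.
async function withHeartbeat<T>(
  emit: Emit,
  note: string,
  work: () => Promise<T>,
  intervalMs = 1000,
): Promise<T> {
  const stop = startHeartbeatSketch(emit, note, intervalMs);
  try {
    return await work();
  } finally {
    stop(); // runs on success and on throw, so the timer never leaks
  }
}
```

The `finally` placement is the design point: if the query throws, the interval is still cleared, which mirrors the `try { ... } finally { stopHb(); }` shape in `findOrphans` and `repairJsonb`.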
@@ -18,6 +18,7 @@
|
||||
|
||||
import { execFileSync } from 'child_process';
|
||||
import { VERSION } from '../version.ts';
|
||||
import { getCliOptions } from '../core/cli-options.ts';
|
||||
|
||||
/**
|
||||
* Resolve the gbrain binary + args for spawning subcommands from
|
||||
@@ -207,7 +208,9 @@ Exit codes:
|
||||
return;
|
||||
}
|
||||
|
||||
const quiet = args.includes('--quiet');
|
||||
// --quiet is parsed as a global flag in src/cli.ts (and stripped from argv
|
||||
// before reaching here); honor it via the CliOptions singleton.
|
||||
const quiet = getCliOptions().quiet;
|
||||
const report = buildReport();
|
||||
|
||||
if (!quiet) {
|
||||
|
||||
@@ -12,6 +12,8 @@ import {
|
||||
acknowledgeSyncFailures,
|
||||
} from '../core/sync.ts';
|
||||
import type { SyncManifest } from '../core/sync.ts';
|
||||
import { createProgress } from '../core/progress.ts';
|
||||
import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts';
|
||||
|
||||
export interface SyncResult {
|
||||
status: 'up_to_date' | 'synced' | 'first_sync' | 'dry_run' | 'blocked_by_failures';
|
||||
@@ -190,29 +192,43 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
|
||||
let chunksCreated = 0;
|
||||
const start = Date.now();
|
||||
|
||||
// Per-file progress on stderr so agents see each step of a big sync.
|
||||
// Phases: sync.deletes, sync.renames, sync.imports.
|
||||
const progress = createProgress(cliOptsToProgressOptions(getCliOptions()));
|
||||
|
||||
// Process deletes first (prevents slug conflicts)
|
||||
for (const path of filtered.deleted) {
|
||||
const slug = pathToSlug(path);
|
||||
await engine.deletePage(slug);
|
||||
pagesAffected.push(slug);
|
||||
if (filtered.deleted.length > 0) {
|
||||
progress.start('sync.deletes', filtered.deleted.length);
|
||||
for (const path of filtered.deleted) {
|
||||
const slug = pathToSlug(path);
|
||||
await engine.deletePage(slug);
|
||||
pagesAffected.push(slug);
|
||||
progress.tick(1, slug);
|
||||
}
|
||||
progress.finish();
|
||||
}
|
||||
|
||||
// Process renames (updateSlug preserves page_id, chunks, embeddings)
|
||||
for (const { from, to } of filtered.renamed) {
|
||||
const oldSlug = pathToSlug(from);
|
||||
const newSlug = pathToSlug(to);
|
||||
try {
|
||||
await engine.updateSlug(oldSlug, newSlug);
|
||||
} catch {
|
||||
// Slug doesn't exist or collision, treat as add
|
||||
if (filtered.renamed.length > 0) {
|
||||
progress.start('sync.renames', filtered.renamed.length);
|
||||
for (const { from, to } of filtered.renamed) {
|
||||
const oldSlug = pathToSlug(from);
|
||||
const newSlug = pathToSlug(to);
|
||||
try {
|
||||
await engine.updateSlug(oldSlug, newSlug);
|
||||
} catch {
|
||||
// Slug doesn't exist or collision, treat as add
|
||||
}
|
||||
// Reimport at new path (picks up content changes)
|
||||
const filePath = join(repoPath, to);
|
||||
if (existsSync(filePath)) {
|
||||
const result = await importFile(engine, filePath, to, { noEmbed });
|
||||
if (result.status === 'imported') chunksCreated += result.chunks;
|
||||
}
|
||||
pagesAffected.push(newSlug);
|
||||
progress.tick(1, newSlug);
|
||||
}
|
||||
// Reimport at new path (picks up content changes)
|
||||
const filePath = join(repoPath, to);
|
||||
if (existsSync(filePath)) {
|
||||
const result = await importFile(engine, filePath, to, { noEmbed });
|
||||
if (result.status === 'imported') chunksCreated += result.chunks;
|
||||
}
|
||||
pagesAffected.push(newSlug);
|
||||
progress.finish();
|
||||
}
|
||||
|
||||
// Process adds and modifies.
|
||||
@@ -225,24 +241,37 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
|
||||
// ep_poll whenever the diff crosses the old > 10 threshold that used to
|
||||
// trigger the outer wrap. Per-file atomicity is also the right granularity:
|
||||
// one file's failure should not roll back the others' successful imports.
|
||||
//
|
||||
// v0.15.2: per-file progress on stderr via the shared reporter.
|
||||
// Bug 9: per-file failures captured in `failedFiles` so the caller can
|
||||
// gate `sync.last_commit` advancement and record recoverable errors.
|
||||
const failedFiles: Array<{ path: string; error: string; line?: number }> = [];
|
||||
for (const path of [...filtered.added, ...filtered.modified]) {
|
||||
const filePath = join(repoPath, path);
|
||||
if (!existsSync(filePath)) continue;
|
||||
try {
|
||||
const result = await importFile(engine, filePath, path, { noEmbed });
|
||||
if (result.status === 'imported') {
|
||||
chunksCreated += result.chunks;
|
||||
pagesAffected.push(result.slug);
|
||||
} else if (result.status === 'skipped' && (result as any).error) {
|
||||
// importFile returned a non-throw skip with a reason
|
||||
failedFiles.push({ path, error: String((result as any).error) });
|
||||
const addsAndMods = [...filtered.added, ...filtered.modified];
|
||||
if (addsAndMods.length > 0) {
|
||||
progress.start('sync.imports', addsAndMods.length);
|
||||
for (const path of addsAndMods) {
|
||||
const filePath = join(repoPath, path);
|
||||
if (!existsSync(filePath)) {
|
||||
progress.tick(1, `skip:${path}`);
|
||||
continue;
|
||||
}
|
||||
} catch (e: unknown) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
console.error(` Warning: skipped ${path}: ${msg}`);
|
||||
failedFiles.push({ path, error: msg });
|
||||
try {
|
||||
const result = await importFile(engine, filePath, path, { noEmbed });
|
||||
if (result.status === 'imported') {
|
||||
chunksCreated += result.chunks;
|
||||
pagesAffected.push(result.slug);
|
||||
} else if (result.status === 'skipped' && (result as any).error) {
|
||||
// importFile returned a non-throw skip with a reason.
|
||||
failedFiles.push({ path, error: String((result as any).error) });
|
||||
}
|
||||
} catch (e: unknown) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
console.error(` Warning: skipped ${path}: ${msg}`);
|
||||
failedFiles.push({ path, error: msg });
|
||||
}
|
||||
progress.tick(1, path);
|
||||
}
|
||||
progress.finish();
|
||||
}
|
||||
|
||||
const elapsed = Date.now() - start;
|
||||
|
||||
@@ -56,11 +56,16 @@ export async function runUpgrade(args: string[]) {
|
||||
// Save old version for post-upgrade migration detection
|
||||
saveUpgradeState(oldVersion, newVersion);
|
||||
// Run post-upgrade feature discovery (reads migration files from the NEW binary).
|
||||
// Timeout bumped 30s → 300s because runPostUpgrade now tail-calls
|
||||
// apply-migrations, which can do long work (schema, smoke, host-rewrite,
|
||||
// autopilot install) on a v0.11.0→v0.11.1 jump. Codex H7.
|
||||
// Timeout bumped 300s → 1800s (30 min) in v0.15.2 because v0.12.0 graph
|
||||
// backfill on 50K+ brains regularly exceeded the old ceiling. The heartbeat
|
||||
// wiring added in v0.15.2 makes the long wait observable; a hard 300s
|
||||
// cap would still kill legit migrations mid-run. Override via
|
||||
// GBRAIN_POST_UPGRADE_TIMEOUT_MS env var.
|
||||
const postUpgradeTimeoutMs = Number(
|
||||
process.env.GBRAIN_POST_UPGRADE_TIMEOUT_MS || 1_800_000,
|
||||
);
|
||||
try {
|
||||
execSync('gbrain post-upgrade', { stdio: 'inherit', timeout: 300_000 });
|
||||
execSync('gbrain post-upgrade', { stdio: 'inherit', timeout: postUpgradeTimeoutMs });
|
||||
} catch (e) {
|
||||
// post-upgrade is best-effort, don't fail the upgrade. BUT leave a
|
||||
// trail so `gbrain doctor` can surface it and give the user a clear
|
||||
|
||||
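The hunk above reads the override with `Number(process.env.GBRAIN_POST_UPGRADE_TIMEOUT_MS || 1_800_000)`. With that expression a non-numeric value such as `abc` evaluates to `NaN` rather than falling back to the default. A stricter resolver could look like the sketch below; `resolveTimeoutMs` is a hypothetical helper introduced here for illustration and is not part of the commit.

```typescript
// 30 minutes, the default used in the hunk above.
const DEFAULT_POST_UPGRADE_TIMEOUT_MS = 1_800_000;

// Hypothetical helper: fall back to the default unless the raw value is a
// finite, positive number of milliseconds.
function resolveTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  if (raw === undefined || raw.trim() === '') return fallbackMs;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : fallbackMs;
}

// Assumed usage at the call site:
//   resolveTimeoutMs(process.env.GBRAIN_POST_UPGRADE_TIMEOUT_MS, DEFAULT_POST_UPGRADE_TIMEOUT_MS)
```

Whether to reject zero is a judgment call; this sketch treats it as invalid and uses the default instead.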
147
src/core/cli-options.ts
Normal file
@@ -0,0 +1,147 @@
/**
 * Global CLI flags parsed before command dispatch.
 *
 * Keeping this separate from per-command flag parsing so that
 * `gbrain --progress-json doctor` works: the global flag is stripped
 * before cli.ts looks at argv[0] for the subcommand.
 *
 * Threading: every command handler receives a resolved CliOptions object.
 * Shared-operation handlers see the same values via OperationContext.cliOpts.
 */

import type { ProgressOptions } from './progress.ts';

export interface CliOptions {
  quiet: boolean;
  progressJson: boolean;
  progressInterval: number; // ms
}

export const DEFAULT_CLI_OPTIONS: CliOptions = {
  quiet: false,
  progressJson: false,
  progressInterval: 1000,
};

/**
 * Parse recognized global flags from the front / anywhere in argv and return
 * the resolved options plus the remaining argv (with global flags stripped).
 *
 * Recognized:
 *   --quiet
 *   --progress-json
 *   --progress-interval=<ms>
 *   --progress-interval <ms> (space-separated form)
 *
 * Unknown flags are passed through unchanged — per-command parsers see them.
 */
export function parseGlobalFlags(argv: string[]): { cliOpts: CliOptions; rest: string[] } {
  const cliOpts: CliOptions = { ...DEFAULT_CLI_OPTIONS };
  const rest: string[] = [];

  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a === '--quiet') {
      cliOpts.quiet = true;
      continue;
    }
    if (a === '--progress-json') {
      cliOpts.progressJson = true;
      continue;
    }
    if (a === '--progress-interval' && i + 1 < argv.length) {
      const next = argv[i + 1];
      const parsed = parseInterval(next);
      if (parsed !== null) {
        cliOpts.progressInterval = parsed;
        i++;
        continue;
      }
      // not a number — let per-command parser handle; pass through
      rest.push(a);
      continue;
    }
    if (a.startsWith('--progress-interval=')) {
      const val = a.slice('--progress-interval='.length);
      const parsed = parseInterval(val);
      if (parsed !== null) {
        cliOpts.progressInterval = parsed;
        continue;
      }
      rest.push(a);
      continue;
    }
    rest.push(a);
  }

  return { cliOpts, rest };
}

function parseInterval(s: string): number | null {
  const n = Number(s);
  if (!Number.isFinite(n) || n < 0) return null;
  return Math.floor(n);
}

/**
 * Map resolved CliOptions to ProgressOptions for createProgress().
 *
 * Mode resolution:
 *   --quiet → 'quiet'
 *   --progress-json → 'json'
 *   otherwise → 'auto' (TTY: human-\r, non-TTY: human-plain)
 *
 * Agents that want structured events on a non-TTY stream must pass
 * --progress-json explicitly. Non-TTY default is plain human lines so
 * shell pipelines don't suddenly see JSON noise.
 */
export function cliOptsToProgressOptions(cliOpts: CliOptions): ProgressOptions {
  if (cliOpts.quiet) return { mode: 'quiet' };
  if (cliOpts.progressJson) return { mode: 'json', minIntervalMs: cliOpts.progressInterval };
  return { mode: 'auto', minIntervalMs: cliOpts.progressInterval };
}

// ---------------------------------------------------------------------------
// Module-level singleton (set once by cli.ts after parsing global flags; read
// by any bulk command that wants to construct a reporter). Same pattern as
// Commander's `program.opts()`. Also threaded into OperationContext for
// shared ops that run under the MCP server (which sets its own defaults).
// ---------------------------------------------------------------------------

let activeCliOptions: CliOptions = { ...DEFAULT_CLI_OPTIONS };

export function setCliOptions(opts: CliOptions): void {
  activeCliOptions = { ...opts };
}

export function getCliOptions(): CliOptions {
  return activeCliOptions;
}

/**
 * Reset singleton to defaults. Only used by tests.
 */
export function _resetCliOptionsForTest(): void {
  activeCliOptions = { ...DEFAULT_CLI_OPTIONS };
}

/**
 * Build the global-flag suffix to append to child `gbrain …` subprocess
 * commands so children inherit the parent's progress-mode.
 *
 * Returns a string ready to concat onto an execSync command string, with
 * a leading space when non-empty. E.g. " --progress-json --quiet".
 *
 * Empty string when nothing to propagate (so the child's behavior is
 * unchanged for the common no-flag case).
 */
export function childGlobalFlags(cliOpts?: CliOptions): string {
  const opts = cliOpts ?? activeCliOptions;
  const parts: string[] = [];
  if (opts.quiet) parts.push('--quiet');
  if (opts.progressJson) parts.push('--progress-json');
  if (opts.progressInterval !== DEFAULT_CLI_OPTIONS.progressInterval) {
    parts.push(`--progress-interval=${opts.progressInterval}`);
  }
  return parts.length > 0 ? ' ' + parts.join(' ') : '';
}
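To make the parse-then-propagate flow of `src/core/cli-options.ts` concrete, here is a condensed, self-contained sketch of what happens to an argv array and how the result is appended to a child command. `stripGlobalFlags`, `toChildSuffix`, and the `Opts` type are hypothetical stand-ins written for this example (they handle only the `=` form of the interval flag), and `--fast` is a made-up per-command flag used to show pass-through.

```typescript
// Hypothetical, condensed stand-ins for parseGlobalFlags / childGlobalFlags.
interface Opts {
  quiet: boolean;
  progressJson: boolean;
  progressInterval: number; // ms
}

const DEFAULTS: Opts = { quiet: false, progressJson: false, progressInterval: 1000 };

function stripGlobalFlags(argv: string[]): { opts: Opts; rest: string[] } {
  const opts: Opts = { ...DEFAULTS };
  const rest: string[] = [];
  for (const a of argv) {
    if (a === '--quiet') { opts.quiet = true; continue; }
    if (a === '--progress-json') { opts.progressJson = true; continue; }
    const m = /^--progress-interval=(.*)$/.exec(a);
    if (m) {
      const n = Number(m[1]);
      if (Number.isFinite(n) && n >= 0) {
        opts.progressInterval = Math.floor(n);
        continue;
      }
    }
    rest.push(a); // unknown or malformed flags pass through untouched
  }
  return { opts, rest };
}

function toChildSuffix(opts: Opts): string {
  const parts: string[] = [];
  if (opts.quiet) parts.push('--quiet');
  if (opts.progressJson) parts.push('--progress-json');
  if (opts.progressInterval !== DEFAULTS.progressInterval) {
    parts.push(`--progress-interval=${opts.progressInterval}`);
  }
  return parts.length > 0 ? ' ' + parts.join(' ') : '';
}

// Global flags may appear before or after the subcommand.
const { opts, rest } = stripGlobalFlags([
  '--progress-json', 'doctor', '--progress-interval=250', '--fast',
]);
// rest is ['doctor', '--fast']; the child command inherits the parent's mode.
const childCommand = 'gbrain init --migrate-only' + toChildSuffix(opts);
```

Because the suffix is empty when every option is at its default, a child command string is left byte-for-byte unchanged in the common no-flag case.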
@@ -32,7 +32,19 @@ export async function embed(text: string): Promise<Float32Array> {
|
||||
return result[0];
|
||||
}
|
||||
|
||||
export async function embedBatch(texts: string[]): Promise<Float32Array[]> {
|
||||
export interface EmbedBatchOptions {
|
||||
/**
|
||||
* Optional callback fired after each 100-item sub-batch completes.
|
||||
* CLI wrappers tick a reporter; Minion handlers can call
|
||||
* job.updateProgress here instead of hooking the per-page callback.
|
||||
*/
|
||||
onBatchComplete?: (done: number, total: number) => void;
|
||||
}
|
||||
|
||||
export async function embedBatch(
|
||||
texts: string[],
|
||||
options: EmbedBatchOptions = {},
|
||||
): Promise<Float32Array[]> {
|
||||
const truncated = texts.map(t => t.slice(0, MAX_CHARS));
|
||||
const results: Float32Array[] = [];
|
||||
|
||||
@@ -41,6 +53,7 @@ export async function embedBatch(texts: string[]): Promise<Float32Array[]> {
|
||||
const batch = truncated.slice(i, i + BATCH_SIZE);
|
||||
const batchResults = await embedBatchWithRetry(batch);
|
||||
results.push(...batchResults);
|
||||
options.onBatchComplete?.(results.length, truncated.length);
|
||||
}
|
||||
|
||||
return results;
|
||||
|
||||
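The `onBatchComplete` hook added to `embedBatch` fires once per sub-batch with a running count. As a rough illustration of that contract, the sketch below applies the same loop shape to a generic batch runner. `mapInBatches` and its `runBatch` parameter are hypothetical names for this example; the real function calls `embedBatchWithRetry` and uses the module's own `BATCH_SIZE`.

```typescript
// Hypothetical generic version of the batch loop with a progress callback.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  runBatch: (batch: T[]) => Promise<R[]>,
  onBatchComplete?: (done: number, total: number) => void,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await runBatch(batch)));
    // Report cumulative progress after each sub-batch, not per item.
    onBatchComplete?.(results.length, items.length);
  }
  return results;
}

// Example: 250 items in batches of 100 produce three callbacks.
const progressLog: Array<[number, number]> = [];
const batchRun = mapInBatches(
  Array.from({ length: 250 }, (_, i) => i),
  100,
  async (batch) => batch.map((n) => n * 2),
  (done, total) => { progressLog.push([done, total]); },
);
```

Reporting at batch granularity keeps the callback cheap, which matters when the caller forwards it to something slower such as a database-backed job progress update.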
@@ -146,11 +146,13 @@ export async function enrichEntity(
 
 /**
  * Enrich multiple entities with throttling between each.
+ * config.onProgress is called after each entity so callers can stream
+ * progress to a reporter (CLI) or job.updateProgress (Minion).
  */
 export async function enrichEntities(
   engine: BrainEngine,
   requests: EnrichmentRequest[],
-  config?: { throttle?: boolean },
+  config?: { throttle?: boolean; onProgress?: (done: number, total: number, name: string) => void },
 ): Promise<EnrichmentResult[]> {
   const results: EnrichmentResult[] = [];
   for (const req of requests) {
@@ -159,6 +161,7 @@ export async function enrichEntities(
     }
     const result = await enrichEntity(engine, req);
     results.push(result);
+    config?.onProgress?.(results.length, requests.length, req.name);
   }
   return results;
 }
|
||||
@@ -167,6 +167,13 @@ export interface OperationContext {
    * When unset, operations MUST default to the stricter (remote=true) behavior.
    */
   remote?: boolean;
+  /**
+   * Resolved global CLI options (--quiet / --progress-json / --progress-interval).
+   * CLI callers populate this from `getCliOptions()`. MCP / library callers
+   * may leave it undefined — consumers default to quiet/no-progress for
+   * background work.
+   */
+  cliOpts?: { quiet: boolean; progressJson: boolean; progressInterval: number };
 }
 
 export interface Operation {
|
||||
477
src/core/progress.ts
Normal file
@@ -0,0 +1,477 @@
|
||||
/**
|
||||
* Bulk-action progress reporter.
|
||||
*
|
||||
* Single source of truth for per-object progress on long-running binaries
|
||||
* (doctor, embed, sync, extract, etc.). Writes to stderr so stdout stays
|
||||
* clean for data / JSON output that agents parse.
|
||||
*
|
||||
* Modes:
|
||||
* auto (default): isTTY ? human-\r : human-plain one-line-per-event
|
||||
* human: force human rendering
|
||||
* json: emit one JSON object per line (see schema below)
|
||||
* quiet: no output
|
||||
*
|
||||
* JSON event schema (stable from v0.15.2, additive only):
|
||||
* {"event":"start","phase":"<snake.dot.path>","total"?:N,"ts":"<iso>"}
|
||||
* {"event":"tick","phase":"...","done":N,"total"?:N,"pct"?:F,"elapsed_ms":N,"eta_ms"?:N,"ts":"..."}
|
||||
* {"event":"heartbeat","phase":"...","note":"<str>","elapsed_ms":N,"ts":"..."}
|
||||
* {"event":"finish","phase":"...","done"?:N,"total"?:N,"elapsed_ms":N,"ts":"..."}
|
||||
* {"event":"abort","phase":"...","reason":"<SIGINT|SIGTERM>","elapsed_ms":N,"ts":"..."}
|
||||
*
|
||||
* Rules:
|
||||
* - phase uses snake_case dot-separated machine-stable names.
|
||||
* - total/pct/eta_ms are omitted when total is unknown (no fake totals).
|
||||
* - stdout is NEVER written to. Data output stays a separate concern.
|
||||
*
|
||||
* See docs/progress-events.md for the full reference.
|
||||
*/
|
||||
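For agents that consume the stderr stream, the event schema in the comment above maps naturally onto a discriminated union. The sketch below is one way to type and parse a single line. `ReporterEvent`, `parseProgressLine` and the `embed.pages` phase are hypothetical names for illustration. The optional `note` on `tick` and `finish` is included because the reporter code attaches it when a note is passed, even though the schema comment does not list it.

```typescript
// Event shapes transcribed from the schema comment; type and function
// names are hypothetical, not exports of src/core/progress.ts.
type ReporterEvent =
  | { event: 'start'; phase: string; total?: number; ts: string }
  | { event: 'tick'; phase: string; done: number; total?: number; pct?: number;
      elapsed_ms: number; eta_ms?: number; note?: string; ts: string }
  | { event: 'heartbeat'; phase: string; note: string; elapsed_ms: number; ts: string }
  | { event: 'finish'; phase: string; done?: number; total?: number;
      elapsed_ms: number; note?: string; ts: string }
  | { event: 'abort'; phase: string; reason: string; elapsed_ms: number; ts: string };

const KNOWN_EVENTS = new Set(['start', 'tick', 'heartbeat', 'finish', 'abort']);

// Returns a typed event, or null for lines that are not progress events
// (stderr can also carry warnings or other logger output).
function parseProgressLine(line: string): ReporterEvent | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith('{')) return null;
  let obj: unknown;
  try {
    obj = JSON.parse(trimmed);
  } catch {
    return null;
  }
  if (typeof obj !== 'object' || obj === null) return null;
  const rec = obj as Record<string, unknown>;
  if (typeof rec.event !== 'string' || !KNOWN_EVENTS.has(rec.event)) return null;
  if (typeof rec.phase !== 'string' || typeof rec.ts !== 'string') return null;
  return rec as unknown as ReporterEvent;
}

// Illustrative input; the phase name and numbers are made up.
const sample =
  '{"event":"tick","phase":"embed.pages","done":50,"total":200,"pct":25,' +
  '"elapsed_ms":2000,"eta_ms":6000,"ts":"2024-01-01T00:00:00.000Z"}';
const parsed = parseProgressLine(sample);
const ignored = parseProgressLine('warning: something unrelated');
```

Because the schema is described as additive only, a consumer written this way should ignore unknown fields rather than reject them; the sketch only checks the three fields every event carries.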
|
||||
export type ProgressMode = 'auto' | 'human' | 'json' | 'quiet';
|
||||
|
||||
export interface ProgressOptions {
|
||||
mode?: ProgressMode;
|
||||
stream?: NodeJS.WritableStream; // default process.stderr
|
||||
minIntervalMs?: number; // default 1000
|
||||
minItems?: number; // default: max(10, Math.ceil((total||1000)/100))
|
||||
}
|
||||
|
||||
export interface ProgressReporter {
|
||||
start(phase: string, total?: number): void;
|
||||
tick(n?: number, note?: string): void;
|
||||
heartbeat(note: string): void;
|
||||
finish(note?: string): void;
|
||||
child(phase: string, total?: number): ProgressReporter;
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Singleton signal coordinator
|
||||
// ---------------------------------------------------------------------------
|
||||
// Per Codex review #28/#29: one process-level SIGINT/SIGTERM handler, tracking
|
||||
// every live reporter. Per-instance handlers would leak listeners and interfere
|
||||
// with command-level handlers (e.g. shell-handler abort in jobs.ts).
|
||||
//
|
||||
// We never call process.exit() or swallow the signal — we just emit abort
|
||||
// events for live phases, then remove ourselves so the user's own handlers
|
||||
// (or the default Node behavior) run as usual.
|
||||
|
||||
interface LivePhase {
|
||||
reporter: PhaseState;
|
||||
abort: (reason: string) => void;
|
||||
}
|
||||
|
||||
const liveReporters = new Set<LivePhase>();
|
||||
let signalHandlerInstalled = false;
|
||||
|
||||
function installSignalHandler(): void {
|
||||
if (signalHandlerInstalled) return;
|
||||
signalHandlerInstalled = true;
|
||||
|
||||
const onSignal = (reason: 'SIGINT' | 'SIGTERM') => {
|
||||
// Copy to array so abort() can mutate liveReporters during iteration.
|
||||
const snapshot = Array.from(liveReporters);
|
||||
for (const entry of snapshot) {
|
||||
try {
|
||||
entry.abort(reason);
|
||||
} catch {
|
||||
/* best-effort */
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
// once() so we don't block user handlers or double-fire.
|
||||
process.once('SIGINT', () => onSignal('SIGINT'));
|
||||
process.once('SIGTERM', () => onSignal('SIGTERM'));
|
||||
}
|
||||
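The coordinator pattern above (one guarded install, a `once()` listener, a snapshot of the live set before iterating) can be exercised without sending real signals. The sketch below swaps `process` for a plain `EventEmitter`; `fakeProcess`, `installOnce` and `register` are hypothetical names used only for this illustration.

```typescript
import { EventEmitter } from 'node:events';

// Stand-in for `process`; emitting on it simulates a signal delivery.
const fakeProcess = new EventEmitter();

interface LiveEntry {
  abort: (reason: string) => void;
}

const live = new Set<LiveEntry>();
let installed = false;
const aborted: string[] = [];

function installOnce(): void {
  if (installed) return; // one listener no matter how many reporters exist
  installed = true;
  fakeProcess.once('SIGINT', () => {
    // Snapshot first: abort() removes entries from `live` while we iterate.
    for (const entry of Array.from(live)) {
      try {
        entry.abort('SIGINT');
      } catch {
        /* best-effort */
      }
    }
  });
}

function register(name: string): void {
  installOnce();
  const entry: LiveEntry = {
    abort: (reason) => {
      aborted.push(`${name}:${reason}`);
      live.delete(entry);
    },
  };
  live.add(entry);
}

register('a');
register('b');
register('c');
const listenerCount = fakeProcess.listenerCount('SIGINT');

fakeProcess.emit('SIGINT'); // all three live phases abort
fakeProcess.emit('SIGINT'); // the once() listener is already gone; nothing happens
```

One caveat worth checking against the Node.js documentation: while any `'SIGINT'` or `'SIGTERM'` listener is attached, Node's default exit behavior for that signal is disabled, so the first delivery is consumed by the `once()` listener unless a command-level handler is also registered.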
|
||||
// ---------------------------------------------------------------------------
|
||||
// Mode resolution
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
function resolveMode(mode: ProgressMode, stream: NodeJS.WritableStream): 'human-tty' | 'human-plain' | 'json' | 'quiet' {
|
||||
if (mode === 'quiet') return 'quiet';
|
||||
if (mode === 'json') return 'json';
|
||||
const isTty = (stream as { isTTY?: boolean }).isTTY === true;
|
||||
if (mode === 'human') return isTty ? 'human-tty' : 'human-plain';
|
||||
// auto
|
||||
return isTty ? 'human-tty' : 'human-plain';
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Stream write with EPIPE defense (sync throw path AND 'error' event path).
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
const brokenStreams = new WeakSet<NodeJS.WritableStream>();
|
||||
|
||||
function safeWrite(stream: NodeJS.WritableStream, chunk: string): void {
|
||||
if (brokenStreams.has(stream)) return;
|
||||
try {
|
||||
stream.write(chunk, (err) => {
|
||||
if (err) brokenStreams.add(stream);
|
||||
});
|
||||
} catch {
|
||||
brokenStreams.add(stream);
|
||||
}
|
||||
}
|
||||
|
||||
// Attach one 'error' listener per stream so async EPIPE marks it broken.
|
||||
const errorListenersAttached = new WeakSet<NodeJS.WritableStream>();
|
||||
function attachErrorListener(stream: NodeJS.WritableStream): void {
|
||||
if (errorListenersAttached.has(stream)) return;
|
||||
errorListenersAttached.add(stream);
|
||||
// 'error' on a raw tty/pipe is rare, but EPIPE can surface this way.
|
||||
(stream as NodeJS.EventEmitter).on?.('error', () => {
|
||||
brokenStreams.add(stream);
|
||||
});
|
||||
}
|
||||
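The synchronous half of the EPIPE defense can be illustrated in isolation. This is a reduced sketch, assuming a minimal `MiniStream` interface and a `closedPipe` stub that always throws; both are hypothetical and stand in for a real stream whose reader has gone away. The asynchronous `'error'` path is not modeled here.

```typescript
// Minimal stand-in for a writable stream; only write() matters here.
interface MiniStream {
  write(chunk: string): boolean;
}

const broken = new WeakSet<MiniStream>();

function safeWriteSketch(stream: MiniStream, chunk: string): void {
  if (broken.has(stream)) return; // never touch a stream that already failed
  try {
    stream.write(chunk);
  } catch {
    broken.add(stream); // e.g. a synchronous EPIPE
  }
}

// A stream whose reader has gone away: every write throws.
let attempts = 0;
const closedPipe: MiniStream = {
  write(_chunk: string): boolean {
    attempts++;
    throw Object.assign(new Error('write EPIPE'), { code: 'EPIPE' });
  },
};

safeWriteSketch(closedPipe, 'first\n');  // throws internally, stream marked broken
safeWriteSketch(closedPipe, 'second\n'); // skipped
safeWriteSketch(closedPipe, 'third\n');  // skipped
```

Using a `WeakSet` keyed by the stream means the broken marker does not keep the stream alive and is shared by every reporter writing to it.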
|
||||
// ---------------------------------------------------------------------------
|
||||
// Rendering helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
function renderHumanLine(phase: string, done: number | undefined, total: number | undefined, note: string | undefined): string {
|
||||
const parts: string[] = [`[${phase}]`];
|
||||
if (typeof done === 'number') {
|
||||
if (typeof total === 'number' && total > 0) {
|
||||
const pct = Math.floor((done / total) * 100);
|
||||
parts.push(`${done}/${total} (${pct}%)`);
|
||||
} else {
|
||||
parts.push(`${done}`);
|
||||
}
|
||||
}
|
||||
if (note) parts.push(note);
|
||||
return parts.join(' ');
|
||||
}
|
||||
|
||||
function nowIso(): string {
|
||||
return new Date().toISOString();
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Phase state (per start/finish lifecycle of one reporter instance)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
interface PhaseState {
|
||||
phase: string;
|
||||
total?: number;
|
||||
done: number;
|
||||
startedAt: number;
|
||||
lastEmitMs: number;
|
||||
lastDoneEmitted: number;
|
||||
heartbeatTimer?: ReturnType<typeof setInterval>;
|
||||
live: LivePhase | null; // membership in liveReporters for signal cleanup
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Reporter factory
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
interface ReporterInternal extends ProgressReporter {
|
||||
_phasePath: string[]; // for child phase path composition
|
||||
}
|
||||
|
||||
class Reporter implements ReporterInternal {
|
||||
_phasePath: string[];
|
||||
private stream: NodeJS.WritableStream;
|
||||
private renderMode: 'human-tty' | 'human-plain' | 'json' | 'quiet';
|
||||
private minIntervalMs: number;
|
||||
private minItemsOverride?: number;
|
||||
private state: PhaseState | null = null;
|
||||
|
||||
constructor(parentPath: string[], opts: Required<Omit<ProgressOptions, 'stream' | 'minIntervalMs' | 'minItems'>> & {
|
||||
stream: NodeJS.WritableStream;
|
||||
minIntervalMs: number;
|
||||
minItems?: number;
|
||||
}) {
|
||||
this._phasePath = parentPath;
|
||||
this.stream = opts.stream;
|
||||
this.renderMode = resolveMode(opts.mode, opts.stream);
|
||||
this.minIntervalMs = opts.minIntervalMs;
|
||||
this.minItemsOverride = opts.minItems;
|
||||
if (this.renderMode !== 'quiet') {
|
||||
attachErrorListener(this.stream);
|
||||
installSignalHandler();
|
||||
}
|
||||
}
|
||||
|
||||
private defaultMinItems(total?: number): number {
|
||||
if (this.minItemsOverride !== undefined) return this.minItemsOverride;
|
||||
const base = total && total > 0 ? total : 1000;
|
||||
return Math.max(10, Math.ceil(base / 100));
|
||||
}
|
||||
|
||||
private emitJson(obj: Record<string, unknown>): void {
|
||||
safeWrite(this.stream, JSON.stringify(obj) + '\n');
|
||||
}
|
||||
|
||||
private emitHumanLine(line: string): void {
|
||||
if (this.renderMode === 'human-tty') {
|
||||
// \r rewrite: clear-to-EOL then carriage-return-positioned line.
|
||||
safeWrite(this.stream, `\r\x1b[2K${line}`);
|
||||
} else {
|
||||
safeWrite(this.stream, line + '\n');
|
||||
}
|
||||
}
|
||||
|
||||
private finalizeHumanLine(): void {
|
||||
// When a TTY phase ends, move to a new line so subsequent output doesn't overwrite.
|
||||
if (this.renderMode === 'human-tty') safeWrite(this.stream, '\n');
|
||||
}
|
||||
|
||||
private phaseName(localPhase: string): string {
|
||||
return [...this._phasePath, localPhase].join('.');
|
||||
}
|
||||
|
||||
start(localPhase: string, total?: number): void {
|
||||
// Auto-finish prior phase if caller forgot.
|
||||
if (this.state) this.finish();
|
||||
|
||||
const phase = this.phaseName(localPhase);
|
||||
const now = Date.now();
|
||||
const s: PhaseState = {
|
||||
phase,
|
||||
total,
|
||||
done: 0,
|
||||
startedAt: now,
|
||||
lastEmitMs: now,
|
||||
lastDoneEmitted: 0,
|
||||
live: null,
|
||||
};
|
||||
this.state = s;
|
||||
|
||||
// Register with signal coordinator.
|
||||
const live: LivePhase = {
|
||||
reporter: s,
|
||||
abort: (reason) => this.abortFromSignal(reason),
|
||||
};
|
||||
liveReporters.add(live);
|
||||
s.live = live;
|
||||
|
||||
if (this.renderMode === 'quiet') return;
|
||||
|
||||
if (this.renderMode === 'json') {
|
||||
const obj: Record<string, unknown> = { event: 'start', phase, ts: nowIso() };
|
||||
if (typeof total === 'number') obj.total = total;
|
||||
this.emitJson(obj);
|
||||
} else {
|
||||
this.emitHumanLine(renderHumanLine(phase, undefined, total, 'start'));
|
||||
}
|
||||
}
|
||||
|
||||
tick(n: number = 1, note?: string): void {
|
||||
const s = this.state;
|
||||
if (!s) return;
|
||||
s.done += n;
|
||||
|
||||
if (this.renderMode === 'quiet') return;
|
||||
|
||||
const now = Date.now();
|
||||
const sinceEmit = now - s.lastEmitMs;
|
||||
const itemsSinceEmit = s.done - s.lastDoneEmitted;
|
||||
const minItems = this.defaultMinItems(s.total);
|
||||
const isFinalTick = s.total !== undefined && s.done >= s.total;
|
||||
|
||||
// Emit if: time-gate passed, OR enough items since last emit, OR this is the final tick.
|
||||
const shouldEmit = sinceEmit >= this.minIntervalMs || itemsSinceEmit >= minItems || isFinalTick;
|
||||
if (!shouldEmit) return;
|
||||
|
||||
s.lastEmitMs = now;
|
||||
s.lastDoneEmitted = s.done;
|
||||
|
||||
const elapsedMs = now - s.startedAt;
|
||||
if (this.renderMode === 'json') {
|
||||
const obj: Record<string, unknown> = {
|
||||
event: 'tick',
|
||||
phase: s.phase,
|
||||
done: s.done,
|
||||
elapsed_ms: elapsedMs,
|
||||
ts: nowIso(),
|
||||
};
|
||||
if (typeof s.total === 'number' && s.total > 0) {
|
||||
obj.total = s.total;
|
||||
obj.pct = Math.round((s.done / s.total) * 1000) / 10; // one decimal
|
||||
if (s.done > 0) {
|
||||
const msPerItem = elapsedMs / s.done;
|
||||
const remaining = Math.max(0, s.total - s.done);
|
||||
obj.eta_ms = Math.round(msPerItem * remaining);
|
||||
}
|
||||
}
|
||||
if (note) obj.note = note;
|
||||
this.emitJson(obj);
|
||||
} else {
|
||||
this.emitHumanLine(renderHumanLine(s.phase, s.done, s.total, note));
|
||||
}
|
||||
}
|
||||
|
||||
heartbeat(note: string): void {
|
||||
const s = this.state;
|
||||
if (!s) return;
|
||||
if (this.renderMode === 'quiet') return;
|
||||
|
||||
const now = Date.now();
|
||||
const elapsedMs = now - s.startedAt;
|
||||
|
||||
if (this.renderMode === 'json') {
|
||||
this.emitJson({
|
||||
event: 'heartbeat',
|
||||
phase: s.phase,
|
||||
note,
|
||||
elapsed_ms: elapsedMs,
|
||||
ts: nowIso(),
|
||||
});
|
||||
} else {
|
||||
this.emitHumanLine(renderHumanLine(s.phase, undefined, undefined, note));
|
||||
}
|
||||
}
|
||||
|
||||
finish(note?: string): void {
|
||||
const s = this.state;
|
||||
if (!s) return;
|
||||
|
||||
if (s.heartbeatTimer) {
|
||||
clearInterval(s.heartbeatTimer);
|
||||
s.heartbeatTimer = undefined;
|
||||
}
|
||||
if (s.live) {
|
||||
liveReporters.delete(s.live);
|
||||
s.live = null;
|
||||
}
|
||||
|
||||
if (this.renderMode !== 'quiet') {
|
||||
const elapsedMs = Date.now() - s.startedAt;
|
||||
if (this.renderMode === 'json') {
|
||||
const obj: Record<string, unknown> = {
|
||||
event: 'finish',
|
||||
phase: s.phase,
|
||||
elapsed_ms: elapsedMs,
|
||||
ts: nowIso(),
|
||||
};
|
||||
if (s.done > 0) obj.done = s.done;
|
||||
if (typeof s.total === 'number') obj.total = s.total;
|
||||
if (note) obj.note = note;
|
||||
this.emitJson(obj);
|
||||
} else {
|
||||
this.emitHumanLine(renderHumanLine(s.phase, s.done > 0 ? s.done : undefined, s.total, note ?? 'done'));
|
||||
this.finalizeHumanLine();
|
||||
}
|
||||
}
|
||||
|
||||
this.state = null;
|
||||
}
|
||||
|
||||
private abortFromSignal(reason: string): void {
|
||||
const s = this.state;
|
||||
if (!s) return;
|
||||
if (s.heartbeatTimer) {
|
||||
clearInterval(s.heartbeatTimer);
|
||||
s.heartbeatTimer = undefined;
|
||||
}
|
||||
if (this.renderMode !== 'quiet') {
|
||||
const elapsedMs = Date.now() - s.startedAt;
|
||||
if (this.renderMode === 'json') {
|
||||
this.emitJson({
|
||||
event: 'abort',
|
||||
phase: s.phase,
|
||||
reason,
|
||||
elapsed_ms: elapsedMs,
|
||||
ts: nowIso(),
|
||||
});
|
||||
} else {
|
||||
this.emitHumanLine(renderHumanLine(s.phase, s.done > 0 ? s.done : undefined, s.total, `aborted (${reason})`));
|
||||
this.finalizeHumanLine();
|
||||
}
|
||||
}
|
||||
if (s.live) {
|
||||
liveReporters.delete(s.live);
|
||||
s.live = null;
|
||||
}
|
||||
this.state = null;
|
||||
}
|
||||
|
||||
child(localPhase: string, _total?: number): ProgressReporter {
|
||||
// Children inherit mode, stream, rate settings. The child's prefix path
|
||||
// is the parent's currently-active FULL phase (if any) plus the local
|
||||
// child-name passed here, so child.start('file1') renders as
|
||||
// '<parent-phase>.<child-name>.file1'. If parent has no active phase,
|
||||
// fall back to parent's own prefix.
|
||||
const childPath = this.state
|
||||
? [this.state.phase, localPhase]
|
||||
: [...this._phasePath, localPhase];
|
||||
const child = new Reporter(childPath, {
|
||||
mode: this.modeForChildren(),
|
||||
stream: this.stream,
|
||||
minIntervalMs: this.minIntervalMs,
|
||||
minItems: this.minItemsOverride,
|
||||
});
|
||||
return child;
|
||||
}
|
||||
|
||||
/**
|
||||
* Expose a heartbeat timer to external callers. The reporter owns the timer
|
||||
* so we can guarantee cleanup on finish/abort. Caller uses the returned
|
||||
* stopper in a try/finally. Internal helper — the canonical user API is:
|
||||
*
|
||||
* p.start('phase');
|
||||
* const stop = startHeartbeat(p, 'still scanning…');
|
||||
* try { await slowWork(); } finally { stop(); p.finish(); }
|
||||
*/
|
||||
|
||||
// modeForChildren preserves the fully-resolved mode (so a parent in 'json'
|
||||
// doesn't re-evaluate TTY for children — they inherit the explicit mode).
|
||||
private modeForChildren(): ProgressMode {
|
||||
switch (this.renderMode) {
|
||||
case 'human-tty':
|
||||
case 'human-plain':
|
||||
return 'human';
|
||||
case 'json':
|
||||
return 'json';
|
||||
case 'quiet':
|
||||
return 'quiet';
|
||||
}
|
||||
}
|
||||
}
|
||||
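The rate gating and ETA arithmetic used by `tick()` can be restated as pure functions, which makes the thresholds easier to reason about. The sketch below mirrors the formulas in the class above; the free-standing function names and the `GateInput` shape are hypothetical.

```typescript
// Default item gate: roughly 1% of the total, never fewer than 10 items.
// With an unknown total, 1000 is assumed, as in the reporter.
function defaultMinItems(total?: number): number {
  const base = total && total > 0 ? total : 1000;
  return Math.max(10, Math.ceil(base / 100));
}

interface GateInput {
  sinceEmitMs: number;    // time since the last emitted line
  itemsSinceEmit: number; // items finished since the last emitted line
  minIntervalMs: number;
  minItems: number;
  isFinalTick: boolean;   // done >= total, when total is known
}

// Emit when the time gate OR the item gate passes, or on the final tick.
function shouldEmit(g: GateInput): boolean {
  return g.sinceEmitMs >= g.minIntervalMs || g.itemsSinceEmit >= g.minItems || g.isFinalTick;
}

// Linear ETA: average time per finished item times the items left.
function etaMs(elapsedMs: number, done: number, total: number): number | undefined {
  if (done <= 0 || total <= 0) return undefined;
  return Math.round((elapsedMs / done) * Math.max(0, total - done));
}

const gate5000 = defaultMinItems(5000); // 1% of 5000
const gateUnknown = defaultMinItems();  // falls back to the floor of 10
const quietTick = shouldEmit({
  sinceEmitMs: 200, itemsSinceEmit: 3, minIntervalMs: 1000, minItems: gate5000, isFinalTick: false,
});
const lastTick = shouldEmit({
  sinceEmitMs: 200, itemsSinceEmit: 3, minIntervalMs: 1000, minItems: gate5000, isFinalTick: true,
});
const eta = etaMs(2000, 50, 200); // 40 ms per item, 150 items left
```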
|
||||
// ---------------------------------------------------------------------------
|
||||
// Public API
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
export function createProgress(opts: ProgressOptions = {}): ProgressReporter {
|
||||
const stream = opts.stream ?? process.stderr;
|
||||
return new Reporter([], {
|
||||
mode: opts.mode ?? 'auto',
|
||||
stream,
|
||||
minIntervalMs: opts.minIntervalMs ?? 1000,
|
||||
minItems: opts.minItems,
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Starts an interval (intervalMs, default 1000) that fires p.heartbeat(note). Returns a stop
|
||||
* function to call in finally. Safe to stop twice.
|
||||
*
|
||||
* Use for single long-running queries where there's no iteration to tick.
|
||||
*/
|
||||
export function startHeartbeat(p: ProgressReporter, note: string, intervalMs = 1000): () => void {
|
||||
const timer = setInterval(() => {
|
||||
try {
|
||||
p.heartbeat(note);
|
||||
} catch {
|
||||
/* reporter may be finished; ignore */
|
||||
}
|
||||
}, intervalMs);
|
||||
let stopped = false;
|
||||
return () => {
|
||||
if (stopped) return;
|
||||
stopped = true;
|
||||
clearInterval(timer);
|
||||
};
|
||||
}
|
||||
|
||||
// Test-only hook so we can assert one signal handler across many reporters.
|
||||
// Not part of the public API; used by test/progress.test.ts.
|
||||
export function __liveReporterCountForTest(): number {
|
||||
return liveReporters.size;
|
||||
}
|
||||
|
||||
export function __signalHandlerInstalledForTest(): boolean {
|
||||
return signalHandlerInstalled;
|
||||
}
|
||||
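The naming rule that `child()` applies in the file above is easy to misread, so here it is as a pure model. The sketch assumes only what the `child()` and `phaseName()` bodies show; `childPrefix`, `fullPhase` and the `sync` / `import` / `file1` names are hypothetical.

```typescript
// Prefix a child reporter receives: the parent's active FULL phase plus the
// child name, or the parent's own prefix plus the child name when idle.
function childPrefix(
  parentPrefix: string[],
  activeParentPhase: string | null,
  childName: string,
): string[] {
  return activeParentPhase !== null
    ? [activeParentPhase, childName]
    : [...parentPrefix, childName];
}

// Full dotted phase emitted when a reporter with `prefix` starts `localPhase`.
function fullPhase(prefix: string[], localPhase: string): string {
  return [...prefix, localPhase].join('.');
}

// Root reporter (empty prefix) has started 'sync'; a child named 'import'
// then starts 'file1'.
const rootPrefix: string[] = [];
const activePhase = fullPhase(rootPrefix, 'sync');
const withActiveParent = fullPhase(childPrefix(rootPrefix, activePhase, 'import'), 'file1');

// Same child, but created before the parent started any phase.
const withIdleParent = fullPhase(childPrefix(rootPrefix, null, 'import'), 'file1');
```

The practical consequence is that the phase path depends on when the child is created: a child made while the parent phase is active nests under it, while one made earlier does not.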
@@ -161,17 +161,27 @@ export function ndcgAtK(hits: string[], grades: Map<string, number>, k: number):
  * Run a full evaluation of one search configuration against all qrels.
  * Returns an EvalReport with per-query and mean metrics.
  */
+export interface RunEvalOptions {
+  /**
+   * Optional per-query progress callback. Called after each qrel finishes.
+   * CLI wrappers pass a reporter.tick()-backed implementation; no-op otherwise.
+   */
+  onProgress?: (done: number, total: number, query: string) => void;
+}
+
 export async function runEval(
   engine: BrainEngine,
   qrels: EvalQrel[],
   config: EvalConfig,
   k = 5,
+  options: RunEvalOptions = {},
 ): Promise<EvalReport> {
   const strategy = config.strategy ?? 'hybrid';
   const limit = config.limit ?? Math.max(k * 2, 10);
 
   const queryResults: QueryResult[] = [];
 
+  let done = 0;
   for (const qrel of qrels) {
     const hits = await runQuery(engine, qrel.query, strategy, config, limit);
 
@@ -186,6 +196,8 @@ export async function runEval(
       mrr: mrr(hits, relevantSet),
       ndcg_at_k: ndcgAtK(hits, gradesMap, k),
     });
+    done++;
+    options.onProgress?.(done, qrels.length, qrel.query);
   }
 
   return {
|
||||
161
test/cli-options.test.ts
Normal file
@@ -0,0 +1,161 @@
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { spawnSync } from 'node:child_process';
|
||||
import { join } from 'node:path';
|
||||
import { parseGlobalFlags, cliOptsToProgressOptions, DEFAULT_CLI_OPTIONS, setCliOptions, getCliOptions, _resetCliOptionsForTest } from '../src/core/cli-options.ts';
|
||||
|
||||
describe('parseGlobalFlags', () => {
|
||||
test('empty argv → defaults, empty rest', () => {
|
||||
const r = parseGlobalFlags([]);
|
||||
expect(r.cliOpts).toEqual(DEFAULT_CLI_OPTIONS);
|
||||
expect(r.rest).toEqual([]);
|
||||
});
|
||||
|
||||
test('strips --quiet from argv and sets quiet=true', () => {
|
||||
// Per-command handlers that historically parsed their own --quiet
|
||||
// (skillpack-check) now read the resolved CliOptions singleton via
|
||||
// getCliOptions() — see src/core/cli-options.ts.
|
||||
const r = parseGlobalFlags(['--quiet', 'doctor', '--fast']);
|
||||
expect(r.cliOpts.quiet).toBe(true);
|
||||
expect(r.cliOpts.progressJson).toBe(false);
|
||||
expect(r.rest).toEqual(['doctor', '--fast']);
|
||||
});
|
||||
|
||||
test('strips --progress-json from argv', () => {
|
||||
const r = parseGlobalFlags(['--progress-json', 'doctor']);
|
||||
expect(r.cliOpts.progressJson).toBe(true);
|
||||
expect(r.rest).toEqual(['doctor']);
|
||||
});
|
||||
|
||||
test('--progress-interval=500 form', () => {
|
||||
const r = parseGlobalFlags(['--progress-interval=500', 'embed']);
|
||||
expect(r.cliOpts.progressInterval).toBe(500);
|
||||
expect(r.rest).toEqual(['embed']);
|
||||
});
|
||||
|
||||
test('--progress-interval 500 space-separated form', () => {
|
||||
const r = parseGlobalFlags(['--progress-interval', '500', 'embed']);
|
||||
expect(r.cliOpts.progressInterval).toBe(500);
|
||||
expect(r.rest).toEqual(['embed']);
|
||||
});
|
||||
|
||||
test('global flag interleaved mid-argv still stripped', () => {
|
||||
const r = parseGlobalFlags(['doctor', '--progress-json', '--fast']);
|
||||
expect(r.cliOpts.progressJson).toBe(true);
|
||||
expect(r.rest).toEqual(['doctor', '--fast']);
|
||||
});
|
||||
|
||||
test('invalid --progress-interval value passes through (per-command parser can handle it)', () => {
|
||||
const r = parseGlobalFlags(['--progress-interval=abc', 'doctor']);
|
||||
// Unparseable value → leave the flag in rest, default interval kept.
|
||||
expect(r.cliOpts.progressInterval).toBe(DEFAULT_CLI_OPTIONS.progressInterval);
|
||||
expect(r.rest).toEqual(['--progress-interval=abc', 'doctor']);
|
||||
});
|
||||
|
||||
test('negative --progress-interval rejected', () => {
|
||||
const r = parseGlobalFlags(['--progress-interval=-1', 'doctor']);
|
||||
expect(r.cliOpts.progressInterval).toBe(DEFAULT_CLI_OPTIONS.progressInterval);
|
||||
expect(r.rest).toContain('--progress-interval=-1');
|
||||
});
|
||||
|
||||
test('unknown flags pass through unchanged', () => {
|
||||
const r = parseGlobalFlags(['doctor', '--fast', '--json', '--foo=bar']);
|
||||
expect(r.rest).toEqual(['doctor', '--fast', '--json', '--foo=bar']);
|
||||
expect(r.cliOpts).toEqual(DEFAULT_CLI_OPTIONS);
|
||||
});
|
||||
|
||||
test('all global flags combined', () => {
|
||||
const r = parseGlobalFlags(['--quiet', '--progress-json', '--progress-interval=250', 'sync']);
|
||||
expect(r.cliOpts).toEqual({ quiet: true, progressJson: true, progressInterval: 250 });
|
||||
expect(r.rest).toEqual(['sync']);
|
||||
});
|
||||
});
|
||||
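The tests above pin down the observable behavior of `parseGlobalFlags` without showing its body. The sketch below is one parser that would satisfy those cases; it is not the implementation in `src/core/cli-options.ts`. The name `parseGlobalFlagsSketch`, the digits-only validation, and the default values (inferred from the assertions) are assumptions.

```typescript
interface CliOptions {
  quiet: boolean;
  progressJson: boolean;
  progressInterval: number;
}

// Defaults inferred from the test assertions, not copied from the source.
const DEFAULTS: CliOptions = { quiet: false, progressJson: false, progressInterval: 1000 };

// Assumption: only plain non-negative integers count as valid intervals.
function parseIntervalMs(raw: string): number | null {
  return /^\d+$/.test(raw) ? Number(raw) : null;
}

function parseGlobalFlagsSketch(argv: string[]): { cliOpts: CliOptions; rest: string[] } {
  const cliOpts: CliOptions = { ...DEFAULTS };
  const rest: string[] = [];
  const prefix = '--progress-interval=';
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--quiet') { cliOpts.quiet = true; continue; }
    if (arg === '--progress-json') { cliOpts.progressJson = true; continue; }
    if (arg.startsWith(prefix)) {
      const n = parseIntervalMs(arg.slice(prefix.length));
      if (n !== null) { cliOpts.progressInterval = n; continue; }
    } else if (arg === '--progress-interval' && i + 1 < argv.length) {
      const n = parseIntervalMs(argv[i + 1]);
      if (n !== null) { cliOpts.progressInterval = n; i++; continue; }
    }
    rest.push(arg); // unknown or unparseable: leave it for the command parser
  }
  return { cliOpts, rest };
}

const interleaved = parseGlobalFlagsSketch(['doctor', '--progress-json', '--fast']);
const spaced = parseGlobalFlagsSketch(['--progress-interval', '500', 'embed']);
const invalid = parseGlobalFlagsSketch(['--progress-interval=abc', 'doctor']);
```

Passing unparseable values through rather than throwing keeps the global pre-pass from masking a per-command error message, which matches the intent stated in the test names.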
|
||||
describe('getCliOptions / setCliOptions singleton', () => {
|
||||
test('defaults when never set', () => {
|
||||
_resetCliOptionsForTest();
|
||||
expect(getCliOptions()).toEqual(DEFAULT_CLI_OPTIONS);
|
||||
});
|
||||
|
||||
test('setCliOptions applies + getCliOptions returns a copy', () => {
|
||||
_resetCliOptionsForTest();
|
||||
setCliOptions({ quiet: false, progressJson: true, progressInterval: 250 });
|
||||
expect(getCliOptions().progressJson).toBe(true);
|
||||
expect(getCliOptions().progressInterval).toBe(250);
|
||||
});
|
||||
});
|
||||
|
||||
describe('cli.ts global-flag stripping (integration)', () => {
|
||||
const CLI = join(import.meta.dir, '..', 'src', 'cli.ts');
|
||||
|
||||
test('gbrain --progress-json --version works (global flag stripped before dispatch)', () => {
|
||||
const res = spawnSync('bun', [CLI, '--progress-json', '--version'], {
|
||||
encoding: 'utf-8',
|
||||
env: { ...process.env, NO_COLOR: '1' },
|
||||
});
|
||||
expect(res.status).toBe(0);
|
||||
expect(res.stdout).toContain('gbrain ');
|
||||
});
|
||||
|
||||
test('gbrain --quiet --progress-interval=500 version works (flags interleaved, all stripped)', () => {
|
||||
const res = spawnSync('bun', [CLI, '--quiet', '--progress-interval=500', 'version'], {
|
||||
encoding: 'utf-8',
|
||||
env: { ...process.env, NO_COLOR: '1' },
|
||||
});
|
||||
expect(res.status).toBe(0);
|
||||
expect(res.stdout).toContain('gbrain ');
|
||||
});
|
||||
});
|
||||
|
||||
describe('CLI integration: progress streams to the right channel', () => {
|
||||
const CLI = join(import.meta.dir, '..', 'src', 'cli.ts');
|
||||
|
||||
test('gbrain --progress-json --version emits only the version on stdout', () => {
|
||||
// `version` is a single-shot command that goes through the main()
|
||||
// dispatch path. We want to confirm --progress-json doesn't force
|
||||
// stray progress onto stdout for commands that don't use a reporter.
|
||||
const res = spawnSync('bun', [CLI, '--progress-json', '--version'], {
|
||||
encoding: 'utf-8',
|
||||
env: { ...process.env, NO_COLOR: '1' },
|
||||
});
|
||||
expect(res.status).toBe(0);
|
||||
expect(res.stdout.trim()).toMatch(/^gbrain /);
|
||||
// No JSON progress object should end up on stdout.
|
||||
expect(res.stdout).not.toContain('"event":"start"');
|
||||
});
|
||||
|
||||
test('gbrain --quiet skillpack-check returns exit code with no stdout', () => {
|
||||
// Regression guard for the flag-collision that skillpack-check hit
|
||||
// when --quiet briefly passed through argv. Now it reads the singleton.
|
||||
const res = spawnSync('bun', [CLI, '--quiet', 'skillpack-check'], {
|
||||
encoding: 'utf-8',
|
||||
env: { ...process.env, NO_COLOR: '1' },
|
||||
});
|
||||
// Exit may be 0 or 1 depending on whether a brain is configured;
|
||||
// what matters is stdout stays empty.
|
||||
expect(res.stdout).toBe('');
|
||||
});
|
||||
});
|
||||
|
||||
describe('cliOptsToProgressOptions', () => {
|
||||
test('--quiet → quiet mode', () => {
|
||||
const opts = cliOptsToProgressOptions({ quiet: true, progressJson: false, progressInterval: 1000 });
|
||||
expect(opts.mode).toBe('quiet');
|
||||
});
|
||||
|
||||
test('--progress-json → json mode with interval', () => {
|
||||
const opts = cliOptsToProgressOptions({ quiet: false, progressJson: true, progressInterval: 500 });
|
||||
expect(opts.mode).toBe('json');
|
||||
expect(opts.minIntervalMs).toBe(500);
|
||||
});
|
||||
|
||||
test('defaults → auto mode', () => {
|
||||
const opts = cliOptsToProgressOptions(DEFAULT_CLI_OPTIONS);
|
||||
expect(opts.mode).toBe('auto');
|
||||
expect(opts.minIntervalMs).toBe(1000);
|
||||
});
|
||||
|
||||
test('quiet takes priority over progressJson', () => {
|
||||
const opts = cliOptsToProgressOptions({ quiet: true, progressJson: true, progressInterval: 1000 });
|
||||
expect(opts.mode).toBe('quiet');
|
||||
});
|
||||
});
|
||||
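The four `cliOptsToProgressOptions` cases above describe a simple precedence: quiet wins over JSON, and JSON wins over the default. A minimal sketch consistent with those assertions follows. `toProgressOptionsSketch` is a hypothetical name, and the direct mapping of `progressInterval` to `minIntervalMs` is inferred from the tests rather than taken from the source.

```typescript
type ProgressMode = 'auto' | 'human' | 'json' | 'quiet';

interface CliOptions {
  quiet: boolean;
  progressJson: boolean;
  progressInterval: number;
}

// Precedence inferred from the tests: --quiet > --progress-json > auto.
function toProgressOptionsSketch(o: CliOptions): { mode: ProgressMode; minIntervalMs: number } {
  const mode: ProgressMode = o.quiet ? 'quiet' : o.progressJson ? 'json' : 'auto';
  return { mode, minIntervalMs: o.progressInterval };
}

const both = toProgressOptionsSketch({ quiet: true, progressJson: true, progressInterval: 1000 });
const jsonOnly = toProgressOptionsSketch({ quiet: false, progressJson: true, progressInterval: 500 });
const neither = toProgressOptionsSketch({ quiet: false, progressJson: false, progressInterval: 1000 });
```

Resolving to `auto` rather than `human` by default leaves the TTY decision to the reporter, so piped output gets one line per event without the caller having to inspect the stream.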
117
test/e2e/doctor-progress.test.ts
Normal file
@@ -0,0 +1,117 @@
|
||||
/**
|
||||
* E2E — doctor --progress-json streaming.
|
||||
*
|
||||
* Spawns the real CLI against a real Postgres+pgvector instance. Asserts:
|
||||
* - stderr contains one JSON event per DB check (start + heartbeats)
|
||||
* - stdout stays clean of progress (agents that parse stdout don't see
|
||||
* progress garbage mixed with the check results)
|
||||
*
|
||||
* Tier 1 (no API keys). Requires DATABASE_URL or .env.testing.
|
||||
* Run: DATABASE_URL=... bun test test/e2e/doctor-progress.test.ts
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
|
||||
import { spawnSync } from 'child_process';
|
||||
import { join } from 'path';
|
||||
import {
|
||||
hasDatabase, setupDB, teardownDB, importFixtures,
|
||||
} from './helpers.ts';
|
||||
|
||||
const skip = !hasDatabase();
|
||||
const describeE2E = skip ? describe.skip : describe;
|
||||
|
||||
const CLI = join(import.meta.dir, '..', '..', 'src', 'cli.ts');
|
||||
|
||||
describeE2E('gbrain doctor --progress-json (E2E)', () => {
|
||||
beforeAll(async () => {
|
||||
await setupDB();
|
||||
// Seed a handful of pages so the DB checks have something to scan.
|
||||
await importFixtures();
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
await teardownDB();
|
||||
});
|
||||
|
||||
test('stderr has JSONL progress events, stdout stays clean', () => {
|
||||
const res = spawnSync('bun', [CLI, '--progress-json', 'doctor', '--json'], {
|
||||
encoding: 'utf-8',
|
||||
env: { ...process.env, NO_COLOR: '1' },
|
||||
timeout: 30_000,
|
||||
});
|
||||
|
||||
// Even if some checks warn, doctor runs to completion. Failures would
|
||||
// exit non-zero, which is OK — we're testing progress wiring.
|
||||
// Require that some output happened on both streams.
|
||||
expect(res.stderr.length).toBeGreaterThan(0);
|
||||
expect(res.stdout.length).toBeGreaterThan(0);
|
||||
|
||||
// Parse stderr as JSONL. Extract every line that looks like a JSON
|
||||
// object; tolerate stray non-JSON lines (warnings, dependency noise).
|
||||
const lines = res.stderr.split('\n').filter(l => l.trim().startsWith('{'));
|
||||
const events: Array<Record<string, unknown>> = [];
|
||||
for (const line of lines) {
|
||||
try {
|
||||
events.push(JSON.parse(line));
|
||||
} catch {
|
||||
// Not a progress event — could be a legacy stderr logger line.
|
||||
}
|
||||
}
|
||||
|
||||
expect(events.length).toBeGreaterThan(0);
|
||||
|
||||
    // We expect at least one 'start' for doctor.db_checks.
    const starts = events.filter(e => e.event === 'start');
    const phases = starts.map(e => e.phase);
    expect(phases).toContain('doctor.db_checks');

    // We expect at least one 'finish' for it too.
    const finishes = events.filter(e => e.event === 'finish');
    expect(finishes.some(e => e.phase === 'doctor.db_checks')).toBe(true);

    // Every event has the canonical schema (event, phase, ts).
    for (const ev of events) {
      expect(typeof ev.event).toBe('string');
      expect(typeof ev.phase).toBe('string');
      expect(typeof ev.ts).toBe('string');
    }

    // Stdout should be doctor's --json payload (array of checks) and nothing
    // that looks like a progress event. Parse it as JSON to ensure no stray
    // progress-line pollution on stdout.
    const parsed = JSON.parse(res.stdout);
    expect(Array.isArray(parsed.checks) || Array.isArray(parsed)).toBe(true);
  });

  test('default (no --progress-json) writes human-plain progress to stderr only', () => {
    const res = spawnSync('bun', [CLI, 'doctor'], {
      encoding: 'utf-8',
      env: { ...process.env, NO_COLOR: '1' },
      timeout: 30_000,
    });

    // Stdout may contain the check summary (human-readable) but should NOT
    // contain `[doctor.db_checks]` — that's stderr territory.
    expect(res.stdout).not.toContain('[doctor.db_checks]');

    // Stderr should contain the phase bracket marker at least once.
    // Skip assertion if the DB had no pages and doctor short-circuits fast.
    if (res.stderr.length > 0) {
      expect(res.stderr).toContain('doctor.db_checks');
    }
  });

  test('--quiet suppresses progress entirely', () => {
    const res = spawnSync('bun', [CLI, '--quiet', 'doctor'], {
      encoding: 'utf-8',
      env: { ...process.env, NO_COLOR: '1' },
      timeout: 30_000,
    });

    // With --quiet the reporter emits no start/finish/tick lines on stderr.
    // Stderr may still contain warnings/errors from doctor's own logger,
    // just no progress phases.
    expect(res.stderr).not.toContain('[doctor.db_checks]');
    expect(res.stderr).not.toContain('"event":"start"');
  });
});
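Reviewer note: the tests above rely on every JSON progress event carrying string `event`, `phase`, and `ts` fields, and on stderr possibly also containing non-progress lines from doctor's own logger. A consumer that wants only the progress events could filter stderr roughly as sketched below. The names `ProgressEvent` and `parseProgressLines` are hypothetical and are not part of this commit; this is a minimal sketch under those assumptions, not the project's parser.

```typescript
// Hypothetical consumer-side helper; not part of the commit under review.
// Assumes one JSON object per stderr line, with string `event`, `phase`, `ts`.
type ProgressEvent = {
  event: string;
  phase: string;
  ts: string;
  [key: string]: unknown;
};

function parseProgressLines(stderr: string): ProgressEvent[] {
  const events: ProgressEvent[] = [];
  for (const line of stderr.split('\n')) {
    const trimmed = line.trim();
    // Skip blank lines and anything that cannot be a JSON object.
    if (!trimmed.startsWith('{')) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // truncated or non-JSON line: ignore rather than fail
    }
    if (typeof parsed !== 'object' || parsed === null) continue;
    const candidate = parsed as Record<string, unknown>;
    // Keep only objects that match the three-field schema the tests assert.
    if (
      typeof candidate.event === 'string' &&
      typeof candidate.phase === 'string' &&
      typeof candidate.ts === 'string'
    ) {
      events.push(candidate as ProgressEvent);
    }
  }
  return events;
}
```

Skipping unparseable lines instead of throwing keeps the consumer tolerant of logger warnings interleaved on stderr, which the `--quiet` test explicitly allows for.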
260
test/progress.test.ts
Normal file
@@ -0,0 +1,260 @@
import { describe, test, expect } from 'bun:test';
import { PassThrough } from 'node:stream';
import { createProgress, startHeartbeat, __liveReporterCountForTest, __signalHandlerInstalledForTest } from '../src/core/progress.ts';

/** Collect everything a reporter writes into a string. */
function sink(isTTY = false): { stream: PassThrough & { isTTY?: boolean }; read: () => string } {
  const s = new PassThrough() as PassThrough & { isTTY?: boolean };
  s.isTTY = isTTY;
  const chunks: string[] = [];
  s.on('data', (c) => chunks.push(c.toString('utf8')));
  return { stream: s, read: () => chunks.join('') };
}

function parseJsonl(raw: string): Record<string, unknown>[] {
  return raw
    .split('\n')
    .filter((l) => l.length > 0)
    .map((l) => JSON.parse(l));
}

describe('progress reporter', () => {
  test('auto mode: non-TTY → human-plain (NOT JSON)', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'auto', stream, minIntervalMs: 0, minItems: 1 });
    p.start('scan', 3);
    p.tick();
    p.tick();
    p.tick();
    p.finish();
    const out = read();
    // plain lines, no JSON
    expect(out).not.toContain('"event"');
    expect(out).toContain('[scan]');
    expect(out).toContain('1/3');
    expect(out).toContain('3/3');
  });

  test('auto mode: TTY → human-\\r (carriage return, no newline between ticks)', () => {
    const { stream, read } = sink(true);
    const p = createProgress({ mode: 'auto', stream, minIntervalMs: 0, minItems: 1 });
    p.start('scan', 2);
    p.tick();
    p.tick();
    p.finish();
    const out = read();
    // TTY path uses \r + clear-line escape; final newline on finish.
    expect(out).toContain('\r');
    expect(out).toContain('[scan]');
  });

  test('json mode emits one JSON object per line with schema', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('doctor.jsonb_integrity', 4);
    p.tick(1, 'pages.frontmatter');
    p.tick(1, 'raw_data.data');
    p.finish();
    const events = parseJsonl(read());
    expect(events.length).toBeGreaterThanOrEqual(3);
    expect(events[0]).toMatchObject({ event: 'start', phase: 'doctor.jsonb_integrity', total: 4 });
    expect(events[0].ts).toMatch(/^\d{4}-\d{2}-\d{2}T/);
    expect(events[1]).toMatchObject({ event: 'tick', phase: 'doctor.jsonb_integrity', done: 1, total: 4 });
    expect(events[1].pct).toBe(25);
    expect(typeof events[1].elapsed_ms).toBe('number');
    expect(events[events.length - 1]).toMatchObject({ event: 'finish', phase: 'doctor.jsonb_integrity' });
  });

  test('quiet mode emits nothing', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'quiet', stream });
    p.start('scan', 10);
    p.tick();
    p.heartbeat('hello');
    p.finish();
    expect(read()).toBe('');
  });

  test('tick() time-gated: calls inside minIntervalMs collapse to one emit', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 5000, minItems: 999999 });
    p.start('scan', 100);
    // Rapid ticks — should not emit intermediate 'tick' events (only the final one if eq total).
    for (let i = 0; i < 10; i++) p.tick();
    const events = parseJsonl(read());
    const ticks = events.filter((e) => e.event === 'tick');
    // 10 ticks, total=100, final-tick-on-complete heuristic doesn't apply (done < total).
    // Time-gated + item-gated should suppress all.
    expect(ticks.length).toBe(0);
    p.finish();
  });

  test('tick() item-gated: minItems threshold emits after N items', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 999999, minItems: 50 });
    p.start('scan', 1000);
    for (let i = 0; i < 100; i++) p.tick();
    p.finish();
    const events = parseJsonl(read());
    const ticks = events.filter((e) => e.event === 'tick');
    // 100 ticks with minItems=50 ⇒ expect ~2 emits
    expect(ticks.length).toBeGreaterThanOrEqual(1);
    expect(ticks.length).toBeLessThanOrEqual(3);
  });

  test('final tick emits regardless of gating when done === total', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 999999, minItems: 999999 });
    p.start('scan', 3);
    p.tick();
    p.tick();
    p.tick(); // this one hits done===total, must emit
    p.finish();
    const events = parseJsonl(read());
    const ticks = events.filter((e) => e.event === 'tick');
    expect(ticks.length).toBe(1);
    expect(ticks[0]).toMatchObject({ done: 3, total: 3 });
  });

  test('start(phase) with no total → ticks omit pct/eta_ms', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('unknown_size_scan'); // no total
    p.tick();
    p.finish();
    const events = parseJsonl(read());
    const tick = events.find((e) => e.event === 'tick')!;
    expect(tick).toBeDefined();
    expect(tick.total).toBeUndefined();
    expect(tick.pct).toBeUndefined();
    expect(tick.eta_ms).toBeUndefined();
    expect(tick.done).toBe(1);
  });

  test('heartbeat() emits without bumping done', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('slow_query');
    p.heartbeat('still scanning…');
    p.heartbeat('still scanning…');
    p.finish();
    const events = parseJsonl(read());
    const hb = events.filter((e) => e.event === 'heartbeat');
    expect(hb.length).toBe(2);
    expect(hb[0]).toMatchObject({ phase: 'slow_query', note: 'still scanning…' });
    // No 'done' field on heartbeat.
    expect(hb[0].done).toBeUndefined();
  });

  test('child() composes phase path with dots', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('sync');
    const c = p.child('import');
    c.start('file1', 1);
    c.tick();
    c.finish();
    p.finish();
    const events = parseJsonl(read());
    const startEvents = events.filter((e) => e.event === 'start');
    const phases = startEvents.map((e) => e.phase);
    expect(phases).toContain('sync');
    expect(phases).toContain('sync.import.file1');
  });

  test('child.finish() does not close parent', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('sync');
    const c = p.child('import');
    c.start('batch1', 1);
    c.tick();
    c.finish();
    // Parent still alive — another tick should work.
    // (parent.tick requires a started phase; start was called on 'sync'.)
    p.tick(1, 'after-child');
    p.finish();
    const events = parseJsonl(read());
    const finishes = events.filter((e) => e.event === 'finish');
    const finishPhases = finishes.map((e) => e.phase);
    expect(finishPhases).toContain('sync.import.batch1');
    expect(finishPhases).toContain('sync');
  });

  test('EPIPE sync throw is swallowed; subsequent writes are no-ops', () => {
    const brokenStream = {
      isTTY: false,
      write: () => {
        throw Object.assign(new Error('write EPIPE'), { code: 'EPIPE' });
      },
      on: () => {},
    } as unknown as NodeJS.WritableStream;
    const p = createProgress({ mode: 'json', stream: brokenStream, minIntervalMs: 0, minItems: 1 });
    // Must not throw.
    expect(() => {
      p.start('scan', 3);
      p.tick();
      p.tick();
      p.finish();
    }).not.toThrow();
  });

  test("EPIPE stream 'error' event marks stream broken", () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('scan', 2);
    p.tick();
    // Simulate async EPIPE via error event.
    stream.emit('error', Object.assign(new Error('EPIPE'), { code: 'EPIPE' }));
    // Subsequent calls must not throw.
    expect(() => {
      p.tick();
      p.finish();
    }).not.toThrow();
    // We did get at least the pre-error emissions.
    expect(read()).toContain('"event":"start"');
  });

  test('only one process-level signal handler installed across many reporters', () => {
    // Baseline: one handler already installed by prior tests in this file.
    const installedBefore = __signalHandlerInstalledForTest();
    const { stream } = sink(false);
    for (let i = 0; i < 50; i++) {
      const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
      p.start(`phase_${i}`, 1);
      p.finish();
    }
    // After 50 reporter lifecycles, still exactly one handler and zero leaked live entries.
    expect(__signalHandlerInstalledForTest()).toBe(installedBefore || true);
    expect(__liveReporterCountForTest()).toBe(0);
  });

  test('startHeartbeat() fires heartbeats and stop() clears', async () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream, minIntervalMs: 0, minItems: 1 });
    p.start('slow_query');
    const stop = startHeartbeat(p, 'still running…', 20);
    await new Promise((r) => setTimeout(r, 85));
    stop();
    p.finish();
    const events = parseJsonl(read());
    const hb = events.filter((e) => e.event === 'heartbeat');
    // Expect ~4 heartbeats in 85ms at 20ms interval, tolerate jitter.
    expect(hb.length).toBeGreaterThanOrEqual(2);
    expect(hb.length).toBeLessThanOrEqual(6);
  });

  test('finish without prior start is a no-op (no crash)', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream });
    expect(() => p.finish()).not.toThrow();
    expect(read()).toBe('');
  });

  test('tick without prior start is a no-op (no crash)', () => {
    const { stream, read } = sink(false);
    const p = createProgress({ mode: 'json', stream });
    expect(() => p.tick()).not.toThrow();
    expect(read()).toBe('');
  });
});
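Reviewer note: the three gating tests in this file (time-gated, item-gated, final tick) are consistent with an "emit if either threshold is met, or if the phase just completed" rule. The sketch below models that rule so the expected counts can be reasoned about; it is not the code in `src/core/progress.ts`, and `createTickGate`, `GateOptions`, and `countEmits` are hypothetical names introduced only for illustration. The real reporter may differ in detail, which is presumably why the item-gated test tolerates 1 to 3 emits.

```typescript
// Hypothetical model of tick gating; names and structure are assumptions,
// not the implementation under review.
type GateOptions = { minIntervalMs: number; minItems: number };

function createTickGate(opts: GateOptions, now: () => number = Date.now) {
  let lastEmitAt = now();
  let lastEmitDone = 0;
  return function shouldEmit(done: number, total?: number): boolean {
    const t = now();
    const due =
      // Completion always emits, regardless of gating.
      (total !== undefined && done === total) ||
      // Enough wall-clock time has passed since the last emit.
      t - lastEmitAt >= opts.minIntervalMs ||
      // Enough items have been processed since the last emit.
      done - lastEmitDone >= opts.minItems;
    if (due) {
      lastEmitAt = t;
      lastEmitDone = done;
    }
    return due;
  };
}

// Count emits for `ticks` single-item ticks with a frozen clock, so only the
// item threshold and the completion rule can trigger (unless minIntervalMs is 0).
function countEmits(opts: GateOptions, ticks: number, total?: number): number {
  const shouldEmit = createTickGate(opts, () => 0);
  let emitted = 0;
  for (let done = 1; done <= ticks; done++) {
    if (shouldEmit(done, total)) emitted++;
  }
  return emitted;
}
```

Injecting the clock as a parameter is a design choice made for the sketch: it lets the gate be exercised deterministically, whereas the tests above get the same effect by setting one threshold to a very large value.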
@@ -194,7 +194,13 @@ describe('buildSyncManifest edge cases', () => {
 describe('sync regression — #132 nested transaction deadlock', () => {
   test('src/commands/sync.ts does not wrap the add/modify loop in engine.transaction()', async () => {
     const source = await Bun.file(new URL('../src/commands/sync.ts', import.meta.url)).text();
-    const loopStart = source.indexOf('for (const path of [...filtered.added, ...filtered.modified]');
+    // Accept either of the historical loop shapes: the original inline
+    // `for (const path of [...filtered.added, ...filtered.modified])` or
+    // the v0.15.2 progress-wrapped variant where the list is hoisted into
+    // a local `addsAndMods` variable first.
+    const inlineIdx = source.indexOf('for (const path of [...filtered.added, ...filtered.modified]');
+    const hoistedIdx = source.indexOf('const addsAndMods = [...filtered.added, ...filtered.modified]');
+    const loopStart = inlineIdx !== -1 ? inlineIdx : hoistedIdx;
     expect(loopStart).toBeGreaterThan(-1);
     const prelude = source.slice(0, loopStart);
     const lastTxIdx = prelude.lastIndexOf('engine.transaction');