feat: v0.18.0 — multi-source brains (one DB, many repos, federation + dotfile resolution) (#337)
* feat(v0.17.0 step 1/9): sources primitive — additive-only multi-source foundation
Lane A of the multi-repo plan. Installs the sources table and seeds a
'default' row that inherits sync.repo_path/last_commit from existing
config. This is the bisectable foundation every later step builds on;
the breaking schema changes (composite UNIQUE, files FK rewrite,
resolution_type, ingest_log.source_id) land with their paired code
rewrites in Steps 2/4/5/7 so no single commit breaks the engine.
- migration v16 (sources_table_additive) + v0_17_0 orchestrator skeleton
- sort-by-version guard in runMigrations (array insertion order can
never cause a later migration to skip a lower one again)
- default source seeded with config '{"federated": true}' so pre-v0.17
brains keep single-namespace search semantics after upgrade
- orchestrator phase B detects absence of file_migration_ledger and
no-ops until Step 7 lands it
- 8 new structural tests in test/migrate.test.ts (shape, idempotency,
scope-guard that nothing else was smuggled into v16)
- apply-migrations tests include v0.17.0 in the registered list
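The sort-by-version guard mentioned above can be pictured with a minimal sketch. The `Migration` shape, the `pendingMigrations` helper, and the two `*_example` entries are assumptions made for illustration; the actual `runMigrations` code in src/core/migrate.ts may be structured differently.

```typescript
// Hypothetical migration shape; the real entries may carry more fields.
interface Migration {
  version: number;
  name: string;
}

// Return migrations newer than currentVersion in ascending version order,
// independent of the order they were appended to the array.
function pendingMigrations(all: readonly Migration[], currentVersion: number): Migration[] {
  return all
    .slice() // copy so the registry itself is not reordered
    .sort((a, b) => a.version - b.version)
    .filter((m) => m.version > currentVersion);
}

// Illustrative registry where 17 was appended after 18 by mistake.
const registered: Migration[] = [
  { version: 16, name: "sources_table_additive" },
  { version: 18, name: "later_example" },
  { version: 17, name: "earlier_example" },
];

const runOrder = pendingMigrations(registered, 15).map((m) => m.version);
console.log(runOrder.join(" -> "));
```

Without the sort, a runner that walks the array in insertion order and bumps the stored version after each step would apply 18 and then treat 17 as already done.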
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 2/9): pages.source_id + composite UNIQUE (Lane B)
Migration v17 adds pages.source_id with DEFAULT 'default' and swaps the
global UNIQUE(slug) for composite UNIQUE(source_id, slug). Ships atomically
with the engine's ON CONFLICT rewrite so the constraint swap and the code
that writes under it land in the same commit — no window where the engine
sees one shape and the schema has another.
Minimum-surface engine change: only putPage's ON CONFLICT target needs
re-targeting. Other slug-based queries work unchanged because single-
source brains (the only brain shape pre-Step-5) have exactly one source
'default', so slug remains effectively unique within it. Step 5+ will
surface an explicit sourceId param on putPage for cross-source sync.
- migration v17 (pages_source_id_composite_unique) in src/core/migrate.ts
- pages.source_id + composite UNIQUE added to schema.sql + pglite-schema.ts
for fresh installs
- ON CONFLICT (slug) → ON CONFLICT (source_id, slug) in both pglite-engine
and postgres-engine putPage
- DEFAULT 'default' closes the Codex-flagged race where an INSERT between
ADD COLUMN and SET NOT NULL could leave source_id NULL
- 5 new v17 structural tests (29 pass / 0 fail in migrate.test.ts)
- Full suite: 1979 pass / 3 fail (same as baseline — no regressions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 6/9): sources CLI + source-resolver (Lane C)
Adds the CLI surface for multi-source management. Users can now register,
list, rename, federate/unfederate, and attach-to-directory a source. The
source-resolver is the shared 6-priority helper that Steps 4/5 will use
when they start surfacing an explicit --source flag on sync/extract/query.
Commands:
gbrain sources add <id> --path <p> [--name <n>] [--federated|--no-federated]
gbrain sources list [--json]
gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
gbrain sources rename <id> <new-name>
gbrain sources default <id>
gbrain sources attach <id> — writes .gbrain-source in CWD
gbrain sources detach
gbrain sources federate <id> / unfederate <id>
Resolution priority (source-resolver.ts) — highest first:
1. --source flag
2. GBRAIN_SOURCE env
3. .gbrain-source dotfile walk-up
4. longest-prefix match on registered local_path (Codex #2 fix)
5. sources.default config
6. fallback 'default'
- add: validates id format (kebab-case alnum, 1-32), rejects overlapping
paths (eng review §4 finding 4.1), supports federated default opt-in
- remove: guards against --yes omission + refuses to remove 'default',
supports --dry-run, reports cascade page count
- attach/detach: matches kubectl/terraform context-pinning semantics
- Throws on overlap rather than process.exit() so the CLI error wrapper
reports it consistently (also makes unit testing clean)
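The six-step resolution priority listed above can be sketched as a pure function. This is an illustration under assumptions, not the code in source-resolver.ts: the `ResolveInput` shape and the `resolveSourceSketch` name are hypothetical, and the dotfile walk-up is assumed to have already happened so the function only sees its result.

```typescript
// Hypothetical inputs: in the CLI these would come from argv, process.env,
// a .gbrain-source walk-up, the sources table, and brain config.
interface ResolveInput {
  flag?: string;          // --source <id>
  env?: string;           // GBRAIN_SOURCE
  dotfile?: string;       // id read from the nearest .gbrain-source
  cwd: string;            // absolute working directory
  registered: { id: string; localPath: string }[];
  configDefault?: string; // sources.default config
}

const stripTrailingSlash = (p: string): string => p.replace(/\/+$/, "");

function resolveSourceSketch(input: ResolveInput): string {
  if (input.flag) return input.flag;       // 1. explicit flag
  if (input.env) return input.env;         // 2. environment variable
  if (input.dotfile) return input.dotfile; // 3. dotfile walk-up

  // 4. Longest registered local_path that contains cwd. Matching on a path
  //    boundary keeps /repo from claiming /repo-old.
  const cwd = stripTrailingSlash(input.cwd);
  let bestId: string | undefined;
  let bestLen = -1;
  for (const s of input.registered) {
    const root = stripTrailingSlash(s.localPath);
    const contains = cwd === root || cwd.startsWith(root + "/");
    if (contains && root.length > bestLen) {
      bestId = s.id;
      bestLen = root.length;
    }
  }
  if (bestId !== undefined) return bestId;

  if (input.configDefault) return input.configDefault; // 5. brain-level default
  return "default";                                    // 6. seeded fallback
}
```

Keeping the filesystem and environment reads outside the function is a design choice of this sketch; it makes each priority level testable with plain objects.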
28 new tests across sources.test.ts (dispatcher + validation + overlap
guard) and source-resolver.test.ts (full 6-priority coverage including
longest-prefix). Full suite: 2012 pass / 3 fail (pre-existing PGLite
infra timeouts).
NOT in scope for Step 6 (deferred):
- import-from-github (SSRF + clone integration)
- prune (retention/TTL, lands v0.18)
- MCP tool-defs regen for source-scoping on read ops (Step 5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(v0.17.0 step 8/9): getting-started guide + migration skill + citation rule
Step 8 (Lane F) documents what Steps 1+2+6 have shipped and sets up
the agent-facing rules for multi-source.
New files:
- skills/migrations/v0.17.0.md — migration skill read by host agents
after `gbrain apply-migrations`. Covers the v16+v17 chain, what's
in v0.17.0 vs what lands later (v0.17.1 ACL, v0.18 sessions), and
the new sources CLI surface. Cites docs/guides/multi-source-brains.md
as the recipe.
- docs/guides/multi-source-brains.md — getting-started for end users.
Three canonical scenarios (unified wiki+gstack / purpose-separated
yc-media+garrys-list / mixed), full resolution priority, federation
flag semantics, command reference, and citation format.
skills/brain-ops/SKILL.md — new "Cross-source citation format"
section mandating `[source-id:slug]` when the brain has multiple
sources. Matches the contract the /plan-devex-review DX review
pinned down (DX Finding 5: surface source_id in every page payload
+ citation contract). Key must be sources.id (immutable), never
sources.name.
No behavior change — this is pure documentation for what already
exists in the binary. 144 skills conformance tests still pass.
NOT in this commit (deferred to later steps):
- docs/guides/repo-architecture.md rewrite (lands with the full
v0.17.0 PR description + release notes)
- skills/_brain-filing-rules.md "which source to file into"
guidance (lands with Step 5 when sync surfaces --source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 5/9): sync --source <id> routes through sources table (Lane D)
Adds the --source flag to `gbrain sync`. When set, sync reads local_path
+ last_commit from the matching sources(id) row instead of the global
sync.repo_path / sync.last_commit config keys, and writes last_commit +
last_sync_at back to the same row. Backward compat: --source omitted =
pre-v0.17 behavior exactly, global config path unchanged.
- SyncOpts.sourceId threaded through performSync + performFullSync
- readSyncAnchor/writeSyncAnchor helpers centralize the sources-vs-config
branch so every read/write goes through one decision point. Makes
Step 5's later per-source sync-failures tracking a one-file change.
- --source resolved via src/core/source-resolver.ts (Step 6), so any
command that shell-exposes resolveSourceId gets env var + dotfile
walk-up + longest-prefix for free.
- Error message for missing source local_path is actionable:
Source "gstack" has no local_path. Run: gbrain sources add gstack --path <path>
- last_sync_at auto-updates on every last_commit advance so `gbrain
sources list` shows real recency.
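As a rough picture of the single decision point described above, the sketch below models the sources table and the global config as in-memory maps. The `SourceRow` and `SyncAnchor` shapes, the `readSyncAnchorSketch` name, and every error message except the "has no local_path" one quoted above are assumptions; the real helper reads from the database.

```typescript
// Hypothetical in-memory stand-ins for a sources row and the resolved anchor.
interface SourceRow {
  local_path?: string;
  last_commit?: string;
}

interface SyncAnchor {
  repoPath: string;
  lastCommit?: string;
}

function readSyncAnchorSketch(
  sourceId: string | undefined,
  sources: Map<string, SourceRow>,
  config: Map<string, string>,
): SyncAnchor {
  if (sourceId === undefined) {
    // --source omitted: legacy path through the global config keys.
    const repoPath = config.get("sync.repo_path");
    if (!repoPath) throw new Error("sync.repo_path is not configured"); // illustrative message
    return { repoPath, lastCommit: config.get("sync.last_commit") };
  }
  const row = sources.get(sourceId);
  if (!row) throw new Error(`Unknown source "${sourceId}"`); // illustrative message
  if (!row.local_path) {
    throw new Error(
      `Source "${sourceId}" has no local_path. Run: gbrain sources add ${sourceId} --path <path>`,
    );
  }
  return { repoPath: row.local_path, lastCommit: row.last_commit };
}
```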
No regression: 2012 pass / 3 fail (same as baseline).
NOT in this commit (deferred per plan):
- Per-source failure tracking (~/.gbrain/sources/<id>/sync-failures.jsonl)
- runImport source-awareness (import.ts path — Step 5 continuation)
- Partial-success semantics when walking N sources — single-source flow
today, multi-walk lands when the top-level `gbrain sync` without
--source starts iterating all sources.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 4/9): qualified [[source:slug]] + links.resolution_type (Lane B)
Adds source-pinned wikilink syntax and records the resolution kind on
each edge so `gbrain extract --refresh-unqualified` (future) can
re-resolve bare references when the source topology changes.
Wikilink syntax extension:
[[concepts/ai]] — unqualified; resolves via local-first fallback
[[wiki:concepts/ai]] — qualified; target pinned to sources.id='wiki'
[[gstack:projects/foo|Display]] — qualified + display name
The qualified regex runs first and masks matched spans so the
unqualified pass can't double-emit. Source id format enforced to match
the sources CLI validation: [a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?
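The two-pass extraction described above (qualified first, then mask, then unqualified) can be sketched as follows. Only the source-id pattern is taken from this commit; the surrounding regexes, the `WikiRef` shape, and the `extractWikilinks` name are assumptions, and the real extractor may differ.

```typescript
// Source id pattern as stated in the commit message.
const SOURCE_ID = "[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?";
// [[source:slug]] or [[source:slug|Display]] (illustrative pattern)
const QUALIFIED = new RegExp(`\\[\\[(${SOURCE_ID}):([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]`, "g");
// [[slug]] or [[slug|Display]] (illustrative pattern)
const UNQUALIFIED = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;

interface WikiRef {
  slug: string;
  sourceId?: string; // only set on qualified links
  display?: string;
}

function extractWikilinks(text: string): WikiRef[] {
  const refs: WikiRef[] = [];
  // Pass 1: collect qualified links and blank out each matched span.
  const masked = text.replace(
    QUALIFIED,
    (match: string, sourceId: string, slug: string, display?: string) => {
      refs.push({ slug, sourceId, display });
      return " ".repeat(match.length);
    },
  );
  // Pass 2: the unqualified pattern only sees what pass 1 left behind,
  // so a qualified link cannot be emitted a second time.
  masked.replace(UNQUALIFIED, (match: string, slug: string, display?: string) => {
    refs.push({ slug, display });
    return match;
  });
  return refs;
}

const sample = "See [[concepts/ai]], [[wiki:concepts/ai]] and [[gstack:projects/foo|Display]].";
console.log(extractWikilinks(sample));
```

In this sketch qualified references come first in the output rather than in document order; a caller that needs positions would have to record match offsets as well.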
Schema:
- migration v18 adds links.resolution_type TEXT with CHECK constraint
('qualified'|'unqualified' or NULL for legacy/manual/frontmatter edges)
- schema.sql + pglite-schema.ts updated for fresh installs
EntityRef type:
- sourceId is OPTIONAL (only set on qualified wikilinks). Markdown
[Name](path) and unqualified wikilinks omit it so strict toEqual
tests pre-v0.17 keep working (69 existing tests still pass).
Tests:
- 5 new qualified-wikilink extraction tests + 1 migration v18 structural
assertion. 75 tests in test/link-extraction.test.ts (up from 69).
- Full suite: 2018 pass / 3 fail (pre-existing PGLite infra timeouts).
NOT in this commit (deferred to Step 3 / Step 5 continuation):
- Writing resolution_type to the DB (addLink / addLinksBatch don't
carry the field yet — that's the plumb-through that lands with
Step 3 when search/dedup also needs source-aware result keys).
- `gbrain extract --refresh-unqualified` re-resolver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 3/9): source-aware search dedup composite keys (Lane B)
Search dedup now keys on (source_id, slug) instead of slug alone. Pre-
v0.17 would collapse two same-slug pages in different sources into
one, destroying cross-source recall. Codex outside-voice review flagged
this as regression-critical — this commit ships the fix plus tests
that lock the invariant in.
Dedup pipeline (src/core/search/dedup.ts):
- pageKey(r) helper — one canonical composite-key derivation. Falls
back to source_id='default' for pre-v0.17 rows so single-source
brains behave identically to before.
- Layer 1 (dedupBySource): group-by composite key.
- Layer 4 (capPerPage): count-by composite key.
- guaranteeCompiledTruth: swap scoped to matching (source_id, slug),
so wiki:topics/ai can't accidentally pull gstack:topics/ai's
compiled_truth chunk.
SearchResult type gains optional source_id — populated by SQL JOINs
in both engines, falls through as 'default' for legacy callers.
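A small sketch of the composite-key idea, assuming a trimmed-down `SearchResult` with only the fields needed here. `dedupByPage` is a hypothetical stand-in for one dedup layer, not the code in src/core/search/dedup.ts.

```typescript
// Trimmed-down result shape for illustration only.
interface SearchResult {
  slug: string;
  source_id?: string; // absent on rows from before the column existed
  score: number;
}

// One canonical key derivation: missing source_id falls back to 'default'.
const pageKey = (r: SearchResult): string => `${r.source_id ?? "default"}:${r.slug}`;

// Keep the best-scoring result per (source_id, slug) pair.
function dedupByPage(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const key = pageKey(r);
    const prev = best.get(key);
    if (!prev || r.score > prev.score) best.set(key, r);
  }
  return Array.from(best.values());
}

const deduped = dedupByPage([
  { slug: "topics/ai", source_id: "wiki", score: 0.9 },
  { slug: "topics/ai", source_id: "gstack", score: 0.8 }, // same slug, different source: kept
  { slug: "notes/x", score: 0.5 },                        // legacy row, treated as 'default'
  { slug: "notes/x", source_id: "default", score: 0.7 },  // same page as the legacy row: collapses
]);
console.log(deduped.map(pageKey));
```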
Engine SQL:
- pglite-engine.ts + postgres-engine.ts: search SELECTs add p.source_id
- rowToSearchResult (utils.ts): maps row.source_id → result.source_id
when present. Shape stays backward compatible (field optional).
Tests — 4 new in test/dedup.test.ts:
- same-slug-different-source does NOT collapse (the critical regression
guard Codex called out)
- same-slug-same-source DOES still collapse (no over-correction)
- missing source_id falls back to 'default' for pre-v0.17 compat
- compiled_truth guarantee scopes to composite key (Codex second pass
caught this specific path would leak otherwise)
Full suite: 2022 pass / 3 fail (3 pre-existing PGLite infra timeouts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 7/9): file_migration_ledger + phase-B storage backfill (Lane E)
Adds files.source_id + files.page_id + the file_migration_ledger
state machine that drives storage object rewrites. Each per-file
transition is its own transaction so crash-point recovery is a
ledger read, not a filesystem inspection. Codex second-pass review
flagged that "skip if already has source prefix" was an unsafe
heuristic — the ledger replaces it with explicit state tracking.
Schema:
- migration v19 (files_source_id_page_id_ledger): handler-only
(PGLite has no files table; Postgres-only gate). ADDs
source_id + page_id to files, backfills page_id from page_slug
scoped to source_id='default', creates file_migration_ledger
with PK on file_id (Codex: not storage_path_old — two sources
can share an old path during migration).
- schema.sql updated for fresh Postgres installs; file_migration_ledger
gets RLS alongside other tables.
Runtime:
- src/commands/migrations/v0_17_0-storage-backfill.ts: drives the
ledger state machine pending → copy_done → db_updated → complete.
Idempotent per row: re-running resumes from whichever state
crashed. Old objects preserved (no delete) so operators can
verify the soak window before a future cleanup release.
- phase B in v0_17_0.ts orchestrator: wires the storage backend
(Supabase/S3/local) through createStorage, runs runStorageBackfill,
reports per-state counts + first-three error details.
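The resume-from-any-state behaviour can be sketched with a synchronous toy version of the state machine. The `LedgerRow` fields, the `BackfillSteps` hooks, and `advanceRow` are hypothetical; the real runner is presumably asynchronous, wraps each transition in its own transaction, and also handles a failed state that this sketch omits. What the final db_updated → complete step checks is not stated in the commit, so it is modelled here as a bare state change.

```typescript
type LedgerState = "pending" | "copy_done" | "db_updated" | "complete";

interface LedgerRow {
  fileId: number;
  state: LedgerState;
  oldPath: string;
  newPath: string;
}

// Hypothetical side-effect hooks; saveState stands in for committing the
// ledger row after each transition.
interface BackfillSteps {
  copyObject(oldPath: string, newPath: string): void;
  updateFilePath(fileId: number, newPath: string): void;
  saveState(row: LedgerRow): void;
}

// Advance one row as far as it will go. Each block runs only if the row is
// in the matching state, so a re-run resumes from wherever a crash left it.
function advanceRow(row: LedgerRow, steps: BackfillSteps): void {
  if (row.state === "pending") {
    steps.copyObject(row.oldPath, row.newPath); // old object is left in place
    row.state = "copy_done";
    steps.saveState(row);
  }
  if (row.state === "copy_done") {
    steps.updateFilePath(row.fileId, row.newPath);
    row.state = "db_updated";
    steps.saveState(row);
  }
  if (row.state === "db_updated") {
    row.state = "complete";
    steps.saveState(row);
  }
}
```

Because the persisted state is the only input to the decision, recovery needs no inspection of the storage backend, which is the point the commit makes about replacing the "already has source prefix" heuristic.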
Tests — 13 new in test/storage-backfill.test.ts:
- pending → copy_done → db_updated → complete happy path
- 3 crash-point recovery tests (resume from copy_done, resume from
db_updated, failed rows don't auto-retry)
- already-complete rows are skipped with zero side effects
- idempotent re-upload (exists-check skips redundant upload)
- dry-run mode (no storage, reports counts without mutating)
Plus 5 new migrate.test.ts assertions for v19 structure (handler-
only, PGLite gate, source_id + page_id + ledger DDL, default-source
backfill scope, state machine values).
Full suite: 2035 pass / 3 fail (3 pre-existing PGLite infra
timeouts).
NOT in this commit (explicitly deferred):
- DROP old page_slug column — kept for backward compat until
operators have time to verify page_id everywhere.
- DROP old UNIQUE(storage_path) in favor of UNIQUE(source_id,
storage_path) — same reason, deferred to later cleanup.
- Actual cleanup phase that deletes old objects post-soak.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(v0.17.0 step 9/9): full multi-source PGLite integration suite (Lane G)
End-to-end exercise of every v0.17.0 surface against real PGLite
(in-memory, fast — no DATABASE_URL needed). The migration chain
v2→v19 runs start-to-finish and the test asserts each Step's
invariants hold together.
16 new integration tests across 7 describes:
1. Migration-installed state:
- sources('default') exists with federated=true config
- pages.source_id column has DEFAULT 'default'
- composite UNIQUE (source_id, slug) is installed
2. Default-source write path:
- putPage without explicit source → source_id='default' via schema
default clause (no engine API change needed for single-source brains)
3. Composite UNIQUE regression guards (Codex-flagged):
- Same slug in two different sources coexists
- Third insert with same (source_id, slug) hits the UNIQUE constraint
4. sources CLI round-trip:
- federate / unfederate flips config.federated
- rename changes display, id stays immutable
5. Source resolution priority (integration):
- Explicit flag > env var > fallback to default
- Unregistered explicit source errors with actionable message
6. Cascade semantics:
- sources remove cascades to pages; default source untouched
7. links.resolution_type (Step 4):
- Qualified/unqualified values accepted
- CHECK constraint rejects invalid values
All 16 tests pass. Full suite: 2042 pass / 4 fail (4 pre-existing
PGLite beforeEach timeouts in test/wait-for-completion,
test/extract-fs, test/e2e/search-quality, test/e2e/graph-quality
— count fluctuated 3-5 on baseline from variance alone).
Total new tests across Steps 1-9: ~85 unit + integration tests
(sources, source-resolver, migrate v16/v17/v18/v19 structural,
link-extraction qualified wikilinks, dedup regression-critical,
storage-backfill state machine + crash recovery, full
multi-source PGLite integration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump to v0.18.0 + CHANGELOG entry (multi-source brains)
One-viewport release summary + itemized changes covering all 9 steps
of the multi-source primitive. Notes the v0.17 → v0.18 version bump
rationale (master shipped gbrain dream as v0.17 while this branch was
in flight).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): v0_18_0 orchestrator TS narrow + mechanical test ON CONFLICT
Two CI failures on PR #337:
1. tsc TS2367 at src/commands/migrations/v0_18_0.ts:190 —
after the early-return on `a.status === 'failed'` (line 179),
TypeScript narrows `a.status` to `'skipped' | 'complete'`, so the
subsequent `a.status === 'failed' ? 'failed' :` branch was dead
code and refused to compile. Dropped the redundant check.
2. E2E `file_list LIMIT enforcement` at test/e2e/mechanical.test.ts:636 —
the test pre-seeded a pages row with `ON CONFLICT (slug) DO NOTHING`
but v21 swapped the global UNIQUE for `UNIQUE (source_id, slug)`, so
Postgres rejects with "no unique or exclusion constraint matching".
Updated the conflict target to the composite key.
Tier-1 E2E had only this one failing test; everything else passed.
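For readers unfamiliar with TS2367, here is a reduced illustration of the first failure. The `Status` union and `summarize` function are made up for this example and are not the orchestrator code.

```typescript
type Status = "failed" | "skipped" | "complete";

function summarize(a: { status: Status }): string {
  if (a.status === "failed") return "failed";
  // From here on a.status is narrowed to "skipped" | "complete".
  // Writing `a.status === "failed" ? "failed" : ...` at this point is what
  // tsc rejects with TS2367, because the comparison can never be true.
  return a.status === "skipped" ? "skipped" : "complete";
}

console.log(summarize({ status: "skipped" }));
```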
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(e2e): v0.18.0 multi-source against real Postgres (v20-v23 schema + cascade + sync)
Closes the three biggest confidence gaps the author flagged in the
self-audit of PR #337:
1. No real Postgres E2E — PGLite has no files table, so v23's
files.source_id + files.page_id rewrite + file_migration_ledger
seed was NEVER executed against the real DB. This file covers it.
2. `gbrain sync --source <id>` had zero direct tests. Now has two:
one that asserts performSync({sourceId}) reads local_path from the
sources row (not the global config), one that asserts no-sourceId
falls back to the global sync.repo_path.
3. Cascade delete coverage — previously verified only pages count
after source removal. Now verifies pages + content_chunks +
timeline_entries + links + files ALL cascade-delete when a source
is removed.
6 describes, 16 tests total:
- Schema shape (fresh install): 6 tests confirming sources('default'),
pages.source_id NOT NULL with DEFAULT, composite UNIQUE pages
(source_id, slug) replaces global UNIQUE(slug), links.resolution_type
column + CHECK, files.source_id + page_id columns, file_migration_ledger
table + status CHECK.
- Composite UNIQUE semantics: 3 tests confirming same-slug in two
sources coexists (Codex-critical regression guard), duplicate
(source_id, slug) hits the UNIQUE, putPage targets default source
by schema DEFAULT.
- Cascade delete: 1 test building a fully populated source (2 pages,
chunks, timeline, links, files) then removing it + asserting every
dependent row is gone.
- Sync routing: 2 tests confirming performSync({sourceId}) reads
per-source local_path vs global config.
- Sources surface: 3 tests for federate/unfederate flipping + rename
preserving id.
- Storage backfill: 1 end-to-end test seeding ledger + running
runStorageBackfill against a stub StorageBackend, asserting
pending → complete transition and files.storage_path rewrite.
Gated by DATABASE_URL per CLAUDE.md E2E lifecycle. Each describe's
beforeAll defensively DELETEs non-default sources + file_migration_ledger
rows so reruns are hermetic (sources isn't in helpers.ALL_TABLES).
Verified: 16/16 pass on first run AND second run (residual-state fix
holds). Full E2E suite still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): TS2352 in multi-source E2E — cast postgres.js RowList via unknown
tsc rejects the direct
`(rows as { column_name: string }[]).map(...)`
cast because postgres.js RowList rows have an iterable-row shape that
doesn't overlap with the plain-object target. Standard fix: cast via
`unknown` first so the narrowing is explicit.
Verified: `bunx tsc --noEmit` clean (ignoring the pre-existing baseUrl
deprecation warning).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(v0.18.0): addLinksBatch + addTimelineEntriesBatch source-aware JOINs
Batch APIs JOINed on pages.slug globally, so two pages sharing the same
slug across sources would silently fan out — addLinksBatch(['a->b']) in
a brain with 'a' in both 'default' and 'alt' wrote 2 edges instead of 1.
Same bug on addTimelineEntriesBatch.
Fix:
- LinkBatchInput + TimelineBatchInput gain optional source_id fields
(from_source_id, to_source_id, origin_source_id for links; source_id
for timeline). All default to 'default' so existing callers are
backward-compatible on single-source brains.
- pglite-engine + postgres-engine batch JOINs now composite-key on
(slug, source_id). Postgres adds 3 more unnest arrays for links + 1
for timeline — still one bind per column, no 65535-param cap risk.
- LEFT JOIN for origin pages also source-qualified so frontmatter-
provenance edges don't cross-pollinate across sources.
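The fan-out and its fix can be modelled in memory. The sketch below resolves link endpoints through a composite (source_id, slug) key, mirroring the JOIN change described above; `PageRow`, `LinkInputSketch`, `from_slug`/`to_slug`, and `resolveEdges` are hypothetical names, while `from_source_id` and `to_source_id` follow the field names in this commit.

```typescript
interface PageRow {
  id: number;
  slug: string;
  source_id: string;
}

// Hypothetical batch input; source ids are optional and default to 'default'.
interface LinkInputSketch {
  from_slug: string;
  to_slug: string;
  from_source_id?: string;
  to_source_id?: string;
}

// Resolve each link to exactly one (from, to) page-id pair by looking up the
// composite key. A slug-only lookup would match every source sharing the slug.
function resolveEdges(pages: PageRow[], links: LinkInputSketch[]): Array<[number, number]> {
  const idByKey = new Map<string, number>();
  for (const p of pages) idByKey.set(`${p.source_id}:${p.slug}`, p.id);

  const edges: Array<[number, number]> = [];
  for (const l of links) {
    const from = idByKey.get(`${l.from_source_id ?? "default"}:${l.from_slug}`);
    const to = idByKey.get(`${l.to_source_id ?? "default"}:${l.to_slug}`);
    if (from !== undefined && to !== undefined) edges.push([from, to]);
  }
  return edges;
}

const pages: PageRow[] = [
  { id: 1, slug: "a", source_id: "default" },
  { id: 2, slug: "a", source_id: "alt" },
  { id: 3, slug: "b", source_id: "default" },
];
console.log(resolveEdges(pages, [{ from_slug: "a", to_slug: "b" }]));
```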
Regression coverage:
- test/pglite-engine.test.ts: 5 new tests covering default-path isolation,
explicit alt-source writes, and cross-source edges.
- test/e2e/multi-source.test.ts: 4 new tests against real Postgres so
postgres-js's unnest() bind path is exercised (structurally different
from PGLite's).
Gap #4 from the PR self-audit — latent bug, not previously reachable
because every existing caller wrote to the default source only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
101
CHANGELOG.md
@@ -2,6 +2,107 @@
All notable changes to GBrain will be documented in this file.

## [0.18.0] - 2026-04-22

## **Multi-source brains. One database, many repos. Federated or isolated, you choose.**
## **`gbrain sources` is the new subcommand. `.gbrain-source` is the new dotfile.**

A single gbrain database can now hold multiple knowledge repos — your wiki, your gstack checkout, your yc-media pipeline, your garrys-list essays — with clean scoping per source. Slugs are unique per source, not globally, so two sources can both have `topics/ai` and they are different pages. Every page, every file, every ingest_log row is scoped to a `sources(id)` row.

Per-source federation controls whether a source participates in unqualified default search. `federated=true` is cross-recall (your wiki + gstack both show up when you search "retry budgets"). `federated=false` is isolation (your yc-media content never leaks into your personal writing searches). Flip with `gbrain sources federate <id>` / `unfederate <id>`.

Per-directory default via `.gbrain-source` dotfile walk-up + `GBRAIN_SOURCE` env var. Same mental model as kubectl / terraform / git: `cd ~/yc-media && gbrain query "X"` just works, no `--source` flag needed. Resolution priority: explicit flag > env > dotfile > registered-path-longest-prefix > `sources.default` config > literal `default` fallback.

### The numbers that matter

9 bisectable commits. 4 new schema migrations. ~85 new tests. Full suite: 2063 pass / 17 fail (the 17 pre-existing master timeouts unchanged). Migration chain runs end-to-end against real PGLite in under 1 second for the integration test.

| Metric | BEFORE v0.17 | AFTER v0.18 | Δ |
|---|---|---|---|
| Max repos per brain | 1 | unlimited | unbounded |
| Slug uniqueness | global | per-source | composite |
| Multi-source search | impossible | default (for federated) | native |
| New CLI commands | — | 9 (`sources add/list/remove/rename/default/attach/detach/federate/unfederate`) | +9 |
| Schema migrations shipped | 0 new | 4 (v20-v23) | +4 |
| New unit + integration tests | — | ~85 | +85 |

### What this means for agents

When a brain has multiple sources, every search result carries `source_id`. Agents cite in `[source-id:slug]` form — `[wiki:topics/ai]` or `[gstack:plans/retry-policy]` — so the user can trace which repo each fact came from. The citation key is `sources.id` (immutable), so renaming a source's display name via `gbrain sources rename` never breaks existing citations.

Back-compat is total. Pre-v0.18 brains upgrade into a seeded `default` source with `federated=true`, and their existing code paths target `default` via a schema DEFAULT clause. You literally do not have to change anything to upgrade; you only change things if you want to add a second source.

## To take advantage of v0.18.0

`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor`
warns about a partial migration:

1. **Run the orchestrator manually:**
   ```bash
   gbrain apply-migrations --yes
   ```
2. **Your agent reads `skills/migrations/v0.18.0.md` the next time you interact with it.** The migration chain is fully mechanical (v20 creates the sources table, v21 adds pages.source_id + composite UNIQUE, v22 adds links.resolution_type, v23 adds files.source_id + page_id + file_migration_ledger). No manual data work needed.
3. **Verify the outcome:**
   ```bash
   gbrain sources list   # should show 'default' federated, with your existing page count
   gbrain stats          # existing behavior unchanged
   gbrain doctor
   ```
4. **To start using multi-source:**
   ```bash
   gbrain sources add gstack --path ~/.gstack --no-federated
   cd ~/.gstack && gbrain sources attach gstack
   gbrain sync --source gstack
   ```
5. **If any step fails or the numbers look wrong,** please file an issue: https://github.com/garrytan/gbrain/issues with:
   - output of `gbrain doctor`
   - contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
   - which step broke

### Itemized changes

#### Added

- **`gbrain sources` subcommand group** — add, list, remove, rename, default, attach, detach, federate, unfederate. See `docs/guides/multi-source-brains.md` for three canonical scenarios (unified wiki+gstack / purpose-separated yc-media+garrys-list / mixed).
- **`sources` table** — first-class multi-repo primitive. `(id, name, local_path, last_commit, last_sync_at, config)`. Citation key is `sources.id`, immutable, validated `[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?`.
- **`pages.source_id` column + composite UNIQUE (source_id, slug)** — slugs unique per source. DEFAULT 'default' on the column so existing single-source callers target the default source automatically via schema default.
- **`.gbrain-source` dotfile** — walk-up resolution like kubectl/terraform/git. `gbrain sources attach <id>` writes it in CWD. Auto-selects the source for any command run from that directory or any subdirectory.
- **`GBRAIN_SOURCE` env var** — power-user / CI / script escape hatch. Second highest priority in resolution (after explicit `--source <id>`).
- **Qualified wikilink syntax `[[source:slug]]`** — new in v0.18 extractor. Unqualified `[[slug]]` still resolves via local-first fallback. `links.resolution_type` (a TEXT column with a CHECK constraint allowing `'qualified'` or `'unqualified'`, NULL for legacy/manual/frontmatter edges) records which kind each edge is for future `gbrain extract --refresh-unqualified` re-resolution.
- **`files.source_id` + `files.page_id`** — files now scope per source + reference pages by id (not slug). `file_migration_ledger` drives the S3/Supabase object rewrite under the pending → copy_done → db_updated → complete state machine.
- **`gbrain sync --source <id>`** — per-source sync reads local_path + last_commit from the sources table, writes last_commit + last_sync_at back. Single-source brains keep using the pre-v0.17 `sync.repo_path` / `sync.last_commit` config keys unchanged.

#### Changed

- **Search dedup is now source-aware.** Pre-v0.18 keyed on slug alone; under composite uniqueness that would collapse two same-slug pages in different sources. `pageKey(r) = source_id:slug` is the one canonical helper across all four dedup layers + compiled-truth guarantee. Codex review flagged this as regression-critical.
- **`SearchResult.source_id` optional field** — populated by engine SELECT JOINs. Falls back to `'default'` for pre-v0.18 rows that lacked the column.
- **Migration runner sorts by version** — if anyone adds a migration out of order in `MIGRATIONS[]`, the sort guards against silent skips.

#### Migrations

- **v20** `sources_table_additive` — additive-only. Creates sources table + seeds default row with `{"federated": true}`. Inherits existing `sync.repo_path` / `sync.last_commit`.
- **v21** `pages_source_id_composite_unique` — adds `pages.source_id` with DEFAULT, swaps global `UNIQUE(slug)` for composite `UNIQUE(source_id, slug)`. Lands atomically with the engine's `ON CONFLICT (source_id, slug)` rewrite.
- **v22** `links_resolution_type` — adds `links.resolution_type` as a TEXT column with a CHECK constraint.
- **v23** `files_source_id_page_id_ledger` — Postgres-only (PGLite has no files table). Adds `files.source_id` + `files.page_id`, backfills `page_id` from legacy `page_slug`, creates `file_migration_ledger`.

#### Tests

- `test/sources.test.ts` (14 tests) — CLI dispatcher, validation, overlapping-path guard.
- `test/source-resolver.test.ts` (14 tests) — full 6-priority resolution coverage including longest-prefix match.
- `test/storage-backfill.test.ts` (13 tests) — state machine + 3 crash-point recovery tests (Codex flagged each).
- `test/multi-source-integration.test.ts` (16 tests) — end-to-end against real PGLite, migration chain v2→v23.
- `test/link-extraction.test.ts` (+6) — qualified `[[source:slug]]` parsing + masking + v22 structural.
- `test/dedup.test.ts` (+4) — regression-critical source-aware composite key tests.
- `test/migrate.test.ts` (+18) — v20/v21/v22/v23 structural assertions.

#### Docs

- `docs/guides/multi-source-brains.md` — new getting-started guide (federated / isolated / mixed scenarios).
- `skills/migrations/v0.18.0.md` — agent-facing migration skill.
- `skills/brain-ops/SKILL.md` — new "Cross-source citation format" section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## [0.17.0] - 2026-04-22

## **`gbrain dream`. Run the brain maintenance cycle while you sleep.**
182
docs/guides/multi-source-brains.md
Normal file
@@ -0,0 +1,182 @@
# Multi-source brains

**A single gbrain database can hold multiple knowledge repos.** Each one
is a `source`: a logical brain-within-the-brain with its own slug
namespace, its own sync state, and its own federation policy. The rest
of this guide walks the three canonical scenarios.

## The three scenarios

### 1. Unified knowledge recall (wiki + gstack)

You have a personal wiki and a `gstack` checkout. Both belong to you,
both are knowledge you want your agent to recall across. When you ask
"what did I learn about X?" you want the best hit whether it lives in
the wiki or in a gstack plan.

```bash
# Register the gstack source, federate so it joins cross-source search
gbrain sources add gstack --path ~/.gstack --federated

# Pin the directory so `gbrain sync` knows which source it's walking
cd ~/.gstack && gbrain sources attach gstack

# Initial sync
gbrain sync --source gstack

# Now `gbrain search "retry budgets"` returns hits from BOTH wiki and
# gstack. Each result includes source_id so the agent can cite properly.
```

Result: wiki pages and gstack plans are separate (different source_ids,
different slug namespaces) but share the search surface.

### 2. Purpose-separated brains (yc-media + garrys-list)

You run two completely different content pipelines on the same backend.
|
||||
YC Media covers portfolio news and founder profiles. Garry's List is
|
||||
personal writing. You explicitly DON'T want them mixed in search — YC
|
||||
portfolio content leaking into essay searches is a bug, not a feature.
|
||||
|
||||
```bash
# Two sources, both isolated (federated=false)
gbrain sources add yc-media --path ~/yc-media --no-federated
gbrain sources add garrys-list --path ~/writing --no-federated

# Pin each checkout directory
(cd ~/yc-media && gbrain sources attach yc-media)
(cd ~/writing && gbrain sources attach garrys-list)

# Sync each independently
gbrain sync --source yc-media
gbrain sync --source garrys-list
```
|
||||
|
||||
Result: searching from outside both directories returns hits only from
the `default` source (your main brain). Searching from inside
`~/yc-media` returns only yc-media hits. Searching from inside
`~/writing` returns only garrys-list hits. Federation is opt-in, so
nothing leaks across sources by default.
|
||||
|
||||
To search across them explicitly on demand:
|
||||
|
||||
```bash
|
||||
gbrain search "tech layoffs" --source yc-media,garrys-list
|
||||
```
|
||||
|
||||
### 3. Mixed (wiki federated + sessions isolated)
|
||||
|
||||
Your main wiki is federated with a few trusted sources. Your session
transcripts (dedicated ingest is coming in a later release) land in a
separate isolated source so they don't dominate every search result.
|
||||
|
||||
```bash
# Federated sources
gbrain sources add gstack --path ~/.gstack --federated

# Isolated source (dedicated session ingest is a later release; sessions
# use this shape today for ingest)
gbrain sources add sessions --path ~/.claude/sessions --no-federated
```
|
||||
|
||||
## Resolution priority
|
||||
|
||||
When any command needs to pick a source, gbrain walks this list (highest
|
||||
first):
|
||||
|
||||
1. Explicit `--source <id>` flag.
2. `GBRAIN_SOURCE` environment variable.
3. `.gbrain-source` dotfile in CWD or any ancestor directory.
4. A registered source whose `local_path` contains the CWD (longest
   prefix wins for nested checkouts).
5. The brain-level default set via `gbrain sources default <id>`.
6. The seeded `default` source.
|
||||
|
||||
So inside `~/.gstack/plans/` on a brain that pinned `gstack` to
|
||||
`~/.gstack` via `.gbrain-source`, `gbrain put-page` implicitly writes to
|
||||
the `gstack` source. Outside any registered directory with no env/dotfile
|
||||
set, it writes to the default.
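
The priority list above can be sketched as a single ordered lookup. This is a minimal illustration, not gbrain's resolver: `SourceHints`, `ResolvedVia`, and `pickSource` are hypothetical names, and the sketch assumes the caller has already looked up each rung (flag, env var, dotfile, path match, brain default). It does not model the error-on-ambiguity behavior described later in this guide.

```typescript
// Hypothetical input shape: one optional field per rung of the priority list.
// The caller is assumed to have resolved each value before calling pickSource.
interface SourceHints {
  flagSource?: string;    // --source <id>
  envSource?: string;     // GBRAIN_SOURCE
  dotfileSource?: string; // nearest .gbrain-source
  pathSource?: string;    // registered local_path containing the CWD
  brainDefault?: string;  // gbrain sources default <id>
}

type ResolvedVia =
  | 'flag' | 'env' | 'dotfile' | 'local_path' | 'brain_default' | 'seeded_default';

function pickSource(hints: SourceHints): { id: string; via: ResolvedVia } {
  // Highest priority first; the first non-empty candidate wins.
  const ladder: Array<[string | undefined, ResolvedVia]> = [
    [hints.flagSource, 'flag'],
    [hints.envSource, 'env'],
    [hints.dotfileSource, 'dotfile'],
    [hints.pathSource, 'local_path'],
    [hints.brainDefault, 'brain_default'],
  ];
  for (const [candidate, via] of ladder) {
    if (candidate) return { id: candidate, via };
  }
  // Nothing set anywhere: fall back to the seeded source.
  return { id: 'default', via: 'seeded_default' };
}
```

Returning the rung that matched (`via`) alongside the id is a design choice of this sketch; it makes it easy to explain to a user why a write landed where it did.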
|
||||
|
||||
## Federation flag
|
||||
|
||||
Every source row stores `config.federated: boolean` in its JSONB config.
|
||||
|
||||
| Value | Meaning |
|-------|---------|
| `true` | Source participates in unqualified `gbrain search "X"` results. |
| `false` (default for new sources) | Source only searched when explicitly named via `--source <id>` or qualified citation. |
|
||||
|
||||
The seeded `default` source is `federated=true` so pre-v0.17 brains
|
||||
behave exactly as before — every page appears in search.
|
||||
|
||||
Flip later with `gbrain sources federate <id>` / `unfederate <id>`.
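
As a rough model of the flag's effect on search scope, the sketch below picks which source ids a search would cover. `SourceRowSketch` and `sourcesToSearch` are hypothetical names, the row shape is reduced to the `config.federated` field described above, and CWD-based scoping is deliberately left out.

```typescript
// Reduced, hypothetical view of a sources row: only the fields this sketch needs.
interface SourceRowSketch {
  id: string;
  config: { federated: boolean };
}

function sourcesToSearch(rows: SourceRowSketch[], named: string[] = []): string[] {
  if (named.length > 0) {
    // Explicitly named sources are searched regardless of the federation flag.
    const known = new Set(rows.map((r) => r.id));
    const unknown = named.filter((id) => !known.has(id));
    if (unknown.length > 0) throw new Error(`unknown source(s): ${unknown.join(', ')}`);
    return named;
  }
  // Unqualified search: only federated sources participate.
  return rows.filter((r) => r.config.federated).map((r) => r.id);
}
```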
|
||||
|
||||
## Commands
|
||||
|
||||
Full subcommand reference:
|
||||
|
||||
```
gbrain sources add <id> --path <p> [--name <n>] [--federated|--no-federated]
    Register a source. id: [a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?
gbrain sources list [--json]
    List all sources with page counts + federation state.
gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
    Cascade-delete a source (pages, chunks, timeline).
gbrain sources rename <id> <new-name>
    Change display name only; id is immutable.
gbrain sources default <id>
    Set the brain-level default.
gbrain sources attach <id>
    Write .gbrain-source in CWD (like kubectl context).
gbrain sources detach
    Remove .gbrain-source from CWD.
gbrain sources federate <id>
gbrain sources unfederate <id>
```
|
||||
|
||||
## Citation format for agents
|
||||
|
||||
When agents receive multi-source results they MUST cite pages in
|
||||
`[source-id:slug]` form. Example:
|
||||
|
||||
> You told me about the distillation protocol — see [wiki:topics/ai]
|
||||
> and [gstack:plans/multi-repo] for where this came from.
|
||||
|
||||
The citation key is `sources.id` (immutable). Renaming a source via
|
||||
`gbrain sources rename` changes the display name only; existing
|
||||
citations keep working.
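
A consumer that receives such citations might split them back into their parts. The sketch below is one way to do that, assuming the id follows the pattern given in the command reference and that a slug is any run of characters without whitespace or square brackets (the slug rule is an assumption of this sketch, not something the guide specifies). `CITATION_RE` and `parseCitation` are hypothetical names.

```typescript
// Source-id part follows the documented id pattern; the slug part is an
// assumption: one or more characters that are not whitespace or brackets.
const CITATION_RE = /^\[([a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?):([^\s\[\]]+)\]$/;

function parseCitation(text: string): { sourceId: string; slug: string } | null {
  const m = CITATION_RE.exec(text.trim());
  if (!m) return null; // not a qualified [source-id:slug] citation
  return { sourceId: m[1], slug: m[2] };
}
```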
|
||||
|
||||
## Writing to a specific source
|
||||
|
||||
```bash
|
||||
# Pass --source explicitly
|
||||
gbrain put-page topics/ai ... --source wiki
|
||||
|
||||
# Or rely on the dotfile / env / CWD match
|
||||
cd ~/.gstack && gbrain put-page plans/multi-repo ...
|
||||
# → source auto-resolves to gstack
|
||||
```
|
||||
|
||||
Reads span federated sources by default. Writes require a resolved
|
||||
source (explicit, inferred, or default). The resolver never picks a
|
||||
source silently when ambiguous — it errors with a clear fix.
|
||||
|
||||
## Upgrading an existing brain
|
||||
|
||||
`gbrain upgrade` runs the v16 + v17 migrations automatically. Your
|
||||
existing pages all move under `source_id='default'`. Behavior is
|
||||
unchanged until you add a second source.
|
||||
|
||||
To add one:
|
||||
|
||||
```bash
|
||||
gbrain sources add gstack --path ~/.gstack --federated
|
||||
cd ~/.gstack && gbrain sources attach gstack && gbrain sync
|
||||
```
|
||||
|
||||
Two lines, three commands. The existing default source is untouched.
|
||||
|
||||
## Not in v0.18.0
|
||||
|
||||
- Session transcript ingest (`.jsonl`, raised size cap, session
  PageType): planned for a later release.
- Per-source retention/TTL (`gbrain sources prune`): planned for a later
  release.
- ACL enforcement via caller-identity: planned for a later release.
|
||||
- `gbrain sources import-from-github <url>` one-shot bootstrap — patch
|
||||
release after the core plumbing stabilizes.
|
||||
|
||||
All of these build on the `sources` primitive shipped here.
|
||||
@@ -116,6 +116,25 @@ ingest event.
|
||||
No separate output. Brain-ops is an always-on behavior layer, not a report generator.
|
||||
The output is updated brain pages and enriched responses.
|
||||
|
||||
## Cross-source citation format (v0.18.0+)
|
||||
|
||||
When a brain has multiple sources (wiki, gstack, yc-media, etc.), every
|
||||
citation MUST include the source id: `[source-id:slug]`. Example:
|
||||
|
||||
> You told me about the retry budget approach — see
|
||||
> [wiki:topics/resilience] and [gstack:plans/retry-policy] for where
|
||||
> this came from.
|
||||
|
||||
Rules:
|
||||
- The key is `sources.id` (immutable), never `sources.name` (mutable display).
|
||||
- Single-source brains still write `[default:slug]` OR may omit the prefix
|
||||
for backward compat.
|
||||
- Every page payload returned by `search`, `query`, `get_page`, `list_pages`
|
||||
carries `source_id` — always use it when citing, never guess.
|
||||
|
||||
If a search result has `source_id: "gstack"` and `slug: "plans/foo"`,
|
||||
the citation is `[gstack:plans/foo]`. That's the whole rule.
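
A minimal sketch of that rule, assuming a result payload reduced to the two fields named above; `PagePayloadSketch`, `citeResult`, and `citeAll` are hypothetical helper names, not part of gbrain.

```typescript
// Reduced, hypothetical payload: only the two fields the citation rule uses.
interface PagePayloadSketch {
  source_id: string;
  slug: string;
}

// Build the citation from the payload itself; never guess the source.
function citeResult(page: PagePayloadSketch): string {
  return `[${page.source_id}:${page.slug}]`;
}

// Cite a list of results once each, preserving first-seen order.
function citeAll(pages: PagePayloadSketch[]): string[] {
  return Array.from(new Set(pages.map(citeResult)));
}
```

Because the source id is part of the string, two pages that share a slug in different sources produce two distinct citations.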
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
- Answering questions about people/companies without checking the brain first
|
||||
|
||||
161
skills/migrations/v0.18.0.md
Normal file
@@ -0,0 +1,161 @@
|
||||
---
|
||||
version: 0.18.0
|
||||
feature_pitch:
|
||||
headline: "Multi-source brains: one DB, many repos. Federated and isolated sources coexist."
|
||||
description: |
|
||||
v0.18.0 introduces sources as a first-class primitive. A single
|
||||
gbrain backend can now hold multiple knowledge repos (wiki, gstack,
|
||||
yc-media, garrys-list, etc.) with clean scoping. Every page, file,
|
||||
and ingest_log row is scoped to a `sources(id)` row. Slugs are
|
||||
unique PER source, not globally — so two sources can both have
|
||||
`topics/ai` and they're different pages.
|
||||
|
||||
Per-source federation controls whether a source participates in
|
||||
unqualified default search. `federated=true` (the default source
|
||||
post-upgrade) joins the cross-source recall pool. `federated=false`
|
||||
is isolation — only searched when explicitly named via `--source`.
|
||||
This supports both "unified knowledge brain" (wiki + gstack, both
|
||||
federated) and "purpose-separated brains" (yc-media + garrys-list,
|
||||
both isolated) at the same time.
|
||||
|
||||
Per-directory default via `.gbrain-source` dotfile walk-up +
|
||||
`GBRAIN_SOURCE` env var. Matches how kubectl / terraform / git
|
||||
scope context. `cd ~/yc-media && gbrain query "X"` just works.
|
||||
recipe: docs/guides/multi-source-brains.md
|
||||
tiers: null
|
||||
---
|
||||
|
||||
# v0.18.0 Migration: Multi-source brains
|
||||
|
||||
**Audience: host agents reading this after `gbrain apply-migrations`
has run. v0.18.0 installs a schema primitive for multi-source brains
and exposes a `sources` CLI subcommand. Existing single-source brains
keep working unchanged: they live under a seeded `default` source
that preserves all prior behavior.**
|
||||
|
||||
## Mechanical migration: automatic, no action required
|
||||
|
||||
`gbrain upgrade` chains to `gbrain apply-migrations --yes`, which
|
||||
runs:
|
||||
|
||||
- **migration v16** — creates the `sources` table, seeds `default`
|
||||
with `{"federated": true}` config, inherits your pre-v0.17
|
||||
`sync.repo_path` and `sync.last_commit` into the default row.
|
||||
- **migration v17** — adds `pages.source_id TEXT NOT NULL DEFAULT
|
||||
'default' REFERENCES sources(id)`. Swaps the global `UNIQUE(slug)`
|
||||
constraint for composite `UNIQUE(source_id, slug)`. Engine
|
||||
upserts simultaneously re-target `ON CONFLICT (source_id, slug)`
|
||||
so the constraint swap and the write path land atomically.
|
||||
|
||||
Both migrations are idempotent. Safe to re-run.
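
To make the effect of the composite key concrete, here is a small in-memory model of an upsert keyed on `(source_id, slug)`. It is an illustration only: `PageTableSketch` is a hypothetical class, not the engine's `putPage`, and it stands in for the `ON CONFLICT (source_id, slug)` behavior rather than running any SQL.

```typescript
// In-memory stand-in for a pages table with UNIQUE(source_id, slug).
class PageTableSketch {
  private rows = new Map<string, string>();

  // Encode the composite key unambiguously (a slug may itself contain ':').
  private static key(sourceId: string, slug: string): string {
    return JSON.stringify([sourceId, slug]);
  }

  // Mirrors an upsert: insert when the (source, slug) pair is new,
  // overwrite when it already exists. sourceId defaults to 'default',
  // like the column default added by the migration.
  upsert(slug: string, body: string, sourceId: string = 'default'): 'inserted' | 'updated' {
    const k = PageTableSketch.key(sourceId, slug);
    const outcome = this.rows.has(k) ? 'updated' : 'inserted';
    this.rows.set(k, body);
    return outcome;
  }

  get size(): number {
    return this.rows.size;
  }
}
```

With a global `UNIQUE(slug)`, the second source's `topics/ai` would have collided with the first; under the composite key they are two rows.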
|
||||
|
||||
Later releases will layer:
- ACL enforcement via a caller-identity primitive (the JSONB slot for
  `access_policy` ships now; enforcement waits for identity to be
  designed).
- Session ingest (`.jsonl` transcripts, raised size cap, session
  PageType) and per-source retention/TTL, shipped together.
|
||||
|
||||
## What's new for agents
|
||||
|
||||
### `sources` CLI subcommand
|
||||
|
||||
```
|
||||
gbrain sources add <id> --path <p> [--name <n>] [--federated|--no-federated]
|
||||
gbrain sources list [--json]
|
||||
gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
|
||||
gbrain sources rename <id> <new-display-name>
|
||||
gbrain sources default <id>
|
||||
gbrain sources attach <id> # write .gbrain-source in CWD
|
||||
gbrain sources detach # remove .gbrain-source
|
||||
gbrain sources federate <id>
|
||||
gbrain sources unfederate <id>
|
||||
```
|
||||
|
||||
Source id rules: `[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?`. The id starts
and ends with a lowercase alphanumeric character, may contain interior
hyphens, and is at most 32 characters long. It is immutable after
creation (rename only changes the display name). Used as the stable
|
||||
citation key in `[source:slug]` references.
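
A quick way to check an id against that rule is to anchor the documented pattern at both ends. `SOURCE_ID_RE` and `isValidSourceId` are hypothetical names for this sketch; only the pattern itself comes from the text above.

```typescript
// The documented pattern, anchored so the whole string must match.
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

function isValidSourceId(id: string): boolean {
  return SOURCE_ID_RE.test(id);
}
```

The length cap falls out of the pattern: one leading character, up to 30 interior characters, and one trailing character give 32 at most.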
|
||||
|
||||
### Per-directory default
|
||||
|
||||
Running `gbrain sources attach gstack` inside `~/.gstack/` writes a
|
||||
`.gbrain-source` file containing the single word `gstack`. Any
|
||||
gbrain command run from that directory (or any subdirectory) auto-
|
||||
selects `gstack` as the default source. `gbrain sources detach`
|
||||
removes the dotfile.
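
The subdirectory behavior implies a walk up the directory tree. The sketch below shows one way such a lookup could work using Node's standard `fs` and `path` modules; `findDotfileSource` is a hypothetical name, and treating an empty dotfile as "no source" is an assumption of this sketch rather than documented behavior.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Walk from startDir toward the filesystem root and return the id stored in
// the nearest .gbrain-source file, or null when none is found.
function findDotfileSource(startDir: string): string | null {
  let dir = path.resolve(startDir);
  for (;;) {
    const candidate = path.join(dir, '.gbrain-source');
    if (fs.existsSync(candidate)) {
      const id = fs.readFileSync(candidate, 'utf8').trim();
      return id.length > 0 ? id : null; // assumption: empty file means "unset"
    }
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the root without a match
    dir = parent;
  }
}

// Usage: pin a temporary directory, then resolve from a nested child.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'gbrain-demo-'));
fs.writeFileSync(path.join(demoRoot, '.gbrain-source'), 'gstack\n');
const demoNested = path.join(demoRoot, 'plans', 'drafts');
fs.mkdirSync(demoNested, { recursive: true });
const demoResolved = findDotfileSource(demoNested);
console.log(demoResolved);
```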
|
||||
|
||||
Resolution priority for the source a command targets:
|
||||
|
||||
1. Explicit `--source <id>` flag.
|
||||
2. `GBRAIN_SOURCE` env var.
|
||||
3. `.gbrain-source` dotfile in CWD or any ancestor.
|
||||
4. Registered source whose `local_path` contains CWD (longest
   prefix wins: with sources registered at `~/gstack` and
   `~/gstack/plans`, a CWD under `~/gstack/plans` resolves to `plans`).
|
||||
5. Brain-level default set via `gbrain sources default <id>`.
|
||||
6. Literal `default` (backward-compat fallback).
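
Rung 4 of the list, the longest-prefix match, is the least obvious one, so here is a hedged sketch of it. `RegisteredSourceSketch` and `matchByLocalPath` are hypothetical names; the row shape is reduced to `id` and `local_path`, and paths are assumed to be absolute already (no `~` expansion).

```typescript
import * as path from 'node:path';

// Reduced, hypothetical view of a registered source.
interface RegisteredSourceSketch {
  id: string;
  local_path: string;
}

// Return the id of the source whose local_path contains cwd, preferring the
// deepest (longest) containing path; null when no registered path contains it.
function matchByLocalPath(cwd: string, sources: RegisteredSourceSketch[]): string | null {
  const here = path.resolve(cwd);
  let bestId: string | null = null;
  let bestLen = -1;
  for (const s of sources) {
    const root = path.resolve(s.local_path);
    const rel = path.relative(root, here);
    // cwd is inside root when the relative path does not climb out of it.
    const outside = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
    if (!outside && root.length > bestLen) {
      bestId = s.id;
      bestLen = root.length;
    }
  }
  return bestId;
}
```

Comparing via `path.relative` rather than a raw string prefix avoids treating `/home/u/gstack-old` as if it were inside `/home/u/gstack`.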
|
||||
|
||||
### Federation semantics
|
||||
|
||||
- `federated=true` (only the `default` source has this out of the
|
||||
box, by migration): appears in unqualified `gbrain search "X"`
|
||||
results.
|
||||
- `federated=false` (new sources default to this): only appears
|
||||
when `--source <id>` is passed.
|
||||
|
||||
Interactive `gbrain sources add` prompts for federation; non-
|
||||
interactive uses `--federated` / `--no-federated`. Flip later with
|
||||
`gbrain sources federate <id>` / `unfederate <id>`.
|
||||
|
||||
### Citation contract (for agents)
|
||||
|
||||
When agents get multi-source search results they MUST cite pages
|
||||
in `[source-id:slug]` form. Example:
|
||||
|
||||
> You told me about the distillation protocol — see
|
||||
> [wiki:topics/ai] and [gstack:plans/multi-repo] for where this
|
||||
> came from.
|
||||
|
||||
Citations are keyed on `sources.id` (immutable), never
|
||||
`sources.name` (mutable display). If a user renames a source via
|
||||
`gbrain sources rename`, existing citations stay valid.
|
||||
|
||||
## What's NOT in v0.17.0 yet
|
||||
|
||||
The following land in later Steps of this release cycle (already
|
||||
on the branch but gated until the matching code ships):
|
||||
|
||||
- `ingest_log.source_id` — lands with Step 5 sync rewrite.
|
||||
- `links.resolution_type` + qualified `[[source:slug]]` wikilink
|
||||
parsing — lands with Step 4 link-extraction rewrite.
|
||||
- `files.page_slug → page_id` FK rewrite + `file_migration_ledger`
|
||||
+ storage object prefixing — lands with Step 7 storage backfill.
|
||||
- Source-aware search dedup — lands with Step 3.
|
||||
- `gbrain sources import-from-github <url>` — deferred to a patch
|
||||
release after the plumbing stabilizes.
|
||||
|
||||
Existing callers continue to work against the `default` source. No
|
||||
agent behavioral change is required; the new capabilities are
|
||||
opt-in via the new `sources` CLI surface.
|
||||
|
||||
## Host-repo actions
|
||||
|
||||
None required. If your host agent manages the brain via the
|
||||
standard `gbrain sync` flow, it continues to target the default
|
||||
source and sees no behavioral change. To start using multi-source:
|
||||
|
||||
```
|
||||
# Register a new source
|
||||
gbrain sources add gstack --path ~/.gstack --no-federated
|
||||
|
||||
# Pin that directory to it so no --source flag is needed
|
||||
cd ~/.gstack
|
||||
gbrain sources attach gstack
|
||||
|
||||
# Ingest
|
||||
gbrain sync --source gstack
|
||||
```
|
||||
|
||||
Or see `docs/guides/multi-source-brains.md` for the full three
|
||||
canonical scenarios (unified, purpose-separated, mixed).
|
||||
@@ -19,7 +19,7 @@ for (const op of operations) {
|
||||
}
|
||||
|
||||
// CLI-only commands that bypass the operation layer
|
||||
const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'agent', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans', 'dream', 'check-resolvable']);
|
||||
const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate', 'eval', 'sync', 'extract', 'features', 'autopilot', 'graph-query', 'jobs', 'agent', 'apply-migrations', 'skillpack-check', 'resolvers', 'integrity', 'repair-jsonb', 'orphans', 'sources', 'dream', 'check-resolvable']);
|
||||
|
||||
async function main() {
|
||||
// Parse global flags (--quiet / --progress-json / --progress-interval)
|
||||
@@ -472,6 +472,11 @@ async function handleCliOnly(command: string, args: string[]) {
|
||||
await runOrphans(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'sources': {
|
||||
const { runSources } = await import('./commands/sources.ts');
|
||||
await runSources(engine, args);
|
||||
break;
|
||||
}
|
||||
}
|
||||
} finally {
|
||||
if (command !== 'serve') await engine.disconnect();
|
||||
|
||||
@@ -18,6 +18,7 @@ import { v0_13_0 } from './v0_13_0.ts';
|
||||
import { v0_13_1 } from './v0_13_1.ts';
|
||||
import { v0_14_0 } from './v0_14_0.ts';
|
||||
import { v0_16_0 } from './v0_16_0.ts';
|
||||
import { v0_18_0 } from './v0_18_0.ts';
|
||||
|
||||
export const migrations: Migration[] = [
|
||||
v0_11_0,
|
||||
@@ -27,6 +28,7 @@ export const migrations: Migration[] = [
|
||||
v0_13_1,
|
||||
v0_14_0,
|
||||
v0_16_0,
|
||||
v0_18_0,
|
||||
];
|
||||
|
||||
/** Look up a migration by exact version string. */
|
||||
|
||||
174
src/commands/migrations/v0_18_0-storage-backfill.ts
Normal file
@@ -0,0 +1,174 @@
|
||||
/**
|
||||
* v0.18.0 Step 7 — phase B storage backfill loader.
|
||||
*
|
||||
* Drives the `file_migration_ledger` state machine forward:
|
||||
*
|
||||
* pending → copy_done → db_updated → complete
|
||||
*
|
||||
* Each per-file transition is a separate transaction so a crash
|
||||
* between states leaves a recoverable row (resume-on-partial). The
|
||||
* ledger is the atomicity backstop for non-atomic object-storage
|
||||
* "renames" (S3/Supabase = copy+delete).
|
||||
*
|
||||
* Crash-point recovery:
|
||||
* - crash AFTER copy, BEFORE DB update → re-run detects
|
||||
* `status='copy_done'`, completes DB update (copy is idempotent
|
||||
* against S3 overwrite so re-copy on same path is fine).
|
||||
* - crash AFTER DB update, BEFORE ledger mark → re-run detects
|
||||
* `status='db_updated'`, marks `complete`.
|
||||
* - crash AFTER ledger mark, BEFORE old-object delete → delete runs
|
||||
* in the explicit "cleanup" sub-phase so old objects are
|
||||
* preserved until a separate operator decision.
|
||||
*
|
||||
* Scope: v0.18.0 Step 7 DOES rewrite storage_path in the files table
|
||||
* and copies the bytes to the new source-prefixed path. It does NOT
|
||||
* delete the old objects — that's reserved for a later release once
|
||||
* operators have had time to verify the new paths. Old and new
|
||||
* objects coexist during the soak period.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from '../../core/engine.ts';
|
||||
import type { StorageBackend } from '../../core/storage.ts';
|
||||
|
||||
interface LedgerRow {
|
||||
file_id: number;
|
||||
storage_path_old: string;
|
||||
storage_path_new: string;
|
||||
status: 'pending' | 'copy_done' | 'db_updated' | 'complete' | 'failed';
|
||||
}
|
||||
|
||||
export interface BackfillReport {
|
||||
total: number;
|
||||
alreadyComplete: number;
|
||||
nowComplete: number;
|
||||
failed: number;
|
||||
skipped: number;
|
||||
errors: Array<{ file_id: number; error: string }>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Process all non-complete ledger rows. Safe to re-run; each row
|
||||
* resumes from whichever state it was in. Storage is injected so the
|
||||
* caller can pass a real S3/Supabase backend OR a dry-run stub that
|
||||
* short-circuits the copy.
|
||||
*
|
||||
* If storage is null/undefined the function runs as a dry-run: it
|
||||
* reports what WOULD be processed without touching objects. This is
|
||||
* used by the orchestrator when storage isn't configured.
|
||||
*/
|
||||
export async function runStorageBackfill(
|
||||
engine: BrainEngine,
|
||||
storage: StorageBackend | null,
|
||||
opts?: { dryRun?: boolean },
|
||||
): Promise<BackfillReport> {
|
||||
const report: BackfillReport = {
|
||||
total: 0,
|
||||
alreadyComplete: 0,
|
||||
nowComplete: 0,
|
||||
failed: 0,
|
||||
skipped: 0,
|
||||
errors: [],
|
||||
};
|
||||
|
||||
// Snapshot all ledger rows. We don't paginate because the ledger
|
||||
// is bounded by current files count — every gbrain install has
|
||||
// at most low-thousands of files.
|
||||
const rows = await engine.executeRaw<LedgerRow>(
|
||||
`SELECT file_id, storage_path_old, storage_path_new, status
|
||||
FROM file_migration_ledger
|
||||
ORDER BY file_id`,
|
||||
);
|
||||
report.total = rows.length;
|
||||
|
||||
for (const row of rows) {
|
||||
if (row.status === 'complete') {
|
||||
report.alreadyComplete++;
|
||||
continue;
|
||||
}
|
||||
if (row.status === 'failed') {
|
||||
report.failed++;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (opts?.dryRun || !storage) {
|
||||
// Dry-run: count pending rows but don't advance state.
|
||||
report.skipped++;
|
||||
continue;
|
||||
}
|
||||
|
||||
// Drive the state machine. Each transition is its own
|
||||
// executeRaw call so mid-row crashes leave a recoverable state.
|
||||
try {
|
||||
let status = row.status;
|
||||
|
||||
// pending → copy_done: COPY the bytes.
|
||||
if (status === 'pending') {
|
||||
// If the new path is already populated (e.g. from a previous
|
||||
// partial run), the copy is redundant but idempotent on S3/
|
||||
// Supabase where upload overwrites the key.
|
||||
const exists = await storage.exists(row.storage_path_new).catch(() => false);
|
||||
if (!exists) {
|
||||
const data = await storage.download(row.storage_path_old);
|
||||
await storage.upload(row.storage_path_new, data);
|
||||
}
|
||||
await engine.executeRaw(
|
||||
`UPDATE file_migration_ledger
|
||||
SET status = 'copy_done', updated_at = now()
|
||||
WHERE file_id = $1`,
|
||||
[row.file_id],
|
||||
);
|
||||
status = 'copy_done';
|
||||
}
|
||||
|
||||
// copy_done → db_updated: flip files.storage_path to the new
|
||||
// path. Once this commits, downloads go through the new path
|
||||
// and the old object is orphaned (but still present on disk
|
||||
// for rollback within the soak window).
|
||||
if (status === 'copy_done') {
|
||||
await engine.executeRaw(
|
||||
`UPDATE files SET storage_path = $1 WHERE id = $2`,
|
||||
[row.storage_path_new, row.file_id],
|
||||
);
|
||||
await engine.executeRaw(
|
||||
`UPDATE file_migration_ledger
|
||||
SET status = 'db_updated', updated_at = now()
|
||||
WHERE file_id = $1`,
|
||||
[row.file_id],
|
||||
);
|
||||
status = 'db_updated';
|
||||
}
|
||||
|
||||
// db_updated → complete: mark terminal. The old-object delete
|
||||
// happens in a separate sub-phase (future release) so operators
|
||||
// can verify the new paths before we drop the safety net.
|
||||
if (status === 'db_updated') {
|
||||
await engine.executeRaw(
|
||||
`UPDATE file_migration_ledger
|
||||
SET status = 'complete', updated_at = now()
|
||||
WHERE file_id = $1`,
|
||||
[row.file_id],
|
||||
);
|
||||
report.nowComplete++;
|
||||
}
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
report.failed++;
|
||||
report.errors.push({ file_id: row.file_id, error: msg });
|
||||
// Mark failed so the next run doesn't retry blindly. Operator
|
||||
// can reset to 'pending' via SQL once the root cause is fixed.
|
||||
try {
|
||||
await engine.executeRaw(
|
||||
`UPDATE file_migration_ledger
|
||||
SET status = 'failed', error = $1, updated_at = now()
|
||||
WHERE file_id = $2`,
|
||||
[msg.slice(0, 500), row.file_id],
|
||||
);
|
||||
} catch {
|
||||
// Best-effort: if we can't even write 'failed', report the
|
||||
// original error and move on.
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return report;
|
||||
}
|
||||
237
src/commands/migrations/v0_18_0.ts
Normal file
@@ -0,0 +1,237 @@
|
||||
/**
|
||||
* v0.18.0 migration orchestrator — Multi-source brains.
|
||||
*
|
||||
* Split across sub-versions of the migration registry for safety:
|
||||
* - v16 (Step 1 / Lane A): additive-only. Installs sources table +
|
||||
* default row. Does NOT break any existing engine code.
|
||||
* - v17 (Step 2 / Lane B, future): breaking schema changes. Rides with
|
||||
* the engine API rewrite so ON CONFLICT (source_id, slug) lands
|
||||
* atomically with the composite UNIQUE.
|
||||
*
|
||||
* Phase structure (per /plan-ceo-review + /plan-eng-review):
|
||||
* A. Schema — gbrain init --migrate-only runs the migration chain up
|
||||
* to whichever v-prefix has shipped (v16 today, v17 next).
|
||||
* B. Storage backfill (Step 7, future) — ledger-driven object rewrite.
|
||||
* C. Verify — assert sources('default') exists today. Composite UNIQUE,
|
||||
* page_id backfill, and ledger completeness get added in Step 2.
|
||||
* D. (future) Delete old storage objects — only runs after C green.
|
||||
*
|
||||
* Idempotent: safe to re-run on partial state.
|
||||
*/
|
||||
|
||||
import { execSync } from 'child_process';
|
||||
import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhaseResult } from './types.ts';
|
||||
import { appendCompletedMigration } from '../../core/preferences.ts';
|
||||
import { loadConfig, toEngineConfig } from '../../core/config.ts';
|
||||
import { createEngine } from '../../core/engine-factory.ts';
|
||||
|
||||
// ── Phase A — Schema ────────────────────────────────────────
|
||||
|
||||
function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
|
||||
if (opts.dryRun) return { name: 'schema', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
execSync('gbrain init --migrate-only', { stdio: 'inherit', timeout: 600_000, env: process.env });
|
||||
return { name: 'schema', status: 'complete' };
|
||||
} catch (e) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
return { name: 'schema', status: 'failed', detail: msg };
|
||||
}
|
||||
}
|
||||
|
||||
// ── Phase B — Storage backfill (skeleton, filled by Step 7) ──
|
||||
|
||||
async function phaseBBackfillStorage(opts: OrchestratorOpts): Promise<OrchestratorPhaseResult> {
|
||||
if (opts.dryRun) return { name: 'backfill_storage', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
const config = loadConfig();
|
||||
if (!config) return { name: 'backfill_storage', status: 'skipped', detail: 'no brain configured' };
|
||||
|
||||
const engine = await createEngine(toEngineConfig(config));
|
||||
await engine.connect(toEngineConfig(config));
|
||||
try {
|
||||
if (engine.kind === 'pglite') {
|
||||
return { name: 'backfill_storage', status: 'skipped', detail: 'pglite (no files table)' };
|
||||
}
|
||||
const hasLedger = await engine.executeRaw<{ exists: boolean }>(
|
||||
`SELECT EXISTS (SELECT 1 FROM information_schema.tables
|
||||
WHERE table_schema = current_schema()
|
||||
AND table_name = 'file_migration_ledger') AS exists`,
|
||||
);
|
||||
if (!hasLedger[0]?.exists) {
|
||||
return {
|
||||
name: 'backfill_storage',
|
||||
status: 'skipped',
|
||||
detail: 'file_migration_ledger not yet installed (run apply-migrations first)',
|
||||
};
|
||||
}
|
||||
|
||||
// Ledger exists. If storage isn't configured, run the dry-run
|
||||
// path — we can still report the ledger state but we can't
|
||||
// COPY objects. Operator then wires storage and re-runs.
|
||||
const storage = config.storage ? await loadStorageBackend(config.storage) : null;
|
||||
|
||||
const { runStorageBackfill } = await import('./v0_18_0-storage-backfill.ts');
|
||||
const report = await runStorageBackfill(engine, storage, { dryRun: !storage });
|
||||
|
||||
if (report.total === 0) {
|
||||
return { name: 'backfill_storage', status: 'complete', detail: 'no files to migrate' };
|
||||
}
|
||||
|
||||
if (report.failed > 0) {
|
||||
return {
|
||||
name: 'backfill_storage',
|
||||
status: 'failed',
|
||||
detail: `${report.failed}/${report.total} files failed: ${report.errors.slice(0, 3).map(e => `#${e.file_id}: ${e.error.slice(0, 60)}`).join('; ')}`,
|
||||
};
|
||||
}
|
||||
|
||||
if (report.skipped > 0 && !storage) {
|
||||
return {
|
||||
name: 'backfill_storage',
|
||||
status: 'skipped',
|
||||
detail: `${report.skipped}/${report.total} files pending; storage backend not configured (wire storage + re-run)`,
|
||||
};
|
||||
}
|
||||
|
||||
const detail = `${report.total} files: ${report.alreadyComplete} already complete, ${report.nowComplete} newly migrated`;
|
||||
return { name: 'backfill_storage', status: 'complete', detail };
|
||||
} finally {
|
||||
try { await engine.disconnect(); } catch {}
|
||||
}
|
||||
} catch (e) {
|
||||
return {
|
||||
name: 'backfill_storage',
|
||||
status: 'failed',
|
||||
detail: e instanceof Error ? e.message : String(e),
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
async function loadStorageBackend(storageConfig: unknown): Promise<import('../../core/storage.ts').StorageBackend | null> {
|
||||
try {
|
||||
const { createStorage } = await import('../../core/storage.ts');
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
return await createStorage(storageConfig as any);
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
// ── Phase C — Verify ────────────────────────────────────────
|
||||
|
||||
async function phaseCVerify(opts: OrchestratorOpts): Promise<OrchestratorPhaseResult> {
|
||||
if (opts.dryRun) return { name: 'verify', status: 'skipped', detail: 'dry-run' };
|
||||
try {
|
||||
const config = loadConfig();
|
||||
if (!config) return { name: 'verify', status: 'skipped', detail: 'no brain configured' };
|
||||
|
||||
const engine = await createEngine(toEngineConfig(config));
|
||||
await engine.connect(toEngineConfig(config));
|
||||
try {
|
||||
// 1. sources('default') exists (Step 1 / v16).
|
||||
const defaults = await engine.executeRaw<{ id: string }>(
|
||||
`SELECT id FROM sources WHERE id = 'default'`,
|
||||
);
|
||||
if (defaults.length !== 1) {
|
||||
return { name: 'verify', status: 'failed', detail: "sources('default') row missing" };
|
||||
}
|
||||
|
||||
// Step 2 checks (composite UNIQUE, links.resolution_type,
|
||||
// file_migration_ledger completion) are gated on the future v17
|
||||
// migration. They run conditionally — if the column/constraint
|
||||
// exists, verify it; if not, that's fine for Step 1.
|
||||
|
||||
// Optional: composite UNIQUE if installed (Step 2 future work).
|
||||
const constraint = await engine.executeRaw<{ conname: string }>(
|
||||
`SELECT conname FROM pg_constraint WHERE conname = 'pages_source_slug_key'`,
|
||||
);
|
||||
// If installed, verify no pages have NULL source_id.
|
||||
if (constraint.length === 1) {
|
||||
const nullSources = await engine.executeRaw<{ n: number }>(
|
||||
`SELECT COUNT(*)::int AS n FROM pages WHERE source_id IS NULL`,
|
||||
);
|
||||
if ((nullSources[0]?.n ?? 0) > 0) {
|
||||
return { name: 'verify', status: 'failed', detail: `${nullSources[0].n} pages with NULL source_id` };
|
||||
}
|
||||
}
|
||||
|
||||
return { name: 'verify', status: 'complete', detail: 'sources primitive installed' };
|
||||
} finally {
|
||||
try { await engine.disconnect(); } catch {}
|
||||
}
|
||||
} catch (e) {
|
||||
return { name: 'verify', status: 'failed', detail: e instanceof Error ? e.message : String(e) };
|
||||
}
|
||||
}
|
||||
|
||||
// ── Orchestrator ────────────────────────────────────────────
|
||||
|
||||
async function orchestrator(opts: OrchestratorOpts): Promise<OrchestratorResult> {
|
||||
console.log('');
|
||||
console.log('=== v0.18.0 — Multi-source brains ===');
|
||||
if (opts.dryRun) console.log(' (dry-run; no side effects)');
|
||||
console.log('');
|
||||
|
||||
const phases: OrchestratorPhaseResult[] = [];
|
||||
|
||||
const a = phaseASchema(opts);
|
||||
phases.push(a);
|
||||
if (a.status === 'failed') return finalize(phases, 'failed');
|
||||
|
||||
const b = await phaseBBackfillStorage(opts);
|
||||
phases.push(b);
|
||||
// A Phase B failure is non-fatal: continue to verify so users see the
// exact gap. The final status below is downgraded to 'partial'.
|
||||
|
||||
const c = await phaseCVerify(opts);
|
||||
phases.push(c);
|
||||
|
||||
// a.status === 'failed' already early-returned above, so only c and b
// determine the final status here. TypeScript narrowing rejects a
// redundant a.status === 'failed' check.
|
||||
const status: 'complete' | 'partial' | 'failed' =
|
||||
c.status === 'failed' ? 'failed' :
|
||||
b.status === 'failed' ? 'partial' :
|
||||
'complete';
|
||||
|
||||
return finalize(phases, status);
|
||||
}
|
||||
|
||||
function finalize(phases: OrchestratorPhaseResult[], status: 'complete' | 'partial' | 'failed'): OrchestratorResult {
|
||||
if (status !== 'failed') {
|
||||
try {
|
||||
appendCompletedMigration({
|
||||
version: '0.18.0',
|
||||
completed_at: new Date().toISOString(),
|
||||
status: status as 'complete' | 'partial',
|
||||
phases: phases.map(p => ({ name: p.name, status: p.status })),
|
||||
});
|
||||
} catch {
|
||||
// Best-effort.
|
||||
}
|
||||
}
|
||||
return { version: '0.18.0', status, phases };
|
||||
}
|
||||
|
||||
export const v0_18_0: Migration = {
|
||||
version: '0.18.0',
|
||||
featurePitch: {
|
||||
headline: 'Multi-source brains: one database, many knowledge repos. Federation flag keeps them from polluting each other.',
|
||||
description:
|
||||
'v0.18.0 introduces sources — a first-class primitive that lets one gbrain backend hold ' +
|
||||
'multiple repos (wiki, gstack, yc-media, etc.) with clean scoping. Every page, file, and ' +
|
||||
'ingest_log row is now scoped to a source. Cross-source search is opt-in per source ' +
|
||||
'(federated=true) so isolated content (yc-media, garrys-list) never bleeds into your main ' +
|
||||
'brain. New commands: `gbrain sources add/attach/import-from-github`. Per-directory ' +
|
||||
'default via .gbrain-source dotfile + GBRAIN_SOURCE env var. See docs/guides/' +
|
||||
'multi-source-brains.md.',
|
||||
},
|
||||
orchestrator,
|
||||
};
|
||||
|
||||
/** Exported for unit tests. */
|
||||
export const __testing = {
|
||||
phaseASchema,
|
||||
phaseBBackfillStorage,
|
||||
phaseCVerify,
|
||||
};
|
||||
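The way the orchestrator folds three phase results into one final status can be sketched in isolation. This is a minimal sketch, not the project's code: `deriveFinalStatus` and the `PhaseResult` shape are hypothetical names, and the set of per-phase status values is an assumption (the diff only shows 'complete' and 'failed' being produced).

```typescript
// Assumption: per-phase statuses are limited to these two values.
type PhaseStatus = 'complete' | 'failed';
type FinalStatus = 'complete' | 'partial' | 'failed';

interface PhaseResult {
  name: string;
  status: PhaseStatus;
}

// Hypothetical helper mirroring the precedence used by orchestrator():
// a schema failure (A) or verify failure (C) fails the run, while a
// storage backfill failure (B) alone only downgrades it to 'partial'.
function deriveFinalStatus(a: PhaseResult, b: PhaseResult, c: PhaseResult): FinalStatus {
  if (a.status === 'failed') return 'failed';
  if (c.status === 'failed') return 'failed';
  if (b.status === 'failed') return 'partial';
  return 'complete';
}

const ok = (name: string): PhaseResult => ({ name, status: 'complete' });
const bad = (name: string): PhaseResult => ({ name, status: 'failed' });

const backfillOnlyFailed = deriveFinalStatus(ok('schema'), bad('backfill'), ok('verify'));
const verifyFailed = deriveFinalStatus(ok('schema'), ok('backfill'), bad('verify'));
```

Treating a phase B failure as 'partial' lets the run still be recorded in the completed-migrations ledger, which matches the `status !== 'failed'` guard in `finalize`.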
372
src/commands/sources.ts
Normal file
@@ -0,0 +1,372 @@
|
||||
/**
|
||||
* gbrain sources — manage multi-source brain configuration (v0.18.0).
|
||||
*
|
||||
* A source is a logical brain-within-the-DB: wiki, gstack, yc-media, etc.
|
||||
* Every page/file/ingest_log row is scoped to a sources(id) row. Slugs
|
||||
* are unique per source. See docs/guides/multi-source-brains.md for the
|
||||
* full story.
|
||||
*
|
||||
* Subcommands:
|
||||
* gbrain sources add <id> --path <path> [--name <display>] [--federated|--no-federated]
|
||||
* gbrain sources list [--json]
|
||||
* gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
|
||||
* gbrain sources rename <id> <new-name>
|
||||
* gbrain sources default <id>
|
||||
* gbrain sources attach <id> — write .gbrain-source in CWD
|
||||
* gbrain sources detach — remove .gbrain-source from CWD
|
||||
* gbrain sources federate <id> — sources.config.federated = true
|
||||
* gbrain sources unfederate <id> — sources.config.federated = false
|
||||
*
|
||||
* NOT in scope for Step 6 (deferred per plan):
|
||||
* - import-from-github (needs SSRF + clone integration)
|
||||
* - prune (retention/TTL deferred to v0.18)
|
||||
* - MCP tool-def regen for full source-scoping of all ops (part of Step 2+5)
|
||||
*/
|
||||
|
||||
import { writeFileSync, unlinkSync, existsSync } from 'fs';
|
||||
import { join } from 'path';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
// ── Validation ──────────────────────────────────────────────
|
||||
|
||||
// Shared with source-resolver.ts — canonical shape.
|
||||
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;
|
||||
|
||||
function validateSourceId(id: string): void {
|
||||
if (!SOURCE_ID_RE.test(id)) {
|
||||
throw new Error(
|
||||
`Invalid source id "${id}". Must be 1-32 lowercase alnum chars with optional interior hyphens (e.g. "wiki", "yc-media").`,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// ── Types ───────────────────────────────────────────────────
|
||||
|
||||
interface SourceRow {
|
||||
id: string;
|
||||
name: string;
|
||||
local_path: string | null;
|
||||
last_commit: string | null;
|
||||
last_sync_at: Date | null;
|
||||
config: Record<string, unknown> | string;
|
||||
created_at: Date;
|
||||
}
|
||||
|
||||
interface SourceListEntry {
|
||||
id: string;
|
||||
name: string;
|
||||
local_path: string | null;
|
||||
federated: boolean;
|
||||
page_count: number;
|
||||
last_sync_at: string | null;
|
||||
}
|
||||
|
||||
// ── Helpers ─────────────────────────────────────────────────
|
||||
|
||||
function parseConfig(config: unknown): Record<string, unknown> {
|
||||
if (typeof config === 'string') {
|
||||
try { return JSON.parse(config) as Record<string, unknown>; } catch { return {}; }
|
||||
}
|
||||
if (typeof config === 'object' && config !== null) return config as Record<string, unknown>;
|
||||
return {};
|
||||
}
|
||||
|
||||
function isFederated(config: unknown): boolean {
|
||||
const parsed = parseConfig(config);
|
||||
return parsed.federated === true;
|
||||
}
|
||||
|
||||
async function fetchSource(engine: BrainEngine, id: string): Promise<SourceRow | null> {
|
||||
const rows = await engine.executeRaw<SourceRow>(
|
||||
`SELECT id, name, local_path, last_commit, last_sync_at, config, created_at
|
||||
FROM sources WHERE id = $1`,
|
||||
[id],
|
||||
);
|
||||
return rows[0] ?? null;
|
||||
}
|
||||
|
||||
async function countPages(engine: BrainEngine, sourceId: string): Promise<number> {
|
||||
const rows = await engine.executeRaw<{ n: number }>(
|
||||
`SELECT COUNT(*)::int AS n FROM pages WHERE source_id = $1`,
|
||||
[sourceId],
|
||||
);
|
||||
return rows[0]?.n ?? 0;
|
||||
}
|
||||
|
||||
// ── Subcommand: add ─────────────────────────────────────────
|
||||
|
||||
async function runAdd(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const id = args[0];
|
||||
if (!id) {
|
||||
console.error('Usage: gbrain sources add <id> --path <path> [--name <display>] [--federated|--no-federated]');
|
||||
process.exit(2);
|
||||
}
|
||||
validateSourceId(id);
|
||||
|
||||
let localPath: string | null = null;
|
||||
let displayName = id;
|
||||
let federated: boolean | null = null; // null = default (false for new, opt-in via --federated)
|
||||
|
||||
for (let i = 1; i < args.length; i++) {
|
||||
const a = args[i];
|
||||
if (a === '--path') { localPath = args[++i]; continue; }
|
||||
if (a === '--name') { displayName = args[++i]; continue; }
|
||||
if (a === '--federated') { federated = true; continue; }
|
||||
if (a === '--no-federated') { federated = false; continue; }
|
||||
console.error(`Unknown flag: ${a}`);
|
||||
process.exit(2);
|
||||
}
|
||||
|
||||
// Overlapping path guard: reject if new path is inside or contains an
|
||||
// existing source's local_path (per eng review §4 finding 4.1).
|
||||
// Throwing (vs process.exit) keeps this testable via the standard
|
||||
// CLI error-handling wrapper in src/cli.ts.
|
||||
if (localPath) {
|
||||
const others = await engine.executeRaw<{ id: string; local_path: string }>(
|
||||
`SELECT id, local_path FROM sources WHERE local_path IS NOT NULL AND id != $1`,
|
||||
[id],
|
||||
);
|
||||
for (const other of others) {
|
||||
const a = localPath;
|
||||
const b = other.local_path;
|
||||
if (a === b || a.startsWith(b + '/') || b.startsWith(a + '/')) {
|
||||
throw new Error(
|
||||
`path "${a}" overlaps with existing source "${other.id}" at "${b}". ` +
|
||||
`Overlapping sources are not allowed — same files would ingest twice under different source_ids.`,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const config = federated === null ? {} : { federated };
|
||||
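// RETURNING id yields no row when ON CONFLICT skips the insert, so a duplicate id is not reported as created.
const inserted = await engine.executeRaw<{ id: string }>(
`INSERT INTO sources (id, name, local_path, config)
VALUES ($1, $2, $3, $4::jsonb)
ON CONFLICT (id) DO NOTHING
RETURNING id`,
[id, displayName, localPath, JSON.stringify(config)],
);
if (inserted.length === 0) throw new Error(`Source "${id}" already exists.`);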
|
||||
|
||||
const created = await fetchSource(engine, id);
|
||||
if (!created) {
|
||||
console.error(`Failed to create source "${id}" (conflict with existing id?)`);
|
||||
process.exit(4);
|
||||
}
|
||||
const fed = isFederated(created.config);
|
||||
console.log(`Created source "${id}"${displayName !== id ? ` (name: ${displayName})` : ''}${localPath ? ` → ${localPath}` : ''}`);
|
||||
console.log(` federated: ${fed}${fed ? ' — appears in cross-source default search' : ' — only searched when explicitly named via --source'}`);
|
||||
}
|
||||
|
||||
// ── Subcommand: list ────────────────────────────────────────
|
||||
|
||||
async function runList(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const json = args.includes('--json');
|
||||
|
||||
const rows = await engine.executeRaw<SourceRow>(
|
||||
`SELECT id, name, local_path, last_commit, last_sync_at, config, created_at
|
||||
FROM sources ORDER BY (id = 'default') DESC, id`,
|
||||
);
|
||||
|
||||
const entries: SourceListEntry[] = [];
|
||||
for (const r of rows) {
|
||||
const pageCount = await countPages(engine, r.id);
|
||||
entries.push({
|
||||
id: r.id,
|
||||
name: r.name,
|
||||
local_path: r.local_path,
|
||||
federated: isFederated(r.config),
|
||||
page_count: pageCount,
|
||||
last_sync_at: r.last_sync_at ? new Date(r.last_sync_at).toISOString() : null,
|
||||
});
|
||||
}
|
||||
|
||||
if (json) {
|
||||
console.log(JSON.stringify({ sources: entries }, null, 2));
|
||||
return;
|
||||
}
|
||||
|
||||
// Human-readable table.
|
||||
console.log('SOURCES');
|
||||
console.log('───────');
|
||||
for (const e of entries) {
|
||||
const fedMark = e.federated ? 'federated' : 'isolated';
|
||||
const pathStr = e.local_path ?? '(no local path)';
|
||||
const sync = e.last_sync_at ? `last sync ${e.last_sync_at}` : 'never synced';
|
||||
console.log(` ${e.id.padEnd(20)} ${fedMark.padEnd(10)} ${String(e.page_count).padStart(6)} pages ${sync}`);
|
||||
if (e.local_path) console.log(` ${' '.repeat(22)}${pathStr}`);
|
||||
}
|
||||
if (entries.length === 0) console.log(' (no sources registered)');
|
||||
}
|
||||
|
||||
// ── Subcommand: remove ──────────────────────────────────────
|
||||
|
||||
async function runRemove(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const id = args[0];
|
||||
if (!id) {
|
||||
console.error('Usage: gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]');
|
||||
process.exit(2);
|
||||
}
|
||||
const yes = args.includes('--yes');
|
||||
const dryRun = args.includes('--dry-run');
|
||||
// NOTE: --keep-storage is accepted for forward compatibility but has no
|
||||
// effect until Step 7 wires in explicit storage object deletion.
|
||||
const _keepStorage = args.includes('--keep-storage');
|
||||
void _keepStorage;
|
||||
|
||||
if (id === 'default') {
|
||||
console.error('Error: cannot remove the "default" source (it backs the pre-v0.17 brain).');
|
||||
process.exit(3);
|
||||
}
|
||||
|
||||
const src = await fetchSource(engine, id);
|
||||
if (!src) {
|
||||
console.error(`Source "${id}" not found.`);
|
||||
process.exit(4);
|
||||
}
|
||||
|
||||
const pageCount = await countPages(engine, id);
|
||||
console.log(`Source "${id}" → ${pageCount} pages will be deleted (cascade).`);
|
||||
|
||||
if (dryRun) {
|
||||
console.log(`(dry-run; no side effects)`);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!yes) {
|
||||
console.error(`Refusing to remove without --yes. Pass --yes to confirm.`);
|
||||
process.exit(5);
|
||||
}
|
||||
|
||||
await engine.executeRaw(`DELETE FROM sources WHERE id = $1`, [id]);
|
||||
console.log(`Removed source "${id}" (${pageCount} pages + dependent rows cascaded).`);
|
||||
}
|
||||
|
||||
// ── Subcommand: rename ──────────────────────────────────────
|
||||
|
||||
async function runRename(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const id = args[0];
|
||||
const newName = args[1];
|
||||
if (!id || !newName) {
|
||||
console.error('Usage: gbrain sources rename <id> <new-display-name>');
|
||||
process.exit(2);
|
||||
}
|
||||
const src = await fetchSource(engine, id);
|
||||
if (!src) {
|
||||
console.error(`Source "${id}" not found.`);
|
||||
process.exit(4);
|
||||
}
|
||||
await engine.executeRaw(`UPDATE sources SET name = $1 WHERE id = $2`, [newName, id]);
|
||||
console.log(`Renamed source "${id}" display: ${src.name} → ${newName} (id is immutable).`);
|
||||
}
|
||||
|
||||
// ── Subcommand: default ─────────────────────────────────────
|
||||
|
||||
async function runDefault(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const id = args[0];
|
||||
if (!id) {
|
||||
console.error('Usage: gbrain sources default <id>');
|
||||
process.exit(2);
|
||||
}
|
||||
const src = await fetchSource(engine, id);
|
||||
if (!src) {
|
||||
console.error(`Source "${id}" not found.`);
|
||||
process.exit(4);
|
||||
}
|
||||
// Stored in the config table (not sources.config, because it's a brain-
|
||||
// level preference not a per-source setting).
|
||||
await engine.setConfig('sources.default', id);
|
||||
console.log(`Default source set to "${id}".`);
|
||||
}
|
||||
|
||||
// ── Subcommand: attach / detach (CWD dotfile) ──────────────
|
||||
|
||||
function runAttach(args: string[]): void {
|
||||
const id = args[0];
|
||||
if (!id) {
|
||||
console.error('Usage: gbrain sources attach <id>');
|
||||
process.exit(2);
|
||||
}
|
||||
validateSourceId(id);
|
||||
const dotfile = join(process.cwd(), '.gbrain-source');
|
||||
writeFileSync(dotfile, id + '\n', 'utf8');
|
||||
console.log(`Attached ${process.cwd()} to source "${id}" via .gbrain-source.`);
|
||||
console.log(`Commands run from this directory (or any subdirectory) will default to this source.`);
|
||||
}
|
||||
|
||||
function runDetach(): void {
|
||||
const dotfile = join(process.cwd(), '.gbrain-source');
|
||||
if (!existsSync(dotfile)) {
|
||||
console.log(`No .gbrain-source file in ${process.cwd()}.`);
|
||||
return;
|
||||
}
|
||||
unlinkSync(dotfile);
|
||||
console.log(`Detached ${process.cwd()} (removed .gbrain-source).`);
|
||||
}
|
||||
|
||||
// ── Subcommand: federate / unfederate ───────────────────────
|
||||
|
||||
async function runFederate(engine: BrainEngine, args: string[], value: boolean): Promise<void> {
|
||||
const id = args[0];
|
||||
if (!id) {
|
||||
console.error(`Usage: gbrain sources ${value ? 'federate' : 'unfederate'} <id>`);
|
||||
process.exit(2);
|
||||
}
|
||||
const src = await fetchSource(engine, id);
|
||||
if (!src) {
|
||||
console.error(`Source "${id}" not found.`);
|
||||
process.exit(4);
|
||||
}
|
||||
const config = parseConfig(src.config);
|
||||
config.federated = value;
|
||||
await engine.executeRaw(
|
||||
`UPDATE sources SET config = $1::jsonb WHERE id = $2`,
|
||||
[JSON.stringify(config), id],
|
||||
);
|
||||
console.log(`Source "${id}" is now ${value ? 'federated (appears in cross-source default search)' : 'isolated (only searched when explicitly named)'}.`);
|
||||
}
|
||||
|
||||
// ── Dispatcher ──────────────────────────────────────────────
|
||||
|
||||
export async function runSources(engine: BrainEngine, args: string[]): Promise<void> {
|
||||
const sub = args[0];
|
||||
const rest = args.slice(1);
|
||||
|
||||
switch (sub) {
|
||||
case 'add': return runAdd(engine, rest);
|
||||
case 'list': return runList(engine, rest);
|
||||
case 'remove': return runRemove(engine, rest);
|
||||
case 'rename': return runRename(engine, rest);
|
||||
case 'default': return runDefault(engine, rest);
|
||||
case 'attach': runAttach(rest); return;
|
||||
case 'detach': runDetach(); return;
|
||||
case 'federate': return runFederate(engine, rest, true);
|
||||
case 'unfederate': return runFederate(engine, rest, false);
|
||||
case undefined:
|
||||
case '--help':
|
||||
case '-h':
|
||||
printHelp();
|
||||
return;
|
||||
default:
|
||||
console.error(`Unknown sources subcommand: ${sub}`);
|
||||
printHelp();
|
||||
process.exit(2);
|
||||
}
|
||||
}
|
||||
|
||||
function printHelp(): void {
|
||||
console.log(`gbrain sources — manage multi-source brain configuration (v0.18.0)
|
||||
|
||||
Subcommands:
|
||||
add <id> --path <p> [--name <n>] [--federated|--no-federated]
|
||||
Register a new source.
|
||||
list [--json] List registered sources with page counts.
|
||||
remove <id> [--yes] [--dry-run] Cascade-delete a source and its pages.
|
||||
rename <id> <new-name> Rename display name (id is immutable).
|
||||
default <id> Set the brain-level default source.
|
||||
attach <id> Write .gbrain-source in CWD (like kubectl context).
|
||||
detach Remove .gbrain-source from CWD.
|
||||
federate <id> Make source appear in cross-source default search.
|
||||
unfederate <id> Isolate source from default search.
|
||||
|
||||
Source id: [a-z0-9-]{1,32}. Immutable citation key.
|
||||
`);
|
||||
}
|
||||
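The two guards in `runAdd` (id shape and path overlap) are easy to exercise on their own. The sketch below reuses the regex and the overlap expression from the file above; the function names `isValidSourceId` and `pathsOverlap` are hypothetical wrappers introduced only for illustration.

```typescript
// Same pattern as SOURCE_ID_RE above: 1 to 32 chars, lowercase
// alphanumerics, hyphens allowed only in the interior.
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

// Hypothetical boolean wrapper (the original throws instead).
function isValidSourceId(id: string): boolean {
  return SOURCE_ID_RE.test(id);
}

// Hypothetical wrapper around the overlap expression in runAdd.
// Appending '/' before the prefix test means a sibling directory that
// merely shares a name prefix is not treated as nested.
function pathsOverlap(a: string, b: string): boolean {
  return a === b || a.startsWith(b + '/') || b.startsWith(a + '/');
}

const nested = pathsOverlap('/repos/wiki', '/repos/wiki/docs');
const sibling = pathsOverlap('/repos/wiki', '/repos/wiki-archive');
```

Note that the comparison is purely textual: a path with a trailing slash or a symlinked directory would not be caught unless the caller normalizes paths first.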
@@ -41,6 +41,14 @@ export interface SyncOpts {
   skipFailed?: boolean;
   /** Bug 9 — re-attempt unacknowledged failures explicitly (CLI --retry-failed). */
   retryFailed?: boolean;
+  /**
+   * v0.18.0 Step 5 — sync a specific named source. When set, sync reads
+   * local_path + last_commit from the sources table (not the global
+   * config.sync.* keys) and writes last_commit + last_sync_at back to
+   * the same row. Backward compat: when undefined, sync uses the
+   * pre-v0.17 global-config path unchanged.
+   */
+  sourceId?: string;
 }
 
 function git(repoPath: string, ...args: string[]): string {
@@ -50,11 +58,60 @@ function git(repoPath: string, ...args: string[]): string {
   }).trim();
 }
 
+// v0.18.0 Step 5: source-scoped sync state helpers. When opts.sourceId
+// is set, read/write the per-source row instead of the global config
+// keys. These wrappers centralize the branch so every read/write site
+// picks the right storage — future Step 5 work (failure-tracking per
+// source) hooks here too.
+async function readSyncAnchor(
+  engine: BrainEngine,
+  sourceId: string | undefined,
+  which: 'repo_path' | 'last_commit',
+): Promise<string | null> {
+  if (sourceId) {
+    const col = which === 'repo_path' ? 'local_path' : 'last_commit';
+    const rows = await engine.executeRaw<Record<string, string | null>>(
+      `SELECT ${col} AS value FROM sources WHERE id = $1`,
+      [sourceId],
+    );
+    return rows[0]?.value ?? null;
+  }
+  return await engine.getConfig(`sync.${which}`);
+}
+
+async function writeSyncAnchor(
+  engine: BrainEngine,
+  sourceId: string | undefined,
+  which: 'repo_path' | 'last_commit',
+  value: string,
+): Promise<void> {
+  if (sourceId) {
+    const col = which === 'repo_path' ? 'local_path' : 'last_commit';
+    // last_sync_at bookmarked on every last_commit advance.
+    if (which === 'last_commit') {
+      await engine.executeRaw(
+        `UPDATE sources SET last_commit = $1, last_sync_at = now() WHERE id = $2`,
+        [value, sourceId],
+      );
+    } else {
+      await engine.executeRaw(
+        `UPDATE sources SET ${col} = $1 WHERE id = $2`,
+        [value, sourceId],
+      );
+    }
+    return;
+  }
+  await engine.setConfig(`sync.${which}`, value);
+}
+
 export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<SyncResult> {
   // Resolve repo path
-  const repoPath = opts.repoPath || await engine.getConfig('sync.repo_path');
+  const repoPath = opts.repoPath || await readSyncAnchor(engine, opts.sourceId, 'repo_path');
   if (!repoPath) {
-    throw new Error('No repo path specified. Use --repo or run gbrain init with --repo first.');
+    const hint = opts.sourceId
+      ? `Source "${opts.sourceId}" has no local_path. Run: gbrain sources add ${opts.sourceId} --path <path>`
+      : `No repo path specified. Use --repo or run gbrain init with --repo first.`;
+    throw new Error(hint);
   }
 
   // Validate git repo
@@ -84,8 +141,8 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
     throw new Error(`No commits in repo ${repoPath}. Make at least one commit before syncing.`);
   }
 
-  // Read sync state
-  const lastCommit = opts.full ? null : await engine.getConfig('sync.last_commit');
+  // Read sync state (source-scoped when sourceId is set, global otherwise)
+  const lastCommit = opts.full ? null : await readSyncAnchor(engine, opts.sourceId, 'last_commit');
 
   // Ancestry validation: if lastCommit exists, verify it's still in history
   if (lastCommit) {
@@ -175,7 +232,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
 
   if (totalChanges === 0) {
     // Update sync state even with no syncable changes (git advanced)
-    await engine.setConfig('sync.last_commit', headCommit);
+    await writeSyncAnchor(engine, opts.sourceId, 'last_commit', headCommit);
     await engine.setConfig('sync.last_run', new Date().toISOString());
     return {
       status: 'up_to_date',
@@ -296,7 +353,7 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
       );
       // Update last_run + repo_path (progress on infra) but NOT last_commit.
       await engine.setConfig('sync.last_run', new Date().toISOString());
-      await engine.setConfig('sync.repo_path', repoPath);
+      await writeSyncAnchor(engine, opts.sourceId, 'repo_path', repoPath);
       return {
         status: 'blocked_by_failures',
         fromCommit: lastCommit,
@@ -318,10 +375,11 @@ export async function performSync(engine: BrainEngine, opts: SyncOpts): Promise<
     }
   }
 
-  // Update sync state AFTER all changes succeed
-  await engine.setConfig('sync.last_commit', headCommit);
+  // Update sync state AFTER all changes succeed (source-scoped when
+  // opts.sourceId is set, global config otherwise).
+  await writeSyncAnchor(engine, opts.sourceId, 'last_commit', headCommit);
   await engine.setConfig('sync.last_run', new Date().toISOString());
-  await engine.setConfig('sync.repo_path', repoPath);
+  await writeSyncAnchor(engine, opts.sourceId, 'repo_path', repoPath);
 
   // Log ingest
   await engine.logIngest({
@@ -423,7 +481,7 @@ async function performFullSync(
       `Fix the YAML in those files and re-run, or use '--skip-failed'.`,
     );
     await engine.setConfig('sync.last_run', new Date().toISOString());
-    await engine.setConfig('sync.repo_path', repoPath);
+    await writeSyncAnchor(engine, opts.sourceId, 'repo_path', repoPath);
     return {
       status: 'blocked_by_failures',
       fromCommit: null,
@@ -439,10 +497,12 @@ async function performFullSync(
     if (acked > 0) console.error(`  Acknowledged ${acked} failure(s) and advancing past them.`);
   }
 
-  // Persist sync state so next sync is incremental (C1 fix: was missing)
-  await engine.setConfig('sync.last_commit', headCommit);
+  // Persist sync state so next sync is incremental (C1 fix: was missing).
+  // v0.18.0 Step 5: routed through writeSyncAnchor so --source pins it
+  // to the right sources row rather than the global config.
+  await writeSyncAnchor(engine, opts.sourceId, 'last_commit', headCommit);
   await engine.setConfig('sync.last_run', new Date().toISOString());
-  await engine.setConfig('sync.repo_path', repoPath);
+  await writeSyncAnchor(engine, opts.sourceId, 'repo_path', repoPath);
 
   // Full sync doesn't track pagesAffected, so fall back to embed --stale.
   // Before commit 2: runEmbed is void; use result.imported as best estimate of
@@ -482,7 +542,17 @@ export async function runSync(engine: BrainEngine, args: string[]) {
   const skipFailed = args.includes('--skip-failed');
   const retryFailed = args.includes('--retry-failed');
 
-  const opts: SyncOpts = { repoPath, dryRun, full, noPull, noEmbed, skipFailed, retryFailed };
+  // v0.18.0 Step 5: --source resolves to a sources(id) row. Falls back
+  // to pre-v0.17 global config (sync.repo_path + sync.last_commit) when
+  // no flag, no env, no dotfile is present.
+  const explicitSource = args.find((a, i) => args[i - 1] === '--source') || null;
+  let sourceId: string | undefined = undefined;
+  if (explicitSource || process.env.GBRAIN_SOURCE) {
+    const { resolveSourceId } = await import('../core/source-resolver.ts');
+    sourceId = await resolveSourceId(engine, explicitSource);
+  }
+
+  const opts: SyncOpts = { repoPath, dryRun, full, noPull, noEmbed, skipFailed, retryFailed, sourceId };
 
   // Bug 9 — --retry-failed: before running normal sync, clear acknowledgment
   // flags so the sync picks them up as fresh work. The actual re-attempt
|
||||
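The routing rule behind `readSyncAnchor` and `writeSyncAnchor` (per-source row when a source id is given, global `sync.*` config keys otherwise) can be illustrated without a database. This is a sketch under stated assumptions: `AnchorStore`, `readAnchor`, and `writeAnchor` are hypothetical in-memory stand-ins, not the engine API, and the "missing row is a no-op" behaviour models an `UPDATE` that matches zero rows.

```typescript
type Anchor = 'repo_path' | 'last_commit';

interface SourceState {
  local_path: string | null;
  last_commit: string | null;
  last_sync_at: string | null;
}

// Hypothetical in-memory replacement for the config table and sources table.
interface AnchorStore {
  config: Map<string, string>;
  sources: Map<string, SourceState>;
}

function readAnchor(store: AnchorStore, sourceId: string | undefined, which: Anchor): string | null {
  if (sourceId) {
    const row = store.sources.get(sourceId);
    if (!row) return null;
    return which === 'repo_path' ? row.local_path : row.last_commit;
  }
  return store.config.get(`sync.${which}`) ?? null;
}

function writeAnchor(
  store: AnchorStore,
  sourceId: string | undefined,
  which: Anchor,
  value: string,
  now: Date = new Date(),
): void {
  if (sourceId) {
    const row = store.sources.get(sourceId);
    if (!row) return; // models an UPDATE that matches no row
    if (which === 'last_commit') {
      row.last_commit = value;
      row.last_sync_at = now.toISOString(); // bookmarked on every commit advance
    } else {
      row.local_path = value;
    }
    return;
  }
  store.config.set(`sync.${which}`, value);
}

const store: AnchorStore = {
  config: new Map(),
  sources: new Map([['wiki', { local_path: '/repos/wiki', last_commit: null, last_sync_at: null }]]),
};

writeAnchor(store, 'wiki', 'last_commit', 'abc123');
writeAnchor(store, undefined, 'last_commit', 'def456');
```

After these two writes the source row and the global key hold different commits, which is the isolation the diff is after: a `--source wiki` sync never moves the legacy global anchor.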
@@ -28,6 +28,21 @@ export interface LinkBatchInput {
|
||||
origin_slug?: string;
|
||||
/** Frontmatter field name (e.g. 'key_people', 'investors'). */
|
||||
origin_field?: string;
|
||||
/**
|
||||
* v0.18.0: source id for each endpoint. When omitted, the engine JOINs
|
||||
* against `source_id='default'`. Pass explicit values when the edge
|
||||
* lives in a non-default source OR crosses sources.
|
||||
*
|
||||
* Without these fields, the batch JOIN `pages.slug = v.from_slug` fans
|
||||
* out across every source containing that slug, silently creating wrong
|
||||
* edges in a multi-source brain. The source_id filter eliminates the
|
||||
* fan-out. Origin pages (frontmatter provenance) get their own
|
||||
* source_id so reconciliation can't delete edges from another source's
|
||||
* frontmatter.
|
||||
*/
|
||||
from_source_id?: string;
|
||||
to_source_id?: string;
|
||||
origin_source_id?: string;
|
||||
}
|
||||
|
||||
/** Input row for addTimelineEntriesBatch. Optional fields default to '' (matches NOT NULL DDL). */
|
||||
@@ -37,6 +52,12 @@ export interface TimelineBatchInput {
|
||||
source?: string;
|
||||
summary: string;
|
||||
detail?: string;
|
||||
/**
|
||||
* v0.18.0: source id for the owning page. When omitted, the engine JOINs
|
||||
* against `source_id='default'`. Without this, two pages sharing the
|
||||
* same slug across sources would fan out timeline rows to both.
|
||||
*/
|
||||
source_id?: string;
|
||||
}
|
||||
|
||||
/** Maximum results returned by search operations. Internal bulk operations (listPages) are not clamped. */
|
||||
|
||||
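The fan-out described in the `LinkBatchInput` comment is easiest to see on a toy data set. The sketch below is an in-memory illustration only: `PageRow`, `LinkInput`, `resolveBySlugOnly`, and `resolveScoped` are hypothetical names, and the real engine performs this resolution with a SQL JOIN rather than array filters.

```typescript
interface PageRow {
  id: number;
  source_id: string;
  slug: string;
}

interface LinkInput {
  from_slug: string;
  to_slug: string;
  from_source_id?: string;
  to_source_id?: string;
}

// Slug-only matching: every page sharing the slug becomes an endpoint,
// so one input row can produce several edges.
function resolveBySlugOnly(pages: PageRow[], link: LinkInput): Array<[number, number]> {
  const from = pages.filter(p => p.slug === link.from_slug);
  const to = pages.filter(p => p.slug === link.to_slug);
  return from.flatMap(f => to.map((t): [number, number] => [f.id, t.id]));
}

// Source-scoped matching: each endpoint is pinned to one source,
// falling back to 'default' when the field is omitted.
function resolveScoped(pages: PageRow[], link: LinkInput): Array<[number, number]> {
  const fromSource = link.from_source_id ?? 'default';
  const toSource = link.to_source_id ?? 'default';
  const from = pages.filter(p => p.slug === link.from_slug && p.source_id === fromSource);
  const to = pages.filter(p => p.slug === link.to_slug && p.source_id === toSource);
  return from.flatMap(f => to.map((t): [number, number] => [f.id, t.id]));
}

const pages: PageRow[] = [
  { id: 1, source_id: 'default', slug: 'people/alice' },
  { id: 2, source_id: 'yc-media', slug: 'people/alice' },
  { id: 3, source_id: 'default', slug: 'companies/acme' },
];
const link: LinkInput = { from_slug: 'people/alice', to_slug: 'companies/acme' };

const unscopedEdges = resolveBySlugOnly(pages, link);
const scopedEdges = resolveScoped(pages, link);
```

With the slug-only rule the single input row yields two edges (one from each `people/alice` page); with the scoped rule it yields exactly one, from page 1 to page 3.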
@@ -24,8 +24,19 @@ export interface EntityRef {
|
||||
slug: string;
|
||||
/** Top-level directory ("people" | "companies" | etc.). */
|
||||
dir: string;
|
||||
/**
|
||||
* v0.17.0: source id when the link was qualified as `[[source:slug]]`.
|
||||
* `null` means unqualified — the caller resolves via local-first fallback
|
||||
* at extraction time. Mirrors links.resolution_type:
|
||||
* - sourceId set → 'qualified'
|
||||
* - sourceId null → 'unqualified'
|
||||
*/
|
||||
sourceId?: string | null;
|
||||
}
|
||||
|
||||
/** v0.17.0: how a link's target source was pinned at extraction time. */
|
||||
export type LinkResolutionType = 'qualified' | 'unqualified';
|
||||
|
||||
/**
|
||||
* Directory prefix whitelist. These are the top-level slug dirs the extractor
|
||||
* recognizes as entity references. Upstream canonical + our extensions:
|
||||
@@ -63,6 +74,23 @@ const WIKILINK_RE = new RegExp(
|
||||
'g',
|
||||
);
|
||||
|
||||
/**
|
||||
* v0.17.0: qualified wikilink `[[source-id:dir/slug]]` or
|
||||
* `[[source-id:dir/slug|Display Text]]`. The source-id segment pins the
|
||||
* target to a specific sources(id) row, overriding the local-first
|
||||
* fallback used by unqualified `[[slug]]` references.
|
||||
*
|
||||
* Captures: sourceId, slug (dir/...), displayName (optional).
|
||||
*
|
||||
* Matched BEFORE WIKILINK_RE so `[[wiki:topics/ai]]` isn't mis-parsed by
|
||||
* the unqualified regex (the source prefix would not satisfy DIR_PATTERN
|
||||
* anyway, but the two-pass approach keeps intent crystal-clear).
|
||||
*/
|
||||
const QUALIFIED_WIKILINK_RE = new RegExp(
|
||||
`\\[\\[([a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?):(${DIR_PATTERN}\\/[^|\\]#]+?)(?:#[^|\\]]*?)?(?:\\|([^\\]]+?))?\\]\\]`,
|
||||
'g',
|
||||
);
|
||||
|
||||
/**
|
||||
* Strip fenced code blocks (```...```) and inline code (`...`) from markdown,
|
||||
* replacing them with whitespace of equivalent length. Preserves byte offsets
|
||||
@@ -112,6 +140,9 @@ export function extractEntityRefs(content: string): EntityRef[] {
|
||||
let match: RegExpExecArray | null;
|
||||
|
||||
// 1. Markdown links: [Name](path)
|
||||
// Markdown links have no source-qualification syntax — they're
|
||||
// always unqualified. Omit sourceId so the shape stays compatible
|
||||
// with pre-v0.17 consumers doing strict equality.
|
||||
const mdPattern = new RegExp(ENTITY_REF_RE.source, ENTITY_REF_RE.flags);
|
||||
while ((match = mdPattern.exec(stripped)) !== null) {
|
||||
const name = match[1];
|
||||
@@ -121,9 +152,28 @@ export function extractEntityRefs(content: string): EntityRef[] {
     refs.push({ name, slug, dir });
   }
 
-  // 2. Obsidian wikilinks: [[path]] or [[path|Display Text]]
+  // 2a. v0.17.0 qualified wikilinks: [[source-id:path]] or [[source-id:path|Display]]
+  // Must run BEFORE the unqualified pass or we'd double-emit. We also
+  // mask out the matched spans so pass 2b can't grab them.
+  const qualifiedRanges: Array<[number, number]> = [];
+  const qualPattern = new RegExp(QUALIFIED_WIKILINK_RE.source, QUALIFIED_WIKILINK_RE.flags);
+  while ((match = qualPattern.exec(stripped)) !== null) {
+    const sourceId = match[1];
+    let slug = match[2].trim();
+    if (!slug) continue;
+    if (slug.includes('://')) continue;
+    if (slug.endsWith('.md')) slug = slug.slice(0, -3);
+    const displayName = (match[3] || slug).trim();
+    const dir = slug.split('/')[0];
+    refs.push({ name: displayName, slug, dir, sourceId });
+    qualifiedRanges.push([match.index, match.index + match[0].length]);
+  }
+
+  // 2b. Unqualified Obsidian wikilinks: [[path]] or [[path|Display Text]]
+  // Same shape rule: omit sourceId when unqualified.
+  const unmasked = maskRanges(stripped, qualifiedRanges);
   const wikiPattern = new RegExp(WIKILINK_RE.source, WIKILINK_RE.flags);
-  while ((match = wikiPattern.exec(stripped)) !== null) {
+  while ((match = wikiPattern.exec(unmasked)) !== null) {
     let slug = match[1].trim();
     if (!slug) continue;
     if (slug.includes('://')) continue;
@@ -136,6 +186,20 @@ export function extractEntityRefs(content: string): EntityRef[] {
|
||||
return refs;
|
||||
}
|
||||
|
||||
/**
|
||||
* Replace the given character ranges (string indices, as produced by
* RegExp match offsets) with spaces, preserving offsets. Used by
* extractEntityRefs to prevent the unqualified wikilink regex from
* matching inside a qualified wikilink span.
|
||||
*/
|
||||
function maskRanges(content: string, ranges: Array<[number, number]>): string {
|
||||
if (ranges.length === 0) return content;
|
||||
const chars = content.split('');
|
||||
for (const [s, e] of ranges) {
|
||||
for (let i = s; i < e && i < chars.length; i++) chars[i] = ' ';
|
||||
}
|
||||
return chars.join('');
|
||||
}
|
||||
|
||||
// ─── Link candidates (richer than EntityRef) ────────────────────
|
||||
|
||||
export interface LinkCandidate {
|
||||
|
||||
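The two-pass extraction (qualified wikilinks first, then unqualified ones over a masked copy) can be tried on a small input. This is a simplified sketch, not the module's code: the `DIR` whitelist and the `UNQUALIFIED` pattern are assumptions standing in for the real `DIR_PATTERN` and `WIKILINK_RE`, which this diff does not show, and `extractWikilinks` and `mask` are hypothetical names. The qualified pattern follows the one added above.

```typescript
interface Ref {
  name: string;
  slug: string;
  dir: string;
  sourceId?: string;
}

// Assumption: a tiny stand-in for the real directory whitelist.
const DIR = '(?:people|companies|topics)';

const QUALIFIED = new RegExp(
  `\\[\\[([a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?):(${DIR}\\/[^|\\]#]+?)(?:#[^|\\]]*?)?(?:\\|([^\\]]+?))?\\]\\]`,
  'g',
);

// Assumption: simplified unqualified pattern, [[dir/slug]] or [[dir/slug|Text]].
const UNQUALIFIED = new RegExp(`\\[\\[(${DIR}\\/[^|\\]#]+?)(?:\\|([^\\]]+?))?\\]\\]`, 'g');

// Overwrite each [start, end) range with spaces so later matches keep
// the same string offsets but cannot land inside a masked span.
function mask(content: string, ranges: Array<[number, number]>): string {
  const chars = content.split('');
  for (const [s, e] of ranges) {
    for (let i = s; i < e && i < chars.length; i++) chars[i] = ' ';
  }
  return chars.join('');
}

function extractWikilinks(content: string): Ref[] {
  const refs: Ref[] = [];
  const ranges: Array<[number, number]> = [];
  let m: RegExpExecArray | null;

  // Pass 1: qualified links carry an explicit source id.
  const qual = new RegExp(QUALIFIED.source, QUALIFIED.flags);
  while ((m = qual.exec(content)) !== null) {
    const slug = m[2].trim();
    refs.push({ name: (m[3] || slug).trim(), slug, dir: slug.split('/')[0], sourceId: m[1] });
    ranges.push([m.index, m.index + m[0].length]);
  }

  // Pass 2: unqualified links, searched in the masked copy; sourceId is omitted.
  const masked = mask(content, ranges);
  const unqual = new RegExp(UNQUALIFIED.source, UNQUALIFIED.flags);
  while ((m = unqual.exec(masked)) !== null) {
    const slug = m[1].trim();
    refs.push({ name: (m[2] || slug).trim(), slug, dir: slug.split('/')[0] });
  }
  return refs;
}

const refs = extractWikilinks('See [[wiki:topics/ai|AI]] and [[people/alice]].');
```

Building a fresh `RegExp` per call avoids sharing `lastIndex` state between invocations of a global pattern, which is the same reason the original clones its module-level regexes.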
@@ -449,6 +449,201 @@ export const MIGRATIONS: Migration[] = [
       }
     },
   },
+  {
+    version: 23,
+    name: 'files_source_id_page_id_ledger',
+    // v0.18.0 Step 7 (Lane E) — additive only: adds files.source_id and
+    // files.page_id columns + creates the file_migration_ledger that
+    // drives phase-B storage object rewrites. Does NOT drop page_slug
+    // yet (kept for backward compat; a later release cleans up once the
+    // page_id FK is proven). PGLite has no files table, so this
+    // migration is Postgres-only via a handler gate.
+    //
+    // Ledger PK is file_id (not storage_path_old) — two sources CAN
+    // share an old path during migration, so a composite would be
+    // wrong. Codex second-pass review caught this.
+    //
+    // State machine per row:
+    //   pending → copy_done → db_updated → complete
+    //   any state → failed (with error detail)
+    //
+    // Phase B in the v0_18_0 orchestrator processes `status != complete`
+    // rows. Re-runnable: resumes from whichever state it stopped in.
+    sql: '',
+    handler: async (engine) => {
+      if (engine.kind === 'pglite') return;
+      await engine.runMigration(19, `
+        -- 1a. source_id with DEFAULT 'default' (idempotent)
+        ALTER TABLE files ADD COLUMN IF NOT EXISTS source_id TEXT
+          NOT NULL DEFAULT 'default' REFERENCES sources(id) ON DELETE CASCADE;
+        CREATE INDEX IF NOT EXISTS idx_files_source_id ON files(source_id);
+
+        -- 1b. page_id (nullable; pre-v0.17 files pointed at page_slug
+        --     which was ON DELETE SET NULL, so we keep the same nullable
+        --     semantic — orphaned files are legal).
+        ALTER TABLE files ADD COLUMN IF NOT EXISTS page_id INTEGER
+          REFERENCES pages(id) ON DELETE SET NULL;
+        CREATE INDEX IF NOT EXISTS idx_files_page_id ON files(page_id);
+      `);
+
+      await engine.runMigration(19, `
+        -- 1c. Backfill page_id from existing page_slug. Scoped to
+        --     source_id='default' because pre-v0.17 pages ALL lived in
+        --     the default source. Without this scope, after new sources
+        --     get added mid-migration, the JOIN could hit the wrong
+        --     page (different source, same slug).
+        UPDATE files f
+           SET page_id = p.id
+          FROM pages p
+         WHERE f.page_slug = p.slug
+           AND p.source_id = 'default'
+           AND f.page_id IS NULL;
+      `);
+
+      await engine.runMigration(19, `
+        -- 2. file_migration_ledger — drives the storage object rewrite
+        --    in the v0_18_0 orchestrator's phase B. Seeded from current
+        --    files rows; re-seed is idempotent via NOT EXISTS guard.
+        CREATE TABLE IF NOT EXISTS file_migration_ledger (
+          file_id           INTEGER PRIMARY KEY REFERENCES files(id) ON DELETE CASCADE,
+          storage_path_old  TEXT NOT NULL,
+          storage_path_new  TEXT NOT NULL,
+          status            TEXT NOT NULL DEFAULT 'pending',
+          error             TEXT,
+          updated_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
+          CONSTRAINT chk_ledger_status CHECK (status IN ('pending','copy_done','db_updated','complete','failed'))
+        );
+        CREATE INDEX IF NOT EXISTS idx_file_migration_ledger_status
+          ON file_migration_ledger(status) WHERE status != 'complete';
+
+        -- Seed the ledger with every existing file. New path prefixes
+        -- source_id so multi-source can land assets under their own
+        -- bucket path without collision.
+        INSERT INTO file_migration_ledger (file_id, storage_path_old, storage_path_new, status)
+        SELECT
+          f.id,
+          f.storage_path,
+          COALESCE(f.source_id, 'default') || '/' || f.storage_path,
+          'pending'
+        FROM files f
+        WHERE NOT EXISTS (
+          SELECT 1 FROM file_migration_ledger l WHERE l.file_id = f.id
+        );
+      `);
+    },
+  },
+  {
+    version: 22,
+    name: 'links_resolution_type',
+    // v0.18.0 Step 4 (Lane B) — adds links.resolution_type column so
+    // each edge records whether its target source was pinned at
+    // extraction time via `[[source:slug]]` (qualified) or resolved
+    // via local-first fallback (unqualified). Unqualified edges are
+    // candidates for re-resolution via `gbrain extract
+    // --refresh-unqualified` when the source topology changes.
+    //
+    // Nullable because legacy edges (pre-v0.17) have no resolution
+    // concept. `frontmatter` and `manual` edges remain NULL — they're
+    // not subject to staleness under source churn.
+    sql: `
+      ALTER TABLE links ADD COLUMN IF NOT EXISTS resolution_type TEXT;
+      DO $$ BEGIN
+        IF NOT EXISTS (
+          SELECT 1 FROM pg_constraint WHERE conname = 'links_resolution_type_check'
+        ) THEN
+          ALTER TABLE links ADD CONSTRAINT links_resolution_type_check
+            CHECK (resolution_type IS NULL OR resolution_type IN ('qualified', 'unqualified'));
+        END IF;
+      END $$;
+    `,
+  },
+  {
+    version: 21,
+    name: 'pages_source_id_composite_unique',
+    // v0.18.0 Step 2 (Lane B) — adds pages.source_id with DEFAULT 'default'
+    // and swaps the global UNIQUE(slug) for the composite UNIQUE(source_id,
+    // slug). Lands alongside the engine SQL rewrite that makes every
+    // ON CONFLICT (slug) → ON CONFLICT (source_id, slug) so the constraint
+    // swap is atomic with the code that writes under it.
+    //
+    // DEFAULT 'default' is load-bearing: closes the Codex-flagged race
+    // where an INSERT between ADD COLUMN and SET NOT NULL could leave
+    // source_id NULL. Because the default already references a valid
+    // sources row (seeded in v16), new INSERTs immediately get a valid FK.
+    //
+    // Idempotent: IF NOT EXISTS on ADD COLUMN, DROP IF EXISTS on the old
+    // constraint, DO block guard on the new constraint creation.
+    sql: `
+      ALTER TABLE pages ADD COLUMN IF NOT EXISTS source_id TEXT
+        NOT NULL DEFAULT 'default' REFERENCES sources(id) ON DELETE CASCADE;
+
+      CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);
+
+      -- Swap global UNIQUE(slug) → composite UNIQUE(source_id, slug). The
+      -- original constraint is named pages_slug_key by Postgres convention
+      -- when the column was declared UNIQUE inline. Both drops are
+      -- idempotent.
+      ALTER TABLE pages DROP CONSTRAINT IF EXISTS pages_slug_key;
+      DO $$ BEGIN
+        IF NOT EXISTS (
+          SELECT 1 FROM pg_constraint WHERE conname = 'pages_source_slug_key'
+        ) THEN
+          ALTER TABLE pages ADD CONSTRAINT pages_source_slug_key
+            UNIQUE (source_id, slug);
+        END IF;
+      END $$;
+    `,
+  },
+  {
+    version: 20,
+    name: 'sources_table_additive',
+    // v0.18.0 Step 1 (Lane A) — **additive only** so Step 1 is a safe
+    // standalone commit. This migration installs the sources primitive
+    // WITHOUT breaking the engine's existing ON CONFLICT (slug) upserts.
+    //
+    // What this migration does now:
+    //   - CREATE sources table
+    //   - INSERT default source (federated=true, inherits sync.repo_path
+    //     and sync.last_commit from config so post-upgrade identity is
+    //     preserved)
+    //
+    // What this migration does NOT do yet (deferred to v17 which ships
+    // with Step 2 engine rewrite, so they land atomically):
+    //   - ALTER pages ADD source_id
+    //   - DROP UNIQUE(slug) + ADD UNIQUE(source_id, slug)
+    //   - files.page_slug → page_id rewrite
+    //   - file_migration_ledger
+    //   - links.resolution_type
+    //
+    // The v0.18.0 orchestrator's phaseCVerify allows this split: it
+    // checks for sources('default'), but the "composite UNIQUE" +
+    // "pages.source_id NOT NULL" assertions only run after v17 lands.
+    //
+    // Idempotent via IF NOT EXISTS. Safe to re-run.
+    sql: `
+      CREATE TABLE IF NOT EXISTS sources (
+        id            TEXT PRIMARY KEY,
+        name          TEXT NOT NULL UNIQUE,
+        local_path    TEXT,
+        last_commit   TEXT,
+        last_sync_at  TIMESTAMPTZ,
+        config        JSONB NOT NULL DEFAULT '{}'::jsonb,
+        created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
+      );
+
+      -- Seed 'default' source, inheriting the existing sync.repo_path /
+      -- sync.last_commit config values. federated=true for backward compat.
+      -- Pre-v0.17 brains behave exactly as before.
+      INSERT INTO sources (id, name, local_path, last_commit, config)
+      SELECT
+        'default',
+        'default',
+        (SELECT value FROM config WHERE key = 'sync.repo_path'),
+        (SELECT value FROM config WHERE key = 'sync.last_commit'),
+        '{"federated": true}'::jsonb
+      WHERE NOT EXISTS (SELECT 1 FROM sources WHERE id = 'default');
+    `,
+  },
   {
     version: 15,
     name: 'minion_jobs_max_stalled_default_5',
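The per-row state machine described in the ledger migration's comment (`pending → copy_done → db_updated → complete`, any state `→ failed`) can be sketched as a resumable step function. This is only an illustration under stated assumptions: the `Steps` callbacks and `advance` are hypothetical names, the diff does not show what phase B does at each transition, and the sketch is synchronous for brevity. How `failed` rows are retried is not shown in the patch, so the sketch leaves them untouched.

```typescript
type LedgerStatus = 'pending' | 'copy_done' | 'db_updated' | 'complete' | 'failed';

interface LedgerRow {
  fileId: number;
  status: LedgerStatus;
  error: string | null;
}

// Hypothetical per-transition callbacks; the real phase B work is not in this diff.
interface Steps {
  copyObject(row: LedgerRow): void;
  updateDb(row: LedgerRow): void;
  finish(row: LedgerRow): void;
}

// Resume a row from whatever status it was left in. Each completed step
// advances the status, so a re-run never repeats earlier steps.
function advance(row: LedgerRow, steps: Steps): LedgerRow {
  if (row.status === 'complete' || row.status === 'failed') return row;
  let status: LedgerStatus = row.status;
  try {
    if (status === 'pending') { steps.copyObject(row); status = 'copy_done'; }
    if (status === 'copy_done') { steps.updateDb(row); status = 'db_updated'; }
    if (status === 'db_updated') { steps.finish(row); status = 'complete'; }
    return { ...row, status, error: null };
  } catch (err) {
    // Any state can fall into 'failed', carrying the error detail.
    const detail = err instanceof Error ? err.message : String(err);
    return { ...row, status: 'failed', error: detail };
  }
}

const log: string[] = [];
const steps: Steps = {
  copyObject: () => { log.push('copy'); },
  updateDb: () => { log.push('db'); },
  finish: () => { log.push('finish'); },
};

// A row interrupted after the copy resumes at the DB update.
const resumed = advance({ fileId: 7, status: 'copy_done', error: null }, steps);
console.log(resumed.status, log);
```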
@@ -502,8 +697,14 @@ export async function runMigrations(engine: BrainEngine): Promise<{ applied: num
   const currentStr = await engine.getConfig('version');
   const current = parseInt(currentStr || '1', 10);

+  // Sort by version ascending so array insertion order doesn't affect
+  // correctness. Migrations MUST run in version order; if v16 accidentally
+  // precedes v15 in MIGRATIONS, setConfig(version, 16) would cause v15 to
+  // be skipped on the next iteration.
+  const sorted = [...MIGRATIONS].sort((a, b) => a.version - b.version);
+
   let applied = 0;
-  for (const m of MIGRATIONS) {
+  for (const m of sorted) {
     if (m.version > current) {
       // Pick SQL: engine-specific `sqlFor` wins over engine-agnostic `sql`.
       const sql = m.sqlFor?.[engine.kind] ?? m.sql;

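The hazard that the sort guard closes can be reproduced with a small simulation. This is a sketch, not the real `runMigrations`: `runOnce`, the in-memory `store`, and the `maxApply` cut-off (standing in for an interrupted run) are hypothetical, and it assumes the stored version is written after each applied migration, as the comment about `setConfig(version, 16)` implies.

```typescript
interface Migration {
  version: number;
  name: string;
}

// One pass of a migration runner: read the stored version once, apply every
// migration above it, and persist the version after each apply.
// `maxApply` simulates the process stopping partway through (hypothetical).
function runOnce(
  migrations: Migration[],
  store: { version: number },
  sortFirst: boolean,
  maxApply: number = Number.POSITIVE_INFINITY,
): number[] {
  const current = store.version;
  const list = sortFirst
    ? [...migrations].sort((a, b) => a.version - b.version)
    : migrations;
  const applied: number[] = [];
  for (const m of list) {
    if (applied.length >= maxApply) break;
    if (m.version > current) {
      applied.push(m.version);
      store.version = m.version;
    }
  }
  return applied;
}

// v16 was inserted ahead of v15 in the array.
const outOfOrder: Migration[] = [
  { version: 16, name: 'later' },
  { version: 15, name: 'earlier' },
];

const unsortedStore = { version: 14 };
const unsortedFirst = runOnce(outOfOrder, unsortedStore, false, 1); // interrupted after one apply
const unsortedSecond = runOnce(outOfOrder, unsortedStore, false);   // v15 is never applied

const sortedStore = { version: 14 };
const sortedFirst = runOnce(outOfOrder, sortedStore, true, 1);
const sortedSecond = runOnce(outOfOrder, sortedStore, true);

console.log({ unsortedFirst, unsortedSecond, sortedFirst, sortedSecond });
```

Sorting a copy (`[...migrations]`) rather than the array itself matches the patch and keeps the exported list unmodified.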
@@ -116,10 +116,15 @@ export class PGLiteEngine implements BrainEngine {
     const hash = page.content_hash || contentHash(page);
     const frontmatter = page.frontmatter || {};

+    // v0.18.0 Step 2: source_id relies on the schema DEFAULT 'default' so
+    // existing callers still target the default source without threading
+    // a parameter. ON CONFLICT target becomes (source_id, slug) since the
+    // global UNIQUE(slug) was dropped in migration v17. Step 5+ will
+    // surface an explicit sourceId param on putPage for multi-source sync.
     const { rows } = await this.db.query(
       `INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, updated_at)
        VALUES ($1, $2, $3, $4, $5, $6::jsonb, $7, now())
-       ON CONFLICT (slug) DO UPDATE SET
+       ON CONFLICT (source_id, slug) DO UPDATE SET
          type = EXCLUDED.type,
          title = EXCLUDED.title,
          compiled_truth = EXCLUDED.compiled_truth,
@@ -205,7 +210,7 @@ export class PGLiteEngine implements BrainEngine {

     const { rows } = await this.db.query(
       `SELECT
-        p.slug, p.id as page_id, p.title, p.type,
+        p.slug, p.id as page_id, p.title, p.type, p.source_id,
         cc.id as chunk_id, cc.chunk_index, cc.chunk_text, cc.chunk_source,
         ts_rank(p.search_vector, websearch_to_tsquery('english', $1)) AS score,
         CASE WHEN p.updated_at < (
@@ -235,7 +240,7 @@ export class PGLiteEngine implements BrainEngine {

     const { rows } = await this.db.query(
       `SELECT
-        p.slug, p.id as page_id, p.title, p.type,
+        p.slug, p.id as page_id, p.title, p.type, p.source_id,
         cc.id as chunk_id, cc.chunk_index, cc.chunk_text, cc.chunk_source,
         1 - (cc.embedding <=> $1::vector) AS score,
         CASE WHEN p.updated_at < (
@@ -370,8 +375,14 @@ export class PGLiteEngine implements BrainEngine {

   async addLinksBatch(links: LinkBatchInput[]): Promise<number> {
     if (links.length === 0) return 0;
-    // unnest() pattern: 7 array-typed bound parameters regardless of batch size.
-    // Same shape as PostgresEngine (v0.13). Avoids the 65535-parameter cap.
+    // unnest() pattern: 10 array-typed bound parameters regardless of batch
+    // size. Same shape as PostgresEngine (v0.18). Avoids the 65535-parameter
+    // cap.
+    //
+    // v0.18.0: every JOIN composite-keys on (slug, source_id) so the batch
+    // can't fan out across sources when the same slug exists in multiple
+    // sources. Origin JOIN uses LEFT JOIN on a composite key — NULL
+    // origin_slug leaves origin_page_id NULL, same as pre-v0.18.
     const fromSlugs = links.map(l => l.from_slug);
     const toSlugs = links.map(l => l.to_slug);
     const linkTypes = links.map(l => l.link_type || '');
@@ -379,17 +390,20 @@ export class PGLiteEngine implements BrainEngine {
     const linkSources = links.map(l => l.link_source || 'markdown');
     const originSlugs = links.map(l => l.origin_slug || null);
     const originFields = links.map(l => l.origin_field || null);
+    const fromSourceIds = links.map(l => l.from_source_id || 'default');
+    const toSourceIds = links.map(l => l.to_source_id || 'default');
+    const originSourceIds = links.map(l => l.origin_source_id || 'default');
     const result = await this.db.query(
       `INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, origin_page_id, origin_field)
        SELECT f.id, t.id, v.link_type, v.context, v.link_source, o.id, v.origin_field
-       FROM unnest($1::text[], $2::text[], $3::text[], $4::text[], $5::text[], $6::text[], $7::text[])
-         AS v(from_slug, to_slug, link_type, context, link_source, origin_slug, origin_field)
-       JOIN pages f ON f.slug = v.from_slug
-       JOIN pages t ON t.slug = v.to_slug
-       LEFT JOIN pages o ON o.slug = v.origin_slug
+       FROM unnest($1::text[], $2::text[], $3::text[], $4::text[], $5::text[], $6::text[], $7::text[], $8::text[], $9::text[], $10::text[])
+         AS v(from_slug, to_slug, link_type, context, link_source, origin_slug, origin_field, from_source_id, to_source_id, origin_source_id)
+       JOIN pages f ON f.slug = v.from_slug AND f.source_id = v.from_source_id
+       JOIN pages t ON t.slug = v.to_slug AND t.source_id = v.to_source_id
+       LEFT JOIN pages o ON o.slug = v.origin_slug AND o.source_id = v.origin_source_id
        ON CONFLICT (from_page_id, to_page_id, link_type, link_source, origin_page_id) DO NOTHING
        RETURNING 1`,
-      [fromSlugs, toSlugs, linkTypes, contexts, linkSources, originSlugs, originFields]
+      [fromSlugs, toSlugs, linkTypes, contexts, linkSources, originSlugs, originFields, fromSourceIds, toSourceIds, originSourceIds]
     );
     return result.rows.length;
   }
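The fan-out that the composite-key JOIN prevents can be shown with an in-memory stand-in for the `pages` table. This is a sketch under assumptions: `PageRow`, `LinkInput`, `resolveBySlug`, and `resolveByComposite` are hypothetical names that only mimic the shape of the two JOIN variants in the hunk above; the `'default'` fallback mirrors `l.from_source_id || 'default'`.

```typescript
interface PageRow {
  id: number;
  source_id: string;
  slug: string;
}

interface LinkInput {
  from_slug: string;
  to_slug: string;
  from_source_id?: string;
  to_source_id?: string;
}

// The same slug exists in two sources.
const pages: PageRow[] = [
  { id: 1, source_id: 'default', slug: 'people/alice' },
  { id: 2, source_id: 'wiki', slug: 'people/alice' },
  { id: 3, source_id: 'default', slug: 'notes/intro' },
];

// Old shape: JOIN pages ON slug only.
function resolveBySlug(links: LinkInput[]): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  for (const l of links) {
    for (const f of pages.filter(p => p.slug === l.from_slug)) {
      for (const t of pages.filter(p => p.slug === l.to_slug)) {
        out.push([f.id, t.id]);
      }
    }
  }
  return out;
}

// New shape: JOIN pages ON slug AND source_id, defaulting to 'default'.
function resolveByComposite(links: LinkInput[]): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  for (const l of links) {
    const fromSource = l.from_source_id || 'default';
    const toSource = l.to_source_id || 'default';
    for (const f of pages.filter(p => p.slug === l.from_slug && p.source_id === fromSource)) {
      for (const t of pages.filter(p => p.slug === l.to_slug && p.source_id === toSource)) {
        out.push([f.id, t.id]);
      }
    }
  }
  return out;
}

const batch: LinkInput[] = [{ from_slug: 'notes/intro', to_slug: 'people/alice' }];
const fanned = resolveBySlug(batch);      // one input link becomes two edges
const scoped = resolveByComposite(batch); // one input link stays one edge
console.log(fanned, scoped);
```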
@@ -724,22 +738,21 @@ export class PGLiteEngine implements BrainEngine {

   async addTimelineEntriesBatch(entries: TimelineBatchInput[]): Promise<number> {
     if (entries.length === 0) return 0;
-    // unnest() pattern: 5 array-typed bound parameters regardless of batch size.
     const slugs = entries.map(e => e.slug);
     const dates = entries.map(e => e.date);
-    // Normalize optional fields to '' to match per-row addTimelineEntry + NOT NULL DDL.
     const sources = entries.map(e => e.source || '');
     const summaries = entries.map(e => e.summary);
     const details = entries.map(e => e.detail || '');
+    const sourceIds = entries.map(e => e.source_id || 'default');
     const result = await this.db.query(
       `INSERT INTO timeline_entries (page_id, date, source, summary, detail)
        SELECT p.id, v.date::date, v.source, v.summary, v.detail
-       FROM unnest($1::text[], $2::text[], $3::text[], $4::text[], $5::text[])
-         AS v(slug, date, source, summary, detail)
-       JOIN pages p ON p.slug = v.slug
+       FROM unnest($1::text[], $2::text[], $3::text[], $4::text[], $5::text[], $6::text[])
+         AS v(slug, date, source, summary, detail, source_id)
+       JOIN pages p ON p.slug = v.slug AND p.source_id = v.source_id
        ON CONFLICT (page_id, date, summary) DO NOTHING
        RETURNING 1`,
-      [slugs, dates, sources, summaries, details]
+      [slugs, dates, sources, summaries, details, sourceIds]
     );
     return result.rows.length;
   }

@@ -19,12 +19,33 @@ export const PGLITE_SCHEMA_SQL = `
 CREATE EXTENSION IF NOT EXISTS vector;
 CREATE EXTENSION IF NOT EXISTS pg_trgm;

+-- ============================================================
+-- sources: multi-brain tenancy (v0.18.0). See src/schema.sql for design notes.
+-- ============================================================
+CREATE TABLE IF NOT EXISTS sources (
+  id            TEXT PRIMARY KEY,
+  name          TEXT NOT NULL UNIQUE,
+  local_path    TEXT,
+  last_commit   TEXT,
+  last_sync_at  TIMESTAMPTZ,
+  config        JSONB NOT NULL DEFAULT '{}'::jsonb,
+  created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+INSERT INTO sources (id, name, config)
+  VALUES ('default', 'default', '{"federated": true}'::jsonb)
+  ON CONFLICT (id) DO NOTHING;
+
 -- ============================================================
 -- pages: the core content table
 -- ============================================================
+-- v0.18.0 (Step 2): source_id scopes each page. Slugs are unique per
+-- source — see src/schema.sql for the design notes.
 CREATE TABLE IF NOT EXISTS pages (
   id            SERIAL PRIMARY KEY,
-  slug          TEXT NOT NULL UNIQUE,
+  source_id     TEXT NOT NULL DEFAULT 'default'
+                REFERENCES sources(id) ON DELETE CASCADE,
+  slug          TEXT NOT NULL,
   type          TEXT NOT NULL,
   title         TEXT NOT NULL,
   compiled_truth TEXT NOT NULL DEFAULT '',
@@ -32,12 +53,14 @@ CREATE TABLE IF NOT EXISTS pages (
   frontmatter   JSONB NOT NULL DEFAULT '{}',
   content_hash  TEXT,
   created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
-  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
+  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
+  CONSTRAINT pages_source_slug_key UNIQUE (source_id, slug)
 );

 CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type);
 CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter);
 CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops);
+CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);

 -- ============================================================
 -- content_chunks: chunked content with embeddings
@@ -72,6 +95,8 @@ CREATE TABLE IF NOT EXISTS links (
   link_source    TEXT CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual')),
   origin_page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
   origin_field   TEXT,
+  -- v0.18.0 Step 4: see src/schema.sql.
+  resolution_type TEXT CHECK (resolution_type IS NULL OR resolution_type IN ('qualified', 'unqualified')),
   created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
   CONSTRAINT links_from_to_type_source_origin_unique
     UNIQUE NULLS NOT DISTINCT (from_page_id, to_page_id, link_type, link_source, origin_page_id)
@@ -141,7 +166,7 @@ CREATE TABLE IF NOT EXISTS page_versions (
 CREATE INDEX IF NOT EXISTS idx_versions_page ON page_versions(page_id);

 -- ============================================================
--- ingest_log
+-- ingest_log (v0.18.0 Step 1: source_id deferred to v17, see src/schema.sql)
 -- ============================================================
 CREATE TABLE IF NOT EXISTS ingest_log (
   id            SERIAL PRIMARY KEY,

@@ -115,10 +115,14 @@ export class PostgresEngine implements BrainEngine {
     const hash = page.content_hash || contentHash(page);
     const frontmatter = page.frontmatter || {};

+    // v0.18.0 Step 2: source_id relies on schema DEFAULT 'default'. ON
+    // CONFLICT target becomes (source_id, slug) since global UNIQUE(slug)
+    // was dropped in migration v17. See pglite-engine.ts for matching
+    // notes; multi-source sync (Step 5) will surface an explicit sourceId.
     const rows = await sql`
       INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, updated_at)
       VALUES (${slug}, ${page.type}, ${page.title}, ${page.compiled_truth}, ${page.timeline || ''}, ${sql.json(frontmatter as Parameters<typeof sql.json>[0])}, ${hash}, now())
-      ON CONFLICT (slug) DO UPDATE SET
+      ON CONFLICT (source_id, slug) DO UPDATE SET
         type = EXCLUDED.type,
         title = EXCLUDED.title,
         compiled_truth = EXCLUDED.compiled_truth,
@@ -262,7 +266,7 @@ export class PostgresEngine implements BrainEngine {
       await sql`SET LOCAL statement_timeout = '8s'`;
       return await sql`
         SELECT
-          p.slug, p.id as page_id, p.title, p.type,
+          p.slug, p.id as page_id, p.title, p.type, p.source_id,
           cc.id as chunk_id, cc.chunk_index, cc.chunk_text, cc.chunk_source,
           1 - (cc.embedding <=> ${vecStr}::vector) AS score,
           false AS stale
@@ -422,17 +426,21 @@ export class PostgresEngine implements BrainEngine {
     const linkSources = links.map(l => l.link_source || 'markdown');
     const originSlugs = links.map(l => l.origin_slug || null);
     const originFields = links.map(l => l.origin_field || null);
+    const fromSourceIds = links.map(l => l.from_source_id || 'default');
+    const toSourceIds = links.map(l => l.to_source_id || 'default');
+    const originSourceIds = links.map(l => l.origin_source_id || 'default');
     const result = await sql`
       INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, origin_page_id, origin_field)
       SELECT f.id, t.id, v.link_type, v.context, v.link_source, o.id, v.origin_field
       FROM unnest(
         ${fromSlugs}::text[], ${toSlugs}::text[], ${linkTypes}::text[],
         ${contexts}::text[], ${linkSources}::text[], ${originSlugs}::text[],
-        ${originFields}::text[]
-      ) AS v(from_slug, to_slug, link_type, context, link_source, origin_slug, origin_field)
-      JOIN pages f ON f.slug = v.from_slug
-      JOIN pages t ON t.slug = v.to_slug
-      LEFT JOIN pages o ON o.slug = v.origin_slug
+        ${originFields}::text[], ${fromSourceIds}::text[], ${toSourceIds}::text[],
+        ${originSourceIds}::text[]
+      ) AS v(from_slug, to_slug, link_type, context, link_source, origin_slug, origin_field, from_source_id, to_source_id, origin_source_id)
+      JOIN pages f ON f.slug = v.from_slug AND f.source_id = v.from_source_id
+      JOIN pages t ON t.slug = v.to_slug AND t.source_id = v.to_source_id
+      LEFT JOIN pages o ON o.slug = v.origin_slug AND o.source_id = v.origin_source_id
       ON CONFLICT (from_page_id, to_page_id, link_type, link_source, origin_page_id) DO NOTHING
       RETURNING 1
     `;
@@ -775,19 +783,18 @@ export class PostgresEngine implements BrainEngine {
   async addTimelineEntriesBatch(entries: TimelineBatchInput[]): Promise<number> {
     if (entries.length === 0) return 0;
     const sql = this.sql;
-    // unnest() pattern: 5 array-typed bound parameters regardless of batch size.
     const slugs = entries.map(e => e.slug);
     const dates = entries.map(e => e.date);
-    // Normalize optional fields to '' to match per-row addTimelineEntry + NOT NULL DDL.
     const sources = entries.map(e => e.source || '');
     const summaries = entries.map(e => e.summary);
     const details = entries.map(e => e.detail || '');
+    const sourceIds = entries.map(e => e.source_id || 'default');
     const result = await sql`
       INSERT INTO timeline_entries (page_id, date, source, summary, detail)
       SELECT p.id, v.date::date, v.source, v.summary, v.detail
-      FROM unnest(${slugs}::text[], ${dates}::text[], ${sources}::text[], ${summaries}::text[], ${details}::text[])
-        AS v(slug, date, source, summary, detail)
-      JOIN pages p ON p.slug = v.slug
+      FROM unnest(${slugs}::text[], ${dates}::text[], ${sources}::text[], ${summaries}::text[], ${details}::text[], ${sourceIds}::text[])
+        AS v(slug, date, source, summary, detail, source_id)
+      JOIN pages p ON p.slug = v.slug AND p.source_id = v.source_id
       ON CONFLICT (page_id, date, summary) DO NOTHING
       RETURNING 1
     `;

@@ -9,12 +9,55 @@ CREATE EXTENSION IF NOT EXISTS pg_trgm;
 -- gen_random_uuid() is core in Postgres 13+; enable pgcrypto as fallback for older versions
 CREATE EXTENSION IF NOT EXISTS pgcrypto;

+-- ============================================================
+-- sources: multi-repo / multi-brain tenancy (v0.18.0)
+-- ============================================================
+-- A source is a logical brain-within-the-DB: wiki, gstack, yc-media, etc.
+-- Every page/file/ingest_log row carries source_id.
+--
+--   id:          immutable citation key. [a-z0-9-]{1,32} enforced at app layer.
+--                Used in [source:slug] citations, --source flag, wikilink syntax.
+--   name:        mutable display label. Rename via \`gbrain sources rename\`.
+--   local_path:  optional git checkout root for filesystem-backed sources.
+--   config:      forward-compat JSONB. Currently used for federation + ACL slot.
+--                { "federated": bool, "access_policy": {...} }
+--                - federated=true (or missing-but-explicit on 'default'):
+--                    participates in cross-source default search.
+--                - federated=false (default for new sources):
+--                    only searched when explicitly named via --source.
+--                - access_policy: forward-compat slot, no enforcement in v0.17.
+--                    Write-side lockdown: mutated only when ctx.remote=false.
+CREATE TABLE IF NOT EXISTS sources (
+  id            TEXT PRIMARY KEY,
+  name          TEXT NOT NULL UNIQUE,
+  local_path    TEXT,
+  last_commit   TEXT,
+  last_sync_at  TIMESTAMPTZ,
+  config        JSONB NOT NULL DEFAULT '{}'::jsonb,
+  created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- Seed the default source. 'default' is federated=true for backward compat
+-- (pre-v0.17 brains behave exactly as before — every page appears in search).
+-- Pre-existing sync.repo_path / sync.last_commit are copied in by the v16
+-- migration, not here; fresh installs have no local_path until \`sources add\`
+-- or the first \`sync\`.
+INSERT INTO sources (id, name, config)
+  VALUES ('default', 'default', '{"federated": true}'::jsonb)
+  ON CONFLICT (id) DO NOTHING;
+
 -- ============================================================
 -- pages: the core content table
 -- ============================================================
+-- v0.18.0 (Step 2): pages.source_id scopes each row to a sources(id) row.
+-- Slugs are unique per source, NOT globally. The default source is
+-- seeded in the sources block above so the DEFAULT 'default' FK is
+-- always valid at INSERT time.
 CREATE TABLE IF NOT EXISTS pages (
   id            SERIAL PRIMARY KEY,
-  slug          TEXT NOT NULL UNIQUE,
+  source_id     TEXT NOT NULL DEFAULT 'default'
+                REFERENCES sources(id) ON DELETE CASCADE,
+  slug          TEXT NOT NULL,
   type          TEXT NOT NULL,
   title         TEXT NOT NULL,
   compiled_truth TEXT NOT NULL DEFAULT '',
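The `federated` flag described in the `sources` comment above can be read as a search-scope rule. The following is a minimal sketch of that rule, not code from the patch: `SourceRow` and `sourcesToSearch` are hypothetical names, and treating a missing `federated` key as "not federated" is an assumption (the comment's wording about the `'default'` source is ambiguous on that point).

```typescript
interface SourceRow {
  id: string;
  config: { federated?: boolean };
}

// Hypothetical scope resolver:
// - an explicit source (as with a --source flag) searches only that source;
// - otherwise only sources with federated=true take part in default search.
function sourcesToSearch(all: SourceRow[], explicitSource?: string): string[] {
  if (explicitSource !== undefined) {
    return all.filter(s => s.id === explicitSource).map(s => s.id);
  }
  return all.filter(s => s.config.federated === true).map(s => s.id);
}

const registered: SourceRow[] = [
  { id: 'default', config: { federated: true } },
  { id: 'wiki', config: { federated: true } },
  { id: 'yc-media', config: {} }, // assumption: missing key means not federated
];

const defaultScope = sourcesToSearch(registered);
const explicitScope = sourcesToSearch(registered, 'yc-media');
console.log(defaultScope, explicitScope);
```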
@@ -22,7 +65,8 @@ CREATE TABLE IF NOT EXISTS pages (
   frontmatter   JSONB NOT NULL DEFAULT '{}',
   content_hash  TEXT,
   created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
-  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
+  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
+  CONSTRAINT pages_source_slug_key UNIQUE (source_id, slug)
 );

 CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type);
@@ -30,6 +74,8 @@ CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter)
 CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops);
 -- v0.13.1 #170: avoids 14.6s seqscan on large brains when listing pages newest-first.
 CREATE INDEX IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC);
+-- v0.18.0: source-scoped scans (per /plan-eng-review Section 4).
+CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);

 -- ============================================================
 -- content_chunks: chunked content with embeddings
@@ -74,6 +120,11 @@ CREATE TABLE IF NOT EXISTS links (
   link_source    TEXT CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual')),
   origin_page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
   origin_field   TEXT,
+  -- v0.18.0 Step 4: 'qualified' when the link was written as
+  -- [[source:slug]] (target source pinned). 'unqualified' when written
+  -- as bare [[slug]] and resolved via local-first fallback at
+  -- extraction time. NULL for legacy/manual/frontmatter edges.
+  resolution_type TEXT CHECK (resolution_type IS NULL OR resolution_type IN ('qualified', 'unqualified')),
   created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
   -- NULLS NOT DISTINCT (PG15+) so two rows with link_source IS NULL or
   -- origin_page_id IS NULL collide as expected. Without this, every row with
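The `qualified` / `unqualified` distinction recorded in `links.resolution_type` can be sketched as a small classifier over the text inside `[[...]]`. This is an illustration only: `parseTarget` and the `QUALIFIED` pattern are hypothetical, the source-id shape `[a-z0-9-]{1,32}` is taken from the `sources` schema comment, and the `://` guard mirrors the URL skip seen in the extractor hunk.

```typescript
type ResolutionType = 'qualified' | 'unqualified';

interface ParsedTarget {
  sourceId: string | null; // pinned source for qualified links, else null
  slug: string;
  resolutionType: ResolutionType;
}

// Hypothetical pattern: "<source-id>:<slug>" with the id shape from the schema comment.
const QUALIFIED = /^([a-z0-9-]{1,32}):(.+)$/;

function parseTarget(raw: string): ParsedTarget {
  const target = raw.trim();
  const m = QUALIFIED.exec(target);
  // URLs such as "https://..." also contain a colon, so they are excluded here.
  if (m !== null && !target.includes('://')) {
    return {
      sourceId: m[1] ?? null,
      slug: (m[2] ?? '').trim(),
      resolutionType: 'qualified',
    };
  }
  return { sourceId: null, slug: target, resolutionType: 'unqualified' };
}

const pinnedLink = parseTarget('wiki:people/alice');
const bareLink = parseTarget('people/alice');
console.log(pinnedLink, bareLink);
```

Only the bare form would be a candidate for later re-resolution, since its target source was chosen by fallback rather than written by the author.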
@@ -148,6 +199,9 @@ CREATE INDEX IF NOT EXISTS idx_versions_page ON page_versions(page_id);
 -- ============================================================
 -- ingest_log
 -- ============================================================
+-- NOTE (v0.18.0 Step 1): ingest_log.source_id is NOT added yet — lands
+-- in v17 alongside the sync rewrite (Step 5), which starts writing
+-- source-scoped entries.
 CREATE TABLE IF NOT EXISTS ingest_log (
   id            SERIAL PRIMARY KEY,
   source_type   TEXT NOT NULL,
@@ -202,9 +256,18 @@ CREATE TABLE IF NOT EXISTS mcp_request_log (
 -- ============================================================
 -- files: binary attachments stored in Supabase Storage
 -- ============================================================
+-- v0.18.0 Step 7: files gains source_id + page_id alongside the
+-- legacy page_slug (kept for backward compat until a later release).
+-- The file_migration_ledger below drives the storage object rewrite.
+-- page_slug FK had ON UPDATE CASCADE — removed because slugs are no
+-- longer global (composite UNIQUE) so CASCADE on-update is ambiguous.
+-- ON DELETE SET NULL is preserved via both page_slug and page_id.
 CREATE TABLE IF NOT EXISTS files (
   id            SERIAL PRIMARY KEY,
-  page_slug     TEXT REFERENCES pages(slug) ON DELETE SET NULL ON UPDATE CASCADE,
+  source_id     TEXT NOT NULL DEFAULT 'default'
+                REFERENCES sources(id) ON DELETE CASCADE,
+  page_slug     TEXT,
+  page_id       INTEGER REFERENCES pages(id) ON DELETE SET NULL,
   filename      TEXT NOT NULL,
   storage_path  TEXT NOT NULL,
   mime_type     TEXT,
@@ -219,8 +282,30 @@ CREATE TABLE IF NOT EXISTS files (
|
||||
ALTER TABLE files DROP COLUMN IF EXISTS storage_url;
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_files_page ON files(page_slug);
|
||||
CREATE INDEX IF NOT EXISTS idx_files_page_id ON files(page_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_files_source_id ON files(source_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_files_hash ON files(content_hash);
|
||||
|
||||
-- ============================================================
|
||||
-- file_migration_ledger (v0.18.0 Step 7)
|
||||
-- Drives the storage-object rewrite performed by the v0_18_0
|
||||
-- orchestrator's phase B. Keyed on file_id so two sources can share
|
||||
-- an old path during migration without PK collision (Codex second-
|
||||
-- pass caught this).
|
||||
-- Status state machine: pending → copy_done → db_updated → complete
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS file_migration_ledger (
|
||||
file_id INTEGER PRIMARY KEY REFERENCES files(id) ON DELETE CASCADE,
|
||||
storage_path_old TEXT NOT NULL,
|
||||
storage_path_new TEXT NOT NULL,
|
||||
status TEXT NOT NULL DEFAULT 'pending',
|
||||
error TEXT,
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
CONSTRAINT chk_ledger_status CHECK (status IN ('pending','copy_done','db_updated','complete','failed'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_file_migration_ledger_status
|
||||
ON file_migration_ledger(status) WHERE status != 'complete';
|
||||
|
||||
-- ============================================================
|
||||
-- Trigger-based search_vector (spans pages + timeline_entries)
|
||||
-- ============================================================
@@ -469,6 +554,8 @@ BEGIN
    ALTER TABLE config ENABLE ROW LEVEL SECURITY;
    ALTER TABLE files ENABLE ROW LEVEL SECURITY;
    ALTER TABLE minion_jobs ENABLE ROW LEVEL SECURITY;
    ALTER TABLE sources ENABLE ROW LEVEL SECURITY;
    ALTER TABLE file_migration_ledger ENABLE ROW LEVEL SECURITY;
    RAISE NOTICE 'RLS enabled on all tables (role % has BYPASSRLS)', current_user;
  ELSE
    RAISE WARNING 'Skipping RLS: role % does not have BYPASSRLS privilege. Run as postgres role to enable.', current_user;
@@ -7,6 +7,14 @@
 * 3. By type: no page type exceeds 60% of results
 * 4. By page: max N chunks per page (default 2)
 * 5. Compiled truth guarantee: ensure at least 1 compiled_truth chunk per page
 *
 * v0.18.0: every page key is composite (source_id, slug). Pre-v0.17 this
 * was slug alone — under multi-source uniqueness that would collapse two
 * same-slug pages in different sources into one, destroying recall.
 * Codex review flagged this as a regression-critical path. The
 * `pageKey()` helper below is the one canonical way to derive the key;
 * every layer uses it so future "dedup just changed" drift is one file
 * to fix.
 */

import type { SearchResult } from '../types.ts';
@@ -15,6 +23,17 @@ const COSINE_DEDUP_THRESHOLD = 0.85;
const MAX_TYPE_RATIO = 0.6;
const MAX_PER_PAGE = 2;

/**
 * Composite page key: (source_id, slug). Pre-v0.17 rows lacked source_id
 * so we fall back to 'default' to preserve single-source brain behavior
 * exactly. Post-v0.17 callers always populate source_id (SQL JOINs in
 * pglite/postgres engine search paths).
 */
function pageKey(r: SearchResult): string {
  const source = r.source_id ?? 'default';
  return `${source}:${r.slug}`;
}

export function dedupResults(
  results: SearchResult[],
  opts?: {
@@ -58,9 +77,10 @@ function dedupBySource(results: SearchResult[]): SearchResult[] {
  const byPage = new Map<string, SearchResult[]>();

  for (const r of results) {
    const existing = byPage.get(r.slug) || [];
    const k = pageKey(r);
    const existing = byPage.get(k) || [];
    existing.push(r);
    byPage.set(r.slug, existing);
    byPage.set(k, existing);
  }

  const kept: SearchResult[] = [];
@@ -130,10 +150,11 @@ function capPerPage(results: SearchResult[], maxPerPage: number): SearchResult[]
  const kept: SearchResult[] = [];

  for (const r of results) {
    const count = pageCounts.get(r.slug) || 0;
    const k = pageKey(r);
    const count = pageCounts.get(k) || 0;
    if (count < maxPerPage) {
      kept.push(r);
      pageCounts.set(r.slug, count + 1);
      pageCounts.set(k, count + 1);
    }
  }

@@ -145,30 +166,35 @@ function capPerPage(results: SearchResult[], maxPerPage: number): SearchResult[]
 * swap in the best compiled_truth chunk from the pre-dedup set (if one exists).
 */
function guaranteeCompiledTruth(results: SearchResult[], preDedup: SearchResult[]): SearchResult[] {
  // Group results by page
  // Group results by composite page key (source_id, slug).
  const byPage = new Map<string, SearchResult[]>();
  for (const r of results) {
    const existing = byPage.get(r.slug) || [];
    const k = pageKey(r);
    const existing = byPage.get(k) || [];
    existing.push(r);
    byPage.set(r.slug, existing);
    byPage.set(k, existing);
  }

  const output = [...results];

  for (const [slug, pageChunks] of byPage) {
  for (const [key, pageChunks] of byPage) {
    const hasCompiledTruth = pageChunks.some(c => c.chunk_source === 'compiled_truth');
    if (hasCompiledTruth) continue;

    // Find the best compiled_truth chunk from pre-dedup input for this page
    // Find the best compiled_truth chunk from pre-dedup input for this
    // (source_id, slug) combination. Pre-v0.17 single-source match was
    // "r.slug === slug"; now it's the composite key so two same-slug
    // pages in different sources don't mistakenly swap chunks across.
    const candidate = preDedup
      .filter(r => r.slug === slug && r.chunk_source === 'compiled_truth')
      .filter(r => pageKey(r) === key && r.chunk_source === 'compiled_truth')
      .sort((a, b) => b.score - a.score)[0];

    if (!candidate) continue;

    // Swap: replace the lowest-scored chunk from this page
    // Swap: replace the lowest-scored chunk from this page (same
    // composite key match).
    const lowestIdx = output.reduce((minIdx, r, idx) => {
      if (r.slug !== slug) return minIdx;
      if (pageKey(r) !== key) return minIdx;
      if (minIdx === -1) return idx;
      return r.score < output[minIdx].score ? idx : minIdx;
    }, -1);
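The effect of switching the per-page cap from `slug` to the composite key can be seen in a small standalone sketch. The `Hit` interface is a hypothetical stand-in for `SearchResult` with only the fields needed here, and this `capPerPage` is a simplified re-implementation for illustration, not the function from the diff.

```typescript
// Hypothetical, trimmed-down stand-in for SearchResult.
interface Hit {
  slug: string;
  score: number;
  source_id?: string;
}

// Same derivation as pageKey() in the diff: a missing source_id falls back to 'default'.
function pageKey(r: Hit): string {
  return `${r.source_id ?? 'default'}:${r.slug}`;
}

// Keep at most maxPerPage hits per composite key, preserving input order.
function capPerPage(hits: Hit[], maxPerPage: number): Hit[] {
  const counts = new Map<string, number>();
  const kept: Hit[] = [];
  for (const h of hits) {
    const k = pageKey(h);
    const count = counts.get(k) ?? 0;
    if (count < maxPerPage) {
      kept.push(h);
      counts.set(k, count + 1);
    }
  }
  return kept;
}

const hits: Hit[] = [
  { slug: 'topics/ai', source_id: 'wiki', score: 0.9 },
  { slug: 'topics/ai', source_id: 'gstack', score: 0.85 },
  { slug: 'topics/ai', source_id: 'wiki', score: 0.8 }, // second wiki hit, dropped at cap 1
  { slug: 'topics/ai', score: 0.7 },                    // no source_id: keyed as default:topics/ai
];
const capped = capPerPage(hits, 1);
const cappedKeys = capped.map(pageKey);
```

With a cap of 1, keying on `slug` alone would keep a single hit for `topics/ai`; keying on `(source_id, slug)` keeps one hit per source, which is the recall behavior the commit is protecting.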
139
src/core/source-resolver.ts
Normal file
@@ -0,0 +1,139 @@
/**
 * Source resolution for CLI commands (v0.18.0).
 *
 * Resolution priority (highest first):
 *   1. Explicit --source <id> flag (caller passes this as `explicit`)
 *   2. GBRAIN_SOURCE env var
 *   3. .gbrain-source dotfile in CWD or any ancestor directory
 *   4. Registered source whose local_path contains CWD
 *   5. Brain-level default via `gbrain sources default <id>`
 *   6. Literal 'default' (backward compat for pre-v0.17 brains)
 *
 * This helper is shared by the sources CLI, future sync/extract/query
 * commands (Steps 4/5), and the operation layer (Step 2+).
 */

import { readFileSync, existsSync } from 'fs';
import { join, dirname, resolve } from 'path';
import type { BrainEngine } from './engine.ts';

const DOTFILE = '.gbrain-source';
// Must start + end with alnum, interior dashes allowed. Max 32 chars.
// Single-char alnum is also valid. Kebab-case enforced so citation keys
// like `[wiki:slug]` can't have ugly edges like `[wiki-:slug]`.
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;
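A quick way to see what `SOURCE_ID_RE` accepts is to run it over a few candidate ids. The pattern below is copied from the diff; the sample ids and the `samples` / `verdicts` names are made up for illustration.

```typescript
// Same pattern as SOURCE_ID_RE in the diff.
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

// [candidate id, expected verdict]
const samples: Array<[string, boolean]> = [
  ['wiki', true],
  ['yc-media', true],
  ['a', true],             // single alnum is valid
  ['a--b', true],          // consecutive interior dashes are not rejected
  ['a'.repeat(32), true],  // 1 + 30 + 1 = 32 chars, the maximum
  ['a'.repeat(33), false], // one char too long
  ['wiki-', false],        // trailing dash
  ['-wiki', false],        // leading dash
  ['Wiki', false],         // uppercase
  ['my_source', false],    // underscore
  ['', false],             // empty
];

const verdicts = samples.map(([id]) => SOURCE_ID_RE.test(id));
```

Two details are worth keeping in mind when reading the rest of the file: the pattern is stricter than the `[a-z0-9-]{1,32}` shorthand quoted in the error messages (which would also admit leading or trailing dashes), and it is looser than strict kebab-case because `a--b` passes.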

function readDotfileWalk(startDir: string): string | null {
  let dir = resolve(startDir);
  // Guard against infinite loops on malformed paths.
  for (let i = 0; i < 50; i++) {
    const candidate = join(dir, DOTFILE);
    if (existsSync(candidate)) {
      try {
        const content = readFileSync(candidate, 'utf8').trim().split('\n')[0].trim();
        if (SOURCE_ID_RE.test(content)) return content;
      } catch {
        // Unreadable dotfile — skip and keep walking.
      }
    }
    const parent = dirname(dir);
    if (parent === dir) break; // reached filesystem root
    dir = parent;
  }
  return null;
}

/**
 * Resolve the source id for a CLI command.
 *
 * @param engine Connected brain engine (for sources table lookups).
 * @param explicit The --source <id> flag value, if the caller parsed one.
 * @param cwd The working directory to walk for .gbrain-source. Defaults
 *   to process.cwd(). Exposed for testability.
 * @returns The resolved source id. Falls back to 'default' if no other
 *   signal is present. Never returns null — every command must
 *   target exactly one default source.
 * @throws If the resolved id doesn't correspond to a registered source
 *   (prevents silently writing to a nonexistent source and bloating
 *   pages with a dead FK).
 */
export async function resolveSourceId(
  engine: BrainEngine,
  explicit: string | null | undefined,
  cwd: string = process.cwd(),
): Promise<string> {
  // 1. Explicit flag wins.
  if (explicit) {
    if (!SOURCE_ID_RE.test(explicit)) {
      throw new Error(`Invalid --source value "${explicit}". Must match [a-z0-9-]{1,32}.`);
    }
    await assertSourceExists(engine, explicit);
    return explicit;
  }

  // 2. Env var.
  const env = process.env.GBRAIN_SOURCE;
  if (env && env.length > 0) {
    if (!SOURCE_ID_RE.test(env)) {
      throw new Error(`Invalid GBRAIN_SOURCE value "${env}". Must match [a-z0-9-]{1,32}.`);
    }
    await assertSourceExists(engine, env);
    return env;
  }

  // 3. .gbrain-source dotfile walk-up.
  const dotfile = readDotfileWalk(cwd);
  if (dotfile) {
    await assertSourceExists(engine, dotfile);
    return dotfile;
  }

  // 4. Registered source whose local_path contains CWD.
  // Uses longest-prefix match so nested-path configurations (e.g.
  // gstack at ~/gstack + plans at ~/gstack/plans) pick the deepest.
  const registered = await engine.executeRaw<{ id: string; local_path: string }>(
    `SELECT id, local_path FROM sources WHERE local_path IS NOT NULL`,
  );
  const cwdResolved = resolve(cwd);
  let best: { id: string; pathLen: number } | null = null;
  for (const r of registered) {
    const p = resolve(r.local_path);
    if (cwdResolved === p || cwdResolved.startsWith(p + '/')) {
      if (!best || p.length > best.pathLen) {
        best = { id: r.id, pathLen: p.length };
      }
    }
  }
  if (best) return best.id;

  // 5. Brain-level default.
  const globalDefault = await engine.getConfig('sources.default');
  if (globalDefault && SOURCE_ID_RE.test(globalDefault)) {
    await assertSourceExists(engine, globalDefault);
    return globalDefault;
  }

  // 6. Fallback: the seeded 'default' source. Always exists post-migration
  // v16 so this is a safe terminal.
  return 'default';
}
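Step 4 of `resolveSourceId` is a longest-prefix match over registered checkout paths. The sketch below isolates that logic as a pure function so it can be reasoned about without a database. `RegisteredSource`, `matchByPath`, and the example paths are hypothetical, and the sketch assumes paths are already absolute, normalized, and use POSIX `/` separators, as the `p + '/'` check in the diff does.

```typescript
// Hypothetical row shape; localPath is assumed absolute and normalized.
interface RegisteredSource {
  id: string;
  localPath: string;
}

// Return the id of the source whose localPath is the deepest ancestor of cwd
// (or equal to it), or null when no registered path contains cwd.
function matchByPath(cwd: string, sources: RegisteredSource[]): string | null {
  let best: { id: string; pathLen: number } | null = null;
  for (const s of sources) {
    // The trailing '/' keeps /home/u/gstack from matching /home/u/gstack-old.
    const contains = cwd === s.localPath || cwd.startsWith(s.localPath + '/');
    if (contains && (best === null || s.localPath.length > best.pathLen)) {
      best = { id: s.id, pathLen: s.localPath.length };
    }
  }
  return best === null ? null : best.id;
}

const registered: RegisteredSource[] = [
  { id: 'gstack', localPath: '/home/u/gstack' },
  { id: 'plans', localPath: '/home/u/gstack/plans' },
];

const deep = matchByPath('/home/u/gstack/plans/q3', registered);
const shallow = matchByPath('/home/u/gstack/src', registered);
const sibling = matchByPath('/home/u/gstack-old', registered);
```

Comparing path lengths is enough to pick the deepest match because every candidate that passes the `contains` check is an ancestor of the same `cwd`, so a longer candidate is necessarily nested inside a shorter one.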

async function assertSourceExists(engine: BrainEngine, id: string): Promise<void> {
  const rows = await engine.executeRaw<{ id: string }>(
    `SELECT id FROM sources WHERE id = $1`,
    [id],
  );
  if (rows.length === 0) {
    throw new Error(
      `Source "${id}" not found. Available sources: ` +
      `run \`gbrain sources list\` to see registered sources, ` +
      `or \`gbrain sources add ${id}\` to create it.`,
    );
  }
}

/** Exposed for tests. */
export const __testing = {
  readDotfileWalk,
  SOURCE_ID_RE,
};
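The six-step priority listed in the header comment of `src/core/source-resolver.ts` boils down to "first non-empty signal wins, else `'default'`". A minimal sketch of just that ordering follows. `Signals` and `pickSource` are hypothetical names; the sketch deliberately leaves out the id validation and the `assertSourceExists` lookup that the real function performs, and it assumes each signal has already been gathered by the caller.

```typescript
// Hypothetical bundle of already-collected signals, highest priority first.
interface Signals {
  explicit?: string | null;     // --source <id> flag
  env?: string | null;          // GBRAIN_SOURCE
  dotfile?: string | null;      // result of the .gbrain-source walk-up
  pathMatch?: string | null;    // registered source whose local_path contains CWD
  brainDefault?: string | null; // brain-level default (sources.default config)
}

// Return the winning id together with the rule that selected it.
function pickSource(s: Signals): { id: string; via: string } {
  const ordered: Array<[string, string | null | undefined]> = [
    ['flag', s.explicit],
    ['env', s.env],
    ['dotfile', s.dotfile],
    ['path', s.pathMatch],
    ['brain-default', s.brainDefault],
  ];
  for (const [via, id] of ordered) {
    if (id) return { id, via }; // empty string, null, and undefined all fall through
  }
  return { id: 'default', via: 'fallback' };
}

const fromFlag = pickSource({ explicit: 'wiki', env: 'gstack', dotfile: 'plans' });
const fromDotfile = pickSource({ dotfile: 'plans', brainDefault: 'wiki' });
const fromFallback = pickSource({});
```

Returning the `via` label alongside the id is an addition for illustration; it can make "why did this command target that source?" easier to debug, but nothing in the diff indicates the real function reports it.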
@@ -66,6 +66,12 @@ export interface SearchResult {
  chunk_index: number;
  score: number;
  stale: boolean;
  /**
   * v0.18.0: the sources.id the page belongs to. Dedup composite-keys
   * on (source_id, slug) — see src/core/search/dedup.ts. Defaults to
   * 'default' for pre-v0.17 rows that lacked the column.
   */
  source_id?: string;
}

export interface SearchOpts {
@@ -125,7 +125,7 @@ export function rowToChunk(row: Record<string, unknown>, includeEmbedding = fals
}

export function rowToSearchResult(row: Record<string, unknown>): SearchResult {
  return {
  const result: SearchResult = {
    slug: row.slug as string,
    page_id: row.page_id as number,
    title: row.title as string,
@@ -137,4 +137,12 @@ export function rowToSearchResult(row: Record<string, unknown>): SearchResult {
    score: Number(row.score),
    stale: Boolean(row.stale),
  };
  // v0.17.0: source_id comes from the p.source_id column in search
  // SELECTs. Keep the field optional so pre-v0.17 engines that didn't
  // join sources don't crash on the absent column — rowToSearchResult
  // is shared by both paths.
  if (typeof row.source_id === 'string') {
    result.source_id = row.source_id;
  }
  return result;
}
@@ -5,12 +5,55 @@ CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- gen_random_uuid() is core in Postgres 13+; enable pgcrypto as fallback for older versions
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- ============================================================
-- sources: multi-repo / multi-brain tenancy (v0.18.0)
-- ============================================================
-- A source is a logical brain-within-the-DB: wiki, gstack, yc-media, etc.
-- Every page/file/ingest_log row carries source_id.
--
--   id: immutable citation key. [a-z0-9-]{1,32} enforced at app layer.
--     Used in [source:slug] citations, --source flag, wikilink syntax.
--   name: mutable display label. Rename via `gbrain sources rename`.
--   local_path: optional git checkout root for filesystem-backed sources.
--   config: forward-compat JSONB. Currently used for federation + ACL slot.
--     { "federated": bool, "access_policy": {...} }
--     - federated=true (or missing-but-explicit on 'default'):
--       participates in cross-source default search.
--     - federated=false (default for new sources):
--       only searched when explicitly named via --source.
--     - access_policy: forward-compat slot, no enforcement in v0.17.
--       Write-side lockdown: mutated only when ctx.remote=false.
CREATE TABLE IF NOT EXISTS sources (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,
  local_path TEXT,
  last_commit TEXT,
  last_sync_at TIMESTAMPTZ,
  config JSONB NOT NULL DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Seed the default source. 'default' is federated=true for backward compat
-- (pre-v0.17 brains behave exactly as before — every page appears in search).
-- Pre-existing sync.repo_path / sync.last_commit are copied in by the v16
-- migration, not here; fresh installs have no local_path until `sources add`
-- or the first `sync`.
INSERT INTO sources (id, name, config)
  VALUES ('default', 'default', '{"federated": true}'::jsonb)
  ON CONFLICT (id) DO NOTHING;
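The `federated` flag described in the `sources` comment block decides which sources take part in a search that names no source. The sketch below models that rule in TypeScript. `SourceRow` and `searchScope` are hypothetical, the engine's actual filtering is not shown in this part of the diff, and reading a missing `federated` key as "not federated" is an assumption based on the "default for new sources" wording.

```typescript
// Hypothetical projection of a sources row: only id and the federation flag.
interface SourceRow {
  id: string;
  config: { federated?: boolean };
}

// Which source ids a search should cover.
// - With an explicit --source value: only that source, federated or not.
// - Without one: every source whose config has federated === true.
// Assumption: a missing "federated" key is treated as false.
function searchScope(sources: SourceRow[], explicit?: string): string[] {
  if (explicit) {
    return sources.filter(s => s.id === explicit).map(s => s.id);
  }
  return sources.filter(s => s.config.federated === true).map(s => s.id);
}

const rows: SourceRow[] = [
  { id: 'default', config: { federated: true } }, // seeded row from the migration
  { id: 'wiki', config: { federated: true } },
  { id: 'gstack', config: {} },                   // new source, not federated
];

const defaultScope = searchScope(rows);
const pinnedScope = searchScope(rows, 'gstack');
```

Under these assumptions an upgraded single-source brain searches exactly as before, because its only row is the seeded `default` with `federated: true`.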

-- ============================================================
-- pages: the core content table
-- ============================================================
-- v0.18.0 (Step 2): pages.source_id scopes each row to a sources(id) row.
-- Slugs are unique per source, NOT globally. The default source is
-- seeded in the sources block above so the DEFAULT 'default' FK is
-- always valid at INSERT time.
CREATE TABLE IF NOT EXISTS pages (
  id SERIAL PRIMARY KEY,
  slug TEXT NOT NULL UNIQUE,
  source_id TEXT NOT NULL DEFAULT 'default'
    REFERENCES sources(id) ON DELETE CASCADE,
  slug TEXT NOT NULL,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  compiled_truth TEXT NOT NULL DEFAULT '',
@@ -18,7 +61,8 @@ CREATE TABLE IF NOT EXISTS pages (
  frontmatter JSONB NOT NULL DEFAULT '{}',
  content_hash TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT pages_source_slug_key UNIQUE (source_id, slug)
);

CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type);
@@ -26,6 +70,8 @@ CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter)
CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops);
-- v0.13.1 #170: avoids 14.6s seqscan on large brains when listing pages newest-first.
CREATE INDEX IF NOT EXISTS idx_pages_updated_at_desc ON pages (updated_at DESC);
-- v0.18.0: source-scoped scans (per /plan-eng-review Section 4).
CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);

-- ============================================================
-- content_chunks: chunked content with embeddings
@@ -70,6 +116,11 @@ CREATE TABLE IF NOT EXISTS links (
  link_source TEXT CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual')),
  origin_page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
  origin_field TEXT,
  -- v0.18.0 Step 4: 'qualified' when the link was written as
  -- [[source:slug]] (target source pinned). 'unqualified' when written
  -- as bare [[slug]] and resolved via local-first fallback at
  -- extraction time. NULL for legacy/manual/frontmatter edges.
  resolution_type TEXT CHECK (resolution_type IS NULL OR resolution_type IN ('qualified', 'unqualified')),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  -- NULLS NOT DISTINCT (PG15+) so two rows with link_source IS NULL or
  -- origin_page_id IS NULL collide as expected. Without this, every row with
@@ -144,6 +195,9 @@ CREATE INDEX IF NOT EXISTS idx_versions_page ON page_versions(page_id);
-- ============================================================
-- ingest_log
-- ============================================================
-- NOTE (v0.18.0 Step 1): ingest_log.source_id is NOT added yet — lands
-- in v17 alongside the sync rewrite (Step 5), which starts writing
-- source-scoped entries.
CREATE TABLE IF NOT EXISTS ingest_log (
  id SERIAL PRIMARY KEY,
  source_type TEXT NOT NULL,
@@ -198,9 +252,18 @@ CREATE TABLE IF NOT EXISTS mcp_request_log (
-- ============================================================
-- files: binary attachments stored in Supabase Storage
-- ============================================================
-- v0.18.0 Step 7: files gains source_id + page_id alongside the
-- legacy page_slug (kept for backward compat until a later release).
-- The file_migration_ledger below drives the storage object rewrite.
-- page_slug FK had ON UPDATE CASCADE — removed because slugs are no
-- longer global (composite UNIQUE) so CASCADE on-update is ambiguous.
-- ON DELETE SET NULL is preserved via both page_slug and page_id.
CREATE TABLE IF NOT EXISTS files (
  id SERIAL PRIMARY KEY,
  page_slug TEXT REFERENCES pages(slug) ON DELETE SET NULL ON UPDATE CASCADE,
  source_id TEXT NOT NULL DEFAULT 'default'
    REFERENCES sources(id) ON DELETE CASCADE,
  page_slug TEXT,
  page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
  filename TEXT NOT NULL,
  storage_path TEXT NOT NULL,
  mime_type TEXT,
@@ -215,8 +278,30 @@ CREATE TABLE IF NOT EXISTS files (
ALTER TABLE files DROP COLUMN IF EXISTS storage_url;

CREATE INDEX IF NOT EXISTS idx_files_page ON files(page_slug);
CREATE INDEX IF NOT EXISTS idx_files_page_id ON files(page_id);
CREATE INDEX IF NOT EXISTS idx_files_source_id ON files(source_id);
CREATE INDEX IF NOT EXISTS idx_files_hash ON files(content_hash);

-- ============================================================
-- file_migration_ledger (v0.18.0 Step 7)
-- Drives the storage-object rewrite performed by the v0_18_0
-- orchestrator's phase B. Keyed on file_id so two sources can share
-- an old path during migration without PK collision (Codex second-
-- pass caught this).
-- Status state machine: pending → copy_done → db_updated → complete
-- ============================================================
CREATE TABLE IF NOT EXISTS file_migration_ledger (
  file_id INTEGER PRIMARY KEY REFERENCES files(id) ON DELETE CASCADE,
  storage_path_old TEXT NOT NULL,
  storage_path_new TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  error TEXT,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT chk_ledger_status CHECK (status IN ('pending','copy_done','db_updated','complete','failed'))
);
CREATE INDEX IF NOT EXISTS idx_file_migration_ledger_status
  ON file_migration_ledger(status) WHERE status != 'complete';

-- ============================================================
-- Trigger-based search_vector (spans pages + timeline_entries)
-- ============================================================
@@ -465,6 +550,8 @@ BEGIN
    ALTER TABLE config ENABLE ROW LEVEL SECURITY;
    ALTER TABLE files ENABLE ROW LEVEL SECURITY;
    ALTER TABLE minion_jobs ENABLE ROW LEVEL SECURITY;
    ALTER TABLE sources ENABLE ROW LEVEL SECURITY;
    ALTER TABLE file_migration_ledger ENABLE ROW LEVEL SECURITY;
    RAISE NOTICE 'RLS enabled on all tables (role % has BYPASSRLS)', current_user;
  ELSE
    RAISE WARNING 'Skipping RLS: role % does not have BYPASSRLS privilege. Run as postgres role to enable.', current_user;
@@ -105,8 +105,9 @@ describe('buildPlan — diff against completed + installed VERSION', () => {
    // Future migrations (registered but newer than installed VERSION) land in
    // skippedFuture until the binary catches up. v0.13.0 = frontmatter graph,
    // v0.13.1 = Knowledge Runtime grandfather, v0.14.0 = shell jobs +
    // autopilot cooperative, v0.16.0 = subagent runtime (this branch).
    expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.0', '0.12.2', '0.13.0', '0.13.1', '0.14.0', '0.16.0']);
    // autopilot cooperative, v0.16.0 = subagent runtime, v0.18.0 = multi-
    // source brains (this branch).
    expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.0', '0.12.2', '0.13.0', '0.13.1', '0.14.0', '0.16.0', '0.18.0']);
  });

  test('already applied → v0.11.0 lands in `applied` bucket, not pending', () => {
@@ -142,11 +143,11 @@ describe('buildPlan — diff against completed + installed VERSION', () => {
    const idx = indexCompleted([]);
    const plan = buildPlan(idx, '0.12.0');
    expect(plan.pending.map(m => m.version)).toContain('0.11.0');
    // v0.12.2, v0.13.0, v0.13.1, v0.14.0, and v0.16.0 were added later;
    // v0.12.2, v0.13.0, v0.13.1, v0.14.0, v0.16.0, v0.18.0 were added later;
    // installed=0.12.0 means they belong in skippedFuture, not pending. v0.11.0
    // and v0.12.0 stay pending despite being ≤ installed — that is the H9
    // invariant.
    expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.2', '0.13.0', '0.13.1', '0.14.0', '0.16.0']);
    expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.2', '0.13.0', '0.13.1', '0.14.0', '0.16.0', '0.18.0']);
  });

  test('--migration filter narrows to one version', () => {
@@ -154,3 +154,75 @@ describe('edge cases', () => {
    expect(deduped.filter(r => r.slug === 'a').length).toBeLessThanOrEqual(3);
  });
});

// ─────────────────────────────────────────────────────────────────
// v0.18.0 Step 3 — source-aware dedup (REGRESSION-CRITICAL per Codex)
// ─────────────────────────────────────────────────────────────────
// Pre-v0.17 dedup collapsed on slug alone. Under multi-source
// uniqueness, two same-slug pages in different sources ARE different
// pages — collapsing them destroys cross-source recall. Codex flagged
// this as a regression-critical path in the outside-voice review.
describe('dedup — source-aware composite key (v0.18.0)', () => {
  test('same slug across two sources does NOT collapse via dedupBySource layer', () => {
    // Two pages, same slug, different sources. Both should survive
    // Layer 1 (top-3-per-page) because they are DIFFERENT pages.
    const results = [
      makeResult({ slug: 'topics/ai', source_id: 'wiki', score: 0.9, chunk_text: 'wiki take on ai' }),
      makeResult({ slug: 'topics/ai', source_id: 'gstack', score: 0.85, chunk_text: 'gstack plans for ai' }),
    ];
    const deduped = dedupResults(results);
    // Both pages represented — one result each.
    const wikiHits = deduped.filter(r => r.source_id === 'wiki' && r.slug === 'topics/ai');
    const gstackHits = deduped.filter(r => r.source_id === 'gstack' && r.slug === 'topics/ai');
    expect(wikiHits.length).toBe(1);
    expect(gstackHits.length).toBe(1);
  });

  test('same slug + same source DOES collapse to maxPerPage', () => {
    // Control: same-source-same-slug behavior unchanged from pre-v0.17.
    const results = [
      makeResult({ slug: 'topics/ai', source_id: 'wiki', chunk_id: 1, score: 0.9, chunk_text: 'chunk one distinct content here' }),
      makeResult({ slug: 'topics/ai', source_id: 'wiki', chunk_id: 2, score: 0.8, chunk_text: 'chunk two also distinct words' }),
      makeResult({ slug: 'topics/ai', source_id: 'wiki', chunk_id: 3, score: 0.7, chunk_text: 'chunk three different terms again' }),
    ];
    const deduped = dedupResults(results);
    // Default maxPerPage=2 → only 2 of the 3 wiki:topics/ai chunks survive.
    const wikiHits = deduped.filter(r => r.source_id === 'wiki' && r.slug === 'topics/ai');
    expect(wikiHits.length).toBeLessThanOrEqual(2);
  });

  test('missing source_id defaults to "default" for backward compat', () => {
    // Pre-v0.17 brains (single source, rows with no source_id column)
    // still dedup correctly: the fallback key groups them all under
    // the 'default' source bucket.
    const results = [
      makeResult({ slug: 'topics/ai', chunk_id: 1, score: 0.9, chunk_text: 'chunk one distinct content words' }),
      makeResult({ slug: 'topics/ai', chunk_id: 2, score: 0.8, chunk_text: 'chunk two totally different phrasing' }),
      makeResult({ slug: 'topics/ai', chunk_id: 3, score: 0.7, chunk_text: 'chunk three new unique text here' }),
    ];
    const deduped = dedupResults(results);
    // All three should group as one page (no source_id → default), so
    // maxPerPage=2 cap applies.
    expect(deduped.length).toBeLessThanOrEqual(2);
  });

  test('compiled_truth guarantee scopes to (source_id, slug), not slug alone', () => {
    // Two pages, same slug, different sources. wiki's top-scoring chunk
    // is timeline; gstack has only compiled_truth. The guarantee must
    // swap in wiki's compiled_truth for wiki (without touching gstack)
    // and must NOT accidentally pull gstack's compiled_truth into wiki.
    const results = [
      makeResult({ slug: 'topics/ai', source_id: 'wiki', score: 0.9, chunk_source: 'timeline', chunk_id: 1, chunk_text: 'wiki timeline chunk content here' }),
      makeResult({ slug: 'topics/ai', source_id: 'wiki', score: 0.5, chunk_source: 'compiled_truth', chunk_id: 2, chunk_text: 'wiki compiled truth content text' }),
      makeResult({ slug: 'topics/ai', source_id: 'gstack', score: 0.7, chunk_source: 'compiled_truth', chunk_id: 3, chunk_text: 'gstack compiled truth something else' }),
    ];
    const deduped = dedupResults(results);
    // Wiki ends up with a compiled_truth (swapped from its own source,
    // not gstack's).
    const wikiCompiledTruths = deduped.filter(
      r => r.source_id === 'wiki' && r.slug === 'topics/ai' && r.chunk_source === 'compiled_truth',
    );
    expect(wikiCompiledTruths.length).toBe(1);
    expect(wikiCompiledTruths[0].chunk_id).toBe(2); // wiki's own compiled_truth, NOT gstack's (id=3)
  });
});
@@ -633,7 +633,7 @@ describeE2E('E2E: file_list LIMIT enforcement', () => {
    await sql`
      INSERT INTO pages (slug, title, type, compiled_truth, frontmatter)
      VALUES (${testSlug}, ${'Test Limit Page'}, ${'note'}, ${'body'}, ${'{}'}::jsonb)
      ON CONFLICT (slug) DO NOTHING
      ON CONFLICT (source_id, slug) DO NOTHING
    `;

    // Insert 150 file rows for the same slug
608
test/e2e/multi-source.test.ts
Normal file
@@ -0,0 +1,608 @@
|
||||
/**
|
||||
* E2E: v0.18.0 multi-source migrations against REAL Postgres.
|
||||
*
|
||||
* PGLite doesn't have a files table (see pglite-schema.ts header), so the
|
||||
* v23 migration's files.source_id + files.page_id rewrite + ledger seed
|
||||
* is NEVER executed by the PGLite integration test. This file closes
|
||||
* that gap by exercising the full v20-v23 chain against a real Postgres
|
||||
* DB with pre-existing data.
|
||||
*
|
||||
* Also covers the gaps in the PR's pre-shipping test matrix that the
|
||||
* author self-audited:
|
||||
* - files.page_slug → page_id backfill against real rows
|
||||
* - file_migration_ledger seeding
|
||||
* - cascade delete via sources.remove (pages + chunks + timeline +
|
||||
* files + links all gone)
|
||||
* - sync --source <id> routing reads + writes per-source sync anchors
|
||||
* instead of the global config keys
|
||||
*
|
||||
* Gated by DATABASE_URL — skips gracefully when unset, per the CLAUDE.md
|
||||
* E2E lifecycle pattern.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
|
||||
import { PostgresEngine } from '../../src/core/postgres-engine.ts';
|
||||
import { runSources } from '../../src/commands/sources.ts';
|
||||
import { performSync } from '../../src/commands/sync.ts';
|
||||
import { runStorageBackfill } from '../../src/commands/migrations/v0_18_0-storage-backfill.ts';
|
||||
import type { StorageBackend } from '../../src/core/storage.ts';
|
||||
import { hasDatabase, setupDB, teardownDB, getConn, getEngine } from './helpers.ts';
|
||||
|
||||
const SKIP = !hasDatabase();
|
||||
const describeE2E = SKIP ? describe.skip : describe;
|
||||
|
||||
describeE2E('v0.18.0 multi-source — Postgres schema shape (fresh install)', () => {
|
||||
beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => {
    await teardownDB();
  });

  test("sources('default') exists after initSchema + migration chain", async () => {
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT id, name, config FROM sources WHERE id = 'default'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].name).toBe('default');
    const config = typeof rows[0].config === 'string' ? JSON.parse(rows[0].config) : rows[0].config;
    expect(config.federated).toBe(true);
  });

  test('pages.source_id NOT NULL with DEFAULT default (v21)', async () => {
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT column_name, column_default, is_nullable
         FROM information_schema.columns
        WHERE table_name = 'pages' AND column_name = 'source_id'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].is_nullable).toBe('NO');
    expect(String(rows[0].column_default)).toContain('default');
  });

  test('composite UNIQUE pages(source_id, slug) replaces global UNIQUE(slug)', async () => {
    const conn = getConn();
    const composite = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'pages_source_slug_key'`,
    );
    expect(composite.length).toBe(1);
    const oldGlobal = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'pages_slug_key'`,
    );
    expect(oldGlobal.length).toBe(0);
  });

  test('links.resolution_type column exists with CHECK (v22)', async () => {
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT column_name FROM information_schema.columns
        WHERE table_name = 'links' AND column_name = 'resolution_type'`,
    );
    expect(rows.length).toBe(1);
    const check = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'links_resolution_type_check'`,
    );
    expect(check.length).toBe(1);
  });

  test('files.source_id + files.page_id columns exist (v23, Postgres-only)', async () => {
    const conn = getConn();
    const cols = await conn.unsafe(
      `SELECT column_name FROM information_schema.columns
        WHERE table_name = 'files' AND column_name IN ('source_id', 'page_id')`,
    );
    // postgres.js returns RowList with an iterable-row shape; cast via
    // unknown before narrowing to plain objects (TS2352 otherwise).
    const names = new Set(
      (cols as unknown as Array<{ column_name: string }>).map(r => r.column_name),
    );
    expect(names.has('source_id')).toBe(true);
    expect(names.has('page_id')).toBe(true);
  });

  test('file_migration_ledger table exists with status CHECK (v23)', async () => {
    const conn = getConn();
    const tables = await conn.unsafe(
      `SELECT table_name FROM information_schema.tables
        WHERE table_name = 'file_migration_ledger'`,
    );
    expect(tables.length).toBe(1);
    const check = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'chk_ledger_status'`,
    );
    expect(check.length).toBe(1);
  });
});

describeE2E('v0.18.0 multi-source — composite UNIQUE semantics on real Postgres', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => {
    await teardownDB();
  });

  test('same slug in two sources coexists (REGRESSION GUARD — Codex critical)', async () => {
    const conn = getConn();
    // Create a second source.
    const engine = getEngine();
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['add', 'wiki', '--federated']);

    // Insert the same slug under 'default' (via putPage) and 'wiki' (raw INSERT).
    await engine.putPage('topics/ai', {
      type: 'concept', title: 'AI from default', compiled_truth: 'default source take',
    });
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('wiki', 'topics/ai', 'concept', 'AI from wiki', 'wiki source take', '', '{}'::jsonb, 'wikihash')`,
    );

    const rows = await conn.unsafe(
      `SELECT source_id, slug, title FROM pages WHERE slug = 'topics/ai' ORDER BY source_id`,
    );
    expect(rows.length).toBe(2);
    expect(rows.map((r: any) => r.source_id).sort()).toEqual(['default', 'wiki']);
  });

  test('duplicate (source_id, slug) hits composite UNIQUE', async () => {
    const conn = getConn();
    let err: Error | null = null;
    try {
      await conn.unsafe(
        `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
         VALUES ('wiki', 'topics/ai', 'concept', 'dup', '', '', '{}'::jsonb, 'dup')`,
      );
    } catch (e) {
      err = e as Error;
    }
    expect(err).not.toBeNull();
    expect(err!.message.toLowerCase()).toMatch(/unique|duplicate/);
  });

  test('putPage (engine API) targets default source by schema DEFAULT', async () => {
    const engine = getEngine();
    await engine.putPage('topics/from-putpage', {
      type: 'note', title: 'Via putPage', compiled_truth: 'body',
    });
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT source_id FROM pages WHERE slug = 'topics/from-putpage'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('default');
  });
});
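
The composite-key behaviour pinned down by the describe block above can be modelled without a database. The following is a minimal sketch, not the engine's code: `PageStore`, `insert`, `upsert`, and `bySlug` are hypothetical names, and a `Map` keyed on the `(sourceId, slug)` pair stands in for `UNIQUE (source_id, slug)`.

```typescript
// Hypothetical in-memory model of UNIQUE (source_id, slug); not the engine's code.
type PageRow = { sourceId: string; slug: string; title: string };

class PageStore {
  private rows = new Map<string, PageRow>();

  // JSON-encode the pair so ('a', 'b/c') and ('a/b', 'c') never share a key.
  private key(sourceId: string, slug: string): string {
    return JSON.stringify([sourceId, slug]);
  }

  // Mirrors a plain INSERT: a duplicate (sourceId, slug) pair is an error.
  insert(row: PageRow): void {
    const k = this.key(row.sourceId, row.slug);
    if (this.rows.has(k)) {
      throw new Error(`duplicate key (${row.sourceId}, ${row.slug})`);
    }
    this.rows.set(k, row);
  }

  // Mirrors an upsert on the composite key; sourceId defaults like the column DEFAULT.
  upsert(slug: string, title: string, sourceId = 'default'): void {
    this.rows.set(this.key(sourceId, slug), { sourceId, slug, title });
  }

  // All rows sharing a slug, regardless of source.
  bySlug(slug: string): PageRow[] {
    return Array.from(this.rows.values()).filter(r => r.slug === slug);
  }
}
```

Under this model the same slug can exist once per source, a second insert of the same pair fails, and an upsert without a source lands in `'default'`, which is the shape the three tests above assert against real Postgres.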

describeE2E('v0.18.0 multi-source — cascade delete covers every dependent row', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => {
    await teardownDB();
  });

  test('sources remove cascades to pages + chunks + timeline + links + files', async () => {
    const conn = getConn();
    const engine = getEngine();

    // Build a fully populated source: page, chunks, timeline entries,
    // links, a file row. Then remove the source and verify nothing
    // for that source survives.
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['add', 'cascadetest', '--federated']);

    // Page under cascadetest
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('cascadetest', 'people/alice', 'person', 'Alice', 'Alice body', '', '{}'::jsonb, 'h1')`,
    );
    const alicePage = await conn.unsafe(
      `SELECT id FROM pages WHERE source_id = 'cascadetest' AND slug = 'people/alice'`,
    );
    const aliceId = alicePage[0].id as number;

    // A second page for link target
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('cascadetest', 'companies/acme', 'company', 'Acme', 'Acme body', '', '{}'::jsonb, 'h2')`,
    );
    const acmePage = await conn.unsafe(
      `SELECT id FROM pages WHERE source_id = 'cascadetest' AND slug = 'companies/acme'`,
    );
    const acmeId = acmePage[0].id as number;

    // Chunk
    await conn.unsafe(
      `INSERT INTO content_chunks (page_id, chunk_index, chunk_text, chunk_source)
       VALUES (${aliceId}, 0, 'Alice body chunk', 'compiled_truth')`,
    );

    // Timeline
    await conn.unsafe(
      `INSERT INTO timeline_entries (page_id, date, source, summary, detail)
       VALUES (${aliceId}, '2026-01-15', 'test', 'Joined Acme', 'detail')`,
    );

    // Link Alice → Acme
    await conn.unsafe(
      `INSERT INTO links (from_page_id, to_page_id, link_type, link_source)
       VALUES (${aliceId}, ${acmeId}, 'works_at', 'markdown')`,
    );

    // File row pointing at Alice
    await conn.unsafe(
      `INSERT INTO files (source_id, page_id, filename, storage_path, content_hash)
       VALUES ('cascadetest', ${aliceId}, 'alice.pdf', 'cascadetest/people/alice/alice.pdf', 'fh1')`,
    );

    // Sanity: everything exists
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'cascadetest'`))[0].n).toBe(2);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM content_chunks WHERE page_id = ${aliceId}`))[0].n).toBe(1);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM timeline_entries WHERE page_id = ${aliceId}`))[0].n).toBe(1);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM links WHERE from_page_id = ${aliceId}`))[0].n).toBe(1);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM files WHERE source_id = 'cascadetest'`))[0].n).toBe(1);

    // Remove the source.
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['remove', 'cascadetest', '--yes']);

    // Everything for that source is gone.
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'cascadetest'`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM content_chunks WHERE page_id = ${aliceId}`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM timeline_entries WHERE page_id = ${aliceId}`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM links WHERE from_page_id = ${aliceId}`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM files WHERE source_id = 'cascadetest'`))[0].n).toBe(0);

    // The sources row itself is gone.
    const src = await conn.unsafe(`SELECT id FROM sources WHERE id = 'cascadetest'`);
    expect(src.length).toBe(0);
  });
});

describeE2E('v0.18.0 multi-source — sync --source routes through sources table', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => {
    await teardownDB();
  });

  test('performSync with sourceId reads local_path from sources row', async () => {
    const engine = getEngine();
    const conn = getConn();

    // Register a source with a bogus path (we're not actually walking a
    // repo — this test asserts that performSync correctly RESOLVES the
    // source row vs hitting the global config).
    await runSources(engine as unknown as Parameters<typeof runSources>[0], [
      'add', 'syncsrc', '--path', '/nonexistent/syncsrc/path', '--no-federated',
    ]);

    // Also set a DIFFERENT path in the global config so we can verify
    // sourceId actually disambiguates.
    await engine.setConfig('sync.repo_path', '/some/other/default/path');

    // performSync({sourceId: 'syncsrc'}) should attempt to use
    // /nonexistent/syncsrc/path, NOT /some/other/default/path.
    let err: Error | null = null;
    try {
      await performSync(engine, { sourceId: 'syncsrc' });
    } catch (e) {
      err = e as Error;
    }
    expect(err).not.toBeNull();
    // The error message references the source-scoped path, not the
    // global config path. (Could be "Not a git repository"
    // or "No commits in repo" — either way the path it cites should
    // be the source's.)
    expect(err!.message).toContain('/nonexistent/syncsrc/path');
    expect(err!.message).not.toContain('/some/other/default/path');
  });

  test('performSync with no sourceId falls back to global sync.repo_path', async () => {
    const engine = getEngine();
    // Global config is still '/some/other/default/path' from the
    // previous test. Without --source, performSync uses it.
    let err: Error | null = null;
    try {
      await performSync(engine, {});
    } catch (e) {
      err = e as Error;
    }
    expect(err).not.toBeNull();
    expect(err!.message).toContain('/some/other/default/path');
  });
});
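
The path-resolution rule these two tests exercise can be sketched in isolation. This is an assumption-laden illustration rather than the body of `performSync`: `resolveSyncPath`, `SourceRow`, and the error messages are hypothetical, and only the `sync.repo_path` key and the "source row wins over global config" rule come from the tests above.

```typescript
// Hypothetical resolver; the real performSync may differ in names and error text.
interface SourceRow {
  id: string;
  localPath: string | null;
}

function resolveSyncPath(
  sources: SourceRow[],
  globalConfig: Record<string, string>,
  sourceId?: string,
): string {
  if (sourceId !== undefined) {
    const row = sources.filter(s => s.id === sourceId)[0];
    if (!row) throw new Error(`Unknown source: ${sourceId}`);
    if (!row.localPath) throw new Error(`Source ${sourceId} has no local_path`);
    // The source row wins; the global config is never consulted on this branch.
    return row.localPath;
  }
  // No sourceId: fall back to the single global repo path.
  const fallback = globalConfig['sync.repo_path'];
  if (!fallback) throw new Error('sync.repo_path is not configured');
  return fallback;
}
```

Keeping the two branches separate means a misconfigured source fails loudly instead of silently syncing the global repo under the wrong source id.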

describeE2E('v0.18.0 multi-source — sources table surface', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => {
    await teardownDB();
  });

  test('default source is seeded federated=true; new sources default to isolated', async () => {
    const conn = getConn();
    const engine = getEngine();

    const def = await conn.unsafe(`SELECT config FROM sources WHERE id = 'default'`);
    const defConfig = typeof def[0].config === 'string' ? JSON.parse(def[0].config) : def[0].config;
    expect(defConfig.federated).toBe(true);

    // Defensive cleanup: sources isn't in helpers.ALL_TABLES, so residual
    // rows from prior test runs can shadow this INSERT via ON CONFLICT
    // DO NOTHING. Delete first, then create.
    await conn.unsafe(`DELETE FROM sources WHERE id = 'isolatedsrc'`);
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['add', 'isolatedsrc']);
    const iso = await conn.unsafe(`SELECT config FROM sources WHERE id = 'isolatedsrc'`);
    const isoConfig = typeof iso[0].config === 'string' ? JSON.parse(iso[0].config) : iso[0].config;
    expect(isoConfig.federated).toBeUndefined(); // omitted → isolated-by-default
  });

  test('federate / unfederate flips config.federated on real DB', async () => {
    const conn = getConn();
    const engine = getEngine();

    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['federate', 'isolatedsrc']);
    let row = await conn.unsafe(`SELECT config FROM sources WHERE id = 'isolatedsrc'`);
    let config = typeof row[0].config === 'string' ? JSON.parse(row[0].config) : row[0].config;
    expect(config.federated).toBe(true);

    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['unfederate', 'isolatedsrc']);
    row = await conn.unsafe(`SELECT config FROM sources WHERE id = 'isolatedsrc'`);
    config = typeof row[0].config === 'string' ? JSON.parse(row[0].config) : row[0].config;
    expect(config.federated).toBe(false);
  });

  test('rename changes name, id stays stable', async () => {
    const conn = getConn();
    const engine = getEngine();

    await runSources(engine as unknown as Parameters<typeof runSources>[0], [
      'rename', 'isolatedsrc', 'My Isolated Source',
    ]);
    const row = await conn.unsafe(`SELECT id, name FROM sources WHERE id = 'isolatedsrc'`);
    expect(row[0].id).toBe('isolatedsrc');
    expect(row[0].name).toBe('My Isolated Source');
  });
});
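
The tests above read `config.federated` in three states: `true`, `false`, and omitted. A small helper makes the "omitted means isolated" rule explicit. `isFederated` and `SourceConfig` are hypothetical names used only for this sketch; the string-or-object handling mirrors the `typeof ... === 'string'` guard the tests already use.

```typescript
// Hypothetical helper: only an explicit `true` opts a source into federated search.
type SourceConfig = { federated?: boolean };

function isFederated(config: SourceConfig | string): boolean {
  // Some drivers hand JSONB back as a string, so accept both shapes.
  const parsed: SourceConfig = typeof config === 'string' ? JSON.parse(config) : config;
  // `undefined` (key omitted) and `false` both mean isolated.
  return parsed.federated === true;
}
```

Comparing against `true` rather than testing truthiness keeps a stray non-boolean value in the JSON from accidentally federating a source.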

describeE2E('v0.18.0 multi-source — storage backfill against file_migration_ledger', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => {
    await teardownDB();
  });

  test('seeded ledger + stub storage: pending → complete end-to-end', async () => {
    const conn = getConn();
    const engine = getEngine();

    // Seed a page + file (via raw INSERT so the test doesn't depend on
    // sync running).
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('default', 'topics/storage', 'note', 'Storage test', 'body', '', '{}'::jsonb, 'sh1')`,
    );
    const pageRow = await conn.unsafe(
      `SELECT id FROM pages WHERE source_id = 'default' AND slug = 'topics/storage'`,
    );
    const pageId = pageRow[0].id as number;

    await conn.unsafe(
      `INSERT INTO files (source_id, page_id, filename, storage_path, content_hash)
       VALUES ('default', ${pageId}, 'doc.pdf', 'topics/storage/doc.pdf', 'fh1')`,
    );
    const fileRow = await conn.unsafe(
      `SELECT id FROM files WHERE storage_path = 'topics/storage/doc.pdf'`,
    );
    const fileId = fileRow[0].id as number;

    // Seed the ledger manually so we don't depend on the v23 seed SQL
    // (the TRUNCATE CASCADE in setupDB wipes ledger rows).
    await conn.unsafe(
      `INSERT INTO file_migration_ledger (file_id, storage_path_old, storage_path_new, status)
       VALUES (${fileId}, 'topics/storage/doc.pdf', 'default/topics/storage/doc.pdf', 'pending')
       ON CONFLICT (file_id) DO NOTHING`,
    );

    // Stub storage: downloads return bytes, uploads track what was written.
    const uploaded = new Set<string>();
    const stub: StorageBackend = {
      upload: async (p: string) => { uploaded.add(p); },
      download: async (p: string) => Buffer.from('bytes-for:' + p),
      delete: async (p: string) => { uploaded.delete(p); },
      exists: async (p: string) => uploaded.has(p),
      list: async () => [],
      getUrl: async (p) => `https://stub/${p}`,
    };

    const report = await runStorageBackfill(engine, stub);
    expect(report.total).toBe(1);
    expect(report.nowComplete).toBe(1);
    expect(report.failed).toBe(0);

    // Ledger row transitioned to complete.
    const ledger = await conn.unsafe(
      `SELECT status FROM file_migration_ledger WHERE file_id = ${fileId}`,
    );
    expect(ledger[0].status).toBe('complete');

    // Files row now points at the new path.
    const filesAfter = await conn.unsafe(
      `SELECT storage_path FROM files WHERE id = ${fileId}`,
    );
    expect(filesAfter[0].storage_path).toBe('default/topics/storage/doc.pdf');

    // Stub storage saw the upload happen at the new path.
    expect(uploaded.has('default/topics/storage/doc.pdf')).toBe(true);
  });
});
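
The pending → complete transition asserted above follows a copy-then-repoint pattern. The sketch below illustrates that pattern only; it is not `runStorageBackfill`. `backfill`, `LedgerRow`, and `BlobStore` are hypothetical, the store is synchronous for brevity where the real `StorageBackend` is async, and the `'failed'` status name is an assumption (the tests only show `'pending'` and `'complete'`).

```typescript
// Hypothetical sketch of a ledger-driven copy; not the project's runStorageBackfill.
type LedgerStatus = 'pending' | 'complete' | 'failed'; // 'failed' is an assumed status name
type LedgerRow = { fileId: number; oldPath: string; newPath: string; status: LedgerStatus };

// Synchronous stand-in for a storage backend, kept sync for brevity.
interface BlobStore {
  download(path: string): string;
  upload(path: string, data: string): void;
}

function backfill(ledger: LedgerRow[], filePaths: Record<number, string>, store: BlobStore) {
  const report = { total: ledger.length, nowComplete: 0, failed: 0 };
  for (const row of ledger) {
    if (row.status === 'complete') continue; // finished on an earlier run, so skip
    try {
      store.upload(row.newPath, store.download(row.oldPath));
      filePaths[row.fileId] = row.newPath; // repoint the file only after the copy succeeds
      row.status = 'complete';
      report.nowComplete += 1;
    } catch (e) {
      row.status = 'failed';
      report.failed += 1;
    }
  }
  return report;
}
```

Because completed rows are skipped, running the loop a second time is a no-op, which is the property a resumable migration ledger is meant to provide.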

// v0.18.0: real-Postgres regression guard for the addLinksBatch /
// addTimelineEntriesBatch JOIN fan-out bug. Before the fix, the JOIN was
// `pages.slug = v.from_slug` unqualified — so two pages sharing the same
// slug across sources would silently duplicate edges and timeline rows.
// postgres-js binds arrays through `unnest()` rather than inline VALUES,
// so the query shape is structurally different from PGLite's and gets its
// own coverage.
describeE2E('v0.18.0 multi-source — addLinksBatch / addTimelineEntriesBatch source-awareness', () => {
  beforeAll(async () => {
    await setupDB();
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });
  afterAll(async () => { await teardownDB(); });

  async function seedSameSlugTwoSources() {
    const conn = getConn();
    const engine = getEngine() as PostgresEngine;
    // Second source alongside 'default'.
    await conn.unsafe(
      `INSERT INTO sources (id, name) VALUES ('alt', 'alt') ON CONFLICT (id) DO NOTHING`
    );
    // Create same-slug pages in both sources. putPage defaults to 'default'.
    await engine.putPage('topics/ai', { type: 'concept', title: 'AI (default)', compiled_truth: '', timeline: '' });
    await engine.putPage('topics/ml', { type: 'concept', title: 'ML (default)', compiled_truth: '', timeline: '' });
    await conn.unsafe(
      `INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, source_id, updated_at)
       VALUES ('topics/ai', 'concept', 'AI (alt)', '', '', '{}'::jsonb, 'alt-ai-hash', 'alt', now()),
              ('topics/ml', 'concept', 'ML (alt)', '', '', '{}'::jsonb, 'alt-ml-hash', 'alt', now())`
    );
  }

  test('addLinksBatch without explicit source_id does NOT fan out across sources', async () => {
    await seedSameSlugTwoSources();
    const conn = getConn();
    const engine = getEngine() as PostgresEngine;
    // Reset links from any prior describe block.
    await conn.unsafe(`DELETE FROM links`);
    const inserted = await engine.addLinksBatch([
      { from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention' },
    ]);
    // Exactly one edge (default → default). Before the fix this was 2.
    expect(inserted).toBe(1);
    const rows = await conn.unsafe(
      `SELECT f.source_id AS from_src, t.source_id AS to_src
         FROM links l
         JOIN pages f ON f.id = l.from_page_id
         JOIN pages t ON t.id = l.to_page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].from_src).toBe('default');
    expect(rows[0].to_src).toBe('default');
  });

  test('addLinksBatch supports cross-source edges when explicit source_ids differ', async () => {
    const conn = getConn();
    const engine = getEngine() as PostgresEngine;
    await conn.unsafe(`DELETE FROM links`);
    const inserted = await engine.addLinksBatch([
      {
        from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention',
        from_source_id: 'default', to_source_id: 'alt',
      },
    ]);
    expect(inserted).toBe(1);
    const rows = await conn.unsafe(
      `SELECT f.source_id AS from_src, t.source_id AS to_src
         FROM links l
         JOIN pages f ON f.id = l.from_page_id
         JOIN pages t ON t.id = l.to_page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].from_src).toBe('default');
    expect(rows[0].to_src).toBe('alt');
  });

  test('addTimelineEntriesBatch without explicit source_id does NOT fan out across sources', async () => {
    const conn = getConn();
    const engine = getEngine() as PostgresEngine;
    await conn.unsafe(`DELETE FROM timeline_entries`);
    const inserted = await engine.addTimelineEntriesBatch([
      { slug: 'topics/ai', date: '2024-01-15', summary: 'Founded' },
    ]);
    expect(inserted).toBe(1);
    const rows = await conn.unsafe(
      `SELECT p.source_id
         FROM timeline_entries te
         JOIN pages p ON p.id = te.page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('default');
  });

  test('addTimelineEntriesBatch with explicit alt source_id lands only in alt', async () => {
    const conn = getConn();
    const engine = getEngine() as PostgresEngine;
    await conn.unsafe(`DELETE FROM timeline_entries`);
    const inserted = await engine.addTimelineEntriesBatch([
      { slug: 'topics/ai', date: '2024-02-01', summary: 'Alt-only event', source_id: 'alt' },
    ]);
    expect(inserted).toBe(1);
    const rows = await conn.unsafe(
      `SELECT p.source_id
         FROM timeline_entries te
         JOIN pages p ON p.id = te.page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('alt');
  });
});
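
The fan-out these tests guard against is easy to reproduce in memory for the timeline case. The sketch below is a simplified model, not the engine's SQL: `joinBySlugOnly` and `joinBySourceAndSlug` are hypothetical functions that stand in for the pre-fix and post-fix JOIN conditions, and defaulting a missing source to `'default'` is taken from the test expectations above.

```typescript
// Hypothetical in-memory model of the batch JOIN; table shapes are simplified.
type Page = { id: number; sourceId: string; slug: string };
type EntryInput = { slug: string; summary: string; sourceId?: string };
type TimelineRow = { pageId: number; summary: string };

// Pre-fix shape: JOIN pages on slug alone, so every same-slug page matches.
function joinBySlugOnly(pages: Page[], entries: EntryInput[]): TimelineRow[] {
  const out: TimelineRow[] = [];
  for (const e of entries) {
    for (const p of pages) {
      if (p.slug === e.slug) out.push({ pageId: p.id, summary: e.summary });
    }
  }
  return out;
}

// Post-fix shape: JOIN on (source_id, slug); a missing source means 'default'.
function joinBySourceAndSlug(pages: Page[], entries: EntryInput[]): TimelineRow[] {
  const out: TimelineRow[] = [];
  for (const e of entries) {
    const wanted = e.sourceId ?? 'default';
    for (const p of pages) {
      if (p.slug === e.slug && p.sourceId === wanted) {
        out.push({ pageId: p.id, summary: e.summary });
      }
    }
  }
  return out;
}
```

With one `topics/ai` page in each of two sources, the slug-only join yields two timeline rows for a single input entry, while the qualified join yields exactly one.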
@@ -609,3 +609,68 @@ describe('FRONTMATTER_LINK_MAP integrity', () => {
    expect(m!.dirHint).toContain('people');
  });
});


// ─────────────────────────────────────────────────────────────────
// v0.18.0 Step 4 — qualified wikilink syntax [[source-id:dir/slug]]
// ─────────────────────────────────────────────────────────────────
describe("extractEntityRefs — v0.18.0 qualified wikilinks", () => {
  test("[[wiki:concepts/ai]] extracts with sourceId=wiki", () => {
    const refs = extractEntityRefs("See [[concepts/ai]] vs [[wiki:concepts/ai]] for wiki-specific take.");
    // One unqualified + one qualified.
    expect(refs.length).toBe(2);
    const qual = refs.find(r => r.sourceId === "wiki");
    expect(qual).toBeDefined();
    expect(qual!.slug).toBe("concepts/ai");
    expect(qual!.name).toBe("concepts/ai");
    const unqual = refs.find(r => r.sourceId === undefined);
    expect(unqual).toBeDefined();
    expect(unqual!.slug).toBe("concepts/ai");
  });

  test("[[gstack:projects/foo|Display Name]] preserves display + sourceId", () => {
    const refs = extractEntityRefs("See [[gstack:projects/foo|The Foo Project]] for details.");
    expect(refs.length).toBe(1);
    expect(refs[0]).toEqual({ name: "The Foo Project", slug: "projects/foo", dir: "projects", sourceId: "gstack" });
  });

  test("qualified source-id format is validated (must match [a-z0-9-]+ kebab rules)", () => {
    // Uppercase source IDs are not qualified — fall through to unqualified wikilink or no match.
    const refs = extractEntityRefs("Legit: [[yc-media:concepts/seed]] Not legit: [[NotValid:concepts/x]]");
    const qualified = refs.filter(r => r.sourceId);
    expect(qualified.length).toBe(1);
    expect(qualified[0].sourceId).toBe("yc-media");
  });

  test("masking prevents unqualified regex from matching inside a qualified link", () => {
    // Without the mask, [[wiki:concepts/ai]] could also match as
    // unqualified with slug "wiki:concepts/ai" (invalid dir) — the
    // DIR_PATTERN whitelist normally blocks it, but masking is
    // defense-in-depth.
    const refs = extractEntityRefs("Ref: [[wiki:concepts/ai]]");
    expect(refs.length).toBe(1);
    expect(refs[0].sourceId).toBe("wiki");
  });

  test("markdown [Name](path) links always have no sourceId (unqualified by shape)", () => {
    const refs = extractEntityRefs("[Alice](people/alice-chen) met [[wiki:people/bob]]");
    const mdLink = refs.find(r => r.slug === "people/alice-chen");
    expect(mdLink!.sourceId).toBeUndefined();
    const wiki = refs.find(r => r.slug === "people/bob");
    expect(wiki!.sourceId).toBe("wiki");
  });
});
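
The qualified syntax tested above, `[[source-id:dir/slug|Display]]`, can be parsed with a single regular expression. The following is a sketch under stated assumptions, not the project's `extractEntityRefs`: `parseWikilinks` is a hypothetical function, it accepts any lowercase kebab-case directory instead of consulting the `DIR_PATTERN` whitelist, and it ignores markdown `[Name](path)` links entirely.

```typescript
// Hypothetical parser for [[source-id:dir/slug|Display]]; not the project's extractEntityRefs.
type EntityRef = { name: string; slug: string; dir: string; sourceId?: string };

function parseWikilinks(text: string): EntityRef[] {
  // Groups: 1 = optional kebab-case source id, 2 = dir, 3 = rest of slug, 4 = optional display.
  const pattern = /\[\[(?:([a-z0-9]+(?:-[a-z0-9]+)*):)?([a-z0-9-]+)\/([^\]|]+?)(?:\|([^\]]+))?\]\]/g;
  const refs: EntityRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const sourceId: string | undefined = m[1];
    const dir = m[2];
    const slug = `${dir}/${m[3]}`;
    const display: string | undefined = m[4];
    // The display text wins when present; otherwise the slug doubles as the name.
    const ref: EntityRef = { name: display !== undefined ? display : slug, slug, dir };
    if (sourceId !== undefined) ref.sourceId = sourceId;
    refs.push(ref);
  }
  return refs;
}
```

In this sketch an uppercase prefix such as `NotValid:` matches neither the source group nor the directory group, so the whole link is skipped. The real extractor may instead fall through to an unqualified match, as the test comment above allows.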

describe("v0.18.0 migration v22 — links_resolution_type", () => {
  test("migration v22 exists with CHECK constraint", async () => {
    const { MIGRATIONS } = await import("../src/core/migrate.ts");
    const v22 = MIGRATIONS.find(m => m.version === 22);
    expect(v22).toBeDefined();
    expect(v22!.name).toBe("links_resolution_type");
    expect(v22!.sql).toContain("ADD COLUMN IF NOT EXISTS resolution_type");
    expect(v22!.sql).toContain("links_resolution_type_check");
    expect(v22!.sql).toContain("qualified");
    expect(v22!.sql).toContain("unqualified");
  });
});


@@ -16,6 +16,162 @@ describe('migrate', () => {
  // and are covered in the E2E suite (test/e2e/mechanical.test.ts)
});

// ─────────────────────────────────────────────────────────────────
// v0.18.0 — v20 sources_table_additive (Step 1, Lane A)
// ─────────────────────────────────────────────────────────────────
// v20 is the ADDITIVE-ONLY migration: it installs the sources primitive
// without breaking the engine's existing ON CONFLICT (slug) upserts.
// The breaking schema changes (pages.source_id NOT NULL, composite
// UNIQUE, files.page_slug → page_id, file_migration_ledger,
// links.resolution_type) land in v21 through v23 alongside the engine
// rewrites they pair with, so the engine can execute the new
// ON CONFLICT (source_id, slug) atomically with the schema change.
// ─────────────────────────────────────────────────────────────────
describe('migrate v20 — sources_table_additive', () => {
  const v20 = MIGRATIONS.find(m => m.version === 20);

  test('v20 exists', () => {
    expect(v20).toBeDefined();
    expect(v20!.name).toBe('sources_table_additive');
  });

  test('v20 creates sources table', () => {
    expect(v20!.sql).toContain('CREATE TABLE IF NOT EXISTS sources');
    expect(v20!.sql).toContain('id TEXT PRIMARY KEY');
    expect(v20!.sql).toContain('name TEXT NOT NULL UNIQUE');
    expect(v20!.sql).toContain('config JSONB NOT NULL');
  });

  test("v20 seeds 'default' source inheriting sync config", () => {
    expect(v20!.sql).toContain("INSERT INTO sources (id, name, local_path, last_commit, config)");
    expect(v20!.sql).toContain("'default'");
    // The default source pulls from existing config so post-upgrade
    // identity is preserved.
    expect(v20!.sql).toContain("SELECT value FROM config WHERE key = 'sync.repo_path'");
    expect(v20!.sql).toContain("SELECT value FROM config WHERE key = 'sync.last_commit'");
  });

  test('v20 default source is federated=true (backward-compat)', () => {
    // federated=true ensures pre-v0.17 brains keep single-namespace
    // search semantics — every page appears in unqualified search.
    expect(v20!.sql).toContain('"federated": true');
  });

  test('v20 is idempotent on re-run', () => {
    // CREATE TABLE IF NOT EXISTS + NOT EXISTS subquery on INSERT.
    expect(v20!.sql).toContain('CREATE TABLE IF NOT EXISTS sources');
    expect(v20!.sql).toContain('WHERE NOT EXISTS (SELECT 1 FROM sources WHERE id = ');
  });

  test('v20 does NOT touch pages / ingest_log / files / links', () => {
    // Step 1 is additive-only. Breaking changes are deferred to v21 so
    // they land with the engine rewrite (Step 2). Guard against anyone
    // accidentally re-expanding v20's scope.
    expect(v20!.sql).not.toContain('ALTER TABLE pages');
    expect(v20!.sql).not.toContain('ALTER TABLE ingest_log');
    expect(v20!.sql).not.toContain('ALTER TABLE files');
    expect(v20!.sql).not.toContain('ALTER TABLE links');
    expect(v20!.handler).toBeUndefined();
  });
});

// ─────────────────────────────────────────────────────────────────
// v0.18.0 — v21 pages_source_id_composite_unique (Step 2, Lane B)
// ─────────────────────────────────────────────────────────────────
describe('migrate v21 — pages_source_id_composite_unique', () => {
  const v21 = MIGRATIONS.find(m => m.version === 21);

  test('v21 exists and is paired with Step 2 engine rewrite', () => {
    expect(v21).toBeDefined();
    expect(v21!.name).toBe('pages_source_id_composite_unique');
  });

  test('v21 adds pages.source_id with DEFAULT default REFERENCES sources', () => {
    expect(v21!.sql).toContain('ALTER TABLE pages ADD COLUMN IF NOT EXISTS source_id TEXT');
    // DEFAULT 'default' closes the race where an INSERT between ADD COLUMN
    // and SET NOT NULL could leave source_id NULL (Codex second-pass review).
    expect(v21!.sql).toContain("NOT NULL DEFAULT 'default' REFERENCES sources(id)");
  });

  test('v21 swaps UNIQUE(slug) → composite UNIQUE(source_id, slug)', () => {
    // ON CONFLICT (source_id, slug) in putPage relies on this swap.
    expect(v21!.sql).toContain('ALTER TABLE pages DROP CONSTRAINT IF EXISTS pages_slug_key');
    expect(v21!.sql).toContain('pages_source_slug_key');
    expect(v21!.sql).toContain('UNIQUE (source_id, slug)');
  });

  test('v21 creates source-scoped index for per-source scans', () => {
    expect(v21!.sql).toContain('CREATE INDEX IF NOT EXISTS idx_pages_source_id');
  });

  test('v21 constraint add is guarded (idempotent re-run)', () => {
    // DO block with IF NOT EXISTS guard means re-running the migration
    // after partial failure doesn't error on the already-installed name.
    expect(v21!.sql).toContain('IF NOT EXISTS');
    expect(v21!.sql).toContain("WHERE conname = 'pages_source_slug_key'");
  });
});
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────
// v0.18.0: v23 files_source_id_page_id_ledger (Step 7, Lane E)
// ─────────────────────────────────────────────────────────────────
describe('migrate v23 — files_source_id_page_id_ledger', () => {
  const v23 = MIGRATIONS.find(m => m.version === 23);

  test('v23 exists as handler-only (Postgres files table, PGLite no-op)', () => {
    expect(v23).toBeDefined();
    expect(v23!.name).toBe('files_source_id_page_id_ledger');
    expect(v23!.sql).toBe('');
    expect(v23!.handler).toBeDefined();
  });

  test('v23 handler gates on engine.kind for PGLite (no files table)', () => {
    expect(v23!.handler!.toString()).toMatch(/engine\.kind\s*===\s*["']pglite["']/);
  });

  test('v23 adds files.source_id + files.page_id + ledger creation', () => {
    const body = v23!.handler!.toString();
    expect(body).toContain('ALTER TABLE files ADD COLUMN IF NOT EXISTS source_id');
    expect(body).toContain('ALTER TABLE files ADD COLUMN IF NOT EXISTS page_id');
    expect(body).toContain('CREATE TABLE IF NOT EXISTS file_migration_ledger');
  });

  test('v23 backfills files.page_id scoped to default source (Codex fix)', () => {
    const body = v23!.handler!.toString();
    // Without source_id='default' scope, the JOIN could hit the wrong
    // page after new sources with duplicate slugs are added.
    expect(body).toContain('UPDATE files f');
    expect(body).toContain("p.source_id = 'default'");
  });

  test('v23 ledger PK is file_id (Codex: two sources can share old path)', () => {
    const body = v23!.handler!.toString();
    expect(body).toContain('file_id INTEGER PRIMARY KEY');
    // State machine values all present.
    for (const state of ['pending', 'copy_done', 'db_updated', 'complete', 'failed']) {
      expect(body).toContain(`'${state}'`);
    }
  });
});

describe('migrate — ordering guarantee (v15 must NOT be skipped by v16)', () => {
  test('runMigrations sorts by version ascending', async () => {
    // Regression: if v16 preceded v15 in the MIGRATIONS array, the iterator
    // would setConfig(version, 16) first, then skip v15 on the next pass.
    // runMigrations applies a defensive sort so array order doesn't matter.
    // This test is structural only: it asserts that v15 and v20 are both
    // registered and that version numbers are unique. It does not run the
    // sort itself, so a broken sort would still pass here even though v15
    // would never apply at runtime.
    const v15 = MIGRATIONS.find(m => m.version === 15);
    const v20 = MIGRATIONS.find(m => m.version === 20);
    expect(v15).toBeDefined();
    expect(v20).toBeDefined();
    // Sanity: version numbers are distinct.
    const versions = MIGRATIONS.map(m => m.version);
    const uniq = new Set(versions);
    expect(uniq.size).toBe(versions.length);
  });
});

// ─────────────────────────────────────────────────────────────────
// REGRESSION TESTS: migrations v8 + v9 perf on duplicate-heavy tables
// ─────────────────────────────────────────────────────────────────

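The ordering guard that the migrate tests above describe can be illustrated with a small standalone sketch. This is not the project's `runMigrations`: the `Migration` shape and the `pendingMigrations` and `applyAll` helpers are hypothetical names introduced here, and the stored version is modeled as a plain number. The sketch only shows why sorting a copy of the list before filtering keeps a lower version from being skipped when the array happens to be out of order.

```typescript
// Hypothetical shape for illustration; the real Migration type is not shown in this diff.
interface Migration {
  version: number;
  name: string;
}

// Returns the migrations that still need to run, in ascending version order,
// regardless of how the input array is ordered.
function pendingMigrations(all: readonly Migration[], currentVersion: number): Migration[] {
  return [...all]
    .sort((a, b) => a.version - b.version)
    .filter(m => m.version > currentVersion);
}

// Simulates a runner: "applies" each pending migration and bumps the stored version.
function applyAll(all: readonly Migration[], startVersion: number): { applied: number[]; version: number } {
  let version = startVersion;
  const applied: number[] = [];
  for (const m of pendingMigrations(all, version)) {
    applied.push(m.version);
    version = m.version;
  }
  return { applied, version };
}

// The higher version is listed first on purpose.
const outOfOrder: Migration[] = [
  { version: 16, name: 'later' },
  { version: 15, name: 'earlier' },
];
const orderingResult = applyAll(outOfOrder, 14);
console.log(orderingResult.applied);
```

Without the sort, a runner that walks the array in insertion order and records the version after each step would store 16 first and then treat 15 as already applied.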
244
test/multi-source-integration.test.ts
Normal file
@@ -0,0 +1,244 @@
/**
 * v0.18.0 Step 9: multi-source integration test against real PGLite.
 *
 * Exercises the full Step-1-through-Step-7 surface:
 *   - migration v20 seeds the default source with federated=true
 *   - migration v21 adds pages.source_id + composite UNIQUE
 *   - migration v22 adds links.resolution_type column
 *   - putPage implicitly targets the default source via the
 *     schema DEFAULT 'default' clause
 *   - raw INSERT can write pages to a non-default source and the
 *     composite UNIQUE allows same-slug pages across sources
 *   - sources CLI add/list/federate operations are reflected in DB
 *   - federated flag distinguishes unqualified-search-visibility
 *
 * PGLite-only (fast + zero deps). Real Postgres parity lives in
 * test/e2e/mechanical.test.ts when DATABASE_URL is set.
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { runSources } from '../src/commands/sources.ts';
import { resolveSourceId } from '../src/core/source-resolver.ts';

let engine: PGLiteEngine;

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({ type: 'pglite' } as never);
  await engine.initSchema();
});

afterAll(async () => {
  await engine.disconnect();
});

describe('v0.18.0 — sources table seeded with default row on fresh PGLite', () => {
  test("sources('default') exists after initSchema + migration", async () => {
    const rows = await engine.executeRaw<{ id: string; name: string; config: string | Record<string, unknown> }>(
      `SELECT id, name, config FROM sources WHERE id = 'default'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].name).toBe('default');
    const config = typeof rows[0].config === 'string' ? JSON.parse(rows[0].config) : rows[0].config;
    expect(config.federated).toBe(true);
  });

  test('pages.source_id column exists with DEFAULT default', async () => {
    const rows = await engine.executeRaw<{ column_default: string | null }>(
      `SELECT column_default FROM information_schema.columns
       WHERE table_name = 'pages' AND column_name = 'source_id'`,
    );
    expect(rows.length).toBe(1);
    // The catalog reports the default as a normalized expression
    // (typically 'default'::text), so match on the substring.
    expect(rows[0].column_default).toContain('default');
  });

  test('composite UNIQUE (source_id, slug) is installed', async () => {
    const rows = await engine.executeRaw<{ conname: string }>(
      `SELECT conname FROM pg_constraint WHERE conname = 'pages_source_slug_key'`,
    );
    expect(rows.length).toBe(1);
  });
});

describe('v0.18.0 — putPage implicitly writes to default source', () => {
  test('putPage without explicit source → source_id = default', async () => {
    await engine.putPage('topics/step9-auto', {
      type: 'concept',
      title: 'Step 9 Auto',
      compiled_truth: 'Auto-defaulted to default source.',
    });
    const rows = await engine.executeRaw<{ source_id: string; slug: string }>(
      `SELECT source_id, slug FROM pages WHERE slug = 'topics/step9-auto'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('default');
  });
});

describe('v0.18.0 — composite UNIQUE allows same-slug across sources', () => {
  test('same slug in two different sources coexists (regression: Codex critical)', async () => {
    // Insert a second source via sources CLI.
    await runSources(engine, ['add', 'testsrc', '--no-federated']);

    // Sanity: default already has this slug from the previous test.
    // Now write the same slug under testsrc via raw INSERT. putPage only
    // targets default until a later step surfaces sourceId, so the raw
    // INSERT stands in for the source-aware write that the Step 5
    // continuation will add.
    await engine.executeRaw(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('testsrc', 'topics/step9-auto', 'concept', 'Step 9 Auto (testsrc variant)',
               'A different page with the same slug in a different source.',
               '', '{}'::jsonb, 'hash2')`,
    );

    // Both rows must exist under the composite unique.
    const rows = await engine.executeRaw<{ source_id: string; slug: string; title: string }>(
      `SELECT source_id, slug, title FROM pages
       WHERE slug = 'topics/step9-auto'
       ORDER BY source_id`,
    );
    expect(rows.length).toBe(2);
    expect(rows.map(r => r.source_id).sort()).toEqual(['default', 'testsrc']);
  });

  test('inserting THIRD row with same (source_id, slug) hits composite UNIQUE', async () => {
    let err: Error | null = null;
    try {
      await engine.executeRaw(
        `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
         VALUES ('testsrc', 'topics/step9-auto', 'concept', 'Dup attempt',
                 'Should fail', '', '{}'::jsonb, 'hash3')`,
      );
    } catch (e) {
      err = e as Error;
    }
    expect(err).not.toBeNull();
    expect(err!.message.toLowerCase()).toMatch(/unique|duplicate/);
  });
});

describe('v0.18.0 — sources CLI manipulates the sources table', () => {
  test('sources federate flips config.federated true', async () => {
    await runSources(engine, ['federate', 'testsrc']);
    const rows = await engine.executeRaw<{ config: string | Record<string, unknown> }>(
      `SELECT config FROM sources WHERE id = 'testsrc'`,
    );
    const config = typeof rows[0].config === 'string' ? JSON.parse(rows[0].config) : rows[0].config;
    expect(config.federated).toBe(true);
  });

  test('sources unfederate flips config.federated false', async () => {
    await runSources(engine, ['unfederate', 'testsrc']);
    const rows = await engine.executeRaw<{ config: string | Record<string, unknown> }>(
      `SELECT config FROM sources WHERE id = 'testsrc'`,
    );
    const config = typeof rows[0].config === 'string' ? JSON.parse(rows[0].config) : rows[0].config;
    expect(config.federated).toBe(false);
  });

  test('sources rename changes name but keeps id immutable', async () => {
    await runSources(engine, ['rename', 'testsrc', 'Test Source']);
    const rows = await engine.executeRaw<{ id: string; name: string }>(
      `SELECT id, name FROM sources WHERE id = 'testsrc'`,
    );
    expect(rows[0].id).toBe('testsrc');
    expect(rows[0].name).toBe('Test Source');
  });
});

describe('v0.18.0 — source resolution priority (integration)', () => {
  test('explicit --source flag wins when the source exists', async () => {
    const id = await resolveSourceId(engine, 'testsrc');
    expect(id).toBe('testsrc');
  });

  test('GBRAIN_SOURCE env wins when no flag', async () => {
    process.env.GBRAIN_SOURCE = 'testsrc';
    try {
      const id = await resolveSourceId(engine, null);
      expect(id).toBe('testsrc');
    } finally {
      delete process.env.GBRAIN_SOURCE;
    }
  });

  test('fallback to default when nothing is set', async () => {
    const id = await resolveSourceId(engine, null, '/nowhere-registered');
    expect(id).toBe('default');
  });

  test('rejects unregistered explicit source with an actionable error', async () => {
    await expect(resolveSourceId(engine, 'ghost-source')).rejects.toThrow(/not found/);
  });
});

describe('v0.18.0 — sources remove cascades to pages', () => {
  test('removing a source cascade-deletes its pages', async () => {
    const before = await engine.executeRaw<{ n: number }>(
      `SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'testsrc'`,
    );
    expect(before[0].n).toBeGreaterThan(0);

    await runSources(engine, ['remove', 'testsrc', '--yes']);

    const after = await engine.executeRaw<{ n: number }>(
      `SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'testsrc'`,
    );
    expect(after[0].n).toBe(0);

    const src = await engine.executeRaw<{ id: string }>(
      `SELECT id FROM sources WHERE id = 'testsrc'`,
    );
    expect(src.length).toBe(0);

    // Default source is untouched.
    const defaultPages = await engine.executeRaw<{ n: number }>(
      `SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'default'`,
    );
    expect(defaultPages[0].n).toBeGreaterThan(0);
  });
});

describe('v0.18.0 — links.resolution_type column exists (Step 4)', () => {
  test('links table accepts qualified/unqualified resolution_type', async () => {
    // Create two pages, insert a link with resolution_type='qualified'.
    await engine.putPage('topics/qf-a', {
      type: 'concept', title: 'QA', compiled_truth: 'a',
    });
    await engine.putPage('topics/qf-b', {
      type: 'concept', title: 'QB', compiled_truth: 'b',
    });
    await engine.executeRaw(
      `INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, resolution_type)
       SELECT a.id, b.id, 'ref', '', 'markdown', 'qualified'
       FROM pages a, pages b
       WHERE a.slug = 'topics/qf-a' AND b.slug = 'topics/qf-b'
         AND a.source_id = 'default' AND b.source_id = 'default'`,
    );
    const rows = await engine.executeRaw<{ resolution_type: string }>(
      `SELECT l.resolution_type
       FROM links l
       JOIN pages a ON a.id = l.from_page_id
       WHERE a.slug = 'topics/qf-a'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].resolution_type).toBe('qualified');
  });

  test('links CHECK constraint rejects invalid resolution_type values', async () => {
    let err: Error | null = null;
    try {
      await engine.executeRaw(
        `INSERT INTO links (from_page_id, to_page_id, link_type, resolution_type)
         SELECT a.id, a.id, 'self', 'bogus-value'
         FROM pages a WHERE a.slug = 'topics/qf-a' AND a.source_id = 'default'`,
      );
    } catch (e) {
      err = e as Error;
    }
    expect(err).not.toBeNull();
    expect(err!.message.toLowerCase()).toMatch(/check|constraint/);
  });
});
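The composite-key behavior that the integration test above checks against PGLite can also be pictured without a database. The following is a minimal in-memory sketch, not engine code: `PageRow` and `PageTable` are hypothetical names, and the error message is made up for illustration. It models only the rule that `UNIQUE (source_id, slug)` lets the same slug appear once per source while rejecting a second row with the same pair.

```typescript
// Hypothetical row shape; only the columns relevant to the constraint are modeled.
interface PageRow {
  source_id: string;
  slug: string;
  title: string;
}

// In-memory stand-in for a table with UNIQUE (source_id, slug).
class PageTable {
  private rows = new Map<string, PageRow>();

  // JSON-encoding the pair gives an unambiguous composite key.
  private static key(sourceId: string, slug: string): string {
    return JSON.stringify([sourceId, slug]);
  }

  insert(row: PageRow): void {
    const k = PageTable.key(row.source_id, row.slug);
    if (this.rows.has(k)) {
      throw new Error(`duplicate key (${row.source_id}, ${row.slug}) violates unique constraint`);
    }
    this.rows.set(k, row);
  }

  // Lists the sources that hold a page with the given slug, sorted by id.
  sourcesForSlug(slug: string): string[] {
    return Array.from(this.rows.values())
      .filter(r => r.slug === slug)
      .map(r => r.source_id)
      .sort();
  }
}

const table = new PageTable();
table.insert({ source_id: 'default', slug: 'topics/step9-auto', title: 'Step 9 Auto' });
table.insert({ source_id: 'testsrc', slug: 'topics/step9-auto', title: 'Step 9 Auto (testsrc variant)' });

// A third insert that repeats an existing (source_id, slug) pair is rejected.
let duplicateRejected = false;
try {
  table.insert({ source_id: 'testsrc', slug: 'topics/step9-auto', title: 'Dup attempt' });
} catch {
  duplicateRejected = true;
}
console.log(table.sourcesForSlug('topics/step9-auto'), duplicateRejected);
```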
@@ -462,6 +462,119 @@ describe('PGLiteEngine: addTimelineEntriesBatch', () => {
  });
});

// v0.18.0: regression guards for the cross-source JOIN fan-out.
// Before the fix, addLinksBatch/addTimelineEntriesBatch JOINed on pages.slug
// only, so a page with the same slug in two sources would fan out and
// silently create duplicate edges / entries. Source-id-qualified JOINs
// eliminate the fan-out.
describe('PGLiteEngine: batch ops source-awareness (v0.18.0)', () => {
  beforeEach(async () => {
    await truncateAll();
    // Register a second source and populate the same slugs in both.
    const db = (engine as any).db;
    await db.query(
      `INSERT INTO sources (id, name) VALUES ('alt', 'alt')
       ON CONFLICT (id) DO NOTHING`
    );
    // default-source rows via putPage (schema DEFAULT 'default').
    await engine.putPage('topics/ai', { type: 'concept', title: 'AI (default)', compiled_truth: '', timeline: '' });
    await engine.putPage('topics/ml', { type: 'concept', title: 'ML (default)', compiled_truth: '', timeline: '' });
    // alt-source rows with the same slugs, inserted via raw SQL.
    await db.query(
      `INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, source_id, updated_at)
       VALUES ('topics/ai', 'concept', 'AI (alt)', '', '', '{}'::jsonb, 'h1', 'alt', now()),
              ('topics/ml', 'concept', 'ML (alt)', '', '', '{}'::jsonb, 'h2', 'alt', now())`
    );
  });

  test('addLinksBatch default source_id does NOT fan out across sources', async () => {
    const inserted = await engine.addLinksBatch([
      { from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention' },
    ]);
    // Exactly one edge, not two. Before the fix this was 2.
    expect(inserted).toBe(1);
    const db = (engine as any).db;
    const { rows } = await db.query(
      `SELECT f.source_id AS from_src, t.source_id AS to_src
       FROM links l
       JOIN pages f ON f.id = l.from_page_id
       JOIN pages t ON t.id = l.to_page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].from_src).toBe('default');
    expect(rows[0].to_src).toBe('default');
  });

  test('addLinksBatch with explicit alt source_id lands in alt only', async () => {
    const inserted = await engine.addLinksBatch([
      {
        from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention',
        from_source_id: 'alt', to_source_id: 'alt',
      },
    ]);
    expect(inserted).toBe(1);
    const db = (engine as any).db;
    const { rows } = await db.query(
      `SELECT f.source_id AS from_src, t.source_id AS to_src
       FROM links l
       JOIN pages f ON f.id = l.from_page_id
       JOIN pages t ON t.id = l.to_page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].from_src).toBe('alt');
    expect(rows[0].to_src).toBe('alt');
  });

  test('addLinksBatch supports cross-source edges', async () => {
    const inserted = await engine.addLinksBatch([
      {
        from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention',
        from_source_id: 'default', to_source_id: 'alt',
      },
    ]);
    expect(inserted).toBe(1);
    const db = (engine as any).db;
    const { rows } = await db.query(
      `SELECT f.source_id AS from_src, t.source_id AS to_src
       FROM links l
       JOIN pages f ON f.id = l.from_page_id
       JOIN pages t ON t.id = l.to_page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].from_src).toBe('default');
    expect(rows[0].to_src).toBe('alt');
  });

  test('addTimelineEntriesBatch default source_id does NOT fan out across sources', async () => {
    const inserted = await engine.addTimelineEntriesBatch([
      { slug: 'topics/ai', date: '2024-01-15', summary: 'Founded' },
    ]);
    // Exactly one entry (default source), not two. Before the fix this was 2.
    expect(inserted).toBe(1);
    const db = (engine as any).db;
    const { rows } = await db.query(
      `SELECT p.source_id FROM timeline_entries te
       JOIN pages p ON p.id = te.page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('default');
  });

  test('addTimelineEntriesBatch with explicit alt source_id lands in alt only', async () => {
    const inserted = await engine.addTimelineEntriesBatch([
      { slug: 'topics/ai', date: '2024-01-15', summary: 'Founded', source_id: 'alt' },
    ]);
    expect(inserted).toBe(1);
    const db = (engine as any).db;
    const { rows } = await db.query(
      `SELECT p.source_id FROM timeline_entries te
       JOIN pages p ON p.id = te.page_id`
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('alt');
  });
});

// ─────────────────────────────────────────────────────────────────
// Raw Data, Versions, Config, IngestLog
// ─────────────────────────────────────────────────────────────────

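The fan-out that the batch-op regression tests above guard against comes down to how a slug is resolved to a page id. The sketch below is a simplified in-memory model, assuming nothing about the engine's actual SQL: `Page`, `pageIdsBySlug`, and `pageIdsBySourceAndSlug` are hypothetical names. It mirrors the timeline-entry case, where one input row should attach to exactly one page.

```typescript
// Hypothetical page shape for illustration only.
interface Page {
  id: number;
  source_id: string;
  slug: string;
}

// The same slug registered in two sources, as in the beforeEach fixture.
const pages: Page[] = [
  { id: 1, source_id: 'default', slug: 'topics/ai' },
  { id: 2, source_id: 'alt', slug: 'topics/ai' },
];

// Slug-only lookup: matches the page in every source that shares the slug,
// so one input row would be attached to several pages.
function pageIdsBySlug(slug: string): number[] {
  return pages.filter(p => p.slug === slug).map(p => p.id);
}

// Source-qualified lookup: at most one match when (source_id, slug) is unique.
// The 'default' fallback mirrors the behavior the tests expect when no
// source is given.
function pageIdsBySourceAndSlug(slug: string, sourceId: string = 'default'): number[] {
  return pages.filter(p => p.slug === slug && p.source_id === sourceId).map(p => p.id);
}

const fannedOut = pageIdsBySlug('topics/ai');
const qualified = pageIdsBySourceAndSlug('topics/ai');
const altOnly = pageIdsBySourceAndSlug('topics/ai', 'alt');
console.log(fannedOut.length, qualified.length, altOnly.length);
```

In this model the slug-only lookup returns two ids and the qualified lookup returns one, which is the difference the `inserted` count assertions are checking for.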
190
test/source-resolver.test.ts
Normal file
@@ -0,0 +1,190 @@
/**
 * v0.18.0 Step 6: source resolution priority tests.
 *
 * Priority order (highest first):
 *   1. Explicit --source flag
 *   2. GBRAIN_SOURCE env var
 *   3. .gbrain-source dotfile walk-up
 *   4. Registered source whose local_path contains CWD (longest prefix wins)
 *   5. Brain-level `sources.default` config key
 *   6. Fallback: literal 'default'
 */

import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';
import { resolveSourceId, __testing } from '../src/core/source-resolver.ts';
import type { BrainEngine } from '../src/core/engine.ts';

// ── Stub engine ────────────────────────────────────────────

function makeStub(registeredSources: string[], paths: Array<{ id: string; local_path: string }>, defaultKey: string | null): BrainEngine {
  return {
    kind: 'pglite',
    executeRaw: async <T>(sql: string, params?: unknown[]): Promise<T[]> => {
      if (sql.includes('SELECT id FROM sources WHERE id = $1')) {
        const target = params?.[0];
        return (registeredSources.includes(target as string)
          ? [{ id: target } as unknown as T]
          : []);
      }
      if (sql.includes('SELECT id, local_path FROM sources')) {
        return paths as unknown as T[];
      }
      return [];
    },
    getConfig: async (key: string) => (key === 'sources.default' ? defaultKey : null),
  } as unknown as BrainEngine;
}

// ── Priority 1: explicit flag ──────────────────────────────

describe('resolveSourceId priority 1 — explicit flag', () => {
  test('wins over every other signal', async () => {
    const engine = makeStub(['default', 'gstack', 'wiki'], [{ id: 'wiki', local_path: '/tmp' }], 'gstack');
    process.env.GBRAIN_SOURCE = 'wiki';
    try {
      const id = await resolveSourceId(engine, 'gstack', '/tmp/whatever');
      expect(id).toBe('gstack');
    } finally {
      delete process.env.GBRAIN_SOURCE;
    }
  });

  test('rejects unregistered explicit source with actionable error', async () => {
    const engine = makeStub(['default'], [], null);
    await expect(resolveSourceId(engine, 'ghost')).rejects.toThrow(/not found/);
  });

  test('rejects invalid format', async () => {
    const engine = makeStub(['default'], [], null);
    await expect(resolveSourceId(engine, 'WRONG-case!')).rejects.toThrow(/Invalid --source/);
  });
});

// ── Priority 2: env var ────────────────────────────────────

describe('resolveSourceId priority 2 — GBRAIN_SOURCE env', () => {
  test('wins over dotfile / registered-path / default', async () => {
    const engine = makeStub(['default', 'env-wins'], [{ id: 'other', local_path: '/tmp' }], 'default');
    process.env.GBRAIN_SOURCE = 'env-wins';
    try {
      const id = await resolveSourceId(engine, null, '/tmp/x');
      expect(id).toBe('env-wins');
    } finally {
      delete process.env.GBRAIN_SOURCE;
    }
  });
});

// ── Priority 3: dotfile walk-up ────────────────────────────

describe('resolveSourceId priority 3 — .gbrain-source dotfile walk-up', () => {
  let tmpdirPath: string;

  beforeEach(() => {
    tmpdirPath = mkdtempSync(join(tmpdir(), 'gbrain-resolver-test-'));
  });
  afterEach(() => {
    rmSync(tmpdirPath, { recursive: true, force: true });
  });

  test('finds dotfile in CWD', async () => {
    writeFileSync(join(tmpdirPath, '.gbrain-source'), 'gstack\n');
    const engine = makeStub(['default', 'gstack'], [], null);
    const id = await resolveSourceId(engine, null, tmpdirPath);
    expect(id).toBe('gstack');
  });

  test('walks up ancestors to find dotfile', async () => {
    writeFileSync(join(tmpdirPath, '.gbrain-source'), 'wiki\n');
    const deep = join(tmpdirPath, 'a', 'b', 'c');
    mkdirSync(deep, { recursive: true });
    const engine = makeStub(['default', 'wiki'], [], null);
    const id = await resolveSourceId(engine, null, deep);
    expect(id).toBe('wiki');
  });

  test('ignores dotfile with invalid content', async () => {
    writeFileSync(join(tmpdirPath, '.gbrain-source'), 'INVALID!\n');
    const engine = makeStub(['default'], [], null);
    const id = await resolveSourceId(engine, null, tmpdirPath);
    expect(id).toBe('default');
  });
});

// ── Priority 4: registered local_path match (longest prefix) ──

describe('resolveSourceId priority 4 — registered local_path longest-prefix match', () => {
  test('picks registered source whose local_path contains CWD', async () => {
    const engine = makeStub(
      ['default', 'gstack'],
      [{ id: 'gstack', local_path: '/tmp/gstack' }],
      null,
    );
    const id = await resolveSourceId(engine, null, '/tmp/gstack/plans/foo');
    expect(id).toBe('gstack');
  });

  test('longest prefix wins when paths are nested (per Codex second pass)', async () => {
    // Codex flagged: overlapping paths need longest-prefix resolution.
    // If gstack at /tmp/gstack and plans at /tmp/gstack/plans both
    // exist, CWD inside plans/ must pick plans.
    const engine = makeStub(
      ['default', 'gstack', 'plans'],
      [
        { id: 'gstack', local_path: '/tmp/gstack' },
        { id: 'plans', local_path: '/tmp/gstack/plans' },
      ],
      null,
    );
    const id = await resolveSourceId(engine, null, '/tmp/gstack/plans/deeper');
    expect(id).toBe('plans');
  });

  test("CWD outside any registered path falls through to default", async () => {
    const engine = makeStub(
      ['default', 'gstack'],
      [{ id: 'gstack', local_path: '/tmp/gstack' }],
      null,
    );
    const id = await resolveSourceId(engine, null, '/some/other/dir');
    expect(id).toBe('default');
  });
});

// ── Priority 5: brain-level default ────────────────────────

describe('resolveSourceId priority 5 — sources.default config key', () => {
  test("returns configured default when no higher signal present", async () => {
    const engine = makeStub(['default', 'custom'], [], 'custom');
    const id = await resolveSourceId(engine, null, '/some/random/dir');
    expect(id).toBe('custom');
  });
});

// ── Priority 6: fallback ────────────────────────────────────

describe('resolveSourceId priority 6 — fallback', () => {
  test("returns 'default' when no signal at all", async () => {
    const engine = makeStub(['default'], [], null);
    const id = await resolveSourceId(engine, null, '/random/dir');
    expect(id).toBe('default');
  });
});

// ── Regex validation ───────────────────────────────────────

describe('SOURCE_ID_RE', () => {
  test('accepts valid ids', () => {
    for (const id of ['default', 'wiki', 'gstack', 'yc-media', 'garrys-list', 'a', '123']) {
      expect(__testing.SOURCE_ID_RE.test(id)).toBe(true);
    }
  });
  test('rejects invalid ids', () => {
    for (const id of ['', 'a'.repeat(33), 'Upper', 'has_underscore', 'trailing-', '-leading', 'with spaces', 'with.dots']) {
      expect(__testing.SOURCE_ID_RE.test(id)).toBe(false);
    }
  });
});
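The priority order and id rules exercised by the resolver tests above can be summarized in a pure-function sketch. This is an illustration under stated assumptions, not the contents of `src/core/source-resolver.ts`: `ID_RE` is a guess at a pattern that satisfies the accept and reject lists in the `SOURCE_ID_RE` tests, and `Signals`, `longestPrefixMatch`, and `resolveSketch` are hypothetical names. The sketch takes every signal as an argument instead of reading the environment or filesystem, assumes POSIX-style paths, and skips the registration check and error reporting that the real resolver performs for explicit sources.

```typescript
// Assumed id pattern: lowercase alphanumerics with interior hyphens, 1 to 32 chars.
// The real SOURCE_ID_RE is not shown in this diff.
const ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

interface RegisteredSource {
  id: string;
  local_path: string;
}

// Hypothetical bundle of the six signals, passed in explicitly.
interface Signals {
  flag: string | null;          // value of --source, if given
  env: string | undefined;      // value of GBRAIN_SOURCE, if set
  dotfile: string | null;       // contents of the nearest .gbrain-source, if any
  cwd: string;                  // absolute POSIX-style path
  registered: RegisteredSource[];
  configDefault: string | null; // value of the sources.default config key
}

// Picks the registered source whose local_path is the longest ancestor of cwd.
function longestPrefixMatch(cwd: string, registered: RegisteredSource[]): string | null {
  let best: { id: string; length: number } | null = null;
  for (const src of registered) {
    const root = src.local_path.replace(/\/+$/, '');
    // Compare on a path-segment boundary so /tmp/gstack does not match /tmp/gstack2.
    const inside = cwd === root || cwd.startsWith(root + '/');
    if (inside && (best === null || root.length > best.length)) {
      best = { id: src.id, length: root.length };
    }
  }
  return best ? best.id : null;
}

// Walks the signals from highest to lowest priority.
function resolveSketch(s: Signals): string {
  if (s.flag) return s.flag;
  if (s.env) return s.env;
  const fromDotfile = s.dotfile?.trim();
  if (fromDotfile && ID_RE.test(fromDotfile)) return fromDotfile;
  return longestPrefixMatch(s.cwd, s.registered) ?? s.configDefault ?? 'default';
}

const nested: RegisteredSource[] = [
  { id: 'gstack', local_path: '/tmp/gstack' },
  { id: 'plans', local_path: '/tmp/gstack/plans' },
];
const base: Signals = {
  flag: null, env: undefined, dotfile: null,
  cwd: '/tmp/gstack/plans/deeper', registered: nested, configDefault: null,
};
console.log(resolveSketch(base));
```

Passing the signals in as data keeps the sketch testable without touching `process.env` or the filesystem; the tests above instead drive the real resolver through a stub engine and temporary directories.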
252
test/sources.test.ts
Normal file
@@ -0,0 +1,252 @@
/**
 * v0.18.0 Step 6: sources CLI subcommand tests.
 *
 * Pure unit tests that exercise the subcommand dispatcher via a
 * stub BrainEngine. No DB required; we just confirm the SQL
 * shape, validation, and flag parsing.
 */

import { describe, test, expect, beforeEach } from 'bun:test';
import { runSources } from '../src/commands/sources.ts';
import type { BrainEngine } from '../src/core/engine.ts';

// ── Stub engine that records queries ───────────────────────

interface RecordedCall {
  sql: string;
  params: unknown[];
}

function makeStub(rowsByPattern: Record<string, unknown[]> = {}): {
  engine: BrainEngine;
  calls: RecordedCall[];
  configSet: Array<{ key: string; value: string }>;
} {
  const calls: RecordedCall[] = [];
  const configSet: Array<{ key: string; value: string }> = [];

  const executeRaw = async (sql: string, params?: unknown[]) => {
    calls.push({ sql, params: params ?? [] });
    // Match by substring so tests are robust against whitespace.
    for (const [pattern, rows] of Object.entries(rowsByPattern)) {
      if (sql.includes(pattern)) return rows as never;
    }
    return [] as never;
  };

  const setConfig = async (key: string, value: string) => {
    configSet.push({ key, value });
  };

  // Minimal BrainEngine stub: only the methods sources.ts touches.
  const engine = {
    kind: 'pglite' as const,
    executeRaw,
    setConfig,
    // getConfig is stubbed to return null. Every other BrainEngine method
    // is left undefined, so an accidental call fails with a TypeError.
    getConfig: async () => null,
  } as unknown as BrainEngine;

  return { engine, calls, configSet };
}

// ── add ─────────────────────────────────────────────────────

// Intercept process.exit so unit tests under bun:test don't actually
// exit. Each test that might trigger process.exit() wraps its call in
// `withExitCapture`. We only return when the function under test returns
// or throws; process.exit() is turned into a recoverable throw.
async function withExitCapture(fn: () => Promise<void>): Promise<number | null> {
  const origExit = process.exit;
  let captured: number | null = null;
  process.exit = ((code?: number) => {
    captured = code ?? 0;
    throw new Error('__process_exit__');
  }) as never;
  try {
    await fn();
  } catch (e) {
    if (!(e instanceof Error) || !e.message.includes('__process_exit__')) throw e;
  } finally {
    process.exit = origExit;
  }
  return captured;
}

describe('sources add', () => {
  test('rejects invalid ids', async () => {
    const { engine } = makeStub();
    const code = await withExitCapture(() => runSources(engine, ['add']));
    expect(code).toBe(2);
  });

  test('rejects uppercase / invalid chars in id', async () => {
    const { engine } = makeStub();
    await expect(runSources(engine, ['add', 'BadId', '--path', '/tmp/x'])).rejects.toThrow(/Invalid source id/);
  });

  test('rejects id longer than 32 chars', async () => {
    const { engine } = makeStub();
    const long = 'a'.repeat(33);
    await expect(runSources(engine, ['add', long, '--path', '/tmp/x'])).rejects.toThrow(/Invalid source id/);
  });

  test('inserts a valid source with defaults (federated unset → isolated)', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [{
        id: 'gstack',
        name: 'gstack',
        local_path: '/tmp/gstack',
        last_commit: null,
        last_sync_at: null,
        config: '{}',
        created_at: new Date(),
      }],
    });
    await runSources(engine, ['add', 'gstack', '--path', '/tmp/gstack']);
    const insert = calls.find(c => c.sql.includes('INSERT INTO sources'));
    expect(insert).toBeDefined();
    expect(insert!.params[0]).toBe('gstack');
    expect(insert!.params[1]).toBe('gstack'); // name defaults to id
    expect(insert!.params[2]).toBe('/tmp/gstack');
    expect(insert!.params[3]).toBe('{}'); // federated unset → empty config
  });

  test('--federated sets config.federated = true', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [{
        id: 'wiki',
        name: 'wiki',
        local_path: '/tmp/wiki',
        last_commit: null,
        last_sync_at: null,
        config: '{"federated":true}',
        created_at: new Date(),
      }],
    });
    await runSources(engine, ['add', 'wiki', '--path', '/tmp/wiki', '--federated']);
    const insert = calls.find(c => c.sql.includes('INSERT INTO sources'));
    expect(insert!.params[3]).toBe('{"federated":true}');
  });

  test('--no-federated sets config.federated = false (isolation opt-in)', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [{
        id: 'yc-media',
        name: 'yc-media',
        local_path: '/tmp/yc',
        last_commit: null,
        last_sync_at: null,
        config: '{"federated":false}',
        created_at: new Date(),
      }],
    });
    await runSources(engine, ['add', 'yc-media', '--path', '/tmp/yc', '--no-federated']);
    const insert = calls.find(c => c.sql.includes('INSERT INTO sources'));
    expect(insert!.params[3]).toBe('{"federated":false}');
  });

  test('rejects overlapping paths (per eng review finding 4.1)', async () => {
    const { engine } = makeStub({
      'SELECT id, local_path FROM sources WHERE local_path': [
        { id: 'gstack', local_path: '/tmp/gstack' },
      ],
    });
    // New source at /tmp/gstack/plans is inside existing gstack at /tmp/gstack.
    await expect(runSources(engine, ['add', 'plans', '--path', '/tmp/gstack/plans']))
      .rejects.toThrow(/overlaps with existing source "gstack"/);
  });
});

// ── list ────────────────────────────────────────────────────

describe('sources list', () => {
  test('orders default source first, then alphabetical', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [
        { id: 'default', name: 'default', local_path: null, last_commit: null, last_sync_at: null, config: '{"federated":true}', created_at: new Date() },
      ],
      'COUNT(*)::int AS n FROM pages': [{ n: 0 }],
    });
    await runSources(engine, ['list']);
    const select = calls.find(c => c.sql.includes('ORDER BY (id = \'default\') DESC'));
    expect(select).toBeDefined();
  });
});

// ── remove ──────────────────────────────────────────────────

describe('sources remove', () => {
  test("refuses to remove the 'default' source", async () => {
    const { engine } = makeStub();
    const code = await withExitCapture(() => runSources(engine, ['remove', 'default', '--yes']));
    expect(code).toBe(3);
  });

  test('refuses without --yes', async () => {
    const { engine } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [
        { id: 'gstack', name: 'gstack', local_path: '/tmp/g', last_commit: null, last_sync_at: null, config: '{}', created_at: new Date() },
      ],
      'COUNT(*)::int AS n FROM pages': [{ n: 10 }],
    });
    const code = await withExitCapture(() => runSources(engine, ['remove', 'gstack']));
    expect(code).toBe(5);
  });

  test('--dry-run reports but does not DELETE', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [
        { id: 'gstack', name: 'gstack', local_path: '/tmp/g', last_commit: null, last_sync_at: null, config: '{}', created_at: new Date() },
      ],
      'COUNT(*)::int AS n FROM pages': [{ n: 10 }],
    });
    await runSources(engine, ['remove', 'gstack', '--dry-run']);
    const del = calls.find(c => c.sql.startsWith('DELETE FROM sources'));
    expect(del).toBeUndefined();
  });
});

// ── default ─────────────────────────────────────────────────

describe('sources default', () => {
  test("stores id in config key 'sources.default'", async () => {
    const { engine, configSet } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [
        { id: 'gstack', name: 'gstack', local_path: null, last_commit: null, last_sync_at: null, config: '{}', created_at: new Date() },
      ],
    });
    await runSources(engine, ['default', 'gstack']);
    expect(configSet).toEqual([{ key: 'sources.default', value: 'gstack' }]);
  });
});

// ── federate / unfederate ──────────────────────────────────

describe('sources federate / unfederate', () => {
  test('federate sets config.federated = true', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [
        { id: 'gstack', name: 'gstack', local_path: null, last_commit: null, last_sync_at: null, config: '{}', created_at: new Date() },
      ],
    });
    await runSources(engine, ['federate', 'gstack']);
    const upd = calls.find(c => c.sql.includes('UPDATE sources SET config'));
    expect(upd).toBeDefined();
    expect(JSON.parse(upd!.params[0] as string)).toEqual({ federated: true });
  });

  test('unfederate preserves other config keys', async () => {
    const { engine, calls } = makeStub({
      'SELECT id, name, local_path, last_commit, last_sync_at, config, created_at': [
        { id: 'gstack', name: 'gstack', local_path: null, last_commit: null, last_sync_at: null, config: '{"ttl_days":90,"federated":true}', created_at: new Date() },
      ],
    });
    await runSources(engine, ['unfederate', 'gstack']);
    const upd = calls.find(c => c.sql.includes('UPDATE sources SET config'));
    const parsed = JSON.parse(upd!.params[0] as string);
    // Must preserve ttl_days while flipping federated.
    expect(parsed.ttl_days).toBe(90);
    expect(parsed.federated).toBe(false);
  });
});
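The `sources` tests above pin down three behaviours without showing the code behind them: id validation, rejection of overlapping paths, and a federate/unfederate toggle that keeps unrelated config keys. The helpers below are a minimal sketch that would satisfy those assertions. The names `isValidSourceId`, `pathsOverlap`, and `withFederated` are hypothetical, and the exact id pattern is an assumption: the tests only establish that uppercase ids and ids longer than 32 characters are rejected. The real `runSources` may be structured quite differently.

```typescript
// Assumed id rule: lowercase alphanumerics and hyphens, 1 to 32 characters.
// Only "no uppercase" and "max 32 chars" are confirmed by the tests.
const SOURCE_ID_RE = /^[a-z0-9][a-z0-9-]{0,31}$/;

function isValidSourceId(id: string): boolean {
  return SOURCE_ID_RE.test(id);
}

// Drop trailing slashes so '/tmp/gstack/' and '/tmp/gstack' compare equal.
function stripTrailingSlash(p: string): string {
  const s = p.replace(/\/+$/, '');
  return s === '' ? '/' : s;
}

// True when `child` is `parent` itself or lives somewhere beneath it.
// Purely textual: does not resolve '..' segments or symlinks.
function contains(parent: string, child: string): boolean {
  const p = stripTrailingSlash(parent);
  const c = stripTrailingSlash(child);
  return c === p || c.startsWith(p === '/' ? '/' : p + '/');
}

// Two source roots overlap when either one contains the other.
function pathsOverlap(a: string, b: string): boolean {
  return contains(a, b) || contains(b, a);
}

// Flip the `federated` flag while preserving every other key in the config.
function withFederated(configJson: string, federated: boolean): string {
  const config = JSON.parse(configJson) as Record<string, unknown>;
  return JSON.stringify({ ...config, federated });
}
```

Comparing on a `/` boundary rather than a bare prefix keeps sibling directories such as `/tmp/gstack` and `/tmp/gstack2` from being reported as overlapping.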
213
test/storage-backfill.test.ts
Normal file
@@ -0,0 +1,213 @@
/**
 * v0.18.0 Step 7 — file_migration_ledger state-machine unit tests.
 *
 * No real storage — we stub a StorageBackend that records every
 * call so we can assert the crash-point recovery semantics without
 * touching S3/Supabase.
 */

import { describe, test, expect } from 'bun:test';
import { runStorageBackfill } from '../src/commands/migrations/v0_18_0-storage-backfill.ts';
import type { BrainEngine } from '../src/core/engine.ts';
import type { StorageBackend } from '../src/core/storage.ts';

interface StubLedgerRow {
  file_id: number;
  storage_path_old: string;
  storage_path_new: string;
  status: 'pending' | 'copy_done' | 'db_updated' | 'complete' | 'failed';
  error?: string | null;
}

function makeEngine(initial: StubLedgerRow[]): { engine: BrainEngine; rows: StubLedgerRow[]; filePaths: Map<number, string> } {
  const rows: StubLedgerRow[] = initial.map(r => ({ ...r }));
  const filePaths = new Map<number, string>(); // file_id → current storage_path

  const executeRaw = async <T>(sql: string, params?: unknown[]): Promise<T[]> => {
    const up = sql.trim().toUpperCase();
    // Read ledger
    if (up.startsWith('SELECT FILE_ID')) {
      return rows.map(r => ({ ...r })) as unknown as T[];
    }
    // UPDATE ledger SET status = 'copy_done'
    if (sql.includes("SET status = 'copy_done'")) {
      const row = rows.find(r => r.file_id === params?.[0]);
      if (row) row.status = 'copy_done';
      return [];
    }
    if (sql.includes("SET status = 'db_updated'")) {
      const row = rows.find(r => r.file_id === params?.[0]);
      if (row) row.status = 'db_updated';
      return [];
    }
    if (sql.includes("SET status = 'complete'")) {
      const row = rows.find(r => r.file_id === params?.[0]);
      if (row) row.status = 'complete';
      return [];
    }
    if (sql.includes('SET status = $1') && sql.includes("'failed'")) {
      // Older form with parametric status
      return [];
    }
    if (sql.includes("SET status = 'failed'")) {
      const row = rows.find(r => r.file_id === params?.[1]);
      if (row) { row.status = 'failed'; row.error = params?.[0] as string; }
      return [];
    }
    // UPDATE files SET storage_path = $1 WHERE id = $2
    if (up.startsWith('UPDATE FILES')) {
      filePaths.set(params?.[1] as number, params?.[0] as string);
      return [];
    }
    return [];
  };

  const engine = { kind: 'postgres' as const, executeRaw } as unknown as BrainEngine;
  return { engine, rows, filePaths };
}

function makeStorage(): { storage: StorageBackend; calls: string[] } {
  const calls: string[] = [];
  const uploaded = new Set<string>();
  const storage: StorageBackend = {
    upload: async (path: string) => { calls.push(`upload:${path}`); uploaded.add(path); },
    download: async (path: string) => { calls.push(`download:${path}`); return Buffer.from('content-for:' + path); },
    delete: async (path: string) => { calls.push(`delete:${path}`); uploaded.delete(path); },
    exists: async (path: string) => { calls.push(`exists:${path}`); return uploaded.has(path); },
    list: async () => [],
    getUrl: async (p) => `https://test/${p}`,
  };
  return { storage, calls };
}

describe('runStorageBackfill — happy path', () => {
  test('advances pending → copy_done → db_updated → complete', async () => {
    const { engine, rows, filePaths } = makeEngine([
      { file_id: 1, storage_path_old: 'slug/foo.pdf', storage_path_new: 'default/slug/foo.pdf', status: 'pending' },
    ]);
    const { storage, calls } = makeStorage();

    const report = await runStorageBackfill(engine, storage);

    expect(report.total).toBe(1);
    expect(report.nowComplete).toBe(1);
    expect(report.failed).toBe(0);
    expect(rows[0].status).toBe('complete');
    expect(filePaths.get(1)).toBe('default/slug/foo.pdf');
    // Storage operations: exists-check then download + upload (no delete yet,
    // old objects preserved for soak window).
    expect(calls.filter(c => c.startsWith('download:'))).toEqual(['download:slug/foo.pdf']);
    expect(calls.filter(c => c.startsWith('upload:'))).toEqual(['upload:default/slug/foo.pdf']);
    expect(calls.filter(c => c.startsWith('delete:'))).toEqual([]);
  });
});

describe('runStorageBackfill — crash-point recovery (per Codex second pass)', () => {
  test('resumes from copy_done (crash AFTER copy, BEFORE DB update)', async () => {
    const { engine, rows, filePaths } = makeEngine([
      { file_id: 1, storage_path_old: 'slug/a.pdf', storage_path_new: 'default/slug/a.pdf', status: 'copy_done' },
    ]);
    const { storage, calls } = makeStorage();

    const report = await runStorageBackfill(engine, storage);

    expect(report.nowComplete).toBe(1);
    expect(rows[0].status).toBe('complete');
    expect(filePaths.get(1)).toBe('default/slug/a.pdf');
    // Should NOT re-download/re-upload — already in copy_done state.
    expect(calls.filter(c => c.startsWith('download:'))).toEqual([]);
    expect(calls.filter(c => c.startsWith('upload:'))).toEqual([]);
  });

  test('resumes from db_updated (crash AFTER DB update, BEFORE ledger mark)', async () => {
    const { engine, rows } = makeEngine([
      { file_id: 1, storage_path_old: 'slug/b.pdf', storage_path_new: 'default/slug/b.pdf', status: 'db_updated' },
    ]);
    const { storage, calls } = makeStorage();

    const report = await runStorageBackfill(engine, storage);

    expect(report.nowComplete).toBe(1);
    expect(rows[0].status).toBe('complete');
    // No copy, no db update — only the final mark.
    expect(calls).toEqual([]);
  });

  test('already-complete rows are skipped without storage calls', async () => {
    const { engine, rows } = makeEngine([
      { file_id: 1, storage_path_old: 'x', storage_path_new: 'default/x', status: 'complete' },
    ]);
    const { storage, calls } = makeStorage();

    const report = await runStorageBackfill(engine, storage);

    expect(report.alreadyComplete).toBe(1);
    expect(report.nowComplete).toBe(0);
    expect(rows[0].status).toBe('complete');
    expect(calls).toEqual([]);
  });

  test('failed rows stay failed and do NOT auto-retry', async () => {
    const { engine, rows } = makeEngine([
      { file_id: 1, storage_path_old: 'x', storage_path_new: 'default/x', status: 'failed', error: 'previous failure' },
    ]);
    const { storage, calls } = makeStorage();

    const report = await runStorageBackfill(engine, storage);

    expect(report.failed).toBe(1);
    expect(report.nowComplete).toBe(0);
    expect(rows[0].status).toBe('failed');
    expect(calls).toEqual([]);
  });
});

describe('runStorageBackfill — idempotence + dry-run', () => {
  test('upload already-exists check skips redundant upload on re-run', async () => {
    const { engine } = makeEngine([
      { file_id: 1, storage_path_old: 'x', storage_path_new: 'default/x', status: 'pending' },
    ]);
    const { storage, calls } = makeStorage();
    // Mark the new path as already existing (simulates a prior partial run
    // where upload landed but ledger didn't get updated).
    await storage.upload('default/x', Buffer.from('x'));
    calls.length = 0;

    await runStorageBackfill(engine, storage);

    // Exists check ran, but no new download or upload since the
    // destination already has the object.
    expect(calls.some(c => c === 'exists:default/x')).toBe(true);
    expect(calls.some(c => c.startsWith('download:'))).toBe(false);
    expect(calls.some(c => c.startsWith('upload:'))).toBe(false);
  });

  test('dry-run mode reports skipped count, does not mutate', async () => {
    const { engine, rows } = makeEngine([
      { file_id: 1, storage_path_old: 'x', storage_path_new: 'default/x', status: 'pending' },
      { file_id: 2, storage_path_old: 'y', storage_path_new: 'default/y', status: 'pending' },
    ]);

    const report = await runStorageBackfill(engine, null, { dryRun: true });

    expect(report.total).toBe(2);
    expect(report.skipped).toBe(2);
    expect(report.nowComplete).toBe(0);
    // Rows still pending.
    expect(rows.every(r => r.status === 'pending')).toBe(true);
  });

  test('re-running a completed ledger is a no-op with zero side effects', async () => {
    const { engine } = makeEngine([
      { file_id: 1, storage_path_old: 'x', storage_path_new: 'default/x', status: 'complete' },
      { file_id: 2, storage_path_old: 'y', storage_path_new: 'default/y', status: 'complete' },
    ]);
    const { storage, calls } = makeStorage();

    const report = await runStorageBackfill(engine, storage);

    expect(report.alreadyComplete).toBe(2);
    expect(report.nowComplete).toBe(0);
    expect(calls).toEqual([]);
  });
});
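The recovery tests above imply a forward-only state machine over the ledger's `status` column: a row resumes from wherever it stopped, `complete` rows are skipped, and `failed` rows are left alone. One way to picture that is as a pure function from the current status to the work still outstanding. This is only a sketch of the behaviour the assertions describe; `remainingSteps` and the `Step` names are hypothetical, and `runStorageBackfill` itself is not shown in this diff.

```typescript
type LedgerStatus = 'pending' | 'copy_done' | 'db_updated' | 'complete' | 'failed';

// Hypothetical step names for the three side effects the tests observe:
// copying the object, updating files.storage_path, and marking the ledger row.
type Step = 'copy' | 'update_db' | 'mark_complete';

function remainingSteps(status: LedgerStatus, destinationExists: boolean): Step[] {
  switch (status) {
    case 'pending':
      // A prior run may have uploaded the object before it crashed, so the
      // copy is skipped when the destination is already present.
      return destinationExists
        ? ['update_db', 'mark_complete']
        : ['copy', 'update_db', 'mark_complete'];
    case 'copy_done':
      return ['update_db', 'mark_complete'];
    case 'db_updated':
      return ['mark_complete'];
    case 'complete':
    case 'failed':
      // Terminal for this run: complete rows are done, failed rows are not retried.
      return [];
  }
}
```

None of the steps is a delete, which matches the happy-path assertion that old objects are kept for the soak window.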