gbrain/test/e2e/multi-source.test.ts
Garry Tan 90c5d93fce feat: v0.18.0 — multi-source brains (one DB, many repos, federation + dotfile resolution) (#337)
* feat(v0.17.0 step 1/9): sources primitive — additive-only multi-source foundation

Lane A of the multi-repo plan. Installs the sources table and seeds a
'default' row that inherits sync.repo_path/last_commit from existing
config. This is the bisectable foundation every later step builds on;
the breaking schema changes (composite UNIQUE, files FK rewrite,
resolution_type, ingest_log.source_id) land with their paired code
rewrites in Steps 2/4/5/7 so no single commit breaks the engine.

- migration v16 (sources_table_additive) + v0_17_0 orchestrator skeleton
- sort-by-version guard in runMigrations (array insertion order can
  never cause a later migration to skip a lower one again)
- default source seeded with config '{"federated": true}' so pre-v0.17
  brains keep single-namespace search semantics after upgrade
- orchestrator phase B detects absence of file_migration_ledger and
  no-ops until Step 7 lands it
- 8 new structural tests in test/migrate.test.ts (shape, idempotency,
  scope-guard that nothing else was smuggled into v16)
- apply-migrations tests include v0.17.0 in the registered list
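
The sort-by-version guard can be pictured roughly as follows. This is a minimal sketch, not the actual runMigrations code: the `Migration` shape and the `pendingMigrations` helper are assumptions made for illustration, and only the two migration names come from this commit series.

```typescript
// Hypothetical shape for illustration; not the real gbrain migration type.
interface Migration {
  version: number;
  name: string;
}

// Return the migrations that still need to run, in ascending version order,
// regardless of the order in which they were registered.
function pendingMigrations(all: Migration[], currentVersion: number): Migration[] {
  return [...all]
    .sort((a, b) => a.version - b.version)
    .filter((m) => m.version > currentVersion);
}

// Registered out of order on purpose.
const registered: Migration[] = [
  { version: 17, name: "pages_source_id_composite_unique" },
  { version: 16, name: "sources_table_additive" },
];

const plan = pendingMigrations(registered, 15).map((m) => m.version);
console.log(plan); // [ 16, 17 ]
```

Sorting a copy before filtering means a registry that lists v17 ahead of v16 still applies v16 first, so bumping the stored version to 17 can no longer cause v16 to be skipped.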

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.17.0 step 2/9): pages.source_id + composite UNIQUE (Lane B)

Migration v17 adds pages.source_id with DEFAULT 'default' and swaps the
global UNIQUE(slug) for composite UNIQUE(source_id, slug). Ships atomically
with the engine's ON CONFLICT rewrite so the constraint swap and the code
that writes under it land in the same commit — no window where the engine
sees one shape and the schema has another.

Minimum-surface engine change: only putPage's ON CONFLICT target needs
re-targeting. Other slug-based queries work unchanged because single-
source brains (the only brain shape pre-Step-5) have exactly one source
'default', so slug remains effectively unique within it. Step 5+ will
surface an explicit sourceId param on putPage for cross-source sync.

- migration v17 (pages_source_id_composite_unique) in src/core/migrate.ts
- pages.source_id + composite UNIQUE added to schema.sql + pglite-schema.ts
  for fresh installs
- ON CONFLICT (slug) → ON CONFLICT (source_id, slug) in both pglite-engine
  and postgres-engine putPage
- DEFAULT 'default' closes the Codex-flagged race where an INSERT between
  ADD COLUMN and SET NOT NULL could leave source_id NULL
- 5 new v17 structural tests (29 pass / 0 fail in migrate.test.ts)
- Full suite: 1979 pass / 3 fail (same as baseline — no regressions)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.17.0 step 6/9): sources CLI + source-resolver (Lane C)

Adds the CLI surface for multi-source management. Users can now register,
list, rename, federate/unfederate, and attach-to-directory a source. The
source-resolver is the shared 6-priority helper that Steps 4/5 will use
when they start surfacing an explicit --source flag on sync/extract/query.

Commands:
  gbrain sources add <id> --path <p> [--name <n>] [--federated|--no-federated]
  gbrain sources list [--json]
  gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
  gbrain sources rename <id> <new-name>
  gbrain sources default <id>
  gbrain sources attach <id>   — writes .gbrain-source in CWD
  gbrain sources detach
  gbrain sources federate <id> / unfederate <id>

Resolution priority (source-resolver.ts) — highest first:
  1. --source flag  2. GBRAIN_SOURCE env  3. .gbrain-source dotfile walk-up
  4. longest-prefix match on registered local_path (Codex #2 fix)
  5. sources.default config  6. fallback 'default'
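
The priority chain above can be sketched as a pure function. This is an illustration under stated assumptions, not the code in source-resolver.ts: `ResolveInputs`, `longestPrefixMatch`, and `resolveSourceIdSketch` are hypothetical names, and each level's lookup (flag parsing, env read, dotfile walk-up) is assumed to have happened already.

```typescript
// Hypothetical input shape: each field holds the already-looked-up value for
// one priority level, or is absent when that level yields nothing.
interface ResolveInputs {
  flag?: string; // value of --source
  env?: string; // value of GBRAIN_SOURCE
  dotfile?: string; // id read from the nearest .gbrain-source
  cwd: string;
  registered: { id: string; localPath: string }[];
  configDefault?: string; // sources.default config
}

// Pick the registered source whose local path is the longest ancestor of cwd.
function longestPrefixMatch(
  cwd: string,
  registered: { id: string; localPath: string }[],
): string | undefined {
  let bestId: string | undefined;
  let bestLen = -1;
  for (const s of registered) {
    const root = s.localPath.replace(/\/+$/, ""); // drop trailing slashes
    const inside = cwd === root || cwd.startsWith(root + "/");
    if (inside && root.length > bestLen) {
      bestId = s.id;
      bestLen = root.length;
    }
  }
  return bestId;
}

// Walk the six levels, highest priority first. Empty strings count as unset.
function resolveSourceIdSketch(i: ResolveInputs): string {
  return (
    i.flag ||
    i.env ||
    i.dotfile ||
    longestPrefixMatch(i.cwd, i.registered) ||
    i.configDefault ||
    "default"
  );
}

const registry = [{ id: "wiki", localPath: "/repos/wiki" }];
console.log(resolveSourceIdSketch({ flag: "gstack", env: "wiki", cwd: "/tmp", registered: registry })); // gstack
console.log(resolveSourceIdSketch({ cwd: "/repos/wiki/notes", registered: registry })); // wiki
console.log(resolveSourceIdSketch({ cwd: "/tmp", registered: registry })); // default
```

The prefix check compares against `root + "/"` so that `/repos/wikipedia` does not match a source registered at `/repos/wiki`.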

- add: validates id format (kebab-case alnum, 1-32), rejects overlapping
  paths (eng review §4 finding 4.1), supports federated default opt-in
- remove: guards against --yes omission + refuses to remove 'default',
  supports --dry-run, reports cascade page count
- attach/detach: matches kubectl/terraform context-pinning semantics
- Throws on overlap rather than process.exit() so the CLI error wrapper
  reports it consistently (also makes unit testing clean)

28 new tests across sources.test.ts (dispatcher + validation + overlap
guard) and source-resolver.test.ts (full 6-priority coverage including
longest-prefix). Full suite: 2012 pass / 3 fail (pre-existing PGLite
infra timeouts).

NOT in scope for Step 6 (deferred):
  - import-from-github (SSRF + clone integration)
  - prune (retention/TTL, lands v0.18)
  - MCP tool-defs regen for source-scoping on read ops (Step 5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.17.0 step 8/9): getting-started guide + migration skill + citation rule

Step 8 (Lane F) documents what Steps 1+2+6 have shipped and sets up
the agent-facing rules for multi-source.

New files:
- skills/migrations/v0.17.0.md — migration skill read by host agents
  after `gbrain apply-migrations`. Covers the v16+v17 chain, what's
  in v0.17.0 vs what lands later (v0.17.1 ACL, v0.18 sessions), and
  the new sources CLI surface. Cites docs/guides/multi-source-brains.md
  as the recipe.
- docs/guides/multi-source-brains.md — getting-started for end users.
  Three canonical scenarios (unified wiki+gstack / purpose-separated
  yc-media+garrys-list / mixed), full resolution priority, federation
  flag semantics, command reference, and citation format.

skills/brain-ops/SKILL.md — new "Cross-source citation format"
section mandating `[source-id:slug]` when the brain has multiple
sources. Matches the contract the /plan-devex-review DX review
pinned down (DX Finding 5: surface source_id in every page payload
+ citation contract). Key must be sources.id (immutable), never
sources.name.

No behavior change — this is pure documentation for what already
exists in the binary. 144 skills conformance tests still pass.

NOT in this commit (deferred to later steps):
- docs/guides/repo-architecture.md rewrite (lands with the full
  v0.17.0 PR description + release notes)
- skills/_brain-filing-rules.md "which source to file into"
  guidance (lands with Step 5 when sync surfaces --source)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.17.0 step 5/9): sync --source <id> routes through sources table (Lane D)

Adds the --source flag to `gbrain sync`. When set, sync reads local_path
+ last_commit from the matching sources(id) row instead of the global
sync.repo_path / sync.last_commit config keys, and writes last_commit +
last_sync_at back to the same row. Backward compat: --source omitted =
pre-v0.17 behavior exactly, global config path unchanged.

- SyncOpts.sourceId threaded through performSync + performFullSync
- readSyncAnchor/writeSyncAnchor helpers centralize the sources-vs-config
  branch so every read/write goes through one decision point. Makes
  Step 5's later per-source sync-failures tracking a one-file change.
- --source resolved via src/core/source-resolver.ts (Step 6), so any
  command that shell-exposes resolveSourceId gets env var + dotfile
  walk-up + longest-prefix for free.
- Error message for missing source local_path is actionable:
    Source "gstack" has no local_path. Run: gbrain sources add gstack --path <path>
- last_sync_at auto-updates on every last_commit advance so `gbrain
  sources list` shows real recency.
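
The single decision point described above might look roughly like this. It is a sketch over in-memory maps, not the real helpers: `SourceRow`, `SyncAnchor`, and the `...Sketch` function names are hypothetical, and the "not registered" error wording is an assumption. The config keys and the missing-local_path message are taken from this commit message.

```typescript
// Hypothetical in-memory stand-ins for the sources table and the config store.
interface SourceRow {
  id: string;
  local_path: string | null;
  last_commit: string | null;
  last_sync_at: string | null;
}

interface SyncAnchor {
  repoPath: string | null;
  lastCommit: string | null;
}

function readSyncAnchorSketch(
  sources: Map<string, SourceRow>,
  config: Map<string, string>,
  sourceId?: string,
): SyncAnchor {
  // No --source: use the global config keys, as before v0.17.
  if (sourceId === undefined) {
    return {
      repoPath: config.get("sync.repo_path") ?? null,
      lastCommit: config.get("sync.last_commit") ?? null,
    };
  }
  const row = sources.get(sourceId);
  if (!row) throw new Error(`Source "${sourceId}" is not registered.`); // assumed wording
  if (!row.local_path) {
    throw new Error(
      `Source "${sourceId}" has no local_path. Run: gbrain sources add ${sourceId} --path <path>`,
    );
  }
  return { repoPath: row.local_path, lastCommit: row.last_commit };
}

function writeSyncAnchorSketch(
  sources: Map<string, SourceRow>,
  config: Map<string, string>,
  commit: string,
  sourceId?: string,
  now: Date = new Date(),
): void {
  if (sourceId === undefined) {
    config.set("sync.last_commit", commit);
    return;
  }
  const row = sources.get(sourceId);
  if (!row) throw new Error(`Source "${sourceId}" is not registered.`); // assumed wording
  // last_sync_at moves together with last_commit.
  row.last_commit = commit;
  row.last_sync_at = now.toISOString();
}
```

Because both helpers branch on `sourceId` in one place, callers never need to know whether the anchor lives in a sources row or in global config.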

No regression: 2012 pass / 3 fail (same as baseline).

NOT in this commit (deferred per plan):
- Per-source failure tracking (~/.gbrain/sources/<id>/sync-failures.jsonl)
- runImport source-awareness (import.ts path — Step 5 continuation)
- Partial-success semantics when walking N sources — single-source flow
  today, multi-walk lands when the top-level `gbrain sync` without
  --source starts iterating all sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.17.0 step 4/9): qualified [[source:slug]] + links.resolution_type (Lane B)

Adds source-pinned wikilink syntax and records the resolution kind on
each edge so `gbrain extract --refresh-unqualified` (future) can
re-resolve bare references when the source topology changes.

Wikilink syntax extension:
  [[concepts/ai]]             — unqualified; resolves via local-first fallback
  [[wiki:concepts/ai]]        — qualified; target pinned to sources.id='wiki'
  [[gstack:projects/foo|Display]]  — qualified + display name

The qualified regex runs first and masks matched spans so the
unqualified pass can't double-emit. Source id format enforced to match
the sources CLI validation: [a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?
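
A two-pass extractor along these lines would behave as described. This is a hedged sketch rather than the shipped link-extraction code: `LinkRef` and `extractWikilinksSketch` are hypothetical names, and the unqualified pattern is an assumption chosen to show why masking matters. Only the source-id pattern is copied from above.

```typescript
// Hypothetical result shape; sourceId is present only on qualified links.
interface LinkRef {
  slug: string;
  sourceId?: string;
  display?: string;
}

// Source-id pattern as quoted in the commit message.
const SOURCE_ID = "[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?";

function extractWikilinksSketch(text: string): LinkRef[] {
  const refs: LinkRef[] = [];

  // Pass 1: qualified links, [[source:slug]] or [[source:slug|Display]].
  // Each match is replaced by spaces of the same length so that offsets stay
  // stable and pass 2 cannot see it again.
  const qualified = new RegExp(
    `\\[\\[(${SOURCE_ID}):([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]`,
    "g",
  );
  const masked = text.replace(
    qualified,
    (match: string, sourceId: string, slug: string, display?: string) => {
      refs.push(display ? { sourceId, slug, display } : { sourceId, slug });
      return " ".repeat(match.length);
    },
  );

  // Pass 2: unqualified links. This assumed pattern would also accept
  // "wiki:concepts/ai" as a slug, which is exactly what the masking prevents.
  const unqualified = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
  let m: RegExpExecArray | null;
  while ((m = unqualified.exec(masked)) !== null) {
    refs.push(m[2] ? { slug: m[1], display: m[2] } : { slug: m[1] });
  }

  // Note: qualified refs come first, so the result is not in document order.
  return refs;
}

console.log(
  extractWikilinksSketch("See [[concepts/ai]], [[wiki:concepts/ai]], [[gstack:projects/foo|Display]]."),
);
```

Without the mask, the second pass in this sketch would emit `wiki:concepts/ai` a second time as a bare slug.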

Schema:
- migration v18 adds links.resolution_type TEXT with CHECK constraint
  ('qualified'|'unqualified' or NULL for legacy/manual/frontmatter edges)
- schema.sql + pglite-schema.ts updated for fresh installs

EntityRef type:
- sourceId is OPTIONAL (only set on qualified wikilinks). Markdown
  [Name](path) links and unqualified wikilinks omit it, so the strict
  toEqual tests written before v0.17 keep working (69 existing tests
  still pass).

Tests:
- 5 new qualified-wikilink extraction tests + 1 migration v18 structural
  assertion. 75 tests in test/link-extraction.test.ts (up from 69).
- Full suite: 2018 pass / 3 fail (pre-existing PGLite infra timeouts).

NOT in this commit (deferred to Step 3 / Step 5 continuation):
- Writing resolution_type to the DB (addLink / addLinksBatch don't
  carry the field yet — that's the plumb-through that lands with
  Step 3 when search/dedup also needs source-aware result keys).
- `gbrain extract --refresh-unqualified` re-resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.17.0 step 3/9): source-aware search dedup composite keys (Lane B)

Search dedup now keys on (source_id, slug) instead of slug alone. Keying
on slug alone, as the pre-v0.17 pipeline did, would collapse two
same-slug pages in different sources into one result, destroying
cross-source recall. Codex outside-voice review flagged this as
regression-critical; this commit ships the fix plus tests that lock
the invariant in.

Dedup pipeline (src/core/search/dedup.ts):
- pageKey(r) helper — one canonical composite-key derivation. Falls
  back to source_id='default' for pre-v0.17 rows so single-source
  brains behave identically to before.
- Layer 1 (dedupBySource): group-by composite key.
- Layer 4 (capPerPage): count-by composite key.
- guaranteeCompiledTruth: swap scoped to matching (source_id, slug),
  so wiki:topics/ai can't accidentally pull gstack:topics/ai's
  compiled_truth chunk.
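
The composite-key idea can be illustrated with a small sketch. The `SearchHit` shape, the key format, and `dedupByPageSketch` are assumptions for illustration; the real dedup.ts has more layers than this.

```typescript
// Hypothetical result shape; source_id is optional for pre-v0.17 rows.
interface SearchHit {
  slug: string;
  source_id?: string;
  score: number;
}

// One canonical key derivation. Source ids cannot contain ':', so the first
// ':' always separates the source id from the slug.
function pageKey(r: { slug: string; source_id?: string }): string {
  return `${r.source_id ?? "default"}:${r.slug}`;
}

// Keep the highest-scoring hit per (source_id, slug).
function dedupByPageSketch(hits: SearchHit[]): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const h of hits) {
    const key = pageKey(h);
    const current = best.get(key);
    if (!current || h.score > current.score) best.set(key, h);
  }
  return Array.from(best.values());
}

const hits: SearchHit[] = [
  { slug: "topics/ai", source_id: "wiki", score: 0.9 },
  { slug: "topics/ai", source_id: "gstack", score: 0.8 },
  { slug: "topics/ai", score: 0.5 }, // legacy row, treated as 'default'
  { slug: "topics/ai", source_id: "default", score: 0.7 },
];
console.log(dedupByPageSketch(hits).map(pageKey));
// [ 'wiki:topics/ai', 'gstack:topics/ai', 'default:topics/ai' ]
```

The wiki and gstack pages survive side by side, while the legacy row and the explicit 'default' row still collapse into one.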

SearchResult type gains optional source_id — populated by SQL JOINs
in both engines, falls through as 'default' for legacy callers.

Engine SQL:
- pglite-engine.ts + postgres-engine.ts: search SELECTs add p.source_id
- rowToSearchResult (utils.ts): maps row.source_id → result.source_id
  when present. Shape stays backward compatible (field optional).

Tests — 4 new in test/dedup.test.ts:
- same-slug-different-source does NOT collapse (the critical regression
  guard Codex called out)
- same-slug-same-source DOES still collapse (no over-correction)
- missing source_id falls back to 'default' for pre-v0.17 compat
- compiled_truth guarantee scopes to composite key (Codex second pass
  caught this specific path would leak otherwise)

Full suite: 2022 pass / 3 fail (3 pre-existing PGLite infra timeouts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.17.0 step 7/9): file_migration_ledger + phase-B storage backfill (Lane E)

Adds files.source_id + files.page_id + the file_migration_ledger
state machine that drives storage object rewrites. Each per-file
transition is its own transaction so crash-point recovery is a
ledger read, not a filesystem inspection. Codex second-pass review
flagged that "skip if already has source prefix" was an unsafe
heuristic — the ledger replaces it with explicit state tracking.

Schema:
- migration v19 (files_source_id_page_id_ledger): handler-only
  (PGLite has no files table; Postgres-only gate). ADDs
  source_id + page_id to files, backfills page_id from page_slug
  scoped to source_id='default', creates file_migration_ledger
  with PK on file_id (Codex: not storage_path_old — two sources
  can share an old path during migration).
- schema.sql updated for fresh Postgres installs; file_migration_ledger
  gets RLS alongside other tables.

Runtime:
- src/commands/migrations/v0_17_0-storage-backfill.ts: drives the
  ledger state machine pending → copy_done → db_updated → complete.
  Idempotent per row: re-running resumes from whichever state
  crashed. Old objects preserved (no delete) so operators can
  verify the soak window before a future cleanup release.
- phase B in v0_17_0.ts orchestrator: wires the storage backend
  (Supabase/S3/local) through createStorage, runs runStorageBackfill,
  reports per-state counts + first-three error details.
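
A per-row resume loop for the ledger could be sketched like this. It is not the runStorageBackfill implementation: `LedgerRow`, `BackfillDeps`, and `advanceRow` are hypothetical, and the hooks are synchronous only to keep the example short (real storage and DB calls would be async).

```typescript
type LedgerStatus = "pending" | "copy_done" | "db_updated" | "complete" | "failed";

interface LedgerRow {
  fileId: number;
  oldPath: string;
  newPath: string;
  status: LedgerStatus;
}

// Hypothetical side-effect hooks.
interface BackfillDeps {
  objectExists(path: string): boolean;
  copyObject(from: string, to: string): void;
  updateFilePath(fileId: number, newPath: string): void;
  saveStatus(row: LedgerRow): void; // stands in for one transaction per transition
}

// Advance one row from whatever state it was left in.
function advanceRow(row: LedgerRow, deps: BackfillDeps): LedgerStatus {
  // Complete rows are skipped; failed rows are not retried automatically.
  if (row.status === "complete" || row.status === "failed") return row.status;
  try {
    if (row.status === "pending") {
      // Exists-check keeps a re-run from uploading the same object twice.
      if (!deps.objectExists(row.newPath)) deps.copyObject(row.oldPath, row.newPath);
      row.status = "copy_done";
      deps.saveStatus(row);
    }
    if (row.status === "copy_done") {
      deps.updateFilePath(row.fileId, row.newPath);
      row.status = "db_updated";
      deps.saveStatus(row);
    }
    // db_updated -> complete. The old object is left in place (no delete).
    row.status = "complete";
    deps.saveStatus(row);
  } catch {
    row.status = "failed";
    deps.saveStatus(row);
  }
  return row.status;
}
```

Because every transition is persisted before the next side effect starts, a crash between steps leaves the row in a state the next run can pick up from by reading the ledger alone.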

Tests — 13 new in test/storage-backfill.test.ts:
- pending → copy_done → db_updated → complete happy path
- 3 crash-point recovery tests (resume from copy_done, resume from
  db_updated, failed rows don't auto-retry)
- already-complete rows are skipped with zero side effects
- idempotent re-upload (exists-check skips redundant upload)
- dry-run mode (no storage, reports counts without mutating)

Plus 5 new migrate.test.ts assertions for v19 structure (handler-
only, PGLite gate, source_id + page_id + ledger DDL, default-source
backfill scope, state machine values).

Full suite: 2035 pass / 3 fail (3 pre-existing PGLite infra
timeouts).

NOT in this commit (explicitly deferred):
- DROP old page_slug column — kept for backward compat until
  operators have time to verify page_id everywhere.
- DROP old UNIQUE(storage_path) in favor of UNIQUE(source_id,
  storage_path) — same reason, deferred to later cleanup.
- Actual cleanup phase that deletes old objects post-soak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.17.0 step 9/9): full multi-source PGLite integration suite (Lane G)

End-to-end exercise of every v0.17.0 surface against real PGLite
(in-memory, fast — no DATABASE_URL needed). The migration chain
v2→v19 runs start-to-finish and the test asserts each Step's
invariants hold together.

16 new integration tests across 7 describes:

1. Migration-installed state:
   - sources('default') exists with federated=true config
   - pages.source_id column has DEFAULT 'default'
   - composite UNIQUE (source_id, slug) is installed

2. Default-source write path:
   - putPage without explicit source → source_id='default' via schema
     default clause (no engine API change needed for single-source brains)

3. Composite UNIQUE regression guards (Codex-flagged):
   - Same slug in two different sources coexists
   - Third insert with same (source_id, slug) hits the UNIQUE constraint

4. sources CLI round-trip:
   - federate / unfederate flips config.federated
   - rename changes display, id stays immutable

5. Source resolution priority (integration):
   - Explicit flag > env var > fallback to default
   - Unregistered explicit source errors with actionable message

6. Cascade semantics:
   - sources remove cascades to pages; default source untouched

7. links.resolution_type (Step 4):
   - Qualified/unqualified values accepted
   - CHECK constraint rejects invalid values

All 16 tests pass. Full suite: 2042 pass / 4 fail (4 pre-existing
PGLite beforeEach timeouts in test/wait-for-completion,
test/extract-fs, test/e2e/search-quality, test/e2e/graph-quality
— count fluctuated 3-5 on baseline from variance alone).

Total new tests across Steps 1-9: ~85 unit + integration tests
(sources, source-resolver, migrate v16/v17/v18/v19 structural,
link-extraction qualified wikilinks, dedup regression-critical,
storage-backfill state machine + crash recovery, full
multi-source PGLite integration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump to v0.18.0 + CHANGELOG entry (multi-source brains)

One-viewport release summary + itemized changes covering all 9 steps
of the multi-source primitive. Notes the v0.17 → v0.18 version bump
rationale (master shipped gbrain dream as v0.17 while this branch was
in flight).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): v0_18_0 orchestrator TS narrow + mechanical test ON CONFLICT

Two CI failures on PR #337:

1. tsc TS2367 at src/commands/migrations/v0_18_0.ts:190 —
   after the early-return on `a.status === 'failed'` (line 179),
   TypeScript narrows `a.status` to `'skipped' | 'complete'`, so the
   subsequent `a.status === 'failed' ? 'failed' :` branch was dead
   code and refused to compile. Dropped the redundant check.

2. E2E `file_list LIMIT enforcement` at test/e2e/mechanical.test.ts:636 —
   the test pre-seeded a pages row with `ON CONFLICT (slug) DO NOTHING`
   but v21 swapped the global UNIQUE for `UNIQUE (source_id, slug)`, so
   Postgres rejects with "no unique or exclusion constraint matching".
   Updated the conflict target to the composite key.

Tier-1 E2E had only this one failing test; everything else passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): v0.18.0 multi-source against real Postgres (v20-v23 schema + cascade + sync)

Closes the three biggest confidence gaps the author flagged in the
self-audit of PR #337:

1. No real Postgres E2E — PGLite has no files table, so v23's
   files.source_id + files.page_id rewrite + file_migration_ledger
   seed was NEVER executed against the real DB. This file covers it.

2. `gbrain sync --source <id>` had zero direct tests. Now has two:
   one that asserts performSync({sourceId}) reads local_path from the
   sources row (not the global config), one that asserts no-sourceId
   falls back to the global sync.repo_path.

3. Cascade delete coverage — previously verified only pages count
   after source removal. Now verifies pages + content_chunks +
   timeline_entries + links + files ALL cascade-delete when a source
   is removed.

6 describes, 16 tests total:

- Schema shape (fresh install): 6 tests confirming sources('default'),
  pages.source_id NOT NULL with DEFAULT, composite UNIQUE pages
  (source_id, slug) replaces global UNIQUE(slug), links.resolution_type
  column + CHECK, files.source_id + page_id columns, file_migration_ledger
  table + status CHECK.

- Composite UNIQUE semantics: 3 tests confirming same-slug in two
  sources coexists (Codex-critical regression guard), duplicate
  (source_id, slug) hits the UNIQUE, putPage targets default source
  by schema DEFAULT.

- Cascade delete: 1 test building a fully populated source (2 pages,
  chunks, timeline, links, files) then removing it + asserting every
  dependent row is gone.

- Sync routing: 2 tests confirming performSync({sourceId}) reads
  per-source local_path vs global config.

- Sources surface: 3 tests for federate/unfederate flipping + rename
  preserving id.

- Storage backfill: 1 end-to-end test seeding ledger + running
  runStorageBackfill against a stub StorageBackend, asserting
  pending → complete transition and files.storage_path rewrite.

Gated by DATABASE_URL per CLAUDE.md E2E lifecycle. Each describe's
beforeAll defensively DELETEs non-default sources + file_migration_ledger
rows so reruns are hermetic (sources isn't in helpers.ALL_TABLES).

Verified: 16/16 pass on first run AND second run (residual-state fix
holds). Full E2E suite still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): TS2352 in multi-source E2E — cast postgres.js RowList via unknown

tsc rejects the direct
  `(rows as { column_name: string }[]).map(...)`
cast because postgres.js RowList rows have an iterable-row shape that
doesn't overlap with the plain-object target. Standard fix: cast via
`unknown` first so the narrowing is explicit.

Verified: `bunx tsc --noEmit` clean (ignoring the pre-existing baseUrl
deprecation warning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.18.0): addLinksBatch + addTimelineEntriesBatch source-aware JOINs

Batch APIs JOINed on pages.slug globally, so two pages sharing the same
slug across sources would silently fan out — addLinksBatch(['a->b']) in
a brain with 'a' in both 'default' and 'alt' wrote 2 edges instead of 1.
Same bug on addTimelineEntriesBatch.

Fix:
- LinkBatchInput + TimelineBatchInput gain optional source_id fields
  (from_source_id, to_source_id, origin_source_id for links; source_id
  for timeline). All default to 'default' so existing callers are
  backward-compatible on single-source brains.
- pglite-engine + postgres-engine batch JOINs now composite-key on
  (slug, source_id). Postgres adds 3 more unnest arrays for links + 1
  for timeline — still one bind per column, no 65535-param cap risk.
- LEFT JOIN for origin pages also source-qualified so frontmatter-
  provenance edges don't cross-pollinate across sources.
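
The fan-out is easy to reproduce in memory. The sketch below models the two JOIN shapes as plain array filters; `PageRow`, `LinkInput`, `from_slug`/`to_slug`, and both function names are assumptions for illustration, while `from_source_id` and `to_source_id` are the field names listed above.

```typescript
interface PageRow {
  id: number;
  source_id: string;
  slug: string;
}

// Hypothetical batch input; the source ids default to 'default' when omitted.
interface LinkInput {
  from_slug: string;
  to_slug: string;
  from_source_id?: string;
  to_source_id?: string;
}

interface Edge {
  from_page_id: number;
  to_page_id: number;
}

// Old behavior: match pages on slug alone.
function joinOnSlugOnly(pages: PageRow[], link: LinkInput): Edge[] {
  const edges: Edge[] = [];
  for (const f of pages) {
    for (const t of pages) {
      if (f.slug === link.from_slug && t.slug === link.to_slug) {
        edges.push({ from_page_id: f.id, to_page_id: t.id });
      }
    }
  }
  return edges;
}

// Fixed behavior: match on (slug, source_id) for both endpoints.
function joinOnCompositeKey(pages: PageRow[], link: LinkInput): Edge[] {
  const fromSource = link.from_source_id ?? "default";
  const toSource = link.to_source_id ?? "default";
  const edges: Edge[] = [];
  for (const f of pages) {
    for (const t of pages) {
      const fromOk = f.slug === link.from_slug && f.source_id === fromSource;
      const toOk = t.slug === link.to_slug && t.source_id === toSource;
      if (fromOk && toOk) edges.push({ from_page_id: f.id, to_page_id: t.id });
    }
  }
  return edges;
}

const pages: PageRow[] = [
  { id: 1, source_id: "default", slug: "a" },
  { id: 2, source_id: "alt", slug: "a" },
  { id: 3, source_id: "default", slug: "b" },
];
console.log(joinOnSlugOnly(pages, { from_slug: "a", to_slug: "b" }).length); // 2
console.log(joinOnCompositeKey(pages, { from_slug: "a", to_slug: "b" }).length); // 1
```

Passing `from_source_id: "alt"` to the composite version yields the single cross-source edge from page 2 to page 3.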

Regression coverage:
- test/pglite-engine.test.ts: 5 new tests covering default-path isolation,
  explicit alt-source writes, and cross-source edges.
- test/e2e/multi-source.test.ts: 4 new tests against real Postgres so
  postgres-js's unnest() bind path is exercised (structurally different
  from PGLite's).

Gap #4 from the PR self-audit — latent bug, not previously reachable
because every existing caller wrote to the default source only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 16:24:23 -07:00

609 lines
26 KiB
TypeScript

/**
 * E2E: v0.18.0 multi-source migrations against REAL Postgres.
 *
 * PGLite doesn't have a files table (see pglite-schema.ts header), so the
 * v23 migration's files.source_id + files.page_id rewrite + ledger seed
 * is NEVER executed by the PGLite integration test. This file closes
 * that gap by exercising the full v20-v23 chain against a real Postgres
 * DB with pre-existing data.
 *
 * Also covers the gaps in the PR's pre-shipping test matrix that the
 * author self-audited:
 *   - files.page_slug → page_id backfill against real rows
 *   - file_migration_ledger seeding
 *   - cascade delete via sources.remove (pages + chunks + timeline +
 *     files + links all gone)
 *   - sync --source <id> routing reads + writes per-source sync anchors
 *     instead of the global config keys
 *
 * Gated by DATABASE_URL — skips gracefully when unset, per the CLAUDE.md
 * E2E lifecycle pattern.
 */
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { PostgresEngine } from '../../src/core/postgres-engine.ts';
import { runSources } from '../../src/commands/sources.ts';
import { performSync } from '../../src/commands/sync.ts';
import { runStorageBackfill } from '../../src/commands/migrations/v0_18_0-storage-backfill.ts';
import type { StorageBackend } from '../../src/core/storage.ts';
import { hasDatabase, setupDB, teardownDB, getConn, getEngine } from './helpers.ts';

const SKIP = !hasDatabase();
const describeE2E = SKIP ? describe.skip : describe;
describeE2E('v0.18.0 multi-source — Postgres schema shape (fresh install)', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });

  afterAll(async () => {
    await teardownDB();
  });

  test("sources('default') exists after initSchema + migration chain", async () => {
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT id, name, config FROM sources WHERE id = 'default'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].name).toBe('default');
    const config = typeof rows[0].config === 'string' ? JSON.parse(rows[0].config) : rows[0].config;
    expect(config.federated).toBe(true);
  });

  test('pages.source_id NOT NULL with DEFAULT default (v21)', async () => {
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT column_name, column_default, is_nullable
         FROM information_schema.columns
        WHERE table_name = 'pages' AND column_name = 'source_id'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].is_nullable).toBe('NO');
    expect(String(rows[0].column_default)).toContain('default');
  });

  test('composite UNIQUE pages(source_id, slug) replaces global UNIQUE(slug)', async () => {
    const conn = getConn();
    const composite = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'pages_source_slug_key'`,
    );
    expect(composite.length).toBe(1);
    const oldGlobal = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'pages_slug_key'`,
    );
    expect(oldGlobal.length).toBe(0);
  });

  test('links.resolution_type column exists with CHECK (v22)', async () => {
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT column_name FROM information_schema.columns
        WHERE table_name = 'links' AND column_name = 'resolution_type'`,
    );
    expect(rows.length).toBe(1);
    const check = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'links_resolution_type_check'`,
    );
    expect(check.length).toBe(1);
  });

  test('files.source_id + files.page_id columns exist (v23, Postgres-only)', async () => {
    const conn = getConn();
    const cols = await conn.unsafe(
      `SELECT column_name FROM information_schema.columns
        WHERE table_name = 'files' AND column_name IN ('source_id', 'page_id')`,
    );
    // postgres.js returns RowList with an iterable-row shape; cast via
    // unknown before narrowing to plain objects (TS2352 otherwise).
    const names = new Set(
      (cols as unknown as Array<{ column_name: string }>).map(r => r.column_name),
    );
    expect(names.has('source_id')).toBe(true);
    expect(names.has('page_id')).toBe(true);
  });

  test('file_migration_ledger table exists with status CHECK (v23)', async () => {
    const conn = getConn();
    const tables = await conn.unsafe(
      `SELECT table_name FROM information_schema.tables
        WHERE table_name = 'file_migration_ledger'`,
    );
    expect(tables.length).toBe(1);
    const check = await conn.unsafe(
      `SELECT conname FROM pg_constraint WHERE conname = 'chk_ledger_status'`,
    );
    expect(check.length).toBe(1);
  });
});
describeE2E('v0.18.0 multi-source — composite UNIQUE semantics on real Postgres', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });

  afterAll(async () => {
    await teardownDB();
  });

  test('same slug in two sources coexists (REGRESSION GUARD — Codex critical)', async () => {
    const conn = getConn();
    // Create a second source.
    const engine = getEngine();
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['add', 'wiki', '--federated']);
    // Insert the same slug under 'default' (via putPage) and 'wiki' (raw INSERT).
    await engine.putPage('topics/ai', {
      type: 'concept', title: 'AI from default', compiled_truth: 'default source take',
    });
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('wiki', 'topics/ai', 'concept', 'AI from wiki', 'wiki source take', '', '{}'::jsonb, 'wikihash')`,
    );
    const rows = await conn.unsafe(
      `SELECT source_id, slug, title FROM pages WHERE slug = 'topics/ai' ORDER BY source_id`,
    );
    expect(rows.length).toBe(2);
    expect(rows.map((r: any) => r.source_id).sort()).toEqual(['default', 'wiki']);
  });

  test('duplicate (source_id, slug) hits composite UNIQUE', async () => {
    const conn = getConn();
    let err: Error | null = null;
    try {
      await conn.unsafe(
        `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
         VALUES ('wiki', 'topics/ai', 'concept', 'dup', '', '', '{}'::jsonb, 'dup')`,
      );
    } catch (e) {
      err = e as Error;
    }
    expect(err).not.toBeNull();
    expect(err!.message.toLowerCase()).toMatch(/unique|duplicate/);
  });

  test('putPage (engine API) targets default source by schema DEFAULT', async () => {
    const engine = getEngine();
    await engine.putPage('topics/from-putpage', {
      type: 'note', title: 'Via putPage', compiled_truth: 'body',
    });
    const conn = getConn();
    const rows = await conn.unsafe(
      `SELECT source_id FROM pages WHERE slug = 'topics/from-putpage'`,
    );
    expect(rows.length).toBe(1);
    expect(rows[0].source_id).toBe('default');
  });
});
describeE2E('v0.18.0 multi-source — cascade delete covers every dependent row', () => {
  beforeAll(async () => {
    await setupDB();
    // sources + file_migration_ledger are not in helpers.ALL_TABLES, so
    // residual rows from prior test runs can shadow new INSERTs. Wipe
    // non-default sources at the top of every describe to keep each
    // block hermetic. file_migration_ledger cascades from files which
    // setupDB already truncates, but wipe explicitly in case files did
    // not cascade it.
    const conn = getConn();
    await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
    await conn.unsafe(`DELETE FROM file_migration_ledger`);
  });

  afterAll(async () => {
    await teardownDB();
  });

  test('sources remove cascades to pages + chunks + timeline + links + files', async () => {
    const conn = getConn();
    const engine = getEngine();
    // Build a fully populated source: page, chunks, timeline entries,
    // links, a file row. Then remove the source and verify nothing
    // for that source survives.
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['add', 'cascadetest', '--federated']);

    // Page under cascadetest
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('cascadetest', 'people/alice', 'person', 'Alice', 'Alice body', '', '{}'::jsonb, 'h1')`,
    );
    const alicePage = await conn.unsafe(
      `SELECT id FROM pages WHERE source_id = 'cascadetest' AND slug = 'people/alice'`,
    );
    const aliceId = alicePage[0].id as number;

    // A second page for link target
    await conn.unsafe(
      `INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
       VALUES ('cascadetest', 'companies/acme', 'company', 'Acme', 'Acme body', '', '{}'::jsonb, 'h2')`,
    );
    const acmePage = await conn.unsafe(
      `SELECT id FROM pages WHERE source_id = 'cascadetest' AND slug = 'companies/acme'`,
    );
    const acmeId = acmePage[0].id as number;

    // Chunk
    await conn.unsafe(
      `INSERT INTO content_chunks (page_id, chunk_index, chunk_text, chunk_source)
       VALUES (${aliceId}, 0, 'Alice body chunk', 'compiled_truth')`,
    );

    // Timeline
    await conn.unsafe(
      `INSERT INTO timeline_entries (page_id, date, source, summary, detail)
       VALUES (${aliceId}, '2026-01-15', 'test', 'Joined Acme', 'detail')`,
    );

    // Link Alice → Acme
    await conn.unsafe(
      `INSERT INTO links (from_page_id, to_page_id, link_type, link_source)
       VALUES (${aliceId}, ${acmeId}, 'works_at', 'markdown')`,
    );

    // File row pointing at Alice
    await conn.unsafe(
      `INSERT INTO files (source_id, page_id, filename, storage_path, content_hash)
       VALUES ('cascadetest', ${aliceId}, 'alice.pdf', 'cascadetest/people/alice/alice.pdf', 'fh1')`,
    );

    // Sanity: everything exists
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'cascadetest'`))[0].n).toBe(2);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM content_chunks WHERE page_id = ${aliceId}`))[0].n).toBe(1);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM timeline_entries WHERE page_id = ${aliceId}`))[0].n).toBe(1);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM links WHERE from_page_id = ${aliceId}`))[0].n).toBe(1);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM files WHERE source_id = 'cascadetest'`))[0].n).toBe(1);

    // Remove the source.
    await runSources(engine as unknown as Parameters<typeof runSources>[0], ['remove', 'cascadetest', '--yes']);

    // Everything for that source is gone.
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM pages WHERE source_id = 'cascadetest'`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM content_chunks WHERE page_id = ${aliceId}`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM timeline_entries WHERE page_id = ${aliceId}`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM links WHERE from_page_id = ${aliceId}`))[0].n).toBe(0);
    expect((await conn.unsafe(`SELECT COUNT(*)::int AS n FROM files WHERE source_id = 'cascadetest'`))[0].n).toBe(0);

    // The sources row itself is gone.
    const src = await conn.unsafe(`SELECT id FROM sources WHERE id = 'cascadetest'`);
    expect(src.length).toBe(0);
  });
});
describeE2E('v0.18.0 multi-source — sync --source routes through sources table', () => {
beforeAll(async () => {
await setupDB();
// sources + file_migration_ledger are not in helpers.ALL_TABLES, so
// residual rows from prior test runs can shadow new INSERTs. Wipe
// non-default sources at the top of every describe to keep each
// block hermetic. file_migration_ledger cascades from files which
// setupDB already truncates, but wipe explicitly in case files did
// not cascade it.
const conn = getConn();
await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
await conn.unsafe(`DELETE FROM file_migration_ledger`);
});
afterAll(async () => {
await teardownDB();
});
test('performSync with sourceId reads local_path from sources row', async () => {
const engine = getEngine();
const conn = getConn();
// Register a source with a bogus path (we're not actually walking a
// repo — this test asserts that performSync correctly RESOLVES the
// source row vs hitting the global config).
await runSources(engine as unknown as Parameters<typeof runSources>[0], [
'add', 'syncsrc', '--path', '/nonexistent/syncsrc/path', '--no-federated',
]);
// Also set a DIFFERENT path in the global config so we can verify
// sourceId actually disambiguates.
await engine.setConfig('sync.repo_path', '/some/other/default/path');
// performSync({sourceId: 'syncsrc'}) should attempt to use
// /nonexistent/syncsrc/path, NOT /some/other/default/path.
let err: Error | null = null;
try {
await performSync(engine, { sourceId: 'syncsrc' });
} catch (e) {
err = e as Error;
}
expect(err).not.toBeNull();
// The error message references the source-scoped path, not the
// global config path. (Could be "Not a git repository"
// or "No commits in repo" — either way the path it cites should
// be the source's.)
expect(err!.message).toContain('/nonexistent/syncsrc/path');
expect(err!.message).not.toContain('/some/other/default/path');
});
test('performSync with no sourceId falls back to global sync.repo_path', async () => {
const engine = getEngine();
// Set the global path explicitly instead of relying on state left behind
// by the previous test. Without a sourceId, performSync uses it.
await engine.setConfig('sync.repo_path', '/some/other/default/path');
let err: Error | null = null;
try {
await performSync(engine, {});
} catch (e) {
err = e as Error;
}
expect(err).not.toBeNull();
expect(err!.message).toContain('/some/other/default/path');
});
});
describeE2E('v0.18.0 multi-source — sources table surface', () => {
beforeAll(async () => {
await setupDB();
// sources + file_migration_ledger are not in helpers.ALL_TABLES, so
// residual rows from prior test runs can shadow new INSERTs. Wipe
// non-default sources at the top of every describe to keep each
// block hermetic. file_migration_ledger cascades from files which
// setupDB already truncates, but wipe explicitly in case files did
// not cascade it.
const conn = getConn();
await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
await conn.unsafe(`DELETE FROM file_migration_ledger`);
});
afterAll(async () => {
await teardownDB();
});
test('default source is seeded federated=true; new sources default to isolated', async () => {
const conn = getConn();
const engine = getEngine();
const def = await conn.unsafe(`SELECT config FROM sources WHERE id = 'default'`);
const defConfig = typeof def[0].config === 'string' ? JSON.parse(def[0].config) : def[0].config;
expect(defConfig.federated).toBe(true);
// Defensive cleanup: sources isn't in helpers.ALL_TABLES, so residual
// rows from prior test runs can shadow this INSERT via ON CONFLICT
// DO NOTHING. Delete first, then create.
await conn.unsafe(`DELETE FROM sources WHERE id = 'isolatedsrc'`);
await runSources(engine as unknown as Parameters<typeof runSources>[0], ['add', 'isolatedsrc']);
const iso = await conn.unsafe(`SELECT config FROM sources WHERE id = 'isolatedsrc'`);
const isoConfig = typeof iso[0].config === 'string' ? JSON.parse(iso[0].config) : iso[0].config;
expect(isoConfig.federated).toBeUndefined(); // omitted → isolated-by-default
});
test('federate / unfederate flips config.federated on real DB', async () => {
const conn = getConn();
const engine = getEngine();
await runSources(engine as unknown as Parameters<typeof runSources>[0], ['federate', 'isolatedsrc']);
let row = await conn.unsafe(`SELECT config FROM sources WHERE id = 'isolatedsrc'`);
let config = typeof row[0].config === 'string' ? JSON.parse(row[0].config) : row[0].config;
expect(config.federated).toBe(true);
await runSources(engine as unknown as Parameters<typeof runSources>[0], ['unfederate', 'isolatedsrc']);
row = await conn.unsafe(`SELECT config FROM sources WHERE id = 'isolatedsrc'`);
config = typeof row[0].config === 'string' ? JSON.parse(row[0].config) : row[0].config;
expect(config.federated).toBe(false);
});
test('rename changes name, id stays stable', async () => {
const conn = getConn();
const engine = getEngine();
await runSources(engine as unknown as Parameters<typeof runSources>[0], [
'rename', 'isolatedsrc', 'My Isolated Source',
]);
const row = await conn.unsafe(`SELECT id, name FROM sources WHERE id = 'isolatedsrc'`);
expect(row[0].id).toBe('isolatedsrc');
expect(row[0].name).toBe('My Isolated Source');
});
});
describeE2E('v0.18.0 multi-source — storage backfill against file_migration_ledger', () => {
beforeAll(async () => {
await setupDB();
// sources + file_migration_ledger are not in helpers.ALL_TABLES, so
// residual rows from prior test runs can shadow new INSERTs. Wipe
// non-default sources at the top of every describe to keep each
// block hermetic. file_migration_ledger cascades from files which
// setupDB already truncates, but wipe explicitly in case files did
// not cascade it.
const conn = getConn();
await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
await conn.unsafe(`DELETE FROM file_migration_ledger`);
});
afterAll(async () => {
await teardownDB();
});
test('seeded ledger + stub storage: pending → complete end-to-end', async () => {
const conn = getConn();
const engine = getEngine();
// Seed a page + file (via raw INSERT so the test doesn't depend on
// sync running).
await conn.unsafe(
`INSERT INTO pages (source_id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash)
VALUES ('default', 'topics/storage', 'note', 'Storage test', 'body', '', '{}'::jsonb, 'sh1')`,
);
const pageRow = await conn.unsafe(
`SELECT id FROM pages WHERE source_id = 'default' AND slug = 'topics/storage'`,
);
const pageId = pageRow[0].id as number;
await conn.unsafe(
`INSERT INTO files (source_id, page_id, filename, storage_path, content_hash)
VALUES ('default', ${pageId}, 'doc.pdf', 'topics/storage/doc.pdf', 'fh1')`,
);
const fileRow = await conn.unsafe(
`SELECT id FROM files WHERE storage_path = 'topics/storage/doc.pdf'`,
);
const fileId = fileRow[0].id as number;
// Seed the ledger manually so we don't depend on the v23 seed SQL
// (the TRUNCATE CASCADE in setupDB wipes ledger rows).
await conn.unsafe(
`INSERT INTO file_migration_ledger (file_id, storage_path_old, storage_path_new, status)
VALUES (${fileId}, 'topics/storage/doc.pdf', 'default/topics/storage/doc.pdf', 'pending')
ON CONFLICT (file_id) DO NOTHING`,
);
// Stub storage: downloads return bytes, uploads track what was written.
const uploaded = new Set<string>();
const stub: StorageBackend = {
upload: async (p: string) => { uploaded.add(p); },
download: async (p: string) => Buffer.from('bytes-for:' + p),
delete: async (p: string) => { uploaded.delete(p); },
exists: async (p: string) => uploaded.has(p),
list: async () => [],
getUrl: async (p: string) => `https://stub/${p}`,
};
const report = await runStorageBackfill(engine, stub);
expect(report.total).toBe(1);
expect(report.nowComplete).toBe(1);
expect(report.failed).toBe(0);
// Ledger row transitioned to complete.
const ledger = await conn.unsafe(
`SELECT status FROM file_migration_ledger WHERE file_id = ${fileId}`,
);
expect(ledger[0].status).toBe('complete');
// Files row now points at the new path.
const filesAfter = await conn.unsafe(
`SELECT storage_path FROM files WHERE id = ${fileId}`,
);
expect(filesAfter[0].storage_path).toBe('default/topics/storage/doc.pdf');
// Stub storage saw the upload happen at the new path.
expect(uploaded.has('default/topics/storage/doc.pdf')).toBe(true);
});
});
// v0.18.0: real-Postgres regression guard for the addLinksBatch /
// addTimelineEntriesBatch JOIN fan-out bug. Before the fix, the JOIN was
// `pages.slug = v.from_slug` unqualified — so two pages sharing the same
// slug across sources would silently duplicate edges and timeline rows.
// postgres-js binds arrays through `unnest()` rather than inline VALUES,
// so the query shape is structurally different from PGLite's and gets its
// own coverage.
describeE2E('v0.18.0 multi-source — addLinksBatch / addTimelineEntriesBatch source-awareness', () => {
beforeAll(async () => {
await setupDB();
const conn = getConn();
await conn.unsafe(`DELETE FROM sources WHERE id != 'default'`);
await conn.unsafe(`DELETE FROM file_migration_ledger`);
// Seed once for the whole block so every test below can run on its own
// instead of relying on the first test having created the pages.
await seedSameSlugTwoSources();
});
afterAll(async () => { await teardownDB(); });
async function seedSameSlugTwoSources() {
const conn = getConn();
const engine = getEngine() as PostgresEngine;
// Second source alongside 'default'.
await conn.unsafe(
`INSERT INTO sources (id, name) VALUES ('alt', 'alt') ON CONFLICT (id) DO NOTHING`
);
// Create same-slug pages in both sources. putPage defaults to 'default'.
await engine.putPage('topics/ai', { type: 'concept', title: 'AI (default)', compiled_truth: '', timeline: '' });
await engine.putPage('topics/ml', { type: 'concept', title: 'ML (default)', compiled_truth: '', timeline: '' });
await conn.unsafe(
`INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, source_id, updated_at)
VALUES ('topics/ai', 'concept', 'AI (alt)', '', '', '{}'::jsonb, 'alt-ai-hash', 'alt', now()),
('topics/ml', 'concept', 'ML (alt)', '', '', '{}'::jsonb, 'alt-ml-hash', 'alt', now())`
);
}
test('addLinksBatch without explicit source_id does NOT fan out across sources', async () => {
const conn = getConn();
const engine = getEngine() as PostgresEngine;
// Reset links from any prior describe block.
await conn.unsafe(`DELETE FROM links`);
const inserted = await engine.addLinksBatch([
{ from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention' },
]);
// Exactly one edge (default → default). Before the fix this was 2.
expect(inserted).toBe(1);
const rows = await conn.unsafe(
`SELECT f.source_id AS from_src, t.source_id AS to_src
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id`
);
expect(rows.length).toBe(1);
expect(rows[0].from_src).toBe('default');
expect(rows[0].to_src).toBe('default');
});
test('addLinksBatch supports cross-source edges when explicit source_ids differ', async () => {
const conn = getConn();
const engine = getEngine() as PostgresEngine;
await conn.unsafe(`DELETE FROM links`);
const inserted = await engine.addLinksBatch([
{
from_slug: 'topics/ai', to_slug: 'topics/ml', link_type: 'mention',
from_source_id: 'default', to_source_id: 'alt',
},
]);
expect(inserted).toBe(1);
const rows = await conn.unsafe(
`SELECT f.source_id AS from_src, t.source_id AS to_src
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id`
);
expect(rows.length).toBe(1);
expect(rows[0].from_src).toBe('default');
expect(rows[0].to_src).toBe('alt');
});
test('addTimelineEntriesBatch without explicit source_id does NOT fan out across sources', async () => {
const conn = getConn();
const engine = getEngine() as PostgresEngine;
await conn.unsafe(`DELETE FROM timeline_entries`);
const inserted = await engine.addTimelineEntriesBatch([
{ slug: 'topics/ai', date: '2024-01-15', summary: 'Founded' },
]);
expect(inserted).toBe(1);
const rows = await conn.unsafe(
`SELECT p.source_id
FROM timeline_entries te
JOIN pages p ON p.id = te.page_id`
);
expect(rows.length).toBe(1);
expect(rows[0].source_id).toBe('default');
});
test('addTimelineEntriesBatch with explicit alt source_id lands only in alt', async () => {
const conn = getConn();
const engine = getEngine() as PostgresEngine;
await conn.unsafe(`DELETE FROM timeline_entries`);
const inserted = await engine.addTimelineEntriesBatch([
{ slug: 'topics/ai', date: '2024-02-01', summary: 'Alt-only event', source_id: 'alt' },
]);
expect(inserted).toBe(1);
const rows = await conn.unsafe(
`SELECT p.source_id
FROM timeline_entries te
JOIN pages p ON p.id = te.page_id`
);
expect(rows.length).toBe(1);
expect(rows[0].source_id).toBe('alt');
});
});
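// Illustrative sketch only, NOT the engine's SQL: an in-memory model of the
// fan-out that the describe block above guards against. `SketchPage`,
// `sketchPages`, `sketchJoinBySlug`, and `sketchJoinBySourceAndSlug` are
// hypothetical names that exist only in this sketch. It assumes, as the
// composite UNIQUE(source_id, slug) constraint implies, that a slug can repeat
// across sources but not within one.
type SketchPage = { id: number; source_id: string; slug: string };

const sketchPages: SketchPage[] = [
  { id: 1, source_id: 'default', slug: 'topics/ai' },
  { id: 2, source_id: 'alt', slug: 'topics/ai' },
];

// Slug-only match: every page sharing the slug is returned, whatever its
// source, so one input row can fan out into one output row per source.
function sketchJoinBySlug(pages: SketchPage[], slug: string): SketchPage[] {
  return pages.filter((p) => p.slug === slug);
}

// Source-qualified match: at most one page per (source_id, slug) pair. The
// 'default' fallback mirrors what the tests above expect when no source_id
// is passed; it is an assumption of this sketch, not a documented contract.
function sketchJoinBySourceAndSlug(
  pages: SketchPage[],
  slug: string,
  sourceId: string = 'default',
): SketchPage[] {
  return pages.filter((p) => p.slug === slug && p.source_id === sourceId);
}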