* feat(v0.17.0 step 1/9): sources primitive — additive-only multi-source foundation
Lane A of the multi-repo plan. Installs the sources table and seeds a
'default' row that inherits sync.repo_path/last_commit from existing
config. This is the bisectable foundation every later step builds on;
the breaking schema changes (composite UNIQUE, files FK rewrite,
resolution_type, ingest_log.source_id) land with their paired code
rewrites in Steps 2/4/5/7 so no single commit breaks the engine.
- migration v16 (sources_table_additive) + v0_17_0 orchestrator skeleton
- sort-by-version guard in runMigrations (array insertion order can
never cause a later migration to skip a lower one again)
- default source seeded with config '{"federated": true}' so pre-v0.17
brains keep single-namespace search semantics after upgrade
- orchestrator phase B detects absence of file_migration_ledger and
no-ops until Step 7 lands it
- 8 new structural tests in test/migrate.test.ts (shape, idempotency,
scope-guard that nothing else was smuggled into v16)
- apply-migrations tests include v0.17.0 in the registered list
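The sort-by-version guard listed above can be pictured with a small sketch. It is not the code in src/core/migrate.ts: the `Migration` shape, the `pendingInOrder` helper, and the sample names are hypothetical, and it assumes only that each migration carries a numeric `version`.

```typescript
// Hypothetical, simplified migration record (illustration only).
interface Migration {
  version: number;
  name: string;
}

// Return the migrations still to apply, in ascending version order,
// regardless of the order in which they were registered.
function pendingInOrder(all: Migration[], currentVersion: number): Migration[] {
  return [...all]
    .sort((a, b) => a.version - b.version)
    .filter(m => m.version > currentVersion);
}

// Registered out of order: 16 appears before 15.
const registered: Migration[] = [
  { version: 14, name: 'example_a' },
  { version: 16, name: 'example_c' },
  { version: 15, name: 'example_b' },
];

const order = pendingInOrder(registered, 14).map(m => m.version);
console.log(order); // [ 15, 16 ]
```

Without the sort, an iterator that records version 16 first would treat 15 as already applied on the next pass.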
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 2/9): pages.source_id + composite UNIQUE (Lane B)
Migration v17 adds pages.source_id with DEFAULT 'default' and swaps the
global UNIQUE(slug) for composite UNIQUE(source_id, slug). Ships atomically
with the engine's ON CONFLICT rewrite so the constraint swap and the code
that writes under it land in the same commit — no window where the engine
sees one shape and the schema has another.
Minimum-surface engine change: only putPage's ON CONFLICT target needs
re-targeting. Other slug-based queries work unchanged because single-
source brains (the only brain shape pre-Step-5) have exactly one source
'default', so slug remains effectively unique within it. Step 5+ will
surface an explicit sourceId param on putPage for cross-source sync.
- migration v17 (pages_source_id_composite_unique) in src/core/migrate.ts
- pages.source_id + composite UNIQUE added to schema.sql + pglite-schema.ts
for fresh installs
- ON CONFLICT (slug) → ON CONFLICT (source_id, slug) in both pglite-engine
and postgres-engine putPage
- DEFAULT 'default' closes the Codex-flagged race where an INSERT between
ADD COLUMN and SET NOT NULL could leave source_id NULL
- 5 new v17 structural tests (29 pass / 0 fail in migrate.test.ts)
- Full suite: 1979 pass / 3 fail (same as baseline — no regressions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 6/9): sources CLI + source-resolver (Lane C)
Adds the CLI surface for multi-source management. Users can now register,
list, rename, federate/unfederate, and attach-to-directory a source. The
source-resolver is the shared 6-priority helper that Steps 4/5 will use
when they start surfacing an explicit --source flag on sync/extract/query.
Commands:
  gbrain sources add <id> --path <p> [--name <n>] [--federated|--no-federated]
  gbrain sources list [--json]
  gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
  gbrain sources rename <id> <new-name>
  gbrain sources default <id>
  gbrain sources attach <id>    (writes .gbrain-source in CWD)
  gbrain sources detach
  gbrain sources federate <id> / unfederate <id>
Resolution priority (source-resolver.ts), highest first:
  1. --source flag
  2. GBRAIN_SOURCE env
  3. .gbrain-source dotfile walk-up
  4. longest-prefix match on registered local_path (Codex #2 fix)
  5. sources.default config
  6. fallback 'default'
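The six-level priority above can be sketched as a chain of fallbacks. This is an illustration under stated assumptions, not the contents of source-resolver.ts: `ResolveInput`, `longestPrefixMatch`, and `resolveSourceIdSketch` are hypothetical names, the dotfile walk-up is assumed to have already produced an id, and validation of unregistered ids is omitted.

```typescript
// Hypothetical input shape: every field name here is illustrative.
interface ResolveInput {
  flag?: string;          // value of --source, if passed
  env?: string;           // value of GBRAIN_SOURCE, if set
  dotfile?: string;       // id read from the nearest .gbrain-source, if any
  cwd: string;            // absolute working directory
  registered: { id: string; localPath: string }[];
  configDefault?: string; // sources.default, if configured
}

// Priority 4: pick the registered source whose local path is the longest
// prefix of cwd, matching on whole path segments only.
function longestPrefixMatch(
  cwd: string,
  registered: { id: string; localPath: string }[],
): string | undefined {
  let bestId: string | undefined;
  let bestLen = -1;
  for (const s of registered) {
    const root = s.localPath.replace(/\/+$/, '');
    const inside = cwd === root || cwd.startsWith(root + '/');
    if (inside && root.length > bestLen) {
      bestId = s.id;
      bestLen = root.length;
    }
  }
  return bestId;
}

// Walk the priorities from highest to lowest; the first defined value wins.
function resolveSourceIdSketch(input: ResolveInput): string {
  return (
    input.flag ??
    input.env ??
    input.dotfile ??
    longestPrefixMatch(input.cwd, input.registered) ??
    input.configDefault ??
    'default'
  );
}

const registeredSources = [
  { id: 'wiki', localPath: '/home/u/wiki' },
  { id: 'wiki-archive', localPath: '/home/u/wiki-archive' },
];

const fromCwd = resolveSourceIdSketch({
  cwd: '/home/u/wiki-archive/notes',
  registered: registeredSources,
});
const fromFlag = resolveSourceIdSketch({
  flag: 'wiki',
  env: 'wiki-archive',
  cwd: '/tmp',
  registered: registeredSources,
});
const fallback = resolveSourceIdSketch({ cwd: '/tmp', registered: [] });
console.log(fromCwd, fromFlag, fallback); // wiki-archive wiki default
```

Matching on whole path segments keeps `/home/u/wiki-archive/notes` from being claimed by the `wiki` source, whose path is only a string prefix of it.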
- add: validates id format (kebab-case alnum, 1-32), rejects overlapping
paths (eng review §4 finding 4.1), supports federated default opt-in
- remove: guards against --yes omission + refuses to remove 'default',
supports --dry-run, reports cascade page count
- attach/detach: matches kubectl/terraform context-pinning semantics
- Throws on overlap rather than process.exit() so the CLI error wrapper
reports it consistently (also makes unit testing clean)
28 new tests across sources.test.ts (dispatcher + validation + overlap
guard) and source-resolver.test.ts (full 6-priority coverage including
longest-prefix). Full suite: 2012 pass / 3 fail (pre-existing PGLite
infra timeouts).
NOT in scope for Step 6 (deferred):
- import-from-github (SSRF + clone integration)
- prune (retention/TTL, lands v0.18)
- MCP tool-defs regen for source-scoping on read ops (Step 5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(v0.17.0 step 8/9): getting-started guide + migration skill + citation rule
Step 8 (Lane F) documents what Steps 1+2+6 have shipped and sets up
the agent-facing rules for multi-source.
New files:
- skills/migrations/v0.17.0.md — migration skill read by host agents
after `gbrain apply-migrations`. Covers the v16+v17 chain, what's
in v0.17.0 vs what lands later (v0.17.1 ACL, v0.18 sessions), and
the new sources CLI surface. Cites docs/guides/multi-source-brains.md
as the recipe.
- docs/guides/multi-source-brains.md — getting-started for end users.
Three canonical scenarios (unified wiki+gstack / purpose-separated
yc-media+garrys-list / mixed), full resolution priority, federation
flag semantics, command reference, and citation format.
skills/brain-ops/SKILL.md — new "Cross-source citation format"
section mandating `[source-id:slug]` when the brain has multiple
sources. Matches the contract the /plan-devex-review DX review
pinned down (DX Finding 5: surface source_id in every page payload
+ citation contract). Key must be sources.id (immutable), never
sources.name.
No behavior change — this is pure documentation for what already
exists in the binary. 144 skills conformance tests still pass.
NOT in this commit (deferred to later steps):
- docs/guides/repo-architecture.md rewrite (lands with the full
v0.17.0 PR description + release notes)
- skills/_brain-filing-rules.md "which source to file into"
guidance (lands with Step 5 when sync surfaces --source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 5/9): sync --source <id> routes through sources table (Lane D)
Adds the --source flag to `gbrain sync`. When set, sync reads local_path
+ last_commit from the matching sources(id) row instead of the global
sync.repo_path / sync.last_commit config keys, and writes last_commit +
last_sync_at back to the same row. Backward compat: --source omitted =
pre-v0.17 behavior exactly, global config path unchanged.
- SyncOpts.sourceId threaded through performSync + performFullSync
- readSyncAnchor/writeSyncAnchor helpers centralize the sources-vs-config
branch so every read/write goes through one decision point. Makes
Step 5's later per-source sync-failures tracking a one-file change.
- --source resolved via src/core/source-resolver.ts (Step 6), so any
command that shell-exposes resolveSourceId gets env var + dotfile
walk-up + longest-prefix for free.
- Error message for missing source local_path is actionable:
Source "gstack" has no local_path. Run: gbrain sources add gstack --path <path>
- last_sync_at auto-updates on every last_commit advance so `gbrain
sources list` shows real recency.
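The single decision point described above might look roughly like the following. The helper names `readSyncAnchor` and `writeSyncAnchor` come from this commit, but the signatures, the in-memory `Store`, and the "Unknown source" message are assumptions made for illustration; only the missing-local_path message is quoted from the list above.

```typescript
// Hypothetical in-memory stand-ins for the config table and the sources table.
interface SourceRow {
  local_path: string | null;
  last_commit: string | null;
  last_sync_at: Date | null;
}
interface Store {
  config: Map<string, string>;
  sources: Map<string, SourceRow>;
}
interface SyncAnchor {
  repoPath: string | null;
  lastCommit: string | null;
}

// One branch decides whether the anchor lives in global config or a sources row.
function readSyncAnchor(store: Store, sourceId?: string): SyncAnchor {
  if (sourceId === undefined) {
    return {
      repoPath: store.config.get('sync.repo_path') ?? null,
      lastCommit: store.config.get('sync.last_commit') ?? null,
    };
  }
  const row = store.sources.get(sourceId);
  if (!row) throw new Error(`Unknown source "${sourceId}"`); // illustrative message
  if (!row.local_path) {
    throw new Error(
      `Source "${sourceId}" has no local_path. Run: gbrain sources add ${sourceId} --path <path>`,
    );
  }
  return { repoPath: row.local_path, lastCommit: row.last_commit };
}

// Writes go back to wherever the anchor was read from.
function writeSyncAnchor(store: Store, commit: string, now: Date, sourceId?: string): void {
  if (sourceId === undefined) {
    store.config.set('sync.last_commit', commit);
    return;
  }
  const row = store.sources.get(sourceId);
  if (!row) throw new Error(`Unknown source "${sourceId}"`); // illustrative message
  row.last_commit = commit;
  row.last_sync_at = now; // recency advances together with last_commit
}

const store: Store = {
  config: new Map<string, string>([
    ['sync.repo_path', '/repos/brain'],
    ['sync.last_commit', 'aaa111'],
  ]),
  sources: new Map<string, SourceRow>([
    ['gstack', { local_path: '/repos/gstack', last_commit: null, last_sync_at: null }],
  ]),
};

writeSyncAnchor(store, 'bbb222', new Date(0), 'gstack');
const perSource = readSyncAnchor(store, 'gstack');
const globalAnchor = readSyncAnchor(store);
```

With `sourceId` omitted, both helpers touch only the global config keys, which is the backward-compatible path.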
No regression: 2012 pass / 3 fail (same as baseline).
NOT in this commit (deferred per plan):
- Per-source failure tracking (~/.gbrain/sources/<id>/sync-failures.jsonl)
- runImport source-awareness (import.ts path — Step 5 continuation)
- Partial-success semantics when walking N sources — single-source flow
today, multi-walk lands when the top-level `gbrain sync` without
--source starts iterating all sources.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 4/9): qualified [[source:slug]] + links.resolution_type (Lane B)
Adds source-pinned wikilink syntax and records the resolution kind on
each edge so `gbrain extract --refresh-unqualified` (future) can
re-resolve bare references when the source topology changes.
Wikilink syntax extension:
  [[concepts/ai]]                  unqualified; resolves via local-first fallback
  [[wiki:concepts/ai]]             qualified; target pinned to sources.id='wiki'
  [[gstack:projects/foo|Display]]  qualified + display name
The qualified regex runs first and masks matched spans so the
unqualified pass can't double-emit. Source id format enforced to match
the sources CLI validation: [a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?
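A minimal sketch of the qualified-first, mask-then-scan approach described above. Only the source id pattern is taken from this commit message; the two link regexes, the `WikiRef` shape, and `extractWikilinks` are assumptions for illustration and may differ from the real extractor.

```typescript
// Hypothetical result shape; sourceId is present only for qualified links.
interface WikiRef {
  slug: string;
  sourceId?: string;
  display?: string;
}

// Source id pattern quoted from the commit message.
const SOURCE_ID = '[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?';

// [[source:slug]] or [[source:slug|Display]] (assumed shape).
const QUALIFIED = new RegExp(
  `\\[\\[(${SOURCE_ID}):([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]`,
  'g',
);

function extractWikilinks(text: string): WikiRef[] {
  const refs: WikiRef[] = [];

  // Pass 1: collect qualified links and blank out their spans so the
  // second pass cannot match the same text again.
  const masked = text.replace(
    QUALIFIED,
    (match: string, sourceId: string, slug: string, display?: string) => {
      refs.push({ sourceId, slug, ...(display ? { display } : {}) });
      return ' '.repeat(match.length);
    },
  );

  // Pass 2: anything still shaped like [[slug]] or [[slug|Display]] is unqualified.
  const unqualified = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
  let m: RegExpExecArray | null;
  while ((m = unqualified.exec(masked)) !== null) {
    refs.push({ slug: m[1], ...(m[2] ? { display: m[2] } : {}) });
  }
  return refs;
}

const refs = extractWikilinks(
  'See [[concepts/ai]], [[wiki:concepts/ai]] and [[gstack:projects/foo|Display]].',
);
console.log(refs.length); // 3
```

In this sketch the unqualified pattern would also match `[[wiki:concepts/ai]]`, so masking is what prevents the double emit. Results come back grouped by pass rather than in document order.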
Schema:
- migration v18 adds links.resolution_type TEXT with CHECK constraint
('qualified'|'unqualified' or NULL for legacy/manual/frontmatter edges)
- schema.sql + pglite-schema.ts updated for fresh installs
EntityRef type:
- sourceId is OPTIONAL (only set on qualified wikilinks). Markdown
[Name](path) and unqualified wikilinks omit it so the pre-v0.17 strict
toEqual tests keep working (69 existing tests still pass).
Tests:
- 5 new qualified-wikilink extraction tests + 1 migration v18 structural
assertion. 75 tests in test/link-extraction.test.ts (up from 69).
- Full suite: 2018 pass / 3 fail (pre-existing PGLite infra timeouts).
NOT in this commit (deferred to Step 3 / Step 5 continuation):
- Writing resolution_type to the DB (addLink / addLinksBatch don't
carry the field yet — that's the plumb-through that lands with
Step 3 when search/dedup also needs source-aware result keys).
- `gbrain extract --refresh-unqualified` re-resolver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 3/9): source-aware search dedup composite keys (Lane B)
Search dedup now keys on (source_id, slug) instead of slug alone. Keyed
on slug alone, dedup would collapse two same-slug pages in different
sources into one, destroying cross-source recall. Codex outside-voice
review flagged this as regression-critical; this commit ships the fix
plus tests that lock the invariant in.
Dedup pipeline (src/core/search/dedup.ts):
- pageKey(r) helper — one canonical composite-key derivation. Falls
back to source_id='default' for pre-v0.17 rows so single-source
brains behave identically to before.
- Layer 1 (dedupBySource): group-by composite key.
- Layer 4 (capPerPage): count-by composite key.
- guaranteeCompiledTruth: swap scoped to matching (source_id, slug),
so wiki:topics/ai can't accidentally pull gstack:topics/ai's
compiled_truth chunk.
SearchResult type gains optional source_id — populated by SQL JOINs
in both engines, falls through as 'default' for legacy callers.
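The composite-key idea can be shown with a small sketch. `pageKey` is the helper name used in this commit, but its body here, the reduced `SearchResult` shape, and `bestPerPage` are assumptions for illustration, not the code in src/core/search/dedup.ts.

```typescript
// Reduced, hypothetical result shape: only the fields the sketch needs.
interface SearchResult {
  slug: string;
  source_id?: string; // optional: absent on pre-v0.17 rows
  score: number;
}

// One canonical key derivation; missing source_id falls back to 'default'.
// Source ids cannot contain ':', so the first ':' always separates id from slug.
function pageKey(r: SearchResult): string {
  return `${r.source_id ?? 'default'}:${r.slug}`;
}

// Keep the highest-scoring result per (source_id, slug).
function bestPerPage(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const key = pageKey(r);
    const seen = best.get(key);
    if (!seen || r.score > seen.score) best.set(key, r);
  }
  return Array.from(best.values());
}

const kept = bestPerPage([
  { slug: 'topics/ai', source_id: 'wiki', score: 0.9 },
  { slug: 'topics/ai', source_id: 'gstack', score: 0.8 }, // same slug, other source: kept
  { slug: 'topics/ai', source_id: 'wiki', score: 0.5 },   // same page: collapsed
  { slug: 'notes/x', score: 0.4 },                        // legacy row, treated as 'default'
  { slug: 'notes/x', source_id: 'default', score: 0.7 },  // same page as the legacy row
]);
console.log(kept.map(pageKey));
// [ 'wiki:topics/ai', 'gstack:topics/ai', 'default:notes/x' ]
```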
Engine SQL:
- pglite-engine.ts + postgres-engine.ts: search SELECTs add p.source_id
- rowToSearchResult (utils.ts): maps row.source_id → result.source_id
when present. Shape stays backward compatible (field optional).
Tests — 4 new in test/dedup.test.ts:
- same-slug-different-source does NOT collapse (the critical regression
guard Codex called out)
- same-slug-same-source DOES still collapse (no over-correction)
- missing source_id falls back to 'default' for pre-v0.17 compat
- compiled_truth guarantee scopes to composite key (Codex second pass
caught this specific path would leak otherwise)
Full suite: 2022 pass / 3 fail (3 pre-existing PGLite infra timeouts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(v0.17.0 step 7/9): file_migration_ledger + phase-B storage backfill (Lane E)
Adds files.source_id + files.page_id + the file_migration_ledger
state machine that drives storage object rewrites. Each per-file
transition is its own transaction so crash-point recovery is a
ledger read, not a filesystem inspection. Codex second-pass review
flagged that "skip if already has source prefix" was an unsafe
heuristic — the ledger replaces it with explicit state tracking.
Schema:
- migration v19 (files_source_id_page_id_ledger): handler-only
(PGLite has no files table; Postgres-only gate). ADDs
source_id + page_id to files, backfills page_id from page_slug
scoped to source_id='default', creates file_migration_ledger
with PK on file_id (Codex: not storage_path_old — two sources
can share an old path during migration).
- schema.sql updated for fresh Postgres installs; file_migration_ledger
gets RLS alongside other tables.
Runtime:
- src/commands/migrations/v0_17_0-storage-backfill.ts: drives the
ledger state machine pending → copy_done → db_updated → complete.
Idempotent per row: re-running resumes from whichever state
crashed. Old objects preserved (no delete) so operators can
verify the soak window before a future cleanup release.
- phase B in v0_17_0.ts orchestrator: wires the storage backend
(Supabase/S3/local) through createStorage, runs runStorageBackfill,
reports per-state counts + first-three error details.
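The resume-from-state behavior described above can be sketched in memory. The state names come from this commit; `LedgerRow`, `BackfillSteps`, and `resumeRow` are hypothetical, and the real backfill persists each transition in its own transaction rather than mutating an object.

```typescript
type LedgerState = 'pending' | 'copy_done' | 'db_updated' | 'complete' | 'failed';

interface LedgerRow {
  fileId: number;
  state: LedgerState;
}

// Hypothetical side-effect hooks; a real backfill would talk to storage and the DB.
interface BackfillSteps {
  copyObject(row: LedgerRow): void;
  updateDb(row: LedgerRow): void;
}

// Advance a row from whatever state it was left in. Steps already recorded
// as done are never repeated; 'complete' and 'failed' rows pass through untouched.
function resumeRow(row: LedgerRow, steps: BackfillSteps): LedgerState {
  try {
    if (row.state === 'pending') {
      steps.copyObject(row);
      row.state = 'copy_done';
    }
    if (row.state === 'copy_done') {
      steps.updateDb(row);
      row.state = 'db_updated';
    }
    if (row.state === 'db_updated') {
      row.state = 'complete';
    }
  } catch {
    row.state = 'failed'; // no automatic retry in this sketch
  }
  return row.state;
}

let copies = 0;
let updates = 0;
const countingSteps: BackfillSteps = {
  copyObject: () => { copies += 1; },
  updateDb: () => { updates += 1; },
};

// Simulate a crash after the copy was recorded but before the DB update.
const crashedAfterCopy: LedgerRow = { fileId: 1, state: 'copy_done' };
const finalState = resumeRow(crashedAfterCopy, countingSteps);

// A previously failed row is left alone.
const failedRow: LedgerRow = { fileId: 2, state: 'failed' };
const failedState = resumeRow(failedRow, countingSteps);
```

Because the recorded state alone decides where to resume, recovery never needs to inspect storage to guess what already happened.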
Tests — 13 new in test/storage-backfill.test.ts:
- pending → copy_done → db_updated → complete happy path
- 3 crash-point recovery tests (resume from copy_done, resume from
db_updated, failed rows don't auto-retry)
- already-complete rows are skipped with zero side effects
- idempotent re-upload (exists-check skips redundant upload)
- dry-run mode (no storage, reports counts without mutating)
Plus 5 new migrate.test.ts assertions for v19 structure (handler-
only, PGLite gate, source_id + page_id + ledger DDL, default-source
backfill scope, state machine values).
Full suite: 2035 pass / 3 fail (3 pre-existing PGLite infra
timeouts).
NOT in this commit (explicitly deferred):
- DROP old page_slug column — kept for backward compat until
operators have time to verify page_id everywhere.
- DROP old UNIQUE(storage_path) in favor of UNIQUE(source_id,
storage_path) — same reason, deferred to later cleanup.
- Actual cleanup phase that deletes old objects post-soak.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(v0.17.0 step 9/9): full multi-source PGLite integration suite (Lane G)
End-to-end exercise of every v0.17.0 surface against real PGLite
(in-memory, fast — no DATABASE_URL needed). The migration chain
v2→v19 runs start-to-finish and the test asserts each Step's
invariants hold together.
16 new integration tests across 7 describes:
1. Migration-installed state:
- sources('default') exists with federated=true config
- pages.source_id column has DEFAULT 'default'
- composite UNIQUE (source_id, slug) is installed
2. Default-source write path:
- putPage without explicit source → source_id='default' via schema
default clause (no engine API change needed for single-source brains)
3. Composite UNIQUE regression guards (Codex-flagged):
- Same slug in two different sources coexists
- Third insert with same (source_id, slug) hits the UNIQUE constraint
4. sources CLI round-trip:
- federate / unfederate flips config.federated
- rename changes display, id stays immutable
5. Source resolution priority (integration):
- Explicit flag > env var > fallback to default
- Unregistered explicit source errors with actionable message
6. Cascade semantics:
- sources remove cascades to pages; default source untouched
7. links.resolution_type (Step 4):
- Qualified/unqualified values accepted
- CHECK constraint rejects invalid values
All 16 tests pass. Full suite: 2042 pass / 4 fail (4 pre-existing
PGLite beforeEach timeouts in test/wait-for-completion,
test/extract-fs, test/e2e/search-quality, test/e2e/graph-quality
— count fluctuated 3-5 on baseline from variance alone).
Total new tests across Steps 1-9: ~85 unit + integration tests
(sources, source-resolver, migrate v16/v17/v18/v19 structural,
link-extraction qualified wikilinks, dedup regression-critical,
storage-backfill state machine + crash recovery, full
multi-source PGLite integration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump to v0.18.0 + CHANGELOG entry (multi-source brains)
One-viewport release summary + itemized changes covering all 9 steps
of the multi-source primitive. Notes the v0.17 → v0.18 version bump
rationale (master shipped gbrain dream as v0.17 while this branch was
in flight).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): v0_18_0 orchestrator TS narrow + mechanical test ON CONFLICT
Two CI failures on PR #337:
1. tsc TS2367 at src/commands/migrations/v0_18_0.ts:190 —
after the early-return on `a.status === 'failed'` (line 179),
TypeScript narrows `a.status` to `'skipped' | 'complete'`, so the
subsequent `a.status === 'failed' ? 'failed' :` branch was dead
code and refused to compile. Dropped the redundant check.
2. E2E `file_list LIMIT enforcement` at test/e2e/mechanical.test.ts:636 —
the test pre-seeded a pages row with `ON CONFLICT (slug) DO NOTHING`
but v21 swapped the global UNIQUE for `UNIQUE (source_id, slug)`, so
Postgres rejects with "no unique or exclusion constraint matching".
Updated the conflict target to the composite key.
Tier-1 E2E had only this one failing test; everything else passed.
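The narrowing behind the first failure can be reproduced in a few lines. The `Status` union and `summarize` function are hypothetical stand-ins, not the orchestrator code; the sketch shows the compiling form and describes the rejected one in a comment.

```typescript
type Status = 'failed' | 'skipped' | 'complete';

// Hypothetical phase result; only the status field matters here.
interface PhaseResult {
  status: Status;
}

function summarize(a: PhaseResult): string {
  if (a.status === 'failed') {
    return 'failed';
  }
  // From here on, a.status is narrowed to 'skipped' | 'complete'.
  // Comparing it with 'failed' again is rejected by tsc with TS2367,
  // because the two types have no overlap.
  return a.status === 'skipped' ? 'skipped' : 'complete';
}

const skipped = summarize({ status: 'skipped' });
const failed = summarize({ status: 'failed' });
console.log(skipped, failed); // skipped failed
```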
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(e2e): v0.18.0 multi-source against real Postgres (v20-v23 schema + cascade + sync)
Closes the three biggest confidence gaps the author flagged in the
self-audit of PR #337:
1. No real Postgres E2E — PGLite has no files table, so v23's
files.source_id + files.page_id rewrite + file_migration_ledger
seed was NEVER executed against the real DB. This file covers it.
2. `gbrain sync --source <id>` had zero direct tests. Now has two:
one that asserts performSync({sourceId}) reads local_path from the
sources row (not the global config), one that asserts no-sourceId
falls back to the global sync.repo_path.
3. Cascade delete coverage — previously verified only pages count
after source removal. Now verifies pages + content_chunks +
timeline_entries + links + files ALL cascade-delete when a source
is removed.
6 describes, 16 tests total:
- Schema shape (fresh install): 6 tests confirming sources('default'),
pages.source_id NOT NULL with DEFAULT, composite UNIQUE pages
(source_id, slug) replaces global UNIQUE(slug), links.resolution_type
column + CHECK, files.source_id + page_id columns, file_migration_ledger
table + status CHECK.
- Composite UNIQUE semantics: 3 tests confirming same-slug in two
sources coexists (Codex-critical regression guard), duplicate
(source_id, slug) hits the UNIQUE, putPage targets default source
by schema DEFAULT.
- Cascade delete: 1 test building a fully populated source (2 pages,
chunks, timeline, links, files) then removing it + asserting every
dependent row is gone.
- Sync routing: 2 tests confirming performSync({sourceId}) reads
per-source local_path vs global config.
- Sources surface: 3 tests for federate/unfederate flipping + rename
preserving id.
- Storage backfill: 1 end-to-end test seeding ledger + running
runStorageBackfill against a stub StorageBackend, asserting
pending → complete transition and files.storage_path rewrite.
Gated by DATABASE_URL per CLAUDE.md E2E lifecycle. Each describe's
beforeAll defensively DELETEs non-default sources + file_migration_ledger
rows so reruns are hermetic (sources isn't in helpers.ALL_TABLES).
Verified: 16/16 pass on first run AND second run (residual-state fix
holds). Full E2E suite still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): TS2352 in multi-source E2E — cast postgres.js RowList via unknown
tsc rejects the direct
`(rows as { column_name: string }[]).map(...)`
cast because postgres.js RowList rows have an iterable-row shape that
doesn't overlap with the plain-object target. Standard fix: cast via
`unknown` first so the narrowing is explicit.
Verified: `bunx tsc --noEmit` clean (ignoring the pre-existing baseUrl
deprecation warning).
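The cast-through-`unknown` pattern looks like this in isolation. `OpaqueRows` is a hypothetical stand-in for a driver result type and is not the postgres.js `RowList` definition; the sketch assumes only that the static type does not overlap with the plain-object target.

```typescript
// Hypothetical stand-in for a driver result whose static type does not
// describe plain objects with named columns.
type OpaqueRows = ReadonlyArray<Iterable<unknown>>;

function columnNames(rows: OpaqueRows): string[] {
  // Going through `unknown` makes the reinterpretation explicit.
  // Nothing is checked at runtime, so the caller must know the real shape.
  return (rows as unknown as { column_name: string }[]).map(r => r.column_name);
}

// Fake rows shaped the way an information_schema query would return them.
const fakeRows = [
  { column_name: 'id' },
  { column_name: 'source_id' },
] as unknown as OpaqueRows;

const names = columnNames(fakeRows);
console.log(names); // [ 'id', 'source_id' ]
```

A double cast silences the compiler rather than proving anything, so it suits test code where the query shape is known; validating the rows at runtime would be the stricter alternative.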
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(v0.18.0): addLinksBatch + addTimelineEntriesBatch source-aware JOINs
Batch APIs JOINed on pages.slug globally, so two pages sharing the same
slug across sources would silently fan out — addLinksBatch(['a->b']) in
a brain with 'a' in both 'default' and 'alt' wrote 2 edges instead of 1.
Same bug on addTimelineEntriesBatch.
Fix:
- LinkBatchInput + TimelineBatchInput gain optional source_id fields
(from_source_id, to_source_id, origin_source_id for links; source_id
for timeline). All default to 'default' so existing callers are
backward-compatible on single-source brains.
- pglite-engine + postgres-engine batch JOINs now composite-key on
(slug, source_id). Postgres adds 3 more unnest arrays for links + 1
for timeline — still one bind per column, no 65535-param cap risk.
- LEFT JOIN for origin pages also source-qualified so frontmatter-
provenance edges don't cross-pollinate across sources.
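The fan-out is easy to reproduce in memory. The sketch below is not the engine SQL: `Page`, `joinOnSlug`, and `joinOnCompositeKey` are hypothetical, and `from_slug` / `to_slug` are assumed field names. Only `from_source_id`, `to_source_id`, and the 'default' fallback come from the fix described above.

```typescript
interface Page {
  id: number;
  source_id: string;
  slug: string;
}

interface LinkInput {
  from_slug: string;       // assumed field name
  to_slug: string;         // assumed field name
  from_source_id?: string; // defaults to 'default'
  to_source_id?: string;   // defaults to 'default'
}

interface Edge {
  from: number;
  to: number;
}

// Old behavior: match endpoints on slug only.
function joinOnSlug(pages: Page[], link: LinkInput): Edge[] {
  const edges: Edge[] = [];
  for (const f of pages) {
    for (const t of pages) {
      if (f.slug === link.from_slug && t.slug === link.to_slug) {
        edges.push({ from: f.id, to: t.id });
      }
    }
  }
  return edges;
}

// Fixed behavior: match endpoints on (slug, source_id).
function joinOnCompositeKey(pages: Page[], link: LinkInput): Edge[] {
  const fromSource = link.from_source_id ?? 'default';
  const toSource = link.to_source_id ?? 'default';
  const edges: Edge[] = [];
  for (const f of pages) {
    for (const t of pages) {
      const fromOk = f.slug === link.from_slug && f.source_id === fromSource;
      const toOk = t.slug === link.to_slug && t.source_id === toSource;
      if (fromOk && toOk) edges.push({ from: f.id, to: t.id });
    }
  }
  return edges;
}

const pages: Page[] = [
  { id: 1, source_id: 'default', slug: 'a' },
  { id: 2, source_id: 'alt', slug: 'a' },
  { id: 3, source_id: 'default', slug: 'b' },
];

const fannedOut = joinOnSlug(pages, { from_slug: 'a', to_slug: 'b' });
const scoped = joinOnCompositeKey(pages, { from_slug: 'a', to_slug: 'b' });
const crossSource = joinOnCompositeKey(pages, {
  from_slug: 'a',
  to_slug: 'b',
  from_source_id: 'alt',
});
console.log(fannedOut.length, scoped.length, crossSource.length); // 2 1 1
```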
Regression coverage:
- test/pglite-engine.test.ts: 5 new tests covering default-path isolation,
explicit alt-source writes, and cross-source edges.
- test/e2e/multi-source.test.ts: 4 new tests against real Postgres so
postgres-js's unnest() bind path is exercised (structurally different
from PGLite's).
Gap #4 from the PR self-audit — latent bug, not previously reachable
because every existing caller wrote to the default source only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
512 lines
24 KiB
TypeScript
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { LATEST_VERSION, runMigrations, MIGRATIONS } from '../src/core/migrate.ts';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';

describe('migrate', () => {
  test('LATEST_VERSION is a number >= 1', () => {
    expect(typeof LATEST_VERSION).toBe('number');
    expect(LATEST_VERSION).toBeGreaterThanOrEqual(1);
  });

  test('runMigrations is exported and callable', async () => {
    expect(typeof runMigrations).toBe('function');
  });

  // Integration tests for actual migration execution require DATABASE_URL
  // and are covered in the E2E suite (test/e2e/mechanical.test.ts)
});

// ─────────────────────────────────────────────────────────────────
// v0.18.0 — v20 sources_table_additive (Step 1, Lane A)
// ─────────────────────────────────────────────────────────────────
// v20 is the ADDITIVE-ONLY migration: it installs the sources primitive
// without breaking the engine's existing ON CONFLICT (slug) upserts.
// The breaking schema changes (pages.source_id NOT NULL, composite
// UNIQUE, files.page_slug → page_id, file_migration_ledger,
// links.resolution_type) land in later migrations (v21 onward) alongside
// their paired engine rewrites, so the engine can execute the new
// ON CONFLICT (source_id, slug) atomically with the schema change.
// ─────────────────────────────────────────────────────────────────
describe('migrate v20 — sources_table_additive', () => {
  const v20 = MIGRATIONS.find(m => m.version === 20);

  test('v20 exists', () => {
    expect(v20).toBeDefined();
    expect(v20!.name).toBe('sources_table_additive');
  });

  test('v20 creates sources table', () => {
    expect(v20!.sql).toContain('CREATE TABLE IF NOT EXISTS sources');
    expect(v20!.sql).toContain('id TEXT PRIMARY KEY');
    expect(v20!.sql).toContain('name TEXT NOT NULL UNIQUE');
    expect(v20!.sql).toContain('config JSONB NOT NULL');
  });

  test("v20 seeds 'default' source inheriting sync config", () => {
    expect(v20!.sql).toContain("INSERT INTO sources (id, name, local_path, last_commit, config)");
    expect(v20!.sql).toContain("'default'");
    // The default source pulls from existing config so post-upgrade
    // identity is preserved.
    expect(v20!.sql).toContain("SELECT value FROM config WHERE key = 'sync.repo_path'");
    expect(v20!.sql).toContain("SELECT value FROM config WHERE key = 'sync.last_commit'");
  });

  test('v20 default source is federated=true (backward-compat)', () => {
    // federated=true ensures pre-v0.17 brains keep single-namespace
    // search semantics — every page appears in unqualified search.
    expect(v20!.sql).toContain('"federated": true');
  });

  test('v20 is idempotent on re-run', () => {
    // CREATE TABLE IF NOT EXISTS + NOT EXISTS subquery on INSERT.
    expect(v20!.sql).toContain('CREATE TABLE IF NOT EXISTS sources');
    expect(v20!.sql).toContain('WHERE NOT EXISTS (SELECT 1 FROM sources WHERE id = ');
  });

  test('v20 does NOT touch pages / ingest_log / files / links', () => {
    // Step 1 is additive-only. Breaking changes are deferred to v21 and
    // later so they land with their engine rewrites (Step 2 onward).
    // Guard against anyone accidentally re-expanding v20's scope.
    expect(v20!.sql).not.toContain('ALTER TABLE pages');
    expect(v20!.sql).not.toContain('ALTER TABLE ingest_log');
    expect(v20!.sql).not.toContain('ALTER TABLE files');
    expect(v20!.sql).not.toContain('ALTER TABLE links');
    expect(v20!.handler).toBeUndefined();
  });
});

// ─────────────────────────────────────────────────────────────────
// v0.18.0 — v21 pages_source_id_composite_unique (Step 2, Lane B)
// ─────────────────────────────────────────────────────────────────
describe('migrate v21 — pages_source_id_composite_unique', () => {
  const v21 = MIGRATIONS.find(m => m.version === 21);

  test('v21 exists and is paired with Step 2 engine rewrite', () => {
    expect(v21).toBeDefined();
    expect(v21!.name).toBe('pages_source_id_composite_unique');
  });

  test('v21 adds pages.source_id with DEFAULT default REFERENCES sources', () => {
    expect(v21!.sql).toContain('ALTER TABLE pages ADD COLUMN IF NOT EXISTS source_id TEXT');
    // DEFAULT 'default' closes the race where an INSERT between ADD COLUMN
    // and SET NOT NULL could leave source_id NULL (Codex second-pass review).
    expect(v21!.sql).toContain("NOT NULL DEFAULT 'default' REFERENCES sources(id)");
  });

  test('v21 swaps UNIQUE(slug) → composite UNIQUE(source_id, slug)', () => {
    // ON CONFLICT (source_id, slug) in putPage relies on this swap.
    expect(v21!.sql).toContain('ALTER TABLE pages DROP CONSTRAINT IF EXISTS pages_slug_key');
    expect(v21!.sql).toContain('pages_source_slug_key');
    expect(v21!.sql).toContain('UNIQUE (source_id, slug)');
  });

  test('v21 creates source-scoped index for per-source scans', () => {
    expect(v21!.sql).toContain('CREATE INDEX IF NOT EXISTS idx_pages_source_id');
  });

  test('v21 constraint add is guarded (idempotent re-run)', () => {
    // DO block with IF NOT EXISTS guard means re-running the migration
    // after partial failure doesn't error on the already-installed name.
    expect(v21!.sql).toContain('IF NOT EXISTS');
    expect(v21!.sql).toContain("WHERE conname = 'pages_source_slug_key'");
  });
});

// ─────────────────────────────────────────────────────────────────
// v0.18.0 — v23 files_source_id_page_id_ledger (Step 7, Lane E)
// ─────────────────────────────────────────────────────────────────
describe('migrate v23 — files_source_id_page_id_ledger', () => {
  const v23 = MIGRATIONS.find(m => m.version === 23);

  test('v23 exists as handler-only (Postgres files table, PGLite no-op)', () => {
    expect(v23).toBeDefined();
    expect(v23!.name).toBe('files_source_id_page_id_ledger');
    expect(v23!.sql).toBe('');
    expect(v23!.handler).toBeDefined();
  });

  test('v23 handler gates on engine.kind for PGLite (no files table)', () => {
    expect(v23!.handler!.toString()).toMatch(/engine\.kind\s*===\s*["']pglite["']/);
  });

  test('v23 adds files.source_id + files.page_id + ledger creation', () => {
    const body = v23!.handler!.toString();
    expect(body).toContain('ALTER TABLE files ADD COLUMN IF NOT EXISTS source_id');
    expect(body).toContain('ALTER TABLE files ADD COLUMN IF NOT EXISTS page_id');
    expect(body).toContain('CREATE TABLE IF NOT EXISTS file_migration_ledger');
  });

  test('v23 backfills files.page_id scoped to default source (Codex fix)', () => {
    const body = v23!.handler!.toString();
    // Without source_id='default' scope, the JOIN could hit the wrong
    // page after new sources with duplicate slugs are added.
    expect(body).toContain('UPDATE files f');
    expect(body).toContain("p.source_id = 'default'");
  });

  test('v23 ledger PK is file_id (Codex: two sources can share old path)', () => {
    const body = v23!.handler!.toString();
    expect(body).toContain('file_id INTEGER PRIMARY KEY');
    // State machine values all present.
    for (const state of ['pending', 'copy_done', 'db_updated', 'complete', 'failed']) {
      expect(body).toContain(`'${state}'`);
    }
  });
});

describe('migrate — ordering guarantee (v15 must NOT be skipped by v16)', () => {
  test('runMigrations sorts by version ascending', async () => {
    // Regression: if v16 preceded v15 in the MIGRATIONS array, the iterator
    // would setConfig(version, 16) first, then skip v15 on the next pass.
    // runMigrations applies a defensive sort so array order doesn't matter.
    // This test asserts v15 exists (if we broke the sort, v15 would still
    // exist in MIGRATIONS but would never apply at runtime).
    const v15 = MIGRATIONS.find(m => m.version === 15);
    const v20 = MIGRATIONS.find(m => m.version === 20);
    expect(v15).toBeDefined();
    expect(v20).toBeDefined();
    // Sanity: versions are distinct and progress.
    const versions = MIGRATIONS.map(m => m.version);
    const uniq = new Set(versions);
    expect(uniq.size).toBe(versions.length);
  });
});

// ─────────────────────────────────────────────────────────────────
|
|
// REGRESSION TESTS — migrations v8 + v9 perf on duplicate-heavy tables
|
|
// ─────────────────────────────────────────────────────────────────
|
|
//
|
|
// Garry's production brain hit Supabase Management API's 60s ceiling because
|
|
// the DELETE...USING self-join in migrations v8 + v9 was O(n²) without an
|
|
// index on the dedup columns. The fix pre-creates a btree helper index
|
|
// before the DELETE, then drops it. These tests guard against any future
|
|
// change that re-introduces the missing helper index.
|
|
//
|
|
// Two-layer guard:
|
|
// 1. Structural — assert the migration SQL literally contains the helper
|
|
// CREATE INDEX + DROP INDEX (deterministic, fast, catches the regression
|
|
// even at 0-row scale where wall-clock can't distinguish O(n²) from O(1)).
|
|
// 2. Behavioral — populate 1000 duplicates and assert the migration completes
|
|
// under the wall-clock cap. Sanity check at small scale; the structural
|
|
// assertion is the real guard.

describe('migrations v8 + v9 — structural guard for helper-index fix', () => {
  test('migration v8 SQL contains idx_links_dedup_helper CREATE+DROP around the DELETE', () => {
    const v8 = MIGRATIONS.find(m => m.version === 8);
    expect(v8).toBeDefined();
    const sql = v8!.sql;

    // The fix must: (a) create the helper btree, (b) DELETE...USING, (c) drop the helper, (d) add the unique constraint.
    // If anyone reorders or removes the helper-index lines, this fails.
    expect(sql).toContain('CREATE INDEX IF NOT EXISTS idx_links_dedup_helper');
    expect(sql).toContain('ON links(from_page_id, to_page_id, link_type)');
    expect(sql).toContain('DROP INDEX IF EXISTS idx_links_dedup_helper');
    expect(sql).toContain('DELETE FROM links a USING links b');
    expect(sql).toContain('ALTER TABLE links ADD CONSTRAINT links_from_to_type_unique');

    // Order matters: CREATE INDEX before DELETE, DROP INDEX after DELETE, before ADD CONSTRAINT.
    const createIdx = sql.indexOf('CREATE INDEX IF NOT EXISTS idx_links_dedup_helper');
    const deleteUsing = sql.indexOf('DELETE FROM links a USING links b');
    const dropIdx = sql.indexOf('DROP INDEX IF EXISTS idx_links_dedup_helper');
    const addConstraint = sql.indexOf('ALTER TABLE links ADD CONSTRAINT links_from_to_type_unique');
    expect(createIdx).toBeLessThan(deleteUsing);
    expect(deleteUsing).toBeLessThan(dropIdx);
    expect(dropIdx).toBeLessThan(addConstraint);
  });

  test('migration v9 SQL contains idx_timeline_dedup_helper CREATE+DROP around the DELETE', () => {
    const v9 = MIGRATIONS.find(m => m.version === 9);
    expect(v9).toBeDefined();
    const sql = v9!.sql;

    expect(sql).toContain('CREATE INDEX IF NOT EXISTS idx_timeline_dedup_helper');
    expect(sql).toContain('ON timeline_entries(page_id, date, summary)');
    expect(sql).toContain('DROP INDEX IF EXISTS idx_timeline_dedup_helper');
    expect(sql).toContain('DELETE FROM timeline_entries a USING timeline_entries b');
    expect(sql).toContain('CREATE UNIQUE INDEX IF NOT EXISTS idx_timeline_dedup');

    const createHelper = sql.indexOf('CREATE INDEX IF NOT EXISTS idx_timeline_dedup_helper');
    const deleteUsing = sql.indexOf('DELETE FROM timeline_entries a USING timeline_entries b');
    const dropHelper = sql.indexOf('DROP INDEX IF EXISTS idx_timeline_dedup_helper');
    const createUnique = sql.indexOf('CREATE UNIQUE INDEX IF NOT EXISTS idx_timeline_dedup');
    expect(createHelper).toBeLessThan(deleteUsing);
    expect(deleteUsing).toBeLessThan(dropHelper);
    expect(dropHelper).toBeLessThan(createUnique);
  });
});
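
// A minimal sketch, not part of this suite: the ordering checks above repeat
// indexOf() + toBeLessThan() pairs for each migration. A hypothetical helper
// (the name fragmentsInOrder is an assumption, not an existing export) could
// state the same "A before B before C" rule once. Unlike the plain indexOf()
// comparisons, it searches for each fragment only after the previous match.
// It inspects the SQL text only; it does not prove execution order.
function fragmentsInOrder(sql: string, fragments: string[]): boolean {
  let cursor = 0;
  for (const fragment of fragments) {
    const at = sql.indexOf(fragment, cursor);
    if (at === -1) return false; // missing, or present only before the previous fragment
    cursor = at + fragment.length;
  }
  return true;
}
// Possible usage inside a test (assuming the same MIGRATIONS import):
//   expect(fragmentsInOrder(v8!.sql, [
//     'CREATE INDEX IF NOT EXISTS idx_links_dedup_helper',
//     'DELETE FROM links a USING links b',
//     'DROP INDEX IF EXISTS idx_links_dedup_helper',
//     'ALTER TABLE links ADD CONSTRAINT links_from_to_type_unique',
//   ])).toBe(true);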

// v0.14.1 — fix wave structural assertions (migrations renumbered from v12/v13 to
// v14/v15 after master merged budget_ledger (v12) + minion_quiet_hours_stagger (v13)).
describe('migrate v14 — pages_updated_at_index (handler-based, engine-aware)', () => {
  const v14 = MIGRATIONS.find(m => m.version === 14);
  test('v14 exists and uses a handler (not pure SQL) for engine-aware branching', () => {
    expect(v14).toBeDefined();
    expect(v14!.name).toBe('pages_updated_at_index');
    expect(typeof v14!.handler).toBe('function');
    expect(v14!.sql).toBe('');
  });

  test('v14 handler source contains CONCURRENTLY + invalid-index cleanup for Postgres branch', async () => {
    const { readFileSync } = await import('fs');
    const src = readFileSync('src/core/migrate.ts', 'utf-8');
    const v14Start = src.indexOf("name: 'pages_updated_at_index'");
    expect(v14Start).toBeGreaterThan(-1);
    const v14Block = src.slice(v14Start, v14Start + 3000);
    expect(v14Block).toContain('pg_index');
    expect(v14Block).toContain('indisvalid');
    expect(v14Block).toContain('DROP INDEX CONCURRENTLY IF EXISTS idx_pages_updated_at_desc');
    expect(v14Block).toContain('CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_pages_updated_at_desc');
    // Order within the handler body: DROP IF EXISTS must precede CREATE IF NOT EXISTS,
    // so a failed prior CONCURRENTLY build is cleaned before re-create. Anchor on the
    // explicit "IF EXISTS" / "IF NOT EXISTS" phrases so the header doc-comment
    // (which mentions both unqualified) doesn't fool the ordering assertion.
    const dropIdx = v14Block.indexOf('DROP INDEX CONCURRENTLY IF EXISTS');
    const createIdx = v14Block.indexOf('CREATE INDEX CONCURRENTLY IF NOT EXISTS');
    expect(dropIdx).toBeLessThan(createIdx);
    expect(v14Block).toContain('engine.kind');
  });
});

describe('migrate v15 — minion_jobs_max_stalled_default_5', () => {
  const v15 = MIGRATIONS.find(m => m.version === 15);
  test('v15 exists and alters max_stalled default to 5', () => {
    expect(v15).toBeDefined();
    expect(v15!.name).toBe('minion_jobs_max_stalled_default_5');
    expect(v15!.sql).toContain('ALTER TABLE minion_jobs ALTER COLUMN max_stalled SET DEFAULT 5');
  });

  test('v15 backfill UPDATE targets the correct non-terminal statuses', () => {
    const sql = v15!.sql;
    expect(sql).toContain(`'waiting'`);
    expect(sql).toContain(`'active'`);
    expect(sql).toContain(`'delayed'`);
    expect(sql).toContain(`'waiting-children'`);
    expect(sql).toContain(`'paused'`);
    expect(sql).not.toContain(`'completed'`);
    expect(sql).not.toContain(`'dead'`);
    expect(sql).not.toContain(`'cancelled'`);
    expect(sql).not.toContain(`'claimed'`);
    expect(sql).not.toContain(`'running'`);
    expect(sql).not.toContain(`'stalled'`);
  });

  test('v15 UPDATE clause has the < 5 guard so idempotent re-runs are no-ops', () => {
    expect(v15!.sql).toContain('max_stalled < 5');
  });
});

describe('migrate — runner behavioral (v14 handler + v15 backfill)', () => {
  let engine: PGLiteEngine;

  beforeAll(async () => {
    engine = new PGLiteEngine();
    await engine.connect({});
    await engine.initSchema();
  });

  afterAll(async () => {
    await engine.disconnect();
  });

  test('v14 created idx_pages_updated_at_desc on PGLite via handler branch', async () => {
    const rows = await (engine as any).db.query(
      `SELECT indexname FROM pg_indexes WHERE indexname = 'idx_pages_updated_at_desc'`
    );
    expect(rows.rows.length).toBe(1);
  });

  test('v15 backfilled any max_stalled=1 rows (smoke: schema default is 5)', async () => {
    // Seed a row with max_stalled = 1, then replay the v15-style backfill UPDATE by hand.
    await (engine as any).db.exec(
      `INSERT INTO minion_jobs (name, queue, status, max_stalled) VALUES ('test', 'default', 'waiting', 1)`
    );
    await (engine as any).db.exec(
      `UPDATE minion_jobs SET max_stalled = 5
       WHERE status IN ('waiting','active','delayed','waiting-children','paused')
       AND max_stalled < 5`
    );
    const rows = await (engine as any).db.query(
      `SELECT max_stalled FROM minion_jobs WHERE name = 'test'`
    );
    expect((rows.rows[0] as any).max_stalled).toBe(5);

    // Run the same UPDATE again: the < 5 guard should leave the row unchanged.
    await (engine as any).db.exec(
      `UPDATE minion_jobs SET max_stalled = 5
       WHERE status IN ('waiting','active','delayed','waiting-children','paused')
       AND max_stalled < 5`
    );
    const rows2 = await (engine as any).db.query(
      `SELECT max_stalled FROM minion_jobs WHERE name = 'test'`
    );
    expect((rows2.rows[0] as any).max_stalled).toBe(5);
  });
});

describe('migrate: v8 (links_dedup) regression — must be fast on 1K duplicate rows', () => {
  let engine: PGLiteEngine;

  beforeAll(async () => {
    engine = new PGLiteEngine();
    await engine.connect({});
    await engine.initSchema();
  });

  afterAll(async () => {
    await engine.disconnect();
  });

  test('1000 duplicate links dedup completes in <5s and leaves table deduped', async () => {
    // Set up: drop BOTH the old (v8) and new (v11) unique constraints so
    // duplicates can be inserted, then reset the version to 7 so v8 through v11 re-run.
    // v11 replaces the v8 constraint name; we drop whichever is present.
    const db = (engine as any).db;
    await db.exec(`ALTER TABLE links DROP CONSTRAINT IF EXISTS links_from_to_type_unique`);
    await db.exec(`ALTER TABLE links DROP CONSTRAINT IF EXISTS links_from_to_type_source_origin_unique`);

    // Two pages so the FK is satisfied
    await engine.putPage('p/from', { type: 'concept', title: 'F', compiled_truth: '', timeline: '' });
    await engine.putPage('p/to', { type: 'concept', title: 'T', compiled_truth: '', timeline: '' });
    const fromId = (await db.query(`SELECT id FROM pages WHERE slug = 'p/from'`)).rows[0].id;
    const toId = (await db.query(`SELECT id FROM pages WHERE slug = 'p/to'`)).rows[0].id;

    // Insert 1000 duplicates of the same (from, to, type) row
    for (let i = 0; i < 1000; i++) {
      await db.query(
        `INSERT INTO links (from_page_id, to_page_id, link_type, context) VALUES ($1, $2, $3, $4)`,
        [fromId, toId, 'mention', `dup-${i}`]
      );
    }
    const beforeCount = (await db.query(`SELECT COUNT(*)::int AS c FROM links`)).rows[0].c;
    expect(beforeCount).toBe(1000);

    // Reset version to 7 so v8 + v9 + v10 + v11 re-run
    await engine.setConfig('version', '7');

    // Run migrations and assert wall-clock + correctness
    const start = Date.now();
    await runMigrations(engine);
    const elapsedMs = Date.now() - start;

    expect(elapsedMs).toBeLessThan(5000);

    const afterCount = (await db.query(`SELECT COUNT(*)::int AS c FROM links`)).rows[0].c;
    expect(afterCount).toBe(1); // deduped to one row

    // v11 replaces v8's constraint name. Assert the current (v11) constraint
    // exists and the legacy v8 name is gone.
    const constraints = (await db.query(`
      SELECT conname FROM pg_constraint
      WHERE conrelid = 'links'::regclass AND contype = 'u'
    `)).rows;
    expect(constraints.some((c: { conname: string }) => c.conname === 'links_from_to_type_source_origin_unique')).toBe(true);
    expect(constraints.some((c: { conname: string }) => c.conname === 'links_from_to_type_unique')).toBe(false);

    // Helper index was dropped after dedup
    const helperIdx = (await db.query(`
      SELECT indexname FROM pg_indexes
      WHERE tablename = 'links' AND indexname = 'idx_links_dedup_helper'
    `)).rows;
    expect(helperIdx.length).toBe(0);
  });
});

describe('migrate: v9 (timeline_dedup_index) regression — must be fast on 1K duplicate rows', () => {
  let engine: PGLiteEngine;

  beforeAll(async () => {
    engine = new PGLiteEngine();
    await engine.connect({});
    await engine.initSchema();
  });

  afterAll(async () => {
    await engine.disconnect();
  });

  test('1000 duplicate timeline entries dedup completes in <5s and leaves table deduped', async () => {
    // Drop the unique index so duplicate (page_id, date, summary) rows can be inserted.
    const db = (engine as any).db;
    await db.exec(`DROP INDEX IF EXISTS idx_timeline_dedup`);

    await engine.putPage('p/timeline', { type: 'concept', title: 'TL', compiled_truth: '', timeline: '' });
    const pageId = (await db.query(`SELECT id FROM pages WHERE slug = 'p/timeline'`)).rows[0].id;

    // Insert 1000 duplicates of the same (page_id, date, summary) row
    for (let i = 0; i < 1000; i++) {
      await db.query(
        `INSERT INTO timeline_entries (page_id, date, source, summary, detail) VALUES ($1, $2::date, $3, $4, $5)`,
        [pageId, '2024-01-15', `src-${i}`, 'Founded NovaMind', `detail-${i}`]
      );
    }
    const beforeCount = (await db.query(`SELECT COUNT(*)::int AS c FROM timeline_entries`)).rows[0].c;
    expect(beforeCount).toBe(1000);

    // Reset version to 7 so v9 re-runs (along with v8, v10, and v11)
    await engine.setConfig('version', '7');

    // Run migrations and assert wall-clock + correctness
    const start = Date.now();
    await runMigrations(engine);
    const elapsedMs = Date.now() - start;

    expect(elapsedMs).toBeLessThan(5000);

    const afterCount = (await db.query(`SELECT COUNT(*)::int AS c FROM timeline_entries`)).rows[0].c;
    expect(afterCount).toBe(1);

    // The unique index was recreated by v9
    const uniqueIdx = (await db.query(`
      SELECT indexname FROM pg_indexes
      WHERE tablename = 'timeline_entries' AND indexname = 'idx_timeline_dedup'
    `)).rows;
    expect(uniqueIdx.length).toBe(1);

    // Helper index was dropped after dedup
    const helperIdx = (await db.query(`
      SELECT indexname FROM pg_indexes
      WHERE tablename = 'timeline_entries' AND indexname = 'idx_timeline_dedup_helper'
    `)).rows;
    expect(helperIdx.length).toBe(0);
  });
});

// ─────────────────────────────────────────────────────────────────
// resolvePoolSize — GBRAIN_POOL_SIZE env override
// ─────────────────────────────────────────────────────────────────
//
// Guards the Bug 2 fix: users on constrained poolers (Supabase port 6543)
// must be able to cap the pool size via GBRAIN_POOL_SIZE. The default
// (10) is unchanged when the env var is unset.

describe('resolvePoolSize — env var + explicit override', () => {
  const { resolvePoolSize } = require('../src/core/db.ts');
  const original = process.env.GBRAIN_POOL_SIZE;

  afterAll(() => {
    if (original === undefined) delete process.env.GBRAIN_POOL_SIZE;
    else process.env.GBRAIN_POOL_SIZE = original;
  });

  test('returns 10 default when unset and no explicit override', () => {
    delete process.env.GBRAIN_POOL_SIZE;
    expect(resolvePoolSize()).toBe(10);
  });

  test('reads GBRAIN_POOL_SIZE as an integer', () => {
    process.env.GBRAIN_POOL_SIZE = '2';
    expect(resolvePoolSize()).toBe(2);
    process.env.GBRAIN_POOL_SIZE = '5';
    expect(resolvePoolSize()).toBe(5);
  });

  test('ignores invalid GBRAIN_POOL_SIZE values', () => {
    process.env.GBRAIN_POOL_SIZE = 'not-a-number';
    expect(resolvePoolSize()).toBe(10);
    process.env.GBRAIN_POOL_SIZE = '0';
    expect(resolvePoolSize()).toBe(10);
    process.env.GBRAIN_POOL_SIZE = '-1';
    expect(resolvePoolSize()).toBe(10);
  });

  test('explicit argument wins over env + default', () => {
    delete process.env.GBRAIN_POOL_SIZE;
    expect(resolvePoolSize(3)).toBe(3);
    process.env.GBRAIN_POOL_SIZE = '7';
    expect(resolvePoolSize(3)).toBe(3);
  });
});
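
// A minimal sketch of the precedence these tests pin down, under stated
// assumptions: the real resolvePoolSize lives in src/core/db.ts and is not
// shown here, so the name resolvePoolSizeSketch, the explicit envValue
// parameter, and the parsing details are illustrative guesses rather than the
// shipped code. Precedence: explicit argument, then a positive-integer env
// value, then the default of 10.
function resolvePoolSizeSketch(explicit?: number, envValue?: string): number {
  if (explicit !== undefined && Number.isInteger(explicit) && explicit > 0) return explicit;
  const parsed = Number.parseInt(envValue ?? '', 10);
  if (Number.isInteger(parsed) && parsed > 0) return parsed; // rejects NaN, 0, and negatives
  return 10;
}
// Possible usage: resolvePoolSizeSketch(undefined, process.env.GBRAIN_POOL_SIZE)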