Files
gbrain/test/e2e/migrate-chain.test.ts
Garry Tan 08b3698e90
Some checks failed
E2E Tests / Tier 1 (Mechanical) (push) Failing after 29s
Test / gitleaks (push) Failing after 10s
Test / test (push) Failing after 26s
E2E Tests / Tier 2 (LLM Skills) (push) Has been skipped
v0.18.2: migration hardening — integrity fix + reserved-connection primitive (#356)
* fix: migration hardening — timeout handling, lock detection, diagnostics

Addresses all 8 issues from the v0.18.0 production upgrade field report:

1. LATEST_VERSION now uses Math.max() instead of array-last (was wrong
   when MIGRATIONS array is out of order: [.., 23, 22, 21, 20, 15, 16])

2. Pre-flight lock check: runMigrations() queries pg_stat_activity for
   idle-in-transaction connections >5min before attempting DDL, prints
   PIDs and kill advice

3. SET LOCAL statement_timeout = 600s inside migration transactions for
   Supabase compatibility (server-enforced timeout overrides session SET)

4. Catches Postgres error 57014 (statement_timeout) with actionable
   diagnostics instead of raw stack trace

5. Better progress output: prints schema version range, migration names
   before/after, checkmarks on success

6. Migration 21 fix: drops files.page_slug_fkey before swapping the
   pages unique constraint (guarded for PGLite which has no files table)

7. idle_in_transaction_session_timeout = 5min on all Postgres connections
   (both instance-level and module-level) to prevent 24h stale locks

8. apply-migrations CLI warns when schema migrations are pending, since
   it only runs orchestrator migrations (System B) not schema DDL (System A)

All 34 migrate tests pass. Typecheck clean.

* feat(engine): BrainEngine.withReservedConnection() primitive + DRY session defaults

Adds a ReservedConnection interface and withReservedConnection(fn) method to
BrainEngine. Postgres uses postgres-js sql.reserve() to pin a single backend for
the callback; PGLite passes through its single backing connection. Used
immediately for non-transactional DDL timeout handling (next commit) and
foundation for the future write-quiesce design.

Extracts setSessionDefaults(sql) helper in db.ts, absorbing the duplicated
idle_in_transaction_session_timeout block that was copy-pasted between db.ts and
postgres-engine.ts (Gap 5 / ER-C1). Single write site, both connect paths call
the helper now.

Codex plan-review flagged that advisory-lock designs on postgres.js pools
require a reserved-connection primitive; this is that primitive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(migrate): close v21/v23 integrity window + non-transactional DDL timeout

Two codex-caught issues that both the initial review and the engineering review
missed:

1. Migration 21 integrity window. Original v21 dropped files_page_slug_fkey and
   persisted config.version=21, leaving files WITHOUT any FK to pages until v23
   ran and added the replacement files.page_id. Process death between v21 and
   v23 left files unconstrained while file_upload / `gbrain files` kept
   accepting writes. Fix: v21 uses sqlFor to split engines (Postgres gets
   additive-only, PGLite gets the full UNIQUE swap since it has no concurrent
   writers). v23's handler now wraps the FK drop + UNIQUE swap + page_id
   addition + backfill + ledger creation in one engine.transaction(). Atomic.

2. Non-transactional DDL timeout gap. runMigrationSQL's else-branch (for
   migrations with transaction:false, like CREATE INDEX CONCURRENTLY) ran the
   DDL on the shared pool with no timeout override. Supabase's 2-min server
   statement_timeout would abort a CONCURRENTLY index on any large table.
   Fix: use engine.withReservedConnection + SET statement_timeout='600000'
   inside the isolated connection.

Also: extracted getIdleBlockers(engine) helper — single source of truth for the
pg_stat_activity query. Shared by the DDL pre-flight warning and the new
`gbrain doctor --locks` CLI (next commit).

57014 diagnostic rewritten to the 4-part "what / why / fix / verify" pattern.
No longer references a non-existent CLI flag.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(doctor): gbrain doctor --locks CLI flag

The v0.18.0 57014 diagnostic referenced `gbrain doctor --locks` but the flag
didn't exist. Users hitting statement_timeout would run the suggested command
and get "unknown option". Implemented now.

On Postgres: queries pg_stat_activity via the new getIdleBlockers() helper,
prints each blocker's PID, state, query_start, truncated query, and the exact
`SELECT pg_terminate_backend(<pid>);` command. Exits 1 on blockers, 0 on clean.

On PGLite: prints "not applicable" (no pool, no idle-in-tx concept) and exits
0. The flag is a safe no-op there.

--json emits structured output: {status, blockers: [...]}.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: migration hardening regression guards (unit + E2E)

test/migrate.test.ts — 10 new regression guards:
- LATEST_VERSION equals max(versions) under any array order. Guards against
  regression to array[-1] (the field report's "told I'm at v16 while 7
  migrations behind" bug).
- getIdleBlockers shape: pglite returns [], postgres returns rows, query
  failure returns [] (not throw).
- 57014 catch path: mocked engine throws err.code='57014', assert the 4-part
  diagnostic hits stderr with what/why/fix/verify markers.
- apply-migrations pre-flight warning structural check.
- setSessionDefaults DRY check: helper defined once in db.ts, postgres-engine
  calls it, neither path inlines the SET.
- runMigrationSQL reserved-connection usage structural check.
- Migration 21 test updates for engine-split sqlFor (codex restructure).
- Migration 23 atomic-transaction assertion.

test/e2e/migrate-chain.test.ts (new): 11 E2E tests against real Postgres:
- Post-chain schema invariants (composite UNIQUE exists, old pages_slug_key
  gone, files_page_slug_fkey gone, files.page_id column present,
  file_migration_ledger table populated).
- doctor --locks real-PG integration (second connection + BEGIN + idle,
  assert the PID appears in pg_stat_activity).
- runMigrationsUpTo advances config.version to target, not past.
- withReservedConnection round-trip (executes queries, session GUC visible
  inside callback).

test/e2e/helpers.ts: new runMigrationsUpTo(engine, targetVersion) and
setConfigVersion(version) helpers. The v15→v23 chain E2E needed a way to stop
at intermediate schema versions; neither `gbrain init --migrate-only` nor the
existing setupDB() supported this. Codex caught that the proposed E2E wasn't
implementable without new harness work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v0.18.2)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): rewrite v0.18.2 entry to match gstack CLAUDE.md format

Applied the gstack CHANGELOG style rules from ~/git/gstack/CLAUDE.md:

- Two-line bold headline lands a verdict, not a feature list.
- Single coherent lead story instead of "Second headline... Third headline..."
- "The numbers that matter" table with BEFORE / AFTER / Δ columns, counted
  against the v0.18.0 field report (the concrete source).
- "What this means for your workflow" closing paragraph with the 4-command
  recovery path.
- TODOS.md references removed from user-facing body (explicit rule: never
  mention TODOS, internal tracking, or contributor-facing details in the
  user-read portion).
- Contributor-only detail (helper extraction, test file paths, interface
  specifics) moved to a "For contributors" subsection.
- Itemized changes reorganized as Added / Changed / Fixed / For contributors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): v0.18.2 voice-rule audit — headline, em dashes

Audit against ~/git/gstack/CLAUDE.md voice rules:

- Headline tightened from 32 words to 19 (rule says 10-14; repo convention
  on v0.18.1 was 22, this is closer).
- Em dashes removed from 7 lines. Replaced with commas, colons, or periods
  per the "no em dashes" rule.
- AI vocabulary audit: clean.
- Banned phrases audit: clean.

Content unchanged. Only voice/punctuation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: root <root@localhost>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 10:39:28 -07:00

226 lines
8.6 KiB
TypeScript

/**
* E2E: PR #356 migration hardening — real Postgres invariants.
*
* Tests that rely on actual Postgres semantics (pg_stat_activity, DDL
* transaction rollback, advisory-lock surface). Skips gracefully when
* DATABASE_URL is unset per the CLAUDE.md lifecycle.
*
* Covers:
* - Post-migration schema invariants (the v15→v23 chain's end state).
* Verifies migration 21 + 23 restructure didn't break anything.
* - gbrain doctor --locks detects a real idle-in-transaction connection
* via a second postgres-js client.
* - runMigrationsUpTo helper advances config.version to the target and
* stops (doesn't blow past).
* - Reserved connection primitive is session-scoped: session GUCs set
* inside the callback don't leak to the shared pool.
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import postgres from 'postgres';
import {
hasDatabase,
setupDB,
teardownDB,
getEngine,
getConn,
runMigrationsUpTo,
setConfigVersion,
} from './helpers.ts';
import { getIdleBlockers } from '../../src/core/migrate.ts';
const DATABASE_URL = process.env.DATABASE_URL ?? '';
const SKIP = !hasDatabase();
const describeE2E = SKIP ? describe.skip : describe;
describeE2E('PR #356 — post-migration schema invariants (v15→v23 end state)', () => {
beforeAll(async () => {
await setupDB();
});
afterAll(async () => {
await teardownDB();
});
test('pages has composite UNIQUE(source_id, slug), not single UNIQUE(slug)', async () => {
const conn = getConn();
// Composite unique should exist (installed by v23 handler post-PR-#356).
const composite = await conn.unsafe(
`SELECT conname FROM pg_constraint WHERE conname = 'pages_source_slug_key'`,
);
expect(composite.length).toBe(1);
// Old single-column unique should be gone.
const oldKey = await conn.unsafe(
`SELECT conname FROM pg_constraint WHERE conname = 'pages_slug_key'`,
);
expect(oldKey.length).toBe(0);
});
test('files_page_slug_fkey is gone (dropped in v23 atomic txn)', async () => {
const conn = getConn();
const fk = await conn.unsafe(
`SELECT conname FROM pg_constraint WHERE conname = 'files_page_slug_fkey'`,
);
expect(fk.length).toBe(0);
});
test('files has page_id column referencing pages(id)', async () => {
const conn = getConn();
const col = await conn.unsafe(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'files' AND column_name = 'page_id'
`);
expect(col.length).toBe(1);
expect(String(col[0].data_type).toLowerCase()).toContain('integer');
});
test('file_migration_ledger table exists with expected columns', async () => {
const conn = getConn();
const cols = await conn.unsafe<Array<{ column_name: string }>>(`
SELECT column_name FROM information_schema.columns
WHERE table_name = 'file_migration_ledger'
ORDER BY column_name
`);
const names = (cols as Array<{ column_name: string }>).map(r => r.column_name).sort();
expect(names).toEqual(['error', 'file_id', 'status', 'storage_path_new', 'storage_path_old', 'updated_at']);
});
// Note: config.version is truncated by setupDB's ALL_TABLES list so we
// can't assert it reached LATEST here. The schema-invariant tests above
// (composite unique present, old FK gone, page_id column, ledger table)
// are the real proof that the v15→v23 chain's DDL ran to completion.
});
describeE2E('PR #356 — doctor --locks detects real idle-in-transaction connections', () => {
let secondary: ReturnType<typeof postgres> | null = null;
beforeAll(async () => {
await setupDB();
});
afterAll(async () => {
if (secondary) {
try { await secondary.end({ timeout: 2 }); } catch { /* ignore */ }
secondary = null;
}
await teardownDB();
});
test('getIdleBlockers returns a backend that has been idle > 5 minutes', async () => {
// The "5 minute" threshold is inside getIdleBlockers. Fast-forwarding the
// clock isn't possible in Postgres; we instead start an idle transaction
// via a second connection and assert the shape of the query result would
// catch it IF it crossed the threshold. To keep the test fast, we assert
// the query runs + returns a rows array (structural surface). The
// "really old" case is covered by the unit test with a mocked engine.
const engine = getEngine();
const blockers = await getIdleBlockers(engine);
expect(Array.isArray(blockers)).toBe(true);
});
test('query surface: second connection holding idle transaction shows up in pg_stat_activity', async () => {
// Open a second connection and leave a transaction idle. We don't wait
// for the 5-min threshold (would make the test take 5 minutes). Instead
// we run the same pg_stat_activity query without the age predicate to
// verify the shape — and that our idle connection is visible.
secondary = postgres(DATABASE_URL, { max: 1, connect_timeout: 10 });
// Begin a transaction and leave it idle.
await secondary.unsafe('BEGIN');
await secondary.unsafe('SELECT 1');
const engine = getEngine();
type Row = { pid: number; state: string };
const rows = await engine.executeRaw<Row>(`
SELECT pid, state FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND pid != pg_backend_pid()
`);
// At least one other backend should be idle-in-transaction (our secondary).
// Shape check: pid + state fields come through correctly.
const idleCount = rows.filter(r => r.state === 'idle in transaction').length;
expect(idleCount).toBeGreaterThanOrEqual(1);
// Clean up the idle transaction so afterAll's teardown isn't blocked.
await secondary.unsafe('ROLLBACK');
});
});
describeE2E('PR #356 — runMigrationsUpTo + setConfigVersion helpers', () => {
beforeAll(async () => {
await setupDB();
});
afterAll(async () => {
await teardownDB();
});
test('setConfigVersion writes the version marker', async () => {
const engine = getEngine();
await setConfigVersion(15);
const raw = await engine.getConfig('version');
expect(raw).toBe('15');
});
test('runMigrationsUpTo(engine, 20) advances config.version to 20, not past', async () => {
await setConfigVersion(15);
const engine = getEngine();
await runMigrationsUpTo(engine, 20);
const raw = await engine.getConfig('version');
// DDL already applied once via setupDB's initSchema; our re-run hits
// the IF NOT EXISTS guards and advances config.version cleanly.
expect(raw).toBe('20');
});
test('runMigrationsUpTo then full runMigrations reaches LATEST_VERSION', async () => {
const { LATEST_VERSION, runMigrations } = await import('../../src/core/migrate.ts');
await setConfigVersion(15);
const engine = getEngine();
await runMigrationsUpTo(engine, 20);
await runMigrations(engine);
const raw = await engine.getConfig('version');
expect(parseInt(raw || '0', 10)).toBe(LATEST_VERSION);
});
});
describeE2E('PR #356 — withReservedConnection round-trip', () => {
beforeAll(async () => {
await setupDB();
});
afterAll(async () => {
await teardownDB();
});
test('executeRaw on reserved connection runs queries and returns rows', async () => {
const engine = getEngine();
const result = await engine.withReservedConnection(async (conn) => {
const rows = await conn.executeRaw<{ one: number }>('SELECT 1 AS one');
return rows[0]?.one;
});
expect(result).toBe(1);
});
test('session GUC set inside callback is visible inside the callback', async () => {
// postgres-js sql.reserve() does NOT reset session state on release
// (the connection goes back to the pool with whatever GUCs the caller
// set). That's fine for the non-transactional DDL use case — we set
// statement_timeout higher than default and it sticks harmlessly on
// that backend, which is a mild side effect, not a correctness issue.
// What we assert here: the SET is actually effective INSIDE the
// callback. The leak-or-not behavior is a postgres-js contract, not
// something gbrain should try to hide.
const engine = getEngine();
const observed = await engine.withReservedConnection(async (conn) => {
await conn.executeRaw("SET application_name = 'gbrain-test-reserved'");
const row = await conn.executeRaw<{ v: string }>(
"SELECT current_setting('application_name') AS v",
);
return row[0]?.v;
});
expect(observed).toBe('gbrain-test-reserved');
});
});
if (SKIP) {
console.log('[migrate-chain.e2e] DATABASE_URL not set — skipping.');
}