* fix: migration hardening — timeout handling, lock detection, diagnostics Addresses all 8 issues from the v0.18.0 production upgrade field report: 1. LATEST_VERSION now uses Math.max() instead of array-last (was wrong when MIGRATIONS array is out of order: [.., 23, 22, 21, 20, 15, 16]) 2. Pre-flight lock check: runMigrations() queries pg_stat_activity for idle-in-transaction connections >5min before attempting DDL, prints PIDs and kill advice 3. SET LOCAL statement_timeout = 600s inside migration transactions for Supabase compatibility (server-enforced timeout overrides session SET) 4. Catches Postgres error 57014 (statement_timeout) with actionable diagnostics instead of raw stack trace 5. Better progress output: prints schema version range, migration names before/after, checkmarks on success 6. Migration 21 fix: drops files.page_slug_fkey before swapping the pages unique constraint (guarded for PGLite which has no files table) 7. idle_in_transaction_session_timeout = 5min on all Postgres connections (both instance-level and module-level) to prevent 24h stale locks 8. apply-migrations CLI warns when schema migrations are pending, since it only runs orchestrator migrations (System B) not schema DDL (System A) All 34 migrate tests pass. Typecheck clean. * feat(engine): BrainEngine.withReservedConnection() primitive + DRY session defaults Adds a ReservedConnection interface and withReservedConnection(fn) method to BrainEngine. Postgres uses postgres-js sql.reserve() to pin a single backend for the callback; PGLite passes through its single backing connection. Used immediately for non-transactional DDL timeout handling (next commit) and foundation for the future write-quiesce design. Extracts setSessionDefaults(sql) helper in db.ts, absorbing the duplicated idle_in_transaction_session_timeout block that was copy-pasted between db.ts and postgres-engine.ts (Gap 5 / ER-C1). Single write site, both connect paths call the helper now. Codex plan-review flagged that advisory-lock designs on postgres.js pools require a reserved-connection primitive; this is that primitive. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(migrate): close v21/v23 integrity window + non-transactional DDL timeout Two codex-caught issues that both the initial review and the engineering review missed: 1. Migration 21 integrity window. Original v21 dropped files_page_slug_fkey and persisted config.version=21, leaving files WITHOUT any FK to pages until v23 ran and added the replacement files.page_id. Process death between v21 and v23 left files unconstrained while file_upload / `gbrain files` kept accepting writes. Fix: v21 uses sqlFor to split engines (Postgres gets additive-only, PGLite gets the full UNIQUE swap since it has no concurrent writers). v23's handler now wraps the FK drop + UNIQUE swap + page_id addition + backfill + ledger creation in one engine.transaction(). Atomic. 2. Non-transactional DDL timeout gap. runMigrationSQL's else-branch (for migrations with transaction:false, like CREATE INDEX CONCURRENTLY) ran the DDL on the shared pool with no timeout override. Supabase's 2-min server statement_timeout would abort a CONCURRENTLY index on any large table. Fix: use engine.withReservedConnection + SET statement_timeout='600000' inside the isolated connection. Also: extracted getIdleBlockers(engine) helper — single source of truth for the pg_stat_activity query. Shared by the DDL pre-flight warning and the new `gbrain doctor --locks` CLI (next commit). 57014 diagnostic rewritten to the 4-part "what / why / fix / verify" pattern. No longer references a non-existent CLI flag. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(doctor): gbrain doctor --locks CLI flag The v0.18.0 57014 diagnostic referenced `gbrain doctor --locks` but the flag didn't exist. Users hitting statement_timeout would run the suggested command and get "unknown option". Implemented now. On Postgres: queries pg_stat_activity via the new getIdleBlockers() helper, prints each blocker's PID, state, query_start, truncated query, and the exact `SELECT pg_terminate_backend(<pid>);` command. Exits 1 on blockers, 0 on clean. On PGLite: prints "not applicable" (no pool, no idle-in-tx concept) and exits 0. The flag is a safe no-op there. --json emits structured output: {status, blockers: [...]}. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: migration hardening regression guards (unit + E2E) test/migrate.test.ts — 10 new regression guards: - LATEST_VERSION equals max(versions) under any array order. Guards against regression to array[-1] (the field report's "told I'm at v16 while 7 migrations behind" bug). - getIdleBlockers shape: pglite returns [], postgres returns rows, query failure returns [] (not throw). - 57014 catch path: mocked engine throws err.code='57014', assert the 4-part diagnostic hits stderr with what/why/fix/verify markers. - apply-migrations pre-flight warning structural check. - setSessionDefaults DRY check: helper defined once in db.ts, postgres-engine calls it, neither path inlines the SET. - runMigrationSQL reserved-connection usage structural check. - Migration 21 test updates for engine-split sqlFor (codex restructure). - Migration 23 atomic-transaction assertion. test/e2e/migrate-chain.test.ts (new): 11 E2E tests against real Postgres: - Post-chain schema invariants (composite UNIQUE exists, old pages_slug_key gone, files_page_slug_fkey gone, files.page_id column present, file_migration_ledger table populated). - doctor --locks real-PG integration (second connection + BEGIN + idle, assert the PID appears in pg_stat_activity). - runMigrationsUpTo advances config.version to target, not past. - withReservedConnection round-trip (executes queries, session GUC visible inside callback). test/e2e/helpers.ts: new runMigrationsUpTo(engine, targetVersion) and setConfigVersion(version) helpers. The v15→v23 chain E2E needed a way to stop at intermediate schema versions; neither `gbrain init --migrate-only` nor the existing setupDB() supported this. Codex caught that the proposed E2E wasn't implementable without new harness work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: bump version and changelog (v0.18.2) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(changelog): rewrite v0.18.2 entry to match gstack CLAUDE.md format Applied the gstack CHANGELOG style rules from ~/git/gstack/CLAUDE.md: - Two-line bold headline lands a verdict, not a feature list. - Single coherent lead story instead of "Second headline... Third headline..." - "The numbers that matter" table with BEFORE / AFTER / Δ columns, counted against the v0.18.0 field report (the concrete source). - "What this means for your workflow" closing paragraph with the 4-command recovery path. - TODOS.md references removed from user-facing body (explicit rule: never mention TODOS, internal tracking, or contributor-facing details in the user-read portion). - Contributor-only detail (helper extraction, test file paths, interface specifics) moved to a "For contributors" subsection. - Itemized changes reorganized as Added / Changed / Fixed / For contributors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(changelog): v0.18.2 voice-rule audit — headline, em dashes Audit against ~/git/gstack/CLAUDE.md voice rules: - Headline tightened from 32 words to 19 (rule says 10-14; repo convention on v0.18.1 was 22, this is closer). - Em dashes removed from 7 lines. Replaced with commas, colons, or periods per the "no em dashes" rule. - AI vocabulary audit: clean. - Banned phrases audit: clean. Content unchanged. Only voice/punctuation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: root <root@localhost> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2 lines
7 B
Plaintext
2 lines
7 B
Plaintext
0.18.2
|