gbrain/test/traverse-graph-dedup.test.ts
Garry Tan b5fa3d044a fix: 8 root-cause fixes from /investigate (v0.14.2) (#259)
* fix: 8 root-cause fixes from /investigate wave

Consolidated bundle of bug fixes from /investigate on the 8 deferred bugs.
Each fix was designed to go at the structural gap, not the symptom. Codex
verified 20 load-bearing claims on the plan; 12 triggered plan revisions.

Bug 2  — GBRAIN_POOL_SIZE env knob + init finally blocks (no auto-detect).
         Covers both the singleton pool (db.ts) and instance pool (import.ts:140).
Bug 3  — Centralize migration ledger writes in apply-migrations runner.
         Removed appendCompletedMigration from v0_11_0, v0_12_0, v0_12_2,
         v0_13_0, v0_13_1. Added 3-partial wedge cap + --force-retry reset.
         'complete wins' preserved; no partial can regress a completed migration.
Bug 5  — v0.14.0 migration registered. src/commands/migrations/v0_14_0.ts
         ships Phase A (ALTER minion_jobs.max_stalled SET DEFAULT 3) + Phase B
         (pending-host-work ping for shell-jobs adoption).
Bug 6/10 — jsonb_agg(DISTINCT ...) in legacy traverseGraph (both engines).
         Presentation-level dedup; schema still preserves provenance rows.
Bug 7  — doctor --fast reads DB URL source via getDbUrlSource() in config.ts.
         Precise message: 'Skipping DB checks (--fast mode, URL present from env)'
         replaces the misleading 'No database configured'.
Bug 8  — max_stalled default bumped 1→3 in schema-embedded.ts, pglite-schema.ts,
         schema.sql (new installs). v0_14_0 Phase A ALTER for existing installs.
         autopilot-cycle handler yields to event loop between phases so the
         worker's lock-renewal timer fires on huge brains. (Deep AbortSignal
         threading through runEmbedCore/runExtractCore/runBacklinksCore/performSync
         deferred to v0.15 queue polish.)
Bug 9  — Gate sync.last_commit on no-failures across all three sync paths
         (incremental, full via runImport, gbrain import git continuity).
         recordSyncFailures() helper + ~/.gbrain/sync-failures.jsonl with
         dedup key path+commit+error-hash. New flags: --skip-failed (ack) +
         --retry-failed (re-attempt). Doctor surfaces unacknowledged failures.
Bug 11 — brain_score breakdown fields on BrainHealth (embed_coverage_score,
         link_density_score, timeline_coverage_score, no_orphans_score,
         no_dead_links_score); sum equals brain_score by construction.
         dead_links now on the type (resolves featuresTeaserForDoctor drift).
         orphan_pages kept as 'islanded' (no inbound AND no outbound) and
         docs updated to match — explicit semantic instead of doc drift.
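
As an illustration of the Bug 9 dedup key (path+commit+error-hash), here is a minimal sketch. The record shape, the `failureKey` and `dedupFailures` names, and the choice of a truncated SHA-256 for the error hash are all assumptions for illustration; the actual recordSyncFailures() helper and the sync-failures.jsonl schema are not shown here.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical record shape; the real sync-failures.jsonl schema may differ.
interface SyncFailure {
  path: string;
  commit: string;
  error: string;
}

// Assumed key format: path + commit + a short SHA-256 digest of the error text.
// A NUL separator keeps the three parts from running into each other.
function failureKey(f: SyncFailure): string {
  const errorHash = createHash('sha256').update(f.error).digest('hex').slice(0, 12);
  return [f.path, f.commit, errorHash].join('\u0000');
}

// Keep the first record seen for each key, preserving input order.
function dedupFailures(failures: SyncFailure[]): SyncFailure[] {
  const seen = new Set<string>();
  const out: SyncFailure[] = [];
  for (const f of failures) {
    const key = failureKey(f);
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(f);
  }
  return out;
}
```

Under these assumptions, the same file failing twice at the same commit with the same error is recorded once, while a new commit or a different error yields a separate entry.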

New tests: test/traverse-graph-dedup.test.ts, test/sync-failures.test.ts,
test/brain-score-breakdown.test.ts, test/migration-resume.test.ts,
test/migrations-v0_14_0.test.ts. Extended: migrate, doctor, apply-migrations.

All 1696 unit tests pass locally. postgres-jsonb E2E regression unchanged
(none of these touch the JSONB write surface).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: v0.14.2 CHANGELOG + CLAUDE.md; align migration-flow E2E with runner-owned ledger

CHANGELOG: v0.14.2 entry in the standard release-summary format
(two-line headline + lead + numbers table + "what this means" +
"To take advantage of v0.14.2" self-repair block + itemized
changes grouped by reliability / observability / graph correctness /
new migration / tests / deferred-to-v0.15).

CLAUDE.md: new "Key commands added in v0.14.2" section covers
--skip-failed, --retry-failed, --force-retry, GBRAIN_POOL_SIZE env,
and the new doctor checks (sync_failures, brain_score breakdown).
Migration orchestrator docs updated to describe v0_14_0.ts + the
runner-owned ledger contract from Bug 3.

test/e2e/migration-flow.test.ts: three assertions updated to match
the Bug 3 contract — orchestrators no longer append to completed.jsonl
directly, so direct-orchestrator E2E calls leave the ledger empty.
Preferences assertions remain (that's still the orchestrator's side
of the contract). Runner's ledger write is covered by the unit suite
(test/apply-migrations.test.ts + test/migration-resume.test.ts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 23:14:38 +08:00


/**
 * Bug 6/10 regression: legacy traverseGraph jsonb_agg duplicate edges.
 *
 * The links table deliberately allows multiple rows with the same
 * (from_page_id, to_page_id, link_type) when origin_page_id or link_source
 * differ. That's how markdown-body edges and frontmatter edges coexist for
 * the same pair. The duplicates should NOT surface in the legacy
 * traverseGraph() aggregated output; dedup is presentation-only, in the
 * jsonb_agg step. This test seeds two such rows and asserts the aggregation
 * collapses them. It also asserts the underlying `links` table still has
 * both rows (provenance preserved).
 *
 * Runs against PGLite (unit, always). The postgres-engine path uses the
 * same SQL; an E2E test covers Postgres.
 */
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';

let engine: PGLiteEngine;

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({});
  await engine.initSchema();
});

afterAll(async () => {
  await engine.disconnect();
});

// Reset both tables so each test starts from an empty graph.
beforeEach(async () => {
  for (const t of ['links', 'pages']) {
    await (engine as any).db.exec(`DELETE FROM ${t}`);
  }
});

describe('Bug 6/10 — traverseGraph jsonb_agg DISTINCT', () => {
  test('collapses two provenance rows for the same (from,to,type) edge', async () => {
    await engine.putPage('people/alice', { type: 'person', title: 'Alice', compiled_truth: '', frontmatter: {} });
    await engine.putPage('companies/acme', { type: 'company', title: 'Acme', compiled_truth: '', frontmatter: {} });

    const alice = await (engine as any).db.query(`SELECT id FROM pages WHERE slug = 'people/alice'`);
    const acme = await (engine as any).db.query(`SELECT id FROM pages WHERE slug = 'companies/acme'`);
    const fromId = alice.rows[0].id as string;
    const toId = acme.rows[0].id as string;

    // Two rows, same (from, to, type), different provenance:
    //   row 1 from markdown body (origin_page_id = from page itself, link_source 'markdown')
    //   row 2 from frontmatter (origin_page_id = null, link_source 'frontmatter')
    await (engine as any).db.query(
      `INSERT INTO links (from_page_id, to_page_id, link_type, origin_page_id, link_source)
       VALUES ($1, $2, 'works_at', $1, 'markdown')`,
      [fromId, toId],
    );
    await (engine as any).db.query(
      `INSERT INTO links (from_page_id, to_page_id, link_type, origin_page_id, link_source)
       VALUES ($1, $2, 'works_at', NULL, 'frontmatter')`,
      [fromId, toId],
    );

    // Provenance preserved at the table level.
    const rawCount = await (engine as any).db.query(
      `SELECT count(*)::int as n FROM links WHERE from_page_id = $1 AND to_page_id = $2 AND link_type = 'works_at'`,
      [fromId, toId],
    );
    expect(rawCount.rows[0].n).toBe(2);

    // Aggregated output dedups.
    const nodes = await engine.traverseGraph('people/alice', 2);
    const aliceNode = nodes.find(n => n.slug === 'people/alice');
    expect(aliceNode).toBeDefined();
    const worksAtEdges = aliceNode!.links.filter(
      l => l.to_slug === 'companies/acme' && l.link_type === 'works_at',
    );
    expect(worksAtEdges.length).toBe(1);
  });

  test('keeps genuinely distinct link types even between same nodes', async () => {
    await engine.putPage('people/bob', { type: 'person', title: 'Bob', compiled_truth: '', frontmatter: {} });
    await engine.putPage('companies/widget', { type: 'company', title: 'Widget', compiled_truth: '', frontmatter: {} });

    const bob = await (engine as any).db.query(`SELECT id FROM pages WHERE slug = 'people/bob'`);
    const widget = await (engine as any).db.query(`SELECT id FROM pages WHERE slug = 'companies/widget'`);
    const fromId = bob.rows[0].id as string;
    const toId = widget.rows[0].id as string;

    // Same (from, to) pair and same provenance, but two different link types.
    await (engine as any).db.query(
      `INSERT INTO links (from_page_id, to_page_id, link_type, origin_page_id, link_source)
       VALUES ($1, $2, 'works_at', $1, 'markdown')`,
      [fromId, toId],
    );
    await (engine as any).db.query(
      `INSERT INTO links (from_page_id, to_page_id, link_type, origin_page_id, link_source)
       VALUES ($1, $2, 'founded', $1, 'markdown')`,
      [fromId, toId],
    );

    // Both edges must survive the aggregation: dedup must not merge across link types.
    const nodes = await engine.traverseGraph('people/bob', 2);
    const bobNode = nodes.find(n => n.slug === 'people/bob');
    expect(bobNode).toBeDefined();
    const edges = bobNode!.links.filter(l => l.to_slug === 'companies/widget');
    const types = edges.map(l => l.link_type).sort();
    expect(types).toEqual(['founded', 'works_at']);
  });
});
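
The behavior both tests pin down can also be sketched outside SQL. The snippet below is an in-memory analogue of presentation-level dedup, not the engine's implementation (which, per the commit message, uses jsonb_agg(DISTINCT ...)). The `LinkRow` and `Edge` shapes and the `dedupEdges` helper are hypothetical names introduced only for this illustration.

```typescript
// Hypothetical row shape, loosely mirroring the columns the tests insert.
interface LinkRow {
  to_slug: string;
  link_type: string;
  link_source: string; // provenance: kept in storage, ignored when presenting edges
}

// What a caller sees after aggregation: no provenance columns.
interface Edge {
  to_slug: string;
  link_type: string;
}

// Collapse rows that differ only in provenance into one edge per
// (to_slug, link_type) pair. The input array is left untouched, which is
// the in-memory equivalent of "the table still has both rows".
function dedupEdges(rows: LinkRow[]): Edge[] {
  const seen = new Map<string, Edge>();
  for (const row of rows) {
    const key = JSON.stringify([row.to_slug, row.link_type]);
    if (!seen.has(key)) {
      seen.set(key, { to_slug: row.to_slug, link_type: row.link_type });
    }
  }
  return Array.from(seen.values());
}
```

Keying on the projected columns rather than the whole row is the point of the sketch: two rows with different `link_source` values collapse, while two rows with different `link_type` values do not.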