GBrain Installation Verification Runbook
Run these checks after install to confirm every part of GBrain is working. Each check includes the command, expected output, and what to do if it fails.
The most important check is #4 (live sync). "Sync ran" is not the same as "sync worked." A sync that silently skips pages because of a pooler bug is worse than no sync at all, because you think it's working.
1. Schema Verification
Command:
gbrain doctor --json
Expected: All checks return "ok":
- connection: connected, N pages
- pgvector: extension installed
- rls: enabled on all tables
- schema_version: current
- embeddings: coverage percentage
If it fails: The doctor output includes specific fix instructions for each
check. See skills/setup/SKILL.md Error Recovery table.
2. Skillpack Loaded
Check: Ask the agent: "What is the brain-agent loop?"
Expected: The agent references GBRAIN_SKILLPACK.md Section 2 and describes the read-write cycle: detect entities, read brain, respond with context, write brain, sync.
If it fails: The agent hasn't loaded the skillpack. Run step 6 from the
install paste (read docs/GBRAIN_SKILLPACK.md).
3. Auto-Update Configured
Command:
gbrain check-update --json
Expected: Returns JSON with current_version, latest_version,
update_available (boolean). The cron gbrain-update-check is registered.
If it fails: Run step 7 from the install paste. See GBRAIN_SKILLPACK.md Section 17.
4. Live Sync Actually Works
This is the most important check. Three parts.
4a. Coverage Check
Compare page count in the DB against syncable file count in the repo:
gbrain stats
Then count syncable files:
find /data/brain -name '*.md' \
-not -path '*/.*' \
-not -path '*/.raw/*' \
-not -path '*/ops/*' \
-not -name 'README.md' \
-not -name 'index.md' \
-not -name 'schema.md' \
-not -name 'log.md' \
| wc -l
Expected: Page count in gbrain stats should be close to the file count.
Some difference is normal (files added since last sync), but if page count is
less than half the file count, sync is silently skipping pages.
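The half-the-file-count rule of thumb can be scripted. This is a minimal sketch, assuming you read the page count from gbrain stats and the file count from the find command yourself; coverage_status is a hypothetical helper, not part of the gbrain CLI.

```shell
# Hypothetical helper (not a gbrain command): compare the DB page count
# with the syncable file count using the "less than half" rule above.
# Usage: coverage_status <page_count> <file_count>
coverage_status() {
  cs_pages=$1
  cs_files=$2
  # pages < files / 2  is the same as  pages * 2 < files (integer-safe)
  if [ "$((cs_pages * 2))" -lt "$cs_files" ]; then
    echo "skipping"   # sync is probably dropping pages silently
  else
    echo "ok"         # difference is within the normal range
  fi
}
```

For example, coverage_status 400 1000 reports skipping, while a count that is only slightly behind the file count reports ok.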
If page count is way too low: The #1 cause is the connection pooler bug.
Check your DATABASE_URL:
- If it contains pooler.supabase.com:6543, verify it's using Session mode, not Transaction mode.
- Transaction mode breaks engine.transaction() and causes .begin() is not a function errors.
- Fix: switch to the Session mode pooler string, then run gbrain sync --full to reimport everything.
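The DATABASE_URL check can be done without printing the full connection string (which contains credentials). This sketch only flags whether the URL matches the pooler host and port named above; it cannot tell which pool mode is active, so a match still needs a manual look. pooler_check is a hypothetical helper name.

```shell
# Hypothetical helper: flag connection strings that point at the
# Supabase pooler host/port mentioned above. It does NOT detect the
# pool mode; it only tells you whether a manual check is needed.
# Usage: pooler_check "$DATABASE_URL"
pooler_check() {
  case "$1" in
    *pooler.supabase.com:6543*) echo "verify-pool-mode" ;;
    "")                         echo "unset" ;;
    *)                          echo "no-pooler-match" ;;
  esac
}
```

Call it as pooler_check "${DATABASE_URL:-}" so an unset variable is reported rather than treated as an error.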
4b. Embed Check
gbrain stats
Expected: Embedded chunk count should be close to total chunk count.
If embedded is much lower than total:
gbrain embed --stale
If OPENAI_API_KEY is not set, embeddings can't be generated. Keyword search
still works without embeddings, but hybrid/semantic search won't.
4c. End-to-End Test
This is the real test. Edit a brain page, push, wait, search.
- Edit a page in the brain repo (e.g., correct a fact on a person's page):
# Example: fix a line in Gustaf's page
cd /data/brain
# Make a small edit to any .md file
git add -A && git commit -m "test: verify live sync" && git push
- Wait for the next sync cycle (cron interval or --watch poll).
- Search for the corrected text:
gbrain search "<text from the correction>"
Expected: The search returns the corrected text, not the old version.
If it returns old text: Sync failed silently. Check:
- Is the sync cron registered and running?
- Is gbrain sync --watch still alive (if using watch mode)?
- Run gbrain config get sync.last_run to see when sync last ran.
- Run gbrain sync --repo /data/brain manually and check for errors.
- If you see .begin() is not a function, fix the pooler (see 4a above).
5. Embedding Coverage
Command:
gbrain stats
Expected: Embedded chunk count matches (or is close to) total chunk count.
If zero or very low: OPENAI_API_KEY may be missing or invalid. Check:
echo $OPENAI_API_KEY | head -c 10
If blank, set the key. Then:
gbrain embed --stale
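If you would rather not echo any part of the key into the terminal or shell history, a presence check is enough for this step. A minimal sketch; key_status is a hypothetical helper and only reports whether the value is non-empty, not whether the key is valid.

```shell
# Hypothetical helper: report whether a secret is set without
# printing any of its characters. Validity is NOT checked.
# Usage: key_status "${OPENAI_API_KEY:-}"
key_status() {
  if [ -n "${1:-}" ]; then
    echo "set (${#1} chars)"
  else
    echo "missing"
  fi
}
```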
6. Brain-First Lookup Protocol
Check: Ask the agent about a person or concept that exists in the brain.
Expected: The agent uses gbrain search or gbrain query FIRST, not grep
or external APIs. The response includes brain-sourced context with source
attribution.
If it fails: The brain-first lookup protocol isn't injected into the agent's
system context. See skills/setup/SKILL.md Phase D.
7. Knowledge Graph Wired
The v0.12.0 graph layer needs to be populated for existing brains. New writes are auto-linked, but historical pages need a one-time backfill.
Command:
gbrain stats | grep -E 'links|timeline'
Expected: Both links and timeline_entries are non-zero (assuming the brain
has content with entity references and dated markdown).
If it's zero on a brain with imported content: Run the backfill.
gbrain extract links --source db --dry-run | head -5 # preview
gbrain extract links --source db # commit
gbrain extract timeline --source db
gbrain stats # confirm > 0
Bonus check — graph traversal works:
# Pick any well-connected slug from your brain
gbrain graph-query people/<some-person-slug> --depth 2
Expected: Indented tree of typed edges (--attended-->, --works_at-->, etc.).
If the slug has no inbound or outbound links, try a different one or run extract
again.
If extract finds nothing: Your pages may not use entity-reference syntax. The
extractor matches [Name](people/slug), [Name](../people/slug.md), and bare
people/slug references. If your brain uses a different format, the auto-link
heuristics won't find them — file an issue with a sample page.
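Before filing an issue, it can help to confirm whether a page contains any of the three reference shapes listed above. The filter below is only a rough approximation written for this runbook; it is not the extractor's actual matching logic, and the allowed slug characters are an assumption.

```shell
# Rough approximation (NOT the real extractor): list people/ references
# found on stdin, normalised to "people/<slug>".
# Assumes slugs use letters, digits, "_" and "-".
list_people_refs() {
  grep -oE '(\.\./)?people/[A-Za-z0-9_-]+(\.md)?' \
    | sed -e 's#^\.\./##' -e 's#\.md$##' \
    | sort -u
}
```

Run it as list_people_refs < some-page.md. Empty output suggests the page uses a link format the heuristics described above will not recognise.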
8. JSONB Frontmatter Integrity (v0.12.2)
Postgres-backed brains created before v0.12.2 had double-encoded JSONB columns
(frontmatter->>'key' returned NULL, GIN indexes were inert). gbrain upgrade
runs gbrain repair-jsonb automatically via the v0_12_2 orchestrator.
Verify the repair succeeded.
Command:
gbrain repair-jsonb --dry-run --json
Expected: totalRepaired: 0 across all 5 columns (pages.frontmatter,
raw_data.data, ingest_log.pages_updated, files.metadata,
page_versions.frontmatter). A zero count means every row holds a properly typed
JSON object, not string-encoded JSON.
If the count is > 0: The repair didn't run or was interrupted. Re-run
without --dry-run:
gbrain repair-jsonb
The repair is idempotent, so re-running it is safe. PGLite brains always report 0 because the original bug never affected them.
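For scripted verification, the dry-run count can be pulled out of the JSON without extra tooling. This sketch assumes the --json output contains a numeric totalRepaired field, as the expected output above suggests; the rest of the JSON shape is not documented here, and total_repaired is a hypothetical helper.

```shell
# Hypothetical helper: extract the first numeric "totalRepaired" value
# from JSON on stdin. Assumes the field exists and is an integer.
total_repaired() {
  sed -n 's/.*"totalRepaired"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}
```

A usage along the lines of gbrain repair-jsonb --dry-run --json | total_repaired should then print 0 on a healthy brain. Empty output means the field was not found, which is worth investigating rather than treating as a pass.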
Bonus check — frontmatter-keyed queries actually resolve:
gbrain call list_pages '{"frontmatterKey": "type", "frontmatterValue": "person"}'
If this returns rows on a brain with person pages, the JSONB path is healthy.
Quick Verification (all checks in one pass)
# 1. Schema
gbrain doctor --json
# 2. Sync recency
gbrain config get sync.last_run
# 3. Page count + embed coverage
gbrain stats
# 4. Search works
gbrain search "test query from your brain content"
# 5. Catch any unembedded chunks
gbrain embed --stale
# 6. Auto-update
gbrain check-update --json
# 7. Knowledge graph populated (links + timeline > 0)
gbrain stats | grep -E 'links|timeline'
# 8. JSONB integrity (v0.12.2 — Postgres only, PGLite always 0)
gbrain repair-jsonb --dry-run --json
If all eight return successfully, the installation is healthy. For the full end-to-end sync test (4c), push a real change and verify it appears in search.
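The one-pass list can be wrapped in a small runner that reports which commands failed. This is a sketch, not part of gbrain: run_checks is a hypothetical helper, and it assumes each command signals failure with a non-zero exit code, which this runbook does not state for every gbrain subcommand.

```shell
# Hypothetical runner: execute each argument as a shell command,
# print PASS/FAIL per command, and return non-zero if any failed.
# Assumes commands exit non-zero on failure.
run_checks() {
  rc_failed=0
  for rc_cmd in "$@"; do
    if sh -c "$rc_cmd" >/dev/null 2>&1; then
      echo "PASS: $rc_cmd"
    else
      echo "FAIL: $rc_cmd"
      rc_failed=$((rc_failed + 1))
    fi
  done
  [ "$rc_failed" -eq 0 ]
}
```

For example, run_checks "gbrain doctor --json" "gbrain stats" "gbrain check-update --json" runs three of the checks above. A PASS only means the command exited cleanly; the output still has to be read against the expected values in each section.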