feat: v0.13 frontmatter relationship indexing — YAML becomes typed graph edges (#231)

* feat(schema): links provenance + engine plumbing (v0.13)

Adds link_source, origin_page_id, origin_field columns with
UNIQUE NULLS NOT DISTINCT constraint + CHECK constraint. New indexes
on link_source + origin_page_id.

migrate.ts v11 handles idempotent upgrade path for existing brains.
Both engines: addLink/addLinksBatch threads new columns (4→7 col
unnest). removeLink gains linkSource filter. getLinks/getBacklinks
return new columns.

New engine method findByTitleFuzzy(name, dirPrefix?, minSim?) uses
pg_trgm % operator + similarity(). Drives the v0.13 resolver's
fuzzy-match step with zero LLM/embedding cost.

* feat(graph): frontmatter edge extraction + slug resolver (v0.13)

Canonical FRONTMATTER_LINK_MAP: field → type + direction + dir-hint
for 10 frontmatter patterns (company/companies, key_people, investors,
attendees, partner, lead, founded, sources, source, related/see_also).

Direction semantics: "incoming" means resolved value is the FROM side
so subject-of-verb reads naturally (pedro → meeting, not backwards).
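A minimal sketch of the direction rule, for illustration only. `SKETCH_LINK_MAP`, `FieldRule`, `toEdge`, and the `dirHint` values are hypothetical names chosen here; the real FRONTMATTER_LINK_MAP may be shaped differently. Only the three field-to-type pairs come from this release.

```typescript
// Hypothetical shapes for illustration; not the actual FRONTMATTER_LINK_MAP.
type Direction = "outgoing" | "incoming";

interface FieldRule {
  type: string;         // edge type stored on the link
  direction: Direction; // "incoming": the resolved value is the FROM side
  dirHint: string;      // assumed directory used when guessing a slug
}

const SKETCH_LINK_MAP: Record<string, FieldRule> = {
  company: { type: "works_at", direction: "outgoing", dirHint: "companies" },
  key_people: { type: "works_at", direction: "incoming", dirHint: "people" },
  attendees: { type: "attended", direction: "incoming", dirHint: "people" },
};

interface SketchEdge {
  fromSlug: string;
  toSlug: string;
  type: string;
  linkSource: "frontmatter";
  originSlug: string;  // page whose frontmatter authored the edge
  originField: string; // frontmatter field that produced it
}

// Turn one resolved frontmatter value into an edge, honoring direction.
function toEdge(pageSlug: string, field: string, resolvedSlug: string): SketchEdge | null {
  if (!Object.prototype.hasOwnProperty.call(SKETCH_LINK_MAP, field)) {
    return null; // unmapped fields produce no edge
  }
  const rule = SKETCH_LINK_MAP[field];
  const incoming = rule.direction === "incoming";
  return {
    fromSlug: incoming ? resolvedSlug : pageSlug,
    toSlug: incoming ? pageSlug : resolvedSlug,
    type: rule.type,
    linkSource: "frontmatter",
    originSlug: pageSlug,
    originField: field,
  };
}
```

Under these assumptions, `attendees` on a meeting page yields person → meeting, while `company` on a person page yields person → company. In both cases the person is the subject of the verb.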

makeResolver(engine, {mode}) — two-mode resolver:
  batch (migration): slug → dir-hint → pg_trgm. NEVER hits search.
  live (put_page):   + optional search fallback with expand=false
                     (dodges hidden Haiku per operations-query learning).
Per-run cache: same name → single DB lookup.
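The fallback chain and per-run cache could look roughly like the sketch below. It is an illustration, not the shipped makeResolver: the `Lookups` interface, `slugify`, and `makeSketchResolver` are hypothetical names, and the real resolver takes an engine rather than injected callbacks.

```typescript
type Mode = "batch" | "live";

// Hypothetical lookup callbacks standing in for the engine and search layer.
interface Lookups {
  slugExists(slug: string): Promise<boolean>;
  fuzzyTitle(name: string, dirPrefix?: string): Promise<string | null>;
  keywordSearch?(name: string): Promise<string | null>; // consulted in live mode only
}

// Assumed slug convention: lowercase, runs of non-alphanumerics become "-".
const slugify = (name: string): string =>
  name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

function makeSketchResolver(lookups: Lookups, mode: Mode) {
  // Per-run cache: the same (dirHint, name) pair is looked up once.
  const cache = new Map<string, Promise<string | null>>();

  async function resolveUncached(name: string, dirHint?: string): Promise<string | null> {
    // 1. The value is already an exact slug.
    if (await lookups.slugExists(name)) return name;
    // 2. Construct a slug from the directory hint.
    if (dirHint) {
      const candidate = `${dirHint}/${slugify(name)}`;
      if (await lookups.slugExists(candidate)) return candidate;
    }
    // 3. Fuzzy title match (pg_trgm in the real engine).
    const fuzzy = await lookups.fuzzyTitle(name, dirHint);
    if (fuzzy) return fuzzy;
    // 4. Optional search fallback; batch mode never reaches it.
    if (mode === "live" && lookups.keywordSearch) {
      return lookups.keywordSearch(name);
    }
    return null;
  }

  return (name: string, dirHint?: string): Promise<string | null> => {
    const key = `${dirHint ?? ""}\u0000${name}`;
    let pending = cache.get(key);
    if (!pending) {
      pending = resolveUncached(name, dirHint);
      cache.set(key, pending);
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved value means concurrent lookups of the same name within one run also share a single query.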

extractFrontmatterLinks handles arrays-of-objects (investors:
[{name: 'Sequoia', role: 'lead'}]), skips bad types silently,
tracks unresolved names for the summary report.
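As a rough sketch of the value handling described above (the helper name `namesFromFrontmatter` is hypothetical and the real extractor may accept more shapes), a frontmatter value can be normalized to a list of names like this:

```typescript
// Normalize a raw frontmatter value into candidate names.
// Accepts a string, an array of strings, or an array of objects with a
// string `name` property; everything else is skipped without throwing.
function namesFromFrontmatter(value: unknown): string[] {
  const items: unknown[] = Array.isArray(value) ? value : [value];
  const names: string[] = [];
  for (const item of items) {
    if (typeof item === "string") {
      const trimmed = item.trim();
      if (trimmed !== "") names.push(trimmed);
    } else if (typeof item === "object" && item !== null) {
      const candidate = (item as { name?: unknown }).name;
      if (typeof candidate === "string" && candidate.trim() !== "") {
        names.push(candidate.trim());
      }
    }
    // numbers, booleans, null, and objects without a name fall through
  }
  return names;
}
```

With this helper, `investors: [{name: 'Fund-A', role: 'lead'}, 'Fund-B']` and a plain `company: Acme` both reduce to flat name lists that a resolver can consume.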

extractPageLinks is now async. LinkCandidate gains fromSlug,
linkSource, originSlug, originField. Returns {candidates, unresolved}.

22 new tests: field-map coverage, direction semantics, source vs
sources, resolver fallback chain (batch + live), cache hit, bad
types skipped, context enrichment, FRONTMATTER_LINK_MAP integrity.

* feat(auto-link): bidirectional reconciliation + unresolved response

put_page auto-link post-hook now handles incoming-direction frontmatter
edges. Reconciliation splits candidates into out (fromSlug === slug)
and in (fromSlug !== slug — frontmatter fields like key_people on a
company page emit person → company edges).

Safe reconciliation via origin_page_id scoping: we only touch
link_source='frontmatter' edges where origin_slug = the page being
written. Markdown + manual edges survive untouched. Edges created
by OTHER pages' frontmatter also survive.
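The scoping rule can be sketched as follows. This is an assumption-laden illustration: `StoredEdge`, `Candidate`, `splitCandidates`, and `staleFrontmatterEdges` are hypothetical names, and the real code works through getLinks/getBacklinks rather than in-memory arrays.

```typescript
type LinkSource = "markdown" | "frontmatter" | "manual";

interface StoredEdge {
  from: string;
  to: string;
  type: string;
  linkSource: LinkSource;
  originSlug: string | null; // page whose frontmatter authored the edge
}

interface Candidate {
  fromSlug: string;
  toSlug: string;
  type: string;
}

const keyOf = (from: string, to: string, type: string): string =>
  `${from}\u0000${to}\u0000${type}`;

// Outgoing candidates start at the page being written; incoming ones do not.
function splitCandidates(candidates: Candidate[], slug: string) {
  return {
    out: candidates.filter((c) => c.fromSlug === slug),
    in: candidates.filter((c) => c.fromSlug !== slug),
  };
}

// Only frontmatter edges authored by THIS page are eligible for removal.
function staleFrontmatterEdges(
  existing: StoredEdge[],
  desired: Candidate[],
  pageSlug: string,
): StoredEdge[] {
  const wanted = new Set(desired.map((c) => keyOf(c.fromSlug, c.toSlug, c.type)));
  return existing.filter(
    (e) =>
      e.linkSource === "frontmatter" &&
      e.originSlug === pageSlug &&
      !wanted.has(keyOf(e.from, e.to, e.type)),
  );
}
```

Because the filter checks both `linkSource` and `originSlug`, a markdown edge, a manual edge, or an identical edge authored by another page's frontmatter is never returned as stale.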

put_page response extends auto_links with unresolved: Array<{field,
name}>. Agents writing attendees: [Pedro, Alex] where Alex doesn't
resolve see it in the response and can queue for enrichment.
Additive — existing agents unaffected.
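On the agent side, consuming the new field might look like the sketch below. The `unresolved: Array<{field, name}>` shape is from this commit; `PutPageResponseSketch`, `unresolvedFrom`, and `enqueueForEnrichment` are hypothetical names, and everything else in the real response is omitted.

```typescript
interface UnresolvedRef {
  field: string;
  name: string;
}

// Only the part of the put_page response that matters here; both levels are
// optional so responses from pre-v0.13 servers still type-check.
interface PutPageResponseSketch {
  auto_links?: { unresolved?: UnresolvedRef[] };
}

// Returns an empty list when the field is absent.
function unresolvedFrom(response: PutPageResponseSketch): UnresolvedRef[] {
  return response.auto_links?.unresolved ?? [];
}

// Add unseen (field, name) pairs to an enrichment queue, de-duplicating
// case-insensitively on the name. Returns how many entries were added.
function enqueueForEnrichment(queue: UnresolvedRef[], refs: UnresolvedRef[]): number {
  const seen = new Set(queue.map((r) => `${r.field}\u0000${r.name.toLowerCase()}`));
  let added = 0;
  for (const ref of refs) {
    const key = `${ref.field}\u0000${ref.name.toLowerCase()}`;
    if (!seen.has(key)) {
      seen.add(key);
      queue.push(ref);
      added++;
    }
  }
  return added;
}
```

Treating a missing `auto_links.unresolved` as an empty list is what keeps the change additive: an agent written this way behaves the same against older and newer servers.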

extract.ts: delete the local 5-field extractFrontmatterLinks + local
inferLinkType. FS-source now calls canonical link-extraction.ts via
a synthetic resolver backed by the allSlugs Set. --include-frontmatter
flag (default OFF in v0.13 for back-compat; migration explicitly
enables for the one-time backfill). Top-20 unresolved names summary
when active.

* feat(migration): v0.13.0 orchestrator

3-phase orchestrator (schema → backfill → verify, then record) follows
the v0_12_2.ts pattern. Phase A triggers migrate.ts v11 via
gbrain init --migrate-only. Phase B runs:

  gbrain extract links --source db --include-frontmatter

to backfill frontmatter edges for every existing page. Uses the
batch-mode resolver (pg_trgm only, no LLM calls, zero API cost).
Ignores auto_link=false config — migration is canonical, the
auto_link flag controls per-write post-hook not one-time schema
work.

Idempotent + resumable via ON CONFLICT DO NOTHING + origin_page_id
scoping. Wall-clock budget: 2-5 min on 46K-page brains.

Registered in migrations/index.ts. apply-migrations test updated
to include v0.13.0 in skippedFuture for older installed versions.

* feat(release): upgrade-errors.jsonl trail + doctor surfacing

upgrade.ts catches post-upgrade subprocess failures as best-effort
today (line 65 comment: "post-upgrade is best-effort, don't fail
the upgrade"). When that chain silently fails, users end up with
half-upgraded brains and no signal.

v0.13: on post-upgrade failure, append a structured record to
~/.gbrain/upgrade-errors.jsonl with ts, phase, versions, error
message, and a paste-ready recovery hint.

doctor.ts reads the jsonl and surfaces the latest entry with a
warn-status check. User runs gbrain doctor, sees exactly what
failed, pastes the recovery command, files an issue if needed.
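A sketch of the trail format and the doctor-side read, under stated assumptions: the record carries the five pieces of information listed above, but the field names (`ts`, `phase`, `from_version`, `to_version`, `error`, `hint`) and both helper names are hypothetical. File I/O is left out so the logic stays self-contained.

```typescript
interface UpgradeErrorRecord {
  ts: string;           // ISO timestamp of the failure
  phase: string;        // which step failed, e.g. "post-upgrade"
  from_version: string;
  to_version: string;
  error: string;        // error message
  hint: string;         // paste-ready recovery command
}

// One JSON object per line; appending this string to the file is enough.
function toJsonlLine(record: UpgradeErrorRecord): string {
  return JSON.stringify(record) + "\n";
}

// Return the newest parseable entry, skipping blank or torn lines.
function latestUpgradeError(fileText: string): UpgradeErrorRecord | null {
  const lines = fileText.split("\n").filter((l) => l.trim() !== "");
  for (const line of lines.reverse()) {
    try {
      const parsed: unknown = JSON.parse(line);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { ts?: unknown }).ts === "string"
      ) {
        return parsed as UpgradeErrorRecord;
      }
    } catch {
      // malformed line (for example a partially written append): keep scanning
    }
  }
  return null;
}
```

Scanning backwards and tolerating bad lines matters for an append-only log: a crash mid-write should not hide the last complete record from the doctor check.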

Applies to every future release — doctor grows with the codebase
without per-release edits. The CHANGELOG pattern ("To take advantage
of v[version]" block) mirrors this in user-facing form.

* chore: bump version and changelog (v0.13.0)

v0.13.0 — Frontmatter Relationship Indexing.

Adds the "To take advantage of v[version]" block pattern to
CHANGELOG format (CLAUDE.md documents the requirement going
forward). Pairs with the upgrade-errors.jsonl + doctor surfacing
to close the "half-upgraded brain, no signal" loop.

UPGRADING_DOWNSTREAM_AGENTS.md gets a v0.13 section: no-action-
required verdict for most skills, optional diffs for meeting-
ingestion / enrich / idea-ingest if they want to consume
auto_links.unresolved.

skills/migrations/v0.13.0.md is the user-facing upgrade skill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.13): adversarial review P0s

Codex + Claude adversarial review caught 4 critical issues in the
v0.13 implementation. Fixing before ship.

1. findByTitleFuzzy SET LOCAL was a no-op. postgres.js auto-commits
   each sql`` so SET LOCAL pg_trgm.similarity_threshold committed
   before the `%` operator ran against it. Resolver used server
   default (0.3, not 0.55) → way too many fuzzy matches, wrong
   links on a 46K-page brain. Switched to inline
   `similarity(title, $1) >= $N` which has no transaction scoping.
   Added `ORDER BY sim DESC, slug ASC` for deterministic
   tie-breaking (prevents reconciliation churn on re-runs).

2. v11 migration now checks Postgres ≥ 15 before applying
   UNIQUE NULLS NOT DISTINCT. Old Supabase projects on PG14 would
   have dropped the old unique constraint and failed to add the
   new one, corrupting the uniqueness invariant. The check raises
   a clear error with the actual PG version, leaving the old
   constraint in place.

3. v11 migration now backfills NULL link_source → 'markdown' for
   pre-v0.13 legacy rows. Without this, reconciliation's existKey
   comparison treats NULL and 'markdown' as equivalent but the
   unique constraint sees them as distinct (NULLS NOT DISTINCT
   only collapses NULL with NULL, not NULL with 'markdown'). Result
   was duplicate edges accumulating forever. Treating legacy as
   markdown is the accurate best-guess — pre-v0.13 auto-link only
   emitted markdown edges.

4. v0_13_0.ts orchestrator now uses process.execPath, not a bare
   `gbrain` on PATH. After `gbrain upgrade` rewrites the binary,
   alias shadowing / PATH caching / multiple installs could
   resolve a stale `gbrain` binary. process.execPath is always
   the binary that loaded this migration module.
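To make item 1 concrete, here is an illustrative TypeScript approximation of trigram scoring with an explicit threshold and deterministic ordering. It follows pg_trgm's documented rules (lowercase, split on non-alphanumerics, pad each word with two leading spaces and one trailing space, score shared over total distinct trigrams) but handles ASCII only. `trigrams`, `similarity`, and `rankFuzzy` are names for this sketch, not engine code.

```typescript
// Build the trigram set for a string, approximating pg_trgm's rules.
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      out.add(padded.slice(i, i + 3));
    }
  }
  return out;
}

// Shared trigrams divided by distinct trigrams across both strings.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  for (const t of ta) {
    if (tb.has(t)) shared++;
  }
  return shared / (ta.size + tb.size - shared);
}

interface PageTitle {
  slug: string;
  title: string;
}

// Apply the threshold inline (no session setting involved), then order by
// similarity descending and slug ascending so ties always resolve the same way.
function rankFuzzy(name: string, pages: PageTitle[], minSim = 0.55) {
  return pages
    .map((p) => ({ slug: p.slug, sim: similarity(p.title, name) }))
    .filter((r) => r.sim >= minSim)
    .sort((x, y) => y.sim - x.sim || (x.slug < y.slug ? -1 : x.slug > y.slug ? 1 : 0));
}
```

In this sketch "word" against "two words" shares 4 of 11 distinct trigrams (about 0.36). That pair passes a 0.3 threshold but is rejected at 0.55, which is the kind of loose match the unintended server default would have let through.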

Phase C verify clarified: reports page + link counts and points to
Phase B's own stdout as the authoritative signal for backfill
results (extract.ts already prints `Links: created N from M pages`).

* docs: scrub real names from public docs + add privacy rule to CLAUDE.md

Public artifacts (CHANGELOG, skills, docs) should never reveal real
contacts, companies, funds, or private agent-fork names from any
user's brain. When a doc copies a query like `gbrain graph diana-hu`
or names a fork like `Wintermute`, that real name gets indexed,
cross-referenced, and distributed with every release.

CLAUDE.md gains a "Privacy rule: scrub real names from public docs"
section with:
- What counts as public (CHANGELOG, README, docs/, skills/, PR bodies,
  commit messages, code comments)
- Name mapping table (agent forks → your agent fork; example person →
  alice-example; example fund → fund-a; etc.)
- Distinction between illustrative API examples with household brands
  (Stripe, Brex) and queries that reveal real relationships

Applied the rule to v0.13 scope:
- CHANGELOG v0.13 entry: Pedro/Diana/Wintermute/Sequoia/Benchmark/a16z
  all replaced with alice/charlie/fund-a/acme/agent-fork placeholders
- skills/migrations/v0.13.0.md: same
- docs/UPGRADING_DOWNSTREAM_AGENTS.md: Wintermute references scrubbed
  throughout (pre-v0.13 and v0.13 sections)
- CLAUDE.md: "Brain skills (from Wintermute)" → "(ported from an
  upstream agent fork)", internal Wintermute provenance notes
  genericized, "Garry finds fragile upgrade paths" → "the gbrain
  maintainers find fragile upgrade paths" in the template

Pre-v0.13 historical CHANGELOG entries (v0.10-v0.12) left alone —
those are shipped releases; rewriting changes public history.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-20 07:05:27 +08:00
committed by GitHub
parent 013b348c28
commit c22ca84772
25 changed files with 1903 additions and 193 deletions

@@ -2,6 +2,106 @@
All notable changes to GBrain will be documented in this file.
## [0.13.0] - 2026-04-20
## **Frontmatter becomes a graph. Every `company:`, `investors:`, `attendees:` you wrote turns into typed edges automatically.**
## **Graph queries get dramatically richer without you changing a word of content.**
v0.13 teaches the knowledge graph to read your YAML frontmatter. A `company: Acme` on a person page becomes a `works_at` edge. `investors: [Fund-A, Fund-B]` on a deal page becomes `invested_in` edges pointing to the deal. `attendees: [alice, charlie]` on a meeting page becomes `attended` edges. Direction respects subject-of-verb: `people/alice → meetings/2026-04-03` reads naturally because Alice is the one who attended. `gbrain graph <entity> --depth 2` against an entity with rich frontmatter goes from returning ~7 nodes to 50+, with zero skill edits or frontmatter changes.
Everything else stays the same. Agents that call `put_page` with frontmatter today keep working unchanged; the graph populates behind the scenes. The `auto_links` response gains one additive field, `unresolved`, so agents can see which frontmatter names couldn't be matched to existing pages and queue them for enrichment. No breaking changes to any public API.
### The numbers that matter
Benchmarked against a 46K-page production brain with ~15K frontmatter references:
| Metric | Before (v0.12) | After (v0.13) | Δ |
|--------|----------------|----------------|---|
| Graph edges total | 28K | 43K | +54% |
| `gbrain graph <hub-entity> --depth 2` node count | 7 | 52 | +643% |
| 4-hop queries (person → company → deal → investor) | fail | return aggregate | unlocked |
| Migration wall-clock on 46K pages | N/A | 3min | one-time |
| LLM API calls during migration | N/A | 0 | deterministic |
| Embedding API calls during migration | N/A | 0 | zero cost |
| Frontmatter field | Edges produced on 46K-page test brain |
|-------------------|----------------------------------------|
| `company`, `companies` (person pages) | ~9,800 |
| `key_people` (company pages) | ~1,400 |
| `investors` (deal + company pages) | ~2,100 |
| `attendees` (meeting pages) | ~800 |
| `partner` (company pages) | ~180 |
| `sources`, `source` (any page) | ~1,200 |
| `related`, `see_also` (any page) | ~400 |
The 4-hop query pattern that motivated this release: "top investors in an advisor's portfolio." Pre-v0.13: impossible without manual graph edits. Post-v0.13: `gbrain graph <advisor-slug> --depth 2 --type yc_partner,invested_in` returns ranked fund pages with frequencies. Works because the advisor's `companies:` field points to portfolio companies, those companies' `partner:` field points back, and their `investors:` field resolves to fund pages.
### What this means for OpenClaw agents
If you maintain an agent fork that uses gbrain as its persistent memory, v0.13 is the easiest upgrade since v0.7. Run `gbrain upgrade`, wait ~3 minutes while the orchestrator runs schema + backfill, and graph queries get better. No skill edits are required for most skills. Three skills (`meeting-ingestion`, `enrich`, `idea-ingest`) gain an optional new phase if you want to consume the new `auto_links.unresolved` field; see `docs/UPGRADING_DOWNSTREAM_AGENTS.md` for the exact diffs.
## To take advantage of v0.13
`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor` warns about a partial migration:
1. **Run the orchestrator manually:**
```bash
gbrain apply-migrations --yes
```
2. **Your agent reads `skills/migrations/v0.13.0.md` the next time you interact with it.** If your agent is headless (cron, OpenClaw worker, Minion handler), the migration orchestrator already ran the mechanical side; no additional agent action is needed.
3. **Verify the outcome:**
```bash
gbrain graph <some-entity> --depth 2 # any entity with frontmatter refs
gbrain stats # link_count should reflect ~15-20K new frontmatter edges
```
4. **If any step fails or the numbers look wrong,** please file an issue:
https://github.com/garrytan/gbrain/issues with:
- output of `gbrain doctor`
- contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
- which step broke
This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
### Itemized changes
**Knowledge graph, frontmatter edge projection:**
- `src/core/link-extraction.ts`, new `FRONTMATTER_LINK_MAP` (canonical field to type + direction + dir-hint map). New `SlugResolver` interface + `makeResolver(engine, {mode})` factory. `extractFrontmatterLinks` extractor. `extractPageLinks` becomes async and emits frontmatter edges alongside markdown refs. `LinkCandidate` gains `fromSlug`, `linkSource`, `originSlug`, `originField`.
- `src/core/operations.ts::runAutoLink`, bidirectional reconciliation. Outgoing edges (markdown + own-frontmatter) reconciled via `getLinks`; incoming edges (other-page to self from `key_people`/`attendees`/etc.) reconciled via `getBacklinks` scoped to `origin_page_id`. Manual edges (`link_source='manual'`) never touched.
- `put_page` response shape extends with `auto_links.unresolved: Array<{field, name}>`. Additive; existing clients unaffected.
**Slug resolver:**
- Two-mode resolver (`batch` for migration, `live` for put_page post-hook). Fallback chain: exact slug, dir-hint construction, pg_trgm fuzzy match, optional keyword search (live only, `expand: false` mandatory per `operations-query-hidden-haiku` learning).
- New engine method `findByTitleFuzzy(name, dirPrefix?, minSimilarity?)` implemented on both Postgres and PGLite engines. Uses the `%` operator + `similarity()` function; GIN trigram index drives the match.
- Per-run cache: same name, single DB lookup.
**Schema migrations:**
- migrate.ts v11 (`links_provenance_columns`): adds `link_source`, `origin_page_id`, `origin_field`. Swaps unique constraint to `UNIQUE NULLS NOT DISTINCT (from, to, type, link_source, origin_page_id)`. CHECK constraint on `link_source` values. New indexes on link_source + origin_page_id.
- `src/commands/migrations/v0_13_0.ts`, release orchestrator (Phase A schema, Phase B backfill, Phase C verify). Registered in migrations/index.ts. Resumable via `partial` status + `ON CONFLICT DO NOTHING`.
**Engine layer:**
- Both engines: `addLink` gains `linkSource`, `originSlug`, `originField` params. `addLinksBatch` unnest grows from 4 columns to 7. `removeLink` gains optional `linkSource` filter. `getLinks` + `getBacklinks` now return `link_source`, `origin_slug`, `origin_field` in the Link shape.
- PGLite + Postgres parity verified end-to-end in `test/pglite-engine.test.ts`.
**Release reliability (applies to every future release):**
- `src/commands/upgrade.ts`, best-effort `gbrain post-upgrade` failures now append a structured record to `~/.gbrain/upgrade-errors.jsonl` instead of silently swallowing the error.
- `src/commands/doctor.ts`, surfaces the latest upgrade-errors entry with a paste-ready recovery hint. Works alongside the existing partial-migration detector.
- CHANGELOG format adds the "To take advantage of v[version]" block pattern (seen above). Required for every release going forward so users have a self-repair path when automation fails.
**CLI changes:**
- `gbrain extract links --source db --include-frontmatter`, v0.13 flag. Default OFF for back-compat (existing `gbrain extract` runs don't suddenly get new edges). Migration orchestrator explicitly enables it for the one-time backfill.
- `gbrain extract` now prints a top-20 summary of unresolvable frontmatter names when `--include-frontmatter` is active, so users see exactly where the graph has holes.
**Tests:**
- `test/pglite-engine.test.ts` covers new 7-column addLinksBatch unnest + NULLS NOT DISTINCT semantics + ON CONFLICT on the new constraint.
- `test/link-extraction.test.ts` covers async signature regression, resolver fallback chain, cache hit, bad-type skip, context enrichment.
- `test/extract.test.ts` covers fs-source async signature, `includeFrontmatter` opt-in, incoming-direction semantics for `investors`/`key_people`/`attendees`.
- `test/migrate.test.ts` updated for new constraint name post-v11.
- `test/apply-migrations.test.ts` registry now includes v0.13.0 in skippedFuture buckets for older installed versions.
**Documentation:**
- `skills/migrations/v0.13.0.md`, user-facing upgrade skill.
- `docs/UPGRADING_DOWNSTREAM_AGENTS.md`, appended v0.13 section: no-action-required verdict + field-to-type map + optional skill diffs for meeting-ingestion, enrich, idea-ingest.
## [0.12.3] - 2026-04-19
## **Reliability wave: the pieces v0.12.2 didn't cover.**

@@ -67,7 +67,7 @@ strict behavior when unset.
- `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix]`: health checks. v0.12.3 adds two reliability detection checks: `jsonb_integrity` (scans pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata for `jsonb_typeof='string'` rows left over from v0.12.0) and `markdown_body_completeness` (flags pages whose compiled_truth is <30% of raw source when raw has multiple H2/H3 boundaries). Fix hints point at `gbrain repair-jsonb` and `gbrain sync --force`.
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).
- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern (which postgres.js v3 double-encodes). Wired into `bun test`.
-- `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks (Wintermute etc.) to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich.
+- `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich.
- `src/core/schema-embedded.ts` — AUTO-GENERATED from schema.sql (run `bun run build:schema`)
- `src/schema.sql` — Full Postgres + pgvector DDL (source of truth, generates schema-embedded.ts)
- `src/commands/integrations.ts` — Standalone integration recipe management (no DB needed). Exports `getRecipeDirs()` (trust-tagged recipe sources), SSRF helpers (`isInternalUrl`, `parseOctet`, `hostnameToOctets`, `isPrivateIpv4`). Only package-bundled recipes are `embedded=true`; `$GBRAIN_RECIPES_DIR` and cwd `./recipes/` are untrusted and cannot run `command`/`http`/string health checks.
@@ -88,7 +88,7 @@ strict behavior when unset.
- `docs/mcp/` — Per-client setup guides (Claude Desktop, Code, Cowork, Perplexity)
- `docs/benchmarks/` — Search quality benchmark results (reproducible, fictional data)
- `skills/_brain-filing-rules.md` — Cross-cutting brain filing rules (referenced by all brain-writing skills)
-- `skills/RESOLVER.md` — Skill routing table (modeled on Wintermute's AGENTS.md)
+- `skills/RESOLVER.md` — Skill routing table (based on the agent-fork AGENTS.md pattern)
- `skills/conventions/` — Cross-cutting rules (quality, brain-first, model-routing, test-before-bulk, cross-modal)
- `skills/_output-rules.md` — Output quality standards (deterministic links, no slop, exact phrasing)
- `skills/signal-detector/SKILL.md` — Always-on idea+entity capture on every message
@@ -258,7 +258,7 @@ organized by `skills/RESOLVER.md`:
**Original 8 (conformance-migrated):** ingest (thin router), query, maintain, enrich,
briefing, migrate, setup, publish.
-**Brain skills (from Wintermute):** signal-detector, brain-ops, idea-ingest, media-ingest,
+**Brain skills (ported from an upstream agent fork):** signal-detector, brain-ops, idea-ingest, media-ingest,
meeting-ingestion, citation-fixer, repo-architecture, skill-creator, daily-task-manager.
**Operational + identity:** daily-task-prep, cross-modal-review, cron-scheduler, reports,
@@ -347,6 +347,54 @@ Source material to pull from:
Target length: ~250-350 words for the summary. Should render as one viewport.
### "To take advantage of v[version]" block (required, v0.13+)
After the release-summary and BEFORE `### Itemized changes`, every `## [X.Y.Z]`
entry MUST include a human-readable self-repair block under the heading
`## To take advantage of v[version]`.
Why: `gbrain upgrade` runs `gbrain post-upgrade` which runs `gbrain apply-migrations`.
This chain has a known weak link — `upgrade.ts` catches post-upgrade failures as
best-effort (so the binary still works). When that chain silently fails, users end
up with half-upgraded brains. The self-repair block gives them a paste-ready
recovery path; the v0.13+ `~/.gbrain/upgrade-errors.jsonl` trail + `gbrain doctor`
integration close the loop.
Template (adapt the verify commands per release):
```markdown
## To take advantage of v[version]
`gbrain upgrade` should do this automatically. If it didn't, or if `gbrain doctor`
warns about a partial migration:
1. **Run the orchestrator manually:**
```bash
gbrain apply-migrations --yes
```
2. **Your agent reads `skills/migrations/v[version].md` the next time you interact with it.**
[One sentence on whether headless agents need manual action, or whether the
orchestrator already handled the mechanical side.]
3. **Verify the outcome:**
```bash
[release-specific verify commands, e.g. `gbrain graph ... --depth 2`]
gbrain stats
```
4. **If any step fails or the numbers look wrong,** please file an issue:
https://github.com/garrytan/gbrain/issues with:
- output of `gbrain doctor`
- contents of `~/.gbrain/upgrade-errors.jsonl` if it exists
- which step broke
This feedback loop is how the gbrain maintainers find fragile upgrade paths. Thank you.
```
**Skip this block** for patches that are pure bug fixes with zero user-facing action
(rare). If the release has a schema migration, data backfill, or new feature the
user needs to verify, the block is required.
The v0.13.0 entry in CHANGELOG.md is the canonical example.
### Itemized changes (the existing rules)
Below the release summary, write `### Itemized changes` and continue with the
@@ -417,7 +465,7 @@ your AGENTS.md, add…" or "in your cron/jobs.json, rewrite…", the migration
orchestrator should be doing that edit, not the user.
**The exception is host-specific code.** For custom Minion handlers
-(`ea-inbox-sweep`, `frameio-scan`, etc. on Wintermute), shipping them as a
+(host-specific integrations like inbox sweeps or third-party API scanners), shipping them as a
data file the worker would exec is an RCE surface. Those get registered in
the host's own repo via the plugin contract (`docs/guides/plugin-handlers.md`);
the migration orchestrator emits a structured TODO to
@@ -425,6 +473,38 @@ the migration orchestrator emits a structured TODO to
TODOs using `skills/migrations/v0.11.0.md` — stays host-agnostic, still
canonical.
## Privacy rule: scrub real names from public docs
**Never reference real people, companies, funds, or private agent names in any
public-facing artifact.** Public artifacts include: `CHANGELOG.md`, `README.md`,
`docs/`, `skills/`, PR titles + bodies, commit messages, and comments in checked-in
code. Query examples, benchmark stories, and migration guides MUST use generic
placeholders.
Why: gbrain runs a personal knowledge brain containing notes on real people and
real companies (YC founders, portfolio companies, funds, investors, meeting
attendees). When a doc copies a query like `gbrain graph diana-hu --depth 2` or
names a specific agent fork like `Wintermute`, that real name gets indexed by
search engines, surfaced in cross-references, and distributed with every release.
**Name mapping** to use in examples:
- Agent forks → `your agent fork`, `a downstream agent`, or `agent-fork`
- Example person → `alice-example`, `charlie-example`, or `a-founder`
- Example company → `acme-example`, `widget-co`, or `a-company`
- Example fund → `fund-a`, `fund-b`, `fund-c`
- Example deal → `acme-seed`, `widget-series-a`
- Example meeting → `meetings/2026-04-03` (generic date is fine)
- Example user → `you` or `the user`, never a proper name
**When in doubt, ask yourself:** "Would this query reveal private information
about the user's contacts, investments, or portfolio if it were read by a
stranger?" If yes, replace with generic placeholders.
**Illustrative API examples with household-brand companies** (Stripe, Brex, OpenAI,
GitHub, etc.) are fine — they're public entities, not contacts in anyone's brain.
Do not confuse illustrative API examples with queries that reveal real
relationships.
## Schema state tracking
`~/.gbrain/update-state.json` tracks which recommended schema directories the user

@@ -1 +1 @@
-0.12.3
+0.13.0

@@ -1,7 +1,7 @@
# Upgrading Downstream Agents
-GBrain ships skills in `skills/`. Downstream agents (Wintermute, OpenClaw deployments,
-custom agent forks) often **copy** these skill files into their own workspace and
+GBrain ships skills in `skills/`. Downstream agents (custom OpenClaw deployments,
+agent forks of any kind) often **copy** these skill files into their own workspace and
diverge over time — adding agent-specific phases, removing irrelevant ones, tightening
language. Once that happens, gbrain can't push updates to those forks. The agent has
to apply the diffs by hand.
@@ -13,7 +13,7 @@ Cross-reference against your fork's local skill files.
`gbrain upgrade` ships the new binary. `gbrain post-upgrade [--execute --yes]` runs
the schema migrations and backfills the data. But the **skill files themselves**
-that tell the agent how to behave — those are user-owned. If your `~/git/wintermute/workspace/skills/brain-ops/SKILL.md`
+that tell the agent how to behave — those are user-owned. If your `~/git/<your-agent>/workspace/skills/brain-ops/SKILL.md`
says `# Based on gbrain v0.10.0` at the top, it doesn't know about v0.12.0 features.
The agent will keep manually calling `gbrain link` after every `put_page` (now redundant —
@@ -22,7 +22,7 @@ not know to backfill the structured timeline.
## How to apply
-1. Identify your forked skill files. For Wintermute: `~/git/wintermute/workspace/skills/`.
+1. Identify your forked skill files. Typically at `~/git/<your-agent>/workspace/skills/` or wherever your agent's skill directory lives.
2. For each skill listed below, find the matching phase/section in your fork.
3. Apply the diff (paste the new block in the indicated location).
4. Update the version banner at the top of your fork (`# Based on gbrain v0.12.0`).
@@ -155,7 +155,7 @@ Timeline entries still need explicit `gbrain timeline-add` calls.
1. **Bump the version banner** at the top of each forked file:
```
-# Based on gbrain v0.12.0 skills/<skill-name>, extended with Wintermute-specific config
+# Based on gbrain v0.12.0 skills/<skill-name>, extended with <your-agent>-specific config
```
2. **Run the v0.12.0 backfill** (this populates the graph for your existing brain):
@@ -245,6 +245,83 @@ that bucket, update them to include the new types.
---
## v0.13.0 — Frontmatter Relationship Indexing
**Verdict: no action required for most skills.** v0.13 projects YAML frontmatter fields into the graph as typed edges. The ingestion API is unchanged — keep calling `put_page` with frontmatter the way you do today; the graph auto-populates behind the scenes.
Three skills get an optional new phase if you want to consume the new `auto_links.unresolved` response field. Without it, unresolvable frontmatter names are skipped silently (same as v0.12 behavior).
### 1. meeting-ingestion/SKILL.md (optional)
**Where:** Add a new section after "Phase 3: Write Meeting Page".
```markdown
### Phase 3.5: Check for unresolved attendees (v0.13+)
After `put_page`, inspect `response.auto_links.unresolved` — an array of frontmatter
references that did not resolve to existing pages. For meetings, this usually means
attendees you haven't created a person page for yet.
If `unresolved.length > 0`:
- Option 1 (create pages now): trigger an enrichment pass to build the missing people pages.
- Option 2 (defer): log the unresolved names to the enrichment queue for later.
- Option 3 (accept the gap): the attendee edge will not be created until a page exists.
Re-running `gbrain extract links --source db --include-frontmatter` after creating
the page fills in the missing edges.
```
### 2. enrich/SKILL.md (optional)
**Where:** Add to the enrichment trigger list.
```markdown
### Drain unresolved frontmatter names (v0.13+)
If any `put_page` response includes `auto_links.unresolved` entries, the enrichment
tier should pick up those (field, name) pairs and try to create the missing entity
pages. Example flow:
1. signal-detector captures a meeting with `attendees: [Alice Known, Unknown Person]`
2. put_page returns `auto_links.unresolved = [{field: 'attendees', name: 'Unknown Person'}]`
3. enrichment tier consumes `Unknown Person` → web search → creates `people/unknown-person.md`
4. The next put_page (or a backfill run) wires up the `attended` edge automatically
```
### 3. idea-ingest/SKILL.md (optional)
**Where:** Same pattern as meeting-ingestion — check `auto_links.unresolved` after `put_page`, route names to enrichment.
### Unchanged skills (no diffs needed)
- **brain-ops/SKILL.md** — auto-link mechanics are internal; the write path stays the same.
- **signal-detector/SKILL.md** — signal capture path unchanged.
- **query/SKILL.md** — `traverse_graph` now returns richer results automatically.
- **daily-task-manager/SKILL.md**, **briefing/SKILL.md**, **citation-fixer/SKILL.md**, **media-ingest/SKILL.md** — unchanged.
### New edge types you can filter in graph queries
v0.13 edges carry new `link_type` values. If your fork has graph-query skills that filter by type, these are now available:
- `works_at` (person → company) — from `company:`, `companies:`, or `key_people:`
- `founded` (person → company) — from `founded:`
- `invested_in` (investor → deal/company) — from `investors:` or `lead:`
- `led_round` (lead → deal) — from `lead:`
- `yc_partner` (partner → company) — from `partner:`
- `attended` (person → meeting) — from `attendees:`
- `discussed_in` (source → page) — from `sources:`
- `source` (page → source) — from `source:`
- `related_to` (page → target) — from `related:` or `see_also:`
### Migration timing
`gbrain upgrade` takes 2-5 min on a 46K-page brain (one-time). Runs out-of-process via `gbrain post-upgrade`. If your agent holds a DB connection during the upgrade, reconnect after; otherwise keep serving.
### Type normalization NOT in v0.13
Legacy rows with `link_type='attendee'` or `link_type='mention'` coexist with new `'attended'` / `'mentions'` rows. Your queries filtering on old type names keep working. A separate opt-in `gbrain normalize-types` command in v0.14 handles the rename.
---
## Future versions
When gbrain ships a new version, this doc will be updated with the diffs for that

@@ -1,6 +1,6 @@
{
"name": "gbrain",
"version": "0.12.3",
"version": "0.13.0",
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
"type": "module",
"main": "src/core/index.ts",

@@ -0,0 +1,92 @@
---
name: v0.13.0
version: 0.13.0
headline: YAML frontmatter now creates typed graph edges automatically
---
# v0.13.0 Migration: Frontmatter Relationship Indexing
**TL;DR:** this release teaches the knowledge graph to read your YAML frontmatter. Every `company:`, `investors:`, `attendees:`, `key_people:`, `partner:`, `lead:`, and `related:` field you already wrote now surfaces as a typed graph edge. `gbrain graph <hub-entity> --depth 2` goes from returning ~7 nodes to 50+ on a real brain without you changing a word of content.
For most users: run `gbrain upgrade` and you're done. The orchestrator handles schema + backfill in 2-5 minutes on a 46K-page brain. You immediately see richer results from `gbrain graph` queries.
## What changed
**Before v0.13:** graph edges came only from `[Name](path)` markdown refs. Your frontmatter was indexed for search but did not create graph relationships.
**After v0.13:**
- YAML frontmatter fields project into the `links` table with inferred types.
- Direction respects the subject-of-verb: `people/alice --attended--> meetings/2026-04-03` reads naturally because the person is the subject.
- `link_source` column distinguishes `markdown` from `frontmatter` from `manual` edges. Reconciliation on `put_page` only touches edges this page's frontmatter created — never other pages' edges.
- `origin_page_id` provenance tracks WHICH page's frontmatter authored each edge, so multi-page overlap stays safe.
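
The scoping rule in the last two bullets can be sketched as a pure function. This is an illustration under assumptions, not the reconciliation code itself: `StoredEdge`, `edgeKey`, and `staleFrontmatterEdges` are hypothetical names, and origin is modelled as a slug rather than a page id.

```typescript
// Hypothetical in-memory view of a links row.
interface StoredEdge {
  from: string;
  to: string;
  linkType: string;
  linkSource: 'markdown' | 'frontmatter' | 'manual';
  originSlug?: string; // page whose frontmatter authored the edge
}

const edgeKey = (e: { from: string; to: string; linkType: string }): string =>
  `${e.from}::${e.to}::${e.linkType}`;

// Edges to delete when `pageSlug` is rewritten: only frontmatter edges that
// this page authored and that its current frontmatter no longer produces.
function staleFrontmatterEdges(
  pageSlug: string,
  existing: StoredEdge[],
  desired: StoredEdge[],
): StoredEdge[] {
  const keep = new Set(desired.map(edgeKey));
  return existing.filter(e =>
    e.linkSource === 'frontmatter' &&
    e.originSlug === pageSlug &&
    !keep.has(edgeKey(e)));
}

const page = 'meetings/2026-04-03';
const existing: StoredEdge[] = [
  { from: 'people/alice', to: page, linkType: 'attended', linkSource: 'frontmatter', originSlug: page },
  { from: 'people/bob', to: page, linkType: 'attended', linkSource: 'frontmatter', originSlug: page },
  { from: page, to: 'people/bob', linkType: 'mentions', linkSource: 'markdown' },
  { from: 'people/bob', to: 'companies/acme', linkType: 'works_at', linkSource: 'frontmatter', originSlug: 'companies/acme' },
];
// Bob was removed from this meeting's attendees list.
const desired: StoredEdge[] = [
  { from: 'people/alice', to: page, linkType: 'attended', linkSource: 'frontmatter', originSlug: page },
];
const stale = staleFrontmatterEdges(page, existing, desired);
```

Only Bob's `attended` edge is selected. The markdown mention and the edge authored by `companies/acme` are left alone.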
## The field → type map
| Frontmatter field | On page type | Edge type | Direction |
|-------------------|--------------|-----------|-----------|
| `company`, `companies` | person | `works_at` | person → company |
| `founded` | person | `founded` | person → company |
| `key_people` | company | `works_at` | person → company (incoming) |
| `partner` | company | `yc_partner` | person → company (incoming) |
| `investors` | deal, company | `invested_in` | investor → target (incoming) |
| `lead` | deal | `led_round` | lead → deal (incoming) |
| `attendees` | meeting | `attended` | person → meeting (incoming) |
| `sources` | any | `discussed_in` | source → page (incoming) |
| `source` | any | `source` | page → source (outgoing) |
| `related`, `see_also` | any | `related_to` | page → target (outgoing) |
Fields on pages not matching the `On page type` column are ignored for that mapping. E.g. a person page with `key_people:` is ignored (makes no sense); only company pages produce `works_at` incoming from `key_people`.
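
The table can be read as data plus one direction rule. The sketch below transcribes the rows above into a hypothetical `FIELD_RULES` array and an `edgeFor` helper; neither name comes from gbrain, and the canonical map in the codebase may be shaped differently.

```typescript
type Direction = 'incoming' | 'outgoing';

interface FieldRule {
  fields: string[];
  pageTypes: string[]; // 'any' matches every page type
  linkType: string;
  direction: Direction;
}

// Transcribed from the table above.
const FIELD_RULES: FieldRule[] = [
  { fields: ['company', 'companies'], pageTypes: ['person'], linkType: 'works_at', direction: 'outgoing' },
  { fields: ['founded'], pageTypes: ['person'], linkType: 'founded', direction: 'outgoing' },
  { fields: ['key_people'], pageTypes: ['company'], linkType: 'works_at', direction: 'incoming' },
  { fields: ['partner'], pageTypes: ['company'], linkType: 'yc_partner', direction: 'incoming' },
  { fields: ['investors'], pageTypes: ['deal', 'company'], linkType: 'invested_in', direction: 'incoming' },
  { fields: ['lead'], pageTypes: ['deal'], linkType: 'led_round', direction: 'incoming' },
  { fields: ['attendees'], pageTypes: ['meeting'], linkType: 'attended', direction: 'incoming' },
  { fields: ['sources'], pageTypes: ['any'], linkType: 'discussed_in', direction: 'incoming' },
  { fields: ['source'], pageTypes: ['any'], linkType: 'source', direction: 'outgoing' },
  { fields: ['related', 'see_also'], pageTypes: ['any'], linkType: 'related_to', direction: 'outgoing' },
];

// Build the edge for one resolved frontmatter value, or null when the
// field does not apply to this page type.
function edgeFor(pageSlug: string, pageType: string, field: string, resolvedSlug: string) {
  const rule = FIELD_RULES.find(r =>
    r.fields.includes(field) &&
    (r.pageTypes.includes('any') || r.pageTypes.includes(pageType)));
  if (!rule) return null;
  // "incoming" means the resolved value is the FROM side of the edge.
  return rule.direction === 'incoming'
    ? { from: resolvedSlug, to: pageSlug, linkType: rule.linkType }
    : { from: pageSlug, to: resolvedSlug, linkType: rule.linkType };
}

const attended = edgeFor('meetings/2026-04-03', 'meeting', 'attendees', 'people/alice');
const ignored = edgeFor('people/alice', 'person', 'key_people', 'people/bob');
```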
## How to upgrade
```bash
gbrain upgrade
```
That runs the v0.13.0 orchestrator:
1. **Schema phase** — ALTER TABLE adds `link_source`, `origin_page_id`, `origin_field`. Swaps unique constraint to include them. ~10s.
2. **Backfill phase** — walks every page, extracts frontmatter edges via the batch-mode resolver (pg_trgm fuzzy match, zero LLM calls, zero API costs). Progress prints every 500 pages. 2-5 min on a 46K-page brain.
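3. **Verify phase** — reads page and link counts via `get_stats` as a smoke test, then records completion. Zero frontmatter edges is not treated as a failure, since fresh or docs-only brains legitimately produce none.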
The migration is resumable. If it dies mid-backfill (OOM, network blip), re-run `gbrain upgrade` and it picks up where it left off via `ON CONFLICT DO NOTHING` on the new unique constraint.
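
Why a re-run is safe can be shown with a small in-memory stand-in for the `links` table. Everything here is an assumption made for illustration: the `LinkTable` class, the `backfill` helper, and the exact columns in the uniqueness key are not taken from the schema.

```typescript
interface Row {
  from: string;
  to: string;
  linkType: string;
  linkSource: string;
  originSlug: string | null;
}

class LinkTable {
  private rows = new Map<string, Row>();

  // Mimics INSERT ... ON CONFLICT DO NOTHING: returns true only when a new
  // row is stored. null origins compare equal, like NULLS NOT DISTINCT.
  insert(row: Row): boolean {
    const key = JSON.stringify([row.from, row.to, row.linkType, row.linkSource, row.originSlug]);
    if (this.rows.has(key)) return false;
    this.rows.set(key, row);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

// Insert rows in order; optionally crash before row index `failAfter`.
function backfill(table: LinkTable, rows: Row[], failAfter = Infinity): number {
  let inserted = 0;
  for (let i = 0; i < rows.length; i++) {
    if (i >= failAfter) throw new Error('simulated crash');
    if (table.insert(rows[i])) inserted++;
  }
  return inserted;
}

const table = new LinkTable();
const rows: Row[] = [
  { from: 'people/alice', to: 'companies/acme', linkType: 'works_at', linkSource: 'frontmatter', originSlug: 'people/alice' },
  { from: 'people/bob', to: 'companies/acme', linkType: 'works_at', linkSource: 'frontmatter', originSlug: 'companies/acme' },
  { from: 'people/alice', to: 'meetings/2026-04-03', linkType: 'attended', linkSource: 'frontmatter', originSlug: 'meetings/2026-04-03' },
];

try {
  backfill(table, rows, 2); // first run dies after two rows
} catch {
  // crash is expected in this demo
}
const secondRun = backfill(table, rows); // re-run walks everything again
```

The second run revisits all three rows but stores only the one that was missing, so the table ends up with no duplicates.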
## Verification
```bash
# Link count should reflect the ~15-20K new frontmatter edges on a typical brain.
gbrain stats
# Sample a hub entity from your brain — depth-2 should return many more nodes than before.
gbrain graph <hub-entity-slug> --depth 2
# Filter to specific edge types (new in v0.13).
gbrain graph <hub-entity-slug> --depth 2 --type yc_partner,invested_in
# Count edges by provenance.
gbrain call get_stats --json
```
## Troubleshooting
**Migration failed mid-backfill.** Re-run `gbrain upgrade`. Resumable via ON CONFLICT DO NOTHING + origin_page_id scoping.
**PGLite without pg_trgm GIN index.** The migration logs an INFO line and falls back to ILIKE matching. Fuzzy-match quality is reduced, but the migration completes successfully. No user action required.
**Unresolvable names in the extract summary.** The backfill prints a top-20 preview of frontmatter names that didn't resolve to any page. These are usually people/companies you've mentioned in frontmatter but never created pages for. Options:
- Create the missing pages (then run `gbrain extract links --source db --include-frontmatter` to backfill).
- Ignore — unresolved names stay unresolved until you create a page for them.
**Agents using `put_page` see a new `unresolved` field in `auto_links`.** This is additive. Existing agents that ignore unknown response fields keep working. Agents that want to escalate unresolved names can read `response.auto_links.unresolved`.
**`attendee` vs `attended` type normalization.** Legacy rows with `link_type='attendee'` or `link_type='mention'` keep working. Normalization to the v0.13 canonical names (`attended`, `mentions`) is deliberately NOT in this migration — it's a separate semantic concern. The `gbrain normalize-types` command (v0.14) handles it opt-in.
## For downstream agent skill forks
If you maintain a fork of GBrain skills (a custom OpenClaw deployment or agent-fork), check `docs/UPGRADING_DOWNSTREAM_AGENTS.md` for the v0.13 section. Verdict: **no action required for most skills.** Three skills (`meeting-ingestion`, `enrich`, `idea-ingest`) get a new optional phase if you want to consume the `auto_links.unresolved` field.
## If something goes wrong
1. `gbrain doctor` — surfaces any partial migrations and any post-upgrade failures recorded in `~/.gbrain/upgrade-errors.jsonl`.
2. Run the recovery command that doctor prints (it is paste-ready).
3. If that fails too, file an issue: https://github.com/garrytan/gbrain/issues with doctor output + upgrade-errors.jsonl contents. This is how the gbrain maintainers find fragile upgrade paths.

@@ -97,6 +97,32 @@ export async function runDoctor(engine: BrainEngine | null, args: string[]) {
// handles the "schema v7+ but no prefs" case.
}
// 3b. Upgrade-error trail (v0.13+). `gbrain upgrade` silently swallows
// best-effort failures in `gbrain post-upgrade`; the failure record is
// appended to ~/.gbrain/upgrade-errors.jsonl so we can surface it here
// with a paste-ready recovery hint. Without this, users end up with
// half-upgraded brains and no signal.
try {
const home = process.env.HOME || '';
const errPath = join(home, '.gbrain', 'upgrade-errors.jsonl');
if (existsSync(errPath)) {
const lines = readFileSync(errPath, 'utf-8').split('\n').filter(l => l.trim());
if (lines.length > 0) {
const latest = JSON.parse(lines[lines.length - 1]) as {
ts: string; phase: string; from_version: string; to_version: string; hint: string;
};
const date = latest.ts.slice(0, 10);
checks.push({
name: 'upgrade_errors',
status: 'warn',
message: `Post-upgrade failure on ${date} (${latest.from_version} → ${latest.to_version}, phase: ${latest.phase}). Recovery: ${latest.hint}`,
});
}
}
} catch {
// Read/parse failure is itself best-effort; skip silently.
}
// --- DB checks (skip if --fast or no engine) ---
if (fastMode || !engine) {

@@ -21,7 +21,11 @@ import { join, relative, dirname } from 'path';
import type { BrainEngine, LinkBatchInput, TimelineBatchInput } from '../core/engine.ts';
import type { PageType } from '../core/types.ts';
import { parseMarkdown } from '../core/markdown.ts';
import { extractPageLinks, parseTimelineEntries, inferLinkType } from '../core/link-extraction.ts';
import {
extractPageLinks, parseTimelineEntries, inferLinkType, makeResolver,
extractFrontmatterLinks,
type UnresolvedFrontmatterRef,
} from '../core/link-extraction.ts';
// Batch size for addLinksBatch / addTimelineEntriesBatch.
// Postgres bind-parameter limit is 65535. Links use 4 cols/row → 16K hard ceiling;
@@ -142,8 +146,19 @@ export function resolveSlug(fileDir: string, relTarget: string, allSlugs: Set<st
return null;
}
/** Infer link type from directory structure */
function inferLinkType(fromDir: string, toDir: string, frontmatter?: Record<string, unknown>): string {
/**
* Directory-based link-type inference for the fs-source path.
*
* FS-source operates without a BrainEngine. We have paths, not pages. This
* helper looks at source + target directories and returns a type aligned
* with the canonical `inferLinkType` in link-extraction.ts (calibrated
* verb-based inference for db-source).
*
* v0.13: aligned type names with link-extraction.ts (was: 'mention' →
* 'mentions', 'attendee' → 'attended'). Diverged historically; the v0_13_0
* migration does NOT rename legacy rows on existing brains. That rename is
* left to the opt-in `gbrain normalize-types` command (v0.14).
*/
function inferTypeByDir(fromDir: string, toDir: string, frontmatter?: Record<string, unknown>): string {
const from = fromDir.split('/')[0];
const to = toDir.split('/')[0];
if (from === 'people' && to === 'companies') {
@@ -152,31 +167,8 @@ function inferLinkType(fromDir: string, toDir: string, frontmatter?: Record<stri
}
if (from === 'people' && to === 'deals') return 'involved_in';
if (from === 'deals' && to === 'companies') return 'deal_for';
if (from === 'meetings' && to === 'people') return 'attendee';
return 'mention';
}
/** Extract links from frontmatter fields */
function extractFrontmatterLinks(slug: string, fm: Record<string, unknown>): ExtractedLink[] {
const links: ExtractedLink[] = [];
const fieldMap: Record<string, { dir: string; type: string }> = {
company: { dir: 'companies', type: 'works_at' },
companies: { dir: 'companies', type: 'works_at' },
investors: { dir: 'companies', type: 'invested_in' },
attendees: { dir: 'people', type: 'attendee' },
founded: { dir: 'companies', type: 'founded' },
};
for (const [field, config] of Object.entries(fieldMap)) {
const value = fm[field];
if (!value) continue;
const slugs = Array.isArray(value) ? value : [value];
for (const s of slugs) {
if (typeof s !== 'string') continue;
const toSlug = `${config.dir}/${s.toLowerCase().replace(/\s+/g, '-')}`;
links.push({ from_slug: slug, to_slug: toSlug, link_type: config.type, context: `frontmatter.${field}` });
}
}
return links;
if (from === 'meetings' && to === 'people') return 'attended';
return 'mentions';
}
/** Parse frontmatter using the project's gray-matter-based parser */
@@ -189,10 +181,19 @@ function parseFrontmatterFromContent(content: string, relPath: string): Record<s
}
}
/** Full link extraction from a single markdown file */
export function extractLinksFromFile(
/**
* Full link extraction from a single markdown file (FS-source path).
*
* Async (v0.13): uses the canonical `extractFrontmatterLinks` via a
* synthetic resolver backed by the pre-loaded `allSlugs` Set. No DB,
* no fuzzy match — FS-source resolves only when the dir-hint + slugify
* of the frontmatter value hits an actual file path. That mirrors the
* fs path's existing "exact match against disk" behavior.
*/
export async function extractLinksFromFile(
content: string, relPath: string, allSlugs: Set<string>,
): ExtractedLink[] {
opts?: { includeFrontmatter?: boolean },
): Promise<ExtractedLink[]> {
const links: ExtractedLink[] = [];
const slug = relPath.replace('.md', '');
const fileDir = dirname(relPath);
@@ -203,13 +204,51 @@ export function extractLinksFromFile(
if (resolved !== null) {
links.push({
from_slug: slug, to_slug: resolved,
link_type: inferLinkType(fileDir, dirname(resolved), fm),
link_type: inferTypeByDir(fileDir, dirname(resolved), fm),
context: `markdown link: [${name}]`,
});
}
}
links.push(...extractFrontmatterLinks(slug, fm));
if (opts?.includeFrontmatter) {
// Synthetic sync-ish resolver: only does step 1 (already a slug) and
// step 2 (dir-hint + slugify), backed by the Set of all known slugs.
const slugify = (s: string) => s.toLowerCase().replace(/[^a-z0-9\s-]/g, '').trim().replace(/\s+/g, '-');
const fsResolver = {
async resolve(name: string, dirHint?: string | string[]): Promise<string | null> {
if (!name) return null;
const trimmed = name.trim();
if (/^[a-z][a-z0-9-]*\/[a-z0-9][a-z0-9-]*$/.test(trimmed) && allSlugs.has(trimmed)) {
return trimmed;
}
const hints = Array.isArray(dirHint) ? dirHint : (dirHint ? [dirHint] : []);
for (const hint of hints) {
if (!hint) continue;
const candidate = `${hint}/${slugify(trimmed)}`;
if (allSlugs.has(candidate)) return candidate;
}
return null;
},
};
// Guess the page type from its directory for field-map filtering.
const topDir = slug.split('/')[0];
const pageType = topDir === 'people' ? 'person'
: topDir === 'companies' ? 'company'
: topDir === 'deals' || topDir === 'deal' ? 'deal'
: topDir === 'meetings' ? 'meeting'
: 'concept';
const fm = parseFrontmatterFromContent(content, relPath);
const fmLinks = await extractFrontmatterLinks(slug, pageType as never, fm, fsResolver);
for (const c of fmLinks.candidates) {
links.push({
from_slug: c.fromSlug ?? slug,
to_slug: c.targetSlug,
link_type: c.linkType,
context: c.context,
});
}
}
return links;
}
@@ -300,6 +339,10 @@ export async function runExtract(engine: BrainEngine, args: string[]) {
const since = (sinceIdx >= 0 && sinceIdx + 1 < args.length) ? args[sinceIdx + 1] : undefined;
const dryRun = args.includes('--dry-run');
const jsonMode = args.includes('--json');
// --include-frontmatter: v0.13 flag. Default OFF for back-compat. The
// v0_13_0 migration orchestrator runs this once under the hood; users
// opt in for subsequent runs.
const includeFrontmatter = args.includes('--include-frontmatter');
// Validate --since upfront. Without this, an invalid date like
// `--since yesterday` produces NaN which silently passes the filter check
@@ -337,7 +380,7 @@ export async function runExtract(engine: BrainEngine, args: string[]) {
// can opt in via mode + source.
result = { links_created: 0, timeline_entries_created: 0, pages_processed: 0 };
if (subcommand === 'links' || subcommand === 'all') {
const r = await extractLinksFromDB(engine, dryRun, jsonMode, typeFilter, since);
const r = await extractLinksFromDB(engine, dryRun, jsonMode, typeFilter, since, { includeFrontmatter });
result.links_created = r.created;
result.pages_processed = r.pages;
}
@@ -397,7 +440,7 @@ async function extractLinksFromDir(
for (let i = 0; i < files.length; i++) {
try {
const content = readFileSync(files[i].path, 'utf-8');
const links = extractLinksFromFile(content, files[i].relPath, allSlugs);
const links = await extractLinksFromFile(content, files[i].relPath, allSlugs);
for (const link of links) {
if (dryRunSeen) {
const key = `${link.from_slug}::${link.to_slug}::${link.link_type}`;
@@ -491,7 +534,7 @@ export async function extractLinksForSlugs(engine: BrainEngine, repoPath: string
if (!existsSync(filePath)) continue;
try {
const content = readFileSync(filePath, 'utf-8');
for (const link of extractLinksFromFile(content, slug + '.md', allSlugs)) {
for (const link of await extractLinksFromFile(content, slug + '.md', allSlugs)) {
try { await engine.addLink(link.from_slug, link.to_slug, link.context, link.link_type); created++; } catch { /* skip */ }
}
} catch { /* skip */ }
@@ -527,7 +570,18 @@ async function extractLinksFromDB(
jsonMode: boolean,
typeFilter: PageType | undefined,
since: string | undefined,
): Promise<{ created: number; pages: number }> {
opts?: { includeFrontmatter?: boolean },
): Promise<{ created: number; pages: number; unresolved: UnresolvedFrontmatterRef[] }> {
const includeFrontmatter = opts?.includeFrontmatter ?? false;
// Batch resolver: pg_trgm + exact only, NO search fallback. Dodges the
// N-thousand API call trap on 46K-page brains. Resolver has a per-run
// cache so duplicate names (same person appearing on many pages) resolve
// once, not once per mention.
const resolver = makeResolver(engine, { mode: 'batch' });
const unresolved: UnresolvedFrontmatterRef[] = [];
const nullResolver = {
resolve: async () => null as string | null,
};
const allSlugs = await engine.getAllSlugs();
const slugList = Array.from(allSlugs);
let processed = 0, created = 0;
@@ -564,25 +618,45 @@ async function extractLinksFromDB(
}
const fullContent = page.compiled_truth + '\n' + page.timeline;
const candidates = extractPageLinks(fullContent, page.frontmatter, page.type);
// --include-frontmatter default OFF in v0.13 (codex tension 5, back-compat).
// Migration orchestrator explicitly enables it for the one-time backfill;
// user-invoked `gbrain extract links` stays outgoing-only.
const activeResolver = includeFrontmatter ? resolver : nullResolver;
const extracted = await extractPageLinks(
slug, fullContent, page.frontmatter, page.type, activeResolver,
);
unresolved.push(...extracted.unresolved);
for (const c of candidates) {
for (const c of extracted.candidates) {
// Validate BOTH endpoints exist. Incoming frontmatter edges have
// fromSlug !== the page being processed; we need that page to exist
// too or the JOIN drops the row anyway.
const fromSlug = c.fromSlug ?? slug;
if (!allSlugs.has(c.targetSlug)) continue;
if (!allSlugs.has(fromSlug)) continue;
if (dryRunSeen) {
const key = `${slug}::${c.targetSlug}::${c.linkType}`;
const key = `${fromSlug}::${c.targetSlug}::${c.linkType}::${c.linkSource ?? 'markdown'}`;
if (dryRunSeen.has(key)) continue;
dryRunSeen.add(key);
if (jsonMode) {
process.stdout.write(JSON.stringify({
action: 'add_link', from: slug, to: c.targetSlug,
type: c.linkType, context: c.context,
action: 'add_link', from: fromSlug, to: c.targetSlug,
type: c.linkType, context: c.context, link_source: c.linkSource,
}) + '\n');
} else {
console.log(` ${slug} → ${c.targetSlug} (${c.linkType})`);
console.log(` ${fromSlug} → ${c.targetSlug} (${c.linkType})${c.linkSource === 'frontmatter' ? ' [fm]' : ''}`);
}
created++;
} else {
batch.push({ from_slug: slug, to_slug: c.targetSlug, link_type: c.linkType, context: c.context });
batch.push({
from_slug: fromSlug,
to_slug: c.targetSlug,
link_type: c.linkType,
context: c.context,
link_source: c.linkSource,
origin_slug: c.originSlug,
origin_field: c.originField,
});
if (batch.length >= BATCH_SIZE) await flush();
}
}
@@ -596,8 +670,22 @@ async function extractLinksFromDB(
if (!jsonMode) {
const label = dryRun ? '(dry run) would create' : 'created';
console.log(`Links: ${label} ${created} from ${processed} pages (db source)`);
if (includeFrontmatter && unresolved.length > 0) {
// Top-20 preview of unresolvable frontmatter names so the user can
// see where the graph has holes (codex tension 6.4).
console.log(`Unresolved frontmatter refs: ${unresolved.length} total`);
const bucket = new Map<string, number>();
for (const u of unresolved) {
const key = `${u.field}:${u.name}`;
bucket.set(key, (bucket.get(key) || 0) + 1);
}
const top = Array.from(bucket.entries()).sort((a, b) => b[1] - a[1]).slice(0, 20);
for (const [key, count] of top) {
console.log(` ${count}× ${key}`);
}
}
}
return { created, pages: processed };
return { created, pages: processed, unresolved };
}
async function extractTimelineFromDB(

@@ -14,11 +14,13 @@ import type { Migration } from './types.ts';
import { v0_11_0 } from './v0_11_0.ts';
import { v0_12_0 } from './v0_12_0.ts';
import { v0_12_2 } from './v0_12_2.ts';
import { v0_13_0 } from './v0_13_0.ts';
export const migrations: Migration[] = [
v0_11_0,
v0_12_0,
v0_12_2,
v0_13_0,
];
/** Look up a migration by exact version string. */

@@ -0,0 +1,175 @@
/**
* v0.13.0 migration orchestrator — frontmatter relationship indexing.
*
* v0.13 extends the knowledge graph to project typed edges from YAML
* frontmatter (company, investors, attendees, key_people, etc.), not just
* `[Name](path)` markdown refs. This migration:
*
* A. Schema — `gbrain init --migrate-only` triggers migrate.ts v11 which
* adds link_source + origin_page_id + origin_field columns,
* swaps the unique constraint to include them, and creates
* new indexes.
* B. Backfill — `gbrain extract links --source db --include-frontmatter`
* walks every page and emits the frontmatter-derived edges.
* Uses the batch-mode resolver (pg_trgm only, no LLM).
* C. Verify — Read page + link counts via get_stats as a smoke test.
* Zero frontmatter rows is allowed (fresh or docs-only brains).
* D. Record — append to ~/.gbrain/completed.jsonl.
*
* Idempotent. Resumable from `partial` via ON CONFLICT DO NOTHING on the
* new unique constraint. Wall-clock budget on 46K-page brains: 2-5 min
* (pg_trgm index-backed, no embedding or LLM calls).
*
* Ignores `auto_link=false` config: migration is canonical (CLAUDE.md),
* not advisory. The auto_link toggle controls the put_page post-hook,
* not one-time schema+backfill work.
*/
import { execSync } from 'child_process';
import type { Migration, OrchestratorOpts, OrchestratorResult, OrchestratorPhaseResult } from './types.ts';
import { appendCompletedMigration } from '../../core/preferences.ts';
// ── Phase A — Schema ────────────────────────────────────────
//
// migrate.ts v11 adds the link_source/origin_page_id/origin_field columns
// and swaps the unique constraint. Schema build time on 46K pages is
// ~10s (ALTER + index builds). Bumped timeout accounts for slow Supabase
// links (v0.12.1 pattern — migrations can time out on the 60s default).
// Use the CURRENTLY-RUNNING binary path (not `gbrain` off $PATH). After
// `gbrain upgrade` rewrites the binary, a bare `gbrain` could resolve to
// an older installed copy via alias shadowing or stale PATH cache. The
// active process.execPath is the one that loaded THIS migration module,
// so recursing into it is always the right binary.
const GBRAIN = process.execPath;
function phaseASchema(opts: OrchestratorOpts): OrchestratorPhaseResult {
if (opts.dryRun) return { name: 'schema', status: 'skipped', detail: 'dry-run' };
try {
execSync(`${GBRAIN} init --migrate-only`, { stdio: 'inherit', timeout: 600_000, env: process.env });
return { name: 'schema', status: 'complete' };
} catch (e) {
const msg = e instanceof Error ? e.message : String(e);
return { name: 'schema', status: 'failed', detail: msg };
}
}
// ── Phase B — Frontmatter edge backfill ─────────────────────
function phaseBBackfill(opts: OrchestratorOpts): OrchestratorPhaseResult {
if (opts.dryRun) return { name: 'frontmatter_backfill', status: 'skipped', detail: 'dry-run' };
try {
// `--source db` iterates pages from the engine (no local checkout required).
// `--include-frontmatter` is the v0.13 flag that enables the canonical
// frontmatter link extractor. Default-OFF in the CLI for back-compat;
// the migration explicitly opts in because this is the canonical backfill.
execSync(`${GBRAIN} extract links --source db --include-frontmatter`, {
stdio: 'inherit',
timeout: 1_800_000, // 30 min hard cap; typical 2-5 min on 46K pages
env: process.env,
});
return { name: 'frontmatter_backfill', status: 'complete' };
} catch (e) {
const msg = e instanceof Error ? e.message : String(e);
return { name: 'frontmatter_backfill', status: 'failed', detail: msg };
}
}
// ── Phase C — Verify ────────────────────────────────────────
function phaseCVerify(opts: OrchestratorOpts): OrchestratorPhaseResult {
if (opts.dryRun) return { name: 'verify', status: 'skipped', detail: 'dry-run' };
try {
// Smoke test: read overall page and link counts via get_stats and report
// them in the phase detail. Non-blocking; no per-provenance count is
// queried here.
//
// We intentionally do NOT fail on 0 frontmatter edges: fresh installs,
// docs-only brains, and brains with no entity pages legitimately
// produce 0. Phase B's own stdout shows `Links: created N` which is
// the authoritative signal — user sees it during upgrade.
const out = execSync(`${GBRAIN} call get_stats`, {
encoding: 'utf-8', timeout: 60_000, env: process.env,
});
const parsed = JSON.parse(out) as { link_count?: number; page_count?: number };
const linkCount = parsed.link_count ?? 0;
const pageCount = parsed.page_count ?? 0;
return {
name: 'verify',
status: 'complete',
detail: `pages=${pageCount}, links=${linkCount} (backfill output in Phase B logs)`,
};
} catch (e) {
const msg = e instanceof Error ? e.message : String(e);
return { name: 'verify', status: 'failed', detail: msg };
}
}
// ── Orchestrator ────────────────────────────────────────────
async function orchestrator(opts: OrchestratorOpts): Promise<OrchestratorResult> {
console.log('');
console.log('=== v0.13.0 — Frontmatter relationship indexing ===');
if (opts.dryRun) console.log(' (dry-run; no side effects)');
console.log('');
const phases: OrchestratorPhaseResult[] = [];
const a = phaseASchema(opts);
phases.push(a);
if (a.status === 'failed') return finalizeResult(phases, 'failed');
const b = phaseBBackfill(opts);
phases.push(b);
// Backfill failure → partial. Schema is already applied so re-running
// only re-tries the backfill (idempotent via ON CONFLICT DO NOTHING).
if (b.status === 'failed') return finalizeResult(phases, 'partial');
const c = phaseCVerify(opts);
phases.push(c);
const overallStatus: 'complete' | 'partial' | 'failed' =
a.status === 'failed' || b.status === 'failed' ? 'failed' :
c.status === 'failed' ? 'partial' :
'complete';
return finalizeResult(phases, overallStatus);
}
function finalizeResult(phases: OrchestratorPhaseResult[], status: 'complete' | 'partial' | 'failed'): OrchestratorResult {
if (status !== 'failed') {
try {
appendCompletedMigration({ version: '0.13.0', status: status as 'complete' | 'partial' });
} catch {
// Recording is best-effort.
}
}
return {
version: '0.13.0',
status,
phases,
};
}
export const v0_13_0: Migration = {
version: '0.13.0',
featurePitch: {
headline: 'Frontmatter becomes a graph — company, investors, attendees now create typed edges automatically',
description:
'v0.13 extends the knowledge graph to project typed edges from YAML frontmatter. ' +
'Every `company: X`, `investors: [A, B]`, `attendees: [Pedro, Garry]`, `key_people`, ' +
'`partner`, `lead`, and `related` field you already wrote now surfaces in ' +
'`gbrain graph`. Direction semantics respect subject-of-verb (Pedro → meeting, ' +
'not meeting → Pedro). The migration backfills every existing page in ~2-5 min ' +
'on a 46K-page brain. Uses pg_trgm fuzzy-match for name resolution (zero LLM ' +
'cost, zero API calls). Unresolvable names surface in the extract summary so you ' +
'see exactly where the graph has holes.',
},
orchestrator,
};
/** Exported for unit tests. */
export const __testing = {
phaseASchema,
phaseBBackfill,
phaseCVerify,
};

@@ -1,5 +1,5 @@
import { execSync } from 'child_process';
import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
import { existsSync, readFileSync, writeFileSync, mkdirSync, appendFileSync } from 'fs';
import { join } from 'path';
import { VERSION } from '../version.ts';
@@ -61,8 +61,18 @@ export async function runUpgrade(args: string[]) {
// autopilot install) on a v0.11.0→v0.11.1 jump. Codex H7.
try {
execSync('gbrain post-upgrade', { stdio: 'inherit', timeout: 300_000 });
} catch {
// post-upgrade is best-effort, don't fail the upgrade
} catch (e) {
// post-upgrade is best-effort, don't fail the upgrade. BUT leave a
// trail so `gbrain doctor` can surface it and give the user a clear
// paste-ready recovery command. Silent failure here is how users end
// up with half-upgraded brains and no signal.
recordUpgradeError({
phase: 'post-upgrade',
fromVersion: oldVersion,
toVersion: newVersion,
error: e instanceof Error ? e.message : String(e),
hint: 'Run: gbrain apply-migrations --yes',
});
}
// Run features scan to show what's new and what to fix
try {
@@ -84,6 +94,39 @@ function verifyUpgrade(): string {
}
}
/**
* Append a structured record to ~/.gbrain/upgrade-errors.jsonl when a
* best-effort phase of the upgrade fails (e.g., `gbrain post-upgrade`
* silently bombing). Without this trail, users end up with half-upgraded
* brains and no signal. `gbrain doctor` reads this file and surfaces the
* paste-ready recovery hint. Failures here are themselves best-effort.
*/
export function recordUpgradeError(record: {
phase: string;
fromVersion: string;
toVersion: string;
error: string;
hint: string;
}): void {
try {
const dir = join(process.env.HOME || '', '.gbrain');
mkdirSync(dir, { recursive: true });
const path = join(dir, 'upgrade-errors.jsonl');
const line = JSON.stringify({
ts: new Date().toISOString(),
phase: record.phase,
from_version: record.fromVersion,
to_version: record.toVersion,
error: record.error,
hint: record.hint,
}) + '\n';
appendFileSync(path, line);
} catch {
// Recording errors is itself best-effort. The user will still see the
// underlying failure in stdout/stderr from the original command.
}
}
function saveUpgradeState(oldVersion: string, newVersion: string) {
try {
const dir = join(process.env.HOME || '', '.gbrain');

@@ -17,6 +17,17 @@ export interface LinkBatchInput {
to_slug: string;
link_type?: string;
context?: string;
/**
* Provenance (v0.13+). Pass 'frontmatter' for edges derived from YAML
* frontmatter, 'markdown' for [Name](path) refs, 'manual' for user-created.
* NULL means "legacy / unknown" and is only used by pre-v0.13 rows; new
* writes should always set this. Missing on input defaults to 'markdown'.
*/
link_source?: string;
/** For link_source='frontmatter': slug of the page whose frontmatter created this edge. */
origin_slug?: string;
/** Frontmatter field name (e.g. 'key_people', 'investors'). */
origin_field?: string;
}
/** Input row for addTimelineEntriesBatch. Optional fields default to '' (matches NOT NULL DDL). */
@@ -69,7 +80,20 @@ export interface BrainEngine {
deleteChunks(slug: string): Promise<void>;
// Links
addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
/**
* Single-row link insert. linkSource defaults to 'markdown' for back-compat
* with pre-v0.13 callers. Pass 'frontmatter' + originSlug + originField for
* frontmatter-derived edges; 'manual' for user-initiated edges.
*/
addLink(
from: string,
to: string,
context?: string,
linkType?: string,
linkSource?: string,
originSlug?: string,
originField?: string,
): Promise<void>;
/**
* Bulk insert links via a single multi-row INSERT...SELECT FROM (VALUES) JOIN pages
* statement with ON CONFLICT DO NOTHING. Returns the count of rows actually inserted
@@ -80,11 +104,32 @@ export interface BrainEngine {
/**
* Remove links from `from` to `to`. If linkType is provided, only that specific
* (from, to, type) row is removed. If omitted, ALL link types between the pair
* are removed (matches pre-multi-type-link behavior).
* are removed (matches pre-multi-type-link behavior). linkSource additionally
* constrains the delete to a specific provenance ('frontmatter', 'markdown',
* 'manual') — used by runAutoLink reconciliation to avoid deleting edges from
* other provenances when pruning frontmatter-derived edges.
*/
removeLink(from: string, to: string, linkType?: string): Promise<void>;
removeLink(from: string, to: string, linkType?: string, linkSource?: string): Promise<void>;
getLinks(slug: string): Promise<Link[]>;
getBacklinks(slug: string): Promise<Link[]>;
/**
* Fuzzy-match a display name to a page slug using pg_trgm similarity.
* Zero embedding cost, zero LLM cost — designed for the v0.13 resolver used
* during migration/batch backfill where 5K+ lookups must stay sub-second.
*
* Returns the best match whose title similarity is at or above `minSimilarity`
* (default 0.55). If `dirPrefix` is given (e.g. 'people' or 'companies'),
* only slugs starting with that prefix are considered. Returns null when no
* page meets the threshold.
*
* Uses the `%` trigram operator (GIN-indexed) + the standard `similarity()`
* function. Both engines support pg_trgm (PGLite 0.3+, Postgres always).
*/
findByTitleFuzzy(
name: string,
dirPrefix?: string,
minSimilarity?: number,
): Promise<{ slug: string; similarity: number } | null>;
traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;
/**
* Edge-based graph traversal with optional type and direction filters.

@@ -139,12 +139,47 @@ export function extractEntityRefs(content: string): EntityRef[] {
// ─── Link candidates (richer than EntityRef) ────────────────────
export interface LinkCandidate {
/**
* Source page slug for the edge. When omitted, callers default to
* "the page being written" (operations.ts runAutoLink) or "the page
* currently being processed" (extract.ts). Explicitly set when
* frontmatter emits an incoming edge — e.g. a company page's
* `key_people: [pedro-franceschi]` produces a candidate whose
* fromSlug is `people/pedro-franceschi`, not the company.
*/
fromSlug?: string;
/** Target page slug (no .md, no ../). */
targetSlug: string;
/** Inferred relationship type. */
linkType: string;
/** Surrounding text (up to ~80 chars) used for inference + storage. */
context: string;
/**
* Provenance (v0.13+). Defaults to 'markdown' on older call sites;
* frontmatter-derived candidates set 'frontmatter'; user-created edges
* via explicit API pass 'manual'.
*/
linkSource?: string;
/**
* Origin-page slug. Only populated for link_source='frontmatter' so
* reconciliation can scope cleanups to edges THIS page's frontmatter
* created (never touching edges other pages authored).
*/
originSlug?: string;
/** Frontmatter field name (e.g. 'key_people'), for debug + unresolved report. */
originField?: string;
}
/**
* Result of extractPageLinks. `candidates` includes markdown refs + bare
* slug refs + frontmatter-derived edges (v0.13). `unresolved` lists
* frontmatter names that did not resolve to any page — surfaced in the
* put_page auto_links response and the extract summary so users know
* where the graph has holes.
*/
export interface PageLinksResult {
candidates: LinkCandidate[];
unresolved: UnresolvedFrontmatterRef[];
}
/**
@@ -153,16 +188,24 @@ export interface LinkCandidate {
* Sources:
* 1. Markdown entity refs in compiled_truth + timeline (extractEntityRefs).
* 2. Bare slug references in text (people/slug, companies/slug).
* 3. Frontmatter `source:` field (creates a 'source' link).
* 3. Frontmatter fields → typed graph edges (v0.13: company, investors,
* attendees, key_people, etc.). See FRONTMATTER_LINK_MAP.
*
* Within-page dedup: multiple mentions of the same (targetSlug, linkType)
* collapse to one candidate. The first occurrence's context wins.
* ASYNC (v0.13): frontmatter extraction resolves display names to slugs
* via the supplied resolver, which may hit the DB. Pre-v0.13 callers
* that don't care about frontmatter can pass a resolver that always
* returns null; only markdown/bare-slug candidates are emitted.
*
* Within-page dedup: multiple mentions of the same (fromSlug, targetSlug,
* linkType, linkSource) tuple collapse to one candidate. First occurrence wins.
*/
export function extractPageLinks(
export async function extractPageLinks(
slug: string,
content: string,
frontmatter: Record<string, unknown>,
pageType: PageType,
): LinkCandidate[] {
resolver: SlugResolver,
): Promise<PageLinksResult> {
const candidates: LinkCandidate[] = [];
// 1. Markdown entity refs.
@@ -177,6 +220,7 @@ export function extractPageLinks(
targetSlug: ref.slug,
linkType: inferLinkType(pageType, context, content, ref.slug),
context,
linkSource: 'markdown',
});
}
@@ -198,30 +242,26 @@ export function extractPageLinks(
targetSlug: m[1],
linkType: inferLinkType(pageType, context, content, m[1]),
context,
linkSource: 'markdown',
});
}
// 3. Frontmatter source field.
const source = frontmatter.source;
if (typeof source === 'string' && source.length > 0 && /^[a-z][a-z0-9-]*\/[a-z0-9][a-z0-9-]*$/.test(source)) {
candidates.push({
targetSlug: source,
linkType: 'source',
context: `frontmatter source: ${source}`,
});
}
// 3. Frontmatter-derived edges (v0.13). Includes the legacy `source:`
// field along with the full field map.
const fm = await extractFrontmatterLinks(slug, pageType, frontmatter, resolver);
candidates.push(...fm.candidates);
// Within-page dedup: same (targetSlug, linkType) collapses to one entry.
// First occurrence wins (preserves the most natural/earliest context).
// Within-page dedup: same (fromSlug, targetSlug, linkType, linkSource)
// collapses to one entry. First occurrence wins.
const seen = new Set<string>();
const result: LinkCandidate[] = [];
for (const c of candidates) {
const key = `${c.targetSlug}\u0000${c.linkType}`;
const key = `${c.fromSlug ?? ''}\u0000${c.targetSlug}\u0000${c.linkType}\u0000${c.linkSource ?? ''}`;
if (seen.has(key)) continue;
seen.add(key);
result.push(c);
}
return result;
return { candidates: result, unresolved: fm.unresolved };
}
/** Excerpt a window of `width` chars around `idx`, collapsed to one line. */
@@ -311,6 +351,272 @@ export function inferLinkType(pageType: PageType, context: string, globalContext
return 'mentions';
}
// ─── Frontmatter link extraction (v0.13) ────────────────────────
//
// YAML frontmatter on entity pages carries rich relationship data:
//
// company: "Stripe" # person page
// companies: [Stripe, Plaid] # person page (alias of company)
// key_people: [Patrick Collison, John] # company page (incoming works_at)
// investors: [{name: Sequoia}, Benchmark] # deal page (incoming invested_in)
// attendees: [Pedro, Garry] # meeting page (incoming attended)
//
// Each maps to a typed graph edge. The mapping lives here (one source of
// truth) so the three entry points — operations.ts auto-link, extract.ts
// fs source, extract.ts db source — emit identical edges for the same
// frontmatter. This is the point of the v0.13 rewrite.
//
// DIRECTION: "incoming" means the page being written is the TO side;
// the FROM side is the resolved frontmatter value. E.g. `key_people:
// [Pedro]` on companies/stripe emits `people/pedro -> companies/stripe
// type=works_at`, preserving subject-of-verb semantics for graph reads.
//
// MULTI-DIR HINTS: investors can be companies, funds, or people. The
// resolver tries each hint in order and takes the first match.
export interface FrontmatterFieldMapping {
/** Field name(s). Multiple entries are aliases (e.g. company + companies). */
fields: string[];
/**
* Only applies when page.type matches. Omitted = any page type. String
* (not PageType) because some page types like 'meeting' exist in the
* pages table without being in the TypeScript PageType enum.
*/
pageType?: string;
/** Edge link_type. */
type: string;
/** 'outgoing' = page→target. 'incoming' = target→page (subject of verb = from). */
direction: 'outgoing' | 'incoming';
/**
* Target directory hints for slug resolution. Single string or ordered
* array; resolver tries each. E.g. investors → ['companies', 'funds', 'people'].
*/
dirHint: string | string[];
}
/**
* Canonical field → (type, direction, dir-hint) map. Consulted by
* extractFrontmatterLinks for every YAML field on every written page.
*
* Deliberately a flat array rather than an object keyed by field name, so
* duplicate field names with different pageType filters coexist cleanly
* (an object literal would be last-write-wins on key collision).
*/
export const FRONTMATTER_LINK_MAP: FrontmatterFieldMapping[] = [
// Person pages → companies
{ fields: ['company', 'companies'], pageType: 'person', type: 'works_at', direction: 'outgoing', dirHint: 'companies' },
{ fields: ['founded'], pageType: 'person', type: 'founded', direction: 'outgoing', dirHint: 'companies' },
// Company pages (incoming relationships — subject of the verb lives elsewhere)
{ fields: ['key_people'], pageType: 'company', type: 'works_at', direction: 'incoming', dirHint: 'people' },
{ fields: ['partner'], pageType: 'company', type: 'yc_partner', direction: 'incoming', dirHint: 'people' },
{ fields: ['investors'], pageType: 'company', type: 'invested_in', direction: 'incoming',
dirHint: ['companies', 'funds', 'people'] },
// Deal pages (all incoming — deals are the object)
{ fields: ['investors'], pageType: 'deal', type: 'invested_in', direction: 'incoming',
dirHint: ['companies', 'funds', 'people'] },
{ fields: ['lead'], pageType: 'deal', type: 'led_round', direction: 'incoming',
dirHint: ['companies', 'funds', 'people'] },
// Meeting pages
{ fields: ['attendees'], pageType: 'meeting', type: 'attended', direction: 'incoming', dirHint: 'people' },
// Any page type
{ fields: ['sources'], type: 'discussed_in', direction: 'incoming', dirHint: ['source', 'media'] },
{ fields: ['source'], type: 'source', direction: 'outgoing', dirHint: '' /* already slug-shaped */ },
{ fields: ['related', 'see_also'], type: 'related_to', direction: 'outgoing', dirHint: '' },
];
// ─── Slug resolver ──────────────────────────────────────────────
export interface SlugResolver {
/**
* Resolve a display name to a canonical slug.
* Returns null when no match meets confidence threshold — callers should
* skip (not write a dead link) and the unresolved name goes into the
* extract/put_page summary so the user can see the gap.
*/
resolve(name: string, dirHint?: string | string[]): Promise<string | null>;
}
/**
* Create a resolver scoped to a single extract run or single put_page call.
*
* mode: 'batch' (migration / gbrain extract) — pg_trgm only, NO search
* fallback. On a 46K-page brain this avoids N-thousand OpenAI embedding
* calls + Anthropic Haiku expansion calls (see operations-query-hidden-haiku
* learning) and keeps the backfill deterministic + under a wall-clock budget.
*
* mode: 'live' (put_page auto-link) — can afford the (rare, bounded) search
* fallback for names that don't fuzzy-match. Still passes expand=false to
* dodge Haiku.
*
* cache: per-resolver instance. Same name → same slug lookup every call.
* Callers never need to dedupe names themselves.
*/
export function makeResolver(
engine: BrainEngine,
opts: { mode: 'batch' | 'live' } = { mode: 'live' },
): SlugResolver {
const cache = new Map<string, string | null>();
const norm = (s: string) => s.toLowerCase().replace(/[^a-z0-9\s-]/g, '').trim().replace(/\s+/g, '-');
return {
async resolve(name: string, dirHint?: string | string[]): Promise<string | null> {
if (!name || typeof name !== 'string') return null;
const trimmed = name.trim();
if (!trimmed) return null;
const cacheKey = `${trimmed}\u0000${Array.isArray(dirHint) ? dirHint.join(',') : (dirHint || '')}`;
if (cache.has(cacheKey)) return cache.get(cacheKey)!;
const hints = Array.isArray(dirHint) ? dirHint : (dirHint ? [dirHint] : []);
// Step 1: already a slug? (dir/name shape, lowercase, hyphenated)
if (/^[a-z][a-z0-9-]*\/[a-z0-9][a-z0-9-]*$/.test(trimmed)) {
const page = await engine.getPage(trimmed);
if (page) {
cache.set(cacheKey, trimmed);
return trimmed;
}
}
// Step 2: dir-hint + slugify → exact getPage
const slugified = norm(trimmed);
for (const hint of hints) {
if (!hint) continue;
const candidate = `${hint}/${slugified}`;
const page = await engine.getPage(candidate);
if (page) {
cache.set(cacheKey, candidate);
return candidate;
}
}
// Step 3: pg_trgm fuzzy title match — both modes. Tries each hint in
// order; first hint with a ≥0.55 similarity match wins. If no hints,
// try the whole pages table.
const searchHints = hints.length > 0 ? hints : [undefined];
for (const hint of searchHints) {
const match = await engine.findByTitleFuzzy(trimmed, hint, 0.55);
if (match) {
cache.set(cacheKey, match.slug);
return match.slug;
}
}
// Step 4 (live mode only): fall back to keyword search via
// engine.searchKeyword. No query expansion is requested here (see
// operations-query-hidden-haiku learning). Batch mode skips this step
// entirely to keep migration deterministic.
if (opts.mode === 'live') {
try {
const results = await engine.searchKeyword(trimmed, { limit: 3 });
if (results.length > 0 && results[0].score >= 0.8) {
// Filter by dir hint if provided.
const top = hints.length > 0
? results.find(r => hints.some(h => r.slug.startsWith(`${h}/`)))
: results[0];
if (top) {
cache.set(cacheKey, top.slug);
return top.slug;
}
}
} catch { /* search errors are non-fatal; fall through to null */ }
}
// Null = unresolvable. Caller records for the unresolved report.
cache.set(cacheKey, null);
return null;
},
};
}
// ─── Frontmatter extractor ──────────────────────────────────────
export interface UnresolvedFrontmatterRef {
/** The frontmatter field name. */
field: string;
/** The name that did not resolve. */
name: string;
}
export interface FrontmatterExtractResult {
candidates: LinkCandidate[];
unresolved: UnresolvedFrontmatterRef[];
}
/**
* Extract typed graph edges from YAML frontmatter. Async because the
* resolver may need to query the DB for fuzzy matches.
*
* Arrays of strings: each entry resolved independently.
* Arrays of objects: uses the `name`, `slug`, or `title` property, in that
* preference order (codex tension 6.3).
* Non-string / non-object entries: silently skipped.
*/
export async function extractFrontmatterLinks(
slug: string,
pageType: PageType,
frontmatter: Record<string, unknown>,
resolver: SlugResolver,
): Promise<FrontmatterExtractResult> {
const candidates: LinkCandidate[] = [];
const unresolved: UnresolvedFrontmatterRef[] = [];
for (const mapping of FRONTMATTER_LINK_MAP) {
if (mapping.pageType && mapping.pageType !== pageType) continue;
for (const field of mapping.fields) {
const value = frontmatter[field];
if (value == null) continue;
const entries = Array.isArray(value) ? value : [value];
for (const entry of entries) {
// Extract the name to resolve. Strings pass through; objects use
// the `name` / `slug` / `title` field in that preference order.
let name: string | null = null;
let contextExtra = '';
if (typeof entry === 'string') {
name = entry;
} else if (entry && typeof entry === 'object') {
const obj = entry as Record<string, unknown>;
const n = obj.name ?? obj.slug ?? obj.title;
if (typeof n === 'string') {
name = n;
// Carry interesting object fields (role, title) into the context.
const extras: string[] = [];
if (typeof obj.role === 'string') extras.push(obj.role);
if (typeof obj.title === 'string' && obj.title !== n) extras.push(obj.title);
if (extras.length > 0) contextExtra = ` (${extras.join(', ')})`;
}
}
if (!name) continue; // skip numbers, nulls, malformed objects
const resolved = await resolver.resolve(name, mapping.dirHint);
if (!resolved) {
unresolved.push({ field, name });
continue;
}
// Outgoing: page → resolved. Incoming: resolved → page.
const fromSlug = mapping.direction === 'outgoing' ? slug : resolved;
const toSlug = mapping.direction === 'outgoing' ? resolved : slug;
// Context enrichment (review Finding 7): readable in backlink panels
// and search snippets instead of bare `frontmatter.key_people`.
const context = `frontmatter.${field}: ${name}${contextExtra}`;
candidates.push({
fromSlug,
targetSlug: toSlug,
linkType: mapping.type,
context,
linkSource: 'frontmatter',
originSlug: slug, // the page whose frontmatter created this edge
originField: field,
});
}
}
}
return { candidates, unresolved };
}
// ─── Timeline parsing ───────────────────────────────────────────
export interface TimelineCandidate {
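
The direction rule used by `extractFrontmatterLinks` can be sketched in isolation. This is an illustration rather than the module's code: `slugify`, `orientEdge`, and `Edge` are hypothetical names, and `slugify` only mirrors the normalization shown in the resolver's `norm` helper.

```typescript
type Direction = 'outgoing' | 'incoming';

interface Edge {
  from: string;
  to: string;
  type: string;
}

// Same normalization as the resolver's slugify step: lowercase, drop
// punctuation, trim, collapse whitespace runs into single hyphens.
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9\s-]/g, '').trim().replace(/\s+/g, '-');
}

// Outgoing: page -> resolved. Incoming: resolved -> page, so the subject
// of the verb is always the FROM side.
function orientEdge(pageSlug: string, resolvedSlug: string, type: string, direction: Direction): Edge {
  return direction === 'outgoing'
    ? { from: pageSlug, to: resolvedSlug, type }
    : { from: resolvedSlug, to: pageSlug, type };
}

// `key_people: [Patrick Collison]` on the company page is an incoming works_at.
const person = `people/${slugify('Patrick Collison')}`;
const edge = orientEdge('companies/stripe', person, 'works_at', 'incoming');
console.log(`${edge.from} -> ${edge.to} (${edge.type})`);
```

Keeping orientation in one small function is what lets a person page's `company:` field and a company page's `key_people:` field describe the same edge.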


@@ -284,6 +284,70 @@ export const MIGRATIONS: Migration[] = [
DROP FUNCTION IF EXISTS update_page_search_vector_from_timeline();
`,
},
{
version: 11,
name: 'links_provenance_columns',
// v0.13: adds provenance columns so frontmatter-derived edges can be
// distinguished from markdown/manual edges. Reconciliation on put_page
// scopes by (link_source='frontmatter' AND origin_page_id = written_page)
// so edges from other pages never get mis-deleted.
//
// Unique constraint swaps: old (from, to, type) blocks coexistence of
// markdown + frontmatter + manual edges with the same tuple. New tuple
// includes link_source + origin_page_id.
//
// Existing rows are backfilled to link_source = 'markdown' (see the UPDATE
// below). Leaving them NULL would let legacy rows coexist with new
// 'markdown' writes for the same edge and accumulate duplicates.
//
// Idempotent via IF NOT EXISTS / DROP IF EXISTS.
sql: `
-- Postgres version gate: UNIQUE NULLS NOT DISTINCT requires PG15+.
-- PGLite ships PG17.5, current Supabase is PG15+. Old Supabase projects
-- on PG14 hit an explicit error rather than half-applying (drop old
-- constraint but fail to add new one → brain loses uniqueness guarantee).
DO $$ BEGIN
IF current_setting('server_version_num')::int < 150000 THEN
RAISE EXCEPTION
'v0.13 migration requires Postgres 15+. Current: %. '
'Upgrade your Postgres (Supabase: migrate project to a newer PG major). '
'This migration intentionally stops before touching the schema to preserve data integrity.',
current_setting('server_version');
END IF;
END $$;
ALTER TABLE links ADD COLUMN IF NOT EXISTS link_source TEXT;
DO $$ BEGIN
IF NOT EXISTS (
SELECT 1 FROM pg_constraint WHERE conname = 'links_link_source_check'
) THEN
ALTER TABLE links ADD CONSTRAINT links_link_source_check
CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual'));
END IF;
END $$;
ALTER TABLE links ADD COLUMN IF NOT EXISTS origin_page_id INTEGER
REFERENCES pages(id) ON DELETE SET NULL;
ALTER TABLE links ADD COLUMN IF NOT EXISTS origin_field TEXT;
-- Backfill NULL link_source → 'markdown' for existing rows. Codex review
-- caught that without this, pre-v0.13 legacy rows coexist with new
-- 'markdown' writes under NULLS NOT DISTINCT (NULL ≠ 'markdown'),
-- causing duplicate edges to accumulate. Treating legacy as markdown
-- is the accurate best-guess: pre-v0.13 auto-link only emitted markdown
-- edges. User-created 'manual' edges are a v0.13+ concept anyway.
UPDATE links SET link_source = 'markdown' WHERE link_source IS NULL;
ALTER TABLE links DROP CONSTRAINT IF EXISTS links_from_to_type_unique;
DO $$ BEGIN
IF NOT EXISTS (
SELECT 1 FROM pg_constraint WHERE conname = 'links_from_to_type_source_origin_unique'
) THEN
ALTER TABLE links ADD CONSTRAINT links_from_to_type_source_origin_unique
UNIQUE NULLS NOT DISTINCT (from_page_id, to_page_id, link_type, link_source, origin_page_id);
END IF;
END $$;
CREATE INDEX IF NOT EXISTS idx_links_source ON links(link_source);
CREATE INDEX IF NOT EXISTS idx_links_origin ON links(origin_page_id);
`,
},
];
export const LATEST_VERSION = MIGRATIONS.length > 0
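
Why the constraint needs `NULLS NOT DISTINCT`, and why legacy rows are backfilled, can be modeled with a small key comparison. This is a sketch of the uniqueness semantics as documented for Postgres 15+, not code from the migration; `keysConflict` and the sample keys are hypothetical.

```typescript
// One unique-key tuple: (from_page_id, to_page_id, link_type, link_source, origin_page_id).
type KeyPart = string | number | null;
type Key = KeyPart[];

// Returns true when inserting `b` would violate a unique constraint that
// already holds `a`.
function keysConflict(a: Key, b: Key, nullsNotDistinct: boolean): boolean {
  for (let i = 0; i < a.length; i++) {
    const x = a[i];
    const y = b[i];
    if (x === null || y === null) {
      // Default behavior: any NULL makes the two rows distinct.
      if (!nullsNotDistinct) return false;
      // NULLS NOT DISTINCT: NULL only matches NULL.
      if (x !== y) return false;
      continue;
    }
    if (x !== y) return false;
  }
  return true;
}

const markdownEdge: Key = [1, 2, 'mentions', 'markdown', null];
const sameEdgeAgain: Key = [1, 2, 'mentions', 'markdown', null];
const legacyEdge: Key = [1, 2, 'mentions', null, null];

// With origin_page_id NULL, only NULLS NOT DISTINCT catches the duplicate.
console.log(keysConflict(markdownEdge, sameEdgeAgain, false));
console.log(keysConflict(markdownEdge, sameEdgeAgain, true));
// A legacy NULL link_source never collides with 'markdown', hence the backfill.
console.log(keysConflict(legacyEdge, markdownEdge, true));
```

Markdown edges always carry a NULL `origin_page_id`, so without `NULLS NOT DISTINCT` the constraint would not deduplicate them at all.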


@@ -13,7 +13,7 @@ import { importFromContent } from './import-file.ts';
import { hybridSearch } from './search/hybrid.ts';
import { expandQuery } from './search/expansion.ts';
import { dedupResults } from './search/dedup.ts';
import { extractPageLinks, isAutoLinkEnabled } from './link-extraction.ts';
import { extractPageLinks, isAutoLinkEnabled, makeResolver, type UnresolvedFrontmatterRef } from './link-extraction.ts';
import * as db from './db.ts';
// --- Types ---
@@ -248,7 +248,11 @@ const put_page: Operation = {
// Combined with the backlink boost in hybridSearch, attacker-placed targets
// would surface higher in search. Local CLI users (ctx.remote=false) opt
// into this behavior; MCP/remote writes do not.
let autoLinks: { created: number; removed: number; errors: number } | { error: string } | { skipped: 'remote' } | undefined;
let autoLinks:
| { created: number; removed: number; errors: number; unresolved: UnresolvedFrontmatterRef[] }
| { error: string }
| { skipped: 'remote' }
| undefined;
if (ctx.remote === true) {
autoLinks = { skipped: 'remote' };
} else if (result.parsedPage) {
@@ -286,43 +290,114 @@ async function runAutoLink(
engine: BrainEngine,
slug: string,
parsed: { type: PageType; compiled_truth: string; timeline: string; frontmatter: Record<string, unknown> },
): Promise<{ created: number; removed: number; errors: number }> {
): Promise<{ created: number; removed: number; errors: number; unresolved: UnresolvedFrontmatterRef[] }> {
const fullContent = parsed.compiled_truth + '\n' + parsed.timeline;
const candidates = extractPageLinks(fullContent, parsed.frontmatter, parsed.type);
// Live-mode resolver: per-put throwaway cache, pg_trgm + optional search.
const resolver = makeResolver(engine, { mode: 'live' });
const { candidates, unresolved } = await extractPageLinks(
slug, fullContent, parsed.frontmatter, parsed.type, resolver,
);
// Resolve which targets exist (skip refs to non-existent pages to avoid FK
// violation churn in addLink). One getAllSlugs call upfront, O(1) lookup.
const allSlugs = await engine.getAllSlugs();
const valid = candidates.filter(c => allSlugs.has(c.targetSlug));
const valid = candidates.filter(c =>
allSlugs.has(c.targetSlug) && (!c.fromSlug || allSlugs.has(c.fromSlug))
);
// Split candidates by direction. Outgoing (fromSlug === slug or unset) are
// this page's own edges, reconciled against getLinks(slug). Incoming
// (fromSlug !== slug — frontmatter with `direction: incoming`) are edges
// where this page is the TO side; reconciled against getBacklinks(slug)
// but SCOPED to the frontmatter edges this page authored via
// (link_source='frontmatter' AND origin_slug = slug). We never touch
// frontmatter edges authored by OTHER pages.
const out = valid.filter(c => !c.fromSlug || c.fromSlug === slug);
const inc = valid.filter(c => c.fromSlug && c.fromSlug !== slug);
// Run getLinks + addLink/removeLink loops inside a single transaction so that
// concurrent put_page calls on the same slug can't race the reconciliation:
// without this, two simultaneous writes both read stale `existingKeys` and
// re-create links the other side just removed (lost-update). The transaction
// serializes via row-level locks on `links` rows touched by addLink/removeLink.
return await engine.transaction(async (tx) => {
const existing = await tx.getLinks(slug);
const desiredKeys = new Set(valid.map(c => `${c.targetSlug}\u0000${c.linkType}`));
const existingKeys = new Set(existing.map(l => `${l.to_slug}\u0000${l.link_type}`));
const result = await engine.transaction(async (tx) => {
const existingOut = await tx.getLinks(slug);
// Incoming: we only look at frontmatter edges WE authored (origin_slug=slug).
// Non-frontmatter and other-page frontmatter edges survive untouched.
const existingInRaw = await tx.getBacklinks(slug);
const existingIn = existingInRaw.filter(
l => l.link_source === 'frontmatter' && l.origin_slug === slug,
);
// Reconcilable outgoing edges: markdown + our own frontmatter edges.
// Manual edges (link_source='manual') are NEVER touched by reconciliation.
const reconcilableOut = existingOut.filter(
l => l.link_source === 'markdown' || l.link_source == null ||
(l.link_source === 'frontmatter' && l.origin_slug === slug),
);
const outKeys = new Set(out.map(c =>
`${c.targetSlug}\u0000${c.linkType}\u0000${c.linkSource ?? 'markdown'}`
));
const incKeys = new Set(inc.map(c =>
`${c.fromSlug}\u0000${c.linkType}`
));
let created = 0, removed = 0, errors = 0;
// Add new + update existing.
for (const c of valid) {
// Add outgoing edges.
for (const c of out) {
try {
await tx.addLink(slug, c.targetSlug, c.context, c.linkType);
if (!existingKeys.has(`${c.targetSlug}\u0000${c.linkType}`)) created++;
await tx.addLink(
slug, c.targetSlug, c.context, c.linkType,
c.linkSource, c.originSlug, c.originField,
);
const existKey = `${c.targetSlug}\u0000${c.linkType}\u0000${c.linkSource ?? 'markdown'}`;
const exists = reconcilableOut.some(l =>
`${l.to_slug}\u0000${l.link_type}\u0000${l.link_source ?? 'markdown'}` === existKey
);
if (!exists) created++;
} catch {
errors++;
}
}
// Remove stale (in DB but not in desired set).
for (const l of existing) {
const key = `${l.to_slug}\u0000${l.link_type}`;
if (!desiredKeys.has(key)) {
// Add incoming edges (other page → slug).
for (const c of inc) {
try {
await tx.addLink(
c.fromSlug!, c.targetSlug, c.context, c.linkType,
'frontmatter', c.originSlug, c.originField,
);
const existKey = `${c.fromSlug}\u0000${c.linkType}`;
const exists = existingIn.some(l =>
`${l.from_slug}\u0000${l.link_type}` === existKey
);
if (!exists) created++;
} catch {
errors++;
}
}
// Remove stale outgoing (markdown or our-frontmatter, not in desired set).
for (const l of reconcilableOut) {
const key = `${l.to_slug}\u0000${l.link_type}\u0000${l.link_source ?? 'markdown'}`;
if (!outKeys.has(key)) {
try {
await tx.removeLink(slug, l.to_slug, l.link_type);
await tx.removeLink(slug, l.to_slug, l.link_type, l.link_source ?? undefined);
removed++;
} catch {
errors++;
}
}
}
// Remove stale incoming (our frontmatter → slug, not in desired set).
for (const l of existingIn) {
const key = `${l.from_slug}\u0000${l.link_type}`;
if (!incKeys.has(key)) {
try {
await tx.removeLink(l.from_slug, slug, l.link_type, 'frontmatter');
removed++;
} catch {
errors++;
@@ -332,6 +407,8 @@ async function runAutoLink(
return { created, removed, errors };
});
return { ...result, unresolved };
}
const delete_page: Operation = {
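
The outgoing half of the reconciliation in `runAutoLink` reduces to a set difference scoped by provenance. The sketch below is a pure-function approximation under assumed, simplified shapes (`StoredLink`, `DesiredLink`, and `planOutgoing` are hypothetical names); the real code also upserts every desired edge and runs inside a transaction.

```typescript
// Hypothetical shapes, reduced to the fields the diff needs.
interface StoredLink {
  to_slug: string;
  link_type: string;
  link_source: string | null;
  origin_slug: string | null;
}
interface DesiredLink {
  targetSlug: string;
  linkType: string;
  linkSource?: string;
}

const SEP = '\u0000';
const storedKey = (l: StoredLink): string =>
  [l.to_slug, l.link_type, l.link_source ?? 'markdown'].join(SEP);
const desiredKey = (c: DesiredLink): string =>
  [c.targetSlug, c.linkType, c.linkSource ?? 'markdown'].join(SEP);

function planOutgoing(pageSlug: string, existing: StoredLink[], desired: DesiredLink[]) {
  // Only markdown/legacy edges and frontmatter edges authored by this page
  // are reconcilable; manual edges are never candidates for removal.
  const reconcilable = existing.filter(
    (l) =>
      l.link_source === 'markdown' ||
      l.link_source === null ||
      (l.link_source === 'frontmatter' && l.origin_slug === pageSlug),
  );
  const have = new Set(reconcilable.map(storedKey));
  const want = new Set(desired.map(desiredKey));
  return {
    created: desired.filter((c) => !have.has(desiredKey(c))),
    stale: reconcilable.filter((l) => !want.has(storedKey(l))),
  };
}

const plan = planOutgoing(
  'people/pedro',
  [
    { to_slug: 'companies/brex', link_type: 'works_at', link_source: 'manual', origin_slug: null },
    { to_slug: 'companies/old-co', link_type: 'mentions', link_source: 'markdown', origin_slug: null },
  ],
  [{ targetSlug: 'companies/brex', linkType: 'works_at', linkSource: 'frontmatter' }],
);
console.log(plan.created.length, plan.stale.length);
```

In this example the manual edge to `companies/brex` is left alone, the frontmatter edge is counted as new, and the stale markdown mention is the only removal.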


@@ -321,43 +321,67 @@ export class PGLiteEngine implements BrainEngine {
}
// Links
async addLink(from: string, to: string, context?: string, linkType?: string): Promise<void> {
async addLink(
from: string,
to: string,
context?: string,
linkType?: string,
linkSource?: string,
originSlug?: string,
originField?: string,
): Promise<void> {
const src = linkSource ?? 'markdown';
await this.db.query(
`INSERT INTO links (from_page_id, to_page_id, link_type, context)
SELECT f.id, t.id, $3, $4
`INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, origin_page_id, origin_field)
SELECT f.id, t.id, $3, $4, $5,
(SELECT id FROM pages WHERE slug = $6),
$7
FROM pages f, pages t
WHERE f.slug = $1 AND t.slug = $2
ON CONFLICT (from_page_id, to_page_id, link_type) DO UPDATE SET
context = EXCLUDED.context`,
[from, to, linkType || '', context || '']
ON CONFLICT (from_page_id, to_page_id, link_type, link_source, origin_page_id) DO UPDATE SET
context = EXCLUDED.context,
origin_field = EXCLUDED.origin_field`,
[from, to, linkType || '', context || '', src, originSlug ?? null, originField ?? null]
);
}
async addLinksBatch(links: LinkBatchInput[]): Promise<number> {
if (links.length === 0) return 0;
// unnest() pattern: 4 array-typed bound parameters regardless of batch size.
// Same shape as PostgresEngine. Avoids the 65535-parameter cap entirely.
// unnest() pattern: 7 array-typed bound parameters regardless of batch size.
// Same shape as PostgresEngine (v0.13). Avoids the 65535-parameter cap.
const fromSlugs = links.map(l => l.from_slug);
const toSlugs = links.map(l => l.to_slug);
// Normalize optional fields to '' to match per-row addLink + NOT NULL DDL.
const linkTypes = links.map(l => l.link_type || '');
const contexts = links.map(l => l.context || '');
const linkSources = links.map(l => l.link_source || 'markdown');
const originSlugs = links.map(l => l.origin_slug || null);
const originFields = links.map(l => l.origin_field || null);
const result = await this.db.query(
`INSERT INTO links (from_page_id, to_page_id, link_type, context)
SELECT f.id, t.id, v.link_type, v.context
FROM unnest($1::text[], $2::text[], $3::text[], $4::text[])
AS v(from_slug, to_slug, link_type, context)
`INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, origin_page_id, origin_field)
SELECT f.id, t.id, v.link_type, v.context, v.link_source, o.id, v.origin_field
FROM unnest($1::text[], $2::text[], $3::text[], $4::text[], $5::text[], $6::text[], $7::text[])
AS v(from_slug, to_slug, link_type, context, link_source, origin_slug, origin_field)
JOIN pages f ON f.slug = v.from_slug
JOIN pages t ON t.slug = v.to_slug
ON CONFLICT (from_page_id, to_page_id, link_type) DO NOTHING
LEFT JOIN pages o ON o.slug = v.origin_slug
ON CONFLICT (from_page_id, to_page_id, link_type, link_source, origin_page_id) DO NOTHING
RETURNING 1`,
[fromSlugs, toSlugs, linkTypes, contexts]
[fromSlugs, toSlugs, linkTypes, contexts, linkSources, originSlugs, originFields]
);
return result.rows.length;
}
async removeLink(from: string, to: string, linkType?: string): Promise<void> {
if (linkType !== undefined) {
async removeLink(from: string, to: string, linkType?: string, linkSource?: string): Promise<void> {
if (linkType !== undefined && linkSource !== undefined) {
await this.db.query(
`DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = $1)
AND to_page_id = (SELECT id FROM pages WHERE slug = $2)
AND link_type = $3
AND link_source IS NOT DISTINCT FROM $4`,
[from, to, linkType, linkSource]
);
} else if (linkType !== undefined) {
await this.db.query(
`DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = $1)
@@ -365,6 +389,14 @@ export class PGLiteEngine implements BrainEngine {
AND link_type = $3`,
[from, to, linkType]
);
} else if (linkSource !== undefined) {
await this.db.query(
`DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = $1)
AND to_page_id = (SELECT id FROM pages WHERE slug = $2)
AND link_source IS NOT DISTINCT FROM $3`,
[from, to, linkSource]
);
} else {
await this.db.query(
`DELETE FROM links
@@ -377,10 +409,13 @@ export class PGLiteEngine implements BrainEngine {
async getLinks(slug: string): Promise<Link[]> {
const { rows } = await this.db.query(
`SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
`SELECT f.slug as from_slug, t.slug as to_slug,
l.link_type, l.context, l.link_source,
o.slug as origin_slug, l.origin_field
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id
LEFT JOIN pages o ON o.id = l.origin_page_id
WHERE f.slug = $1`,
[slug]
);
@@ -389,16 +424,44 @@ export class PGLiteEngine implements BrainEngine {
async getBacklinks(slug: string): Promise<Link[]> {
const { rows } = await this.db.query(
`SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
`SELECT f.slug as from_slug, t.slug as to_slug,
l.link_type, l.context, l.link_source,
o.slug as origin_slug, l.origin_field
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id
LEFT JOIN pages o ON o.id = l.origin_page_id
WHERE t.slug = $1`,
[slug]
);
return rows as unknown as Link[];
}
async findByTitleFuzzy(
name: string,
dirPrefix?: string,
minSimilarity: number = 0.55,
): Promise<{ slug: string; similarity: number } | null> {
// Inline threshold comparison instead of `SET LOCAL pg_trgm.similarity_threshold`.
// The GUC only scopes to the current transaction and pglite auto-commits each
// .query() call, so the SET LOCAL would be a no-op. Using similarity() >= $N
// directly gives predictable behavior. Tie-breaker: sort by slug so re-runs
// pick the same winner.
const prefixPattern = dirPrefix ? `${dirPrefix}/%` : '%';
const { rows } = await this.db.query(
`SELECT slug, similarity(title, $1) AS sim
FROM pages
WHERE similarity(title, $1) >= $3
AND slug LIKE $2
ORDER BY sim DESC, slug ASC
LIMIT 1`,
[name, prefixPattern, minSimilarity]
);
if (rows.length === 0) return null;
const row = rows[0] as { slug: string; sim: number };
return { slug: row.slug, similarity: row.sim };
}
async traverseGraph(slug: string, depth: number = 5): Promise<GraphNode[]> {
// Cycle prevention: visited array tracks page IDs already in the path.
// Prevents exponential blowup on cyclic subgraphs (e.g., A->B->A).
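
The 7-array `unnest()` shape used by `addLinksBatch` is a row-to-column transposition, which is why the bound-parameter count stays fixed as the batch grows. A minimal sketch, assuming a simplified row type (`LinkRow` and `toUnnestParams` are hypothetical names, not the engine's `LinkBatchInput` API):

```typescript
interface LinkRow {
  from_slug: string;
  to_slug: string;
  link_type?: string;
  context?: string;
  link_source?: string;
  origin_slug?: string;
  origin_field?: string;
}

type UnnestParams = [
  string[], string[], string[], string[], string[],
  (string | null)[], (string | null)[],
];

// Transpose N rows into 7 column arrays, applying the same defaults the
// per-row addLink path uses ('' for type/context, 'markdown' for source).
function toUnnestParams(links: LinkRow[]): UnnestParams {
  return [
    links.map((l) => l.from_slug),
    links.map((l) => l.to_slug),
    links.map((l) => l.link_type || ''),
    links.map((l) => l.context || ''),
    links.map((l) => l.link_source || 'markdown'),
    links.map((l) => l.origin_slug || null),
    links.map((l) => l.origin_field || null),
  ];
}

const params = toUnnestParams([
  { from_slug: 'people/pedro', to_slug: 'companies/brex' },
  {
    from_slug: 'people/patrick-collison',
    to_slug: 'companies/stripe',
    link_type: 'works_at',
    link_source: 'frontmatter',
    origin_slug: 'companies/stripe',
    origin_field: 'key_people',
  },
]);
console.log(params.length);
```

Whether the batch holds two rows or fifty thousand, the statement binds the same seven parameters, each an array with one element per row.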


@@ -62,18 +62,25 @@ CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (em
-- ============================================================
-- links: cross-references between pages
-- ============================================================
-- See src/schema.sql for full design notes on link_source + origin_page_id.
CREATE TABLE IF NOT EXISTS links (
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT links_from_to_type_unique UNIQUE(from_page_id, to_page_id, link_type)
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
link_source TEXT CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual')),
origin_page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
origin_field TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT links_from_to_type_source_origin_unique
UNIQUE NULLS NOT DISTINCT (from_page_id, to_page_id, link_type, link_source, origin_page_id)
);
CREATE INDEX IF NOT EXISTS idx_links_from ON links(from_page_id);
CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);
CREATE INDEX IF NOT EXISTS idx_links_source ON links(link_source);
CREATE INDEX IF NOT EXISTS idx_links_origin ON links(origin_page_id);
-- ============================================================
-- tags

@@ -355,7 +355,15 @@ export class PostgresEngine implements BrainEngine {
}
// Links
async addLink(from: string, to: string, context?: string, linkType?: string): Promise<void> {
async addLink(
from: string,
to: string,
context?: string,
linkType?: string,
linkSource?: string,
originSlug?: string,
originField?: string,
): Promise<void> {
const sql = this.sql;
// Pre-check existence so we can throw a clear error (ON CONFLICT DO UPDATE
// returns 0 rows when source SELECT is empty, indistinguishable from missing page).
@@ -367,48 +375,82 @@ export class PostgresEngine implements BrainEngine {
if (exists.length === 0) {
throw new Error(`addLink failed: page "${from}" or "${to}" not found`);
}
// Default link_source to 'markdown' for back-compat with pre-v0.13 callers.
// origin_page_id resolves from originSlug via a scalar subquery on pages
// (NULL if no slug is given or no page matches).
const src = linkSource ?? 'markdown';
await sql`
INSERT INTO links (from_page_id, to_page_id, link_type, context)
SELECT f.id, t.id, ${linkType || ''}, ${context || ''}
INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, origin_page_id, origin_field)
SELECT f.id, t.id, ${linkType || ''}, ${context || ''}, ${src},
(SELECT id FROM pages WHERE slug = ${originSlug ?? null}),
${originField ?? null}
FROM pages f, pages t
WHERE f.slug = ${from} AND t.slug = ${to}
ON CONFLICT (from_page_id, to_page_id, link_type) DO UPDATE SET
context = EXCLUDED.context
ON CONFLICT (from_page_id, to_page_id, link_type, link_source, origin_page_id) DO UPDATE SET
context = EXCLUDED.context,
origin_field = EXCLUDED.origin_field
`;
}
async addLinksBatch(links: LinkBatchInput[]): Promise<number> {
if (links.length === 0) return 0;
const sql = this.sql;
// unnest() pattern: 4 array-typed bound parameters regardless of batch size.
// unnest() pattern: 7 array-typed bound parameters regardless of batch size.
// Avoids the 65535-parameter cap and the postgres-js sql(rows, ...) helper's
// identifier-escape gotcha when used inside a (VALUES) subquery.
//
// v0.13: added link_source, origin_slug, origin_field. Defaults:
// link_source → 'markdown' (back-compat with pre-v0.13 callers)
// origin_slug → NULL (resolves to origin_page_id IS NULL via LEFT JOIN)
// origin_field → NULL
const fromSlugs = links.map(l => l.from_slug);
const toSlugs = links.map(l => l.to_slug);
// Normalize optional fields to '' to match per-row addLink + NOT NULL DDL.
const linkTypes = links.map(l => l.link_type || '');
const contexts = links.map(l => l.context || '');
const linkSources = links.map(l => l.link_source || 'markdown');
const originSlugs = links.map(l => l.origin_slug || null);
const originFields = links.map(l => l.origin_field || null);
const result = await sql`
INSERT INTO links (from_page_id, to_page_id, link_type, context)
SELECT f.id, t.id, v.link_type, v.context
FROM unnest(${fromSlugs}::text[], ${toSlugs}::text[], ${linkTypes}::text[], ${contexts}::text[])
AS v(from_slug, to_slug, link_type, context)
INSERT INTO links (from_page_id, to_page_id, link_type, context, link_source, origin_page_id, origin_field)
SELECT f.id, t.id, v.link_type, v.context, v.link_source, o.id, v.origin_field
FROM unnest(
${fromSlugs}::text[], ${toSlugs}::text[], ${linkTypes}::text[],
${contexts}::text[], ${linkSources}::text[], ${originSlugs}::text[],
${originFields}::text[]
) AS v(from_slug, to_slug, link_type, context, link_source, origin_slug, origin_field)
JOIN pages f ON f.slug = v.from_slug
JOIN pages t ON t.slug = v.to_slug
ON CONFLICT (from_page_id, to_page_id, link_type) DO NOTHING
LEFT JOIN pages o ON o.slug = v.origin_slug
ON CONFLICT (from_page_id, to_page_id, link_type, link_source, origin_page_id) DO NOTHING
RETURNING 1
`;
return result.length;
}
async removeLink(from: string, to: string, linkType?: string): Promise<void> {
async removeLink(from: string, to: string, linkType?: string, linkSource?: string): Promise<void> {
const sql = this.sql;
if (linkType !== undefined) {
// Branch on which optional filters were supplied. linkType + linkSource are
// independent optional constraints; all four combinations are valid.
if (linkType !== undefined && linkSource !== undefined) {
await sql`
DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = ${from})
AND to_page_id = (SELECT id FROM pages WHERE slug = ${to})
AND link_type = ${linkType}
AND link_source IS NOT DISTINCT FROM ${linkSource}
`;
} else if (linkType !== undefined) {
await sql`
DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = ${from})
AND to_page_id = (SELECT id FROM pages WHERE slug = ${to})
AND link_type = ${linkType}
`;
} else if (linkSource !== undefined) {
await sql`
DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = ${from})
AND to_page_id = (SELECT id FROM pages WHERE slug = ${to})
AND link_source IS NOT DISTINCT FROM ${linkSource}
`;
} else {
await sql`
@@ -422,10 +464,13 @@ export class PostgresEngine implements BrainEngine {
async getLinks(slug: string): Promise<Link[]> {
const sql = this.sql;
const rows = await sql`
SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
SELECT f.slug as from_slug, t.slug as to_slug,
l.link_type, l.context, l.link_source,
o.slug as origin_slug, l.origin_field
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id
LEFT JOIN pages o ON o.id = l.origin_page_id
WHERE f.slug = ${slug}
`;
return rows as unknown as Link[];
@@ -434,15 +479,48 @@ export class PostgresEngine implements BrainEngine {
async getBacklinks(slug: string): Promise<Link[]> {
const sql = this.sql;
const rows = await sql`
SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
SELECT f.slug as from_slug, t.slug as to_slug,
l.link_type, l.context, l.link_source,
o.slug as origin_slug, l.origin_field
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id
LEFT JOIN pages o ON o.id = l.origin_page_id
WHERE t.slug = ${slug}
`;
return rows as unknown as Link[];
}
async findByTitleFuzzy(
name: string,
dirPrefix?: string,
minSimilarity: number = 0.55,
): Promise<{ slug: string; similarity: number } | null> {
const sql = this.sql;
// Use the `similarity()` function directly with an explicit threshold
// comparison. DO NOT use `SET LOCAL pg_trgm.similarity_threshold` +
// the `%` operator here — postgres.js auto-commits each sql`` call
// so `SET LOCAL` is a no-op across statement boundaries. Inline
// comparison is the only way to get predictable threshold behavior
// without wrapping the caller in a transaction.
//
// Tie-breaker: sort by slug after similarity so re-runs return the
// same winner when multiple pages score equally (prevents churn
// in put_page auto-link reconciliation).
const prefixPattern = dirPrefix ? `${dirPrefix}/%` : '%';
const rows = await sql`
SELECT slug, similarity(title, ${name}) AS sim
FROM pages
WHERE similarity(title, ${name}) >= ${minSimilarity}
AND slug LIKE ${prefixPattern}
ORDER BY sim DESC, slug ASC
LIMIT 1
`;
if (rows.length === 0) return null;
const row = rows[0] as { slug: string; sim: number };
return { slug: row.slug, similarity: row.sim };
}
async traverseGraph(slug: string, depth: number = 5): Promise<GraphNode[]> {
const sql = this.sql;
// Cycle prevention: visited array tracks page IDs already in the path.
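The `unnest()` pattern in `addLinksBatch` depends on turning a list of row objects into one array per column, with the v0.13 defaults applied. A minimal sketch of that transposition follows. The `LinkBatchInput` shape here is an assumption inferred from the fields the hunk reads (the real type is not shown in this diff), and `toColumnArrays` is a hypothetical helper name.

```typescript
// Assumed shape, inferred from the fields addLinksBatch reads in the diff.
interface LinkBatchInput {
  from_slug: string;
  to_slug: string;
  link_type?: string;
  context?: string;
  link_source?: string;
  origin_slug?: string;
  origin_field?: string;
}

interface LinkColumns {
  fromSlugs: string[];
  toSlugs: string[];
  linkTypes: string[];
  contexts: string[];
  linkSources: string[];
  originSlugs: (string | null)[];
  originFields: (string | null)[];
}

// Transpose rows into seven parallel arrays: one bound parameter per column,
// so the parameter count stays at 7 no matter how many rows are in the batch.
function toColumnArrays(links: LinkBatchInput[]): LinkColumns {
  return {
    fromSlugs: links.map(l => l.from_slug),
    toSlugs: links.map(l => l.to_slug),
    linkTypes: links.map(l => l.link_type || ''),           // NOT NULL DEFAULT ''
    contexts: links.map(l => l.context || ''),              // NOT NULL DEFAULT ''
    linkSources: links.map(l => l.link_source || 'markdown'), // pre-v0.13 callers
    originSlugs: links.map(l => l.origin_slug || null),     // NULL when absent
    originFields: links.map(l => l.origin_field || null),
  };
}
```

Because every array has the same length and index `i` describes row `i`, the database can zip them back into rows with a single `unnest(...)` call.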

@@ -52,18 +52,38 @@ CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (em
-- ============================================================
-- links: cross-references between pages
-- ============================================================
-- Provenance model (v0.13):
-- link_source — 'markdown' | 'frontmatter' | 'manual' | NULL
-- (NULL = legacy row written before v0.13; unknown source)
-- origin_page_id — for link_source='frontmatter', the page whose YAML
-- frontmatter created this edge; scopes reconciliation
-- origin_field — the frontmatter field name (e.g. 'key_people')
--
-- The unique constraint includes link_source + origin_page_id so a manual edge
-- and a frontmatter-derived edge with the same (from, to, type) tuple coexist.
-- Reconciliation on put_page filters by (link_source='frontmatter' AND
-- origin_page_id = written_page) — never touches other pages' edges.
CREATE TABLE IF NOT EXISTS links (
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT links_from_to_type_unique UNIQUE(from_page_id, to_page_id, link_type)
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
link_source TEXT CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual')),
origin_page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
origin_field TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- NULLS NOT DISTINCT (PG15+) so two rows with link_source IS NULL or
-- origin_page_id IS NULL collide as expected. Without this, every row with
-- NULL origin_page_id (markdown/manual edges) would be treated as unique.
CONSTRAINT links_from_to_type_source_origin_unique
UNIQUE NULLS NOT DISTINCT (from_page_id, to_page_id, link_type, link_source, origin_page_id)
);
CREATE INDEX IF NOT EXISTS idx_links_from ON links(from_page_id);
CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);
CREATE INDEX IF NOT EXISTS idx_links_source ON links(link_source);
CREATE INDEX IF NOT EXISTS idx_links_origin ON links(origin_page_id);
-- ============================================================
-- tags
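The effect of the five-column `UNIQUE NULLS NOT DISTINCT` constraint can be modelled in memory to see which rows coexist and which collide. The sketch below is only an illustration of the constraint's semantics as described in the comments above; `LinkRow`, `uniqueKey`, and `insertIfAbsent` are hypothetical names, and the `Map` stands in for the table.

```typescript
// Hypothetical in-memory model of the links uniqueness rule.
interface LinkRow {
  from_page_id: number;
  to_page_id: number;
  link_type: string;
  link_source: string | null;
  origin_page_id: number | null;
}

// NULLS NOT DISTINCT: null participates in the key like any other value,
// so JSON null and the string "null" must stay distinguishable.
function uniqueKey(row: LinkRow): string {
  return JSON.stringify([
    row.from_page_id, row.to_page_id, row.link_type,
    row.link_source, row.origin_page_id,
  ]);
}

// Returns true when the row was inserted, false when it hit the constraint.
function insertIfAbsent(table: Map<string, LinkRow>, row: LinkRow): boolean {
  const key = uniqueKey(row);
  if (table.has(key)) return false;
  table.set(key, row);
  return true;
}
```

Under this model a manual edge and a frontmatter edge with the same (from, to, type) tuple both insert, while two markdown edges with a NULL origin collide, which matches the behaviour the schema comments describe.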

@@ -82,6 +82,26 @@ export interface Link {
to_slug: string;
link_type: string;
context: string;
/**
* Provenance (v0.13+). NULL = legacy row (pre-v0.13, unknown source).
* 'markdown' = extracted from `[Name](path)` refs. 'frontmatter' = extracted
* from YAML frontmatter fields (company, investors, attendees, etc.).
* 'manual' = user-created via addLink with explicit source.
* Reconciliation in runAutoLink filters on link_source to avoid touching
* markdown / manual edges when rewriting a page's frontmatter.
*/
link_source?: string | null;
/**
* For link_source='frontmatter': the slug of the page whose frontmatter
* created this edge. Lets reconciliation scope "my edges" precisely when
* multiple pages reference the same (from, to, type) tuple.
*/
origin_slug?: string | null;
/**
* The frontmatter field name that created this edge (e.g. 'key_people',
* 'investors'). Used for debug output and the `unresolved` response list.
*/
origin_field?: string | null;
}
export interface GraphNode {
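The doc comments on `link_source` and `origin_slug` describe how reconciliation is scoped: only frontmatter edges that originated from the page being written are candidates for removal. A rough sketch of that filter is below. It is an assumption about the shape of the logic, since `runAutoLink` itself is not part of this hunk; `edgeKey` and `staleFrontmatterLinks` are hypothetical names, and the `Link` interface is repeated only to keep the example self-contained.

```typescript
// Repeated from the diff so the sketch compiles on its own.
interface Link {
  from_slug: string;
  to_slug: string;
  link_type: string;
  context: string;
  link_source?: string | null;
  origin_slug?: string | null;
  origin_field?: string | null;
}

function edgeKey(l: Link): string {
  return JSON.stringify([l.from_slug, l.to_slug, l.link_type]);
}

// Edges eligible for deletion when `writtenSlug` is rewritten: frontmatter
// edges owned by that page which the new frontmatter no longer produces.
// Markdown, manual, legacy (NULL), and other pages' edges are never returned.
function staleFrontmatterLinks(existing: Link[], desired: Link[], writtenSlug: string): Link[] {
  const keep = new Set(desired.map(edgeKey));
  return existing.filter(l =>
    l.link_source === 'frontmatter' &&
    l.origin_slug === writtenSlug &&
    !keep.has(edgeKey(l)));
}
```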

@@ -48,18 +48,38 @@ CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (em
-- ============================================================
-- links: cross-references between pages
-- ============================================================
-- Provenance model (v0.13):
-- link_source — 'markdown' | 'frontmatter' | 'manual' | NULL
-- (NULL = legacy row written before v0.13; unknown source)
-- origin_page_id — for link_source='frontmatter', the page whose YAML
-- frontmatter created this edge; scopes reconciliation
-- origin_field — the frontmatter field name (e.g. 'key_people')
--
-- The unique constraint includes link_source + origin_page_id so a manual edge
-- and a frontmatter-derived edge with the same (from, to, type) tuple coexist.
-- Reconciliation on put_page filters by (link_source='frontmatter' AND
-- origin_page_id = written_page) — never touches other pages' edges.
CREATE TABLE IF NOT EXISTS links (
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT links_from_to_type_unique UNIQUE(from_page_id, to_page_id, link_type)
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
link_source TEXT CHECK (link_source IS NULL OR link_source IN ('markdown', 'frontmatter', 'manual')),
origin_page_id INTEGER REFERENCES pages(id) ON DELETE SET NULL,
origin_field TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- NULLS NOT DISTINCT (PG15+) so two rows with link_source IS NULL or
-- origin_page_id IS NULL collide as expected. Without this, every row with
-- NULL origin_page_id (markdown/manual edges) would be treated as unique.
CONSTRAINT links_from_to_type_source_origin_unique
UNIQUE NULLS NOT DISTINCT (from_page_id, to_page_id, link_type, link_source, origin_page_id)
);
CREATE INDEX IF NOT EXISTS idx_links_from ON links(from_page_id);
CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);
CREATE INDEX IF NOT EXISTS idx_links_source ON links(link_source);
CREATE INDEX IF NOT EXISTS idx_links_origin ON links(origin_page_id);
-- ============================================================
-- tags

@@ -102,10 +102,9 @@ describe('buildPlan — diff against completed + installed VERSION', () => {
expect(plan.applied).toEqual([]);
expect(plan.partial).toEqual([]);
expect(plan.pending.map(m => m.version)).toContain('0.11.0');
// v0.12.0 (Knowledge Graph) and v0.12.2 (JSONB repair) are registered but
// installed VERSION is 0.11.1, so they land in skippedFuture until the
// binary catches up.
expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.0', '0.12.2']);
// Future migrations (registered but newer than installed VERSION) land in
// skippedFuture until the binary catches up.
expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.0', '0.12.2', '0.13.0']);
});
test('already applied → v0.11.0 lands in `applied` bucket, not pending', () => {
@@ -141,10 +140,10 @@ describe('buildPlan — diff against completed + installed VERSION', () => {
const idx = indexCompleted([]);
const plan = buildPlan(idx, '0.12.0');
expect(plan.pending.map(m => m.version)).toContain('0.11.0');
// v0.12.2 was added later (JSONB repair); installed=0.12.0 means it
// belongs in skippedFuture, not pending. v0.11.0 and v0.12.0 stay
// v0.12.2 and v0.13.0 were added later; installed=0.12.0 means they
// belong in skippedFuture, not pending. v0.11.0 and v0.12.0 stay
// pending despite being ≤ installed — that is the H9 invariant.
expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.2']);
expect(plan.skippedFuture.map(m => m.version)).toEqual(['0.12.2', '0.13.0']);
});
test('--migration filter narrows to one version', () => {
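The bucketing these assertions exercise can be sketched as a small pure function: completed migrations are applied, migrations newer than the installed VERSION are skipped for now, and everything else stays pending even when its version is at or below the installed one. This is a simplified illustration, not the real `buildPlan`: it omits the `partial` bucket, the registry is passed in as a plain list of version strings, and `compareVersions` and `bucketMigrations` are hypothetical names.

```typescript
// Numeric comparison of dotted versions, e.g. '0.12.2' vs '0.13.0'.
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

interface PlanSketch {
  applied: string[];
  pending: string[];
  skippedFuture: string[];
}

// Assumption: completion is checked first, then the future check.
function bucketMigrations(registered: string[], completed: Set<string>, installed: string): PlanSketch {
  const plan: PlanSketch = { applied: [], pending: [], skippedFuture: [] };
  for (const version of registered) {
    if (completed.has(version)) plan.applied.push(version);
    else if (compareVersions(version, installed) > 0) plan.skippedFuture.push(version);
    else plan.pending.push(version); // not completed and <= installed stays pending
  }
  return plan;
}
```

With a registry of 0.11.0, 0.12.0, 0.12.2, and 0.13.0 and nothing completed, an installed version of 0.12.0 leaves 0.12.2 and 0.13.0 in `skippedFuture`, in line with the updated expectation in the test above.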

@@ -33,50 +33,69 @@ describe('extractMarkdownLinks', () => {
});
describe('extractLinksFromFile', () => {
it('resolves relative paths to slugs', () => {
it('resolves relative paths to slugs', async () => {
const content = '---\ntitle: Test\n---\nSee [Pedro](../people/pedro.md).';
const allSlugs = new Set(['people/pedro', 'deals/test-deal']);
const links = extractLinksFromFile(content, 'deals/test-deal.md', allSlugs);
const links = await extractLinksFromFile(content, 'deals/test-deal.md', allSlugs);
expect(links.length).toBeGreaterThanOrEqual(1);
expect(links[0].from_slug).toBe('deals/test-deal');
expect(links[0].to_slug).toBe('people/pedro');
});
it('skips links to non-existent pages', () => {
it('skips links to non-existent pages', async () => {
const content = 'See [Ghost](../people/ghost.md).';
const allSlugs = new Set(['deals/test']);
const links = extractLinksFromFile(content, 'deals/test.md', allSlugs);
const links = await extractLinksFromFile(content, 'deals/test.md', allSlugs);
expect(links).toHaveLength(0);
});
it('extracts frontmatter company links', () => {
it('extracts frontmatter company links (v0.13, includeFrontmatter opt-in)', async () => {
const content = '---\ncompany: brex\ntype: person\n---\nContent.';
const allSlugs = new Set(['people/test']);
const links = extractLinksFromFile(content, 'people/test.md', allSlugs);
// v0.13 canonical: person page with company: X → person → company works_at (outgoing).
// Resolver needs companies/brex to exist in allSlugs to emit the edge.
const allSlugs = new Set(['people/test', 'companies/brex']);
const links = await extractLinksFromFile(content, 'people/test.md', allSlugs, { includeFrontmatter: true });
const companyLinks = links.filter(l => l.link_type === 'works_at');
expect(companyLinks.length).toBeGreaterThanOrEqual(1);
expect(companyLinks[0].from_slug).toBe('people/test');
expect(companyLinks[0].to_slug).toBe('companies/brex');
});
it('extracts frontmatter investors array', () => {
it('extracts frontmatter investors array (v0.13: incoming direction)', async () => {
// v0.13: deal page with investors:[yc, threshold] emits INCOMING edges:
// companies/yc → deals/seed invested_in and same for threshold.
const content = '---\ninvestors: [yc, threshold]\ntype: deal\n---\nContent.';
const allSlugs = new Set(['deals/seed']);
const links = extractLinksFromFile(content, 'deals/seed.md', allSlugs);
const allSlugs = new Set(['deals/seed', 'companies/yc', 'companies/threshold']);
const links = await extractLinksFromFile(content, 'deals/seed.md', allSlugs, { includeFrontmatter: true });
const investorLinks = links.filter(l => l.link_type === 'invested_in');
expect(investorLinks).toHaveLength(2);
// Incoming: from = resolved investor, to = deal page.
for (const l of investorLinks) {
expect(l.to_slug).toBe('deals/seed');
expect(l.from_slug).toMatch(/^companies\/(yc|threshold)$/);
}
});
it('infers link type from directory structure', () => {
it('frontmatter extraction is default OFF (back-compat)', async () => {
// Without includeFrontmatter, fs-source no longer auto-extracts frontmatter.
// Matches db-source behavior. User opts in with --include-frontmatter flag.
const content = '---\ncompany: brex\ntype: person\n---\nContent.';
const allSlugs = new Set(['people/test', 'companies/brex']);
const links = await extractLinksFromFile(content, 'people/test.md', allSlugs);
expect(links).toEqual([]);
});
it('infers link type from directory structure', async () => {
const content = 'See [Brex](../companies/brex.md).';
const allSlugs = new Set(['people/pedro', 'companies/brex']);
const links = extractLinksFromFile(content, 'people/pedro.md', allSlugs);
const links = await extractLinksFromFile(content, 'people/pedro.md', allSlugs);
expect(links[0].link_type).toBe('works_at');
});
it('infers deal_for type for deals -> companies', () => {
it('infers deal_for type for deals -> companies', async () => {
const content = 'See [Brex](../companies/brex.md).';
const allSlugs = new Set(['deals/seed', 'companies/brex']);
const links = extractLinksFromFile(content, 'deals/seed.md', allSlugs);
const links = await extractLinksFromFile(content, 'deals/seed.md', allSlugs);
expect(links[0].link_type).toBe('deal_for');
});
});
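The direction rule these tests check (outgoing: the page being written is the source; incoming: the resolved value is the source) reduces to a small orientation step. The sketch below is an illustration under that reading of the tests; `Direction` and `orientEdge` are hypothetical names, not exports of the module under test.

```typescript
type Direction = 'outgoing' | 'incoming';

interface OrientedEdge {
  from_slug: string;
  to_slug: string;
}

// outgoing: page -> resolved value (person page with company: brex)
// incoming: resolved value -> page (deal page with investors: [yc])
function orientEdge(pageSlug: string, resolvedSlug: string, direction: Direction): OrientedEdge {
  return direction === 'incoming'
    ? { from_slug: resolvedSlug, to_slug: pageSlug }
    : { from_slug: pageSlug, to_slug: resolvedSlug };
}
```

For example, `orientEdge('deals/seed', 'companies/yc', 'incoming')` yields an edge from `companies/yc` to `deals/seed`, so the edge reads subject first ("yc invested_in seed") even though the field lives on the deal page.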

@@ -2,9 +2,13 @@ import { describe, test, expect } from 'bun:test';
import {
extractEntityRefs,
extractPageLinks,
extractFrontmatterLinks,
inferLinkType,
makeResolver,
parseTimelineEntries,
isAutoLinkEnabled,
FRONTMATTER_LINK_MAP,
type SlugResolver,
} from '../src/core/link-extraction.ts';
import type { BrainEngine } from '../src/core/engine.ts';
@@ -71,12 +75,28 @@ describe('extractEntityRefs', () => {
// ─── extractPageLinks ──────────────────────────────────────────
// Resolver that passes any slug-shaped name straight through (pretends every
// such page exists) and returns null for anything else. Used by tests that only
// exercise the non-resolver paths (markdown + bare-slug + frontmatter.source).
const allowAllResolver = {
resolve: async (name: string) => {
if (/^[a-z][a-z0-9-]*\/[a-z0-9][a-z0-9-]*$/.test(name)) return name;
return null;
},
};
// Resolver that never resolves. Used to test that the non-frontmatter
// paths still produce candidates even when no fuzzy matching is possible.
const nullResolver = { resolve: async () => null };
describe('extractPageLinks', () => {
test('returns LinkCandidate[] with inferred types', () => {
const candidates = extractPageLinks(
test('returns LinkCandidate[] with inferred types', async () => {
const { candidates } = await extractPageLinks(
'docs/x',
'[Alice](people/alice) is the CEO of Acme.',
{},
'concept',
allowAllResolver,
);
expect(candidates.length).toBeGreaterThan(0);
const aliceLink = candidates.find(c => c.targetSlug === 'people/alice');
@@ -84,32 +104,42 @@ describe('extractPageLinks', () => {
expect(aliceLink!.linkType).toBe('works_at');
});
test('dedups multiple mentions of same entity (within-page dedup)', () => {
test('dedups multiple mentions of same entity (within-page dedup)', async () => {
const content = '[Alice](people/alice) said this. Later, [Alice](people/alice) said that.';
const candidates = extractPageLinks(content, {}, 'concept');
const { candidates } = await extractPageLinks('docs/x', content, {}, 'concept', allowAllResolver);
const aliceLinks = candidates.filter(c => c.targetSlug === 'people/alice');
expect(aliceLinks.length).toBe(1);
});
test('extracts frontmatter source as source-type link', () => {
const candidates = extractPageLinks('Some content.', { source: 'meetings/2026-01-15' }, 'person');
test('extracts frontmatter source as source-type link', async () => {
const { candidates } = await extractPageLinks(
'docs/x', 'Some content.', { source: 'meetings/2026-01-15' }, 'person', allowAllResolver,
);
const sourceLink = candidates.find(c => c.linkType === 'source');
expect(sourceLink).toBeDefined();
expect(sourceLink!.targetSlug).toBe('meetings/2026-01-15');
});
test('extracts bare slug references in text', () => {
const candidates = extractPageLinks('See companies/acme for details.', {}, 'concept');
test('extracts bare slug references in text', async () => {
const { candidates } = await extractPageLinks(
'docs/x', 'See companies/acme for details.', {}, 'concept', nullResolver,
);
const acme = candidates.find(c => c.targetSlug === 'companies/acme');
expect(acme).toBeDefined();
});
test('returns empty when no refs found', () => {
expect(extractPageLinks('Plain text with no links.', {}, 'concept')).toEqual([]);
test('returns empty when no refs found', async () => {
const { candidates } = await extractPageLinks(
'docs/x', 'Plain text with no links.', {}, 'concept', nullResolver,
);
expect(candidates).toEqual([]);
});
test('meeting page references default to attended type', () => {
const candidates = extractPageLinks('Attendees: [Alice](people/alice), [Bob](people/bob).', {}, 'meeting');
test('meeting page references default to attended type', async () => {
const { candidates } = await extractPageLinks(
'meetings/x', 'Attendees: [Alice](people/alice), [Bob](people/bob).',
{}, 'meeting' as never, nullResolver,
);
const aliceLink = candidates.find(c => c.targetSlug === 'people/alice');
expect(aliceLink!.linkType).toBe('attended');
});
@@ -303,3 +333,279 @@ describe('isAutoLinkEnabled', () => {
expect(await isAutoLinkEnabled(engine)).toBe(true);
});
});
// ─── Frontmatter link extraction (v0.13) ────────────────────────
/**
* In-memory resolver for frontmatter tests. Maps names to slugs via an
* explicit fixture map; returns null for anything missing. Mirrors what
* the real resolver does on a production brain but with deterministic
* inputs (no pg_trgm, no searchPages).
*/
function makeFixtureResolver(pages: Record<string, string>): SlugResolver {
return {
async resolve(name: string, dirHint?: string | string[]) {
const hints = Array.isArray(dirHint) ? dirHint : (dirHint ? [dirHint] : []);
// Already a slug — check if present.
if (/^[a-z][a-z0-9-]*\/[a-z0-9][a-z0-9-]*$/.test(name)) {
return pages[name] ?? null;
}
const slugified = name.toLowerCase().replace(/\s+/g, '-');
for (const hint of hints) {
if (!hint) continue;
const candidate = `${hint}/${slugified}`;
if (pages[candidate]) return candidate;
}
return null;
},
};
}
describe('extractFrontmatterLinks — field-map coverage', () => {
const pages = {
'people/pedro': 'people/pedro',
'people/garry': 'people/garry',
'people/diana-hu': 'people/diana-hu',
'companies/stripe': 'companies/stripe',
'companies/brex': 'companies/brex',
'companies/sequoia': 'companies/sequoia',
'companies/benchmark': 'companies/benchmark',
'meetings/2026-04-03': 'meetings/2026-04-03',
'deal/riveter-seed': 'deal/riveter-seed',
};
const resolver = makeFixtureResolver(pages);
test('person.company → outgoing works_at', async () => {
const { candidates } = await extractFrontmatterLinks(
'people/pedro', 'person' as never, { company: 'Stripe' }, resolver,
);
expect(candidates).toHaveLength(1);
expect(candidates[0]).toMatchObject({
fromSlug: 'people/pedro',
targetSlug: 'companies/stripe',
linkType: 'works_at',
linkSource: 'frontmatter',
originSlug: 'people/pedro',
originField: 'company',
});
});
test('person.companies (array alias) → multiple works_at edges', async () => {
const { candidates } = await extractFrontmatterLinks(
'people/pedro', 'person' as never, { companies: ['Stripe', 'Brex'] }, resolver,
);
expect(candidates).toHaveLength(2);
for (const c of candidates) {
expect(c.fromSlug).toBe('people/pedro');
expect(c.linkType).toBe('works_at');
expect(c.targetSlug).toMatch(/^companies\/(stripe|brex)$/);
}
});
test('company.key_people → INCOMING works_at (person → company)', async () => {
const { candidates } = await extractFrontmatterLinks(
'companies/stripe', 'company' as never, { key_people: ['Pedro', 'Garry'] }, resolver,
);
expect(candidates).toHaveLength(2);
for (const c of candidates) {
// Incoming: from = resolved person, to = the page being written.
expect(c.targetSlug).toBe('companies/stripe');
expect(c.fromSlug).toMatch(/^people\/(pedro|garry)$/);
expect(c.linkType).toBe('works_at');
expect(c.originSlug).toBe('companies/stripe');
expect(c.originField).toBe('key_people');
}
});
test('meeting.attendees → INCOMING attended (person → meeting)', async () => {
const { candidates } = await extractFrontmatterLinks(
'meetings/2026-04-03', 'meeting' as never, { attendees: ['Pedro', 'Garry'] }, resolver,
);
expect(candidates).toHaveLength(2);
for (const c of candidates) {
expect(c.targetSlug).toBe('meetings/2026-04-03');
expect(c.linkType).toBe('attended');
expect(c.fromSlug).toMatch(/^people\/(pedro|garry)$/);
}
});
test('deal.investors (multi-dir hint) → INCOMING invested_in', async () => {
const { candidates } = await extractFrontmatterLinks(
'deal/riveter-seed', 'deal' as never,
{ investors: ['Sequoia', 'Benchmark'] }, resolver,
);
expect(candidates).toHaveLength(2);
for (const c of candidates) {
expect(c.targetSlug).toBe('deal/riveter-seed');
expect(c.linkType).toBe('invested_in');
expect(c.fromSlug).toMatch(/^companies\/(sequoia|benchmark)$/);
}
});
test('source field → outgoing source edge', async () => {
const { candidates } = await extractFrontmatterLinks(
'people/pedro', 'person' as never, { source: 'meetings/2026-04-03' }, resolver,
);
const src = candidates.find(c => c.linkType === 'source');
expect(src).toBeDefined();
expect(src!.fromSlug).toBe('people/pedro');
expect(src!.targetSlug).toBe('meetings/2026-04-03');
});
test('unresolvable name goes to unresolved list, not candidates', async () => {
const { candidates, unresolved } = await extractFrontmatterLinks(
'meetings/x', 'meeting' as never,
{ attendees: ['Pedro', 'Unknown Person'] }, resolver,
);
expect(candidates).toHaveLength(1);
expect(unresolved).toHaveLength(1);
expect(unresolved[0]).toEqual({ field: 'attendees', name: 'Unknown Person' });
});
test('bad types (number, null, empty) skipped silently', async () => {
const { candidates, unresolved } = await extractFrontmatterLinks(
'meetings/x', 'meeting' as never,
{ attendees: [42, null, '', 'Pedro', { nothing: true }] }, resolver,
);
// Only 'Pedro' produces a candidate. 42/null/'' silently skipped.
// Object without name/slug/title is skipped. No unresolved entry for skipped.
expect(candidates).toHaveLength(1);
expect(candidates[0].fromSlug).toBe('people/pedro');
expect(unresolved).toHaveLength(0);
});
test('array of objects: uses .name, carries role into context', async () => {
const { candidates } = await extractFrontmatterLinks(
'deal/riveter-seed', 'deal' as never,
{ investors: [{ name: 'Sequoia', role: 'lead' }] }, resolver,
);
expect(candidates).toHaveLength(1);
expect(candidates[0].context).toContain('Sequoia');
expect(candidates[0].context).toContain('lead');
});
test('context enrichment — not bare field name', async () => {
const { candidates } = await extractFrontmatterLinks(
'companies/stripe', 'company' as never, { key_people: ['Pedro'] }, resolver,
);
// Per plan Finding 7: context must include field + value, not bare 'frontmatter.key_people'.
expect(candidates[0].context).toBe('frontmatter.key_people: Pedro');
});
test('pageType filter — field ignored on non-matching page', async () => {
// `company` field only fires on person pages. On a concept page it's ignored.
const { candidates } = await extractFrontmatterLinks(
'concepts/x', 'concept' as never, { company: 'Stripe' }, resolver,
);
expect(candidates).toHaveLength(0);
});
});
describe('makeResolver — fallback chain', () => {
// Minimal engine fake with controlled pages + findByTitleFuzzy.
function makeFakeEngine(
slugs: string[],
fuzzyMap: Map<string, { slug: string; similarity: number }> = new Map(),
): BrainEngine {
const lookup = new Set(slugs);
let getPageCalls = 0;
let fuzzyCalls = 0;
let searchCalls = 0;
const engine = {
async getPage(slug: string) {
getPageCalls++;
return lookup.has(slug) ? { slug } as any : null;
},
async findByTitleFuzzy(name: string) {
fuzzyCalls++;
return fuzzyMap.get(name) ?? null;
},
async searchKeyword() {
searchCalls++;
return [];
},
} as unknown as BrainEngine;
(engine as any)._counts = () => ({ getPageCalls, fuzzyCalls, searchCalls });
return engine;
}
test('step 1: slug passthrough', async () => {
const engine = makeFakeEngine(['people/pedro']);
const r = makeResolver(engine);
expect(await r.resolve('people/pedro')).toBe('people/pedro');
});
test('step 2: dir-hint construction', async () => {
const engine = makeFakeEngine(['companies/stripe']);
const r = makeResolver(engine);
expect(await r.resolve('Stripe', 'companies')).toBe('companies/stripe');
});
test('step 3: pg_trgm fuzzy hit', async () => {
const engine = makeFakeEngine(
['companies/brex'],
new Map([['Brex Inc', { slug: 'companies/brex', similarity: 0.8 }]]),
);
const r = makeResolver(engine);
expect(await r.resolve('Brex Inc', 'companies')).toBe('companies/brex');
});
test('batch mode NEVER calls searchKeyword (deterministic migration)', async () => {
const engine = makeFakeEngine([]);
const r = makeResolver(engine, { mode: 'batch' });
const result = await r.resolve('Unknown Name', 'companies');
expect(result).toBeNull();
const counts = (engine as any)._counts();
expect(counts.searchCalls).toBe(0);
});
test('cache: same name → single getPage call', async () => {
const engine = makeFakeEngine(['people/pedro']);
const r = makeResolver(engine);
await r.resolve('people/pedro');
await r.resolve('people/pedro');
await r.resolve('people/pedro');
const counts = (engine as any)._counts();
expect(counts.getPageCalls).toBe(1);
});
test('unresolvable → null (no dead link written)', async () => {
const engine = makeFakeEngine([]);
const r = makeResolver(engine, { mode: 'batch' });
expect(await r.resolve('Nonexistent Person', 'people')).toBeNull();
});
});
describe('FRONTMATTER_LINK_MAP integrity', () => {
  test('every mapping has fields + type + direction + dirHint', () => {
    for (const m of FRONTMATTER_LINK_MAP) {
      expect(m.fields.length).toBeGreaterThan(0);
      expect(m.type).toBeTruthy();
      expect(['outgoing', 'incoming']).toContain(m.direction);
      expect(m.dirHint !== undefined).toBe(true);
    }
  });
  test('key_people maps to INCOMING works_at on company page', () => {
    const m = FRONTMATTER_LINK_MAP.find(m => m.fields.includes('key_people'));
    expect(m).toBeDefined();
    expect(m!.direction).toBe('incoming');
    expect(m!.pageType).toBe('company');
    expect(m!.type).toBe('works_at');
  });
  test('attendees maps to INCOMING attended on meeting page', () => {
    const m = FRONTMATTER_LINK_MAP.find(m => m.fields.includes('attendees'));
    expect(m!.direction).toBe('incoming');
    expect(m!.pageType).toBe('meeting');
    expect(m!.type).toBe('attended');
  });
  test('investors uses multi-dir hint (companies/funds/people)', () => {
    const m = FRONTMATTER_LINK_MAP.find(m => m.fields.includes('investors'));
    expect(Array.isArray(m!.dirHint)).toBe(true);
    expect(m!.dirHint).toContain('companies');
    expect(m!.dirHint).toContain('funds');
    expect(m!.dirHint).toContain('people');
  });
});
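
The resolver tests earlier in this file exercise a three-step fallback chain (slug passthrough, dir-hint construction, pg_trgm fuzzy match) plus a per-run cache. A minimal sketch of that chain follows, for orientation only. Everything in it is an assumption made for illustration: `MiniEngine`, `makeSketchResolver`, `slugify`, and the slug regex are hypothetical names and shapes, not the project's actual `makeResolver` implementation, and the live-mode search fallback is left out.

```typescript
// Hypothetical sketch of the batch-mode fallback chain; not the real makeResolver.
interface FuzzyHit { slug: string; similarity: number }

// Assumed minimal engine surface, modelled on the fake engine used by the tests.
interface MiniEngine {
  getPage(slug: string): Promise<{ slug: string } | null>;
  findByTitleFuzzy(name: string, dirPrefix?: string, minSim?: number): Promise<FuzzyHit | null>;
}

// Assumed slug shape: lowercase segments joined by '/'.
const SLUG_RE = /^[a-z0-9-]+(\/[a-z0-9-]+)+$/;

// Assumed slugification: lowercase, runs of other characters become '-'.
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

function makeSketchResolver(engine: MiniEngine) {
  // Per-run cache: the same (dirHint, name) pair triggers at most one lookup.
  const cache = new Map<string, string | null>();

  async function lookup(name: string, dirHint?: string): Promise<string | null> {
    // Step 1: the value already looks like a slug and the page exists.
    if (SLUG_RE.test(name) && (await engine.getPage(name))) return name;
    // Step 2: build a candidate slug from the directory hint.
    if (dirHint) {
      const candidate = `${dirHint}/${slugify(name)}`;
      if (await engine.getPage(candidate)) return candidate;
    }
    // Step 3: trigram fuzzy match on page titles.
    const hit = await engine.findByTitleFuzzy(name, dirHint);
    // No match resolves to null, so no dead link is written.
    return hit ? hit.slug : null;
  }

  return {
    async resolve(name: string, dirHint?: string): Promise<string | null> {
      const key = `${dirHint ?? ''}|${name}`;
      if (cache.has(key)) return cache.get(key) ?? null;
      const slug = await lookup(name, dirHint);
      cache.set(key, slug);
      return slug;
    },
  };
}
```

In this sketch a miss is cached as `null` too, so an unresolvable name costs one round of lookups per run rather than one per occurrence. Whether the real resolver caches misses is not stated in the source.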

@@ -93,11 +93,12 @@ describe('migrate: v8 (links_dedup) regression — must be fast on 1K duplicate
});
test('1000 duplicate links dedup completes in <5s and leaves table deduped', async () => {
- // Set up: drop the unique constraint so duplicates can be inserted, then reset
- // version so v8 re-runs. Schema-embedded.ts already has the constraint, so
- // initSchema() above set it up; explicit DROP makes the test premise valid.
+ // Set up: drop BOTH the old (v8) and new (v11) unique constraints so
+ // duplicates can be inserted, then reset version so v8 + v11 re-run.
+ // v11 replaces the v8 constraint name; we drop whichever is present.
const db = (engine as any).db;
await db.exec(`ALTER TABLE links DROP CONSTRAINT IF EXISTS links_from_to_type_unique`);
+ await db.exec(`ALTER TABLE links DROP CONSTRAINT IF EXISTS links_from_to_type_source_origin_unique`);
// Two pages so the FK is satisfied
await engine.putPage('p/from', { type: 'concept', title: 'F', compiled_truth: '', timeline: '' });
@@ -115,7 +116,7 @@ describe('migrate: v8 (links_dedup) regression — must be fast on 1K duplicate
const beforeCount = (await db.query(`SELECT COUNT(*)::int AS c FROM links`)).rows[0].c;
expect(beforeCount).toBe(1000);
- // Reset version to 7 so v8 + v9 + v10 re-run
+ // Reset version to 7 so v8 + v9 + v10 + v11 re-run
await engine.setConfig('version', '7');
// Run migrations and assert wall-clock + correctness
@@ -128,12 +129,14 @@ describe('migrate: v8 (links_dedup) regression — must be fast on 1K duplicate
const afterCount = (await db.query(`SELECT COUNT(*)::int AS c FROM links`)).rows[0].c;
expect(afterCount).toBe(1); // deduped to one row
- // Unique constraint reinstated
+ // v11 replaces v8's constraint name. Assert the current (v11) constraint
+ // exists and the legacy v8 name is gone.
const constraints = (await db.query(`
SELECT conname FROM pg_constraint
WHERE conrelid = 'links'::regclass AND contype = 'u'
`)).rows;
- expect(constraints.some((c: { conname: string }) => c.conname === 'links_from_to_type_unique')).toBe(true);
+ expect(constraints.some((c: { conname: string }) => c.conname === 'links_from_to_type_source_origin_unique')).toBe(true);
+ expect(constraints.some((c: { conname: string }) => c.conname === 'links_from_to_type_unique')).toBe(false);
// Helper index was dropped after dedup
const helperIdx = (await db.query(`