feat: GBrain v0.1.0 — Postgres-native personal knowledge brain (#1)

* chore: add CLAUDE.md with project context and gstack skill routing rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: initialize project with Bun + TypeScript

package.json with dependencies (postgres, pgvector, openai, anthropic,
MCP SDK, gray-matter). TypeScript config targeting ESNext with bundler
module resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add foundation layer — engine interface, Postgres engine, schema

BrainEngine pluggable interface with full PostgresEngine: CRUD, search
(keyword + vector), links, tags, timeline, versions, stats, health,
ingest log, config. Trigger-based tsvector spanning pages +
timeline_entries. Markdown parser with frontmatter, compiled_truth /
timeline splitting, and round-trip serialization. 19 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 3-tier chunking and embedding service

Recursive delimiter-aware chunker (5-level hierarchy, 300-word chunks,
50-word overlap). Semantic chunker with Savitzky-Golay boundary detection
and recursive fallback. LLM-guided chunker via Claude Haiku with sliding
window topic detection. OpenAI embedding service with batch support,
exponential backoff, and rate limit handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add hybrid search with RRF fusion, expansion, and 4-layer dedup

Hybrid search merges vector (pgvector HNSW) + keyword (tsvector) via
Reciprocal Rank Fusion. Multi-query expansion via Claude Haiku generates
2 alternative phrasings. 4-layer dedup pipeline: by source, cosine
similarity, type diversity (60% cap), per-page cap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

GBRAIN_V0.md: full product spec with architecture decisions, CLI commands,
schema, search architecture, chunking strategies, first-time experience,
and future plans. ENGINES.md: pluggable engine interface, capability matrix,
how to add new backends. SQLITE_ENGINE.md: complete SQLite implementation
plan with schema, FTS5 setup, vector search options, and contributor guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add CLI with all commands

Full CLI dispatcher with 25+ commands: init (Supabase wizard), get, put,
delete, list, search, query (hybrid RRF), import (bulk with progress bar),
export (round-trip), embed, stats, health, tag/untag/tags, link/unlink/
backlinks/graph, timeline/timeline-add, history/revert, config, upgrade,
serve, call. Smart slug resolution on reads. Version snapshots on updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP stdio server with all brain tools

20 MCP tools mirroring CLI operations: get/put/delete/list pages,
search (keyword), query (hybrid RRF + expansion), tags, links with
graph traversal, timeline, stats, health, version history, and revert.
Auto-chunks and embeds on put_page. CLI and MCP share the same engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 6 skill files and ClawHub manifest

Fat markdown skills for AI agents: ingest (meetings/docs/articles with
timeline merge), query (3-layer search + synthesis + citations), maintain
(health checks, stale detection, orphan audit), enrich (external API
enrichment), briefing (daily briefing compilation), migrate (universal
migration from Obsidian/Notion/Logseq/markdown/CSV/JSON/Roam).
ClawHub manifest for skill distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add README, CONTRIBUTING, update CLAUDE.md test references

README with quickstart, commands, architecture, library usage, MCP setup,
and links to design docs. CONTRIBUTING with setup, project structure,
and guides for adding commands and engines. CLAUDE.md updated to reference
actual test files instead of planned-but-unwritten import test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address adversarial review findings — 5 critical/high fixes

- revertToVersion: add page_id check to prevent cross-page data corruption
- traverseGraph: use UNION instead of UNION ALL for cycle safety
- embedAll: preserve all chunks when embedding stale subset only
- embedding: throw on retry exhaustion instead of returning zero vectors
- putPage: validate slugs to prevent path traversal on export

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand README with schema, install, search architecture, and motivation

Why it exists, how search works (with ASCII diagram), full database schema
with all 9 tables and index details, chunking strategies explained, storage
estimates, setup wizard walkthrough, knowledge model with example page,
library usage with more examples, expanded skills table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add MIT license (Copyright 2026 Garry Tan)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add OpenClaw install flow as primary option in README

OpenClaw users just say "install gbrain" and the orchestrator handles
everything: package install, Supabase setup wizard, skill registration.
Shows the conversational interface for querying, ingesting, and briefings.
ClawHub and standalone CLI paths follow as alternatives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add prerequisites and explicit OpenClaw install instructions

Prerequisites table listing Supabase, OpenAI, and Anthropic dependencies
with links. Environment variable setup. Explicit step-by-step prompt for
OpenClaw users showing exactly what to tell the orchestrator. Note that
search degrades gracefully without API keys (keyword-only without OpenAI,
no expansion without Anthropic).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: scrub named references, add PG essay demo section to README

Replace all Pedro/Brex/Jensen Huang/River AI examples with Paul Graham
essay examples using the kindling corpus. Add "Try it" section to README
showing the power of hybrid search on PG essays in 90 seconds. Update
test fixtures to use concept pages instead of person pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-05 12:48:10 -07:00
committed by GitHub
parent 3144971cd0
commit b22cbd349a
62 changed files with 6655 additions and 0 deletions

4
.gitignore vendored Normal file

@@ -0,0 +1,4 @@
node_modules/
bin/
.DS_Store
*.log

27
CHANGELOG.md Normal file

@@ -0,0 +1,27 @@
# Changelog
All notable changes to GBrain will be documented in this file.
## [0.1.0] - 2026-04-05
### Added
- Pluggable engine interface (`BrainEngine`) with full Postgres + pgvector implementation
- 25+ CLI commands: init, get, put, delete, list, search, query, import, export, embed, stats, health, link/unlink/backlinks/graph, tag/untag/tags, timeline/timeline-add, history/revert, config, upgrade, serve, call
- MCP stdio server with 20 tools mirroring all CLI operations
- 3-tier chunking: recursive (delimiter-aware), semantic (Savitzky-Golay boundary detection), LLM-guided (Claude Haiku topic shifts)
- Hybrid search with Reciprocal Rank Fusion merging vector + keyword results
- Multi-query expansion via Claude Haiku (2 alternative phrasings per query)
- 4-layer dedup pipeline: by source, cosine similarity, type diversity, per-page cap
- OpenAI embedding service (text-embedding-3-large, 1536 dims) with batch support and exponential backoff
- Postgres schema with pgvector HNSW, tsvector (trigger-based, spans timeline_entries), pg_trgm fuzzy slug matching
- Smart slug resolution for reads (fuzzy match via pg_trgm)
- Page version control with snapshot, history, and revert
- Typed links with recursive CTE graph traversal (max depth configurable)
- Brain health dashboard (embed coverage, stale pages, orphans, dead links)
- Stale alert annotations in search results
- Supabase init wizard with CLI auto-provision fallback
- Slug validation to prevent path traversal on export
- 6 fat markdown skills: ingest, query, maintain, enrich, briefing, migrate
- ClawHub manifest for skill distribution
- Full design docs: GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

61
CLAUDE.md Normal file

@@ -0,0 +1,61 @@
# CLAUDE.md
GBrain is a personal knowledge brain. Postgres + pgvector + hybrid search in a managed Supabase instance.
## Architecture
Thin CLI + fat skills. The CLI (`src/cli.ts`) dispatches commands to handler files in
`src/commands/`. The core library (`src/core/`) handles database, search, embeddings,
and markdown parsing. Skills (`skills/`) are fat markdown files that tell you HOW to
use the tools — ingest meetings, answer queries, maintain the brain, enrich from APIs.
## Key files
- `src/core/engine.ts` — Pluggable engine interface (BrainEngine)
- `src/core/postgres-engine.ts` — Postgres + pgvector implementation
- `src/core/db.ts` — Connection management, schema initialization
- `src/core/chunkers/` — 3-tier chunking (recursive, semantic, LLM-guided)
- `src/core/search/` — Hybrid search: vector + keyword + RRF + multi-query expansion + dedup
- `src/core/embedding.ts` — OpenAI text-embedding-3-large, batch, retry, backoff
- `src/mcp/server.ts` — MCP stdio server exposing all tools
- `src/schema.sql` — Full Postgres + pgvector DDL
## Commands
Run `gbrain --help` or `gbrain --tools-json` for full command reference.
## Testing
`bun test` runs all tests. Tests: `test/markdown.test.ts` (frontmatter parsing,
round-trip serialization), `test/chunkers/recursive.test.ts` (delimiter splitting,
overlap, chunk sizing). Future: `test/import.test.ts` for full import/export round-trip.
## Skills
Read the skill files in `skills/` before doing brain operations. They contain the
workflows, heuristics, and quality rules for ingestion, querying, maintenance, and
enrichment.
## Build
`bun build --compile --outfile bin/gbrain src/cli.ts`
## Skill routing
When the user's request matches an available skill, ALWAYS invoke it using the Skill
tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
The skill has specialized workflows that produce better results than ad-hoc answers.
Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke investigate
- Ship, deploy, push, create PR → invoke ship
- QA, test the site, find bugs → invoke qa
- Code review, check my diff → invoke review
- Update docs after shipping → invoke document-release
- Weekly retro → invoke retro
- Design system, brand → invoke design-consultation
- Visual audit, design polish → invoke design-review
- Architecture review → invoke plan-eng-review
- Save progress, checkpoint, resume → invoke checkpoint
- Code quality, health check → invoke health

78
CONTRIBUTING.md Normal file

@@ -0,0 +1,78 @@
# Contributing to GBrain
## Setup
```bash
git clone https://github.com/garrytan/gbrain.git
cd gbrain
bun install
bun test
```
Requires Bun 1.0+.
## Project structure
```
src/
  cli.ts                  CLI entry point
  commands/               Command handlers (one file per command)
  core/
    engine.ts             BrainEngine interface
    postgres-engine.ts    Postgres implementation
    db.ts                 Connection management
    types.ts              TypeScript types
    markdown.ts           Frontmatter parsing
    config.ts             Config file management
    chunkers/             3-tier chunking (recursive, semantic, llm)
    search/               Hybrid search (vector, keyword, hybrid, expansion, dedup)
    embedding.ts          OpenAI embedding service
  mcp/
    server.ts             MCP stdio server
  schema.sql              Postgres DDL
skills/                   Fat markdown skills for AI agents
test/                     Tests (bun test)
docs/                     Architecture docs
```
## Running tests
```bash
bun test # all tests
bun test test/markdown.test.ts # specific test
```
## Building
```bash
bun build --compile --outfile bin/gbrain src/cli.ts
```
## Adding a new command
1. Create `src/commands/mycommand.ts` with an exported `runMyCommand` function
2. Add the case to `src/cli.ts` in the switch statement
3. Add the tool to `src/mcp/server.ts` in `handleToolCall` and `getToolDefinitions`
4. Add to `src/commands/tools-json.ts`
5. Add tests
CLI and MCP must expose identical operations. Drift tests will verify this.
## Adding a new engine
See `docs/ENGINES.md` for the full guide. In short:
1. Create `src/core/myengine-engine.ts` implementing `BrainEngine`
2. Add to engine factory in `src/core/engine.ts`
3. Run the test suite against your engine
4. Document in `docs/`
The SQLite engine is designed and ready for implementation. See `docs/SQLITE_ENGINE.md`.
## Welcome PRs
- SQLite engine implementation
- Docker Compose for self-hosted Postgres
- Additional migration sources
- New enrichment API integrations
- Performance optimizations

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Garry Tan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

462
README.md Normal file

@@ -0,0 +1,462 @@
# GBrain
Open source personal knowledge brain. Postgres + pgvector + hybrid search that actually works.
```bash
gbrain query "what does Paul Graham say about doing things that don't scale?"
```
```
concepts/do-things-that-dont-scale (concept) score=0.0312
The most common unscalable thing founders have to do at the start is to
recruit users manually. Nearly all startups have to...
concepts/how-to-get-startup-ideas (concept) score=0.0298
The way to get startup ideas is not to try to think of startup ideas.
It's to look for problems, preferably problems you have yourself...
concepts/relentlessly-resourceful (concept) score=0.0251
Not merely relentless. That's not enough to make things go your way
except in a few mostly uninteresting domains. In any interesting domain...
```
Hybrid search finds essays by meaning, not just keywords. "Doing things that don't scale" matches even when the exact phrase doesn't appear. That's the point.
## Why this exists
You have a brain full of knowledge. It lives in markdown files, meeting notes, CRM exports, Obsidian vaults, Notion databases. It's scattered, unsearchable, and going stale.
Search is the bottleneck. Keyword search misses semantic matches. Vector search misses exact names and phrases. Neither connects related ideas across documents.
GBrain fixes this with hybrid search that combines both approaches, plus a knowledge model that treats every page like an intelligence assessment: compiled truth on top (your current best understanding, rewritten when evidence changes), append-only timeline on the bottom (the evidence trail that never gets edited).
AI agents maintain the brain. You ingest a document and the agent updates every entity mentioned, creates cross-reference links, and appends timeline entries. MCP clients query it. The intelligence lives in fat markdown skills, not application code.
## Try it: Paul Graham's essays in 90 seconds
GBrain ships with 10 Paul Graham essays as a kindling corpus. After setup, they're already in your brain:
```bash
# What's in there?
gbrain stats
# Pages: 10, Chunks: 47, Embedded: 47, Links: 0
# Keyword search (fast, exact matches)
gbrain search "startups"
# Hybrid search (the good one, semantic + keyword + expansion)
gbrain query "what makes a great founder?"
# Read a specific essay
gbrain get concepts/do-things-that-dont-scale
# Find essays related to a concept
gbrain query "when should you ignore conventional wisdom?"
# Check brain health
gbrain health
# Pages: 10, Embed coverage: 100%, Stale: 0, Orphans: 10
```
The essays are just the demo. The real power is when you import your own knowledge, thousands of pages about people, companies, projects, and the connections between them.
## Install
### Prerequisites
GBrain needs three things to run:
| Dependency | What it's for | How to get it |
|------------|--------------|---------------|
| **Supabase account** | Postgres + pgvector database | [supabase.com](https://supabase.com) (Pro tier, $25/mo for 8GB) |
| **OpenAI API key** | Embeddings (text-embedding-3-large) | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |
| **Anthropic API key** | Multi-query expansion + LLM chunking (Haiku) | [console.anthropic.com](https://console.anthropic.com) |
Set the API keys as environment variables:
```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```
The Supabase connection URL is configured during `gbrain init`. The OpenAI and Anthropic SDKs read their keys from the environment automatically.
Without an OpenAI key, search still works (keyword only, no vector search). Without an Anthropic key, search still works (no multi-query expansion, no LLM chunking).
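The degradation rules above can be sketched as a small decision function. This is an illustration only: `planSearch` and `SearchPlan` are hypothetical names, not part of gbrain, and the real CLI may detect missing keys differently.

```typescript
// Hypothetical sketch: which search stages can run, given the keys present.
interface SearchPlan {
  keyword: boolean;   // tsvector search, needs no API key
  vector: boolean;    // needs OpenAI embeddings
  expansion: boolean; // needs Anthropic (multi-query expansion)
}

function planSearch(env: Record<string, string | undefined>): SearchPlan {
  return {
    keyword: true,
    vector: Boolean(env.OPENAI_API_KEY),
    expansion: Boolean(env.ANTHROPIC_API_KEY),
  };
}

// With no keys at all, only keyword search remains available.
console.log(planSearch({}));
```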
### With OpenClaw (recommended)
If you're running OpenClaw, tell it to set up your brain. Make sure your API keys are set in the environment first.
```
You: "Install gbrain and set up my knowledge brain.
I need you to:
1. Run: bun add gbrain
2. Run: gbrain init --supabase (follow the wizard to connect my Supabase database)
3. Run: gbrain import data/kindling/ (import the demo corpus)
4. Read the skill files in skills/ so you know how to use the brain"
```
OpenClaw will install the package, walk through the Supabase connection wizard, import demo data, and learn the 6 brain skills (ingest, query, maintain, enrich, briefing, migrate).
After setup, you talk to your brain through OpenClaw:
```
You: "What essays do we have about startups?"
You: "Ingest my meeting notes from today"
You: "Give me a briefing for my meetings tomorrow"
You: "Import my Obsidian vault into the brain"
```
OpenClaw reads the skill files in `skills/`, figures out which gbrain commands to run, and does the work. You never touch the CLI directly unless you want to.
### With ClawHub
```bash
clawhub install gbrain
```
This installs the npm package, copies the skill files, and runs `gbrain init --supabase` on first use.
### Standalone CLI
```bash
npm install -g gbrain
```
### As a library
```bash
bun add gbrain
```
```typescript
import { PostgresEngine } from 'gbrain';
```
All paths require a Postgres database with pgvector. Supabase Pro ($25/mo) is the recommended zero-ops option.
## Setup
After installing via CLI or library path, run the setup wizard:
```bash
# Guided wizard: auto-provisions Supabase or accepts a connection URL
gbrain init --supabase
# Or connect to any Postgres with pgvector
gbrain init --url postgresql://user:pass@host:5432/dbname
```
The init wizard:
1. Checks for Supabase CLI, offers auto-provisioning
2. Falls back to manual connection URL if CLI isn't available
3. Runs the full schema migration (tables, indexes, triggers, extensions)
4. Imports the kindling corpus (10 PG essays) as demo data
5. Verifies the connection and prints your first query to try
Config is saved to `~/.gbrain/config.json` with 0600 permissions.
OpenClaw users skip this step. The orchestrator runs the wizard for you during install.
## First import
```bash
# Import your markdown wiki (auto-chunks and auto-embeds)
gbrain import /path/to/brain/
# Skip embedding if you want to import fast and embed later
gbrain import /path/to/brain/ --no-embed
# Backfill embeddings for pages that don't have them
gbrain embed --stale
```
Import is idempotent. Re-running it skips unchanged files (compared by SHA-256 content hash). A progress bar shows status. Expect roughly 30 seconds for a text-only import of 7,000 files and 10 to 15 minutes for embedding.
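A minimal sketch of the hash-based skip check, assuming the stored hash is a hex-encoded SHA-256 digest of the raw file content. The helper names `contentHash` and `needsImport` are hypothetical; the actual import command may normalize content before hashing.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the file content.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical check: a file is re-imported only when its hash differs
// from the hash stored for that page (or when no hash is stored yet).
function needsImport(content: string, storedHash: string | undefined): boolean {
  return storedHash !== contentHash(content);
}

const page = "# Do Things That Don't Scale\n";
const stored = contentHash(page);
console.log(needsImport(page, stored));             // false: unchanged, skipped
console.log(needsImport(page + "edited\n", stored)); // true: content changed
```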
## The knowledge model
Every page in the brain follows the compiled truth + timeline pattern:
```markdown
---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---
Paul Graham's argument that startups should do unscalable things early on.
The most common: recruiting users manually, one at a time. Airbnb went
door to door in New York photographing apartments. Stripe manually
installed their payment integration for early users.
The key insight: the unscalable effort teaches you what users actually
want, which you can't learn any other way.
---
- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
- 2025-02-20: Cited in discussion about AI agent onboarding strategies
```
Above the `---` separator: **compiled truth**. Your current best understanding. Gets rewritten when new evidence changes the picture. Below: **timeline**. Append-only evidence trail. Never edited, only added to.
The compiled truth is the answer. The timeline is the proof.
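As a rough illustration of the layout, the sketch below splits a page into frontmatter, compiled truth, and timeline. It assumes the first `---` pair delimits the frontmatter and the next `---` line separates compiled truth from timeline. `splitPage` is a hypothetical helper; the project's own parser uses gray-matter and may handle edge cases differently.

```typescript
interface ParsedPage {
  frontmatter: string;   // raw YAML text, left unparsed in this sketch
  compiledTruth: string; // everything between frontmatter and the separator
  timeline: string;      // everything after the separator
}

function splitPage(markdown: string): ParsedPage {
  const lines = markdown.split(/\r?\n/);
  const isRule = (line: string): boolean => line.trim() === "---";

  // Optional frontmatter: an opening "---" on the first line and a closing "---".
  let bodyStart = 0;
  let frontmatter = "";
  if (isRule(lines[0] ?? "")) {
    const close = lines.findIndex((line, idx) => idx > 0 && isRule(line));
    if (close !== -1) {
      frontmatter = lines.slice(1, close).join("\n");
      bodyStart = close + 1;
    }
  }

  // The first "---" in the body separates compiled truth from timeline.
  const body = lines.slice(bodyStart);
  const sep = body.findIndex((line) => isRule(line));
  if (sep === -1) {
    return { frontmatter, compiledTruth: body.join("\n").trim(), timeline: "" };
  }
  return {
    frontmatter,
    compiledTruth: body.slice(0, sep).join("\n").trim(),
    timeline: body.slice(sep + 1).join("\n").trim(),
  };
}
```

A page with no separator is treated here as all compiled truth with an empty timeline, which is a choice made for the sketch rather than documented behavior.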
## How search works
```
Query: "when should you ignore conventional wisdom?"
              |
   Multi-query expansion (Claude Haiku)
   "contrarian thinking startups", "going against the crowd"
              |
         +----+----+
         |         |
      Vector    Keyword
      (HNSW     (tsvector +
      cosine)   ts_rank)
         |         |
         +----+----+
              |
   RRF Fusion: score = sum(1/(60 + rank))
              |
   4-Layer Dedup
     1. Best chunk per page
     2. Cosine similarity > 0.85
     3. Type diversity (60% cap)
     4. Per-page chunk cap
              |
   Stale alerts (compiled truth older than latest timeline)
              |
           Results
```
Keyword search alone misses conceptual matches. "Ignore conventional wisdom" won't find an essay titled "The Bus Ticket Theory of Genius" even though it's exactly about that. Vector search alone misses exact phrases when the embedding is diluted by surrounding text. RRF fusion gets both right. Multi-query expansion catches phrasings you didn't think of.
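The fusion step can be sketched in a few lines. This assumes ranks start at 1 and that each input list is already sorted best-first; `rrfFuse` is a hypothetical name, and the shipped implementation may weight or truncate lists differently.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank).
// Assumption: rank is 1-based and k defaults to 60, as in the diagram.
function rrfFuse(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// "b" appears in both lists, so it outranks items found by only one method.
const vectorHits = ["a", "b"];
const keywordHits = ["b", "c"];
console.log(rrfFuse([vectorHits, keywordHits]).map((r) => r.id)); // ["b", "a", "c"]
```

Because only ranks enter the formula, the two retrieval methods do not need comparable raw scores.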
## Database schema
9 tables in Postgres + pgvector:
```
pages                     The core content table
  slug (UNIQUE)           e.g. "concepts/do-things-that-dont-scale"
  type                    person, company, deal, yc, civic, project, concept, source, media
  title, compiled_truth, timeline
  frontmatter (JSONB)     Arbitrary metadata
  search_vector           Trigger-based tsvector (title + compiled_truth + timeline + timeline_entries)
  content_hash            SHA-256 for import idempotency
content_chunks            Chunked content with embeddings
  page_id (FK)            Links to pages
  chunk_text              The chunk content
  chunk_source            'compiled_truth' or 'timeline'
  embedding (vector)      1536-dim from text-embedding-3-large
  HNSW index              Cosine similarity search
links                     Cross-references between pages
  from_page_id, to_page_id
  link_type               knows, invested_in, works_at, founded, references, etc.
tags                      page_id + tag (many-to-many)
timeline_entries          Structured timeline events
  page_id, date, source, summary, detail (markdown)
page_versions             Snapshot history for compiled_truth
  compiled_truth, frontmatter, snapshot_at
raw_data                  Sidecar JSON from external APIs
  page_id, source, data (JSONB)
ingest_log                Audit trail of import/ingest operations
config                    Brain-level settings (embedding model, chunk strategy)
```
Indexes: B-tree on slug/type, GIN on frontmatter/search_vector, HNSW on embeddings, pg_trgm on title for fuzzy slug resolution.
## Chunking
Three strategies, dispatched by content type:
**Recursive** (timeline, bulk import): 5-level delimiter hierarchy (paragraphs, lines, sentences, clauses, words). 300-word chunks with 50-word sentence-aware overlap. Fast, predictable, lossless.
**Semantic** (compiled truth): Embeds each sentence, computes adjacent cosine similarities, applies Savitzky-Golay smoothing to find topic boundaries. Falls back to recursive on failure. Best quality for intelligence assessments.
**LLM-guided** (high-value content, on request): Pre-splits into 128-word candidates, asks Claude Haiku to identify topic shifts in sliding windows. 3 retries per window. Most expensive, best results.
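To make the chunk size and overlap numbers concrete, here is a deliberately simplified sketch of fixed-size word windows with overlap. It is not the 5-level recursive chunker described above: it ignores delimiters and sentence boundaries, and `chunkWords` is a hypothetical name used only for illustration.

```typescript
// Simplified sketch: sliding word windows of `chunkSize` words where
// consecutive chunks share `overlap` words. No delimiter awareness.
function chunkWords(text: string, chunkSize = 300, overlap = 50): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Small numbers make the overlap visible: "d" closes one chunk and opens the next.
console.log(chunkWords("a b c d e f g", 4, 1)); // ["a b c d", "d e f g"]
```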
## Commands
```
SETUP
  gbrain init [--supabase|--url <conn>]     Create brain (guided wizard)
  gbrain upgrade                            Self-update
PAGES
  gbrain get <slug>                         Read a page (supports fuzzy slug matching)
  gbrain put <slug> [< file.md]             Write/update a page (auto-versions)
  gbrain delete <slug>                      Delete a page
  gbrain list [--type T] [--tag T] [-n N]   List pages with filters
SEARCH
  gbrain search <query>                     Keyword search (tsvector)
  gbrain query <question>                   Hybrid search (vector + keyword + RRF + expansion)
IMPORT/EXPORT
  gbrain import <dir> [--no-embed]          Import markdown directory (idempotent)
  gbrain export [--dir ./out/]              Export to markdown (round-trip)
EMBEDDINGS
  gbrain embed [<slug>|--all|--stale]       Generate/refresh embeddings
LINKS + GRAPH
  gbrain link <from> <to> [--type T]        Create typed link
  gbrain unlink <from> <to>                 Remove link
  gbrain backlinks <slug>                   Incoming links
  gbrain graph <slug> [--depth N]           Traverse link graph (recursive CTE, default depth 5)
TAGS
  gbrain tags <slug>                        List tags
  gbrain tag <slug> <tag>                   Add tag
  gbrain untag <slug> <tag>                 Remove tag
TIMELINE
  gbrain timeline [<slug>]                  View timeline entries
  gbrain timeline-add <slug> <date> <text>  Add timeline entry
ADMIN
  gbrain stats                              Brain statistics
  gbrain health                             Health dashboard (embed coverage, stale, orphans)
  gbrain history <slug>                     Page version history
  gbrain revert <slug> <version-id>         Revert to previous version
  gbrain config [get|set] <key> [value]     Brain config
  gbrain serve                              MCP server (stdio)
  gbrain call <tool> '<json>'               Raw tool invocation
  gbrain --tools-json                       Tool discovery (JSON)
```
## Using as a library
GBrain is library-first. The CLI and MCP server are thin wrappers over the engine.
```typescript
import { PostgresEngine } from 'gbrain';

const engine = new PostgresEngine();
await engine.connect({ database_url: process.env.DATABASE_URL });
await engine.initSchema();

// Write a page
await engine.putPage('concepts/superlinear-returns', {
  type: 'concept',
  title: 'Superlinear Returns',
  compiled_truth: 'Paul Graham argues that returns in many fields are superlinear...',
  timeline: '- 2023-10-01: Published on paulgraham.com',
});

// Keyword search (raw, engine-level; hybrid fusion runs above the engine)
const results = await engine.searchKeyword('startup growth');

// Typed links
await engine.addLink('concepts/superlinear-returns', 'concepts/do-things-that-dont-scale', '', 'references');

// Graph traversal
const graph = await engine.traverseGraph('concepts/superlinear-returns', 3);

// Health check
const health = await engine.getHealth();
// { page_count: 10, embed_coverage: 1.0, stale_pages: 0, orphan_pages: 10 }
```
The `BrainEngine` interface is pluggable. See `docs/ENGINES.md` for how to add backends.
## MCP server
Add to your Claude Code or Cursor MCP config:
```json
{
  "mcpServers": {
    "gbrain": {
      "command": "gbrain",
      "args": ["serve"]
    }
  }
}
```
20 tools: get_page, put_page, delete_page, list_pages, search, query, add_tag, remove_tag, get_tags, add_link, remove_link, get_links, get_backlinks, traverse_graph, add_timeline_entry, get_timeline, get_stats, get_health, get_versions, revert_version.
Every tool mirrors a CLI command. Drift tests verify identical behavior.
## Skills
Fat markdown files that tell AI agents HOW to use gbrain. No skill logic in the binary.
| Skill | What it does |
|-------|-------------|
| **ingest** | Ingest meetings, docs, articles. Updates compiled truth (rewrite, not append), appends timeline, creates cross-reference links across all mentioned entities. |
| **query** | 3-layer search (keyword + vector + structured) with synthesis and citations. Says "the brain doesn't have info on X" rather than hallucinating. |
| **maintain** | Periodic health: find contradictions, stale compiled truth, orphan pages, dead links, tag inconsistency, missing embeddings, overdue threads. |
| **enrich** | Enrich pages from external APIs. Raw data stored separately, distilled highlights go to compiled truth. |
| **briefing** | Daily briefing: today's meetings with participant context, active deals with deadlines, time-sensitive threads, recent changes. |
| **migrate** | Universal migration from Obsidian (wikilinks to gbrain links), Notion (stripped UUIDs), Logseq (block refs), plain markdown, CSV, JSON, Roam. |
## Architecture
```
          CLI / MCP Server
  (thin wrappers, identical operations)
                  |
        BrainEngine interface
         (pluggable backend)
                  |
         +--------+--------+
         |                 |
   PostgresEngine     SQLiteEngine
     (ships v0)       (designed, community PRs welcome)
         |
 Supabase Pro ($25/mo)
 Postgres + pgvector + pg_trgm
 connection pooling via Supavisor
```
Embedding, chunking, and search fusion are engine-agnostic. Only raw keyword search (`searchKeyword`) and raw vector search (`searchVector`) are engine-specific. RRF fusion, multi-query expansion, and 4-layer dedup run above the engine on `SearchResult[]` arrays.
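Because these stages operate on plain arrays, they can be written as pure functions. The sketch below shows two of the four dedup layers under stated assumptions: the `SearchResult` fields are guessed for illustration, the function names are hypothetical, and the 60% cap is interpreted here as a share of the requested result count, which may not match the real pipeline.

```typescript
// Assumed shape; the real SearchResult type may carry different fields.
interface SearchResult {
  slug: string;  // page identifier
  type: string;  // page type, e.g. "concept" or "person"
  score: number; // fused relevance score, higher is better
  text: string;  // chunk text
}

// Layer 1 (sketch): keep only the highest-scoring chunk for each page.
function bestChunkPerPage(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const current = best.get(r.slug);
    if (!current || r.score > current.score) best.set(r.slug, r);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}

// Layer 3 (sketch): no page type may fill more than maxSharePercent of the
// requested slots. Input is expected to be sorted best-first.
function capTypeShare(
  results: SearchResult[],
  limit: number,
  maxSharePercent = 60,
): SearchResult[] {
  const maxPerType = Math.max(1, Math.floor((limit * maxSharePercent) / 100));
  const counts = new Map<string, number>();
  const kept: SearchResult[] = [];
  for (const r of results) {
    if (kept.length >= limit) break;
    const seen = counts.get(r.type) ?? 0;
    if (seen >= maxPerType) continue; // this type already used its share
    counts.set(r.type, seen + 1);
    kept.push(r);
  }
  return kept;
}
```

Neither function touches the database, which is what lets any engine that returns ranked results reuse them unchanged.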
## Storage estimates
For a brain with ~7,500 pages:
| Component | Size |
|-----------|------|
| Page text (compiled_truth + timeline) | ~150MB |
| JSONB frontmatter + indexes | ~70MB |
| Content chunks (~22K, text) | ~80MB |
| Embeddings (22K x 1536 floats) | ~134MB |
| HNSW index overhead | ~270MB |
| Links, tags, timeline, versions | ~50MB |
| **Total** | **~750MB** |
Supabase free tier (500MB) won't fit a large brain. Supabase Pro ($25/mo, 8GB) is the starting point.
Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.
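The embeddings row in the table can be checked with simple arithmetic. The sketch assumes each vector component is stored as a 4-byte float and ignores per-row overhead; `embeddingBytes` is a hypothetical helper.

```typescript
// Assumption: 4 bytes per component (float32), no per-row overhead counted.
const BYTES_PER_FLOAT32 = 4;

function embeddingBytes(chunks: number, dims: number): number {
  return chunks * dims * BYTES_PER_FLOAT32;
}

const bytes = embeddingBytes(22_000, 1536);
console.log(bytes);                    // 135168000
console.log((bytes / 1e6).toFixed(1)); // "135.2" (decimal megabytes)
```

That lands within about 1% of the ~134MB estimate in the table.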
## Docs
- [GBRAIN_V0.md](docs/GBRAIN_V0.md) -- Full product spec, all architecture decisions, every option considered
- [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface, capability matrix, how to add backends
- [SQLITE_ENGINE.md](docs/SQLITE_ENGINE.md) -- Complete SQLite engine plan with schema, FTS5, vector search options
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md). Welcome PRs for:
- SQLite engine implementation
- Docker Compose for self-hosted Postgres
- Additional migration sources
- New enrichment API integrations
## License
MIT

1
VERSION Normal file

@@ -0,0 +1 @@
0.1.0

289
bun.lock Normal file
View File

@@ -0,0 +1,289 @@
{
"lockfileVersion": 1,
"configVersion": 1,
"workspaces": {
"": {
"name": "gbrain",
"dependencies": {
"@anthropic-ai/sdk": "^0.30.0",
"@modelcontextprotocol/sdk": "^1.0.0",
"gray-matter": "^4.0.3",
"openai": "^4.0.0",
"pgvector": "^0.2.0",
"postgres": "^3.4.0",
},
"devDependencies": {
"@types/bun": "latest",
},
},
},
"packages": {
"@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.30.1", "", { "dependencies": { "@types/node": "^18.11.18", "@types/node-fetch": "^2.6.4", "abort-controller": "^3.0.0", "agentkeepalive": "^4.2.1", "form-data-encoder": "1.7.2", "formdata-node": "^4.3.2", "node-fetch": "^2.6.7" } }, "sha512-nuKvp7wOIz6BFei8WrTdhmSsx5mwnArYyJgh4+vYu3V4J0Ltb8Xm3odPm51n1aSI0XxNCrDl7O88cxCtUdAkaw=="],
"@hono/node-server": ["@hono/node-server@1.19.12", "", { "peerDependencies": { "hono": "^4" } }, "sha512-txsUW4SQ1iilgE0l9/e9VQWmELXifEFvmdA1j6WFh/aFPj99hIntrSsq/if0UWyGVkmrRPKA1wCeP+UCr1B9Uw=="],
"@modelcontextprotocol/sdk": ["@modelcontextprotocol/sdk@1.29.0", "", { "dependencies": { "@hono/node-server": "^1.19.9", "ajv": "^8.17.1", "ajv-formats": "^3.0.1", "content-type": "^1.0.5", "cors": "^2.8.5", "cross-spawn": "^7.0.5", "eventsource": "^3.0.2", "eventsource-parser": "^3.0.0", "express": "^5.2.1", "express-rate-limit": "^8.2.1", "hono": "^4.11.4", "jose": "^6.1.3", "json-schema-typed": "^8.0.2", "pkce-challenge": "^5.0.0", "raw-body": "^3.0.0", "zod": "^3.25 || ^4.0", "zod-to-json-schema": "^3.25.1" }, "peerDependencies": { "@cfworker/json-schema": "^4.1.1" }, "optionalPeers": ["@cfworker/json-schema"] }, "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ=="],
"@types/bun": ["@types/bun@1.3.11", "", { "dependencies": { "bun-types": "1.3.11" } }, "sha512-5vPne5QvtpjGpsGYXiFyycfpDF2ECyPcTSsFBMa0fraoxiQyMJ3SmuQIGhzPg2WJuWxVBoxWJ2kClYTcw/4fAg=="],
"@types/node": ["@types/node@18.19.130", "", { "dependencies": { "undici-types": "~5.26.4" } }, "sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg=="],
"@types/node-fetch": ["@types/node-fetch@2.6.13", "", { "dependencies": { "@types/node": "*", "form-data": "^4.0.4" } }, "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw=="],
"abort-controller": ["abort-controller@3.0.0", "", { "dependencies": { "event-target-shim": "^5.0.0" } }, "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg=="],
"accepts": ["accepts@2.0.0", "", { "dependencies": { "mime-types": "^3.0.0", "negotiator": "^1.0.0" } }, "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng=="],
"agentkeepalive": ["agentkeepalive@4.6.0", "", { "dependencies": { "humanize-ms": "^1.2.1" } }, "sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ=="],
"ajv": ["ajv@8.18.0", "", { "dependencies": { "fast-deep-equal": "^3.1.3", "fast-uri": "^3.0.1", "json-schema-traverse": "^1.0.0", "require-from-string": "^2.0.2" } }, "sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A=="],
"ajv-formats": ["ajv-formats@3.0.1", "", { "dependencies": { "ajv": "^8.0.0" } }, "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ=="],
"argparse": ["argparse@1.0.10", "", { "dependencies": { "sprintf-js": "~1.0.2" } }, "sha512-o5Roy6tNG4SL/FOkCAN6RzjiakZS25RLYFrcMttJqbdd8BWrnA+fGz57iN5Pb06pvBGvl5gQ0B48dJlslXvoTg=="],
"asynckit": ["asynckit@0.4.0", "", {}, "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="],
"body-parser": ["body-parser@2.2.2", "", { "dependencies": { "bytes": "^3.1.2", "content-type": "^1.0.5", "debug": "^4.4.3", "http-errors": "^2.0.0", "iconv-lite": "^0.7.0", "on-finished": "^2.4.1", "qs": "^6.14.1", "raw-body": "^3.0.1", "type-is": "^2.0.1" } }, "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA=="],
"bun-types": ["bun-types@1.3.11", "", { "dependencies": { "@types/node": "*" } }, "sha512-1KGPpoxQWl9f6wcZh57LvrPIInQMn2TQ7jsgxqpRzg+l0QPOFvJVH7HmvHo/AiPgwXy+/Thf6Ov3EdVn1vOabg=="],
"bytes": ["bytes@3.1.2", "", {}, "sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg=="],
"call-bind-apply-helpers": ["call-bind-apply-helpers@1.0.2", "", { "dependencies": { "es-errors": "^1.3.0", "function-bind": "^1.1.2" } }, "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ=="],
"call-bound": ["call-bound@1.0.4", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.2", "get-intrinsic": "^1.3.0" } }, "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg=="],
"combined-stream": ["combined-stream@1.0.8", "", { "dependencies": { "delayed-stream": "~1.0.0" } }, "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg=="],
"content-disposition": ["content-disposition@1.0.1", "", {}, "sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q=="],
"content-type": ["content-type@1.0.5", "", {}, "sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA=="],
"cookie": ["cookie@0.7.2", "", {}, "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w=="],
"cookie-signature": ["cookie-signature@1.2.2", "", {}, "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg=="],
"cors": ["cors@2.8.6", "", { "dependencies": { "object-assign": "^4", "vary": "^1" } }, "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw=="],
"cross-spawn": ["cross-spawn@7.0.6", "", { "dependencies": { "path-key": "^3.1.0", "shebang-command": "^2.0.0", "which": "^2.0.1" } }, "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA=="],
"debug": ["debug@4.4.3", "", { "dependencies": { "ms": "^2.1.3" } }, "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA=="],
"delayed-stream": ["delayed-stream@1.0.0", "", {}, "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ=="],
"depd": ["depd@2.0.0", "", {}, "sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw=="],
"dunder-proto": ["dunder-proto@1.0.1", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.1", "es-errors": "^1.3.0", "gopd": "^1.2.0" } }, "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A=="],
"ee-first": ["ee-first@1.1.1", "", {}, "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow=="],
"encodeurl": ["encodeurl@2.0.0", "", {}, "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg=="],
"es-define-property": ["es-define-property@1.0.1", "", {}, "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g=="],
"es-errors": ["es-errors@1.3.0", "", {}, "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw=="],
"es-object-atoms": ["es-object-atoms@1.1.1", "", { "dependencies": { "es-errors": "^1.3.0" } }, "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA=="],
"es-set-tostringtag": ["es-set-tostringtag@2.1.0", "", { "dependencies": { "es-errors": "^1.3.0", "get-intrinsic": "^1.2.6", "has-tostringtag": "^1.0.2", "hasown": "^2.0.2" } }, "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA=="],
"escape-html": ["escape-html@1.0.3", "", {}, "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow=="],
"esprima": ["esprima@4.0.1", "", { "bin": { "esparse": "./bin/esparse.js", "esvalidate": "./bin/esvalidate.js" } }, "sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A=="],
"etag": ["etag@1.8.1", "", {}, "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg=="],
"event-target-shim": ["event-target-shim@5.0.1", "", {}, "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ=="],
"eventsource": ["eventsource@3.0.7", "", { "dependencies": { "eventsource-parser": "^3.0.1" } }, "sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA=="],
"eventsource-parser": ["eventsource-parser@3.0.6", "", {}, "sha512-Vo1ab+QXPzZ4tCa8SwIHJFaSzy4R6SHf7BY79rFBDf0idraZWAkYrDjDj8uWaSm3S2TK+hJ7/t1CEmZ7jXw+pg=="],
"express": ["express@5.2.1", "", { "dependencies": { "accepts": "^2.0.0", "body-parser": "^2.2.1", "content-disposition": "^1.0.0", "content-type": "^1.0.5", "cookie": "^0.7.1", "cookie-signature": "^1.2.1", "debug": "^4.4.0", "depd": "^2.0.0", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "etag": "^1.8.1", "finalhandler": "^2.1.0", "fresh": "^2.0.0", "http-errors": "^2.0.0", "merge-descriptors": "^2.0.0", "mime-types": "^3.0.0", "on-finished": "^2.4.1", "once": "^1.4.0", "parseurl": "^1.3.3", "proxy-addr": "^2.0.7", "qs": "^6.14.0", "range-parser": "^1.2.1", "router": "^2.2.0", "send": "^1.1.0", "serve-static": "^2.2.0", "statuses": "^2.0.1", "type-is": "^2.0.1", "vary": "^1.1.2" } }, "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw=="],
"express-rate-limit": ["express-rate-limit@8.3.2", "", { "dependencies": { "ip-address": "10.1.0" }, "peerDependencies": { "express": ">= 4.11" } }, "sha512-77VmFeJkO0/rvimEDuUC5H30oqUC4EyOhyGccfqoLebB0oiEYfM7nwPrsDsBL1gsTpwfzX8SFy2MT3TDyRq+bg=="],
"extend-shallow": ["extend-shallow@2.0.1", "", { "dependencies": { "is-extendable": "^0.1.0" } }, "sha512-zCnTtlxNoAiDc3gqY2aYAWFx7XWWiasuF2K8Me5WbN8otHKTUKBwjPtNpRs/rbUZm7KxWAaNj7P1a/p52GbVug=="],
"fast-deep-equal": ["fast-deep-equal@3.1.3", "", {}, "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q=="],
"fast-uri": ["fast-uri@3.1.0", "", {}, "sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA=="],
"finalhandler": ["finalhandler@2.1.1", "", { "dependencies": { "debug": "^4.4.0", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "on-finished": "^2.4.1", "parseurl": "^1.3.3", "statuses": "^2.0.1" } }, "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA=="],
"form-data": ["form-data@4.0.5", "", { "dependencies": { "asynckit": "^0.4.0", "combined-stream": "^1.0.8", "es-set-tostringtag": "^2.1.0", "hasown": "^2.0.2", "mime-types": "^2.1.12" } }, "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w=="],
"form-data-encoder": ["form-data-encoder@1.7.2", "", {}, "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A=="],
"formdata-node": ["formdata-node@4.4.1", "", { "dependencies": { "node-domexception": "1.0.0", "web-streams-polyfill": "4.0.0-beta.3" } }, "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ=="],
"forwarded": ["forwarded@0.2.0", "", {}, "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow=="],
"fresh": ["fresh@2.0.0", "", {}, "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A=="],
"function-bind": ["function-bind@1.1.2", "", {}, "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA=="],
"get-intrinsic": ["get-intrinsic@1.3.0", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.2", "es-define-property": "^1.0.1", "es-errors": "^1.3.0", "es-object-atoms": "^1.1.1", "function-bind": "^1.1.2", "get-proto": "^1.0.1", "gopd": "^1.2.0", "has-symbols": "^1.1.0", "hasown": "^2.0.2", "math-intrinsics": "^1.1.0" } }, "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ=="],
"get-proto": ["get-proto@1.0.1", "", { "dependencies": { "dunder-proto": "^1.0.1", "es-object-atoms": "^1.0.0" } }, "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g=="],
"gopd": ["gopd@1.2.0", "", {}, "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg=="],
"gray-matter": ["gray-matter@4.0.3", "", { "dependencies": { "js-yaml": "^3.13.1", "kind-of": "^6.0.2", "section-matter": "^1.0.0", "strip-bom-string": "^1.0.0" } }, "sha512-5v6yZd4JK3eMI3FqqCouswVqwugaA9r4dNZB1wwcmrD02QkV5H0y7XBQW8QwQqEaZY1pM9aqORSORhJRdNK44Q=="],
"has-symbols": ["has-symbols@1.1.0", "", {}, "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ=="],
"has-tostringtag": ["has-tostringtag@1.0.2", "", { "dependencies": { "has-symbols": "^1.0.3" } }, "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw=="],
"hasown": ["hasown@2.0.2", "", { "dependencies": { "function-bind": "^1.1.2" } }, "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ=="],
"hono": ["hono@4.12.10", "", {}, "sha512-mx/p18PLy5og9ufies2GOSUqep98Td9q4i/EF6X7yJgAiIopxqdfIO3jbqsi3jRgTgw88jMDEzVKi+V2EF+27w=="],
"http-errors": ["http-errors@2.0.1", "", { "dependencies": { "depd": "~2.0.0", "inherits": "~2.0.4", "setprototypeof": "~1.2.0", "statuses": "~2.0.2", "toidentifier": "~1.0.1" } }, "sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ=="],
"humanize-ms": ["humanize-ms@1.2.1", "", { "dependencies": { "ms": "^2.0.0" } }, "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ=="],
"iconv-lite": ["iconv-lite@0.7.2", "", { "dependencies": { "safer-buffer": ">= 2.1.2 < 3.0.0" } }, "sha512-im9DjEDQ55s9fL4EYzOAv0yMqmMBSZp6G0VvFyTMPKWxiSBHUj9NW/qqLmXUwXrrM7AvqSlTCfvqRb0cM8yYqw=="],
"inherits": ["inherits@2.0.4", "", {}, "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ=="],
"ip-address": ["ip-address@10.1.0", "", {}, "sha512-XXADHxXmvT9+CRxhXg56LJovE+bmWnEWB78LB83VZTprKTmaC5QfruXocxzTZ2Kl0DNwKuBdlIhjL8LeY8Sf8Q=="],
"ipaddr.js": ["ipaddr.js@1.9.1", "", {}, "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g=="],
"is-extendable": ["is-extendable@0.1.1", "", {}, "sha512-5BMULNob1vgFX6EjQw5izWDxrecWK9AM72rugNr0TFldMOi0fj6Jk+zeKIt0xGj4cEfQIJth4w3OKWOJ4f+AFw=="],
"is-promise": ["is-promise@4.0.0", "", {}, "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ=="],
"isexe": ["isexe@2.0.0", "", {}, "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw=="],
"jose": ["jose@6.2.2", "", {}, "sha512-d7kPDd34KO/YnzaDOlikGpOurfF0ByC2sEV4cANCtdqLlTfBlw2p14O/5d/zv40gJPbIQxfES3nSx1/oYNyuZQ=="],
"js-yaml": ["js-yaml@3.14.2", "", { "dependencies": { "argparse": "^1.0.7", "esprima": "^4.0.0" }, "bin": { "js-yaml": "bin/js-yaml.js" } }, "sha512-PMSmkqxr106Xa156c2M265Z+FTrPl+oxd/rgOQy2tijQeK5TxQ43psO1ZCwhVOSdnn+RzkzlRz/eY4BgJBYVpg=="],
"json-schema-traverse": ["json-schema-traverse@1.0.0", "", {}, "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="],
"json-schema-typed": ["json-schema-typed@8.0.2", "", {}, "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA=="],
"kind-of": ["kind-of@6.0.3", "", {}, "sha512-dcS1ul+9tmeD95T+x28/ehLgd9mENa3LsvDTtzm3vyBEO7RPptvAD+t44WVXaUjTBRcrpFeFlC8WCruUR456hw=="],
"math-intrinsics": ["math-intrinsics@1.1.0", "", {}, "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g=="],
"media-typer": ["media-typer@1.1.0", "", {}, "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw=="],
"merge-descriptors": ["merge-descriptors@2.0.0", "", {}, "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g=="],
"mime-db": ["mime-db@1.54.0", "", {}, "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ=="],
"mime-types": ["mime-types@3.0.2", "", { "dependencies": { "mime-db": "^1.54.0" } }, "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A=="],
"ms": ["ms@2.1.3", "", {}, "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="],
"negotiator": ["negotiator@1.0.0", "", {}, "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg=="],
"node-domexception": ["node-domexception@1.0.0", "", {}, "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ=="],
"node-fetch": ["node-fetch@2.7.0", "", { "dependencies": { "whatwg-url": "^5.0.0" }, "peerDependencies": { "encoding": "^0.1.0" }, "optionalPeers": ["encoding"] }, "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A=="],
"object-assign": ["object-assign@4.1.1", "", {}, "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg=="],
"object-inspect": ["object-inspect@1.13.4", "", {}, "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew=="],
"on-finished": ["on-finished@2.4.1", "", { "dependencies": { "ee-first": "1.1.1" } }, "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg=="],
"once": ["once@1.4.0", "", { "dependencies": { "wrappy": "1" } }, "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w=="],
"openai": ["openai@4.104.0", "", { "dependencies": { "@types/node": "^18.11.18", "@types/node-fetch": "^2.6.4", "abort-controller": "^3.0.0", "agentkeepalive": "^4.2.1", "form-data-encoder": "1.7.2", "formdata-node": "^4.3.2", "node-fetch": "^2.6.7" }, "peerDependencies": { "ws": "^8.18.0", "zod": "^3.23.8" }, "optionalPeers": ["ws", "zod"], "bin": { "openai": "bin/cli" } }, "sha512-p99EFNsA/yX6UhVO93f5kJsDRLAg+CTA2RBqdHK4RtK8u5IJw32Hyb2dTGKbnnFmnuoBv5r7Z2CURI9sGZpSuA=="],
"parseurl": ["parseurl@1.3.3", "", {}, "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ=="],
"path-key": ["path-key@3.1.1", "", {}, "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q=="],
"path-to-regexp": ["path-to-regexp@8.4.2", "", {}, "sha512-qRcuIdP69NPm4qbACK+aDogI5CBDMi1jKe0ry5rSQJz8JVLsC7jV8XpiJjGRLLol3N+R5ihGYcrPLTno6pAdBA=="],
"pgvector": ["pgvector@0.2.1", "", {}, "sha512-nKaQY9wtuiidwLMdVIce1O3kL0d+FxrigCVzsShnoqzOSaWWWOvuctb/sYwlai5cTwwzRSNa+a/NtN2kVZGNJw=="],
"pkce-challenge": ["pkce-challenge@5.0.1", "", {}, "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ=="],
"postgres": ["postgres@3.4.9", "", {}, "sha512-GD3qdB0x1z9xgFI6cdRD6xu2Sp2WCOEoe3mtnyB5Ee0XrrL5Pe+e4CCnJrRMnL1zYtRDZmQQVbvOttLnKDLnaw=="],
"proxy-addr": ["proxy-addr@2.0.7", "", { "dependencies": { "forwarded": "0.2.0", "ipaddr.js": "1.9.1" } }, "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg=="],
"qs": ["qs@6.15.0", "", { "dependencies": { "side-channel": "^1.1.0" } }, "sha512-mAZTtNCeetKMH+pSjrb76NAM8V9a05I9aBZOHztWy/UqcJdQYNsf59vrRKWnojAT9Y+GbIvoTBC++CPHqpDBhQ=="],
"range-parser": ["range-parser@1.2.1", "", {}, "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg=="],
"raw-body": ["raw-body@3.0.2", "", { "dependencies": { "bytes": "~3.1.2", "http-errors": "~2.0.1", "iconv-lite": "~0.7.0", "unpipe": "~1.0.0" } }, "sha512-K5zQjDllxWkf7Z5xJdV0/B0WTNqx6vxG70zJE4N0kBs4LovmEYWJzQGxC9bS9RAKu3bgM40lrd5zoLJ12MQ5BA=="],
"require-from-string": ["require-from-string@2.0.2", "", {}, "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw=="],
"router": ["router@2.2.0", "", { "dependencies": { "debug": "^4.4.0", "depd": "^2.0.0", "is-promise": "^4.0.0", "parseurl": "^1.3.3", "path-to-regexp": "^8.0.0" } }, "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ=="],
"safer-buffer": ["safer-buffer@2.1.2", "", {}, "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg=="],
"section-matter": ["section-matter@1.0.0", "", { "dependencies": { "extend-shallow": "^2.0.1", "kind-of": "^6.0.0" } }, "sha512-vfD3pmTzGpufjScBh50YHKzEu2lxBWhVEHsNGoEXmCmn2hKGfeNLYMzCJpe8cD7gqX7TJluOVpBkAequ6dgMmA=="],
"send": ["send@1.2.1", "", { "dependencies": { "debug": "^4.4.3", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "etag": "^1.8.1", "fresh": "^2.0.0", "http-errors": "^2.0.1", "mime-types": "^3.0.2", "ms": "^2.1.3", "on-finished": "^2.4.1", "range-parser": "^1.2.1", "statuses": "^2.0.2" } }, "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ=="],
"serve-static": ["serve-static@2.2.1", "", { "dependencies": { "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "parseurl": "^1.3.3", "send": "^1.2.0" } }, "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw=="],
"setprototypeof": ["setprototypeof@1.2.0", "", {}, "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw=="],
"shebang-command": ["shebang-command@2.0.0", "", { "dependencies": { "shebang-regex": "^3.0.0" } }, "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA=="],
"shebang-regex": ["shebang-regex@3.0.0", "", {}, "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A=="],
"side-channel": ["side-channel@1.1.0", "", { "dependencies": { "es-errors": "^1.3.0", "object-inspect": "^1.13.3", "side-channel-list": "^1.0.0", "side-channel-map": "^1.0.1", "side-channel-weakmap": "^1.0.2" } }, "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw=="],
"side-channel-list": ["side-channel-list@1.0.0", "", { "dependencies": { "es-errors": "^1.3.0", "object-inspect": "^1.13.3" } }, "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA=="],
"side-channel-map": ["side-channel-map@1.0.1", "", { "dependencies": { "call-bound": "^1.0.2", "es-errors": "^1.3.0", "get-intrinsic": "^1.2.5", "object-inspect": "^1.13.3" } }, "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA=="],
"side-channel-weakmap": ["side-channel-weakmap@1.0.2", "", { "dependencies": { "call-bound": "^1.0.2", "es-errors": "^1.3.0", "get-intrinsic": "^1.2.5", "object-inspect": "^1.13.3", "side-channel-map": "^1.0.1" } }, "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A=="],
"sprintf-js": ["sprintf-js@1.0.3", "", {}, "sha512-D9cPgkvLlV3t3IzL0D0YLvGA9Ahk4PcvVwUbN0dSGr1aP0Nrt4AEnTUbuGvquEC0mA64Gqt1fzirlRs5ibXx8g=="],
"statuses": ["statuses@2.0.2", "", {}, "sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw=="],
"strip-bom-string": ["strip-bom-string@1.0.0", "", {}, "sha512-uCC2VHvQRYu+lMh4My/sFNmF2klFymLX1wHJeXnbEJERpV/ZsVuonzerjfrGpIGF7LBVa1O7i9kjiWvJiFck8g=="],
"toidentifier": ["toidentifier@1.0.1", "", {}, "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA=="],
"tr46": ["tr46@0.0.3", "", {}, "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw=="],
"type-is": ["type-is@2.0.1", "", { "dependencies": { "content-type": "^1.0.5", "media-typer": "^1.1.0", "mime-types": "^3.0.0" } }, "sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw=="],
"undici-types": ["undici-types@5.26.5", "", {}, "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA=="],
"unpipe": ["unpipe@1.0.0", "", {}, "sha512-pjy2bYhSsufwWlKwPc+l3cN7+wuJlK6uz0YdJEOlQDbl6jo/YlPi4mb8agUkVC8BF7V8NuzeyPNqRksA3hztKQ=="],
"vary": ["vary@1.1.2", "", {}, "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg=="],
"web-streams-polyfill": ["web-streams-polyfill@4.0.0-beta.3", "", {}, "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug=="],
"webidl-conversions": ["webidl-conversions@3.0.1", "", {}, "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ=="],
"whatwg-url": ["whatwg-url@5.0.0", "", { "dependencies": { "tr46": "~0.0.3", "webidl-conversions": "^3.0.0" } }, "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw=="],
"which": ["which@2.0.2", "", { "dependencies": { "isexe": "^2.0.0" }, "bin": { "node-which": "./bin/node-which" } }, "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA=="],
"wrappy": ["wrappy@1.0.2", "", {}, "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ=="],
"zod": ["zod@4.3.6", "", {}, "sha512-rftlrkhHZOcjDwkGlnUtZZkvaPHCsDATp4pGpuOOMDaTdDDXF91wuVDJoWoPsKX/3YPQ5fHuF3STjcYyKr+Qhg=="],
"zod-to-json-schema": ["zod-to-json-schema@3.25.2", "", { "peerDependencies": { "zod": "^3.25.28 || ^4" } }, "sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA=="],
"@types/node-fetch/@types/node": ["@types/node@25.5.2", "", { "dependencies": { "undici-types": "~7.18.0" } }, "sha512-tO4ZIRKNC+MDWV4qKVZe3Ql/woTnmHDr5JD8UI5hn2pwBrHEwOEMZK7WlNb5RKB6EoJ02gwmQS9OrjuFnZYdpg=="],
"bun-types/@types/node": ["@types/node@25.5.2", "", { "dependencies": { "undici-types": "~7.18.0" } }, "sha512-tO4ZIRKNC+MDWV4qKVZe3Ql/woTnmHDr5JD8UI5hn2pwBrHEwOEMZK7WlNb5RKB6EoJ02gwmQS9OrjuFnZYdpg=="],
"form-data/mime-types": ["mime-types@2.1.35", "", { "dependencies": { "mime-db": "1.52.0" } }, "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw=="],
"@types/node-fetch/@types/node/undici-types": ["undici-types@7.18.2", "", {}, "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w=="],
"bun-types/@types/node/undici-types": ["undici-types@7.18.2", "", {}, "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w=="],
"form-data/mime-types/mime-db": ["mime-db@1.52.0", "", {}, "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg=="],
}
}

198
docs/ENGINES.md Normal file

@@ -0,0 +1,198 @@
# Pluggable Engine Architecture
## The idea
Every GBrain operation goes through `BrainEngine`. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.
v0 ships `PostgresEngine` backed by Supabase. The interface is designed so a `SQLiteEngine`, `DuckDBEngine`, or `TursoEngine` could slot in without touching the CLI, MCP server, skills, or any consumer code.
## Why this matters
Different users have different constraints:
| User | Needs | Best engine |
|------|-------|-------------|
| Power user (you) | World-class search, 7K+ pages, zero-ops | PostgresEngine + Supabase |
| Open source hacker | Single file, no server, git-friendly | SQLiteEngine (future) |
| Team/enterprise | Multi-user, RLS, audit trail | PostgresEngine + self-hosted |
| Researcher | Analytics, bulk exports, embeddings | DuckDBEngine (someday) |
| Edge/mobile | Offline-first, sync later | SQLiteEngine + sync (someday) |
The engine interface means we don't have to choose. Ship Postgres now, let the community build the rest.
## The interface
```typescript
// src/core/engine.ts
export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
  disconnect(): Promise<void>;
  initSchema(): Promise<void>;
  transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;
  // Pages CRUD
  getPage(slug: string): Promise<Page | null>;
  putPage(slug: string, page: PageInput): Promise<Page>;
  deletePage(slug: string): Promise<void>;
  listPages(filters: PageFilters): Promise<Page[]>;
  // Search
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;
  // Chunks
  upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
  getChunks(slug: string): Promise<Chunk[]>;
  // Links
  addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
  removeLink(from: string, to: string): Promise<void>;
  getLinks(slug: string): Promise<Link[]>;
  getBacklinks(slug: string): Promise<Link[]>;
  traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;
  // Tags
  addTag(slug: string, tag: string): Promise<void>;
  removeTag(slug: string, tag: string): Promise<void>;
  getTags(slug: string): Promise<string[]>;
  // Timeline
  addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
  getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;
  // Raw data
  putRawData(slug: string, source: string, data: object): Promise<void>;
  getRawData(slug: string, source?: string): Promise<RawData[]>;
  // Versions
  createVersion(slug: string): Promise<PageVersion>;
  getVersions(slug: string): Promise<PageVersion[]>;
  revertToVersion(slug: string, versionId: number): Promise<void>;
  // Stats + health
  getStats(): Promise<BrainStats>;
  getHealth(): Promise<BrainHealth>;
  // Ingest log
  logIngest(entry: IngestLogInput): Promise<void>;
  getIngestLog(opts?: IngestLogOpts): Promise<IngestLogEntry[]>;
  // Config
  getConfig(key: string): Promise<string | null>;
  setConfig(key: string, value: string): Promise<void>;
}
```
### Key design choices
**Slug-based API, not ID-based.** Every method takes slugs, not numeric IDs. The engine resolves slugs to IDs internally. This keeps the interface portable: slugs are plain strings, while IDs are database-specific.
**Embedding is NOT in the engine.** The engine stores embeddings and searches by vector, but it doesn't generate embeddings. `src/core/embedding.ts` handles that. This is intentional: embedding is an external API call (OpenAI), not a storage concern. All engines share the same embedding service.
**Chunking is NOT in the engine.** Same logic. `src/core/chunkers/` handles chunking. The engine stores and retrieves chunks. All engines share the same chunkers.
**Search returns `SearchResult[]`, not raw rows.** The engine is responsible for its own search implementation (tsvector vs FTS5, pgvector vs sqlite-vss) but must return a uniform result type. RRF fusion and dedup happen above the engine, in `src/core/search/hybrid.ts`.
**`traverseGraph` is part of the interface, but its implementation is engine-specific.** Postgres uses recursive CTEs. SQLite would use a loop with depth tracking. The contract is the same either way: given a slug and a max depth, return the reachable graph.
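The "loop with depth tracking" approach can be sketched as a breadth-first walk. This is only an illustration, not the planned SQLite implementation: `GraphNodeSketch`, `traverseGraphLoop`, and the synchronous `getLinks` callback are hypothetical names, and the real `GraphNode` type is not shown in this document.

```typescript
// Hypothetical node shape for illustration; the real GraphNode type may differ.
interface GraphNodeSketch {
  slug: string;
  depth: number;
}

// getLinks stands in for a per-slug outgoing-link lookup (for example, one query per call).
function traverseGraphLoop(
  start: string,
  maxDepth: number,
  getLinks: (slug: string) => string[],
): GraphNodeSketch[] {
  const visited = new Set<string>([start]);
  const result: GraphNodeSketch[] = [{ slug: start, depth: 0 }];
  let frontier: string[] = [start];

  // Expand one level per iteration until maxDepth is reached or nothing new is found.
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const slug of frontier) {
      for (const target of getLinks(slug)) {
        if (visited.has(target)) continue; // skip already-seen pages, which also breaks cycles
        visited.add(target);
        result.push({ slug: target, depth });
        next.push(target);
      }
    }
    frontier = next;
  }
  return result;
}

// Toy link table with a cycle between alice and bob.
const sampleLinks: Record<string, string[]> = {
  "people/alice": ["companies/acme", "people/bob"],
  "people/bob": ["people/alice", "projects/x"],
  "companies/acme": [],
  "projects/x": [],
};
const oneHop = traverseGraphLoop("people/alice", 1, (s) => sampleLinks[s] ?? []);
const twoHops = traverseGraphLoop("people/alice", 2, (s) => sampleLinks[s] ?? []);
```

The visited set is what keeps the loop finite on cyclic link graphs; a recursive CTE needs an equivalent guard.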
## How search works across engines
```
                      +-------------------+
                      |     hybrid.ts     |
                      |   (RRF fusion +   |
                      |  dedup, shared)   |
                      +--------+----------+
                               |
                 +-------------+-------------+
                 |                           |
        +--------v--------+         +--------v--------+
        |  engine.search  |         |  engine.search  |
        |    Keyword()    |         |    Vector()     |
        +--------+--------+         +--------+--------+
                 |                           |
        +--------+--------+             +----+-----+
        |                 |             |          |
+-------v-------+ +-------v---+ +-------v---+ +----v--------+
| Postgres:     | | SQLite:   | | Postgres: | | SQLite:     |
| tsvector +    | | FTS5 +    | | pgvector  | | sqlite-vss  |
| ts_rank +     | | bm25      | | HNSW      | | or vec0     |
| websearch_to_ | |           | | cosine    | |             |
| tsquery       | |           | |           | |             |
+---------------+ +-----------+ +-----------+ +-------------+
```
RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They operate on `SearchResult[]` arrays. Only the raw keyword and vector searches are engine-specific.
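To illustrate why fusion can live above the engine, here is a minimal sketch of Reciprocal Rank Fusion over ranked lists. It is not the code in `src/core/search/hybrid.ts`: the `RankedResult` shape, the `rrfFuse` name, keying by `slug`, and the constant `k = 60` (a commonly used default) are all assumptions made for the example.

```typescript
// Hypothetical minimal result shape; the real SearchResult has more fields.
interface RankedResult {
  slug: string;
}

interface FusedResult {
  slug: string;
  score: number;
}

// Each input list is assumed to be sorted best-first by its own engine-specific ranking.
function rrfFuse(lists: RankedResult[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((result, index) => {
      const rank = index + 1; // ranks are 1-based
      const previous = scores.get(result.slug) ?? 0;
      scores.set(result.slug, previous + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score);
}

const keywordHits: RankedResult[] = [{ slug: "a" }, { slug: "b" }, { slug: "c" }];
const vectorHits: RankedResult[] = [{ slug: "b" }, { slug: "d" }];
const fused = rrfFuse([keywordHits, vectorHits]);
// fused order: b, a, d, c ("b" appears in both lists, so its contributions add up)
```

Because the function only looks at rank positions, it does not matter that `ts_rank` and bm25 produce scores on different scales.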
## PostgresEngine (v0, ships)
**Dependencies:** `postgres` (porsager/postgres), `pgvector`
**Postgres-specific features used:**
- `tsvector` + `GIN` index for full-text search with `ts_rank` weighting
- `pgvector` HNSW index for cosine similarity vector search
- `pg_trgm` + `GIN` for fuzzy slug resolution
- Recursive CTEs for graph traversal
- Trigger-based `search_vector` (spans `pages` + `timeline_entries`)
- JSONB for frontmatter with GIN index
- Connection pooling via Supabase Supavisor (port 6543)
**Hosting:** Supabase Pro ($25/mo). Zero-ops. Managed Postgres with pgvector built in.
**Why not self-hosted for v0:** The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.
## Adding a new engine
1. Create `src/core/<name>-engine.ts` implementing `BrainEngine`
2. Add to engine factory in `src/core/engine.ts`:
```typescript
export function createEngine(type: string): BrainEngine {
  switch (type) {
    case 'postgres': return new PostgresEngine();
    case 'sqlite': return new SQLiteEngine();
    default: throw new Error(`Unknown engine: ${type}`);
  }
}
```
3. Store engine type in `~/.gbrain/config.json`: `{ "engine": "sqlite", ... }`
4. Add tests. The test suite should be engine-agnostic where possible: the same test cases run against a different engine constructor.
5. Document in this file + add a design doc in `docs/`
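As a sketch of how step 3 might feed the factory from step 2, the snippet below validates the `engine` field of a config file's contents before anything is constructed. The `readEngineType` helper, the `EngineType` union, and the fallback to `"postgres"` when the field is absent are assumptions for illustration; the document only specifies that the engine type is stored under the `engine` key.

```typescript
// Hypothetical union of engine names; extend it when a new engine is added.
type EngineType = "postgres" | "sqlite";
const KNOWN_ENGINES: readonly string[] = ["postgres", "sqlite"];

// Takes the raw JSON text of the config file (reading the file is left to the caller).
function readEngineType(configJson: string, fallback: EngineType = "postgres"): EngineType {
  const parsed: unknown = JSON.parse(configJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config must be a JSON object");
  }
  const engine = (parsed as Record<string, unknown>).engine;
  if (engine === undefined) return fallback; // assumption: missing key means the default engine
  if (typeof engine === "string" && KNOWN_ENGINES.includes(engine)) {
    return engine as EngineType;
  }
  throw new Error(`Unknown engine: ${String(engine)}`);
}

const chosen = readEngineType('{ "engine": "sqlite" }');
const defaulted = readEngineType("{}");
```

Failing early on an unknown value mirrors the `default` branch of `createEngine`, so a typo in the config surfaces before any connection is attempted.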
### What you DON'T need to touch
- `src/cli.ts` (dispatches to engine, doesn't know which one)
- `src/mcp/server.ts` (same)
- `src/core/chunkers/*` (shared across engines)
- `src/core/embedding.ts` (shared across engines)
- `src/core/search/hybrid.ts`, `expansion.ts`, `dedup.ts` (shared, operate on SearchResult[])
- `skills/*` (fat markdown, engine-agnostic)
### What you DO need to implement
Every method in `BrainEngine`. The full interface. No optional methods, no feature flags. If your engine can't do vector search (e.g., a pure-text engine), implement `searchVector` to return `[]` and document the limitation.
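As a rough illustration of the "return `[]` and document the limitation" rule, here is a minimal sketch of a pure-text engine. `MiniBrainEngine`, `SearchHit`, and `KeywordOnlyEngine` are hypothetical names invented for this example; the real `BrainEngine` interface has many more methods and its exact signatures are not shown in this document.

```typescript
// Hypothetical, cut-down shapes for illustration only; the real
// BrainEngine interface and SearchResult type have more members.
interface SearchHit {
  slug: string;
  score: number;
}

interface MiniBrainEngine {
  searchKeyword(query: string, limit: number): Promise<SearchHit[]>;
  searchVector(embedding: number[], limit: number): Promise<SearchHit[]>;
}

class KeywordOnlyEngine implements MiniBrainEngine {
  private readonly pages: Map<string, string>;

  constructor(pages: Map<string, string>) {
    this.pages = pages;
  }

  // Naive term-count scoring stands in for a real full-text index.
  async searchKeyword(query: string, limit: number): Promise<SearchHit[]> {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const hits: SearchHit[] = [];
    this.pages.forEach((text, slug) => {
      const body = text.toLowerCase();
      const score = terms.filter((term) => body.includes(term)).length;
      if (score > 0) hits.push({ slug, score });
    });
    return hits.sort((a, b) => b.score - a.score).slice(0, limit);
  }

  // Documented limitation: this engine has no vector index, so it always
  // returns an empty list and callers are left with keyword results only.
  async searchVector(_embedding: number[], _limit: number): Promise<SearchHit[]> {
    return [];
  }
}
```

Because the method still exists and still resolves to an array, callers that fuse keyword and vector results need no special case for this engine.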
## Capability matrix
| Capability | PostgresEngine | SQLiteEngine (future) | Notes |
|-----------|---------------|----------------------|-------|
| CRUD | Full | Full | |
| Keyword search | tsvector + ts_rank | FTS5 + bm25 | Different ranking algorithms |
| Vector search | pgvector HNSW | sqlite-vss or vec0 | Different index types |
| Fuzzy slug | pg_trgm | LIKE + Levenshtein | Postgres is better here |
| Graph traversal | Recursive CTE | Loop with depth tracking | Same interface |
| Transactions | Full ACID | Full ACID | Both support this |
| JSONB queries | GIN index | json_extract | Postgres is richer |
| Concurrent access | Connection pooling | Single writer | SQLite limitation |
| Hosting | Supabase, self-hosted, Docker | Local file | |
## Future engine ideas
**SQLiteEngine** (most requested). See `docs/SQLITE_ENGINE.md` for the full plan. Single file, no server, git-friendly. Uses FTS5 for keyword search, sqlite-vss or vec0 for vector search. Great for open source users who want zero infrastructure.
**TursoEngine.** libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.
**DuckDBEngine.** Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.
**Custom/Remote.** The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. The interface doesn't assume SQL.

545
docs/GBRAIN_V0.md Normal file

@@ -0,0 +1,545 @@
# GBrain v0: Postgres-Native Personal Knowledge Brain
## What this is
GBrain is a compiled intelligence system. Not a note-taking app. Not "chat with your notes."
Every page is an intelligence assessment. Above the line: compiled truth (your current best understanding, rewritten when evidence changes). Below the line: timeline (append-only evidence trail). AI agents maintain the brain. MCP clients query it. The intelligence lives in fat markdown skills, not application code.
The core insight: personal knowledge at scale is an intelligence problem, not a storage problem.
## Why it exists
A 7,471-file / 2.3GB markdown wiki is choking git. Git doesn't scale past ~5K files for wiki-style use. The compiled truth + timeline model (Karpathy-style knowledge pages) is right, but it needs a real database underneath.
There's already a production-grade RAG system (Ruby on Rails, Postgres + pgvector) with 3-tier chunking, hybrid search with RRF, multi-query expansion, and 4-layer dedup. GBrain ports these proven patterns to a standalone Bun + TypeScript tool.
## The knowledge model
```
+--------------------------------------------------+
| Page: concepts/do-things-that-dont-scale |
| |
| --- frontmatter (YAML) --- |
| type: concept |
| tags: [startups, growth, pg-essay] |
| |
| === COMPILED TRUTH === |
| Current best understanding. |
| Rewritten on new evidence. |
| This is the "what we know now" section. |
| |
| --- |
| |
| === TIMELINE === |
| Append-only evidence trail. |
| - 2013-07-01: Published on paulgraham.com |
| - 2024-11-15: Referenced in batch kickoff talk |
| Never edited, only appended. |
+--------------------------------------------------+
         |                          |
         v                          v
 [Semantic chunks]         [Recursive chunks]
 (best quality for         (predictable format
  compiled truth)           for timeline)
         |                          |
         v                          v
  [Embeddings: text-embedding-3-large, 1536 dims]
                        |
                        v
         [HNSW index + tsvector + pg_trgm]
                        |
                        v
  [Hybrid search: vector + keyword + RRF fusion]
```
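The page layout in the diagram (frontmatter, compiled truth above the line, timeline below it) can be sketched as a small splitter. This is an illustration only: `splitPage` and `ParsedPage` are hypothetical names, the frontmatter is left as raw YAML text, and the rule "the first standalone `---` after the frontmatter is the separator" is an assumption about the format rather than the project's actual parser.

```typescript
interface ParsedPage {
  frontmatter: string; // raw YAML text, left unparsed in this sketch
  compiledTruth: string;
  timeline: string;
}

// Assumes the page may start with a `---`-delimited frontmatter block and
// that the first standalone `---` line after it separates compiled truth
// (above) from the timeline (below).
function splitPage(markdown: string): ParsedPage {
  const lines = markdown.split(/\r?\n/);
  let frontmatter = "";
  let bodyStart = 0;

  if (lines[0]?.trim() === "---") {
    const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
    if (close !== -1) {
      frontmatter = lines.slice(1, close).join("\n");
      bodyStart = close + 1;
    }
  }

  const body = lines.slice(bodyStart);
  const separator = body.findIndex((line) => line.trim() === "---");
  if (separator === -1) {
    // No separator: everything is compiled truth, the timeline is empty.
    return { frontmatter, compiledTruth: body.join("\n").trim(), timeline: "" };
  }
  return {
    frontmatter,
    compiledTruth: body.slice(0, separator).join("\n").trim(),
    timeline: body.slice(separator + 1).join("\n").trim(),
  };
}
```

Keeping the two halves separate at parse time is what lets each one be routed to a different chunker further down the pipeline.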
## Architecture decisions
### v0 stack
| Layer | Choice | Why |
|-------|--------|-----|
| Database | Postgres + pgvector | Proven RAG patterns, production-tested. World-class hybrid search. |
| Hosting | Supabase Pro ($25/mo) | Zero-ops. Managed Postgres, pgvector, connection pooling. 8GB storage. |
| Runtime | Bun + TypeScript | Consistent with GStack ecosystem. Fast. Compiles to single binary. |
| Embeddings | OpenAI text-embedding-3-large | 1536 dims (reduced from 3072 via dimensions API). ~$0.13/1M tokens. |
| LLM (chunking/expansion) | Claude Haiku | Cheapest model for topic boundary detection and query expansion. |
| Background jobs | Trigger.dev | Serverless. Embed backfill, stale detection, orphan audit, tag consistency. |
| Distribution | npm package + compiled binary + MCP server | Library for OpenClaw, CLI for humans, MCP for agents. |
### What we chose and why
**Postgres over SQLite.** We have 3+ years of proven RAG patterns running on Postgres. tsvector for full-text search, pgvector HNSW for semantic search, pg_trgm for fuzzy slug matching. Porting these to SQLite would mean reimplementing search from scratch. SQLite is a future pluggable engine for lightweight open source users (see `docs/ENGINES.md`).
**Supabase over self-hosted.** Zero maintenance. The brain should be infrastructure that AI agents use, not something you administer. Free tier has pgvector but only 500MB (not enough for 7K+ pages with embeddings, which need ~750MB). Pro tier at $25/mo gives 8GB. No Docker, no self-hosted Postgres in v0.
**Full port over minimal viable.** The patterns are proven. The port is mechanical. Shipping the full 3-tier chunking + hybrid search + 4-layer dedup means world-class RAG from day one. "We'll add that later" means rebuilding everything later.
**Library-first distribution.** gbrain is an npm package. OpenClaw installs it as a dependency (`bun add gbrain`), imports the engine directly. Zero-overhead function calls, shared connection pool, TypeScript types. The CLI and MCP server are thin wrappers over the same engine.
**Trigger-based tsvector (not generated column).** To include timeline_entries content in full-text search, the tsvector needs to span multiple tables. Generated columns can't do cross-table references. A trigger on pages + timeline_entries updates the search_vector.
**Auto-embed during import.** No separate embed step. `gbrain import` chunks and embeds in one pass. Progress bar shows status. `--no-embed` flag for users who want to defer. `embedded_at` column enables `gbrain embed --stale` for backfill.
## Distribution model
```
+-------------------+     +-------------------+     +-------------------+
| npm package       |     | Compiled binary   |     | MCP server        |
| (library)         |     | (CLI)             |     | (stdio)           |
+-------------------+     +-------------------+     +-------------------+
|                   |     |                   |     |                   |
| bun add gbrain    |     | GitHub Releases   |     | gbrain serve      |
| import { Postgres |     | npx gbrain        |     | in mcp.json       |
|   Engine }        |     |                   |     |                   |
|                   |     |                   |     |                   |
| WHO: OpenClaw,    |     | WHO: Humans       |     | WHO: Claude Code, |
|      AlphaClaw    |     |                   |     |      Cursor, etc. |
+-------------------+     +-------------------+     +-------------------+
          |                         |                         |
          +-------------------------+-------------------------+
                                    |
                           +--------v--------+
                           |   BrainEngine   |
                           |   (pluggable    |
                           |    interface)   |
                           +-----------------+
                                    |
                      +-------------+-------------+
                      |                           |
               +------v------+            +-------v-------+
               |  Postgres   |            |    SQLite     |
               |   Engine    |            |    Engine     |
               | (v0, ships) |            | (future, see  |
               +-------------+            |  ENGINES.md)  |
                                          +---------------+
```
package.json exports:
- Library: `src/core/index.ts` (BrainEngine interface, PostgresEngine, types)
- CLI binary: `src/cli.ts`
## First-time experience
### Path 1: OpenClaw user (primary)
OpenClaw is the AI orchestrator that uses gbrain as its knowledge backend. This is the most common install path.
```bash
# 1. Install gbrain as a ClawHub skill
clawhub install gbrain
# 2. The skill runs guided setup on first use:
# - Detects if Supabase CLI is available
# - If yes: auto-provisions a new Supabase project
# - If no: prompts for connection URL
# - Runs schema migration
# - Imports bundled kindling corpus (10 PG essays)
# - Shows live entity/edge extraction animation
# - Brain is ready
# 3. From OpenClaw, brain tools are now available:
# "What essays do we have about startups?"
# "Ingest my meeting notes from today"
# "What does PG say about doing things that don't scale?"
```
Behind the scenes, `clawhub install gbrain`:
1. Installs the `gbrain` npm package
2. Ships SKILL.md files (ingest, query, maintain, enrich, briefing, migrate)
3. Registers brain tools with the orchestrator
4. Runs `gbrain init --supabase` on first use (guided wizard)
### Path 2: CLI user (standalone)
```bash
# 1. Install
npm install -g gbrain
# or: download binary from GitHub Releases
# 2. Initialize with Supabase
gbrain init --supabase
# Guided wizard:
# Try 1: Supabase CLI auto-provision (npx supabase)
# Try 2: If CLI not installed or not logged in, fallback:
# "Enter your Supabase connection URL:"
# Then: runs schema migration, verifies pgvector extension
# Then: imports kindling corpus (10 PG essays as demo data)
# Then: shows live entity extraction animation
# Output: "Brain ready. 10 pages imported. Try: gbrain query 'what does PG say about startups?'"
# 3. Import your data
gbrain import /path/to/markdown/wiki/
# Progress bar: 7,471 files, auto-chunk, auto-embed
# ~30s for text import, ~10-15 min for embedding
# 4. Query
gbrain query "what does PG say about doing things that don't scale?"
```
### Path 3: MCP user (Claude Code, Cursor)
Add the server to `~/.config/claude/mcp.json`:
```json
{
"mcpServers": {
"gbrain": {
"command": "gbrain",
"args": ["serve"]
}
}
}
```
Then in Claude Code: "Search my brain for people who know about robotics"
### The init wizard in detail
`gbrain init --supabase` runs through these steps:
```
Step 1: Database Setup
├── Check for Supabase CLI (npx supabase --version)
│ ├── Found + logged in → auto-create project
│ │ ├── Create project via supabase CLI
│ │ ├── Wait for project to be ready
│ │ └── Extract connection string
│ ├── Found + not logged in →
│ │ └── Error: "Supabase CLI found but not logged in."
│ │ Cause: "You need to authenticate first."
│ │ Fix: "Run: npx supabase login"
│ │ Docs: "https://supabase.com/docs/guides/cli"
│ └── Not found → fallback to manual
│ └── Prompt: "Enter your Supabase connection URL:"
Step 2: Schema Migration
├── Connect to database
├── CREATE EXTENSION IF NOT EXISTS vector
├── CREATE EXTENSION IF NOT EXISTS pg_trgm
├── Run src/schema.sql (all tables, indexes, triggers)
└── Verify: test insert + vector query
Step 3: Config
├── Write ~/.gbrain/config.json (0600 permissions)
│ { "database_url": "...", "service_role_key": "..." }
└── Verify connection
Step 4: Kindling Import
├── Import 10 bundled PG essays as demo data
├── Chunk + embed each essay
├── Show live entity/edge extraction animation:
│ "Extracting entities... Paul Graham (person), Y Combinator (company)..."
│ "Creating links... Paul Graham → Y Combinator (founded)..."
└── Output: "Brain ready. 10 pages imported."
Step 5: First Query
└── "Try: gbrain query 'what does PG say about doing things that don't scale?'"
```
Every error follows the style guide: problem + cause + fix + docs link.
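One way to keep every error in that four-part shape is to make the parts required fields of a type. The sketch below is hypothetical (`BrainError` and `formatError` are names invented here, not taken from the codebase); the sample values are the "not logged in" error from the wizard tree above.

```typescript
// Hypothetical shape; the four fields mirror the style guide:
// problem + cause + fix + docs link.
interface BrainError {
  problem: string;
  cause: string;
  fix: string;
  docs: string;
}

// Renders the four parts in a fixed order, one per line.
function formatError(err: BrainError): string {
  return [
    `Error: ${err.problem}`,
    `Cause: ${err.cause}`,
    `Fix: ${err.fix}`,
    `Docs: ${err.docs}`,
  ].join("\n");
}

const notLoggedIn: BrainError = {
  problem: "Supabase CLI found but not logged in.",
  cause: "You need to authenticate first.",
  fix: "Run: npx supabase login",
  docs: "https://supabase.com/docs/guides/cli",
};

console.log(formatError(notLoggedIn));
```

With all four fields mandatory, an error that omits the fix or the docs link fails type checking instead of reaching the user half-finished.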
## CLI commands
```
gbrain init [--supabase|--url <conn>] # create brain
gbrain get <slug> # read a page
gbrain put <slug> [< file.md] # write/update a page
gbrain search <query> # keyword search (tsvector)
gbrain query <question> # hybrid search (RRF + expansion)
gbrain ingest <file> [--type ...] # ingest a source document
gbrain link <from> <to> [--type <type>] # create typed link
gbrain unlink <from> <to> # remove link
gbrain graph <slug> [--depth 5] # traverse link graph (recursive CTE)
gbrain backlinks <slug> # incoming links
gbrain tags <slug> # list tags
gbrain tag <slug> <tag> # add tag
gbrain untag <slug> <tag> # remove tag
gbrain timeline [<slug>] # view timeline
gbrain timeline-add <slug> <date> <text> # add timeline entry
gbrain list [--type] [--tag] [--limit] # list with filters
gbrain stats # brain statistics
gbrain health # brain health dashboard
gbrain import <dir> [--no-embed] # import from markdown directory
gbrain export [--dir ./export/] # export to markdown (round-trip)
gbrain embed [<slug>|--all|--stale] # generate/refresh embeddings
gbrain serve # MCP server (stdio)
gbrain call <tool> '<json>' # raw tool invocation
gbrain upgrade # self-update (npm, binary, ClawHub)
gbrain version # version info
gbrain config [get|set] <key> [value] # brain config
```
CLI and MCP expose identical operations. Drift tests assert identical results for all operations across both interfaces.
## Database schema
9 tables in Postgres + pgvector:
```
+------------------+ +-------------------+ +------------------+
| pages |---->| content_chunks | | links |
|------------------| |-------------------| |------------------|
| id (PK) | | id (PK) | | id (PK) |
| slug (UNIQUE) | | page_id (FK) | | from_page_id(FK) |
| type | | chunk_index | | to_page_id (FK) |
| title | | chunk_text | | link_type |
| compiled_truth | | chunk_source | | context |
| timeline | | embedding (1536) | +------------------+
| frontmatter(JSONB)| | model |
| search_vector | | token_count | +------------------+
| created_at | | embedded_at | | tags |
| updated_at | +-------------------+ |------------------|
+------------------+ | id (PK) |
| | page_id (FK) |
+-----> +--------------------+ | tag |
| | timeline_entries | +------------------+
| |--------------------|
| | id (PK) | +------------------+
| | page_id (FK) | | page_versions |
| | date | |------------------|
| | source | | id (PK) |
| | summary | | page_id (FK) |
| | detail (markdown) | | compiled_truth |
| +--------------------+ | frontmatter |
| | snapshot_at |
+-----> +--------------------+ +------------------+
| | raw_data |
| |--------------------| +------------------+
| | id (PK) | | config |
| | page_id (FK) | |------------------|
| | source | | key (PK) |
| | data (JSONB) | | value |
| +--------------------+ +------------------+
|
+-----> +--------------------+
| ingest_log |
|--------------------|
| id (PK) |
| source_type |
| source_ref |
| pages_updated |
| summary |
+--------------------+
```
Indexes:
- `pages.slug`: UNIQUE constraint (implicit B-tree)
- `pages.type`: B-tree
- `pages.search_vector`: GIN (full-text search)
- `pages.frontmatter`: GIN (JSONB queries)
- `pages.title`: GIN with pg_trgm (fuzzy slug resolution)
- `content_chunks.embedding`: HNSW with cosine ops (vector search)
- `content_chunks.page_id`: B-tree
- `links.from_page_id`, `links.to_page_id`: B-tree
- `tags.tag`, `tags.page_id`: B-tree
- `timeline_entries.page_id`, `timeline_entries.date`: B-tree
## Search architecture
```
Query: "when should you ignore conventional wisdom?"
                  |
                  v
    +---------------------------+
    | Multi-query expansion     |
    | (Claude Haiku)            |
    | "contrarian thinking"     |
    | "going against the crowd" |
    +---------------------------+
          |       |       |
          v       v       v
        [embed all 3 queries]
          |       |       |
          +-------+-------+
                  |
            +-----+-----+
            |           |
            v           v
        +--------+ +---------+
        | Vector | | Keyword |
        | Search | | Search  |
        | (HNSW  | | (tsv +  |
        | cosine)| | ts_rank)|
        +--------+ +---------+
            |           |
            +-----+-----+
                  |
                  v
         +------------------+
         | RRF Fusion       |
         | score = sum(     |
         |   1/(60 + rank)) |
         +------------------+
                  |
                  v
         +------------------+
         | 4-Layer Dedup    |
         | 1. By source     |
         | 2. Cosine > 0.85 |
         | 3. Type cap 60%  |
         | 4. Per-page max  |
         +------------------+
                  |
                  v
         +------------------+
         | Stale alerts     |
         | (compiled_truth  |
         |  older than      |
         |  latest timeline)|
         +------------------+
                  |
                  v
              [Results]
```
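The RRF step in the diagram, `score = sum(1/(60 + rank))`, can be sketched in a few lines. This is a minimal illustration, not the project's `hybrid.ts`: `RankedHit` and `rrfFuse` are hypothetical names, and treating `rank` as 1-based is an assumption (the usual convention for Reciprocal Rank Fusion).

```typescript
// Hypothetical minimal result shape; the real SearchResult carries more fields.
interface RankedHit {
  id: string;
}

// Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for each hit
// it contains, with rank starting at 1. k = 60 matches the diagram's constant.
function rrfFuse(lists: RankedHit[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

const vectorHits: RankedHit[] = [{ id: "a" }, { id: "b" }, { id: "c" }];
const keywordHits: RankedHit[] = [{ id: "b" }, { id: "d" }];
console.log(rrfFuse([vectorHits, keywordHits]).map((hit) => hit.id));
```

A hit that appears in both lists accumulates two terms, so it outranks hits that only one search found. If one list is empty, the fused order is simply the order of the other list.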
## Chunking strategies
| Strategy | Input | Algorithm | When to use |
|----------|-------|-----------|-------------|
| Recursive | Any text | 5-level delimiter hierarchy (paragraphs > lines > sentences > clauses > whitespace). 300-word chunks, 50-word overlap. | Timeline (predictable format), bulk import |
| Semantic | Quality text | Embed each sentence, Savitzky-Golay filter for topic boundaries, cosine similarity minima. Falls back to recursive. | Compiled truth (intelligence assessments) |
| LLM-guided | High-value text | Pre-split to 128-word candidates, Claude Haiku finds topic shifts in sliding windows. 3 retries per window. | Explicitly requested via `--chunker llm` |
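The 300-word / 50-word-overlap windowing from the Recursive row can be sketched as follows. This covers only the last fallback level (splitting on whitespace); the delimiter hierarchy above it is omitted, and `chunkWords` is a hypothetical name for illustration.

```typescript
// Fixed-size word windows with overlap. Each window starts `size - overlap`
// words after the previous one, so neighbours share `overlap` words.
function chunkWords(text: string, size = 300, overlap = 50): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

With the defaults, a 700-word text yields windows starting at words 0, 250, and 500, and each window repeats the last 50 words of the one before it.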
Dispatch: compiled_truth gets semantic chunker. Timeline gets recursive chunker. Override with `--chunker` flag or `chunk_strategy` in frontmatter.
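The dispatch rule reads naturally as a small function. The sketch below is illustrative: `pickChunker` and `DispatchOptions` are hypothetical names, and the precedence (flag first, then frontmatter, then the default) is an assumption, since the text names both overrides without saying which one wins.

```typescript
type ChunkStrategy = "recursive" | "semantic" | "llm";
type ChunkSource = "compiled_truth" | "timeline";

interface DispatchOptions {
  cliChunker?: ChunkStrategy; // value of a --chunker flag, if given
  frontmatterStrategy?: ChunkStrategy; // chunk_strategy from frontmatter, if set
}

// Assumed precedence: CLI flag, then frontmatter, then the per-source default.
function pickChunker(source: ChunkSource, opts: DispatchOptions = {}): ChunkStrategy {
  if (opts.cliChunker) return opts.cliChunker;
  if (opts.frontmatterStrategy) return opts.frontmatterStrategy;
  return source === "compiled_truth" ? "semantic" : "recursive";
}
```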
## Skills (fat markdown, no code)
Each skill is a markdown file that AI agents (Claude Code, OpenClaw) read and follow. The skill contains the workflow, heuristics, and quality rules. No skill logic is in the binary.
| Skill | What it does |
|-------|-------------|
| `skills/ingest/SKILL.md` | Ingest meetings, docs, articles. Update compiled truth, append timeline, create links. |
| `skills/query/SKILL.md` | 3-layer search (FTS + vector + structured). Synthesize answer with citations. |
| `skills/maintain/SKILL.md` | Find contradictions, stale info, orphans, dead links, tag inconsistency. |
| `skills/enrich/SKILL.md` | Enrich from external APIs (Crustdata, Happenstance, Exa). Store raw data, distill to compiled truth. |
| `skills/briefing/SKILL.md` | Daily briefing: meetings with context, active deals, open threads. |
| `skills/migrate/SKILL.md` | Universal migration from Obsidian, Notion, Logseq, plain markdown, CSV, JSON, Roam. |
## CEO scope expansions (accepted for v0)
1. **CLI/MCP parity with drift tests.** Both interfaces are thin wrappers over the engine. Tests assert identical output.
2. **Smart slug resolution.** Fuzzy matching via pg_trgm for reads. Writes require exact slugs. `gbrain get "dont scale"` resolves to `concepts/do-things-that-dont-scale`.
3. **Brain health dashboard.** `gbrain health` shows page count, embed coverage, stale pages, orphans, dead links.
4. **Normalized timeline.** `timeline_entries` table only (no TEXT column). `detail` field supports markdown.
5. **Page version control.** `page_versions` table stores full snapshots (compiled_truth + frontmatter + links + tags). `gbrain history`, `gbrain diff`, `gbrain revert` commands. Revert re-chunks and re-embeds.
6. **Typed links + graph traversal.** `link_type` column (knows, invested_in, works_at, etc.). `gbrain graph` uses recursive CTE with max depth (default 5, configurable via `--depth`).
7. **Trigger.dev data cleanup jobs.** Daily embed backfill, weekly stale detection + orphan audit + tag consistency.
8. **Stale alert annotations.** Search results flag pages where compiled_truth is older than latest timeline entry.
9. **Timeline merge on ingest.** Same event created across all mentioned entities.
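Item 8 (stale alert annotations) comes down to a single timestamp comparison. A minimal sketch, assuming the page's last compiled-truth update and its newest timeline date are both available as ISO strings; `PageFreshness` and its field names are hypothetical.

```typescript
interface PageFreshness {
  slug: string;
  compiledTruthUpdatedAt: string; // ISO timestamp (hypothetical field)
  latestTimelineDate: string | null; // ISO date of the newest timeline entry
}

// A page is stale when evidence arrived after the compiled truth was last
// rewritten. Pages without timeline entries are never flagged.
function isStale(page: PageFreshness): boolean {
  if (page.latestTimelineDate === null) return false;
  return Date.parse(page.compiledTruthUpdatedAt) < Date.parse(page.latestTimelineDate);
}
```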
## Security model (v0)
Single-user, local-only:
- Supabase service role key in `~/.gbrain/config.json` (0600 permissions)
- MCP stdio transport is inherently local (client spawns `gbrain serve` as subprocess)
- No multi-user, no RLS, no OAuth in v0
- Multi-user path (future): Supabase RLS + per-user API keys
## Upgrade mechanism
`gbrain upgrade` detects the installation method and updates accordingly:
| Path | How |
|------|-----|
| npm | `bun update gbrain` (or npm equivalent) |
| Compiled binary | Download new binary to temp dir, atomic rename swap, exec new process |
| ClawHub | `clawhub update gbrain` |
Version check: compare local version against latest GitHub release tag.
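The comparison itself might look like the sketch below. It assumes plain dotted numeric versions and release tags that may carry a leading `v` (for example `v0.2.0`); both are assumptions, pre-release suffixes are not handled, and fetching the latest tag from GitHub is left out. `compareVersions` and `needsUpgrade` are hypothetical names.

```typescript
// Compares dotted numeric versions such as "0.1.0" and "v0.2.0".
// Returns -1, 0, or 1. Missing segments count as 0, so "1.0" equals "1.0.0".
function compareVersions(a: string, b: string): number {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((part) => Number.parseInt(part, 10) || 0);
  const pa = parse(a);
  const pb = parse(b);
  const length = Math.max(pa.length, pb.length);
  for (let i = 0; i < length; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

function needsUpgrade(localVersion: string, latestTag: string): boolean {
  return compareVersions(localVersion, latestTag) < 0;
}
```

Comparing segment by segment as numbers avoids the string-comparison trap where `0.10.0` sorts before `0.9.1`.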
## Storage and cost estimates
### Storage (~750MB for 7,471 pages)
| Component | Size |
|-----------|------|
| Page text (compiled_truth + timeline) | ~150MB |
| JSONB frontmatter | ~20MB |
| tsvector + GIN indexes | ~50MB |
| Content chunks (~22K, text) | ~80MB |
| Embeddings (22K x 1536 floats x 4 bytes) | ~134MB |
| HNSW index overhead (~2x embeddings) | ~270MB |
| Links, tags, timeline, raw_data, versions | ~50MB |
| **Total** | **~750MB** |
Supabase free tier (500MB) won't fit. Supabase Pro ($25/mo, 8GB) is the starting point.
### Embedding cost (~$4-5 for initial import)
| Step | Cost |
|------|------|
| Semantic chunker sentence embeddings (~374K sentences) | ~$1 |
| Chunk embeddings (~22K chunks) | ~$0.30 |
| Query expansion (per query, ~3 embeds) | negligible |
| **Total initial import** | **~$4-5** |
Budget alternative: `gbrain import --chunker recursive` skips sentence-level embeddings, then `gbrain embed --rechunk --chunker semantic` upgrades later.
## Serverless operations stack
```
+------------------+ +------------------+ +------------------+
| Supabase | | Vercel | | Trigger.dev |
| (Postgres + | | (web/API, | | (background |
| pgvector) | | optional) | | jobs) |
+------------------+ +------------------+ +------------------+
| Database | | Future web UI | | Embed backfill |
| Connection pool | | API endpoints | | Stale detection |
| pgvector HNSW | | Edge functions | | Orphan audit |
| tsvector FTS | | | | Tag consistency |
| pg_trgm fuzzy | | | | Daily briefing |
+------------------+ +------------------+ +------------------+
```
The CLI connects directly to Supabase Postgres. Trigger.dev and Vercel are for async/scheduled work. The CLI works without them.
## Verification checklist
1. `gbrain import /data/brain/` migrates all 7,471 files losslessly
2. `gbrain export` round-trips to semantically identical markdown
3. `gbrain query "what does PG say about doing things that don't scale?"` returns relevant hybrid search results
4. `gbrain serve` starts MCP server connectable by Claude Code
5. All 3 chunkers produce correct output with test fixtures
6. `gbrain init --supabase` works end-to-end
7. `bun test` passes all tests
8. `clawhub install gbrain` installs the skill and runs guided setup
9. `bun add gbrain` + `import { PostgresEngine } from 'gbrain'` works in external project
10. Drift tests pass: CLI and MCP produce identical results
11. `gbrain health` outputs accurate brain health metrics
12. Migration skill successfully imports an Obsidian vault
## Future plans
See `docs/ENGINES.md` for the pluggable engine architecture and future backend plans.
### v1 candidates (deferred from v0)
- **`gbrain ask` natural language CLI alias.** Trivial to add. P1 TODO.
- **Intelligence compiler.** Treat every fact as a first-class claim with source span, entity links, validity window, confidence, and contradiction status. "What changed, why, and what evidence would flip it again?" From Codex review. Builds on compiled truth model.
- **Active skills via Trigger.dev.** Application-specific briefings, meeting prep. Belongs in OpenClaw, not generic brain infra.
- **Multi-user access.** Supabase RLS + per-user API keys. v0 is single-user.
- **SQLite engine.** Community PRs welcome. See `docs/SQLITE_ENGINE.md`.
- **Docker Compose for self-hosted Postgres.** Community PRs welcome.
- **Web UI.** Optional Vercel-hosted dashboard for browsing brain pages.
### Interface abstraction principle
All operations go through `BrainEngine`. The engine interface is the contract. Postgres-specific features (tsvector, pgvector HNSW, pg_trgm, recursive CTEs) are implementation details inside `PostgresEngine`. The interface exposes capabilities, not SQL.
This means:
- A SQLite engine can implement `searchKeyword` using FTS5 instead of tsvector
- A SQLite engine can implement `searchVector` using sqlite-vss instead of pgvector
- A future DuckDB engine could implement analytics-heavy workloads
- The CLI, MCP server, and library consumers never know which engine runs underneath
See `docs/ENGINES.md` for the full interface spec and `docs/SQLITE_ENGINE.md` for the SQLite implementation plan.
## Review history
| Review | Runs | Status | Key findings |
|--------|------|--------|-------------|
| /office-hours | 1 | APPROVED | Builder mode. Full port approach chosen. |
| /plan-ceo-review | 1 | CLEAR | 11 proposals, 10 accepted, 1 deferred. SCOPE EXPANSION mode. |
| /codex review | 1 | issues_found | 24 points challenged, 3 accepted (fuzzy slug, revert spec, tsvector). |
| /plan-eng-review | 2 | CLEAR | 3 issues (upgrade paths, import guardrails, init wizard), 0 critical gaps. |
| /plan-devex-review | 1 | CLEAR | DX score 5/10 to 7/10. TTHW 25min to 90s. Champion tier. |

395
docs/SQLITE_ENGINE.md Normal file

@@ -0,0 +1,395 @@
# SQLite Engine Design
## Status: Designed, not built. Community PRs welcome.
The pluggable engine interface (`docs/ENGINES.md`) means anyone can add a SQLite backend without touching the CLI, MCP server, or skills. This document is the full plan.
## Why SQLite
Postgres is the right choice for the primary user (7K+ pages, production RAG, zero-ops via Supabase). But a lot of people want something simpler:
- **No server.** One file. `brain.db`. Done.
- **Git-friendly.** You can (with care) commit a SQLite database alongside your notes.
- **Offline.** Works on a plane, in a coffee shop, wherever.
- **Zero cost.** No Supabase subscription. No hosting. No API keys for search (keyword-only mode works without OpenAI).
- **Portable.** Copy the file to another machine. That's it.
Tools like Khoj, Obsidian plugins, and various "local-first AI" projects already use SQLite with vector extensions. The patterns exist. This is well-trodden ground.
## What it gives up
Compared to PostgresEngine:
| Feature | Postgres | SQLite | Impact |
|---------|----------|--------|--------|
| Full-text search quality | tsvector + ts_rank (excellent) | FTS5 + bm25 (good) | Slightly less precise ranking |
| Fuzzy slug matching | pg_trgm (excellent) | LIKE + Levenshtein (ok) | Fuzzier matching, more false positives |
| Vector search | pgvector HNSW (fast, accurate) | sqlite-vss or vec0 (good enough) | Slower at scale, good for <50K chunks |
| Concurrent access | Connection pooling, many readers/writers | Single writer, many readers | Not an issue for single-user CLI |
| JSONB queries | GIN index, rich operators | json_extract, no general-purpose JSON index (expression indexes on fixed paths only) | Slower frontmatter queries |
| Graph traversal | Recursive CTE (native) | Recursive CTE (supported since 3.8.3) | Same |
| Hosted option | Supabase, RDS, etc. | Turso (libSQL), Cloudflare D1 | SQLite has cloud options too |
For a single user with <10K pages and no concurrent access needs, these tradeoffs are fine.
## Schema
SQLite equivalent of the Postgres schema. Key differences called out.
```sql
-- Enable WAL mode for better read concurrency
PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;
-- ============================================================
-- pages
-- ============================================================
CREATE TABLE pages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT NOT NULL UNIQUE,
type TEXT NOT NULL,
title TEXT NOT NULL,
compiled_truth TEXT NOT NULL DEFAULT '',
timeline TEXT NOT NULL DEFAULT '',
frontmatter TEXT NOT NULL DEFAULT '{}', -- JSON string, not JSONB
content_hash TEXT, -- SHA-256 for import idempotency
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_pages_type ON pages(type);
-- ============================================================
-- Full-text search via FTS5 (replaces tsvector)
-- ============================================================
CREATE VIRTUAL TABLE pages_fts USING fts5(
title,
compiled_truth,
timeline,
content='pages',
content_rowid='id',
tokenize='porter unicode61'
);
-- Triggers to keep FTS5 in sync
CREATE TRIGGER pages_fts_insert AFTER INSERT ON pages BEGIN
INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;
CREATE TRIGGER pages_fts_update AFTER UPDATE ON pages BEGIN
INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;
CREATE TRIGGER pages_fts_delete AFTER DELETE ON pages BEGIN
INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
END;
-- ============================================================
-- content_chunks
-- ============================================================
CREATE TABLE content_chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
chunk_text TEXT NOT NULL,
chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
embedding BLOB, -- Float32Array as raw bytes
model TEXT NOT NULL DEFAULT 'text-embedding-3-large',
token_count INTEGER,
embedded_at TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_chunks_page ON content_chunks(page_id);
-- Vector search index created separately via sqlite-vss or vec0
-- See "Vector search options" section below
-- ============================================================
-- links
-- ============================================================
CREATE TABLE links (
id INTEGER PRIMARY KEY AUTOINCREMENT,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(from_page_id, to_page_id)
);
CREATE INDEX idx_links_from ON links(from_page_id);
CREATE INDEX idx_links_to ON links(to_page_id);
-- ============================================================
-- tags
-- ============================================================
CREATE TABLE tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
UNIQUE(page_id, tag)
);
CREATE INDEX idx_tags_tag ON tags(tag);
CREATE INDEX idx_tags_page_id ON tags(page_id);
-- ============================================================
-- raw_data
-- ============================================================
CREATE TABLE raw_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
source TEXT NOT NULL,
data TEXT NOT NULL, -- JSON string
fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(page_id, source)
);
CREATE INDEX idx_raw_data_page ON raw_data(page_id);
-- ============================================================
-- timeline_entries
-- ============================================================
CREATE TABLE timeline_entries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
date TEXT NOT NULL, -- ISO date string
source TEXT NOT NULL DEFAULT '',
summary TEXT NOT NULL,
detail TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX idx_timeline_date ON timeline_entries(date);
-- ============================================================
-- page_versions
-- ============================================================
CREATE TABLE page_versions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
compiled_truth TEXT NOT NULL,
frontmatter TEXT NOT NULL DEFAULT '{}',
snapshot_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_versions_page ON page_versions(page_id);
-- ============================================================
-- ingest_log
-- ============================================================
CREATE TABLE ingest_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_type TEXT NOT NULL,
source_ref TEXT NOT NULL,
pages_updated TEXT NOT NULL DEFAULT '[]', -- JSON array
summary TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- ============================================================
-- config
-- ============================================================
CREATE TABLE config (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
INSERT INTO config (key, value) VALUES
('version', '1'),
('engine', 'sqlite'),
('embedding_model', 'text-embedding-3-large'),
('embedding_dimensions', '1536'),
('chunk_strategy', 'semantic');
```
### Key differences from Postgres schema
| Feature | Postgres | SQLite |
|---------|----------|--------|
| Types | `SERIAL`, `TIMESTAMPTZ`, `JSONB`, `vector(1536)` | `INTEGER`, `TEXT`, `TEXT` (JSON), `BLOB` |
| Full-text search | Trigger-maintained `tsvector` column + GIN | FTS5 virtual table + triggers |
| Vector storage | `vector(1536)` column type | `BLOB` (raw Float32Array bytes) |
| Vector index | HNSW via pgvector | Separate virtual table via sqlite-vss or vec0 (sqlite-vec) |
| Fuzzy search | `pg_trgm` GIN index | LIKE queries or Levenshtein UDF |
| JSON queries | `JSONB` + GIN index | `json_extract()` function |
| Timestamps | `TIMESTAMPTZ` (native) | `TEXT` with ISO format |
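The `BLOB` row above stores an embedding as raw `Float32Array` bytes. A minimal sketch of that round-trip, assuming the SQLite driver accepts and returns `Uint8Array` for BLOB columns; the helper names `embeddingToBlob` and `blobToEmbedding` are hypothetical, not part of GBrain:

```typescript
// Hypothetical helpers: serialize an embedding to raw bytes for a BLOB column
// and read it back. Bytes are written in the platform's native byte order,
// so a database moved between machines of different endianness would need
// explicit handling.
function embeddingToBlob(embedding: Float32Array): Uint8Array {
  // View exactly the bytes covered by this array, then copy them.
  return new Uint8Array(embedding.buffer, embedding.byteOffset, embedding.byteLength).slice();
}

function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`BLOB length ${blob.byteLength} is not a multiple of 4`);
  }
  // Copy into a fresh buffer so the Float32Array view is 4-byte aligned.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}
```

At 4 bytes per dimension, a 1536-dimension embedding occupies 6,144 bytes per chunk.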
## Vector search options
Two main choices for vector search in SQLite:
### Option A: sqlite-vss (Alex Garcia)
```sql
-- Load extension
.load ./vector0
.load ./vss0
-- Create virtual table linked to content_chunks
CREATE VIRTUAL TABLE chunks_vss USING vss0(
embedding(1536)
);
-- Insert embeddings (linked by rowid to content_chunks)
INSERT INTO chunks_vss(rowid, embedding)
SELECT id, embedding FROM content_chunks WHERE embedding IS NOT NULL;
-- Search
SELECT rowid, distance
FROM chunks_vss
WHERE vss_search(embedding, :query_embedding)
LIMIT 20;
```
Pros: mature, well-documented, used by many projects.
Cons: requires loading native extensions (platform-specific binaries).
### Option B: vec0 (sqlite-vec; newer, from same author)
```sql
-- Create virtual table
CREATE VIRTUAL TABLE chunks_vec USING vec0(
chunk_id INTEGER PRIMARY KEY,
embedding float[1536]
);
-- Search
SELECT chunk_id, distance
FROM chunks_vec
WHERE embedding MATCH :query_embedding
ORDER BY distance
LIMIT 20;
```
Pros: simpler API, better integration with SQLite ecosystem.
Cons: newer, less battle-tested.
### Option C: No vector search (keyword only)
For users who don't want to deal with vector extensions or OpenAI API keys, the brain still works with keyword search only. FTS5 + bm25 is genuinely good for structured wiki content where you know the terms. `searchVector` returns `[]`, and hybrid search degrades gracefully to keyword-only.
This is a valid configuration. Not everyone needs embeddings.
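To see why an empty vector result degrades cleanly, here is a minimal Reciprocal Rank Fusion sketch. It is an illustration, not GBrain's actual fusion code: the `Ranked` type, the function name `rrfFuse`, and the constant `k = 60` (a commonly used RRF default) are all assumptions.

```typescript
// Hypothetical result shape; real SearchResult objects carry more fields.
type Ranked = { slug: string };

// Sum 1 / (k + rank) over every list a slug appears in, then sort by score.
function rrfFuse(lists: Ranked[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      // rank is 0-based, so the top hit contributes 1 / (k + 1)
      scores.set(item.slug, (scores.get(item.slug) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([slug]) => slug);
}

const keywordHits: Ranked[] = [{ slug: 'people/alice' }, { slug: 'companies/acme' }];
const vectorHits: Ranked[] = []; // what a keyword-only engine's searchVector returns

// An empty list adds nothing to any score, so keyword order is preserved.
const fused = rrfFuse([keywordHits, vectorHits]);
```

Because the empty list contributes no score, the fused ranking is exactly the keyword ranking and no special-casing is needed in the hybrid path.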
## Init flow for SQLite
```bash
gbrain init --sqlite
# or: gbrain init --sqlite --path ~/brain.db
# 1. Create database file at specified path (default: ~/.gbrain/brain.db)
# 2. Run schema (all CREATE TABLE + FTS5 + triggers)
# 3. Write config to ~/.gbrain/config.json:
# { "engine": "sqlite", "database_path": "~/.gbrain/brain.db" }
# 4. Import kindling corpus (same as Postgres path)
# 5. "Brain ready. 10 pages imported."
```
No Supabase account needed. No API keys needed (keyword-only mode). No server. Just a file.
For vector search, the user additionally needs:
- OpenAI API key in `~/.gbrain/config.json` or `OPENAI_API_KEY` env var
- sqlite-vss or vec0 extension binary for their platform
## Fuzzy slug resolution without pg_trgm
Postgres uses `pg_trgm` GIN index for fast fuzzy matching. SQLite doesn't have this. Options:
1. **LIKE with wildcards.** `WHERE slug LIKE '%dont%scale%'`. Simple, works for partial matches, but no ranking.
2. **Levenshtein distance via UDF.** Load a user-defined function (or implement in TS) that computes edit distance. Sort by distance. Slower but more accurate.
3. **Trigram simulation in TS.** Compute trigrams in TypeScript, store in a separate table, query by trigram overlap. Fast but requires maintaining the trigram index.
Recommendation: start with LIKE + fallback to Levenshtein UDF. Good enough for single-user, <10K pages.
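A minimal sketch of that recommendation, implemented in TypeScript rather than as a native UDF. The helper names (`toLikePattern`, `levenshtein`, `rankSlugs`) are hypothetical, and the tokenization rule is an assumption:

```typescript
// Build a LIKE pattern from user input, e.g. "dont scale" -> "%dont%scale%".
// Splitting on non-alphanumerics also removes any % or _ in the input, so no
// ESCAPE clause is needed.
function toLikePattern(input: string): string {
  const words = input.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return words.length > 0 ? `%${words.join('%')}%` : '%';
}

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Rank candidate slugs (rows returned by the LIKE query, or all slugs when
// LIKE finds nothing) by edit distance to the input, closest first.
function rankSlugs(input: string, candidates: string[]): string[] {
  const needle = input.toLowerCase();
  return candidates
    .slice()
    .sort((x, y) => levenshtein(needle, x.toLowerCase()) - levenshtein(needle, y.toLowerCase()));
}
```

The pattern would be bound as a parameter, as in `WHERE slug LIKE ?`. Computing distances in TypeScript avoids a native extension, at the cost of loading candidate slugs into memory, which is acceptable at the page counts mentioned above.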
## Implementation roadmap
If you're building this, here's the order:
1. **`src/core/sqlite-engine.ts`** implementing `BrainEngine`
2. **Schema migration** (the SQL above)
3. **CRUD operations** (getPage, putPage, listPages, deletePage). Straightforward SQL.
4. **FTS5 keyword search** (searchKeyword). Map `websearch_to_tsquery` semantics to FTS5 query syntax.
5. **Tags, links, timeline, raw_data, versions, config, ingest_log.** All straightforward.
6. **Graph traversal.** SQLite supports recursive CTEs since 3.8.3. Port the Postgres CTE with max depth.
7. **Vector search** (optional). Pick sqlite-vss or vec0, implement searchVector.
8. **Tests.** Port the Postgres test suite. Most tests should be engine-agnostic.
Steps 1-6 are purely mechanical. Step 7 is the only one that requires a native extension.
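Step 4 is the least obvious of the mechanical steps. The sketch below approximates `websearch_to_tsquery` semantics (bare words are ANDed, `"quoted text"` is a phrase, `or` means OR, a leading `-` excludes) in FTS5 query syntax. The function name `toFts5Query` is hypothetical, and the translation ignores stemming and stop words, which the two engines handle differently.

```typescript
// Translate a web-search style query into an FTS5 MATCH expression.
function toFts5Query(input: string): string {
  // Tokens: "quoted phrase", -"negated phrase", or a run of non-space chars.
  const tokens = input.match(/-?"[^"]*"|\S+/g) ?? [];
  const positive: string[] = [];
  const negative: string[] = [];
  let pendingOr = false;

  // Quote every term so FTS5 never parses user text as an operator;
  // embedded double quotes are doubled, as FTS5 string syntax requires.
  const quote = (s: string) => `"${s.replace(/"/g, '""')}"`;

  for (const raw of tokens) {
    if (raw.toLowerCase() === 'or') {
      pendingOr = positive.length > 0;
      continue;
    }
    const negated = raw.startsWith('-') && raw.length > 1;
    const body = (negated ? raw.slice(1) : raw).replace(/^"|"$/g, '').trim();
    if (!body) continue;
    if (negated) {
      negative.push(quote(body));
    } else if (pendingOr) {
      const last = positive.length - 1;
      positive[last] = `(${positive[last]} OR ${quote(body)})`;
    } else {
      positive.push(quote(body));
    }
    pendingOr = false;
  }

  // FTS5 NOT is a binary operator, so a query with only exclusions is invalid.
  if (positive.length === 0) return '';
  const base = positive.join(' AND ');
  return negative.length > 0 ? `(${base}) NOT (${negative.join(' OR ')})` : base;
}
```

The returned string is meant to be bound as the parameter of an FTS5 `MATCH` clause; an empty string signals that the caller should skip the keyword search rather than run it.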
## Dependencies for SQLite engine
```json
{
"better-sqlite3": "^11.0.0"
}
```
Or use Bun's built-in `bun:sqlite` driver (zero dependency).
For vector search, add one of:
- `sqlite-vss` (native extension, platform-specific)
- `sqlite-vec` (provides the `vec0` virtual table; native extension, platform-specific)
## Testing strategy
Most test cases should be engine-agnostic. The test runner should parameterize by engine:
```typescript
const engines = [
{ name: 'postgres', factory: () => new PostgresEngine() },
{ name: 'sqlite', factory: () => new SQLiteEngine() },
];
for (const { name, factory } of engines) {
describe(`BrainEngine (${name})`, () => {
const engine = factory();
test('putPage + getPage round-trip', async () => {
await engine.putPage('test/slug', { title: 'Test', type: 'person' /* , ...remaining page fields */ });
const page = await engine.getPage('test/slug');
expect(page?.title).toBe('Test'); // getPage may return null
});
// ... all CRUD, search, link, tag, timeline tests
});
}
```
Search tests may need engine-specific assertions (ranking differences between tsvector and FTS5 are expected). But the interface contract (returns SearchResult[], sorted by relevance) should hold across engines.
## File structure
```
brain.db # ~750MB for 7K pages with embeddings
# ~150MB without embeddings (keyword-only)
~/.gbrain/config.json # { "engine": "sqlite", "database_path": "..." }
```
That's it. One file for the brain. One file for config.
## Migration between engines
Future work: `gbrain migrate --from postgres --to sqlite` (and vice versa). The engine interface makes this straightforward: export all data via one engine's methods, then import via the other's. The data model is the same; only the storage format changes.
This is not built yet. For now, `gbrain export` to markdown and `gbrain import` into the other engine achieves the same result (with re-chunking and re-embedding).
## Contributing
If you want to build this:
1. Fork the repo
2. Create `src/core/sqlite-engine.ts`
3. Use the schema from this document
4. Run the existing test suite against your engine
5. PR it
The interface is well-defined. The schema is documented. The test suite exists. This should be a few days of focused work with Claude Code, or a weekend project for a human.
We'd love to see it.

32
package.json Normal file

@@ -0,0 +1,32 @@
{
"name": "gbrain",
"version": "0.1.0",
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
"type": "module",
"main": "src/core/index.ts",
"bin": {
"gbrain": "src/cli.ts"
},
"exports": {
".": "./src/core/index.ts",
"./engine": "./src/core/engine.ts",
"./types": "./src/core/types.ts"
},
"scripts": {
"dev": "bun run src/cli.ts",
"build": "bun build --compile --outfile bin/gbrain src/cli.ts",
"test": "bun test"
},
"dependencies": {
"@anthropic-ai/sdk": "^0.30.0",
"@modelcontextprotocol/sdk": "^1.0.0",
"gray-matter": "^4.0.3",
"openai": "^4.0.0",
"pgvector": "^0.2.0",
"postgres": "^3.4.0"
},
"devDependencies": {
"@types/bun": "latest"
},
"license": "MIT"
}

58
skills/briefing/SKILL.md Normal file

@@ -0,0 +1,58 @@
# Briefing Skill
Compile a daily briefing from brain context.
## Workflow
1. **Today's meetings.** For each meeting on the calendar:
- Look up all participants via `gbrain query <name>`
- Read their pages for compiled_truth context
- Summarize: who they are, recent timeline, relationship to you
2. **Active deals.** `gbrain list --type deal` filtered to active status:
- Deadlines approaching in the next 7 days
- Recent timeline entries (last 7 days)
3. **Time-sensitive threads.** Open items from timeline entries:
- Items with deadlines in the next 48 hours
- Follow-ups that are overdue
4. **Recent changes.** Pages updated in the last 24 hours:
- What changed and why (read timeline entries)
5. **People in play.** `gbrain list --type person` sorted by recency:
- Updated in last 7 days
- Have high activity (many recent timeline entries)
6. **Stale alerts.** From `gbrain health`:
- Pages flagged as stale that are relevant to today's meetings
## Output Format
```
DAILY BRIEFING — [date]
========================
MEETINGS TODAY
- [time] [meeting name]
Participants: [name] (slug: people/name, [key context])
ACTIVE DEALS
- [deal name] — [status], deadline: [date]
Recent: [latest timeline entry]
ACTION ITEMS
- [item] — due [date], related to [slug]
RECENT CHANGES (24h)
- [slug] — [what changed]
PEOPLE IN PLAY
- [name] — [why they're active]
```
## Commands Used
```
gbrain query <name>
gbrain get <slug>
gbrain list --type deal
gbrain list --type person
gbrain health
gbrain timeline <slug>
```

45
skills/enrich/SKILL.md Normal file

@@ -0,0 +1,45 @@
# Enrich Skill
Enrich person and company pages from external APIs.
## Sources
| Source | Data | API |
|--------|------|-----|
| Crustdata | LinkedIn profiles, company data | REST API |
| Happenstance | Career history, connections | REST API |
| Exa | Web mentions, articles | REST API |
Note: enrichment requires separate API credentials for each service. No client
integrations ship in v1. This skill guides Claude Code to make API calls directly.
## Workflow
1. **Select target pages.** `gbrain list --type person` or `gbrain list --type company`
2. **For each page:**
- Read current compiled_truth to understand what we already know
- Call external APIs for fresh data
- Store raw API responses: the raw JSON goes into `gbrain call put_raw_data`
- Distill highlights into compiled_truth updates
3. **Validation rules:**
- Connection count < 20 on LinkedIn = likely wrong person, skip
- Name mismatch between brain and API = skip, flag for manual review
- Don't overwrite human-written assessments with API boilerplate
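The first two validation rules can be sketched as a small gate that runs before any page is touched. The `BrainPerson` and `ApiProfile` shapes below are hypothetical; real API responses will have different field names.

```typescript
// Hypothetical shapes: neither type comes from GBrain or from any API client.
interface BrainPerson { slug: string; name: string }
interface ApiProfile { name: string; connectionCount: number }

type Verdict =
  | { action: 'accept' }
  | { action: 'skip'; reason: string; needsReview: boolean };

// Compare names case-insensitively and ignore punctuation and extra spaces.
// ASCII-only normalization; non-Latin names would need a better rule.
const normalize = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();

function validateProfile(person: BrainPerson, profile: ApiProfile): Verdict {
  if (profile.connectionCount < 20) {
    return { action: 'skip', reason: 'fewer than 20 connections, likely wrong person', needsReview: false };
  }
  if (normalize(person.name) !== normalize(profile.name)) {
    return { action: 'skip', reason: 'name mismatch between brain and API', needsReview: true };
  }
  return { action: 'accept' };
}
```

The third rule (not overwriting human-written assessments) is a judgment call and is left to the agent rather than encoded here.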
## Quality Rules
- Raw data goes to raw_data table (preserves provenance)
- Only distilled, useful info goes to compiled_truth
- Always add a timeline entry: "Enriched from [source] on [date]"
- Don't enrich the same page more than once per week unless requested
- Rate limit: respect API rate limits, use exponential backoff
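For the rate-limit rule, a minimal exponential-backoff sketch. The default values (500 ms base, 30 s cap, 5 retries) are illustrative assumptions, not limits published by any of the services above.

```typescript
// Upper bound for the wait before retry number `attempt` (0-based):
// baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffCapMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async call, sleeping a random time below the cap ("full jitter")
// between attempts. A real client would also inspect the error and honor a
// Retry-After header when the API sends one.
async function withBackoff<T>(fn: () => Promise<T>, retries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      const delayMs = Math.random() * backoffCapMs(attempt);
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
}
```

Each enrichment call would be wrapped as `withBackoff(() => callTheApi())`, where `callTheApi` stands for whatever request function is in use.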
## Commands Used
```
gbrain get <slug>
gbrain put <slug>
gbrain call put_raw_data '<json>'
gbrain timeline-add <slug> <date> "Enriched from <source>"
gbrain list --type person
gbrain list --type company
```

34
skills/ingest/SKILL.md Normal file

@@ -0,0 +1,34 @@
# Ingest Skill
Ingest meetings, articles, documents, and conversations into the brain.
## Workflow
1. **Parse the source.** Extract people, companies, dates, and events from the input.
2. **For each entity mentioned:**
- `gbrain get <slug>` to check if page exists
- If exists: update compiled_truth (rewrite State section with new info, don't append)
- If new: `gbrain put <slug>` to create the page
3. **Append to timeline.** `gbrain timeline-add <slug> <date> <summary>` for each event.
4. **Create cross-reference links.** `gbrain link <from> <to> --type <relationship>` for every entity pair mentioned together.
5. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
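Steps 3 to 5 amount to a fan-out: one timeline entry per entity, plus one link per entity pair. A sketch that plans the commands for a single event follows. The `ParsedEvent` shape and the function name are hypothetical, and treating each pair as a single link in one direction is an assumption.

```typescript
// Hypothetical output of the "parse the source" step.
interface ParsedEvent { date: string; summary: string; slugs: string[] }

function planIngestCommands(event: ParsedEvent, linkType: string): string[] {
  const slugs = Array.from(new Set(event.slugs)); // drop duplicate mentions
  // Single-quote the summary for a POSIX shell, escaping embedded quotes.
  const quote = (s: string) => `'${s.replace(/'/g, `'\\''`)}'`;
  const commands: string[] = [];

  // Timeline merge: the same entry goes on every mentioned entity's page.
  for (const slug of slugs) {
    commands.push(`gbrain timeline-add ${slug} ${event.date} ${quote(event.summary)}`);
  }
  // Cross-references: one link per unordered pair of entities.
  for (let i = 0; i < slugs.length; i++) {
    for (let j = i + 1; j < slugs.length; j++) {
      commands.push(`gbrain link ${slugs[i]} ${slugs[j]} --type ${linkType}`);
    }
  }
  return commands;
}
```

For the Alice, Bob, and Acme Corp example this yields three `timeline-add` commands and three `link` commands.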
## Quality Rules
- Executive summary in compiled_truth must be updated, not just timeline appended
- State section is REWRITTEN, not appended to. Current best understanding only.
- Timeline entries are reverse-chronological (newest first)
- Every person/company mentioned gets a page if one doesn't exist
- Link types: knows, works_at, invested_in, founded, met_at, discussed
- Source attribution: every timeline entry includes the source (meeting, article, email, etc.)
## Commands Used
```
gbrain get <slug>
gbrain put <slug> < content.md
gbrain timeline-add <slug> <date> <summary>
gbrain link <from> <to> --type <type>
gbrain tags <slug>
gbrain tag <slug> <tag>
```

59
skills/maintain/SKILL.md Normal file

@@ -0,0 +1,59 @@
# Maintain Skill
Periodic brain health checks and cleanup.
## Workflow
1. **Run health check.** `gbrain health` to get the dashboard.
2. **Check each dimension:**
### Stale pages
Pages where compiled_truth is older than the latest timeline entry. The assessment hasn't been updated to reflect recent evidence.
- `gbrain query "stale pages"` or check health output
- For each stale page: read timeline, determine if compiled_truth needs rewriting
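The staleness definition above reduces to one comparison. A sketch, assuming the page exposes when its compiled_truth was last rewritten and the dates of its timeline entries as ISO strings; the `PageDates` field names are hypothetical:

```typescript
// Hypothetical view of a page: only the dates needed for the check.
interface PageDates {
  slug: string;
  compiledTruthUpdatedAt: string; // ISO timestamp of the last rewrite
  timelineDates: string[];        // ISO dates of timeline entries
}

// Stale = the newest timeline entry is later than the last compiled_truth rewrite.
function isStale(page: PageDates): boolean {
  if (page.timelineDates.length === 0) return false;
  const latest = Math.max(...page.timelineDates.map(d => Date.parse(d)));
  // An unparseable date yields NaN, which makes this comparison false.
  return Date.parse(page.compiledTruthUpdatedAt) < latest;
}
```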
### Orphan pages
Pages with zero inbound links. Nobody references them.
- Review orphans: are they genuinely isolated or just missing links?
- Add links from related pages or flag for deletion
### Dead links
Links pointing to pages that don't exist.
- Remove dead links with `gbrain unlink`
### Missing cross-references
Pages that mention entity names but don't have formal links.
- Read compiled_truth, extract entity mentions, create links
### Tag consistency
Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").
- Standardize to the most common variant
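Once a group of equivalent tags has been identified (a judgment call this sketch does not attempt), picking the most common variant and planning the retag is mechanical. The function names and the `pageTags` input shape are hypothetical:

```typescript
// Count how often each tag is used across pages (slug -> tags).
function countTags(pageTags: Record<string, string[]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const tags of Object.values(pageTags)) {
    for (const tag of tags) counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  return counts;
}

// For one non-empty group of equivalent tags, emit the commands that move
// every page to the most common variant.
function planRetag(pageTags: Record<string, string[]>, variants: string[]): string[] {
  const counts = countTags(pageTags);
  const canonical = variants.reduce((best, tag) =>
    (counts.get(tag) ?? 0) > (counts.get(best) ?? 0) ? tag : best);
  const commands: string[] = [];
  for (const [slug, tags] of Object.entries(pageTags)) {
    const outdated = tags.filter(t => t !== canonical && variants.includes(t));
    if (outdated.length === 0) continue;
    if (!tags.includes(canonical)) commands.push(`gbrain tag ${slug} ${canonical}`);
    for (const t of outdated) commands.push(`gbrain untag ${slug} ${t}`);
  }
  return commands;
}
```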
### Embedding freshness
Chunks without embeddings, or chunks embedded with an old model.
- `gbrain embed --stale` to backfill
### Open threads
Timeline items older than 30 days with unresolved action items.
- Flag for review
## Quality Rules
- Never delete pages without confirmation
- Log all changes via timeline entries
- Run `gbrain health` before and after to show improvement
## Commands Used
```
gbrain health
gbrain list [--type T]
gbrain get <slug>
gbrain backlinks <slug>
gbrain link <from> <to> --type <type>
gbrain unlink <from> <to>
gbrain tag <slug> <tag>
gbrain untag <slug> <tag>
gbrain embed --stale
gbrain timeline <slug>
```

45
skills/manifest.json Normal file

@@ -0,0 +1,45 @@
{
"name": "gbrain",
"version": "0.1.0",
"description": "Personal knowledge brain with hybrid RAG search",
"skills": [
{
"name": "ingest",
"path": "ingest/SKILL.md",
"description": "Ingest meetings, docs, articles into the brain"
},
{
"name": "query",
"path": "query/SKILL.md",
"description": "Answer questions using 3-layer search and synthesis"
},
{
"name": "maintain",
"path": "maintain/SKILL.md",
"description": "Brain health checks: contradictions, stale info, orphans"
},
{
"name": "enrich",
"path": "enrich/SKILL.md",
"description": "Enrich pages from external APIs (Crustdata, Happenstance, Exa)"
},
{
"name": "briefing",
"path": "briefing/SKILL.md",
"description": "Compile daily briefing with meeting context and active deals"
},
{
"name": "migrate",
"path": "migrate/SKILL.md",
"description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
}
],
"dependencies": {
"runtime": "bun",
"package": "gbrain"
},
"setup": {
"command": "gbrain init --supabase",
"description": "Initialize brain with Supabase (guided wizard)"
}
}

87
skills/migrate/SKILL.md Normal file

@@ -0,0 +1,87 @@
# Migrate Skill
Universal migration from any wiki, note tool, or brain system into GBrain.
## Supported Sources
| Source | Format | Strategy |
|--------|--------|----------|
| Obsidian | Markdown + `[[wikilinks]]` | Direct import, convert wikilinks to gbrain links |
| Notion | Exported markdown or CSV | Parse Notion's export structure |
| Logseq | Markdown with `((block refs))` | Convert block refs to page links |
| Plain markdown | Any .md directory | `gbrain import <dir>` directly |
| CSV | Tabular data | Map columns to frontmatter fields |
| JSON | Structured data | Map keys to page fields |
| Roam | JSON export | Convert block structure to pages |
## General Workflow
1. **Assess the source.** What format? How many files? What structure?
2. **Plan the mapping.** How do source fields map to gbrain fields (type, title, tags, compiled_truth, timeline)?
3. **Test with a sample.** Import 5-10 files, verify with `gbrain get` and `gbrain export`.
4. **Bulk import.** Run the full migration.
5. **Verify.** `gbrain health` + `gbrain stats` + spot-check pages.
6. **Build links.** Extract cross-references from content and create typed links.
## Obsidian Migration
```bash
# 1. Direct import (obsidian vaults are markdown directories)
gbrain import /path/to/vault/
# 2. Convert [[wikilinks]] to gbrain links
# The skill reads each page's compiled_truth, finds [[Name]] patterns,
# resolves them to slugs, and creates links:
gbrain get <slug> # read content
# For each [[Name]] found:
gbrain link <current-slug> <resolved-slug> --type references
```
Obsidian-specific:
- `[[Name]]` becomes `gbrain link`
- `[[Name|alias]]` uses the alias for context
- Tags (`#tag`) become `gbrain tag`
- Frontmatter properties map to gbrain frontmatter
- Attachments (images, PDFs) are noted but not imported (future work)
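A sketch of the wikilink extraction step. The slug rule in `toSlug` (lowercase, non-alphanumerics to hyphens) is an assumption; in practice the resolved slug should come from the brain itself, and both function names are hypothetical.

```typescript
interface WikiLink { target: string; alias?: string }

// Find [[Name]], [[Name|alias]] and [[Name#Heading]] in markdown text.
// Embeds such as ![[image.png]] also match and may need filtering.
function extractWikilinks(markdown: string): WikiLink[] {
  const links: WikiLink[] = [];
  const re = /\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]+))?\]\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    links.push({ target: m[1].trim(), alias: m[2]?.trim() });
  }
  return links;
}

// Assumed slug rule, for illustration only.
function toSlug(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}
```

Each extracted target then becomes one `gbrain link <current-slug> <resolved-slug> --type references` call.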
## Notion Migration
1. Export from Notion: Settings > Export > Markdown & CSV
2. Notion exports nested directories with UUIDs in filenames
3. Strip UUIDs from filenames for clean slugs
4. Map Notion's database properties to frontmatter
5. `gbrain import` the cleaned directory
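Step 3 can be sketched as a per-segment rename. This assumes the export appends the ID to each file and directory name as a space followed by 32 hexadecimal characters; check a sample export before relying on that pattern. Both function names are hypothetical.

```typescript
// Remove a trailing " <32 hex chars>" from a file or directory name,
// keeping the file extension if there is one.
function stripNotionId(name: string): string {
  return name.replace(/\s+[0-9a-f]{32}(?=\.[^./]+$|$)/i, '');
}

// Apply the rule to every segment of a relative path.
function cleanNotionPath(path: string): string {
  return path.split('/').map(segment => stripNotionId(segment)).join('/');
}
```

Two pages with the same title in one directory would collide after stripping, so a real migration should detect duplicates before renaming.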
## CSV Migration
For tabular data (e.g., CRM exports, contact lists):
```bash
# For each row in the CSV:
# 1. Create a page with column values as frontmatter
# 2. Use a designated column as the slug (e.g., name)
# 3. Use another column as compiled_truth (e.g., notes)
gbrain put <slug> < generated.md
```
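A sketch of the per-row conversion, assuming the CSV has already been parsed into objects keyed by column name. The function name, the slug rule, and the frontmatter layout are illustrative assumptions; values are written as JSON strings, which are also valid YAML scalars.

```typescript
type Row = Record<string, string>;

// Turn one parsed CSV row into a slug plus a markdown page whose frontmatter
// holds every non-empty column except the body column.
function rowToMarkdown(row: Row, slugColumn: string, bodyColumn: string): { slug: string; markdown: string } {
  const slug = (row[slugColumn] ?? '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  const frontmatter = Object.entries(row)
    .filter(([key, value]) => key !== bodyColumn && value !== '')
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join('\n');
  const markdown = `---\n${frontmatter}\n---\n\n${row[bodyColumn] ?? ''}\n`;
  return { slug, markdown };
}
```

Column names containing a colon or other YAML-significant characters would need quoting before being used as frontmatter keys. The resulting markdown is what gets piped into `gbrain put <slug>`.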
## Verification
After any migration:
1. `gbrain stats` — check page count matches source
2. `gbrain health` — check for orphans, missing embeddings
3. `gbrain export --dir /tmp/verify/` — round-trip test
4. Spot-check 5-10 pages with `gbrain get`
5. Test search: `gbrain query "<name of someone you know is in the data>"`
## Commands Used
```
gbrain import <dir> [--no-embed]
gbrain get <slug>
gbrain put <slug>
gbrain link <from> <to> --type <type>
gbrain tag <slug> <tag>
gbrain stats
gbrain health
gbrain export [--dir ./verify/]
```

38
skills/query/SKILL.md Normal file

@@ -0,0 +1,38 @@
# Query Skill
Answer questions using the brain's knowledge with 3-layer search and synthesis.
## Workflow
1. **Decompose the question** into search strategies:
- Keyword search for specific names, dates, terms
- Semantic query for conceptual questions
- Structured queries (list by type, backlinks) for relational questions
2. **Execute searches:**
- `gbrain search <keywords>` for FTS matches
- `gbrain query <question>` for hybrid semantic+keyword with expansion
- `gbrain list --type <type>` or `gbrain backlinks <slug>` for structural queries
3. **Read top results.** `gbrain get <slug>` for the top 3-5 pages to get full context.
4. **Synthesize answer** with citations. Every claim traces back to a specific page slug.
5. **Flag gaps.** If the brain doesn't have info, say "the brain doesn't have information on X" rather than hallucinating.
## Quality Rules
- Never hallucinate. Only answer from brain content.
- Cite sources: "According to concepts/do-things-that-dont-scale..."
- Flag stale results: if a search result shows [STALE], note that the info may be outdated
- For "who" questions, use backlinks and typed links to find connections
- For "what happened" questions, use timeline entries
- For "what do we know" questions, read compiled_truth directly
## Commands Used
```
gbrain search <query>
gbrain query <question>
gbrain get <slug>
gbrain list [--type T] [--tag T]
gbrain backlinks <slug>
gbrain graph <slug> [--depth N]
gbrain timeline <slug>
```

252
src/cli.ts Normal file

@@ -0,0 +1,252 @@
#!/usr/bin/env bun
import { PostgresEngine } from './core/postgres-engine.ts';
import { loadConfig, toEngineConfig } from './core/config.ts';
import type { BrainEngine } from './core/engine.ts';
const VERSION = '0.1.0';
async function main() {
const args = process.argv.slice(2);
const command = args[0];
if (!command || command === '--help' || command === '-h') {
printHelp();
return;
}
if (command === '--version' || command === 'version') {
console.log(`gbrain ${VERSION}`);
return;
}
if (command === '--tools-json') {
const { printToolsJson } = await import('./commands/tools-json.ts');
printToolsJson();
return;
}
// Commands that don't need a database connection
if (command === 'init') {
const { runInit } = await import('./commands/init.ts');
await runInit(args.slice(1));
return;
}
if (command === 'upgrade') {
const { runUpgrade } = await import('./commands/upgrade.ts');
await runUpgrade(args.slice(1));
return;
}
// All other commands need a database connection
const engine = await connectEngine();
try {
switch (command) {
case 'get': {
const { runGet } = await import('./commands/get.ts');
await runGet(engine, args.slice(1));
break;
}
case 'put': {
const { runPut } = await import('./commands/put.ts');
await runPut(engine, args.slice(1));
break;
}
case 'list': {
const { runList } = await import('./commands/list.ts');
await runList(engine, args.slice(1));
break;
}
case 'search': {
const { runSearch } = await import('./commands/search.ts');
await runSearch(engine, args.slice(1));
break;
}
case 'query': {
const { runQuery } = await import('./commands/query.ts');
await runQuery(engine, args.slice(1));
break;
}
case 'import': {
const { runImport } = await import('./commands/import.ts');
await runImport(engine, args.slice(1));
break;
}
case 'export': {
const { runExport } = await import('./commands/export.ts');
await runExport(engine, args.slice(1));
break;
}
case 'embed': {
const { runEmbed } = await import('./commands/embed.ts');
await runEmbed(engine, args.slice(1));
break;
}
case 'stats': {
const { runStats } = await import('./commands/stats.ts');
await runStats(engine);
break;
}
case 'health': {
const { runHealth } = await import('./commands/health.ts');
await runHealth(engine);
break;
}
case 'tag': {
const { runTag } = await import('./commands/tags.ts');
await runTag(engine, args.slice(1));
break;
}
case 'untag': {
const { runUntag } = await import('./commands/tags.ts');
await runUntag(engine, args.slice(1));
break;
}
case 'tags': {
const { runTags } = await import('./commands/tags.ts');
await runTags(engine, args.slice(1));
break;
}
case 'link': {
const { runLink } = await import('./commands/link.ts');
await runLink(engine, args.slice(1));
break;
}
case 'unlink': {
const { runUnlink } = await import('./commands/link.ts');
await runUnlink(engine, args.slice(1));
break;
}
case 'backlinks': {
const { runBacklinks } = await import('./commands/link.ts');
await runBacklinks(engine, args.slice(1));
break;
}
case 'graph': {
const { runGraph } = await import('./commands/link.ts');
await runGraph(engine, args.slice(1));
break;
}
case 'timeline': {
const { runTimeline } = await import('./commands/timeline.ts');
await runTimeline(engine, args.slice(1));
break;
}
case 'timeline-add': {
const { runTimelineAdd } = await import('./commands/timeline.ts');
await runTimelineAdd(engine, args.slice(1));
break;
}
case 'delete': {
const { runDelete } = await import('./commands/delete.ts');
await runDelete(engine, args.slice(1));
break;
}
case 'history': {
const { runHistory } = await import('./commands/version.ts');
await runHistory(engine, args.slice(1));
break;
}
case 'revert': {
const { runRevert } = await import('./commands/version.ts');
await runRevert(engine, args.slice(1));
break;
}
case 'config': {
const { runConfig } = await import('./commands/config.ts');
await runConfig(engine, args.slice(1));
break;
}
case 'serve': {
const { runServe } = await import('./commands/serve.ts');
await runServe(engine);
break;
}
case 'call': {
const { runCall } = await import('./commands/call.ts');
await runCall(engine, args.slice(1));
break;
}
default:
console.error(`Unknown command: ${command}`);
console.error('Run gbrain --help for usage');
process.exit(1);
}
} finally {
await engine.disconnect();
}
}
async function connectEngine(): Promise<BrainEngine> {
const config = loadConfig();
if (!config) {
console.error('No brain configured. Run: gbrain init --supabase');
process.exit(1);
}
const engine = new PostgresEngine();
await engine.connect(toEngineConfig(config));
return engine;
}
function printHelp() {
console.log(`gbrain ${VERSION} — personal knowledge brain
USAGE
gbrain <command> [options]
SETUP
init [--supabase|--url <conn>] Create brain (guided wizard)
upgrade Self-update
PAGES
get <slug> Read a page
put <slug> [< file.md] Write/update a page
delete <slug> Delete a page
list [--type T] [--tag T] [-n N] List pages
SEARCH
search <query> Keyword search (tsvector)
query <question> Hybrid search (RRF + expansion)
IMPORT/EXPORT
import <dir> [--no-embed] Import markdown directory
export [--dir ./out/] Export to markdown
EMBEDDINGS
embed [<slug>|--all|--stale] Generate/refresh embeddings
LINKS
link <from> <to> [--type T] Create typed link
unlink <from> <to> Remove link
backlinks <slug> Incoming links
graph <slug> [--depth N] Traverse link graph
TAGS
tags <slug> List tags
tag <slug> <tag> Add tag
untag <slug> <tag> Remove tag
TIMELINE
timeline [<slug>] View timeline
timeline-add <slug> <date> <text> Add timeline entry
ADMIN
stats Brain statistics
health Brain health dashboard
history <slug> Page version history
revert <slug> <version-id> Revert to version
config [get|set] <key> [value] Brain config
serve MCP server (stdio)
call <tool> '<json>' Raw tool invocation
version Version info
--tools-json Tool discovery (JSON)
`);
}
main().catch(e => {
console.error(e.message || e);
process.exit(1);
});

16
src/commands/call.ts Normal file

@@ -0,0 +1,16 @@
import type { BrainEngine } from '../core/engine.ts';
import { handleToolCall } from '../mcp/server.ts';
export async function runCall(engine: BrainEngine, args: string[]) {
const tool = args[0];
const jsonStr = args[1];
if (!tool) {
console.error('Usage: gbrain call <tool> \'<json>\'');
process.exit(1);
}
const params = jsonStr ? JSON.parse(jsonStr) : {};
const result = await handleToolCall(engine, tool, params);
console.log(JSON.stringify(result, null, 2));
}

23
src/commands/config.ts Normal file

@@ -0,0 +1,23 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runConfig(engine: BrainEngine, args: string[]) {
const action = args[0];
const key = args[1];
const value = args[2];
if (action === 'get' && key) {
const val = await engine.getConfig(key);
if (val !== null) {
console.log(val);
} else {
console.error(`Config key not found: ${key}`);
process.exit(1);
}
} else if (action === 'set' && key && value !== undefined) { // allow setting an empty string
await engine.setConfig(key, value);
console.log(`Set ${key} = ${value}`);
} else {
console.error('Usage: gbrain config [get|set] <key> [value]');
process.exit(1);
}
}

18
src/commands/delete.ts Normal file

@@ -0,0 +1,18 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runDelete(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain delete <slug>');
process.exit(1);
}
const page = await engine.getPage(slug);
if (!page) {
console.error(`Page not found: ${slug}`);
process.exit(1);
}
await engine.deletePage(slug);
console.log(`Deleted: ${slug}`);
}

113
src/commands/embed.ts Normal file

@@ -0,0 +1,113 @@
import type { BrainEngine } from '../core/engine.ts';
import { embedBatch } from '../core/embedding.ts';
import type { ChunkInput } from '../core/types.ts';
import { chunkText } from '../core/chunkers/recursive.ts';
export async function runEmbed(engine: BrainEngine, args: string[]) {
const slug = args.find(a => !a.startsWith('--'));
const all = args.includes('--all');
const stale = args.includes('--stale');
if (slug) {
await embedPage(engine, slug);
} else if (all || stale) {
await embedAll(engine, stale);
} else {
console.error('Usage: gbrain embed [<slug>|--all|--stale]');
process.exit(1);
}
}
async function embedPage(engine: BrainEngine, slug: string) {
const page = await engine.getPage(slug);
if (!page) {
console.error(`Page not found: ${slug}`);
process.exit(1);
}
// Get existing chunks or create new ones
let chunks = await engine.getChunks(slug);
if (chunks.length === 0) {
// Create chunks first
const inputs: ChunkInput[] = [];
if (page.compiled_truth.trim()) {
for (const c of chunkText(page.compiled_truth)) {
inputs.push({ chunk_index: inputs.length, chunk_text: c.text, chunk_source: 'compiled_truth' });
}
}
if (page.timeline.trim()) {
for (const c of chunkText(page.timeline)) {
inputs.push({ chunk_index: inputs.length, chunk_text: c.text, chunk_source: 'timeline' });
}
}
if (inputs.length > 0) {
await engine.upsertChunks(slug, inputs);
chunks = await engine.getChunks(slug);
}
}
// Embed chunks without embeddings
const toEmbed = chunks.filter(c => !c.embedded_at);
if (toEmbed.length === 0) {
console.log(`${slug}: all ${chunks.length} chunks already embedded`);
return;
}
const embeddings = await embedBatch(toEmbed.map(c => c.chunk_text));
// Map chunk_index -> new embedding so the lookup below is O(1) per chunk
const embeddingMap = new Map<number, Float32Array>();
toEmbed.forEach((c, j) => embeddingMap.set(c.chunk_index, embeddings[j]));
// Preserve ALL chunks; only chunks that lacked an embedding get a new one
const updated: ChunkInput[] = chunks.map(c => ({
chunk_index: c.chunk_index,
chunk_text: c.chunk_text,
chunk_source: c.chunk_source,
embedding: embeddingMap.get(c.chunk_index),
token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
}));
await engine.upsertChunks(slug, updated);
console.log(`${slug}: embedded ${toEmbed.length} chunks`);
}
async function embedAll(engine: BrainEngine, staleOnly: boolean) {
const pages = await engine.listPages({ limit: 100000 });
let total = 0;
let embedded = 0;
for (let i = 0; i < pages.length; i++) {
const page = pages[i];
const chunks = await engine.getChunks(page.slug);
const toEmbed = staleOnly
? chunks.filter(c => !c.embedded_at)
: chunks;
if (toEmbed.length === 0) continue;
try {
const embeddings = await embedBatch(toEmbed.map(c => c.chunk_text));
// Build a map of new embeddings by chunk_index
const embeddingMap = new Map<number, Float32Array>();
for (let j = 0; j < toEmbed.length; j++) {
embeddingMap.set(toEmbed[j].chunk_index, embeddings[j]);
}
// Preserve ALL chunks, only update embeddings for stale ones
const updated: ChunkInput[] = chunks.map(c => ({
chunk_index: c.chunk_index,
chunk_text: c.chunk_text,
chunk_source: c.chunk_source,
embedding: embeddingMap.get(c.chunk_index),
token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
}));
await engine.upsertChunks(page.slug, updated);
embedded += toEmbed.length;
} catch (e: unknown) {
console.error(`\n Error embedding ${page.slug}: ${e instanceof Error ? e.message : e}`);
}
total += toEmbed.length;
process.stdout.write(`\r ${i + 1}/${pages.length} pages, ${embedded} chunks embedded`);
}
console.log(`\n\nEmbedded ${embedded} of ${total} chunks across ${pages.length} pages`);
}

50
src/commands/export.ts Normal file

@@ -0,0 +1,50 @@
import { writeFileSync, mkdirSync } from 'fs';
import { join, dirname } from 'path';
import type { BrainEngine } from '../core/engine.ts';
import { serializeMarkdown } from '../core/markdown.ts';
export async function runExport(engine: BrainEngine, args: string[]) {
const dirIdx = args.indexOf('--dir');
// Fall back to ./export when --dir is absent or has no value after it
const outDir = (dirIdx !== -1 && args[dirIdx + 1]) || './export';
const pages = await engine.listPages({ limit: 100000 });
console.log(`Exporting ${pages.length} pages to ${outDir}/`);
let exported = 0;
for (const page of pages) {
const tags = await engine.getTags(page.slug);
const md = serializeMarkdown(
page.frontmatter,
page.compiled_truth,
page.timeline,
{ type: page.type, title: page.title, tags },
);
const filePath = join(outDir, page.slug + '.md');
mkdirSync(dirname(filePath), { recursive: true });
writeFileSync(filePath, md);
// Export raw data as sidecar JSON
const rawData = await engine.getRawData(page.slug);
if (rawData.length > 0) {
const slugParts = page.slug.split('/');
const rawDir = join(outDir, ...slugParts.slice(0, -1), '.raw');
mkdirSync(rawDir, { recursive: true });
const rawPath = join(rawDir, slugParts[slugParts.length - 1] + '.json');
const rawObj: Record<string, unknown> = {};
for (const rd of rawData) {
rawObj[rd.source] = rd.data;
}
writeFileSync(rawPath, JSON.stringify(rawObj, null, 2) + '\n');
}
exported++;
if (exported % 100 === 0) {
process.stdout.write(`\r ${exported}/${pages.length} exported`);
}
}
console.log(`\nExported ${exported} pages to ${outDir}/`);
}

37
src/commands/get.ts Normal file

@@ -0,0 +1,37 @@
import type { BrainEngine } from '../core/engine.ts';
import { serializeMarkdown } from '../core/markdown.ts';
export async function runGet(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain get <slug>');
process.exit(1);
}
// Try exact match first, then fuzzy resolve
let page = await engine.getPage(slug);
if (!page) {
const candidates = await engine.resolveSlugs(slug);
if (candidates.length === 1) {
page = await engine.getPage(candidates[0]);
} else if (candidates.length > 1) {
console.error(`Ambiguous slug "${slug}". Did you mean:`);
for (const c of candidates) console.error(` ${c}`);
process.exit(1);
}
}
if (!page) {
console.error(`Page not found: ${slug}`);
process.exit(1);
}
const tags = await engine.getTags(page.slug);
const md = serializeMarkdown(
page.frontmatter,
page.compiled_truth,
page.timeline,
{ type: page.type, title: page.title, tags },
);
process.stdout.write(md);
}

36
src/commands/health.ts Normal file

@@ -0,0 +1,36 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runHealth(engine: BrainEngine) {
const health = await engine.getHealth();
const coveragePct = (health.embed_coverage * 100).toFixed(1);
console.log('Brain Health Dashboard');
console.log('======================');
console.log(`Pages: ${health.page_count}`);
console.log(`Embed coverage: ${coveragePct}%`);
console.log(`Missing embeddings: ${health.missing_embeddings}`);
console.log(`Stale pages: ${health.stale_pages}`);
console.log(`Orphan pages: ${health.orphan_pages}`);
console.log(`Dead links: ${health.dead_links}`);
// Health score: simple heuristic
let score = 10;
if (health.embed_coverage < 0.5) score -= 3;
else if (health.embed_coverage < 0.9) score -= 1;
if (health.stale_pages > health.page_count * 0.2) score -= 2;
if (health.orphan_pages > health.page_count * 0.3) score -= 1;
if (health.dead_links > 0) score -= 1;
if (health.missing_embeddings > 0) score -= 1;
score = Math.max(0, score);
console.log(`\nHealth score: ${score}/10`);
if (score < 7) {
console.log('\nRecommendations:');
if (health.missing_embeddings > 0) console.log(' Run: gbrain embed --stale');
if (health.stale_pages > 0) console.log(' Review stale pages (compiled_truth older than timeline)');
if (health.orphan_pages > 0) console.log(' Add links to orphan pages');
if (health.dead_links > 0) console.log(' Fix dead links');
}
}
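The scoring heuristic in `runHealth` can be exercised without a database by pulling it into a pure function. This is a minimal sketch, assuming a hypothetical `HealthInput` shape that carries only the fields the heuristic reads; the name `healthScore` is illustrative and not part of the PR.

```typescript
// Hypothetical input shape: only the fields the heuristic reads.
interface HealthInput {
  page_count: number;
  embed_coverage: number; // fraction in [0, 1]
  missing_embeddings: number;
  stale_pages: number;
  orphan_pages: number;
  dead_links: number;
}

// Same penalty rules as runHealth, extracted so they can be tested in isolation.
function healthScore(h: HealthInput): number {
  let score = 10;
  if (h.embed_coverage < 0.5) score -= 3;
  else if (h.embed_coverage < 0.9) score -= 1;
  if (h.stale_pages > h.page_count * 0.2) score -= 2;
  if (h.orphan_pages > h.page_count * 0.3) score -= 1;
  if (h.dead_links > 0) score -= 1;
  if (h.missing_embeddings > 0) score -= 1;
  return Math.max(0, score);
}

const healthy: HealthInput = {
  page_count: 100,
  embed_coverage: 1,
  missing_embeddings: 0,
  stale_pages: 0,
  orphan_pages: 0,
  dead_links: 0,
};

const degraded: HealthInput = {
  page_count: 100,
  embed_coverage: 0.4,
  missing_embeddings: 60,
  stale_pages: 30,
  orphan_pages: 10,
  dead_links: 2,
};

console.log(healthScore(healthy), healthScore(degraded));
```

With these penalties the lowest reachable score is 2 (10 - 3 - 2 - 1 - 1 - 1), so the clamp at 0 acts only as a safety net if more penalties are added later.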

152
src/commands/import.ts Normal file

@@ -0,0 +1,152 @@
import { readFileSync, readdirSync, statSync } from 'fs';
import { join, relative } from 'path';
import { createHash } from 'crypto';
import type { BrainEngine } from '../core/engine.ts';
import { parseMarkdown } from '../core/markdown.ts';
import { chunkText } from '../core/chunkers/recursive.ts';
import { embedBatch } from '../core/embedding.ts';
import type { ChunkInput } from '../core/types.ts';
export async function runImport(engine: BrainEngine, args: string[]) {
const dir = args.find(a => !a.startsWith('--'));
const noEmbed = args.includes('--no-embed');
if (!dir) {
console.error('Usage: gbrain import <dir> [--no-embed]');
process.exit(1);
}
// Collect all .md files
const files = collectMarkdownFiles(dir);
console.log(`Found ${files.length} markdown files`);
let imported = 0;
let skipped = 0;
let chunksCreated = 0;
for (let i = 0; i < files.length; i++) {
const filePath = files[i];
const relativePath = relative(dir, filePath);
// Progress
if ((i + 1) % 100 === 0 || i === files.length - 1) {
process.stdout.write(`\r ${i + 1}/${files.length} files processed, ${imported} imported, ${skipped} skipped`);
}
try {
const content = readFileSync(filePath, 'utf-8');
const parsed = parseMarkdown(content, relativePath);
const slug = parsed.slug;
// Check content hash for idempotency
const hash = createHash('sha256')
.update(parsed.compiled_truth + '\n---\n' + parsed.timeline)
.digest('hex');
const existing = await engine.getPage(slug);
if (existing?.content_hash === hash) {
skipped++;
continue;
}
// Upsert page
await engine.putPage(slug, {
type: parsed.type,
title: parsed.title,
compiled_truth: parsed.compiled_truth,
timeline: parsed.timeline,
frontmatter: parsed.frontmatter,
});
// Tags
for (const tag of parsed.tags) {
await engine.addTag(slug, tag);
}
// Chunk
const chunks: ChunkInput[] = [];
if (parsed.compiled_truth.trim()) {
const ctChunks = chunkText(parsed.compiled_truth);
for (const c of ctChunks) {
chunks.push({
chunk_index: chunks.length,
chunk_text: c.text,
chunk_source: 'compiled_truth',
});
}
}
if (parsed.timeline.trim()) {
const tlChunks = chunkText(parsed.timeline);
for (const c of tlChunks) {
chunks.push({
chunk_index: chunks.length,
chunk_text: c.text,
chunk_source: 'timeline',
});
}
}
// Embed if requested
if (!noEmbed && chunks.length > 0) {
try {
const embeddings = await embedBatch(chunks.map(c => c.chunk_text));
for (let j = 0; j < chunks.length; j++) {
chunks[j].embedding = embeddings[j];
chunks[j].token_count = Math.ceil(chunks[j].chunk_text.length / 4);
}
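} catch (e: unknown) {
// Embedding failure is non-fatal: chunks are still saved without embeddings
console.error(`\n Warning: embedding failed for ${relativePath}: ${e instanceof Error ? e.message : String(e)}`);
}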
}
if (chunks.length > 0) {
await engine.upsertChunks(slug, chunks);
chunksCreated += chunks.length;
}
imported++;
} catch (e: unknown) {
const msg = e instanceof Error ? e.message : String(e);
console.error(`\n Warning: skipped ${relativePath}: ${msg}`);
skipped++;
}
}
console.log(`\n\nImport complete:`);
console.log(` ${imported} pages imported`);
console.log(` ${skipped} pages skipped (unchanged or error)`);
console.log(` ${chunksCreated} chunks created`);
// Log the ingest
await engine.logIngest({
source_type: 'directory',
source_ref: dir,
pages_updated: [],
summary: `Imported ${imported} pages, ${skipped} skipped, ${chunksCreated} chunks`,
});
}
function collectMarkdownFiles(dir: string): string[] {
const files: string[] = [];
function walk(d: string) {
for (const entry of readdirSync(d)) {
// Skip hidden dirs and .raw dirs
if (entry.startsWith('.')) continue;
const full = join(d, entry);
const stat = statSync(full);
if (stat.isDirectory()) {
walk(full);
} else if (entry.endsWith('.md')) {
files.push(full);
}
}
}
walk(dir);
return files.sort();
}
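The idempotency check in `runImport` hashes `compiled_truth` and `timeline` and skips the page when the stored hash matches. Below is a minimal sketch of that idea. It assumes the engine stores `content_hash` computed over the same input, which this diff does not show. The `stored` map and `importOnce` are hypothetical stand-ins for the engine.

```typescript
import { createHash } from 'node:crypto';

// Same hash input as runImport: compiled truth and timeline joined by a separator.
function contentHash(compiledTruth: string, timeline: string): string {
  return createHash('sha256')
    .update(compiledTruth + '\n---\n' + timeline)
    .digest('hex');
}

// Hypothetical in-memory stand-in for the engine's stored content_hash values.
const stored = new Map<string, string>();

// Returns true when the page was written, false when it was skipped as unchanged.
function importOnce(slug: string, compiledTruth: string, timeline: string): boolean {
  const hash = contentHash(compiledTruth, timeline);
  if (stored.get(slug) === hash) return false;
  stored.set(slug, hash);
  return true;
}

const first = importOnce('people/john', 'John is an engineer.', '2024-01-01: met John');
const second = importOnce('people/john', 'John is an engineer.', '2024-01-01: met John');
const third = importOnce('people/john', 'John is a manager.', '2024-01-01: met John');

console.log(first, second, third);
```

Because the hash covers only `compiled_truth` and `timeline`, a change confined to frontmatter or tags would presumably be skipped by this check.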

82
src/commands/init.ts Normal file

@@ -0,0 +1,82 @@
import { execSync } from 'child_process';
import { PostgresEngine } from '../core/postgres-engine.ts';
import { saveConfig, type GBrainConfig } from '../core/config.ts';
export async function runInit(args: string[]) {
// --supabase and the no-flag default both run the wizard; only --url bypasses it
const urlIndex = args.indexOf('--url');
const manualUrl = urlIndex !== -1 ? args[urlIndex + 1] : null;
const databaseUrl: string = manualUrl || (await supabaseWizard());
// Connect and init schema
console.log('Connecting to database...');
const engine = new PostgresEngine();
await engine.connect({ database_url: databaseUrl });
console.log('Running schema migration...');
await engine.initSchema();
// Save config
const config: GBrainConfig = {
engine: 'postgres',
database_url: databaseUrl,
};
saveConfig(config);
console.log('Config saved to ~/.gbrain/config.json');
// Verify
const stats = await engine.getStats();
await engine.disconnect();
console.log(`\nBrain ready. ${stats.page_count} pages.`);
console.log('Next: gbrain import <dir> to migrate your markdown.');
}
async function supabaseWizard(): Promise<string> {
// Try Supabase CLI auto-provision
try {
execSync('npx supabase --version', { stdio: 'pipe' });
console.log('Supabase CLI detected.');
console.log('To auto-provision, run: npx supabase login && npx supabase projects create');
console.log('Then use: gbrain init --url <your-connection-string>');
} catch {
console.log('Supabase CLI not found.');
console.log('Install it: npm install -g supabase');
console.log('Or provide a connection URL directly.');
}
// Fallback to manual URL
console.log('\nEnter your Supabase/Postgres connection URL:');
console.log(' Format: postgresql://user:password@host:port/database');
console.log(' Find it: Supabase Dashboard > Settings > Database > Connection string\n');
const url = await readLine('Connection URL: ');
if (!url) {
console.error('No URL provided.');
process.exit(1);
}
return url;
}
function readLine(prompt: string): Promise<string> {
return new Promise((resolve) => {
process.stdout.write(prompt);
let data = '';
process.stdin.setEncoding('utf-8');
process.stdin.once('data', (chunk) => {
data = chunk.toString().trim();
resolve(data);
});
process.stdin.resume();
});
}

68
src/commands/link.ts Normal file

@@ -0,0 +1,68 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runLink(engine: BrainEngine, args: string[]) {
const from = args[0];
const to = args[1];
const typeIdx = args.indexOf('--type');
const linkType = typeIdx !== -1 ? args[typeIdx + 1] : '';
if (!from || !to) {
console.error('Usage: gbrain link <from> <to> [--type <type>]');
process.exit(1);
}
await engine.addLink(from, to, '', linkType);
console.log(`Linked ${from} -> ${to}${linkType ? ` (${linkType})` : ''}`);
}
export async function runUnlink(engine: BrainEngine, args: string[]) {
const [from, to] = args;
if (!from || !to) {
console.error('Usage: gbrain unlink <from> <to>');
process.exit(1);
}
await engine.removeLink(from, to);
console.log(`Unlinked ${from} -> ${to}`);
}
export async function runBacklinks(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain backlinks <slug>');
process.exit(1);
}
const links = await engine.getBacklinks(slug);
if (links.length === 0) {
console.log(`No backlinks to ${slug}`);
return;
}
for (const l of links) {
const typeStr = l.link_type ? ` (${l.link_type})` : '';
console.log(`${l.from_slug}${typeStr}`);
}
console.log(`\n${links.length} backlinks`);
}
export async function runGraph(engine: BrainEngine, args: string[]) {
const slug = args.find(a => !a.startsWith('--'));
const depthIdx = args.indexOf('--depth');
const depth = depthIdx !== -1 ? parseInt(args[depthIdx + 1], 10) : 5;
if (!slug) {
console.error('Usage: gbrain graph <slug> [--depth N]');
process.exit(1);
}
const nodes = await engine.traverseGraph(slug, depth);
for (const node of nodes) {
const indent = ' '.repeat(node.depth);
const links = node.links.map(l => `${l.to_slug}${l.link_type ? `(${l.link_type})` : ''}`);
console.log(`${indent}${node.slug} [${node.type}]`);
if (links.length > 0) {
console.log(`${indent} -> ${links.join(', ')}`);
}
}
}

25
src/commands/list.ts Normal file

@@ -0,0 +1,25 @@
import type { BrainEngine } from '../core/engine.ts';
import type { PageType } from '../core/types.ts';
export async function runList(engine: BrainEngine, args: string[]) {
const typeIdx = args.indexOf('--type');
const tagIdx = args.indexOf('--tag');
const limitIdx = args.indexOf('-n');
const type = typeIdx !== -1 ? (args[typeIdx + 1] as PageType) : undefined;
const tag = tagIdx !== -1 ? args[tagIdx + 1] : undefined;
const limit = limitIdx !== -1 ? parseInt(args[limitIdx + 1], 10) : 50;
const pages = await engine.listPages({ type, tag, limit });
if (pages.length === 0) {
console.log('No pages found.');
return;
}
for (const p of pages) {
const date = p.updated_at.toISOString().split('T')[0];
console.log(`${p.slug}\t${p.type}\t${date}\t${p.title}`);
}
console.log(`\n${pages.length} pages`);
}
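Several commands read flag values with `args.indexOf(flag)` followed by `args[idx + 1]` and `parseInt`, which yields `undefined` or `NaN` when the value is missing or malformed. A minimal sketch of a more defensive reader follows. `flagValue` and `intFlag` are hypothetical helpers, not functions from this PR.

```typescript
// Hypothetical helper: return the value following a flag, or undefined when the
// flag is absent or is the last argument.
function flagValue(args: string[], flag: string): string | undefined {
  const idx = args.indexOf(flag);
  if (idx === -1 || idx + 1 >= args.length) return undefined;
  return args[idx + 1];
}

// Hypothetical helper: parse a positive integer flag, falling back to a default
// when the value is missing, not a number, or not positive.
function intFlag(args: string[], flag: string, fallback: number): number {
  const raw = flagValue(args, flag);
  if (raw === undefined) return fallback;
  const n = Number.parseInt(raw, 10);
  return Number.isNaN(n) || n <= 0 ? fallback : n;
}

const explicitLimit = intFlag(['--type', 'person', '-n', '10'], '-n', 50);
const malformedLimit = intFlag(['-n', 'abc'], '-n', 50);
const missingLimit = intFlag(['-n'], '-n', 50);

console.log(explicitLimit, malformedLimit, missingLimit);
```

The same helper would cover `--depth` in `runGraph`, where a non-numeric value currently reaches `traverseGraph` as `NaN`.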

50
src/commands/put.ts Normal file

@@ -0,0 +1,50 @@
import { readFileSync } from 'fs';
import type { BrainEngine } from '../core/engine.ts';
import { parseMarkdown } from '../core/markdown.ts';
export async function runPut(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain put <slug> [< file.md]');
process.exit(1);
}
// Read from stdin or file arg
let content: string;
const fileArg = args[1];
if (fileArg) {
content = readFileSync(fileArg, 'utf-8');
} else if (!process.stdin.isTTY) {
content = readFileSync('/dev/stdin', 'utf-8');
} else {
console.error('Provide content via stdin or file argument');
console.error(' gbrain put people/john < john.md');
console.error(' cat john.md | gbrain put people/john');
process.exit(1);
}
const parsed = parseMarkdown(content, slug + '.md');
// Create version snapshot before updating
const existing = await engine.getPage(slug);
if (existing) {
await engine.createVersion(slug);
}
const page = await engine.putPage(slug, {
type: parsed.type,
title: parsed.title,
compiled_truth: parsed.compiled_truth,
timeline: parsed.timeline,
frontmatter: parsed.frontmatter,
});
// Add tags from the parsed file (this only adds; it does not remove tags missing from the file)
for (const tag of parsed.tags) {
await engine.addTag(slug, tag);
}
console.log(`${existing ? 'Updated' : 'Created'}: ${page.slug} (${page.type})`);
}

32
src/commands/query.ts Normal file

@@ -0,0 +1,32 @@
import type { BrainEngine } from '../core/engine.ts';
import { hybridSearch } from '../core/search/hybrid.ts';
import { expandQuery } from '../core/search/expansion.ts';
export async function runQuery(engine: BrainEngine, args: string[]) {
const query = args.filter(a => !a.startsWith('--')).join(' ');
const noExpand = args.includes('--no-expand');
if (!query) {
console.error('Usage: gbrain query <question>');
process.exit(1);
}
const results = await hybridSearch(engine, query, {
limit: 20,
expansion: !noExpand,
expandFn: expandQuery,
});
if (results.length === 0) {
console.log('No results found.');
return;
}
for (const r of results) {
const staleTag = r.stale ? ' [STALE]' : '';
console.log(`${r.slug} (${r.type}) score=${r.score.toFixed(4)}${staleTag}`);
console.log(` ${r.chunk_text.slice(0, 120)}${r.chunk_text.length > 120 ? '...' : ''}`);
console.log();
}
console.log(`${results.length} results`);
}

24
src/commands/search.ts Normal file

@@ -0,0 +1,24 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runSearch(engine: BrainEngine, args: string[]) {
const query = args.join(' ');
if (!query) {
console.error('Usage: gbrain search <query>');
process.exit(1);
}
const results = await engine.searchKeyword(query, { limit: 20 });
if (results.length === 0) {
console.log('No results found.');
return;
}
for (const r of results) {
const staleTag = r.stale ? ' [STALE]' : '';
console.log(`${r.slug} (${r.type}) score=${r.score.toFixed(3)}${staleTag}`);
console.log(` ${r.chunk_text.slice(0, 120)}${r.chunk_text.length > 120 ? '...' : ''}`);
console.log();
}
console.log(`${results.length} results`);
}

7
src/commands/serve.ts Normal file

@@ -0,0 +1,7 @@
import type { BrainEngine } from '../core/engine.ts';
import { startMcpServer } from '../mcp/server.ts';
export async function runServe(engine: BrainEngine) {
console.error('Starting GBrain MCP server (stdio)...');
await startMcpServer(engine);
}

21
src/commands/stats.ts Normal file

@@ -0,0 +1,21 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runStats(engine: BrainEngine) {
const stats = await engine.getStats();
console.log('Brain Statistics');
console.log('================');
console.log(`Pages: ${stats.page_count}`);
console.log(`Chunks: ${stats.chunk_count}`);
console.log(`Embedded: ${stats.embedded_count}`);
console.log(`Links: ${stats.link_count}`);
console.log(`Tags: ${stats.tag_count}`);
console.log(`Timeline entries: ${stats.timeline_entry_count}`);
if (Object.keys(stats.pages_by_type).length > 0) {
console.log('\nPages by type:');
for (const [type, count] of Object.entries(stats.pages_by_type)) {
console.log(` ${type}: ${count}`);
}
}
}

36
src/commands/tags.ts Normal file
View File

@@ -0,0 +1,36 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runTags(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain tags <slug>');
process.exit(1);
}
const tags = await engine.getTags(slug);
if (tags.length === 0) {
console.log(`No tags for ${slug}`);
} else {
console.log(tags.join(', '));
}
}
export async function runTag(engine: BrainEngine, args: string[]) {
const [slug, tag] = args;
if (!slug || !tag) {
console.error('Usage: gbrain tag <slug> <tag>');
process.exit(1);
}
await engine.addTag(slug, tag);
console.log(`Tagged ${slug} with "${tag}"`);
}
export async function runUntag(engine: BrainEngine, args: string[]) {
const [slug, tag] = args;
if (!slug || !tag) {
console.error('Usage: gbrain untag <slug> <tag>');
process.exit(1);
}
await engine.removeTag(slug, tag);
console.log(`Removed tag "${tag}" from ${slug}`);
}

40
src/commands/timeline.ts Normal file
View File

@@ -0,0 +1,40 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runTimeline(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain timeline <slug>');
process.exit(1);
}
const entries = await engine.getTimeline(slug);
if (entries.length === 0) {
console.log(`No timeline entries for ${slug}`);
return;
}
for (const e of entries) {
const source = e.source ? ` [${e.source}]` : '';
console.log(`${e.date}${source}: ${e.summary}`);
if (e.detail) {
console.log(` ${e.detail.slice(0, 200)}`);
}
}
}
export async function runTimelineAdd(engine: BrainEngine, args: string[]) {
const slug = args[0];
const date = args[1];
const text = args.slice(2).join(' ');
if (!slug || !date || !text) {
console.error('Usage: gbrain timeline-add <slug> <date> <text>');
process.exit(1);
}
await engine.addTimelineEntry(slug, {
date,
summary: text,
});
console.log(`Added timeline entry to ${slug}`);
}


@@ -0,0 +1,29 @@
export function printToolsJson() {
const tools = [
{ name: 'get', description: 'Read a page by slug', parameters: { slug: 'string' } },
{ name: 'put', description: 'Write/update a page', parameters: { slug: 'string', content: 'string (markdown)' } },
{ name: 'delete', description: 'Delete a page', parameters: { slug: 'string' } },
{ name: 'list', description: 'List pages with optional filters', parameters: { type: 'string?', tag: 'string?', limit: 'number?' } },
{ name: 'search', description: 'Keyword search (tsvector)', parameters: { query: 'string' } },
{ name: 'query', description: 'Hybrid search (RRF + multi-query expansion)', parameters: { query: 'string' } },
{ name: 'import', description: 'Import markdown directory', parameters: { dir: 'string', no_embed: 'boolean?' } },
{ name: 'export', description: 'Export to markdown directory', parameters: { dir: 'string?' } },
{ name: 'embed', description: 'Generate/refresh embeddings', parameters: { slug: 'string?', all: 'boolean?', stale: 'boolean?' } },
{ name: 'tag', description: 'Add tag to page', parameters: { slug: 'string', tag: 'string' } },
{ name: 'untag', description: 'Remove tag from page', parameters: { slug: 'string', tag: 'string' } },
{ name: 'tags', description: 'List tags for a page', parameters: { slug: 'string' } },
{ name: 'link', description: 'Create typed link between pages', parameters: { from: 'string', to: 'string', type: 'string?' } },
{ name: 'unlink', description: 'Remove link between pages', parameters: { from: 'string', to: 'string' } },
{ name: 'backlinks', description: 'List incoming links to a page', parameters: { slug: 'string' } },
{ name: 'graph', description: 'Traverse link graph from a page', parameters: { slug: 'string', depth: 'number?' } },
{ name: 'timeline', description: 'View timeline entries for a page', parameters: { slug: 'string' } },
{ name: 'timeline-add', description: 'Add timeline entry', parameters: { slug: 'string', date: 'string', text: 'string' } },
{ name: 'stats', description: 'Brain statistics', parameters: {} },
{ name: 'health', description: 'Brain health dashboard', parameters: {} },
{ name: 'history', description: 'Page version history', parameters: { slug: 'string' } },
{ name: 'revert', description: 'Revert page to version', parameters: { slug: 'string', version_id: 'number' } },
{ name: 'config', description: 'Get/set brain config', parameters: { action: '"get"|"set"', key: 'string', value: 'string?' } },
];
console.log(JSON.stringify(tools, null, 2));
}

67
src/commands/upgrade.ts Normal file

@@ -0,0 +1,67 @@
import { execSync } from 'child_process';
export async function runUpgrade(_args: string[]) {
// Detect installation method
const method = detectInstallMethod();
console.log(`Detected install method: ${method}`);
switch (method) {
case 'npm':
// Package install (running from node_modules): the update itself goes through bun
console.log('Upgrading via bun...');
try {
execSync('bun update gbrain', { stdio: 'inherit' });
console.log('Upgrade complete.');
} catch {
console.error('Package upgrade failed. Try: bun update gbrain');
}
break;
case 'binary':
console.log('Binary self-update not yet implemented.');
console.log('Download the latest binary from GitHub Releases:');
console.log(' https://github.com/garrytan/gbrain/releases');
break;
case 'clawhub':
console.log('Upgrading via ClawHub...');
try {
execSync('clawhub update gbrain', { stdio: 'inherit' });
console.log('Upgrade complete.');
} catch {
console.error('ClawHub upgrade failed. Try: clawhub update gbrain');
}
break;
default:
console.error('Could not detect installation method.');
console.log('Try one of:');
console.log(' bun update gbrain');
console.log(' clawhub update gbrain');
console.log(' Download from https://github.com/garrytan/gbrain/releases');
}
}
function detectInstallMethod(): 'npm' | 'binary' | 'clawhub' | 'unknown' {
const execPath = process.execPath || '';
// Check if running from node_modules (npm install)
if (execPath.includes('node_modules') || process.argv[1]?.includes('node_modules')) {
return 'npm';
}
// Check if clawhub is available
try {
execSync('which clawhub', { stdio: 'pipe' });
return 'clawhub';
} catch {
// not available
}
// Check if running as compiled binary
if (execPath.endsWith('/gbrain') || execPath.endsWith('\\gbrain.exe')) {
return 'binary';
}
return 'unknown';
}

39
src/commands/version.ts Normal file

@@ -0,0 +1,39 @@
import type { BrainEngine } from '../core/engine.ts';
export async function runHistory(engine: BrainEngine, args: string[]) {
const slug = args[0];
if (!slug) {
console.error('Usage: gbrain history <slug>');
process.exit(1);
}
const versions = await engine.getVersions(slug);
if (versions.length === 0) {
console.log(`No version history for ${slug}`);
return;
}
console.log(`Version history for ${slug}:`);
for (const v of versions) {
const date = new Date(v.snapshot_at).toISOString();
const preview = v.compiled_truth.slice(0, 80).replace(/\n/g, ' ');
console.log(` #${v.id} ${date} ${preview}...`);
}
}
export async function runRevert(engine: BrainEngine, args: string[]) {
const slug = args[0];
const versionId = args[1] ? parseInt(args[1], 10) : NaN;
if (!slug || isNaN(versionId)) {
console.error('Usage: gbrain revert <slug> <version-id>');
process.exit(1);
}
// Create a snapshot before reverting
await engine.createVersion(slug);
await engine.revertToVersion(slug, versionId);
console.log(`Reverted ${slug} to version #${versionId}`);
console.log('Note: run gbrain embed <slug> to re-embed the reverted content');
}

163
src/core/chunkers/llm.ts Normal file

@@ -0,0 +1,163 @@
/**
* LLM-Guided Text Chunker
* Ported from production Ruby implementation (llm_text_chunker.rb, 167 LOC)
*
* Algorithm:
* 1. Pre-split into 128-word candidates via recursive chunker
* 2. Sliding window of up to 5 candidates (WINDOW_SIZE), minimum 2
* 3. Ask Claude Haiku: "Where does the FIRST topic shift occur?"
* 4. Max 3 retries per window on unparseable responses
* 5. Merge candidates between split points
*/
import { chunkText as recursiveChunk, type TextChunk } from './recursive.ts';
const CANDIDATE_SIZE = 128; // words per pre-split candidate
const MAX_RETRIES = 3;
const WINDOW_SIZE = 5; // candidates per window
export interface LlmChunkOptions {
chunkSize?: number;
chunkOverlap?: number;
askLlm?: (prompt: string) => Promise<string>;
}
export async function chunkTextLlm(
text: string,
opts: LlmChunkOptions,
): Promise<TextChunk[]> {
const chunkSize = opts.chunkSize || 300;
const chunkOverlap = opts.chunkOverlap ?? 50; // ?? keeps an explicit 0 (no overlap)
const askLlm = opts.askLlm;
if (!askLlm) {
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
try {
// Step 1: Pre-split into small candidates
const candidates = recursiveChunk(text, {
chunkSize: CANDIDATE_SIZE,
chunkOverlap: 0,
});
if (candidates.length <= 2) {
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
// Step 2: Find split points via LLM
const splitPoints = await findSplitPoints(candidates, askLlm);
// Step 3: Merge candidates between split points
const merged = mergeAtSplits(candidates, splitPoints);
return merged.map((t, i) => ({ text: t.trim(), index: i }));
} catch {
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
}
async function findSplitPoints(
candidates: TextChunk[],
askLlm: (prompt: string) => Promise<string>,
): Promise<number[]> {
const splitPoints: number[] = [];
let pos = 0;
while (pos < candidates.length - 1) {
const windowEnd = Math.min(pos + WINDOW_SIZE, candidates.length);
const window = candidates.slice(pos, windowEnd);
if (window.length < 2) break;
const splitAt = await askForSplit(window, pos, askLlm);
if (splitAt !== null && splitAt > pos) {
splitPoints.push(splitAt);
pos = splitAt;
} else {
// No split found in this window, advance by 1
pos++;
}
}
return splitPoints;
}
async function askForSplit(
window: TextChunk[],
offset: number,
askLlm: (prompt: string) => Promise<string>,
): Promise<number | null> {
// Format candidates as numbered items
const numbered = window
.map((c, i) => `[${offset + i}] ${c.text.slice(0, 200)}${c.text.length > 200 ? '...' : ''}`)
.join('\n\n');
const prompt = `You are analyzing a document that has been split into numbered segments. Your job is to find where the FIRST major topic shift occurs.
Here are the segments:
${numbered}
If there is a clear topic shift between any two adjacent segments, respond with ONLY the number of the segment where the NEW topic begins. For example, if the topic shifts between [${offset + 1}] and [${offset + 2}], respond with: ${offset + 2}
If there is no clear topic shift, respond with: NONE
Respond with only a number or NONE. Nothing else.`;
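for (let retry = 0; retry < MAX_RETRIES; retry++) {
try {
const response = await askLlm(prompt);
// An explicit NONE is a valid answer and ends the search for this window
if (response.trim().toUpperCase() === 'NONE') return null;
const parsed = parseSplitResponse(response, offset, offset + window.length - 1);
if (parsed !== null) return parsed;
// Unparseable or out-of-range reply: fall through and retry
} catch {
continue;
}
}
return null;
}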
function parseSplitResponse(
response: string,
minId: number,
maxId: number,
): number | null {
const trimmed = response.trim().toUpperCase();
if (trimmed === 'NONE') return null;
const num = parseInt(trimmed, 10);
if (isNaN(num)) return null;
// Clamp to valid range, ensure forward progress
const clamped = Math.max(num, minId + 1);
if (clamped > maxId) return null;
return clamped;
}
function mergeAtSplits(candidates: TextChunk[], splitPoints: number[]): string[] {
if (splitPoints.length === 0) {
return [candidates.map(c => c.text).join(' ')];
}
const result: string[] = [];
let start = 0;
for (const split of splitPoints) {
const group = candidates.slice(start, split);
if (group.length > 0) {
result.push(group.map(c => c.text).join(' '));
}
start = split;
}
// Last group
const remaining = candidates.slice(start);
if (remaining.length > 0) {
result.push(remaining.map(c => c.text).join(' '));
}
return result.filter(t => t.trim().length > 0);
}
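The header comment promises retries on unparseable responses, which requires telling "no topic shift" apart from "could not parse". One way to sketch that is a small classifier returning a tagged result. `SplitReply`, `classifyReply`, and `askWithRetries` are hypothetical names. Unlike `parseSplitResponse`, this sketch rejects out-of-range numbers instead of clamping them.

```typescript
// Hypothetical result type: separates "no split" from "could not parse".
type SplitReply =
  | { kind: 'none' }
  | { kind: 'split'; at: number }
  | { kind: 'invalid' };

function classifyReply(response: string, minId: number, maxId: number): SplitReply {
  const trimmed = response.trim().toUpperCase();
  if (trimmed === 'NONE') return { kind: 'none' };
  // Accept only a bare integer, so replies like "Segment 3" count as invalid.
  if (!/^\d+$/.test(trimmed)) return { kind: 'invalid' };
  const num = Number.parseInt(trimmed, 10);
  // A split must begin after the first segment of the window and stay inside it.
  if (num <= minId || num > maxId) return { kind: 'invalid' };
  return { kind: 'split', at: num };
}

// Retry only on invalid replies or transport errors; NONE is a final answer.
async function askWithRetries(
  ask: () => Promise<string>,
  minId: number,
  maxId: number,
  maxRetries = 3,
): Promise<number | null> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const reply = classifyReply(await ask(), minId, maxId);
      if (reply.kind === 'split') return reply.at;
      if (reply.kind === 'none') return null;
    } catch {
      // Transport error: fall through to the next attempt.
    }
  }
  return null;
}
```

Here `ask` would wrap the `askLlm(prompt)` call for a single window, with `minId` set to the window offset and `maxId` to the index of its last candidate.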


@@ -0,0 +1,211 @@
/**
* Recursive Delimiter-Aware Text Chunker
* Ported from production Ruby implementation (text_chunker.rb, 205 LOC)
*
* 5-level delimiter hierarchy:
* 1. Paragraphs (\n\n)
* 2. Lines (\n)
* 3. Sentences (. ! ? followed by space or newline)
* 4. Clauses (; : , )
* 5. Words (whitespace)
*
* Config: 300-word chunks with 50-word sentence-aware overlap.
* Lossless invariant: non-overlapping portions reassemble to original.
*/
const DELIMITERS: string[][] = [
['\n\n'], // L0: paragraphs
['\n'], // L1: lines
['. ', '! ', '? ', '.\n', '!\n', '?\n'], // L2: sentences
['; ', ': ', ', '], // L3: clauses
[], // L4: words (whitespace split)
];
export interface ChunkOptions {
chunkSize?: number; // target words per chunk (default 300)
chunkOverlap?: number; // overlap words (default 50)
}
export interface TextChunk {
text: string;
index: number;
}
export function chunkText(text: string, opts?: ChunkOptions): TextChunk[] {
const chunkSize = opts?.chunkSize || 300;
const chunkOverlap = opts?.chunkOverlap ?? 50; // ?? so an explicit 0 disables overlap (|| would turn 0 into 50)
if (!text || text.trim().length === 0) return [];
const wordCount = countWords(text);
if (wordCount <= chunkSize) {
return [{ text: text.trim(), index: 0 }];
}
// Recursively split, then greedily merge to target size
const pieces = recursiveSplit(text, 0, chunkSize);
const merged = greedyMerge(pieces, chunkSize);
const withOverlap = applyOverlap(merged, chunkOverlap);
return withOverlap.map((t, i) => ({ text: t.trim(), index: i }));
}
function recursiveSplit(text: string, level: number, target: number): string[] {
if (level >= DELIMITERS.length) {
// Level 4: split on whitespace
return splitOnWhitespace(text, target);
}
const delimiters = DELIMITERS[level];
if (delimiters.length === 0) {
return splitOnWhitespace(text, target);
}
const pieces = splitAtDelimiters(text, delimiters);
// If splitting didn't help (only 1 piece), try next level
if (pieces.length <= 1) {
return recursiveSplit(text, level + 1, target);
}
// Check if any piece is still too large, recurse deeper
const result: string[] = [];
for (const piece of pieces) {
if (countWords(piece) > target) {
result.push(...recursiveSplit(piece, level + 1, target));
} else {
result.push(piece);
}
}
return result;
}
/**
* Split text at delimiter boundaries, preserving delimiters at the end
* of the piece that precedes them (lossless).
*/
function splitAtDelimiters(text: string, delimiters: string[]): string[] {
const pieces: string[] = [];
let remaining = text;
while (remaining.length > 0) {
let earliest = -1;
let earliestDelim = '';
for (const delim of delimiters) {
const idx = remaining.indexOf(delim);
if (idx !== -1 && (earliest === -1 || idx < earliest)) {
earliest = idx;
earliestDelim = delim;
}
}
if (earliest === -1) {
pieces.push(remaining);
break;
}
// Include the delimiter with the preceding text
const piece = remaining.slice(0, earliest + earliestDelim.length);
if (piece.trim().length > 0) {
pieces.push(piece);
}
remaining = remaining.slice(earliest + earliestDelim.length);
}
// Trailing content without a delimiter was already pushed inside the loop
return pieces.filter(p => p.trim().length > 0);
}
/**
* Fallback: split on whitespace boundaries to hit target word count.
*/
function splitOnWhitespace(text: string, target: number): string[] {
const words = text.match(/\S+\s*/g) || [];
if (words.length === 0) return [];
const pieces: string[] = [];
for (let i = 0; i < words.length; i += target) {
const slice = words.slice(i, i + target).join('');
if (slice.trim().length > 0) {
pieces.push(slice);
}
}
return pieces;
}
/**
* Greedily merge adjacent pieces until each chunk is near the target size.
* Avoids creating chunks larger than target * 1.5.
*/
function greedyMerge(pieces: string[], target: number): string[] {
if (pieces.length === 0) return [];
const result: string[] = [];
let current = pieces[0];
for (let i = 1; i < pieces.length; i++) {
const combined = current + pieces[i];
if (countWords(combined) <= Math.ceil(target * 1.5)) {
current = combined;
} else {
result.push(current);
current = pieces[i];
}
}
if (current.trim().length > 0) {
result.push(current);
}
return result;
}
/**
* Apply sentence-aware trailing overlap.
* The last N words of chunk[i] are prepended to chunk[i+1].
*/
function applyOverlap(chunks: string[], overlapWords: number): string[] {
if (chunks.length <= 1 || overlapWords <= 0) return chunks;
const result: string[] = [chunks[0]];
for (let i = 1; i < chunks.length; i++) {
const prevTrailing = extractTrailingContext(chunks[i - 1], overlapWords);
result.push(prevTrailing + chunks[i]);
}
return result;
}
/**
* Extract the last N words from text, trying to align to sentence boundaries.
* If a sentence boundary exists within the last N words, start there.
*/
function extractTrailingContext(text: string, targetWords: number): string {
const words = text.match(/\S+\s*/g) || [];
if (words.length <= targetWords) return '';
const trailing = words.slice(-targetWords).join('');
// Try to find a sentence boundary to start from
const sentenceStart = trailing.search(/[.!?]\s+/);
if (sentenceStart !== -1 && sentenceStart < trailing.length / 2) {
// Start after the sentence boundary
const afterSentence = trailing.slice(sentenceStart).replace(/^[.!?]\s+/, '');
if (afterSentence.trim().length > 0) {
return afterSentence;
}
}
return trailing;
}
function countWords(text: string): number {
return (text.match(/\S+/g) || []).length;
}
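The merge and overlap steps above can be traced on a tiny input. The following is a minimal sketch, not the module itself: it restates `greedyMerge` with the same 1.5x limit and uses a simplified `lastWords` helper (a hypothetical name) that skips the sentence-boundary alignment done by `extractTrailingContext`. The sample pieces are made up for illustration.

```typescript
function countWords(text: string): number {
  return (text.match(/\S+/g) || []).length;
}

// Same rule as greedyMerge above: keep appending while the combined
// chunk stays within ceil(target * 1.5) words.
function greedyMerge(pieces: string[], target: number): string[] {
  if (pieces.length === 0) return [];
  const limit = Math.ceil(target * 1.5);
  const result: string[] = [];
  let current = pieces[0];
  for (const piece of pieces.slice(1)) {
    const combined = current + piece;
    if (countWords(combined) <= limit) {
      current = combined;
    } else {
      result.push(current);
      current = piece;
    }
  }
  result.push(current);
  return result;
}

// Simplified stand-in for extractTrailingContext: last N words,
// no sentence alignment (assumption made for this sketch only).
function lastWords(text: string, n: number): string {
  const words = text.match(/\S+\s*/g) || [];
  if (words.length <= n) return '';
  return words.slice(-n).join('');
}

const pieces = ['a b c ', 'd e ', 'f g h i ', 'j '];
const merged = greedyMerge(pieces, 4); // limit is 6 words
const overlapped = merged.map((chunk, i) =>
  i === 0 ? chunk : lastWords(merged[i - 1], 2) + chunk,
);

console.log(merged);     // [ 'a b c d e ', 'f g h i j ' ]
console.log(overlapped); // [ 'a b c d e ', 'd e f g h i j ' ]
```

Note that the overlap words are added on top of the merged chunk, so an overlapped chunk can exceed the 1.5x limit that `greedyMerge` enforces.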

@@ -0,0 +1,340 @@
/**
* Semantic Text Chunker
* Ported from production Ruby implementation (semantic_text_chunker.rb, 242 LOC)
*
* Algorithm:
* 1. Split text into sentences
* 2. Embed each sentence
* 3. Compute adjacent cosine similarities
* 4. Savitzky-Golay filter (5-window, 3rd-order polynomial)
* 5. Find local minima (topic boundaries)
* 6. Group sentences, recursively split oversized groups
*
* Falls back to recursive chunker on any failure.
*/
import { chunkText as recursiveChunk, type TextChunk } from './recursive.ts';
export interface SemanticChunkOptions {
chunkSize?: number;
chunkOverlap?: number;
embedFn?: (texts: string[]) => Promise<Float32Array[]>;
}
export async function chunkTextSemantic(
text: string,
opts: SemanticChunkOptions,
): Promise<TextChunk[]> {
const chunkSize = opts.chunkSize || 300;
const chunkOverlap = opts.chunkOverlap ?? 50; // ?? keeps an explicit 0 (no overlap)
const embedFn = opts.embedFn;
if (!embedFn) {
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
try {
const sentences = splitSentences(text);
if (sentences.length <= 3) {
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
// Embed all sentences
const embeddings = await embedFn(sentences);
if (embeddings.length !== sentences.length) {
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
// Compute adjacent cosine similarities
const similarities = computeAdjacentSimilarities(embeddings);
// Find topic boundaries
const boundaries = findBoundaries(similarities);
// Group sentences at boundaries
const groups = groupAtBoundaries(sentences, boundaries);
// Recursively split oversized groups
const chunks: TextChunk[] = [];
let idx = 0;
for (const group of groups) {
const groupText = group.join(' ');
const wordCount = (groupText.match(/\S+/g) || []).length;
if (wordCount > chunkSize * 1.5) {
const subChunks = recursiveChunk(groupText, { chunkSize, chunkOverlap });
for (const sc of subChunks) {
chunks.push({ text: sc.text, index: idx++ });
}
} else {
chunks.push({ text: groupText.trim(), index: idx++ });
}
}
return chunks;
} catch {
// Any failure falls back to recursive
return recursiveChunk(text, { chunkSize, chunkOverlap });
}
}
/**
* Split text into sentences. Abbreviations are not special-cased, so
* "e.g. foo" splits after "e.g." like any other period followed by whitespace.
*/
export function splitSentences(text: string): string[] {
// Split on sentence-ending punctuation followed by whitespace or newline
const raw = text.split(/(?<=[.!?])\s+/);
return raw
.map(s => s.trim())
.filter(s => s.length > 0);
}
/**
* Compute cosine similarity between each adjacent pair of embeddings.
* Returns array of length (embeddings.length - 1).
*/
function computeAdjacentSimilarities(embeddings: Float32Array[]): number[] {
const sims: number[] = [];
for (let i = 0; i < embeddings.length - 1; i++) {
sims.push(cosineSimilarity(embeddings[i], embeddings[i + 1]));
}
return sims;
}
/**
* Find topic boundaries using Savitzky-Golay smoothing.
* Falls back to percentile-based detection if SG fails.
*/
function findBoundaries(similarities: number[]): number[] {
if (similarities.length < 5) {
return findBoundariesPercentile(similarities);
}
try {
return findBoundariesSavGol(similarities);
} catch {
return findBoundariesPercentile(similarities);
}
}
/**
* Savitzky-Golay boundary detection.
* Apply SG filter to get 1st derivative, find local minima.
*/
function findBoundariesSavGol(similarities: number[]): number[] {
// Compute 1st derivative via Savitzky-Golay (window=5, poly=3, deriv=1)
const derivative = savitzkyGolay(similarities, 5, 3, 1);
// Find zero crossings of the derivative (local minima)
// A minimum is where derivative goes from negative to positive
const minima: number[] = [];
for (let i = 1; i < derivative.length; i++) {
if (derivative[i - 1] < 0 && derivative[i] >= 0) {
minima.push(i);
}
}
// Filter by percentile: only keep minima where similarity is below the 20th percentile
const threshold = percentile(similarities, 0.2); // low similarity = topic shift
const filtered = minima.filter(i => {
const simIdx = Math.min(i, similarities.length - 1);
return similarities[simIdx] < threshold;
});
// Enforce minimum distance of 2 between boundaries
return enforceMinDistance(filtered, 2);
}
/**
* Simple percentile-based boundary detection.
* Find positions where similarity drops below the 20th percentile.
*/
function findBoundariesPercentile(similarities: number[]): number[] {
if (similarities.length === 0) return [];
const threshold = percentile(similarities, 0.2);
const boundaries: number[] = [];
for (let i = 0; i < similarities.length; i++) {
if (similarities[i] < threshold) {
boundaries.push(i + 1); // boundary is after position i
}
}
return enforceMinDistance(boundaries, 2);
}
/**
* Savitzky-Golay filter implementation.
* Polynomial fitting over a sliding window.
*/
function savitzkyGolay(
data: number[],
windowSize: number,
polyOrder: number,
derivOrder: number,
): number[] {
const half = Math.floor(windowSize / 2);
const n = data.length;
if (n < windowSize) return data.slice();
// Build Vandermonde matrix for the window
const J: number[][] = [];
for (let i = -half; i <= half; i++) {
const row: number[] = [];
for (let j = 0; j <= polyOrder; j++) {
row.push(Math.pow(i, j));
}
J.push(row);
}
// Compute (J^T J)^-1 J^T
const JT = transpose(J);
const JTJ = matMul(JT, J);
const JTJinv = invertMatrix(JTJ);
const coeffs = matMul(JTJinv, JT);
// The row corresponding to derivOrder gives us the filter coefficients
// For derivative of order d, multiply by d!
const filterRow = coeffs[derivOrder];
const factorial = factorialN(derivOrder);
const result: number[] = new Array(n).fill(0);
for (let i = 0; i < n; i++) {
let val = 0;
for (let j = -half; j <= half; j++) {
const idx = Math.min(Math.max(i + j, 0), n - 1);
val += filterRow[j + half] * data[idx];
}
result[i] = val * factorial;
}
return result;
}
/**
* Group sentences into chunks at the given boundary positions.
*/
function groupAtBoundaries(sentences: string[], boundaries: number[]): string[][] {
const groups: string[][] = [];
let start = 0;
for (const b of boundaries) {
if (b > start && b < sentences.length) {
groups.push(sentences.slice(start, b));
start = b;
}
}
// Last group
if (start < sentences.length) {
groups.push(sentences.slice(start));
}
return groups.length > 0 ? groups : [sentences];
}
// Math helpers
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
let dot = 0, normA = 0, normB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
const denom = Math.sqrt(normA) * Math.sqrt(normB);
return denom === 0 ? 0 : dot / denom;
}
function percentile(arr: number[], p: number): number {
const sorted = [...arr].sort((a, b) => a - b);
const idx = Math.floor(p * sorted.length);
return sorted[Math.min(idx, sorted.length - 1)];
}
function enforceMinDistance(boundaries: number[], minDist: number): number[] {
if (boundaries.length <= 1) return boundaries;
const result = [boundaries[0]];
for (let i = 1; i < boundaries.length; i++) {
if (boundaries[i] - result[result.length - 1] >= minDist) {
result.push(boundaries[i]);
}
}
return result;
}
function transpose(m: number[][]): number[][] {
const rows = m.length, cols = m[0].length;
const result: number[][] = Array.from({ length: cols }, () => new Array(rows).fill(0));
for (let i = 0; i < rows; i++) {
for (let j = 0; j < cols; j++) {
result[j][i] = m[i][j];
}
}
return result;
}
function matMul(a: number[][], b: number[][]): number[][] {
const rows = a.length, cols = b[0].length, inner = b.length;
const result: number[][] = Array.from({ length: rows }, () => new Array(cols).fill(0));
for (let i = 0; i < rows; i++) {
for (let j = 0; j < cols; j++) {
for (let k = 0; k < inner; k++) {
result[i][j] += a[i][k] * b[k][j];
}
}
}
return result;
}
function invertMatrix(m: number[][]): number[][] {
const n = m.length;
// Augment with identity
const aug: number[][] = m.map((row, i) => {
const identity = new Array(n).fill(0);
identity[i] = 1;
return [...row, ...identity];
});
// Gauss-Jordan elimination
for (let col = 0; col < n; col++) {
// Find pivot
let maxRow = col;
for (let row = col + 1; row < n; row++) {
if (Math.abs(aug[row][col]) > Math.abs(aug[maxRow][col])) {
maxRow = row;
}
}
[aug[col], aug[maxRow]] = [aug[maxRow], aug[col]];
const pivot = aug[col][col];
if (Math.abs(pivot) < 1e-12) {
throw new Error('Matrix is singular');
}
// Scale pivot row
for (let j = 0; j < 2 * n; j++) {
aug[col][j] /= pivot;
}
// Eliminate column
for (let row = 0; row < n; row++) {
if (row === col) continue;
const factor = aug[row][col];
for (let j = 0; j < 2 * n; j++) {
aug[row][j] -= factor * aug[col][j];
}
}
}
return aug.map(row => row.slice(n));
}
function factorialN(n: number): number {
let result = 1;
for (let i = 2; i <= n; i++) result *= i;
return result;
}
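For the specific case used above (window 5, polynomial order 3, first derivative), the filter weights have a well-known closed form, [1, -8, 0, 8, -1] / 12, which the matrix computation in `savitzkyGolay` should reproduce up to rounding. The sketch below hard-codes those weights instead of deriving them, and applies the same edge clamping and zero-crossing rule as `findBoundariesSavGol`. The function names and the sample series are illustrative assumptions, not part of the module.

```typescript
// Numerators of the first-derivative Savitzky-Golay weights for
// window = 5 and a cubic fit; the common denominator is 12.
const NUMERATORS = [1, -8, 0, 8, -1];

function sgFirstDerivative(data: number[]): number[] {
  const n = data.length;
  return data.map((_, i) => {
    let sum = 0;
    for (let j = -2; j <= 2; j++) {
      // Clamp indices at the edges, as the implementation above does.
      const idx = Math.min(Math.max(i + j, 0), n - 1);
      sum += NUMERATORS[j + 2] * data[idx];
    }
    return sum / 12;
  });
}

// A local minimum is where the derivative goes from negative to non-negative.
function localMinima(derivative: number[]): number[] {
  const minima: number[] = [];
  for (let i = 1; i < derivative.length; i++) {
    if (derivative[i - 1] < 0 && derivative[i] >= 0) minima.push(i);
  }
  return minima;
}

// Sample series shaped like (i - 4)^2, so the true minimum is at index 4.
const series = [16, 9, 4, 1, 0, 1, 4, 9, 16];
const derivative = sgFirstDerivative(series);
const minima = localMinima(derivative);

console.log(derivative.slice(2, 7)); // [ -4, -2, 0, 2, 4 ]
console.log(minima);                 // [ 4 ]
```

Away from the clamped edges the filter recovers the exact slope 2(i - 4), because a cubic fit is exact for a quadratic series.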

50
src/core/config.ts Normal file

@@ -0,0 +1,50 @@
import { readFileSync, writeFileSync, mkdirSync, chmodSync } from 'fs';
import { join } from 'path';
import { homedir } from 'os';
import type { EngineConfig } from './types.ts';
const CONFIG_DIR = join(homedir(), '.gbrain');
const CONFIG_PATH = join(CONFIG_DIR, 'config.json');
export interface GBrainConfig {
engine: 'postgres' | 'sqlite';
database_url?: string;
database_path?: string;
openai_api_key?: string;
anthropic_api_key?: string;
}
export function loadConfig(): GBrainConfig | null {
try {
const raw = readFileSync(CONFIG_PATH, 'utf-8');
return JSON.parse(raw) as GBrainConfig;
} catch {
return null;
}
}
export function saveConfig(config: GBrainConfig): void {
mkdirSync(CONFIG_DIR, { recursive: true });
writeFileSync(CONFIG_PATH, JSON.stringify(config, null, 2) + '\n', { mode: 0o600 });
try {
chmodSync(CONFIG_PATH, 0o600);
} catch {
// chmod may fail on some platforms
}
}
export function toEngineConfig(config: GBrainConfig): EngineConfig {
return {
engine: config.engine,
database_url: config.database_url,
database_path: config.database_path,
};
}
export function getConfigDir(): string {
return CONFIG_DIR;
}
export function getConfigPath(): string {
return CONFIG_PATH;
}

102
src/core/db.ts Normal file

@@ -0,0 +1,102 @@
import postgres from 'postgres';
import { readFileSync } from 'fs';
import { join, dirname } from 'path';
import { GBrainError, type EngineConfig } from './types.ts';
let sql: ReturnType<typeof postgres> | null = null;
export function getConnection(): ReturnType<typeof postgres> {
if (!sql) {
throw new GBrainError(
'No database connection',
'connect() has not been called',
'Run gbrain init --supabase or gbrain init --url <connection_string>',
);
}
return sql;
}
export async function connect(config: EngineConfig): Promise<void> {
if (sql) return;
const url = config.database_url;
if (!url) {
throw new GBrainError(
'No database URL',
'database_url is missing from config',
'Run gbrain init --supabase or gbrain init --url <connection_string>',
);
}
try {
sql = postgres(url, {
max: 10,
idle_timeout: 20,
connect_timeout: 10,
types: {
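// Return Postgres bigint (int8) values as JavaScript BigInt.
// This does not register pgvector; vectors are sent as text with ::vector casts.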
bigint: postgres.BigInt,
},
});
// Test connection
await sql`SELECT 1`;
} catch (e: unknown) {
sql = null;
const msg = e instanceof Error ? e.message : String(e);
throw new GBrainError(
'Cannot connect to database',
msg,
'Check your connection URL in ~/.gbrain/config.json',
);
}
}
export async function disconnect(): Promise<void> {
if (sql) {
await sql.end();
sql = null;
}
}
export async function initSchema(): Promise<void> {
const conn = getConnection();
// Read schema SQL
const schemaPath = join(dirname(new URL(import.meta.url).pathname), '..', 'schema.sql');
const schemaSql = readFileSync(schemaPath, 'utf-8');
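// Split on semicolons that end a line and execute each statement separately,
// so "already exists" errors can be skipped per statement.
// Caveat: this naive split also cuts inside dollar-quoted bodies (e.g. PL/pgSQL
// functions) and the filter below drops any statement that starts with a -- comment.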
const statements = schemaSql
.split(/;\s*$/m)
.map(s => s.trim())
.filter(s => s.length > 0 && !s.startsWith('--'));
for (const stmt of statements) {
try {
await conn.unsafe(stmt);
} catch (e: unknown) {
// Ignore "already exists" errors for idempotency
const msg = e instanceof Error ? e.message : String(e);
if (msg.includes('already exists') || msg.includes('duplicate key')) {
continue;
}
throw e;
}
}
}
export async function withTransaction<T>(fn: () => Promise<T>): Promise<T> {
const conn = getConnection();
return conn.begin(async (tx) => {
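// Temporarily swap the module-level connection for the transaction handle.
// Not safe for concurrent callers: any getConnection() call made while fn
// is running, including from unrelated work, receives tx.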
const prev = sql;
sql = tx as unknown as ReturnType<typeof postgres>;
try {
return await fn();
} finally {
sql = prev;
}
});
}
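The statement splitting in `initSchema` can be checked in isolation. This sketch runs the same `split` / `trim` / `filter` chain on a made-up schema fragment; the table and function definitions are assumptions for illustration, since `schema.sql` is not shown here. It suggests that a dollar-quoted function body is cut into several fragments, which matters if the real schema defines trigger functions.

```typescript
// Hypothetical schema fragment: one table and one PL/pgSQL trigger function.
const schemaSql = [
  'CREATE TABLE pages (id serial PRIMARY KEY);',
  'CREATE FUNCTION touch() RETURNS trigger AS $$',
  'BEGIN',
  '  NEW.updated_at = now();',
  '  RETURN NEW;',
  'END;',
  '$$ LANGUAGE plpgsql;',
].join('\n');

// Same chain as initSchema above.
const statements = schemaSql
  .split(/;\s*$/m)
  .map(s => s.trim())
  .filter(s => s.length > 0 && !s.startsWith('--'));

// Two logical statements go in, but every line-ending semicolon
// inside the function body also acts as a separator.
console.log(statements.length); // 5
console.log(statements[2]);     // RETURN NEW
```

One option, if the per-statement error handling is not needed, would be to pass the whole file to the driver in a single call; whether that fits this project is a design decision the diff does not settle.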

94
src/core/embedding.ts Normal file

@@ -0,0 +1,94 @@
/**
* Embedding Service
* Ported from production Ruby implementation (embedding_service.rb, 190 LOC)
*
* OpenAI text-embedding-3-large at 1536 dimensions.
* Retry with exponential backoff (4s base, 120s cap, up to 5 attempts).
* 8000 character input truncation.
*/
import OpenAI from 'openai';
const MODEL = 'text-embedding-3-large';
const DIMENSIONS = 1536;
const MAX_CHARS = 8000;
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 4000;
const MAX_DELAY_MS = 120000;
const BATCH_SIZE = 100;
let client: OpenAI | null = null;
function getClient(): OpenAI {
if (!client) {
client = new OpenAI();
}
return client;
}
export async function embed(text: string): Promise<Float32Array> {
const truncated = text.slice(0, MAX_CHARS);
const result = await embedBatch([truncated]);
return result[0];
}
export async function embedBatch(texts: string[]): Promise<Float32Array[]> {
const truncated = texts.map(t => t.slice(0, MAX_CHARS));
const results: Float32Array[] = [];
// Process in batches of BATCH_SIZE
for (let i = 0; i < truncated.length; i += BATCH_SIZE) {
const batch = truncated.slice(i, i + BATCH_SIZE);
const batchResults = await embedBatchWithRetry(batch);
results.push(...batchResults);
}
return results;
}
async function embedBatchWithRetry(texts: string[]): Promise<Float32Array[]> {
for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
try {
const response = await getClient().embeddings.create({
model: MODEL,
input: texts,
dimensions: DIMENSIONS,
});
// Sort by index to maintain order
const sorted = response.data.sort((a, b) => a.index - b.index);
return sorted.map(d => new Float32Array(d.embedding));
} catch (e: unknown) {
if (attempt === MAX_RETRIES - 1) throw e;
// Check for rate limit with Retry-After header
let delay = exponentialDelay(attempt);
if (e instanceof OpenAI.APIError && e.status === 429) {
const retryAfter = e.headers?.['retry-after'];
if (retryAfter) {
const parsed = parseInt(retryAfter, 10);
if (!isNaN(parsed)) {
delay = parsed * 1000;
}
}
}
await sleep(delay);
}
}
// Should not reach here
throw new Error('Embedding failed after all retries');
}
function exponentialDelay(attempt: number): number {
const delay = BASE_DELAY_MS * Math.pow(2, attempt);
return Math.min(delay, MAX_DELAY_MS);
}
function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
export { MODEL as EMBEDDING_MODEL, DIMENSIONS as EMBEDDING_DIMENSIONS };
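The backoff schedule implied by the constants above can be worked out directly. This is a small sketch that copies `exponentialDelay` and the three constants, and assumes no `Retry-After` header overrides the delay. Because the final failed attempt rethrows instead of sleeping, only `MAX_RETRIES - 1` delays are ever used.

```typescript
const BASE_DELAY_MS = 4000;
const MAX_DELAY_MS = 120000;
const MAX_RETRIES = 5;

function exponentialDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// Delays slept after failed attempts 0..3; attempt 4 rethrows the error.
const schedule = Array.from({ length: MAX_RETRIES - 1 }, (_, attempt) =>
  exponentialDelay(attempt),
);
const totalWaitMs = schedule.reduce((sum, ms) => sum + ms, 0);

console.log(schedule);            // [ 4000, 8000, 16000, 32000 ]
console.log(totalWaitMs);         // 60000
console.log(exponentialDelay(5)); // 120000 (capped; 4000 * 32 = 128000)
```

With these values the 120s cap is never reached in practice, since the largest delay actually slept is 32s.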

73
src/core/engine.ts Normal file

@@ -0,0 +1,73 @@
import type {
Page, PageInput, PageFilters,
Chunk, ChunkInput,
SearchResult, SearchOpts,
Link, GraphNode,
TimelineEntry, TimelineInput, TimelineOpts,
RawData,
PageVersion,
BrainStats, BrainHealth,
IngestLogEntry, IngestLogInput,
EngineConfig,
} from './types.ts';
export interface BrainEngine {
// Lifecycle
connect(config: EngineConfig): Promise<void>;
disconnect(): Promise<void>;
initSchema(): Promise<void>;
transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;
// Pages CRUD
getPage(slug: string): Promise<Page | null>;
putPage(slug: string, page: PageInput): Promise<Page>;
deletePage(slug: string): Promise<void>;
listPages(filters?: PageFilters): Promise<Page[]>;
resolveSlugs(partial: string): Promise<string[]>;
// Search
searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;
// Chunks
upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
getChunks(slug: string): Promise<Chunk[]>;
deleteChunks(slug: string): Promise<void>;
// Links
addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
removeLink(from: string, to: string): Promise<void>;
getLinks(slug: string): Promise<Link[]>;
getBacklinks(slug: string): Promise<Link[]>;
traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;
// Tags
addTag(slug: string, tag: string): Promise<void>;
removeTag(slug: string, tag: string): Promise<void>;
getTags(slug: string): Promise<string[]>;
// Timeline
addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;
// Raw data
putRawData(slug: string, source: string, data: object): Promise<void>;
getRawData(slug: string, source?: string): Promise<RawData[]>;
// Versions
createVersion(slug: string): Promise<PageVersion>;
getVersions(slug: string): Promise<PageVersion[]>;
revertToVersion(slug: string, versionId: number): Promise<void>;
// Stats + health
getStats(): Promise<BrainStats>;
getHealth(): Promise<BrainHealth>;
// Ingest log
logIngest(entry: IngestLogInput): Promise<void>;
getIngestLog(opts?: { limit?: number }): Promise<IngestLogEntry[]>;
// Config
getConfig(key: string): Promise<string | null>;
setConfig(key: string, value: string): Promise<void>;
}

4
src/core/index.ts Normal file

@@ -0,0 +1,4 @@
export type { BrainEngine } from './engine.ts';
export { PostgresEngine } from './postgres-engine.ts';
export * from './types.ts';
export { parseMarkdown, serializeMarkdown, splitBody } from './markdown.ts';

170
src/core/markdown.ts Normal file

@@ -0,0 +1,170 @@
import matter from 'gray-matter';
import type { PageType } from './types.ts';
export interface ParsedMarkdown {
frontmatter: Record<string, unknown>;
compiled_truth: string;
timeline: string;
slug: string;
type: PageType;
title: string;
tags: string[];
}
/**
* Parse a markdown file with YAML frontmatter into its components.
*
* Structure:
* ---
* type: concept
* title: Do Things That Don't Scale
* tags: [startups, growth]
* ---
* Compiled truth content here...
* ---
* Timeline content here...
*
* The first --- pair is YAML frontmatter (handled by gray-matter).
* After frontmatter, the body is split at the first standalone ---
* (a line containing only --- with optional whitespace).
* Everything before is compiled_truth, everything after is timeline.
* If no body --- exists, all content is compiled_truth.
*/
export function parseMarkdown(content: string, filePath?: string): ParsedMarkdown {
const { data: frontmatter, content: body } = matter(content);
// Split body at first standalone ---
const { compiled_truth, timeline } = splitBody(body);
// Extract metadata from frontmatter
const type = (frontmatter.type as PageType) || inferType(filePath);
const title = (frontmatter.title as string) || inferTitle(filePath);
const tags = extractTags(frontmatter);
const slug = (frontmatter.slug as string) || inferSlug(filePath);
// Remove processed fields from frontmatter (they're stored as columns)
const cleanFrontmatter = { ...frontmatter };
delete cleanFrontmatter.type;
delete cleanFrontmatter.title;
delete cleanFrontmatter.tags;
delete cleanFrontmatter.slug;
return {
frontmatter: cleanFrontmatter,
compiled_truth: compiled_truth.trim(),
timeline: timeline.trim(),
slug,
type,
title,
tags,
};
}
/**
* Split body content at first standalone --- separator.
* Returns compiled_truth (before) and timeline (after).
*/
export function splitBody(body: string): { compiled_truth: string; timeline: string } {
// Match a line that is only --- (with optional whitespace)
// Must not be at the very start (that would be frontmatter)
const lines = body.split('\n');
let splitIndex = -1;
for (let i = 0; i < lines.length; i++) {
const trimmed = lines[i].trim();
if (trimmed === '---') {
// Skip if this is the very first non-empty line (leftover from frontmatter parsing)
const beforeContent = lines.slice(0, i).join('\n').trim();
if (beforeContent.length > 0) {
splitIndex = i;
break;
}
}
}
if (splitIndex === -1) {
return { compiled_truth: body, timeline: '' };
}
const compiled_truth = lines.slice(0, splitIndex).join('\n');
const timeline = lines.slice(splitIndex + 1).join('\n');
return { compiled_truth, timeline };
}
/**
* Serialize a page back to markdown format.
* Produces: frontmatter + compiled_truth + --- + timeline
*/
export function serializeMarkdown(
frontmatter: Record<string, unknown>,
compiled_truth: string,
timeline: string,
meta: { type: PageType; title: string; tags: string[] },
): string {
// Build full frontmatter including type, title, tags
const fullFrontmatter: Record<string, unknown> = {
type: meta.type,
title: meta.title,
...frontmatter,
};
if (meta.tags.length > 0) {
fullFrontmatter.tags = meta.tags;
}
const yamlContent = matter.stringify('', fullFrontmatter).trim();
let body = compiled_truth;
if (timeline) {
body += '\n\n---\n\n' + timeline;
}
return yamlContent + '\n\n' + body + '\n';
}
function inferType(filePath?: string): PageType {
if (!filePath) return 'concept';
// Normalize: add leading / for consistent matching
const lower = ('/' + filePath).toLowerCase();
if (lower.includes('/people/') || lower.includes('/person/')) return 'person';
if (lower.includes('/companies/') || lower.includes('/company/')) return 'company';
if (lower.includes('/deals/') || lower.includes('/deal/')) return 'deal';
if (lower.includes('/yc/')) return 'yc';
if (lower.includes('/civic/')) return 'civic';
if (lower.includes('/projects/') || lower.includes('/project/')) return 'project';
if (lower.includes('/sources/') || lower.includes('/source/')) return 'source';
if (lower.includes('/media/')) return 'media';
return 'concept';
}
function inferTitle(filePath?: string): string {
if (!filePath) return 'Untitled';
// Extract filename without extension, convert dashes/underscores to spaces
const parts = filePath.split('/');
const filename = parts[parts.length - 1]?.replace(/\.md$/i, '') || 'Untitled';
return filename.replace(/[-_]/g, ' ').replace(/\b\w/g, c => c.toUpperCase());
}
function inferSlug(filePath?: string): string {
if (!filePath) return 'untitled';
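// The slug is the given path with the .md extension removed, backslashes
// normalized to /, and a leading ./ dropped. Callers are expected to pass a
// path relative to the import root so the type directory + filename survive.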
let slug = filePath
.replace(/\.md$/i, '')
.replace(/\\/g, '/');
// Remove leading ./
if (slug.startsWith('./')) slug = slug.slice(2);
return slug.toLowerCase();
}
function extractTags(frontmatter: Record<string, unknown>): string[] {
const tags = frontmatter.tags;
if (!tags) return [];
if (Array.isArray(tags)) return tags.map(String);
if (typeof tags === 'string') return tags.split(',').map(t => t.trim()).filter(Boolean);
return [];
}
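The body split described above can be exercised without gray-matter. The sketch below restates `splitBody` with `findIndex` (same rule: the first line that is only `---` and has non-empty content before it) and runs it on a made-up body; the note text and dates are sample data. It also shows that only the first separator splits, so later `---` lines stay inside the timeline.

```typescript
function splitBody(body: string): { compiled_truth: string; timeline: string } {
  const lines = body.split('\n');
  // First standalone --- that has some non-empty content before it.
  const splitIndex = lines.findIndex(
    (line, i) =>
      line.trim() === '---' && lines.slice(0, i).join('\n').trim().length > 0,
  );
  if (splitIndex === -1) return { compiled_truth: body, timeline: '' };
  return {
    compiled_truth: lines.slice(0, splitIndex).join('\n'),
    timeline: lines.slice(splitIndex + 1).join('\n'),
  };
}

// Sample body (frontmatter already removed); contents are illustrative.
const body = [
  'Founders should do unscalable things early.',
  '',
  '---',
  '',
  '- 2024-01-05: First note.',
  '---',
  '- 2024-02-10: Second note.',
].join('\n');

const { compiled_truth, timeline } = splitBody(body);

console.log(compiled_truth.trim()); // Founders should do unscalable things early.
console.log(timeline.trim().split('\n').length); // 3 (the second --- is kept)
```

A body that starts with `---` is not split at that line, because nothing non-empty precedes it; the search continues to the next standalone separator.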

590
src/core/postgres-engine.ts Normal file

@@ -0,0 +1,590 @@
import { createHash } from 'crypto';
import type { BrainEngine } from './engine.ts';
import type {
Page, PageInput, PageFilters, PageType,
Chunk, ChunkInput,
SearchResult, SearchOpts,
Link, GraphNode,
TimelineEntry, TimelineInput, TimelineOpts,
RawData,
PageVersion,
BrainStats, BrainHealth,
IngestLogEntry, IngestLogInput,
EngineConfig,
} from './types.ts';
import * as db from './db.ts';
export class PostgresEngine implements BrainEngine {
// Lifecycle
async connect(config: EngineConfig): Promise<void> {
await db.connect(config);
}
async disconnect(): Promise<void> {
await db.disconnect();
}
async initSchema(): Promise<void> {
await db.initSchema();
}
async transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T> {
return db.withTransaction(() => fn(this));
}
// Pages CRUD
async getPage(slug: string): Promise<Page | null> {
const sql = db.getConnection();
const rows = await sql`
SELECT id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash, created_at, updated_at
FROM pages WHERE slug = ${slug}
`;
if (rows.length === 0) return null;
return rowToPage(rows[0]);
}
async putPage(slug: string, page: PageInput): Promise<Page> {
validateSlug(slug);
const sql = db.getConnection();
const hash = contentHash(page.compiled_truth, page.timeline || '');
const frontmatter = page.frontmatter || {};
const rows = await sql`
INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, updated_at)
VALUES (${slug}, ${page.type}, ${page.title}, ${page.compiled_truth}, ${page.timeline || ''}, ${JSON.stringify(frontmatter)}::jsonb, ${hash}, now())
ON CONFLICT (slug) DO UPDATE SET
type = EXCLUDED.type,
title = EXCLUDED.title,
compiled_truth = EXCLUDED.compiled_truth,
timeline = EXCLUDED.timeline,
frontmatter = EXCLUDED.frontmatter,
content_hash = EXCLUDED.content_hash,
updated_at = now()
RETURNING id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash, created_at, updated_at
`;
return rowToPage(rows[0]);
}
async deletePage(slug: string): Promise<void> {
const sql = db.getConnection();
await sql`DELETE FROM pages WHERE slug = ${slug}`;
}
async listPages(filters?: PageFilters): Promise<Page[]> {
const sql = db.getConnection();
const limit = filters?.limit || 100;
const offset = filters?.offset || 0;
let rows;
if (filters?.type && filters?.tag) {
rows = await sql`
SELECT p.* FROM pages p
JOIN tags t ON t.page_id = p.id
WHERE p.type = ${filters.type} AND t.tag = ${filters.tag}
ORDER BY p.updated_at DESC LIMIT ${limit} OFFSET ${offset}
`;
} else if (filters?.type) {
rows = await sql`
SELECT * FROM pages WHERE type = ${filters.type}
ORDER BY updated_at DESC LIMIT ${limit} OFFSET ${offset}
`;
} else if (filters?.tag) {
rows = await sql`
SELECT p.* FROM pages p
JOIN tags t ON t.page_id = p.id
WHERE t.tag = ${filters.tag}
ORDER BY p.updated_at DESC LIMIT ${limit} OFFSET ${offset}
`;
} else {
rows = await sql`
SELECT * FROM pages
ORDER BY updated_at DESC LIMIT ${limit} OFFSET ${offset}
`;
}
return rows.map(rowToPage);
}
async resolveSlugs(partial: string): Promise<string[]> {
const sql = db.getConnection();
// Try exact match first
const exact = await sql`SELECT slug FROM pages WHERE slug = ${partial}`;
if (exact.length > 0) return [exact[0].slug];
// Fuzzy match via pg_trgm
const fuzzy = await sql`
SELECT slug, similarity(title, ${partial}) AS sim
FROM pages
WHERE title % ${partial} OR slug ILIKE ${'%' + partial + '%'}
ORDER BY sim DESC
LIMIT 5
`;
return fuzzy.map((r: { slug: string }) => r.slug);
}
// Search
async searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]> {
const sql = db.getConnection();
const limit = opts?.limit || 20;
const rows = await sql`
SELECT
p.slug, p.id as page_id, p.title, p.type,
cc.chunk_text, cc.chunk_source,
ts_rank(p.search_vector, websearch_to_tsquery('english', ${query})) AS score,
CASE WHEN p.updated_at < (
SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
) THEN true ELSE false END AS stale
FROM pages p
JOIN content_chunks cc ON cc.page_id = p.id
WHERE p.search_vector @@ websearch_to_tsquery('english', ${query})
ORDER BY score DESC
LIMIT ${limit}
`;
return rows.map(rowToSearchResult);
}
async searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]> {
const sql = db.getConnection();
const limit = opts?.limit || 20;
const vecStr = '[' + Array.from(embedding).join(',') + ']';
const rows = await sql`
SELECT
p.slug, p.id as page_id, p.title, p.type,
cc.chunk_text, cc.chunk_source,
1 - (cc.embedding <=> ${vecStr}::vector) AS score,
CASE WHEN p.updated_at < (
SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
) THEN true ELSE false END AS stale
FROM content_chunks cc
JOIN pages p ON p.id = cc.page_id
WHERE cc.embedding IS NOT NULL
ORDER BY cc.embedding <=> ${vecStr}::vector
LIMIT ${limit}
`;
return rows.map(rowToSearchResult);
}
// Chunks
async upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void> {
const sql = db.getConnection();
// Get page_id
const pages = await sql`SELECT id FROM pages WHERE slug = ${slug}`;
if (pages.length === 0) throw new Error(`Page not found: ${slug}`);
const pageId = pages[0].id;
// Delete existing chunks for this page
await sql`DELETE FROM content_chunks WHERE page_id = ${pageId}`;
// Insert new chunks
if (chunks.length === 0) return;
for (const chunk of chunks) {
const embeddingStr = chunk.embedding
? '[' + Array.from(chunk.embedding).join(',') + ']'
: null;
await sql`
INSERT INTO content_chunks (page_id, chunk_index, chunk_text, chunk_source, embedding, model, token_count, embedded_at)
VALUES (
${pageId}, ${chunk.chunk_index}, ${chunk.chunk_text}, ${chunk.chunk_source},
${embeddingStr ? sql`${embeddingStr}::vector` : sql`NULL`},
${chunk.model || 'text-embedding-3-large'},
${chunk.token_count || null},
${chunk.embedding ? sql`now()` : sql`NULL`}
)
`;
}
}
async getChunks(slug: string): Promise<Chunk[]> {
const sql = db.getConnection();
const rows = await sql`
SELECT cc.* FROM content_chunks cc
JOIN pages p ON p.id = cc.page_id
WHERE p.slug = ${slug}
ORDER BY cc.chunk_index
`;
return rows.map(rowToChunk);
}
async deleteChunks(slug: string): Promise<void> {
const sql = db.getConnection();
await sql`
DELETE FROM content_chunks
WHERE page_id = (SELECT id FROM pages WHERE slug = ${slug})
`;
}
// Links
async addLink(from: string, to: string, context?: string, linkType?: string): Promise<void> {
const sql = db.getConnection();
await sql`
INSERT INTO links (from_page_id, to_page_id, link_type, context)
SELECT f.id, t.id, ${linkType || ''}, ${context || ''}
FROM pages f, pages t
WHERE f.slug = ${from} AND t.slug = ${to}
ON CONFLICT (from_page_id, to_page_id) DO UPDATE SET
link_type = EXCLUDED.link_type,
context = EXCLUDED.context
`;
}
async removeLink(from: string, to: string): Promise<void> {
const sql = db.getConnection();
await sql`
DELETE FROM links
WHERE from_page_id = (SELECT id FROM pages WHERE slug = ${from})
AND to_page_id = (SELECT id FROM pages WHERE slug = ${to})
`;
}
async getLinks(slug: string): Promise<Link[]> {
const sql = db.getConnection();
const rows = await sql`
SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id
WHERE f.slug = ${slug}
`;
return rows as unknown as Link[];
}
async getBacklinks(slug: string): Promise<Link[]> {
const sql = db.getConnection();
const rows = await sql`
SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
FROM links l
JOIN pages f ON f.id = l.from_page_id
JOIN pages t ON t.id = l.to_page_id
WHERE t.slug = ${slug}
`;
return rows as unknown as Link[];
}
async traverseGraph(slug: string, depth: number = 5): Promise<GraphNode[]> {
const sql = db.getConnection();
const rows = await sql`
WITH RECURSIVE graph AS (
SELECT p.id, p.slug, p.title, p.type, 0 as depth
FROM pages p WHERE p.slug = ${slug}
UNION
SELECT p2.id, p2.slug, p2.title, p2.type, g.depth + 1
FROM graph g
JOIN links l ON l.from_page_id = g.id
JOIN pages p2 ON p2.id = l.to_page_id
WHERE g.depth < ${depth}
)
SELECT DISTINCT g.slug, g.title, g.type, g.depth,
coalesce(
-- jsonb rather than json: SELECT DISTINCT needs an equality operator, which json lacks
(SELECT jsonb_agg(jsonb_build_object('to_slug', p3.slug, 'link_type', l2.link_type))
FROM links l2
JOIN pages p3 ON p3.id = l2.to_page_id
WHERE l2.from_page_id = g.id),
'[]'::jsonb
) as links
FROM graph g
ORDER BY g.depth, g.slug
`;
return rows.map((r: Record<string, unknown>) => ({
slug: r.slug as string,
title: r.title as string,
type: r.type as PageType,
depth: r.depth as number,
links: (typeof r.links === 'string' ? JSON.parse(r.links) : r.links) as { to_slug: string; link_type: string }[],
}));
}
// Tags
async addTag(slug: string, tag: string): Promise<void> {
const sql = db.getConnection();
await sql`
INSERT INTO tags (page_id, tag)
SELECT id, ${tag} FROM pages WHERE slug = ${slug}
ON CONFLICT (page_id, tag) DO NOTHING
`;
}
async removeTag(slug: string, tag: string): Promise<void> {
const sql = db.getConnection();
await sql`
DELETE FROM tags
WHERE page_id = (SELECT id FROM pages WHERE slug = ${slug})
AND tag = ${tag}
`;
}
async getTags(slug: string): Promise<string[]> {
const sql = db.getConnection();
const rows = await sql`
SELECT tag FROM tags
WHERE page_id = (SELECT id FROM pages WHERE slug = ${slug})
ORDER BY tag
`;
return rows.map((r: { tag: string }) => r.tag);
}
// Timeline
async addTimelineEntry(slug: string, entry: TimelineInput): Promise<void> {
const sql = db.getConnection();
await sql`
INSERT INTO timeline_entries (page_id, date, source, summary, detail)
SELECT id, ${entry.date}::date, ${entry.source || ''}, ${entry.summary}, ${entry.detail || ''}
FROM pages WHERE slug = ${slug}
`;
}
async getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]> {
const sql = db.getConnection();
const limit = opts?.limit || 100;
let rows;
if (opts?.after && opts?.before) {
rows = await sql`
SELECT te.* FROM timeline_entries te
JOIN pages p ON p.id = te.page_id
WHERE p.slug = ${slug} AND te.date >= ${opts.after}::date AND te.date <= ${opts.before}::date
ORDER BY te.date DESC LIMIT ${limit}
`;
} else if (opts?.after) {
rows = await sql`
SELECT te.* FROM timeline_entries te
JOIN pages p ON p.id = te.page_id
WHERE p.slug = ${slug} AND te.date >= ${opts.after}::date
ORDER BY te.date DESC LIMIT ${limit}
`;
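} else if (opts?.before) {
// Upper bound only: without this branch a lone `before` filter was silently ignored.
rows = await sql`
SELECT te.* FROM timeline_entries te
JOIN pages p ON p.id = te.page_id
WHERE p.slug = ${slug} AND te.date <= ${opts.before}::date
ORDER BY te.date DESC LIMIT ${limit}
`;
} else {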
rows = await sql`
SELECT te.* FROM timeline_entries te
JOIN pages p ON p.id = te.page_id
WHERE p.slug = ${slug}
ORDER BY te.date DESC LIMIT ${limit}
`;
}
return rows as unknown as TimelineEntry[];
}
// Raw data
async putRawData(slug: string, source: string, data: object): Promise<void> {
const sql = db.getConnection();
await sql`
INSERT INTO raw_data (page_id, source, data)
SELECT id, ${source}, ${JSON.stringify(data)}::jsonb
FROM pages WHERE slug = ${slug}
ON CONFLICT (page_id, source) DO UPDATE SET
data = EXCLUDED.data,
fetched_at = now()
`;
}
async getRawData(slug: string, source?: string): Promise<RawData[]> {
const sql = db.getConnection();
let rows;
if (source) {
rows = await sql`
SELECT rd.source, rd.data, rd.fetched_at FROM raw_data rd
JOIN pages p ON p.id = rd.page_id
WHERE p.slug = ${slug} AND rd.source = ${source}
`;
} else {
rows = await sql`
SELECT rd.source, rd.data, rd.fetched_at FROM raw_data rd
JOIN pages p ON p.id = rd.page_id
WHERE p.slug = ${slug}
`;
}
return rows as unknown as RawData[];
}
// Versions
async createVersion(slug: string): Promise<PageVersion> {
const sql = db.getConnection();
const rows = await sql`
INSERT INTO page_versions (page_id, compiled_truth, frontmatter)
SELECT id, compiled_truth, frontmatter
FROM pages WHERE slug = ${slug}
RETURNING *
`;
return rows[0] as unknown as PageVersion;
}
async getVersions(slug: string): Promise<PageVersion[]> {
const sql = db.getConnection();
const rows = await sql`
SELECT pv.* FROM page_versions pv
JOIN pages p ON p.id = pv.page_id
WHERE p.slug = ${slug}
ORDER BY pv.snapshot_at DESC
`;
return rows as unknown as PageVersion[];
}
async revertToVersion(slug: string, versionId: number): Promise<void> {
const sql = db.getConnection();
await sql`
UPDATE pages SET
compiled_truth = pv.compiled_truth,
frontmatter = pv.frontmatter,
updated_at = now()
FROM page_versions pv
WHERE pages.slug = ${slug} AND pv.id = ${versionId} AND pv.page_id = pages.id
`;
}
// Stats + health
async getStats(): Promise<BrainStats> {
const sql = db.getConnection();
const [stats] = await sql`
SELECT
(SELECT count(*) FROM pages) as page_count,
(SELECT count(*) FROM content_chunks) as chunk_count,
(SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL) as embedded_count,
(SELECT count(*) FROM links) as link_count,
(SELECT count(DISTINCT tag) FROM tags) as tag_count,
(SELECT count(*) FROM timeline_entries) as timeline_entry_count
`;
const types = await sql`
SELECT type, count(*)::int as count FROM pages GROUP BY type ORDER BY count DESC
`;
const pages_by_type: Record<string, number> = {};
for (const t of types) {
pages_by_type[t.type as string] = t.count as number;
}
return {
page_count: Number(stats.page_count),
chunk_count: Number(stats.chunk_count),
embedded_count: Number(stats.embedded_count),
link_count: Number(stats.link_count),
tag_count: Number(stats.tag_count),
timeline_entry_count: Number(stats.timeline_entry_count),
pages_by_type,
};
}
async getHealth(): Promise<BrainHealth> {
const sql = db.getConnection();
const [h] = await sql`
SELECT
(SELECT count(*) FROM pages) as page_count,
(SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL)::float /
GREATEST((SELECT count(*) FROM content_chunks), 1)::float as embed_coverage,
(SELECT count(*) FROM pages p
WHERE p.updated_at < (SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id)
) as stale_pages,
(SELECT count(*) FROM pages p
WHERE NOT EXISTS (SELECT 1 FROM links l WHERE l.to_page_id = p.id)
) as orphan_pages,
(SELECT count(*) FROM links l
WHERE NOT EXISTS (SELECT 1 FROM pages p WHERE p.id = l.to_page_id)
) as dead_links,
(SELECT count(*) FROM content_chunks WHERE embedded_at IS NULL) as missing_embeddings
`;
return {
page_count: Number(h.page_count),
embed_coverage: Number(h.embed_coverage),
stale_pages: Number(h.stale_pages),
orphan_pages: Number(h.orphan_pages),
dead_links: Number(h.dead_links),
missing_embeddings: Number(h.missing_embeddings),
};
}
// Ingest log
async logIngest(entry: IngestLogInput): Promise<void> {
const sql = db.getConnection();
await sql`
INSERT INTO ingest_log (source_type, source_ref, pages_updated, summary)
VALUES (${entry.source_type}, ${entry.source_ref}, ${JSON.stringify(entry.pages_updated)}::jsonb, ${entry.summary})
`;
}
async getIngestLog(opts?: { limit?: number }): Promise<IngestLogEntry[]> {
const sql = db.getConnection();
const limit = opts?.limit || 50;
const rows = await sql`
SELECT * FROM ingest_log ORDER BY created_at DESC LIMIT ${limit}
`;
return rows as unknown as IngestLogEntry[];
}
// Config
async getConfig(key: string): Promise<string | null> {
const sql = db.getConnection();
const rows = await sql`SELECT value FROM config WHERE key = ${key}`;
return rows.length > 0 ? (rows[0].value as string) : null;
}
async setConfig(key: string, value: string): Promise<void> {
const sql = db.getConnection();
await sql`
INSERT INTO config (key, value) VALUES (${key}, ${value})
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value
`;
}
}
// Helpers
function validateSlug(slug: string): void {
if (!slug || /\.\./.test(slug) || /^\//.test(slug) || !/^[a-z0-9][a-z0-9/_-]*$/.test(slug)) {
throw new Error(`Invalid slug: "${slug}". Slugs must be lowercase alphanumeric with / - _ separators, no path traversal.`);
}
}
function contentHash(compiledTruth: string, timeline: string): string {
return createHash('sha256').update(compiledTruth + '\n---\n' + timeline).digest('hex');
}
function rowToPage(row: Record<string, unknown>): Page {
return {
id: row.id as number,
slug: row.slug as string,
type: row.type as PageType,
title: row.title as string,
compiled_truth: row.compiled_truth as string,
timeline: row.timeline as string,
frontmatter: (typeof row.frontmatter === 'string' ? JSON.parse(row.frontmatter) : row.frontmatter) as Record<string, unknown>,
content_hash: row.content_hash as string | undefined,
created_at: new Date(row.created_at as string),
updated_at: new Date(row.updated_at as string),
};
}
function rowToChunk(row: Record<string, unknown>): Chunk {
return {
id: row.id as number,
page_id: row.page_id as number,
chunk_index: row.chunk_index as number,
chunk_text: row.chunk_text as string,
chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
embedding: null, // Don't load embeddings into memory by default
model: row.model as string,
token_count: row.token_count as number | null,
embedded_at: row.embedded_at ? new Date(row.embedded_at as string) : null,
};
}
function rowToSearchResult(row: Record<string, unknown>): SearchResult {
return {
slug: row.slug as string,
page_id: row.page_id as number,
title: row.title as string,
type: row.type as PageType,
chunk_text: row.chunk_text as string,
chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
score: Number(row.score),
stale: Boolean(row.stale),
};
}
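
The slug rule enforced by `validateSlug` above can be exercised in isolation. This is a minimal sketch, assuming a hypothetical `isValidSlug` wrapper (not part of the commit) that returns a boolean instead of throwing; the regular expressions are copied from the helper, and the sample slugs are made up.

```typescript
// Hypothetical boolean wrapper around the same checks validateSlug performs.
function isValidSlug(slug: string): boolean {
  if (!slug) return false;               // empty slugs are rejected
  if (/\.\./.test(slug)) return false;   // no ".." path traversal
  if (/^\//.test(slug)) return false;    // no leading slash
  // lowercase alphanumeric first character, then a-z, 0-9, "/", "_" or "-"
  return /^[a-z0-9][a-z0-9/_-]*$/.test(slug);
}

// Illustrative inputs only.
const samples = ['people/jane-doe', 'Deals/Acme', '../etc/passwd', '/abs', 'a_b/c-1'];
for (const s of samples) {
  console.log(`${s} -> ${isValidSlug(s)}`);
}
```

Uppercase characters fail the final pattern, so callers would need to lowercase a title before using it as a slug.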

src/core/search/dedup.ts

@@ -0,0 +1,129 @@
/**
* 4-Layer Dedup Pipeline
* Ported from production Ruby implementation (content_chunk.rb)
*
* 1. By source: one chunk per page with highest score
* 2. By cosine similarity: remove chunks >0.85 similar to kept results
* 3. By type: no page type exceeds 60% of results
* 4. By page: max N chunks per page (default 2)
*/
import type { SearchResult } from '../types.ts';
const COSINE_DEDUP_THRESHOLD = 0.85;
const MAX_TYPE_RATIO = 0.6;
const MAX_PER_PAGE = 2;
export function dedupResults(
results: SearchResult[],
opts?: {
cosineThreshold?: number;
maxTypeRatio?: number;
maxPerPage?: number;
},
): SearchResult[] {
const threshold = opts?.cosineThreshold ?? COSINE_DEDUP_THRESHOLD;
const maxRatio = opts?.maxTypeRatio ?? MAX_TYPE_RATIO;
const maxPerPage = opts?.maxPerPage ?? MAX_PER_PAGE;
let deduped = results;
// Layer 1: By source (one chunk per page with highest score)
deduped = dedupBySource(deduped);
// Layer 2: By text similarity (Jaccard overlap on word sets)
// (We don't have embeddings for results here, so text similarity stands in for cosine similarity)
deduped = dedupByTextSimilarity(deduped, threshold);
// Layer 3: By type distribution
deduped = enforceTypeDiversity(deduped, maxRatio);
// Layer 4: By page cap (has no effect for maxPerPage >= 1, since Layer 1 already keeps one chunk per page)
deduped = capPerPage(deduped, maxPerPage);
return deduped;
}
/**
* Layer 1: Keep only the highest-scoring chunk per page.
*/
function dedupBySource(results: SearchResult[]): SearchResult[] {
const byPage = new Map<string, SearchResult>();
for (const r of results) {
const existing = byPage.get(r.slug);
if (!existing || r.score > existing.score) {
byPage.set(r.slug, r);
}
}
return Array.from(byPage.values()).sort((a, b) => b.score - a.score);
}
/**
* Layer 2: Remove chunks that are too similar to already-kept results.
* Uses Jaccard similarity on word sets as a proxy for cosine similarity.
*/
function dedupByTextSimilarity(results: SearchResult[], threshold: number): SearchResult[] {
const kept: SearchResult[] = [];
for (const r of results) {
const rWords = new Set(r.chunk_text.toLowerCase().split(/\s+/));
let tooSimilar = false;
for (const k of kept) {
const kWords = new Set(k.chunk_text.toLowerCase().split(/\s+/));
const intersection = new Set([...rWords].filter(w => kWords.has(w)));
const union = new Set([...rWords, ...kWords]);
const jaccard = intersection.size / union.size;
if (jaccard > threshold) {
tooSimilar = true;
break;
}
}
if (!tooSimilar) {
kept.push(r);
}
}
return kept;
}
/**
* Layer 3: No page type exceeds maxRatio of total results.
*/
function enforceTypeDiversity(results: SearchResult[], maxRatio: number): SearchResult[] {
const maxPerType = Math.max(1, Math.ceil(results.length * maxRatio));
const typeCounts = new Map<string, number>();
const kept: SearchResult[] = [];
for (const r of results) {
const count = typeCounts.get(r.type) || 0;
if (count < maxPerType) {
kept.push(r);
typeCounts.set(r.type, count + 1);
}
}
return kept;
}
/**
* Layer 4: Cap chunks per page.
*/
function capPerPage(results: SearchResult[], maxPerPage: number): SearchResult[] {
const pageCounts = new Map<string, number>();
const kept: SearchResult[] = [];
for (const r of results) {
const count = pageCounts.get(r.slug) || 0;
if (count < maxPerPage) {
kept.push(r);
pageCounts.set(r.slug, count + 1);
}
}
return kept;
}
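
To make the Layer 2 threshold concrete, here is a small sketch of the Jaccard measure on word sets. The `jaccard` helper name, the empty-token filter, and the empty-input guard are assumptions added for this illustration and are not in `dedup.ts`; the sample strings are made up.

```typescript
// Jaccard similarity on lowercase word sets: |A ∩ B| / |A ∪ B|.
// Hypothetical standalone helper; dedupByTextSimilarity inlines the same idea.
function jaccard(a: string, b: string): number {
  const aWords = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const bWords = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (aWords.size === 0 && bWords.size === 0) return 0; // guard added for the sketch
  let shared = 0;
  for (const w of aWords) {
    if (bWords.has(w)) shared++;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (aWords.size + bWords.size - shared);
}

// Same words, different casing: identical word sets.
const nearDuplicate = jaccard('the quick brown fox jumps', 'The quick brown fox jumps');
// Two of six distinct words are shared.
const partial = jaccard('alpha beta gamma delta', 'alpha beta epsilon zeta');

console.log(nearDuplicate > 0.85); // would be dropped at the default threshold
console.log(partial > 0.85);       // would be kept
```

Because Jaccard on word sets is usually lower than embedding cosine similarity for paraphrased text, a 0.85 cutoff here mostly removes near-verbatim duplicates.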

src/core/search/expansion.ts

@@ -0,0 +1,85 @@
/**
* Multi-Query Expansion via Claude Haiku
* Ported from production Ruby implementation (query_expansion_service.rb, 69 LOC)
*
* Skip queries < 3 words.
* Generate 2 alternative phrasings via tool use.
* Return original + alternatives (max 3 total).
*/
import Anthropic from '@anthropic-ai/sdk';
const MAX_QUERIES = 3;
const MIN_WORDS = 3;
let anthropicClient: Anthropic | null = null;
function getClient(): Anthropic {
if (!anthropicClient) {
anthropicClient = new Anthropic();
}
return anthropicClient;
}
export async function expandQuery(query: string): Promise<string[]> {
const wordCount = (query.match(/\S+/g) || []).length;
if (wordCount < MIN_WORDS) return [query];
try {
const alternatives = await callHaikuForExpansion(query);
const all = [query, ...alternatives];
// Deduplicate
const unique = [...new Set(all.map(q => q.toLowerCase().trim()))];
return unique.slice(0, MAX_QUERIES).map(q =>
all.find(orig => orig.toLowerCase().trim() === q) || q,
);
} catch {
return [query];
}
}
async function callHaikuForExpansion(query: string): Promise<string[]> {
const response = await getClient().messages.create({
model: 'claude-haiku-4-5-20251001',
max_tokens: 300,
tools: [
{
name: 'expand_query',
description: 'Generate alternative phrasings of a search query to improve recall',
input_schema: {
type: 'object' as const,
properties: {
alternative_queries: {
type: 'array',
items: { type: 'string' },
description: '2 alternative phrasings of the original query, each approaching the topic from a different angle',
},
},
required: ['alternative_queries'],
},
},
],
tool_choice: { type: 'tool', name: 'expand_query' },
messages: [
{
role: 'user',
content: `Generate 2 alternative search queries that would find relevant results for this question. Each alternative should approach the topic from a different angle or use different terminology.
Original query: "${query}"`,
},
],
});
// Extract tool use result
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'expand_query') {
const input = block.input as { alternative_queries?: unknown };
const alts = input.alternative_queries;
if (Array.isArray(alts)) {
return alts.map(String).slice(0, 2);
}
}
}
return [];
}
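
The deduplication step in `expandQuery` compares queries case-insensitively but returns the first-seen spelling. A rough single-pass equivalent is sketched below, assuming a hypothetical `uniqueQueries` helper and made-up sample queries; it is meant to illustrate the behavior, not to replace the code above.

```typescript
// Hypothetical helper: case-insensitive dedupe that keeps the first-seen spelling
// and stops once `max` queries have been collected.
function uniqueQueries(queries: string[], max: number): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const q of queries) {
    const key = q.toLowerCase().trim();
    if (seen.has(key)) continue; // same query modulo case and surrounding whitespace
    seen.add(key);
    out.push(q);
    if (out.length === max) break;
  }
  return out;
}

// Illustrative input: the model echoed the original query in different casing.
const expanded = uniqueQueries(
  ['Who funded Acme?', 'who funded acme?', 'Acme investors', 'Acme seed round backers'],
  3,
);
console.log(expanded);
```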

src/core/search/hybrid.ts

@@ -0,0 +1,86 @@
/**
* Hybrid Search with Reciprocal Rank Fusion (RRF)
* Ported from production Ruby implementation (content_chunk.rb)
*
* RRF score = sum(1 / (60 + rank_in_list))
* Merges vector + keyword results fairly regardless of score scale.
*/
import type { BrainEngine } from '../engine.ts';
import type { SearchResult, SearchOpts } from '../types.ts';
import { embed } from '../embedding.ts';
import { dedupResults } from './dedup.ts';
const RRF_K = 60;
export interface HybridSearchOpts extends SearchOpts {
expansion?: boolean;
expandFn?: (query: string) => Promise<string[]>;
}
export async function hybridSearch(
engine: BrainEngine,
query: string,
opts?: HybridSearchOpts,
): Promise<SearchResult[]> {
const limit = opts?.limit || 20;
// Determine query variants (optionally with expansion)
let queries = [query];
if (opts?.expansion && opts?.expandFn) {
try {
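const expanded = await opts.expandFn(query);
// expandFn may already include the original query (expandQuery does), so dedupe before capping.
queries = [...new Set([query, ...expanded])].slice(0, 3);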
} catch {
// Expansion failure is non-fatal
}
}
// Embed all query variants
const embeddings = await Promise.all(queries.map(q => embed(q)));
// Run vector search for each embedding
const vectorLists = await Promise.all(
embeddings.map(emb => engine.searchVector(emb, { limit: limit * 2 })),
);
// Run keyword search (only the original query)
const keywordResults = await engine.searchKeyword(query, { limit: limit * 2 });
// Merge all result lists via RRF
const allLists = [...vectorLists, keywordResults];
const fused = rrfFusion(allLists);
// Dedup
const deduped = dedupResults(fused);
return deduped.slice(0, limit);
}
/**
* Reciprocal Rank Fusion: merge multiple ranked lists.
* Each result gets score = sum(1 / (K + rank)) across all lists it appears in, with rank counted from 0.
*/
function rrfFusion(lists: SearchResult[][]): SearchResult[] {
const scores = new Map<string, { result: SearchResult; score: number }>();
for (const list of lists) {
for (let rank = 0; rank < list.length; rank++) {
const r = list[rank];
const key = `${r.slug}:${r.chunk_text.slice(0, 50)}`;
const existing = scores.get(key);
const rrfScore = 1 / (RRF_K + rank);
if (existing) {
existing.score += rrfScore;
} else {
scores.set(key, { result: r, score: rrfScore });
}
}
}
// Sort by fused score descending
return Array.from(scores.values())
.sort((a, b) => b.score - a.score)
.map(({ result, score }) => ({ ...result, score }));
}
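
A worked example may help show how RRF rewards results that appear in several lists. This sketch assumes plain string IDs instead of `SearchResult` objects and a hypothetical `fuse` helper; it uses the same constant (60) and 0-based ranks as `rrfFusion`.

```typescript
const K = 60; // same value as RRF_K

// Fuse ranked lists of IDs. Rank is 0-based, matching the loop in rrfFusion.
function fuse(lists: string[][]): [string, number][] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (K + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative rankings from two retrievers.
const vectorHits = ['a', 'b', 'c'];
const keywordHits = ['b', 'd', 'a'];
const fused = fuse([vectorHits, keywordHits]);

// a: 1/60 + 1/62, b: 1/61 + 1/60, c: 1/62, d: 1/61
console.log(fused.map(([id]) => id).join(','));
```

Here `b` edges out `a` because its two ranks (1 and 0) are better on balance than `a`'s (0 and 2), while `c` and `d` trail since each appears in only one list.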


@@ -0,0 +1,10 @@
import type { BrainEngine } from '../engine.ts';
import type { SearchResult, SearchOpts } from '../types.ts';
export async function keywordSearch(
engine: BrainEngine,
query: string,
opts?: SearchOpts,
): Promise<SearchResult[]> {
return engine.searchKeyword(query, opts);
}

src/core/search/vector.ts

@@ -0,0 +1,10 @@
import type { BrainEngine } from '../engine.ts';
import type { SearchResult, SearchOpts } from '../types.ts';
export async function vectorSearch(
engine: BrainEngine,
embedding: Float32Array,
opts?: SearchOpts,
): Promise<SearchResult[]> {
return engine.searchVector(embedding, opts);
}

src/core/types.ts

@@ -0,0 +1,183 @@
// Page types
export type PageType = 'person' | 'company' | 'deal' | 'yc' | 'civic' | 'project' | 'concept' | 'source' | 'media';
export interface Page {
id: number;
slug: string;
type: PageType;
title: string;
compiled_truth: string;
timeline: string;
frontmatter: Record<string, unknown>;
content_hash?: string;
created_at: Date;
updated_at: Date;
}
export interface PageInput {
type: PageType;
title: string;
compiled_truth: string;
timeline?: string;
frontmatter?: Record<string, unknown>;
}
export interface PageFilters {
type?: PageType;
tag?: string;
limit?: number;
offset?: number;
}
// Chunks
export interface Chunk {
id: number;
page_id: number;
chunk_index: number;
chunk_text: string;
chunk_source: 'compiled_truth' | 'timeline';
embedding: Float32Array | null;
model: string;
token_count: number | null;
embedded_at: Date | null;
}
export interface ChunkInput {
chunk_index: number;
chunk_text: string;
chunk_source: 'compiled_truth' | 'timeline';
embedding?: Float32Array;
model?: string;
token_count?: number;
}
// Search
export interface SearchResult {
slug: string;
page_id: number;
title: string;
type: PageType;
chunk_text: string;
chunk_source: 'compiled_truth' | 'timeline';
score: number;
stale: boolean;
}
export interface SearchOpts {
limit?: number;
type?: PageType;
exclude_slugs?: string[];
}
// Links
export interface Link {
from_slug: string;
to_slug: string;
link_type: string;
context: string;
}
export interface GraphNode {
slug: string;
title: string;
type: PageType;
depth: number;
links: { to_slug: string; link_type: string }[];
}
// Timeline
export interface TimelineEntry {
id: number;
page_id: number;
date: string;
source: string;
summary: string;
detail: string;
created_at: Date;
}
export interface TimelineInput {
date: string;
source?: string;
summary: string;
detail?: string;
}
export interface TimelineOpts {
limit?: number;
after?: string;
before?: string;
}
// Raw data
export interface RawData {
source: string;
data: Record<string, unknown>;
fetched_at: Date;
}
// Versions
export interface PageVersion {
id: number;
page_id: number;
compiled_truth: string;
frontmatter: Record<string, unknown>;
snapshot_at: Date;
}
// Stats + Health
export interface BrainStats {
page_count: number;
chunk_count: number;
embedded_count: number;
link_count: number;
tag_count: number;
timeline_entry_count: number;
pages_by_type: Record<string, number>;
}
export interface BrainHealth {
page_count: number;
embed_coverage: number;
stale_pages: number;
orphan_pages: number;
dead_links: number;
missing_embeddings: number;
}
// Ingest log
export interface IngestLogEntry {
id: number;
source_type: string;
source_ref: string;
pages_updated: string[];
summary: string;
created_at: Date;
}
export interface IngestLogInput {
source_type: string;
source_ref: string;
pages_updated: string[];
summary: string;
}
// Config
export interface EngineConfig {
database_url?: string;
database_path?: string;
engine?: 'postgres' | 'sqlite';
}
// Errors
export class GBrainError extends Error {
constructor(
public problem: string,
public cause_description: string,
public fix: string,
public docs_url?: string,
) {
super(`${problem}: ${cause_description}. Fix: ${fix}`);
this.name = 'GBrainError';
}
}
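
The three-part shape of `GBrainError` (problem, cause, fix) can be seen in a short usage sketch. The class below is a local stand-in written with explicit fields so the snippet runs on its own, and the message strings are hypothetical, not taken from the codebase.

```typescript
// Local stand-in for GBrainError so the snippet is self-contained.
class GBrainError extends Error {
  problem: string;
  cause_description: string;
  fix: string;
  docs_url?: string;

  constructor(problem: string, cause_description: string, fix: string, docs_url?: string) {
    super(`${problem}: ${cause_description}. Fix: ${fix}`);
    this.name = 'GBrainError';
    this.problem = problem;
    this.cause_description = cause_description;
    this.fix = fix;
    this.docs_url = docs_url;
  }
}

// Hypothetical error values for illustration.
const err = new GBrainError(
  'Cannot connect to database',
  'DATABASE_URL is not set',
  'Export DATABASE_URL before starting the server',
);

console.log(err.message); // single-line summary built by the constructor
console.log(err.fix);     // the fix stays available as a separate field
```

Keeping `fix` as its own field lets a CLI or MCP client show the remedy separately from the summary line.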

src/mcp/server.ts

@@ -0,0 +1,220 @@
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import type { BrainEngine } from '../core/engine.ts';
import { parseMarkdown, serializeMarkdown } from '../core/markdown.ts';
import { hybridSearch } from '../core/search/hybrid.ts';
import { expandQuery } from '../core/search/expansion.ts';
import { chunkText } from '../core/chunkers/recursive.ts';
import { embedBatch } from '../core/embedding.ts';
import type { ChunkInput } from '../core/types.ts';
export async function startMcpServer(engine: BrainEngine) {
const server = new Server(
{ name: 'gbrain', version: '0.1.0' },
{ capabilities: { tools: {} } },
);
// setRequestHandler expects the SDK's request schema objects, not method-name strings.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: getToolDefinitions(),
}));
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: params } = request.params;
try {
const result = await handleToolCall(engine, name, params || {});
return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
} catch (e: unknown) {
const msg = e instanceof Error ? e.message : String(e);
return { content: [{ type: 'text', text: `Error: ${msg}` }], isError: true };
}
});
const transport = new StdioServerTransport();
await server.connect(transport);
}
export async function handleToolCall(
engine: BrainEngine,
tool: string,
params: Record<string, unknown>,
): Promise<unknown> {
switch (tool) {
case 'get_page': {
const slug = params.slug as string;
const page = await engine.getPage(slug);
if (!page) return { error: `Page not found: ${slug}` };
const tags = await engine.getTags(slug);
return { ...page, tags };
}
case 'put_page': {
const slug = params.slug as string;
const content = params.content as string;
const parsed = parseMarkdown(content, slug + '.md');
const existing = await engine.getPage(slug);
if (existing) await engine.createVersion(slug);
const page = await engine.putPage(slug, {
type: parsed.type,
title: parsed.title,
compiled_truth: parsed.compiled_truth,
timeline: parsed.timeline,
frontmatter: parsed.frontmatter,
});
for (const tag of parsed.tags) await engine.addTag(slug, tag);
// Chunk and embed
const chunks: ChunkInput[] = [];
if (parsed.compiled_truth.trim()) {
for (const c of chunkText(parsed.compiled_truth)) {
chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'compiled_truth' });
}
}
if (parsed.timeline.trim()) {
for (const c of chunkText(parsed.timeline)) {
chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'timeline' });
}
}
if (chunks.length > 0) {
try {
const embeddings = await embedBatch(chunks.map(c => c.chunk_text));
for (let i = 0; i < chunks.length; i++) {
chunks[i].embedding = embeddings[i];
}
} catch { /* non-fatal */ }
await engine.upsertChunks(slug, chunks);
}
return { slug: page.slug, status: existing ? 'updated' : 'created' };
}
case 'delete_page': {
await engine.deletePage(params.slug as string);
return { status: 'deleted' };
}
case 'list_pages': {
const pages = await engine.listPages({
type: params.type as any,
tag: params.tag as string,
limit: (params.limit as number) || 50,
});
return pages.map(p => ({ slug: p.slug, type: p.type, title: p.title, updated_at: p.updated_at }));
}
case 'search': {
return engine.searchKeyword(params.query as string, { limit: (params.limit as number) || 20 });
}
case 'query': {
return hybridSearch(engine, params.query as string, {
limit: (params.limit as number) || 20,
expansion: true,
expandFn: expandQuery,
});
}
case 'add_tag': {
await engine.addTag(params.slug as string, params.tag as string);
return { status: 'ok' };
}
case 'remove_tag': {
await engine.removeTag(params.slug as string, params.tag as string);
return { status: 'ok' };
}
case 'get_tags': {
return engine.getTags(params.slug as string);
}
case 'add_link': {
await engine.addLink(
params.from as string,
params.to as string,
params.context as string || '',
params.link_type as string || '',
);
return { status: 'ok' };
}
case 'remove_link': {
await engine.removeLink(params.from as string, params.to as string);
return { status: 'ok' };
}
case 'get_links': {
return engine.getLinks(params.slug as string);
}
case 'get_backlinks': {
return engine.getBacklinks(params.slug as string);
}
case 'traverse_graph': {
return engine.traverseGraph(params.slug as string, (params.depth as number) || 5);
}
case 'add_timeline_entry': {
await engine.addTimelineEntry(params.slug as string, {
date: params.date as string,
source: params.source as string || '',
summary: params.summary as string,
detail: params.detail as string || '',
});
return { status: 'ok' };
}
case 'get_timeline': {
return engine.getTimeline(params.slug as string);
}
case 'get_stats': {
return engine.getStats();
}
case 'get_health': {
return engine.getHealth();
}
case 'get_versions': {
return engine.getVersions(params.slug as string);
}
case 'revert_version': {
await engine.createVersion(params.slug as string);
await engine.revertToVersion(params.slug as string, params.version_id as number);
return { status: 'reverted' };
}
default:
throw new Error(`Unknown tool: ${tool}`);
}
}
function getToolDefinitions() {
return [
{ name: 'get_page', description: 'Read a page by slug', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'put_page', description: 'Write/update a page (markdown with frontmatter)', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, content: { type: 'string', description: 'Full markdown content with YAML frontmatter' } }, required: ['slug', 'content'] } },
{ name: 'delete_page', description: 'Delete a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'list_pages', description: 'List pages with optional filters', inputSchema: { type: 'object', properties: { type: { type: 'string' }, tag: { type: 'string' }, limit: { type: 'number' } } } },
{ name: 'search', description: 'Keyword search using full-text search', inputSchema: { type: 'object', properties: { query: { type: 'string' }, limit: { type: 'number' } }, required: ['query'] } },
{ name: 'query', description: 'Hybrid search with vector + keyword + multi-query expansion', inputSchema: { type: 'object', properties: { query: { type: 'string' }, limit: { type: 'number' } }, required: ['query'] } },
{ name: 'add_tag', description: 'Add tag to page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, tag: { type: 'string' } }, required: ['slug', 'tag'] } },
{ name: 'remove_tag', description: 'Remove tag from page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, tag: { type: 'string' } }, required: ['slug', 'tag'] } },
{ name: 'get_tags', description: 'List tags for a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'add_link', description: 'Create link between pages', inputSchema: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' }, link_type: { type: 'string' }, context: { type: 'string' } }, required: ['from', 'to'] } },
{ name: 'remove_link', description: 'Remove link between pages', inputSchema: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' } }, required: ['from', 'to'] } },
{ name: 'get_links', description: 'List outgoing links from a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'get_backlinks', description: 'List incoming links to a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'traverse_graph', description: 'Traverse link graph from a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, depth: { type: 'number', description: 'Max traversal depth (default 5)' } }, required: ['slug'] } },
{ name: 'add_timeline_entry', description: 'Add timeline entry to a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, date: { type: 'string' }, summary: { type: 'string' }, detail: { type: 'string' }, source: { type: 'string' } }, required: ['slug', 'date', 'summary'] } },
{ name: 'get_timeline', description: 'Get timeline entries for a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'get_stats', description: 'Brain statistics (page count, chunk count, etc.)', inputSchema: { type: 'object', properties: {} } },
{ name: 'get_health', description: 'Brain health dashboard (embed coverage, stale pages, orphans)', inputSchema: { type: 'object', properties: {} } },
{ name: 'get_versions', description: 'Page version history', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
{ name: 'revert_version', description: 'Revert page to a previous version', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, version_id: { type: 'number' } }, required: ['slug', 'version_id'] } },
];
}

src/schema.sql

@@ -0,0 +1,195 @@
-- GBrain Postgres + pgvector schema
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- ============================================================
-- pages: the core content table
-- ============================================================
CREATE TABLE IF NOT EXISTS pages (
id SERIAL PRIMARY KEY,
slug TEXT NOT NULL UNIQUE,
type TEXT NOT NULL,
title TEXT NOT NULL,
compiled_truth TEXT NOT NULL DEFAULT '',
timeline TEXT NOT NULL DEFAULT '',
frontmatter JSONB NOT NULL DEFAULT '{}',
content_hash TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type);
CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter);
CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops);
-- ============================================================
-- content_chunks: chunked content with embeddings
-- ============================================================
CREATE TABLE IF NOT EXISTS content_chunks (
id SERIAL PRIMARY KEY,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
chunk_text TEXT NOT NULL,
chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
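-- NOTE: text-embedding-3-large returns 3072 dimensions by default; vector(1536) assumes the embedding call requests dimensions=1536.
embedding vector(1536),
model TEXT NOT NULL DEFAULT 'text-embedding-3-large',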
token_count INTEGER,
embedded_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_chunks_page ON content_chunks(page_id);
CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (embedding vector_cosine_ops);
-- ============================================================
-- links: cross-references between pages
-- ============================================================
CREATE TABLE IF NOT EXISTS links (
id SERIAL PRIMARY KEY,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(from_page_id, to_page_id)
);
CREATE INDEX IF NOT EXISTS idx_links_from ON links(from_page_id);
CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);
-- ============================================================
-- tags
-- ============================================================
CREATE TABLE IF NOT EXISTS tags (
id SERIAL PRIMARY KEY,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
UNIQUE(page_id, tag)
);
CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
CREATE INDEX IF NOT EXISTS idx_tags_page_id ON tags(page_id);
-- ============================================================
-- raw_data: sidecar data (replaces .raw/ JSON files)
-- ============================================================
CREATE TABLE IF NOT EXISTS raw_data (
id SERIAL PRIMARY KEY,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
source TEXT NOT NULL,
data JSONB NOT NULL,
fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(page_id, source)
);
CREATE INDEX IF NOT EXISTS idx_raw_data_page ON raw_data(page_id);
-- ============================================================
-- timeline_entries: structured timeline
-- ============================================================
CREATE TABLE IF NOT EXISTS timeline_entries (
id SERIAL PRIMARY KEY,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
date DATE NOT NULL,
source TEXT NOT NULL DEFAULT '',
summary TEXT NOT NULL,
detail TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX IF NOT EXISTS idx_timeline_date ON timeline_entries(date);
-- ============================================================
-- page_versions: snapshot history for compiled_truth
-- ============================================================
CREATE TABLE IF NOT EXISTS page_versions (
id SERIAL PRIMARY KEY,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
compiled_truth TEXT NOT NULL,
frontmatter JSONB NOT NULL DEFAULT '{}',
snapshot_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_versions_page ON page_versions(page_id);
-- ============================================================
-- ingest_log
-- ============================================================
CREATE TABLE IF NOT EXISTS ingest_log (
id SERIAL PRIMARY KEY,
source_type TEXT NOT NULL,
source_ref TEXT NOT NULL,
pages_updated JSONB NOT NULL DEFAULT '[]',
summary TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ============================================================
-- config: brain-level settings
-- ============================================================
CREATE TABLE IF NOT EXISTS config (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
INSERT INTO config (key, value) VALUES
('version', '1'),
('embedding_model', 'text-embedding-3-large'),
('embedding_dimensions', '1536'),
('chunk_strategy', 'semantic')
ON CONFLICT (key) DO NOTHING;
-- ============================================================
-- Trigger-based search_vector (spans pages + timeline_entries)
-- ============================================================
ALTER TABLE pages ADD COLUMN IF NOT EXISTS search_vector tsvector;
CREATE INDEX IF NOT EXISTS idx_pages_search ON pages USING GIN(search_vector);
-- Function to rebuild search_vector for a page
CREATE OR REPLACE FUNCTION update_page_search_vector() RETURNS trigger AS $$
DECLARE
  timeline_text TEXT;
BEGIN
  -- Gather timeline_entries text for this page
  SELECT coalesce(string_agg(summary || ' ' || detail, ' '), '')
    INTO timeline_text
    FROM timeline_entries
   WHERE page_id = NEW.id;
  -- Build weighted tsvector
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.compiled_truth, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.timeline, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(timeline_text, '')), 'C');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS trg_pages_search_vector ON pages;
CREATE TRIGGER trg_pages_search_vector
BEFORE INSERT OR UPDATE ON pages
FOR EACH ROW
EXECUTE FUNCTION update_page_search_vector();
-- When timeline_entries change, update the parent page's search_vector
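CREATE OR REPLACE FUNCTION update_page_search_vector_from_timeline() RETURNS trigger AS $$
BEGIN
  -- Touch the affected page(s) to re-fire trg_pages_search_vector.
  -- NEW is null on DELETE and OLD is null on INSERT, so branch on TG_OP.
  IF TG_OP = 'INSERT' THEN
    UPDATE pages SET updated_at = now() WHERE id = NEW.page_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE pages SET updated_at = now() WHERE id = OLD.page_id;
  ELSE
    -- An UPDATE may move an entry to another page; refresh both pages.
    UPDATE pages SET updated_at = now() WHERE id IN (OLD.page_id, NEW.page_id);
  END IF;
  RETURN NULL; -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;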
DROP TRIGGER IF EXISTS trg_timeline_search_vector ON timeline_entries;
CREATE TRIGGER trg_timeline_search_vector
AFTER INSERT OR UPDATE OR DELETE ON timeline_entries
FOR EACH ROW
EXECUTE FUNCTION update_page_search_vector_from_timeline();
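
The weighted search_vector maintained by the triggers above is what a keyword search would query. As a minimal sketch of such a query (not the engine's actual implementation), the helper below builds a parameterized statement against pages.search_vector. The names buildKeywordSearch and SqlQuery are hypothetical. ts_rank and websearch_to_tsquery are standard PostgreSQL functions, and websearch_to_tsquery needs PostgreSQL 11 or later.

```typescript
// Hypothetical shape for a parameterized statement; not part of the repo.
interface SqlQuery {
  text: string;
  params: (string | number)[];
}

// Hypothetical helper: ranks pages by how well search_vector matches the
// query. ts_rank weighs title matches (weight A) above compiled_truth (B)
// and timeline text (C) because of the setweight calls in the trigger.
function buildKeywordSearch(query: string, limit = 10): SqlQuery {
  const text = [
    "SELECT id, title,",
    "       ts_rank(search_vector, websearch_to_tsquery('english', $1)) AS score",
    "FROM pages",
    "WHERE search_vector @@ websearch_to_tsquery('english', $1)",
    "ORDER BY score DESC",
    "LIMIT $2",
  ].join("\n");
  // Values travel as bind parameters, never interpolated into the SQL text.
  return { text, params: [query.trim(), limit] };
}
```

The returned text and params can be handed to any Postgres client that accepts positional parameters. Because the GIN index idx_pages_search covers search_vector, the @@ filter can use the index rather than scanning every page.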


@@ -0,0 +1,77 @@
import { describe, test, expect } from 'bun:test';
import { chunkText } from '../../src/core/chunkers/recursive.ts';
describe('Recursive Text Chunker', () => {
test('returns empty array for empty input', () => {
expect(chunkText('')).toEqual([]);
expect(chunkText(' ')).toEqual([]);
});
test('returns single chunk for short text', () => {
const text = 'Hello world. This is a short text.';
const chunks = chunkText(text);
expect(chunks).toHaveLength(1);
expect(chunks[0].text).toBe(text.trim());
expect(chunks[0].index).toBe(0);
});
test('splits at paragraph boundaries', () => {
const paragraph = 'word '.repeat(200).trim();
const text = paragraph + '\n\n' + paragraph;
const chunks = chunkText(text, { chunkSize: 250 });
expect(chunks.length).toBeGreaterThanOrEqual(2);
});
test('respects chunk size target', () => {
const text = 'word '.repeat(1000).trim();
const chunks = chunkText(text, { chunkSize: 100 });
for (const chunk of chunks) {
const wordCount = chunk.text.split(/\s+/).length;
// Allow up to 1.5x target due to greedy merge
expect(wordCount).toBeLessThanOrEqual(150);
}
});
test('applies overlap between chunks', () => {
const text = 'word '.repeat(1000).trim();
const chunks = chunkText(text, { chunkSize: 100, chunkOverlap: 20 });
expect(chunks.length).toBeGreaterThan(1);
// The input repeats one identical word, so shared content cannot be
// distinguished from new content here. This only checks that a second,
// non-empty chunk is produced when overlap is enabled.
expect(chunks[1].text.length).toBeGreaterThan(0);
});
test('splits at sentence boundaries', () => {
const sentences = Array.from({ length: 50 }, (_, i) =>
`This is sentence number ${i} with some content about topic ${i}.`
).join(' ');
const chunks = chunkText(sentences, { chunkSize: 50 });
expect(chunks.length).toBeGreaterThan(1);
// Every chunk except the last should contain at least one sentence terminator
for (const chunk of chunks.slice(0, -1)) {
// Overlap text may follow the terminator, so do not require it at the very end
expect(chunk.text).toMatch(/[.!?]/);
}
});
test('assigns sequential indices', () => {
const text = 'word '.repeat(1000).trim();
const chunks = chunkText(text, { chunkSize: 100 });
for (let i = 0; i < chunks.length; i++) {
expect(chunks[i].index).toBe(i);
}
});
test('handles single word input', () => {
const chunks = chunkText('hello');
expect(chunks).toHaveLength(1);
expect(chunks[0].text).toBe('hello');
});
test('handles unicode text', () => {
const text = 'Bonjour le monde. ' + 'Ceci est un texte en français. '.repeat(100);
const chunks = chunkText(text, { chunkSize: 50 });
expect(chunks.length).toBeGreaterThan(1);
expect(chunks[0].text).toContain('Bonjour');
});
});
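
The overlap test above cannot observe shared content because every word in its input is identical. A minimal sketch of a helper that could make overlap measurable, assuming chunks are compared as whitespace-separated words: sharedBoundaryWords is a hypothetical name, not something exported by the chunker, and it needs input with distinct words (for example `w0 w1 w2 ...`) to be meaningful.

```typescript
// Hypothetical test helper: returns the length of the longest run of words
// that ends `prev` and also begins `next`.
function sharedBoundaryWords(prev: string, next: string): number {
  const a = prev.trim().split(/\s+/).filter(Boolean);
  const b = next.trim().split(/\s+/).filter(Boolean);
  const max = Math.min(a.length, b.length);
  // Try the longest candidate first so the first hit is the maximum overlap.
  for (let n = max; n > 0; n--) {
    const tail = a.slice(a.length - n);
    const head = b.slice(0, n);
    if (tail.every((word, i) => word === head[i])) return n;
  }
  return 0;
}
```

With such a helper, a test could build its input from numbered words and assert that adjacent chunks share at least one boundary word, rather than only checking that the second chunk is non-empty. Whether the real chunker places overlap exactly at the chunk start is an assumption to verify against its implementation.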

test/markdown.test.ts Normal file

@@ -0,0 +1,148 @@
import { describe, test, expect } from 'bun:test';
import { parseMarkdown, serializeMarkdown, splitBody } from '../src/core/markdown.ts';
describe('Markdown Parser', () => {
test('parses frontmatter + compiled_truth + timeline', () => {
const md = `---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth]
---
Paul Graham argues that startups should do unscalable things early on.
---
- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch kickoff talk
`;
const parsed = parseMarkdown(md);
expect(parsed.type).toBe('concept');
expect(parsed.title).toBe("Do Things That Don't Scale");
expect(parsed.tags).toEqual(['startups', 'growth']);
expect(parsed.compiled_truth).toContain('unscalable things');
expect(parsed.timeline).toContain('Published on paulgraham.com');
expect(parsed.timeline).toContain('batch kickoff talk');
});
test('handles no timeline separator', () => {
const md = `---
type: concept
title: Superlinear Returns
---
Returns in many fields are superlinear.
Performance compounds over time.
`;
const parsed = parseMarkdown(md);
expect(parsed.compiled_truth).toContain('superlinear');
expect(parsed.timeline).toBe('');
});
test('handles empty body', () => {
const md = `---
type: concept
title: Empty Page
---
`;
const parsed = parseMarkdown(md);
expect(parsed.compiled_truth).toBe('');
expect(parsed.timeline).toBe('');
});
test('removes type, title, tags from frontmatter object', () => {
const md = `---
type: concept
title: Test
tags: [a, b]
custom_field: hello
---
Content
`;
const parsed = parseMarkdown(md);
expect(parsed.frontmatter).not.toHaveProperty('type');
expect(parsed.frontmatter).not.toHaveProperty('title');
expect(parsed.frontmatter).not.toHaveProperty('tags');
expect(parsed.frontmatter).toHaveProperty('custom_field', 'hello');
});
test('infers type from file path', () => {
const md = `---
title: Someone
---
Content
`;
const parsed = parseMarkdown(md, 'people/someone.md');
expect(parsed.type).toBe('person');
});
test('infers slug from file path', () => {
const md = `---
type: concept
title: Test
---
Content
`;
const parsed = parseMarkdown(md, 'concepts/do-things-that-dont-scale.md');
expect(parsed.slug).toBe('concepts/do-things-that-dont-scale');
});
});
describe('splitBody', () => {
test('splits at first standalone ---', () => {
const body = 'Above the line\n\n---\n\nBelow the line';
const { compiled_truth, timeline } = splitBody(body);
expect(compiled_truth).toContain('Above the line');
expect(timeline).toContain('Below the line');
});
test('returns all as compiled_truth if no separator', () => {
const body = 'Just some content\nWith multiple lines';
const { compiled_truth, timeline } = splitBody(body);
expect(compiled_truth).toBe(body);
expect(timeline).toBe('');
});
test('handles --- at end of content', () => {
const body = 'Content here\n\n---\n';
const { compiled_truth, timeline } = splitBody(body);
expect(compiled_truth).toContain('Content here');
expect(timeline.trim()).toBe('');
});
});
describe('serializeMarkdown', () => {
test('round-trips through parse and serialize', () => {
const original = `---
type: concept
title: Do Things That Don't Scale
tags:
- startups
- growth
custom: value
---
Paul Graham argues that startups should do unscalable things early on.
---
- 2013-07-01: Published on paulgraham.com
`;
const parsed = parseMarkdown(original);
const serialized = serializeMarkdown(
parsed.frontmatter,
parsed.compiled_truth,
parsed.timeline,
{ type: parsed.type, title: parsed.title, tags: parsed.tags },
);
// Re-parse the serialized version
const reparsed = parseMarkdown(serialized);
expect(reparsed.type).toBe(parsed.type);
expect(reparsed.title).toBe(parsed.title);
expect(reparsed.compiled_truth).toBe(parsed.compiled_truth);
expect(reparsed.timeline).toBe(parsed.timeline);
expect(reparsed.frontmatter.custom).toBe('value');
});
});
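
The splitBody tests above pin down three behaviors: split at the first standalone `---`, return the whole body as compiled_truth when no separator exists, and yield an empty timeline when the separator is the last line. A minimal sketch that satisfies those three cases follows. It is not the code in src/core/markdown.ts; the names splitBodySketch and SplitResult are hypothetical, and the real function may trim or normalize differently.

```typescript
// Hypothetical result shape mirroring the fields the tests destructure.
interface SplitResult {
  compiled_truth: string;
  timeline: string;
}

// Hypothetical sketch: splits a markdown body at the first line that
// consists only of `---` (ignoring surrounding whitespace).
function splitBodySketch(body: string): SplitResult {
  const lines = body.split("\n");
  const idx = lines.findIndex((line) => line.trim() === "---");
  // No separator: everything is compiled truth and the timeline is empty.
  if (idx === -1) return { compiled_truth: body, timeline: "" };
  return {
    compiled_truth: lines.slice(0, idx).join("\n"),
    timeline: lines.slice(idx + 1).join("\n"),
  };
}
```

Matching on a whole line rather than a substring keeps a `---` embedded in a sentence or table row from being treated as the separator.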

tsconfig.json Normal file

@@ -0,0 +1,19 @@
{
  "compilerOptions": {
    "target": "ESNext",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "types": ["bun-types"],
    "strict": true,
    "skipLibCheck": true,
    "noEmit": true,
    "esModuleInterop": true,
    "allowImportingTsExtensions": true,
    "resolveJsonModule": true,
    "baseUrl": ".",
    "paths": {
      "@/*": ["src/*"]
    }
  },
  "include": ["src", "test"]
}