* chore: add CLAUDE.md with project context and gstack skill routing rules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: initialize project with Bun + TypeScript package.json with dependencies (postgres, pgvector, openai, anthropic, MCP SDK, gray-matter). TypeScript config targeting ESNext with bundler module resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add foundation layer — engine interface, Postgres engine, schema BrainEngine pluggable interface with full PostgresEngine: CRUD, search (keyword + vector), links, tags, timeline, versions, stats, health, ingest log, config. Trigger-based tsvector spanning pages + timeline_entries. Markdown parser with frontmatter, compiled_truth / timeline splitting, and round-trip serialization. 19 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add 3-tier chunking and embedding service Recursive delimiter-aware chunker (5-level hierarchy, 300-word chunks, 50-word overlap). Semantic chunker with Savitzky-Golay boundary detection and recursive fallback. LLM-guided chunker via Claude Haiku with sliding window topic detection. OpenAI embedding service with batch support, exponential backoff, and rate limit handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add hybrid search with RRF fusion, expansion, and 4-layer dedup Hybrid search merges vector (pgvector HNSW) + keyword (tsvector) via Reciprocal Rank Fusion. Multi-query expansion via Claude Haiku generates 2 alternative phrasings. 4-layer dedup pipeline: by source, cosine similarity, type diversity (60% cap), per-page cap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan GBRAIN_V0.md: full product spec with architecture decisions, CLI commands, schema, search architecture, chunking strategies, first-time experience, and future plans. ENGINES.md: pluggable engine interface, capability matrix, how to add new backends. SQLITE_ENGINE.md: complete SQLite implementation plan with schema, FTS5 setup, vector search options, and contributor guide. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add CLI with all commands Full CLI dispatcher with 25+ commands: init (Supabase wizard), get, put, delete, list, search, query (hybrid RRF), import (bulk with progress bar), export (round-trip), embed, stats, health, tag/untag/tags, link/unlink/ backlinks/graph, timeline/timeline-add, history/revert, config, upgrade, serve, call. Smart slug resolution on reads. Version snapshots on updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add MCP stdio server with all brain tools 20 MCP tools mirroring CLI operations: get/put/delete/list pages, search (keyword), query (hybrid RRF + expansion), tags, links with graph traversal, timeline, stats, health, version history, and revert. Auto-chunks and embeds on put_page. CLI and MCP share the same engine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add 6 skill files and ClawHub manifest Fat markdown skills for AI agents: ingest (meetings/docs/articles with timeline merge), query (3-layer search + synthesis + citations), maintain (health checks, stale detection, orphan audit), enrich (external API enrichment), briefing (daily briefing compilation), migrate (universal migration from Obsidian/Notion/Logseq/markdown/CSV/JSON/Roam). ClawHub manifest for skill distribution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add README, CONTRIBUTING, update CLAUDE.md test references README with quickstart, commands, architecture, library usage, MCP setup, and links to design docs. CONTRIBUTING with setup, project structure, and guides for adding commands and engines. CLAUDE.md updated to reference actual test files instead of planned-but-unwritten import test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address adversarial review findings — 5 critical/high fixes - revertToVersion: add page_id check to prevent cross-page data corruption - traverseGraph: use UNION instead of UNION ALL for cycle safety - embedAll: preserve all chunks when embedding stale subset only - embedding: throw on retry exhaustion instead of returning zero vectors - putPage: validate slugs to prevent path traversal on export Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.1.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: expand README with schema, install, search architecture, and motivation Why it exists, how search works (with ASCII diagram), full database schema with all 9 tables and index details, chunking strategies explained, storage estimates, setup wizard walkthrough, knowledge model with example page, library usage with more examples, expanded skills table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add MIT license (Copyright 2026 Garry Tan) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add OpenClaw install flow as primary option in README OpenClaw users just say "install gbrain" and the orchestrator handles everything: package install, Supabase setup wizard, skill registration. Shows the conversational interface for querying, ingesting, and briefings. ClawHub and standalone CLI paths follow as alternatives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add prerequisites and explicit OpenClaw install instructions Prerequisites table listing Supabase, OpenAI, and Anthropic dependencies with links. Environment variable setup. Explicit step-by-step prompt for OpenClaw users showing exactly what to tell the orchestrator. Note that search degrades gracefully without API keys (keyword-only without OpenAI, no expansion without Anthropic). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: scrub named references, add PG essay demo section to README Replace all Pedro/Brex/Jensen Huang/River AI examples with Paul Graham essay examples using the kindling corpus. Add "Try it" section to README showing the power of hybrid search on PG essays in 90 seconds. Update test fixtures to use concept pages instead of person pages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9.3 KiB
Pluggable Engine Architecture
The idea
Every GBrain operation goes through BrainEngine. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.
v0 ships PostgresEngine backed by Supabase. The interface is designed so a SQLiteEngine, DuckDBEngine, or TursoEngine could slot in without touching the CLI, MCP server, skills, or any consumer code.
Why this matters
Different users have different constraints:
| User | Needs | Best engine |
|---|---|---|
| Power user (you) | World-class search, 7K+ pages, zero-ops | PostgresEngine + Supabase |
| Open source hacker | Single file, no server, git-friendly | SQLiteEngine (future) |
| Team/enterprise | Multi-user, RLS, audit trail | PostgresEngine + self-hosted |
| Researcher | Analytics, bulk exports, embeddings | DuckDBEngine (someday) |
| Edge/mobile | Offline-first, sync later | SQLiteEngine + sync (someday) |
The engine interface means we don't have to choose. Ship Postgres now, let the community build the rest.
The interface
// src/core/engine.ts
export interface BrainEngine {
// Lifecycle
connect(config: EngineConfig): Promise<void>;
disconnect(): Promise<void>;
initSchema(): Promise<void>;
transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;
// Pages CRUD
getPage(slug: string): Promise<Page | null>;
putPage(slug: string, page: PageInput): Promise<Page>;
deletePage(slug: string): Promise<void>;
listPages(filters: PageFilters): Promise<Page[]>;
// Search
searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;
// Chunks
upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
getChunks(slug: string): Promise<Chunk[]>;
// Links
addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
removeLink(from: string, to: string): Promise<void>;
getLinks(slug: string): Promise<Link[]>;
getBacklinks(slug: string): Promise<Link[]>;
traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;
// Tags
addTag(slug: string, tag: string): Promise<void>;
removeTag(slug: string, tag: string): Promise<void>;
getTags(slug: string): Promise<string[]>;
// Timeline
addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;
// Raw data
putRawData(slug: string, source: string, data: object): Promise<void>;
getRawData(slug: string, source?: string): Promise<RawData[]>;
// Versions
createVersion(slug: string): Promise<PageVersion>;
getVersions(slug: string): Promise<PageVersion[]>;
revertToVersion(slug: string, versionId: number): Promise<void>;
// Stats + health
getStats(): Promise<BrainStats>;
getHealth(): Promise<BrainHealth>;
// Ingest log
logIngest(entry: IngestLogInput): Promise<void>;
getIngestLog(opts?: IngestLogOpts): Promise<IngestLogEntry[]>;
// Config
getConfig(key: string): Promise<string | null>;
setConfig(key: string, value: string): Promise<void>;
}
Key design choices
Slug-based API, not ID-based. Every method takes slugs, not numeric IDs. The engine resolves slugs to IDs internally. This keeps the interface portable... slugs are strings, IDs are database-specific.
Embedding is NOT in the engine. The engine stores embeddings and searches by vector, but it doesn't generate embeddings. src/core/embedding.ts handles that. This is intentional: embedding is an external API call (OpenAI), not a storage concern. All engines share the same embedding service.
Chunking is NOT in the engine. Same logic. src/core/chunkers/ handles chunking. The engine stores and retrieves chunks. All engines share the same chunkers.
Search returns SearchResult[], not raw rows. The engine is responsible for its own search implementation (tsvector vs FTS5, pgvector vs sqlite-vss) but must return a uniform result type. RRF fusion and dedup happen above the engine, in src/core/search/hybrid.ts.
traverseGraph exists but is engine-specific. Postgres uses recursive CTEs. SQLite would use a loop with depth tracking. The interface is the same: give me a slug and max depth, return the graph.
How search works across engines
+-------------------+
| hybrid.ts |
| (RRF fusion + |
| dedup, shared) |
+--------+----------+
|
+------------+------------+
| |
+--------v--------+ +--------v--------+
| engine.search | | engine.search |
| Keyword() | | Vector() |
+-----------------+ +-----------------+
| |
+-----------+-----------+ +---------+---------+
| | | |
+-------v-------+ +-------v---+ +-------v---+ +----v--------+
| Postgres: | | SQLite: | | Postgres: | | SQLite: |
| tsvector + | | FTS5 + | | pgvector | | sqlite-vss |
| ts_rank + | | bm25 | | HNSW | | or vec0 |
| websearch_to_ | | | | cosine | | |
| tsquery | | | | | | |
+---------------+ +-----------+ +-----------+ +-------------+
RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They operate on SearchResult[] arrays. Only the raw keyword and vector searches are engine-specific.
PostgresEngine (v0, ships)
Dependencies: postgres (porsager/postgres), pgvector
Postgres-specific features used:
tsvector+GINindex for full-text search withts_rankweightingpgvectorHNSW index for cosine similarity vector searchpg_trgm+GINfor fuzzy slug resolution- Recursive CTEs for graph traversal
- Trigger-based search_vector (spans pages + timeline_entries)
- JSONB for frontmatter with GIN index
- Connection pooling via Supabase Supavisor (port 6543)
Hosting: Supabase Pro ($25/mo). Zero-ops. Managed Postgres with pgvector built in.
Why not self-hosted for v0: The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.
Adding a new engine
- Create
src/core/<name>-engine.tsimplementingBrainEngine - Add to engine factory in
src/core/engine.ts:export function createEngine(type: string): BrainEngine { switch (type) { case 'postgres': return new PostgresEngine(); case 'sqlite': return new SQLiteEngine(); default: throw new Error(`Unknown engine: ${type}`); } } - Store engine type in
~/.gbrain/config.json:{ "engine": "sqlite", ... } - Add tests. The test suite should be engine-agnostic where possible... same test cases, different engine constructor.
- Document in this file + add a design doc in
docs/
What you DON'T need to touch
src/cli.ts(dispatches to engine, doesn't know which one)src/mcp/server.ts(same)src/core/chunkers/*(shared across engines)src/core/embedding.ts(shared across engines)src/core/search/hybrid.ts,expansion.ts,dedup.ts(shared, operate on SearchResult[])skills/*(fat markdown, engine-agnostic)
What you DO need to implement
Every method in BrainEngine. The full interface. No optional methods, no feature flags. If your engine can't do vector search (e.g., a pure-text engine), implement searchVector to return [] and document the limitation.
Capability matrix
| Capability | PostgresEngine | SQLiteEngine (future) | Notes |
|---|---|---|---|
| CRUD | Full | Full | |
| Keyword search | tsvector + ts_rank | FTS5 + bm25 | Different ranking algorithms |
| Vector search | pgvector HNSW | sqlite-vss or vec0 | Different index types |
| Fuzzy slug | pg_trgm | LIKE + Levenshtein | Postgres is better here |
| Graph traversal | Recursive CTE | Loop with depth tracking | Same interface |
| Transactions | Full ACID | Full ACID | Both support this |
| JSONB queries | GIN index | json_extract | Postgres is richer |
| Concurrent access | Connection pooling | Single writer | SQLite limitation |
| Hosting | Supabase, self-hosted, Docker | Local file |
Future engine ideas
SQLiteEngine (most requested). See docs/SQLITE_ENGINE.md for the full plan. Single file, no server, git-friendly. Uses FTS5 for keyword search, sqlite-vss or vec0 for vector search. Great for open source users who want zero infrastructure.
TursoEngine. libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.
DuckDBEngine. Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.
Custom/Remote. The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. The interface doesn't assume SQL.