gbrain/docs/ENGINES.md
Garry Tan 6c7d2ed30b feat: PGLite engine — local brain, zero infrastructure (v0.7.0) (#41)
* refactor: extract shared utils, add runMigration + getChunksWithEmbeddings to BrainEngine

Extract validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult
from postgres-engine.ts into shared utils.ts. Add rowToChunk includeEmbedding
parameter for migration support.

Add two new methods to BrainEngine interface:
- runMigration(version, sql) — replaces internal eng.sql access in migrate.ts
- getChunksWithEmbeddings(slug) — returns chunks with embedding data for migration

Replace 'sqlite' with 'pglite' in EngineConfig and GBrainConfig types.
Fix loadConfig to infer engine from database_path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pluggable engine factory + hybridSearch keyword-only fallback

Add createEngine() factory with dynamic imports so PGLite WASM is never
loaded for Postgres users. Wire CLI to use factory instead of hardcoded
PostgresEngine.

Force workers=1 for PGLite imports (single-connection architecture).

Fix hybridSearch to check OPENAI_API_KEY before calling embed(). When
unset, returns keyword-only results instead of throwing. Critical for
local PGLite users who don't need vector search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: PGLiteEngine — embedded Postgres 17.5 via WASM, same SQL everywhere

Full BrainEngine implementation (37 methods) using @electric-sql/pglite.
Same SQL as PostgresEngine — tsvector triggers, pgvector HNSW, pg_trgm
fuzzy matching, recursive CTEs, JSONB. Only the driver call syntax differs
(parameterized queries instead of tagged templates).

PGLite schema is the Postgres schema minus RLS, advisory locks, and
remote auth tables (access_tokens, mcp_request_log, files).

No server. No subscription. One directory. Works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart init (PGLite default) + bidirectional engine migration

gbrain init now defaults to PGLite — brain ready in 2 seconds, no
server needed. Scans target directory: <1000 .md files = PGLite,
>=1000 = suggests Supabase. --supabase and --pglite flags override.

gbrain migrate --to supabase/pglite transfers all data between engines
with manifest-based resume. Copies pages, chunks (with embeddings),
tags, timeline, raw data, links, and config. --force overwrites
non-empty target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 60 new tests for PGLite engine, utils, and factory

41 PGLite engine tests covering all 37 BrainEngine methods: CRUD,
tsvector keyword search, pg_trgm fuzzy matching, chunk upsert with
COALESCE, graph traversal via recursive CTE, transactions, cascade
deletes, stats/health, and embedding round-trip.

14 shared utility tests (validateSlug, contentHash, row mappers).
5 engine factory tests (dispatch, error messages).

All run in-memory — zero Docker, zero DATABASE_URL, instant in CI.

Add P0 TODO: submit Bun PR for WASM embedding in bun build --compile
(oven-sh/bun#15032).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0 PGLite engine

- CLAUDE.md: add PGLite key files, update architecture, add migrate command, add 3 test files
- README.md: PGLite as default init, zero-config getting started, migration path to Supabase
- docs/ENGINES.md: PGLiteEngine shipped (v0.7), capability matrix, migration docs
- docs/SQLITE_ENGINE.md: marked superseded by PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove stale v0.4 README update prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove SQLITE_ENGINE.md (superseded by PGLite)

PGLite uses the same SQL as Postgres, making a separate SQLite
engine unnecessary. docs/ENGINES.md covers PGLiteEngine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README step 2 to default to PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add schema setup step and install-all-integrations step to README

Step 3 now tells agents to read GBRAIN_RECOMMENDED_SCHEMA.md and set up
the MECE directory structure before importing. Step 7 tells agents to
install every available integration recipe, not just list them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update install goal to match full opinionated setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add 'Need an AI agent first?' section with one-click deploy links

New users who don't have OpenClaw or Hermes Agent get pointed to
AlphaClaw on Render and the Hermes Agent Railway template. One click
each. Claude Code mentioned for users who already have it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add migrate to CLI_ONLY + help output, fix standalone example

- migrate command was missing from CLI_ONLY set (errored as "Unknown command")
- migrate now shows in --help under SETUP
- init help line shows --pglite flag
- standalone CLI example uses gbrain init (not --supabase)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: set realistic time expectation (~30 min to working brain)

DB is 2 seconds. But schema + import + embeddings + integrations
is 15-30 minutes. The agent does the work, you answer API key
questions. Don't oversell time-to-value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix AlphaClaw Render requirement (8GB+ RAM, not free tier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: final README polish for launch

- GOAL line: "Garry Tan's exact setup" (not Claude Code specific)
- Remove markdown links from code block (won't render)
- STEP 2 renamed from "START HERE" to "DATABASE"
- Tighten Supabase fallback text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove duplicate old install block from README

The v0.5-era "With OpenClaw or Hermes Agent" paste block was
superseded by the top-level "Start here" block. Having both
confused users and the old one still said --supabase as step 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clean up README consistency and remove duplicated content

- Remove duplicate "Try it" section (old 4-act walkthrough that
  repeated the install flow and contradicted "~30 min" with "90 sec")
- Remove duplicate Setup section (third repetition of gbrain init)
- Fix brain.db → brain.pglite (actual default path)
- Fix "coming in v0.7" → "not yet implemented" (we ARE v0.7)
- Remove "You don't need Postgres" (confusing since PGLite IS Postgres)
- Deduplicate "competitive dynamics" query (appeared 3 times)
- Collapse redundant standalone CLI section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:01:09 -10:00

# Pluggable Engine Architecture
## The idea
Every GBrain operation goes through `BrainEngine`. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.
v0 shipped `PostgresEngine` backed by Supabase. v0.7 adds `PGLiteEngine` -- embedded Postgres 17.5 via WASM (@electric-sql/pglite), zero-config default. The interface is designed so a `DuckDBEngine`, `TursoEngine`, or any custom backend could slot in without touching the CLI, MCP server, skills, or any consumer code.
## Why this matters
Different users have different constraints:
| User | Needs | Best engine |
|------|-------|-------------|
| Getting started | Zero-config, no accounts, no server | PGLiteEngine (default since v0.7) |
| Power user (you) | World-class search, 7K+ pages, zero-ops | PostgresEngine + Supabase |
| Open source hacker | Single file, no server, git-friendly | PGLiteEngine |
| Team/enterprise | Multi-user, RLS, audit trail | PostgresEngine + self-hosted |
| Researcher | Analytics, bulk exports, embeddings | DuckDBEngine (someday) |
| Edge/mobile | Offline-first, sync later | PGLiteEngine + sync (someday) |
The engine interface means we don't have to choose. PGLite is the zero-friction default. Supabase is the production scale path. `gbrain migrate --to supabase/pglite` moves between them.
## The interface
```typescript
// src/core/engine.ts
export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
  disconnect(): Promise<void>;
  initSchema(): Promise<void>;
  transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;
  // Pages CRUD
  getPage(slug: string): Promise<Page | null>;
  putPage(slug: string, page: PageInput): Promise<Page>;
  deletePage(slug: string): Promise<void>;
  listPages(filters: PageFilters): Promise<Page[]>;
  // Search
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;
  // Chunks
  upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
  getChunks(slug: string): Promise<Chunk[]>;
  // Links
  addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
  removeLink(from: string, to: string): Promise<void>;
  getLinks(slug: string): Promise<Link[]>;
  getBacklinks(slug: string): Promise<Link[]>;
  traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;
  // Tags
  addTag(slug: string, tag: string): Promise<void>;
  removeTag(slug: string, tag: string): Promise<void>;
  getTags(slug: string): Promise<string[]>;
  // Timeline
  addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
  getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;
  // Raw data
  putRawData(slug: string, source: string, data: object): Promise<void>;
  getRawData(slug: string, source?: string): Promise<RawData[]>;
  // Versions
  createVersion(slug: string): Promise<PageVersion>;
  getVersions(slug: string): Promise<PageVersion[]>;
  revertToVersion(slug: string, versionId: number): Promise<void>;
  // Stats + health
  getStats(): Promise<BrainStats>;
  getHealth(): Promise<BrainHealth>;
  // Ingest log
  logIngest(entry: IngestLogInput): Promise<void>;
  getIngestLog(opts?: IngestLogOpts): Promise<IngestLogEntry[]>;
  // Config
  getConfig(key: string): Promise<string | null>;
  setConfig(key: string, value: string): Promise<void>;
  // Migration + advanced (added v0.7)
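  // The v0.7 commit notes describe this as runMigration(version, sql); the number type for version is assumed here.
  runMigration(version: number, sql: string): Promise<void>;
  getChunksWithEmbeddings(slug: string): Promise<ChunkWithEmbedding[]>;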
}
```
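As a rough illustration of "swap the engine, keep everything else", here is a minimal sketch of consumer code that depends only on a slice of the interface. The `Page` and `Link` shapes, the `PageReader` name, and `describePage` are assumptions made for this example, not types or helpers from the codebase.

```typescript
// Assumed, simplified shapes; the real Page and Link types live in the codebase.
interface Page { slug: string; title: string }
interface Link { from: string; to: string }

// Only the slice of BrainEngine this helper needs.
interface PageReader {
  getPage(slug: string): Promise<Page | null>;
  getTags(slug: string): Promise<string[]>;
  getBacklinks(slug: string): Promise<Link[]>;
}

// Hypothetical consumer: it works with any engine because it only sees the interface.
async function describePage(engine: PageReader, slug: string): Promise<string> {
  const page = await engine.getPage(slug);
  if (page === null) return `${slug}: not found`;
  const [tags, backlinks] = await Promise.all([
    engine.getTags(slug),
    engine.getBacklinks(slug),
  ]);
  return `${page.title} [${tags.join(', ')}] <- ${backlinks.length} backlink(s)`;
}
```

Because the helper takes slugs and never touches a connection or a row ID, the same call should work whether the engine behind it is Postgres or PGLite.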
### Key design choices
**Slug-based API, not ID-based.** Every method takes slugs, not numeric IDs. The engine resolves slugs to IDs internally. This keeps the interface portable... slugs are strings, IDs are database-specific.
**Embedding is NOT in the engine.** The engine stores embeddings and searches by vector, but it doesn't generate embeddings. `src/core/embedding.ts` handles that. This is intentional: embedding is an external API call (OpenAI), not a storage concern. All engines share the same embedding service.
**Chunking is NOT in the engine.** Same logic. `src/core/chunkers/` handles chunking. The engine stores and retrieves chunks. All engines share the same chunkers.
**Search returns `SearchResult[]`, not raw rows.** The engine is responsible for its own search implementation (tsvector vs FTS5, pgvector vs sqlite-vss) but must return a uniform result type. RRF fusion and dedup happen above the engine, in `src/core/search/hybrid.ts`.
**`traverseGraph` exists but is engine-specific.** Postgres uses recursive CTEs. SQLite would use a loop with depth tracking. The interface is the same: give me a slug and max depth, return the graph.
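For an engine without recursive CTEs, the "loop with depth tracking" approach could look roughly like the sketch below. It is a breadth-first walk under stated assumptions: the `GraphNode` shape is simplified, and `getOutgoing` is a hypothetical one-hop lookup rather than a real engine method.

```typescript
// Assumed result shape; the real GraphNode type may carry more fields.
interface GraphNode { slug: string; depth: number }

// Hypothetical helper: returns the slugs a page links to (one hop).
type GetOutgoing = (slug: string) => Promise<string[]>;

async function traverseWithLoop(
  start: string,
  maxDepth: number,
  getOutgoing: GetOutgoing,
): Promise<GraphNode[]> {
  const seen = new Set<string>([start]);
  const result: GraphNode[] = [{ slug: start, depth: 0 }];
  let frontier = [start];

  // Expand one level per iteration until the depth limit or an empty frontier.
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const slug of frontier) {
      for (const target of await getOutgoing(slug)) {
        if (seen.has(target)) continue; // guards against cycles
        seen.add(target);
        result.push({ slug: target, depth });
        next.push(target);
      }
    }
    frontier = next;
  }
  return result;
}
```

The caller sees the same contract as the recursive CTE version: a slug and a max depth in, a list of graph nodes out.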
## How search works across engines
```
                      +-------------------+
                      | hybrid.ts         |
                      | (RRF fusion +     |
                      |  dedup, shared)   |
                      +--------+----------+
                               |
                 +-------------+-------------+
                 |                           |
        +--------v--------+         +--------v--------+
        | engine.search   |         | engine.search   |
        | Keyword()       |         | Vector()        |
        +-----------------+         +-----------------+
                 |                           |
        +--------+--------+             +----+-----+
        |                 |             |          |
+-------v-------+ +-------v---+ +-------v---+ +----v--------+
| Postgres:     | | PGLite:   | | Postgres: | | PGLite:     |
| tsvector +    | | tsvector +| | pgvector  | | pgvector    |
| ts_rank +     | | ts_rank   | | HNSW      | | HNSW        |
| websearch_to_ | | (same SQL)| | cosine    | | cosine      |
| tsquery       | |           | |           | | (same SQL)  |
+---------------+ +-----------+ +-----------+ +-------------+
```
RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They operate on `SearchResult[]` arrays. Only the raw keyword and vector searches are engine-specific.
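To make the engine-agnostic layer concrete, here is a minimal sketch of reciprocal rank fusion over plain result arrays. It is not the code in `src/core/search/hybrid.ts`: the `SearchResult` shape is reduced to two assumed fields, fusion is keyed by slug for brevity, and `k = 60` is the constant commonly used with RRF rather than a value taken from the project.

```typescript
// Assumed minimal shape; the real SearchResult likely has more fields.
interface SearchResult { slug: string; score: number }

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per result.
// k = 60 is the conventional default, not a value read from hybrid.ts.
function rrfFuse(lists: SearchResult[][], k = 60): SearchResult[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((result, index) => {
      const rank = index + 1; // ranks are 1-based
      fused.set(result.slug, (fused.get(result.slug) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(fused.entries())
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score);
}
```

Only ranks matter here, not the raw scores, which is why keyword and vector results from any engine can be combined without normalizing them first.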
## PostgresEngine (v0, ships)
**Dependencies:** `postgres` (porsager/postgres), `pgvector`
**Postgres-specific features used:**
- `tsvector` + `GIN` index for full-text search with `ts_rank` weighting
- `pgvector` HNSW index for cosine similarity vector search
- `pg_trgm` + `GIN` for fuzzy slug resolution
- Recursive CTEs for graph traversal
- Trigger-based search_vector (spans pages + timeline_entries)
- JSONB for frontmatter with GIN index
- Connection pooling via Supabase Supavisor (port 6543)
**Hosting:** Supabase Pro ($25/mo). Zero-ops. Managed Postgres with pgvector built in.
**Why not self-hosted for v0:** The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.
## PGLiteEngine (v0.7, ships)
**Dependencies:** `@electric-sql/pglite` (v0.4.4+)
**What it is:** Embedded Postgres 17.5 compiled to WASM via ElectricSQL's PGLite. Runs in-process, no server, no Docker, no accounts. Same SQL as PostgresEngine -- not a separate dialect. All 37 BrainEngine methods implemented.
**PGLite-specific details:**
- Uses `pglite-schema.ts` for DDL (pgvector extension, pg_trgm, triggers, indexes)
- Parameterized queries throughout (shared utilities in `src/core/utils.ts`)
- `hybridSearch` keyword-only fallback when `OPENAI_API_KEY` is not set
- Data stored at `~/.gbrain/brain.pglite` (configurable)
- pgvector HNSW index for cosine similarity vector search (same as Postgres)
- tsvector + ts_rank for full-text search (same as Postgres)
- pg_trgm for fuzzy slug resolution (same as Postgres)
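The keyword-only fallback listed above can be sketched as follows. This is an illustration under assumptions, not the project's `hybridSearch`: the `SearchDeps` interface, the injected `embed` function, and passing the key as a parameter are all choices made to keep the example self-contained.

```typescript
// Assumed minimal shape; the real SearchResult likely has more fields.
interface SearchResult { slug: string; score: number }

// The two search calls from BrainEngine, plus an injected embed function
// standing in for src/core/embedding.ts (its real signature is assumed here).
interface SearchDeps {
  searchKeyword(query: string): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array): Promise<SearchResult[]>;
  embed(text: string): Promise<Float32Array>;
}

async function hybridSearchSketch(
  query: string,
  deps: SearchDeps,
  apiKey: string | undefined, // e.g. the value of OPENAI_API_KEY
): Promise<SearchResult[][]> {
  const keyword = await deps.searchKeyword(query);
  // No key: skip embed() entirely and return keyword results only.
  if (!apiKey) return [keyword];
  const vector = await deps.searchVector(await deps.embed(query));
  return [keyword, vector]; // result lists to be fused above the engine
}
```

Checking for the key before calling `embed()` means a local PGLite brain without an OpenAI account still gets search results instead of an error.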
**When to use PGLite vs Postgres:**
| Factor | PGLite | PostgresEngine + Supabase |
|--------|--------|--------------------------|
| Setup | `gbrain init` (zero-config) | Account + connection string |
| Scale | Good for < 1,000 files | Production-proven at 10K+ |
| Multi-device | Single machine only | Any device via remote MCP |
| Cost | Free | Supabase Pro ($25/mo) |
| Concurrency | Single process | Connection pooling |
| Backups | Manual (file copy) | Managed by Supabase |
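**Migration:** `gbrain migrate --to supabase` copies everything (pages, chunks with embeddings, links, tags, timeline, raw data, and config) into Supabase. `gbrain migrate --to pglite` goes the other direction. Transfers are bidirectional and lossless, resume from a manifest if interrupted, and need `--force` to overwrite a non-empty target.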
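The v0.7 commit notes describe the transfer as resumable via a manifest. A minimal sketch of that idea, restricted to pages and chunks, might look like this. The engine slice, the simplified `Page` and `ChunkWithEmbedding` shapes, and the use of a `Set` as the manifest are assumptions for illustration; the real `migrate` command may be structured differently.

```typescript
// Assumed, simplified shapes for illustration.
interface Page { slug: string; body: string }
interface ChunkWithEmbedding { text: string; embedding: Float32Array | null }

// The slice of BrainEngine this sketch relies on (signatures simplified).
interface MigratableEngine {
  listPages(): Promise<Page[]>;
  putPage(slug: string, page: Page): Promise<Page>;
  getChunksWithEmbeddings(slug: string): Promise<ChunkWithEmbedding[]>;
  upsertChunks(slug: string, chunks: ChunkWithEmbedding[]): Promise<void>;
}

// Copies pages and their chunks; `done` plays the role of the resume manifest.
async function copyPages(
  source: MigratableEngine,
  target: MigratableEngine,
  done: Set<string>,
): Promise<number> {
  let copied = 0;
  for (const page of await source.listPages()) {
    if (done.has(page.slug)) continue; // already transferred in an earlier run
    await target.putPage(page.slug, page);
    await target.upsertChunks(page.slug, await source.getChunksWithEmbeddings(page.slug));
    done.add(page.slug); // record progress only after the page fully lands
    copied++;
  }
  return copied;
}
```

Copying chunks together with their embeddings is what lets a migrated brain keep vector search without re-embedding every page.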
## Adding a new engine
1. Create `src/core/<name>-engine.ts` implementing `BrainEngine`
2. Add to engine factory in `src/core/engine-factory.ts`:
```typescript
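import type { BrainEngine } from './engine';

// Simplified sketch. Module paths follow the src/core/<name>-engine.ts convention from step 1.
export async function createEngine(type: string): Promise<BrainEngine> {
  switch (type) {
    // Dynamic imports: an engine's module (and PGLite's WASM) loads only when selected.
    case 'pglite': return new (await import('./pglite-engine')).PGLiteEngine();
    case 'postgres': return new (await import('./postgres-engine')).PostgresEngine();
    case 'myengine': return new (await import('./myengine-engine')).MyEngine();
    default: throw new Error(`Unknown engine: ${type}`);
  }
}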
```
The factory uses dynamic imports so engines are only loaded when selected.
3. Store engine type in `~/.gbrain/config.json`: `{ "engine": "myengine", ... }`
4. Add tests. The test suite should be engine-agnostic where possible... same test cases, different engine constructor.
5. Document in this file + add a design doc in `docs/`
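One way to read "same test cases, different engine constructor" from step 4 is a shared contract suite that takes a factory. The sketch below is framework-free and covers only three methods; `EngineUnderTest`, `runEngineContract`, and the simplified page shapes are hypothetical names for this example, not part of the existing test suite.

```typescript
// Assumed, simplified shapes; only a slice of BrainEngine is exercised.
interface PageInput { title: string }
interface Page extends PageInput { slug: string }
interface EngineUnderTest {
  putPage(slug: string, page: PageInput): Promise<Page>;
  getPage(slug: string): Promise<Page | null>;
  deletePage(slug: string): Promise<void>;
}

// Hypothetical shared suite: same cases, different engine constructor.
async function runEngineContract(
  makeEngine: () => Promise<EngineUnderTest>,
): Promise<string[]> {
  const failures: string[] = [];
  const engine = await makeEngine();

  await engine.putPage('people/ada', { title: 'Ada' });
  const page = await engine.getPage('people/ada');
  if (page?.title !== 'Ada') failures.push('putPage then getPage should round-trip');

  await engine.deletePage('people/ada');
  if ((await engine.getPage('people/ada')) !== null) {
    failures.push('deletePage should remove the page');
  }
  return failures;
}
```

Each engine's test file would then call `runEngineContract` with its own factory and assert that the returned list is empty.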
### What you DON'T need to touch
- `src/cli.ts` (dispatches to engine, doesn't know which one)
- `src/mcp/server.ts` (same)
- `src/core/chunkers/*` (shared across engines)
- `src/core/embedding.ts` (shared across engines)
- `src/core/search/hybrid.ts`, `expansion.ts`, `dedup.ts` (shared, operate on SearchResult[])
- `skills/*` (fat markdown, engine-agnostic)
### What you DO need to implement
Every method in `BrainEngine`. The full interface. No optional methods, no feature flags. If your engine can't do vector search (e.g., a pure-text engine), implement `searchVector` to return `[]` and document the limitation.
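For the pure-text case, a minimal sketch of the two search methods might look like this. `TextOnlySearch` is a hypothetical fragment, not a full `BrainEngine`, and its substring match stands in for whatever keyword search a real engine would provide.

```typescript
// Assumed minimal shape; the real SearchResult likely has more fields.
interface SearchResult { slug: string; score: number }

// Hypothetical pure-text engine fragment: keyword search works,
// vector search is a documented no-op.
class TextOnlySearch {
  private readonly docs: Map<string, string>;

  constructor(docs: Map<string, string>) {
    this.docs = docs;
  }

  async searchKeyword(query: string): Promise<SearchResult[]> {
    const needle = query.toLowerCase();
    const hits: SearchResult[] = [];
    this.docs.forEach((text, slug) => {
      if (text.toLowerCase().includes(needle)) hits.push({ slug, score: 1 });
    });
    return hits;
  }

  /** Limitation: this engine stores no embeddings, so vector search always returns []. */
  async searchVector(_embedding: Float32Array): Promise<SearchResult[]> {
    return [];
  }
}
```

Returning an empty array keeps the shared fusion layer working unchanged: it simply fuses the keyword list with nothing.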
## Capability matrix
| Capability | PostgresEngine | PGLiteEngine | Notes |
|-----------|---------------|-------------|-------|
| CRUD | Full | Full | Same SQL |
| Keyword search | tsvector + ts_rank | tsvector + ts_rank | Identical (real Postgres) |
| Vector search | pgvector HNSW | pgvector HNSW | Identical (real Postgres) |
| Fuzzy slug | pg_trgm | pg_trgm | Identical (real Postgres) |
| Graph traversal | Recursive CTE | Recursive CTE | Same SQL |
| Transactions | Full ACID | Full ACID | Both support this |
| JSONB queries | GIN index | GIN index | Identical |
| Concurrent access | Connection pooling | Single process | PGLite limitation |
| Hosting | Supabase, self-hosted, Docker | Local file | |
| Migration methods | runMigration, getChunksWithEmbeddings | Same | Added v0.7 |
## Future engine ideas
**TursoEngine.** libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.
**DuckDBEngine.** Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.
**Custom/Remote.** The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. The interface doesn't assume SQL.
Note: The original SQLite engine plan (`docs/SQLITE_ENGINE.md`) was superseded by PGLite. PGLite uses the same SQL as Postgres, eliminating the need for a separate SQLite dialect with FTS5/sqlite-vss translation.