
Garry Tan 6c7d2ed30b feat: PGLite engine — local brain, zero infrastructure (v0.7.0) (#41 )

* refactor: extract shared utils, add runMigration + getChunksWithEmbeddings to BrainEngine

Extract validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult
from postgres-engine.ts into shared utils.ts. Add rowToChunk includeEmbedding
parameter for migration support.

Add two new methods to BrainEngine interface:
- runMigration(version, sql) — replaces internal eng.sql access in migrate.ts
- getChunksWithEmbeddings(slug) — returns chunks with embedding data for migration

Replace 'sqlite' with 'pglite' in EngineConfig and GBrainConfig types.
Fix loadConfig to infer engine from database_path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pluggable engine factory + hybridSearch keyword-only fallback

Add createEngine() factory with dynamic imports so PGLite WASM is never
loaded for Postgres users. Wire CLI to use factory instead of hardcoded
PostgresEngine.

Force workers=1 for PGLite imports (single-connection architecture).

Fix hybridSearch to check OPENAI_API_KEY before calling embed(). When
unset, returns keyword-only results instead of throwing. Critical for
local PGLite users who don't need vector search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: PGLiteEngine — embedded Postgres 17.5 via WASM, same SQL everywhere

Full BrainEngine implementation (37 methods) using @electric-sql/pglite.
Same SQL as PostgresEngine — tsvector triggers, pgvector HNSW, pg_trgm
fuzzy matching, recursive CTEs, JSONB. Only the driver call syntax differs
(parameterized queries instead of tagged templates).

PGLite schema is the Postgres schema minus RLS, advisory locks, and
remote auth tables (access_tokens, mcp_request_log, files).

No server. No subscription. One directory. Works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart init (PGLite default) + bidirectional engine migration

gbrain init now defaults to PGLite — brain ready in 2 seconds, no
server needed. Scans target directory: <1000 .md files = PGLite,
>=1000 = suggests Supabase. --supabase and --pglite flags override.

gbrain migrate --to supabase/pglite transfers all data between engines
with manifest-based resume. Copies pages, chunks (with embeddings),
tags, timeline, raw data, links, and config. --force overwrites
non-empty target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 60 new tests for PGLite engine, utils, and factory

41 PGLite engine tests covering all 37 BrainEngine methods: CRUD,
tsvector keyword search, pg_trgm fuzzy matching, chunk upsert with
COALESCE, graph traversal via recursive CTE, transactions, cascade
deletes, stats/health, and embedding round-trip.

14 shared utility tests (validateSlug, contentHash, row mappers).
5 engine factory tests (dispatch, error messages).

All run in-memory — zero Docker, zero DATABASE_URL, instant in CI.

Add P0 TODO: submit Bun PR for WASM embedding in bun build --compile
(oven-sh/bun#15032).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0 PGLite engine

- CLAUDE.md: add PGLite key files, update architecture, add migrate command, add 3 test files
- README.md: PGLite as default init, zero-config getting started, migration path to Supabase
- docs/ENGINES.md: PGLiteEngine shipped (v0.7), capability matrix, migration docs
- docs/SQLITE_ENGINE.md: marked superseded by PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove stale v0.4 README update prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove SQLITE_ENGINE.md (superseded by PGLite)

PGLite uses the same SQL as Postgres, making a separate SQLite
engine unnecessary. docs/ENGINES.md covers PGLiteEngine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README step 2 to default to PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add schema setup step and install-all-integrations step to README

Step 3 now tells agents to read GBRAIN_RECOMMENDED_SCHEMA.md and set up
the MECE directory structure before importing. Step 7 tells agents to
install every available integration recipe, not just list them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update install goal to match full opinionated setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add 'Need an AI agent first?' section with one-click deploy links

New users who don't have OpenClaw or Hermes Agent get pointed to
AlphaClaw on Render and the Hermes Agent Railway template. One click
each. Claude Code mentioned for users who already have it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add migrate to CLI_ONLY + help output, fix standalone example

- migrate command was missing from CLI_ONLY set (errored as "Unknown command")
- migrate now shows in --help under SETUP
- init help line shows --pglite flag
- standalone CLI example uses gbrain init (not --supabase)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: set realistic time expectation (~30 min to working brain)

DB is 2 seconds. But schema + import + embeddings + integrations
is 15-30 minutes. The agent does the work, you answer API key
questions. Don't oversell time-to-value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix AlphaClaw Render requirement (8GB+ RAM, not free tier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: final README polish for launch

- GOAL line: "Garry Tan's exact setup" (not Claude Code specific)
- Remove markdown links from code block (won't render)
- STEP 2 renamed from "START HERE" to "DATABASE"
- Tighten Supabase fallback text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove duplicate old install block from README

The v0.5-era "With OpenClaw or Hermes Agent" paste block was
superseded by the top-level "Start here" block. Having both
confused users and the old one still said --supabase as step 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clean up README consistency and remove duplicated content

- Remove duplicate "Try it" section (old 4-act walkthrough that
  repeated the install flow and contradicted "~30 min" with "90 sec")
- Remove duplicate Setup section (third repetition of gbrain init)
- Fix brain.db → brain.pglite (actual default path)
- Fix "coming in v0.7" → "not yet implemented" (we ARE v0.7)
- Remove "You don't need Postgres" (confusing since PGLite IS Postgres)
- Deduplicate "competitive dynamics" query (appeared 3 times)
- Collapse redundant standalone CLI section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-11 00:01:09 -10:00


Pluggable Engine Architecture

The idea

Every GBrain operation goes through BrainEngine. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.

v0 shipped PostgresEngine backed by Supabase. v0.7 adds PGLiteEngine -- embedded Postgres 17.5 via WASM (@electric-sql/pglite), zero-config default. The interface is designed so a DuckDBEngine, TursoEngine, or any custom backend could slot in without touching the CLI, MCP server, skills, or any consumer code.

Why this matters

Different users have different constraints:

User	Needs	Best engine
Getting started	Zero-config, no accounts, no server	PGLiteEngine (default since v0.7)
Power user (you)	World-class search, 7K+ pages, zero-ops	PostgresEngine + Supabase
Open source hacker	Single file, no server, git-friendly	PGLiteEngine
Team/enterprise	Multi-user, RLS, audit trail	PostgresEngine + self-hosted
Researcher	Analytics, bulk exports, embeddings	DuckDBEngine (someday)
Edge/mobile	Offline-first, sync later	PGLiteEngine + sync (someday)

The engine interface means we don't have to choose. PGLite is the zero-friction default. Supabase is the production scale path. gbrain migrate --to supabase/pglite moves between them.

The interface

// src/core/engine.ts

export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
  disconnect(): Promise<void>;
  initSchema(): Promise<void>;
  transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;

  // Pages CRUD
  getPage(slug: string): Promise<Page | null>;
  putPage(slug: string, page: PageInput): Promise<Page>;
  deletePage(slug: string): Promise<void>;
  listPages(filters: PageFilters): Promise<Page[]>;

  // Search
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;

  // Chunks
  upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
  getChunks(slug: string): Promise<Chunk[]>;

  // Links
  addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
  removeLink(from: string, to: string): Promise<void>;
  getLinks(slug: string): Promise<Link[]>;
  getBacklinks(slug: string): Promise<Link[]>;
  traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;

  // Tags
  addTag(slug: string, tag: string): Promise<void>;
  removeTag(slug: string, tag: string): Promise<void>;
  getTags(slug: string): Promise<string[]>;

  // Timeline
  addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
  getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;

  // Raw data
  putRawData(slug: string, source: string, data: object): Promise<void>;
  getRawData(slug: string, source?: string): Promise<RawData[]>;

  // Versions
  createVersion(slug: string): Promise<PageVersion>;
  getVersions(slug: string): Promise<PageVersion[]>;
  revertToVersion(slug: string, versionId: number): Promise<void>;

  // Stats + health
  getStats(): Promise<BrainStats>;
  getHealth(): Promise<BrainHealth>;

  // Ingest log
  logIngest(entry: IngestLogInput): Promise<void>;
  getIngestLog(opts?: IngestLogOpts): Promise<IngestLogEntry[]>;

  // Config
  getConfig(key: string): Promise<string | null>;
  setConfig(key: string, value: string): Promise<void>;

  // Migration + advanced (added v0.7)
  runMigration(version: number, sql: string): Promise<void>; // version type assumed numeric
  getChunksWithEmbeddings(slug: string): Promise<ChunkWithEmbedding[]>;
}

Key design choices

Slug-based API, not ID-based. Every method takes slugs, not numeric IDs. The engine resolves slugs to IDs internally. This keeps the interface portable: slugs are plain strings, while IDs are database-specific.
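As a rough illustration of that boundary, an engine can keep the slug-to-ID mapping entirely private. The InMemoryPages class and its fields below are hypothetical and not taken from the GBrain source:

```typescript
// Hypothetical sketch: numeric IDs never cross the public API.
interface Page {
  slug: string;
  title: string;
}

class InMemoryPages {
  private nextId = 1;
  private idBySlug = new Map<string, number>(); // slug -> internal ID
  private rows = new Map<number, Page>();       // internal ID -> row

  putPage(slug: string, title: string): Page {
    let id = this.idBySlug.get(slug);
    if (id === undefined) {
      id = this.nextId++;
      this.idBySlug.set(slug, id);
    }
    const page: Page = { slug, title };
    this.rows.set(id, page);
    return page;
  }

  getPage(slug: string): Page | null {
    const id = this.idBySlug.get(slug);
    return id === undefined ? null : this.rows.get(id) ?? null;
  }
}

const pages = new InMemoryPages();
pages.putPage("people/ada", "Ada");
```

Callers only ever pass "people/ada". Whether the backing store uses serial integers, UUIDs, or no IDs at all stays an engine detail.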

Embedding is NOT in the engine. The engine stores embeddings and searches by vector, but it doesn't generate embeddings. src/core/embedding.ts handles that. This is intentional: embedding is an external API call (OpenAI), not a storage concern. All engines share the same embedding service.

Chunking is NOT in the engine. Same logic. src/core/chunkers/ handles chunking. The engine stores and retrieves chunks. All engines share the same chunkers.

Search returns SearchResult[], not raw rows. Each engine owns its search implementation (the shipped engines use tsvector and pgvector; a non-Postgres engine might use FTS5 or sqlite-vss) but must return a uniform result type. RRF fusion and dedup happen above the engine, in src/core/search/hybrid.ts.

traverseGraph is part of the interface, but its implementation is engine-specific. PostgresEngine and PGLiteEngine use recursive CTEs. An engine without recursive queries would use a loop with depth tracking. The contract is the same: given a slug and a max depth, return the graph.
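For an engine that cannot lean on recursive CTEs, the loop-with-depth-tracking approach could look roughly like the sketch below. The GraphNode shape, the in-memory links map, and the default depth of 2 are assumptions made for illustration:

```typescript
// Hypothetical shape; GBrain's real GraphNode may carry more fields.
interface GraphNode {
  slug: string;
  depth: number;
}

// Breadth-first walk with explicit depth tracking.
function traverseGraph(
  links: Map<string, string[]>, // slug -> outgoing link targets
  start: string,
  maxDepth = 2, // assumed default, not taken from the source
): GraphNode[] {
  const seen = new Set<string>([start]);
  const result: GraphNode[] = [{ slug: start, depth: 0 }];
  let frontier = [start];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const slug of frontier) {
      for (const target of links.get(slug) ?? []) {
        if (seen.has(target)) continue; // guards against cycles
        seen.add(target);
        result.push({ slug: target, depth });
        next.push(target);
      }
    }
    frontier = next;
  }
  return result;
}

const links = new Map<string, string[]>([
  ["a", ["b", "c"]],
  ["b", ["d"]],
  ["d", ["a"]], // cycle back to the start
]);
const nodes = traverseGraph(links, "a", 1);
```

The seen set plays the role that cycle detection plays in a recursive CTE: each slug is reported once, at the shallowest depth where it is reached.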

How search works across engines

                        +-------------------+
                        |  hybrid.ts        |
                        |  (RRF fusion +    |
                        |   dedup, shared)  |
                        +--------+----------+
                                 |
                    +------------+------------+
                    |                         |
           +--------v--------+       +--------v--------+
           | engine.search   |       | engine.search   |
           |   Keyword()     |       |   Vector()      |
           +-----------------+       +-----------------+
                    |                         |
        +-----------+-----------+   +---------+---------+
        |                       |   |                   |
+-------v-------+  +-------v---+   +-------v---+  +----v--------+
| Postgres:     |  | PGLite:   |   | Postgres: |  | PGLite:     |
| tsvector +    |  | tsvector +|   | pgvector  |  | pgvector    |
| ts_rank +     |  | ts_rank   |   | HNSW      |  | HNSW        |
| websearch_to_ |  | (same SQL)|   | cosine    |  | cosine      |
| tsquery       |  |           |   |           |  | (same SQL)  |
+---------------+  +-----------+   +-----------+  +-------------+

RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They operate on SearchResult[] arrays. Only the raw keyword and vector searches are engine-specific.
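To make the engine-agnostic layer concrete, here is a minimal sketch of Reciprocal Rank Fusion over two result lists. The two-field SearchResult, fusing by slug, and the constant k = 60 are assumptions. The real hybrid.ts may key on chunks and use different parameters:

```typescript
// Hypothetical minimal result shape; the real SearchResult has more fields.
interface SearchResult {
  slug: string;
  score: number;
}

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per result.
// k = 60 is a commonly used constant, assumed here.
function rrfFuse(lists: SearchResult[][], k = 60): SearchResult[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((result, index) => {
      const rank = index + 1; // ranks are 1-based
      fused.set(result.slug, (fused.get(result.slug) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(fused.entries())
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score);
}

const keywordHits: SearchResult[] = [
  { slug: "a", score: 0.9 },
  { slug: "b", score: 0.4 },
];
const vectorHits: SearchResult[] = [
  { slug: "b", score: 0.8 },
  { slug: "c", score: 0.7 },
];
const fusedResults = rrfFuse([keywordHits, vectorHits]);
```

Because fusion only looks at ranks, it never needs to compare a ts_rank score with a cosine distance. That is why it can sit above the engine.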

PostgresEngine (v0, ships)

Dependencies: postgres (porsager/postgres), pgvector

Postgres-specific features used:

tsvector + GIN index for full-text search with ts_rank weighting
pgvector HNSW index for cosine similarity vector search
pg_trgm + GIN for fuzzy slug resolution
Recursive CTEs for graph traversal
Trigger-based search_vector (spans pages + timeline_entries)
JSONB for frontmatter with GIN index
Connection pooling via Supabase Supavisor (port 6543)

Hosting: Supabase Pro ($25/mo). Zero-ops. Managed Postgres with pgvector built in.

Why not self-hosted for v0: The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.

PGLiteEngine (v0.7, ships)

Dependencies: @electric-sql/pglite (v0.4.4+)

What it is: Embedded Postgres 17.5 compiled to WASM via ElectricSQL's PGLite. Runs in-process, no server, no Docker, no accounts. Same SQL as PostgresEngine -- not a separate dialect. All 37 BrainEngine methods implemented.

PGLite-specific details:

Uses pglite-schema.ts for DDL (pgvector extension, pg_trgm, triggers, indexes)
Parameterized queries throughout (shared utilities in src/core/utils.ts)
hybridSearch keyword-only fallback when OPENAI_API_KEY is not set
Data stored at ~/.gbrain/brain.pglite (configurable)
pgvector HNSW index for cosine similarity vector search (same as Postgres)
tsvector + ts_rank for full-text search (same as Postgres)
pg_trgm for fuzzy slug resolution (same as Postgres)
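The keyword-only fallback in the list above can be sketched as follows. The SearchDeps bundle, the canEmbed helper, and passing env as a parameter are assumptions made so the sketch runs without a database or network. Only the OPENAI_API_KEY check comes from the source:

```typescript
interface SearchResult {
  slug: string;
  score: number;
}

// Hypothetical dependency bundle standing in for the engine and embedding service.
interface SearchDeps {
  searchKeyword(query: string): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array): Promise<SearchResult[]>;
  embed(text: string): Promise<Float32Array>;
}

// True when an embedding call is possible at all.
function canEmbed(env: Record<string, string | undefined>): boolean {
  return Boolean(env.OPENAI_API_KEY);
}

async function hybridSearch(
  query: string,
  deps: SearchDeps,
  env: Record<string, string | undefined>,
): Promise<SearchResult[]> {
  const keywordHits = await deps.searchKeyword(query);
  if (!canEmbed(env)) return keywordHits; // keyword-only fallback, no throw

  const vectorHits = await deps.searchVector(await deps.embed(query));
  // The real implementation fuses with RRF and dedups; concatenation keeps the sketch short.
  return [...keywordHits, ...vectorHits];
}
```

In real use the third argument would be process.env. Checking the key before calling embed() is what lets a local PGLite brain work with no API key configured.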

When to use PGLite vs Postgres:

Factor	PGLite	PostgresEngine + Supabase
Setup	`gbrain init` (zero-config)	Account + connection string
Scale	Good for < 1,000 files	Production-proven at 10K+
Multi-device	Single machine only	Any device via remote MCP
Cost	Free	Supabase Pro ($25/mo)
Concurrency	Single process	Connection pooling
Backups	Manual (file copy)	Managed by Supabase
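The Scale row mirrors the rule gbrain init applies when it scans the target directory: fewer than 1,000 .md files defaults to PGLite, 1,000 or more suggests Supabase, and the --supabase and --pglite flags override. A minimal sketch of that decision follows. The suggestEngine function and its InitFlags type are hypothetical, and letting --supabase win when both flags are set is an assumption:

```typescript
type EngineChoice = "pglite" | "supabase";

interface InitFlags {
  supabase?: boolean; // --supabase
  pglite?: boolean;   // --pglite
}

// Hypothetical helper: explicit flags win, otherwise the file count decides.
function suggestEngine(markdownFileCount: number, flags: InitFlags = {}): EngineChoice {
  if (flags.supabase) return "supabase";
  if (flags.pglite) return "pglite";
  return markdownFileCount < 1000 ? "pglite" : "supabase";
}

const smallBrain = suggestEngine(120);
const largeBrain = suggestEngine(7000);
```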

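Migration: gbrain migrate --to supabase exports everything (pages, chunks with embeddings, tags, timeline, raw data, links, and config) and imports it into Supabase. gbrain migrate --to pglite goes the other direction. The transfer is bidirectional and lossless, resumes from a manifest if interrupted, and needs --force to overwrite a non-empty target.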

Adding a new engine

Create src/core/<name>-engine.ts implementing BrainEngine

Add to engine factory in src/core/engine-factory.ts:

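import type { BrainEngine } from './engine';

// Engine modules are imported lazily, so PGLite's WASM never loads for Postgres users.
// Module paths assume the src/core/<name>-engine.ts convention described above.
export async function createEngine(type: string): Promise<BrainEngine> {
  switch (type) {
    case 'pglite': return new (await import('./pglite-engine')).PGLiteEngine();
    case 'postgres': return new (await import('./postgres-engine')).PostgresEngine();
    case 'myengine': return new (await import('./myengine-engine')).MyEngine();
    default: throw new Error(`Unknown engine: ${type}`);
  }
}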

The factory uses dynamic imports so engines are only loaded when selected.

Store engine type in ~/.gbrain/config.json: { "engine": "myengine", ... }
Add tests. The test suite should be engine-agnostic where possible: same test cases, different engine constructor.
Document in this file + add a design doc in docs/

What you DON'T need to touch

src/cli.ts (dispatches to engine, doesn't know which one)
src/mcp/server.ts (same)
src/core/chunkers/* (shared across engines)
src/core/embedding.ts (shared across engines)
src/core/search/hybrid.ts, expansion.ts, dedup.ts (shared, operate on SearchResult[])
skills/* (fat markdown, engine-agnostic)

What you DO need to implement

Every method in BrainEngine. The full interface. No optional methods, no feature flags. If your engine can't do vector search (e.g., a pure-text engine), implement searchVector to return [] and document the limitation.
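A pure-text engine might satisfy the search part of the contract along these lines. TextOnlyEngine, its addPage helper, and the substring matching are hypothetical. A real engine would implement the full BrainEngine interface:

```typescript
interface SearchResult {
  slug: string;
  score: number;
}

// Hypothetical pure-text engine fragment: only the two search methods are shown.
class TextOnlyEngine {
  private pages = new Map<string, string>(); // slug -> body text

  addPage(slug: string, body: string): void {
    this.pages.set(slug, body);
  }

  // Naive substring match stands in for a real full-text index.
  async searchKeyword(query: string): Promise<SearchResult[]> {
    const needle = query.toLowerCase();
    const hits: SearchResult[] = [];
    this.pages.forEach((body, slug) => {
      if (body.toLowerCase().includes(needle)) hits.push({ slug, score: 1 });
    });
    return hits;
  }

  // Documented limitation: this engine has no vector index, so it always returns [].
  async searchVector(_embedding: Float32Array): Promise<SearchResult[]> {
    return [];
  }
}
```

Returning an empty array rather than throwing means callers that merge keyword and vector results keep working. They simply receive keyword hits only.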

Capability matrix

Capability	PostgresEngine	PGLiteEngine	Notes
CRUD	Full	Full	Same SQL
Keyword search	tsvector + ts_rank	tsvector + ts_rank	Identical (real Postgres)
Vector search	pgvector HNSW	pgvector HNSW	Identical (real Postgres)
Fuzzy slug	pg_trgm	pg_trgm	Identical (real Postgres)
Graph traversal	Recursive CTE	Recursive CTE	Same SQL
Transactions	Full ACID	Full ACID	Both support this
JSONB queries	GIN index	GIN index	Identical
Concurrent access	Connection pooling	Single process	PGLite limitation
Hosting	Supabase, self-hosted, Docker	Local file
Migration methods	runMigration, getChunksWithEmbeddings	Same	Added v0.7

Future engine ideas

TursoEngine. libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.

DuckDBEngine. Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.

Custom/Remote. The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. Apart from runMigration, which takes a SQL string, the interface doesn't assume SQL.

Note: The original SQLite engine plan (docs/SQLITE_ENGINE.md) was superseded by PGLite. PGLite uses the same SQL as Postgres, eliminating the need for a separate SQLite dialect with FTS5/sqlite-vss translation.
