feat: PGLite engine — local brain, zero infrastructure (v0.7.0) (#41)

* refactor: extract shared utils, add runMigration + getChunksWithEmbeddings to BrainEngine

Extract validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult from postgres-engine.ts into shared utils.ts. Add rowToChunk includeEmbedding parameter for migration support.

Add two new methods to BrainEngine interface:
- runMigration(version, sql) -- replaces internal eng.sql access in migrate.ts
- getChunksWithEmbeddings(slug) -- returns chunks with embedding data for migration

Replace 'sqlite' with 'pglite' in EngineConfig and GBrainConfig types. Fix loadConfig to infer engine from database_path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pluggable engine factory + hybridSearch keyword-only fallback

Add createEngine() factory with dynamic imports so PGLite WASM is never loaded for Postgres users. Wire CLI to use factory instead of hardcoded PostgresEngine. Force workers=1 for PGLite imports (single-connection architecture).

Fix hybridSearch to check OPENAI_API_KEY before calling embed(). When unset, returns keyword-only results instead of throwing. Critical for local PGLite users who don't need vector search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: PGLiteEngine -- embedded Postgres 17.5 via WASM, same SQL everywhere

Full BrainEngine implementation (37 methods) using @electric-sql/pglite. Same SQL as PostgresEngine -- tsvector triggers, pgvector HNSW, pg_trgm fuzzy matching, recursive CTEs, JSONB. Only the driver call syntax differs (parameterized queries instead of tagged templates).

PGLite schema is the Postgres schema minus RLS, advisory locks, and remote auth tables (access_tokens, mcp_request_log, files).

No server. No subscription. One directory. Works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart init (PGLite default) + bidirectional engine migration

gbrain init now defaults to PGLite -- brain ready in 2 seconds, no server needed. Scans target directory: <1000 .md files = PGLite, >=1000 = suggests Supabase. --supabase and --pglite flags override.

gbrain migrate --to supabase/pglite transfers all data between engines with manifest-based resume. Copies pages, chunks (with embeddings), tags, timeline, raw data, links, and config. --force overwrites non-empty target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 60 new tests for PGLite engine, utils, and factory

41 PGLite engine tests covering all 37 BrainEngine methods: CRUD, tsvector keyword search, pg_trgm fuzzy matching, chunk upsert with COALESCE, graph traversal via recursive CTE, transactions, cascade deletes, stats/health, and embedding round-trip.

14 shared utility tests (validateSlug, contentHash, row mappers). 5 engine factory tests (dispatch, error messages).

All run in-memory -- zero Docker, zero DATABASE_URL, instant in CI.

Add P0 TODO: submit Bun PR for WASM embedding in bun build --compile (oven-sh/bun#15032).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0 PGLite engine

- CLAUDE.md: add PGLite key files, update architecture, add migrate command, add 3 test files
- README.md: PGLite as default init, zero-config getting started, migration path to Supabase
- docs/ENGINES.md: PGLiteEngine shipped (v0.7), capability matrix, migration docs
- docs/SQLITE_ENGINE.md: marked superseded by PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove stale v0.4 README update prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove SQLITE_ENGINE.md (superseded by PGLite)

PGLite uses the same SQL as Postgres, making a separate SQLite engine unnecessary. docs/ENGINES.md covers PGLiteEngine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README step 2 to default to PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add schema setup step and install-all-integrations step to README

Step 3 now tells agents to read GBRAIN_RECOMMENDED_SCHEMA.md and set up the MECE directory structure before importing. Step 7 tells agents to install every available integration recipe, not just list them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update install goal to match full opinionated setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add 'Need an AI agent first?' section with one-click deploy links

New users who don't have OpenClaw or Hermes Agent get pointed to AlphaClaw on Render and the Hermes Agent Railway template. One click each. Claude Code mentioned for users who already have it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add migrate to CLI_ONLY + help output, fix standalone example

- migrate command was missing from CLI_ONLY set (errored as "Unknown command")
- migrate now shows in --help under SETUP
- init help line shows --pglite flag
- standalone CLI example uses gbrain init (not --supabase)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: set realistic time expectation (~30 min to working brain)

DB is 2 seconds. But schema + import + embeddings + integrations is 15-30 minutes. The agent does the work, you answer API key questions. Don't oversell time-to-value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix AlphaClaw Render requirement (8GB+ RAM, not free tier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: final README polish for launch

- GOAL line: "Garry Tan's exact setup" (not Claude Code specific)
- Remove markdown links from code block (won't render)
- STEP 2 renamed from "START HERE" to "DATABASE"
- Tighten Supabase fallback text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove duplicate old install block from README

The v0.5-era "With OpenClaw or Hermes Agent" paste block was superseded by the top-level "Start here" block. Having both confused users and the old one still said --supabase as step 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clean up README consistency and remove duplicated content

- Remove duplicate "Try it" section (old 4-act walkthrough that repeated the install flow and contradicted "~30 min" with "90 sec")
- Remove duplicate Setup section (third repetition of gbrain init)
- Fix brain.db → brain.pglite (actual default path)
- Fix "coming in v0.7" → "not yet implemented" (we ARE v0.7)
- Remove "You don't need Postgres" (confusing since PGLite IS Postgres)
- Deduplicate "competitive dynamics" query (appeared 3 times)
- Collapse redundant standalone CLI section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
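
Reviewer note: the keyword-only fallback described in the commit message (check OPENAI_API_KEY before embedding, return keyword results when it is unset) could look roughly like the sketch below. This is not the code in this commit. Hit, Env, canEmbed, and hybridSearchSketch are hypothetical names, the search callbacks are synchronous stand-ins for what would be async engine calls, and the best-score-per-slug merge is an assumed placeholder for whatever ranking the real hybridSearch uses.

```typescript
// Illustrative sketch only: every name here is hypothetical, not GBrain's real API.
interface Hit {
  slug: string;
  score: number;
}

type Env = { [name: string]: string | undefined };

// True only when an API key is present and not blank.
function canEmbed(env: Env): boolean {
  const key = env["OPENAI_API_KEY"];
  return typeof key === "string" && key.trim().length > 0;
}

function hybridSearchSketch(
  query: string,
  env: Env,
  keywordSearch: (q: string) => Hit[],
  vectorSearch: (q: string) => Hit[],
): Hit[] {
  const keywordHits = keywordSearch(query);

  // Keyword-only fallback: without a key, the vector path is never touched.
  if (!canEmbed(env)) {
    return keywordHits;
  }

  // Assumed merge strategy: keep the best score per slug, highest first.
  const merged: Hit[] = [];
  for (const hit of keywordHits.concat(vectorSearch(query))) {
    const existing = merged.find((m) => m.slug === hit.slug);
    if (existing) {
      existing.score = Math.max(existing.score, hit.score);
    } else {
      merged.push({ slug: hit.slug, score: hit.score });
    }
  }
  return merged.sort((a, b) => b.score - a.score);
}
```

Passing the environment in as an argument, rather than reading it globally, keeps the decision testable without touching real credentials.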
2026-04-11 00:01:09 -10:00
parent ce15062694
commit 6c7d2ed30b
26 changed files with 2287 additions and 744 deletions
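
Reviewer note: the engine-selection rules stated in the commit message (config infers pglite from database_path, init suggests Supabase at 1000+ markdown files, flags override, PGLite imports forced to one worker) can be summarized in a small sketch. This is an illustration under stated assumptions, not the code in this commit. The function names, the ConfigSketch shape beyond `engine` and `database_path`, and the choice that `--supabase` wins when both flags are passed are all hypothetical.

```typescript
// Illustrative sketch only: names and shapes are hypothetical, not GBrain's real API.
type EngineKind = "postgres" | "pglite";

interface ConfigSketch {
  engine?: EngineKind;
  database_path?: string;
}

// An explicit engine wins; otherwise a database_path implies a local PGLite brain.
function inferEngine(config: ConfigSketch): EngineKind {
  if (config.engine) {
    return config.engine;
  }
  return config.database_path ? "pglite" : "postgres";
}

// Threshold taken from the commit message: >= 1000 .md files suggests Supabase.
const MARKDOWN_FILE_THRESHOLD = 1000;

interface InitFlags {
  supabase?: boolean;
  pglite?: boolean;
}

function suggestInitTarget(
  markdownFileCount: number,
  flags: InitFlags = {},
): "pglite" | "supabase" {
  // Assumption: if both flags are given, --supabase takes precedence.
  if (flags.supabase) return "supabase";
  if (flags.pglite) return "pglite";
  return markdownFileCount >= MARKDOWN_FILE_THRESHOLD ? "supabase" : "pglite";
}

// PGLite is single-connection, so imports run with exactly one worker.
function effectiveWorkers(engine: EngineKind, requested: number): number {
  return engine === "pglite" ? 1 : Math.max(1, requested);
}
```

For example, `inferEngine({ database_path: "brain.pglite" })` yields `"pglite"`, while `suggestInitTarget(1000)` yields `"supabase"` unless `--pglite` is passed.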
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,7 +6,12 @@ All notable changes to GBrain will be documented in this file.

 ### Added

- **Your brain gets new senses automatically.** Integration recipes teach your agent how to wire up voice calls, email, Twitter, and calendar into your brain. Run `gbrain integrations` to see what's available. Your agent reads the recipe, asks for API keys, validates each one, and sets everything up. Markdown is code — the recipe IS the installer.
+- **Your brain now runs locally with zero infrastructure.** PGLite (Postgres 17.5 compiled to WASM) gives you the exact same search quality as Supabase, same pgvector HNSW, same pg_trgm fuzzy matching, same tsvector full-text search. No server, no subscription, no API keys needed for keyword search. `gbrain init` and you're running in 2 seconds.
+- **Smart init defaults to local.** `gbrain init` now creates a PGLite brain by default. If your repo has 1000+ markdown files, it suggests Supabase for scale. `--supabase` and `--pglite` flags let you choose explicitly.
+- **Migrate between engines anytime.** `gbrain migrate --to supabase` transfers your entire brain (pages, chunks, embeddings, tags, links, timeline) to remote Postgres with manifest-based resume. `gbrain migrate --to pglite` goes the other way. Embeddings copy directly, no re-embedding needed.
+- **Pluggable engine factory.** `createEngine()` dynamically loads the right engine from config. PGLite WASM is never loaded for Postgres users.
+- **Search works without OpenAI.** `hybridSearch` now checks for `OPENAI_API_KEY` before attempting embeddings. No key = keyword-only search. No more crashes when you just want to search your local brain.
+- **Your brain gets new senses automatically.** Integration recipes teach your agent how to wire up voice calls, email, Twitter, and calendar into your brain. Run `gbrain integrations` to see what's available. Your agent reads the recipe, asks for API keys, validates each one, and sets everything up. Markdown is code -- the recipe IS the installer.
 - **Voice-to-brain: phone calls create brain pages.** The first recipe: Twilio + OpenAI Realtime voice agent. Call a number, talk, and a structured brain page appears with entity detection, cross-references, and a summary posted to your messaging app. Opinionated defaults: caller screening, brain-first lookup, quiet hours, thinking sounds. The smoke test calls YOU (outbound) so you experience the magic immediately.
 - **`gbrain integrations` command.** Six subcommands for managing integration recipes: `list` (dashboard of senses + reflexes), `show` (recipe details), `status` (credential checks with direct links to get missing keys), `doctor` (health checks), `stats` (signal analytics), `test` (recipe validation). `--json` on every subcommand for agent-parseable output. No database connection needed.
 - **Health heartbeat.** Integrations log events to `~/.gbrain/integrations/<id>/heartbeat.jsonl`. Status checks detect stale integrations and include diagnostic steps.
@@ -14,6 +19,13 @@ All notable changes to GBrain will be documented in this file.
 - **"Getting Data In" documentation.** New `docs/integrations/` with a landing page, recipe format documentation, credential gateway guide, and meeting webhook guide. Explains the deterministic collector pattern: code for data, LLMs for judgment.
 - **Architecture and philosophy docs.** `docs/architecture/infra-layer.md` documents the shared foundation (import, chunk, embed, search). `docs/ethos/THIN_HARNESS_FAT_SKILLS.md` is Garry's essay on the architecture philosophy with an agent decision guide. `docs/designs/HOMEBREW_FOR_PERSONAL_AI.md` maps the 10-star vision.

+### Changed
+
+- **Engine interface expanded.** Added `runMigration()` (replaces internal driver access for schema migrations) and `getChunksWithEmbeddings()` (loads embedding data for cross-engine migration).
+- **Shared utilities extracted.** `validateSlug`, `contentHash`, and row mappers moved from `postgres-engine.ts` to `src/core/utils.ts`. Both engines share them.
+- **Config infers engine type.** If `database_path` is set but `engine` is missing, config now infers `pglite` instead of defaulting to `postgres`.
+- **Import serializes on PGLite.** Parallel workers are Postgres-only. PGLite uses sequential import (single-connection architecture).
+
 ## [0.6.1] - 2026-04-10

 ### Fixed