feat: PGLite engine — local brain, zero infrastructure (v0.7.0) (#41)
* refactor: extract shared utils, add runMigration + getChunksWithEmbeddings to BrainEngine

  Extract validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult from postgres-engine.ts into shared utils.ts. Add rowToChunk includeEmbedding parameter for migration support.

  Add two new methods to BrainEngine interface:
  - runMigration(version, sql): replaces internal eng.sql access in migrate.ts
  - getChunksWithEmbeddings(slug): returns chunks with embedding data for migration

  Replace 'sqlite' with 'pglite' in EngineConfig and GBrainConfig types. Fix loadConfig to infer engine from database_path.

* feat: pluggable engine factory + hybridSearch keyword-only fallback

  Add createEngine() factory with dynamic imports so PGLite WASM is never loaded for Postgres users. Wire CLI to use factory instead of hardcoded PostgresEngine. Force workers=1 for PGLite imports (single-connection architecture).

  Fix hybridSearch to check OPENAI_API_KEY before calling embed(). When unset, returns keyword-only results instead of throwing. Critical for local PGLite users who don't need vector search.

* feat: PGLiteEngine - embedded Postgres 17.5 via WASM, same SQL everywhere

  Full BrainEngine implementation (37 methods) using @electric-sql/pglite. Same SQL as PostgresEngine: tsvector triggers, pgvector HNSW, pg_trgm fuzzy matching, recursive CTEs, JSONB. Only the driver call syntax differs (parameterized queries instead of tagged templates).

  PGLite schema is the Postgres schema minus RLS, advisory locks, and remote auth tables (access_tokens, mcp_request_log, files).

  No server. No subscription. One directory. Works offline.

* feat: smart init (PGLite default) + bidirectional engine migration

  gbrain init now defaults to PGLite: brain ready in 2 seconds, no server needed. Scans target directory: <1000 .md files = PGLite, >=1000 = suggests Supabase. --supabase and --pglite flags override.

  gbrain migrate --to supabase/pglite transfers all data between engines with manifest-based resume. Copies pages, chunks (with embeddings), tags, timeline, raw data, links, and config. --force overwrites non-empty target.

* test: 60 new tests for PGLite engine, utils, and factory

  41 PGLite engine tests covering all 37 BrainEngine methods: CRUD, tsvector keyword search, pg_trgm fuzzy matching, chunk upsert with COALESCE, graph traversal via recursive CTE, transactions, cascade deletes, stats/health, and embedding round-trip.

  14 shared utility tests (validateSlug, contentHash, row mappers). 5 engine factory tests (dispatch, error messages).

  All run in-memory: zero Docker, zero DATABASE_URL, instant in CI.

  Add P0 TODO: submit Bun PR for WASM embedding in bun build --compile (oven-sh/bun#15032).

* chore: bump version and changelog (v0.7.0)

* docs: update project documentation for v0.7.0 PGLite engine

  - CLAUDE.md: add PGLite key files, update architecture, add migrate command, add 3 test files
  - README.md: PGLite as default init, zero-config getting started, migration path to Supabase
  - docs/ENGINES.md: PGLiteEngine shipped (v0.7), capability matrix, migration docs
  - docs/SQLITE_ENGINE.md: marked superseded by PGLite

* chore: remove stale v0.4 README update prompt

* chore: remove SQLITE_ENGINE.md (superseded by PGLite)

  PGLite uses the same SQL as Postgres, making a separate SQLite engine unnecessary. docs/ENGINES.md covers PGLiteEngine.

* docs: update README step 2 to default to PGLite

* docs: add schema setup step and install-all-integrations step to README

  Step 3 now tells agents to read GBRAIN_RECOMMENDED_SCHEMA.md and set up the MECE directory structure before importing. Step 7 tells agents to install every available integration recipe, not just list them.

* docs: update install goal to match full opinionated setup

* docs: add 'Need an AI agent first?' section with one-click deploy links

  New users who don't have OpenClaw or Hermes Agent get pointed to AlphaClaw on Render and the Hermes Agent Railway template. One click each. Claude Code mentioned for users who already have it.

* fix: add migrate to CLI_ONLY + help output, fix standalone example

  - migrate command was missing from CLI_ONLY set (errored as "Unknown command")
  - migrate now shows in --help under SETUP
  - init help line shows --pglite flag
  - standalone CLI example uses gbrain init (not --supabase)

* docs: set realistic time expectation (~30 min to working brain)

  DB is 2 seconds. But schema + import + embeddings + integrations is 15-30 minutes. The agent does the work, you answer API key questions. Don't oversell time-to-value.

* docs: fix AlphaClaw Render requirement (8GB+ RAM, not free tier)

* docs: final README polish for launch

  - GOAL line: "Garry Tan's exact setup" (not Claude Code specific)
  - Remove markdown links from code block (won't render)
  - STEP 2 renamed from "START HERE" to "DATABASE"
  - Tighten Supabase fallback text

* docs: remove duplicate old install block from README

  The v0.5-era "With OpenClaw or Hermes Agent" paste block was superseded by the top-level "Start here" block. Having both confused users and the old one still said --supabase as step 2.

* docs: clean up README consistency and remove duplicated content

  - Remove duplicate "Try it" section (old 4-act walkthrough that repeated the install flow and contradicted "~30 min" with "90 sec")
  - Remove duplicate Setup section (third repetition of gbrain init)
  - Fix brain.db → brain.pglite (actual default path)
  - Fix "coming in v0.7" → "not yet implemented" (we ARE v0.7)
  - Remove "You don't need Postgres" (confusing since PGLite IS Postgres)
  - Deduplicate "competitive dynamics" query (appeared 3 times)
  - Collapse redundant standalone CLI section

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
14
CHANGELOG.md
@@ -6,7 +6,12 @@ All notable changes to GBrain will be documented in this file.

### Added

- **Your brain gets new senses automatically.** Integration recipes teach your agent how to wire up voice calls, email, Twitter, and calendar into your brain. Run `gbrain integrations` to see what's available. Your agent reads the recipe, asks for API keys, validates each one, and sets everything up. Markdown is code — the recipe IS the installer.
- **Your brain now runs locally with zero infrastructure.** PGLite (Postgres 17.5 compiled to WASM) gives you the exact same search quality as Supabase, same pgvector HNSW, same pg_trgm fuzzy matching, same tsvector full-text search. No server, no subscription, no API keys needed for keyword search. `gbrain init` and you're running in 2 seconds.
- **Smart init defaults to local.** `gbrain init` now creates a PGLite brain by default. If your repo has 1000+ markdown files, it suggests Supabase for scale. `--supabase` and `--pglite` flags let you choose explicitly.
- **Migrate between engines anytime.** `gbrain migrate --to supabase` transfers your entire brain (pages, chunks, embeddings, tags, links, timeline) to remote Postgres with manifest-based resume. `gbrain migrate --to pglite` goes the other way. Embeddings copy directly, no re-embedding needed.
- **Pluggable engine factory.** `createEngine()` dynamically loads the right engine from config. PGLite WASM is never loaded for Postgres users.
- **Search works without OpenAI.** `hybridSearch` now checks for `OPENAI_API_KEY` before attempting embeddings. No key = keyword-only search. No more crashes when you just want to search your local brain.
- **Your brain gets new senses automatically.** Integration recipes teach your agent how to wire up voice calls, email, Twitter, and calendar into your brain. Run `gbrain integrations` to see what's available. Your agent reads the recipe, asks for API keys, validates each one, and sets everything up. Markdown is code -- the recipe IS the installer.
- **Voice-to-brain: phone calls create brain pages.** The first recipe: Twilio + OpenAI Realtime voice agent. Call a number, talk, and a structured brain page appears with entity detection, cross-references, and a summary posted to your messaging app. Opinionated defaults: caller screening, brain-first lookup, quiet hours, thinking sounds. The smoke test calls YOU (outbound) so you experience the magic immediately.
- **`gbrain integrations` command.** Six subcommands for managing integration recipes: `list` (dashboard of senses + reflexes), `show` (recipe details), `status` (credential checks with direct links to get missing keys), `doctor` (health checks), `stats` (signal analytics), `test` (recipe validation). `--json` on every subcommand for agent-parseable output. No database connection needed.
- **Health heartbeat.** Integrations log events to `~/.gbrain/integrations/<id>/heartbeat.jsonl`. Status checks detect stale integrations and include diagnostic steps.
@@ -14,6 +19,13 @@ All notable changes to GBrain will be documented in this file.
- **"Getting Data In" documentation.** New `docs/integrations/` with a landing page, recipe format documentation, credential gateway guide, and meeting webhook guide. Explains the deterministic collector pattern: code for data, LLMs for judgment.
- **Architecture and philosophy docs.** `docs/architecture/infra-layer.md` documents the shared foundation (import, chunk, embed, search). `docs/ethos/THIN_HARNESS_FAT_SKILLS.md` is Garry's essay on the architecture philosophy with an agent decision guide. `docs/designs/HOMEBREW_FOR_PERSONAL_AI.md` maps the 10-star vision.
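
A minimal sketch of the keyword-only fallback described in the "Search works without OpenAI" entry above. The `searchMode` helper, the `SearchDeps` interface, and the `Hit` type are hypothetical names for illustration, not GBrain's actual `hybridSearch` signature. The only behavior taken from the changelog is that a missing `OPENAI_API_KEY` skips embedding and returns keyword results.

```typescript
// Hypothetical shapes for illustration; GBrain's real types are not shown in this changelog.
interface Hit {
  slug: string;
  score: number;
}

interface SearchDeps {
  searchKeyword(query: string): Promise<Hit[]>; // assumed to need no API key
  searchVector(embedding: number[]): Promise<Hit[]>;
  embed(text: string): Promise<number[]>; // assumed to require OPENAI_API_KEY
}

type Env = Record<string, string | undefined>;

// Decide up front whether embeddings are available.
function searchMode(env: Env): "keyword" | "hybrid" {
  const key = env.OPENAI_API_KEY;
  return key !== undefined && key.trim() !== "" ? "hybrid" : "keyword";
}

async function hybridSearchSketch(query: string, deps: SearchDeps, env: Env): Promise<Hit[]> {
  const keywordHits = await deps.searchKeyword(query);
  if (searchMode(env) === "keyword") {
    return keywordHits; // no key: embed() is never called, so nothing throws
  }
  const vectorHits = await deps.searchVector(await deps.embed(query));
  // Rank fusion is omitted here; the two lists are simply concatenated.
  return [...keywordHits, ...vectorHits];
}
```

Passing the environment in as a parameter (for example `process.env` in a real CLI) is a choice made for this sketch so the decision can be exercised without touching global state.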

### Changed

- **Engine interface expanded.** Added `runMigration()` (replaces internal driver access for schema migrations) and `getChunksWithEmbeddings()` (loads embedding data for cross-engine migration).
- **Shared utilities extracted.** `validateSlug`, `contentHash`, and row mappers moved from `postgres-engine.ts` to `src/core/utils.ts`. Both engines share them.
- **Config infers engine type.** If `database_path` is set but `engine` is missing, config now infers `pglite` instead of defaulting to `postgres`.
- **Import serializes on PGLite.** Parallel workers are Postgres-only. PGLite uses sequential import (single-connection architecture).
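
The last two entries can be sketched as two small pure functions. This is an illustration under assumptions, not the code in `loadConfig`: the `RawConfig` shape, the function names, and the fallback to `postgres` when neither field is set are hypothetical.

```typescript
type EngineName = "pglite" | "postgres";

// Hypothetical subset of the config file; only the fields named in the changelog.
interface RawConfig {
  engine?: EngineName;
  database_path?: string;
  database_url?: string;
}

function inferEngine(config: RawConfig): EngineName {
  if (config.engine) return config.engine; // an explicit setting always wins
  if (config.database_path) return "pglite"; // a local data directory implies PGLite
  return "postgres"; // assumed fallback when nothing points at a local path
}

function importWorkers(engine: EngineName, requested: number): number {
  // PGLite is single-connection, so imports run with exactly one worker.
  return engine === "pglite" ? 1 : Math.max(1, requested);
}
```

For example, `inferEngine({ database_path: "~/.gbrain/brain.pglite" })` yields `"pglite"`, and `importWorkers("pglite", 8)` yields `1`.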

## [0.6.1] - 2026-04-10

### Fixed

24
CLAUDE.md
@@ -1,19 +1,27 @@
# CLAUDE.md

GBrain is a personal knowledge brain. Postgres + pgvector + hybrid search in a managed Supabase instance.
GBrain is a personal knowledge brain. Pluggable engines: PGLite (embedded Postgres
via WASM, zero-config default) or Postgres + pgvector + hybrid search in a managed
Supabase instance. `gbrain init` defaults to PGLite; suggests Supabase for 1000+ files.
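
The init default can be sketched as a small decision function. This is a hedged illustration: `decideInit`, `InitFlags`, and the handling of conflicting flags are assumptions, while the 1000-file threshold and the `--supabase` / `--pglite` overrides come from the commit message.

```typescript
type InitEngine = "pglite" | "supabase";

interface InitFlags {
  supabase?: boolean; // --supabase
  pglite?: boolean; // --pglite
}

interface InitDecision {
  engine: InitEngine;
  suggestSupabase: boolean;
}

const LARGE_BRAIN_THRESHOLD = 1000; // markdown files

function decideInit(markdownFileCount: number, flags: InitFlags = {}): InitDecision {
  if (flags.supabase && flags.pglite) {
    // Assumption: conflicting flags are rejected; the real CLI may behave differently.
    throw new Error("Pass either --supabase or --pglite, not both");
  }
  if (flags.supabase) return { engine: "supabase", suggestSupabase: false };
  if (flags.pglite) return { engine: "pglite", suggestSupabase: false };
  // No flag: PGLite stays the default, and large repos only trigger a suggestion.
  return { engine: "pglite", suggestSupabase: markdownFileCount >= LARGE_BRAIN_THRESHOLD };
}
```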

## Architecture

Contract-first: `src/core/operations.ts` defines ~30 shared operations. CLI and MCP
server are both generated from this single source. Skills are fat markdown files
(tool-agnostic, work with both CLI and plugin contexts).
server are both generated from this single source. Engine factory (`src/core/engine-factory.ts`)
dynamically imports the configured engine (`'pglite'` or `'postgres'`). Skills are fat
markdown files (tool-agnostic, work with both CLI and plugin contexts).
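
A rough sketch of how a factory with lazy loading might look. It is not the contents of `src/core/engine-factory.ts`: `createEngineSketch`, `EngineLike`, `EngineModule`, and `Loaders` are hypothetical names, and the loaders are injected here so the example is self-contained. In a real project each loader would presumably wrap a dynamic `import()` of the engine module.

```typescript
type EngineName = "pglite" | "postgres";

// Hypothetical minimal engine surface; the real BrainEngine interface is much larger.
interface EngineLike {
  readonly kind: EngineName;
}

interface EngineModule {
  create(): EngineLike;
}

// Each loader stands in for a dynamic import, so a module is only
// evaluated when its engine is actually requested.
type Loaders = Record<EngineName, () => Promise<EngineModule>>;

function isEngineName(name: string): name is EngineName {
  return name === "pglite" || name === "postgres";
}

function createEngineSketch(name: string, loaders: Loaders): Promise<EngineLike> {
  if (!isEngineName(name)) {
    throw new Error(`Unknown engine "${name}" (expected "pglite" or "postgres")`);
  }
  // Only the selected loader runs; the other engine's module is never loaded.
  return loaders[name]().then((mod) => mod.create());
}
```

The point of the indirection is that a Postgres user never pays for loading the PGLite WASM bundle, and vice versa.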

## Key files

- `src/core/operations.ts` - Contract-first operation definitions (the foundation)
- `src/core/engine.ts` - Pluggable engine interface (BrainEngine)
- `src/core/postgres-engine.ts` - Postgres + pgvector implementation
- `src/core/engine-factory.ts` - Engine factory with dynamic imports (`'pglite'` | `'postgres'`)
- `src/core/pglite-engine.ts` - PGLite (embedded Postgres 17.5 via WASM) implementation, all 37 BrainEngine methods
- `src/core/pglite-schema.ts` - PGLite-specific DDL (pgvector, pg_trgm, triggers)
- `src/core/postgres-engine.ts` - Postgres + pgvector implementation (Supabase / self-hosted)
- `src/core/utils.ts` - Shared SQL utilities extracted from postgres-engine.ts
- `src/core/db.ts` - Connection management, schema initialization
- `src/commands/migrate-engine.ts` - Bidirectional engine migration (`gbrain migrate --to supabase/pglite`)
- `src/core/import-file.ts` - importFromFile + importFromContent (chunk + embed + tags)
- `src/core/sync.ts` - Pure sync functions (manifest parsing, filtering, slug conversion)
- `src/core/storage.ts` - Pluggable storage interface (S3, Supabase Storage, local)
@@ -50,9 +58,13 @@ server are both generated from this single source. Skills are fat markdown files

Run `gbrain --help` or `gbrain --tools-json` for full command reference.

Key commands added in v0.7:
- `gbrain init` — defaults to PGLite (no Supabase needed), scans repo size, suggests Supabase for 1000+ files
- `gbrain migrate --to supabase` / `gbrain migrate --to pglite` — bidirectional engine migration
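
The commit message says migration uses manifest-based resume. The manifest format is not shown anywhere in this diff, so the following is only one plausible shape: a record of page slugs already copied, consulted before each run. `MigrationManifest`, `pendingSlugs`, and `markCompleted` are hypothetical names.

```typescript
// Hypothetical manifest: the page slugs already copied to the target engine.
interface MigrationManifest {
  target: "supabase" | "pglite";
  completed: string[];
}

// Pages still to copy, in source order, skipping anything the manifest marks as done.
function pendingSlugs(allSlugs: string[], manifest: MigrationManifest): string[] {
  const done = new Set(manifest.completed);
  return allSlugs.filter((slug) => !done.has(slug));
}

// Returns a new manifest with one more page recorded. Persisting it after every
// page lets an interrupted run pick up where it stopped.
function markCompleted(manifest: MigrationManifest, slug: string): MigrationManifest {
  if (manifest.completed.includes(slug)) return manifest;
  return { ...manifest, completed: [...manifest.completed, slug] };
}
```

Under this design a rerun of the same migration only touches pages that are missing from the manifest, which is what makes the operation resumable.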

## Testing

`bun test` runs all tests (20 unit test files + 4 E2E test files). Unit tests run
`bun test` runs all tests (23 unit test files + 4 E2E test files). Unit tests run
without a database. E2E tests skip gracefully when `DATABASE_URL` is not set.

Unit tests: `test/markdown.test.ts` (frontmatter parsing), `test/chunkers/recursive.test.ts`
@@ -65,6 +77,8 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac
`test/setup-branching.test.ts` (setup flow), `test/slug-validation.test.ts` (slug validation),
`test/storage.test.ts` (storage backends), `test/supabase-admin.test.ts` (Supabase admin),
`test/yaml-lite.test.ts` (YAML parsing), `test/check-update.test.ts` (version check + update CLI),
`test/pglite-engine.test.ts` (PGLite engine, all 37 BrainEngine methods),
`test/utils.test.ts` (shared SQL utilities), `test/engine-factory.test.ts` (engine factory + dynamic imports),
`test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation).

E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.

306
README.md
@@ -2,16 +2,39 @@

Your AI agent is smart but it doesn't know anything about your life. GBrain fixes that. Meetings, emails, tweets, calendar events, voice calls, original ideas... all of it flows into a searchable knowledge base that your agent reads before every response and writes to after every conversation. The agent gets smarter every day.

> **~30 minutes to a fully working brain.** Your agent does the work. Database ready in 2 seconds (PGLite, no server). Schema, import, embeddings, and integrations take 15-30 minutes depending on brain size. You just answer questions about API keys.
>
> **Requires a frontier model.** Tested with **Claude Opus 4.6** and **GPT-5.4 Thinking**. Likely to break with smaller models.

## Need an AI agent first?

GBrain is designed to be installed and operated by an AI agent. If you don't have one running yet:

- **[OpenClaw](https://openclaw.ai)** - Deploy [AlphaClaw on Render](https://render.com/deploy?repo=https://github.com/openclaw/alphaclaw) (one click, requires 8GB+ RAM instance)
- **[Hermes Agent](https://github.com/NousResearch/hermes-agent)** - Deploy on [Railway using this template](https://github.com/praveen-ks-2001/hermes-agent-template) (one click)
- **[Claude Code](https://claude.ai/code)** - Already installed? Paste the block below directly into your terminal

## Start here: paste this into your agent

Copy this block into [OpenClaw](https://openclaw.ai) or [Hermes Agent](https://github.com/NousResearch/hermes-agent). The agent will install GBrain, set up your database, import your files, and verify everything works.
Copy this block into OpenClaw, Hermes Agent, or Claude Code. The agent will install GBrain, set up the brain schema, import your files, configure all integrations, and verify everything works.

```
GOAL: Install GBrain so I can search my own markdown files with
hybrid search (keyword + vector). Do not skip steps. Verify each
step before proceeding. If a step fails, stop and fix it.
GOAL: I want Garry Tan's exact opinionated AI agent setup. Get me
there. Install GBrain, set up the brain schema, import my files,
configure all integrations, and verify everything works. Do not skip
steps. Verify each step before proceeding. If a step fails, stop
and fix it.

You don't need a server to start. gbrain init defaults to PGLite
(embedded Postgres 17.5 via WASM). No Supabase account, no Docker,
no connection strings. One command and you have a full Postgres brain
with pgvector, hybrid search, and all 37 operations. When your brain
outgrows local (1000+ files), gbrain migrate --to supabase moves
everything to managed Postgres in one shot.

The knowledge model is markdown files in a git repo. Read
docs/GBRAIN_SKILLPACK.md for the full agent playbook and
docs/GBRAIN_RECOMMENDED_SCHEMA.md for the brain directory structure.

STEP 1 - INSTALL
curl -fsSL https://bun.sh/install | bash && source ~/.bashrc
@@ -19,46 +42,63 @@ STEP 1 — INSTALL
Verify: gbrain --version

STEP 2 - DATABASE
Run: gbrain init --supabase
Ask the user for their Supabase Shared Pooler connection string:
"Go to https://supabase.com → your project → Get Connected →
Direct Connection String → Session Pooler → copy Shared Pooler"
Do NOT ask for the anon key or project URL. Those are wrong.
Run: gbrain init
This creates a local PGLite brain (embedded Postgres, no server).
Brain ready in ~2 seconds. No accounts, no connection strings.
Verify: gbrain doctor --json (all checks pass)

STEP 3 - IMPORT
For large brains (1000+ files) or remote access, use Supabase:
gbrain init --supabase
You can migrate anytime: gbrain migrate --to supabase

STEP 3 - SCHEMA
Read docs/GBRAIN_RECOMMENDED_SCHEMA.md: this is the brain's directory
structure. Set up the MECE directories (people/, companies/, deals/,
concepts/, etc.) with README.md resolvers in each. The schema tells
you exactly what goes where and how to resolve ambiguity.
The agent maintains this structure automatically after setup.

STEP 4 - IMPORT
Find markdown: find ~/git ~/Documents ~/notes -maxdepth 3 -name "*.md" | head -30
Tell user what you found. Import: gbrain import /path/ --no-embed
Verify: gbrain stats (page count > 0)

STEP 4 - SEARCH
STEP 5 - SEARCH
Run: gbrain embed --stale (generates vector embeddings, ~1 min per 1000 pages)
Then: gbrain query "what are the key themes across these documents?"
Show the user the results. This is the magic moment.
Verify: results include semantic matches, not just keyword hits

STEP 5 - GO LIVE
STEP 6 - GO LIVE
a. Read docs/GBRAIN_SKILLPACK.md: adopt brain-first lookup, entity
detection, and back-linking into your behavior immediately
b. Set up live sync: gbrain sync --repo /path/ && gbrain embed --stale
(see docs/guides/live-sync.md for cron/watch/webhook options)
c. Set up auto-update: gbrain check-update --json on a daily cron
(never auto-install; always ask the user first)
d. Show available integrations: gbrain integrations
e. Run verification: read and execute docs/GBRAIN_VERIFY.md
d. Run verification: read and execute docs/GBRAIN_VERIFY.md

STEP 7 - INTEGRATIONS
Run: gbrain integrations list
For EVERY available recipe, set it up. Each recipe is a markdown file
that tells the agent exactly what credentials to ask for, how to
validate them, and how to wire the integration. The agent reads the
recipe, asks the user for API keys, and handles the rest.
Voice, email, calendar, Twitter: install them all.
Verify: gbrain integrations doctor (all health checks pass)

DONE. Tell the user: "GBrain is live. You have [N] pages with hybrid
search. I now check the brain before answering questions. Run
'gbrain integrations' to add voice, email, calendar, or Twitter."
search. I now check the brain before answering questions. All available
integrations are configured."
```

### Without an agent (standalone CLI)

```bash
bun add -g github:garrytan/gbrain
gbrain init --supabase # guided wizard
gbrain init # local brain, ready in 2 seconds
gbrain import ~/git/brain/ # index your markdown
gbrain query "what do we know about competitive dynamics?"
gbrain query "what themes show up across my notes?"
```

Run `gbrain --help` for all commands. See [MCP setup](docs/mcp/DEPLOY.md) for connecting Claude Desktop, Perplexity, etc.
@@ -111,9 +151,7 @@ I was setting up my [OpenClaw](https://openclaw.ai) agent and started a markdown

The agent runs while I sleep. The dream cycle scans every conversation, enriches missing entities, fixes broken citations, and consolidates memory. I wake up and the brain is smarter than when I went to sleep. See the [cron schedule guide](docs/guides/cron-schedule.md) for setup.

**You don't need Postgres to start.** The knowledge model is just markdown files in a git repo. The [skills](docs/GBRAIN_SKILLPACK.md) and [schema](docs/GBRAIN_RECOMMENDED_SCHEMA.md) work with any AI agent that can read and write files.

**When you need Postgres:** at 1,000+ files, `grep` stops working. GBrain adds hybrid search (keyword + vector + RRF fusion) on top of Postgres + pgvector. The CLI and MCP layer handle chunking, embedding, and incremental sync. Add Postgres when search speed matters, or when you want Claude Desktop, ChatGPT, Perplexity, or other MCP clients to connect to your brain remotely.
**PGLite runs locally by default.** `gbrain init` gives you embedded Postgres with pgvector, hybrid search, and all 37 operations. No server, no subscription. When your brain outgrows local (1000+ files, multi-device access, remote MCP), `gbrain migrate --to supabase` moves everything to managed Postgres.
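
For readers unfamiliar with the "RRF fusion" step mentioned above, here is a minimal sketch of reciprocal rank fusion over a keyword ranking and a vector ranking. It is a generic illustration, not GBrain's search code: `rrfFuse` is a hypothetical name, and `k = 60` is the constant commonly used with RRF, assumed here rather than taken from the project.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Each ranking is a list of document ids, best first.
// A document's fused score is the sum over rankings of 1 / (k + rank), with rank starting at 1.
function rrfFuse(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: "b" is ranked 2nd by keyword and 1st by vector, so it ends up first overall.
const fused = rrfFuse([
  ["a", "b", "c"], // keyword ranking
  ["b", "c", "a"], // vector ranking
]);
```

Because only ranks are used, the keyword and vector scores never need to be on the same scale, which is the usual reason for choosing RRF over a weighted sum.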

## Architecture

@@ -166,53 +204,151 @@ They're complementary:

All three should be checked. GBrain for facts about the world. Memory for agent config. Session for immediate context. Install via `openclaw skills install gbrain`.

## Try it: your files, searchable in 90 seconds
## The compounding effect

GBrain doesn't ship with demo data. It finds YOUR markdown and makes it searchable.
The real value isn't search. It's what happens after a few weeks of use.

**Act 1: Discovery.** GBrain scans your machine for markdown repos.
You take a meeting with someone. The agent writes a brain page for them, links it to their company, tags it with the deal. Next week someone mentions that company in a different context. The agent already has the full picture: who you talked to, what you discussed, what threads are open. You didn't do anything. The brain already had it.

```
=== GBrain Environment Discovery ===
## Install

~/git/brain (2.3GB, 342 .md files, 87 binary files)
Type: Plain markdown (ready for import)
### Prerequisites

~/Documents/obsidian-vault (180MB, 1,203 .md files, 0 binary files)
Type: Obsidian vault (wikilink conversion available)
**Zero-config start (PGLite).** `gbrain init` creates a local embedded Postgres brain. No accounts, no server, no API keys. Keyword search works immediately. Add API keys later for vector search and LLM-powered features.

=== Discovery Complete ===
```
**For production scale (Supabase).** When your brain outgrows local, `gbrain migrate --to supabase` moves everything to managed Postgres:

**Act 2: Import.** Your files move from the repo into Supabase.
| Dependency | What it's for | How to get it |
|------------|--------------|---------------|
| **Supabase account** | Postgres + pgvector database | [supabase.com](https://supabase.com) (Pro tier, $25/mo for 8GB) |
| **OpenAI API key** | Embeddings (text-embedding-3-large) | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |
| **Anthropic API key** | Multi-query expansion + LLM chunking (Haiku) | [console.anthropic.com](https://console.anthropic.com) |

Set the API keys as environment variables:

```bash
gbrain import ~/git/brain/
# Imported 342 files into Supabase (1,847 chunks). Embedding in background...

gbrain stats
# Pages: 342, Chunks: 1,847, Embedded: 0 (embedding...), Links: 0
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```

**Act 3: Search.** The agent picks a query from your actual content.
The Supabase connection URL is configured during `gbrain init --supabase`. The OpenAI and Anthropic SDKs read their keys from the environment automatically.

Without an OpenAI key, search still works (keyword only, no vector search). Without an Anthropic key, search still works (no multi-query expansion, no LLM chunking).

### GBrain without OpenClaw

GBrain works with any AI agent, any MCP client, or no agent at all. Three paths:

#### Standalone CLI

Install globally and use gbrain from the terminal:

```bash
# The agent reads your corpus and picks a relevant query
gbrain query "what do we know about competitive dynamics?"
# 3 results, scored by hybrid search (vector + keyword + RRF fusion)

# 30 seconds later, embeddings finish:
gbrain stats
# Pages: 342, Chunks: 1,847, Embedded: 1,847, Links: 0

# Now semantic search is live too
gbrain query "what are our biggest risks right now?"
# Finds pages about moats, board prep, and strategy -- by meaning, not keywords
bun add -g github:garrytan/gbrain
gbrain init # PGLite (local, no server needed)
gbrain import ~/git/brain/ # index your markdown
gbrain query "what themes show up across my notes?"
```

Your file count will be different. Your queries will be different. The agent picks them based on what it imported. That's the point: this is YOUR brain, not a demo.
Run `gbrain --help` for the full list of commands.

**The compounding effect.** Search for Pedro. The agent pulls his page, his relationship history, his company. Next time Brex comes up in conversation, the agent already knows Pedro co-founded it, what you discussed last, and what's on your open threads. You didn't do anything — the brain already had it.
#### MCP server (Claude Code, Cursor, Windsurf, etc.)

GBrain exposes 30 MCP tools via stdio. Add this to your MCP client config:

**Claude Code** (`~/.claude/server.json`):
```json
{
  "mcpServers": {
    "gbrain": {
      "command": "gbrain",
      "args": ["serve"]
    }
  }
}
```

**Cursor** (Settings > MCP Servers):
```json
{
  "gbrain": {
    "command": "gbrain",
    "args": ["serve"]
  }
}
```

This gives your agent `get_page`, `put_page`, `search`, `query`, `add_link`, `traverse_graph`, `sync_brain`, `file_upload`, and 22 more tools. All generated from the same operation definitions as the CLI.

#### Remote MCP Server (Claude Desktop, Cowork, Perplexity, ChatGPT)

Access your brain from any device, any AI client. Deploy as a serverless endpoint on your existing Supabase instance:

```bash
cp .env.production.example .env.production # fill in 3 values
bash scripts/deploy-remote.sh # links, builds, deploys
bun run src/commands/auth.ts create "claude-desktop" # get a token
```

Then add to your AI client:
- **Claude Code:** `claude mcp add gbrain -t http https://YOUR_REF.supabase.co/functions/v1/gbrain-mcp/mcp -H "Authorization: Bearer TOKEN"`
- **Claude Desktop:** Settings > Integrations > Add (NOT JSON config)
- **Perplexity Computer:** Settings > Connectors > Add remote MCP

Per-client setup guides: [`docs/mcp/`](docs/mcp/DEPLOY.md)

ChatGPT support requires OAuth 2.1 (not yet implemented). Self-hosted alternatives (Tailscale Funnel, ngrok) documented in [`docs/mcp/ALTERNATIVES.md`](docs/mcp/ALTERNATIVES.md).

**The tools are not enough.** Your agent also needs the playbook: read [GBRAIN_SKILLPACK.md](docs/GBRAIN_SKILLPACK.md) and paste the relevant sections into your agent's system prompt or project instructions. The skillpack tells the agent WHEN and HOW to use each tool: read before responding, write after learning, detect entities on every message, back-link everything.

The skill markdown files in `skills/` are standalone instruction sets. Copy them into your agent's context:

| Skill file | What the agent learns |
|------------|----------------------|
| `skills/ingest/SKILL.md` | How to import meetings, docs, articles |
| `skills/query/SKILL.md` | 3-layer search with synthesis and citations |
| `skills/maintain/SKILL.md` | Periodic health: stale pages, orphans, dead links |
| `skills/enrich/SKILL.md` | Enrich pages from external APIs |
| `skills/briefing/SKILL.md` | Daily briefing with meeting prep |
| `skills/migrate/SKILL.md` | Migrate from Obsidian, Notion, Logseq, etc. |

#### As a TypeScript library

```bash
bun add github:garrytan/gbrain
```
```typescript
import { createEngine } from 'gbrain';

// PGLite (local, no server)
const engine = createEngine('pglite');
await engine.connect({ database_path: '~/.gbrain/brain.pglite' });
await engine.initSchema();

// Or Postgres (Supabase / self-hosted)
// const engine = createEngine('postgres');
// await engine.connect({ database_url: process.env.DATABASE_URL });
// await engine.initSchema();

// Search
const results = await engine.searchKeyword('startup growth');

// Read
const page = await engine.getPage('people/pedro-franceschi');

// Write
await engine.putPage('concepts/superlinear-returns', {
  type: 'concept',
  title: 'Superlinear Returns',
  compiled_truth: 'Paul Graham argues that returns in many fields are superlinear...',
  timeline: '- 2023-10-01: Published on paulgraham.com',
});
```
|
||||
|
||||
The `BrainEngine` interface is pluggable. `createEngine()` accepts `'pglite'` or `'postgres'`. See `docs/ENGINES.md` for details.
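A minimal sketch of how a caller might pick the engine name before calling `createEngine()`. The config fields `engine`, `database_path`, and `database_url` appear in this document; the `GBrainConfigSketch` type, the `inferEngine` helper, and the precedence order are assumptions for illustration, not the project's actual `loadConfig` logic.

```typescript
// Hypothetical helper: decides which engine name to pass to createEngine().
type EngineName = 'pglite' | 'postgres';

// Assumed config shape; only the field names come from the docs above.
interface GBrainConfigSketch {
  engine?: EngineName;
  database_path?: string;
  database_url?: string;
}

function inferEngine(config: GBrainConfigSketch): EngineName {
  if (config.engine) return config.engine;        // an explicit choice wins
  if (config.database_path) return 'pglite';      // a local path implies PGLite
  if (config.database_url) return 'postgres';     // a connection URL implies Postgres
  return 'pglite';                                // PGLite is the documented default
}

console.log(inferEngine({ database_path: '/tmp/brain.pglite' }));
console.log(inferEngine({ database_url: 'postgresql://user:pass@host:5432/dbname' }));
```

Keeping the inference in one small function means the CLI and library callers would agree on the same default.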
|
||||
|
||||
PGLite (default) requires no external database. For production scale (7K+ pages, multi-device, remote MCP), use Supabase Pro ($25/mo).
|
||||
|
||||
## Upgrade
|
||||
|
||||
@@ -231,42 +367,17 @@ clawhub update gbrain
|
||||
|
||||
After upgrading, run `gbrain init` again to apply any schema migrations (idempotent, safe to re-run).
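Why re-running is safe can be sketched with a version-tracked migration runner. This is one common pattern, not necessarily how gbrain tracks schema versions: `FakeDb`, `Migration`, and `applyMigrations` are hypothetical names, and the in-memory object stands in for a real database.

```typescript
// Hypothetical migration record: a monotonically increasing version plus its SQL.
interface Migration {
  version: number;
  sql: string;
}

// In-memory stand-in for a database that remembers its schema version.
class FakeDb {
  schemaVersion = 0;
  executed: string[] = [];

  run(version: number, sql: string): void {
    this.executed.push(sql);
    this.schemaVersion = version;
  }
}

// Applies only migrations newer than the stored version, in ascending order.
// Returns how many migrations ran, so a second call returns 0.
function applyMigrations(db: FakeDb, migrations: Migration[]): number {
  const pending = migrations
    .filter((m) => m.version > db.schemaVersion)
    .sort((a, b) => a.version - b.version);
  for (const m of pending) db.run(m.version, m.sql);
  return pending.length;
}

const db = new FakeDb();
const migrations: Migration[] = [
  { version: 2, sql: 'CREATE INDEX idx_pages_type ON pages(type)' },
  { version: 1, sql: 'CREATE TABLE pages (id INTEGER PRIMARY KEY)' },
];
console.log(applyMigrations(db, migrations)); // first run applies both
console.log(applyMigrations(db, migrations)); // second run applies none
```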
|
||||
|
||||
## Setup
|
||||
## Setup details
|
||||
|
||||
After installing via CLI or library path, run the setup wizard:
|
||||
`gbrain init` defaults to PGLite (embedded Postgres 17.5 via WASM). No accounts, no server. Config saved to `~/.gbrain/config.json`.
|
||||
|
||||
```bash
|
||||
# Guided wizard: auto-provisions Supabase or accepts a connection URL
|
||||
gbrain init --supabase
|
||||
|
||||
# Or connect to any Postgres with pgvector
|
||||
gbrain init --url postgresql://user:pass@host:5432/dbname
|
||||
gbrain init # PGLite (default)
|
||||
gbrain init --supabase # guided wizard for Supabase
|
||||
gbrain init --url <conn> # any Postgres with pgvector
|
||||
```
|
||||
|
||||
The init wizard:
|
||||
1. Checks for Supabase CLI, offers auto-provisioning
|
||||
2. Falls back to manual connection URL if CLI isn't available
|
||||
3. Runs the full schema migration (tables, indexes, triggers, extensions)
|
||||
4. Verifies the connection and confirms the database is ready for import
|
||||
|
||||
Config is saved to `~/.gbrain/config.json` with 0600 permissions.
|
||||
|
||||
OpenClaw users skip this step. The orchestrator runs the wizard for you during install.
|
||||
|
||||
## First import
|
||||
|
||||
```bash
# Import your markdown wiki (auto-chunks and auto-embeds)
gbrain import /path/to/brain/

# Skip embedding if you want to import fast and embed later
gbrain import /path/to/brain/ --no-embed

# Backfill embeddings for pages that don't have them
gbrain embed --stale
```
|
||||
|
||||
Import is idempotent. Re-running it skips unchanged files (compared by SHA-256 content hash). Progress bar shows status. ~30s for text import of 7,000 files, ~10-15 min for embedding.
|
||||
Import is idempotent. Re-running skips unchanged files (SHA-256 content hash). ~30s for text import of 7,000 files, ~10-15 min for embedding.
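The skip-unchanged check can be sketched with Node's built-in `crypto` module. The name `contentHash` matches a shared utility mentioned in this PR, but the body here and the `shouldImport` helper are assumptions about how such a check might look, not the project's code.

```typescript
import { createHash } from 'node:crypto';

// SHA-256 of the file content, hex-encoded.
function contentHash(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Hypothetical helper: import only when the stored hash is missing or differs.
function shouldImport(content: string, storedHash: string | undefined): boolean {
  return storedHash !== contentHash(content);
}

const stored = contentHash('# Superlinear Returns\n');
console.log(shouldImport('# Superlinear Returns\n', stored));          // unchanged file
console.log(shouldImport('# Superlinear Returns (edited)\n', stored)); // changed file
```

Hashing content rather than comparing modification times means a file that was touched but not edited is still skipped.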
|
||||
|
||||
## File storage and migration
|
||||
|
||||
@@ -442,7 +553,8 @@ Three strategies, dispatched by content type:
|
||||
|
||||
```
|
||||
SETUP
|
||||
gbrain init [--supabase|--url <conn>] Create brain (guided wizard)
|
||||
gbrain init [--supabase|--url <conn>] Create brain (PGLite default, or Supabase)
|
||||
gbrain migrate --to supabase|pglite Migrate between engines (bidirectional)
|
||||
gbrain upgrade Self-update
|
||||
|
||||
PAGES
|
||||
@@ -527,17 +639,24 @@ CLI / MCP Server
|
||||
BrainEngine interface
|
||||
(pluggable backend)
|
||||
|
|
||||
engine-factory.ts
|
||||
(dynamic imports)
|
||||
|
|
||||
+--------+--------+
|
||||
| |
|
||||
PostgresEngine SQLiteEngine
|
||||
(ships v0) (designed, community PRs welcome)
|
||||
|
|
||||
Supabase Pro ($25/mo)
|
||||
Postgres + pgvector + pg_trgm
|
||||
connection pooling via Supavisor
|
||||
PGLiteEngine PostgresEngine
|
||||
(ships v0.7) (ships v0)
|
||||
| |
|
||||
~/.gbrain/brain.pglite Supabase Pro ($25/mo)
|
||||
embedded PG 17.5 Postgres + pgvector + pg_trgm
|
||||
via @electric-sql connection pooling via Supavisor
|
||||
/pglite
|
||||
|
||||
gbrain migrate --to supabase/pglite
|
||||
(bidirectional migration)
|
||||
```
|
||||
|
||||
Embedding, chunking, and search fusion are engine-agnostic. Only raw keyword search (`searchKeyword`) and raw vector search (`searchVector`) are engine-specific. RRF fusion, multi-query expansion, and 4-layer dedup run above the engine on `SearchResult[]` arrays.
|
||||
Embedding, chunking, and search fusion are engine-agnostic. Only raw keyword search (`searchKeyword`) and raw vector search (`searchVector`) are engine-specific. RRF fusion, multi-query expansion, and 4-layer dedup run above the engine on `SearchResult[]` arrays. Both engines use the same SQL (PGLite runs real Postgres, not a separate dialect).
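A minimal sketch of Reciprocal Rank Fusion over two ranked result lists, to show why fusion can live above the engine: it only needs ordered arrays. The `RankedHit` type, the `rrfFuse` name, and the constant `k = 60` (the conventional RRF default) are assumptions; gbrain's actual `SearchResult` shape and constant are not shown here.

```typescript
// Assumed minimal result shape: only a stable identifier is needed for fusion.
interface RankedHit {
  slug: string;
}

// Each list contributes 1 / (k + rank) per hit, with rank starting at 1.
// Scores for the same slug are summed across lists, then sorted descending.
function rrfFuse(lists: RankedHit[][], k = 60): { slug: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      scores.set(hit.slug, (scores.get(hit.slug) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score);
}

const keywordHits = [{ slug: 'a' }, { slug: 'b' }, { slug: 'c' }];
const vectorHits = [{ slug: 'b' }, { slug: 'd' }];
console.log(rrfFuse([keywordHits, vectorHits]).map((r) => r.slug));
```

A page that appears in both lists (`b` above) outranks pages that appear in only one, even if it was not first in either.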
|
||||
|
||||
## Storage estimates
|
||||
|
||||
@@ -573,8 +692,7 @@ Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.
|
||||
|
||||
**Reference:**
|
||||
- [GBRAIN_V0.md](docs/GBRAIN_V0.md) -- Full product spec, all architecture decisions
|
||||
- [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface, how to add backends
|
||||
- [SQLITE_ENGINE.md](docs/SQLITE_ENGINE.md) -- SQLite engine plan (community PRs welcome)
|
||||
- [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface: PGLite (default) + Postgres, capability matrix, migration
|
||||
|
||||
## Contributing
|
||||
|
||||
@@ -584,10 +702,10 @@ against real Postgres+pgvector: `docker compose -f docker-compose.test.yml up -d
|
||||
|
||||
Welcome PRs for:
|
||||
|
||||
- SQLite engine implementation
|
||||
- New enrichment API integrations
|
||||
- Performance optimizations
|
||||
- Docker Compose for self-hosted Postgres
|
||||
- Additional engine backends (DuckDB, Turso, etc.)
|
||||
|
||||
## License
|
||||
|
||||
|
||||
@@ -1,124 +0,0 @@
|
||||
# Claude Code Prompt: GBrain README Update for v0.4 Release
|
||||
|
||||
## Context
|
||||
|
||||
GBrain v0.4.0 just shipped. The big addition besides the technical features (doctor command, parallel import, storage backends, Apple Notes support) is that we now have **production benchmark data proving the search quality thesis**.
|
||||
|
||||
The README currently explains the architecture well but doesn't have concrete evidence for WHY hybrid search matters. We now have that evidence.
|
||||
|
||||
## The Benchmark
|
||||
|
||||
We ran 12 queries across 4 difficulty tiers against a production brain with 13,106 indexed pages and 19,979 chunks. Three methods compared:
|
||||
|
||||
**Method 1: `grep -ril` (filesystem search)**
|
||||
- Average: 231ms
|
||||
- Correct #1 result: 0 out of 12
|
||||
|
||||
**Method 2: `gbrain search` (keyword: pg_trgm + tsvector)**
|
||||
- Average: 666ms
|
||||
- Correct #1 result: 8 out of 12
|
||||
|
||||
**Method 3: `gbrain query` (hybrid: keyword + pgvector semantic + RRF)**
|
||||
- Average: 2,434ms
|
||||
- Correct #1 result: 12 out of 12
|
||||
|
||||
### Raw results (sanitized — no real names)
|
||||
|
||||
```
Query                                  grep        search      query(semantic)
─────────────────────────────────────────────────────────────────────────────
TIER 1: Entity lookup (known names)
"John Smith"                           188ms ❌    635ms ✅    1801ms ✅
"Acme Corp CEO"                        198ms ❌    595ms ✅    1497ms ✅
"Project Alpha"                        190ms ❌    665ms ✅    2010ms ✅

TIER 2: Topic/concept recall
"founder mode"                         186ms ❌    671ms ✅    1714ms ✅
"Series A deal terms"                  256ms ❌    662ms ❌    2650ms ✅
"batch selection criteria"             198ms ❌    690ms ✅    2617ms ✅

TIER 3: Semantic (no exact keyword match)
"founders building developer tools"    237ms ❌    696ms ⚠️    2719ms ✅
"shame as fuel for ambition"           384ms ❌    711ms ✅    2610ms ✅
"what makes a 10x company"             194ms ❌    724ms ⚠️    2904ms ✅

TIER 4: Cross-domain / relational
"people who know both X and Y"         198ms ❌    576ms ❌    3181ms ✅
"restaurants near resort"              366ms ❌    689ms ✅    2836ms ✅
"original ideas about abundance"       188ms ❌    682ms ⚠️    2680ms ✅
```
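The per-method averages quoted earlier can be re-derived from the rows above. A short sketch that recomputes them; the latency values are copied from the table, and truncating to whole milliseconds is an assumption that happens to reproduce the quoted figures.

```typescript
// Latencies in ms, copied from the raw results table (12 queries per method).
const grepMs = [188, 198, 190, 186, 256, 198, 237, 384, 194, 198, 366, 188];
const searchMs = [635, 595, 665, 671, 662, 690, 696, 711, 724, 576, 689, 682];
const queryMs = [1801, 1497, 2010, 1714, 2650, 2617, 2719, 2610, 2904, 3181, 2836, 2680];

// Arithmetic mean of a list of latencies.
function meanMs(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Truncate to whole milliseconds (assumed reporting convention).
const grepAvg = Math.floor(meanMs(grepMs));
const searchAvg = Math.floor(meanMs(searchMs));
const queryAvg = Math.floor(meanMs(queryMs));

console.log({ grepAvg, searchAvg, queryAvg }); // { grepAvg: 231, searchAvg: 666, queryAvg: 2434 }
console.log((queryAvg / grepAvg).toFixed(1));  // hybrid latency relative to grep
```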
|
||||
|
||||
### What grep actually returned (this is the damning part)
|
||||
|
||||
- For "John Smith" → returned a project README that mentions the name in passing, not the actual person's dossier
|
||||
- For "Acme Corp CEO" → returned an index file, not the person or company page
|
||||
- For "Series A deal terms" → returned an event page about the Olympics (!)
|
||||
- For "what makes a 10x company" → returned a demo day page (5 different queries returned this same page)
|
||||
- For "restaurants near resort" → returned an adversary tracking page (!!)
|
||||
- For "original ideas about abundance" → returned an inbox README
|
||||
|
||||
**grep returned `yc-demo-day` as the top result for 5 completely different queries.** It's matching incidental word occurrences, not answering the question.
|
||||
|
||||
### The core insight (PUT THIS IN THE README)
|
||||
|
||||
This is NOT a speed story. grep is 10x faster. That's irrelevant.
|
||||
|
||||
This is a **correctness story**. At 13,000+ files, grep returns noise. It finds files that *contain a word from your query*, not files that *answer your question*. When your brain has 3,000 people pages, 5,800 archived notes, and 500+ media pages, the word "founder" appears in hundreds of files. grep can't tell which one you want. It returns whichever file the filesystem scanner hits first.
|
||||
|
||||
The practical consequence: **grep-based lookup causes the agent to hallucinate.** Not because the LLM is making things up — because it's being fed the wrong context. You ask "who is John Smith?" and the agent gets a project README instead of the person's dossier. Now it's generating a response from irrelevant context. The hallucination isn't in the model — it's in the retrieval.
|
||||
|
||||
Hybrid search eliminates this. The semantic layer understands that "shame as fuel for ambition" should find your essay about founder psychology, not a file that happens to contain the word "shame." The keyword layer ensures exact names still match instantly. RRF fusion combines both signals.
|
||||
|
||||
**The 2 seconds of extra latency buys you the right answer.** In a system where the agent is already spending 5-10 seconds thinking before it responds, 2 seconds of retrieval is invisible. But feeding the agent wrong context is catastrophic — it poisons the entire response.
|
||||
|
||||
## What to change in the README
|
||||
|
||||
1. **Add a "Why Not Just Grep?" section** (or expand the existing "Why this exists" section) with the benchmark data. This should be near the top — it's the strongest argument for why gbrain exists. Use the sanitized benchmark numbers, not real names.
|
||||
|
||||
2. **Add the hallucination argument.** The key framing: grep itself doesn't hallucinate; it causes the *agent* to hallucinate by feeding it wrong context. This is a concrete, measurable problem, not a theoretical one.
|
||||
|
||||
3. **Add a "Benchmark" section** with instructions for running the benchmark yourself: `bash skills/benchmark-gbrain/scripts/benchmark.sh`. Users can verify on their own data.
|
||||
|
||||
4. **Update the "Why this exists" narrative.** The current version mentions grep falling apart at scale but doesn't have concrete numbers. Now we have them. The story should be: "At 500 files grep works. At 13,000 files, grep returned the correct top result 0 out of 12 times. Here's what it returned instead."
|
||||
|
||||
5. **Update the v0.4.0 section in the changelog** if it exists, or add release notes mentioning the benchmark skill.
|
||||
|
||||
6. **Keep the tone.** The README's voice is good — direct, technical, opinionated, no marketing fluff. The benchmark section should match: show the data, explain what it means, don't oversell.
|
||||
|
||||
## What NOT to change
|
||||
|
||||
- Don't touch the architecture diagrams, they're good
|
||||
- Don't change the install/setup flow
|
||||
- Don't remove the "What one brain looks like" section
|
||||
- Don't change the knowledge model explanation
|
||||
- Don't add fluff or marketing language
|
||||
- **IMPORTANT: Do not reference any real people by name in the benchmark section.** Use generic examples ("a person page", "a company page", "an essay about founder psychology"). The benchmark queries in the skill use real names but the README should not.
|
||||
|
||||
## Also scrub existing real-name references
|
||||
|
||||
The README currently has these real-name references that should be genericized:
|
||||
- Line 24: "Pedro and Diana" → use generic names
|
||||
- Line 172: "Pedro", "Brex" → use generic examples
|
||||
- Lines 12, 34: "dossiers" is fine (generic term), keep it
|
||||
|
||||
Replace with plausible but clearly fictional examples. Don't use "Alice and Bob" — too cliché. Use something like "Jordan" and "Sarah" or similar.
|
||||
|
||||
## Files to edit
|
||||
|
||||
- `README.md` — main changes + scrub real names from examples
|
||||
- `CHANGELOG.md` — add benchmark skill to v0.4.0 section if not already there
|
||||
|
||||
## How to run this
|
||||
|
||||
```bash
cd /tmp/gbrain-product
# Read the current README
cat README.md
# Read the benchmark skill for reference
cat skills/benchmark-gbrain/SKILL.md 2>/dev/null || cat /data/.openclaw/workspace/skills/benchmark-gbrain/SKILL.md
# Read the benchmark script for the actual test methodology
cat skills/benchmark-gbrain/scripts/benchmark.sh 2>/dev/null || cat /data/.openclaw/workspace/skills/benchmark-gbrain/scripts/benchmark.sh
# Make edits to README.md
# Verify nothing references real people in the benchmark section
grep -n "Pedro\|Benioff\|Legion\|Garry\|adversary\|oppo" README.md
```
|
||||
13
TODOS.md
@@ -19,6 +19,19 @@
|
||||
|
||||
## P0
|
||||
|
||||
### Fix `bun build --compile` WASM embedding for PGLite
|
||||
**What:** Submit PR to oven-sh/bun fixing WASM file embedding in `bun build --compile` (issue oven-sh/bun#15032).
|
||||
|
||||
**Why:** PGLite's WASM files (~3MB) can't be embedded in the compiled binary. Users who install via `bun install -g gbrain` are fine (WASM resolves from node_modules), but the compiled binary can't use PGLite. Jarred Sumner (Bun founder, YC W22) would likely be receptive.
|
||||
|
||||
**Pros:** Single-binary distribution includes PGLite. No sidecar files needed.
|
||||
|
||||
**Cons:** Requires understanding Bun's bundler internals. May be a large PR.
|
||||
|
||||
**Context:** Issue has been open since Nov 2024. The root cause is that `bun build --compile` generates virtual filesystem paths (`/$bunfs/root/...`) that PGLite can't resolve. Multiple users have reported this. A fix would benefit any WASM-dependent package, not just PGLite.
|
||||
|
||||
**Depends on:** PGLite engine shipping (to have a real use case for the PR).
|
||||
|
||||
### ChatGPT MCP support (OAuth 2.1)
|
||||
**What:** Add OAuth 2.1 with Dynamic Client Registration to the Edge Function so ChatGPT can connect.
|
||||
|
||||
|
||||
3
bun.lock
@@ -7,6 +7,7 @@
|
||||
"dependencies": {
|
||||
"@anthropic-ai/sdk": "^0.30.0",
|
||||
"@aws-sdk/client-s3": "^3.1028.0",
|
||||
"@electric-sql/pglite": "^0.4.4",
|
||||
"@modelcontextprotocol/sdk": "^1.0.0",
|
||||
"gray-matter": "^4.0.3",
|
||||
"openai": "^4.0.0",
|
||||
@@ -101,6 +102,8 @@
|
||||
|
||||
"@aws/lambda-invoke-store": ["@aws/lambda-invoke-store@0.2.4", "", {}, "sha512-iY8yvjE0y651BixKNPgmv1WrQc+GZ142sb0z4gYnChDDY2YqI4P/jsSopBWrKfAt7LOJAkOXt7rC/hms+WclQQ=="],
|
||||
|
||||
"@electric-sql/pglite": ["@electric-sql/pglite@0.4.4", "", {}, "sha512-g/6CWAJ4XOkObWCWAQ2IReZD8VvsDy3poRHSKvpRR2F96F8WJ3HVbjpso3gN7l0q6QPPgvxSSpl/qo5k8a7mkQ=="],
|
||||
|
||||
"@hono/node-server": ["@hono/node-server@1.19.12", "", { "peerDependencies": { "hono": "^4" } }, "sha512-txsUW4SQ1iilgE0l9/e9VQWmELXifEFvmdA1j6WFh/aFPj99hIntrSsq/if0UWyGVkmrRPKA1wCeP+UCr1B9Uw=="],
|
||||
|
||||
"@modelcontextprotocol/sdk": ["@modelcontextprotocol/sdk@1.29.0", "", { "dependencies": { "@hono/node-server": "^1.19.9", "ajv": "^8.17.1", "ajv-formats": "^3.0.1", "content-type": "^1.0.5", "cors": "^2.8.5", "cross-spawn": "^7.0.5", "eventsource": "^3.0.2", "eventsource-parser": "^3.0.0", "express": "^5.2.1", "express-rate-limit": "^8.2.1", "hono": "^4.11.4", "jose": "^6.1.3", "json-schema-typed": "^8.0.2", "pkce-challenge": "^5.0.0", "raw-body": "^3.0.0", "zod": "^3.25 || ^4.0", "zod-to-json-schema": "^3.25.1" }, "peerDependencies": { "@cfworker/json-schema": "^4.1.1" }, "optionalPeers": ["@cfworker/json-schema"] }, "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ=="],
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
|
||||
Every GBrain operation goes through `BrainEngine`. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.
|
||||
|
||||
v0 ships `PostgresEngine` backed by Supabase. The interface is designed so a `SQLiteEngine`, `DuckDBEngine`, or `TursoEngine` could slot in without touching the CLI, MCP server, skills, or any consumer code.
|
||||
v0 shipped `PostgresEngine` backed by Supabase. v0.7 adds `PGLiteEngine` -- embedded Postgres 17.5 via WASM (@electric-sql/pglite), zero-config default. The interface is designed so a `DuckDBEngine`, `TursoEngine`, or any custom backend could slot in without touching the CLI, MCP server, skills, or any consumer code.
|
||||
|
||||
## Why this matters
|
||||
|
||||
@@ -12,13 +12,14 @@ Different users have different constraints:
|
||||
|
||||
| User | Needs | Best engine |
|
||||
|------|-------|-------------|
|
||||
| Getting started | Zero-config, no accounts, no server | PGLiteEngine (default since v0.7) |
|
||||
| Power user (you) | World-class search, 7K+ pages, zero-ops | PostgresEngine + Supabase |
|
||||
| Open source hacker | Single file, no server, git-friendly | SQLiteEngine (future) |
|
||||
| Open source hacker | Single file, no server, git-friendly | PGLiteEngine |
|
||||
| Team/enterprise | Multi-user, RLS, audit trail | PostgresEngine + self-hosted |
|
||||
| Researcher | Analytics, bulk exports, embeddings | DuckDBEngine (someday) |
|
||||
| Edge/mobile | Offline-first, sync later | SQLiteEngine + sync (someday) |
|
||||
| Edge/mobile | Offline-first, sync later | PGLiteEngine + sync (someday) |
|
||||
|
||||
The engine interface means we don't have to choose. Ship Postgres now, let the community build the rest.
|
||||
The engine interface means we don't have to choose. PGLite is the zero-friction default. Supabase is the production scale path. `gbrain migrate --to supabase/pglite` moves between them.
|
||||
|
||||
## The interface
|
||||
|
||||
@@ -82,6 +83,10 @@ export interface BrainEngine {
|
||||
// Config
|
||||
getConfig(key: string): Promise<string | null>;
|
||||
setConfig(key: string, value: string): Promise<void>;
|
||||
|
||||
// Migration + advanced (added v0.7)
|
||||
runMigration(sql: string): Promise<void>;
|
||||
getChunksWithEmbeddings(slug: string): Promise<ChunkWithEmbedding[]>;
|
||||
}
|
||||
```
|
||||
|
||||
@@ -116,11 +121,11 @@ export interface BrainEngine {
|
||||
+-----------+-----------+ +---------+---------+
|
||||
| | | |
|
||||
+-------v-------+ +-------v---+ +-------v---+ +----v--------+
|
||||
| Postgres: | | SQLite: | | Postgres: | | SQLite: |
|
||||
| tsvector + | | FTS5 + | | pgvector | | sqlite-vss |
|
||||
| ts_rank + | | bm25 | | HNSW | | or vec0 |
|
||||
| websearch_to_ | | | | cosine | | |
|
||||
| tsquery | | | | | | |
|
||||
| Postgres: | | PGLite: | | Postgres: | | PGLite: |
|
||||
| tsvector + | | tsvector +| | pgvector | | pgvector |
|
||||
| ts_rank + | | ts_rank | | HNSW | | HNSW |
|
||||
| websearch_to_ | | (same SQL)| | cosine | | cosine |
|
||||
| tsquery | | | | | | (same SQL) |
|
||||
+---------------+ +-----------+ +-----------+ +-------------+
|
||||
```
|
||||
|
||||
@@ -143,20 +148,50 @@ RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They o
|
||||
|
||||
**Why not self-hosted for v0:** The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.
|
||||
|
||||
## PGLiteEngine (v0.7, ships)
|
||||
|
||||
**Dependencies:** `@electric-sql/pglite` (v0.4.4+)
|
||||
|
||||
**What it is:** Embedded Postgres 17.5 compiled to WASM via ElectricSQL's PGLite. Runs in-process, no server, no Docker, no accounts. Same SQL as PostgresEngine -- not a separate dialect. All 37 BrainEngine methods implemented.
|
||||
|
||||
**PGLite-specific details:**
|
||||
- Uses `pglite-schema.ts` for DDL (pgvector extension, pg_trgm, triggers, indexes)
|
||||
- Parameterized queries throughout (shared utilities in `src/core/utils.ts`)
|
||||
- `hybridSearch` keyword-only fallback when `OPENAI_API_KEY` is not set
|
||||
- Data stored at `~/.gbrain/brain.pglite` (configurable)
|
||||
- pgvector HNSW index for cosine similarity vector search (same as Postgres)
|
||||
- tsvector + ts_rank for full-text search (same as Postgres)
|
||||
- pg_trgm for fuzzy slug resolution (same as Postgres)
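The keyword-only fallback listed above can be sketched as a guard that runs before any embedding call. `searchKeyword`, `searchVector`, and `embed` are names from this PR, but the signatures, the `SearchDeps` wiring, and the helper names below are assumptions for illustration.

```typescript
type SearchMode = 'hybrid' | 'keyword-only';

// Decide the mode from the environment: no API key means no embeddings.
function chooseSearchMode(env: Record<string, string | undefined>): SearchMode {
  return env.OPENAI_API_KEY ? 'hybrid' : 'keyword-only';
}

// Assumed minimal shapes; the real SearchResult type is richer.
interface Hit {
  slug: string;
  score: number;
}

interface SearchDeps {
  searchKeyword(query: string): Promise<Hit[]>;
  searchVector(embedding: number[]): Promise<Hit[]>;
  embed(query: string): Promise<number[]>;
  fuse(keywordHits: Hit[], vectorHits: Hit[]): Hit[];
}

// Hypothetical hybrid search: embed() is never called in keyword-only mode,
// so a missing key degrades results instead of throwing.
async function hybridSearchSketch(
  query: string,
  deps: SearchDeps,
  env: Record<string, string | undefined>,
): Promise<Hit[]> {
  const keywordHits = await deps.searchKeyword(query);
  if (chooseSearchMode(env) === 'keyword-only') return keywordHits;
  const vectorHits = await deps.searchVector(await deps.embed(query));
  return deps.fuse(keywordHits, vectorHits);
}

console.log(chooseSearchMode({}));
console.log(chooseSearchMode({ OPENAI_API_KEY: 'sk-example' }));
```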
|
||||
|
||||
**When to use PGLite vs Postgres:**
|
||||
|
||||
| Factor | PGLite | PostgresEngine + Supabase |
|--------|--------|--------------------------|
| Setup | `gbrain init` (zero-config) | Account + connection string |
| Scale | Good for < 1,000 files | Production-proven at 10K+ |
| Multi-device | Single machine only | Any device via remote MCP |
| Cost | Free | Supabase Pro ($25/mo) |
| Concurrency | Single process | Connection pooling |
| Backups | Manual (file copy) | Managed by Supabase |
|
||||
|
||||
**Migration:** `gbrain migrate --to supabase` exports everything (pages, chunks, embeddings, links, tags, timeline) and imports into Supabase. `gbrain migrate --to pglite` goes the other direction. Bidirectional, lossless.
|
||||
|
||||
## Adding a new engine
|
||||
|
||||
1. Create `src/core/<name>-engine.ts` implementing `BrainEngine`
|
||||
2. Add to engine factory in `src/core/engine.ts`:
|
||||
2. Add to engine factory in `src/core/engine-factory.ts`:
|
||||
```typescript
|
||||
export function createEngine(type: string): BrainEngine {
|
||||
switch (type) {
|
||||
case 'pglite': return new PGLiteEngine();
|
||||
case 'postgres': return new PostgresEngine();
|
||||
case 'sqlite': return new SQLiteEngine();
|
||||
case 'myengine': return new MyEngine();
|
||||
default: throw new Error(`Unknown engine: ${type}`);
|
||||
}
|
||||
}
|
||||
```
|
||||
3. Store engine type in `~/.gbrain/config.json`: `{ "engine": "sqlite", ... }`
|
||||
The factory uses dynamic imports so engines are only loaded when selected.
|
||||
3. Store engine type in `~/.gbrain/config.json`: `{ "engine": "myengine", ... }`
|
||||
4. Add tests. The test suite should be engine-agnostic where possible: same test cases, different engine constructor.
|
||||
5. Document in this file + add a design doc in `docs/`
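A sketch of what "only loaded when selected" can look like: a registry of loader functions, where each loader would wrap a dynamic `import()`. The loaders here return plain objects so the example runs on its own; `EngineSketch`, `loaders`, `createEngineLazy`, and the module path in the comment are hypothetical, and the real factory's signature may differ.

```typescript
// Stand-in for BrainEngine; the real interface has many more methods.
interface EngineSketch {
  readonly name: string;
}

type EngineLoader = () => Promise<EngineSketch>;

// Records which engines were actually loaded, to make the laziness observable.
const loaded: string[] = [];

// In a real factory each loader would look something like:
//   async () => new (await import('./pglite-engine')).PGLiteEngine()
// so the module (and its WASM payload) is only fetched when that engine is chosen.
const loaders: Record<string, EngineLoader> = {
  pglite: async () => {
    loaded.push('pglite');
    return { name: 'pglite' };
  },
  postgres: async () => {
    loaded.push('postgres');
    return { name: 'postgres' };
  },
};

async function createEngineLazy(type: string): Promise<EngineSketch> {
  const load = loaders[type];
  if (!load) throw new Error(`Unknown engine: ${type}`);
  return load();
}

createEngineLazy('pglite').then((engine) => {
  console.log(engine.name, loaded); // only the selected engine was loaded
});
```

Adding an engine then means adding one entry to the registry; nothing else pays its import cost.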
|
||||
|
||||
@@ -175,24 +210,25 @@ Every method in `BrainEngine`. The full interface. No optional methods, no featu
|
||||
|
||||
## Capability matrix
|
||||
|
||||
| Capability | PostgresEngine | SQLiteEngine (future) | Notes |
|
||||
|-----------|---------------|----------------------|-------|
|
||||
| CRUD | Full | Full | |
|
||||
| Keyword search | tsvector + ts_rank | FTS5 + bm25 | Different ranking algorithms |
|
||||
| Vector search | pgvector HNSW | sqlite-vss or vec0 | Different index types |
|
||||
| Fuzzy slug | pg_trgm | LIKE + Levenshtein | Postgres is better here |
|
||||
| Graph traversal | Recursive CTE | Loop with depth tracking | Same interface |
|
||||
| Capability | PostgresEngine | PGLiteEngine | Notes |
|
||||
|-----------|---------------|-------------|-------|
|
||||
| CRUD | Full | Full | Same SQL |
|
||||
| Keyword search | tsvector + ts_rank | tsvector + ts_rank | Identical (real Postgres) |
|
||||
| Vector search | pgvector HNSW | pgvector HNSW | Identical (real Postgres) |
|
||||
| Fuzzy slug | pg_trgm | pg_trgm | Identical (real Postgres) |
|
||||
| Graph traversal | Recursive CTE | Recursive CTE | Same SQL |
|
||||
| Transactions | Full ACID | Full ACID | Both support this |
|
||||
| JSONB queries | GIN index | json_extract | Postgres is richer |
|
||||
| Concurrent access | Connection pooling | Single writer | SQLite limitation |
|
||||
| JSONB queries | GIN index | GIN index | Identical |
|
||||
| Concurrent access | Connection pooling | Single process | PGLite limitation |
|
||||
| Hosting | Supabase, self-hosted, Docker | Local file | |
|
||||
| Migration methods | runMigration, getChunksWithEmbeddings | Same | Added v0.7 |
|
||||
|
||||
## Future engine ideas
|
||||
|
||||
**SQLiteEngine** (most requested). See `docs/SQLITE_ENGINE.md` for the full plan. Single file, no server, git-friendly. Uses FTS5 for keyword search, sqlite-vss or vec0 for vector search. Great for open source users who want zero infrastructure.
|
||||
|
||||
**TursoEngine.** libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.
|
||||
|
||||
**DuckDBEngine.** Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.
|
||||
|
||||
**Custom/Remote.** The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. The interface doesn't assume SQL.
|
||||
|
||||
Note: The original SQLite engine plan (`docs/SQLITE_ENGINE.md`) was superseded by PGLite. PGLite uses the same SQL as Postgres, eliminating the need for a separate SQLite dialect with FTS5/sqlite-vss translation.
|
||||
|
||||
@@ -1,395 +0,0 @@
|
||||
# SQLite Engine Design
|
||||
|
||||
## Status: Designed, not built. Community PRs welcome.
|
||||
|
||||
The pluggable engine interface (`docs/ENGINES.md`) means anyone can add a SQLite backend without touching the CLI, MCP server, or skills. This document is the full plan.
|
||||
|
||||
## Why SQLite
|
||||
|
||||
Postgres is the right choice for the primary user (7K+ pages, production RAG, zero-ops via Supabase). But a lot of people want something simpler:
|
||||
|
||||
- **No server.** One file. `brain.db`. Done.
|
||||
- **Git-friendly.** You can (with care) commit a SQLite database alongside your notes.
|
||||
- **Offline.** Works on a plane, in a coffee shop, wherever.
|
||||
- **Zero cost.** No Supabase subscription. No hosting. No API keys for search (keyword-only mode works without OpenAI).
|
||||
- **Portable.** Copy the file to another machine. That's it.
|
||||
|
||||
Tools like Khoj, Obsidian plugins, and various "local-first AI" projects already use SQLite with vector extensions. The patterns exist. This is well-trodden ground.
|
||||
|
||||
## What it gives up
|
||||
|
||||
Compared to PostgresEngine:
|
||||
|
||||
| Feature | Postgres | SQLite | Impact |
|---------|----------|--------|--------|
| Full-text search quality | tsvector + ts_rank (excellent) | FTS5 + bm25 (good) | Slightly less precise ranking |
| Fuzzy slug matching | pg_trgm (excellent) | LIKE + Levenshtein (ok) | Fuzzier matching, more false positives |
| Vector search | pgvector HNSW (fast, accurate) | sqlite-vss or vec0 (good enough) | Slower at scale, good for <50K chunks |
| Concurrent access | Connection pooling, many readers/writers | Single writer, many readers | Not an issue for single-user CLI |
| JSONB queries | GIN index, rich operators | json_extract, no index | Slower frontmatter queries |
| Graph traversal | Recursive CTE (native) | Recursive CTE (supported since 3.8.3) | Same |
| Hosted option | Supabase, RDS, etc. | Turso (libSQL), Cloudflare D1 | SQLite has cloud options too |
|
||||
|
||||
For a single user with <10K pages and no concurrent access needs, these tradeoffs are fine.
|
||||
|
||||
## Schema
|
||||
|
||||
SQLite equivalent of the Postgres schema. Key differences called out.
|
||||
|
||||
```sql
|
||||
-- Enable WAL mode for better read concurrency
PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;

-- ============================================================
-- pages
-- ============================================================
CREATE TABLE pages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  slug TEXT NOT NULL UNIQUE,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  compiled_truth TEXT NOT NULL DEFAULT '',
  timeline TEXT NOT NULL DEFAULT '',
  frontmatter TEXT NOT NULL DEFAULT '{}', -- JSON string, not JSONB
  content_hash TEXT, -- SHA-256 for import idempotency
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_pages_type ON pages(type);

-- ============================================================
-- Full-text search via FTS5 (replaces tsvector)
-- ============================================================
CREATE VIRTUAL TABLE pages_fts USING fts5(
  title,
  compiled_truth,
  timeline,
  content='pages',
  content_rowid='id',
  tokenize='porter unicode61'
);

-- Triggers to keep FTS5 in sync
CREATE TRIGGER pages_fts_insert AFTER INSERT ON pages BEGIN
  INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
  VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;

CREATE TRIGGER pages_fts_update AFTER UPDATE ON pages BEGIN
  INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
  VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
  INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
  VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;

CREATE TRIGGER pages_fts_delete AFTER DELETE ON pages BEGIN
  INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
  VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
END;
|
||||
|
||||
-- ============================================================
|
||||
-- content_chunks
|
||||
-- ============================================================
|
||||
CREATE TABLE content_chunks (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
|
||||
chunk_index INTEGER NOT NULL,
|
||||
chunk_text TEXT NOT NULL,
|
||||
chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
|
||||
embedding BLOB, -- Float32Array as raw bytes
|
||||
model TEXT NOT NULL DEFAULT 'text-embedding-3-large',
|
||||
token_count INTEGER,
|
||||
embedded_at TEXT,
|
||||
created_at TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX idx_chunks_page ON content_chunks(page_id);
|
||||
|
||||
-- Vector search index created separately via sqlite-vss or vec0
|
||||
-- See "Vector search options" section below
|
||||
|
||||
-- ============================================================
-- links
-- ============================================================
CREATE TABLE links (
  id           INTEGER PRIMARY KEY AUTOINCREMENT,
  from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  to_page_id   INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  link_type    TEXT    NOT NULL DEFAULT '',
  context      TEXT    NOT NULL DEFAULT '',
  created_at   TEXT    NOT NULL DEFAULT (datetime('now')),
  UNIQUE(from_page_id, to_page_id)
);

CREATE INDEX idx_links_from ON links(from_page_id);
CREATE INDEX idx_links_to ON links(to_page_id);

-- ============================================================
-- tags
-- ============================================================
CREATE TABLE tags (
  id      INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  tag     TEXT    NOT NULL,
  UNIQUE(page_id, tag)
);

CREATE INDEX idx_tags_tag ON tags(tag);
CREATE INDEX idx_tags_page_id ON tags(page_id);

-- ============================================================
-- raw_data
-- ============================================================
CREATE TABLE raw_data (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id    INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  source     TEXT    NOT NULL,
  data       TEXT    NOT NULL,               -- JSON string
  fetched_at TEXT    NOT NULL DEFAULT (datetime('now')),
  UNIQUE(page_id, source)
);

CREATE INDEX idx_raw_data_page ON raw_data(page_id);

-- ============================================================
-- timeline_entries
-- ============================================================
CREATE TABLE timeline_entries (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id    INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  date       TEXT    NOT NULL,               -- ISO date string
  source     TEXT    NOT NULL DEFAULT '',
  summary    TEXT    NOT NULL,
  detail     TEXT    NOT NULL DEFAULT '',
  created_at TEXT    NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX idx_timeline_date ON timeline_entries(date);

-- ============================================================
-- page_versions
-- ============================================================
CREATE TABLE page_versions (
  id             INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id        INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  compiled_truth TEXT    NOT NULL,
  frontmatter    TEXT    NOT NULL DEFAULT '{}',
  snapshot_at    TEXT    NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_versions_page ON page_versions(page_id);

-- ============================================================
-- ingest_log
-- ============================================================
CREATE TABLE ingest_log (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  source_type   TEXT    NOT NULL,
  source_ref    TEXT    NOT NULL,
  pages_updated TEXT    NOT NULL DEFAULT '[]',  -- JSON array
  summary       TEXT    NOT NULL DEFAULT '',
  created_at    TEXT    NOT NULL DEFAULT (datetime('now'))
);

-- ============================================================
-- config
-- ============================================================
CREATE TABLE config (
  key   TEXT PRIMARY KEY,
  value TEXT NOT NULL
);

INSERT INTO config (key, value) VALUES
  ('version', '1'),
  ('engine', 'sqlite'),
  ('embedding_model', 'text-embedding-3-large'),
  ('embedding_dimensions', '1536'),
  ('chunk_strategy', 'semantic');
```

### Key differences from Postgres schema

| Feature | Postgres | SQLite |
|---------|----------|--------|
| Types | `SERIAL`, `TIMESTAMPTZ`, `JSONB`, `vector(1536)` | `INTEGER`, `TEXT`, `TEXT` (JSON), `BLOB` |
| Full-text search | `tsvector` generated column + GIN | FTS5 virtual table + triggers |
| Vector storage | `vector(1536)` column type | `BLOB` (raw Float32Array bytes) |
| Vector index | HNSW via pgvector | Separate via sqlite-vss or vec0 |
| Fuzzy search | `pg_trgm` GIN index | LIKE queries or Levenshtein UDF |
| JSON queries | `JSONB` + GIN index | `json_extract()` function |
| Timestamps | `TIMESTAMPTZ` (native) | `TEXT` with ISO format |

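
The `BLOB` row above stores each embedding as the raw bytes of a `Float32Array`. A minimal sketch of that round trip is shown here; `embeddingToBlob` and `blobToEmbedding` are hypothetical helper names (not part of gbrain), and the bytes are assumed to be in the platform's native byte order, which is little-endian on common hardware.

```typescript
// Hypothetical helpers, for illustration only.

// Serialize an embedding to the raw bytes stored in content_chunks.embedding.
function embeddingToBlob(embedding: Float32Array): Uint8Array {
  // View the same memory as bytes, then copy so the result owns its buffer.
  return new Uint8Array(
    embedding.buffer,
    embedding.byteOffset,
    embedding.byteLength,
  ).slice();
}

// Rebuild the embedding from a BLOB read back from SQLite.
function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`BLOB length ${blob.byteLength} is not a multiple of 4`);
  }
  // Copy first: driver-provided buffers may start at an offset that is not
  // 4-byte aligned, which Float32Array views do not allow.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}
```

A 1536-dimension embedding occupies 1536 × 4 = 6144 bytes per chunk in this layout.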
## Vector search options

Two main choices for vector search in SQLite:

### Option A: sqlite-vss (Alex Garcia)

```sql
-- Load extension
.load ./vector0
.load ./vss0

-- Create virtual table linked to content_chunks
CREATE VIRTUAL TABLE chunks_vss USING vss0(
  embedding(1536)
);

-- Insert embeddings (linked by rowid to content_chunks)
INSERT INTO chunks_vss(rowid, embedding)
  SELECT id, embedding FROM content_chunks WHERE embedding IS NOT NULL;

-- Search
SELECT rowid, distance
FROM chunks_vss
WHERE vss_search(embedding, :query_embedding)
LIMIT 20;
```

Pros: mature, well-documented, used by many projects.
Cons: requires loading native extensions (platform-specific binaries).

### Option B: vec0 (newer, from the same author)

```sql
-- Create virtual table
CREATE VIRTUAL TABLE chunks_vec USING vec0(
  chunk_id INTEGER PRIMARY KEY,
  embedding float[1536]
);

-- Search
SELECT chunk_id, distance
FROM chunks_vec
WHERE embedding MATCH :query_embedding
ORDER BY distance
LIMIT 20;
```

Pros: simpler API, better integration with SQLite ecosystem.
Cons: newer, less battle-tested.

### Option C: No vector search (keyword only)

For users who don't want to deal with vector extensions or OpenAI API keys, the brain still works with keyword search only. FTS5 + bm25 is genuinely good for structured wiki content where you know the terms. `searchVector` returns `[]`, and hybrid search degrades gracefully to keyword-only.

This is a valid configuration. Not everyone needs embeddings.

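
The graceful degradation described above can be sketched as a pure merge step. This is an illustration under stated assumptions, not gbrain's actual `hybridSearch`: the `SearchResult` shape is simplified, `fuseResults` is a hypothetical name, and reciprocal rank fusion is an assumed merge strategy picked because it needs no score normalization between the two rankers.

```typescript
// Simplified, hypothetical result shape; the real SearchResult may carry more fields.
interface SearchResult {
  slug: string;
  score: number;
}

// Merge keyword and vector hits (both assumed sorted best-first).
function fuseResults(
  keyword: SearchResult[],
  vector: SearchResult[],
  limit = 20,
  k = 60, // conventional damping constant for reciprocal rank fusion
): SearchResult[] {
  // Keyword-only mode: searchVector returned [], so pass keyword hits through.
  if (vector.length === 0) return keyword.slice(0, limit);

  const fused = new Map<string, number>();
  for (const list of [keyword, vector]) {
    list.forEach((hit, rank) => {
      // Each list contributes 1 / (k + position), with positions starting at 1.
      fused.set(hit.slug, (fused.get(hit.slug) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(fused.entries())
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With no embeddings configured, the caller passes an empty vector list and gets the FTS5 ranking back unchanged, so nothing downstream needs to know which mode is active.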
## Init flow for SQLite

```bash
gbrain init --sqlite
# or: gbrain init --sqlite --path ~/brain.db

# 1. Create database file at specified path (default: ~/.gbrain/brain.db)
# 2. Run schema (all CREATE TABLE + FTS5 + triggers)
# 3. Write config to ~/.gbrain/config.json:
#    { "engine": "sqlite", "database_path": "~/.gbrain/brain.db" }
# 4. Verify database is ready for import
# 5. "Brain ready. Run: gbrain import <your-repo>"
```

No Supabase account needed. No API keys needed (keyword-only mode). No server. Just a file.

For vector search, the user additionally needs:
- OpenAI API key in `~/.gbrain/config.json` or `OPENAI_API_KEY` env var
- sqlite-vss or vec0 extension binary for their platform

## Fuzzy slug resolution without pg_trgm

Postgres uses a `pg_trgm` GIN index for fast fuzzy matching. SQLite has no built-in equivalent. Options:

1. **LIKE with wildcards.** `WHERE slug LIKE '%dont%scale%'`. Simple, works for partial matches, but no ranking.
2. **Levenshtein distance via UDF.** Load a user-defined function (or implement in TS) that computes edit distance. Sort by distance. Slower but more accurate.
3. **Trigram simulation in TS.** Compute trigrams in TypeScript, store in a separate table, query by trigram overlap. Fast but requires maintaining the trigram index.

Recommendation: start with LIKE and fall back to a Levenshtein UDF. That is good enough for a single user with fewer than 10K pages.

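
The recommended LIKE-then-Levenshtein order can be sketched in plain TypeScript over an in-memory slug list. All names here (`likeMatch`, `levenshtein`, `resolveSlug`) are hypothetical, and two details are assumptions of this sketch rather than anything the design fixes: edit distance is measured against the last path segment of each slug, and the cutoff defaults to 3 edits.

```typescript
// Emulates WHERE slug LIKE '%part1%part2%': parts must appear in order.
function likeMatch(slug: string, parts: string[]): boolean {
  let pos = 0;
  for (const part of parts) {
    const idx = slug.indexOf(part, pos);
    if (idx === -1) return false;
    pos = idx + part.length;
  }
  return true;
}

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Step 1: ordered substring match. Step 2: fall back to edit distance.
function resolveSlug(input: string, slugs: string[], maxDistance = 3): string[] {
  const needle = input.toLowerCase();
  const parts = needle.split(/[^a-z0-9]+/).filter(Boolean);
  const likeHits = slugs.filter((s) => likeMatch(s.toLowerCase(), parts));
  if (likeHits.length > 0) return likeHits;

  return slugs
    .map((slug) => {
      // Assumption: compare against the last path segment only.
      const segment = (slug.split("/").pop() ?? slug).toLowerCase();
      return { slug, distance: levenshtein(needle, segment) };
    })
    .filter((c) => c.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .map((c) => c.slug);
}
```

In a real engine the first step would be the SQL `LIKE` query itself; only the fallback needs to run in TypeScript or as a UDF.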
## Implementation roadmap

If you're building this, here's the order:

1. **`src/core/sqlite-engine.ts`** implementing `BrainEngine`
2. **Schema migration** (the SQL above)
3. **CRUD operations** (getPage, putPage, listPages, deletePage). Straightforward SQL.
4. **FTS5 keyword search** (searchKeyword). Map `websearch_to_tsquery` semantics to FTS5 query syntax.
5. **Tags, links, timeline, raw_data, versions, config, ingest_log.** All straightforward.
6. **Graph traversal.** SQLite supports recursive CTEs since 3.8.3. Port the Postgres CTE with max depth.
7. **Vector search** (optional). Pick sqlite-vss or vec0, implement searchVector.
8. **Tests.** Port the Postgres test suite. Most tests should be engine-agnostic.

Steps 1-6 are purely mechanical. Step 7 is the only one that requires a native extension.

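
Step 4 is the one place where query syntax has to be translated, so a rough sketch may help. The function below is a hypothetical `toFts5Query` that covers only the surface syntax of `websearch_to_tsquery` (bare words are AND-ed, `"quoted text"` is a phrase, `or` becomes OR, a leading `-` negates). It does not reproduce Postgres stemming or stop-word handling, and it returns an empty string for purely negative queries because FTS5's `NOT` is a binary operator.

```typescript
// Hypothetical translator from websearch-style input to an FTS5 MATCH string.
function toFts5Query(input: string): string {
  // Either an optionally negated "quoted phrase" or an optionally negated bare word.
  const tokenRe = /(-?)"([^"]*)"|(-?)(\S+)/g;
  const groups: string[][] = [[]]; // OR-separated groups of AND-ed terms
  const negated: string[] = [];

  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(input)) !== null) {
    const isPhrase = m[2] !== undefined;
    const isNegated = (isPhrase ? m[1] : m[3]) === "-";
    const text = ((isPhrase ? m[2] : m[4]) ?? "").trim();
    if (!text) continue;

    // A bare, un-negated "or" starts a new alternative.
    if (!isPhrase && !isNegated && text.toLowerCase() === "or") {
      groups.push([]);
      continue;
    }
    // Quote every term so FTS5 treats it as a string, doubling embedded quotes.
    const quoted = `"${text.replace(/"/g, '""')}"`;
    if (isNegated) negated.push(quoted);
    else groups[groups.length - 1].push(quoted);
  }

  const positive = groups
    .filter((g) => g.length > 0)
    .map((g) => g.join(" AND "))
    .join(" OR ");
  if (!positive) return ""; // nothing to subtract the negated terms from
  if (negated.length === 0) return positive;
  // Parenthesize so NOT applies to the whole positive expression.
  return negated.reduce((q, n) => `${q} NOT ${n}`, `(${positive})`);
}
```

For example, `"do things" or scale -startup` becomes `("do things" OR "scale") NOT "startup"`, which can be bound as the parameter of `pages_fts MATCH ?`.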
## Dependencies for SQLite engine

```json
{
  "better-sqlite3": "^11.0.0"
}
```

Or use Bun's built-in `bun:sqlite` driver (zero dependency).

For vector search, add one of:
- `sqlite-vss` (native extension, platform-specific)
- `sqlite-vec`, which provides the `vec0` virtual table (native extension, platform-specific)

## Testing strategy

Most test cases should be engine-agnostic. The test runner should parameterize by engine:

```typescript
const engines = [
  { name: 'postgres', factory: () => new PostgresEngine() },
  { name: 'sqlite', factory: () => new SQLiteEngine() },
];

for (const { name, factory } of engines) {
  describe(`BrainEngine (${name})`, () => {
    const engine = factory();

    test('putPage + getPage round-trip', async () => {
      await engine.putPage('test/slug', { title: 'Test', type: 'person', ... });
      const page = await engine.getPage('test/slug');
      expect(page.title).toBe('Test');
    });

    // ... all CRUD, search, link, tag, timeline tests
  });
}
```

Search tests may need engine-specific assertions (ranking differences between tsvector and FTS5 are expected). But the interface contract (returns SearchResult[], sorted by relevance) should hold across engines.

## File structure

```
brain.db                  # ~750MB for 7K pages with embeddings
                          # ~150MB without embeddings (keyword-only)
~/.gbrain/config.json     # { "engine": "sqlite", "database_path": "..." }
```

That's it. One file for the brain. One file for config.

## Migration between engines

Future work: `gbrain migrate --from postgres --to sqlite` (and vice versa). The engine interface makes this straightforward: export all data via one engine's methods, then import via the other's. The data model is the same; only the storage format changes.

This is not built yet. For now, `gbrain export` to markdown and `gbrain import` into the other engine achieves the same result (with re-chunking and re-embedding).

## Contributing

If you want to build this:

1. Fork the repo
2. Create `src/core/sqlite-engine.ts`
3. Use the schema from this document
4. Run the existing test suite against your engine
5. PR it

The interface is well-defined. The schema is documented. The test suite exists. This should be a few days of focused work with CC, or a weekend project for a human.

We'd love to see it.
@@ -32,6 +32,7 @@
   "dependencies": {
     "@anthropic-ai/sdk": "^0.30.0",
     "@aws-sdk/client-s3": "^3.1028.0",
+    "@electric-sql/pglite": "^0.4.4",
     "@modelcontextprotocol/sdk": "^1.0.0",
     "gray-matter": "^4.0.3",
     "openai": "^4.0.0",
16
src/cli.ts
@@ -1,7 +1,6 @@
 #!/usr/bin/env bun

 import { readFileSync } from 'fs';
-import { PostgresEngine } from './core/postgres-engine.ts';
 import { loadConfig, toEngineConfig } from './core/config.ts';
 import type { BrainEngine } from './core/engine.ts';
 import { operations, OperationError } from './core/operations.ts';
@@ -19,7 +18,7 @@ for (const op of operations) {
 }

 // CLI-only commands that bypass the operation layer
-const CLI_ONLY = new Set(['init', 'upgrade', 'check-update', 'integrations', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor']);
+const CLI_ONLY = new Set(['init', 'upgrade', 'check-update', 'integrations', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate']);

 async function main() {
   const args = process.argv.slice(2);
@@ -288,6 +287,11 @@ async function handleCliOnly(command: string, args: string[]) {
         await runDoctor(engine, args);
         break;
       }
+      case 'migrate': {
+        const { runMigrateEngine } = await import('./commands/migrate-engine.ts');
+        await runMigrateEngine(engine, args);
+        break;
+      }
     }
   } finally {
     if (command !== 'serve') await engine.disconnect();
@@ -297,10 +301,11 @@ async function handleCliOnly(command: string, args: string[]) {
 async function connectEngine(): Promise<BrainEngine> {
   const config = loadConfig();
   if (!config) {
-    console.error('No brain configured. Run: gbrain init --supabase');
+    console.error('No brain configured. Run: gbrain init');
     process.exit(1);
   }
-  const engine = new PostgresEngine();
+  const { createEngine } = await import('./core/engine-factory.ts');
+  const engine = await createEngine(toEngineConfig(config));
   await engine.connect(toEngineConfig(config));
   return engine;
 }
@@ -333,7 +338,8 @@ USAGE
   gbrain <command> [options]

 SETUP
-  init [--supabase|--url <conn>]     Create brain (guided wizard)
+  init [--pglite|--supabase|--url]   Create brain (PGLite default, no server)
+  migrate --to <supabase|pglite>     Transfer brain between engines
   upgrade                            Self-update
   check-update [--json]              Check for new versions
   doctor [--json]                    Health check (pgvector, RLS, schema, embeddings)
@@ -3,7 +3,6 @@ import { execFileSync } from 'child_process';
 import { join, relative } from 'path';
 import { cpus, totalmem, homedir } from 'os';
 import type { BrainEngine } from '../core/engine.ts';
-import { PostgresEngine } from '../core/postgres-engine.ts';
 import { importFile } from '../core/import-file.ts';
 import { loadConfig } from '../core/config.ts';

@@ -127,11 +126,19 @@ export async function runImport(engine: BrainEngine, args: string[]) {

   if (actualWorkers > 1) {
     // Parallel: create per-worker engine instances with small pool
+    // PGLite is single-connection, so parallel workers are only for Postgres
     const config = loadConfig();
+    if (config?.engine === 'pglite') {
+      // PGLite: sequential import through single engine
+      for (const file of files) {
+        await processFile(engine, file);
+      }
+    } else {
+    const { PostgresEngine } = await import('../core/postgres-engine.ts');
     const workerEngines = await Promise.all(
       Array.from({ length: actualWorkers }, async () => {
         const eng = new PostgresEngine();
-        await eng.connect({ database_url: config.database_url!, poolSize: 2 });
+        await eng.connect({ database_url: config!.database_url!, poolSize: 2 });
         return eng;
       })
     );
@@ -147,6 +154,7 @@ export async function runImport(engine: BrainEngine, args: string[]) {
     }));

     await Promise.all(workerEngines.map(e => e.disconnect()));
+    } // end else (postgres parallel)
   } else {
     // Sequential: use the provided engine
     for (const filePath of files) {
@@ -1,18 +1,43 @@
|
||||
import { execSync } from 'child_process';
|
||||
import { PostgresEngine } from '../core/postgres-engine.ts';
|
||||
import { readdirSync, statSync } from 'fs';
|
||||
import { join } from 'path';
|
||||
import { homedir } from 'os';
|
||||
import { saveConfig, type GBrainConfig } from '../core/config.ts';
|
||||
import { createEngine } from '../core/engine-factory.ts';
|
||||
|
||||
export async function runInit(args: string[]) {
|
||||
const isSupabase = args.includes('--supabase');
|
||||
const isPGLite = args.includes('--pglite');
|
||||
const isNonInteractive = args.includes('--non-interactive');
|
||||
const jsonOutput = args.includes('--json');
|
||||
const urlIndex = args.indexOf('--url');
|
||||
const manualUrl = urlIndex !== -1 ? args[urlIndex + 1] : null;
|
||||
const keyIndex = args.indexOf('--key');
|
||||
const apiKey = keyIndex !== -1 ? args[keyIndex + 1] : null;
|
||||
const pathIndex = args.indexOf('--path');
|
||||
const customPath = pathIndex !== -1 ? args[pathIndex + 1] : null;
|
||||
|
||||
// PGLite mode: explicit --pglite, or the default when no other engine is requested
|
||||
if (isPGLite || (!isSupabase && !manualUrl && !isNonInteractive)) {
|
||||
// Smart detection: scan for .md files unless --pglite flag forces it
|
||||
if (!isPGLite && !isSupabase) {
|
||||
const fileCount = countMarkdownFiles(process.cwd());
|
||||
if (fileCount >= 1000) {
|
||||
console.log(`Found ~${fileCount} .md files. For a brain this size, Supabase gives faster`);
|
||||
console.log('search and remote access ($25/mo). PGLite works too but search will be slower at scale.');
|
||||
console.log('');
|
||||
console.log(' gbrain init --supabase Set up with Supabase (recommended for large brains)');
|
||||
console.log(' gbrain init --pglite Use local PGLite anyway');
|
||||
console.log('');
|
||||
// Default to PGLite, let the user choose Supabase if they want
|
||||
}
|
||||
}
|
||||
|
||||
return initPGLite({ jsonOutput, apiKey, customPath });
|
||||
}
|
||||
|
||||
// Supabase/Postgres mode
|
||||
let databaseUrl: string;
|
||||
|
||||
if (manualUrl) {
|
||||
databaseUrl = manualUrl;
|
||||
} else if (isNonInteractive) {
|
||||
@@ -23,12 +48,45 @@ export async function runInit(args: string[]) {
|
||||
console.error('--non-interactive requires --url <connection_string> or GBRAIN_DATABASE_URL env var');
|
||||
process.exit(1);
|
||||
}
|
||||
} else if (isSupabase) {
|
||||
databaseUrl = await supabaseWizard();
|
||||
} else {
|
||||
databaseUrl = await supabaseWizard();
|
||||
}
|
||||
|
||||
return initPostgres({ databaseUrl, jsonOutput, apiKey });
|
||||
}
|
||||
|
||||
async function initPGLite(opts: { jsonOutput: boolean; apiKey: string | null; customPath: string | null }) {
|
||||
const dbPath = opts.customPath || join(homedir(), '.gbrain', 'brain.pglite');
|
||||
console.log(`Setting up local brain with PGLite (no server needed)...`);
|
||||
|
||||
const engine = await createEngine({ engine: 'pglite' });
|
||||
await engine.connect({ database_path: dbPath, engine: 'pglite' });
|
||||
await engine.initSchema();
|
||||
|
||||
const config: GBrainConfig = {
|
||||
engine: 'pglite',
|
||||
database_path: dbPath,
|
||||
...(opts.apiKey ? { openai_api_key: opts.apiKey } : {}),
|
||||
};
|
||||
saveConfig(config);
|
||||
|
||||
const stats = await engine.getStats();
|
||||
await engine.disconnect();
|
||||
|
||||
if (opts.jsonOutput) {
|
||||
console.log(JSON.stringify({ status: 'success', engine: 'pglite', path: dbPath, pages: stats.page_count }));
|
||||
} else {
|
||||
console.log(`\nBrain ready at ${dbPath}`);
|
||||
console.log(`${stats.page_count} pages. Engine: PGLite (local Postgres).`);
|
||||
console.log('Next: gbrain import <dir>');
|
||||
console.log('');
|
||||
console.log('When you outgrow local: gbrain migrate --to supabase');
|
||||
}
|
||||
}
|
||||
|
||||
async function initPostgres(opts: { databaseUrl: string; jsonOutput: boolean; apiKey: string | null }) {
|
||||
const { databaseUrl } = opts;
|
||||
|
||||
// Detect Supabase direct connection URLs and warn about IPv6
|
||||
if (databaseUrl.match(/db\.[a-z]+\.supabase\.co/) || databaseUrl.includes('.supabase.co:5432')) {
|
||||
console.warn('');
|
||||
@@ -40,19 +98,15 @@ export async function runInit(args: string[]) {
|
||||
console.warn('');
|
||||
}
|
||||
|
||||
// Connect and init schema
|
||||
console.log('Connecting to database...');
|
||||
const engine = new PostgresEngine();
|
||||
const engine = await createEngine({ engine: 'postgres' });
|
||||
try {
|
||||
await engine.connect({ database_url: databaseUrl });
|
||||
} catch (e: unknown) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
// Provide better error for Supabase IPv6 failures
|
||||
if (databaseUrl.includes('supabase.co') && (msg.includes('ECONNREFUSED') || msg.includes('ETIMEDOUT'))) {
|
||||
console.error('Connection failed. Supabase direct connections (db.*.supabase.co:5432) are IPv6 only.');
|
||||
console.error('Use the Session pooler connection string instead (port 6543):');
|
||||
console.error(' Supabase Dashboard > gear icon (Project Settings) > Database >');
|
||||
console.error(' Connection string > URI tab > change dropdown to "Session pooler"');
|
||||
console.error('Use the Session pooler connection string instead (port 6543).');
|
||||
}
|
||||
throw e;
|
||||
}
|
||||
@@ -74,36 +128,56 @@ export async function runInit(args: string[]) {
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
// Non-fatal: proceed without pgvector check if query fails
|
||||
// Non-fatal
|
||||
}
|
||||
|
||||
console.log('Running schema migration...');
|
||||
await engine.initSchema();
|
||||
|
||||
// Save config
|
||||
const config: GBrainConfig = {
|
||||
engine: 'postgres',
|
||||
database_url: databaseUrl,
|
||||
...(apiKey ? { openai_api_key: apiKey } : {}),
|
||||
...(opts.apiKey ? { openai_api_key: opts.apiKey } : {}),
|
||||
};
|
||||
saveConfig(config);
|
||||
console.log('Config saved to ~/.gbrain/config.json');
|
||||
|
||||
// Verify
|
||||
const stats = await engine.getStats();
|
||||
await engine.disconnect();
|
||||
|
||||
if (jsonOutput) {
|
||||
console.log(JSON.stringify({ status: 'success', pages: stats.page_count, config_path: '~/.gbrain/config.json' }));
|
||||
if (opts.jsonOutput) {
|
||||
console.log(JSON.stringify({ status: 'success', engine: 'postgres', pages: stats.page_count }));
|
||||
} else {
|
||||
console.log(`\nBrain ready. ${stats.page_count} pages.`);
|
||||
console.log('Next: gbrain import <dir> to migrate your markdown.');
|
||||
console.log('Production agent guide: docs/GBRAIN_SKILLPACK.md');
|
||||
console.log(`\nBrain ready. ${stats.page_count} pages. Engine: Postgres (Supabase).`);
|
||||
console.log('Next: gbrain import <dir>');
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Quick count of .md files in a directory (stops early once maxScan files are counted, default 1500).
|
||||
*/
|
||||
function countMarkdownFiles(dir: string, maxScan = 1500): number {
|
||||
let count = 0;
|
||||
try {
|
||||
const scan = (d: string) => {
|
||||
if (count >= maxScan) return;
|
||||
for (const entry of readdirSync(d)) {
|
||||
if (count >= maxScan) return;
|
||||
if (entry.startsWith('.') || entry === 'node_modules') continue;
|
||||
const full = join(d, entry);
|
||||
try {
|
||||
const stat = statSync(full);
|
||||
if (stat.isDirectory()) scan(full);
|
||||
else if (entry.endsWith('.md')) count++;
|
||||
} catch { /* skip unreadable */ }
|
||||
}
|
||||
};
|
||||
scan(dir);
|
||||
} catch { /* skip unreadable root */ }
|
||||
return count;
|
||||
}
|
||||
|
||||
async function supabaseWizard(): Promise<string> {
|
||||
// Try Supabase CLI auto-provision
|
||||
try {
|
||||
execSync('bunx supabase --version', { stdio: 'pipe' });
|
||||
console.log('Supabase CLI detected.');
|
||||
@@ -111,10 +185,8 @@ async function supabaseWizard(): Promise<string> {
|
||||
console.log('Then use: gbrain init --url <your-connection-string>');
|
||||
} catch {
|
||||
console.log('Supabase CLI not found.');
|
||||
console.log('Or provide a connection URL directly.');
|
||||
}
|
||||
|
||||
// Fallback to manual URL
|
||||
console.log('\nEnter your Supabase/Postgres connection URL:');
|
||||
console.log(' Format: postgresql://postgres.[ref]:[password]@aws-0-[region].pooler.supabase.com:6543/postgres');
|
||||
console.log(' Find it: Supabase Dashboard > Connect (top bar) > Connection String > Session Pooler\n');
|
||||
|
||||
246
src/commands/migrate-engine.ts
Normal file
@@ -0,0 +1,246 @@
+/**
+ * Engine migration: transfer brain data between PGLite and Postgres.
+ *
+ * Usage:
+ *   gbrain migrate --to supabase [--url <connection_string>]
+ *   gbrain migrate --to pglite [--path <db_path>]
+ *   gbrain migrate --to <engine> --force   (overwrite non-empty target)
+ */
+
+import { createEngine } from '../core/engine-factory.ts';
+import { loadConfig, saveConfig, toEngineConfig, type GBrainConfig } from '../core/config.ts';
+import type { BrainEngine } from '../core/engine.ts';
+import type { EngineConfig } from '../core/types.ts';
+import { homedir } from 'os';
+import { join } from 'path';
+import { writeFileSync, readFileSync, existsSync, unlinkSync } from 'fs';
+
+interface MigrateOpts {
+  targetEngine: 'postgres' | 'pglite';
+  targetUrl?: string;
+  targetPath?: string;
+  force: boolean;
+}
+
+function parseArgs(args: string[]): MigrateOpts {
+  const toIdx = args.indexOf('--to');
+  if (toIdx === -1 || !args[toIdx + 1]) {
+    throw new Error('Usage: gbrain migrate --to <supabase|pglite> [--url <url>] [--path <path>] [--force]');
+  }
+
+  const targetRaw = args[toIdx + 1];
+  const targetEngine = targetRaw === 'supabase' ? 'postgres' : targetRaw as 'postgres' | 'pglite';
+  if (targetEngine !== 'postgres' && targetEngine !== 'pglite') {
+    throw new Error(`Unknown target engine: "${targetRaw}". Use: supabase or pglite`);
+  }
+
+  const urlIdx = args.indexOf('--url');
+  const pathIdx = args.indexOf('--path');
+
+  return {
+    targetEngine,
+    targetUrl: urlIdx !== -1 ? args[urlIdx + 1] : undefined,
+    targetPath: pathIdx !== -1 ? args[pathIdx + 1] : undefined,
+    force: args.includes('--force'),
+  };
+}
+
+function getManifestPath(): string {
+  return join(homedir(), '.gbrain', 'migrate-manifest.json');
+}
+
+interface MigrateManifest {
+  completed_slugs: string[];
+  target_engine: string;
+  started_at: string;
+}
+
+function loadManifest(): MigrateManifest | null {
+  const path = getManifestPath();
+  if (!existsSync(path)) return null;
+  try {
+    return JSON.parse(readFileSync(path, 'utf-8'));
+  } catch {
+    return null;
+  }
+}
+
+function saveManifest(manifest: MigrateManifest): void {
+  writeFileSync(getManifestPath(), JSON.stringify(manifest, null, 2));
+}
+
+function clearManifest(): void {
+  const path = getManifestPath();
+  if (existsSync(path)) unlinkSync(path);
+}
+
+export async function runMigrateEngine(sourceEngine: BrainEngine, args: string[]): Promise<void> {
+  const opts = parseArgs(args);
+  const config = loadConfig();
+  if (!config) {
+    console.error('No brain configured. Run: gbrain init');
+    process.exit(1);
+  }
+
+  // Check source != target
+  if (config.engine === opts.targetEngine) {
+    console.error(`Already using ${opts.targetEngine} engine. Nothing to migrate.`);
+    process.exit(1);
+  }
+
+  // Build target config
+  const targetConfig: EngineConfig = { engine: opts.targetEngine };
+  if (opts.targetEngine === 'postgres') {
+    targetConfig.database_url = opts.targetUrl || process.env.GBRAIN_DATABASE_URL || process.env.DATABASE_URL;
+    if (!targetConfig.database_url) {
+      console.error('Target is Supabase but no connection string provided. Use: --url <connection_string>');
+      process.exit(1);
+    }
+  } else {
+    targetConfig.database_path = opts.targetPath || join(homedir(), '.gbrain', 'brain.pglite');
+  }
+
+  // Connect to target
+  console.log(`Connecting to target (${opts.targetEngine})...`);
+  const targetEngine = await createEngine(targetConfig);
+  await targetEngine.connect(targetConfig);
+  await targetEngine.initSchema();
+
+  // Check if target has data
+  const targetStats = await targetEngine.getStats();
+  if (targetStats.page_count > 0 && !opts.force) {
+    console.error(`Target brain is not empty (${targetStats.page_count} pages).`);
+    console.error('Run with --force to overwrite, or migrate to an empty brain.');
+    await targetEngine.disconnect();
+    process.exit(1);
+  }
+
+  if (targetStats.page_count > 0 && opts.force) {
+    console.log('--force: wiping target brain...');
+    // Delete all pages (cascades to chunks, links, tags, etc.)
+    const pages = await targetEngine.listPages({ limit: 100000 });
+    for (const p of pages) {
+      await targetEngine.deletePage(p.slug);
+    }
+  }
+
+  // Load or create manifest for resume
+  let manifest = loadManifest();
+  if (manifest && manifest.target_engine !== opts.targetEngine) {
+    console.log('Previous migration was to a different target. Starting fresh.');
+    manifest = null;
+  }
+  const completedSet = new Set(manifest?.completed_slugs || []);
+  if (!manifest) {
+    manifest = {
+      completed_slugs: [],
+      target_engine: opts.targetEngine,
+      started_at: new Date().toISOString(),
+    };
+  }
+
+  // Get all source pages
+  const sourceStats = await sourceEngine.getStats();
+  const allPages = await sourceEngine.listPages({ limit: 100000 });
+  const pagesToMigrate = allPages.filter(p => !completedSet.has(p.slug));
+
+  console.log(`Migrating ${pagesToMigrate.length} pages (${allPages.length} total, ${completedSet.size} already done)...`);
+
+  let migrated = 0;
+  for (const page of pagesToMigrate) {
+    // Copy page
+    await targetEngine.putPage(page.slug, {
+      type: page.type,
+      title: page.title,
+      compiled_truth: page.compiled_truth,
+      timeline: page.timeline,
+      frontmatter: page.frontmatter,
+      content_hash: page.content_hash,
+    });
+
+    // Copy chunks with embeddings
+    const chunks = await sourceEngine.getChunksWithEmbeddings(page.slug);
+    if (chunks.length > 0) {
+      await targetEngine.upsertChunks(page.slug, chunks.map(c => ({
+        chunk_index: c.chunk_index,
+        chunk_text: c.chunk_text,
+        chunk_source: c.chunk_source,
+        embedding: c.embedding || undefined,
+        model: c.model,
+        token_count: c.token_count || undefined,
+      })));
+    }
+
+    // Copy tags
+    const tags = await sourceEngine.getTags(page.slug);
+    for (const tag of tags) {
+      await targetEngine.addTag(page.slug, tag);
+    }
+
+    // Copy timeline
+    const timeline = await sourceEngine.getTimeline(page.slug);
+    for (const entry of timeline) {
+      await targetEngine.addTimelineEntry(page.slug, {
+        date: entry.date,
+        source: entry.source,
+        summary: entry.summary,
+        detail: entry.detail,
+      });
+    }
+
+    // Copy raw data
+    const rawData = await sourceEngine.getRawData(page.slug);
+    for (const rd of rawData) {
+      await targetEngine.putRawData(page.slug, rd.source, rd.data);
+    }
+
+    // Copy versions
+    const versions = await sourceEngine.getVersions(page.slug);
+    // Versions are snapshots, we recreate them on the target
+    // (createVersion takes a snapshot of current state, which we just set)
+
+    // Track progress
+    manifest!.completed_slugs.push(page.slug);
+    saveManifest(manifest!);
+    migrated++;
+
+    if (migrated % 50 === 0 || migrated === pagesToMigrate.length) {
+      console.log(`  Progress: ${migrated}/${pagesToMigrate.length} pages`);
+    }
+  }
+
// Copy links (after all pages exist in target)
|
||||
console.log('Copying links...');
|
||||
for (const page of allPages) {
|
||||
const links = await sourceEngine.getLinks(page.slug);
|
||||
for (const link of links) {
|
||||
await targetEngine.addLink(link.from_slug, link.to_slug, link.context, link.link_type);
|
||||
}
|
||||
}
|
||||
|
||||
// Copy config (selective)
|
||||
const configKeys = ['embedding_model', 'embedding_dimensions', 'chunk_strategy'];
|
||||
for (const key of configKeys) {
|
||||
const val = await sourceEngine.getConfig(key);
|
||||
if (val) await targetEngine.setConfig(key, val);
|
||||
}
|
||||
|
||||
// Update local config
|
||||
const newConfig: GBrainConfig = {
|
||||
engine: opts.targetEngine,
|
||||
...(opts.targetEngine === 'postgres'
|
||||
? { database_url: targetConfig.database_url }
|
||||
: { database_path: targetConfig.database_path }),
|
||||
};
|
||||
saveConfig(newConfig);
|
||||
|
||||
// Clean up
|
||||
clearManifest();
|
||||
await targetEngine.disconnect();
|
||||
|
||||
console.log(`\nMigration complete. ${migrated} pages transferred.`);
|
||||
console.log(`Config updated to engine: ${opts.targetEngine}`);
|
||||
if (config.engine === 'pglite' && config.database_path) {
|
||||
console.log(`Original PGLite brain preserved at ${config.database_path} (backup).`);
|
||||
}
|
||||
}
|
||||
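Note: the loop above records each migrated slug in a manifest so that an interrupted run can resume where it stopped. A minimal sketch of that bookkeeping follows. Only `completed_slugs` comes from the diff; the `Manifest` shape and the `pendingSlugs` and `markDone` helpers are hypothetical names for illustration, and persistence (`saveManifest`) is left out.

```typescript
// Hypothetical manifest shape; only `completed_slugs` is taken from the diff.
interface Manifest {
  completed_slugs: string[];
}

// Return the slugs that still need migrating, preserving input order.
function pendingSlugs(allSlugs: string[], manifest: Manifest): string[] {
  const done = new Set(manifest.completed_slugs);
  return allSlugs.filter(slug => !done.has(slug));
}

// Record a finished slug. Unlike the plain push in the diff, this sketch is
// idempotent, so a retried page is not listed twice.
function markDone(manifest: Manifest, slug: string): void {
  if (!manifest.completed_slugs.includes(slug)) {
    manifest.completed_slugs.push(slug);
  }
}

// Simulate resuming a run in which one page was already migrated.
const manifest: Manifest = { completed_slugs: ['people/alice'] };
const remaining = pendingSlugs(['people/alice', 'people/bob', 'notes/idea'], manifest);
for (const slug of remaining) {
  markDone(manifest, slug);
}
```

Filtering through a `Set` keeps the resume check linear in the number of pages, which matters only for large brains.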
@@ -8,7 +8,7 @@ function getConfigDir() { return join(homedir(), '.gbrain'); }
 function getConfigPath() { return join(getConfigDir(), 'config.json'); }

 export interface GBrainConfig {
-  engine: 'postgres' | 'sqlite';
+  engine: 'postgres' | 'pglite';
   database_url?: string;
   database_path?: string;
   openai_api_key?: string;
@@ -31,13 +31,18 @@ export function loadConfig(): GBrainConfig | null {

   if (!fileConfig && !dbUrl) return null;

+  // Infer engine type if not explicitly set
+  const inferredEngine: 'postgres' | 'pglite' = fileConfig?.engine
+    || (fileConfig?.database_path ? 'pglite' : 'postgres');
+
   // Merge: env vars override config file
-  return {
-    engine: 'postgres',
+  const merged = {
     ...fileConfig,
+    engine: inferredEngine,
     ...(dbUrl ? { database_url: dbUrl } : {}),
     ...(process.env.OPENAI_API_KEY ? { openai_api_key: process.env.OPENAI_API_KEY } : {}),
   };
+  return merged as GBrainConfig;
 }

 export function saveConfig(config: GBrainConfig): void {
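Note: the inference rule in this hunk can be restated as a small pure function. An explicit `engine` wins; otherwise a `database_path` implies PGLite; otherwise the default is Postgres. The sketch below assumes a trimmed `FileConfig` type and an `inferEngine` helper, neither of which is part of the commit.

```typescript
type EngineName = 'postgres' | 'pglite';

// Illustrative subset of the config file shape; not the commit's GBrainConfig.
interface FileConfig {
  engine?: EngineName;
  database_url?: string;
  database_path?: string;
}

// Explicit engine first, then database_path implies pglite, else postgres.
function inferEngine(fileConfig: FileConfig | null): EngineName {
  return fileConfig?.engine || (fileConfig?.database_path ? 'pglite' : 'postgres');
}

const fromPath = inferEngine({ database_path: '/tmp/brain.pglite' });
const fromUrl = inferEngine({ database_url: 'postgres://localhost/brain' });
const explicit = inferEngine({ engine: 'postgres', database_path: '/tmp/brain.pglite' });
const noFile = inferEngine(null);
```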
26
src/core/engine-factory.ts
Normal file
@@ -0,0 +1,26 @@
import type { BrainEngine } from './engine.ts';
import type { EngineConfig } from './types.ts';

/**
 * Create an engine instance based on config.
 * Uses dynamic imports so PGLite WASM is never loaded for Postgres users.
 */
export async function createEngine(config: EngineConfig): Promise<BrainEngine> {
  const engineType = config.engine || 'postgres';

  switch (engineType) {
    case 'pglite': {
      const { PGLiteEngine } = await import('./pglite-engine.ts');
      return new PGLiteEngine();
    }
    case 'postgres': {
      const { PostgresEngine } = await import('./postgres-engine.ts');
      return new PostgresEngine();
    }
    default:
      throw new Error(
        `Unknown engine type: "${engineType}". Supported engines: postgres, pglite.` +
        (engineType === 'sqlite' ? ' SQLite is not supported. Use pglite instead.' : '')
      );
  }
}
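Note: the factory above defers each `import()` until its engine is actually requested. The sketch below shows the same lazy-loading shape without depending on the real modules: plain async functions stand in for the dynamic imports, and `Engine`, `loaders`, `loaded`, and `createEngineSketch` are hypothetical names.

```typescript
// Stand-in for the BrainEngine interface; the real one has many more methods.
interface Engine {
  name: string;
}

// Each loader runs only when its engine is requested. In the commit these are
// dynamic import() calls; async functions stand in for them here so the
// sketch runs without those modules.
const loaders: Record<string, () => Promise<Engine>> = {
  pglite: async () => ({ name: 'pglite' }),
  postgres: async () => ({ name: 'postgres' }),
};

// Tracks which loaders were invoked, to make the laziness observable.
const loaded: string[] = [];

async function createEngineSketch(engineType: string = 'postgres'): Promise<Engine> {
  const load = loaders[engineType];
  if (!load) {
    throw new Error(
      `Unknown engine type: "${engineType}". Supported engines: postgres, pglite.` +
      (engineType === 'sqlite' ? ' SQLite is not supported. Use pglite instead.' : '')
    );
  }
  loaded.push(engineType);
  return load();
}
```

Requesting `'postgres'` never touches the `pglite` loader, which mirrors the stated goal of keeping the WASM bundle out of Postgres-only runs.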
@@ -74,4 +74,8 @@ export interface BrainEngine {
   // Config
   getConfig(key: string): Promise<string | null>;
   setConfig(key: string, value: string): Promise<void>;
+
+  // Migration support
+  runMigration(version: number, sql: string): Promise<void>;
+  getChunksWithEmbeddings(slug: string): Promise<Chunk[]>;
 }
@@ -98,11 +98,7 @@ export async function runMigrations(engine: BrainEngine): Promise<{ applied: num
     // SQL migration (transactional)
     if (m.sql) {
       await engine.transaction(async (tx) => {
-        const eng = tx as any;
-        const sql = eng.sql || eng._sql;
-        if (sql) {
-          await sql.unsafe(m.sql);
-        }
+        await tx.runMigration(m.version, m.sql);
       });
     }

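Note: `tx.runMigration` works in this hunk because the object handed to the transaction callback is itself a `BrainEngine`. In this commit, `PGLiteEngine.transaction` builds that object with `Object.create(this)` and a redefined `db` getter. A stripped-down sketch of the pattern follows; `Runner`, `makeRunner`, `MiniEngine`, and `withTransaction` are hypothetical stand-ins, and the real method is asynchronous.

```typescript
// Hypothetical stand-in for a database or transaction handle.
interface Runner {
  label: string;
  exec(sql: string): string;
}

function makeRunner(label: string): Runner {
  return { label, exec: (sql: string) => `${label}: ${sql}` };
}

class MiniEngine {
  private _db: Runner = makeRunner('root');

  get db(): Runner {
    return this._db;
  }

  runMigration(_version: number, sql: string): string {
    // Always goes through the `db` getter, so a shadowing getter redirects it.
    return this.db.exec(sql);
  }

  // Create an object that inherits every method from `this` but resolves
  // `db` to the transaction handle instead of the root connection.
  withTransaction<T>(fn: (engine: MiniEngine) => T): T {
    const tx = makeRunner('tx');
    const txEngine = Object.create(this) as MiniEngine;
    Object.defineProperty(txEngine, 'db', { get: () => tx });
    return fn(txEngine);
  }
}

const engine = new MiniEngine();
const insideTx = engine.withTransaction(e => e.runMigration(2, 'ALTER TABLE pages ADD COLUMN x int'));
const outsideTx = engine.runMigration(2, 'SELECT 1');
```

The design choice here is that no method needs a transaction parameter: any method that reads `this.db` picks up the transaction handle automatically when called on the derived object.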
624
src/core/pglite-engine.ts
Normal file
@@ -0,0 +1,624 @@
import { PGlite } from '@electric-sql/pglite';
import { vector } from '@electric-sql/pglite/vector';
import { pg_trgm } from '@electric-sql/pglite/contrib/pg_trgm';
import type { Transaction } from '@electric-sql/pglite';
import type { BrainEngine } from './engine.ts';
import { runMigrations } from './migrate.ts';
import { PGLITE_SCHEMA_SQL } from './pglite-schema.ts';
import type {
  Page, PageInput, PageFilters, PageType,
  Chunk, ChunkInput,
  SearchResult, SearchOpts,
  Link, GraphNode,
  TimelineEntry, TimelineInput, TimelineOpts,
  RawData,
  PageVersion,
  BrainStats, BrainHealth,
  IngestLogEntry, IngestLogInput,
  EngineConfig,
} from './types.ts';
import { validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult } from './utils.ts';

type PGLiteDB = PGlite;

export class PGLiteEngine implements BrainEngine {
  private _db: PGLiteDB | null = null;

  get db(): PGLiteDB {
    if (!this._db) throw new Error('PGLite not connected. Call connect() first.');
    return this._db;
  }

  // Lifecycle
  async connect(config: EngineConfig): Promise<void> {
    const dataDir = config.database_path || undefined; // undefined = in-memory
    this._db = await PGlite.create({
      dataDir,
      extensions: { vector, pg_trgm },
    });
  }

  async disconnect(): Promise<void> {
    if (this._db) {
      await this._db.close();
      this._db = null;
    }
  }

  async initSchema(): Promise<void> {
    await this.db.exec(PGLITE_SCHEMA_SQL);

    const { applied } = await runMigrations(this);
    if (applied > 0) {
      console.log(` ${applied} migration(s) applied`);
    }
  }

  async transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T> {
    return this.db.transaction(async (tx) => {
      const txEngine = Object.create(this) as PGLiteEngine;
      Object.defineProperty(txEngine, 'db', { get: () => tx });
      return fn(txEngine);
    });
  }

  // Pages CRUD
  async getPage(slug: string): Promise<Page | null> {
    const { rows } = await this.db.query(
      `SELECT id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash, created_at, updated_at
       FROM pages WHERE slug = $1`,
      [slug]
    );
    if (rows.length === 0) return null;
    return rowToPage(rows[0] as Record<string, unknown>);
  }

  async putPage(slug: string, page: PageInput): Promise<Page> {
    slug = validateSlug(slug);
    const hash = page.content_hash || contentHash(page.compiled_truth, page.timeline || '');
    const frontmatter = page.frontmatter || {};

    const { rows } = await this.db.query(
      `INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, updated_at)
       VALUES ($1, $2, $3, $4, $5, $6::jsonb, $7, now())
       ON CONFLICT (slug) DO UPDATE SET
         type = EXCLUDED.type,
         title = EXCLUDED.title,
         compiled_truth = EXCLUDED.compiled_truth,
         timeline = EXCLUDED.timeline,
         frontmatter = EXCLUDED.frontmatter,
         content_hash = EXCLUDED.content_hash,
         updated_at = now()
       RETURNING id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash, created_at, updated_at`,
      [slug, page.type, page.title, page.compiled_truth, page.timeline || '', JSON.stringify(frontmatter), hash]
    );
    return rowToPage(rows[0] as Record<string, unknown>);
  }

  async deletePage(slug: string): Promise<void> {
    await this.db.query('DELETE FROM pages WHERE slug = $1', [slug]);
  }

  async listPages(filters?: PageFilters): Promise<Page[]> {
    const limit = filters?.limit || 100;
    const offset = filters?.offset || 0;

    let result;
    if (filters?.type && filters?.tag) {
      result = await this.db.query(
        `SELECT p.* FROM pages p
         JOIN tags t ON t.page_id = p.id
         WHERE p.type = $1 AND t.tag = $2
         ORDER BY p.updated_at DESC LIMIT $3 OFFSET $4`,
        [filters.type, filters.tag, limit, offset]
      );
    } else if (filters?.type) {
      result = await this.db.query(
        `SELECT * FROM pages WHERE type = $1
         ORDER BY updated_at DESC LIMIT $2 OFFSET $3`,
        [filters.type, limit, offset]
      );
    } else if (filters?.tag) {
      result = await this.db.query(
        `SELECT p.* FROM pages p
         JOIN tags t ON t.page_id = p.id
         WHERE t.tag = $1
         ORDER BY p.updated_at DESC LIMIT $2 OFFSET $3`,
        [filters.tag, limit, offset]
      );
    } else {
      result = await this.db.query(
        `SELECT * FROM pages
         ORDER BY updated_at DESC LIMIT $1 OFFSET $2`,
        [limit, offset]
      );
    }

    return (result.rows as Record<string, unknown>[]).map(rowToPage);
  }

  async resolveSlugs(partial: string): Promise<string[]> {
    // Try exact match first
    const exact = await this.db.query('SELECT slug FROM pages WHERE slug = $1', [partial]);
    if (exact.rows.length > 0) return [(exact.rows[0] as { slug: string }).slug];

    // Fuzzy match via pg_trgm
    const { rows } = await this.db.query(
      `SELECT slug, similarity(title, $1) AS sim
       FROM pages
       WHERE title % $1 OR slug ILIKE $2
       ORDER BY sim DESC
       LIMIT 5`,
      [partial, '%' + partial + '%']
    );
    return (rows as { slug: string }[]).map(r => r.slug);
  }

  // Search
  async searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]> {
    const limit = opts?.limit || 20;

    const { rows } = await this.db.query(
      `SELECT DISTINCT ON (p.slug)
         p.slug, p.id as page_id, p.title, p.type,
         cc.chunk_text, cc.chunk_source,
         ts_rank(p.search_vector, websearch_to_tsquery('english', $1)) AS score,
         CASE WHEN p.updated_at < (
           SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
         ) THEN true ELSE false END AS stale
       FROM pages p
       JOIN content_chunks cc ON cc.page_id = p.id
       WHERE p.search_vector @@ websearch_to_tsquery('english', $1)
       ORDER BY p.slug, score DESC`,
      [query]
    );

    // Re-sort by score (DISTINCT ON requires ORDER BY slug first) and apply limit
    const sorted = (rows as Record<string, unknown>[]).sort(
      (a: any, b: any) => b.score - a.score
    );
    sorted.splice(limit);

    return sorted.map(rowToSearchResult);
  }

  async searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]> {
    const limit = opts?.limit || 20;
    const vecStr = '[' + Array.from(embedding).join(',') + ']';

    const { rows } = await this.db.query(
      `SELECT
         p.slug, p.id as page_id, p.title, p.type,
         cc.chunk_text, cc.chunk_source,
         1 - (cc.embedding <=> $1::vector) AS score,
         CASE WHEN p.updated_at < (
           SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
         ) THEN true ELSE false END AS stale
       FROM content_chunks cc
       JOIN pages p ON p.id = cc.page_id
       WHERE cc.embedding IS NOT NULL
       ORDER BY cc.embedding <=> $1::vector
       LIMIT $2`,
      [vecStr, limit]
    );

    return (rows as Record<string, unknown>[]).map(rowToSearchResult);
  }

  // Chunks
  async upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void> {
    // Get page_id
    const pageResult = await this.db.query('SELECT id FROM pages WHERE slug = $1', [slug]);
    if (pageResult.rows.length === 0) throw new Error(`Page not found: ${slug}`);
    const pageId = (pageResult.rows[0] as { id: number }).id;

    // Remove chunks that no longer exist
    const newIndices = chunks.map(c => c.chunk_index);
    if (newIndices.length > 0) {
      // PGLite doesn't auto-serialize arrays, so use != ALL with an explicit int[] cast
      await this.db.query(
        `DELETE FROM content_chunks WHERE page_id = $1 AND chunk_index != ALL($2::int[])`,
        [pageId, newIndices]
      );
    } else {
      await this.db.query('DELETE FROM content_chunks WHERE page_id = $1', [pageId]);
      return;
    }

    // Batch upsert: build dynamic multi-row INSERT
    const cols = '(page_id, chunk_index, chunk_text, chunk_source, embedding, model, token_count, embedded_at)';
    const rowParts: string[] = [];
    const params: unknown[] = [];
    let paramIdx = 1;

    for (const chunk of chunks) {
      const embeddingStr = chunk.embedding
        ? '[' + Array.from(chunk.embedding).join(',') + ']'
        : null;

      if (embeddingStr) {
        rowParts.push(`($${paramIdx++}, $${paramIdx++}, $${paramIdx++}, $${paramIdx++}, $${paramIdx++}::vector, $${paramIdx++}, $${paramIdx++}, now())`);
        params.push(pageId, chunk.chunk_index, chunk.chunk_text, chunk.chunk_source, embeddingStr, chunk.model || 'text-embedding-3-large', chunk.token_count || null);
      } else {
        rowParts.push(`($${paramIdx++}, $${paramIdx++}, $${paramIdx++}, $${paramIdx++}, NULL, $${paramIdx++}, $${paramIdx++}, NULL)`);
        params.push(pageId, chunk.chunk_index, chunk.chunk_text, chunk.chunk_source, chunk.model || 'text-embedding-3-large', chunk.token_count || null);
      }
    }

    await this.db.query(
      `INSERT INTO content_chunks ${cols} VALUES ${rowParts.join(', ')}
       ON CONFLICT (page_id, chunk_index) DO UPDATE SET
         chunk_text = EXCLUDED.chunk_text,
         chunk_source = EXCLUDED.chunk_source,
         embedding = COALESCE(EXCLUDED.embedding, content_chunks.embedding),
         model = COALESCE(EXCLUDED.model, content_chunks.model),
         token_count = EXCLUDED.token_count,
         embedded_at = COALESCE(EXCLUDED.embedded_at, content_chunks.embedded_at)`,
      params
    );
  }

  async getChunks(slug: string): Promise<Chunk[]> {
    const { rows } = await this.db.query(
      `SELECT cc.* FROM content_chunks cc
       JOIN pages p ON p.id = cc.page_id
       WHERE p.slug = $1
       ORDER BY cc.chunk_index`,
      [slug]
    );
    return (rows as Record<string, unknown>[]).map(r => rowToChunk(r));
  }

  async deleteChunks(slug: string): Promise<void> {
    await this.db.query(
      `DELETE FROM content_chunks
       WHERE page_id = (SELECT id FROM pages WHERE slug = $1)`,
      [slug]
    );
  }

  // Links
  async addLink(from: string, to: string, context?: string, linkType?: string): Promise<void> {
    await this.db.query(
      `INSERT INTO links (from_page_id, to_page_id, link_type, context)
       SELECT f.id, t.id, $3, $4
       FROM pages f, pages t
       WHERE f.slug = $1 AND t.slug = $2
       ON CONFLICT (from_page_id, to_page_id) DO UPDATE SET
         link_type = EXCLUDED.link_type,
         context = EXCLUDED.context`,
      [from, to, linkType || '', context || '']
    );
  }

  async removeLink(from: string, to: string): Promise<void> {
    await this.db.query(
      `DELETE FROM links
       WHERE from_page_id = (SELECT id FROM pages WHERE slug = $1)
         AND to_page_id = (SELECT id FROM pages WHERE slug = $2)`,
      [from, to]
    );
  }

  async getLinks(slug: string): Promise<Link[]> {
    const { rows } = await this.db.query(
      `SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
       FROM links l
       JOIN pages f ON f.id = l.from_page_id
       JOIN pages t ON t.id = l.to_page_id
       WHERE f.slug = $1`,
      [slug]
    );
    return rows as unknown as Link[];
  }

  async getBacklinks(slug: string): Promise<Link[]> {
    const { rows } = await this.db.query(
      `SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
       FROM links l
       JOIN pages f ON f.id = l.from_page_id
       JOIN pages t ON t.id = l.to_page_id
       WHERE t.slug = $1`,
      [slug]
    );
    return rows as unknown as Link[];
  }

  async traverseGraph(slug: string, depth: number = 5): Promise<GraphNode[]> {
    const { rows } = await this.db.query(
      `WITH RECURSIVE graph AS (
         SELECT p.id, p.slug, p.title, p.type, 0 as depth
         FROM pages p WHERE p.slug = $1

         UNION

         SELECT p2.id, p2.slug, p2.title, p2.type, g.depth + 1
         FROM graph g
         JOIN links l ON l.from_page_id = g.id
         JOIN pages p2 ON p2.id = l.to_page_id
         WHERE g.depth < $2
       )
       SELECT DISTINCT g.slug, g.title, g.type, g.depth,
         coalesce(
           (SELECT jsonb_agg(jsonb_build_object('to_slug', p3.slug, 'link_type', l2.link_type))
            FROM links l2
            JOIN pages p3 ON p3.id = l2.to_page_id
            WHERE l2.from_page_id = g.id),
           '[]'::jsonb
         ) as links
       FROM graph g
       ORDER BY g.depth, g.slug`,
      [slug, depth]
    );

    return (rows as Record<string, unknown>[]).map(r => ({
      slug: r.slug as string,
      title: r.title as string,
      type: r.type as PageType,
      depth: r.depth as number,
      links: (typeof r.links === 'string' ? JSON.parse(r.links) : r.links) as { to_slug: string; link_type: string }[],
    }));
  }

  // Tags
  async addTag(slug: string, tag: string): Promise<void> {
    await this.db.query(
      `INSERT INTO tags (page_id, tag)
       SELECT id, $2 FROM pages WHERE slug = $1
       ON CONFLICT (page_id, tag) DO NOTHING`,
      [slug, tag]
    );
  }

  async removeTag(slug: string, tag: string): Promise<void> {
    await this.db.query(
      `DELETE FROM tags
       WHERE page_id = (SELECT id FROM pages WHERE slug = $1)
         AND tag = $2`,
      [slug, tag]
    );
  }

  async getTags(slug: string): Promise<string[]> {
    const { rows } = await this.db.query(
      `SELECT tag FROM tags
       WHERE page_id = (SELECT id FROM pages WHERE slug = $1)
       ORDER BY tag`,
      [slug]
    );
    return (rows as { tag: string }[]).map(r => r.tag);
  }

  // Timeline
  async addTimelineEntry(slug: string, entry: TimelineInput): Promise<void> {
    await this.db.query(
      `INSERT INTO timeline_entries (page_id, date, source, summary, detail)
       SELECT id, $2::date, $3, $4, $5
       FROM pages WHERE slug = $1`,
      [slug, entry.date, entry.source || '', entry.summary, entry.detail || '']
    );
  }

  async getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]> {
    const limit = opts?.limit || 100;

    let result;
    if (opts?.after && opts?.before) {
      result = await this.db.query(
        `SELECT te.* FROM timeline_entries te
         JOIN pages p ON p.id = te.page_id
         WHERE p.slug = $1 AND te.date >= $2::date AND te.date <= $3::date
         ORDER BY te.date DESC LIMIT $4`,
        [slug, opts.after, opts.before, limit]
      );
    } else if (opts?.after) {
      result = await this.db.query(
        `SELECT te.* FROM timeline_entries te
         JOIN pages p ON p.id = te.page_id
         WHERE p.slug = $1 AND te.date >= $2::date
         ORDER BY te.date DESC LIMIT $3`,
        [slug, opts.after, limit]
      );
    } else {
      result = await this.db.query(
        `SELECT te.* FROM timeline_entries te
         JOIN pages p ON p.id = te.page_id
         WHERE p.slug = $1
         ORDER BY te.date DESC LIMIT $2`,
        [slug, limit]
      );
    }

    return result.rows as unknown as TimelineEntry[];
  }

  // Raw data
  async putRawData(slug: string, source: string, data: object): Promise<void> {
    await this.db.query(
      `INSERT INTO raw_data (page_id, source, data)
       SELECT id, $2, $3::jsonb
       FROM pages WHERE slug = $1
       ON CONFLICT (page_id, source) DO UPDATE SET
         data = EXCLUDED.data,
         fetched_at = now()`,
      [slug, source, JSON.stringify(data)]
    );
  }

  async getRawData(slug: string, source?: string): Promise<RawData[]> {
    let result;
    if (source) {
      result = await this.db.query(
        `SELECT rd.source, rd.data, rd.fetched_at FROM raw_data rd
         JOIN pages p ON p.id = rd.page_id
         WHERE p.slug = $1 AND rd.source = $2`,
        [slug, source]
      );
    } else {
      result = await this.db.query(
        `SELECT rd.source, rd.data, rd.fetched_at FROM raw_data rd
         JOIN pages p ON p.id = rd.page_id
         WHERE p.slug = $1`,
        [slug]
      );
    }
    return result.rows as unknown as RawData[];
  }

  // Versions
  async createVersion(slug: string): Promise<PageVersion> {
    const { rows } = await this.db.query(
      `INSERT INTO page_versions (page_id, compiled_truth, frontmatter)
       SELECT id, compiled_truth, frontmatter
       FROM pages WHERE slug = $1
       RETURNING *`,
      [slug]
    );
    return rows[0] as unknown as PageVersion;
  }

  async getVersions(slug: string): Promise<PageVersion[]> {
    const { rows } = await this.db.query(
      `SELECT pv.* FROM page_versions pv
       JOIN pages p ON p.id = pv.page_id
       WHERE p.slug = $1
       ORDER BY pv.snapshot_at DESC`,
      [slug]
    );
    return rows as unknown as PageVersion[];
  }

  async revertToVersion(slug: string, versionId: number): Promise<void> {
    await this.db.query(
      `UPDATE pages SET
         compiled_truth = pv.compiled_truth,
         frontmatter = pv.frontmatter,
         updated_at = now()
       FROM page_versions pv
       WHERE pages.slug = $1 AND pv.id = $2 AND pv.page_id = pages.id`,
      [slug, versionId]
    );
  }

  // Stats + health
  async getStats(): Promise<BrainStats> {
    const { rows: [stats] } = await this.db.query(`
      SELECT
        (SELECT count(*) FROM pages) as page_count,
        (SELECT count(*) FROM content_chunks) as chunk_count,
        (SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL) as embedded_count,
        (SELECT count(*) FROM links) as link_count,
        (SELECT count(DISTINCT tag) FROM tags) as tag_count,
        (SELECT count(*) FROM timeline_entries) as timeline_entry_count
    `);

    const { rows: types } = await this.db.query(
      `SELECT type, count(*)::int as count FROM pages GROUP BY type ORDER BY count DESC`
    );
    const pages_by_type: Record<string, number> = {};
    for (const t of types as { type: string; count: number }[]) {
      pages_by_type[t.type] = t.count;
    }

    const s = stats as Record<string, unknown>;
    return {
      page_count: Number(s.page_count),
      chunk_count: Number(s.chunk_count),
      embedded_count: Number(s.embedded_count),
      link_count: Number(s.link_count),
      tag_count: Number(s.tag_count),
      timeline_entry_count: Number(s.timeline_entry_count),
      pages_by_type,
    };
  }

  async getHealth(): Promise<BrainHealth> {
    const { rows: [h] } = await this.db.query(`
      SELECT
        (SELECT count(*) FROM pages) as page_count,
        (SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL)::float /
          GREATEST((SELECT count(*) FROM content_chunks), 1)::float as embed_coverage,
        (SELECT count(*) FROM pages p
         WHERE p.updated_at < (SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id)
        ) as stale_pages,
        (SELECT count(*) FROM pages p
         WHERE NOT EXISTS (SELECT 1 FROM links l WHERE l.to_page_id = p.id)
        ) as orphan_pages,
        (SELECT count(*) FROM links l
         WHERE NOT EXISTS (SELECT 1 FROM pages p WHERE p.id = l.to_page_id)
        ) as dead_links,
        (SELECT count(*) FROM content_chunks WHERE embedded_at IS NULL) as missing_embeddings
    `);

    const r = h as Record<string, unknown>;
    return {
      page_count: Number(r.page_count),
      embed_coverage: Number(r.embed_coverage),
      stale_pages: Number(r.stale_pages),
      orphan_pages: Number(r.orphan_pages),
      dead_links: Number(r.dead_links),
      missing_embeddings: Number(r.missing_embeddings),
    };
  }

  // Ingest log
  async logIngest(entry: IngestLogInput): Promise<void> {
    await this.db.query(
      `INSERT INTO ingest_log (source_type, source_ref, pages_updated, summary)
       VALUES ($1, $2, $3::jsonb, $4)`,
      [entry.source_type, entry.source_ref, JSON.stringify(entry.pages_updated), entry.summary]
    );
  }

  async getIngestLog(opts?: { limit?: number }): Promise<IngestLogEntry[]> {
    const limit = opts?.limit || 50;
    const { rows } = await this.db.query(
      `SELECT * FROM ingest_log ORDER BY created_at DESC LIMIT $1`,
      [limit]
    );
    return rows as unknown as IngestLogEntry[];
  }

  // Sync
  async updateSlug(oldSlug: string, newSlug: string): Promise<void> {
    newSlug = validateSlug(newSlug);
    await this.db.query(
      `UPDATE pages SET slug = $1, updated_at = now() WHERE slug = $2`,
      [newSlug, oldSlug]
    );
  }

  async rewriteLinks(_oldSlug: string, _newSlug: string): Promise<void> {
    // Stub: links use integer page_id FKs, already correct after updateSlug.
  }

  // Config
  async getConfig(key: string): Promise<string | null> {
    const { rows } = await this.db.query('SELECT value FROM config WHERE key = $1', [key]);
    return rows.length > 0 ? (rows[0] as { value: string }).value : null;
  }

  async setConfig(key: string, value: string): Promise<void> {
    await this.db.query(
      `INSERT INTO config (key, value) VALUES ($1, $2)
       ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value`,
      [key, value]
    );
  }

  // Migration support
  async runMigration(_version: number, sql: string): Promise<void> {
    await this.db.exec(sql);
  }

  async getChunksWithEmbeddings(slug: string): Promise<Chunk[]> {
    const { rows } = await this.db.query(
      `SELECT cc.* FROM content_chunks cc
       JOIN pages p ON p.id = cc.page_id
       WHERE p.slug = $1
       ORDER BY cc.chunk_index`,
      [slug]
    );
    return (rows as Record<string, unknown>[]).map(r => rowToChunk(r, true));
  }
}
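Note: `upsertChunks` in this file builds one multi-row `INSERT` by numbering placeholders as it walks the chunks, adding a `::vector` cast and `now()` only for rows that carry an embedding. A reduced sketch of that builder follows. `ChunkRow`, `toVectorLiteral`, and `buildChunkInsert` are hypothetical names, the column list is shortened for illustration, and the `ON CONFLICT` clause is omitted.

```typescript
// Trimmed, hypothetical row type; the real ChunkInput has more fields.
interface ChunkRow {
  chunk_index: number;
  chunk_text: string;
  embedding?: Float32Array | null;
}

// Serialize an embedding in pgvector's text form: bracketed, comma-separated.
function toVectorLiteral(embedding: Float32Array): string {
  return '[' + Array.from(embedding).join(',') + ']';
}

function buildChunkInsert(pageId: number, chunks: ChunkRow[]): { sql: string; params: unknown[] } {
  const rowParts: string[] = [];
  const params: unknown[] = [];
  let paramIdx = 1;

  for (const chunk of chunks) {
    if (chunk.embedding) {
      // Four placeholders; the embedding travels as text and is cast in SQL.
      rowParts.push(`($${paramIdx++}, $${paramIdx++}, $${paramIdx++}, $${paramIdx++}::vector, now())`);
      params.push(pageId, chunk.chunk_index, chunk.chunk_text, toVectorLiteral(chunk.embedding));
    } else {
      // No embedding: literal NULLs, so this row consumes one fewer placeholder.
      rowParts.push(`($${paramIdx++}, $${paramIdx++}, $${paramIdx++}, NULL, NULL)`);
      params.push(pageId, chunk.chunk_index, chunk.chunk_text);
    }
  }

  const sql =
    'INSERT INTO content_chunks (page_id, chunk_index, chunk_text, embedding, embedded_at) VALUES ' +
    rowParts.join(', ');
  return { sql, params };
}

const built = buildChunkInsert(7, [
  { chunk_index: 0, chunk_text: 'alpha', embedding: new Float32Array([0.5, 1, 2]) },
  { chunk_index: 1, chunk_text: 'beta' },
]);
```

Because rows with and without embeddings use different placeholder counts, the running `paramIdx` counter is what keeps the SQL text and the `params` array aligned.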
209
src/core/pglite-schema.ts
Normal file
@@ -0,0 +1,209 @@
/**
 * PGLite schema — derived from schema-embedded.ts (Postgres schema).
 *
 * Differences from Postgres:
 * - No RLS block (no role system in embedded PGLite)
 * - No access_tokens / mcp_request_log (local-only, no remote auth)
 * - No files table (file attachments require Supabase Storage)
 * - No pg_advisory_lock (single connection)
 *
 * Everything else is identical: same tables, triggers, indexes, pgvector HNSW, tsvector GIN.
 *
 * DRIFT WARNING: When schema-embedded.ts changes, update this file to match.
 * test/edge-bundle.test.ts has a drift detection test.
 */

export const PGLITE_SCHEMA_SQL = `
-- GBrain PGLite schema (local embedded Postgres)

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- ============================================================
-- pages: the core content table
-- ============================================================
CREATE TABLE IF NOT EXISTS pages (
  id SERIAL PRIMARY KEY,
  slug TEXT NOT NULL UNIQUE,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  compiled_truth TEXT NOT NULL DEFAULT '',
  timeline TEXT NOT NULL DEFAULT '',
  frontmatter JSONB NOT NULL DEFAULT '{}',
  content_hash TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type);
CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter);
CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops);

-- ============================================================
-- content_chunks: chunked content with embeddings
-- ============================================================
CREATE TABLE IF NOT EXISTS content_chunks (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  chunk_text TEXT NOT NULL,
  chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
  embedding vector(1536),
  model TEXT NOT NULL DEFAULT 'text-embedding-3-large',
  token_count INTEGER,
  embedded_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX IF NOT EXISTS idx_chunks_page_index ON content_chunks(page_id, chunk_index);
CREATE INDEX IF NOT EXISTS idx_chunks_page ON content_chunks(page_id);
CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (embedding vector_cosine_ops);

-- ============================================================
-- links: cross-references between pages
-- ============================================================
CREATE TABLE IF NOT EXISTS links (
  id SERIAL PRIMARY KEY,
  from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  link_type TEXT NOT NULL DEFAULT '',
  context TEXT NOT NULL DEFAULT '',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(from_page_id, to_page_id)
);

CREATE INDEX IF NOT EXISTS idx_links_from ON links(from_page_id);
CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);

-- ============================================================
-- tags
-- ============================================================
CREATE TABLE IF NOT EXISTS tags (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  tag TEXT NOT NULL,
  UNIQUE(page_id, tag)
);

CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
CREATE INDEX IF NOT EXISTS idx_tags_page_id ON tags(page_id);

-- ============================================================
-- raw_data: sidecar data
-- ============================================================
CREATE TABLE IF NOT EXISTS raw_data (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  source TEXT NOT NULL,
  data JSONB NOT NULL,
  fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(page_id, source)
);

CREATE INDEX IF NOT EXISTS idx_raw_data_page ON raw_data(page_id);

-- ============================================================
-- timeline_entries: structured timeline
-- ============================================================
CREATE TABLE IF NOT EXISTS timeline_entries (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  date DATE NOT NULL,
  source TEXT NOT NULL DEFAULT '',
  summary TEXT NOT NULL,
  detail TEXT NOT NULL DEFAULT '',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX IF NOT EXISTS idx_timeline_date ON timeline_entries(date);

-- ============================================================
|
||||
-- page_versions: snapshot history
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS page_versions (
|
||||
id SERIAL PRIMARY KEY,
|
||||
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
|
||||
compiled_truth TEXT NOT NULL,
|
||||
frontmatter JSONB NOT NULL DEFAULT '{}',
|
||||
snapshot_at TIMESTAMPTZ NOT NULL DEFAULT now()
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_versions_page ON page_versions(page_id);
|
||||
|
||||
-- ============================================================
|
||||
-- ingest_log
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS ingest_log (
|
||||
id SERIAL PRIMARY KEY,
|
||||
source_type TEXT NOT NULL,
|
||||
source_ref TEXT NOT NULL,
|
||||
pages_updated JSONB NOT NULL DEFAULT '[]',
|
||||
summary TEXT NOT NULL DEFAULT '',
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
|
||||
);
|
||||
|
||||
-- ============================================================
-- config: brain-level settings
-- ============================================================
CREATE TABLE IF NOT EXISTS config (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL
);

INSERT INTO config (key, value) VALUES
  ('version', '1'),
  ('engine', 'pglite'),
  ('embedding_model', 'text-embedding-3-large'),
  ('embedding_dimensions', '1536'),
  ('chunk_strategy', 'semantic')
ON CONFLICT (key) DO NOTHING;

-- ============================================================
-- Trigger-based search_vector (spans pages + timeline_entries)
-- ============================================================
ALTER TABLE pages ADD COLUMN IF NOT EXISTS search_vector tsvector;

CREATE INDEX IF NOT EXISTS idx_pages_search ON pages USING GIN(search_vector);

CREATE OR REPLACE FUNCTION update_page_search_vector() RETURNS trigger AS $$
DECLARE
  timeline_text TEXT;
BEGIN
  SELECT coalesce(string_agg(summary || ' ' || detail, ' '), '')
  INTO timeline_text
  FROM timeline_entries
  WHERE page_id = NEW.id;

  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.compiled_truth, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.timeline, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(timeline_text, '')), 'C');

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_pages_search_vector ON pages;
CREATE TRIGGER trg_pages_search_vector
  BEFORE INSERT OR UPDATE ON pages
  FOR EACH ROW
  EXECUTE FUNCTION update_page_search_vector();

CREATE OR REPLACE FUNCTION update_page_search_vector_from_timeline() RETURNS trigger AS $$
DECLARE
  page_row pages%ROWTYPE;
BEGIN
  UPDATE pages SET updated_at = now()
  WHERE id = coalesce(NEW.page_id, OLD.page_id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_timeline_search_vector ON timeline_entries;
CREATE TRIGGER trg_timeline_search_vector
  AFTER INSERT OR UPDATE OR DELETE ON timeline_entries
  FOR EACH ROW
  EXECUTE FUNCTION update_page_search_vector_from_timeline();
`;
@@ -1,10 +1,9 @@
 import postgres from 'postgres';
-import { createHash } from 'crypto';
 import type { BrainEngine } from './engine.ts';
 import { runMigrations } from './migrate.ts';
 import { SCHEMA_SQL } from './schema-embedded.ts';
 import type {
-  Page, PageInput, PageFilters, PageType,
+  Page, PageInput, PageFilters,
   Chunk, ChunkInput,
   SearchResult, SearchOpts,
   Link, GraphNode,
@@ -17,6 +16,7 @@ import type {
 } from './types.ts';
 import { GBrainError } from './types.ts';
 import * as db from './db.ts';
+import { validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult } from './utils.ts';
 
 export class PostgresEngine implements BrainEngine {
   private _sql: ReturnType<typeof postgres> | null = null;
@@ -622,60 +622,21 @@ export class PostgresEngine implements BrainEngine {
       ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value
     `;
   }
-}
 
-// Helpers
-function validateSlug(slug: string): string {
-  // Git is the system of record — slugs are lowercased repo-relative paths.
-  if (!slug || /(^|\/)\.\.($|\/)/.test(slug) || /^\//.test(slug)) {
-    throw new Error(`Invalid slug: "${slug}". Slugs cannot be empty, start with /, or contain path traversal.`);
+  // Migration support
+  async runMigration(_version: number, sqlStr: string): Promise<void> {
+    const conn = this.sql;
+    await conn.unsafe(sqlStr);
   }
-  // Normalize to lowercase — all entry points (pathToSlug, inferSlug, frontmatter, direct writes) go through here
-  return slug.toLowerCase();
-}
 
-function contentHash(compiledTruth: string, timeline: string): string {
-  return createHash('sha256').update(compiledTruth + '\n---\n' + timeline).digest('hex');
-}
-
-function rowToPage(row: Record<string, unknown>): Page {
-  return {
-    id: row.id as number,
-    slug: row.slug as string,
-    type: row.type as PageType,
-    title: row.title as string,
-    compiled_truth: row.compiled_truth as string,
-    timeline: row.timeline as string,
-    frontmatter: (typeof row.frontmatter === 'string' ? JSON.parse(row.frontmatter) : row.frontmatter) as Record<string, unknown>,
-    content_hash: row.content_hash as string | undefined,
-    created_at: new Date(row.created_at as string),
-    updated_at: new Date(row.updated_at as string),
-  };
-}
-
-function rowToChunk(row: Record<string, unknown>): Chunk {
-  return {
-    id: row.id as number,
-    page_id: row.page_id as number,
-    chunk_index: row.chunk_index as number,
-    chunk_text: row.chunk_text as string,
-    chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
-    embedding: null, // Don't load embeddings into memory by default
-    model: row.model as string,
-    token_count: row.token_count as number | null,
-    embedded_at: row.embedded_at ? new Date(row.embedded_at as string) : null,
-  };
-}
-
-function rowToSearchResult(row: Record<string, unknown>): SearchResult {
-  return {
-    slug: row.slug as string,
-    page_id: row.page_id as number,
-    title: row.title as string,
-    type: row.type as PageType,
-    chunk_text: row.chunk_text as string,
-    chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
-    score: Number(row.score),
-    stale: Boolean(row.stale),
-  };
+  async getChunksWithEmbeddings(slug: string): Promise<Chunk[]> {
+    const conn = this.sql;
+    const rows = await conn`
+      SELECT cc.* FROM content_chunks cc
+      JOIN pages p ON p.id = cc.page_id
+      WHERE p.slug = ${slug}
+      ORDER BY cc.chunk_index
+    `;
+    return rows.map((r: Record<string, unknown>) => rowToChunk(r, true));
+  }
 }
@@ -25,6 +25,14 @@ export async function hybridSearch(
 ): Promise<SearchResult[]> {
   const limit = opts?.limit || 20;
 
+  // Run keyword search (always available, no API key needed)
+  const keywordResults = await engine.searchKeyword(query, { limit: limit * 2 });
+
+  // Skip vector search entirely if no OpenAI key is configured
+  if (!process.env.OPENAI_API_KEY) {
+    return dedupResults(keywordResults).slice(0, limit);
+  }
+
   // Determine query variants (optionally with expansion)
   let queries = [query];
   if (opts?.expansion && opts?.expandFn) {
@@ -36,16 +44,20 @@ export async function hybridSearch(
     }
   }
 
-  // Run keyword search concurrently with embed+vector pipeline
-  const [keywordResults, embeddings] = await Promise.all([
-    engine.searchKeyword(query, { limit: limit * 2 }),
-    Promise.all(queries.map(q => embed(q))),
-  ]);
+  // Embed all query variants and run vector search
+  let vectorLists: SearchResult[][] = [];
+  try {
+    const embeddings = await Promise.all(queries.map(q => embed(q)));
+    vectorLists = await Promise.all(
+      embeddings.map(emb => engine.searchVector(emb, { limit: limit * 2 })),
+    );
+  } catch {
+    // Embedding failure is non-fatal, fall back to keyword-only
+  }
 
-  // Run vector search for each embedding
-  const vectorLists = await Promise.all(
-    embeddings.map(emb => engine.searchVector(emb, { limit: limit * 2 })),
-  );
+  if (vectorLists.length === 0) {
+    return dedupResults(keywordResults).slice(0, limit);
+  }
 
   // Merge all result lists via RRF
   const allLists = [...vectorLists, keywordResults];
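The hybridSearch hunk merges the vector and keyword lists "via RRF", but the merge itself falls outside the diff. The following is a minimal sketch of reciprocal rank fusion, not the repository's implementation: the constant `k = 60`, the dedup key (slug plus chunk text), and the names `Ranked` and `rrfMerge` are all assumptions made for illustration.

```typescript
// Hypothetical result shape; the real SearchResult carries more fields.
interface Ranked {
  slug: string;
  chunk_text: string;
}

// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each
// item it contains, with rank starting at 1. k = 60 is an assumed default.
function rrfMerge<T extends Ranked>(lists: T[][], k = 60): Array<T & { score: number }> {
  const merged = new Map<string, T & { score: number }>();
  for (const list of lists) {
    list.forEach((item, index) => {
      const key = `${item.slug}\u0000${item.chunk_text}`;
      const contribution = 1 / (k + index + 1);
      const existing = merged.get(key);
      if (existing) {
        existing.score += contribution;
      } else {
        merged.set(key, { ...item, score: contribution });
      }
    });
  }
  // Highest fused score first.
  return [...merged.values()].sort((a, b) => b.score - a.score);
}

const vectorList: Ranked[] = [
  { slug: 'concepts/rag', chunk_text: 'RAG combines retrieval with generation' },
  { slug: 'companies/novamind', chunk_text: 'NovaMind builds AI agents for enterprise' },
];
const keywordList: Ranked[] = [
  { slug: 'companies/novamind', chunk_text: 'NovaMind builds AI agents for enterprise' },
];
const fused = rrfMerge([vectorList, keywordList]);
```

An item that appears in both lists accumulates two contributions, so it can outrank an item that tops only one list. With an empty `vectorLists`, the fused order is the keyword order, which matches the keyword-only fallback in the hunk.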
@@ -167,7 +167,7 @@ export interface IngestLogInput {
 export interface EngineConfig {
   database_url?: string;
   database_path?: string;
-  engine?: 'postgres' | 'sqlite';
+  engine?: 'postgres' | 'pglite';
 }
 
 // Errors
62
src/core/utils.ts
Normal file
@@ -0,0 +1,62 @@
import { createHash } from 'crypto';
import type { Page, PageType, Chunk, SearchResult } from './types.ts';

/**
 * Validate and normalize a slug. Slugs are lowercased repo-relative paths.
 * Rejects empty slugs, path traversal (..), and leading /.
 */
export function validateSlug(slug: string): string {
  if (!slug || /(^|\/)\.\.($|\/)/.test(slug) || /^\//.test(slug)) {
    throw new Error(`Invalid slug: "${slug}". Slugs cannot be empty, start with /, or contain path traversal.`);
  }
  return slug.toLowerCase();
}

/**
 * SHA-256 hash of compiled_truth + timeline, used for import idempotency.
 */
export function contentHash(compiledTruth: string, timeline: string): string {
  return createHash('sha256').update(compiledTruth + '\n---\n' + timeline).digest('hex');
}

export function rowToPage(row: Record<string, unknown>): Page {
  return {
    id: row.id as number,
    slug: row.slug as string,
    type: row.type as PageType,
    title: row.title as string,
    compiled_truth: row.compiled_truth as string,
    timeline: row.timeline as string,
    frontmatter: (typeof row.frontmatter === 'string' ? JSON.parse(row.frontmatter) : row.frontmatter) as Record<string, unknown>,
    content_hash: row.content_hash as string | undefined,
    created_at: new Date(row.created_at as string),
    updated_at: new Date(row.updated_at as string),
  };
}

export function rowToChunk(row: Record<string, unknown>, includeEmbedding = false): Chunk {
  return {
    id: row.id as number,
    page_id: row.page_id as number,
    chunk_index: row.chunk_index as number,
    chunk_text: row.chunk_text as string,
    chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
    embedding: includeEmbedding && row.embedding ? row.embedding as Float32Array : null,
    model: row.model as string,
    token_count: row.token_count as number | null,
    embedded_at: row.embedded_at ? new Date(row.embedded_at as string) : null,
  };
}

export function rowToSearchResult(row: Record<string, unknown>): SearchResult {
  return {
    slug: row.slug as string,
    page_id: row.page_id as number,
    title: row.title as string,
    type: row.type as PageType,
    chunk_text: row.chunk_text as string,
    chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
    score: Number(row.score),
    stale: Boolean(row.stale),
  };
}
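The two pure helpers in src/core/utils.ts can be exercised without a database. The sketch below copies `validateSlug` and `contentHash` from the diff so that it is self-contained, then shows how the hash might gate a re-import. `needsImport` is a hypothetical helper invented for this example; the diff says only that the hash is "used for import idempotency" and does not show the actual check.

```typescript
import { createHash } from 'crypto';

// Copied from src/core/utils.ts in this diff.
function validateSlug(slug: string): string {
  if (!slug || /(^|\/)\.\.($|\/)/.test(slug) || /^\//.test(slug)) {
    throw new Error(`Invalid slug: "${slug}". Slugs cannot be empty, start with /, or contain path traversal.`);
  }
  return slug.toLowerCase();
}

// Copied from src/core/utils.ts in this diff.
function contentHash(compiledTruth: string, timeline: string): string {
  return createHash('sha256').update(compiledTruth + '\n---\n' + timeline).digest('hex');
}

// Hypothetical helper (not in the diff): a page needs re-importing only
// when its stored hash differs from the hash of the incoming content.
function needsImport(storedHash: string | undefined, compiledTruth: string, timeline: string): boolean {
  return storedHash !== contentHash(compiledTruth, timeline);
}

const slug = validateSlug('People/Sarah-Chen');
const storedHash = contentHash('Sarah leads research.', '2024-01-15: Joined');
const sameContent = needsImport(storedHash, 'Sarah leads research.', '2024-01-15: Joined');
const newTimeline = needsImport(storedHash, 'Sarah leads research.', '2024-02-01: Promoted');

// Path traversal is rejected before any normalization happens.
let traversalRejected = false;
try {
  validateSlug('../etc/passwd');
} catch {
  traversalRejected = true;
}
```

Because the separator `\n---\n` is part of the hashed input, moving text from `compiled_truth` into `timeline` changes the hash even when the concatenated characters are otherwise the same.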
27
test/engine-factory.test.ts
Normal file
@@ -0,0 +1,27 @@
import { describe, test, expect } from 'bun:test';
import { createEngine } from '../src/core/engine-factory.ts';

describe('createEngine', () => {
  test('returns PGLiteEngine for pglite', async () => {
    const engine = await createEngine({ engine: 'pglite' });
    expect(engine.constructor.name).toBe('PGLiteEngine');
  });

  test('returns PostgresEngine for postgres', async () => {
    const engine = await createEngine({ engine: 'postgres' });
    expect(engine.constructor.name).toBe('PostgresEngine');
  });

  test('defaults to PostgresEngine when engine is undefined', async () => {
    const engine = await createEngine({});
    expect(engine.constructor.name).toBe('PostgresEngine');
  });

  test('throws for sqlite with helpful message', async () => {
    await expect(createEngine({ engine: 'sqlite' as any })).rejects.toThrow('pglite');
  });

  test('throws for unknown engine', async () => {
    await expect(createEngine({ engine: 'mysql' as any })).rejects.toThrow('Unknown engine');
  });
});
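These tests pin down the factory's contract: `postgres` is the default, `sqlite` fails with a message that mentions `pglite`, and anything else fails with "Unknown engine". src/core/engine-factory.ts is not shown in this part of the diff, so the following is only a sketch of one way to satisfy that contract. The stub classes, `resolveEngineName`, `createEngineSketch`, and the error wording are assumptions; the loader thunks stand in for the dynamic `import()` calls that the commit message describes.

```typescript
// Stand-ins for the real engine classes, which live in separate modules.
class PostgresEngine {}
class PGLiteEngine {}

type EngineName = 'postgres' | 'pglite';

// Pure name resolution, kept synchronous so it is easy to test.
function resolveEngineName(requested: string | undefined): EngineName {
  const name = requested ?? 'postgres';
  if (name === 'postgres' || name === 'pglite') return name;
  if (name === 'sqlite') {
    throw new Error("Engine 'sqlite' is not supported; use 'pglite' instead.");
  }
  throw new Error(`Unknown engine: ${name}`);
}

// Each thunk stands in for a dynamic import(), so an engine module would
// only be loaded when that engine is actually requested.
const loaders: Record<EngineName, () => Promise<object>> = {
  postgres: async () => new PostgresEngine(),
  pglite: async () => new PGLiteEngine(),
};

async function createEngineSketch(config: { engine?: string }): Promise<object> {
  // A synchronous throw inside an async function surfaces as a rejection,
  // which is what the `.rejects.toThrow(...)` assertions expect.
  const name = resolveEngineName(config.engine);
  return loaders[name]();
}
```

Keeping the loaders behind thunks is what lets Postgres users avoid loading the PGLite WASM bundle, as the commit message notes.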
495
test/pglite-engine.test.ts
Normal file
@@ -0,0 +1,495 @@
/**
 * PGLite Engine Tests — validates all 37 BrainEngine methods against PGLite (in-memory).
 *
 * No Docker, no DATABASE_URL, no external dependencies. Runs instantly in CI.
 */

import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import type { BrainEngine } from '../src/core/engine.ts';
import type { PageInput, ChunkInput } from '../src/core/types.ts';

let engine: PGLiteEngine;

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({}); // in-memory
  await engine.initSchema();
});

afterAll(async () => {
  await engine.disconnect();
});

// Helper to reset data between test groups
async function truncateAll() {
  const tables = [
    'content_chunks', 'links', 'tags', 'raw_data',
    'timeline_entries', 'page_versions', 'ingest_log', 'pages',
  ];
  for (const t of tables) {
    await (engine as any).db.exec(`DELETE FROM ${t}`);
  }
}

const testPage: PageInput = {
  type: 'concept',
  title: 'Test Page',
  compiled_truth: 'This is a test page about NovaMind AI agents.',
  timeline: '2024-01-15: Founded NovaMind',
};

// ─────────────────────────────────────────────────────────────────
// Pages CRUD
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Pages', () => {
  beforeEach(truncateAll);

  test('putPage + getPage round trip', async () => {
    const page = await engine.putPage('test/hello', testPage);
    expect(page.slug).toBe('test/hello');
    expect(page.title).toBe('Test Page');
    expect(page.type).toBe('concept');
    expect(page.compiled_truth).toContain('NovaMind');

    const fetched = await engine.getPage('test/hello');
    expect(fetched).not.toBeNull();
    expect(fetched!.title).toBe('Test Page');
    expect(fetched!.content_hash).toBeTruthy();
  });

  test('putPage upserts on conflict', async () => {
    await engine.putPage('test/upsert', testPage);
    const updated = await engine.putPage('test/upsert', {
      ...testPage,
      title: 'Updated Title',
    });
    expect(updated.title).toBe('Updated Title');

    const all = await engine.listPages();
    const matches = all.filter(p => p.slug === 'test/upsert');
    expect(matches.length).toBe(1);
  });

  test('getPage returns null for missing slug', async () => {
    const result = await engine.getPage('nonexistent/slug');
    expect(result).toBeNull();
  });

  test('deletePage removes page', async () => {
    await engine.putPage('test/delete-me', testPage);
    await engine.deletePage('test/delete-me');
    const result = await engine.getPage('test/delete-me');
    expect(result).toBeNull();
  });

  test('listPages with type filter', async () => {
    await engine.putPage('people/alice', { ...testPage, type: 'person', title: 'Alice' });
    await engine.putPage('concepts/rag', { ...testPage, type: 'concept', title: 'RAG' });

    const people = await engine.listPages({ type: 'person' });
    expect(people.length).toBe(1);
    expect(people[0].title).toBe('Alice');
  });

  test('listPages with tag filter', async () => {
    await engine.putPage('test/tagged', testPage);
    await engine.addTag('test/tagged', 'special');

    const tagged = await engine.listPages({ tag: 'special' });
    expect(tagged.length).toBe(1);
    expect(tagged[0].slug).toBe('test/tagged');
  });

  test('resolveSlugs exact match', async () => {
    await engine.putPage('test/exact', testPage);
    const slugs = await engine.resolveSlugs('test/exact');
    expect(slugs).toEqual(['test/exact']);
  });

  test('resolveSlugs fuzzy match via pg_trgm', async () => {
    await engine.putPage('people/sarah-chen', { ...testPage, title: 'Sarah Chen' });
    const slugs = await engine.resolveSlugs('sarah');
    expect(slugs.length).toBeGreaterThan(0);
    expect(slugs).toContain('people/sarah-chen');
  });

  test('updateSlug renames page', async () => {
    await engine.putPage('test/old-name', testPage);
    await engine.updateSlug('test/old-name', 'test/new-name');
    expect(await engine.getPage('test/old-name')).toBeNull();
    expect((await engine.getPage('test/new-name'))?.title).toBe('Test Page');
  });

  test('validateSlug rejects path traversal', async () => {
    expect(() => engine.putPage('../etc/passwd', testPage)).toThrow();
  });

  test('validateSlug rejects leading slash', async () => {
    expect(() => engine.putPage('/absolute/path', testPage)).toThrow();
  });

  test('validateSlug normalizes to lowercase', async () => {
    const page = await engine.putPage('Test/UPPER', testPage);
    expect(page.slug).toBe('test/upper');
  });
});

// ─────────────────────────────────────────────────────────────────
// Search (tsvector triggers + FTS)
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Search', () => {
  beforeAll(async () => {
    await truncateAll();
    await engine.putPage('companies/novamind', {
      type: 'company', title: 'NovaMind',
      compiled_truth: 'NovaMind builds AI agents for enterprise automation.',
    });
    await engine.upsertChunks('companies/novamind', [
      { chunk_index: 0, chunk_text: 'NovaMind builds AI agents for enterprise', chunk_source: 'compiled_truth' },
    ]);
    await engine.putPage('concepts/rag', {
      type: 'concept', title: 'Retrieval-Augmented Generation',
      compiled_truth: 'RAG combines retrieval with generation for better answers.',
    });
    await engine.upsertChunks('concepts/rag', [
      { chunk_index: 0, chunk_text: 'RAG combines retrieval with generation', chunk_source: 'compiled_truth' },
    ]);
  });

  test('searchKeyword returns results for matching term', async () => {
    const results = await engine.searchKeyword('NovaMind');
    expect(results.length).toBeGreaterThan(0);
    expect(results[0].slug).toBe('companies/novamind');
  });

  test('searchKeyword returns empty for non-matching term', async () => {
    const results = await engine.searchKeyword('xyznonexistent');
    expect(results.length).toBe(0);
  });

  test('tsvector trigger populates search_vector on insert', async () => {
    // Verify the PL/pgSQL trigger fires and search_vector is populated
    const results = await engine.searchKeyword('enterprise automation');
    expect(results.length).toBeGreaterThan(0);
  });

  test('searchVector returns empty when no embeddings', async () => {
    const fakeEmbedding = new Float32Array(1536);
    const results = await engine.searchVector(fakeEmbedding);
    expect(results.length).toBe(0);
  });
});

// ─────────────────────────────────────────────────────────────────
// Chunks
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Chunks', () => {
  beforeEach(truncateAll);

  test('upsertChunks + getChunks round trip', async () => {
    await engine.putPage('test/chunks', testPage);
    await engine.upsertChunks('test/chunks', [
      { chunk_index: 0, chunk_text: 'Chunk zero', chunk_source: 'compiled_truth' },
      { chunk_index: 1, chunk_text: 'Chunk one', chunk_source: 'compiled_truth' },
    ]);
    const chunks = await engine.getChunks('test/chunks');
    expect(chunks.length).toBe(2);
    expect(chunks[0].chunk_text).toBe('Chunk zero');
    expect(chunks[1].chunk_text).toBe('Chunk one');
  });

  test('upsertChunks removes orphan chunks', async () => {
    await engine.putPage('test/orphan', testPage);
    await engine.upsertChunks('test/orphan', [
      { chunk_index: 0, chunk_text: 'Keep', chunk_source: 'compiled_truth' },
      { chunk_index: 1, chunk_text: 'Remove', chunk_source: 'compiled_truth' },
    ]);
    // Re-upsert with only index 0
    await engine.upsertChunks('test/orphan', [
      { chunk_index: 0, chunk_text: 'Updated', chunk_source: 'compiled_truth' },
    ]);
    const chunks = await engine.getChunks('test/orphan');
    expect(chunks.length).toBe(1);
    expect(chunks[0].chunk_text).toBe('Updated');
  });

  test('upsertChunks throws for missing page', async () => {
    await expect(
      engine.upsertChunks('nonexistent/page', [
        { chunk_index: 0, chunk_text: 'test', chunk_source: 'compiled_truth' },
      ])
    ).rejects.toThrow('Page not found');
  });

  test('deleteChunks removes all chunks for page', async () => {
    await engine.putPage('test/delete-chunks', testPage);
    await engine.upsertChunks('test/delete-chunks', [
      { chunk_index: 0, chunk_text: 'Gone', chunk_source: 'compiled_truth' },
    ]);
    await engine.deleteChunks('test/delete-chunks');
    const chunks = await engine.getChunks('test/delete-chunks');
    expect(chunks.length).toBe(0);
  });

  test('getChunksWithEmbeddings returns embedding data', async () => {
    await engine.putPage('test/embed', testPage);
    const embedding = new Float32Array(1536).fill(0.1);
    await engine.upsertChunks('test/embed', [
      { chunk_index: 0, chunk_text: 'With embedding', chunk_source: 'compiled_truth', embedding },
    ]);
    const chunks = await engine.getChunksWithEmbeddings('test/embed');
    expect(chunks.length).toBe(1);
    expect(chunks[0].embedding).not.toBeNull();
  });
});

// ─────────────────────────────────────────────────────────────────
// Links + Graph
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Links', () => {
  beforeEach(async () => {
    await truncateAll();
    await engine.putPage('people/alice', { ...testPage, type: 'person', title: 'Alice' });
    await engine.putPage('companies/acme', { ...testPage, type: 'company', title: 'ACME' });
    await engine.putPage('companies/beta', { ...testPage, type: 'company', title: 'Beta' });
  });

  test('addLink + getLinks', async () => {
    await engine.addLink('people/alice', 'companies/acme', 'works at', 'employment');
    const links = await engine.getLinks('people/alice');
    expect(links.length).toBe(1);
    expect(links[0].to_slug).toBe('companies/acme');
  });

  test('getBacklinks', async () => {
    await engine.addLink('people/alice', 'companies/acme');
    const backlinks = await engine.getBacklinks('companies/acme');
    expect(backlinks.length).toBe(1);
    expect(backlinks[0].from_slug).toBe('people/alice');
  });

  test('removeLink', async () => {
    await engine.addLink('people/alice', 'companies/acme');
    await engine.removeLink('people/alice', 'companies/acme');
    const links = await engine.getLinks('people/alice');
    expect(links.length).toBe(0);
  });

  test('traverseGraph with depth', async () => {
    await engine.addLink('people/alice', 'companies/acme');
    await engine.addLink('companies/acme', 'companies/beta');

    const graph = await engine.traverseGraph('people/alice', 2);
    expect(graph.length).toBeGreaterThanOrEqual(2);
    const slugs = graph.map(n => n.slug);
    expect(slugs).toContain('people/alice');
    expect(slugs).toContain('companies/acme');
  });
});

// ─────────────────────────────────────────────────────────────────
// Tags
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Tags', () => {
  beforeEach(async () => {
    await truncateAll();
    await engine.putPage('test/tags', testPage);
  });

  test('addTag + getTags', async () => {
    await engine.addTag('test/tags', 'alpha');
    await engine.addTag('test/tags', 'beta');
    const tags = await engine.getTags('test/tags');
    expect(tags).toEqual(['alpha', 'beta']);
  });

  test('removeTag', async () => {
    await engine.addTag('test/tags', 'remove-me');
    await engine.removeTag('test/tags', 'remove-me');
    const tags = await engine.getTags('test/tags');
    expect(tags).not.toContain('remove-me');
  });

  test('duplicate tag is idempotent', async () => {
    await engine.addTag('test/tags', 'dup');
    await engine.addTag('test/tags', 'dup');
    const tags = await engine.getTags('test/tags');
    expect(tags.filter(t => t === 'dup').length).toBe(1);
  });
});

// ─────────────────────────────────────────────────────────────────
// Timeline
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Timeline', () => {
  beforeEach(async () => {
    await truncateAll();
    await engine.putPage('test/timeline', testPage);
  });

  test('addTimelineEntry + getTimeline', async () => {
    await engine.addTimelineEntry('test/timeline', {
      date: '2024-01-15', summary: 'Founded', detail: 'Company founded',
    });
    const entries = await engine.getTimeline('test/timeline');
    expect(entries.length).toBe(1);
    expect(entries[0].summary).toBe('Founded');
  });

  test('getTimeline with date range', async () => {
    await engine.addTimelineEntry('test/timeline', { date: '2024-01-01', summary: 'Jan' });
    await engine.addTimelineEntry('test/timeline', { date: '2024-06-01', summary: 'Jun' });
    await engine.addTimelineEntry('test/timeline', { date: '2024-12-01', summary: 'Dec' });

    const filtered = await engine.getTimeline('test/timeline', {
      after: '2024-03-01', before: '2024-09-01',
    });
    expect(filtered.length).toBe(1);
    expect(filtered[0].summary).toBe('Jun');
  });
});

// ─────────────────────────────────────────────────────────────────
// Raw Data, Versions, Config, IngestLog
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: RawData', () => {
  beforeEach(async () => {
    await truncateAll();
    await engine.putPage('test/raw', testPage);
  });

  test('putRawData + getRawData', async () => {
    await engine.putRawData('test/raw', 'crunchbase', { funding: '$10M' });
    const data = await engine.getRawData('test/raw', 'crunchbase');
    expect(data.length).toBe(1);
    expect((data[0].data as any).funding).toBe('$10M');
  });
});

describe('PGLiteEngine: Versions', () => {
  beforeEach(async () => {
    await truncateAll();
    await engine.putPage('test/version', testPage);
  });

  test('createVersion + getVersions', async () => {
    const v = await engine.createVersion('test/version');
    expect(v.compiled_truth).toBe(testPage.compiled_truth);

    const versions = await engine.getVersions('test/version');
    expect(versions.length).toBe(1);
  });

  test('revertToVersion restores content', async () => {
    await engine.createVersion('test/version');
    await engine.putPage('test/version', { ...testPage, compiled_truth: 'Changed' });

    const versions = await engine.getVersions('test/version');
    await engine.revertToVersion('test/version', versions[0].id);

    const page = await engine.getPage('test/version');
    expect(page!.compiled_truth).toBe(testPage.compiled_truth);
  });
});

describe('PGLiteEngine: Config', () => {
  test('getConfig + setConfig', async () => {
    await engine.setConfig('test_key', 'test_value');
    const val = await engine.getConfig('test_key');
    expect(val).toBe('test_value');
  });

  test('getConfig returns null for missing key', async () => {
    const val = await engine.getConfig('nonexistent_key');
    expect(val).toBeNull();
  });
});

describe('PGLiteEngine: IngestLog', () => {
  test('logIngest + getIngestLog', async () => {
    await engine.logIngest({
      source_type: 'git', source_ref: '/tmp/test-repo',
      pages_updated: ['test/a', 'test/b'], summary: 'Imported 2 pages',
    });
    const log = await engine.getIngestLog({ limit: 10 });
    expect(log.length).toBeGreaterThan(0);
    expect(log[0].source_type).toBe('git');
  });
});

// ─────────────────────────────────────────────────────────────────
// Stats + Health
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Stats & Health', () => {
  beforeAll(async () => {
    await truncateAll();
    await engine.putPage('test/stats', testPage);
    await engine.upsertChunks('test/stats', [
      { chunk_index: 0, chunk_text: 'chunk', chunk_source: 'compiled_truth' },
    ]);
    await engine.addTag('test/stats', 'stat-tag');
  });

  test('getStats returns correct counts', async () => {
    const stats = await engine.getStats();
    expect(stats.page_count).toBe(1);
    expect(stats.chunk_count).toBe(1);
    expect(stats.tag_count).toBe(1);
    expect(stats.pages_by_type.concept).toBe(1);
  });

  test('getHealth returns coverage metrics', async () => {
    const health = await engine.getHealth();
    expect(health.page_count).toBe(1);
    expect(health.missing_embeddings).toBe(1); // chunk has no embedding
    expect(health.embed_coverage).toBe(0);
  });
});

// ─────────────────────────────────────────────────────────────────
// Transactions
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Transactions', () => {
  beforeEach(truncateAll);

  test('transaction commits on success', async () => {
    await engine.transaction(async (tx) => {
      await tx.putPage('test/tx-ok', testPage);
    });
    const page = await engine.getPage('test/tx-ok');
    expect(page).not.toBeNull();
  });

  test('transaction rolls back on error', async () => {
    try {
      await engine.transaction(async (tx) => {
        await tx.putPage('test/tx-fail', testPage);
        throw new Error('Deliberate rollback');
      });
    } catch { /* expected */ }

    const page = await engine.getPage('test/tx-fail');
    expect(page).toBeNull();
  });
});

// ─────────────────────────────────────────────────────────────────
// Cascade deletes
// ─────────────────────────────────────────────────────────────────
describe('PGLiteEngine: Cascade deletes', () => {
  test('deleting a page cascades to chunks, tags, links', async () => {
    await engine.putPage('test/cascade', testPage);
    await engine.upsertChunks('test/cascade', [
      { chunk_index: 0, chunk_text: 'cascade chunk', chunk_source: 'compiled_truth' },
    ]);
    await engine.addTag('test/cascade', 'cascade-tag');

    await engine.deletePage('test/cascade');

    const chunks = await engine.getChunks('test/cascade');
    expect(chunks.length).toBe(0);
    const tags = await engine.getTags('test/cascade');
    expect(tags.length).toBe(0);
  });
});
112
test/utils.test.ts
Normal file
@@ -0,0 +1,112 @@
import { describe, test, expect } from 'bun:test';
import { validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult } from '../src/core/utils.ts';

describe('validateSlug', () => {
  test('accepts valid slugs', () => {
    expect(validateSlug('people/sarah-chen')).toBe('people/sarah-chen');
    expect(validateSlug('concepts/rag')).toBe('concepts/rag');
    expect(validateSlug('simple')).toBe('simple');
  });

  test('normalizes to lowercase', () => {
    expect(validateSlug('People/Sarah-Chen')).toBe('people/sarah-chen');
    expect(validateSlug('UPPER')).toBe('upper');
  });

  test('rejects empty slug', () => {
    expect(() => validateSlug('')).toThrow('Invalid slug');
  });

  test('rejects path traversal', () => {
    expect(() => validateSlug('../etc/passwd')).toThrow('path traversal');
    expect(() => validateSlug('test/../hack')).toThrow('path traversal');
  });

  test('rejects leading slash', () => {
    expect(() => validateSlug('/absolute/path')).toThrow('start with /');
  });
});

describe('contentHash', () => {
  test('returns deterministic hash', () => {
    const h1 = contentHash('hello', 'world');
    const h2 = contentHash('hello', 'world');
    expect(h1).toBe(h2);
  });

  test('changes when content changes', () => {
    const h1 = contentHash('hello', 'world');
    const h2 = contentHash('hello', 'changed');
    expect(h1).not.toBe(h2);
  });

  test('returns hex string', () => {
    const h = contentHash('test', '');
    expect(h).toMatch(/^[a-f0-9]{64}$/);
  });
});

describe('rowToPage', () => {
  test('parses string frontmatter', () => {
    const page = rowToPage({
      id: 1, slug: 'test', type: 'concept', title: 'Test',
      compiled_truth: 'body', timeline: '',
      frontmatter: '{"key":"val"}',
      content_hash: 'abc', created_at: '2024-01-01', updated_at: '2024-01-01',
    });
    expect(page.frontmatter.key).toBe('val');
  });

  test('handles object frontmatter', () => {
    const page = rowToPage({
      id: 1, slug: 'test', type: 'concept', title: 'Test',
      compiled_truth: 'body', timeline: '',
      frontmatter: { key: 'val' },
      content_hash: 'abc', created_at: '2024-01-01', updated_at: '2024-01-01',
    });
    expect(page.frontmatter.key).toBe('val');
  });

  test('creates Date objects', () => {
    const page = rowToPage({
      id: 1, slug: 'test', type: 'concept', title: 'Test',
      compiled_truth: '', timeline: '', frontmatter: '{}',
      content_hash: null, created_at: '2024-01-01T00:00:00Z', updated_at: '2024-01-01T00:00:00Z',
    });
    expect(page.created_at).toBeInstanceOf(Date);
    expect(page.updated_at).toBeInstanceOf(Date);
  });
});

describe('rowToChunk', () => {
  test('nulls embedding by default', () => {
    const chunk = rowToChunk({
      id: 1, page_id: 1, chunk_index: 0, chunk_text: 'text',
      chunk_source: 'compiled_truth', embedding: new Float32Array(10),
      model: 'test', token_count: 5, embedded_at: '2024-01-01',
    });
    expect(chunk.embedding).toBeNull();
  });

  test('includes embedding when requested', () => {
    const emb = new Float32Array(10).fill(0.5);
    const chunk = rowToChunk({
      id: 1, page_id: 1, chunk_index: 0, chunk_text: 'text',
      chunk_source: 'compiled_truth', embedding: emb,
      model: 'test', token_count: 5, embedded_at: '2024-01-01',
    }, true);
    expect(chunk.embedding).not.toBeNull();
  });
});

describe('rowToSearchResult', () => {
  test('coerces score to number', () => {
    const r = rowToSearchResult({
      slug: 'test', page_id: 1, title: 'Test', type: 'concept',
      chunk_text: 'text', chunk_source: 'compiled_truth',
      score: '0.95', stale: false,
    });
    expect(typeof r.score).toBe('number');
    expect(r.score).toBe(0.95);
  });
});
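
Reviewer note: the tests in test/utils.test.ts pin down observable behavior only, and src/core/utils.ts itself is not shown in this part of the diff. The block below is a minimal sketch of helpers that would satisfy the validateSlug, rowToChunk (with the includeEmbedding flag mentioned in the commit message), and score-coercion expectations. Every name ending in `Sketch`, the trimmed row types, and the exact error wording are assumptions made for illustration; the real implementation may differ.

```typescript
// Hypothetical sketch only: these names and types are illustrative and are
// NOT taken from src/core/utils.ts.

function validateSlugSketch(slug: string): string {
  if (slug.length === 0) {
    throw new Error('Invalid slug: slug must not be empty');
  }
  if (slug.startsWith('/')) {
    throw new Error('Invalid slug: must not start with /');
  }
  // Reject a ".." path segment wherever it appears in the slug.
  if (slug.split('/').includes('..')) {
    throw new Error('Invalid slug: path traversal is not allowed');
  }
  // Slugs are normalized to lowercase, as the tests expect.
  return slug.toLowerCase();
}

// Trimmed-down row shape (assumption); the tested rows carry more columns.
interface ChunkRowSketch {
  id: number;
  chunk_index: number;
  chunk_text: string;
  embedding: Float32Array | null;
}

function rowToChunkSketch(row: ChunkRowSketch, includeEmbedding = false): ChunkRowSketch {
  return {
    id: row.id,
    chunk_index: row.chunk_index,
    chunk_text: row.chunk_text,
    // The embedding is dropped unless the caller opts in (e.g. for migration).
    embedding: includeEmbedding ? row.embedding : null,
  };
}

// Drivers may return numeric columns as strings; coerce before use.
function coerceScoreSketch(score: string | number): number {
  return Number(score);
}
```

Defaulting `includeEmbedding` to `false` keeps ordinary reads from carrying the embedding vector around, while a migration path can pass `true` to copy embeddings between engines.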