gbrain/CLAUDE.md
Garry Tan 6c7d2ed30b feat: PGLite engine — local brain, zero infrastructure (v0.7.0) (#41)
* refactor: extract shared utils, add runMigration + getChunksWithEmbeddings to BrainEngine

Extract validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult
from postgres-engine.ts into shared utils.ts. Add rowToChunk includeEmbedding
parameter for migration support.

Add two new methods to BrainEngine interface:
- runMigration(version, sql) — replaces internal eng.sql access in migrate.ts
- getChunksWithEmbeddings(slug) — returns chunks with embedding data for migration

Replace 'sqlite' with 'pglite' in EngineConfig and GBrainConfig types.
Fix loadConfig to infer engine from database_path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pluggable engine factory + hybridSearch keyword-only fallback

Add createEngine() factory with dynamic imports so PGLite WASM is never
loaded for Postgres users. Wire CLI to use factory instead of hardcoded
PostgresEngine.

Force workers=1 for PGLite imports (single-connection architecture).

Fix hybridSearch to check OPENAI_API_KEY before calling embed(). When
unset, returns keyword-only results instead of throwing. Critical for
local PGLite users who don't need vector search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: PGLiteEngine — embedded Postgres 17.5 via WASM, same SQL everywhere

Full BrainEngine implementation (37 methods) using @electric-sql/pglite.
Same SQL as PostgresEngine — tsvector triggers, pgvector HNSW, pg_trgm
fuzzy matching, recursive CTEs, JSONB. Only the driver call syntax differs
(parameterized queries instead of tagged templates).

PGLite schema is the Postgres schema minus RLS, advisory locks, and
remote auth tables (access_tokens, mcp_request_log, files).

No server. No subscription. One directory. Works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart init (PGLite default) + bidirectional engine migration

gbrain init now defaults to PGLite — brain ready in 2 seconds, no
server needed. Scans target directory: <1000 .md files = PGLite,
>=1000 = suggests Supabase. --supabase and --pglite flags override.

gbrain migrate --to supabase/pglite transfers all data between engines
with manifest-based resume. Copies pages, chunks (with embeddings),
tags, timeline, raw data, links, and config. --force overwrites
non-empty target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 60 new tests for PGLite engine, utils, and factory

41 PGLite engine tests covering all 37 BrainEngine methods: CRUD,
tsvector keyword search, pg_trgm fuzzy matching, chunk upsert with
COALESCE, graph traversal via recursive CTE, transactions, cascade
deletes, stats/health, and embedding round-trip.

14 shared utility tests (validateSlug, contentHash, row mappers).
5 engine factory tests (dispatch, error messages).

All run in-memory — zero Docker, zero DATABASE_URL, instant in CI.

Add P0 TODO: submit Bun PR for WASM embedding in bun build --compile
(oven-sh/bun#15032).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0 PGLite engine

- CLAUDE.md: add PGLite key files, update architecture, add migrate command, add 3 test files
- README.md: PGLite as default init, zero-config getting started, migration path to Supabase
- docs/ENGINES.md: PGLiteEngine shipped (v0.7), capability matrix, migration docs
- docs/SQLITE_ENGINE.md: marked superseded by PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove stale v0.4 README update prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove SQLITE_ENGINE.md (superseded by PGLite)

PGLite uses the same SQL as Postgres, making a separate SQLite
engine unnecessary. docs/ENGINES.md covers PGLiteEngine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README step 2 to default to PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add schema setup step and install-all-integrations step to README

Step 3 now tells agents to read GBRAIN_RECOMMENDED_SCHEMA.md and set up
the MECE directory structure before importing. Step 7 tells agents to
install every available integration recipe, not just list them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update install goal to match full opinionated setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add 'Need an AI agent first?' section with one-click deploy links

New users who don't have OpenClaw or Hermes Agent get pointed to
AlphaClaw on Render and the Hermes Agent Railway template. One click
each. Claude Code mentioned for users who already have it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add migrate to CLI_ONLY + help output, fix standalone example

- migrate command was missing from CLI_ONLY set (errored as "Unknown command")
- migrate now shows in --help under SETUP
- init help line shows --pglite flag
- standalone CLI example uses gbrain init (not --supabase)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: set realistic time expectation (~30 min to working brain)

DB is 2 seconds. But schema + import + embeddings + integrations
is 15-30 minutes. The agent does the work, you answer API key
questions. Don't oversell time-to-value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix AlphaClaw Render requirement (8GB+ RAM, not free tier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: final README polish for launch

- GOAL line: "Garry Tan's exact setup" (not Claude Code specific)
- Remove markdown links from code block (won't render)
- STEP 2 renamed from "START HERE" to "DATABASE"
- Tighten Supabase fallback text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove duplicate old install block from README

The v0.5-era "With OpenClaw or Hermes Agent" paste block was
superseded by the top-level "Start here" block. Having both
confused users and the old one still said --supabase as step 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clean up README consistency and remove duplicated content

- Remove duplicate "Try it" section (old 4-act walkthrough that
  repeated the install flow and contradicted "~30 min" with "90 sec")
- Remove duplicate Setup section (third repetition of gbrain init)
- Fix brain.db → brain.pglite (actual default path)
- Fix "coming in v0.7" → "not yet implemented" (we ARE v0.7)
- Remove "You don't need Postgres" (confusing since PGLite IS Postgres)
- Deduplicate "competitive dynamics" query (appeared 3 times)
- Collapse redundant standalone CLI section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:01:09 -10:00

CLAUDE.md

GBrain is a personal knowledge brain with pluggable engines: PGLite (embedded Postgres via WASM, the zero-config default) or Postgres + pgvector + hybrid search on a managed Supabase instance. gbrain init defaults to PGLite and suggests Supabase for 1000+ files.

Architecture

Contract-first: src/core/operations.ts defines ~30 shared operations. CLI and MCP server are both generated from this single source. Engine factory (src/core/engine-factory.ts) dynamically imports the configured engine ('pglite' or 'postgres'). Skills are fat markdown files (tool-agnostic, work with both CLI and plugin contexts).
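The factory pattern described above can be sketched as follows. This is a minimal illustration, not the contents of src/core/engine-factory.ts: the one-field BrainEngine interface, the loaders map, and the loadedModules array are hypothetical stand-ins, and each loader takes the place of a real dynamic import() call so the sketch runs on its own.

```typescript
// Hypothetical one-field stand-in; the real BrainEngine interface is much larger.
interface BrainEngine {
  readonly kind: string;
}

type EngineKind = "pglite" | "postgres";

interface EngineConfig {
  engine: EngineKind;
}

// Records which engine modules were loaded, to show that loading is lazy.
const loadedModules: string[] = [];

// Each loader stands in for a dynamic import of the engine module.
// A loader only runs when its engine is selected, so the unused
// engine (and its dependencies) is never loaded.
const loaders = new Map<EngineKind, () => Promise<BrainEngine>>([
  [
    "pglite",
    async () => {
      loadedModules.push("pglite");
      return { kind: "pglite" };
    },
  ],
  [
    "postgres",
    async () => {
      loadedModules.push("postgres");
      return { kind: "postgres" };
    },
  ],
]);

async function createEngine(config: EngineConfig): Promise<BrainEngine> {
  const load = loaders.get(config.engine);
  if (!load) {
    throw new Error(`Unknown engine: ${String(config.engine)}`);
  }
  return load();
}
```

Calling createEngine({ engine: "pglite" }) in this sketch leaves loadedModules containing only "pglite", which is the property the dynamic-import design is after: Postgres users never pay for the WASM bundle and PGLite users never load the Postgres driver.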

Key files

  • src/core/operations.ts — Contract-first operation definitions (the foundation)
  • src/core/engine.ts — Pluggable engine interface (BrainEngine)
  • src/core/engine-factory.ts — Engine factory with dynamic imports ('pglite' | 'postgres')
  • src/core/pglite-engine.ts — PGLite (embedded Postgres 17.5 via WASM) implementation, all 37 BrainEngine methods
  • src/core/pglite-schema.ts — PGLite-specific DDL (pgvector, pg_trgm, triggers)
  • src/core/postgres-engine.ts — Postgres + pgvector implementation (Supabase / self-hosted)
  • src/core/utils.ts — Shared SQL utilities extracted from postgres-engine.ts
  • src/core/db.ts — Connection management, schema initialization
  • src/commands/migrate-engine.ts — Bidirectional engine migration (gbrain migrate --to supabase/pglite)
  • src/core/import-file.ts — importFromFile + importFromContent (chunk + embed + tags)
  • src/core/sync.ts — Pure sync functions (manifest parsing, filtering, slug conversion)
  • src/core/storage.ts — Pluggable storage interface (S3, Supabase Storage, local)
  • src/core/supabase-admin.ts — Supabase admin API (project discovery, pgvector check)
  • src/core/file-resolver.ts — MIME detection, content hashing for file uploads
  • src/core/chunkers/ — 3-tier chunking (recursive, semantic, LLM-guided)
  • src/core/search/ — Hybrid search: vector + keyword + RRF + multi-query expansion + dedup
  • src/core/embedding.ts — OpenAI text-embedding-3-large, batch, retry, backoff
  • src/mcp/server.ts — MCP stdio server (generated from operations)
  • supabase/functions/gbrain-mcp/index.ts — Remote MCP server (Supabase Edge Function)
  • src/edge-entry.ts — Curated bundle entry point for Edge Function (excludes fs-dependent modules)
  • src/commands/auth.ts — Standalone token management (create/list/revoke/test)
  • src/core/schema-embedded.ts — AUTO-GENERATED from schema.sql (run bun run build:schema)
  • src/schema.sql — Full Postgres + pgvector DDL (source of truth, generates schema-embedded.ts)
  • src/commands/integrations.ts — Standalone integration recipe management (no DB needed)
  • recipes/ — Integration recipe files (YAML frontmatter + markdown setup instructions)
  • docs/guides/ — Individual SKILLPACK guides (broken out from monolith)
  • docs/integrations/ — "Getting Data In" guides and integration docs
  • docs/architecture/infra-layer.md — Shared infrastructure documentation
  • docs/ethos/THIN_HARNESS_FAT_SKILLS.md — Architecture philosophy essay
  • docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md — "Homebrew for Personal AI" essay
  • docs/guides/repo-architecture.md — Two-repo pattern (agent vs brain)
  • docs/guides/sub-agent-routing.md — Model routing table for sub-agents
  • docs/guides/skill-development.md — 5-step skill development cycle + MECE
  • docs/guides/idea-capture.md — Originality distribution, depth test, cross-linking
  • docs/guides/quiet-hours.md — Notification hold + timezone-aware delivery
  • docs/guides/diligence-ingestion.md — Data room to brain pages pipeline
  • docs/designs/HOMEBREW_FOR_PERSONAL_AI.md — 10-star vision for integration system
  • scripts/deploy-remote.sh — One-script remote MCP deployment
  • docs/mcp/ — Per-client setup guides (Claude Desktop, Code, Cowork, Perplexity, ChatGPT)
  • openclaw.plugin.json — ClawHub bundle plugin manifest
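The RRF step listed under src/core/search/ refers to reciprocal rank fusion, which merges several ranked result lists by summing 1 / (k + rank) for each item. The sketch below shows the general technique only; the function name, the use of page slugs as identifiers, and the constant k = 60 (a common default) are assumptions, not details taken from the GBrain source.

```typescript
interface FusedResult {
  slug: string;
  score: number;
}

// Reciprocal rank fusion over any number of ranked lists.
// k = 60 is a conventional default, assumed here for illustration.
function reciprocalRankFusion(rankedLists: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    const seenInList = new Set<string>();
    list.forEach((slug, index) => {
      // Count each slug once per list so duplicates cannot inflate the score.
      if (seenInList.has(slug)) return;
      seenInList.add(slug);
      const rank = index + 1;
      scores.set(slug, (scores.get(slug) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([slug, score]) => ({ slug, score }))
    // Highest fused score first; ties broken alphabetically for stable output.
    .sort((a, b) => b.score - a.score || a.slug.localeCompare(b.slug));
}

// Example: fuse a keyword ranking with a vector ranking.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // keyword results, best first
  ["b", "d", "a"], // vector results, best first
]);
const fusedOrder = fused.map((result) => result.slug);
```

In this example "b" ranks first because it sits near the top of both lists, followed by "a", then the items that appear in only one list.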

Commands

Run gbrain --help or gbrain --tools-json for full command reference.

Key commands added in v0.7:

  • gbrain init — defaults to PGLite (no Supabase needed), scans repo size, suggests Supabase for 1000+ files
  • gbrain migrate --to supabase / gbrain migrate --to pglite — bidirectional engine migration
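The engine choice made by gbrain init can be pictured as a small decision function. This is a sketch under stated assumptions: the function name, the flag object, and the rule that an explicit flag wins over the file count are illustrative, and the 1000-file threshold is the only value taken from the text above.

```typescript
type EngineChoice = "pglite" | "supabase";

// Hypothetical representation of the --pglite / --supabase CLI flags.
interface InitFlags {
  pglite?: boolean;
  supabase?: boolean;
}

const SUPABASE_SUGGESTION_THRESHOLD = 1000;

function suggestEngine(fileCount: number, flags: InitFlags = {}): EngineChoice {
  // Assumption: an explicit flag overrides the size-based suggestion.
  if (flags.supabase) return "supabase";
  if (flags.pglite) return "pglite";
  // Small brains stay local; large ones get the Supabase suggestion.
  return fileCount >= SUPABASE_SUGGESTION_THRESHOLD ? "supabase" : "pglite";
}
```

With these assumptions, suggestEngine(999) yields "pglite" and suggestEngine(1000) yields "supabase", while suggestEngine(5000, { pglite: true }) stays on "pglite".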

Testing

bun test runs all tests (24 unit test files + 4 E2E test files). Unit tests run without a database. E2E tests skip gracefully when DATABASE_URL is not set.

Unit tests:

  • test/markdown.test.ts (frontmatter parsing)
  • test/chunkers/recursive.test.ts (chunking)
  • test/sync.test.ts (sync logic)
  • test/parity.test.ts (operations contract parity)
  • test/cli.test.ts (CLI structure)
  • test/config.test.ts (config redaction)
  • test/files.test.ts (MIME/hash)
  • test/import-file.test.ts (import pipeline)
  • test/upgrade.test.ts (schema migrations)
  • test/doctor.test.ts (doctor command)
  • test/file-migration.test.ts (file migration)
  • test/file-resolver.test.ts (file resolution)
  • test/import-resume.test.ts (import checkpoints)
  • test/migrate.test.ts (migration)
  • test/setup-branching.test.ts (setup flow)
  • test/slug-validation.test.ts (slug validation)
  • test/storage.test.ts (storage backends)
  • test/supabase-admin.test.ts (Supabase admin)
  • test/yaml-lite.test.ts (YAML parsing)
  • test/check-update.test.ts (version check + update CLI)
  • test/pglite-engine.test.ts (PGLite engine, all 37 BrainEngine methods)
  • test/utils.test.ts (shared SQL utilities)
  • test/engine-factory.test.ts (engine factory + dynamic imports)
  • test/integrations.test.ts (recipe parsing, CLI routing, recipe validation)

E2E tests (test/e2e/): Run against real Postgres+pgvector. Require DATABASE_URL.

  • bun run test:e2e runs Tier 1 (mechanical, all operations, no API keys)
  • test/e2e/upgrade.test.ts runs check-update E2E against real GitHub API (network required)
  • Tier 2 (skills.test.ts) requires OpenClaw + API keys, runs nightly in CI
  • If .env.testing doesn't exist in this directory, check sibling worktrees for one: find ../ -maxdepth 2 -name .env.testing -print -quit and copy it here if found.
  • Always run E2E tests when they exist. Do not skip them just because DATABASE_URL is not set. Start the test DB, run the tests, then tear it down.

API keys and running ALL tests

ALWAYS source the user's shell profile before running tests:

source ~/.zshrc 2>/dev/null || true

This loads OPENAI_API_KEY and ANTHROPIC_API_KEY. Without these, Tier 2 tests skip silently. Do NOT skip Tier 2 tests just because they require API keys — load the keys and run them.

When asked to "run all E2E tests" or "run tests", that means ALL tiers:

  • Tier 1: bun run test:e2e (mechanical, sync, upgrade — no API keys needed)
  • Tier 2: test/e2e/skills.test.ts (requires OpenAI + Anthropic + openclaw CLI)
  • Always spin up the test DB, source zshrc, run everything, tear down.

E2E test DB lifecycle (ALWAYS follow this)

You are responsible for spinning up and tearing down the test Postgres container. Do not leave containers running after tests. Do not skip E2E tests.

  1. Check for .env.testing — if missing, copy from sibling worktree. Read it to get the DATABASE_URL (it has the port number).
  2. Check if the port is free: docker ps --filter "publish=PORT" — if another container is on that port, pick a different port (try 5435, 5436, 5437) and start on that one instead.
  3. Start the test DB:
    docker run -d --name gbrain-test-pg \
      -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres \
      -e POSTGRES_DB=gbrain_test \
      -p PORT:5432 pgvector/pgvector:pg16
    
    Wait for ready: docker exec gbrain-test-pg pg_isready -U postgres
  4. Run E2E tests: DATABASE_URL=postgresql://postgres:postgres@localhost:PORT/gbrain_test bun run test:e2e
  5. Tear down immediately after tests finish (pass or fail): docker stop gbrain-test-pg && docker rm gbrain-test-pg

Never leave gbrain-test-pg running. If you find a stale one from a previous run, stop and remove it before starting a new one.
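The lifecycle above can be captured as a few pure helpers that an automation script might build on. This is a sketch, not part of the repo: the helper names are hypothetical, and the container name, credentials, image, and fallback ports are copied from the steps above. Running docker itself is left out so the helpers stay testable without a container runtime.

```typescript
const CONTAINER_NAME = "gbrain-test-pg";
const FALLBACK_PORTS: readonly number[] = [5435, 5436, 5437];

// Step 2: use the port from .env.testing unless it is taken, then try the fallbacks.
function pickPort(preferred: number, busyPorts: ReadonlySet<number>): number {
  for (const port of [preferred, ...FALLBACK_PORTS]) {
    if (!busyPorts.has(port)) return port;
  }
  throw new Error("No free port among the candidates");
}

// Step 3: arguments for `docker run`, mirroring the command above.
function dockerRunArgs(port: number): string[] {
  return [
    "run", "-d", "--name", CONTAINER_NAME,
    "-e", "POSTGRES_USER=postgres",
    "-e", "POSTGRES_PASSWORD=postgres",
    "-e", "POSTGRES_DB=gbrain_test",
    "-p", `${port}:5432`,
    "pgvector/pgvector:pg16",
  ];
}

// Step 4: the DATABASE_URL that matches the container started on `port`.
function databaseUrl(port: number): string {
  return `postgresql://postgres:postgres@localhost:${port}/gbrain_test`;
}

// Step 5: teardown commands, to be run whether the tests pass or fail.
function teardownCommands(): string[][] {
  return [
    ["stop", CONTAINER_NAME],
    ["rm", CONTAINER_NAME],
  ];
}
```

A wrapper script would pass dockerRunArgs(port) to docker, run the tests inside a try block, and issue teardownCommands() in the matching finally block so the container never outlives the run.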

Skills

Read the skill files in skills/ before doing brain operations. They contain the workflows, heuristics, and quality rules for ingestion, querying, maintenance, enrichment, and setup. 7 skills: ingest, query, maintain, enrich, briefing, migrate, setup.

Build

bun build --compile --outfile bin/gbrain src/cli.ts

Pre-ship requirements

Before shipping (/ship) or reviewing (/review), always run the full test suite:

  • bun test — unit tests (no database required)
  • Follow the "E2E test DB lifecycle" steps above to spin up the test DB, run bun run test:e2e, then tear it down.

Both must pass. Do not ship with failing E2E tests. Do not skip E2E tests.

Post-ship requirements (MANDATORY)

After EVERY /ship, you MUST run /document-release. This is NOT optional. Do NOT skip it. Do NOT say "docs look fine" without running it. The skill reads every .md file in the project, cross-references the diff, and updates anything that drifted.

If /ship's Step 8.5 triggers document-release automatically, that counts. But if it gets skipped for ANY reason (timeout, error, oversight), you MUST run it manually before considering the ship complete.

Files that MUST be checked on every ship:

  • README.md — does it reflect new features, commands, or setup steps?
  • CLAUDE.md — does it reflect new files, test files, or architecture changes?
  • CHANGELOG.md — does it cover every commit?
  • TODOS.md — are completed items marked done?
  • docs/ — do any guides need updating?

A ship without updated docs is an incomplete ship. Period.

CHANGELOG voice

CHANGELOG.md is read by agents during auto-update (Section 17). The agent summarizes the changelog to convince the user to upgrade. Write changelog entries that sell the upgrade rather than document the implementation.

  • Lead with what the user can now DO that they couldn't before
  • Frame as benefits and capabilities, not files changed or code written
  • Make the user think "hell yeah, I want that"
  • Bad: "Added GBRAIN_VERIFY.md installation verification runbook"
  • Good: "Your agent now verifies the entire GBrain installation end-to-end, catching silent sync failures and stale embeddings before they bite you"
  • Bad: "Setup skill Phase H and Phase I added"
  • Good: "New installs automatically set up live sync so your brain never falls behind"

Version migrations

Create a migration file at skills/migrations/v[version].md when a release includes changes that existing users need to act on. The auto-update agent reads these files post-upgrade (Section 17, Step 4) and executes them.

You need a migration file when:

  • New setup step that existing installs don't have (e.g., v0.5.0 added live sync, existing users need to set it up, not just new installs)
  • New SKILLPACK section with a MUST ADD setup requirement
  • Schema changes that require gbrain init or manual SQL
  • Changed defaults that affect existing behavior
  • Deprecated commands or flags that need replacement
  • New verification steps that should run on existing installs
  • New cron jobs or background processes that should be registered

You do NOT need a migration file when:

  • Bug fixes with no behavior changes
  • Documentation-only improvements (the agent re-reads docs automatically)
  • New optional features that don't affect existing setups
  • Performance improvements that are transparent

The key test: if an existing user upgrades and does nothing else, will their brain work worse than before? If yes, migration file. If no, skip it.

Write migration files as agent instructions, not technical notes. Tell the agent what to do, step by step, with exact commands. See skills/migrations/v0.5.0.md for the pattern.

Schema state tracking

~/.gbrain/update-state.json tracks which recommended schema directories the user adopted or declined, and which custom directories the user added. The auto-update agent (SKILLPACK Section 17) reads this during upgrades to suggest new schema additions without re-suggesting things the user already declined. The setup skill writes the initial state during Phase C/E. Never modify a user's custom directories or re-suggest declined ones.
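The "do not re-suggest" rule can be illustrated with a small filter. The real layout of update-state.json is not shown in this file, so the UpdateState shape, its field names, and the directory names in the example are all hypothetical.

```typescript
// Hypothetical shape; the actual update-state.json layout may differ.
interface UpdateState {
  adopted: string[];  // recommended directories the user accepted
  declined: string[]; // recommended directories the user turned down
  custom: string[];   // directories the user created on their own
}

// Returns only the recommended directories the user has never been asked about.
function newSuggestions(recommended: readonly string[], state: UpdateState): string[] {
  const alreadyHandled = new Set([...state.adopted, ...state.declined, ...state.custom]);
  return recommended.filter((dir) => !alreadyHandled.has(dir));
}

// Example with made-up directory names.
const exampleState: UpdateState = {
  adopted: ["people"],
  declined: ["meetings"],
  custom: ["recipes"],
};
const suggestions = newSuggestions(["people", "companies", "meetings", "ideas"], exampleState);
```

Here suggestions contains only "companies" and "ideas": the adopted and declined entries are filtered out, and the custom directory is left untouched.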

GitHub Actions SHA maintenance

All GitHub Actions in .github/workflows/ are pinned to commit SHAs. Before shipping (/ship) or reviewing (/review), check for stale pins and update them:

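# For each pinned action, read the version tag from the workflow comment and print the SHA that tag currently points to.
for action in actions/checkout oven-sh/setup-bun actions/upload-artifact actions/download-artifact softprops/action-gh-release gitleaks/gitleaks-action; do
  # The tag comes from the trailing "# vX.Y.Z" comment on the first matching "uses:" line.
  tag=$(grep -r "$action@" .github/workflows/ | head -1 | grep -o '#.*' | tr -d '# ')
  # Note: for annotated tags, .object.sha is the tag object's SHA, not the commit's.
  [ -n "$tag" ] && echo "$action@$tag: $(gh api "repos/$action/git/ref/tags/$tag" --jq .object.sha 2>/dev/null)"
done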

If any SHA differs from what's in the workflow files, update the pin and version comment.

Community PR wave process

Never merge external PRs directly into master. Instead, use the "fix wave" workflow:

  1. Categorize — group PRs by theme (bug fixes, features, infra, docs)
  2. Deduplicate — if two PRs fix the same thing, pick the one that changes fewer lines. Close the other with a note pointing to the winner.
  3. Collector branch — create a feature branch (e.g. garrytan/fix-wave-N), cherry-pick or manually re-implement the best fixes from each PR. Do NOT merge PR branches directly — read the diff, understand the fix, and write it yourself if needed.
  4. Test the wave — verify with bun test && bun run test:e2e (full E2E lifecycle). Every fix in the wave must have test coverage.
  5. Close with context — every closed PR gets a comment explaining why and what (if anything) supersedes it. Contributors did real work; respect that with clear communication and thank them.
  6. Ship as one PR — single PR to master with all attributions preserved via Co-Authored-By: trailers. Include a summary of what merged and what closed.

Community PR guardrails:

  • Always AskUserQuestion before accepting commits that touch voice, tone, or promotional material (README intro, CHANGELOG voice, skill templates).
  • Never auto-merge PRs that remove YC references or "neutralize" the founder perspective.
  • Preserve contributor attribution in commit messages.

Skill routing

When the user's request matches an available skill, ALWAYS invoke it using the Skill tool as your FIRST action. Do NOT answer directly, do NOT use other tools first. The skill has specialized workflows that produce better results than ad-hoc answers.

NEVER hand-roll ship operations. Do not manually run git commit + push + gh pr create when /ship is available. /ship handles VERSION bump, CHANGELOG, document-release, pre-landing review, test coverage audit, and adversarial review. Manually creating a PR skips all of these. If the user says "commit and ship", "push and ship", "bisect and ship", or any combination that ends with shipping — invoke /ship and let it handle everything including the commits. If the branch name contains a version (e.g. v0.5-live-sync), /ship should use that version for the bump.

Key routing rules:

  • Product ideas, "is this worth building", brainstorming → invoke office-hours
  • Bugs, errors, "why is this broken", 500 errors → invoke investigate
  • Ship, deploy, push, create PR, "commit and ship", "push and ship" → invoke ship
  • QA, test the site, find bugs → invoke qa
  • Code review, check my diff → invoke review
  • Update docs after shipping → invoke document-release
  • Weekly retro → invoke retro
  • Design system, brand → invoke design-consultation
  • Visual audit, design polish → invoke design-review
  • Architecture review → invoke plan-eng-review
  • Save progress, checkpoint, resume → invoke checkpoint
  • Code quality, health check → invoke health