Commit Graph

2 Commits

Author SHA1 Message Date
Garry Tan
6c7d2ed30b feat: PGLite engine — local brain, zero infrastructure (v0.7.0) (#41)
* refactor: extract shared utils, add runMigration + getChunksWithEmbeddings to BrainEngine

Extract validateSlug, contentHash, rowToPage, rowToChunk, rowToSearchResult
from postgres-engine.ts into shared utils.ts. Add rowToChunk includeEmbedding
parameter for migration support.

Add two new methods to BrainEngine interface:
- runMigration(version, sql) — replaces internal eng.sql access in migrate.ts
- getChunksWithEmbeddings(slug) — returns chunks with embedding data for migration

Replace 'sqlite' with 'pglite' in EngineConfig and GBrainConfig types.
Fix loadConfig to infer engine from database_path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pluggable engine factory + hybridSearch keyword-only fallback

Add createEngine() factory with dynamic imports so PGLite WASM is never
loaded for Postgres users. Wire CLI to use factory instead of hardcoded
PostgresEngine.

Force workers=1 for PGLite imports (single-connection architecture).

Fix hybridSearch to check OPENAI_API_KEY before calling embed(). When
unset, returns keyword-only results instead of throwing. Critical for
local PGLite users who don't need vector search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: PGLiteEngine — embedded Postgres 17.5 via WASM, same SQL everywhere

Full BrainEngine implementation (37 methods) using @electric-sql/pglite.
Same SQL as PostgresEngine — tsvector triggers, pgvector HNSW, pg_trgm
fuzzy matching, recursive CTEs, JSONB. Only the driver call syntax differs
(parameterized queries instead of tagged templates).

PGLite schema is the Postgres schema minus RLS, advisory locks, and
remote auth tables (access_tokens, mcp_request_log, files).

No server. No subscription. One directory. Works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart init (PGLite default) + bidirectional engine migration

gbrain init now defaults to PGLite — brain ready in 2 seconds, no
server needed. Scans target directory: <1000 .md files = PGLite,
>=1000 = suggests Supabase. --supabase and --pglite flags override.

gbrain migrate --to supabase/pglite transfers all data between engines
with manifest-based resume. Copies pages, chunks (with embeddings),
tags, timeline, raw data, links, and config. --force overwrites
non-empty target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 60 new tests for PGLite engine, utils, and factory

41 PGLite engine tests covering all 37 BrainEngine methods: CRUD,
tsvector keyword search, pg_trgm fuzzy matching, chunk upsert with
COALESCE, graph traversal via recursive CTE, transactions, cascade
deletes, stats/health, and embedding round-trip.

14 shared utility tests (validateSlug, contentHash, row mappers).
5 engine factory tests (dispatch, error messages).

All run in-memory — zero Docker, zero DATABASE_URL, instant in CI.

Add P0 TODO: submit Bun PR for WASM embedding in bun build --compile
(oven-sh/bun#15032).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0 PGLite engine

- CLAUDE.md: add PGLite key files, update architecture, add migrate command, add 3 test files
- README.md: PGLite as default init, zero-config getting started, migration path to Supabase
- docs/ENGINES.md: PGLiteEngine shipped (v0.7), capability matrix, migration docs
- docs/SQLITE_ENGINE.md: marked superseded by PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove stale v0.4 README update prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove SQLITE_ENGINE.md (superseded by PGLite)

PGLite uses the same SQL as Postgres, making a separate SQLite
engine unnecessary. docs/ENGINES.md covers PGLiteEngine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README step 2 to default to PGLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add schema setup step and install-all-integrations step to README

Step 3 now tells agents to read GBRAIN_RECOMMENDED_SCHEMA.md and set up
the MECE directory structure before importing. Step 7 tells agents to
install every available integration recipe, not just list them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update install goal to match full opinionated setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add 'Need an AI agent first?' section with one-click deploy links

New users who don't have OpenClaw or Hermes Agent get pointed to
AlphaClaw on Render and the Hermes Agent Railway template. One click
each. Claude Code mentioned for users who already have it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add migrate to CLI_ONLY + help output, fix standalone example

- migrate command was missing from CLI_ONLY set (errored as "Unknown command")
- migrate now shows in --help under SETUP
- init help line shows --pglite flag
- standalone CLI example uses gbrain init (not --supabase)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: set realistic time expectation (~30 min to working brain)

DB is 2 seconds. But schema + import + embeddings + integrations
is 15-30 minutes. The agent does the work, you answer API key
questions. Don't oversell time-to-value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix AlphaClaw Render requirement (8GB+ RAM, not free tier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: final README polish for launch

- GOAL line: "Garry Tan's exact setup" (not Claude Code specific)
- Remove markdown links from code block (won't render)
- STEP 2 renamed from "START HERE" to "DATABASE"
- Tighten Supabase fallback text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove duplicate old install block from README

The v0.5-era "With OpenClaw or Hermes Agent" paste block was
superseded by the top-level "Start here" block. Having both
confused users and the old one still said --supabase as step 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clean up README consistency and remove duplicated content

- Remove duplicate "Try it" section (old 4-act walkthrough that
  repeated the install flow and contradicted "~30 min" with "90 sec")
- Remove duplicate Setup section (third repetition of gbrain init)
- Fix brain.db → brain.pglite (actual default path)
- Fix "coming in v0.7" → "not yet implemented" (we ARE v0.7)
- Remove "You don't need Postgres" (confusing since PGLite IS Postgres)
- Deduplicate "competitive dynamics" query (appeared 3 times)
- Collapse redundant standalone CLI section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:01:09 -10:00
Garry Tan
b22cbd349a feat: GBrain v0.1.0 — Postgres-native personal knowledge brain (#1)
* chore: add CLAUDE.md with project context and gstack skill routing rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: initialize project with Bun + TypeScript

package.json with dependencies (postgres, pgvector, openai, anthropic,
MCP SDK, gray-matter). TypeScript config targeting ESNext with bundler
module resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add foundation layer — engine interface, Postgres engine, schema

BrainEngine pluggable interface with full PostgresEngine: CRUD, search
(keyword + vector), links, tags, timeline, versions, stats, health,
ingest log, config. Trigger-based tsvector spanning pages +
timeline_entries. Markdown parser with frontmatter, compiled_truth /
timeline splitting, and round-trip serialization. 19 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 3-tier chunking and embedding service

Recursive delimiter-aware chunker (5-level hierarchy, 300-word chunks,
50-word overlap). Semantic chunker with Savitzky-Golay boundary detection
and recursive fallback. LLM-guided chunker via Claude Haiku with sliding
window topic detection. OpenAI embedding service with batch support,
exponential backoff, and rate limit handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add hybrid search with RRF fusion, expansion, and 4-layer dedup

Hybrid search merges vector (pgvector HNSW) + keyword (tsvector) via
Reciprocal Rank Fusion. Multi-query expansion via Claude Haiku generates
2 alternative phrasings. 4-layer dedup pipeline: by source, cosine
similarity, type diversity (60% cap), per-page cap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

GBRAIN_V0.md: full product spec with architecture decisions, CLI commands,
schema, search architecture, chunking strategies, first-time experience,
and future plans. ENGINES.md: pluggable engine interface, capability matrix,
how to add new backends. SQLITE_ENGINE.md: complete SQLite implementation
plan with schema, FTS5 setup, vector search options, and contributor guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add CLI with all commands

Full CLI dispatcher with 25+ commands: init (Supabase wizard), get, put,
delete, list, search, query (hybrid RRF), import (bulk with progress bar),
export (round-trip), embed, stats, health, tag/untag/tags, link/unlink/
backlinks/graph, timeline/timeline-add, history/revert, config, upgrade,
serve, call. Smart slug resolution on reads. Version snapshots on updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP stdio server with all brain tools

20 MCP tools mirroring CLI operations: get/put/delete/list pages,
search (keyword), query (hybrid RRF + expansion), tags, links with
graph traversal, timeline, stats, health, version history, and revert.
Auto-chunks and embeds on put_page. CLI and MCP share the same engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 6 skill files and ClawHub manifest

Fat markdown skills for AI agents: ingest (meetings/docs/articles with
timeline merge), query (3-layer search + synthesis + citations), maintain
(health checks, stale detection, orphan audit), enrich (external API
enrichment), briefing (daily briefing compilation), migrate (universal
migration from Obsidian/Notion/Logseq/markdown/CSV/JSON/Roam).
ClawHub manifest for skill distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add README, CONTRIBUTING, update CLAUDE.md test references

README with quickstart, commands, architecture, library usage, MCP setup,
and links to design docs. CONTRIBUTING with setup, project structure,
and guides for adding commands and engines. CLAUDE.md updated to reference
actual test files instead of planned-but-unwritten import test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address adversarial review findings — 5 critical/high fixes

- revertToVersion: add page_id check to prevent cross-page data corruption
- traverseGraph: use UNION instead of UNION ALL for cycle safety
- embedAll: preserve all chunks when embedding stale subset only
- embedding: throw on retry exhaustion instead of returning zero vectors
- putPage: validate slugs to prevent path traversal on export

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand README with schema, install, search architecture, and motivation

Why it exists, how search works (with ASCII diagram), full database schema
with all 9 tables and index details, chunking strategies explained, storage
estimates, setup wizard walkthrough, knowledge model with example page,
library usage with more examples, expanded skills table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add MIT license (Copyright 2026 Garry Tan)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add OpenClaw install flow as primary option in README

OpenClaw users just say "install gbrain" and the orchestrator handles
everything: package install, Supabase setup wizard, skill registration.
Shows the conversational interface for querying, ingesting, and briefings.
ClawHub and standalone CLI paths follow as alternatives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add prerequisites and explicit OpenClaw install instructions

Prerequisites table listing Supabase, OpenAI, and Anthropic dependencies
with links. Environment variable setup. Explicit step-by-step prompt for
OpenClaw users showing exactly what to tell the orchestrator. Note that
search degrades gracefully without API keys (keyword-only without OpenAI,
no expansion without Anthropic).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: scrub named references, add PG essay demo section to README

Replace all Pedro/Brex/Jensen Huang/River AI examples with Paul Graham
essay examples using the kindling corpus. Add "Try it" section to README
showing the power of hybrid search on PG essays in 90 seconds. Update
test fixtures to use concept pages instead of person pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:48:10 -07:00