Commit Graph

13 Commits

Author SHA1 Message Date
Garry Tan
95353f8790 Add Section 16: Deterministic Collectors — Code for Data, LLMs for Judgment (#12)
Pattern for when LLMs keep failing at mechanical tasks despite prompt fixes.
Real example: Gmail links in emails were dropped five times, then fixed by moving URL generation to
a deterministic Node.js collector script that feeds pre-formatted data to the LLM.

Architecture: deterministic pipeline → structured data → LLM analysis layer.
Same pattern as x-collector (Twitter data) — generalized to email, calendar,
Slack, GitHub, and any recurring data pull.

Co-authored-by: root <root@localhost>
2026-04-09 14:28:47 -07:00
Garry Tan
2f8aa80a49 docs: add SKILLPACK loading to OpenClaw install step 6
OpenClaw setup now instructs agents to read the SKILLPACK and update
all skills with production agent patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 10:25:16 -10:00
Garry Tan
2555de269a chore: add GitHub issue templates
Bug report template (includes gbrain doctor --json field) and
feature request template.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:22:53 -10:00
Garry Tan
011600ad2d docs: add non-OpenClaw guide, file storage docs, promote SKILLPACK
- New "GBrain without OpenClaw" section: standalone CLI, MCP server
  config (Claude Code + Cursor), TypeScript library with examples,
  and skill file loading table
- New "File storage and migration" section: three-stage lifecycle
  (mirror/redirect/clean), all 10 file subcommands, storage backends
- SKILLPACK promoted throughout: bold callout in "Production Agent"
  section, bold link in Docs section
- Removed duplicate "Using as a library" and "MCP server" sections
  (now covered in the unified non-OpenClaw guide)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:22:27 -10:00
Garry Tan
95eda98e27 feat: surface GBRAIN_SKILLPACK.md during setup and init
- gbrain init success message now prints the skillpack path
- Setup skill adds Phase E: load the production agent guide
- Agents are instructed to read and inject key SKILLPACK patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:22:00 -10:00
Garry Tan
041a6e51cb fix: validate required CLI params before calling handler
gbrain get with no args now shows "Usage: gbrain get <slug>" instead of
leaking a raw Postgres driver error (UNDEFINED_VALUE).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:21:35 -10:00
Garry Tan
912a321cfa GBrain v0.4.0 — production agent documentation + reference architecture (#10)
* fix: widen validateSlug to accept any filename characters

Git is the system of record. Slugs are lowercased repo-relative paths.
The restrictive regex rejected spaces, parens, and special chars, blocking
5,861 Apple Notes files from importing. Now only rejects empty slugs,
path traversal (..), and leading slash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
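
The relaxed slug check described in the commit above could look roughly like the following. This is a minimal sketch, not the actual gbrain code: the function name `validateSlugSketch` is hypothetical, and treating `..` as a whole path segment (rather than any substring) is an assumption.

```typescript
// Hypothetical sketch: reject only empty slugs, leading slashes, and path traversal.
function validateSlugSketch(slug: string): boolean {
  if (slug.length === 0) return false; // empty slug
  if (slug.startsWith("/")) return false; // leading slash
  // Assumption: ".." counts as traversal only when it is a full path segment,
  // so dots inside file names remain allowed.
  if (slug.split("/").includes("..")) return false;
  return true;
}
```

Under these assumptions a slug such as `notes/my note (draft)` passes, while `a/../b` and `/abs` are rejected.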

* feat: enable RLS on all tables with BYPASSRLS safety check

Without RLS, the Supabase anon key gives full read access to the DB.
Enable RLS on all 10 tables with no policies — the postgres role
(used by gbrain via pooler) has BYPASSRLS and is unaffected. Only
enables if the current role actually has BYPASSRLS privilege to
avoid locking ourselves out on non-Supabase setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: import resilience — 5MB limit, error suppression, structured progress

Raise MAX_FILE_SIZE from 1MB to 5MB for Apple Notes with attachments.
Track error patterns and suppress after 5 identical errors to prevent
5,861 identical warnings from killing the agent process. Replace \r
progress bar with structured log lines (rate, ETA) for agent parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
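
One way to suppress repeated errors as described above is a per-message counter. The sketch below is illustrative only; the class name `ErrorSuppressor` and its methods are hypothetical, and it assumes "identical" means an exact string match on the message.

```typescript
// Hypothetical sketch: allow each distinct error message to be logged at most `limit` times.
class ErrorSuppressor {
  private counts: Map<string, number> = new Map();
  private limit: number;

  constructor(limit: number = 5) {
    this.limit = limit;
  }

  // Records one occurrence and returns true while the message is still under the limit.
  shouldLog(message: string): boolean {
    const seen = (this.counts.get(message) ?? 0) + 1;
    this.counts.set(message, seen);
    return seen <= this.limit;
  }

  // Total occurrences recorded for a message, including suppressed ones.
  count(message: string): number {
    return this.counts.get(message) ?? 0;
  }
}
```

A caller would wrap its warning output in `if (suppressor.shouldLog(msg))` and could report `count(msg)` in a final summary.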

* fix: init detects IPv6-only Supabase URLs, adds pgvector check

Detect db.*.supabase.co direct URLs and warn about IPv6 failure.
On ECONNREFUSED/ETIMEDOUT to Supabase, suggest the Session pooler
connection string with exact dashboard click path. Check for pgvector
extension after connecting and fail with clear instructions if missing.
Update wizard hints to show pooler URL format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add pre-ship requirement for E2E tests

E2E tests against real Postgres+pgvector must pass before /ship or
/review. Adds the requirement to CLAUDE.md so all agents enforce it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: parallel import with per-worker engine instances

Refactor PostgresEngine to support instance-level DB connections instead
of only the module-global singleton. Each worker gets its own connection
with poolSize:2 (vs 10 for the main engine), so 8 workers = 16 connections.

Add --workers N flag to gbrain import. Workers pull from a shared queue
and use independent engine instances — no transaction context corruption.

The bottleneck is network round-trips to Supabase (one per page upsert).
Parallel workers cut import time roughly in proportion to the worker count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
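
The shared-queue worker pattern described above can be sketched as follows. This is an assumption-laden illustration rather than the gbrain implementation: `runWorkers` and `workerConnections` are hypothetical names, and the `handle` callback stands in for a per-worker engine instance performing one page upsert.

```typescript
// Hypothetical sketch: N workers drain one shared queue.
// In the real design each worker would own its own engine instance and connection pool.
async function runWorkers<T>(
  items: T[],
  workers: number,
  handle: (item: T, workerId: number) => Promise<void>,
): Promise<void> {
  let next = 0; // shared cursor; safe because JavaScript callbacks run on one thread
  const loop = async (workerId: number): Promise<void> => {
    while (next < items.length) {
      const item = items[next++] as T;
      await handle(item, workerId);
    }
  };
  await Promise.all(Array.from({ length: workers }, (_, id) => loop(id)));
}

// Connection budget from the commit message: poolSize 2 per worker.
function workerConnections(workers: number, poolSizePerWorker: number = 2): number {
  return workers * poolSizePerWorker;
}
```

With the default pool size, `workerConnections(8)` gives the 16 connections quoted in the message.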

* feat: automatic schema migration runner

Migrations are embedded as string constants in migrate.ts (survives
Bun --compile). Each migration runs in a transaction for clean rollback
on failure. Runs automatically on initSchema() — no manual step needed
when a user updates the gbrain binary against an older DB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pluggable storage backend (S3 + Supabase Storage + local)

Add StorageBackend interface with three implementations:
- S3Storage: works with AWS S3, Cloudflare R2, MinIO (any S3-compatible)
- SupabaseStorage: uses Supabase Storage REST API with service role key
- LocalStorage: filesystem-based, for testing

Add file-resolver.ts with fallback chain: local file → .redirect
breadcrumb → .supabase marker → storage backend. Supports the
three-stage migration (mirror → redirect → clean).

Add yaml-lite.ts for parsing marker and breadcrumb files without
adding a YAML dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gbrain doctor command — health checks with --json output

Checks: connection, pgvector extension, RLS on all tables, schema
version, embedding coverage. Outputs structured JSON with --json flag
for agent parsing. Exit code 0 if healthy, 1 if issues found.

Agents should run gbrain doctor --json when any command fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite setup skill + README for agent-first DX

Setup skill: add Why Supabase, step-by-step project creation, explicit
agent instructions (nohup for large imports, doctor on failure, don't
ask for anon key), available init flags, file migration offer after
first import. Remove ClawHub references.

README: simplify to single OpenClaw install path, remove ClawHub, fix
squatted npm name to github:garrytan/gbrain, add Supabase settings
note about Session pooler.

Add Apple Notes test fixtures with spaces and parens in filenames for
E2E testing of the slug fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add RLS verification, schema health, and nohup hints to maintain skill

Maintenance skill now checks RLS status and schema version as part of
periodic health checks. Adds nohup pattern for large embedding refreshes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: import resume checkpoint + Supabase smart URL parsing

Import resume: saves checkpoint every 100 files to ~/.gbrain/import-checkpoint.json.
On restart with same directory and file count, skips already-processed files.
Use --fresh to ignore checkpoint and start over. Cleared on successful completion.

Supabase admin: extractProjectRef() parses any Supabase URL format (dashboard,
direct, pooler, project URL) to extract the project ref. discoverPoolerUrl()
uses the Management API to find the correct pooler connection string (including
the exact region prefix). checkRls() verifies RLS status via the API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
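
The resume rule described above (same directory, same file count, unless `--fresh` is given) reduces to a small decision function. The sketch below is hypothetical: the `ImportCheckpoint` fields and the `resumeIndex` name are assumptions, not the actual format of `import-checkpoint.json`.

```typescript
// Hypothetical checkpoint shape; the real file format is not shown in the commit message.
interface ImportCheckpoint {
  dir: string;
  totalFiles: number;
  processed: number; // number of files already imported
}

// Returns the index of the first file to process on this run.
function resumeIndex(
  checkpoint: ImportCheckpoint | null,
  dir: string,
  totalFiles: number,
  fresh: boolean,
): number {
  if (fresh || checkpoint === null) return 0; // --fresh or no checkpoint: start over
  if (checkpoint.dir !== dir || checkpoint.totalFiles !== totalFiles) return 0; // stale checkpoint
  return checkpoint.processed;
}
```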

* test: add 56 unit tests for all new code

8 new test files covering every feature added in this branch:
- slug-validation.test.ts: spaces, parens, unicode, path traversal (10 tests)
- yaml-lite.test.ts: parse + stringify, marker/redirect formats (9 tests)
- supabase-admin.test.ts: extractProjectRef for 4 URL formats (7 tests)
- migrate.test.ts: version export, runMigrations callable (2 tests)
- storage.test.ts: LocalStorage CRUD + createStorage factory (14 tests)
- file-resolver.test.ts: fallback chain, redirect, marker parsing (6 tests)
- import-resume.test.ts: checkpoint save/load/resume/fresh (6 tests)
- doctor.test.ts: module export, CLI registration (3 tests)

Total: 184 pass, 0 fail (up from 128).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: bulk chunk INSERT + E2E tests for all new features

Bulk INSERT: upsertChunks now builds a multi-row VALUES query instead
of inserting chunks one-by-one. Reduces DB round-trips by ~50x per page.

E2E tests added to mechanical.test.ts:
- Slug with special chars: import Apple Notes fixtures with spaces/parens,
  verify search finds them, verify idempotency
- RLS verification: check pg_tables.rowsecurity on all tables, verify
  current user has BYPASSRLS
- Doctor command: verify exit 0 on healthy DB, --json produces valid JSON
  with check structure
- Parallel import: --workers 2 produces same page count as sequential

Unit tests added:
- setup-branching.test.ts: IPv6 detection, defaultWorkers auto-tuning,
  smart URL parsing across all Supabase URL formats

Fixtures added:
- large/big-file.md (2.1MB) for testing raised file size limit
- apple-notes/ fixtures already existed

Total: 200 pass, 0 fail (up from 184).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
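
The multi-row VALUES technique described above can be illustrated with a small query builder. This is a sketch under stated assumptions: the `ChunkRow` shape, the table name `chunks`, and its column names are hypothetical and may not match the gbrain schema.

```typescript
// Hypothetical chunk shape and table/column names, for illustration only.
interface ChunkRow {
  pageId: number;
  chunkIndex: number;
  text: string;
}

// Builds one parameterized INSERT covering all rows, instead of one query per chunk.
// Callers should skip the query entirely when `rows` is empty.
function buildBulkInsert(rows: ChunkRow[]): { sql: string; params: (number | string)[] } {
  const params: (number | string)[] = [];
  const tuples = rows.map((row, i) => {
    params.push(row.pageId, row.chunkIndex, row.text);
    const base = i * 3; // three placeholders per row
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });
  const sql = "INSERT INTO chunks (page_id, chunk_index, chunk_text) VALUES " + tuples.join(", ");
  return { sql, params };
}
```

Because values travel as parameters rather than being spliced into the SQL string, a page's chunks go to the database in a single round-trip without manual escaping.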

* feat: --json on init/import, file migration CLI, lifecycle tests

--json flag: init and import now support --json for structured output.
Agents get parseable JSON instead of human-readable text.

File migration CLI: implement mirror, unmirror, redirect, restore,
clean, and status subcommands for the three-stage file migration
lifecycle (local → mirrored → redirected → cloud-only).

File migration tests: full lifecycle test covering every transition
in the state machine (LOCAL → MIRROR → UNMIRROR → REDIRECT → RESTORE
→ CLEAN), including edge cases and file resolver at each stage.

Bulk chunk INSERT: upsertChunks now builds multi-row parameterized
VALUES query, reducing round-trips per page from ~50 to 1.

Total: 207 pass, 0 fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: thorough E2E tests for parallel import concurrency

Replace the weak single-comparison parallel import test with 7 tests:
- Sequential baseline: capture page count, chunk count, and all slugs
- --workers 2: verify page count matches sequential
- Chunk count matches (no duplicates from concurrent writes)
- Page slugs match exactly
- No duplicate pages (SQL GROUP BY HAVING count > 1)
- No duplicate chunks (SQL GROUP BY page_id, chunk_index)
- --workers 4: also works correctly
- Re-import with workers is idempotent

These tests catch the exact bug Codex found (db.ts singleton causing
concurrent transaction corruption) by verifying data integrity after
parallel writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add batch embedding queue as P1 TODO

Deferred during eng review (per-worker embedding is good enough for now).
Revisit after profiling real imports to confirm embedding is the bottleneck.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: E2E test failures — fixture counts, arg parsing, doctor exit code

Fix fixture count assertions: 13 → 16 pages (added apple-notes + large file),
companies 2 → 3 (ohmygreen), concepts 3 → 5 (notes, big-file).

Fix --workers arg parsing: the worker count value (e.g. "2") was being
picked up as the directory arg. Skip flag values when finding the dir.

Fix doctor exit code: warnings (like missing embeddings) should exit 0,
only actual failures exit 1. E2E tests import with --no-embed, so
embeddings are always WARN.

Fix E2E CLI tests: add initCli() before doctor and parallel import
tests so ~/.gbrain/config.json exists for the subprocess.

All E2E tests pass: 63 pass, 0 fail.
All unit tests pass: 207 pass, 0 fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
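
The `--workers` parsing fix above amounts to skipping a flag's value when looking for the directory argument. A minimal sketch, assuming `--workers` is the only value-taking flag and using the hypothetical name `findDirArg`:

```typescript
// Hypothetical sketch: flags that consume the next argument as their value.
const FLAGS_WITH_VALUES = new Set<string>(["--workers"]);

// Returns the first positional argument, skipping flags and their values.
function findDirArg(args: string[]): string | undefined {
  for (let i = 0; i < args.length; i++) {
    const arg = args[i] as string;
    if (FLAGS_WITH_VALUES.has(arg)) {
      i++; // skip the flag's value, e.g. the "2" in "--workers 2"
      continue;
    }
    if (arg.startsWith("--")) continue; // boolean flag such as --no-embed
    return arg;
  }
  return undefined;
}
```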

* docs: update project documentation for v0.4.0

New CHANGELOG entry for all post-0.3.0 features (doctor, storage backends,
parallel import, resume checkpoints, RLS, schema migrations, --json output).
Version bumped 0.3.0 → 0.4.0 across all manifests.

CLAUDE.md: test count 9→19, skill count 8→7, added key files.
CONTRIBUTING.md: fixture count 13→16, added missing source files.
README.md: added gbrain doctor to commands, fixed stale welcome PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add GBRAIN_SKILLPACK.md reference architecture

Production agent patterns from a real deployment with 14,700+ brain files.
Covers: entity detection on every message, brain-first lookup protocol,
7-step enrichment pipeline with tiered API spend, compiled truth + timeline,
source attribution with mandatory citations, meeting ingestion with entity
propagation, cron schedule with quiet hours and travel-aware timezone,
YouTube/media ingestion via Diarize.io, integration guides for ClawVisor,
Circleback webhooks, and Quo/OpenPhone SMS. Opens with the Vannevar Bush
memex framing and the originals folder for capturing intellectual capital.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite README opener with memex pitch and production architecture

Replace code-first opener with mimetic-desire pitch: Vannevar Bush memex
tagline, production brain numbers (10K+ files, 3K+ people, 13 years of
calendar), "ask it anything" examples, compounding thesis.

New sections: The Compounding Thesis (read-write loop), Architecture
(three-column diagram), What a Production Agent Looks Like (SKILLPACK
reference), How gbrain fits with OpenClaw (three-layer complement).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update skills with brain-first lookup, entity detection, heartbeat

setup: Phase D rewritten with brain-first lookup protocol (gbrain search
→ query → get → grep fallback), sync-after-write rule, memory_search
complement table.

query: token-budget awareness (chunks not full pages), source precedence
hierarchy (user > compiled truth > timeline > external).

ingest: entity detection on every message (scan, check brain, create or
enrich, commit and sync).

maintain: heartbeat integration (doctor, embed --stale, sync verification,
stale compiled truth detection).

briefing: gbrain-native context loading (search attendees before meetings,
search sender before email, daily deal/meeting/commitment queries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add OpenClaw positioning to README opener

Make it clear up top that GBrain is built for OpenClaw agents and
works with any OpenClaw deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: credit Karpathy's Knowledge LLM vision, add origin story

GBrain started as Karpathy's LLM wiki idea built for real. Worked great
until the brain hit thousands of files and grep fell apart. GBrain is the
search layer that had to exist once the brain outgrew grep.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 10:17:13 -07:00
Garry Tan
a86f995883 feat: GBrain v0.3.0 — contract-first architecture + ClawHub plugin (#7)
* feat: contract-first operations.ts with OperationError, dry_run, importFromContent

30 shared operations as single source of truth for CLI and MCP.
- OperationError with typed error codes (page_not_found, invalid_params, etc.)
- dry_run support on all mutating operations
- importFromContent split from importFile with transaction wrapping
- Idempotency hash now includes ALL fields (title, type, frontmatter, tags)
- Config env var fallback: GBRAIN_DATABASE_URL > DATABASE_URL > config file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
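
The config precedence listed above (GBRAIN_DATABASE_URL > DATABASE_URL > config file) can be sketched as a single lookup. The function name `resolveDatabaseUrl` is hypothetical, and treating empty strings as unset is an assumption.

```typescript
// Hypothetical sketch of the fallback order; `env` would typically be the process environment.
function resolveDatabaseUrl(
  env: Record<string, string | undefined>,
  configFileUrl: string | undefined,
): string | undefined {
  // Assumption: `||` is used so that empty strings fall through to the next source.
  return env["GBRAIN_DATABASE_URL"] || env["DATABASE_URL"] || configFileUrl;
}
```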

* refactor: rewrite MCP server + CLI + tools-json from operations

server.ts: 233 -> ~80 lines. Tool definitions and dispatch generated from operations[].
cli.ts: shared operations auto-registered, CLI-only commands kept as manual dispatch.
tools-json: generated FROM operations[], eliminating the third contract surface.
Parity test verifies structural contract between operations, CLI, and MCP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: delete 12 command files migrated to operations.ts

Handler logic for get, put, delete, list, search, query, health, stats,
tags, link, timeline, and version now lives in operations.ts.
Kept: init, upgrade, import, export, files, embed, sync, serve, call, config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: init --non-interactive, upgrade verification, schema migration

- gbrain init --non-interactive --url <url> for plugin mode (no TTY required)
- Post-upgrade version verification in gbrain upgrade
- Drop storage_url from files table (storage_path is the only identifier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tool-agnostic skills + new setup skill

All 7 skills rewritten with intent-based language instead of CLI commands.
Works with both CLI and MCP plugin contexts.
New setup skill replaces install: auto-provision Supabase via CLI,
AGENTS.md injection, target TTHW < 2 min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ClawHub bundle plugin, CI workflows, v0.3.0

- openclaw.plugin.json with configSchema, MCP server config, skill listing
- GitHub Actions: test on push/PR, multi-platform release (macOS arm64 + Linux x64)
- Version bump 0.3.0, CHANGELOG, README ClawHub section, CLAUDE.md updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: idempotency hash mismatch + MCP dry_run passthrough

importFromContent now passes its all-fields hash through putPage via
content_hash on PageInput, so the stored hash matches the computed hash.
Previously the skip-if-unchanged check never fired because the hash
formulas differed.

MCP server now passes dry_run from tool params to OperationContext.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: schema loader handles PL/pgSQL $$ blocks

Delete the semicolon-based SQL splitter in db.ts which broke on
PL/pgSQL trigger functions containing semicolons inside $$ delimiter
blocks. Use single conn.unsafe(schemaSql) call instead — the postgres
driver handles multi-statement SQL natively. schema.sql already uses
IF NOT EXISTS / CREATE OR REPLACE for idempotency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: E2E test infrastructure + realistic brain fixtures

Add test infrastructure for running E2E tests against real
Postgres+pgvector. Includes:
- test/e2e/helpers.ts: DB lifecycle, fixture import, timing, diagnostics
- 13 fixture files as a miniature realistic brain (people, companies,
  deals, meetings, concepts, projects, sources) following the
  compiled truth + timeline format from GBRAIN_RECOMMENDED_SCHEMA.md
- docker-compose.test.yml: local pgvector convenience (port 5433)
- .env.testing.example: template for test credentials
- package.json: add test:e2e script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: E2E test suites + CI workflow

Tier 1 (mechanical.test.ts): 14 test suites covering all operations
against real Postgres — page CRUD, search with quality scoring, links,
tags, timeline, versions, admin, chunks, resolution, ingest log, raw
data, files, idempotency stress, setup journey (full CLI flow), init
edge cases, schema idempotency, schema diff guard, performance baselines.

Tier 1 (mcp.test.ts): MCP protocol test — spawns server, sends JSON-RPC,
verifies tools/list matches operations count.

Tier 2 (skills.test.ts): OpenClaw skill tests — ingest, query, health.
Skips gracefully when dependencies missing.

CI (.github/workflows/e2e.yml): Tier 1 on every PR (pgvector service),
Tier 2 nightly/manual with API key secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: E2E test fixes + traverseGraph jsonb cast

- Fix traverseGraph query: replace json_agg with jsonb_agg so SELECT DISTINCT works (json has no equality operator)
- Fix put_page tests to use importFromContent with noEmbed (no OpenAI key in Tier 1)
- Fix get_health assertion (page_count not total_pages)
- Fix raw_data test to handle JSONB string/object return
- Simplify MCP test to verify tool generation directly
- Add timeouts to CLI subprocess tests
- Use port 5434 for docker-compose (5433 often in use)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update all project docs for E2E test suite

- CLAUDE.md: updated test count (9 unit + 3 E2E), added E2E test
  instructions, fixed skill count to 8
- CONTRIBUTING.md: updated project structure with test/e2e/, added E2E
  test instructions, rewrote "Adding a new command" to reflect
  contract-first architecture (add to operations.ts, done)
- README.md: fixed table count (10 not 9), added recommended schema doc
  to Docs section, added E2E instructions to Contributing section
- CHANGELOG.md: added E2E test suite, docker-compose, schema loader fix,
  and traverseGraph jsonb fix to v0.3.0 entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:26:11 -10:00
Garry Tan
ee9e6689ad docs: expand brain schema with database architecture and OSS smoothing (#4)
* docs: expand brain schema — database architecture, dedup, enrichment sources, worked examples

Rewrite the recommended schema doc: present the database layer (entity registry,
event ledger, fact store, relationship graph) as the core architecture rather than
a future upgrade. Add entity identity/deduplication, enrichment source ordering,
epistemic discipline, three worked examples, concurrency guidance, and browser
budget. Smooth language for open-source readability.

* chore: bump version and changelog (v0.2.0.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 00:24:16 -07:00
Garry Tan
96384b712f docs: fix first-time experience — remove fictional kindling, add recommended schema (#3)
* docs: add recommended brain schema

Full LLM-maintained knowledge base architecture: MECE directory structure,
compiled truth + timeline pages, enrichment pipeline, resolver decision
tree, skill architecture, and cron job recommendations.

* docs: fix first-time experience — remove fictional kindling, add GitHub URL

- Remove all references to data/kindling/ (never existed)
- OpenClaw paste now references https://github.com/garrytan/gbrain
- "Try it" section rewritten as three-act story with user's own data
- Agent picks dynamic query based on imported content
- Step 5 links to recommended schema doc for brain restructuring
- Includes bun install fallback in paste step 1

* chore: bump version and changelog (v0.2.0.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 23:16:23 -07:00
Garry Tan
ecebd5552a feat: GBrain v0.2.0 — incremental sync, file storage, install skill (#2)
* refactor: extract importFile from import.ts + add tag reconciliation

Shared single-file import function used by both import and sync.
Adds tag reconciliation (removes stale tags on reimport), >1MB file
skip, and import->sync checkpoint continuity (writes git HEAD to
config table after import so sync picks up seamlessly).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add sync pure functions, updateSlug engine method, and sync tests

- buildSyncManifest: parses git diff --name-status -M output
- isSyncable: filters to .md pages, excludes hidden/ops/.raw/skip-list
- pathToSlug: converts file paths to page slugs with optional prefix
- updateSlug: renames page slug in-place (preserves page_id, chunks, embeddings)
- rewriteLinks: stub for v0.2 (FKs use page_id, already correct)
- 20 new tests, all passing (39 total across 3 files)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
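
Parsing `git diff --name-status -M` output, as buildSyncManifest is described as doing above, might look like the sketch below. It relies on git's tab-separated format, where renames appear as `R<score>`, old path, new path. The `SyncChange` type and `parseNameStatus` name are hypothetical, not the project's actual types.

```typescript
// Hypothetical result type for illustration.
type SyncChange =
  | { kind: "added" | "modified" | "deleted"; path: string }
  | { kind: "renamed"; from: string; to: string };

function parseNameStatus(output: string): SyncChange[] {
  const changes: SyncChange[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const [status = "", first = "", second = ""] = line.split("\t");
    if (status.startsWith("R")) {
      // Rename lines carry a similarity score, e.g. "R100\told.md\tnew.md".
      changes.push({ kind: "renamed", from: first, to: second });
    } else if (status === "A") {
      changes.push({ kind: "added", path: first });
    } else if (status === "M") {
      changes.push({ kind: "modified", path: first });
    } else if (status === "D") {
      changes.push({ kind: "deleted", path: first });
    }
    // Other status codes are ignored in this sketch.
  }
  return changes;
}
```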

* feat: add gbrain sync command with CLI, MCP, and watch mode

18-step sync protocol: read config, git pull, ancestry validation,
git diff --name-status -M for net changes, isSyncable filter, process
deletes/renames/adds/modifies via importFile, batch optimization,
sync state checkpoint in Postgres config table. Watch mode with
polling and consecutive error counter. MCP sync_brain tool returns
structured SyncResult. Stale page deletion for un-syncable files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add files table, gbrain files commands, and config show redaction

- files table: page_slug FK with ON DELETE SET NULL + ON UPDATE CASCADE,
  storage_path, storage_url, mime_type, content_hash for dedup
- gbrain files list/upload/sync/verify commands for Supabase Storage
- gbrain config show redacts postgresql:// passwords and secret keys
- CLI help updated with FILES section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
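
Redacting the password in a postgresql:// URL, as `gbrain config show` is described as doing above, can be sketched with one regular expression. The name `redactUrlSketch` and the `***` mask are assumptions, and this version does not handle passwords that contain an unescaped `@`.

```typescript
// Hypothetical sketch: mask the password portion of a postgres connection URL.
function redactUrlSketch(url: string): string {
  // Captures "scheme://user:" and the "@" separator, replacing only what lies between them.
  return url.replace(/^(postgres(?:ql)?:\/\/[^:\/@]+:)[^@]*(@)/, "$1***$2");
}
```

URLs without a password component do not match the pattern and are returned unchanged.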

* feat: add install skill for GBrain onboarding

6-phase install workflow: environment discovery, Supabase setup (magic
path via CLI OAuth or fallback 2-copy-paste), init + import, ongoing
sync cron, optional file migration with mandatory verification, and
agent teaching (AGENTS.md rules). Every error gets what + why + fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.2.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add v0.2 features to README (sync, files, install skill)

README.md: added sync command to IMPORT/EXPORT section, added FILES
section with 4 commands, added files table to schema diagram, added
install skill to skills table, updated MCP tools count from 20 to 21
(sync_brain added).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: OpenClaw DX improvements (skill count, upgrade docs, config show help)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate version to single source of truth

Create src/version.ts that reads from package.json via static import
(safe for bun compiled binaries). Update mcp/server.ts from hardcoded
'0.1.0' to use shared VERSION. Bump skills/manifest.json to 0.2.0.

* fix: upgrade detection order, npm→bun naming, clawhub false positives

Reorder detection: node_modules first, binary second, clawhub last.
Rename 'npm' install method to 'bun'. Use 'clawhub --version' instead
of 'which clawhub' to avoid false positives from dangling symlinks.
Add 120s timeout to execSync calls to prevent hanging. Add --help flag.

* feat: per-command --help, unknown command check before DB connection

Add COMMAND_HELP map covering all 28 commands. Check --help before
init/upgrade dispatch and before connectEngine() so help works without
a database. Use COMMAND_HELP keys as known-command set to catch unknown
commands before wasting a DB round-trip.

* docs: standardize npm references to bun, add Upgrade section to README

Fix init.ts: npx→bunx, npm→bun for supabase CLI guidance.
Fix README: npm install→bun add for standalone CLI install.
Add ## Upgrade section to README with all three install methods.
Update install skill Upgrading section to list bun, ClawHub, and binary.

* test: full coverage audit — CLI dispatch, upgrade detection, config, edge cases

New test files:
- test/cli.test.ts: COMMAND_HELP ↔ switch consistency, version from
  package.json, per-command --help, unknown command handling, global help
- test/upgrade.test.ts: detection order verification, npm→bun naming,
  clawhub --version (not which), timeout presence
- test/config.test.ts: redactUrl for postgresql URLs, edge cases

Extended existing tests:
- test/sync.test.ts: empty string pathToSlug, uppercase .MD rejection,
  deeply nested files, multiple renames, unknown status codes
- test/markdown.test.ts: multiple --- separators, missing frontmatter,
  no frontmatter at all, empty string, type inference from paths

Tests: 39 → 83 (+44 new). All pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 100% coverage — import-file mock engine, files utils, chunker edge cases

New test files:
- test/import-file.test.ts (9 tests): mock BrainEngine to test importFile
  without DB — MAX_FILE_SIZE skip, content_hash dedup, tag reconciliation
  (remove stale + add new), compiled_truth/timeline chunking, noEmbed flag,
  sequential chunk_index
- test/files.test.ts (22 tests): getMimeType for all extensions + uppercase
  + unknown + no-extension, fileHash consistency + different content + empty,
  collectFiles pattern (skip .md, skip hidden dirs, recurse, sorted output)

Extended:
- test/chunkers/recursive.test.ts (+6 tests): single newline splits,
  word-only text, clause delimiters, lossless preservation, default options,
  mixed delimiter hierarchy

Tests: 83 → 118 (+35 new). All pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:50:15 -07:00
Garry Tan
b22cbd349a feat: GBrain v0.1.0 — Postgres-native personal knowledge brain (#1)
* chore: add CLAUDE.md with project context and gstack skill routing rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: initialize project with Bun + TypeScript

package.json with dependencies (postgres, pgvector, openai, anthropic,
MCP SDK, gray-matter). TypeScript config targeting ESNext with bundler
module resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add foundation layer — engine interface, Postgres engine, schema

BrainEngine pluggable interface with full PostgresEngine: CRUD, search
(keyword + vector), links, tags, timeline, versions, stats, health,
ingest log, config. Trigger-based tsvector spanning pages +
timeline_entries. Markdown parser with frontmatter, compiled_truth /
timeline splitting, and round-trip serialization. 19 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 3-tier chunking and embedding service

Recursive delimiter-aware chunker (5-level hierarchy, 300-word chunks,
50-word overlap). Semantic chunker with Savitzky-Golay boundary detection
and recursive fallback. LLM-guided chunker via Claude Haiku with sliding
window topic detection. OpenAI embedding service with batch support,
exponential backoff, and rate limit handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
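
The chunk size and overlap figures above (300-word chunks, 50-word overlap) can be illustrated with a plain sliding window. This is deliberately simpler than the delimiter-aware recursive chunker the commit describes, since it splits on whitespace only. The name `chunkWords` is hypothetical.

```typescript
// Hypothetical sketch: fixed-size word windows with overlap (not delimiter-aware).
function chunkWords(text: string, size: number = 300, overlap: number = 50): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `step` words after the previous one
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}
```

With the defaults, consecutive chunks start 250 words apart, so each chunk repeats the last 50 words of the previous one.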

* feat: add hybrid search with RRF fusion, expansion, and 4-layer dedup

Hybrid search merges vector (pgvector HNSW) + keyword (tsvector) via
Reciprocal Rank Fusion. Multi-query expansion via Claude Haiku generates
2 alternative phrasings. 4-layer dedup pipeline: by source, cosine
similarity, type diversity (60% cap), per-page cap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
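
Reciprocal Rank Fusion, named above as the merge step for vector and keyword results, scores each item by summing 1 / (k + rank) over every ranking it appears in. The sketch below assumes string ids and uses k = 60, a commonly used constant. The value gbrain uses is not stated in the commit, and `rrfFuse` is a hypothetical name.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion over several ranked id lists.
function rrfFuse(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Items that appear near the top of both lists accumulate the largest scores, so agreement between vector and keyword search is rewarded without comparing their raw, differently scaled scores.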

* docs: add GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

GBRAIN_V0.md: full product spec with architecture decisions, CLI commands,
schema, search architecture, chunking strategies, first-time experience,
and future plans. ENGINES.md: pluggable engine interface, capability matrix,
how to add new backends. SQLITE_ENGINE.md: complete SQLite implementation
plan with schema, FTS5 setup, vector search options, and contributor guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add CLI with all commands

Full CLI dispatcher with 25+ commands: init (Supabase wizard), get, put,
delete, list, search, query (hybrid RRF), import (bulk with progress bar),
export (round-trip), embed, stats, health, tag/untag/tags, link/unlink/
backlinks/graph, timeline/timeline-add, history/revert, config, upgrade,
serve, call. Smart slug resolution on reads. Version snapshots on updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP stdio server with all brain tools

20 MCP tools mirroring CLI operations: get/put/delete/list pages,
search (keyword), query (hybrid RRF + expansion), tags, links with
graph traversal, timeline, stats, health, version history, and revert.
Auto-chunks and embeds on put_page. CLI and MCP share the same engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 6 skill files and ClawHub manifest

Fat markdown skills for AI agents: ingest (meetings/docs/articles with
timeline merge), query (3-layer search + synthesis + citations), maintain
(health checks, stale detection, orphan audit), enrich (external API
enrichment), briefing (daily briefing compilation), migrate (universal
migration from Obsidian/Notion/Logseq/markdown/CSV/JSON/Roam).
ClawHub manifest for skill distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add README, CONTRIBUTING, update CLAUDE.md test references

README with quickstart, commands, architecture, library usage, MCP setup,
and links to design docs. CONTRIBUTING with setup, project structure,
and guides for adding commands and engines. CLAUDE.md updated to reference
actual test files instead of planned-but-unwritten import test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address adversarial review findings — 5 critical/high fixes

- revertToVersion: add page_id check to prevent cross-page data corruption
- traverseGraph: use UNION instead of UNION ALL for cycle safety
- embedAll: preserve all chunks when embedding stale subset only
- embedding: throw on retry exhaustion instead of returning zero vectors
- putPage: validate slugs to prevent path traversal on export

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
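
As an illustration of the putPage fix, slug validation against path
traversal might look like the sketch below. The function name isSafeSlug
and the allowed character set are assumptions for illustration; the
actual rules in GBrain are not shown in this log.

```typescript
// Hypothetical validator, not GBrain's implementation.
// ASSUMPTION: a slug is one or more "/"-separated segments, each starting with a
// letter or digit and continuing with letters, digits, ".", "_" or "-".
const SAFE_SEGMENT = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

function isSafeSlug(slug: string): boolean {
  // Splitting on "/" turns a leading, trailing, or doubled slash into an empty
  // segment, which fails the pattern. "." and ".." fail because a segment must
  // start with a letter or digit; backslashes and NUL bytes are not allowed.
  return slug.split("/").every((segment) => SAFE_SEGMENT.test(segment));
}

// Usage: reject a slug before it is ever joined onto an export directory.
const acceptsNormalSlug = isSafeSlug("concepts/startup-ideas");
const rejectsTraversal = !isSafeSlug("../../etc/passwd");
```

Validating on write rather than on export means a malicious slug can
never be stored, so every later consumer of the slug inherits the check.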

* chore: bump version and changelog (v0.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand README with schema, install, search architecture, and motivation

Why it exists, how search works (with ASCII diagram), full database schema
with all 9 tables and index details, chunking strategies explained, storage
estimates, setup wizard walkthrough, knowledge model with example page,
library usage with more examples, expanded skills table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add MIT license (Copyright 2026 Garry Tan)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add OpenClaw install flow as primary option in README

OpenClaw users just say "install gbrain" and the orchestrator handles
everything: package install, Supabase setup wizard, skill registration.
Shows the conversational interface for querying, ingesting, and briefings.
ClawHub and standalone CLI paths follow as alternatives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add prerequisites and explicit OpenClaw install instructions

Prerequisites table listing Supabase, OpenAI, and Anthropic dependencies
with links. Environment variable setup. Explicit step-by-step prompt for
OpenClaw users showing exactly what to tell the orchestrator. Note that
search degrades gracefully without API keys (keyword-only without OpenAI,
no expansion without Anthropic).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: scrub named references, add PG essay demo section to README

Replace all Pedro/Brex/Jensen Huang/River AI examples with Paul Graham
essay examples using the kindling corpus. Add "Try it" section to README
showing the power of hybrid search on PG essays in 90 seconds. Update
test fixtures to use concept pages instead of person pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:48:10 -07:00
Garry Tan
3144971cd0 Initial commit with gstack 2026-04-05 07:40:55 -07:00