feat: GBrain v0.2.0 — incremental sync, file storage, install skill (#2)

* refactor: extract importFile from import.ts + add tag reconciliation

Shared single-file import function used by both import and sync.
Adds tag reconciliation (removes stale tags on reimport), >1MB file
skip, and import->sync checkpoint continuity (writes git HEAD to
config table after import so sync picks up seamlessly).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
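
Tag reconciliation on reimport amounts to a set difference between the tags already stored for a page and the tags found in the file. A minimal sketch, assuming a hypothetical `reconcileTags` helper (the name and return shape are illustrative, not taken from the GBrain source):

```typescript
// Hypothetical helper: name and shape are assumptions, not GBrain's actual API.
interface TagDiff {
  toAdd: string[];
  toRemove: string[];
}

// Compare the tags stored for a page with the tags in the re-imported file.
function reconcileTags(existing: string[], incoming: string[]): TagDiff {
  const existingSet = new Set(existing);
  const incomingSet = new Set(incoming);
  return {
    // Tags present in the file but not yet stored.
    toAdd: Array.from(incomingSet).filter((tag) => !existingSet.has(tag)),
    // Stale tags: stored for the page but no longer in the file.
    toRemove: Array.from(existingSet).filter((tag) => !incomingSet.has(tag)),
  };
}

const exampleDiff = reconcileTags(["draft", "people"], ["people", "investor"]);
console.log(exampleDiff);
```

Computing both lists up front lets the caller apply removals and additions in one pass, whatever the storage layer looks like.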

* feat: add sync pure functions, updateSlug engine method, and sync tests

- buildSyncManifest: parses git diff --name-status -M output
- isSyncable: filters to .md pages, excludes hidden/ops/.raw/skip-list
- pathToSlug: converts file paths to page slugs with optional prefix
- updateSlug: renames page slug in-place (preserves page_id, chunks, embeddings)
- rewriteLinks: stub for v0.2 (FKs use page_id, already correct)
- 20 new tests, all passing (39 total across 3 files)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
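
The three pure functions above can be sketched as follows. `git diff --name-status -M` emits tab-separated lines such as `M<TAB>path` and `R100<TAB>old<TAB>new`; everything else here (the manifest field names, the skip-list contents, the location of the ops directory, the slug format) is an assumption made for illustration and may differ from the real implementation.

```typescript
// Sketch only: field names and filter rules are assumptions for illustration.
interface SyncManifest {
  added: string[];
  modified: string[];
  deleted: string[];
  renamed: { from: string; to: string }[];
}

function buildSyncManifest(diffOutput: string): SyncManifest {
  const manifest: SyncManifest = { added: [], modified: [], deleted: [], renamed: [] };
  for (const line of diffOutput.split("\n")) {
    if (line.trim() === "") continue;
    const [status, first, second] = line.split("\t");
    if (status === undefined || first === undefined) continue; // malformed line
    if (status === "A") manifest.added.push(first);
    else if (status === "M") manifest.modified.push(first);
    else if (status === "D") manifest.deleted.push(first);
    // Renames look like "R100\told\tnew"; the number is a similarity score.
    else if (status.startsWith("R") && second !== undefined) {
      manifest.renamed.push({ from: first, to: second });
    }
    // Any other status code is ignored in this sketch.
  }
  return manifest;
}

const SKIP_LIST = new Set(["README.md"]); // hypothetical skip-list entries

function isSyncable(path: string): boolean {
  if (!path.endsWith(".md")) return false; // case-sensitive, so ".MD" is rejected
  const segments = path.split("/");
  if (segments.some((s) => s.startsWith("."))) return false; // hidden paths, including .raw
  if (segments[0] === "ops") return false; // assumed location of the ops directory
  const base = path.slice(path.lastIndexOf("/") + 1);
  return !SKIP_LIST.has(base);
}

function pathToSlug(path: string, prefix = ""): string {
  const withoutExt = path.replace(/\.md$/, "");
  return prefix ? `${prefix}/${withoutExt}` : withoutExt;
}

const sample = buildSyncManifest("A\tpeople/jane.md\nR100\tnotes/a.md\tnotes/b.md\n");
console.log(sample.added.filter(isSyncable).map((p) => pathToSlug(p)));
```

Keeping these functions free of git and database calls is what makes them testable without a repository or a Postgres connection.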

* feat: add gbrain sync command with CLI, MCP, and watch mode

18-step sync protocol: read config, git pull, ancestry validation,
git diff --name-status -M for net changes, isSyncable filter, process
deletes/renames/adds/modifies via importFile, batch optimization,
sync state checkpoint in Postgres config table. Watch mode with
polling and consecutive error counter. MCP sync_brain tool returns
structured SyncResult. Stale page deletion for un-syncable files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
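
The consecutive error counter in watch mode can be modelled as a small state transition: a successful sync resets the counter, a failure increments it, and the loop stops once a threshold is reached. A rough sketch, where `WatchState`, `recordSyncOutcome`, and the threshold of 5 are all hypothetical:

```typescript
// Hypothetical state model; names and threshold are assumptions.
interface WatchState {
  consecutiveErrors: number;
  stopped: boolean;
}

const MAX_CONSECUTIVE_ERRORS = 5; // assumed threshold, not taken from GBrain

// Fold the outcome of one polling cycle into the watch state.
function recordSyncOutcome(state: WatchState, succeeded: boolean): WatchState {
  if (succeeded) {
    // Any success clears the streak, so transient failures do not accumulate.
    return { consecutiveErrors: 0, stopped: false };
  }
  const consecutiveErrors = state.consecutiveErrors + 1;
  return { consecutiveErrors, stopped: consecutiveErrors >= MAX_CONSECUTIVE_ERRORS };
}

let watchState: WatchState = { consecutiveErrors: 0, stopped: false };
for (const ok of [false, false, true, false]) {
  watchState = recordSyncOutcome(watchState, ok);
}
console.log(watchState);
```

Counting only consecutive failures means an occasional network blip does not end a long-running watch, while a persistent fault does.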

* feat: add files table, gbrain files commands, and config show redaction

- files table: page_slug FK with ON DELETE SET NULL + ON UPDATE CASCADE,
  storage_path, storage_url, mime_type, content_hash for dedup
- gbrain files list/upload/sync/verify commands for Supabase Storage
- gbrain config show redacts postgresql:// passwords and secret keys
- CLI help updated with FILES section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
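
Password redaction for connection strings can be done with a single regular expression over the `user:password@` portion of the URL. The sketch below borrows the `redactUrl` name from the test list in this commit, but its signature and the `***` placeholder are assumptions:

```typescript
// Sketch of URL redaction; signature and placeholder are assumptions.
function redactUrl(value: string): string {
  // Match scheme://user:password@ and keep everything except the password.
  return value.replace(/^(postgres(?:ql)?:\/\/[^:@\/]+):[^@]*@/, "$1:***@");
}

console.log(redactUrl("postgresql://postgres:s3cret@db.example.com:5432/postgres"));
```

Strings that are not PostgreSQL URLs, or that carry no password, pass through unchanged, so the function is safe to apply to every config value before printing.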

* feat: add install skill for GBrain onboarding

6-phase install workflow: environment discovery, Supabase setup (magic
path via CLI OAuth or fallback 2-copy-paste), init + import, ongoing
sync cron, optional file migration with mandatory verification, and
agent teaching (AGENTS.md rules). Every error gets what + why + fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.2.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add v0.2 features to README (sync, files, install skill)

README.md: added sync command to IMPORT/EXPORT section, added FILES
section with 4 commands, added files table to schema diagram, added
install skill to skills table, updated MCP tools count from 20 to 21
(sync_brain added).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: OpenClaw DX improvements (skill count, upgrade docs, config show help)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate version to single source of truth

Create src/version.ts that reads from package.json via static import
(safe for bun compiled binaries). Update mcp/server.ts from hardcoded
'0.1.0' to use shared VERSION. Bump skills/manifest.json to 0.2.0.

* fix: upgrade detection order, npm→bun naming, clawhub false positives

Reorder detection: node_modules first, binary second, clawhub last.
Rename 'npm' install method to 'bun'. Use 'clawhub --version' instead
of 'which clawhub' to avoid false positives from dangling symlinks.
Add 120s timeout to execSync calls to prevent hanging. Add --help flag.

* feat: per-command --help, unknown command check before DB connection

Add COMMAND_HELP map covering all 28 commands. Check --help before
init/upgrade dispatch and before connectEngine() so help works without
a database. Use COMMAND_HELP keys as known-command set to catch unknown
commands before wasting a DB round-trip.
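
The ordering described here (resolve help and unknown commands from argv alone, then connect) can be sketched as a pure preflight step. The map below is a hypothetical two-entry subset with illustrative help text, and `preflight` and `Dispatch` are invented names:

```typescript
// Hypothetical subset of the help map; help text is illustrative.
const COMMAND_HELP = new Map<string, string>([
  ["sync", "Usage: gbrain sync [--no-pull] [--full]"],
  ["query", 'Usage: gbrain query "<search terms>"'],
]);

type Dispatch =
  | { kind: "help"; text: string }
  | { kind: "unknown"; command: string }
  | { kind: "run"; command: string; args: string[] };

// Decide what to do from argv alone, before any database connection is opened.
function preflight(argv: string[]): Dispatch {
  const [command, ...args] = argv;
  const help = command === undefined ? undefined : COMMAND_HELP.get(command);
  if (command === undefined || help === undefined) {
    // The help map doubles as the set of known commands.
    return { kind: "unknown", command: command ?? "" };
  }
  if (args.includes("--help")) return { kind: "help", text: help };
  return { kind: "run", command, args };
}

console.log(preflight(["sync", "--help"]));
```

Only the `run` result would go on to open a database connection; the other two can be printed and exited immediately.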

* docs: standardize npm references to bun, add Upgrade section to README

Fix init.ts: npx→bunx, npm→bun for supabase CLI guidance.
Fix README: npm install→bun add for standalone CLI install.
Add ## Upgrade section to README with all three install methods.
Update install skill Upgrading section to list bun, ClawHub, and binary.

* test: full coverage audit — CLI dispatch, upgrade detection, config, edge cases

New test files:
- test/cli.test.ts: COMMAND_HELP ↔ switch consistency, version from
  package.json, per-command --help, unknown command handling, global help
- test/upgrade.test.ts: detection order verification, npm→bun naming,
  clawhub --version (not which), timeout presence
- test/config.test.ts: redactUrl for postgresql URLs, edge cases

Extended existing tests:
- test/sync.test.ts: empty string pathToSlug, uppercase .MD rejection,
  deeply nested files, multiple renames, unknown status codes
- test/markdown.test.ts: multiple --- separators, missing frontmatter,
  no frontmatter at all, empty string, type inference from paths

Tests: 39 → 83 (+44 new). All pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: 100% coverage — import-file mock engine, files utils, chunker edge cases

New test files:
- test/import-file.test.ts (9 tests): mock BrainEngine to test importFile
  without DB — MAX_FILE_SIZE skip, content_hash dedup, tag reconciliation
  (remove stale + add new), compiled_truth/timeline chunking, noEmbed flag,
  sequential chunk_index
- test/files.test.ts (22 tests): getMimeType for all extensions + uppercase
  + unknown + no-extension, fileHash consistency + different content + empty,
  collectFiles pattern (skip .md, skip hidden dirs, recurse, sorted output)

Extended:
- test/chunkers/recursive.test.ts (+6 tests): single newline splits,
  word-only text, clause delimiters, lossless preservation, default options,
  mixed delimiter hierarchy

Tests: 83 → 118 (+35 new). All pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-06 16:50:15 -07:00
committed by GitHub
parent b22cbd349a
commit ecebd5552a
29 changed files with 2365 additions and 145 deletions

210
skills/install/SKILL.md Normal file

@@ -0,0 +1,210 @@
# Install GBrain
Set up GBrain from scratch. The agent drives the process; the human provides secrets and approvals.
## Prerequisites
- A Supabase account (Pro tier recommended: $25/mo for 8GB DB + 100GB storage)
- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
- A git-backed markdown knowledge base (or start fresh)
## Phase 1: Environment Discovery
Scan the environment to understand what we're working with.
```bash
# Find all git repos with markdown content
echo "=== GBrain Environment Discovery ==="
# Unmatched globs stay literal and are skipped by the .git check below
for dir in /data/* ~/git/* ~/Documents/*; do
  if [ -d "$dir/.git" ]; then
    md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
    if [ "$md_count" -gt 10 ]; then
      total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
      binary_count=$(find "$dir" -not -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.pdf" -o -name "*.mp4" -o -name "*.m4a" -o -name "*.heic" -o -name "*.tiff" -o -name "*.dng" \) 2>/dev/null | wc -l | tr -d ' ')
      echo ""
      echo " $dir ($total_size, $md_count .md files, $binary_count binary files)"
      # Detect knowledge base type
      if [ -d "$dir/.obsidian" ]; then
        echo " Type: Obsidian vault (detected, wikilink conversion needed in future release)"
      elif [ -d "$dir/logseq" ]; then
        echo " Type: Logseq (detected, block-ref conversion needed in future release)"
      else
        echo " Type: Plain markdown (ready for import)"
      fi
    fi
  fi
done
echo ""
echo "=== Discovery Complete ==="
```
Present findings to the human. Recommend which repos to import.
## Phase 2: Supabase Setup
### Magic Path (zero copy-pastes)
Check if the Supabase CLI is available:
```bash
which supabase 2>/dev/null || bunx supabase --version 2>/dev/null
```
If available, use the magic path:
1. Tell the human: "I'll set up Supabase for you. Click 'Authorize' when your browser opens."
2. Run `supabase login` (opens browser for OAuth)
3. Run `supabase projects create gbrain --region us-east-1` (the project name is a positional argument)
4. Extract credentials from `supabase projects api-keys --project-ref <project_ref>`
5. Proceed to Phase 3 automatically
### Fallback Path (2 copy-pastes)
If the Supabase CLI is not available, tell the human exactly what to do:
1. "Log into Supabase and add a credit card: https://supabase.com/dashboard/account/billing"
2. "Create a new project: https://supabase.com/dashboard/new/_"
- Name: gbrain
- Region: closest to you
- Generate a strong password
3. "Go to Project Settings > Database and copy the connection string (URI format)"
- Paste it here
4. "Go to Project Settings > API and copy the service_role key"
- Paste it here
That's it. Two copy-pastes. The agent does everything else.
## Phase 3: Initialize GBrain
```bash
gbrain init \
--url "<database_url>" \
--repo "<repo_path>"
```
This runs:
1. Connection test (SELECT 1)
2. pgvector extension check (CREATE EXTENSION IF NOT EXISTS vector)
3. Schema migration (idempotent, safe to re-run)
4. Text import (all .md files, no embeddings yet)
5. Sync checkpoint (writes git HEAD for seamless gbrain sync)
### First Search Result
After import completes, run a sample query to prove it works:
```bash
# Query the most recently modified page's topic
gbrain query "$(ls -t <repo_path>/*.md <repo_path>/**/*.md 2>/dev/null | head -1 | xargs head -5 | grep -i -m1 'title:' | cut -d: -f2- | sed 's/^[[:space:]]*//')"
```
Show results to the human immediately. This is the magic moment.
### Start Embeddings
```bash
gbrain embed --stale &
```
Embeddings run in background. Keyword search works NOW. Semantic search improves as embeddings complete. Check progress with `gbrain embed --status`.
## Phase 4: Set Up Ongoing Sync
```bash
# Add to cron (every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * gbrain sync --no-pull 2>&1 | tail -1 >> /tmp/gbrain-sync.log") | crontab -
```
Or for agents that push to the brain repo, trigger sync after writes:
```bash
gbrain sync --no-pull
```
## Phase 5: Optional File Migration
If the repo has >100MB of binary files:
1. **Tell the human what will happen:**
"Your repo has X binary files (Y MB). I can move them to Supabase Storage to slim down git. Files stay in git history permanently. Want me to proceed?"
2. **If approved:**
```bash
gbrain health # verify everything is connected
gbrain files sync <repo>/attachments/ # upload all files
gbrain files verify # mandatory 100% verification
# STOP: ask human for approval before git rm
```
3. **After human approves git rm:**
```bash
cd <repo>
echo "attachments/" >> .gitignore
git add .gitignore
git rm -r --cached attachments/
git commit -m "Move attachments to Supabase Storage"
git push
```
## Phase 6: Teach the Agent
Add GBrain rules to AGENTS.md (or equivalent):
```markdown
## GBrain (Knowledge Search)
GBrain indexes your knowledge base for fast search. Always search before answering
questions about people, companies, deals, or anything in the brain.
### Commands
- `gbrain query "search terms"` -- Search the knowledge base (keyword + semantic)
- `gbrain sync` -- Sync latest changes from git to GBrain
- `gbrain files upload <path> --page <slug>` -- Upload a file to storage
- `gbrain health` -- Check GBrain status
- `gbrain stats` -- Show page count, embedding coverage, last sync
### Rules
1. **Search the brain first.** Before answering any question about people, companies,
deals, meetings, or strategy, run `gbrain query`. Your memory of file contents
goes stale; the database doesn't.
2. **Never commit binaries to git.** Use `gbrain files upload` instead.
3. **After writing to the brain repo,** trigger `gbrain sync --no-pull` to update
the search index immediately.
```
## Error Handling
Every error tells you what happened, why, and how to fix it:
| What You See | Why | Fix |
|---|---|---|
| Connection refused | Supabase project paused or wrong URL | supabase.com/dashboard > Restore |
| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
| pgvector not available | Extension not enabled | Run `CREATE EXTENSION IF NOT EXISTS vector;` in SQL Editor |
| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
| Sync anchor missing | Force push removed the commit | `gbrain sync --full` |
| No pages found | Query before import | `gbrain import <dir>` first |
## Upgrading
Upgrade depends on how you installed:
- **bun (standalone or library):** `bun update gbrain`
- **ClawHub:** `clawhub update gbrain`
- **Compiled binary:** Download the latest from [GitHub Releases](https://github.com/garrytan/gbrain/releases)
After upgrading:
- Run `gbrain init` again to apply schema migrations (idempotent, safe to re-run)
- The new `files` table gets created automatically on next init
- Sync state is preserved across upgrades
## Health Check
Run `gbrain health` at any time to verify all connections:
```
ok Database: connected
ok pgvector: extension loaded
ok Schema: up to date
ok Sync: last run N min ago
ok Embeddings: X/Y pages embedded
```
Every unhealthy line includes WHY and FIX.


@@ -1,6 +1,6 @@
{
"name": "gbrain",
-"version": "0.1.0",
+"version": "0.2.0",
"description": "Personal knowledge brain with hybrid RAG search",
"skills": [
{
@@ -32,6 +32,11 @@
"name": "migrate",
"path": "migrate/SKILL.md",
"description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
},
{
"name": "install",
"path": "install/SKILL.md",
"description": "Set up GBrain from scratch: Supabase, import, sync, file migration"
}
],
"dependencies": {