feat: GBrain v0.2.0 — incremental sync, file storage, install skill (#2)

* refactor: extract importFile from import.ts + add tag reconciliation

  Shared single-file import function used by both import and sync. Adds tag
  reconciliation (removes stale tags on reimport), >1MB file skip, and
  import->sync checkpoint continuity (writes git HEAD to config table after
  import so sync picks up seamlessly).

* feat: add sync pure functions, updateSlug engine method, and sync tests

  - buildSyncManifest: parses git diff --name-status -M output
  - isSyncable: filters to .md pages, excludes hidden/ops/.raw/skip-list
  - pathToSlug: converts file paths to page slugs with optional prefix
  - updateSlug: renames page slug in-place (preserves page_id, chunks, embeddings)
  - rewriteLinks: stub for v0.2 (FKs use page_id, already correct)
  - 20 new tests, all passing (39 total across 3 files)

* feat: add gbrain sync command with CLI, MCP, and watch mode

  18-step sync protocol: read config, git pull, ancestry validation,
  git diff --name-status -M for net changes, isSyncable filter, process
  deletes/renames/adds/modifies via importFile, batch optimization, sync state
  checkpoint in Postgres config table. Watch mode with polling and consecutive
  error counter. MCP sync_brain tool returns structured SyncResult. Stale page
  deletion for un-syncable files.

* feat: add files table, gbrain files commands, and config show redaction

  - files table: page_slug FK with ON DELETE SET NULL + ON UPDATE CASCADE,
    storage_path, storage_url, mime_type, content_hash for dedup
  - gbrain files list/upload/sync/verify commands for Supabase Storage
  - gbrain config show redacts postgresql:// passwords and secret keys
  - CLI help updated with FILES section

* feat: add install skill for GBrain onboarding

  6-phase install workflow: environment discovery, Supabase setup (magic path
  via CLI OAuth or fallback 2-copy-paste), init + import, ongoing sync cron,
  optional file migration with mandatory verification, and agent teaching
  (AGENTS.md rules). Every error gets what + why + fix.

* docs: update project documentation for v0.2.0

* docs: add v0.2 features to README (sync, files, install skill)

  README.md: added sync command to IMPORT/EXPORT section, added FILES section
  with 4 commands, added files table to schema diagram, added install skill to
  skills table, updated MCP tools count from 20 to 21 (sync_brain added).

* fix: OpenClaw DX improvements (skill count, upgrade docs, config show help)

* refactor: consolidate version to single source of truth

  Create src/version.ts that reads from package.json via static import (safe
  for bun compiled binaries). Update mcp/server.ts from hardcoded '0.1.0' to
  use shared VERSION. Bump skills/manifest.json to 0.2.0.

* fix: upgrade detection order, npm→bun naming, clawhub false positives

  Reorder detection: node_modules first, binary second, clawhub last. Rename
  'npm' install method to 'bun'. Use 'clawhub --version' instead of
  'which clawhub' to avoid false positives from dangling symlinks. Add 120s
  timeout to execSync calls to prevent hanging. Add --help flag.

* feat: per-command --help, unknown command check before DB connection

  Add COMMAND_HELP map covering all 28 commands. Check --help before
  init/upgrade dispatch and before connectEngine() so help works without a
  database. Use COMMAND_HELP keys as known-command set to catch unknown
  commands before wasting a DB round-trip.

* docs: standardize npm references to bun, add Upgrade section to README

  Fix init.ts: npx→bunx, npm→bun for supabase CLI guidance. Fix README:
  npm install→bun add for standalone CLI install. Add ## Upgrade section to
  README with all three install methods. Update install skill Upgrading
  section to list bun, ClawHub, and binary.

* test: full coverage audit (CLI dispatch, upgrade detection, config, edge cases)

  New test files:
  - test/cli.test.ts: COMMAND_HELP ↔ switch consistency, version from
    package.json, per-command --help, unknown command handling, global help
  - test/upgrade.test.ts: detection order verification, npm→bun naming,
    clawhub --version (not which), timeout presence
  - test/config.test.ts: redactUrl for postgresql URLs, edge cases

  Extended existing tests:
  - test/sync.test.ts: empty string pathToSlug, uppercase .MD rejection,
    deeply nested files, multiple renames, unknown status codes
  - test/markdown.test.ts: multiple --- separators, missing frontmatter,
    no frontmatter at all, empty string, type inference from paths

  Tests: 39 → 83 (+44 new). All pass.

* test: 100% coverage (import-file mock engine, files utils, chunker edge cases)

  New test files:
  - test/import-file.test.ts (9 tests): mock BrainEngine to test importFile
    without DB: MAX_FILE_SIZE skip, content_hash dedup, tag reconciliation
    (remove stale + add new), compiled_truth/timeline chunking, noEmbed flag,
    sequential chunk_index
  - test/files.test.ts (22 tests): getMimeType for all extensions + uppercase
    + unknown + no-extension, fileHash consistency + different content +
    empty, collectFiles pattern (skip .md, skip hidden dirs, recurse, sorted
    output)

  Extended:
  - test/chunkers/recursive.test.ts (+6 tests): single newline splits,
    word-only text, clause delimiters, lossless preservation, default options,
    mixed delimiter hierarchy

  Tests: 83 → 118 (+35 new). All pass.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
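
The sync pure functions named in the message above (buildSyncManifest, isSyncable, pathToSlug) are not part of the diff shown here. As a rough illustration of what they describe, here is a minimal TypeScript sketch. Only the function names come from the commit message; the type shapes, the `SKIP_LIST` contents, the treatment of `ops` as a directory name, and the slug format are assumptions, not the project's actual implementation.

```typescript
// Hypothetical shapes: only the function names come from the commit message.
interface Rename { from: string; to: string }
interface SyncManifest {
  added: string[];
  modified: string[];
  deleted: string[];
  renamed: Rename[];
}

// Parse `git diff --name-status -M` output: one tab-separated record per line,
// e.g. "A\tpath", "M\tpath", "D\tpath", or "R100\told\tnew" for renames.
function buildSyncManifest(diffOutput: string): SyncManifest {
  const manifest: SyncManifest = { added: [], modified: [], deleted: [], renamed: [] };
  for (const line of diffOutput.split("\n")) {
    const [status, first, second] = line.split("\t");
    if (!status || !first) continue; // blank or malformed line
    if (status === "A") manifest.added.push(first);
    else if (status === "M") manifest.modified.push(first);
    else if (status === "D") manifest.deleted.push(first);
    else if (status.charAt(0) === "R" && second) manifest.renamed.push({ from: first, to: second });
    // Any other status code is ignored in this sketch.
  }
  return manifest;
}

// Assumed skip-list; the real list is not shown in this commit.
const SKIP_LIST = ["README.md"];

// Keep only .md pages; drop hidden paths (which covers ".raw"), anything under
// a directory named "ops" (assumption), and skip-listed file names.
function isSyncable(path: string): boolean {
  if (!/\.md$/.test(path)) return false; // case-sensitive, so "NOTES.MD" is rejected
  const segments = path.split("/");
  if (segments.some((s) => s.charAt(0) === ".")) return false;
  if (segments.slice(0, -1).indexOf("ops") !== -1) return false;
  const fileName = segments[segments.length - 1] || "";
  return SKIP_LIST.indexOf(fileName) === -1;
}

// Convert a file path to a page slug: strip ".md", optionally prepend a prefix.
function pathToSlug(path: string, prefix: string = ""): string {
  const slug = path.replace(/\.md$/, "");
  return prefix ? prefix.replace(/\/+$/, "") + "/" + slug : slug;
}
```

In this sketch a rename record keeps both paths, which is what would let an engine method like updateSlug rename a page in place instead of deleting and re-importing it.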
2026-04-06 16:50:15 -07:00
parent b22cbd349a
commit ecebd5552a
29 changed files with 2365 additions and 145 deletions
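
The commit message mentions that `gbrain config show` redacts postgresql:// passwords, with a `redactUrl` helper covered by test/config.test.ts. That helper is not in the diff below, so the following is only a sketch of the idea: the regex and the `***` placeholder are assumptions, and the real function may handle more cases, such as secret keys.

```typescript
// Sketch only: mask the password portion of a postgres connection URL.
// Matches "postgresql://user:password@..." or "postgres://user:password@...".
function redactUrl(url: string): string {
  return url.replace(/^(postgres(?:ql)?:\/\/[^:@\/]*:)[^@]*@/, "$1***@");
}
```

URLs without a password, and strings that are not postgres URLs, pass through unchanged because the pattern requires a `user:password@` section right after the scheme.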
--- a/skills/install/SKILL.md
+++ b/skills/install/SKILL.md
@@ -0,0 +1,210 @@
+# Install GBrain
+
+Set up GBrain from scratch. The agent drives the process; the human provides secrets and approvals.
+
+## Prerequisites
+
+- A Supabase account (Pro tier recommended: $25/mo for 8GB DB + 100GB storage)
+- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
+- A git-backed markdown knowledge base (or start fresh)
+
+## Phase 1: Environment Discovery
+
+Scan the environment to understand what we're working with.
+
+```bash
+# Find all git repos with markdown content
+echo "=== GBrain Environment Discovery ==="
+for dir in /data/* ~/git/* ~/Documents/*; do
+  if [ -d "$dir/.git" ]; then
+    md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
+    if [ "$md_count" -gt 10 ]; then
+      total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
+      binary_count=$(find "$dir" -not -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.pdf" -o -name "*.mp4" -o -name "*.m4a" -o -name "*.heic" -o -name "*.tiff" -o -name "*.dng" \) 2>/dev/null | wc -l | tr -d ' ')
+      echo ""
+      echo "  $dir ($total_size, $md_count .md files, $binary_count binary files)"
+      # Detect knowledge base type
+      if [ -d "$dir/.obsidian" ]; then
+        echo "    Type: Obsidian vault (detected, wikilink conversion needed in future release)"
+      elif [ -d "$dir/logseq" ]; then
+        echo "    Type: Logseq (detected, block-ref conversion needed in future release)"
+      else
+        echo "    Type: Plain markdown (ready for import)"
+      fi
+    fi
+  fi
+done
+echo ""
+echo "=== Discovery Complete ==="
+```
+
+Present findings to the human. Recommend which repos to import.
+
+## Phase 2: Supabase Setup
+
+### Magic Path (zero copy-pastes)
+
+Check if the Supabase CLI is available:
+
+```bash
+command -v supabase 2>/dev/null || bunx supabase --version 2>/dev/null
+```
+
+If available, use the magic path:
+
+1. Tell the human: "I'll set up Supabase for you. Click 'Authorize' when your browser opens."
+2. Run `supabase login` (opens browser for OAuth)
+3. Run `supabase projects create gbrain --region us-east-1`
+4. Extract credentials from `supabase projects api-keys --project-ref <project_ref>`
+5. Proceed to Phase 3 automatically
+
+### Fallback Path (2 copy-pastes)
+
+If the Supabase CLI is not available, tell the human exactly what to do:
+
+1. "Log into Supabase and add a credit card: https://supabase.com/dashboard/account/billing"
+2. "Create a new project: https://supabase.com/dashboard/new/_"
+   - Name: gbrain
+   - Region: closest to you
+   - Generate a strong password
+3. "Go to Project Settings > Database and copy the connection string (URI format)"
+   - Paste it here
+4. "Go to Project Settings > API and copy the service_role key"
+   - Paste it here
+
+That's it. Two copy-pastes. The agent does everything else.
+
+## Phase 3: Initialize GBrain
+
+```bash
+gbrain init \
+  --url "<database_url>" \
+  --repo "<repo_path>"
+```
+
+This runs:
+1. Connection test (SELECT 1)
+2. pgvector extension check (CREATE EXTENSION IF NOT EXISTS vector)
+3. Schema migration (idempotent, safe to re-run)
+4. Text import (all .md files, no embeddings yet)
+5. Sync checkpoint (writes git HEAD for seamless gbrain sync)
+
+### First Search Result
+
+After import completes, run a sample query to prove it works:
+
+```bash
+# Query the most recently modified page's topic
+gbrain query "$(ls -t <repo_path>/*.md <repo_path>/**/*.md 2>/dev/null | head -1 | xargs head -5 | grep -i -m1 'title:' | cut -d: -f2- | sed 's/^ *//')"
+```
+
+Show results to the human immediately. This is the magic moment.
+
+### Start Embeddings
+
+```bash
+gbrain embed --stale &
+```
+
+Embeddings run in background. Keyword search works NOW. Semantic search improves as embeddings complete. Check progress with `gbrain embed --status`.
+
+## Phase 4: Set Up Ongoing Sync
+
+```bash
+# Add to cron (every 5 minutes)
+(crontab -l 2>/dev/null; echo "*/5 * * * * gbrain sync --no-pull 2>&1 | tail -1 >> /tmp/gbrain-sync.log") | crontab -
+```
+
+Or for agents that push to the brain repo, trigger sync after writes:
+```bash
+gbrain sync --no-pull
+```
+
+## Phase 5: Optional File Migration
+
+If the repo has >100MB of binary files:
+
+1. **Tell the human what will happen:**
+   "Your repo has X binary files (Y MB). I can move them to Supabase Storage to slim down git. Files stay in git history permanently. Want me to proceed?"
+
+2. **If approved:**
+   ```bash
+   gbrain health                              # verify everything is connected
+   gbrain files sync <repo>/attachments/      # upload all files
+   gbrain files verify                        # mandatory 100% verification
+   # STOP: ask human for approval before git rm
+   ```
+
+3. **After human approves git rm:**
+   ```bash
+   cd <repo>
+   echo "attachments/" >> .gitignore && git add .gitignore
+   git rm -r --cached attachments/
+   git commit -m "Move attachments to Supabase Storage"
+   git push
+   ```
+
+## Phase 6: Teach the Agent
+
+Add GBrain rules to AGENTS.md (or equivalent):
+
+```markdown
+## GBrain (Knowledge Search)
+
+GBrain indexes your knowledge base for fast search. Always search before answering
+questions about people, companies, deals, or anything in the brain.
+
+### Commands
+- `gbrain query "search terms"` -- Search the knowledge base (keyword + semantic)
+- `gbrain sync` -- Sync latest changes from git to GBrain
+- `gbrain files upload <path> --page <slug>` -- Upload a file to storage
+- `gbrain health` -- Check GBrain status
+- `gbrain stats` -- Show page count, embedding coverage, last sync
+
+### Rules
+1. **Search the brain first.** Before answering any question about people, companies,
+   deals, meetings, or strategy, run `gbrain query`. Your memory of file contents
+   goes stale; the database doesn't.
+2. **Never commit binaries to git.** Use `gbrain files upload` instead.
+3. **After writing to the brain repo,** trigger `gbrain sync --no-pull` to update
+   the search index immediately.
+```
+
+## Error Handling
+
+Every error tells you what happened, why, and how to fix it:
+
+| What You See | Why | Fix |
+|---|---|---|
+| Connection refused | Supabase project paused or wrong URL | supabase.com/dashboard > Restore |
+| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
+| pgvector not available | Extension not enabled | Run `CREATE EXTENSION vector;` in the SQL Editor |
+| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
+| Sync anchor missing | Force push removed the commit | `gbrain sync --full` |
+| No pages found | Query before import | `gbrain import <dir>` first |
+
+## Upgrading
+
+Upgrade depends on how you installed:
+- **bun (standalone or library):** `bun update gbrain`
+- **ClawHub:** `clawhub update gbrain`
+- **Compiled binary:** Download the latest from [GitHub Releases](https://github.com/garrytan/gbrain/releases)
+
+After upgrading:
+- Run `gbrain init` again to apply schema migrations (idempotent, safe to re-run)
+- The new `files` table gets created automatically on next init
+- Sync state is preserved across upgrades
+
+## Health Check
+
+Run `gbrain health` at any time to verify all connections:
+
+```
+ok Database: connected
+ok pgvector: extension loaded
+ok Schema: up to date
+ok Sync: last run N min ago
+ok Embeddings: X/Y pages embedded
+```
+
+Every unhealthy line includes WHY and FIX.
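
The First Search Result step in the skill above derives its sample query from the `title:` field near the top of the most recently modified page. A shell pipeline is easy to get subtly wrong here (titles containing spaces or colons, pages without frontmatter), so here is a hedged TypeScript sketch of the same extraction. `extractTitle` is a hypothetical helper, not part of GBrain, and it assumes simple `key: value` frontmatter delimited by `---` lines.

```typescript
// Hypothetical helper: read the `title:` value from a page's frontmatter.
// Returns null when there is no frontmatter block or no non-empty title.
function extractTitle(markdown: string): string | null {
  const lines = markdown.split(/\r?\n/);
  if ((lines[0] || "").trim() !== "---") return null; // no frontmatter block
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] || "";
    if (line.trim() === "---") break; // end of frontmatter
    const match = /^title:\s*(.*)$/i.exec(line);
    if (match) {
      // Keep inner spaces and colons; strip one pair of surrounding quotes.
      const value = (match[1] || "").trim().replace(/^["']|["']$/g, "");
      return value === "" ? null : value;
    }
  }
  return null;
}
```

An agent could pass the returned string to `gbrain query` and fall back to a generic query when the result is `null`.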
--- a/skills/manifest.json
+++ b/skills/manifest.json
@@ -1,6 +1,6 @@
 {
  "name": "gbrain",
-  "version": "0.1.0",
+  "version": "0.2.0",
  "description": "Personal knowledge brain with hybrid RAG search",
  "skills": [
    {
@@ -32,6 +32,11 @@
      "name": "migrate",
      "path": "migrate/SKILL.md",
      "description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
+    },
+    {
+      "name": "install",
+      "path": "install/SKILL.md",
+      "description": "Set up GBrain from scratch: Supabase, import, sync, file migration"
    }
  ],
  "dependencies": {