feat: GBrain v0.3.0 — contract-first architecture + ClawHub plugin (#7)

* feat: contract-first operations.ts with OperationError, dry_run, importFromContent

30 shared operations as single source of truth for CLI and MCP.
- OperationError with typed error codes (page_not_found, invalid_params, etc.)
- dry_run support on all mutating operations
- importFromContent split from importFile with transaction wrapping
- Idempotency hash now includes ALL fields (title, type, frontmatter, tags)
- Config env var fallback: GBRAIN_DATABASE_URL > DATABASE_URL > config file
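
A minimal sketch of how a typed error, `dry_run`, and the env var fallback could fit together. The shapes here (`OperationContext`, `deletePage`, `resolveDatabaseUrl`, the in-memory `pages` store) are assumptions for illustration, not the actual contents of operations.ts; only the error codes and the precedence order come from the notes above.

```typescript
// Hypothetical shapes for illustration; the real operations.ts may differ.
type ErrorCode = "page_not_found" | "invalid_params";

class OperationError extends Error {
  readonly code: ErrorCode;
  constructor(code: ErrorCode, message: string) {
    super(message);
    this.name = "OperationError";
    this.code = code;
  }
}

interface OperationContext {
  dryRun: boolean;
  pages: Record<string, string>; // in-memory stand-in for the database
}

// A mutating operation: validate first, then either report the plan or apply it.
function deletePage(
  ctx: OperationContext,
  slug: string,
): { deleted: boolean; dry_run: boolean } {
  if (!slug) throw new OperationError("invalid_params", "slug is required");
  if (!(slug in ctx.pages)) {
    throw new OperationError("page_not_found", "no page: " + slug);
  }
  if (ctx.dryRun) return { deleted: false, dry_run: true }; // no side effects
  delete ctx.pages[slug];
  return { deleted: true, dry_run: false };
}

// Precedence from the commit message: GBRAIN_DATABASE_URL > DATABASE_URL > config file.
function resolveDatabaseUrl(
  env: Record<string, string | undefined>,
  configFileUrl?: string,
): string | undefined {
  return env.GBRAIN_DATABASE_URL || env.DATABASE_URL || configFileUrl;
}
```

Validating before the `dryRun` check means a dry run surfaces the same typed errors as a real run, which is one reasonable way to make it useful to callers.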

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rewrite MCP server + CLI + tools-json from operations

server.ts: 233 -> ~80 lines. Tool definitions and dispatch generated from operations[].
cli.ts: shared operations auto-registered, CLI-only commands kept as manual dispatch.
tools-json: generated FROM operations[], eliminating the third contract surface.
Parity test verifies structural contract between operations, CLI, and MCP.
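
The idea of generating tool definitions and dispatch from a single operations[] array can be sketched roughly as follows. The `Operation` and `ToolDefinition` shapes, the two sample entries, and `checkParity` are hypothetical; the tool names `get_page` and `list_pages` are taken from the skill docs in this commit.

```typescript
// Hypothetical operation shape; the real operations.ts may carry more fields.
type ParamType = "string" | "number" | "boolean";

interface ParamSpec {
  type: ParamType;
  required?: boolean;
}

interface Operation {
  name: string;
  description: string;
  params: Record<string, ParamSpec>;
  handler: (params: Record<string, unknown>) => unknown;
}

const operations: Operation[] = [
  {
    name: "get_page",
    description: "Read a page",
    params: { slug: { type: "string", required: true } },
    handler: (p) => ({ slug: p.slug }),
  },
  {
    name: "list_pages",
    description: "List pages by type",
    params: { type: { type: "string" } },
    handler: () => [],
  },
];

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: ParamType }>;
    required: string[];
  };
}

// Derive a tool definition from an operation instead of writing it by hand.
function toToolDefinition(op: Operation): ToolDefinition {
  const properties: Record<string, { type: ParamType }> = {};
  const required: string[] = [];
  for (const key of Object.keys(op.params)) {
    properties[key] = { type: op.params[key].type };
    if (op.params[key].required) required.push(key);
  }
  return {
    name: op.name,
    description: op.description,
    inputSchema: { type: "object", properties, required },
  };
}

const toolDefinitions = operations.map(toToolDefinition);

// Dispatch looks the handler up in the same array the definitions came from.
function dispatch(name: string, params: Record<string, unknown>): unknown {
  const matches = operations.filter((op) => op.name === name);
  if (matches.length === 0) throw new Error("unknown tool: " + name);
  return matches[0].handler(params);
}

// A structural parity check: same count, same names, same order.
function checkParity(): boolean {
  return (
    toolDefinitions.length === operations.length &&
    operations.every((op, i) => toolDefinitions[i].name === op.name)
  );
}
```

Because every surface reads from one array, adding an operation extends all of them at once, and a parity check like this only has to guard against manual overrides drifting.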

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: delete 12 command files migrated to operations.ts

Handler logic for get, put, delete, list, search, query, health, stats,
tags, link, timeline, and version now lives in operations.ts.
Kept: init, upgrade, import, export, files, embed, sync, serve, call, config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: init --non-interactive, upgrade verification, schema migration

- gbrain init --non-interactive --url <url> for plugin mode (no TTY required)
- Post-upgrade version verification in gbrain upgrade
- Drop storage_url from files table (storage_path is the only identifier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tool-agnostic skills + new setup skill

All 7 skills rewritten with intent-based language instead of CLI commands.
Works with both CLI and MCP plugin contexts.
New setup skill replaces install: auto-provision Supabase via CLI,
AGENTS.md injection, target TTHW < 2 min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ClawHub bundle plugin, CI workflows, v0.3.0

- openclaw.plugin.json with configSchema, MCP server config, skill listing
- GitHub Actions: test on push/PR, multi-platform release (macOS arm64 + Linux x64)
- Version bump 0.3.0, CHANGELOG, README ClawHub section, CLAUDE.md updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: idempotency hash mismatch + MCP dry_run passthrough

importFromContent now passes its all-fields hash through putPage via
content_hash on PageInput, so the stored hash matches the computed hash.
Previously the skip-if-unchanged check never fired because the hash
formulas differed.
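
To illustrate why the stored and computed hashes have to share one formula, here is a rough sketch. Everything in it is an assumption made for the example: the `PageInput` fields other than `content_hash`, the in-memory `store`, and the use of a canonical JSON string as the fingerprint (a real implementation would presumably feed that string through a cryptographic hash such as SHA-256).

```typescript
// Hypothetical page shape; only content_hash is named in the commit message.
interface PageInput {
  slug: string;
  title: string;
  type: string;
  body: string;
  frontmatter: Record<string, unknown>;
  tags: string[];
  content_hash?: string;
}

// Sort object keys recursively so equal content always serializes identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

// Fingerprint over ALL fields. Stand-in for a real hash of the same string.
function allFieldsFingerprint(p: PageInput): string {
  return JSON.stringify(
    canonicalize({
      title: p.title,
      type: p.type,
      body: p.body,
      frontmatter: p.frontmatter,
      tags: p.tags.slice().sort(),
    }),
  );
}

const store: Record<string, { content_hash: string }> = {};

// putPage stores the caller's hash when given, so writer and checker agree.
function putPage(input: PageInput): void {
  const hash =
    input.content_hash !== undefined ? input.content_hash : allFieldsFingerprint(input);
  store[input.slug] = { content_hash: hash };
}

function importFromContent(input: PageInput): "imported" | "skipped" {
  const hash = allFieldsFingerprint(input);
  const existing = store[input.slug];
  if (existing && existing.content_hash === hash) return "skipped"; // unchanged
  putPage({ ...input, content_hash: hash });
  return "imported";
}
```

If `putPage` recomputed the hash with a different formula (say, over the body only), the equality check in `importFromContent` could never succeed, which matches the symptom described above.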

MCP server now passes dry_run from tool params to OperationContext.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: schema loader handles PL/pgSQL $$ blocks

Delete the semicolon-based SQL splitter in db.ts, which broke on
PL/pgSQL trigger functions containing semicolons inside $$ delimiter
blocks. Use a single conn.unsafe(schemaSql) call instead; the postgres
driver handles multi-statement SQL natively. schema.sql already uses
IF NOT EXISTS / CREATE OR REPLACE for idempotency.
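
The failure mode can be reproduced with plain string handling. This is a sketch under assumptions: `naiveSplit` is a stand-in for the deleted splitter (whose exact code is not shown in this commit), and the table and function names in the sample SQL are made up.

```typescript
// Sample schema: one table plus one PL/pgSQL trigger function (names are hypothetical).
const schemaSql = [
  "CREATE TABLE IF NOT EXISTS pages (id serial PRIMARY KEY);",
  "CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$",
  "BEGIN",
  "  NEW.updated_at = now();",
  "  RETURN NEW;",
  "END;",
  "$$ LANGUAGE plpgsql;",
].join("\n");

// Stand-in for a semicolon-based splitter: it has no notion of $$ quoting.
function naiveSplit(sql: string): string[] {
  return sql
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const fragments = naiveSplit(schemaSql);

// A fragment with an odd number of $$ markers opens or closes a function body
// without its counterpart, so it is not valid SQL on its own.
const brokenFragments = fragments.filter(
  (f) => (f.split("$$").length - 1) % 2 === 1,
);

console.log(fragments.length); // 5 fragments for what are really 2 statements
console.log(brokenFragments.length); // 2 fragments cut through the $$ block
```

Sending `schemaSql` to the driver as one string sidesteps the problem because the server's own parser understands dollar quoting.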

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: E2E test infrastructure + realistic brain fixtures

Add test infrastructure for running E2E tests against real
Postgres+pgvector. Includes:
- test/e2e/helpers.ts: DB lifecycle, fixture import, timing, diagnostics
- 13 fixture files as a miniature realistic brain (people, companies,
  deals, meetings, concepts, projects, sources) following the
  compiled truth + timeline format from GBRAIN_RECOMMENDED_SCHEMA.md
- docker-compose.test.yml: local pgvector convenience (port 5433)
- .env.testing.example: template for test credentials
- package.json: add test:e2e script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: E2E test suites + CI workflow

Tier 1 (mechanical.test.ts): 14 test suites covering all operations
against real Postgres — page CRUD, search with quality scoring, links,
tags, timeline, versions, admin, chunks, resolution, ingest log, raw
data, files, idempotency stress, setup journey (full CLI flow), init
edge cases, schema idempotency, schema diff guard, performance baselines.

Tier 1 (mcp.test.ts): MCP protocol test — spawns server, sends JSON-RPC,
verifies tools/list matches operations count.

Tier 2 (skills.test.ts): OpenClaw skill tests — ingest, query, health.
Skips gracefully when dependencies missing.

CI (.github/workflows/e2e.yml): Tier 1 on every PR (pgvector service),
Tier 2 nightly/manual with API key secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: E2E test fixes + traverseGraph jsonb cast

- Fix traverseGraph query: use jsonb_agg instead of json_agg so SELECT DISTINCT works (json has no equality operator; jsonb does)
- Fix put_page tests to use importFromContent with noEmbed (no OpenAI key in Tier 1)
- Fix get_health assertion (page_count not total_pages)
- Fix raw_data test to handle JSONB string/object return
- Simplify MCP test to verify tool generation directly
- Add timeouts to CLI subprocess tests
- Use port 5434 for docker-compose (5433 often in use)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update all project docs for E2E test suite

- CLAUDE.md: updated test count (9 unit + 3 E2E), added E2E test
  instructions, fixed skill count to 8
- CONTRIBUTING.md: updated project structure with test/e2e/, added E2E
  test instructions, rewrote "Adding a new command" to reflect
  contract-first architecture (add to operations.ts, done)
- README.md: fixed table count (10 not 9), added recommended schema doc
  to Docs section, added E2E instructions to Contributing section
- CHANGELOG.md: added E2E test suite, docker-compose, schema loader fix,
  and traverseGraph jsonb fix to v0.3.0 entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Garry Tan
Date: 2026-04-08 23:26:11 -10:00
Committed by: GitHub
Parent: ee9e6689ad
Commit: a86f995883
67 changed files with 3827 additions and 1394 deletions


@@ -5,27 +5,27 @@ Compile a daily briefing from brain context.
## Workflow
1. **Today's meetings.** For each meeting on the calendar:
- Look up all participants via `gbrain query <name>`
- Read their pages for compiled_truth context
- Search gbrain for each participant by name
- Read their pages from gbrain for compiled_truth context
- Summarize: who they are, recent timeline, relationship to you
2. **Active deals.** `gbrain list --type deal` filtered to active status:
2. **Active deals.** List deal pages in gbrain filtered to active status:
- Deadlines approaching in the next 7 days
- Recent timeline entries (last 7 days)
3. **Time-sensitive threads.** Open items from timeline entries:
- Items with deadlines in the next 48 hours
- Follow-ups that are overdue
4. **Recent changes.** Pages updated in the last 24 hours:
- What changed and why (read timeline entries)
5. **People in play.** `gbrain list --type person` sorted by recency:
- What changed and why (read timeline entries from gbrain)
5. **People in play.** List person pages in gbrain sorted by recency:
- Updated in last 7 days
- Have high activity (many recent timeline entries)
6. **Stale alerts.** From `gbrain health`:
6. **Stale alerts.** From gbrain health check:
- Pages flagged as stale that are relevant to today's meetings
## Output Format
```
DAILY BRIEFING [date]
DAILY BRIEFING -- [date]
========================
MEETINGS TODAY
@@ -33,26 +33,23 @@ MEETINGS TODAY
Participants: [name] (slug: people/name, [key context])
ACTIVE DEALS
- [deal name] [status], deadline: [date]
- [deal name] -- [status], deadline: [date]
Recent: [latest timeline entry]
ACTION ITEMS
- [item] due [date], related to [slug]
- [item] -- due [date], related to [slug]
RECENT CHANGES (24h)
- [slug] [what changed]
- [slug] -- [what changed]
PEOPLE IN PLAY
- [name] [why they're active]
- [name] -- [why they're active]
```
## Commands Used
## Tools Used
```
gbrain query <name>
gbrain get <slug>
gbrain list --type deal
gbrain list --type person
gbrain health
gbrain timeline <slug>
```
- Search gbrain by name (query)
- Read a page from gbrain (get_page)
- List pages in gbrain by type (list_pages)
- Check gbrain health (get_health)
- View timeline entries in gbrain (get_timeline)


@@ -11,16 +11,17 @@ Enrich person and company pages from external APIs.
| Exa | Web mentions, articles | REST API |
Note: enrichment requires separate API credentials for each service. No client
integrations ship in v1. This skill guides Claude Code to make API calls directly.
integrations ship in v1. This skill guides the agent to make API calls directly.
## Workflow
1. **Select target pages.** `gbrain list --type person` or `gbrain list --type company`
1. **Select target pages.** List person or company pages in gbrain.
2. **For each page:**
- Read current compiled_truth to understand what we already know
- Read the page from gbrain to understand what we already know
- Call external APIs for fresh data
- Store raw API responses: the raw JSON goes into `gbrain call put_raw_data`
- Store raw API responses in gbrain (put_raw_data) to preserve provenance
- Distill highlights into compiled_truth updates
- Store the updated page in gbrain
3. **Validation rules:**
- Connection count < 20 on LinkedIn = likely wrong person, skip
- Name mismatch between brain and API = skip, flag for manual review
@@ -28,18 +29,17 @@ integrations ship in v1. This skill guides Claude Code to make API calls directl
## Quality Rules
- Raw data goes to raw_data table (preserves provenance)
- Raw data goes to gbrain's raw_data store (preserves provenance)
- Only distilled, useful info goes to compiled_truth
- Always add a timeline entry: "Enriched from [source] on [date]"
- Always add a timeline entry in gbrain: "Enriched from [source] on [date]"
- Don't enrich the same page more than once per week unless requested
- Rate limit: respect API rate limits, use exponential backoff
## Commands Used
## Tools Used
```
gbrain get <slug>
gbrain put <slug>
gbrain timeline-add <slug> <date> "Enriched from <source>"
gbrain list --type person
gbrain list --type company
```
- Read a page from gbrain (get_page)
- Store/update a page in gbrain (put_page)
- Add a timeline entry in gbrain (add_timeline_entry)
- List pages in gbrain by type (list_pages)
- Store raw API data in gbrain (put_raw_data)
- Retrieve raw data from gbrain (get_raw_data)


@@ -6,11 +6,11 @@ Ingest meetings, articles, documents, and conversations into the brain.
1. **Parse the source.** Extract people, companies, dates, and events from the input.
2. **For each entity mentioned:**
- `gbrain get <slug>` to check if page exists
- Read the entity's page from gbrain to check if it exists
- If exists: update compiled_truth (rewrite State section with new info, don't append)
- If new: `gbrain put <slug>` to create the page
3. **Append to timeline.** `gbrain timeline-add <slug> <date> <summary>` for each event.
4. **Create cross-reference links.** `gbrain link <from> <to> --type <relationship>` for every entity pair mentioned together.
- If new: store the page in gbrain with the appropriate type and slug
3. **Append to timeline.** Add a timeline entry in gbrain for each event, with date, summary, and source.
4. **Create cross-reference links.** Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
5. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
## Quality Rules
@@ -22,13 +22,11 @@ Ingest meetings, articles, documents, and conversations into the brain.
- Link types: knows, works_at, invested_in, founded, met_at, discussed
- Source attribution: every timeline entry includes the source (meeting, article, email, etc.)
## Commands Used
## Tools Used
```
gbrain get <slug>
gbrain put <slug> < content.md
gbrain timeline-add <slug> <date> <summary>
gbrain link <from> <to> --type <type>
gbrain tags <slug>
gbrain tag <slug> <tag>
```
- Read a page from gbrain (get_page)
- Store/update a page in gbrain (put_page)
- Add a timeline entry in gbrain (add_timeline_entry)
- Link entities in gbrain (add_link)
- List tags for a page (get_tags)
- Tag a page in gbrain (add_tag)


@@ -1,210 +1,9 @@
# Install GBrain
# Install GBrain (Deprecated)
Set up GBrain from scratch. The agent drives the process, the human provides secrets and approvals.
This skill has been replaced by the **setup** skill. See `skills/setup/SKILL.md`.
## Prerequisites
- A Supabase account (Pro tier recommended: $25/mo for 8GB DB + 100GB storage)
- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
- A git-backed markdown knowledge base (or start fresh)
## Phase 1: Environment Discovery
Scan the environment to understand what we're working with.
```bash
# Find all git repos with markdown content
echo "=== GBrain Environment Discovery ==="
for dir in /data/* ~/git/* ~/Documents/* 2>/dev/null; do
if [ -d "$dir/.git" ]; then
md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
if [ "$md_count" -gt 10 ]; then
total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
binary_count=$(find "$dir" -not -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.pdf" -o -name "*.mp4" -o -name "*.m4a" -o -name "*.heic" -o -name "*.tiff" -o -name "*.dng" \) 2>/dev/null | wc -l | tr -d ' ')
echo ""
echo " $dir ($total_size, $md_count .md files, $binary_count binary files)"
# Detect knowledge base type
if [ -d "$dir/.obsidian" ]; then
echo " Type: Obsidian vault (detected, wikilink conversion needed in future release)"
elif [ -d "$dir/logseq" ]; then
echo " Type: Logseq (detected, block-ref conversion needed in future release)"
else
echo " Type: Plain markdown (ready for import)"
fi
fi
fi
done
echo ""
echo "=== Discovery Complete ==="
```
Present findings to the human. Recommend which repos to import.
## Phase 2: Supabase Setup
### Magic Path (zero copy-pastes)
Check if the Supabase CLI is available:
```bash
which supabase 2>/dev/null || npx supabase --version 2>/dev/null
```
If available, use the magic path:
1. Tell the human: "I'll set up Supabase for you. Click 'Authorize' when your browser opens."
2. Run `supabase login` (opens browser for OAuth)
3. Run `supabase projects create --name gbrain --region us-east-1`
4. Extract credentials from `supabase projects api-keys`
5. Proceed to Phase 3 automatically
### Fallback Path (2 copy-pastes)
If the Supabase CLI is not available, tell the human exactly what to do:
1. "Log into Supabase and add a credit card: https://supabase.com/dashboard/account/billing"
2. "Create a new project: https://supabase.com/dashboard/new/_"
- Name: gbrain
- Region: closest to you
- Generate a strong password
3. "Go to Project Settings > Database and copy the connection string (URI format)"
- Paste it here
4. "Go to Project Settings > API and copy the service_role key"
- Paste it here
That's it. Two copy-pastes. The agent does everything else.
## Phase 3: Initialize GBrain
```bash
gbrain init \
--url "<database_url>" \
--repo "<repo_path>"
```
This runs:
1. Connection test (SELECT 1)
2. pgvector extension check (CREATE EXTENSION IF NOT EXISTS vector)
3. Schema migration (idempotent, safe to re-run)
4. Text import (all .md files, no embeddings yet)
5. Sync checkpoint (writes git HEAD for seamless gbrain sync)
### First Search Result
After import completes, run a sample query to prove it works:
```bash
# Query the most recently modified page's topic
gbrain query "$(ls -t <repo_path>/*.md <repo_path>/**/*.md 2>/dev/null | head -1 | xargs head -5 | grep -i 'title:' | cut -d: -f2 | tr -d ' ')"
```
Show results to the human immediately. This is the magic moment.
### Start Embeddings
```bash
gbrain embed --stale &
```
Embeddings run in background. Keyword search works NOW. Semantic search improves as embeddings complete. Check progress with `gbrain embed --status`.
## Phase 4: Set Up Ongoing Sync
```bash
# Add to cron (every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * gbrain sync --no-pull 2>&1 | tail -1 >> /tmp/gbrain-sync.log") | crontab -
```
Or for agents that push to the brain repo, trigger sync after writes:
```bash
gbrain sync --no-pull
```
## Phase 5: Optional File Migration
If the repo has >100MB of binary files:
1. **Tell the human what will happen:**
"Your repo has X binary files (Y MB). I can move them to Supabase Storage to slim down git. Files stay in git history permanently. Want me to proceed?"
2. **If approved:**
```bash
gbrain health # verify everything is connected
gbrain files sync <repo>/attachments/ # upload all files
gbrain files verify # mandatory 100% verification
# STOP: ask human for approval before git rm
```
3. **After human approves git rm:**
```bash
cd <repo>
echo "attachments/" >> .gitignore
git rm -r --cached attachments/
git commit -m "Move attachments to Supabase Storage"
git push
```
## Phase 6: Teach the Agent
Add GBrain rules to AGENTS.md (or equivalent):
```markdown
## GBrain (Knowledge Search)
GBrain indexes your knowledge base for fast search. Always search before answering
questions about people, companies, deals, or anything in the brain.
### Commands
- `gbrain query "search terms"` -- Search the knowledge base (keyword + semantic)
- `gbrain sync` -- Sync latest changes from git to GBrain
- `gbrain files upload <path> --page <slug>` -- Upload a file to storage
- `gbrain health` -- Check GBrain status
- `gbrain stats` -- Show page count, embedding coverage, last sync
### Rules
1. **Search the brain first.** Before answering any question about people, companies,
deals, meetings, or strategy, run `gbrain query`. Your memory of file contents
goes stale; the database doesn't.
2. **Never commit binaries to git.** Use `gbrain files upload` instead.
3. **After writing to the brain repo,** trigger `gbrain sync --no-pull` to update
the search index immediately.
```
## Error Handling
Every error tells you what happened, why, and how to fix it:
| What You See | Why | Fix |
|---|---|---|
| Connection refused | Supabase project paused or wrong URL | supabase.com/dashboard > Restore |
| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
| pgvector not available | Extension not enabled | Run CREATE EXTENSION vector in SQL Editor |
| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
| Sync anchor missing | Force push removed the commit | `gbrain sync --full` |
| No pages found | Query before import | `gbrain import <dir>` first |
## Upgrading
Upgrade depends on how you installed:
- **bun (standalone or library):** `bun update gbrain`
- **ClawHub:** `clawhub update gbrain`
- **Compiled binary:** Download the latest from [GitHub Releases](https://github.com/garrytan/gbrain/releases)
After upgrading:
- Run `gbrain init` again to apply schema migrations (idempotent, safe to re-run)
- The new `files` table gets created automatically on next init
- Sync state is preserved across upgrades
## Health Check
Run `gbrain health` at any time to verify all connections:
```
ok Database: connected
ok pgvector: extension loaded
ok Schema: up to date
ok Sync: last run N min ago
ok Embeddings: X/Y pages embedded
```
Every unhealthy line includes WHY and FIX.
The setup skill provides:
- Auto-provision Supabase via CLI (< 2 min TTHW)
- Manual fallback with non-interactive init
- AGENTS.md auto-injection (upgrade-safe)
- First import and health verification


@@ -4,34 +4,34 @@ Periodic brain health checks and cleanup.
## Workflow
1. **Run health check.** `gbrain health` to get the dashboard.
1. **Run health check.** Check gbrain health to get the dashboard.
2. **Check each dimension:**
### Stale pages
Pages where compiled_truth is older than the latest timeline entry. The assessment hasn't been updated to reflect recent evidence.
- `gbrain query "stale pages"` or check health output
- For each stale page: read timeline, determine if compiled_truth needs rewriting
- Check the health output for stale page count
- For each stale page: read the page from gbrain, review timeline, determine if compiled_truth needs rewriting
### Orphan pages
Pages with zero inbound links. Nobody references them.
- Review orphans: are they genuinely isolated or just missing links?
- Add links from related pages or flag for deletion
- Add links in gbrain from related pages or flag for deletion
### Dead links
Links pointing to pages that don't exist.
- Remove dead links with `gbrain unlink`
- Remove dead links in gbrain
### Missing cross-references
Pages that mention entity names but don't have formal links.
- Read compiled_truth, extract entity mentions, create links
- Read compiled_truth from gbrain, extract entity mentions, create links in gbrain
### Tag consistency
Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").
- Standardize to the most common variant
- Standardize to the most common variant using gbrain tag operations
### Embedding freshness
Chunks without embeddings, or chunks embedded with an old model.
- `gbrain embed --stale` to backfill
- Refresh stale embeddings in gbrain
### Open threads
Timeline items older than 30 days with unresolved action items.
@@ -41,19 +41,16 @@ Timeline items older than 30 days with unresolved action items.
- Never delete pages without confirmation
- Log all changes via timeline entries
- Run `gbrain health` before and after to show improvement
- Check gbrain health before and after to show improvement
## Commands Used
## Tools Used
```
gbrain health
gbrain list [--type T]
gbrain get <slug>
gbrain backlinks <slug>
gbrain link <from> <to> --type <type>
gbrain unlink <from> <to>
gbrain tag <slug> <tag>
gbrain untag <slug> <tag>
gbrain embed --stale
gbrain timeline <slug>
```
- Check gbrain health (get_health)
- List pages in gbrain with filters (list_pages)
- Read a page from gbrain (get_page)
- Check backlinks in gbrain (get_backlinks)
- Link entities in gbrain (add_link)
- Remove links in gbrain (remove_link)
- Tag a page in gbrain (add_tag)
- Remove a tag in gbrain (remove_tag)
- View timeline in gbrain (get_timeline)


@@ -1,6 +1,6 @@
{
"name": "gbrain",
"version": "0.2.0",
"version": "0.3.0",
"description": "Personal knowledge brain with hybrid RAG search",
"skills": [
{
@@ -34,9 +34,9 @@
"description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
},
{
"name": "install",
"path": "install/SKILL.md",
"description": "Set up GBrain from scratch: Supabase, import, sync, file migration"
"name": "setup",
"path": "setup/SKILL.md",
"description": "Set up GBrain: auto-provision Supabase, AGENTS.md injection, first import"
}
],
"dependencies": {
@@ -44,7 +44,7 @@
"package": "gbrain"
},
"setup": {
"command": "gbrain init --supabase",
"description": "Initialize brain with Supabase (guided wizard)"
"skill": "setup",
"description": "Auto-provision Supabase and configure GBrain (< 2 min)"
}
}


@@ -9,7 +9,7 @@ Universal migration from any wiki, note tool, or brain system into GBrain.
| Obsidian | Markdown + `[[wikilinks]]` | Direct import, convert wikilinks to gbrain links |
| Notion | Exported markdown or CSV | Parse Notion's export structure |
| Logseq | Markdown with `((block refs))` | Convert block refs to page links |
| Plain markdown | Any .md directory | `gbrain import <dir>` directly |
| Plain markdown | Any .md directory | Import directory into gbrain directly |
| CSV | Tabular data | Map columns to frontmatter fields |
| JSON | Structured data | Map keys to page fields |
| Roam | JSON export | Convert block structure to pages |
@@ -18,31 +18,23 @@ Universal migration from any wiki, note tool, or brain system into GBrain.
1. **Assess the source.** What format? How many files? What structure?
2. **Plan the mapping.** How do source fields map to gbrain fields (type, title, tags, compiled_truth, timeline)?
3. **Test with a sample.** Import 5-10 files, verify with `gbrain get` and `gbrain export`.
4. **Bulk import.** Run the full migration.
5. **Verify.** `gbrain health` + `gbrain stats` + spot-check pages.
6. **Build links.** Extract cross-references from content and create typed links.
3. **Test with a sample.** Import 5-10 files, verify by reading them back from gbrain and exporting.
4. **Bulk import.** Import the full directory into gbrain.
5. **Verify.** Check gbrain health and statistics, spot-check pages.
6. **Build links.** Extract cross-references from content and create typed links in gbrain.
## Obsidian Migration
```bash
# 1. Direct import (obsidian vaults are markdown directories)
gbrain import /path/to/vault/
# 2. Convert [[wikilinks]] to gbrain links
# The skill reads each page's compiled_truth, finds [[Name]] patterns,
# resolves them to slugs, and creates links:
gbrain get <slug> # read content
# For each [[Name]] found:
gbrain link <current-slug> <resolved-slug> --type references
```
1. Import the vault directory into gbrain (Obsidian vaults are markdown directories)
2. Convert `[[wikilinks]]` to gbrain links:
- Read each page from gbrain
- For each `[[Name]]` found, resolve to a slug and create a link in gbrain
- `[[Name|alias]]` uses the alias for context
Obsidian-specific:
- `[[Name]]` becomes `gbrain link`
- `[[Name|alias]]` uses the alias for context
- Tags (`#tag`) become `gbrain tag`
- Tags (`#tag`) become gbrain tags
- Frontmatter properties map to gbrain frontmatter
- Attachments (images, PDFs) are noted but not imported (future work)
- Attachments (images, PDFs) are noted but handled separately via file storage
## Notion Migration
@@ -50,38 +42,31 @@ Obsidian-specific:
2. Notion exports nested directories with UUIDs in filenames
3. Strip UUIDs from filenames for clean slugs
4. Map Notion's database properties to frontmatter
5. `gbrain import` the cleaned directory
5. Import the cleaned directory into gbrain
## CSV Migration
For tabular data (e.g., CRM exports, contact lists):
```bash
# For each row in the CSV:
# 1. Create a page with column values as frontmatter
# 2. Use a designated column as the slug (e.g., name)
# 3. Use another column as compiled_truth (e.g., notes)
gbrain put <slug> < generated.md
```
1. For each row in the CSV, create a page with column values as frontmatter
2. Use a designated column as the slug (e.g., name)
3. Use another column as compiled_truth (e.g., notes)
4. Store each page in gbrain
## Verification
After any migration:
1. `gbrain stats` — check page count matches source
2. `gbrain health` — check for orphans, missing embeddings
3. `gbrain export --dir /tmp/verify/` — round-trip test
4. Spot-check 5-10 pages with `gbrain get`
5. Test search: `gbrain query "someone you know is in the data"`
1. Check gbrain statistics to verify page count matches source
2. Check gbrain health for orphans and missing embeddings
3. Export pages from gbrain for round-trip verification
4. Spot-check 5-10 pages by reading them from gbrain
5. Test search: search gbrain for "someone you know is in the data"
## Commands Used
## Tools Used
```
gbrain import <dir> [--no-embed]
gbrain get <slug>
gbrain put <slug>
gbrain link <from> <to> --type <type>
gbrain tag <slug> <tag>
gbrain stats
gbrain health
gbrain export [--dir ./verify/]
```
- Store/update pages in gbrain (put_page)
- Read pages from gbrain (get_page)
- Link entities in gbrain (add_link)
- Tag pages in gbrain (add_tag)
- Get gbrain statistics (get_stats)
- Check gbrain health (get_health)
- Search gbrain (query)


@@ -9,10 +9,10 @@ Answer questions using the brain's knowledge with 3-layer search and synthesis.
- Semantic query for conceptual questions
- Structured queries (list by type, backlinks) for relational questions
2. **Execute searches:**
- `gbrain search <keywords>` for FTS matches
- `gbrain query <question>` for hybrid semantic+keyword with expansion
- `gbrain list --type <type>` or `gbrain backlinks <slug>` for structural queries
3. **Read top results.** `gbrain get <slug>` for the top 3-5 pages to get full context.
- Keyword search gbrain for FTS matches (search)
- Hybrid search gbrain for semantic+keyword with expansion (query)
- List pages in gbrain by type or check backlinks for structural queries
3. **Read top results.** Read the top 3-5 pages from gbrain to get full context.
4. **Synthesize answer** with citations. Every claim traces back to a specific page slug.
5. **Flag gaps.** If the brain doesn't have info, say "the brain doesn't have information on X" rather than hallucinating.
@@ -25,14 +25,12 @@ Answer questions using the brain's knowledge with 3-layer search and synthesis.
- For "what happened" questions, use timeline entries
- For "what do we know" questions, read compiled_truth directly
## Commands Used
## Tools Used
```
gbrain search <query>
gbrain query <question>
gbrain get <slug>
gbrain list [--type T] [--tag T]
gbrain backlinks <slug>
gbrain graph <slug> [--depth N]
gbrain timeline <slug>
```
- Keyword search gbrain (search)
- Hybrid search gbrain (query)
- Read a page from gbrain (get_page)
- List pages in gbrain with filters (list_pages)
- Check backlinks in gbrain (get_backlinks)
- Traverse the link graph in gbrain (traverse_graph)
- View timeline entries in gbrain (get_timeline)

skills/setup/SKILL.md (new file, 111 lines)

@@ -0,0 +1,111 @@
# Setup GBrain
Set up GBrain from scratch. Target: working brain in under 2 minutes.
## Prerequisites
- A Supabase account (Pro tier recommended: $25/mo for 8GB DB + 100GB storage)
- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
- A git-backed markdown knowledge base (or start fresh)
## Phase A: Auto-Provision (Supabase CLI)
Check if the Supabase CLI is available. If it is, use the fast path:
1. Tell the user: "I'll set up Supabase for you. Click 'Authorize' when your browser opens."
2. Run `supabase login` (opens browser for OAuth)
3. Run `supabase projects create --name gbrain --region us-east-1`
4. Extract the database connection URL from `supabase projects api-keys`
5. Initialize gbrain with the connection URL in non-interactive mode
6. Proceed to Phase C automatically
## Phase B: Manual Fallback
If the Supabase CLI is not available, guide the user:
1. "Log into Supabase and add a credit card: https://supabase.com/dashboard/account/billing"
2. "Create a new project: https://supabase.com/dashboard/new/_"
- Name: gbrain
- Region: closest to you
- Generate a strong password
3. "Go to Project Settings > Database and copy the connection string (URI format)"
- Paste it here
4. Initialize gbrain with the provided URL in non-interactive mode
That's it. One copy-paste. The agent does everything else.
## Phase C: First Import
1. **Discover markdown repos.** Scan the environment for git repos with markdown content.
```bash
echo "=== GBrain Environment Discovery ==="
for dir in /data/* ~/git/* ~/Documents/*; do
if [ -d "$dir/.git" ]; then
md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
if [ "$md_count" -gt 10 ]; then
total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
echo " $dir ($total_size, $md_count .md files)"
fi
fi
done
echo "=== Discovery Complete ==="
```
2. **Import the best candidate.** Import the recommended directory into gbrain.
3. **Prove search works.** Search gbrain for a topic from the imported data. Show results immediately.
4. **Start embeddings.** Refresh stale embeddings in gbrain (runs in background). Keyword search works NOW, semantic search improves as embeddings complete.
## Phase D: AGENTS.md Injection
Auto-inject gbrain instructions into the project's AGENTS.md (or equivalent). Use a delimited managed block that's upgrade-safe:
```markdown
<!-- gbrain:start -->
## GBrain (Knowledge Search)
GBrain indexes your knowledge base for fast search. Always search before answering
questions about people, companies, deals, or anything in the brain.
### How to use
- Search gbrain for any topic before answering questions
- After writing new content, sync the repository to gbrain
- Upload binary files to gbrain storage instead of committing to git
- Check gbrain health periodically
### Rules
1. **Search the brain first.** Before answering any question about people, companies,
deals, meetings, or strategy, search gbrain. Your memory of file contents goes
stale; the database doesn't.
2. **Never commit binaries to git.** Upload to gbrain file storage instead.
3. **After writing to the brain repo,** sync to gbrain immediately.
<!-- gbrain:end -->
```
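
One way the "upgrade-safe" managed block could be maintained is sketched below. This is an assumption about the approach, not the skill's actual implementation; `upsertManagedBlock` is a hypothetical helper, and only the two marker comments come from the block above.

```typescript
const START = "<!-- gbrain:start -->";
const END = "<!-- gbrain:end -->";

// Replace the managed block if it exists, otherwise append it.
// Text outside the markers is never touched.
function upsertManagedBlock(doc: string, body: string): string {
  const block = START + "\n" + body.trim() + "\n" + END;
  const startIdx = doc.indexOf(START);
  const endIdx = doc.indexOf(END);
  if (startIdx !== -1 && endIdx > startIdx) {
    return doc.slice(0, startIdx) + block + doc.slice(endIdx + END.length);
  }
  // No existing block: append, separated from prior content by a blank line.
  let separator = "";
  if (doc.length > 0) {
    separator = doc.charAt(doc.length - 1) === "\n" ? "\n" : "\n\n";
  }
  return doc + separator + block + "\n";
}
```

Running the helper twice with the same body leaves the file unchanged, and a later upgrade can swap in new instructions without disturbing anything the user wrote outside the markers.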
## Phase E: Health Check
After setup is complete, check gbrain health. Every dimension should be healthy.
Report the final state to the user:
- Page count and statistics
- Embedding coverage
- Search verification (run a sample query)
## Error Handling
Every error tells you what happened, why, and how to fix it:
| What You See | Why | Fix |
|---|---|---|
| Connection refused | Supabase project paused or wrong URL | supabase.com/dashboard > Restore |
| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
| pgvector not available | Extension not enabled | Run CREATE EXTENSION vector in SQL Editor |
| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
| No pages found | Query before import | Import files into gbrain first |
## Tools Used
- Initialize gbrain (via CLI: gbrain init --non-interactive --url ...)
- Import files into gbrain (via CLI: gbrain import)
- Search gbrain (query)
- Check gbrain health (get_health)
- Get gbrain statistics (get_stats)