* feat: battle-tested skill patterns from production deployment Backport production-learned brain-operations patterns: - Iron Law of Back-Linking (mandatory bidirectional linking) - Brain filing rules (file by primary subject, not format) - Enrichment protocol (7-step pipeline, 3-tier system, person/company templates) - Media ingest workflows (articles, videos, podcasts, PDFs, screenshots) - Citation requirements (mandatory [Source: ...] on every fact) - Test Before Bulk operating principle - Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS - X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger * chore: bump version and changelog (v0.8.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add _brain-filing-rules.md to CLAUDE.md key files * feat: smart file upload with TUS resumable and .redirect.yaml pointers - Supabase Storage auto-selects upload method by file size: < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry) - Signed URL generation for private bucket access (1-hour expiry) - New `upload-raw` command with size routing: small text stays in git, large/media files go to cloud with .redirect.yaml pointer - New `signed-url` command for generating access links - File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy) - Redirect format upgraded: 10 fields with full metadata - All migration commands (mirror, redirect, restore, clean) handle both formats * feat: skills reference actual gbrain file commands - Filing rules document upload-raw, signed-url, and .redirect.yaml format - Ingest skill uses gbrain files upload-raw for raw source preservation - Maintain skill adds file storage health checks - Setup skill adds storage configuration phase with migration guidance - Voice recipe uses upload-raw for call audio storage - Migration v0.9.0 with complete storage setup instructions * chore: bump version and changelog (v0.9.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: gbrain publish -- shareable HTML with password protection First code+skill pair: deterministic code does the work (strip private data, encrypt with AES-256-GCM, generate self-contained HTML), the skill tells the agent when and how to use it. 34 new tests. See: https://x.com/garrytan/status/2042925773300908103 * feat: backlinks check/fix, page lint, and report commands Three new deterministic tools (zero LLM calls): - gbrain backlinks check/fix -- scans brain for entity mentions without back-links, creates them. Enforces the Iron Law from the skills. - gbrain lint [--fix] -- catches LLM preambles, code fence wrapping, placeholder dates, missing frontmatter, broken citations, empty sections. --fix auto-strips fixable artifacts. - gbrain report --type <name> -- saves timestamped reports to brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails. 33 new tests (409 total, 0 fail). * feat: v0.9.0 migration tells agents to swap scripts for built-in commands Migration file now: - Lists all 5 new deterministic commands with usage examples - Includes a script-to-command replacement table (old -> new) - Tells the agent to find custom script references in AGENTS.md, skills, and cron jobs and replace with gbrain commands - Adds recommended cron jobs for daily backlink fix + weekly lint - References the Thin Harness, Fat Skills thread * fix: CLI routing bugs found during DX review - Fixed subArgs reference error in handleCliOnly (used wrong variable name) - Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid conflict with existing backlinks operation (per-page incoming links) - Added TOOLS section to --help output showing publish, check-backlinks, lint, report - Added upload-raw and signed-url to FILES section in --help - Updated all docs/migration references to use check-backlinks * fix: security hardening from adversarial review - XSS: sanitize marked.parse() output (strip script/iframe/on* attrs) - Path traversal: validate report --type against [a-z0-9-] pattern - TUS: HEAD request before retry to get server's actual offset (TUS spec) - Pointer: upload-raw now includes pointer content in JSON output - Symlinks: use lstatSync in all walkers to prevent directory escape --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
321 lines
14 KiB
Markdown
321 lines
14 KiB
Markdown
# Setup GBrain
|
|
|
|
Set up GBrain from scratch. Target: working brain in under 5 minutes.
|
|
|
|
## Install (if not already installed)
|
|
|
|
```bash
|
|
bun add github:garrytan/gbrain
|
|
```
|
|
|
|
## How GBrain connects
|
|
|
|
GBrain connects directly to Postgres over the wire protocol. NOT through the
|
|
Supabase REST API. You need the **database connection string** (a `postgresql://` URI),
|
|
not the project URL or anon key. The password is embedded in the connection string.
|
|
|
|
Use the **Shared Pooler** connection string (port 6543), not the direct connection
|
|
(port 5432). The direct hostname resolves to IPv6 only, which many environments
|
|
can't reach. Find it: go to the project, click **Get Connected** next to the
|
|
project URL, then **Direct Connection String** > **Session Pooler**, and copy
|
|
the **Shared Pooler** connection string.
|
|
|
|
**Do NOT ask for the Supabase anon key.** GBrain doesn't use it.
|
|
|
|
## Why Supabase
|
|
|
|
Supabase gives you managed Postgres + pgvector (vector search built in) for $25/mo:
|
|
- 8GB database + 100GB storage on Pro tier
|
|
- No server to manage, automatic backups, dashboard for debugging
|
|
- pgvector pre-installed, just works
|
|
- Alternative: any Postgres with pgvector extension (self-hosted, Neon, Railway, etc.)
|
|
|
|
## Prerequisites
|
|
|
|
- A Supabase account (Pro tier recommended, $25/mo) OR any Postgres with pgvector
|
|
- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
|
|
- A git-backed markdown knowledge base (or start fresh)
|
|
|
|
## Available init options
|
|
|
|
- `gbrain init --supabase` -- interactive wizard (prompts for connection string)
|
|
- `gbrain init --url <connection_string>` -- direct, no prompts
|
|
- `gbrain init --non-interactive --url <connection_string>` -- for scripts/agents
|
|
- `gbrain doctor --json` -- health check after init
|
|
|
|
There is no `--local`, `--sqlite`, or offline mode. GBrain requires Postgres + pgvector.
|
|
|
|
## Phase A: Supabase Setup (recommended)
|
|
|
|
Guide the user through creating a Supabase project:
|
|
|
|
1. "Go to https://supabase.com and sign up or log in."
|
|
2. "Click 'New Project' in the top left."
|
|
- Name: `gbrain`
|
|
- Region: pick the one closest to you
|
|
- Database password: generate a strong one and save it
|
|
3. "Wait about 2 minutes for the project to initialize."
|
|
4. "Find the connection string: go to your project, click **Get Connected** next
|
|
to the project URL, then **Direct Connection String** > **Session Pooler**,
|
|
and copy the **Shared Pooler** connection string (port 6543)."
|
|
5. Initialize gbrain:
|
|
```bash
|
|
gbrain init --non-interactive --url "postgresql://postgres.[ref]:[password]@aws-0-[region].pooler.supabase.com:6543/postgres"
|
|
```
|
|
6. Verify: `gbrain doctor --json`
|
|
|
|
**OpenClaw/Hermes agent instructions:** Store the Supabase access token in your persistent
|
|
env as `SUPABASE_ACCESS_TOKEN`. gbrain doesn't store it, you need it for future
|
|
`gbrain doctor` runs. Generate at: https://supabase.com/dashboard/account/tokens
|
|
|
|
## Phase B: BYO Postgres (alternative)
|
|
|
|
If the user already has Postgres with pgvector:
|
|
|
|
1. Get the connection string from the user.
|
|
2. Run: `gbrain init --non-interactive --url "<connection_string>"`
|
|
3. Verify: `gbrain doctor --json`
|
|
|
|
If the connection fails with ECONNREFUSED and the URL contains `supabase.co`,
|
|
the user probably pasted the direct connection (IPv6 only). Guide them to the
|
|
Session pooler string instead (see Phase A step 4).
|
|
|
|
## Phase C: First Import
|
|
|
|
1. **Discover markdown repos.** Scan the environment for git repos with markdown content.
|
|
|
|
```bash
|
|
echo "=== GBrain Environment Discovery ==="
|
|
for dir in /data/* ~/git/* ~/Documents/* 2>/dev/null; do
|
|
if [ -d "$dir/.git" ]; then
|
|
md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
|
|
if [ "$md_count" -gt 10 ]; then
|
|
total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
|
|
echo " $dir ($total_size, $md_count .md files)"
|
|
fi
|
|
fi
|
|
done
|
|
echo "=== Discovery Complete ==="
|
|
```
|
|
|
|
2. **Import the best candidate.** For large imports (>1000 files), use nohup to
|
|
survive session timeouts:
|
|
```bash
|
|
nohup gbrain import <dir> --no-embed --workers 4 > /tmp/gbrain-import.log 2>&1 &
|
|
```
|
|
Then check progress: `tail -1 /tmp/gbrain-import.log`
|
|
|
|
For smaller imports, run directly:
|
|
```bash
|
|
gbrain import <dir> --no-embed
|
|
```
|
|
|
|
3. **Prove search works.** Pick a semantic query based on what you imported:
|
|
```bash
|
|
gbrain search "<topic from the imported data>"
|
|
```
|
|
This is the magical moment: the user sees search finding things grep couldn't.
|
|
|
|
4. **Start embeddings.** Refresh stale embeddings (runs in background). Keyword
|
|
search works NOW, semantic search improves as embeddings complete.
|
|
|
|
5. **Offer file migration.** If the repo has binary files (.raw/ directories with
|
|
images, PDFs, audio):
|
|
> "You have N binary files (X GB) in your brain repo. Want to move them to cloud
|
|
> storage? Your git repo will drop from X GB to Y MB. All links keep working."
|
|
|
|
If the user agrees, configure storage and run migration:
|
|
```bash
|
|
# Configure storage backend (Supabase Storage recommended)
|
|
gbrain config set storage.backend supabase
|
|
gbrain config set storage.bucket brain-files
|
|
gbrain config set storage.projectUrl <supabase-url>
|
|
gbrain config set storage.serviceRoleKey <service-role-key>
|
|
|
|
# Migrate binary files to cloud (3-step lifecycle)
|
|
gbrain files mirror <brain-dir> # Upload to cloud, keep local
|
|
gbrain files redirect <brain-dir> # Replace local with .redirect.yaml pointers
|
|
# (optional) gbrain files clean <brain-dir> --yes # Remove pointers too
|
|
```
|
|
|
|
After migration, `gbrain files upload-raw` handles new files automatically:
|
|
small text/PDFs stay in git, large/media files go to cloud with `.redirect.yaml`
|
|
pointers. Files >= 100 MB use TUS resumable upload for reliability.
|
|
|
|
If no markdown repos are found, create a starter brain with a few template pages
|
|
(a person page, a company page, a concept page) from docs/GBRAIN_RECOMMENDED_SCHEMA.md.
|
|
|
|
## Phase D: Brain-First Lookup Protocol
|
|
|
|
Inject the brain-first lookup protocol into the project's AGENTS.md (or equivalent).
|
|
This replaces grep-based knowledge lookups with structured gbrain queries.
|
|
|
|
### BEFORE (grep) vs AFTER (gbrain)
|
|
|
|
| Task | Before (grep) | After (gbrain) |
|
|
|------|---------------|-----------------|
|
|
| Find a person | `grep -r "Pedro" brain/` | `gbrain search "Pedro"` |
|
|
| Understand a topic | `grep -rl "deal" brain/ \| head -5 && cat ...` | `gbrain query "what's the status of the deal"` |
|
|
| Read a known page | `cat brain/people/pedro.md` | `gbrain get people/pedro` |
|
|
| Find connections | `grep -rl "Brex" brain/ \| xargs grep "Pedro"` | `gbrain query "Pedro Brex relationship"` |
|
|
|
|
### Lookup sequence (MANDATORY for every entity question)
|
|
|
|
1. `gbrain search "name"` -- keyword match, fast, works without embeddings
|
|
2. `gbrain query "what do we know about name"` -- hybrid search, needs embeddings
|
|
3. `gbrain get <slug>` -- direct page read when you know the slug from steps 1-2
|
|
4. `grep` fallback -- only if gbrain returns zero results AND the file may exist outside the indexed brain
|
|
|
|
Stop at the first step that gives you what you need. Most lookups resolve at step 1.
|
|
|
|
### Sync-after-write rule
|
|
|
|
After creating or updating any brain page in the repo, sync immediately so the
|
|
index stays current:
|
|
|
|
```bash
|
|
gbrain sync --no-pull --no-embed
|
|
```
|
|
|
|
This indexes new/changed files without pulling from git or regenerating embeddings.
|
|
Embeddings can be refreshed later in batch (`gbrain embed --stale`).
|
|
|
|
### gbrain vs memory_search
|
|
|
|
| Layer | What it stores | When to use |
|
|
|-------|---------------|-------------|
|
|
| **gbrain** | World knowledge: people, companies, deals, meetings, concepts, media | "Who is Pedro?", "What happened at the board meeting?" |
|
|
| **memory_search** | Agent operational state: preferences, decisions, session context | "How does the user like formatting?", "What did we decide about X?" |
|
|
|
|
Both should be checked. gbrain for facts about the world. memory_search for how
|
|
the agent should behave.
|
|
|
|
## Phase E: Load the Production Agent Guide
|
|
|
|
Read `docs/GBRAIN_SKILLPACK.md`. This is the reference architecture for how a
|
|
production agent uses gbrain: the brain-agent loop, entity detection, enrichment
|
|
pipeline, meeting ingestion, cron schedules, and the five operational disciplines.
|
|
|
|
Inject the key patterns into the agent's system context or AGENTS.md:
|
|
|
|
1. **Brain-agent loop** (Section 2): read before responding, write after learning
|
|
2. **Entity detection** (Section 3): spawn on every message, capture people/companies/ideas
|
|
3. **Source attribution** (Section 7): every fact needs `[Source: ...]`
|
|
4. **Iron law back-linking** (Section 15.4): every mention links back to the entity page
|
|
|
|
Tell the user: "The production agent guide is at docs/GBRAIN_SKILLPACK.md. It covers
|
|
the brain-agent loop, entity detection, enrichment, meeting ingestion, and cron
|
|
schedules. Read it when you're ready to go from 'search works' to 'the brain
|
|
maintains itself.'"
|
|
|
|
## Phase F: Health Check
|
|
|
|
Run `gbrain doctor --json` and report the results. Every check should be OK.
|
|
If any check fails, the doctor output tells you exactly what's wrong and how to fix it.
|
|
|
|
## Error Recovery
|
|
|
|
**If any gbrain command fails, run `gbrain doctor --json` first.** Report the full
|
|
output. It checks connection, pgvector, RLS, schema version, and embeddings.
|
|
|
|
| What You See | Why | Fix |
|
|
|---|---|---|
|
|
| Connection refused | Supabase project paused, IPv6, or wrong URL | Use Session pooler (port 6543), or supabase.com/dashboard > Restore |
|
|
| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
|
|
| pgvector not available | Extension not enabled | Run `CREATE EXTENSION vector;` in SQL Editor |
|
|
| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
|
|
| No pages found | Query before import | Import files into gbrain first |
|
|
| RLS not enabled | Security gap | Run `gbrain init` again (auto-enables RLS) |
|
|
|
|
## Phase G: Auto-Update Check (if not already configured)
|
|
|
|
If the user's install did NOT include setting up auto-update checks (e.g., they
|
|
used the manual install path or an older version of the OpenClaw/Hermes paste), offer it:
|
|
|
|
> "Would you like daily GBrain update checks? I'll let you know when there's a
|
|
> new version worth upgrading to — including new skills and schema recommendations.
|
|
> You'll always be asked before anything is installed."
|
|
|
|
If they agree:
|
|
1. Test: `gbrain check-update --json`
|
|
2. Register daily cron (see GBRAIN_SKILLPACK.md Section 17)
|
|
|
|
If already configured or user declines, skip.
|
|
|
|
## Phase H: Live Sync Setup (MUST ADD)
|
|
|
|
The brain repo is the source of truth. If sync doesn't run automatically, the
|
|
vector DB falls behind and gbrain returns stale answers. This phase is not optional.
|
|
|
|
Read `docs/GBRAIN_SKILLPACK.md` Section 18 for the full reference. Key points:
|
|
|
|
1. **Check the connection pooler first.** Sync uses transactions on every import.
|
|
If `DATABASE_URL` uses Supabase's Transaction mode pooler, sync will throw
|
|
`.begin() is not a function` and silently skip most pages. Verify the connection
|
|
string uses Session mode (port 6543, Session mode) or direct (port 5432).
|
|
|
|
2. **Set up automatic sync.** Choose the approach that fits your environment:
|
|
- **Cron** (recommended for agents): register a cron every 5-30 minutes:
|
|
`gbrain sync --repo /data/brain && gbrain embed --stale`
|
|
- **Watch mode**: `gbrain sync --watch --repo /data/brain` under a process
|
|
manager. Pair with a cron fallback (watch exits after 5 consecutive failures).
|
|
- **Webhook or git hook**: if available in your environment.
|
|
|
|
3. **Verify sync works.** Don't just check that the command ran. Check that it
|
|
worked:
|
|
- `gbrain stats` should show page count close to syncable file count in the repo.
|
|
- If page count is way too low, the pooler bug is silently skipping pages.
|
|
- Push a test change and confirm it appears in `gbrain search`.
|
|
|
|
4. **Chain sync + embed.** Always run both: `gbrain sync --repo <path> && gbrain
|
|
embed --stale`. For small syncs, embeddings are generated inline. The `embed
|
|
--stale` is a safety net for any stale chunks.
|
|
|
|
Tell the user: "Live sync is configured. The brain will stay current automatically.
|
|
I'll verify it's working in the next phase."
|
|
|
|
## Phase I: Full Verification
|
|
|
|
Run the full verification runbook to confirm the entire installation is working.
|
|
|
|
1. Read `docs/GBRAIN_VERIFY.md`
|
|
2. Execute each check in order
|
|
3. Report results to the user
|
|
4. Fix any failures before declaring setup complete
|
|
|
|
Every check in the runbook should pass. The most important one is check 4 (live
|
|
sync actually works): push a change, wait for sync, search for the corrected text.
|
|
"Sync ran" is not the same as "sync worked."
|
|
|
|
Tell the user: "I've verified the full GBrain installation. Here's the status of
|
|
each check: [list results]. Everything is working / [specific item] needs attention."
|
|
|
|
If already configured or user declines, skip.
|
|
|
|
## Schema State Tracking
|
|
|
|
After presenting the recommended directories (Phase C/E) and the user selects which
|
|
ones to create, write `~/.gbrain/update-state.json` recording:
|
|
- `schema_version_applied`: current gbrain version
|
|
- `skillpack_version_applied`: current gbrain version
|
|
- `schema_choices.adopted`: directories the user created
|
|
- `schema_choices.declined`: directories the user explicitly skipped
|
|
- `schema_choices.custom`: directories the user added that aren't in the recommended schema
|
|
|
|
This file enables future upgrades to suggest new schema additions without
|
|
re-suggesting things the user already declined.
|
|
|
|
## Tools Used
|
|
|
|
- `gbrain init --non-interactive --url ...` -- create brain
|
|
- `gbrain import <dir> --no-embed [--workers N]` -- import files
|
|
- `gbrain search <query>` -- search brain
|
|
- `gbrain doctor --json` -- health check
|
|
- `gbrain check-update --json` -- check for updates
|
|
- `gbrain embed refresh` -- generate embeddings
|
|
- `gbrain embed --stale` -- backfill missing embeddings
|
|
- `gbrain sync --repo <path>` -- one-shot sync from brain repo
|
|
- `gbrain sync --watch --repo <path>` -- continuous sync polling
|
|
- `gbrain config get sync.last_run` -- check last sync timestamp
|
|
- `gbrain stats` -- page count + embed coverage
|