* docs: break SKILLPACK into 17 individual guides The 1,281-line SKILLPACK monolith is now 17 individually linkable guides in docs/guides/, organized by category: core patterns, data pipelines, operations, search, and administration. GBRAIN_SKILLPACK.md becomes a structured index with categorized tables linking to each guide. The URL stays stable for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add integration guides, architecture docs, and ethos New documentation directories: - docs/integrations/ — "Getting Data In" landing page, credential gateway, meeting webhooks. Includes recipe format documentation. - docs/architecture/ — Infrastructure layer doc (import, chunk, embed, search) - docs/ethos/ — "Thin Harness, Fat Skills" essay with agent decision guide - docs/designs/ — "Homebrew for Personal AI" 10-star vision document Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gbrain integrations command + voice-to-brain recipe New CLI command: gbrain integrations (list/show/status/doctor/stats/test) - Standalone command, no database connection needed - Uses gray-matter directly for recipe parsing (not parseMarkdown) - --json flag on every subcommand for agent-parseable output - Bare command shows senses/reflexes dashboard - Health heartbeat via ~/.gbrain/integrations/<id>/heartbeat.jsonl First recipe: recipes/twilio-voice-brain.md - Phone calls create brain pages via Twilio + OpenAI Realtime - Opinionated defaults: caller screening, brain-first lookup, quiet hours - Outbound call smoke test (GBrain calls the user to prove it works) - Validate-as-you-go credential testing - Twilio signature validation for webhook security Migration file for v0.7.0 with agent-readable changelog. 13 unit tests covering parseRecipe, CLI routing, and recipe validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Getting Data In to README, update CLAUDE.md and manifest README: voice calls in intro bullet list, new "Getting Data In" section with integration table (voice, email, X, calendar) and recipe philosophy. CLAUDE.md: reference new files (integrations.ts, recipes/, docs/guides/, docs/integrations/, docs/architecture/, docs/ethos/). manifest.json: bump to v0.7.0, add recipes_dir field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: v0.7.0 CHANGELOG, TODOS, VERSION bump CHANGELOG: v0.7.0 entry covering integration recipes, voice-to-brain, gbrain integrations command, SKILLPACK breakout, and new documentation. TODOS: 3 new items from CEO/DX reviews (constrained health_check DSL, community recipe submission, always-on deployment recipes). VERSION + package.json: bump to 0.7.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite voice recipe with agent instructions and verified links Major improvements to recipes/twilio-voice-brain.md: - Agent preamble: explains WHY sequential execution matters (each step depends on the previous), defines 4 stop points where the agent MUST pause and verify, tells agent to never say "something went wrong" but instead explain the exact error and fix - User actions are now specific: exact URLs for every credential (Twilio console, OpenAI API keys page, ngrok dashboard), what buttons to click, what fields to copy, common failure modes - All URLs verified via web search against current 2026 documentation: Twilio SID/token at twilio.com/console, OpenAI keys at platform.openai.com/api-keys, ngrok token at dashboard.ngrok.com/get-started/your-authtoken - Cost estimate corrected: OpenAI Realtime is $0.06/min input + $0.24/min output (was understated), total ~$20-22/mo for 100 min - Validate-as-you-go: each credential tested immediately with exact curl commands, failure messages explain what went wrong and how to fix - Smoke test flow: tells user exactly what to say, verifies ALL three outputs (messaging notification + brain page + search result) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add "Homebrew for Personal AI" essay (markdown is code) New essay at docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md — the distribution corollary to "Thin Harness, Fat Skills." Argues that markdown skill files are simultaneously documentation, specification, package, and source code. The agent is the package manager. The git repo is the app store. Referenced from SKILLPACK index and CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite agent instructions as command language, promote skills The OpenClaw/Hermes install block is now a drill sergeant, not a tour guide. Every step is an imperative command with exact verification criteria and explicit stop-on-failure behavior. No FYI, no suggestions, just rails. Key changes: - 11-step setup with STOP points after each step - Exact user instructions for Supabase connection string (what to click, what NOT to give the agent, what the string looks like) - "Verify: run X. You must see Y. If not: Z" after every step - Skills table now links to both skill files AND guide docs - Integration recipes table simplified (no "coming soon" placeholders) - Docs section reorganized: for agents / for humans / reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: 4 codex findings + add email-to-brain recipe Codex review found 4 issues, all fixed: 1. getStatus() returned "configured" if ANY secret was set (e.g. just OPENAI_API_KEY). Now requires ALL required secrets before marking configured. Prevents false "configured" status and spurious doctor runs. 2. Twilio health check hit unauthenticated endpoint (always 401). Now uses authenticated curl with SID:token, matching the setup validation. 3. README anchor docs/GBRAIN_SKILLPACK.md#the-dream-cycle broken after SKILLPACK rewrite. Updated to point to docs/guides/cron-schedule.md. 4. Compiled binary can't find recipes/ via import.meta.dir. Added GBRAIN_RECIPES_DIR env var override + global bun install path fallback. Also adds recipes/email-to-brain.md: Gmail deterministic collector pattern with ClawVisor credential gateway, validate-as-you-go, agent instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add email, X, calendar, and meeting sync recipes Four new integration recipes extracted from production wintermute patterns: - recipes/email-to-brain.md: Gmail via ClawVisor, deterministic collector pattern (code pulls emails with baked-in links, agent does judgment), noise filtering, signature detection, digest generation - recipes/x-to-brain.md: X API v2, timeline + mentions + keyword search, deletion detection (diffs previous run, verifies 404), engagement velocity tracking, rate limit awareness - recipes/calendar-to-brain.md: Google Calendar via ClawVisor, historical backfill (years of data), daily markdown files with attendees + locations, attendee enrichment for brain pages - recipes/meeting-sync.md: Circleback API, transcript import with speaker labels, attendee detection + filtering, entity propagation to people/ company pages, action item extraction, idempotent by source_id All recipes follow the same format: agent preamble with sequential execution rules, validate-as-you-go credentials, exact URLs for API key setup, stop-on-failure verification, and heartbeat logging. Updated README, SKILLPACK index, and integrations landing page with all 5 recipes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Google OAuth as alternative to ClawVisor in email + calendar recipes Both recipes now offer two auth options: - Option A: ClawVisor (recommended, handles OAuth + token refresh) - Option B: Google OAuth2 directly (no extra service, you manage tokens) Option B includes step-by-step instructions for Google Cloud Console: exact URLs, which buttons to click, which scopes to add, how to enable the API, and the OAuth flow for token exchange. This removes ClawVisor as a hard dependency for getting started. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add implementation guides with pseudocode and test suggestions Every recipe now includes an "Implementation Guide" section with: - Production-tested pseudocode the agent can follow to build each collector - Edge cases and failure modes discovered in real deployment - Non-obvious implementation details (why the 48h staleness heuristic, why Gmail links need authuser, why SSE responses need double-parsing) - Test suggestions: what the agent should verify after setup email-to-brain: noise filtering algorithm, signature detection patterns, Gmail link generation (authuser is critical), sent-mail dedup x-to-brain: deletion detection with 3 heuristics (7-day, 48h staleness, API verification), engagement velocity thresholds (50 min for 2x, 100 absolute jump), atomic writes, stdout contract, rate limit handling calendar-to-brain: smart chunking (monthly for sparse years, weekly for dense), attendee filtering (rooms, groups, distros), merge-with-existing (only replace ## Calendar section), date/time parsing edge cases meeting-sync: SSE double-JSON parsing, idempotency double-check (grep + filename), auto-tagging from meeting names, git commit after sync Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: 6 new guides from production patterns (wintermute extraction) New guides extracted and generalized from production deployment: - repo-architecture.md: Two-repo pattern (agent behavior vs world knowledge). Strict boundary rules, decision tree, hard rule: never write knowledge to the agent repo. - sub-agent-routing.md: Model routing table by task type. Signal detector pattern (spawn Sonnet on every message). Research pipeline pattern (Opus plans, DeepSeek executes, Opus synthesizes). Cost optimization. - skill-development.md: 5-step cycle (concept, prototype, evaluate, codify, cron). MECE discipline (no overlapping skills). Quality bar checklist. "If you ask twice, it should already be a skill." - idea-capture.md: Originality distribution rating (0-100 across 4 populations). Depth test ("could someone unfamiliar understand WHY?"). Deep cross-linking mandate. Notability filtering. - quiet-hours.md: Hold notifications 11pm-8am local time. Held messages directory pattern. Timezone-aware delivery. Morning briefing pickup. - diligence-ingestion.md: 9-step pipeline for data room materials. Detection patterns (PDF filenames, spreadsheet tabs, user language). Index.md template with bull/bear case. Company page enrichment. All PII scrubbed. Patterns generalized for any user. SKILLPACK index updated with 6 new entries. CLAUDE.md references added. All 37 SKILLPACK links verified. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: upgrade all guides to operational playbooks with pseudocode Every guide now follows the playbook structure: - Goal: one sentence, what this achieves - What the User Gets: without this / with this - Implementation: pseudocode with actual gbrain commands - Tricky Spots: production-tested gotchas - How to Verify: test steps the agent runs after setup Guides upgraded (15 files): - brain-agent-loop: on_message() loop with read/write/sync pseudocode - brain-first-lookup: 4-step lookup cascade with exact commands - brain-vs-memory: routing algorithm for 3 knowledge layers - compiled-truth: page structure + rewrite vs append rules - content-media: 3 ingest patterns (YouTube, social, PDFs) - cron-schedule: full schedule table + dream cycle pseudocode - enrichment-pipeline: 7-step protocol with tier classification - entity-detection: spawn pattern + detection prompt + notability filter - executive-assistant: 3 workflow algorithms (triage, prep, post-inbox) - meeting-ingestion: 6-step transcript-to-brain flow - operational-disciplines: 5 executable discipline blocks - originals-folder: detection + exact-phrasing capture + cross-linking - search-modes: decision tree for keyword vs hybrid vs direct - source-attribution: citation format + hierarchy + conflict resolution - Plus Goal/What User Gets headers on 6 newer guides Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add WebRTC to voice recipe + ngrok Hobby setup guide Voice recipe updates: - Added WebRTC endpoint (POST /session, GET /call, POST /tool) for browser-based calling with RNNoise noise suppression - WebRTC pseudocode with the 4 non-obvious gotchas from production (voice under audio.output.voice, no turn_detection, no session.update on connect, trigger greeting via data channel) - Recommend ngrok Hobby ($8/mo) for fixed domain instead of free tier - Fixed domain means URLs never change, Twilio never breaks New guide: docs/mcp/NGROK_SETUP.md - How to set up ngrok Hobby for both MCP and voice agent - Fixed domain setup, watchdog pattern, AI client configuration - Claude Desktop requires Settings > Integrations (not JSON config) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add dependency graph + ngrok-tunnel + credential-gateway recipes Recipes now have real dependencies via the `requires` field: - voice-to-brain requires ngrok-tunnel (needs public URL for Twilio) - email-to-brain requires credential-gateway (needs Gmail access) - calendar-to-brain requires credential-gateway (needs Calendar access) - x-to-brain and meeting-sync are standalone (direct API keys) Two new infrastructure recipes: - ngrok-tunnel: fixed public URL for MCP + voice. Recommends Hobby ($8/mo) for a domain that never changes. Includes watchdog pattern. - credential-gateway: secure Google service access via ClawVisor (recommended) or direct OAuth2. One setup, all Google recipes use it. Moved ngrok from docs/mcp/ to recipes/ — it's shared infrastructure, not MCP-specific. README and integrations landing page show dependency chains. When agent installs voice-to-brain, it sets up ngrok-tunnel first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add infra category, fix dashboard alignment, show dependencies DX audit found two bugs in gbrain integrations dashboard: 1. Column alignment broken — IDs > 18 chars ran into descriptions with no space. Fixed: pad to 22 chars. 2. ngrok-tunnel and credential-gateway showed as SENSES but they're infrastructure. Added 'infra' category. Dashboard now shows three sections: INFRASTRUCTURE (set up first), SENSES, REFLEXES. 3. Dependencies now shown inline: "AVAILABLE (needs credential-gateway)" Also added 'requires' field to JSON output for agent consumption. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add frontier model requirement disclaimer to README GBrain's markdown-is-code approach requires models capable of interpreting intent and implementing from architecture descriptions. Tested with Claude Opus 4.6 and GPT-5.4 Thinking. Smaller models will struggle with the recipe format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add PGLite → Supabase upgrade path to README Clarify the database progression: start with PGLite (Postgres as WASM, zero infrastructure, pgvector built in, nothing to install). Graduate to Supabase or self-hosted Postgres when you need connection pooling, concurrency, and remote MCP access from Claude Desktop, Cowork, ChatGPT, Perplexity Computer, or any MCP-compatible agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: revert PGLite mention (coming in next branch) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: make all 23 guides consistent (Goal/Impl/Tricky/Verify) Every guide now has exactly these sections in this order: - ## Goal (one sentence) - ## What the User Gets (without this / with this) - ## Implementation (pseudocode with gbrain commands) - ## Tricky Spots (3-5 numbered gotchas) - ## How to Verify (3-5 numbered test steps) 11 guides restructured from non-standard headings: - deterministic-collectors, live-sync, upgrades-auto-update (full rewrites) - entity-detection, diligence-ingestion, idea-capture, quiet-hours, repo-architecture, skill-development, sub-agent-routing (restructured) 23/23 guides now pass consistency audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: restructure README around the #1 blocker (getting data in) The README was leading with Postgres and database architecture. Most users are stuck at step zero: "I have an agent but it doesn't know anything about my life." New structure: 1. The Problem — your agent doesn't know your life 2. Getting Data In — integration recipes, front and center 3. The Compounding Thesis — why this matters 4. How this happened — credibility, origin story 5. When you need Postgres — scale, not starting point Postgres is de-emphasized from a full section to two paragraphs: "You don't need Postgres to start" and "When you need Postgres" (1,000+ files, remote MCP access, multiple AI clients). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move Install to top of README, remove duplicate section Install now appears right after Getting Data In (line 38), not buried at line 295. The user sees: Problem → Getting Data In → Install. Removed the duplicate Install section (262 lines) that was lower in the README. The agent instructions block, CLI quickstart, and all content is now in the single Install section near the top. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move agent install block to first thing in README "Start here: paste this into your agent" is now the first section, right after the one-line pitch. No scrolling, no context, no preamble. User opens the README, sees the paste block, copies it into OpenClaw or Hermes, and the agent takes over. Flow: pitch → paste block → Getting Data In → Compounding Thesis → origin story Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: compress install block from 11 steps to 5 The agent install block was 102 lines and 11 steps. Now it's 40 lines and 5 steps. Same coverage, half the text. Changes: - Merged "prove keyword search" + "embed" + "prove hybrid search" into one SEARCH step (the user doesn't care about the intermediate) - Merged skillpack, sync, auto-update, integrations, verification into one GO LIVE step with sub-items (post-install polish, not install) - Shortened database instructions (one line instead of 5 sub-steps) - Removed redundant preamble ("YOU MUST COMPLETE EVERY STEP" is now just "Do not skip steps. Verify each step.") The 5 steps: INSTALL → DATABASE → IMPORT → SEARCH → GO LIVE Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * security: gitignore all .env files, not just specific ones CSO audit found .gitignore covered .env.testing and .env.production but not bare .env. A user creating .env with database credentials could accidentally commit it. Fix: .env and .env.* are now gitignored. .env.*.example files are explicitly un-ignored so templates remain tracked. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * security: scrub PII from essay and recipe examples - 510-MY-GARRY phone mnemonic → "Your Phone Number" - "Garry → Authenticated Mode" → "Owner → Authenticated Mode" - "Telegram" → "secure channel" in auth example - @garrytan → @yourhandle in X recipe example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
27 KiB
GBrain
Your AI agent is smart but it doesn't know anything about your life. GBrain fixes that. Meetings, emails, tweets, calendar events, voice calls, original ideas... all of it flows into a searchable knowledge base that your agent reads before every response and writes to after every conversation. The agent gets smarter every day.
Requires a frontier model. Tested with Claude Opus 4.6 and GPT-5.4 Thinking. Likely to break with smaller models.
Start here: paste this into your agent
Copy this block into OpenClaw or Hermes Agent. The agent will install GBrain, set up your database, import your files, and verify everything works.
GOAL: Install GBrain so I can search my own markdown files with
hybrid search (keyword + vector). Do not skip steps. Verify each
step before proceeding. If a step fails, stop and fix it.
STEP 1 — INSTALL
curl -fsSL https://bun.sh/install | bash && source ~/.bashrc
bun add github:garrytan/gbrain
Verify: gbrain --version
STEP 2 — DATABASE
Run: gbrain init --supabase
Ask the user for their Supabase Shared Pooler connection string:
"Go to https://supabase.com → your project → Get Connected →
Direct Connection String → Session Pooler → copy Shared Pooler"
Do NOT ask for the anon key or project URL. Those are wrong.
Verify: gbrain doctor --json (all checks pass)
STEP 3 — IMPORT
Find markdown: find ~/git ~/Documents ~/notes -maxdepth 3 -name "*.md" | head -30
Tell user what you found. Import: gbrain import /path/ --no-embed
Verify: gbrain stats (page count > 0)
STEP 4 — SEARCH
Run: gbrain embed --stale (generates vector embeddings, ~1 min per 1000 pages)
Then: gbrain query "what are the key themes across these documents?"
Show the user the results. This is the magic moment.
Verify: results include semantic matches, not just keyword hits
STEP 5 — GO LIVE
a. Read docs/GBRAIN_SKILLPACK.md — adopt brain-first lookup, entity
detection, and back-linking into your behavior immediately
b. Set up live sync: gbrain sync --repo /path/ && gbrain embed --stale
(see docs/guides/live-sync.md for cron/watch/webhook options)
c. Set up auto-update: gbrain check-update --json on a daily cron
(never auto-install — always ask the user first)
d. Show available integrations: gbrain integrations
e. Run verification: read and execute docs/GBRAIN_VERIFY.md
DONE. Tell the user: "GBrain is live. You have [N] pages with hybrid
search. I now check the brain before answering questions. Run
'gbrain integrations' to add voice, email, calendar, or Twitter."
Without an agent (standalone CLI)
bun add -g github:garrytan/gbrain
gbrain init --supabase # guided wizard
gbrain import ~/git/brain/ # index your markdown
gbrain query "what do we know about competitive dynamics?"
Run gbrain --help for all commands. See MCP setup for connecting Claude Desktop, Perplexity, etc.
Getting Data In
Once GBrain is installed, your agent needs data flowing in. GBrain ships integration recipes that your agent sets up for you. It reads the recipe, asks for API keys, validates each one, and runs a smoke test. Markdown is code... the recipe IS the installer.
| Recipe | Requires | What It Does |
|---|---|---|
| Public Tunnel | — | Fixed URL for MCP + voice (ngrok Hobby $8/mo) |
| Credential Gateway | — | Gmail + Calendar access (ClawVisor or Google OAuth) |
| Voice-to-Brain | ngrok-tunnel | Phone calls → brain pages (Twilio + OpenAI Realtime) |
| Email-to-Brain | credential-gateway | Gmail → entity pages (deterministic collector) |
| X-to-Brain | — | Twitter → brain pages (timeline + mentions + deletions) |
| Calendar-to-Brain | credential-gateway | Google Calendar → searchable daily pages |
| Meeting Sync | — | Circleback transcripts → brain pages with attendees |
Run gbrain integrations to see status. Dependencies resolve automatically. See Getting Data In for the full guide.
The Compounding Thesis
Most tools help you find things. GBrain makes you smarter over time.
Signal arrives (meeting, email, tweet, link)
→ Agent detects entities (people, companies, ideas)
→ READ: check the brain first (gbrain search, gbrain get)
→ Respond with full context
→ WRITE: update brain pages with new information
→ Sync: gbrain indexes changes for next query
Every cycle through this loop adds knowledge. The agent enriches a person page after a meeting. Next time that person comes up, the agent already has context. You never start from zero.
An agent without this loop answers from stale context. An agent with it gets smarter every conversation. The difference compounds daily.
"Who should I invite to dinner who knows both Pedro and Diana?" — cross-references the social graph across 3,000+ people pages
"What have I said about the relationship between shame and founder performance?" — searches YOUR thinking, not the internet
"Prep me for my meeting with Jordan in 30 minutes" — pulls dossier, shared history, recent activity, open threads
How this happened
I was setting up my OpenClaw agent and started a markdown brain repo. One page per person, one page per company, compiled truth on top, append-only timeline on the bottom. The agent got smarter the more it knew, so I kept feeding it. Within a week I had 10,000+ markdown files, 3,000+ people with compiled dossiers, 13 years of calendar data, 280+ meeting transcripts, and 300+ captured original ideas.
The agent runs while I sleep. The dream cycle scans every conversation, enriches missing entities, fixes broken citations, and consolidates memory. I wake up and the brain is smarter than when I went to sleep. See the cron schedule guide for setup.
You don't need Postgres to start. The knowledge model is just markdown files in a git repo. The skills and schema work with any AI agent that can read and write files.
When you need Postgres: at 1,000+ files, grep stops working. GBrain adds hybrid search (keyword + vector + RRF fusion) on top of Postgres + pgvector. The CLI and MCP layer handle chunking, embedding, and incremental sync. Add Postgres when search speed matters, or when you want Claude Desktop, ChatGPT, Perplexity, or other MCP clients to connect to your brain remotely.
Architecture
┌──────────────────┐ ┌───────────────┐ ┌──────────────────┐
│ Brain Repo │ │ GBrain │ │ AI Agent │
│ (git) │ │ (retrieval) │ │ (read/write) │
│ │ │ │ │ │
│ markdown files │───>│ Postgres + │<──>│ skills define │
│ = source of │ │ pgvector │ │ HOW to use the │
│ truth │ │ │ │ brain │
│ │<───│ hybrid │ │ │
│ human can │ │ search │ │ entity detect │
│ always read │ │ (vector + │ │ enrich │
│ & edit │ │ keyword + │ │ ingest │
│ │ │ RRF) │ │ brief │
└──────────────────┘ └───────────────┘ └──────────────────┘
The repo is the system of record. GBrain is the retrieval layer. The agent reads and writes through both. Human always wins — you can edit any markdown file directly and gbrain sync picks up the changes.
What a Production Agent Looks Like
The numbers above aren't theoretical. They come from a real deployment documented in GBRAIN_SKILLPACK.md — a reference architecture for how a production AI agent uses gbrain as its knowledge backbone.
Read the skillpack. It's the most important doc in this repo. It tells your agent HOW to use gbrain, not just what commands exist:
- The brain-agent loop — the read-write cycle that makes knowledge compound
- Entity detection — spawn on every message, capture people/companies/original ideas
- Enrichment pipeline — 7-step protocol with tiered API spend
- Meeting ingestion — transcript to brain pages with entity propagation
- Source attribution — every fact traceable to where it came from
- Reference cron schedule — 20+ recurring jobs that keep the brain alive
Without the skillpack, your agent has tools but no playbook. With it, the agent knows when to read, when to write, how to enrich, and how to keep the brain alive autonomously. It's a pattern book, not a tutorial. "Here's what works, here's why."
How gbrain fits with OpenClaw/Hermes
GBrain is world knowledge — people, companies, deals, meetings, concepts, your original thinking. It's the long-term memory of what you know about the world.
OpenClaw agent memory (memory_search) is operational state — preferences, decisions, session context, how the agent should behave.
They're complementary:
| Layer | What it stores | How to query |
|---|---|---|
| gbrain | People, companies, meetings, ideas, media | gbrain search, gbrain query, gbrain get |
| Agent memory | Preferences, decisions, operational config | memory_search |
| Session context | Current conversation | (automatic) |
All three should be checked. GBrain for facts about the world. Memory for agent config. Session for immediate context. Install via openclaw skills install gbrain.
Try it: your files, searchable in 90 seconds
GBrain doesn't ship with demo data. It finds YOUR markdown and makes it searchable.
Act 1: Discovery. GBrain scans your machine for markdown repos.
=== GBrain Environment Discovery ===
~/git/brain (2.3GB, 342 .md files, 87 binary files)
Type: Plain markdown (ready for import)
~/Documents/obsidian-vault (180MB, 1,203 .md files, 0 binary files)
Type: Obsidian vault (wikilink conversion available)
=== Discovery Complete ===
Act 2: Import. Your files move from the repo into Supabase.
gbrain import ~/git/brain/
# Imported 342 files into Supabase (1,847 chunks). Embedding in background...
gbrain stats
# Pages: 342, Chunks: 1,847, Embedded: 0 (embedding...), Links: 0
Act 3: Search. The agent picks a query from your actual content.
# The agent reads your corpus and picks a relevant query
gbrain query "what do we know about competitive dynamics?"
# 3 results, scored by hybrid search (vector + keyword + RRF fusion)
# 30 seconds later, embeddings finish:
gbrain stats
# Pages: 342, Chunks: 1,847, Embedded: 1,847, Links: 0
# Now semantic search is live too
gbrain query "what are our biggest risks right now?"
# Finds pages about moats, board prep, and strategy -- by meaning, not keywords
Your file count will be different. Your queries will be different. The agent picks them based on what it imported. That's the point: this is YOUR brain, not a demo.
The compounding effect. Search for Pedro. The agent pulls his page, his relationship history, his company. Next time Brex comes up in conversation, the agent already knows Pedro co-founded it, what you discussed last, and what's on your open threads. You didn't do anything — the brain already had it.
Upgrade
Upgrade depends on how you installed:
# Installed via bun (standalone or library)
bun update gbrain
# Installed via ClawHub
clawhub update gbrain
# Compiled binary
# Download the latest from https://github.com/garrytan/gbrain/releases
After upgrading, run gbrain init again to apply any schema migrations (idempotent, safe to re-run).
Setup
After installing via CLI or library path, run the setup wizard:
# Guided wizard: auto-provisions Supabase or accepts a connection URL
gbrain init --supabase
# Or connect to any Postgres with pgvector
gbrain init --url postgresql://user:pass@host:5432/dbname
The init wizard:
- Checks for Supabase CLI, offers auto-provisioning
- Falls back to manual connection URL if CLI isn't available
- Runs the full schema migration (tables, indexes, triggers, extensions)
- Verifies the connection and confirms the database is ready for import
Config is saved to ~/.gbrain/config.json with 0600 permissions.
OpenClaw users skip this step. The orchestrator runs the wizard for you during install.
First import
# Import your markdown wiki (auto-chunks and auto-embeds)
gbrain import /path/to/brain/
# Skip embedding if you want to import fast and embed later
gbrain import /path/to/brain/ --no-embed
# Backfill embeddings for pages that don't have them
gbrain embed --stale
Import is idempotent. Re-running it skips unchanged files (compared by SHA-256 content hash). Progress bar shows status. ~30s for text import of 7,000 files, ~10-15 min for embedding.
File storage and migration
Brain repos accumulate binary files: images, PDFs, audio recordings, raw API responses. A repo with 3,000 markdown pages might have 2GB of binaries making git clone painful.
GBrain has a three-stage migration lifecycle that moves binaries to cloud storage while preserving every reference:
Local files in git repo
│
▼ gbrain files mirror <dir>
Cloud copy exists, local files untouched
│
▼ gbrain files redirect <dir>
Local files replaced with .redirect breadcrumbs (tiny YAML pointers)
│
▼ gbrain files clean <dir>
Breadcrumbs removed, cloud is the only copy
Every stage is reversible until clean:
# Stage 1: Copy to cloud (git repo unchanged)
gbrain files mirror ~/git/brain/attachments/ --dry-run # preview first
gbrain files mirror ~/git/brain/attachments/
# Stage 2: Replace local files with breadcrumbs
gbrain files redirect ~/git/brain/attachments/ --dry-run
gbrain files redirect ~/git/brain/attachments/
# Your git repo just dropped from 2GB to 50MB
# Undo: download everything back from cloud
gbrain files restore ~/git/brain/attachments/
# Stage 3: Remove breadcrumbs (irreversible, cloud is the only copy)
gbrain files clean ~/git/brain/attachments/ --yes
Storage backends: S3-compatible (AWS S3, Cloudflare R2, MinIO), Supabase Storage, or local filesystem. Configured during gbrain init.
Additional file commands:
gbrain files list [slug] # list files for a page (or all)
gbrain files upload <file> --page <slug> # upload file linked to page
gbrain files sync <dir> # bulk upload directory
gbrain files verify # verify all uploads match local
gbrain files status # show migration status of directories
gbrain files unmirror <dir> # remove mirror marker (files stay in cloud)
The file resolver (src/core/file-resolver.ts) handles fallback automatically: if a local file is missing, it checks for a .redirect breadcrumb, then a .supabase marker, and resolves to the cloud URL. Code that references files by path keeps working after migration.
The knowledge model
Every page in the brain follows the compiled truth + timeline pattern:
---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---
Paul Graham's argument that startups should do unscalable things early on.
The most common: recruiting users manually, one at a time. Airbnb went
door to door in New York photographing apartments. Stripe manually
installed their payment integration for early users.
The key insight: the unscalable effort teaches you what users actually
want, which you can't learn any other way.
---
- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
- 2025-02-20: Cited in discussion about AI agent onboarding strategies
Above the --- separator: compiled truth. Your current best understanding. Gets rewritten when new evidence changes the picture. Below: timeline. Append-only evidence trail. Never edited, only added to.
The compiled truth is the answer. The timeline is the proof.
How search works
Query: "when should you ignore conventional wisdom?"
|
Multi-query expansion (Claude Haiku)
"contrarian thinking startups", "going against the crowd"
|
+----+----+
| |
Vector Keyword
(HNSW (tsvector +
cosine) ts_rank)
| |
+----+----+
|
RRF Fusion: score = sum(1/(60 + rank))
|
4-Layer Dedup
1. Best chunk per page
2. Cosine similarity > 0.85
3. Type diversity (60% cap)
4. Per-page chunk cap
|
Stale alerts (compiled truth older than latest timeline)
|
Results
Keyword search alone misses conceptual matches. "Ignore conventional wisdom" won't find an essay titled "The Bus Ticket Theory of Genius" even though it's exactly about that. Vector search alone misses exact phrases when the embedding is diluted by surrounding text. RRF fusion gets both right. Multi-query expansion catches phrasings you didn't think of.
Database schema
10 tables in Postgres + pgvector:
pages The core content table
slug (UNIQUE) e.g. "concepts/do-things-that-dont-scale"
type person, company, deal, yc, civic, project, concept, source, media
title, compiled_truth, timeline
frontmatter (JSONB) Arbitrary metadata
search_vector Trigger-based tsvector (title + compiled_truth + timeline + timeline_entries)
content_hash SHA-256 for import idempotency
content_chunks Chunked content with embeddings
page_id (FK) Links to pages
chunk_text The chunk content
chunk_source 'compiled_truth' or 'timeline'
embedding (vector) 1536-dim from text-embedding-3-large
HNSW index Cosine similarity search
links Cross-references between pages
from_page_id, to_page_id
link_type knows, invested_in, works_at, founded, references, etc.
tags page_id + tag (many-to-many)
timeline_entries Structured timeline events
page_id, date, source, summary, detail (markdown)
page_versions Snapshot history for compiled_truth
compiled_truth, frontmatter, snapshot_at
raw_data Sidecar JSON from external APIs
page_id, source, data (JSONB)
files Binary attachments in Supabase Storage
page_slug (FK) Links to pages (ON UPDATE CASCADE)
storage_path, content_hash, mime_type, metadata (JSONB)
ingest_log Audit trail of import/ingest operations
config Brain-level settings (embedding model, chunk strategy, sync state)
Indexes: B-tree on slug/type, GIN on frontmatter/search_vector, HNSW on embeddings, pg_trgm on title for fuzzy slug resolution.
Chunking
Three strategies, dispatched by content type:
Recursive (timeline, bulk import): 5-level delimiter hierarchy (paragraphs, lines, sentences, clauses, words). 300-word chunks with 50-word sentence-aware overlap. Fast, predictable, lossless.
Semantic (compiled truth): Embeds each sentence, computes adjacent cosine similarities, applies Savitzky-Golay smoothing to find topic boundaries. Falls back to recursive on failure. Best quality for intelligence assessments.
LLM-guided (high-value content, on request): Pre-splits into 128-word candidates, asks Claude Haiku to identify topic shifts in sliding windows. 3 retries per window. Most expensive, best results.
Commands
SETUP
gbrain init [--supabase|--url <conn>] Create brain (guided wizard)
gbrain upgrade Self-update
PAGES
gbrain get <slug> Read a page (supports fuzzy slug matching)
gbrain put <slug> [< file.md] Write/update a page (auto-versions)
gbrain delete <slug> Delete a page
gbrain list [--type T] [--tag T] [-n N] List pages with filters
SEARCH
gbrain search <query> Keyword search (tsvector)
gbrain query <question> Hybrid search (vector + keyword + RRF + expansion)
IMPORT/EXPORT
gbrain import <dir> [--no-embed] Import markdown directory (idempotent)
gbrain sync [--repo <path>] [flags] Git-to-brain incremental sync
gbrain export [--dir ./out/] Export to markdown (round-trip)
FILES
gbrain files list [slug] List stored files
gbrain files upload <file> --page <slug> Upload file to storage
gbrain files sync <dir> Bulk upload directory
gbrain files verify Verify all uploads
EMBEDDINGS
gbrain embed [<slug>|--all|--stale] Generate/refresh embeddings
LINKS + GRAPH
gbrain link <from> <to> [--type T] Create typed link
gbrain unlink <from> <to> Remove link
gbrain backlinks <slug> Incoming links
gbrain graph <slug> [--depth N] Traverse link graph (recursive CTE, default depth 5)
TAGS
gbrain tags <slug> List tags
gbrain tag <slug> <tag> Add tag
gbrain untag <slug> <tag> Remove tag
TIMELINE
gbrain timeline [<slug>] View timeline entries
gbrain timeline-add <slug> <date> <text> Add timeline entry
ADMIN
gbrain doctor [--json] Health checks (pgvector, RLS, schema, embeddings)
gbrain stats Brain statistics
gbrain health Health dashboard (embed coverage, stale, orphans)
gbrain history <slug> Page version history
gbrain revert <slug> <version-id> Revert to previous version
gbrain config [get|set] <key> [value] Brain config
gbrain serve MCP server (stdio, local)
scripts/deploy-remote.sh Deploy remote MCP server (Supabase Edge Functions)
bun run src/commands/auth.ts Token management (create/list/revoke/test)
gbrain call <tool> '<json>' Raw tool invocation
gbrain --tools-json Tool discovery (JSON)
Library and MCP details
See GBrain without OpenClaw above for library usage examples, MCP server config, and skill file loading.
The BrainEngine interface is pluggable. See docs/ENGINES.md for how to add backends. 30 MCP tools are generated from the contract-first operations.ts. Parity tests verify structural identity between CLI, MCP, and tools-json.
Skills
Fat markdown files that tell AI agents HOW to use gbrain. No skill logic in the binary.
| Skill | What it does |
|---|---|
| ingest | Ingest meetings, docs, articles. Updates compiled truth (rewrite, not append), appends timeline, creates cross-reference links across all mentioned entities. |
| query | 3-layer search (keyword + vector + structured) with synthesis and citations. Says "the brain doesn't have info on X" rather than hallucinating. |
| maintain | Periodic health: find contradictions, stale compiled truth, orphan pages, dead links, tag inconsistency, missing embeddings, overdue threads. |
| enrich | Enrich pages from external APIs. Raw data stored separately, distilled highlights go to compiled truth. |
| briefing | Daily briefing: today's meetings with participant context, active deals with deadlines, time-sensitive threads, recent changes. |
| migrate | Universal migration from Obsidian (wikilinks to gbrain links), Notion (stripped UUIDs), Logseq (block refs), plain markdown, CSV, JSON, Roam. |
| setup | Set up GBrain from scratch: auto-provision Supabase via CLI, AGENTS.md injection, import, sync. Target TTHW < 2 min. |
Engine Architecture
CLI / MCP Server
(thin wrappers, identical operations)
|
BrainEngine interface
(pluggable backend)
|
+--------+--------+
| |
PostgresEngine SQLiteEngine
(ships v0) (designed, community PRs welcome)
|
Supabase Pro ($25/mo)
Postgres + pgvector + pg_trgm
connection pooling via Supavisor
Embedding, chunking, and search fusion are engine-agnostic. Only raw keyword search (searchKeyword) and raw vector search (searchVector) are engine-specific. RRF fusion, multi-query expansion, and 4-layer dedup run above the engine on SearchResult[] arrays.
Storage estimates
For a brain with ~7,500 pages:
| Component | Size |
|---|---|
| Page text (compiled_truth + timeline) | ~150MB |
| JSONB frontmatter + indexes | ~70MB |
| Content chunks (~22K, text) | ~80MB |
| Embeddings (22K x 1536 floats) | ~134MB |
| HNSW index overhead | ~270MB |
| Links, tags, timeline, versions | ~50MB |
| Total | ~750MB |
Supabase free tier (500MB) won't fit a large brain. Supabase Pro ($25/mo, 8GB) is the starting point.
Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.
Docs
For agents:
- GBRAIN_SKILLPACK.md -- Start here. Index of all patterns, skills, and integrations
- Individual guides -- 17 standalone guides broken out from the skillpack
- Getting Data In -- Integration recipes, credential setup, data flow patterns
- GBRAIN_VERIFY.md -- Installation verification runbook
For humans:
- GBRAIN_RECOMMENDED_SCHEMA.md -- Brain repo directory structure
- Infrastructure Layer -- How import, chunking, embedding, and search work
- Thin Harness, Fat Skills -- Architecture philosophy
- Homebrew for Personal AI -- Why markdown is code
Reference:
- GBRAIN_V0.md -- Full product spec, all architecture decisions
- ENGINES.md -- Pluggable engine interface, how to add backends
- SQLITE_ENGINE.md -- SQLite engine plan (community PRs welcome)
Contributing
See CONTRIBUTING.md. Run bun test for unit tests. For E2E tests
against real Postgres+pgvector: docker compose -f docker-compose.test.yml up -d then
DATABASE_URL=postgresql://postgres:postgres@localhost:5434/gbrain_test bun run test:e2e.
Welcome PRs for:
- SQLite engine implementation
- New enrichment API integrations
- Performance optimizations
- Docker Compose for self-hosted Postgres
License
MIT