feat: GBrain v0.7.0 — Integration Recipes + SKILLPACK Breakout (#39)

* docs: break SKILLPACK into 17 individual guides

The 1,281-line SKILLPACK monolith is now 17 individually linkable guides
in docs/guides/, organized by category: core patterns, data pipelines,
operations, search, and administration.

GBRAIN_SKILLPACK.md becomes a structured index with categorized tables
linking to each guide. The URL stays stable for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add integration guides, architecture docs, and ethos

New documentation directories:
- docs/integrations/ — "Getting Data In" landing page, credential gateway,
  meeting webhooks. Includes recipe format documentation.
- docs/architecture/ — Infrastructure layer doc (import, chunk, embed, search)
- docs/ethos/ — "Thin Harness, Fat Skills" essay with agent decision guide
- docs/designs/ — "Homebrew for Personal AI" 10-star vision document

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add gbrain integrations command + voice-to-brain recipe

New CLI command: gbrain integrations (list/show/status/doctor/stats/test)
- Standalone command, no database connection needed
- Uses gray-matter directly for recipe parsing (not parseMarkdown)
- --json flag on every subcommand for agent-parseable output
- Bare command shows senses/reflexes dashboard
- Health heartbeat via ~/.gbrain/integrations/<id>/heartbeat.jsonl

First recipe: recipes/twilio-voice-brain.md
- Phone calls create brain pages via Twilio + OpenAI Realtime
- Opinionated defaults: caller screening, brain-first lookup, quiet hours
- Outbound call smoke test (GBrain calls the user to prove it works)
- Validate-as-you-go credential testing
- Twilio signature validation for webhook security

Migration file for v0.7.0 with agent-readable changelog.
13 unit tests covering parseRecipe, CLI routing, and recipe validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Getting Data In to README, update CLAUDE.md and manifest

README: voice calls in intro bullet list, new "Getting Data In" section
with integration table (voice, email, X, calendar) and recipe philosophy.

CLAUDE.md: reference new files (integrations.ts, recipes/, docs/guides/,
docs/integrations/, docs/architecture/, docs/ethos/).

manifest.json: bump to v0.7.0, add recipes_dir field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: v0.7.0 CHANGELOG, TODOS, VERSION bump

CHANGELOG: v0.7.0 entry covering integration recipes, voice-to-brain,
gbrain integrations command, SKILLPACK breakout, and new documentation.

TODOS: 3 new items from CEO/DX reviews (constrained health_check DSL,
community recipe submission, always-on deployment recipes).

VERSION + package.json: bump to 0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite voice recipe with agent instructions and verified links

Major improvements to recipes/twilio-voice-brain.md:

- Agent preamble: explains WHY sequential execution matters (each step
  depends on the previous), defines 4 stop points where the agent MUST
  pause and verify, tells agent to never say "something went wrong"
  but instead explain the exact error and fix

- User actions are now specific: exact URLs for every credential
  (Twilio console, OpenAI API keys page, ngrok dashboard), what
  buttons to click, what fields to copy, common failure modes

- All URLs verified via web search against current 2026 documentation:
  Twilio SID/token at twilio.com/console, OpenAI keys at
  platform.openai.com/api-keys, ngrok token at
  dashboard.ngrok.com/get-started/your-authtoken

- Cost estimate corrected: OpenAI Realtime is $0.06/min input +
  $0.24/min output (was understated), total ~$20-22/mo for 100 min

- Validate-as-you-go: each credential tested immediately with exact
  curl commands, failure messages explain what went wrong and how to fix

- Smoke test flow: tells user exactly what to say, verifies ALL
  three outputs (messaging notification + brain page + search result)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add "Homebrew for Personal AI" essay (markdown is code)

New essay at docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md — the distribution
corollary to "Thin Harness, Fat Skills." Argues that markdown skill files
are simultaneously documentation, specification, package, and source code.
The agent is the package manager. The git repo is the app store.

Referenced from SKILLPACK index and CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite agent instructions as command language, promote skills

The OpenClaw/Hermes install block is now a drill sergeant, not a tour guide.
Every step is an imperative command with exact verification criteria and
explicit stop-on-failure behavior. No FYI, no suggestions, just rails.

Key changes:
- 11-step setup with STOP points after each step
- Exact user instructions for Supabase connection string (what to click,
  what NOT to give the agent, what the string looks like)
- "Verify: run X. You must see Y. If not: Z" after every step
- Skills table now links to both skill files AND guide docs
- Integration recipes table simplified (no "coming soon" placeholders)
- Docs section reorganized: for agents / for humans / reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 4 codex findings + add email-to-brain recipe

Codex review found 4 issues, all fixed:

1. getStatus() returned "configured" if ANY secret was set (e.g. just
   OPENAI_API_KEY). Now requires ALL required secrets before marking
   configured. Prevents false "configured" status and spurious doctor runs.

2. Twilio health check hit unauthenticated endpoint (always 401). Now
   uses authenticated curl with SID:token, matching the setup validation.

3. README anchor docs/GBRAIN_SKILLPACK.md#the-dream-cycle broken after
   SKILLPACK rewrite. Updated to point to docs/guides/cron-schedule.md.

4. Compiled binary can't find recipes/ via import.meta.dir. Added
   GBRAIN_RECIPES_DIR env var override + global bun install path fallback.

Also adds recipes/email-to-brain.md: Gmail deterministic collector pattern
with ClawVisor credential gateway, validate-as-you-go, agent instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add email, X, calendar, and meeting sync recipes

Four new integration recipes extracted from production wintermute patterns:

- recipes/email-to-brain.md: Gmail via ClawVisor, deterministic collector
  pattern (code pulls emails with baked-in links, agent does judgment),
  noise filtering, signature detection, digest generation

- recipes/x-to-brain.md: X API v2, timeline + mentions + keyword search,
  deletion detection (diffs previous run, verifies 404), engagement
  velocity tracking, rate limit awareness

- recipes/calendar-to-brain.md: Google Calendar via ClawVisor, historical
  backfill (years of data), daily markdown files with attendees + locations,
  attendee enrichment for brain pages

- recipes/meeting-sync.md: Circleback API, transcript import with speaker
  labels, attendee detection + filtering, entity propagation to people/
  company pages, action item extraction, idempotent by source_id

All recipes follow the same format: agent preamble with sequential execution
rules, validate-as-you-go credentials, exact URLs for API key setup,
stop-on-failure verification, and heartbeat logging.

Updated README, SKILLPACK index, and integrations landing page with all 5 recipes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Google OAuth as alternative to ClawVisor in email + calendar recipes

Both recipes now offer two auth options:
- Option A: ClawVisor (recommended, handles OAuth + token refresh)
- Option B: Google OAuth2 directly (no extra service, you manage tokens)

Option B includes step-by-step instructions for Google Cloud Console:
exact URLs, which buttons to click, which scopes to add, how to enable
the API, and the OAuth flow for token exchange.

This removes ClawVisor as a hard dependency for getting started.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add implementation guides with pseudocode and test suggestions

Every recipe now includes an "Implementation Guide" section with:

- Production-tested pseudocode the agent can follow to build each collector
- Edge cases and failure modes discovered in real deployment
- Non-obvious implementation details (why the 48h staleness heuristic,
  why Gmail links need authuser, why SSE responses need double-parsing)
- Test suggestions: what the agent should verify after setup

email-to-brain: noise filtering algorithm, signature detection patterns,
  Gmail link generation (authuser is critical), sent-mail dedup

x-to-brain: deletion detection with 3 heuristics (7-day, 48h staleness,
  API verification), engagement velocity thresholds (50 min for 2x, 100
  absolute jump), atomic writes, stdout contract, rate limit handling

calendar-to-brain: smart chunking (monthly for sparse years, weekly for
  dense), attendee filtering (rooms, groups, distros), merge-with-existing
  (only replace ## Calendar section), date/time parsing edge cases

meeting-sync: SSE double-JSON parsing, idempotency double-check (grep +
  filename), auto-tagging from meeting names, git commit after sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: 6 new guides from production patterns (wintermute extraction)

New guides extracted and generalized from production deployment:

- repo-architecture.md: Two-repo pattern (agent behavior vs world knowledge).
  Strict boundary rules, decision tree, hard rule: never write knowledge
  to the agent repo.

- sub-agent-routing.md: Model routing table by task type. Signal detector
  pattern (spawn Sonnet on every message). Research pipeline pattern
  (Opus plans, DeepSeek executes, Opus synthesizes). Cost optimization.

- skill-development.md: 5-step cycle (concept, prototype, evaluate, codify,
  cron). MECE discipline (no overlapping skills). Quality bar checklist.
  "If you ask twice, it should already be a skill."

- idea-capture.md: Originality distribution rating (0-100 across 4
  populations). Depth test ("could someone unfamiliar understand WHY?").
  Deep cross-linking mandate. Notability filtering.

- quiet-hours.md: Hold notifications 11pm-8am local time. Held messages
  directory pattern. Timezone-aware delivery. Morning briefing pickup.

- diligence-ingestion.md: 9-step pipeline for data room materials. Detection
  patterns (PDF filenames, spreadsheet tabs, user language). Index.md
  template with bull/bear case. Company page enrichment.

All PII scrubbed. Patterns generalized for any user.
SKILLPACK index updated with 6 new entries. CLAUDE.md references added.
All 37 SKILLPACK links verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: upgrade all guides to operational playbooks with pseudocode

Every guide now follows the playbook structure:
- Goal: one sentence, what this achieves
- What the User Gets: without this / with this
- Implementation: pseudocode with actual gbrain commands
- Tricky Spots: production-tested gotchas
- How to Verify: test steps the agent runs after setup

Guides upgraded (15 files):
- brain-agent-loop: on_message() loop with read/write/sync pseudocode
- brain-first-lookup: 4-step lookup cascade with exact commands
- brain-vs-memory: routing algorithm for 3 knowledge layers
- compiled-truth: page structure + rewrite vs append rules
- content-media: 3 ingest patterns (YouTube, social, PDFs)
- cron-schedule: full schedule table + dream cycle pseudocode
- enrichment-pipeline: 7-step protocol with tier classification
- entity-detection: spawn pattern + detection prompt + notability filter
- executive-assistant: 3 workflow algorithms (triage, prep, post-inbox)
- meeting-ingestion: 6-step transcript-to-brain flow
- operational-disciplines: 5 executable discipline blocks
- originals-folder: detection + exact-phrasing capture + cross-linking
- search-modes: decision tree for keyword vs hybrid vs direct
- source-attribution: citation format + hierarchy + conflict resolution
- Plus Goal/What User Gets headers on 6 newer guides

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add WebRTC to voice recipe + ngrok Hobby setup guide

Voice recipe updates:
- Added WebRTC endpoint (POST /session, GET /call, POST /tool) for
  browser-based calling with RNNoise noise suppression
- WebRTC pseudocode with the 4 non-obvious gotchas from production
  (voice under audio.output.voice, no turn_detection, no session.update
  on connect, trigger greeting via data channel)
- Recommend ngrok Hobby ($8/mo) for fixed domain instead of free tier
- Fixed domain means URLs never change, Twilio never breaks

New guide: docs/mcp/NGROK_SETUP.md
- How to set up ngrok Hobby for both MCP and voice agent
- Fixed domain setup, watchdog pattern, AI client configuration
- Claude Desktop requires Settings > Integrations (not JSON config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add dependency graph + ngrok-tunnel + credential-gateway recipes

Recipes now have real dependencies via the `requires` field:
- voice-to-brain requires ngrok-tunnel (needs public URL for Twilio)
- email-to-brain requires credential-gateway (needs Gmail access)
- calendar-to-brain requires credential-gateway (needs Calendar access)
- x-to-brain and meeting-sync are standalone (direct API keys)

Two new infrastructure recipes:
- ngrok-tunnel: fixed public URL for MCP + voice. Recommends Hobby
  ($8/mo) for a domain that never changes. Includes watchdog pattern.
- credential-gateway: secure Google service access via ClawVisor
  (recommended) or direct OAuth2. One setup, all Google recipes use it.

Moved ngrok from docs/mcp/ to recipes/ — it's shared infrastructure,
not MCP-specific.

README and integrations landing page show dependency chains.
When agent installs voice-to-brain, it sets up ngrok-tunnel first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add infra category, fix dashboard alignment, show dependencies

DX audit found two bugs in gbrain integrations dashboard:

1. Column alignment broken — IDs > 18 chars ran into descriptions
   with no space. Fixed: pad to 22 chars.

2. ngrok-tunnel and credential-gateway showed as SENSES but they're
   infrastructure. Added 'infra' category. Dashboard now shows three
   sections: INFRASTRUCTURE (set up first), SENSES, REFLEXES.

3. Dependencies now shown inline: "AVAILABLE (needs credential-gateway)"

Also added 'requires' field to JSON output for agent consumption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add frontier model requirement disclaimer to README

GBrain's markdown-is-code approach requires models capable of
interpreting intent and implementing from architecture descriptions.
Tested with Claude Opus 4.6 and GPT-5.4 Thinking. Smaller models
will struggle with the recipe format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add PGLite → Supabase upgrade path to README

Clarify the database progression: start with PGLite (Postgres as WASM,
zero infrastructure, pgvector built in, nothing to install). Graduate
to Supabase or self-hosted Postgres when you need connection pooling,
concurrency, and remote MCP access from Claude Desktop, Cowork,
ChatGPT, Perplexity Computer, or any MCP-compatible agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: revert PGLite mention (coming in next branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: make all 23 guides consistent (Goal/Impl/Tricky/Verify)

Every guide now has exactly these sections in this order:
- ## Goal (one sentence)
- ## What the User Gets (without this / with this)
- ## Implementation (pseudocode with gbrain commands)
- ## Tricky Spots (3-5 numbered gotchas)
- ## How to Verify (3-5 numbered test steps)

11 guides restructured from non-standard headings:
- deterministic-collectors, live-sync, upgrades-auto-update (full rewrites)
- entity-detection, diligence-ingestion, idea-capture, quiet-hours,
  repo-architecture, skill-development, sub-agent-routing (restructured)

23/23 guides now pass consistency audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: restructure README around the #1 blocker (getting data in)

The README was leading with Postgres and database architecture. Most
users are stuck at step zero: "I have an agent but it doesn't know
anything about my life."

New structure:
1. The Problem — your agent doesn't know your life
2. Getting Data In — integration recipes, front and center
3. The Compounding Thesis — why this matters
4. How this happened — credibility, origin story
5. When you need Postgres — scale, not starting point

Postgres is de-emphasized from a full section to two paragraphs:
"You don't need Postgres to start" and "When you need Postgres"
(1,000+ files, remote MCP access, multiple AI clients).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move Install to top of README, remove duplicate section

Install now appears right after Getting Data In (line 38), not buried
at line 295. The user sees: Problem → Getting Data In → Install.

Removed the duplicate Install section (262 lines) that was lower in
the README. The agent instructions block, CLI quickstart, and all
content is now in the single Install section near the top.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move agent install block to first thing in README

"Start here: paste this into your agent" is now the first section,
right after the one-line pitch. No scrolling, no context, no preamble.
User opens the README, sees the paste block, copies it into OpenClaw
or Hermes, and the agent takes over.

Flow: pitch → paste block → Getting Data In → Compounding Thesis → origin story

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: compress install block from 11 steps to 5

The agent install block was 102 lines and 11 steps. Now it's 40 lines
and 5 steps. Same coverage, half the text.

Changes:
- Merged "prove keyword search" + "embed" + "prove hybrid search"
  into one SEARCH step (the user doesn't care about the intermediate)
- Merged skillpack, sync, auto-update, integrations, verification
  into one GO LIVE step with sub-items (post-install polish, not install)
- Shortened database instructions (one line instead of 5 sub-steps)
- Removed redundant preamble ("YOU MUST COMPLETE EVERY STEP" is now
  just "Do not skip steps. Verify each step.")

The 5 steps: INSTALL → DATABASE → IMPORT → SEARCH → GO LIVE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: gitignore all .env files, not just specific ones

CSO audit found .gitignore covered .env.testing and .env.production
but not bare .env. A user creating .env with database credentials
could accidentally commit it.

Fix: .env and .env.* are now gitignored. .env.*.example files are
explicitly un-ignored so templates remain tracked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: scrub PII from essay and recipe examples

- 510-MY-GARRY phone mnemonic → "Your Phone Number"
- "Garry → Authenticated Mode" → "Owner → Authenticated Mode"
- "Telegram" → "secure channel" in auth example
- @garrytan → @yourhandle in X recipe example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-10 23:39:06 -10:00
committed by GitHub
parent 8de04d3827
commit ce15062694
50 changed files with 7428 additions and 1494 deletions

332
README.md
View File

@@ -1,56 +1,88 @@
# GBrain
The memex Vannevar Bush imagined, built for people who think for a living.
Your AI agent is smart but it doesn't know anything about your life. GBrain fixes that. Meetings, emails, tweets, calendar events, voice calls, original ideas... all of it flows into a searchable knowledge base that your agent reads before every response and writes to after every conversation. The agent gets smarter every day.
## How this happened
> **Requires a frontier model.** Tested with **Claude Opus 4.6** and **GPT-5.4 Thinking**. Likely to break with smaller models.
I was setting up my [OpenClaw](https://openclaw.ai) agent and started a markdown brain repo. One page per person, one page per company, compiled truth on top, append-only timeline on the bottom. The agent got smarter the more it knew, so I kept feeding it. Meetings, emails, tweets, Apple Notes, calendar data, original ideas. One thing led to another. Within a week I had:
## Start here: paste this into your agent
- **10,000+ markdown files** indexed and searchable
- **3,000+ people** with compiled dossiers and relationship history
- **13 years of calendar data** (21,000+ events)
- **5,800+ Apple Notes** going back to 2009
- **280+ meeting transcripts** with AI analysis
- **300+ captured original ideas** organized by thesis
- **500+ media pages** (video transcripts, books, articles)
- Company profiles, food guides, travel logs
Copy this block into [OpenClaw](https://openclaw.ai) or [Hermes Agent](https://github.com/NousResearch/hermes-agent). The agent will install GBrain, set up your database, import your files, and verify everything works.
This is what I actually use day to day. The agent runs while I sleep... literally. The dream cycle scans every conversation from the day, enriches missing entities, fixes broken citations, and consolidates memory. I wake up and the brain is smarter than when I went to sleep. OpenClaw ships this as DREAMS.md. Hermes Agent can do the same with a nightly cron job (see the [SKILLPACK](docs/GBRAIN_SKILLPACK.md#the-dream-cycle) for setup).
```
GOAL: Install GBrain so I can search my own markdown files with
hybrid search (keyword + vector). Do not skip steps. Verify each
step before proceeding. If a step fails, stop and fix it.
**You don't need Postgres to start.** The knowledge model is just markdown files in a git repo. The [skills](docs/GBRAIN_SKILLPACK.md) and [schema](docs/GBRAIN_RECOMMENDED_SCHEMA.md) work with any AI agent that can read and write files. Start there.
STEP 1 — INSTALL
curl -fsSL https://bun.sh/install | bash && source ~/.bashrc
bun add github:garrytan/gbrain
Verify: gbrain --version
I added Postgres + pgvector later because at 1,000 to 10,000 long markdown docs, `grep` stops working. You need real chunking, real retrieval, real search. GBrain is the thin CLI and MCP layer I built on top of Postgres to solve that, optimized for OpenClaw and smart agents.
STEP 2 — DATABASE
Run: gbrain init --supabase
Ask the user for their Supabase Shared Pooler connection string:
"Go to https://supabase.com → your project → Get Connected →
Direct Connection String → Session Pooler → copy Shared Pooler"
Do NOT ask for the anon key or project URL. Those are wrong.
Verify: gbrain doctor --json (all checks pass)
### Ask it anything
STEP 3 — IMPORT
Find markdown: find ~/git ~/Documents ~/notes -maxdepth 3 -name "*.md" | head -30
Tell user what you found. Import: gbrain import /path/ --no-embed
Verify: gbrain stats (page count > 0)
> "Who should I invite to dinner who knows both Pedro and Diana?"
> — cross-references the social graph across 3,000+ people pages
STEP 4 — SEARCH
Run: gbrain embed --stale (generates vector embeddings, ~1 min per 1000 pages)
Then: gbrain query "what are the key themes across these documents?"
Show the user the results. This is the magic moment.
Verify: results include semantic matches, not just keyword hits
> "What have I said about the relationship between shame and founder performance?"
> — searches YOUR thinking, not the internet
STEP 5 — GO LIVE
a. Read docs/GBRAIN_SKILLPACK.md — adopt brain-first lookup, entity
detection, and back-linking into your behavior immediately
b. Set up live sync: gbrain sync --repo /path/ && gbrain embed --stale
(see docs/guides/live-sync.md for cron/watch/webhook options)
c. Set up auto-update: gbrain check-update --json on a daily cron
(never auto-install — always ask the user first)
d. Show available integrations: gbrain integrations
e. Run verification: read and execute docs/GBRAIN_VERIFY.md
> "What changed with the Series A since Tuesday?"
> — diffs timeline entries across deal and company pages
DONE. Tell the user: "GBrain is live. You have [N] pages with hybrid
search. I now check the brain before answering questions. Run
'gbrain integrations' to add voice, email, calendar, or Twitter."
```
> "Prep me for my meeting with Jordan in 30 minutes"
> — pulls dossier, shared history, recent activity, open threads
### Without an agent (standalone CLI)
Your markdown repo is the source of truth. GBrain makes it searchable. Your AI agent makes it live.
```bash
bun add -g github:garrytan/gbrain
gbrain init --supabase # guided wizard
gbrain import ~/git/brain/ # index your markdown
gbrain query "what do we know about competitive dynamics?"
```
## Why Postgres
Run `gbrain --help` for all commands. See [MCP setup](docs/mcp/DEPLOY.md) for connecting Claude Desktop, Perplexity, etc.
At 500 files, `grep` is fine. At 3,000 people pages, 5,800 Apple Notes, and 13 years of calendar data, `grep` falls apart. You need keyword search for exact names, vector search for semantic meaning, and something that fuses both. You need an index that updates incrementally when one file changes, not a full directory walk. You need your agent to find "everyone who was at the board dinner last March" in milliseconds, not 30 seconds of grepping.
## Getting Data In
GBrain gives you hybrid search that combines keyword and vector approaches, plus a knowledge model that treats every page like an intelligence assessment: compiled truth on top (your current best understanding, rewritten when evidence changes), append-only timeline on the bottom (the evidence trail that never gets edited).
Once GBrain is installed, your agent needs data flowing in. GBrain ships integration recipes that your agent sets up for you. It reads the recipe, asks for API keys, validates each one, and runs a smoke test. [Markdown is code](docs/ethos/THIN_HARNESS_FAT_SKILLS.md)... the recipe IS the installer.
AI agents maintain the brain. You ingest a document and the agent updates every entity mentioned, creates cross-reference links, and appends timeline entries. MCP clients query it. The intelligence lives in fat markdown skills, not application code.
| Recipe | Requires | What It Does |
|--------|----------|-------------|
| [Public Tunnel](recipes/ngrok-tunnel.md) | — | Fixed URL for MCP + voice (ngrok Hobby $8/mo) |
| [Credential Gateway](recipes/credential-gateway.md) | — | Gmail + Calendar access (ClawVisor or Google OAuth) |
| [Voice-to-Brain](recipes/twilio-voice-brain.md) | ngrok-tunnel | Phone calls → brain pages (Twilio + OpenAI Realtime) |
| [Email-to-Brain](recipes/email-to-brain.md) | credential-gateway | Gmail → entity pages (deterministic collector) |
| [X-to-Brain](recipes/x-to-brain.md) | — | Twitter → brain pages (timeline + mentions + deletions) |
| [Calendar-to-Brain](recipes/calendar-to-brain.md) | credential-gateway | Google Calendar → searchable daily pages |
| [Meeting Sync](recipes/meeting-sync.md) | — | Circleback transcripts → brain pages with attendees |
Run `gbrain integrations` to see status. Dependencies resolve automatically. See [Getting Data In](docs/integrations/README.md) for the full guide.
## The Compounding Thesis
Most tools help you find things. GBrain makes you smarter over time.
The core loop:
```
Signal arrives (meeting, email, tweet, link)
→ Agent detects entities (people, companies, ideas)
@@ -60,11 +92,28 @@ Signal arrives (meeting, email, tweet, link)
→ Sync: gbrain indexes changes for next query
```
Every cycle through this loop adds knowledge. The agent enriches a person page after a meeting. Next time that person comes up, the agent already has context — their role, your history, what they care about, what you discussed last time. You never start from zero.
Every cycle through this loop adds knowledge. The agent enriches a person page after a meeting. Next time that person comes up, the agent already has context. You never start from zero.
An agent without this loop answers from stale context. An agent with it gets smarter every conversation. The difference compounds daily.
Never do anything twice. If you look someone up once, that lookup lives in the brain forever. If a pattern emerges across three meetings, the agent captures it. If you generate an original idea in conversation, it goes to `originals/` — your searchable intellectual archive.
> "Who should I invite to dinner who knows both Pedro and Diana?"
> — cross-references the social graph across 3,000+ people pages
> "What have I said about the relationship between shame and founder performance?"
> — searches YOUR thinking, not the internet
> "Prep me for my meeting with Jordan in 30 minutes"
> — pulls dossier, shared history, recent activity, open threads
## How this happened
I was setting up my [OpenClaw](https://openclaw.ai) agent and started a markdown brain repo. One page per person, one page per company, compiled truth on top, append-only timeline on the bottom. The agent got smarter the more it knew, so I kept feeding it. Within a week I had 10,000+ markdown files, 3,000+ people with compiled dossiers, 13 years of calendar data, 280+ meeting transcripts, and 300+ captured original ideas.
The agent runs while I sleep. The dream cycle scans every conversation, enriches missing entities, fixes broken citations, and consolidates memory. I wake up and the brain is smarter than when I went to sleep. See the [cron schedule guide](docs/guides/cron-schedule.md) for setup.
**You don't need Postgres to start.** The knowledge model is just markdown files in a git repo. The [skills](docs/GBRAIN_SKILLPACK.md) and [schema](docs/GBRAIN_RECOMMENDED_SCHEMA.md) work with any AI agent that can read and write files.
**When you need Postgres:** at 1,000+ files, `grep` stops working. GBrain adds hybrid search (keyword + vector + RRF fusion) on top of Postgres + pgvector. The CLI and MCP layer handle chunking, embedding, and incremental sync. Add Postgres when search speed matters, or when you want Claude Desktop, ChatGPT, Perplexity, or other MCP clients to connect to your brain remotely.
## Architecture
@@ -165,203 +214,6 @@ Your file count will be different. Your queries will be different. The agent pic
**The compounding effect.** Search for Pedro. The agent pulls his page, his relationship history, his company. Next time Brex comes up in conversation, the agent already knows Pedro co-founded it, what you discussed last, and what's on your open threads. You didn't do anything — the brain already had it.
## Install
### Prerequisites
**Without Postgres**, you can use the GBrain knowledge model right now: the [skills](docs/GBRAIN_SKILLPACK.md), [schema](docs/GBRAIN_RECOMMENDED_SCHEMA.md), and compiled truth + timeline pattern work with any agent that reads and writes markdown files. Add Postgres when `grep` stops being enough.
**With Postgres**, GBrain needs three things:
| Dependency | What it's for | How to get it |
|------------|--------------|---------------|
| **Supabase account** | Postgres + pgvector database | [supabase.com](https://supabase.com) (Pro tier, $25/mo for 8GB) |
| **OpenAI API key** | Embeddings (text-embedding-3-large) | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |
| **Anthropic API key** | Multi-query expansion + LLM chunking (Haiku) | [console.anthropic.com](https://console.anthropic.com) |
Set the API keys as environment variables:
```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```
The Supabase connection URL is configured during `gbrain init`. The OpenAI and Anthropic SDKs read their keys from the environment automatically.
Without an OpenAI key, search still works (keyword only, no vector search). Without an Anthropic key, search still works (no multi-query expansion, no LLM chunking).
### With OpenClaw or Hermes Agent (recommended)
To install, paste this into [OpenClaw](https://openclaw.ai) or [Hermes Agent](https://github.com/NousResearch/hermes-agent) and we'll work with you to do the rest:
```
Set up gbrain (https://github.com/garrytan/gbrain) as my
knowledge brain.
1. Make sure bun is installed:
curl -fsSL https://bun.sh/install | bash
Then run: bun add github:garrytan/gbrain
2. Run: gbrain init --supabase
(follow the wizard to connect my Supabase database)
3. Scan ~/git/ and ~/Documents/ for markdown repos,
pick the best one, and run: gbrain import <path> --no-embed
4. Run a query against the imported data to prove search works
5. Read docs/GBRAIN_RECOMMENDED_SCHEMA.md and offer to
restructure my knowledge base
6. Read docs/GBRAIN_SKILLPACK.md and update all your skills
with the production agent patterns: brain-agent loop,
entity detection, source attribution, iron law back-linking,
and the enrichment pipeline
7. Set up a daily cron for `gbrain check-update`.
Notify me on new features only, not patches.
Never auto-install, just tell me what's new.
8. Set up automatic sync so the vector DB stays current
with the brain repo. Read GBRAIN_SKILLPACK.md Section 18
for approaches (cron, --watch, webhook). Pick what fits
your environment. Always chain:
gbrain sync --repo <path> && gbrain embed --stale
Verify: push a change, confirm it appears in search.
9. Run the verification runbook (docs/GBRAIN_VERIFY.md)
to confirm everything works: schema, sync, embeddings,
brain-first lookup.
```
OpenClaw will install gbrain, walk through Supabase setup, discover your markdown files, import them, and prove search works with a query from your data.
After setup, you talk to your brain through OpenClaw:
```
Search the brain for everything we know about [topic]
Ingest my meeting notes from today
Give me a briefing for my meetings tomorrow
How many pages are in the brain now?
```
GBrain keeps your brain current. After setup, `gbrain sync --watch` polls your git repo and imports only what changed. Binary files (images, PDFs, audio) can be moved to cloud storage with `gbrain files mirror` to slim down your git repo.
> **Supabase settings:** GBrain connects directly to Postgres (not the REST API).
> You need the **Shared Pooler connection string**, not the project URL or anon key.
> Find it: go to your project, click **Get Connected** next to the project URL,
> then **Direct Connection String** > **Session Pooler**, and copy the
> **Shared Pooler** connection string.
### GBrain without OpenClaw
GBrain works with any AI agent, any MCP client, or no agent at all. Three paths:
#### Standalone CLI
Install globally and use gbrain from the terminal:
```bash
bun add -g github:garrytan/gbrain
gbrain init --supabase # guided wizard, connects to your Postgres
gbrain import ~/git/brain/ # index your markdown
gbrain query "what do we know about competitive dynamics?"
```
The CLI gives you every operation: page CRUD, search, tags, links, timeline, graph traversal, file management, health checks. Run `gbrain --help` for the full list.
#### MCP server (Claude Code, Cursor, Windsurf, etc.)
GBrain exposes 30 MCP tools via stdio. Add this to your MCP client config:
**Claude Code** (`~/.claude/server.json`):
```json
{
"mcpServers": {
"gbrain": {
"command": "gbrain",
"args": ["serve"]
}
}
}
```
**Cursor** (Settings > MCP Servers):
```json
{
"gbrain": {
"command": "gbrain",
"args": ["serve"]
}
}
```
This gives your agent `get_page`, `put_page`, `search`, `query`, `add_link`, `traverse_graph`, `sync_brain`, `file_upload`, and 22 more tools. All generated from the same operation definitions as the CLI.
#### Remote MCP Server (Claude Desktop, Cowork, Perplexity, ChatGPT)
Access your brain from any device, any AI client. Deploy as a serverless endpoint on your existing Supabase instance:
```bash
cp .env.production.example .env.production # fill in 3 values
bash scripts/deploy-remote.sh # links, builds, deploys
bun run src/commands/auth.ts create "claude-desktop" # get a token
```
Then add to your AI client:
- **Claude Code:** `claude mcp add gbrain -t http https://YOUR_REF.supabase.co/functions/v1/gbrain-mcp/mcp -H "Authorization: Bearer TOKEN"`
- **Claude Desktop:** Settings > Integrations > Add (NOT JSON config)
- **Perplexity Computer:** Settings > Connectors > Add remote MCP
Per-client setup guides: [`docs/mcp/`](docs/mcp/DEPLOY.md)
ChatGPT support requires OAuth 2.1 and is coming in v0.7. Self-hosted alternatives (Tailscale Funnel, ngrok) documented in [`docs/mcp/ALTERNATIVES.md`](docs/mcp/ALTERNATIVES.md).
**The tools are not enough.** Your agent also needs the playbook: read [GBRAIN_SKILLPACK.md](docs/GBRAIN_SKILLPACK.md) and paste the relevant sections into your agent's system prompt or project instructions. The skillpack tells the agent WHEN and HOW to use each tool: read before responding, write after learning, detect entities on every message, back-link everything.
The skill markdown files in `skills/` are standalone instruction sets. Copy them into your agent's context:
| Skill file | What the agent learns |
|------------|----------------------|
| `skills/ingest/SKILL.md` | How to import meetings, docs, articles |
| `skills/query/SKILL.md` | 3-layer search with synthesis and citations |
| `skills/maintain/SKILL.md` | Periodic health: stale pages, orphans, dead links |
| `skills/enrich/SKILL.md` | Enrich pages from external APIs |
| `skills/briefing/SKILL.md` | Daily briefing with meeting prep |
| `skills/migrate/SKILL.md` | Migrate from Obsidian, Notion, Logseq, etc. |
#### As a TypeScript library
```bash
bun add github:garrytan/gbrain
```
```typescript
import { PostgresEngine } from 'gbrain';
const engine = new PostgresEngine();
await engine.connect({ database_url: process.env.DATABASE_URL });
await engine.initSchema();
// Search
const results = await engine.searchKeyword('startup growth');
// Read
const page = await engine.getPage('people/pedro-franceschi');
// Write
await engine.putPage('concepts/superlinear-returns', {
type: 'concept',
title: 'Superlinear Returns',
compiled_truth: 'Paul Graham argues that returns in many fields are superlinear...',
timeline: '- 2023-10-01: Published on paulgraham.com',
});
```
The `BrainEngine` interface is pluggable. See `docs/ENGINES.md` for how to add backends.
All paths require a Postgres database with pgvector. Supabase Pro ($25/mo) is the recommended zero-ops option.
## Upgrade
Upgrade depends on how you installed:
@@ -707,12 +559,22 @@ Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.
## Docs
- **[GBRAIN_SKILLPACK.md](docs/GBRAIN_SKILLPACK.md)** -- **Start here for agents.** Reference architecture for production agents: brain-agent loop, entity detection, enrichment pipeline, meeting ingestion, cron schedule
- [GBRAIN_RECOMMENDED_SCHEMA.md](docs/GBRAIN_RECOMMENDED_SCHEMA.md) -- The recommended brain schema: MECE directories, compiled truth + timeline, enrichment pipelines, resolver decision tree
- [GBRAIN_V0.md](docs/GBRAIN_V0.md) -- Full product spec, all architecture decisions, every option considered
- [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface, capability matrix, how to add backends
- [SQLITE_ENGINE.md](docs/SQLITE_ENGINE.md) -- Complete SQLite engine plan with schema, FTS5, vector search options
- [GBRAIN_VERIFY.md](docs/GBRAIN_VERIFY.md) -- Installation verification runbook: schema, live sync, embeddings, brain-first lookup
**For agents:**
- **[GBRAIN_SKILLPACK.md](docs/GBRAIN_SKILLPACK.md)** -- **Start here.** Index of all patterns, skills, and integrations
- [Individual guides](docs/guides/) -- 17 standalone guides broken out from the skillpack
- [Getting Data In](docs/integrations/README.md) -- Integration recipes, credential setup, data flow patterns
- [GBRAIN_VERIFY.md](docs/GBRAIN_VERIFY.md) -- Installation verification runbook
**For humans:**
- [GBRAIN_RECOMMENDED_SCHEMA.md](docs/GBRAIN_RECOMMENDED_SCHEMA.md) -- Brain repo directory structure
- [Infrastructure Layer](docs/architecture/infra-layer.md) -- How import, chunking, embedding, and search work
- [Thin Harness, Fat Skills](docs/ethos/THIN_HARNESS_FAT_SKILLS.md) -- Architecture philosophy
- [Homebrew for Personal AI](docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md) -- Why markdown is code
**Reference:**
- [GBRAIN_V0.md](docs/GBRAIN_V0.md) -- Full product spec, all architecture decisions
- [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface, how to add backends
- [SQLITE_ENGINE.md](docs/SQLITE_ENGINE.md) -- SQLite engine plan (community PRs welcome)
## Contributing