feat: GBrain v0.7.0 — Integration Recipes + SKILLPACK Breakout (#39)

* docs: break SKILLPACK into 17 individual guides

The 1,281-line SKILLPACK monolith is now 17 individually linkable guides
in docs/guides/, organized by category: core patterns, data pipelines,
operations, search, and administration.

GBRAIN_SKILLPACK.md becomes a structured index with categorized tables
linking to each guide. The URL stays stable for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add integration guides, architecture docs, and ethos

New documentation directories:
- docs/integrations/ — "Getting Data In" landing page, credential gateway,
  meeting webhooks. Includes recipe format documentation.
- docs/architecture/ — Infrastructure layer doc (import, chunk, embed, search)
- docs/ethos/ — "Thin Harness, Fat Skills" essay with agent decision guide
- docs/designs/ — "Homebrew for Personal AI" 10-star vision document

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add gbrain integrations command + voice-to-brain recipe

New CLI command: gbrain integrations (list/show/status/doctor/stats/test)
- Standalone command, no database connection needed
- Uses gray-matter directly for recipe parsing (not parseMarkdown)
- --json flag on every subcommand for agent-parseable output
- Bare command shows senses/reflexes dashboard
- Health heartbeat via ~/.gbrain/integrations/<id>/heartbeat.jsonl

First recipe: recipes/twilio-voice-brain.md
- Phone calls create brain pages via Twilio + OpenAI Realtime
- Opinionated defaults: caller screening, brain-first lookup, quiet hours
- Outbound call smoke test (GBrain calls the user to prove it works)
- Validate-as-you-go credential testing
- Twilio signature validation for webhook security

Migration file for v0.7.0 with agent-readable changelog.
13 unit tests covering parseRecipe, CLI routing, and recipe validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Getting Data In to README, update CLAUDE.md and manifest

README: voice calls in intro bullet list, new "Getting Data In" section
with integration table (voice, email, X, calendar) and recipe philosophy.

CLAUDE.md: reference new files (integrations.ts, recipes/, docs/guides/,
docs/integrations/, docs/architecture/, docs/ethos/).

manifest.json: bump to v0.7.0, add recipes_dir field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: v0.7.0 CHANGELOG, TODOS, VERSION bump

CHANGELOG: v0.7.0 entry covering integration recipes, voice-to-brain,
gbrain integrations command, SKILLPACK breakout, and new documentation.

TODOS: 3 new items from CEO/DX reviews (constrained health_check DSL,
community recipe submission, always-on deployment recipes).

VERSION + package.json: bump to 0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite voice recipe with agent instructions and verified links

Major improvements to recipes/twilio-voice-brain.md:

- Agent preamble: explains WHY sequential execution matters (each step
  depends on the previous), defines 4 stop points where the agent MUST
  pause and verify, tells agent to never say "something went wrong"
  but instead explain the exact error and fix

- User actions are now specific: exact URLs for every credential
  (Twilio console, OpenAI API keys page, ngrok dashboard), what
  buttons to click, what fields to copy, common failure modes

- All URLs verified via web search against current 2026 documentation:
  Twilio SID/token at twilio.com/console, OpenAI keys at
  platform.openai.com/api-keys, ngrok token at
  dashboard.ngrok.com/get-started/your-authtoken

- Cost estimate corrected: OpenAI Realtime is $0.06/min input +
  $0.24/min output (was understated), total ~$20-22/mo for 100 min

- Validate-as-you-go: each credential tested immediately with exact
  curl commands, failure messages explain what went wrong and how to fix

- Smoke test flow: tells user exactly what to say, verifies ALL
  three outputs (messaging notification + brain page + search result)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add "Homebrew for Personal AI" essay (markdown is code)

New essay at docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md — the distribution
corollary to "Thin Harness, Fat Skills." Argues that markdown skill files
are simultaneously documentation, specification, package, and source code.
The agent is the package manager. The git repo is the app store.

Referenced from SKILLPACK index and CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite agent instructions as command language, promote skills

The OpenClaw/Hermes install block is now a drill sergeant, not a tour guide.
Every step is an imperative command with exact verification criteria and
explicit stop-on-failure behavior. No FYI, no suggestions, just rails.

Key changes:
- 11-step setup with STOP points after each step
- Exact user instructions for Supabase connection string (what to click,
  what NOT to give the agent, what the string looks like)
- "Verify: run X. You must see Y. If not: Z" after every step
- Skills table now links to both skill files AND guide docs
- Integration recipes table simplified (no "coming soon" placeholders)
- Docs section reorganized: for agents / for humans / reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 4 codex findings + add email-to-brain recipe

Codex review found 4 issues, all fixed:

1. getStatus() returned "configured" if ANY secret was set (e.g. just
   OPENAI_API_KEY). Now requires ALL required secrets before marking
   configured. Prevents false "configured" status and spurious doctor runs.

2. Twilio health check hit unauthenticated endpoint (always 401). Now
   uses authenticated curl with SID:token, matching the setup validation.

3. README anchor docs/GBRAIN_SKILLPACK.md#the-dream-cycle broken after
   SKILLPACK rewrite. Updated to point to docs/guides/cron-schedule.md.

4. Compiled binary can't find recipes/ via import.meta.dir. Added
   GBRAIN_RECIPES_DIR env var override + global bun install path fallback.

Also adds recipes/email-to-brain.md: Gmail deterministic collector pattern
with ClawVisor credential gateway, validate-as-you-go, agent instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add email, X, calendar, and meeting sync recipes

Four new integration recipes extracted from production wintermute patterns:

- recipes/email-to-brain.md: Gmail via ClawVisor, deterministic collector
  pattern (code pulls emails with baked-in links, agent does judgment),
  noise filtering, signature detection, digest generation

- recipes/x-to-brain.md: X API v2, timeline + mentions + keyword search,
  deletion detection (diffs previous run, verifies 404), engagement
  velocity tracking, rate limit awareness

- recipes/calendar-to-brain.md: Google Calendar via ClawVisor, historical
  backfill (years of data), daily markdown files with attendees + locations,
  attendee enrichment for brain pages

- recipes/meeting-sync.md: Circleback API, transcript import with speaker
  labels, attendee detection + filtering, entity propagation to people/
  company pages, action item extraction, idempotent by source_id

All recipes follow the same format: agent preamble with sequential execution
rules, validate-as-you-go credentials, exact URLs for API key setup,
stop-on-failure verification, and heartbeat logging.

Updated README, SKILLPACK index, and integrations landing page with all 5 recipes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Google OAuth as alternative to ClawVisor in email + calendar recipes

Both recipes now offer two auth options:
- Option A: ClawVisor (recommended, handles OAuth + token refresh)
- Option B: Google OAuth2 directly (no extra service, you manage tokens)

Option B includes step-by-step instructions for Google Cloud Console:
exact URLs, which buttons to click, which scopes to add, how to enable
the API, and the OAuth flow for token exchange.

This removes ClawVisor as a hard dependency for getting started.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add implementation guides with pseudocode and test suggestions

Every recipe now includes an "Implementation Guide" section with:

- Production-tested pseudocode the agent can follow to build each collector
- Edge cases and failure modes discovered in real deployment
- Non-obvious implementation details (why the 48h staleness heuristic,
  why Gmail links need authuser, why SSE responses need double-parsing)
- Test suggestions: what the agent should verify after setup

email-to-brain: noise filtering algorithm, signature detection patterns,
  Gmail link generation (authuser is critical), sent-mail dedup

x-to-brain: deletion detection with 3 heuristics (7-day, 48h staleness,
  API verification), engagement velocity thresholds (50 min for 2x, 100
  absolute jump), atomic writes, stdout contract, rate limit handling

calendar-to-brain: smart chunking (monthly for sparse years, weekly for
  dense), attendee filtering (rooms, groups, distros), merge-with-existing
  (only replace ## Calendar section), date/time parsing edge cases

meeting-sync: SSE double-JSON parsing, idempotency double-check (grep +
  filename), auto-tagging from meeting names, git commit after sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: 6 new guides from production patterns (wintermute extraction)

New guides extracted and generalized from production deployment:

- repo-architecture.md: Two-repo pattern (agent behavior vs world knowledge).
  Strict boundary rules, decision tree, hard rule: never write knowledge
  to the agent repo.

- sub-agent-routing.md: Model routing table by task type. Signal detector
  pattern (spawn Sonnet on every message). Research pipeline pattern
  (Opus plans, DeepSeek executes, Opus synthesizes). Cost optimization.

- skill-development.md: 5-step cycle (concept, prototype, evaluate, codify,
  cron). MECE discipline (no overlapping skills). Quality bar checklist.
  "If you ask twice, it should already be a skill."

- idea-capture.md: Originality distribution rating (0-100 across 4
  populations). Depth test ("could someone unfamiliar understand WHY?").
  Deep cross-linking mandate. Notability filtering.

- quiet-hours.md: Hold notifications 11pm-8am local time. Held messages
  directory pattern. Timezone-aware delivery. Morning briefing pickup.

- diligence-ingestion.md: 9-step pipeline for data room materials. Detection
  patterns (PDF filenames, spreadsheet tabs, user language). Index.md
  template with bull/bear case. Company page enrichment.

All PII scrubbed. Patterns generalized for any user.
SKILLPACK index updated with 6 new entries. CLAUDE.md references added.
All 37 SKILLPACK links verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: upgrade all guides to operational playbooks with pseudocode

Every guide now follows the playbook structure:
- Goal: one sentence, what this achieves
- What the User Gets: without this / with this
- Implementation: pseudocode with actual gbrain commands
- Tricky Spots: production-tested gotchas
- How to Verify: test steps the agent runs after setup

Guides upgraded (15 files):
- brain-agent-loop: on_message() loop with read/write/sync pseudocode
- brain-first-lookup: 4-step lookup cascade with exact commands
- brain-vs-memory: routing algorithm for 3 knowledge layers
- compiled-truth: page structure + rewrite vs append rules
- content-media: 3 ingest patterns (YouTube, social, PDFs)
- cron-schedule: full schedule table + dream cycle pseudocode
- enrichment-pipeline: 7-step protocol with tier classification
- entity-detection: spawn pattern + detection prompt + notability filter
- executive-assistant: 3 workflow algorithms (triage, prep, post-inbox)
- meeting-ingestion: 6-step transcript-to-brain flow
- operational-disciplines: 5 executable discipline blocks
- originals-folder: detection + exact-phrasing capture + cross-linking
- search-modes: decision tree for keyword vs hybrid vs direct
- source-attribution: citation format + hierarchy + conflict resolution
- Plus Goal/What User Gets headers on 6 newer guides

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add WebRTC to voice recipe + ngrok Hobby setup guide

Voice recipe updates:
- Added WebRTC endpoint (POST /session, GET /call, POST /tool) for
  browser-based calling with RNNoise noise suppression
- WebRTC pseudocode with the 4 non-obvious gotchas from production
  (voice under audio.output.voice, no turn_detection, no session.update
  on connect, trigger greeting via data channel)
- Recommend ngrok Hobby ($8/mo) for fixed domain instead of free tier
- Fixed domain means URLs never change, Twilio never breaks

New guide: docs/mcp/NGROK_SETUP.md
- How to set up ngrok Hobby for both MCP and voice agent
- Fixed domain setup, watchdog pattern, AI client configuration
- Claude Desktop requires Settings > Integrations (not JSON config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add dependency graph + ngrok-tunnel + credential-gateway recipes

Recipes now have real dependencies via the `requires` field:
- voice-to-brain requires ngrok-tunnel (needs public URL for Twilio)
- email-to-brain requires credential-gateway (needs Gmail access)
- calendar-to-brain requires credential-gateway (needs Calendar access)
- x-to-brain and meeting-sync are standalone (direct API keys)

Two new infrastructure recipes:
- ngrok-tunnel: fixed public URL for MCP + voice. Recommends Hobby
  ($8/mo) for a domain that never changes. Includes watchdog pattern.
- credential-gateway: secure Google service access via ClawVisor
  (recommended) or direct OAuth2. One setup, all Google recipes use it.

Moved ngrok from docs/mcp/ to recipes/ — it's shared infrastructure,
not MCP-specific.

README and integrations landing page show dependency chains.
When agent installs voice-to-brain, it sets up ngrok-tunnel first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add infra category, fix dashboard alignment, show dependencies

DX audit found two bugs in gbrain integrations dashboard:

1. Column alignment broken — IDs > 18 chars ran into descriptions
   with no space. Fixed: pad to 22 chars.

2. ngrok-tunnel and credential-gateway showed as SENSES but they're
   infrastructure. Added 'infra' category. Dashboard now shows three
   sections: INFRASTRUCTURE (set up first), SENSES, REFLEXES.

Dependencies are now shown inline: "AVAILABLE (needs credential-gateway)"

Also added 'requires' field to JSON output for agent consumption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add frontier model requirement disclaimer to README

GBrain's markdown-is-code approach requires models capable of
interpreting intent and implementing from architecture descriptions.
Tested with Claude Opus 4.6 and GPT-5.4 Thinking. Smaller models
will struggle with the recipe format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add PGLite → Supabase upgrade path to README

Clarify the database progression: start with PGLite (Postgres as WASM,
zero infrastructure, pgvector built in, nothing to install). Graduate
to Supabase or self-hosted Postgres when you need connection pooling,
concurrency, and remote MCP access from Claude Desktop, Cowork,
ChatGPT, Perplexity Computer, or any MCP-compatible agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: revert PGLite mention (coming in next branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: make all 23 guides consistent (Goal/Impl/Tricky/Verify)

Every guide now has exactly these sections in this order:
- ## Goal (one sentence)
- ## What the User Gets (without this / with this)
- ## Implementation (pseudocode with gbrain commands)
- ## Tricky Spots (3-5 numbered gotchas)
- ## How to Verify (3-5 numbered test steps)

11 guides restructured from non-standard headings:
- deterministic-collectors, live-sync, upgrades-auto-update (full rewrites)
- entity-detection, diligence-ingestion, idea-capture, quiet-hours,
  repo-architecture, skill-development, sub-agent-routing (restructured)

23/23 guides now pass consistency audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: restructure README around the #1 blocker (getting data in)

The README was leading with Postgres and database architecture. Most
users are stuck at step zero: "I have an agent but it doesn't know
anything about my life."

New structure:
1. The Problem — your agent doesn't know your life
2. Getting Data In — integration recipes, front and center
3. The Compounding Thesis — why this matters
4. How this happened — credibility, origin story
5. When you need Postgres — scale, not starting point

Postgres is de-emphasized from a full section to two paragraphs:
"You don't need Postgres to start" and "When you need Postgres"
(1,000+ files, remote MCP access, multiple AI clients).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move Install to top of README, remove duplicate section

Install now appears right after Getting Data In (line 38), not buried
at line 295. The user sees: Problem → Getting Data In → Install.

Removed the duplicate Install section (262 lines) that was lower in
the README. The agent instructions block, CLI quickstart, and all
content is now in the single Install section near the top.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move agent install block to first thing in README

"Start here: paste this into your agent" is now the first section,
right after the one-line pitch. No scrolling, no context, no preamble.
User opens the README, sees the paste block, copies it into OpenClaw
or Hermes, and the agent takes over.

Flow: pitch → paste block → Getting Data In → Compounding Thesis → origin story

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: compress install block from 11 steps to 5

The agent install block was 102 lines and 11 steps. Now it's 40 lines
and 5 steps. Same coverage, half the text.

Changes:
- Merged "prove keyword search" + "embed" + "prove hybrid search"
  into one SEARCH step (the user doesn't care about the intermediate)
- Merged skillpack, sync, auto-update, integrations, verification
  into one GO LIVE step with sub-items (post-install polish, not install)
- Shortened database instructions (one line instead of 5 sub-steps)
- Removed redundant preamble ("YOU MUST COMPLETE EVERY STEP" is now
  just "Do not skip steps. Verify each step.")

The 5 steps: INSTALL → DATABASE → IMPORT → SEARCH → GO LIVE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: gitignore all .env files, not just specific ones

CSO audit found .gitignore covered .env.testing and .env.production
but not bare .env. A user creating .env with database credentials
could accidentally commit it.

Fix: .env and .env.* are now gitignored. .env.*.example files are
explicitly un-ignored so templates remain tracked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: scrub PII from essay and recipe examples

- 510-MY-GARRY phone mnemonic → "Your Phone Number"
- "Garry → Authenticated Mode" → "Owner → Authenticated Mode"
- "Telegram" → "secure channel" in auth example
- @garrytan → @yourhandle in X recipe example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-10 23:39:06 -10:00
committed by GitHub
parent 8de04d3827
commit ce15062694
50 changed files with 7428 additions and 1494 deletions

recipes/email-to-brain.md Normal file

@@ -0,0 +1,333 @@
---
id: email-to-brain
name: Email-to-Brain
version: 0.7.0
description: Gmail messages flow into brain pages. Deterministic collector pulls emails, agent analyzes and enriches entities.
category: sense
requires: [credential-gateway]
secrets:
  - name: CLAWVISOR_URL
    description: ClawVisor gateway URL (Option A, recommended; handles OAuth for you)
    where: https://clawvisor.com (create an agent, activate Gmail service)
  - name: CLAWVISOR_AGENT_TOKEN
    description: ClawVisor agent token (Option A)
    where: https://clawvisor.com (agent settings, copy the agent token)
  - name: GOOGLE_CLIENT_ID
    description: Google OAuth2 client ID (Option B, direct Gmail API access)
    where: https://console.cloud.google.com/apis/credentials (create OAuth 2.0 Client ID)
  - name: GOOGLE_CLIENT_SECRET
    description: Google OAuth2 client secret (Option B)
    where: https://console.cloud.google.com/apis/credentials (same page as client ID)
health_checks:
  - "if [ -n \"$CLAWVISOR_URL\" ] && curl -sf \"$CLAWVISOR_URL/health\" > /dev/null; then echo 'ClawVisor: OK'; elif [ -n \"$GOOGLE_CLIENT_ID\" ]; then echo 'Google OAuth: configured'; else echo 'No email auth configured'; fi"
setup_time: 20 min
cost_estimate: "$0 (both options are free)"
---
# Email-to-Brain: Gmail Messages That Update Your Brain
Emails arrive. Brain pages get smarter. The agent reads your inbox, detects
entities, updates person and company pages, extracts action items, and files
everything with source attribution.
## IMPORTANT: Instructions for the Agent
**You are the installer.** Follow these steps precisely.
**The core pattern: code for data, LLMs for judgment.**
Email collection is split into two layers:
1. DETERMINISTIC: code pulls emails, generates Gmail links, detects noise/signatures.
   Code does this the same way on every run, so links are always correct and
   timestamps are always accurate.
2. LATENT: you (the agent) read the collected emails and make judgment calls.
   Who is important? What entities are mentioned? What action items exist?
**Do not try to pull emails yourself.** Use the collector script. It handles
pagination, deduplication, Gmail link generation, and noise filtering. If you
try to do this via raw API calls, you WILL forget links, miss emails, or break
pagination. The collector exists because LLMs kept failing at this.
**Why sequential execution matters:**
- Step 1 validates the credential gateway. Without it, nothing connects to Gmail.
- Step 2 sets up the collector. Without it, you have no emails to analyze.
- Step 3 does the first collection. Without data, Step 4 can't enrich.
- Step 4 is YOUR job: read the digest, update brain pages.
## Architecture
```
Gmail Account(s)
  ↓ (ClawVisor E2E encrypted gateway)
Email Collector (deterministic Node.js script)
  ↓ Outputs:
  ├── messages/{YYYY-MM-DD}.json   (structured email data)
  ├── digests/{YYYY-MM-DD}.md      (markdown digest for agent)
  └── state.json                   (pagination state, known IDs)
Agent reads digest
  ↓ Judgment calls:
  ├── Entity detection (people, companies mentioned)
  ├── Brain page updates (timeline entries, compiled truth)
  ├── Action item extraction
  └── Priority classification (urgent / normal / noise)
```
## Opinionated Defaults
**Noise filtering (deterministic, in collector):**
- Skip: noreply@, notifications@, calendar-notification@
- Flag: DocuSign, Dropbox Sign, HelloSign, PandaDoc (signatures needing action)
- Keep: everything else
**Email accounts:** Configure multiple accounts. Common setup:
- Work email (company domain)
- Personal email (gmail.com)
**Digest format:** Daily markdown with sections:
- Signatures pending (DocuSign etc. needing action)
- Messages to triage (real emails from real people)
- Noise (filtered, available if needed)
Every email gets a baked-in Gmail link: `[Open in Gmail](https://mail.google.com/mail/u/?authuser=ACCOUNT#inbox/MESSAGE_ID)` — these are generated by code, never by the LLM, so they are always correct.
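The defaults above can be sketched as one small pipeline: classify each email, then render the three digest sections with the link built in code. This is a minimal sketch, not the collector's actual implementation. The record shape (`id`, `account`, `from`, `subject`), the `classify` and `renderDigest` names, the digest heading text, and the choice to check signatures before noise are all assumptions made for illustration. The noise and signature predicates are passed in rather than defined here.

```javascript
// Digest sections, in the order the digest lists them.
const SECTIONS = [
  ['signature', 'Signatures pending'],
  ['triage', 'Messages to triage'],
  ['noise', 'Noise'],
];

// Assumption of this sketch: the signature check runs first, so a signing
// request sent from a no-reply address is flagged instead of filed as noise.
function classify(email, { isNoise, isSignature }) {
  if (isSignature(email)) return 'signature';
  if (isNoise(email)) return 'noise';
  return 'triage';
}

// Built by code, never by the LLM. The account is URL-encoded so characters
// such as '+' in an address survive the query string.
function gmailLink(messageId, account) {
  return `https://mail.google.com/mail/u/?authuser=${encodeURIComponent(account)}#inbox/${messageId}`;
}

// Hypothetical record shape: { id, account, from, subject }
function renderDigest(date, emails, predicates) {
  const lines = [`# Email digest ${date}`, ''];
  for (const [category, title] of SECTIONS) {
    const items = emails.filter((e) => classify(e, predicates) === category);
    lines.push(`## ${title} (${items.length})`);
    for (const e of items) {
      lines.push(`- ${e.from}: ${e.subject} [Open in Gmail](${gmailLink(e.id, e.account)})`);
    }
    lines.push('');
  }
  return lines.join('\n');
}
```

Because the link is assembled from the stored `id` and `account`, the agent only ever copies it from the digest and never has to reconstruct it.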
## Prerequisites
1. **GBrain installed and configured** (`gbrain doctor` passes)
2. **Node.js 18+** (for the collector script)
3. **Gmail access** via one of:
   - ClawVisor (recommended: E2E encrypted credential gateway)
   - Google OAuth credentials (direct API access)
   - Hermes Gateway (built-in Gmail connector)
## Setup Flow
### Step 1: Validate Credential Gateway
Ask the user: "How do you access Gmail programmatically? Options:
1. ClawVisor (recommended, handles OAuth and encryption)
2. Google OAuth credentials (you manage tokens yourself)
3. Hermes Gateway (if you're using Hermes Agent)"
#### Option A: ClawVisor (recommended)
Tell the user:
"I need your ClawVisor URL and agent token.
1. Go to https://clawvisor.com
2. Create an agent (or use an existing one)
3. Activate the Gmail service
4. Create a standing task with purpose: 'Full executive assistant email management
   including inbox triage, searching by any criteria, reading emails, tracking threads'
   IMPORTANT: Be EXPANSIVE in the task purpose. Narrow purposes like 'email triage'
   will cause legitimate requests to fail verification.
5. Copy the gateway URL and agent token"
Validate:
```bash
curl -sf "$CLAWVISOR_URL/health" && echo "PASS: ClawVisor reachable" || echo "FAIL"
```
**STOP until ClawVisor validates.**
#### Option B: Google OAuth2 directly
Tell the user:
"I need Google OAuth2 credentials for Gmail access. Here's how:
1. Go to https://console.cloud.google.com/apis/credentials
   (create a Google Cloud project if you don't have one)
2. Click **'+ CREATE CREDENTIALS'** > **'OAuth client ID'**
3. If prompted, configure the OAuth consent screen:
   - User type: **External** (or Internal for Google Workspace)
   - App name: 'GBrain Email' (anything works)
   - Scopes: add **'Gmail API .../auth/gmail.readonly'**
   - Test users: add your own email address
4. Create the OAuth client ID:
   - Application type: **Desktop app**
   - Name: 'GBrain'
5. Copy the **Client ID** and **Client Secret**
6. Also enable the Gmail API:
   Go to https://console.cloud.google.com/apis/library/gmail.googleapis.com
   Click **'Enable'**"
Validate:
```bash
[ -n "$GOOGLE_CLIENT_ID" ] && [ -n "$GOOGLE_CLIENT_SECRET" ] \
&& echo "PASS: Google OAuth credentials set" \
|| echo "FAIL: Missing GOOGLE_CLIENT_ID or GOOGLE_CLIENT_SECRET"
```
Then run the OAuth flow to get tokens:
```bash
# The collector script handles the OAuth flow:
# 1. Opens browser to Google consent URL with gmail.readonly scope
# 2. User grants access
# 3. Script receives auth code, exchanges for access + refresh token
# 4. Stores tokens in ~/.gbrain/google-tokens.json
# 5. Auto-refreshes on expiry
```
**STOP until OAuth flow completes and tokens are stored.**
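The OAuth flow described in the comments above could look roughly like this in the collector. This is a sketch under stated assumptions: the function names and the loopback redirect URI in the usage note are illustrative, while the two endpoints and the `gmail.readonly` scope are Google's published OAuth 2.0 values. It relies on the global `fetch` that ships with Node.js 18+.

```javascript
const AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/v2/auth';
const TOKEN_ENDPOINT = 'https://oauth2.googleapis.com/token';
const GMAIL_READONLY = 'https://www.googleapis.com/auth/gmail.readonly';

// Step 1: the URL the user opens in a browser to grant access.
function buildConsentUrl(clientId, redirectUri) {
  const url = new URL(AUTH_ENDPOINT);
  url.search = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: 'code',
    scope: GMAIL_READONLY,
    access_type: 'offline', // ask for a refresh token
    prompt: 'consent',      // show the consent screen so a refresh token is issued
  }).toString();
  return url.toString();
}

// Step 3: trade the auth code for access + refresh tokens.
async function exchangeCode(code, clientId, clientSecret, redirectUri) {
  const res = await fetch(TOKEN_ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      code,
      client_id: clientId,
      client_secret: clientSecret,
      redirect_uri: redirectUri,
      grant_type: 'authorization_code',
    }),
  });
  if (!res.ok) throw new Error(`Token exchange failed: HTTP ${res.status}`);
  return res.json(); // includes access_token, refresh_token, expires_in
}
```

For a Desktop app client, a loopback address such as `http://127.0.0.1:8085` (the port is an arbitrary example) is a typical redirect URI; the collector would listen there for the auth code and then write the returned tokens to `~/.gbrain/google-tokens.json`.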
### Step 2: Set Up the Email Collector
Create the collector directory and script:
```bash
mkdir -p email-collector/data/{messages,digests}
cd email-collector
npm init -y
```
The collector script needs these capabilities:
1. **collect** — pull emails from Gmail via credential gateway, deduplicate by message ID, store as JSON with Gmail links baked in
2. **digest** — generate a markdown digest from collected emails, grouped by: signatures pending, messages to triage, noise
3. **state tracking** — remember last collection timestamp and known message IDs to avoid re-processing
Key design rules for the collector:
- Gmail links are generated by CODE, not by the LLM. Format: `[Open in Gmail](https://mail.google.com/mail/u/?authuser=ACCOUNT#inbox/MESSAGE_ID)`
- Noise filtering is deterministic: noreply, notifications, calendar invites
- Signature detection uses known patterns: DocuSign envelope, Dropbox Sign, HelloSign, PandaDoc
- All state persisted to `data/state.json` (last collect timestamp, known message IDs)
- Output is structured JSON (machine-readable) AND markdown digest (agent-readable)
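To make "Gmail links baked in" concrete, here is one way a record could be built at collection time. It is a sketch that assumes messages arrive in the Gmail API message shape (`id`, `threadId`, `internalDate` as epoch milliseconds in a string, and `payload.headers` as a list of `{ name, value }`); a gateway may return a different shape. The `buildRecord` name and the output field names are illustrative.

```javascript
// Look up a header by name; Gmail returns headers as [{ name, value }].
function header(msg, name) {
  const headers = (msg.payload && msg.payload.headers) || [];
  const hit = headers.find((h) => h.name.toLowerCase() === name.toLowerCase());
  return hit ? hit.value : '';
}

// Hypothetical record builder. `account` is the mailbox address the message
// was pulled from; it is stored on every record so the link opens the right one.
function buildRecord(msg, account) {
  const authuser = encodeURIComponent(account);
  return {
    id: msg.id,
    threadId: msg.threadId,
    account,
    from: header(msg, 'From'),
    subject: header(msg, 'Subject'),
    date: new Date(Number(msg.internalDate)).toISOString(),
    link: `[Open in Gmail](https://mail.google.com/mail/u/?authuser=${authuser}#inbox/${msg.id})`,
  };
}
```

Writing the finished markdown link into the JSON means the digest step and the agent both reuse it verbatim.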
### Step 3: Run First Collection
```bash
node email-collector.mjs collect
node email-collector.mjs digest
```
Verify: `ls data/digests/` should show today's digest file.
Read the digest. Confirm it contains real emails with working Gmail links.
### Step 4: Enrich Brain Pages
This is YOUR job (the agent). Read the digest. For each email:
1. **Detect entities**: who sent it? Who is mentioned? What companies?
2. **Check the brain**: `gbrain search "sender name"` — do we have a page?
3. **Update brain pages**: if sender has a brain page, append a timeline entry:
   `- YYYY-MM-DD | Email from {sender}: {subject} [Source: Gmail, {date}]`
4. **Create new pages**: if sender is notable and has no page, create one
5. **Extract action items**: if the email requires a response or action, log it
6. **Sync**: run `gbrain sync --no-pull --no-embed` to index changes
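The timeline entry format from step 3 is mechanical, so it can be produced by a helper rather than typed out each time. A minimal sketch, assuming a record with an ISO `date` string plus `from` and `subject` fields; the `timelineEntry` name is hypothetical.

```javascript
// Formats one timeline line in the shape shown in step 3.
function timelineEntry(record) {
  const day = record.date.slice(0, 10); // YYYY-MM-DD from an ISO timestamp
  const subject = record.subject || '(no subject)';
  return `- ${day} | Email from ${record.from}: ${subject} [Source: Gmail, ${day}]`;
}
```

Deciding whether the entry belongs on a page at all remains a judgment call for the agent; only the formatting is fixed.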
### Step 5: Set Up Cron
The collector should run every 30 minutes:
```bash
*/30 * * * * cd /path/to/email-collector && node email-collector.mjs collect && node email-collector.mjs digest
```
The agent should read the digest on a schedule (e.g., 3x/day: 9 AM, 12 PM, 3 PM)
and run the enrichment flow from Step 4.
### Step 6: Log Setup Completion
```bash
mkdir -p ~/.gbrain/integrations/email-to-brain
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok"}' >> ~/.gbrain/integrations/email-to-brain/heartbeat.jsonl
```
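If the collector should also log heartbeats from Node (for example after each collect run), the same line format can be produced in code. A minimal sketch; the `heartbeatLine` helper is hypothetical, and `setup_complete` is the only event name this recipe defines.

```javascript
const SOURCE_VERSION = '0.7.0'; // recipe version from the frontmatter

// Returns one JSONL line with the same keys as the shell command above.
function heartbeatLine(event, status, now = new Date()) {
  // Match the shell timestamp: UTC, second precision, trailing Z.
  const ts = now.toISOString().replace(/\.\d{3}Z$/, 'Z');
  return JSON.stringify({ ts, event, source_version: SOURCE_VERSION, status });
}
```

The line can then be appended to `~/.gbrain/integrations/email-to-brain/heartbeat.jsonl` with `fs.appendFileSync(path, heartbeatLine('setup_complete', 'ok') + '\n')`.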
## Implementation Guide
These are production-tested patterns. Follow them exactly.
### Noise Filtering (Deterministic)
```js
// Substrings that mark automated senders.
const NOISE_SENDERS = ['noreply', 'no-reply', 'notifications@', 'calendar-notification',
                       'mailer-daemon', 'postmaster', 'donotreply'];

function isNoise(email) {
  const from = (email.from || '').toLowerCase(); // tolerate a missing From header
  return NOISE_SENDERS.some((p) => from.includes(p)); // substring match
}
```
Simple substring matching, not regex. `notifications@slack.com` matches because
`notifications@` is in the pattern list. Order doesn't matter.
### Signature Detection
```js
// Case-insensitive patterns for e-signature services and their notifications.
const SIGNATURE_PATTERNS = [
  /docusign/i, /dropbox sign/i, /hellosign/i, /pandadoc/i,
  /please sign/i, /signature needed/i, /ready for your signature/i,
  /everyone has signed/i, /you just signed/i,
];

function isSignature(email) {
  const subject = email.subject || '';
  const from = email.from || '';
  return SIGNATURE_PATTERNS.some((p) => p.test(subject) || p.test(from));
}
```
Test BOTH subject AND from. Signature requests come from services that have
"docusign" in the sender address, not just the subject.
### Gmail Link Generation (CRITICAL)
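```js
function gmailLink(messageId, authuser) {
  // Encode the account so characters like '+' survive the query string.
  return `https://mail.google.com/mail/u/?authuser=${encodeURIComponent(authuser)}#inbox/${messageId}`;
}
```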
The `authuser` parameter is CRITICAL. Without it, the link opens in the default
Gmail account, not the right one. Each email record stores its account separately.
Generate these in CODE, never by the LLM. Links must be 100% reliable.
### Deduplication
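```js
// gmail.list, loadState, saveState, and buildRecord are stand-ins for the
// collector's own client and helpers, passed in rather than assumed.
async function collect({ gmail, accounts, loadState, saveState, buildRecord }) {
  const state = loadState();
  // Hours since the last run, rounded up so the window never shrinks to 0h.
  const hoursSince = state.lastCollect
    ? Math.max(1, Math.ceil((Date.now() - Date.parse(state.lastCollect)) / 3600000))
    : 0;
  const since = hoursSince ? `newer_than:${hoursSince}h` : 'newer_than:1d';
  for (const account of accounts) {
    const inbox = await gmail.list(account, { query: since, max: 50 });
    for (const msg of inbox) {
      if (msg.id in state.knownMessageIds) continue; // already seen
      state.knownMessageIds[msg.id] = buildRecord(msg, account.email);
    }
    // ALSO pull sent mail to detect replies
    const sent = await gmail.list(account, { query: `from:${account.email} ${since}`, max: 30 });
    for (const msg of sent) {
      state.knownMessageIds[msg.id] = { is_sent: true };
    }
  }
  state.lastCollect = new Date().toISOString();
  saveState(state);
}
```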
**Why sent mail matters:** Without it, the digest shows "awaiting response" on
threads you already replied to. Sent mail acts as a negative filter.
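The negative filter can be sketched as a pass over the collected records. This version assumes sent-mail records keep their `threadId` and an ISO `date` alongside `is_sent` (more than the bare `{is_sent: true}` marker stored above), and the `awaitingResponse` name is hypothetical.

```javascript
// Hypothetical record shape: { id, threadId, date (ISO string), is_sent }
function awaitingResponse(records) {
  // Latest sent timestamp per thread. ISO strings in the same format
  // compare correctly as plain strings.
  const lastSent = new Map();
  for (const r of records) {
    if (!r.is_sent) continue;
    const prev = lastSent.get(r.threadId);
    if (!prev || r.date > prev) lastSent.set(r.threadId, r.date);
  }
  // An inbound message still awaits a reply if nothing was sent on its
  // thread after it arrived.
  return records.filter((r) => {
    if (r.is_sent) return false;
    const sentAt = lastSent.get(r.threadId);
    return !sentAt || sentAt < r.date;
  });
}
```

Comparing timestamps, rather than only checking that a thread contains sent mail, keeps a thread in the digest when the other party writes again after your reply.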
### What the Agent Should Test After Setup
1. **Noise filtering:** Send a test email from `noreply@test.com`. Run collect.
   Verify it appears in the noise section, not the triage section.
2. **Gmail links:** Click a link from the digest. Verify it opens the correct
   account (not the default one).
3. **Deduplication:** Run collect twice within a minute. Verify no duplicate messages.
4. **Sent mail:** Reply to an email manually. Run collect. Verify the thread is
   marked as replied-to in the digest.
## Cost Estimate
| Component | Monthly Cost |
|-----------|-------------|
| ClawVisor (free tier) | $0 |
| Gmail API | $0 (within free quota) |
| **Total** | **$0** |
## Troubleshooting
**No emails collected:**
- Check ClawVisor health: `curl $CLAWVISOR_URL/health`
- Check standing task is active and has Gmail service enabled
- Check task purpose is expansive enough (narrow purposes block requests)
**Gmail links don't work:**
- Verify the `authuser` parameter matches the account email
- Gmail links require being logged into the correct Google account
**Digest is empty but collection ran:**
- Check `data/messages/` for JSON files
- All emails might be filtered as noise — check noise filtering rules