gbrain/recipes/email-to-brain.md
Garry Tan f82978d38d security: fix wave 2 — 5 vulns + typed health check DSL (v0.9.3) (#95)
2026-04-13 07:49:13 -10:00


---
id: email-to-brain
name: Email-to-Brain
version: 0.7.0
description: Gmail messages flow into brain pages. Deterministic collector pulls emails, agent analyzes and enriches entities.
category: sense
requires:
  - credential-gateway
secrets:
  - name: CLAWVISOR_URL
    description: "ClawVisor gateway URL (Option A, recommended; handles OAuth for you)"
    where: "https://clawvisor.com (create an agent, activate Gmail service)"
  - name: CLAWVISOR_AGENT_TOKEN
    description: "ClawVisor agent token (Option A)"
    where: "https://clawvisor.com (agent settings, copy the agent token)"
  - name: GOOGLE_CLIENT_ID
    description: "Google OAuth2 client ID (Option B: direct Gmail API access)"
    where: "https://console.cloud.google.com/apis/credentials (create OAuth 2.0 Client ID)"
  - name: GOOGLE_CLIENT_SECRET
    description: "Google OAuth2 client secret (Option B)"
    where: "https://console.cloud.google.com/apis/credentials (same page as client ID)"
health_checks:
  - type: any_of
    label: Auth provider
    checks:
      - type: http
        url: $CLAWVISOR_URL/health
        label: ClawVisor
      - type: env_exists
        name: GOOGLE_CLIENT_ID
        label: Google OAuth
setup_time: 20 min
cost_estimate: "$0 (both options are free)"
---

Email-to-Brain: Gmail Messages That Update Your Brain

Emails arrive. Brain pages get smarter. The agent reads your inbox, detects entities, updates person and company pages, extracts action items, and files everything with source attribution.

IMPORTANT: Instructions for the Agent

You are the installer. Follow these steps precisely.

The core pattern: code for data, LLMs for judgment. Email collection is split into two layers:

  1. DETERMINISTIC: code pulls emails, generates Gmail links, detects noise/signatures. This never fails. Links are always correct. Timestamps are always accurate.
  2. LATENT: you (the agent) read the collected emails and make judgment calls. Who is important? What entities are mentioned? What action items exist?

Do not try to pull emails yourself. Use the collector script. It handles pagination, deduplication, Gmail link generation, and noise filtering. If you try to do this via raw API calls, you WILL forget links, miss emails, or break pagination. The collector exists because LLMs kept failing at this.

Why sequential execution matters:

  • Step 1 validates the credential gateway. Without it, nothing connects to Gmail.
  • Step 2 sets up the collector. Without it, you have no emails to analyze.
  • Step 3 does the first collection. Without data, Step 4 can't enrich.
  • Step 4 is YOUR job: read the digest, update brain pages.

Architecture

Gmail Account(s)
  ↓ (ClawVisor E2E encrypted gateway)
Email Collector (deterministic Node.js script)
  ↓ Outputs:
  ├── messages/{YYYY-MM-DD}.json     (structured email data)
  ├── digests/{YYYY-MM-DD}.md        (markdown digest for agent)
  └── state.json                     (pagination state, known IDs)
  ↓
Agent reads digest
  ↓ Judgment calls:
  ├── Entity detection (people, companies mentioned)
  ├── Brain page updates (timeline entries, compiled truth)
  ├── Action item extraction
  └── Priority classification (urgent / normal / noise)

Opinionated Defaults

Noise filtering (deterministic, in collector):

  • Skip: noreply@, notifications@, calendar-notification@
  • Flag: DocuSign, Dropbox Sign, HelloSign, PandaDoc (signatures needing action)
  • Keep: everything else
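A minimal JavaScript sketch of how the Skip / Flag / Keep rules could combine into one bucket per email. The `classify` name, the bucket labels, and the trimmed pattern lists (taken from the Implementation Guide below) are illustrative assumptions, not the recipe's code. It checks signatures before noise, on the assumption that some signing services send from no-reply style addresses; with the opposite order, such a request would be filtered out as noise.

```javascript
// Sketch only: bucket names and rule order are assumptions, not the recipe's code.
const NOISE_SENDERS = ['noreply', 'no-reply', 'notifications@', 'calendar-notification',
  'mailer-daemon', 'postmaster', 'donotreply'];
const SIGNATURE_SERVICES = [/docusign/i, /dropbox sign/i, /hellosign/i, /pandadoc/i];

function classify(email) {
  const from = (email.from || '').toLowerCase();
  const subject = email.subject || '';
  // Flag first, so a signing request from a no-reply address is not dropped as noise.
  if (SIGNATURE_SERVICES.some((p) => p.test(from) || p.test(subject))) return 'signature';
  // Skip: plain substring match against automated senders.
  if (NOISE_SENDERS.some((p) => from.includes(p))) return 'noise';
  // Keep: everything else is a real message to triage.
  return 'triage';
}

console.log(classify({ from: 'noreply@esign.example.com', subject: 'Your DocuSign envelope is ready' }));
// signature
```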

Email accounts: Configure multiple accounts. Common setup:

  • Work email (company domain)
  • Personal email (gmail.com)

Digest format: Daily markdown with sections:

  • Signatures pending (DocuSign etc. needing action)
  • Messages to triage (real emails from real people)
  • Noise (filtered, available if needed)

Every email gets a baked-in Gmail link: [Open in Gmail](https://mail.google.com/mail/u/?authuser=ACCOUNT#inbox/MESSAGE_ID) — these are generated by code, never by the LLM, so they are always correct.
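A sketch of how the collector's digest step could render the three sections, assuming each record already carries a `category` (`signature`, `triage`, or `noise`) and a code-generated `link`. The `renderDigest` function and the record field names are hypothetical; only the section names and the `[Open in Gmail](...)` link text come from this recipe.

```javascript
// Sketch only: record fields (category, date, from, subject, link) are assumptions.
function renderDigest(day, records) {
  const sections = [
    ['Signatures pending', 'signature'],
    ['Messages to triage', 'triage'],
    ['Noise', 'noise'],
  ];
  const lines = [`# Email digest ${day}`, ''];
  for (const [title, category] of sections) {
    const items = records.filter((r) => r.category === category);
    lines.push(`## ${title} (${items.length})`, '');
    for (const r of items) {
      // The link is copied from the record; it was built by code, never by the LLM.
      lines.push(`- ${r.date} | ${r.from}: ${r.subject} [Open in Gmail](${r.link})`);
    }
    lines.push('');
  }
  return lines.join('\n');
}
```

Keeping empty sections (with a count of 0) makes it obvious to the agent that a section was checked and had nothing in it.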

Prerequisites

  1. GBrain installed and configured (gbrain doctor passes)
  2. Node.js 18+ (for the collector script)
  3. Gmail access via one of:
    • ClawVisor (recommended: E2E encrypted credential gateway)
    • Google OAuth credentials (direct API access)
    • Hermes Gateway (built-in Gmail connector)

Setup Flow

Step 1: Validate Credential Gateway

Ask the user: "How do you access Gmail programmatically? Options:

  1. ClawVisor (recommended, handles OAuth and encryption)
  2. Google OAuth credentials (you manage tokens yourself)
  3. Hermes Gateway (if you're using Hermes Agent)"

Option A: ClawVisor

Tell the user: "I need your ClawVisor URL and agent token.

  1. Go to https://clawvisor.com
  2. Create an agent (or use existing)
  3. Activate the Gmail service
  4. Create a standing task with purpose: 'Full executive assistant email management including inbox triage, searching by any criteria, reading emails, tracking threads'. IMPORTANT: Be EXPANSIVE in the task purpose. Narrow purposes like 'email triage' will cause legitimate requests to fail verification.
  5. Copy the gateway URL and agent token"

Validate:

curl -sf "$CLAWVISOR_URL/health" && echo "PASS: ClawVisor reachable" || echo "FAIL"

STOP until ClawVisor validates.

Option B: Google OAuth2 directly

Tell the user: "I need Google OAuth2 credentials for Gmail access. Here's how:

  1. Go to https://console.cloud.google.com/apis/credentials (create a Google Cloud project if you don't have one)
  2. Click '+ CREATE CREDENTIALS' > 'OAuth client ID'
  3. If prompted, configure the OAuth consent screen:
    • User type: External (or Internal for Google Workspace)
    • App name: 'GBrain Email' (anything works)
    • Scopes: add 'Gmail API .../auth/gmail.readonly'
    • Test users: add your own email address
  4. Create the OAuth client ID:
    • Application type: Desktop app
    • Name: 'GBrain'
  5. Copy the Client ID and Client Secret
  6. Also enable the Gmail API: go to https://console.cloud.google.com/apis/library/gmail.googleapis.com and click 'Enable'."

Validate:

[ -n "$GOOGLE_CLIENT_ID" ] && [ -n "$GOOGLE_CLIENT_SECRET" ] \
  && echo "PASS: Google OAuth credentials set" \
  || echo "FAIL: Missing GOOGLE_CLIENT_ID or GOOGLE_CLIENT_SECRET"

Then run the OAuth flow to get tokens:

# The collector script handles the OAuth flow:
# 1. Opens browser to Google consent URL with gmail.readonly scope
# 2. User grants access
# 3. Script receives auth code, exchanges for access + refresh token
# 4. Stores tokens in ~/.gbrain/google-tokens.json
# 5. Auto-refreshes on expiry
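The comments above describe the standard OAuth 2.0 authorization-code flow for installed apps. A sketch of its HTTP pieces, using Google's published authorization and token endpoints: building the consent URL, exchanging the code for tokens, and refreshing. It assumes Node 18+ (global `fetch`) and a loopback redirect URI such as `http://127.0.0.1:PORT`; the function names, the 60-second refresh margin, and the `expires_at` field are illustrative, and the real collector may store tokens differently in `~/.gbrain/google-tokens.json`.

```javascript
// Sketch only: assumes Node 18+ (global fetch) and a Desktop app client with a
// loopback redirect URI. Names and the stored token shape are illustrative.
const AUTH_URL = 'https://accounts.google.com/o/oauth2/v2/auth';
const TOKEN_URL = 'https://oauth2.googleapis.com/token';
const SCOPE = 'https://www.googleapis.com/auth/gmail.readonly';

// Step 1: the URL to open in the user's browser.
function consentUrl(clientId, redirectUri) {
  const params = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: 'code',
    scope: SCOPE,
    access_type: 'offline', // ask for a refresh token
    prompt: 'consent',      // return a refresh token even if the user consented before
  });
  return `${AUTH_URL}?${params}`;
}

// Shared POST to the token endpoint; adds an absolute expiry timestamp.
async function postToken(form) {
  const res = await fetch(TOKEN_URL, { method: 'POST', body: new URLSearchParams(form) });
  if (!res.ok) throw new Error(`token endpoint returned ${res.status}`);
  const t = await res.json();
  return { ...t, expires_at: Date.now() + t.expires_in * 1000 };
}

// Step 3: exchange the auth code for access + refresh tokens.
function exchangeCode(code, clientId, clientSecret, redirectUri) {
  return postToken({ code, client_id: clientId, client_secret: clientSecret,
    redirect_uri: redirectUri, grant_type: 'authorization_code' });
}

// Step 5: refresh shortly before expiry (the 60 s margin is an arbitrary choice).
function needsRefresh(tokens, now = Date.now()) {
  return !tokens.expires_at || now >= tokens.expires_at - 60_000;
}

async function refresh(tokens, clientId, clientSecret) {
  const fresh = await postToken({ refresh_token: tokens.refresh_token, client_id: clientId,
    client_secret: clientSecret, grant_type: 'refresh_token' });
  // Refresh responses usually omit refresh_token, so keep the existing one.
  return { ...tokens, ...fresh, refresh_token: fresh.refresh_token || tokens.refresh_token };
}
```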

STOP until OAuth flow completes and tokens are stored.
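Both validation paths mirror the recipe's typed `health_checks` metadata: an `any_of` that passes if an `http` check against `$CLAWVISOR_URL/health` succeeds or if `GOOGLE_CLIENT_ID` is set (`env_exists`). A sketch of how such checks might be evaluated, assuming Node 18+; `runCheck`, `expandEnv`, and the 5-second timeout are hypothetical, and gbrain's own evaluator may behave differently.

```javascript
// Sketch only: a hypothetical evaluator for typed health checks (http, env_exists, any_of).
// Assumes Node 18+ for global fetch and AbortSignal.timeout.

// Replace $VAR references with values from env (missing vars become empty strings).
function expandEnv(str, env = process.env) {
  return str.replace(/\$([A-Z_][A-Z0-9_]*)/g, (_, name) => env[name] ?? '');
}

async function runCheck(check, env = process.env) {
  switch (check.type) {
    case 'env_exists':
      return Boolean(env[check.name]);
    case 'http': {
      try {
        const res = await fetch(expandEnv(check.url, env), { signal: AbortSignal.timeout(5000) });
        return res.ok;
      } catch {
        return false; // unreachable host, invalid URL (e.g. unset variable), or timeout
      }
    }
    case 'any_of':
      // Passes as soon as one sub-check passes.
      for (const sub of check.checks) {
        if (await runCheck(sub, env)) return true;
      }
      return false;
    default:
      throw new Error(`unknown health check type: ${check.type}`);
  }
}

const authProvider = {
  type: 'any_of',
  label: 'Auth provider',
  checks: [
    { type: 'http', url: '$CLAWVISOR_URL/health', label: 'ClawVisor' },
    { type: 'env_exists', name: 'GOOGLE_CLIENT_ID', label: 'Google OAuth' },
  ],
};

runCheck(authProvider).then((ok) => console.log(ok ? 'PASS: Auth provider' : 'FAIL: Auth provider'));
```

Typed checks like these never pass a recipe string to a shell, which is why they are safer than raw shell health checks.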

Step 2: Set Up the Email Collector

Create the collector directory and script:

mkdir -p email-collector/data/{messages,digests}
cd email-collector
npm init -y

The collector script needs these capabilities:

  1. collect — pull emails from Gmail via credential gateway, deduplicate by message ID, store as JSON with Gmail links baked in
  2. digest — generate a markdown digest from collected emails, grouped by: signatures pending, messages to triage, noise
  3. state tracking — remember last collection timestamp and known message IDs to avoid re-processing
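One possible shape for the script's entry point, so that `node email-collector.mjs collect` and `node email-collector.mjs digest` dispatch to separate handlers that write the date-stamped files from the architecture diagram. The handler bodies are placeholders, and `outputPaths`, `commands`, and `main` are hypothetical names. The sketch keys files by UTC date; if the file name should match the user's local "today" (which the Step 3 check expects), build the key from local date parts instead.

```javascript
// Sketch only: handler bodies are placeholders for the collect/digest logic.

// Date-stamped output paths from the architecture diagram. Uses the UTC date (assumption).
function outputPaths(dataDir, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD
  return {
    messages: `${dataDir}/messages/${day}.json`,
    digest: `${dataDir}/digests/${day}.md`,
    state: `${dataDir}/state.json`,
  };
}

// A Map avoids accidentally dispatching to inherited object keys like "toString".
const commands = new Map([
  ['collect', async (paths) => {
    // pull from Gmail, skip known IDs, write paths.messages and paths.state
    return paths.messages;
  }],
  ['digest', async (paths) => {
    // read paths.messages, group into sections, write paths.digest
    return paths.digest;
  }],
]);

async function main(argv, now = new Date()) {
  const run = commands.get(argv[0]);
  if (!run) {
    console.error('usage: node email-collector.mjs <collect|digest>');
    return 2; // exit code for bad usage
  }
  await run(outputPaths('data', now));
  return 0;
}

// Entry point in email-collector.mjs (not executed in this sketch):
// main(process.argv.slice(2)).then((code) => { process.exitCode = code; });
```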

Key design rules for the collector:

  • Gmail links are generated by CODE, not by the LLM. Format: [Open in Gmail](https://mail.google.com/mail/u/?authuser=ACCOUNT#inbox/MESSAGE_ID)
  • Noise filtering is deterministic: noreply, notifications, calendar invites
  • Signature detection uses known patterns: DocuSign envelope, Dropbox Sign, HelloSign, PandaDoc
  • All state persisted to data/state.json (last collect timestamp, known message IDs)
  • Output is structured JSON (machine-readable) AND markdown digest (agent-readable)

Step 3: Run First Collection

node email-collector.mjs collect
node email-collector.mjs digest

Verify: ls data/digests/ should show today's digest file. Read the digest. Confirm it contains real emails with working Gmail links.
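Part of this check can be automated. A sketch that scans a digest for `[Open in Gmail](...)` links, checks each against the link format this recipe uses, and flags any `authuser` that is not one of the configured accounts. `checkDigestLinks` and its return shape are hypothetical; it validates format only, so clicking a link is still the real test.

```javascript
// Sketch only: the regex encodes this recipe's link format.
const LINK_RE =
  /\[Open in Gmail\]\((https:\/\/mail\.google\.com\/mail\/u\/\?authuser=([^#)]+)#inbox\/([A-Za-z0-9]+))\)/g;

function checkDigestLinks(markdown, expectedAccounts) {
  const problems = [];
  let count = 0;
  for (const m of markdown.matchAll(LINK_RE)) {
    count++;
    const account = decodeURIComponent(m[2]);
    if (!expectedAccounts.includes(account)) {
      problems.push(`unexpected authuser ${account} in ${m[1]}`);
    }
  }
  // Any "Open in Gmail" link the regex did not match is malformed (e.g. missing authuser).
  const total = (markdown.match(/\[Open in Gmail\]\(/g) || []).length;
  if (total !== count) problems.push(`${total - count} link(s) not in the expected format`);
  if (total === 0) problems.push('no Gmail links found');
  return { count, problems };
}

// Usage (file name and accounts are examples):
// const md = fs.readFileSync('data/digests/2026-04-13.md', 'utf8');
// console.log(checkDigestLinks(md, ['you@company.com', 'you@gmail.com']));
```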

Step 4: Enrich Brain Pages

This is YOUR job (the agent). Read the digest. For each email:

  1. Detect entities: who sent it? Who is mentioned? What companies?
  2. Check the brain: gbrain search "sender name" — do we have a page?
  3. Update brain pages: if sender has a brain page, append a timeline entry: - YYYY-MM-DD | Email from {sender}: {subject} [Source: Gmail, {date}]
  4. Create new pages: if sender is notable and has no page, create one
  5. Extract action items: if the email requires a response or action, log it
  6. Sync: run gbrain sync --no-pull --no-embed to index changes
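A sketch of building the timeline line from item 3 in code, so the format stays consistent across runs. It assumes `email.date` is an ISO 8601 timestamp, uses that date (in UTC) for both the entry date and the source date, and prefers a hypothetical `fromName` field over the raw address. Whitespace in the subject is collapsed so the entry stays on one line.

```javascript
// Sketch only: field names (from, fromName, subject, date) are assumptions.
function timelineEntry(email) {
  const day = new Date(email.date).toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const subject = (email.subject || '(no subject)').replace(/\s+/g, ' ').trim();
  const sender = email.fromName || email.from;
  return `- ${day} | Email from ${sender}: ${subject} [Source: Gmail, ${day}]`;
}

console.log(timelineEntry({
  from: 'alice@example.com', fromName: 'Alice Chen',
  subject: 'Seed round\n update', date: '2026-04-13T17:05:00Z',
}));
// - 2026-04-13 | Email from Alice Chen: Seed round update [Source: Gmail, 2026-04-13]
```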

Step 5: Set Up Cron

The collector should run every 30 minutes:

*/30 * * * * cd /path/to/email-collector && node email-collector.mjs collect && node email-collector.mjs digest

The agent should read the digest on a schedule (e.g., 3x/day: 9 AM, 12 PM, 3 PM) and run the enrichment flow from Step 4.

Step 6: Log Setup Completion

mkdir -p ~/.gbrain/integrations/email-to-brain
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok"}' >> ~/.gbrain/integrations/email-to-brain/heartbeat.jsonl

Implementation Guide

These are production-tested patterns. Follow them exactly.

Noise Filtering (Deterministic)

const NOISE_SENDERS = ['noreply', 'no-reply', 'notifications@', 'calendar-notification',
                       'mailer-daemon', 'postmaster', 'donotreply'];

function isNoise(email) {
  const from = (email.from || '').toLowerCase();
  return NOISE_SENDERS.some((p) => from.includes(p)); // substring match
}

Simple substring matching, not regex. notifications@slack.com matches because notifications@ is in the pattern list. Order doesn't matter.

Signature Detection

const SIGNATURE_PATTERNS = [
  /docusign/i, /dropbox sign/i, /hellosign/i, /pandadoc/i,
  /please sign/i, /signature needed/i, /ready for your signature/i,
  /everyone has signed/i, /you just signed/i,
];

function isSignature(email) {
  const subject = email.subject || '';
  const from = email.from || '';
  return SIGNATURE_PATTERNS.some((p) => p.test(subject) || p.test(from));
}

Test BOTH subject AND from. Signature requests come from services that have "docusign" in the sender address, not just the subject.

Gmail Link Generation

function gmailLink(messageId, authuser) {
  return `https://mail.google.com/mail/u/?authuser=${authuser}#inbox/${messageId}`;
}

The authuser parameter is CRITICAL. Without it, the link opens in the default Gmail account, not the right one. Each email record stores its account separately. Generate these in CODE, never by the LLM. Links must be 100% reliable.

Deduplication

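// gmail.list(account, query, max) and buildRecord(msg, account) stand in for your
// gateway client and record builder; the caller saves the returned state to data/state.json.
async function collect(state, accounts, gmail, buildRecord, now = Date.now()) {
  const since = state.lastCollect
    ? `newer_than:${Math.max(1, Math.ceil((now - state.lastCollect) / 3600000))}h`
    : 'newer_than:1d';

  for (const account of accounts) {
    const inbox = await gmail.list(account, since, 50);
    for (const msg of inbox) {
      if (msg.id in state.knownMessageIds) continue; // already seen
      state.knownMessageIds[msg.id] = buildRecord(msg, account);
    }

    // ALSO pull sent mail to detect replies (keep threadId so replies match their threads)
    const sent = await gmail.list(account, `from:${account.email} ${since}`, 30);
    for (const msg of sent) {
      state.knownMessageIds[msg.id] = { is_sent: true, threadId: msg.threadId };
    }
  }

  state.lastCollect = now; // without this, every run re-scans the default one-day window
  return state;
}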

Why sent mail matters: Without it, the digest shows "awaiting response" on threads you already replied to. Sent mail acts as a negative filter.
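A sketch of that negative filter, assuming sent messages are stored with `is_sent: true` and every record (sent or received) keeps Gmail's `threadId` and a parseable `date`. Rather than treating any thread you ever replied to as handled, it compares timestamps, so a new inbound message that arrives after your reply shows up as awaiting a response again. `awaitingResponse` is a hypothetical name.

```javascript
// Sketch only: record fields (is_sent, threadId, date) are assumptions.
function awaitingResponse(records) {
  // Latest time you sent something in each thread.
  const lastSent = new Map();
  for (const r of records) {
    if (!r.is_sent) continue;
    const t = Date.parse(r.date);
    const prev = lastSent.get(r.threadId);
    if (prev === undefined || t > prev) lastSent.set(r.threadId, t);
  }
  // Inbound mail still needs a response unless you sent something in that thread afterwards.
  return records.filter((r) => {
    if (r.is_sent) return false;
    const sent = lastSent.get(r.threadId);
    return sent === undefined || sent < Date.parse(r.date);
  });
}
```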

What the Agent Should Test After Setup

  1. Noise filtering: Send a test email from noreply@test.com. Run collect. Verify it appears in noise section, not triage section.
  2. Gmail links: Click a link from the digest. Verify it opens the correct account (not the default one).
  3. Deduplication: Run collect twice in 1 minute. Verify no duplicate messages.
  4. Sent mail: Reply to an email manually. Run collect. Verify the thread is marked as replied-to in the digest.

Cost Estimate

| Component | Monthly Cost |
|---|---|
| ClawVisor (free tier) | $0 |
| Gmail API | $0 (within free quota) |
| Total | $0 |

Troubleshooting

No emails collected:

  • Check ClawVisor health: curl "$CLAWVISOR_URL/health"
  • Check standing task is active and has Gmail service enabled
  • Check task purpose is expansive enough (narrow purposes block requests)

Gmail links don't work:

  • Verify the authuser parameter matches the account email
  • Gmail links require being logged into the correct Google account

Digest is empty but collection ran:

  • Check data/messages/ for JSON files
  • All emails might be filtered as noise — check noise filtering rules