feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)

* feat: battle-tested skill patterns from production deployment

Backport production-learned brain-operations patterns:
- Iron Law of Back-Linking (mandatory bidirectional linking)
- Brain filing rules (file by primary subject, not format)
- Enrichment protocol (7-step pipeline, 3-tier system, person/company templates)
- Media ingest workflows (articles, videos, podcasts, PDFs, screenshots)
- Citation requirements (mandatory [Source: ...] on every fact)
- Test Before Bulk operating principle
- Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS
- X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger

* chore: bump version and changelog (v0.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add _brain-filing-rules.md to CLAUDE.md key files

* feat: smart file upload with TUS resumable and .redirect.yaml pointers

- Supabase Storage auto-selects upload method by file size: < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry)
- Signed URL generation for private bucket access (1-hour expiry)
- New `upload-raw` command with size routing: small text stays in git, large/media files go to cloud with .redirect.yaml pointer
- New `signed-url` command for generating access links
- File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy)
- Redirect format upgraded: 10 fields with full metadata
- All migration commands (mirror, redirect, restore, clean) handle both formats

* feat: skills reference actual gbrain file commands

- Filing rules document upload-raw, signed-url, and .redirect.yaml format
- Ingest skill uses gbrain files upload-raw for raw source preservation
- Maintain skill adds file storage health checks
- Setup skill adds storage configuration phase with migration guidance
- Voice recipe uses upload-raw for call audio storage
- Migration v0.9.0 with complete storage setup instructions

* chore: bump version and changelog (v0.9.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: gbrain publish -- shareable HTML with password protection

First code+skill pair: deterministic code does the work (strip private data, encrypt with AES-256-GCM, generate self-contained HTML), the skill tells the agent when and how to use it. 34 new tests.

See: https://x.com/garrytan/status/2042925773300908103

* feat: backlinks check/fix, page lint, and report commands

Three new deterministic tools (zero LLM calls):
- gbrain backlinks check/fix -- scans brain for entity mentions without back-links, creates them. Enforces the Iron Law from the skills.
- gbrain lint [--fix] -- catches LLM preambles, code fence wrapping, placeholder dates, missing frontmatter, broken citations, empty sections. --fix auto-strips fixable artifacts.
- gbrain report --type <name> -- saves timestamped reports to brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails.

33 new tests (409 total, 0 fail).

* feat: v0.9.0 migration tells agents to swap scripts for built-in commands

Migration file now:
- Lists all 5 new deterministic commands with usage examples
- Includes a script-to-command replacement table (old -> new)
- Tells the agent to find custom script references in AGENTS.md, skills, and cron jobs and replace with gbrain commands
- Adds recommended cron jobs for daily backlink fix + weekly lint
- References the Thin Harness, Fat Skills thread

* fix: CLI routing bugs found during DX review

- Fixed subArgs reference error in handleCliOnly (used wrong variable name)
- Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid conflict with existing backlinks operation (per-page incoming links)
- Added TOOLS section to --help output showing publish, check-backlinks, lint, report
- Added upload-raw and signed-url to FILES section in --help
- Updated all docs/migration references to use check-backlinks

* fix: security hardening from adversarial review

- XSS: sanitize marked.parse() output (strip script/iframe/on* attrs)
- Path traversal: validate report --type against [a-z0-9-] pattern
- TUS: HEAD request before retry to get server's actual offset (TUS spec)
- Pointer: upload-raw now includes pointer content in JSON output
- Symlinks: use lstatSync in all walkers to prevent directory escape

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 21:46:07 -10:00
parent 91ced664b6
commit baf3517868
30 changed files with 3239 additions and 92 deletions
--- a/recipes/twilio-voice-brain.md
+++ b/recipes/twilio-voice-brain.md
@@ -1,8 +1,8 @@
 ---
 id: twilio-voice-brain
 name: Voice-to-Brain
-version: 0.8.0
-description: Phone calls create brain pages via Twilio + OpenAI Realtime + GBrain MCP. Callers talk, brain pages appear.
+version: 0.8.1
+description: Phone calls create brain pages via Twilio + voice pipeline + GBrain MCP. Two architectures -- OpenAI Realtime (turnkey) or DIY STT+LLM+TTS (full control). Callers talk, brain pages appear.
 category: sense
 requires: [ngrok-tunnel]
 secrets:
@@ -52,6 +52,9 @@ auth token is incorrect. Let's re-enter it."

 ## Architecture

+Two pipeline options:
+
+### Option A: OpenAI Realtime (turnkey, simpler)
 ```
 Caller (phone)
  ↓ Twilio (WebSocket, g711_ulaw audio — no transcoding)
@@ -64,6 +67,33 @@ Brain page created (meetings/YYYY-MM-DD-call-{caller}.md)
 Summary posted to messaging app (Telegram/Slack/Discord)
 ```

+### Option B: DIY STT+LLM+TTS (full control, production-grade)
+```
+Caller (phone or WebRTC browser)
+  ↓ Twilio WebSocket OR WebRTC
+Voice Server (Node.js)
+  ↓ Deepgram STT (streaming speech-to-text, speaker diarization)
+  ↓ Claude API (streaming SSE, sentence-boundary dispatch)
+  ↓ Cartesia / OpenAI TTS (text-to-speech, low latency)
+  ↓ Function calls during conversation
+GBrain MCP (semantic search, page reads, page writes)
+  ↓ Post-call
+Brain page + audio upload + transcript storage
+```
+
+**Why v2 (Option B)?** OpenAI Realtime is a black box — you can't control STT
+quality, swap LLMs, or debug audio issues. The DIY stack gives you transparent
+Deepgram+Claude+TTS with full control over each stage. Trade-off: more integration
+work, but you own the pipeline.
+
+**Production-tested v2 architecture (pipeline.mjs, ~250 lines):**
+- Streaming SSE from Claude with sentence-boundary TTS dispatch
+- 20-turn conversation history cap (prevents context bloat)
+- Reconnect logic with exponential backoff on STT/TTS disconnects
+- Periodic keepalives to prevent WebSocket timeout
+- Audio endpointing for natural turn-taking
+- Smart VAD (Silero) as default with push-to-talk fallback
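
Three of the bullets above (sentence-boundary dispatch, the 20-turn cap, and exponential backoff) can be illustrated with a short sketch. This is not the contents of `pipeline.mjs`: the function names, the sentence regex, the backoff constants, and the reading of "turn" as one message are all assumptions made for illustration.

```javascript
// Hypothetical helpers; only the 20-turn cap comes from the recipe itself.

// Buffers streamed text deltas and hands each complete sentence to
// onSentence as soon as its terminator (. ! ?) plus whitespace has arrived.
function createSentenceDispatcher(onSentence) {
  let buffer = '';
  return {
    push(delta) {
      buffer += delta;
      let match;
      // Shortest prefix ending in . ! or ? that is followed by whitespace.
      while ((match = buffer.match(/^([\s\S]*?[.!?])\s+/)) !== null) {
        onSentence(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    // Call when the stream ends to emit any trailing partial sentence.
    flush() {
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}

// Keeps only the most recent messages (assumption: one message = one turn).
const MAX_TURNS = 20;
function capHistory(messages, maxTurns = MAX_TURNS) {
  return messages.length > maxTurns ? messages.slice(-maxTurns) : messages;
}

// Delay before reconnect attempt N (0-based): base * 2^N, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

The regex is deliberately naive: it will split after abbreviations such as "Dr." and will not split when a terminator is followed directly by a quote mark. A real pipeline may also need to make sure the capped history still starts with a user message.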
+
 ## Opinionated Defaults

 These are production-tested defaults from a real deployment. Customize after setup.
@@ -428,7 +458,7 @@ fi

 ```bash
 mkdir -p ~/.gbrain/integrations/twilio-voice-brain
-echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
+echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.8.1","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
 ```
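
If the heartbeat is written from the Node.js voice server instead of the shell, the same JSON line can be built without shell quoting. A minimal sketch, assuming a hypothetical `heartbeatLine` helper; the field names mirror the `echo` command above.

```javascript
// Hypothetical helper: builds one heartbeat.jsonl line with the same fields
// as the shell command. `now` is injectable so the output is reproducible.
function heartbeatLine(event, sourceVersion, details, now = new Date()) {
  // toISOString() includes milliseconds; drop them to match
  // `date -u +%Y-%m-%dT%H:%M:%SZ`.
  const ts = now.toISOString().replace(/\.\d{3}Z$/, 'Z');
  return JSON.stringify({
    ts,
    event,
    source_version: sourceVersion,
    status: 'ok',
    details,
  }) + '\n';
}
```

The returned string can be appended to `heartbeat.jsonl` with Node's `fs.appendFileSync`.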

 Tell the user: "Voice-to-brain is fully set up. Your number is [NUMBER]. Here's
@@ -472,6 +502,97 @@ The watchdog restarts the server if it crashes."
 - The watchdog (Step 9) handles this automatically
 - For a permanent URL: upgrade to ngrok paid ($8/mo) for a static domain, or deploy to Fly.io/Railway instead

+**Note on Option B credentials:** If using the DIY pipeline (Option B), you will
+also need API keys for your chosen STT provider (e.g., Deepgram), your TTS provider
+(e.g., Cartesia, OpenAI TTS), and the Claude API. Collect and validate these during
+Step 2 alongside the Twilio and OpenAI credentials listed above.
+
+## Critical Production Fixes (v0.8.1)
+
+The fixes marked CRITICAL are NOT optional. Every fix in this section addresses a
+real production failure discovered in a deployment handling daily calls.
+
+### Unicode Crash Fix (CRITICAL)
+
+**Problem:** Em dashes (U+2014), arrows (U+2192), and other non-ASCII characters in the
+prompt context cause broken surrogate pairs that crash the Twilio WebSocket
+connection. Phone calls drop silently.
+
+**Fix:** Replace ALL non-ASCII characters with ASCII equivalents throughout the
+entire prompt file before sending to Twilio. The bug is invisible in development
+(browsers handle Unicode fine) and catastrophic in production.
+
+```javascript
+// Replace common typographic characters with ASCII equivalents, then strip
+// anything still outside ASCII so the prompt sent to Twilio is pure ASCII.
+function sanitizeForTwilio(text) {
+  return text
+    .replace(/[\u2014\u2013]/g, '--')   // em/en dash
+    .replace(/[\u2018\u2019]/g, "'")    // smart single quotes
+    .replace(/[\u201C\u201D]/g, '"')    // smart double quotes
+    .replace(/\u2192/g, '->')           // right arrow
+    .replace(/\u2190/g, '<-')           // left arrow
+    .replace(/\u2026/g, '...')          // ellipsis
+    .replace(/[^\x00-\x7F]/g, '');      // strip remaining non-ASCII
+}
+```
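
Because the sanitizer silently drops anything it has no mapping for, it can help to see what a prompt file actually contains before relying on it. The helper below is a hypothetical audit aid, not part of the recipe: it lists each non-ASCII character with its position and code point.

```javascript
// Hypothetical audit helper: reports every non-ASCII character in `text`.
function findNonAscii(text) {
  const hits = [];
  let index = 0;
  // for...of iterates by code point, so astral characters (e.g. emoji)
  // are reported once instead of as two surrogate halves.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0x7f) {
      hits.push({
        index, // offset in UTF-16 code units
        char: ch,
        codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
      });
    }
    index += ch.length;
  }
  return hits;
}
```

An empty result means the text is already pure ASCII. A non-empty result after sanitizing would point to a gap in the sanitizer.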
+
+### PII Scrub from Voice Context (CRITICAL)
+
+**Problem:** Brain context loaded into the voice prompt may contain phone numbers,
+email addresses, and other PII. The voice agent reads these aloud to callers.
+
+**Fix:** Regex-strip PII from all voice context before injecting into the prompt:
+- Phone numbers: `/\+?\d[\d\s\-().]{7,}\d/g`
+- Email addresses: `/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g`
+- URLs with auth tokens or API keys
+- Any string matching common credential patterns
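
A minimal sketch of the scrub step, using only the two regexes given above. The function name and the placeholder strings are assumptions. URL-token and credential patterns are left out because the recipe does not specify them.

```javascript
// Patterns copied from the list above.
const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const PHONE_RE = /\+?\d[\d\s\-().]{7,}\d/g;

// Hypothetical helper: replaces emails and phone numbers with placeholders.
// Emails go first so digits inside an address are not caught by PHONE_RE.
function scrubPii(text) {
  return text
    .replace(EMAIL_RE, '[email removed]')
    .replace(PHONE_RE, '[phone removed]');
}
```

The phone pattern is broad by design: it also matches other digit runs with separators, such as an ISO date like `2026-04-11`. For a voice prompt that is usually an acceptable trade, but worth knowing when dates matter.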
+
+### Identity-First Prompt (IMPORTANT)
+
+**Problem:** Voice agents lose their identity mid-conversation. Saying "You are NOT
+Claude" on its own doesn't stick: the model reverts to its base persona.
+
+**Fix:** Put identity FIRST in the system prompt, before any context or rules:
+```
+# You ARE [Agent Name]
+You are [Name], a voice assistant who works with [Brain Name].
+You are NOT Claude. You are NOT a general AI assistant.
+[Name] has their own personality: [traits].
+
+# Context
+[... brain context, calendar, tasks ...]
+
+# Rules
+[... behavioral rules ...]
+```
+
+Positioning identity before context ensures the model sees it first and
+maintains it throughout the conversation.
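
One way to keep that ordering from drifting is to assemble the prompt in code. The sketch below assumes a hypothetical `buildSystemPrompt` helper and option names; the template text follows the block above.

```javascript
// Hypothetical helper: always emits identity, then context, then rules.
function buildSystemPrompt({ name, brainName, traits, context, rules }) {
  const identity = [
    `# You ARE ${name}`,
    `You are ${name}, a voice assistant who works with ${brainName}.`,
    'You are NOT Claude. You are NOT a general AI assistant.',
    `${name} has their own personality: ${traits.join(', ')}.`,
  ].join('\n');
  return [identity, `# Context\n${context}`, `# Rules\n${rules}`].join('\n\n');
}
```

Callers then supply only the section contents and cannot reorder the sections by accident.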
+
+### Auto-Upload Call Audio (RECOMMENDED)
+
+**Problem:** If post-call processing fails, the call audio is lost forever.
+
+**Fix:** Auto-upload ALL call audio immediately on call end:
+- Twilio calls: download the MP3 recording URL from Twilio
+- WebRTC calls: capture via MediaRecorder (webm/opus format)
+- Upload via `gbrain files upload-raw <audio-file> --page meetings/call-slug --type call-recording`
+- GBrain auto-routes: small files stay in git, large files go to cloud storage
+  with `.redirect.yaml` pointer. Files >= 100 MB use TUS resumable upload.
+- Generate signed URLs for playback: `gbrain files signed-url <storage-path>`
+- This ensures every call has a recoverable audio source regardless
+  of whether the transcript or brain page was created successfully
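
From a Node.js voice server, the upload command above can be built as an argument array and passed to a process spawner. The sketch below uses hypothetical helper names. `uploadMethodForSize` only mirrors the documented size rule for illustration, since GBrain does the routing itself, and it assumes "100 MB" means 100 * 1024 * 1024 bytes.

```javascript
// Assumption: "100 MB" is read as 100 MiB; the recipe does not say which.
const TUS_THRESHOLD_BYTES = 100 * 1024 * 1024;

// Mirrors the documented rule: files >= 100 MB use TUS resumable upload.
function uploadMethodForSize(sizeBytes) {
  return sizeBytes >= TUS_THRESHOLD_BYTES ? 'tus-resumable' : 'standard';
}

// Builds the argv for `gbrain files upload-raw` exactly as shown above.
function buildUploadRawArgs(audioFile, pageSlug) {
  return ['files', 'upload-raw', audioFile, '--page', pageSlug, '--type', 'call-recording'];
}
```

The array can be handed to Node's `child_process.execFile('gbrain', args, callback)`. Passing arguments as an array avoids shell quoting problems with caller-derived file names.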
+
+### Smart VAD as Default
+
+**Problem:** Push-to-talk is unnatural on phone calls. Server-side VAD has
+variable quality.
+
+**Fix:** Default to Smart VAD (Silero VAD) for voice activity detection:
+- Better endpointing than server-side VAD
+- Fewer false triggers in noisy environments
+- PTT available as fallback (UI toggle for WebRTC clients)
+- Presets: quiet (0.7 threshold), normal (0.85), noisy (0.95), very_noisy (0.98)
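
The presets above map naturally onto a small lookup table. In this sketch the constant and function names are hypothetical, and treating `normal` as the default preset is an assumption; the threshold values are the ones listed.

```javascript
// Threshold values from the preset list above.
const VAD_PRESETS = Object.freeze({
  quiet: 0.7,
  normal: 0.85,
  noisy: 0.95,
  very_noisy: 0.98,
});

// Hypothetical helper: returns the threshold for a preset name and rejects
// unknown names instead of silently falling back.
function vadThreshold(preset = 'normal') {
  if (!Object.prototype.hasOwnProperty.call(VAD_PRESETS, preset)) {
    throw new RangeError(`Unknown VAD preset: ${preset}`);
  }
  return VAD_PRESETS[preset];
}
```

Failing loudly on a typo such as `verynoisy` is a design choice: a silent fallback would be hard to notice on a live call.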
+
 ## Production Patterns (Recommended)

 These patterns come from a production voice deployment handling real calls daily.
@@ -488,13 +609,13 @@ AI brain. "I work with [Brain], [Owner]'s AI." Lighter, more playful, more curio
 #### Pre-Computed Bid System
 **Problem:** Dead air kills engagement. Voice agents wait passively.
 **Pattern:** At call start, scan live context and pre-compute up to 10 engagement bids.
-Two types: informative (tasks, calendar, social radar) and relational (curiosity templates).
+Two types: informative (tasks, calendar, social monitoring) and relational (curiosity templates).
 Bids go INTO the prompt so the agent picks from a list. Use bids #1 and #2 for greeting,
 cycle the rest during conversation. Never ask "anything else?" — bring up the next bid.
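
The bid pattern can be sketched as a planner plus a cycler. The names are hypothetical, and the wrap-around once every bid has been used is an assumption; the recipe only says to cycle the rest.

```javascript
const MAX_BIDS = 10;

// Hypothetical helper: keeps at most 10 bids, reserving #1 and #2 for the
// greeting and the remainder for mid-conversation use.
function planBids(candidates) {
  const bids = candidates.slice(0, MAX_BIDS);
  return { greeting: bids.slice(0, 2), rotation: bids.slice(2) };
}

// Returns a function yielding the next bid each time it is called,
// wrapping around when the list is exhausted (null if there are none).
function createBidCycler(rotation) {
  let next = 0;
  return () => (rotation.length ? rotation[next++ % rotation.length] : null);
}
```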

 #### Context-First Prompt
 **Problem:** Voice agent greets generically because it doesn't know what's happening today.
-**Pattern:** Load live context at call start: tasks, calendar, location, social radar,
+**Pattern:** Load live context at call start: tasks, calendar, location, social monitoring,
 morning briefing. Position context FIRST in the prompt (before rules) so the model sees
 it immediately and uses it in the greeting. Try/catch per section. Cap 500-1000 chars each.
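
"Try/catch per section" and the character cap can be sketched as below. The helper name and the `## Title` section format are assumptions, and the loaders are synchronous here for brevity; real loaders would likely be async and awaited inside the same try/catch.

```javascript
// Hypothetical helper: runs each loader in isolation so one failing source
// cannot break the whole prompt, and truncates each section to capChars.
function loadContextSections(loaders, capChars = 1000) {
  const sections = [];
  for (const [title, load] of Object.entries(loaders)) {
    try {
      const text = String(load()).trim();
      if (text) sections.push(`## ${title}\n${text.slice(0, capChars)}`);
    } catch (err) {
      // Skip this section; the call should still start with whatever loaded.
    }
  }
  return sections.join('\n\n');
}
```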

@@ -658,7 +779,7 @@ over WebRTC data channel — use Whisper post-call instead.
 | Keyword | Report Loaded |
 |---------|--------------|
 | email, inbox, mail | inbox sweep report |
-| social, twitter, mentions | social radar report |
+| social, twitter, mentions | social engagement report |
 | briefing, morning | morning briefing |
 | meeting | meeting sync report |
 | slack | slack scan report |
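
The keyword table can be expressed as data plus a lookup. This sketch covers only the rows shown in this hunk; the function name and the whole-word, case-insensitive matching are assumptions.

```javascript
// Rows copied from the table above (post-change wording).
const KEYWORD_REPORTS = [
  { keywords: ['email', 'inbox', 'mail'], report: 'inbox sweep report' },
  { keywords: ['social', 'twitter', 'mentions'], report: 'social engagement report' },
  { keywords: ['briefing', 'morning'], report: 'morning briefing' },
  { keywords: ['meeting'], report: 'meeting sync report' },
  { keywords: ['slack'], report: 'slack scan report' },
];

// Hypothetical helper: returns the reports whose keywords appear as whole
// words in the utterance, in table order.
function reportsForUtterance(utterance) {
  const words = new Set(utterance.toLowerCase().match(/[a-z]+/g) || []);
  return KEYWORD_REPORTS
    .filter(({ keywords }) => keywords.some((k) => words.has(k)))
    .map(({ report }) => report);
}
```

Whole-word matching means "meetings" does not trigger the `meeting` row. Substring or stem matching would be a reasonable alternative if that proves too strict.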