feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)

* feat: battle-tested skill patterns from production deployment

Backport production-learned brain-operations patterns:
- Iron Law of Back-Linking (mandatory bidirectional linking)
- Brain filing rules (file by primary subject, not format)
- Enrichment protocol (7-step pipeline, 3-tier system, person/company templates)
- Media ingest workflows (articles, videos, podcasts, PDFs, screenshots)
- Citation requirements (mandatory [Source: ...] on every fact)
- Test Before Bulk operating principle
- Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS
- X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger

* chore: bump version and changelog (v0.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add _brain-filing-rules.md to CLAUDE.md key files

* feat: smart file upload with TUS resumable and .redirect.yaml pointers

- Supabase Storage auto-selects upload method by file size:
  < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry)
- Signed URL generation for private bucket access (1-hour expiry)
- New `upload-raw` command with size routing: small text stays in git,
  large/media files go to cloud with .redirect.yaml pointer
- New `signed-url` command for generating access links
- File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy)
- Redirect format upgraded: 10 fields with full metadata
- All migration commands (mirror, redirect, restore, clean) handle both formats

* feat: skills reference actual gbrain file commands

- Filing rules document upload-raw, signed-url, and .redirect.yaml format
- Ingest skill uses gbrain files upload-raw for raw source preservation
- Maintain skill adds file storage health checks
- Setup skill adds storage configuration phase with migration guidance
- Voice recipe uses upload-raw for call audio storage
- Migration v0.9.0 with complete storage setup instructions

* chore: bump version and changelog (v0.9.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: gbrain publish -- shareable HTML with password protection

First code+skill pair: deterministic code does the work (strip private
data, encrypt with AES-256-GCM, generate self-contained HTML), the skill
tells the agent when and how to use it. 34 new tests.

See: https://x.com/garrytan/status/2042925773300908103

* feat: backlinks check/fix, page lint, and report commands

Three new deterministic tools (zero LLM calls):
- gbrain backlinks check/fix -- scans brain for entity mentions without
  back-links, creates them. Enforces the Iron Law from the skills.
- gbrain lint [--fix] -- catches LLM preambles, code fence wrapping,
  placeholder dates, missing frontmatter, broken citations, empty sections.
  --fix auto-strips fixable artifacts.
- gbrain report --type <name> -- saves timestamped reports to
  brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails.

33 new tests (409 total, 0 fail).

* feat: v0.9.0 migration tells agents to swap scripts for built-in commands

Migration file now:
- Lists all 5 new deterministic commands with usage examples
- Includes a script-to-command replacement table (old -> new)
- Tells the agent to find custom script references in AGENTS.md, skills,
  and cron jobs and replace with gbrain commands
- Adds recommended cron jobs for daily backlink fix + weekly lint
- References the Thin Harness, Fat Skills thread

* fix: CLI routing bugs found during DX review

- Fixed subArgs reference error in handleCliOnly (used wrong variable name)
- Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid
  conflict with existing backlinks operation (per-page incoming links)
- Added TOOLS section to --help output showing publish, check-backlinks,
  lint, report
- Added upload-raw and signed-url to FILES section in --help
- Updated all docs/migration references to use check-backlinks

* fix: security hardening from adversarial review

- XSS: sanitize marked.parse() output (strip script/iframe/on* attrs)
- Path traversal: validate report --type against [a-z0-9-] pattern
- TUS: HEAD request before retry to get server's actual offset (TUS spec)
- Pointer: upload-raw now includes pointer content in JSON output
- Symlinks: use lstatSync in all walkers to prevent directory escape

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

@@ -1,8 +1,8 @@
---
id: twilio-voice-brain
name: Voice-to-Brain
version: 0.8.1
description: Phone calls create brain pages via Twilio + voice pipeline + GBrain MCP. Two architectures -- OpenAI Realtime (turnkey) or DIY STT+LLM+TTS (full control). Callers talk, brain pages appear.
category: sense
requires: [ngrok-tunnel]
secrets:
@@ -52,6 +52,9 @@ auth token is incorrect. Let's re-enter it."

## Architecture

Two pipeline options:

### Option A: OpenAI Realtime (turnkey, simpler)
```
Caller (phone)
↓ Twilio (WebSocket, g711_ulaw audio -- no transcoding)
@@ -64,6 +67,33 @@ Brain page created (meetings/YYYY-MM-DD-call-{caller}.md)
Summary posted to messaging app (Telegram/Slack/Discord)
```

### Option B: DIY STT+LLM+TTS (full control, production-grade)
```
Caller (phone or WebRTC browser)
↓ Twilio WebSocket OR WebRTC
Voice Server (Node.js)
↓ Deepgram STT (streaming speech-to-text, speaker diarization)
↓ Claude API (streaming SSE, sentence-boundary dispatch)
↓ Cartesia / OpenAI TTS (text-to-speech, low latency)
↓ Function calls during conversation
GBrain MCP (semantic search, page reads, page writes)
↓ Post-call
Brain page + audio upload + transcript storage
```

**Why v2 (Option B)?** OpenAI Realtime is a black box: you can't control STT
quality, swap LLMs, or debug audio issues. The DIY stack gives you a transparent
Deepgram + Claude + TTS pipeline with full control over each stage. Trade-off:
more integration work, but you own the pipeline.

**Production-tested v2 architecture (pipeline.mjs, ~250 lines):**
- Streaming SSE from Claude with sentence-boundary TTS dispatch
- 20-turn conversation history cap (prevents context bloat)
- Reconnect logic with exponential backoff on STT/TTS disconnects
- Periodic keepalives to prevent WebSocket timeout
- Audio endpointing for natural turn-taking
- Smart VAD (Silero) as default with push-to-talk fallback
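
Three of the behaviors listed above (sentence-boundary dispatch, the 20-turn history cap, and reconnect backoff) can be sketched as small pure helpers. This is a minimal illustration, not the contents of `pipeline.mjs`. The function names, the sentence-terminator regex, the "one turn = user message + assistant reply" reading of the cap, and the backoff constants (500 ms base, 30 s ceiling) are all assumptions.

```javascript
// Illustrative helpers for three pipeline behaviors; not the actual pipeline.mjs.

// Sentence-boundary dispatch: buffer streamed LLM text and hand each complete
// sentence to TTS as soon as its terminator (. ! ?) is followed by whitespace.
function createSentenceBuffer(onSentence) {
  let buffer = '';
  return {
    push(delta) {
      buffer += delta;
      let match;
      // Shortest prefix ending in . ! or ? that is followed by whitespace.
      while ((match = buffer.match(/^[\s\S]*?[.!?](?=\s)/)) !== null) {
        buffer = buffer.slice(match[0].length);
        const sentence = match[0].trim();
        if (sentence) onSentence(sentence);
      }
    },
    flush() {
      // Call when the stream ends so a trailing fragment is still spoken.
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}

// History cap: keep the most recent N turns, assuming one turn is one user
// message plus one assistant reply.
function capHistory(messages, maxTurns = 20) {
  return messages.slice(-maxTurns * 2);
}

// Reconnect backoff: 500 ms, 1 s, 2 s, ... capped at 30 s (assumed constants).
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Wire `push` to each streamed text delta and `flush` to the end of the stream. Abbreviations such as "Dr." will trigger an early split, which is usually harmless for speech. Adding random jitter to the backoff delay keeps several clients from reconnecting in lockstep.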

## Opinionated Defaults

These are production-tested defaults from a real deployment. Customize after setup.
@@ -428,7 +458,7 @@ fi

```bash
mkdir -p ~/.gbrain/integrations/twilio-voice-brain
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.8.1","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
```

Tell the user: "Voice-to-brain is fully set up. Your number is [NUMBER]. Here's
@@ -472,6 +502,97 @@ The watchdog restarts the server if it crashes."
- The watchdog (Step 9) handles this automatically
- For a permanent URL: upgrade to ngrok paid ($8/mo) for a static domain, or deploy to Fly.io/Railway instead

**Note on Option B credentials:** If using the DIY pipeline (Option B), you will
also need API keys for your chosen STT provider (e.g., Deepgram) and TTS provider
(e.g., Cartesia, OpenAI TTS). Collect and validate these during Step 2 alongside
the Twilio and OpenAI credentials listed above.

## Critical Production Fixes (v0.8.1)

These are NOT optional. They prevent real production failures discovered in a
deployment handling daily calls.

### Unicode Crash Fix (CRITICAL)

**Problem:** Em dashes (U+2014), arrows (U+2192), and other non-ASCII characters
in the prompt context cause encoding failures (such as broken surrogate pairs)
that crash the Twilio WebSocket connection. Phone calls drop silently.

**Fix:** Replace ALL non-ASCII characters with ASCII equivalents throughout the
entire prompt file before sending it to Twilio. This is invisible in development
(browsers handle Unicode fine) and catastrophic in production.

```javascript
// Replace common typographic characters with ASCII equivalents, then drop
// anything still outside the 7-bit ASCII range.
function sanitizeForTwilio(text) {
  return text
    .replace(/[\u2014\u2013]/g, '--')  // em/en dash
    .replace(/[\u2018\u2019]/g, "'")   // smart single quotes
    .replace(/[\u201C\u201D]/g, '"')   // smart double quotes
    .replace(/\u2192/g, '->')          // right arrow
    .replace(/\u2190/g, '<-')          // left arrow
    .replace(/\u2026/g, '...')         // ellipsis
    .normalize('NFD')                  // split accented letters into base letter + combining mark
    .replace(/[^\x00-\x7F]/g, '');     // strip remaining non-ASCII (combining marks, emoji, etc.)
}
```

### PII Scrub from Voice Context (CRITICAL)

**Problem:** Brain context loaded into the voice prompt may contain phone numbers,
email addresses, and other PII, and the voice agent can read them aloud to callers.

**Fix:** Regex-strip PII from all voice context before injecting it into the prompt:
- Phone numbers: `/\+?\d[\d\s\-().]{7,}\d/g`
- Email addresses: `/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g`
- URLs with auth tokens or API keys
- Any string matching common credential patterns
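
A minimal scrub pass that applies the two regexes above might look like the sketch below. The URL-parameter names, the Bearer pattern, the API-key prefixes (`sk-`, `ghp_`, `xoxb-`, and so on), and the replacement labels are illustrative assumptions. Extend them to cover the credential formats your brain actually contains.

```javascript
// PII scrub for voice context. The phone and email regexes are the ones listed
// above; the URL-parameter and API-key patterns are illustrative assumptions.
const PII_PATTERNS = [
  // Secrets in URL query parameters (assumed parameter names).
  { re: /([?&](?:token|key|api_key|apikey|secret|password|sig)=)[^&\s]+/gi, replacement: '$1[redacted]' },
  // Bearer tokens, e.g. from copied HTTP headers.
  { re: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, replacement: 'Bearer [redacted]' },
  // Common API-key prefixes such as sk-..., ghp_..., xoxb-... (assumed list).
  { re: /\b(?:sk|pk|ghp|xox[bp])[-_][A-Za-z0-9_-]{10,}/g, replacement: '[key removed]' },
  // Emails run before phones so digits inside addresses are not half-matched.
  { re: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, replacement: '[email removed]' },
  { re: /\+?\d[\d\s\-().]{7,}\d/g, replacement: '[phone removed]' },
];

function scrubPII(text) {
  // Apply each pattern in order; String.replace resets lastIndex on global regexes.
  return PII_PATTERNS.reduce((out, { re, replacement }) => out.replace(re, replacement), text);
}
```

The listed phone regex also matches ISO dates such as `2024-01-15`, because a date is a 10-character run of digits and dashes. Scrubbed calendar context can therefore lose its dates. Tighten that pattern if dates matter to the agent.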

### Identity-First Prompt (IMPORTANT)

**Problem:** Voice agents lose their identity mid-conversation. Saying "You are NOT
Claude" doesn't stick. The model reverts to its base persona.

**Fix:** Put identity FIRST in the system prompt, before any context or rules:
```
# You ARE [Agent Name]
You are [Name], a voice assistant who works with [Brain Name].
You are NOT Claude. You are NOT a general AI assistant.
[Name] has their own personality: [traits].

# Context
[... brain context, calendar, tasks ...]

# Rules
[... behavioral rules ...]
```

Putting identity before context means the model sees it first and, in practice, more reliably
maintains it throughout the conversation.
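
A small builder can enforce the identity, then context, then rules order, so that later edits to the context or rules cannot push the identity block down. The `buildSystemPrompt` name and its argument shapes are hypothetical. The section text mirrors the template above.

```javascript
// Hypothetical prompt builder: identity block first, then context, then rules.
function buildSystemPrompt({ name, brainName, traits }, contextBlocks, rules) {
  const identity = [
    `# You ARE ${name}`,
    `You are ${name}, a voice assistant who works with ${brainName}.`,
    'You are NOT Claude. You are NOT a general AI assistant.',
    `${name} has their own personality: ${traits}.`,
  ].join('\n');
  const context = ['# Context', ...contextBlocks].join('\n');
  const ruleText = ['# Rules', ...rules.map((r) => `- ${r}`)].join('\n');
  // Blank line between sections, matching the template above.
  return [identity, context, ruleText].join('\n\n');
}
```

In a real pipeline, the context blocks would go through the PII scrub and the finished prompt through `sanitizeForTwilio` before use.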

### Auto-Upload Call Audio (RECOMMENDED)

**Problem:** If post-call processing fails, the call audio is lost forever.

**Fix:** Auto-upload ALL call audio immediately when the call ends:
- Twilio calls: download the MP3 from the call's Twilio recording URL
- WebRTC calls: capture via MediaRecorder (webm/opus format)
- Upload via `gbrain files upload-raw <audio-file> --page meetings/call-slug --type call-recording`
- GBrain auto-routes by size and type: small text files stay in git, while large or media files go to cloud storage
with a `.redirect.yaml` pointer. Files >= 100 MB use TUS resumable upload.
- Generate signed URLs for playback: `gbrain files signed-url <storage-path>` (links expire after 1 hour)
- This ensures every call has a recoverable audio source regardless
of whether the transcript or brain page was created successfully
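
A minimal Node.js sketch of the Twilio branch is shown below. It assumes Node 18+ (global `fetch`) and relies on Twilio's convention that appending `.mp3` to a recording URL returns the MP3, fetched with HTTP Basic auth using the account SID and auth token. `gbrain` is invoked with exactly the arguments shown above. The helper names and the `outPath` temp-file location are hypothetical.

```javascript
// Sketch: download a Twilio recording and hand it to `gbrain files upload-raw`.
// Assumes Node 18+ (global fetch); helper names are hypothetical.

// Twilio serves a recording as MP3 when ".mp3" is appended to its URL.
function recordingMp3Url(recordingUrl) {
  return recordingUrl.endsWith('.mp3') ? recordingUrl : `${recordingUrl}.mp3`;
}

// Arguments for the upload command documented above.
function uploadRawArgs(audioPath, pageSlug) {
  return ['files', 'upload-raw', audioPath, '--page', pageSlug, '--type', 'call-recording'];
}

async function archiveTwilioRecording({ recordingUrl, accountSid, authToken, pageSlug, outPath }) {
  const { writeFile } = await import('node:fs/promises');
  const { execFile } = await import('node:child_process');
  const { promisify } = await import('node:util');

  const auth = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  const res = await fetch(recordingMp3Url(recordingUrl), {
    headers: { Authorization: `Basic ${auth}` },
  });
  if (!res.ok) throw new Error(`Recording download failed: HTTP ${res.status}`);
  await writeFile(outPath, Buffer.from(await res.arrayBuffer()));

  // Upload right away, before transcript or page processing has a chance to fail.
  const { stdout } = await promisify(execFile)('gbrain', uploadRawArgs(outPath, pageSlug));
  return stdout;
}
```

Calling `archiveTwilioRecording` from the call-end handler, before any transcription step, is what makes the audio recoverable even when later processing throws.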

### Smart VAD as Default

**Problem:** Push-to-talk (PTT) is unnatural on phone calls. Server-side VAD has
variable quality.

**Fix:** Default to Smart VAD (Silero VAD) for voice activity detection:
- Better endpointing than server-side VAD
- Fewer false triggers in noisy environments
- PTT available as fallback (UI toggle for WebRTC clients)
- Presets: quiet (0.7 threshold), normal (0.85), noisy (0.95), very_noisy (0.98)
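
One way to apply these presets is sketched below. It assumes each audio frame arrives with a Silero speech probability between 0 and 1, and that a frame counts as speech when that probability reaches the preset threshold. A turn ends after a run of consecutive silent frames. Only the four threshold values come from the list above. `createEndpointer` and the 15-frame silence window are hypothetical placeholders.

```javascript
// Threshold presets from the list above. Higher thresholds need more confident
// speech, so noise triggers fewer false starts.
const VAD_PRESETS = { quiet: 0.7, normal: 0.85, noisy: 0.95, very_noisy: 0.98 };

// Hypothetical endpointer: a turn starts on the first speech frame and ends
// after `silenceFramesToEnd` consecutive non-speech frames.
function createEndpointer(preset = 'normal', silenceFramesToEnd = 15) {
  const threshold = VAD_PRESETS[preset];
  if (threshold === undefined) throw new Error(`Unknown VAD preset: ${preset}`);
  let inSpeech = false;
  let silentFrames = 0;
  return function onFrame(speechProb) {
    if (speechProb >= threshold) {
      silentFrames = 0;
      if (!inSpeech) {
        inSpeech = true;
        return 'speech_start'; // e.g. stop TTS playback if the agent is talking
      }
      return null;
    }
    if (inSpeech && ++silentFrames >= silenceFramesToEnd) {
      inSpeech = false;
      silentFrames = 0;
      return 'speech_end'; // e.g. treat the caller's turn as finished
    }
    return null;
  };
}
```

The silence window trades responsiveness against cutting callers off mid-pause, so tune it alongside the threshold preset.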

## Production Patterns (Recommended)

These patterns come from a production voice deployment handling real calls daily.
@@ -488,13 +609,13 @@ AI brain. "I work with [Brain], [Owner]'s AI." Lighter, more playful, more curio
#### Pre-Computed Bid System
**Problem:** Dead air kills engagement. Voice agents wait passively.
**Pattern:** At call start, scan live context and pre-compute up to 10 engagement bids.
Two types: informative (tasks, calendar, social monitoring) and relational (curiosity templates).
Bids go INTO the prompt so the agent picks from a list. Use bids #1 and #2 for the greeting,
cycle the rest during conversation. Never ask "anything else?" — bring up the next bid.
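
A sketch of the bid flow follows. It assumes live context has already been loaded into plain string arrays. The input shape, the bid text prefixes, and the informative-before-relational ordering are hypothetical. The 10-bid cap and the rule "#1 and #2 for the greeting, cycle the rest" come from the pattern above.

```javascript
// Hypothetical bid pre-computation: informative bids from live context first,
// then relational bids from curiosity templates, capped at maxBids.
function computeBids({ tasks = [], events = [], social = [] }, curiosityTemplates, maxBids = 10) {
  const informative = [
    ...events.map((e) => ({ type: 'informative', text: `Upcoming: ${e}` })),
    ...tasks.map((t) => ({ type: 'informative', text: `Open task: ${t}` })),
    ...social.map((s) => ({ type: 'informative', text: `Social: ${s}` })),
  ];
  const relational = curiosityTemplates.map((q) => ({ type: 'relational', text: q }));
  return [...informative, ...relational].slice(0, maxBids);
}

// Render bids as a numbered list for the prompt. #1 and #2 are marked for the
// greeting; the agent cycles through the rest instead of asking "anything else?".
function renderBidsForPrompt(bids) {
  return bids
    .map((b, i) => `${i + 1}. [${b.type}]${i < 2 ? ' (greeting)' : ''} ${b.text}`)
    .join('\n');
}
```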

#### Context-First Prompt
**Problem:** The voice agent greets callers generically because it doesn't know what's happening today.
**Pattern:** Load live context at call start: tasks, calendar, location, social monitoring,
morning briefing. Position context FIRST in the prompt (before rules) so the model sees
it immediately and uses it in the greeting. Try/catch per section. Cap 500-1000 chars each.
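
The per-section try/catch and length cap could look like the sketch below. The loader functions are hypothetical stand-ins for your task, calendar, and briefing sources. The 800-character default is simply one value inside the 500-1000 range given above.

```javascript
// Hypothetical context loader: each section loads independently, so one failing
// source (e.g. a calendar API timeout) does not blank out the whole prompt.
async function loadLiveContext(loaders, maxChars = 800) {
  const sections = await Promise.all(
    Object.entries(loaders).map(async ([name, load]) => {
      try {
        const text = String((await load()) ?? '').trim();
        if (!text) return null;
        const capped = text.length > maxChars ? `${text.slice(0, maxChars)}...` : text;
        return `## ${name}\n${capped}`;
      } catch {
        return null; // skip this section; log the error in a real deployment
      }
    }),
  );
  // Promise.all preserves loader order, so sections appear in a stable order.
  return sections.filter(Boolean).join('\n\n');
}
```

Because the sections load in parallel, call-start latency is bounded by the slowest loader. A per-loader timeout is a natural next step.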
@@ -658,7 +779,7 @@ over WebRTC data channel — use Whisper post-call instead.
| Keyword | Report Loaded |
|---------|--------------|
| email, inbox, mail | inbox sweep report |
| social, twitter, mentions | social engagement report |
| briefing, morning | morning briefing |
| meeting | meeting sync report |
| slack | slack scan report |
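
The table above amounts to a keyword lookup on what the caller asks for. The sketch below assumes whole-word, case-insensitive matching and allows several reports to load for one request. The report identifiers are hypothetical names for the reports in the table.

```javascript
// Keyword -> report routing from the table above (report IDs are hypothetical).
const REPORT_ROUTES = [
  { keywords: ['email', 'inbox', 'mail'], report: 'inbox-sweep' },
  { keywords: ['social', 'twitter', 'mentions'], report: 'social-engagement' },
  { keywords: ['briefing', 'morning'], report: 'morning-briefing' },
  { keywords: ['meeting'], report: 'meeting-sync' },
  { keywords: ['slack'], report: 'slack-scan' },
];

function reportsForUtterance(utterance) {
  // Lowercase and split into words so keywords only match whole words.
  const words = new Set(utterance.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  return REPORT_ROUTES
    .filter(({ keywords }) => keywords.some((k) => words.has(k)))
    .map(({ report }) => report);
}
```

With whole-word matching, "meetings" does not trigger the meeting report. Add simple plural handling or prefix matching if callers phrase things that way.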