feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)

* feat: battle-tested skill patterns from production deployment

Backport production-learned brain-operations patterns:
- Iron Law of Back-Linking (mandatory bidirectional linking)
- Brain filing rules (file by primary subject, not format)
- Enrichment protocol (7-step pipeline, 3-tier system, person/company templates)
- Media ingest workflows (articles, videos, podcasts, PDFs, screenshots)
- Citation requirements (mandatory [Source: ...] on every fact)
- Test Before Bulk operating principle
- Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS
- X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger

* chore: bump version and changelog (v0.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add _brain-filing-rules.md to CLAUDE.md key files

* feat: smart file upload with TUS resumable and .redirect.yaml pointers

- Supabase Storage auto-selects upload method by file size:
  < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry)
- Signed URL generation for private bucket access (1-hour expiry)
- New `upload-raw` command with size routing: small text stays in git,
  large/media files go to cloud with .redirect.yaml pointer
- New `signed-url` command for generating access links
- File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy)
- Redirect format upgraded: 10 fields with full metadata
- All migration commands (mirror, redirect, restore, clean) handle both formats

* feat: skills reference actual gbrain file commands

- Filing rules document upload-raw, signed-url, and .redirect.yaml format
- Ingest skill uses gbrain files upload-raw for raw source preservation
- Maintain skill adds file storage health checks
- Setup skill adds storage configuration phase with migration guidance
- Voice recipe uses upload-raw for call audio storage
- Migration v0.9.0 with complete storage setup instructions

* chore: bump version and changelog (v0.9.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: gbrain publish -- shareable HTML with password protection

First code+skill pair: deterministic code does the work (strip private data,
encrypt with AES-256-GCM, generate self-contained HTML), and the skill tells the
agent when and how to use it. 34 new tests.

See: https://x.com/garrytan/status/2042925773300908103

* feat: backlinks check/fix, page lint, and report commands

Three new deterministic tools (zero LLM calls):

- gbrain backlinks check/fix -- scans brain for entity mentions without
  back-links, creates them. Enforces the Iron Law from the skills.
- gbrain lint [--fix] -- catches LLM preambles, code fence wrapping,
  placeholder dates, missing frontmatter, broken citations, empty sections.
  --fix auto-strips fixable artifacts.
- gbrain report --type <name> -- saves timestamped reports to
  brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails.

33 new tests (409 total, 0 fail).

* feat: v0.9.0 migration tells agents to swap scripts for built-in commands

Migration file now:
- Lists all 5 new deterministic commands with usage examples
- Includes a script-to-command replacement table (old -> new)
- Tells the agent to find custom script references in AGENTS.md,
  skills, and cron jobs and replace with gbrain commands
- Adds recommended cron jobs for daily backlink fix + weekly lint
- References the Thin Harness, Fat Skills thread

* fix: CLI routing bugs found during DX review

- Fixed subArgs reference error in handleCliOnly (used wrong variable name)
- Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid
  conflict with existing backlinks operation (per-page incoming links)
- Added TOOLS section to --help output showing publish, check-backlinks,
  lint, report
- Added upload-raw and signed-url to FILES section in --help
- Updated all docs/migration references to use check-backlinks

* fix: security hardening from adversarial review

- XSS: sanitize marked.parse() output (strip script/iframe/on* attrs)
- Path traversal: validate report --type against [a-z0-9-] pattern
- TUS: HEAD request before retry to get server's actual offset (TUS spec)
- Pointer: upload-raw now includes pointer content in JSON output
- Symlinks: use lstatSync in all walkers to prevent directory escape

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Author: Garry Tan
Date: 2026-04-11 21:46:07 -10:00
Committed by GitHub
Parent: 91ced664b6
Commit: baf3517868
30 changed files with 3239 additions and 92 deletions


@@ -1,8 +1,8 @@
---
id: twilio-voice-brain
name: Voice-to-Brain
version: 0.8.0
description: Phone calls create brain pages via Twilio + OpenAI Realtime + GBrain MCP. Callers talk, brain pages appear.
version: 0.8.1
description: Phone calls create brain pages via Twilio + voice pipeline + GBrain MCP. Two architectures -- OpenAI Realtime (turnkey) or DIY STT+LLM+TTS (full control). Callers talk, brain pages appear.
category: sense
requires: [ngrok-tunnel]
secrets:
@@ -52,6 +52,9 @@ auth token is incorrect. Let's re-enter it."
## Architecture
Two pipeline options:
### Option A: OpenAI Realtime (turnkey, simpler)
```
Caller (phone)
↓ Twilio (WebSocket, g711_ulaw audio — no transcoding)
@@ -64,6 +67,33 @@ Brain page created (meetings/YYYY-MM-DD-call-{caller}.md)
Summary posted to messaging app (Telegram/Slack/Discord)
```
### Option B: DIY STT+LLM+TTS (full control, production-grade)
```
Caller (phone or WebRTC browser)
↓ Twilio WebSocket OR WebRTC
Voice Server (Node.js)
↓ Deepgram STT (streaming speech-to-text, speaker diarization)
↓ Claude API (streaming SSE, sentence-boundary dispatch)
↓ Cartesia / OpenAI TTS (text-to-speech, low latency)
↓ Function calls during conversation
GBrain MCP (semantic search, page reads, page writes)
↓ Post-call
Brain page + audio upload + transcript storage
```
**Why v2 (Option B)?** OpenAI Realtime is a black box — you can't control STT
quality, swap LLMs, or debug audio issues. The DIY stack gives you transparent
Deepgram+Claude+TTS with full control over each stage. Trade-off: more integration
work, but you own the pipeline.
**Production-tested v2 architecture (pipeline.mjs, ~250 lines):**
- Streaming SSE from Claude with sentence-boundary TTS dispatch
- 20-turn conversation history cap (prevents context bloat)
- Reconnect logic with exponential backoff on STT/TTS disconnects
- Periodic keepalives to prevent WebSocket timeout
- Audio endpointing for natural turn-taking
- Smart VAD (Silero) as default with push-to-talk fallback
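pipeline.mjs itself is not shown here. The sketch below illustrates three of the behaviors listed above: sentence-boundary dispatch, the 20-turn cap, and reconnect backoff. It assumes Claude's streamed reply reaches you as plain text deltas, and it does not cover parsing the SSE events. `createSentenceDispatcher`, `capHistory`, and `backoffDelay` are hypothetical names. The sentence-boundary rule, base delay, and cap are placeholder values.

```javascript
// Buffer streamed LLM text and hand each complete sentence to TTS as soon as it
// ends, so speech can start before the full reply has arrived.
function createSentenceDispatcher(onSentence) {
  let buffer = '';
  // Placeholder rule: a sentence ends at . ! or ? followed by whitespace.
  // (Abbreviations like "Dr. Smith" will split early.)
  const boundary = /[.!?]+\s+/g;
  return {
    push(delta) {
      buffer += delta;
      let lastEnd = 0;
      let match;
      boundary.lastIndex = 0;
      while ((match = boundary.exec(buffer)) !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(lastEnd, end).trim();
        if (sentence) onSentence(sentence);
        lastEnd = end;
      }
      buffer = buffer.slice(lastEnd); // keep the unfinished tail
    },
    // Call when the stream ends, to speak a final sentence with no trailing space.
    flush() {
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}

// Keep only the most recent turns (1 turn = user message + assistant reply).
function capHistory(messages, maxTurns = 20) {
  const recent = messages.slice(-maxTurns * 2);
  // Assumption: start the window on a user turn so roles stay aligned.
  while (recent.length && recent[0].role !== 'user') recent.shift();
  return recent;
}

// Exponential backoff with full jitter for STT/TTS reconnect attempts.
function backoffDelay(attempt, baseMs = 500, maxMs = 10000) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}
```

Dispatching per sentence rather than per token keeps the TTS input prosodically whole, while still avoiding waiting for the full reply.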
## Opinionated Defaults
These are production-tested defaults from a real deployment. Customize after setup.
@@ -428,7 +458,7 @@ fi
```bash
mkdir -p ~/.gbrain/integrations/twilio-voice-brain
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.8.1","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
```
Tell the user: "Voice-to-brain is fully set up. Your number is [NUMBER]. Here's
@@ -472,6 +502,97 @@ The watchdog restarts the server if it crashes."
- The watchdog (Step 9) handles this automatically
- For a permanent URL: upgrade to ngrok paid ($8/mo) for a static domain, or deploy to Fly.io/Railway instead
**Note on Option B credentials:** If using the DIY pipeline (Option B), you will
also need API keys for your chosen STT provider (e.g., Deepgram) and TTS provider
(e.g., Cartesia, OpenAI TTS). Collect and validate these during Step 2 alongside
the Twilio and OpenAI credentials listed above.
## Critical Production Fixes (v0.8.1)
These are NOT optional. They prevent real production failures discovered in a
deployment handling daily calls.
### Unicode Crash Fix (CRITICAL)
**Problem:** Em dashes (U+2014), arrows (U+2192), and other non-ASCII characters in the
prompt context cause broken surrogate pairs that crash the Twilio WebSocket
connection. Phone calls drop silently.
**Fix:** Replace ALL non-ASCII characters with ASCII equivalents throughout the
entire prompt file before sending to Twilio. This is invisible in development
(browsers handle unicode fine) and catastrophic in production.
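```javascript
// Map common typographic characters to ASCII, then drop anything still non-ASCII.
function sanitizeForTwilio(text) {
  return text
    .replace(/[\u2014\u2013]/g, '--')  // em/en dash
    .replace(/[\u2018\u2019]/g, "'")   // smart single quotes
    .replace(/[\u201C\u201D]/g, '"')   // smart double quotes
    .replace(/\u2192/g, '->')          // right arrow
    .replace(/\u2190/g, '<-')          // left arrow
    .replace(/\u2026/g, '...')         // ellipsis
    .replace(/\u00A0/g, ' ')           // non-breaking space (would otherwise merge words)
    .normalize('NFKD')                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')   // drop combining marks so accented names keep their letters
    .replace(/[^\x00-\x7F]/g, '');     // strip any remaining non-ASCII
}
```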
```
### PII Scrub from Voice Context (CRITICAL)
**Problem:** Brain context loaded into the voice prompt may contain phone numbers,
email addresses, and other PII. The voice agent reads these aloud to callers.
**Fix:** Regex-strip PII from all voice context before injecting into the prompt:
- Phone numbers: `/\+?\d[\d\s\-().]{7,}\d/g`
- Email addresses: `/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g`
- URLs with auth tokens or API keys
- Any string matching common credential patterns
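The sketch below applies the two regexes above to a single context string. The URL-token parameter names and the `sk-`/`pk-`/`rk-` credential shape are placeholder assumptions, so extend them for the services you actually use. The phone pattern as written also matches ISO dates such as `2026-04-11`, and calendar context is full of those. For that reason, this sketch keeps any match that begins with a date.

```javascript
const PHONE_RE = /\+?\d[\d\s\-().]{7,}\d/g;
const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
// Matches that start with YYYY-MM-DD are treated as dates, not phone numbers.
const DATE_PREFIX_RE = /^\d{4}-\d{2}-\d{2}\b/;
// Assumed list of query parameters that carry secrets in URLs.
const SECRET_PARAM_RE = /([?&](?:token|access_token|api_key|apikey|key|sig|signature|secret|password)=)[^&#\s]+/gi;
// Assumed credential shape: "sk-", "pk-", or "rk-" followed by 16+ key characters.
const CREDENTIAL_RE = /\b(?:sk|pk|rk)[-_][A-Za-z0-9_-]{16,}\b/g;

function scrubPII(text) {
  return text
    .replace(EMAIL_RE, '[email]')                 // first, so digits in addresses aren't read as phones
    .replace(SECRET_PARAM_RE, '$1[redacted]')     // keep the parameter name, hide the value
    .replace(CREDENTIAL_RE, '[credential]')
    .replace(PHONE_RE, (m) => (DATE_PREFIX_RE.test(m) ? m : '[phone]'));
}
```

Run this on each context section before it is injected into the voice prompt, and before `sanitizeForTwilio`.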
### Identity-First Prompt (IMPORTANT)
**Problem:** Voice agents lose their identity mid-conversation. A "You are NOT
Claude" line buried after the context doesn't stick. The model reverts to its base persona.
**Fix:** Put identity FIRST in the system prompt, before any context or rules:
```
# You ARE [Agent Name]
You are [Name], a voice assistant who works with [Brain Name].
You are NOT Claude. You are NOT a general AI assistant.
[Name] has their own personality: [traits].
# Context
[... brain context, calendar, tasks ...]
# Rules
[... behavioral rules ...]
```
Putting identity before context makes it the first thing the model reads, which
helps it hold the persona for the rest of the conversation.
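The template above can be assembled in code roughly as follows. This is a minimal sketch, and the parameter names are hypothetical. It only fixes the section order (identity, then context, then rules). You would still run the result through the PII scrub and `sanitizeForTwilio` described in this section.

```javascript
// Build the system prompt in identity -> context -> rules order.
function buildVoicePrompt({ name, brainName, traits, context, rules }) {
  return [
    `# You ARE ${name}`,
    `You are ${name}, a voice assistant who works with ${brainName}.`,
    'You are NOT Claude. You are NOT a general AI assistant.',
    `${name} has their own personality: ${traits.join(', ')}.`,
    '',
    '# Context',
    context,
    '',
    '# Rules',
    ...rules.map((rule) => `- ${rule}`),
  ].join('\n');
}

const prompt = buildVoicePrompt({
  name: 'Ava',
  brainName: 'Brain',
  traits: ['curious', 'warm'],
  context: 'Calendar: standup at 10',
  rules: ['Keep answers short'],
});
```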
### Auto-Upload Call Audio (RECOMMENDED)
**Problem:** If post-call processing fails, the call audio is lost forever.
**Fix:** Auto-upload ALL call audio immediately on call end:
- Twilio calls: download the MP3 from the call's Twilio recording URL
- WebRTC calls: capture via MediaRecorder (webm/opus format)
- Upload via `gbrain files upload-raw <audio-file> --page meetings/call-slug --type call-recording`
- GBrain auto-routes: small text files stay in git; large and media files (including
call audio) go to cloud storage with a `.redirect.yaml` pointer. Files >= 100 MB use TUS resumable upload.
- Generate signed URLs for playback: `gbrain files signed-url <storage-path>`
- This ensures every call has a recoverable audio source regardless
of whether the transcript or brain page was created successfully
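`gbrain files upload-raw` already handles size routing, so you normally don't write this yourself. The sketch below only illustrates the cloud-upload thresholds from the v0.9.0 notes: standard upload under 100 MB, and TUS resumable in 6 MB chunks at or above it. `planUpload` and `remainingRanges` are hypothetical helpers, and treating 1 MB as 1024 * 1024 bytes is an assumption. On retry, a TUS client sends a HEAD request and resumes from the server's `Upload-Offset`, not from its own local count.

```javascript
const MB = 1024 * 1024; // assumption: binary megabytes
const TUS_THRESHOLD = 100 * MB;
const TUS_CHUNK_SIZE = 6 * MB;

// Pick the upload method by size, mirroring the documented thresholds.
function planUpload(sizeBytes) {
  if (sizeBytes < TUS_THRESHOLD) return { method: 'standard' };
  return { method: 'tus', chunkSize: TUS_CHUNK_SIZE, chunks: Math.ceil(sizeBytes / TUS_CHUNK_SIZE) };
}

// Byte ranges [start, end) still to send, starting at the offset the server
// reported (the Upload-Offset header from a TUS HEAD request).
function remainingRanges(sizeBytes, chunkSize, serverOffset = 0) {
  const ranges = [];
  for (let start = serverOffset; start < sizeBytes; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, sizeBytes)]);
  }
  return ranges;
}
```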
### Smart VAD as Default
**Problem:** Push-to-talk is unnatural on phone calls. Server-side VAD has
variable quality.
**Fix:** Default to Smart VAD (Silero VAD) for voice activity detection:
- Better endpointing than server-side VAD
- Fewer false triggers in noisy environments
- PTT available as fallback (UI toggle for WebRTC clients)
- Presets: quiet (0.7 threshold), normal (0.85), noisy (0.95), very_noisy (0.98)
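The presets read as speech-probability thresholds: in noisier rooms, a frame must look more confidently like speech before it counts. The sketch below shows how a per-frame probability (as a Silero-style VAD produces) could drive endpointing. It is a minimal sketch under stated assumptions. `createEndpointer` is hypothetical, and the 32 ms frame length and 600 ms silence window are placeholder values, not tuned defaults.

```javascript
const VAD_PRESETS = { quiet: 0.7, normal: 0.85, noisy: 0.95, very_noisy: 0.98 };

// Returns a per-frame callback that reports true once the caller has spoken
// and then stayed below the preset threshold for `silenceMs`.
function createEndpointer(preset = 'normal', { frameMs = 32, silenceMs = 600 } = {}) {
  const threshold = VAD_PRESETS[preset];
  if (threshold === undefined) throw new Error(`unknown VAD preset: ${preset}`);
  let heardSpeech = false;
  let silentFor = 0;
  return function onFrame(speechProb) {
    if (speechProb >= threshold) {
      heardSpeech = true;
      silentFor = 0;
      return false;
    }
    if (!heardSpeech) return false; // background noise before any speech
    silentFor += frameMs;
    if (silentFor >= silenceMs) {
      heardSpeech = false; // reset for the next turn
      silentFor = 0;
      return true; // end of the caller's turn
    }
    return false;
  };
}
```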
## Production Patterns (Recommended)
These patterns come from a production voice deployment handling real calls daily.
@@ -488,13 +609,13 @@ AI brain. "I work with [Brain], [Owner]'s AI." Lighter, more playful, more curio
#### Pre-Computed Bid System
**Problem:** Dead air kills engagement. Voice agents wait passively.
**Pattern:** At call start, scan live context and pre-compute up to 10 engagement bids.
Two types: informative (tasks, calendar, social radar) and relational (curiosity templates).
Two types: informative (tasks, calendar, social monitoring) and relational (curiosity templates).
Bids go INTO the prompt so the agent picks from a list. Use bids #1 and #2 for greeting,
cycle the rest during conversation. Never ask "anything else?" — bring up the next bid.
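A minimal sketch of the bid list follows, assuming live context arrives as simple string arrays. The field names and phrasings are hypothetical. Informative bids are placed ahead of relational ones here, but the pattern above doesn't prescribe an order.

```javascript
// Build up to `max` bids from live context at call start.
function buildBids({ tasks = [], events = [], socialItems = [], curiosityTemplates = [] }, max = 10) {
  const informative = [
    ...events.map((e) => ({ type: 'informative', text: `You have ${e} coming up.` })),
    ...tasks.map((t) => ({ type: 'informative', text: `Still on your list: ${t}.` })),
    ...socialItems.map((s) => ({ type: 'informative', text: `On social: ${s}.` })),
  ];
  const relational = curiosityTemplates.map((q) => ({ type: 'relational', text: q }));
  return [...informative, ...relational].slice(0, max);
}

// Render bids as a numbered prompt section the agent picks from.
function renderBids(bids) {
  return [
    '# Engagement bids',
    'Use bids 1 and 2 in your greeting. When a topic winds down, bring up the next unused bid instead of asking "anything else?".',
    ...bids.map((b, i) => `${i + 1}. [${b.type}] ${b.text}`),
  ].join('\n');
}
```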
#### Context-First Prompt
**Problem:** Voice agent greets generically because it doesn't know what's happening today.
**Pattern:** Load live context at call start: tasks, calendar, location, social radar,
**Pattern:** Load live context at call start: tasks, calendar, location, social monitoring,
morning briefing. Position context FIRST in the prompt (before rules) so the model sees
it immediately and uses it in the greeting. Try/catch per section. Cap 500-1000 chars each.
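"Try/catch per section" means one failing source, such as a calendar API outage, should drop only that section rather than the whole context. A minimal sketch follows, assuming synchronous loaders for brevity; with async loaders, `Promise.allSettled` gives the same isolation. The 800-character cap is an arbitrary value inside the 500-1000 range above.

```javascript
// Load each context section independently, skip failures, and cap each section's length.
function loadContext(sections, capChars = 800) {
  const parts = [];
  for (const { name, load } of sections) {
    let text;
    try {
      text = String(load() ?? '').trim();
    } catch {
      continue; // this section failed; the call still starts with the others
    }
    if (!text) continue;
    if (text.length > capChars) text = text.slice(0, capChars - 3) + '...';
    parts.push(`## ${name}\n${text}`);
  }
  return parts.join('\n\n');
}
```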
@@ -658,7 +779,7 @@ over WebRTC data channel — use Whisper post-call instead.
| Keyword | Report Loaded |
|---------|--------------|
| email, inbox, mail | inbox sweep report |
| social, twitter, mentions | social radar report |
| social, twitter, mentions | social engagement report |
| briefing, morning | morning briefing |
| meeting | meeting sync report |
| slack | slack scan report |
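The table can be applied as a keyword router, sketched minimally below. It uses the renamed "social engagement report". Matching on whole words, case-insensitively, and accepting an optional plural "s" are assumptions. Whole-word matching stops "mail" from firing inside unrelated words.

```javascript
const REPORT_ROUTES = [
  { keywords: ['email', 'inbox', 'mail'], report: 'inbox sweep report' },
  { keywords: ['social', 'twitter', 'mentions'], report: 'social engagement report' },
  { keywords: ['briefing', 'morning'], report: 'morning briefing' },
  { keywords: ['meeting'], report: 'meeting sync report' },
  { keywords: ['slack'], report: 'slack scan report' },
];

// Return the reports to load for one caller utterance, in table order.
function reportsFor(utterance) {
  return REPORT_ROUTES
    .filter(({ keywords }) => new RegExp(`\\b(?:${keywords.join('|')})s?\\b`, 'i').test(utterance))
    .map(({ report }) => report);
}
```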


@@ -1,8 +1,8 @@
---
id: x-to-brain
name: X-to-Brain
version: 0.7.0
description: Twitter timeline, mentions, and keyword monitoring flow into brain pages. Tracks deletions and engagement velocity.
version: 0.8.1
description: Twitter timeline, mentions, and keyword monitoring flow into brain pages. Tracks deletions, engagement velocity, OCR on images, and real-time alerts.
category: sense
requires: []
secrets:
@@ -201,7 +201,99 @@ The agent should review collected data 2-3x daily and run enrichment.
```bash
mkdir -p ~/.gbrain/integrations/x-to-brain
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok","details":{"user_id":"X_USER_ID"}}' >> ~/.gbrain/integrations/x-to-brain/heartbeat.jsonl
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.8.1","status":"ok","details":{"user_id":"X_USER_ID"}}' >> ~/.gbrain/integrations/x-to-brain/heartbeat.jsonl
```
## Production Patterns (v0.8.1)
These patterns come from a production deployment tracking 19+ accounts with
real-time monitoring.
### Image OCR (NEW)
**Problem:** Text-only collection misses visual context in tweet images --
screenshots, charts, memes with text overlay, quote screenshots.
**Fix:** Run OCR on tweet images via a vision model (Claude Sonnet or equivalent):
- For every tweet with images, extract full text content via vision API
- Store OCR output alongside the tweet data
- Include extracted text in entity detection and brain page updates
- Charts/data visualizations: extract data points, describe findings
This catches signal that text-only collectors miss entirely.
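The sketch below shows the two halves of this step. First, it finds photo URLs in an X API v2 response, assuming the response was fetched with `expansions=attachments.media_keys` and `media.fields=url,type`. Second, it builds an Anthropic Messages API request body that asks a vision model to transcribe one image. You would POST that body to `/v1/messages` with the `x-api-key` and `anthropic-version` headers. The model ID is a placeholder. The sketch uses a URL image source; if that isn't available to you, download the image and send it base64-encoded instead.

```javascript
// Map tweet ID -> photo URLs (videos and GIFs are skipped).
function photoUrlsByTweet(response) {
  const media = new Map((response.includes?.media ?? []).map((m) => [m.media_key, m]));
  const result = {};
  for (const tweet of response.data ?? []) {
    const urls = (tweet.attachments?.media_keys ?? [])
      .map((key) => media.get(key))
      .filter((m) => m && m.type === 'photo' && m.url)
      .map((m) => m.url);
    if (urls.length) result[tweet.id] = urls;
  }
  return result;
}

// Request body for one OCR call; the model ID is a placeholder.
function buildOcrRequest(imageUrl, model = 'YOUR_VISION_MODEL_ID') {
  return {
    model,
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: [
        { type: 'image', source: { type: 'url', url: imageUrl } },
        { type: 'text', text: 'Transcribe all text in this image verbatim. If it is a chart, list the data points and summarize the finding.' },
      ],
    }],
  };
}
```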
### Real-Time Monitoring via Filtered Stream (NEW)
**Problem:** 30-minute polling means you find out about things 30 minutes late.
For time-sensitive content (engagement spikes, deletions, breaking threads),
that's too slow.
**Fix:** Use Twitter's Filtered Stream API (`GET /2/tweets/search/stream`) for
near-real-time monitoring. Catches outbound tweets within seconds.
**Setup:**
1. Add filter rules: `POST /2/tweets/search/stream/rules` with your tracking terms
2. Open persistent connection: `GET /2/tweets/search/stream`
3. Process tweets as they arrive (no polling delay)
**Requirements:** Basic tier ($200/mo) minimum for Filtered Stream access.
**Use alongside polling:** Stream for real-time alerts, polling for completeness
(stream can drop tweets during disconnects).
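The Filtered Stream sends newline-delimited JSON, plus blank keep-alive lines while idle. Network chunks can split a tweet's JSON anywhere, so a consumer must buffer until each line is complete. A minimal sketch of that parsing step follows, along with the body shape for adding rules. `createStreamParser` and `addRulesBody` are hypothetical helpers. Opening the HTTP connection and reconnecting with backoff are left out.

```javascript
// Feed raw text chunks from the stream; onTweet receives each parsed JSON object.
function createStreamParser(onTweet) {
  let buffer = '';
  return function onChunk(chunk) {
    buffer += chunk;
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      if (!line.trim()) continue; // keep-alive
      onTweet(JSON.parse(line));
    }
  };
}

// Body for POST /2/tweets/search/stream/rules.
function addRulesBody(rules) {
  return { add: rules.map(({ value, tag }) => ({ value, tag })) };
}
```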
### Tweet Rating Rubric (NEW)
**Problem:** Not all tweets deserve the same attention. Without scoring, every
tweet gets equal weight.
**Fix:** Rate tweets on a 6-dimension rubric:
1. **Reach** -- follower count, engagement rate
2. **Relevance** -- connection to your interests/work
3. **Sentiment** -- positive/negative/neutral toward you
4. **Novelty** -- new information vs rehash
5. **Actionability** -- does this require a response?
6. **Virality potential** -- engagement velocity, quote-tweet ratio
Re-rate after 60 minutes to track engagement trajectory. A tweet at 50 likes
that hits 500 in an hour is a different signal than one that stays at 50.
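The sketch below shows one way to turn the rubric into a number, plus the engagement velocity used when re-rating. It assumes each dimension is scored 1-5 for how much attention it warrants on that axis, and it weights all dimensions equally by default. Both are assumptions, not part of the rubric. With these helpers, the example above (50 likes growing to 500 in an hour) comes out at 7.5 likes per minute.

```javascript
const DIMENSIONS = ['reach', 'relevance', 'sentiment', 'novelty', 'actionability', 'virality'];

// Weighted average of 1-5 scores across all six dimensions.
function rateTweet(scores, weights = {}) {
  let total = 0;
  let weightSum = 0;
  for (const dim of DIMENSIONS) {
    if (typeof scores[dim] !== 'number') throw new Error(`missing score: ${dim}`);
    const w = weights[dim] ?? 1;
    total += scores[dim] * w;
    weightSum += w;
  }
  return total / weightSum;
}

// Likes gained per minute between two snapshots ({ likes, at } with `at` in ms).
function engagementVelocity(before, after) {
  const minutes = (after.at - before.at) / 60000;
  return minutes > 0 ? (after.likes - before.likes) / minutes : 0;
}
```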
### Outbound Tweet Monitoring (NEW)
**Problem:** You tweet something and don't notice engagement patterns until
hours later.
**Fix:** 60-second monitoring window after every outbound tweet:
- Check engagement velocity (likes, replies, quotes)
- Flag unusual reply-to-like ratios (high reply ratios signal controversy)
- Flag if quote-tweet ratio > retweet ratio (commentary, not sharing)
- Cross-reference mentioned accounts against brain for context
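The two ratio checks above could be sketched as follows. The input is the tweet's `public_metrics` object, as returned by `GET /2/tweets/:id` with `tweet.fields=public_metrics`, fetched at the end of the 60-second window. The 0.5 reply-to-like cutoff is a hypothetical default, so tune it against your own account's baseline.

```javascript
// Flag an outbound tweet whose engagement mix suggests controversy or commentary.
function outboundFlags(metrics, { replyToLikeMax = 0.5 } = {}) {
  const { like_count: likes, reply_count: replies, retweet_count: retweets, quote_count: quotes } = metrics;
  const flags = [];
  // With zero likes, any replies at all count as a high ratio.
  const replyRatio = likes > 0 ? replies / likes : (replies > 0 ? Infinity : 0);
  if (replyRatio > replyToLikeMax) flags.push('high-reply-ratio');
  if (quotes > retweets) flags.push('quotes-exceed-retweets');
  return flags;
}
```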
### X-to-Brain Pipeline (NEW)
Every tweet interaction can automatically create/update brain pages:
- Mentioned person has a brain page? Append to their timeline
- New person mentioned? Check notability gate, create page if notable
- Article URL in tweet? Fetch and ingest via article workflow
- Video URL in tweet? Queue for transcription pipeline
- Images? OCR and extract text content
Follow `skills/_brain-filing-rules.md` for filing decisions.
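The routing rules above can be sketched as a function that turns one tweet into a list of brain actions. This is a hypothetical sketch. The simplified tweet shape (`mentions`, `urls`, `images`) could be filled from X API v2 `entities.mentions[].username`, `entities.urls[].expanded_url`, and the photo lookup used for OCR. `hasPage` and `isNotable` stand in for your brain lookup and notability gate. The list of video hosts is an assumption.

```javascript
const VIDEO_HOSTS = ['youtube.com', 'youtu.be', 'vimeo.com'];

// Decide what to do with one tweet; executing the actions is left to the caller.
function planBrainActions(tweet, { hasPage, isNotable }) {
  const actions = [];
  for (const handle of tweet.mentions ?? []) {
    if (hasPage(handle)) actions.push({ kind: 'append-timeline', handle });
    else if (isNotable(handle)) actions.push({ kind: 'create-page', handle });
  }
  for (const url of tweet.urls ?? []) {
    const host = new URL(url).hostname.replace(/^www\./, '');
    const isVideo = VIDEO_HOSTS.some((h) => host === h || host.endsWith(`.${h}`));
    actions.push({ kind: isVideo ? 'queue-transcription' : 'ingest-article', url });
  }
  for (const imageUrl of tweet.images ?? []) actions.push({ kind: 'ocr', imageUrl });
  return actions;
}
```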
### Cron Staggering (IMPORTANT)
**Problem:** Multiple cron jobs firing simultaneously causes resource contention
and timeouts.
**Fix:** Stagger all collection schedules so that at most one job starts in any given minute:
```
# Good: staggered
*/30 * * * * x-collector # :00, :30
5,35 * * * * x-bundle-ingest # :05, :35
10 */3 * * * social-monitor # :10 every 3h
# Bad: overlapping
*/30 * * * * x-collector
*/30 * * * * x-bundle-ingest # fires at same time!
```
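Overlaps like the bad example are easy to miss once there are more than a few jobs. The sketch below checks a schedule list for minutes where more than one job starts. It is a minimal, hypothetical helper. It reads only the minute and hour fields, assumes every job runs daily, and does not understand names like `MON` or shortcuts like `@hourly`.

```javascript
// Expand one cron field ("*", "*/n", "a-b", "a-b/n", "n", "n/step", or comma lists)
// into the set of values it fires on.
function expandField(field, min, max) {
  const values = new Set();
  for (const part of field.split(',')) {
    const [range, stepStr] = part.split('/');
    const step = stepStr ? Number(stepStr) : 1;
    let lo;
    let hi;
    if (range === '*') {
      lo = min;
      hi = max;
    } else if (range.includes('-')) {
      [lo, hi] = range.split('-').map(Number);
    } else {
      lo = Number(range);
      hi = stepStr ? max : lo; // "5/15" means 5, 20, 35, 50
    }
    for (let v = lo; v <= hi; v += step) values.add(v);
  }
  return values;
}

// Return [HH:MM, jobNames] for every slot in a day where more than one job starts.
function findCollisions(jobs) {
  const slots = new Map();
  for (const { name, schedule } of jobs) {
    const [minuteField, hourField] = schedule.trim().split(/\s+/);
    for (const h of expandField(hourField, 0, 23)) {
      for (const m of expandField(minuteField, 0, 59)) {
        const key = `${String(h).padStart(2, '0')}:${String(m).padStart(2, '0')}`;
        if (!slots.has(key)) slots.set(key, []);
        slots.get(key).push(name);
      }
    }
  }
  return [...slots].filter(([, names]) => names.length > 1);
}
```

Running this over the schedules in `crontab -l` before adding a new collector shows whether it needs a different offset.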
## Implementation Guide