Ingest Skill
Ingest meetings, articles, media, documents, and conversations into the brain.
Filing rule: Read skills/_brain-filing-rules.md before creating any new page.
Iron Law: Back-Linking (MANDATORY)
Every mention of a person or company with a brain page MUST create a back-link
FROM that entity's page TO the page mentioning them. An unlinked mention is a
broken brain. See skills/_brain-filing-rules.md for format.
Citation Requirements (MANDATORY)
Every fact written to a brain page must carry an inline [Source: ...] citation.
- User's statements: [Source: User, {context}, YYYY-MM-DD]
- Meeting data: [Source: Meeting "{title}", YYYY-MM-DD]
- Email/message: [Source: email from {name} re: {subject}, YYYY-MM-DD]
- Web content: [Source: {publication}, {URL}, YYYY-MM-DD]
- Social media: [Source: X/@handle, YYYY-MM-DD](URL) (include link)
- Synthesis: [Source: compiled from {sources}]
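The citation formats above are regular enough to generate from code. A minimal sketch, assuming hypothetical helper names (`cite_user`, `cite_meeting`, `cite_web`, `cite_social`) that are not part of gbrain:

```python
from datetime import date


def cite_user(context: str, when: date) -> str:
    """Citation for something the user said directly."""
    return f"[Source: User, {context}, {when.isoformat()}]"


def cite_meeting(title: str, when: date) -> str:
    """Citation for data taken from a meeting."""
    return f'[Source: Meeting "{title}", {when.isoformat()}]'


def cite_web(publication: str, url: str, when: date) -> str:
    """Citation for web content."""
    return f"[Source: {publication}, {url}, {when.isoformat()}]"


def cite_social(handle: str, when: date, url: str) -> str:
    """Citation for a social media post; the link is mandatory."""
    return f"[Source: X/@{handle}, {when.isoformat()}]({url})"


print(cite_meeting("Q3 planning", date(2026, 4, 11)))
```

Generating citations from one place keeps the date format and bracket style consistent across pages.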
Workflow
- Parse the source. Extract people, companies, dates, and events from the input.
- For each entity mentioned:
  - Read the entity's page from gbrain to check if it exists
  - If exists: update compiled_truth (rewrite State section with new info, don't append)
  - If new: check notability gate, then store the page in gbrain with the appropriate type and slug
- Append to timeline. Add a timeline entry in gbrain for each event, with date, summary, and source citation.
- Create cross-reference links. Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
- Back-link all entities. Update EVERY mentioned entity's page with a back-link to this page (Iron Law).
- Timeline merge. The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
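The back-link and timeline-merge steps amount to a fan-out: one event, written to every mentioned entity. The sketch below models this with in-memory dictionaries only. The `propagate_event` name and the dict layout are illustrative assumptions, not gbrain's storage model.

```python
from collections import defaultdict


def propagate_event(timelines, backlinks, event, entity_slugs, source_page):
    """Add the same event to every entity's timeline and record a back-link
    from each entity page to the page that mentioned it."""
    for slug in entity_slugs:
        timelines[slug].append(event)
        # ISO dates sort lexicographically, so this keeps newest entries first.
        timelines[slug].sort(key=lambda e: e["date"], reverse=True)
        backlinks[slug].add(source_page)


timelines = defaultdict(list)
backlinks = defaultdict(set)
event = {
    "date": "2026-04-11",
    "summary": "Alice met Bob at Acme Corp",
    "source": '[Source: Meeting "Intro", 2026-04-11]',
}
propagate_event(
    timelines,
    backlinks,
    event,
    ["people/alice", "people/bob", "companies/acme-corp"],
    "meetings/2026-04-11-intro",
)
```

After the call, all three entity pages carry the event and a back-link to the meeting page, which is the state the Iron Law requires.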
Entity Detection on Every Message
Production agents should detect entity mentions on EVERY inbound message. This is the signal detection loop that makes the brain compound over time.
Protocol
- Scan the message for entity mentions: people, companies, concepts, original thinking. Fire on every message (no exceptions unless purely operational).
- For each entity detected:
  - gbrain search "name" -- does a page already exist?
  - If yes: load context with gbrain get <slug>. Use the compiled truth to inform your response. Update the page if the message contains new information.
  - If no: assess notability (see skills/_brain-filing-rules.md). If the entity is worth tracking, create a new page with gbrain put <type/slug> and populate with what you know.
- After creating or updating pages: sync to gbrain: gbrain sync --no-pull --no-embed
- Don't block the conversation. Entity detection and enrichment should happen alongside the response, not before it. The user shouldn't wait for brain writes to get an answer.
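The protocol above could be wired up roughly as follows. This is a sketch, not gbrain's own integration. Only the subcommands (`search`, `get`, `put`, `sync --no-pull --no-embed`) come from the text above. The `handle_entity` helper, the injectable `run` callable, and the rule that empty `gbrain search` output means "no page" are assumptions. How page content reaches `gbrain put` is not specified here, so the sketch passes the slug only.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_gbrain(args: List[str]) -> str:
    """Run a gbrain subcommand and return its stdout (needs gbrain on PATH)."""
    result = subprocess.run(
        ["gbrain", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def handle_entity(name: str, slug: str, notable: bool, run: Runner = run_gbrain) -> str:
    """Apply the per-entity steps: search, then load or create."""
    hits = run(["search", name])
    if hits.strip():  # ASSUMPTION: non-empty output means a page exists
        run(["get", slug])  # load compiled truth to inform the response
        return "exists"
    if notable:
        run(["put", slug])  # ASSUMPTION: content is supplied separately
        return "created"
    return "skipped"


def sync(run: Runner = run_gbrain) -> None:
    """Sync after creating or updating pages."""
    run(["sync", "--no-pull", "--no-embed"])
```

To honor "don't block the conversation", an agent would schedule `handle_entity` and `sync` in the background rather than calling them before replying.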
What counts as notable
- People the user interacts with or discusses (not random mentions)
- Companies relevant to the user's work or interests
- Concepts or frameworks the user references or creates
- The user's own original thinking (ideas, theses, observations) -- highest value
- See skills/_brain-filing-rules.md for the full notability gate
What to capture from the user's own thinking
Original thinking is the most valuable signal. Capture exact phrasing -- the user's language IS the insight. Don't paraphrase.
- Novel observations or theses
- Frameworks, mental models, heuristics
- Connections between ideas that others miss
- Contrarian positions with reasoning
- Strong reactions to external stimuli (what triggered it and why)
Media Workflows
Content the user encounters should be captured in the brain. File by PRIMARY
SUBJECT, not by format (see skills/_brain-filing-rules.md).
Articles & Web Content
Input: URL shared by user, or article mentioned in conversation.
Process:
- Fetch content (web_fetch or equivalent)
- Extract: title, author, publication, date, full text
- Summarize: executive summary + key arguments (not a rehash)
- Extract entities: people, companies, concepts mentioned
- Save raw source for provenance (see Raw Source Preservation below)
- Analyze for the user: don't just summarize. What's interesting given what you know about them? Flag connections, contradictions, content opportunities.
Write to: appropriate directory per filing rules (about a person -> people/,
about a company -> companies/, reusable framework -> concepts/, raw data -> sources/)
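The routing rule in the "Write to" line can be expressed as a lookup keyed on primary subject. The subject labels (`person`, `company`, `framework`, `raw_data`) and the `target_dir` helper are hypothetical names chosen for illustration. The directories are the ones listed above.

```python
SUBJECT_DIRS = {
    "person": "people/",
    "company": "companies/",
    "framework": "concepts/",
    "raw_data": "sources/",
}


def target_dir(primary_subject: str) -> str:
    """Pick the directory by what the content is about, never by its format."""
    try:
        return SUBJECT_DIRS[primary_subject]
    except KeyError:
        raise ValueError(f"no filing rule for subject: {primary_subject!r}")


print(target_dir("company"))
```

Note that the function takes no format argument at all: an article, a PDF, and a video about the same company all land in companies/.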
Videos & Podcasts
Input: URL (YouTube, podcast, etc.) or local audio/video file.
Process:
- Get transcript -- speaker-diarized if possible (services like Diarize.io provide speaker-labeled, word-level timing)
- Save raw transcript (both JSON and human-readable TXT)
- Analyze: executive summary, key ideas, key quotes with speaker attribution, notable stories/anecdotes, people and companies mentioned
- Extract and cross-reference all entities mentioned
- HARD RULE: every video/podcast brain page MUST link to the raw diarized transcript. A page without transcript links is incomplete.
Write to: media/videos/ or media/podcasts/ with back-links to all entities.
Quality bar:
- Compelling headline (not "This video discusses...")
- Executive summary that makes you want to watch/listen
- Key Ideas as actual insights, not topic labels
- Verbatim quotes with real speaker names (not "speaker_0")
- All entities extracted with context and back-linked
PDFs & Documents
Input: File path or URL.
Process:
- Extract text (OCR if scanned/image PDF)
- Save raw source for provenance
- Summarize: executive summary + key sections + notable data
- Extract entities
- Cross-reference from entity pages
Write to: per filing rules (file by primary subject, not format).
Screenshots & Images
Input: Image file.
Process:
- Analyze content (OCR for text-heavy images, description for photos)
- If tweet screenshot: extract text, author, date, route to social media workflow
- If article screenshot: extract text, route to article workflow
- If data/chart: extract data points, describe findings
Write to: depends on content -- route to the appropriate workflow above.
Meeting Transcripts
Input: Transcript from meeting recording service, or manual notes.
Process:
- Pull full transcript (source of truth -- AI summaries are medium-low trust)
- Save raw transcript for provenance
- Write meeting page with YOUR analysis above the line, raw transcript below
- Entity propagation (MANDATORY): for each attendee and company discussed:
  - Update their brain page State section if new info surfaced
  - Append to their Timeline with link to the meeting page
  - Create page if person/company is notable and has no page yet
- A meeting is NOT fully ingested until all entity pages are updated
Write to: meetings/YYYY-MM-DD-short-description.md
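A meeting page path in this shape can be derived from the date and a short description. The sketch below assumes a hypothetical `meeting_path` helper and a simple slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen). The actual slug rules may differ.

```python
import re
from datetime import date


def meeting_path(when: date, description: str) -> str:
    """Build meetings/YYYY-MM-DD-short-description.md from a date and a title."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"meetings/{when.isoformat()}-{slug}.md"


print(meeting_path(date(2026, 4, 11), "Acme Corp: Intro Call!"))
# meetings/2026-04-11-acme-corp-intro-call.md
```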
What makes a good meeting page:
- Reveals the real crux, not a bullet dump
- Connects to existing brain pages (people, companies, deals)
- Flags what changed (status, decisions, new info)
- Names tension or what was left unsaid
- Captures actual dynamic, not performative summary
Social Media Content
Input: Tweet, thread, or social media post.
Process:
- Fetch full content (thread, quote tweets, context)
- If images present: OCR via vision model for full text extraction
- Summarize: what's being said, why it matters, who's involved
- Extract entities and update brain pages
- Include direct link to the original post (MANDATORY for citations)
Write to: media/x/ for daily aggregation, or entity-specific directories
if the post is primarily about a person/company.
Raw Source Preservation
Every ingested item must have its raw source preserved for provenance.
Use gbrain files upload-raw for automatic size routing:
gbrain files upload-raw <file> --page <page-slug> --type <type>
- < 100 MB text/PDF: stays in git (brain repo .raw/ sidecar directories)
- >= 100 MB OR media (video, audio, images): uploaded to cloud storage via TUS resumable upload, .redirect.yaml pointer left in the brain repo
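The size routing rule reduces to one predicate. This sketch assumes 1 MB = 1024 * 1024 bytes, which matches the pointer example in this section (524288000 bytes shown as 500 MB), and detects media by MIME prefix. The `storage_route` name and the MIME-based check are illustrative, not how upload-raw is known to decide.

```python
MB = 1024 * 1024  # ASSUMPTION: binary megabytes, consistent with the 500 MB example
CLOUD_THRESHOLD = 100 * MB
MEDIA_PREFIXES = ("video/", "audio/", "image/")


def storage_route(size_bytes: int, mime: str) -> str:
    """Return "cloud" for large or media files, "git" for small text/PDF."""
    if size_bytes >= CLOUD_THRESHOLD or mime.startswith(MEDIA_PREFIXES):
        return "cloud"
    return "git"


print(storage_route(5 * MB, "application/pdf"))  # git
print(storage_route(2 * MB, "audio/mpeg"))       # cloud
```

The condition is an OR: a 2 MB audio clip still goes to cloud storage, because media never stays in git regardless of size.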
The .redirect.yaml pointer format:
target: supabase://brain-files/page-slug/filename.mp4
bucket: brain-files
storage_path: page-slug/filename.mp4
size: 524288000
size_human: 500 MB
hash: sha256:abc123...
mime: video/mp4
uploaded: 2026-04-11T...
type: transcript
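To make the relationship between the fields concrete, the sketch below derives the nine fields shown above from raw bytes and a filename. `build_pointer` is a hypothetical helper for illustration only. In practice gbrain files upload-raw writes the pointer, and its exact formatting of size_human and uploaded may differ.

```python
import hashlib
import mimetypes


def build_pointer(data: bytes, page_slug: str, filename: str, kind: str,
                  uploaded: str, bucket: str = "brain-files") -> dict:
    """Derive pointer fields; target is always bucket + storage_path."""
    storage_path = f"{page_slug}/{filename}"
    mime, _ = mimetypes.guess_type(filename)
    return {
        "target": f"supabase://{bucket}/{storage_path}",
        "bucket": bucket,
        "storage_path": storage_path,
        "size": len(data),
        # ASSUMPTION: whole binary megabytes, as in the 500 MB example
        "size_human": f"{len(data) / (1024 * 1024):.0f} MB",
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        "mime": mime or "application/octet-stream",
        "uploaded": uploaded,  # ISO timestamp supplied by the caller
        "type": kind,
    }


pointer = build_pointer(b"example bytes", "page-slug", "filename.mp4",
                        "transcript", "2026-04-11T00:00:00Z")
for key, value in pointer.items():
    print(f"{key}: {value}")
```

The hash field is what lets a later restore verify that the downloaded file matches what was uploaded.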
Accessing stored files:
- gbrain files signed-url <storage-path> -- generate 1-hour signed URL for viewing/sharing
- gbrain files restore <dir> -- download back to local from cloud storage
Use put_raw_data in gbrain to store raw API responses and metadata (JSON, not binary).
Test Before Bulk
When processing multiple items (batch video ingestion, bulk meeting processing, etc.):
- Test on 3-5 items first. Run in test mode if available.
- Read the actual output. Is the quality good? Are titles compelling (not "This video discusses...")? Are entities extracted and back-linked? Is the format clean?
- Fix what's wrong in the approach/skill, not via one-off patches.
- Only then: bulk execute with throttling, commits every 5-10 items.
The marginal cost of testing 3 items first is near zero. The cost of cleaning up 100 bad pages is enormous.
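The steps above can be sketched as a two-phase runner: process a small sample, stop for review, and only then run the rest in committed chunks. Everything here (`run_batch`, `pilot_then_bulk`, the `approve` callback, the parameter defaults) is a hypothetical illustration of the principle, not an existing gbrain feature.

```python
import time
from typing import Callable, List, Sequence


def run_batch(items: Sequence[str], process: Callable[[str], None],
              commit: Callable[[List[str]], None],
              commit_every: int = 5, delay_s: float = 0.0) -> None:
    """Process items with throttling, committing every commit_every items."""
    pending: List[str] = []
    for item in items:
        process(item)
        pending.append(item)
        if len(pending) == commit_every:
            commit(pending)
            pending = []
        if delay_s:
            time.sleep(delay_s)  # throttle between items
    if pending:
        commit(pending)  # flush the final partial chunk


def pilot_then_bulk(items: Sequence[str], process: Callable[[str], None],
                    commit: Callable[[List[str]], None],
                    approve: Callable[[List[str]], bool],
                    sample_size: int = 3, commit_every: int = 5,
                    delay_s: float = 0.0) -> bool:
    """Run a small sample, ask for approval, then run the remainder."""
    sample, rest = list(items[:sample_size]), list(items[sample_size:])
    run_batch(sample, process, commit, commit_every, delay_s)
    if not approve(sample):  # a human or check reads the actual output
        return False         # fix the skill, then start over
    run_batch(rest, process, commit, commit_every, delay_s)
    return True
```

The `approve` callback is where the "read the actual output" step lives. If it returns False, nothing beyond the sample has been written, so there is little to clean up.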
Quality Rules
- Executive summary in compiled_truth must be updated, not just timeline appended
- State section is REWRITTEN, not appended to. Current best understanding only.
- Timeline entries are reverse-chronological (newest first)
- Every person/company mentioned gets a page if notable (see filing rules)
- Link types: knows, works_at, invested_in, founded, met_at, discussed
- Source attribution: every timeline entry includes [Source: ...] citation
- Back-links: every entity mention creates a back-link (Iron Law)
- Filing: file by primary subject, not format or source (see filing rules)
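Several of these rules are mechanical enough to check in code. The sketch below covers three of them: timeline ordering, a citation on every timeline entry, and the allowed link types. The entry layout (dicts with `date` and `text`) and the function names are assumptions for illustration. The link types are the ones listed above.

```python
ALLOWED_LINK_TYPES = {
    "knows", "works_at", "invested_in", "founded", "met_at", "discussed",
}


def check_timeline(entries):
    """Return a list of rule violations for one page's timeline."""
    problems = []
    dates = [e["date"] for e in entries]
    # ISO dates compare correctly as strings.
    if dates != sorted(dates, reverse=True):
        problems.append("timeline is not newest-first")
    for e in entries:
        if "[Source:" not in e["text"]:
            problems.append(f"missing citation on {e['date']}")
    return problems


def is_valid_link_type(link_type: str) -> bool:
    """True if the link type is one of the six listed in the quality rules."""
    return link_type in ALLOWED_LINK_TYPES


timeline = [
    {"date": "2026-04-11", "text": 'Intro call [Source: Meeting "Intro", 2026-04-11]'},
    {"date": "2026-03-02", "text": "Met at a conference"},
]
print(check_timeline(timeline))
```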
Tools Used
- Read a page from gbrain (get_page)
- Store/update a page in gbrain (put_page)
- Add a timeline entry in gbrain (add_timeline_entry)
- Link entities in gbrain (add_link)
- List tags for a page (get_tags)
- Tag a page in gbrain (add_tag)
- Store raw data in gbrain (put_raw_data)
- Check backlinks in gbrain (get_backlinks)