Enrich Skill
Enrich person and company pages from external sources. Scale effort to importance.
Filing rule: Read skills/_brain-filing-rules.md before creating any new page.
Iron Law: Back-Linking (MANDATORY)
Every mention of a person or company with a brain page MUST create a back-link
FROM that entity's page TO the page mentioning them. An unlinked mention is a
broken brain. See skills/_brain-filing-rules.md for format.
Philosophy
A brain page should read like an intelligence dossier, not a LinkedIn scrape. Facts are table stakes. Texture is the value -- what do they believe, what are they building, what makes them tick, where are they headed.
Citation Requirements (MANDATORY)
Every fact must carry an inline [Source: ...] citation.
Three formats:
- Direct attribution: [Source: User, {context}, YYYY-MM-DD]
- API/external: [Source: {provider} enrichment, YYYY-MM-DD]
- Synthesis: [Source: compiled from {list of sources}]
Source precedence (highest to lowest):
- User's direct statements
- Compiled truth (pre-existing brain synthesis)
- Timeline entries (raw evidence)
- External sources (API enrichment, web search)
When sources conflict, note the contradiction with both citations.
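The three citation formats and the precedence order can be sketched as small helpers. This is an illustrative sketch, not part of gbrain: the function names and the precedence labels are hypothetical.

```python
from datetime import date

# Precedence from the list above, highest first. Labels are hypothetical.
SOURCE_PRECEDENCE = ["user", "compiled_truth", "timeline", "external"]


def cite_user(context: str, when: date) -> str:
    """Direct attribution: [Source: User, {context}, YYYY-MM-DD]."""
    return f"[Source: User, {context}, {when.isoformat()}]"


def cite_external(provider: str, when: date) -> str:
    """API/external: [Source: {provider} enrichment, YYYY-MM-DD]."""
    return f"[Source: {provider} enrichment, {when.isoformat()}]"


def cite_synthesis(sources: list[str]) -> str:
    """Synthesis: [Source: compiled from {list of sources}]."""
    return f"[Source: compiled from {', '.join(sources)}]"


def rank(kind: str) -> int:
    """Lower rank means higher precedence. Conflicts are still recorded
    with both citations; the rank only says which source to trust more."""
    return SOURCE_PRECEDENCE.index(kind)
```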
When To Enrich
Primary triggers
- User mentions an entity in conversation
- Entity appears in a meeting transcript or email
- New contact appears with significant context
- Entity makes news or has a major event
- Any ingest pipeline encounters a notable entity
Do NOT enrich
- Random mentions with no relationship signal
- Bot/spam accounts
- Entities with no substantive connection to the user's work
- Same page enriched within the past week (unless new signal warrants it)
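The skip rules above can be read as a simple gate. A minimal sketch, assuming the caller already knows whether the entity is a bot and whether there is a relationship signal; `should_enrich` and its parameters are hypothetical names, not gbrain APIs.

```python
from datetime import date, timedelta
from typing import Optional

RE_ENRICH_AFTER = timedelta(days=7)  # "within the past week"


def should_enrich(last_enriched: Optional[date], today: date,
                  has_new_signal: bool = False,
                  is_bot: bool = False,
                  has_relationship_signal: bool = True) -> bool:
    """Apply the Do NOT enrich rules to one entity."""
    if is_bot or not has_relationship_signal:
        return False
    if last_enriched is None:
        return True  # never enriched before
    recently = today - last_enriched < RE_ENRICH_AFTER
    # A recent enrichment is only repeated when new signal warrants it.
    return has_new_signal or not recently
```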
Enrichment Tiers
Scale enrichment to importance. Don't waste API calls on low-value entities.
| Tier | Who | Effort | Sources |
|---|---|---|---|
| 1 (key) | Inner circle, close collaborators, key contacts | Full pipeline | All available APIs + deep web research |
| 2 (notable) | Occasional interactions, industry figures | Moderate | Web research + social + brain cross-ref |
| 3 (minor) | Worth tracking, not critical | Light | Brain cross-ref + social lookup if handle known |
The Enrichment Protocol (7 Steps)
Step 1: Identify entities
Extract people, companies, concepts from the incoming signal.
Step 2: Check brain state
For each entity:
- gbrain search "name" -- does a page already exist?
- If yes: UPDATE path (add new signal, update compiled truth if material)
- If no: CREATE path (check notability gate first, then create)
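The branch in Step 2 might look like the sketch below. `search` stands in for running gbrain search "name" and `is_notable` for the notability gate in skills/_brain-filing-rules.md; both are hypothetical callables supplied by the caller. The "skip" outcome is an assumption about what happens when the gate fails.

```python
from typing import Callable, Sequence


def choose_path(name: str,
                search: Callable[[str], Sequence[str]],
                is_notable: Callable[[str], bool]) -> str:
    """Return 'update', 'create', or 'skip' for one entity."""
    if search(name):       # a page already exists
        return "update"
    if is_notable(name):   # notability gate passes
        return "create"
    return "skip"          # assumed outcome when the gate fails
```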
Step 3: Extract signal from source
Don't just capture facts. Capture texture:
| Signal Type | What to Extract |
|---|---|
| Opinions, beliefs | What They Believe section |
| Current projects, features shipped | What They're Building section |
| Ambition, career arc, motivation | What Motivates Them section |
| Topics they return to obsessively | Hobby Horses section |
| Who they amplify, argue with, respect | Network / Relationships |
| Ascending, plateauing, pivoting? | Trajectory section |
| Role, company, funding, location | State section (hard facts) |
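One way to apply the table is to tag each extracted snippet with a signal type and route it to its section. A rough sketch: the short keys are hypothetical, and "Network / Relationships" is mapped to the template's Network section as an assumption.

```python
from collections import defaultdict

# Hypothetical short keys for the signal types in the table above.
SECTION_FOR_SIGNAL = {
    "opinion": "What They Believe",
    "project": "What They're Building",
    "motivation": "What Motivates Them",
    "recurring_topic": "Hobby Horses",
    "relationship": "Network",
    "trajectory": "Trajectory",
    "hard_fact": "State",
}


def group_by_section(signals):
    """signals: iterable of (signal_type, cited_text) pairs."""
    grouped = defaultdict(list)
    for signal_type, text in signals:
        grouped[SECTION_FOR_SIGNAL[signal_type]].append(text)
    return dict(grouped)
```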
Step 4: External data source lookups
Priority order -- stop when you have enough signal for the entity's tier.
4a. Brain cross-reference (always, all tiers)
- gbrain search "name" and gbrain query "what do we know about name"
- Check related pages: company pages for person enrichment and vice versa
- This is free and often the richest source
4b. Web research (Tier 1 and 2)
- Use Perplexity, Brave Search, Exa, or equivalent web research tool
- Key pattern: Send existing brain knowledge as context so the search returns DELTA (what's new vs what you already know), not a rehash
- Opus-class models for Tier 1 deep research, lighter models for Tier 2
4c. Social media lookup (all tiers when handle known)
- Pull recent posts/tweets for tone, interests, current focus
- Social media is the highest-texture signal for what someone actually thinks
4d. People enrichment APIs (Tier 1)
- LinkedIn data, career history, connections, education
4e. Company enrichment APIs (Tier 1)
- Company data, financials, headcount, key hires, recent news
| Data Need | Example Sources | Tier |
|---|---|---|
| Web research | Perplexity, Brave, Exa | 1-2 |
| LinkedIn / career | Crustdata, Proxycurl, People Data Labs | 1 |
| Career history | Happenstance, LinkedIn | 1 |
| Funding / company data | Crunchbase, PitchBook, Clearbit | 1 |
| Social media | Platform APIs, web scraping | 1-3 |
| Meeting history | Calendar/meeting transcript tools | 1-2 |
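The priority order and the "stop when you have enough signal" rule can be sketched as a plan plus an early-exit loop. This is an illustration only: `fetch` and `enough` are hypothetical callables, and no real provider API is called.

```python
# (step, label, tiers that use it), in the priority order of Step 4.
LOOKUPS = [
    ("4a", "brain cross-reference", {1, 2, 3}),
    ("4b", "web research", {1, 2}),
    ("4c", "social media lookup", {1, 2, 3}),
    ("4d", "people enrichment APIs", {1}),
    ("4e", "company enrichment APIs", {1}),
]


def plan_lookups(tier: int, handle_known: bool = False) -> list[str]:
    """List the lookups that apply to a tier, in priority order."""
    steps = []
    for step, label, tiers in LOOKUPS:
        if tier not in tiers:
            continue
        if step == "4c" and not handle_known:
            continue  # social lookup only when the handle is known
        steps.append(label)
    return steps


def run_until_enough(steps, fetch, enough):
    """Run lookups in order; stop once enough(results) is true."""
    results = {}
    for label in steps:
        results[label] = fetch(label)
        if enough(results):
            break
    return results
```

For example, `plan_lookups(3)` yields only the brain cross-reference, which matches the Light effort described for Tier 3.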
Step 5: Save raw data (preserves provenance)
Store raw API responses via put_raw_data in gbrain:
{
"source": "crustdata",
"fetched_at": "2026-04-11T...",
"query": "jane doe",
"data": { ... }
}
Raw data preserves provenance. If the compiled truth is ever questioned, the raw data shows exactly what the API returned.
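A small helper can build that envelope consistently before it is stored. A minimal sketch: `raw_envelope` is a hypothetical name, and how the result is passed to put_raw_data is not shown because that call's signature is not given here.

```python
import json
from datetime import datetime, timezone


def raw_envelope(source: str, query: str, data: dict) -> dict:
    """Wrap an API response with the provenance fields shown above."""
    return {
        "source": source,
        "fetched_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "query": query,
        "data": data,  # the untouched API response
    }


envelope = raw_envelope("crustdata", "jane doe", {"example": True})
payload = json.dumps(envelope)  # JSON string, ready to store as raw data
```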
Step 6: Write to brain
CREATE path
- Check notability gate (see skills/_brain-filing-rules.md)
- Check filing rules -- where does this entity go?
- Create page with the appropriate template (below)
- Fill compiled truth with citations
- Add first timeline entry
- Leave empty sections as [No data yet] (don't fill with boilerplate)
UPDATE path
- Add new timeline entries (reverse-chronological; append-only, so never edit or remove existing entries)
- Update compiled truth ONLY if the new signal materially changes the picture
- Update State section with new facts
- Flag contradictions between new signal and existing compiled truth
- Don't overwrite user-written assessments with API boilerplate
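The timeline rule on the UPDATE path can be sketched as follows, using the entry format from the page templates. The helper names are hypothetical, and the sketch assumes every line starts with `- **YYYY-MM-DD**` so the date can be read from a fixed position.

```python
def timeline_entry(day: str, event: str, citation: str) -> str:
    """Format one entry: - **YYYY-MM-DD** | Event description [Source: ...]"""
    return f"- **{day}** | {event} {citation}"


def add_entry(timeline: list[str], entry: str) -> list[str]:
    """Return a new timeline with entry added, newest first.

    Existing lines are never edited or dropped; duplicates are ignored.
    """
    if entry in timeline:
        return list(timeline)
    # ISO dates sort lexicographically, so the date slice is a valid sort key.
    return sorted(timeline + [entry], key=lambda line: line[4:14], reverse=True)
```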
Person page template
---
title: Full Name
type: person
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: []
company: Current Company
relationship: How the user knows them
email:
linkedin:
twitter:
location:
---
# Full Name
> 1-paragraph executive summary: HOW do you know them, WHY do they matter,
> what's the current state of the relationship.
## State
Role, company, key context. Hard facts only.
## What They Believe
Ideology, first principles, worldview. What hills do they die on?
## What They're Building
Current projects, recent launches, what they're focused on.
## What Motivates Them
Ambition, career arc, what drives them.
## Hobby Horses
Topics they return to obsessively. Recurring themes in their work/posts.
## Assessment
Your read on this person. Strengths, gaps, trajectory.
## Trajectory
Ascending, plateauing, pivoting, declining? Where are they headed?
## Relationship
History of interactions, shared context, relationship quality.
## Contact
Email, social handles, preferred communication channel.
## Network
Key connections, mutual contacts, organizational relationships.
## Open Threads
Active conversations, pending items, things to follow up on.
---
## Timeline
Reverse chronological. Every entry has a date and [Source: ...] citation.
- **YYYY-MM-DD** | Event description [Source: ...]
Company page template
---
title: Company Name
type: company
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: []
---
# Company Name
> 1-paragraph executive summary.
## State
What they do, stage, key people, key metrics, your connection.
## Open Threads
Active items, pending decisions, things to track.
---
## Timeline
- **YYYY-MM-DD** | Event description [Source: ...]
Step 7: Cross-reference
- Update company pages from person enrichment (and vice versa)
- Update related project/deal pages if relevant context surfaced
- Add back-links from every entity mentioned (MANDATORY)
- Check index files if the brain uses them
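The mandatory back-link check can be sketched as a set lookup. This is an illustration with in-memory data: `links_from` is a hypothetical mapping from an entity's page to the pages it already links to, and the slugs in the usage note are made up.

```python
def missing_backlinks(page_slug: str, mentioned: list[str],
                      links_from: dict[str, set[str]]) -> list[str]:
    """Entities mentioned on page_slug whose own page does not link back to it."""
    return [entity for entity in mentioned
            if page_slug not in links_from.get(entity, set())]
```

Each slug returned still needs a back-link from that entity's page to `page_slug` before the enrichment counts as finished.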
Bulk Enrichment Rules
- Test on 3-5 entities first. Read actual output. Check quality.
- Only proceed to bulk after test shots pass your quality bar.
- 3+ entities from one source -> batch process or spawn sub-agent
- Throttle API calls. Respect rate limits.
- Commit every 5-10 entities during bulk runs.
- Save a report after bulk enrichment (see Report Storage below).
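The rules above amount to a test-then-batch loop. A rough sketch, assuming the caller supplies `enrich`, `commit`, and `quality_ok`; all three are hypothetical callables, and `quality_ok` stands in for actually reading the test output.

```python
import time


def bulk_enrich(entities, enrich, commit, quality_ok,
                test_size=3, commit_every=5, delay_s=0.0):
    """Test on a small sample, then process the rest in committed batches."""
    sample, rest = entities[:test_size], entities[test_size:]
    sample_out = [enrich(e) for e in sample]
    if not all(quality_ok(o) for o in sample_out):
        return {"status": "stopped_after_test", "processed": len(sample)}
    commit(sample)
    done = len(sample)
    for start in range(0, len(rest), commit_every):
        batch = rest[start:start + commit_every]
        for entity in batch:
            enrich(entity)
            time.sleep(delay_s)  # crude throttle between API calls
        commit(batch)
        done += len(batch)
    return {"status": "complete", "processed": done}
```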
Validation Rules
- Connection count < 20 on LinkedIn = likely wrong person, skip
- Name mismatch between brain and API = skip, flag for review
- Joke profiles or obviously wrong data = save to raw, don't update page
- Don't overwrite user-written assessments with API boilerplate
- When in doubt: save raw data but don't update brain page
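The first two rules are mechanical enough to sketch as a check. The function name and return values are hypothetical, and the exact, case-insensitive name comparison is a simplification; real matching would likely need to tolerate nicknames and middle names.

```python
def validate(brain_name: str, api_name: str, linkedin_connections=None) -> str:
    """Return 'update', 'skip', or 'flag' for one API result."""
    if linkedin_connections is not None and linkedin_connections < 20:
        return "skip"    # likely the wrong person
    if brain_name.strip().lower() != api_name.strip().lower():
        return "flag"    # name mismatch: skip the update, flag for review
    return "update"
```

In every case the raw response is still worth saving, so a "skip" or "flag" result only blocks the page update.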
Report Storage
After enrichment sweeps, save a report:
- Number of entities processed
- New pages created vs existing updated
- Data sources called and results quality
- Notable discoveries or contradictions
- Validation flags or API failures
This creates an audit trail for brain enrichment over time.
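The report contents listed above could be rendered as a short bulleted summary. A minimal sketch: `render_report` and its parameters are hypothetical, and where the report is saved is left to the caller.

```python
def render_report(processed: int, created: int, updated: int,
                  sources: dict[str, str], discoveries: list[str],
                  flags: list[str]) -> str:
    """Build the report body from the five items listed above."""
    lines = [
        f"- Entities processed: {processed}",
        f"- New pages created: {created}; existing updated: {updated}",
    ]
    lines += [f"- Source {name}: {quality}" for name, quality in sources.items()]
    lines += [f"- Discovery: {d}" for d in discoveries]
    lines += [f"- Flag: {f}" for f in flags]
    return "\n".join(lines)
```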
Tools Used
- Read a page from gbrain (get_page)
- Store/update a page in gbrain (put_page)
- Add a timeline entry in gbrain (add_timeline_entry)
- List pages in gbrain by type (list_pages)
- Store raw API data in gbrain (put_raw_data)
- Retrieve raw data from gbrain (get_raw_data)
- Link entities in gbrain (add_link)
- Check backlinks in gbrain (get_backlinks)