# Enrich Skill

Enrich person and company pages from external sources. Scale effort to importance.

> **Filing rule:** Read `skills/_brain-filing-rules.md` before creating any new page.

## Iron Law: Back-Linking (MANDATORY)

Every mention of a person or company with a brain page MUST create a back-link
FROM that entity's page TO the page mentioning them. An unlinked mention is a
broken brain. See `skills/_brain-filing-rules.md` for format.

## Philosophy

A brain page should read like an intelligence dossier, not a LinkedIn scrape.
Facts are table stakes. Texture is the value -- what do they believe, what are
they building, what makes them tick, where are they headed.

## Citation Requirements (MANDATORY)

Every fact must carry an inline `[Source: ...]` citation.

Three formats:
- **Direct attribution:** `[Source: User, {context}, YYYY-MM-DD]`
- **API/external:** `[Source: {provider} enrichment, YYYY-MM-DD]`
- **Synthesis:** `[Source: compiled from {list of sources}]`

Source precedence (highest to lowest):
1. User's direct statements
2. Compiled truth (pre-existing brain synthesis)
3. Timeline entries (raw evidence)
4. External sources (API enrichment, web search)

When sources conflict, note the contradiction with both citations.

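The citation formats and precedence order above can be sketched in a few lines of Python. This is an illustrative sketch only: `cite_user`, `cite_external`, `uncited_lines`, `preferred`, and the `PRECEDENCE` keys are hypothetical names, not part of gbrain, and the check assumes facts are written as bullet lines.

```python
import re
from datetime import date

# Matches an inline citation such as [Source: User, call notes, 2026-04-11].
CITATION_RE = re.compile(r"\[Source: [^\]]+\]")

# Lower number = higher precedence, following the list above.
# The key names are assumptions made for this sketch.
PRECEDENCE = {"user": 1, "compiled_truth": 2, "timeline": 3, "external": 4}


def cite_user(context: str, day: date) -> str:
    """Direct attribution format."""
    return f"[Source: User, {context}, {day.isoformat()}]"


def cite_external(provider: str, day: date) -> str:
    """API/external format."""
    return f"[Source: {provider} enrichment, {day.isoformat()}]"


def uncited_lines(markdown: str) -> list[str]:
    """Return bullet lines that carry no [Source: ...] citation."""
    return [
        line
        for line in markdown.splitlines()
        if line.lstrip().startswith("- ") and not CITATION_RE.search(line)
    ]


def preferred(claims: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (source_kind, text) claim with the highest precedence."""
    return min(claims, key=lambda claim: PRECEDENCE[claim[0]])
```

`preferred` only decides which claim leads. The losing claim should still be recorded next to it with its own citation, since conflicts are noted rather than silently resolved.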
## When To Enrich

### Primary triggers
- User mentions an entity in conversation
- Entity appears in a meeting transcript or email
- New contact appears with significant context
- Entity makes news or has a major event
- Any ingest pipeline encounters a notable entity

### Do NOT enrich
- Random mentions with no relationship signal
- Bot/spam accounts
- Entities with no substantive connection to the user's work
- Same page enriched within the past week (unless new signal warrants it)

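The exclusion rules above amount to a small gate in front of the pipeline. A minimal sketch, assuming the agent has already judged relationship signal and spam status; the `Candidate` fields and `should_enrich` are hypothetical names, not gbrain APIs.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# "Within the past week" from the rules above.
COOLDOWN = timedelta(days=7)


@dataclass
class Candidate:
    name: str
    has_relationship_signal: bool          # substantive connection to the user's work
    is_bot_or_spam: bool = False
    last_enriched: Optional[date] = None   # None = never enriched
    new_signal: bool = False               # fresh signal that overrides the cooldown


def should_enrich(c: Candidate, today: date) -> bool:
    """Return True only if none of the 'Do NOT enrich' rules apply."""
    if c.is_bot_or_spam or not c.has_relationship_signal:
        return False
    recently = c.last_enriched is not None and today - c.last_enriched < COOLDOWN
    return c.new_signal or not recently
```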
## Enrichment Tiers

Scale enrichment to importance. Don't waste API calls on low-value entities.

| Tier | Who | Effort | Sources |
|------|-----|--------|---------|
| 1 (key) | Inner circle, close collaborators, key contacts | Full pipeline | All available APIs + deep web research |
| 2 (notable) | Occasional interactions, industry figures | Moderate | Web research + social + brain cross-ref |
| 3 (minor) | Worth tracking, not critical | Light | Brain cross-ref + social lookup if handle known |

## The Enrichment Protocol (7 Steps)

### Step 1: Identify entities

Extract people, companies, concepts from the incoming signal.

### Step 2: Check brain state

For each entity:
- `gbrain search "name"` -- does a page already exist?
- **If yes:** UPDATE path (add new signal, update compiled truth if material)
- **If no:** CREATE path (check notability gate first, then create)

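The branch in Step 2 can be written as a tiny routing function. This is a sketch under stated assumptions: `search` stands in for whatever wraps `gbrain search`, `is_notable` stands in for the notability gate, and both, like `choose_path` itself, are hypothetical.

```python
from typing import Callable


def choose_path(
    name: str,
    search: Callable[[str], list[str]],   # assumed wrapper: returns matching page slugs
    is_notable: Callable[[str], bool],    # assumed wrapper around the notability gate
) -> str:
    """Return 'update', 'create', or 'skip' for one entity."""
    if search(name):
        return "update"                   # page exists: add new signal
    return "create" if is_notable(name) else "skip"
```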
### Step 3: Extract signal from source

Don't just capture facts. Capture texture:

| Signal Type | What to Extract |
|-------------|----------------|
| Opinions, beliefs | What They Believe section |
| Current projects, features shipped | What They're Building section |
| Ambition, career arc, motivation | What Motivates Them section |
| Topics they return to obsessively | Hobby Horses section |
| Who they amplify, argue with, respect | Network / Relationships |
| Ascending, plateauing, pivoting? | Trajectory section |
| Role, company, funding, location | State section (hard facts) |

### Step 4: External data source lookups

Priority order -- stop when you have enough signal for the entity's tier.

**4a. Brain cross-reference (always, all tiers)**
- `gbrain search "name"` and `gbrain query "what do we know about name"`
- Check related pages: company pages for person enrichment and vice versa
- This is free and often the richest source

**4b. Web research (Tier 1 and 2)**
- Use Perplexity, Brave Search, Exa, or equivalent web research tool
- **Key pattern:** Send existing brain knowledge as context so the search
  returns DELTA (what's new vs what you already know), not a rehash
- Opus-class models for Tier 1 deep research, lighter models for Tier 2

**4c. Social media lookup (all tiers when handle known)**
- Pull recent posts/tweets for tone, interests, current focus
- Social media is the highest-texture signal for what someone actually thinks

**4d. People enrichment APIs (Tier 1)**
- LinkedIn data, career history, connections, education

**4e. Company enrichment APIs (Tier 1)**
- Company data, financials, headcount, key hires, recent news

| Data Need | Example Sources | Tier |
|-----------|----------------|------|
| Web research | Perplexity, Brave, Exa | 1-2 |
| LinkedIn / career | Crustdata, Proxycurl, People Data Labs | 1 |
| Career history | Happenstance, LinkedIn | 1 |
| Funding / company data | Crunchbase, PitchBook, Clearbit | 1 |
| Social media | Platform APIs, web scraping | 1-3 |
| Meeting history | Calendar/meeting transcript tools | 1-2 |

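The priority order and the "stop when you have enough signal" rule can be sketched as a tier-filtered plan with an early exit. Everything here is illustrative: `LOOKUP_ORDER`, `plan_lookups`, and `run_lookups` are hypothetical names, `fetch` and `enough` are caller-supplied stand-ins for real API calls and the agent's judgment, and the sketch does not distinguish person from company entities.

```python
from typing import Any, Callable

# (step label, tiers it applies to, requires a known social handle)
LOOKUP_ORDER = [
    ("4a brain cross-reference", {1, 2, 3}, False),
    ("4b web research", {1, 2}, False),
    ("4c social media", {1, 2, 3}, True),
    ("4d people enrichment API", {1}, False),
    ("4e company enrichment API", {1}, False),
]


def plan_lookups(tier: int, handle_known: bool) -> list[str]:
    """List the lookup steps that apply to this tier, in priority order."""
    return [
        label
        for label, tiers, needs_handle in LOOKUP_ORDER
        if tier in tiers and (handle_known or not needs_handle)
    ]


def run_lookups(
    tier: int,
    handle_known: bool,
    fetch: Callable[[str], Any],
    enough: Callable[[list], bool],
) -> list[tuple[str, Any]]:
    """Run lookups in order and stop as soon as `enough` is satisfied."""
    results: list[tuple[str, Any]] = []
    for label in plan_lookups(tier, handle_known):
        results.append((label, fetch(label)))
        if enough(results):
            break
    return results
```

Keeping the order in one table makes the "don't waste API calls" rule easy to audit: a Tier 3 entity never reaches the paid enrichment steps.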
### Step 5: Save raw data (preserves provenance)

Store raw API responses via `put_raw_data` in gbrain:
```json
{
  "source": "crustdata",
  "fetched_at": "2026-04-11T...",
  "query": "jane doe",
  "data": { ... }
}
```

Raw data preserves provenance. If the compiled truth is ever questioned,
the raw data shows exactly what the API returned.

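One way to build that envelope before handing it to `put_raw_data` is sketched below. The four field names come from the example above; `raw_record` is a hypothetical helper, and the use of a UTC ISO 8601 timestamp is an assumption of this sketch, not a documented gbrain requirement.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def raw_record(
    source: str,
    query: str,
    data: dict,
    fetched_at: Optional[datetime] = None,
) -> str:
    """Wrap an untouched API response in the provenance envelope."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    envelope = {
        "source": source,
        "fetched_at": fetched_at.isoformat(),
        "query": query,
        "data": data,  # stored verbatim; no cleanup or summarizing here
    }
    return json.dumps(envelope, indent=2)
```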
### Step 6: Write to brain

#### CREATE path

1. Check notability gate (see `skills/_brain-filing-rules.md`)
2. Check filing rules -- where does this entity go?
3. Create page with the appropriate template (below)
4. Fill compiled truth with citations
5. Add first timeline entry
6. Leave empty sections as `[No data yet]` (don't fill with boilerplate)

#### UPDATE path

1. Add new timeline entries (reverse-chronological, append-only)
2. Update compiled truth ONLY if the new signal materially changes the picture
3. Update State section with new facts
4. Flag contradictions between new signal and existing compiled truth
5. Don't overwrite user-written assessments with API boilerplate

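The first UPDATE rule, adding timeline entries without disturbing existing ones, can be sketched as an ordered insert. This assumes entries follow the `- **YYYY-MM-DD** | Event [Source: ...]` shape used in the page templates; `format_entry` and `add_entry` are hypothetical helpers, not gbrain tools (the real write goes through `add_timeline_entry`).

```python
import re

# Matches the leading "- **YYYY-MM-DD** | " of a timeline entry.
ENTRY_RE = re.compile(r"^- \*\*(\d{4}-\d{2}-\d{2})\*\* \| ")


def format_entry(day: str, event: str, source: str) -> str:
    return f"- **{day}** | {event} [Source: {source}]"


def add_entry(timeline: list[str], entry: str) -> list[str]:
    """Return a new timeline with `entry` placed so dates stay newest-first.

    Existing entries are never edited or removed.
    """
    match = ENTRY_RE.match(entry)
    if match is None:
        raise ValueError("entry must start with '- **YYYY-MM-DD** | '")
    day = match.group(1)
    for i, existing in enumerate(timeline):
        found = ENTRY_RE.match(existing)
        # ISO dates sort correctly as plain strings.
        if found and found.group(1) <= day:
            return timeline[:i] + [entry] + timeline[i:]
    return timeline + [entry]
```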
#### Person page template

```markdown
---
title: Full Name
type: person
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: []
company: Current Company
relationship: How the user knows them
email:
linkedin:
twitter:
location:
---

# Full Name

> 1-paragraph executive summary: HOW do you know them, WHY do they matter,
> what's the current state of the relationship.

## State
Role, company, key context. Hard facts only.

## What They Believe
Ideology, first principles, worldview. What hills do they die on?

## What They're Building
Current projects, recent launches, what they're focused on.

## What Motivates Them
Ambition, career arc, what drives them.

## Hobby Horses
Topics they return to obsessively. Recurring themes in their work/posts.

## Assessment
Your read on this person. Strengths, gaps, trajectory.

## Trajectory
Ascending, plateauing, pivoting, declining? Where are they headed?

## Relationship
History of interactions, shared context, relationship quality.

## Contact
Email, social handles, preferred communication channel.

## Network
Key connections, mutual contacts, organizational relationships.

## Open Threads
Active conversations, pending items, things to follow up on.

---

## Timeline
Reverse chronological. Every entry has a date and [Source: ...] citation.
- **YYYY-MM-DD** | Event description [Source: ...]
```

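The CREATE-path rule about empty sections can be sketched against the person template's section headings. `PERSON_SECTIONS` mirrors the template above; `render_sections` is a hypothetical helper and covers only the body sections, not the frontmatter, summary, or timeline.

```python
# Section headings of the person template, in template order.
PERSON_SECTIONS = [
    "State",
    "What They Believe",
    "What They're Building",
    "What Motivates Them",
    "Hobby Horses",
    "Assessment",
    "Trajectory",
    "Relationship",
    "Contact",
    "Network",
    "Open Threads",
]

PLACEHOLDER = "[No data yet]"


def render_sections(content: dict[str, str]) -> str:
    """Render every template section, using the placeholder where nothing is known."""
    parts = []
    for title in PERSON_SECTIONS:
        body = content.get(title, "").strip() or PLACEHOLDER
        parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)
```

Emitting the placeholder instead of guessed text keeps later enrichment passes honest: an empty section is visibly empty and cannot be mistaken for cited fact.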
#### Company page template

```markdown
---
title: Company Name
type: company
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: []
---

# Company Name

> 1-paragraph executive summary.

## State
What they do, stage, key people, key metrics, your connection.

## Open Threads
Active items, pending decisions, things to track.

---

## Timeline
- **YYYY-MM-DD** | Event description [Source: ...]
```

### Step 7: Cross-reference

- Update company pages from person enrichment (and vice versa)
- Update related project/deal pages if relevant context surfaced
- Add back-links from every entity mentioned (MANDATORY)
- Check index files if the brain uses them

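The mandatory back-link check reduces to a set difference between what each page mentions and what each entity page already links back to. A minimal sketch, assuming both maps have already been gathered (for example via `get_backlinks`); the page slugs and `missing_backlinks` are hypothetical.

```python
def missing_backlinks(
    mentions: dict[str, set[str]],    # page -> entity pages it mentions
    backlinks: dict[str, set[str]],   # entity page -> pages it already links back to
) -> list[tuple[str, str]]:
    """Return (entity, page) pairs where the entity page lacks a back-link to `page`."""
    missing = []
    for page, entities in sorted(mentions.items()):
        for entity in sorted(entities):
            if page not in backlinks.get(entity, set()):
                missing.append((entity, page))
    return missing
```

Each returned pair is one link to create FROM the entity's page TO the mentioning page.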
## Bulk Enrichment Rules

- **Test on 3-5 entities first.** Read actual output. Check quality.
- Only proceed to bulk after test shots pass your quality bar.
- 3+ entities from one source -> batch process or spawn sub-agent
- Throttle API calls. Respect rate limits.
- Commit every 5-10 entities during bulk runs.
- Save a report after bulk enrichment (see Report Storage below).

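The test-first, throttle, and periodic-commit rules can be combined into one loop. This is a sketch under stated assumptions: `enrich`, `commit`, and `sample_ok` are caller-supplied callables, the fixed delay is a crude stand-in for real rate-limit handling, and `bulk_enrich` is a hypothetical name.

```python
import time
from typing import Any, Callable


def bulk_enrich(
    entities: list[str],
    enrich: Callable[[str], Any],
    commit: Callable[[int], None],           # receives the number of entities in the batch
    sample_ok: Callable[[list], bool],       # quality review of the test shots
    *,
    sample_size: int = 3,                    # "test on 3-5 entities first"
    commit_every: int = 5,                   # "commit every 5-10 entities"
    delay_s: float = 1.0,                    # assumed throttle between API calls
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return how many entities were processed."""
    sample, rest = entities[:sample_size], entities[sample_size:]
    results = [enrich(entity) for entity in sample]
    if not sample_ok(results):
        return len(results)                  # quality bar not met: stop before bulk
    commit(len(results))

    pending = 0
    for entity in rest:
        sleep(delay_s)
        enrich(entity)
        pending += 1
        if pending == commit_every:
            commit(pending)
            pending = 0
    if pending:
        commit(pending)                      # flush the final partial batch
    return len(entities)
```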
## Validation Rules

- Connection count < 20 on LinkedIn = likely wrong person, skip
- Name mismatch between brain and API = skip, flag for review
- Joke profiles or obviously wrong data = save to raw, don't update page
- Don't overwrite user-written assessments with API boilerplate
- When in doubt: save raw data but don't update brain page

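These checks can be expressed as a single decision function applied before any page write. The sketch assumes the API result has been reduced to a name, a connection count, and a judgment flag for joke or obviously wrong data; `ApiProfile`, `validate`, and the exact-match name comparison are simplifications introduced here, not gbrain behavior.

```python
from dataclasses import dataclass

MIN_CONNECTIONS = 20  # threshold from the first rule above


@dataclass
class ApiProfile:
    name: str
    connections: int
    looks_fake: bool = False   # joke profile or obviously wrong data


def validate(brain_name: str, profile: ApiProfile) -> str:
    """Return 'update', 'skip', 'flag', or 'raw_only'."""
    if profile.looks_fake:
        return "raw_only"      # keep the raw data, leave the page alone
    if profile.name.strip().casefold() != brain_name.strip().casefold():
        return "flag"          # skip the update and flag for review
    if profile.connections < MIN_CONNECTIONS:
        return "skip"          # likely the wrong person
    return "update"
```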
## Report Storage

After enrichment sweeps, save a report:
- Number of entities processed
- New pages created vs existing updated
- Data sources called and results quality
- Notable discoveries or contradictions
- Validation flags or API failures

This creates an audit trail for brain enrichment over time.

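A report with those five items can be accumulated during the sweep and rendered once at the end. The layout below is only one possible shape: `SweepReport`, its field names, and the headings it emits are hypothetical, and where the file is saved is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class SweepReport:
    processed: int = 0
    created: int = 0
    updated: int = 0
    sources: dict[str, str] = field(default_factory=dict)   # source -> quality note
    discoveries: list[str] = field(default_factory=list)
    flags: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        def bullets(items):
            # Fall back to an explicit "None" so empty sections are unambiguous.
            return [f"- {item}" for item in items] or ["- None"]

        lines = [
            "# Enrichment sweep report",
            "",
            f"- Entities processed: {self.processed}",
            f"- New pages created: {self.created}",
            f"- Existing pages updated: {self.updated}",
            "",
            "## Data sources and result quality",
            *bullets(f"{name}: {note}" for name, note in self.sources.items()),
            "",
            "## Notable discoveries or contradictions",
            *bullets(self.discoveries),
            "",
            "## Validation flags or API failures",
            *bullets(self.flags),
        ]
        return "\n".join(lines)
```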
## Tools Used

- Read a page from gbrain (get_page)
- Store/update a page in gbrain (put_page)
- Add a timeline entry in gbrain (add_timeline_entry)
- List pages in gbrain by type (list_pages)
- Store raw API data in gbrain (put_raw_data)
- Retrieve raw data from gbrain (get_raw_data)
- Link entities in gbrain (add_link)
- Check backlinks in gbrain (get_backlinks)