
Garry Tan baf3517868 feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)

2026-04-11 21:46:07 -10:00

8.9 KiB


Enrich Skill

Enrich person and company pages from external sources. Scale effort to importance.

Filing rule: Read skills/_brain-filing-rules.md before creating any new page.

Iron Law: Back-Linking (MANDATORY)

Every mention of a person or company with a brain page MUST create a back-link FROM that entity's page TO the page mentioning them. An unlinked mention is a broken brain. See skills/_brain-filing-rules.md for format.
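A minimal sketch of how the Iron Law could be checked mechanically. It assumes a hypothetical in-memory page model (slug mapped to a title and a markdown body). It treats a verbatim title match as a mention and a markdown link `[text](slug)` as a back-link. `find_missing_backlinks` is an illustrative name, not part of gbrain.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical page model: slug -> {"title": ..., "body": ...}.
Pages = Dict[str, Dict[str, str]]

# Assumed link syntax: a markdown link whose target is a page slug.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def links_in(body: str) -> Set[str]:
    """Return every slug a page body links to."""
    return set(LINK_RE.findall(body))


def find_missing_backlinks(pages: Pages) -> List[Tuple[str, str]]:
    """Return (entity_slug, mentioning_slug) pairs where a page mentions an
    entity but the entity's page has no link back to the mentioning page."""
    missing = []
    for mentioning_slug, page in pages.items():
        for entity_slug, entity in pages.items():
            if entity_slug == mentioning_slug:
                continue
            # Naive mention detection: the entity's title appears verbatim.
            mentioned = entity["title"] in page["body"]
            if mentioned and mentioning_slug not in links_in(entity["body"]):
                missing.append((entity_slug, mentioning_slug))
    return sorted(missing)


pages = {
    "people/jane-doe": {"title": "Jane Doe", "body": "Founder of [Acme](companies/acme)."},
    "companies/acme": {"title": "Acme", "body": "Founded by Jane Doe."},
    "meetings/2026-04-10": {"title": "Standup", "body": "Jane Doe demoed the Acme beta."},
}
for entity, source in find_missing_backlinks(pages):
    print(f"{entity} needs a back-link to {source}")
```

Substring matching is deliberately naive: "Acme" would also match "Acmeville". A real check would match on word boundaries or known aliases.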

Philosophy

A brain page should read like an intelligence dossier, not a LinkedIn scrape. Facts are table stakes. Texture is the value -- what do they believe, what are they building, what makes them tick, where are they headed.

Citation Requirements (MANDATORY)

Every fact must carry an inline [Source: ...] citation.

Three formats:

- Direct attribution: [Source: User, {context}, YYYY-MM-DD]
- API/external: [Source: {provider} enrichment, YYYY-MM-DD]
- Synthesis: [Source: compiled from {list of sources}]
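A small sketch of the three formats as string helpers, plus a check that flags bullet-point facts missing a citation. Treating every `- ` bullet as a fact is a simplifying assumption, and the function names are hypothetical.

```python
import re
from datetime import date
from typing import List


def cite_user(context: str, when: date) -> str:
    """Direct attribution: [Source: User, {context}, YYYY-MM-DD]."""
    return f"[Source: User, {context}, {when.isoformat()}]"


def cite_api(provider: str, when: date) -> str:
    """API/external: [Source: {provider} enrichment, YYYY-MM-DD]."""
    return f"[Source: {provider} enrichment, {when.isoformat()}]"


def cite_synthesis(sources: List[str]) -> str:
    """Synthesis: [Source: compiled from {list of sources}]."""
    return f"[Source: compiled from {', '.join(sources)}]"


CITATION_RE = re.compile(r"\[Source: [^\]]+\]")


def uncited_facts(lines: List[str]) -> List[str]:
    """Return bullet lines (assumed to be facts) that carry no [Source: ...] tag."""
    return [
        line for line in lines
        if line.lstrip().startswith("- ") and not CITATION_RE.search(line)
    ]


print(cite_api("Crustdata", date(2026, 4, 11)))
```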

Source precedence (highest to lowest):

1. User's direct statements
2. Compiled truth (pre-existing brain synthesis)
3. Timeline entries (raw evidence)
4. External sources (API enrichment, web search)

When sources conflict, note the contradiction with both citations.
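One way to apply the precedence order and the contradiction rule. This sketch assumes each claim arrives as a (value, source kind, citation) triple, and the source-kind labels are hypothetical. The higher-precedence value wins, and every disagreeing claim stays visible with its citation.

```python
from typing import List, Optional, Tuple

# Lower rank = higher precedence, following the list above.
PRECEDENCE = {"user": 0, "compiled": 1, "timeline": 2, "external": 3}

Claim = Tuple[str, str, str]  # (value, source_kind, citation)


def resolve(claims: List[Claim]) -> Tuple[str, Optional[str]]:
    """Return (winning value, contradiction note or None)."""
    ranked = sorted(claims, key=lambda c: PRECEDENCE[c[1]])
    best_value, _, best_cite = ranked[0]
    dissent = [c for c in ranked[1:] if c[0] != best_value]
    if not dissent:
        return best_value, None
    others = "; ".join(f"{value} {cite}" for value, _, cite in dissent)
    return best_value, f"Contradiction: {best_value} {best_cite} vs {others}"


value, note = resolve([
    ("CTO at Acme", "external", "[Source: Crustdata enrichment, 2026-04-01]"),
    ("CEO at Acme", "user", "[Source: User, 1:1 call, 2026-04-10]"),
])
print(value)
print(note)
```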

When To Enrich

Primary triggers

- User mentions an entity in conversation
- Entity appears in a meeting transcript or email
- New contact appears with significant context
- Entity makes news or has a major event
- Any ingest pipeline encounters a notable entity

Do NOT enrich

- Random mentions with no relationship signal
- Bot/spam accounts
- Entities with no substantive connection to the user's work
- Same page enriched within the past week (unless new signal warrants it)
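The exclusion list reduces to a small gate. This sketch assumes the ingest step that saw the entity has already turned the signal into boolean flags. The `Signal` fields and the `should_enrich` name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RECENT = timedelta(days=7)  # "within the past week"


@dataclass
class Signal:
    has_relationship: bool  # any relationship signal to the user
    is_bot_or_spam: bool
    substantive: bool       # substantive connection to the user's work
    is_new: bool            # new information since the last enrichment


def should_enrich(sig: Signal, last_enriched: Optional[datetime], now: datetime) -> bool:
    """Apply the Do NOT enrich rules; anything that survives is enrichable."""
    if sig.is_bot_or_spam or not sig.has_relationship or not sig.substantive:
        return False
    enriched_recently = last_enriched is not None and now - last_enriched < RECENT
    # A page enriched this week is skipped unless new signal warrants it.
    return not enriched_recently or sig.is_new
```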

Enrichment Tiers

Scale enrichment to importance. Don't waste API calls on low-value entities.

| Tier | Who | Effort | Sources |
|---|---|---|---|
| 1 (key) | Inner circle, close collaborators, key contacts | Full pipeline | All available APIs + deep web research |
| 2 (notable) | Occasional interactions, industry figures | Moderate | Web research + social + brain cross-ref |
| 3 (minor) | Worth tracking, not critical | Light | Brain cross-ref + social lookup if handle known |

The Enrichment Protocol (7 Steps)

Step 1: Identify entities

Extract people, companies, and concepts from the incoming signal.

Step 2: Check brain state

For each entity:

- `gbrain search "name"` -- does a page already exist?
- If yes: UPDATE path (add new signal, update compiled truth if material)
- If no: CREATE path (check notability gate first, then create)
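The branch can be sketched as a pure function. `search` stands in for `gbrain search` and `is_notable` for the notability gate in skills/_brain-filing-rules.md. Both are injected callables because their real outputs are not specified here. Requiring an exact title match is an assumption layered on top.

```python
from typing import Callable, List, Optional, Tuple

SearchHit = Tuple[str, str]  # (slug, title)


def plan_entity(
    name: str,
    search: Callable[[str], List[SearchHit]],
    is_notable: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return ("update", slug), ("create", None) or ("skip", None)."""
    for slug, title in search(name):
        # Only an exact title match counts as "a page already exists";
        # a fuzzy hit may be a different person or company.
        if title.casefold() == name.casefold():
            return "update", slug
    if is_notable(name):
        return "create", None
    return "skip", None


hits = lambda q: [("people/jane-doe", "Jane Doe"), ("people/jane-dow", "Jane Dow")]
print(plan_entity("jane doe", hits, lambda n: True))
```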

Step 3: Extract signal from source

Don't just capture facts. Capture texture:

| Signal Type | What to Extract |
|---|---|
| Opinions, beliefs | What They Believe section |
| Current projects, features shipped | What They're Building section |
| Ambition, career arc, motivation | What Motivates Them section |
| Topics they return to obsessively | Hobby Horses section |
| Who they amplify, argue with, respect | Network / Relationships |
| Ascending, plateauing, pivoting? | Trajectory section |
| Role, company, funding, location | State section (hard facts) |

Step 4: External data source lookups

Priority order -- stop when you have enough signal for the entity's tier.

4a. Brain cross-reference (always, all tiers)

- `gbrain search "name"` and `gbrain query "what do we know about name"`
- Check related pages: company pages for person enrichment and vice versa
- This is free and often the richest source

4b. Web research (Tier 1 and 2)

- Use Perplexity, Brave Search, Exa, or an equivalent web research tool
- Key pattern: send existing brain knowledge as context so the search returns the DELTA (what's new versus what you already know), not a rehash
- Use Opus-class models for Tier 1 deep research and lighter models for Tier 2
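A sketch of the delta pattern. The brain's existing facts go into the prompt so the research tool is asked for changes, not a rehash. The prompt wording is illustrative. The call to Perplexity, Brave, Exa, or another tool is left out because each has its own API.

```python
from typing import List


def delta_query(name: str, known_facts: List[str]) -> str:
    """Build a research prompt that asks only for what the brain lacks."""
    known = "\n".join(f"- {fact}" for fact in known_facts) or "- (nothing yet)"
    return (
        f"Research {name}. We already know:\n{known}\n"
        "Report only what is new or contradicts the list above, "
        "with a source and date for each item. Do not restate known facts."
    )


print(delta_query("Jane Doe", ["CEO of Acme [Source: User, 1:1 call, 2026-04-10]"]))
```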

4c. Social media lookup (all tiers when handle known)

- Pull recent posts/tweets for tone, interests, current focus
- Social media is the highest-texture signal for what someone actually thinks

4d. People enrichment APIs (Tier 1)

- LinkedIn data, career history, connections, education

4e. Company enrichment APIs (Tier 1)

- Company data, financials, headcount, key hires, recent news

| Data Need | Example Sources | Tier |
|---|---|---|
| Web research | Perplexity, Brave, Exa | 1-2 |
| LinkedIn / career | Crustdata, Proxycurl, People Data Labs | 1 |
| Career history | Happenstance, LinkedIn | 1 |
| Funding / company data | Crunchbase, PitchBook, Clearbit | 1 |
| Social media | Platform APIs, web scraping | 1-3 |
| Meeting history | Calendar/meeting transcript tools | 1-2 |
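Step 4's priority order and the tier columns can be encoded as data. This sketch assumes people APIs serve person pages and company APIs serve company pages. It models "stop when you have enough signal" as a caller-supplied predicate. `lookup_plan`, `run_lookups`, and the step keys are hypothetical.

```python
from typing import Any, Callable, List

# (step, tiers it applies to), in Step 4 priority order.
STEPS = [
    ("brain", {1, 2, 3}),   # 4a: always, all tiers
    ("web", {1, 2}),        # 4b
    ("social", {1, 2, 3}),  # 4c: only when a handle is known
    ("people_api", {1}),    # 4d
    ("company_api", {1}),   # 4e
]


def lookup_plan(tier: int, has_handle: bool, is_company: bool) -> List[str]:
    plan = []
    for step, tiers in STEPS:
        if tier not in tiers:
            continue
        if step == "social" and not has_handle:
            continue
        # Assumption: pick the API family that matches the page type.
        if step == "people_api" and is_company:
            continue
        if step == "company_api" and not is_company:
            continue
        plan.append(step)
    return plan


def run_lookups(plan: List[str], fetch: Callable[[str], Any],
                enough: Callable[[List[Any]], bool]) -> List[Any]:
    """Call fetch(step) in priority order; stop once enough(results) is true."""
    results: List[Any] = []
    for step in plan:
        results.append(fetch(step))
        if enough(results):
            break
    return results


print(lookup_plan(tier=2, has_handle=True, is_company=False))
```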

Step 5: Save raw data (preserves provenance)

Store raw API responses in gbrain with `put_raw_data`:

```json
{
  "source": "crustdata",
  "fetched_at": "2026-04-11T...",
  "query": "jane doe",
  "data": { ... }
}
```

Raw data preserves provenance. If the compiled truth is ever questioned, the raw data shows exactly what the API returned.
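A sketch of building that envelope with the standard library. It only produces the JSON. How the result is handed to `put_raw_data` depends on that tool's parameters, which are not shown here, so the call is omitted. Stamping `fetched_at` in UTC is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def raw_record(source: str, query: str, data: Dict[str, Any],
               now: Optional[datetime] = None) -> str:
    """Wrap an API response in the provenance envelope shown above."""
    now = now or datetime.now(timezone.utc)
    envelope = {
        "source": source,
        "fetched_at": now.isoformat(timespec="seconds"),
        "query": query,
        "data": data,  # the untouched API response
    }
    return json.dumps(envelope, indent=2)


print(raw_record("crustdata", "jane doe", {"name": "Jane Doe"}))
```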

Step 6: Write to brain

CREATE path

- Check the notability gate (see skills/_brain-filing-rules.md)
- Check filing rules -- where does this entity go?
- Create the page with the appropriate template (below)
- Fill compiled truth with citations
- Add the first timeline entry
- Leave empty sections as [No data yet] (don't fill with boilerplate)

UPDATE path

- Add new timeline entries (reverse-chronological, append-only)
- Update compiled truth ONLY if the new signal materially changes the picture
- Update the State section with new facts
- Flag contradictions between new signal and existing compiled truth
- Don't overwrite user-written assessments with API boilerplate
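The timeline rule (reverse-chronological, append-only) can be sketched over the entry format the page templates use: `- **YYYY-MM-DD** | Event [Source: ...]`. In practice an agent would call gbrain's `add_timeline_entry` tool. `insert_timeline_line` is a hypothetical stand-in that shows the ordering logic on plain lines.

```python
import re
from typing import List

ENTRY_RE = re.compile(r"^- \*\*(\d{4}-\d{2}-\d{2})\*\* \| ")


def insert_timeline_line(lines: List[str], when: str, text: str, citation: str) -> List[str]:
    """Return a new list with the entry placed in reverse-chronological order.
    Existing entries are never edited or removed (append-only)."""
    entry = f"- **{when}** | {text} {citation}"
    out = list(lines)
    for i, line in enumerate(out):
        match = ENTRY_RE.match(line)
        # ISO dates sort correctly as plain strings.
        if match and match.group(1) < when:
            out.insert(i, entry)
            return out
    out.append(entry)
    return out


timeline = [
    "- **2026-04-10** | Met at demo day [Source: User, chat, 2026-04-10]",
    "- **2026-03-01** | Raised a seed round [Source: Crunchbase enrichment, 2026-03-02]",
]
updated = insert_timeline_line(timeline, "2026-03-15", "Hired a CTO", "[Source: User, chat, 2026-04-10]")
print("\n".join(updated))
```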

Person page template

```markdown
---
title: Full Name
type: person
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: []
company: Current Company
relationship: How the user knows them
email:
linkedin:
twitter:
location:
---

# Full Name

> 1-paragraph executive summary: HOW do you know them, WHY do they matter,
> what's the current state of the relationship.

## State
Role, company, key context. Hard facts only.

## What They Believe
Ideology, first principles, worldview. What hills do they die on?

## What They're Building
Current projects, recent launches, what they're focused on.

## What Motivates Them
Ambition, career arc, what drives them.

## Hobby Horses
Topics they return to obsessively. Recurring themes in their work/posts.

## Assessment
Your read on this person. Strengths, gaps, trajectory.

## Trajectory
Ascending, plateauing, pivoting, declining? Where are they headed?

## Relationship
History of interactions, shared context, relationship quality.

## Contact
Email, social handles, preferred communication channel.

## Network
Key connections, mutual contacts, organizational relationships.

## Open Threads
Active conversations, pending items, things to follow up on.

---

## Timeline
Reverse chronological. Every entry has a date and [Source: ...] citation.
- **YYYY-MM-DD** | Event description [Source: ...]
```
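A sketch that renders an empty person page from the template, leaving every unfilled section as `[No data yet]` rather than boilerplate. The function and its parameters are hypothetical. Compiled truth, citations, and the first timeline entry would still come from the enrichment results.

```python
from typing import List

PERSON_SECTIONS: List[str] = [
    "State", "What They Believe", "What They're Building", "What Motivates Them",
    "Hobby Horses", "Assessment", "Trajectory", "Relationship", "Contact",
    "Network", "Open Threads",
]


def person_page(name: str, today: str, company: str = "",
                relationship: str = "", summary: str = "") -> str:
    """Render the person template; empty sections get [No data yet]."""
    frontmatter = [
        "---", f"title: {name}", "type: person",
        f"created: {today}", f"updated: {today}", "tags: []",
        f"company: {company}".rstrip(), f"relationship: {relationship}".rstrip(),
        "email:", "linkedin:", "twitter:", "location:", "---", "",
    ]
    body = [f"# {name}", "", f"> {summary or '[No data yet]'}", ""]
    for section in PERSON_SECTIONS:
        body += [f"## {section}", "[No data yet]", ""]
    body += ["---", "", "## Timeline"]
    return "\n".join(frontmatter + body) + "\n"


print(person_page("Jane Doe", "2026-04-11", company="Acme"))
```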

Company page template

```markdown
---
title: Company Name
type: company
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: []
---

# Company Name

> 1-paragraph executive summary.

## State
What they do, stage, key people, key metrics, your connection.

## Open Threads
Active items, pending decisions, things to track.

---

## Timeline
- **YYYY-MM-DD** | Event description [Source: ...]
```

Step 7: Cross-reference

- Update company pages from person enrichment (and vice versa)
- Update related project/deal pages if relevant context surfaced
- Add back-links from every entity mentioned (MANDATORY)
- Check index files if the brain uses them

Bulk Enrichment Rules

- Test on 3-5 entities first. Read the actual output. Check quality.
- Only proceed to bulk after the test shots pass your quality bar.
- 3+ entities from one source: batch-process them or spawn a sub-agent.
- Throttle API calls. Respect rate limits.
- Commit every 5-10 entities during bulk runs.
- Save a report after bulk enrichment (see Report Storage below).
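The test-before-bulk flow, sketched with injected callables so it stays independent of any particular API. `enrich`, `passes_review` (your reading of the test shots), `commit`, and the optional `throttle` hook are all hypothetical. The defaults (5 test entities, a commit every 10) come from the ranges above.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def bulk_enrich(
    entities: Sequence[Any],
    enrich: Callable[[Any], Any],
    passes_review: Callable[[List[Any]], bool],
    commit: Callable[[List[Any]], None],
    test_size: int = 5,
    commit_every: int = 10,
    throttle: Optional[Callable[[], None]] = None,
) -> Tuple[List[Any], bool]:
    """Test shots first, then batch with periodic commits.
    Returns (results, proceeded_to_bulk)."""
    sample, rest = list(entities[:test_size]), list(entities[test_size:])
    results = [enrich(e) for e in sample]
    if not passes_review(results):
        return results, False  # stop and fix the pipeline before scaling up
    commit(sample)
    pending: List[Any] = []
    for entity in rest:
        if throttle:
            throttle()  # e.g. sleep to respect provider rate limits
        results.append(enrich(entity))
        pending.append(entity)
        if len(pending) >= commit_every:
            commit(pending)
            pending = []
    if pending:
        commit(pending)
    return results, True
```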

Validation Rules

- Connection count < 20 on LinkedIn = likely wrong person, skip
- Name mismatch between brain and API = skip, flag for review
- Joke profiles or obviously wrong data = save to raw, don't update page
- Don't overwrite user-written assessments with API boilerplate
- When in doubt: save raw data but don't update the brain page
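The two mechanical checks can be sketched as a guard that runs after the raw response is saved and before the page is touched. The `name` and `linkedin_connections` field names are assumptions about the provider payload. Judgment calls such as joke profiles and user-written assessments are left to the agent.

```python
from typing import Any, Dict, Tuple

MIN_CONNECTIONS = 20


def check_api_record(brain_name: str, record: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (ok_to_update_page, reason). Raw data is saved either way."""
    api_name = str(record.get("name", "")).strip()
    if api_name.casefold() != brain_name.strip().casefold():
        return False, "name mismatch: skip and flag for review"
    connections = record.get("linkedin_connections")
    if connections is not None and connections < MIN_CONNECTIONS:
        return False, "fewer than 20 LinkedIn connections: likely wrong person"
    return True, "ok"
```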

Report Storage

After enrichment sweeps, save a report:

- Number of entities processed
- New pages created vs existing updated
- Data sources called and results quality
- Notable discoveries or contradictions
- Validation flags or API failures

This creates an audit trail for brain enrichment over time.
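gbrain v0.9.0 adds a `gbrain report --type <name>` command. Per its release notes, it saves timestamped reports to `brain/reports/{type}/YYYY-MM-DD-HHMM.md` and validates the type against `[a-z0-9-]`. The sketch below mirrors that path scheme and renders the fields listed above. It is an illustration, not the command's code, and the report layout is an assumption.

```python
import re
from datetime import datetime
from typing import Dict, List

TYPE_RE = re.compile(r"[a-z0-9-]+")


def report_path(report_type: str, now: datetime) -> str:
    """brain/reports/{type}/YYYY-MM-DD-HHMM.md, rejecting unsafe type names."""
    if not TYPE_RE.fullmatch(report_type):
        # Blocks path traversal such as "../etc".
        raise ValueError(f"invalid report type: {report_type!r}")
    return f"brain/reports/{report_type}/{now:%Y-%m-%d-%H%M}.md"


def report_body(processed: int, created: int, updated: int,
                sources: Dict[str, str], discoveries: List[str],
                flags: List[str]) -> str:
    """Render the report fields listed above as a markdown list."""
    source_list = ", ".join(f"{name} ({quality})" for name, quality in sources.items())
    lines = [
        "# Enrichment report", "",
        f"- Entities processed: {processed}",
        f"- Pages created: {created}, updated: {updated}",
        f"- Sources: {source_list or 'none'}",
    ]
    lines += [f"- Discovery: {d}" for d in discoveries]
    lines += [f"- Flag: {flag}" for flag in flags]
    return "\n".join(lines) + "\n"


print(report_path("enrichment-sweep", datetime(2026, 4, 11, 21, 46)))
```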

Tools Used

- Read a page from gbrain (`get_page`)
- Store/update a page in gbrain (`put_page`)
- Add a timeline entry in gbrain (`add_timeline_entry`)
- List pages in gbrain by type (`list_pages`)
- Store raw API data in gbrain (`put_raw_data`)
- Retrieve raw data from gbrain (`get_raw_data`)
- Link entities in gbrain (`add_link`)
- Check backlinks in gbrain (`get_backlinks`)
