
Garry Tan baf3517868 feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)

* feat: battle-tested skill patterns from production deployment

Backport production-learned brain-operations patterns:
- Iron Law of Back-Linking (mandatory bidirectional linking)
- Brain filing rules (file by primary subject, not format)
- Enrichment protocol (7-step pipeline, 3-tier system, person/company templates)
- Media ingest workflows (articles, videos, podcasts, PDFs, screenshots)
- Citation requirements (mandatory [Source: ...] on every fact)
- Test Before Bulk operating principle
- Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS
- X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger

* chore: bump version and changelog (v0.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add _brain-filing-rules.md to CLAUDE.md key files

* feat: smart file upload with TUS resumable and .redirect.yaml pointers

- Supabase Storage auto-selects upload method by file size:
  < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry)
- Signed URL generation for private bucket access (1-hour expiry)
- New `upload-raw` command with size routing: small text stays in git,
  large/media files go to cloud with .redirect.yaml pointer
- New `signed-url` command for generating access links
- File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy)
- Redirect format upgraded: 10 fields with full metadata
- All migration commands (mirror, redirect, restore, clean) handle both formats

* feat: skills reference actual gbrain file commands

- Filing rules document upload-raw, signed-url, and .redirect.yaml format
- Ingest skill uses gbrain files upload-raw for raw source preservation
- Maintain skill adds file storage health checks
- Setup skill adds storage configuration phase with migration guidance
- Voice recipe uses upload-raw for call audio storage
- Migration v0.9.0 with complete storage setup instructions

* chore: bump version and changelog (v0.9.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: gbrain publish -- shareable HTML with password protection

First code+skill pair: deterministic code does the work (strip private data,
encrypt with AES-256-GCM, generate self-contained HTML), the skill tells the
agent when and how to use it. 34 new tests.

See: https://x.com/garrytan/status/2042925773300908103

* feat: backlinks check/fix, page lint, and report commands

Three new deterministic tools (zero LLM calls):

- gbrain backlinks check/fix -- scans brain for entity mentions without
  back-links, creates them. Enforces the Iron Law from the skills.
- gbrain lint [--fix] -- catches LLM preambles, code fence wrapping,
  placeholder dates, missing frontmatter, broken citations, empty sections.
  --fix auto-strips fixable artifacts.
- gbrain report --type <name> -- saves timestamped reports to
  brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails.

33 new tests (409 total, 0 fail).

* feat: v0.9.0 migration tells agents to swap scripts for built-in commands

Migration file now:
- Lists all 5 new deterministic commands with usage examples
- Includes a script-to-command replacement table (old -> new)
- Tells the agent to find custom script references in AGENTS.md,
  skills, and cron jobs and replace with gbrain commands
- Adds recommended cron jobs for daily backlink fix + weekly lint
- References the Thin Harness, Fat Skills thread

* fix: CLI routing bugs found during DX review

- Fixed subArgs reference error in handleCliOnly (used wrong variable name)
- Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid
  conflict with existing backlinks operation (per-page incoming links)
- Added TOOLS section to --help output showing publish, check-backlinks,
  lint, report
- Added upload-raw and signed-url to FILES section in --help
- Updated all docs/migration references to use check-backlinks

* fix: security hardening from adversarial review

- XSS: sanitize marked.parse() output (strip script/iframe/on* attrs)
- Path traversal: validate report --type against [a-z0-9-] pattern
- TUS: HEAD request before retry to get server's actual offset (TUS spec)
- Pointer: upload-raw now includes pointer content in JSON output
- Symlinks: use lstatSync in all walkers to prevent directory escape

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 21:46:07 -10:00

4.3 KiB


Brain Filing Rules -- MANDATORY for all skills that write to the brain

The Rule

The PRIMARY SUBJECT of the content determines where it goes. Not the format, not the source, not the skill that's running.

Decision Protocol

1. Identify the primary subject (a person? company? concept? policy issue?)
2. File in the directory that matches the subject
3. Cross-link from related directories
4. When in doubt: what would you search for to find this page again?

Common Misfiling Patterns -- DO NOT DO THESE

| Wrong | Right | Why |
|-------|-------|-----|
| Analysis of a topic -> `sources/` | -> appropriate subject directory | sources/ is for raw data only |
| Article about a person -> `sources/` | -> `people/` | Primary subject is a person |
| Meeting-derived company info -> `meetings/` only | -> ALSO update `companies/` | Entity propagation is mandatory |
| Research about a company -> `sources/` | -> `companies/` | Primary subject is a company |
| Reusable framework/thesis -> `sources/` | -> `concepts/` | It's a mental model |
| Tweet thread about policy -> `media/` | -> `civic/` or `concepts/` | media/ is for content ops |

What `sources/` Is Actually For

sources/ is ONLY for:

- Bulk data imports (API dumps, CSV exports, snapshots)
- Raw data that feeds multiple brain pages (e.g., a guest export, contact sync)
- Periodic captures (quarterly snapshots, sync exports)

If the content has a clear primary subject (a person, company, concept, policy issue), it does NOT go in sources/. Period.
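The filing rule above can be sketched as a lookup keyed on the primary subject. This is only an illustration: the function name, the subject labels, and the choice of `civic/` for policy issues (the table allows `civic/` or `concepts/`) are assumptions, not part of gbrain.

```python
# Hypothetical mapping from primary subject to brain directory.
# The subject labels and the "policy" -> civic/ choice are assumptions.
SUBJECT_DIRS = {
    "person": "people/",
    "company": "companies/",
    "concept": "concepts/",
    "policy": "civic/",
}


def filing_directory(primary_subject=None, is_bulk_raw_data=False):
    """Pick a directory from the primary subject, never from format or source."""
    if primary_subject is not None:
        if primary_subject not in SUBJECT_DIRS:
            raise ValueError(f"unknown subject kind: {primary_subject!r}")
        # A clear primary subject always wins, even for raw-looking content.
        return SUBJECT_DIRS[primary_subject]
    if is_bulk_raw_data:
        # Only subject-less bulk imports and periodic captures land here.
        return "sources/"
    raise ValueError("no primary subject and not bulk raw data: decide before filing")
```

Note that `filing_directory("person", is_bulk_raw_data=True)` still returns `people/`: the subject check runs first, which mirrors the rule that content with a clear primary subject never goes in sources/.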

Notability Gate

Not everything deserves a brain page. Before creating a new entity page:

- People: Will you interact with them again? Are they relevant to your work?
- Companies: Are they relevant to your work or interests?
- Concepts: Is this a reusable mental model worth referencing later?
When in doubt, DON'T create. A missing page can be created later. A junk page wastes attention and degrades search quality.

Iron Law: Back-Linking (MANDATORY)

Every mention of a person or company with a brain page MUST create a back-link FROM that entity's page TO the page mentioning them. This is bidirectional: the new page links to the entity, AND the entity's page links back.

Format for back-links (append to Timeline or See Also):

- **YYYY-MM-DD** | Referenced in [page title](path/to/page.md) -- brief context

An unlinked mention is a broken brain. The graph is the intelligence.
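As a minimal sketch, the back-link line format above could be produced by a small helper. The function names and the idempotent-append behavior are assumptions for illustration; they do not describe how `gbrain check-backlinks` works internally.

```python
from datetime import date


def backlink_entry(when: date, title: str, path: str, context: str) -> str:
    """Format one back-link line for an entity page's Timeline or See Also."""
    return f"- **{when.isoformat()}** | Referenced in [{title}]({path}) -- {context}"


def append_backlink(section_lines, entry):
    """Return the section with the entry appended, unless it is already there."""
    if entry in section_lines:
        return list(section_lines)
    return [*section_lines, entry]
```

For example, `backlink_entry(date(2026, 4, 11), "Q2 planning", "meetings/q2-planning.md", "discussed hiring")` yields a line in the documented shape; the page title and path here are made up.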

Citation Requirements (MANDATORY)

Every fact written to a brain page must carry an inline [Source: ...] citation.

Three formats:

- Direct attribution: [Source: User, {context}, YYYY-MM-DD]
- API/external: [Source: {provider}, YYYY-MM-DD] or [Source: {publication}, {URL}]
- Synthesis: [Source: compiled from {list of sources}]
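The three citation formats can be sketched as string builders, plus a simple check that a line carries at least one citation. All function names are hypothetical, and the regular expression is an assumed approximation of "has an inline citation", not the rule `gbrain lint` applies.

```python
import re

# Assumed pattern: "[Source: " followed by anything up to the closing bracket.
CITATION_RE = re.compile(r"\[Source: [^\]]+\]")


def cite_user(context: str, day: str) -> str:
    """Direct attribution: [Source: User, {context}, YYYY-MM-DD]."""
    return f"[Source: User, {context}, {day}]"


def cite_provider(provider: str, day: str) -> str:
    """API/external, dated form: [Source: {provider}, YYYY-MM-DD]."""
    return f"[Source: {provider}, {day}]"


def cite_publication(publication: str, url: str) -> str:
    """API/external, URL form: [Source: {publication}, {URL}]."""
    return f"[Source: {publication}, {url}]"


def cite_synthesis(sources) -> str:
    """Synthesis: [Source: compiled from {list of sources}]."""
    return f"[Source: compiled from {', '.join(sources)}]"


def has_citation(line: str) -> bool:
    """True if the line contains at least one inline [Source: ...] citation."""
    return CITATION_RE.search(line) is not None
```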

Source precedence (highest to lowest):

1. User's direct statements (highest authority)
2. Compiled truth (pre-existing brain synthesis)
3. Timeline entries (raw evidence)
4. External sources (API enrichment, web search -- lowest)

When sources conflict, note the contradiction with both citations. Don't silently pick one.
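A minimal sketch of the precedence and conflict rule, assuming each claim is tagged with one of four hypothetical source kinds. The `Claim` type, the kind labels, and the `CONFLICT:` prefix are illustrative choices, not gbrain conventions.

```python
from typing import NamedTuple

# Highest authority first, following the precedence list above.
PRECEDENCE = ["user", "compiled_truth", "timeline", "external"]


class Claim(NamedTuple):
    value: str     # the asserted fact
    kind: str      # one of PRECEDENCE (hypothetical labels)
    citation: str  # inline [Source: ...] string


def render(claims):
    """Render agreeing claims once; surface disagreeing claims side by side."""
    ordered = sorted(claims, key=lambda c: PRECEDENCE.index(c.kind))
    if len({c.value for c in ordered}) == 1:
        top = ordered[0]
        return f"{top.value} {top.citation}"
    # Sources conflict: keep every value with its citation, highest authority first.
    return "CONFLICT: " + "; ".join(f"{c.value} {c.citation}" for c in ordered)
```

The conflict branch deliberately does not pick a winner. Precedence only controls the order in which the contradicting values are listed.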

Raw Source Preservation

Every ingested item should have its raw source preserved for provenance.

Size routing (automatic via gbrain files upload-raw):

< 100 MB text/PDF: stays in the brain repo (git-tracked) in a .raw/ sidecar directory alongside the brain page
>= 100 MB OR media files (video, audio, images): uploaded to cloud storage (Supabase Storage, S3, etc.) with a .redirect.yaml pointer left in the brain repo. Files >= 100 MB use TUS resumable upload (6 MB chunks with retry) for reliability.

Upload command:

gbrain files upload-raw <file> --page <page-slug> --type <type>

Returns JSON: {storage: "git"} for small files, {storage: "supabase", storagePath, reference} for cloud.
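The size routing described above can be sketched as follows. This is not the `upload-raw` implementation: it assumes 1 MB means 1024 * 1024 bytes, identifies media by MIME prefix, and sends every small non-media file to git, all of which are assumptions made for the example.

```python
MB = 1024 * 1024           # assumption: binary megabytes
CLOUD_THRESHOLD = 100 * MB  # files at or above this size go to cloud storage
TUS_CHUNK = 6 * MB          # chunk size for TUS resumable uploads
MEDIA_PREFIXES = ("video/", "audio/", "image/")


def route(size_bytes: int, mime: str):
    """Return (destination, upload_method) for a raw source file."""
    is_media = mime.startswith(MEDIA_PREFIXES)
    if size_bytes < CLOUD_THRESHOLD and not is_media:
        # Small non-media files stay git-tracked in the .raw/ sidecar directory.
        return ("git", None)
    # Cloud uploads: standard POST below the threshold, TUS resumable at or above it.
    method = "tus" if size_bytes >= CLOUD_THRESHOLD else "post"
    return ("cloud", method)


def tus_chunk_count(size_bytes: int) -> int:
    """Number of 6 MB chunks needed, rounding up without floating point."""
    return -(-size_bytes // TUS_CHUNK)
```

Under these assumptions, a 10 MB PDF routes to `("git", None)`, a 10 MB audio file to `("cloud", "post")`, and a 500 MB video to `("cloud", "tus")` in 84 chunks.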

The .redirect.yaml pointer format:

target: supabase://brain-files/page-slug/filename.mp4
bucket: brain-files
storage_path: page-slug/filename.mp4
size: 524288000
size_human: 500 MB
hash: sha256:abc123...
mime: video/mp4
uploaded: 2026-04-11T...
type: transcript
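To make the pointer fields concrete, here is a sketch that derives the nine fields shown above from a file's bytes. The function names and parameters are hypothetical, and the size formatting assumes 1024-based units, which is consistent with 524288000 bytes being shown as 500 MB.

```python
import hashlib
from datetime import datetime, timezone


def human_size(n: int) -> str:
    """Format a byte count with 1024-based units, e.g. 524288000 -> '500 MB'."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


def build_pointer(bucket, page_slug, filename, data: bytes, mime, file_type, uploaded: datetime):
    """Build the pointer fields for a file stored at {page_slug}/{filename}."""
    storage_path = f"{page_slug}/{filename}"
    return {
        "target": f"supabase://{bucket}/{storage_path}",
        "bucket": bucket,
        "storage_path": storage_path,
        "size": len(data),
        "size_human": human_size(len(data)),
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        "mime": mime,
        "uploaded": uploaded.astimezone(timezone.utc).isoformat(),
        "type": file_type,
    }


def to_pointer_text(pointer: dict) -> str:
    """Emit simple 'key: value' lines in the same shape as the example above."""
    return "\n".join(f"{key}: {value}" for key, value in pointer.items()) + "\n"
```

The hash is what ties a restored download back to the original upload: recomputing SHA-256 over the downloaded bytes and comparing it with the `hash` field confirms the file is intact.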

Accessing stored files:

gbrain files signed-url <storage-path>    # Generate 1-hour signed URL
gbrain files restore <dir>                # Download back to local

This ensures any derived brain page can be traced back to its original source, and large files don't bloat the git repo.
