Garry Tan baf3517868 feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)

* feat: battle-tested skill patterns from production deployment

Backport production-learned brain-operations patterns:
- Iron Law of Back-Linking (mandatory bidirectional linking)
- Brain filing rules (file by primary subject, not format)
- Enrichment protocol (7-step pipeline, 3-tier system, person/company templates)
- Media ingest workflows (articles, videos, podcasts, PDFs, screenshots)
- Citation requirements (mandatory [Source: ...] on every fact)
- Test Before Bulk operating principle
- Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS
- X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger

* chore: bump version and changelog (v0.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add _brain-filing-rules.md to CLAUDE.md key files

* feat: smart file upload with TUS resumable and .redirect.yaml pointers

- Supabase Storage auto-selects upload method by file size:
  < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry)
- Signed URL generation for private bucket access (1-hour expiry)
- New `upload-raw` command with size routing: small text stays in git,
  large/media files go to cloud with .redirect.yaml pointer
- New `signed-url` command for generating access links
- File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy)
- Redirect format upgraded: 10 fields with full metadata
- All migration commands (mirror, redirect, restore, clean) handle both formats

* feat: skills reference actual gbrain file commands

- Filing rules document upload-raw, signed-url, and .redirect.yaml format
- Ingest skill uses gbrain files upload-raw for raw source preservation
- Maintain skill adds file storage health checks
- Setup skill adds storage configuration phase with migration guidance
- Voice recipe uses upload-raw for call audio storage
- Migration v0.9.0 with complete storage setup instructions

* chore: bump version and changelog (v0.9.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: gbrain publish -- shareable HTML with password protection

First code+skill pair: deterministic code does the work (strip private data,
encrypt with AES-256-GCM, generate self-contained HTML), the skill tells the
agent when and how to use it. 34 new tests.

See: https://x.com/garrytan/status/2042925773300908103

* feat: backlinks check/fix, page lint, and report commands

Three new deterministic tools (zero LLM calls):

- gbrain backlinks check/fix -- scans brain for entity mentions without
  back-links, creates them. Enforces the Iron Law from the skills.
- gbrain lint [--fix] -- catches LLM preambles, code fence wrapping,
  placeholder dates, missing frontmatter, broken citations, empty sections.
  --fix auto-strips fixable artifacts.
- gbrain report --type <name> -- saves timestamped reports to
  brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails.

33 new tests (409 total, 0 fail).

* feat: v0.9.0 migration tells agents to swap scripts for built-in commands

Migration file now:
- Lists all 5 new deterministic commands with usage examples
- Includes a script-to-command replacement table (old -> new)
- Tells the agent to find custom script references in AGENTS.md,
  skills, and cron jobs and replace with gbrain commands
- Adds recommended cron jobs for daily backlink fix + weekly lint
- References the Thin Harness, Fat Skills thread

* fix: CLI routing bugs found during DX review

- Fixed subArgs reference error in handleCliOnly (used wrong variable name)
- Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid
  conflict with existing backlinks operation (per-page incoming links)
- Added TOOLS section to --help output showing publish, check-backlinks,
  lint, report
- Added upload-raw and signed-url to FILES section in --help
- Updated all docs/migration references to use check-backlinks

* fix: security hardening from adversarial review

- XSS: sanitize marked.parse() output (strip script/iframe/on* attrs)
- Path traversal: validate report --type against [a-z0-9-] pattern
- TUS: HEAD request before retry to get server's actual offset (TUS spec)
- Pointer: upload-raw now includes pointer content in JSON output
- Symlinks: use lstatSync in all walkers to prevent directory escape

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 21:46:07 -10:00

6.0 KiB


Maintain Skill

Periodic brain health checks and cleanup.

Workflow

Run health check. Check gbrain health to get the dashboard.
Check each dimension:

Stale pages

Pages where compiled_truth is older than the latest timeline entry. The assessment hasn't been updated to reflect recent evidence.

Check the health output for stale page count
For each stale page: read the page from gbrain, review timeline, determine if compiled_truth needs rewriting
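
A minimal sketch of the staleness rule above, assuming you have already extracted the date compiled_truth was last rewritten and the date of the latest timeline entry as YYYY-MM-DD strings. The helper name is_stale is hypothetical and not part of gbrain, and the dates are made up for illustration:

```shell
# Hypothetical helper: is_stale COMPILED_DATE LATEST_TIMELINE_DATE
# Succeeds when compiled_truth is strictly older than the latest timeline entry.
# Dates are YYYY-MM-DD; removing the hyphens gives integers that compare
# in chronological order.
is_stale() {
  local compiled latest
  compiled=$(printf '%s' "$1" | tr -d '-')
  latest=$(printf '%s' "$2" | tr -d '-')
  [ "$compiled" -lt "$latest" ]
}

# Example with made-up dates.
if is_stale "2026-01-10" "2026-03-02"; then
  echo "stale: compiled_truth predates the latest timeline entry"
fi
```

A page whose two dates are equal is treated as fresh here; tighten or loosen that as your brain's conventions require.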

Orphan pages

Pages with zero inbound links. Nobody references them.

Review orphans: are they genuinely isolated or just missing links?
Add links in gbrain from related pages or flag for deletion

Dead links

Links pointing to pages that don't exist.

Remove dead links in gbrain

Missing cross-references

Pages that mention entity names but don't have formal links.

Read compiled_truth from gbrain, extract entity mentions, create links in gbrain

Back-link enforcement

Check that the back-linking iron law is being followed:

For each recently updated page, check if entities mentioned in it have corresponding back-links FROM those entity pages
A mention without a back-link is a broken brain
Fix: add the missing back-link to the entity's Timeline or See Also section
Format: - **YYYY-MM-DD** | Referenced in [page title](path) -- brief context
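
The back-link format above can be generated mechanically. This is a sketch only: format_backlink is a hypothetical helper, and the title, path, and context in the example call are invented for illustration.

```shell
# Hypothetical helper: prints one back-link entry in the format shown above.
# Usage: format_backlink DATE TITLE PATH CONTEXT
format_backlink() {
  printf '%s\n' "- **$1** | Referenced in [$2]($3) -- $4"
}

# Example with made-up values.
format_backlink "2026-04-11" "Q2 board meeting" "meetings/2026-04-11-board.md" "discussed hiring plan"
```

The printed line is what gets appended to the entity page's Timeline or See Also section.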

Filing rule violations

Check for common misfiling patterns (see skills/_brain-filing-rules.md):

Content with clear primary subjects filed in sources/ instead of the appropriate directory (people/, companies/, concepts/, etc.)
Use gbrain search to find pages in sources/ that reference specific people, companies, or concepts -- these may be misfiled
Flag misfiled pages for review or re-filing

Citation audit

Spot-check pages for missing [Source: ...] citations:

Read 5-10 recently updated pages
Check that compiled truth (above the line) has inline citations
Check that timeline entries have source attribution
Flag pages where facts appear without provenance

Tag consistency

Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").

Standardize to the most common variant using gbrain tag operations
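
Picking "the most common variant" is a frequency count. A minimal sketch, assuming you have collected the competing variants as one tag per line (one line per tagged page); most_common_tag is a hypothetical helper, and deciding which tags are variants of each other remains a judgment call:

```shell
# Hypothetical helper: reads one tag per line on stdin, prints the most frequent.
most_common_tag() {
  sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
}

# Example: three pages tagged "vc", two tagged "venture-capital".
printf '%s\n' vc venture-capital vc vc venture-capital | most_common_tag
```

Once the winner is known, retag the minority pages using the gbrain tag operations (add_tag, remove_tag).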

Embedding freshness

Chunks without embeddings, or chunks embedded with an old model.

For large embedding refreshes (>1000 chunks), run the refresh in the background with nohup:

nohup gbrain embed refresh > /tmp/gbrain-embed.log 2>&1 &

Then check progress:

tail -1 /tmp/gbrain-embed.log

Security (RLS verification)

Run gbrain doctor --json and check the RLS status. All tables should show RLS enabled. If not, run gbrain init again.

Schema health

Check that the schema version is up to date. gbrain doctor --json reports the current version vs expected. If behind, gbrain init runs migrations automatically.

File storage health

Check the integrity of stored files and redirect pointers:

Run gbrain files verify to check all DB records have valid data
Run gbrain files status to see migration state (local, mirrored, redirected)
Check for orphan .redirect.yaml pointers that reference missing storage files
Check for large binary files (>= 100 MB) still in git that should be in cloud storage
If storage backend is configured: verify redirect pointers resolve (download test)
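
For the large-file check, a sketch using plain find rather than any gbrain command. find_large_files is a hypothetical helper; it scans the working tree on disk (so it also reports untracked files), and it takes 100 MB to mean 100 * 1024 * 1024 bytes, which is an assumption:

```shell
# Hypothetical helper: list regular files of at least MIN_BYTES under DIR,
# skipping git's own object store.
# Usage: find_large_files DIR MIN_BYTES
find_large_files() {
  local dir="$1" min_bytes="$2"
  # find's "+Nc" means strictly more than N bytes, so subtract one for ">=".
  find "$dir" -type f ! -path '*/.git/*' -size "+$((min_bytes - 1))c"
}

# Scan the current directory with the assumed 100 MB threshold.
find_large_files . $((100 * 1024 * 1024))
```

Anything this prints is a candidate for gbrain files upload-raw and a .redirect.yaml pointer.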

Open threads

Timeline items older than 30 days with unresolved action items.

Flag for review

Benchmark Testing

Periodically verify search quality hasn't regressed. Run a battery of test queries across difficulty tiers:

Tier 1 (entity lookup): known names -- should always resolve
Tier 2 (topic recall): concepts, topics -- keyword search should handle these
Tier 3 (semantic): queries with no exact keyword match -- these need embeddings
Tier 4 (cross-domain): relational/connection queries -- only semantic search handles these

Compare results from gbrain search (keyword) vs gbrain query (hybrid). Quality matters more than speed: a correct answer in 2.5 s beats a wrong one in 200 ms.
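
One way to run the comparison is to capture both outputs per query and review them side by side. This is a sketch under assumptions: run_benchmark is a hypothetical helper, the queries file (one query per line) is something you maintain yourself, and both commands are assumed to accept the query as a single positional argument.

```shell
# Hypothetical helper: run each query through keyword and hybrid search and
# save the outputs for manual review. Prints the number of queries run.
# Assumption: gbrain search and gbrain query take the query as one argument.
# Usage: run_benchmark QUERIES_FILE OUT_DIR
run_benchmark() {
  local queries_file="$1" out_dir="$2" n=0 query
  mkdir -p "$out_dir"
  while IFS= read -r query; do
    [ -n "$query" ] || continue   # skip blank lines
    n=$((n + 1))
    # </dev/null keeps the commands from consuming the queries file.
    gbrain search "$query" > "$out_dir/$n.keyword.txt" 2>&1 < /dev/null
    gbrain query "$query" > "$out_dir/$n.hybrid.txt" 2>&1 < /dev/null
  done < "$queries_file"
  echo "$n"
}
```

Calling run_benchmark queries.txt /tmp/gbrain-bench leaves one keyword and one hybrid file per query; keeping the output directories from earlier runs makes regressions visible with diff.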

When to run benchmarks:

After major brain imports or re-imports
After gbrain version upgrades
After embedding regeneration
Monthly to track quality drift

Heartbeat Integration

For production agents running on a schedule, integrate gbrain health checks into your operational heartbeat.

On every heartbeat (hourly or per-session)

Run gbrain doctor --json and check for degradation. Report any failing checks to the user. Key signals: connection health, schema version, RLS status, embedding staleness.

Weekly maintenance

Run gbrain embed --stale to refresh embeddings for pages that have changed since their last embedding. For large brains (>5000 pages), run this with nohup:

nohup gbrain embed --stale > /tmp/gbrain-embed.log 2>&1 &

Daily verification

Verify sync is running: check gbrain stats and confirm last_sync is within the last 24 hours. If sync has stopped, the brain is drifting from the repo.
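
The 24-hour rule reduces to a subtraction once last_sync is in Unix epoch seconds. The sketch below assumes you have already read last_sync from gbrain stats and converted it; the output format of gbrain stats is not specified here, so parsing is left out. sync_is_fresh is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if LAST_SYNC_EPOCH is within the last 24 hours.
# NOW_EPOCH defaults to the current time.
# Usage: sync_is_fresh LAST_SYNC_EPOCH [NOW_EPOCH]
sync_is_fresh() {
  local last_sync="$1" now="${2:-$(date +%s)}"
  [ $((now - last_sync)) -le 86400 ]
}

# Example: a sync that finished just now is fresh.
if sync_is_fresh "$(date +%s)"; then
  echo "sync ok"
else
  echo "sync stale: brain may be drifting from the repo"
fi
```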

Stale compiled truth detection

Flag pages where compiled truth is >30 days old but the timeline has recent entries. This means new evidence exists that hasn't been synthesized. These pages need a compiled truth rewrite (see the maintain workflow above).

Report Storage

After maintenance runs, save a report:

Health check results (before/after scores for each dimension)
Back-link violations found and fixed
Filing rule violations found
Citation gaps flagged
Benchmark results (if run)
Outstanding issues requiring user attention

This creates an audit trail for brain health over time.
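
Per the v0.9.0 commit notes, gbrain report --type <name> saves to brain/reports/{type}/YYYY-MM-DD-HHMM.md and validates the type against [a-z0-9-]. The sketch below only mirrors that naming and validation so you can predict where a report will land; report_path is a hypothetical helper, not gbrain's implementation, and the type name "maintenance" is an example.

```shell
# Hypothetical helper: prints the path a report of REPORT_TYPE would be saved
# to for the given timestamp, rejecting types outside [a-z0-9-].
# Usage: report_path REPORT_TYPE YYYY-MM-DD-HHMM
report_path() {
  local report_type="$1" stamp="$2"
  case "$report_type" in
    ''|*[!a-z0-9-]*)
      echo "invalid report type: $report_type" >&2
      return 1
      ;;
  esac
  printf 'brain/reports/%s/%s.md\n' "$report_type" "$stamp"
}

# Example: path for a report created right now.
report_path maintenance "$(date +%Y-%m-%d-%H%M)"
```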

Quality Rules

Never delete pages without confirmation
Log all changes via timeline entries
Check gbrain health before and after to show improvement

Tools Used

Check gbrain health (get_health)
List pages in gbrain with filters (list_pages)
Read a page from gbrain (get_page)
Check backlinks in gbrain (get_backlinks)
Link entities in gbrain (add_link)
Remove links in gbrain (remove_link)
Tag a page in gbrain (add_tag)
Remove a tag in gbrain (remove_tag)
View timeline in gbrain (get_timeline)
