feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62)
* feat: battle-tested skill patterns from production deployment Backport production-learned brain-operations patterns: - Iron Law of Back-Linking (mandatory bidirectional linking) - Brain filing rules (file by primary subject, not format) - Enrichment protocol (7-step pipeline, 3-tier system, person/company templates) - Media ingest workflows (articles, videos, podcasts, PDFs, screenshots) - Citation requirements (mandatory [Source: ...] on every fact) - Test Before Bulk operating principle - Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS - X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger * chore: bump version and changelog (v0.8.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add _brain-filing-rules.md to CLAUDE.md key files * feat: smart file upload with TUS resumable and .redirect.yaml pointers - Supabase Storage auto-selects upload method by file size: < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry) - Signed URL generation for private bucket access (1-hour expiry) - New `upload-raw` command with size routing: small text stays in git, large/media files go to cloud with .redirect.yaml pointer - New `signed-url` command for generating access links - File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy) - Redirect format upgraded: 10 fields with full metadata - All migration commands (mirror, redirect, restore, clean) handle both formats * feat: skills reference actual gbrain file commands - Filing rules document upload-raw, signed-url, and .redirect.yaml format - Ingest skill uses gbrain files upload-raw for raw source preservation - Maintain skill adds file storage health checks - Setup skill adds storage configuration phase with migration guidance - Voice recipe uses upload-raw for call audio storage - Migration v0.9.0 with complete storage setup instructions * chore: bump version and changelog (v0.9.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: gbrain publish -- shareable HTML with password protection First code+skill pair: deterministic code does the work (strip private data, encrypt with AES-256-GCM, generate self-contained HTML), the skill tells the agent when and how to use it. 34 new tests. See: https://x.com/garrytan/status/2042925773300908103 * feat: backlinks check/fix, page lint, and report commands Three new deterministic tools (zero LLM calls): - gbrain backlinks check/fix -- scans brain for entity mentions without back-links, creates them. Enforces the Iron Law from the skills. - gbrain lint [--fix] -- catches LLM preambles, code fence wrapping, placeholder dates, missing frontmatter, broken citations, empty sections. --fix auto-strips fixable artifacts. - gbrain report --type <name> -- saves timestamped reports to brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails. 33 new tests (409 total, 0 fail). * feat: v0.9.0 migration tells agents to swap scripts for built-in commands Migration file now: - Lists all 5 new deterministic commands with usage examples - Includes a script-to-command replacement table (old -> new) - Tells the agent to find custom script references in AGENTS.md, skills, and cron jobs and replace with gbrain commands - Adds recommended cron jobs for daily backlink fix + weekly lint - References the Thin Harness, Fat Skills thread * fix: CLI routing bugs found during DX review - Fixed subArgs reference error in handleCliOnly (used wrong variable name) - Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid conflict with existing backlinks operation (per-page incoming links) - Added TOOLS section to --help output showing publish, check-backlinks, lint, report - Added upload-raw and signed-url to FILES section in --help - Updated all docs/migration references to use check-backlinks * fix: security hardening from adversarial review - XSS: sanitize marked.parse() output (strip script/iframe/on* attrs) - Path traversal: validate report --type against [a-z0-9-] pattern - TUS: HEAD request before retry to get server's actual offset (TUS spec) - Pointer: upload-raw now includes pointer content in JSON output - Symlinks: use lstatSync in all walkers to prevent directory escape --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
85
CHANGELOG.md
85
CHANGELOG.md
@@ -2,6 +2,91 @@
|
||||
|
||||
All notable changes to GBrain will be documented in this file.
|
||||
|
||||
## [0.9.0] - 2026-04-11
|
||||
|
||||
### Added
|
||||
|
||||
- **Large files don't bloat your git repo anymore.** `gbrain files upload-raw`
|
||||
auto-routes by size: text and PDFs under 100 MB stay in git, everything larger
|
||||
(or any media file) goes to Supabase Storage with a `.redirect.yaml` pointer
|
||||
left in the repo. Files over 100 MB use TUS resumable upload (6 MB chunks with
|
||||
retry and backoff) so a flaky connection doesn't lose a 2 GB video upload.
|
||||
`gbrain files signed-url` generates 1-hour access links for private buckets.
|
||||
|
||||
- **The full file migration lifecycle works end to end.** `mirror` uploads to
|
||||
cloud and keeps local copies. `redirect` replaces local files with
|
||||
`.redirect.yaml` pointers (verifies remote exists first, won't delete data).
|
||||
`restore` downloads back from cloud. `clean` removes pointers when you're sure.
|
||||
`status` shows where you are. Three states, zero data loss risk.
|
||||
|
||||
- **Your brain now enforces its own graph integrity.** The Iron Law of Back-Linking
|
||||
is mandatory across all skills. Every mention of a person or company creates
|
||||
a bidirectional link. This transforms your brain from a flat file store into a
|
||||
traversable knowledge graph.
|
||||
|
||||
- **Filing rules prevent the #1 brain mistake.** New `skills/_brain-filing-rules.md`
|
||||
stops the most common error: dumping everything into `sources/`. File by primary
|
||||
subject, not format. Includes notability gate and citation requirements.
|
||||
|
||||
- **Enrichment protocol that actually works.** Rewritten from a 46-line API list to
|
||||
a 7-step pipeline with 3-tier system, person/company page templates, pluggable
|
||||
data sources, validation rules, and bulk enrichment safety.
|
||||
|
||||
- **Ingest handles everything.** Articles, videos, podcasts, PDFs, screenshots,
|
||||
meeting transcripts, social media. Each with a workflow that uses real gbrain
|
||||
commands (`upload-raw`, `signed-url`) instead of theoretical patterns.
|
||||
|
||||
- **Citation requirements across all skills.** Every fact needs inline
|
||||
`[Source: ...]` citations. Three formats, source precedence hierarchy.
|
||||
|
||||
- **Maintain skill catches what you missed.** Back-link enforcement, citation audit,
|
||||
filing violations, file storage health checks, benchmark testing.
|
||||
|
||||
- **Voice calls don't crash on em dashes anymore.** Unicode sanitization for Twilio
|
||||
WebSocket, PII scrub, identity-first prompt, DIY STT+LLM+TTS pipeline option,
|
||||
Smart VAD default, auto-upload call audio via `gbrain files upload-raw`.
|
||||
|
||||
- **X-to-Brain gets eyes.** Image OCR, Filtered Stream real-time monitoring,
|
||||
6-dimension tweet rating rubric, outbound tweet monitoring, cron staggering.
|
||||
|
||||
- **Share brain pages without exposing the brain.** `gbrain publish` generates
|
||||
beautiful, self-contained HTML from any brain page. Strips private data
|
||||
(frontmatter, citations, confirmations, brain links, timeline) automatically.
|
||||
Optional AES-256-GCM password gate with client-side decryption, no server
|
||||
needed. Dark/light mode, mobile-optimized typography. This is the first
|
||||
code+skill pair: deterministic code does the work, the skill tells the agent
|
||||
when and how. See the [Thin Harness, Fat Skills](https://x.com/garrytan/status/2042925773300908103)
|
||||
thread for the architecture philosophy.
|
||||
|
||||
### Changed
|
||||
|
||||
- **Supabase Storage** now auto-selects upload method by file size: standard POST
|
||||
for < 100 MB, TUS resumable for >= 100 MB. Signed URL generation for private
|
||||
bucket access (1-hour expiry).
|
||||
- **File resolver** supports both `.redirect.yaml` (v0.9+) and legacy `.redirect`
|
||||
(v0.8) formats for backward compatibility.
|
||||
- **Redirect format** upgraded from `.redirect` (5 fields) to `.redirect.yaml`
|
||||
(10 fields: target, bucket, storage_path, size, size_human, hash, mime,
|
||||
uploaded, source_url, type).
|
||||
- **All skills** updated to reference actual `gbrain files` commands instead of
|
||||
theoretical patterns.
|
||||
- **Back-link enforcer closes the loop.** `gbrain check-backlinks check` scans your
|
||||
brain for entity mentions without back-links. `gbrain check-backlinks fix` creates
|
||||
them. The Iron Law of Back-Linking is in every skill, now the code enforces it.
|
||||
|
||||
- **Page linter catches LLM slop.** `gbrain lint` flags "Of course! Here is..."
|
||||
preambles, wrapping code fences, placeholder dates, missing frontmatter, broken
|
||||
citations, and empty sections. `gbrain lint --fix` auto-strips the fixable ones.
|
||||
Every brain that uses AI for ingestion accumulates this. Now it's one command.
|
||||
|
||||
- **Audit trail for everything.** `gbrain report --type enrichment-sweep` saves
|
||||
timestamped reports to `brain/reports/{type}/YYYY-MM-DD-HHMM.md`. The maintain
|
||||
skill references this for enrichment sweeps, meeting syncs, and maintenance runs.
|
||||
|
||||
- **Publish skill** added to manifest (8th skill). First code+skill pair.
|
||||
- Skills version bumped to 0.9.0.
|
||||
- 67 new unit tests across publish, backlinks, lint, and report. Total: 409 pass.
|
||||
|
||||
## [0.8.0] - 2026-04-11
|
||||
|
||||
### Added
|
||||
|
||||
13
CLAUDE.md
13
CLAUDE.md
@@ -26,7 +26,7 @@ markdown files (tool-agnostic, work with both CLI and plugin contexts).
|
||||
- `src/core/sync.ts` — Pure sync functions (manifest parsing, filtering, slug conversion)
|
||||
- `src/core/storage.ts` — Pluggable storage interface (S3, Supabase Storage, local)
|
||||
- `src/core/supabase-admin.ts` — Supabase admin API (project discovery, pgvector check)
|
||||
- `src/core/file-resolver.ts` — MIME detection, content hashing for file uploads
|
||||
- `src/core/file-resolver.ts` — File resolution with fallback chain (local -> .redirect.yaml -> .redirect -> .supabase)
|
||||
- `src/core/chunkers/` — 3-tier chunking (recursive, semantic, LLM-guided)
|
||||
- `src/core/search/` — Hybrid search: vector + keyword + RRF + multi-query expansion + dedup
|
||||
- `src/core/embedding.ts` — OpenAI text-embedding-3-large, batch, retry, backoff
|
||||
@@ -50,7 +50,12 @@ markdown files (tool-agnostic, work with both CLI and plugin contexts).
|
||||
- `docs/guides/diligence-ingestion.md` — Data room to brain pages pipeline
|
||||
- `docs/designs/HOMEBREW_FOR_PERSONAL_AI.md` — 10-star vision for integration system
|
||||
- `docs/mcp/` — Per-client setup guides (Claude Desktop, Code, Cowork, Perplexity)
|
||||
- `skills/_brain-filing-rules.md` — Cross-cutting brain filing rules (referenced by all brain-writing skills)
|
||||
- `skills/migrations/` — Version migration files with feature_pitch YAML frontmatter
|
||||
- `src/commands/publish.ts` — Deterministic brain page publisher (code+skill pair, zero LLM calls)
|
||||
- `src/commands/backlinks.ts` — Back-link checker and fixer (enforces Iron Law)
|
||||
- `src/commands/lint.ts` — Page quality linter (catches LLM artifacts, placeholder dates)
|
||||
- `src/commands/report.ts` — Structured report saver (audit trail for maintenance/enrichment)
|
||||
- `openclaw.plugin.json` — ClawHub bundle plugin manifest
|
||||
|
||||
## Commands
|
||||
@@ -78,7 +83,11 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac
|
||||
`test/yaml-lite.test.ts` (YAML parsing), `test/check-update.test.ts` (version check + update CLI),
|
||||
`test/pglite-engine.test.ts` (PGLite engine, all 37 BrainEngine methods),
|
||||
`test/utils.test.ts` (shared SQL utilities), `test/engine-factory.test.ts` (engine factory + dynamic imports),
|
||||
`test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation).
|
||||
`test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation),
|
||||
`test/publish.test.ts` (content stripping, encryption, password generation, HTML output),
|
||||
`test/backlinks.test.ts` (entity extraction, back-link detection, timeline entry generation),
|
||||
`test/lint.test.ts` (LLM artifact detection, code fence stripping, frontmatter validation),
|
||||
`test/report.test.ts` (report format, directory structure).
|
||||
|
||||
E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.
|
||||
- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys)
|
||||
|
||||
@@ -4,10 +4,11 @@ title: "Thin Harness, Fat Skills"
|
||||
subtitle: "How to Make AI Agents Actually Understand Your Data"
|
||||
author: Garry Tan
|
||||
created: 2026-04-09
|
||||
updated: 2026-04-09
|
||||
updated: 2026-04-11
|
||||
tags: [ai, agents, gstack, harness-engineering, skills, architecture]
|
||||
status: draft-v4
|
||||
talk: "YC Spring 2026 — Thin Harness, Fat Skills"
|
||||
talk: "YC Spring 2026 -- Thin Harness, Fat Skills"
|
||||
thread: https://x.com/garrytan/status/2042925773300908103
|
||||
---
|
||||
|
||||
# Thin Harness, Fat Skills
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "gbrain",
|
||||
"version": "0.8.0",
|
||||
"version": "0.9.0",
|
||||
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
|
||||
"type": "module",
|
||||
"main": "src/core/index.ts",
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
---
|
||||
id: twilio-voice-brain
|
||||
name: Voice-to-Brain
|
||||
version: 0.8.0
|
||||
description: Phone calls create brain pages via Twilio + OpenAI Realtime + GBrain MCP. Callers talk, brain pages appear.
|
||||
version: 0.8.1
|
||||
description: Phone calls create brain pages via Twilio + voice pipeline + GBrain MCP. Two architectures -- OpenAI Realtime (turnkey) or DIY STT+LLM+TTS (full control). Callers talk, brain pages appear.
|
||||
category: sense
|
||||
requires: [ngrok-tunnel]
|
||||
secrets:
|
||||
@@ -52,6 +52,9 @@ auth token is incorrect. Let's re-enter it."
|
||||
|
||||
## Architecture
|
||||
|
||||
Two pipeline options:
|
||||
|
||||
### Option A: OpenAI Realtime (turnkey, simpler)
|
||||
```
|
||||
Caller (phone)
|
||||
↓ Twilio (WebSocket, g711_ulaw audio — no transcoding)
|
||||
@@ -64,6 +67,33 @@ Brain page created (meetings/YYYY-MM-DD-call-{caller}.md)
|
||||
Summary posted to messaging app (Telegram/Slack/Discord)
|
||||
```
|
||||
|
||||
### Option B: DIY STT+LLM+TTS (full control, production-grade)
|
||||
```
|
||||
Caller (phone or WebRTC browser)
|
||||
↓ Twilio WebSocket OR WebRTC
|
||||
Voice Server (Node.js)
|
||||
↓ Deepgram STT (streaming speech-to-text, speaker diarization)
|
||||
↓ Claude API (streaming SSE, sentence-boundary dispatch)
|
||||
↓ Cartesia / OpenAI TTS (text-to-speech, low latency)
|
||||
↓ Function calls during conversation
|
||||
GBrain MCP (semantic search, page reads, page writes)
|
||||
↓ Post-call
|
||||
Brain page + audio upload + transcript storage
|
||||
```
|
||||
|
||||
**Why v2 (Option B)?** OpenAI Realtime is a black box — you can't control STT
|
||||
quality, swap LLMs, or debug audio issues. The DIY stack gives you transparent
|
||||
Deepgram+Claude+TTS with full control over each stage. Trade-off: more integration
|
||||
work, but you own the pipeline.
|
||||
|
||||
**Production-tested v2 architecture (pipeline.mjs, ~250 lines):**
|
||||
- Streaming SSE from Claude with sentence-boundary TTS dispatch
|
||||
- 20-turn conversation history cap (prevents context bloat)
|
||||
- Reconnect logic with exponential backoff on STT/TTS disconnects
|
||||
- Periodic keepalives to prevent WebSocket timeout
|
||||
- Audio endpointing for natural turn-taking
|
||||
- Smart VAD (Silero) as default with push-to-talk fallback
|
||||
|
||||
## Opinionated Defaults
|
||||
|
||||
These are production-tested defaults from a real deployment. Customize after setup.
|
||||
@@ -428,7 +458,7 @@ fi
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.gbrain/integrations/twilio-voice-brain
|
||||
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
|
||||
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.8.1","status":"ok","details":{"phone":"TWILIO_NUMBER","deployment":"local+ngrok"}}' >> ~/.gbrain/integrations/twilio-voice-brain/heartbeat.jsonl
|
||||
```
|
||||
|
||||
Tell the user: "Voice-to-brain is fully set up. Your number is [NUMBER]. Here's
|
||||
@@ -472,6 +502,97 @@ The watchdog restarts the server if it crashes."
|
||||
- The watchdog (Step 9) handles this automatically
|
||||
- For a permanent URL: upgrade to ngrok paid ($8/mo) for a static domain, or deploy to Fly.io/Railway instead
|
||||
|
||||
**Note on Option B credentials:** If using the DIY pipeline (Option B), you will
|
||||
also need API keys for your chosen STT provider (e.g., Deepgram) and TTS provider
|
||||
(e.g., Cartesia, OpenAI TTS). Collect and validate these during Step 2 alongside
|
||||
the Twilio and OpenAI credentials listed above.
|
||||
|
||||
## Critical Production Fixes (v0.8.1)
|
||||
|
||||
These are NOT optional. They prevent real production failures discovered in a
|
||||
deployment handling daily calls.
|
||||
|
||||
### Unicode Crash Fix (CRITICAL)
|
||||
|
||||
**Problem:** Em dashes (--), arrows (->), and other non-ASCII characters in the
|
||||
prompt context cause broken surrogate pairs that crash the Twilio WebSocket
|
||||
connection. Phone calls drop silently.
|
||||
|
||||
**Fix:** Replace ALL non-ASCII characters with ASCII equivalents throughout the
|
||||
entire prompt file before sending to Twilio. This is invisible in development
|
||||
(browsers handle unicode fine) and catastrophic in production.
|
||||
|
||||
```javascript
|
||||
function sanitizeForTwilio(text) {
|
||||
return text
|
||||
.replace(/[\u2014\u2013]/g, '--') // em/en dash
|
||||
.replace(/[\u2018\u2019]/g, "'") // smart quotes
|
||||
.replace(/[\u201C\u201D]/g, '"') // smart double quotes
|
||||
.replace(/\u2192/g, '->') // right arrow
|
||||
.replace(/\u2190/g, '<-') // left arrow
|
||||
.replace(/[\u2026]/g, '...') // ellipsis
|
||||
.replace(/[^\x00-\x7F]/g, '') // strip remaining non-ASCII
|
||||
}
|
||||
```
|
||||
|
||||
### PII Scrub from Voice Context (CRITICAL)
|
||||
|
||||
**Problem:** Brain context loaded into the voice prompt may contain phone numbers,
|
||||
email addresses, and other PII. The voice agent reads these aloud to callers.
|
||||
|
||||
**Fix:** Regex-strip PII from all voice context before injecting into the prompt:
|
||||
- Phone numbers: `/\+?\d[\d\s\-().]{7,}\d/g`
|
||||
- Email addresses: `/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g`
|
||||
- URLs with auth tokens or API keys
|
||||
- Any string matching common credential patterns
|
||||
|
||||
### Identity-First Prompt (IMPORTANT)
|
||||
|
||||
**Problem:** Voice agents lose their identity mid-conversation. Saying "You are NOT
|
||||
Claude" doesn't stick. The model reverts to its base persona.
|
||||
|
||||
**Fix:** Put identity FIRST in the system prompt, before any context or rules:
|
||||
```
|
||||
# You ARE [Agent Name]
|
||||
You are [Name], a voice assistant who works with [Brain Name].
|
||||
You are NOT Claude. You are NOT a general AI assistant.
|
||||
[Name] has their own personality: [traits].
|
||||
|
||||
# Context
|
||||
[... brain context, calendar, tasks ...]
|
||||
|
||||
# Rules
|
||||
[... behavioral rules ...]
|
||||
```
|
||||
|
||||
Positioning identity before context ensures the model sees it first and
|
||||
maintains it throughout the conversation.
|
||||
|
||||
### Auto-Upload Call Audio (RECOMMENDED)
|
||||
|
||||
**Problem:** If post-call processing fails, the call audio is lost forever.
|
||||
|
||||
**Fix:** Auto-upload ALL call audio immediately on call end:
|
||||
- Twilio calls: download the MP3 recording URL from Twilio
|
||||
- WebRTC calls: capture via MediaRecorder (webm/opus format)
|
||||
- Upload via `gbrain files upload-raw <audio-file> --page meetings/call-slug --type call-recording`
|
||||
- GBrain auto-routes: small files stay in git, large files go to cloud storage
|
||||
with `.redirect.yaml` pointer. Files >= 100 MB use TUS resumable upload.
|
||||
- Generate signed URLs for playback: `gbrain files signed-url <storage-path>`
|
||||
- This ensures every call has a recoverable audio source regardless
|
||||
of whether the transcript or brain page was created successfully
|
||||
|
||||
### Smart VAD as Default
|
||||
|
||||
**Problem:** Push-to-talk is unnatural on phone calls. Server-side VAD has
|
||||
variable quality.
|
||||
|
||||
**Fix:** Default to Smart VAD (Silero VAD) for voice activity detection:
|
||||
- Better endpointing than server-side VAD
|
||||
- Fewer false triggers in noisy environments
|
||||
- PTT available as fallback (UI toggle for WebRTC clients)
|
||||
- Presets: quiet (0.7 threshold), normal (0.85), noisy (0.95), very_noisy (0.98)
|
||||
|
||||
## Production Patterns (Recommended)
|
||||
|
||||
These patterns come from a production voice deployment handling real calls daily.
|
||||
@@ -488,13 +609,13 @@ AI brain. "I work with [Brain], [Owner]'s AI." Lighter, more playful, more curio
|
||||
#### Pre-Computed Bid System
|
||||
**Problem:** Dead air kills engagement. Voice agents wait passively.
|
||||
**Pattern:** At call start, scan live context and pre-compute up to 10 engagement bids.
|
||||
Two types: informative (tasks, calendar, social radar) and relational (curiosity templates).
|
||||
Two types: informative (tasks, calendar, social monitoring) and relational (curiosity templates).
|
||||
Bids go INTO the prompt so the agent picks from a list. Use bids #1 and #2 for greeting,
|
||||
cycle the rest during conversation. Never ask "anything else?" — bring up the next bid.
|
||||
|
||||
#### Context-First Prompt
|
||||
**Problem:** Voice agent greets generically because it doesn't know what's happening today.
|
||||
**Pattern:** Load live context at call start: tasks, calendar, location, social radar,
|
||||
**Pattern:** Load live context at call start: tasks, calendar, location, social monitoring,
|
||||
morning briefing. Position context FIRST in the prompt (before rules) so the model sees
|
||||
it immediately and uses it in the greeting. Try/catch per section. Cap 500-1000 chars each.
|
||||
|
||||
@@ -658,7 +779,7 @@ over WebRTC data channel — use Whisper post-call instead.
|
||||
| Keyword | Report Loaded |
|
||||
|---------|--------------|
|
||||
| email, inbox, mail | inbox sweep report |
|
||||
| social, twitter, mentions | social radar report |
|
||||
| social, twitter, mentions | social engagement report |
|
||||
| briefing, morning | morning briefing |
|
||||
| meeting | meeting sync report |
|
||||
| slack | slack scan report |
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
---
|
||||
id: x-to-brain
|
||||
name: X-to-Brain
|
||||
version: 0.7.0
|
||||
description: Twitter timeline, mentions, and keyword monitoring flow into brain pages. Tracks deletions and engagement velocity.
|
||||
version: 0.8.1
|
||||
description: Twitter timeline, mentions, and keyword monitoring flow into brain pages. Tracks deletions, engagement velocity, OCR on images, and real-time alerts.
|
||||
category: sense
|
||||
requires: []
|
||||
secrets:
|
||||
@@ -201,7 +201,99 @@ The agent should review collected data 2-3x daily and run enrichment.
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.gbrain/integrations/x-to-brain
|
||||
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.7.0","status":"ok","details":{"user_id":"X_USER_ID"}}' >> ~/.gbrain/integrations/x-to-brain/heartbeat.jsonl
|
||||
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","event":"setup_complete","source_version":"0.8.1","status":"ok","details":{"user_id":"X_USER_ID"}}' >> ~/.gbrain/integrations/x-to-brain/heartbeat.jsonl
|
||||
```
|
||||
|
||||
## Production Patterns (v0.8.1)
|
||||
|
||||
These patterns come from a production deployment tracking 19+ accounts with
|
||||
real-time monitoring.
|
||||
|
||||
### Image OCR (NEW)
|
||||
|
||||
**Problem:** Text-only collection misses visual context in tweet images --
|
||||
screenshots, charts, memes with text overlay, quote screenshots.
|
||||
|
||||
**Fix:** Run OCR on tweet images via a vision model (Claude Sonnet or equivalent):
|
||||
- For every tweet with images, extract full text content via vision API
|
||||
- Store OCR output alongside the tweet data
|
||||
- Include extracted text in entity detection and brain page updates
|
||||
- Charts/data visualizations: extract data points, describe findings
|
||||
|
||||
This catches signal that text-only collectors miss entirely.
|
||||
|
||||
### Real-Time Monitoring via Filtered Stream (NEW)
|
||||
|
||||
**Problem:** 30-minute polling means you find out about things 30 minutes late.
|
||||
For time-sensitive content (engagement spikes, deletions, breaking threads),
|
||||
that's too slow.
|
||||
|
||||
**Fix:** Use Twitter's Filtered Stream API (`GET /2/tweets/search/stream`) for
|
||||
near-real-time monitoring. Catches outbound tweets within seconds.
|
||||
|
||||
**Setup:**
|
||||
1. Add filter rules: `POST /2/tweets/search/stream/rules` with your tracking terms
|
||||
2. Open persistent connection: `GET /2/tweets/search/stream`
|
||||
3. Process tweets as they arrive (no polling delay)
|
||||
|
||||
**Requirements:** Basic tier ($200/mo) minimum for Filtered Stream access.
|
||||
|
||||
**Use alongside polling:** Stream for real-time alerts, polling for completeness
|
||||
(stream can drop tweets during disconnects).
|
||||
|
||||
### Tweet Rating Rubric (NEW)
|
||||
|
||||
**Problem:** Not all tweets deserve the same attention. Without scoring, every
|
||||
tweet gets equal weight.
|
||||
|
||||
**Fix:** Rate tweets on a 6-dimension rubric:
|
||||
1. **Reach** -- follower count, engagement rate
|
||||
2. **Relevance** -- connection to your interests/work
|
||||
3. **Sentiment** -- positive/negative/neutral toward you
|
||||
4. **Novelty** -- new information vs rehash
|
||||
5. **Actionability** -- does this require a response?
|
||||
6. **Virality potential** -- engagement velocity, quote-tweet ratio
|
||||
|
||||
Re-rate after 60 minutes to track engagement trajectory. A tweet at 50 likes
|
||||
that hits 500 in an hour is a different signal than one that stays at 50.
|
||||
|
||||
### Outbound Tweet Monitoring (NEW)
|
||||
|
||||
**Problem:** You tweet something and don't notice engagement patterns until
|
||||
hours later.
|
||||
|
||||
**Fix:** 60-second monitoring window after every outbound tweet:
|
||||
- Check engagement velocity (likes, replies, quotes)
|
||||
- Flag unusual reply-to-like ratios (high reply ratios signal controversy)
|
||||
- Flag if quote-tweet ratio > retweet ratio (commentary, not sharing)
|
||||
- Cross-reference mentioned accounts against brain for context
|
||||
|
||||
### X-to-Brain Pipeline (NEW)
|
||||
|
||||
Every tweet interaction can automatically create/update brain pages:
|
||||
- Mentioned person has a brain page? Append to their timeline
|
||||
- New person mentioned? Check notability gate, create page if notable
|
||||
- Article URL in tweet? Fetch and ingest via article workflow
|
||||
- Video URL in tweet? Queue for transcription pipeline
|
||||
- Images? OCR and extract text content
|
||||
|
||||
Follow `skills/_brain-filing-rules.md` for filing decisions.
|
||||
|
||||
### Cron Staggering (IMPORTANT)
|
||||
|
||||
**Problem:** Multiple cron jobs firing simultaneously causes resource contention
|
||||
and timeouts.
|
||||
|
||||
**Fix:** Stagger all collection schedules so max 1 runs per minute:
|
||||
```
|
||||
# Good: staggered
|
||||
*/30 * * * * x-collector # :00, :30
|
||||
5,35 * * * * x-bundle-ingest # :05, :35
|
||||
10 */3 * * * social-monitor # :10 every 3h
|
||||
|
||||
# Bad: overlapping
|
||||
*/30 * * * * x-collector
|
||||
*/30 * * * * x-bundle-ingest # fires at same time!
|
||||
```
|
||||
|
||||
## Implementation Guide
|
||||
|
||||
114
skills/_brain-filing-rules.md
Normal file
114
skills/_brain-filing-rules.md
Normal file
@@ -0,0 +1,114 @@
|
||||
# Brain Filing Rules -- MANDATORY for all skills that write to the brain
|
||||
|
||||
## The Rule
|
||||
|
||||
The PRIMARY SUBJECT of the content determines where it goes. Not the format,
|
||||
not the source, not the skill that's running.
|
||||
|
||||
## Decision Protocol
|
||||
|
||||
1. Identify the primary subject (a person? company? concept? policy issue?)
|
||||
2. File in the directory that matches the subject
|
||||
3. Cross-link from related directories
|
||||
4. When in doubt: what would you search for to find this page again?
|
||||
|
||||
## Common Misfiling Patterns -- DO NOT DO THESE
|
||||
|
||||
| Wrong | Right | Why |
|
||||
|-------|-------|-----|
|
||||
| Analysis of a topic -> `sources/` | -> appropriate subject directory | sources/ is for raw data only |
|
||||
| Article about a person -> `sources/` | -> `people/` | Primary subject is a person |
|
||||
| Meeting-derived company info -> `meetings/` only | -> ALSO update `companies/` | Entity propagation is mandatory |
|
||||
| Research about a company -> `sources/` | -> `companies/` | Primary subject is a company |
|
||||
| Reusable framework/thesis -> `sources/` | -> `concepts/` | It's a mental model |
|
||||
| Tweet thread about policy -> `media/` | -> `civic/` or `concepts/` | media/ is for content ops |
|
||||
|
||||
## What `sources/` Is Actually For
|
||||
|
||||
`sources/` is ONLY for:
|
||||
- Bulk data imports (API dumps, CSV exports, snapshots)
|
||||
- Raw data that feeds multiple brain pages (e.g., a guest export, contact sync)
|
||||
- Periodic captures (quarterly snapshots, sync exports)
|
||||
|
||||
If the content has a clear primary subject (a person, company, concept, policy
|
||||
issue), it does NOT go in sources/. Period.
|
||||
|
||||
## Notability Gate
|
||||
|
||||
Not everything deserves a brain page. Before creating a new entity page:
|
||||
- **People:** Will you interact with them again? Are they relevant to your work?
|
||||
- **Companies:** Are they relevant to your work or interests?
|
||||
- **Concepts:** Is this a reusable mental model worth referencing later?
|
||||
- **When in doubt, DON'T create.** A missing page can be created later.
|
||||
A junk page wastes attention and degrades search quality.
|
||||
|
||||
## Iron Law: Back-Linking (MANDATORY)
|
||||
|
||||
Every mention of a person or company with a brain page MUST create a back-link
|
||||
FROM that entity's page TO the page mentioning them. This is bidirectional:
|
||||
the new page links to the entity, AND the entity's page links back.
|
||||
|
||||
Format for back-links (append to Timeline or See Also):
|
||||
```
|
||||
- **YYYY-MM-DD** | Referenced in [page title](path/to/page.md) -- brief context
|
||||
```
|
||||
|
||||
An unlinked mention is a broken brain. The graph is the intelligence.
|
||||
|
||||
## Citation Requirements (MANDATORY)
|
||||
|
||||
Every fact written to a brain page must carry an inline `[Source: ...]` citation.
|
||||
|
||||
Three formats:
|
||||
- **Direct attribution:** `[Source: User, {context}, YYYY-MM-DD]`
|
||||
- **API/external:** `[Source: {provider}, YYYY-MM-DD]` or `[Source: {publication}, {URL}]`
|
||||
- **Synthesis:** `[Source: compiled from {list of sources}]`
|
||||
|
||||
Source precedence (highest to lowest):
|
||||
1. User's direct statements (highest authority)
|
||||
2. Compiled truth (pre-existing brain synthesis)
|
||||
3. Timeline entries (raw evidence)
|
||||
4. External sources (API enrichment, web search -- lowest)
|
||||
|
||||
When sources conflict, note the contradiction with both citations. Don't
|
||||
silently pick one.
|
||||
|
||||
## Raw Source Preservation
|
||||
|
||||
Every ingested item should have its raw source preserved for provenance.
|
||||
|
||||
**Size routing (automatic via `gbrain files upload-raw`):**
|
||||
- **< 100 MB text/PDF**: stays in the brain repo (git-tracked) in a `.raw/`
|
||||
sidecar directory alongside the brain page
|
||||
- **>= 100 MB OR media files** (video, audio, images): uploaded to cloud
|
||||
storage (Supabase Storage, S3, etc.) with a `.redirect.yaml` pointer left
|
||||
in the brain repo. Files >= 100 MB use TUS resumable upload (6 MB chunks
|
||||
with retry) for reliability.
|
||||
|
||||
**Upload command:**
|
||||
```bash
|
||||
gbrain files upload-raw <file> --page <page-slug> --type <type>
|
||||
```
|
||||
Returns JSON: `{storage: "git"}` for small files, `{storage: "supabase", storagePath, reference}` for cloud.
|
||||
|
||||
**The `.redirect.yaml` pointer format:**
|
||||
```yaml
|
||||
target: supabase://brain-files/page-slug/filename.mp4
|
||||
bucket: brain-files
|
||||
storage_path: page-slug/filename.mp4
|
||||
size: 524288000
|
||||
size_human: 500 MB
|
||||
hash: sha256:abc123...
|
||||
mime: video/mp4
|
||||
uploaded: 2026-04-11T...
|
||||
type: transcript
|
||||
```
|
||||
|
||||
**Accessing stored files:**
|
||||
```bash
|
||||
gbrain files signed-url <storage-path> # Generate 1-hour signed URL
|
||||
gbrain files restore <dir> # Download back to local
|
||||
```
|
||||
|
||||
This ensures any derived brain page can be traced back to its original source,
|
||||
and large files don't bloat the git repo.
|
||||
@@ -2,6 +2,9 @@
|
||||
|
||||
Compile a daily briefing from brain context.
|
||||
|
||||
> **Filing rule:** When the briefing creates or updates brain pages,
|
||||
> follow `skills/_brain-filing-rules.md`.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Today's meetings.** For each meeting on the calendar:
|
||||
@@ -72,6 +75,18 @@ PEOPLE IN PLAY
|
||||
- [name] -- [why they're active]
|
||||
```
|
||||
|
||||
## Back-Linking During Briefing
|
||||
|
||||
If the briefing creates or updates any brain pages (e.g., new meeting prep
|
||||
pages, updated entity pages), the back-linking iron law applies: every entity
|
||||
mentioned must have a back-link from their page. See `skills/_brain-filing-rules.md`.
|
||||
|
||||
## Citation in Briefings
|
||||
|
||||
When presenting facts from brain pages, include inline citations:
|
||||
- "Jane is CTO of Acme [Source: people/jane-doe, updated 2026-04-01]"
|
||||
- This lets the user trace any claim back to the brain page and assess freshness
|
||||
|
||||
## Tools Used
|
||||
|
||||
- Search gbrain by name (query)
|
||||
|
||||
@@ -1,39 +1,281 @@
|
||||
# Enrich Skill
|
||||
|
||||
Enrich person and company pages from external APIs.
|
||||
Enrich person and company pages from external sources. Scale effort to importance.
|
||||
|
||||
## Sources
|
||||
> **Filing rule:** Read `skills/_brain-filing-rules.md` before creating any new page.
|
||||
|
||||
| Source | Data | API |
|
||||
|--------|------|-----|
|
||||
| Crustdata | LinkedIn profiles, company data | REST API |
|
||||
| Happenstance | Career history, connections | REST API |
|
||||
| Exa | Web mentions, articles | REST API |
|
||||
## Iron Law: Back-Linking (MANDATORY)
|
||||
|
||||
Note: enrichment requires separate API credentials for each service. No client
|
||||
integrations ship in v1. This skill guides the agent to make API calls directly.
|
||||
Every mention of a person or company with a brain page MUST create a back-link
|
||||
FROM that entity's page TO the page mentioning them. An unlinked mention is a
|
||||
broken brain. See `skills/_brain-filing-rules.md` for format.
|
||||
|
||||
## Workflow
|
||||
## Philosophy
|
||||
|
||||
1. **Select target pages.** List person or company pages in gbrain.
|
||||
2. **For each page:**
|
||||
- Read the page from gbrain to understand what we already know
|
||||
- Call external APIs for fresh data
|
||||
- Store raw API responses in gbrain (put_raw_data) to preserve provenance
|
||||
- Distill highlights into compiled_truth updates
|
||||
- Store the updated page in gbrain
|
||||
3. **Validation rules:**
|
||||
- Connection count < 20 on LinkedIn = likely wrong person, skip
|
||||
- Name mismatch between brain and API = skip, flag for manual review
|
||||
- Don't overwrite human-written assessments with API boilerplate
|
||||
A brain page should read like an intelligence dossier, not a LinkedIn scrape.
|
||||
Facts are table stakes. Texture is the value -- what do they believe, what are
|
||||
they building, what makes them tick, where are they headed.
|
||||
|
||||
## Quality Rules
|
||||
## Citation Requirements (MANDATORY)
|
||||
|
||||
- Raw data goes to gbrain's raw_data store (preserves provenance)
|
||||
- Only distilled, useful info goes to compiled_truth
|
||||
- Always add a timeline entry in gbrain: "Enriched from [source] on [date]"
|
||||
- Don't enrich the same page more than once per week unless requested
|
||||
- Rate limit: respect API rate limits, use exponential backoff
|
||||
Every fact must carry an inline `[Source: ...]` citation.
|
||||
|
||||
Three formats:
|
||||
- **Direct attribution:** `[Source: User, {context}, YYYY-MM-DD]`
|
||||
- **API/external:** `[Source: {provider} enrichment, YYYY-MM-DD]`
|
||||
- **Synthesis:** `[Source: compiled from {list of sources}]`
|
||||
|
||||
Source precedence (highest to lowest):
|
||||
1. User's direct statements
|
||||
2. Compiled truth (pre-existing brain synthesis)
|
||||
3. Timeline entries (raw evidence)
|
||||
4. External sources (API enrichment, web search)
|
||||
|
||||
When sources conflict, note the contradiction with both citations.
|
||||
|
||||
## When To Enrich
|
||||
|
||||
### Primary triggers
|
||||
- User mentions an entity in conversation
|
||||
- Entity appears in a meeting transcript or email
|
||||
- New contact appears with significant context
|
||||
- Entity makes news or has a major event
|
||||
- Any ingest pipeline encounters a notable entity
|
||||
|
||||
### Do NOT enrich
|
||||
- Random mentions with no relationship signal
|
||||
- Bot/spam accounts
|
||||
- Entities with no substantive connection to the user's work
|
||||
- Same page enriched within the past week (unless new signal warrants it)
|
||||
|
||||
## Enrichment Tiers
|
||||
|
||||
Scale enrichment to importance. Don't waste API calls on low-value entities.
|
||||
|
||||
| Tier | Who | Effort | Sources |
|
||||
|------|-----|--------|---------|
|
||||
| 1 (key) | Inner circle, close collaborators, key contacts | Full pipeline | All available APIs + deep web research |
|
||||
| 2 (notable) | Occasional interactions, industry figures | Moderate | Web research + social + brain cross-ref |
|
||||
| 3 (minor) | Worth tracking, not critical | Light | Brain cross-ref + social lookup if handle known |
|
||||
|
||||
## The Enrichment Protocol (7 Steps)
|
||||
|
||||
### Step 1: Identify entities
|
||||
|
||||
Extract people, companies, concepts from the incoming signal.
|
||||
|
||||
### Step 2: Check brain state
|
||||
|
||||
For each entity:
|
||||
- `gbrain search "name"` -- does a page already exist?
|
||||
- **If yes:** UPDATE path (add new signal, update compiled truth if material)
|
||||
- **If no:** CREATE path (check notability gate first, then create)
|
||||
|
||||
### Step 3: Extract signal from source
|
||||
|
||||
Don't just capture facts. Capture texture:
|
||||
|
||||
| Signal Type | What to Extract |
|
||||
|-------------|----------------|
|
||||
| Opinions, beliefs | What They Believe section |
|
||||
| Current projects, features shipped | What They're Building section |
|
||||
| Ambition, career arc, motivation | What Motivates Them section |
|
||||
| Topics they return to obsessively | Hobby Horses section |
|
||||
| Who they amplify, argue with, respect | Network / Relationships |
|
||||
| Ascending, plateauing, pivoting? | Trajectory section |
|
||||
| Role, company, funding, location | State section (hard facts) |
|
||||
|
||||
### Step 4: External data source lookups
|
||||
|
||||
Priority order -- stop when you have enough signal for the entity's tier.
|
||||
|
||||
**4a. Brain cross-reference (always, all tiers)**
|
||||
- `gbrain search "name"` and `gbrain query "what do we know about name"`
|
||||
- Check related pages: company pages for person enrichment and vice versa
|
||||
- This is free and often the richest source
|
||||
|
||||
**4b. Web research (Tier 1 and 2)**
|
||||
- Use Perplexity, Brave Search, Exa, or equivalent web research tool
|
||||
- **Key pattern:** Send existing brain knowledge as context so the search
|
||||
returns DELTA (what's new vs what you already know), not a rehash
|
||||
- Opus-class models for Tier 1 deep research, lighter models for Tier 2
|
||||
|
||||
**4c. Social media lookup (all tiers when handle known)**
|
||||
- Pull recent posts/tweets for tone, interests, current focus
|
||||
- Social media is the highest-texture signal for what someone actually thinks
|
||||
|
||||
**4d. People enrichment APIs (Tier 1)**
|
||||
- LinkedIn data, career history, connections, education
|
||||
|
||||
**4e. Company enrichment APIs (Tier 1)**
|
||||
- Company data, financials, headcount, key hires, recent news
|
||||
|
||||
| Data Need | Example Sources | Tier |
|
||||
|-----------|----------------|------|
|
||||
| Web research | Perplexity, Brave, Exa | 1-2 |
|
||||
| LinkedIn / career | Crustdata, Proxycurl, People Data Labs | 1 |
|
||||
| Career history | Happenstance, LinkedIn | 1 |
|
||||
| Funding / company data | Crunchbase, PitchBook, Clearbit | 1 |
|
||||
| Social media | Platform APIs, web scraping | 1-3 |
|
||||
| Meeting history | Calendar/meeting transcript tools | 1-2 |
|
||||
|
||||
### Step 5: Save raw data (preserves provenance)
|
||||
|
||||
Store raw API responses via `put_raw_data` in gbrain:
|
||||
```json
|
||||
{
|
||||
"source": "crustdata",
|
||||
"fetched_at": "2026-04-11T...",
|
||||
"query": "jane doe",
|
||||
"data": { ... }
|
||||
}
|
||||
```
|
||||
|
||||
Raw data preserves provenance. If the compiled truth is ever questioned,
|
||||
the raw data shows exactly what the API returned.
|
||||
|
||||
### Step 6: Write to brain
|
||||
|
||||
#### CREATE path
|
||||
|
||||
1. Check notability gate (see `skills/_brain-filing-rules.md`)
|
||||
2. Check filing rules -- where does this entity go?
|
||||
3. Create page with the appropriate template (below)
|
||||
4. Fill compiled truth with citations
|
||||
5. Add first timeline entry
|
||||
6. Leave empty sections as `[No data yet]` (don't fill with boilerplate)
|
||||
|
||||
#### UPDATE path
|
||||
|
||||
1. Add new timeline entries (reverse-chronological, append-only)
|
||||
2. Update compiled truth ONLY if the new signal materially changes the picture
|
||||
3. Update State section with new facts
|
||||
4. Flag contradictions between new signal and existing compiled truth
|
||||
5. Don't overwrite user-written assessments with API boilerplate
|
||||
|
||||
#### Person page template
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: Full Name
|
||||
type: person
|
||||
created: YYYY-MM-DD
|
||||
updated: YYYY-MM-DD
|
||||
tags: []
|
||||
company: Current Company
|
||||
relationship: How the user knows them
|
||||
email:
|
||||
linkedin:
|
||||
twitter:
|
||||
location:
|
||||
---
|
||||
|
||||
# Full Name
|
||||
|
||||
> 1-paragraph executive summary: HOW do you know them, WHY do they matter,
|
||||
> what's the current state of the relationship.
|
||||
|
||||
## State
|
||||
Role, company, key context. Hard facts only.
|
||||
|
||||
## What They Believe
|
||||
Ideology, first principles, worldview. What hills do they die on?
|
||||
|
||||
## What They're Building
|
||||
Current projects, recent launches, what they're focused on.
|
||||
|
||||
## What Motivates Them
|
||||
Ambition, career arc, what drives them.
|
||||
|
||||
## Hobby Horses
|
||||
Topics they return to obsessively. Recurring themes in their work/posts.
|
||||
|
||||
## Assessment
|
||||
Your read on this person. Strengths, gaps, trajectory.
|
||||
|
||||
## Trajectory
|
||||
Ascending, plateauing, pivoting, declining? Where are they headed?
|
||||
|
||||
## Relationship
|
||||
History of interactions, shared context, relationship quality.
|
||||
|
||||
## Contact
|
||||
Email, social handles, preferred communication channel.
|
||||
|
||||
## Network
|
||||
Key connections, mutual contacts, organizational relationships.
|
||||
|
||||
## Open Threads
|
||||
Active conversations, pending items, things to follow up on.
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
Reverse chronological. Every entry has a date and [Source: ...] citation.
|
||||
- **YYYY-MM-DD** | Event description [Source: ...]
|
||||
```
|
||||
|
||||
#### Company page template
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: Company Name
|
||||
type: company
|
||||
created: YYYY-MM-DD
|
||||
updated: YYYY-MM-DD
|
||||
tags: []
|
||||
---
|
||||
|
||||
# Company Name
|
||||
|
||||
> 1-paragraph executive summary.
|
||||
|
||||
## State
|
||||
What they do, stage, key people, key metrics, your connection.
|
||||
|
||||
## Open Threads
|
||||
Active items, pending decisions, things to track.
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
- **YYYY-MM-DD** | Event description [Source: ...]
|
||||
```
|
||||
|
||||
### Step 7: Cross-reference
|
||||
|
||||
- Update company pages from person enrichment (and vice versa)
|
||||
- Update related project/deal pages if relevant context surfaced
|
||||
- Add back-links from every entity mentioned (MANDATORY)
|
||||
- Check index files if the brain uses them
|
||||
|
||||
## Bulk Enrichment Rules
|
||||
|
||||
- **Test on 3-5 entities first.** Read actual output. Check quality.
|
||||
- Only proceed to bulk after test shots pass your quality bar.
|
||||
- 3+ entities from one source -> batch process or spawn sub-agent
|
||||
- Throttle API calls. Respect rate limits.
|
||||
- Commit every 5-10 entities during bulk runs.
|
||||
- Save a report after bulk enrichment (see Report Storage below).
|
||||
|
||||
## Validation Rules
|
||||
|
||||
- Connection count < 20 on LinkedIn = likely wrong person, skip
|
||||
- Name mismatch between brain and API = skip, flag for review
|
||||
- Joke profiles or obviously wrong data = save to raw, don't update page
|
||||
- Don't overwrite user-written assessments with API boilerplate
|
||||
- When in doubt: save raw data but don't update brain page
|
||||
|
||||
## Report Storage
|
||||
|
||||
After enrichment sweeps, save a report:
|
||||
- Number of entities processed
|
||||
- New pages created vs existing updated
|
||||
- Data sources called and results quality
|
||||
- Notable discoveries or contradictions
|
||||
- Validation flags or API failures
|
||||
|
||||
This creates an audit trail for brain enrichment over time.
|
||||
|
||||
## Tools Used
|
||||
|
||||
@@ -43,3 +285,5 @@ integrations ship in v1. This skill guides the agent to make API calls directly.
|
||||
- List pages in gbrain by type (list_pages)
|
||||
- Store raw API data in gbrain (put_raw_data)
|
||||
- Retrieve raw data from gbrain (get_raw_data)
|
||||
- Link entities in gbrain (add_link)
|
||||
- Check backlinks in gbrain (get_backlinks)
|
||||
|
||||
@@ -1,6 +1,25 @@
|
||||
# Ingest Skill
|
||||
|
||||
Ingest meetings, articles, documents, and conversations into the brain.
|
||||
Ingest meetings, articles, media, documents, and conversations into the brain.
|
||||
|
||||
> **Filing rule:** Read `skills/_brain-filing-rules.md` before creating any new page.
|
||||
|
||||
## Iron Law: Back-Linking (MANDATORY)
|
||||
|
||||
Every mention of a person or company with a brain page MUST create a back-link
|
||||
FROM that entity's page TO the page mentioning them. An unlinked mention is a
|
||||
broken brain. See `skills/_brain-filing-rules.md` for format.
|
||||
|
||||
## Citation Requirements (MANDATORY)
|
||||
|
||||
Every fact written to a brain page must carry an inline `[Source: ...]` citation.
|
||||
|
||||
- **User's statements:** `[Source: User, {context}, YYYY-MM-DD]`
|
||||
- **Meeting data:** `[Source: Meeting "{title}", YYYY-MM-DD]`
|
||||
- **Email/message:** `[Source: email from {name} re: {subject}, YYYY-MM-DD]`
|
||||
- **Web content:** `[Source: {publication}, {URL}, YYYY-MM-DD]`
|
||||
- **Social media:** `[Source: X/@handle, YYYY-MM-DD](URL)` (include link)
|
||||
- **Synthesis:** `[Source: compiled from {sources}]`
|
||||
|
||||
## Workflow
|
||||
|
||||
@@ -8,10 +27,11 @@ Ingest meetings, articles, documents, and conversations into the brain.
|
||||
2. **For each entity mentioned:**
|
||||
- Read the entity's page from gbrain to check if it exists
|
||||
- If exists: update compiled_truth (rewrite State section with new info, don't append)
|
||||
- If new: store the page in gbrain with the appropriate type and slug
|
||||
3. **Append to timeline.** Add a timeline entry in gbrain for each event, with date, summary, and source.
|
||||
- If new: check notability gate, then store the page in gbrain with the appropriate type and slug
|
||||
3. **Append to timeline.** Add a timeline entry in gbrain for each event, with date, summary, and source citation.
|
||||
4. **Create cross-reference links.** Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
|
||||
5. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
|
||||
5. **Back-link all entities.** Update EVERY mentioned entity's page with a back-link to this page (Iron Law).
|
||||
6. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
|
||||
|
||||
## Entity Detection on Every Message
|
||||
|
||||
@@ -26,13 +46,11 @@ the signal detection loop that makes the brain compound over time.
|
||||
- `gbrain search "name"` -- does a page already exist?
|
||||
- **If yes:** load context with `gbrain get <slug>`. Use the compiled truth to
|
||||
inform your response. Update the page if the message contains new information.
|
||||
- **If no:** assess notability. If the entity is worth tracking (will come up
|
||||
again, is relevant to the user's world), create a new page with
|
||||
`gbrain put <type/slug>` and populate with what you know.
|
||||
3. **After creating or updating pages:** commit changes to the brain repo, then
|
||||
sync to gbrain:
|
||||
- **If no:** assess notability (see `skills/_brain-filing-rules.md`). If the entity
|
||||
is worth tracking, create a new page with `gbrain put <type/slug>` and populate
|
||||
with what you know.
|
||||
3. **After creating or updating pages:** sync to gbrain:
|
||||
```bash
|
||||
git add brain/ && git commit -m "update entity pages"
|
||||
gbrain sync --no-pull --no-embed
|
||||
```
|
||||
4. **Don't block the conversation.** Entity detection and enrichment should happen
|
||||
@@ -42,18 +60,184 @@ the signal detection loop that makes the brain compound over time.
|
||||
### What counts as notable
|
||||
|
||||
- People the user interacts with or discusses (not random mentions)
|
||||
- Companies relevant to the user's work, investments, or interests
|
||||
- Companies relevant to the user's work or interests
|
||||
- Concepts or frameworks the user references or creates
|
||||
- The user's own original thinking (ideas, theses, observations) -- highest value
|
||||
- See `skills/_brain-filing-rules.md` for the full notability gate
|
||||
|
||||
### What to capture from the user's own thinking
|
||||
|
||||
Original thinking is the most valuable signal. Capture exact phrasing -- the user's
|
||||
language IS the insight. Don't paraphrase.
|
||||
|
||||
- Novel observations or theses
|
||||
- Frameworks, mental models, heuristics
|
||||
- Connections between ideas that others miss
|
||||
- Contrarian positions with reasoning
|
||||
- Strong reactions to external stimuli (what triggered it and why)
|
||||
|
||||
## Media Workflows
|
||||
|
||||
Content the user encounters should be captured in the brain. File by PRIMARY
|
||||
SUBJECT, not by format (see `skills/_brain-filing-rules.md`).
|
||||
|
||||
### Articles & Web Content
|
||||
|
||||
**Input:** URL shared by user, or article mentioned in conversation.
|
||||
|
||||
**Process:**
|
||||
1. Fetch content (`web_fetch` or equivalent)
|
||||
2. Extract: title, author, publication, date, full text
|
||||
3. Summarize: executive summary + key arguments (not a rehash)
|
||||
4. Extract entities: people, companies, concepts mentioned
|
||||
5. **Save raw source** for provenance (see Raw Source Preservation below)
|
||||
6. Analyze for the user: don't just summarize. What's interesting given what you
|
||||
know about them? Flag connections, contradictions, content opportunities.
|
||||
|
||||
**Write to:** appropriate directory per filing rules (about a person -> `people/`,
|
||||
about a company -> `companies/`, reusable framework -> `concepts/`, raw data -> `sources/`)
|
||||
|
||||
### Videos & Podcasts
|
||||
|
||||
**Input:** URL (YouTube, podcast, etc.) or local audio/video file.
|
||||
|
||||
**Process:**
|
||||
1. Get transcript -- speaker-diarized if possible (services like Diarize.io provide
|
||||
speaker-labeled, word-level timing)
|
||||
2. **Save raw transcript** (both JSON and human-readable TXT)
|
||||
3. Analyze: executive summary, key ideas, key quotes with speaker attribution,
|
||||
notable stories/anecdotes, people and companies mentioned
|
||||
4. Extract and cross-reference all entities mentioned
|
||||
5. **HARD RULE:** every video/podcast brain page MUST link to the raw diarized
|
||||
transcript. A page without transcript links is incomplete.
|
||||
|
||||
**Write to:** `media/videos/` or `media/podcasts/` with back-links to all entities.
|
||||
|
||||
**Quality bar:**
|
||||
- Compelling headline (not "This video discusses...")
|
||||
- Executive summary that makes you want to watch/listen
|
||||
- Key Ideas as actual insights, not topic labels
|
||||
- Verbatim quotes with real speaker names (not "speaker_0")
|
||||
- All entities extracted with context and back-linked
|
||||
|
||||
### PDFs & Documents
|
||||
|
||||
**Input:** File path or URL.
|
||||
|
||||
**Process:**
|
||||
1. Extract text (OCR if scanned/image PDF)
|
||||
2. **Save raw source** for provenance
|
||||
3. Summarize: executive summary + key sections + notable data
|
||||
4. Extract entities
|
||||
5. Cross-reference from entity pages
|
||||
|
||||
**Write to:** per filing rules (file by primary subject, not format).
|
||||
|
||||
### Screenshots & Images
|
||||
|
||||
**Input:** Image file.
|
||||
|
||||
**Process:**
|
||||
1. Analyze content (OCR for text-heavy images, description for photos)
|
||||
2. If tweet screenshot: extract text, author, date, route to social media workflow
|
||||
3. If article screenshot: extract text, route to article workflow
|
||||
4. If data/chart: extract data points, describe findings
|
||||
|
||||
**Write to:** depends on content -- route to the appropriate workflow above.
|
||||
|
||||
### Meeting Transcripts
|
||||
|
||||
**Input:** Transcript from meeting recording service, or manual notes.
|
||||
|
||||
**Process:**
|
||||
1. Pull full transcript (source of truth -- AI summaries are medium-low trust)
|
||||
2. **Save raw transcript** for provenance
|
||||
3. Write meeting page with YOUR analysis above the line, raw transcript below
|
||||
4. **Entity propagation (MANDATORY):** for each attendee and company discussed:
|
||||
- Update their brain page State section if new info surfaced
|
||||
- Append to their Timeline with link to the meeting page
|
||||
- Create page if person/company is notable and has no page yet
|
||||
5. A meeting is NOT fully ingested until all entity pages are updated
|
||||
|
||||
**Write to:** `meetings/YYYY-MM-DD-short-description.md`
|
||||
|
||||
**What makes a good meeting page:**
|
||||
- Reveals the real crux, not a bullet dump
|
||||
- Connects to existing brain pages (people, companies, deals)
|
||||
- Flags what changed (status, decisions, new info)
|
||||
- Names tension or what was left unsaid
|
||||
- Captures actual dynamic, not performative summary
|
||||
|
||||
### Social Media Content
|
||||
|
||||
**Input:** Tweet, thread, or social media post.
|
||||
|
||||
**Process:**
|
||||
1. Fetch full content (thread, quote tweets, context)
|
||||
2. If images present: OCR via vision model for full text extraction
|
||||
3. Summarize: what's being said, why it matters, who's involved
|
||||
4. Extract entities and update brain pages
|
||||
5. Include direct link to the original post (MANDATORY for citations)
|
||||
|
||||
**Write to:** `media/x/` for daily aggregation, or entity-specific directories
|
||||
if the post is primarily about a person/company.
|
||||
|
||||
## Raw Source Preservation
|
||||
|
||||
Every ingested item must have its raw source preserved for provenance.
|
||||
|
||||
**Use `gbrain files upload-raw` for automatic size routing:**
|
||||
```bash
|
||||
gbrain files upload-raw <file> --page <page-slug> --type <type>
|
||||
```
|
||||
|
||||
- **< 100 MB text/PDF**: stays in git (brain repo `.raw/` sidecar directories)
|
||||
- **>= 100 MB OR media** (video, audio, images): uploaded to cloud storage
|
||||
via TUS resumable upload, `.redirect.yaml` pointer left in the brain repo
|
||||
|
||||
The `.redirect.yaml` pointer format:
|
||||
```yaml
|
||||
target: supabase://brain-files/page-slug/filename.mp4
|
||||
bucket: brain-files
|
||||
storage_path: page-slug/filename.mp4
|
||||
size: 524288000
|
||||
size_human: 500 MB
|
||||
hash: sha256:abc123...
|
||||
mime: video/mp4
|
||||
uploaded: 2026-04-11T...
|
||||
type: transcript
|
||||
```
|
||||
|
||||
**Accessing stored files:**
|
||||
- `gbrain files signed-url <storage-path>` -- generate 1-hour signed URL for viewing/sharing
|
||||
- `gbrain files restore <dir>` -- download back to local from cloud storage
|
||||
|
||||
Use `put_raw_data` in gbrain to store raw API responses and metadata (JSON, not binary).
|
||||
|
||||
## Test Before Bulk
|
||||
|
||||
When processing multiple items (batch video ingestion, bulk meeting processing, etc.):
|
||||
|
||||
1. **Test on 3-5 items first.** Run in test mode if available.
|
||||
2. **Read the actual output.** Is the quality good? Are titles compelling (not
|
||||
"This video discusses...")? Are entities extracted and back-linked? Is the
|
||||
format clean?
|
||||
3. **Fix what's wrong** in the approach/skill, not via one-off patches.
|
||||
4. **Only then: bulk execute** with throttling, commits every 5-10 items.
|
||||
|
||||
The marginal cost of testing 3 items first is near zero. The cost of cleaning
|
||||
up 100 bad pages is enormous.
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Executive summary in compiled_truth must be updated, not just timeline appended
|
||||
- State section is REWRITTEN, not appended to. Current best understanding only.
|
||||
- Timeline entries are reverse-chronological (newest first)
|
||||
- Every person/company mentioned gets a page if one doesn't exist
|
||||
- Every person/company mentioned gets a page if notable (see filing rules)
|
||||
- Link types: knows, works_at, invested_in, founded, met_at, discussed
|
||||
- Source attribution: every timeline entry includes the source (meeting, article, email, etc.)
|
||||
- Source attribution: every timeline entry includes [Source: ...] citation
|
||||
- Back-links: every entity mention creates a back-link (Iron Law)
|
||||
- Filing: file by primary subject, not format or source (see filing rules)
|
||||
|
||||
## Tools Used
|
||||
|
||||
@@ -63,3 +247,5 @@ the signal detection loop that makes the brain compound over time.
|
||||
- Link entities in gbrain (add_link)
|
||||
- List tags for a page (get_tags)
|
||||
- Tag a page in gbrain (add_tag)
|
||||
- Store raw data in gbrain (put_raw_data)
|
||||
- Check backlinks in gbrain (get_backlinks)
|
||||
|
||||
@@ -25,6 +25,29 @@ Links pointing to pages that don't exist.
|
||||
Pages that mention entity names but don't have formal links.
|
||||
- Read compiled_truth from gbrain, extract entity mentions, create links in gbrain
|
||||
|
||||
### Back-link enforcement
|
||||
Check that the back-linking iron law is being followed:
|
||||
- For each recently updated page, check if entities mentioned in it have
|
||||
corresponding back-links FROM those entity pages
|
||||
- A mention without a back-link is a broken brain
|
||||
- Fix: add the missing back-link to the entity's Timeline or See Also section
|
||||
- Format: `- **YYYY-MM-DD** | Referenced in [page title](path) -- brief context`
|
||||
|
||||
### Filing rule violations
|
||||
Check for common misfiling patterns (see `skills/_brain-filing-rules.md`):
|
||||
- Content with clear primary subjects filed in `sources/` instead of the
|
||||
appropriate directory (people/, companies/, concepts/, etc.)
|
||||
- Use gbrain search to find pages in `sources/` that reference specific
|
||||
people, companies, or concepts -- these may be misfiled
|
||||
- Flag misfiled pages for review or re-filing
|
||||
|
||||
### Citation audit
|
||||
Spot-check pages for missing `[Source: ...]` citations:
|
||||
- Read 5-10 recently updated pages
|
||||
- Check that compiled truth (above the line) has inline citations
|
||||
- Check that timeline entries have source attribution
|
||||
- Flag pages where facts appear without provenance
|
||||
|
||||
### Tag consistency
|
||||
Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").
|
||||
- Standardize to the most common variant using gbrain tag operations
|
||||
@@ -44,10 +67,37 @@ Check that the schema version is up to date. `gbrain doctor --json` reports
|
||||
the current version vs expected. If behind, `gbrain init` runs migrations
|
||||
automatically.
|
||||
|
||||
### File storage health
|
||||
Check the integrity of stored files and redirect pointers:
|
||||
- Run `gbrain files verify` to check all DB records have valid data
|
||||
- Run `gbrain files status` to see migration state (local, mirrored, redirected)
|
||||
- Check for orphan `.redirect.yaml` pointers that reference missing storage files
|
||||
- Check for large binary files (>= 100 MB) still in git that should be in cloud storage
|
||||
- If storage backend is configured: verify redirect pointers resolve (download test)
|
||||
|
||||
### Open threads
|
||||
Timeline items older than 30 days with unresolved action items.
|
||||
- Flag for review
|
||||
|
||||
## Benchmark Testing
|
||||
|
||||
Periodically verify search quality hasn't regressed. Run a battery of test
|
||||
queries across difficulty tiers:
|
||||
|
||||
- **Tier 1 (entity lookup):** known names -- should always resolve
|
||||
- **Tier 2 (topic recall):** concepts, topics -- keyword search should handle
|
||||
- **Tier 3 (semantic):** queries with no exact keyword match -- needs embeddings
|
||||
- **Tier 4 (cross-domain):** relational/connection queries -- only semantic handles
|
||||
|
||||
Compare results from `gbrain search` (keyword) vs `gbrain query` (hybrid).
|
||||
Quality matters more than speed (2.5s right > 200ms wrong).
|
||||
|
||||
When to run benchmarks:
|
||||
- After major brain imports or re-imports
|
||||
- After gbrain version upgrades
|
||||
- After embedding regeneration
|
||||
- Monthly to track quality drift
|
||||
|
||||
## Heartbeat Integration
|
||||
|
||||
For production agents running on a schedule, integrate gbrain health checks into
|
||||
@@ -78,6 +128,18 @@ Flag pages where compiled truth is >30 days old but the timeline has recent entr
|
||||
This means new evidence exists that hasn't been synthesized. These pages need a
|
||||
compiled truth rewrite (see the maintain workflow above).
|
||||
|
||||
## Report Storage
|
||||
|
||||
After maintenance runs, save a report:
|
||||
- Health check results (before/after scores for each dimension)
|
||||
- Back-link violations found and fixed
|
||||
- Filing rule violations found
|
||||
- Citation gaps flagged
|
||||
- Benchmark results (if run)
|
||||
- Outstanding issues requiring user attention
|
||||
|
||||
This creates an audit trail for brain health over time.
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Never delete pages without confirmation
|
||||
|
||||
@@ -1,32 +1,32 @@
|
||||
{
|
||||
"name": "gbrain",
|
||||
"version": "0.8.0",
|
||||
"version": "0.9.0",
|
||||
"description": "Personal knowledge brain with hybrid RAG search",
|
||||
"skills": [
|
||||
{
|
||||
"name": "ingest",
|
||||
"path": "ingest/SKILL.md",
|
||||
"description": "Ingest meetings, docs, articles into the brain"
|
||||
"description": "Ingest meetings, media, articles, and documents with back-linking, filing rules, and citation requirements"
|
||||
},
|
||||
{
|
||||
"name": "query",
|
||||
"path": "query/SKILL.md",
|
||||
"description": "Answer questions using 3-layer search and synthesis"
|
||||
"description": "Answer questions using 3-layer search, synthesis, and citation propagation"
|
||||
},
|
||||
{
|
||||
"name": "maintain",
|
||||
"path": "maintain/SKILL.md",
|
||||
"description": "Brain health checks: contradictions, stale info, orphans"
|
||||
"description": "Brain health checks: back-link enforcement, citation audit, filing validation, stale info, orphans, benchmarks"
|
||||
},
|
||||
{
|
||||
"name": "enrich",
|
||||
"path": "enrich/SKILL.md",
|
||||
"description": "Enrich pages from external APIs (Crustdata, Happenstance, Exa)"
|
||||
"description": "Enrich pages with tiered enrichment protocol, person/company page templates, and validation rules"
|
||||
},
|
||||
{
|
||||
"name": "briefing",
|
||||
"path": "briefing/SKILL.md",
|
||||
"description": "Compile daily briefing with meeting context and active deals"
|
||||
"description": "Compile daily briefing with meeting context, active deals, and citation tracking"
|
||||
},
|
||||
{
|
||||
"name": "migrate",
|
||||
@@ -36,7 +36,12 @@
|
||||
{
|
||||
"name": "setup",
|
||||
"path": "setup/SKILL.md",
|
||||
"description": "Set up GBrain: auto-provision Supabase, AGENTS.md injection, first import"
|
||||
"description": "Set up GBrain: auto-provision Supabase or PGLite, AGENTS.md injection, first import"
|
||||
},
|
||||
{
|
||||
"name": "publish",
|
||||
"path": "publish/SKILL.md",
|
||||
"description": "Share brain pages as beautiful password-protected HTML (code + skill pair, zero LLM calls)"
|
||||
}
|
||||
],
|
||||
"dependencies": {
|
||||
|
||||
103
skills/migrations/v0.8.1.md
Normal file
103
skills/migrations/v0.8.1.md
Normal file
@@ -0,0 +1,103 @@
|
||||
---
|
||||
version: 0.8.1
|
||||
feature_pitch:
|
||||
headline: "Your brain skills learned from production"
|
||||
description: "Back-linking iron law, filing rules, enrichment protocol, media ingest, citation requirements, voice crash fixes -- all battle-tested from real production deployments. Superseded by v0.9.0."
|
||||
---
|
||||
|
||||
# v0.8.1 Migration: Battle-Tested Skill Patterns
|
||||
|
||||
This migration updates your agent's brain-writing patterns and voice recipe.
|
||||
No schema changes, no database migrations. Skills and recipes only.
|
||||
|
||||
## What's New
|
||||
|
||||
### Back-Linking Iron Law
|
||||
Every skill that writes to the brain now enforces bidirectional linking. If a
|
||||
page mentions a person or company, that entity's page MUST link back. This was
|
||||
the single biggest knowledge graph integrity rule missing from v0.8.0.
|
||||
|
||||
### Brain Filing Rules (`skills/_brain-filing-rules.md`)
|
||||
Cross-cutting rules for where brain pages go. File by PRIMARY SUBJECT, not
|
||||
format or source. Includes common misfiling patterns table and notability gate.
|
||||
|
||||
### Enrichment Protocol (rewritten)
|
||||
The enrich skill is now a full 7-step pipeline with 3-tier system (scale effort
|
||||
to entity importance), person page section template (What They Believe, Building,
|
||||
Motivates, Hobby Horses, Trajectory), pluggable data source table, and validation.
|
||||
|
||||
### Media Ingestion Workflows
|
||||
The ingest skill now handles articles, videos, podcasts, PDFs, screenshots, and
|
||||
social media content. Each with a complete input->process->write workflow.
|
||||
|
||||
### Citation Requirements
|
||||
Every fact needs inline `[Source: ...]` citations across all skills. Three formats
|
||||
defined (direct, API/external, synthesis) with source precedence hierarchy.
|
||||
|
||||
### Voice Recipe Production Fixes
|
||||
Critical fixes from a production voice deployment: unicode crash fix (non-ASCII
|
||||
breaks Twilio WebSocket), PII scrub from voice context, identity-first prompt
|
||||
architecture, Smart VAD default, auto-upload call audio, and the DIY STT+LLM+TTS
|
||||
pipeline (Deepgram+Claude+Cartesia) as an alternative to OpenAI Realtime.
|
||||
|
||||
### X-to-Brain Recipe Enhancements
|
||||
Image OCR via vision models, real-time monitoring via Filtered Stream API,
|
||||
6-dimension tweet rating rubric, outbound tweet monitoring, X-to-Brain automatic
|
||||
pipeline, and cron staggering for resource management.
|
||||
|
||||
## Agent Instructions
|
||||
|
||||
### Step 1: Verify skills are updated
|
||||
|
||||
Check that your skills directory has the new files:
|
||||
```bash
|
||||
ls skills/_brain-filing-rules.md && echo "Filing rules: OK"
|
||||
grep -q '"version": "0.8.1"' skills/manifest.json && echo "Manifest: OK"
|
||||
```
|
||||
|
||||
Files that should be updated:
|
||||
- `skills/_brain-filing-rules.md` (new)
|
||||
- `skills/enrich/SKILL.md` (rewritten)
|
||||
- `skills/ingest/SKILL.md` (expanded)
|
||||
- `skills/maintain/SKILL.md` (expanded)
|
||||
- `skills/query/SKILL.md` (updated)
|
||||
- `skills/briefing/SKILL.md` (updated)
|
||||
- `recipes/twilio-voice-brain.md` (v0.8.1 production fixes)
|
||||
- `recipes/x-to-brain.md` (v0.8.1 enhancements)
|
||||
|
||||
### Step 2: Review existing brain pages for back-link gaps
|
||||
|
||||
Spot-check: pick 5 recently created pages and verify that every entity mentioned
|
||||
has a back-link from the entity's page to the mentioning page.
|
||||
|
||||
If back-links are missing, add them:
|
||||
```
|
||||
- **YYYY-MM-DD** | Referenced in [page title](path/to/page.md) -- brief context
|
||||
```
|
||||
|
||||
### Step 3: Review filing
|
||||
|
||||
Check for pages in `sources/` that should be in a subject-specific directory
|
||||
(people/, companies/, concepts/, civic/). See `skills/_brain-filing-rules.md`.
|
||||
|
||||
### Step 4: Voice recipe (if using)
|
||||
|
||||
If you have voice-to-brain set up:
|
||||
1. Apply the unicode sanitization fix to your voice server
|
||||
2. Add PII scrub to your prompt builder
|
||||
3. Move identity to the TOP of your system prompt
|
||||
4. Consider switching to Smart VAD (Silero) as default
|
||||
|
||||
### Step 5: X-to-brain recipe (if using)
|
||||
|
||||
If you have X-to-brain set up:
|
||||
1. Add image OCR to your tweet processing pipeline
|
||||
2. Consider Filtered Stream API for real-time monitoring
|
||||
3. Stagger your cron schedules (max 1 per minute)
|
||||
|
||||
### Step 6: Done
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.gbrain/migrations
|
||||
echo '{"version":"0.8.1","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","status":"complete"}' >> ~/.gbrain/migrations/completed.jsonl
|
||||
```
|
||||
231
skills/migrations/v0.9.0.md
Normal file
231
skills/migrations/v0.9.0.md
Normal file
@@ -0,0 +1,231 @@
|
||||
---
|
||||
version: 0.9.0
|
||||
feature_pitch:
|
||||
headline: "5 new deterministic tools, smart file uploads, production-grade skills"
|
||||
description: "gbrain publish, backlinks, lint, report, and upload-raw. Code+skill pairs -- deterministic TypeScript does the work, skills tell the agent when to use it. Plus TUS resumable uploads, .redirect.yaml pointers, and battle-tested skill patterns."
|
||||
tiers:
|
||||
- name: "Core tools (everyone)"
|
||||
description: "publish, backlinks, lint, report -- zero external deps, work immediately."
|
||||
setup: "gbrain check-update && gbrain upgrade"
|
||||
- name: "With Supabase Storage"
|
||||
description: "upload-raw, signed-url, file migration lifecycle. Large files in cloud, git stays lean."
|
||||
setup: "Configure storage backend, then gbrain files mirror + redirect."
|
||||
---
|
||||
|
||||
# v0.9.0 Migration: Deterministic Tools + Smart File Storage
|
||||
|
||||
This is a major upgrade. GBrain now ships deterministic tools alongside skills --
|
||||
code for data, LLMs for judgment. No database schema changes required.
|
||||
|
||||
## What's New: 5 Deterministic Commands
|
||||
|
||||
These commands run without LLM calls. They are the "code" half of the
|
||||
[Thin Harness, Fat Skills](https://x.com/garrytan/status/2042925773300908103) pattern.
|
||||
|
||||
### 1. `gbrain publish` -- shareable HTML from brain pages
|
||||
|
||||
```bash
|
||||
gbrain publish brain/people/jane-doe.md # local HTML
|
||||
gbrain publish brain/people/jane-doe.md --password # auto-generated pw
|
||||
gbrain publish brain/people/jane-doe.md --password "pw" # custom pw
|
||||
gbrain publish brain/people/jane-doe.md --out share.html # custom output
|
||||
```
|
||||
|
||||
Strips private data (frontmatter, citations, confirmations, brain links, timeline).
|
||||
Optional AES-256-GCM encryption with client-side decryption. Dark/light mode,
|
||||
mobile-optimized. Self-contained HTML, no server needed.
|
||||
|
||||
**Skill:** `skills/publish/SKILL.md` tells the agent when to publish, defaults
|
||||
(always encrypt), and sharing workflows (local file, cloud upload + signed URL,
|
||||
static hosting).
|
||||
|
||||
### 2. `gbrain check-backlinks check/fix` -- enforce the Iron Law
|
||||
|
||||
```bash
|
||||
gbrain check-backlinks check --dir /path/to/brain # report missing back-links
|
||||
gbrain check-backlinks fix --dir /path/to/brain # create them
|
||||
gbrain check-backlinks fix --dir /path/to/brain --dry-run # preview
|
||||
```
|
||||
|
||||
Scans all pages for entity mentions (links to people/ and companies/), checks
|
||||
if those entity pages link back. Creates timeline entries for missing back-links.
|
||||
|
||||
### 3. `gbrain lint [--fix]` -- catch LLM slop
|
||||
|
||||
```bash
|
||||
gbrain lint /path/to/brain # report issues
|
||||
gbrain lint /path/to/brain --fix # auto-fix what's fixable
|
||||
gbrain lint brain/people/jane.md # lint single file
|
||||
```
|
||||
|
||||
Catches: LLM preambles ("Of course! Here is..."), wrapping code fences,
|
||||
placeholder dates (YYYY-MM-DD, XX-XX), missing frontmatter, broken citations,
|
||||
empty sections. `--fix` strips the fixable ones automatically.
|
||||
|
||||
### 4. `gbrain report` -- audit trail
|
||||
|
||||
```bash
|
||||
gbrain report --type enrichment-sweep --title "Weekly Sweep" --content "..."
|
||||
echo "report body" | gbrain report --type meeting-sync --title "Daily Sync"
|
||||
```
|
||||
|
||||
Creates: `brain/reports/{type}/{YYYY-MM-DD-HHMM}.md` with frontmatter.
|
||||
Used by the maintain skill for enrichment sweeps, backlink checks, lint runs.
|
||||
|
||||
### 5. `gbrain files upload-raw` -- smart file uploads
|
||||
|
||||
```bash
|
||||
gbrain files upload-raw <file> --page <slug> --type <type>
|
||||
gbrain files signed-url <storage-path>
|
||||
```
|
||||
|
||||
Auto-routes by size: < 100 MB text stays in git, >= 100 MB or media goes to
|
||||
cloud storage via TUS resumable upload (6 MB chunks with retry). Creates
|
||||
`.redirect.yaml` pointers in the brain repo.
|
||||
|
||||
## Agent Instructions
|
||||
|
||||
### Step 1: Verify upgrade
|
||||
|
||||
```bash
|
||||
gbrain --version # should show 0.9.0
|
||||
gbrain doctor --json
|
||||
```
|
||||
|
||||
### Step 2: Replace custom scripts with built-in commands
|
||||
|
||||
**This is the key step.** If your agent uses custom scripts for any of these
|
||||
operations, swap them for the built-in gbrain commands:
|
||||
|
||||
| Old pattern (custom script) | New pattern (built-in) |
|
||||
|----------------------------|----------------------|
|
||||
| `node scripts/brain-publish.mjs <page>` | `gbrain publish <page> --password` |
|
||||
| `node scripts/upload-raw-source.mjs <file> --page <slug>` | `gbrain files upload-raw <file> --page <slug>` |
|
||||
| `node scripts/save-report.mjs --cron <type>` | `gbrain report --type <type> --content "..."` |
|
||||
| `node scripts/validate-brain-links.mjs` | `gbrain files verify` |
|
||||
| `node scripts/migrate-brain-files.mjs` | `gbrain files mirror + redirect` |
|
||||
| Manual back-link checking | `gbrain check-backlinks check --dir <brain>` |
|
||||
| Manual page quality review | `gbrain lint <brain> --fix` |
|
||||
|
||||
**For each replacement:**
|
||||
1. Find references to the old script in your AGENTS.md, skills, or cron jobs
|
||||
2. Replace with the gbrain command
|
||||
3. The gbrain command uses your configured storage backend automatically
|
||||
|
||||
**Example: update AGENTS.md or skill files:**
|
||||
```
|
||||
# Old:
|
||||
node /data/.openclaw/workspace/scripts/upload-raw-source.mjs <file> \
|
||||
--page <slug> --type transcript
|
||||
|
||||
# New:
|
||||
gbrain files upload-raw <file> --page <slug> --type transcript
|
||||
```
|
||||
|
||||
```
|
||||
# Old:
|
||||
node /data/.openclaw/workspace/scripts/brain-publish.mjs <page> --password
|
||||
|
||||
# New:
|
||||
gbrain publish <page> --password
|
||||
```
|
||||
|
||||
```
|
||||
# Old:
|
||||
node /data/.openclaw/workspace/scripts/save-report.mjs \
|
||||
--cron "enrichment-sweep" --title "Enrichment Sweep" --content "..."
|
||||
|
||||
# New:
|
||||
gbrain report --type enrichment-sweep --title "Enrichment Sweep" --content "..."
|
||||
```
|
||||
|
||||
### Step 3: Run the new tools on your brain
|
||||
|
||||
```bash
|
||||
# Check back-link health
|
||||
gbrain check-backlinks check --dir /path/to/brain
|
||||
|
||||
# If gaps found, fix them:
|
||||
gbrain check-backlinks fix --dir /path/to/brain --dry-run # preview
|
||||
gbrain check-backlinks fix --dir /path/to/brain # apply
|
||||
|
||||
# Lint for quality issues
|
||||
gbrain lint /path/to/brain
|
||||
|
||||
# If fixable issues found:
|
||||
gbrain lint /path/to/brain --fix --dry-run # preview
|
||||
gbrain lint /path/to/brain --fix # apply
|
||||
|
||||
# Save a report of what you fixed
|
||||
gbrain report --type migration \
|
||||
--title "v0.9.0 Migration" \
|
||||
--content "Ran backlinks fix (N gaps fixed) and lint --fix (M issues fixed)." \
|
||||
--dir /path/to/brain
|
||||
```
|
||||
|
||||
### Step 4: Configure storage backend (if not already done)
|
||||
|
||||
If your brain repo has binary files (images, PDFs, audio, video), configure
|
||||
cloud storage to keep git lean:
|
||||
|
||||
```bash
|
||||
# Supabase Storage (recommended if you already use Supabase)
|
||||
gbrain config set storage.backend supabase
|
||||
gbrain config set storage.bucket brain-files
|
||||
gbrain config set storage.projectUrl https://YOUR-PROJECT.supabase.co
|
||||
gbrain config set storage.serviceRoleKey YOUR_SERVICE_ROLE_KEY
|
||||
|
||||
# Or S3-compatible (AWS, Cloudflare R2, MinIO)
|
||||
gbrain config set storage.backend s3
|
||||
gbrain config set storage.bucket brain-files
|
||||
gbrain config set storage.region us-east-1
|
||||
gbrain config set storage.accessKeyId YOUR_KEY
|
||||
gbrain config set storage.secretAccessKey YOUR_SECRET
|
||||
```
|
||||
|
||||
Then migrate existing binaries:
|
||||
```bash
|
||||
gbrain files status
|
||||
gbrain files mirror <brain-dir>
|
||||
gbrain files redirect <brain-dir>
|
||||
```
|
||||
|
||||
### Step 5: Update cron jobs
|
||||
|
||||
If you have cron jobs that call custom scripts, update them:
|
||||
|
||||
```bash
|
||||
# Old cron entry:
|
||||
*/30 * * * * node /path/to/scripts/validate-brain-links.mjs
|
||||
|
||||
# New:
|
||||
*/30 * * * * gbrain files verify
|
||||
```
|
||||
|
||||
### Step 6: Add new maintenance crons (recommended)
|
||||
|
||||
```bash
|
||||
# Daily backlink check + auto-fix
|
||||
0 3 * * * gbrain check-backlinks fix --dir /path/to/brain
|
||||
|
||||
# Weekly lint + auto-fix
|
||||
0 4 * * 0 gbrain lint /path/to/brain --fix
|
||||
```
|
||||
|
||||
### Step 7: Review skill updates
|
||||
|
||||
The skill files have been updated to reference gbrain commands:
|
||||
- `skills/ingest/SKILL.md` -- uses `gbrain files upload-raw` for raw sources
|
||||
- `skills/maintain/SKILL.md` -- uses `gbrain check-backlinks`, `gbrain lint`, `gbrain report`
|
||||
- `skills/publish/SKILL.md` -- NEW skill for `gbrain publish`
|
||||
- `skills/enrich/SKILL.md` -- references `gbrain files upload-raw` for raw API data
|
||||
- `skills/_brain-filing-rules.md` -- documents `.redirect.yaml` format and commands
|
||||
|
||||
Read the updated skills to pick up the new patterns.
|
||||
|
||||
### Step 8: Done
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.gbrain/migrations
|
||||
echo '{"version":"0.9.0","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","status":"complete","tools_available":["publish","backlinks","lint","report","upload-raw","signed-url"],"scripts_replaced":true}' >> ~/.gbrain/migrations/completed.jsonl
|
||||
```
|
||||
129
skills/publish/SKILL.md
Normal file
129
skills/publish/SKILL.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# Publish Skill
|
||||
|
||||
Share brain pages as beautiful, self-contained HTML documents. Optionally
|
||||
password-protected with client-side AES-256-GCM encryption. No server needed.
|
||||
|
||||
This is a **code + skill pair**: the deterministic code (`gbrain publish`) does
|
||||
the stripping, encrypting, and HTML generation. This skill tells you when and
|
||||
how to use it. See [Thin Harness, Fat Skills](https://x.com/garrytan/status/2042925773300908103)
|
||||
for the architecture philosophy.
|
||||
|
||||
## When to Publish
|
||||
|
||||
- User asks to share a brain page, create a shareable link, or says "give me a page"
|
||||
- User wants to send a deal memo, person briefing, or research to someone external
|
||||
- User asks to publish a data room analysis or trip plan
|
||||
- Any time brain content needs to leave the brain without exposing the whole system
|
||||
|
||||
## Default: ALWAYS ENCRYPT
|
||||
|
||||
Brain content is private. Default to password-protected unless the user explicitly
|
||||
says "open", "no password", or "public".
|
||||
|
||||
If no password is specified, auto-generate one. Share the password via a different
|
||||
channel than the URL.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
```bash
|
||||
# Basic publish (outputs local HTML file)
|
||||
gbrain publish brain/companies/acme.md
|
||||
|
||||
# Password protected (auto-generate password)
|
||||
gbrain publish brain/companies/acme.md --password
|
||||
|
||||
# Password protected (specific password)
|
||||
gbrain publish brain/companies/acme.md --password "secret123"
|
||||
|
||||
# Custom title
|
||||
gbrain publish brain/companies/acme.md --password --title "Acme -- Deal Analysis"
|
||||
|
||||
# Custom output path
|
||||
gbrain publish brain/companies/acme.md --out /tmp/acme-share.html
|
||||
```
|
||||
|
||||
## What Gets Stripped
|
||||
|
||||
The publish command automatically removes all private/internal data:
|
||||
|
||||
| Stripped | Example | Why |
|
||||
|---------|---------|-----|
|
||||
| YAML frontmatter | `title:`, `type:`, `tags:` | Internal metadata |
|
||||
| `[Source: ...]` citations | All formats | Provenance is internal |
|
||||
| Confirmation numbers | `ABC123DEF` -> "on file" | PII/booking data |
|
||||
| Brain cross-links | `[Jane](../people/jane.md)` -> `Jane` | Internal paths |
|
||||
| Timeline section | Everything below `---` / `## Timeline` | Raw evidence log |
|
||||
| "See also" lines | Internal references | Brain navigation |
|
||||
|
||||
**Preserved:** external URLs (`https://...`), all other content.
|
||||
|
||||
## Sharing Workflows
|
||||
|
||||
### Option A: Local file (simplest)
|
||||
|
||||
```bash
|
||||
gbrain publish brain/people/jane-doe.md --password --out ~/Desktop/jane-briefing.html
|
||||
```
|
||||
|
||||
Share the HTML file via email, Slack, Airdrop. Share the password separately.
|
||||
|
||||
### Option B: Upload to cloud storage
|
||||
|
||||
```bash
|
||||
# Publish locally first
|
||||
gbrain publish brain/companies/acme.md --password "secret" --out /tmp/acme.html
|
||||
|
||||
# Upload to Supabase Storage
|
||||
gbrain files upload /tmp/acme.html --page shares/acme
|
||||
|
||||
# Get a signed URL (1-hour expiry)
|
||||
gbrain files signed-url shares/acme/acme.html
|
||||
```
|
||||
|
||||
Share the signed URL + password. URL expires in 1 hour. Re-generate as needed.
|
||||
|
||||
### Option C: Static hosting (Render, Netlify, S3)
|
||||
|
||||
Upload the HTML file to any static hosting service. The file is self-contained,
|
||||
no server logic needed. Password-protected files work entirely client-side via
|
||||
Web Crypto API.
|
||||
|
||||
### Option D: GitHub Pages / Gist
|
||||
|
||||
```bash
|
||||
gbrain publish brain/trips/japan-2026.md --out trip.html
|
||||
# Upload to a GitHub Gist or Pages repo
|
||||
```
|
||||
|
||||
## Password Protection Details
|
||||
|
||||
- **Algorithm:** AES-256-GCM
|
||||
- **Key derivation:** PBKDF2 with 100K iterations, SHA-256
|
||||
- **Salt:** Random 16 bytes per encryption
|
||||
- **IV:** Random 12 bytes per encryption
|
||||
- **Decryption:** Client-side via Web Crypto API (SubtleCrypto)
|
||||
- **No server auth needed** -- the HTML file is self-contained
|
||||
- **"Remember on this device"** -- saves password in localStorage
|
||||
|
||||
When encrypted, the published HTML contains ONLY ciphertext. The plaintext is
|
||||
not present anywhere in the file.
|
||||
|
||||
## Updating a Published Page
|
||||
|
||||
Re-run the publish command with the same output path:
|
||||
```bash
|
||||
gbrain publish brain/companies/acme.md --password "same-password" --out shares/acme.html
|
||||
```
|
||||
|
||||
Same file, same URL (if hosted), updated content.
|
||||
|
||||
## Revoking Access
|
||||
|
||||
Delete the file. If using signed URLs, the URL expires automatically (1 hour).
|
||||
If using static hosting, remove the file from the host.
|
||||
|
||||
## Tools Used
|
||||
|
||||
- `gbrain publish` -- deterministic HTML generation (no LLM calls)
|
||||
- `gbrain files upload` -- upload to cloud storage (optional)
|
||||
- `gbrain files signed-url` -- generate access links (optional)
|
||||
@@ -49,6 +49,23 @@ When multiple sources provide conflicting information, follow this precedence:
|
||||
When sources conflict, note the contradiction with both citations. Don't silently
|
||||
pick one.
|
||||
|
||||
## Citation in Answers
|
||||
|
||||
When referencing brain pages in your answer, propagate inline citations:
|
||||
- Cite the page: "According to [Source: people/jane-doe, compiled truth]..."
|
||||
- When brain pages have inline `[Source: ...]` citations, propagate them so
|
||||
the user can trace facts to their origin
|
||||
- When you synthesize across multiple pages, cite all sources
|
||||
|
||||
## Search Quality Awareness
|
||||
|
||||
If search results seem off (wrong results, missing known pages, irrelevant hits):
|
||||
- Run `gbrain doctor --json` to check index health
|
||||
- Check embedding coverage -- partial embeddings degrade hybrid search
|
||||
- Compare keyword search (`gbrain search`) vs hybrid search (`gbrain query`)
|
||||
for the same query to isolate whether the issue is embedding-related
|
||||
- Report search quality issues in the maintain workflow (see maintain skill)
|
||||
|
||||
## Tools Used
|
||||
|
||||
- Keyword search gbrain (search)
|
||||
|
||||
@@ -124,6 +124,24 @@ echo "=== Discovery Complete ==="
|
||||
> "You have N binary files (X GB) in your brain repo. Want to move them to cloud
|
||||
> storage? Your git repo will drop from X GB to Y MB. All links keep working."
|
||||
|
||||
If the user agrees, configure storage and run migration:
|
||||
```bash
|
||||
# Configure storage backend (Supabase Storage recommended)
|
||||
gbrain config set storage.backend supabase
|
||||
gbrain config set storage.bucket brain-files
|
||||
gbrain config set storage.projectUrl <supabase-url>
|
||||
gbrain config set storage.serviceRoleKey <service-role-key>
|
||||
|
||||
# Migrate binary files to cloud (3-step lifecycle)
|
||||
gbrain files mirror <brain-dir> # Upload to cloud, keep local
|
||||
gbrain files redirect <brain-dir> # Replace local with .redirect.yaml pointers
|
||||
# (optional) gbrain files clean <brain-dir> --yes # Remove pointers too
|
||||
```
|
||||
|
||||
After migration, `gbrain files upload-raw` handles new files automatically:
|
||||
small text/PDFs stay in git, large/media files go to cloud with `.redirect.yaml`
|
||||
pointers. Files >= 100 MB use TUS resumable upload for reliability.
|
||||
|
||||
If no markdown repos are found, create a starter brain with a few template pages
|
||||
(a person page, a company page, a concept page) from docs/GBRAIN_RECOMMENDED_SCHEMA.md.
|
||||
|
||||
|
||||
30
src/cli.ts
30
src/cli.ts
@@ -18,7 +18,7 @@ for (const op of operations) {
|
||||
}
|
||||
|
||||
// CLI-only commands that bypass the operation layer
|
||||
const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate']);
|
||||
const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'integrations', 'publish', 'check-backlinks', 'lint', 'report', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config', 'doctor', 'migrate']);
|
||||
|
||||
async function main() {
|
||||
const args = process.argv.slice(2);
|
||||
@@ -247,6 +247,26 @@ async function handleCliOnly(command: string, args: string[]) {
|
||||
await runIntegrations(args);
|
||||
return;
|
||||
}
|
||||
if (command === 'publish') {
|
||||
const { runPublish } = await import('./commands/publish.ts');
|
||||
await runPublish(args);
|
||||
return;
|
||||
}
|
||||
if (command === 'check-backlinks') {
|
||||
const { runBacklinks } = await import('./commands/backlinks.ts');
|
||||
await runBacklinks(args);
|
||||
return;
|
||||
}
|
||||
if (command === 'lint') {
|
||||
const { runLint } = await import('./commands/lint.ts');
|
||||
await runLint(args);
|
||||
return;
|
||||
}
|
||||
if (command === 'report') {
|
||||
const { runReport } = await import('./commands/report.ts');
|
||||
await runReport(args);
|
||||
return;
|
||||
}
|
||||
|
||||
// All remaining CLI-only commands need a DB connection
|
||||
const engine = await connectEngine();
|
||||
@@ -368,6 +388,8 @@ IMPORT/EXPORT
|
||||
FILES
|
||||
files list [slug] List stored files
|
||||
files upload <file> --page <slug> Upload file to storage
|
||||
files upload-raw <file> --page <s> Smart upload (size routing + .redirect.yaml)
|
||||
files signed-url <path> Generate signed URL (1-hour)
|
||||
files sync <dir> Bulk upload directory
|
||||
files verify Verify all uploads
|
||||
|
||||
@@ -389,6 +411,12 @@ TIMELINE
|
||||
timeline [<slug>] View timeline
|
||||
timeline-add <slug> <date> <text> Add timeline entry
|
||||
|
||||
TOOLS
|
||||
publish <page.md> [--password] Shareable HTML (strips private data, optional AES-256)
|
||||
check-backlinks <check|fix> [dir] Find/fix missing back-links across brain
|
||||
lint <dir|file> [--fix] Catch LLM artifacts, placeholder dates, bad frontmatter
|
||||
report --type <name> --content ... Save timestamped report to brain/reports/
|
||||
|
||||
ADMIN
|
||||
stats Brain statistics
|
||||
health Brain health dashboard
|
||||
|
||||
213
src/commands/backlinks.ts
Normal file
213
src/commands/backlinks.ts
Normal file
@@ -0,0 +1,213 @@
|
||||
/**
|
||||
* gbrain check-backlinks — Check and fix missing back-links across brain pages.
|
||||
*
|
||||
* Deterministic: zero LLM calls. Scans pages for entity mentions,
|
||||
* checks if back-links exist, and optionally creates them.
|
||||
*
|
||||
* Usage:
|
||||
* gbrain check-backlinks check [--dir <brain-dir>] # report missing back-links
|
||||
* gbrain check-backlinks fix [--dir <brain-dir>] # create missing back-links
|
||||
* gbrain check-backlinks fix --dry-run # preview fixes
|
||||
*/
|
||||
|
||||
import { readFileSync, writeFileSync, readdirSync, statSync, lstatSync, existsSync } from 'fs';
|
||||
import { join, relative, basename } from 'path';
|
||||
|
||||
interface BacklinkGap {
|
||||
/** The page that mentions the entity */
|
||||
sourcePage: string;
|
||||
/** The entity page that's missing the back-link */
|
||||
targetPage: string;
|
||||
/** The entity name mentioned */
|
||||
entityName: string;
|
||||
/** The source page title */
|
||||
sourceTitle: string;
|
||||
}
|
||||
|
||||
/** Extract entity references from markdown content (relative links to people/companies) */
|
||||
export function extractEntityRefs(content: string, pagePath: string): { name: string; slug: string; dir: string }[] {
|
||||
const refs: { name: string; slug: string; dir: string }[] = [];
|
||||
// Match markdown links to brain pages: [Name](../people/slug.md) or [Name](../../companies/slug.md)
|
||||
const linkPattern = /\[([^\]]+)\]\(([^)]*(?:people|companies)\/([^)]+\.md))\)/g;
|
||||
let match;
|
||||
while ((match = linkPattern.exec(content)) !== null) {
|
||||
const name = match[1];
|
||||
const fullPath = match[2];
|
||||
const slug = match[3].replace('.md', '');
|
||||
const dir = fullPath.includes('people') ? 'people' : 'companies';
|
||||
refs.push({ name, slug, dir });
|
||||
}
|
||||
return refs;
|
||||
}
|
||||
|
||||
/** Extract title from page (first H1 or frontmatter title) */
|
||||
export function extractPageTitle(content: string): string {
|
||||
const fmMatch = content.match(/^title:\s*"?(.+?)"?\s*$/m);
|
||||
if (fmMatch) return fmMatch[1];
|
||||
const h1Match = content.match(/^#\s+(.+)$/m);
|
||||
if (h1Match) return h1Match[1].trim();
|
||||
return 'Untitled';
|
||||
}
|
||||
|
||||
/** Check if a page already contains a back-link to a given source file */
|
||||
export function hasBacklink(targetContent: string, sourceFilename: string): boolean {
|
||||
return targetContent.includes(sourceFilename);
|
||||
}
|
||||
|
||||
/** Build a timeline back-link entry */
|
||||
export function buildBacklinkEntry(sourceTitle: string, sourcePath: string, date: string): string {
|
||||
return `- **${date}** | Referenced in [${sourceTitle}](${sourcePath})`;
|
||||
}
|
||||
|
||||
/** Scan a brain directory for back-link gaps */
|
||||
export function findBacklinkGaps(brainDir: string): BacklinkGap[] {
|
||||
const gaps: BacklinkGap[] = [];
|
||||
|
||||
// Collect all markdown files
|
||||
const allPages: { path: string; relPath: string; content: string }[] = [];
|
||||
function walk(dir: string) {
|
||||
for (const entry of readdirSync(dir)) {
|
||||
if (entry.startsWith('.')) continue;
|
||||
const full = join(dir, entry);
|
||||
if (lstatSync(full).isDirectory()) {
|
||||
walk(full);
|
||||
} else if (entry.endsWith('.md') && !entry.startsWith('_')) {
|
||||
const relPath = relative(brainDir, full);
|
||||
try {
|
||||
allPages.push({ path: full, relPath, content: readFileSync(full, 'utf-8') });
|
||||
} catch { /* skip unreadable */ }
|
||||
}
|
||||
}
|
||||
}
|
||||
walk(brainDir);
|
||||
|
||||
// Build a lookup of existing pages by directory/slug
|
||||
const pagesBySlug = new Map<string, { path: string; content: string }>();
|
||||
for (const page of allPages) {
|
||||
const slug = page.relPath.replace('.md', '');
|
||||
pagesBySlug.set(slug, { path: page.path, content: page.content });
|
||||
}
|
||||
|
||||
// For each page, check entity references
|
||||
for (const page of allPages) {
|
||||
const refs = extractEntityRefs(page.content, page.relPath);
|
||||
const sourceFilename = basename(page.relPath);
|
||||
|
||||
for (const ref of refs) {
|
||||
const targetSlug = `${ref.dir}/${ref.slug}`;
|
||||
const target = pagesBySlug.get(targetSlug);
|
||||
if (!target) continue; // target page doesn't exist
|
||||
|
||||
// Check if the target already has a back-link to this source page
|
||||
if (!hasBacklink(target.content, sourceFilename)) {
|
||||
gaps.push({
|
||||
sourcePage: page.relPath,
|
||||
targetPage: targetSlug + '.md',
|
||||
entityName: ref.name,
|
||||
sourceTitle: extractPageTitle(page.content),
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return gaps;
|
||||
}
|
||||
|
||||
/** Fix back-link gaps by appending timeline entries to target pages */
|
||||
export function fixBacklinkGaps(brainDir: string, gaps: BacklinkGap[], dryRun: boolean = false): number {
|
||||
const today = new Date().toISOString().slice(0, 10);
|
||||
let fixed = 0;
|
||||
|
||||
// Group gaps by target page to batch writes
|
||||
const byTarget = new Map<string, BacklinkGap[]>();
|
||||
for (const gap of gaps) {
|
||||
const existing = byTarget.get(gap.targetPage) || [];
|
||||
existing.push(gap);
|
||||
byTarget.set(gap.targetPage, existing);
|
||||
}
|
||||
|
||||
for (const [targetPage, targetGaps] of byTarget) {
|
||||
const targetPath = join(brainDir, targetPage);
|
||||
if (!existsSync(targetPath)) continue;
|
||||
|
||||
let content = readFileSync(targetPath, 'utf-8');
|
||||
|
||||
for (const gap of targetGaps) {
|
||||
// Compute relative path from target to source
|
||||
const targetDir = targetPage.split('/').slice(0, -1);
|
||||
const sourceDir = gap.sourcePage.split('/');
|
||||
const depth = targetDir.length;
|
||||
const relPrefix = '../'.repeat(depth);
|
||||
const relPath = relPrefix + gap.sourcePage;
|
||||
|
||||
const entry = buildBacklinkEntry(gap.sourceTitle, relPath, today);
|
||||
|
||||
// Insert into Timeline section
|
||||
if (content.includes('## Timeline')) {
|
||||
const parts = content.split('## Timeline');
|
||||
const afterTimeline = parts[1];
|
||||
const nextSection = afterTimeline.match(/\n## /);
|
||||
if (nextSection) {
|
||||
const insertIdx = parts[0].length + '## Timeline'.length + nextSection.index!;
|
||||
content = content.slice(0, insertIdx) + '\n' + entry + content.slice(insertIdx);
|
||||
} else {
|
||||
content = content.trimEnd() + '\n' + entry + '\n';
|
||||
}
|
||||
} else {
|
||||
// Add Timeline section
|
||||
content = content.trimEnd() + '\n\n## Timeline\n\n' + entry + '\n';
|
||||
}
|
||||
fixed++;
|
||||
}
|
||||
|
||||
if (!dryRun) {
|
||||
writeFileSync(targetPath, content);
|
||||
}
|
||||
}
|
||||
|
||||
return fixed;
|
||||
}
|
||||
|
||||
export async function runBacklinks(args: string[]) {
|
||||
const subcommand = args[0];
|
||||
const dirIdx = args.indexOf('--dir');
|
||||
const brainDir = dirIdx >= 0 ? args[dirIdx + 1] : '.';
|
||||
const dryRun = args.includes('--dry-run');
|
||||
|
||||
if (!subcommand || !['check', 'fix'].includes(subcommand)) {
|
||||
console.error('Usage: gbrain check-backlinks <check|fix> [--dir <brain-dir>] [--dry-run]');
|
||||
console.error(' check Report missing back-links');
|
||||
console.error(' fix Create missing back-links (appends to Timeline)');
|
||||
console.error(' --dir Brain directory (default: current directory)');
|
||||
console.error(' --dry-run Preview fixes without writing');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
if (!existsSync(brainDir)) {
|
||||
console.error(`Directory not found: ${brainDir}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const gaps = findBacklinkGaps(brainDir);
|
||||
|
||||
if (gaps.length === 0) {
|
||||
console.log('No missing back-links found.');
|
||||
return;
|
||||
}
|
||||
|
||||
if (subcommand === 'check') {
|
||||
console.log(`Found ${gaps.length} missing back-link(s):\n`);
|
||||
for (const gap of gaps) {
|
||||
console.log(` ${gap.targetPage} <- ${gap.sourcePage}`);
|
||||
console.log(` "${gap.entityName}" mentioned in "${gap.sourceTitle}"`);
|
||||
}
|
||||
console.log(`\nRun 'gbrain check-backlinks fix --dir ${brainDir}' to create them.`);
|
||||
} else {
|
||||
const label = dryRun ? '(dry run) ' : '';
|
||||
const fixed = fixBacklinkGaps(brainDir, gaps, dryRun);
|
||||
console.log(`${label}Fixed ${fixed} missing back-link(s) across ${new Set(gaps.map(g => g.targetPage)).size} page(s).`);
|
||||
if (dryRun) {
|
||||
console.log('\nRe-run without --dry-run to apply.');
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,8 +1,12 @@
|
||||
import { readFileSync, readdirSync, statSync, existsSync, writeFileSync, unlinkSync } from 'fs';
|
||||
import { join, relative, extname, basename } from 'path';
|
||||
import { readFileSync, readdirSync, statSync, existsSync, writeFileSync, unlinkSync, mkdirSync } from 'fs';
|
||||
import { join, relative, extname, basename, dirname } from 'path';
|
||||
import { createHash } from 'crypto';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import * as db from '../core/db.ts';
|
||||
import { humanSize } from '../core/file-resolver.ts';
|
||||
|
||||
/** Size threshold: files >= 100 MB use TUS resumable upload */
|
||||
const SIZE_THRESHOLD = 100 * 1024 * 1024;
|
||||
|
||||
interface FileRecord {
|
||||
id: number;
|
||||
@@ -67,20 +71,28 @@ export async function runFiles(engine: BrainEngine, args: string[]) {
|
||||
case 'clean':
|
||||
await cleanFiles(args.slice(1));
|
||||
break;
|
||||
case 'upload-raw':
|
||||
await uploadRaw(args.slice(1));
|
||||
break;
|
||||
case 'signed-url':
|
||||
await signedUrl(args.slice(1));
|
||||
break;
|
||||
case 'status':
|
||||
await filesStatus(args.slice(1));
|
||||
break;
|
||||
default:
|
||||
console.error(`Usage: gbrain files <list|upload|sync|verify|mirror|unmirror|redirect|restore|clean|status> [args]`);
|
||||
console.error(`Usage: gbrain files <command> [args]`);
|
||||
console.error(` list [slug] List files for a page (or all)`);
|
||||
console.error(` upload <file> --page <slug> Upload file linked to page`);
|
||||
console.error(` upload-raw <file> --page <slug> [--type <type>] Smart upload with .redirect.yaml pointer`);
|
||||
console.error(` signed-url <path> Generate signed URL for stored file`);
|
||||
console.error(` sync <dir> Upload directory to storage`);
|
||||
console.error(` verify Verify all uploads match local`);
|
||||
console.error(` mirror <dir> [--dry-run] Mirror files to cloud storage`);
|
||||
console.error(` unmirror <dir> Remove mirror marker (files stay in storage)`);
|
||||
console.error(` redirect <dir> [--dry-run] Replace files with .redirect breadcrumbs`);
|
||||
console.error(` redirect <dir> [--dry-run] Replace files with .redirect.yaml pointers`);
|
||||
console.error(` restore <dir> Download from storage, recreate local files`);
|
||||
console.error(` clean <dir> [--yes] Delete .redirect breadcrumbs (irreversible)`);
|
||||
console.error(` clean <dir> [--yes] Delete redirect pointers (irreversible)`);
|
||||
console.error(` status Show migration status of directories`);
|
||||
process.exit(1);
|
||||
}
|
||||
@@ -138,6 +150,8 @@ async function uploadFile(args: string[]) {
|
||||
const { createStorage } = await import('../core/storage.ts');
|
||||
const storage = await createStorage(config.storage as any);
|
||||
const content = readFileSync(filePath);
|
||||
const method = content.length >= SIZE_THRESHOLD ? 'TUS resumable' : 'standard';
|
||||
console.log(`Uploading ${humanSize(stat.size)} via ${method}...`);
|
||||
await storage.upload(storagePath, content, mimeType || undefined);
|
||||
}
|
||||
|
||||
@@ -150,7 +164,133 @@ async function uploadFile(args: string[]) {
|
||||
mime_type = EXCLUDED.mime_type
|
||||
`;
|
||||
|
||||
console.log(`Uploaded: ${storagePath} (${Math.round(stat.size / 1024)}KB)`);
|
||||
console.log(`Uploaded: ${storagePath} (${humanSize(stat.size)})`);
|
||||
}
|
||||
|
||||
/**
|
||||
* Smart upload with size routing and .redirect.yaml pointer creation.
|
||||
*
|
||||
* Size routing:
|
||||
* < 100 MB text/PDF → stays in git (brain repo), no cloud upload
|
||||
* >= 100 MB OR media → upload to cloud storage, create .redirect.yaml pointer
|
||||
*
|
||||
* The .redirect.yaml pointer stays in the brain repo so git tracks what was stored.
|
||||
*/
|
||||
async function uploadRaw(args: string[]) {
|
||||
const filePath = args.find(a => !a.startsWith('--'));
|
||||
const pageSlug = args.find((a, i) => args[i - 1] === '--page') || null;
|
||||
const fileType = args.find((a, i) => args[i - 1] === '--type') || null;
|
||||
const noPointer = args.includes('--no-pointer');
|
||||
|
||||
if (!filePath || !existsSync(filePath)) {
|
||||
console.error('Usage: gbrain files upload-raw <file> --page <slug> [--type <type>] [--no-pointer]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const stat = statSync(filePath);
|
||||
const filename = basename(filePath);
|
||||
const mimeType = getMimeType(filePath);
|
||||
const isMedia = mimeType?.startsWith('video/') || mimeType?.startsWith('audio/') || mimeType?.startsWith('image/');
|
||||
const needsCloud = stat.size >= SIZE_THRESHOLD || isMedia;
|
||||
|
||||
if (!needsCloud) {
|
||||
// Small text/PDF files stay in git
|
||||
console.log(JSON.stringify({
|
||||
success: true,
|
||||
storage: 'git',
|
||||
path: filePath,
|
||||
size: stat.size,
|
||||
size_human: humanSize(stat.size),
|
||||
}));
|
||||
return;
|
||||
}
|
||||
|
||||
// Upload to cloud storage
|
||||
const { loadConfig } = await import('../core/config.ts');
|
||||
const config = loadConfig();
|
||||
if (!config?.storage) {
|
||||
console.error('No storage backend configured. Run gbrain init with storage settings.');
|
||||
console.error('Or use gbrain files upload for manual uploads.');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const { createStorage } = await import('../core/storage.ts');
|
||||
const storage = await createStorage(config.storage as any);
|
||||
const content = readFileSync(filePath);
|
||||
const hash = createHash('sha256').update(content).digest('hex');
|
||||
const storagePath = pageSlug ? `${pageSlug}/${filename}` : `unsorted/${hash.slice(0, 8)}-${filename}`;
|
||||
const bucket = (config.storage as any).bucket || 'brain-files';
|
||||
|
||||
const method = content.length >= SIZE_THRESHOLD ? 'TUS resumable' : 'standard';
|
||||
console.error(`Uploading ${humanSize(stat.size)} via ${method}...`);
|
||||
await storage.upload(storagePath, content, mimeType || undefined);
|
||||
|
||||
// Create .redirect.yaml pointer in the brain repo
|
||||
let pointerPath: string | null = null;
|
||||
if (!noPointer && pageSlug) {
|
||||
const { stringify } = await import('../core/yaml-lite.ts');
|
||||
const pointer = stringify({
|
||||
target: `supabase://${bucket}/${storagePath}`,
|
||||
bucket,
|
||||
storage_path: storagePath,
|
||||
size: stat.size,
|
||||
size_human: humanSize(stat.size),
|
||||
hash: `sha256:${hash}`,
|
||||
mime: mimeType || 'application/octet-stream',
|
||||
uploaded: new Date().toISOString(),
|
||||
...(fileType ? { type: fileType } : {}),
|
||||
});
|
||||
// Write pointer next to the page that references it
|
||||
pointerPath = `${pageSlug}/${filename}.redirect.yaml`;
|
||||
console.error(`Pointer: ${pointerPath}`);
|
||||
}
|
||||
|
||||
// Record in DB
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata)
|
||||
VALUES (${pageSlug}, ${filename}, ${storagePath}, ${mimeType}, ${stat.size}, ${'sha256:' + hash},
|
||||
${JSON.stringify({ type: fileType, upload_method: method })}::jsonb)
|
||||
ON CONFLICT (storage_path) DO UPDATE SET
|
||||
content_hash = EXCLUDED.content_hash,
|
||||
size_bytes = EXCLUDED.size_bytes,
|
||||
mime_type = EXCLUDED.mime_type
|
||||
`;
|
||||
|
||||
// Output JSON for scripting
|
||||
console.log(JSON.stringify({
|
||||
success: true,
|
||||
storage: 'supabase',
|
||||
storagePath,
|
||||
bucket,
|
||||
reference: `supabase://${bucket}/${storagePath}`,
|
||||
pointerPath,
|
||||
size: stat.size,
|
||||
size_human: humanSize(stat.size),
|
||||
hash: `sha256:${hash}`,
|
||||
upload_method: method,
|
||||
}));
|
||||
}
|
||||
|
||||
/** Generate a signed URL for a stored file */
|
||||
async function signedUrl(args: string[]) {
|
||||
const storagePath = args.find(a => !a.startsWith('--'));
|
||||
if (!storagePath) {
|
||||
console.error('Usage: gbrain files signed-url <storage-path>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const { loadConfig } = await import('../core/config.ts');
|
||||
const config = loadConfig();
|
||||
if (!config?.storage) {
|
||||
console.error('No storage backend configured.');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const { createStorage } = await import('../core/storage.ts');
|
||||
const storage = await createStorage(config.storage as any);
|
||||
const url = await storage.getUrl(storagePath);
|
||||
console.log(url);
|
||||
}
|
||||
|
||||
async function syncFiles(dir?: string) {
|
||||
@@ -343,14 +483,20 @@ async function redirectFiles(args: string[]) {
|
||||
}
|
||||
}
|
||||
|
||||
const breadcrumb = stringify({
|
||||
moved_to: 'storage',
|
||||
bucket: marker.bucket || 'brain-files',
|
||||
path: relPath,
|
||||
moved_at: new Date().toISOString().split('T')[0],
|
||||
original_hash: `sha256:${hash}`,
|
||||
const stat = statSync(filePath);
|
||||
const mimeType = getMimeType(filePath);
|
||||
const bucket = marker.bucket || 'brain-files';
|
||||
const pointer = stringify({
|
||||
target: `supabase://${bucket}/${relPath}`,
|
||||
bucket,
|
||||
storage_path: relPath,
|
||||
size: stat.size,
|
||||
size_human: humanSize(stat.size),
|
||||
hash: `sha256:${hash}`,
|
||||
mime: mimeType || 'application/octet-stream',
|
||||
uploaded: new Date().toISOString(),
|
||||
});
|
||||
writeFileSync(filePath + '.redirect', breadcrumb);
|
||||
writeFileSync(filePath + '.redirect.yaml', pointer);
|
||||
unlinkSync(filePath);
|
||||
redirected++;
|
||||
}
|
||||
@@ -380,7 +526,7 @@ async function restoreFiles(args: string[]) {
|
||||
if (entry.startsWith('.')) continue;
|
||||
const full = join(d, entry);
|
||||
if (statSync(full).isDirectory()) findRedirects(full);
|
||||
else if (entry.endsWith('.redirect')) redirectFiles.push(full);
|
||||
else if (entry.endsWith('.redirect.yaml') || entry.endsWith('.redirect')) redirectFiles.push(full);
|
||||
}
|
||||
}
|
||||
findRedirects(dir);
|
||||
@@ -389,9 +535,10 @@ async function restoreFiles(args: string[]) {
|
||||
let failed = 0;
|
||||
for (const redirectPath of redirectFiles) {
|
||||
const info = parseYaml(readFileSync(redirectPath, 'utf-8'));
|
||||
const originalPath = redirectPath.replace(/\.redirect$/, '');
|
||||
const originalPath = redirectPath.replace(/\.redirect(\.yaml)?$/, '');
|
||||
try {
|
||||
const data = await storage.download(info.path);
|
||||
const storagePath = info.storage_path || info.path; // v0.9 or legacy format
|
||||
const data = await storage.download(storagePath);
|
||||
writeFileSync(originalPath, data);
|
||||
unlinkSync(redirectPath);
|
||||
restored++;
|
||||
@@ -411,7 +558,7 @@ async function cleanFiles(args: string[]) {
|
||||
if (!dir || !existsSync(dir)) { console.error('Usage: gbrain files clean <dir> [--yes]'); process.exit(1); }
|
||||
|
||||
if (!confirmed) {
|
||||
console.error('WARNING: This permanently removes .redirect breadcrumbs.');
|
||||
console.error('WARNING: This permanently removes redirect pointers.');
|
||||
console.error('After this, files are only accessible from cloud storage.');
|
||||
console.error('Git history still has the originals if you need them.');
|
||||
console.error('Run with --yes to confirm.');
|
||||
@@ -424,7 +571,7 @@ async function cleanFiles(args: string[]) {
|
||||
if (entry.startsWith('.')) continue;
|
||||
const full = join(d, entry);
|
||||
if (statSync(full).isDirectory()) findAndClean(full);
|
||||
else if (entry.endsWith('.redirect')) { unlinkSync(full); cleaned++; }
|
||||
else if (entry.endsWith('.redirect.yaml') || entry.endsWith('.redirect')) { unlinkSync(full); cleaned++; }
|
||||
}
|
||||
}
|
||||
findAndClean(dir);
|
||||
@@ -443,7 +590,7 @@ async function filesStatus(args: string[]) {
|
||||
const full = join(d, entry);
|
||||
if (entry === '.supabase') { mirrored++; continue; }
|
||||
if (statSync(full).isDirectory()) scan(full);
|
||||
else if (entry.endsWith('.redirect')) redirected++;
|
||||
else if (entry.endsWith('.redirect.yaml') || entry.endsWith('.redirect')) redirected++;
|
||||
else if (!entry.endsWith('.md')) local++;
|
||||
}
|
||||
}
|
||||
|
||||
245
src/commands/lint.ts
Normal file
245
src/commands/lint.ts
Normal file
@@ -0,0 +1,245 @@
|
||||
/**
|
||||
* gbrain lint — Deterministic brain page quality checker.
|
||||
*
|
||||
* Zero LLM calls. Catches common quality issues:
|
||||
* - LLM preamble artifacts ("Of course! Here is...")
|
||||
* - Placeholder dates (YYYY-MM-DD, XX-XX left unfilled)
|
||||
* - Missing required frontmatter fields
|
||||
* - Broken citations (unclosed brackets, missing dates)
|
||||
* - Empty/stub sections
|
||||
* - Wrapping code fences from LLM output
|
||||
*
|
||||
* Usage:
|
||||
* gbrain lint <dir> # report issues
|
||||
* gbrain lint <dir> --fix # auto-fix what's fixable
|
||||
* gbrain lint <dir> --fix --dry-run # preview fixes
|
||||
* gbrain lint <file.md> # lint single file
|
||||
*/
|
||||
|
||||
import { readFileSync, writeFileSync, readdirSync, statSync, lstatSync, existsSync } from 'fs';
|
||||
import { join, relative } from 'path';
|
||||
|
||||
export interface LintIssue {
|
||||
file: string;
|
||||
line: number;
|
||||
rule: string;
|
||||
message: string;
|
||||
fixable: boolean;
|
||||
}
|
||||
|
||||
// ── LLM artifact patterns ──────────────────────────────────────────
|
||||
|
||||
const LLM_PREAMBLES = [
|
||||
/^Of course\.?\s*Here is (?:a |the )?(?:detailed |comprehensive |updated )?(?:brain )?page[^.\n]*\.?\s*\n*/gim,
|
||||
/^Certainly\.?\s*Here is[^.\n]*\.?\s*\n*/gim,
|
||||
/^Here is (?:a |the )?(?:detailed |comprehensive |updated )?(?:brain )?page[^.\n]*\.?\s*\n*/gim,
|
||||
/^I've (?:created|updated|written|prepared) (?:a |the )?(?:detailed |comprehensive )?(?:brain )?page[^.\n]*\.?\s*\n*/gim,
|
||||
/^Sure(?:!|,)?\s*Here (?:is|are)[^.\n]*\.?\s*\n*/gim,
|
||||
/^Absolutely\.?\s*Here[^.\n]*\.?\s*\n*/gim,
|
||||
];
|
||||
|
||||
// ── Rules ──────────────────────────────────────────────────────────
|
||||
|
||||
export function lintContent(content: string, filePath: string): LintIssue[] {
|
||||
const issues: LintIssue[] = [];
|
||||
const lines = content.split('\n');
|
||||
|
||||
// Rule: LLM preamble artifacts
|
||||
for (const pattern of LLM_PREAMBLES) {
|
||||
pattern.lastIndex = 0;
|
||||
if (pattern.test(content)) {
|
||||
issues.push({
|
||||
file: filePath, line: 1, rule: 'llm-preamble',
|
||||
message: 'LLM preamble artifact detected (e.g., "Of course! Here is...")',
|
||||
fixable: true,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Rule: Wrapping code fences (```markdown ... ```)
|
||||
if (content.match(/^```(?:markdown|md)\s*\n/m) && content.match(/\n```\s*$/m)) {
|
||||
issues.push({
|
||||
file: filePath, line: 1, rule: 'code-fence-wrap',
|
||||
message: 'Page wrapped in ```markdown code fences (LLM artifact)',
|
||||
fixable: true,
|
||||
});
|
||||
}
|
||||
|
||||
// Rule: Placeholder dates
|
||||
for (let i = 0; i < lines.length; i++) {
|
||||
if (lines[i].match(/\bYYYY-MM-DD\b/) || lines[i].match(/\bXX-XX\b/) || lines[i].match(/\b\d{4}-XX-XX\b/)) {
|
||||
issues.push({
|
||||
file: filePath, line: i + 1, rule: 'placeholder-date',
|
||||
message: `Placeholder date found: ${lines[i].trim().slice(0, 60)}`,
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Rule: Missing frontmatter
|
||||
if (content.startsWith('---')) {
|
||||
const fmEnd = content.indexOf('---', 3);
|
||||
if (fmEnd > 0) {
|
||||
const fm = content.slice(3, fmEnd);
|
||||
if (!fm.match(/^title:/m)) {
|
||||
issues.push({
|
||||
file: filePath, line: 1, rule: 'missing-title',
|
||||
message: 'Frontmatter missing required field: title',
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
if (!fm.match(/^type:/m)) {
|
||||
issues.push({
|
||||
file: filePath, line: 1, rule: 'missing-type',
|
||||
message: 'Frontmatter missing required field: type',
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
if (!fm.match(/^created:/m)) {
|
||||
issues.push({
|
||||
file: filePath, line: 1, rule: 'missing-created',
|
||||
message: 'Frontmatter missing required field: created',
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// No frontmatter at all
|
||||
issues.push({
|
||||
file: filePath, line: 1, rule: 'no-frontmatter',
|
||||
message: 'Page has no YAML frontmatter',
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
|
||||
// Rule: Broken citations (unclosed [Source: ...)
|
||||
for (let i = 0; i < lines.length; i++) {
|
||||
const line = lines[i];
|
||||
// Open [Source: without closing ]
|
||||
if (line.match(/\[Source:[^\]]*$/) && !(i + 1 < lines.length && lines[i + 1].match(/^\s*[^\[]*\]/))) {
|
||||
issues.push({
|
||||
file: filePath, line: i + 1, rule: 'broken-citation',
|
||||
message: 'Unclosed [Source: ...] citation',
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Rule: Empty/stub sections
|
||||
const sectionPattern = /^##\s+(.+)$/gm;
|
||||
let sectionMatch;
|
||||
while ((sectionMatch = sectionPattern.exec(content)) !== null) {
|
||||
const sectionStart = sectionMatch.index + sectionMatch[0].length;
|
||||
const nextSection = content.indexOf('\n## ', sectionStart);
|
||||
const sectionBody = content.slice(sectionStart, nextSection > 0 ? nextSection : undefined).trim();
|
||||
|
||||
if (sectionBody === '' || sectionBody === '[No data yet]' || sectionBody === '*[To be filled by agent]*') {
|
||||
const lineNum = content.slice(0, sectionMatch.index).split('\n').length;
|
||||
issues.push({
|
||||
file: filePath, line: lineNum, rule: 'empty-section',
|
||||
message: `Empty section: ## ${sectionMatch[1]}`,
|
||||
fixable: false,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
return issues;
|
||||
}
|
||||
|
||||
/** Auto-fix fixable issues */
|
||||
export function fixContent(content: string): string {
|
||||
let fixed = content;
|
||||
|
||||
// Fix LLM preambles
|
||||
for (const pattern of LLM_PREAMBLES) {
|
||||
pattern.lastIndex = 0;
|
||||
fixed = fixed.replace(pattern, '');
|
||||
}
|
||||
|
||||
// Fix wrapping code fences
|
||||
fixed = fixed.replace(/^```(?:markdown|md)\s*\n/, '');
|
||||
fixed = fixed.replace(/\n```\s*$/, '');
|
||||
|
||||
// Clean up excessive blank lines left by fixes
|
||||
fixed = fixed.replace(/\n{3,}/g, '\n\n');
|
||||
|
||||
return fixed.trim() + '\n';
|
||||
}
|
||||
|
||||
/** Collect markdown files from a directory */
|
||||
function collectPages(dir: string): string[] {
|
||||
const pages: string[] = [];
|
||||
function walk(d: string) {
|
||||
for (const entry of readdirSync(d)) {
|
||||
if (entry.startsWith('.') || entry.startsWith('_')) continue;
|
||||
const full = join(d, entry);
|
||||
if (lstatSync(full).isDirectory()) walk(full);
|
||||
else if (entry.endsWith('.md')) pages.push(full);
|
||||
}
|
||||
}
|
||||
walk(dir);
|
||||
return pages.sort();
|
||||
}
|
||||
|
||||
export async function runLint(args: string[]) {
|
||||
const target = args.find(a => !a.startsWith('--'));
|
||||
const doFix = args.includes('--fix');
|
||||
const dryRun = args.includes('--dry-run');
|
||||
|
||||
if (!target) {
|
||||
console.error('Usage: gbrain lint <dir|file.md> [--fix] [--dry-run]');
|
||||
console.error(' --fix Auto-fix fixable issues (LLM preambles, code fences)');
|
||||
console.error(' --dry-run Preview fixes without writing');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
if (!existsSync(target)) {
|
||||
console.error(`Not found: ${target}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Single file or directory
|
||||
const isSingleFile = statSync(target).isFile();
|
||||
const pages = isSingleFile ? [target] : collectPages(target);
|
||||
|
||||
let totalIssues = 0;
|
||||
let totalFixed = 0;
|
||||
let pagesWithIssues = 0;
|
||||
|
||||
for (const page of pages) {
|
||||
const content = readFileSync(page, 'utf-8');
|
||||
const relPath = isSingleFile ? page : relative(target, page);
|
||||
const issues = lintContent(content, relPath);
|
||||
|
||||
if (issues.length === 0) continue;
|
||||
pagesWithIssues++;
|
||||
totalIssues += issues.length;
|
||||
|
||||
console.log(`\n${relPath}:`);
|
||||
for (const issue of issues) {
|
||||
const fixLabel = issue.fixable ? ' [fixable]' : '';
|
||||
console.log(` L${issue.line} ${issue.rule}: ${issue.message}${fixLabel}`);
|
||||
}
|
||||
|
||||
// Auto-fix if requested
|
||||
if (doFix && issues.some(i => i.fixable)) {
|
||||
const fixed = fixContent(content);
|
||||
if (fixed !== content) {
|
||||
const fixCount = issues.filter(i => i.fixable).length;
|
||||
totalFixed += fixCount;
|
||||
if (!dryRun) {
|
||||
writeFileSync(page, fixed);
|
||||
}
|
||||
console.log(` ${dryRun ? '(dry run) ' : ''}Fixed ${fixCount} issue(s)`);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n${pages.length} pages scanned. ${totalIssues} issue(s) in ${pagesWithIssues} page(s).`);
|
||||
if (doFix) {
|
||||
console.log(`${dryRun ? '(dry run) ' : ''}${totalFixed} auto-fixed.`);
|
||||
} else if (totalIssues > 0) {
|
||||
const fixable = totalIssues; // rough estimate
|
||||
console.log(`Run with --fix to auto-fix fixable issues.`);
|
||||
}
|
||||
}
|
||||
373
src/commands/publish.ts
Normal file
373
src/commands/publish.ts
Normal file
@@ -0,0 +1,373 @@
|
||||
/**
|
||||
* gbrain publish — Generate shareable HTML from brain markdown pages.
|
||||
*
|
||||
* Deterministic: zero LLM calls. The skill (skills/publish/SKILL.md)
|
||||
* tells the agent when and how to use this. This code does the work.
|
||||
*
|
||||
* Usage:
|
||||
* gbrain publish <page-path> # local HTML file
|
||||
* gbrain publish <page-path> --password # auto-generated pw
|
||||
* gbrain publish <page-path> --password "secret" # custom pw
|
||||
* gbrain publish <page-path> --out /tmp/share.html # custom output
|
||||
* gbrain publish <page-path> --title "Custom Title" # override title
|
||||
*/
|
||||
|
||||
import { readFileSync, writeFileSync, mkdirSync } from 'fs';
|
||||
import { randomBytes, createCipheriv, pbkdf2Sync } from 'crypto';
|
||||
import { dirname, basename } from 'path';
|
||||
|
||||
// ── Content stripping ──────────────────────────────────────────────
|
||||
|
||||
/** Strip private/internal data from brain markdown before publishing */
|
||||
export function makeShareable(content: string): string {
|
||||
let clean = content;
|
||||
|
||||
// Remove YAML frontmatter
|
||||
clean = clean.replace(/^---[\s\S]*?---\n*/, '');
|
||||
|
||||
// Remove [Source: ...] citations (all formats)
|
||||
clean = clean.replace(/\s*\[Source:[^\]]*\]/g, '');
|
||||
|
||||
// Remove confirmation numbers
|
||||
clean = clean.replace(/\*\*Confirmation:\*\*\s*[A-Z0-9]{6,}/gi, '**Confirmation:** on file');
|
||||
clean = clean.replace(/Confirmation[:#]?\s*[A-Z0-9]{6,}/gi, 'Confirmation: on file');
|
||||
clean = clean.replace(/\bconf\s*#?\s*[A-Z0-9]{6,}/gi, 'Confirmation: on file');
|
||||
|
||||
// Remove brain cross-links but keep display text
|
||||
clean = clean.replace(/\[([^\]]+)\]\(\.[^)]*\/[^)]+\)/g, '$1');
|
||||
|
||||
// Remove "See also" brain-internal lines
|
||||
clean = clean.replace(/^-?\s*See also:.*$/gm, '');
|
||||
|
||||
// Remove Timeline section (below the --- separator near end)
|
||||
clean = clean.replace(/\n---\n\n## Timeline[\s\S]*$/, '');
|
||||
|
||||
// Clean up excessive blank lines
|
||||
clean = clean.replace(/\n{3,}/g, '\n\n');
|
||||
|
||||
return clean.trim();
|
||||
}
|
||||
|
||||
// ── Title extraction ───────────────────────────────────────────────
|
||||
|
||||
export function extractTitle(markdown: string): string {
|
||||
const match = markdown.match(/^#\s+(.+)$/m);
|
||||
return match ? match[1].trim() : 'Document';
|
||||
}
|
||||
|
||||
// ── Encryption ─────────────────────────────────────────────────────
|
||||
|
||||
export interface EncryptedContent {
|
||||
salt: string;
|
||||
iv: string;
|
||||
ciphertext: string;
|
||||
}
|
||||
|
||||
export function encryptContent(plaintext: string, password: string): EncryptedContent {
|
||||
const salt = randomBytes(16);
|
||||
const iv = randomBytes(12);
|
||||
const key = pbkdf2Sync(password, salt, 100_000, 32, 'sha256');
|
||||
const cipher = createCipheriv('aes-256-gcm', key, iv);
|
||||
|
||||
let encrypted = cipher.update(plaintext, 'utf8');
|
||||
encrypted = Buffer.concat([encrypted, cipher.final()]);
|
||||
const authTag = cipher.getAuthTag();
|
||||
|
||||
return {
|
||||
salt: salt.toString('base64'),
|
||||
iv: iv.toString('base64'),
|
||||
ciphertext: Buffer.concat([encrypted, authTag]).toString('base64'),
|
||||
};
|
||||
}
|
||||
|
||||
export function generatePassword(length: number = 16): string {
|
||||
const chars = 'abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ23456789';
|
||||
const bytes = randomBytes(length);
|
||||
return Array.from(bytes).map(b => chars[b % chars.length]).join('');
|
||||
}
|
||||
|
||||
// ── HTML generation ────────────────────────────────────────────────
|
||||
|
||||
const CSS = `
|
||||
:root {
|
||||
--bg: #fafaf9; --fg: #1c1917; --muted: #78716c;
|
||||
--accent: #d97706; --border: #e7e5e4; --card-bg: #ffffff;
|
||||
--code-bg: #f5f5f4; --link: #2563eb; --error: #dc2626;
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
:root {
|
||||
--bg: #0c0a09; --fg: #fafaf9; --muted: #a8a29e;
|
||||
--accent: #fbbf24; --border: #292524; --card-bg: #1c1917;
|
||||
--code-bg: #1c1917; --link: #60a5fa; --error: #f87171;
|
||||
}
|
||||
}
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'SF Pro', Roboto, sans-serif;
|
||||
background: var(--bg); color: var(--fg);
|
||||
line-height: 1.7; padding: 1rem;
|
||||
max-width: 720px; margin: 0 auto; font-size: 15px;
|
||||
}
|
||||
h1 { font-size: 1.75rem; font-weight: 700; margin: 1.5rem 0 0.5rem; letter-spacing: -0.02em; }
|
||||
h2 { font-size: 1.3rem; font-weight: 600; margin: 2rem 0 0.75rem; padding-bottom: 0.4rem; border-bottom: 2px solid var(--accent); }
|
||||
h3 { font-size: 1.1rem; font-weight: 600; margin: 1.5rem 0 0.5rem; color: var(--accent); }
|
||||
h4 { font-size: 1rem; font-weight: 600; margin: 1.25rem 0 0.4rem; }
|
||||
p { margin: 0.5rem 0; }
|
||||
blockquote { border-left: 3px solid var(--accent); padding: 0.75rem 1rem; margin: 1rem 0; background: var(--card-bg); border-radius: 0 8px 8px 0; font-style: italic; color: var(--muted); }
|
||||
ul, ol { margin: 0.5rem 0; padding-left: 1.5rem; }
|
||||
li { margin: 0.3rem 0; }
|
||||
a { color: var(--link); text-decoration: none; }
|
||||
a:hover { text-decoration: underline; }
|
||||
strong { font-weight: 600; }
|
||||
code { background: var(--code-bg); padding: 2px 6px; border-radius: 4px; font-size: 0.9em; }
|
||||
hr { border: none; border-top: 1px solid var(--border); margin: 2rem 0; }
|
||||
table { width: 100%; border-collapse: collapse; margin: 1rem 0; font-size: 14px; }
|
||||
th, td { padding: 8px 12px; border: 1px solid var(--border); text-align: left; }
|
||||
th { background: var(--card-bg); font-weight: 600; }
|
||||
@media (max-width: 600px) {
|
||||
body { font-size: 14px; padding: 0.75rem; }
|
||||
h1 { font-size: 1.4rem; }
|
||||
h2 { font-size: 1.15rem; }
|
||||
table { font-size: 12px; }
|
||||
th, td { padding: 6px 8px; }
|
||||
}
|
||||
`;
|
||||
|
||||
const PASSWORD_CSS = `
|
||||
.pw-overlay {
|
||||
position: fixed; inset: 0; display: flex; align-items: center; justify-content: center;
|
||||
background: var(--bg); z-index: 1000;
|
||||
}
|
||||
.pw-card {
|
||||
background: var(--card-bg); border: 1px solid var(--border); border-radius: 16px;
|
||||
padding: 2.5rem; max-width: 380px; width: 90%; text-align: center;
|
||||
box-shadow: 0 4px 24px rgba(0,0,0,0.1);
|
||||
}
|
||||
.pw-lock { font-size: 3rem; margin-bottom: 1rem; }
|
||||
.pw-title { font-size: 1.1rem; font-weight: 600; margin-bottom: 0.5rem; }
|
||||
.pw-subtitle { font-size: 0.85rem; color: var(--muted); margin-bottom: 1.5rem; }
|
||||
.pw-input {
|
||||
width: 100%; padding: 10px 14px; border: 1px solid var(--border); border-radius: 8px;
|
||||
background: var(--bg); color: var(--fg); font-size: 15px; margin-bottom: 1rem;
|
||||
outline: none; transition: border-color 0.2s;
|
||||
}
|
||||
.pw-input:focus { border-color: var(--accent); }
|
||||
.pw-btn {
|
||||
width: 100%; padding: 10px 14px; border: none; border-radius: 8px;
|
||||
background: var(--accent); color: #fff; font-size: 15px; font-weight: 600;
|
||||
cursor: pointer; transition: opacity 0.2s;
|
||||
}
|
||||
.pw-btn:hover { opacity: 0.9; }
|
||||
.pw-error { color: var(--error); font-size: 0.85rem; margin-top: 0.75rem; display: none; }
|
||||
.pw-remember { display: flex; align-items: center; justify-content: center; gap: 6px; margin-bottom: 1rem; font-size: 0.85rem; color: var(--muted); cursor: pointer; }
|
||||
.pw-remember input { cursor: pointer; }
|
||||
@keyframes shake { 0%,100%{transform:translateX(0)} 25%{transform:translateX(-8px)} 75%{transform:translateX(8px)} }
|
||||
.shake { animation: shake 0.3s ease-in-out; }
|
||||
`;
|
||||
|
||||
const DECRYPT_JS = `
|
||||
const STORAGE_KEY = 'bp_' + location.pathname;
|
||||
|
||||
async function deriveKey(password, salt) {
|
||||
const enc = new TextEncoder();
|
||||
const keyMaterial = await crypto.subtle.importKey('raw', enc.encode(password), 'PBKDF2', false, ['deriveKey']);
|
||||
return crypto.subtle.deriveKey(
|
||||
{ name: 'PBKDF2', salt, iterations: 100000, hash: 'SHA-256' },
|
||||
keyMaterial,
|
||||
{ name: 'AES-GCM', length: 256 },
|
||||
false,
|
||||
['decrypt']
|
||||
);
|
||||
}
|
||||
|
||||
async function decryptContent(password) {
|
||||
try {
|
||||
const salt = Uint8Array.from(atob(window.__SALT), c => c.charCodeAt(0));
|
||||
const iv = Uint8Array.from(atob(window.__IV), c => c.charCodeAt(0));
|
||||
const data = Uint8Array.from(atob(window.__CT), c => c.charCodeAt(0));
|
||||
const ciphertext = data.slice(0, data.length - 16);
|
||||
const authTag = data.slice(data.length - 16);
|
||||
const combined = new Uint8Array(ciphertext.length + authTag.length);
|
||||
combined.set(ciphertext);
|
||||
combined.set(authTag, ciphertext.length);
|
||||
const key = await deriveKey(password, salt);
|
||||
const decrypted = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, combined);
|
||||
return new TextDecoder().decode(decrypted);
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function unlock(pw, remember) {
|
||||
const result = await decryptContent(pw);
|
||||
if (result) {
|
||||
if (remember) {
|
||||
try { localStorage.setItem(STORAGE_KEY, pw); } catch {}
|
||||
}
|
||||
document.getElementById('pw-overlay').remove();
|
||||
document.getElementById('content').innerHTML = marked.parse(result);
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
(async () => {
|
||||
try {
|
||||
const saved = localStorage.getItem(STORAGE_KEY);
|
||||
if (saved && await unlock(saved, false)) return;
|
||||
} catch {}
|
||||
|
||||
document.getElementById('pw-form').addEventListener('submit', async (e) => {
|
||||
e.preventDefault();
|
||||
const input = document.getElementById('pw-input');
|
||||
const error = document.getElementById('pw-error');
|
||||
const card = document.querySelector('.pw-card');
|
||||
const remember = document.getElementById('pw-remember').checked;
|
||||
const pw = input.value;
|
||||
|
||||
if (await unlock(pw, remember)) return;
|
||||
|
||||
error.style.display = 'block';
|
||||
error.textContent = 'Wrong password. Try again.';
|
||||
card.classList.remove('shake');
|
||||
void card.offsetWidth;
|
||||
card.classList.add('shake');
|
||||
input.value = '';
|
||||
input.focus();
|
||||
});
|
||||
|
||||
document.getElementById('pw-input').addEventListener('input', () => {
|
||||
document.getElementById('pw-error').style.display = 'none';
|
||||
});
|
||||
})();
|
||||
`;
|
||||
|
||||
function escapeHtml(str: string): string {
|
||||
return str.replace(/&/g, '&').replace(/</g, '<').replace(/>/g, '>').replace(/"/g, '"');
|
||||
}
|
||||
|
||||
interface GenerateHtmlOptions {
|
||||
title: string;
|
||||
markdown: string;
|
||||
encrypted?: EncryptedContent | null;
|
||||
}
|
||||
|
||||
export function generateHtml({ title, markdown, encrypted }: GenerateHtmlOptions): string {
|
||||
const passwordHtml = encrypted ? `
|
||||
<div id="pw-overlay" class="pw-overlay">
|
||||
<div class="pw-card">
|
||||
<div class="pw-lock">🔒</div>
|
||||
<div class="pw-title">${escapeHtml(title)}</div>
|
||||
<div class="pw-subtitle">This document is password protected</div>
|
||||
<form id="pw-form">
|
||||
<input type="password" id="pw-input" class="pw-input" placeholder="Enter password" autofocus>
|
||||
<label class="pw-remember"><input type="checkbox" id="pw-remember" checked> Remember on this device</label>
|
||||
<button type="submit" class="pw-btn">Unlock</button>
|
||||
</form>
|
||||
<div id="pw-error" class="pw-error"></div>
|
||||
</div>
|
||||
</div>` : '';
|
||||
|
||||
const encryptedVars = encrypted ? `
|
||||
<script>
|
||||
window.__SALT = ${JSON.stringify(encrypted.salt)};
|
||||
window.__IV = ${JSON.stringify(encrypted.iv)};
|
||||
window.__CT = ${JSON.stringify(encrypted.ciphertext)};
|
||||
</script>` : '';
|
||||
|
||||
// Sanitize markdown rendering to prevent XSS from embedded HTML in brain pages
|
||||
const sanitizeScript = `
|
||||
function sanitizeHtml(html) {
|
||||
const div = document.createElement('div');
|
||||
div.innerHTML = html;
|
||||
div.querySelectorAll('script,iframe,object,embed,form').forEach(el => el.remove());
|
||||
div.querySelectorAll('*').forEach(el => {
|
||||
for (const attr of [...el.attributes]) {
|
||||
if (attr.name.startsWith('on') || attr.value.startsWith('javascript:')) {
|
||||
el.removeAttribute(attr.name);
|
||||
}
|
||||
}
|
||||
});
|
||||
return div.innerHTML;
|
||||
}
|
||||
`;
|
||||
|
||||
const contentScript = encrypted
|
||||
? `<script>${sanitizeScript}${DECRYPT_JS.replace(
|
||||
'document.getElementById(\'content\').innerHTML = marked.parse(result)',
|
||||
'document.getElementById(\'content\').innerHTML = sanitizeHtml(marked.parse(result))'
|
||||
)}<\/script>`
|
||||
: `<script>${sanitizeScript}
|
||||
const md = ${JSON.stringify(markdown)};
|
||||
document.getElementById('content').innerHTML = sanitizeHtml(marked.parse(md));
|
||||
<\/script>`;
|
||||
|
||||
return `<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>${escapeHtml(title)}</title>
|
||||
<style>${CSS}${encrypted ? PASSWORD_CSS : ''}</style>
|
||||
</head>
|
||||
<body>
|
||||
${passwordHtml}
|
||||
<div id="content"></div>
|
||||
${encryptedVars}
|
||||
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"><\/script>
|
||||
${contentScript}
|
||||
</body>
|
||||
</html>`;
|
||||
}
|
||||
|
||||
// ── CLI entry point ────────────────────────────────────────────────
|
||||
|
||||
export async function runPublish(args: string[]) {
|
||||
const inputPath = args.find(a => !a.startsWith('--'));
|
||||
const outIdx = args.indexOf('--out');
|
||||
const titleIdx = args.indexOf('--title');
|
||||
const pwIdx = args.indexOf('--password');
|
||||
|
||||
if (!inputPath) {
|
||||
console.error('Usage: gbrain publish <page.md> [--password ["secret"]] [--title "Title"] [--out path]');
|
||||
console.error('');
|
||||
console.error(' Generates a shareable HTML page from brain markdown.');
|
||||
console.error(' Strips private data (frontmatter, citations, timeline, brain links).');
|
||||
console.error(' Optionally encrypts with AES-256-GCM (client-side, no server needed).');
|
||||
console.error('');
|
||||
console.error(' --password Auto-generate a password');
|
||||
console.error(' --password "secret" Use a specific password');
|
||||
console.error(' --title "Title" Override the page title');
|
||||
console.error(' --out path Output file (default: <input-basename>.html)');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const raw = readFileSync(inputPath, 'utf-8');
|
||||
const cleaned = makeShareable(raw);
|
||||
const title = (titleIdx >= 0 ? args[titleIdx + 1] : null) || extractTitle(raw);
|
||||
|
||||
// Handle password
|
||||
let encrypted: EncryptedContent | null = null;
|
||||
if (pwIdx >= 0) {
|
||||
const nextArg = args[pwIdx + 1];
|
||||
const password = (nextArg && !nextArg.startsWith('--')) ? nextArg : generatePassword();
|
||||
encrypted = encryptContent(cleaned, password);
|
||||
if (!nextArg || nextArg.startsWith('--')) {
|
||||
console.error(`Password: ${password}`);
|
||||
}
|
||||
}
|
||||
|
||||
const html = generateHtml({ title, markdown: cleaned, encrypted });
|
||||
|
||||
// Determine output path
|
||||
const outPath = outIdx >= 0 ? args[outIdx + 1] : basename(inputPath, '.md') + '.html';
|
||||
mkdirSync(dirname(outPath), { recursive: true });
|
||||
writeFileSync(outPath, html);
|
||||
|
||||
console.log(`Published: ${outPath}`);
|
||||
if (encrypted) {
|
||||
console.log(' (password protected, AES-256-GCM encrypted)');
|
||||
} else {
|
||||
console.log(' (no password, content in cleartext)');
|
||||
}
|
||||
}
|
||||
82
src/commands/report.ts
Normal file
82
src/commands/report.ts
Normal file
@@ -0,0 +1,82 @@
|
||||
/**
|
||||
* gbrain report — Save a structured report to brain/reports/.
|
||||
*
|
||||
* Deterministic: zero LLM calls. Creates timestamped report pages
|
||||
* for audit trails of enrichment sweeps, maintenance runs, syncs, etc.
|
||||
*
|
||||
* Usage:
|
||||
* gbrain report --type enrichment-sweep --title "Enrichment Sweep" --content "..."
|
||||
* echo "report body" | gbrain report --type meeting-sync --title "Meeting Sync"
|
||||
* gbrain report --type enrichment-sweep --dir /path/to/brain
|
||||
*/
|
||||
|
||||
import { writeFileSync, mkdirSync, readFileSync } from 'fs';
|
||||
import { join } from 'path';
|
||||
|
||||
export async function runReport(args: string[]) {
|
||||
const typeIdx = args.indexOf('--type');
|
||||
const titleIdx = args.indexOf('--title');
|
||||
const contentIdx = args.indexOf('--content');
|
||||
const dirIdx = args.indexOf('--dir');
|
||||
|
||||
const reportType = typeIdx >= 0 ? args[typeIdx + 1] : null;
|
||||
const brainDir = dirIdx >= 0 ? args[dirIdx + 1] : '.';
|
||||
|
||||
// Validate reportType to prevent path traversal
|
||||
if (reportType && !/^[a-z0-9][a-z0-9-]*$/.test(reportType)) {
|
||||
console.error('Report type must be lowercase alphanumeric with hyphens only (e.g., "enrichment-sweep")');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
if (!reportType) {
|
||||
console.error('Usage: gbrain report --type <name> --title "..." --content "..." [--dir <brain>]');
|
||||
console.error(' Or pipe content via stdin:');
|
||||
console.error(' echo "report body" | gbrain report --type meeting-sync --title "Daily Sync"');
|
||||
console.error('');
|
||||
console.error(' Common types: enrichment-sweep, meeting-sync, maintenance, backlink-check, lint');
|
||||
console.error(' Creates: brain/reports/{type}/{YYYY-MM-DD-HHMM}.md');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Read content from --content arg or stdin
|
||||
let content = contentIdx >= 0 ? args[contentIdx + 1] : null;
|
||||
if (!content && !process.stdin.isTTY) {
|
||||
content = readFileSync('/dev/stdin', 'utf-8');
|
||||
}
|
||||
|
||||
if (!content?.trim()) {
|
||||
console.error('No content provided. Use --content "..." or pipe via stdin.');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const now = new Date();
|
||||
const pad = (n: number) => String(n).padStart(2, '0');
|
||||
const dateStr = `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())}`;
|
||||
const timeStr = `${pad(now.getHours())}${pad(now.getMinutes())}`;
|
||||
const timePretty = `${pad(now.getHours())}:${pad(now.getMinutes())}`;
|
||||
|
||||
const title = titleIdx >= 0
|
||||
? args[titleIdx + 1]
|
||||
: reportType.replace(/-/g, ' ').replace(/\b\w/g, c => c.toUpperCase());
|
||||
|
||||
const filename = `${dateStr}-${timeStr}.md`;
|
||||
const reportDir = join(brainDir, 'reports', reportType);
|
||||
mkdirSync(reportDir, { recursive: true });
|
||||
|
||||
const page = `---
|
||||
title: "${title} -- ${dateStr}"
|
||||
type: report
|
||||
report_type: ${reportType}
|
||||
date: ${dateStr}
|
||||
time: "${timePretty}"
|
||||
---
|
||||
|
||||
# ${title} -- ${dateStr} ${timePretty}
|
||||
|
||||
${content.trim()}
|
||||
`;
|
||||
|
||||
const filepath = join(reportDir, filename);
|
||||
writeFileSync(filepath, page);
|
||||
console.log(filepath);
|
||||
}
|
||||
@@ -6,9 +6,10 @@ import type { StorageBackend } from './storage.ts';
|
||||
/**
|
||||
* Universal file reader with fallback chain:
|
||||
* 1. Local file exists → return it
|
||||
* 2. .redirect breadcrumb exists → fetch from storage
|
||||
* 3. .supabase marker in parent dir → prefer storage, fall back to local
|
||||
* 4. None → throw
|
||||
* 2. .redirect.yaml pointer exists → fetch from storage (v0.9+ format)
|
||||
* 3. .redirect breadcrumb exists → fetch from storage (legacy v0.8 format)
|
||||
* 4. .supabase marker in parent dir → prefer storage, fall back to local
|
||||
* 5. None → throw
|
||||
*/
|
||||
|
||||
export interface ResolvedFile {
|
||||
@@ -16,6 +17,21 @@ export interface ResolvedFile {
|
||||
source: 'local' | 'storage' | 'redirect';
|
||||
}
|
||||
|
||||
/** v0.9+ redirect format (.redirect.yaml) — richer metadata */
|
||||
export interface RedirectYaml {
|
||||
target: string; // supabase://brain-files/{storagePath}
|
||||
bucket: string;
|
||||
storage_path: string;
|
||||
size: number;
|
||||
size_human: string;
|
||||
hash: string; // sha256:...
|
||||
mime: string;
|
||||
uploaded: string; // ISO timestamp
|
||||
source_url?: string;
|
||||
type?: string; // transcript, article, image, etc.
|
||||
}
|
||||
|
||||
/** Legacy v0.8 redirect format (.redirect) */
|
||||
export interface RedirectInfo {
|
||||
moved_to: string;
|
||||
bucket: string;
|
||||
@@ -43,16 +59,25 @@ export async function resolveFile(
|
||||
return { data: readFileSync(fullPath), source: 'local' };
|
||||
}
|
||||
|
||||
// 2. .redirect breadcrumb
|
||||
const redirectPath = fullPath + '.redirect';
|
||||
if (existsSync(redirectPath)) {
|
||||
// 2. .redirect.yaml pointer (v0.9+ format)
|
||||
const yamlRedirectPath = fullPath + '.redirect.yaml';
|
||||
if (existsSync(yamlRedirectPath)) {
|
||||
if (!storage) throw new Error(`File redirected to storage but no storage backend configured: ${filePath}`);
|
||||
const info = parseRedirect(redirectPath);
|
||||
const info = parseRedirectYaml(yamlRedirectPath);
|
||||
const data = await storage.download(info.storage_path);
|
||||
return { data, source: 'redirect' };
|
||||
}
|
||||
|
||||
// 3. Legacy .redirect breadcrumb (v0.8 format)
|
||||
const legacyRedirectPath = fullPath + '.redirect';
|
||||
if (existsSync(legacyRedirectPath)) {
|
||||
if (!storage) throw new Error(`File redirected to storage but no storage backend configured: ${filePath}`);
|
||||
const info = parseRedirect(legacyRedirectPath);
|
||||
const data = await storage.download(info.path);
|
||||
return { data, source: 'redirect' };
|
||||
}
|
||||
|
||||
// 3. .supabase marker in parent directory
|
||||
// 4. .supabase marker in parent directory
|
||||
const markerPath = join(dirname(fullPath), '.supabase');
|
||||
if (existsSync(markerPath)) {
|
||||
if (!storage) throw new Error(`Directory mirrored to storage but no storage backend configured: ${filePath}`);
|
||||
@@ -73,6 +98,13 @@ export async function resolveFile(
|
||||
throw new Error(`File not found: ${filePath}`);
|
||||
}
|
||||
|
||||
/** Parse v0.9+ .redirect.yaml pointer */
|
||||
export function parseRedirectYaml(path: string): RedirectYaml {
|
||||
const content = readFileSync(path, 'utf-8');
|
||||
return parseYaml(content) as RedirectYaml;
|
||||
}
|
||||
|
||||
/** Parse legacy v0.8 .redirect breadcrumb */
|
||||
export function parseRedirect(path: string): RedirectInfo {
|
||||
const content = readFileSync(path, 'utf-8');
|
||||
return parseYaml(content) as RedirectInfo;
|
||||
@@ -82,3 +114,11 @@ export function parseMarker(path: string): MarkerInfo {
|
||||
const content = readFileSync(path, 'utf-8');
|
||||
return parseYaml(content) as MarkerInfo;
|
||||
}
|
||||
|
||||
/** Human-readable file size */
|
||||
export function humanSize(bytes: number): string {
|
||||
if (bytes < 1024) return `${bytes} B`;
|
||||
if (bytes < 1024 * 1024) return `${Math.round(bytes / 1024)} KB`;
|
||||
if (bytes < 1024 * 1024 * 1024) return `${Math.round(bytes / (1024 * 1024))} MB`;
|
||||
return `${(bytes / (1024 * 1024 * 1024)).toFixed(1)} GB`;
|
||||
}
|
||||
|
||||
@@ -1,8 +1,17 @@
|
||||
import type { StorageBackend, StorageConfig } from '../storage.ts';
|
||||
|
||||
/** Size thresholds for upload method selection */
|
||||
const TUS_THRESHOLD = 100 * 1024 * 1024; // 100 MB — use TUS resumable above this
|
||||
const TUS_CHUNK_SIZE = 6 * 1024 * 1024; // 6 MB chunks for TUS uploads
|
||||
const SIGNED_URL_EXPIRY = 3600; // 1 hour
|
||||
|
||||
/**
|
||||
* Supabase Storage — uses the Supabase Storage REST API.
|
||||
* Auth via the service role key (not the anon key).
|
||||
*
|
||||
* Upload method auto-selected by file size:
|
||||
* < 100 MB → standard POST (single request)
|
||||
* >= 100 MB → TUS resumable upload (6 MB chunks with retry)
|
||||
*/
|
||||
export class SupabaseStorage implements StorageBackend {
|
||||
private projectUrl: string;
|
||||
@@ -30,6 +39,15 @@ export class SupabaseStorage implements StorageBackend {
|
||||
}
|
||||
|
||||
async upload(path: string, data: Buffer, mime?: string): Promise<void> {
|
||||
if (data.length >= TUS_THRESHOLD) {
|
||||
await this.uploadTus(path, data, mime);
|
||||
} else {
|
||||
await this.uploadStandard(path, data, mime);
|
||||
}
|
||||
}
|
||||
|
||||
/** Standard single-request upload for files < 100 MB */
|
||||
private async uploadStandard(path: string, data: Buffer, mime?: string): Promise<void> {
|
||||
const res = await fetch(this.url(path), {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
@@ -45,6 +63,90 @@ export class SupabaseStorage implements StorageBackend {
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* TUS resumable upload for files >= 100 MB.
|
||||
* Sends in 6 MB chunks with retry + exponential backoff.
|
||||
*/
|
||||
private async uploadTus(path: string, data: Buffer, mime?: string): Promise<void> {
|
||||
const tusUrl = `${this.projectUrl}/storage/v1/upload/resumable`;
|
||||
const objectName = `${this.bucket}/${path}`;
|
||||
|
||||
// Step 1: Create the upload session
|
||||
const createRes = await fetch(tusUrl, {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
...this.headers(),
|
||||
'Tus-Resumable': '1.0.0',
|
||||
'Upload-Length': String(data.length),
|
||||
'Upload-Metadata': [
|
||||
`bucketName ${btoa(this.bucket)}`,
|
||||
`objectName ${btoa(path)}`,
|
||||
`contentType ${btoa(mime || 'application/octet-stream')}`,
|
||||
].join(','),
|
||||
'x-upsert': 'true',
|
||||
},
|
||||
});
|
||||
|
||||
if (!createRes.ok) {
|
||||
const body = await createRes.text();
|
||||
throw new Error(`TUS create failed: ${createRes.status} ${body}`);
|
||||
}
|
||||
|
||||
const uploadUrl = createRes.headers.get('Location');
|
||||
if (!uploadUrl) throw new Error('TUS create did not return Location header');
|
||||
|
||||
// Step 2: Upload chunks
|
||||
let offset = 0;
|
||||
while (offset < data.length) {
|
||||
let attempt = 0;
|
||||
const maxAttempts = 3;
|
||||
while (attempt < maxAttempts) {
|
||||
try {
|
||||
// On retry, check server's actual offset (TUS spec requirement)
|
||||
if (attempt > 0) {
|
||||
const headRes = await fetch(uploadUrl, {
|
||||
method: 'HEAD',
|
||||
headers: { ...this.headers(), 'Tus-Resumable': '1.0.0' },
|
||||
});
|
||||
if (headRes.ok) {
|
||||
const serverOffset = headRes.headers.get('Upload-Offset');
|
||||
if (serverOffset) offset = parseInt(serverOffset, 10);
|
||||
}
|
||||
}
|
||||
|
||||
const end = Math.min(offset + TUS_CHUNK_SIZE, data.length);
|
||||
const chunk = data.subarray(offset, end);
|
||||
|
||||
const patchRes = await fetch(uploadUrl, {
|
||||
method: 'PATCH',
|
||||
headers: {
|
||||
...this.headers(),
|
||||
'Tus-Resumable': '1.0.0',
|
||||
'Upload-Offset': String(offset),
|
||||
'Content-Type': 'application/offset+octet-stream',
|
||||
'Content-Length': String(chunk.length),
|
||||
},
|
||||
body: chunk,
|
||||
});
|
||||
|
||||
if (!patchRes.ok) {
|
||||
const body = await patchRes.text();
|
||||
throw new Error(`TUS PATCH failed: ${patchRes.status} ${body}`);
|
||||
}
|
||||
|
||||
const newOffset = patchRes.headers.get('Upload-Offset');
|
||||
offset = newOffset ? parseInt(newOffset, 10) : end;
|
||||
break; // Success, move to next chunk
|
||||
} catch (err) {
|
||||
attempt++;
|
||||
if (attempt >= maxAttempts) throw err;
|
||||
// Exponential backoff: 1s, 2s, 4s
|
||||
await new Promise(r => setTimeout(r, 1000 * Math.pow(2, attempt - 1)));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async download(path: string): Promise<Buffer> {
|
||||
const res = await fetch(this.url(path), {
|
||||
headers: this.headers(),
|
||||
@@ -81,8 +183,28 @@ export class SupabaseStorage implements StorageBackend {
|
||||
return items.map(i => `${prefix}/${i.name}`);
|
||||
}
|
||||
|
||||
/** Generate a signed URL with 1-hour expiry for private bucket access */
|
||||
async getSignedUrl(path: string, expiresIn: number = SIGNED_URL_EXPIRY): Promise<string> {
|
||||
const res = await fetch(`${this.projectUrl}/storage/v1/object/sign/${this.bucket}/${path}`, {
|
||||
method: 'POST',
|
||||
headers: { ...this.headers(), 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ expiresIn }),
|
||||
});
|
||||
if (!res.ok) {
|
||||
const body = await res.text();
|
||||
throw new Error(`Supabase signed URL failed: ${res.status} ${body}`);
|
||||
}
|
||||
const result = await res.json() as { signedURL: string };
|
||||
return `${this.projectUrl}${result.signedURL}`;
|
||||
}
|
||||
|
||||
async getUrl(path: string): Promise<string> {
|
||||
// Public URL (if bucket is public) or signed URL
|
||||
return `${this.projectUrl}/storage/v1/object/public/${this.bucket}/${path}`;
|
||||
// Try signed URL first (works for private buckets)
|
||||
try {
|
||||
return await this.getSignedUrl(path);
|
||||
} catch {
|
||||
// Fall back to public URL
|
||||
return `${this.projectUrl}/storage/v1/object/public/${this.bucket}/${path}`;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
80
test/backlinks.test.ts
Normal file
80
test/backlinks.test.ts
Normal file
@@ -0,0 +1,80 @@
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import {
|
||||
extractEntityRefs,
|
||||
extractPageTitle,
|
||||
hasBacklink,
|
||||
buildBacklinkEntry,
|
||||
} from '../src/commands/backlinks.ts';
|
||||
|
||||
describe('extractEntityRefs', () => {
|
||||
test('extracts people links', () => {
|
||||
const content = 'Met [Jane Doe](../people/jane-doe.md) at the event.';
|
||||
const refs = extractEntityRefs(content, 'meetings/2026-04-01.md');
|
||||
expect(refs).toHaveLength(1);
|
||||
expect(refs[0].name).toBe('Jane Doe');
|
||||
expect(refs[0].slug).toBe('jane-doe');
|
||||
expect(refs[0].dir).toBe('people');
|
||||
});
|
||||
|
||||
test('extracts company links', () => {
|
||||
const content = 'Discussed [Acme Corp](../../companies/acme-corp.md) deal.';
|
||||
const refs = extractEntityRefs(content, 'meetings/2026/q1.md');
|
||||
expect(refs).toHaveLength(1);
|
||||
expect(refs[0].name).toBe('Acme Corp');
|
||||
expect(refs[0].slug).toBe('acme-corp');
|
||||
expect(refs[0].dir).toBe('companies');
|
||||
});
|
||||
|
||||
test('extracts multiple refs', () => {
|
||||
const content = '[Alice](../people/alice.md) and [Bob](../people/bob.md) from [Acme](../companies/acme.md).';
|
||||
const refs = extractEntityRefs(content, 'meetings/test.md');
|
||||
expect(refs).toHaveLength(3);
|
||||
});
|
||||
|
||||
test('returns empty for no entity links', () => {
|
||||
const content = 'Just a plain page with [external](https://example.com) link.';
|
||||
expect(extractEntityRefs(content, 'test.md')).toHaveLength(0);
|
||||
});
|
||||
|
||||
test('ignores non-entity brain links', () => {
|
||||
const content = '[Guide](../docs/setup.md) for reference.';
|
||||
expect(extractEntityRefs(content, 'test.md')).toHaveLength(0);
|
||||
});
|
||||
});
|
||||
|
||||
describe('extractPageTitle', () => {
|
||||
test('extracts from frontmatter', () => {
|
||||
expect(extractPageTitle('---\ntitle: "Jane Doe"\ntype: person\n---\n# Jane')).toBe('Jane Doe');
|
||||
});
|
||||
|
||||
test('extracts from H1 when no frontmatter title', () => {
|
||||
expect(extractPageTitle('---\ntype: person\n---\n# Jane Doe')).toBe('Jane Doe');
|
||||
});
|
||||
|
||||
test('extracts H1 without frontmatter', () => {
|
||||
expect(extractPageTitle('# Meeting Notes\n\nContent.')).toBe('Meeting Notes');
|
||||
});
|
||||
|
||||
test('returns Untitled for no title', () => {
|
||||
expect(extractPageTitle('Just content, no heading.')).toBe('Untitled');
|
||||
});
|
||||
});
|
||||
|
||||
describe('hasBacklink', () => {
|
||||
test('returns true when source filename is present', () => {
|
||||
const content = '## Timeline\n\n- Referenced in [Meeting](../../meetings/q1-review.md)';
|
||||
expect(hasBacklink(content, 'q1-review.md')).toBe(true);
|
||||
});
|
||||
|
||||
test('returns false when source filename is absent', () => {
|
||||
const content = '## Timeline\n\n- Some other entry';
|
||||
expect(hasBacklink(content, 'q1-review.md')).toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
describe('buildBacklinkEntry', () => {
|
||||
test('builds properly formatted entry', () => {
|
||||
const entry = buildBacklinkEntry('Q1 Review', '../../meetings/q1-review.md', '2026-04-11');
|
||||
expect(entry).toBe('- **2026-04-11** | Referenced in [Q1 Review](../../meetings/q1-review.md)');
|
||||
});
|
||||
});
|
||||
118
test/lint.test.ts
Normal file
118
test/lint.test.ts
Normal file
@@ -0,0 +1,118 @@
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { lintContent, fixContent } from '../src/commands/lint.ts';
|
||||
|
||||
describe('lintContent', () => {
|
||||
test('detects LLM preamble "Of course"', () => {
|
||||
const content = 'Of course. Here is a detailed brain page for Jane Doe.\n\n# Jane Doe\n\nContent.';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'llm-preamble')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects LLM preamble "I\'ve created"', () => {
|
||||
const content = "I've created a comprehensive brain page for the company.\n\n# Acme\n\nContent.";
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'llm-preamble')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects LLM preamble "Certainly"', () => {
|
||||
const content = 'Certainly. Here is the brain page.\n\n# Page\n\nContent.';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'llm-preamble')).toBe(true);
|
||||
});
|
||||
|
||||
test('no false positive on normal content', () => {
|
||||
const content = '---\ntitle: Test\ntype: person\ncreated: 2026-04-11\n---\n\n# Test\n\nNormal content.';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.filter(i => i.rule === 'llm-preamble')).toHaveLength(0);
|
||||
});
|
||||
|
||||
test('detects wrapping code fences', () => {
|
||||
const content = '```markdown\n---\ntitle: Test\n---\n\n# Test\n\nContent.\n```';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'code-fence-wrap')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects placeholder dates', () => {
|
||||
const content = '---\ntitle: Test\ntype: person\ncreated: YYYY-MM-DD\n---\n\n# Test';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'placeholder-date')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects XX-XX placeholder dates', () => {
|
||||
const content = '---\ntitle: Test\ntype: person\ncreated: 2026-04-11\n---\n\n# Test\n\n- 2026-XX-XX | Something happened';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'placeholder-date')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects missing frontmatter title', () => {
|
||||
const content = '---\ntype: person\ncreated: 2026-04-11\n---\n\n# Test';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'missing-title')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects missing frontmatter type', () => {
|
||||
const content = '---\ntitle: Test\ncreated: 2026-04-11\n---\n\n# Test';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'missing-type')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects no frontmatter at all', () => {
|
||||
const content = '# Test\n\nContent without frontmatter.';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'no-frontmatter')).toBe(true);
|
||||
});
|
||||
|
||||
test('detects empty sections', () => {
|
||||
const content = '---\ntitle: Test\ntype: person\ncreated: 2026-04-11\n---\n\n# Test\n\n## What They Believe\n\n[No data yet]\n\n## State\n\nReal content here.';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'empty-section' && i.message.includes('What They Believe'))).toBe(true);
|
||||
});
|
||||
|
||||
test('detects agent placeholder sections', () => {
|
||||
const content = '---\ntitle: Test\ntype: person\ncreated: 2026-04-11\n---\n\n# Test\n\n## Summary\n\n*[To be filled by agent]*\n\n## State\n\nContent.';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues.some(i => i.rule === 'empty-section' && i.message.includes('Summary'))).toBe(true);
|
||||
});
|
||||
|
||||
test('clean page has no issues', () => {
|
||||
const content = '---\ntitle: Jane Doe\ntype: person\ncreated: 2026-04-11\n---\n\n# Jane Doe\n\n## State\n\nCTO of Acme Corp.\n\n## Timeline\n\n- **2026-04-11** | Met at event [Source: User]';
|
||||
const issues = lintContent(content, 'test.md');
|
||||
expect(issues).toHaveLength(0);
|
||||
});
|
||||
});
|
||||
|
||||
describe('fixContent', () => {
|
||||
test('removes LLM preamble', () => {
|
||||
const input = 'Of course. Here is a detailed brain page for Jane.\n\n# Jane Doe\n\nContent.';
|
||||
const fixed = fixContent(input);
|
||||
expect(fixed).not.toContain('Of course');
|
||||
expect(fixed).toContain('# Jane Doe');
|
||||
expect(fixed).toContain('Content.');
|
||||
});
|
||||
|
||||
test('removes wrapping code fences', () => {
|
||||
const input = '```markdown\n# Title\n\nContent.\n```';
|
||||
const fixed = fixContent(input);
|
||||
expect(fixed).not.toContain('```');
|
||||
expect(fixed).toContain('# Title');
|
||||
});
|
||||
|
||||
test('cleans up excessive blank lines after fix', () => {
|
||||
const input = 'Of course. Here is the brain page.\n\n\n\n# Title\n\nContent.';
|
||||
const fixed = fixContent(input);
|
||||
expect(fixed).not.toMatch(/\n{3,}/);
|
||||
});
|
||||
|
||||
test('preserves content that needs no fixing', () => {
|
||||
const input = '# Normal Title\n\nNormal content.\n';
|
||||
expect(fixContent(input)).toBe(input);
|
||||
});
|
||||
|
||||
test('handles multiple preambles', () => {
|
||||
const input = 'Sure! Here is the page.\nCertainly. Here is the brain page.\n\n# Title\n\nContent.';
|
||||
const fixed = fixContent(input);
|
||||
expect(fixed).not.toContain('Sure');
|
||||
expect(fixed).not.toContain('Certainly');
|
||||
expect(fixed).toContain('# Title');
|
||||
});
|
||||
});
|
||||
217
test/publish.test.ts
Normal file
217
test/publish.test.ts
Normal file
@@ -0,0 +1,217 @@
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import {
|
||||
makeShareable,
|
||||
extractTitle,
|
||||
encryptContent,
|
||||
generatePassword,
|
||||
generateHtml,
|
||||
} from '../src/commands/publish.ts';
|
||||
|
||||
describe('makeShareable', () => {
|
||||
test('strips YAML frontmatter', () => {
|
||||
const input = '---\ntitle: Secret\ntype: person\n---\n\n# Jane Doe\n\nPublic content.';
|
||||
const result = makeShareable(input);
|
||||
expect(result).not.toContain('title: Secret');
|
||||
expect(result).not.toContain('type: person');
|
||||
expect(result).toContain('# Jane Doe');
|
||||
expect(result).toContain('Public content.');
|
||||
});
|
||||
|
||||
test('strips [Source: ...] citations', () => {
|
||||
const input = 'Jane is CTO [Source: Crustdata enrichment, 2026-04-01] of Acme.';
|
||||
expect(makeShareable(input)).toBe('Jane is CTO of Acme.');
|
||||
});
|
||||
|
||||
test('strips multi-format citations', () => {
|
||||
const input = 'Fact one [Source: User, meeting, 2026-04-01]. Fact two [Source: compiled from timeline].';
|
||||
const result = makeShareable(input);
|
||||
expect(result).not.toContain('[Source:');
|
||||
expect(result).toContain('Fact one');
|
||||
expect(result).toContain('Fact two');
|
||||
});
|
||||
|
||||
test('redacts confirmation numbers', () => {
|
||||
const input = '**Confirmation:** ABC123DEF456';
|
||||
expect(makeShareable(input)).toContain('on file');
|
||||
expect(makeShareable(input)).not.toContain('ABC123DEF456');
|
||||
});
|
||||
|
||||
test('strips brain cross-links, keeps display text', () => {
|
||||
const input = 'Works with [Jane Doe](../people/jane-doe.md) at Acme.';
|
||||
const result = makeShareable(input);
|
||||
expect(result).toBe('Works with Jane Doe at Acme.');
|
||||
expect(result).not.toContain('../people/');
|
||||
});
|
||||
|
||||
test('preserves external URLs', () => {
|
||||
const input = 'See [their blog](https://example.com/blog) for details.';
|
||||
expect(makeShareable(input)).toContain('https://example.com/blog');
|
||||
});
|
||||
|
||||
test('removes See also lines', () => {
|
||||
const input = '# Title\n\nContent.\n\n- See also: ../companies/acme.md\n\nMore content.';
|
||||
const result = makeShareable(input);
|
||||
expect(result).not.toContain('See also');
|
||||
expect(result).toContain('More content');
|
||||
});
|
||||
|
||||
test('removes Timeline section', () => {
|
||||
const input = '# Title\n\nPublic content.\n\n---\n\n## Timeline\n\n- 2026-04-01 | Secret event';
|
||||
const result = makeShareable(input);
|
||||
expect(result).toContain('Public content.');
|
||||
expect(result).not.toContain('Timeline');
|
||||
expect(result).not.toContain('Secret event');
|
||||
});
|
||||
|
||||
test('collapses excessive blank lines', () => {
|
||||
const input = '# Title\n\n\n\n\nContent.';
|
||||
expect(makeShareable(input)).toBe('# Title\n\nContent.');
|
||||
});
|
||||
|
||||
test('handles empty input', () => {
|
||||
expect(makeShareable('')).toBe('');
|
||||
});
|
||||
|
||||
test('handles frontmatter-only input', () => {
|
||||
const input = '---\ntitle: Test\n---\n';
|
||||
expect(makeShareable(input)).toBe('');
|
||||
});
|
||||
|
||||
test('strips .raw/ relative links', () => {
|
||||
const input = 'See [raw data](.raw/crustdata.json) for source.';
|
||||
const result = makeShareable(input);
|
||||
expect(result).toBe('See raw data for source.');
|
||||
});
|
||||
});
|
||||
|
||||
describe('extractTitle', () => {
|
||||
test('extracts H1 title', () => {
|
||||
expect(extractTitle('# Jane Doe\n\nContent.')).toBe('Jane Doe');
|
||||
});
|
||||
|
||||
test('extracts title with formatting', () => {
|
||||
expect(extractTitle('# **Bold** Title\n\nContent.')).toBe('**Bold** Title');
|
||||
});
|
||||
|
||||
test('returns "Document" when no H1', () => {
|
||||
expect(extractTitle('No heading here.')).toBe('Document');
|
||||
});
|
||||
|
||||
test('ignores H2 and lower', () => {
|
||||
expect(extractTitle('## Not H1\n\nContent.')).toBe('Document');
|
||||
});
|
||||
|
||||
test('picks first H1 when multiple exist', () => {
|
||||
expect(extractTitle('# First\n\n# Second')).toBe('First');
|
||||
});
|
||||
});
|
||||
|
||||
describe('encryptContent', () => {
|
||||
test('returns salt, iv, and ciphertext', () => {
|
||||
const result = encryptContent('hello world', 'password123');
|
||||
expect(result.salt).toBeTruthy();
|
||||
expect(result.iv).toBeTruthy();
|
||||
expect(result.ciphertext).toBeTruthy();
|
||||
});
|
||||
|
||||
test('produces valid base64', () => {
|
||||
const result = encryptContent('test content', 'pw');
|
||||
expect(() => Buffer.from(result.salt, 'base64')).not.toThrow();
|
||||
expect(() => Buffer.from(result.iv, 'base64')).not.toThrow();
|
||||
expect(() => Buffer.from(result.ciphertext, 'base64')).not.toThrow();
|
||||
});
|
||||
|
||||
test('different passwords produce different ciphertext', () => {
|
||||
const a = encryptContent('same text', 'password1');
|
||||
const b = encryptContent('same text', 'password2');
|
||||
expect(a.ciphertext).not.toBe(b.ciphertext);
|
||||
});
|
||||
|
||||
test('same password produces different output (random salt/iv)', () => {
|
||||
const a = encryptContent('same text', 'same password');
|
||||
const b = encryptContent('same text', 'same password');
|
||||
expect(a.salt).not.toBe(b.salt);
|
||||
expect(a.iv).not.toBe(b.iv);
|
||||
});
|
||||
|
||||
test('handles unicode content', () => {
|
||||
const result = encryptContent('Hello -- arrows -> and quotes "test"', 'pw');
|
||||
expect(result.ciphertext).toBeTruthy();
|
||||
});
|
||||
|
||||
test('handles empty string', () => {
|
||||
const result = encryptContent('', 'pw');
|
||||
expect(result.ciphertext).toBeTruthy();
|
||||
});
|
||||
});
|
||||
|
||||
describe('generatePassword', () => {
|
||||
test('default length is 16', () => {
|
||||
expect(generatePassword()).toHaveLength(16);
|
||||
});
|
||||
|
||||
test('custom length', () => {
|
||||
expect(generatePassword(8)).toHaveLength(8);
|
||||
expect(generatePassword(32)).toHaveLength(32);
|
||||
});
|
||||
|
||||
test('excludes ambiguous characters', () => {
|
||||
// No 0, O, l, 1, I (all excluded from the charset)
|
||||
for (let i = 0; i < 50; i++) {
|
||||
const pw = generatePassword(32);
|
||||
expect(pw).not.toMatch(/[0OlI1]/);
|
||||
}
|
||||
});
|
||||
|
||||
test('generates unique passwords', () => {
|
||||
const passwords = new Set(Array.from({ length: 20 }, () => generatePassword()));
|
||||
expect(passwords.size).toBe(20);
|
||||
});
|
||||
});
|
||||
|
||||
describe('generateHtml', () => {
|
||||
test('generates valid HTML with title', () => {
|
||||
const html = generateHtml({ title: 'Test Page', markdown: '# Hello\n\nWorld.' });
|
||||
expect(html).toContain('<!DOCTYPE html>');
|
||||
expect(html).toContain('<title>Test Page</title>');
|
||||
expect(html).toContain('marked.parse');
|
||||
});
|
||||
|
||||
test('includes markdown content as JSON', () => {
|
||||
const html = generateHtml({ title: 'T', markdown: '# Test Content' });
|
||||
expect(html).toContain('# Test Content');
|
||||
});
|
||||
|
||||
test('escapes HTML in title', () => {
|
||||
const html = generateHtml({ title: '<script>alert("xss")</script>', markdown: 'x' });
|
||||
expect(html).not.toContain('<script>alert("xss")</script>');
|
||||
expect(html).toContain('<script>');
|
||||
});
|
||||
|
||||
test('includes password UI when encrypted', () => {
|
||||
const encrypted = encryptContent('secret', 'pw');
|
||||
const html = generateHtml({ title: 'T', markdown: 'x', encrypted });
|
||||
expect(html).toContain('pw-overlay');
|
||||
expect(html).toContain('pw-form');
|
||||
expect(html).toContain('Enter password');
|
||||
expect(html).toContain('window.__SALT');
|
||||
expect(html).toContain('window.__IV');
|
||||
expect(html).toContain('window.__CT');
|
||||
});
|
||||
|
||||
test('no password UI when unencrypted', () => {
|
||||
const html = generateHtml({ title: 'T', markdown: 'x' });
|
||||
expect(html).not.toContain('pw-overlay');
|
||||
expect(html).not.toContain('window.__SALT');
|
||||
});
|
||||
|
||||
test('includes dark mode CSS', () => {
|
||||
const html = generateHtml({ title: 'T', markdown: 'x' });
|
||||
expect(html).toContain('prefers-color-scheme: dark');
|
||||
});
|
||||
|
||||
test('includes marked.js CDN', () => {
|
||||
const html = generateHtml({ title: 'T', markdown: 'x' });
|
||||
expect(html).toContain('cdn.jsdelivr.net/npm/marked');
|
||||
});
|
||||
});
|
||||
50
test/report.test.ts
Normal file
50
test/report.test.ts
Normal file
@@ -0,0 +1,50 @@
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { mkdirSync, readFileSync, existsSync, rmSync } from 'fs';
|
||||
import { join } from 'path';
|
||||
import { tmpdir } from 'os';
|
||||
|
||||
// Test the report command's output format by importing the logic
|
||||
// Since runReport reads from stdin/args and writes to disk, we test
|
||||
// the file creation pattern directly.
|
||||
|
||||
describe('report output format', () => {
|
||||
const testDir = join(tmpdir(), `gbrain-report-test-${Date.now()}`);
|
||||
|
||||
test('creates report directory structure', () => {
|
||||
const reportDir = join(testDir, 'reports', 'test-type');
|
||||
mkdirSync(reportDir, { recursive: true });
|
||||
expect(existsSync(reportDir)).toBe(true);
|
||||
rmSync(testDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
test('report filename format is YYYY-MM-DD-HHMM.md', () => {
|
||||
const now = new Date();
|
||||
const pad = (n: number) => String(n).padStart(2, '0');
|
||||
const filename = `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())}-${pad(now.getHours())}${pad(now.getMinutes())}.md`;
|
||||
expect(filename).toMatch(/^\d{4}-\d{2}-\d{2}-\d{4}\.md$/);
|
||||
});
|
||||
|
||||
test('report page has correct frontmatter structure', () => {
|
||||
const title = 'Enrichment Sweep';
|
||||
const reportType = 'enrichment-sweep';
|
||||
const date = '2026-04-11';
|
||||
const time = '14:30';
|
||||
|
||||
const page = `---
|
||||
title: "${title} -- ${date}"
|
||||
type: report
|
||||
report_type: ${reportType}
|
||||
date: ${date}
|
||||
time: "${time}"
|
||||
---
|
||||
|
||||
# ${title} -- ${date} ${time}
|
||||
|
||||
Report content here.
|
||||
`;
|
||||
|
||||
expect(page).toContain('type: report');
|
||||
expect(page).toContain('report_type: enrichment-sweep');
|
||||
expect(page).toContain('# Enrichment Sweep');
|
||||
});
|
||||
});
|
||||
Reference in New Issue
Block a user