# Ingest Skill

Ingest meetings, articles, media, documents, and conversations into the brain.

> **Filing rule:** Read `skills/_brain-filing-rules.md` before creating any new page.

## Iron Law: Back-Linking (MANDATORY)

Every mention of a person or company with a brain page MUST create a back-link FROM that entity's page TO the page mentioning them. An unlinked mention is a broken brain. See `skills/_brain-filing-rules.md` for format.

## Citation Requirements (MANDATORY)

Every fact written to a brain page must carry an inline `[Source: ...]` citation.

- **User's statements:** `[Source: User, {context}, YYYY-MM-DD]`
- **Meeting data:** `[Source: Meeting "{title}", YYYY-MM-DD]`
- **Email/message:** `[Source: email from {name} re: {subject}, YYYY-MM-DD]`
- **Web content:** `[Source: {publication}, {URL}, YYYY-MM-DD]`
- **Social media:** `[Source: X/@handle, YYYY-MM-DD](URL)` (include link)
- **Synthesis:** `[Source: compiled from {sources}]`

## Workflow

1. **Parse the source.** Extract people, companies, dates, and events from the input.
2. **For each entity mentioned:**
   - Read the entity's page from gbrain to check whether it exists.
   - If it exists: update `compiled_truth` (rewrite the State section with the new info; don't append).
   - If it is new: check the notability gate, then store the page in gbrain with the appropriate type and slug.
3. **Append to timeline.** Add a timeline entry in gbrain for each event, with date, summary, and source citation.
4. **Create cross-reference links.** Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
5. **Back-link all entities.** Update EVERY mentioned entity's page with a back-link to this page (Iron Law).
6. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.

## Entity Detection on Every Message

Production agents should detect entity mentions on EVERY inbound message.
This is the signal detection loop that makes the brain compound over time.

### Protocol

1. **Scan the message** for entity mentions: people, companies, concepts, original thinking. Fire on every message; skip only messages that are purely operational.
2. **For each entity detected:**
   - `gbrain search "name"` -- does a page already exist?
   - **If yes:** load context with `gbrain get `. Use the compiled truth to inform your response. Update the page if the message contains new information.
   - **If no:** assess notability (see `skills/_brain-filing-rules.md`). If the entity is worth tracking, create a new page with `gbrain put ` and populate it with what you know.
3. **After creating or updating pages:** sync to gbrain:

   ```bash
   gbrain sync --no-pull --no-embed
   ```

4. **Don't block the conversation.** Entity detection and enrichment should happen alongside the response, not before it. The user shouldn't have to wait for brain writes to get an answer.

### What counts as notable

- People the user interacts with or discusses (not random mentions)
- Companies relevant to the user's work or interests
- Concepts or frameworks the user references or creates
- The user's own original thinking (ideas, theses, observations) -- highest value
- See `skills/_brain-filing-rules.md` for the full notability gate

### What to capture from the user's own thinking

Original thinking is the most valuable signal. Capture exact phrasing -- the user's language IS the insight. Don't paraphrase.

- Novel observations or theses
- Frameworks, mental models, heuristics
- Connections between ideas that others miss
- Contrarian positions with reasoning
- Strong reactions to external stimuli (what triggered it and why)

## Media Workflows

Content the user encounters should be captured in the brain. File by PRIMARY SUBJECT, not by format (see `skills/_brain-filing-rules.md`).

### Articles & Web Content

**Input:** URL shared by the user, or an article mentioned in conversation.

**Process:**

1. Fetch content (`web_fetch` or equivalent)
2. Extract: title, author, publication, date, full text
3. Summarize: executive summary + key arguments (not a rehash)
4. Extract entities: people, companies, concepts mentioned
5. **Save raw source** for provenance (see Raw Source Preservation below)
6. Analyze for the user: don't just summarize. What's interesting given what you know about them? Flag connections, contradictions, content opportunities.

**Write to:** appropriate directory per filing rules (about a person -> `people/`, about a company -> `companies/`, reusable framework -> `concepts/`, raw data -> `sources/`)

### Videos & Podcasts

**Input:** URL (YouTube, podcast, etc.) or local audio/video file.

**Process:**

1. Get transcript -- speaker-diarized if possible (services like Diarize.io provide speaker-labeled, word-level timing)
2. **Save raw transcript** (both JSON and human-readable TXT)
3. Analyze: executive summary, key ideas, key quotes with speaker attribution, notable stories/anecdotes, people and companies mentioned
4. Extract and cross-reference all entities mentioned
5. **HARD RULE:** every video/podcast brain page MUST link to the raw diarized transcript. A page without transcript links is incomplete.

**Write to:** `media/videos/` or `media/podcasts/` with back-links to all entities.

**Quality bar:**

- Compelling headline (not "This video discusses...")
- Executive summary that makes you want to watch/listen
- Key Ideas as actual insights, not topic labels
- Verbatim quotes with real speaker names (not "speaker_0")
- All entities extracted with context and back-linked

### PDFs & Documents

**Input:** File path or URL.

**Process:**

1. Extract text (OCR if scanned/image PDF)
2. **Save raw source** for provenance
3. Summarize: executive summary + key sections + notable data
4. Extract entities
5. Cross-reference from entity pages

**Write to:** per filing rules (file by primary subject, not format).

### Screenshots & Images

**Input:** Image file.

**Process:**

1. Analyze content (OCR for text-heavy images, description for photos)
2. If tweet screenshot: extract text, author, date, route to social media workflow
3. If article screenshot: extract text, route to article workflow
4. If data/chart: extract data points, describe findings

**Write to:** depends on content -- route to the appropriate workflow above.

### Meeting Transcripts

**Input:** Transcript from meeting recording service, or manual notes.

**Process:**

1. Pull full transcript (source of truth -- AI summaries are medium-low trust)
2. **Save raw transcript** for provenance
3. Write meeting page with YOUR analysis above the line, raw transcript below
4. **Entity propagation (MANDATORY):** for each attendee and company discussed:
   - Update their brain page State section if new info surfaced
   - Append to their Timeline with link to the meeting page
   - Create page if person/company is notable and has no page yet
5. A meeting is NOT fully ingested until all entity pages are updated

**Write to:** `meetings/YYYY-MM-DD-short-description.md`

**What makes a good meeting page:**

- Reveals the real crux, not a bullet dump
- Connects to existing brain pages (people, companies, deals)
- Flags what changed (status, decisions, new info)
- Names tension or what was left unsaid
- Captures actual dynamic, not performative summary

### Social Media Content

**Input:** Tweet, thread, or social media post.

**Process:**

1. Fetch full content (thread, quote tweets, context)
2. If images present: OCR via vision model for full text extraction
3. Summarize: what's being said, why it matters, who's involved
4. Extract entities and update brain pages
5. Include direct link to the original post (MANDATORY for citations)

**Write to:** `media/x/` for daily aggregation, or entity-specific directories if the post is primarily about a person/company.

## Raw Source Preservation

Every ingested item must have its raw source preserved for provenance.
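
The size-routing decision that `gbrain files upload-raw` makes can be sketched as a small function. Text and PDFs under 100 MB stay in git, while anything at or above 100 MB, and any media file, goes to cloud storage. This is a minimal illustration, not gbrain's implementation. It makes three assumptions: "100 MB" means binary megabytes (consistent with the pointer example, where `size: 524288000` is shown as `size_human: 500 MB`), media is recognized by MIME type prefix, and other non-media types under the limit also stay in git. The names `route_raw_file` and `human_size` are hypothetical.

```python
# Hypothetical sketch of raw-file size routing; not gbrain's actual code.

# Assumption: "100 MB" is binary megabytes, matching the pointer example
# where size 524288000 is rendered as "500 MB".
SIZE_LIMIT_BYTES = 100 * 1024 * 1024

# Assumption: media files are recognized by MIME type prefix.
MEDIA_PREFIXES = ("video/", "audio/", "image/")


def route_raw_file(size_bytes: int, mime: str) -> str:
    """Return "cloud" for media or files at/above the limit, else "git"."""
    if mime.startswith(MEDIA_PREFIXES) or size_bytes >= SIZE_LIMIT_BYTES:
        return "cloud"  # upload to cloud storage, leave a .redirect.yaml pointer
    return "git"  # keep in the page's .raw/ sidecar directory


def human_size(size_bytes: int) -> str:
    """Format a byte count like the pointer's size_human field (1024-based)."""
    value = float(size_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1024:
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} TB"


print(route_raw_file(2_000_000, "application/pdf"))  # git
print(route_raw_file(524_288_000, "video/mp4"))      # cloud
print(human_size(524_288_000))                       # 500 MB
```

Routing on MIME type first means a tiny screenshot still goes to cloud storage, matching the rule that media never lives in git regardless of size.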

**Use `gbrain files upload-raw` for automatic size routing:**

```bash
gbrain files upload-raw --page --type 
```

- **< 100 MB text/PDF**: stays in git (brain repo `.raw/` sidecar directories)
- **>= 100 MB OR media** (video, audio, images): uploaded to cloud storage via TUS resumable upload, with a `.redirect.yaml` pointer left in the brain repo

The `.redirect.yaml` pointer format:

```yaml
target: supabase://brain-files/page-slug/filename.mp4
bucket: brain-files
storage_path: page-slug/filename.mp4
size: 524288000
size_human: 500 MB
hash: sha256:abc123...
mime: video/mp4
uploaded: 2026-04-11T...
type: transcript
```

**Accessing stored files:**

- `gbrain files signed-url ` -- generate a 1-hour signed URL for viewing/sharing
- `gbrain files restore ` -- download the file from cloud storage back to the local machine

Use `put_raw_data` in gbrain to store raw API responses and metadata (JSON, not binary).

## Test Before Bulk

When processing multiple items (batch video ingestion, bulk meeting processing, etc.):

1. **Test on 3-5 items first.** Run in test mode if available.
2. **Read the actual output.** Is the quality good? Are titles compelling (not "This video discusses...")? Are entities extracted and back-linked? Is the format clean?
3. **Fix what's wrong** in the approach/skill, not via one-off patches.
4. **Only then: bulk execute** with throttling, committing every 5-10 items.

The marginal cost of testing 3 items first is near zero. The cost of cleaning up 100 bad pages is enormous.

## Quality Rules

- The executive summary in `compiled_truth` must be updated; appending to the timeline alone is not enough
- The State section is REWRITTEN, not appended to. It holds the current best understanding only.
- Timeline entries are reverse-chronological (newest first)
- Every person/company mentioned gets a page if notable (see filing rules)
- Link types: knows, works_at, invested_in, founded, met_at, discussed
- Source attribution: every timeline entry includes a `[Source: ...]` citation
- Back-links: every entity mention creates a back-link (Iron Law)
- Filing: file by primary subject, not format or source (see filing rules)

## Tools Used

- Read a page from gbrain (`get_page`)
- Store/update a page in gbrain (`put_page`)
- Add a timeline entry in gbrain (`add_timeline_entry`)
- Link entities in gbrain (`add_link`)
- List tags for a page (`get_tags`)
- Tag a page in gbrain (`add_tag`)
- Store raw data in gbrain (`put_raw_data`)
- Check backlinks in gbrain (`get_backlinks`)
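
The sketch below is a minimal in-memory illustration of how the tools above combine in Workflow steps 3 to 6. Everything in it is hypothetical. `InMemoryBrain` only borrows tool names from the list, and its method signatures, the `Page` fields, `add_backlink`, and `ingest_event` are assumptions for illustration, not gbrain's API. The sketch shows four behaviors:

- the timeline merge, where one event lands on every mentioned entity's timeline, newest first
- the Iron Law back-link from each entity page to the page that mentions it
- a `[Source: ...]` citation on every timeline entry
- one link per co-mentioned entity pair, restricted to the Quality Rules link types

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional

# Link types listed under Quality Rules.
LINK_TYPES = {"knows", "works_at", "invested_in", "founded", "met_at", "discussed"}


@dataclass
class Page:
    slug: str
    compiled_truth: str = ""
    timeline: list = field(default_factory=list)  # (date, summary, citation) tuples
    backlinks: set = field(default_factory=set)   # slugs of pages that mention this one


class InMemoryBrain:
    """Hypothetical stand-in for gbrain: names echo the tool list, signatures are assumed."""

    def __init__(self) -> None:
        self.pages: dict[str, Page] = {}
        self.links: set[tuple[str, str, str]] = set()

    def get_page(self, slug: str) -> Optional[Page]:
        return self.pages.get(slug)

    def put_page(self, slug: str, compiled_truth: str) -> Page:
        page = self.pages.setdefault(slug, Page(slug))
        page.compiled_truth = compiled_truth  # rewritten, never appended
        return page

    def add_timeline_entry(self, slug: str, date: str, summary: str, source: str) -> None:
        page = self.pages[slug]
        page.timeline.append((date, summary, f"[Source: {source}]"))
        # ISO dates (YYYY-MM-DD) sort correctly as strings; newest first.
        page.timeline.sort(key=lambda entry: entry[0], reverse=True)

    def add_link(self, from_slug: str, to_slug: str, link_type: str) -> None:
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        self.links.add((from_slug, to_slug, link_type))

    def add_backlink(self, entity_slug: str, mentioning_slug: str) -> None:
        self.pages[entity_slug].backlinks.add(mentioning_slug)

    def get_backlinks(self, slug: str) -> list[str]:
        return sorted(self.pages[slug].backlinks)


def ingest_event(brain, source_page, entities, date, summary, source, link_type="discussed"):
    """Apply Workflow steps 3-6 to one event that mentions several entities."""
    for slug in entities:
        if brain.get_page(slug) is None:
            brain.put_page(slug, compiled_truth="")  # assumes the notability gate passed
        brain.add_timeline_entry(slug, date, summary, source)  # timeline merge
        brain.add_backlink(slug, source_page)                   # Iron Law back-link
    for a, b in combinations(entities, 2):                    # every co-mentioned pair
        brain.add_link(a, b, link_type)


brain = InMemoryBrain()
ingest_event(
    brain,
    "meetings/2026-04-10-acme-intro",
    ["people/alice", "people/bob", "companies/acme-corp"],
    "2026-04-10",
    "Alice met Bob at Acme Corp",
    'Meeting "Acme intro", 2026-04-10',
)
print(brain.get_backlinks("people/bob"))  # ['meetings/2026-04-10-acme-intro']
print(len(brain.links))                   # 3
```

In this sketch, three co-mentioned entities produce three pairwise links, and every entity gets the same timeline entry. That is the "Alice, Bob, and Acme Corp" case from the Workflow.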