feat: GBrain v0.7.0 — Integration Recipes + SKILLPACK Breakout (#39)
* docs: break SKILLPACK into 17 individual guides

The 1,281-line SKILLPACK monolith is now 17 individually linkable guides in docs/guides/, organized by category: core patterns, data pipelines, operations, search, and administration.

GBRAIN_SKILLPACK.md becomes a structured index with categorized tables linking to each guide. The URL stays stable for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add integration guides, architecture docs, and ethos

New documentation directories:
- docs/integrations/: "Getting Data In" landing page, credential gateway, meeting webhooks. Includes recipe format documentation.
- docs/architecture/: Infrastructure layer doc (import, chunk, embed, search)
- docs/ethos/: "Thin Harness, Fat Skills" essay with agent decision guide
- docs/designs/: "Homebrew for Personal AI" 10-star vision document

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add gbrain integrations command + voice-to-brain recipe

New CLI command: gbrain integrations (list/show/status/doctor/stats/test)
- Standalone command, no database connection needed
- Uses gray-matter directly for recipe parsing (not parseMarkdown)
- --json flag on every subcommand for agent-parseable output
- Bare command shows senses/reflexes dashboard
- Health heartbeat via ~/.gbrain/integrations/<id>/heartbeat.jsonl

First recipe: recipes/twilio-voice-brain.md
- Phone calls create brain pages via Twilio + OpenAI Realtime
- Opinionated defaults: caller screening, brain-first lookup, quiet hours
- Outbound call smoke test (GBrain calls the user to prove it works)
- Validate-as-you-go credential testing
- Twilio signature validation for webhook security

Migration file for v0.7.0 with agent-readable changelog.
13 unit tests covering parseRecipe, CLI routing, and recipe validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Getting Data In to README, update CLAUDE.md and manifest

README: voice calls in intro bullet list, new "Getting Data In" section with integration table (voice, email, X, calendar) and recipe philosophy.

CLAUDE.md: reference new files (integrations.ts, recipes/, docs/guides/, docs/integrations/, docs/architecture/, docs/ethos/).

manifest.json: bump to v0.7.0, add recipes_dir field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: v0.7.0 CHANGELOG, TODOS, VERSION bump

CHANGELOG: v0.7.0 entry covering integration recipes, voice-to-brain, gbrain integrations command, SKILLPACK breakout, and new documentation.

TODOS: 3 new items from CEO/DX reviews (constrained health_check DSL, community recipe submission, always-on deployment recipes).

VERSION + package.json: bump to 0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite voice recipe with agent instructions and verified links

Major improvements to recipes/twilio-voice-brain.md:
- Agent preamble: explains WHY sequential execution matters (each step depends on the previous), defines 4 stop points where the agent MUST pause and verify, tells agent to never say "something went wrong" but instead explain the exact error and fix
- User actions are now specific: exact URLs for every credential (Twilio console, OpenAI API keys page, ngrok dashboard), what buttons to click, what fields to copy, common failure modes
- All URLs verified via web search against current 2026 documentation: Twilio SID/token at twilio.com/console, OpenAI keys at platform.openai.com/api-keys, ngrok token at dashboard.ngrok.com/get-started/your-authtoken
- Cost estimate corrected: OpenAI Realtime is $0.06/min input + $0.24/min output (was understated), total ~$20-22/mo for 100 min
- Validate-as-you-go: each credential tested immediately with exact curl commands, failure messages explain what went wrong and how to fix
- Smoke test flow: tells user exactly what to say, verifies ALL three outputs (messaging notification + brain page + search result)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add "Homebrew for Personal AI" essay (markdown is code)

New essay at docs/ethos/MARKDOWN_SKILLS_AS_RECIPES.md, the distribution corollary to "Thin Harness, Fat Skills." Argues that markdown skill files are simultaneously documentation, specification, package, and source code. The agent is the package manager. The git repo is the app store.

Referenced from SKILLPACK index and CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite agent instructions as command language, promote skills

The OpenClaw/Hermes install block is now a drill sergeant, not a tour guide. Every step is an imperative command with exact verification criteria and explicit stop-on-failure behavior. No FYI, no suggestions, just rails.

Key changes:
- 11-step setup with STOP points after each step
- Exact user instructions for Supabase connection string (what to click, what NOT to give the agent, what the string looks like)
- "Verify: run X. You must see Y. If not: Z" after every step
- Skills table now links to both skill files AND guide docs
- Integration recipes table simplified (no "coming soon" placeholders)
- Docs section reorganized: for agents / for humans / reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 4 codex findings + add email-to-brain recipe

Codex review found 4 issues, all fixed:
1. getStatus() returned "configured" if ANY secret was set (e.g. just OPENAI_API_KEY). Now requires ALL required secrets before marking configured. Prevents false "configured" status and spurious doctor runs.
2. Twilio health check hit unauthenticated endpoint (always 401). Now uses authenticated curl with SID:token, matching the setup validation.
3. README anchor docs/GBRAIN_SKILLPACK.md#the-dream-cycle broken after SKILLPACK rewrite. Updated to point to docs/guides/cron-schedule.md.
4. Compiled binary can't find recipes/ via import.meta.dir. Added GBRAIN_RECIPES_DIR env var override + global bun install path fallback.

Also adds recipes/email-to-brain.md: Gmail deterministic collector pattern with ClawVisor credential gateway, validate-as-you-go, agent instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add email, X, calendar, and meeting sync recipes

Four new integration recipes extracted from production wintermute patterns:
- recipes/email-to-brain.md: Gmail via ClawVisor, deterministic collector pattern (code pulls emails with baked-in links, agent does judgment), noise filtering, signature detection, digest generation
- recipes/x-to-brain.md: X API v2, timeline + mentions + keyword search, deletion detection (diffs previous run, verifies 404), engagement velocity tracking, rate limit awareness
- recipes/calendar-to-brain.md: Google Calendar via ClawVisor, historical backfill (years of data), daily markdown files with attendees + locations, attendee enrichment for brain pages
- recipes/meeting-sync.md: Circleback API, transcript import with speaker labels, attendee detection + filtering, entity propagation to people/company pages, action item extraction, idempotent by source_id

All recipes follow the same format: agent preamble with sequential execution rules, validate-as-you-go credentials, exact URLs for API key setup, stop-on-failure verification, and heartbeat logging.

Updated README, SKILLPACK index, and integrations landing page with all 5 recipes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Google OAuth as alternative to ClawVisor in email + calendar recipes

Both recipes now offer two auth options:
- Option A: ClawVisor (recommended, handles OAuth + token refresh)
- Option B: Google OAuth2 directly (no extra service, you manage tokens)

Option B includes step-by-step instructions for Google Cloud Console: exact URLs, which buttons to click, which scopes to add, how to enable the API, and the OAuth flow for token exchange.

This removes ClawVisor as a hard dependency for getting started.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add implementation guides with pseudocode and test suggestions

Every recipe now includes an "Implementation Guide" section with:
- Production-tested pseudocode the agent can follow to build each collector
- Edge cases and failure modes discovered in real deployment
- Non-obvious implementation details (why the 48h staleness heuristic, why Gmail links need authuser, why SSE responses need double-parsing)
- Test suggestions: what the agent should verify after setup

email-to-brain: noise filtering algorithm, signature detection patterns, Gmail link generation (authuser is critical), sent-mail dedup

x-to-brain: deletion detection with 3 heuristics (7-day, 48h staleness, API verification), engagement velocity thresholds (50 min for 2x, 100 absolute jump), atomic writes, stdout contract, rate limit handling

calendar-to-brain: smart chunking (monthly for sparse years, weekly for dense), attendee filtering (rooms, groups, distros), merge-with-existing (only replace ## Calendar section), date/time parsing edge cases

meeting-sync: SSE double-JSON parsing, idempotency double-check (grep + filename), auto-tagging from meeting names, git commit after sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: 6 new guides from production patterns (wintermute extraction)

New guides extracted and generalized from production deployment:
- repo-architecture.md: Two-repo pattern (agent behavior vs world knowledge). Strict boundary rules, decision tree, hard rule: never write knowledge to the agent repo.
- sub-agent-routing.md: Model routing table by task type. Signal detector pattern (spawn Sonnet on every message). Research pipeline pattern (Opus plans, DeepSeek executes, Opus synthesizes). Cost optimization.
- skill-development.md: 5-step cycle (concept, prototype, evaluate, codify, cron). MECE discipline (no overlapping skills). Quality bar checklist. "If you ask twice, it should already be a skill."
- idea-capture.md: Originality distribution rating (0-100 across 4 populations). Depth test ("could someone unfamiliar understand WHY?"). Deep cross-linking mandate. Notability filtering.
- quiet-hours.md: Hold notifications 11pm-8am local time. Held messages directory pattern. Timezone-aware delivery. Morning briefing pickup.
- diligence-ingestion.md: 9-step pipeline for data room materials. Detection patterns (PDF filenames, spreadsheet tabs, user language). Index.md template with bull/bear case. Company page enrichment.

All PII scrubbed. Patterns generalized for any user. SKILLPACK index updated with 6 new entries. CLAUDE.md references added. All 37 SKILLPACK links verified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: upgrade all guides to operational playbooks with pseudocode

Every guide now follows the playbook structure:
- Goal: one sentence, what this achieves
- What the User Gets: without this / with this
- Implementation: pseudocode with actual gbrain commands
- Tricky Spots: production-tested gotchas
- How to Verify: test steps the agent runs after setup

Guides upgraded (15 files):
- brain-agent-loop: on_message() loop with read/write/sync pseudocode
- brain-first-lookup: 4-step lookup cascade with exact commands
- brain-vs-memory: routing algorithm for 3 knowledge layers
- compiled-truth: page structure + rewrite vs append rules
- content-media: 3 ingest patterns (YouTube, social, PDFs)
- cron-schedule: full schedule table + dream cycle pseudocode
- enrichment-pipeline: 7-step protocol with tier classification
- entity-detection: spawn pattern + detection prompt + notability filter
- executive-assistant: 3 workflow algorithms (triage, prep, post-inbox)
- meeting-ingestion: 6-step transcript-to-brain flow
- operational-disciplines: 5 executable discipline blocks
- originals-folder: detection + exact-phrasing capture + cross-linking
- search-modes: decision tree for keyword vs hybrid vs direct
- source-attribution: citation format + hierarchy + conflict resolution
- Plus Goal/What User Gets headers on 6 newer guides

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add WebRTC to voice recipe + ngrok Hobby setup guide

Voice recipe updates:
- Added WebRTC endpoint (POST /session, GET /call, POST /tool) for browser-based calling with RNNoise noise suppression
- WebRTC pseudocode with the 4 non-obvious gotchas from production (voice under audio.output.voice, no turn_detection, no session.update on connect, trigger greeting via data channel)
- Recommend ngrok Hobby ($8/mo) for fixed domain instead of free tier
- Fixed domain means URLs never change, Twilio never breaks

New guide: docs/mcp/NGROK_SETUP.md
- How to set up ngrok Hobby for both MCP and voice agent
- Fixed domain setup, watchdog pattern, AI client configuration
- Claude Desktop requires Settings > Integrations (not JSON config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add dependency graph + ngrok-tunnel + credential-gateway recipes

Recipes now have real dependencies via the `requires` field:
- voice-to-brain requires ngrok-tunnel (needs public URL for Twilio)
- email-to-brain requires credential-gateway (needs Gmail access)
- calendar-to-brain requires credential-gateway (needs Calendar access)
- x-to-brain and meeting-sync are standalone (direct API keys)

Two new infrastructure recipes:
- ngrok-tunnel: fixed public URL for MCP + voice. Recommends Hobby ($8/mo) for a domain that never changes. Includes watchdog pattern.
- credential-gateway: secure Google service access via ClawVisor (recommended) or direct OAuth2. One setup, all Google recipes use it.

Moved ngrok from docs/mcp/ to recipes/ because it's shared infrastructure, not MCP-specific. README and integrations landing page show dependency chains. When agent installs voice-to-brain, it sets up ngrok-tunnel first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add infra category, fix dashboard alignment, show dependencies

DX audit found two bugs in gbrain integrations dashboard:
1. Column alignment broken: IDs > 18 chars ran into descriptions with no space. Fixed: pad to 22 chars.
2. ngrok-tunnel and credential-gateway showed as SENSES but they're infrastructure. Added 'infra' category. Dashboard now shows three sections: INFRASTRUCTURE (set up first), SENSES, REFLEXES.
3. Dependencies now shown inline: "AVAILABLE (needs credential-gateway)"

Also added 'requires' field to JSON output for agent consumption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add frontier model requirement disclaimer to README

GBrain's markdown-is-code approach requires models capable of interpreting intent and implementing from architecture descriptions. Tested with Claude Opus 4.6 and GPT-5.4 Thinking. Smaller models will struggle with the recipe format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add PGLite → Supabase upgrade path to README

Clarify the database progression: start with PGLite (Postgres as WASM, zero infrastructure, pgvector built in, nothing to install). Graduate to Supabase or self-hosted Postgres when you need connection pooling, concurrency, and remote MCP access from Claude Desktop, Cowork, ChatGPT, Perplexity Computer, or any MCP-compatible agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: revert PGLite mention (coming in next branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: make all 23 guides consistent (Goal/Impl/Tricky/Verify)

Every guide now has exactly these sections in this order:
- ## Goal (one sentence)
- ## What the User Gets (without this / with this)
- ## Implementation (pseudocode with gbrain commands)
- ## Tricky Spots (3-5 numbered gotchas)
- ## How to Verify (3-5 numbered test steps)

11 guides restructured from non-standard headings:
- deterministic-collectors, live-sync, upgrades-auto-update (full rewrites)
- entity-detection, diligence-ingestion, idea-capture, quiet-hours, repo-architecture, skill-development, sub-agent-routing (restructured)

23/23 guides now pass consistency audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: restructure README around the #1 blocker (getting data in)

The README was leading with Postgres and database architecture. Most users are stuck at step zero: "I have an agent but it doesn't know anything about my life."

New structure:
1. The Problem: your agent doesn't know your life
2. Getting Data In: integration recipes, front and center
3. The Compounding Thesis: why this matters
4. How this happened: credibility, origin story
5. When you need Postgres: scale, not starting point

Postgres is de-emphasized from a full section to two paragraphs: "You don't need Postgres to start" and "When you need Postgres" (1,000+ files, remote MCP access, multiple AI clients).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move Install to top of README, remove duplicate section

Install now appears right after Getting Data In (line 38), not buried at line 295. The user sees: Problem → Getting Data In → Install.

Removed the duplicate Install section (262 lines) that was lower in the README. The agent instructions block, CLI quickstart, and all content is now in the single Install section near the top.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move agent install block to first thing in README

"Start here: paste this into your agent" is now the first section, right after the one-line pitch. No scrolling, no context, no preamble. User opens the README, sees the paste block, copies it into OpenClaw or Hermes, and the agent takes over.

Flow: pitch → paste block → Getting Data In → Compounding Thesis → origin story

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: compress install block from 11 steps to 5

The agent install block was 102 lines and 11 steps. Now it's 40 lines and 5 steps. Same coverage, half the text.

Changes:
- Merged "prove keyword search" + "embed" + "prove hybrid search" into one SEARCH step (the user doesn't care about the intermediate)
- Merged skillpack, sync, auto-update, integrations, verification into one GO LIVE step with sub-items (post-install polish, not install)
- Shortened database instructions (one line instead of 5 sub-steps)
- Removed redundant preamble ("YOU MUST COMPLETE EVERY STEP" is now just "Do not skip steps. Verify each step.")

The 5 steps: INSTALL → DATABASE → IMPORT → SEARCH → GO LIVE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: gitignore all .env files, not just specific ones

CSO audit found .gitignore covered .env.testing and .env.production but not bare .env. A user creating .env with database credentials could accidentally commit it.

Fix: .env and .env.* are now gitignored. .env.*.example files are explicitly un-ignored so templates remain tracked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: scrub PII from essay and recipe examples

- 510-MY-GARRY phone mnemonic → "Your Phone Number"
- "Garry → Authenticated Mode" → "Owner → Authenticated Mode"
- "Telegram" → "secure channel" in auth example
- @garrytan → @yourhandle in X recipe example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
129
docs/guides/brain-agent-loop.md
Normal file
@@ -0,0 +1,129 @@
# The Brain-Agent Loop

## Goal

Every conversation makes the brain smarter. Every brain lookup makes responses
better. The loop compounds daily.

## What the User Gets

Without this: the agent answers from stale context. You discuss a deal on Monday,
and by Friday the agent has forgotten. Every conversation starts from zero.

With this: six months in, the agent knows more about your world than you can hold
in working memory. It never forgets. It never stops indexing.

## The Loop

```
Signal arrives (message, meeting, email, tweet, link)
  │
  ▼
DETECT entities (people, companies, concepts, original thinking)
  │  → spawn sub-agent (see entity-detection.md)
  │
  ▼
READ: check brain FIRST (before responding)
  │  → gbrain search "{entity name}"
  │  → gbrain get {slug} (if you know it)
  │  → gbrain query "what do we know about {topic}"
  │
  ▼
RESPOND with brain context (every answer is better with context)
  │
  ▼
WRITE: update brain pages (new info → compiled truth + timeline)
  │  → gbrain put {slug} (update page)
  │  → add_timeline_entry (append to timeline)
  │  → add_link (cross-reference to other entities)
  │
  ▼
SYNC: gbrain indexes changes
  │  → gbrain sync --no-pull --no-embed
  │
  ▼
(next signal arrives — agent is now smarter)
```

## Implementation

### On Every Inbound Message

```
on_message(text):
  // 1. DETECT (async, don't block)
  spawn_entity_detector(text)

  // 2. READ (before composing response)
  entities = extract_entity_names(text)  // quick regex/NER
  context = []
  for name in entities:
    results = gbrain_search(name)
    if results:
      page = gbrain_get(results[0].slug)
      context.append(page.compiled_truth)

  // 3. RESPOND (with brain context injected)
  response = compose_response(text, context)

  // 4. WRITE (after responding, if new info emerged)
  if response_contains_new_info(response):
    for entity in mentioned_entities:
      gbrain_add_timeline_entry(entity.slug, {
        date: today,
        summary: "Discussed {topic}",
        source: "[Source: User, conversation, {date}]"
      })

  // 5. SYNC
  gbrain_sync()
```
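
The pseudocode above can be sketched as runnable Python. This is a minimal illustration under stated assumptions, not GBrain's implementation: `Page`, `InMemoryBrain`, and this `on_message` signature are hypothetical stand-ins for the `gbrain` CLI calls, entity names are passed in rather than detected, the write step runs for every matched entity instead of checking for new information, and the ISO date inside the source tag is an assumed format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Page:
    """Hypothetical stand-in for a brain page: compiled truth plus a timeline."""
    slug: str
    compiled_truth: str
    timeline: list = field(default_factory=list)


class InMemoryBrain:
    """Toy replacement for the gbrain CLI so the loop runs without a database."""

    def __init__(self, pages):
        self.pages = {page.slug: page for page in pages}

    def search(self, name):
        # Naive keyword match against each page's compiled truth.
        needle = name.lower()
        return [p for p in self.pages.values() if needle in p.compiled_truth.lower()]

    def add_timeline_entry(self, slug, entry):
        self.pages[slug].timeline.append(entry)


def on_message(brain, text, entity_names, today):
    # READ: collect the top hit for every entity the brain already knows.
    hits = []
    for name in entity_names:
        results = brain.search(name)
        if results:
            hits.append(results[0])
    context = [page.compiled_truth for page in hits]

    # RESPOND: a real agent would call its model here; this only echoes context.
    response = f"{text} | context: {'; '.join(context) or 'none'}"

    # WRITE: append one attributed timeline entry per matched entity.
    for page in hits:
        brain.add_timeline_entry(page.slug, {
            "date": today.isoformat(),
            "summary": text,
            "source": f"[Source: User, conversation, {today.isoformat()}]",
        })
    return response
```

Calling `on_message` twice for the same entity leaves two timeline entries on its page, which is the compounding behavior the loop describes. The sync step is omitted because the in-memory search has no separate index to refresh.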

### The Two Invariants

1. **Every READ improves the response.** If you answered a question about a
   person without checking their brain page first, you gave a worse answer
   than you could have. The brain almost always has something. External APIs
   fill gaps; they don't start from scratch.

2. **Every WRITE improves future reads.** If a meeting transcript mentioned
   new information about a company and you didn't update the company page,
   you created a gap that will bite you later.

## Tricky Spots

1. **Read BEFORE responding, not after.** The temptation is to respond first
   and update the brain later. But the brain context makes the response better.
   Read first.

2. **Don't skip the write step.** "I'll update the brain later" means never.
   Write immediately after the conversation, while the context is fresh.

3. **Sync after every write batch.** Without sync, the brain search index is
   stale. The next query won't find what you just wrote.

4. **External APIs are fallback, not primary.** `gbrain search` before
   Brave Search. `gbrain get` before Crustdata. The brain has relationship
   history, your own assessments, meeting transcripts, cross-references.
   No external API can provide that.
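
The "sync after every write batch" rule can be sketched as a small batching helper. `write_batch` is a hypothetical name, not part of GBrain, and it assumes the caller supplies a `sync` callable (for example, one that shells out to `gbrain sync --no-pull --no-embed`).

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def write_batch(sync: Callable[[], None]) -> Iterator[list]:
    """Collect the slugs written in one batch, then sync once at the end."""
    written: list = []
    yield written
    # Reached only when the block finished without raising.
    if written:  # nothing written, nothing to re-index
        sync()
```

Inside the `with` block the caller appends each slug it writes; the index is refreshed once when the block exits, rather than after every individual page update.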

## How to Verify It Works

1. **Mention a person the brain knows.** Ask "what do we know about {name}?"
   The agent should search the brain and return compiled truth, not hallucinate
   or do a web search.

2. **Discuss something new about a known entity.** Say "I heard Acme Corp
   just raised Series B." After the conversation, check: does Acme Corp's
   brain page have a new timeline entry?

3. **Ask about the same person a day later.** The agent should immediately
   pull brain context without you asking. If it doesn't reference the brain
   page, the loop isn't running.

4. **Check the sync.** After a conversation, run `gbrain search "{topic}"`
   from the CLI. The new information should be searchable.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md). See also: [Entity Detection](entity-detection.md), [Brain-First Lookup](brain-first-lookup.md)*
85
docs/guides/brain-first-lookup.md
Normal file
@@ -0,0 +1,85 @@
# Brain-First Lookup Protocol

## Goal

Check the brain before calling ANY external API. The brain almost always has
something. External APIs fill gaps; they don't start from scratch.

## What the User Gets

Without this: the agent calls Brave Search for someone you've had 12 meetings with.
You get a LinkedIn summary instead of your relationship history.

With this: the agent pulls your compiled truth, recent timeline entries, and
shared context before doing anything else. External APIs only fill gaps.

## Implementation

```
lookup(name_or_topic):
  // STEP 1: Keyword search (fast, works day one, no embeddings needed)
  results = gbrain search "{name_or_topic}"
  if results.length > 0:
    page = gbrain get {results[0].slug}
    return page  // done, brain had it

  // STEP 2: Hybrid search (needs embeddings, finds semantic matches)
  results = gbrain query "what do we know about {name_or_topic}"
  if results.length > 0:
    page = gbrain get {results[0].slug}
    return page

  // STEP 3: Direct slug (if you know or can guess the slug)
  page = gbrain get "people/{slugify(name_or_topic)}"
  if page: return page

  // STEP 4: External API (FALLBACK ONLY)
  // Only reach here if brain has nothing
  return external_search(name_or_topic)
```
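
The cascade above can be sketched in Python with each step injected as a callable. This is an illustration only: the `LookupStep` type and this `lookup` signature are assumptions, and the brain steps are expected to wrap `gbrain search`, `gbrain query`, and `gbrain get` in whatever way the agent already invokes them.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each step takes the query and returns page text, or None when it finds nothing.
LookupStep = Callable[[str], Optional[str]]


def lookup(query: str,
           brain_steps: Sequence[LookupStep],
           external_search: LookupStep) -> Tuple[str, Optional[str]]:
    """Try every brain step in order; call the external API only if all miss."""
    for step in brain_steps:
        page = step(query)
        if page is not None:
            return ("brain", page)
    # Fallback only: reached when the brain has nothing.
    return ("external", external_search(query))
```

Returning the origin alongside the page makes it easy to log how often the agent falls through to the external API, which should be rare once the brain is populated.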

**This is mandatory.** An agent that calls Brave Search before checking the brain
is wasting money and giving worse answers.

## Why Brain First

The brain has context no external API can provide:
- Relationship history (how you know them, what you discussed)
- Your own assessments (what you think of them, not their LinkedIn bio)
- Meeting transcripts (what was said, what was decided)
- Cross-references (who they know, what companies they're connected to)
- Timeline (what changed recently, what's trending)

A LinkedIn scrape gives you their job title. The brain gives you: "co-founded
Brex, you had coffee with him 3 times, last discussed the payments infrastructure
thesis, he's interested in your take on AI agents."

## Tricky Spots

1. **Try keyword first, then hybrid.** Keyword search works without embeddings
   (day one). Hybrid search needs embeddings but finds semantic matches. Try
   both in sequence.

2. **Fuzzy slug matching.** `gbrain get` supports fuzzy matching. If the exact
   slug doesn't exist, it suggests alternatives. Use this for name variants
   ("Pedro" → "pedro-franceschi").

3. **Don't skip for "simple" questions.** Even "what's Acme Corp's address?"
   should check the brain first. The brain might have it, and the lookup adds
   negligible latency (< 100ms for keyword search).

4. **Load compiled truth + recent timeline.** The compiled truth gives you the
   state of play in 30 seconds. The timeline gives you what changed recently.
   Both together = full context.
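
Step 3 of the lookup relies on a `slugify` helper that this guide does not define. One plausible sketch follows, assuming slugs are lowercase ASCII words joined by hyphens as in `pedro-franceschi`; GBrain's actual slug rules may differ, and `guess_person_slug` is a hypothetical name.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def guess_person_slug(name: str) -> str:
    # The "people/" prefix mirrors Step 3 of the lookup pseudocode.
    return f"people/{slugify(name)}"
```

A guessed slug is only a starting point: a partial name such as "Pedro" produces `people/pedro`, which is where the fuzzy matching described in tricky spot 2 takes over.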

## How to Verify

1. Ask about someone in the brain. Verify the agent searched the brain FIRST
   (check tool call order in the response).
2. Ask about someone NOT in the brain. Verify the agent searched the brain,
   found nothing, THEN fell back to external search.
3. Ask the same question twice. Second time should be instant (brain has it).
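
The tool-call-order check in steps 1 and 2 can be automated with a small helper. The log format here (one string per call, starting with the tool name) and the `external_search` label are assumptions for illustration; adapt the prefixes to whatever your agent platform records.

```python
BRAIN_PREFIXES = ("gbrain search", "gbrain query", "gbrain get")
EXTERNAL_PREFIXES = ("external_search",)


def brain_checked_first(tool_calls):
    """True only if a brain call appears before any external call."""
    for call in tool_calls:
        if call.startswith(BRAIN_PREFIXES):
            return True
        if call.startswith(EXTERNAL_PREFIXES):
            return False
    return False  # the brain was never consulted
```

The helper inspects only the first brain or external call, so a run that searches the brain, finds nothing, and then falls back to an external search still passes, which matches step 2.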

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md). See also: [Brain-Agent Loop](brain-agent-loop.md), [Search Modes](search-modes.md)*
75
docs/guides/brain-vs-memory.md
Normal file
@@ -0,0 +1,75 @@
# Brain vs Memory vs Session

## Goal
Know what goes in GBrain, what goes in agent memory, and what stays in session context -- so every piece of information lands in the right layer.

## What the User Gets
Without this: people dossiers get stored in agent memory (lost on agent reset), user preferences get stored in GBrain (cluttering knowledge pages), and the agent re-asks questions it already knows the answer to.

With this: world knowledge persists in the brain, operational state persists in agent memory, and the agent never puts information in the wrong layer.

## Implementation

```
on new_information(info):
  # Three layers, three purposes -- route to the right one

  if info.is_about_the_world:
    # GBRAIN: people, companies, deals, meetings, concepts, ideas
    # This is world knowledge -- facts about entities external to the agent
    gbrain put <slug> --content "..."
    # Examples:
    #   "Pedro is CEO of Brex"              -> gbrain (person page)
    #   "Brex raised Series D at $12B"      -> gbrain (company page)
    #   "Tuesday's meeting covered Q2"      -> gbrain (meeting page)
    #   "The meatsuit maintenance tax"      -> gbrain (originals page)

  elif info.is_about_operations:
    # AGENT MEMORY: preferences, decisions, tool config, session continuity
    # This is how the agent operates -- not facts about the world
    memory_write(info)
    # Examples:
    #   "User prefers concise formatting"       -> agent memory
    #   "Deploy to staging before prod"         -> agent memory
    #   "Use dark mode in code blocks"          -> agent memory
    #   "API key for Crustdata goes in .env"    -> agent memory

  elif info.is_current_conversation:
    # SESSION CONTEXT: what was just said, current task, immediate state
    # This is automatic -- already in the conversation window
    # No storage action needed
    # Examples:
    #   "We were just discussing the board deck"  -> session
    #   "You asked me to review this PR"          -> session
    #   "The file I just shared"                  -> session

# Lookup routing:
on user_asks(question):
  if question.about_person or question.about_company or question.about_meeting:
    gbrain search "{entity}"    # -> world knowledge
    gbrain get <slug>

  elif question.about_preference or question.about_how_to_operate:
    memory_search("{topic}")    # -> operational state

  elif question.about_current_context:
    # Already in session -- just reference conversation history
    pass
```
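
The routing rule above can be sketched as a small Python function. This is a sketch under stated assumptions: the `Layer` enum and `route` are hypothetical names, and the two boolean flags stand in for whatever classification the agent performs on each piece of information.

```python
from enum import Enum


class Layer(Enum):
    GBRAIN = "gbrain"        # world knowledge: people, companies, meetings, ideas
    AGENT_MEMORY = "memory"  # how the agent operates: preferences, tool config
    SESSION = "session"      # the current conversation; nothing to store


def route(about_world: bool, about_operations: bool) -> Layer:
    """World facts win over operations; anything else stays in the session."""
    if about_world:
        return Layer.GBRAIN
    if about_operations:
        return Layer.AGENT_MEMORY
    return Layer.SESSION
```

Checking `about_world` first encodes the rule that a statement like "Pedro prefers email over Slack" goes to GBrain even though it sounds like a preference, because it is a fact about a person in the world.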
|
||||
|
||||
## Tricky Spots
|
||||
|
||||
1. **Don't store people in agent memory.** "Pedro prefers email over Slack" feels like a preference, but it's a fact about Pedro -- it goes in GBrain on Pedro's page. Agent memory is for the agent's own operational state, not facts about people in the world.
|
||||
2. **Don't store user preferences in GBrain.** "User likes bullet points over paragraphs" is about how the agent should behave, not about the world. It goes in agent memory. GBrain pages are for entities, not for agent configuration.
|
||||
3. **Synthesis of external ideas goes in GBrain.** "User's take on Peter Thiel's zero-to-one framework" is the user's original thinking -- it goes in GBrain under originals/, not in agent memory.
|
||||
4. **Agent memory doesn't survive agent resets on some platforms.** Critical world knowledge MUST be in GBrain, which is durable. If the agent loses memory, the brain still has everything.
|
||||
5. **When in doubt, ask: is this about the world or about how to operate?** World -> GBrain. Operations -> agent memory. Current conversation -> session.
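
The world-versus-operations question can be condensed into a small decision function. This is a minimal sketch, not GBrain code: the `Info` dataclass, its boolean flags, and `route` are hypothetical names, and the flags stand in for whatever classification the agent actually performs. The `ask_user` fallback is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Info:
    """Hypothetical description of one piece of information to be stored."""
    text: str
    about_world: bool = False               # people, companies, meetings, ideas
    about_operation: bool = False           # preferences, decisions, tool config
    in_current_conversation: bool = False   # only relevant to the live session


def route(info: Info) -> str:
    """Return the layer that should hold the information."""
    if info.about_world:
        return "gbrain"          # durable world knowledge
    if info.about_operation:
        return "agent_memory"    # how the agent should operate
    if info.in_current_conversation:
        return "session"         # already in the context window, nothing to store
    return "ask_user"            # ambiguous: fall back to asking (assumption)


print(route(Info("Pedro prefers email over Slack", about_world=True)))
print(route(Info("User likes bullet points", about_operation=True)))
```

Note that the first example is routed by what the fact is about (Pedro), not by the word "prefers", which is the distinction Tricky Spot 1 draws.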
|
||||
|
||||
## How to Verify
|
||||
|
||||
1. Ask the agent "Who is Pedro?" -- confirm it runs `gbrain search` or `gbrain get`, not `memory_search`. Person lookup should hit GBrain.
|
||||
2. Ask the agent "How should I format responses?" -- confirm it checks agent memory, not GBrain. Preferences are operational state.
|
||||
3. Check that no person or company pages exist in agent memory storage. Run `memory_search "person"` -- it should return preferences, not dossiers.
|
||||
4. Check that GBrain doesn't contain pages about agent behavior. Run `gbrain search "user prefers"` -- it should return nothing (preferences belong in agent memory).
|
||||
5. After an agent reset, confirm GBrain knowledge is still accessible. Run `gbrain get <any_slug>` -- world knowledge should survive the reset.
|
||||
|
||||
---
|
||||
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
|
||||
137
docs/guides/compiled-truth.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# Compiled Truth + Timeline Pattern
|
||||
|
||||
## Goal
|
||||
|
||||
Every brain page has two zones: compiled truth (current synthesis, rewritten as
|
||||
evidence changes) and timeline (append-only evidence trail, never edited).
|
||||
|
||||
## What the User Gets
|
||||
|
||||
Without this: brain pages are append-only logs. To understand a person, you read
|
||||
200 timeline entries. The answer is buried in entry #147.
|
||||
|
||||
With this: the compiled truth gives you the state of play in 30 seconds. The
|
||||
timeline is the proof. Six months of entries compress into a one-paragraph
|
||||
assessment that's always current.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Page Structure
|
||||
|
||||
```markdown
|
||||
---
|
||||
type: person
|
||||
title: Sarah Chen
|
||||
tags: [engineering, acme-corp]
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
One paragraph. How you know them, why they matter.
|
||||
|
||||
## State
|
||||
VP Engineering at Acme Corp. Managing 45-person team. Reports to CEO.
|
||||
|
||||
## What They Believe
|
||||
Strong opinions on test coverage. "Ship it when the tests pass, not before."
|
||||
|
||||
## What They're Building
|
||||
Leading the API migration from REST to GraphQL. Target: Q3 completion.
|
||||
|
||||
## Assessment
|
||||
Sharp technical leader. Under-appreciated internally. Watch for signs of burnout.
|
||||
|
||||
## Trajectory
|
||||
Ascending. Likely CTO track if the migration succeeds.
|
||||
|
||||
## Relationship
|
||||
Met through Pedro. Had coffee 3x. Last: discussed API architecture thesis.
|
||||
|
||||
## Contact
|
||||
sarah@acmecorp.com | @sarahchen | linkedin.com/in/sarahchen
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
- **2026-04-07** | Met at team sync. Discussed API migration timeline.
|
||||
Seemed energized about GraphQL pivot.
|
||||
[Source: Meeting notes, 2026-04-07 2:00 PM PT]
|
||||
- **2026-04-03** | Mentioned in email re Q2 planning. Taking lead on ops.
|
||||
[Source: Gmail, sarah@acmecorp.com, 2026-04-03 10:30 AM PT]
|
||||
- **2026-03-15** | First meeting. Intro from Pedro. Strong technical background.
|
||||
[Source: User, direct conversation, 2026-03-15 3:00 PM PT]
|
||||
```
|
||||
|
||||
### Updating a Page
|
||||
|
||||
```
update_brain_page(slug, new_info, source):
    page = gbrain get {slug}

    // TIMELINE: always APPEND (never edit existing entries)
    gbrain add_timeline_entry {slug} {
        date: today,
        summary: new_info.summary,
        detail: new_info.detail,
        source: format_source(source)  // [Source: who, channel, date time tz]
    }

    // COMPILED TRUTH: REWRITE (not append)
    // Read the existing compiled truth
    // Integrate new information
    // Write the updated synthesis
    updated_truth = rewrite_compiled_truth(page.compiled_truth, new_info)
    gbrain put {slug} {
        compiled_truth: updated_truth,
        // timeline is NOT passed -- it's managed by add_timeline_entry
    }
```
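
A minimal Python sketch of the append-only half of this update, assuming the timeline is held as a Markdown string with the newest entry first (as in the page example). `format_entry` and `append_timeline` are hypothetical helpers written for illustration; the real `add_timeline_entry` operation may behave differently.

```python
def format_entry(date: str, summary: str, source: str) -> str:
    """Render one timeline entry in the page format used in this guide."""
    return f"- **{date}** | {summary}\n  [Source: {source}]"


def append_timeline(timeline: str, date: str, summary: str, source: str) -> str:
    """Return a new timeline with the entry added at the top.

    Existing entries are carried over unchanged; nothing is edited in place.
    """
    entry = format_entry(date, summary, source)
    if not timeline.strip():
        return entry + "\n"
    return entry + "\n" + timeline


old = (
    "- **2026-04-03** | Mentioned in email re Q2 planning.\n"
    "  [Source: Gmail, 2026-04-03 10:30 AM PT]\n"
)
new = append_timeline(
    old, "2026-04-07", "Met at team sync.", "Meeting notes, 2026-04-07 2:00 PM PT"
)
print(new)
```

Because the old text survives as an unchanged suffix of the new text, the immutability check from "How to Verify" reduces to a string comparison.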
|
||||
|
||||
### The Rules
|
||||
|
||||
| Zone | Action | Explanation |
|------|--------|-------------|
| Compiled truth | **REWRITE** | Current synthesis. Changes when evidence changes. |
| Timeline | **APPEND** | Evidence trail. Never edited, only added to. |
|
||||
|
||||
**Every compiled truth claim must trace to timeline entries.** If the Assessment
|
||||
says "under-appreciated internally," there should be timeline entries that
|
||||
support that claim.
|
||||
|
||||
## Tricky Spots
|
||||
|
||||
1. **REWRITE means rewrite, not append.** Don't add a new paragraph to compiled
|
||||
truth. Rewrite the entire section with the new information integrated. Old
|
||||
assessments that are no longer accurate should be updated, not kept alongside
|
||||
contradictory new ones.
|
||||
|
||||
2. **Timeline entries are immutable.** Never edit a timeline entry. If information
|
||||
turns out to be wrong, add a NEW entry correcting it:
|
||||
`- 2026-04-10 | Correction: Sarah is VP Eng, not CTO. Previous entry was wrong.`
|
||||
|
||||
3. **GBrain search weights compiled truth higher.** `gbrain query` returns compiled
|
||||
truth chunks with higher relevance than timeline chunks. This means the freshest
|
||||
synthesis surfaces first in search results.
|
||||
|
||||
4. **The --- separator matters.** GBrain uses the first standalone `---` after
|
||||
frontmatter to split compiled_truth from timeline. Everything above is compiled
|
||||
truth, everything below is timeline.
|
||||
|
||||
5. **Don't skip the Assessment section.** The assessment is the value. "Strong
|
||||
technical leader" is something no API can provide. It's YOUR read on this
|
||||
person. That's what makes the brain page better than LinkedIn.
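
The separator rule in Tricky Spot 4 can be sketched as follows. This is an illustration under stated assumptions, not GBrain's parser: `split_page` is a hypothetical function, and it assumes frontmatter is delimited by a leading `---` line and a closing `---` line.

```python
def split_page(text: str) -> tuple[str, str, str]:
    """Split a page into (frontmatter, compiled_truth, timeline)."""
    lines = text.splitlines()
    frontmatter: list[str] = []
    body_start = 0
    # Frontmatter is the block between a leading '---' and the next '---'.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                frontmatter = lines[1:i]
                body_start = i + 1
                break
    body = lines[body_start:]
    # The first standalone '---' in the body separates the two zones.
    for i, line in enumerate(body):
        if line.strip() == "---":
            compiled, timeline = body[:i], body[i + 1:]
            break
    else:
        compiled, timeline = body, []   # no separator: everything is compiled truth
    return (
        "\n".join(frontmatter).strip(),
        "\n".join(compiled).strip(),
        "\n".join(timeline).strip(),
    )


page = (
    "---\ntype: person\n---\n\n## State\nVP Engineering.\n\n---\n\n"
    "## Timeline\n\n- **2026-04-07** | Met at team sync.\n"
)
frontmatter, truth, timeline = split_page(page)
print(truth)
```

A practical consequence: a horizontal rule written as `---` inside the compiled truth would be taken as the zone boundary under this rule, so a different rule style is safer there.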
|
||||
|
||||
## How to Verify
|
||||
|
||||
1. **Update a person page.** Add new meeting info. Check: compiled truth was
|
||||
REWRITTEN (not appended), timeline has new entry at the top.
|
||||
2. **Search for the person.** `gbrain query "Sarah Chen"`. The compiled truth
|
||||
(current synthesis) should appear first, not a random timeline entry.
|
||||
3. **Check traceability.** Every claim in compiled truth should have a
|
||||
corresponding timeline entry. Read both sections and verify.
|
||||
4. **Check immutability.** After update, old timeline entries should be unchanged.
|
||||
Dates, sources, and content should match the originals exactly.
|
||||
|
||||
---
|
||||
|
||||
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md). See also: [Source Attribution](source-attribution.md), [Entity Detection](entity-detection.md)*
|
||||
136
docs/guides/content-media.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# Content and Media Ingestion
|
||||
|
||||
## Goal
|
||||
YouTube videos, social media, PDFs, and documents become searchable brain pages with the agent's own analysis and full cross-references to every entity mentioned.
|
||||
|
||||
## What the User Gets
|
||||
Without this: media links are bookmarks that decay -- you remember watching a video but can't find what was said, who said it, or why it mattered. With this: every piece of media is a permanent brain page with the agent's analysis layered on top, every mentioned entity gets a back-link, and the full content is searchable forever.
|
||||
|
||||
## Implementation
|
||||
|
||||
```
|
||||
on user_shares_media(url_or_file):
|
||||
|
||||
# PATTERN 1: YouTube Video Ingestion
|
||||
if media.type == "youtube":
|
||||
# Step 1: Get FULL transcript with speaker diarization
|
||||
# WHO said WHAT -- not just a wall of text
|
||||
# Use Diarize.io or equivalent service
|
||||
transcript = diarize(video_url) # speaker-attributed transcript
|
||||
# NEVER use YouTube's auto-generated summary or AI summary
|
||||
|
||||
# Step 2: Agent writes OWN analysis (this is the value)
|
||||
# NOT a summary. NOT regurgitation. The agent's TAKE:
|
||||
# - What matters and why (given the user's worldview)
|
||||
# - Key quotes attributed to specific speakers
|
||||
# - Connections to existing brain pages
|
||||
# - Implications and follow-up angles
|
||||
analysis = agent_analyze(transcript, user_context)
|
||||
|
||||
# Step 3: Create brain page
|
||||
slug = f"media/youtube/{video_slug}"
|
||||
gbrain put <slug> --content """
|
||||
# {title}
|
||||
**Channel:** {channel} | **Date:** {date} | **Link:** {url}
|
||||
|
||||
## Analysis
|
||||
{agent_analysis}
|
||||
|
||||
## Key Quotes
|
||||
- **{Speaker}** ({timestamp}): "{quote}" -- {why_it_matters}
|
||||
|
||||
---
|
||||
## Full Transcript
|
||||
{diarized_transcript}
|
||||
"""
|
||||
|
||||
# Step 4: Extract and cross-reference entities
|
||||
for person in transcript.mentioned_people:
|
||||
gbrain add_link <slug> <person_slug>
|
||||
gbrain add_link <person_slug> <slug>
|
||||
gbrain add_timeline_entry <person_slug> \
|
||||
--entry "Discussed in {video_title}: {what_was_said}" \
|
||||
--source "YouTube: {url}"
|
||||
|
||||
# PATTERN 2: Social Media Bundles
|
||||
elif media.type == "tweet" or media.type == "social":
|
||||
# Don't just save a tweet -- reconstruct FULL context
|
||||
bundle = {
|
||||
"original": fetch_tweet(url),
|
||||
"thread": reconstruct_thread(url), # quoted tweets, replies
|
||||
"linked_articles": fetch_linked_urls(), # fetch and summarize
|
||||
"engagement": get_engagement_data(), # what resonated
|
||||
}
|
||||
|
||||
slug = f"media/social/{platform}-{author}-{date}"
|
||||
gbrain put <slug> --content """
|
||||
# {author}: {topic}
|
||||
{agent_analysis_of_full_bundle}
|
||||
|
||||
## Thread
|
||||
{reconstructed_thread}
|
||||
|
||||
## Linked Articles
|
||||
{article_summaries}
|
||||
|
||||
---
|
||||
## Raw
|
||||
{original_tweet_text}
|
||||
"""
|
||||
|
||||
# Extract entities and cross-reference
|
||||
for entity in bundle.mentioned_entities:
|
||||
gbrain add_link <slug> <entity_slug>
|
||||
gbrain add_link <entity_slug> <slug>
|
||||
|
||||
# PATTERN 3: PDFs and Documents
|
||||
elif media.type == "pdf" or media.type == "document":
|
||||
# OCR if needed (scanned PDFs)
|
||||
content = ocr_if_needed(file) or extract_text(file)
|
||||
|
||||
# For books and long-form:
|
||||
slug = f"sources/{document_slug}"
|
||||
gbrain put <slug> --content """
|
||||
# {title}
|
||||
**Author:** {author} | **Date:** {date}
|
||||
|
||||
## Chapter Summaries
|
||||
{per_chapter_summary}
|
||||
|
||||
## Key Quotes
|
||||
- p.{page}: "{quote}" -- {why_it_matters}
|
||||
|
||||
## Cross-References
|
||||
{links_to_brain_pages_for_people_and_concepts}
|
||||
|
||||
---
|
||||
## Source
|
||||
{full_text_or_key_sections}
|
||||
"""
|
||||
|
||||
for entity in document.mentioned_entities:
|
||||
gbrain add_link <slug> <entity_slug>
|
||||
gbrain add_link <entity_slug> <slug>
|
||||
|
||||
# Always sync after ingestion
|
||||
gbrain sync
|
||||
```
|
||||
|
||||
## Tricky Spots
|
||||
|
||||
1. **Always FULL transcript, never AI summary.** YouTube's auto-summary and AI-generated summaries lose the texture: who said what, exact phrasing, tone, what was left unsaid. The full diarized transcript is the evidence base. The agent's analysis goes above it.
|
||||
2. **The agent's OWN analysis is the value, not regurgitation.** "The video discussed AI safety" is worthless. "Dario made a specific claim about compute scaling that contradicts what Ilya said in the NeurIPS talk -- see media/youtube/ilya-neurips-2025" is useful. The analysis connects the new media to the existing brain.
|
||||
3. **Social media is a bundle, not a single tweet.** A tweet without its thread, quoted tweets, linked articles, and engagement context is a fragment. Reconstruct the full context before creating the brain page.
|
||||
4. **Cross-references make media pages alive.** A YouTube page without back-links to the people and companies mentioned is a dead archive. Every mentioned entity gets a link and a timeline entry.
|
||||
5. **Over time, `media/` becomes a searchable archive.** Every video, podcast, talk, interview, article, and tweet the user has consumed, with the agent's commentary layered on top. This is the memex at full power.
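
The cross-referencing rule (every mentioned entity gets a forward link and a back-link) can be sketched as a command builder. The sketch assumes `gbrain add_link` takes two positional slugs, which is how this guide's pseudocode writes it; that syntax is not verified against the CLI. `link_commands` is a hypothetical helper that only builds argument lists and runs nothing.

```python
def link_commands(page_slug: str, entity_slugs: list[str]) -> list[list[str]]:
    """Build argv lists for forward and back links, one pair per entity."""
    commands: list[list[str]] = []
    for entity in sorted(set(entity_slugs)):   # de-duplicate, stable order
        commands.append(["gbrain", "add_link", page_slug, entity])
        commands.append(["gbrain", "add_link", entity, page_slug])
    return commands


for argv in link_commands(
    "media/youtube/example-talk",
    ["people/sarah-chen", "companies/acme-corp", "people/sarah-chen"],
):
    print(" ".join(argv))
```

Generating the pairs in code rather than asking the LLM to remember both directions keeps the back-link from being silently dropped.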
|
||||
|
||||
## How to Verify
|
||||
|
||||
1. Ingest a YouTube video. Run `gbrain get media/youtube/{slug}`. Confirm the page has: the agent's analysis (not just a summary), key quotes with speaker attribution, and the full diarized transcript.
|
||||
2. Run `gbrain get_links media/youtube/{slug}`. Confirm back-links exist to brain pages for every person and company mentioned in the video.
|
||||
3. Pick a person mentioned in the video. Run `gbrain get <person_slug>`. Confirm their timeline has a new entry referencing the video with specific context.
|
||||
4. Ingest a tweet. Confirm the brain page includes the thread context, linked article summaries, and entity cross-references -- not just the tweet text.
|
||||
5. Run `gbrain search "{topic_from_video}"`. Confirm the media page appears in search results (verifies the content is indexed and searchable).
|
||||
|
||||
---
|
||||
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
|
||||
193
docs/guides/cron-schedule.md
Normal file
@@ -0,0 +1,193 @@
|
||||
# Reference Cron Schedule
|
||||
|
||||
## Goal
|
||||
|
||||
A production brain runs 20+ recurring jobs that keep it alive, current, and
|
||||
compounding. This guide shows the schedule, the patterns, and how to set it up.
|
||||
|
||||
## What the User Gets
|
||||
|
||||
Without this: the brain only updates when you manually ingest data. Pages go
|
||||
stale, entities are thin, citations break, and the agent answers from old context.
|
||||
|
||||
With this: the brain maintains itself. Email, social, calendar, and meetings
|
||||
flow in automatically. Thin pages get enriched overnight. Broken citations get
|
||||
fixed. You wake up and the brain is smarter than when you went to sleep.
|
||||
|
||||
## The Schedule
|
||||
|
||||
| Frequency | Job | Brain Interaction | Recipe |
|-----------|-----|-------------------|--------|
| Every 30 min | Email monitoring | Search sender, update people pages | [email-to-brain](../../recipes/email-to-brain.md) |
| Every 30 min | X/Twitter collection | Create/update media pages, entity extraction | [x-to-brain](../../recipes/x-to-brain.md) |
| 3x/day (weekdays) | Meeting sync | Full ingestion + attendee propagation | [meeting-sync](../../recipes/meeting-sync.md) |
| Weekly | Calendar sync | Daily files + attendee enrichment | [calendar-to-brain](../../recipes/calendar-to-brain.md) |
| Daily AM | Morning briefing | Search calendar attendees, deal status, active threads | [briefing skill](../../skills/briefing/SKILL.md) |
| Weekly | Brain maintenance | `gbrain doctor`, embed stale, orphan detection | [maintain skill](../../skills/maintain/SKILL.md) |
| Nightly | Dream cycle | Entity sweep, enrich thin spots, fix citations | See below |
|
||||
|
||||
## Implementation: Setting Up Cron Jobs
|
||||
|
||||
```bash
|
||||
# Email collector — every 30 minutes
|
||||
*/30 * * * * cd /path/to/email-collector && node email-collector.mjs collect && node email-collector.mjs digest
|
||||
|
||||
# X/Twitter collector — every 30 minutes
|
||||
*/30 * * * * cd /path/to/x-collector && node x-collector.mjs collect >> /tmp/x-collector.log 2>&1
|
||||
|
||||
# Meeting sync — 10 AM, 4 PM, 9 PM on weekdays
|
||||
0 10,16,21 * * 1-5 cd /path/to/meeting-sync && node meeting-sync.mjs >> /tmp/meeting-sync.log 2>&1
|
||||
|
||||
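# Calendar sync: Sundays at 10 AM
# cron treats an unescaped % as a newline, so every % is written as \%.
# `date -v-7d` is BSD/macOS syntax; on GNU/Linux use `date -d '7 days ago'`.
0 10 * * 0 cd /path/to/calendar-sync && node calendar-sync.mjs --start $(date -v-7d +\%Y-\%m-\%d) --end $(date +\%Y-\%m-\%d)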
|
||||
|
||||
# Brain health — weekly Mondays at 6 AM
|
||||
0 6 * * 1 gbrain doctor --json >> /tmp/gbrain-health.log 2>&1 && gbrain embed --stale
|
||||
|
||||
# Dream cycle — nightly at 2 AM
|
||||
0 2 * * * /path/to/dream-cycle.sh
|
||||
```
|
||||
|
||||
### Quiet Hours Gate (MANDATORY)
|
||||
|
||||
Every cron job that sends notifications MUST check quiet hours first.
|
||||
See [Quiet Hours](quiet-hours.md) for the full pattern.
|
||||
|
||||
```bash
# In every cron script:
if ! bash scripts/quiet-hours-gate.sh; then
  mkdir -p /tmp/cron-held
  echo "$OUTPUT" > /tmp/cron-held/$(basename "$0" .sh).md
  exit 0
fi
# Not quiet hours -- send normally
```
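
The gate script itself is covered in the Quiet Hours guide. As a rough illustration of the check it has to perform, here is a minimal Python sketch. `is_quiet_hours` is a hypothetical function and the 22:00 to 07:00 defaults are assumptions, not values taken from GBrain. The one subtle part is a window that wraps past midnight.

```python
from datetime import time


def is_quiet_hours(now: time, start: time = time(22, 0), end: time = time(7, 0)) -> bool:
    """True if `now` falls in [start, end), including windows that wrap past midnight."""
    if start <= end:
        # Same-day window, e.g. 13:00 to 14:00.
        return start <= now < end
    # Wrapping window, e.g. 22:00 to 07:00: late evening OR early morning.
    return now >= start or now < end


print(is_quiet_hours(time(3, 0)))    # True
print(is_quiet_hours(time(14, 0)))   # False
```

A naive `start <= now < end` comparison returns False for every time of day when the window crosses midnight, which is exactly the case that produces a 3 AM ping.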
|
||||
|
||||
### Travel-Aware Timezone Handling
|
||||
|
||||
The agent reads your calendar for flights, hotels, and out-of-office blocks to
|
||||
infer your current location and timezone. All times shown in YOUR local timezone.
|
||||
|
||||
```
// Example: user flew to Tokyo
// 11 AM Pacific (PDT) = 3 AM Tokyo (next day) = quiet hours
// Hold the notification, fold into morning briefing

get_user_timezone():
    recent_flight = gbrain search "flight" --type calendar --recent 7d
    if recent_flight:
        return infer_timezone(recent_flight.destination)
    return config.default_timezone  // fallback: US/Pacific
```
|
||||
|
||||
When you travel: cron jobs that would fire during your waking hours at home but
|
||||
hit your sleeping hours at the destination get held and folded into the next
|
||||
morning briefing. Zero config change needed.
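
The hold decision can be sketched with the standard library alone. This sketch uses fixed UTC offsets to stay self-contained, which is a simplification: a real implementation would more likely use IANA zone names (for example via `zoneinfo`) so daylight saving is handled. `local_hour`, `should_hold`, and the 22:00 to 07:00 window are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")   # Pacific Daylight Time
JST = timezone(timedelta(hours=9), "JST")    # Japan Standard Time (no DST)


def local_hour(fire_time: datetime, user_tz: timezone) -> int:
    """Hour of day the user experiences when a job fires."""
    return fire_time.astimezone(user_tz).hour


def should_hold(fire_time: datetime, user_tz: timezone,
                quiet_start: int = 22, quiet_end: int = 7) -> bool:
    """Hold the notification if it lands in the user's wrap-around quiet window."""
    hour = local_hour(fire_time, user_tz)
    return hour >= quiet_start or hour < quiet_end


job = datetime(2026, 4, 7, 11, 0, tzinfo=PDT)   # job fires at 11 AM Pacific
print(local_hour(job, JST))     # 3
print(should_hold(job, JST))    # True
print(should_hold(job, PDT))    # False
```

The cron expression never changes; only the timezone passed to the check does, which is why no reconfiguration is needed when the user travels.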
|
||||
|
||||
## The Dream Cycle
|
||||
|
||||
The most important cron job. Runs while you sleep.
|
||||
|
||||
### What It Does
|
||||
|
||||
```
dream_cycle():
    // Phase 1: Entity Sweep
    conversations = get_todays_conversations()
    for message in conversations:
        entities = detect_entities(message)
        for entity in entities:
            page = gbrain search "{entity.name}"
            if not page:
                create_page(entity)       // new entity, create + enrich
            elif page.is_thin():
                enrich_page(entity)       // thin page, fill it out
            else:
                update_timeline(entity)   // existing page, add today's mentions

    // Phase 2: Fix Broken Citations
    pages = gbrain list --type person --limit 100
    for page in pages:
        for entry in page.timeline:
            if not entry.has_source_attribution():
                fix_citation(entry)       // add [Source: ...] where missing
            if entry.has_tweet_url() and not entry.url_is_valid():
                fix_url(entry)            // broken tweet links

    // Phase 3: Consolidate Memory
    patterns = detect_patterns_across_conversations()
    for pattern in patterns:
        promote_to_memory(pattern)        // ephemeral → durable knowledge

    // Phase 4: Sync
    gbrain sync --no-pull --no-embed
    gbrain embed --stale
```
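
The detection step of Phase 2 is mechanical and can be done without an LLM. A minimal sketch, assuming timeline entries start with `- ` and carry a `[Source: ...]` tag on the entry line or an indented continuation line, as in the Compiled Truth guide's examples. `entries_missing_source` is a hypothetical helper, not a GBrain function.

```python
import re

ENTRY_START = re.compile(r"^- ")
SOURCE_TAG = re.compile(r"\[Source: [^\]]+\]")


def entries_missing_source(timeline: str) -> list[str]:
    """Return the first line of every timeline entry that has no [Source: ...] tag."""
    entries: list[list[str]] = []
    for line in timeline.splitlines():
        if ENTRY_START.match(line):
            entries.append([line])       # a new entry begins
        elif entries and line.strip():
            entries[-1].append(line)     # continuation of the current entry
    return [e[0] for e in entries if not SOURCE_TAG.search("\n".join(e))]


timeline = (
    "- **2026-04-07** | Met at team sync.\n"
    "  [Source: Meeting notes, 2026-04-07 2:00 PM PT]\n"
    "- **2026-04-03** | Mentioned in email re Q2 planning.\n"
)
print(entries_missing_source(timeline))
```

Only the repair (working out what the source actually was) needs judgment; finding the gaps is a regex.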
|
||||
|
||||
### Setting Up the Dream Cycle
|
||||
|
||||
**OpenClaw:** Ships with DREAMS.md as a default skill. Three phases (light,
|
||||
deep, REM) run automatically during quiet hours.
|
||||
|
||||
**Hermes Agent:**
|
||||
```bash
|
||||
/cron add "0 2 * * *" "Dream cycle: search today's sessions for
|
||||
entities I mentioned. For each person, company, or idea: check
|
||||
if a brain page exists (gbrain search), create or update it if
|
||||
thin. Fix any broken citations. Then consolidate: read MEMORY.md,
|
||||
promote important signals, remove stale entries."
|
||||
--name "nightly-dream-cycle"
|
||||
```
|
||||
|
||||
**Claude Code / Custom agents:** Create a script:
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# dream-cycle.sh
|
||||
|
||||
# No quiet-hours gate here: the dream cycle is meant to run during quiet hours
|
||||
echo "Dream cycle starting at $(date)"
|
||||
|
||||
# Phase 1: Entity sweep (spawn sub-agent)
|
||||
# Read today's conversation logs, extract entities, update brain
|
||||
|
||||
# Phase 2: Citation hygiene
|
||||
gbrain doctor --json | jq '.checks[] | select(.status=="warn")'
|
||||
|
||||
# Phase 3: Embed any stale content
|
||||
gbrain embed --stale
|
||||
|
||||
echo "Dream cycle complete at $(date)"
|
||||
```
|
||||
|
||||
## Tricky Spots
|
||||
|
||||
1. **The dream cycle is NOT optional.** Without it, signal leaks out of every
|
||||
conversation. With it, nothing is lost. This is the difference between an
|
||||
agent that forgets and one that remembers.
|
||||
|
||||
2. **Quiet hours gate on EVERY notification job.** If you skip it, the user
|
||||
gets pinged at 3 AM. One 3 AM ping and they'll disable the whole system.
|
||||
|
||||
3. **Don't over-cron.** 20+ jobs sounds like a lot. Start with: email (30 min),
|
||||
dream cycle (nightly), brain health (weekly). Add more as you add
|
||||
integration recipes.
|
||||
|
||||
4. **Timezone changes are automatic.** Don't make the user reconfigure cron
|
||||
when they travel. Read the calendar, infer the timezone, adjust delivery.
|
||||
|
||||
5. **Held messages MUST be picked up.** If quiet hours hold a notification,
|
||||
the morning briefing MUST include it. Otherwise information is lost.
|
||||
|
||||
## How to Verify
|
||||
|
||||
1. **Quiet hours:** Set quiet hours to current hour. Run a notification cron.
|
||||
Verify output went to `/tmp/cron-held/`, not to messaging.
|
||||
2. **Dream cycle:** Run the dream cycle manually. Check that thin entity pages
|
||||
got enriched and broken citations were fixed.
|
||||
3. **Email collector cron:** Wait 30 minutes. Check `data/digests/` for new digest.
|
||||
4. **Morning briefing:** Check that held messages appear in the briefing.
|
||||
5. **Health check:** Run `gbrain doctor --json`. All checks should pass.
|
||||
|
||||
---
|
||||
|
||||
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md). See also: [Quiet Hours](quiet-hours.md), [Operational Disciplines](operational-disciplines.md)*
|
||||
146
docs/guides/deterministic-collectors.md
Normal file
@@ -0,0 +1,146 @@
|
||||
# Deterministic Collectors: Code for Data, LLMs for Judgment
|
||||
|
||||
## Goal
|
||||
|
||||
Separate mechanical work (100% reliable code) from analytical work (LLM judgment) so that deterministic tasks never fail probabilistically.
|
||||
|
||||
## What the User Gets
|
||||
|
||||
Without this: the LLM generates Gmail links, formats tables, and tracks state.
It follows the rule for the first 10 items, then drops a link on item 11. You
write "NO EXCEPTIONS" in the prompt. It still fails. At 90% per-item reliability,
a 20-item daily digest averages two visible failures per day. Trust is destroyed.
|
||||
|
||||
With this: code handles URLs, formatting, and state (100% reliable). The LLM
|
||||
reads pre-formatted data and adds judgment, classification, and enrichment.
|
||||
Links are never wrong because the LLM never generates them.
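
The 90% figure is worth working through. The arithmetic below assumes each item fails independently, which is a simplification; the 90% and 20 come from the text above.

```python
p_item = 0.90      # per-item reliability from the text
n_items = 20       # items per digest from the text

expected_failures = n_items * (1 - p_item)
p_all_correct = p_item ** n_items

print(f"expected failures per digest: {expected_failures:.1f}")   # 2.0
print(f"chance of a fully correct digest: {p_all_correct:.1%}")   # 12.2%
```

So under this assumption, roughly seven digests out of eight contain at least one visible error, which is why per-item prompting cannot substitute for generating the links in code.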
|
||||
|
||||
## Implementation
|
||||
|
||||
```
|
||||
// The pattern: code collects, LLM analyzes
|
||||
|
||||
// STEP 1: Deterministic collector (script, no LLM calls)
|
||||
collector_run():
|
||||
messages = gmail_api.fetch_unread()
|
||||
for msg in messages:
|
||||
structured = {
|
||||
id: msg.id,
|
||||
from: msg.sender,
|
||||
subject: msg.subject,
|
||||
snippet: msg.snippet,
|
||||
gmail_link: f"https://mail.google.com/mail/u/?authuser={account}#inbox/{msg.id}",
|
||||
gmail_markdown: f"[Open in Gmail]({gmail_link})",
|
||||
is_signature: regex_match(msg, DOCUSIGN_PATTERNS),
|
||||
is_noise: regex_match(msg, NOISE_PATTERNS),
|
||||
is_new: msg.id not in state.seen_ids
|
||||
}
|
||||
store(structured)
|
||||
state.seen_ids.add(msg.id)
|
||||
generate_markdown_digest(structured_messages)
|
||||
|
||||
// STEP 2: LLM reads the pre-formatted digest
|
||||
llm_analyze():
|
||||
digest = read("data/digests/today.md") // links already baked in
|
||||
classify_urgency(digest) // judgment call
|
||||
add_commentary(digest) // contextual analysis
|
||||
run_brain_enrichment(notable_entities) // gbrain search + update
|
||||
draft_replies(urgent_items) // creative work
|
||||
surface_to_user(final_output) // delivery
|
||||
|
||||
// STEP 3: Wire into cron
|
||||
cron_job():
|
||||
collector_run() // fast, cheap, deterministic
|
||||
llm_analyze() // slower, expensive, creative
|
||||
```
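
A runnable sketch of Step 1, under stated assumptions: messages arrive as plain dicts (the real collector would fetch them from the Gmail API), and the regex lists are placeholder patterns invented for illustration. `build_record`, `render_digest`, and `matches_any` are hypothetical names. The link format is the one shown in the pseudocode above.

```python
import re

# Placeholder patterns for illustration only; real lists would be much longer.
NOISE_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"unsubscribe", r"no-?reply@")]
SIGNATURE_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"docusign", r"please sign")]


def matches_any(text: str, patterns: list[re.Pattern]) -> bool:
    return any(p.search(text) for p in patterns)


def build_record(msg: dict, account: str, seen_ids: set[str]) -> dict:
    """Turn one raw message into a structured record with the link baked in."""
    haystack = f"{msg['sender']} {msg['subject']} {msg['snippet']}"
    link = f"https://mail.google.com/mail/u/?authuser={account}#inbox/{msg['id']}"
    record = {
        "id": msg["id"],
        "from": msg["sender"],
        "subject": msg["subject"],
        "gmail_markdown": f"[Open in Gmail]({link})",
        "is_signature": matches_any(haystack, SIGNATURE_PATTERNS),
        "is_noise": matches_any(haystack, NOISE_PATTERNS),
        "is_new": msg["id"] not in seen_ids,
    }
    seen_ids.add(msg["id"])
    return record


def render_digest(records: list[dict]) -> str:
    """Pre-formatted markdown: the LLM reads this and never writes a link itself."""
    return "\n".join(
        f"- **{r['from']}**: {r['subject']} {r['gmail_markdown']}"
        for r in records if not r["is_noise"]
    )


messages = [
    {"id": "abc123", "sender": "sarah@acmecorp.com", "subject": "Q2 planning", "snippet": "Can we meet?"},
    {"id": "def456", "sender": "news@example.com", "subject": "Weekly digest", "snippet": "Click to unsubscribe"},
]
seen: set[str] = set()
records = [build_record(m, "user@example.com", seen) for m in messages]
print(render_digest(records))
```

Every field in the record is a pure function of the input and the stored state, so running the collector twice on the same data yields the same classification.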
|
||||
|
||||
### The Architecture
|
||||
|
||||
```
+-----------------------------+     +------------------------------+
| Deterministic Collector     |---->| LLM Agent                    |
| (Node.js / Python script)   |     |                              |
|                             |     | - Read the pre-formatted     |
| - Pull data from API        |     |   digest                     |
| - Store structured JSON     |     | - Classify items             |
| - Generate links/URLs       |     | - Add commentary             |
| - Detect patterns (regex)   |     | - Run brain enrichment       |
| - Track state (seen/new)    |     | - Draft replies              |
| - Output markdown digest    |     | - Surface to user            |
|                             |     |                              |
| CODE -- deterministic,      |     | AI -- judgment, context,     |
| never forgets               |     | creativity                   |
+-----------------------------+     +------------------------------+
```
|
||||
|
||||
### File Structure
|
||||
|
||||
```
scripts/email-collector/
├── email-collector.mjs       # No LLM calls, no external deps
└── data/
    ├── state.json            # Last pull timestamp, known IDs, pending signatures
    ├── messages/             # Structured JSON per day
    │   └── 2026-04-09.json
    └── digests/              # Pre-formatted markdown
        └── 2026-04-09.md
```
|
||||
|
||||
### Where the Pattern Applies
|
||||
|
||||
| Signal Source | Collector Generates | LLM Adds |
|--------------|-------------------|----------|
| **Email** | Gmail links, sender metadata, signature detection | Urgency classification, enrichment, reply drafts |
| **X/Twitter** | Tweet links, engagement metrics, deletion detection | Sentiment analysis, narrative detection, content ideas |
| **Calendar** | Event links, attendee lists, conflict detection | Prep briefings, meeting context from brain |
| **Slack** | Channel links, thread links, mention detection | Priority classification, action item extraction |
| **GitHub** | PR/issue links, diff stats, CI status | Code review context, priority assessment |
|
||||
|
||||
### The Principle
|
||||
|
||||
If a piece of output MUST be present and MUST be formatted correctly every
|
||||
time, generate it in code. If a piece of output requires judgment, context,
|
||||
or creativity, generate it with the LLM. Don't ask the LLM to do both in
|
||||
the same pass.
|
||||
|
||||
## Tricky Spots
|
||||
|
||||
1. **LLMs forget links -- bake them in code.** The LLM will follow the
|
||||
"include a Gmail link" rule for the first 10 items, then silently drop
|
||||
it on item 11. No amount of prompt engineering fixes probabilistic
|
||||
formatting over long outputs. The fix: generate every link in the
|
||||
collector script. The LLM reads pre-formatted markdown where links are
|
||||
already embedded. It can't forget what it didn't generate.
|
||||
|
||||
2. **Noise filtering must be deterministic.** Regex-based noise detection
|
||||
(newsletters, automated receipts, marketing) belongs in the collector,
|
||||
not the LLM. The LLM might classify a newsletter as "possibly important"
|
||||
on one run and "noise" on the next. Code classifies the same input the
|
||||
same way every time.
|
||||
|
||||
3. **Atomic writes prevent corruption.** The collector writes to a state
|
||||
file (`state.json`) that tracks which messages have been seen. If the
|
||||
script crashes mid-write, the state file can be corrupted. Write to a
|
||||
temp file first, then rename atomically. This also prevents the LLM
|
||||
from reading a partial digest if the cron fires during a collection run.
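
The write-then-rename step in Tricky Spot 3 looks like this in Python. A minimal sketch: `write_json_atomic` is a hypothetical helper, and the atomicity guarantee relies on the temp file sitting on the same filesystem as the destination.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())      # push bytes to disk before the rename
        os.replace(tmp_path, path)      # atomic rename over the destination
    except BaseException:
        os.unlink(tmp_path)             # clean up the temp file on any failure
        raise


with tempfile.TemporaryDirectory() as workdir:
    state_path = os.path.join(workdir, "state.json")
    write_json_atomic(state_path, {"seen_ids": ["abc123"]})
    write_json_atomic(state_path, {"seen_ids": ["abc123", "def456"]})
    with open(state_path, encoding="utf-8") as fh:
        print(json.load(fh))
```

The same helper works for the digest file: the LLM step either finds yesterday's complete digest or today's complete digest, never a half-written one.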
|
||||
|
||||
## How to Verify
|
||||
|
||||
1. **Run the collector and check every link.** Execute the collector script
|
||||
manually. Open the generated digest. Click every `[Open in Gmail]` link
|
||||
(or equivalent). Every single link must resolve to the correct item. If
|
||||
any link is broken or missing, the collector has a bug.
|
||||
|
||||
2. **Verify noise filtering is consistent.** Run the collector twice on the
|
||||
same input data. The noise classification (is_noise field) must be
|
||||
identical both times. If it varies, a probabilistic element leaked into
|
||||
the deterministic layer.
|
||||
|
||||
3. **Verify the LLM reads structured output.** Run the full pipeline
|
||||
(collector then LLM). Check that the LLM's analysis references data
|
||||
from the structured digest, not from its own generation. The links in
|
||||
the final output should be identical to the links in the digest file.
|
||||
|
||||
---
|
||||
|
||||
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
|
||||
151
docs/guides/diligence-ingestion.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# Diligence Ingestion: Data Room to Brain Pages
|
||||
|
||||
## Goal
|
||||
|
||||
Turn pitch decks, financial models, and data room materials into searchable, cross-referenced brain pages with bull/bear analysis.
|
||||
|
||||
## What the User Gets
|
||||
|
||||
Without this: pitch decks sit in email attachments. Financial models in Google
|
||||
Drive. No cross-reference to the company brain page. You can't search "what
|
||||
were the key metrics from Acme Corp's Series A deck?"
|
||||
|
||||
With this: every data room document is extracted, diarized, cross-referenced to
|
||||
the company page, and searchable. Index.md gives you the bull/bear case at a
|
||||
glance. `gbrain query "Acme Corp revenue growth"` finds the exact chart.
|
||||
|
||||
## Implementation
|
||||
|
||||
Recognize data room materials by any of these signals:

- **PDF filenames** containing "Data Deck", "Intro Deck", "Data Room", "Cap Table", "Financial Model", "Investor Memo", "Pitch Deck", or a series round name.
- **Spreadsheet tabs** named Revenue, Retention, Cohorts, CAC, Gross Margin, Unit Economics, or ARR.
- **User language** such as "data room", "diligence", "deck", "pitch", or "fundraise materials".
|
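
The filename signal listed above is easy to check in code before any LLM is involved. A minimal sketch: `looks_like_data_room` is a hypothetical helper, the phrase list is copied from the text, and the round-name pattern (seed, Series A through F) is an assumption about what "series round names" covers.

```python
import re

FILENAME_SIGNALS = [
    "data deck", "intro deck", "data room", "cap table",
    "financial model", "investor memo", "pitch deck",
]
# Assumed set of round names; extend as needed.
ROUND_NAME = re.compile(r"\b(seed|series [a-f])\b")


def looks_like_data_room(filename: str) -> bool:
    """True if a filename carries one of the data-room signals."""
    # Normalize separators so "Pitch_Deck" and "pitch-deck" both match "pitch deck".
    name = re.sub(r"[_\-.]+", " ", filename).lower()
    return any(s in name for s in FILENAME_SIGNALS) or bool(ROUND_NAME.search(name))


print(looks_like_data_room("Acme-Pitch-Deck-2026.pdf"))
print(looks_like_data_room("Acme_Corp_Series_A_Overview.pdf"))
print(looks_like_data_room("vacation_photos.zip"))
```

This is a heuristic, so false positives are possible (a file named `seed_catalog.pdf` would match); a match is a reason to ask the user, not to start ingestion unprompted.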
||||
|
||||
### The 9-Step Pipeline
|
||||
|
||||
**Step 1: Identify the Company.**
|
||||
From the document content or filename, identify the company name.
|
||||
Check if `brain/companies/{slug}.md` exists.
|
||||
|
||||
**Step 2: Create Diligence Directory.**
|
||||
|
||||
```bash
|
||||
mkdir -p brain/diligence/{company-slug}/.raw
|
||||
```
|
||||
|
||||
**Step 3: Extract Content.**
|
||||
|
||||
- **PDFs:** Use PDF extraction tool. For scanned/image-heavy PDFs,
|
||||
use OCR (e.g., Mistral OCR or similar).
|
||||
- **Spreadsheets:** Export each sheet as CSV. For Google Sheets:
|
||||
```
|
||||
https://docs.google.com/spreadsheets/d/{ID}/gviz/tq?tqx=out:csv&sheet={Sheet Name}
|
||||
```
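
Tab names such as "Unit Economics" contain spaces, so the sheet name has to be percent-encoded before it goes into the export URL shown above. A minimal sketch; `sheet_csv_url` is a hypothetical helper and `SPREADSHEET_ID` is a placeholder.

```python
from urllib.parse import quote


def sheet_csv_url(spreadsheet_id: str, sheet_name: str) -> str:
    """Build the CSV export URL for one tab, percent-encoding the tab name."""
    return (
        f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}"
        f"/gviz/tq?tqx=out:csv&sheet={quote(sheet_name, safe='')}"
    )


print(sheet_csv_url("SPREADSHEET_ID", "Unit Economics"))
```

Building the URL in code, once per tab, follows the deterministic-collector principle: the LLM never has to assemble or remember an export link.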
|
||||
|
||||
**Step 4: Diarize and Save.**
|
||||
Write extracted content to `brain/diligence/{company-slug}/{doc-name}.md`:
|
||||
- Document title and type
|
||||
- Section-by-section breakdown with key metrics
|
||||
- Notable footnotes or caveats
|
||||
- Raw data tables where relevant
|
||||
|
||||
**Step 5: Save Raw Files.**
|
||||
Copy original PDFs/files to `brain/diligence/{company-slug}/.raw/`.
|
||||
Preserve originals for reference. The diarized version is for search.
|
||||
|
||||
**Step 6: Create or Update index.md.**
|
||||
Every diligence directory needs an `index.md`:
|
||||
|
||||
```markdown
|
||||
# {Company Name} — Diligence
|
||||
|
||||
## Round Details
|
||||
- Stage: Series A
|
||||
- Amount: $10M
|
||||
- Date: 2026-04
|
||||
|
||||
## Document Inventory
|
||||
- [Pitch Deck](pitch-deck.md) — 25 slides, company overview + traction
|
||||
- [Financial Model](financial-model.md) — 5 tabs, 3-year projections
|
||||
- [Cap Table](cap-table.md) — current ownership + option pool
|
||||
|
||||
## Key Findings
|
||||
- Revenue growing 30% MoM for last 6 months
|
||||
- CAC payback period: 4 months
|
||||
- Net retention: 135%
|
||||
|
||||
## Bull Case
|
||||
- Strong product-market fit signal (NPS 72)
|
||||
- Expanding into adjacent vertical
|
||||
|
||||
## Bear Case
|
||||
- Single customer represents 40% of revenue
|
||||
- Burn rate increased 3x last quarter
|
||||
|
||||
## Open Questions
|
||||
- What's the path to profitability?
|
||||
- How defensible is the moat?
|
||||
```
|
||||
|
||||
**Step 7: Enrich Company Brain Page.**
|
||||
Update `brain/companies/{slug}.md`:
|
||||
- Add document sources to frontmatter
|
||||
- Update compiled truth with key findings
|
||||
- Add "See Also" link to diligence directory
|
||||
- If no company page exists, create one via the enrich skill
|
||||
|
||||
**Step 8: Commit.**
|
||||
|
||||
```bash
|
||||
cd brain/ && git add -A && git commit -m "diligence: {Company} — {doc type} ingestion" && git push
|
||||
```
|
||||
|
||||
**Step 9: Publish (if asked).**
|
||||
When the user wants a shareable brief, create a password-protected
|
||||
published version. Strip internal notes and raw assessment language.
|
||||
|
||||
### Quality Bar
|
||||
|
||||
A good diligence page reads like an intelligence assessment:
|
||||
- **What they say** vs **what the data shows** (the gap is the insight)
|
||||
- Explicit bull/bear case (not just a summary)
|
||||
- Key metrics highlighted, not buried
|
||||
- Open questions that need answers before decision
|
||||
|
||||
## Tricky Spots
|
||||
|
||||
1. **PDF extraction is lossy.** Scanned decks and image-heavy PDFs lose
|
||||
tables and charts during extraction. Always check the diarized output
|
||||
against the original `.raw/` file. If key metrics are missing, re-extract
|
||||
with OCR or transcribe manually.
|
||||
|
||||
2. **Idempotency on re-ingestion.** If the user sends an updated deck for
|
||||
the same company, don't create a duplicate directory. Check for an existing
|
||||
`brain/diligence/{company-slug}/` and update in place. Append a version
|
||||
suffix to the document file if the old version should be preserved.
|
||||
|
||||
3. **index.md completeness.** The index.md is the entry point for the entire
|
||||
diligence package. If it's missing the bull/bear case or open questions,
|
||||
the diligence is incomplete. Always generate all sections even if some
|
||||
require judgment calls -- flag uncertain assessments explicitly.
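
The version-suffix idea in item 2 can be sketched as follows. The source does not say what the suffix looks like or which file receives it, so this assumes the incoming file gets a `-vN` suffix and the earlier file keeps its name; `next_versioned_name` is a hypothetical helper.

```python
import re
from pathlib import Path


def next_versioned_name(existing: set, filename: str) -> str:
    """Pick a file name that does not clobber an earlier version of the same document."""
    if filename not in existing:
        return filename
    stem, suffix = Path(filename).stem, Path(filename).suffix
    # Strip a trailing -vN so "pitch-deck-v2.md" and "pitch-deck.md" share one base.
    base = re.sub(r"-v\d+$", "", stem)
    version = 2
    while f"{base}-v{version}{suffix}" in existing:
        version += 1
    return f"{base}-v{version}{suffix}"
```

When the old version does not need to be preserved, the caller would skip this helper and overwrite the existing file in place.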

## How to Verify

1. **Search for key metrics.** After ingestion, run
   `gbrain search "revenue growth"` or `gbrain search "{company name} CAC"`.
   The diarized content should appear in results. If it doesn't, the sync
   or embedding step was missed.

2. **Check the company page cross-reference.** Open
   `brain/companies/{slug}.md` and verify it links to the diligence directory.
   The compiled truth section should include key findings from the deck.

3. **Verify index.md has all sections.** Open
   `brain/diligence/{company-slug}/index.md` and confirm it has Round Details,
   Document Inventory, Key Findings, Bull Case, Bear Case, and Open Questions.
   Missing sections mean the pipeline stopped early.
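
The third check is easy to automate. A minimal sketch, assuming each section is a level-2 Markdown heading as in the Step 6 template; `missing_sections` is a hypothetical helper.

```python
REQUIRED_SECTIONS = [
    "Round Details", "Document Inventory", "Key Findings",
    "Bull Case", "Bear Case", "Open Questions",
]


def missing_sections(index_markdown: str) -> list:
    """Return required `## ` headings absent from index.md, in template order."""
    present = {
        line[3:].strip()
        for line in index_markdown.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in present]
```

An empty list means the index is structurally complete; it says nothing about whether the bull and bear cases are any good.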

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
103
docs/guides/enrichment-pipeline.md
Normal file
@@ -0,0 +1,103 @@
# Enrichment Pipeline

## Goal
Enrich brain pages from external APIs with tiered spend -- full pipeline for key people, light touch for passing mentions, raw data preserved for auditability.

## What the User Gets
Without this: brain pages are thin shells with only what the user manually typed, API calls are wasted on nobodies, and enrichment data vanishes after the agent session ends. With this: key people have rich, multi-source portraits; spend scales to importance; raw API responses are preserved for re-processing; and cross-references connect the entire graph.

## Implementation

```
on enrich(signal, trigger):
    # trigger: meeting mention, email thread, social interaction, user request

    # Step 1: Identify entities from the incoming signal
    entities = extract_entities(signal)
    # people names, company names, associations

    # Step 2: Check brain state -- UPDATE or CREATE path?
    # (Steps 2-7 run once per entity.)
    for entity in entities:
        existing = gbrain search "{entity.name}"
        if existing:
            page = gbrain get <entity_slug>
            path = "UPDATE"
        else:
            path = "CREATE"

        # Step 3: Determine tier -- scale spend to importance
        tier = classify_tier(entity)
        # Tier 1 (10-15 API calls): key people, inner circle, business partners,
        #   portfolio companies. Full pipeline, ALL data sources.
        # Tier 2 (3-5 API calls): notable people, occasional interactions.
        #   Web search + social + brain cross-reference.
        # Tier 3 (1-2 API calls): minor mentions, everyone else worth tracking.
        #   Brain cross-reference + social lookup if handle known.

        # Step 4: Run external lookups (priority order, stop when enough signal)
        data = {}
        data["brain"] = gbrain search "{entity.name}"          # Always first (free)
        if tier <= 2:
            data["web"] = brave_search("{entity.name}")        # Background, press, talks
        if entity.handle:                                      # Any tier, when the handle is known
            data["twitter"] = twitter_lookup(entity.handle)    # Beliefs, building, network
        if tier == 1:
            data["linkedin"] = crustdata_enrich(entity.name)   # Career, connections
            data["research"] = happenstance_research(entity)   # Career arcs, web presence
            data["funding"] = captain_api(entity.company)      # Funding, valuation, team
            data["meetings"] = circleback_search(entity.name)  # Transcript search
            data["contacts"] = google_contacts(entity.email)   # Contact data

        # Step 5: Store raw data (auditable, re-processable)
        gbrain put_raw_data <entity_slug> \
            --data '{"sources": {"crustdata": {"fetched_at": "...", "data": {...}}, ...}}'
        # Overwrite on re-enrichment, don't append

        # Step 6: Write to brain page
        if path == "CREATE":
            gbrain put <entity_slug> --content "<compiled_truth_from_all_sources>"
            gbrain add_timeline_entry <entity_slug> --entry "Page created via enrichment"
        elif path == "UPDATE":
            # Append timeline, update compiled truth ONLY if materially new
            gbrain add_timeline_entry <entity_slug> --entry "Enriched: {new_signal}"
            # Flag contradictions -- don't silently resolve them

        # Step 7: Cross-reference the graph
        gbrain add_link <person_slug> <company_slug>   # person -> company
        gbrain add_link <company_slug> <person_slug>   # company -> person
        gbrain add_link <person_slug> <deal_slug>      # person -> deal
        # Every entity page links to every other entity page that references it

# People page sections (not a LinkedIn profile -- a living portrait):
#   Executive Summary, State, What They Believe, What They're Building,
#   What Motivates Them, Assessment, Trajectory, Relationship, Contact, Timeline
# Facts are table stakes. TEXTURE is the value.

# Extract texture, not just facts:
#   Opinion expressed?          -> What They Believe
#   Building or shipping?       -> What They're Building
#   Emotion expressed?          -> What Makes Them Tick
#   Who did they engage with?   -> Network / Relationship
#   Recurring topic?            -> Hobby Horses
#   Committed to something?     -> Open Threads
#   Energy level?               -> Trajectory
```
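
The tier rules in Steps 3 and 4 can be written as a small runnable function. This sketch reads the tier comments literally, so Tier 3 gets the social lookup whenever a handle is known; the source names (`"web"`, `"twitter"`, and so on) are the `data[...]` keys from the pseudocode, while `sources_for_tier` itself is a hypothetical helper.

```python
def sources_for_tier(tier: int, has_handle: bool) -> list:
    """Return lookup names in priority order for a tier (1 = most important)."""
    if tier not in (1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    sources = ["brain"]               # always first: free
    if tier <= 2:
        sources.append("web")
    if has_handle:
        sources.append("twitter")     # social lookup needs a known handle
    if tier == 1:
        sources += ["linkedin", "research", "funding", "meetings", "contacts"]
    return sources
```

Returning an ordered list rather than making the calls keeps the "stop when enough signal" decision with the caller, which can break out of the loop early.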

## Tricky Spots

1. **Don't overwrite human-written assessments.** If the user wrote an Assessment section with their own read on someone, API enrichment NEVER overwrites it. API data goes into State, Contact, Timeline. The user's assessment is sacrosanct.
2. **Don't re-enrich the same page more than once per week.** Check `put_raw_data` timestamps before running the pipeline again. Enrichment is expensive and data doesn't change that fast.
3. **LinkedIn connection count < 20 means wrong person.** Crustdata sometimes returns a different person with the same name. If the LinkedIn profile has fewer than 20 connections, it's almost certainly a false match. Discard it.
4. **X/Twitter is the most underrated data source.** When you have someone's handle, their tweets reveal beliefs, what they're building, hobby horses, network (reply patterns), and trajectory (posting frequency, tone shifts). This is richer than LinkedIn for "What They Believe" and "What Makes Them Tick."
5. **Cross-references are not optional.** After enriching a person, update their company page. After enriching a company, update founder pages. An enriched page without cross-links is a dead end in the graph.
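
Items 2 and 3 are simple guards. A minimal sketch, assuming `fetched_at` has already been parsed into a timezone-aware `datetime`; the function names and constants are hypothetical, and the thresholds are the ones stated above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REENRICH_INTERVAL = timedelta(days=7)
MIN_LINKEDIN_CONNECTIONS = 20


def should_enrich(last_fetched_at: Optional[datetime], now: datetime) -> bool:
    """True when the page was never enriched or the last fetch is at least a week old."""
    if last_fetched_at is None:
        return True
    return now - last_fetched_at >= REENRICH_INTERVAL


def is_plausible_linkedin_match(connection_count: int) -> bool:
    """Profiles with fewer than 20 connections are treated as a same-name false match."""
    return connection_count >= MIN_LINKEDIN_CONNECTIONS
```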

## How to Verify

1. Enrich a Tier 1 person. Run `gbrain get <slug>` and confirm the page has Executive Summary, State, What They Believe, Contact, and Timeline sections populated from multiple sources.
2. Run `gbrain get_raw_data <slug>`. Confirm raw API responses are stored with `sources.{provider}.fetched_at` timestamps.
3. Run `gbrain get_links <slug>`. Confirm cross-reference links exist to the person's company page, deal pages, and related entities.
4. Check a page that was enriched AND has a user-written Assessment. Confirm the Assessment section was preserved, not overwritten by API data.
5. Try to re-enrich the same person. Confirm the system checks the `fetched_at` timestamp and skips if less than a week old.

---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
190
docs/guides/entity-detection.md
Normal file
@@ -0,0 +1,190 @@
# Entity Detection: Run It on Every Message

## Goal

Every inbound message gets scanned for original thinking AND entity mentions so the brain grows on every conversation, automatically.

## What the User Gets

Without this: the agent answers questions but forgets everything. You mention
Pedro in a meeting, and next week the agent doesn't know who Pedro is.

With this: every person, company, and idea mentioned in conversation gets a
brain page. Next time Pedro comes up, the agent already has context. The
brain compounds.

## Implementation

Spawn a lightweight sub-agent on EVERY inbound message. Do NOT wait for it
to finish before responding. It runs in parallel.

```
on_every_message(message_text, source_context):

    // 1. SPAWN ASYNC — don't block the response
    spawn_subagent({
        model: "sonnet-class",     // cheap + fast, not opus
        timeout: 120,              // seconds
        task: build_detection_prompt(message_text, source_context)
    })

    // 2. RESPOND TO USER NORMALLY
    // The sub-agent runs in the background
```
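
The spawn-and-continue pattern above can be illustrated with `asyncio`. This is only a sketch of the control flow, under the assumption that the agent runtime is async Python: `detect_signals` is a stand-in for the real sub-agent, and none of these names come from GBrain.

```python
import asyncio

background_tasks = set()  # keep references so tasks are not garbage-collected mid-run


async def detect_signals(message_text: str, results: list) -> None:
    """Stand-in for the detection sub-agent (hypothetical; real work would call a model)."""
    await asyncio.sleep(0.01)
    results.append(f"scanned: {message_text}")


async def on_every_message(message_text: str, results: list) -> str:
    # 1. Spawn the detector with a 120-second timeout, but do not await it here.
    task = asyncio.create_task(
        asyncio.wait_for(detect_signals(message_text, results), timeout=120)
    )
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    # 2. Respond to the user immediately.
    return f"reply to: {message_text}"


async def demo() -> tuple:
    results = []
    reply = await on_every_message("coffee with Sarah Chen", results)
    pending_at_reply = len(results) == 0   # detector has not finished yet
    await asyncio.gather(*background_tasks)
    return reply, pending_at_reply, results
```

Running `asyncio.run(demo())` shows the reply being produced before the detector has appended anything, which is the property the guide asks for.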

### The Detection Prompt

```
build_detection_prompt(text, source):
    return `
    SIGNAL DETECTION — scan this message for ideas AND entities:

    Message: "${text}"
    Source: [Source: User, ${source.topic}, ${source.platform}, ${source.timestamp}]

    STEP 1 — IDEAS FIRST (highest priority):
    Is the user expressing an original thought, observation, thesis, or framework?

    If yes:
    - Create or update brain/originals/{slug}.md
    - Use the user's EXACT phrasing (the language IS the insight)
    - "The ambition-to-lifespan ratio has never been more broken" is better
      than "tension between ambition and mortality"
    - Include [Source: ...] citation with full context

    If the idea references a world concept: brain/concepts/{slug}.md
    If it's a product/business idea: brain/ideas/{slug}.md

    STEP 2 — ENTITIES:
    Extract all person names, company names, media titles.

    For each entity:
    a. Run: gbrain search "{name}"
    b. If page exists AND new info: append timeline entry
       Format: - YYYY-MM-DD | {what happened} [Source: {who}, {context}, {date}]
    c. If no page AND entity is notable: create page with web enrichment
    d. If page is thin (< 5 lines compiled truth): spawn background enrichment

    STEP 3 — BACK-LINKING (mandatory):
    For every entity mentioned, add a back-link FROM their page TO this source.
    An unlinked mention is a broken brain.
    Format: - **YYYY-MM-DD** | Referenced in [{page title}]({path}) — {context}

    STEP 4 — SYNC:
    Run: gbrain sync --no-pull --no-embed

    If nothing to capture, reply "No signals detected" and exit.
    `
```
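
The two line formats named in Steps 2 and 3 of the prompt are easy to get subtly wrong by hand. A small sketch that renders them from structured values; both function names are hypothetical, and the format strings are copied from the prompt.

```python
from datetime import date


def timeline_entry(day: date, what: str, who: str, context: str) -> str:
    """Render the Step 2b timeline format."""
    stamp = day.isoformat()
    return f"- {stamp} | {what} [Source: {who}, {context}, {stamp}]"


def back_link(day: date, page_title: str, path: str, context: str) -> str:
    """Render the Step 3 back-link format."""
    # \u2014 is the dash character used as the separator in the prompt's format.
    return f"- **{day.isoformat()}** | Referenced in [{page_title}]({path}) \u2014 {context}"
```

For example, `timeline_entry(date(2026, 4, 10), "Presented Q1 numbers", "User", "board meeting")` yields a line ready to append under a page's Timeline heading.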

### Notability Filtering

Before creating a new entity page, check notability:

```
is_notable(entity):
    // CREATE a page for:
    - People the user knows or discusses with specificity
    - Companies the user is evaluating, working with, or investing in
    - Media the user mentions with personal reaction
    - Anyone the user has explicitly engaged with

    // DON'T create a page for:
    - Generic references or passing examples
    - Low-engagement accounts who mentioned the user once
    - Pure metaphors ("like the Roman Empire...")
    - One-off encounters with no follow-up

    // If notable AND no page: create FULL page (not a stub)
    // If not notable: skip silently
```

### What Counts as Original Thinking

| Capture | Don't Capture |
|---------|---------------|
| Original observations about how the world works | "ok", "do it", "sure" |
| Novel connections between disparate things | Pure questions without observations |
| Frameworks and mental models | Echoing back what the agent said |
| Pattern recognition ("I keep seeing X in every Y") | Acknowledgments and reactions |
| Hot takes with reasoning | Routine operational messages |
| Metaphors that reveal new angles | Requests without embedded insight |

### Filing Rules

| Signal | Destination |
|--------|-------------|
| User generated the idea | `brain/originals/{slug}.md` |
| User's synthesis of others' ideas | `brain/originals/` (the synthesis is original) |
| World concept someone else coined | `brain/concepts/{slug}.md` |
| Product or business idea | `brain/ideas/{slug}.md` |
| Person mentioned | `brain/people/{slug}.md` |
| Company mentioned | `brain/companies/{slug}.md` |
| Media referenced | `brain/media/{type}/{slug}.md` |

### The Iron Law of Back-Linking

Every entity mention MUST create a back-link FROM the entity page TO the
source. This is not optional.

```
// When message mentions "Pedro" and creates a meeting page:

// 1. Update the meeting page (normal)
brain/meetings/2026-04-10-board-sync.md:
  - Pedro presented Q1 numbers

// 2. ALSO update Pedro's page (back-link)
brain/people/pedro-franceschi.md:
  ## Timeline
  - **2026-04-10** | Presented Q1 numbers at board sync
    [Source: User, board meeting, 2026-04-10]
```

Without back-links, you can't traverse the graph. "Show me everything related
to Pedro" only works if Pedro's page links back to every mention.

## Tricky Spots

1. **Don't block the conversation.** Entity detection runs async. The user
   should see a response immediately, not wait 2 minutes while the sub-agent
   enriches 5 entity pages.

2. **Sonnet, not Opus.** Entity detection is pattern matching, not deep
   reasoning. Sonnet is 5-10x cheaper and fast enough. Use Opus for the
   main conversation.

3. **Exact phrasing matters.** "Markdown is actually code" is an insight.
   "Markdown can be used as code" is a summary. Capture the first version.

4. **Don't create stubs.** If you create a page, make it good. Run a web
   search, build out the compiled truth, add context. A stub page with just
   a name is worse than no page (it gives false confidence).

5. **Dedup before creating.** Always `gbrain search` before creating a page.
   Variant spellings, nicknames, and company abbreviations cause duplicates.
   "Pedro Franceschi" and "Pedro" might be the same person.
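
One way to surface likely duplicates before creating a page is a token-subset check on names. This is an illustrative heuristic, not something the guide specifies: it only flags candidates for a human or agent to confirm, and `possible_duplicates` is a hypothetical helper that would run on the results of `gbrain search`.

```python
def name_tokens(name: str) -> list:
    """Split a name into lowercase tokens, ignoring periods (e.g. in initials)."""
    return [t for t in name.lower().replace(".", " ").split() if t]


def possible_duplicates(new_name: str, existing_names: list) -> list:
    """Existing names where one name's tokens are a subset of the other's."""
    new = set(name_tokens(new_name))
    matches = []
    for candidate in existing_names:
        old = set(name_tokens(candidate))
        if new and old and (new <= old or old <= new):
            matches.append(candidate)
    return matches
```

A bare first name can match several people, so more than one hit should be treated as "ask or use context", not as an automatic merge.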

## How to Verify

1. **Send a message mentioning a person.** Say "I had coffee with Sarah Chen
   from Acme Corp today." Verify: brain/people/sarah-chen.md was created or
   updated, brain/companies/acme-corp.md was created or updated, both have
   timeline entries with today's date.

2. **Send a message with an original idea.** Say "What if we could distribute
   software as markdown files that agents execute?" Verify:
   brain/originals/{slug}.md was created with your exact phrasing.

3. **Check back-links.** Open Sarah Chen's page. It should have a timeline
   entry linking back to today's conversation. Open Acme Corp's page. Same.

4. **Send a boring message.** Say "ok sounds good." Verify: nothing was
   created. The detector should report "No signals detected."

5. **Check for duplicates.** Mention "Pedro" then later "Pedro Franceschi."
   Verify: one page, not two.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
109
docs/guides/executive-assistant.md
Normal file
@@ -0,0 +1,109 @@
# Executive Assistant Pattern

## Goal
Email triage, meeting prep, and scheduling powered by brain context -- so every interaction is informed by the full history of the relationship.

## What the User Gets
Without this: the agent triages email mechanically ("you have 12 unread"), preps for meetings with generic LinkedIn bios, and schedules without relationship context. With this: the agent knows who every sender is before reading their email, surfaces shared history before every meeting, and nudges scheduling based on relationship temperature and open threads.

## Implementation

```
# WORKFLOW 1: Email Triage
on email_batch(emails):
    for email in emails:
        # Step 1: Search sender BEFORE reading the email body
        # Brain context makes triage 10x better
        context = None                       # stays None for unknown senders
        sender_page = gbrain search "{email.sender_name}"
        if sender_page:
            context = gbrain get <sender_slug>
            # Now you know: who they are, relationship history,
            # what they care about, open threads

        # Step 2: Read the email WITH brain context loaded
        # Classification is now informed, not mechanical

        # Step 3: Classify with context
        if context and (context.relationship == "inner_circle" or context.has_open_threads):
            priority = "urgent"
        elif context and context.is_known_entity:
            priority = "normal"
        else:
            priority = "noise"  # unknown sender, no brain page

        # Step 4: Draft reply with relationship context
        if context and needs_reply(email):
            draft = compose_reply(
                email,
                context=context,                      # their brain page
                open_threads=context.open_threads,    # what you're working on together
                relationship=context.relationship     # tone calibration
            )

# WORKFLOW 2: Meeting Prep
on upcoming_meeting(meeting):
    briefing = {}
    for attendee in meeting.attendees:
        # Search brain for each attendee
        results = gbrain search "{attendee.name}"
        if results:
            page = gbrain get <attendee_slug>
            briefing[attendee] = {
                "compiled_truth": page.compiled_truth,
                "last_interaction": page.timeline[0],  # most recent
                "open_threads": page.open_threads,
                "relationship_temperature": page.relationship,
                "relevant_deals": gbrain get_links <attendee_slug>,
            }
        else:
            briefing[attendee] = "No brain page -- consider enriching"

    # Surface: shared history, what to follow up on, what to watch for
    # "Last time you discussed the Series B timeline. Pedro was concerned
    #  about burn rate. Here's the latest from his company page."

# WORKFLOW 3: Post-Inbox Brain Updates
on inbox_cleared():
    for email in processed_emails:
        if email.contained_new_information:
            # Update the sender's brain page with new signal
            gbrain add_timeline_entry <sender_slug> \
                --entry "Email re: {subject}. Key info: {extracted_signal}" \
                --source "email from {sender} re {subject}, {date}"

            # Update any mentioned entity pages too
            for entity in email.mentioned_entities:
                gbrain add_timeline_entry <entity_slug> \
                    --entry "{what_was_said_about_them}" \
                    --source "email from {sender}, {date}"

# WORKFLOW 4: Scheduling Nudges
on schedule_request(meeting):
    for attendee in meeting.attendees:
        page = gbrain get <attendee_slug>
        if page.last_interaction < 6_weeks_ago:   # last contact is older than six weeks
            nudge("You haven't met with {attendee} in {weeks} weeks")
        if page.has_open_threads:
            nudge("{attendee} has an open thread about {topic}")
        if page.relationship_temperature == "cooling":
            nudge("Relationship with {attendee} may need attention")
```
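
The triage classification in Workflow 1 has one case that is easy to miss: a sender with no brain page has no context object at all. A minimal runnable sketch that makes that case explicit; `SenderContext` is a hypothetical slice of a brain page, not a GBrain type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderContext:
    """Hypothetical subset of a sender's brain page used for triage."""
    relationship: str = "known"
    has_open_threads: bool = False


def classify_priority(context: Optional[SenderContext]) -> str:
    """Mirror Step 3 of the triage workflow; None means no brain page was found."""
    if context is None:
        return "noise"
    if context.relationship == "inner_circle" or context.has_open_threads:
        return "urgent"
    return "normal"
```

Treating "no page" as `None` rather than an empty object keeps the unknown-sender branch visible, so the email body can still override a "noise" default.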

## Tricky Spots

1. **Search sender BEFORE reading the email.** This is counterintuitive but critical. Loading brain context first means you know who they are, what you're working on together, and what they care about -- before you even see the subject line. The triage is informed, not mechanical.
2. **Unknown senders with no brain page are almost always noise.** If `gbrain search` returns nothing for a sender, they're probably not important. Classify as low priority unless the email content signals otherwise.
3. **Meeting prep is the highest-leverage EA workflow.** The user walks into every meeting already briefed on each attendee: last interaction, open threads, relationship history. This is the difference between "you have a meeting at 3" and "you have a meeting at 3 with Pedro -- last time you discussed the Series B, he was concerned about burn rate."
4. **Post-inbox brain updates are where the brain compounds.** Every email is signal. If you clear the inbox without updating brain pages, the information is lost. This is the step most agents skip.
5. **Scheduling nudges require timeline data.** "You haven't met with Diana in 6 weeks" only works if meeting pages have been ingested with proper entity propagation (see meeting-ingestion guide).
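
The six-week nudge in item 5 comes down to date arithmetic on the last timeline entry. A minimal sketch, assuming the last interaction date has already been read from the attendee's page; `meeting_gap_nudge` and its `threshold_weeks` parameter are hypothetical.

```python
from datetime import date
from typing import Optional


def meeting_gap_nudge(attendee: str, last_interaction: date, today: date,
                      threshold_weeks: int = 6) -> Optional[str]:
    """Return a nudge when the last interaction is at least `threshold_weeks` old."""
    weeks = (today - last_interaction).days // 7
    if weeks >= threshold_weeks:
        return f"You haven't met with {attendee} in {weeks} weeks"
    return None
```

If the timeline is empty because meetings were never ingested, there is no `last_interaction` to pass in, which is exactly the failure mode item 5 warns about.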

## How to Verify

1. Run meeting prep for tomorrow's calendar. For each attendee, confirm the agent ran `gbrain search` and loaded their brain page before generating the briefing.
2. Triage 5 emails. Confirm the agent searched for each sender in the brain before classifying the email.
3. After clearing an inbox, check 2 sender brain pages with `gbrain get <slug>`. Confirm new timeline entries were added with information from the emails.
4. Check a scheduling suggestion. Confirm the agent referenced the attendee's brain page (last interaction date, open threads) in the nudge.
5. Send a test email from someone with a brain page. Confirm the triage response references their relationship context, not just the email content.

---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
190
docs/guides/idea-capture.md
Normal file
@@ -0,0 +1,190 @@
# Idea Capture: Originals, Depth, and Distribution

## Goal

Capture the user's original thinking with exact phrasing, deep context, and cross-links so the originals folder becomes the highest-value content in the brain.

## What the User Gets

Without this: brilliant ideas said in conversation disappear. The agent heard
"the ambition-to-lifespan ratio has never been more broken" and forgot it.

With this: every original observation is captured verbatim, cross-linked to
the people and ideas that shaped it, and rated for publishing potential. Your
intellectual archive grows with every conversation.

## Implementation

```
capture_idea(message_text, source_context):

    // 1. AUTHORSHIP TEST — where does this idea belong?
    if user_generated_the_idea(message_text):
        destination = "brain/originals/{slug}.md"
    elif user_synthesis_of_others(message_text):
        destination = "brain/originals/{slug}.md"   // synthesis IS original
    elif world_concept(message_text):
        destination = "brain/concepts/{slug}.md"
    elif product_or_business_idea(message_text):
        destination = "brain/ideas/{slug}.md"
    elif ghostwritten_by_user(message_text):
        destination = "brain/originals/{slug}.md"   // note ghostwriter in metadata
    elif article_about_user(message_text):
        destination = "brain/media/writings/{slug}.md"

    // 2. CAPTURE WITH EXACT PHRASING — never paraphrase
    page = create_or_update(destination, {
        content: message_text,             // verbatim, not summarized
        source: source_context,            // conversation, meeting, moment
        reasoning_path: influences,        // what led to the insight
        depth_context: emotional_nuance    // the WHY behind the WHAT
    })

    // 3. ORIGINALITY RATING (for notable ideas)
    if is_notable(message_text):
        rate_originality(page, populations=[
            "general_population", "tech_industry",
            "intellectual_media", "political_establishment"
        ])

    // 4. CROSS-LINK (mandatory — an original without links is dead)
    link_to_people(page, mentioned_people)
    link_to_companies(page, mentioned_companies)
    link_to_meetings(page, source_meeting)
    link_to_media(page, influences)
    link_to_other_originals(page, related_ideas)
    link_to_concepts(page, referenced_concepts)

    // 5. SYNC
    gbrain sync --no-pull --no-embed
```

### The Authorship Test

| Signal | Destination |
|--------|-------------|
| User generated the idea | `brain/originals/{slug}.md` |
| User's unique synthesis of others' ideas | `brain/originals/` (the synthesis is original) |
| World concept someone else coined | `brain/concepts/{slug}.md` |
| Product or business idea | `brain/ideas/{slug}.md` |
| User's ghostwritten book/essay | `brain/originals/` (note ghostwriter in metadata) |
| Article ABOUT user | `brain/media/writings/` |
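
The table above is effectively a lookup from authorship signal to directory. A minimal sketch of that routing; the signal keys are hypothetical labels invented for this example, and deciding which signal applies to a message remains a judgment call for the agent.

```python
DESTINATION_DIRS = {
    "user_idea": "brain/originals",
    "user_synthesis": "brain/originals",         # the synthesis is original
    "world_concept": "brain/concepts",
    "product_or_business_idea": "brain/ideas",
    "ghostwritten_for_user": "brain/originals",  # note ghostwriter in metadata
    "article_about_user": "brain/media/writings",
}


def destination_for(signal: str, slug: str) -> str:
    """Map an authorship signal to the page path given in the table."""
    try:
        return f"{DESTINATION_DIRS[signal]}/{slug}.md"
    except KeyError:
        raise ValueError(f"unknown authorship signal: {signal}") from None
```

Failing loudly on an unknown signal is deliberate: a silently misfiled original is harder to notice later than an error at capture time.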

### Capture Standards

**Use the user's EXACT phrasing.** The language IS the insight.

"The ambition-to-lifespan ratio has never been more broken" captures something that
"tension between ambition and mortality" doesn't. Don't clean it up. Don't paraphrase.
The vivid version is the real version.

**What counts as worth capturing:**
- Original observations about how the world works
- Novel connections between disparate things
- Frameworks and mental models
- Pattern recognition moments ("I keep seeing X in every Y")
- Hot takes with reasoning behind them
- Metaphors that reveal new angles
- Emotional/psychological insights about self or others

**What does NOT count:**
- Routine operational messages ("ok", "do it")
- Pure questions without embedded observations
- Echoing back something the agent said
- Acknowledgments and reactions

### The Depth Test

**Could someone unfamiliar with the user read this page and understand not
just WHAT they think but WHY and HOW they got there?**

If the answer is no, it needs more depth. Include:
- The reasoning path (what led to the insight)
- The influences (what they were reading/watching/experiencing)
- The context (conversation, meeting, moment)
- The emotional or psychological nuance

### Originality Distribution Rating

For notable ideas, rate originality 0-100 across different populations:

```markdown
## Originality Distribution

- **General population:** 72/100 — most people haven't encountered this framework
- **Tech industry:** 45/100 — common in startup circles but novel to most
- **Intellectual/media class:** 68/100 — would resonate, not yet articulated
- **Political establishment:** 82/100 — completely foreign to policy thinking

**Publish signal:** Strong essay candidate. Best audience: founders, builders.
```

This tells the user which ideas are worth turning into essays, talks, or videos,
and which audience would find them most novel.

### Deep Cross-Linking Mandate

**An original without cross-links is a dead original.** The connections ARE
the intelligence.

Every original MUST link to:
- **People** who shaped the thinking
- **Companies** where the idea played out
- **Meetings** where it was discussed
- **Books and media** that influenced it
- **Other originals** it connects to (ideas form clusters)
- **Concepts** it builds on or challenges

### Notability Filtering

Before creating any entity page, check notability:

**Create a page for:**
- People you know or discuss with specificity
- Companies you're evaluating, working with, or investing in
- Media you mention with personal reaction
- Anyone you've explicitly engaged with

**Don't create pages for:**
- Generic references or passing examples
- Low-engagement accounts who mentioned you once
- Pure metaphors ("like the Roman Empire...")
- One-off encounters with no follow-up

**Decision:** If notable AND no page exists, create a full page with web
search enrichment. No stubs. If you make a page, make it good.

## Tricky Spots

1. **Synthesis IS original.** When the user connects two existing ideas in a
   new way, that synthesis belongs in `brain/originals/`, not `brain/concepts/`.
   The novel combination is the insight, even if the component ideas aren't new.

2. **Exact phrasing is non-negotiable.** Never paraphrase, summarize, or
   "clean up" the user's language. "The ambition-to-lifespan ratio has never
   been more broken" is the insight. "Tension between ambition and mortality"
   is a corpse. Capture the first version.

3. **Cross-links are mandatory, not optional.** An original without links to
   the people, companies, meetings, and concepts that shaped it is a dead
   original. The connections ARE the intelligence. Check every original for
   at least 2 cross-links before considering it captured.

## How to Verify

1. **Generate an idea and check the page.** Say something original in
   conversation (e.g., "What if markdown files are actually distributed
   software?"). Verify that `brain/originals/{slug}.md` was created with
   your exact phrasing, not a paraphrase.

2. **Check cross-links exist.** Open the newly created original page. It
   should link to at least the people or concepts mentioned. Open those
   linked pages and verify they back-link to the original.

3. **Verify the depth test passes.** Read the captured page as if you were
   a stranger. Can you understand not just WHAT the user thinks but WHY?
   If the reasoning path and context are missing, the capture is incomplete.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
138
docs/guides/live-sync.md
Normal file
@@ -0,0 +1,138 @@
# Live Sync: Keep the Index Current

## Goal

Every markdown change in the brain repo is searchable within minutes, automatically, with no manual intervention.

## What the User Gets

Without this: you correct a hallucination in a brain page, but the vector DB
keeps serving the old text because nobody ran `gbrain sync`. Stale search
results erode trust. The brain becomes unreliable.

With this: edits show up in search within minutes. The vector DB stays current
with the brain repo automatically. You never have to remember to run sync.

## Implementation

### Prerequisite: Session Mode Pooler

Sync uses `engine.transaction()` on every import. If `DATABASE_URL` points to
Supabase's **Transaction mode** pooler, sync will throw `.begin() is not a
function` and **silently skip most pages**. This is the number one cause of
"sync ran but nothing happened."

Fix: use the **Session mode** pooler string (port 5432 on the pooler host) or
the direct connection (port 5432 on the database host, IPv6-only). Port 6543 is
the Transaction mode pooler. Verify by running `gbrain sync` and checking that
the page count in `gbrain stats` matches the syncable file count in the repo.

### The Primitives

Always chain sync + embed:

```bash
gbrain sync --repo /path/to/brain && gbrain embed --stale
```

- `gbrain sync --repo <path>` -- one-shot incremental sync. Detects changes via
  `git diff`, imports only what changed. For small changesets (<= 100 files),
  embeddings are generated inline during import.
- `gbrain embed --stale` -- backfill embeddings for any chunks that don't have
  them. Safety net for large syncs (>100 files) or prior `--no-embed` runs.
- `gbrain sync --watch --repo <path>` -- foreground polling loop, every 60s
  (configurable with `--interval N`). Embeds inline for small changesets. Exits
  after 5 consecutive failures, so run under a process manager or pair with a
  cron fallback.

### Approach 1: Cron Job (recommended)

Run every 5-30 minutes. Works with any cron scheduler.

```bash
gbrain sync --repo /data/brain && gbrain embed --stale
```

**OpenClaw:**
```
Name: gbrain-auto-sync
Schedule: */15 * * * *
Prompt: "Run: gbrain sync --repo /data/brain && gbrain embed --stale
  Log the result. If sync fails with .begin() is not a function,
  the DATABASE_URL is using Transaction mode pooler."
```

**Hermes:**
```
/cron add "*/15 * * * *" "Run gbrain sync --repo /data/brain &&
  gbrain embed --stale. Log the result." --name "gbrain-auto-sync"
```

### Approach 2: Long-Lived Watcher

For near-instant sync (60s polling). Run under a process manager that
auto-restarts on exit. Pair with a cron fallback since `--watch` exits
on repeated failures.

```bash
gbrain sync --watch --repo /data/brain
```

### Approach 3: Git Hook / Webhook

Triggers sync on push events for instant sync (<5s).

- **GitHub webhook:** Set up the webhook to call
  `gbrain sync --repo /data/brain && gbrain embed --stale`.
  Verify `X-Hub-Signature-256` against a shared secret.
- **Git post-receive hook:** If the brain repo is on the same machine.

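Verifying `X-Hub-Signature-256` comes down to recomputing an HMAC-SHA256 over the raw request body and comparing it in constant time. The sketch below assumes you already have the raw body bytes and the header value from your web framework; the function name and parameters are illustrative, not part of GBrain.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check a GitHub-style 'sha256=<hex digest>' signature header.

    secret       -- the shared secret configured on the webhook
    raw_body     -- the request body exactly as received (before JSON parsing)
    header_value -- the X-Hub-Signature-256 header, or "" if it was missing
    """
    if not header_value.startswith("sha256="):
        return False
    digest = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

Only run `gbrain sync --repo /data/brain && gbrain embed --stale` after this returns `True`. Reject the request otherwise.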
### What Gets Synced

Sync only indexes "syncable" markdown files. These are excluded by design:
- Hidden paths (`.git/`, `.raw/`, etc.)
- The `ops/` directory
- Meta files: `README.md`, `index.md`, `schema.md`, `log.md`

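The exclusion rules above can be approximated with a path filter, which is handy when counting "syncable" files to compare against `gbrain stats`. This is a sketch under stated assumptions, not GBrain's actual implementation: it assumes `ops/` is a top-level directory, that meta files are excluded at any depth, and that only `.md` files count. The name `is_syncable` is made up for this example.

```python
from pathlib import PurePosixPath

# Assumption: these file names are excluded wherever they appear in the repo.
META_FILES = {"README.md", "index.md", "schema.md", "log.md"}


def is_syncable(relative_path: str) -> bool:
    """Return True if a repo-relative path looks like a syncable brain page."""
    path = PurePosixPath(relative_path)
    if path.suffix.lower() != ".md":
        return False
    # Hidden paths: any component starting with a dot (.git/, .raw/, ...).
    if any(part.startswith(".") for part in path.parts):
        return False
    # Assumption: ops/ lives at the top level of the brain repo.
    if path.parts and path.parts[0] == "ops":
        return False
    if path.name in META_FILES:
        return False
    return True
```

Summing `is_syncable(p)` over the output of `git ls-files` gives an approximate file count to hold against the page count.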
### Sync is Idempotent

Concurrent runs are safe. Two syncs on the same commit no-op because content
hashes match. If both a cron and `--watch` fire simultaneously, no conflict.

## Tricky Spots

1. **Always chain sync + embed.** Running `gbrain sync` without
   `gbrain embed --stale` leaves new chunks without embeddings. They exist
   in the database but are invisible to vector search. Always run both
   commands together. The `&&` ensures embed only runs if sync succeeds.

2. **--watch polls, it doesn't stream.** The `--watch` flag polls every 60s
   (configurable). It is not a filesystem watcher or git hook. It exits after
   5 consecutive failures, so it needs a process manager (systemd, pm2) or a
   cron fallback to stay alive. Don't assume it runs forever.

3. **Webhook needs the server running.** If you use a GitHub webhook for
   instant sync, the receiving server must be running and reachable. If the
   server is down when a push happens, that sync is missed. Pair webhooks
   with a cron fallback that catches anything the webhook missed.

## How to Verify

1. **Edit a file and search for the change.** Edit a brain markdown file,
   commit, and push. Wait for the next sync cycle (cron interval or `--watch`
   poll). Run `gbrain search "<text from the edit>"`. The updated content
   should appear in results. If it returns old content, sync failed.

2. **Compare page count to file count.** Run `gbrain stats` and count the
   syncable markdown files in the brain repo. The page count in the database
   should match. If they diverge, files are being silently skipped (likely
   a Transaction mode pooler issue).

3. **Check embedded chunk count.** In `gbrain stats`, the embedded chunk
   count should be close to the total chunk count. A large gap means
   `gbrain embed --stale` isn't running after sync, leaving chunks invisible
   to vector search.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
80
docs/guides/meeting-ingestion.md
Normal file
@@ -0,0 +1,80 @@
# Meeting Ingestion

## Goal
Meeting transcripts become brain pages that update every mentioned entity -- attendees, companies, deals, and action items all propagated in one pass.

## What the User Gets
Without this: meetings vanish into memory, action items are forgotten, and the agent has no idea what was discussed last time you met someone. With this: every meeting is a permanent record that enriches every person and company page it touches, and the user walks into every follow-up already briefed.

## Implementation

```
on new_meeting_transcript(meeting):
    # Step 1: Pull the COMPLETE transcript -- NOT the AI summary
    # AI summaries hallucinate framing ("it was agreed that...")
    # The transcript is ground truth
    transcript = fetch_full_transcript(meeting.id)  # e.g., Circleback API
    # Must have speaker diarization: WHO said WHAT

    # Step 2: Create the meeting page
    slug = f"meetings/{meeting.date}-{short_description}"
    compiled_truth = agent_analysis(transcript):
        # Above the bar: agent's OWN analysis, not a generic recap
        # - Reframe through the user's priorities
        # - Flag surprises, contradictions, implications
        # - Name real decisions (not performative ones)
        # - Call out what was left unsaid or unresolved
    timeline = format_diarized_transcript(transcript)
        # Below the bar: full transcript, append-only
        # Format: **Speaker** (HH:MM:SS): Words.

    gbrain put <slug> --content "<compiled_truth>\n---\n<timeline>"

    # Step 3: Propagate to ALL entity pages (MANDATORY -- most agents skip this)
    for person in meeting.attendees + meeting.mentioned_people:
        gbrain add_timeline_entry <person_slug> \
            --entry "Met in '{meeting.title}' on {date}. Key points: ..." \
            --source "Meeting notes '{meeting.title}', {date}"
        # Update their State section if new information surfaced
        # Update company pages for each person's company if relevant

    for company in meeting.mentioned_companies:
        gbrain add_timeline_entry <company_slug> \
            --entry "Discussed in '{meeting.title}': {what_was_said}" \
            --source "Meeting notes '{meeting.title}', {date}"

    # Step 4: Extract action items
    action_items = extract_action_items(transcript)
    # Add to task list with owner attribution

    # Step 5: Back-link everything (bidirectional graph)
    for entity in all_entities_mentioned:
        gbrain add_link <slug> <entity_slug>   # meeting -> entity
        gbrain add_link <entity_slug> <slug>   # entity -> meeting

    # Step 6: Sync so new pages are immediately searchable
    gbrain sync

# Schedule: cron 3x/day (10 AM, 4 PM, 9 PM) to catch new meetings
# Source: Circleback (https://circleback.ai) or any service with
#   speaker diarization + API/webhook access
```

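The "below the bar" format from Step 2, `**Speaker** (HH:MM:SS): Words.`, is easy to get subtly wrong, so here is a small sketch of it. The `Utterance` shape is an assumption for illustration. Real transcript APIs return their own structures, and nothing here is taken from Circleback's or GBrain's actual interfaces.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Utterance:
    """Hypothetical diarized segment: who spoke, when, and what they said."""
    speaker: str
    start_seconds: int
    text: str


def format_timestamp(total_seconds: int) -> str:
    """Render an offset in seconds as zero-padded HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def format_diarized_transcript(utterances: Iterable[Utterance]) -> str:
    """One markdown paragraph per utterance, in the order given."""
    lines = [
        f"**{u.speaker}** ({format_timestamp(u.start_seconds)}): {u.text}"
        for u in utterances
    ]
    return "\n\n".join(lines)
```

Keeping the speaker label on every line preserves the "WHO said WHAT" property that the entity-propagation step depends on.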
## Tricky Spots

1. **Always pull the COMPLETE transcript, never the AI summary.** AI summaries hallucinate framing -- they editorialize what was "agreed" or "decided" when no such agreement happened. The diarized transcript is ground truth.
2. **Entity propagation is the step most agents skip.** A meeting is NOT fully ingested until every attendee's page, every mentioned person's page, and every company's page has a new timeline entry. The meeting page alone is useless without propagation.
3. **Mentioned people are not just attendees.** If the meeting discussed "Sarah's team at Brex," then Sarah's page AND Brex's page need updates -- even though Sarah wasn't in the room.
4. **The agent's analysis is the value, not a summary.** "They discussed Q2 targets" is worthless. "Pedro pushed back on the burn rate, Diana didn't commit to the timeline, and nobody addressed the pricing gap" is useful.
5. **Back-links must be bidirectional.** The meeting page links to attendee pages AND attendee pages link back to the meeting. The graph is bidirectional. Always.

## How to Verify

1. After ingesting a meeting, run `gbrain get meetings/{date}-{slug}`. Confirm the page has the agent's analysis above the bar and the full diarized transcript below it.
2. For each attendee, run `gbrain get <attendee_slug>`. Check that their timeline has a new entry referencing the meeting with specific insights (not just "attended meeting").
3. Pick a company mentioned in the meeting. Run `gbrain get <company_slug>`. Confirm a timeline entry exists referencing what was discussed about the company.
4. Run `gbrain get_links meetings/{date}-{slug}`. Verify back-links exist to all attendee and entity pages.
5. Run `gbrain search "{meeting_topic}"`. Confirm the meeting page appears in search results (verifies sync ran).

---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
120
docs/guides/operational-disciplines.md
Normal file
@@ -0,0 +1,120 @@
# Operational Disciplines

## Goal
Five non-negotiable rules that separate a production brain from a demo -- signal detection, brain-first lookup, sync after every write, daily heartbeat, and nightly dream cycle.

## What the User Gets
Without this: the agent misses signals in conversation, wastes money on external APIs when the brain already has the answer, leaves search results stale after writes, and lets the brain rot quietly. With this: every message is scanned for entities, the brain is always consulted first, search is always current, health is monitored daily, and the brain compounds overnight.

## Implementation

```
# DISCIPLINE 1: Signal Detection on Every Message (MANDATORY)
on every_inbound_message(message):
    # No exceptions. If the user thinks out loud and the brain doesn't
    # capture it, the system is broken. This is the #1 discipline.

    entities = detect_entities(message)
    # people, companies, deals, original ideas

    for entity in entities:
        existing = gbrain search "{entity.name}"
        if existing:
            gbrain add_timeline_entry <entity_slug> \
                --entry "{what_was_said}" \
                --source "User, direct message, {timestamp}"
        # else: flag for enrichment if important enough

    originals = detect_original_thinking(message)
    for idea in originals:
        gbrain put originals/{slug} --content "{user's exact phrasing}"

# DISCIPLINE 2: Brain-First Lookup Before External APIs (MANDATORY)
on information_needed(topic):
    # ALWAYS check the brain before reaching for the web
    brain_result = gbrain search "{topic}"
    if brain_result:
        page = gbrain get <slug>
        # Use brain data first. External APIs FILL GAPS, not replace.
    else:
        # Brain has nothing -- now use external APIs
        external_result = brave_search("{topic}")

    # An agent that reaches for the web before checking its own brain
    # is wasting money and giving worse answers.

# DISCIPLINE 3: Sync After Every Write (MANDATORY)
on brain_write_complete():
    gbrain sync
    # Without this, search results are stale.
    # The page you just wrote won't appear in gbrain search or gbrain query
    # until sync runs. Skipping this means the next lookup misses the
    # most recent data.

# DISCIPLINE 4: Daily Heartbeat Check
on daily_schedule("09:00"):
    gbrain doctor
    # Checks: database connectivity, embedding health, sync status,
    # page count, stale pages, broken links
    # If doctor reports issues, fix them before doing anything else.

# DISCIPLINE 5: Nightly Dream Cycle
on nightly_schedule("02:00"):
    # The dream cycle is the most important discipline.
    # The brain COMPOUNDS overnight.

    # 5a: Entity sweep -- find unlinked mentions
    pages = gbrain list_pages
    for page in pages:
        mentions = extract_entity_mentions(page.content)
        existing_links = gbrain get_links <page.slug>
        for mention in mentions:
            if mention not in existing_links:
                gbrain add_link <page.slug> <mention_slug>  # fix broken graph

    # 5b: Citation audit -- find facts without sources
    for page in pages:
        facts_without_sources = audit_citations(page.content)
        if facts_without_sources:
            flag_for_remediation(page, facts_without_sources)

    # 5c: Memory consolidation -- update compiled truth from timeline
    for page in stale_pages(older_than="7d"):
        timeline = gbrain get_timeline <page.slug>
        if timeline.has_new_entries_since_last_consolidation:
            # Re-synthesize compiled truth from accumulated timeline
            updated_truth = consolidate(page.compiled_truth, timeline.new_entries)
            gbrain put <page.slug> --content updated_truth

    # 5d: Sync everything
    gbrain sync

# BONUS: Durable Skills Over One-Off Work
# If you do something twice, make it a skill + cron.
# 1. Concept the process
# 2. Run it manually for 3-10 items
# 3. Revise -- iterate on quality
# 4. Codify into a skill
# 5. Add to cron -- automate it
# Each entity type and signal source has exactly one owner skill.
# Two skills creating the same page = coverage violation.
```

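Discipline 2 is the easiest of the five to express as ordinary code. The sketch below shows the control flow only. `brain_search` and `external_search` are injected callables standing in for `gbrain search` and whatever web search tool the agent uses, and the `LookupResult` type is invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LookupResult:
    source: str          # "brain" or "external"
    results: List[str]


def lookup(
    topic: str,
    brain_search: Callable[[str], List[str]],
    external_search: Callable[[str], List[str]],
) -> LookupResult:
    """Consult the brain first; call the external API only when it has nothing."""
    brain_hits = brain_search(topic)
    if brain_hits:
        # Brain data wins. External lookups may fill gaps later, not replace it.
        return LookupResult(source="brain", results=brain_hits)
    return LookupResult(source="external", results=external_search(topic))
```

Passing the search functions in as arguments makes the tool-call order testable, which is exactly what the second verification step checks by hand.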
## Tricky Spots

1. **The dream cycle is the most important discipline.** Brains compound overnight. Entity sweeps fix broken graphs, citation audits catch sourceless facts, and memory consolidation keeps compiled truth current. Skip the dream cycle and the brain slowly rots.
2. **Skipping Discipline 3 (sync after write) means stale search results.** You write a page, then immediately search for it -- and get nothing back. The page exists but isn't indexed. Always sync after writes.
3. **Signal detection must fire on EVERY message.** Not just messages that look important. The user says "I talked to Pedro yesterday about the board seat" in passing -- that's a timeline entry on Pedro's page, a potential update to his State section, and a signal about the board. If the agent doesn't catch it, the system is broken.
4. **Brain-first saves money AND gives better answers.** The brain has context that external APIs don't: relationship history, meeting notes, the user's own assessment. An API lookup for "Pedro Franceschi" returns a LinkedIn profile. The brain returns the full picture including private context.
5. **`gbrain doctor` catches silent failures.** Embedding pipelines can stall, sync can fail silently, database connections can drop. The daily heartbeat catches these before they compound into data loss.

## How to Verify

1. Send a message mentioning a person with a brain page. Confirm the agent detects the entity and adds a timeline entry to their page (`gbrain get_timeline <slug>`).
2. Ask the agent about someone in the brain. Confirm it runs `gbrain search` or `gbrain get` BEFORE reaching for external APIs (check the tool call order).
3. Write a new page with `gbrain put`, then immediately run `gbrain search` for it. Confirm it appears in results (verifies sync ran).
4. Run `gbrain doctor`. Confirm it returns a health report with database status, page count, and any flagged issues.
5. After a dream cycle runs, check a page that had unlinked entity mentions. Confirm new links were added (`gbrain get_links <slug>`).

---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
87
docs/guides/originals-folder.md
Normal file
@@ -0,0 +1,87 @@
# The Originals Folder

## Goal
Capture the user's original thinking with their exact phrasing, deep cross-links, and full provenance -- so intellectual capital compounds instead of evaporating.

## What the User Gets
Without this: the user generates a brilliant framework in conversation and it vanishes when the session ends. Six months later, they vaguely remember the idea but can't find it, can't recall the exact phrasing, and can't trace what influenced it. With this: every original observation, thesis, framework, and hot take is captured verbatim in `brain/originals/`, cross-linked to the people, companies, and media that shaped it, and searchable forever.

## Implementation

```
on user_message(message):
    # Detect original thinking in every message
    if contains_original_thinking(message):
        # The authorship test:
        #   User generated the idea? -> originals/{slug}.md
        #   User's unique synthesis of someone else's? -> originals/ (synthesis IS original)
        #   World concept someone else coined? -> concepts/{slug}.md
        #   Product or business idea? -> ideas/{slug}.md

        # Step 1: Use the user's EXACT phrasing for the slug
        #   "meatsuit-maintenance-tax"
        #   NOT "biological-needs-maintenance-overhead"
        #   The vividness IS the concept.
        slug = slugify(user_exact_phrase)

        # Step 2: Create the originals page
        gbrain put originals/{slug} --content """
        # {User's Exact Phrase}

        ## The Idea
        {User's original thinking, captured in their own words.
         Do NOT paraphrase. Do NOT clean up the language.
         The raw phrasing is the intellectual artifact.}

        ## Context
        {What triggered this thinking. Meeting? Article? Conversation?
         Include the source that sparked it.}
        [Source: User, {context}, {date} {time} {tz}]

        ## Connections
        - Related to: [[{person_slug}]] -- {how they connect}
        - Emerged from: [[{meeting_slug}]] -- {what was discussed}
        - Influenced by: [[{book_or_media_slug}]] -- {what resonated}
        - Builds on: [[{other_original_slug}]] -- {how ideas cluster}
        """

        # Step 3: Cross-link to everything that shaped the thinking
        for entity in idea.influences:
            gbrain add_link originals/{slug} <entity_slug>
            gbrain add_link <entity_slug> originals/{slug}

        # Step 4: Sync
        gbrain sync

# What counts as original thinking:
#   - Novel frameworks ("the meatsuit maintenance tax")
#   - Hot takes on someone else's work (synthesis IS original)
#   - Pattern recognition across multiple entities
#   - Predictions or bets about the future
#   - Contrarian positions with reasoning

# What does NOT go in originals/:
#   - Facts about the world (-> entity pages)
#   - Concepts someone else coined (-> concepts/)
#   - Product ideas (-> ideas/)
#   - Preferences (-> agent memory)
```

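Step 1 leans on a `slugify` helper that the pseudocode leaves undefined. One possible version is sketched below. It is an assumption about reasonable behavior (lowercase, ASCII-only, hyphen-separated), not GBrain's actual slug rule. The point is that it only normalizes characters and never rewrites the user's words.

```python
import re
import unicodedata


def slugify(phrase: str) -> str:
    """Turn the user's exact phrase into a URL-safe slug without rewording it."""
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", phrase)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    lowered = ascii_only.lower()
    # Drop apostrophes so "founder's" becomes "founders", not "founder-s".
    without_apostrophes = lowered.replace("'", "")
    # Collapse every other run of non-alphanumerics into a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", without_apostrophes)
    return hyphenated.strip("-")
```

Whatever rule is used, the input should be the phrase as the user said it. Sanitizing belongs to the character level only.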
## Tricky Spots

1. **Naming: the vividness IS the concept.** `meatsuit-maintenance-tax` not `biological-needs-maintenance-overhead`. `ambition-debt` not `deferred-career-risk-accumulation`. The user's colorful phrasing is the intellectual artifact. Never sanitize it into corporate-speak.
2. **Synthesis IS original.** The user's take on Peter Thiel's zero-to-one framework goes in `originals/`, not `concepts/`. The original part is the user's synthesis, interpretation, or disagreement -- even though the underlying ideas came from someone else.
3. **An original without cross-links is a dead original.** The connections ARE the intelligence. An idea about "ambition debt" that doesn't link to the people who exemplify it, the meeting where it was discussed, and the book that influenced it is just a note in a graveyard. Cross-link aggressively.
4. **Originals form clusters.** Over time, the user's ideas connect to each other. "Meatsuit maintenance tax" connects to "ambition debt" connects to "founder energy budget." Link originals to other originals. The cluster IS the user's worldview.
5. **Capture the trigger context.** What conversation, meeting, article, or moment sparked this idea? The context often matters as much as the idea itself for future retrieval. Include it in the page.

## How to Verify

1. Generate an original idea in conversation (e.g., "I call this the 'ambition debt' problem -- every year you delay going big, the compound interest works against you"). Confirm a new page appears at `brain/originals/ambition-debt` with `gbrain get originals/ambition-debt`.
2. Check that the page uses the user's exact phrasing for the title and slug -- not a sanitized version.
3. Run `gbrain get_links originals/ambition-debt`. Confirm cross-links exist to related people, meetings, or other originals.
4. Express a take on someone else's idea (e.g., "I think Thiel's contrarian question is wrong because..."). Confirm it goes to `originals/` (synthesis is original), not `concepts/`.
5. Run `gbrain search "ambition debt"`. Confirm the originals page appears in search results and is discoverable.

---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
165
docs/guides/quiet-hours.md
Normal file
@@ -0,0 +1,165 @@
# Quiet Hours and Timezone-Aware Delivery

## Goal

Hold all notifications during sleep hours, merge held messages into the morning briefing, and adjust automatically when the user travels.

## What the User Gets

Without this: 3 AM pings from cron jobs. One bad notification and the user
disables the entire system.

With this: the brain works overnight (dream cycle, collectors, enrichment)
but notifications are held until morning. Travel to Tokyo? The system adjusts
automatically from your calendar, no config change needed.

## Implementation

### Quiet Hours Gate

Every cron job that sends notifications must check quiet hours FIRST.

```
QUIET_START = 23   // 11 PM local time
QUIET_END = 8      // 8 AM local time

is_quiet(local_hour):
    return local_hour >= QUIET_START OR local_hour < QUIET_END
```

**Before sending any notification:**
1. Determine user's current timezone (from config or heartbeat state)
2. Convert current UTC time to local time
3. If quiet hours: hold the message, don't send

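The `OR` check above is correct only for windows that cross midnight, such as the default 23 to 8. Once the hours become configurable, a same-day window (say 13 to 15) needs an `AND` instead. The sketch below handles both cases and adds the UTC-to-local conversion from the three steps. The function names and the choice of `zoneinfo` are assumptions for illustration, not GBrain code.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def is_quiet(local_hour: int, start: int = 23, end: int = 8) -> bool:
    """True if local_hour (0-23) falls inside the quiet window [start, end)."""
    if start == end:
        return False                      # empty window: never quiet
    if start < end:
        return start <= local_hour < end  # same-day window, e.g. 13 -> 15
    return local_hour >= start or local_hour < end  # crosses midnight, e.g. 23 -> 8


def is_quiet_now(tz_name: str, now_utc: Optional[datetime] = None,
                 start: int = 23, end: int = 8) -> bool:
    """Convert the current UTC time to the user's zone, then apply the gate."""
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    local_time = now_utc.astimezone(ZoneInfo(tz_name))
    return is_quiet(local_time.hour, start, end)
```

`is_quiet_now` needs the system (or the `tzdata` package) to provide the IANA timezone database.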
### Held Messages

During quiet hours, output goes to a held directory instead of being sent:

```
if is_quiet():
    mkdir -p /tmp/cron-held/
    write("/tmp/cron-held/{job-name}.md", output)
    exit  // don't send
else:
    send(output)
```

The morning briefing picks up held messages:

```
morning_briefing():
    held_files = list("/tmp/cron-held/*.md")
    if held_files:
        briefing += "## Overnight Updates\n\n"
        for file in held_files:
            briefing += read(file)
            delete(file)
```

This way nothing is lost. Overnight cron results get folded into the
first thing the user sees in the morning.

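The hold-and-pickup pattern above can be sketched with two small functions. The names and the directory parameter are illustrative. The guide uses `/tmp/cron-held/`, but passing the directory in makes the logic easy to exercise against a temporary folder.

```python
import tempfile
from pathlib import Path


def hold_message(held_dir: Path, job_name: str, output: str) -> Path:
    """Write a job's output to the held directory instead of sending it."""
    held_dir.mkdir(parents=True, exist_ok=True)
    held_file = held_dir / f"{job_name}.md"
    held_file.write_text(output, encoding="utf-8")
    return held_file


def collect_held(held_dir: Path) -> str:
    """Build the 'Overnight Updates' section and clear the held directory."""
    if not held_dir.is_dir():
        return ""
    held_files = sorted(held_dir.glob("*.md"))
    if not held_files:
        return ""
    parts = ["## Overnight Updates", ""]
    for held_file in held_files:
        parts.append(held_file.read_text(encoding="utf-8").rstrip())
        parts.append("")
        held_file.unlink()  # delete only after the content has been read
    return "\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        held = Path(tmp) / "cron-held"
        hold_message(held, "dream-cycle", "Linked 3 unlinked mentions.")
        print(collect_held(held))
```

As in the pseudocode, a job that runs twice overnight overwrites its own held file, so only the latest output per job reaches the briefing.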
### Timezone Awareness

The agent should know what timezone the user is in. Store it in
the agent's operational state:

```json
{
  "currentLocation": {
    "timezone": "US/Pacific",
    "city": "San Francisco"
  }
}
```

**Update the timezone when:**
- Calendar shows the user flying somewhere (check for airline/hotel events)
- User mentions being in a different city
- User's active hours shift (they're responding at 3 AM PT = they're probably traveling)

**All times shown to the user should be in their LOCAL timezone.** Never
show UTC or a timezone the user isn't in.

### Shell Implementation

```bash
#!/bin/bash
# quiet-hours-gate.sh -- run before any notification

TIMEZONE="${USER_TIMEZONE:-US/Pacific}"
LOCAL_HOUR=$(TZ="$TIMEZONE" date +%H)

if [ "$LOCAL_HOUR" -ge 23 ] || [ "$LOCAL_HOUR" -lt 8 ]; then
  echo "QUIET_HOURS=true"
  exit 1  # don't send
fi

echo "QUIET_HOURS=false"
exit 0  # ok to send
```

**In cron job scripts:**
```bash
# Check quiet hours first
if ! bash scripts/quiet-hours-gate.sh; then
  mkdir -p /tmp/cron-held
  echo "$OUTPUT" > /tmp/cron-held/$(basename "$0" .sh).md
  exit 0
fi

# Not quiet hours -- send normally
send_notification "$OUTPUT"
```

### Configurable Hours

Some users want different quiet hours. Store the config:

```json
{
  "quiet_hours": {
    "start": 23,
    "end": 8,
    "enabled": true
  }
}
```

Set `enabled: false` to disable quiet hours entirely (e.g., for 24/7 monitoring).

## Tricky Spots

1. **Gate on EVERY job.** The quiet hours check must run before every single
   cron job that produces notifications. If even one job skips the gate, the
   user gets a 3 AM ping and loses trust in the entire system. No exceptions.

2. **Held messages MUST be picked up.** If the morning briefing doesn't read
   `/tmp/cron-held/`, overnight results vanish silently. Verify the briefing
   skill reads and clears the held directory. Orphaned held files mean the
   pickup integration is broken.

3. **Timezone auto-detection is fragile.** Calendar-based timezone detection
   relies on the user having airline/hotel events with location data. If the
   user books travel without calendar entries, the system won't detect the
   move. Fall back to activity-hour analysis (responding at 3 AM PT = probably
   not in PT anymore) and ask the user if uncertain.

## How to Verify

1. **Set quiet hours to the current hour.** Temporarily set `QUIET_START` to
   one hour before now and `QUIET_END` to one hour after. Trigger a cron job.
   Verify the output goes to `/tmp/cron-held/` instead of being sent.

2. **Check held message pickup.** After step 1, run or simulate the morning
   briefing. Verify the held message appears in the "Overnight Updates"
   section and the file is deleted from `/tmp/cron-held/`.

3. **Verify timezone adjustment.** Change the timezone config to a zone where
   it's currently quiet hours. Trigger a notification. Verify it's held. Change
   back to your real timezone during active hours. Trigger again. Verify it sends.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
158
docs/guides/repo-architecture.md
Normal file
@@ -0,0 +1,158 @@
# Two-Repo Architecture: Agent Behavior vs World Knowledge

## Goal

Separate agent behavior (replaceable) from world knowledge (permanent) into two repos with strict boundaries.

## What the User Gets

Without this: agent config and world knowledge are mixed together. Switch agents
and you lose your knowledge. Switch knowledge tools and you lose your agent setup.

With this: your brain (14,700+ files of people, companies, meetings, ideas)
survives any agent swap. Your agent config survives any knowledge tool swap.

## Implementation

### The Boundary Test

**"Is this about how the agent operates, or is this knowledge about the world?"**

| Question | If YES -> Agent Repo | If YES -> Brain Repo |
|----------|---------------------|---------------------|
| Would this file transfer if you switched AI agents? | YES | -- |
| Would this file transfer if you switched to a different person? | -- | YES |
| Is this about how the agent behaves? | YES | -- |
| Is this about a person, company, deal, meeting, or idea? | -- | YES |

### Quick Decision Tree

```
New file to create?
|-- About a person, company, deal, project, meeting, idea? -> brain/
|-- A spec, research doc, or strategic analysis?           -> brain/
|-- An original idea or observation?                       -> brain/originals/
|-- A daily session log or heartbeat state?                -> agent-repo/
|-- A skill, config, cron, or ops file?                    -> agent-repo/
|-- A task or todo?                                        -> agent-repo/tasks/
```

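The decision tree above maps cleanly onto a lookup table. The sketch below is illustrative only. The `kind` labels are made-up names for the tree's branches, and a real agent would classify the file first and then route it.

```python
# Hypothetical labels for each branch of the decision tree.
ROUTES = {
    # World knowledge -> brain repo
    "person": "brain/",
    "company": "brain/",
    "deal": "brain/",
    "project": "brain/",
    "meeting": "brain/",
    "idea": "brain/",
    "spec": "brain/",
    "research": "brain/",
    "analysis": "brain/",
    "original": "brain/originals/",
    # Agent behavior -> agent repo
    "session_log": "agent-repo/",
    "heartbeat_state": "agent-repo/",
    "skill": "agent-repo/",
    "config": "agent-repo/",
    "cron": "agent-repo/",
    "ops": "agent-repo/",
    "task": "agent-repo/tasks/",
}


def route(kind: str) -> str:
    """Return the destination directory for a new file of the given kind."""
    try:
        return ROUTES[kind]
    except KeyError:
        # Unknown kinds should be classified by a human or the boundary test,
        # not silently dropped into either repo.
        raise ValueError(f"no route for file kind: {kind!r}") from None
```

Raising on unknown kinds keeps the boundary strict: nothing lands in either repo by default.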
### Agent Repo (operational config)

How the agent works. Identity, configuration, operational state.

```
agent-repo/
├── AGENTS.md              # Agent identity + operational rules
├── SOUL.md                # Persona, voice, values
├── USER.md                # User preferences + context
├── HEARTBEAT.md           # Daily ops flow
├── TOOLS.md               # Available tools + credentials
├── MEMORY.md              # Operational memory (preferences, decisions)
├── skills/                # Agent capabilities (SKILL.md files)
│   ├── ingest/SKILL.md
│   ├── query/SKILL.md
│   ├── enrich/SKILL.md
│   └── ...
├── cron/                  # Scheduled jobs
│   └── jobs.json
├── tasks/                 # Current task list
│   └── current.md
├── hooks/                 # Event hooks + transforms
├── scripts/               # Operational scripts (collectors, gates)
└── memory/                # Session logs, state files
    ├── heartbeat-state.json
    └── YYYY-MM-DD.md      # Daily session logs
```

### Brain Repo (world knowledge)

What you know. People, companies, deals, meetings, ideas, media.
This is the repo GBrain indexes.

```
brain/
├── people/                # Person dossiers (compiled truth + timeline)
├── companies/             # Company profiles
├── deals/                 # Deal tracking
├── meetings/              # Meeting transcripts + analysis
├── originals/             # YOUR original thinking (highest value)
├── concepts/              # World concepts and frameworks
├── ideas/                 # Product and business ideas
├── media/                 # Video transcripts, books, articles
│   ├── youtube/
│   ├── podcasts/
│   └── articles/
├── sources/               # Source material summaries
├── daily/                 # Daily data (calendar, logs)
│   └── calendar/
│       └── YYYY/
│           └── YYYY-MM-DD.md
├── projects/              # Project specs and docs
├── writing/               # Essays, drafts, published work
├── diligence/             # Investment diligence materials
│   └── company-name/
│       ├── index.md
│       ├── pitch-deck.md
│       └── .raw/          # Original PDFs/files
└── Apple Notes/           # Imported Apple Notes archive
```

### The Hard Rule

**Never write knowledge to the agent repo.** If a skill, sub-agent, or cron
job needs to create a file about a person, company, deal, meeting, project,
or idea, it MUST write to the brain repo, never to the agent repo.

The brain is the permanent record. The agent repo is replaceable.

### Why Two Repos

**Independence.** You can switch AI agents (OpenClaw -> Hermes -> custom) without
losing your knowledge. You can switch knowledge tools (GBrain -> something else)
without losing your agent setup.

**Scale.** The brain grows large (10,000+ files). The agent repo stays small
(< 100 files). Different backup strategies, different sync cadences.

**Privacy.** The brain contains sensitive information (people, deals, personal
notes). The agent repo contains operational config. Different access controls.

**GBrain indexes the brain repo.** Run `gbrain sync --repo ~/brain/` to keep
the search index current. The agent repo is never indexed by GBrain.

## Tricky Spots

1. **Never write knowledge to the agent repo.** This is the most common
   violation. A skill that creates a person page, a cron job that saves
   meeting notes, a sub-agent that captures an idea -- all of these MUST
   write to the brain repo. If it's about the world, it goes in the brain.

2. **The brain is the permanent record.** When in doubt, ask: "Would this
   file survive switching to a completely different AI agent?" If yes, it
   belongs in the brain. Agent configs, skills, cron jobs, and operational
   state are replaceable. People, companies, ideas, and meetings are not.

3. **Don't index the agent repo.** GBrain indexes the brain repo only.
   Running `gbrain sync` against the agent repo pollutes search results
   with operational config instead of world knowledge.

## How to Verify

1. **Check file placement.** After any skill or cron job creates a file,
   verify it landed in the correct repo. Person/company/idea/meeting files
   should be in `brain/`. Skill/config/cron/state files should be in the
   agent repo. Any knowledge file in the agent repo is a boundary violation.

2. **Run the boundary test.** Pick 5 recently created files and ask: "Would
   this transfer if I switched AI agents?" and "Would this transfer if I
   switched to a different person?" If the answers don't match the file's
   location, it's in the wrong repo.

3. **Verify GBrain only indexes brain.** Run `gbrain stats` and check the
   indexed paths. None should point to the agent repo directory. If agent
   config files appear in search results, the sync target is misconfigured.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
78
docs/guides/search-modes.md
Normal file
@@ -0,0 +1,78 @@
# Search Modes

## Goal
Know which search command to use and when -- keyword, hybrid, or direct -- so every lookup is fast and returns the right result.

## What the User Gets
Without this: the agent fumbles between search commands, returns chunks when full pages are needed, runs expensive semantic searches when a direct get would do, or misses results entirely. With this: every lookup uses the optimal mode, token budgets are respected, and the user gets the right information in the fewest calls.

## Implementation

```
on user_asks_about(topic):
    # Decision tree: pick the right search mode

    if know_exact_slug(topic):
        # MODE 3: Direct get -- instant, no search overhead
        result = gbrain get <slug>
        # e.g., "Tell me about Pedro" -> gbrain get pedro-franceschi
        # Returns the FULL page -- compiled truth + timeline

    elif topic.is_exact_name or topic.is_keyword:
        # MODE 1: Keyword search -- fast, no embeddings needed, day-one ready
        results = gbrain search "{name_or_keyword}"
        # e.g., "Find anything about Series A" -> gbrain search "Series A"
        # Returns CHUNKS, not full pages

        # IMPORTANT: keyword search returns chunks
        # If the chunk confirms relevance, THEN load the full page:
        if chunk.confirms_relevance:
            full_page = gbrain get <slug_from_chunk>

    elif topic.is_semantic_question:
        # MODE 2: Hybrid search -- semantic + keyword, needs embeddings
        results = gbrain query "{natural language question}"
        # e.g., "Who do I know at fintech companies?" -> gbrain query "fintech contacts"
        # Returns ranked chunks via vector + keyword + RRF

        # Same rule: chunks first, then get full page if needed
        if chunk.confirms_relevance:
            full_page = gbrain get <slug_from_chunk>

# Quick reference:
# | Mode    | Command              | Needs Embeddings | Speed   | Best For                        |
# |---------|----------------------|------------------|---------|---------------------------------|
# | Keyword | gbrain search "term" | No               | Fastest | Known names, exact matches      |
# | Hybrid  | gbrain query "..."   | Yes              | Fast    | Semantic questions, fuzzy match |
# | Direct  | gbrain get <slug>    | No               | Instant | When you know the slug          |

# Progression over time:
#   Day 1: keyword search (works without embeddings)
#   After first embed: hybrid search unlocked
#   Once you know slugs: direct get for speed

# Precedence for conflicting information within a page:
#   1. User's direct statements (always wins)
#   2. Compiled truth sections (synthesized from evidence)
#   3. Timeline entries (raw signal, reverse chronological)
#   4. External sources (web search, APIs)
```

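The mode selection above can be written as a plain function that returns the command to run. This is a hedged sketch: `pick_search_command` and its parameters are hypothetical names for illustration, and the caller is assumed to already know whether it has a slug or is asking a semantic question. Only the three `gbrain` subcommands come from the guide.

```python
from typing import List, Optional


def pick_search_command(topic: str,
                        known_slug: Optional[str] = None,
                        is_question: bool = False) -> List[str]:
    """Return the gbrain command (as an argv list) for one lookup."""
    if known_slug:
        # Direct get: returns the full page, no search overhead.
        return ["gbrain", "get", known_slug]
    if is_question:
        # Hybrid search: needs embeddings, returns ranked chunks.
        return ["gbrain", "query", topic]
    # Keyword search: works without embeddings, returns chunks.
    return ["gbrain", "search", topic]


print(pick_search_command("Pedro", known_slug="pedro-franceschi"))
print(pick_search_command("Series A"))
print(pick_search_command("Who do I know at fintech companies?", is_question=True))
```

Returning an argv list rather than a shell string keeps quoting out of the picture if the result is later handed to a process runner.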
## Tricky Spots

1. **Search returns chunks, not full pages.** After `gbrain search` or `gbrain query`, you get excerpts. Always run `gbrain get <slug>` to load the full page when the chunk confirms relevance. Don't answer questions from chunks alone when the full context matters.
2. **Keyword search works without embeddings.** On day one before any embedding run, `gbrain search` still works. Don't tell the user "search isn't available yet" -- keyword search is always available.
3. **Don't use hybrid search for known names.** `gbrain query "Pedro Franceschi"` wastes embedding compute. Use `gbrain search "Pedro Franceschi"` or better yet `gbrain get pedro-franceschi` if you know the slug.
4. **Token budget awareness.** A full page via `gbrain get` can be large. Read the search chunks first to confirm relevance before pulling the full page. "Did anyone mention the Series A?" -- search results (chunks) are probably enough. "Tell me everything about Pedro" -- get the full page.
5. **Hybrid search needs embeddings to have been run.** If `gbrain query` returns nothing but `gbrain search` finds results, the embeddings haven't been generated yet. Run the embedding pipeline first.

## How to Verify

1. Run `gbrain search "Pedro"` -- confirm it returns chunks with matching text and slug references.
2. Run `gbrain query "who works at fintech companies"` -- confirm it returns semantically relevant results (not just keyword matches on "fintech").
3. Run `gbrain get pedro-franceschi` -- confirm it returns the full page with compiled truth and timeline.
4. Compare: search for the same entity using all three modes. Keyword should be fastest, hybrid should surface conceptual matches, direct should return the complete page.
5. After a search returns a chunk, run `gbrain get` on the slug from that chunk. Confirm the full page contains more context than the chunk alone.

---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
131
docs/guides/skill-development.md
Normal file
@@ -0,0 +1,131 @@
# Skill Development Cycle

## Goal

Turn every repeating task into a durable, automated skill: if you have to ask twice, it should already be running on a cron.

## What the User Gets

Without this: ad-hoc work that the agent forgets how to do. You ask "enrich
this person" and the agent invents a new process each time. Quality varies.

With this: every capability is codified, tested, and scheduled. Enrichment
runs the same way every time. New patterns get skill-ified within a day.

## Implementation

**The Rule:** If you have to ask your agent for something twice, it should
already be a skill running on a cron. First time is discovery. Second time
is system failure.

### The 5-Step Cycle

**Step 1: Concept the Process.**
Describe what needs to happen in plain language:
- What's the input? What's the output? What triggers it?
- What data sources does it touch?
- How often should it run?

**Step 2: Run Manually for 3-10 Items.**
Actually do the work by hand on a small batch. This is the prototype phase.
Do NOT write a SKILL.md yet. Just do the work and observe:
- What does the output actually look like?
- What edge cases appear?
- What quality bar is right?

**Step 3: Evaluate Output.**
Show the user the results. Get feedback.
- Does output look good? Is quality right?
- Did you miss anything? Over-engineer?
- Revise the process based on what you learned.

**Step 4: Codify into a Skill.**
Write the SKILL.md. Either:
- **New skill** -- genuinely new capability
- **Add to existing skill** -- variation of something that exists (parameterize it)

The skill must be:
- **Durable** -- works tomorrow, next week, next month without manual intervention
- **MECE** -- doesn't overlap with other skills (see below)
- **Parameterized** -- handles variations through parameters, not separate skills

**Step 5: Add to Cron (if recurring).**
If the process should run automatically:
- Add to existing cron job if it fits naturally
- Create new cron job if it has a distinct scheduling concern
- Monitor the first 2-3 automated runs for quality
- Fix issues that emerge at scale

### MECE Discipline

Skills should be **Mutually Exclusive, Collectively Exhaustive**:
- Each entity type has exactly ONE owner skill
- Each signal source has exactly ONE owner skill
- Two skills creating the same brain page = MECE violation

**Example ownership (no overlap):**

| Signal Source | Owner Skill | Creates |
|--------------|-------------|---------|
| Meeting transcripts | meeting-ingestion | brain/meetings/ pages |
| Email messages | executive-assistant | brain/people/ timeline entries |
| X/Twitter posts | x-collector | brain/media/ pages |
| Person enrichment | enrich | brain/people/ compiled truth |
| Calendar events | calendar-sync | brain/daily/calendar/ pages |
| Video/podcast content | media-ingest | brain/media/ pages |

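The "exactly ONE owner skill per signal source" rule can be checked mechanically. The sketch below is illustrative only: `find_duplicate_owners` is a hypothetical helper, the registrations mirror the example table, and `email-triage` is a made-up second skill added to show what a violation looks like.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_owners(
    registrations: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each signal source claimed by more than one skill to its claimants."""
    owners = defaultdict(set)
    for signal_source, skill in registrations:
        owners[signal_source].add(skill)
    return {src: sorted(skills) for src, skills in owners.items() if len(skills) > 1}


# Registrations taken from the example ownership table.
table = [
    ("Meeting transcripts", "meeting-ingestion"),
    ("Email messages", "executive-assistant"),
    ("X/Twitter posts", "x-collector"),
    ("Person enrichment", "enrich"),
    ("Calendar events", "calendar-sync"),
    ("Video/podcast content", "media-ingest"),
]

print(find_duplicate_owners(table))
# "email-triage" is hypothetical: a second skill claiming an owned source.
print(find_duplicate_owners(table + [("Email messages", "email-triage")]))
```

Running a check like this before adding a skill is one way to catch overlap early; where the registrations would actually live (a manifest, skill front matter) is left open here.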
### Quality Bar Checklist

A skill is ready when:

- [ ] Ran successfully on 3-10 real items with good output
- [ ] User reviewed output and approved
- [ ] SKILL.md is under 500 lines (use references for overflow)
- [ ] If it creates brain pages: checks notability first (don't create pages for nobodies)
- [ ] Has citation enforcement (every fact has a source)
- [ ] Doesn't overlap with existing skills (MECE)
- [ ] If recurring: on a cron with appropriate schedule

### What This Means in Practice

- Don't do ad-hoc brain enrichment; use the enrich skill
- Don't manually check social media; use an automated cron
- Don't manually ingest meeting notes; use the meeting-sync recipe
- Don't manually create entity pages; use the entity detector
- If a new pattern emerges, prototype it, skill-ify it, cron-ify it

## Tricky Spots

1. **MECE violations compound silently.** Two skills that both create
   `brain/people/` pages will produce duplicates and conflicting data.
   Before creating a new skill, check the ownership table. If an existing
   skill already owns that entity type, extend it with parameters instead
   of creating a new skill.

2. **The quality bar is real.** Don't ship a skill that hasn't been tested
   on 3-10 real items with user approval. A skill that produces bad output
   is worse than no skill -- it creates bad brain pages at scale on a cron.

3. **Don't create stubs.** A SKILL.md with "TODO: implement" is not a skill.
   Every skill must be complete enough to run end-to-end on real data. If
   you can't finish it, don't create the file. Keep it as manual work until
   you can do it right.

## How to Verify

1. **Run the skill on 3 real items.** Execute the skill against live data
   (not test data). Check that the output matches the quality bar: citations
   present, notability checked, no stubs created.

2. **Check MECE against existing skills.** Review the ownership table. Does
   this new skill create pages in a directory already owned by another skill?
   If yes, it's a MECE violation. Merge or parameterize instead.

3. **Verify the quality bar checklist.** Walk through every item in the
   Quality Bar Checklist above. If any item is unchecked, the skill isn't
   ready for cron deployment.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
75
docs/guides/source-attribution.md
Normal file
@@ -0,0 +1,75 @@
# Source Attribution

## Goal
Every fact in the brain traces to where it came from -- who said it, in what context, and when.

## What the User Gets
Without this: six months from now, someone reads a brain page and has no idea if "Pedro co-founded Brex" came from Pedro himself, a LinkedIn scrape, or a hallucination. With this: every claim is auditable, conflicts are surfaced, and the brain is a court-admissible record of reality.

## Implementation

```
on brain_write(page, fact):
    # EVERY fact gets a citation -- compiled truth AND timeline
    citation = format_citation(source)
    # format: [Source: {who}, {channel/context}, {date} {time} {tz}]

    # Category-specific formats:
    if source.type == "direct":
        # [Source: User, direct message, 2026-04-07 12:33 PM PT]
    elif source.type == "meeting":
        # [Source: Meeting notes "Team Sync" #12345, 2026-04-03 12:11 PM PT]
    elif source.type == "api_enrichment":
        # [Source: Crustdata LinkedIn enrichment, 2026-04-07 12:35 PM PT]
    elif source.type == "social_media":
        # MUST include full URL -- not just @handle
        # [Source: X/@pedroh96 tweet, product launch, 2026-04-07](https://x.com/pedroh96/status/...)
    elif source.type == "email":
        # [Source: email from Sarah Chen re Q2 board deck, 2026-04-05 2:30 PM PT]
    elif source.type == "workspace":
        # [Source: Slack #engineering, Keith re deploy schedule, 2026-04-06 11:45 AM PT]
    elif source.type == "web":
        # [Source: Happenstance research, 2026-04-07 12:35 PM PT]
    elif source.type == "published":
        # [Source: [Wall Street Journal, 2026-04-05](https://wsj.com/...)]
    elif source.type == "funding":
        # [Source: Captain API funding data, 2026-04-07 2:00 PM PT]

    # Attach citation inline with the fact
    gbrain put <slug> --content "...fact [Source: ...]..."

    # When sources conflict, note BOTH -- never silently pick one
    if conflicts_exist(fact, existing_page):
        append_to_compiled_truth(
            "Conflict: Source A says X, Source B says Y. "
            "[Source: A] [Source: B]"
        )

# Source hierarchy for conflict resolution (highest authority first):
SOURCE_PRIORITY = [
    "User direct statements",   # 1 -- always wins
    "Primary sources",          # 2 -- meetings, emails, direct conversations
    "Enrichment APIs",          # 3 -- Crustdata, Happenstance, Captain
    "Web search results",       # 4
    "Social media posts",       # 5
]
```

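As a concrete illustration of the citation format and the "note BOTH" conflict rule, here is a minimal Python sketch. The names `format_citation` and `conflict_note` and their parameters are assumptions made for this example; the guide specifies the output format, not an implementation.

```python
from typing import Optional


def format_citation(who: str, context: str, when: str,
                    url: Optional[str] = None) -> str:
    """Build an inline citation: [Source: {who}, {context}, {when}](url)."""
    citation = f"[Source: {who}, {context}, {when}]"
    # Social media and published sources carry a full URL after the bracket.
    return f"{citation}({url})" if url else citation


def conflict_note(claim_a: str, cite_a: str, claim_b: str, cite_b: str) -> str:
    """Record both sides of a conflict instead of silently picking one."""
    return f"Conflict: {claim_a} {cite_a} vs. {claim_b} {cite_b}"


direct = format_citation("User", "direct message", "2026-04-07 12:33 PM PT")
tweet = format_citation(
    "X/@handle tweet", "topic", "2026-04-07",
    url="https://x.com/handle/status/ID",  # placeholder URL pattern from the guide
)
print(direct)
print(tweet)
print(conflict_note("User says X.", direct, "Tweet says Y.", tweet))
```

Keeping the URL as an optional argument makes the omitted-URL case visible at the call site, which is where tweet citations tend to break.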
## Tricky Spots

1. **Compiled truth is NOT exempt from citations.** "Pedro co-founded Brex" in the synthesis section needs `[Source: ...]` just as much as a timeline entry does. Most agents skip citations above the bar.
2. **Tweet URLs are mandatory.** `[Source: X/@handle tweet, topic, date]` without a URL is a broken citation. Hundreds of brain pages end up with unreachable tweet references when the URL is omitted. Always: `[Source: X/@handle tweet, topic, date](https://x.com/handle/status/ID)`.
3. **"User said it" isn't enough.** WHERE, ABOUT WHAT, WHEN. `[Source: User, direct message, 2026-04-07 12:33 PM PT]` -- not just `[Source: User]`.
4. **Don't silently resolve conflicts.** When the user says one thing and an API says another, note the contradiction in compiled truth with both citations. Let the reader decide.
5. **Timeline entries need sources too.** Every append to the timeline carries provenance. A timeline entry without a source is an orphan fact.

## How to Verify

1. Open any brain page with `gbrain get <slug>`. Read the compiled truth section above the bar. Every factual claim should have an inline `[Source: ...]` citation.
2. Search for tweet references: `gbrain search "X/@"`. Every result should have a full URL, not just an @handle.
3. Find a page with data from multiple sources (e.g., a person enriched via API + mentioned in a meeting). Confirm both sources are cited independently.
4. Check timeline entries on 3 random pages. Each entry should have a source citation with date and context.
5. Look for a page where the user stated something that contradicts an API result. Confirm the contradiction is noted, not silently resolved.

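The tweet-URL check in step 2 can also be run over raw page text. The following is a rough sketch under stated assumptions: `find_broken_tweet_citations` is a hypothetical helper, and the regular expression assumes citations follow the `[Source: X/@handle ...](https://x.com/...)` shape shown in this guide.

```python
import re
from typing import List

# Matches a tweet citation that is NOT immediately followed by an x.com URL.
BROKEN_TWEET_CITATION = re.compile(
    r"\[Source: X/@[^\]]*\](?!\(https://x\.com/[^)\s]+\))"
)


def find_broken_tweet_citations(page_text: str) -> List[str]:
    """Return every tweet citation in the text that lacks a full URL."""
    return BROKEN_TWEET_CITATION.findall(page_text)


page = (
    "Launched the product. [Source: X/@handle tweet, topic, 2026-04-07]\n"
    "Announced hiring. [Source: X/@handle tweet, hiring, 2026-04-08]"
    "(https://x.com/handle/status/ID)\n"
)
print(find_broken_tweet_citations(page))
```

In the sample text only the first citation is reported, because the second is followed by its URL.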
---
*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
122
docs/guides/sub-agent-routing.md
Normal file
@@ -0,0 +1,122 @@
# Sub-Agent Model Routing

## Goal

Route sub-agents to the cheapest model that can do the job, cutting per-task costs by 5-40x without sacrificing quality.

## What the User Gets

Without this: every sub-agent runs on Opus ($15/MTok). Entity detection on
every message costs $3-5/day. Research tasks cost $10+ each.

With this: entity detection runs on Sonnet ($3/MTok, 5x cheaper). Research
runs on DeepSeek ($0.50/MTok, 30x cheaper). Main session stays on Opus for
quality. Total cost drops 70-80%.

## Implementation

### Routing Table

| Task Type | Recommended Model | Why |
|-----------|------------------|-----|
| Main session / complex instructions | Opus-class (default) | Best reasoning and instruction following |
| Research / synthesis / analysis | DeepSeek V3 or equivalent | 25-40x cheaper, strong on exploratory work |
| Structured output / long context | Large context model (Qwen, Gemini) | 200K+ context, reliable JSON output |
| Fast lightweight sub-agents | Fast inference model (Groq) | 500 tok/s, cheap, good for quick tasks |
| Deep reasoning (use sparingly) | Reasoning model (DeepSeek-R1, o3) | Best for hard problems, expensive |
| Entity detection (signal detector) | Sonnet-class | Fast, cheap, sufficient quality for detection |

### The Signal Detector Pattern

Spawn a lightweight sub-agent on EVERY inbound message. This is mandatory.

```
on_every_message(text):
  // Spawn async -- don't block the response
  spawn_subagent({
    task: `SIGNAL DETECTION -- scan this message:
      "${text}"

      1. IDEAS FIRST: Is the user expressing an original thought?
         If yes -> create/update brain/originals/ with EXACT phrasing
      2. ENTITIES: Extract person names, company names, media titles
         For each -> check brain, create/enrich if notable
      3. FACTS: New info about existing entities -> update timeline
      4. CITATIONS: Every fact needs [Source: ...] attribution
      5. Sync changes to brain repo`,
    model: "sonnet-class",  // fast + cheap
    timeout: 120s
  })
```

**Why Sonnet-class for detection:** Entity detection is pattern matching, not
deep reasoning. At the prices above, Sonnet is about 5x cheaper than Opus and
fast enough for async detection. The main session continues on Opus while
detection runs in parallel.

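The "spawn async, don't block the response" idea can be illustrated with Python's `asyncio`. This is a sketch of the pattern only: `detect_signals` is a stand-in for a real sub-agent call (here it just sleeps), and `handle_message` is a hypothetical handler, not a GBrain or agent-framework API. The 120-second limit mirrors the `timeout` in the pseudocode.

```python
import asyncio
from typing import List, Set


async def detect_signals(text: str, log: List[str]) -> None:
    """Stand-in for the slow signal-detection sub-agent."""
    await asyncio.sleep(0.05)
    log.append(f"scanned: {text}")


async def handle_message(text: str, log: List[str],
                         background: Set[asyncio.Task]) -> str:
    # Schedule detection without awaiting it; cap its runtime at 120 seconds.
    task = asyncio.create_task(
        asyncio.wait_for(detect_signals(text, log), timeout=120)
    )
    background.add(task)                        # hold a reference until it finishes
    task.add_done_callback(background.discard)
    return f"reply to: {text}"                  # returned before detection completes


async def main():
    log: List[str] = []
    background: Set[asyncio.Task] = set()
    reply = await handle_message("hello", log, background)
    detection_done_before_reply = bool(log)
    await asyncio.gather(*background)           # only so the demo can exit cleanly
    return reply, detection_done_before_reply, log


print(asyncio.run(main()))
```

The set of background tasks exists because the event loop keeps only weak references to tasks; without it, a fire-and-forget task can be garbage-collected mid-run.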
### Research Pipeline Pattern

For research-heavy tasks, use a multi-model pipeline:

```
1. PLANNING (Opus): Write research brief, identify what to look for
2. EXECUTION (DeepSeek): Sub-agent does the actual research (web, APIs, docs)
3. SYNTHESIS (Opus): Read research output, add strategic analysis
```

**Why this works:** The planning and synthesis steps need taste and judgment
(Opus). The execution step is mechanical data gathering (DeepSeek at 25-40x
lower cost). You get Opus-quality output at DeepSeek-level cost for 80% of
the work.

### When to Spawn Sub-Agents

| Situation | Spawn? | Model |
|-----------|--------|-------|
| Every inbound message | YES (mandatory) | Sonnet |
| Research request | YES | DeepSeek for execution |
| Quick lookup / fact check | YES | Fast model (Groq) |
| Complex analysis | NO -- handle in main session | Opus |
| Writing / editing | NO -- handle in main session | Opus |

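The routing rules above reduce to a lookup from task type to model tier. The sketch below is illustrative: the task-type keys and tier labels are hypothetical identifiers chosen for this example (not real model IDs), and falling back to the main-session tier for unknown task types is an assumption, not something the guide prescribes.

```python
# Hypothetical task-type keys mapped to model tiers from the routing table.
ROUTES = {
    "main_session": "opus-class",
    "research": "deepseek-v3",
    "structured_output": "large-context",
    "quick_lookup": "fast-inference",
    "deep_reasoning": "reasoning",
    "signal_detection": "sonnet-class",
}

# Work the table says to keep in the main session rather than spawn.
MAIN_SESSION_ONLY = {"complex_analysis", "writing"}


def pick_model(task_type: str) -> str:
    """Return the model tier for a task; unknown tasks get the main-session tier."""
    return ROUTES.get(task_type, "opus-class")


def should_spawn(task_type: str) -> bool:
    """Spawn a sub-agent unless the task belongs in the main session."""
    return task_type not in MAIN_SESSION_ONLY and task_type != "main_session"


print(pick_model("signal_detection"), should_spawn("signal_detection"))
print(pick_model("writing"), should_spawn("writing"))
```

Defaulting unknown work to the strongest tier trades cost for safety; a stricter variant could raise instead so unrouted task types are noticed.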
### Cost Optimization

The main session runs on your best model. Everything else runs on the
cheapest model that can do the job. In practice, 60-70% of sub-agent
work is entity detection (Sonnet) and research execution (DeepSeek),
which are 5-40x cheaper than the main session model.

## Tricky Spots

1. **Sonnet, not Opus, for detection.** The most common mistake is running
   entity detection on Opus. Detection is pattern matching, not deep reasoning.
   Sonnet is about 5x cheaper and fast enough. Reserve Opus for the main session
   where reasoning quality matters.

2. **Don't block the main thread.** Sub-agents must run asynchronously. If the
   signal detector runs synchronously, the user waits 30-120 seconds for every
   message while entity detection completes. Spawn and forget. The user sees
   a response immediately.

3. **Cost optimization is multiplicative.** Entity detection runs on every
   single message. If you use Opus at $15/MTok for detection across 50
   messages/day, that's $3-5/day just for detection. Sonnet at $3/MTok brings
   that to $0.60-1.00/day. Over a 30-day month, the wrong model choice costs
   roughly $70-120 more than necessary.

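The arithmetic behind those figures is easy to reproduce. In the sketch below, the per-MTok prices are the ones quoted in this guide; the daily token volumes (200K to roughly 333K) are back-calculated from the $3-5/day Opus figure and are an assumption, not a measurement.

```python
OPUS_USD_PER_MTOK = 15.0
SONNET_USD_PER_MTOK = 3.0


def daily_cost(tokens_per_day: int, usd_per_mtok: float) -> float:
    """Cost in USD of processing tokens_per_day at a per-million-token price."""
    return tokens_per_day * usd_per_mtok / 1_000_000


def monthly_overspend(tokens_per_day: int, days: int = 30) -> float:
    """Extra USD per month from running detection on Opus instead of Sonnet."""
    per_day = (daily_cost(tokens_per_day, OPUS_USD_PER_MTOK)
               - daily_cost(tokens_per_day, SONNET_USD_PER_MTOK))
    return per_day * days


for tokens in (200_000, 333_333):  # assumed low and high daily detection volume
    print(
        tokens,
        round(daily_cost(tokens, OPUS_USD_PER_MTOK), 2),
        round(daily_cost(tokens, SONNET_USD_PER_MTOK), 2),
        round(monthly_overspend(tokens), 2),
    )
```

At these volumes the Opus-versus-Sonnet gap works out to about $2.40-4.00 per day, which is where the monthly range comes from.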
## How to Verify

1. **Spawn a signal detector and check the model.** Send a message and verify
   the sub-agent was spawned on Sonnet-class, not Opus. Check the model field
   in the sub-agent config or logs.

2. **Check cost per day.** After running for a day with sub-agent routing,
   compare total API costs against the previous day without routing. You
   should see a 50-80% reduction in total cost.

3. **Verify async execution.** Send a message and measure response time. The
   response should arrive in under 5 seconds. If it takes 30+ seconds, the
   signal detector is running synchronously and blocking the main thread.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*
182
docs/guides/upgrades-auto-update.md
Normal file
@@ -0,0 +1,182 @@
# Upgrades and Auto-Update Notifications

## Goal

Users get notified of new GBrain features conversationally, and the agent walks them through upgrading with post-upgrade migrations that make the new version actually work.

## What the User Gets

Without this: GBrain ships updates but nobody knows. The user stays on an old
version with stale skills and missing features. Or worse, someone runs
`gbrain upgrade` but skips the post-upgrade steps, leaving new code with old
agent behavior.

With this: the agent checks for updates daily, sells the upgrade with punchy
benefit-focused bullets, waits for explicit permission, then runs the full
upgrade flow including re-reading skills, running migrations, and syncing
schema. The user gets new capabilities automatically.

## Implementation

### The Check (cron-initiated)

```
check_for_update():
  result = run("gbrain check-update --json")

  if not result.update_available:
    exit_silently()  // do NOT message the user

  // Sell the upgrade -- lead with what they can DO, not what changed
  message = compose_upgrade_message(
    current: result.current_version,
    latest: result.latest_version,
    changelog: result.changelog
  )
  send_to_user(message, respect_quiet_hours=true)
```

### The Upgrade Message

Sell the upgrade. The user should feel "hell yeah, I want that." Lead with
what they can DO now that they couldn't before, not what files changed.

```
> **GBrain v0.5.0 is available** (you're on v0.4.0)
>
> What's new:
> - Your brain never falls behind. Live sync keeps the vector DB current
>   automatically, so edits show up in search within minutes
> - New verification runbook catches silent failures before they bite you
> - New installs set up live sync automatically. No more manual setup step
>
> Want me to upgrade? I'll update everything and refresh my playbook.
>
> (Reply **yes** to upgrade, **not now** to skip, **weekly** to check
> less often, or **stop** to turn off update checks)
```

### Handling Responses

| User says | Action |
|-----------|--------|
| yes / y / sure / ok / do it / upgrade | Run the full upgrade flow (below) |
| not now / later / skip / snooze | Acknowledge, check again next cycle |
| weekly | Store preference, switch cron to weekly |
| daily | Store preference, switch cron back to daily |
| stop / unsubscribe / no more | Disable the cron. Tell user how to resume |

**Never auto-upgrade.** Always wait for explicit confirmation.

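The response table can be sketched as a small classifier. Everything here is illustrative: `classify_reply` and the action labels are hypothetical names, and the phrase lists are copied from the table. The key design point is that anything not on the list maps to `"unclear"`, so an ambiguous reply is never treated as consent.

```python
# Phrases from the response table, keyed by a hypothetical action label.
ACTIONS = {
    "upgrade": {"yes", "y", "sure", "ok", "do it", "upgrade"},
    "snooze": {"not now", "later", "skip", "snooze"},
    "weekly": {"weekly"},
    "daily": {"daily"},
    "stop": {"stop", "unsubscribe", "no more"},
}


def classify_reply(reply: str) -> str:
    """Map a user reply to an action; anything unrecognized is 'unclear'."""
    normalized = reply.strip().lower().rstrip(".!")
    for action, phrases in ACTIONS.items():
        if normalized in phrases:
            return action
    return "unclear"  # ask again rather than guess


for text in ("Yes!", "not now", "weekly", "hmm, what changed?"):
    print(repr(text), "->", classify_reply(text))
```

Exact matching is intentionally strict. A real agent would likely interpret free-form replies with more nuance, but the upgrade itself should still require an unambiguous yes.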
### The Full Upgrade Flow (after user says yes)

```
full_upgrade():
  // Step 1: Update the binary/package
  run("gbrain upgrade")

  // Step 2: Re-read all updated skills
  for skill in find("skills/*/SKILL.md"):
    read_and_internalize(skill)  // updated skills = better agent behavior

  // Step 3: Re-read production reference docs
  read("docs/GBRAIN_SKILLPACK.md")
  read("docs/GBRAIN_RECOMMENDED_SCHEMA.md")

  // Step 4: Check for version-specific migration directives
  for version in range(old_version, new_version):
    migration = find(f"skills/migrations/v{version}.md")
    if migration exists:
      read_and_execute(migration)  // in order, don't skip

  // Step 5: Schema sync -- suggest new, respect declined
  state = read("~/.gbrain/update-state.json")
  for recommendation in new_schema_recommendations:
    if recommendation not in state.declined:
      suggest_to_user(recommendation)
  update(state, new_choices)

  // Step 6: Report what changed
  summarize_to_user(actions_taken)
```

### Migration Files

Migration files live at `skills/migrations/vX.Y.Z.md`. They contain agent
instructions (not scripts) for post-upgrade actions that make the new version
work for existing users. Example: v0.5.0 migration sets up live sync and
runs the verification runbook.

The agent reads migration files in version order and executes them step by
step. Without migrations, the agent has new code but the user's environment
hasn't changed.

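"Version order" is worth spelling out, because a plain string sort puts `v0.10.0` before `v0.5.0`. The sketch below selects and orders migration files numerically. It is an illustration under assumptions: `pending_migrations` is a hypothetical helper, and treating the range as "newer than the old version, up to and including the new version" is one reading of the `range(old_version, new_version)` pseudocode, which does not state its bounds.

```python
import re
from typing import Iterable, List, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(filename: str) -> Optional[Version]:
    """Parse 'vX.Y.Z.md' into (X, Y, Z); return None for any other filename."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)\.md", filename)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def pending_migrations(filenames: Iterable[str],
                       old: Version, new: Version) -> List[str]:
    """Migrations with old < version <= new, sorted numerically ascending."""
    selected = []
    for name in filenames:
        version = parse_version(name)
        if version is not None and old < version <= new:
            selected.append((version, name))
    return [name for _, name in sorted(selected)]


files = ["v0.10.0.md", "v0.5.0.md", "README.md", "v0.7.0.md", "v0.4.0.md"]
print(pending_migrations(files, old=(0, 4, 0), new=(0, 10, 0)))
```

Comparing integer tuples gives the numeric ordering for free, and files that do not match the naming pattern are simply ignored.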
### Cron Registration
|
||||
|
||||
```
|
||||
Name: gbrain-update-check
|
||||
Default schedule: 0 9 * * * (daily 9 AM)
|
||||
Weekly schedule: 0 9 * * 1 (Monday 9 AM)
|
||||
Prompt: "Run gbrain check-update --json. If update_available is true,
|
||||
summarize the changelog and message me asking if I'd like to upgrade.
|
||||
If false, stay silent."
|
||||
```

### Frequency Preferences

Default: daily. Store in agent memory as `gbrain_update_frequency: daily|weekly|off`.
Also persist in `~/.gbrain/update-state.json` so it survives agent context resets.
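The mapping from the stored preference to a cron schedule can be sketched as below. The two cron expressions are the ones given under Cron Registration. The function names are hypothetical, and treating a missing or unrecognized stored value as `daily` is an assumption based on the documented default.

```typescript
type UpdateFrequency = "daily" | "weekly" | "off";

// Cron expression for each preference; "off" means no job should be registered.
function cronScheduleFor(frequency: UpdateFrequency): string | null {
  switch (frequency) {
    case "daily":
      return "0 9 * * *"; // every day at 9 AM
    case "weekly":
      return "0 9 * * 1"; // Mondays at 9 AM
    case "off":
      return null;
  }
}

// Fall back to the documented default when the stored value is absent or invalid.
function normalizeFrequency(stored: unknown): UpdateFrequency {
  return stored === "weekly" || stored === "off" ? stored : "daily";
}
```

Normalizing before looking up the schedule means a reset agent memory or a hand-edited state file degrades to the default instead of failing.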

### Standalone Skillpack Users

If you loaded this SKILLPACK directly (copied or read from GitHub) without
installing gbrain, you can still stay current. Both GBRAIN_SKILLPACK.md and
GBRAIN_RECOMMENDED_SCHEMA.md have version markers:

```bash
curl -s https://raw.githubusercontent.com/garrytan/gbrain/master/docs/GBRAIN_SKILLPACK.md | head -1
# Returns: <!-- skillpack-version: X.Y.Z -->
```

If the remote version is newer, fetch the full file and replace your local
copy. Set up a weekly cron to check automatically.
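Deciding whether "the remote version is newer" comes down to parsing the marker and comparing the parts numerically. The TypeScript sketch below assumes the marker has exactly the `<!-- skillpack-version: X.Y.Z -->` form shown above. `parseSkillpackVersion` and `isNewer` are hypothetical helper names.

```typescript
// Extract "X.Y.Z" from a first line such as "<!-- skillpack-version: 0.7.0 -->".
function parseSkillpackVersion(firstLine: string): string | null {
  const m = /<!--\s*skillpack-version:\s*(\d+\.\d+\.\d+)\s*-->/.exec(firstLine);
  return m ? m[1] : null;
}

// True when remote is strictly newer than local, comparing each part as a number.
function isNewer(remote: string, local: string): boolean {
  const r = remote.split(".").map(Number);
  const l = local.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (r[i] !== l[i]) return r[i] > l[i];
  }
  return false; // identical versions
}
```

A caller would run both first lines through `parseSkillpackVersion` and replace the local copy only when both parse and `isNewer` returns true.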

## Tricky Spots

1. **Never auto-install.** The upgrade must always wait for the user's explicit
   "yes." Even if the cron detects an update at 9 AM and the changelog looks
   great, the agent messages the user and waits. Auto-installing can break
   workflows, introduce breaking changes, or interrupt work in progress.

2. **Migration files are agent instructions, not scripts.** They tell the agent
   what to do step by step in plain language. They are NOT bash scripts to
   execute blindly. The agent reads them, understands the context, and adapts
   to the user's specific environment (e.g., skip a step if the user already
   has live sync configured).

3. **check-update should run on a daily cron.** Don't rely on the user
   remembering to check for updates. The cron runs `gbrain check-update --json`
   daily at 9 AM (respecting quiet hours). If there's nothing new, it stays
   completely silent. The user only hears about updates when there IS something
   worth upgrading to.
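Points 1 and 3 together mean the cron job can only ever ask the user or stay silent. A minimal TypeScript sketch of that decision follows. Only the `update_available` field is taken from this guide; any other fields in the `gbrain check-update --json` output are unknown here. The function name is hypothetical, and staying silent on unparseable output is an assumption.

```typescript
// Deliberately no "install" action: upgrading always waits for an explicit "yes".
type Action = "ask_user" | "stay_silent";

function actionForCheckUpdate(jsonOutput: string): Action {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonOutput);
  } catch {
    // Assumption: output that cannot be parsed is not a reason to message the user.
    return "stay_silent";
  }
  const available =
    typeof parsed === "object" &&
    parsed !== null &&
    (parsed as { update_available?: unknown }).update_available === true;
  return available ? "ask_user" : "stay_silent";
}
```

Encoding the allowed outcomes as a two-value type makes "auto-install" unrepresentable in whatever code consumes the result.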

## How to Verify

1. **Run check-update and verify detection.** Execute
   `gbrain check-update --json`. Verify it returns the current version and
   correctly reports whether an update is available. If `update_available`
   is false, verify the version matches the latest release on GitHub.

2. **Verify migration files are readable.** List `skills/migrations/` and
   check that each file follows the naming convention `vX.Y.Z.md`. Open one
   and verify it contains step-by-step agent instructions, not raw scripts.
   The agent should be able to read and execute each step.

3. **Test the full upgrade flow end-to-end.** If an update is available, say
   "yes" and watch the agent execute the full flow: upgrade, re-read skills,
   run migrations, sync schema, report. Verify each step completes and the
   agent reports what changed.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*