feat: GBrain v0.1.0 — Postgres-native personal knowledge brain (#1)

* chore: add CLAUDE.md with project context and gstack skill routing rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: initialize project with Bun + TypeScript

package.json with dependencies (postgres, pgvector, openai, anthropic, MCP SDK, gray-matter). TypeScript config targeting ESNext with bundler module resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add foundation layer - engine interface, Postgres engine, schema

BrainEngine pluggable interface with full PostgresEngine: CRUD, search (keyword + vector), links, tags, timeline, versions, stats, health, ingest log, config. Trigger-based tsvector spanning pages + timeline_entries. Markdown parser with frontmatter, compiled_truth / timeline splitting, and round-trip serialization. 19 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 3-tier chunking and embedding service

Recursive delimiter-aware chunker (5-level hierarchy, 300-word chunks, 50-word overlap). Semantic chunker with Savitzky-Golay boundary detection and recursive fallback. LLM-guided chunker via Claude Haiku with sliding window topic detection. OpenAI embedding service with batch support, exponential backoff, and rate limit handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add hybrid search with RRF fusion, expansion, and 4-layer dedup

Hybrid search merges vector (pgvector HNSW) + keyword (tsvector) via Reciprocal Rank Fusion. Multi-query expansion via Claude Haiku generates 2 alternative phrasings. 4-layer dedup pipeline: by source, cosine similarity, type diversity (60% cap), per-page cap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

GBRAIN_V0.md: full product spec with architecture decisions, CLI commands, schema, search architecture, chunking strategies, first-time experience, and future plans. ENGINES.md: pluggable engine interface, capability matrix, how to add new backends. SQLITE_ENGINE.md: complete SQLite implementation plan with schema, FTS5 setup, vector search options, and contributor guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add CLI with all commands

Full CLI dispatcher with 25+ commands: init (Supabase wizard), get, put, delete, list, search, query (hybrid RRF), import (bulk with progress bar), export (round-trip), embed, stats, health, tag/untag/tags, link/unlink/backlinks/graph, timeline/timeline-add, history/revert, config, upgrade, serve, call. Smart slug resolution on reads. Version snapshots on updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP stdio server with all brain tools

20 MCP tools mirroring CLI operations: get/put/delete/list pages, search (keyword), query (hybrid RRF + expansion), tags, links with graph traversal, timeline, stats, health, version history, and revert. Auto-chunks and embeds on put_page. CLI and MCP share the same engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 6 skill files and ClawHub manifest

Fat markdown skills for AI agents: ingest (meetings/docs/articles with timeline merge), query (3-layer search + synthesis + citations), maintain (health checks, stale detection, orphan audit), enrich (external API enrichment), briefing (daily briefing compilation), migrate (universal migration from Obsidian/Notion/Logseq/markdown/CSV/JSON/Roam). ClawHub manifest for skill distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add README, CONTRIBUTING, update CLAUDE.md test references

README with quickstart, commands, architecture, library usage, MCP setup, and links to design docs. CONTRIBUTING with setup, project structure, and guides for adding commands and engines. CLAUDE.md updated to reference actual test files instead of planned-but-unwritten import test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address adversarial review findings - 5 critical/high fixes

- revertToVersion: add page_id check to prevent cross-page data corruption
- traverseGraph: use UNION instead of UNION ALL for cycle safety
- embedAll: preserve all chunks when embedding stale subset only
- embedding: throw on retry exhaustion instead of returning zero vectors
- putPage: validate slugs to prevent path traversal on export

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand README with schema, install, search architecture, and motivation

Why it exists, how search works (with ASCII diagram), full database schema with all 9 tables and index details, chunking strategies explained, storage estimates, setup wizard walkthrough, knowledge model with example page, library usage with more examples, expanded skills table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add MIT license (Copyright 2026 Garry Tan)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add OpenClaw install flow as primary option in README

OpenClaw users just say "install gbrain" and the orchestrator handles everything: package install, Supabase setup wizard, skill registration. Shows the conversational interface for querying, ingesting, and briefings. ClawHub and standalone CLI paths follow as alternatives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add prerequisites and explicit OpenClaw install instructions

Prerequisites table listing Supabase, OpenAI, and Anthropic dependencies with links. Environment variable setup. Explicit step-by-step prompt for OpenClaw users showing exactly what to tell the orchestrator. Note that search degrades gracefully without API keys (keyword-only without OpenAI, no expansion without Anthropic).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: scrub named references, add PG essay demo section to README

Replace all Pedro/Brex/Jensen Huang/River AI examples with Paul Graham essay examples using the kindling corpus. Add "Try it" section to README showing the power of hybrid search on PG essays in 90 seconds. Update test fixtures to use concept pages instead of person pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:48:10 -07:00
parent 3144971cd0
commit b22cbd349a
62 changed files with 6655 additions and 0 deletions
--- a/skills/briefing/SKILL.md
+++ b/skills/briefing/SKILL.md
@@ -0,0 +1,58 @@
+# Briefing Skill
+
+Compile a daily briefing from brain context.
+
+## Workflow
+
+1. **Today's meetings.** For each meeting on the calendar:
+   - Look up all participants via `gbrain query <name>`
+   - Read their pages for compiled_truth context
+   - Summarize: who they are, recent timeline, relationship to you
+2. **Active deals.** `gbrain list --type deal` filtered to active status:
+   - Deadlines approaching in the next 7 days
+   - Recent timeline entries (last 7 days)
+3. **Time-sensitive threads.** Open items from timeline entries:
+   - Items with deadlines in the next 48 hours
+   - Follow-ups that are overdue
+4. **Recent changes.** Pages updated in the last 24 hours:
+   - What changed and why (read timeline entries)
+5. **People in play.** `gbrain list --type person` sorted by recency, keeping people who were:
+   - Updated in the last 7 days
+   - Highly active (many recent timeline entries)
+6. **Stale alerts.** From `gbrain health`:
+   - Pages flagged as stale that are relevant to today's meetings
+
+## Output Format
+
+```
+DAILY BRIEFING — [date]
+========================
+
+MEETINGS TODAY
+- [time] [meeting name]
+  Participants: [name] (slug: people/name, [key context])
+
+ACTIVE DEALS
+- [deal name] — [status], deadline: [date]
+  Recent: [latest timeline entry]
+
+ACTION ITEMS
+- [item] — due [date], related to [slug]
+
+RECENT CHANGES (24h)
+- [slug] — [what changed]
+
+PEOPLE IN PLAY
+- [name] — [why they're active]
+```
+
+## Commands Used
+
+```
+gbrain query <name>
+gbrain get <slug>
+gbrain list --type deal
+gbrain list --type person
+gbrain health
+gbrain timeline <slug>
+```
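The deadline windows in the Briefing workflow (overdue follow-ups, items due within 48 hours, deadlines within 7 days) can be sketched as a small classifier. This is only an illustration under stated assumptions: the `OpenItem` shape and the `classify` and `actionItems` helpers are hypothetical names, not part of gbrain, and the skill itself leaves this logic to the agent.

```typescript
// Hypothetical shape for an open item pulled from timeline entries;
// gbrain's real data model may differ.
interface OpenItem {
  slug: string;
  summary: string;
  due: Date;
}

type Urgency = "overdue" | "next-48h" | "next-7d" | "later";

const HOUR_MS = 60 * 60 * 1000;

// Bucket an item by how far its deadline is from `now`.
function classify(item: OpenItem, now: Date): Urgency {
  const delta = item.due.getTime() - now.getTime();
  if (delta < 0) return "overdue";
  if (delta <= 48 * HOUR_MS) return "next-48h";
  if (delta <= 7 * 24 * HOUR_MS) return "next-7d";
  return "later";
}

// Items for the ACTION ITEMS section: overdue or due within 48 hours,
// earliest deadline first.
function actionItems(items: OpenItem[], now: Date): OpenItem[] {
  return items
    .filter((item) => {
      const urgency = classify(item, now);
      return urgency === "overdue" || urgency === "next-48h";
    })
    .sort((a, b) => a.due.getTime() - b.due.getTime());
}
```

Passing `now` explicitly keeps the function deterministic, which makes the 48-hour and 7-day boundaries easy to test.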
--- a/skills/enrich/SKILL.md
+++ b/skills/enrich/SKILL.md
@@ -0,0 +1,45 @@
+# Enrich Skill
+
+Enrich person and company pages from external APIs.
+
+## Sources
+
+| Source | Data | API |
+|--------|------|-----|
+| Crustdata | LinkedIn profiles, company data | REST API |
+| Happenstance | Career history, connections | REST API |
+| Exa | Web mentions, articles | REST API |
+
+Note: enrichment requires separate API credentials for each service. No client
+integrations ship in v1. This skill guides Claude Code to make API calls directly.
+
+## Workflow
+
+1. **Select target pages.** `gbrain list --type person` or `gbrain list --type company`
+2. **For each page:**
+   - Read current compiled_truth to understand what we already know
+   - Call external APIs for fresh data
+   - Store the raw JSON API responses via `gbrain call put_raw_data`
+   - Distill highlights into compiled_truth updates
+3. **Validation rules:**
+   - Connection count < 20 on LinkedIn = likely wrong person, skip
+   - Name mismatch between brain and API = skip, flag for manual review
+   - Don't overwrite human-written assessments with API boilerplate
+
+## Quality Rules
+
+- Raw data goes to raw_data table (preserves provenance)
+- Only distilled, useful info goes to compiled_truth
+- Always add a timeline entry: "Enriched from [source] on [date]"
+- Don't enrich the same page more than once per week unless requested
+- Rate limit: respect API rate limits, use exponential backoff
+
+## Commands Used
+
+```
+gbrain get <slug>
+gbrain put <slug>
+gbrain timeline-add <slug> <date> "Enriched from <source>"
+gbrain list --type person
+gbrain list --type company
+```
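The Enrich validation and quality rules (skip profiles with fewer than 20 connections, flag name mismatches for manual review, enrich at most once per week unless requested, back off exponentially on rate limits) could be expressed roughly as follows. All names here (`EnrichCandidate`, `decide`, `backoffDelayMs`) are hypothetical, the name comparison is a deliberately naive normalized equality check, and the 500 ms base delay and 30 s cap are assumed values rather than anything the skill specifies.

```typescript
// Hypothetical input assembled from the brain page and an API response.
interface EnrichCandidate {
  brainName: string;
  apiName: string;
  linkedinConnections: number;
  lastEnrichedAt: Date | null;
}

type Decision =
  | { action: "enrich" }
  | { action: "skip"; reason: string }
  | { action: "review"; reason: string };

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Assumption: names match if equal after trimming, lowercasing,
// and collapsing whitespace. Real matching is likely fuzzier.
function normalizeName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

function decide(c: EnrichCandidate, now: Date, requested = false): Decision {
  if (!requested && c.lastEnrichedAt !== null &&
      now.getTime() - c.lastEnrichedAt.getTime() < WEEK_MS) {
    return { action: "skip", reason: "enriched within the last week" };
  }
  if (c.linkedinConnections < 20) {
    return { action: "skip", reason: "likely wrong person" };
  }
  if (normalizeName(c.brainName) !== normalizeName(c.apiName)) {
    return { action: "review", reason: "name mismatch" };
  }
  return { action: "enrich" };
}

// Exponential backoff: base * 2^attempt, capped. Values are assumptions.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

Returning a tagged decision instead of a boolean keeps the "skip" and "flag for manual review" outcomes distinct, which the rules above treat differently.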
--- a/skills/ingest/SKILL.md
+++ b/skills/ingest/SKILL.md
@@ -0,0 +1,34 @@
+# Ingest Skill
+
+Ingest meetings, articles, documents, and conversations into the brain.
+
+## Workflow
+
+1. **Parse the source.** Extract people, companies, dates, and events from the input.
+2. **For each entity mentioned:**
+   - `gbrain get <slug>` to check if page exists
+   - If exists: update compiled_truth (rewrite State section with new info, don't append)
+   - If new: `gbrain put <slug>` to create the page
+3. **Append to timeline.** `gbrain timeline-add <slug> <date> <summary>` for each event.
+4. **Create cross-reference links.** `gbrain link <from> <to> --type <relationship>` for every entity pair mentioned together.
+5. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
+
+## Quality Rules
+
+- The executive summary in compiled_truth must be updated; appending to the timeline alone is not enough
+- State section is REWRITTEN, not appended to. Current best understanding only.
+- Timeline entries are reverse-chronological (newest first)
+- Every person/company mentioned gets a page if one doesn't exist
+- Link types: knows, works_at, invested_in, founded, met_at, discussed
+- Source attribution: every timeline entry includes the source (meeting, article, email, etc.)
+
+## Commands Used
+
+```
+gbrain get <slug>
+gbrain put <slug> < content.md
+gbrain timeline-add <slug> <date> <summary>
+gbrain link <from> <to> --type <type>
+gbrain tags <slug>
+gbrain tag <slug> <tag>
+```
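The timeline-merge and cross-reference steps of the Ingest workflow amount to a fan-out: one timeline entry per mentioned entity, and one link per entity pair. A minimal sketch, assuming hypothetical `TimelineEvent` and `IngestPlan` shapes and example slugs; the link type is left out because the skill expects the agent to choose a relationship (knows, works_at, met_at, and so on) from context.

```typescript
// Hypothetical event shape; every entry carries its source, per the quality rules.
interface TimelineEvent {
  date: string;
  summary: string;
  source: string;
}

interface IngestPlan {
  timelineAdds: { slug: string; event: TimelineEvent }[];
  links: { from: string; to: string }[];
}

// Build the list of timeline-add and link operations for one event.
function planIngest(slugs: string[], event: TimelineEvent): IngestPlan {
  const unique = Array.from(new Set(slugs)); // a slug mentioned twice counts once

  // The same event goes on every mentioned entity's timeline.
  const timelineAdds = unique.map((slug) => ({ slug, event }));

  // One link per unordered pair of entities mentioned together.
  const links: { from: string; to: string }[] = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      links.push({ from: unique[i], to: unique[j] });
    }
  }
  return { timelineAdds, links };
}
```

For the Alice, Bob, and Acme Corp example this yields three timeline entries and three links. The pair count grows quadratically with the number of entities, so a large meeting produces many links.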
--- a/skills/maintain/SKILL.md
+++ b/skills/maintain/SKILL.md
@@ -0,0 +1,59 @@
+# Maintain Skill
+
+Periodic brain health checks and cleanup.
+
+## Workflow
+
+1. **Run health check.** `gbrain health` to get the dashboard.
+2. **Check each dimension:**
+
+### Stale pages
+Pages where compiled_truth is older than the latest timeline entry. The assessment hasn't been updated to reflect recent evidence.
+- `gbrain query "stale pages"` or check health output
+- For each stale page: read timeline, determine if compiled_truth needs rewriting
+
+### Orphan pages
+Pages with zero inbound links. Nobody references them.
+- Review orphans: are they genuinely isolated or just missing links?
+- Add links from related pages or flag for deletion
+
+### Dead links
+Links pointing to pages that don't exist.
+- Remove dead links with `gbrain unlink`
+
+### Missing cross-references
+Pages that mention entity names but don't have formal links.
+- Read compiled_truth, extract entity mentions, create links
+
+### Tag consistency
+Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").
+- Standardize to the most common variant
+
+### Embedding freshness
+Chunks without embeddings, or chunks embedded with an old model.
+- `gbrain embed --stale` to backfill
+
+### Open threads
+Timeline items older than 30 days with unresolved action items.
+- Flag for review
+
+## Quality Rules
+
+- Never delete pages without confirmation
+- Log all changes via timeline entries
+- Run `gbrain health` before and after to show improvement
+
+## Commands Used
+
+```
+gbrain health
+gbrain list [--type T]
+gbrain get <slug>
+gbrain backlinks <slug>
+gbrain link <from> <to> --type <type>
+gbrain unlink <from> <to>
+gbrain tag <slug> <tag>
+gbrain untag <slug> <tag>
+gbrain embed --stale
+gbrain timeline <slug>
+```
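Three of the Maintain dimensions have precise definitions: a page is stale when compiled_truth is older than its latest timeline entry, an orphan has zero inbound links, and a dead link points to a page that does not exist. The sketch below applies those definitions to in-memory data. The `PageInfo` and `Link` shapes and the `audit` function are hypothetical; in gbrain these checks are reported by `gbrain health`, whose implementation is not shown in this diff.

```typescript
// Hypothetical in-memory view of pages and links.
interface PageInfo {
  slug: string;
  compiledTruthUpdatedAt: Date;
  latestTimelineAt: Date | null; // null when the page has no timeline entries
}

interface Link {
  from: string;
  to: string;
}

interface AuditResult {
  stale: string[];
  orphans: string[];
  deadLinks: Link[];
}

function audit(pages: PageInfo[], links: Link[]): AuditResult {
  const known = new Set(pages.map((p) => p.slug));
  const linkedTo = new Set(links.map((l) => l.to));

  // Stale: the assessment predates the newest evidence.
  const stale = pages
    .filter((p) => p.latestTimelineAt !== null &&
      p.compiledTruthUpdatedAt.getTime() < p.latestTimelineAt.getTime())
    .map((p) => p.slug);

  // Orphan: no link targets this page.
  const orphans = pages.filter((p) => !linkedTo.has(p.slug)).map((p) => p.slug);

  // Dead link: the target slug is not a known page.
  const deadLinks = links.filter((l) => !known.has(l.to));

  return { stale, orphans, deadLinks };
}
```

Running the audit before and after a cleanup pass gives the before/after comparison that the quality rules ask for.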
--- a/skills/manifest.json
+++ b/skills/manifest.json
@@ -0,0 +1,45 @@
+{
+  "name": "gbrain",
+  "version": "0.1.0",
+  "description": "Personal knowledge brain with hybrid RAG search",
+  "skills": [
+    {
+      "name": "ingest",
+      "path": "ingest/SKILL.md",
+      "description": "Ingest meetings, docs, articles into the brain"
+    },
+    {
+      "name": "query",
+      "path": "query/SKILL.md",
+      "description": "Answer questions using 3-layer search and synthesis"
+    },
+    {
+      "name": "maintain",
+      "path": "maintain/SKILL.md",
+      "description": "Brain health checks: contradictions, stale info, orphans"
+    },
+    {
+      "name": "enrich",
+      "path": "enrich/SKILL.md",
+      "description": "Enrich pages from external APIs (Crustdata, Happenstance, Exa)"
+    },
+    {
+      "name": "briefing",
+      "path": "briefing/SKILL.md",
+      "description": "Compile daily briefing with meeting context and active deals"
+    },
+    {
+      "name": "migrate",
+      "path": "migrate/SKILL.md",
+      "description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
+    }
+  ],
+  "dependencies": {
+    "runtime": "bun",
+    "package": "gbrain"
+  },
+  "setup": {
+    "command": "gbrain init --supabase",
+    "description": "Initialize brain with Supabase (guided wizard)"
+  }
+}
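For readers consuming this manifest from TypeScript, its shape can be described with a few interfaces and a light sanity check. This is a sketch, not a ClawHub schema: the interface names and `checkManifest` are hypothetical, and the `<name>/SKILL.md` path convention is only inferred from the six entries above.

```typescript
// Types mirroring the fields that appear in skills/manifest.json.
interface SkillEntry {
  name: string;
  path: string;
  description: string;
}

interface SkillManifest {
  name: string;
  version: string;
  description: string;
  skills: SkillEntry[];
  dependencies: { runtime: string; package: string };
  setup: { command: string; description: string };
}

// Returns a list of problems; an empty list means the manifest looks consistent.
function checkManifest(manifest: SkillManifest): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const skill of manifest.skills) {
    if (seen.has(skill.name)) {
      problems.push("duplicate skill name: " + skill.name);
    }
    seen.add(skill.name);
    // Assumed convention, inferred from the entries in this manifest.
    if (skill.path !== skill.name + "/SKILL.md") {
      problems.push("unexpected path for " + skill.name + ": " + skill.path);
    }
  }
  return problems;
}
```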
--- a/skills/migrate/SKILL.md
+++ b/skills/migrate/SKILL.md
@@ -0,0 +1,87 @@
+# Migrate Skill
+
+Universal migration from any wiki, note tool, or brain system into GBrain.
+
+## Supported Sources
+
+| Source | Format | Strategy |
+|--------|--------|----------|
+| Obsidian | Markdown + `[[wikilinks]]` | Direct import, convert wikilinks to gbrain links |
+| Notion | Exported markdown or CSV | Parse Notion's export structure |
+| Logseq | Markdown with `((block refs))` | Convert block refs to page links |
+| Plain markdown | Any .md directory | `gbrain import <dir>` directly |
+| CSV | Tabular data | Map columns to frontmatter fields |
+| JSON | Structured data | Map keys to page fields |
+| Roam | JSON export | Convert block structure to pages |
+
+## General Workflow
+
+1. **Assess the source.** What format? How many files? What structure?
+2. **Plan the mapping.** How do source fields map to gbrain fields (type, title, tags, compiled_truth, timeline)?
+3. **Test with a sample.** Import 5-10 files, verify with `gbrain get` and `gbrain export`.
+4. **Bulk import.** Run the full migration.
+5. **Verify.** `gbrain health` + `gbrain stats` + spot-check pages.
+6. **Build links.** Extract cross-references from content and create typed links.
+
+## Obsidian Migration
+
+```bash
+# 1. Direct import (Obsidian vaults are markdown directories)
+gbrain import /path/to/vault/
+
+# 2. Convert [[wikilinks]] to gbrain links
+# The skill reads each page's compiled_truth, finds [[Name]] patterns,
+# resolves them to slugs, and creates links:
+gbrain get <slug>  # read content
+# For each [[Name]] found:
+gbrain link <current-slug> <resolved-slug> --type references
+```
+
+Obsidian-specific:
+- `[[Name]]` becomes `gbrain link`
+- `[[Name|alias]]` uses the alias for context
+- Tags (`#tag`) become `gbrain tag`
+- Frontmatter properties map to gbrain frontmatter
+- Attachments (images, PDFs) are noted but not imported (future work)
+
+## Notion Migration
+
+1. Export from Notion: Settings > Export > Markdown & CSV
+2. Notion exports nested directories with UUIDs in filenames
+3. Strip UUIDs from filenames for clean slugs
+4. Map Notion's database properties to frontmatter
+5. `gbrain import` the cleaned directory
+
+## CSV Migration
+
+For tabular data (e.g., CRM exports, contact lists):
+
+```bash
+# For each row in the CSV:
+# 1. Create a page with column values as frontmatter
+# 2. Use a designated column as the slug (e.g., name)
+# 3. Use another column as compiled_truth (e.g., notes)
+gbrain put <slug> < generated.md
+```
+
+## Verification
+
+After any migration:
+1. `gbrain stats` — check page count matches source
+2. `gbrain health` — check for orphans, missing embeddings
+3. `gbrain export --dir /tmp/verify/` — round-trip test
+4. Spot-check 5-10 pages with `gbrain get`
+5. Test search: run `gbrain query "<name>"` with the name of someone you know is in the data
+
+## Commands Used
+
+```
+gbrain import <dir> [--no-embed]
+gbrain get <slug>
+gbrain put <slug>
+gbrain link <from> <to> --type <type>
+gbrain tag <slug> <tag>
+gbrain stats
+gbrain health
+gbrain export [--dir ./verify/]
+```
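Two of the migration steps are mechanical text transforms: finding `[[Name]]` and `[[Name|alias]]` wikilinks in Obsidian content, and stripping the ID suffix that Notion appends to exported filenames. The sketch below covers both under stated assumptions. The helper names are hypothetical, the Notion suffix is assumed to be a space followed by 32 hexadecimal characters, `toSlug` is an illustrative slug rule rather than gbrain's actual slug resolution, and heading links such as `[[Name#Section]]` are not handled specially.

```typescript
interface Wikilink {
  target: string;
  alias: string | null;
}

// Find [[Name]] and [[Name|alias]] patterns in markdown text.
function extractWikilinks(markdown: string): Wikilink[] {
  const pattern = /\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]/g;
  const found: Wikilink[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    found.push({
      target: match[1].trim(),
      alias: match[2] ? match[2].trim() : null,
    });
  }
  return found;
}

// Assumption: Notion export names end in " <32 hex chars>" before the extension.
function stripNotionId(filename: string): string {
  return filename.replace(/\s+[0-9a-f]{32}(?=\.[^.]+$|$)/i, "");
}

// Illustrative slug rule only: lowercase, non-alphanumerics become hyphens.
function toSlug(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}
```

Each extracted target would then be resolved to a slug and passed to `gbrain link <current-slug> <resolved-slug> --type references`, as in the Obsidian section above.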
--- a/skills/query/SKILL.md
+++ b/skills/query/SKILL.md
@@ -0,0 +1,38 @@
+# Query Skill
+
+Answer questions using the brain's knowledge with 3-layer search and synthesis.
+
+## Workflow
+
+1. **Decompose the question** into search strategies:
+   - Keyword search for specific names, dates, terms
+   - Semantic query for conceptual questions
+   - Structured queries (list by type, backlinks) for relational questions
+2. **Execute searches:**
+   - `gbrain search <keywords>` for FTS matches
+   - `gbrain query <question>` for hybrid semantic+keyword with expansion
+   - `gbrain list --type <type>` or `gbrain backlinks <slug>` for structural queries
+3. **Read top results.** `gbrain get <slug>` for the top 3-5 pages to get full context.
+4. **Synthesize answer** with citations. Every claim traces back to a specific page slug.
+5. **Flag gaps.** If the brain doesn't have info, say "the brain doesn't have information on X" rather than hallucinating.
+
+## Quality Rules
+
+- Never hallucinate. Only answer from brain content.
+- Cite sources: "According to concepts/do-things-that-dont-scale..."
+- Flag stale results: if a search result shows [STALE], note that the info may be outdated
+- For "who" questions, use backlinks and typed links to find connections
+- For "what happened" questions, use timeline entries
+- For "what do we know" questions, read compiled_truth directly
+
+## Commands Used
+
+```
+gbrain search <query>
+gbrain query <question>
+gbrain get <slug>
+gbrain list [--type T] [--tag T]
+gbrain backlinks <slug>
+gbrain graph <slug> [--depth N]
+gbrain timeline <slug>
+```
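The Query skill relies on `gbrain query` for hybrid semantic+keyword search, and the commit message names Reciprocal Rank Fusion as the step that merges the vector and keyword result lists. The following is a generic RRF sketch, not gbrain's implementation: `rrfFuse` is a hypothetical name, and `k = 60` is the constant commonly used with RRF, assumed here because the source does not state a value.

```typescript
// Fuse several ranked lists of ids (best first) into one ranking.
// Each list contributes 1 / (k + rank) per id, with rank starting at 1.
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; ties broken by id for determinism.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}
```

Because RRF uses only ranks, it needs no score normalization between the keyword and vector lists, and ids that rank well in both lists rise to the top.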