feat: GBrain v0.1.0 — Postgres-native personal knowledge brain (#1)
* chore: add CLAUDE.md with project context and gstack skill routing rules

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: initialize project with Bun + TypeScript

  package.json with dependencies (postgres, pgvector, openai, anthropic, MCP SDK, gray-matter). TypeScript config targeting ESNext with bundler module resolution.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add foundation layer - engine interface, Postgres engine, schema

  BrainEngine pluggable interface with full PostgresEngine: CRUD, search (keyword + vector), links, tags, timeline, versions, stats, health, ingest log, config. Trigger-based tsvector spanning pages + timeline_entries. Markdown parser with frontmatter, compiled_truth / timeline splitting, and round-trip serialization. 19 tests passing.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 3-tier chunking and embedding service

  Recursive delimiter-aware chunker (5-level hierarchy, 300-word chunks, 50-word overlap). Semantic chunker with Savitzky-Golay boundary detection and recursive fallback. LLM-guided chunker via Claude Haiku with sliding window topic detection. OpenAI embedding service with batch support, exponential backoff, and rate limit handling.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add hybrid search with RRF fusion, expansion, and 4-layer dedup

  Hybrid search merges vector (pgvector HNSW) + keyword (tsvector) via Reciprocal Rank Fusion. Multi-query expansion via Claude Haiku generates 2 alternative phrasings. 4-layer dedup pipeline: by source, cosine similarity, type diversity (60% cap), per-page cap.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

  GBRAIN_V0.md: full product spec with architecture decisions, CLI commands, schema, search architecture, chunking strategies, first-time experience, and future plans.
  ENGINES.md: pluggable engine interface, capability matrix, how to add new backends.
  SQLITE_ENGINE.md: complete SQLite implementation plan with schema, FTS5 setup, vector search options, and contributor guide.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add CLI with all commands

  Full CLI dispatcher with 25+ commands: init (Supabase wizard), get, put, delete, list, search, query (hybrid RRF), import (bulk with progress bar), export (round-trip), embed, stats, health, tag/untag/tags, link/unlink/backlinks/graph, timeline/timeline-add, history/revert, config, upgrade, serve, call. Smart slug resolution on reads. Version snapshots on updates.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP stdio server with all brain tools

  20 MCP tools mirroring CLI operations: get/put/delete/list pages, search (keyword), query (hybrid RRF + expansion), tags, links with graph traversal, timeline, stats, health, version history, and revert. Auto-chunks and embeds on put_page. CLI and MCP share the same engine.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add 6 skill files and ClawHub manifest

  Fat markdown skills for AI agents: ingest (meetings/docs/articles with timeline merge), query (3-layer search + synthesis + citations), maintain (health checks, stale detection, orphan audit), enrich (external API enrichment), briefing (daily briefing compilation), migrate (universal migration from Obsidian/Notion/Logseq/markdown/CSV/JSON/Roam). ClawHub manifest for skill distribution.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add README, CONTRIBUTING, update CLAUDE.md test references

  README with quickstart, commands, architecture, library usage, MCP setup, and links to design docs. CONTRIBUTING with setup, project structure, and guides for adding commands and engines. CLAUDE.md updated to reference actual test files instead of planned-but-unwritten import test.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address adversarial review findings - 5 critical/high fixes

  - revertToVersion: add page_id check to prevent cross-page data corruption
  - traverseGraph: use UNION instead of UNION ALL for cycle safety
  - embedAll: preserve all chunks when embedding stale subset only
  - embedding: throw on retry exhaustion instead of returning zero vectors
  - putPage: validate slugs to prevent path traversal on export

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.1.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand README with schema, install, search architecture, and motivation

  Why it exists, how search works (with ASCII diagram), full database schema with all 9 tables and index details, chunking strategies explained, storage estimates, setup wizard walkthrough, knowledge model with example page, library usage with more examples, expanded skills table.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add MIT license (Copyright 2026 Garry Tan)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add OpenClaw install flow as primary option in README

  OpenClaw users just say "install gbrain" and the orchestrator handles everything: package install, Supabase setup wizard, skill registration. Shows the conversational interface for querying, ingesting, and briefings. ClawHub and standalone CLI paths follow as alternatives.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add prerequisites and explicit OpenClaw install instructions

  Prerequisites table listing Supabase, OpenAI, and Anthropic dependencies with links. Environment variable setup. Explicit step-by-step prompt for OpenClaw users showing exactly what to tell the orchestrator. Note that search degrades gracefully without API keys (keyword-only without OpenAI, no expansion without Anthropic).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: scrub named references, add PG essay demo section to README

  Replace all Pedro/Brex/Jensen Huang/River AI examples with Paul Graham essay examples using the kindling corpus. Add "Try it" section to README showing the power of hybrid search on PG essays in 90 seconds. Update test fixtures to use concept pages instead of person pages.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4
.gitignore
vendored
Normal file
@@ -0,0 +1,4 @@
node_modules/
bin/
.DS_Store
*.log
27
CHANGELOG.md
Normal file
@@ -0,0 +1,27 @@
# Changelog

All notable changes to GBrain will be documented in this file.

## [0.1.0] - 2026-04-05

### Added

- Pluggable engine interface (`BrainEngine`) with full Postgres + pgvector implementation
- 25+ CLI commands: init, get, put, delete, list, search, query, import, export, embed, stats, health, link/unlink/backlinks/graph, tag/untag/tags, timeline/timeline-add, history/revert, config, upgrade, serve, call
- MCP stdio server with 20 tools mirroring all CLI operations
- 3-tier chunking: recursive (delimiter-aware), semantic (Savitzky-Golay boundary detection), LLM-guided (Claude Haiku topic shifts)
- Hybrid search with Reciprocal Rank Fusion merging vector + keyword results
- Multi-query expansion via Claude Haiku (2 alternative phrasings per query)
- 4-layer dedup pipeline: by source, cosine similarity, type diversity, per-page cap
- OpenAI embedding service (text-embedding-3-large, 1536 dims) with batch support and exponential backoff
- Postgres schema with pgvector HNSW, tsvector (trigger-based, spans timeline_entries), pg_trgm fuzzy slug matching
- Smart slug resolution for reads (fuzzy match via pg_trgm)
- Page version control with snapshot, history, and revert
- Typed links with recursive CTE graph traversal (max depth configurable)
- Brain health dashboard (embed coverage, stale pages, orphans, dead links)
- Stale alert annotations in search results
- Supabase init wizard with CLI auto-provision fallback
- Slug validation to prevent path traversal on export
- 6 fat markdown skills: ingest, query, maintain, enrich, briefing, migrate
- ClawHub manifest for skill distribution
- Full design docs: GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan
61
CLAUDE.md
Normal file
@@ -0,0 +1,61 @@
# CLAUDE.md

GBrain is a personal knowledge brain. Postgres + pgvector + hybrid search in a managed Supabase instance.

## Architecture

Thin CLI + fat skills. The CLI (`src/cli.ts`) dispatches commands to handler files in
`src/commands/`. The core library (`src/core/`) handles database, search, embeddings,
and markdown parsing. Skills (`skills/`) are fat markdown files that tell you HOW to
use the tools: ingest meetings, answer queries, maintain the brain, enrich from APIs.

## Key files

- `src/core/engine.ts`: Pluggable engine interface (BrainEngine)
- `src/core/postgres-engine.ts`: Postgres + pgvector implementation
- `src/core/db.ts`: Connection management, schema initialization
- `src/core/chunkers/`: 3-tier chunking (recursive, semantic, LLM-guided)
- `src/core/search/`: Hybrid search: vector + keyword + RRF + multi-query expansion + dedup
- `src/core/embedding.ts`: OpenAI text-embedding-3-large, batch, retry, backoff
- `src/mcp/server.ts`: MCP stdio server exposing all tools
- `src/schema.sql`: Full Postgres + pgvector DDL

## Commands

Run `gbrain --help` or `gbrain --tools-json` for full command reference.

## Testing

`bun test` runs all tests. Tests: `test/markdown.test.ts` (frontmatter parsing,
round-trip serialization), `test/chunkers/recursive.test.ts` (delimiter splitting,
overlap, chunk sizing). Future: `test/import.test.ts` for full import/export round-trip.

## Skills

Read the skill files in `skills/` before doing brain operations. They contain the
workflows, heuristics, and quality rules for ingestion, querying, maintenance, and
enrichment.

## Build

`bun build --compile --outfile bin/gbrain src/cli.ts`

## Skill routing

When the user's request matches an available skill, ALWAYS invoke it using the Skill
tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
The skill has specialized workflows that produce better results than ad-hoc answers.

Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke investigate
- Ship, deploy, push, create PR → invoke ship
- QA, test the site, find bugs → invoke qa
- Code review, check my diff → invoke review
- Update docs after shipping → invoke document-release
- Weekly retro → invoke retro
- Design system, brand → invoke design-consultation
- Visual audit, design polish → invoke design-review
- Architecture review → invoke plan-eng-review
- Save progress, checkpoint, resume → invoke checkpoint
- Code quality, health check → invoke health
78
CONTRIBUTING.md
Normal file
@@ -0,0 +1,78 @@
# Contributing to GBrain

## Setup

```bash
git clone https://github.com/garrytan/gbrain.git
cd gbrain
bun install
bun test
```

Requires Bun 1.0+.

## Project structure

```
src/
  cli.ts                  CLI entry point
  commands/               Command handlers (one file per command)
  core/
    engine.ts             BrainEngine interface
    postgres-engine.ts    Postgres implementation
    db.ts                 Connection management
    types.ts              TypeScript types
    markdown.ts           Frontmatter parsing
    config.ts             Config file management
    chunkers/             3-tier chunking (recursive, semantic, llm)
    search/               Hybrid search (vector, keyword, hybrid, expansion, dedup)
    embedding.ts          OpenAI embedding service
  mcp/
    server.ts             MCP stdio server
  schema.sql              Postgres DDL
skills/                   Fat markdown skills for AI agents
test/                     Tests (bun test)
docs/                     Architecture docs
```

## Running tests

```bash
bun test                          # all tests
bun test test/markdown.test.ts    # specific test
```

## Building

```bash
bun build --compile --outfile bin/gbrain src/cli.ts
```

## Adding a new command

1. Create `src/commands/mycommand.ts` with an exported `runMyCommand` function
2. Add the case to `src/cli.ts` in the switch statement
3. Add the tool to `src/mcp/server.ts` in `handleToolCall` and `getToolDefinitions`
4. Add to `src/commands/tools-json.ts`
5. Add tests

CLI and MCP must expose identical operations. Drift tests will verify this.

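Steps 1 and 2 above might look roughly like this. It is a sketch only: the `EngineLike` interface, its `getStats` method, and the `dispatch` function are assumptions made for illustration, since the real `BrainEngine` interface and `src/cli.ts` dispatcher are not reproduced in this guide.

```typescript
// Hypothetical stand-in for the parts of BrainEngine this command needs.
interface EngineLike {
  getStats(): Promise<{ page_count: number }>;
}

// src/commands/mycommand.ts (this function would be exported in the real file)
async function runMyCommand(engine: EngineLike, args: string[]): Promise<string> {
  const stats = await engine.getStats();
  return `mycommand(${args.join(", ")}): ${stats.page_count} pages`;
}

// src/cli.ts (excerpt): one switch case per command
async function dispatch(engine: EngineLike, command: string, args: string[]): Promise<string> {
  switch (command) {
    case "mycommand":
      return runMyCommand(engine, args);
    default:
      throw new Error(`Unknown command: ${command}`);
  }
}

// Exercise the dispatcher with a stub engine instead of a real database.
const stubEngine: EngineLike = { getStats: async () => ({ page_count: 10 }) };
dispatch(stubEngine, "mycommand", ["demo"]).then((line) => console.log(line));
```

Keeping the handler a plain function of `(engine, args)` is what would let the MCP tool in step 3 call the same code path as the CLI.
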
## Adding a new engine

See `docs/ENGINES.md` for the full guide. In short:

1. Create `src/core/myengine-engine.ts` implementing `BrainEngine`
2. Add to engine factory in `src/core/engine.ts`
3. Run the test suite against your engine
4. Document in `docs/`

The SQLite engine is designed and ready for implementation. See `docs/SQLITE_ENGINE.md`.

## Welcome PRs

- SQLite engine implementation
- Docker Compose for self-hosted Postgres
- Additional migration sources
- New enrichment API integrations
- Performance optimizations
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Garry Tan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
462
README.md
Normal file
@@ -0,0 +1,462 @@
# GBrain

Open source personal knowledge brain. Postgres + pgvector + hybrid search that actually works.

```bash
gbrain query "what does Paul Graham say about doing things that don't scale?"
```

```
concepts/do-things-that-dont-scale (concept) score=0.0312
  The most common unscalable thing founders have to do at the start is to
  recruit users manually. Nearly all startups have to...

concepts/how-to-get-startup-ideas (concept) score=0.0298
  The way to get startup ideas is not to try to think of startup ideas.
  It's to look for problems, preferably problems you have yourself...

concepts/relentlessly-resourceful (concept) score=0.0251
  Not merely relentless. That's not enough to make things go your way
  except in a few mostly uninteresting domains. In any interesting domain...
```

Hybrid search finds essays by meaning, not just keywords. "Doing things that don't scale" matches even when the exact phrase doesn't appear. That's the point.

## Why this exists

You have a brain full of knowledge. It lives in markdown files, meeting notes, CRM exports, Obsidian vaults, Notion databases. It's scattered, unsearchable, and going stale.

Search is the bottleneck. Keyword search misses semantic matches. Vector search misses exact names and phrases. Neither connects related ideas across documents.

GBrain fixes this with hybrid search that combines both approaches, plus a knowledge model that treats every page like an intelligence assessment: compiled truth on top (your current best understanding, rewritten when evidence changes), append-only timeline on the bottom (the evidence trail that never gets edited).

AI agents maintain the brain. You ingest a document and the agent updates every entity mentioned, creates cross-reference links, and appends timeline entries. MCP clients query it. The intelligence lives in fat markdown skills, not application code.

## Try it: Paul Graham's essays in 90 seconds

GBrain ships with 10 Paul Graham essays as a kindling corpus. After setup, they're already in your brain:

```bash
# What's in there?
gbrain stats
# Pages: 10, Chunks: 47, Embedded: 47, Links: 0

# Keyword search (fast, exact matches)
gbrain search "startups"

# Hybrid search (the good one, semantic + keyword + expansion)
gbrain query "what makes a great founder?"

# Read a specific essay
gbrain get concepts/do-things-that-dont-scale

# Find essays related to a concept
gbrain query "when should you ignore conventional wisdom?"

# Check brain health
gbrain health
# Pages: 10, Embed coverage: 100%, Stale: 0, Orphans: 10
```

The essays are just the demo. The real power is when you import your own knowledge, thousands of pages about people, companies, projects, and the connections between them.

## Install

### Prerequisites

GBrain needs three things to run:

| Dependency | What it's for | How to get it |
|------------|--------------|---------------|
| **Supabase account** | Postgres + pgvector database | [supabase.com](https://supabase.com) (Pro tier, $25/mo for 8GB) |
| **OpenAI API key** | Embeddings (text-embedding-3-large) | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |
| **Anthropic API key** | Multi-query expansion + LLM chunking (Haiku) | [console.anthropic.com](https://console.anthropic.com) |

Set the API keys as environment variables:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```

The Supabase connection URL is configured during `gbrain init`. The OpenAI and Anthropic SDKs read their keys from the environment automatically.

Without an OpenAI key, search still works (keyword only, no vector search). Without an Anthropic key, search still works (no multi-query expansion, no LLM chunking).

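The fallback rules above can be read as a small capability table. A minimal sketch, assuming a hypothetical `resolveSearchCapabilities` helper (the name and return shape are illustrative, not part of gbrain's documented API):

```typescript
// Which search features are usable, given which API keys are present.
interface SearchCapabilities {
  keyword: boolean;        // tsvector search needs no API key
  vector: boolean;         // needs OpenAI embeddings
  queryExpansion: boolean; // needs Anthropic (Haiku)
  llmChunking: boolean;    // needs Anthropic (Haiku)
}

// Hypothetical helper: takes an env-like object so it is easy to test.
function resolveSearchCapabilities(
  env: Record<string, string | undefined>,
): SearchCapabilities {
  const hasOpenAI = Boolean(env.OPENAI_API_KEY?.trim());
  const hasAnthropic = Boolean(env.ANTHROPIC_API_KEY?.trim());
  return {
    keyword: true,
    vector: hasOpenAI,
    queryExpansion: hasAnthropic,
    llmChunking: hasAnthropic,
  };
}

// Only the OpenAI key is set: vector search works, expansion does not.
const openAiOnly = resolveSearchCapabilities({ OPENAI_API_KEY: "sk-example" });
console.log(openAiOnly);
```
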
### With OpenClaw (recommended)

If you're running OpenClaw, tell it to set up your brain. Make sure your API keys are set in the environment first.

```
You: "Install gbrain and set up my knowledge brain.
I need you to:
1. Run: bun add gbrain
2. Run: gbrain init --supabase (follow the wizard to connect my Supabase database)
3. Run: gbrain import data/kindling/ (import the demo corpus)
4. Read the skill files in skills/ so you know how to use the brain"
```

OpenClaw will install the package, walk through the Supabase connection wizard, import demo data, and learn the 6 brain skills (ingest, query, maintain, enrich, briefing, migrate).

After setup, you talk to your brain through OpenClaw:

```
You: "What essays do we have about startups?"
You: "Ingest my meeting notes from today"
You: "Give me a briefing for my meetings tomorrow"
You: "Import my Obsidian vault into the brain"
```

OpenClaw reads the skill files in `skills/`, figures out which gbrain commands to run, and does the work. You never touch the CLI directly unless you want to.

### With ClawHub

```bash
clawhub install gbrain
```

This installs the npm package, copies the skill files, and runs `gbrain init --supabase` on first use.

### Standalone CLI

```bash
npm install -g gbrain
```

### As a library

```bash
bun add gbrain
```

```typescript
import { PostgresEngine } from 'gbrain';
```

All paths require a Postgres database with pgvector. Supabase Pro ($25/mo) is the recommended zero-ops option.

## Setup

After installing via CLI or library path, run the setup wizard:

```bash
# Guided wizard: auto-provisions Supabase or accepts a connection URL
gbrain init --supabase

# Or connect to any Postgres with pgvector
gbrain init --url postgresql://user:pass@host:5432/dbname
```

The init wizard:
1. Checks for Supabase CLI, offers auto-provisioning
2. Falls back to manual connection URL if CLI isn't available
3. Runs the full schema migration (tables, indexes, triggers, extensions)
4. Imports the kindling corpus (10 PG essays) as demo data
5. Verifies the connection and prints your first query to try

Config is saved to `~/.gbrain/config.json` with 0600 permissions.

OpenClaw users skip this step. The orchestrator runs the wizard for you during install.

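The 0600 mode on the config file matters because the file holds a database connection URL. A sketch of how such a file could be written so only the owner can read it; `saveConfig` and the `database_url` key layout are assumptions for illustration, and the demo writes to a temp directory rather than the real `~/.gbrain/config.json`:

```typescript
import { chmodSync, mkdirSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper; gbrain's real config writer is not shown in this README.
function saveConfig(path: string, config: Record<string, unknown>): void {
  mkdirSync(dirname(path), { recursive: true });
  // `mode` is honored only when the file is newly created...
  writeFileSync(path, JSON.stringify(config, null, 2) + "\n", { mode: 0o600 });
  // ...so also tighten a file that already existed with looser permissions.
  chmodSync(path, 0o600);
}

// Demo against a temp directory.
const demoPath = join(tmpdir(), "gbrain-demo", "config.json");
saveConfig(demoPath, { database_url: "postgresql://user:pass@host:5432/dbname" });

// Permission bits of the written file (owner read/write only on POSIX systems).
const permissionBits = statSync(demoPath).mode & 0o777;
console.log(permissionBits.toString(8));
```
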
## First import

```bash
# Import your markdown wiki (auto-chunks and auto-embeds)
gbrain import /path/to/brain/

# Skip embedding if you want to import fast and embed later
gbrain import /path/to/brain/ --no-embed

# Backfill embeddings for pages that don't have them
gbrain embed --stale
```

Import is idempotent. Re-running it skips unchanged files (compared by SHA-256 content hash). Progress bar shows status. ~30s for text import of 7,000 files, ~10-15 min for embedding.

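The skip-unchanged check comes down to comparing a stored hash with a freshly computed one. A minimal sketch using Node's built-in `crypto` module; `contentHash`, `needsImport`, and the in-memory `known` map are illustrative stand-ins for whatever gbrain does against the `content_hash` column:

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the file contents, hex-encoded.
function contentHash(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// `known` maps slug -> hash recorded by a previous import (hypothetical shape).
function needsImport(slug: string, markdown: string, known: Map<string, string>): boolean {
  return known.get(slug) !== contentHash(markdown);
}

const known = new Map<string, string>();
const slug = "concepts/do-things-that-dont-scale";
const body = "# Do Things That Don't Scale\n";

console.log(needsImport(slug, body, known));          // true: never imported
known.set(slug, contentHash(body));
console.log(needsImport(slug, body, known));          // false: unchanged, skipped
console.log(needsImport(slug, body + "edit", known)); // true: content changed
```
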
## The knowledge model

Every page in the brain follows the compiled truth + timeline pattern:

```markdown
---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth, pg-essay]
---

Paul Graham's argument that startups should do unscalable things early on.
The most common: recruiting users manually, one at a time. Airbnb went
door to door in New York photographing apartments. Stripe manually
installed their payment integration for early users.

The key insight: the unscalable effort teaches you what users actually
want, which you can't learn any other way.

---

- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch W25 kickoff talk
- 2025-02-20: Cited in discussion about AI agent onboarding strategies
```

Above the `---` separator in the page body (the one after the frontmatter block): **compiled truth**. Your current best understanding. Gets rewritten when new evidence changes the picture. Below: **timeline**. Append-only evidence trail. Never edited, only added to.

The compiled truth is the answer. The timeline is the proof.

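To make the layout concrete, here is a dependency-free sketch of splitting a page into its three parts. It assumes the page opens with a `---` frontmatter block and that the first standalone `---` line after it separates compiled truth from timeline. `splitPage` and `ParsedPage` are illustrative names; gbrain's own parser may handle this differently.

```typescript
interface ParsedPage {
  frontmatter: string;   // raw YAML text, left unparsed in this sketch
  compiledTruth: string;
  timeline: string;
}

function splitPage(source: string): ParsedPage {
  const lines = source.split(/\r?\n/);
  let frontmatter = "";
  let bodyStart = 0;

  // Optional frontmatter: starts on line 0, ends at the next `---` line.
  if (lines[0]?.trim() === "---") {
    const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
    if (close === -1) throw new Error("unterminated frontmatter block");
    frontmatter = lines.slice(1, close).join("\n");
    bodyStart = close + 1;
  }

  // First `---` in the body splits compiled truth (above) from timeline (below).
  const body = lines.slice(bodyStart);
  const separator = body.findIndex((line) => line.trim() === "---");
  const truthLines = separator === -1 ? body : body.slice(0, separator);
  const timelineLines = separator === -1 ? [] : body.slice(separator + 1);

  return {
    frontmatter,
    compiledTruth: truthLines.join("\n").trim(),
    timeline: timelineLines.join("\n").trim(),
  };
}

const examplePage = [
  "---",
  "type: concept",
  "title: Do Things That Don't Scale",
  "---",
  "",
  "Paul Graham's argument that startups should do unscalable things early on.",
  "",
  "---",
  "",
  "- 2013-07-01: Published on paulgraham.com",
].join("\n");

const parsed = splitPage(examplePage);
console.log(parsed.compiledTruth);
console.log(parsed.timeline);
```

A page with no body separator is treated here as all compiled truth with an empty timeline, which is one reasonable choice among several.
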
## How search works

```
Query: "when should you ignore conventional wisdom?"
              |
   Multi-query expansion (Claude Haiku)
   "contrarian thinking startups", "going against the crowd"
              |
         +----+----+
         |         |
      Vector    Keyword
      (HNSW     (tsvector +
      cosine)   ts_rank)
         |         |
         +----+----+
              |
   RRF Fusion: score = sum(1/(60 + rank))
              |
   4-Layer Dedup
     1. Best chunk per page
     2. Cosine similarity > 0.85
     3. Type diversity (60% cap)
     4. Per-page chunk cap
              |
   Stale alerts (compiled truth older than latest timeline)
              |
           Results
```

Keyword search alone misses conceptual matches. "Ignore conventional wisdom" won't find an essay titled "The Bus Ticket Theory of Genius" even though it's exactly about that. Vector search alone misses exact phrases when the embedding is diluted by surrounding text. RRF fusion gets both right. Multi-query expansion catches phrasings you didn't think of.

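The fusion step in the diagram, `score = sum(1/(60 + rank))`, is small enough to sketch directly. This is a generic Reciprocal Rank Fusion implementation under that formula, not gbrain's actual code; `RankedHit` and `rrfFuse` are illustrative names.

```typescript
interface RankedHit {
  id: string; // e.g. a chunk or page identifier
}

// Each input list is already sorted best-first by its own search method.
function rrfFuse(lists: RankedHit[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const vectorHits = [{ id: "a" }, { id: "b" }, { id: "c" }];
const keywordHits = [{ id: "b" }, { id: "d" }];

const fused = rrfFuse([vectorHits, keywordHits]);
console.log(fused.map((hit) => hit.id)); // [ 'b', 'a', 'd', 'c' ]
```

`b` wins because it appears in both lists (1/62 + 1/61), even though it is not ranked first by vector search. Only ranks are combined, so the vector and keyword scores never need to be on the same scale.
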
## Database schema

9 tables in Postgres + pgvector:

```
pages                    The core content table
  slug (UNIQUE)          e.g. "concepts/do-things-that-dont-scale"
  type                   person, company, deal, yc, civic, project, concept, source, media
  title, compiled_truth, timeline
  frontmatter (JSONB)    Arbitrary metadata
  search_vector          Trigger-based tsvector (title + compiled_truth + timeline + timeline_entries)
  content_hash           SHA-256 for import idempotency

content_chunks           Chunked content with embeddings
  page_id (FK)           Links to pages
  chunk_text             The chunk content
  chunk_source           'compiled_truth' or 'timeline'
  embedding (vector)     1536-dim from text-embedding-3-large
  HNSW index             Cosine similarity search

links                    Cross-references between pages
  from_page_id, to_page_id
  link_type              knows, invested_in, works_at, founded, references, etc.

tags                     page_id + tag (many-to-many)

timeline_entries         Structured timeline events
  page_id, date, source, summary, detail (markdown)

page_versions            Snapshot history for compiled_truth
  compiled_truth, frontmatter, snapshot_at

raw_data                 Sidecar JSON from external APIs
  page_id, source, data (JSONB)

ingest_log               Audit trail of import/ingest operations

config                   Brain-level settings (embedding model, chunk strategy)
```

Indexes: B-tree on slug/type, GIN on frontmatter/search_vector, HNSW on embeddings, pg_trgm on title for fuzzy slug resolution.

## Chunking

Three strategies, dispatched by content type:

**Recursive** (timeline, bulk import): 5-level delimiter hierarchy (paragraphs, lines, sentences, clauses, words). 300-word chunks with 50-word sentence-aware overlap. Fast, predictable, lossless.

**Semantic** (compiled truth): Embeds each sentence, computes adjacent cosine similarities, applies Savitzky-Golay smoothing to find topic boundaries. Falls back to recursive on failure. Best quality for intelligence assessments.

**LLM-guided** (high-value content, on request): Pre-splits into 128-word candidates, asks Claude Haiku to identify topic shifts in sliding windows. 3 retries per window. Most expensive, best results.

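The 300-word / 50-word-overlap numbers from the recursive strategy can be illustrated with a much simpler stand-in. The sketch below uses plain fixed-size word windows; it deliberately leaves out the 5-level delimiter hierarchy and the sentence-aware overlap, so treat `chunkWords` as an illustration of the sizing only, not of the recursive chunker itself.

```typescript
// Fixed-size word windows with overlap (simplified stand-in, see lead-in).
function chunkWords(text: string, chunkSize = 300, overlap = 50): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = chunkSize - overlap; // each window starts `step` words after the last
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}

// 700 synthetic words: w0, w1, ..., w699
const sampleText = Array.from({ length: 700 }, (_, i) => `w${i}`).join(" ");
const sampleChunks = chunkWords(sampleText);
console.log(sampleChunks.length); // 3 (windows start at words 0, 250, 500)
```

With these defaults the second chunk starts at word 250, so its first 50 words repeat the last 50 words of the first chunk. That repetition is what keeps a sentence straddling a boundary retrievable from either side.
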
## Commands

```
SETUP
  gbrain init [--supabase|--url <conn>]     Create brain (guided wizard)
  gbrain upgrade                            Self-update

PAGES
  gbrain get <slug>                         Read a page (supports fuzzy slug matching)
  gbrain put <slug> [< file.md]             Write/update a page (auto-versions)
  gbrain delete <slug>                      Delete a page
  gbrain list [--type T] [--tag T] [-n N]   List pages with filters

SEARCH
  gbrain search <query>                     Keyword search (tsvector)
  gbrain query <question>                   Hybrid search (vector + keyword + RRF + expansion)

IMPORT/EXPORT
  gbrain import <dir> [--no-embed]          Import markdown directory (idempotent)
  gbrain export [--dir ./out/]              Export to markdown (round-trip)

EMBEDDINGS
  gbrain embed [<slug>|--all|--stale]       Generate/refresh embeddings

LINKS + GRAPH
  gbrain link <from> <to> [--type T]        Create typed link
  gbrain unlink <from> <to>                 Remove link
  gbrain backlinks <slug>                   Incoming links
  gbrain graph <slug> [--depth N]           Traverse link graph (recursive CTE, default depth 5)

TAGS
  gbrain tags <slug>                        List tags
  gbrain tag <slug> <tag>                   Add tag
  gbrain untag <slug> <tag>                 Remove tag

TIMELINE
  gbrain timeline [<slug>]                  View timeline entries
  gbrain timeline-add <slug> <date> <text>  Add timeline entry

ADMIN
  gbrain stats                              Brain statistics
  gbrain health                             Health dashboard (embed coverage, stale, orphans)
  gbrain history <slug>                     Page version history
  gbrain revert <slug> <version-id>         Revert to previous version
  gbrain config [get|set] <key> [value]     Brain config
  gbrain serve                              MCP server (stdio)
  gbrain call <tool> '<json>'               Raw tool invocation
  gbrain --tools-json                       Tool discovery (JSON)
```

## Using as a library

GBrain is library-first. The CLI and MCP server are thin wrappers over the engine.

```typescript
import { PostgresEngine } from 'gbrain';

const engine = new PostgresEngine();
await engine.connect({ database_url: process.env.DATABASE_URL });
await engine.initSchema();

// Write a page
await engine.putPage('concepts/superlinear-returns', {
  type: 'concept',
  title: 'Superlinear Returns',
  compiled_truth: 'Paul Graham argues that returns in many fields are superlinear...',
  timeline: '- 2023-10-01: Published on paulgraham.com',
});

// Keyword search (engine-level; RRF fusion and expansion run above the engine)
const results = await engine.searchKeyword('startup growth');

// Typed links
await engine.addLink('concepts/superlinear-returns', 'concepts/do-things-that-dont-scale', '', 'references');

// Graph traversal
const graph = await engine.traverseGraph('concepts/superlinear-returns', 3);

// Health check
const health = await engine.getHealth();
// { page_count: 10, embed_coverage: 1.0, stale_pages: 0, orphan_pages: 10 }
```

The `BrainEngine` interface is pluggable. See `docs/ENGINES.md` for how to add backends.

## MCP server

Add to your Claude Code or Cursor MCP config:

```json
{
  "mcpServers": {
    "gbrain": {
      "command": "gbrain",
      "args": ["serve"]
    }
  }
}
```

20 tools: get_page, put_page, delete_page, list_pages, search, query, add_tag, remove_tag, get_tags, add_link, remove_link, get_links, get_backlinks, traverse_graph, add_timeline_entry, get_timeline, get_stats, get_health, get_versions, revert_version.

Every tool mirrors a CLI command. Drift tests verify identical behavior.

## Skills

Fat markdown files that tell AI agents HOW to use gbrain. No skill logic in the binary.

| Skill | What it does |
|-------|-------------|
| **ingest** | Ingest meetings, docs, articles. Updates compiled truth (rewrite, not append), appends timeline, creates cross-reference links across all mentioned entities. |
| **query** | 3-layer search (keyword + vector + structured) with synthesis and citations. Says "the brain doesn't have info on X" rather than hallucinating. |
| **maintain** | Periodic health: find contradictions, stale compiled truth, orphan pages, dead links, tag inconsistency, missing embeddings, overdue threads. |
| **enrich** | Enrich pages from external APIs. Raw data stored separately, distilled highlights go to compiled truth. |
| **briefing** | Daily briefing: today's meetings with participant context, active deals with deadlines, time-sensitive threads, recent changes. |
| **migrate** | Universal migration from Obsidian (wikilinks to gbrain links), Notion (stripped UUIDs), Logseq (block refs), plain markdown, CSV, JSON, Roam. |

## Architecture

```
            CLI / MCP Server
  (thin wrappers, identical operations)
                    |
          BrainEngine interface
           (pluggable backend)
                    |
           +--------+--------+
           |                 |
    PostgresEngine      SQLiteEngine
      (ships v0)        (designed, community PRs welcome)
           |
 Supabase Pro ($25/mo)
 Postgres + pgvector + pg_trgm
 connection pooling via Supavisor
```

Embedding, chunking, and search fusion are engine-agnostic. Only raw keyword search (`searchKeyword`) and raw vector search (`searchVector`) are engine-specific. RRF fusion, multi-query expansion, and 4-layer dedup run above the engine on `SearchResult[]` arrays.
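
To make the "runs above the engine" point concrete, here is a minimal sketch of Reciprocal Rank Fusion over two result lists. It is an illustration under stated assumptions, not GBrain's actual code: the `SearchResult` shape, the `rrfFuse` name, and the constant `k = 60` (a commonly used RRF default) are all hypothetical here.

```typescript
// Hypothetical minimal result shape; the real SearchResult type may differ.
interface SearchResult {
  slug: string;
  chunkText: string;
  score: number;
}

// Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for each item,
// where rank is the item's 1-based position in that list.
// k = 60 is an assumed default, not a value taken from GBrain.
function rrfFuse(lists: SearchResult[][], k = 60): SearchResult[] {
  const fused = new Map<string, SearchResult>();
  for (const list of lists) {
    list.forEach((result, index) => {
      const key = `${result.slug}#${result.chunkText}`;
      const contribution = 1 / (k + index + 1);
      const existing = fused.get(key);
      if (existing) {
        existing.score += contribution;
      } else {
        // Copy the result so the input arrays are not mutated.
        fused.set(key, { ...result, score: contribution });
      }
    });
  }
  // Highest fused score first.
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

// Example inputs standing in for searchKeyword and searchVector output.
const keywordHits: SearchResult[] = [
  { slug: "concepts/a", chunkText: "alpha", score: 0.9 },
  { slug: "concepts/b", chunkText: "beta", score: 0.5 },
];
const vectorHits: SearchResult[] = [
  { slug: "concepts/b", chunkText: "beta", score: 0.8 },
  { slug: "concepts/c", chunkText: "gamma", score: 0.7 },
];

// "concepts/b" appears in both lists, so it ranks first after fusion.
console.log(rrfFuse([keywordHits, vectorHits]).map((r) => r.slug));
```

Because the function only needs ranked arrays, it works the same regardless of which engine produced them, which is the property the paragraph above describes.
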

## Storage estimates

For a brain with ~7,500 pages:

| Component | Size |
|-----------|------|
| Page text (compiled_truth + timeline) | ~150MB |
| JSONB frontmatter + indexes | ~70MB |
| Content chunks (~22K, text) | ~80MB |
| Embeddings (22K x 1536 floats) | ~134MB |
| HNSW index overhead | ~270MB |
| Links, tags, timeline, versions | ~50MB |
| **Total** | **~750MB** |

Supabase free tier (500MB) won't fit a large brain. Supabase Pro ($25/mo, 8GB) is the starting point.
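
As a rough cross-check of the table, the embedding row and the total can be recomputed in a few lines. This sketch assumes each dimension is stored as a 4-byte float and uses decimal megabytes; both are assumptions, and per-row overhead is ignored.

```typescript
// Figures copied from the table above.
const chunkCount = 22000;
const dimensions = 1536;
// Assumption: 32-bit floats, 4 bytes per dimension, no per-row overhead.
const bytesPerFloat = 4;

const embeddingBytes = chunkCount * dimensions * bytesPerFloat;
const embeddingMB = embeddingBytes / 1000000;

// Component sizes in MB, in table order.
const componentsMB = [150, 70, 80, 134, 270, 50];
const totalMB = componentsMB.reduce((sum, mb) => sum + mb, 0);

console.log(embeddingMB.toFixed(1)); // 135.2
console.log(totalMB); // 754
```

Both results land close to the rounded figures in the table (~134MB and ~750MB), and well above the 500MB free-tier limit.
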

Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.

## Docs

- [GBRAIN_V0.md](docs/GBRAIN_V0.md) -- Full product spec, all architecture decisions, every option considered
- [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface, capability matrix, how to add backends
- [SQLITE_ENGINE.md](docs/SQLITE_ENGINE.md) -- Complete SQLite engine plan with schema, FTS5, vector search options

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Welcome PRs for:

- SQLite engine implementation
- Docker Compose for self-hosted Postgres
- Additional migration sources
- New enrichment API integrations

## License

MIT

289
bun.lock
Normal file
@@ -0,0 +1,289 @@
{
  "lockfileVersion": 1,
  "configVersion": 1,
  "workspaces": {
    "": {
      "name": "gbrain",
      "dependencies": {
        "@anthropic-ai/sdk": "^0.30.0",
        "@modelcontextprotocol/sdk": "^1.0.0",
        "gray-matter": "^4.0.3",
        "openai": "^4.0.0",
        "pgvector": "^0.2.0",
        "postgres": "^3.4.0",
      },
      "devDependencies": {
        "@types/bun": "latest",
      },
    },
  },
  "packages": {
"@anthropic-ai/sdk": ["@anthropic-ai/sdk@0.30.1", "", { "dependencies": { "@types/node": "^18.11.18", "@types/node-fetch": "^2.6.4", "abort-controller": "^3.0.0", "agentkeepalive": "^4.2.1", "form-data-encoder": "1.7.2", "formdata-node": "^4.3.2", "node-fetch": "^2.6.7" } }, "sha512-nuKvp7wOIz6BFei8WrTdhmSsx5mwnArYyJgh4+vYu3V4J0Ltb8Xm3odPm51n1aSI0XxNCrDl7O88cxCtUdAkaw=="],
"@hono/node-server": ["@hono/node-server@1.19.12", "", { "peerDependencies": { "hono": "^4" } }, "sha512-txsUW4SQ1iilgE0l9/e9VQWmELXifEFvmdA1j6WFh/aFPj99hIntrSsq/if0UWyGVkmrRPKA1wCeP+UCr1B9Uw=="],
"@modelcontextprotocol/sdk": ["@modelcontextprotocol/sdk@1.29.0", "", { "dependencies": { "@hono/node-server": "^1.19.9", "ajv": "^8.17.1", "ajv-formats": "^3.0.1", "content-type": "^1.0.5", "cors": "^2.8.5", "cross-spawn": "^7.0.5", "eventsource": "^3.0.2", "eventsource-parser": "^3.0.0", "express": "^5.2.1", "express-rate-limit": "^8.2.1", "hono": "^4.11.4", "jose": "^6.1.3", "json-schema-typed": "^8.0.2", "pkce-challenge": "^5.0.0", "raw-body": "^3.0.0", "zod": "^3.25 || ^4.0", "zod-to-json-schema": "^3.25.1" }, "peerDependencies": { "@cfworker/json-schema": "^4.1.1" }, "optionalPeers": ["@cfworker/json-schema"] }, "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ=="],
"@types/bun": ["@types/bun@1.3.11", "", { "dependencies": { "bun-types": "1.3.11" } }, "sha512-5vPne5QvtpjGpsGYXiFyycfpDF2ECyPcTSsFBMa0fraoxiQyMJ3SmuQIGhzPg2WJuWxVBoxWJ2kClYTcw/4fAg=="],
"@types/node": ["@types/node@18.19.130", "", { "dependencies": { "undici-types": "~5.26.4" } }, "sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg=="],
"@types/node-fetch": ["@types/node-fetch@2.6.13", "", { "dependencies": { "@types/node": "*", "form-data": "^4.0.4" } }, "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw=="],
"abort-controller": ["abort-controller@3.0.0", "", { "dependencies": { "event-target-shim": "^5.0.0" } }, "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg=="],
"accepts": ["accepts@2.0.0", "", { "dependencies": { "mime-types": "^3.0.0", "negotiator": "^1.0.0" } }, "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng=="],
"agentkeepalive": ["agentkeepalive@4.6.0", "", { "dependencies": { "humanize-ms": "^1.2.1" } }, "sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ=="],
"ajv": ["ajv@8.18.0", "", { "dependencies": { "fast-deep-equal": "^3.1.3", "fast-uri": "^3.0.1", "json-schema-traverse": "^1.0.0", "require-from-string": "^2.0.2" } }, "sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A=="],
"ajv-formats": ["ajv-formats@3.0.1", "", { "dependencies": { "ajv": "^8.0.0" } }, "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ=="],
"argparse": ["argparse@1.0.10", "", { "dependencies": { "sprintf-js": "~1.0.2" } }, "sha512-o5Roy6tNG4SL/FOkCAN6RzjiakZS25RLYFrcMttJqbdd8BWrnA+fGz57iN5Pb06pvBGvl5gQ0B48dJlslXvoTg=="],
"asynckit": ["asynckit@0.4.0", "", {}, "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="],
"body-parser": ["body-parser@2.2.2", "", { "dependencies": { "bytes": "^3.1.2", "content-type": "^1.0.5", "debug": "^4.4.3", "http-errors": "^2.0.0", "iconv-lite": "^0.7.0", "on-finished": "^2.4.1", "qs": "^6.14.1", "raw-body": "^3.0.1", "type-is": "^2.0.1" } }, "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA=="],
"bun-types": ["bun-types@1.3.11", "", { "dependencies": { "@types/node": "*" } }, "sha512-1KGPpoxQWl9f6wcZh57LvrPIInQMn2TQ7jsgxqpRzg+l0QPOFvJVH7HmvHo/AiPgwXy+/Thf6Ov3EdVn1vOabg=="],
"bytes": ["bytes@3.1.2", "", {}, "sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg=="],
"call-bind-apply-helpers": ["call-bind-apply-helpers@1.0.2", "", { "dependencies": { "es-errors": "^1.3.0", "function-bind": "^1.1.2" } }, "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ=="],
"call-bound": ["call-bound@1.0.4", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.2", "get-intrinsic": "^1.3.0" } }, "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg=="],
"combined-stream": ["combined-stream@1.0.8", "", { "dependencies": { "delayed-stream": "~1.0.0" } }, "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg=="],
"content-disposition": ["content-disposition@1.0.1", "", {}, "sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q=="],
"content-type": ["content-type@1.0.5", "", {}, "sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA=="],
"cookie": ["cookie@0.7.2", "", {}, "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w=="],
"cookie-signature": ["cookie-signature@1.2.2", "", {}, "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg=="],
"cors": ["cors@2.8.6", "", { "dependencies": { "object-assign": "^4", "vary": "^1" } }, "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw=="],
"cross-spawn": ["cross-spawn@7.0.6", "", { "dependencies": { "path-key": "^3.1.0", "shebang-command": "^2.0.0", "which": "^2.0.1" } }, "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA=="],
"debug": ["debug@4.4.3", "", { "dependencies": { "ms": "^2.1.3" } }, "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA=="],
"delayed-stream": ["delayed-stream@1.0.0", "", {}, "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ=="],
"depd": ["depd@2.0.0", "", {}, "sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw=="],
"dunder-proto": ["dunder-proto@1.0.1", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.1", "es-errors": "^1.3.0", "gopd": "^1.2.0" } }, "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A=="],
"ee-first": ["ee-first@1.1.1", "", {}, "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow=="],
"encodeurl": ["encodeurl@2.0.0", "", {}, "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg=="],
"es-define-property": ["es-define-property@1.0.1", "", {}, "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g=="],
"es-errors": ["es-errors@1.3.0", "", {}, "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw=="],
"es-object-atoms": ["es-object-atoms@1.1.1", "", { "dependencies": { "es-errors": "^1.3.0" } }, "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA=="],
"es-set-tostringtag": ["es-set-tostringtag@2.1.0", "", { "dependencies": { "es-errors": "^1.3.0", "get-intrinsic": "^1.2.6", "has-tostringtag": "^1.0.2", "hasown": "^2.0.2" } }, "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA=="],
"escape-html": ["escape-html@1.0.3", "", {}, "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow=="],
"esprima": ["esprima@4.0.1", "", { "bin": { "esparse": "./bin/esparse.js", "esvalidate": "./bin/esvalidate.js" } }, "sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A=="],
"etag": ["etag@1.8.1", "", {}, "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg=="],
"event-target-shim": ["event-target-shim@5.0.1", "", {}, "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ=="],
"eventsource": ["eventsource@3.0.7", "", { "dependencies": { "eventsource-parser": "^3.0.1" } }, "sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA=="],
"eventsource-parser": ["eventsource-parser@3.0.6", "", {}, "sha512-Vo1ab+QXPzZ4tCa8SwIHJFaSzy4R6SHf7BY79rFBDf0idraZWAkYrDjDj8uWaSm3S2TK+hJ7/t1CEmZ7jXw+pg=="],
"express": ["express@5.2.1", "", { "dependencies": { "accepts": "^2.0.0", "body-parser": "^2.2.1", "content-disposition": "^1.0.0", "content-type": "^1.0.5", "cookie": "^0.7.1", "cookie-signature": "^1.2.1", "debug": "^4.4.0", "depd": "^2.0.0", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "etag": "^1.8.1", "finalhandler": "^2.1.0", "fresh": "^2.0.0", "http-errors": "^2.0.0", "merge-descriptors": "^2.0.0", "mime-types": "^3.0.0", "on-finished": "^2.4.1", "once": "^1.4.0", "parseurl": "^1.3.3", "proxy-addr": "^2.0.7", "qs": "^6.14.0", "range-parser": "^1.2.1", "router": "^2.2.0", "send": "^1.1.0", "serve-static": "^2.2.0", "statuses": "^2.0.1", "type-is": "^2.0.1", "vary": "^1.1.2" } }, "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw=="],
"express-rate-limit": ["express-rate-limit@8.3.2", "", { "dependencies": { "ip-address": "10.1.0" }, "peerDependencies": { "express": ">= 4.11" } }, "sha512-77VmFeJkO0/rvimEDuUC5H30oqUC4EyOhyGccfqoLebB0oiEYfM7nwPrsDsBL1gsTpwfzX8SFy2MT3TDyRq+bg=="],
"extend-shallow": ["extend-shallow@2.0.1", "", { "dependencies": { "is-extendable": "^0.1.0" } }, "sha512-zCnTtlxNoAiDc3gqY2aYAWFx7XWWiasuF2K8Me5WbN8otHKTUKBwjPtNpRs/rbUZm7KxWAaNj7P1a/p52GbVug=="],
"fast-deep-equal": ["fast-deep-equal@3.1.3", "", {}, "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q=="],
"fast-uri": ["fast-uri@3.1.0", "", {}, "sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA=="],
"finalhandler": ["finalhandler@2.1.1", "", { "dependencies": { "debug": "^4.4.0", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "on-finished": "^2.4.1", "parseurl": "^1.3.3", "statuses": "^2.0.1" } }, "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA=="],
"form-data": ["form-data@4.0.5", "", { "dependencies": { "asynckit": "^0.4.0", "combined-stream": "^1.0.8", "es-set-tostringtag": "^2.1.0", "hasown": "^2.0.2", "mime-types": "^2.1.12" } }, "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w=="],
"form-data-encoder": ["form-data-encoder@1.7.2", "", {}, "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A=="],
"formdata-node": ["formdata-node@4.4.1", "", { "dependencies": { "node-domexception": "1.0.0", "web-streams-polyfill": "4.0.0-beta.3" } }, "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ=="],
"forwarded": ["forwarded@0.2.0", "", {}, "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow=="],
"fresh": ["fresh@2.0.0", "", {}, "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A=="],
"function-bind": ["function-bind@1.1.2", "", {}, "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA=="],
"get-intrinsic": ["get-intrinsic@1.3.0", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.2", "es-define-property": "^1.0.1", "es-errors": "^1.3.0", "es-object-atoms": "^1.1.1", "function-bind": "^1.1.2", "get-proto": "^1.0.1", "gopd": "^1.2.0", "has-symbols": "^1.1.0", "hasown": "^2.0.2", "math-intrinsics": "^1.1.0" } }, "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ=="],
"get-proto": ["get-proto@1.0.1", "", { "dependencies": { "dunder-proto": "^1.0.1", "es-object-atoms": "^1.0.0" } }, "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g=="],
"gopd": ["gopd@1.2.0", "", {}, "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg=="],
"gray-matter": ["gray-matter@4.0.3", "", { "dependencies": { "js-yaml": "^3.13.1", "kind-of": "^6.0.2", "section-matter": "^1.0.0", "strip-bom-string": "^1.0.0" } }, "sha512-5v6yZd4JK3eMI3FqqCouswVqwugaA9r4dNZB1wwcmrD02QkV5H0y7XBQW8QwQqEaZY1pM9aqORSORhJRdNK44Q=="],
"has-symbols": ["has-symbols@1.1.0", "", {}, "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ=="],
"has-tostringtag": ["has-tostringtag@1.0.2", "", { "dependencies": { "has-symbols": "^1.0.3" } }, "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw=="],
"hasown": ["hasown@2.0.2", "", { "dependencies": { "function-bind": "^1.1.2" } }, "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ=="],
"hono": ["hono@4.12.10", "", {}, "sha512-mx/p18PLy5og9ufies2GOSUqep98Td9q4i/EF6X7yJgAiIopxqdfIO3jbqsi3jRgTgw88jMDEzVKi+V2EF+27w=="],
"http-errors": ["http-errors@2.0.1", "", { "dependencies": { "depd": "~2.0.0", "inherits": "~2.0.4", "setprototypeof": "~1.2.0", "statuses": "~2.0.2", "toidentifier": "~1.0.1" } }, "sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ=="],
"humanize-ms": ["humanize-ms@1.2.1", "", { "dependencies": { "ms": "^2.0.0" } }, "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ=="],
"iconv-lite": ["iconv-lite@0.7.2", "", { "dependencies": { "safer-buffer": ">= 2.1.2 < 3.0.0" } }, "sha512-im9DjEDQ55s9fL4EYzOAv0yMqmMBSZp6G0VvFyTMPKWxiSBHUj9NW/qqLmXUwXrrM7AvqSlTCfvqRb0cM8yYqw=="],
"inherits": ["inherits@2.0.4", "", {}, "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ=="],
"ip-address": ["ip-address@10.1.0", "", {}, "sha512-XXADHxXmvT9+CRxhXg56LJovE+bmWnEWB78LB83VZTprKTmaC5QfruXocxzTZ2Kl0DNwKuBdlIhjL8LeY8Sf8Q=="],
"ipaddr.js": ["ipaddr.js@1.9.1", "", {}, "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g=="],
"is-extendable": ["is-extendable@0.1.1", "", {}, "sha512-5BMULNob1vgFX6EjQw5izWDxrecWK9AM72rugNr0TFldMOi0fj6Jk+zeKIt0xGj4cEfQIJth4w3OKWOJ4f+AFw=="],
"is-promise": ["is-promise@4.0.0", "", {}, "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ=="],
"isexe": ["isexe@2.0.0", "", {}, "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw=="],
"jose": ["jose@6.2.2", "", {}, "sha512-d7kPDd34KO/YnzaDOlikGpOurfF0ByC2sEV4cANCtdqLlTfBlw2p14O/5d/zv40gJPbIQxfES3nSx1/oYNyuZQ=="],
"js-yaml": ["js-yaml@3.14.2", "", { "dependencies": { "argparse": "^1.0.7", "esprima": "^4.0.0" }, "bin": { "js-yaml": "bin/js-yaml.js" } }, "sha512-PMSmkqxr106Xa156c2M265Z+FTrPl+oxd/rgOQy2tijQeK5TxQ43psO1ZCwhVOSdnn+RzkzlRz/eY4BgJBYVpg=="],
"json-schema-traverse": ["json-schema-traverse@1.0.0", "", {}, "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="],
"json-schema-typed": ["json-schema-typed@8.0.2", "", {}, "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA=="],
"kind-of": ["kind-of@6.0.3", "", {}, "sha512-dcS1ul+9tmeD95T+x28/ehLgd9mENa3LsvDTtzm3vyBEO7RPptvAD+t44WVXaUjTBRcrpFeFlC8WCruUR456hw=="],
"math-intrinsics": ["math-intrinsics@1.1.0", "", {}, "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g=="],
"media-typer": ["media-typer@1.1.0", "", {}, "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw=="],
"merge-descriptors": ["merge-descriptors@2.0.0", "", {}, "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g=="],
"mime-db": ["mime-db@1.54.0", "", {}, "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ=="],
"mime-types": ["mime-types@3.0.2", "", { "dependencies": { "mime-db": "^1.54.0" } }, "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A=="],
"ms": ["ms@2.1.3", "", {}, "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="],
"negotiator": ["negotiator@1.0.0", "", {}, "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg=="],
"node-domexception": ["node-domexception@1.0.0", "", {}, "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ=="],
"node-fetch": ["node-fetch@2.7.0", "", { "dependencies": { "whatwg-url": "^5.0.0" }, "peerDependencies": { "encoding": "^0.1.0" }, "optionalPeers": ["encoding"] }, "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A=="],
"object-assign": ["object-assign@4.1.1", "", {}, "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg=="],
"object-inspect": ["object-inspect@1.13.4", "", {}, "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew=="],
"on-finished": ["on-finished@2.4.1", "", { "dependencies": { "ee-first": "1.1.1" } }, "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg=="],
"once": ["once@1.4.0", "", { "dependencies": { "wrappy": "1" } }, "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w=="],
"openai": ["openai@4.104.0", "", { "dependencies": { "@types/node": "^18.11.18", "@types/node-fetch": "^2.6.4", "abort-controller": "^3.0.0", "agentkeepalive": "^4.2.1", "form-data-encoder": "1.7.2", "formdata-node": "^4.3.2", "node-fetch": "^2.6.7" }, "peerDependencies": { "ws": "^8.18.0", "zod": "^3.23.8" }, "optionalPeers": ["ws", "zod"], "bin": { "openai": "bin/cli" } }, "sha512-p99EFNsA/yX6UhVO93f5kJsDRLAg+CTA2RBqdHK4RtK8u5IJw32Hyb2dTGKbnnFmnuoBv5r7Z2CURI9sGZpSuA=="],
"parseurl": ["parseurl@1.3.3", "", {}, "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ=="],
"path-key": ["path-key@3.1.1", "", {}, "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q=="],
"path-to-regexp": ["path-to-regexp@8.4.2", "", {}, "sha512-qRcuIdP69NPm4qbACK+aDogI5CBDMi1jKe0ry5rSQJz8JVLsC7jV8XpiJjGRLLol3N+R5ihGYcrPLTno6pAdBA=="],
"pgvector": ["pgvector@0.2.1", "", {}, "sha512-nKaQY9wtuiidwLMdVIce1O3kL0d+FxrigCVzsShnoqzOSaWWWOvuctb/sYwlai5cTwwzRSNa+a/NtN2kVZGNJw=="],
"pkce-challenge": ["pkce-challenge@5.0.1", "", {}, "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ=="],
"postgres": ["postgres@3.4.9", "", {}, "sha512-GD3qdB0x1z9xgFI6cdRD6xu2Sp2WCOEoe3mtnyB5Ee0XrrL5Pe+e4CCnJrRMnL1zYtRDZmQQVbvOttLnKDLnaw=="],
"proxy-addr": ["proxy-addr@2.0.7", "", { "dependencies": { "forwarded": "0.2.0", "ipaddr.js": "1.9.1" } }, "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg=="],
"qs": ["qs@6.15.0", "", { "dependencies": { "side-channel": "^1.1.0" } }, "sha512-mAZTtNCeetKMH+pSjrb76NAM8V9a05I9aBZOHztWy/UqcJdQYNsf59vrRKWnojAT9Y+GbIvoTBC++CPHqpDBhQ=="],
"range-parser": ["range-parser@1.2.1", "", {}, "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg=="],
"raw-body": ["raw-body@3.0.2", "", { "dependencies": { "bytes": "~3.1.2", "http-errors": "~2.0.1", "iconv-lite": "~0.7.0", "unpipe": "~1.0.0" } }, "sha512-K5zQjDllxWkf7Z5xJdV0/B0WTNqx6vxG70zJE4N0kBs4LovmEYWJzQGxC9bS9RAKu3bgM40lrd5zoLJ12MQ5BA=="],
"require-from-string": ["require-from-string@2.0.2", "", {}, "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw=="],
"router": ["router@2.2.0", "", { "dependencies": { "debug": "^4.4.0", "depd": "^2.0.0", "is-promise": "^4.0.0", "parseurl": "^1.3.3", "path-to-regexp": "^8.0.0" } }, "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ=="],
"safer-buffer": ["safer-buffer@2.1.2", "", {}, "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg=="],
"section-matter": ["section-matter@1.0.0", "", { "dependencies": { "extend-shallow": "^2.0.1", "kind-of": "^6.0.0" } }, "sha512-vfD3pmTzGpufjScBh50YHKzEu2lxBWhVEHsNGoEXmCmn2hKGfeNLYMzCJpe8cD7gqX7TJluOVpBkAequ6dgMmA=="],
"send": ["send@1.2.1", "", { "dependencies": { "debug": "^4.4.3", "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "etag": "^1.8.1", "fresh": "^2.0.0", "http-errors": "^2.0.1", "mime-types": "^3.0.2", "ms": "^2.1.3", "on-finished": "^2.4.1", "range-parser": "^1.2.1", "statuses": "^2.0.2" } }, "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ=="],
"serve-static": ["serve-static@2.2.1", "", { "dependencies": { "encodeurl": "^2.0.0", "escape-html": "^1.0.3", "parseurl": "^1.3.3", "send": "^1.2.0" } }, "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw=="],
"setprototypeof": ["setprototypeof@1.2.0", "", {}, "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw=="],
"shebang-command": ["shebang-command@2.0.0", "", { "dependencies": { "shebang-regex": "^3.0.0" } }, "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA=="],
"shebang-regex": ["shebang-regex@3.0.0", "", {}, "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A=="],
"side-channel": ["side-channel@1.1.0", "", { "dependencies": { "es-errors": "^1.3.0", "object-inspect": "^1.13.3", "side-channel-list": "^1.0.0", "side-channel-map": "^1.0.1", "side-channel-weakmap": "^1.0.2" } }, "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw=="],
"side-channel-list": ["side-channel-list@1.0.0", "", { "dependencies": { "es-errors": "^1.3.0", "object-inspect": "^1.13.3" } }, "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA=="],
"side-channel-map": ["side-channel-map@1.0.1", "", { "dependencies": { "call-bound": "^1.0.2", "es-errors": "^1.3.0", "get-intrinsic": "^1.2.5", "object-inspect": "^1.13.3" } }, "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA=="],
"side-channel-weakmap": ["side-channel-weakmap@1.0.2", "", { "dependencies": { "call-bound": "^1.0.2", "es-errors": "^1.3.0", "get-intrinsic": "^1.2.5", "object-inspect": "^1.13.3", "side-channel-map": "^1.0.1" } }, "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A=="],
"sprintf-js": ["sprintf-js@1.0.3", "", {}, "sha512-D9cPgkvLlV3t3IzL0D0YLvGA9Ahk4PcvVwUbN0dSGr1aP0Nrt4AEnTUbuGvquEC0mA64Gqt1fzirlRs5ibXx8g=="],
"statuses": ["statuses@2.0.2", "", {}, "sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw=="],
"strip-bom-string": ["strip-bom-string@1.0.0", "", {}, "sha512-uCC2VHvQRYu+lMh4My/sFNmF2klFymLX1wHJeXnbEJERpV/ZsVuonzerjfrGpIGF7LBVa1O7i9kjiWvJiFck8g=="],
"toidentifier": ["toidentifier@1.0.1", "", {}, "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA=="],
"tr46": ["tr46@0.0.3", "", {}, "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw=="],
"type-is": ["type-is@2.0.1", "", { "dependencies": { "content-type": "^1.0.5", "media-typer": "^1.1.0", "mime-types": "^3.0.0" } }, "sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw=="],
"undici-types": ["undici-types@5.26.5", "", {}, "sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA=="],
"unpipe": ["unpipe@1.0.0", "", {}, "sha512-pjy2bYhSsufwWlKwPc+l3cN7+wuJlK6uz0YdJEOlQDbl6jo/YlPi4mb8agUkVC8BF7V8NuzeyPNqRksA3hztKQ=="],
"vary": ["vary@1.1.2", "", {}, "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg=="],
"web-streams-polyfill": ["web-streams-polyfill@4.0.0-beta.3", "", {}, "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug=="],
"webidl-conversions": ["webidl-conversions@3.0.1", "", {}, "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ=="],
"whatwg-url": ["whatwg-url@5.0.0", "", { "dependencies": { "tr46": "~0.0.3", "webidl-conversions": "^3.0.0" } }, "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw=="],
"which": ["which@2.0.2", "", { "dependencies": { "isexe": "^2.0.0" }, "bin": { "node-which": "./bin/node-which" } }, "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA=="],
"wrappy": ["wrappy@1.0.2", "", {}, "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ=="],
"zod": ["zod@4.3.6", "", {}, "sha512-rftlrkhHZOcjDwkGlnUtZZkvaPHCsDATp4pGpuOOMDaTdDDXF91wuVDJoWoPsKX/3YPQ5fHuF3STjcYyKr+Qhg=="],
"zod-to-json-schema": ["zod-to-json-schema@3.25.2", "", { "peerDependencies": { "zod": "^3.25.28 || ^4" } }, "sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA=="],
"@types/node-fetch/@types/node": ["@types/node@25.5.2", "", { "dependencies": { "undici-types": "~7.18.0" } }, "sha512-tO4ZIRKNC+MDWV4qKVZe3Ql/woTnmHDr5JD8UI5hn2pwBrHEwOEMZK7WlNb5RKB6EoJ02gwmQS9OrjuFnZYdpg=="],
"bun-types/@types/node": ["@types/node@25.5.2", "", { "dependencies": { "undici-types": "~7.18.0" } }, "sha512-tO4ZIRKNC+MDWV4qKVZe3Ql/woTnmHDr5JD8UI5hn2pwBrHEwOEMZK7WlNb5RKB6EoJ02gwmQS9OrjuFnZYdpg=="],
"form-data/mime-types": ["mime-types@2.1.35", "", { "dependencies": { "mime-db": "1.52.0" } }, "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw=="],
"@types/node-fetch/@types/node/undici-types": ["undici-types@7.18.2", "", {}, "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w=="],
"bun-types/@types/node/undici-types": ["undici-types@7.18.2", "", {}, "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w=="],
"form-data/mime-types/mime-db": ["mime-db@1.52.0", "", {}, "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg=="],
  }
}
198
docs/ENGINES.md
Normal file
@@ -0,0 +1,198 @@
# Pluggable Engine Architecture

## The idea

Every GBrain operation goes through `BrainEngine`. The engine is the contract between "what the brain can do" and "how it's stored." Swap the engine, keep everything else.

v0 ships `PostgresEngine` backed by Supabase. The interface is designed so a `SQLiteEngine`, `DuckDBEngine`, or `TursoEngine` could slot in without touching the CLI, MCP server, skills, or any consumer code.
|
||||
|
||||
## Why this matters
|
||||
|
||||
Different users have different constraints:
|
||||
|
||||
| User | Needs | Best engine |
|------|-------|-------------|
| Power user (you) | World-class search, 7K+ pages, zero-ops | PostgresEngine + Supabase |
| Open source hacker | Single file, no server, git-friendly | SQLiteEngine (future) |
| Team/enterprise | Multi-user, RLS, audit trail | PostgresEngine + self-hosted |
| Researcher | Analytics, bulk exports, embeddings | DuckDBEngine (someday) |
| Edge/mobile | Offline-first, sync later | SQLiteEngine + sync (someday) |
|
||||
|
||||
The engine interface means we don't have to choose. Ship Postgres now, let the community build the rest.
|
||||
|
||||
## The interface
|
||||
|
||||
```typescript
// src/core/engine.ts

export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
  disconnect(): Promise<void>;
  initSchema(): Promise<void>;
  transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;

  // Pages CRUD
  getPage(slug: string): Promise<Page | null>;
  putPage(slug: string, page: PageInput): Promise<Page>;
  deletePage(slug: string): Promise<void>;
  listPages(filters: PageFilters): Promise<Page[]>;

  // Search
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;

  // Chunks
  upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
  getChunks(slug: string): Promise<Chunk[]>;

  // Links
  addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
  removeLink(from: string, to: string): Promise<void>;
  getLinks(slug: string): Promise<Link[]>;
  getBacklinks(slug: string): Promise<Link[]>;
  traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;

  // Tags
  addTag(slug: string, tag: string): Promise<void>;
  removeTag(slug: string, tag: string): Promise<void>;
  getTags(slug: string): Promise<string[]>;

  // Timeline
  addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
  getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;

  // Raw data
  putRawData(slug: string, source: string, data: object): Promise<void>;
  getRawData(slug: string, source?: string): Promise<RawData[]>;

  // Versions
  createVersion(slug: string): Promise<PageVersion>;
  getVersions(slug: string): Promise<PageVersion[]>;
  revertToVersion(slug: string, versionId: number): Promise<void>;

  // Stats + health
  getStats(): Promise<BrainStats>;
  getHealth(): Promise<BrainHealth>;

  // Ingest log
  logIngest(entry: IngestLogInput): Promise<void>;
  getIngestLog(opts?: IngestLogOpts): Promise<IngestLogEntry[]>;

  // Config
  getConfig(key: string): Promise<string | null>;
  setConfig(key: string, value: string): Promise<void>;
}
```
|
||||
|
||||
### Key design choices
|
||||
|
||||
**Slug-based API, not ID-based.** Every method takes slugs, not numeric IDs. The engine resolves slugs to IDs internally. This keeps the interface portable: slugs are plain strings, while IDs are database-specific.
|
||||
|
||||
**Embedding is NOT in the engine.** The engine stores embeddings and searches by vector, but it doesn't generate embeddings. `src/core/embedding.ts` handles that. This is intentional: embedding is an external API call (OpenAI), not a storage concern. All engines share the same embedding service.
|
||||
|
||||
**Chunking is NOT in the engine.** Same logic. `src/core/chunkers/` handles chunking. The engine stores and retrieves chunks. All engines share the same chunkers.
|
||||
|
||||
**Search returns `SearchResult[]`, not raw rows.** The engine is responsible for its own search implementation (tsvector vs FTS5, pgvector vs sqlite-vss) but must return a uniform result type. RRF fusion and dedup happen above the engine, in `src/core/search/hybrid.ts`.
|
||||
|
||||
**`traverseGraph` is part of the interface, but its implementation is engine-specific.** Postgres uses recursive CTEs. SQLite would use a loop with depth tracking. The interface is the same: give me a slug and max depth, return the graph.
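
A minimal sketch of the "loop with depth tracking" approach, assuming an in-memory link map instead of a real database. The `LinkMap` type and the `GraphNode` shape (`slug` plus `depth`) are hypothetical stand-ins; the actual types in `src/core/` may differ.

```typescript
// Hypothetical in-memory link table: slug -> outgoing link targets.
type LinkMap = Map<string, string[]>;

// Hypothetical node shape for illustration only.
interface GraphNode {
  slug: string;
  depth: number;
}

// Breadth-first traversal that stops expanding once maxDepth is reached.
function traverseGraph(links: LinkMap, start: string, maxDepth = 5): GraphNode[] {
  const visited = new Set<string>([start]);
  const result: GraphNode[] = [{ slug: start, depth: 0 }];
  let frontier: string[] = [start];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const slug of frontier) {
      for (const target of links.get(slug) ?? []) {
        if (visited.has(target)) continue; // guard against cycles
        visited.add(target);
        result.push({ slug: target, depth });
        next.push(target);
      }
    }
    frontier = next;
  }
  return result;
}

const demoLinks: LinkMap = new Map([
  ["people/paul-graham", ["companies/y-combinator"]],
  ["companies/y-combinator", ["people/paul-graham", "concepts/do-things-that-dont-scale"]],
]);
const oneHop = traverseGraph(demoLinks, "people/paul-graham", 1);
const twoHops = traverseGraph(demoLinks, "people/paul-graham", 2);
```

The visited set plays the role that cycle detection plays in a recursive CTE: a page reachable by several paths is reported once, at its shallowest depth.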
|
||||
|
||||
## How search works across engines
|
||||
|
||||
```
                      +-------------------+
                      |     hybrid.ts     |
                      |  (RRF fusion +    |
                      |   dedup, shared)  |
                      +--------+----------+
                               |
                 +-------------+--------------+
                 |                            |
        +--------v--------+          +--------v--------+
        | engine.search   |          | engine.search   |
        |   Keyword()     |          |   Vector()      |
        +-----------------+          +-----------------+
                 |                            |
        +--------+--------+             +-----+----+
        |                 |             |          |
+-------v-------+ +-------v---+ +-------v---+ +----v--------+
| Postgres:     | | SQLite:   | | Postgres: | | SQLite:     |
| tsvector +    | | FTS5 +    | | pgvector  | | sqlite-vss  |
| ts_rank +     | | bm25      | | HNSW      | | or vec0     |
| websearch_to_ | |           | | cosine    | |             |
| tsquery       | |           | |           | |             |
+---------------+ +-----------+ +-----------+ +-------------+
```
|
||||
|
||||
RRF fusion, multi-query expansion, and 4-layer dedup are engine-agnostic. They operate on `SearchResult[]` arrays. Only the raw keyword and vector searches are engine-specific.
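
To make the engine-agnostic part concrete, here is a minimal sketch of Reciprocal Rank Fusion over ranked result lists, using the score `sum(1 / (60 + rank))` from the search architecture spec. The `RankedResult` shape and the choice of 1-based ranks are assumptions for illustration; the real `hybrid.ts` may differ.

```typescript
// Hypothetical minimal result shape; the real SearchResult has more fields.
interface RankedResult {
  slug: string;
}

interface FusedResult {
  slug: string;
  score: number;
}

// Fuse any number of ranked lists. Each list contributes 1 / (k + rank)
// for every result it contains, with rank starting at 1 (an assumption).
function rrfFuse(lists: RankedResult[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((result, index) => {
      const rank = index + 1;
      scores.set(result.slug, (scores.get(result.slug) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score);
}

const keywordHits: RankedResult[] = [{ slug: "a" }, { slug: "b" }];
const vectorHits: RankedResult[] = [{ slug: "b" }, { slug: "c" }];
const fused = rrfFuse([keywordHits, vectorHits]);
```

Because the function only sees slugs and positions, it works the same whether the lists came from tsvector and pgvector or from FTS5 and a SQLite vector extension.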
|
||||
|
||||
## PostgresEngine (v0, ships)
|
||||
|
||||
**Dependencies:** `postgres` (porsager/postgres), `pgvector`
|
||||
|
||||
**Postgres-specific features used:**
|
||||
- `tsvector` + `GIN` index for full-text search with `ts_rank` weighting
- `pgvector` HNSW index for cosine similarity vector search
- `pg_trgm` + `GIN` for fuzzy slug resolution
- Recursive CTEs for graph traversal
- Trigger-based search_vector (spans pages + timeline_entries)
- JSONB for frontmatter with GIN index
- Connection pooling via Supabase Supavisor (port 6543)
|
||||
|
||||
**Hosting:** Supabase Pro ($25/mo). Zero-ops. Managed Postgres with pgvector built in.
|
||||
|
||||
**Why not self-hosted for v0:** The brain should be infrastructure agents use, not something you maintain. Self-hosted Postgres with Docker is a welcome community PR, but v0 optimizes for zero ops.
|
||||
|
||||
## Adding a new engine
|
||||
|
||||
1. Create `src/core/<name>-engine.ts` implementing `BrainEngine`
2. Add to engine factory in `src/core/engine.ts`:
   ```typescript
   export function createEngine(type: string): BrainEngine {
     switch (type) {
       case 'postgres': return new PostgresEngine();
       case 'sqlite': return new SQLiteEngine();
       default: throw new Error(`Unknown engine: ${type}`);
     }
   }
   ```
3. Store engine type in `~/.gbrain/config.json`: `{ "engine": "sqlite", ... }`
4. Add tests. The test suite should be engine-agnostic where possible: same test cases, different engine constructor.
5. Document in this file + add a design doc in `docs/`
|
||||
|
||||
### What you DON'T need to touch
|
||||
|
||||
- `src/cli.ts` (dispatches to engine, doesn't know which one)
- `src/mcp/server.ts` (same)
- `src/core/chunkers/*` (shared across engines)
- `src/core/embedding.ts` (shared across engines)
- `src/core/search/hybrid.ts`, `expansion.ts`, `dedup.ts` (shared, operate on SearchResult[])
- `skills/*` (fat markdown, engine-agnostic)
|
||||
|
||||
### What you DO need to implement
|
||||
|
||||
Every method in `BrainEngine`. The full interface. No optional methods, no feature flags. If your engine can't do vector search (e.g., a pure-text engine), implement `searchVector` to return `[]` and document the limitation.
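
As an illustration of the "return `[]` and document the limitation" rule, here is a sketch of a pure-text engine. To stay short it implements only a hypothetical two-method subset (`SearchCapable`) rather than the full `BrainEngine`, and the `SearchResult` and `SearchOpts` shapes are simplified assumptions.

```typescript
// Simplified, hypothetical shapes; the real types live in src/core/.
interface SearchResult {
  slug: string;
  score: number;
}

interface SearchOpts {
  limit?: number;
}

// Hypothetical subset of BrainEngine, reduced to the two search methods.
interface SearchCapable {
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;
}

class TextOnlyEngine implements SearchCapable {
  private readonly pages: Map<string, string>;

  constructor(pages: Map<string, string>) {
    this.pages = pages;
  }

  // Naive keyword search: score is the number of occurrences of the query.
  async searchKeyword(query: string, opts: SearchOpts = {}): Promise<SearchResult[]> {
    const needle = query.toLowerCase();
    if (needle.length === 0) return [];
    const hits: SearchResult[] = [];
    for (const [slug, text] of Array.from(this.pages.entries())) {
      const count = text.toLowerCase().split(needle).length - 1;
      if (count > 0) hits.push({ slug, score: count });
    }
    return hits.sort((a, b) => b.score - a.score).slice(0, opts.limit ?? 10);
  }

  // LIMITATION: this engine has no vector index, so vector search is a no-op.
  async searchVector(_embedding: Float32Array, _opts?: SearchOpts): Promise<SearchResult[]> {
    return [];
  }
}

const textEngine = new TextOnlyEngine(
  new Map([
    ["essays/startups", "startups and more startups"],
    ["essays/growth", "growth"],
  ]),
);
```

With this shape, a caller that fuses keyword and vector results keeps working: the vector list is simply empty and the keyword ranking carries the result.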
|
||||
|
||||
## Capability matrix
|
||||
|
||||
| Capability | PostgresEngine | SQLiteEngine (future) | Notes |
|-----------|---------------|----------------------|-------|
| CRUD | Full | Full | |
| Keyword search | tsvector + ts_rank | FTS5 + bm25 | Different ranking algorithms |
| Vector search | pgvector HNSW | sqlite-vss or vec0 | Different index types |
| Fuzzy slug | pg_trgm | LIKE + Levenshtein | Postgres is better here |
| Graph traversal | Recursive CTE | Loop with depth tracking | Same interface |
| Transactions | Full ACID | Full ACID | Both support this |
| JSONB queries | GIN index | json_extract | Postgres is richer |
| Concurrent access | Connection pooling | Single writer | SQLite limitation |
| Hosting | Supabase, self-hosted, Docker | Local file | |
|
||||
|
||||
## Future engine ideas
|
||||
|
||||
**SQLiteEngine** (most requested). See `docs/SQLITE_ENGINE.md` for the full plan. Single file, no server, git-friendly. Uses FTS5 for keyword search, sqlite-vss or vec0 for vector search. Great for open source users who want zero infrastructure.
|
||||
|
||||
**TursoEngine.** libSQL (SQLite fork) with embedded replicas and HTTP edge access. Would give SQLite's simplicity with cloud sync. Interesting for mobile/edge use cases.
|
||||
|
||||
**DuckDBEngine.** Analytical workloads. Bulk exports, embedding analysis, brain-wide statistics. Not for OLTP. Could be a secondary engine for analytics alongside Postgres for operations.
|
||||
|
||||
**Custom/Remote.** The interface is clean enough that someone could build an engine backed by any storage: Firestore, DynamoDB, a REST API, even a flat file system. The interface doesn't assume SQL.
|
||||
545
docs/GBRAIN_V0.md
Normal file
@@ -0,0 +1,545 @@
|
||||
# GBrain v0: Postgres-Native Personal Knowledge Brain
|
||||
|
||||
## What this is
|
||||
|
||||
GBrain is a compiled intelligence system. Not a note-taking app. Not "chat with your notes."
|
||||
|
||||
Every page is an intelligence assessment. Above the line: compiled truth (your current best understanding, rewritten when evidence changes). Below the line: timeline (append-only evidence trail). AI agents maintain the brain. MCP clients query it. The intelligence lives in fat markdown skills, not application code.
|
||||
|
||||
The core insight: personal knowledge at scale is an intelligence problem, not a storage problem.
|
||||
|
||||
## Why it exists
|
||||
|
||||
A 7,471-file / 2.3GB markdown wiki is choking git. Git doesn't scale past ~5K files for wiki-style use. The compiled truth + timeline model (Karpathy-style knowledge pages) is right, but it needs a real database underneath.
|
||||
|
||||
There's already a production-grade RAG system (Ruby on Rails, Postgres + pgvector) with 3-tier chunking, hybrid search with RRF, multi-query expansion, and 4-layer dedup. GBrain ports these proven patterns to a standalone Bun + TypeScript tool.
|
||||
|
||||
## The knowledge model
|
||||
|
||||
```
+--------------------------------------------------+
| Page: concepts/do-things-that-dont-scale         |
|                                                  |
| --- frontmatter (YAML) ---                       |
| type: concept                                    |
| tags: [startups, growth, pg-essay]               |
|                                                  |
| === COMPILED TRUTH ===                           |
| Current best understanding.                      |
| Rewritten on new evidence.                       |
| This is the "what we know now" section.          |
|                                                  |
| ---                                              |
|                                                  |
| === TIMELINE ===                                 |
| Append-only evidence trail.                      |
| - 2013-07-01: Published on paulgraham.com        |
| - 2024-11-15: Referenced in batch kickoff talk   |
| Never edited, only appended.                     |
+--------------------------------------------------+
          |                           |
          v                           v
  [Semantic chunks]           [Recursive chunks]
  (best quality for           (predictable format
   compiled truth)             for timeline)
          |                           |
          v                           v
  [Embeddings: text-embedding-3-large, 1536 dims]
                        |
                        v
         [HNSW index + tsvector + pg_trgm]
                        |
                        v
  [Hybrid search: vector + keyword + RRF fusion]
```
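
The "above the line / below the line" split can be sketched as a small function. This assumes frontmatter has already been stripped and that the first line consisting only of `---` is the divider; the `SplitPage` shape and `splitPageBody` name are hypothetical, and the real markdown parser may handle more cases.

```typescript
// Hypothetical result shape for illustration.
interface SplitPage {
  compiledTruth: string;
  timeline: string;
}

// Split a page body at the first line that is exactly `---`.
// Everything above is compiled truth, everything below is timeline.
function splitPageBody(body: string): SplitPage {
  const lines = body.split("\n");
  const divider = lines.findIndex((line) => line.trim() === "---");
  if (divider === -1) {
    // No divider: treat the whole body as compiled truth.
    return { compiledTruth: body.trim(), timeline: "" };
  }
  return {
    compiledTruth: lines.slice(0, divider).join("\n").trim(),
    timeline: lines.slice(divider + 1).join("\n").trim(),
  };
}

const samplePage = [
  "Current best understanding.",
  "",
  "---",
  "",
  "- 2013-07-01: Published on paulgraham.com",
  "- 2024-11-15: Referenced in batch kickoff talk",
].join("\n");
const split = splitPageBody(samplePage);
```

Keeping the two halves separate is what lets each go to a different chunker, as the diagram shows.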
|
||||
|
||||
## Architecture decisions
|
||||
|
||||
### v0 stack
|
||||
|
||||
| Layer | Choice | Why |
|-------|--------|-----|
| Database | Postgres + pgvector | Proven RAG patterns, production-tested. World-class hybrid search. |
| Hosting | Supabase Pro ($25/mo) | Zero-ops. Managed Postgres, pgvector, connection pooling. 8GB storage. |
| Runtime | Bun + TypeScript | Consistent with GStack ecosystem. Fast. Compiles to single binary. |
| Embeddings | OpenAI text-embedding-3-large | 1536 dims (reduced from 3072 via dimensions API). ~$0.13/1M tokens. |
| LLM (chunking/expansion) | Claude Haiku | Cheapest model for topic boundary detection and query expansion. |
| Background jobs | Trigger.dev | Serverless. Embed backfill, stale detection, orphan audit, tag consistency. |
| Distribution | npm package + compiled binary + MCP server | Library for OpenClaw, CLI for humans, MCP for agents. |
|
||||
|
||||
### What we chose and why
|
||||
|
||||
**Postgres over SQLite.** We have 3+ years of proven RAG patterns running on Postgres. tsvector for full-text search, pgvector HNSW for semantic search, pg_trgm for fuzzy slug matching. Porting these to SQLite would mean reimplementing search from scratch. SQLite is a future pluggable engine for lightweight open source users (see `docs/ENGINES.md`).
|
||||
|
||||
**Supabase over self-hosted.** Zero maintenance. The brain should be infrastructure that AI agents use, not something you administer. Free tier has pgvector but only 500MB (not enough for 7K+ pages with embeddings, which need ~750MB). Pro tier at $25/mo gives 8GB. No Docker, no self-hosted Postgres in v0.
|
||||
|
||||
**Full port over minimal viable.** The patterns are proven. The port is mechanical. Shipping the full 3-tier chunking + hybrid search + 4-layer dedup means world-class RAG from day one. "We'll add that later" means rebuilding everything later.
|
||||
|
||||
**Library-first distribution.** gbrain is an npm package. OpenClaw installs it as a dependency (`bun add gbrain`), imports the engine directly. Zero-overhead function calls, shared connection pool, TypeScript types. The CLI and MCP server are thin wrappers over the same engine.
|
||||
|
||||
**Trigger-based tsvector (not generated column).** To include timeline_entries content in full-text search, the tsvector needs to span multiple tables. Generated columns can't do cross-table references. A trigger on pages + timeline_entries updates the search_vector.
|
||||
|
||||
**Auto-embed during import.** No separate embed step. `gbrain import` chunks and embeds in one pass. Progress bar shows status. `--no-embed` flag for users who want to defer. `embedded_at` column enables `gbrain embed --stale` for backfill.
|
||||
|
||||
## Distribution model
|
||||
|
||||
```
+-------------------+   +-------------------+   +-------------------+
|    npm package    |   |  Compiled binary  |   |    MCP server     |
|     (library)     |   |       (CLI)       |   |      (stdio)      |
+-------------------+   +-------------------+   +-------------------+
|                   |   |                   |   |                   |
| bun add gbrain    |   | GitHub Releases   |   | gbrain serve      |
| import { Postgres |   | npx gbrain        |   | in mcp.json       |
|   Engine }        |   |                   |   |                   |
|                   |   |                   |   |                   |
| WHO: OpenClaw,    |   | WHO: Humans       |   | WHO: Claude Code, |
|   AlphaClaw       |   |                   |   |   Cursor, etc.    |
+-------------------+   +-------------------+   +-------------------+
          |                       |                       |
          +-----------------------+-----------------------+
                                  |
                         +--------v--------+
                         |   BrainEngine   |
                         |   (pluggable    |
                         |    interface)   |
                         +-----------------+
                                  |
                    +-------------+-------------+
                    |                           |
             +------v------+            +-------v-------+
             |  Postgres   |            |    SQLite     |
             |   Engine    |            |    Engine     |
             | (v0, ships) |            | (future, see  |
             +-------------+            |  ENGINES.md)  |
                                        +---------------+
```
|
||||
|
||||
package.json exports:
- Library: `src/core/index.ts` (BrainEngine interface, PostgresEngine, types)
- CLI binary: `src/cli.ts`
|
||||
|
||||
## First-time experience
|
||||
|
||||
### Path 1: OpenClaw user (primary)
|
||||
|
||||
OpenClaw is the AI orchestrator that uses gbrain as its knowledge backend. This is the most common install path.
|
||||
|
||||
```bash
# 1. Install gbrain as a ClawHub skill
clawhub install gbrain

# 2. The skill runs guided setup on first use:
#    - Detects if Supabase CLI is available
#    - If yes: auto-provisions a new Supabase project
#    - If no: prompts for connection URL
#    - Runs schema migration
#    - Imports bundled kindling corpus (10 PG essays)
#    - Shows live entity/edge extraction animation
#    - Brain is ready

# 3. From OpenClaw, brain tools are now available:
#    "What essays do we have about startups?"
#    "Ingest my meeting notes from today"
#    "What does PG say about doing things that don't scale?"
```
|
||||
|
||||
Behind the scenes, `clawhub install gbrain`:
1. Installs the `gbrain` npm package
2. Ships SKILL.md files (ingest, query, maintain, enrich, briefing, migrate)
3. Registers brain tools with the orchestrator
4. Runs `gbrain init --supabase` on first use (guided wizard)
|
||||
|
||||
### Path 2: CLI user (standalone)
|
||||
|
||||
```bash
# 1. Install
npm install -g gbrain
# or: download binary from GitHub Releases

# 2. Initialize with Supabase
gbrain init --supabase
# Guided wizard:
#   Try 1: Supabase CLI auto-provision (npx supabase)
#   Try 2: If CLI not installed or not logged in, fallback:
#          "Enter your Supabase connection URL:"
#   Then: runs schema migration, verifies pgvector extension
#   Then: imports kindling corpus (10 PG essays as demo data)
#   Then: shows live entity extraction animation
#   Output: "Brain ready. 10 pages imported. Try: gbrain query 'what does PG say about startups?'"

# 3. Import your data
gbrain import /path/to/markdown/wiki/
# Progress bar: 7,471 files, auto-chunk, auto-embed
# ~30s for text import, ~10-15 min for embedding

# 4. Query
gbrain query "what does PG say about doing things that don't scale?"
```
|
||||
|
||||
### Path 3: MCP user (Claude Code, Cursor)
|
||||
|
||||
Add this to `~/.config/claude/mcp.json`:

```json
{
  "mcpServers": {
    "gbrain": {
      "command": "gbrain",
      "args": ["serve"]
    }
  }
}
```
|
||||
|
||||
Then in Claude Code: "Search my brain for people who know about robotics"
|
||||
|
||||
### The init wizard in detail
|
||||
|
||||
`gbrain init --supabase` runs through these steps:
|
||||
|
||||
```
Step 1: Database Setup
  ├── Check for Supabase CLI (npx supabase --version)
  │   ├── Found + logged in → auto-create project
  │   │   ├── Create project via supabase CLI
  │   │   ├── Wait for project to be ready
  │   │   └── Extract connection string
  │   ├── Found + not logged in →
  │   │   └── Error: "Supabase CLI found but not logged in."
  │   │       Cause: "You need to authenticate first."
  │   │       Fix: "Run: npx supabase login"
  │   │       Docs: "https://supabase.com/docs/guides/cli"
  │   └── Not found → fallback to manual
  │       └── Prompt: "Enter your Supabase connection URL:"
  │
Step 2: Schema Migration
  ├── Connect to database
  ├── CREATE EXTENSION IF NOT EXISTS vector
  ├── CREATE EXTENSION IF NOT EXISTS pg_trgm
  ├── Run src/schema.sql (all tables, indexes, triggers)
  └── Verify: test insert + vector query

Step 3: Config
  ├── Write ~/.gbrain/config.json (0600 permissions)
  │   { "database_url": "...", "service_role_key": "..." }
  └── Verify connection

Step 4: Kindling Import
  ├── Import 10 bundled PG essays as demo data
  ├── Chunk + embed each essay
  ├── Show live entity/edge extraction animation:
  │   "Extracting entities... Paul Graham (person), Y Combinator (company)..."
  │   "Creating links... Paul Graham → Y Combinator (founded)..."
  └── Output: "Brain ready. 10 pages imported."

Step 5: First Query
  └── "Try: gbrain query 'what does PG say about doing things that don't scale?'"
```
|
||||
|
||||
Every error follows the style guide: problem + cause + fix + docs link.
|
||||
|
||||
## CLI commands
|
||||
|
||||
```
gbrain init [--supabase|--url <conn>]     # create brain
gbrain get <slug>                         # read a page
gbrain put <slug> [< file.md]             # write/update a page
gbrain search <query>                     # keyword search (tsvector)
gbrain query <question>                   # hybrid search (RRF + expansion)
gbrain ingest <file> [--type ...]         # ingest a source document
gbrain link <from> <to> [--type <type>]   # create typed link
gbrain unlink <from> <to>                 # remove link
gbrain graph <slug> [--depth 5]           # traverse link graph (recursive CTE)
gbrain backlinks <slug>                   # incoming links
gbrain tags <slug>                        # list tags
gbrain tag <slug> <tag>                   # add tag
gbrain untag <slug> <tag>                 # remove tag
gbrain timeline [<slug>]                  # view timeline
gbrain timeline-add <slug> <date> <text>  # add timeline entry
gbrain list [--type] [--tag] [--limit]    # list with filters
gbrain stats                              # brain statistics
gbrain health                             # brain health dashboard
gbrain import <dir> [--no-embed]          # import from markdown directory
gbrain export [--dir ./export/]           # export to markdown (round-trip)
gbrain embed [<slug>|--all|--stale]       # generate/refresh embeddings
gbrain serve                              # MCP server (stdio)
gbrain call <tool> '<json>'               # raw tool invocation
gbrain upgrade                            # self-update (npm, binary, ClawHub)
gbrain version                            # version info
gbrain config [get|set] <key> [value]     # brain config
```
|
||||
|
||||
CLI and MCP expose identical operations. Drift tests assert identical results for all operations across both interfaces.
|
||||
|
||||
## Database schema
|
||||
|
||||
9 tables in Postgres + pgvector:
|
||||
|
||||
```
+------------------+     +-------------------+     +------------------+
| pages            |---->| content_chunks    |     | links            |
|------------------|     |-------------------|     |------------------|
| id (PK)          |     | id (PK)           |     | id (PK)          |
| slug (UNIQUE)    |     | page_id (FK)      |     | from_page_id(FK) |
| type             |     | chunk_index       |     | to_page_id (FK)  |
| title            |     | chunk_text        |     | link_type        |
| compiled_truth   |     | chunk_source      |     | context          |
| timeline         |     | embedding (1536)  |     +------------------+
| frontmatter(JSONB)|    | model             |
| search_vector    |     | token_count       |     +------------------+
| created_at       |     | embedded_at       |     | tags             |
| updated_at       |     +-------------------+     |------------------|
+------------------+                               | id (PK)          |
        |                                          | page_id (FK)     |
        +----->  +--------------------+            | tag              |
        |        | timeline_entries   |            +------------------+
        |        |--------------------|
        |        | id (PK)            |            +------------------+
        |        | page_id (FK)       |            | page_versions    |
        |        | date               |            |------------------|
        |        | source             |            | id (PK)          |
        |        | summary            |            | page_id (FK)     |
        |        | detail (markdown)  |            | compiled_truth   |
        |        +--------------------+            | frontmatter      |
        |                                          | snapshot_at      |
        +----->  +--------------------+            +------------------+
        |        | raw_data           |
        |        |--------------------|            +------------------+
        |        | id (PK)            |            | config           |
        |        | page_id (FK)       |            |------------------|
        |        | source             |            | key (PK)         |
        |        | data (JSONB)       |            | value            |
        |        +--------------------+            +------------------+
        |
        +----->  +--------------------+
                 | ingest_log         |
                 |--------------------|
                 | id (PK)            |
                 | source_type        |
                 | source_ref         |
                 | pages_updated      |
                 | summary            |
                 +--------------------+
```
|
||||
|
||||
Indexes:
|
||||
- `pages.slug`: UNIQUE constraint (implicit B-tree)
- `pages.type`: B-tree
- `pages.search_vector`: GIN (full-text search)
- `pages.frontmatter`: GIN (JSONB queries)
- `pages.title`: GIN with pg_trgm (fuzzy slug resolution)
- `content_chunks.embedding`: HNSW with cosine ops (vector search)
- `content_chunks.page_id`: B-tree
- `links.from_page_id`, `links.to_page_id`: B-tree
- `tags.tag`, `tags.page_id`: B-tree
- `timeline_entries.page_id`, `timeline_entries.date`: B-tree
|
||||
|
||||
## Search architecture
|
||||
|
||||
```
Query: "when should you ignore conventional wisdom?"
                |
                v
 +-----------------------------+
 | Multi-query expansion       |
 | (Claude Haiku)              |
 | "contrarian thinking"       |
 | "going against the crowd"   |
 +-----------------------------+
       |        |        |
       v        v        v
      [embed all 3 queries]
       |        |        |
       +--------+--------+
                |
          +-----+-----+
          |           |
          v           v
     +---------+ +----------+
     | Vector  | | Keyword  |
     | Search  | | Search   |
     | (HNSW   | | (tsv +   |
     | cosine) | | ts_rank) |
     +---------+ +----------+
          |           |
          +-----+-----+
                |
                v
       +------------------+
       | RRF Fusion       |
       | score = sum(     |
       |  1/(60 + rank))  |
       +------------------+
                |
                v
       +------------------+
       | 4-Layer Dedup    |
       | 1. By source     |
       | 2. Cosine > 0.85 |
       | 3. Type cap 60%  |
       | 4. Per-page max  |
       +------------------+
                |
                v
       +------------------+
       | Stale alerts     |
       | (compiled_truth  |
       |  older than      |
       |  latest timeline)|
       +------------------+
                |
                v
            [Results]
```
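
Dedup layer 2 ("Cosine > 0.85") can be sketched as a greedy filter: walk results in score order and drop any result too similar to one already kept. This is one plausible reading of the layer, not the actual `dedup.ts`; the `EmbeddedResult` shape and the greedy strategy are assumptions.

```typescript
// Hypothetical result shape carrying the chunk embedding.
interface EmbeddedResult {
  slug: string;
  score: number;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Keep the highest-scoring result of each near-duplicate cluster.
function dedupByCosine(results: EmbeddedResult[], threshold = 0.85): EmbeddedResult[] {
  const kept: EmbeddedResult[] = [];
  const byScore = results.slice().sort((a, b) => b.score - a.score);
  for (const candidate of byScore) {
    const isDuplicate = kept.some(
      (k) => cosineSimilarity(k.embedding, candidate.embedding) > threshold,
    );
    if (!isDuplicate) kept.push(candidate);
  }
  return kept;
}

const deduped = dedupByCosine([
  { slug: "a", score: 0.9, embedding: [1, 0] },
  { slug: "b", score: 0.8, embedding: [0.99, 0.1] }, // nearly parallel to "a"
  { slug: "c", score: 0.7, embedding: [0, 1] },
]);
```

The comparison is quadratic in the number of results, which is acceptable for a fused top-N list but would not be for a whole corpus.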
|
||||
|
||||
## Chunking strategies
|
||||
|
||||
| Strategy | Input | Algorithm | When to use |
|----------|-------|-----------|-------------|
| Recursive | Any text | 5-level delimiter hierarchy (paragraphs > lines > sentences > clauses > whitespace). 300-word chunks, 50-word overlap. | Timeline (predictable format), bulk import |
| Semantic | Quality text | Embed each sentence, Savitzky-Golay filter for topic boundaries, cosine similarity minima. Falls back to recursive. | Compiled truth (intelligence assessments) |
| LLM-guided | High-value text | Pre-split to 128-word candidates, Claude Haiku finds topic shifts in sliding windows. 3 retries per window. | Explicitly requested via `--chunker llm` |
|
||||
|
||||
Dispatch: compiled_truth gets semantic chunker. Timeline gets recursive chunker. Override with `--chunker` flag or `chunk_strategy` in frontmatter.
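
To show what "300-word chunks, 50-word overlap" means in practice, here is a deliberately simplified sketch that windows over whitespace-separated words only. It skips the 5-level delimiter hierarchy of the real recursive chunker, and the `chunkWords` name is hypothetical.

```typescript
// Fixed-size word windows with overlap. This is the lowest level of the
// hierarchy only (whitespace); the real chunker prefers larger delimiters.
function chunkWords(text: string, chunkSize = 300, overlap = 50): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each window starts 250 words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}

// 700 synthetic words: w0 w1 ... w699
const sampleWords: string[] = [];
for (let i = 0; i < 700; i++) sampleWords.push("w" + i);
const wordChunks = chunkWords(sampleWords.join(" "));
```

With the defaults, a 700-word text yields windows starting at words 0, 250, and 500, so adjacent chunks share 50 words of context.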
|
||||
|
||||
## Skills (fat markdown, no code)
|
||||
|
||||
Each skill is a markdown file that AI agents (Claude Code, OpenClaw) read and follow. The skill contains the workflow, heuristics, and quality rules. No skill logic is in the binary.
|
||||
|
||||
| Skill | What it does |
|-------|-------------|
| `skills/ingest/SKILL.md` | Ingest meetings, docs, articles. Update compiled truth, append timeline, create links. |
| `skills/query/SKILL.md` | 3-layer search (FTS + vector + structured). Synthesize answer with citations. |
| `skills/maintain/SKILL.md` | Find contradictions, stale info, orphans, dead links, tag inconsistency. |
| `skills/enrich/SKILL.md` | Enrich from external APIs (Crustdata, Happenstance, Exa). Store raw data, distill to compiled truth. |
| `skills/briefing/SKILL.md` | Daily briefing: meetings with context, active deals, open threads. |
| `skills/migrate/SKILL.md` | Universal migration from Obsidian, Notion, Logseq, plain markdown, CSV, JSON, Roam. |
|
||||
|
||||
## CEO scope expansions (accepted for v0)
|
||||
|
||||
1. **CLI/MCP parity with drift tests.** Both interfaces are thin wrappers over the engine. Tests assert identical output.
2. **Smart slug resolution.** Fuzzy matching via pg_trgm for reads. Writes require exact slugs. `gbrain get "dont scale"` resolves to `concepts/do-things-that-dont-scale`.
3. **Brain health dashboard.** `gbrain health` shows page count, embed coverage, stale pages, orphans, dead links.
4. **Normalized timeline.** `timeline_entries` table only (no TEXT column). `detail` field supports markdown.
5. **Page version control.** `page_versions` table stores full snapshots (compiled_truth + frontmatter + links + tags). `gbrain history`, `gbrain diff`, `gbrain revert` commands. Revert re-chunks and re-embeds.
6. **Typed links + graph traversal.** `link_type` column (knows, invested_in, works_at, etc.). `gbrain graph` uses recursive CTE with max depth (default 5, configurable via `--depth`).
7. **Trigger.dev data cleanup jobs.** Daily embed backfill, weekly stale detection + orphan audit + tag consistency.
8. **Stale alert annotations.** Search results flag pages where compiled_truth is older than latest timeline entry.
9. **Timeline merge on ingest.** Same event created across all mentioned entities.
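
The stale alert rule in item 8 reduces to a date comparison. A minimal sketch, assuming each page exposes when its compiled truth was last rewritten and the date of its newest timeline entry; the `PageFreshness` shape and its field names are hypothetical.

```typescript
// Hypothetical per-page freshness info; field names are illustrative.
interface PageFreshness {
  slug: string;
  compiledTruthUpdatedAt: Date;
  latestTimelineDate: Date | null; // null when the page has no timeline entries
}

// A page is stale when evidence arrived after the last rewrite of compiled truth.
function isStale(page: PageFreshness): boolean {
  if (page.latestTimelineDate === null) return false;
  return page.latestTimelineDate.getTime() > page.compiledTruthUpdatedAt.getTime();
}

const stalePage: PageFreshness = {
  slug: "concepts/do-things-that-dont-scale",
  compiledTruthUpdatedAt: new Date("2024-01-01T00:00:00Z"),
  latestTimelineDate: new Date("2024-11-15T00:00:00Z"),
};
const freshPage: PageFreshness = {
  slug: "people/paul-graham",
  compiledTruthUpdatedAt: new Date("2024-12-01T00:00:00Z"),
  latestTimelineDate: new Date("2024-11-15T00:00:00Z"),
};
```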
|
||||
|
||||
## Security model (v0)
|
||||
|
||||
Single-user, local-only:
|
||||
- Supabase service role key in `~/.gbrain/config.json` (0600 permissions)
- MCP stdio transport is inherently local (client spawns `gbrain serve` as subprocess)
- No multi-user, no RLS, no OAuth in v0
- Multi-user path (future): Supabase RLS + per-user API keys
|
||||
|
||||
## Upgrade mechanism
|
||||
|
||||
`gbrain upgrade` detects the installation method and updates accordingly:
|
||||
|
||||
| Path | How |
|------|-----|
| npm | `bun update gbrain` (or npm equivalent) |
| Compiled binary | Download new binary to temp dir, atomic rename swap, exec new process |
| ClawHub | `clawhub update gbrain` |
|
||||
|
||||
Version check: compare local version against latest GitHub release tag.
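
The version check can be sketched as a numeric comparison of dotted versions. This assumes release tags look like `v0.1.0` and ignores pre-release suffixes; `parseVersion` and `isNewer` are hypothetical helper names, and fetching the latest tag from GitHub is left out.

```typescript
// Turn "v0.1.0" or "0.1.0" into [0, 1, 0]. Non-numeric parts become 0.
function parseVersion(tag: string): number[] {
  return tag
    .replace(/^v/, "")
    .split(".")
    .map((part) => {
      const n = Number.parseInt(part, 10);
      return Number.isNaN(n) ? 0 : n;
    });
}

// True when the latest release tag is strictly newer than the local version.
function isNewer(latestTag: string, localVersion: string): boolean {
  const latest = parseVersion(latestTag);
  const local = parseVersion(localVersion);
  const length = Math.max(latest.length, local.length);
  for (let i = 0; i < length; i++) {
    const diff = (latest[i] ?? 0) - (local[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return false;
}

const upgradeAvailable = isNewer("v0.2.0", "0.1.0");
```

Comparing part by part as numbers avoids the string-comparison trap where `0.1.10` would sort before `0.1.9`.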
|
||||
|
||||
## Storage and cost estimates
|
||||
|
||||
### Storage (~750MB for 7,471 pages)
|
||||
|
||||
| Component | Size |
|-----------|------|
| Page text (compiled_truth + timeline) | ~150MB |
| JSONB frontmatter | ~20MB |
| tsvector + GIN indexes | ~50MB |
| Content chunks (~22K, text) | ~80MB |
| Embeddings (22K x 1536 floats x 4 bytes) | ~134MB |
| HNSW index overhead (~2x embeddings) | ~270MB |
| Links, tags, timeline, raw_data, versions | ~50MB |
| **Total** | **~750MB** |
|
||||
|
||||
Supabase free tier (500MB) won't fit. Supabase Pro ($25/mo, 8GB) is the starting point.
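
The arithmetic behind the table can be checked in a few lines. The per-component figures are the table's own estimates; only the embedding row is derived from first principles here, and the variable names are illustrative.

```typescript
// Embedding storage: one 4-byte float per dimension per chunk.
const chunkCount = 22000;
const dimensions = 1536;
const bytesPerFloat = 4;
const embeddingBytes = chunkCount * dimensions * bytesPerFloat; // 135,168,000 bytes
const embeddingMB = embeddingBytes / 1000000; // about 135 MB (decimal)

// Sum of the table's component estimates, in MB.
const componentEstimatesMB = [150, 20, 50, 80, 134, 270, 50];
let totalMB = 0;
for (const mb of componentEstimatesMB) totalMB += mb; // 754

const fitsFreeTier = totalMB <= 500; // free tier is 500MB
```

The derived embedding size of about 135 MB is in line with the table's ~134MB, and the component sum of 754 MB rounds to the stated ~750MB, well above the 500MB free tier.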
|
||||
|
||||
### Embedding cost (~$4-5 for initial import)
|
||||
|
||||
| Step | Cost |
|------|------|
| Semantic chunker sentence embeddings (~374K sentences) | ~$1 |
| Chunk embeddings (~22K chunks) | ~$0.30 |
| Query expansion (per query, ~3 embeds) | negligible |
| **Total initial import** | **~$4-5** |
|
||||
|
||||
Budget alternative: `gbrain import --chunker recursive` skips sentence-level embeddings, then `gbrain embed --rechunk --chunker semantic` upgrades later.
|
||||
|
||||
## Serverless operations stack
|
||||
|
||||
```
+------------------+   +------------------+   +------------------+
| Supabase         |   | Vercel           |   | Trigger.dev      |
| (Postgres +      |   | (web/API,        |   | (background      |
|  pgvector)       |   |  optional)       |   |  jobs)           |
+------------------+   +------------------+   +------------------+
| Database         |   | Future web UI    |   | Embed backfill   |
| Connection pool  |   | API endpoints    |   | Stale detection  |
| pgvector HNSW    |   | Edge functions   |   | Orphan audit     |
| tsvector FTS     |   |                  |   | Tag consistency  |
| pg_trgm fuzzy    |   |                  |   | Daily briefing   |
+------------------+   +------------------+   +------------------+
```
|
||||
|
||||
The CLI connects directly to Supabase Postgres. Trigger.dev and Vercel are for async/scheduled work. The CLI works without them.
|
||||
|
||||
## Verification checklist
|
||||
|
||||
1. `gbrain import /data/brain/` migrates all 7,471 files losslessly
2. `gbrain export` round-trips to semantically identical markdown
3. `gbrain query "what does PG say about doing things that don't scale?"` returns relevant hybrid search results
4. `gbrain serve` starts MCP server connectable by Claude Code
5. All 3 chunkers produce correct output with test fixtures
6. `gbrain init --supabase` works end-to-end
7. `bun test` passes all tests
8. `clawhub install gbrain` installs the skill and runs guided setup
9. `bun add gbrain` + `import { PostgresEngine } from 'gbrain'` works in external project
10. Drift tests pass: CLI and MCP produce identical results
11. `gbrain health` outputs accurate brain health metrics
12. Migration skill successfully imports an Obsidian vault
|
||||
|
||||
## Future plans
|
||||
|
||||
See `docs/ENGINES.md` for the pluggable engine architecture and future backend plans.
|
||||
|
||||
### v1 candidates (deferred from v0)
|
||||
|
||||
- **`gbrain ask` natural language CLI alias.** Trivial to add. P1 TODO.
- **Intelligence compiler.** Treat every fact as a first-class claim with source span, entity links, validity window, confidence, and contradiction status. "What changed, why, and what evidence would flip it again?" From Codex review. Builds on compiled truth model.
- **Active skills via Trigger.dev.** Application-specific briefings, meeting prep. Belongs in OpenClaw, not generic brain infra.
- **Multi-user access.** Supabase RLS + per-user API keys. v0 is single-user.
- **SQLite engine.** Community PRs welcome. See `docs/SQLITE_ENGINE.md`.
- **Docker Compose for self-hosted Postgres.** Community PRs welcome.
- **Web UI.** Optional Vercel-hosted dashboard for browsing brain pages.
|
||||
|
||||
### Interface abstraction principle
|
||||
|
||||
All operations go through `BrainEngine`. The engine interface is the contract. Postgres-specific features (tsvector, pgvector HNSW, pg_trgm, recursive CTEs) are implementation details inside `PostgresEngine`. The interface exposes capabilities, not SQL.
|
||||
|
||||
This means:
|
||||
- A SQLite engine can implement `searchKeyword` using FTS5 instead of tsvector
- A SQLite engine can implement `searchVector` using sqlite-vss instead of pgvector
- A future DuckDB engine could implement analytics-heavy workloads
- The CLI, MCP server, and library consumers never know which engine runs underneath
|
||||
|
||||
See `docs/ENGINES.md` for the full interface spec and `docs/SQLITE_ENGINE.md` for the SQLite implementation plan.
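
To make the principle concrete, here is a deliberately tiny sketch. `MiniBrainEngine`, `InMemoryEngine`, and `topSlug` are hypothetical names invented for illustration; the real contract lives in `docs/ENGINES.md`, has many more methods, and is asynchronous, whereas this sketch is synchronous for brevity.

```typescript
// Hypothetical, heavily reduced engine contract (not the real BrainEngine).
interface MiniSearchResult {
  slug: string;
  score: number;
}

interface MiniBrainEngine {
  putPage(slug: string, text: string): void;
  searchKeyword(query: string, limit: number): MiniSearchResult[];
}

// Toy engine backed by a Map. A Postgres or SQLite engine would satisfy the
// same interface using tsvector or FTS5 internally.
class InMemoryEngine implements MiniBrainEngine {
  private pages = new Map<string, string>();

  putPage(slug: string, text: string): void {
    this.pages.set(slug, text);
  }

  searchKeyword(query: string, limit: number): MiniSearchResult[] {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const results: MiniSearchResult[] = [];
    this.pages.forEach((text, slug) => {
      const lower = text.toLowerCase();
      // Score = number of query terms that occur in the page text.
      const score = terms.filter((term) => lower.includes(term)).length;
      if (score > 0) results.push({ slug, score });
    });
    return results.sort((a, b) => b.score - a.score).slice(0, limit);
  }
}

// Consumer code depends only on the interface, never on a concrete engine.
function topSlug(engine: MiniBrainEngine, query: string): string | undefined {
  const first = engine.searchKeyword(query, 1)[0];
  return first ? first.slug : undefined;
}
```

Because `topSlug` only sees `MiniBrainEngine`, swapping the storage backend requires no change on the caller's side, which is the property the CLI and MCP server rely on.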
|
||||
|
||||
## Review history
|
||||
|
||||
| Review | Runs | Status | Key findings |
|--------|------|--------|-------------|
| /office-hours | 1 | APPROVED | Builder mode. Full port approach chosen. |
| /plan-ceo-review | 1 | CLEAR | 11 proposals, 10 accepted, 1 deferred. SCOPE EXPANSION mode. |
| /codex review | 1 | issues_found | 24 points challenged, 3 accepted (fuzzy slug, revert spec, tsvector). |
| /plan-eng-review | 2 | CLEAR | 3 issues (upgrade paths, import guardrails, init wizard), 0 critical gaps. |
| /plan-devex-review | 1 | CLEAR | DX score 5/10 to 7/10. TTHW 25min to 90s. Champion tier. |
|
||||
395
docs/SQLITE_ENGINE.md
Normal file
@@ -0,0 +1,395 @@
|
||||
# SQLite Engine Design
|
||||
|
||||
## Status: Designed, not built. Community PRs welcome.
|
||||
|
||||
The pluggable engine interface (`docs/ENGINES.md`) means anyone can add a SQLite backend without touching the CLI, MCP server, or skills. This document is the full plan.
|
||||
|
||||
## Why SQLite
|
||||
|
||||
Postgres is the right choice for the primary user (7K+ pages, production RAG, zero-ops via Supabase). But a lot of people want something simpler:
|
||||
|
||||
- **No server.** One file. `brain.db`. Done.
- **Git-friendly.** You can (with care) commit a SQLite database alongside your notes.
- **Offline.** Works on a plane, in a coffee shop, wherever.
- **Zero cost.** No Supabase subscription. No hosting. No API keys for search (keyword-only mode works without OpenAI).
- **Portable.** Copy the file to another machine. That's it.
|
||||
|
||||
Tools like Khoj, Obsidian plugins, and various "local-first AI" projects already use SQLite with vector extensions. The patterns exist. This is well-trodden ground.
|
||||
|
||||
## What it gives up
|
||||
|
||||
Compared to PostgresEngine:
|
||||
|
||||
| Feature | Postgres | SQLite | Impact |
|---------|----------|--------|--------|
| Full-text search quality | tsvector + ts_rank (excellent) | FTS5 + bm25 (good) | Slightly less precise ranking |
| Fuzzy slug matching | pg_trgm (excellent) | LIKE + Levenshtein (ok) | Fuzzier matching, more false positives |
| Vector search | pgvector HNSW (fast, accurate) | sqlite-vss or vec0 (good enough) | Slower at scale, good for <50K chunks |
| Concurrent access | Connection pooling, many readers/writers | Single writer, many readers | Not an issue for single-user CLI |
| JSONB queries | GIN index, rich operators | json_extract, no index | Slower frontmatter queries |
| Graph traversal | Recursive CTE (native) | Recursive CTE (supported since 3.8.3) | Same |
| Hosted option | Supabase, RDS, etc. | Turso (libSQL), Cloudflare D1 | SQLite has cloud options too |
|
||||
|
||||
For a single user with <10K pages and no concurrent access needs, these tradeoffs are fine.
|
||||
|
||||
## Schema
|
||||
|
||||
SQLite equivalent of the Postgres schema. Key differences called out.
|
||||
|
||||
```sql
-- Enable WAL mode for better read concurrency
PRAGMA journal_mode=WAL;
-- Foreign key enforcement is per-connection in SQLite: run this on every open
PRAGMA foreign_keys=ON;

-- ============================================================
-- pages
-- ============================================================
CREATE TABLE pages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  slug TEXT NOT NULL UNIQUE,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  compiled_truth TEXT NOT NULL DEFAULT '',
  timeline TEXT NOT NULL DEFAULT '',
  frontmatter TEXT NOT NULL DEFAULT '{}', -- JSON string, not JSONB
  content_hash TEXT, -- SHA-256 for import idempotency
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_pages_type ON pages(type);

-- ============================================================
-- Full-text search via FTS5 (replaces tsvector)
-- ============================================================
CREATE VIRTUAL TABLE pages_fts USING fts5(
  title,
  compiled_truth,
  timeline,
  content='pages',
  content_rowid='id',
  tokenize='porter unicode61'
);

-- Triggers to keep FTS5 in sync
CREATE TRIGGER pages_fts_insert AFTER INSERT ON pages BEGIN
  INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
  VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;

CREATE TRIGGER pages_fts_update AFTER UPDATE ON pages BEGIN
  INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
  VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
  INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
  VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;

CREATE TRIGGER pages_fts_delete AFTER DELETE ON pages BEGIN
  INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
  VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
END;

-- ============================================================
-- content_chunks
-- ============================================================
CREATE TABLE content_chunks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  chunk_text TEXT NOT NULL,
  chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
  embedding BLOB, -- Float32Array as raw bytes
  model TEXT NOT NULL DEFAULT 'text-embedding-3-large',
  token_count INTEGER,
  embedded_at TEXT,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_chunks_page ON content_chunks(page_id);

-- Vector search index created separately via sqlite-vss or vec0
-- See "Vector search options" section below

-- ============================================================
-- links
-- ============================================================
CREATE TABLE links (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  link_type TEXT NOT NULL DEFAULT '',
  context TEXT NOT NULL DEFAULT '',
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE(from_page_id, to_page_id)
);

CREATE INDEX idx_links_from ON links(from_page_id);
CREATE INDEX idx_links_to ON links(to_page_id);

-- ============================================================
-- tags
-- ============================================================
CREATE TABLE tags (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  tag TEXT NOT NULL,
  UNIQUE(page_id, tag)
);

CREATE INDEX idx_tags_tag ON tags(tag);
CREATE INDEX idx_tags_page_id ON tags(page_id);

-- ============================================================
-- raw_data
-- ============================================================
CREATE TABLE raw_data (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  source TEXT NOT NULL,
  data TEXT NOT NULL, -- JSON string
  fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE(page_id, source)
);

CREATE INDEX idx_raw_data_page ON raw_data(page_id);

-- ============================================================
-- timeline_entries
-- ============================================================
CREATE TABLE timeline_entries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  date TEXT NOT NULL, -- ISO date string
  source TEXT NOT NULL DEFAULT '',
  summary TEXT NOT NULL,
  detail TEXT NOT NULL DEFAULT '',
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX idx_timeline_date ON timeline_entries(date);

-- ============================================================
-- page_versions
-- ============================================================
CREATE TABLE page_versions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  compiled_truth TEXT NOT NULL,
  frontmatter TEXT NOT NULL DEFAULT '{}',
  snapshot_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_versions_page ON page_versions(page_id);

-- ============================================================
-- ingest_log
-- ============================================================
CREATE TABLE ingest_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  source_type TEXT NOT NULL,
  source_ref TEXT NOT NULL,
  pages_updated TEXT NOT NULL DEFAULT '[]', -- JSON array
  summary TEXT NOT NULL DEFAULT '',
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- ============================================================
-- config
-- ============================================================
CREATE TABLE config (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL
);

INSERT INTO config (key, value) VALUES
  ('version', '1'),
  ('engine', 'sqlite'),
  ('embedding_model', 'text-embedding-3-large'),
  ('embedding_dimensions', '1536'),
  ('chunk_strategy', 'semantic');
```
|
||||
|
||||
### Key differences from Postgres schema
|
||||
|
||||
| Feature | Postgres | SQLite |
|---------|----------|--------|
| Types | `SERIAL`, `TIMESTAMPTZ`, `JSONB`, `vector(1536)` | `INTEGER`, `TEXT`, `TEXT` (JSON), `BLOB` |
| Full-text search | `tsvector` generated column + GIN | FTS5 virtual table + triggers |
| Vector storage | `vector(1536)` column type | `BLOB` (raw Float32Array bytes) |
| Vector index | HNSW via pgvector | Separate via sqlite-vss or vec0 |
| Fuzzy search | `pg_trgm` GIN index | LIKE queries or Levenshtein UDF |
| JSON queries | `JSONB` + GIN index | `json_extract()` function |
| Timestamps | `TIMESTAMPTZ` (native) | `TEXT` with ISO format |
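
Storing a vector as a `BLOB` of raw `Float32Array` bytes means the engine has to convert in both directions. The following is a minimal sketch of that conversion; `encodeEmbedding` and `decodeEmbedding` are hypothetical helper names, and the byte order is whatever the platform uses (little-endian on x86 and ARM in practice).

```typescript
// Encode an embedding as raw float32 bytes suitable for a BLOB column.
function encodeEmbedding(vector: number[]): Uint8Array {
  const floats = new Float32Array(vector);
  return new Uint8Array(floats.buffer);
}

// Decode a BLOB back into a Float32Array. The bytes are copied first so the
// result is 4-byte aligned even if the driver hands back a view at an odd offset.
function decodeEmbedding(blob: Uint8Array): Float32Array {
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}
```

A 1536-dimension embedding encodes to 6,144 bytes, which is where the "4 bytes per float" figure in the storage estimate comes from.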
|
||||
|
||||
## Vector search options
|
||||
|
||||
Two main choices for vector search in SQLite:
|
||||
|
||||
### Option A: sqlite-vss (Alex Garcia)
|
||||
|
||||
```sql
-- Load extension
.load ./vector0
.load ./vss0

-- Create virtual table linked to content_chunks
CREATE VIRTUAL TABLE chunks_vss USING vss0(
  embedding(1536)
);

-- Insert embeddings (linked by rowid to content_chunks)
INSERT INTO chunks_vss(rowid, embedding)
SELECT id, embedding FROM content_chunks WHERE embedding IS NOT NULL;

-- Search
SELECT rowid, distance
FROM chunks_vss
WHERE vss_search(embedding, :query_embedding)
LIMIT 20;
```
|
||||
|
||||
Pros: mature, well-documented, used by many projects.
|
||||
Cons: requires loading native extensions (platform-specific binaries).
|
||||
|
||||
### Option B: sqlite-vec (`vec0` virtual table; newer, from the same author)
|
||||
|
||||
```sql
-- Create virtual table
CREATE VIRTUAL TABLE chunks_vec USING vec0(
  chunk_id INTEGER PRIMARY KEY,
  embedding float[1536]
);

-- Search
SELECT chunk_id, distance
FROM chunks_vec
WHERE embedding MATCH :query_embedding
ORDER BY distance
LIMIT 20;
```
|
||||
|
||||
Pros: simpler API, better integration with SQLite ecosystem.
|
||||
Cons: newer, less battle-tested.
|
||||
|
||||
### Option C: No vector search (keyword only)
|
||||
|
||||
For users who don't want to deal with vector extensions or OpenAI API keys, the brain still works with keyword search only. FTS5 + bm25 is genuinely good for structured wiki content where you know the terms. `searchVector` returns `[]`, hybrid search degrades gracefully to keyword-only.
|
||||
|
||||
This is a valid configuration. Not everyone needs embeddings.
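
The graceful degradation follows directly from how Reciprocal Rank Fusion works: a list that contributes no results adds nothing to any score. A minimal sketch follows, where `rrfFuse` is a hypothetical name and `k = 60` is the commonly used default rather than a value confirmed for gbrain.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// with rank starting at 1 for the best hit in each list.
function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Keyword-only mode: searchVector returned [], so the fused order equals the keyword order.
const keywordOnly = rrfFuse([["a", "b", "c"], []]);

// Hybrid mode: "c" is promoted because both lists contain it.
const hybrid = rrfFuse([["a", "b", "c"], ["c", "a"]]);
```

With an empty vector list the function needs no special case, so a keyword-only SQLite engine can reuse the same hybrid search code path.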
|
||||
|
||||
## Init flow for SQLite
|
||||
|
||||
```bash
gbrain init --sqlite
# or: gbrain init --sqlite --path ~/brain.db

# 1. Create database file at specified path (default: ~/.gbrain/brain.db)
# 2. Run schema (all CREATE TABLE + FTS5 + triggers)
# 3. Write config to ~/.gbrain/config.json:
#    { "engine": "sqlite", "database_path": "~/.gbrain/brain.db" }
# 4. Import kindling corpus (same as Postgres path)
# 5. "Brain ready. 10 pages imported."
```
|
||||
|
||||
No Supabase account needed. No API keys needed (keyword-only mode). No server. Just a file.
|
||||
|
||||
For vector search, the user additionally needs:
|
||||
- OpenAI API key in `~/.gbrain/config.json` or `OPENAI_API_KEY` env var
|
||||
- sqlite-vss or vec0 extension binary for their platform
|
||||
|
||||
## Fuzzy slug resolution without pg_trgm
|
||||
|
||||
Postgres uses `pg_trgm` GIN index for fast fuzzy matching. SQLite doesn't have this. Options:
|
||||
|
||||
1. **LIKE with wildcards.** `WHERE slug LIKE '%dont%scale%'`. Simple, works for partial matches, but no ranking.
|
||||
2. **Levenshtein distance via UDF.** Load a user-defined function (or implement in TS) that computes edit distance. Sort by distance. Slower but more accurate.
|
||||
3. **Trigram simulation in TS.** Compute trigrams in TypeScript, store in a separate table, query by trigram overlap. Fast but requires maintaining the trigram index.
|
||||
|
||||
Recommendation: start with LIKE + fallback to Levenshtein UDF. Good enough for single-user, <10K pages.
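
A sketch of the recommended combination, implemented in TypeScript rather than as a loaded UDF. All three function names are hypothetical; the LIKE pattern is meant to be passed as a bound parameter, and the candidate list stands in for the rows the LIKE query returns.

```typescript
// Build a LIKE pattern such as "%dont%scale%" from a free-text query.
// Only letters and digits survive, so no LIKE wildcards can be injected.
function likePattern(query: string): string {
  const terms = query.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return terms.length > 0 ? `%${terms.join("%")}%` : "%";
}

// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Rank candidate slugs by edit distance to the input, closest first.
function rankSlugs(input: string, candidates: string[]): string[] {
  return candidates
    .slice()
    .sort((x, y) => levenshtein(input, x) - levenshtein(input, y));
}
```

LIKE narrows the candidate set cheaply, and the edit distance supplies the ranking that LIKE alone cannot. At under 10K slugs, computing distances in application code is unlikely to be a bottleneck.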
|
||||
|
||||
## Implementation roadmap
|
||||
|
||||
If you're building this, here's the order:
|
||||
|
||||
1. **`src/core/sqlite-engine.ts`** implementing `BrainEngine`
|
||||
2. **Schema migration** (the SQL above)
|
||||
3. **CRUD operations** (getPage, putPage, listPages, deletePage). Straightforward SQL.
|
||||
4. **FTS5 keyword search** (searchKeyword). Map `websearch_to_tsquery` semantics to FTS5 query syntax.
|
||||
5. **Tags, links, timeline, raw_data, versions, config, ingest_log.** All straightforward.
|
||||
6. **Graph traversal.** SQLite supports recursive CTEs since 3.8.3. Port the Postgres CTE with max depth.
|
||||
7. **Vector search** (optional). Pick sqlite-vss or vec0, implement searchVector.
|
||||
8. **Tests.** Port the Postgres test suite. Most tests should be engine-agnostic.
|
||||
|
||||
Steps 1-6 are purely mechanical. Step 7 is the only one that requires a native extension.
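
Step 4 hides the one piece of real translation work: `websearch_to_tsquery` accepts bare words, quoted phrases, `or`, and `-term`, while FTS5 has its own `AND` / `OR` / `NOT` syntax with no unary `NOT`. The sketch below shows one possible mapping. `toFts5Query` is a hypothetical name, and applying exclusions to the whole query is a simplification that does not match Postgres operator precedence exactly.

```typescript
// Translate a web-search style query into an FTS5 MATCH expression.
// Supported: bare words (AND-ed), "quoted phrases", the keyword `or`, and -excluded terms.
// Simplification: exclusions apply to the whole query, not just the adjacent term.
function toFts5Query(input: string): string {
  const tokens = input.match(/-?"[^"]*"|\S+/g) ?? [];
  const positive: string[] = [];
  const negative: string[] = [];
  let pendingOr = false;

  for (const raw of tokens) {
    if (raw.toLowerCase() === "or") {
      pendingOr = positive.length > 0;
      continue;
    }
    const negated = raw.startsWith("-");
    const body = (negated ? raw.slice(1) : raw).replace(/"/g, "").trim();
    if (!body) continue;
    const term = `"${body}"`; // quoting keeps FTS5 from treating the text as syntax
    if (negated) {
      negative.push(term);
    } else {
      if (positive.length > 0) positive.push(pendingOr ? "OR" : "AND");
      positive.push(term);
    }
    pendingOr = false;
  }

  // FTS5 NOT is a binary operator, so a purely negative query cannot be expressed.
  if (positive.length === 0) return "";
  const base = positive.join(" ");
  return negative.length > 0 ? `(${base}) NOT (${negative.join(" OR ")})` : base;
}
```

The returned string would be bound as the right-hand side of `pages_fts MATCH ?`. An empty return value signals that the caller should skip the keyword search instead of sending FTS5 a query it would reject.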
|
||||
|
||||
## Dependencies for SQLite engine
|
||||
|
||||
```json
{
  "better-sqlite3": "^11.0.0"
}
```
|
||||
|
||||
Or use Bun's built-in `bun:sqlite` driver (zero dependency).
|
||||
|
||||
For vector search, add one of:
|
||||
- `sqlite-vss` (native extension, platform-specific)
- `sqlite-vec`, which provides the `vec0` virtual table (native extension, platform-specific)
|
||||
|
||||
## Testing strategy
|
||||
|
||||
Most test cases should be engine-agnostic. The test runner should parameterize by engine:
|
||||
|
||||
```typescript
import { describe, test, expect } from 'bun:test';
// PostgresEngine and SQLiteEngine are assumed to be imported from the engine modules.

const engines = [
  { name: 'postgres', factory: () => new PostgresEngine() },
  { name: 'sqlite', factory: () => new SQLiteEngine() },
];

for (const { name, factory } of engines) {
  describe(`BrainEngine (${name})`, () => {
    const engine = factory();

    test('putPage + getPage round-trip', async () => {
      // Only title and type are shown; the remaining page fields are elided.
      await engine.putPage('test/slug', { title: 'Test', type: 'person' /* , ... */ });
      const page = await engine.getPage('test/slug');
      expect(page.title).toBe('Test');
    });

    // ... all CRUD, search, link, tag, timeline tests
  });
}
```
|
||||
|
||||
Search tests may need engine-specific assertions (ranking differences between tsvector and FTS5 are expected). But the interface contract (returns SearchResult[], sorted by relevance) should hold across engines.
|
||||
|
||||
## File structure
|
||||
|
||||
```
brain.db                 # ~750MB for 7K pages with embeddings
                         # ~150MB without embeddings (keyword-only)
~/.gbrain/config.json    # { "engine": "sqlite", "database_path": "..." }
```
|
||||
|
||||
That's it. One file for the brain. One file for config.
|
||||
|
||||
## Migration between engines
|
||||
|
||||
Future work: `gbrain migrate --from postgres --to sqlite` (and vice versa). The engine interface makes this straightforward: export all data via one engine's methods, then import it via the other's. The data model is the same; only the storage format changes.
|
||||
|
||||
This is not built yet. For now, `gbrain export` to markdown and `gbrain import` into the other engine achieves the same result (with re-chunking and re-embedding).
|
||||
|
||||
## Contributing
|
||||
|
||||
If you want to build this:
|
||||
|
||||
1. Fork the repo
|
||||
2. Create `src/core/sqlite-engine.ts`
|
||||
3. Use the schema from this document
|
||||
4. Run the existing test suite against your engine
|
||||
5. PR it
|
||||
|
||||
The interface is well-defined. The schema is documented. The test suite exists. This should be a few days of focused work with Claude Code, or a weekend project for a human.
|
||||
|
||||
We'd love to see it.
|
||||
32
package.json
Normal file
@@ -0,0 +1,32 @@
|
||||
{
  "name": "gbrain",
  "version": "0.1.0",
  "description": "Postgres-native personal knowledge brain with hybrid RAG search",
  "type": "module",
  "main": "src/core/index.ts",
  "bin": {
    "gbrain": "src/cli.ts"
  },
  "exports": {
    ".": "./src/core/index.ts",
    "./engine": "./src/core/engine.ts",
    "./types": "./src/core/types.ts"
  },
  "scripts": {
    "dev": "bun run src/cli.ts",
    "build": "bun build --compile --outfile bin/gbrain src/cli.ts",
    "test": "bun test"
  },
  "dependencies": {
    "@anthropic-ai/sdk": "^0.30.0",
    "@modelcontextprotocol/sdk": "^1.0.0",
    "gray-matter": "^4.0.3",
    "openai": "^4.0.0",
    "pgvector": "^0.2.0",
    "postgres": "^3.4.0"
  },
  "devDependencies": {
    "@types/bun": "latest"
  },
  "license": "MIT"
}
|
||||
58
skills/briefing/SKILL.md
Normal file
@@ -0,0 +1,58 @@
|
||||
# Briefing Skill
|
||||
|
||||
Compile a daily briefing from brain context.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Today's meetings.** For each meeting on the calendar:
   - Look up all participants via `gbrain query <name>`
   - Read their pages for compiled_truth context
   - Summarize: who they are, recent timeline, relationship to you
2. **Active deals.** `gbrain list --type deal` filtered to active status:
   - Deadlines approaching in the next 7 days
   - Recent timeline entries (last 7 days)
3. **Time-sensitive threads.** Open items from timeline entries:
   - Items with deadlines in the next 48 hours
   - Follow-ups that are overdue
4. **Recent changes.** Pages updated in the last 24 hours:
   - What changed and why (read timeline entries)
5. **People in play.** `gbrain list --type person` sorted by recency:
   - Updated in last 7 days
   - Have high activity (many recent timeline entries)
6. **Stale alerts.** From `gbrain health`:
   - Pages flagged as stale that are relevant to today's meetings
|
||||
|
||||
## Output Format
|
||||
|
||||
```
DAILY BRIEFING — [date]
========================

MEETINGS TODAY
- [time] [meeting name]
  Participants: [name] (slug: people/name, [key context])

ACTIVE DEALS
- [deal name] — [status], deadline: [date]
  Recent: [latest timeline entry]

ACTION ITEMS
- [item] — due [date], related to [slug]

RECENT CHANGES (24h)
- [slug] — [what changed]

PEOPLE IN PLAY
- [name] — [why they're active]
```
|
||||
|
||||
## Commands Used
|
||||
|
||||
```
gbrain query <name>
gbrain get <slug>
gbrain list --type deal
gbrain list --type person
gbrain health
gbrain timeline <slug>
```
|
||||
45
skills/enrich/SKILL.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# Enrich Skill
|
||||
|
||||
Enrich person and company pages from external APIs.
|
||||
|
||||
## Sources
|
||||
|
||||
| Source | Data | API |
|--------|------|-----|
| Crustdata | LinkedIn profiles, company data | REST API |
| Happenstance | Career history, connections | REST API |
| Exa | Web mentions, articles | REST API |
|
||||
|
||||
Note: enrichment requires separate API credentials for each service. No client integrations ship in v1. This skill guides Claude Code to make API calls directly.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Select target pages.** `gbrain list --type person` or `gbrain list --type company`
2. **For each page:**
   - Read current compiled_truth to understand what we already know
   - Call external APIs for fresh data
   - Store raw API responses: the raw JSON goes into `gbrain call put_raw_data`
   - Distill highlights into compiled_truth updates
3. **Validation rules:**
   - Connection count < 20 on LinkedIn = likely wrong person, skip
   - Name mismatch between brain and API = skip, flag for manual review
   - Don't overwrite human-written assessments with API boilerplate
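
The first two validation rules are mechanical enough to sketch as a function. The `ApiProfile` shape, the `Verdict` values, and the name normalization are assumptions made for illustration; none of the enrichment APIs is known to return exactly these fields.

```typescript
// Hypothetical shape of the fields the validation rules need from an API response.
interface ApiProfile {
  name: string;
  connectionCount: number;
}

type Verdict = "enrich" | "skip" | "manual_review";

// Compare names case-insensitively, ignoring punctuation and extra whitespace.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

function validateProfile(brainName: string, profile: ApiProfile): Verdict {
  // Fewer than 20 connections: likely the wrong person, so skip.
  if (profile.connectionCount < 20) return "skip";
  // Name mismatch: do not enrich, flag for a human to look at.
  if (normalizeName(brainName) !== normalizeName(profile.name)) return "manual_review";
  return "enrich";
}
```

The third rule, not overwriting human-written assessments, is a judgment call about content and is left to the agent rather than encoded here.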
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Raw data goes to raw_data table (preserves provenance)
|
||||
- Only distilled, useful info goes to compiled_truth
|
||||
- Always add a timeline entry: "Enriched from [source] on [date]"
|
||||
- Don't enrich the same page more than once per week unless requested
|
||||
- Rate limit: respect API rate limits, use exponential backoff
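
The last two rules reduce to small pure functions. This is a sketch with assumed defaults: a 500 ms base delay, a 30 s cap, and a 7-day window are illustrative values, and `backoffDelayMs` and `canEnrich` are hypothetical names.

```typescript
// Exponential backoff: the delay doubles with each retry attempt, up to a cap.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// A page may be enriched if it never was, if the last run is at least a week old,
// or if the user explicitly asked for it.
function canEnrich(lastEnrichedAt: Date | null, now: Date, requested = false): boolean {
  if (requested || lastEnrichedAt === null) return true;
  return now.getTime() - lastEnrichedAt.getTime() >= WEEK_MS;
}
```

In practice the timestamp of the last "Enriched from [source]" timeline entry would serve as `lastEnrichedAt`, which is one reason the timeline entry is mandatory.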
|
||||
|
||||
## Commands Used
|
||||
|
||||
```
gbrain get <slug>
gbrain put <slug>
gbrain timeline-add <slug> <date> "Enriched from <source>"
gbrain list --type person
gbrain list --type company
```
|
||||
34
skills/ingest/SKILL.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# Ingest Skill
|
||||
|
||||
Ingest meetings, articles, documents, and conversations into the brain.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Parse the source.** Extract people, companies, dates, and events from the input.
|
||||
2. **For each entity mentioned:**
   - `gbrain get <slug>` to check if page exists
   - If exists: update compiled_truth (rewrite State section with new info, don't append)
   - If new: `gbrain put <slug>` to create the page
|
||||
3. **Append to timeline.** `gbrain timeline-add <slug> <date> <summary>` for each event.
|
||||
4. **Create cross-reference links.** `gbrain link <from> <to> --type <relationship>` for every entity pair mentioned together.
|
||||
5. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
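
Steps 3 to 5 imply a fan-out: one timeline entry per entity and one link per entity pair. The sketch below generates the corresponding commands using the CLI syntax listed under "Commands Used". `planIngestCommands` is a hypothetical helper, the slugs and date are invented examples, using a single link type for every pair is an assumption, and `JSON.stringify` is used only as a rough stand-in for proper shell quoting.

```typescript
interface IngestEvent {
  date: string;       // ISO date
  summary: string;
  entities: string[]; // slugs of every entity mentioned in the event
}

// One timeline-add per entity, then one link per unordered entity pair.
function planIngestCommands(event: IngestEvent, linkType: string): string[] {
  const commands: string[] = [];
  for (const slug of event.entities) {
    commands.push(`gbrain timeline-add ${slug} ${event.date} ${JSON.stringify(event.summary)}`);
  }
  for (let i = 0; i < event.entities.length; i++) {
    for (let j = i + 1; j < event.entities.length; j++) {
      commands.push(`gbrain link ${event.entities[i]} ${event.entities[j]} --type ${linkType}`);
    }
  }
  return commands;
}

const plan = planIngestCommands(
  {
    date: "2025-01-15",
    summary: "Alice met Bob at Acme Corp",
    entities: ["people/alice", "people/bob", "companies/acme-corp"],
  },
  "met_at",
);
```

Three entities produce three timeline entries and three links. The pair count grows quadratically, so a meeting with ten participants yields 45 links.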
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Executive summary in compiled_truth must be updated, not just timeline appended
|
||||
- State section is REWRITTEN, not appended to. Current best understanding only.
|
||||
- Timeline entries are reverse-chronological (newest first)
|
||||
- Every person/company mentioned gets a page if one doesn't exist
|
||||
- Link types: knows, works_at, invested_in, founded, met_at, discussed
|
||||
- Source attribution: every timeline entry includes the source (meeting, article, email, etc.)
|
||||
|
||||
## Commands Used
|
||||
|
||||
```
gbrain get <slug>
gbrain put <slug> < content.md
gbrain timeline-add <slug> <date> <summary>
gbrain link <from> <to> --type <type>
gbrain tags <slug>
gbrain tag <slug> <tag>
```
|
||||
59
skills/maintain/SKILL.md
Normal file
@@ -0,0 +1,59 @@
|
||||
# Maintain Skill
|
||||
|
||||
Periodic brain health checks and cleanup.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Run health check.** `gbrain health` to get the dashboard.
|
||||
2. **Check each dimension:**
|
||||
|
||||
### Stale pages
|
||||
Pages where compiled_truth is older than the latest timeline entry. The assessment hasn't been updated to reflect recent evidence.
|
||||
- `gbrain query "stale pages"` or check health output
|
||||
- For each stale page: read timeline, determine if compiled_truth needs rewriting
|
||||
|
||||
### Orphan pages
|
||||
Pages with zero inbound links. Nobody references them.
|
||||
- Review orphans: are they genuinely isolated or just missing links?
|
||||
- Add links from related pages or flag for deletion
|
||||
|
||||
### Dead links
|
||||
Links pointing to pages that don't exist.
|
||||
- Remove dead links with `gbrain unlink`
|
||||
|
||||
### Missing cross-references
|
||||
Pages that mention entity names but don't have formal links.
|
||||
- Read compiled_truth, extract entity mentions, create links
|
||||
|
||||
### Tag consistency
|
||||
Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").
|
||||
- Standardize to the most common variant
|
||||
|
||||
### Embedding freshness
|
||||
Chunks without embeddings, or chunks embedded with an old model.
|
||||
- `gbrain embed --stale` to backfill
|
||||
|
||||
### Open threads
|
||||
Timeline items older than 30 days with unresolved action items.
|
||||
- Flag for review
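
Three of the dimensions above (stale pages, orphan pages, dead links) have precise definitions and can be sketched as pure functions over page and link metadata. The `PageMeta` and `Link` shapes and the function names are hypothetical; how `gbrain health` computes these internally is not specified here.

```typescript
interface PageMeta {
  slug: string;
  compiledTruthUpdatedAt: string; // ISO timestamp
  latestTimelineDate?: string;    // ISO date of the newest timeline entry, if any
}

interface Link {
  from: string;
  to: string;
}

// Stale: compiled_truth is older than the latest timeline entry.
function findStale(pages: PageMeta[]): string[] {
  return pages
    .filter((p) =>
      p.latestTimelineDate !== undefined &&
      Date.parse(p.compiledTruthUpdatedAt) < Date.parse(p.latestTimelineDate))
    .map((p) => p.slug);
}

// Orphan: no link points at the page.
function findOrphans(pages: PageMeta[], links: Link[]): string[] {
  const linked = new Set(links.map((l) => l.to));
  return pages.filter((p) => !linked.has(p.slug)).map((p) => p.slug);
}

// Dead link: the target page does not exist.
function findDeadLinks(pages: PageMeta[], links: Link[]): Link[] {
  const existing = new Set(pages.map((p) => p.slug));
  return links.filter((l) => !existing.has(l.to));
}
```

Running the same functions before and after a maintenance pass gives the before/after comparison the quality rules ask for.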
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Never delete pages without confirmation
|
||||
- Log all changes via timeline entries
|
||||
- Run `gbrain health` before and after to show improvement
|
||||
|
||||
## Commands Used
|
||||
|
||||
```
gbrain health
gbrain list [--type T]
gbrain get <slug>
gbrain backlinks <slug>
gbrain link <from> <to> --type <type>
gbrain unlink <from> <to>
gbrain tag <slug> <tag>
gbrain untag <slug> <tag>
gbrain embed --stale
gbrain timeline <slug>
```
|
||||
45
skills/manifest.json
Normal file
@@ -0,0 +1,45 @@
|
||||
{
  "name": "gbrain",
  "version": "0.1.0",
  "description": "Personal knowledge brain with hybrid RAG search",
  "skills": [
    {
      "name": "ingest",
      "path": "ingest/SKILL.md",
      "description": "Ingest meetings, docs, articles into the brain"
    },
    {
      "name": "query",
      "path": "query/SKILL.md",
      "description": "Answer questions using 3-layer search and synthesis"
    },
    {
      "name": "maintain",
      "path": "maintain/SKILL.md",
      "description": "Brain health checks: contradictions, stale info, orphans"
    },
    {
      "name": "enrich",
      "path": "enrich/SKILL.md",
      "description": "Enrich pages from external APIs (Crustdata, Happenstance, Exa)"
    },
    {
      "name": "briefing",
      "path": "briefing/SKILL.md",
      "description": "Compile daily briefing with meeting context and active deals"
    },
    {
      "name": "migrate",
      "path": "migrate/SKILL.md",
      "description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
    }
  ],
  "dependencies": {
    "runtime": "bun",
    "package": "gbrain"
  },
  "setup": {
    "command": "gbrain init --supabase",
    "description": "Initialize brain with Supabase (guided wizard)"
  }
}
|
||||
87
skills/migrate/SKILL.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# Migrate Skill
|
||||
|
||||
Universal migration from any wiki, note tool, or brain system into GBrain.
|
||||
|
||||
## Supported Sources
|
||||
|
||||
| Source | Format | Strategy |
|--------|--------|----------|
| Obsidian | Markdown + `[[wikilinks]]` | Direct import, convert wikilinks to gbrain links |
| Notion | Exported markdown or CSV | Parse Notion's export structure |
| Logseq | Markdown with `((block refs))` | Convert block refs to page links |
| Plain markdown | Any .md directory | `gbrain import <dir>` directly |
| CSV | Tabular data | Map columns to frontmatter fields |
| JSON | Structured data | Map keys to page fields |
| Roam | JSON export | Convert block structure to pages |
|
||||
|
||||
## General Workflow
|
||||
|
||||
1. **Assess the source.** What format? How many files? What structure?
2. **Plan the mapping.** How do source fields map to gbrain fields (type, title, tags, compiled_truth, timeline)?
3. **Test with a sample.** Import 5-10 files, verify with `gbrain get` and `gbrain export`.
4. **Bulk import.** Run the full migration.
5. **Verify.** `gbrain health` + `gbrain stats` + spot-check pages.
6. **Build links.** Extract cross-references from content and create typed links.
|
||||
|
||||
## Obsidian Migration
|
||||
|
||||
```bash
|
||||
# 1. Direct import (Obsidian vaults are markdown directories)
|
||||
gbrain import /path/to/vault/
|
||||
|
||||
# 2. Convert [[wikilinks]] to gbrain links
|
||||
# The skill reads each page's compiled_truth, finds [[Name]] patterns,
|
||||
# resolves them to slugs, and creates links:
|
||||
gbrain get <slug> # read content
|
||||
# For each [[Name]] found:
|
||||
gbrain link <current-slug> <resolved-slug> --type references
|
||||
```
|
||||
|
||||
Obsidian-specific:
|
||||
- `[[Name]]` becomes `gbrain link`
|
||||
- `[[Name|alias]]` uses the alias for context
|
||||
- Tags (`#tag`) become `gbrain tag`
|
||||
- Frontmatter properties map to gbrain frontmatter
|
||||
- Attachments (images, PDFs) are noted but not imported (future work)
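
The wikilink conversion above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not gbrain's implementation: `extractWikilinks`, `toSlug`, the slug convention (lowercase, hyphen-separated), and the `meetings/kickoff` source slug are all hypothetical. Only the `gbrain link <from> <to> --type <type>` command shape comes from this skill.

```typescript
// Hypothetical helpers for illustration; none of these names exist in gbrain.
interface WikiLink {
  target: string;  // page name inside [[...]], before any "#" or "|"
  alias?: string;  // display text after "|", if present
}

// Finds [[Target]], [[Target|alias]] and [[Target#heading]] patterns in page text.
function extractWikilinks(text: string): WikiLink[] {
  const links: WikiLink[] = [];
  const pattern = /\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const target = match[1].trim();
    const alias = match[2] ? match[2].trim() : undefined;
    links.push(alias ? { target, alias } : { target });
  }
  return links;
}

// Assumed slug convention: lowercase, runs of non-alphanumerics become "-".
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

const sample = "Met [[Jane Doe|Jane]] at [[Acme Corp]] about [[Series A#Terms]].";
const commands = extractWikilinks(sample).map(
  (link) => `gbrain link meetings/kickoff ${toSlug(link.target)} --type references`,
);
console.log(commands.join("\n"));
```

Note that this pattern also matches Obsidian embeds such as `![[image.png]]`; a real migration would likely filter those out, since attachments are not imported.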
|
||||
|
||||
## Notion Migration
|
||||
|
||||
1. Export from Notion: Settings > Export > Markdown & CSV
|
||||
2. Notion exports nested directories with UUIDs in filenames
|
||||
3. Strip UUIDs from filenames for clean slugs
|
||||
4. Map Notion's database properties to frontmatter
|
||||
5. `gbrain import` the cleaned directory
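
Step 3 might look like the following sketch. It assumes the Notion ID appears as a trailing 32-character hexadecimal string, separated from the title by whitespace, on both file and directory names; check this against your own export before relying on it. `stripNotionId` and `cleanNotionPath` are hypothetical names, not part of gbrain.

```typescript
// Hypothetical helper: removes a trailing 32-hex-character ID from one path segment,
// keeping the file extension (if any). Segments without such an ID are returned unchanged.
function stripNotionId(segment: string): string {
  return segment.replace(/\s+[0-9a-f]{32}(?=\.[^.]+$|$)/i, "");
}

// Applies the cleanup to every segment of a relative path inside the export.
function cleanNotionPath(relativePath: string): string {
  return relativePath.split("/").map(stripNotionId).join("/");
}

console.log(
  cleanNotionPath(
    "Projects 0123456789abcdef0123456789abcdef/Plan fedcba9876543210fedcba9876543210.md",
  ),
);
```

Renaming can produce collisions when two pages share a title, so a real script should detect duplicate target paths before moving files.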
|
||||
|
||||
## CSV Migration
|
||||
|
||||
For tabular data (e.g., CRM exports, contact lists):
|
||||
|
||||
```bash
|
||||
# For each row in the CSV:
|
||||
# 1. Create a page with column values as frontmatter
|
||||
# 2. Use a designated column as the slug (e.g., name)
|
||||
# 3. Use another column as compiled_truth (e.g., notes)
|
||||
gbrain put <slug> < generated.md
|
||||
```
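
The per-row step could be sketched as below. It assumes rows have already been parsed into plain objects by a CSV parser of your choice and that pages use YAML frontmatter between `---` markers; `rowToPage`, the column names, and the slug convention are hypothetical. Values are written with `JSON.stringify` because JSON strings are valid YAML double-quoted scalars.

```typescript
// Hypothetical row shape: column name -> cell value.
type CsvRow = Record<string, string>;

// Builds a slug and a markdown document from one row.
// slugColumn names the column used for the slug; bodyColumn becomes compiled_truth.
function rowToPage(
  row: CsvRow,
  slugColumn: string,
  bodyColumn: string,
): { slug: string; markdown: string } {
  // Assumed slug convention: lowercase, runs of non-alphanumerics become "-".
  const slug = (row[slugColumn] ?? "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  // Every column except the body column becomes a frontmatter field.
  const frontmatter = Object.keys(row)
    .filter((key) => key !== bodyColumn)
    .map((key) => `${key}: ${JSON.stringify(row[key])}`)
    .join("\n");
  const markdown = `---\n${frontmatter}\n---\n\n${row[bodyColumn] ?? ""}\n`;
  return { slug, markdown };
}

const page = rowToPage(
  { name: "Jane Doe", company: "Acme", notes: "Met at the conference." },
  "name",
  "notes",
);
console.log(page.slug);
console.log(page.markdown);
```

The resulting markdown can be written to a file and piped into `gbrain put <slug>` as shown above. Column names containing characters that are special in YAML keys would need quoting, which this sketch does not handle.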
|
||||
|
||||
## Verification
|
||||
|
||||
After any migration:
|
||||
1. `gbrain stats`: check that the page count matches the source
2. `gbrain health`: check for orphans and missing embeddings
3. `gbrain export --dir /tmp/verify/`: round-trip test
4. Spot-check 5-10 pages with `gbrain get`
5. Test search: `gbrain query "<someone you know is in the data>"`
|
||||
|
||||
## Commands Used
|
||||
|
||||
```
|
||||
gbrain import <dir> [--no-embed]
|
||||
gbrain get <slug>
|
||||
gbrain put <slug>
|
||||
gbrain link <from> <to> --type <type>
|
||||
gbrain tag <slug> <tag>
|
||||
gbrain stats
|
||||
gbrain health
|
||||
gbrain export [--dir ./verify/]
|
||||
```
|
||||
38
skills/query/SKILL.md
Normal file
@@ -0,0 +1,38 @@
|
||||
# Query Skill
|
||||
|
||||
Answer questions using the brain's knowledge with 3-layer search and synthesis.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Decompose the question** into search strategies:
|
||||
- Keyword search for specific names, dates, terms
|
||||
- Semantic query for conceptual questions
|
||||
- Structured queries (list by type, backlinks) for relational questions
|
||||
2. **Execute searches:**
|
||||
- `gbrain search <keywords>` for FTS matches
|
||||
- `gbrain query <question>` for hybrid semantic+keyword with expansion
|
||||
- `gbrain list --type <type>` or `gbrain backlinks <slug>` for structural queries
|
||||
3. **Read top results.** `gbrain get <slug>` for the top 3-5 pages to get full context.
|
||||
4. **Synthesize answer** with citations. Every claim traces back to a specific page slug.
|
||||
5. **Flag gaps.** If the brain doesn't have info, say "the brain doesn't have information on X" rather than hallucinating.
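
For intuition about what the hybrid `gbrain query` step does when it merges keyword and vector results, here is a minimal sketch of Reciprocal Rank Fusion. It is not gbrain's code: `rrfFuse` is a hypothetical name, the inputs are plain ranked lists of slugs, and `k = 60` is the constant commonly used with RRF rather than a value confirmed by this project.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item,
// with rank starting at 1. Items ranked well in several lists rise to the top.
function rrfFuse(
  rankedLists: string[][],
  k = 60,
): Array<{ slug: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((slug, index) => {
      scores.set(slug, (scores.get(slug) ?? 0) + 1 / (k + index + 1));
    });
  }
  const fused: Array<{ slug: string; score: number }> = [];
  scores.forEach((score, slug) => fused.push({ slug, score }));
  // Highest fused score first.
  return fused.sort((a, b) => b.score - a.score);
}

const keywordHits = ["a", "b", "c"]; // hypothetical keyword-search ranking
const vectorHits = ["b", "d", "a"];  // hypothetical vector-search ranking
console.log(rrfFuse([keywordHits, vectorHits]).map((r) => r.slug));
```

Because RRF uses only ranks, it needs no score normalization between the keyword and vector searches, which is a common reason for choosing it.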
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Never hallucinate. Only answer from brain content.
|
||||
- Cite sources: "According to concepts/do-things-that-dont-scale..."
|
||||
- Flag stale results: if a search result shows [STALE], note that the info may be outdated
|
||||
- For "who" questions, use backlinks and typed links to find connections
|
||||
- For "what happened" questions, use timeline entries
|
||||
- For "what do we know" questions, read compiled_truth directly
|
||||
|
||||
## Commands Used
|
||||
|
||||
```
|
||||
gbrain search <query>
|
||||
gbrain query <question>
|
||||
gbrain get <slug>
|
||||
gbrain list [--type T] [--tag T]
|
||||
gbrain backlinks <slug>
|
||||
gbrain graph <slug> [--depth N]
|
||||
gbrain timeline <slug>
|
||||
```
|
||||
252
src/cli.ts
Normal file
@@ -0,0 +1,252 @@
|
||||
#!/usr/bin/env bun
|
||||
|
||||
import { PostgresEngine } from './core/postgres-engine.ts';
|
||||
import { loadConfig, toEngineConfig } from './core/config.ts';
|
||||
import type { BrainEngine } from './core/engine.ts';
|
||||
|
||||
const VERSION = '0.1.0';
|
||||
|
||||
async function main() {
|
||||
const args = process.argv.slice(2);
|
||||
const command = args[0];
|
||||
|
||||
if (!command || command === '--help' || command === '-h') {
|
||||
printHelp();
|
||||
return;
|
||||
}
|
||||
|
||||
if (command === '--version' || command === 'version') {
|
||||
console.log(`gbrain ${VERSION}`);
|
||||
return;
|
||||
}
|
||||
|
||||
if (command === '--tools-json') {
|
||||
const { printToolsJson } = await import('./commands/tools-json.ts');
|
||||
printToolsJson();
|
||||
return;
|
||||
}
|
||||
|
||||
// Commands that don't need a database connection
|
||||
if (command === 'init') {
|
||||
const { runInit } = await import('./commands/init.ts');
|
||||
await runInit(args.slice(1));
|
||||
return;
|
||||
}
|
||||
|
||||
if (command === 'upgrade') {
|
||||
const { runUpgrade } = await import('./commands/upgrade.ts');
|
||||
await runUpgrade(args.slice(1));
|
||||
return;
|
||||
}
|
||||
|
||||
// All other commands need a database connection
|
||||
const engine = await connectEngine();
|
||||
|
||||
try {
|
||||
switch (command) {
|
||||
case 'get': {
|
||||
const { runGet } = await import('./commands/get.ts');
|
||||
await runGet(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'put': {
|
||||
const { runPut } = await import('./commands/put.ts');
|
||||
await runPut(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'list': {
|
||||
const { runList } = await import('./commands/list.ts');
|
||||
await runList(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'search': {
|
||||
const { runSearch } = await import('./commands/search.ts');
|
||||
await runSearch(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'query': {
|
||||
const { runQuery } = await import('./commands/query.ts');
|
||||
await runQuery(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'import': {
|
||||
const { runImport } = await import('./commands/import.ts');
|
||||
await runImport(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'export': {
|
||||
const { runExport } = await import('./commands/export.ts');
|
||||
await runExport(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'embed': {
|
||||
const { runEmbed } = await import('./commands/embed.ts');
|
||||
await runEmbed(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'stats': {
|
||||
const { runStats } = await import('./commands/stats.ts');
|
||||
await runStats(engine);
|
||||
break;
|
||||
}
|
||||
case 'health': {
|
||||
const { runHealth } = await import('./commands/health.ts');
|
||||
await runHealth(engine);
|
||||
break;
|
||||
}
|
||||
case 'tag': {
|
||||
const { runTag } = await import('./commands/tags.ts');
|
||||
await runTag(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'untag': {
|
||||
const { runUntag } = await import('./commands/tags.ts');
|
||||
await runUntag(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'tags': {
|
||||
const { runTags } = await import('./commands/tags.ts');
|
||||
await runTags(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'link': {
|
||||
const { runLink } = await import('./commands/link.ts');
|
||||
await runLink(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'unlink': {
|
||||
const { runUnlink } = await import('./commands/link.ts');
|
||||
await runUnlink(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'backlinks': {
|
||||
const { runBacklinks } = await import('./commands/link.ts');
|
||||
await runBacklinks(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'graph': {
|
||||
const { runGraph } = await import('./commands/link.ts');
|
||||
await runGraph(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'timeline': {
|
||||
const { runTimeline } = await import('./commands/timeline.ts');
|
||||
await runTimeline(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'timeline-add': {
|
||||
const { runTimelineAdd } = await import('./commands/timeline.ts');
|
||||
await runTimelineAdd(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'delete': {
|
||||
const { runDelete } = await import('./commands/delete.ts');
|
||||
await runDelete(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'history': {
|
||||
const { runHistory } = await import('./commands/version.ts');
|
||||
await runHistory(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'revert': {
|
||||
const { runRevert } = await import('./commands/version.ts');
|
||||
await runRevert(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'config': {
|
||||
const { runConfig } = await import('./commands/config.ts');
|
||||
await runConfig(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
case 'serve': {
|
||||
const { runServe } = await import('./commands/serve.ts');
|
||||
await runServe(engine);
|
||||
break;
|
||||
}
|
||||
case 'call': {
|
||||
const { runCall } = await import('./commands/call.ts');
|
||||
await runCall(engine, args.slice(1));
|
||||
break;
|
||||
}
|
||||
default:
|
||||
console.error(`Unknown command: ${command}`);
|
||||
console.error('Run gbrain --help for usage');
|
||||
process.exitCode = 1; // set the exit code without exiting so the finally block can disconnect
|
||||
}
|
||||
} finally {
|
||||
await engine.disconnect();
|
||||
}
|
||||
}
|
||||
|
||||
async function connectEngine(): Promise<BrainEngine> {
|
||||
const config = loadConfig();
|
||||
if (!config) {
|
||||
console.error('No brain configured. Run: gbrain init --supabase');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const engine = new PostgresEngine();
|
||||
await engine.connect(toEngineConfig(config));
|
||||
return engine;
|
||||
}
|
||||
|
||||
function printHelp() {
|
||||
console.log(`gbrain ${VERSION} — personal knowledge brain
|
||||
|
||||
USAGE
|
||||
gbrain <command> [options]
|
||||
|
||||
SETUP
|
||||
init [--supabase|--url <conn>] Create brain (guided wizard)
|
||||
upgrade Self-update
|
||||
|
||||
PAGES
|
||||
get <slug> Read a page
|
||||
put <slug> [< file.md] Write/update a page
|
||||
delete <slug> Delete a page
|
||||
list [--type T] [--tag T] [-n N] List pages
|
||||
|
||||
SEARCH
|
||||
search <query> Keyword search (tsvector)
|
||||
query <question> Hybrid search (RRF + expansion)
|
||||
|
||||
IMPORT/EXPORT
|
||||
import <dir> [--no-embed] Import markdown directory
|
||||
export [--dir ./out/] Export to markdown
|
||||
|
||||
EMBEDDINGS
|
||||
embed [<slug>|--all|--stale] Generate/refresh embeddings
|
||||
|
||||
LINKS
|
||||
link <from> <to> [--type T] Create typed link
|
||||
unlink <from> <to> Remove link
|
||||
backlinks <slug> Incoming links
|
||||
graph <slug> [--depth N] Traverse link graph
|
||||
|
||||
TAGS
|
||||
tags <slug> List tags
|
||||
tag <slug> <tag> Add tag
|
||||
untag <slug> <tag> Remove tag
|
||||
|
||||
TIMELINE
|
||||
timeline [<slug>] View timeline
|
||||
timeline-add <slug> <date> <text> Add timeline entry
|
||||
|
||||
ADMIN
|
||||
stats Brain statistics
|
||||
health Brain health dashboard
|
||||
history <slug> Page version history
|
||||
revert <slug> <version-id> Revert to version
|
||||
config [get|set] <key> [value] Brain config
|
||||
serve MCP server (stdio)
|
||||
call <tool> '<json>' Raw tool invocation
|
||||
version Version info
|
||||
--tools-json Tool discovery (JSON)
|
||||
`);
|
||||
}
|
||||
|
||||
main().catch(e => {
|
||||
console.error(e.message || e);
|
||||
process.exit(1);
|
||||
});
|
||||
16
src/commands/call.ts
Normal file
@@ -0,0 +1,16 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { handleToolCall } from '../mcp/server.ts';
|
||||
|
||||
export async function runCall(engine: BrainEngine, args: string[]) {
|
||||
const tool = args[0];
|
||||
const jsonStr = args[1];
|
||||
|
||||
if (!tool) {
|
||||
console.error('Usage: gbrain call <tool> \'<json>\'');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const params = jsonStr ? JSON.parse(jsonStr) : {};
|
||||
const result = await handleToolCall(engine, tool, params);
|
||||
console.log(JSON.stringify(result, null, 2));
|
||||
}
|
||||
23
src/commands/config.ts
Normal file
@@ -0,0 +1,23 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runConfig(engine: BrainEngine, args: string[]) {
|
||||
const action = args[0];
|
||||
const key = args[1];
|
||||
const value = args[2];
|
||||
|
||||
if (action === 'get' && key) {
|
||||
const val = await engine.getConfig(key);
|
||||
if (val !== null) {
|
||||
console.log(val);
|
||||
} else {
|
||||
console.error(`Config key not found: ${key}`);
|
||||
process.exit(1);
|
||||
}
|
||||
} else if (action === 'set' && key && value !== undefined) {
|
||||
await engine.setConfig(key, value);
|
||||
console.log(`Set ${key} = ${value}`);
|
||||
} else {
|
||||
console.error('Usage: gbrain config [get|set] <key> [value]');
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
18
src/commands/delete.ts
Normal file
@@ -0,0 +1,18 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runDelete(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain delete <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const page = await engine.getPage(slug);
|
||||
if (!page) {
|
||||
console.error(`Page not found: ${slug}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
await engine.deletePage(slug);
|
||||
console.log(`Deleted: ${slug}`);
|
||||
}
|
||||
113
src/commands/embed.ts
Normal file
@@ -0,0 +1,113 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { embedBatch } from '../core/embedding.ts';
|
||||
import type { ChunkInput } from '../core/types.ts';
|
||||
import { chunkText } from '../core/chunkers/recursive.ts';
|
||||
|
||||
export async function runEmbed(engine: BrainEngine, args: string[]) {
|
||||
const slug = args.find(a => !a.startsWith('--'));
|
||||
const all = args.includes('--all');
|
||||
const stale = args.includes('--stale');
|
||||
|
||||
if (slug) {
|
||||
await embedPage(engine, slug);
|
||||
} else if (all || stale) {
|
||||
await embedAll(engine, stale);
|
||||
} else {
|
||||
console.error('Usage: gbrain embed [<slug>|--all|--stale]');
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
async function embedPage(engine: BrainEngine, slug: string) {
|
||||
const page = await engine.getPage(slug);
|
||||
if (!page) {
|
||||
console.error(`Page not found: ${slug}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Get existing chunks or create new ones
|
||||
let chunks = await engine.getChunks(slug);
|
||||
if (chunks.length === 0) {
|
||||
// Create chunks first
|
||||
const inputs: ChunkInput[] = [];
|
||||
if (page.compiled_truth.trim()) {
|
||||
for (const c of chunkText(page.compiled_truth)) {
|
||||
inputs.push({ chunk_index: inputs.length, chunk_text: c.text, chunk_source: 'compiled_truth' });
|
||||
}
|
||||
}
|
||||
if (page.timeline.trim()) {
|
||||
for (const c of chunkText(page.timeline)) {
|
||||
inputs.push({ chunk_index: inputs.length, chunk_text: c.text, chunk_source: 'timeline' });
|
||||
}
|
||||
}
|
||||
if (inputs.length > 0) {
|
||||
await engine.upsertChunks(slug, inputs);
|
||||
chunks = await engine.getChunks(slug);
|
||||
}
|
||||
}
|
||||
|
||||
// Embed chunks without embeddings
|
||||
const toEmbed = chunks.filter(c => !c.embedded_at);
|
||||
if (toEmbed.length === 0) {
|
||||
console.log(`${slug}: all ${chunks.length} chunks already embedded`);
|
||||
return;
|
||||
}
|
||||
|
||||
const embeddings = await embedBatch(toEmbed.map(c => c.chunk_text));
|
||||
const updated: ChunkInput[] = chunks.map((c) => {
|
||||
const needsEmbed = toEmbed.find(te => te.chunk_index === c.chunk_index);
|
||||
const embIdx = needsEmbed ? toEmbed.indexOf(needsEmbed) : -1;
|
||||
return {
|
||||
chunk_index: c.chunk_index,
|
||||
chunk_text: c.chunk_text,
|
||||
chunk_source: c.chunk_source,
|
||||
embedding: embIdx >= 0 ? embeddings[embIdx] : undefined,
|
||||
token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
|
||||
};
|
||||
});
|
||||
|
||||
await engine.upsertChunks(slug, updated);
|
||||
console.log(`${slug}: embedded ${toEmbed.length} chunks`);
|
||||
}
|
||||
|
||||
async function embedAll(engine: BrainEngine, staleOnly: boolean) {
|
||||
const pages = await engine.listPages({ limit: 100000 });
|
||||
let total = 0;
|
||||
let embedded = 0;
|
||||
|
||||
for (let i = 0; i < pages.length; i++) {
|
||||
const page = pages[i];
|
||||
const chunks = await engine.getChunks(page.slug);
|
||||
const toEmbed = staleOnly
|
||||
? chunks.filter(c => !c.embedded_at)
|
||||
: chunks;
|
||||
|
||||
if (toEmbed.length === 0) continue;
|
||||
|
||||
try {
|
||||
const embeddings = await embedBatch(toEmbed.map(c => c.chunk_text));
|
||||
// Build a map of new embeddings by chunk_index
|
||||
const embeddingMap = new Map<number, Float32Array>();
|
||||
for (let j = 0; j < toEmbed.length; j++) {
|
||||
embeddingMap.set(toEmbed[j].chunk_index, embeddings[j]);
|
||||
}
|
||||
// Preserve ALL chunks, only update embeddings for stale ones
|
||||
const updated: ChunkInput[] = chunks.map(c => ({
|
||||
chunk_index: c.chunk_index,
|
||||
chunk_text: c.chunk_text,
|
||||
chunk_source: c.chunk_source,
|
||||
embedding: embeddingMap.get(c.chunk_index), // undefined for chunks not re-embedded
|
||||
token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
|
||||
}));
|
||||
await engine.upsertChunks(page.slug, updated);
|
||||
embedded += toEmbed.length;
|
||||
} catch (e: unknown) {
|
||||
console.error(`\n Error embedding ${page.slug}: ${e instanceof Error ? e.message : e}`);
|
||||
}
|
||||
|
||||
total += toEmbed.length;
|
||||
process.stdout.write(`\r ${i + 1}/${pages.length} pages, ${embedded} chunks embedded`);
|
||||
}
|
||||
|
||||
console.log(`\n\nEmbedded ${embedded} chunks across ${pages.length} pages`);
|
||||
}
|
||||
50
src/commands/export.ts
Normal file
@@ -0,0 +1,50 @@
|
||||
import { writeFileSync, mkdirSync } from 'fs';
|
||||
import { join, dirname } from 'path';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { serializeMarkdown } from '../core/markdown.ts';
|
||||
|
||||
export async function runExport(engine: BrainEngine, args: string[]) {
|
||||
const dirIdx = args.indexOf('--dir');
|
||||
const outDir = (dirIdx !== -1 && args[dirIdx + 1]) || './export'; // fall back when --dir has no value
|
||||
|
||||
const pages = await engine.listPages({ limit: 100000 });
|
||||
console.log(`Exporting ${pages.length} pages to ${outDir}/`);
|
||||
|
||||
let exported = 0;
|
||||
|
||||
for (const page of pages) {
|
||||
const tags = await engine.getTags(page.slug);
|
||||
const md = serializeMarkdown(
|
||||
page.frontmatter,
|
||||
page.compiled_truth,
|
||||
page.timeline,
|
||||
{ type: page.type, title: page.title, tags },
|
||||
);
|
||||
|
||||
const filePath = join(outDir, page.slug + '.md');
|
||||
mkdirSync(dirname(filePath), { recursive: true });
|
||||
writeFileSync(filePath, md);
|
||||
|
||||
// Export raw data as sidecar JSON
|
||||
const rawData = await engine.getRawData(page.slug);
|
||||
if (rawData.length > 0) {
|
||||
const slugParts = page.slug.split('/');
|
||||
const rawDir = join(outDir, ...slugParts.slice(0, -1), '.raw');
|
||||
mkdirSync(rawDir, { recursive: true });
|
||||
const rawPath = join(rawDir, slugParts[slugParts.length - 1] + '.json');
|
||||
|
||||
const rawObj: Record<string, unknown> = {};
|
||||
for (const rd of rawData) {
|
||||
rawObj[rd.source] = rd.data;
|
||||
}
|
||||
writeFileSync(rawPath, JSON.stringify(rawObj, null, 2) + '\n');
|
||||
}
|
||||
|
||||
exported++;
|
||||
if (exported % 100 === 0) {
|
||||
process.stdout.write(`\r ${exported}/${pages.length} exported`);
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\nExported ${exported} pages to ${outDir}/`);
|
||||
}
|
||||
37
src/commands/get.ts
Normal file
@@ -0,0 +1,37 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { serializeMarkdown } from '../core/markdown.ts';
|
||||
|
||||
export async function runGet(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain get <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Try exact match first, then fuzzy resolve
|
||||
let page = await engine.getPage(slug);
|
||||
if (!page) {
|
||||
const candidates = await engine.resolveSlugs(slug);
|
||||
if (candidates.length === 1) {
|
||||
page = await engine.getPage(candidates[0]);
|
||||
} else if (candidates.length > 1) {
|
||||
console.error(`Ambiguous slug "${slug}". Did you mean:`);
|
||||
for (const c of candidates) console.error(` ${c}`);
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
if (!page) {
|
||||
console.error(`Page not found: ${slug}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const tags = await engine.getTags(page.slug);
|
||||
const md = serializeMarkdown(
|
||||
page.frontmatter,
|
||||
page.compiled_truth,
|
||||
page.timeline,
|
||||
{ type: page.type, title: page.title, tags },
|
||||
);
|
||||
process.stdout.write(md);
|
||||
}
|
||||
36
src/commands/health.ts
Normal file
@@ -0,0 +1,36 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runHealth(engine: BrainEngine) {
|
||||
const health = await engine.getHealth();
|
||||
|
||||
const coveragePct = (health.embed_coverage * 100).toFixed(1);
|
||||
|
||||
console.log('Brain Health Dashboard');
|
||||
console.log('======================');
|
||||
console.log(`Pages: ${health.page_count}`);
|
||||
console.log(`Embed coverage: ${coveragePct}%`);
|
||||
console.log(`Missing embeddings: ${health.missing_embeddings}`);
|
||||
console.log(`Stale pages: ${health.stale_pages}`);
|
||||
console.log(`Orphan pages: ${health.orphan_pages}`);
|
||||
console.log(`Dead links: ${health.dead_links}`);
|
||||
|
||||
// Health score: simple heuristic
|
||||
let score = 10;
|
||||
if (health.embed_coverage < 0.5) score -= 3;
|
||||
else if (health.embed_coverage < 0.9) score -= 1;
|
||||
if (health.stale_pages > health.page_count * 0.2) score -= 2;
|
||||
if (health.orphan_pages > health.page_count * 0.3) score -= 1;
|
||||
if (health.dead_links > 0) score -= 1;
|
||||
if (health.missing_embeddings > 0) score -= 1;
|
||||
score = Math.max(0, score);
|
||||
|
||||
console.log(`\nHealth score: ${score}/10`);
|
||||
|
||||
if (score < 7) {
|
||||
console.log('\nRecommendations:');
|
||||
if (health.missing_embeddings > 0) console.log(' Run: gbrain embed --stale');
|
||||
if (health.stale_pages > 0) console.log(' Review stale pages (compiled_truth older than timeline)');
|
||||
if (health.orphan_pages > 0) console.log(' Add links to orphan pages');
|
||||
if (health.dead_links > 0) console.log(' Fix dead links');
|
||||
}
|
||||
}
|
||||
152
src/commands/import.ts
Normal file
@@ -0,0 +1,152 @@
|
||||
import { readFileSync, readdirSync, statSync } from 'fs';
|
||||
import { join, relative } from 'path';
|
||||
import { createHash } from 'crypto';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { parseMarkdown } from '../core/markdown.ts';
|
||||
import { chunkText } from '../core/chunkers/recursive.ts';
|
||||
import { embedBatch } from '../core/embedding.ts';
|
||||
import type { ChunkInput } from '../core/types.ts';
|
||||
|
||||
export async function runImport(engine: BrainEngine, args: string[]) {
|
||||
const dir = args.find(a => !a.startsWith('--'));
|
||||
const noEmbed = args.includes('--no-embed');
|
||||
|
||||
if (!dir) {
|
||||
console.error('Usage: gbrain import <dir> [--no-embed]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Collect all .md files
|
||||
const files = collectMarkdownFiles(dir);
|
||||
console.log(`Found ${files.length} markdown files`);
|
||||
|
||||
let imported = 0;
|
||||
let skipped = 0;
|
||||
let chunksCreated = 0;
|
||||
|
||||
for (let i = 0; i < files.length; i++) {
|
||||
const filePath = files[i];
|
||||
const relativePath = relative(dir, filePath);
|
||||
|
||||
// Progress
|
||||
if ((i + 1) % 100 === 0 || i === files.length - 1) {
|
||||
process.stdout.write(`\r ${i + 1}/${files.length} files processed, ${imported} imported, ${skipped} skipped`);
|
||||
}
|
||||
|
||||
try {
|
||||
const content = readFileSync(filePath, 'utf-8');
|
||||
const parsed = parseMarkdown(content, relativePath);
|
||||
const slug = parsed.slug;
|
||||
|
||||
// Check content hash for idempotency
|
||||
const hash = createHash('sha256')
|
||||
.update(parsed.compiled_truth + '\n---\n' + parsed.timeline)
|
||||
.digest('hex');
|
||||
|
||||
const existing = await engine.getPage(slug);
|
||||
if (existing?.content_hash === hash) {
|
||||
skipped++;
|
||||
continue;
|
||||
}
|
||||
|
||||
// Upsert page
|
||||
await engine.putPage(slug, {
|
||||
type: parsed.type,
|
||||
title: parsed.title,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline,
|
||||
frontmatter: parsed.frontmatter,
|
||||
});
|
||||
|
||||
// Tags
|
||||
for (const tag of parsed.tags) {
|
||||
await engine.addTag(slug, tag);
|
||||
}
|
||||
|
||||
// Chunk
|
||||
const chunks: ChunkInput[] = [];
|
||||
|
||||
if (parsed.compiled_truth.trim()) {
|
||||
const ctChunks = chunkText(parsed.compiled_truth);
|
||||
for (const c of ctChunks) {
|
||||
chunks.push({
|
||||
chunk_index: chunks.length,
|
||||
chunk_text: c.text,
|
||||
chunk_source: 'compiled_truth',
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
if (parsed.timeline.trim()) {
|
||||
const tlChunks = chunkText(parsed.timeline);
|
||||
for (const c of tlChunks) {
|
||||
chunks.push({
|
||||
chunk_index: chunks.length,
|
||||
chunk_text: c.text,
|
||||
chunk_source: 'timeline',
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Embed unless --no-embed was passed
|
||||
if (!noEmbed && chunks.length > 0) {
|
||||
try {
|
||||
const embeddings = await embedBatch(chunks.map(c => c.chunk_text));
|
||||
for (let j = 0; j < chunks.length; j++) {
|
||||
chunks[j].embedding = embeddings[j];
|
||||
chunks[j].token_count = Math.ceil(chunks[j].chunk_text.length / 4);
|
||||
}
|
||||
} catch {
|
||||
// Embedding failure is non-fatal, chunks still saved without embeddings
|
||||
}
|
||||
}
|
||||
|
||||
if (chunks.length > 0) {
|
||||
await engine.upsertChunks(slug, chunks);
|
||||
chunksCreated += chunks.length;
|
||||
}
|
||||
|
||||
imported++;
|
||||
} catch (e: unknown) {
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
console.error(`\n Warning: skipped ${relativePath}: ${msg}`);
|
||||
skipped++;
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n\nImport complete:`);
|
||||
console.log(` ${imported} pages imported`);
|
||||
console.log(` ${skipped} pages skipped (unchanged or error)`);
|
||||
console.log(` ${chunksCreated} chunks created`);
|
||||
|
||||
// Log the ingest
|
||||
await engine.logIngest({
|
||||
source_type: 'directory',
|
||||
source_ref: dir,
|
||||
pages_updated: [],
|
||||
summary: `Imported ${imported} pages, ${skipped} skipped, ${chunksCreated} chunks`,
|
||||
});
|
||||
}
|
||||
|
||||
function collectMarkdownFiles(dir: string): string[] {
|
||||
const files: string[] = [];
|
||||
|
||||
function walk(d: string) {
|
||||
for (const entry of readdirSync(d)) {
|
||||
// Skip hidden files and directories (including .raw sidecar dirs)
|
||||
if (entry.startsWith('.')) continue;
|
||||
|
||||
const full = join(d, entry);
|
||||
const stat = statSync(full);
|
||||
|
||||
if (stat.isDirectory()) {
|
||||
walk(full);
|
||||
} else if (entry.endsWith('.md')) {
|
||||
files.push(full);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
walk(dir);
|
||||
return files.sort();
|
||||
}
|
||||
82
src/commands/init.ts
Normal file
@@ -0,0 +1,82 @@
|
||||
import { execSync } from 'child_process';
|
||||
import { PostgresEngine } from '../core/postgres-engine.ts';
|
||||
import { saveConfig, type GBrainConfig } from '../core/config.ts';
|
||||
|
||||
export async function runInit(args: string[]) {
|
||||
const isSupabase = args.includes('--supabase');
|
||||
const urlIndex = args.indexOf('--url');
|
||||
const manualUrl = urlIndex !== -1 ? args[urlIndex + 1] : null;
|
||||
|
||||
let databaseUrl: string;
|
||||
|
||||
if (manualUrl) {
|
||||
databaseUrl = manualUrl;
|
||||
} else if (isSupabase) {
|
||||
databaseUrl = await supabaseWizard();
|
||||
} else {
|
||||
// Default to supabase wizard
|
||||
databaseUrl = await supabaseWizard();
|
||||
}
|
||||
|
||||
// Connect and init schema
|
||||
console.log('Connecting to database...');
|
||||
const engine = new PostgresEngine();
|
||||
await engine.connect({ database_url: databaseUrl });
|
||||
|
||||
console.log('Running schema migration...');
|
||||
await engine.initSchema();
|
||||
|
||||
// Save config
|
||||
const config: GBrainConfig = {
|
||||
engine: 'postgres',
|
||||
database_url: databaseUrl,
|
||||
};
|
||||
saveConfig(config);
|
||||
console.log('Config saved to ~/.gbrain/config.json');
|
||||
|
||||
// Verify
|
||||
const stats = await engine.getStats();
|
||||
await engine.disconnect();
|
||||
|
||||
console.log(`\nBrain ready. ${stats.page_count} pages.`);
|
||||
console.log('Next: gbrain import <dir> to migrate your markdown.');
|
||||
}
|
||||
|
||||
async function supabaseWizard(): Promise<string> {
|
||||
// Detect the Supabase CLI and print provisioning instructions (nothing is provisioned automatically)
|
||||
try {
|
||||
execSync('npx supabase --version', { stdio: 'pipe' });
|
||||
console.log('Supabase CLI detected.');
|
||||
console.log('To auto-provision, run: npx supabase login && npx supabase projects create');
|
||||
console.log('Then use: gbrain init --url <your-connection-string>');
|
||||
} catch {
|
||||
console.log('Supabase CLI not found.');
|
||||
console.log('Install it: npm install -g supabase');
|
||||
console.log('Or provide a connection URL directly.');
|
||||
}
|
||||
|
||||
// In both cases, prompt for the connection URL
|
||||
console.log('\nEnter your Supabase/Postgres connection URL:');
|
||||
console.log(' Format: postgresql://user:password@host:port/database');
|
||||
console.log(' Find it: Supabase Dashboard > Settings > Database > Connection string\n');
|
||||
|
||||
const url = await readLine('Connection URL: ');
|
||||
if (!url) {
|
||||
console.error('No URL provided.');
|
||||
process.exit(1);
|
||||
}
|
||||
return url;
|
||||
}
|
||||
|
||||
function readLine(prompt: string): Promise<string> {
|
||||
return new Promise((resolve) => {
|
||||
process.stdout.write(prompt);
|
||||
let data = '';
|
||||
process.stdin.setEncoding('utf-8');
|
||||
process.stdin.once('data', (chunk) => {
|
||||
data = chunk.toString().trim();
|
||||
resolve(data);
|
||||
});
|
||||
process.stdin.resume();
|
||||
});
|
||||
}
|
||||
68
src/commands/link.ts
Normal file
@@ -0,0 +1,68 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runLink(engine: BrainEngine, args: string[]) {
|
||||
const from = args[0];
|
||||
const to = args[1];
|
||||
const typeIdx = args.indexOf('--type');
|
||||
const linkType = typeIdx !== -1 ? (args[typeIdx + 1] ?? '') : ''; // '' when --type has no value
|
||||
|
||||
if (!from || !to) {
|
||||
console.error('Usage: gbrain link <from> <to> [--type <type>]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
await engine.addLink(from, to, '', linkType);
|
||||
console.log(`Linked ${from} -> ${to}${linkType ? ` (${linkType})` : ''}`);
|
||||
}
|
||||
|
||||
export async function runUnlink(engine: BrainEngine, args: string[]) {
|
||||
const [from, to] = args;
|
||||
if (!from || !to) {
|
||||
console.error('Usage: gbrain unlink <from> <to>');
|
||||
process.exit(1);
|
||||
}
|
||||
await engine.removeLink(from, to);
|
||||
console.log(`Unlinked ${from} -> ${to}`);
|
||||
}
|
||||
|
||||
export async function runBacklinks(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain backlinks <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const links = await engine.getBacklinks(slug);
|
||||
if (links.length === 0) {
|
||||
console.log(`No backlinks to ${slug}`);
|
||||
return;
|
||||
}
|
||||
|
||||
for (const l of links) {
|
||||
const typeStr = l.link_type ? ` (${l.link_type})` : '';
|
||||
console.log(`${l.from_slug}${typeStr}`);
|
||||
}
|
||||
console.log(`\n${links.length} backlinks`);
|
||||
}
|
||||
|
||||
export async function runGraph(engine: BrainEngine, args: string[]) {
|
||||
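  const depthIdx = args.indexOf('--depth');
  // Skip flags and the value that follows --depth when looking for the slug
  const slug = args.find((a, i) => !a.startsWith('--') && (depthIdx === -1 || i !== depthIdx + 1));
  // Fall back to the default depth of 5 when --depth is absent or not a number
  const parsedDepth = depthIdx !== -1 ? parseInt(args[depthIdx + 1], 10) : NaN;
  const depth = Number.isNaN(parsedDepth) ? 5 : parsedDepth;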
|
||||
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain graph <slug> [--depth N]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const nodes = await engine.traverseGraph(slug, depth);
|
||||
|
||||
for (const node of nodes) {
|
||||
const indent = ' '.repeat(node.depth);
|
||||
const links = node.links.map(l => `${l.to_slug}${l.link_type ? `(${l.link_type})` : ''}`);
|
||||
console.log(`${indent}${node.slug} [${node.type}]`);
|
||||
if (links.length > 0) {
|
||||
console.log(`${indent} -> ${links.join(', ')}`);
|
||||
}
|
||||
}
|
||||
}
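The commands in this diff each locate flags with `args.indexOf('--flag')` and read the following element as the value. A minimal sketch of how that pattern could be factored into one helper is shown below. `parseArgs` and `ParsedArgs` are hypothetical names that do not appear in the GBrain source; flags that take no value (such as `--no-expand`) are not handled and would land in `positional`.

```typescript
// Hypothetical helper, not part of the GBrain source shown in this diff.
interface ParsedArgs {
  positional: string[];        // arguments that are not flag names or flag values
  flags: Map<string, string>;  // flag name -> value, for flags that take a value
}

function parseArgs(args: string[], valueFlags: string[]): ParsedArgs {
  const positional: string[] = [];
  const flags = new Map<string, string>();
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === undefined) continue;
    if (valueFlags.includes(arg)) {
      const value = args[i + 1];
      if (value !== undefined) {
        flags.set(arg, value);
        i++; // consume the value so it is not treated as positional
      }
    } else {
      positional.push(arg);
    }
  }
  return { positional, flags };
}

// Example: `gbrain graph --depth 3 people/john`
const graphArgs = parseArgs(['--depth', '3', 'people/john'], ['--depth']);
const graphSlug = graphArgs.positional[0];
const graphDepth = parseInt(graphArgs.flags.get('--depth') ?? '5', 10);
console.log(graphSlug, graphDepth);
```

With this shape the slug is found correctly even when the flag precedes it, because the flag's value is consumed before positional arguments are collected.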
|
||||
25
src/commands/list.ts
Normal file
@@ -0,0 +1,25 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import type { PageType } from '../core/types.ts';
|
||||
|
||||
export async function runList(engine: BrainEngine, args: string[]) {
|
||||
const typeIdx = args.indexOf('--type');
|
||||
const tagIdx = args.indexOf('--tag');
|
||||
const limitIdx = args.indexOf('-n');
|
||||
|
||||
const type = typeIdx !== -1 ? (args[typeIdx + 1] as PageType) : undefined;
|
||||
const tag = tagIdx !== -1 ? args[tagIdx + 1] : undefined;
|
||||
const limit = limitIdx !== -1 ? parseInt(args[limitIdx + 1], 10) || 50 : 50; // default 50 when -n is absent or not a number
|
||||
|
||||
const pages = await engine.listPages({ type, tag, limit });
|
||||
|
||||
if (pages.length === 0) {
|
||||
console.log('No pages found.');
|
||||
return;
|
||||
}
|
||||
|
||||
for (const p of pages) {
|
||||
const date = p.updated_at.toISOString().split('T')[0];
|
||||
console.log(`${p.slug}\t${p.type}\t${date}\t${p.title}`);
|
||||
}
|
||||
console.log(`\n${pages.length} pages`);
|
||||
}
|
||||
50
src/commands/put.ts
Normal file
@@ -0,0 +1,50 @@
|
||||
import { readFileSync } from 'fs';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { parseMarkdown } from '../core/markdown.ts';
|
||||
|
||||
export async function runPut(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain put <slug> [file.md] (or pipe content via stdin)');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Read from stdin or file arg
|
||||
let content: string;
|
||||
const fileArg = args[1];
|
||||
if (fileArg) {
|
||||
content = readFileSync(fileArg, 'utf-8');
|
||||
} else if (!process.stdin.isTTY) {
|
||||
content = readFileSync('/dev/stdin', 'utf-8');
|
||||
} else {
|
||||
console.error('Provide content via stdin or file argument');
|
||||
console.error(' gbrain put people/john < john.md');
|
||||
console.error(' cat john.md | gbrain put people/john');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const parsed = parseMarkdown(content, slug + '.md');
|
||||
|
||||
// Create version snapshot before updating
|
||||
const existing = await engine.getPage(slug);
|
||||
if (existing) {
|
||||
await engine.createVersion(slug);
|
||||
}
|
||||
|
||||
const page = await engine.putPage(slug, {
|
||||
type: parsed.type,
|
||||
title: parsed.title,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline,
|
||||
frontmatter: parsed.frontmatter,
|
||||
});
|
||||
|
||||
// Add parsed tags (tags already on the page are not removed)
|
||||
if (parsed.tags.length > 0) {
|
||||
for (const tag of parsed.tags) {
|
||||
await engine.addTag(slug, tag);
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`${existing ? 'Updated' : 'Created'}: ${page.slug} (${page.type})`);
|
||||
}
|
||||
32
src/commands/query.ts
Normal file
@@ -0,0 +1,32 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { hybridSearch } from '../core/search/hybrid.ts';
|
||||
import { expandQuery } from '../core/search/expansion.ts';
|
||||
|
||||
export async function runQuery(engine: BrainEngine, args: string[]) {
|
||||
const query = args.filter(a => !a.startsWith('--')).join(' ');
|
||||
const noExpand = args.includes('--no-expand');
|
||||
|
||||
if (!query) {
|
||||
console.error('Usage: gbrain query <question>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const results = await hybridSearch(engine, query, {
|
||||
limit: 20,
|
||||
expansion: !noExpand,
|
||||
expandFn: expandQuery,
|
||||
});
|
||||
|
||||
if (results.length === 0) {
|
||||
console.log('No results found.');
|
||||
return;
|
||||
}
|
||||
|
||||
for (const r of results) {
|
||||
const staleTag = r.stale ? ' [STALE]' : '';
|
||||
console.log(`${r.slug} (${r.type}) score=${r.score.toFixed(4)}${staleTag}`);
|
||||
console.log(` ${r.chunk_text.slice(0, 120)}...`);
|
||||
console.log();
|
||||
}
|
||||
console.log(`${results.length} results`);
|
||||
}
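`runQuery` delegates to `hybridSearch`, which the commit message describes as merging vector and keyword results via Reciprocal Rank Fusion. `hybrid.ts` itself is not shown in this part of the diff, so the following is only a sketch of the general technique under stated assumptions: `rrfFuse` is a hypothetical name, results are identified by plain string ids, and `k = 60` is the conventional constant rather than a value confirmed by the source.

```typescript
// Sketch of Reciprocal Rank Fusion (RRF); not the GBrain implementation.
// Each ranked list contributes 1 / (k + rank) to an item's fused score.
const RRF_K = 60; // assumed; conventional default for RRF

function rrfFuse(
  rankings: string[][],
  k: number = RRF_K,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: one list from vector search, one from keyword search
const fused = rrfFuse([
  ['people/john', 'projects/gbrain'],
  ['projects/gbrain', 'notes/postgres'],
]);
console.log(fused.map(f => f.id));
```

An item that appears in both lists accumulates two contributions, which is why it tends to outrank items that appear high in only one list.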
|
||||
24
src/commands/search.ts
Normal file
@@ -0,0 +1,24 @@
import type { BrainEngine } from '../core/engine.ts';

export async function runSearch(engine: BrainEngine, args: string[]) {
  const query = args.join(' ');
  if (!query) {
    console.error('Usage: gbrain search <query>');
    process.exit(1);
  }

  const results = await engine.searchKeyword(query, { limit: 20 });

  if (results.length === 0) {
    console.log('No results found.');
    return;
  }

  for (const r of results) {
    const staleTag = r.stale ? ' [STALE]' : '';
    console.log(`${r.slug} (${r.type}) score=${r.score.toFixed(3)}${staleTag}`);
    console.log(` ${r.chunk_text.slice(0, 120)}...`);
    console.log();
  }
  console.log(`${results.length} results`);
}
7
src/commands/serve.ts
Normal file
@@ -0,0 +1,7 @@
import type { BrainEngine } from '../core/engine.ts';
import { startMcpServer } from '../mcp/server.ts';

export async function runServe(engine: BrainEngine) {
  console.error('Starting GBrain MCP server (stdio)...');
  await startMcpServer(engine);
}
21
src/commands/stats.ts
Normal file
@@ -0,0 +1,21 @@
import type { BrainEngine } from '../core/engine.ts';

export async function runStats(engine: BrainEngine) {
  const stats = await engine.getStats();

  console.log('Brain Statistics');
  console.log('================');
  console.log(`Pages: ${stats.page_count}`);
  console.log(`Chunks: ${stats.chunk_count}`);
  console.log(`Embedded: ${stats.embedded_count}`);
  console.log(`Links: ${stats.link_count}`);
  console.log(`Tags: ${stats.tag_count}`);
  console.log(`Timeline entries: ${stats.timeline_entry_count}`);

  if (Object.keys(stats.pages_by_type).length > 0) {
    console.log('\nPages by type:');
    for (const [type, count] of Object.entries(stats.pages_by_type)) {
      console.log(` ${type}: ${count}`);
    }
  }
}
36
src/commands/tags.ts
Normal file
@@ -0,0 +1,36 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runTags(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain tags <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const tags = await engine.getTags(slug);
|
||||
if (tags.length === 0) {
|
||||
console.log(`No tags for ${slug}`);
|
||||
} else {
|
||||
console.log(tags.join(', '));
|
||||
}
|
||||
}
|
||||
|
||||
export async function runTag(engine: BrainEngine, args: string[]) {
|
||||
const [slug, tag] = args;
|
||||
if (!slug || !tag) {
|
||||
console.error('Usage: gbrain tag <slug> <tag>');
|
||||
process.exit(1);
|
||||
}
|
||||
await engine.addTag(slug, tag);
|
||||
console.log(`Tagged ${slug} with "${tag}"`);
|
||||
}
|
||||
|
||||
export async function runUntag(engine: BrainEngine, args: string[]) {
|
||||
const [slug, tag] = args;
|
||||
if (!slug || !tag) {
|
||||
console.error('Usage: gbrain untag <slug> <tag>');
|
||||
process.exit(1);
|
||||
}
|
||||
await engine.removeTag(slug, tag);
|
||||
console.log(`Removed tag "${tag}" from ${slug}`);
|
||||
}
|
||||
40
src/commands/timeline.ts
Normal file
@@ -0,0 +1,40 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runTimeline(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain timeline <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const entries = await engine.getTimeline(slug);
|
||||
if (entries.length === 0) {
|
||||
console.log(`No timeline entries for ${slug}`);
|
||||
return;
|
||||
}
|
||||
|
||||
for (const e of entries) {
|
||||
const source = e.source ? ` [${e.source}]` : '';
|
||||
console.log(`${e.date}${source}: ${e.summary}`);
|
||||
if (e.detail) {
|
||||
console.log(` ${e.detail.slice(0, 200)}`);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
export async function runTimelineAdd(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
const date = args[1];
|
||||
const text = args.slice(2).join(' ');
|
||||
|
||||
if (!slug || !date || !text) {
|
||||
console.error('Usage: gbrain timeline-add <slug> <date> <text>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
await engine.addTimelineEntry(slug, {
|
||||
date,
|
||||
summary: text,
|
||||
});
|
||||
console.log(`Added timeline entry to ${slug}`);
|
||||
}
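`runTimelineAdd` passes the `<date>` argument to the engine without checking its shape. The expected date format is not stated in this diff; assuming it is meant to be a calendar date in `YYYY-MM-DD` form, a validation step could look like the sketch below. `isIsoDate` is a hypothetical helper, not part of the GBrain source.

```typescript
// Hypothetical validator; assumes timeline dates are YYYY-MM-DD (not confirmed by the source).
function isIsoDate(value: string): boolean {
  // Shape check first: four digits, two digits, two digits
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Reject strings the Date parser cannot interpret at all
  if (Number.isNaN(parsed.getTime())) return false;
  // Reject dates the parser silently rolled over (e.g. Feb 30 -> Mar 1)
  return parsed.toISOString().slice(0, 10) === value;
}

console.log(isIsoDate('2024-02-29'), isIsoDate('yesterday'));
```

The round-trip comparison at the end matters because some JavaScript engines normalize an out-of-range day instead of returning an invalid date.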
|
||||
29
src/commands/tools-json.ts
Normal file
@@ -0,0 +1,29 @@
|
||||
export function printToolsJson() {
|
||||
const tools = [
|
||||
{ name: 'get', description: 'Read a page by slug', parameters: { slug: 'string' } },
|
||||
{ name: 'put', description: 'Write/update a page', parameters: { slug: 'string', content: 'string (markdown)' } },
|
||||
{ name: 'delete', description: 'Delete a page', parameters: { slug: 'string' } },
|
||||
{ name: 'list', description: 'List pages with optional filters', parameters: { type: 'string?', tag: 'string?', limit: 'number?' } },
|
||||
{ name: 'search', description: 'Keyword search (tsvector)', parameters: { query: 'string' } },
|
||||
{ name: 'query', description: 'Hybrid search (RRF + multi-query expansion)', parameters: { query: 'string' } },
|
||||
{ name: 'import', description: 'Import markdown directory', parameters: { dir: 'string', no_embed: 'boolean?' } },
|
||||
{ name: 'export', description: 'Export to markdown directory', parameters: { dir: 'string?' } },
|
||||
{ name: 'embed', description: 'Generate/refresh embeddings', parameters: { slug: 'string?', all: 'boolean?', stale: 'boolean?' } },
|
||||
{ name: 'tag', description: 'Add tag to page', parameters: { slug: 'string', tag: 'string' } },
|
||||
{ name: 'untag', description: 'Remove tag from page', parameters: { slug: 'string', tag: 'string' } },
|
||||
{ name: 'tags', description: 'List tags for a page', parameters: { slug: 'string' } },
|
||||
{ name: 'link', description: 'Create typed link between pages', parameters: { from: 'string', to: 'string', type: 'string?' } },
|
||||
{ name: 'unlink', description: 'Remove link between pages', parameters: { from: 'string', to: 'string' } },
|
||||
{ name: 'backlinks', description: 'List incoming links to a page', parameters: { slug: 'string' } },
|
||||
{ name: 'graph', description: 'Traverse link graph from a page', parameters: { slug: 'string', depth: 'number?' } },
|
||||
{ name: 'timeline', description: 'View timeline entries for a page', parameters: { slug: 'string' } },
|
||||
{ name: 'timeline-add', description: 'Add timeline entry', parameters: { slug: 'string', date: 'string', text: 'string' } },
|
||||
{ name: 'stats', description: 'Brain statistics', parameters: {} },
|
||||
{ name: 'health', description: 'Brain health dashboard', parameters: {} },
|
||||
{ name: 'history', description: 'Page version history', parameters: { slug: 'string' } },
|
||||
{ name: 'revert', description: 'Revert page to version', parameters: { slug: 'string', version_id: 'number' } },
|
||||
{ name: 'config', description: 'Get/set brain config', parameters: { action: '"get"|"set"', key: 'string', value: 'string?' } },
|
||||
];
|
||||
|
||||
console.log(JSON.stringify(tools, null, 2));
|
||||
}
|
||||
67
src/commands/upgrade.ts
Normal file
@@ -0,0 +1,67 @@
|
||||
import { execSync } from 'child_process';
|
||||
|
||||
export async function runUpgrade(_args: string[]) {
|
||||
// Detect installation method
|
||||
const method = detectInstallMethod();
|
||||
|
||||
console.log(`Detected install method: ${method}`);
|
||||
|
||||
switch (method) {
|
||||
case 'npm':
|
||||
console.log('Upgrading via bun (package install detected)...');
|
||||
try {
|
||||
execSync('bun update gbrain', { stdio: 'inherit' });
|
||||
console.log('Upgrade complete.');
|
||||
} catch {
|
||||
console.error('bun upgrade failed. Try running it manually: bun update gbrain');
|
||||
}
|
||||
break;
|
||||
|
||||
case 'binary':
|
||||
console.log('Binary self-update not yet implemented.');
|
||||
console.log('Download the latest binary from GitHub Releases:');
|
||||
console.log(' https://github.com/garrytan/gbrain/releases');
|
||||
break;
|
||||
|
||||
case 'clawhub':
|
||||
console.log('Upgrading via ClawHub...');
|
||||
try {
|
||||
execSync('clawhub update gbrain', { stdio: 'inherit' });
|
||||
console.log('Upgrade complete.');
|
||||
} catch {
|
||||
console.error('ClawHub upgrade failed. Try: clawhub update gbrain');
|
||||
}
|
||||
break;
|
||||
|
||||
default:
|
||||
console.error('Could not detect installation method.');
|
||||
console.log('Try one of:');
|
||||
console.log(' bun update gbrain');
|
||||
console.log(' clawhub update gbrain');
|
||||
console.log(' Download from https://github.com/garrytan/gbrain/releases');
|
||||
}
|
||||
}
|
||||
|
||||
function detectInstallMethod(): 'npm' | 'binary' | 'clawhub' | 'unknown' {
|
||||
const execPath = process.execPath || '';
|
||||
|
||||
// Check if running from node_modules (npm install)
|
||||
if (execPath.includes('node_modules') || process.argv[1]?.includes('node_modules')) {
|
||||
return 'npm';
|
||||
}
|
||||
|
||||
// Check if clawhub is available
|
||||
try {
|
||||
execSync('which clawhub', { stdio: 'pipe' });
|
||||
return 'clawhub';
|
||||
} catch {
|
||||
// not available
|
||||
}
|
||||
|
||||
// Check if running as compiled binary
|
||||
if (execPath.endsWith('/gbrain') || execPath.endsWith('\\gbrain.exe')) {
|
||||
return 'binary';
|
||||
}
|
||||
|
||||
return 'unknown';
|
||||
}
|
||||
39
src/commands/version.ts
Normal file
@@ -0,0 +1,39 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runHistory(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain history <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const versions = await engine.getVersions(slug);
|
||||
if (versions.length === 0) {
|
||||
console.log(`No version history for ${slug}`);
|
||||
return;
|
||||
}
|
||||
|
||||
console.log(`Version history for ${slug}:`);
|
||||
for (const v of versions) {
|
||||
const date = new Date(v.snapshot_at).toISOString();
|
||||
const preview = v.compiled_truth.slice(0, 80).replace(/\n/g, ' ');
|
||||
console.log(` #${v.id} ${date} ${preview}...`);
|
||||
}
|
||||
}
|
||||
|
||||
export async function runRevert(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
const versionId = args[1] ? parseInt(args[1], 10) : NaN;
|
||||
|
||||
if (!slug || isNaN(versionId)) {
|
||||
console.error('Usage: gbrain revert <slug> <version-id>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Create a snapshot before reverting
|
||||
await engine.createVersion(slug);
|
||||
|
||||
await engine.revertToVersion(slug, versionId);
|
||||
console.log(`Reverted ${slug} to version #${versionId}`);
|
||||
console.log('Note: run gbrain embed <slug> to re-embed the reverted content');
|
||||
}
|
||||
163
src/core/chunkers/llm.ts
Normal file
@@ -0,0 +1,163 @@
|
||||
/**
|
||||
* LLM-Guided Text Chunker
|
||||
* Ported from production Ruby implementation (llm_text_chunker.rb, 167 LOC)
|
||||
*
|
||||
* Algorithm:
|
||||
* 1. Pre-split into 128-word candidates via recursive chunker
|
||||
* 2. Sliding window of up to 5 candidates (WINDOW_SIZE)
|
||||
* 3. Ask Claude Haiku: "Where does the FIRST topic shift occur?"
|
||||
* 4. Max 3 retries per window on unparseable responses
|
||||
* 5. Merge candidates between split points
|
||||
*/
|
||||
|
||||
import { chunkText as recursiveChunk, type TextChunk } from './recursive.ts';
|
||||
|
||||
const CANDIDATE_SIZE = 128; // words per pre-split candidate
|
||||
const MAX_RETRIES = 3;
|
||||
const WINDOW_SIZE = 5; // candidates per window
|
||||
|
||||
export interface LlmChunkOptions {
|
||||
chunkSize?: number;
|
||||
chunkOverlap?: number;
|
||||
askLlm?: (prompt: string) => Promise<string>;
|
||||
}
|
||||
|
||||
export async function chunkTextLlm(
|
||||
text: string,
|
||||
opts: LlmChunkOptions,
|
||||
): Promise<TextChunk[]> {
|
||||
const chunkSize = opts.chunkSize || 300;
|
||||
const chunkOverlap = opts.chunkOverlap ?? 50; // ?? so an explicit 0 is respected
|
||||
const askLlm = opts.askLlm;
|
||||
|
||||
if (!askLlm) {
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
|
||||
try {
|
||||
// Step 1: Pre-split into small candidates
|
||||
const candidates = recursiveChunk(text, {
|
||||
chunkSize: CANDIDATE_SIZE,
|
||||
chunkOverlap: 0,
|
||||
});
|
||||
|
||||
if (candidates.length <= 2) {
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
|
||||
// Step 2: Find split points via LLM
|
||||
const splitPoints = await findSplitPoints(candidates, askLlm);
|
||||
|
||||
// Step 3: Merge candidates between split points
|
||||
const merged = mergeAtSplits(candidates, splitPoints);
|
||||
|
||||
return merged.map((t, i) => ({ text: t.trim(), index: i }));
|
||||
} catch {
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
}
|
||||
|
||||
async function findSplitPoints(
|
||||
candidates: TextChunk[],
|
||||
askLlm: (prompt: string) => Promise<string>,
|
||||
): Promise<number[]> {
|
||||
const splitPoints: number[] = [];
|
||||
let pos = 0;
|
||||
|
||||
while (pos < candidates.length - 1) {
|
||||
const windowEnd = Math.min(pos + WINDOW_SIZE, candidates.length);
|
||||
const window = candidates.slice(pos, windowEnd);
|
||||
|
||||
if (window.length < 2) break;
|
||||
|
||||
const splitAt = await askForSplit(window, pos, askLlm);
|
||||
|
||||
if (splitAt !== null && splitAt > pos) {
|
||||
splitPoints.push(splitAt);
|
||||
pos = splitAt;
|
||||
} else {
|
||||
// No split found in this window, advance by 1
|
||||
pos++;
|
||||
}
|
||||
}
|
||||
|
||||
return splitPoints;
|
||||
}
|
||||
|
||||
async function askForSplit(
|
||||
window: TextChunk[],
|
||||
offset: number,
|
||||
askLlm: (prompt: string) => Promise<string>,
|
||||
): Promise<number | null> {
|
||||
// Format candidates as numbered items
|
||||
const numbered = window
|
||||
.map((c, i) => `[${offset + i}] ${c.text.slice(0, 200)}${c.text.length > 200 ? '...' : ''}`)
|
||||
.join('\n\n');
|
||||
|
||||
const prompt = `You are analyzing a document that has been split into numbered segments. Your job is to find where the FIRST major topic shift occurs.
|
||||
|
||||
Here are the segments:
|
||||
|
||||
${numbered}
|
||||
|
||||
If there is a clear topic shift between any two adjacent segments, respond with ONLY the number of the segment where the NEW topic begins. For example, if the topic shifts between [${offset + 1}] and [${offset + 2}], respond with: ${offset + 2}
|
||||
|
||||
If there is no clear topic shift, respond with: NONE
|
||||
|
||||
Respond with only a number or NONE. Nothing else.`;
|
||||
|
||||
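  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      const response = await askLlm(prompt);
      const trimmed = response.trim().toUpperCase();
      // Retry when the reply is neither NONE nor starts with an integer
      if (trimmed !== 'NONE' && Number.isNaN(parseInt(trimmed, 10))) continue;
      return parseSplitResponse(response, offset, offset + window.length - 1);
    } catch {
      // The LLM call itself failed; try again
      continue;
    }
  }

  // All attempts failed or were unparseable: treat as "no split in this window"
  return null;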
|
||||
}
|
||||
|
||||
function parseSplitResponse(
|
||||
response: string,
|
||||
minId: number,
|
||||
maxId: number,
|
||||
): number | null {
|
||||
const trimmed = response.trim().toUpperCase();
|
||||
if (trimmed === 'NONE') return null;
|
||||
|
||||
const num = parseInt(trimmed, 10);
|
||||
if (isNaN(num)) return null;
|
||||
|
||||
// Clamp to valid range, ensure forward progress
|
||||
const clamped = Math.max(num, minId + 1);
|
||||
if (clamped > maxId) return null;
|
||||
|
||||
return clamped;
|
||||
}
|
||||
|
||||
function mergeAtSplits(candidates: TextChunk[], splitPoints: number[]): string[] {
|
||||
if (splitPoints.length === 0) {
|
||||
return [candidates.map(c => c.text).join(' ')];
|
||||
}
|
||||
|
||||
const result: string[] = [];
|
||||
let start = 0;
|
||||
|
||||
for (const split of splitPoints) {
|
||||
const group = candidates.slice(start, split);
|
||||
if (group.length > 0) {
|
||||
result.push(group.map(c => c.text).join(' '));
|
||||
}
|
||||
start = split;
|
||||
}
|
||||
|
||||
// Last group
|
||||
const remaining = candidates.slice(start);
|
||||
if (remaining.length > 0) {
|
||||
result.push(remaining.map(c => c.text).join(' '));
|
||||
}
|
||||
|
||||
return result.filter(t => t.trim().length > 0);
|
||||
}
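The merge step above turns a list of split indices into groups of adjacent candidates. A simplified, self-contained sketch of that idea is shown below, using plain strings instead of `TextChunk` objects. `mergeAtSplitPoints` is an illustrative name, and the sketch assumes the split points are in ascending order, as `findSplitPoints` produces them.

```typescript
// Simplified illustration of merging candidates between split points.
// A split point s means "a new group begins at candidate index s".
function mergeAtSplitPoints(candidates: string[], splitPoints: number[]): string[] {
  const groups: string[] = [];
  let start = 0;
  // Appending candidates.length closes the final group in the same loop
  for (const split of [...splitPoints, candidates.length]) {
    const group = candidates.slice(start, split);
    if (group.length > 0) groups.push(group.join(' '));
    start = split;
  }
  return groups;
}

// Five candidates with topic shifts starting at index 2 and index 4
const groups = mergeAtSplitPoints(['a', 'b', 'c', 'd', 'e'], [2, 4]);
console.log(groups); // three groups: "a b", "c d", "e"
```

With no split points the whole document collapses into a single group, which matches the early return in `mergeAtSplits`.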
|
||||
211
src/core/chunkers/recursive.ts
Normal file
@@ -0,0 +1,211 @@
|
||||
/**
|
||||
* Recursive Delimiter-Aware Text Chunker
|
||||
* Ported from production Ruby implementation (text_chunker.rb, 205 LOC)
|
||||
*
|
||||
* 5-level delimiter hierarchy:
|
||||
* 1. Paragraphs (\n\n)
|
||||
* 2. Lines (\n)
|
||||
* 3. Sentences (. ! ? followed by space or newline)
|
||||
* 4. Clauses (; : , )
|
||||
* 5. Words (whitespace)
|
||||
*
|
||||
* Config: 300-word chunks with 50-word sentence-aware overlap.
|
||||
* Lossless invariant: non-overlapping portions reassemble to original.
|
||||
*/
|
||||
|
||||
const DELIMITERS: string[][] = [
|
||||
['\n\n'], // L0: paragraphs
|
||||
['\n'], // L1: lines
|
||||
['. ', '! ', '? ', '.\n', '!\n', '?\n'], // L2: sentences
|
||||
['; ', ': ', ', '], // L3: clauses
|
||||
[], // L4: words (whitespace split)
|
||||
];
|
||||
|
||||
export interface ChunkOptions {
|
||||
chunkSize?: number; // target words per chunk (default 300)
|
||||
chunkOverlap?: number; // overlap words (default 50)
|
||||
}
|
||||
|
||||
export interface TextChunk {
|
||||
text: string;
|
||||
index: number;
|
||||
}
|
||||
|
||||
export function chunkText(text: string, opts?: ChunkOptions): TextChunk[] {
|
||||
const chunkSize = opts?.chunkSize || 300;
|
||||
const chunkOverlap = opts?.chunkOverlap ?? 50; // ?? so an explicit 0 disables overlap (the LLM chunker passes 0)
|
||||
|
||||
if (!text || text.trim().length === 0) return [];
|
||||
|
||||
const wordCount = countWords(text);
|
||||
if (wordCount <= chunkSize) {
|
||||
return [{ text: text.trim(), index: 0 }];
|
||||
}
|
||||
|
||||
// Recursively split, then greedily merge to target size
|
||||
const pieces = recursiveSplit(text, 0, chunkSize);
|
||||
const merged = greedyMerge(pieces, chunkSize);
|
||||
const withOverlap = applyOverlap(merged, chunkOverlap);
|
||||
|
||||
return withOverlap.map((t, i) => ({ text: t.trim(), index: i }));
|
||||
}
|
||||
|
||||
function recursiveSplit(text: string, level: number, target: number): string[] {
|
||||
if (level >= DELIMITERS.length) {
|
||||
// Past the last delimiter level: fall back to splitting on whitespace
|
||||
return splitOnWhitespace(text, target);
|
||||
}
|
||||
|
||||
const delimiters = DELIMITERS[level];
|
||||
if (delimiters.length === 0) {
|
||||
return splitOnWhitespace(text, target);
|
||||
}
|
||||
|
||||
const pieces = splitAtDelimiters(text, delimiters);
|
||||
|
||||
// If splitting didn't help (only 1 piece), try next level
|
||||
if (pieces.length <= 1) {
|
||||
return recursiveSplit(text, level + 1, target);
|
||||
}
|
||||
|
||||
// Check if any piece is still too large, recurse deeper
|
||||
const result: string[] = [];
|
||||
for (const piece of pieces) {
|
||||
if (countWords(piece) > target) {
|
||||
result.push(...recursiveSplit(piece, level + 1, target));
|
||||
} else {
|
||||
result.push(piece);
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
/**
|
||||
* Split text at delimiter boundaries, preserving delimiters at the end
|
||||
* of the piece that precedes them (lossless).
|
||||
*/
|
||||
function splitAtDelimiters(text: string, delimiters: string[]): string[] {
|
||||
const pieces: string[] = [];
|
||||
let remaining = text;
|
||||
|
||||
while (remaining.length > 0) {
|
||||
let earliest = -1;
|
||||
let earliestDelim = '';
|
||||
|
||||
for (const delim of delimiters) {
|
||||
const idx = remaining.indexOf(delim);
|
||||
if (idx !== -1 && (earliest === -1 || idx < earliest)) {
|
||||
earliest = idx;
|
||||
earliestDelim = delim;
|
||||
}
|
||||
}
|
||||
|
||||
if (earliest === -1) {
|
||||
pieces.push(remaining);
|
||||
break;
|
||||
}
|
||||
|
||||
// Include the delimiter with the preceding text
|
||||
const piece = remaining.slice(0, earliest + earliestDelim.length);
|
||||
if (piece.trim().length > 0) {
|
||||
pieces.push(piece);
|
||||
}
|
||||
remaining = remaining.slice(earliest + earliestDelim.length);
|
||||
}
|
||||
|
||||
// Trailing content (if any) was already pushed by the `earliest === -1` branch above
|
||||
|
||||
return pieces.filter(p => p.trim().length > 0);
|
||||
}
|
||||
|
||||
/**
|
||||
* Fallback: split on whitespace boundaries to hit target word count.
|
||||
*/
|
||||
function splitOnWhitespace(text: string, target: number): string[] {
|
||||
const words = text.match(/\S+\s*/g) || [];
|
||||
if (words.length === 0) return [];
|
||||
|
||||
const pieces: string[] = [];
|
||||
for (let i = 0; i < words.length; i += target) {
|
||||
const slice = words.slice(i, i + target).join('');
|
||||
if (slice.trim().length > 0) {
|
||||
pieces.push(slice);
|
||||
}
|
||||
}
|
||||
return pieces;
|
||||
}
|
||||
|
||||
/**
|
||||
* Greedily merge adjacent pieces until each chunk is near the target size.
|
||||
* Avoids creating chunks larger than target * 1.5.
|
||||
*/
|
||||
function greedyMerge(pieces: string[], target: number): string[] {
|
||||
if (pieces.length === 0) return [];
|
||||
|
||||
const result: string[] = [];
|
||||
let current = pieces[0];
|
||||
|
||||
for (let i = 1; i < pieces.length; i++) {
|
||||
const combined = current + pieces[i];
|
||||
if (countWords(combined) <= Math.ceil(target * 1.5)) {
|
||||
current = combined;
|
||||
} else {
|
||||
result.push(current);
|
||||
current = pieces[i];
|
||||
}
|
||||
}
|
||||
|
||||
if (current.trim().length > 0) {
|
||||
result.push(current);
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply sentence-aware trailing overlap.
|
||||
* The last N words of chunk[i] are prepended to chunk[i+1].
|
||||
*/
|
||||
function applyOverlap(chunks: string[], overlapWords: number): string[] {
|
||||
if (chunks.length <= 1 || overlapWords <= 0) return chunks;
|
||||
|
||||
const result: string[] = [chunks[0]];
|
||||
|
||||
for (let i = 1; i < chunks.length; i++) {
|
||||
const prevTrailing = extractTrailingContext(chunks[i - 1], overlapWords);
|
||||
result.push(prevTrailing + chunks[i]);
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
/**
|
||||
* Extract the last N words from text, trying to align to sentence boundaries.
|
||||
* If a sentence boundary exists within the last N words, start there.
|
||||
*/
|
||||
function extractTrailingContext(text: string, targetWords: number): string {
|
||||
const words = text.match(/\S+\s*/g) || [];
|
||||
if (words.length <= targetWords) return '';
|
||||
|
||||
const trailing = words.slice(-targetWords).join('');
|
||||
|
||||
// Try to find a sentence boundary to start from
|
||||
const sentenceStart = trailing.search(/[.!?]\s+/);
|
||||
if (sentenceStart !== -1 && sentenceStart < trailing.length / 2) {
|
||||
// Start after the sentence boundary
|
||||
const afterSentence = trailing.slice(sentenceStart).replace(/^[.!?]\s+/, '');
|
||||
if (afterSentence.trim().length > 0) {
|
||||
return afterSentence;
|
||||
}
|
||||
}
|
||||
|
||||
return trailing;
|
||||
}
|
||||
|
||||
function countWords(text: string): number {
|
||||
return (text.match(/\S+/g) || []).length;
|
||||
}
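The overlap step prepends the tail of each chunk to the chunk that follows it. The sketch below shows only the word-count part of that behaviour and leaves out the sentence-boundary alignment done in `extractTrailingContext`. `lastWords` and `addOverlap` are illustrative names, not functions from the source.

```typescript
// Simplified illustration of trailing overlap, without sentence alignment.
function lastWords(text: string, n: number): string {
  // Each match is a word plus its trailing whitespace, so join('') is lossless
  const words = text.match(/\S+\s*/g) ?? [];
  if (words.length <= n) return ''; // same rule as the source: short chunks contribute no overlap
  return words.slice(-n).join('');
}

function addOverlap(chunks: string[], n: number): string[] {
  return chunks.map((chunk, i) => {
    // The first chunk has no predecessor; n <= 0 disables overlap
    if (i === 0 || n <= 0) return chunk;
    return lastWords(chunks[i - 1] ?? '', n) + chunk;
  });
}

const overlapped = addOverlap(['one two three ', 'four five'], 2);
console.log(overlapped); // second chunk becomes "two three four five"
```

The `n <= 0` guard is needed because `slice(-0)` is the same as `slice(0)` and would copy the entire previous chunk.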
|
||||
340
src/core/chunkers/semantic.ts
Normal file
@@ -0,0 +1,340 @@
|
||||
/**
|
||||
* Semantic Text Chunker
|
||||
* Ported from production Ruby implementation (semantic_text_chunker.rb, 242 LOC)
|
||||
*
|
||||
* Algorithm:
|
||||
* 1. Split text into sentences
|
||||
* 2. Embed each sentence
|
||||
* 3. Compute adjacent cosine similarities
|
||||
* 4. Savitzky-Golay filter (5-window, 3rd-order polynomial)
|
||||
* 5. Find local minima (topic boundaries)
|
||||
* 6. Group sentences, recursively split oversized groups
|
||||
*
|
||||
* Falls back to recursive chunker on any failure.
|
||||
*/
|
||||
|
||||
import { chunkText as recursiveChunk, type TextChunk } from './recursive.ts';
|
||||
|
||||
export interface SemanticChunkOptions {
|
||||
chunkSize?: number;
|
||||
chunkOverlap?: number;
|
||||
embedFn?: (texts: string[]) => Promise<Float32Array[]>;
|
||||
}
|
||||
|
||||
export async function chunkTextSemantic(
|
||||
text: string,
|
||||
opts: SemanticChunkOptions,
|
||||
): Promise<TextChunk[]> {
|
||||
const chunkSize = opts.chunkSize || 300;
|
||||
const chunkOverlap = opts.chunkOverlap ?? 50; // ?? so an explicit 0 is respected
|
||||
const embedFn = opts.embedFn;
|
||||
|
||||
if (!embedFn) {
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
|
||||
try {
|
||||
const sentences = splitSentences(text);
|
||||
if (sentences.length <= 3) {
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
|
||||
// Embed all sentences
|
||||
const embeddings = await embedFn(sentences);
|
||||
if (embeddings.length !== sentences.length) {
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
|
||||
// Compute adjacent cosine similarities
|
||||
const similarities = computeAdjacentSimilarities(embeddings);
|
||||
|
||||
// Find topic boundaries
|
||||
const boundaries = findBoundaries(similarities);
|
||||
|
||||
// Group sentences at boundaries
|
||||
const groups = groupAtBoundaries(sentences, boundaries);
|
||||
|
||||
// Recursively split oversized groups
|
||||
const chunks: TextChunk[] = [];
|
||||
let idx = 0;
|
||||
for (const group of groups) {
|
||||
const groupText = group.join(' ');
|
||||
const wordCount = (groupText.match(/\S+/g) || []).length;
|
||||
|
||||
if (wordCount > chunkSize * 1.5) {
|
||||
const subChunks = recursiveChunk(groupText, { chunkSize, chunkOverlap });
|
||||
for (const sc of subChunks) {
|
||||
chunks.push({ text: sc.text, index: idx++ });
|
||||
}
|
||||
} else {
|
||||
chunks.push({ text: groupText.trim(), index: idx++ });
|
||||
}
|
||||
}
|
||||
|
||||
return chunks;
|
||||
} catch {
|
||||
// Any failure falls back to recursive
|
||||
return recursiveChunk(text, { chunkSize, chunkOverlap });
|
||||
}
|
||||
}

/**
 * Split text into sentences on ., ! or ? followed by whitespace.
 * Abbreviations are not special-cased, so "Dr. Smith" splits after "Dr.".
 */
export function splitSentences(text: string): string[] {
  // Split on sentence-ending punctuation followed by whitespace or newline
  const raw = text.split(/(?<=[.!?])\s+/);
  return raw
    .map(s => s.trim())
    .filter(s => s.length > 0);
}
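
To see what the lookbehind split does in practice, here is a small standalone check. It copies `splitSentences` from above unchanged; the sample strings are made up for illustration.

```typescript
// Copy of splitSentences from above, so the snippet runs on its own.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/) // break at whitespace that follows ., ! or ?
    .map(s => s.trim())
    .filter(s => s.length > 0);
}

// Hypothetical sample inputs.
const plain = splitSentences('Topic one ends here. Topic two starts!  Is this three?');
const abbreviated = splitSentences('Dr. Smith arrived. He left!');

console.log(plain);
console.log(abbreviated);
```

The second call splits after "Dr.", so abbreviation-heavy text produces extra short sentences, each of which is embedded on its own.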

/**
 * Compute cosine similarity between each adjacent pair of embeddings.
 * Returns array of length (embeddings.length - 1).
 */
function computeAdjacentSimilarities(embeddings: Float32Array[]): number[] {
  const sims: number[] = [];
  for (let i = 0; i < embeddings.length - 1; i++) {
    sims.push(cosineSimilarity(embeddings[i], embeddings[i + 1]));
  }
  return sims;
}

/**
 * Find topic boundaries using Savitzky-Golay smoothing.
 * Falls back to percentile-based detection if SG fails.
 */
function findBoundaries(similarities: number[]): number[] {
  if (similarities.length < 5) {
    return findBoundariesPercentile(similarities);
  }

  try {
    return findBoundariesSavGol(similarities);
  } catch {
    return findBoundariesPercentile(similarities);
  }
}

/**
 * Savitzky-Golay boundary detection.
 * Apply SG filter to get 1st derivative, find local minima.
 */
function findBoundariesSavGol(similarities: number[]): number[] {
  // Compute 1st derivative via Savitzky-Golay (window=5, poly=3, deriv=1)
  const derivative = savitzkyGolay(similarities, 5, 3, 1);

  // Find zero crossings of the derivative (local minima)
  // A minimum is where derivative goes from negative to positive
  const minima: number[] = [];
  for (let i = 1; i < derivative.length; i++) {
    if (derivative[i - 1] < 0 && derivative[i] >= 0) {
      minima.push(i);
    }
  }

  // Filter by percentile: only keep minima where similarity is below the 20th percentile
  const threshold = percentile(similarities, 0.2); // low similarity = topic shift
  const filtered = minima.filter(i => {
    const simIdx = Math.min(i, similarities.length - 1);
    return similarities[simIdx] < threshold;
  });

  // Enforce minimum distance of 2 between boundaries
  return enforceMinDistance(filtered, 2);
}

/**
 * Simple percentile-based boundary detection.
 * Find positions where similarity drops below the 20th percentile.
 */
function findBoundariesPercentile(similarities: number[]): number[] {
  if (similarities.length === 0) return [];

  const threshold = percentile(similarities, 0.2);
  const boundaries: number[] = [];

  for (let i = 0; i < similarities.length; i++) {
    if (similarities[i] < threshold) {
      boundaries.push(i + 1); // boundary is after position i
    }
  }

  return enforceMinDistance(boundaries, 2);
}
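
The threshold logic is easier to judge with concrete numbers. The sketch below copies `percentile` as it appears in this file and repeats the thresholding step under a hypothetical name, `belowTwentieth`; the similarity values are invented. With fewer than five values the index `floor(0.2 * n)` is 0, so the threshold equals the minimum and the strict `<` comparison cannot hold.

```typescript
// Copy of the percentile helper from this file, so the snippet runs on its own.
function percentile(arr: number[], p: number): number {
  const sorted = [...arr].sort((a, b) => a - b);
  const idx = Math.floor(p * sorted.length);
  return sorted[Math.min(idx, sorted.length - 1)];
}

// Same thresholding as findBoundariesPercentile, without the min-distance pass.
// The name belowTwentieth is hypothetical.
function belowTwentieth(similarities: number[]): number[] {
  if (similarities.length === 0) return [];
  const threshold = percentile(similarities, 0.2);
  const boundaries: number[] = [];
  similarities.forEach((s, i) => {
    if (s < threshold) boundaries.push(i + 1); // boundary falls after position i
  });
  return boundaries;
}

// Hypothetical similarity sequences.
const four = belowTwentieth([0.9, 0.1, 0.8, 0.7]);      // threshold is the minimum, 0.1
const five = belowTwentieth([0.9, 0.1, 0.8, 0.7, 0.6]); // threshold is 0.6

console.log(four, five);
```

Since `findBoundaries` routes inputs with fewer than five similarities to this path, documents of four or five sentences appear to come out as a single group.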

/**
 * Savitzky-Golay filter implementation.
 * Polynomial fitting over a sliding window.
 */
function savitzkyGolay(
  data: number[],
  windowSize: number,
  polyOrder: number,
  derivOrder: number,
): number[] {
  const half = Math.floor(windowSize / 2);
  const n = data.length;

  if (n < windowSize) return data.slice();

  // Build Vandermonde matrix for the window
  const J: number[][] = [];
  for (let i = -half; i <= half; i++) {
    const row: number[] = [];
    for (let j = 0; j <= polyOrder; j++) {
      row.push(Math.pow(i, j));
    }
    J.push(row);
  }

  // Compute (J^T J)^-1 J^T
  const JT = transpose(J);
  const JTJ = matMul(JT, J);
  const JTJinv = invertMatrix(JTJ);
  const coeffs = matMul(JTJinv, JT);

  // The row corresponding to derivOrder gives us the filter coefficients
  // For derivative of order d, multiply by d!
  const filterRow = coeffs[derivOrder];
  const factorial = factorialN(derivOrder);

  const result: number[] = new Array(n).fill(0);

  for (let i = 0; i < n; i++) {
    let val = 0;
    for (let j = -half; j <= half; j++) {
      const idx = Math.min(Math.max(i + j, 0), n - 1);
      val += filterRow[j + half] * data[idx];
    }
    result[i] = val * factorial;
  }

  return result;
}
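
For the one configuration this file uses (window 5, cubic fit, first derivative), the least-squares solve reduces to the well-known fixed kernel `[1, -8, 0, 8, -1] / 12`. The sketch below applies that kernel directly with the same edge clamping and then looks for negative-to-positive crossings. The function names and sample data are hypothetical, and this is a cross-check on the general routine rather than a replacement for it.

```typescript
// Closed-form Savitzky-Golay first-derivative kernel for window=5, cubic fit.
const SG_D1_W5 = [1, -8, 0, 8, -1].map(c => c / 12);

// Hypothetical helper: convolve with the kernel, clamping indices at the edges.
function sgFirstDerivative(data: number[]): number[] {
  const n = data.length;
  const half = 2;
  return data.map((_, i) => {
    let acc = 0;
    for (let j = -half; j <= half; j++) {
      const idx = Math.min(Math.max(i + j, 0), n - 1);
      acc += SG_D1_W5[j + half] * data[idx];
    }
    return acc;
  });
}

// Hypothetical helper: positions where the derivative turns from negative to non-negative.
function localMinima(deriv: number[]): number[] {
  const out: number[] = [];
  for (let i = 1; i < deriv.length; i++) {
    if (deriv[i - 1] < 0 && deriv[i] >= 0) out.push(i);
  }
  return out;
}

// On y = x^2 the interior derivative estimates should be 2x.
const parabola = sgFirstDerivative([0, 1, 4, 9, 16, 25, 36]);

// Invented similarity curve with one clear dip at position 3.
const dip = localMinima(sgFirstDerivative([0.9, 0.85, 0.6, 0.2, 0.7, 0.9, 0.9]));

console.log(parabola.slice(2, 5), dip);
```

Near the ends of the array the clamped indices repeat the first or last value, so derivative estimates there are less reliable than in the interior.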

/**
 * Group sentences into chunks at the given boundary positions.
 */
function groupAtBoundaries(sentences: string[], boundaries: number[]): string[][] {
  const groups: string[][] = [];
  let start = 0;

  for (const b of boundaries) {
    if (b > start && b < sentences.length) {
      groups.push(sentences.slice(start, b));
      start = b;
    }
  }

  // Last group
  if (start < sentences.length) {
    groups.push(sentences.slice(start));
  }

  return groups.length > 0 ? groups : [sentences];
}

// Math helpers

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
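
A quick standalone check of the cosine helper, copied from above, on three hand-picked vector pairs: parallel, orthogonal, and one involving a zero vector. The inputs are invented for illustration.

```typescript
// Copy of cosineSimilarity from above, so the snippet runs on its own.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // a zero vector yields 0 instead of NaN
}

const parallel = cosineSimilarity(new Float32Array([1, 2, 3]), new Float32Array([2, 4, 6]));
const orthogonal = cosineSimilarity(new Float32Array([1, 0]), new Float32Array([0, 1]));
const withZero = cosineSimilarity(new Float32Array([0, 0]), new Float32Array([1, 1]));

console.log(parallel, orthogonal, withZero);
```

The loop runs over `a.length` only, so both vectors are assumed to have the same dimension.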

function percentile(arr: number[], p: number): number {
  const sorted = [...arr].sort((a, b) => a - b);
  const idx = Math.floor(p * sorted.length);
  return sorted[Math.min(idx, sorted.length - 1)];
}

function enforceMinDistance(boundaries: number[], minDist: number): number[] {
  if (boundaries.length <= 1) return boundaries;
  const result = [boundaries[0]];
  for (let i = 1; i < boundaries.length; i++) {
    if (boundaries[i] - result[result.length - 1] >= minDist) {
      result.push(boundaries[i]);
    }
  }
  return result;
}

function transpose(m: number[][]): number[][] {
  const rows = m.length, cols = m[0].length;
  const result: number[][] = Array.from({ length: cols }, () => new Array(rows).fill(0));
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      result[j][i] = m[i][j];
    }
  }
  return result;
}

function matMul(a: number[][], b: number[][]): number[][] {
  const rows = a.length, cols = b[0].length, inner = b.length;
  const result: number[][] = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      for (let k = 0; k < inner; k++) {
        result[i][j] += a[i][k] * b[k][j];
      }
    }
  }
  return result;
}

function invertMatrix(m: number[][]): number[][] {
  const n = m.length;
  // Augment with identity
  const aug: number[][] = m.map((row, i) => {
    const identity = new Array(n).fill(0);
    identity[i] = 1;
    return [...row, ...identity];
  });

  // Gauss-Jordan elimination
  for (let col = 0; col < n; col++) {
    // Find pivot
    let maxRow = col;
    for (let row = col + 1; row < n; row++) {
      if (Math.abs(aug[row][col]) > Math.abs(aug[maxRow][col])) {
        maxRow = row;
      }
    }
    [aug[col], aug[maxRow]] = [aug[maxRow], aug[col]];

    const pivot = aug[col][col];
    if (Math.abs(pivot) < 1e-12) {
      throw new Error('Matrix is singular');
    }

    // Scale pivot row
    for (let j = 0; j < 2 * n; j++) {
      aug[col][j] /= pivot;
    }

    // Eliminate column
    for (let row = 0; row < n; row++) {
      if (row === col) continue;
      const factor = aug[row][col];
      for (let j = 0; j < 2 * n; j++) {
        aug[row][j] -= factor * aug[col][j];
      }
    }
  }

  return aug.map(row => row.slice(n));
}

function factorialN(n: number): number {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}
50
src/core/config.ts
Normal file
@@ -0,0 +1,50 @@
import { readFileSync, writeFileSync, mkdirSync, chmodSync } from 'fs';
import { join } from 'path';
import { homedir } from 'os';
import type { EngineConfig } from './types.ts';

const CONFIG_DIR = join(homedir(), '.gbrain');
const CONFIG_PATH = join(CONFIG_DIR, 'config.json');

export interface GBrainConfig {
  engine: 'postgres' | 'sqlite';
  database_url?: string;
  database_path?: string;
  openai_api_key?: string;
  anthropic_api_key?: string;
}

export function loadConfig(): GBrainConfig | null {
  try {
    const raw = readFileSync(CONFIG_PATH, 'utf-8');
    return JSON.parse(raw) as GBrainConfig;
  } catch {
    return null;
  }
}

export function saveConfig(config: GBrainConfig): void {
  mkdirSync(CONFIG_DIR, { recursive: true });
  writeFileSync(CONFIG_PATH, JSON.stringify(config, null, 2) + '\n', { mode: 0o600 });
  try {
    chmodSync(CONFIG_PATH, 0o600);
  } catch {
    // chmod may fail on some platforms
  }
}

export function toEngineConfig(config: GBrainConfig): EngineConfig {
  return {
    engine: config.engine,
    database_url: config.database_url,
    database_path: config.database_path,
  };
}

export function getConfigDir(): string {
  return CONFIG_DIR;
}

export function getConfigPath(): string {
  return CONFIG_PATH;
}
102
src/core/db.ts
Normal file
@@ -0,0 +1,102 @@
import postgres from 'postgres';
import { readFileSync } from 'fs';
import { join, dirname } from 'path';
import { GBrainError, type EngineConfig } from './types.ts';

let sql: ReturnType<typeof postgres> | null = null;

export function getConnection(): ReturnType<typeof postgres> {
  if (!sql) {
    throw new GBrainError(
      'No database connection',
      'connect() has not been called',
      'Run gbrain init --supabase or gbrain init --url <connection_string>',
    );
  }
  return sql;
}

export async function connect(config: EngineConfig): Promise<void> {
  if (sql) return;

  const url = config.database_url;
  if (!url) {
    throw new GBrainError(
      'No database URL',
      'database_url is missing from config',
      'Run gbrain init --supabase or gbrain init --url <connection_string>',
    );
  }

  try {
    sql = postgres(url, {
      max: 10,
      idle_timeout: 20,
      connect_timeout: 10,
      types: {
        // Return int8 (bigint) columns as JavaScript BigInt.
        // No pgvector type is registered here; vectors are cast with ::vector in queries.
        bigint: postgres.BigInt,
      },
    });

    // Test connection
    await sql`SELECT 1`;
  } catch (e: unknown) {
    sql = null;
    const msg = e instanceof Error ? e.message : String(e);
    throw new GBrainError(
      'Cannot connect to database',
      msg,
      'Check your connection URL in ~/.gbrain/config.json',
    );
  }
}

export async function disconnect(): Promise<void> {
  if (sql) {
    await sql.end();
    sql = null;
  }
}

export async function initSchema(): Promise<void> {
  const conn = getConnection();

  // Read schema SQL
  const schemaPath = join(dirname(new URL(import.meta.url).pathname), '..', 'schema.sql');
  const schemaSql = readFileSync(schemaPath, 'utf-8');

  // Split on semicolons that end a line and execute each statement
  // (postgres driver can handle multi-statement, but explicit is safer)
  const statements = schemaSql
    .split(/;\s*$/m)
    // Strip leading "--" comment lines so a statement preceded by a comment is not dropped
    .map(s => s.replace(/^(\s*--.*(\r?\n|$))+/, '').trim())
    .filter(s => s.length > 0);

  for (const stmt of statements) {
    try {
      await conn.unsafe(stmt);
    } catch (e: unknown) {
      // Ignore "already exists" errors for idempotency
      const msg = e instanceof Error ? e.message : String(e);
      if (msg.includes('already exists') || msg.includes('duplicate key')) {
        continue;
      }
      throw e;
    }
  }
}
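
The line-ending-semicolon split has two behaviors worth seeing on real input. The sketch below uses a hypothetical helper, `splitStatements`, that is not part of `db.ts`; it keeps statements that are preceded by `--` comment lines, and the second sample shows how a `$$`-quoted function body is cut apart by the same regex. Both SQL samples are invented.

```typescript
// Hypothetical helper; not part of db.ts.
function splitStatements(schemaSql: string): string[] {
  return schemaSql
    .split(/;\s*$/m) // a semicolon at the end of a line ends a statement
    .map(s => s.replace(/^(\s*--.*(\r?\n|$))+/, '').trim()) // drop leading comment lines
    .filter(s => s.length > 0);
}

const simple = splitStatements(
  '-- pages table\nCREATE TABLE pages (id int);\n\nCREATE INDEX idx ON pages (id);\n',
);

// A $$-quoted body contains line-ending semicolons, so it is cut into pieces.
const fnPieces = splitStatements(
  'CREATE FUNCTION f() RETURNS int AS $$\nBEGIN\n  RETURN 1;\nEND;\n$$ LANGUAGE plpgsql;\n',
);

console.log(simple);
console.log(fnPieces.length);
```

If the schema defines trigger functions, sending the whole file in a single `unsafe` call may be the simpler route, since the fragments of a split function body are not valid statements on their own.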

export async function withTransaction<T>(fn: () => Promise<T>): Promise<T> {
  const conn = getConnection();
  return conn.begin(async (tx) => {
    // Temporarily swap the module-level connection for the transaction handle,
    // so getConnection() calls made inside fn run in this transaction.
    // The swap is global: unrelated queries issued concurrently see it too.
    const prev = sql;
    sql = tx as unknown as ReturnType<typeof postgres>;
    try {
      return await fn();
    } finally {
      sql = prev;
    }
  });
}
94
src/core/embedding.ts
Normal file
@@ -0,0 +1,94 @@
/**
 * Embedding Service
 * Ported from production Ruby implementation (embedding_service.rb, 190 LOC)
 *
 * OpenAI text-embedding-3-large at 1536 dimensions.
 * Retry with exponential backoff (4s base, 120s cap, up to 5 attempts).
 * 8000 character input truncation.
 */

import OpenAI from 'openai';

const MODEL = 'text-embedding-3-large';
const DIMENSIONS = 1536;
const MAX_CHARS = 8000;
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 4000;
const MAX_DELAY_MS = 120000;
const BATCH_SIZE = 100;

let client: OpenAI | null = null;

function getClient(): OpenAI {
  if (!client) {
    client = new OpenAI();
  }
  return client;
}

export async function embed(text: string): Promise<Float32Array> {
  const truncated = text.slice(0, MAX_CHARS);
  const result = await embedBatch([truncated]);
  return result[0];
}

export async function embedBatch(texts: string[]): Promise<Float32Array[]> {
  const truncated = texts.map(t => t.slice(0, MAX_CHARS));
  const results: Float32Array[] = [];

  // Process in batches of BATCH_SIZE
  for (let i = 0; i < truncated.length; i += BATCH_SIZE) {
    const batch = truncated.slice(i, i + BATCH_SIZE);
    const batchResults = await embedBatchWithRetry(batch);
    results.push(...batchResults);
  }

  return results;
}

async function embedBatchWithRetry(texts: string[]): Promise<Float32Array[]> {
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      const response = await getClient().embeddings.create({
        model: MODEL,
        input: texts,
        dimensions: DIMENSIONS,
      });

      // Sort by index to maintain order
      const sorted = response.data.sort((a, b) => a.index - b.index);
      return sorted.map(d => new Float32Array(d.embedding));
    } catch (e: unknown) {
      if (attempt === MAX_RETRIES - 1) throw e;

      // Check for rate limit with Retry-After header
      let delay = exponentialDelay(attempt);

      if (e instanceof OpenAI.APIError && e.status === 429) {
        const retryAfter = e.headers?.['retry-after'];
        if (retryAfter) {
          const parsed = parseInt(retryAfter, 10);
          if (!isNaN(parsed)) {
            delay = parsed * 1000;
          }
        }
      }

      await sleep(delay);
    }
  }

  // Should not reach here
  throw new Error('Embedding failed after all retries');
}

function exponentialDelay(attempt: number): number {
  const delay = BASE_DELAY_MS * Math.pow(2, attempt);
  return Math.min(delay, MAX_DELAY_MS);
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

export { MODEL as EMBEDDING_MODEL, DIMENSIONS as EMBEDDING_DIMENSIONS };
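
The backoff schedule implied by these constants can be listed directly. The snippet below copies the constants and `exponentialDelay` from this file and prints the waits between attempts; the `waits` array is only for illustration.

```typescript
// Constants copied from embedding.ts.
const BASE_DELAY_MS = 4000;
const MAX_DELAY_MS = 120000;
const MAX_RETRIES = 5;

function exponentialDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// Waits between attempts; the final attempt rethrows instead of sleeping.
const waits = Array.from({ length: MAX_RETRIES - 1 }, (_, attempt) => exponentialDelay(attempt));
const totalMs = waits.reduce((sum, ms) => sum + ms, 0);

console.log(waits, totalMs);
```

With five attempts the longest exponential wait is 32 s, so the 120 s cap only matters if `MAX_RETRIES` is raised. A `Retry-After` value from a 429 response replaces the computed delay and is not capped.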
73
src/core/engine.ts
Normal file
@@ -0,0 +1,73 @@
import type {
  Page, PageInput, PageFilters,
  Chunk, ChunkInput,
  SearchResult, SearchOpts,
  Link, GraphNode,
  TimelineEntry, TimelineInput, TimelineOpts,
  RawData,
  PageVersion,
  BrainStats, BrainHealth,
  IngestLogEntry, IngestLogInput,
  EngineConfig,
} from './types.ts';

export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
  disconnect(): Promise<void>;
  initSchema(): Promise<void>;
  transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T>;

  // Pages CRUD
  getPage(slug: string): Promise<Page | null>;
  putPage(slug: string, page: PageInput): Promise<Page>;
  deletePage(slug: string): Promise<void>;
  listPages(filters?: PageFilters): Promise<Page[]>;
  resolveSlugs(partial: string): Promise<string[]>;

  // Search
  searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]>;

  // Chunks
  upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void>;
  getChunks(slug: string): Promise<Chunk[]>;
  deleteChunks(slug: string): Promise<void>;

  // Links
  addLink(from: string, to: string, context?: string, linkType?: string): Promise<void>;
  removeLink(from: string, to: string): Promise<void>;
  getLinks(slug: string): Promise<Link[]>;
  getBacklinks(slug: string): Promise<Link[]>;
  traverseGraph(slug: string, depth?: number): Promise<GraphNode[]>;

  // Tags
  addTag(slug: string, tag: string): Promise<void>;
  removeTag(slug: string, tag: string): Promise<void>;
  getTags(slug: string): Promise<string[]>;

  // Timeline
  addTimelineEntry(slug: string, entry: TimelineInput): Promise<void>;
  getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]>;

  // Raw data
  putRawData(slug: string, source: string, data: object): Promise<void>;
  getRawData(slug: string, source?: string): Promise<RawData[]>;

  // Versions
  createVersion(slug: string): Promise<PageVersion>;
  getVersions(slug: string): Promise<PageVersion[]>;
  revertToVersion(slug: string, versionId: number): Promise<void>;

  // Stats + health
  getStats(): Promise<BrainStats>;
  getHealth(): Promise<BrainHealth>;

  // Ingest log
  logIngest(entry: IngestLogInput): Promise<void>;
  getIngestLog(opts?: { limit?: number }): Promise<IngestLogEntry[]>;

  // Config
  getConfig(key: string): Promise<string | null>;
  setConfig(key: string, value: string): Promise<void>;
}
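
The interface exposes keyword and vector search as two separate methods, and the commit description says the results are merged with Reciprocal Rank Fusion. The sketch below shows the general RRF idea on plain string keys. It is not the project's hybrid search: `rrfFuse`, the key format, and the constant `k = 60` (a common default) are assumptions for illustration.

```typescript
// Hypothetical helper: fuse ranked lists of result keys with Reciprocal Rank Fusion.
function rrfFuse(rankings: string[][], k = 60): Array<{ key: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((key, rank) => {
      // rank is 0-based here, so the top hit contributes 1 / (k + 1)
      scores.set(key, (scores.get(key) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([key, score]) => ({ key, score }))
    .sort((a, b) => b.score - a.score);
}

// Invented result keys, best match first, as the two search methods might return them.
const keywordHits = ['a', 'b', 'c'];
const vectorHits = ['b', 'c', 'a'];

const fused = rrfFuse([keywordHits, vectorHits]);
console.log(fused.map(r => r.key));
```

Because RRF only looks at ranks, the `ts_rank` score and the cosine score never need to be put on a common scale.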
4
src/core/index.ts
Normal file
@@ -0,0 +1,4 @@
export type { BrainEngine } from './engine.ts';
export { PostgresEngine } from './postgres-engine.ts';
export * from './types.ts';
export { parseMarkdown, serializeMarkdown, splitBody } from './markdown.ts';
170
src/core/markdown.ts
Normal file
@@ -0,0 +1,170 @@
import matter from 'gray-matter';
import type { PageType } from './types.ts';

export interface ParsedMarkdown {
  frontmatter: Record<string, unknown>;
  compiled_truth: string;
  timeline: string;
  slug: string;
  type: PageType;
  title: string;
  tags: string[];
}

/**
 * Parse a markdown file with YAML frontmatter into its components.
 *
 * Structure:
 *   ---
 *   type: concept
 *   title: Do Things That Don't Scale
 *   tags: [startups, growth]
 *   ---
 *   Compiled truth content here...
 *   ---
 *   Timeline content here...
 *
 * The first --- pair is YAML frontmatter (handled by gray-matter).
 * After frontmatter, the body is split at the first standalone ---
 * (a line containing only --- with optional whitespace).
 * Everything before is compiled_truth, everything after is timeline.
 * If no body --- exists, all content is compiled_truth.
 */
export function parseMarkdown(content: string, filePath?: string): ParsedMarkdown {
  const { data: frontmatter, content: body } = matter(content);

  // Split body at first standalone ---
  const { compiled_truth, timeline } = splitBody(body);

  // Extract metadata from frontmatter
  const type = (frontmatter.type as PageType) || inferType(filePath);
  const title = (frontmatter.title as string) || inferTitle(filePath);
  const tags = extractTags(frontmatter);
  const slug = (frontmatter.slug as string) || inferSlug(filePath);

  // Remove processed fields from frontmatter (they're stored as columns)
  const cleanFrontmatter = { ...frontmatter };
  delete cleanFrontmatter.type;
  delete cleanFrontmatter.title;
  delete cleanFrontmatter.tags;
  delete cleanFrontmatter.slug;

  return {
    frontmatter: cleanFrontmatter,
    compiled_truth: compiled_truth.trim(),
    timeline: timeline.trim(),
    slug,
    type,
    title,
    tags,
  };
}

/**
 * Split body content at first standalone --- separator.
 * Returns compiled_truth (before) and timeline (after).
 */
export function splitBody(body: string): { compiled_truth: string; timeline: string } {
  // Match a line that is only --- (with optional whitespace)
  // Must not be at the very start (that would be frontmatter)
  const lines = body.split('\n');
  let splitIndex = -1;

  for (let i = 0; i < lines.length; i++) {
    const trimmed = lines[i].trim();
    if (trimmed === '---') {
      // Skip if this is the very first non-empty line (leftover from frontmatter parsing)
      const beforeContent = lines.slice(0, i).join('\n').trim();
      if (beforeContent.length > 0) {
        splitIndex = i;
        break;
      }
    }
  }

  if (splitIndex === -1) {
    return { compiled_truth: body, timeline: '' };
  }

  const compiled_truth = lines.slice(0, splitIndex).join('\n');
  const timeline = lines.slice(splitIndex + 1).join('\n');
  return { compiled_truth, timeline };
}
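
A short standalone run of the split rule on two invented bodies. `splitBody` is copied from above with the same logic, lightly condensed; the sample strings are hypothetical.

```typescript
// Condensed copy of splitBody from above, so the snippet runs on its own.
function splitBody(body: string): { compiled_truth: string; timeline: string } {
  const lines = body.split('\n');
  let splitIndex = -1;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === '---') {
      // A --- with no content before it is skipped, not treated as the separator
      if (lines.slice(0, i).join('\n').trim().length > 0) {
        splitIndex = i;
        break;
      }
    }
  }
  if (splitIndex === -1) return { compiled_truth: body, timeline: '' };
  return {
    compiled_truth: lines.slice(0, splitIndex).join('\n'),
    timeline: lines.slice(splitIndex + 1).join('\n'),
  };
}

const withTimeline = splitBody('Current summary.\n\n---\n\n- 2024-01-05: first meeting');
const leadingRule = splitBody('---\nOnly compiled truth here');

console.log(withTimeline);
console.log(leadingRule);
```

Any later line that is exactly `---`, including a Markdown horizontal rule inside the compiled truth, is treated as the separator, so everything after it lands in the timeline.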

/**
 * Serialize a page back to markdown format.
 * Produces: frontmatter + compiled_truth + --- + timeline
 */
export function serializeMarkdown(
  frontmatter: Record<string, unknown>,
  compiled_truth: string,
  timeline: string,
  meta: { type: PageType; title: string; tags: string[] },
): string {
  // Build full frontmatter including type, title, tags
  const fullFrontmatter: Record<string, unknown> = {
    type: meta.type,
    title: meta.title,
    ...frontmatter,
  };
  if (meta.tags.length > 0) {
    fullFrontmatter.tags = meta.tags;
  }

  const yamlContent = matter.stringify('', fullFrontmatter).trim();

  let body = compiled_truth;
  if (timeline) {
    body += '\n\n---\n\n' + timeline;
  }

  return yamlContent + '\n\n' + body + '\n';
}

function inferType(filePath?: string): PageType {
  if (!filePath) return 'concept';

  // Normalize: add leading / for consistent matching
  const lower = ('/' + filePath).toLowerCase();
  if (lower.includes('/people/') || lower.includes('/person/')) return 'person';
  if (lower.includes('/companies/') || lower.includes('/company/')) return 'company';
  if (lower.includes('/deals/') || lower.includes('/deal/')) return 'deal';
  if (lower.includes('/yc/')) return 'yc';
  if (lower.includes('/civic/')) return 'civic';
  if (lower.includes('/projects/') || lower.includes('/project/')) return 'project';
  if (lower.includes('/sources/') || lower.includes('/source/')) return 'source';
  if (lower.includes('/media/')) return 'media';
  return 'concept';
}

function inferTitle(filePath?: string): string {
  if (!filePath) return 'Untitled';

  // Extract filename without extension, convert dashes/underscores to spaces
  const parts = filePath.split('/');
  const filename = parts[parts.length - 1]?.replace(/\.md$/i, '') || 'Untitled';
  return filename.replace(/[-_]/g, ' ').replace(/\b\w/g, c => c.toUpperCase());
}

function inferSlug(filePath?: string): string {
  if (!filePath) return 'untitled';

  // Strip the .md extension and normalize Windows separators.
  // The path is kept as given (type directory + filename), so it is
  // expected to already be relative to the import root.
  let slug = filePath
    .replace(/\.md$/i, '')
    .replace(/\\/g, '/');

  // Remove leading ./
  if (slug.startsWith('./')) slug = slug.slice(2);

  return slug.toLowerCase();
}

function extractTags(frontmatter: Record<string, unknown>): string[] {
  const tags = frontmatter.tags;
  if (!tags) return [];
  if (Array.isArray(tags)) return tags.map(String);
  if (typeof tags === 'string') return tags.split(',').map(t => t.trim()).filter(Boolean);
  return [];
}
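
The path-based fallbacks are easiest to check on sample inputs. The snippet copies `inferTitle`, `inferSlug`, and `extractTags` from this file in condensed form; the file paths and tag strings are invented.

```typescript
// Condensed copies of the helpers above, so the snippet runs on its own.
function inferTitle(filePath?: string): string {
  if (!filePath) return 'Untitled';
  const parts = filePath.split('/');
  const filename = parts[parts.length - 1]?.replace(/\.md$/i, '') || 'Untitled';
  return filename.replace(/[-_]/g, ' ').replace(/\b\w/g, c => c.toUpperCase());
}

function inferSlug(filePath?: string): string {
  if (!filePath) return 'untitled';
  let slug = filePath.replace(/\.md$/i, '').replace(/\\/g, '/');
  if (slug.startsWith('./')) slug = slug.slice(2);
  return slug.toLowerCase();
}

function extractTags(frontmatter: Record<string, unknown>): string[] {
  const tags = frontmatter.tags;
  if (!tags) return [];
  if (Array.isArray(tags)) return tags.map(String);
  if (typeof tags === 'string') return tags.split(',').map(t => t.trim()).filter(Boolean);
  return [];
}

// Hypothetical inputs.
const title = inferTitle('people/jane-doe.md');
const slug = inferSlug('.\\People\\Jane-Doe.MD');
const tagsFromString = extractTags({ tags: 'startups, growth,,yc' });
const tagsFromArray = extractTags({ tags: ['a', 1] });

console.log(title, slug, tagsFromString, tagsFromArray);
```

`inferTitle` splits on `/` only, so a Windows-style path would need its separators normalized before the title is inferred.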
590
src/core/postgres-engine.ts
Normal file
@@ -0,0 +1,590 @@
import { createHash } from 'crypto';
import type { BrainEngine } from './engine.ts';
import type {
  Page, PageInput, PageFilters, PageType,
  Chunk, ChunkInput,
  SearchResult, SearchOpts,
  Link, GraphNode,
  TimelineEntry, TimelineInput, TimelineOpts,
  RawData,
  PageVersion,
  BrainStats, BrainHealth,
  IngestLogEntry, IngestLogInput,
  EngineConfig,
} from './types.ts';
import * as db from './db.ts';

export class PostgresEngine implements BrainEngine {
  // Lifecycle
  async connect(config: EngineConfig): Promise<void> {
    await db.connect(config);
  }

  async disconnect(): Promise<void> {
    await db.disconnect();
  }

  async initSchema(): Promise<void> {
    await db.initSchema();
  }

  async transaction<T>(fn: (engine: BrainEngine) => Promise<T>): Promise<T> {
    return db.withTransaction(() => fn(this));
  }

  // Pages CRUD
  async getPage(slug: string): Promise<Page | null> {
    const sql = db.getConnection();
    const rows = await sql`
      SELECT id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash, created_at, updated_at
      FROM pages WHERE slug = ${slug}
    `;
    if (rows.length === 0) return null;
    return rowToPage(rows[0]);
  }

  async putPage(slug: string, page: PageInput): Promise<Page> {
    validateSlug(slug);
    const sql = db.getConnection();
    const hash = contentHash(page.compiled_truth, page.timeline || '');
    const frontmatter = page.frontmatter || {};

    const rows = await sql`
      INSERT INTO pages (slug, type, title, compiled_truth, timeline, frontmatter, content_hash, updated_at)
      VALUES (${slug}, ${page.type}, ${page.title}, ${page.compiled_truth}, ${page.timeline || ''}, ${JSON.stringify(frontmatter)}::jsonb, ${hash}, now())
      ON CONFLICT (slug) DO UPDATE SET
        type = EXCLUDED.type,
        title = EXCLUDED.title,
        compiled_truth = EXCLUDED.compiled_truth,
        timeline = EXCLUDED.timeline,
        frontmatter = EXCLUDED.frontmatter,
        content_hash = EXCLUDED.content_hash,
        updated_at = now()
      RETURNING id, slug, type, title, compiled_truth, timeline, frontmatter, content_hash, created_at, updated_at
    `;
    return rowToPage(rows[0]);
  }

  async deletePage(slug: string): Promise<void> {
    const sql = db.getConnection();
    await sql`DELETE FROM pages WHERE slug = ${slug}`;
  }

  async listPages(filters?: PageFilters): Promise<Page[]> {
    const sql = db.getConnection();
    const limit = filters?.limit || 100;
    const offset = filters?.offset || 0;

    let rows;
    if (filters?.type && filters?.tag) {
      rows = await sql`
        SELECT p.* FROM pages p
        JOIN tags t ON t.page_id = p.id
        WHERE p.type = ${filters.type} AND t.tag = ${filters.tag}
        ORDER BY p.updated_at DESC LIMIT ${limit} OFFSET ${offset}
      `;
    } else if (filters?.type) {
      rows = await sql`
        SELECT * FROM pages WHERE type = ${filters.type}
        ORDER BY updated_at DESC LIMIT ${limit} OFFSET ${offset}
      `;
    } else if (filters?.tag) {
      rows = await sql`
        SELECT p.* FROM pages p
        JOIN tags t ON t.page_id = p.id
        WHERE t.tag = ${filters.tag}
        ORDER BY p.updated_at DESC LIMIT ${limit} OFFSET ${offset}
      `;
    } else {
      rows = await sql`
        SELECT * FROM pages
        ORDER BY updated_at DESC LIMIT ${limit} OFFSET ${offset}
      `;
    }

    return rows.map(rowToPage);
  }

  async resolveSlugs(partial: string): Promise<string[]> {
    const sql = db.getConnection();

    // Try exact match first
    const exact = await sql`SELECT slug FROM pages WHERE slug = ${partial}`;
    if (exact.length > 0) return [exact[0].slug];

    // Fuzzy match via pg_trgm
    const fuzzy = await sql`
      SELECT slug, similarity(title, ${partial}) AS sim
      FROM pages
      WHERE title % ${partial} OR slug ILIKE ${'%' + partial + '%'}
      ORDER BY sim DESC
      LIMIT 5
    `;
    return fuzzy.map((r: { slug: string }) => r.slug);
  }

  // Search
  async searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]> {
    const sql = db.getConnection();
    const limit = opts?.limit || 20;

    const rows = await sql`
      SELECT
        p.slug, p.id as page_id, p.title, p.type,
        cc.chunk_text, cc.chunk_source,
        ts_rank(p.search_vector, websearch_to_tsquery('english', ${query})) AS score,
        CASE WHEN p.updated_at < (
          SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
        ) THEN true ELSE false END AS stale
      FROM pages p
      JOIN content_chunks cc ON cc.page_id = p.id
      WHERE p.search_vector @@ websearch_to_tsquery('english', ${query})
      ORDER BY score DESC
      LIMIT ${limit}
    `;

    return rows.map(rowToSearchResult);
  }

  async searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]> {
    const sql = db.getConnection();
    const limit = opts?.limit || 20;
    const vecStr = '[' + Array.from(embedding).join(',') + ']';

    const rows = await sql`
      SELECT
        p.slug, p.id as page_id, p.title, p.type,
        cc.chunk_text, cc.chunk_source,
        1 - (cc.embedding <=> ${vecStr}::vector) AS score,
        CASE WHEN p.updated_at < (
          SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
        ) THEN true ELSE false END AS stale
      FROM content_chunks cc
      JOIN pages p ON p.id = cc.page_id
      WHERE cc.embedding IS NOT NULL
      ORDER BY cc.embedding <=> ${vecStr}::vector
      LIMIT ${limit}
    `;

    return rows.map(rowToSearchResult);
  }
|
||||
|
||||
// Chunks
|
||||
async upsertChunks(slug: string, chunks: ChunkInput[]): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
|
||||
// Get page_id
|
||||
const pages = await sql`SELECT id FROM pages WHERE slug = ${slug}`;
|
||||
if (pages.length === 0) throw new Error(`Page not found: ${slug}`);
|
||||
const pageId = pages[0].id;
|
||||
|
||||
// Delete existing chunks for this page
|
||||
await sql`DELETE FROM content_chunks WHERE page_id = ${pageId}`;
|
||||
|
||||
// Insert new chunks
|
||||
if (chunks.length === 0) return;
|
||||
|
||||
for (const chunk of chunks) {
|
||||
const embeddingStr = chunk.embedding
|
||||
? '[' + Array.from(chunk.embedding).join(',') + ']'
|
||||
: null;
|
||||
|
||||
await sql`
|
||||
INSERT INTO content_chunks (page_id, chunk_index, chunk_text, chunk_source, embedding, model, token_count, embedded_at)
|
||||
VALUES (
|
||||
${pageId}, ${chunk.chunk_index}, ${chunk.chunk_text}, ${chunk.chunk_source},
|
||||
${embeddingStr ? sql`${embeddingStr}::vector` : sql`NULL`},
|
||||
${chunk.model || 'text-embedding-3-large'},
|
||||
${chunk.token_count || null},
|
||||
${chunk.embedding ? sql`now()` : sql`NULL`}
|
||||
)
|
||||
`;
|
||||
}
|
||||
}
|
||||
|
||||
async getChunks(slug: string): Promise<Chunk[]> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
SELECT cc.* FROM content_chunks cc
|
||||
JOIN pages p ON p.id = cc.page_id
|
||||
WHERE p.slug = ${slug}
|
||||
ORDER BY cc.chunk_index
|
||||
`;
|
||||
return rows.map(rowToChunk);
|
||||
}
|
||||
|
||||
async deleteChunks(slug: string): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
DELETE FROM content_chunks
|
||||
WHERE page_id = (SELECT id FROM pages WHERE slug = ${slug})
|
||||
`;
|
||||
}
|
||||
|
||||
// Links
|
||||
async addLink(from: string, to: string, context?: string, linkType?: string): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO links (from_page_id, to_page_id, link_type, context)
|
||||
SELECT f.id, t.id, ${linkType || ''}, ${context || ''}
|
||||
FROM pages f, pages t
|
||||
WHERE f.slug = ${from} AND t.slug = ${to}
|
||||
ON CONFLICT (from_page_id, to_page_id) DO UPDATE SET
|
||||
link_type = EXCLUDED.link_type,
|
||||
context = EXCLUDED.context
|
||||
`;
|
||||
}
|
||||
|
||||
async removeLink(from: string, to: string): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
DELETE FROM links
|
||||
WHERE from_page_id = (SELECT id FROM pages WHERE slug = ${from})
|
||||
AND to_page_id = (SELECT id FROM pages WHERE slug = ${to})
|
||||
`;
|
||||
}
|
||||
|
||||
async getLinks(slug: string): Promise<Link[]> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
|
||||
FROM links l
|
||||
JOIN pages f ON f.id = l.from_page_id
|
||||
JOIN pages t ON t.id = l.to_page_id
|
||||
WHERE f.slug = ${slug}
|
||||
`;
|
||||
return rows as unknown as Link[];
|
||||
}
|
||||
|
||||
async getBacklinks(slug: string): Promise<Link[]> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
SELECT f.slug as from_slug, t.slug as to_slug, l.link_type, l.context
|
||||
FROM links l
|
||||
JOIN pages f ON f.id = l.from_page_id
|
||||
JOIN pages t ON t.id = l.to_page_id
|
||||
WHERE t.slug = ${slug}
|
||||
`;
|
||||
return rows as unknown as Link[];
|
||||
}
|
||||
|
||||
async traverseGraph(slug: string, depth: number = 5): Promise<GraphNode[]> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
WITH RECURSIVE graph AS (
|
||||
SELECT p.id, p.slug, p.title, p.type, 0 as depth
|
||||
FROM pages p WHERE p.slug = ${slug}
|
||||
|
||||
UNION
|
||||
|
||||
SELECT p2.id, p2.slug, p2.title, p2.type, g.depth + 1
|
||||
FROM graph g
|
||||
JOIN links l ON l.from_page_id = g.id
|
||||
JOIN pages p2 ON p2.id = l.to_page_id
|
||||
WHERE g.depth < ${depth}
|
||||
)
|
||||
SELECT DISTINCT g.slug, g.title, g.type, g.depth,
|
||||
coalesce(
|
||||
(SELECT json_agg(json_build_object('to_slug', p3.slug, 'link_type', l2.link_type))
|
||||
FROM links l2
|
||||
JOIN pages p3 ON p3.id = l2.to_page_id
|
||||
WHERE l2.from_page_id = g.id),
|
||||
'[]'::json
|
||||
) as links
|
||||
FROM graph g
|
||||
ORDER BY g.depth, g.slug
|
||||
`;
|
||||
|
||||
return rows.map((r: Record<string, unknown>) => ({
|
||||
slug: r.slug as string,
|
||||
title: r.title as string,
|
||||
type: r.type as PageType,
|
||||
depth: r.depth as number,
|
||||
links: (typeof r.links === 'string' ? JSON.parse(r.links) : r.links) as { to_slug: string; link_type: string }[],
|
||||
}));
|
||||
}
|
||||
|
||||
// Tags
|
||||
async addTag(slug: string, tag: string): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO tags (page_id, tag)
|
||||
SELECT id, ${tag} FROM pages WHERE slug = ${slug}
|
||||
ON CONFLICT (page_id, tag) DO NOTHING
|
||||
`;
|
||||
}
|
||||
|
||||
async removeTag(slug: string, tag: string): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
DELETE FROM tags
|
||||
WHERE page_id = (SELECT id FROM pages WHERE slug = ${slug})
|
||||
AND tag = ${tag}
|
||||
`;
|
||||
}
|
||||
|
||||
async getTags(slug: string): Promise<string[]> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
SELECT tag FROM tags
|
||||
WHERE page_id = (SELECT id FROM pages WHERE slug = ${slug})
|
||||
ORDER BY tag
|
||||
`;
|
||||
return rows.map((r: { tag: string }) => r.tag);
|
||||
}
|
||||
|
||||
// Timeline
|
||||
async addTimelineEntry(slug: string, entry: TimelineInput): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO timeline_entries (page_id, date, source, summary, detail)
|
||||
SELECT id, ${entry.date}::date, ${entry.source || ''}, ${entry.summary}, ${entry.detail || ''}
|
||||
FROM pages WHERE slug = ${slug}
|
||||
`;
|
||||
}
|
||||
|
||||
async getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]> {
|
||||
const sql = db.getConnection();
|
||||
const limit = opts?.limit || 100;
|
||||
|
||||
let rows;
|
||||
if (opts?.after && opts?.before) {
|
||||
rows = await sql`
|
||||
SELECT te.* FROM timeline_entries te
|
||||
JOIN pages p ON p.id = te.page_id
|
||||
WHERE p.slug = ${slug} AND te.date >= ${opts.after}::date AND te.date <= ${opts.before}::date
|
||||
ORDER BY te.date DESC LIMIT ${limit}
|
||||
`;
|
||||
} else if (opts?.after) {
|
||||
rows = await sql`
|
||||
SELECT te.* FROM timeline_entries te
|
||||
JOIN pages p ON p.id = te.page_id
|
||||
WHERE p.slug = ${slug} AND te.date >= ${opts.after}::date
|
||||
ORDER BY te.date DESC LIMIT ${limit}
|
||||
`;
|
||||
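} else if (opts?.before) {
// Handle an upper bound given without a lower bound (previously ignored)
rows = await sql`
SELECT te.* FROM timeline_entries te
JOIN pages p ON p.id = te.page_id
WHERE p.slug = ${slug} AND te.date <= ${opts.before}::date
ORDER BY te.date DESC LIMIT ${limit}
`;
} else {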
|
||||
rows = await sql`
|
||||
SELECT te.* FROM timeline_entries te
|
||||
JOIN pages p ON p.id = te.page_id
|
||||
WHERE p.slug = ${slug}
|
||||
ORDER BY te.date DESC LIMIT ${limit}
|
||||
`;
|
||||
}
|
||||
|
||||
return rows as unknown as TimelineEntry[];
|
||||
}
|
||||
|
||||
// Raw data
|
||||
async putRawData(slug: string, source: string, data: object): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO raw_data (page_id, source, data)
|
||||
SELECT id, ${source}, ${JSON.stringify(data)}::jsonb
|
||||
FROM pages WHERE slug = ${slug}
|
||||
ON CONFLICT (page_id, source) DO UPDATE SET
|
||||
data = EXCLUDED.data,
|
||||
fetched_at = now()
|
||||
`;
|
||||
}
|
||||
|
||||
async getRawData(slug: string, source?: string): Promise<RawData[]> {
|
||||
const sql = db.getConnection();
|
||||
let rows;
|
||||
if (source) {
|
||||
rows = await sql`
|
||||
SELECT rd.source, rd.data, rd.fetched_at FROM raw_data rd
|
||||
JOIN pages p ON p.id = rd.page_id
|
||||
WHERE p.slug = ${slug} AND rd.source = ${source}
|
||||
`;
|
||||
} else {
|
||||
rows = await sql`
|
||||
SELECT rd.source, rd.data, rd.fetched_at FROM raw_data rd
|
||||
JOIN pages p ON p.id = rd.page_id
|
||||
WHERE p.slug = ${slug}
|
||||
`;
|
||||
}
|
||||
return rows as unknown as RawData[];
|
||||
}
|
||||
|
||||
// Versions
|
||||
async createVersion(slug: string): Promise<PageVersion> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
INSERT INTO page_versions (page_id, compiled_truth, frontmatter)
|
||||
SELECT id, compiled_truth, frontmatter
|
||||
FROM pages WHERE slug = ${slug}
|
||||
RETURNING *
|
||||
`;
|
||||
return rows[0] as unknown as PageVersion;
|
||||
}
|
||||
|
||||
async getVersions(slug: string): Promise<PageVersion[]> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`
|
||||
SELECT pv.* FROM page_versions pv
|
||||
JOIN pages p ON p.id = pv.page_id
|
||||
WHERE p.slug = ${slug}
|
||||
ORDER BY pv.snapshot_at DESC
|
||||
`;
|
||||
return rows as unknown as PageVersion[];
|
||||
}
|
||||
|
||||
async revertToVersion(slug: string, versionId: number): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
UPDATE pages SET
|
||||
compiled_truth = pv.compiled_truth,
|
||||
frontmatter = pv.frontmatter,
|
||||
updated_at = now()
|
||||
FROM page_versions pv
|
||||
WHERE pages.slug = ${slug} AND pv.id = ${versionId} AND pv.page_id = pages.id
|
||||
`;
|
||||
}
|
||||
|
||||
// Stats + health
|
||||
async getStats(): Promise<BrainStats> {
|
||||
const sql = db.getConnection();
|
||||
const [stats] = await sql`
|
||||
SELECT
|
||||
(SELECT count(*) FROM pages) as page_count,
|
||||
(SELECT count(*) FROM content_chunks) as chunk_count,
|
||||
(SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL) as embedded_count,
|
||||
(SELECT count(*) FROM links) as link_count,
|
||||
(SELECT count(DISTINCT tag) FROM tags) as tag_count,
|
||||
(SELECT count(*) FROM timeline_entries) as timeline_entry_count
|
||||
`;
|
||||
|
||||
const types = await sql`
|
||||
SELECT type, count(*)::int as count FROM pages GROUP BY type ORDER BY count DESC
|
||||
`;
|
||||
const pages_by_type: Record<string, number> = {};
|
||||
for (const t of types) {
|
||||
pages_by_type[t.type as string] = t.count as number;
|
||||
}
|
||||
|
||||
return {
|
||||
page_count: Number(stats.page_count),
|
||||
chunk_count: Number(stats.chunk_count),
|
||||
embedded_count: Number(stats.embedded_count),
|
||||
link_count: Number(stats.link_count),
|
||||
tag_count: Number(stats.tag_count),
|
||||
timeline_entry_count: Number(stats.timeline_entry_count),
|
||||
pages_by_type,
|
||||
};
|
||||
}
|
||||
|
||||
async getHealth(): Promise<BrainHealth> {
|
||||
const sql = db.getConnection();
|
||||
const [h] = await sql`
|
||||
SELECT
|
||||
(SELECT count(*) FROM pages) as page_count,
|
||||
(SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL)::float /
|
||||
GREATEST((SELECT count(*) FROM content_chunks), 1)::float as embed_coverage,
|
||||
(SELECT count(*) FROM pages p
|
||||
WHERE p.updated_at < (SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id)
|
||||
) as stale_pages,
|
||||
(SELECT count(*) FROM pages p
|
||||
WHERE NOT EXISTS (SELECT 1 FROM links l WHERE l.to_page_id = p.id)
|
||||
) as orphan_pages,
|
||||
(SELECT count(*) FROM links l
|
||||
WHERE NOT EXISTS (SELECT 1 FROM pages p WHERE p.id = l.to_page_id)
|
||||
) as dead_links,
|
||||
(SELECT count(*) FROM content_chunks WHERE embedded_at IS NULL) as missing_embeddings
|
||||
`;
|
||||
|
||||
return {
|
||||
page_count: Number(h.page_count),
|
||||
embed_coverage: Number(h.embed_coverage),
|
||||
stale_pages: Number(h.stale_pages),
|
||||
orphan_pages: Number(h.orphan_pages),
|
||||
dead_links: Number(h.dead_links),
|
||||
missing_embeddings: Number(h.missing_embeddings),
|
||||
};
|
||||
}
|
||||
|
||||
// Ingest log
|
||||
async logIngest(entry: IngestLogInput): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO ingest_log (source_type, source_ref, pages_updated, summary)
|
||||
VALUES (${entry.source_type}, ${entry.source_ref}, ${JSON.stringify(entry.pages_updated)}::jsonb, ${entry.summary})
|
||||
`;
|
||||
}
|
||||
|
||||
async getIngestLog(opts?: { limit?: number }): Promise<IngestLogEntry[]> {
|
||||
const sql = db.getConnection();
|
||||
const limit = opts?.limit || 50;
|
||||
const rows = await sql`
|
||||
SELECT * FROM ingest_log ORDER BY created_at DESC LIMIT ${limit}
|
||||
`;
|
||||
return rows as unknown as IngestLogEntry[];
|
||||
}
|
||||
|
||||
// Config
|
||||
async getConfig(key: string): Promise<string | null> {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`SELECT value FROM config WHERE key = ${key}`;
|
||||
return rows.length > 0 ? (rows[0].value as string) : null;
|
||||
}
|
||||
|
||||
async setConfig(key: string, value: string): Promise<void> {
|
||||
const sql = db.getConnection();
|
||||
await sql`
|
||||
INSERT INTO config (key, value) VALUES (${key}, ${value})
|
||||
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value
|
||||
`;
|
||||
}
|
||||
}
|
||||
|
||||
// Helpers
|
||||
function validateSlug(slug: string): void {
|
||||
if (!slug || /\.\./.test(slug) || /^\//.test(slug) || !/^[a-z0-9][a-z0-9/_-]*$/.test(slug)) {
|
||||
throw new Error(`Invalid slug: "${slug}". Slugs must be lowercase alphanumeric with / - _ separators, no path traversal.`);
|
||||
}
|
||||
}
|
||||
|
||||
function contentHash(compiledTruth: string, timeline: string): string {
|
||||
return createHash('sha256').update(compiledTruth + '\n---\n' + timeline).digest('hex');
|
||||
}
|
||||
|
||||
function rowToPage(row: Record<string, unknown>): Page {
|
||||
return {
|
||||
id: row.id as number,
|
||||
slug: row.slug as string,
|
||||
type: row.type as PageType,
|
||||
title: row.title as string,
|
||||
compiled_truth: row.compiled_truth as string,
|
||||
timeline: row.timeline as string,
|
||||
frontmatter: (typeof row.frontmatter === 'string' ? JSON.parse(row.frontmatter) : row.frontmatter) as Record<string, unknown>,
|
||||
content_hash: row.content_hash as string | undefined,
|
||||
created_at: new Date(row.created_at as string),
|
||||
updated_at: new Date(row.updated_at as string),
|
||||
};
|
||||
}
|
||||
|
||||
function rowToChunk(row: Record<string, unknown>): Chunk {
|
||||
return {
|
||||
id: row.id as number,
|
||||
page_id: row.page_id as number,
|
||||
chunk_index: row.chunk_index as number,
|
||||
chunk_text: row.chunk_text as string,
|
||||
chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
|
||||
embedding: null, // Don't load embeddings into memory by default
|
||||
model: row.model as string,
|
||||
token_count: row.token_count as number | null,
|
||||
embedded_at: row.embedded_at ? new Date(row.embedded_at as string) : null,
|
||||
};
|
||||
}
|
||||
|
||||
function rowToSearchResult(row: Record<string, unknown>): SearchResult {
|
||||
return {
|
||||
slug: row.slug as string,
|
||||
page_id: row.page_id as number,
|
||||
title: row.title as string,
|
||||
type: row.type as PageType,
|
||||
chunk_text: row.chunk_text as string,
|
||||
chunk_source: row.chunk_source as 'compiled_truth' | 'timeline',
|
||||
score: Number(row.score),
|
||||
stale: Boolean(row.stale),
|
||||
};
|
||||
}
|
||||
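The engine above hands embeddings to pgvector as text literals of the form `[v1,v2,...]` and turns the cosine distance returned by `<=>` into a similarity with `1 - distance`. The following is a minimal sketch of both steps in plain TypeScript, with no database involved. The helper names `toVectorLiteral` and `cosineSimilarity` are hypothetical and are not part of this commit.

```typescript
// Hypothetical helper: build the text literal that is cast with ::vector in SQL.
function toVectorLiteral(embedding: Float32Array): string {
  return '[' + Array.from(embedding).join(',') + ']';
}

// Hypothetical helper: the quantity that `1 - (a <=> b)` yields in pgvector,
// computed locally. Returns 0 for a zero-length vector to avoid dividing by zero.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A local check like this can be useful in tests that need to reason about scores without a running Postgres instance.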
129
src/core/search/dedup.ts
Normal file
@@ -0,0 +1,129 @@
|
||||
/**
|
||||
* 4-Layer Dedup Pipeline
|
||||
* Ported from production Ruby implementation (content_chunk.rb)
|
||||
*
|
||||
* 1. By source: one chunk per page with highest score
|
||||
* 2. By text similarity: remove chunks >0.85 Jaccard-similar (word sets) to kept results, as a proxy for cosine similarity
|
||||
* 3. By type: no page type exceeds 60% of results
|
||||
* 4. By page: max N chunks per page (default 2)
|
||||
*/
|
||||
|
||||
import type { SearchResult } from '../types.ts';
|
||||
|
||||
const COSINE_DEDUP_THRESHOLD = 0.85;
|
||||
const MAX_TYPE_RATIO = 0.6;
|
||||
const MAX_PER_PAGE = 2;
|
||||
|
||||
export function dedupResults(
|
||||
results: SearchResult[],
|
||||
opts?: {
|
||||
cosineThreshold?: number;
|
||||
maxTypeRatio?: number;
|
||||
maxPerPage?: number;
|
||||
},
|
||||
): SearchResult[] {
|
||||
const threshold = opts?.cosineThreshold ?? COSINE_DEDUP_THRESHOLD;
|
||||
const maxRatio = opts?.maxTypeRatio ?? MAX_TYPE_RATIO;
|
||||
const maxPerPage = opts?.maxPerPage ?? MAX_PER_PAGE;
|
||||
|
||||
let deduped = results;
|
||||
|
||||
// Layer 1: By source (one chunk per page with highest score)
|
||||
deduped = dedupBySource(deduped);
|
||||
|
||||
// Layer 2: By text overlap (Jaccard similarity on word sets)
|
||||
// (We don't have embeddings for results here, so use text similarity as proxy)
|
||||
deduped = dedupByTextSimilarity(deduped, threshold);
|
||||
|
||||
// Layer 3: By type distribution
|
||||
deduped = enforceTypeDiversity(deduped, maxRatio);
|
||||
|
||||
// Layer 4: By page cap (a no-op while Layer 1 keeps one chunk per page and maxPerPage >= 1)
|
||||
deduped = capPerPage(deduped, maxPerPage);
|
||||
|
||||
return deduped;
|
||||
}
|
||||
|
||||
/**
|
||||
* Layer 1: Keep only the highest-scoring chunk per page.
|
||||
*/
|
||||
function dedupBySource(results: SearchResult[]): SearchResult[] {
|
||||
const byPage = new Map<string, SearchResult>();
|
||||
|
||||
for (const r of results) {
|
||||
const existing = byPage.get(r.slug);
|
||||
if (!existing || r.score > existing.score) {
|
||||
byPage.set(r.slug, r);
|
||||
}
|
||||
}
|
||||
|
||||
return Array.from(byPage.values()).sort((a, b) => b.score - a.score);
|
||||
}
|
||||
|
||||
/**
|
||||
* Layer 2: Remove chunks that are too similar to already-kept results.
|
||||
* Uses Jaccard similarity on word sets as a proxy for cosine similarity.
|
||||
*/
|
||||
function dedupByTextSimilarity(results: SearchResult[], threshold: number): SearchResult[] {
|
||||
const kept: SearchResult[] = [];
|
||||
|
||||
for (const r of results) {
|
||||
const rWords = new Set(r.chunk_text.toLowerCase().split(/\s+/));
|
||||
let tooSimilar = false;
|
||||
|
||||
for (const k of kept) {
|
||||
const kWords = new Set(k.chunk_text.toLowerCase().split(/\s+/));
|
||||
const intersection = new Set([...rWords].filter(w => kWords.has(w)));
|
||||
const union = new Set([...rWords, ...kWords]);
|
||||
const jaccard = intersection.size / union.size;
|
||||
|
||||
if (jaccard > threshold) {
|
||||
tooSimilar = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (!tooSimilar) {
|
||||
kept.push(r);
|
||||
}
|
||||
}
|
||||
|
||||
return kept;
|
||||
}
|
||||
|
||||
/**
|
||||
* Layer 3: No page type exceeds maxRatio of total results.
|
||||
*/
|
||||
function enforceTypeDiversity(results: SearchResult[], maxRatio: number): SearchResult[] {
|
||||
const maxPerType = Math.max(1, Math.ceil(results.length * maxRatio));
|
||||
const typeCounts = new Map<string, number>();
|
||||
const kept: SearchResult[] = [];
|
||||
|
||||
for (const r of results) {
|
||||
const count = typeCounts.get(r.type) || 0;
|
||||
if (count < maxPerType) {
|
||||
kept.push(r);
|
||||
typeCounts.set(r.type, count + 1);
|
||||
}
|
||||
}
|
||||
|
||||
return kept;
|
||||
}
|
||||
|
||||
/**
|
||||
* Layer 4: Cap chunks per page.
|
||||
*/
|
||||
function capPerPage(results: SearchResult[], maxPerPage: number): SearchResult[] {
|
||||
const pageCounts = new Map<string, number>();
|
||||
const kept: SearchResult[] = [];
|
||||
|
||||
for (const r of results) {
|
||||
const count = pageCounts.get(r.slug) || 0;
|
||||
if (count < maxPerPage) {
|
||||
kept.push(r);
|
||||
pageCounts.set(r.slug, count + 1);
|
||||
}
|
||||
}
|
||||
|
||||
return kept;
|
||||
}
|
||||
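Layer 2 of the pipeline above compares chunks by Jaccard similarity on lowercased word sets: the number of shared words divided by the number of distinct words across both texts. The sketch below isolates that calculation so the 0.85 threshold is easier to reason about. The function name `jaccard` is hypothetical and is not part of this commit.

```typescript
// Hypothetical standalone version of the word-set comparison used in Layer 2.
function jaccard(a: string, b: string): number {
  const aWords = new Set(a.toLowerCase().split(/\s+/));
  const bWords = new Set(b.toLowerCase().split(/\s+/));

  // Count words present in both sets.
  let shared = 0;
  aWords.forEach((w) => {
    if (bWords.has(w)) shared++;
  });

  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const unionSize = aWords.size + bWords.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// "the quick brown fox" vs "the quick red fox":
// shared = {the, quick, fox} = 3, union = 5, so the similarity is 3 / 5 = 0.6,
// which is below the 0.85 threshold and would keep both chunks.
const example = jaccard('the quick brown fox', 'the quick red fox');
```

Because the measure works on word sets, word order and repetition are ignored, so it only approximates what an embedding-based cosine comparison would report.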
85
src/core/search/expansion.ts
Normal file
@@ -0,0 +1,85 @@
|
||||
/**
|
||||
* Multi-Query Expansion via Claude Haiku
|
||||
* Ported from production Ruby implementation (query_expansion_service.rb, 69 LOC)
|
||||
*
|
||||
* Skip queries < 3 words.
|
||||
* Generate 2 alternative phrasings via tool use.
|
||||
* Return original + alternatives (max 3 total).
|
||||
*/
|
||||
|
||||
import Anthropic from '@anthropic-ai/sdk';
|
||||
|
||||
const MAX_QUERIES = 3;
|
||||
const MIN_WORDS = 3;
|
||||
|
||||
let anthropicClient: Anthropic | null = null;
|
||||
|
||||
function getClient(): Anthropic {
|
||||
if (!anthropicClient) {
|
||||
anthropicClient = new Anthropic();
|
||||
}
|
||||
return anthropicClient;
|
||||
}
|
||||
|
||||
export async function expandQuery(query: string): Promise<string[]> {
|
||||
const wordCount = (query.match(/\S+/g) || []).length;
|
||||
if (wordCount < MIN_WORDS) return [query];
|
||||
|
||||
try {
|
||||
const alternatives = await callHaikuForExpansion(query);
|
||||
const all = [query, ...alternatives];
|
||||
// Deduplicate
|
||||
const unique = [...new Set(all.map(q => q.toLowerCase().trim()))];
|
||||
return unique.slice(0, MAX_QUERIES).map(q =>
|
||||
all.find(orig => orig.toLowerCase().trim() === q) || q,
|
||||
);
|
||||
} catch {
|
||||
return [query];
|
||||
}
|
||||
}
|
||||
|
||||
async function callHaikuForExpansion(query: string): Promise<string[]> {
|
||||
const response = await getClient().messages.create({
|
||||
model: 'claude-haiku-4-5-20251001',
|
||||
max_tokens: 300,
|
||||
tools: [
|
||||
{
|
||||
name: 'expand_query',
|
||||
description: 'Generate alternative phrasings of a search query to improve recall',
|
||||
input_schema: {
|
||||
type: 'object' as const,
|
||||
properties: {
|
||||
alternative_queries: {
|
||||
type: 'array',
|
||||
items: { type: 'string' },
|
||||
description: '2 alternative phrasings of the original query, each approaching the topic from a different angle',
|
||||
},
|
||||
},
|
||||
required: ['alternative_queries'],
|
||||
},
|
||||
},
|
||||
],
|
||||
tool_choice: { type: 'tool', name: 'expand_query' },
|
||||
messages: [
|
||||
{
|
||||
role: 'user',
|
||||
content: `Generate 2 alternative search queries that would find relevant results for this question. Each alternative should approach the topic from a different angle or use different terminology.
|
||||
|
||||
Original query: "${query}"`,
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
// Extract tool use result
|
||||
for (const block of response.content) {
|
||||
if (block.type === 'tool_use' && block.name === 'expand_query') {
|
||||
const input = block.input as { alternative_queries?: unknown };
|
||||
const alts = input.alternative_queries;
|
||||
if (Array.isArray(alts)) {
|
||||
return alts.map(String).slice(0, 2);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return [];
|
||||
}
|
||||
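`expandQuery` above gates on word count, then merges the original query with the generated alternatives, drops case-insensitive duplicates, and caps the list at three entries. The sketch below shows that merge step without any API call, so it can be tested offline. The names `wordCount` and `uniqueQueries` are hypothetical and are not part of this commit, and this version also skips empty strings, which is an added assumption.

```typescript
// Hypothetical helper: count whitespace-separated words, as the gate in expandQuery does.
function wordCount(query: string): number {
  return (query.match(/\S+/g) || []).length;
}

// Hypothetical helper: keep the first occurrence of each query (compared
// case-insensitively after trimming), preserving original casing and order.
function uniqueQueries(queries: string[], max: number): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const q of queries) {
    if (out.length >= max) break;
    const key = q.toLowerCase().trim();
    if (key.length === 0 || seen.has(key)) continue; // assumption: drop blanks
    seen.add(key);
    out.push(q);
  }
  return out;
}

const merged = uniqueQueries(
  ['Who funded Acme', 'who funded acme ', 'Acme investors', 'Acme backers', 'Acme seed round'],
  3,
);
// merged is ['Who funded Acme', 'Acme investors', 'Acme backers']
```

Keeping the original first means the user's own phrasing always survives the cap, whatever the model returns.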
86
src/core/search/hybrid.ts
Normal file
@@ -0,0 +1,86 @@
|
||||
/**
|
||||
* Hybrid Search with Reciprocal Rank Fusion (RRF)
|
||||
* Ported from production Ruby implementation (content_chunk.rb)
|
||||
*
|
||||
* RRF score = sum(1 / (60 + rank_in_list))
|
||||
* Merges vector + keyword results fairly regardless of score scale.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from '../engine.ts';
|
||||
import type { SearchResult, SearchOpts } from '../types.ts';
|
||||
import { embed } from '../embedding.ts';
|
||||
import { dedupResults } from './dedup.ts';
|
||||
|
||||
const RRF_K = 60;
|
||||
|
||||
export interface HybridSearchOpts extends SearchOpts {
|
||||
expansion?: boolean;
|
||||
expandFn?: (query: string) => Promise<string[]>;
|
||||
}
|
||||
|
||||
export async function hybridSearch(
|
||||
engine: BrainEngine,
|
||||
query: string,
|
||||
opts?: HybridSearchOpts,
|
||||
): Promise<SearchResult[]> {
|
||||
const limit = opts?.limit || 20;
|
||||
|
||||
// Determine query variants (optionally with expansion)
|
||||
let queries = [query];
|
||||
if (opts?.expansion && opts?.expandFn) {
|
||||
try {
|
||||
const expanded = await opts.expandFn(query);
|
||||
// expandFn may already return the original query first, so drop exact duplicates
queries = [...new Set([query, ...expanded])].slice(0, 3);
|
||||
} catch {
|
||||
// Expansion failure is non-fatal
|
||||
}
|
||||
}
|
||||
|
||||
// Embed all query variants
|
||||
const embeddings = await Promise.all(queries.map(q => embed(q)));
|
||||
|
||||
// Run vector search for each embedding
|
||||
const vectorLists = await Promise.all(
|
||||
embeddings.map(emb => engine.searchVector(emb, { limit: limit * 2 })),
|
||||
);
|
||||
|
||||
// Run keyword search (only the original query)
|
||||
const keywordResults = await engine.searchKeyword(query, { limit: limit * 2 });
|
||||
|
||||
// Merge all result lists via RRF
|
||||
const allLists = [...vectorLists, keywordResults];
|
||||
const fused = rrfFusion(allLists);
|
||||
|
||||
// Dedup
|
||||
const deduped = dedupResults(fused);
|
||||
|
||||
return deduped.slice(0, limit);
|
||||
}
|
||||
|
||||
/**
|
||||
* Reciprocal Rank Fusion: merge multiple ranked lists.
|
||||
* Each result gets score = sum(1 / (K + rank)) across all lists it appears in.
|
||||
*/
|
||||
function rrfFusion(lists: SearchResult[][]): SearchResult[] {
|
||||
const scores = new Map<string, { result: SearchResult; score: number }>();
|
||||
|
||||
for (const list of lists) {
|
||||
for (let rank = 0; rank < list.length; rank++) {
|
||||
const r = list[rank];
|
||||
const key = `${r.slug}:${r.chunk_text.slice(0, 50)}`;
|
||||
const existing = scores.get(key);
|
||||
const rrfScore = 1 / (RRF_K + rank);
|
||||
|
||||
if (existing) {
|
||||
existing.score += rrfScore;
|
||||
} else {
|
||||
scores.set(key, { result: r, score: rrfScore });
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Sort by fused score descending
|
||||
return Array.from(scores.values())
|
||||
.sort((a, b) => b.score - a.score)
|
||||
.map(({ result, score }) => ({ ...result, score }));
|
||||
}
|
||||
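Reciprocal Rank Fusion, as used in `rrfFusion` above, ignores raw scores and sums `1 / (K + rank)` over every list an item appears in. The sketch below applies the same formula to plain string IDs so the arithmetic is easy to follow. It uses zero-based ranks like the code above. The function name `rrfFuse` and the sample IDs are hypothetical and are not part of this commit.

```typescript
const RRF_K = 60;

// Hypothetical reduced version of RRF over string IDs.
// Returns [id, fusedScore] pairs sorted by fused score, highest first.
function rrfFuse(lists: string[][], k: number = RRF_K): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      // rank is zero-based, so the top item in a list contributes 1 / k
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Two vector lists and one keyword list:
//   b: 1/61 + 1/60 + 1/60   a: 1/60 + 1/61   d: 1/61   c: 1/62
const fusedOrder = rrfFuse([
  ['a', 'b', 'c'],
  ['b', 'a'],
  ['b', 'd'],
]).map(([id]) => id);
// fusedOrder is ['b', 'a', 'd', 'c']
```

An item that appears in several lists at a middling rank can outrank one that tops a single list, which is why the fusion works across score scales that are not comparable.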
10
src/core/search/keyword.ts
Normal file
@@ -0,0 +1,10 @@
|
||||
import type { BrainEngine } from '../engine.ts';
|
||||
import type { SearchResult, SearchOpts } from '../types.ts';
|
||||
|
||||
export async function keywordSearch(
|
||||
engine: BrainEngine,
|
||||
query: string,
|
||||
opts?: SearchOpts,
|
||||
): Promise<SearchResult[]> {
|
||||
return engine.searchKeyword(query, opts);
|
||||
}
|
||||
10
src/core/search/vector.ts
Normal file
@@ -0,0 +1,10 @@
|
||||
import type { BrainEngine } from '../engine.ts';
|
||||
import type { SearchResult, SearchOpts } from '../types.ts';
|
||||
|
||||
export async function vectorSearch(
|
||||
engine: BrainEngine,
|
||||
embedding: Float32Array,
|
||||
opts?: SearchOpts,
|
||||
): Promise<SearchResult[]> {
|
||||
return engine.searchVector(embedding, opts);
|
||||
}
|
||||
183
src/core/types.ts
Normal file
@@ -0,0 +1,183 @@
|
||||
// Page types
|
||||
export type PageType = 'person' | 'company' | 'deal' | 'yc' | 'civic' | 'project' | 'concept' | 'source' | 'media';
|
||||
|
||||
export interface Page {
|
||||
id: number;
|
||||
slug: string;
|
||||
type: PageType;
|
||||
title: string;
|
||||
compiled_truth: string;
|
||||
timeline: string;
|
||||
frontmatter: Record<string, unknown>;
|
||||
content_hash?: string;
|
||||
created_at: Date;
|
||||
updated_at: Date;
|
||||
}
|
||||
|
||||
export interface PageInput {
|
||||
type: PageType;
|
||||
title: string;
|
||||
compiled_truth: string;
|
||||
timeline?: string;
|
||||
frontmatter?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
export interface PageFilters {
|
||||
type?: PageType;
|
||||
tag?: string;
|
||||
limit?: number;
|
||||
offset?: number;
|
||||
}
|
||||
|
||||
// Chunks
|
||||
export interface Chunk {
|
||||
id: number;
|
||||
page_id: number;
|
||||
chunk_index: number;
|
||||
chunk_text: string;
|
||||
chunk_source: 'compiled_truth' | 'timeline';
|
||||
embedding: Float32Array | null;
|
||||
model: string;
|
||||
token_count: number | null;
|
||||
embedded_at: Date | null;
|
||||
}
|
||||
|
||||
export interface ChunkInput {
|
||||
chunk_index: number;
|
||||
chunk_text: string;
|
||||
chunk_source: 'compiled_truth' | 'timeline';
|
||||
embedding?: Float32Array;
|
||||
model?: string;
|
||||
token_count?: number;
|
||||
}
|
||||
|
||||
// Search
|
||||
export interface SearchResult {
|
||||
slug: string;
|
||||
page_id: number;
|
||||
title: string;
|
||||
type: PageType;
|
||||
chunk_text: string;
|
||||
chunk_source: 'compiled_truth' | 'timeline';
|
||||
score: number;
|
||||
stale: boolean;
|
||||
}
|
||||
|
||||
export interface SearchOpts {
|
||||
limit?: number;
|
||||
type?: PageType;
|
||||
exclude_slugs?: string[];
|
||||
}
|
||||
|
||||
// Links
|
||||
export interface Link {
|
||||
from_slug: string;
|
||||
to_slug: string;
|
||||
link_type: string;
|
||||
context: string;
|
||||
}
|
||||
|
||||
export interface GraphNode {
|
||||
slug: string;
|
||||
title: string;
|
||||
type: PageType;
|
||||
depth: number;
|
||||
links: { to_slug: string; link_type: string }[];
|
||||
}
|
||||
|
||||
// Timeline
|
||||
export interface TimelineEntry {
|
||||
id: number;
|
||||
page_id: number;
|
||||
date: string;
|
||||
source: string;
|
||||
summary: string;
|
||||
detail: string;
|
||||
created_at: Date;
|
||||
}
|
||||
|
||||
export interface TimelineInput {
|
||||
  date: string;
  source?: string;
  summary: string;
  detail?: string;
}

export interface TimelineOpts {
  limit?: number;
  after?: string;
  before?: string;
}

// Raw data
export interface RawData {
  source: string;
  data: Record<string, unknown>;
  fetched_at: Date;
}

// Versions
export interface PageVersion {
  id: number;
  page_id: number;
  compiled_truth: string;
  frontmatter: Record<string, unknown>;
  snapshot_at: Date;
}

// Stats + Health
export interface BrainStats {
  page_count: number;
  chunk_count: number;
  embedded_count: number;
  link_count: number;
  tag_count: number;
  timeline_entry_count: number;
  pages_by_type: Record<string, number>;
}

export interface BrainHealth {
  page_count: number;
  embed_coverage: number;
  stale_pages: number;
  orphan_pages: number;
  dead_links: number;
  missing_embeddings: number;
}

// Ingest log
export interface IngestLogEntry {
  id: number;
  source_type: string;
  source_ref: string;
  pages_updated: string[];
  summary: string;
  created_at: Date;
}

export interface IngestLogInput {
  source_type: string;
  source_ref: string;
  pages_updated: string[];
  summary: string;
}

// Config
export interface EngineConfig {
  database_url?: string;
  database_path?: string;
  engine?: 'postgres' | 'sqlite';
}

// Errors
export class GBrainError extends Error {
  constructor(
    public problem: string,
    public cause_description: string,
    public fix: string,
    public docs_url?: string,
  ) {
    super(`${problem}: ${cause_description}. Fix: ${fix}`);
    this.name = 'GBrainError';
  }
}
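For illustration, a minimal sketch of how a caller might raise and handle `GBrainError`. The class is repeated here with explicit fields so the snippet runs on its own. `requireDatabaseUrl`, `describeFailure`, and the message texts are hypothetical and not part of this commit.

```typescript
// Same shape as GBrainError in the diff above, written with explicit fields
// so the sketch is self-contained.
class GBrainError extends Error {
  problem: string;
  cause_description: string;
  fix: string;
  docs_url?: string;

  constructor(problem: string, cause_description: string, fix: string, docs_url?: string) {
    super(`${problem}: ${cause_description}. Fix: ${fix}`);
    this.name = 'GBrainError';
    this.problem = problem;
    this.cause_description = cause_description;
    this.fix = fix;
    this.docs_url = docs_url;
  }
}

// Hypothetical helper: fails with a structured error when no database URL is configured.
function requireDatabaseUrl(config: { database_url?: string }): string {
  if (!config.database_url) {
    throw new GBrainError(
      'Cannot connect to database',
      'database_url is not set',
      'Set database_url in the engine config',
    );
  }
  return config.database_url;
}

// Callers can branch on the structured fields instead of parsing the message.
function describeFailure(e: unknown): string {
  if (e instanceof GBrainError) return `${e.problem} (fix: ${e.fix})`;
  return e instanceof Error ? e.message : String(e);
}
```

Keeping `problem`, `cause_description`, and `fix` as separate fields lets a CLI or MCP layer format them differently while the combined `message` stays readable in logs.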
220
src/mcp/server.ts
Normal file
@@ -0,0 +1,220 @@
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import type { BrainEngine } from '../core/engine.ts';
import { parseMarkdown, serializeMarkdown } from '../core/markdown.ts';
import { hybridSearch } from '../core/search/hybrid.ts';
import { expandQuery } from '../core/search/expansion.ts';
import { chunkText } from '../core/chunkers/recursive.ts';
import { embedBatch } from '../core/embedding.ts';
import type { ChunkInput } from '../core/types.ts';

export async function startMcpServer(engine: BrainEngine) {
  const server = new Server(
    { name: 'gbrain', version: '0.1.0' },
    { capabilities: { tools: {} } },
  );

  // setRequestHandler expects the SDK's request schema objects, not method-name strings.
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: getToolDefinitions(),
  }));

  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    const { name, arguments: params } = request.params;
    try {
      const result = await handleToolCall(engine, name, params || {});
      return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
    } catch (e: unknown) {
      // Report failures as a tool-level error result rather than crashing the server.
      const msg = e instanceof Error ? e.message : String(e);
      return { content: [{ type: 'text', text: `Error: ${msg}` }], isError: true };
    }
  });

  const transport = new StdioServerTransport();
  await server.connect(transport);
}

export async function handleToolCall(
  engine: BrainEngine,
  tool: string,
  params: Record<string, unknown>,
): Promise<unknown> {
  switch (tool) {
    case 'get_page': {
      const slug = params.slug as string;
      const page = await engine.getPage(slug);
      if (!page) return { error: `Page not found: ${slug}` };
      const tags = await engine.getTags(slug);
      return { ...page, tags };
    }

    case 'put_page': {
      const slug = params.slug as string;
      const content = params.content as string;
      const parsed = parseMarkdown(content, slug + '.md');

      const existing = await engine.getPage(slug);
      if (existing) await engine.createVersion(slug);

      const page = await engine.putPage(slug, {
        type: parsed.type,
        title: parsed.title,
        compiled_truth: parsed.compiled_truth,
        timeline: parsed.timeline,
        frontmatter: parsed.frontmatter,
      });

      for (const tag of parsed.tags) await engine.addTag(slug, tag);

      // Chunk and embed
      const chunks: ChunkInput[] = [];
      if (parsed.compiled_truth.trim()) {
        for (const c of chunkText(parsed.compiled_truth)) {
          chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'compiled_truth' });
        }
      }
      if (parsed.timeline.trim()) {
        for (const c of chunkText(parsed.timeline)) {
          chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'timeline' });
        }
      }
      if (chunks.length > 0) {
        try {
          const embeddings = await embedBatch(chunks.map(c => c.chunk_text));
          for (let i = 0; i < chunks.length; i++) {
            chunks[i].embedding = embeddings[i];
          }
        } catch { /* non-fatal */ }
        await engine.upsertChunks(slug, chunks);
      }

      return { slug: page.slug, status: existing ? 'updated' : 'created' };
    }

    case 'delete_page': {
      await engine.deletePage(params.slug as string);
      return { status: 'deleted' };
    }

    case 'list_pages': {
      const pages = await engine.listPages({
        type: params.type as any,
        tag: params.tag as string,
        limit: (params.limit as number) || 50,
      });
      return pages.map(p => ({ slug: p.slug, type: p.type, title: p.title, updated_at: p.updated_at }));
    }

    case 'search': {
      return engine.searchKeyword(params.query as string, { limit: (params.limit as number) || 20 });
    }

    case 'query': {
      return hybridSearch(engine, params.query as string, {
        limit: (params.limit as number) || 20,
        expansion: true,
        expandFn: expandQuery,
      });
    }

    case 'add_tag': {
      await engine.addTag(params.slug as string, params.tag as string);
      return { status: 'ok' };
    }

    case 'remove_tag': {
      await engine.removeTag(params.slug as string, params.tag as string);
      return { status: 'ok' };
    }

    case 'get_tags': {
      return engine.getTags(params.slug as string);
    }

    case 'add_link': {
      await engine.addLink(
        params.from as string,
        params.to as string,
        params.context as string || '',
        params.link_type as string || '',
      );
      return { status: 'ok' };
    }

    case 'remove_link': {
      await engine.removeLink(params.from as string, params.to as string);
      return { status: 'ok' };
    }

    case 'get_links': {
      return engine.getLinks(params.slug as string);
    }

    case 'get_backlinks': {
      return engine.getBacklinks(params.slug as string);
    }

    case 'traverse_graph': {
      return engine.traverseGraph(params.slug as string, (params.depth as number) || 5);
    }

    case 'add_timeline_entry': {
      await engine.addTimelineEntry(params.slug as string, {
        date: params.date as string,
        source: params.source as string || '',
        summary: params.summary as string,
        detail: params.detail as string || '',
      });
      return { status: 'ok' };
    }

    case 'get_timeline': {
      return engine.getTimeline(params.slug as string);
    }

    case 'get_stats': {
      return engine.getStats();
    }

    case 'get_health': {
      return engine.getHealth();
    }

    case 'get_versions': {
      return engine.getVersions(params.slug as string);
    }

    case 'revert_version': {
      await engine.createVersion(params.slug as string);
      await engine.revertToVersion(params.slug as string, params.version_id as number);
      return { status: 'reverted' };
    }

    default:
      throw new Error(`Unknown tool: ${tool}`);
  }
}

function getToolDefinitions() {
  return [
    { name: 'get_page', description: 'Read a page by slug', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'put_page', description: 'Write/update a page (markdown with frontmatter)', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, content: { type: 'string', description: 'Full markdown content with YAML frontmatter' } }, required: ['slug', 'content'] } },
    { name: 'delete_page', description: 'Delete a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'list_pages', description: 'List pages with optional filters', inputSchema: { type: 'object', properties: { type: { type: 'string' }, tag: { type: 'string' }, limit: { type: 'number' } } } },
    { name: 'search', description: 'Keyword search using full-text search', inputSchema: { type: 'object', properties: { query: { type: 'string' }, limit: { type: 'number' } }, required: ['query'] } },
    { name: 'query', description: 'Hybrid search with vector + keyword + multi-query expansion', inputSchema: { type: 'object', properties: { query: { type: 'string' }, limit: { type: 'number' } }, required: ['query'] } },
    { name: 'add_tag', description: 'Add tag to page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, tag: { type: 'string' } }, required: ['slug', 'tag'] } },
    { name: 'remove_tag', description: 'Remove tag from page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, tag: { type: 'string' } }, required: ['slug', 'tag'] } },
    { name: 'get_tags', description: 'List tags for a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'add_link', description: 'Create link between pages', inputSchema: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' }, link_type: { type: 'string' }, context: { type: 'string' } }, required: ['from', 'to'] } },
    { name: 'remove_link', description: 'Remove link between pages', inputSchema: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' } }, required: ['from', 'to'] } },
    { name: 'get_links', description: 'List outgoing links from a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'get_backlinks', description: 'List incoming links to a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'traverse_graph', description: 'Traverse link graph from a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, depth: { type: 'number', description: 'Max traversal depth (default 5)' } }, required: ['slug'] } },
    { name: 'add_timeline_entry', description: 'Add timeline entry to a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, date: { type: 'string' }, summary: { type: 'string' }, detail: { type: 'string' }, source: { type: 'string' } }, required: ['slug', 'date', 'summary'] } },
    { name: 'get_timeline', description: 'Get timeline entries for a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'get_stats', description: 'Brain statistics (page count, chunk count, etc.)', inputSchema: { type: 'object', properties: {} } },
    { name: 'get_health', description: 'Brain health dashboard (embed coverage, stale pages, orphans)', inputSchema: { type: 'object', properties: {} } },
    { name: 'get_versions', description: 'Page version history', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
    { name: 'revert_version', description: 'Revert page to a previous version', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, version_id: { type: 'number' } }, required: ['slug', 'version_id'] } },
  ];
}
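`handleToolCall` reads parameters with casts such as `params.slug as string`, which TypeScript does not check at runtime. One possible hardening is sketched below, on the assumption that rejecting malformed input early is desirable. `requireString` and `optionalNumber` are hypothetical helpers, not part of this commit.

```typescript
type Params = Record<string, unknown>;

// Returns params[key] if it is a non-empty string, otherwise throws.
function requireString(params: Params, key: string): string {
  const value = params[key];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing or invalid string parameter: ${key}`);
  }
  return value;
}

// Returns params[key] if it is a finite number, the fallback if it is absent,
// and throws for any other type.
function optionalNumber(params: Params, key: string, fallback: number): number {
  const value = params[key];
  if (value === undefined || value === null) return fallback;
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new Error(`Invalid number parameter: ${key}`);
  }
  return value;
}

// Example: how the 'search' case could read its arguments.
function readSearchArgs(params: Params): { query: string; limit: number } {
  return { query: requireString(params, 'query'), limit: optionalNumber(params, 'limit', 20) };
}
```

Because the `tools/call` handler already converts thrown errors into an `isError` result, a thrown validation error would reach the client as a readable message. Note that, unlike `(params.limit as number) || 20`, this sketch keeps an explicit `0` rather than replacing it with the default.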
195
src/schema.sql
Normal file
@@ -0,0 +1,195 @@
-- GBrain Postgres + pgvector schema

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- ============================================================
-- pages: the core content table
-- ============================================================
CREATE TABLE IF NOT EXISTS pages (
  id SERIAL PRIMARY KEY,
  slug TEXT NOT NULL UNIQUE,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  compiled_truth TEXT NOT NULL DEFAULT '',
  timeline TEXT NOT NULL DEFAULT '',
  frontmatter JSONB NOT NULL DEFAULT '{}',
  content_hash TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_pages_type ON pages(type);
CREATE INDEX IF NOT EXISTS idx_pages_frontmatter ON pages USING GIN(frontmatter);
CREATE INDEX IF NOT EXISTS idx_pages_trgm ON pages USING GIN(title gin_trgm_ops);

-- ============================================================
-- content_chunks: chunked content with embeddings
-- ============================================================
CREATE TABLE IF NOT EXISTS content_chunks (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  chunk_text TEXT NOT NULL,
  chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
  -- 1536 dimensions: text-embedding-3-large returns 3072 by default, so this
  -- assumes the embedding request asks for 1536 via the `dimensions` parameter.
  embedding vector(1536),
  model TEXT NOT NULL DEFAULT 'text-embedding-3-large',
  token_count INTEGER,
  embedded_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_chunks_page ON content_chunks(page_id);
CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (embedding vector_cosine_ops);

-- ============================================================
-- links: cross-references between pages
-- ============================================================
CREATE TABLE IF NOT EXISTS links (
  id SERIAL PRIMARY KEY,
  from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  link_type TEXT NOT NULL DEFAULT '',
  context TEXT NOT NULL DEFAULT '',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(from_page_id, to_page_id)
);

CREATE INDEX IF NOT EXISTS idx_links_from ON links(from_page_id);
CREATE INDEX IF NOT EXISTS idx_links_to ON links(to_page_id);

-- ============================================================
-- tags
-- ============================================================
CREATE TABLE IF NOT EXISTS tags (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  tag TEXT NOT NULL,
  UNIQUE(page_id, tag)
);

CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
CREATE INDEX IF NOT EXISTS idx_tags_page_id ON tags(page_id);

-- ============================================================
-- raw_data: sidecar data (replaces .raw/ JSON files)
-- ============================================================
CREATE TABLE IF NOT EXISTS raw_data (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  source TEXT NOT NULL,
  data JSONB NOT NULL,
  fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(page_id, source)
);

CREATE INDEX IF NOT EXISTS idx_raw_data_page ON raw_data(page_id);

-- ============================================================
-- timeline_entries: structured timeline
-- ============================================================
CREATE TABLE IF NOT EXISTS timeline_entries (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  date DATE NOT NULL,
  source TEXT NOT NULL DEFAULT '',
  summary TEXT NOT NULL,
  detail TEXT NOT NULL DEFAULT '',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX IF NOT EXISTS idx_timeline_date ON timeline_entries(date);

-- ============================================================
-- page_versions: snapshot history for compiled_truth
-- ============================================================
CREATE TABLE IF NOT EXISTS page_versions (
  id SERIAL PRIMARY KEY,
  page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
  compiled_truth TEXT NOT NULL,
  frontmatter JSONB NOT NULL DEFAULT '{}',
  snapshot_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_versions_page ON page_versions(page_id);

-- ============================================================
-- ingest_log
-- ============================================================
CREATE TABLE IF NOT EXISTS ingest_log (
  id SERIAL PRIMARY KEY,
  source_type TEXT NOT NULL,
  source_ref TEXT NOT NULL,
  pages_updated JSONB NOT NULL DEFAULT '[]',
  summary TEXT NOT NULL DEFAULT '',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ============================================================
-- config: brain-level settings
-- ============================================================
CREATE TABLE IF NOT EXISTS config (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL
);

INSERT INTO config (key, value) VALUES
  ('version', '1'),
  ('embedding_model', 'text-embedding-3-large'),
  ('embedding_dimensions', '1536'),
  ('chunk_strategy', 'semantic')
ON CONFLICT (key) DO NOTHING;

-- ============================================================
-- Trigger-based search_vector (spans pages + timeline_entries)
-- ============================================================
ALTER TABLE pages ADD COLUMN IF NOT EXISTS search_vector tsvector;

CREATE INDEX IF NOT EXISTS idx_pages_search ON pages USING GIN(search_vector);

-- Function to rebuild search_vector for a page
CREATE OR REPLACE FUNCTION update_page_search_vector() RETURNS trigger AS $$
DECLARE
  timeline_text TEXT;
BEGIN
  -- Gather timeline_entries text for this page
  SELECT coalesce(string_agg(summary || ' ' || detail, ' '), '')
    INTO timeline_text
    FROM timeline_entries
   WHERE page_id = NEW.id;

  -- Build weighted tsvector
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.compiled_truth, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.timeline, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(timeline_text, '')), 'C');

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_pages_search_vector ON pages;
CREATE TRIGGER trg_pages_search_vector
  BEFORE INSERT OR UPDATE ON pages
  FOR EACH ROW
  EXECUTE FUNCTION update_page_search_vector();

-- When timeline_entries change, update the parent page's search_vector
CREATE OR REPLACE FUNCTION update_page_search_vector_from_timeline() RETURNS trigger AS $$
BEGIN
  -- Touch the page to re-fire its trigger. NEW is null on DELETE and OLD is
  -- null on INSERT (PostgreSQL 11+); on UPDATE both the old and the new parent
  -- page are touched in case page_id changed.
  UPDATE pages SET updated_at = now()
   WHERE id IN (NEW.page_id, OLD.page_id);
  -- The return value of an AFTER row trigger is ignored.
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_timeline_search_vector ON timeline_entries;
CREATE TRIGGER trg_timeline_search_vector
  AFTER INSERT OR UPDATE OR DELETE ON timeline_entries
  FOR EACH ROW
  EXECUTE FUNCTION update_page_search_vector_from_timeline();
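The weighted `search_vector` maintained by the triggers above can be queried with the standard Postgres full-text functions `websearch_to_tsquery` and `ts_rank`. The sketch below builds such a query as parameterized SQL. `buildKeywordQuery` and `SqlQuery` are hypothetical names, and the actual `searchKeyword` implementation in this commit may look different.

```typescript
// A parameterized query: SQL text with $1, $2 placeholders plus their values.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Builds a ranked full-text query against pages.search_vector.
// The search string is always passed as a bound parameter, never interpolated.
function buildKeywordQuery(query: string, limit = 20): SqlQuery {
  const text = [
    'SELECT slug, title,',
    "       ts_rank(search_vector, websearch_to_tsquery('english', $1)) AS score",
    'FROM pages',
    "WHERE search_vector @@ websearch_to_tsquery('english', $1)",
    'ORDER BY score DESC',
    'LIMIT $2',
  ].join('\n');
  return { text, values: [query, limit] };
}

const example = buildKeywordQuery('unscalable things', 5);
console.log(example.text);
console.log(example.values);
```

With `ts_rank`'s default weights, matches in weight-A lexemes (the title) count more than weight-B (`compiled_truth`) or weight-C (timeline text), which is the point of the `setweight` calls in the trigger.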
77
test/chunkers/recursive.test.ts
Normal file
@@ -0,0 +1,77 @@
import { describe, test, expect } from 'bun:test';
import { chunkText } from '../../src/core/chunkers/recursive.ts';

describe('Recursive Text Chunker', () => {
  test('returns empty array for empty input', () => {
    expect(chunkText('')).toEqual([]);
    expect(chunkText(' ')).toEqual([]);
  });

  test('returns single chunk for short text', () => {
    const text = 'Hello world. This is a short text.';
    const chunks = chunkText(text);
    expect(chunks).toHaveLength(1);
    expect(chunks[0].text).toBe(text.trim());
    expect(chunks[0].index).toBe(0);
  });

  test('splits at paragraph boundaries', () => {
    const paragraph = 'word '.repeat(200).trim();
    const text = paragraph + '\n\n' + paragraph;
    const chunks = chunkText(text, { chunkSize: 250 });
    expect(chunks.length).toBeGreaterThanOrEqual(2);
  });

  test('respects chunk size target', () => {
    const text = 'word '.repeat(1000).trim();
    const chunks = chunkText(text, { chunkSize: 100 });
    for (const chunk of chunks) {
      const wordCount = chunk.text.split(/\s+/).length;
      // Allow up to 1.5x target due to greedy merge
      expect(wordCount).toBeLessThanOrEqual(150);
    }
  });

  test('applies overlap between chunks', () => {
    const text = 'word '.repeat(1000).trim();
    const chunks = chunkText(text, { chunkSize: 100, chunkOverlap: 20 });
    expect(chunks.length).toBeGreaterThan(1);
    // Second chunk should start with words from end of first chunk
    // (overlap means shared content between adjacent chunks)
    expect(chunks[1].text.length).toBeGreaterThan(0);
  });

  test('splits at sentence boundaries', () => {
    const sentences = Array.from({ length: 50 }, (_, i) =>
      `This is sentence number ${i} with some content about topic ${i}.`
    ).join(' ');
    const chunks = chunkText(sentences, { chunkSize: 50 });
    expect(chunks.length).toBeGreaterThan(1);
    // Each chunk should end near a sentence boundary
    for (const chunk of chunks.slice(0, -1)) {
      // Allow for overlap text, but the core content should have sentence endings
      expect(chunk.text).toMatch(/[.!?]/);
    }
  });

  test('assigns sequential indices', () => {
    const text = 'word '.repeat(1000).trim();
    const chunks = chunkText(text, { chunkSize: 100 });
    for (let i = 0; i < chunks.length; i++) {
      expect(chunks[i].index).toBe(i);
    }
  });

  test('handles single word input', () => {
    const chunks = chunkText('hello');
    expect(chunks).toHaveLength(1);
    expect(chunks[0].text).toBe('hello');
  });

  test('handles unicode text', () => {
    const text = 'Bonjour le monde. ' + 'Ceci est un texte en francais. '.repeat(100);
    const chunks = chunkText(text, { chunkSize: 50 });
    expect(chunks.length).toBeGreaterThan(1);
    expect(chunks[0].text).toContain('Bonjour');
  });
});
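The overlap test in this file only asserts that the second chunk is non-empty. A stricter check could count how many words the end of one chunk shares with the start of the next, as sketched below. `sharedOverlapWords` is a hypothetical helper, and it assumes the chunker's overlap is an exact word-for-word copy of the previous chunk's tail, which would need to be confirmed against `chunkText`.

```typescript
// Counts the longest run of words that ends `prev` and also begins `next`.
function sharedOverlapWords(prev: string, next: string): number {
  const a = prev.trim().split(/\s+/).filter(Boolean);
  const b = next.trim().split(/\s+/).filter(Boolean);
  const max = Math.min(a.length, b.length);
  // Try the largest candidate overlap first and shrink until one matches.
  for (let n = max; n > 0; n--) {
    const suffix = a.slice(a.length - n);
    const prefix = b.slice(0, n);
    if (suffix.every((word, i) => word === prefix[i])) return n;
  }
  return 0;
}

console.log(sharedOverlapWords('alpha beta gamma delta', 'gamma delta epsilon'));
```

A test built on this would need input with distinct words. The existing fixture, `'word '.repeat(1000)`, repeats one token, so any two chunks would appear to overlap completely.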
148
test/markdown.test.ts
Normal file
@@ -0,0 +1,148 @@
import { describe, test, expect } from 'bun:test';
import { parseMarkdown, serializeMarkdown, splitBody } from '../src/core/markdown.ts';

describe('Markdown Parser', () => {
  test('parses frontmatter + compiled_truth + timeline', () => {
    const md = `---
type: concept
title: Do Things That Don't Scale
tags: [startups, growth]
---

Paul Graham argues that startups should do unscalable things early on.

---

- 2013-07-01: Published on paulgraham.com
- 2024-11-15: Referenced in batch kickoff talk
`;
    const parsed = parseMarkdown(md);
    expect(parsed.type).toBe('concept');
    expect(parsed.title).toBe("Do Things That Don't Scale");
    expect(parsed.tags).toEqual(['startups', 'growth']);
    expect(parsed.compiled_truth).toContain('unscalable things');
    expect(parsed.timeline).toContain('Published on paulgraham.com');
    expect(parsed.timeline).toContain('batch kickoff talk');
  });

  test('handles no timeline separator', () => {
    const md = `---
type: concept
title: Superlinear Returns
---

Returns in many fields are superlinear.
Performance compounds over time.
`;
    const parsed = parseMarkdown(md);
    expect(parsed.compiled_truth).toContain('superlinear');
    expect(parsed.timeline).toBe('');
  });

  test('handles empty body', () => {
    const md = `---
type: concept
title: Empty Page
---
`;
    const parsed = parseMarkdown(md);
    expect(parsed.compiled_truth).toBe('');
    expect(parsed.timeline).toBe('');
  });

  test('removes type, title, tags from frontmatter object', () => {
    const md = `---
type: concept
title: Test
tags: [a, b]
custom_field: hello
---

Content
`;
    const parsed = parseMarkdown(md);
    expect(parsed.frontmatter).not.toHaveProperty('type');
    expect(parsed.frontmatter).not.toHaveProperty('title');
    expect(parsed.frontmatter).not.toHaveProperty('tags');
    expect(parsed.frontmatter).toHaveProperty('custom_field', 'hello');
  });

  test('infers type from file path', () => {
    const md = `---
title: Someone
---
Content
`;
    const parsed = parseMarkdown(md, 'people/someone.md');
    expect(parsed.type).toBe('person');
  });

  test('infers slug from file path', () => {
    const md = `---
type: concept
title: Test
---
Content
`;
    const parsed = parseMarkdown(md, 'concepts/do-things-that-dont-scale.md');
    expect(parsed.slug).toBe('concepts/do-things-that-dont-scale');
  });
});

describe('splitBody', () => {
  test('splits at first standalone ---', () => {
    const body = 'Above the line\n\n---\n\nBelow the line';
    const { compiled_truth, timeline } = splitBody(body);
    expect(compiled_truth).toContain('Above the line');
    expect(timeline).toContain('Below the line');
  });

  test('returns all as compiled_truth if no separator', () => {
    const body = 'Just some content\nWith multiple lines';
    const { compiled_truth, timeline } = splitBody(body);
    expect(compiled_truth).toBe(body);
    expect(timeline).toBe('');
  });

  test('handles --- at end of content', () => {
    const body = 'Content here\n\n---\n';
    const { compiled_truth, timeline } = splitBody(body);
    expect(compiled_truth).toContain('Content here');
    expect(timeline.trim()).toBe('');
  });
});

describe('serializeMarkdown', () => {
  test('round-trips through parse and serialize', () => {
    const original = `---
type: concept
title: Do Things That Don't Scale
tags:
- startups
- growth
custom: value
---

Paul Graham argues that startups should do unscalable things early on.

---

- 2013-07-01: Published on paulgraham.com
`;
    const parsed = parseMarkdown(original);
    const serialized = serializeMarkdown(
      parsed.frontmatter,
      parsed.compiled_truth,
      parsed.timeline,
      { type: parsed.type, title: parsed.title, tags: parsed.tags },
    );

    // Re-parse the serialized version
    const reparsed = parseMarkdown(serialized);
    expect(reparsed.type).toBe(parsed.type);
    expect(reparsed.title).toBe(parsed.title);
    expect(reparsed.compiled_truth).toBe(parsed.compiled_truth);
    expect(reparsed.timeline).toBe(parsed.timeline);
    expect(reparsed.frontmatter.custom).toBe('value');
  });
});
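The `splitBody` tests pin down its contract: everything before the first standalone `---` line is `compiled_truth`, and everything after it is `timeline`. Below is a minimal sketch that is consistent with those three tests. `splitBodySketch` is a hypothetical name, and the real implementation in `src/core/markdown.ts` may differ, for example in how it trims whitespace.

```typescript
interface SplitBody {
  compiled_truth: string;
  timeline: string;
}

// Splits a markdown body at the first line that consists only of '---'.
function splitBodySketch(body: string): SplitBody {
  const lines = body.split('\n');
  const sep = lines.findIndex((line) => line.trim() === '---');
  // No separator: the whole body is compiled truth and the timeline is empty.
  if (sep === -1) return { compiled_truth: body, timeline: '' };
  return {
    compiled_truth: lines.slice(0, sep).join('\n'),
    timeline: lines.slice(sep + 1).join('\n'),
  };
}

console.log(splitBodySketch('Above the line\n\n---\n\nBelow the line'));
```

Matching on a whole line, rather than searching for the substring `---`, keeps inline dashes and longer rules inside the text from being treated as the separator.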
19
tsconfig.json
Normal file
@@ -0,0 +1,19 @@
{
  "compilerOptions": {
    "target": "ESNext",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "types": ["bun-types"],
    "strict": true,
    "skipLibCheck": true,
    "noEmit": true,
    "esModuleInterop": true,
    "allowImportingTsExtensions": true,
    "resolveJsonModule": true,
    "baseUrl": ".",
    "paths": {
      "@/*": ["src/*"]
    }
  },
  "include": ["src", "test"]
}