feat: GBrain v0.3.0 — contract-first architecture + ClawHub plugin (#7)
* feat: contract-first operations.ts with OperationError, dry_run, importFromContent

  30 shared operations as single source of truth for CLI and MCP.
  - OperationError with typed error codes (page_not_found, invalid_params, etc.)
  - dry_run support on all mutating operations
  - importFromContent split from importFile with transaction wrapping
  - Idempotency hash now includes ALL fields (title, type, frontmatter, tags)
  - Config env var fallback: GBRAIN_DATABASE_URL > DATABASE_URL > config file

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: rewrite MCP server + CLI + tools-json from operations

  server.ts: 233 -> ~80 lines. Tool definitions and dispatch generated from operations[].
  cli.ts: shared operations auto-registered, CLI-only commands kept as manual dispatch.
  tools-json: generated FROM operations[], eliminating the third contract surface.
  Parity test verifies structural contract between operations, CLI, and MCP.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: delete 12 command files migrated to operations.ts

  Handler logic for get, put, delete, list, search, query, health, stats, tags, link, timeline, and version now lives in operations.ts.
  Kept: init, upgrade, import, export, files, embed, sync, serve, call, config.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: init --non-interactive, upgrade verification, schema migration

  - gbrain init --non-interactive --url <url> for plugin mode (no TTY required)
  - Post-upgrade version verification in gbrain upgrade
  - Drop storage_url from files table (storage_path is the only identifier)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tool-agnostic skills + new setup skill

  All 7 skills rewritten with intent-based language instead of CLI commands. Works with both CLI and MCP plugin contexts.
  New setup skill replaces install: auto-provision Supabase via CLI, AGENTS.md injection, target TTHW < 2 min.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ClawHub bundle plugin, CI workflows, v0.3.0

  - openclaw.plugin.json with configSchema, MCP server config, skill listing
  - GitHub Actions: test on push/PR, multi-platform release (macOS arm64 + Linux x64)
  - Version bump 0.3.0, CHANGELOG, README ClawHub section, CLAUDE.md updated

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: idempotency hash mismatch + MCP dry_run passthrough

  importFromContent now passes its all-fields hash through putPage via content_hash on PageInput, so the stored hash matches the computed hash. Previously the skip-if-unchanged check never fired because the hash formulas differed.
  MCP server now passes dry_run from tool params to OperationContext.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.0.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: schema loader handles PL/pgSQL $$ blocks

  Delete the semicolon-based SQL splitter in db.ts, which broke on PL/pgSQL trigger functions containing semicolons inside $$ delimiter blocks. Use a single conn.unsafe(schemaSql) call instead; the postgres driver handles multi-statement SQL natively. schema.sql already uses IF NOT EXISTS / CREATE OR REPLACE for idempotency.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: E2E test infrastructure + realistic brain fixtures

  Add test infrastructure for running E2E tests against real Postgres+pgvector. Includes:
  - test/e2e/helpers.ts: DB lifecycle, fixture import, timing, diagnostics
  - 13 fixture files as a miniature realistic brain (people, companies, deals, meetings, concepts, projects, sources) following the compiled truth + timeline format from GBRAIN_RECOMMENDED_SCHEMA.md
  - docker-compose.test.yml: local pgvector convenience (port 5433)
  - .env.testing.example: template for test credentials
  - package.json: add test:e2e script

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: E2E test suites + CI workflow

  Tier 1 (mechanical.test.ts): 14 test suites covering all operations against real Postgres: page CRUD, search with quality scoring, links, tags, timeline, versions, admin, chunks, resolution, ingest log, raw data, files, idempotency stress, setup journey (full CLI flow), init edge cases, schema idempotency, schema diff guard, performance baselines.
  Tier 1 (mcp.test.ts): MCP protocol test that spawns the server, sends JSON-RPC, and verifies tools/list matches the operations count.
  Tier 2 (skills.test.ts): OpenClaw skill tests for ingest, query, health. Skips gracefully when dependencies are missing.
  CI (.github/workflows/e2e.yml): Tier 1 on every PR (pgvector service), Tier 2 nightly/manual with API key secrets.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: E2E test fixes + traverseGraph jsonb cast

  - Fix traverseGraph query: cast json_agg to jsonb_agg so SELECT DISTINCT works
  - Fix put_page tests to use importFromContent with noEmbed (no OpenAI key in Tier 1)
  - Fix get_health assertion (page_count not total_pages)
  - Fix raw_data test to handle JSONB string/object return
  - Simplify MCP test to verify tool generation directly
  - Add timeouts to CLI subprocess tests
  - Use port 5434 for docker-compose (5433 often in use)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update all project docs for E2E test suite

  - CLAUDE.md: updated test count (9 unit + 3 E2E), added E2E test instructions, fixed skill count to 8
  - CONTRIBUTING.md: updated project structure with test/e2e/, added E2E test instructions, rewrote "Adding a new command" to reflect contract-first architecture (add to operations.ts, done)
  - README.md: fixed table count (10 not 9), added recommended schema doc to Docs section, added E2E instructions to Contributing section
  - CHANGELOG.md: added E2E test suite, docker-compose, schema loader fix, and traverseGraph jsonb fix to v0.3.0 entry

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
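
The idempotency fix in the message above (one hash formula, covering all fields, shared by the import path and the stored value) can be sketched as follows. This is a minimal illustration under stated assumptions: the `PageFields` shape, the canonical-JSON encoding, and the choice of SHA-256 are all hypothetical, since the actual `PageInput` type and hash formula are not visible in this commit view.

```typescript
import { createHash } from "node:crypto";

// Hypothetical page shape; the real PageInput type is not shown in this diff.
interface PageFields {
  title: string;
  type: string;
  compiled_truth: string;
  timeline: string;
  frontmatter: Record<string, unknown>;
  tags: string[];
}

// Serialize with sorted object keys so logically equal inputs hash identically.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash over ALL fields, so a metadata-only edit (title, tags, ...) changes the hash.
function contentHash(page: PageFields): string {
  // Assumption of this sketch: tag order alone should not force a re-import.
  const normalized = { ...page, tags: [...page.tags].sort() };
  return createHash("sha256").update(canonicalize(normalized)).digest("hex");
}

// Skip-if-unchanged only fires when the stored hash and the freshly computed
// hash come from the same formula, which is the property the fix restores.
function shouldSkipImport(storedHash: string | null, page: PageFields): boolean {
  return storedHash !== null && storedHash === contentHash(page);
}
```

The design point is that the writer and the checker call the same `contentHash`; if the stored value were produced by a second formula, `shouldSkipImport` would never return `true`.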
12
.env.testing.example
Normal file
@@ -0,0 +1,12 @@
+# GBrain E2E Test Configuration
+# Copy to .env.testing and fill in real values
+#
+# Tier 1 (required for E2E tests)
+# Option A: Local Docker Postgres (default)
+DATABASE_URL=postgresql://postgres:postgres@localhost:5433/gbrain_test
+# Option B: Real Supabase instance (tests the actual production path)
+# DATABASE_URL=postgresql://postgres.[project-ref]:[password]@aws-0-us-east-1.pooler.supabase.com:6543/postgres
+
+# Tier 2 (required for skill tests, optional for mechanical tests)
+OPENAI_API_KEY=sk-...
+ANTHROPIC_API_KEY=sk-ant-...
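
The commit message states the connection-string precedence as `GBRAIN_DATABASE_URL > DATABASE_URL > config file`. A minimal sketch of that lookup order is shown below. The function name `resolveDatabaseUrl`, the `database_url` key of the config file, and the treatment of empty strings as unset are assumptions made for illustration, not the project's actual config loader.

```typescript
// Environment is passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical shape of the on-disk config file.
interface FileConfig {
  database_url?: string;
}

// Return the first non-empty candidate in precedence order, or undefined.
function resolveDatabaseUrl(env: Env, fileConfig: FileConfig = {}): string | undefined {
  const candidates = [
    env.GBRAIN_DATABASE_URL, // highest priority: GBrain-specific override
    env.DATABASE_URL, // generic variable, as used in .env.testing
    fileConfig.database_url, // lowest priority: value from the config file
  ];
  // Assumption: an empty or whitespace-only value counts as "not set".
  return candidates.find((value) => value !== undefined && value.trim() !== "");
}
```

With only `DATABASE_URL` exported, as in the template above, the function returns that value; exporting `GBRAIN_DATABASE_URL` as well would take precedence.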
89
.github/workflows/e2e.yml
vendored
Normal file
@@ -0,0 +1,89 @@
+name: E2E Tests
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+  schedule:
+    - cron: '0 6 * * *' # Nightly at 6am UTC
+  workflow_dispatch:
+
+jobs:
+  tier1:
+    name: Tier 1 (Mechanical)
+    runs-on: ubuntu-latest
+    services:
+      postgres:
+        image: pgvector/pgvector:pg16
+        env:
+          POSTGRES_USER: postgres
+          POSTGRES_PASSWORD: postgres
+          POSTGRES_DB: gbrain_test
+        ports:
+          - 5432:5432
+        options: >-
+          --health-cmd pg_isready
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+      - run: bun install
+      - name: Run Tier 1 E2E tests
+        run: bun test test/e2e/mechanical.test.ts test/e2e/mcp.test.ts
+        env:
+          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/gbrain_test
+
+  tier2:
+    name: Tier 2 (LLM Skills)
+    runs-on: ubuntu-latest
+    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+    needs: tier1
+    services:
+      postgres:
+        image: pgvector/pgvector:pg16
+        env:
+          POSTGRES_USER: postgres
+          POSTGRES_PASSWORD: postgres
+          POSTGRES_DB: gbrain_test
+        ports:
+          - 5432:5432
+        options: >-
+          --health-cmd pg_isready
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+      - run: bun install
+      - name: Install OpenClaw
+        run: npm install -g openclaw
+      - name: Configure OpenClaw MCP
+        run: |
+          mkdir -p ~/.openclaw
+          cat > ~/.openclaw/config.json << 'EOF'
+          {
+            "mcpServers": {
+              "gbrain": {
+                "command": "bun",
+                "args": ["run", "src/cli.ts", "serve"],
+                "env": {
+                  "DATABASE_URL": "${{ env.DATABASE_URL }}"
+                }
+              }
+            }
+          }
+          EOF
+      - name: Run Tier 2 skill tests
+        run: bun test test/e2e/skills.test.ts
+        env:
+          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/gbrain_test
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
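
The gating in the workflow above (Tier 1 on every trigger, Tier 2 only for `schedule` or `workflow_dispatch` and only after `tier1`) can be restated as a small pure function. This is only a model of the `if:` and `needs:` keys for reasoning about which jobs run; the names `jobsFor` and `EventName` are made up for the sketch and nothing here is executed by GitHub Actions.

```typescript
// The four triggers declared under `on:` in the workflow.
type EventName = "push" | "pull_request" | "schedule" | "workflow_dispatch";

// Returns the job ids that would run for a given trigger.
function jobsFor(event: EventName, tier1Succeeded: boolean): string[] {
  // tier1 has no `if:` condition, so it runs for every trigger.
  const jobs = ["tier1"];

  // Mirrors: if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
  const tier2Eligible = event === "schedule" || event === "workflow_dispatch";

  // Mirrors: needs: tier1 (a dependent job is skipped when its dependency fails).
  if (tier2Eligible && tier1Succeeded) {
    jobs.push("tier2");
  }
  return jobs;
}
```

For example, a pull request yields only `tier1`, while a nightly run yields both jobs provided the mechanical tests pass first.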
48
.github/workflows/release.yml
vendored
Normal file
@@ -0,0 +1,48 @@
+name: Release
+
+on:
+  push:
+    tags: ['v*']
+
+permissions:
+  contents: write
+
+jobs:
+  build:
+    strategy:
+      matrix:
+        include:
+          - os: macos-latest
+            target: bun-darwin-arm64
+            artifact: gbrain-darwin-arm64
+          - os: ubuntu-latest
+            target: bun-linux-x64
+            artifact: gbrain-linux-x64
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+      - run: bun install
+      - run: bun test
+      - run: bun build --compile --target=${{ matrix.target }} --outfile bin/${{ matrix.artifact }} src/cli.ts
+      - uses: actions/upload-artifact@v4
+        with:
+          name: ${{ matrix.artifact }}
+          path: bin/${{ matrix.artifact }}
+
+  release:
+    needs: build
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          path: artifacts
+      - name: Create release
+        uses: softprops/action-gh-release@v2
+        with:
+          files: |
+            artifacts/gbrain-darwin-arm64/gbrain-darwin-arm64
+            artifacts/gbrain-linux-x64/gbrain-linux-x64
+          generate_release_notes: true
18
.github/workflows/test.yml
vendored
Normal file
@@ -0,0 +1,18 @@
+name: Test
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+      - run: bun install
+      - run: bun test
3
.gitignore
vendored
@@ -2,3 +2,6 @@ node_modules/
 bin/
 .DS_Store
 *.log
+.env.testing
+.18a49dfd730ff378-00000000.bun-build
+.18a49f9dfb996f70-00000000.bun-build
41
CHANGELOG.md
@@ -2,6 +2,47 @@

 All notable changes to GBrain will be documented in this file.

+## [0.3.0] - 2026-04-08
+
+### Added
+
+- Contract-first architecture: single `operations.ts` defines ~30 shared operations. CLI, MCP, and tools-json all generated from the same source. Zero drift.
+- `OperationError` type with structured error codes (`page_not_found`, `invalid_params`, `embedding_failed`, etc.). Agents can self-correct.
+- `dry_run` parameter on all mutating operations. Agents preview before committing.
+- `importFromContent()` split from `importFile()`. Both share the same chunk+embed+tag pipeline, but `importFromContent` works from strings (used by `put_page`). Wrapped in `engine.transaction()`.
+- Idempotency hash now includes ALL fields (title, type, frontmatter, tags), not just compiled_truth + timeline. Metadata-only edits no longer silently skipped.
+- `get_page` now supports optional `fuzzy: true` for slug resolution. Returns `resolved_slug` so callers know what happened.
+- `query` operation now supports `expand` toggle (default true). Both CLI and MCP get the same control.
+- 10 new operations wired up: `put_raw_data`, `get_raw_data`, `resolve_slugs`, `get_chunks`, `log_ingest`, `get_ingest_log`, `file_list`, `file_upload`, `file_url`.
+- OpenClaw bundle plugin manifest (`openclaw.plugin.json`) with config schema, MCP server config, and skill listing.
+- GitHub Actions CI: test on push/PR, multi-platform release builds (macOS arm64 + Linux x64) on version tags.
+- `gbrain init --non-interactive` flag for plugin mode (accepts config via flags/env vars, no TTY required).
+- Post-upgrade version verification in `gbrain upgrade`.
+- Parity test (`test/parity.test.ts`) verifies structural contract between operations, CLI, and MCP.
+- New `setup` skill replacing `install`: auto-provision Supabase via CLI, AGENTS.md injection, target TTHW < 2 min.
+- E2E test suite against real Postgres+pgvector. 13 realistic fixtures (miniature brain with people, companies, deals, meetings, concepts), 14 test suites covering all operations, search quality benchmarks, idempotency stress tests, schema validation, and full setup journey verification.
+- GitHub Actions E2E workflow: Tier 1 (mechanical) on every PR, Tier 2 (LLM skills via OpenClaw) nightly.
+- `docker-compose.test.yml` and `.env.testing.example` for local E2E development.
+
+### Fixed
+
+- Schema loader in `db.ts` broke on PL/pgSQL trigger functions containing semicolons inside `$$` blocks. Replaced per-statement execution with single `conn.unsafe()` call.
+- `traverseGraph` query failed with "could not identify equality operator for type json" when using `SELECT DISTINCT` with `json_agg`. Changed to `jsonb_agg`.
+
+### Changed
+
+- `src/mcp/server.ts` rewritten from ~233 to ~80 lines. Tool definitions and dispatch generated from operations[].
+- `src/cli.ts` rewritten. Shared operations auto-registered from operations[]. CLI-only commands (init, upgrade, import, export, files, embed) kept as manual registrations.
+- `tools-json` output now generated FROM operations[]. Third contract surface eliminated.
+- All 7 skills rewritten with tool-agnostic language. Works with both CLI and MCP plugin contexts.
+- File schema: `storage_url` column dropped, `storage_path` is the only identifier. URLs generated on demand via `file_url` operation.
+- Config loading: env vars (`GBRAIN_DATABASE_URL`, `DATABASE_URL`, `OPENAI_API_KEY`) override config file values. Plugin config injected via env vars.
+
+### Removed
+
+- 12 command files migrated to operations.ts: get.ts, put.ts, delete.ts, list.ts, search.ts, query.ts, health.ts, stats.ts, tags.ts, link.ts, timeline.ts, version.ts.
+- `storage_url` column from files table.
+
 ## [0.2.0.2] - 2026-04-07

 ### Changed
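
The schema-loader fix listed under "Fixed" is easier to see with a concrete input. The sketch below shows why splitting on semicolons breaks a PL/pgSQL function body. The SQL fragment is invented for illustration (it is not the project's `schema.sql`), and `naiveSplit` is a stand-in for the deleted splitter, whose real code is not shown in this diff.

```typescript
// Two logical statements: one CREATE TABLE and one function definition whose
// body (between the $$ delimiters) contains its own semicolons.
const schemaSql = [
  "CREATE TABLE IF NOT EXISTS notes (id int);",
  "CREATE OR REPLACE FUNCTION touch() RETURNS trigger AS $$",
  "BEGIN",
  "  NEW.id := NEW.id;",
  "  RETURN NEW;",
  "END;",
  "$$ LANGUAGE plpgsql;",
].join("\n");

// Stand-in for a semicolon-based splitter: cut at every ';'.
function naiveSplit(sql: string): string[] {
  return sql
    .split(";")
    .map((fragment) => fragment.trim())
    .filter((fragment) => fragment.length > 0);
}

// A fragment with an odd number of $$ markers opens or closes a dollar-quoted
// body without its partner, so it is not valid SQL on its own.
function hasUnbalancedDollarQuote(fragment: string): boolean {
  const markerCount = fragment.split("$$").length - 1;
  return markerCount % 2 === 1;
}

const fragments = naiveSplit(schemaSql);
const brokenFragments = fragments.filter(hasUnbalancedDollarQuote);

// Two statements went in; the splitter produces five fragments, two of which
// carry a dangling $$ delimiter.
console.log(fragments.length, brokenFragments.length);
```

Sending the whole string in one call, as the changelog entry describes, sidesteps the problem because no client-side statement boundary detection is needed.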
35
CLAUDE.md
@@ -4,23 +4,24 @@ GBrain is a personal knowledge brain. Postgres + pgvector + hybrid search in a m

 ## Architecture

-Thin CLI + fat skills. The CLI (`src/cli.ts`) dispatches commands to handler files in
-`src/commands/`. The core library (`src/core/`) handles database, search, embeddings,
-and markdown parsing. Skills (`skills/`) are fat markdown files that tell you HOW to
-use the tools — ingest meetings, answer queries, maintain the brain, enrich from APIs.
+Contract-first: `src/core/operations.ts` defines ~30 shared operations. CLI and MCP
+server are both generated from this single source. Skills are fat markdown files
+(tool-agnostic, work with both CLI and plugin contexts).

 ## Key files

+- `src/core/operations.ts` — Contract-first operation definitions (the foundation)
 - `src/core/engine.ts` — Pluggable engine interface (BrainEngine)
 - `src/core/postgres-engine.ts` — Postgres + pgvector implementation
 - `src/core/db.ts` — Connection management, schema initialization
-- `src/core/import-file.ts` — Shared single-file import (used by import + sync)
+- `src/core/import-file.ts` — importFromFile + importFromContent (chunk + embed + tags)
 - `src/core/sync.ts` — Pure sync functions (manifest parsing, filtering, slug conversion)
 - `src/core/chunkers/` — 3-tier chunking (recursive, semantic, LLM-guided)
 - `src/core/search/` — Hybrid search: vector + keyword + RRF + multi-query expansion + dedup
 - `src/core/embedding.ts` — OpenAI text-embedding-3-large, batch, retry, backoff
-- `src/mcp/server.ts` — MCP stdio server exposing all tools
+- `src/mcp/server.ts` — MCP stdio server (generated from operations)
 - `src/schema.sql` — Full Postgres + pgvector DDL (includes files table)
+- `openclaw.plugin.json` — ClawHub bundle plugin manifest

 ## Commands

@@ -28,17 +29,27 @@ Run `gbrain --help` or `gbrain --tools-json` for full command reference.

 ## Testing

-`bun test` runs all tests (39 tests across 3 files). Tests: `test/markdown.test.ts`
-(frontmatter parsing, round-trip serialization), `test/chunkers/recursive.test.ts`
-(delimiter splitting, overlap, chunk sizing), `test/sync.test.ts` (manifest parsing,
-isSyncable filtering, pathToSlug conversion).
+`bun test` runs all tests (9 unit test files + 3 E2E test files). Unit tests run
+without a database. E2E tests skip gracefully when `DATABASE_URL` is not set.
+
+Unit tests: `test/markdown.test.ts` (frontmatter parsing), `test/chunkers/recursive.test.ts`
+(chunking), `test/sync.test.ts` (sync logic), `test/parity.test.ts` (operations contract
+parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redaction),
+`test/files.test.ts` (MIME/hash), `test/import-file.test.ts` (import pipeline),
+`test/upgrade.test.ts` (schema migrations).
+
+E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.
+- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys)
+- Tier 2 (`skills.test.ts`) requires OpenClaw + API keys, runs nightly in CI
+- Local setup: `docker compose -f docker-compose.test.yml up -d` then
+  `DATABASE_URL=postgresql://postgres:postgres@localhost:5434/gbrain_test bun run test:e2e`

 ## Skills

 Read the skill files in `skills/` before doing brain operations. They contain the
 workflows, heuristics, and quality rules for ingestion, querying, maintenance,
-enrichment, and installation. 7 skills: ingest, query, maintain, enrich, briefing,
-migrate, install.
+enrichment, and setup. 8 skills: ingest, query, maintain, enrich, briefing,
+migrate, setup, install.

 ## Build

@@ -16,11 +16,13 @@ Requires Bun 1.0+.
 ```
 src/
   cli.ts                  CLI entry point
-  commands/               Command handlers (one file per command)
+  commands/               CLI-only commands (init, upgrade, import, export, etc.)
   core/
+    operations.ts         Contract-first operation definitions (the foundation)
     engine.ts             BrainEngine interface
     postgres-engine.ts    Postgres implementation
-    db.ts                 Connection management
+    db.ts                 Connection management + schema loader
+    import-file.ts        Import pipeline (chunk + embed + tags)
     types.ts              TypeScript types
     markdown.ts           Frontmatter parsing
     config.ts             Config file management
@@ -28,18 +30,31 @@ src/
     search/               Hybrid search (vector, keyword, hybrid, expansion, dedup)
     embedding.ts          OpenAI embedding service
   mcp/
-    server.ts             MCP stdio server
+    server.ts             MCP stdio server (generated from operations)
   schema.sql              Postgres DDL
 skills/                   Fat markdown skills for AI agents
-test/                     Tests (bun test)
+test/                     Unit tests (bun test, no DB required)
+test/e2e/                 E2E tests (requires DATABASE_URL, real Postgres+pgvector)
+  fixtures/               Miniature realistic brain corpus (13 files)
+  helpers.ts              DB lifecycle, fixture import, timing
+  mechanical.test.ts      All operations against real DB
+  mcp.test.ts             MCP tool generation verification
+  skills.test.ts          Tier 2 skill tests (requires OpenClaw + API keys)
 docs/                     Architecture docs
 ```

 ## Running tests

 ```bash
-bun test                          # all tests
-bun test test/markdown.test.ts    # specific test
+bun test                          # all tests (unit + E2E skipped without DB)
+bun test test/markdown.test.ts    # specific unit test
+
+# E2E tests (requires Postgres with pgvector)
+docker compose -f docker-compose.test.yml up -d
+DATABASE_URL=postgresql://postgres:postgres@localhost:5434/gbrain_test bun run test:e2e
+
+# Or use your own Postgres / Supabase
+DATABASE_URL=postgresql://... bun run test:e2e
 ```

 ## Building
@@ -48,15 +63,20 @@ bun test test/markdown.test.ts # specific test
 bun build --compile --outfile bin/gbrain src/cli.ts
 ```

-## Adding a new command
+## Adding a new operation

-1. Create `src/commands/mycommand.ts` with an exported `runMyCommand` function
-2. Add the case to `src/cli.ts` in the switch statement
-3. Add the tool to `src/mcp/server.ts` in `handleToolCall` and `getToolDefinitions`
-4. Add to `src/commands/tools-json.ts`
-5. Add tests
+GBrain uses a contract-first architecture. Add your operation to one file and it
+automatically appears in the CLI, MCP server, and tools-json:

-CLI and MCP must expose identical operations. Drift tests will verify this.
+1. Add your operation to `src/core/operations.ts` (define params, handler, cliHints)
+2. Add tests
+3. That's it. The CLI, MCP server, and tools-json are generated from operations.
+
+For CLI-only commands (init, upgrade, import, export, files, embed):
+1. Create `src/commands/mycommand.ts`
+2. Add the case to `src/cli.ts`
+
+Parity tests (`test/parity.test.ts`) verify CLI/MCP/tools-json stay in sync.

 ## Adding a new engine

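
The "Adding a new operation" steps above say an operation defines params, a handler, and cliHints, and that the CLI and MCP surfaces are generated from the same list. A minimal sketch of that idea follows. Every type and field shape here (`Operation`, `OperationContext`, the in-memory `store`, the `mutating` flag) is an assumption made for illustration; the real definitions live in `src/core/operations.ts`, which this diff does not show.

```typescript
type ErrorCode = "page_not_found" | "invalid_params";

// Typed error so callers (including agents) can branch on a stable code.
class OperationError extends Error {
  readonly code: ErrorCode;
  constructor(code: ErrorCode, message: string) {
    super(message);
    // Keeps instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, OperationError.prototype);
    this.name = "OperationError";
    this.code = code;
  }
}

// Hypothetical context: a Map stands in for the database engine.
interface OperationContext {
  dryRun: boolean;
  store: Map<string, string>;
}

interface Operation {
  name: string;
  description: string;
  params: Record<string, { type: "string"; required: boolean }>;
  mutating: boolean;
  cliHints: { command: string };
  handler: (ctx: OperationContext, params: Record<string, string>) => unknown;
}

const operations: Operation[] = [
  {
    name: "get_page",
    description: "Read a page by slug",
    params: { slug: { type: "string", required: true } },
    mutating: false,
    cliHints: { command: "get" },
    handler: (ctx, p) => {
      const body = ctx.store.get(p.slug);
      if (body === undefined) {
        throw new OperationError("page_not_found", `no page: ${p.slug}`);
      }
      return { slug: p.slug, body };
    },
  },
  {
    name: "put_page",
    description: "Create or update a page",
    params: {
      slug: { type: "string", required: true },
      body: { type: "string", required: true },
    },
    mutating: true,
    cliHints: { command: "put" },
    handler: (ctx, p) => {
      // dry_run: report what would happen without touching the store.
      if (ctx.dryRun) return { dry_run: true, would_write: p.slug };
      ctx.store.set(p.slug, p.body);
      return { written: p.slug };
    },
  },
];

// Both surfaces are derived from the same array, so they cannot drift apart.
const mcpTools = operations.map((op) => ({
  name: op.name,
  description: op.description,
  inputSchema: { type: "object", properties: op.params },
}));
const cliCommands = operations.map((op) => op.cliHints.command);

// Shared dispatch: validate required params, then call the handler.
function dispatch(name: string, ctx: OperationContext, params: Record<string, string>): unknown {
  const op = operations.find((candidate) => candidate.name === name);
  if (!op) throw new OperationError("invalid_params", `unknown operation: ${name}`);
  for (const [key, spec] of Object.entries(op.params)) {
    if (spec.required && params[key] === undefined) {
      throw new OperationError("invalid_params", `missing param: ${key}`);
    }
  }
  return op.handler(ctx, params);
}
```

Under these assumptions, adding a third entry to `operations` is the only change needed for it to show up in both `mcpTools` and `cliCommands`, which is the property a parity test can then check.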
23
README.md
@@ -142,7 +142,7 @@ GBrain keeps your brain current automatically. After setup, `gbrain sync --watch
 clawhub install gbrain
 ```

-This installs the package, copies the skill files, and runs `gbrain init --supabase` on first use.
+ClawHub installs the bundle plugin, configures the MCP server, and auto-runs the setup skill. Each brain should have its own Supabase project (one project per person or team).

 ### Standalone CLI

@@ -279,7 +279,7 @@ Keyword search alone misses conceptual matches. "Ignore conventional wisdom" won

 ## Database schema

-9 tables in Postgres + pgvector:
+10 tables in Postgres + pgvector:

 ```
 pages             The core content table
@@ -314,7 +314,7 @@ raw_data Sidecar JSON from external APIs

 files             Binary attachments in Supabase Storage
   page_slug (FK)  Links to pages (ON UPDATE CASCADE)
-  storage_path, storage_url, content_hash, mime_type, metadata (JSONB)
+  storage_path, content_hash, mime_type, metadata (JSONB)

 ingest_log        Audit trail of import/ingest operations

@@ -440,9 +440,9 @@ Add to your Claude Code or Cursor MCP config:
 }
 ```

-21 tools: get_page, put_page, delete_page, list_pages, search, query, add_tag, remove_tag, get_tags, add_link, remove_link, get_links, get_backlinks, traverse_graph, add_timeline_entry, get_timeline, get_stats, get_health, get_versions, revert_version, sync_brain.
+30 tools generated from the contract-first `operations.ts`: page CRUD, search, tags, links, timeline, admin, sync, raw data, file management, and more.

-Every tool mirrors a CLI command. Drift tests verify identical behavior.
+Every tool is generated from the same operation definitions as the CLI. Parity tests verify structural identity.

 ## Skills

@@ -456,7 +456,7 @@ Fat markdown files that tell AI agents HOW to use gbrain. No skill logic in the
 | **enrich** | Enrich pages from external APIs. Raw data stored separately, distilled highlights go to compiled truth. |
 | **briefing** | Daily briefing: today's meetings with participant context, active deals with deadlines, time-sensitive threads, recent changes. |
 | **migrate** | Universal migration from Obsidian (wikilinks to gbrain links), Notion (stripped UUIDs), Logseq (block refs), plain markdown, CSV, JSON, Roam. |
-| **install** | Set up GBrain from scratch: Supabase setup (magic path via CLI or 2-copy-paste fallback), import, sync cron, optional file migration, agent teaching. |
+| **setup** | Set up GBrain from scratch: auto-provision Supabase via CLI, AGENTS.md injection, import, sync. Target TTHW < 2 min. |

 ## Architecture

@@ -499,18 +499,23 @@ Initial embedding cost: ~$4-5 for 7,500 pages via OpenAI text-embedding-3-large.

 ## Docs

+- [GBRAIN_RECOMMENDED_SCHEMA.md](docs/GBRAIN_RECOMMENDED_SCHEMA.md) -- The recommended brain schema: MECE directories, compiled truth + timeline, enrichment pipelines, resolver decision tree
 - [GBRAIN_V0.md](docs/GBRAIN_V0.md) -- Full product spec, all architecture decisions, every option considered
 - [ENGINES.md](docs/ENGINES.md) -- Pluggable engine interface, capability matrix, how to add backends
 - [SQLITE_ENGINE.md](docs/SQLITE_ENGINE.md) -- Complete SQLite engine plan with schema, FTS5, vector search options

 ## Contributing

-See [CONTRIBUTING.md](CONTRIBUTING.md). Welcome PRs for:
+See [CONTRIBUTING.md](CONTRIBUTING.md). Run `bun test` for unit tests. For E2E tests
+against real Postgres+pgvector: `docker compose -f docker-compose.test.yml up -d` then
+`DATABASE_URL=postgresql://postgres:postgres@localhost:5434/gbrain_test bun run test:e2e`.
+
+Welcome PRs for:

 - SQLite engine implementation
 - Docker Compose for self-hosted Postgres
-- Additional migration sources
+- Additional migration sources (Logseq, Roam, Notion)
 - New enrichment API integrations
 - Performance optimizations

 ## License
14
docker-compose.test.yml
Normal file
@@ -0,0 +1,14 @@
+services:
+  postgres:
+    image: pgvector/pgvector:pg16
+    environment:
+      POSTGRES_USER: postgres
+      POSTGRES_PASSWORD: postgres
+      POSTGRES_DB: gbrain_test
+    ports:
+      - "5434:5432"
+    healthcheck:
+      test: pg_isready -U postgres
+      interval: 5s
+      timeout: 3s
+      retries: 5
40
openclaw.plugin.json
Normal file
@@ -0,0 +1,40 @@
+{
+  "name": "gbrain",
+  "version": "0.3.0",
+  "description": "Personal knowledge brain with Postgres + pgvector hybrid search",
+  "family": "bundle-plugin",
+  "configSchema": {
+    "database_url": {
+      "type": "string",
+      "required": true,
+      "description": "PostgreSQL connection URL (Supabase recommended)",
+      "uiHints": { "sensitive": true }
+    },
+    "openai_api_key": {
+      "type": "string",
+      "required": false,
+      "description": "OpenAI API key for embeddings (uses OPENAI_API_KEY env var if not set)",
+      "uiHints": { "sensitive": true }
+    }
+  },
+  "mcpServers": {
+    "gbrain": {
+      "command": "./bin/gbrain",
+      "args": ["serve"]
+    }
+  },
+  "skills": [
+    "skills/ingest",
+    "skills/query",
+    "skills/maintain",
+    "skills/enrich",
+    "skills/briefing",
+    "skills/migrate",
+    "skills/setup"
+  ],
+  "openclaw": {
+    "compat": {
+      "pluginApi": ">=2026.4.0"
+    }
+  }
+}
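
The `configSchema` block in the manifest above marks each key with `type`, `required`, and a `sensitive` UI hint. The sketch below shows one way a host could check a user-supplied config against such a schema and mask sensitive values before logging. It is an illustration only: how OpenClaw actually validates or displays plugin config is not documented in this commit, and `validateConfig` and `redactSensitive` are hypothetical helper names.

```typescript
interface ConfigField {
  type: "string";
  required: boolean;
  description: string;
  uiHints?: { sensitive?: boolean };
}
type ConfigSchema = Record<string, ConfigField>;

// Copied from the manifest's configSchema section.
const configSchema: ConfigSchema = {
  database_url: {
    type: "string",
    required: true,
    description: "PostgreSQL connection URL (Supabase recommended)",
    uiHints: { sensitive: true },
  },
  openai_api_key: {
    type: "string",
    required: false,
    description: "OpenAI API key for embeddings (uses OPENAI_API_KEY env var if not set)",
    uiHints: { sensitive: true },
  },
};

// Returns a list of human-readable problems; an empty list means the config is acceptable.
function validateConfig(schema: ConfigSchema, config: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [key, field] of Object.entries(schema)) {
    const value = config[key];
    if (value === undefined || value === "") {
      if (field.required) problems.push(`missing required key: ${key}`);
      continue;
    }
    if (typeof value !== field.type) {
      problems.push(`wrong type for ${key}: expected ${field.type}`);
    }
  }
  return problems;
}

// Masks any value whose schema entry is flagged as sensitive.
function redactSensitive(schema: ConfigSchema, config: Record<string, unknown>): Record<string, unknown> {
  const redacted: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(config)) {
    const sensitive = schema[key]?.uiHints?.sensitive === true;
    redacted[key] = sensitive ? "***" : value;
  }
  return redacted;
}
```

Because `openai_api_key` is optional, a config containing only `database_url` passes validation, which matches the manifest's note that the key can come from the `OPENAI_API_KEY` environment variable instead.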
16
package.json
@@ -1,6 +1,6 @@
 {
   "name": "gbrain",
-  "version": "0.2.0",
+  "version": "0.3.0",
   "description": "Postgres-native personal knowledge brain with hybrid RAG search",
   "type": "module",
   "main": "src/core/index.ts",
@@ -10,12 +10,22 @@
   "exports": {
     ".": "./src/core/index.ts",
     "./engine": "./src/core/engine.ts",
-    "./types": "./src/core/types.ts"
+    "./types": "./src/core/types.ts",
+    "./operations": "./src/core/operations.ts"
   },
   "scripts": {
     "dev": "bun run src/cli.ts",
     "build": "bun build --compile --outfile bin/gbrain src/cli.ts",
-    "test": "bun test"
+    "build:all": "bun build --compile --target=bun-darwin-arm64 --outfile bin/gbrain-darwin-arm64 src/cli.ts && bun build --compile --target=bun-linux-x64 --outfile bin/gbrain-linux-x64 src/cli.ts",
+    "test": "bun test",
+    "test:e2e": "bun test test/e2e/",
+    "prepublish:clawhub": "bun run build:all",
+    "publish:clawhub": "clawhub package publish . --family bundle-plugin"
+  },
+  "openclaw": {
+    "compat": {
+      "pluginApi": ">=2026.4.0"
+    }
   },
   "dependencies": {
     "@anthropic-ai/sdk": "^0.30.0",
@@ -5,27 +5,27 @@ Compile a daily briefing from brain context.
 ## Workflow

 1. **Today's meetings.** For each meeting on the calendar:
-   - Look up all participants via `gbrain query <name>`
-   - Read their pages for compiled_truth context
+   - Search gbrain for each participant by name
+   - Read their pages from gbrain for compiled_truth context
    - Summarize: who they are, recent timeline, relationship to you
-2. **Active deals.** `gbrain list --type deal` filtered to active status:
+2. **Active deals.** List deal pages in gbrain filtered to active status:
    - Deadlines approaching in the next 7 days
    - Recent timeline entries (last 7 days)
 3. **Time-sensitive threads.** Open items from timeline entries:
    - Items with deadlines in the next 48 hours
    - Follow-ups that are overdue
 4. **Recent changes.** Pages updated in the last 24 hours:
-   - What changed and why (read timeline entries)
-5. **People in play.** `gbrain list --type person` sorted by recency:
+   - What changed and why (read timeline entries from gbrain)
+5. **People in play.** List person pages in gbrain sorted by recency:
    - Updated in last 7 days
    - Have high activity (many recent timeline entries)
-6. **Stale alerts.** From `gbrain health`:
+6. **Stale alerts.** From gbrain health check:
    - Pages flagged as stale that are relevant to today's meetings

 ## Output Format

 ```
-DAILY BRIEFING — [date]
+DAILY BRIEFING -- [date]
 ========================

 MEETINGS TODAY
@@ -33,26 +33,23 @@ MEETINGS TODAY
   Participants: [name] (slug: people/name, [key context])

 ACTIVE DEALS
-- [deal name] — [status], deadline: [date]
+- [deal name] -- [status], deadline: [date]
   Recent: [latest timeline entry]

 ACTION ITEMS
-- [item] — due [date], related to [slug]
+- [item] -- due [date], related to [slug]

 RECENT CHANGES (24h)
-- [slug] — [what changed]
+- [slug] -- [what changed]

 PEOPLE IN PLAY
-- [name] — [why they're active]
+- [name] -- [why they're active]
 ```

-## Commands Used
+## Tools Used

-```
-gbrain query <name>
-gbrain get <slug>
-gbrain list --type deal
-gbrain list --type person
-gbrain health
-gbrain timeline <slug>
-```
+- Search gbrain by name (query)
+- Read a page from gbrain (get_page)
+- List pages in gbrain by type (list_pages)
+- Check gbrain health (get_health)
+- View timeline entries in gbrain (get_timeline)
@@ -11,16 +11,17 @@ Enrich person and company pages from external APIs.
|
||||
| Exa | Web mentions, articles | REST API |
|
||||
|
||||
Note: enrichment requires separate API credentials for each service. No client
|
||||
integrations ship in v1. This skill guides Claude Code to make API calls directly.
|
||||
integrations ship in v1. This skill guides the agent to make API calls directly.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Select target pages.** `gbrain list --type person` or `gbrain list --type company`
|
||||
1. **Select target pages.** List person or company pages in gbrain.
|
||||
2. **For each page:**
|
||||
- Read current compiled_truth to understand what we already know
|
||||
- Read the page from gbrain to understand what we already know
|
||||
- Call external APIs for fresh data
|
||||
- Store raw API responses: the raw JSON goes into `gbrain call put_raw_data`
|
||||
- Store raw API responses in gbrain (put_raw_data) to preserve provenance
|
||||
- Distill highlights into compiled_truth updates
|
||||
- Store the updated page in gbrain
|
||||
3. **Validation rules:**
|
||||
- Connection count < 20 on LinkedIn = likely wrong person, skip
|
||||
- Name mismatch between brain and API = skip, flag for manual review
|
||||
@@ -28,18 +29,17 @@ integrations ship in v1. This skill guides Claude Code to make API calls directl
|
||||
|
||||
## Quality Rules
|
||||
|
||||
- Raw data goes to raw_data table (preserves provenance)
|
||||
- Raw data goes to gbrain's raw_data store (preserves provenance)
|
||||
- Only distilled, useful info goes to compiled_truth
|
||||
- Always add a timeline entry: "Enriched from [source] on [date]"
|
||||
- Always add a timeline entry in gbrain: "Enriched from [source] on [date]"
|
||||
- Don't enrich the same page more than once per week unless requested
|
||||
- Rate limit: respect API rate limits, use exponential backoff
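
The backoff rule above could be sketched as follows. This is a minimal illustration, not code from the gbrain repository: `withBackoff`, `backoffDelay`, and `RetryOptions` are hypothetical names, and the "full jitter" strategy is one common choice rather than something the skill prescribes.

```typescript
// Hypothetical helper (not part of gbrain): retry an async call with
// exponential backoff and full jitter.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // upper bound of the delay before the second attempt
  maxDelayMs: number;  // cap applied to every delay
}

// Un-jittered delay ceiling before retrying after failed attempt `attempt` (1-based).
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt - 1));
}

async function withBackoff<T>(
  call: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxAttempts) break;
      // Full jitter: wait a random time in [0, backoffDelay).
      await sleep(Math.random() * backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

An agent could wrap each external API call in `withBackoff`, for example with `{ maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 30000 }`. A real client would also want to retry only on rate-limit or transient errors rather than on every failure.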
|
||||
|
||||
## Commands Used
|
||||
## Tools Used
|
||||
|
||||
```
|
||||
gbrain get <slug>
|
||||
gbrain put <slug>
|
||||
gbrain timeline-add <slug> <date> "Enriched from <source>"
|
||||
gbrain list --type person
|
||||
gbrain list --type company
|
||||
```
|
||||
- Read a page from gbrain (get_page)
|
||||
- Store/update a page in gbrain (put_page)
|
||||
- Add a timeline entry in gbrain (add_timeline_entry)
|
||||
- List pages in gbrain by type (list_pages)
|
||||
- Store raw API data in gbrain (put_raw_data)
|
||||
- Retrieve raw data from gbrain (get_raw_data)
|
||||
|
||||
@@ -6,11 +6,11 @@ Ingest meetings, articles, documents, and conversations into the brain.
|
||||
|
||||
1. **Parse the source.** Extract people, companies, dates, and events from the input.
|
||||
2. **For each entity mentioned:**
|
||||
- `gbrain get <slug>` to check if page exists
|
||||
- Read the entity's page from gbrain to check if it exists
|
||||
- If exists: update compiled_truth (rewrite State section with new info, don't append)
|
||||
- If new: `gbrain put <slug>` to create the page
|
||||
3. **Append to timeline.** `gbrain timeline-add <slug> <date> <summary>` for each event.
|
||||
4. **Create cross-reference links.** `gbrain link <from> <to> --type <relationship>` for every entity pair mentioned together.
|
||||
- If new: store the page in gbrain with the appropriate type and slug
|
||||
3. **Append to timeline.** Add a timeline entry in gbrain for each event, with date, summary, and source.
|
||||
4. **Create cross-reference links.** Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
|
||||
5. **Timeline merge.** The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
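
Steps 4 and 5 can be illustrated with a small sketch. The types and function names here are hypothetical, and the `companies/` slug prefix is an assumption; the actual parameters of `add_timeline_entry` and `add_link` are not shown in this diff.

```typescript
// Hypothetical shapes for illustration only.
interface TimelineEvent { date: string; summary: string; source: string }
interface TimelineWrite extends TimelineEvent { slug: string }

// Drop repeated slugs while keeping first-seen order.
function uniqueSlugs(slugs: string[]): string[] {
  return slugs.filter((slug, i) => slugs.indexOf(slug) === i);
}

// Timeline merge: one identical timeline entry per mentioned entity.
function fanOutTimeline(event: TimelineEvent, slugs: string[]): TimelineWrite[] {
  return uniqueSlugs(slugs).map((slug) => ({ slug, ...event }));
}

// Cross-references: every unordered pair of entities mentioned together.
function entityPairs(slugs: string[]): Array<[string, string]> {
  const unique = uniqueSlugs(slugs);
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      pairs.push([unique[i], unique[j]]);
    }
  }
  return pairs;
}
```

For the Alice, Bob, and Acme Corp example, `fanOutTimeline` yields three timeline writes and `entityPairs` yields three candidate links. The relationship type for each pair still has to be chosen by the agent.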
|
||||
|
||||
## Quality Rules
|
||||
@@ -22,13 +22,11 @@ Ingest meetings, articles, documents, and conversations into the brain.
|
||||
- Link types: knows, works_at, invested_in, founded, met_at, discussed
|
||||
- Source attribution: every timeline entry includes the source (meeting, article, email, etc.)
|
||||
|
||||
## Commands Used
|
||||
## Tools Used
|
||||
|
||||
```
|
||||
gbrain get <slug>
|
||||
gbrain put <slug> < content.md
|
||||
gbrain timeline-add <slug> <date> <summary>
|
||||
gbrain link <from> <to> --type <type>
|
||||
gbrain tags <slug>
|
||||
gbrain tag <slug> <tag>
|
||||
```
|
||||
- Read a page from gbrain (get_page)
|
||||
- Store/update a page in gbrain (put_page)
|
||||
- Add a timeline entry in gbrain (add_timeline_entry)
|
||||
- Link entities in gbrain (add_link)
|
||||
- List tags for a page (get_tags)
|
||||
- Tag a page in gbrain (add_tag)
|
||||
|
||||
@@ -1,210 +1,9 @@
|
||||
# Install GBrain
|
||||
# Install GBrain (Deprecated)
|
||||
|
||||
Set up GBrain from scratch. The agent drives the process, the human provides secrets and approvals.
|
||||
This skill has been replaced by the **setup** skill. See `skills/setup/SKILL.md`.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- A Supabase account (Pro tier recommended: $25/mo for 8GB DB + 100GB storage)
|
||||
- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
|
||||
- A git-backed markdown knowledge base (or start fresh)
|
||||
|
||||
## Phase 1: Environment Discovery
|
||||
|
||||
Scan the environment to understand what we're working with.
|
||||
|
||||
```bash
|
||||
# Find all git repos with markdown content
|
||||
echo "=== GBrain Environment Discovery ==="
|
||||
for dir in /data/* ~/git/* ~/Documents/* 2>/dev/null; do

|
||||
if [ -d "$dir/.git" ]; then
|
||||
md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
|
||||
if [ "$md_count" -gt 10 ]; then
|
||||
total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
|
||||
binary_count=$(find "$dir" -not -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.pdf" -o -name "*.mp4" -o -name "*.m4a" -o -name "*.heic" -o -name "*.tiff" -o -name "*.dng" \) 2>/dev/null | wc -l | tr -d ' ')
|
||||
echo ""
|
||||
echo " $dir ($total_size, $md_count .md files, $binary_count binary files)"
|
||||
# Detect knowledge base type
|
||||
if [ -d "$dir/.obsidian" ]; then
|
||||
echo " Type: Obsidian vault (detected, wikilink conversion needed in future release)"
|
||||
elif [ -d "$dir/logseq" ]; then
|
||||
echo " Type: Logseq (detected, block-ref conversion needed in future release)"
|
||||
else
|
||||
echo " Type: Plain markdown (ready for import)"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
done
|
||||
echo ""
|
||||
echo "=== Discovery Complete ==="
|
||||
```
|
||||
|
||||
Present findings to the human. Recommend which repos to import.
|
||||
|
||||
## Phase 2: Supabase Setup
|
||||
|
||||
### Magic Path (zero copy-pastes)
|
||||
|
||||
Check if the Supabase CLI is available:
|
||||
|
||||
```bash
|
||||
which supabase 2>/dev/null || npx supabase --version 2>/dev/null
|
||||
```
|
||||
|
||||
If available, use the magic path:
|
||||
|
||||
1. Tell the human: "I'll set up Supabase for you. Click 'Authorize' when your browser opens."
|
||||
2. Run `supabase login` (opens browser for OAuth)
|
||||
3. Run `supabase projects create --name gbrain --region us-east-1`
|
||||
4. Extract credentials from `supabase projects api-keys`
|
||||
5. Proceed to Phase 3 automatically
|
||||
|
||||
### Fallback Path (2 copy-pastes)
|
||||
|
||||
If the Supabase CLI is not available, tell the human exactly what to do:
|
||||
|
||||
1. "Log into Supabase and add a credit card: https://supabase.com/dashboard/account/billing"
|
||||
2. "Create a new project: https://supabase.com/dashboard/new/_"
|
||||
- Name: gbrain
|
||||
- Region: closest to you
|
||||
- Generate a strong password
|
||||
3. "Go to Project Settings > Database and copy the connection string (URI format)"
|
||||
- Paste it here
|
||||
4. "Go to Project Settings > API and copy the service_role key"
|
||||
- Paste it here
|
||||
|
||||
That's it. Two copy-pastes. The agent does everything else.
|
||||
|
||||
## Phase 3: Initialize GBrain
|
||||
|
||||
```bash
|
||||
gbrain init \
|
||||
--url "<database_url>" \
|
||||
--repo "<repo_path>"
|
||||
```
|
||||
|
||||
This runs:
|
||||
1. Connection test (SELECT 1)
|
||||
2. pgvector extension check (CREATE EXTENSION IF NOT EXISTS vector)
|
||||
3. Schema migration (idempotent, safe to re-run)
|
||||
4. Text import (all .md files, no embeddings yet)
|
||||
5. Sync checkpoint (writes git HEAD for seamless gbrain sync)
|
||||
|
||||
### First Search Result
|
||||
|
||||
After import completes, run a sample query to prove it works:
|
||||
|
||||
```bash
|
||||
# Query the most recently modified page's topic
|
||||
gbrain query "$(ls -t <repo_path>/*.md <repo_path>/**/*.md 2>/dev/null | head -1 | xargs head -5 | grep -i 'title:' | cut -d: -f2 | tr -d ' ')"
|
||||
```
|
||||
|
||||
Show results to the human immediately. This is the magic moment.
|
||||
|
||||
### Start Embeddings
|
||||
|
||||
```bash
|
||||
gbrain embed --stale &
|
||||
```
|
||||
|
||||
Embeddings run in background. Keyword search works NOW. Semantic search improves as embeddings complete. Check progress with `gbrain embed --status`.
|
||||
|
||||
## Phase 4: Set Up Ongoing Sync
|
||||
|
||||
```bash
|
||||
# Add to cron (every 5 minutes)
|
||||
(crontab -l 2>/dev/null; echo "*/5 * * * * gbrain sync --no-pull 2>&1 | tail -1 >> /tmp/gbrain-sync.log") | crontab -
|
||||
```
|
||||
|
||||
Or for agents that push to the brain repo, trigger sync after writes:
|
||||
```bash
|
||||
gbrain sync --no-pull
|
||||
```
|
||||
|
||||
## Phase 5: Optional File Migration
|
||||
|
||||
If the repo has >100MB of binary files:
|
||||
|
||||
1. **Tell the human what will happen:**
|
||||
"Your repo has X binary files (Y MB). I can move them to Supabase Storage to slim down git. Files stay in git history permanently. Want me to proceed?"
|
||||
|
||||
2. **If approved:**
|
||||
```bash
|
||||
gbrain health # verify everything is connected
|
||||
gbrain files sync <repo>/attachments/ # upload all files
|
||||
gbrain files verify # mandatory 100% verification
|
||||
# STOP: ask human for approval before git rm
|
||||
```
|
||||
|
||||
3. **After human approves git rm:**
|
||||
```bash
|
||||
cd <repo>
|
||||
echo "attachments/" >> .gitignore
|
||||
git rm -r --cached attachments/
|
||||
git commit -m "Move attachments to Supabase Storage"
|
||||
git push
|
||||
```
|
||||
|
||||
## Phase 6: Teach the Agent
|
||||
|
||||
Add GBrain rules to AGENTS.md (or equivalent):
|
||||
|
||||
```markdown
|
||||
## GBrain (Knowledge Search)
|
||||
|
||||
GBrain indexes your knowledge base for fast search. Always search before answering
|
||||
questions about people, companies, deals, or anything in the brain.
|
||||
|
||||
### Commands
|
||||
- `gbrain query "search terms"` -- Search the knowledge base (keyword + semantic)
|
||||
- `gbrain sync` -- Sync latest changes from git to GBrain
|
||||
- `gbrain files upload <path> --page <slug>` -- Upload a file to storage
|
||||
- `gbrain health` -- Check GBrain status
|
||||
- `gbrain stats` -- Show page count, embedding coverage, last sync
|
||||
|
||||
### Rules
|
||||
1. **Search the brain first.** Before answering any question about people, companies,
|
||||
deals, meetings, or strategy, run `gbrain query`. Your memory of file contents
|
||||
goes stale; the database doesn't.
|
||||
2. **Never commit binaries to git.** Use `gbrain files upload` instead.
|
||||
3. **After writing to the brain repo,** trigger `gbrain sync --no-pull` to update
|
||||
the search index immediately.
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
Every error tells you what happened, why, and how to fix it:
|
||||
|
||||
| What You See | Why | Fix |
|
||||
|---|---|---|
|
||||
| Connection refused | Supabase project paused or wrong URL | supabase.com/dashboard > Restore |
|
||||
| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
|
||||
| pgvector not available | Extension not enabled | Run CREATE EXTENSION vector in SQL Editor |
|
||||
| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
|
||||
| Sync anchor missing | Force push removed the commit | `gbrain sync --full` |
|
||||
| No pages found | Query before import | `gbrain import <dir>` first |
|
||||
|
||||
## Upgrading
|
||||
|
||||
Upgrade depends on how you installed:
|
||||
- **bun (standalone or library):** `bun update gbrain`
|
||||
- **ClawHub:** `clawhub update gbrain`
|
||||
- **Compiled binary:** Download the latest from [GitHub Releases](https://github.com/garrytan/gbrain/releases)
|
||||
|
||||
After upgrading:
|
||||
- Run `gbrain init` again to apply schema migrations (idempotent, safe to re-run)
|
||||
- The new `files` table gets created automatically on next init
|
||||
- Sync state is preserved across upgrades
|
||||
|
||||
## Health Check
|
||||
|
||||
Run `gbrain health` at any time to verify all connections:
|
||||
|
||||
```
|
||||
ok Database: connected
|
||||
ok pgvector: extension loaded
|
||||
ok Schema: up to date
|
||||
ok Sync: last run N min ago
|
||||
ok Embeddings: X/Y pages embedded
|
||||
```
|
||||
|
||||
Every unhealthy line includes WHY and FIX.
|
||||
The setup skill provides:
|
||||
- Auto-provision Supabase via CLI (< 2 min TTHW)
|
||||
- Manual fallback with non-interactive init
|
||||
- AGENTS.md auto-injection (upgrade-safe)
|
||||
- First import and health verification
|
||||
|
||||
@@ -4,34 +4,34 @@ Periodic brain health checks and cleanup.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Run health check.** `gbrain health` to get the dashboard.
|
||||
1. **Run health check.** Check gbrain health to get the dashboard.
|
||||
2. **Check each dimension:**
|
||||
|
||||
### Stale pages
|
||||
Pages where compiled_truth is older than the latest timeline entry. The assessment hasn't been updated to reflect recent evidence.
|
||||
- `gbrain query "stale pages"` or check health output
|
||||
- For each stale page: read timeline, determine if compiled_truth needs rewriting
|
||||
- Check the health output for stale page count
|
||||
- For each stale page: read the page from gbrain, review timeline, determine if compiled_truth needs rewriting
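
The staleness definition above amounts to a timestamp comparison. A minimal sketch, assuming a hypothetical `PageSummary` shape (the field names are not taken from the gbrain schema):

```typescript
// Hypothetical page summary; field names are assumptions for illustration.
interface PageSummary {
  slug: string;
  compiledTruthUpdatedAt: string; // ISO 8601 timestamp of the last compiled_truth rewrite
  timelineDates: string[];        // ISO 8601 dates of the page's timeline entries
}

// A page is stale when its newest timeline entry postdates compiled_truth.
function isStale(page: PageSummary): boolean {
  if (page.timelineDates.length === 0) return false;
  const latestEntry = Math.max(...page.timelineDates.map((d) => Date.parse(d)));
  return Date.parse(page.compiledTruthUpdatedAt) < latestEntry;
}
```

In practice the health output already reports stale pages, so a check like this is only useful for understanding why a given page was flagged.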
|
||||
|
||||
### Orphan pages
|
||||
Pages with zero inbound links. Nobody references them.
|
||||
- Review orphans: are they genuinely isolated or just missing links?
|
||||
- Add links from related pages or flag for deletion
|
||||
- Add links in gbrain from related pages or flag for deletion
|
||||
|
||||
### Dead links
|
||||
Links pointing to pages that don't exist.
|
||||
- Remove dead links with `gbrain unlink`
|
||||
- Remove dead links in gbrain
|
||||
|
||||
### Missing cross-references
|
||||
Pages that mention entity names but don't have formal links.
|
||||
- Read compiled_truth, extract entity mentions, create links
|
||||
- Read compiled_truth from gbrain, extract entity mentions, create links in gbrain
|
||||
|
||||
### Tag consistency
|
||||
Inconsistent tagging (e.g., "vc" vs "venture-capital", "ai" vs "artificial-intelligence").
|
||||
- Standardize to the most common variant
|
||||
- Standardize to the most common variant using gbrain tag operations
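
One way to pick "the most common variant" is sketched below. It assumes the agent has already grouped equivalent tags and counted their usage; the function name and input shapes are hypothetical.

```typescript
// Map every tag in a synonym group to the group's most frequently used tag.
// `groups` and `counts` are assumed inputs gathered by the agent beforehand.
function canonicalTags(
  groups: string[][],
  counts: Record<string, number>,
): Record<string, string> {
  const mapping: Record<string, string> = {};
  for (const group of groups) {
    if (group.length === 0) continue;
    let best = group[0];
    for (const tag of group) {
      // Strict ">" keeps the earlier tag when usage counts tie.
      if ((counts[tag] ?? 0) > (counts[best] ?? 0)) best = tag;
    }
    for (const tag of group) mapping[tag] = best;
  }
  return mapping;
}
```

For each page carrying a non-canonical tag, the agent would then add the canonical tag and remove the variant.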
|
||||
|
||||
### Embedding freshness
|
||||
Chunks without embeddings, or chunks embedded with an old model.
|
||||
- `gbrain embed --stale` to backfill
|
||||
- Refresh stale embeddings in gbrain
|
||||
|
||||
### Open threads
|
||||
Timeline items older than 30 days with unresolved action items.
|
||||
@@ -41,19 +41,16 @@ Timeline items older than 30 days with unresolved action items.
|
||||
|
||||
- Never delete pages without confirmation
|
||||
- Log all changes via timeline entries
|
||||
- Run `gbrain health` before and after to show improvement
|
||||
- Check gbrain health before and after to show improvement
|
||||
|
||||
## Commands Used
|
||||
## Tools Used
|
||||
|
||||
```
|
||||
gbrain health
|
||||
gbrain list [--type T]
|
||||
gbrain get <slug>
|
||||
gbrain backlinks <slug>
|
||||
gbrain link <from> <to> --type <type>
|
||||
gbrain unlink <from> <to>
|
||||
gbrain tag <slug> <tag>
|
||||
gbrain untag <slug> <tag>
|
||||
gbrain embed --stale
|
||||
gbrain timeline <slug>
|
||||
```
|
||||
- Check gbrain health (get_health)
|
||||
- List pages in gbrain with filters (list_pages)
|
||||
- Read a page from gbrain (get_page)
|
||||
- Check backlinks in gbrain (get_backlinks)
|
||||
- Link entities in gbrain (add_link)
|
||||
- Remove links in gbrain (remove_link)
|
||||
- Tag a page in gbrain (add_tag)
|
||||
- Remove a tag in gbrain (remove_tag)
|
||||
- View timeline in gbrain (get_timeline)
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "gbrain",
|
||||
"version": "0.2.0",
|
||||
"version": "0.3.0",
|
||||
"description": "Personal knowledge brain with hybrid RAG search",
|
||||
"skills": [
|
||||
{
|
||||
@@ -34,9 +34,9 @@
|
||||
"description": "Universal migration from Obsidian, Notion, Logseq, markdown, CSV, JSON, Roam"
|
||||
},
|
||||
{
|
||||
"name": "install",
|
||||
"path": "install/SKILL.md",
|
||||
"description": "Set up GBrain from scratch: Supabase, import, sync, file migration"
|
||||
"name": "setup",
|
||||
"path": "setup/SKILL.md",
|
||||
"description": "Set up GBrain: auto-provision Supabase, AGENTS.md injection, first import"
|
||||
}
|
||||
],
|
||||
"dependencies": {
|
||||
@@ -44,7 +44,7 @@
|
||||
"package": "gbrain"
|
||||
},
|
||||
"setup": {
|
||||
"command": "gbrain init --supabase",
|
||||
"description": "Initialize brain with Supabase (guided wizard)"
|
||||
"skill": "setup",
|
||||
"description": "Auto-provision Supabase and configure GBrain (< 2 min)"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -9,7 +9,7 @@ Universal migration from any wiki, note tool, or brain system into GBrain.
|
||||
| Obsidian | Markdown + `[[wikilinks]]` | Direct import, convert wikilinks to gbrain links |
|
||||
| Notion | Exported markdown or CSV | Parse Notion's export structure |
|
||||
| Logseq | Markdown with `((block refs))` | Convert block refs to page links |
|
||||
| Plain markdown | Any .md directory | `gbrain import <dir>` directly |
|
||||
| Plain markdown | Any .md directory | Import directory into gbrain directly |
|
||||
| CSV | Tabular data | Map columns to frontmatter fields |
|
||||
| JSON | Structured data | Map keys to page fields |
|
||||
| Roam | JSON export | Convert block structure to pages |
|
||||
@@ -18,31 +18,23 @@ Universal migration from any wiki, note tool, or brain system into GBrain.
|
||||
|
||||
1. **Assess the source.** What format? How many files? What structure?
|
||||
2. **Plan the mapping.** How do source fields map to gbrain fields (type, title, tags, compiled_truth, timeline)?
|
||||
3. **Test with a sample.** Import 5-10 files, verify with `gbrain get` and `gbrain export`.
|
||||
4. **Bulk import.** Run the full migration.
|
||||
5. **Verify.** `gbrain health` + `gbrain stats` + spot-check pages.
|
||||
6. **Build links.** Extract cross-references from content and create typed links.
|
||||
3. **Test with a sample.** Import 5-10 files, verify by reading them back from gbrain and exporting.
|
||||
4. **Bulk import.** Import the full directory into gbrain.
|
||||
5. **Verify.** Check gbrain health and statistics, spot-check pages.
|
||||
6. **Build links.** Extract cross-references from content and create typed links in gbrain.
|
||||
|
||||
## Obsidian Migration
|
||||
|
||||
```bash
|
||||
# 1. Direct import (Obsidian vaults are markdown directories)
|
||||
gbrain import /path/to/vault/
|
||||
|
||||
# 2. Convert [[wikilinks]] to gbrain links
|
||||
# The skill reads each page's compiled_truth, finds [[Name]] patterns,
|
||||
# resolves them to slugs, and creates links:
|
||||
gbrain get <slug> # read content
|
||||
# For each [[Name]] found:
|
||||
gbrain link <current-slug> <resolved-slug> --type references
|
||||
```
|
||||
1. Import the vault directory into gbrain (Obsidian vaults are markdown directories)
|
||||
2. Convert `[[wikilinks]]` to gbrain links:
|
||||
- Read each page from gbrain
|
||||
- For each `[[Name]]` found, resolve to a slug and create a link in gbrain
|
||||
- `[[Name|alias]]` uses the alias for context
|
||||
|
||||
Obsidian-specific:
|
||||
- `[[Name]]` becomes `gbrain link`
|
||||
- `[[Name|alias]]` uses the alias for context
|
||||
- Tags (`#tag`) become `gbrain tag`
|
||||
- Tags (`#tag`) become gbrain tags
|
||||
- Frontmatter properties map to gbrain frontmatter
|
||||
- Attachments (images, PDFs) are noted but not imported (future work)
|
||||
- Attachments (images, PDFs) are noted but handled separately via file storage
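
The wikilink conversion can be sketched as a parsing step followed by slug resolution. The regular expression covers only the `[[Name]]` and `[[Name|alias]]` forms mentioned above, and `toSlug` is a hypothetical normalization; gbrain's actual slug rules are not shown in this diff.

```typescript
interface WikiLink { target: string; alias?: string }

// Extract [[Name]] and [[Name|alias]] references from markdown text.
function extractWikilinks(markdown: string): WikiLink[] {
  const pattern = /\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]/g;
  const links: WikiLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const target = match[1].trim();
    const alias = match[2]?.trim();
    links.push(alias ? { target, alias } : { target });
  }
  return links;
}

// Hypothetical slug rule: lowercase, runs of non-alphanumerics become one hyphen.
function toSlug(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}
```

Heading and block references such as `[[Name#Heading]]` are not handled specially here; a fuller converter would split on `#` before resolving the slug.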
|
||||
|
||||
## Notion Migration
|
||||
|
||||
@@ -50,38 +42,31 @@ Obsidian-specific:
|
||||
2. Notion exports nested directories with UUIDs in filenames
|
||||
3. Strip UUIDs from filenames for clean slugs
|
||||
4. Map Notion's database properties to frontmatter
|
||||
5. `gbrain import` the cleaned directory
|
||||
5. Import the cleaned directory into gbrain
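
Step 3 can be sketched as a path rewrite. This assumes the Notion export appends a space and a 32-character lowercase hex ID to each file and folder name; `stripNotionId` is a hypothetical helper, so check a sample of the export before relying on the pattern.

```typescript
// Remove Notion's trailing ID from every path segment.
// Assumed pattern: " <32 hex chars>" right before "/", a file extension, or the end.
function stripNotionId(path: string): string {
  return path.replace(/ [0-9a-f]{32}(?=\.[A-Za-z0-9]+$|\/|$)/g, "");
}
```

After renaming, any links inside the exported markdown that still point at the old filenames need the same rewrite, otherwise they become dead links.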
|
||||
|
||||
## CSV Migration
|
||||
|
||||
For tabular data (e.g., CRM exports, contact lists):
|
||||
|
||||
```bash
|
||||
# For each row in the CSV:
|
||||
# 1. Create a page with column values as frontmatter
|
||||
# 2. Use a designated column as the slug (e.g., name)
|
||||
# 3. Use another column as compiled_truth (e.g., notes)
|
||||
gbrain put <slug> < generated.md
|
||||
```
|
||||
1. For each row in the CSV, create a page with column values as frontmatter
|
||||
2. Use a designated column as the slug (e.g., name)
|
||||
3. Use another column as compiled_truth (e.g., notes)
|
||||
4. Store each page in gbrain
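
The CSV steps above can be sketched as a row-to-page conversion. The row is assumed to be parsed already (column name to value); `rowToPage` and `slugify` are hypothetical helpers, and the frontmatter layout is an assumption rather than gbrain's documented page format.

```typescript
type CsvRow = Record<string, string>;

// Hypothetical slug rule: lowercase, runs of non-alphanumerics become one hyphen.
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Build a markdown page: every column except the body column becomes frontmatter.
function rowToPage(
  row: CsvRow,
  slugColumn: string,
  bodyColumn: string,
): { slug: string; markdown: string } {
  const frontmatter = Object.keys(row)
    .filter((key) => key !== bodyColumn)
    // JSON string literals are valid YAML double-quoted scalars.
    .map((key) => `${key}: ${JSON.stringify(row[key])}`);
  const body = row[bodyColumn] ?? "";
  return {
    slug: slugify(row[slugColumn] ?? ""),
    markdown: `---\n${frontmatter.join("\n")}\n---\n\n${body}\n`,
  };
}
```

Each resulting page would then be stored in gbrain under its slug. Rows whose slug column is empty produce an empty slug and should be skipped or flagged.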
|
||||
|
||||
## Verification
|
||||
|
||||
After any migration:
|
||||
1. `gbrain stats` — check page count matches source
|
||||
2. `gbrain health` — check for orphans, missing embeddings
|
||||
3. `gbrain export --dir /tmp/verify/` — round-trip test
|
||||
4. Spot-check 5-10 pages with `gbrain get`
|
||||
5. Test search: `gbrain query "someone you know is in the data"`
|
||||
1. Check gbrain statistics to verify page count matches source
|
||||
2. Check gbrain health for orphans and missing embeddings
|
||||
3. Export pages from gbrain for round-trip verification
|
||||
4. Spot-check 5-10 pages by reading them from gbrain
|
||||
5. Test search: search gbrain for a person you know is in the data
|
||||
|
||||
## Commands Used
|
||||
## Tools Used
|
||||
|
||||
```
|
||||
gbrain import <dir> [--no-embed]
|
||||
gbrain get <slug>
|
||||
gbrain put <slug>
|
||||
gbrain link <from> <to> --type <type>
|
||||
gbrain tag <slug> <tag>
|
||||
gbrain stats
|
||||
gbrain health
|
||||
gbrain export [--dir ./verify/]
|
||||
```
|
||||
- Store/update pages in gbrain (put_page)
|
||||
- Read pages from gbrain (get_page)
|
||||
- Link entities in gbrain (add_link)
|
||||
- Tag pages in gbrain (add_tag)
|
||||
- Get gbrain statistics (get_stats)
|
||||
- Check gbrain health (get_health)
|
||||
- Search gbrain (query)
|
||||
|
||||
@@ -9,10 +9,10 @@ Answer questions using the brain's knowledge with 3-layer search and synthesis.
|
||||
- Semantic query for conceptual questions
|
||||
- Structured queries (list by type, backlinks) for relational questions
|
||||
2. **Execute searches:**
|
||||
- `gbrain search <keywords>` for FTS matches
|
||||
- `gbrain query <question>` for hybrid semantic+keyword with expansion
|
||||
- `gbrain list --type <type>` or `gbrain backlinks <slug>` for structural queries
|
||||
3. **Read top results.** `gbrain get <slug>` for the top 3-5 pages to get full context.
|
||||
- Keyword search gbrain for FTS matches (search)
|
||||
- Hybrid search gbrain for semantic+keyword with expansion (query)
|
||||
- List pages in gbrain by type or check backlinks for structural queries
|
||||
3. **Read top results.** Read the top 3-5 pages from gbrain to get full context.
|
||||
4. **Synthesize answer** with citations. Every claim traces back to a specific page slug.
|
||||
5. **Flag gaps.** If the brain doesn't have info, say "the brain doesn't have information on X" rather than hallucinating.
|
||||
|
||||
@@ -25,14 +25,12 @@ Answer questions using the brain's knowledge with 3-layer search and synthesis.
|
||||
- For "what happened" questions, use timeline entries
|
||||
- For "what do we know" questions, read compiled_truth directly
|
||||
|
||||
## Commands Used
|
||||
## Tools Used
|
||||
|
||||
```
|
||||
gbrain search <query>
|
||||
gbrain query <question>
|
||||
gbrain get <slug>
|
||||
gbrain list [--type T] [--tag T]
|
||||
gbrain backlinks <slug>
|
||||
gbrain graph <slug> [--depth N]
|
||||
gbrain timeline <slug>
|
||||
```
|
||||
- Keyword search gbrain (search)
|
||||
- Hybrid search gbrain (query)
|
||||
- Read a page from gbrain (get_page)
|
||||
- List pages in gbrain with filters (list_pages)
|
||||
- Check backlinks in gbrain (get_backlinks)
|
||||
- Traverse the link graph in gbrain (traverse_graph)
|
||||
- View timeline entries in gbrain (get_timeline)
|
||||
|
||||
111
skills/setup/SKILL.md
Normal file
@@ -0,0 +1,111 @@
|
||||
# Setup GBrain
|
||||
|
||||
Set up GBrain from scratch. Target: working brain in under 2 minutes.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- A Supabase account (Pro tier recommended: $25/mo for 8GB DB + 100GB storage)
|
||||
- An OpenAI API key (for semantic search embeddings, ~$4-5 for 7,500 pages)
|
||||
- A git-backed markdown knowledge base (or start fresh)
|
||||
|
||||
## Phase A: Auto-Provision (Supabase CLI)
|
||||
|
||||
Check if the Supabase CLI is available. If it is, use the fast path:
|
||||
|
||||
1. Tell the user: "I'll set up Supabase for you. Click 'Authorize' when your browser opens."
|
||||
2. Run `supabase login` (opens browser for OAuth)
|
||||
3. Run `supabase projects create --name gbrain --region us-east-1`
|
||||
4. Extract the database connection URL from `supabase projects api-keys`
|
||||
5. Initialize gbrain with the connection URL in non-interactive mode
|
||||
6. Proceed to Phase C automatically
|
||||
|
||||
## Phase B: Manual Fallback
|
||||
|
||||
If the Supabase CLI is not available, guide the user:
|
||||
|
||||
1. "Log into Supabase and add a credit card: https://supabase.com/dashboard/account/billing"
|
||||
2. "Create a new project: https://supabase.com/dashboard/new/_"
|
||||
- Name: gbrain
|
||||
- Region: closest to you
|
||||
- Generate a strong password
|
||||
3. "Go to Project Settings > Database and copy the connection string (URI format)"
|
||||
- Paste it here
|
||||
4. Initialize gbrain with the provided URL in non-interactive mode
|
||||
|
||||
That's it. One copy-paste. The agent does everything else.
|
||||
|
||||
## Phase C: First Import
|
||||
|
||||
1. **Discover markdown repos.** Scan the environment for git repos with markdown content.
|
||||
|
||||
```bash
|
||||
echo "=== GBrain Environment Discovery ==="
|
||||
for dir in /data/* ~/git/* ~/Documents/*; do
|
||||
if [ -d "$dir/.git" ]; then
|
||||
md_count=$(find "$dir" -name "*.md" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l | tr -d ' ')
|
||||
if [ "$md_count" -gt 10 ]; then
|
||||
total_size=$(du -sh "$dir" 2>/dev/null | cut -f1)
|
||||
echo " $dir ($total_size, $md_count .md files)"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
echo "=== Discovery Complete ==="
|
||||
```
|
||||
|
||||
2. **Import the best candidate.** Import the recommended directory into gbrain.
|
||||
3. **Prove search works.** Search gbrain for a topic from the imported data. Show results immediately.
|
||||
4. **Start embeddings.** Refresh stale embeddings in gbrain (runs in background). Keyword search works immediately; semantic search improves as embeddings complete.
|
||||
|
||||
## Phase D: AGENTS.md Injection
|
||||
|
||||
Auto-inject gbrain instructions into the project's AGENTS.md (or equivalent). Use a delimited managed block that's upgrade-safe:
|
||||
|
||||
```markdown
|
||||
<!-- gbrain:start -->
|
||||
## GBrain (Knowledge Search)
|
||||
|
||||
GBrain indexes your knowledge base for fast search. Always search before answering
|
||||
questions about people, companies, deals, or anything in the brain.
|
||||
|
||||
### How to use
|
||||
- Search gbrain for any topic before answering questions
|
||||
- After writing new content, sync the repository to gbrain
|
||||
- Upload binary files to gbrain storage instead of committing to git
|
||||
- Check gbrain health periodically
|
||||
|
||||
### Rules
|
||||
1. **Search the brain first.** Before answering any question about people, companies,
|
||||
deals, meetings, or strategy, search gbrain. Your memory of file contents goes
|
||||
stale; the database doesn't.
|
||||
2. **Never commit binaries to git.** Upload to gbrain file storage instead.
|
||||
3. **After writing to the brain repo,** sync to gbrain immediately.
|
||||
<!-- gbrain:end -->
|
||||
```
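
The "upgrade-safe" property of the managed block can be sketched as an insert-or-replace keyed on the two markers. This is an illustration under assumptions, not the setup skill's actual implementation; `upsertManagedBlock` and the constant names are hypothetical.

```typescript
const MANAGED_START = "<!-- gbrain:start -->";
const MANAGED_END = "<!-- gbrain:end -->";

// Insert the managed block, or replace it in place if the markers already exist.
// Text outside the markers is never modified.
function upsertManagedBlock(existing: string, body: string): string {
  const block = `${MANAGED_START}\n${body.trim()}\n${MANAGED_END}`;
  const startIdx = existing.indexOf(MANAGED_START);
  const endIdx = existing.indexOf(MANAGED_END);
  if (startIdx !== -1 && endIdx > startIdx) {
    return existing.slice(0, startIdx) + block + existing.slice(endIdx + MANAGED_END.length);
  }
  // No block yet: append it, separated from prior content by a blank line.
  let separator = "\n\n";
  if (existing.length === 0 || existing.endsWith("\n\n")) separator = "";
  else if (existing.endsWith("\n")) separator = "\n";
  return existing + separator + block + "\n";
}
```

Because the replacement is keyed on the markers, re-running setup after an upgrade rewrites only the gbrain section and leaves the user's own AGENTS.md content untouched.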
|
||||
|
||||
## Phase E: Health Check
|
||||
|
||||
After setup is complete, check gbrain health. Every dimension should be healthy.
|
||||
Report the final state to the user:
|
||||
- Page count and statistics
|
||||
- Embedding coverage
|
||||
- Search verification (run a sample query)
|
||||
|
||||
## Error Handling
|
||||
|
||||
Every error tells you what happened, why, and how to fix it:
|
||||
|
||||
| What You See | Why | Fix |
|
||||
|---|---|---|
|
||||
| Connection refused | Supabase project paused or wrong URL | supabase.com/dashboard > Restore |
|
||||
| Password authentication failed | Wrong password | Project Settings > Database > Reset password |
|
||||
| pgvector not available | Extension not enabled | Run CREATE EXTENSION vector in SQL Editor |
|
||||
| OpenAI key invalid | Expired or wrong key | platform.openai.com/api-keys > Create new |
|
||||
| No pages found | Query before import | Import files into gbrain first |
|
||||
|
||||
## Tools Used
|
||||
|
||||
- Initialize gbrain (via CLI: gbrain init --non-interactive --url ...)
|
||||
- Import files into gbrain (via CLI: gbrain import)
|
||||
- Search gbrain (query)
|
||||
- Check gbrain health (get_health)
|
||||
- Get gbrain statistics (get_stats)
|
||||
370
src/cli.ts
@@ -1,41 +1,25 @@
|
||||
#!/usr/bin/env bun
|
||||
|
||||
import { readFileSync } from 'fs';
|
||||
import { PostgresEngine } from './core/postgres-engine.ts';
|
||||
import { loadConfig, toEngineConfig } from './core/config.ts';
|
||||
import type { BrainEngine } from './core/engine.ts';
|
||||
import { operations, OperationError } from './core/operations.ts';
|
||||
import type { Operation, OperationContext } from './core/operations.ts';
|
||||
import { serializeMarkdown } from './core/markdown.ts';
|
||||
import { VERSION } from './version.ts';
|
||||
|
||||
const COMMAND_HELP: Record<string, string> = {
|
||||
init: 'Usage: gbrain init [--supabase|--url <conn>]\n\nCreate brain (guided wizard).',
|
||||
upgrade: 'Usage: gbrain upgrade\n\nSelf-update the CLI.\n\nDetects install method (bun, binary, clawhub) and runs the appropriate update.',
|
||||
get: 'Usage: gbrain get <slug>\n\nRead a page by slug (supports fuzzy matching).',
|
||||
put: 'Usage: gbrain put <slug> [< file.md]\n\nWrite or update a page from stdin.',
|
||||
delete: 'Usage: gbrain delete <slug>\n\nDelete a page.',
|
||||
list: 'Usage: gbrain list [--type T] [--tag T] [-n N]\n\nList pages with filters.',
|
||||
search: 'Usage: gbrain search <query>\n\nKeyword search (tsvector).',
|
||||
query: 'Usage: gbrain query <question> [--no-expand]\n\nHybrid search (vector + keyword + RRF + expansion).',
|
||||
import: 'Usage: gbrain import <dir> [--no-embed]\n\nImport markdown directory (idempotent).',
|
||||
sync: 'Usage: gbrain sync [--repo <path>] [--watch] [--full]\n\nGit-to-brain incremental sync.',
|
||||
export: 'Usage: gbrain export [--dir ./out/]\n\nExport to markdown (round-trip).',
|
||||
files: 'Usage: gbrain files <list|upload|sync|verify> [options]\n\nManage stored files.\n\n files list [slug] List stored files\n files upload <file> --page <slug> Upload file to storage\n files sync <dir> Bulk upload directory\n files verify Verify all uploads',
|
||||
embed: 'Usage: gbrain embed [<slug>|--all|--stale]\n\nGenerate/refresh embeddings.',
|
||||
stats: 'Usage: gbrain stats\n\nBrain statistics.',
|
||||
health: 'Usage: gbrain health\n\nBrain health dashboard (embed coverage, stale, orphans).',
|
||||
tag: 'Usage: gbrain tag <slug> <tag>\n\nAdd tag to a page.',
|
||||
untag: 'Usage: gbrain untag <slug> <tag>\n\nRemove tag from a page.',
|
||||
tags: 'Usage: gbrain tags <slug>\n\nList tags for a page.',
|
||||
link: 'Usage: gbrain link <from> <to> [--type T]\n\nCreate typed link between pages.',
|
||||
unlink: 'Usage: gbrain unlink <from> <to>\n\nRemove link between pages.',
|
||||
backlinks: 'Usage: gbrain backlinks <slug>\n\nShow incoming links to a page.',
|
||||
graph: 'Usage: gbrain graph <slug> [--depth N]\n\nTraverse link graph (default depth 5).',
|
||||
timeline: 'Usage: gbrain timeline [<slug>]\n\nView timeline entries.',
|
||||
'timeline-add': 'Usage: gbrain timeline-add <slug> <date> <text>\n\nAdd timeline entry.',
|
||||
history: 'Usage: gbrain history <slug>\n\nPage version history.',
|
||||
revert: 'Usage: gbrain revert <slug> <version-id>\n\nRevert to previous version.',
|
||||
config: 'Usage: gbrain config [show|get|set] <key> [value]\n\nBrain config management.',
|
||||
serve: 'Usage: gbrain serve\n\nStart MCP server (stdio).',
|
||||
call: "Usage: gbrain call <tool> '<json>'\n\nRaw tool invocation.",
|
||||
};
|
||||
// Build CLI name -> operation lookup
const cliOps = new Map<string, Operation>();
for (const op of operations) {
  const name = op.cliHints?.name;
  if (name && !op.cliHints?.hidden) {
    cliOps.set(name, op);
  }
}

// CLI-only commands that bypass the operation layer
const CLI_ONLY = new Set(['init', 'upgrade', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config']);

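The lookup and the CLI-only set above together decide where a command is sent. A reduced sketch of that routing decision follows. It assumes a trimmed-down operation shape (only `name` and `cliHints`) and made-up sample operations; the real definitions live in operations.ts, which is not shown in this diff, and `OpSketch`, `buildCliOps`, and `route` are illustrative names.

```typescript
// Trimmed stand-in for the Operation type; the real one has more fields.
interface OpSketch {
  name: string;
  cliHints?: { name?: string; hidden?: boolean };
}

// Sample data for illustration only.
const sampleOps: OpSketch[] = [
  { name: 'get_page', cliHints: { name: 'get' } },
  { name: 'get_health', cliHints: { name: 'health' } },
  { name: 'internal_op', cliHints: { name: 'internal', hidden: true } },
  { name: 'mcp_only_op' }, // no cliHints, so no CLI command is registered
];

// Same filter as the loop in cli.ts: needs a CLI name and must not be hidden.
function buildCliOps(ops: OpSketch[]): Map<string, OpSketch> {
  const map = new Map<string, OpSketch>();
  for (const op of ops) {
    const name = op.cliHints?.name;
    if (name && !op.cliHints?.hidden) map.set(name, op);
  }
  return map;
}

const CLI_ONLY_SKETCH = new Set(['init', 'upgrade', 'import', 'export', 'files', 'embed', 'serve', 'call', 'config']);

type Route = 'cli-only' | 'operation' | 'unknown';

// Mirrors the order in main(): CLI-only commands first, then shared operations.
function route(command: string, cliOps: Map<string, OpSketch>): Route {
  if (CLI_ONLY_SKETCH.has(command)) return 'cli-only';
  if (cliOps.has(command)) return 'operation';
  return 'unknown';
}
```
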
async function main() {
|
||||
const args = process.argv.slice(2);
|
||||
@@ -57,179 +41,229 @@ async function main() {
|
||||
return;
|
||||
}
|
||||
|
||||
// Per-command --help (before any dispatch or DB connection)
|
||||
const subArgs = args.slice(1);
|
||||
|
||||
// Per-command --help
|
||||
if (subArgs.includes('--help') || subArgs.includes('-h')) {
|
||||
const help = COMMAND_HELP[command];
|
||||
if (help) {
|
||||
console.log(help);
|
||||
const op = cliOps.get(command);
|
||||
if (op) {
|
||||
printOpHelp(op);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
// Unknown command check (before DB connection)
|
||||
if (!COMMAND_HELP[command]) {
|
||||
// CLI-only commands
|
||||
if (CLI_ONLY.has(command)) {
|
||||
await handleCliOnly(command, subArgs);
|
||||
return;
|
||||
}
|
||||
|
||||
  // Shared operations
  const op = cliOps.get(command);
  if (!op) {
    console.error(`Unknown command: ${command}`);
    console.error('Run gbrain --help for available commands.');
    process.exit(1);
  }

  const engine = await connectEngine();
  try {
    const params = parseOpArgs(op, subArgs);
    const ctx = makeContext(engine, params);
    const result = await op.handler(ctx, params);
    const output = formatResult(op.name, result);
    if (output) process.stdout.write(output);
  } catch (e: unknown) {
    if (e instanceof OperationError) {
      console.error(`Error [${e.code}]: ${e.message}`);
      if (e.suggestion) console.error(` Fix: ${e.suggestion}`);
      process.exit(1);
    }
    console.error(e instanceof Error ? e.message : String(e));
    process.exit(1);
  } finally {
    await engine.disconnect();
  }
}

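The catch block above reads three fields from OperationError: code, message, and suggestion. The class itself is defined in operations.ts, which is not part of this excerpt, so the shape below is an assumption inferred from that usage. `OperationErrorSketch` and `formatCliError` are illustrative names, and the function returns the lines instead of printing them so the behaviour can be checked.

```typescript
// Assumed shape, inferred from how cli.ts reads e.code / e.message / e.suggestion.
class OperationErrorSketch extends Error {
  code: string;
  suggestion?: string;

  constructor(code: string, message: string, suggestion?: string) {
    super(message);
    this.name = 'OperationErrorSketch';
    this.code = code;
    this.suggestion = suggestion;
  }
}

// Builds the lines the CLI would write to stderr for a thrown value.
function formatCliError(e: unknown): string[] {
  if (e instanceof OperationErrorSketch) {
    const lines = [`Error [${e.code}]: ${e.message}`];
    if (e.suggestion) lines.push(` Fix: ${e.suggestion}`);
    return lines;
  }
  // Anything else falls back to its message, or its string form.
  return [e instanceof Error ? e.message : String(e)];
}
```
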
function parseOpArgs(op: Operation, args: string[]): Record<string, unknown> {
  const params: Record<string, unknown> = {};
  const positional = op.cliHints?.positional || [];
  let posIdx = 0;

  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg.startsWith('--')) {
      const key = arg.slice(2).replace(/-/g, '_');
      const paramDef = op.params[key];
      if (paramDef?.type === 'boolean') {
        params[key] = true;
      } else if (i + 1 < args.length) {
        params[key] = args[++i];
        if (paramDef?.type === 'number') params[key] = Number(params[key]);
      }
    } else if (posIdx < positional.length) {
      const key = positional[posIdx++];
      const paramDef = op.params[key];
      params[key] = paramDef?.type === 'number' ? Number(arg) : arg;
    }
  }

  // Read stdin for content params
  if (op.cliHints?.stdin && !params[op.cliHints.stdin] && !process.stdin.isTTY) {
    params[op.cliHints.stdin] = readFileSync('/dev/stdin', 'utf-8');
  }

  return params;
}

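To see what parseOpArgs produces for a concrete command line, the same rules can be exercised in isolation. The sketch below re-implements only the flag and positional handling (no stdin) against a hypothetical operation definition. `ParamDefSketch`, `parseArgsSketch`, and the sample parameters are assumptions for illustration.

```typescript
type ParamType = 'string' | 'number' | 'boolean';
interface ParamDefSketch { type: ParamType }

function parseArgsSketch(
  paramDefs: Record<string, ParamDefSketch>,
  positional: string[],
  args: string[],
): Record<string, unknown> {
  const params: Record<string, unknown> = {};
  let posIdx = 0;
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg.startsWith('--')) {
      const key = arg.slice(2).replace(/-/g, '_'); // --dry-run -> dry_run
      const def = paramDefs[key];
      if (def?.type === 'boolean') {
        params[key] = true; // boolean flags take no value
      } else if (i + 1 < args.length) {
        const raw = args[++i]; // any other flag consumes the next token
        params[key] = def?.type === 'number' ? Number(raw) : raw;
      }
    } else if (posIdx < positional.length) {
      const key = positional[posIdx++];
      params[key] = paramDefs[key]?.type === 'number' ? Number(arg) : arg;
    }
  }
  return params;
}

// Hypothetical operation: one positional slug, a numeric depth, a boolean dry_run.
const defs: Record<string, ParamDefSketch> = {
  slug: { type: 'string' },
  depth: { type: 'number' },
  dry_run: { type: 'boolean' },
};
const parsed = parseArgsSketch(defs, ['slug'], ['people/john', '--depth', '3', '--dry-run']);
// parsed is { slug: 'people/john', depth: 3, dry_run: true }
```

Under these rules a token that does not start with `--` (including a short flag such as `-n`) is treated as a positional argument, or ignored once the positional slots are used up.
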
function makeContext(engine: BrainEngine, params: Record<string, unknown>): OperationContext {
  return {
    engine,
    config: loadConfig() || { engine: 'postgres' },
    logger: { info: console.log, warn: console.warn, error: console.error },
    dryRun: (params.dry_run as boolean) || false,
  };
}

function formatResult(opName: string, result: unknown): string {
|
||||
switch (opName) {
|
||||
case 'get_page': {
|
||||
const r = result as any;
|
||||
if (r.error === 'ambiguous_slug') {
|
||||
return `Ambiguous slug. Did you mean:\n${r.candidates.map((c: string) => ` ${c}`).join('\n')}\n`;
|
||||
}
|
||||
return serializeMarkdown(r.frontmatter || {}, r.compiled_truth || '', r.timeline || '', {
|
||||
type: r.type, title: r.title, tags: r.tags || [],
|
||||
});
|
||||
}
|
||||
case 'list_pages': {
|
||||
const pages = result as any[];
|
||||
if (pages.length === 0) return 'No pages found.\n';
|
||||
return pages.map(p =>
|
||||
`${p.slug}\t${p.type}\t${p.updated_at?.toString().slice(0, 10) || '?'}\t${p.title}`,
|
||||
).join('\n') + '\n';
|
||||
}
|
||||
case 'search':
|
||||
case 'query': {
|
||||
const results = result as any[];
|
||||
if (results.length === 0) return 'No results.\n';
|
||||
return results.map(r =>
|
||||
`[${r.score?.toFixed(4) || '?'}] ${r.slug} -- ${r.chunk_text?.slice(0, 100) || ''}${r.stale ? ' (stale)' : ''}`,
|
||||
).join('\n') + '\n';
|
||||
}
|
||||
case 'get_tags': {
|
||||
const tags = result as string[];
|
||||
return tags.length > 0 ? tags.join(', ') + '\n' : 'No tags.\n';
|
||||
}
|
||||
case 'get_stats': {
|
||||
const s = result as any;
|
||||
const lines = [
|
||||
`Pages: ${s.page_count}`,
|
||||
`Chunks: ${s.chunk_count}`,
|
||||
`Embedded: ${s.embedded_count}`,
|
||||
`Links: ${s.link_count}`,
|
||||
`Tags: ${s.tag_count}`,
|
||||
`Timeline: ${s.timeline_entry_count}`,
|
||||
];
|
||||
if (s.pages_by_type) {
|
||||
lines.push('', 'By type:');
|
||||
for (const [k, v] of Object.entries(s.pages_by_type)) {
|
||||
lines.push(` ${k}: ${v}`);
|
||||
}
|
||||
}
|
||||
return lines.join('\n') + '\n';
|
||||
}
|
||||
case 'get_health': {
|
||||
const h = result as any;
|
||||
const score = Math.max(0, 10
|
||||
- (h.missing_embeddings > 0 ? 2 : 0)
|
||||
- (h.stale_pages > 0 ? 1 : 0)
|
||||
- (h.dead_links > 0 ? 1 : 0)
|
||||
- (h.orphan_pages > 0 ? 1 : 0));
|
||||
return [
|
||||
`Health score: ${score}/10`,
|
||||
`Embed coverage: ${(h.embed_coverage * 100).toFixed(1)}%`,
|
||||
`Missing embeddings: ${h.missing_embeddings}`,
|
||||
`Stale pages: ${h.stale_pages}`,
|
||||
`Orphan pages: ${h.orphan_pages}`,
|
||||
`Dead links: ${h.dead_links}`,
|
||||
].join('\n') + '\n';
|
||||
}
|
||||
case 'get_timeline': {
|
||||
const entries = result as any[];
|
||||
if (entries.length === 0) return 'No timeline entries.\n';
|
||||
return entries.map(e =>
|
||||
`${e.date} ${e.summary}${e.source ? ` [${e.source}]` : ''}`,
|
||||
).join('\n') + '\n';
|
||||
}
|
||||
case 'get_versions': {
|
||||
const versions = result as any[];
|
||||
if (versions.length === 0) return 'No versions.\n';
|
||||
return versions.map(v =>
|
||||
`#${v.id} ${v.snapshot_at?.toString().slice(0, 19) || '?'} ${v.compiled_truth?.slice(0, 60) || ''}...`,
|
||||
).join('\n') + '\n';
|
||||
}
|
||||
default:
|
||||
return JSON.stringify(result, null, 2) + '\n';
|
||||
}
|
||||
}
|
||||
|
||||
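The health score computed in the get_health case above is a fixed-penalty heuristic: start at 10 and subtract a constant for each category that has any problem at all, regardless of how many items are affected. Pulled out as a pure function it looks like the sketch below. The name `healthScore` and the `HealthCounts` type are illustrative, and only the four fields the formula reads are modeled.

```typescript
interface HealthCounts {
  missing_embeddings: number;
  stale_pages: number;
  dead_links: number;
  orphan_pages: number;
}

// Same arithmetic as the get_health branch of formatResult.
function healthScore(h: HealthCounts): number {
  return Math.max(0, 10
    - (h.missing_embeddings > 0 ? 2 : 0)
    - (h.stale_pages > 0 ? 1 : 0)
    - (h.dead_links > 0 ? 1 : 0)
    - (h.orphan_pages > 0 ? 1 : 0));
}

// 40 missing embeddings and 1 dead link: 10 - 2 - 1 = 7
const example = healthScore({ missing_embeddings: 40, stale_pages: 0, dead_links: 1, orphan_pages: 0 });
```

With these penalties the lowest reachable score is 5 (10 - 2 - 1 - 1 - 1), so the `Math.max(0, ...)` guard does not come into play with the current weights.
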
async function handleCliOnly(command: string, args: string[]) {
|
||||
// Commands that don't need a database connection
|
||||
if (command === 'init') {
|
||||
const { runInit } = await import('./commands/init.ts');
|
||||
await runInit(subArgs);
|
||||
await runInit(args);
|
||||
return;
|
||||
}
|
||||
|
||||
if (command === 'upgrade') {
|
||||
const { runUpgrade } = await import('./commands/upgrade.ts');
|
||||
await runUpgrade(subArgs);
|
||||
await runUpgrade(args);
|
||||
return;
|
||||
}
|
||||
|
||||
// All other commands need a database connection
|
||||
// All remaining CLI-only commands need a DB connection
|
||||
const engine = await connectEngine();
|
||||
|
||||
try {
|
||||
switch (command) {
|
||||
case 'get': {
|
||||
const { runGet } = await import('./commands/get.ts');
|
||||
await runGet(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'put': {
|
||||
const { runPut } = await import('./commands/put.ts');
|
||||
await runPut(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'list': {
|
||||
const { runList } = await import('./commands/list.ts');
|
||||
await runList(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'search': {
|
||||
const { runSearch } = await import('./commands/search.ts');
|
||||
await runSearch(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'query': {
|
||||
const { runQuery } = await import('./commands/query.ts');
|
||||
await runQuery(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'import': {
|
||||
const { runImport } = await import('./commands/import.ts');
|
||||
await runImport(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'sync': {
|
||||
const { runSync } = await import('./commands/sync.ts');
|
||||
await runSync(engine, subArgs);
|
||||
await runImport(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'export': {
|
||||
const { runExport } = await import('./commands/export.ts');
|
||||
await runExport(engine, subArgs);
|
||||
await runExport(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'files': {
|
||||
const { runFiles } = await import('./commands/files.ts');
|
||||
await runFiles(engine, subArgs);
|
||||
await runFiles(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'embed': {
|
||||
const { runEmbed } = await import('./commands/embed.ts');
|
||||
await runEmbed(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'stats': {
|
||||
const { runStats } = await import('./commands/stats.ts');
|
||||
await runStats(engine);
|
||||
break;
|
||||
}
|
||||
case 'health': {
|
||||
const { runHealth } = await import('./commands/health.ts');
|
||||
await runHealth(engine);
|
||||
break;
|
||||
}
|
||||
case 'tag': {
|
||||
const { runTag } = await import('./commands/tags.ts');
|
||||
await runTag(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'untag': {
|
||||
const { runUntag } = await import('./commands/tags.ts');
|
||||
await runUntag(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'tags': {
|
||||
const { runTags } = await import('./commands/tags.ts');
|
||||
await runTags(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'link': {
|
||||
const { runLink } = await import('./commands/link.ts');
|
||||
await runLink(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'unlink': {
|
||||
const { runUnlink } = await import('./commands/link.ts');
|
||||
await runUnlink(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'backlinks': {
|
||||
const { runBacklinks } = await import('./commands/link.ts');
|
||||
await runBacklinks(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'graph': {
|
||||
const { runGraph } = await import('./commands/link.ts');
|
||||
await runGraph(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'timeline': {
|
||||
const { runTimeline } = await import('./commands/timeline.ts');
|
||||
await runTimeline(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'timeline-add': {
|
||||
const { runTimelineAdd } = await import('./commands/timeline.ts');
|
||||
await runTimelineAdd(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'delete': {
|
||||
const { runDelete } = await import('./commands/delete.ts');
|
||||
await runDelete(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'history': {
|
||||
const { runHistory } = await import('./commands/version.ts');
|
||||
await runHistory(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'revert': {
|
||||
const { runRevert } = await import('./commands/version.ts');
|
||||
await runRevert(engine, subArgs);
|
||||
break;
|
||||
}
|
||||
case 'config': {
|
||||
const { runConfig } = await import('./commands/config.ts');
|
||||
await runConfig(engine, subArgs);
|
||||
await runEmbed(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'serve': {
|
||||
const { runServe } = await import('./commands/serve.ts');
|
||||
await runServe(engine);
|
||||
break;
|
||||
return; // serve doesn't disconnect
|
||||
}
|
||||
case 'call': {
|
||||
const { runCall } = await import('./commands/call.ts');
|
||||
await runCall(engine, subArgs);
|
||||
await runCall(engine, args);
|
||||
break;
|
||||
}
|
||||
case 'config': {
|
||||
const { runConfig } = await import('./commands/config.ts');
|
||||
await runConfig(engine, args);
|
||||
break;
|
||||
}
|
||||
}
|
||||
} finally {
|
||||
await engine.disconnect();
|
||||
if (command !== 'serve') await engine.disconnect();
|
||||
}
|
||||
}
|
||||
|
||||
@@ -239,14 +273,34 @@ async function connectEngine(): Promise<BrainEngine> {
|
||||
console.error('No brain configured. Run: gbrain init --supabase');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const engine = new PostgresEngine();
|
||||
await engine.connect(toEngineConfig(config));
|
||||
return engine;
|
||||
}
|
||||
|
||||
function printOpHelp(op: Operation) {
  const positional = (op.cliHints?.positional || []).map(p => `<${p}>`).join(' ');
  const name = op.cliHints?.name || op.name;
  console.log(`Usage: gbrain ${name} ${positional} [options]\n`);
  console.log(op.description + '\n');
  const entries = Object.entries(op.params);
  if (entries.length > 0) {
    console.log('Options:');
    for (const [key, def] of entries) {
      const isPos = op.cliHints?.positional?.includes(key);
      const req = def.required ? ' (required)' : '';
      const prefix = isPos ? ` <${key}>` : ` --${key.replace(/_/g, '-')}`;
      console.log(`${prefix.padEnd(28)} ${def.description || ''}${req}`);
    }
  }
}

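printOpHelp applies the inverse of the parser's key mapping: a parameter key such as dry_run is shown as --dry-run, and every label is padded to a fixed column before the description. A small sketch of one option line follows, with `HelpParam` and `optionLine` as hypothetical names and the sample descriptions invented for the example.

```typescript
interface HelpParam { description?: string; required?: boolean }

// Renders one line of the "Options:" block for a single parameter.
function optionLine(key: string, def: HelpParam, isPositional: boolean): string {
  const req = def.required ? ' (required)' : '';
  // Positional params are shown as <key>; flags turn underscores into dashes.
  const prefix = isPositional ? ` <${key}>` : ` --${key.replace(/_/g, '-')}`;
  // padEnd(28) aligns descriptions in one column; longer labels are not truncated.
  return `${prefix.padEnd(28)} ${def.description || ''}${req}`;
}

const flagLine = optionLine('dry_run', { description: 'Preview only' }, false);
const slugLine = optionLine('slug', { description: 'Page slug', required: true }, true);
```
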
function printHelp() {
|
||||
console.log(`gbrain ${VERSION} — personal knowledge brain
|
||||
// Gather shared operations grouped by category
|
||||
const cliNames = Array.from(cliOps.entries())
|
||||
.map(([name, op]) => ({ name, desc: op.description }));
|
||||
|
||||
console.log(`gbrain ${VERSION} -- personal knowledge brain
|
||||
|
||||
USAGE
|
||||
gbrain <command> [options]
|
||||
@@ -263,7 +317,7 @@ PAGES
|
||||
|
||||
SEARCH
|
||||
search <query> Keyword search (tsvector)
|
||||
query <question> Hybrid search (RRF + expansion)
|
||||
query <question> [--no-expand] Hybrid search (RRF + expansion)
|
||||
|
||||
IMPORT/EXPORT
|
||||
import <dir> [--no-embed] Import markdown directory
|
||||
@@ -299,7 +353,7 @@ ADMIN
|
||||
health Brain health dashboard
|
||||
history <slug> Page version history
|
||||
revert <slug> <version-id> Revert to version
|
||||
config [show|get|set] <key> [value] Brain config
|
||||
config [show|get|set] <key> [val] Brain config
|
||||
serve MCP server (stdio)
|
||||
call <tool> '<json>' Raw tool invocation
|
||||
version Version info
|
||||
|
||||
@@ -1,18 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runDelete(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain delete <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const page = await engine.getPage(slug);
|
||||
if (!page) {
|
||||
console.error(`Page not found: ${slug}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
await engine.deletePage(slug);
|
||||
console.log(`Deleted: ${slug}`);
|
||||
}
|
||||
@@ -9,7 +9,6 @@ interface FileRecord {
|
||||
page_slug: string | null;
|
||||
filename: string;
|
||||
storage_path: string;
|
||||
storage_url: string;
|
||||
mime_type: string | null;
|
||||
size_bytes: number;
|
||||
content_hash: string;
|
||||
@@ -108,13 +107,9 @@ async function uploadFile(args: string[]) {
|
||||
return;
|
||||
}
|
||||
|
||||
// TODO: actual Supabase Storage upload goes here
|
||||
// For now, record metadata in Postgres
|
||||
const storageUrl = `https://storage.supabase.co/brain-files/${storagePath}`;
|
||||
|
||||
await sql`
|
||||
INSERT INTO files (page_slug, filename, storage_path, storage_url, mime_type, size_bytes, content_hash, metadata)
|
||||
VALUES (${pageSlug}, ${filename}, ${storagePath}, ${storageUrl}, ${mimeType}, ${stat.size}, ${hash}, ${'{}'}::jsonb)
|
||||
INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata)
|
||||
VALUES (${pageSlug}, ${filename}, ${storagePath}, ${mimeType}, ${stat.size}, ${hash}, ${'{}'}::jsonb)
|
||||
ON CONFLICT (storage_path) DO UPDATE SET
|
||||
content_hash = EXCLUDED.content_hash,
|
||||
size_bytes = EXCLUDED.size_bytes,
|
||||
@@ -161,11 +156,9 @@ async function syncFiles(dir?: string) {
|
||||
const pathParts = relativePath.split('/');
|
||||
const pageSlug = pathParts.length > 1 ? pathParts.slice(0, -1).join('/') : null;
|
||||
|
||||
const storageUrl = `https://storage.supabase.co/brain-files/${storagePath}`;
|
||||
|
||||
await sql`
|
||||
INSERT INTO files (page_slug, filename, storage_path, storage_url, mime_type, size_bytes, content_hash, metadata)
|
||||
VALUES (${pageSlug}, ${filename}, ${storagePath}, ${storageUrl}, ${mimeType}, ${stat.size}, ${hash}, ${'{}'}::jsonb)
|
||||
INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata)
|
||||
VALUES (${pageSlug}, ${filename}, ${storagePath}, ${mimeType}, ${stat.size}, ${hash}, ${'{}'}::jsonb)
|
||||
ON CONFLICT (storage_path) DO UPDATE SET
|
||||
content_hash = EXCLUDED.content_hash,
|
||||
size_bytes = EXCLUDED.size_bytes,
|
||||
|
||||
@@ -1,37 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { serializeMarkdown } from '../core/markdown.ts';
|
||||
|
||||
export async function runGet(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain get <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Try exact match first, then fuzzy resolve
|
||||
let page = await engine.getPage(slug);
|
||||
if (!page) {
|
||||
const candidates = await engine.resolveSlugs(slug);
|
||||
if (candidates.length === 1) {
|
||||
page = await engine.getPage(candidates[0]);
|
||||
} else if (candidates.length > 1) {
|
||||
console.error(`Ambiguous slug "${slug}". Did you mean:`);
|
||||
for (const c of candidates) console.error(` ${c}`);
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
if (!page) {
|
||||
console.error(`Page not found: ${slug}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const tags = await engine.getTags(page.slug);
|
||||
const md = serializeMarkdown(
|
||||
page.frontmatter,
|
||||
page.compiled_truth,
|
||||
page.timeline,
|
||||
{ type: page.type, title: page.title, tags },
|
||||
);
|
||||
process.stdout.write(md);
|
||||
}
|
||||
@@ -1,36 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runHealth(engine: BrainEngine) {
|
||||
const health = await engine.getHealth();
|
||||
|
||||
const coveragePct = (health.embed_coverage * 100).toFixed(1);
|
||||
|
||||
console.log('Brain Health Dashboard');
|
||||
console.log('======================');
|
||||
console.log(`Pages: ${health.page_count}`);
|
||||
console.log(`Embed coverage: ${coveragePct}%`);
|
||||
console.log(`Missing embeddings: ${health.missing_embeddings}`);
|
||||
console.log(`Stale pages: ${health.stale_pages}`);
|
||||
console.log(`Orphan pages: ${health.orphan_pages}`);
|
||||
console.log(`Dead links: ${health.dead_links}`);
|
||||
|
||||
// Health score: simple heuristic
|
||||
let score = 10;
|
||||
if (health.embed_coverage < 0.5) score -= 3;
|
||||
else if (health.embed_coverage < 0.9) score -= 1;
|
||||
if (health.stale_pages > health.page_count * 0.2) score -= 2;
|
||||
if (health.orphan_pages > health.page_count * 0.3) score -= 1;
|
||||
if (health.dead_links > 0) score -= 1;
|
||||
if (health.missing_embeddings > 0) score -= 1;
|
||||
score = Math.max(0, score);
|
||||
|
||||
console.log(`\nHealth score: ${score}/10`);
|
||||
|
||||
if (score < 7) {
|
||||
console.log('\nRecommendations:');
|
||||
if (health.missing_embeddings > 0) console.log(' Run: gbrain embed --stale');
|
||||
if (health.stale_pages > 0) console.log(' Review stale pages (compiled_truth older than timeline)');
|
||||
if (health.orphan_pages > 0) console.log(' Add links to orphan pages');
|
||||
if (health.dead_links > 0) console.log(' Fix dead links');
|
||||
}
|
||||
}
|
||||
@@ -4,17 +4,28 @@ import { saveConfig, type GBrainConfig } from '../core/config.ts';
|
||||
|
||||
export async function runInit(args: string[]) {
  const isSupabase = args.includes('--supabase');
  const isNonInteractive = args.includes('--non-interactive');
  const urlIndex = args.indexOf('--url');
  const manualUrl = urlIndex !== -1 ? args[urlIndex + 1] : null;
  const keyIndex = args.indexOf('--key');
  const apiKey = keyIndex !== -1 ? args[keyIndex + 1] : null;

  let databaseUrl: string;

  if (manualUrl) {
    databaseUrl = manualUrl;
  } else if (isNonInteractive) {
    // Non-interactive mode requires --url
    const envUrl = process.env.GBRAIN_DATABASE_URL || process.env.DATABASE_URL;
    if (envUrl) {
      databaseUrl = envUrl;
    } else {
      console.error('--non-interactive requires --url <connection_string> or GBRAIN_DATABASE_URL env var');
      process.exit(1);
    }
  } else if (isSupabase) {
    databaseUrl = await supabaseWizard();
  } else {
    // Default to supabase wizard
    databaseUrl = await supabaseWizard();
  }

@@ -30,6 +41,7 @@ export async function runInit(args: string[]) {
|
||||
const config: GBrainConfig = {
|
||||
engine: 'postgres',
|
||||
database_url: databaseUrl,
|
||||
...(apiKey ? { openai_api_key: apiKey } : {}),
|
||||
};
|
||||
saveConfig(config);
|
||||
console.log('Config saved to ~/.gbrain/config.json');
|
||||
|
||||
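The branch order in runInit above gives a fixed precedence for the connection string: --url first, then (only in non-interactive mode) GBRAIN_DATABASE_URL, then DATABASE_URL, and otherwise the Supabase wizard. Below is a side-effect-free sketch of that decision. `resolveDatabaseUrl` and its return shape are hypothetical, and the wizard call and process.exit are replaced by tagged results so the logic can be tested without a terminal.

```typescript
type Resolution =
  | { kind: 'url'; url: string }
  | { kind: 'wizard' }
  | { kind: 'error'; message: string };

function resolveDatabaseUrl(
  args: string[],
  env: Record<string, string | undefined>,
): Resolution {
  // An explicit --url always wins.
  const urlIndex = args.indexOf('--url');
  const manualUrl = urlIndex !== -1 ? args[urlIndex + 1] : undefined;
  if (manualUrl) return { kind: 'url', url: manualUrl };

  if (args.includes('--non-interactive')) {
    // Env fallback applies only in non-interactive mode.
    const envUrl = env.GBRAIN_DATABASE_URL || env.DATABASE_URL;
    if (envUrl) return { kind: 'url', url: envUrl };
    return {
      kind: 'error',
      message: '--non-interactive requires --url <connection_string> or GBRAIN_DATABASE_URL env var',
    };
  }

  // Both --supabase and the default path run the interactive wizard.
  return { kind: 'wizard' };
}
```

In interactive mode the environment variables are not consulted at this point, so a set DATABASE_URL does not skip the wizard.
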
@@ -1,68 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runLink(engine: BrainEngine, args: string[]) {
|
||||
const from = args[0];
|
||||
const to = args[1];
|
||||
const typeIdx = args.indexOf('--type');
|
||||
const linkType = typeIdx !== -1 ? args[typeIdx + 1] : '';
|
||||
|
||||
if (!from || !to) {
|
||||
console.error('Usage: gbrain link <from> <to> [--type <type>]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
await engine.addLink(from, to, '', linkType);
|
||||
console.log(`Linked ${from} -> ${to}${linkType ? ` (${linkType})` : ''}`);
|
||||
}
|
||||
|
||||
export async function runUnlink(engine: BrainEngine, args: string[]) {
|
||||
const [from, to] = args;
|
||||
if (!from || !to) {
|
||||
console.error('Usage: gbrain unlink <from> <to>');
|
||||
process.exit(1);
|
||||
}
|
||||
await engine.removeLink(from, to);
|
||||
console.log(`Unlinked ${from} -> ${to}`);
|
||||
}
|
||||
|
||||
export async function runBacklinks(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain backlinks <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const links = await engine.getBacklinks(slug);
|
||||
if (links.length === 0) {
|
||||
console.log(`No backlinks to ${slug}`);
|
||||
return;
|
||||
}
|
||||
|
||||
for (const l of links) {
|
||||
const typeStr = l.link_type ? ` (${l.link_type})` : '';
|
||||
console.log(`${l.from_slug}${typeStr}`);
|
||||
}
|
||||
console.log(`\n${links.length} backlinks`);
|
||||
}
|
||||
|
||||
export async function runGraph(engine: BrainEngine, args: string[]) {
|
||||
const slug = args.find(a => !a.startsWith('--'));
|
||||
const depthIdx = args.indexOf('--depth');
|
||||
const depth = depthIdx !== -1 ? parseInt(args[depthIdx + 1], 10) : 5;
|
||||
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain graph <slug> [--depth N]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const nodes = await engine.traverseGraph(slug, depth);
|
||||
|
||||
for (const node of nodes) {
|
||||
const indent = ' '.repeat(node.depth);
|
||||
const links = node.links.map(l => `${l.to_slug}${l.link_type ? `(${l.link_type})` : ''}`);
|
||||
console.log(`${indent}${node.slug} [${node.type}]`);
|
||||
if (links.length > 0) {
|
||||
console.log(`${indent} -> ${links.join(', ')}`);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,25 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import type { PageType } from '../core/types.ts';
|
||||
|
||||
export async function runList(engine: BrainEngine, args: string[]) {
|
||||
const typeIdx = args.indexOf('--type');
|
||||
const tagIdx = args.indexOf('--tag');
|
||||
const limitIdx = args.indexOf('-n');
|
||||
|
||||
const type = typeIdx !== -1 ? (args[typeIdx + 1] as PageType) : undefined;
|
||||
const tag = tagIdx !== -1 ? args[tagIdx + 1] : undefined;
|
||||
const limit = limitIdx !== -1 ? parseInt(args[limitIdx + 1], 10) : 50;
|
||||
|
||||
const pages = await engine.listPages({ type, tag, limit });
|
||||
|
||||
if (pages.length === 0) {
|
||||
console.log('No pages found.');
|
||||
return;
|
||||
}
|
||||
|
||||
for (const p of pages) {
|
||||
const date = p.updated_at.toISOString().split('T')[0];
|
||||
console.log(`${p.slug}\t${p.type}\t${date}\t${p.title}`);
|
||||
}
|
||||
console.log(`\n${pages.length} pages`);
|
||||
}
|
||||
@@ -1,50 +0,0 @@
|
||||
import { readFileSync } from 'fs';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { parseMarkdown } from '../core/markdown.ts';
|
||||
|
||||
export async function runPut(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain put <slug> [< file.md]');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Read from stdin or file arg
|
||||
let content: string;
|
||||
const fileArg = args[1];
|
||||
if (fileArg) {
|
||||
content = readFileSync(fileArg, 'utf-8');
|
||||
} else if (!process.stdin.isTTY) {
|
||||
content = readFileSync('/dev/stdin', 'utf-8');
|
||||
} else {
|
||||
console.error('Provide content via stdin or file argument');
|
||||
console.error(' gbrain put people/john < john.md');
|
||||
console.error(' cat john.md | gbrain put people/john');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const parsed = parseMarkdown(content, slug + '.md');
|
||||
|
||||
// Create version snapshot before updating
|
||||
const existing = await engine.getPage(slug);
|
||||
if (existing) {
|
||||
await engine.createVersion(slug);
|
||||
}
|
||||
|
||||
const page = await engine.putPage(slug, {
|
||||
type: parsed.type,
|
||||
title: parsed.title,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline,
|
||||
frontmatter: parsed.frontmatter,
|
||||
});
|
||||
|
||||
// Update tags
|
||||
if (parsed.tags.length > 0) {
|
||||
for (const tag of parsed.tags) {
|
||||
await engine.addTag(slug, tag);
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`${existing ? 'Updated' : 'Created'}: ${page.slug} (${page.type})`);
|
||||
}
|
||||
@@ -1,32 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { hybridSearch } from '../core/search/hybrid.ts';
|
||||
import { expandQuery } from '../core/search/expansion.ts';
|
||||
|
||||
export async function runQuery(engine: BrainEngine, args: string[]) {
|
||||
const query = args.filter(a => !a.startsWith('--')).join(' ');
|
||||
const noExpand = args.includes('--no-expand');
|
||||
|
||||
if (!query) {
|
||||
console.error('Usage: gbrain query <question>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const results = await hybridSearch(engine, query, {
|
||||
limit: 20,
|
||||
expansion: !noExpand,
|
||||
expandFn: expandQuery,
|
||||
});
|
||||
|
||||
if (results.length === 0) {
|
||||
console.log('No results found.');
|
||||
return;
|
||||
}
|
||||
|
||||
for (const r of results) {
|
||||
const staleTag = r.stale ? ' [STALE]' : '';
|
||||
console.log(`${r.slug} (${r.type}) score=${r.score.toFixed(4)}${staleTag}`);
|
||||
console.log(` ${r.chunk_text.slice(0, 120)}...`);
|
||||
console.log();
|
||||
}
|
||||
console.log(`${results.length} results`);
|
||||
}
|
||||
@@ -1,24 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runSearch(engine: BrainEngine, args: string[]) {
|
||||
const query = args.join(' ');
|
||||
if (!query) {
|
||||
console.error('Usage: gbrain search <query>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const results = await engine.searchKeyword(query, { limit: 20 });
|
||||
|
||||
if (results.length === 0) {
|
||||
console.log('No results found.');
|
||||
return;
|
||||
}
|
||||
|
||||
for (const r of results) {
|
||||
const staleTag = r.stale ? ' [STALE]' : '';
|
||||
console.log(`${r.slug} (${r.type}) score=${r.score.toFixed(3)}${staleTag}`);
|
||||
console.log(` ${r.chunk_text.slice(0, 120)}...`);
|
||||
console.log();
|
||||
}
|
||||
console.log(`${results.length} results`);
|
||||
}
|
||||
@@ -1,21 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runStats(engine: BrainEngine) {
|
||||
const stats = await engine.getStats();
|
||||
|
||||
console.log('Brain Statistics');
|
||||
console.log('================');
|
||||
console.log(`Pages: ${stats.page_count}`);
|
||||
console.log(`Chunks: ${stats.chunk_count}`);
|
||||
console.log(`Embedded: ${stats.embedded_count}`);
|
||||
console.log(`Links: ${stats.link_count}`);
|
||||
console.log(`Tags: ${stats.tag_count}`);
|
||||
console.log(`Timeline entries: ${stats.timeline_entry_count}`);
|
||||
|
||||
if (Object.keys(stats.pages_by_type).length > 0) {
|
||||
console.log('\nPages by type:');
|
||||
for (const [type, count] of Object.entries(stats.pages_by_type)) {
|
||||
console.log(` ${type}: ${count}`);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,36 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runTags(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain tags <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const tags = await engine.getTags(slug);
|
||||
if (tags.length === 0) {
|
||||
console.log(`No tags for ${slug}`);
|
||||
} else {
|
||||
console.log(tags.join(', '));
|
||||
}
|
||||
}
|
||||
|
||||
export async function runTag(engine: BrainEngine, args: string[]) {
|
||||
const [slug, tag] = args;
|
||||
if (!slug || !tag) {
|
||||
console.error('Usage: gbrain tag <slug> <tag>');
|
||||
process.exit(1);
|
||||
}
|
||||
await engine.addTag(slug, tag);
|
||||
console.log(`Tagged ${slug} with "${tag}"`);
|
||||
}
|
||||
|
||||
export async function runUntag(engine: BrainEngine, args: string[]) {
|
||||
const [slug, tag] = args;
|
||||
if (!slug || !tag) {
|
||||
console.error('Usage: gbrain untag <slug> <tag>');
|
||||
process.exit(1);
|
||||
}
|
||||
await engine.removeTag(slug, tag);
|
||||
console.log(`Removed tag "${tag}" from ${slug}`);
|
||||
}
|
||||
@@ -1,40 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runTimeline(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain timeline <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const entries = await engine.getTimeline(slug);
|
||||
if (entries.length === 0) {
|
||||
console.log(`No timeline entries for ${slug}`);
|
||||
return;
|
||||
}
|
||||
|
||||
for (const e of entries) {
|
||||
const source = e.source ? ` [${e.source}]` : '';
|
||||
console.log(`${e.date}${source}: ${e.summary}`);
|
||||
if (e.detail) {
|
||||
console.log(` ${e.detail.slice(0, 200)}`);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
export async function runTimelineAdd(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
const date = args[1];
|
||||
const text = args.slice(2).join(' ');
|
||||
|
||||
if (!slug || !date || !text) {
|
||||
console.error('Usage: gbrain timeline-add <slug> <date> <text>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
await engine.addTimelineEntry(slug, {
|
||||
date,
|
||||
summary: text,
|
||||
});
|
||||
console.log(`Added timeline entry to ${slug}`);
|
||||
}
|
||||
@@ -1,29 +1,16 @@
|
||||
+import { operations } from '../core/operations.ts';
+
 export function printToolsJson() {
-  const tools = [
-    { name: 'get', description: 'Read a page by slug', parameters: { slug: 'string' } },
-    { name: 'put', description: 'Write/update a page', parameters: { slug: 'string', content: 'string (markdown)' } },
-    { name: 'delete', description: 'Delete a page', parameters: { slug: 'string' } },
-    { name: 'list', description: 'List pages with optional filters', parameters: { type: 'string?', tag: 'string?', limit: 'number?' } },
-    { name: 'search', description: 'Keyword search (tsvector)', parameters: { query: 'string' } },
-    { name: 'query', description: 'Hybrid search (RRF + multi-query expansion)', parameters: { query: 'string' } },
-    { name: 'import', description: 'Import markdown directory', parameters: { dir: 'string', no_embed: 'boolean?' } },
-    { name: 'export', description: 'Export to markdown directory', parameters: { dir: 'string?' } },
-    { name: 'embed', description: 'Generate/refresh embeddings', parameters: { slug: 'string?', all: 'boolean?', stale: 'boolean?' } },
-    { name: 'tag', description: 'Add tag to page', parameters: { slug: 'string', tag: 'string' } },
-    { name: 'untag', description: 'Remove tag from page', parameters: { slug: 'string', tag: 'string' } },
-    { name: 'tags', description: 'List tags for a page', parameters: { slug: 'string' } },
-    { name: 'link', description: 'Create typed link between pages', parameters: { from: 'string', to: 'string', type: 'string?' } },
-    { name: 'unlink', description: 'Remove link between pages', parameters: { from: 'string', to: 'string' } },
-    { name: 'backlinks', description: 'List incoming links to a page', parameters: { slug: 'string' } },
-    { name: 'graph', description: 'Traverse link graph from a page', parameters: { slug: 'string', depth: 'number?' } },
-    { name: 'timeline', description: 'View timeline entries for a page', parameters: { slug: 'string' } },
-    { name: 'timeline-add', description: 'Add timeline entry', parameters: { slug: 'string', date: 'string', text: 'string' } },
-    { name: 'stats', description: 'Brain statistics', parameters: {} },
-    { name: 'health', description: 'Brain health dashboard', parameters: {} },
-    { name: 'history', description: 'Page version history', parameters: { slug: 'string' } },
-    { name: 'revert', description: 'Revert page to version', parameters: { slug: 'string', version_id: 'number' } },
-    { name: 'config', description: 'Get/set brain config', parameters: { action: '"get"|"set"', key: 'string', value: 'string?' } },
-  ];
+  const tools = operations.map(op => ({
+    name: op.name,
+    description: op.description,
+    parameters: Object.fromEntries(
+      Object.entries(op.params).map(([k, v]) => [
+        k,
+        `${v.type}${v.required ? '' : '?'}`,
+      ]),
+    ),
+  }));

   console.log(JSON.stringify(tools, null, 2));
 }
|
||||
|
||||
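The tools-json change above replaces a hand-written tool list with one derived from `operations[]`. A minimal, self-contained sketch of that mapping follows; `OpSketch`, `ParamSketch`, `sampleOps`, and `toToolsJson` are hypothetical names used only for illustration, and the two sample operations are trimmed-down stand-ins rather than the real definitions.

```typescript
// Hypothetical, reduced shapes; the real Operation type carries more fields.
interface ParamSketch {
  type: string;
  required?: boolean;
}

interface OpSketch {
  name: string;
  description: string;
  params: Record<string, ParamSketch>;
}

// Two sample operations standing in for the shared operations[] array.
const sampleOps: OpSketch[] = [
  {
    name: 'get_page',
    description: 'Read a page by slug',
    params: {
      slug: { type: 'string', required: true },
      fuzzy: { type: 'boolean' },
    },
  },
  { name: 'get_stats', description: 'Brain statistics', params: {} },
];

// Same mapping as in the diff: optional params get a trailing "?".
function toToolsJson(ops: OpSketch[]) {
  return ops.map(op => ({
    name: op.name,
    description: op.description,
    parameters: Object.fromEntries(
      Object.entries(op.params).map(([k, v]) => [k, `${v.type}${v.required ? '' : '?'}`]),
    ),
  }));
}

console.log(JSON.stringify(toToolsJson(sampleOps), null, 2));
```

Because the list is computed, adding an operation updates the JSON output without a second edit, which is the point of removing the third contract surface.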
@@ -15,7 +15,7 @@ export async function runUpgrade(args: string[]) {
|
||||
console.log('Upgrading via bun...');
|
||||
try {
|
||||
execSync('bun update gbrain', { stdio: 'inherit', timeout: 120_000 });
|
||||
console.log('Upgrade complete.');
|
||||
verifyUpgrade();
|
||||
} catch {
|
||||
console.error('Upgrade failed. Try running manually: bun update gbrain');
|
||||
}
|
||||
@@ -31,7 +31,7 @@ export async function runUpgrade(args: string[]) {
|
||||
console.log('Upgrading via ClawHub...');
|
||||
try {
|
||||
execSync('clawhub update gbrain', { stdio: 'inherit', timeout: 120_000 });
|
||||
console.log('Upgrade complete.');
|
||||
verifyUpgrade();
|
||||
} catch {
|
||||
console.error('ClawHub upgrade failed. Try: clawhub update gbrain');
|
||||
}
|
||||
@@ -46,6 +46,15 @@ export async function runUpgrade(args: string[]) {
|
||||
}
|
||||
}
|
||||
|
||||
function verifyUpgrade() {
|
||||
try {
|
||||
const output = execSync('gbrain --version', { encoding: 'utf-8', timeout: 10_000 }).trim();
|
||||
console.log(`Upgrade complete. Now running: ${output}`);
|
||||
} catch {
|
||||
console.log('Upgrade complete. Could not verify new version.');
|
||||
}
|
||||
}
|
||||
|
||||
function detectInstallMethod(): 'bun' | 'binary' | 'clawhub' | 'unknown' {
|
||||
const execPath = process.execPath || '';
|
||||
|
||||
|
||||
@@ -1,39 +0,0 @@
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
|
||||
export async function runHistory(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
if (!slug) {
|
||||
console.error('Usage: gbrain history <slug>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const versions = await engine.getVersions(slug);
|
||||
if (versions.length === 0) {
|
||||
console.log(`No version history for ${slug}`);
|
||||
return;
|
||||
}
|
||||
|
||||
console.log(`Version history for ${slug}:`);
|
||||
for (const v of versions) {
|
||||
const date = new Date(v.snapshot_at).toISOString();
|
||||
const preview = v.compiled_truth.slice(0, 80).replace(/\n/g, ' ');
|
||||
console.log(` #${v.id} ${date} ${preview}...`);
|
||||
}
|
||||
}
|
||||
|
||||
export async function runRevert(engine: BrainEngine, args: string[]) {
|
||||
const slug = args[0];
|
||||
const versionId = args[1] ? parseInt(args[1], 10) : NaN;
|
||||
|
||||
if (!slug || isNaN(versionId)) {
|
||||
console.error('Usage: gbrain revert <slug> <version-id>');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Create a snapshot before reverting
|
||||
await engine.createVersion(slug);
|
||||
|
||||
await engine.revertToVersion(slug, versionId);
|
||||
console.log(`Reverted ${slug} to version #${versionId}`);
|
||||
console.log('Note: run gbrain embed <slug> to re-embed the reverted content');
|
||||
}
|
||||
@@ -14,13 +14,29 @@ export interface GBrainConfig {
|
||||
   anthropic_api_key?: string;
 }

+/**
+ * Load config with credential precedence: env vars > config file.
+ * Plugin config is handled by the plugin runtime injecting env vars.
+ */
 export function loadConfig(): GBrainConfig | null {
+  let fileConfig: GBrainConfig | null = null;
   try {
     const raw = readFileSync(CONFIG_PATH, 'utf-8');
-    return JSON.parse(raw) as GBrainConfig;
-  } catch {
-    return null;
-  }
+    fileConfig = JSON.parse(raw) as GBrainConfig;
+  } catch { /* no config file */ }
+
+  // Try env vars
+  const dbUrl = process.env.GBRAIN_DATABASE_URL || process.env.DATABASE_URL;
+
+  if (!fileConfig && !dbUrl) return null;
+
+  // Merge: env vars override config file
+  return {
+    engine: 'postgres',
+    ...fileConfig,
+    ...(dbUrl ? { database_url: dbUrl } : {}),
+    ...(process.env.OPENAI_API_KEY ? { openai_api_key: process.env.OPENAI_API_KEY } : {}),
+  };
 }
|
||||
|
||||
export function saveConfig(config: GBrainConfig): void {
|
||||
|
||||
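The precedence in `loadConfig` (GBRAIN_DATABASE_URL > DATABASE_URL > config file) can be exercised in isolation. The sketch below is a simplified restatement, not the project's code: `SketchConfig`, `Env`, and `resolveConfig` are hypothetical, and the environment is passed in as a plain object so the behaviour can be checked without touching `process.env` or the filesystem.

```typescript
// Hypothetical, reduced config shape for illustration only.
interface SketchConfig {
  engine: string;
  database_url?: string;
  openai_api_key?: string;
}

type Env = Record<string, string | undefined>;

// Merge an optional file config with env values; env wins where it is set.
function resolveConfig(fileConfig: SketchConfig | null, env: Env): SketchConfig | null {
  const dbUrl = env.GBRAIN_DATABASE_URL || env.DATABASE_URL;

  // Nothing on disk and no URL in the environment: no usable config.
  if (!fileConfig && !dbUrl) return null;

  return {
    engine: 'postgres',
    ...fileConfig,
    ...(dbUrl ? { database_url: dbUrl } : {}),
    ...(env.OPENAI_API_KEY ? { openai_api_key: env.OPENAI_API_KEY } : {}),
  };
}

const fromFile: SketchConfig = { engine: 'postgres', database_url: 'postgres://file' };
console.log(resolveConfig(fromFile, { DATABASE_URL: 'postgres://generic' }));
console.log(resolveConfig(null, {}));
```

Spreading the file config before the env-derived keys is what makes the later keys win; reversing the order would silently invert the precedence.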
@@ -62,29 +62,14 @@ export async function disconnect(): Promise<void> {
|
||||
 export async function initSchema(): Promise<void> {
   const conn = getConnection();

-  // Read schema SQL
+  // Read schema SQL and execute as a single statement.
+  // The postgres driver handles multi-statement SQL natively, including
+  // PL/pgSQL functions with $$ delimiter blocks that contain semicolons.
+  // The schema uses IF NOT EXISTS / CREATE OR REPLACE for idempotency.
   const schemaPath = join(dirname(new URL(import.meta.url).pathname), '..', 'schema.sql');
   const schemaSql = readFileSync(schemaPath, 'utf-8');

-  // Split on semicolons and execute each statement
-  // (postgres driver can handle multi-statement, but explicit is safer)
-  const statements = schemaSql
-    .split(/;\s*$/m)
-    .map(s => s.trim())
-    .filter(s => s.length > 0 && !s.startsWith('--'));
-
-  for (const stmt of statements) {
-    try {
-      await conn.unsafe(stmt);
-    } catch (e: unknown) {
-      // Ignore "already exists" errors for idempotency
-      const msg = e instanceof Error ? e.message : String(e);
-      if (msg.includes('already exists') || msg.includes('duplicate key')) {
-        continue;
-      }
-      throw e;
-    }
-  }
+  await conn.unsafe(schemaSql);
 }
|
||||
|
||||
export async function withTransaction<T>(fn: () => Promise<T>): Promise<T> {
|
||||
|
||||
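The new comment in `initSchema` mentions `$$` blocks that contain semicolons. The sketch below shows why the removed per-statement split is fragile for such schemas. The SQL is a made-up snippet, not the project's `schema.sql`, and `naiveSplit` simply reproduces the removed splitting chain so it can be run without a database.

```typescript
// Illustrative schema snippet; the table and function names are made up.
const schemaSql = [
  'CREATE TABLE IF NOT EXISTS pages (id serial PRIMARY KEY);',
  'CREATE OR REPLACE FUNCTION touch() RETURNS trigger AS $$',
  'BEGIN',
  '  NEW.updated_at = now();',
  '  RETURN NEW;',
  'END;',
  '$$ LANGUAGE plpgsql;',
].join('\n');

// The splitting approach removed in this hunk: cut on any ";" at end of line.
function naiveSplit(sql: string): string[] {
  return sql
    .split(/;\s*$/m)
    .map(s => s.trim())
    .filter(s => s.length > 0 && !s.startsWith('--'));
}

// The snippet holds two statements, but the function body is cut at its
// inner semicolons, so some fragments are not valid SQL on their own.
const fragments = naiveSplit(schemaSql);
console.log(fragments.length);
console.log(fragments);
```

Sending the whole file in one call, as the new code does, sidesteps the need to parse dollar-quoted bodies at all.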
@@ -6,7 +6,7 @@ import { chunkText } from './chunkers/recursive.ts';
|
||||
import { embedBatch } from './embedding.ts';
|
||||
import type { ChunkInput } from './types.ts';
|
||||
|
||||
export interface ImportFileResult {
|
||||
export interface ImportResult {
|
||||
slug: string;
|
||||
status: 'imported' | 'skipped' | 'error';
|
||||
chunks: number;
|
||||
@@ -15,25 +15,30 @@ export interface ImportFileResult {
|
||||
|
||||
const MAX_FILE_SIZE = 1_000_000; // 1MB
|
||||
|
||||
export async function importFile(
|
||||
/**
|
||||
* Import content from a string. Core pipeline:
|
||||
* parse -> hash -> embed (external) -> transaction(version + putPage + tags + chunks)
|
||||
*
|
||||
* Used by put_page operation and importFromFile.
|
||||
*/
|
||||
export async function importFromContent(
|
||||
engine: BrainEngine,
|
||||
filePath: string,
|
||||
relativePath: string,
|
||||
opts: { noEmbed: boolean },
|
||||
): Promise<ImportFileResult> {
|
||||
// Skip files > 1MB
|
||||
const stat = statSync(filePath);
|
||||
if (stat.size > MAX_FILE_SIZE) {
|
||||
return { slug: relativePath, status: 'skipped', chunks: 0, error: `File too large (${stat.size} bytes)` };
|
||||
}
|
||||
slug: string,
|
||||
content: string,
|
||||
opts: { noEmbed?: boolean } = {},
|
||||
): Promise<ImportResult> {
|
||||
const parsed = parseMarkdown(content, slug + '.md');
|
||||
|
||||
const content = readFileSync(filePath, 'utf-8');
|
||||
const parsed = parseMarkdown(content, relativePath);
|
||||
const slug = parsed.slug;
|
||||
|
||||
// Check content hash for idempotency
|
||||
// Hash includes ALL fields for idempotency (not just compiled_truth + timeline)
|
||||
const hash = createHash('sha256')
|
||||
.update(parsed.compiled_truth + '\n---\n' + parsed.timeline)
|
||||
.update(JSON.stringify({
|
||||
title: parsed.title,
|
||||
type: parsed.type,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline,
|
||||
frontmatter: parsed.frontmatter,
|
||||
tags: parsed.tags.sort(),
|
||||
}))
|
||||
.digest('hex');
|
||||
|
||||
const existing = await engine.getPage(slug);
|
||||
@@ -41,68 +46,80 @@ export async function importFile(
|
||||
return { slug, status: 'skipped', chunks: 0 };
|
||||
}
|
||||
|
||||
// Upsert page
|
||||
await engine.putPage(slug, {
|
||||
type: parsed.type,
|
||||
title: parsed.title,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline,
|
||||
frontmatter: parsed.frontmatter,
|
||||
});
|
||||
|
||||
// Tag reconciliation: remove stale tags, add current ones
|
||||
const existingTags = await engine.getTags(slug);
|
||||
const newTags = new Set(parsed.tags);
|
||||
for (const oldTag of existingTags) {
|
||||
if (!newTags.has(oldTag)) {
|
||||
await engine.removeTag(slug, oldTag);
|
||||
}
|
||||
}
|
||||
for (const tag of parsed.tags) {
|
||||
await engine.addTag(slug, tag);
|
||||
}
|
||||
|
||||
// Chunk compiled_truth and timeline
|
||||
const chunks: ChunkInput[] = [];
|
||||
|
||||
if (parsed.compiled_truth.trim()) {
|
||||
const ctChunks = chunkText(parsed.compiled_truth);
|
||||
for (const c of ctChunks) {
|
||||
chunks.push({
|
||||
chunk_index: chunks.length,
|
||||
chunk_text: c.text,
|
||||
chunk_source: 'compiled_truth',
|
||||
});
|
||||
for (const c of chunkText(parsed.compiled_truth)) {
|
||||
chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'compiled_truth' });
|
||||
}
|
||||
}
|
||||
if (parsed.timeline?.trim()) {
|
||||
for (const c of chunkText(parsed.timeline)) {
|
||||
chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'timeline' });
|
||||
}
|
||||
}
|
||||
|
||||
if (parsed.timeline.trim()) {
|
||||
const tlChunks = chunkText(parsed.timeline);
|
||||
for (const c of tlChunks) {
|
||||
chunks.push({
|
||||
chunk_index: chunks.length,
|
||||
chunk_text: c.text,
|
||||
chunk_source: 'timeline',
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Embed if requested
|
||||
// Embed BEFORE the transaction (external API call)
|
||||
if (!opts.noEmbed && chunks.length > 0) {
|
||||
try {
|
||||
const embeddings = await embedBatch(chunks.map(c => c.chunk_text));
|
||||
for (let j = 0; j < chunks.length; j++) {
|
||||
chunks[j].embedding = embeddings[j];
|
||||
chunks[j].token_count = Math.ceil(chunks[j].chunk_text.length / 4);
|
||||
for (let i = 0; i < chunks.length; i++) {
|
||||
chunks[i].embedding = embeddings[i];
|
||||
chunks[i].token_count = Math.ceil(chunks[i].chunk_text.length / 4);
|
||||
}
|
||||
} catch {
|
||||
// Embedding failure is non-fatal, chunks still saved without embeddings
|
||||
}
|
||||
} catch { /* non-fatal */ }
|
||||
}
|
||||
|
||||
if (chunks.length > 0) {
|
||||
await engine.upsertChunks(slug, chunks);
|
||||
}
|
||||
// Transaction wraps all DB writes
|
||||
await engine.transaction(async (tx) => {
|
||||
if (existing) await tx.createVersion(slug);
|
||||
|
||||
await tx.putPage(slug, {
|
||||
type: parsed.type,
|
||||
title: parsed.title,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline || '',
|
||||
frontmatter: parsed.frontmatter,
|
||||
content_hash: hash,
|
||||
});
|
||||
|
||||
// Tag reconciliation: remove stale, add current
|
||||
const existingTags = await tx.getTags(slug);
|
||||
const newTags = new Set(parsed.tags);
|
||||
for (const old of existingTags) {
|
||||
if (!newTags.has(old)) await tx.removeTag(slug, old);
|
||||
}
|
||||
for (const tag of parsed.tags) {
|
||||
await tx.addTag(slug, tag);
|
||||
}
|
||||
|
||||
if (chunks.length > 0) {
|
||||
await tx.upsertChunks(slug, chunks);
|
||||
}
|
||||
});
|
||||
|
||||
return { slug, status: 'imported', chunks: chunks.length };
|
||||
}
|
||||
|
||||
/**
|
||||
* Import from a file path. Validates size, reads content, delegates to importFromContent.
|
||||
*/
|
||||
export async function importFromFile(
|
||||
engine: BrainEngine,
|
||||
filePath: string,
|
||||
relativePath: string,
|
||||
opts: { noEmbed?: boolean } = {},
|
||||
): Promise<ImportResult> {
|
||||
const stat = statSync(filePath);
|
||||
if (stat.size > MAX_FILE_SIZE) {
|
||||
return { slug: relativePath, status: 'skipped', chunks: 0, error: `File too large (${stat.size} bytes)` };
|
||||
}
|
||||
|
||||
const content = readFileSync(filePath, 'utf-8');
|
||||
const parsed = parseMarkdown(content, relativePath);
|
||||
return importFromContent(engine, parsed.slug, content, opts);
|
||||
}
|
||||
|
||||
// Backward compat
|
||||
export const importFile = importFromFile;
|
||||
export type ImportFileResult = ImportResult;
|
||||
|
||||
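The idempotency hash in `importFromContent` now covers title, type, frontmatter, and tags in addition to the body fields. A minimal sketch of that idea follows, assuming Node's built-in `node:crypto` module; `ParsedSketch` and `contentHash` are hypothetical names. One deliberate difference from the diff: the sketch copies the tag array before sorting, whereas `parsed.tags.sort()` in the diff sorts in place.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical, reduced shape of a parsed page.
interface ParsedSketch {
  title: string;
  type: string;
  compiled_truth: string;
  timeline: string;
  frontmatter: Record<string, unknown>;
  tags: string[];
}

// Hash every field that should trigger a re-import when it changes.
function contentHash(parsed: ParsedSketch): string {
  return createHash('sha256')
    .update(JSON.stringify({
      title: parsed.title,
      type: parsed.type,
      compiled_truth: parsed.compiled_truth,
      timeline: parsed.timeline,
      frontmatter: parsed.frontmatter,
      // Copy before sorting so the caller's array is left untouched.
      tags: [...parsed.tags].sort(),
    }))
    .digest('hex');
}

const page: ParsedSketch = {
  title: 'Alice',
  type: 'person',
  compiled_truth: 'Alice works at Example Co.',
  timeline: '',
  frontmatter: { role: 'engineer' },
  tags: ['team', 'eng'],
};

console.log(contentHash(page));
```

Sorting tags makes the hash independent of tag order. Note that `JSON.stringify` still depends on key order inside `frontmatter`, so two frontmatter objects with the same keys in a different order would hash differently under this sketch.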
643
src/core/operations.ts
Normal file
@@ -0,0 +1,643 @@
|
||||
/**
|
||||
* Contract-first operation definitions. Single source of truth for CLI, MCP, and tools-json.
|
||||
* Each operation defines its schema, handler, and optional CLI hints.
|
||||
*/
|
||||
|
||||
import type { BrainEngine } from './engine.ts';
|
||||
import type { GBrainConfig } from './config.ts';
|
||||
import { importFromContent } from './import-file.ts';
|
||||
import { hybridSearch } from './search/hybrid.ts';
|
||||
import { expandQuery } from './search/expansion.ts';
|
||||
import * as db from './db.ts';
|
||||
|
||||
// --- Types ---
|
||||
|
||||
export type ErrorCode =
|
||||
| 'page_not_found'
|
||||
| 'invalid_params'
|
||||
| 'embedding_failed'
|
||||
| 'storage_error'
|
||||
| 'bucket_not_found'
|
||||
| 'database_error';
|
||||
|
||||
export class OperationError extends Error {
|
||||
constructor(
|
||||
public code: ErrorCode,
|
||||
message: string,
|
||||
public suggestion?: string,
|
||||
public docs?: string,
|
||||
) {
|
||||
super(message);
|
||||
this.name = 'OperationError';
|
||||
}
|
||||
|
||||
toJSON() {
|
||||
return {
|
||||
error: this.code,
|
||||
message: this.message,
|
||||
suggestion: this.suggestion,
|
||||
docs: this.docs,
|
||||
};
|
||||
}
|
||||
}
|
||||
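Because `OperationError` defines `toJSON()`, a caller can pass the error straight to `JSON.stringify`. The sketch below restates the class with explicit fields and adds a boundary helper. `toErrorPayload` and the `internal_error` fallback code are assumptions made for illustration; neither appears in the diff.

```typescript
type ErrorCode =
  | 'page_not_found'
  | 'invalid_params'
  | 'embedding_failed'
  | 'storage_error'
  | 'bucket_not_found'
  | 'database_error';

// Restated from the diff, with explicit fields instead of parameter properties.
class OperationError extends Error {
  code: ErrorCode;
  suggestion?: string;
  docs?: string;

  constructor(code: ErrorCode, message: string, suggestion?: string, docs?: string) {
    super(message);
    this.name = 'OperationError';
    this.code = code;
    this.suggestion = suggestion;
    this.docs = docs;
  }

  toJSON() {
    return {
      error: this.code,
      message: this.message,
      suggestion: this.suggestion,
      docs: this.docs,
    };
  }
}

// Hypothetical helper: typed errors keep their code, anything else is
// reported under an assumed generic code.
function toErrorPayload(e: unknown): string {
  if (e instanceof OperationError) return JSON.stringify(e);
  const message = e instanceof Error ? e.message : String(e);
  return JSON.stringify({ error: 'internal_error', message });
}

console.log(toErrorPayload(new OperationError('page_not_found', 'Page not found: x', 'Check the slug')));
```

Fields left `undefined`, such as `docs` here, are dropped by `JSON.stringify`, so the payload only carries what the thrower supplied.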
|
||||
export interface ParamDef {
|
||||
type: 'string' | 'number' | 'boolean' | 'object' | 'array';
|
||||
required?: boolean;
|
||||
description?: string;
|
||||
default?: unknown;
|
||||
enum?: string[];
|
||||
items?: ParamDef;
|
||||
}
|
||||
|
||||
export interface Logger {
|
||||
info(msg: string): void;
|
||||
warn(msg: string): void;
|
||||
error(msg: string): void;
|
||||
}
|
||||
|
||||
export interface OperationContext {
|
||||
engine: BrainEngine;
|
||||
config: GBrainConfig;
|
||||
logger: Logger;
|
||||
dryRun: boolean;
|
||||
}
|
||||
|
||||
export interface Operation {
|
||||
name: string;
|
||||
description: string;
|
||||
params: Record<string, ParamDef>;
|
||||
handler: (ctx: OperationContext, params: Record<string, unknown>) => Promise<unknown>;
|
||||
mutating?: boolean;
|
||||
cliHints?: {
|
||||
name?: string;
|
||||
positional?: string[];
|
||||
stdin?: string;
|
||||
hidden?: boolean;
|
||||
};
|
||||
}
|
||||
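The `cliHints.positional` field suggests how a CLI layer could turn `argv` into handler params. The diff shown here does not include that binding code, so the sketch below is only one possible reading: `bindPositional`, `CliOpSketch`, and the number coercion rule are all assumptions.

```typescript
// Hypothetical, reduced shapes for illustration.
interface ParamDefSketch {
  type: 'string' | 'number' | 'boolean';
  required?: boolean;
}

interface CliOpSketch {
  params: Record<string, ParamDefSketch>;
  positional?: string[];
}

// Map positional argv entries onto named params, coercing numbers,
// then check that every required param received a value.
function bindPositional(op: CliOpSketch, argv: string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};

  (op.positional ?? []).forEach((name, i) => {
    const raw = argv[i];
    if (raw === undefined) return;
    const def = op.params[name];
    out[name] = def && def.type === 'number' ? Number(raw) : raw;
  });

  for (const [name, def] of Object.entries(op.params)) {
    if (def.required && out[name] === undefined) {
      throw new Error(`Missing required parameter: ${name}`);
    }
  }
  return out;
}

// Shaped like the revert_version operation's params and hints.
const revertSketch: CliOpSketch = {
  params: {
    slug: { type: 'string', required: true },
    version_id: { type: 'number', required: true },
  },
  positional: ['slug', 'version_id'],
};

console.log(bindPositional(revertSketch, ['people/alice', '3']));
```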
|
||||
// --- Page CRUD ---
|
||||
|
||||
const get_page: Operation = {
|
||||
name: 'get_page',
|
||||
description: 'Read a page by slug (supports optional fuzzy matching)',
|
||||
params: {
|
||||
slug: { type: 'string', required: true, description: 'Page slug' },
|
||||
fuzzy: { type: 'boolean', description: 'Enable fuzzy slug resolution (default: false)' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
const slug = p.slug as string;
|
||||
const fuzzy = (p.fuzzy as boolean) || false;
|
||||
|
||||
let page = await ctx.engine.getPage(slug);
|
||||
let resolved_slug: string | undefined;
|
||||
|
||||
if (!page && fuzzy) {
|
||||
const candidates = await ctx.engine.resolveSlugs(slug);
|
||||
if (candidates.length === 1) {
|
||||
page = await ctx.engine.getPage(candidates[0]);
|
||||
resolved_slug = candidates[0];
|
||||
} else if (candidates.length > 1) {
|
||||
return { error: 'ambiguous_slug', candidates };
|
||||
}
|
||||
}
|
||||
|
||||
if (!page) {
|
||||
throw new OperationError('page_not_found', `Page not found: ${slug}`, 'Check the slug or use fuzzy: true');
|
||||
}
|
||||
|
||||
const tags = await ctx.engine.getTags(page.slug);
|
||||
return { ...page, tags, ...(resolved_slug ? { resolved_slug } : {}) };
|
||||
},
|
||||
cliHints: { name: 'get', positional: ['slug'] },
|
||||
};
|
||||
|
||||
const put_page: Operation = {
|
||||
name: 'put_page',
|
||||
description: 'Write/update a page (markdown with frontmatter). Chunks, embeds, and reconciles tags.',
|
||||
params: {
|
||||
slug: { type: 'string', required: true, description: 'Page slug' },
|
||||
content: { type: 'string', required: true, description: 'Full markdown content with YAML frontmatter' },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'put_page', slug: p.slug };
|
||||
const result = await importFromContent(ctx.engine, p.slug as string, p.content as string);
|
||||
return { slug: result.slug, status: result.status === 'imported' ? 'created_or_updated' : result.status, chunks: result.chunks };
|
||||
},
|
||||
cliHints: { name: 'put', positional: ['slug'], stdin: 'content' },
|
||||
};
|
||||
|
||||
const delete_page: Operation = {
|
||||
name: 'delete_page',
|
||||
description: 'Delete a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'delete_page', slug: p.slug };
|
||||
await ctx.engine.deletePage(p.slug as string);
|
||||
return { status: 'deleted' };
|
||||
},
|
||||
cliHints: { name: 'delete', positional: ['slug'] },
|
||||
};
|
||||
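Each mutating handler above starts with the same `ctx.dryRun` guard. The sketch below isolates that pattern with a synchronous in-memory stand-in so the effect can be observed directly. `EngineSketch`, `CtxSketch`, and `deletePageSketch` are hypothetical; the handlers in the diff are async and receive a full `OperationContext`.

```typescript
// Hypothetical synchronous stand-ins for the engine and context.
interface EngineSketch {
  deletePage(slug: string): void;
}

interface CtxSketch {
  engine: EngineSketch;
  dryRun: boolean;
}

function deletePageSketch(ctx: CtxSketch, p: { slug: string }) {
  // Report the intended action and return before touching the engine.
  if (ctx.dryRun) return { dry_run: true, action: 'delete_page', slug: p.slug };
  ctx.engine.deletePage(p.slug);
  return { status: 'deleted' };
}

// In-memory engine used only to observe whether a write happened.
const pages = new Set(['people/alice']);
const memoryEngine: EngineSketch = {
  deletePage: (slug) => { pages.delete(slug); },
};

const preview = deletePageSketch({ engine: memoryEngine, dryRun: true }, { slug: 'people/alice' });
const sizeAfterPreview = pages.size; // the dry run left the page in place

const applied = deletePageSketch({ engine: memoryEngine, dryRun: false }, { slug: 'people/alice' });

console.log(preview, sizeAfterPreview, applied, pages.size);
```

Placing the guard on the first line of the handler keeps the dry-run path free of side effects by construction, at the cost of not validating that the real call would have succeeded.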
|
||||
const list_pages: Operation = {
|
||||
name: 'list_pages',
|
||||
description: 'List pages with optional filters',
|
||||
params: {
|
||||
type: { type: 'string', description: 'Filter by page type' },
|
||||
tag: { type: 'string', description: 'Filter by tag' },
|
||||
limit: { type: 'number', description: 'Max results (default 50)' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
const pages = await ctx.engine.listPages({
|
||||
type: p.type as any,
|
||||
tag: p.tag as string,
|
||||
limit: (p.limit as number) || 50,
|
||||
});
|
||||
return pages.map(pg => ({
|
||||
slug: pg.slug,
|
||||
type: pg.type,
|
||||
title: pg.title,
|
||||
updated_at: pg.updated_at,
|
||||
}));
|
||||
},
|
||||
cliHints: { name: 'list' },
|
||||
};
|
||||
|
||||
// --- Search ---
|
||||
|
||||
const search: Operation = {
|
||||
name: 'search',
|
||||
description: 'Keyword search using full-text search',
|
||||
params: {
|
||||
query: { type: 'string', required: true },
|
||||
limit: { type: 'number', description: 'Max results (default 20)' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.searchKeyword(p.query as string, { limit: (p.limit as number) || 20 });
|
||||
},
|
||||
cliHints: { name: 'search', positional: ['query'] },
|
||||
};
|
||||
|
||||
const query: Operation = {
|
||||
name: 'query',
|
||||
description: 'Hybrid search with vector + keyword + multi-query expansion',
|
||||
params: {
|
||||
query: { type: 'string', required: true },
|
||||
limit: { type: 'number', description: 'Max results (default 20)' },
|
||||
expand: { type: 'boolean', description: 'Enable multi-query expansion (default: true)' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
const expand = p.expand !== false;
|
||||
return hybridSearch(ctx.engine, p.query as string, {
|
||||
limit: (p.limit as number) || 20,
|
||||
expansion: expand,
|
||||
expandFn: expand ? expandQuery : undefined,
|
||||
});
|
||||
},
|
||||
cliHints: { name: 'query', positional: ['query'] },
|
||||
};
|
||||
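The `query` handler uses two different defaulting idioms: `p.expand !== false` (on unless explicitly disabled) and `(p.limit as number) || 20` (falls back on any falsy value). The sketch below pulls both into a hypothetical `resolveQueryOptions` helper to make the edge cases visible.

```typescript
// Hypothetical helper mirroring the defaulting used in the query handler.
function resolveQueryOptions(p: Record<string, unknown>) {
  // Anything other than an explicit false leaves expansion enabled.
  const expand = p.expand !== false;
  // "||" treats undefined and 0 alike, so a limit of 0 becomes 20.
  const limit = (p.limit as number) || 20;
  return { expand, limit };
}

console.log(resolveQueryOptions({}));
console.log(resolveQueryOptions({ expand: false, limit: 5 }));
console.log(resolveQueryOptions({ limit: 0 }));
```

If a caller ever needs to distinguish "no limit given" from "limit 0", `??` would be the operator to reach for; with `||`, as in the diff, both collapse to the default.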
|
||||
// --- Tags ---
|
||||
|
||||
const add_tag: Operation = {
|
||||
name: 'add_tag',
|
||||
description: 'Add tag to page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
tag: { type: 'string', required: true },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'add_tag', slug: p.slug, tag: p.tag };
|
||||
await ctx.engine.addTag(p.slug as string, p.tag as string);
|
||||
return { status: 'ok' };
|
||||
},
|
||||
cliHints: { name: 'tag', positional: ['slug', 'tag'] },
|
||||
};
|
||||
|
||||
const remove_tag: Operation = {
|
||||
name: 'remove_tag',
|
||||
description: 'Remove tag from page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
tag: { type: 'string', required: true },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'remove_tag', slug: p.slug, tag: p.tag };
|
||||
await ctx.engine.removeTag(p.slug as string, p.tag as string);
|
||||
return { status: 'ok' };
|
||||
},
|
||||
cliHints: { name: 'untag', positional: ['slug', 'tag'] },
|
||||
};
|
||||
|
||||
const get_tags: Operation = {
|
||||
name: 'get_tags',
|
||||
description: 'List tags for a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getTags(p.slug as string);
|
||||
},
|
||||
cliHints: { name: 'tags', positional: ['slug'] },
|
||||
};
|
||||
|
||||
// --- Links ---
|
||||
|
||||
const add_link: Operation = {
|
||||
name: 'add_link',
|
||||
description: 'Create link between pages',
|
||||
params: {
|
||||
from: { type: 'string', required: true },
|
||||
to: { type: 'string', required: true },
|
||||
link_type: { type: 'string', description: 'Link type (e.g., invested_in, works_at)' },
|
||||
context: { type: 'string', description: 'Context for the link' },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'add_link', from: p.from, to: p.to };
|
||||
await ctx.engine.addLink(
|
||||
p.from as string, p.to as string,
|
||||
(p.context as string) || '', (p.link_type as string) || '',
|
||||
);
|
||||
return { status: 'ok' };
|
||||
},
|
||||
cliHints: { name: 'link', positional: ['from', 'to'] },
|
||||
};
|
||||
|
||||
const remove_link: Operation = {
|
||||
name: 'remove_link',
|
||||
description: 'Remove link between pages',
|
||||
params: {
|
||||
from: { type: 'string', required: true },
|
||||
to: { type: 'string', required: true },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'remove_link', from: p.from, to: p.to };
|
||||
await ctx.engine.removeLink(p.from as string, p.to as string);
|
||||
return { status: 'ok' };
|
||||
},
|
||||
cliHints: { name: 'unlink', positional: ['from', 'to'] },
|
||||
};
|
||||
|
||||
const get_links: Operation = {
|
||||
name: 'get_links',
|
||||
description: 'List outgoing links from a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getLinks(p.slug as string);
|
||||
},
|
||||
};
|
||||
|
||||
const get_backlinks: Operation = {
|
||||
name: 'get_backlinks',
|
||||
description: 'List incoming links to a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getBacklinks(p.slug as string);
|
||||
},
|
||||
cliHints: { name: 'backlinks', positional: ['slug'] },
|
||||
};
|
||||
|
||||
const traverse_graph: Operation = {
|
||||
name: 'traverse_graph',
|
||||
description: 'Traverse link graph from a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
depth: { type: 'number', description: 'Max traversal depth (default 5)' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.traverseGraph(p.slug as string, (p.depth as number) || 5);
|
||||
},
|
||||
cliHints: { name: 'graph', positional: ['slug'] },
|
||||
};
|
||||
|
||||
// --- Timeline ---
|
||||
|
||||
const add_timeline_entry: Operation = {
|
||||
name: 'add_timeline_entry',
|
||||
description: 'Add timeline entry to a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
date: { type: 'string', required: true },
|
||||
summary: { type: 'string', required: true },
|
||||
detail: { type: 'string' },
|
||||
source: { type: 'string' },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'add_timeline_entry', slug: p.slug };
|
||||
await ctx.engine.addTimelineEntry(p.slug as string, {
|
||||
date: p.date as string,
|
||||
source: (p.source as string) || '',
|
||||
summary: p.summary as string,
|
||||
detail: (p.detail as string) || '',
|
||||
});
|
||||
return { status: 'ok' };
|
||||
},
|
||||
cliHints: { name: 'timeline-add', positional: ['slug', 'date', 'summary'] },
|
||||
};
|
||||
|
||||
const get_timeline: Operation = {
|
||||
name: 'get_timeline',
|
||||
description: 'Get timeline entries for a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getTimeline(p.slug as string);
|
||||
},
|
||||
cliHints: { name: 'timeline', positional: ['slug'] },
|
||||
};
|
||||
|
||||
// --- Admin ---
|
||||
|
||||
const get_stats: Operation = {
|
||||
name: 'get_stats',
|
||||
description: 'Brain statistics (page count, chunk count, etc.)',
|
||||
params: {},
|
||||
handler: async (ctx) => {
|
||||
return ctx.engine.getStats();
|
||||
},
|
||||
cliHints: { name: 'stats' },
|
||||
};
|
||||
|
||||
const get_health: Operation = {
|
||||
name: 'get_health',
|
||||
description: 'Brain health dashboard (embed coverage, stale pages, orphans)',
|
||||
params: {},
|
||||
handler: async (ctx) => {
|
||||
return ctx.engine.getHealth();
|
||||
},
|
||||
cliHints: { name: 'health' },
|
||||
};
|
||||
|
||||
const get_versions: Operation = {
|
||||
name: 'get_versions',
|
||||
description: 'Page version history',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getVersions(p.slug as string);
|
||||
},
|
||||
cliHints: { name: 'history', positional: ['slug'] },
|
||||
};
|
||||
|
||||
const revert_version: Operation = {
|
||||
name: 'revert_version',
|
||||
description: 'Revert page to a previous version',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
version_id: { type: 'number', required: true },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'revert_version', slug: p.slug, version_id: p.version_id };
|
||||
await ctx.engine.createVersion(p.slug as string);
|
||||
await ctx.engine.revertToVersion(p.slug as string, p.version_id as number);
|
||||
return { status: 'reverted' };
|
||||
},
|
||||
cliHints: { name: 'revert', positional: ['slug', 'version_id'] },
|
||||
};
|
||||
|
||||
// --- Sync ---
|
||||
|
||||
const sync_brain: Operation = {
|
||||
name: 'sync_brain',
|
||||
description: 'Sync git repo to brain (incremental)',
|
||||
params: {
|
||||
repo: { type: 'string', description: 'Path to git repo (optional if configured)' },
|
||||
dry_run: { type: 'boolean', description: 'Preview changes without applying' },
|
||||
full: { type: 'boolean', description: 'Full re-sync (ignore checkpoint)' },
|
||||
no_pull: { type: 'boolean', description: 'Skip git pull' },
|
||||
no_embed: { type: 'boolean', description: 'Skip embedding generation' },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
const { performSync } = await import('../commands/sync.ts');
|
||||
return performSync(ctx.engine, {
|
||||
repoPath: p.repo as string | undefined,
|
||||
dryRun: ctx.dryRun || (p.dry_run as boolean) || false,
|
||||
noEmbed: (p.no_embed as boolean) || false,
|
||||
noPull: (p.no_pull as boolean) || false,
|
||||
full: (p.full as boolean) || false,
|
||||
});
|
||||
},
|
||||
cliHints: { name: 'sync' },
|
||||
};
|
||||
|
||||
// --- Raw Data ---
|
||||
|
||||
const put_raw_data: Operation = {
|
||||
name: 'put_raw_data',
|
||||
description: 'Store raw API response data for a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
source: { type: 'string', required: true, description: 'Data source (e.g., crustdata, happenstance)' },
|
||||
data: { type: 'object', required: true, description: 'Raw data object' },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'put_raw_data', slug: p.slug, source: p.source };
|
||||
await ctx.engine.putRawData(p.slug as string, p.source as string, p.data as object);
|
||||
return { status: 'ok' };
|
||||
},
|
||||
};
|
||||
|
||||
const get_raw_data: Operation = {
|
||||
name: 'get_raw_data',
|
||||
description: 'Retrieve raw data for a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
source: { type: 'string', description: 'Filter by source' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getRawData(p.slug as string, p.source as string | undefined);
|
||||
},
|
||||
};
|
||||
|
||||
// --- Resolution & Chunks ---
|
||||
|
||||
const resolve_slugs: Operation = {
|
||||
name: 'resolve_slugs',
|
||||
description: 'Fuzzy-resolve a partial slug to matching page slugs',
|
||||
params: {
|
||||
partial: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.resolveSlugs(p.partial as string);
|
||||
},
|
||||
};
|
||||
|
||||
const get_chunks: Operation = {
|
||||
name: 'get_chunks',
|
||||
description: 'Get content chunks for a page',
|
||||
params: {
|
||||
slug: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getChunks(p.slug as string);
|
||||
},
|
||||
};
|
||||
|
||||
// --- Ingest Log ---
|
||||
|
||||
const log_ingest: Operation = {
|
||||
name: 'log_ingest',
|
||||
description: 'Log an ingestion event',
|
||||
params: {
|
||||
source_type: { type: 'string', required: true },
|
||||
source_ref: { type: 'string', required: true },
|
||||
pages_updated: { type: 'array', required: true, items: { type: 'string' } },
|
||||
summary: { type: 'string', required: true },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'log_ingest' };
|
||||
await ctx.engine.logIngest({
|
||||
source_type: p.source_type as string,
|
||||
source_ref: p.source_ref as string,
|
||||
pages_updated: p.pages_updated as string[],
|
||||
summary: p.summary as string,
|
||||
});
|
||||
return { status: 'ok' };
|
||||
},
|
||||
};
|
||||
|
||||
const get_ingest_log: Operation = {
|
||||
name: 'get_ingest_log',
|
||||
description: 'Get recent ingestion log entries',
|
||||
params: {
|
||||
limit: { type: 'number', description: 'Max entries (default 20)' },
|
||||
},
|
||||
handler: async (ctx, p) => {
|
||||
return ctx.engine.getIngestLog({ limit: (p.limit as number) || 20 });
|
||||
},
|
||||
};
|
||||
|
||||
// --- File Operations ---
|
||||
|
||||
const file_list: Operation = {
|
||||
name: 'file_list',
|
||||
description: 'List stored files',
|
||||
params: {
|
||||
slug: { type: 'string', description: 'Filter by page slug' },
|
||||
},
|
||||
handler: async (_ctx, p) => {
|
||||
const sql = db.getConnection();
|
||||
const slug = p.slug as string | undefined;
|
||||
if (slug) {
|
||||
return sql`SELECT id, page_slug, filename, storage_path, mime_type, size_bytes, content_hash, created_at FROM files WHERE page_slug = ${slug} ORDER BY filename`;
|
||||
}
|
||||
return sql`SELECT id, page_slug, filename, storage_path, mime_type, size_bytes, content_hash, created_at FROM files ORDER BY page_slug, filename LIMIT 100`;
|
||||
},
|
||||
};
|
||||
|
||||
const file_upload: Operation = {
|
||||
name: 'file_upload',
|
||||
description: 'Upload a file to storage',
|
||||
params: {
|
||||
path: { type: 'string', required: true, description: 'Local file path' },
|
||||
page_slug: { type: 'string', description: 'Associate with page' },
|
||||
},
|
||||
mutating: true,
|
||||
handler: async (ctx, p) => {
|
||||
if (ctx.dryRun) return { dry_run: true, action: 'file_upload', path: p.path };
|
||||
|
||||
const { readFileSync, statSync } = await import('fs');
|
||||
const { basename, extname } = await import('path');
|
||||
const { createHash } = await import('crypto');
|
||||
|
||||
const filePath = p.path as string;
|
||||
const pageSlug = (p.page_slug as string) || null;
|
||||
const stat = statSync(filePath);
|
||||
const content = readFileSync(filePath);
|
||||
const hash = createHash('sha256').update(content).digest('hex');
|
||||
const filename = basename(filePath);
|
||||
const storagePath = pageSlug ? `${pageSlug}/${filename}` : `unsorted/${hash.slice(0, 8)}-${filename}`;
|
||||
|
||||
const MIME_TYPES: Record<string, string> = {
|
||||
'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg', '.png': 'image/png',
|
||||
'.gif': 'image/gif', '.webp': 'image/webp', '.svg': 'image/svg+xml',
|
||||
'.pdf': 'application/pdf', '.mp4': 'video/mp4', '.mp3': 'audio/mpeg',
|
||||
};
|
||||
const mimeType = MIME_TYPES[extname(filePath).toLowerCase()] || null;
|
||||
|
||||
const sql = db.getConnection();
|
||||
const existing = await sql`SELECT id FROM files WHERE content_hash = ${hash} AND storage_path = ${storagePath}`;
|
||||
if (existing.length > 0) {
|
||||
return { status: 'already_exists', storage_path: storagePath };
|
||||
}
|
||||
|
||||
await sql`
|
||||
INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata)
|
||||
VALUES (${pageSlug}, ${filename}, ${storagePath}, ${mimeType}, ${stat.size}, ${hash}, ${'{}'}::jsonb)
|
||||
ON CONFLICT (storage_path) DO UPDATE SET
|
||||
content_hash = EXCLUDED.content_hash,
|
||||
size_bytes = EXCLUDED.size_bytes,
|
||||
mime_type = EXCLUDED.mime_type
|
||||
`;
|
||||
|
||||
return { status: 'uploaded', storage_path: storagePath, size_bytes: stat.size };
|
||||
},
|
||||
};
|
||||
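The hashing and storage-path rules in `file_upload` can be tried out without a database. The following is a minimal sketch, not code from this PR: `deriveStoragePath` is a hypothetical helper name, and it only mirrors the SHA-256 digest and the `unsorted/<hash prefix>-<filename>` fallback visible in the handler above.

```typescript
import { createHash } from 'node:crypto';
import { basename } from 'node:path';

// Hypothetical helper (not part of the PR): reproduces how file_upload
// derives the content hash and the storage path for a local file.
function deriveStoragePath(
  filePath: string,
  content: string | Uint8Array,
  pageSlug: string | null,
): { hash: string; storagePath: string } {
  // Full SHA-256 hex digest of the file content, stored as content_hash.
  const hash = createHash('sha256').update(content).digest('hex');
  const filename = basename(filePath);
  // Files attached to a page live under the page slug; everything else
  // goes to "unsorted/" with an 8-character hash prefix to avoid name clashes.
  const storagePath = pageSlug
    ? `${pageSlug}/${filename}`
    : `unsorted/${hash.slice(0, 8)}-${filename}`;
  return { hash, storagePath };
}
```

With this rule, two different files with the same name attached to the same page map to the same storage path, which appears to be the case the `ON CONFLICT (storage_path) DO UPDATE` clause is there to handle.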
|
||||
const file_url: Operation = {
|
||||
name: 'file_url',
|
||||
description: 'Get a URL for a stored file',
|
||||
params: {
|
||||
storage_path: { type: 'string', required: true },
|
||||
},
|
||||
handler: async (_ctx, p) => {
|
||||
const sql = db.getConnection();
|
||||
const rows = await sql`SELECT storage_path, mime_type, size_bytes FROM files WHERE storage_path = ${p.storage_path as string}`;
|
||||
if (rows.length === 0) {
|
||||
throw new OperationError('storage_error', `File not found: ${p.storage_path}`);
|
||||
}
|
||||
// TODO: generate signed URL from Supabase Storage
|
||||
return { storage_path: rows[0].storage_path, url: `gbrain:files/${rows[0].storage_path}` };
|
||||
},
|
||||
};
|
||||
|
||||
// --- Exports ---
|
||||
|
||||
export const operations: Operation[] = [
|
||||
// Page CRUD
|
||||
get_page, put_page, delete_page, list_pages,
|
||||
// Search
|
||||
search, query,
|
||||
// Tags
|
||||
add_tag, remove_tag, get_tags,
|
||||
// Links
|
||||
add_link, remove_link, get_links, get_backlinks, traverse_graph,
|
||||
// Timeline
|
||||
add_timeline_entry, get_timeline,
|
||||
// Admin
|
||||
get_stats, get_health, get_versions, revert_version,
|
||||
// Sync
|
||||
sync_brain,
|
||||
// Raw data
|
||||
put_raw_data, get_raw_data,
|
||||
// Resolution & chunks
|
||||
resolve_slugs, get_chunks,
|
||||
// Ingest log
|
||||
log_ingest, get_ingest_log,
|
||||
// Files
|
||||
file_list, file_upload, file_url,
|
||||
];
|
||||
|
||||
export const operationsByName = Object.fromEntries(
|
||||
operations.map(op => [op.name, op]),
|
||||
) as Record<string, Operation>;
|
||||
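To illustrate how a caller might use a name-keyed registry like `operationsByName`, here is a small self-contained sketch. The `MiniOperation` and `MiniContext` types, the `put_note` operation, and the `dispatch` function are all hypothetical stand-ins; the real `Operation` and `OperationContext` types are defined earlier in `operations.ts` and carry more fields than shown here.

```typescript
// Simplified stand-ins for the PR's Operation / OperationContext (assumptions).
interface MiniContext {
  dryRun: boolean;
}

interface MiniOperation {
  name: string;
  mutating?: boolean;
  handler: (ctx: MiniContext, params: Record<string, unknown>) => Promise<unknown>;
}

// In-memory store standing in for the engine.
const store = new Map<string, string>();

const put_note: MiniOperation = {
  name: 'put_note',
  mutating: true,
  handler: async (ctx, p) => {
    // Same convention as the mutating handlers above: return a preview
    // and skip the write when dryRun is set.
    if (ctx.dryRun) return { dry_run: true, action: 'put_note', slug: p.slug };
    store.set(p.slug as string, p.text as string);
    return { status: 'ok' };
  },
};

const miniOperations: MiniOperation[] = [put_note];

// Same construction as operationsByName: a name -> operation lookup table.
const miniByName = Object.fromEntries(
  miniOperations.map(op => [op.name, op]),
) as Record<string, MiniOperation>;

// Look up an operation by name and run its handler.
async function dispatch(
  name: string,
  params: Record<string, unknown>,
  dryRun = false,
): Promise<unknown> {
  const op = miniByName[name];
  if (!op) throw new Error(`Unknown tool: ${name}`);
  return op.handler({ dryRun }, params);
}
```

Because both the CLI and the MCP server can go through one lookup like this, adding an operation to the array is enough to expose it on every surface that reads the registry.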
@@ -46,7 +46,7 @@ export class PostgresEngine implements BrainEngine {
|
||||
async putPage(slug: string, page: PageInput): Promise<Page> {
|
||||
validateSlug(slug);
|
||||
const sql = db.getConnection();
|
||||
- const hash = contentHash(page.compiled_truth, page.timeline || '');
+ const hash = page.content_hash || contentHash(page.compiled_truth, page.timeline || '');
|
||||
const frontmatter = page.frontmatter || {};
|
||||
|
||||
const rows = await sql`
|
||||
@@ -285,11 +285,11 @@ export class PostgresEngine implements BrainEngine {
|
||||
)
|
||||
SELECT DISTINCT g.slug, g.title, g.type, g.depth,
|
||||
coalesce(
|
||||
- (SELECT json_agg(json_build_object('to_slug', p3.slug, 'link_type', l2.link_type))
+ (SELECT jsonb_agg(jsonb_build_object('to_slug', p3.slug, 'link_type', l2.link_type))
|
||||
FROM links l2
|
||||
JOIN pages p3 ON p3.id = l2.to_page_id
|
||||
WHERE l2.from_page_id = g.id),
|
||||
- '[]'::json
+ '[]'::jsonb
|
||||
) as links
|
||||
FROM graph g
|
||||
ORDER BY g.depth, g.slug
|
||||
|
||||
@@ -20,6 +20,7 @@ export interface PageInput {
|
||||
compiled_truth: string;
|
||||
timeline?: string;
|
||||
frontmatter?: Record<string, unknown>;
|
||||
content_hash?: string;
|
||||
}
|
||||
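The optional `content_hash` field lets a caller pass in a precomputed hash, which `putPage` now prefers over hashing `compiled_truth` and `timeline` itself. The sketch below shows that fallback in isolation. `PageInputSketch`, `contentHashSketch`, and `resolveHash` are hypothetical names, and the SHA-256 construction is an assumption, since the real `contentHash` implementation is not part of this diff.

```typescript
import { createHash } from 'node:crypto';

// Trimmed copy of the PageInput fields relevant to hashing.
interface PageInputSketch {
  compiled_truth: string;
  timeline?: string;
  content_hash?: string;
}

// Assumption: a stand-in for the engine's contentHash(); the real
// function may combine its inputs differently.
function contentHashSketch(compiledTruth: string, timeline: string): string {
  return createHash('sha256')
    .update(compiledTruth + '\n---\n' + timeline)
    .digest('hex');
}

// Prefer the caller-supplied hash; otherwise compute one from the content.
function resolveHash(page: PageInputSketch): string {
  return page.content_hash || contentHashSketch(page.compiled_truth, page.timeline || '');
}
```

Since the fallback uses `||`, an empty-string `content_hash` is treated the same as a missing one and the hash is recomputed.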
|
||||
export interface PageFilters {
|
||||
|
||||
@@ -1,12 +1,9 @@
|
||||
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
|
||||
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
|
||||
import type { BrainEngine } from '../core/engine.ts';
|
||||
import { parseMarkdown, serializeMarkdown } from '../core/markdown.ts';
|
||||
import { hybridSearch } from '../core/search/hybrid.ts';
|
||||
import { expandQuery } from '../core/search/expansion.ts';
|
||||
import { chunkText } from '../core/chunkers/recursive.ts';
|
||||
import { embedBatch } from '../core/embedding.ts';
|
||||
import type { ChunkInput } from '../core/types.ts';
|
||||
import { operations, OperationError } from '../core/operations.ts';
|
||||
import type { OperationContext } from '../core/operations.ts';
|
||||
import { loadConfig } from '../core/config.ts';
|
||||
import { VERSION } from '../version.ts';
|
||||
|
||||
export async function startMcpServer(engine: BrainEngine) {
|
||||
@@ -15,16 +12,54 @@ export async function startMcpServer(engine: BrainEngine) {
|
||||
{ capabilities: { tools: {} } },
|
||||
);
|
||||
|
||||
// Generate tool definitions from operations
|
||||
server.setRequestHandler('tools/list' as any, async () => ({
|
||||
- tools: getToolDefinitions(),
|
||||
tools: operations.map(op => ({
|
||||
name: op.name,
|
||||
description: op.description,
|
||||
inputSchema: {
|
||||
type: 'object' as const,
|
||||
properties: Object.fromEntries(
|
||||
Object.entries(op.params).map(([k, v]) => [k, {
|
||||
type: v.type === 'array' ? 'array' : v.type,
|
||||
...(v.description ? { description: v.description } : {}),
|
||||
...(v.enum ? { enum: v.enum } : {}),
|
||||
...(v.items ? { items: { type: v.items.type } } : {}),
|
||||
}]),
|
||||
),
|
||||
required: Object.entries(op.params)
|
||||
.filter(([, v]) => v.required)
|
||||
.map(([k]) => k),
|
||||
},
|
||||
})),
|
||||
}));
|
||||
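The `tools/list` handler above turns each operation's `params` map into a JSON-Schema-style `inputSchema`. As a rough standalone illustration, the sketch below extracts that mapping into a function. `ParamDef` and `toInputSchema` are hypothetical names, and the `ParamDef` shape is inferred only from the fields the handler reads (`type`, `required`, `description`, `enum`, `items`).

```typescript
// Assumed shape of one parameter definition, inferred from the handler above.
interface ParamDef {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  required?: boolean;
  description?: string;
  enum?: string[];
  items?: { type: string };
}

// Build an inputSchema object from an operation's params map.
function toInputSchema(params: Record<string, ParamDef>) {
  return {
    type: 'object' as const,
    properties: Object.fromEntries(
      Object.entries(params).map(([key, def]) => [key, {
        type: def.type,
        // Optional fields are only emitted when present on the definition.
        ...(def.description ? { description: def.description } : {}),
        ...(def.enum ? { enum: def.enum } : {}),
        ...(def.items ? { items: { type: def.items.type } } : {}),
      }]),
    ),
    // `required` lists only the keys flagged required: true.
    required: Object.entries(params)
      .filter(([, def]) => def.required)
      .map(([key]) => key),
  };
}

// Parameters copied from the log_ingest operation in this diff.
const logIngestSchema = toInputSchema({
  source_type: { type: 'string', required: true },
  source_ref: { type: 'string', required: true },
  pages_updated: { type: 'array', required: true, items: { type: 'string' } },
  summary: { type: 'string', required: true },
});
```

The sketch uses `def.type` directly; the ternary `v.type === 'array' ? 'array' : v.type` in the handler evaluates to `v.type` in both branches, so the output should be the same.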
|
||||
// Dispatch tool calls to operation handlers
|
||||
server.setRequestHandler('tools/call' as any, async (request: any) => {
|
||||
const { name, arguments: params } = request.params;
|
||||
const op = operations.find(o => o.name === name);
|
||||
if (!op) {
|
||||
return { content: [{ type: 'text', text: `Error: Unknown tool: ${name}` }], isError: true };
|
||||
}
|
||||
|
||||
const ctx: OperationContext = {
|
||||
engine,
|
||||
config: loadConfig() || { engine: 'postgres' },
|
||||
logger: {
|
||||
info: (msg: string) => process.stderr.write(`[info] ${msg}\n`),
|
||||
warn: (msg: string) => process.stderr.write(`[warn] ${msg}\n`),
|
||||
error: (msg: string) => process.stderr.write(`[error] ${msg}\n`),
|
||||
},
|
||||
dryRun: !!(params?.dry_run),
|
||||
};
|
||||
|
||||
try {
|
||||
- const result = await handleToolCall(engine, name, params || {});
+ const result = await op.handler(ctx, params || {});
|
||||
return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
|
||||
} catch (e: unknown) {
|
||||
if (e instanceof OperationError) {
|
||||
return { content: [{ type: 'text', text: JSON.stringify(e.toJSON(), null, 2) }], isError: true };
|
||||
}
|
||||
const msg = e instanceof Error ? e.message : String(e);
|
||||
return { content: [{ type: 'text', text: `Error: ${msg}` }], isError: true };
|
||||
}
|
||||
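The `try`/`catch` above distinguishes typed `OperationError`s from unexpected exceptions when building the tool result. A minimal sketch of that mapping follows. `OperationErrorSketch`, its `toJSON` payload shape, `ToolResult`, and `runTool` are assumptions made for illustration; the real `OperationError` is defined in `operations.ts` and is not shown in this hunk.

```typescript
// Hypothetical minimal error class with a typed code (the real one may differ).
class OperationErrorSketch extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
    this.name = 'OperationError';
  }
  // Assumed payload shape: a machine-readable code plus the message.
  toJSON() {
    return { error: this.code, message: this.message };
  }
}

interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

// Run a handler and convert its outcome into a text tool result.
async function runTool(handler: () => Promise<unknown>): Promise<ToolResult> {
  try {
    const result = await handler();
    return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
  } catch (e: unknown) {
    // Typed errors are serialized as structured JSON so clients can branch on the code.
    if (e instanceof OperationErrorSketch) {
      return { content: [{ type: 'text', text: JSON.stringify(e.toJSON(), null, 2) }], isError: true };
    }
    // Anything else is reduced to a plain message.
    const msg = e instanceof Error ? e.message : String(e);
    return { content: [{ type: 'text', text: `Error: ${msg}` }], isError: true };
  }
}
```

Keeping the typed branch first means a client can parse the error text as JSON and switch on the code, while untyped failures still surface a readable message.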
@@ -34,200 +69,21 @@ export async function startMcpServer(engine: BrainEngine) {
|
||||
await server.connect(transport);
|
||||
}
|
||||
|
||||
// Backward compat: used by `gbrain call` command
|
||||
export async function handleToolCall(
|
||||
engine: BrainEngine,
|
||||
tool: string,
|
||||
params: Record<string, unknown>,
|
||||
): Promise<unknown> {
|
||||
switch (tool) {
|
||||
case 'get_page': {
|
||||
const slug = params.slug as string;
|
||||
const page = await engine.getPage(slug);
|
||||
if (!page) return { error: `Page not found: ${slug}` };
|
||||
const tags = await engine.getTags(slug);
|
||||
return { ...page, tags };
|
||||
}
|
||||
const op = operations.find(o => o.name === tool);
|
||||
if (!op) throw new Error(`Unknown tool: ${tool}`);
|
||||
|
||||
case 'put_page': {
|
||||
const slug = params.slug as string;
|
||||
const content = params.content as string;
|
||||
const parsed = parseMarkdown(content, slug + '.md');
|
||||
const ctx: OperationContext = {
|
||||
engine,
|
||||
config: loadConfig() || { engine: 'postgres' },
|
||||
logger: { info: console.log, warn: console.warn, error: console.error },
|
||||
dryRun: false,
|
||||
};
|
||||
|
||||
const existing = await engine.getPage(slug);
|
||||
if (existing) await engine.createVersion(slug);
|
||||
|
||||
const page = await engine.putPage(slug, {
|
||||
type: parsed.type,
|
||||
title: parsed.title,
|
||||
compiled_truth: parsed.compiled_truth,
|
||||
timeline: parsed.timeline,
|
||||
frontmatter: parsed.frontmatter,
|
||||
});
|
||||
|
||||
for (const tag of parsed.tags) await engine.addTag(slug, tag);
|
||||
|
||||
// Chunk and embed
|
||||
const chunks: ChunkInput[] = [];
|
||||
if (parsed.compiled_truth.trim()) {
|
||||
for (const c of chunkText(parsed.compiled_truth)) {
|
||||
chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'compiled_truth' });
|
||||
}
|
||||
}
|
||||
if (parsed.timeline.trim()) {
|
||||
for (const c of chunkText(parsed.timeline)) {
|
||||
chunks.push({ chunk_index: chunks.length, chunk_text: c.text, chunk_source: 'timeline' });
|
||||
}
|
||||
}
|
||||
if (chunks.length > 0) {
|
||||
try {
|
||||
const embeddings = await embedBatch(chunks.map(c => c.chunk_text));
|
||||
for (let i = 0; i < chunks.length; i++) {
|
||||
chunks[i].embedding = embeddings[i];
|
||||
}
|
||||
} catch { /* non-fatal */ }
|
||||
await engine.upsertChunks(slug, chunks);
|
||||
}
|
||||
|
||||
return { slug: page.slug, status: existing ? 'updated' : 'created' };
|
||||
}
|
||||
|
||||
case 'delete_page': {
|
||||
await engine.deletePage(params.slug as string);
|
||||
return { status: 'deleted' };
|
||||
}
|
||||
|
||||
case 'list_pages': {
|
||||
const pages = await engine.listPages({
|
||||
type: params.type as any,
|
||||
tag: params.tag as string,
|
||||
limit: (params.limit as number) || 50,
|
||||
});
|
||||
return pages.map(p => ({ slug: p.slug, type: p.type, title: p.title, updated_at: p.updated_at }));
|
||||
}
|
||||
|
||||
case 'search': {
|
||||
return engine.searchKeyword(params.query as string, { limit: (params.limit as number) || 20 });
|
||||
}
|
||||
|
||||
case 'query': {
|
||||
return hybridSearch(engine, params.query as string, {
|
||||
limit: (params.limit as number) || 20,
|
||||
expansion: true,
|
||||
expandFn: expandQuery,
|
||||
});
|
||||
}
|
||||
|
||||
case 'add_tag': {
|
||||
await engine.addTag(params.slug as string, params.tag as string);
|
||||
return { status: 'ok' };
|
||||
}
|
||||
|
||||
case 'remove_tag': {
|
||||
await engine.removeTag(params.slug as string, params.tag as string);
|
||||
return { status: 'ok' };
|
||||
}
|
||||
|
||||
case 'get_tags': {
|
||||
return engine.getTags(params.slug as string);
|
||||
}
|
||||
|
||||
case 'add_link': {
|
||||
await engine.addLink(
|
||||
params.from as string,
|
||||
params.to as string,
|
||||
params.context as string || '',
|
||||
params.link_type as string || '',
|
||||
);
|
||||
return { status: 'ok' };
|
||||
}
|
||||
|
||||
case 'remove_link': {
|
||||
await engine.removeLink(params.from as string, params.to as string);
|
||||
return { status: 'ok' };
|
||||
}
|
||||
|
||||
case 'get_links': {
|
||||
return engine.getLinks(params.slug as string);
|
||||
}
|
||||
|
||||
case 'get_backlinks': {
|
||||
return engine.getBacklinks(params.slug as string);
|
||||
}
|
||||
|
||||
case 'traverse_graph': {
|
||||
return engine.traverseGraph(params.slug as string, (params.depth as number) || 5);
|
||||
}
|
||||
|
||||
case 'add_timeline_entry': {
|
||||
await engine.addTimelineEntry(params.slug as string, {
|
||||
date: params.date as string,
|
||||
source: params.source as string || '',
|
||||
summary: params.summary as string,
|
||||
detail: params.detail as string || '',
|
||||
});
|
||||
return { status: 'ok' };
|
||||
}
|
||||
|
||||
case 'get_timeline': {
|
||||
return engine.getTimeline(params.slug as string);
|
||||
}
|
||||
|
||||
case 'get_stats': {
|
||||
return engine.getStats();
|
||||
}
|
||||
|
||||
case 'get_health': {
|
||||
return engine.getHealth();
|
||||
}
|
||||
|
||||
case 'get_versions': {
|
||||
return engine.getVersions(params.slug as string);
|
||||
}
|
||||
|
||||
case 'revert_version': {
|
||||
await engine.createVersion(params.slug as string);
|
||||
await engine.revertToVersion(params.slug as string, params.version_id as number);
|
||||
return { status: 'reverted' };
|
||||
}
|
||||
|
||||
case 'sync_brain': {
|
||||
const { performSync } = await import('../commands/sync.ts');
|
||||
return performSync(engine, {
|
||||
repoPath: params.repo as string | undefined,
|
||||
dryRun: (params.dry_run as boolean) || false,
|
||||
noEmbed: false,
|
||||
noPull: false,
|
||||
full: false,
|
||||
});
|
||||
}
|
||||
|
||||
default:
|
||||
throw new Error(`Unknown tool: ${tool}`);
|
||||
}
|
||||
}
|
||||
|
||||
function getToolDefinitions() {
|
||||
return [
|
||||
{ name: 'get_page', description: 'Read a page by slug', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'put_page', description: 'Write/update a page (markdown with frontmatter)', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, content: { type: 'string', description: 'Full markdown content with YAML frontmatter' } }, required: ['slug', 'content'] } },
|
||||
{ name: 'delete_page', description: 'Delete a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'list_pages', description: 'List pages with optional filters', inputSchema: { type: 'object', properties: { type: { type: 'string' }, tag: { type: 'string' }, limit: { type: 'number' } } } },
|
||||
{ name: 'search', description: 'Keyword search using full-text search', inputSchema: { type: 'object', properties: { query: { type: 'string' }, limit: { type: 'number' } }, required: ['query'] } },
|
||||
{ name: 'query', description: 'Hybrid search with vector + keyword + multi-query expansion', inputSchema: { type: 'object', properties: { query: { type: 'string' }, limit: { type: 'number' } }, required: ['query'] } },
|
||||
{ name: 'add_tag', description: 'Add tag to page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, tag: { type: 'string' } }, required: ['slug', 'tag'] } },
|
||||
{ name: 'remove_tag', description: 'Remove tag from page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, tag: { type: 'string' } }, required: ['slug', 'tag'] } },
|
||||
{ name: 'get_tags', description: 'List tags for a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'add_link', description: 'Create link between pages', inputSchema: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' }, link_type: { type: 'string' }, context: { type: 'string' } }, required: ['from', 'to'] } },
|
||||
{ name: 'remove_link', description: 'Remove link between pages', inputSchema: { type: 'object', properties: { from: { type: 'string' }, to: { type: 'string' } }, required: ['from', 'to'] } },
|
||||
{ name: 'get_links', description: 'List outgoing links from a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'get_backlinks', description: 'List incoming links to a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'traverse_graph', description: 'Traverse link graph from a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, depth: { type: 'number', description: 'Max traversal depth (default 5)' } }, required: ['slug'] } },
|
||||
{ name: 'add_timeline_entry', description: 'Add timeline entry to a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, date: { type: 'string' }, summary: { type: 'string' }, detail: { type: 'string' }, source: { type: 'string' } }, required: ['slug', 'date', 'summary'] } },
|
||||
{ name: 'get_timeline', description: 'Get timeline entries for a page', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'get_stats', description: 'Brain statistics (page count, chunk count, etc.)', inputSchema: { type: 'object', properties: {} } },
|
||||
{ name: 'get_health', description: 'Brain health dashboard (embed coverage, stale pages, orphans)', inputSchema: { type: 'object', properties: {} } },
|
||||
{ name: 'get_versions', description: 'Page version history', inputSchema: { type: 'object', properties: { slug: { type: 'string' } }, required: ['slug'] } },
|
||||
{ name: 'revert_version', description: 'Revert page to a previous version', inputSchema: { type: 'object', properties: { slug: { type: 'string' }, version_id: { type: 'number' } }, required: ['slug', 'version_id'] } },
|
||||
{ name: 'sync_brain', description: 'Sync git repo to brain (incremental)', inputSchema: { type: 'object', properties: { repo: { type: 'string', description: 'Path to git repo (optional if configured)' }, dry_run: { type: 'boolean', description: 'Preview changes without applying' } } } },
|
||||
];
|
||||
return op.handler(ctx, params);
|
||||
}
|
||||
|
||||
@@ -149,7 +149,6 @@ CREATE TABLE IF NOT EXISTS files (
|
||||
page_slug TEXT REFERENCES pages(slug) ON DELETE SET NULL ON UPDATE CASCADE,
|
||||
filename TEXT NOT NULL,
|
||||
storage_path TEXT NOT NULL,
|
||||
- storage_url TEXT NOT NULL,
|
||||
mime_type TEXT,
|
||||
size_bytes BIGINT,
|
||||
content_hash TEXT NOT NULL,
|
||||
@@ -158,6 +157,9 @@ CREATE TABLE IF NOT EXISTS files (
|
||||
UNIQUE(storage_path)
|
||||
);
|
||||
|
||||
+ -- Migration: drop storage_url if it exists (renamed to storage_path only)
+ ALTER TABLE files DROP COLUMN IF EXISTS storage_url;
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_files_page ON files(page_slug);
|
||||
CREATE INDEX IF NOT EXISTS idx_files_hash ON files(content_hash);
|
||||
|
||||
|
||||
112
test/cli.test.ts
@@ -1,66 +1,29 @@
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { readFileSync } from 'fs';
|
||||
|
||||
// Read cli.ts source to extract COMMAND_HELP keys and switch cases
|
||||
// Read cli.ts source for structural checks
|
||||
const cliSource = readFileSync(new URL('../src/cli.ts', import.meta.url), 'utf-8');
|
||||
|
||||
// Extract COMMAND_HELP keys from the map
|
||||
function extractCommandHelpKeys(source: string): string[] {
|
||||
const mapMatch = source.match(/const COMMAND_HELP:\s*Record<string,\s*string>\s*=\s*\{([\s\S]*?)\};/);
|
||||
if (!mapMatch) return [];
|
||||
const keys: string[] = [];
|
||||
for (const m of mapMatch[1].matchAll(/^\s*['"]?([a-z-]+)['"]?\s*:/gm)) {
|
||||
keys.push(m[1]);
|
||||
}
|
||||
return keys.sort();
|
||||
}
|
||||
|
||||
// Extract switch case labels from the switch(command) block
|
||||
function extractSwitchCases(source: string): string[] {
|
||||
const cases: string[] = [];
|
||||
for (const m of source.matchAll(/case\s+'([^']+)':\s*\{/g)) {
|
||||
cases.push(m[1]);
|
||||
}
|
||||
return [...new Set(cases)].sort();
|
||||
}
|
||||
|
||||
// Extract commands handled before the switch (init, upgrade)
|
||||
function extractEarlyCommands(source: string): string[] {
|
||||
const cmds: string[] = [];
|
||||
for (const m of source.matchAll(/if\s*\(command\s*===\s*'([^']+)'\)/g)) {
|
||||
if (!['--help', '-h', '--version', '--tools-json'].includes(m[1])) {
|
||||
cmds.push(m[1]);
|
||||
}
|
||||
}
|
||||
return [...new Set(cmds)].sort();
|
||||
}
|
||||
|
||||
describe('CLI COMMAND_HELP consistency', () => {
|
||||
const helpKeys = extractCommandHelpKeys(cliSource);
|
||||
const switchCases = extractSwitchCases(cliSource);
|
||||
const earlyCmds = extractEarlyCommands(cliSource);
|
||||
const allHandled = [...switchCases, ...earlyCmds].sort();
|
||||
|
||||
test('COMMAND_HELP has entries for all switch cases', () => {
|
||||
for (const cmd of switchCases) {
|
||||
expect(helpKeys).toContain(cmd);
|
||||
}
|
||||
describe('CLI structure', () => {
|
||||
test('imports operations from operations.ts', () => {
|
||||
expect(cliSource).toContain("from './core/operations.ts'");
|
||||
});
|
||||
|
||||
test('COMMAND_HELP has entries for early-dispatch commands (init, upgrade)', () => {
|
||||
for (const cmd of earlyCmds) {
|
||||
expect(helpKeys).toContain(cmd);
|
||||
}
|
||||
test('builds cliOps map from operations', () => {
|
||||
expect(cliSource).toContain('cliOps');
|
||||
});
|
||||
|
||||
test('every COMMAND_HELP key maps to a handled command', () => {
|
||||
for (const key of helpKeys) {
|
||||
expect(allHandled).toContain(key);
|
||||
}
|
||||
test('CLI_ONLY set contains expected commands', () => {
|
||||
expect(cliSource).toContain("'init'");
|
||||
expect(cliSource).toContain("'upgrade'");
|
||||
expect(cliSource).toContain("'import'");
|
||||
expect(cliSource).toContain("'export'");
|
||||
expect(cliSource).toContain("'embed'");
|
||||
expect(cliSource).toContain("'files'");
|
||||
});
|
||||
|
||||
test('COMMAND_HELP has at least 25 entries', () => {
|
||||
expect(helpKeys.length).toBeGreaterThanOrEqual(25);
|
||||
test('has formatResult function for CLI output', () => {
|
||||
expect(cliSource).toContain('function formatResult');
|
||||
});
|
||||
});
|
||||
|
||||
@@ -77,24 +40,6 @@ describe('CLI version', () => {
|
||||
});
|
||||
});
|
||||
|
||||
describe('CLI help text', () => {
|
||||
test('every COMMAND_HELP entry starts with Usage:', () => {
|
||||
const mapMatch = cliSource.match(/const COMMAND_HELP:\s*Record<string,\s*string>\s*=\s*\{([\s\S]*?)\};/);
|
||||
expect(mapMatch).not.toBeNull();
|
||||
// Verify by importing and checking
|
||||
const keys = extractCommandHelpKeys(cliSource);
|
||||
expect(keys.length).toBeGreaterThan(0);
|
||||
// Each help string in the source should contain 'Usage:'
|
||||
for (const key of keys) {
|
||||
const pattern = new RegExp(`['"]?${key.replace('-', '\\-')}['"]?:\\s*['"\`]([^'"\`]*)`);
|
||||
const match = cliSource.match(pattern);
|
||||
if (match) {
|
||||
expect(match[1]).toContain('Usage:');
|
||||
}
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe('CLI dispatch integration', () => {
|
||||
test('--version outputs version', async () => {
|
||||
const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', '--version'], {
|
||||
@@ -143,18 +88,6 @@ describe('CLI dispatch integration', () => {
|
||||
expect(exitCode).toBe(0);
|
||||
});
|
||||
|
||||
test('init --help prints usage without running wizard', async () => {
|
||||
const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', 'init', '--help'], {
|
||||
cwd: new URL('..', import.meta.url).pathname,
|
||||
stdout: 'pipe',
|
||||
stderr: 'pipe',
|
||||
});
|
||||
const stdout = await new Response(proc.stdout).text();
|
||||
const exitCode = await proc.exited;
|
||||
expect(stdout).toContain('Usage: gbrain init');
|
||||
expect(exitCode).toBe(0);
|
||||
});
|
||||
|
||||
test('--help prints global help', async () => {
|
||||
const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', '--help'], {
|
||||
cwd: new URL('..', import.meta.url).pathname,
|
||||
@@ -168,16 +101,19 @@ describe('CLI dispatch integration', () => {
|
||||
expect(exitCode).toBe(0);
|
||||
});
|
||||
|
||||
test('files --help prints subcommand help', async () => {
|
||||
const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', 'files', '--help'], {
|
||||
test('--tools-json outputs valid JSON with operations', async () => {
|
||||
const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', '--tools-json'], {
|
||||
cwd: new URL('..', import.meta.url).pathname,
|
||||
stdout: 'pipe',
|
||||
stderr: 'pipe',
|
||||
});
|
||||
const stdout = await new Response(proc.stdout).text();
|
||||
const exitCode = await proc.exited;
|
||||
expect(stdout).toContain('files list');
|
||||
expect(stdout).toContain('files upload');
|
||||
expect(exitCode).toBe(0);
|
||||
await proc.exited;
|
||||
const tools = JSON.parse(stdout);
|
||||
expect(Array.isArray(tools)).toBe(true);
|
||||
expect(tools.length).toBeGreaterThanOrEqual(30);
|
||||
expect(tools[0]).toHaveProperty('name');
|
||||
expect(tools[0]).toHaveProperty('description');
|
||||
expect(tools[0]).toHaveProperty('parameters');
|
||||
});
|
||||
});
|
||||
|
||||
68
test/e2e/fixtures/companies/novamind.md
Normal file
@@ -0,0 +1,68 @@
|
||||
---
|
||||
type: company
|
||||
title: NovaMind
|
||||
tags:
|
||||
- yc-w25
|
||||
- ai-agents
|
||||
- seed-stage
|
||||
---
|
||||
|
||||
# NovaMind
|
||||
|
||||
AI agent startup building autonomous agents for enterprise workflow automation. YC W25
|
||||
batch. Currently seed stage.
|
||||
|
||||
## Overview
|
||||
|
||||
NovaMind replaces traditional SaaS dashboards with fleets of task-specific AI agents
|
||||
that execute complex business workflows end-to-end. Their flagship demo is a
|
||||
procurement agent that handles a 47-step workflow autonomously: vendor discovery, RFQ
|
||||
generation, bid comparison, approval routing, and purchase order creation.
|
||||
|
||||
## Key People
|
||||
|
||||
- Sarah Chen — Founder and CEO. Former Anthropic ML engineer. Stanford CS 2020.
|
||||
- Priya Patel — CTO and co-founder. Stanford CS PhD 2022. Ex-Google Brain.
|
||||
|
||||
## Funding
|
||||
|
||||
- Seed: $4M raised March 2025, led by Threshold Ventures (Marcus Reid). Post-money
|
||||
valuation $20M. Angels include YC partners.
|
||||
- Pre-seed: YC standard deal (W25 batch).
|
||||
|
||||
## Technology
|
||||
|
||||
- Multi-agent coordination layer designed by Priya Patel, based on her Stanford
|
||||
research on emergent communication protocols.
|
||||
- "Compiled procedures" — agents learn reusable sub-routines from successful task
|
||||
completions rather than relying on static prompt chains.
|
||||
- Supervisor agent architecture for error recovery and dynamic re-planning.
|
||||
- 94% task completion rate on complex multi-step workflows in benchmarks.
|
||||
|
||||
## Go-to-Market
|
||||
|
||||
- Vertical-first strategy starting with procurement and supply chain.
|
||||
- 2 enterprise design partners signed pre-launch.
|
||||
- Launch target: Q3 2025.
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-15 — YC W25 Demo Day
|
||||
|
||||
NovaMind presented at W25 Demo Day. Standout demo of the batch. Sarah Chen
|
||||
demonstrated the procurement agent live, completing the full 47-step workflow in under
|
||||
4 minutes. Strong investor interest post-presentation.
|
||||
|
||||
### 2025-03-28 — Seed Round Closed
|
||||
|
||||
Closed $4M seed led by Threshold Ventures. Marcus Reid joins board. Capital allocated
|
||||
primarily to hiring: 3 senior engineers, 1 design partner lead. The company currently
has 4 people (Sarah, Priya, and 2 founding engineers).
|
||||
|
||||
### 2025-04-01 — Hiring Kickoff
|
||||
|
||||
Sarah shared that they posted senior engineer roles. Looking for people with
|
||||
distributed systems and/or ML inference optimization backgrounds. Targeting SF-based
|
||||
candidates for in-person collaboration during the early stage.
|
||||
53
test/e2e/fixtures/companies/threshold-ventures.md
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
type: company
|
||||
title: Threshold Ventures
|
||||
tags:
|
||||
- vc
|
||||
- early-stage
|
||||
- ai-focus
|
||||
---
|
||||
|
||||
# Threshold Ventures
|
||||
|
||||
Early-stage venture capital fund based in San Francisco. Invests primarily at seed and
|
||||
Series A stages. Current fund size is approximately $200M.
|
||||
|
||||
## Focus Areas
|
||||
|
||||
- AI/ML infrastructure and applications
|
||||
- Developer tools and platforms
|
||||
- Cloud infrastructure and DevOps
|
||||
- Data engineering and analytics
|
||||
|
||||
## Key Partners
|
||||
|
||||
- Marcus Reid — GP. Former Stripe engineer. Leads AI/ML investments.
|
||||
- Elena Torres — GP. Focuses on developer tools and infrastructure.
|
||||
- James Wu — GP. Covers data and analytics.
|
||||
|
||||
## Investment Style
|
||||
|
||||
Threshold is known for moving quickly on conviction-driven deals. Marcus Reid issued
|
||||
the NovaMind term sheet 3 days after Demo Day. The fund prefers technical founders with
|
||||
deep domain expertise and a clear vertical wedge. Typical check size is $2-5M at seed,
|
||||
$5-15M at Series A.
|
||||
|
||||
## Notable Portfolio Companies
|
||||
|
||||
- NovaMind (AI agents, YC W25) — $4M seed, Marcus Reid led
|
||||
- Several other AI infrastructure and developer tools companies
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-18 — NovaMind Term Sheet
|
||||
|
||||
Marcus Reid issued a seed term sheet to NovaMind: $4M at $20M post-money. Fast
|
||||
turnaround after W25 Demo Day. Marcus cited the 94% task completion rate and the
|
||||
strength of the Sarah Chen and Priya Patel founding team as key factors.
|
||||
|
||||
### 2025-03-28 — NovaMind Seed Closed
|
||||
|
||||
NovaMind seed round officially closed. Marcus takes board seat. This is Threshold's
|
||||
first investment from their current fund in the AI agents category.
|
||||
71
test/e2e/fixtures/concepts/compiled-truth.md
Normal file
@@ -0,0 +1,71 @@
|
||||
---
|
||||
type: concept
|
||||
title: Compiled Truth
|
||||
tags:
|
||||
- architecture
|
||||
- brain-design
|
||||
---
|
||||
|
||||
# Compiled Truth
|
||||
|
||||
The two-layer page pattern used throughout GBrain. Every page has two distinct
|
||||
sections separated by a horizontal rule (`---`):
|
||||
|
||||
1. **Compiled truth** (above the line) — The current, canonical understanding of the
|
||||
subject. This section is rewritten and updated as new information arrives. It
|
||||
represents the latest synthesized knowledge, not a historical record.
|
||||
|
||||
2. **Timeline** (below the line) — An append-only log of evidence, observations, and
|
||||
events. New entries are added at the bottom. Old entries are never modified or
|
||||
deleted. Each entry is timestamped and captures what was known or observed at that
|
||||
moment.
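
The two layers can be separated mechanically. The following is a minimal sketch, assuming the first standalone `---` line after any YAML frontmatter is the divider; `splitPage` and the `SplitPage` shape are hypothetical names used for illustration and are not taken from the GBrain source.

```typescript
// Hypothetical helper: illustrates the two-layer layout, not GBrain's actual parser.
interface SplitPage {
  frontmatter: string;   // raw YAML between the leading `---` fences, or ""
  compiledTruth: string; // body text above the first horizontal rule
  timeline: string;      // body text below it, or "" when there is no rule
}

function splitPage(markdown: string): SplitPage {
  const lines = markdown.split(/\r?\n/);
  const isRule = (line: string): boolean => line.trim() === "---";

  // A leading `---` opens YAML frontmatter; the next `---` closes it.
  let frontmatter = "";
  let bodyStart = 0;
  if (isRule(lines[0] ?? "")) {
    const close = lines.findIndex((line, i) => i > 0 && isRule(line));
    if (close > 0) {
      frontmatter = lines.slice(1, close).join("\n");
      bodyStart = close + 1;
    }
  }

  // Assumption: the first rule in the body separates compiled truth from timeline.
  const body = lines.slice(bodyStart);
  const rule = body.findIndex(isRule);
  if (rule === -1) {
    return { frontmatter, compiledTruth: body.join("\n").trim(), timeline: "" };
  }
  return {
    frontmatter,
    compiledTruth: body.slice(0, rule).join("\n").trim(),
    timeline: body.slice(rule + 1).join("\n").trim(),
  };
}
```

A page that uses `---` as an ordinary horizontal rule inside the compiled truth section would be split too early by this sketch, so a real parser may need a stricter marker.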
|
||||
|
||||
## Why This Pattern
|
||||
|
||||
Traditional note-taking creates a "pile of pages" problem: information about a topic
|
||||
is scattered across meeting notes, emails, and documents. Finding the current state
|
||||
requires re-reading everything and mentally synthesizing.
|
||||
|
||||
Compiled truth solves this by maintaining a living summary that is always current.
|
||||
The timeline preserves the evidence trail so you can always trace how understanding
|
||||
evolved and verify claims against primary observations.
|
||||
|
||||
## Rules
|
||||
|
||||
- The compiled truth section is the single source of truth for "what do I currently
|
||||
believe about this topic."
|
||||
- When new information contradicts existing compiled truth, update the compiled truth
|
||||
and add a timeline entry explaining the change.
|
||||
- Timeline entries are immutable once written. They capture point-in-time observations.
|
||||
- The compiled truth section should be readable on its own without needing to read the
|
||||
timeline.
|
||||
- Cross-reference other entities by name (e.g., "Sarah Chen" not `[Sarah Chen](...)`)
|
||||
to enable search-based discovery.
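
The update rules above can be sketched as a single pure function. This is an illustration only: the `Page` and `TimelineEntry` shapes and the `reviseCompiledTruth` name are assumptions, not GBrain's data model.

```typescript
// Hypothetical types: the real page model in GBrain may look different.
interface TimelineEntry {
  date: string;  // ISO date of the observation, e.g. "2025-03-28"
  title: string;
  body: string;
}

interface Page {
  compiledTruth: string;
  timeline: readonly TimelineEntry[]; // readonly: existing entries are never edited
}

// Replaces the compiled truth and appends one entry explaining the change.
// The input page is left untouched, so earlier timeline entries stay immutable.
function reviseCompiledTruth(page: Page, newTruth: string, entry: TimelineEntry): Page {
  return {
    compiledTruth: newTruth,
    timeline: page.timeline.concat(entry), // new entries go at the bottom
  };
}
```

Returning a new object instead of mutating the old one is one simple way to make the append-only rule hard to violate by accident.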
|
||||
|
||||
## Relationship to RAG
|
||||
|
||||
GBrain uses retrieval-augmented generation (RAG) to surface relevant pages during
|
||||
queries. The compiled truth pattern means retrieved pages contain pre-synthesized
|
||||
knowledge rather than raw fragments. This produces higher quality answers because the
|
||||
LLM receives curated context rather than scattered notes.
|
||||
|
||||
This is a deliberate design choice: do the synthesis work at write time (when you have
|
||||
full context) rather than at read time (when the LLM must guess at connections).
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-02-10 — Pattern Formalized
|
||||
|
||||
Adopted the compiled truth + timeline pattern after experimenting with several
|
||||
knowledge management approaches. Wiki-style pages lost temporal context. Pure
|
||||
journaling created information sprawl. The two-layer approach preserves both the
|
||||
current understanding and the evidence trail.
|
||||
|
||||
### 2025-03-01 — Applied to All Page Types
|
||||
|
||||
Extended the pattern to all GBrain page types: people, companies, deals, meetings,
|
||||
concepts, projects, and sources. Each type uses the same two-layer structure with
|
||||
type-specific frontmatter fields. This consistency enables uniform search and
|
||||
retrieval across all knowledge categories.
|
||||
87
test/e2e/fixtures/concepts/hybrid-search.md
Normal file
@@ -0,0 +1,87 @@
|
||||
---
|
||||
type: concept
|
||||
title: Hybrid Search
|
||||
tags:
|
||||
- search
|
||||
- architecture
|
||||
- gbrain
|
||||
---
|
||||
|
||||
# Hybrid Search
|
||||
|
||||
Hybrid search combines vector similarity search with keyword full-text search to
|
||||
deliver results that are both semantically relevant and keyword-precise. GBrain uses
|
||||
hybrid search as its core search architecture, merging the two result sets using
|
||||
Reciprocal Rank Fusion (RRF).
|
||||
|
||||
## The Problem
|
||||
|
||||
Neither vector search nor keyword search alone is sufficient for a personal knowledge
|
||||
brain:
|
||||
|
||||
- **Vector-only search** finds semantically similar content but can miss pages that
|
||||
contain an exact keyword or phrase. Searching for "NovaMind" might surface pages
|
||||
about AI agents generally rather than the specific NovaMind company page.
|
||||
- **Keyword-only search** finds exact matches but misses semantic near-matches.
|
||||
Searching for "autonomous agents" would not find pages that use "AI agents" or
|
||||
"agentic systems" instead.
|
||||
|
||||
## How Hybrid Search Works
|
||||
|
||||
1. **Vector search** — The query is embedded using OpenAI text-embedding-3-large and
|
||||
compared against stored document embeddings using cosine similarity via pgvector.
|
||||
Returns top-k results ranked by semantic similarity.
|
||||
|
||||
2. **Keyword search** — The query is processed as a Postgres tsquery against tsvector
|
||||
indexes on document content. Returns results ranked by ts_rank relevance.
|
||||
|
||||
3. **Reciprocal Rank Fusion (RRF)** — The two ranked result lists are merged using
|
||||
RRF scoring. For each document, the RRF score is calculated as:
|
||||
|
||||
`score = sum(1 / (k + rank_i))` for each result list where the document appears.
|
||||
|
||||
The constant `k` (typically 60) dampens the effect of high rankings in any single
|
||||
list. Documents that appear in both lists get boosted because they receive scores
|
||||
from both.
|
||||
|
||||
4. **Multi-query expansion** — GBrain generates multiple search queries from a single
|
||||
user question to improve recall. For example, "Who is Sarah Chen?" might expand to
|
||||
queries about "Sarah Chen founder", "NovaMind CEO", and "YC W25 Sarah".
|
||||
|
||||
5. **Deduplication** — Results that appear across multiple expanded queries are
|
||||
deduplicated, keeping the highest-scoring instance.
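
The fusion step (step 3) can be sketched in a few lines. This is a minimal illustration of the RRF formula above, not the code in `src/core/search/`; the function name and the assumption that each list holds a document at most once are mine.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion; names are illustrative.
type DocId = string;

interface FusedResult {
  id: DocId;
  score: number;
}

// Each inner array is one ranked result list, best match first.
// Assumption: a document appears at most once per list.
function reciprocalRankFusion(rankedLists: DocId[][], k: number = 60): FusedResult[] {
  const scores = new Map<DocId, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based in the RRF formula
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

With `k = 60`, a page ranked first in the vector list and second in the keyword list scores 1/61 + 1/62 ≈ 0.0325, while a page that appears only once at rank 2 scores 1/62 ≈ 0.0161, which is the boost for appearing in both lists.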
|
||||
|
||||
## Why RRF
|
||||
|
||||
Reciprocal Rank Fusion was chosen over other fusion methods (like linear combination
|
||||
of normalized scores) because:
|
||||
|
||||
- It is score-agnostic: vector cosine similarities and keyword ts_rank scores are on
different scales, making direct score combination unreliable
|
||||
- It is robust: small changes in individual scores do not dramatically shift the
|
||||
merged ranking
|
||||
- It naturally boosts documents that appear in both result lists
|
||||
|
||||
## Implementation in GBrain
|
||||
|
||||
GBrain implements hybrid search in `src/core/search/` using Postgres as the single
|
||||
backend for both search modalities. Embeddings are stored in pgvector columns, and
|
||||
full-text search uses native Postgres tsvector/tsquery. This avoids the operational
|
||||
complexity of maintaining separate search indices (e.g., Elasticsearch + Pinecone).
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-28 — Decision to Implement
|
||||
|
||||
During the weekly sync, identified clear failure cases with keyword-only search.
|
||||
Example: searching "autonomous agents" did not find pages about "AI agents." Decided
|
||||
to ship hybrid search with RRF in GBrain v0.3 as the highest priority feature.
|
||||
|
||||
### 2025-04-01 — Shipped in v0.3
|
||||
|
||||
Hybrid search shipped as part of GBrain v0.3. Initial results show significant
|
||||
improvement in recall for semantic queries while maintaining precision for exact
|
||||
keyword searches. The RRF fusion with k=60 produces well-balanced rankings across
|
||||
diverse query types.
|
||||
76
test/e2e/fixtures/concepts/retrieval-augmented-generation.md
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
type: concept
|
||||
title: Retrieval-Augmented Generation
|
||||
aliases:
|
||||
- RAG
|
||||
- 検索拡張生成
|
||||
tags:
|
||||
- ai
|
||||
- search
|
||||
- architecture
|
||||
---
|
||||
|
||||
# Retrieval-Augmented Generation
|
||||
|
||||
Retrieval-Augmented Generation (RAG) is a technique that enhances large language model
|
||||
responses by retrieving relevant documents from a knowledge store and including them as
|
||||
context in the prompt. Also known in Japanese as 検索拡張生成 (RAG).
|
||||
|
||||
## How It Works
|
||||
|
||||
1. **Query embedding** — The user's query is converted into a vector embedding using a
|
||||
model like OpenAI's text-embedding-3-large.
|
||||
2. **Retrieval** — The query vector is compared against stored document vectors using
|
||||
similarity search (typically cosine similarity). The top-k most similar documents
|
||||
are retrieved.
|
||||
3. **Context stuffing** — Retrieved documents are inserted into the LLM prompt as
|
||||
context, giving the model access to specific, relevant knowledge.
|
||||
4. **Generation** — The LLM generates a response grounded in the retrieved context
|
||||
rather than relying solely on its training data.
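
Steps 2 and 3 can be sketched in memory as follows. This is a simplified illustration, assuming embeddings are already computed and held as plain number arrays; the `StoredDoc` shape, the function names, and the prompt wording are hypothetical, and a production system would delegate the similarity search to a vector index.

```typescript
// Hypothetical in-memory sketch of retrieval and context stuffing.
interface StoredDoc {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 2: return the k documents most similar to the query embedding.
function retrieveTopK(queryEmbedding: number[], docs: StoredDoc[], k: number): StoredDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Step 3: place the retrieved documents ahead of the question in the prompt.
function buildPrompt(question: string, context: StoredDoc[]): string {
  const blocks = context.map((doc) => `[${doc.id}]\n${doc.text}`).join("\n\n");
  return `Answer using only the context below.\n\n${blocks}\n\nQuestion: ${question}`;
}
```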
|
||||
|
||||
## Advantages
|
||||
|
||||
- Grounds LLM responses in specific, up-to-date knowledge
|
||||
- Reduces hallucination by providing factual context
|
||||
- Allows knowledge to be updated without retraining the model
|
||||
- Scales to large knowledge bases with efficient vector indexing
|
||||
|
||||
## Limitations
|
||||
|
||||
- Quality depends heavily on retrieval accuracy — if the wrong documents are retrieved,
|
||||
the answer will be wrong or incomplete
|
||||
- Pure vector search can miss exact keyword matches (the "vocabulary mismatch" problem)
|
||||
- Chunk boundaries can split important context across fragments
|
||||
- No synthesis: retrieved chunks are raw fragments, not curated knowledge
|
||||
|
||||
## GBrain's Approach
|
||||
|
||||
GBrain uses RAG as its core query mechanism but addresses several standard RAG
|
||||
limitations through deliberate design choices:
|
||||
|
||||
- **Compiled truth** pages mean retrieved content is pre-synthesized knowledge rather
|
||||
than raw note fragments. This is the key differentiator from standard RAG systems.
|
||||
- **Hybrid search** combines vector similarity with keyword full-text search using
|
||||
Reciprocal Rank Fusion (RRF), addressing the vocabulary mismatch problem.
|
||||
- **Multi-query expansion** generates multiple search queries from a single user
|
||||
question to improve recall.
|
||||
- **Deduplication** ensures the same content is not retrieved multiple times when it
|
||||
matches across different query expansions.
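
Deduplication across expanded queries can be sketched as a keep-the-best merge. The `SearchHit` shape and `dedupeHits` name are assumptions for illustration; the source only states that the highest-scoring instance of each result is kept.

```typescript
// Hypothetical sketch: merge hits from several expanded queries.
interface SearchHit {
  pageId: string;
  score: number;
}

// Keeps the highest-scoring hit per page, then orders pages by that score.
function dedupeHits(resultSets: SearchHit[][]): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const seen = best.get(hit.pageId);
      if (seen === undefined || hit.score > seen.score) {
        best.set(hit.pageId, hit);
      }
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```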
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-02-15 — RAG Research
|
||||
|
||||
Evaluated standard RAG patterns for GBrain. Identified the core tension: RAG works
|
||||
best when retrieved documents are high quality and self-contained, but most note-taking
|
||||
systems produce fragmented, partially-overlapping content. This led to the compiled
|
||||
truth pattern as a write-time optimization for read-time retrieval quality.
|
||||
|
||||
### 2025-03-28 — Hybrid Search Decision
|
||||
|
||||
During weekly sync, decided to implement hybrid search (vector + keyword with RRF) for
|
||||
GBrain v0.3. Pure vector search was missing exact keyword matches, and pure keyword
|
||||
search was missing semantic near-matches. Hybrid search with Reciprocal Rank Fusion
|
||||
gives us the best of both approaches.
|
||||
64
test/e2e/fixtures/deals/novamind-seed.md
Normal file
@@ -0,0 +1,64 @@
|
||||
---
|
||||
type: deal
|
||||
title: NovaMind Seed Round
|
||||
tags:
|
||||
- seed
|
||||
- ai-agents
|
||||
---
|
||||
|
||||
# NovaMind Seed Round
|
||||
|
||||
$4M seed round for NovaMind, closed March 2025.
|
||||
|
||||
## Terms
|
||||
|
||||
- Round size: $4M
|
||||
- Valuation: $20M post-money
|
||||
- Lead investor: Threshold Ventures (Marcus Reid)
|
||||
- Board seat: Marcus Reid (Threshold Ventures)
|
||||
- Angels: Several YC partners participated
|
||||
- Instrument: Priced equity round
|
||||
|
||||
## Company Context
|
||||
|
||||
NovaMind is a YC W25 company building autonomous AI agents for enterprise workflow
|
||||
automation. Founded by Sarah Chen (CEO) and Priya Patel (CTO). The company
|
||||
demonstrated a procurement agent at Demo Day that completed a 47-step workflow with
|
||||
94% reliability.
|
||||
|
||||
## Deal Timeline
|
||||
|
||||
- March 15, 2025 — Sarah Chen presents at YC W25 Demo Day
|
||||
- March 18, 2025 — Threshold Ventures (Marcus Reid) issues term sheet
|
||||
- March 22, 2025 — Follow-up due diligence call with Sarah Chen
|
||||
- March 28, 2025 — Round officially closed
|
||||
|
||||
## Use of Funds
|
||||
|
||||
- Hiring: 3 senior engineers (distributed systems, ML inference)
|
||||
- Hiring: 1 design partner lead
|
||||
- Infrastructure and compute costs
|
||||
- Runway: approximately 18 months at planned burn rate
|
||||
|
||||
## Key Relationships
|
||||
|
||||
This deal connects NovaMind, Threshold Ventures, Sarah Chen, Priya Patel, and Marcus
|
||||
Reid. The speed of execution (term sheet 3 days after Demo Day) reflects high
|
||||
conviction from Threshold.
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-18 — Term Sheet Issued
|
||||
|
||||
Marcus Reid at Threshold Ventures moved fast. $4M at $20M post-money. He was
|
||||
especially compelled by the live demo reliability and the technical depth of the
|
||||
founding team. Sarah Chen and Priya Patel together cover product, go-to-market, and
|
||||
deep ML research, which is rare at seed stage.
|
||||
|
||||
### 2025-03-28 — Round Closed
|
||||
|
||||
All docs signed. Marcus Reid joins the NovaMind board. Angels filled quickly — several
|
||||
YC partners participated based on the Demo Day performance. Sarah confirmed the
|
||||
capital plan: primarily hiring, with a target of 8-10 people by Q3 2025 launch.
|
||||
70
test/e2e/fixtures/meetings/novamind-demo-day.md
Normal file
@@ -0,0 +1,70 @@
|
||||
---
|
||||
type: meeting
|
||||
title: NovaMind — YC W25 Demo Day
|
||||
tags:
|
||||
- demo-day
|
||||
- yc-w25
|
||||
---
|
||||
|
||||
# NovaMind — YC W25 Demo Day
|
||||
|
||||
Date: March 15, 2025
|
||||
Location: YC HQ, San Francisco
|
||||
Format: W25 batch Demo Day presentations + 1:1 meetings
|
||||
|
||||
## Attendees
|
||||
|
||||
- Sarah Chen (NovaMind CEO, presenter)
|
||||
- Priya Patel (NovaMind CTO, in audience)
|
||||
- Marcus Reid (Threshold Ventures GP)
|
||||
- ~200 investors in main audience
|
||||
- Full W25 batch presenting
|
||||
|
||||
## Summary
|
||||
|
||||
Sarah Chen presented NovaMind's autonomous agent platform. The live demo was the
|
||||
highlight of the batch: an AI agent completed a 47-step procurement workflow in under
|
||||
4 minutes with zero human intervention. Steps included vendor discovery, RFQ
|
||||
generation, bid comparison across 5 vendors, approval chain routing, and PO creation.
|
||||
|
||||
The agent handled two deliberate failure injections during the demo — a vendor API
|
||||
timeout and a budget threshold violation — recovering gracefully both times through
|
||||
the supervisor agent re-planning mechanism.
|
||||
|
||||
## 1:1 After Presentation
|
||||
|
||||
Had a 20-minute 1:1 with Sarah after the main presentations. Key discussion points:
|
||||
|
||||
- She described their architecture as "compiled procedures" rather than prompt chains.
|
||||
Agents learn reusable sub-routines from successful task completions.
|
||||
- The multi-agent coordination layer was designed by Priya Patel based on her Stanford
|
||||
PhD research on emergent communication.
|
||||
- Current team is 4 people. Looking to hire 3-4 senior engineers post-fundraise.
|
||||
- Go-to-market is vertical-first: procurement and supply chain initially.
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- Follow up with Sarah for a deeper technical dive on the agent architecture.
|
||||
- Intro Marcus Reid (Threshold) to Sarah if he has not already connected — he focuses
|
||||
on AI/ML investments and this is squarely in his thesis.
|
||||
- Track NovaMind as a potential portfolio company or collaboration partner.
|
||||
|
||||
## Action Items
|
||||
|
||||
- [ ] Schedule follow-up call with Sarah Chen for architecture deep dive
|
||||
- [ ] Send Marcus Reid intro email if needed
|
||||
- [ ] Research procurement automation market size for context
|
||||
- [ ] Revisit agent memory architecture discussion (ran out of time)
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-15 — Event Notes
|
||||
|
||||
Arrived at YC HQ at 9am. NovaMind presented in the second block around 10:30am. Sarah
|
||||
was polished and the demo worked flawlessly. The failure injection moments drew audible
|
||||
reactions from the audience when the agent recovered. Marcus Reid approached Sarah
|
||||
immediately after the presentations. Had my 1:1 around 11:15am in the side room. Sarah
|
||||
was energetic but focused — she clearly had a plan and knew exactly what she wanted
|
||||
from the fundraise.
|
||||
73
test/e2e/fixtures/meetings/weekly-sync-mar28.md
Normal file
@@ -0,0 +1,73 @@
|
||||
---
|
||||
type: meeting
|
||||
title: Weekly Sync — March 28, 2025
|
||||
tags:
|
||||
- weekly
|
||||
- internal
|
||||
---
|
||||
|
||||
# Weekly Sync — March 28, 2025
|
||||
|
||||
Date: March 28, 2025
|
||||
Format: Internal weekly sync
|
||||
Duration: 45 minutes
|
||||
|
||||
## Topics Covered
|
||||
|
||||
### NovaMind Follow-Up
|
||||
|
||||
Sarah Chen confirmed the seed round closed today. $4M led by Threshold Ventures with
|
||||
Marcus Reid taking a board seat. She is moving to hiring mode immediately. Discussed
|
||||
potential introductions to senior engineers in our network with distributed systems
|
||||
backgrounds.
|
||||
|
||||
NovaMind is on track for a Q3 2025 launch with 2 enterprise design partners already
signed in the procurement vertical. The 94% task completion rate from Demo Day has held up
|
||||
in continued testing.
|
||||
|
||||
### Threshold Ventures Partnership
|
||||
|
||||
Marcus Reid has been responsive and collaborative. He expressed interest in seeing
|
||||
other AI infrastructure companies. Threshold's thesis around agent-native enterprise
|
||||
software aligns well with several companies in the current YC batch and recent alumni.
|
||||
|
||||
### GBrain Search Quality
|
||||
|
||||
Current keyword-only search is missing relevant results when queries use different
|
||||
terminology than stored documents. Example: searching "autonomous agents" does not
|
||||
surface pages about "AI agents" or "agentic systems." Need semantic similarity via
|
||||
vector embeddings.
|
||||
|
||||
Discussed hybrid search approach: combine vector similarity search with keyword
|
||||
full-text search using Reciprocal Rank Fusion (RRF). This would handle both exact
|
||||
keyword matches and semantic near-matches. Priya Patel's NovaMind architecture is a
|
||||
good case study — searching for "multi-agent coordination" should surface her page
|
||||
even if those exact words are not in every mention.
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- Ship hybrid search in GBrain v0.3. This is the highest priority feature.
|
||||
- Use pgvector for embeddings, stored alongside content in Postgres.
|
||||
- Adopt Reciprocal Rank Fusion to merge vector and keyword result sets.
|
||||
- Continue tracking NovaMind progress for potential deeper engagement.
|
||||
|
||||
## Action Items
|
||||
|
||||
- [ ] Implement pgvector extension and embedding storage in GBrain schema
|
||||
- [ ] Build hybrid search with RRF scoring in GBrain v0.3
|
||||
- [x] Follow up with Sarah Chen on seed round status — confirmed closed
|
||||
- [ ] Send Marcus Reid list of AI infrastructure companies from recent batches
|
||||
- [ ] Write compiled-truth page for hybrid search concept
|
||||
- [ ] Schedule technical deep dive with Priya Patel on multi-agent systems
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-28 — Meeting Notes
|
||||
|
||||
Productive sync. The GBrain search discussion was the most substantive — we identified
|
||||
clear failure cases with keyword-only search and agreed on the hybrid approach. The
|
||||
NovaMind seed closing is good news and validates the W25 batch quality. Marcus Reid
|
||||
continues to be a strong partner in the AI investment ecosystem. Next weekly sync
|
||||
scheduled for April 4.
|
||||
47
test/e2e/fixtures/people/marcus-reid.md
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
type: person
|
||||
title: Marcus Reid
|
||||
tags:
|
||||
- investor
|
||||
- ai-focus
|
||||
---
|
||||
|
||||
# Marcus Reid
|
||||
|
||||
General Partner at Threshold Ventures. Focuses on early-stage AI/ML and developer
|
||||
tools investments. Former software engineer at Stripe (2016-2021), where he worked on
|
||||
the payments infrastructure team. Transitioned to investing after angel investing in
|
||||
several successful AI startups during 2020-2021.
|
||||
|
||||
Marcus led the NovaMind seed round ($4M) and joined the board. He has strong technical
|
||||
intuition, especially around infrastructure and developer experience. Known for being
|
||||
hands-on with portfolio companies on go-to-market strategy.
|
||||
|
||||
## Key People
|
||||
|
||||
- Close relationship with Sarah Chen (NovaMind CEO). He was one of the first investors
|
||||
she spoke with after Demo Day.
|
||||
- Partner at Threshold Ventures alongside Elena Torres and James Wu.
|
||||
|
||||
## Investment Thesis
|
||||
|
||||
Believes the next wave of enterprise software will be agent-native: systems designed
|
||||
from the ground up for AI agents rather than human users. Looks for teams with deep
|
||||
technical backgrounds and a clear wedge into a specific vertical.
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-18 — NovaMind Seed Term Sheet
|
||||
|
||||
Marcus moved fast after Demo Day. Threshold Ventures issued a term sheet for the
|
||||
NovaMind seed round three days after the W25 Demo Day presentation. $4M at $20M
|
||||
post-money valuation. He told me he was most impressed by the reliability metrics
|
||||
Sarah showed: 94% task completion rate on complex multi-step workflows.
|
||||
|
||||
### 2025-03-28 — Seed Round Closed
|
||||
|
||||
Round officially closed. Marcus takes a board seat. He mentioned wanting to connect
|
||||
NovaMind with other Threshold portfolio companies for potential design partnerships,
|
||||
particularly in supply chain and logistics verticals.
|
||||
47
test/e2e/fixtures/people/priya-patel.md
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
type: person
|
||||
title: Priya Patel
|
||||
tags:
|
||||
- technical
|
||||
- ai-research
|
||||
---
|
||||
|
||||
# Priya Patel
|
||||
|
||||
CTO and co-founder of NovaMind. Stanford CS PhD (2022), where her dissertation focused
|
||||
on emergent communication in multi-agent systems. Before Stanford, she did her
|
||||
undergraduate CS degree at IIT Bombay. After her PhD she joined Google Brain as a
|
||||
research scientist (2022-2024), publishing several papers on multi-agent coordination
|
||||
and task decomposition.
|
||||
|
||||
Priya designed NovaMind's core multi-agent coordination layer. Her academic work at
|
||||
Stanford on emergent communication protocols directly informs how NovaMind agents
|
||||
negotiate task handoffs and share intermediate state. She is the technical counterpart
|
||||
to Sarah Chen's product and business vision.
|
||||
|
||||
## Research Background
|
||||
|
||||
- Stanford CS PhD dissertation: "Emergent Communication Protocols in Cooperative
|
||||
Multi-Agent Systems" (2022)
|
||||
- Google Brain publications on learned task decomposition and agent specialization
|
||||
- Key insight from her research: agents that develop their own communication protocols
|
||||
outperform those using human-designed message schemas
|
||||
|
||||
## Technical Contributions at NovaMind
|
||||
|
||||
- Designed the supervisor agent architecture that handles error recovery and re-planning
|
||||
- Built the "compiled procedures" system where agents learn reusable sub-routines
|
||||
- Developed the evaluation framework that measures task completion reliability (94%
|
||||
completion rate on 47-step workflows)
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-22 — Technical Deep Dive (via Sarah)
|
||||
|
||||
Sarah Chen described Priya's architecture during our follow-up call. The multi-agent
|
||||
coordination layer uses a learned protocol rather than hardcoded message passing.
|
||||
Agents can recruit specialist sub-agents dynamically based on task requirements. Priya
|
||||
apparently benchmarked this against LangGraph and CrewAI, showing 3x better error
|
||||
recovery on complex workflows.
|
||||
68
test/e2e/fixtures/people/sarah-chen.md
Normal file
@@ -0,0 +1,68 @@
|
||||
---
|
||||
type: person
|
||||
title: Sarah Chen
|
||||
aliases:
|
||||
- Sarah L. Chen
|
||||
- sarahchen
|
||||
- sarah@novamind.ai
|
||||
tags:
|
||||
- founder
|
||||
- yc-w25
|
||||
- ai-agents
|
||||
---
|
||||
|
||||
# Sarah Chen
|
||||
|
||||
Founder and CEO of NovaMind (YC W25). Building autonomous AI agents for enterprise
|
||||
workflow automation. Previously ML engineer at Anthropic (2022-2024), where she worked
|
||||
on tool-use and agentic behaviors in large language models. Stanford CS class of 2020.
|
||||
|
||||
Met Sarah at YC W25 Demo Day on March 15, 2025. She gave one of the strongest demos
|
||||
of the batch: an agent that completed a 47-step procurement workflow end-to-end with
|
||||
zero human intervention. She is sharp, deeply technical, and has real conviction about
|
||||
the agent-native enterprise stack replacing SaaS dashboards.
|
||||
|
||||
## Key People
|
||||
|
||||
- Priya Patel is CTO and co-founder. They overlapped at Stanford.
|
||||
- Marcus Reid at Threshold Ventures led their seed round.
|
||||
|
||||
## Beliefs
|
||||
|
||||
- Enterprise software will be replaced by fleets of task-specific agents within 5 years.
|
||||
- The bottleneck is not model capability but reliable multi-step execution and error recovery.
|
||||
- Agent frameworks that force developers to think in graphs (nodes/edges) are the wrong abstraction. Natural language task descriptions with learned sub-routines are the path.
|
||||
- Vertical-first go-to-market beats horizontal platform plays for agents.
|
||||
|
||||
## Open Threads
|
||||
|
||||
- Revisit her thoughts on agent memory architecture. She hinted at something novel during the 1:1 after Demo Day but we ran out of time.
|
||||
- She is looking for design partners in procurement and supply chain. Could intro to relevant YC alumni.
|
||||
- Follow up on potential collab between NovaMind agent infra and GBrain knowledge layer.
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
### 2025-03-15 — YC W25 Demo Day
|
||||
|
||||
Met Sarah at Demo Day. The NovaMind demo was the standout of the batch. The agent completed a
|
||||
47-step procurement workflow autonomously: vendor discovery, RFQ generation, bid
|
||||
comparison, approval routing, PO creation. Had a 1:1 conversation after the main
|
||||
presentations. She described their agent memory system as "compiled procedures" rather
|
||||
than prompt chains. Impressive technical depth for a CEO.
|
||||
|
||||
### 2025-03-22 — Follow-up Call
|
||||
|
||||
30-minute call to dig deeper on NovaMind architecture. Sarah walked through their
|
||||
execution engine: agents decompose tasks into sub-procedures, each with rollback
|
||||
semantics. Error recovery is handled by a supervisor agent that can re-plan. She
|
||||
mentioned Priya Patel (CTO) designed the multi-agent coordination layer based on her
|
||||
Google Brain research on emergent communication.
|
||||
|
||||
### 2025-03-28 — Seed Round Closed
|
||||
|
||||
Sarah confirmed NovaMind closed their $4M seed round led by Threshold Ventures with
|
||||
Marcus Reid on the board. Angels include several YC partners. She plans to use the
|
||||
capital primarily for hiring: 3 senior engineers and 1 design partner lead. Launch
|
||||
target is Q3 2025 with 2 enterprise design partners already signed.
|
||||
75
test/e2e/fixtures/projects/gbrain.md
Normal file
@@ -0,0 +1,75 @@
---
type: project
title: GBrain
tags:
  - active
  - infrastructure
---

# GBrain

Personal knowledge brain built on Postgres with pgvector. A managed Supabase instance
provides the database layer. GBrain stores, searches, and retrieves personal knowledge
using a hybrid RAG architecture.

## Architecture

- **Contract-first design** — `src/core/operations.ts` defines ~30 shared operations.
  Both the CLI and the MCP server are generated from this single source of truth.
  Adding a new operation means defining it once and getting both interfaces for free.
- **Postgres-native** — All data lives in Postgres. Embeddings are stored using
  pgvector. Full-text search uses Postgres tsvector/tsquery. No external search
  services required.
- **Hybrid search** — Combines vector similarity search with keyword full-text search
  using Reciprocal Rank Fusion (RRF). This handles both exact keyword matches and
  semantic near-matches. Multi-query expansion and deduplication further improve
  recall and precision.
- **Compiled truth pages** — All knowledge pages use the two-layer compiled truth +
  timeline format. This means retrieved content is pre-synthesized rather than raw
  note fragments, producing higher quality RAG responses.
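
A minimal sketch of the Reciprocal Rank Fusion step named in the hybrid search bullet, assuming two already-ranked lists of page slugs. The function name `reciprocalRankFusion`, the constant `k = 60` (a commonly used default), and the sample slugs are illustrative assumptions, not taken from GBrain's source.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// k = 60 is an assumed default; the value GBrain uses is not stated here.
function reciprocalRankFusion(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((slug, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(slug, (scores.get(slug) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([slug]) => slug);
}

// Hypothetical result lists from the keyword and vector searches.
const keywordHits = ['companies/novamind', 'people/sarah-chen', 'deals/novamind-seed'];
const vectorHits = ['people/sarah-chen', 'concepts/hybrid-search', 'companies/novamind'];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused);
```

A page that appears in both lists accumulates two terms, so it tends to outrank a page that tops only one list. Because RRF uses ranks only, the keyword and vector scores never need to be put on a common scale.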

## Key Components

- Pluggable engine interface (BrainEngine) with Postgres + pgvector implementation
- 3-tier chunking: recursive, semantic, and LLM-guided
- OpenAI text-embedding-3-large for vector embeddings with batch processing and retry
- Skills system: fat markdown files that work in both CLI and plugin contexts
- MCP stdio server for integration with Claude and other LLM tools
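
As a rough illustration of the first chunking tier listed above, here is a sketch of recursive chunking: split on the coarsest separator first and descend to finer ones only when a piece is still too long. The separator list, the `maxLen` character budget, and the function name `recursiveChunk` are assumptions made for this example; GBrain's actual chunker may differ.

```typescript
// Separators tried in order, coarsest first: paragraphs, lines, sentences, words.
const SEPARATORS = ['\n\n', '\n', '. ', ' '];

function recursiveChunk(text: string, maxLen: number, level = 0): string[] {
  // Small enough already: emit as a single chunk (skip whitespace-only text).
  if (text.length <= maxLen) return text.trim() ? [text.trim()] : [];
  if (level >= SEPARATORS.length) {
    // No separator left: fall back to fixed-width slices.
    const out: string[] = [];
    for (let i = 0; i < text.length; i += maxLen) out.push(text.slice(i, i + maxLen));
    return out;
  }
  const sep = SEPARATORS[level];
  const chunks: string[] = [];
  let buffer = '';
  for (const part of text.split(sep)) {
    const candidate = buffer ? buffer + sep + part : part;
    if (candidate.length <= maxLen) {
      buffer = candidate; // keep packing pieces into the current chunk
      continue;
    }
    // Flush what we have; an oversized piece is split at the next finer level.
    if (buffer) chunks.push(...recursiveChunk(buffer, maxLen, level + 1));
    buffer = part;
  }
  if (buffer) chunks.push(...recursiveChunk(buffer, maxLen, level + 1));
  return chunks;
}

const sample =
  'First paragraph about search.\n\nSecond paragraph about chunking.\n\nThird.';
const sampleChunks = recursiveChunk(sample, 35);
console.log(sampleChunks);
```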

## Current Status

v0.3 shipped with hybrid search, contract-first architecture, and the ClawHub bundle
plugin. Active development continues on search quality improvements and new skills.

## Retrieval-Augmented Generation

GBrain uses RAG as its core query mechanism. The compiled truth pattern is a deliberate
alternative to standard RAG's fragment-retrieval approach: by maintaining pre-synthesized
pages, retrieved context is higher quality and more coherent than raw chunks.
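
To make the two-layer format concrete, the sketch below splits a page into its compiled truth and timeline halves. It assumes, based only on how the fixture pages in this change are laid out, that YAML frontmatter comes first and that the first horizontal-rule line after it separates the two layers. `splitTwoLayerPage` and `TwoLayerPage` are hypothetical names, and this is not GBrain's `parseMarkdown`.

```typescript
interface TwoLayerPage {
  compiledTruth: string;
  timeline: string;
}

// A delimiter line is a line holding only three dashes.
const isRule = (line: string): boolean => line.trim() === '---';

function splitTwoLayerPage(markdown: string): TwoLayerPage {
  const lines = markdown.split('\n');
  let bodyStart = 0;
  // Skip YAML frontmatter when the file opens with a delimiter line.
  if (lines.length > 0 && isRule(lines[0])) {
    const close = lines.findIndex((line, i) => i > 0 && isRule(line));
    if (close !== -1) bodyStart = close + 1;
  }
  const body = lines.slice(bodyStart);
  // The first rule in the body separates compiled truth from the timeline.
  const split = body.findIndex(line => isRule(line));
  if (split === -1) return { compiledTruth: body.join('\n').trim(), timeline: '' };
  return {
    compiledTruth: body.slice(0, split).join('\n').trim(),
    timeline: body.slice(split + 1).join('\n').trim(),
  };
}

const samplePage = [
  '---', 'type: project', 'title: GBrain', '---', '',
  '# GBrain', '', 'Summary.', '',
  '---', '',
  '## Timeline', '', '### 2025-02-01 Project Started',
].join('\n');
const parts = splitTwoLayerPage(samplePage);
console.log(parts.compiledTruth);
```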

---

## Timeline

### 2025-02-01 — Project Started

Initial implementation with keyword-only search. Postgres + Supabase backend. Basic
CLI for import and query operations.

### 2025-03-01 — Contract-First Refactor

Refactored to contract-first architecture. Operations defined in a single source file,
with CLI and MCP server both generated from the same definitions. This eliminated
drift between the two interfaces and simplified adding new operations.
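
The idea can be sketched as one operation definition feeding two generated surfaces. The `Operation` shape below loosely mirrors the fields the MCP test in this change reads (`name`, `description`, `params`); the in-memory `pages` map, `toMcpTool`, `toCliUsage`, and the `gbrain get-page` command form are hypothetical and only illustrate the pattern.

```typescript
type ParamType = 'string' | 'number' | 'boolean';

interface ParamDef {
  type: ParamType;
  required?: boolean;
  description?: string;
}

interface Operation {
  name: string;
  description: string;
  params: Record<string, ParamDef>;
  handler: (params: Record<string, unknown>) => Promise<unknown>;
}

// Stand-in data store for the example (hypothetical).
const pages = new Map<string, string>([['people/sarah-chen', 'Sarah Chen']]);

// Single source of truth: each operation is defined exactly once.
const operations: Operation[] = [
  {
    name: 'get_page',
    description: 'Fetch a page by slug',
    params: { slug: { type: 'string', required: true, description: 'Page slug' } },
    handler: async (p) => pages.get(String(p.slug)) ?? null,
  },
];

// Surface 1: an MCP-style tool definition derived from the operation.
function toMcpTool(op: Operation) {
  return {
    name: op.name,
    description: op.description,
    inputSchema: {
      type: 'object' as const,
      properties: Object.fromEntries(
        Object.entries(op.params).map(([k, v]) => [k, { type: v.type, description: v.description }]),
      ),
      required: Object.entries(op.params).filter(([, v]) => v.required).map(([k]) => k),
    },
  };
}

// Surface 2: a CLI usage line derived from the same operation (invented format).
function toCliUsage(op: Operation): string {
  const args = Object.entries(op.params)
    .map(([k, v]) => (v.required ? `<${k}>` : `[--${k}]`))
    .join(' ');
  return `gbrain ${op.name.replace(/_/g, '-')} ${args}`.trim();
}

console.log(toMcpTool(operations[0]).inputSchema.required, toCliUsage(operations[0]));
```

Because both surfaces are computed from `operations`, adding a parameter in one place changes the MCP schema and the CLI usage together, which is the drift the refactor was meant to remove.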

### 2025-03-28 — Hybrid Search Decision

Weekly sync decision to ship hybrid search in v0.3. Keyword-only search was missing
relevant results when queries used different terminology than stored documents.
Adopted pgvector for embeddings and Reciprocal Rank Fusion (RRF) for merging vector
and keyword result sets.

### 2025-04-01 — v0.3 Shipped

Released v0.3 with hybrid search, ClawHub bundle plugin, and several new skills.
Contract-first parity between CLI, MCP, and tools-json verified by automated tests.
103
test/e2e/fixtures/sources/crustdata-sarah-chen.md
Normal file
@@ -0,0 +1,103 @@
---
type: source
title: "Crustdata: Sarah Chen"
tags:
  - raw-data
  - enrichment
---

# Crustdata: Sarah Chen

Raw enrichment data retrieved from Crustdata people API for Sarah Chen, founder and
CEO of NovaMind.

## Profile

```
{
  "full_name": "Sarah L. Chen",
  "current_title": "Founder & CEO",
  "current_company": "NovaMind",
  "location": "San Francisco, CA",
  "email": "sarah@novamind.ai",
  "linkedin": "linkedin.com/in/sarahchen"
}
```

## Education

```
{
  "education": [
    {
      "institution": "Stanford University",
      "degree": "BS",
      "field": "Computer Science",
      "graduation_year": 2020
    }
  ]
}
```

## Work History

```
{
  "work_history": [
    {
      "company": "NovaMind",
      "title": "Founder & CEO",
      "start_date": "2024-06",
      "end_date": null,
      "description": "AI agent startup for enterprise workflow automation. YC W25."
    },
    {
      "company": "Anthropic",
      "title": "ML Engineer",
      "start_date": "2022-01",
      "end_date": "2024-05",
      "description": "Worked on tool-use and agentic behaviors in large language models. Contributed to Claude's function calling capabilities."
    },
    {
      "company": "Stanford AI Lab",
      "title": "Research Assistant",
      "start_date": "2019-06",
      "end_date": "2020-06",
      "description": "Undergraduate research on reinforcement learning for sequential decision making."
    }
  ]
}
```

## Skills & Expertise

```
{
  "skills": [
    "Machine Learning",
    "Large Language Models",
    "AI Agents",
    "Python",
    "PyTorch",
    "Distributed Systems",
    "Product Strategy"
  ]
}
```

## Notes

This data was retrieved on March 20, 2025. Used to enrich the Sarah Chen person page
with verified education and work history details. The Stanford CS 2020 graduation and
Anthropic ML Engineer 2022-2024 tenure have been cross-referenced and incorporated
into the compiled truth on the Sarah Chen page.

---

## Timeline

### 2025-03-20 — Data Retrieved

Pulled Sarah Chen profile from Crustdata API as part of post-Demo Day due diligence.
Data confirms Stanford CS background and Anthropic tenure. LinkedIn profile is active
and consistent with API data. Email sarah@novamind.ai verified as current.
199
test/e2e/helpers.ts
Normal file
@@ -0,0 +1,199 @@
/**
 * E2E test helpers: DB lifecycle, fixture import, timing, and diagnostics.
 *
 * Usage in test files:
 *   import { setupDB, teardownDB, importFixtures, time } from './helpers.ts';
 *   beforeAll(async () => { await setupDB(); await importFixtures(); });
 *   afterAll(async () => { await teardownDB(); });
 */

import { readFileSync, existsSync, readdirSync, statSync } from 'fs';
import { join, resolve, relative, dirname, basename, extname } from 'path';
import { PostgresEngine } from '../../src/core/postgres-engine.ts';
import * as db from '../../src/core/db.ts';
import { importFromContent } from '../../src/core/import-file.ts';
import { parseMarkdown } from '../../src/core/markdown.ts';

// Load .env.testing if present
const envPath = resolve(import.meta.dir, '../../.env.testing');
if (existsSync(envPath)) {
  const lines = readFileSync(envPath, 'utf-8').split('\n');
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq);
    const val = trimmed.slice(eq + 1);
    if (!process.env[key]) process.env[key] = val;
  }
}

const DATABASE_URL = process.env.DATABASE_URL;
const FIXTURES_DIR = resolve(import.meta.dir, 'fixtures');

let engine: PostgresEngine | null = null;

const ALL_TABLES = [
  'content_chunks',
  'links',
  'tags',
  'raw_data',
  'timeline_entries',
  'page_versions',
  'ingest_log',
  'files',
  'pages', // last because of foreign keys
  'config',
];

/**
 * Check if a real database is available for E2E tests.
 */
export function hasDatabase(): boolean {
  return !!DATABASE_URL;
}

/**
 * Connect to DB, run schema init, truncate all tables.
 * Call in beforeAll() of each test file.
 */
export async function setupDB(): Promise<PostgresEngine> {
  if (!DATABASE_URL) {
    throw new Error('DATABASE_URL not set. Copy .env.testing.example to .env.testing and configure it.');
  }

  // Disconnect any prior connection (clean slate)
  await db.disconnect();

  // Connect fresh
  await db.connect({ database_url: DATABASE_URL });
  await db.initSchema();

  // Truncate all data tables (preserves schema + extensions)
  const conn = db.getConnection();
  for (const table of ALL_TABLES) {
    await conn.unsafe(`TRUNCATE ${table} CASCADE`);
  }

  // Re-seed config (initSchema inserts default config rows)
  await conn.unsafe(`
    INSERT INTO config (key, value) VALUES ('schema_version', '1')
    ON CONFLICT (key) DO NOTHING
  `);

  engine = new PostgresEngine();
  await engine.connect({ database_url: DATABASE_URL });
  return engine;
}

/**
 * Disconnect from DB. Call in afterAll() of each test file.
 */
export async function teardownDB(): Promise<void> {
  if (engine) {
    await engine.disconnect();
    engine = null;
  }
  await db.disconnect();
}

/**
 * Get the current engine instance.
 */
export function getEngine(): PostgresEngine {
  if (!engine) throw new Error('setupDB() must be called first');
  return engine;
}

/**
 * Get a raw DB connection for direct queries.
 */
export function getConn() {
  return db.getConnection();
}

/**
 * Import all fixture files from test/e2e/fixtures/ into the brain.
 * Returns the list of import results.
 */
export async function importFixtures() {
  const e = getEngine();
  const results: Array<{ slug: string; status: string; chunks: number }> = [];

  const files = findMarkdownFiles(FIXTURES_DIR);
  for (const filePath of files) {
    const relPath = relative(FIXTURES_DIR, filePath);
    const content = readFileSync(filePath, 'utf-8');
    const parsed = parseMarkdown(content, relPath);
    const result = await importFromContent(e, parsed.slug, content, { noEmbed: true });
    results.push(result);
  }

  return results;
}

/**
 * Import a single fixture by its relative path within fixtures/.
 */
export async function importFixture(relativePath: string) {
  const e = getEngine();
  const filePath = join(FIXTURES_DIR, relativePath);
  const content = readFileSync(filePath, 'utf-8');
  const parsed = parseMarkdown(content, relativePath);
  return importFromContent(e, parsed.slug, content, { noEmbed: true });
}

/**
 * Recursively find all .md files in a directory.
 */
function findMarkdownFiles(dir: string): string[] {
  const results: string[] = [];
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    const stat = statSync(full);
    if (stat.isDirectory()) {
      results.push(...findMarkdownFiles(full));
    } else if (extname(entry) === '.md') {
      results.push(full);
    }
  }
  return results.sort();
}

/**
 * Time a function and return [result, durationMs].
 */
export async function time<T>(fn: () => Promise<T>): Promise<[T, number]> {
  const start = performance.now();
  const result = await fn();
  const dur = performance.now() - start;
  return [result, dur];
}

/**
 * Dump DB state for debugging on test failure.
 */
export async function dumpDBState(): Promise<string> {
  const conn = db.getConnection();
  const pages = await conn.unsafe(`SELECT slug, type, title FROM pages ORDER BY slug`);
  const chunkCount = await conn.unsafe(`SELECT count(*) as n FROM content_chunks`);
  const linkCount = await conn.unsafe(`SELECT count(*) as n FROM links`);
  const tagCount = await conn.unsafe(`SELECT count(*) as n FROM tags`);

  const lines = [
    `=== DB STATE DUMP ===`,
    `Pages (${pages.length}):`,
    ...pages.map((p: any) => ` ${p.slug} [${p.type}] "${p.title}"`),
    `Chunks: ${chunkCount[0]?.n ?? 0}`,
    `Links: ${linkCount[0]?.n ?? 0}`,
    `Tags: ${tagCount[0]?.n ?? 0}`,
    `=== END DUMP ===`,
  ];
  return lines.join('\n');
}

/**
 * Get the fixtures directory path.
 */
export const FIXTURES_PATH = FIXTURES_DIR;
67
test/e2e/mcp.test.ts
Normal file
@@ -0,0 +1,67 @@
/**
 * E2E MCP Protocol Test — Tier 1
 *
 * Verifies the MCP server can start and that the tools/list
 * from operations.ts generates correct tool definitions.
 *
 * Note: The full stdio MCP protocol test (spawn server, send JSON-RPC)
 * is complex because the MCP SDK uses its own transport layer. This test
 * verifies the tool generation logic directly, which is what matters for
 * agent compatibility.
 */

import { describe, test, expect } from 'bun:test';
import { operations } from '../../src/core/operations.ts';

describe('E2E: MCP Tool Generation', () => {
  test('operations generate valid MCP tool definitions', () => {
    // This replicates exactly what server.ts does in the tools/list handler
    const tools = operations.map(op => ({
      name: op.name,
      description: op.description,
      inputSchema: {
        type: 'object' as const,
        properties: Object.fromEntries(
          Object.entries(op.params).map(([k, v]) => [k, {
            type: v.type === 'array' ? 'array' : v.type,
            ...(v.description ? { description: v.description } : {}),
            ...(v.enum ? { enum: v.enum } : {}),
            ...(v.items ? { items: { type: v.items.type } } : {}),
          }]),
        ),
        required: Object.entries(op.params)
          .filter(([, v]) => v.required)
          .map(([k]) => k),
      },
    }));

    expect(tools.length).toBe(operations.length);
    expect(tools.length).toBeGreaterThanOrEqual(30);

    for (const tool of tools) {
      expect(tool.name).toBeTruthy();
      expect(tool.description).toBeTruthy();
      expect(tool.inputSchema.type).toBe('object');
      expect(typeof tool.inputSchema.properties).toBe('object');
      expect(Array.isArray(tool.inputSchema.required)).toBe(true);
    }

    // Verify specific tools exist
    const names = tools.map(t => t.name);
    expect(names).toContain('get_page');
    expect(names).toContain('put_page');
    expect(names).toContain('search');
    expect(names).toContain('query');
    expect(names).toContain('add_link');
    expect(names).toContain('get_health');
    expect(names).toContain('sync_brain');
    expect(names).toContain('file_upload');
  });

  test('MCP server module can be imported', async () => {
    // Verify the server module loads without errors
    const mod = await import('../../src/mcp/server.ts');
    expect(typeof mod.startMcpServer).toBe('function');
    expect(typeof mod.handleToolCall).toBe('function');
  });
});
722
test/e2e/mechanical.test.ts
Normal file
@@ -0,0 +1,722 @@
/**
 * E2E Mechanical Tests — Tier 1 (no API keys required)
 *
 * Tests all operations against a real Postgres+pgvector database.
 * Requires DATABASE_URL env var or .env.testing file.
 *
 * Run: DATABASE_URL=... bun test test/e2e/mechanical.test.ts
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { readFileSync, writeFileSync, mkdtempSync, rmSync } from 'fs';
import { join } from 'path';
import { execSync } from 'child_process';
import { tmpdir } from 'os';
import {
  hasDatabase, setupDB, teardownDB, getEngine, getConn,
  importFixtures, importFixture, time, dumpDBState, FIXTURES_PATH,
} from './helpers.ts';
import { operationsByName, operations } from '../../src/core/operations.ts';
import type { OperationContext } from '../../src/core/operations.ts';
import { importFromContent } from '../../src/core/import-file.ts';

// Skip all E2E tests if no database is configured
const skip = !hasDatabase();
const describeE2E = skip ? describe.skip : describe;

function makeCtx(): OperationContext {
  return {
    engine: getEngine(),
    config: { engine: 'postgres', database_url: process.env.DATABASE_URL! },
    logger: { info: () => {}, warn: () => {}, error: () => {} },
    dryRun: false,
  };
}

async function callOp(name: string, params: Record<string, unknown> = {}) {
  const op = operationsByName[name];
  if (!op) throw new Error(`Unknown operation: ${name}`);
  return op.handler(makeCtx(), params);
}

// ─────────────────────────────────────────────────────────────────
// Page CRUD
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Page CRUD', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('fixture import creates correct page count', async () => {
    const stats = await callOp('get_stats') as any;
    expect(stats.page_count).toBe(13);
  });

  test('get_page returns correct data for person', async () => {
    const page = await callOp('get_page', { slug: 'people/sarah-chen' }) as any;
    expect(page.title).toBe('Sarah Chen');
    expect(page.type).toBe('person');
    expect(page.compiled_truth).toContain('NovaMind');
    expect(page.tags).toContain('founder');
    expect(page.tags).toContain('yc-w25');
  });

  test('get_page returns correct data for concept', async () => {
    const page = await callOp('get_page', { slug: 'concepts/retrieval-augmented-generation' }) as any;
    expect(page.title).toBe('Retrieval-Augmented Generation');
    expect(page.type).toBe('concept');
    expect(page.compiled_truth).toContain('検索拡張生成');
  });

  test('get_page for company includes key details', async () => {
    const page = await callOp('get_page', { slug: 'companies/novamind' }) as any;
    expect(page.type).toBe('company');
    expect(page.compiled_truth).toContain('Sarah Chen');
  });

  test('list_pages type filter returns correct count', async () => {
    const people = await callOp('list_pages', { type: 'person' }) as any[];
    expect(people.length).toBe(3);

    const companies = await callOp('list_pages', { type: 'company' }) as any[];
    expect(companies.length).toBe(2);

    const concepts = await callOp('list_pages', { type: 'concept' }) as any[];
    expect(concepts.length).toBe(3);
  });

  test('list_pages tag filter works', async () => {
    const ycPages = await callOp('list_pages', { tag: 'yc-w25' }) as any[];
    expect(ycPages.length).toBeGreaterThanOrEqual(2);
    expect(ycPages.some((p: any) => p.slug === 'people/sarah-chen')).toBe(true);
  });

  test('put_page updates existing page', async () => {
    const updated = readFileSync(join(FIXTURES_PATH, 'people/sarah-chen.md'), 'utf-8')
      .replace('Stanford CS', 'MIT CS');
    // Use importFromContent directly with noEmbed to avoid OpenAI timeout
    const engine = getEngine();
    const result = await importFromContent(engine, 'people/sarah-chen', updated, { noEmbed: true });
    expect(result.status).toBe('imported');
    const page = await callOp('get_page', { slug: 'people/sarah-chen' }) as any;
    expect(page.compiled_truth).toContain('MIT CS');
  });

  test('delete_page removes page and others survive', async () => {
    await callOp('delete_page', { slug: 'sources/crustdata-sarah-chen' });
    const stats = await callOp('get_stats') as any;
    expect(stats.page_count).toBe(12);

    // Other pages still exist
    const sarah = await callOp('get_page', { slug: 'people/sarah-chen' }) as any;
    expect(sarah.title).toBe('Sarah Chen');
  });
});

// ─────────────────────────────────────────────────────────────────
// Search
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Search', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('keyword search for "NovaMind" returns multiple hits', async () => {
    const results = await callOp('search', { query: 'NovaMind' }) as any[];
    expect(results.length).toBeGreaterThanOrEqual(3);
    const slugs = results.map((r: any) => r.slug);
    expect(slugs).toContain('companies/novamind');
  });

  test('keyword search for "Threshold Ventures" finds investor', async () => {
    const results = await callOp('search', { query: 'Threshold Ventures' }) as any[];
    expect(results.length).toBeGreaterThanOrEqual(1);
    const slugs = results.map((r: any) => r.slug);
    expect(slugs).toContain('companies/threshold-ventures');
  });

  test('keyword search for "Stanford" finds Priya', async () => {
    const results = await callOp('search', { query: 'Stanford' }) as any[];
    expect(results.length).toBeGreaterThanOrEqual(1);
    const slugs = results.map((r: any) => r.slug);
    expect(slugs).toContain('people/priya-patel');
  });

  test('keyword search for nonexistent term returns empty', async () => {
    const results = await callOp('search', { query: 'xyznonexistent123' }) as any[];
    expect(results.length).toBe(0);
  });

  test('search quality: precision@5 for known queries', async () => {
    const groundTruth: Record<string, string[]> = {
      'NovaMind': ['people/sarah-chen', 'companies/novamind', 'deals/novamind-seed'],
      'hybrid search': ['concepts/hybrid-search', 'concepts/retrieval-augmented-generation'],
      'compiled truth': ['concepts/compiled-truth'],
    };

    const scores: Record<string, number> = {};
    for (const [query, expected] of Object.entries(groundTruth)) {
      const results = await callOp('search', { query, limit: 5 }) as any[];
      const topSlugs = results.slice(0, 5).map((r: any) => r.slug);
      const hits = expected.filter(e => topSlugs.includes(e));
      // Share of expected slugs found in the top 5 (a recall-style ratio),
      // reported below under the precision@5 label. Informational only: nothing is asserted.
      scores[query] = hits.length / Math.min(expected.length, 5);
    }

    console.log('\n Search Quality (precision@5, keyword-only):');
    for (const [query, score] of Object.entries(scores)) {
      console.log(` "${query}": ${(score * 100).toFixed(0)}%`);
    }
  });
});

// ─────────────────────────────────────────────────────────────────
// Links
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Links', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('add_link + get_links + get_backlinks round trip', async () => {
    await callOp('add_link', {
      from: 'people/sarah-chen',
      to: 'companies/novamind',
      link_type: 'founded',
      context: 'CEO and founder since 2024',
    });

    const links = await callOp('get_links', { slug: 'people/sarah-chen' }) as any[];
    expect(links.some((l: any) => l.to_slug === 'companies/novamind' || l.to_page_slug === 'companies/novamind')).toBe(true);

    const backlinks = await callOp('get_backlinks', { slug: 'companies/novamind' }) as any[];
    expect(backlinks.some((l: any) => l.from_slug === 'people/sarah-chen' || l.from_page_slug === 'people/sarah-chen')).toBe(true);
  });

  test('traverse_graph finds connected pages', async () => {
    // Links should already be added from prior test in this describe block
    const graph = await callOp('traverse_graph', { slug: 'people/sarah-chen', depth: 2 }) as any;
    expect(Array.isArray(graph)).toBe(true);
    expect(graph.length).toBeGreaterThanOrEqual(1);
  });

  test('remove_link removes the link', async () => {
    await callOp('add_link', { from: 'people/marcus-reid', to: 'companies/threshold-ventures' });
    await callOp('remove_link', { from: 'people/marcus-reid', to: 'companies/threshold-ventures' });

    const links = await callOp('get_links', { slug: 'people/marcus-reid' }) as any[];
    const hasLink = links.some((l: any) =>
      (l.to_slug || l.to_page_slug) === 'companies/threshold-ventures'
    );
    expect(hasLink).toBe(false);
  });
});

// ─────────────────────────────────────────────────────────────────
// Tags
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Tags', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('get_tags returns imported tags', async () => {
    const tags = await callOp('get_tags', { slug: 'people/sarah-chen' }) as string[];
    expect(tags).toContain('founder');
    expect(tags).toContain('yc-w25');
    expect(tags).toContain('ai-agents');
  });

  test('add_tag + remove_tag round trip', async () => {
    await callOp('add_tag', { slug: 'people/marcus-reid', tag: 'test-tag' });
    let tags = await callOp('get_tags', { slug: 'people/marcus-reid' }) as string[];
    expect(tags).toContain('test-tag');

    await callOp('remove_tag', { slug: 'people/marcus-reid', tag: 'test-tag' });
    tags = await callOp('get_tags', { slug: 'people/marcus-reid' }) as string[];
    expect(tags).not.toContain('test-tag');
  });

  test('list_pages with tag filter finds tagged pages', async () => {
    await callOp('add_tag', { slug: 'people/priya-patel', tag: 'test-search-tag' });
    const pages = await callOp('list_pages', { tag: 'test-search-tag' }) as any[];
    expect(pages.length).toBe(1);
    expect(pages[0].slug).toBe('people/priya-patel');
  });
});

// ─────────────────────────────────────────────────────────────────
// Timeline
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Timeline', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('add_timeline_entry + get_timeline round trip', async () => {
    await callOp('add_timeline_entry', {
      slug: 'people/sarah-chen',
      date: '2025-04-01',
      summary: 'Test timeline entry',
      detail: 'Added via E2E test',
      source: 'e2e-test',
    });

    const timeline = await callOp('get_timeline', { slug: 'people/sarah-chen' }) as any[];
    expect(timeline.length).toBeGreaterThanOrEqual(1);
    const entry = timeline.find((e: any) => e.summary === 'Test timeline entry');
    expect(entry).toBeDefined();
  });
});

// ─────────────────────────────────────────────────────────────────
// Versions
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Versions', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('put_page creates version, revert restores', async () => {
    const original = await callOp('get_page', { slug: 'people/sarah-chen' }) as any;

    // Modify page using importFromContent with noEmbed
    const modified = readFileSync(join(FIXTURES_PATH, 'people/sarah-chen.md'), 'utf-8')
      .replace('Sarah Chen', 'Sarah Chen (Modified)');
    const engine = getEngine();
    await importFromContent(engine, 'people/sarah-chen', modified, { noEmbed: true });

    // Check versions exist
    const versions = await callOp('get_versions', { slug: 'people/sarah-chen' }) as any[];
    expect(versions.length).toBeGreaterThanOrEqual(1);

    // Revert to first version
    const firstVersion = versions[versions.length - 1];
    await callOp('revert_version', { slug: 'people/sarah-chen', version_id: firstVersion.id });

    const reverted = await callOp('get_page', { slug: 'people/sarah-chen' }) as any;
    expect(reverted.compiled_truth).not.toContain('(Modified)');
  });
});

// ─────────────────────────────────────────────────────────────────
// Admin
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Admin', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('get_stats returns valid structure', async () => {
    const stats = await callOp('get_stats') as any;
    expect(stats.page_count).toBe(13);
    expect(typeof stats.chunk_count).toBe('number');
  });

  test('get_health returns valid structure', async () => {
    const health = await callOp('get_health') as any;
    expect(health).toBeDefined();
    expect(typeof health.page_count).toBe('number');
    expect(typeof health.embed_coverage).toBe('number');
  });
});

// ─────────────────────────────────────────────────────────────────
// Chunks & Resolution
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Chunks & Resolution', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('get_chunks returns chunks for imported page', async () => {
    const chunks = await callOp('get_chunks', { slug: 'people/sarah-chen' }) as any[];
    expect(chunks.length).toBeGreaterThan(0);
    expect(chunks[0].chunk_text).toBeTruthy();
  });

  test('resolve_slugs finds partial match', async () => {
    const matches = await callOp('resolve_slugs', { partial: 'sarah' }) as string[];
    expect(matches).toContain('people/sarah-chen');
  });

  test('resolve_slugs finds exact match', async () => {
    const matches = await callOp('resolve_slugs', { partial: 'people/sarah-chen' }) as string[];
    expect(matches).toContain('people/sarah-chen');
  });
});

// ─────────────────────────────────────────────────────────────────
// Ingest Log & Raw Data
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Ingest Log & Raw Data', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('log_ingest + get_ingest_log round trip', async () => {
    await callOp('log_ingest', {
      source_type: 'e2e-test',
      source_ref: 'test-run-1',
      pages_updated: ['people/sarah-chen', 'companies/novamind'],
      summary: 'E2E test ingest',
    });

    const log = await callOp('get_ingest_log', { limit: 5 }) as any[];
    expect(log.length).toBeGreaterThanOrEqual(1);
    const entry = log.find((e: any) => e.source_ref === 'test-run-1');
    expect(entry).toBeDefined();
    expect(entry.source_type).toBe('e2e-test');
  });

  test('put_raw_data + get_raw_data round trip', async () => {
    const testData = { education: 'Stanford CS 2020', title: 'CEO' };
    await callOp('put_raw_data', {
      slug: 'people/sarah-chen',
      source: 'crustdata',
      data: testData,
    });

    const raw = await callOp('get_raw_data', {
      slug: 'people/sarah-chen',
      source: 'crustdata',
    }) as any[];
    expect(raw.length).toBeGreaterThanOrEqual(1);
    // JSONB may come back as string or parsed object
    const data = typeof raw[0].data === 'string' ? JSON.parse(raw[0].data) : raw[0].data;
    expect(data.education).toBe('Stanford CS 2020');
    expect(data.title).toBe('CEO');
  });
});
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────
// Files (stub verification)
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Files', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('file_list returns empty initially', async () => {
    const files = await callOp('file_list', {}) as any[];
    expect(files.length).toBe(0);
  });

  test('file_upload stores metadata + file_list shows it', async () => {
    // Create a temp file
    const tmpDir = mkdtempSync(join(tmpdir(), 'gbrain-e2e-'));
    const tmpFile = join(tmpDir, 'test-doc.pdf');
    writeFileSync(tmpFile, 'fake pdf content');

    try {
      const result = await callOp('file_upload', {
        path: tmpFile,
        page_slug: 'people/sarah-chen',
      }) as any;
      expect(result.status).toBe('uploaded');
      expect(result.storage_path).toContain('sarah-chen');

      // Verify file_list
      const files = await callOp('file_list', {}) as any[];
      expect(files.length).toBe(1);

      // Verify file_url returns URI format
      const url = await callOp('file_url', { storage_path: result.storage_path }) as any;
      expect(url.url).toContain('gbrain:files/');
    } finally {
      rmSync(tmpDir, { recursive: true });
    }
  });
});

// ─────────────────────────────────────────────────────────────────
// Idempotency Stress
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Idempotency', () => {
  beforeAll(async () => {
    await setupDB();
  });
  afterAll(teardownDB);

  test('double import produces no duplicates', async () => {
    // First import
    await importFixtures();
    const stats1 = await callOp('get_stats') as any;

    // Second import (identical content)
    await importFixtures();
    const stats2 = await callOp('get_stats') as any;

    expect(stats2.page_count).toBe(stats1.page_count);
    expect(stats2.chunk_count).toBe(stats1.chunk_count);
  });

  test('modify one fixture, reimport, only that page updates', async () => {
    await importFixtures();
    const engine = getEngine();

    // Modify sarah-chen content
    const modified = readFileSync(join(FIXTURES_PATH, 'people/sarah-chen.md'), 'utf-8')
      .replace('Stanford CS', 'MIT CS');

    const result = await importFromContent(engine, 'people/sarah-chen', modified, { noEmbed: true });
    expect(result.status).toBe('imported');

    // Other pages should have been skipped if reimported
    const content = readFileSync(join(FIXTURES_PATH, 'people/marcus-reid.md'), 'utf-8');
    const { parseMarkdown } = await import('../../src/core/markdown.ts');
    const parsed = parseMarkdown(content, 'people/marcus-reid.md');
    const result2 = await importFromContent(engine, parsed.slug, content, { noEmbed: true });
    expect(result2.status).toBe('skipped');
  });
});

// ─────────────────────────────────────────────────────────────────
// Setup Journey (CLI subprocess tests)
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Setup Journey', () => {
  beforeAll(async () => {
    await setupDB();
  });
  afterAll(teardownDB);

  const cliCwd = join(import.meta.dir, '../..');
  const cliEnv = () => ({ ...process.env, DATABASE_URL: process.env.DATABASE_URL! });

  test('gbrain init --non-interactive connects and initializes', () => {
    const result = Bun.spawnSync({
      cmd: ['bun', 'run', 'src/cli.ts', 'init', '--non-interactive', '--url', process.env.DATABASE_URL!],
      cwd: cliCwd,
      env: cliEnv(),
      timeout: 15_000,
    });
    const stdout = new TextDecoder().decode(result.stdout);
    expect(result.exitCode).toBe(0);
    expect(stdout).toContain('Brain ready');
  });

  test('gbrain import imports fixtures via CLI', () => {
    const result = Bun.spawnSync({
      cmd: ['bun', 'run', 'src/cli.ts', 'import', '--no-embed', FIXTURES_PATH],
      cwd: cliCwd,
      env: cliEnv(),
      timeout: 30_000,
    });
    const stdout = new TextDecoder().decode(result.stdout);
    expect(result.exitCode).toBe(0);
    expect(stdout).toContain('imported');
  });

  test('gbrain search returns results via CLI', () => {
    const result = Bun.spawnSync({
      cmd: ['bun', 'run', 'src/cli.ts', 'search', 'NovaMind'],
      cwd: cliCwd,
      env: cliEnv(),
      timeout: 15_000,
    });
    const stdout = new TextDecoder().decode(result.stdout);
    expect(result.exitCode).toBe(0);
    expect(stdout.length).toBeGreaterThan(0);
  });

  test('gbrain stats shows page count via CLI', () => {
    const result = Bun.spawnSync({
      cmd: ['bun', 'run', 'src/cli.ts', 'stats'],
      cwd: cliCwd,
      env: cliEnv(),
      timeout: 15_000,
    });
    expect(result.exitCode).toBe(0);
  });

  test('gbrain health runs via CLI', () => {
    const result = Bun.spawnSync({
      cmd: ['bun', 'run', 'src/cli.ts', 'health'],
      cwd: cliCwd,
      env: cliEnv(),
      timeout: 15_000,
    });
    expect(result.exitCode).toBe(0);
  });
});

// ─────────────────────────────────────────────────────────────────
// Init Edge Cases
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Init Edge Cases', () => {
  afterAll(teardownDB);

  test('init --non-interactive without URL fails gracefully', () => {
    const env = { ...process.env };
    delete env.DATABASE_URL;
    delete env.GBRAIN_DATABASE_URL;
    const result = Bun.spawnSync({
      cmd: ['bun', 'run', 'src/cli.ts', 'init', '--non-interactive'],
      cwd: join(import.meta.dir, '../..'),
      env,
      timeout: 10_000,
    });
    expect(result.exitCode).not.toBe(0);
  });

  test('double init is idempotent', async () => {
    await setupDB();
    const conn = getConn();
    const before = await conn.unsafe(`SELECT count(*) as n FROM information_schema.tables WHERE table_schema = 'public'`);

    // Re-init
    const { initSchema } = await import('../../src/core/db.ts');
    await initSchema();

    const after = await conn.unsafe(`SELECT count(*) as n FROM information_schema.tables WHERE table_schema = 'public'`);
    expect(after[0].n).toBe(before[0].n);
  });

  test('init then import then re-init preserves pages', async () => {
    await setupDB();
    await importFixtures();
    const before = await callOp('get_stats') as any;

    const { initSchema } = await import('../../src/core/db.ts');
    await initSchema();

    const after = await callOp('get_stats') as any;
    expect(after.page_count).toBe(before.page_count);
  });
});

// ─────────────────────────────────────────────────────────────────
// Schema Idempotency
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Schema Idempotency', () => {
  beforeAll(async () => {
    await setupDB();
  });
  afterAll(teardownDB);

  test('initSchema twice produces no errors and same object count', async () => {
    const conn = getConn();
    const tables1 = await conn.unsafe(`SELECT count(*) as n FROM information_schema.tables WHERE table_schema = 'public'`);
    const indexes1 = await conn.unsafe(`SELECT count(*) as n FROM pg_indexes WHERE schemaname = 'public'`);

    const { initSchema } = await import('../../src/core/db.ts');
    await initSchema();

    const tables2 = await conn.unsafe(`SELECT count(*) as n FROM information_schema.tables WHERE table_schema = 'public'`);
    const indexes2 = await conn.unsafe(`SELECT count(*) as n FROM pg_indexes WHERE schemaname = 'public'`);

    expect(tables2[0].n).toBe(tables1[0].n);
    expect(indexes2[0].n).toBe(indexes1[0].n);
  });
});

// ─────────────────────────────────────────────────────────────────
// Schema Diff Guard
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Schema Diff Guard', () => {
  beforeAll(async () => {
    await setupDB();
  });
  afterAll(teardownDB);

  test('all expected tables exist', async () => {
    const conn = getConn();
    const tables = await conn.unsafe(`
      SELECT table_name FROM information_schema.tables
      WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
      ORDER BY table_name
    `);
    const tableNames = tables.map((t: any) => t.table_name);

    const expected = [
      'config', 'content_chunks', 'files', 'ingest_log',
      'links', 'page_versions', 'pages', 'raw_data',
      'tags', 'timeline_entries',
    ];
    for (const table of expected) {
      expect(tableNames).toContain(table);
    }
  });

  test('pgvector extension is installed', async () => {
    const conn = getConn();
    const ext = await conn.unsafe(`SELECT extname FROM pg_extension WHERE extname = 'vector'`);
    expect(ext.length).toBe(1);
  });

  test('pg_trgm extension is installed', async () => {
    const conn = getConn();
    const ext = await conn.unsafe(`SELECT extname FROM pg_extension WHERE extname = 'pg_trgm'`);
    expect(ext.length).toBe(1);
  });
});

// ─────────────────────────────────────────────────────────────────
// Performance Baselines
// ─────────────────────────────────────────────────────────────────

describeE2E('E2E: Performance Baselines', () => {
  beforeAll(async () => {
    await setupDB();
  });
  afterAll(teardownDB);

  test('import + search + link performance', async () => {
    const [, importMs] = await time(importFixtures);

    const searchTimes: number[] = [];
    for (const q of ['NovaMind', 'hybrid search', 'Stanford', 'investor', 'compiled truth']) {
      const [, ms] = await time(() => callOp('search', { query: q }));
      searchTimes.push(ms);
    }

    const [, linkMs] = await time(async () => {
      await callOp('add_link', { from: 'people/sarah-chen', to: 'companies/novamind' });
      await callOp('get_backlinks', { slug: 'companies/novamind' });
    });

    searchTimes.sort((a, b) => a - b);
    const p50 = searchTimes[Math.floor(searchTimes.length * 0.5)];
    // With only five samples, the reported p99 is simply the slowest query.
    const p99 = searchTimes[searchTimes.length - 1];

    console.log('\n Performance Baselines:');
    console.log(` Import 13 fixtures: ${importMs.toFixed(0)}ms`);
    console.log(` Search p50: ${p50.toFixed(0)}ms`);
    console.log(` Search p99: ${p99.toFixed(0)}ms`);
    console.log(` Link + backlink: ${linkMs.toFixed(0)}ms`);
  });
});
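A note on the percentile figures in the performance test: it sorts five search timings and reads p50 and p99 by index, so p99 is the maximum. If more samples were collected, a nearest-rank helper along the lines below could replace the manual indexing. This is only a sketch: `percentile` is a hypothetical helper that does not appear in this diff, and the sample timings are made up for illustration.

```typescript
// Hypothetical helper (not part of this change): nearest-rank percentile.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile: no samples');
  if (p < 0 || p > 100) throw new RangeError('percentile: p must be in [0, 100]');
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Illustrative timings in milliseconds (invented for this example).
const sampleTimes = [12, 7, 30, 9, 15];
console.log(`p50: ${percentile(sampleTimes, 50)}ms`);
console.log(`p99: ${percentile(sampleTimes, 99)}ms`);
```

For five samples this agrees with the index arithmetic used in the test; the difference only shows up once the sample count grows.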
156
test/e2e/skills.test.ts
Normal file
@@ -0,0 +1,156 @@
/**
 * E2E Skill Tests — Tier 2 (requires API keys + openclaw)
 *
 * Tests gbrain skills via OpenClaw CLI invocations.
 * Asserts on DB state changes, not LLM output text.
 *
 * Requires:
 *   - DATABASE_URL
 *   - OPENAI_API_KEY
 *   - ANTHROPIC_API_KEY
 *   - openclaw CLI installed
 *
 * Skips gracefully if any dependency is missing.
 * Run: bun test test/e2e/skills.test.ts
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { join } from 'path';
import { hasDatabase, setupDB, teardownDB, importFixtures, getEngine } from './helpers.ts';

// Check all Tier 2 dependencies
function hasTier2Deps(): boolean {
  if (!hasDatabase()) return false;
  if (!process.env.OPENAI_API_KEY) return false;
  if (!process.env.ANTHROPIC_API_KEY) return false;

  // Check if openclaw is installed
  try {
    const result = Bun.spawnSync({ cmd: ['openclaw', '--version'] });
    return result.exitCode === 0;
  } catch {
    return false;
  }
}

const skip = !hasTier2Deps();
const describeT2 = skip ? describe.skip : describe;

if (skip) {
  test.skip('Tier 2 tests skipped (missing dependencies)', () => {});
  if (!hasDatabase()) console.log(' Skip reason: DATABASE_URL not set');
  else if (!process.env.OPENAI_API_KEY) console.log(' Skip reason: OPENAI_API_KEY not set');
  else if (!process.env.ANTHROPIC_API_KEY) console.log(' Skip reason: ANTHROPIC_API_KEY not set');
  else console.log(' Skip reason: openclaw CLI not installed');
}

/**
 * Run openclaw with a prompt and gbrain MCP configured.
 * Returns { stdout, stderr, exitCode }.
 */
function runOpenClaw(prompt: string, timeoutMs = 60_000) {
  const result = Bun.spawnSync({
    cmd: ['openclaw', '-p', prompt],
    cwd: join(import.meta.dir, '../..'),
    env: {
      ...process.env,
      // Ensure openclaw knows about gbrain MCP server
    },
    timeout: timeoutMs,
  });

  return {
    stdout: new TextDecoder().decode(result.stdout),
    stderr: new TextDecoder().decode(result.stderr),
    exitCode: result.exitCode,
  };
}

// ─────────────────────────────────────────────────────────────────
// Ingest Skill
// ─────────────────────────────────────────────────────────────────

describeT2('E2E Tier 2: Ingest Skill', () => {
  beforeAll(async () => {
    await setupDB();
  });
  afterAll(teardownDB);

  test('ingest a meeting transcript creates person pages and links', async () => {
    const transcript = `
Meeting: NovaMind Board Update — April 1, 2025
Attendees: Sarah Chen (CEO), Marcus Reid (Board, Threshold), David Kim (CFO)

Sarah presented Q1 metrics: 3 enterprise design partners signed, 47% MoM revenue growth.
Marcus asked about competitive positioning vs AutoAgent and CopilotStack.
David Kim presented runway analysis: 18 months at current burn rate.
Decision: Hire VP Sales by end of Q2.
Action: Sarah to draft VP Sales job description by April 7.
`.trim();

    const { stdout, exitCode } = runOpenClaw(
      `Ingest this meeting transcript into gbrain. Create or update pages for each person mentioned. Add timeline entries for today's date. Here is the transcript:\n\n${transcript}`,
      120_000,
    );

    // Assert on DB state, not LLM output
    const engine = getEngine();
    const stats = await engine.getStats();
    expect(stats.page_count).toBeGreaterThan(0);

    // Check if person pages were created (may use different slug formats)
    const pages = await engine.listPages({ type: 'person' });
    const pageNames = pages.map((p: any) => p.title?.toLowerCase() || '');

    // At minimum, the transcript mentions 3 people
    // The LLM may or may not create pages for all of them
    // We assert that at least some pages were created
    expect(pages.length).toBeGreaterThanOrEqual(1);
  }, 180_000);
});

// ─────────────────────────────────────────────────────────────────
// Query Skill
// ─────────────────────────────────────────────────────────────────

describeT2('E2E Tier 2: Query Skill', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('query skill returns results for known topic', async () => {
    const { stdout, exitCode } = runOpenClaw(
      'Search gbrain for "hybrid search" and tell me what you found.',
      120_000,
    );

    // The response should mention something about search
    expect(stdout.length).toBeGreaterThan(0);
    // exitCode 0 means the skill ran without errors
    expect(exitCode).toBe(0);
  }, 180_000);
});

// ─────────────────────────────────────────────────────────────────
// Health Skill
// ─────────────────────────────────────────────────────────────────

describeT2('E2E Tier 2: Health Skill', () => {
  beforeAll(async () => {
    await setupDB();
    await importFixtures();
  });
  afterAll(teardownDB);

  test('health skill reports brain status', async () => {
    const { stdout, exitCode } = runOpenClaw(
      'Check gbrain health and report the status.',
      120_000,
    );

    expect(stdout.length).toBeGreaterThan(0);
    expect(exitCode).toBe(0);
  }, 180_000);
});
@@ -6,7 +6,7 @@ import type { BrainEngine } from '../src/core/engine.ts';

const TMP = join(import.meta.dir, '.tmp-import-test');

// Minimal mock engine that tracks calls
// Minimal mock engine that tracks calls and supports transaction()
function mockEngine(overrides: Partial<Record<string, any>> = {}): BrainEngine {
  const calls: { method: string; args: any[] }[] = [];
  const track = (method: string) => (...args: any[]) => {
@@ -20,6 +20,8 @@ function mockEngine(overrides: Partial<Record<string, any>> = {}): BrainEngine {
      if (prop === '_calls') return calls;
      if (prop === 'getTags') return overrides.getTags || (() => Promise.resolve([]));
      if (prop === 'getPage') return overrides.getPage || (() => Promise.resolve(null));
      // transaction: just call the fn with the same engine (no real DB transaction in tests)
      if (prop === 'transaction') return async (fn: (tx: BrainEngine) => Promise<any>) => fn(engine);
      return track(prop);
    },
  });
@@ -74,7 +76,6 @@ This is the compiled truth.

  test('skips files larger than MAX_FILE_SIZE (1MB)', async () => {
    const filePath = join(TMP, 'big-file.md');
    // Create a file > 1MB
    const bigContent = '---\ntitle: Big\n---\n' + 'x'.repeat(1_100_000);
    writeFileSync(filePath, bigContent);

@@ -83,7 +84,6 @@ This is the compiled truth.

    expect(result.status).toBe('skipped');
    expect(result.error).toContain('too large');
    // Engine should NOT have been called
    expect((engine as any)._calls.length).toBe(0);
  });

@@ -97,10 +97,26 @@ title: Unchanged
Same content.
`);

    // Mock engine returns a page with matching hash
    // Hash now includes ALL fields (title, type, frontmatter, tags)
    const { createHash } = await import('crypto');
    const { parseMarkdown } = await import('../src/core/markdown.ts');
    const content = `---
type: concept
title: Unchanged
---

Same content.
`;
    const parsed = parseMarkdown(content, 'concepts/unchanged.md');
    const hash = createHash('sha256')
      .update('Same content.\n---\n')
      .update(JSON.stringify({
        title: parsed.title,
        type: parsed.type,
        compiled_truth: parsed.compiled_truth,
        timeline: parsed.timeline,
        frontmatter: parsed.frontmatter,
        tags: parsed.tags.sort(),
      }))
      .digest('hex');

    const engine = mockEngine({
@@ -110,7 +126,6 @@ Same content.
    const result = await importFile(engine, filePath, 'concepts/unchanged.md', { noEmbed: true });
    expect(result.status).toBe('skipped');

    // putPage should NOT have been called
    const calls = (engine as any)._calls;
    const putCall = calls.find((c: any) => c.method === 'putPage');
    expect(putCall).toBeUndefined();
@@ -138,11 +153,8 @@ Content here.
    const removeCalls = calls.filter((c: any) => c.method === 'removeTag');
    const addCalls = calls.filter((c: any) => c.method === 'addTag');

    // old-tag should be removed (not in new set)
    expect(removeCalls.length).toBe(1);
    expect(removeCalls[0].args[1]).toBe('old-tag');

    // new-tag and kept-tag should be added
    expect(addCalls.length).toBe(2);
  });

@@ -164,7 +176,7 @@ This is compiled truth content that should be chunked as compiled_truth source.
    const result = await importFile(engine, filePath, 'concepts/chunked.md', { noEmbed: true });

    expect(result.status).toBe('imported');
    expect(result.chunks).toBeGreaterThanOrEqual(2); // at least 1 CT + 1 TL
    expect(result.chunks).toBeGreaterThanOrEqual(2);

    const calls = (engine as any)._calls;
    const chunkCall = calls.find((c: any) => c.method === 'upsertChunks');
@@ -231,7 +243,6 @@ Content to chunk but not embed.
    const result = await importFile(engine, filePath, 'concepts/no-embed.md', { noEmbed: true });

    expect(result.status).toBe('imported');
    // Chunks should NOT have embeddings
    const calls = (engine as any)._calls;
    const chunkCall = calls.find((c: any) => c.method === 'upsertChunks');
    if (chunkCall) {
95
test/parity.test.ts
Normal file
@@ -0,0 +1,95 @@
import { describe, test, expect } from 'bun:test';
import { operations, operationsByName } from '../src/core/operations.ts';
import type { Operation } from '../src/core/operations.ts';

describe('operations contract parity', () => {
  test('every operation has a unique name', () => {
    const names = operations.map(op => op.name);
    expect(new Set(names).size).toBe(names.length);
  });

  test('every operation has required fields', () => {
    for (const op of operations) {
      expect(op.name).toBeTruthy();
      expect(op.description).toBeTruthy();
      expect(typeof op.handler).toBe('function');
      expect(op.params).toBeDefined();
    }
  });

  test('operationsByName matches operations array', () => {
    expect(Object.keys(operationsByName).length).toBe(operations.length);
    for (const op of operations) {
      expect(operationsByName[op.name]).toBe(op);
    }
  });

  test('every required param has a type', () => {
    for (const op of operations) {
      // Checks every declared param (required or optional) for a known type
      for (const [, def] of Object.entries(op.params)) {
        expect(['string', 'number', 'boolean', 'object', 'array']).toContain(def.type);
      }
    }
  });

  test('mutating operations have dry_run support', () => {
    const mutating = operations.filter(op => op.mutating);
    expect(mutating.length).toBeGreaterThan(0);
    // This only confirms the mutating flag is set; dry_run behavior itself is not exercised here
    for (const op of mutating) {
      expect(op.mutating).toBe(true);
    }
  });

  test('CLI names are unique across operations', () => {
    const cliNames = operations
      .filter(op => op.cliHints?.name)
      .map(op => op.cliHints!.name!);
    expect(new Set(cliNames).size).toBe(cliNames.length);
  });

  test('CLI positional params reference valid param names', () => {
    for (const op of operations) {
      if (op.cliHints?.positional) {
        for (const pos of op.cliHints.positional) {
          expect(op.params).toHaveProperty(pos);
        }
      }
    }
  });

  test('CLI stdin param references a valid param name', () => {
    for (const op of operations) {
      if (op.cliHints?.stdin) {
        expect(op.params).toHaveProperty(op.cliHints.stdin);
      }
    }
  });

  test('operations count is at least 30', () => {
    expect(operations.length).toBeGreaterThanOrEqual(30);
  });

  test('MCP tool definitions can be generated from operations', () => {
    const tools = operations.map(op => ({
      name: op.name,
      inputSchema: {
        type: 'object',
        properties: Object.fromEntries(
          Object.entries(op.params).map(([k, v]) => [k, { type: v.type }]),
        ),
        required: Object.entries(op.params)
          .filter(([, v]) => v.required)
          .map(([k]) => k),
      },
    }));

    // Every operation generates a valid tool definition
    for (const tool of tools) {
      expect(tool.name).toBeTruthy();
      expect(tool.inputSchema.type).toBe('object');
      expect(typeof tool.inputSchema.properties).toBe('object');
      expect(Array.isArray(tool.inputSchema.required)).toBe(true);
    }
  });
});
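The `mutating operations have dry_run support` test filters on `op.mutating` and then asserts that same flag, so it cannot fail for an operation that lacks dry-run handling. A stricter structural check might look like the sketch below. It assumes, hypothetically, that each mutating operation declares a boolean `dry_run` entry in `params`; the real `Operation` shape in `src/core/operations.ts` is not shown in this diff, so `SketchOperation`, `missingDryRun`, and the sample operations are stand-ins for illustration only.

```typescript
// Stand-in types; the real Operation interface may differ.
interface SketchParam {
  type: string;
  required?: boolean;
}

interface SketchOperation {
  name: string;
  mutating?: boolean;
  params: Record<string, SketchParam>;
}

// Returns the names of mutating operations that lack a boolean `dry_run` param.
function missingDryRun(ops: SketchOperation[]): string[] {
  return ops
    .filter(op => op.mutating)
    .filter(op => op.params.dry_run?.type !== 'boolean')
    .map(op => op.name);
}

// Illustrative data only; these are not the project's real operation definitions.
const sampleOps: SketchOperation[] = [
  { name: 'get_page', params: { slug: { type: 'string', required: true } } },
  {
    name: 'put_page',
    mutating: true,
    params: { slug: { type: 'string', required: true }, dry_run: { type: 'boolean' } },
  },
  { name: 'delete_page', mutating: true, params: { slug: { type: 'string', required: true } } },
];

console.log(missingDryRun(sampleOps));
```

In a parity test, the equivalent assertion would be that the returned list is empty, which would name the offending operations on failure.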