Files

Garry Tan baf3517868 feat: v0.9.0 -- smart file storage, publish, production-grade skills (#62 )

* feat: battle-tested skill patterns from production deployment

Backport production-learned brain-operations patterns:
- Iron Law of Back-Linking (mandatory bidirectional linking)
- Brain filing rules (file by primary subject, not format)
- Enrichment protocol (7-step pipeline, 3-tier system, person/company templates)
- Media ingest workflows (articles, videos, podcasts, PDFs, screenshots)
- Citation requirements (mandatory [Source: ...] on every fact)
- Test Before Bulk operating principle
- Voice recipe: unicode crash fix, PII scrub, identity-first prompt, DIY STT+LLM+TTS
- X-to-Brain recipe: image OCR, Filtered Stream, tweet rating rubric, cron stagger

* chore: bump version and changelog (v0.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add _brain-filing-rules.md to CLAUDE.md key files

* feat: smart file upload with TUS resumable and .redirect.yaml pointers

- Supabase Storage auto-selects upload method by file size:
  < 100 MB standard POST, >= 100 MB TUS resumable (6 MB chunks + retry)
- Signed URL generation for private bucket access (1-hour expiry)
- New `upload-raw` command with size routing: small text stays in git,
  large/media files go to cloud with .redirect.yaml pointer
- New `signed-url` command for generating access links
- File resolver supports both .redirect.yaml (v0.9+) and .redirect (legacy)
- Redirect format upgraded: 10 fields with full metadata
- All migration commands (mirror, redirect, restore, clean) handle both formats

* feat: skills reference actual gbrain file commands

- Filing rules document upload-raw, signed-url, and .redirect.yaml format
- Ingest skill uses gbrain files upload-raw for raw source preservation
- Maintain skill adds file storage health checks
- Setup skill adds storage configuration phase with migration guidance
- Voice recipe uses upload-raw for call audio storage
- Migration v0.9.0 with complete storage setup instructions

* chore: bump version and changelog (v0.9.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: gbrain publish -- shareable HTML with password protection

First code+skill pair: deterministic code does the work (strip private data,
encrypt with AES-256-GCM, generate self-contained HTML), the skill tells the
agent when and how to use it. 34 new tests.

See: https://x.com/garrytan/status/2042925773300908103

* feat: backlinks check/fix, page lint, and report commands

Three new deterministic tools (zero LLM calls):

- gbrain backlinks check/fix -- scans brain for entity mentions without
  back-links, creates them. Enforces the Iron Law from the skills.
- gbrain lint [--fix] -- catches LLM preambles, code fence wrapping,
  placeholder dates, missing frontmatter, broken citations, empty sections.
  --fix auto-strips fixable artifacts.
- gbrain report --type <name> -- saves timestamped reports to
  brain/reports/{type}/YYYY-MM-DD-HHMM.md for audit trails.

33 new tests (409 total, 0 fail).

* feat: v0.9.0 migration tells agents to swap scripts for built-in commands

Migration file now:
- Lists all 5 new deterministic commands with usage examples
- Includes a script-to-command replacement table (old -> new)
- Tells the agent to find custom script references in AGENTS.md,
  skills, and cron jobs and replace with gbrain commands
- Adds recommended cron jobs for daily backlink fix + weekly lint
- References the Thin Harness, Fat Skills thread

* fix: CLI routing bugs found during DX review

- Fixed subArgs reference error in handleCliOnly (used wrong variable name)
- Renamed gbrain backlinks check/fix to gbrain check-backlinks to avoid
  conflict with existing backlinks operation (per-page incoming links)
- Added TOOLS section to --help output showing publish, check-backlinks,
  lint, report
- Added upload-raw and signed-url to FILES section in --help
- Updated all docs/migration references to use check-backlinks

* fix: security hardening from adversarial review

- XSS: sanitize marked.parse() output (strip script/iframe/on* attrs)
- Path traversal: validate report --type against [a-z0-9-] pattern
- TUS: HEAD request before retry to get server's actual offset (TUS spec)
- Pointer: upload-raw now includes pointer content in JSON output
- Symlinks: use lstatSync in all walkers to prevent directory escape

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 21:46:07 -10:00

32 KiB

Raw Blame History

Changelog

All notable changes to GBrain will be documented in this file.

[0.9.0] - 2026-04-11

Added

Large files don't bloat your git repo anymore. gbrain files upload-raw auto-routes by size: text and PDFs under 100 MB stay in git, everything larger (or any media file) goes to Supabase Storage with a .redirect.yaml pointer left in the repo. Files over 100 MB use TUS resumable upload (6 MB chunks with retry and backoff) so a flaky connection doesn't lose a 2 GB video upload. gbrain files signed-url generates 1-hour access links for private buckets.
The full file migration lifecycle works end to end. mirror uploads to cloud and keeps local copies. redirect replaces local files with .redirect.yaml pointers (verifies remote exists first, won't delete data). restore downloads back from cloud. clean removes pointers when you're sure. status shows where you are. Three states, zero data loss risk.
Your brain now enforces its own graph integrity. The Iron Law of Back-Linking is mandatory across all skills. Every mention of a person or company creates a bidirectional link. This transforms your brain from a flat file store into a traversable knowledge graph.
Filing rules prevent the #1 brain mistake. New skills/_brain-filing-rules.md stops the most common error: dumping everything into sources/. File by primary subject, not format. Includes notability gate and citation requirements.
Enrichment protocol that actually works. Rewritten from a 46-line API list to a 7-step pipeline with 3-tier system, person/company page templates, pluggable data sources, validation rules, and bulk enrichment safety.
Ingest handles everything. Articles, videos, podcasts, PDFs, screenshots, meeting transcripts, social media. Each with a workflow that uses real gbrain commands (upload-raw, signed-url) instead of theoretical patterns.
Citation requirements across all skills. Every fact needs inline [Source: ...] citations. Three formats, source precedence hierarchy.
Maintain skill catches what you missed. Back-link enforcement, citation audit, filing violations, file storage health checks, benchmark testing.
Voice calls don't crash on em dashes anymore. Unicode sanitization for Twilio WebSocket, PII scrub, identity-first prompt, DIY STT+LLM+TTS pipeline option, Smart VAD default, auto-upload call audio via gbrain files upload-raw.
X-to-Brain gets eyes. Image OCR, Filtered Stream real-time monitoring, 6-dimension tweet rating rubric, outbound tweet monitoring, cron staggering.
Share brain pages without exposing the brain. gbrain publish generates beautiful, self-contained HTML from any brain page. Strips private data (frontmatter, citations, confirmations, brain links, timeline) automatically. Optional AES-256-GCM password gate with client-side decryption, no server needed. Dark/light mode, mobile-optimized typography. This is the first code+skill pair: deterministic code does the work, the skill tells the agent when and how. See the Thin Harness, Fat Skills thread for the architecture philosophy.

Changed

Supabase Storage now auto-selects upload method by file size: standard POST for < 100 MB, TUS resumable for >= 100 MB. Signed URL generation for private bucket access (1-hour expiry).
File resolver supports both .redirect.yaml (v0.9+) and legacy .redirect (v0.8) formats for backward compatibility.
Redirect format upgraded from .redirect (5 fields) to .redirect.yaml (10 fields: target, bucket, storage_path, size, size_human, hash, mime, uploaded, source_url, type).
All skills updated to reference actual gbrain files commands instead of theoretical patterns.
Back-link enforcer closes the loop. gbrain check-backlinks check scans your brain for entity mentions without back-links. gbrain check-backlinks fix creates them. The Iron Law of Back-Linking is in every skill, now the code enforces it.
Page linter catches LLM slop. gbrain lint flags "Of course! Here is..." preambles, wrapping code fences, placeholder dates, missing frontmatter, broken citations, and empty sections. gbrain lint --fix auto-strips the fixable ones. Every brain that uses AI for ingestion accumulates this. Now it's one command.
Audit trail for everything. gbrain report --type enrichment-sweep saves timestamped reports to brain/reports/{type}/YYYY-MM-DD-HHMM.md. The maintain skill references this for enrichment sweeps, meeting syncs, and maintenance runs.
Publish skill added to manifest (8th skill). First code+skill pair.
Skills version bumped to 0.9.0.
67 new unit tests across publish, backlinks, lint, and report. Total: 409 pass.

[0.8.0] - 2026-04-11

Added

Your AI can answer the phone now. Voice-to-brain v0.8.0 ships 25 production patterns from a real deployment. WebRTC works in a browser tab with just an OpenAI key, phone number via Twilio is optional. Your agent picks its own name and personality. Pre-computed engagement bids mean it greets you with something specific ("dude, your social radar caught something wild today"), not "how can I help you?" Context-first prompts, proactive advisor mode, caller routing, dynamic noise suppression, stuck watchdog, thinking sounds during tool calls. This is the "Her" experience, out of the box.
Upgrade = feature discovery. When you upgrade to v0.8.0, the CLI tells you what's new and your agent offers to set up voice immediately. WebRTC-first (zero setup), then asks about a phone number. Migration files now have YAML frontmatter with feature_pitch so every future version can pitch its headline feature through the upgrade flow.
Remote MCP simplified. The Supabase Edge Function deployment is gone. Remote MCP now uses a self-hosted server + ngrok tunnel. Simpler, more reliable, works with any AI client. All docs/mcp/ guides updated to reflect the actual production architecture.

Changed

Voice recipe is now 25 production patterns deep. Identity separation, pre-computed bid system, context-first prompts, proactive advisor mode, conversation timing (the #1 fix), no-repetition rule, radical prompt compression (13K to 4.7K tokens), OpenAI Realtime Prompting Guide structure, auth-before-speech, brain escalation, stuck watchdog, never-hang-up rule, thinking sounds, fallback TwiML, tool set architecture, trusted user auth, caller routing, dynamic VAD, on-screen debug UI, live moment capture, belt-and-suspenders post-call, mandatory 3-step post-call, WebRTC parity, dual API event handling, report-aware query routing.
WebRTC session pseudocode updated. Native FormData, tools in session config, type: 'realtime' on all session.update calls. WebRTC transcription NOT supported over data channel (use Whisper post-call).
MCP docs rewritten. All per-client guides (Claude Code, Claude Desktop, Cowork, Perplexity) updated from Edge Function URLs to self-hosted + ngrok pattern.

Removed

Supabase Edge Function MCP deployment. scripts/deploy-remote.sh, supabase/functions/gbrain-mcp/, src/edge-entry.ts, .env.production.example, docs/mcp/CHATGPT.md all removed. The Edge Function never worked reliably. Self-hosted + ngrok is the path.

[0.7.0] - 2026-04-11

Added

Your brain now runs locally with zero infrastructure. PGLite (Postgres 17.5 compiled to WASM) gives you the exact same search quality as Supabase, same pgvector HNSW, same pg_trgm fuzzy matching, same tsvector full-text search. No server, no subscription, no API keys needed for keyword search. gbrain init and you're running in 2 seconds.
Smart init defaults to local. gbrain init now creates a PGLite brain by default. If your repo has 1000+ markdown files, it suggests Supabase for scale. --supabase and --pglite flags let you choose explicitly.
Migrate between engines anytime. gbrain migrate --to supabase transfers your entire brain (pages, chunks, embeddings, tags, links, timeline) to remote Postgres with manifest-based resume. gbrain migrate --to pglite goes the other way. Embeddings copy directly, no re-embedding needed.
Pluggable engine factory. createEngine() dynamically loads the right engine from config. PGLite WASM is never loaded for Postgres users.
Search works without OpenAI. hybridSearch now checks for OPENAI_API_KEY before attempting embeddings. No key = keyword-only search. No more crashes when you just want to search your local brain.
Your brain gets new senses automatically. Integration recipes teach your agent how to wire up voice calls, email, Twitter, and calendar into your brain. Run gbrain integrations to see what's available. Your agent reads the recipe, asks for API keys, validates each one, and sets everything up. Markdown is code -- the recipe IS the installer.
Voice-to-brain: phone calls create brain pages. The first recipe: Twilio + OpenAI Realtime voice agent. Call a number, talk, and a structured brain page appears with entity detection, cross-references, and a summary posted to your messaging app. Opinionated defaults: caller screening, brain-first lookup, quiet hours, thinking sounds. The smoke test calls YOU (outbound) so you experience the magic immediately.
gbrain integrations command. Six subcommands for managing integration recipes: list (dashboard of senses + reflexes), show (recipe details), status (credential checks with direct links to get missing keys), doctor (health checks), stats (signal analytics), test (recipe validation). --json on every subcommand for agent-parseable output. No database connection needed.
Health heartbeat. Integrations log events to ~/.gbrain/integrations/<id>/heartbeat.jsonl. Status checks detect stale integrations and include diagnostic steps.
17 individually linkable SKILLPACK guides. The 1,281-line monolith is now broken into standalone guides at docs/guides/, organized by category. Each guide is individually searchable and linkable. The SKILLPACK index stays at the same URL (backward compatible).
"Getting Data In" documentation. New docs/integrations/ with a landing page, recipe format documentation, credential gateway guide, and meeting webhook guide. Explains the deterministic collector pattern: code for data, LLMs for judgment.
Architecture and philosophy docs. docs/architecture/infra-layer.md documents the shared foundation (import, chunk, embed, search). docs/ethos/THIN_HARNESS_FAT_SKILLS.md is Garry's essay on the architecture philosophy with an agent decision guide. docs/designs/HOMEBREW_FOR_PERSONAL_AI.md maps the 10-star vision.

Changed

Engine interface expanded. Added runMigration() (replaces internal driver access for schema migrations) and getChunksWithEmbeddings() (loads embedding data for cross-engine migration).
Shared utilities extracted. validateSlug, contentHash, and row mappers moved from postgres-engine.ts to src/core/utils.ts. Both engines share them.
Config infers engine type. If database_path is set but engine is missing, config now infers pglite instead of defaulting to postgres.
Import serializes on PGLite. Parallel workers are Postgres-only. PGLite uses sequential import (single-connection architecture).

[0.6.1] - 2026-04-10

Fixed

Import no longer silently drops files with "..." in the name. The path traversal check rejected any filename containing two consecutive dots, killing 1.2% of files in real-world corpora (YouTube transcripts, TED talks, podcast titles). Now only rejects actual traversal patterns like ../. Community fix wave, 8 contributors.
Import no longer crashes on JavaScript/TypeScript projects. The file walker crashed on node_modules directories and broken symlinks. Now skips node_modules and handles broken symlinks gracefully with a warning.
gbrain init exits cleanly after setup. Previously hung forever because stdin stayed open. Now pauses stdin after reading input.
pgvector extension auto-created during init. No more copy-pasting SQL into the Supabase editor. gbrain init now runs CREATE EXTENSION IF NOT EXISTS vector automatically, with a clear fallback message if it can't.
Supabase connection string hint matches current dashboard UI. Updated navigation path to match the 2026 Supabase dashboard layout.
Hermes Agent link fixed in README. Pointed to the correct NousResearch GitHub repo.

Changed

Search is faster. Keyword search now runs in parallel with the embedding pipeline instead of waiting for it. Saves ~200-500ms per hybrid search call.
.mdx files are now importable. The import walker, sync filter, and slug generator all recognize .mdx alongside .md.

Added

Community PR wave process documented in CLAUDE.md for future contributor batches.

Contributors

Thank you to everyone who reported bugs, submitted fixes, and helped make GBrain better:

@orendi84 — slug validator ellipsis fix (PR #31)
@mattbratos — import walker resilience + MDX support (PRs #26, #27)
@changergosum — init exit fix + auto pgvector (PRs #17, #18)
@eric-hth — Supabase UI hint update (PR #30)
@irresi — parallel hybrid search (PR #8)
@howardpen9 — Hermes Agent link fix (PR #34)
@cktang88 — the thorough 12-bug report that drove v0.6.0 (Issue #22)
@mvanhorn — MCP schema handler fix (PR #25)

[0.6.0] - 2026-04-10

Added

Access your brain from any AI client. Deploy GBrain as a serverless remote MCP endpoint on your existing Supabase instance. Works with Claude Desktop, Claude Code, Cowork, and Perplexity Computer. One URL, bearer token auth, zero new infrastructure. Clone the repo, fill in 3 env vars, run scripts/deploy-remote.sh, done.
Per-client setup guides in docs/mcp/ for Claude Code, Claude Desktop, Cowork, Perplexity, and ChatGPT (coming soon, requires OAuth 2.1). Also documents Tailscale Funnel and ngrok as self-hosted alternatives.
Token management via standalone src/commands/auth.ts. Create, list, revoke per-client bearer tokens. Includes smoke test: auth.ts test <url> --token <token> verifies the full pipeline (initialize + tools/list + get_stats) in 3 seconds.
Usage logging via mcp_request_log table. Every remote tool call logs token name, operation, latency, and status for debugging and security auditing.
Hardened health endpoint at /health. Unauthenticated: 200/503 only (no info disclosure). Authenticated: checks postgres, pgvector, and OpenAI API key status.

Fixed

MCP server actually connects now. Handler registration used string literals ('tools/list' as any) instead of SDK typed schemas. Replaced with ListToolsRequestSchema and CallToolRequestSchema. Without this fix, gbrain serve silently failed to register handlers. (Issue #9)
Search results no longer flooded by one large page. Keyword search returned ALL chunks from matching pages. Now returns one best chunk per page via DISTINCT ON. (Issue #22)
Search dedup no longer collapses to one chunk per page. Layer 1 kept only the single highest-scoring chunk per slug. Now keeps top 3, letting later dedup layers (text similarity, cap per page) do their job. (Issue #22)
Transactions no longer corrupt shared state. Both PostgresEngine.transaction() and db.withTransaction() swapped the shared connection reference, breaking under concurrent use. Now uses scoped engine via Object.create with no shared state mutation. (Issue #22)
embed --stale no longer wipes valid embeddings. upsertChunks() deleted all chunks then re-inserted, writing NULL for chunks without new embeddings. Now uses UPSERT (INSERT ON CONFLICT UPDATE) with COALESCE to preserve existing embeddings. (Issue #22)
Slug normalization is consistent. pathToSlug() preserved case while inferSlug() lowercased. Now validateSlug() enforces lowercase at the validation layer, covering all entry points. (Issue #22)
initSchema no longer reads from disk at runtime. Both schema loaders used readFileSync with import.meta.url, which broke in compiled binaries and Deno Edge Functions. Schema is now embedded at build time via scripts/build-schema.sh. (Issue #22)
file_upload actually uploads content. The operation wrote DB metadata but never called the storage backend. Fixed in all 3 paths (operation, CLI upload, CLI sync) with rollback semantics. (Issue #22)
S3 storage backend authenticates requests. signedFetch() was just unsigned fetch(). Replaced with @aws-sdk/client-s3 for proper SigV4 signing. Supports R2/MinIO via forcePathStyle. (Issue #22)
Parallel import uses thread-safe queue. queue.shift() had race conditions under parallel workers. Now uses an atomic index counter. Checkpoint preserved on errors for safe resume. (Issue #22)
redirect verifies remote existence before deleting local files. Previously deleted local files unconditionally. Now checks storage backend before removing. (Issue #22)
gbrain call respects dry_run. handleToolCall() hardcoded dryRun: false. Now reads from params. (Issue #22)

Changed

Added @aws-sdk/client-s3 as a dependency for authenticated S3 operations.
Schema migration v2: unique index on content_chunks(page_id, chunk_index) for UPSERT support.
Schema migration v3: access_tokens and mcp_request_log tables for remote MCP auth.

[0.5.1] - 2026-04-10

Fixed

Apple Notes and files with spaces just work. Paths like Apple Notes/2017-05-03 ohmygreen.md now auto-slugify to clean slugs (apple-notes/2017-05-03-ohmygreen). Spaces become hyphens, parens and special characters are stripped, accented characters normalize to ASCII. All 5,861+ Apple Notes files import cleanly without manual renaming.
Existing brains auto-migrate. On first run after upgrade, a one-time migration renames all existing slugs with spaces or special characters to their clean form. Links are rewritten automatically. No manual cleanup needed.
Import and sync produce identical slugs. Both pipelines now use the same slugifyPath() function, eliminating the mismatch where sync preserved case but import lowercased.

[0.5.0] - 2026-04-10

Added

Your brain never falls behind. Live sync keeps the vector DB current with your brain repo automatically. Set up a cron, use --watch, hook into GitHub webhooks, or use git hooks. Your agent picks whatever fits its environment. Edit a markdown file, push, and within minutes it's searchable. No more stale embeddings serving wrong answers.
Know your install actually works. New verification runbook (docs/GBRAIN_VERIFY.md) catches the silent failures that used to go unnoticed: the pooler bug that skips pages, missing embeddings, stale sync. The real test: push a correction, wait, search for it. If the old text comes back, sync is broken and the runbook tells you exactly why.
New installs set up live sync automatically. The setup skill now includes live sync (Phase H) and full verification (Phase I) as mandatory steps. Agents that install GBrain will configure automatic sync and verify it works before declaring setup complete.
Fixes the silent page-skip bug. If your Supabase connection uses the Transaction mode pooler, sync silently skips most pages. The new docs call this out as a hard prerequisite with a clear fix (switch to Session mode). The verification runbook catches it by comparing page count against file count.

[0.4.2] - 2026-04-10

Changed

All GitHub Actions pinned to commit SHAs across test, e2e, and release workflows. Prevents supply chain attacks via mutable version tags.
Workflow permissions hardened: contents: read on test and e2e workflows limits GITHUB_TOKEN blast radius.
OpenClaw CI install pinned to v2026.4.9 instead of pulling latest.

Added

Gitleaks secret scanning CI job runs on every push and PR. Catches accidentally committed API keys, tokens, and credentials.
.gitleaks.toml config with allowlists for test fixtures and example files.
GitHub Actions SHA maintenance rule in CLAUDE.md so pins stay fresh on every /ship and /review.
S3 Sig V4 TODO for future implementation when S3 storage becomes a deployment path.

[0.4.1] - 2026-04-09

Added

gbrain check-update command with --json output. Checks GitHub Releases for new versions, compares semver (minor+ only, skips patches), fetches and parses changelog diffs. Fail-silent on network errors.
SKILLPACK Section 17: Auto-Update Notifications. Full agent playbook for the update lifecycle: check, notify, consent, upgrade, skills refresh, schema sync, report. Never auto-upgrades without user permission.
Standalone SKILLPACK self-update for users who load the skillpack directly without the gbrain CLI. Version markers in SKILLPACK and RECOMMENDED_SCHEMA headers, with raw GitHub URL fetching.
Step 7 in the OpenClaw install paste: daily update checks, default-on. User opts into being notified about updates, not into automatic installs.
Setup skill Phase G: conditional auto-update offer for manual install users.
Schema state tracking via ~/.gbrain/update-state.json. Tracks which recommended schema directories the user adopted, declined, or added custom. Future upgrades suggest new additions without re-suggesting declined items.
skills/migrations/ directory convention for version-specific post-upgrade agent directives.
20 unit tests and 5 E2E tests for the check-update command, covering version comparison, changelog extraction, CLI wiring, and real GitHub API interaction.
E2E test DB lifecycle documentation in CLAUDE.md: spin up, run tests, tear down. No orphaned containers.

Changed

detectInstallMethod() exported from upgrade.ts for reuse by check-update.

Fixed

Semver comparison in changelog extraction was missing major-version guard, causing incorrect changelog entries to appear when crossing major version boundaries.

[0.4.0] - 2026-04-09

Added

gbrain doctor command with --json output. Checks pgvector extension, RLS policies, schema version, embedding coverage, and connection health. Agents can self-diagnose issues.
Pluggable storage backends: S3, Supabase Storage, and local filesystem. Choose where binary files live independently of the database. Configured via gbrain init or environment variables.
Parallel import with per-worker engine instances. Large brain imports now use multiple database connections concurrently instead of a single serial pipeline.
Import resume checkpoints. If gbrain import is interrupted, it picks up where it left off instead of re-importing everything.
Automatic schema migration runner. On connect, gbrain detects the current schema version and applies any pending migrations without manual intervention.
Row-Level Security (RLS) enabled on all tables with BYPASSRLS safety check. Every query goes through RLS policies.
--json flag on gbrain init and gbrain import for machine-readable output. Agents can parse structured results instead of scraping CLI text.
File migration CLI (gbrain files migrate) for moving files between storage backends. Two-way-door: test with --dry-run, migrate incrementally.
Bulk chunk INSERT for faster page writes. Chunks are inserted in a single statement instead of one-at-a-time.
Supabase smart URL parsing: automatically detects and converts IPv6-only pooler URLs to the correct connection format.
56 new unit tests covering doctor, storage backends, file migration, import resume, slug validation, setup branching, Supabase admin, and YAML parsing. Test suite grew from 9 to 19 test files.
E2E tests for parallel import concurrency and all new features.

Fixed

validateSlug now accepts any filename characters (spaces, unicode, special chars) instead of rejecting non-alphanumeric slugs. Apple Notes and other real-world filenames import cleanly.
Import resilience: files over 5MB are skipped with a warning instead of crashing the pipeline. Errors in individual files no longer abort the entire import.
gbrain init detects IPv6-only Supabase URLs and adds the required pgvector check during setup.
E2E test fixture counts, CLI argument parsing, and doctor exit codes cleaned up.

Changed

Setup skill and README rewritten for agent-first developer experience.
Maintain skill updated with RLS verification, schema health checks, and nohup hints for large embedding jobs.

[0.3.0] - 2026-04-08

Added

Contract-first architecture: single operations.ts defines ~30 shared operations. CLI, MCP, and tools-json all generated from the same source. Zero drift.
OperationError type with structured error codes (page_not_found, invalid_params, embedding_failed, etc.). Agents can self-correct.
dry_run parameter on all mutating operations. Agents preview before committing.
importFromContent() split from importFile(). Both share the same chunk+embed+tag pipeline, but importFromContent works from strings (used by put_page). Wrapped in engine.transaction().
Idempotency hash now includes ALL fields (title, type, frontmatter, tags), not just compiled_truth + timeline. Metadata-only edits no longer silently skipped.
get_page now supports optional fuzzy: true for slug resolution. Returns resolved_slug so callers know what happened.
query operation now supports expand toggle (default true). Both CLI and MCP get the same control.
10 new operations wired up: put_raw_data, get_raw_data, resolve_slugs, get_chunks, log_ingest, get_ingest_log, file_list, file_upload, file_url.
OpenClaw bundle plugin manifest (openclaw.plugin.json) with config schema, MCP server config, and skill listing.
GitHub Actions CI: test on push/PR, multi-platform release builds (macOS arm64 + Linux x64) on version tags.
gbrain init --non-interactive flag for plugin mode (accepts config via flags/env vars, no TTY required).
Post-upgrade version verification in gbrain upgrade.
Parity test (test/parity.test.ts) verifies structural contract between operations, CLI, and MCP.
New setup skill replacing install: auto-provision Supabase via CLI, AGENTS.md injection, target TTHW < 2 min.
E2E test suite against real Postgres+pgvector. 13 realistic fixtures (miniature brain with people, companies, deals, meetings, concepts), 14 test suites covering all operations, search quality benchmarks, idempotency stress tests, schema validation, and full setup journey verification.
GitHub Actions E2E workflow: Tier 1 (mechanical) on every PR, Tier 2 (LLM skills via OpenClaw) nightly.
docker-compose.test.yml and .env.testing.example for local E2E development.

Fixed

Schema loader in db.ts broke on PL/pgSQL trigger functions containing semicolons inside $$ blocks. Replaced per-statement execution with single conn.unsafe() call.
traverseGraph query failed with "could not identify equality operator for type json" when using SELECT DISTINCT with json_agg. Changed to jsonb_agg.

Changed

src/mcp/server.ts rewritten from ~233 to ~80 lines. Tool definitions and dispatch generated from operations[].
src/cli.ts rewritten. Shared operations auto-registered from operations[]. CLI-only commands (init, upgrade, import, export, files, embed) kept as manual registrations.
tools-json output now generated FROM operations[]. Third contract surface eliminated.
All 7 skills rewritten with tool-agnostic language. Works with both CLI and MCP plugin contexts.
File schema: storage_url column dropped, storage_path is the only identifier. URLs generated on demand via file_url operation.
Config loading: env vars (GBRAIN_DATABASE_URL, DATABASE_URL, OPENAI_API_KEY) override config file values. Plugin config injected via env vars.

Removed

12 command files migrated to operations.ts: get.ts, put.ts, delete.ts, list.ts, search.ts, query.ts, health.ts, stats.ts, tags.ts, link.ts, timeline.ts, version.ts.
storage_url column from files table.

[0.2.0.2] - 2026-04-07

Changed

Rewrote recommended brain schema doc with expanded architecture: database layer (entity registry, event ledger, fact store, relationship graph) presented as the core architecture, entity identity and deduplication, enrichment source ordering, epistemic discipline rules, worked examples showing full ingestion chains, concurrency guidance, and browser budget. Smoothed language for open-source readability.

[0.2.0.1] - 2026-04-07

Added

Recommended brain schema doc (docs/GBRAIN_RECOMMENDED_SCHEMA.md): full MECE directory structure, compiled truth + timeline pages, enrichment pipeline, resolver decision tree, skill architecture, and cron job recommendations. The OpenClaw paste now links to this as step 5.

Changed

First-time experience rewritten. "Try it" section shows your own data, not fictional PG essays. OpenClaw paste references the GitHub repo, includes bun install fallback, and has the agent pick a dynamic query based on what it imported.
Removed all references to data/kindling/ (a demo corpus directory that never existed).

[0.2.0] - 2026-04-05

Added

You can now keep your brain current with gbrain sync, which uses git's own diff machinery to process only what changed. No more 30-second full directory walks when 3 files changed.
Watch mode (gbrain sync --watch) polls for changes and syncs automatically. Set it and forget it.
Binary file management with gbrain files commands (list, upload, sync, verify). Store images, PDFs, and audio in Supabase Storage instead of clogging your git repo.
Install skill (skills/install/SKILL.md) that walks you through setup from scratch, including Supabase CLI magic path for zero-copy-paste onboarding.
Import and sync now share a checkpoint. Run gbrain import, then gbrain sync, and it picks up right where import left off. Zero gap.
Tag reconciliation on reimport. If you remove a tag from your markdown, it actually gets removed from the database now.
gbrain config show redacts database passwords so you can safely share your config.
updateSlug engine method preserves page identity (page_id, chunks, embeddings) across renames. Zero re-embedding cost.
sync_brain MCP tool returns structured results so agents know exactly what changed.
20 new sync tests (39 total across 3 test files)

[0.1.0] - 2026-04-05

Added

Pluggable engine interface (BrainEngine) with full Postgres + pgvector implementation
25+ CLI commands: init, get, put, delete, list, search, query, import, export, embed, stats, health, link/unlink/backlinks/graph, tag/untag/tags, timeline/timeline-add, history/revert, config, upgrade, serve, call
MCP stdio server with 20 tools mirroring all CLI operations
3-tier chunking: recursive (delimiter-aware), semantic (Savitzky-Golay boundary detection), LLM-guided (Claude Haiku topic shifts)
Hybrid search with Reciprocal Rank Fusion merging vector + keyword results
Multi-query expansion via Claude Haiku (2 alternative phrasings per query)
4-layer dedup pipeline: by source, cosine similarity, type diversity, per-page cap
OpenAI embedding service (text-embedding-3-large, 1536 dims) with batch support and exponential backoff
Postgres schema with pgvector HNSW, tsvector (trigger-based, spans timeline_entries), pg_trgm fuzzy slug matching
Smart slug resolution for reads (fuzzy match via pg_trgm)
Page version control with snapshot, history, and revert
Typed links with recursive CTE graph traversal (max depth configurable)
Brain health dashboard (embed coverage, stale pages, orphans, dead links)
Stale alert annotations in search results
Supabase init wizard with CLI auto-provision fallback
Slug validation to prevent path traversal on export
6 fat markdown skills: ingest, query, maintain, enrich, briefing, migrate
ClawHub manifest for skill distribution
Full design docs: GBRAIN_V0 spec, pluggable engine architecture, SQLite engine plan

32 KiB Raw Blame History

Changelog

[0.9.0] - 2026-04-11

Added

Changed

[0.8.0] - 2026-04-11

Added

Changed

Removed

[0.7.0] - 2026-04-11

Added

Changed

[0.6.1] - 2026-04-10

Fixed

Changed

Added

Contributors

[0.6.0] - 2026-04-10

Added

Fixed

Changed

[0.5.1] - 2026-04-10

Fixed

[0.5.0] - 2026-04-10

Added

[0.4.2] - 2026-04-10

Changed

Added

[0.4.1] - 2026-04-09

Added

Changed

Fixed

[0.4.0] - 2026-04-09

Added

Fixed

Changed

[0.3.0] - 2026-04-08

Added

Fixed

Changed

Removed

[0.2.0.2] - 2026-04-07

Changed

[0.2.0.1] - 2026-04-07

Added

Changed

[0.2.0] - 2026-04-05

Added

[0.1.0] - 2026-04-05

Added

32 KiB

Raw Blame History