Files
gbrain/docs/guides
Garry Tan 275158137a fix: v0.18.1 — RLS hardening + schema backfill (supersedes #336) (#343)
* fix(doctor): check ALL public tables for RLS, not just gbrain's own

The RLS check was hardcoded to only verify 10 gbrain-managed tables:
pages, content_chunks, links, tags, raw_data, page_versions,
timeline_entries, ingest_log, config, files.

Any other table in the public schema (created by the application,
extensions, or manually) was invisible to the check. This allowed
12 tables to exist without RLS for months — publicly readable by
anyone with the Supabase anon key.

Changes:
- Query ALL tables in public schema, not a hardcoded list
- Upgrade severity from 'warn' to 'fail' — missing RLS is a security
  issue, not a suggestion
- Include table count in success message for visibility
- Include remediation SQL in failure message

Supabase exposes the public schema via PostgREST. Any table without
RLS is readable/writable by the anon key by default.

* fix(schema): enable RLS on 10 gbrain-managed public tables

The base schema and prior migrations shipped 10 public tables
without Row Level Security enabled: access_tokens, mcp_request_log,
minion_inbox, minion_attachments, subagent_messages,
subagent_tool_executions, subagent_rate_leases, gbrain_cycle_locks,
budget_ledger, budget_reservations.

Supabase exposes the public schema via PostgREST, so tables without
RLS are readable and writable by anyone holding the anon key.
access_tokens and the subagent conversation history tables carry
the most sensitive data in the set.

Fix: add the missing ENABLE RLS statements to src/schema.sql
(inside the existing BYPASSRLS-gated DO block, so dev sessions
without bypass don't get locked out). Add a new schema migration
v17 rls_backfill_missing_tables that does the same on existing
brains. budget_ledger and budget_reservations were previously
migration-only (v12); promoted to the base schema so fresh installs
pick up RLS from the standard gate.

Regenerated src/core/schema-embedded.ts.

* fix(doctor): widen RLS check to all public tables, add GBRAIN:RLS_EXEMPT escape hatch

The RLS check was hardcoded to 10 gbrain-managed tables; any other
table in the public schema (plugin-created, user-created, extension-
created) was invisible to the check. Widen the scan to every
pg_tables row in the public schema.

Upgrade severity warn to fail. Missing RLS is a security issue, not
a suggestion. gbrain doctor now exits 1 when any public table lacks
RLS. Cron and CI wrappers that call gbrain doctor should be aware
of the exit-code flip.

Add an explicit escape hatch for tables that should stay readable
by the anon key on purpose (analytics, public materialized views,
plugin tables). The doctor reads pg_description for each non-RLS
table and treats a comment matching GBRAIN:RLS_EXEMPT reason=<why>
as an intentional exemption. Doctor enumerates exempt tables by
name on every successful run so they never go invisible.

There is no gbrain rls-exempt CLI subcommand by design. The escape
hatch is deliberately painful: operators drop to psql and type the
justification as raw SQL. Comment lives in pg_description, survives
pg_dump, shows up in schema diffs, and appears in shell history.

PGLite is now explicitly skipped with an ok status (embedded and
single-user, no PostgREST exposure). Previously hit the
db.getConnection() throw-catch path and surfaced a misleading warn.

Remediation SQL now quotes identifiers (ALTER TABLE "public"."<name>"
...) so it works on tables with hyphens, reserved words, or mixed
case.

See docs/guides/rls-and-you.md for the full user-facing guide.

* test: coverage for RLS hardening (doctor + migration + e2e)

Four layers of guard for the v0.18 RLS changes:

test/doctor.test.ts: source-grep structural regression guards on
the doctor RLS block — absence of the old tablename IN filter,
presence of status=fail on the gap branch, quoted-identifier
remediation SQL, PGLite skip wrapper, GBRAIN:RLS_EXEMPT parsing
with required reason=. Fast, no DB needed. Mirrors the
statement_timeout regression pattern in test/postgres-engine.test.ts.

test/migrate.test.ts: structural guard for migration v17. Asserts
the migration exists with the expected name, all 10 ALTER TABLE
statements are present, BYPASSRLS gating is in place, and
LATEST_VERSION has caught up.

test/e2e/mechanical.test.ts: rewrote the E2E RLS Verification
block. The old hardcoded-allowlist query is replaced with an
every-public-table-has-RLS assertion. Four new CLI-spawn cases
verify real end-to-end behavior: (a) no-RLS public table makes
gbrain doctor --json return status=fail with ALTER TABLE in the
message and exit code 1, (b) a GBRAIN:RLS_EXEMPT comment with a
valid reason makes doctor report the table as explicitly exempt
and keep status=ok, (c) a GBRAIN:RLS_EXEMPT prefix without a
reason= segment still fails doctor, (d) an unrelated comment on
a no-RLS table still fails doctor.

All helpers use try/finally with unique-per-run suffixes
(gbrain_rls_..._<pid>_<timestamp>) so assertion failures don't
pollute subsequent tests.

* docs: one-page guide for RLS and GBRAIN:RLS_EXEMPT escape hatch

Covers why RLS matters on Supabase (PostgREST exposes the public
schema to the anon key), what to do when gbrain doctor fails, the
exact SQL template for an intentional exemption, how to audit
exemptions later, and how the check behaves on PGLite vs
self-hosted Postgres.

Emphasizes that the escape hatch is deliberately painful on
purpose: there is no gbrain rls-exempt CLI subcommand and no
config-file allowlist. The operator drops to psql and writes the
justification in SQL, which makes the action visible in shell
history, pg_dump, schema diffs, and doctor output on every run.

Referenced from gbrain doctor's failure message when any public
table lacks RLS.

* chore: bump version and changelog (v0.18.0)

Reconciles VERSION and package.json (were drifting: 0.17.0 vs
0.16.4). Runtime gbrain --version reads from package.json via
src/version.ts, so prior ships were reporting 0.16.4. Both now
land on 0.18.0.

Minor bump (not patch) because gbrain doctor's exit code semantics
change: missing RLS on a public table was warn+exit-0, is now
fail+exit-1. Any external cron, CI, or skillpack-check wrapper
around gbrain doctor needs to be aware. skillpack-check.ts itself
is unaffected (uses --fast, skips DB checks).

CHANGELOG entry follows the release-summary format from CLAUDE.md:
headline, lead paragraph, numbers-that-matter table, what-this-
means-for-your-workflow, To take advantage of v0.18.0 block with
remediation SQL + exemption format, itemized changes.

Also sweeps a stale @Wintermute reference in the 0.17.0 entry to
"Garry's OpenClaw" per the CLAUDE.md privacy rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.18.1): address codex review (orchestrator wiring + fail-closed + identifier escape)

Four fixes from `/codex` review of the merged diff:

1. HIGH — wire migration v24 into the `gbrain apply-migrations`
   upgrade path. Without an orchestrator entry, `gbrain upgrade`'s
   post-upgrade step runs `apply-migrations --yes`, which walks the
   registry in `src/commands/migrations/index.ts`. The registry
   stopped at v0_18_0, so v24 never fired on upgrade (connectEngine
   and doctor do not call initSchema). New `v0_18_1.ts` orchestrator
   mirrors v0.18.0's Phase A: shells out to `gbrain init
   --migrate-only`, which triggers initSchema → runMigrations → v24
   applies. Registered in the migrations array.

2. HIGH — fail loudly when v24 runs under a non-BYPASSRLS role
   instead of RAISE WARNING-then-silently-bumping-version. The
   runner at migrate.ts:773 unconditionally calls
   `setConfig('version', String(m.version))` when a migration
   completes without throwing, so a WARNING-and-continue path would
   permanently lock the backfill out: schema_version=24 on the next
   run means `m.version > current` is false and v24 is skipped
   forever, even after the role gets BYPASSRLS. Changed `RAISE
   WARNING` → `RAISE EXCEPTION` so the transaction aborts,
   schema_version stays at 23, and a subsequent initSchema retries
   cleanly after the role is fixed. Test asserts the SQL uses
   EXCEPTION and does not use WARNING.

3. MEDIUM — escape double-quote characters in the remediation SQL
   output. doctor.ts was building `ALTER TABLE "public"."${n}"`
   with `n` un-escaped, so a pathological table name containing a
   literal `"` would break out of the quoted identifier and produce
   invalid copy-paste SQL. Double the `"` before interpolating,
   matching Postgres quoted-identifier escaping rules. Extremely
   rare in practice, cheap to get right.

4. LOW — CHANGELOG cleanup: corrected the upgrade-behavior claim
   (v24 runs via `apply-migrations --yes` through the new
   orchestrator, not during `gbrain doctor`) and split the "tables
   with RLS" row into two metrics (21 base-schema tables + 2
   migration-only budget_* tables = 23 managed total, all covered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: add v0.18.1 to apply-migrations skippedFuture expectations

CI-only failure: test/apply-migrations.test.ts hardcodes the
orchestrator-migration version list in two `skippedFuture` expectations.
The v0.18.1 orchestrator I added in the prior commit pushed the list to
8 entries. Both assertions now include 0.18.1 at the tail.

Caught by the gbrain CI run on the merged branch — locally the rest of
the unit suite (dream/orphans) is flaky due to unrelated PGLite
parallelism, but `bun test test/apply-migrations.test.ts` now passes
18/18. CI should follow.

* docs: scrub v0.18.1 CHANGELOG — remove specific-table attack surface

Responsible-disclosure pass on the public-facing release notes. The
prior CHANGELOG entry enumerated which gbrain-managed public tables
had shipped without RLS and highlighted the most sensitive ones by
name. That gives anyone reading the CHANGELOG a directed probe list
for unpatched Supabase installs before operators have had a chance
to run `gbrain upgrade`.

Rewritten to describe the change at a functional level (what doctor
does now, what the upgrade path does, what the escape hatch is)
without naming the specific tables or quantifying the gap. The actual
SQL remains in the binary — anyone reverse-engineering can find it
there — but we shouldn't put it on the release page with a banner.

User-facing content kept intact: the "To take advantage of" block,
the upgrade commands, the exemption SQL template, the breaking
exit-code note.

* docs(CLAUDE.md): add responsible-disclosure rule for release notes

Prior incident on this branch: the original v0.18.1 CHANGELOG entry
enumerated the specific public tables that had shipped without RLS,
quantified the exposure duration, and highlighted the most sensitive
ones by name. Garry caught it. Scrubbed in ecd06a0.

This directive codifies the rule so future sessions (or other agents
working in this repo) don't repeat the mistake:

- Describe security fixes functionally, not by attack surface.
- Public artifacts (CHANGELOG, README, docs/, PR titles/bodies,
  commit messages, release pages) get the functional description.
- Private artifacts (plan files under ~/.claude/plans/ or
  ~/.gstack/projects/) keep the detailed before/after tables.
- Source code will disclose the specifics to reverse engineers
  anyway — that's intrinsic. The concern is the broadcast-channel
  asymmetry of a release page.

Also added a corresponding feedback memory at
~/.claude/projects/.../feedback_responsible_disclosure.md so the rule
carries across sessions and other projects, not just gbrain.

Placed right after the existing privacy rule (scrub real names) since
they share the same "public artifact hygiene" posture.

* chore: regenerate llms.txt + llms-full.txt (CLAUDE.md drift)

Adding the responsible-disclosure rule to CLAUDE.md in ffe340d
diverged the committed llms-full.txt from the generator output.
The build-llms drift-guard test caught it in CI. Regenerated.

* fix(v24): guard budget_ledger + budget_reservations with IF EXISTS

Garry flagged: migration v24 fires `ALTER TABLE budget_ledger ENABLE
ROW LEVEL SECURITY` unconditionally. budget_ledger and
budget_reservations are migration-only (v12) — not in schema.sql,
not re-created on every initSchema. In the normal flow v12 runs
before v24 so they exist, but two edge cases break that assumption:

  1. An operator manually dropped them (budget data is regenerable
     from resolver call logs, so `DROP TABLE` is a reasonable
     cleanup move).
  2. A brain was somehow running an old gbrain that lacked v12, and
     is only catching up now.

Bare ALTER hits 42P01 (relation does not exist), aborts the
transaction, and leaves schema_version at 23. On next initSchema,
v24 retries and hits the same error — stuck in a loop.

Fix: wrap each of the two budget ALTERs in
    IF EXISTS (SELECT 1 FROM information_schema.tables
                WHERE table_schema = 'public'
                  AND table_name = '<tbl>') THEN ... END IF;

The other 8 tables are not guarded. schema.sql creates them
idempotently on every initSchema run before migrations fire, so
they are guaranteed to exist by the time v24 runs. Adding guards
there would be unnecessary and make the SQL noisier.

Also simplified the DECLARE/BEGIN structure: moved the
non-BYPASSRLS early-exit to the top so the happy path reads
cleanly without the outer IF.

Tests:
  - test/migrate.test.ts: new assertion that both budget_* ALTERs
    are wrapped in information_schema.tables IF EXISTS blocks;
    BYPASSRLS gate assertion relaxed to match either phrasing.
  - Manual e2e: fresh Postgres init (v0→v24), then DROP TABLE
    budget_ledger + budget_reservations, reset version=23, re-run
    init. v24 applied cleanly, version advanced to 24, budget_*
    stayed dropped. Without the guard this would have errored out.

* test(e2e): v24 self-heals when budget_* tables are missing

Behavioral e2e proof for the IF EXISTS guard added in 2fc7780. Scenario:

  1. Fresh Postgres init to v24 (setupDB in beforeAll).
  2. DROP TABLE budget_ledger + budget_reservations.
  3. Roll config.version back to '23'.
  4. CLI-spawn `gbrain init --non-interactive` to re-trigger initSchema.
  5. Assert: exit 0, no 42P01 in stderr, version advances to 24,
     budget_* stay dropped (since v12 doesn't re-run at
     current=23 > v12=12).

Without the guard, step 4 hits 42P01 (relation does not exist),
aborts the transaction, leaves version at 23, and the next
initSchema re-runs v24 forever — an infinite retry loop. This test
catches any future regression that strips the guard.

Cleanup (finally block) restores budget_* with the exact migration
v12 schema so downstream tests that reference these tables see the
original shape. Version is restored from the pre-test snapshot.

Runs with the rest of the E2E: RLS Verification block. 78/78 in
test/e2e/mechanical.test.ts with the addition.

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 07:17:40 -07:00
..