fix: community fix wave — 10 PRs, 7 contributors (v0.9.1) (#65)

* fix: security hardening (search DoS, slug hijack, symlink traversal, content bombs, stdin guard)

  4 security vulnerabilities closed:
  - Search limit clamped to 100 (MAX_SEARCH_LIMIT) with statement_timeout 8s
  - Frontmatter slug authority enforced (path-derived, mismatch rejected)
  - Symlink traversal blocked (lstatSync in walker + importFromFile)
  - Content size guard on importFromContent (Buffer.byteLength, 5MB)
  - Stdin size guard in parseOpArgs (5MB cap)

  Search pagination added (--offset param on search + query operations).
  Clamp warning emitted when limit is capped.

  Co-Authored-By: garagon <garagon@users.noreply.github.com>

* fix: PGLite concurrent access lock to prevent Aborted() crash

  File-based advisory lock using atomic mkdir with PID tracking and 5-minute
  stale detection. Clear error messages show which process holds the lock and
  how to recover.

  Co-Authored-By: danbr <danbr@users.noreply.github.com>

* fix: 12 data integrity fixes + stale embedding prevention

  - CTE searchKeyword rewrite (SQL-level LIMIT, not JS splice).
  - Write validation on addLink/addTag/addTimelineEntry/putRawData/createVersion.
  - Health metrics now measure real problems (stale_pages, orphan_pages, dead_links).
  - Orphan chunk cleanup on empty pages.
  - Embedding error logging.
  - contentHash now covers all PageInput fields.
  - Stale embedding NULL'd when chunk_text changes (prevents wrong vector on new text).
  - hybridSearch stops double-embedding query.
  - MCP param validation.
  - type/exclude_slugs search filters now work.
  - pgcrypto extension for Postgres <13.

  Co-Authored-By: win4r <win4r@users.noreply.github.com>

* perf: 30x embedAll speedup + O(n²) fix + ask alias

  - Sliding worker pool (concurrency 20, tunable via GBRAIN_EMBED_CONCURRENCY).
  - O(n²) chunk lookup in embedPage replaced with Map.
  - gbrain ask alias for query (CLI-only, not in MCP tools-json).
  - .idea added to .gitignore.

  Co-Authored-By: stephenhungg <stephenhungg@users.noreply.github.com>
  Co-Authored-By: sharziki <sharziki@users.noreply.github.com>
  Co-Authored-By: hnshah <hnshah@users.noreply.github.com>
  Co-Authored-By: doguabaris <doguabaris@users.noreply.github.com>

* chore: bump version and changelog (v0.9.1)

  Community fix wave: 10 PRs, 7 contributors. 4 security fixes, PGLite crash
  fix, 12 data integrity fixes, 30x embed speedup, search pagination, ask alias.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-authored-by: danbr <danbr@users.noreply.github.com>
Co-authored-by: win4r <win4r@users.noreply.github.com>
Co-authored-by: stephenhungg <stephenhungg@users.noreply.github.com>
Co-authored-by: sharziki <sharziki@users.noreply.github.com>
Co-authored-by: hnshah <hnshah@users.noreply.github.com>
Co-authored-by: doguabaris <doguabaris@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 07:48:47 -10:00
parent 87bb2a59eb
commit 13773be071
27 changed files with 1099 additions and 83 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,4 @@ bin/
 .gstack/
 supabase/.temp/
 .claude/skills/
+.idea
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,35 @@

 All notable changes to GBrain will be documented in this file.

+## [0.9.1] - 2026-04-11
+
+### Fixed
+
+- **Your brain can't be poisoned by rogue frontmatter anymore.** Slug authority is now path-derived. A file at `notes/random.md` can't declare `slug: people/admin` and silently overwrite someone else's page. Mismatches are rejected with a clear error telling you exactly what to fix.
+- **Symlinks in your notes directory can't exfiltrate files.** The import walker now uses `lstatSync` and refuses to follow symlinks, blocking the attack where a contributor plants a link to `~/.zshrc` in the brain directory. Defense-in-depth: `importFromFile` itself also checks.
+- **Giant payloads through MCP can't rack up your OpenAI bill.** `importFromContent` now checks `Buffer.byteLength` before any processing. 10 MB of emoji through `put_page`? Rejected before chunking starts.
+- **Search can't be weaponized into a DoS.** `limit` is clamped to 100 across all search paths (keyword, vector, hybrid). `statement_timeout: 8s` on the Postgres connection as defense-in-depth. Requesting `limit: 10000000` now gets you 100 results and a warning.
+- **PGLite stops crashing when two processes touch the same brain.** File-based advisory lock using atomic `mkdir` with PID tracking and 5-minute stale detection. Clear error messages tell you which process holds the lock and how to recover.
+- **12 data integrity fixes landed.** Orphan chunks cleaned up on empty pages. Write operations (`addLink`, `addTag`, `addTimelineEntry`, `putRawData`, `createVersion`) now throw when the target page doesn't exist instead of silently no-opping. Health metrics (`stale_pages`, `dead_links`, `orphan_pages`) now measure real problems instead of always returning 0. Keyword search moved from JS-side sort-and-splice to a SQL CTE with `LIMIT`. MCP server validates params before dispatch.
+- **Stale embeddings can't lie to you anymore.** When chunk text changes but embedding fails, the old vector is now NULL'd out instead of preserved. Previously, search could return results based on outdated vectors attached to new text.
+- **Embedding failures are no longer silent.** The `catch { /* non-fatal */ }` is gone. You now get `[gbrain] embedding failed for slug (N chunks): error message` in stderr. Still non-fatal, but you know what happened.
+- **O(n^2) chunk lookup in `embedPage` is gone.** Replaced `find() + indexOf()` with a single `Map` lookup. Matches the pattern `embedAll` already uses.
+- **Stdin bombs blocked.** `parseOpArgs` now rejects stdin larger than 5 MB. The size is checked with `Buffer.byteLength` right after the read, before the content reaches any operation.
+
+### Added
+
+- **`gbrain embed --all` is 30x faster.** Sliding worker pool with 20 concurrent workers (tunable via `GBRAIN_EMBED_CONCURRENCY`). A 20,000-chunk corpus that took 2.5 hours now finishes in ~8 minutes.
+- **Search pagination.** Both `search` and `query` now accept `--offset` for paginating through results. Combined with the 100-result ceiling, you can now page through large result sets.
+- **`gbrain ask` is an alias for `gbrain query`.** CLI-only, doesn't appear in MCP tools-json.
+- **Content hash now covers all page fields.** Title, type, and frontmatter changes trigger re-import. First sync after upgrade will re-import all pages (one-time, expected).
+- **Migration file for v0.9.1.** Auto-update agent knows to expect the full re-import and will run `gbrain embed --all` afterward.
+- **`pgcrypto` extension added to schema.** Fallback for `gen_random_uuid()` on Postgres < 13.
+
+### Changed
+
+- **Search type and exclude_slugs filters now work.** These were advertised in the API but never implemented. Both `searchKeyword` and `searchVector` now respect `type` and `exclude_slugs` params.
+- **Hybrid search no longer double-embeds the query.** `expandQuery` already includes the original, so we use it directly instead of prepending.
+
 ## [0.9.0] - 2026-04-11

 ### Added
--- a/2
+++ b/2
@@ -1 +1 @@
-0.9.0
+0.9.1
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "gbrain",
-  "version": "0.9.0",
+  "version": "0.9.1",
  "description": "Postgres-native personal knowledge brain with hybrid RAG search",
  "type": "module",
  "main": "src/core/index.ts",
--- a/skills/migrations/v0.9.1.md
+++ b/skills/migrations/v0.9.1.md
@@ -0,0 +1,68 @@
+---
+version: 0.9.1
+feature_pitch:
+  headline: "Security hardening + 30x faster embeddings"
+  description: "4 security fixes (search DoS, slug hijack, symlink traversal, content bombs), PGLite crash fix, 12 data integrity fixes, 30x embed speedup, and search pagination. Community fix wave: 10 PRs, 7 contributors."
+  recipe: null
+  tiers: null
+---
+
+# v0.9.1 Migration: Security + Integrity + Performance
+
+This migration includes security hardening, data integrity fixes, a faster
+embedding pipeline, and a change to how content hashes are computed. Existing
+users will see a one-time full re-import on next sync.
+
+## What Changed
+
+### Content hash algorithm
+The content hash now includes all page fields (title, type, frontmatter), not
+just compiled_truth + timeline. This means every existing page's hash will differ
+on next sync, triggering a full re-import. This is expected and correct.
+
+### Search limit ceiling
+Search results are now capped at 100. If your agent requests more, it gets a
+warning and 100 results. Use the new `--offset` flag to paginate.
+
+### Embedding performance
+`gbrain embed --all` now runs 30x faster (sliding worker pool, 20 concurrent
+by default). A 20,000-chunk corpus that took 2.5 hours now takes ~8 minutes.
+
+## Agent Instructions
+
+### Step 1: Run sync (expect full re-import)
+
+The content hash change will cause all pages to re-import. This is expected.
+
+```bash
+gbrain sync
+```
+
+You will see all pages reported as "imported" instead of the usual few "modified".
+This is a one-time event.
+
+### Step 2: Re-embed all pages
+
+After the re-import, embeddings need refreshing:
+
+```bash
+gbrain embed --all
+```
+
+With the new parallel worker pool, this should complete in under 10 minutes
+for most brains. Set `GBRAIN_EMBED_CONCURRENCY=10` if you hit API rate limits.
+
+### Step 3: Verify search works
+
+```bash
+gbrain search "test query" --limit 5
+```
+
+If search returns results, the re-import and re-embed worked correctly.
+
+### Step 4: Done
+
+```bash
+mkdir -p ~/.gbrain/migrations
+echo '{"version":"0.9.1","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","status":"complete"}' >> ~/.gbrain/migrations/completed.jsonl
+```
--- a/src/cli.ts
+++ b/src/cli.ts
@@ -22,7 +22,7 @@ const CLI_ONLY = new Set(['init', 'upgrade', 'post-upgrade', 'check-update', 'in

 async function main() {
  const args = process.argv.slice(2);
-  const command = args[0];
+  let command = args[0];

  if (!command || command === '--help' || command === '-h') {
    printHelp();
@@ -42,6 +42,11 @@ async function main() {

  const subArgs = args.slice(1);

+  // DX alias: `ask` is a natural-language alias for `query`
+  if (command === 'ask') {
+    command = 'query';
+  }
+
  // Per-command --help
  if (subArgs.includes('--help') || subArgs.includes('-h')) {
    const op = cliOps.get(command);
@@ -122,7 +127,13 @@ function parseOpArgs(op: Operation, args: string[]): Record<string, unknown> {

  // Read stdin for content params
  if (op.cliHints?.stdin && !params[op.cliHints.stdin] && !process.stdin.isTTY) {
-    params[op.cliHints.stdin] = readFileSync('/dev/stdin', 'utf-8');
+    const stdinContent = readFileSync('/dev/stdin', 'utf-8');
+    const MAX_STDIN = 5_000_000; // 5MB
+    if (Buffer.byteLength(stdinContent, 'utf-8') > MAX_STDIN) {
+      console.error(`Error: stdin content exceeds ${MAX_STDIN} bytes. Split into smaller inputs.`);
+      process.exit(1);
+    }
+    params[op.cliHints.stdin] = stdinContent;
  }

  return params;
@@ -379,6 +390,7 @@ PAGES
 SEARCH
  search <query>                     Keyword search (tsvector)
  query <question> [--no-expand]     Hybrid search (RRF + expansion)
+  ask <question> [--no-expand]       Alias for query

 IMPORT/EXPORT
  import <dir> [--no-embed]          Import markdown directory
--- a/src/commands/embed.ts
+++ b/src/commands/embed.ts
@@ -54,17 +54,17 @@ async function embedPage(engine: BrainEngine, slug: string) {
  }

  const embeddings = await embedBatch(toEmbed.map(c => c.chunk_text));
-  const updated: ChunkInput[] = chunks.map((c, i) => {
-    const needsEmbed = toEmbed.find(te => te.chunk_index === c.chunk_index);
-    const embIdx = needsEmbed ? toEmbed.indexOf(needsEmbed) : -1;
-    return {
-      chunk_index: c.chunk_index,
-      chunk_text: c.chunk_text,
-      chunk_source: c.chunk_source,
-      embedding: embIdx >= 0 ? embeddings[embIdx] : undefined,
-      token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
-    };
-  });
+  const embeddingMap = new Map<number, Float32Array>();
+  for (let j = 0; j < toEmbed.length; j++) {
+    embeddingMap.set(toEmbed[j].chunk_index, embeddings[j]);
+  }
+  const updated: ChunkInput[] = chunks.map(c => ({
+    chunk_index: c.chunk_index,
+    chunk_text: c.chunk_text,
+    chunk_source: c.chunk_source,
+    embedding: embeddingMap.get(c.chunk_index),
+    token_count: c.token_count || Math.ceil(c.chunk_text.length / 4),
+  }));

  await engine.upsertChunks(slug, updated);
  console.log(`${slug}: embedded ${toEmbed.length} chunks`);
@@ -74,15 +74,29 @@ async function embedAll(engine: BrainEngine, staleOnly: boolean) {
  const pages = await engine.listPages({ limit: 100000 });
  let total = 0;
  let embedded = 0;
+  let processed = 0;

-  for (let i = 0; i < pages.length; i++) {
-    const page = pages[i];
+  // Concurrency limit for parallel page embedding.
+  // Each worker pulls pages from a shared queue and makes independent
+  // embedBatch calls to OpenAI + upsertChunks to the engine.
+  //
+  // Default 20: keeps us well under OpenAI's embedding RPM limit
+  // (3000+/min for tier 1 = 50+/sec, 20 parallel is safely below) and
+  // avoids overwhelming postgres connection pools. Users can tune via
+  // GBRAIN_EMBED_CONCURRENCY env var based on their tier/infra.
+  const CONCURRENCY = Math.max(1, parseInt(process.env.GBRAIN_EMBED_CONCURRENCY || '20', 10) || 20); // NaN or 0 falls back to 20
+
+  async function embedOnePage(page: typeof pages[number]) {
    const chunks = await engine.getChunks(page.slug);
    const toEmbed = staleOnly
      ? chunks.filter(c => !c.embedded_at)
      : chunks;

-    if (toEmbed.length === 0) continue;
+    if (toEmbed.length === 0) {
+      processed++;
+      process.stdout.write(`\r  ${processed}/${pages.length} pages, ${embedded} chunks embedded`);
+      return;
+    }

    try {
      const embeddings = await embedBatch(toEmbed.map(c => c.chunk_text));
@@ -106,8 +120,25 @@ async function embedAll(engine: BrainEngine, staleOnly: boolean) {
    }

    total += toEmbed.length;
-    process.stdout.write(`\r  ${i + 1}/${pages.length} pages, ${embedded} chunks embedded`);
+    processed++;
+    process.stdout.write(`\r  ${processed}/${pages.length} pages, ${embedded} chunks embedded`);
  }

+  // Sliding worker pool: N workers share a queue and each pulls the
+  // next page as soon as it finishes its current one. This handles
+  // uneven per-page workloads (some pages have 1 chunk, others have 50)
+  // much better than a fixed-window Promise.all, since fast workers
+  // don't wait for slow workers to finish an entire window.
+  let nextIdx = 0;
+  async function worker() {
+    while (nextIdx < pages.length) {
+      const idx = nextIdx++;
+      await embedOnePage(pages[idx]);
+    }
+  }
+
+  const numWorkers = Math.min(CONCURRENCY, pages.length);
+  await Promise.all(Array.from({ length: numWorkers }, () => worker()));
+
  console.log(`\n\nEmbedded ${embedded} chunks across ${pages.length} pages`);
 }
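Reviewer sketch of the sliding worker pool introduced in `embedAll` above, reduced to a generic helper. `runPool`, `Task`, and the timing values are illustrative names and numbers chosen for this note; they are not part of the gbrain codebase. The idea is the same as in the diff: a shared cursor hands out the next item to whichever worker finishes first.

```typescript
// Illustrative only: `runPool` and `Task` are hypothetical names.
type Task<T> = () => Promise<T>;

async function runPool<T>(tasks: Task<T>[], concurrency: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let nextIdx = 0; // shared cursor; safe because JS runs one callback at a time

  async function worker(): Promise<void> {
    while (nextIdx < tasks.length) {
      const idx = nextIdx++; // claim the next task before awaiting anything
      results[idx] = await tasks[idx]();
    }
  }

  // Never start more workers than there are tasks, and always at least one.
  const numWorkers = Math.min(Math.max(1, concurrency), tasks.length);
  await Promise.all(Array.from({ length: numWorkers }, () => worker()));
  return results;
}

// Usage: eight tasks of uneven duration, at most three in flight at once.
let inFlight = 0;
let peak = 0;
const demoTasks: Task<number>[] = [30, 5, 5, 20, 5, 10, 5, 5].map((ms, i) => async () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>(resolve => setTimeout(resolve, ms));
  inFlight--;
  return i * 2;
});

const poolDone = runPool(demoTasks, 3).then(results => {
  console.log(results, `peak concurrency: ${peak}`);
  return results;
});
```

Results are written by index, so output order matches input order even though completion order does not. Compared with a fixed-window `Promise.all`, a slow task only occupies one worker instead of stalling a whole batch.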
--- a/src/commands/import.ts
+++ b/src/commands/import.ts
@@ -1,4 +1,4 @@
-import { readdirSync, statSync, existsSync, writeFileSync, readFileSync, unlinkSync } from 'fs';
+import { readdirSync, lstatSync, existsSync, writeFileSync, readFileSync, unlinkSync } from 'fs';
 import { execFileSync } from 'child_process';
 import { join, relative } from 'path';
 import { cpus, totalmem, homedir } from 'os';
@@ -211,7 +211,7 @@ export async function runImport(engine: BrainEngine, args: string[]) {
  }
 }

-function collectMarkdownFiles(dir: string): string[] {
+export function collectMarkdownFiles(dir: string): string[] {
  const files: string[] = [];

  function walk(d: string) {
@@ -224,13 +224,28 @@ function collectMarkdownFiles(dir: string): string[] {
      const full = join(d, entry);
      let stat;
      try {
-        stat = statSync(full);
+        // lstatSync, not statSync: we must NOT follow symlinks. A symlink
+        // inside the brain directory can point to any file the importing
+        // user can read, so a contributor to a shared brain could plant
+        // notes/innocent.md as a symlink to ~/.gbrain/config.json, /etc/passwd,
+        // or another sensitive file outside the brain root — and on the
+        // next `gbrain import` it would be read, chunked, embedded, and
+        // indexed, at which point a bearer-token holder could exfiltrate
+        // it via search/get_page. See L002 in report/findings.md.
+        stat = lstatSync(full);
      } catch {
        // Broken symlink or permission error — skip
        console.warn(`[gbrain import] Skipping unreadable path: ${full}`);
        continue;
      }

+      // Skip symlinks (both file and directory targets). This also blocks
+      // circular symlink DoS since we refuse to descend into linked dirs.
+      if (stat.isSymbolicLink()) {
+        console.warn(`[gbrain import] Skipping symlink: ${full}`);
+        continue;
+      }
+
      if (stat.isDirectory()) {
        walk(full);
      } else if (entry.endsWith('.md') || entry.endsWith('.mdx')) {
--- a/src/core/db.ts
+++ b/src/core/db.ts
@@ -3,6 +3,7 @@ import { GBrainError, type EngineConfig } from './types.ts';
 import { SCHEMA_SQL } from './schema-embedded.ts';

 let sql: ReturnType<typeof postgres> | null = null;
+let connectedUrl: string | null = null;

 export function getConnection(): ReturnType<typeof postgres> {
  if (!sql) {
@@ -16,7 +17,13 @@ export function getConnection(): ReturnType<typeof postgres> {
 }

 export async function connect(config: EngineConfig): Promise<void> {
-  if (sql) return;
+  if (sql) {
+    // Warn if a different URL is passed — the old connection is still in use
+    if (config.database_url && connectedUrl && config.database_url !== connectedUrl) {
+      console.warn('[gbrain] connect() called with a different database_url but a connection already exists. Using existing connection.');
+    }
+    return;
+  }

  const url = config.database_url;
  if (!url) {
@@ -32,6 +39,7 @@ export async function connect(config: EngineConfig): Promise<void> {
      max: 10,
      idle_timeout: 20,
      connect_timeout: 10,
+      connection: { statement_timeout: '8s' },
      types: {
        // Register pgvector type
        bigint: postgres.BigInt,
@@ -40,8 +48,10 @@ export async function connect(config: EngineConfig): Promise<void> {

    // Test connection
    await sql`SELECT 1`;
+    connectedUrl = url;
  } catch (e: unknown) {
    sql = null;
+    connectedUrl = null;
    const msg = e instanceof Error ? e.message : String(e);
    throw new GBrainError(
      'Cannot connect to database',
@@ -55,6 +65,7 @@ export async function disconnect(): Promise<void> {
  if (sql) {
    await sql.end();
    sql = null;
+    connectedUrl = null;
  }
 }

--- a/src/core/engine.ts
+++ b/src/core/engine.ts
@@ -11,6 +11,16 @@ import type {
  EngineConfig,
 } from './types.ts';

+/** Maximum results returned by search operations. Internal bulk operations (listPages) are not clamped. */
+export const MAX_SEARCH_LIMIT = 100;
+
+/** Clamp a user-provided search limit to a safe range. */
+export function clampSearchLimit(limit: number | undefined, defaultLimit = 20): number {
+  if (limit === undefined || limit === null || !Number.isFinite(limit) || Number.isNaN(limit)) return defaultLimit;
+  if (limit <= 0) return defaultLimit;
+  return Math.min(Math.floor(limit), MAX_SEARCH_LIMIT);
+}
+
 export interface BrainEngine {
  // Lifecycle
  connect(config: EngineConfig): Promise<void>;
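Reviewer note on `clampSearchLimit`: the function is copied below from the hunk above so the snippet runs standalone, followed by a few inputs and the values they map to. The `samples` table is an illustration added for this note, not a test from the repository.

```typescript
// Copied from the diff so this sketch is self-contained.
const MAX_SEARCH_LIMIT = 100;

function clampSearchLimit(limit: number | undefined, defaultLimit = 20): number {
  if (limit === undefined || limit === null || !Number.isFinite(limit) || Number.isNaN(limit)) return defaultLimit;
  if (limit <= 0) return defaultLimit;
  return Math.min(Math.floor(limit), MAX_SEARCH_LIMIT);
}

// [input, expected] pairs chosen for illustration.
const samples: Array<[number | undefined, number]> = [
  [undefined, 20],   // missing limit falls back to the default
  [5, 5],            // in-range values pass through
  [7.9, 7],          // fractional values are floored
  [0, 20],           // zero and negatives fall back to the default
  [-3, 20],
  [10_000_000, 100], // oversized requests are capped at MAX_SEARCH_LIMIT
  [NaN, 20],         // non-finite values fall back to the default
  [Infinity, 20],
];

for (const [input, expected] of samples) {
  console.log(String(input), '->', clampSearchLimit(input), `(expected ${expected})`);
}
```

Note that `Infinity` maps to the default rather than to the ceiling, because the finiteness check runs before the cap is applied.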
--- a/src/core/import-file.ts
+++ b/src/core/import-file.ts
@@ -1,9 +1,10 @@
-import { readFileSync, statSync } from 'fs';
+import { readFileSync, statSync, lstatSync } from 'fs';
 import { createHash } from 'crypto';
 import type { BrainEngine } from './engine.ts';
 import { parseMarkdown } from './markdown.ts';
 import { chunkText } from './chunkers/recursive.ts';
 import { embedBatch } from './embedding.ts';
+import { slugifyPath } from './sync.ts';
 import type { ChunkInput } from './types.ts';

 export interface ImportResult {
@@ -20,6 +21,12 @@ const MAX_FILE_SIZE = 5_000_000; // 5MB
 * parse -> hash -> embed (external) -> transaction(version + putPage + tags + chunks)
 *
 * Used by put_page operation and importFromFile.
+ *
+ * Size guard: content is rejected if its UTF-8 byte length exceeds MAX_FILE_SIZE.
+ * importFromFile already enforces this against disk size before calling here, but
+ * the remote MCP put_page operation passes caller-supplied content straight in,
+ * so the guard has to live on this function — otherwise an authenticated caller
+ * can spend the owner's OpenAI budget at will by shipping an arbitrarily large page.
 */
 export async function importFromContent(
  engine: BrainEngine,
@@ -27,6 +34,19 @@ export async function importFromContent(
  content: string,
  opts: { noEmbed?: boolean } = {},
 ): Promise<ImportResult> {
+  // Reject oversized payloads before any parsing, chunking, or embedding happens.
+  // Uses Buffer.byteLength to count UTF-8 bytes the same way disk size would,
+  // so the network path behaves identically to the file path.
+  const byteLength = Buffer.byteLength(content, 'utf-8');
+  if (byteLength > MAX_FILE_SIZE) {
+    return {
+      slug,
+      status: 'skipped',
+      chunks: 0,
+      error: `Content too large (${byteLength} bytes, max ${MAX_FILE_SIZE}). Split the content into smaller files or remove large embedded assets.`,
+    };
+  }
+
  const parsed = parseMarkdown(content, slug + '.md');

  // Hash includes ALL fields for idempotency (not just compiled_truth + timeline)
@@ -67,7 +87,9 @@ export async function importFromContent(
        chunks[i].embedding = embeddings[i];
        chunks[i].token_count = Math.ceil(chunks[i].chunk_text.length / 4);
      }
-    } catch { /* non-fatal */ }
+    } catch (e: unknown) {
+      console.warn(`[gbrain] embedding failed for ${slug} (${chunks.length} chunks): ${e instanceof Error ? e.message : String(e)}`);
+    }
  }

  // Transaction wraps all DB writes
@@ -95,6 +117,9 @@ export async function importFromContent(

    if (chunks.length > 0) {
      await tx.upsertChunks(slug, chunks);
+    } else {
+      // Content is empty — delete stale chunks so they don't ghost in search results
+      await tx.deleteChunks(slug);
    }
  });

@@ -103,6 +128,13 @@ export async function importFromContent(

 /**
 * Import from a file path. Validates size, reads content, delegates to importFromContent.
+ *
+ * Slug authority: the path on disk is the source of truth. `frontmatter.slug`
+ * is only accepted when it matches `slugifyPath(relativePath)`. A mismatch is
+ * rejected rather than silently honored — otherwise a file at `notes/random.md`
+ * could declare `slug: people/elon` in frontmatter and overwrite the legitimate
+ * `people/elon` page on the next `gbrain sync` or `gbrain import`. In shared
+ * brains where PRs are mergeable, this is a silent page-hijack primitive.
 */
 export async function importFromFile(
  engine: BrainEngine,
@@ -110,6 +142,12 @@ export async function importFromFile(
  relativePath: string,
  opts: { noEmbed?: boolean } = {},
 ): Promise<ImportResult> {
+  // Defense-in-depth: reject symlinks before reading content.
+  const lstat = lstatSync(filePath);
+  if (lstat.isSymbolicLink()) {
+    return { slug: relativePath, status: 'skipped', chunks: 0, error: `Skipping symlink: ${filePath}` };
+  }
+
  const stat = statSync(filePath);
  if (stat.size > MAX_FILE_SIZE) {
    return { slug: relativePath, status: 'skipped', chunks: 0, error: `File too large (${stat.size} bytes)` };
@@ -117,7 +155,25 @@ export async function importFromFile(

  const content = readFileSync(filePath, 'utf-8');
  const parsed = parseMarkdown(content, relativePath);
-  return importFromContent(engine, parsed.slug, content, opts);
+
+  // Enforce path-authoritative slug. parseMarkdown prefers frontmatter.slug over
+  // the path-derived slug, so a mismatch here means the frontmatter is trying
+  // to rewrite a page whose filesystem location says something different.
+  const expectedSlug = slugifyPath(relativePath);
+  if (parsed.slug !== expectedSlug) {
+    return {
+      slug: expectedSlug,
+      status: 'skipped',
+      chunks: 0,
+      error:
+        `Frontmatter slug "${parsed.slug}" does not match path-derived slug "${expectedSlug}" ` +
+        `(from ${relativePath}). Remove the frontmatter "slug:" line or move the file.`,
+    };
+  }
+
+  // Pass the path-derived slug explicitly so that any future change to
+  // parseMarkdown's precedence rules cannot re-introduce this bug.
+  return importFromContent(engine, expectedSlug, content, opts);
 }

 // Backward compat
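Reviewer sketch of why the size guard in `importFromContent` counts UTF-8 bytes rather than using `content.length`. `checkSize` and `utf8ByteLength` are hypothetical helpers written for this note. `TextEncoder` is swapped in for `Buffer.byteLength(content, 'utf-8')` only to keep the snippet runtime-agnostic; both count UTF-8 bytes.

```typescript
// Illustrative only: these helpers are not part of gbrain.
const MAX_FILE_SIZE = 5_000_000; // same 5 MB ceiling as in the diff

function utf8ByteLength(content: string): number {
  // TextEncoder always encodes to UTF-8.
  return new TextEncoder().encode(content).length;
}

function checkSize(content: string): { ok: boolean; bytes: number } {
  const bytes = utf8ByteLength(content);
  return { ok: bytes <= MAX_FILE_SIZE, bytes };
}

// U+1F600 is 2 UTF-16 code units but 4 UTF-8 bytes.
const emojiPayload = '\u{1F600}'.repeat(1_500_000);

console.log(emojiPayload.length);     // 3000000: would pass a naive .length check
console.log(checkSize(emojiPayload)); // { ok: false, bytes: 6000000 }
```

Measuring bytes keeps the MCP path consistent with the file path, where `stat.size` is already a byte count.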
--- a/src/core/operations.ts
+++ b/src/core/operations.ts
@@ -176,9 +176,13 @@ const search: Operation = {
  params: {
    query: { type: 'string', required: true },
    limit: { type: 'number', description: 'Max results (default 20)' },
+    offset: { type: 'number', description: 'Skip first N results (for pagination)' },
  },
  handler: async (ctx, p) => {
-    return ctx.engine.searchKeyword(p.query as string, { limit: (p.limit as number) || 20 });
+    return ctx.engine.searchKeyword(p.query as string, {
+      limit: (p.limit as number) || 20,
+      offset: (p.offset as number) || 0,
+    });
  },
  cliHints: { name: 'search', positional: ['query'] },
 };
@@ -189,12 +193,14 @@ const query: Operation = {
  params: {
    query: { type: 'string', required: true },
    limit: { type: 'number', description: 'Max results (default 20)' },
+    offset: { type: 'number', description: 'Skip first N results (for pagination)' },
    expand: { type: 'boolean', description: 'Enable multi-query expansion (default: true)' },
  },
  handler: async (ctx, p) => {
    const expand = p.expand !== false;
    return hybridSearch(ctx.engine, p.query as string, {
      limit: (p.limit as number) || 20,
+      offset: (p.offset as number) || 0,
      expansion: expand,
      expandFn: expand ? expandQuery : undefined,
    });
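Reviewer sketch of how a caller might page through results using the new `offset` param together with the 100-result ceiling. `searchPage` is a hypothetical in-memory stand-in for a search backend, not the gbrain engine API; only the limit/offset arithmetic is the point.

```typescript
// Illustrative only: an in-memory stand-in for a capped search endpoint.
const PAGE_CEILING = 100; // mirrors MAX_SEARCH_LIMIT from the diff

function searchPage(all: string[], limit: number, offset: number): string[] {
  const capped = Math.min(limit, PAGE_CEILING); // server-side clamp
  return all.slice(offset, offset + capped);
}

function fetchAll(all: string[], requestedPageSize: number): string[] {
  const out: string[] = [];
  // Advance by the clamped size, not the requested one, or results get skipped.
  const step = Math.max(1, Math.min(requestedPageSize, PAGE_CEILING));
  for (let offset = 0; ; offset += step) {
    const page = searchPage(all, step, offset);
    out.push(...page);
    if (page.length < step) break; // a short page means the end was reached
  }
  return out;
}

const corpus = Array.from({ length: 250 }, (_, i) => `page-${i}`);
const collected = fetchAll(corpus, 1000); // asks for 1000, receives 100 per call
console.log(collected.length); // 250
```

A caller that requests `limit: 1000` and then advances `offset` by 1000 would silently miss results 100 through 999, which is why the step is derived from the clamped value.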
--- a/src/core/pglite-engine.ts
+++ b/src/core/pglite-engine.ts
@@ -3,8 +3,10 @@ import { vector } from '@electric-sql/pglite/vector';
 import { pg_trgm } from '@electric-sql/pglite/contrib/pg_trgm';
 import type { Transaction } from '@electric-sql/pglite';
 import type { BrainEngine } from './engine.ts';
+import { MAX_SEARCH_LIMIT, clampSearchLimit } from './engine.ts';
 import { runMigrations } from './migrate.ts';
 import { PGLITE_SCHEMA_SQL } from './pglite-schema.ts';
+import { acquireLock, releaseLock, type LockHandle } from './pglite-lock.ts';
 import type {
  Page, PageInput, PageFilters, PageType,
  Chunk, ChunkInput,
@@ -23,6 +25,7 @@ type PGLiteDB = PGlite;

 export class PGLiteEngine implements BrainEngine {
  private _db: PGLiteDB | null = null;
+  private _lock: LockHandle | null = null;

  get db(): PGLiteDB {
    if (!this._db) throw new Error('PGLite not connected. Call connect() first.');
@@ -32,6 +35,14 @@ export class PGLiteEngine implements BrainEngine {
  // Lifecycle
  async connect(config: EngineConfig): Promise<void> {
    const dataDir = config.database_path || undefined; // undefined = in-memory
+
+    // Acquire file lock to prevent concurrent PGLite access (crashes with Aborted())
+    this._lock = await acquireLock(dataDir);
+
+    if (!this._lock.acquired) {
+      throw new Error('Could not acquire PGLite lock. Another gbrain process is using the database.');
+    }
+
    this._db = await PGlite.create({
      dataDir,
      extensions: { vector, pg_trgm },
@@ -43,6 +54,10 @@ export class PGLiteEngine implements BrainEngine {
      await this._db.close();
      this._db = null;
    }
+    if (this._lock?.acquired) {
+      await releaseLock(this._lock);
+      this._lock = null;
+    }
  }

  async initSchema(): Promise<void> {
@@ -156,7 +171,12 @@ export class PGLiteEngine implements BrainEngine {

  // Search
  async searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]> {
-    const limit = opts?.limit || 20;
+    const limit = clampSearchLimit(opts?.limit);
+    const offset = opts?.offset || 0;
+
+    if (opts?.limit && opts.limit > MAX_SEARCH_LIMIT) {
+      console.warn(`[gbrain] Warning: search limit clamped from ${opts.limit} to ${MAX_SEARCH_LIMIT}`);
+    }

    const { rows } = await this.db.query(
      `SELECT DISTINCT ON (p.slug)
@@ -173,19 +193,23 @@ export class PGLiteEngine implements BrainEngine {
      [query]
    );

-    // Re-sort by score (DISTINCT ON requires ORDER BY slug first) and apply limit
+    // Re-sort by score (DISTINCT ON requires ORDER BY slug first) and apply limit + offset
    const sorted = (rows as Record<string, unknown>[]).sort(
      (a: any, b: any) => b.score - a.score
    );
-    sorted.splice(limit);

-    return sorted.map(rowToSearchResult);
+    return sorted.slice(offset, offset + limit).map(rowToSearchResult);
  }

  async searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]> {
-    const limit = opts?.limit || 20;
+    const limit = clampSearchLimit(opts?.limit);
+    const offset = opts?.offset || 0;
    const vecStr = '[' + Array.from(embedding).join(',') + ']';

+    if (opts?.limit && opts.limit > MAX_SEARCH_LIMIT) {
+      console.warn(`[gbrain] Warning: search limit clamped from ${opts.limit} to ${MAX_SEARCH_LIMIT}`);
+    }
+
    const { rows } = await this.db.query(
      `SELECT
        p.slug, p.id as page_id, p.title, p.type,
@@ -198,8 +222,9 @@ export class PGLiteEngine implements BrainEngine {
      JOIN pages p ON p.id = cc.page_id
      WHERE cc.embedding IS NOT NULL
      ORDER BY cc.embedding <=> $1::vector
-      LIMIT $2`,
-      [vecStr, limit]
+      LIMIT $2
+      OFFSET $3`,
+      [vecStr, limit, offset]
    );

    return (rows as Record<string, unknown>[]).map(rowToSearchResult);
@@ -250,7 +275,7 @@ export class PGLiteEngine implements BrainEngine {
       ON CONFLICT (page_id, chunk_index) DO UPDATE SET
         chunk_text = EXCLUDED.chunk_text,
         chunk_source = EXCLUDED.chunk_source,
-         embedding = COALESCE(EXCLUDED.embedding, content_chunks.embedding),
+         embedding = CASE WHEN EXCLUDED.chunk_text != content_chunks.chunk_text THEN EXCLUDED.embedding ELSE COALESCE(EXCLUDED.embedding, content_chunks.embedding) END,
         model = COALESCE(EXCLUDED.model, content_chunks.model),
         token_count = EXCLUDED.token_count,
         embedded_at = COALESCE(EXCLUDED.embedded_at, content_chunks.embedded_at)`,
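Reviewer sketch of the new `embedding = CASE ... END` rule in the upsert above, modeled as a plain TypeScript function. `ChunkRow` and `resolveEmbedding` are names invented for this note, and the model assumes `chunk_text` is never NULL.

```typescript
// Illustrative model of the SQL CASE expression; not gbrain code.
interface ChunkRow {
  chunk_text: string;
  embedding: number[] | null;
}

function resolveEmbedding(existing: ChunkRow, incoming: ChunkRow): number[] | null {
  if (incoming.chunk_text !== existing.chunk_text) {
    // Text changed: take the incoming vector even when it is null,
    // so an old vector is never left attached to new text.
    return incoming.embedding;
  }
  // Text unchanged: same as COALESCE(incoming, existing).
  return incoming.embedding ?? existing.embedding;
}

const stored: ChunkRow = { chunk_text: 'alpha', embedding: [1, 2] };

console.log(resolveEmbedding(stored, { chunk_text: 'alpha', embedding: null }));   // [1, 2]
console.log(resolveEmbedding(stored, { chunk_text: 'beta', embedding: null }));    // null
console.log(resolveEmbedding(stored, { chunk_text: 'beta', embedding: [3, 4] }));  // [3, 4]
console.log(resolveEmbedding(stored, { chunk_text: 'alpha', embedding: [9, 9] })); // [9, 9]
```

The second case is the one the fix targets: before this change the plain `COALESCE` would have kept `[1, 2]` alongside the new text.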
--- a/src/core/pglite-lock.ts
+++ b/src/core/pglite-lock.ts
@@ -0,0 +1,144 @@
+/**
+ * PGLite File Lock — prevents concurrent process access to the same data directory.
+ *
+ * PGLite uses embedded Postgres (WASM) which only supports one connection at a time.
+ * When `gbrain embed` (which can take minutes) is running and another process tries
+ * to connect, PGLite throws `Aborted()` because it can't handle concurrent access.
+ *
+ * This module implements a simple advisory lock using a lock directory inside the data
+ * directory. It relies on `mkdir` being atomic on POSIX filesystems, combined with PID
+ * tracking for stale lock detection.
+ *
+ * Usage:
+ *   const lock = await acquireLock(dataDir);
+ *   try { ... } finally { await releaseLock(lock); }
+ */
+
+import { mkdirSync, existsSync, readFileSync, writeFileSync, rmSync, statSync } from 'fs';
+import { join } from 'path';
+
+const LOCK_DIR_NAME = '.gbrain-lock';
+const LOCK_FILE = 'lock';
+const STALE_THRESHOLD_MS = 5 * 60 * 1000; // 5 minutes — embed jobs can be long
+
+export interface LockHandle {
+  lockDir: string;
+  acquired: boolean;
+}
+
+function getLockDir(dataDir: string | undefined): string {
+  // The lock directory lives inside the data dir; in-memory PGLite (no dataDir) needs no lock
+  if (!dataDir) {
+    // In-memory PGLite — no concurrent access possible since it's process-scoped
+    // Return a sentinel that we skip
+    return '';
+  }
+  return join(dataDir, LOCK_DIR_NAME);
+}
+
+function isProcessAlive(pid: number): boolean {
+  try {
+    // Sending signal 0 checks existence without actually sending a signal
+    process.kill(pid, 0);
+    return true;
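+  } catch (e: unknown) {
+    return (e as { code?: string }).code === 'EPERM'; // EPERM: process exists but is owned by another user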
+  }
+}
+
+/**
+ * Attempt to acquire an exclusive lock on the PGLite data directory.
+ * Resolves with { acquired: true } once the lock is obtained; throws if timeoutMs elapses first.
+ * Stale locks (from dead processes) are automatically cleaned up.
+ */
+export async function acquireLock(dataDir: string | undefined, opts?: { timeoutMs?: number }): Promise<LockHandle> {
+  const lockDir = getLockDir(dataDir);
+
+  // In-memory PGLite — no lock needed (process-scoped, can't be shared)
+  if (!lockDir) {
+    return { lockDir: '', acquired: true };
+  }
+
+  const timeoutMs = opts?.timeoutMs ?? 30_000; // 30 second default timeout
+  const startTime = Date.now();
+
+  while (Date.now() - startTime < timeoutMs) {
+    // Check for stale lock first
+    if (existsSync(lockDir)) {
+      const lockPath = join(lockDir, LOCK_FILE);
+      try {
+        const lockData = JSON.parse(readFileSync(lockPath, 'utf-8'));
+        const lockPid = lockData.pid as number;
+        const lockTime = lockData.acquired_at as number;
+
+        // Is the locking process still alive?
+        if (!isProcessAlive(lockPid)) {
+          // Stale lock — clean it up
+          try { rmSync(lockDir, { recursive: true, force: true }); } catch { /* race condition, try again */ }
+        } else if (Date.now() - lockTime > STALE_THRESHOLD_MS) {
+          // Lock held for too long — assume stale (e.g., process hung)
+          // Still alive but probably stuck — force remove
+          try { rmSync(lockDir, { recursive: true, force: true }); } catch { /* race condition */ }
+        } else {
+          // Lock is held by a live process — wait and retry
+          await new Promise(r => setTimeout(r, 1000));
+          continue;
+        }
+      } catch {
+        // Corrupt lock file — remove it
+        try { rmSync(lockDir, { recursive: true, force: true }); } catch { /* race condition */ }
+      }
+    }
+
+    // Try to acquire lock (atomic mkdir)
+    try {
+      mkdirSync(lockDir, { recursive: false });
+      // We got the lock — write our PID
+      const lockPath = join(lockDir, LOCK_FILE);
+      writeFileSync(lockPath, JSON.stringify({
+        pid: process.pid,
+        acquired_at: Date.now(),
+        command: process.argv.slice(1).join(' '),
+      }), { mode: 0o644 });
+
+      return { lockDir, acquired: true };
+    } catch (e: unknown) {
+      // mkdir failed — someone else grabbed it between our check and mkdir
+      // This is fine, we'll retry
+      if (Date.now() - startTime >= timeoutMs) {
+        // Timeout — report which process holds the lock
+        const lockPath = join(lockDir, LOCK_FILE);
+        try {
+          const lockData = JSON.parse(readFileSync(lockPath, 'utf-8'));
+          throw new Error(
+            `GBrain: Timed out waiting for PGLite lock. Process ${lockData.pid} has held it since ${new Date(lockData.acquired_at).toISOString()} (command: ${lockData.command}). ` +
+            `If that process is dead, remove ${lockDir} and try again.`
+          );
+        } catch (readErr) {
+          if (readErr instanceof Error && readErr.message.startsWith('GBrain')) throw readErr;
+          throw new Error(
+            `GBrain: Timed out waiting for PGLite lock. Remove ${lockDir} and try again.`
+          );
+        }
+      }
+      // Brief wait before retry
+      await new Promise(r => setTimeout(r, 500));
+    }
+  }
+
+  // Reached when the wait loop times out while a live process still holds the lock
+  throw new Error(`GBrain: Timed out waiting for PGLite lock.`);
+}
+
+/**
+ * Release a previously acquired lock.
+ */
+export async function releaseLock(lock: LockHandle): Promise<void> {
+  if (!lock.lockDir || !lock.acquired) return;
+
+  try {
+    rmSync(lock.lockDir, { recursive: true, force: true });
+  } catch {
+    // Lock directory could not be removed (e.g., a concurrent stale cleanup); nothing more to do
+  }
+}
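A minimal sketch of the atomic-`mkdir` technique the lock module relies on, isolated from the PID and stale-lock handling. `tryAcquire` and the temp-directory layout are hypothetical names used only for illustration; they are not part of `pglite-lock.ts`.

```typescript
import { mkdirSync, mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: returns true only if this call created the lock directory.
function tryAcquire(lockDir: string): boolean {
  try {
    mkdirSync(lockDir, { recursive: false }); // throws EEXIST if the lock is already held
    return true;
  } catch (e: unknown) {
    if ((e as { code?: string }).code === 'EEXIST') return false;
    throw e; // unexpected failure (permissions, missing parent directory, ...)
  }
}

const demoRoot = mkdtempSync(join(tmpdir(), 'lock-demo-'));
const demoLock = join(demoRoot, '.gbrain-lock');

const first = tryAcquire(demoLock);  // creates the directory
const second = tryAcquire(demoLock); // directory already exists, so this fails
rmSync(demoLock, { recursive: true, force: true }); // release
const third = tryAcquire(demoLock);  // free again after release

rmSync(demoRoot, { recursive: true, force: true }); // clean up the demo directory
```

Because creation and the existence check happen in one system call, two processes racing on the same path cannot both see success, which is why no separate "check then create" step is needed for correctness.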
--- a/src/core/postgres-engine.ts
+++ b/src/core/postgres-engine.ts
@@ -1,5 +1,6 @@
 import postgres from 'postgres';
 import type { BrainEngine } from './engine.ts';
+import { MAX_SEARCH_LIMIT, clampSearchLimit } from './engine.ts';
 import { runMigrations } from './migrate.ts';
 import { SCHEMA_SQL } from './schema-embedded.ts';
 import type {
@@ -98,7 +99,7 @@ export class PostgresEngine implements BrainEngine {
  async putPage(slug: string, page: PageInput): Promise<Page> {
    slug = validateSlug(slug);
    const sql = this.sql;
-    const hash = page.content_hash || contentHash(page.compiled_truth, page.timeline || '');
+    const hash = page.content_hash || contentHash(page);
    const frontmatter = page.frontmatter || {};

    const rows = await sql`
@@ -178,31 +179,56 @@ export class PostgresEngine implements BrainEngine {
  // Search
  async searchKeyword(query: string, opts?: SearchOpts): Promise<SearchResult[]> {
    const sql = this.sql;
-    const limit = opts?.limit || 20;
+    const limit = clampSearchLimit(opts?.limit);
+    const offset = opts?.offset || 0;
+    const type = opts?.type;
+    const excludeSlugs = opts?.exclude_slugs;

+    if (opts?.limit && opts.limit > MAX_SEARCH_LIMIT) {
+      console.warn(`[gbrain] Warning: search limit clamped from ${opts.limit} to ${MAX_SEARCH_LIMIT}`);
+    }
+
+    // CTE: rank pages by FTS score in SQL, then take each page's first chunk (lowest chunk_index)
    const rows = await sql`
-      SELECT DISTINCT ON (p.slug)
-        p.slug, p.id as page_id, p.title, p.type,
-        cc.chunk_text, cc.chunk_source,
-        ts_rank(p.search_vector, websearch_to_tsquery('english', ${query})) AS score,
-        CASE WHEN p.updated_at < (
-          SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
-        ) THEN true ELSE false END AS stale
-      FROM pages p
-      JOIN content_chunks cc ON cc.page_id = p.id
-      WHERE p.search_vector @@ websearch_to_tsquery('english', ${query})
-      ORDER BY p.slug, score DESC
+      WITH ranked_pages AS (
+        SELECT p.id, p.slug, p.title, p.type,
+          ts_rank(p.search_vector, websearch_to_tsquery('english', ${query})) AS score
+        FROM pages p
+        WHERE p.search_vector @@ websearch_to_tsquery('english', ${query})
+          ${type ? sql`AND p.type = ${type}` : sql``}
+          ${excludeSlugs?.length ? sql`AND p.slug != ALL(${excludeSlugs})` : sql``}
+        ORDER BY score DESC
+        LIMIT ${limit}
+        OFFSET ${offset}
+      ),
+      best_chunks AS (
+        SELECT DISTINCT ON (rp.slug)
+          rp.slug, rp.id as page_id, rp.title, rp.type, rp.score,
+          cc.chunk_text, cc.chunk_source
+        FROM ranked_pages rp
+        JOIN content_chunks cc ON cc.page_id = rp.id
+        ORDER BY rp.slug, cc.chunk_index
+      )
+      SELECT slug, page_id, title, type, chunk_text, chunk_source, score,
+        false AS stale
+      FROM best_chunks
+      ORDER BY score DESC
    `;
-    // Re-sort by score (DISTINCT ON requires ORDER BY slug first) and apply limit
-    rows.sort((a: any, b: any) => b.score - a.score);
-    rows.splice(limit);

    return rows.map(rowToSearchResult);
  }

  async searchVector(embedding: Float32Array, opts?: SearchOpts): Promise<SearchResult[]> {
    const sql = this.sql;
-    const limit = opts?.limit || 20;
+    const limit = clampSearchLimit(opts?.limit);
+    const offset = opts?.offset || 0;
+    const type = opts?.type;
+    const excludeSlugs = opts?.exclude_slugs;
+
+    if (opts?.limit && opts.limit > MAX_SEARCH_LIMIT) {
+      console.warn(`[gbrain] Warning: search limit clamped from ${opts.limit} to ${MAX_SEARCH_LIMIT}`);
+    }
+
    const vecStr = '[' + Array.from(embedding).join(',') + ']';

    const rows = await sql`
@@ -210,14 +236,15 @@ export class PostgresEngine implements BrainEngine {
        p.slug, p.id as page_id, p.title, p.type,
        cc.chunk_text, cc.chunk_source,
        1 - (cc.embedding <=> ${vecStr}::vector) AS score,
-        CASE WHEN p.updated_at < (
-          SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id
-        ) THEN true ELSE false END AS stale
+        false AS stale
      FROM content_chunks cc
      JOIN pages p ON p.id = cc.page_id
      WHERE cc.embedding IS NOT NULL
+        ${type ? sql`AND p.type = ${type}` : sql``}
+        ${excludeSlugs?.length ? sql`AND p.slug != ALL(${excludeSlugs})` : sql``}
      ORDER BY cc.embedding <=> ${vecStr}::vector
      LIMIT ${limit}
+      OFFSET ${offset}
    `;

    return rows.map(rowToSearchResult);
@@ -268,7 +295,7 @@ export class PostgresEngine implements BrainEngine {
       ON CONFLICT (page_id, chunk_index) DO UPDATE SET
         chunk_text = EXCLUDED.chunk_text,
         chunk_source = EXCLUDED.chunk_source,
-         embedding = COALESCE(EXCLUDED.embedding, content_chunks.embedding),
+         embedding = CASE WHEN EXCLUDED.chunk_text != content_chunks.chunk_text THEN EXCLUDED.embedding ELSE COALESCE(EXCLUDED.embedding, content_chunks.embedding) END,
         model = COALESCE(EXCLUDED.model, content_chunks.model),
         token_count = EXCLUDED.token_count,
         embedded_at = COALESCE(EXCLUDED.embedded_at, content_chunks.embedded_at)`,
@@ -298,7 +325,7 @@ export class PostgresEngine implements BrainEngine {
  // Links
  async addLink(from: string, to: string, context?: string, linkType?: string): Promise<void> {
    const sql = this.sql;
-    await sql`
+    const result = await sql`
      INSERT INTO links (from_page_id, to_page_id, link_type, context)
      SELECT f.id, t.id, ${linkType || ''}, ${context || ''}
      FROM pages f, pages t
@@ -306,7 +333,9 @@ export class PostgresEngine implements BrainEngine {
      ON CONFLICT (from_page_id, to_page_id) DO UPDATE SET
        link_type = EXCLUDED.link_type,
        context = EXCLUDED.context
+      RETURNING id
    `;
+    if (result.length === 0) throw new Error(`addLink failed: page "${from}" or "${to}" not found`);
  }

  async removeLink(from: string, to: string): Promise<void> {
@@ -381,9 +410,13 @@ export class PostgresEngine implements BrainEngine {
  // Tags
  async addTag(slug: string, tag: string): Promise<void> {
    const sql = this.sql;
+    // Verify page exists before attempting insert (ON CONFLICT DO NOTHING
+    // swallows the "already tagged" case, but we still need to detect missing pages)
+    const page = await sql`SELECT id FROM pages WHERE slug = ${slug}`;
+    if (page.length === 0) throw new Error(`addTag failed: page "${slug}" not found`);
    await sql`
      INSERT INTO tags (page_id, tag)
-      SELECT id, ${tag} FROM pages WHERE slug = ${slug}
+      VALUES (${page[0].id}, ${tag})
      ON CONFLICT (page_id, tag) DO NOTHING
    `;
  }
@@ -410,11 +443,13 @@ export class PostgresEngine implements BrainEngine {
  // Timeline
  async addTimelineEntry(slug: string, entry: TimelineInput): Promise<void> {
    const sql = this.sql;
-    await sql`
+    const result = await sql`
      INSERT INTO timeline_entries (page_id, date, source, summary, detail)
      SELECT id, ${entry.date}::date, ${entry.source || ''}, ${entry.summary}, ${entry.detail || ''}
      FROM pages WHERE slug = ${slug}
+      RETURNING id
    `;
+    if (result.length === 0) throw new Error(`addTimelineEntry failed: page "${slug}" not found`);
  }

  async getTimeline(slug: string, opts?: TimelineOpts): Promise<TimelineEntry[]> {
@@ -451,14 +486,16 @@ export class PostgresEngine implements BrainEngine {
  // Raw data
  async putRawData(slug: string, source: string, data: object): Promise<void> {
    const sql = this.sql;
-    await sql`
+    const result = await sql`
      INSERT INTO raw_data (page_id, source, data)
      SELECT id, ${source}, ${JSON.stringify(data)}::jsonb
      FROM pages WHERE slug = ${slug}
      ON CONFLICT (page_id, source) DO UPDATE SET
        data = EXCLUDED.data,
        fetched_at = now()
+      RETURNING id
    `;
+    if (result.length === 0) throw new Error(`putRawData failed: page "${slug}" not found`);
  }

  async getRawData(slug: string, source?: string): Promise<RawData[]> {
@@ -489,6 +526,7 @@ export class PostgresEngine implements BrainEngine {
      FROM pages WHERE slug = ${slug}
      RETURNING *
    `;
+    if (rows.length === 0) throw new Error(`createVersion failed: page "${slug}" not found`);
    return rows[0] as unknown as PageVersion;
  }

@@ -555,13 +593,16 @@ export class PostgresEngine implements BrainEngine {
        (SELECT count(*) FROM content_chunks WHERE embedded_at IS NOT NULL)::float /
          GREATEST((SELECT count(*) FROM content_chunks), 1)::float as embed_coverage,
        (SELECT count(*) FROM pages p
-         WHERE p.updated_at < (SELECT MAX(te.created_at) FROM timeline_entries te WHERE te.page_id = p.id)
+         WHERE (p.compiled_truth != '' OR p.timeline != '')
+           AND NOT EXISTS (SELECT 1 FROM content_chunks cc WHERE cc.page_id = p.id)
        ) as stale_pages,
        (SELECT count(*) FROM pages p
         WHERE NOT EXISTS (SELECT 1 FROM links l WHERE l.to_page_id = p.id)
+           AND NOT EXISTS (SELECT 1 FROM links l WHERE l.from_page_id = p.id)
        ) as orphan_pages,
-        (SELECT count(*) FROM links l
-         WHERE NOT EXISTS (SELECT 1 FROM pages p WHERE p.id = l.to_page_id)
+        (SELECT count(*) FROM content_chunks cc
+         JOIN pages p ON p.id = cc.page_id
+         WHERE p.compiled_truth = '' AND p.timeline = ''
        ) as dead_links,
        (SELECT count(*) FROM content_chunks WHERE embedded_at IS NULL) as missing_embeddings
    `;
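`clampSearchLimit` is imported from `engine.ts`, which is not shown in this part of the diff. The following is only an assumed shape, based on the commit message (cap of 100) and the previous default of 20; the real helper may treat edge cases differently.

```typescript
// Assumed values: the commit message states a cap of 100; 20 was the old default.
const MAX_SEARCH_LIMIT = 100;
const DEFAULT_SEARCH_LIMIT = 20;

// Hypothetical reimplementation, not the project's actual helper.
function clampSearchLimit(limit?: number): number {
  // Missing, non-finite, or non-positive limits fall back to the default.
  if (limit === undefined || !Number.isFinite(limit) || limit <= 0) {
    return DEFAULT_SEARCH_LIMIT;
  }
  // Everything else is truncated to an integer and capped.
  return Math.min(Math.floor(limit), MAX_SEARCH_LIMIT);
}

const defaulted = clampSearchLimit();
const passedThrough = clampSearchLimit(5);
const capped = clampSearchLimit(10_000);
```

Capping in one helper keeps the keyword and vector paths consistent, so a caller cannot bypass the limit by choosing a different search mode.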
--- a/src/core/search/hybrid.ts
+++ b/src/core/search/hybrid.ts
@@ -7,6 +7,7 @@
 */

 import type { BrainEngine } from '../engine.ts';
+import { MAX_SEARCH_LIMIT, clampSearchLimit } from '../engine.ts';
 import type { SearchResult, SearchOpts } from '../types.ts';
 import { embed } from '../embedding.ts';
 import { dedupResults } from './dedup.ts';
@@ -24,21 +25,25 @@ export async function hybridSearch(
  opts?: HybridSearchOpts,
 ): Promise<SearchResult[]> {
  const limit = opts?.limit || 20;
+  const offset = opts?.offset || 0;
+  const innerLimit = Math.min((offset + limit) * 2, MAX_SEARCH_LIMIT);

  // Run keyword search (always available, no API key needed)
-  const keywordResults = await engine.searchKeyword(query, { limit: limit * 2 });
+  const keywordResults = await engine.searchKeyword(query, { limit: innerLimit });

  // Skip vector search entirely if no OpenAI key is configured
  if (!process.env.OPENAI_API_KEY) {
-    return dedupResults(keywordResults).slice(0, limit);
+    return dedupResults(keywordResults).slice(offset, offset + limit);
  }

  // Determine query variants (optionally with expansion)
+  // expandQuery already includes the original query in its return value,
+  // so we use it directly instead of prepending query again
  let queries = [query];
  if (opts?.expansion && opts?.expandFn) {
    try {
-      const expanded = await opts.expandFn(query);
-      queries = [query, ...expanded].slice(0, 3);
+      queries = await opts.expandFn(query);
+      if (queries.length === 0) queries = [query];
    } catch {
      // Expansion failure is non-fatal
    }
@@ -49,14 +54,14 @@ export async function hybridSearch(
  try {
    const embeddings = await Promise.all(queries.map(q => embed(q)));
    vectorLists = await Promise.all(
-      embeddings.map(emb => engine.searchVector(emb, { limit: limit * 2 })),
+      embeddings.map(emb => engine.searchVector(emb, { limit: innerLimit })),
    );
  } catch {
    // Embedding failure is non-fatal, fall back to keyword-only
  }

  if (vectorLists.length === 0) {
-    return dedupResults(keywordResults).slice(0, limit);
+    return dedupResults(keywordResults).slice(offset, offset + limit);
  }

  // Merge all result lists via RRF
@@ -66,7 +71,7 @@ export async function hybridSearch(
  // Dedup
  const deduped = dedupResults(fused);

-  return deduped.slice(0, limit);
+  return deduped.slice(offset, offset + limit);
 }

 /**
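The RRF merge step itself is outside this hunk. As a rough illustration of how fused results interact with the new `offset` slice, here is a sketch of reciprocal rank fusion followed by pagination. `rrfFuse`, the `Hit` type, and `k = 60` are assumptions for the example, not the project's implementation.

```typescript
interface Hit { slug: string }

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// with rank starting at 1. k = 60 is a conventional constant, assumed here.
function rrfFuse(lists: Hit[][], k = 60): Hit[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      scores.set(hit.slug, (scores.get(hit.slug) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([slug]) => ({ slug }));
}

const keywordHits: Hit[] = [{ slug: 'a' }, { slug: 'b' }, { slug: 'c' }];
const vectorHits: Hit[] = [{ slug: 'b' }, { slug: 'd' }, { slug: 'a' }];

const fused = rrfFuse([keywordHits, vectorHits]);
const secondPage = fused.slice(2, 2 + 2); // offset = 2, limit = 2
```

Since the slice is taken after fusion, the inner searches have to return at least `offset + limit` candidates for later pages to be non-empty.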
--- a/src/core/types.ts
+++ b/src/core/types.ts
@@ -66,6 +66,7 @@ export interface SearchResult {

 export interface SearchOpts {
  limit?: number;
+  offset?: number;
  type?: PageType;
  exclude_slugs?: string[];
 }
--- a/src/core/utils.ts
+++ b/src/core/utils.ts
@@ -1,5 +1,5 @@
 import { createHash } from 'crypto';
-import type { Page, PageType, Chunk, SearchResult } from './types.ts';
+import type { Page, PageInput, PageType, Chunk, SearchResult } from './types.ts';

 /**
 * Validate and normalize a slug. Slugs are lowercased repo-relative paths.
@@ -13,10 +13,19 @@ export function validateSlug(slug: string): string {
 }

 /**
- * SHA-256 hash of compiled_truth + timeline, used for import idempotency.
+ * SHA-256 hash of page content, used for import idempotency.
+ * Hashes all PageInput fields to match importFromContent's hash algorithm.
 */
-export function contentHash(compiledTruth: string, timeline: string): string {
-  return createHash('sha256').update(compiledTruth + '\n---\n' + timeline).digest('hex');
+export function contentHash(page: PageInput): string {
+  return createHash('sha256')
+    .update(JSON.stringify({
+      title: page.title,
+      type: page.type,
+      compiled_truth: page.compiled_truth,
+      timeline: page.timeline || '',
+      frontmatter: page.frontmatter || {},
+    }))
+    .digest('hex');
 }

 export function rowToPage(row: Record<string, unknown>): Page {
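A small self-contained sketch of what hashing every `PageInput` field implies. `PageInputSketch` and `hashPage` are illustrative stand-ins that mirror the new `contentHash` body; the field list is taken from the hunk above.

```typescript
import { createHash } from 'node:crypto';

// Illustrative stand-in for PageInput, limited to the fields the hash covers.
interface PageInputSketch {
  title: string;
  type: string;
  compiled_truth: string;
  timeline?: string;
  frontmatter?: Record<string, unknown>;
}

function hashPage(page: PageInputSketch): string {
  return createHash('sha256')
    .update(JSON.stringify({
      title: page.title,
      type: page.type,
      compiled_truth: page.compiled_truth,
      timeline: page.timeline || '',
      frontmatter: page.frontmatter || {},
    }))
    .digest('hex');
}

const basePage: PageInputSketch = { title: 'Alice', type: 'person', compiled_truth: 'Bio.' };

// A missing timeline and an empty timeline hash identically.
const sameHash = hashPage(basePage) === hashPage({ ...basePage, timeline: '' });
// A title-only edit now changes the hash, so the import is no longer skipped.
const titleChanges = hashPage(basePage) !== hashPage({ ...basePage, title: 'Alice S.' });
// JSON.stringify preserves key insertion order, so frontmatter key order matters.
const orderMatters =
  hashPage({ ...basePage, frontmatter: { a: 1, b: 2 } }) !==
  hashPage({ ...basePage, frontmatter: { b: 2, a: 1 } });
```

The last case is worth keeping in mind: two frontmatter objects with the same keys in a different order produce different hashes, so callers that want idempotent imports should build frontmatter in a stable order.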
--- a/src/mcp/server.ts
+++ b/src/mcp/server.ts
@@ -3,10 +3,29 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
 import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
 import type { BrainEngine } from '../core/engine.ts';
 import { operations, OperationError } from '../core/operations.ts';
-import type { OperationContext } from '../core/operations.ts';
+import type { Operation, OperationContext } from '../core/operations.ts';
 import { loadConfig } from '../core/config.ts';
 import { VERSION } from '../version.ts';

+/** Validate required params exist and have the expected type */
+function validateParams(op: Operation, params: Record<string, unknown>): string | null {
+  for (const [key, def] of Object.entries(op.params)) {
+    if (def.required && (params[key] === undefined || params[key] === null)) {
+      return `Missing required parameter: ${key}`;
+    }
+    if (params[key] !== undefined && params[key] !== null) {
+      const val = params[key];
+      const expected = def.type;
+      if (expected === 'string' && typeof val !== 'string') return `Parameter "${key}" must be a string`;
+      if (expected === 'number' && typeof val !== 'number') return `Parameter "${key}" must be a number`;
+      if (expected === 'boolean' && typeof val !== 'boolean') return `Parameter "${key}" must be a boolean`;
+      if (expected === 'object' && (typeof val !== 'object' || Array.isArray(val))) return `Parameter "${key}" must be an object`;
+      if (expected === 'array' && !Array.isArray(val)) return `Parameter "${key}" must be an array`;
+    }
+  }
+  return null;
+}
+
 export async function startMcpServer(engine: BrainEngine) {
  const server = new Server(
    { name: 'gbrain', version: VERSION },
@@ -54,8 +73,14 @@ export async function startMcpServer(engine: BrainEngine) {
      dryRun: !!(params?.dry_run),
    };

+    const safeParams = params || {};
+    const validationError = validateParams(op, safeParams);
+    if (validationError) {
+      return { content: [{ type: 'text', text: JSON.stringify({ error: 'invalid_params', message: validationError }, null, 2) }], isError: true };
+    }
+
    try {
-      const result = await op.handler(ctx, params || {});
+      const result = await op.handler(ctx, safeParams);
      return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
    } catch (e: unknown) {
      if (e instanceof OperationError) {
@@ -79,6 +104,9 @@ export async function handleToolCall(
  const op = operations.find(o => o.name === tool);
  if (!op) throw new Error(`Unknown tool: ${tool}`);

+  const validationError = validateParams(op, params);
+  if (validationError) throw new Error(validationError);
+
  const ctx: OperationContext = {
    engine,
    config: loadConfig() || { engine: 'postgres' },
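To make the validation behaviour concrete, here is a condensed sketch of the same checks run against a hypothetical operation definition. `ParamDef`, `checkParams`, and the `searchParams` table are assumptions for illustration; the real `Operation` type lives in `operations.ts` and is not shown here.

```typescript
type ParamType = 'string' | 'number' | 'boolean' | 'object' | 'array';
interface ParamDef { type: ParamType; required?: boolean }

// Hypothetical condensed variant of validateParams: returns an error message or null.
function checkParams(
  defs: Record<string, ParamDef>,
  params: Record<string, unknown>,
): string | null {
  for (const [key, def] of Object.entries(defs)) {
    const val = params[key];
    if (val === undefined || val === null) {
      if (def.required) return `Missing required parameter: ${key}`;
      continue; // optional and absent: nothing to check
    }
    const ok =
      def.type === 'array' ? Array.isArray(val)
      : def.type === 'object' ? typeof val === 'object' && !Array.isArray(val)
      : typeof val === def.type;
    if (!ok) {
      const article = /^[aeiou]/.test(def.type) ? 'an' : 'a';
      return `Parameter "${key}" must be ${article} ${def.type}`;
    }
  }
  return null;
}

// Hypothetical parameter table for a search-like operation.
const searchParams: Record<string, ParamDef> = {
  query: { type: 'string', required: true },
  limit: { type: 'number' },
};

const okResult = checkParams(searchParams, { query: 'pglite', limit: 10 });
const missing = checkParams(searchParams, { limit: 10 });
const wrongType = checkParams(searchParams, { query: 'pglite', limit: '10' });
```

Note that a numeric string such as `'10'` is rejected rather than coerced, which matches the strict `typeof` checks in the hunk above.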
--- a/src/schema.sql
+++ b/src/schema.sql
@@ -2,6 +2,8 @@

 CREATE EXTENSION IF NOT EXISTS vector;
 CREATE EXTENSION IF NOT EXISTS pg_trgm;
+-- gen_random_uuid() is core in Postgres 13+; enable pgcrypto as fallback for older versions
+CREATE EXTENSION IF NOT EXISTS pgcrypto;

 -- ============================================================
 -- pages: the core content table
--- a/test/cli.test.ts
+++ b/test/cli.test.ts
@@ -40,6 +40,26 @@ describe('CLI version', () => {
  });
 });

+describe('ask alias', () => {
+  test('ask alias maps to query in source', () => {
+    expect(cliSource).toContain("if (command === 'ask')");
+    expect(cliSource).toContain("command = 'query'");
+  });
+
+  test('ask does NOT appear in --tools-json output', async () => {
+    const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', '--tools-json'], {
+      cwd: new URL('..', import.meta.url).pathname,
+      stdout: 'pipe',
+      stderr: 'pipe',
+    });
+    const stdout = await new Response(proc.stdout).text();
+    await proc.exited;
+    const tools = JSON.parse(stdout);
+    const names = tools.map((t: any) => t.name);
+    expect(names).not.toContain('ask');
+  });
+});
+
 describe('CLI dispatch integration', () => {
  test('--version outputs version', async () => {
    const proc = Bun.spawn(['bun', 'run', 'src/cli.ts', '--version'], {
--- a/test/embed.test.ts
+++ b/test/embed.test.ts
@@ -0,0 +1,128 @@
+import { describe, test, expect, mock, beforeEach, afterEach } from 'bun:test';
+import type { BrainEngine } from '../src/core/engine.ts';
+
+// Mock the embedding module BEFORE importing runEmbed, so runEmbed picks up
+// the mocked embedBatch. We track max concurrent invocations via a counter
+// that increments on entry and decrements when the mock resolves.
+let activeEmbedCalls = 0;
+let maxConcurrentEmbedCalls = 0;
+let totalEmbedCalls = 0;
+
+mock.module('../src/core/embedding.ts', () => ({
+  embedBatch: async (texts: string[]) => {
+    activeEmbedCalls++;
+    totalEmbedCalls++;
+    if (activeEmbedCalls > maxConcurrentEmbedCalls) {
+      maxConcurrentEmbedCalls = activeEmbedCalls;
+    }
+    // Simulate API latency so concurrent workers actually overlap.
+    await new Promise(r => setTimeout(r, 30));
+    activeEmbedCalls--;
+    return texts.map(() => new Float32Array(1536));
+  },
+}));
+
+// Import AFTER mocking.
+const { runEmbed } = await import('../src/commands/embed.ts');
+
+// Proxy-based mock engine that matches test/import-file.test.ts pattern.
+function mockEngine(overrides: Partial<Record<string, any>> = {}): BrainEngine {
+  const calls: { method: string; args: any[] }[] = [];
+  const track = (method: string) => (...args: any[]) => {
+    calls.push({ method, args });
+    if (overrides[method]) return overrides[method](...args);
+    return Promise.resolve(null);
+  };
+  const engine = new Proxy({} as any, {
+    get(_, prop: string) {
+      if (prop === '_calls') return calls;
+      if (overrides[prop]) return overrides[prop];
+      return track(prop);
+    },
+  });
+  return engine;
+}
+
+beforeEach(() => {
+  activeEmbedCalls = 0;
+  maxConcurrentEmbedCalls = 0;
+  totalEmbedCalls = 0;
+});
+
+afterEach(() => {
+  delete process.env.GBRAIN_EMBED_CONCURRENCY;
+});
+
+describe('runEmbed --all (parallel)', () => {
+  test('runs embedBatch calls concurrently across pages', async () => {
+    const NUM_PAGES = 20;
+    const pages = Array.from({ length: NUM_PAGES }, (_, i) => ({ slug: `page-${i}` }));
+    // Each page has one chunk without an embedding (stale).
+    const chunksBySlug = new Map(
+      pages.map(p => [
+        p.slug,
+        [{ chunk_index: 0, chunk_text: `text for ${p.slug}`, chunk_source: 'compiled_truth', embedded_at: null, token_count: 4 }],
+      ]),
+    );
+
+    const engine = mockEngine({
+      listPages: async () => pages,
+      getChunks: async (slug: string) => chunksBySlug.get(slug) || [],
+      upsertChunks: async () => {},
+    });
+
+    process.env.GBRAIN_EMBED_CONCURRENCY = '10';
+
+    await runEmbed(engine, ['--all']);
+
+    expect(totalEmbedCalls).toBe(NUM_PAGES);
+    // Concurrency actually happened.
+    expect(maxConcurrentEmbedCalls).toBeGreaterThan(1);
+    // And stayed within the configured limit.
+    expect(maxConcurrentEmbedCalls).toBeLessThanOrEqual(10);
+  });
+
+  test('respects GBRAIN_EMBED_CONCURRENCY=1 (serial)', async () => {
+    const pages = Array.from({ length: 5 }, (_, i) => ({ slug: `page-${i}` }));
+    const chunksBySlug = new Map(
+      pages.map(p => [
+        p.slug,
+        [{ chunk_index: 0, chunk_text: `text ${p.slug}`, chunk_source: 'compiled_truth', embedded_at: null, token_count: 4 }],
+      ]),
+    );
+
+    const engine = mockEngine({
+      listPages: async () => pages,
+      getChunks: async (slug: string) => chunksBySlug.get(slug) || [],
+      upsertChunks: async () => {},
+    });
+
+    process.env.GBRAIN_EMBED_CONCURRENCY = '1';
+
+    await runEmbed(engine, ['--all']);
+
+    expect(totalEmbedCalls).toBe(5);
+    expect(maxConcurrentEmbedCalls).toBe(1);
+  });
+
+  test('skips pages whose chunks are all already embedded when --stale', async () => {
+    const pages = [{ slug: 'fresh' }, { slug: 'stale' }];
+    const chunksBySlug = new Map<string, any[]>([
+      ['fresh', [{ chunk_index: 0, chunk_text: 'hi', chunk_source: 'compiled_truth', embedded_at: '2026-01-01', token_count: 1 }]],
+      ['stale', [{ chunk_index: 0, chunk_text: 'hi', chunk_source: 'compiled_truth', embedded_at: null, token_count: 1 }]],
+    ]);
+
+    const engine = mockEngine({
+      listPages: async () => pages,
+      getChunks: async (slug: string) => chunksBySlug.get(slug) || [],
+      upsertChunks: async () => {},
+    });
+
+    process.env.GBRAIN_EMBED_CONCURRENCY = '5';
+
+    await runEmbed(engine, ['--stale']);
+
+    // Only the stale page triggers an embedBatch call.
+    expect(totalEmbedCalls).toBe(1);
+  });
+});
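The tests above only observe concurrency from the outside. For orientation, a sliding worker pool can be sketched as below; `runPool` is a hypothetical helper, and the actual implementation in `src/commands/embed.ts` is not shown in this diff and may differ.

```typescript
// Hypothetical helper: run `task` over `items` with at most `concurrency` in flight.
// Each worker pulls the next item as soon as it finishes one, so slow items
// do not hold up a whole batch.
async function runPool<T>(
  items: T[],
  concurrency: number,
  task: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++]; // no race: JS runs synchronously between awaits
      await task(item);
    }
  };
  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// Demo: 12 simulated tasks with a concurrency of 4.
let active = 0;
let maxActive = 0;
let done = 0;
const poolDemo = runPool(Array.from({ length: 12 }, (_, i) => i), 4, async () => {
  active++;
  maxActive = Math.max(maxActive, active);
  await new Promise<void>(resolve => setTimeout(resolve, 5)); // simulated API latency
  active--;
  done++;
});
```

This mirrors what the tests measure: the in-flight count rises above 1 but never exceeds the configured concurrency.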
--- a/test/import-file.test.ts
+++ b/test/import-file.test.ts
@@ -1,7 +1,7 @@
 import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
-import { writeFileSync, mkdirSync, rmSync } from 'fs';
+import { writeFileSync, mkdirSync, rmSync, symlinkSync } from 'fs';
 import { join } from 'path';
-import { importFile } from '../src/core/import-file.ts';
+import { importFile, importFromContent } from '../src/core/import-file.ts';
 import type { BrainEngine } from '../src/core/engine.ts';

 const TMP = join(import.meta.dir, '.tmp-import-test');
@@ -87,6 +87,91 @@ This is the compiled truth.
    expect((engine as any)._calls.length).toBe(0);
  });

+  test('rejects frontmatter slug that does not match the file path', async () => {
+    // In a shared brain where contributors can land PRs, this prevents a
+    // poisoned notes/random.md from declaring `slug: people/elon` in its
+    // frontmatter and overwriting the legitimate people/elon page on sync.
+    const filePath = join(TMP, 'hijack.md');
+    writeFileSync(filePath, `---
+type: person
+title: Elon Musk
+slug: people/elon
+---
+
+Poisoned content that would overwrite people/elon.
+`);
+
+    const engine = mockEngine();
+    const result = await importFile(engine, filePath, 'notes/random.md', { noEmbed: true });
+
+    expect(result.status).toBe('skipped');
+    expect(result.error).toContain('people/elon');
+    expect(result.error).toContain('notes/random');
+    // No writes to the DB — the hijack never reaches putPage/createVersion.
+    expect((engine as any)._calls.length).toBe(0);
+  });
+
+  test('accepts frontmatter slug that matches the file path', async () => {
+    // Sanity: a legitimate file whose frontmatter slug happens to equal the
+    // path-derived slug must still import.
+    const filePath = join(TMP, 'alice.md');
+    writeFileSync(filePath, `---
+type: person
+title: Alice
+slug: people/alice-smith
+---
+
+Legit content.
+`);
+
+    const engine = mockEngine();
+    const result = await importFile(engine, filePath, 'people/alice-smith.md', { noEmbed: true });
+
+    expect(result.status).toBe('imported');
+    expect(result.slug).toBe('people/alice-smith');
+  });
+
+  test('uses path-derived slug when no frontmatter slug is set', async () => {
+    // The common case: no frontmatter.slug, so the path determines the slug.
+    const filePath = join(TMP, 'concept-path.md');
+    writeFileSync(filePath, `---
+type: concept
+title: From Path
+---
+
+Content.
+`);
+
+    const engine = mockEngine();
+    const result = await importFile(engine, filePath, 'concepts/from-path.md', { noEmbed: true });
+
+    expect(result.status).toBe('imported');
+    expect(result.slug).toBe('concepts/from-path');
+  });
+
+  test('skips symlinks in importFromFile (defense-in-depth)', async () => {
+    // Even if the walker somehow passes a symlink through, importFromFile
+    // should catch it and return skipped.
+    const realFile = join(TMP, 'real-target.md');
+    writeFileSync(realFile, `---
+type: concept
+title: Real
+---
+
+Content.
+`);
+    const linkPath = join(TMP, 'symlink-file.md');
+    try { rmSync(linkPath); } catch { /* may not exist */ }
+    symlinkSync(realFile, linkPath);
+
+    const engine = mockEngine();
+    const result = await importFile(engine, linkPath, 'symlink-file.md', { noEmbed: true });
+
+    expect(result.status).toBe('skipped');
+    expect(result.error).toContain('symlink');
+    expect((engine as any)._calls.length).toBe(0);
+  });
+
  test('skips file when content hash matches (idempotent)', async () => {
    const filePath = join(TMP, 'unchanged.md');
    writeFileSync(filePath, `---
@@ -252,6 +337,49 @@ Content to chunk but not embed.
    }
  });

+  test('rejects in-memory content larger than MAX_FILE_SIZE', async () => {
+    // The remote MCP put_page operation hands user-supplied content straight
+    // to importFromContent, which is the path this guard defends. The guard
+    // must trigger BEFORE parseMarkdown / chunkText / embedBatch — if it doesn't,
+    // an authenticated attacker can force the owner to pay for embedding a
+    // multi-megabyte string.
+    const bigContent = '---\ntitle: Big\n---\n' + 'x'.repeat(5_100_000);
+
+    const engine = mockEngine();
+    const result = await importFromContent(engine, 'big-slug', bigContent, { noEmbed: true });
+
+    expect(result.status).toBe('skipped');
+    expect(result.error).toContain('too large');
+    // No engine work at all — confirms the guard short-circuits before any
+    // parsing or chunking allocation.
+    expect((engine as any)._calls.length).toBe(0);
+  });
+
+  test('uses UTF-8 byte length, not JS string length, for the size check', async () => {
+    // 1.5M 4-byte code points = 6 MB of UTF-8 but only 3M UTF-16 code units (one surrogate
+    // pair each). A length-based check would let this through; a byteLength check catches it.
+    const fourByteChar = '\u{1F600}'; // emoji, 4 bytes in UTF-8
+    const bigContent = fourByteChar.repeat(1_500_000);
+
+    const engine = mockEngine();
+    const result = await importFromContent(engine, 'emoji-slug', bigContent, { noEmbed: true });
+
+    expect(result.status).toBe('skipped');
+    expect(result.error).toContain('too large');
+    expect((engine as any)._calls.length).toBe(0);
+  });
+
+  test('accepts in-memory content just under MAX_FILE_SIZE', async () => {
+    // Sanity: content just under the limit must still import. If this test
+    // fails, the guard is too strict and will break legitimate large imports.
+    const content = '---\ntitle: Borderline\n---\n' + 'x'.repeat(4_900_000);
+
+    const engine = mockEngine();
+    const result = await importFromContent(engine, 'borderline-slug', content, { noEmbed: true });
+
+    expect(result.status).toBe('imported');
+  });
+
  test('assigns sequential chunk_index values', async () => {
    const filePath = join(TMP, 'indexed.md');
    const longText = Array(50).fill('This is a sentence that adds length to the content.').join(' ');
--- a/test/import-walker.test.ts
+++ b/test/import-walker.test.ts
@@ -0,0 +1,89 @@
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdirSync, writeFileSync, symlinkSync, rmSync, mkdtempSync } from 'fs';
+import { tmpdir } from 'os';
+import { join } from 'path';
+import { collectMarkdownFiles } from '../src/commands/import.ts';
+
+// These tests exercise the filesystem walker that feeds `gbrain import`.
+// They target L002 (report/findings.md): a malicious symlink inside a shared
+// brain directory must not cause the walker to read files outside the brain
+// root. See src/commands/import.ts:collectMarkdownFiles.
+
+describe('collectMarkdownFiles — symlink containment', () => {
+  let root: string;
+  let secretDir: string;
+
+  beforeEach(() => {
+    // Fresh directories per test so symlinks can't cross-contaminate runs.
+    root = mkdtempSync(join(tmpdir(), 'gbrain-walker-root-'));
+    secretDir = mkdtempSync(join(tmpdir(), 'gbrain-walker-secret-'));
+  });
+
+  afterEach(() => {
+    rmSync(root, { recursive: true, force: true });
+    rmSync(secretDir, { recursive: true, force: true });
+  });
+
+  test('includes real markdown files inside the root', () => {
+    writeFileSync(join(root, 'legit.md'), '# legit\n');
+    mkdirSync(join(root, 'notes'));
+    writeFileSync(join(root, 'notes', 'other.md'), '# other\n');
+
+    const files = collectMarkdownFiles(root);
+    expect(files).toContain(join(root, 'legit.md'));
+    expect(files).toContain(join(root, 'notes', 'other.md'));
+  });
+
+  test('skips a symlink file pointing outside the brain root', () => {
+    // Plant a real secret outside the brain root
+    const secretFile = join(secretDir, 'secret.md');
+    writeFileSync(secretFile, '# secret — must not be ingested\n');
+
+    // Inside the brain, create a symlink that points at the secret.
+    // Before the fix, statSync followed the link and reported it as
+    // a regular file, so it ended up in the walker's output and got
+    // fed to importFile — chunked, embedded, and indexed in the brain.
+    writeFileSync(join(root, 'legit.md'), '# legit\n');
+    symlinkSync(secretFile, join(root, 'innocent.md'));
+
+    const files = collectMarkdownFiles(root);
+    expect(files).toContain(join(root, 'legit.md'));
+    // The symlink itself must not appear — this is the security guarantee.
+    expect(files).not.toContain(join(root, 'innocent.md'));
+    // And the canonical secret path must definitely not be in the results.
+    expect(files).not.toContain(secretFile);
+  });
+
+  test('does not descend into a symlinked directory', () => {
+    // Create a directory outside the root with a markdown file inside it.
+    const outsideSub = join(secretDir, 'sub');
+    mkdirSync(outsideSub);
+    writeFileSync(join(outsideSub, 'external.md'), '# external\n');
+
+    // Plant a symlink inside the brain pointing at that directory.
+    // Before the fix, walk() would follow it and emit external.md.
+    // With lstatSync, stat.isSymbolicLink() is true and we refuse
+    // to descend — this also blocks circular-symlink DoS as a side effect.
+    writeFileSync(join(root, 'legit.md'), '# legit\n');
+    symlinkSync(outsideSub, join(root, 'linked-notes'));
+
+    const files = collectMarkdownFiles(root);
+    expect(files).toContain(join(root, 'legit.md'));
+    expect(files).not.toContain(join(root, 'linked-notes', 'external.md'));
+    expect(files).not.toContain(join(outsideSub, 'external.md'));
+  });
+
+  test('skips broken symlinks without crashing', () => {
+    // A dangling symlink — the target never existed. Pre-existing behavior
+    // (PR #26 / PR #38) handled this via try/catch around statSync. The
+    // L002 fix must not regress it: lstatSync succeeds on a dangling link
+    // (it reports on the link itself, not the target), so we reach the
+    // isSymbolicLink() branch and skip cleanly, no throw.
+    writeFileSync(join(root, 'legit.md'), '# legit\n');
+    symlinkSync('/nonexistent/path/to/nowhere', join(root, 'dangling.md'));
+
+    const files = collectMarkdownFiles(root);
+    expect(files).toContain(join(root, 'legit.md'));
+    expect(files).not.toContain(join(root, 'dangling.md'));
+  });
+});
--- a/test/pglite-lock.test.ts
+++ b/test/pglite-lock.test.ts
@@ -0,0 +1,89 @@
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdirSync, rmSync, existsSync, readFileSync, writeFileSync } from 'fs';
+import { join } from 'path';
+import { tmpdir } from 'os';
+import { acquireLock, releaseLock, type LockHandle } from '../src/core/pglite-lock';
+
+const TEST_DIR = join(tmpdir(), 'gbrain-lock-test-' + process.pid);
+
+describe('pglite-lock', () => {
+  beforeEach(() => {
+    // Clean up test directory
+    if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
+    mkdirSync(TEST_DIR, { recursive: true });
+  });
+
+  afterEach(() => {
+    if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
+  });
+
+  test('acquires and releases lock', async () => {
+    const lock = await acquireLock(TEST_DIR);
+    expect(lock.acquired).toBe(true);
+    expect(existsSync(join(TEST_DIR, '.gbrain-lock'))).toBe(true);
+
+    await releaseLock(lock);
+    expect(existsSync(join(TEST_DIR, '.gbrain-lock'))).toBe(false);
+  });
+
+  test('prevents concurrent lock acquisition', async () => {
+    const lock1 = await acquireLock(TEST_DIR, { timeoutMs: 2000 });
+    expect(lock1.acquired).toBe(true);
+
+    // Second lock attempt should time out while lock1 is still held
+    await expect(acquireLock(TEST_DIR, { timeoutMs: 1000 })).rejects.toThrow(/Timed out/);
+
+    await releaseLock(lock1);
+  });
+
+  test('detects and cleans stale lock from dead process', async () => {
+    // Simulate a stale lock from a dead process
+    const lockDir = join(TEST_DIR, '.gbrain-lock');
+    mkdirSync(lockDir);
+    writeFileSync(join(lockDir, 'lock'), JSON.stringify({
+      pid: 999999999, // Non-existent PID
+      acquired_at: Date.now(),
+      command: 'test',
+    }));
+
+    // Should clean up the stale lock and acquire
+    const lock = await acquireLock(TEST_DIR);
+    expect(lock.acquired).toBe(true);
+
+    await releaseLock(lock);
+  });
+
+  test('skips lock for in-memory (undefined dataDir)', async () => {
+    const lock = await acquireLock(undefined);
+    expect(lock.acquired).toBe(true);
+    expect(lock.lockDir).toBe('');
+
+    // Release should be a no-op
+    await releaseLock(lock);
+  });
+
+  test('lock file contains PID and command', async () => {
+    const lock = await acquireLock(TEST_DIR);
+    const lockData = JSON.parse(readFileSync(join(TEST_DIR, '.gbrain-lock', 'lock'), 'utf-8'));
+
+    expect(lockData.pid).toBe(process.pid);
+    expect(lockData.acquired_at).toBeDefined();
+    expect(lockData.command).toBeDefined();
+
+    await releaseLock(lock);
+  });
+
+  test('releases lock so a later acquisition succeeds', async () => {
+    const lock = await acquireLock(TEST_DIR);
+    expect(lock.acquired).toBe(true);
+
+    // Release without any DB handle involved; the lock dir must be gone afterwards
+    await releaseLock(lock);
+    expect(existsSync(join(TEST_DIR, '.gbrain-lock'))).toBe(false);
+
+    // Second acquisition should work
+    const lock2 = await acquireLock(TEST_DIR);
+    expect(lock2.acquired).toBe(true);
+    await releaseLock(lock2);
+  });
+});
--- a/test/search-limit.test.ts
+++ b/test/search-limit.test.ts
@@ -0,0 +1,67 @@
+import { describe, it, expect } from 'bun:test';
+import { MAX_SEARCH_LIMIT, clampSearchLimit } from '../src/core/engine.ts';
+
+describe('clampSearchLimit', () => {
+  it('uses default when undefined', () => {
+    expect(clampSearchLimit(undefined)).toBe(20);
+  });
+
+  it('uses custom default when provided', () => {
+    expect(clampSearchLimit(undefined, 10)).toBe(10);
+  });
+
+  it('passes through in-range values', () => {
+    expect(clampSearchLimit(50)).toBe(50);
+  });
+
+  it('clamps oversized values to MAX_SEARCH_LIMIT', () => {
+    expect(clampSearchLimit(10_000_000)).toBe(MAX_SEARCH_LIMIT);
+  });
+
+  it('uses default for zero', () => {
+    expect(clampSearchLimit(0)).toBe(20);
+  });
+
+  it('uses default for negative', () => {
+    expect(clampSearchLimit(-5)).toBe(20);
+  });
+
+  it('floors fractional values', () => {
+    expect(clampSearchLimit(7.9)).toBe(7);
+  });
+
+  it('uses default for NaN', () => {
+    expect(clampSearchLimit(NaN)).toBe(20);
+  });
+
+  it('uses default for Infinity', () => {
+    expect(clampSearchLimit(Infinity)).toBe(20); // !isFinite → default
+  });
+
+  it('MAX_SEARCH_LIMIT is 100', () => {
+    expect(MAX_SEARCH_LIMIT).toBe(100);
+  });
+});
+
+describe('listPages is NOT affected by search clamp', () => {
+  it('listPages accepts limit > MAX_SEARCH_LIMIT (regression test)', async () => {
+    // listPages uses PageFilters.limit, NOT clampSearchLimit.
+    // This test verifies the clamp is scoped to search operations only.
+    // We import the PGLite engine and check that listPages with limit 100000 works.
+    const { PGLiteEngine } = await import('../src/core/pglite-engine.ts');
+    const engine = new PGLiteEngine();
+    await engine.connect({});
+    await engine.initSchema();
+
+    // Insert a page
+    await engine.putPage('test/big-list', {
+      title: 'Test', type: 'concept', compiled_truth: 'test content', timeline: '',
+    });
+
+    // listPages with limit 100000 should NOT be clamped
+    const pages = await engine.listPages({ limit: 100000 });
+    expect(pages.length).toBeGreaterThanOrEqual(1);
+
+    await engine.disconnect();
+  });
+});
--- a/test/utils.test.ts
+++ b/test/utils.test.ts
@@ -29,19 +29,20 @@ describe('validateSlug', () => {

 describe('contentHash', () => {
  test('returns deterministic hash', () => {
-    const h1 = contentHash('hello', 'world');
-    const h2 = contentHash('hello', 'world');
+    const page = { title: 'Test', type: 'concept' as const, compiled_truth: 'hello', timeline: 'world' };
+    const h1 = contentHash(page);
+    const h2 = contentHash(page);
    expect(h1).toBe(h2);
  });

  test('changes when content changes', () => {
-    const h1 = contentHash('hello', 'world');
-    const h2 = contentHash('hello', 'changed');
+    const h1 = contentHash({ title: 'Test', type: 'concept' as const, compiled_truth: 'hello', timeline: 'world' });
+    const h2 = contentHash({ title: 'Test', type: 'concept' as const, compiled_truth: 'hello', timeline: 'changed' });
    expect(h1).not.toBe(h2);
  });

  test('returns hex string', () => {
-    const h = contentHash('test', '');
+    const h = contentHash({ title: 'Test', type: 'concept' as const, compiled_truth: 'test', timeline: '' });
    expect(h).toMatch(/^[a-f0-9]{64}$/);
  });
 });
@@ -1 +1 @@
-0.9.0
+0.9.1