* feat(engine): add cap parameter to clampSearchLimit (H6)

  clampSearchLimit(limit, defaultLimit, cap = MAX_SEARCH_LIMIT): the third arg is a caller-specified cap so operation handlers can enforce limits below MAX_SEARCH_LIMIT. Backward compatible: existing two-arg callers still cap at MAX_SEARCH_LIMIT.

  This fixes a Codex-caught semantics bug: the prior signature took (limit, defaultLimit) where the second arg was misread as a cap. clampSearchLimit(x, 20) was actually allowing values up to 100, not 20.

* feat(integrations): SSRF defense + recipe trust boundary (B1, B2, Fix 2, Fix 4, B3, B4)

  - B1: split loadAllRecipes into trusted (package-bundled) and untrusted (cwd/recipes, $GBRAIN_RECIPES_DIR) tiers. Only package-bundled recipes get embedded=true. Closes the fake trust boundary that let any cwd-local recipe bypass health-check gates.
  - B2: hard-block string health_checks for non-embedded recipes (was previously only blocked when isUnsafeHealthCheck regex matched, which the cwd recipe exploit bypassed). Embedded recipes still get the regex defense.
  - Fix 2: gate command DSL health_checks on isEmbedded. Non-embedded recipes cannot spawnSync.
  - Fix 4 + B3 + B4: gate http DSL health_checks on isEmbedded; for embedded recipes, validate URLs via new isInternalUrl() before fetch:
    - Scheme allowlist (http/https only): blocks file:, data:, blob:, ftp:, javascript:
    - IPv4 range check covering hex/octal/decimal/single-integer bypass forms
    - IPv6 loopback ::1 + IPv4-mapped ::ffff: (canonicalized hex hextets handled)
    - Metadata hostnames (AWS, GCP, instance-data) blocked
    - fetch with redirect: 'manual' + per-hop re-validation up to 3 hops

  Original PRs #105-109 by @garagon. Wave 3 collector branch reimplemented the fixes after Codex outside-voice review found that PRs #106/#108 alone did not actually gate cwd-local recipes (B1) and that PR #108 missed redirect-following SSRF (B3) and non-http schemes (B4).
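
The URL checks listed under Fix 4 + B3 + B4 can be illustrated with a minimal sketch. This is not the project's isInternalUrl: the names `isBlockedUrl`, `isPrivateIpv4Sketch`, and `METADATA_HOSTS` are hypothetical, and the sketch assumes the runtime's WHATWG `URL` parser has already canonicalized hex, octal, and single-integer IPv4 hosts to dotted decimal, whereas the commit describes dedicated helpers for that. It performs no DNS resolution and no redirect handling.

```typescript
// Hypothetical sketch of an SSRF pre-flight check; names are illustrative only.
const METADATA_HOSTS = new Set(["metadata.google.internal", "instance-data"]);

// True for the loopback, private, link-local, and "this network" IPv4 ranges.
function isPrivateIpv4Sketch(octets: number[]): boolean {
  const [a, b] = octets;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparseable input is treated as unsafe
  }

  // Scheme allowlist: anything other than http/https is rejected.
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (METADATA_HOSTS.has(host)) return true;

  // Assumption: the URL parser has normalized 0x7f.0.0.1, 2130706433, etc.
  const v4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (v4) return isPrivateIpv4Sketch(v4.slice(1).map(Number));

  // IPv6 loopback, as serialized by the URL parser.
  if (host === "[::1]") return true;

  // IPv4-mapped IPv6 is serialized as two hex hextets, e.g. [::ffff:7f00:1].
  const mapped = /^\[::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})\]$/.exec(host);
  if (mapped) {
    const hi = parseInt(mapped[1], 16);
    const lo = parseInt(mapped[2], 16);
    return isPrivateIpv4Sketch([hi >> 8, hi & 0xff, lo >> 8, lo & 0xff]);
  }

  return false;
}
```

A caller would run a check like this before the first fetch and again on each `Location` header when following redirects manually, which is the per-hop re-validation the commit describes.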

* feat(file_upload): path/slug/filename validation + remote-caller confinement (Fix 1, B5, H5, M4, Fix 5)

  - Fix 1 + B5 + H1: validateUploadPath uses realpathSync + path.relative to defeat symlink-parent traversal. lstatSync alone (the original PR #105 approach) only catches final-component symlinks; a symlinked parent dir still followed to /etc/passwd. Now the entire path chain is resolved.
  - H5: validatePageSlug uses an allowlist regex (alphanumeric + hyphens, slash-separated segments). Closes URL-encoded traversal (%2e%2e%2f), Unicode lookalikes, backslashes, control chars implicitly.
  - M4: validateFilename allowlist regex. Rejects control chars, backslash, RTL override (\u202E), leading dot/dash. Filename flows into storage_path so this matters for every storage backend.
  - Fix 5: clamp list_pages and get_ingest_log limits at the operation layer via new clampSearchLimit cap parameter (list_pages caps at 100, get_ingest_log at 50). Internal bulk commands bypass the operation layer and remain uncapped.
  - New OperationContext.remote flag distinguishes trusted local CLI from untrusted MCP callers. file_upload uses strict cwd confinement when remote=true (default), loose mode when remote=false (CLI). MCP stdio server sets remote=true; cli.ts and handleToolCall (gbrain call) set remote=false.

  Original PR #105 by @garagon. Issue #139 reported by @Hybirdss.

* feat(search): query sanitization + structural prompt boundary (Fix 3, M1, M2, M3)

  - M1: restructure callHaikuForExpansion to use a system message that declares the user query as untrusted data, plus an XML-tagged <user_query> boundary in the user message. Layered defense with the existing tool_choice constraint (3 layers vs 1).
  - Fix 3 (regex sanitizer, defense-in-depth): sanitizeQueryForPrompt strips triple-backtick code fences, XML/HTML tags, leading injection prefixes, and caps at 500 chars. Original query is still used for downstream search; only the LLM-facing copy is sanitized.
  - M2: sanitizeExpansionOutput validates the model's alternative_queries array before it flows into search. Strips control chars, caps length, dedupes case-insensitively, drops empty/non-string items, caps to 2 items.
  - M3: console.warn on stripped content NEVER logs the query text; it is a privacy-safe debug signal only.

  Original PR #107 by @garagon. M1/M2/M3 are wave 3 hardening per Codex review.

* chore: bump version and changelog (v0.10.2)

  Security wave 3: 9 vulnerabilities closed across file_upload, recipe trust boundary, SSRF defense, prompt injection, and limit clamping. See CHANGELOG for full details.

  Contributors:
  - @garagon (PRs #105-109)
  - @Hybirdss (Issue #139)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync documentation with v0.10.2 security wave 3

  - CLAUDE.md: document OperationContext.remote, new security helpers (validateUploadPath, validatePageSlug, validateFilename, isInternalUrl, parseOctet, hostnameToOctets, isPrivateIpv4, getRecipeDirs, sanitizeQueryForPrompt, sanitizeExpansionOutput), updated clampSearchLimit signature, recipe trust boundary, new test files
  - docs/integrations/README.md: replace string-form health_check example with typed DSL (string checks now hard-block for non-embedded recipes); add recipe trust boundary subsection
  - docs/mcp/DEPLOY.md: document file_upload remote-caller cwd confinement, symlink rejection, slug/filename allowlists

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
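
The cap semantics from H6 and Fix 5 can be sketched as a small function whose behavior matches the expectations in the test file that follows. This is a reconstruction from those tests, not the code in src/core/engine.ts: the name `clampSearchLimitSketch` is hypothetical, the default of 20 is inferred from the one-argument test cases, and the handling of fractional values below 1 (which the tests do not cover) is an assumption.

```typescript
// Hypothetical reconstruction; the real implementation may differ.
const MAX_SEARCH_LIMIT = 100;

function clampSearchLimitSketch(
  limit: number | undefined,
  defaultLimit: number = 20, // inferred from the tests, not confirmed
  cap: number = MAX_SEARCH_LIMIT,
): number {
  // undefined, NaN, and +/-Infinity all fall back to the default.
  if (limit === undefined || !Number.isFinite(limit)) return defaultLimit;

  // Fractional values are floored before range checks.
  const floored = Math.floor(limit);

  // Zero and negatives use the default. Assumption: values such as 0.5,
  // which floor to 0, are treated the same way.
  if (floored < 1) return defaultLimit;

  // The caller-specified cap wins, even when it is above MAX_SEARCH_LIMIT.
  return Math.min(floored, cap);
}
```

Under this sketch, the second argument only supplies the fallback and never bounds the result, which is the misreading the commit calls out: `clampSearchLimitSketch(75, 20)` yields 75, while `clampSearchLimitSketch(75, 20, 50)` yields 50.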
99 lines
3.4 KiB
TypeScript
import { describe, it, expect } from 'bun:test';
import { MAX_SEARCH_LIMIT, clampSearchLimit } from '../src/core/engine.ts';

describe('clampSearchLimit', () => {
  it('uses default when undefined', () => {
    expect(clampSearchLimit(undefined)).toBe(20);
  });

  it('uses custom default when provided', () => {
    expect(clampSearchLimit(undefined, 10)).toBe(10);
  });

  it('passes through in-range values', () => {
    expect(clampSearchLimit(50)).toBe(50);
  });

  it('clamps oversized values to MAX_SEARCH_LIMIT', () => {
    expect(clampSearchLimit(10_000_000)).toBe(MAX_SEARCH_LIMIT);
  });

  it('uses default for zero', () => {
    expect(clampSearchLimit(0)).toBe(20);
  });

  it('uses default for negative', () => {
    expect(clampSearchLimit(-5)).toBe(20);
  });

  it('floors fractional values', () => {
    expect(clampSearchLimit(7.9)).toBe(7);
  });

  it('uses default for NaN', () => {
    expect(clampSearchLimit(NaN)).toBe(20);
  });

  // Infinity is not clamped to the cap: non-finite input falls back to the default.
  it('uses default for Infinity', () => {
    expect(clampSearchLimit(Infinity)).toBe(20); // !isFinite → default
  });

  it('MAX_SEARCH_LIMIT is 100', () => {
    expect(MAX_SEARCH_LIMIT).toBe(100);
  });

  // H6: the third parameter is a caller-specified cap.
  it('honors a caller-specified cap lower than MAX_SEARCH_LIMIT', () => {
    expect(clampSearchLimit(10_000_000, 20, 50)).toBe(50);
    expect(clampSearchLimit(75, 20, 50)).toBe(50);
    expect(clampSearchLimit(49, 20, 50)).toBe(49);
  });

  it('caller cap higher than MAX_SEARCH_LIMIT is still respected', () => {
    // Backward-compatible: if someone passes a cap above MAX, the cap wins.
    expect(clampSearchLimit(1000, 20, 200)).toBe(200);
  });

  it('returns the default for an undefined limit regardless of cap', () => {
    expect(clampSearchLimit(undefined, 50, 100)).toBe(50);
    expect(clampSearchLimit(undefined, 20, 50)).toBe(20);
  });

  it('operation layer list_pages clamp: default 50, max 100', () => {
    // These are the exact calls made by src/core/operations.ts list_pages handler.
    expect(clampSearchLimit(undefined, 50, 100)).toBe(50);
    expect(clampSearchLimit(10_000_000, 50, 100)).toBe(100);
    expect(clampSearchLimit(25, 50, 100)).toBe(25);
  });

  it('operation layer get_ingest_log clamp: default 20, max 50', () => {
    // These are the exact calls made by src/core/operations.ts get_ingest_log handler.
    expect(clampSearchLimit(undefined, 20, 50)).toBe(20);
    expect(clampSearchLimit(10_000_000, 20, 50)).toBe(50);
    expect(clampSearchLimit(10, 20, 50)).toBe(10);
  });
});

describe('listPages is NOT affected by search clamp', () => {
  it('listPages accepts limit > MAX_SEARCH_LIMIT (regression test)', async () => {
    // engine.listPages uses PageFilters.limit, NOT clampSearchLimit.
    // The list_pages cap is applied in the operation layer, so direct engine
    // callers (internal bulk commands) remain uncapped.
    // We import the PGLite engine and check that listPages with limit 100000 works.
    const { PGLiteEngine } = await import('../src/core/pglite-engine.ts');
    const engine = new PGLiteEngine();
    await engine.connect({});
    await engine.initSchema();

    // Insert a page
    await engine.putPage('test/big-list', {
      title: 'Test', type: 'concept', compiled_truth: 'test content', timeline: '',
    });

    // listPages with limit 100000 should NOT be clamped
    const pages = await engine.listPages({ limit: 100000 });
    expect(pages.length).toBeGreaterThanOrEqual(1);

    await engine.disconnect();
  });
});