gbrain/test/file-upload-security.test.ts
Garry Tan 7bbfc3e36a security: fix wave 3 — 9 vulns (file_upload, SSRF, recipe trust, prompt injection) (#174)
* feat(engine): add cap parameter to clampSearchLimit (H6)

clampSearchLimit(limit, defaultLimit, cap = MAX_SEARCH_LIMIT) — third arg
is a caller-specified cap so operation handlers can enforce limits below
MAX_SEARCH_LIMIT. Backward compatible: existing two-arg callers still cap
at MAX_SEARCH_LIMIT.

This fixes a Codex-caught semantics bug: the prior signature took (limit,
defaultLimit) where the second arg was misread as a cap. clampSearchLimit(x, 20)
was actually allowing values up to 100, not 20.
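
The three-argument behavior described above can be sketched as follows. This is a minimal illustration, not the engine's code: MAX_SEARCH_LIMIT = 100 is inferred from the "up to 100" remark, and the handling of missing or invalid limits is an assumption.

```typescript
// Assumed value, inferred from the commit message ("allowing values up to 100").
const MAX_SEARCH_LIMIT = 100;

// Sketch only: how the real function treats missing or invalid limits may differ.
function clampSearchLimit(
  limit: number | undefined,
  defaultLimit: number,
  cap: number = MAX_SEARCH_LIMIT,
): number {
  // Fall back to the default when the caller gave no usable limit.
  if (limit === undefined || !Number.isFinite(limit) || limit < 1) {
    return Math.min(defaultLimit, cap);
  }
  // The second argument is only a default; the third is the actual ceiling.
  return Math.min(Math.floor(limit), cap);
}

console.log(clampSearchLimit(500, 20));       // 100: two-arg callers still cap at MAX_SEARCH_LIMIT
console.log(clampSearchLimit(500, 20, 50));   // 50: caller-specified cap
console.log(clampSearchLimit(undefined, 20)); // 20: default applies
```

The first call shows the misreading the commit describes: passing 20 as the second argument never limited results to 20.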

* feat(integrations): SSRF defense + recipe trust boundary (B1, B2, Fix 2, Fix 4, B3, B4)

- B1: split loadAllRecipes into trusted (package-bundled) and untrusted
  (cwd/recipes, $GBRAIN_RECIPES_DIR) tiers. Only package-bundled recipes
  get embedded=true. Closes the fake trust boundary that let any cwd-local
  recipe bypass health-check gates.
- B2: hard-block string health_checks for non-embedded recipes (was previously
  only blocked when isUnsafeHealthCheck regex matched, which the cwd recipe
  exploit bypassed). Embedded recipes still get the regex defense.
- Fix 2: gate command DSL health_checks on isEmbedded. Non-embedded
  recipes cannot spawnSync.
- Fix 4 + B3 + B4: gate http DSL health_checks on isEmbedded; for embedded
  recipes, validate URLs via new isInternalUrl() before fetch:
  - Scheme allowlist (http/https only): blocks file:, data:, blob:, ftp:, javascript:
  - IPv4 range check covering hex/octal/decimal/single-integer bypass forms
  - IPv6 loopback ::1 + IPv4-mapped ::ffff: (canonicalized hex hextets handled)
  - Metadata hostnames (AWS, GCP, instance-data) blocked
  - fetch with redirect: 'manual' + per-hop re-validation up to 3 hops
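
The URL checks listed above could look roughly like the sketch below. All names here (isInternalUrlSketch, parsePart, toOctets, METADATA_HOSTS) are hypothetical, the metadata hostname list and private ranges are assumptions, and redirect re-validation is left out; it is not the code shipped in this commit.

```typescript
// Hostnames assumed for illustration; the real blocklist may differ.
const METADATA_HOSTS = new Set(['metadata.google.internal', 'instance-data']);

// Parse one IPv4 part written in hex (0x..), octal (leading 0), or decimal.
function parsePart(part: string): number | null {
  if (/^0x[0-9a-f]+$/i.test(part)) return parseInt(part.slice(2), 16);
  if (/^0[0-7]+$/.test(part)) return parseInt(part, 8);
  if (/^(0|[1-9][0-9]*)$/.test(part)) return parseInt(part, 10);
  return null;
}

// Convert a.b.c.d, a.b.c, a.b, or a single integer into four octets.
function toOctets(host: string): number[] | null {
  const parts = host.split('.');
  if (parts.length > 4) return null;
  const nums: number[] = [];
  for (const p of parts) {
    const n = parsePart(p);
    if (n === null) return null;
    nums.push(n);
  }
  const last = nums.pop() as number;
  if (nums.some((n) => n > 255)) return null;
  const tailBytes = 4 - nums.length; // the last part fills the remaining bytes
  if (last >= 2 ** (8 * tailBytes)) return null;
  for (let i = tailBytes - 1; i >= 0; i--) {
    nums.push(Math.floor(last / 2 ** (8 * i)) % 256);
  }
  return nums;
}

// Loopback, RFC 1918, link-local, and 0.0.0.0/8 (assumed set of ranges).
function isPrivateIpv4([a, b]: number[]): boolean {
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Returns true when the URL should NOT be fetched.
function isInternalUrlSketch(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparseable input is treated as unsafe
  }
  // Scheme allowlist: anything other than http/https is rejected.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return true;
  const host = url.hostname.toLowerCase().replace(/^\[|\]$/g, '');
  if (host === 'localhost' || host === '::1' || METADATA_HOSTS.has(host)) return true;
  // IPv4-mapped IPv6, either dotted or as two canonicalized hex hextets.
  const mapped = /^::ffff:(.+)$/.exec(host);
  if (mapped) {
    const hex = /^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(mapped[1]);
    const octets = hex
      ? [parseInt(hex[1], 16) >> 8, parseInt(hex[1], 16) & 255,
         parseInt(hex[2], 16) >> 8, parseInt(hex[2], 16) & 255]
      : toOctets(mapped[1]);
    return octets === null || isPrivateIpv4(octets);
  }
  const octets = toOctets(host);
  return octets !== null && isPrivateIpv4(octets);
}
```

A caller following the last bullet would run this check on the initial URL and again on every Location header it follows, since a public host can redirect to an internal one.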

Original PRs #105-109 by @garagon. Wave 3 collector branch reimplemented
the fixes after Codex outside-voice review found that PRs #106/#108 alone
did not actually gate cwd-local recipes (B1) and that PR #108 missed
redirect-following SSRF (B3) and non-http schemes (B4).

* feat(file_upload): path/slug/filename validation + remote-caller confinement (Fix 1, B5, H5, M4, Fix 5)

- Fix 1 + B5 + H1: validateUploadPath uses realpathSync + path.relative
  to defeat symlink-parent traversal. lstatSync alone (the original PR #105
  approach) only catches final-component symlinks; a symlinked parent dir
  was still followed to /etc/passwd. Now the entire path chain is resolved.
- H5: validatePageSlug uses an allowlist regex (alphanumeric + hyphens,
  slash-separated segments). Closes URL-encoded traversal (%2e%2e%2f),
  Unicode lookalikes, backslashes, control chars implicitly.
- M4: validateFilename allowlist regex. Rejects control chars, backslash,
  RTL override (\u202E), leading dot/dash. Filename flows into storage_path
  so this matters for every storage backend.
- Fix 5: clamp list_pages and get_ingest_log limits at the operation layer
  via new clampSearchLimit cap parameter (list_pages caps at 100,
  get_ingest_log at 50). Internal bulk commands bypass the operation
  layer and remain uncapped.
- New OperationContext.remote flag distinguishes trusted local CLI from
  untrusted MCP callers. file_upload uses strict cwd confinement when
  remote=true (default), loose mode when remote=false (CLI). MCP stdio
  server sets remote=true; cli.ts and handleToolCall (gbrain call) set
  remote=false.
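
The realpathSync + path.relative confinement from the first bullet can be illustrated with the sketch below. The names confineToRoot and PathError are hypothetical stand-ins, and the error messages only mirror what the tests in this file match on; the real validateUploadPath may differ in detail.

```typescript
import { mkdtempSync, realpathSync, rmSync, symlinkSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { isAbsolute, join, relative, resolve, sep } from 'node:path';

// Hypothetical stand-in for the project's OperationError.
class PathError extends Error {
  constructor(public code: string, message: string) {
    super(message);
  }
}

// Resolve every symlink in the path, then check the result is still under root.
function confineToRoot(filePath: string, root: string): string {
  let real: string;
  try {
    real = realpathSync(resolve(filePath)); // throws if the file does not exist
  } catch {
    throw new PathError('invalid_params', `File not found: ${filePath}`);
  }
  const rel = relative(realpathSync(root), real);
  // A relative path that climbs out of root, or lands on another drive, is an escape.
  if (rel === '..' || rel.startsWith('..' + sep) || isAbsolute(rel)) {
    throw new PathError('invalid_params', 'Path must be within the working directory');
  }
  return real;
}

// Demo: a symlinked parent directory inside root that points outside it.
function demo(): { inside: boolean; viaSymlink: boolean } {
  const root = realpathSync(mkdtempSync(join(tmpdir(), 'root-')));
  const outside = realpathSync(mkdtempSync(join(tmpdir(), 'outside-')));
  try {
    writeFileSync(join(root, 'ok.txt'), 'ok');
    writeFileSync(join(outside, 'secret.txt'), 'secret');
    symlinkSync(outside, join(root, 'link-dir'));
    const accepts = (p: string): boolean => {
      try {
        confineToRoot(p, root);
        return true;
      } catch {
        return false;
      }
    };
    return {
      inside: accepts(join(root, 'ok.txt')),
      viaSymlink: accepts(join(root, 'link-dir', 'secret.txt')),
    };
  } finally {
    rmSync(root, { recursive: true, force: true });
    rmSync(outside, { recursive: true, force: true });
  }
}

console.log(demo());
```

An lstatSync check on link-dir/secret.txt would see a regular file and accept it, because only the parent is a symlink; resolving the whole chain first is what exposes the escape.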

Original PR #105 by @garagon. Issue #139 reported by @Hybirdss.

* feat(search): query sanitization + structural prompt boundary (Fix 3, M1, M2, M3)

- M1: restructure callHaikuForExpansion to use a system message that declares
  the user query as untrusted data, plus an XML-tagged <user_query> boundary
  in the user message. Layered defense with the existing tool_choice constraint
  (3 layers vs 1).
- Fix 3 (regex sanitizer, defense-in-depth): sanitizeQueryForPrompt strips
  triple-backtick code fences, XML/HTML tags, leading injection prefixes,
  and caps at 500 chars. Original query is still used for downstream search;
  only the LLM-facing copy is sanitized.
- M2: sanitizeExpansionOutput validates the model's alternative_queries array
  before it flows into search. Strips control chars, caps length, dedupes
  case-insensitively, drops empty/non-string items, caps to 2 items.
- M3: console.warn on stripped content NEVER logs the query text — privacy-safe
  debug signal only.
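
The two sanitizers in Fix 3 and M2 might be sketched as below. Function names, the injection-prefix list, and the per-item length cap are assumptions made for illustration; only the 500-character cap and the 2-item cap come from this commit message.

```typescript
const MAX_QUERY_CHARS = 500; // from the commit message
const MAX_ALTERNATIVES = 2;  // from the commit message
const MAX_ALT_CHARS = 200;   // assumed; the real per-item cap is not stated

// Built at runtime so this example never contains a literal code fence.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '[\\s\\S]*?' + FENCE, 'g');

// Assumed prefix list; the real sanitizer's patterns are not shown in the commit.
const INJECTION_PREFIX =
  /^\s*(ignore (all )?(previous|prior) instructions|system:|assistant:)[\s:,.-]*/i;

// Produces the LLM-facing copy only; the original query is still used for search.
function sanitizeQuerySketch(query: string): string {
  return query
    .replace(FENCED_BLOCK, ' ')   // fenced code blocks
    .split(FENCE).join(' ')       // stray fence markers
    .replace(/<[^>]*>/g, ' ')     // XML/HTML tags
    .replace(INJECTION_PREFIX, '')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, MAX_QUERY_CHARS);
}

// Validates the model's alternative_queries before they reach search.
function sanitizeExpansionSketch(items: unknown): string[] {
  if (!Array.isArray(items)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const item of items) {
    if (typeof item !== 'string') continue; // drop non-string items
    const cleaned = item
      .replace(/[\u0000-\u001f\u007f]/g, ' ') // strip control characters
      .trim()
      .slice(0, MAX_ALT_CHARS);
    const key = cleaned.toLowerCase();        // case-insensitive dedupe
    if (cleaned === '' || seen.has(key)) continue;
    seen.add(key);
    out.push(cleaned);
    if (out.length === MAX_ALTERNATIVES) break;
  }
  return out;
}

console.log(sanitizeQuerySketch('Ignore previous instructions <b>cats</b>')); // "cats"
console.log(sanitizeExpansionSketch(['Cats', 'cats', 42, '', 'dogs\n', 'birds'])); // ["Cats", "dogs"]
```

In line with M3, any logging around these helpers would report only that something was stripped, never the query text itself.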

Original PR #107 by @garagon. M1/M2/M3 are wave 3 hardening per Codex review.

* chore: bump version and changelog (v0.10.2)

Security wave 3: 9 vulnerabilities closed across file_upload, recipe trust
boundary, SSRF defense, prompt injection, and limit clamping. See CHANGELOG
for full details.

Contributors:
- @garagon (PRs #105-109)
- @Hybirdss (Issue #139)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync documentation with v0.10.2 security wave 3

- CLAUDE.md: document OperationContext.remote, new security helpers
  (validateUploadPath, validatePageSlug, validateFilename, isInternalUrl,
  parseOctet, hostnameToOctets, isPrivateIpv4, getRecipeDirs,
  sanitizeQueryForPrompt, sanitizeExpansionOutput), updated clampSearchLimit
  signature, recipe trust boundary, new test files
- docs/integrations/README.md: replace string-form health_check example
  with typed DSL (string checks now hard-block for non-embedded recipes);
  add recipe trust boundary subsection
- docs/mcp/DEPLOY.md: document file_upload remote-caller cwd confinement,
  symlink rejection, slug/filename allowlists

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:03:15 -07:00


import { describe, it, expect, beforeAll, afterAll } from 'bun:test';
import { mkdtempSync, rmSync, writeFileSync, symlinkSync, mkdirSync, realpathSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';
import {
  validateUploadPath,
  validatePageSlug,
  validateFilename,
  OperationError,
} from '../src/core/operations.ts';

// --- validateUploadPath ---
describe('validateUploadPath', () => {
  let sandbox: string;
  let root: string;
  let outside: string;

  beforeAll(() => {
    sandbox = mkdtempSync(join(tmpdir(), 'gbrain-upload-'));
    root = realpathSync(sandbox);
    outside = mkdtempSync(join(tmpdir(), 'gbrain-outside-'));
  });

  afterAll(() => {
    rmSync(sandbox, { recursive: true, force: true });
    rmSync(outside, { recursive: true, force: true });
  });

  it('allows a regular file inside the confinement root', () => {
    const p = join(root, 'photo.jpg');
    writeFileSync(p, 'binary');
    expect(() => validateUploadPath(p, root)).not.toThrow();
  });

  it('allows a nested file inside the confinement root', () => {
    const sub = join(root, 'sub');
    mkdirSync(sub, { recursive: true });
    const p = join(sub, 'note.txt');
    writeFileSync(p, 'hi');
    expect(() => validateUploadPath(p, root)).not.toThrow();
  });

  it('rejects a path outside the confinement root', () => {
    const p = join(outside, 'secret.txt');
    writeFileSync(p, 'x');
    expect(() => validateUploadPath(p, root)).toThrow(OperationError);
    try { validateUploadPath(p, root); } catch (e) {
      expect((e as OperationError).code).toBe('invalid_params');
      expect((e as Error).message).toMatch(/within the working directory/i);
    }
  });

  it('rejects ../ traversal above the root', () => {
    const p = join(root, '..', 'escaped.txt');
    writeFileSync(p, 'nope');
    try {
      expect(() => validateUploadPath(p, root)).toThrow(OperationError);
    } finally {
      rmSync(p, { force: true });
    }
  });

  it('rejects /etc/passwd (absolute path outside root)', () => {
    expect(() => validateUploadPath('/etc/passwd', root)).toThrow(OperationError);
  });

  it('rejects a symlink whose final component points outside root (B5 regression)', () => {
    const target = join(outside, 'target.txt');
    writeFileSync(target, 'secret');
    const link = join(root, 'link-to-outside.txt');
    symlinkSync(target, link);
    try {
      expect(() => validateUploadPath(link, root)).toThrow(OperationError);
    } finally {
      rmSync(link, { force: true });
    }
  });

  it('rejects a symlink whose parent dir points outside root (B5 parent-symlink regression)', () => {
    const linkDir = join(root, 'link-dir');
    symlinkSync(outside, linkDir);
    const p = join(linkDir, 'secret.txt');
    writeFileSync(join(outside, 'secret.txt'), 'secret');
    try {
      expect(() => validateUploadPath(p, root)).toThrow(OperationError);
    } finally {
      rmSync(linkDir, { force: true });
      rmSync(join(outside, 'secret.txt'), { force: true });
    }
  });

  it('rejects non-existent paths with a clear error', () => {
    const p = join(root, 'never-created.txt');
    try {
      validateUploadPath(p, root);
      throw new Error('expected throw');
    } catch (e) {
      expect(e).toBeInstanceOf(OperationError);
      expect((e as OperationError).code).toBe('invalid_params');
      expect((e as Error).message).toMatch(/File not found/i);
    }
  });

  it('handles relative paths via resolve', () => {
    const p = join(root, 'rel.txt');
    writeFileSync(p, 'hi');
    const prevCwd = process.cwd();
    process.chdir(root);
    try {
      expect(() => validateUploadPath('./rel.txt', root)).not.toThrow();
    } finally {
      process.chdir(prevCwd);
    }
  });
});

// --- validatePageSlug (H5 allowlist) ---
describe('validatePageSlug', () => {
  it('accepts clean slugs', () => {
    expect(() => validatePageSlug('people/alice-smith')).not.toThrow();
    expect(() => validatePageSlug('concepts/ai')).not.toThrow();
    expect(() => validatePageSlug('a')).not.toThrow();
    expect(() => validatePageSlug('a/b/c/d')).not.toThrow();
  });

  it('rejects ../ traversal', () => {
    expect(() => validatePageSlug('../etc/passwd')).toThrow(OperationError);
    expect(() => validatePageSlug('pages/../../etc')).toThrow(OperationError);
  });

  it('rejects URL-encoded traversal (not in allowlist)', () => {
    expect(() => validatePageSlug('%2e%2e%2fetc%2fpasswd')).toThrow(OperationError);
  });

  it('rejects absolute paths', () => {
    expect(() => validatePageSlug('/etc/passwd')).toThrow(OperationError);
  });

  it('rejects backslash (Windows separator)', () => {
    expect(() => validatePageSlug('people\\alice')).toThrow(OperationError);
  });

  it('rejects leading/trailing slash', () => {
    expect(() => validatePageSlug('/people/alice')).toThrow(OperationError);
    expect(() => validatePageSlug('people/alice/')).toThrow(OperationError);
  });

  it('rejects consecutive slashes', () => {
    expect(() => validatePageSlug('people//alice')).toThrow(OperationError);
  });

  it('rejects empty or too-long', () => {
    expect(() => validatePageSlug('')).toThrow(OperationError);
    expect(() => validatePageSlug('a'.repeat(256))).toThrow(OperationError);
  });

  it('rejects NUL and control chars', () => {
    expect(() => validatePageSlug('people\x00alice')).toThrow(OperationError);
    expect(() => validatePageSlug('people\nalice')).toThrow(OperationError);
  });

  it('rejects spaces', () => {
    expect(() => validatePageSlug('people/alice smith')).toThrow(OperationError);
  });
});

// --- validateFilename (M4 allowlist) ---
describe('validateFilename', () => {
  it('accepts clean filenames with extensions', () => {
    expect(() => validateFilename('photo.jpg')).not.toThrow();
    expect(() => validateFilename('report-2026.pdf')).not.toThrow();
    expect(() => validateFilename('v1.0.0_release.md')).not.toThrow();
  });

  it('rejects control chars', () => {
    expect(() => validateFilename('file\nwith\nnewlines.txt')).toThrow(OperationError);
    expect(() => validateFilename('file\x00nul.txt')).toThrow(OperationError);
  });

  it('rejects backslash', () => {
    expect(() => validateFilename('file\\win.txt')).toThrow(OperationError);
  });

  it('rejects RTL override and other Unicode injection', () => {
    expect(() => validateFilename('file\u202E.exe')).toThrow(OperationError);
  });

  it('rejects leading dash (CLI flag confusion)', () => {
    expect(() => validateFilename('-rf.txt')).toThrow(OperationError);
  });

  it('rejects leading dot (hidden files)', () => {
    expect(() => validateFilename('.htaccess')).toThrow(OperationError);
  });

  it('rejects empty and too-long', () => {
    expect(() => validateFilename('')).toThrow(OperationError);
    expect(() => validateFilename('x'.repeat(256))).toThrow(OperationError);
  });

  it('rejects path separators in filename', () => {
    expect(() => validateFilename('foo/bar.txt')).toThrow(OperationError);
  });
});