Files
gbrain/test/minions-shell.test.ts
Garry Tan 96178d726e fix(subagent): v0.16.3 — bind Anthropic SDK correctly + enable tsc in CI (#318)
* fix(subagent): bind Anthropic SDK messages.create() correctly

The makeSubagentHandler was casting `new Anthropic()` directly to
MessagesClient, but MessagesClient.create() maps to sdk.messages.create(),
not sdk.create(). Every subagent job immediately died with:

  client.create is not a function

Fix: wrap the SDK instance so .create() delegates to .messages.create()
with proper `this` binding via .bind(sdk.messages).

Discovered on first production run of gbrain agent against Supabase.

Co-Authored-By: Wintermute <wintermute@openclaw.ai>

* chore(ci): add typescript typecheck to test pipeline + clean up baseline errors

Root cause infra gap that let the v0.16.0 subagent bug ship: CI ran
only `bun test`, which transpiles types without checking them. Type
errors only surfaced at runtime, in production.

Changes:
- Add `typescript` devDep and a `typecheck` npm script (`tsc --noEmit`).
- Chain `bun run typecheck` into `bun run test` so developers get the
  same pipeline locally that CI runs.
- Flip `.github/workflows/test.yml` to invoke `bun run test` (the npm
  script, including typecheck) instead of `bun test` (runner only).
- Clean up 100+ pre-existing type errors across 30+ files so the first
  run of `tsc --noEmit` is green. Root causes were:
  - `databaseUrl` → `database_url` rename drift in test fixtures (9 files)
  - `PageType` union missing `'meeting'` / `'note'` entries that are
    already used in both src and tests (link-extraction.ts comments
    acknowledged the gap)
  - `GBrainConfig.storage` field never declared despite being read in
    files.ts and operations.ts
  - `ErrorCode` union missing `'permission_denied'`
  - `OrchestratorOpts` shape changed; test callers not updated
  - Dead-code comparisons in migration orchestrators against narrowed
    status types
  - postgres.js `Row`-callback type drift on several `.map()` calls
  - Buffer-as-BodyInit assignment in supabase.ts (real but non-fatal
    runtime bug; Uint8Array slice works and is type-correct)
  - Various `as X` single-step casts that now need `as unknown as X`
    per TS's stricter structural-conversion rules
- Bump `beforeAll` hook timeout to 30s on four PGLite-heavy tests that
  were flaky under parallel test execution: wait-for-completion,
  extract-fs, e2e/search-quality, e2e/graph-quality. All pass in
  isolation; timeouts only happened when dozens of PGLite instances
  init'd simultaneously.

The new CI pipeline now fails on any type error across src/ or test/,
giving us the compile-time regression guard the subagent fix depends on.

* fix(subagent): bind Anthropic SDK messages.create() correctly

Shipped bug: v0.16.0 cast `new Anthropic()` to `MessagesClient`, but
`.create()` lives at `sdk.messages.create`, not on the top-level client.
Every subagent job in production died on first LLM call with
`client.create is not a function`. Discovered on the first `gbrain agent
run` against Supabase.

Fix: assign `sdk.messages` directly to the `MessagesClient` slot.
`sdk.messages` IS the object with a callable `.create()`; the original
bug was picking the wrong entry point on the SDK. No helper, no
wrapper, no `.bind()` — JS method-call semantics preserve `this` at
the call site because `subagent.ts:336` invokes `client.create(...)`
with `client === sdk.messages`.

The one-line assignment also typechecks cleanly against the existing
`MessagesClient` interface (SDK's first `create` overload:
`(MessageCreateParamsNonStreaming, Core.RequestOptions?) =>
APIPromise<Message>` is assignable structurally). This gives us
compile-time regression protection: anyone reverting to
`new Anthropic()` would fail tsc because `Anthropic` has no top-level
`.create`. (The companion chore commit puts `tsc --noEmit` in CI so
this guard is enforced.)

Also adds a `makeAnthropic?: () => Anthropic` dep-injection seam so
the factory default construction branch is testable without real API
calls. Regression test drives one handler turn through a fake SDK,
asserting `sdk.messages.create` is actually called. If someone later
reverts to `new Anthropic()`, both guards fire: tsc fails AND the test
fails.

Co-Authored-By: Wintermute <wintermute@garrytan.com>

* chore(tests): add bunfig.toml + 60s hook timeouts to stabilize PGLite-heavy suites

After turning on tsc in CI (previous commit), running the full `bun run test`
suite in one shot triggered flaky `beforeEach/afterEach hook timed out`
failures on 8+ test files. Every failure traced to PGLite WASM init
contention when many test files spin up fresh PGLite instances in parallel;
each one alone passes in isolation.

- `bunfig.toml` sets the global test hook timeout to 60s (default is 5s),
  covering every test file without per-file edits.
- Individual `beforeAll(fn, 60_000)` / `beforeEach(fn, 15_000)` calls on
  the 8 tests that flaked most stay in place as explicit safety nets so
  a future bunfig config change doesn't silently re-introduce the flake.

Result: 1997 pass, 0 fail on `bun run test` (117 tests added since the
prior baseline by picking up typecheck-gated passes). No infrastructure
flake tolerated in CI.

* chore: bump version and changelog (v0.16.3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Wintermute <wintermute@openclaw.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 01:34:22 -07:00

343 lines
14 KiB
TypeScript

import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { MinionQueue } from '../src/core/minions/queue.ts';
import { UnrecoverableError } from '../src/core/minions/types.ts';
import type { MinionJobContext } from '../src/core/minions/types.ts';
import { shellHandler } from '../src/core/minions/handlers/shell.ts';
import { computeAuditFilename, resolveAuditDir, logShellSubmission } from '../src/core/minions/handlers/shell-audit.ts';
import { isProtectedJobName, PROTECTED_JOB_NAMES } from '../src/core/minions/protected-names.ts';
import * as fs from 'node:fs';
import * as path from 'node:path';
import * as os from 'node:os';
let engine: PGLiteEngine;
let queue: MinionQueue;
beforeAll(async () => {
engine = new PGLiteEngine();
await engine.connect({ database_url: '' });
await engine.initSchema();
queue = new MinionQueue(engine);
}, 60_000);
afterAll(async () => {
await engine.disconnect();
});
beforeEach(async () => {
await engine.executeRaw('DELETE FROM minion_jobs');
});
// Build a minimal MinionJobContext for unit tests. Real worker provides this;
// here we mock it so the handler can be exercised without spinning up Postgres.
function makeCtx(
data: Record<string, unknown>,
opts: { signal?: AbortSignal; shutdownSignal?: AbortSignal } = {},
): MinionJobContext {
return {
id: 1,
name: 'shell',
data,
attempts_made: 0,
signal: opts.signal ?? new AbortController().signal,
shutdownSignal: opts.shutdownSignal ?? new AbortController().signal,
updateProgress: async () => {},
updateTokens: async () => {},
log: async () => {},
isActive: async () => true,
readInbox: async () => [],
};
}
// ---- protected-names ---------------------------------------------------------
describe('protected-names', () => {
test('shell is protected', () => {
expect(isProtectedJobName('shell')).toBe(true);
expect(PROTECTED_JOB_NAMES.has('shell')).toBe(true);
});
test('normalization: whitespace is trimmed before check', () => {
expect(isProtectedJobName(' shell ')).toBe(true);
expect(isProtectedJobName('\tshell\n')).toBe(true);
});
test('case-sensitive: Shell is NOT protected', () => {
expect(isProtectedJobName('Shell')).toBe(false);
expect(isProtectedJobName('SHELL')).toBe(false);
});
test('non-protected names pass through', () => {
expect(isProtectedJobName('sync')).toBe(false);
expect(isProtectedJobName('embed')).toBe(false);
expect(isProtectedJobName('')).toBe(false);
});
});
// ---- MinionQueue.add trusted guard ------------------------------------------
describe('MinionQueue.add protected-name guard', () => {
test('add("shell", ...) without trusted arg throws', async () => {
expect(queue.add('shell', { cmd: 'echo', cwd: '/tmp' })).rejects.toThrow(/protected job name/);
});
test('add("shell", ..., opts, {allowProtectedSubmit:true}) succeeds', async () => {
const job = await queue.add('shell', { cmd: 'echo', cwd: '/tmp' }, undefined, { allowProtectedSubmit: true });
expect(job.name).toBe('shell');
expect(job.status).toBe('waiting');
});
// Whitespace bypass defense (Codex #1)
test('add(" shell ", ...) without trusted arg throws (whitespace bypass defense)', async () => {
expect(queue.add(' shell ', { cmd: 'echo', cwd: '/tmp' })).rejects.toThrow(/protected job name/);
});
test('add(" shell ", ...) with trusted arg inserts normalized name "shell"', async () => {
const job = await queue.add(' shell ', { cmd: 'echo', cwd: '/tmp' }, undefined, { allowProtectedSubmit: true });
expect(job.name).toBe('shell');
});
test('add("Shell", ...) is treated as non-protected (case-sensitive)', async () => {
const job = await queue.add('Shell', {});
expect(job.name).toBe('Shell');
expect(job.status).toBe('waiting');
});
// Regression: non-protected names unaffected (Codex iron-rule)
test('REGRESSION: add("sync", ...) without trusted arg still succeeds', async () => {
const job = await queue.add('sync', { full: true });
expect(job.name).toBe('sync');
expect(job.status).toBe('waiting');
});
test('REGRESSION: trusted flag does NOT bypass empty-name check', async () => {
expect(queue.add('', {}, undefined, { allowProtectedSubmit: true })).rejects.toThrow(/cannot be empty/);
});
});
// ---- Shell handler: validation ----------------------------------------------
describe('shell handler: validation', () => {
test('both cmd and argv → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ cmd: 'echo', argv: ['echo'], cwd: '/tmp' }));
expect(p).rejects.toThrow(UnrecoverableError);
});
test('neither cmd nor argv → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ cwd: '/tmp' }));
expect(p).rejects.toThrow(UnrecoverableError);
});
test('cwd missing → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ cmd: 'echo ok' }));
expect(p).rejects.toThrow(UnrecoverableError);
});
test('cwd not absolute → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ cmd: 'echo ok', cwd: 'relative/path' }));
expect(p).rejects.toThrow(UnrecoverableError);
});
test('argv non-array (string) → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ argv: 'echo ok', cwd: '/tmp' }));
expect(p).rejects.toThrow(UnrecoverableError);
});
test('argv with non-string entries → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ argv: ['echo', 42], cwd: '/tmp' }));
expect(p).rejects.toThrow(UnrecoverableError);
});
test('env with non-string values → UnrecoverableError', async () => {
const p = shellHandler(makeCtx({ cmd: 'echo', cwd: '/tmp', env: { FOO: 42 } }));
expect(p).rejects.toThrow(UnrecoverableError);
});
});
// ---- Shell handler: spawn + output ------------------------------------------
describe('shell handler: spawn', () => {
test('cmd happy path: echo ok → exit 0, stdout captured', async () => {
const res = await shellHandler(makeCtx({ cmd: 'echo ok', cwd: '/tmp' })) as any;
expect(res.exit_code).toBe(0);
expect(res.stdout_tail).toBe('ok\n');
expect(res.stderr_tail).toBe('');
expect(typeof res.duration_ms).toBe('number');
expect(res.duration_ms).toBeGreaterThanOrEqual(0);
expect(typeof res.pid).toBe('number');
});
test('argv happy path: ["echo","hi"] → exit 0, stdout "hi\\n"', async () => {
const res = await shellHandler(makeCtx({ argv: ['echo', 'hi'], cwd: '/tmp' })) as any;
expect(res.exit_code).toBe(0);
expect(res.stdout_tail).toBe('hi\n');
});
test('non-zero exit → Error with stderr in message', async () => {
const p = shellHandler(makeCtx({ cmd: 'echo fail 1>&2; exit 7', cwd: '/tmp' }));
await expect(p).rejects.toThrow(/exit 7/);
});
test('argv with bogus binary → Error (retryable)', async () => {
const p = shellHandler(makeCtx({ argv: ['gbrain-nonexistent-binary-xyz'], cwd: '/tmp' }));
// spawn emits 'error' on ENOENT
await expect(p).rejects.toThrow();
});
test('result shape includes all declared keys', async () => {
const res = await shellHandler(makeCtx({ cmd: 'echo ok', cwd: '/tmp' })) as any;
expect(Object.keys(res).sort()).toEqual(['duration_ms', 'exit_code', 'pid', 'stderr_tail', 'stdout_tail']);
});
});
// ---- Shell handler: env allowlist -------------------------------------------
describe('shell handler: env allowlist', () => {
test('process env leak prevention: a faux secret is NOT in child env', async () => {
const saved = process.env.SHELL_TEST_SECRET;
process.env.SHELL_TEST_SECRET = 'should-not-leak';
try {
const res = await shellHandler(makeCtx({
cmd: 'echo "secret=${SHELL_TEST_SECRET:-EMPTY}"',
cwd: '/tmp',
})) as any;
expect(res.stdout_tail).toBe('secret=EMPTY\n');
} finally {
if (saved === undefined) delete process.env.SHELL_TEST_SECRET;
else process.env.SHELL_TEST_SECRET = saved;
}
});
test('PATH is inherited from worker', async () => {
const res = await shellHandler(makeCtx({
cmd: 'echo "path=$PATH"',
cwd: '/tmp',
})) as any;
expect(res.stdout_tail.startsWith('path=')).toBe(true);
expect(res.stdout_tail.length).toBeGreaterThan('path=\n'.length);
});
test('caller-supplied env key is added', async () => {
const res = await shellHandler(makeCtx({
cmd: 'echo "val=$MY_CUSTOM"',
cwd: '/tmp',
env: { MY_CUSTOM: 'hello' },
})) as any;
expect(res.stdout_tail).toBe('val=hello\n');
});
test('caller-supplied env can override allowlisted key (PATH)', async () => {
const res = await shellHandler(makeCtx({
cmd: 'echo "path=$PATH"',
cwd: '/tmp',
env: { PATH: '/custom/bin' },
})) as any;
expect(res.stdout_tail).toBe('path=/custom/bin\n');
});
});
// ---- Shell handler: abort --------------------------------------------------
describe('shell handler: abort', () => {
test('ctx.signal.abort triggers SIGTERM and handler throws aborted', async () => {
const ac = new AbortController();
const promise = shellHandler(makeCtx(
{ cmd: 'sleep 30', cwd: '/tmp' },
{ signal: ac.signal },
));
// Give spawn a beat to start
setTimeout(() => ac.abort(new Error('cancel')), 50);
await expect(promise).rejects.toThrow(/aborted/);
});
test('ctx.shutdownSignal.abort also triggers kill', async () => {
const shutdownCtl = new AbortController();
const promise = shellHandler(makeCtx(
{ cmd: 'sleep 30', cwd: '/tmp' },
{ shutdownSignal: shutdownCtl.signal },
));
setTimeout(() => shutdownCtl.abort(new Error('shutdown')), 50);
await expect(promise).rejects.toThrow(/aborted/);
});
test('pre-aborted signal → immediate kill', async () => {
const ac = new AbortController();
ac.abort(new Error('cancel'));
const promise = shellHandler(makeCtx(
{ cmd: 'sleep 30', cwd: '/tmp' },
{ signal: ac.signal },
));
await expect(promise).rejects.toThrow(/aborted/);
});
});
// ---- shell-audit: ISO-week filename ----------------------------------------
describe('shell-audit: computeAuditFilename', () => {
test('2027-01-01 is ISO week 53 of 2026', () => {
expect(computeAuditFilename(new Date('2027-01-01T12:00:00Z'))).toBe('shell-jobs-2026-W53.jsonl');
});
test('2026-12-28 (Monday) is ISO week 53 of 2026', () => {
expect(computeAuditFilename(new Date('2026-12-28T12:00:00Z'))).toBe('shell-jobs-2026-W53.jsonl');
});
test('2027-01-04 (Monday) is ISO week 1 of 2027', () => {
expect(computeAuditFilename(new Date('2027-01-04T12:00:00Z'))).toBe('shell-jobs-2027-W01.jsonl');
});
test('2026-04-19 (mid-year reference)', () => {
const f = computeAuditFilename(new Date('2026-04-19T00:00:00Z'));
expect(f).toMatch(/^shell-jobs-2026-W\d{2}\.jsonl$/);
});
});
// ---- shell-audit: write path -----------------------------------------------
describe('shell-audit: write', () => {
let tmpDir: string;
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'shell-audit-test-'));
process.env.GBRAIN_AUDIT_DIR = tmpDir;
});
afterAll(() => {
delete process.env.GBRAIN_AUDIT_DIR;
});
test('GBRAIN_AUDIT_DIR env override resolves to the custom dir', () => {
expect(resolveAuditDir()).toBe(tmpDir);
});
test('writes a JSONL line; creates dir if missing', () => {
const inner = path.join(tmpDir, 'nested-not-yet-created');
process.env.GBRAIN_AUDIT_DIR = inner;
logShellSubmission({
caller: 'cli', remote: false, job_id: 42, cwd: '/tmp', cmd_display: 'echo ok',
});
const files = fs.readdirSync(inner);
expect(files.length).toBe(1);
const content = fs.readFileSync(path.join(inner, files[0]), 'utf8').trim();
const parsed = JSON.parse(content);
expect(parsed.caller).toBe('cli');
expect(parsed.job_id).toBe(42);
expect(parsed.cmd_display).toBe('echo ok');
expect(parsed.ts).toBeDefined();
});
test('argv_display stored as JSON array (Codex #11)', () => {
logShellSubmission({
caller: 'cli', remote: false, job_id: 1, cwd: '/tmp',
argv_display: ['node', 'script.mjs', '--date', '2026-04-18'],
});
const files = fs.readdirSync(tmpDir);
const content = fs.readFileSync(path.join(tmpDir, files[0]), 'utf8').trim();
const parsed = JSON.parse(content);
expect(Array.isArray(parsed.argv_display)).toBe(true);
expect(parsed.argv_display).toEqual(['node', 'script.mjs', '--date', '2026-04-18']);
});
test('does NOT log env values', () => {
logShellSubmission({
caller: 'cli', remote: false, job_id: 1, cwd: '/tmp', cmd_display: 'echo ok',
});
const files = fs.readdirSync(tmpDir);
const content = fs.readFileSync(path.join(tmpDir, files[0]), 'utf8');
expect(content).not.toContain('env');
});
test('write failure (EACCES) is non-blocking', () => {
// Point at a read-only target. /dev/null is not a directory.
process.env.GBRAIN_AUDIT_DIR = '/dev/null/not-a-dir';
// Should not throw — failures go to stderr.
expect(() => logShellSubmission({
caller: 'cli', remote: false, job_id: 1, cwd: '/tmp',
})).not.toThrow();
});
});
// ---- shell handler: UTF-8-safe output truncation ---------------------------
describe('shell handler: output truncation', () => {
test('stdout > 64KB is truncated and marker is prepended', async () => {
// Emit ~100KB of stdout to force truncation
const res = await shellHandler(makeCtx({
cmd: `yes ok | head -c 100000`,
cwd: '/tmp',
})) as any;
expect(res.exit_code).toBe(0);
expect(res.stdout_tail).toMatch(/^\[truncated \d+ bytes\]/);
expect(res.stdout_tail.length).toBeGreaterThan(0);
// Tail must contain characters we emitted
expect(res.stdout_tail).toContain('ok');
});
});