gbrain/test/subagent-handler.test.ts
Garry Tan 96178d726e fix(subagent): v0.16.3 — bind Anthropic SDK correctly + enable tsc in CI (#318)
* fix(subagent): bind Anthropic SDK messages.create() correctly

The makeSubagentHandler was casting `new Anthropic()` directly to
MessagesClient, but MessagesClient.create() maps to sdk.messages.create(),
not sdk.create(). Every subagent job immediately died with:

  client.create is not a function

Fix: wrap the SDK instance so .create() delegates to .messages.create()
with proper `this` binding via .bind(sdk.messages).
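
A minimal sketch of the failure and of the wrapper described above. `FakeSdk` and `FakeMessages` are hypothetical stand-ins, not the real Anthropic SDK; only the shape matters here (a `messages` object that carries `create`).

```typescript
// Hypothetical stand-ins for illustration only; not the real SDK types.
interface MessagesClient {
  create(params: { prompt: string }): Promise<string>;
}

class FakeMessages {
  calls: string[] = [];
  private prefix = 'echo: ';
  // Relies on `this`, as a method on an SDK resource object usually does.
  async create(params: { prompt: string }): Promise<string> {
    this.calls.push(params.prompt);
    return this.prefix + params.prompt;
  }
}

class FakeSdk {
  messages = new FakeMessages();
}

const sdk = new FakeSdk();

// The bug: the cast is erased at runtime, and no top-level `create` exists.
const broken = sdk as unknown as MessagesClient;
let brokenError = '';
try {
  void broken.create({ prompt: 'hi' });
} catch (e) {
  brokenError = (e as Error).message; // contains "is not a function"
}

// The wrapper: delegate to sdk.messages.create with `this` pinned to sdk.messages.
const wrapped: MessagesClient = {
  create: sdk.messages.create.bind(sdk.messages),
};
const reply = wrapped.create({ prompt: 'hi' }); // resolves to 'echo: hi'
```

Without the `.bind(sdk.messages)`, the extracted function would run with `this` undefined and fail on its first property access.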

Discovered on first production run of gbrain agent against Supabase.

Co-Authored-By: Wintermute <wintermute@openclaw.ai>

* chore(ci): add typescript typecheck to test pipeline + clean up baseline errors

Root cause infra gap that let the v0.16.0 subagent bug ship: CI ran
only `bun test`, which transpiles types without checking them. Type
errors only surfaced at runtime, in production.

Changes:
- Add `typescript` devDep and a `typecheck` npm script (`tsc --noEmit`).
- Chain `bun run typecheck` into `bun run test` so developers get the
  same pipeline locally that CI runs.
- Flip `.github/workflows/test.yml` to invoke `bun run test` (the npm
  script, including typecheck) instead of `bun test` (runner only).
- Clean up 100+ pre-existing type errors across 30+ files so the first
  run of `tsc --noEmit` is green. Root causes were:
  - `databaseUrl` → `database_url` rename drift in test fixtures (9 files)
  - `PageType` union missing `'meeting'` / `'note'` entries that are
    already used in both src and tests (link-extraction.ts comments
    acknowledged the gap)
  - `GBrainConfig.storage` field never declared despite being read in
    files.ts and operations.ts
  - `ErrorCode` union missing `'permission_denied'`
  - `OrchestratorOpts` shape changed; test callers not updated
  - Dead-code comparisons in migration orchestrators against narrowed
    status types
  - postgres.js `Row`-callback type drift on several `.map()` calls
  - Buffer-as-BodyInit assignment in supabase.ts (real but non-fatal
    runtime bug; Uint8Array slice works and is type-correct)
  - Various `as X` single-step casts that now need `as unknown as X`
    per TS's stricter structural-conversion rules
- Bump `beforeAll` hook timeout to 30s on four PGLite-heavy tests that
  were flaky under parallel test execution: wait-for-completion,
  extract-fs, e2e/search-quality, e2e/graph-quality. All pass in
  isolation; timeouts only happened when dozens of PGLite instances
  init'd simultaneously.

The new CI pipeline now fails on any type error across src/ or test/,
giving us the compile-time regression guard the subagent fix depends on.

* fix(subagent): bind Anthropic SDK messages.create() correctly

Shipped bug: v0.16.0 cast `new Anthropic()` to `MessagesClient`, but
`.create()` lives at `sdk.messages.create`, not on the top-level client.
Every subagent job in production died on first LLM call with
`client.create is not a function`. Discovered on the first `gbrain agent
run` against Supabase.

Fix: assign `sdk.messages` directly to the `MessagesClient` slot.
`sdk.messages` IS the object with a callable `.create()`; the original
bug was picking the wrong entry point on the SDK. No helper, no
wrapper, no `.bind()` — JS method-call semantics preserve `this` at
the call site because `subagent.ts:336` invokes `client.create(...)`
with `client === sdk.messages`.
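
Why no `.bind()` is needed can be sketched as follows. `FakeMessages` is a hypothetical stand-in for the SDK's messages resource; the point is only that `client.create(...)` is an ordinary method call on the object that owns the method.

```typescript
// Hypothetical stand-in; the real SDK resource is more involved.
class FakeMessages {
  calls: string[] = [];
  async create(params: { prompt: string }): Promise<string> {
    this.calls.push(params.prompt); // throws if `this` is undefined
    return 'ok';
  }
}

const sdk = { messages: new FakeMessages() };

// Direct assignment: `client` IS sdk.messages, so `client.create(...)`
// is invoked with `this === client`.
const client = sdk.messages;
void client.create({ prompt: 'bound by call site' });

// Contrast: pulling the method off the object detaches it. Class bodies are
// strict-mode code, so `this` is undefined and the returned promise rejects.
const detached = sdk.messages.create;
let detachedFailed = false;
detached({ prompt: 'lost this' }).catch(() => {
  detachedFailed = true;
});
```

Only the first call is recorded in `sdk.messages.calls`; the detached call fails before it can push.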

The one-line assignment also typechecks cleanly against the existing
`MessagesClient` interface (SDK's first `create` overload:
`(MessageCreateParamsNonStreaming, Core.RequestOptions?) =>
APIPromise<Message>` is assignable structurally). This gives us
compile-time regression protection: anyone reverting to
`new Anthropic()` would fail tsc because `Anthropic` has no top-level
`.create`. (The companion chore commit puts `tsc --noEmit` in CI so
this guard is enforced.)
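
The compile-time guard can be illustrated with simplified types. Everything below is a hypothetical reduction (single signature, no overloads, made-up `Message` and option shapes), so it shows the structural-typing argument rather than the SDK's actual declarations.

```typescript
// Simplified, hypothetical shapes; the real SDK declares overloads.
interface Message {
  text: string;
}

interface MessagesClient {
  create(params: { prompt: string }, options?: { timeoutMs?: number }): Promise<Message>;
}

class FakeMessages {
  async create(params: { prompt: string }, _options?: { timeoutMs?: number }): Promise<Message> {
    return { text: params.prompt };
  }
}

class FakeSdk {
  messages = new FakeMessages();
}

const sdk = new FakeSdk();

// Structurally assignable: FakeMessages has a compatible `create`.
const ok: MessagesClient = sdk.messages;

// @ts-expect-error FakeSdk has no top-level `create`, so tsc rejects the assignment.
const reverted: MessagesClient = sdk;
```

The `@ts-expect-error` line only compiles while the assignment below it is a type error, which is the property a `tsc --noEmit` step in CI relies on. A transpile-only runner ignores it.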

Also adds a `makeAnthropic?: () => Anthropic` dep-injection seam so
the factory default construction branch is testable without real API
calls. Regression test drives one handler turn through a fake SDK,
asserting `sdk.messages.create` is actually called. If someone later
reverts to `new Anthropic()`, both guards fire: tsc fails AND the test
fails.
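
A rough sketch of the dep-injection seam, assuming the default branch has the shape `deps.client ?? makeAnthropic().messages` (as the regression test's comment states). `SdkLike`, `HandlerDeps`, `makeHandler`, and `defaultMakeSdk` are hypothetical names for illustration, not the handler's real signature.

```typescript
// Hypothetical, simplified shapes for illustration.
interface MessagesClient {
  create(params: { prompt: string }): Promise<string>;
}

interface SdkLike {
  messages: MessagesClient;
}

interface HandlerDeps {
  client?: MessagesClient;  // explicit client, used by most tests
  makeSdk?: () => SdkLike;  // plays the role of `makeAnthropic`
}

function defaultMakeSdk(): SdkLike {
  // Assumption: the real factory would construct the SDK here.
  throw new Error('real SDK construction is out of scope for this sketch');
}

function makeHandler(deps: HandlerDeps) {
  // Default branch: no client injected, so take `.messages` off the SDK.
  const client = deps.client ?? (deps.makeSdk ?? defaultMakeSdk)().messages;
  return (prompt: string) => client.create({ prompt });
}

// Drive the default branch through a fake SDK, with no real API call.
const calls: string[] = [];
const fakeSdk: SdkLike = {
  messages: {
    async create(params) {
      calls.push(params.prompt);
      return 'ok';
    },
  },
};

const handler = makeHandler({ makeSdk: () => fakeSdk });
void handler('hello');
```

Because the test passes only `makeSdk`, a regression in the default branch (for example, using the SDK object itself as the client) would leave `calls` empty.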

Co-Authored-By: Wintermute <wintermute@garrytan.com>

* chore(tests): add bunfig.toml + 60s hook timeouts to stabilize PGLite-heavy suites

After turning on tsc in CI (previous commit), running the full `bun run test`
suite in one shot triggered flaky `beforeEach/afterEach hook timed out`
failures on 8+ test files. Every failure traced to PGLite WASM init
contention when many test files spin up fresh PGLite instances in parallel;
each one alone passes in isolation.

- `bunfig.toml` sets the global test hook timeout to 60s (default is 5s),
  covering every test file without per-file edits.
- Individual `beforeAll(fn, 60_000)` / `beforeEach(fn, 15_000)` calls on
  the 8 tests that flaked most stay in place as explicit safety nets so
  a future bunfig config change doesn't silently re-introduce the flake.

Result: 1997 pass, 0 fail on `bun run test` (117 tests added since the
prior baseline by picking up typecheck-gated passes). No infrastructure
flake tolerated in CI.

* chore: bump version and changelog (v0.16.3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Wintermute <wintermute@openclaw.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 01:34:22 -07:00


/**
 * Subagent handler tests with a mocked Anthropic Messages client.
 *
 * Strategy: every test scripts a sequence of Messages API responses, hands
 * them to a FakeMessagesClient, and inspects (a) the SubagentResult the
 * handler returns and (b) the persisted rows in subagent_messages +
 * subagent_tool_executions. Replay tests simulate a crash by constructing
 * a fresh handler bound to the same job row with partial state already
 * written.
 *
 * PGLite in-memory so the schema, ON CONFLICT, and two-phase persistence
 * all exercise real SQL.
 */
import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { PGLiteEngine } from '../src/core/pglite-engine.ts';
import { MinionQueue } from '../src/core/minions/queue.ts';
import {
  makeSubagentHandler,
  RateLeaseUnavailableError,
  type MessagesClient,
} from '../src/core/minions/handlers/subagent.ts';
import type { ToolDef, MinionJobContext } from '../src/core/minions/types.ts';
import type Anthropic from '@anthropic-ai/sdk';

let engine: PGLiteEngine;
let queue: MinionQueue;

beforeAll(async () => {
  engine = new PGLiteEngine();
  await engine.connect({ database_url: '' });
  await engine.initSchema();
  queue = new MinionQueue(engine);
}, 60_000);

afterAll(async () => {
  await engine.disconnect();
});

beforeEach(async () => {
  await engine.executeRaw('DELETE FROM subagent_tool_executions');
  await engine.executeRaw('DELETE FROM subagent_messages');
  await engine.executeRaw('DELETE FROM subagent_rate_leases');
  await engine.executeRaw('DELETE FROM minion_jobs');
});

// ── FakeMessagesClient ──────────────────────────────────────
type FakeResponse = Partial<Anthropic.Message> & { content: Anthropic.Message['content'] };

class FakeMessagesClient implements MessagesClient {
  public calls: Anthropic.MessageCreateParamsNonStreaming[] = [];

  constructor(private responses: FakeResponse[]) {}

  async create(
    params: Anthropic.MessageCreateParamsNonStreaming,
  ): Promise<Anthropic.Message> {
    this.calls.push(params);
    if (this.responses.length === 0) throw new Error('FakeMessagesClient: out of scripted responses');
    const r = this.responses.shift()!;
    return {
      id: `msg_${this.calls.length}`,
      type: 'message',
      role: 'assistant',
      model: params.model,
      stop_reason: 'end_turn',
      stop_sequence: null,
      usage: { input_tokens: 10, output_tokens: 5, cache_read_input_tokens: 0, cache_creation_input_tokens: 0 } as any,
      ...r,
    } as Anthropic.Message;
  }
}

// Build a synthetic MinionJobContext around a real minion_jobs row. The
// handler only reads data/id/signal/shutdownSignal/updateTokens — we stub
// the rest. `subagent` is a protected job name (Lane 4H) so tests submit
// under the trusted-submit flag.
async function makeCtx(input: unknown): Promise<MinionJobContext> {
  const job = await queue.add(
    'subagent',
    input as Record<string, unknown>,
    {},
    { allowProtectedSubmit: true },
  );
  const ac = new AbortController();
  const shutdown = new AbortController();
  return {
    id: job.id,
    name: job.name,
    data: (input as Record<string, unknown>) ?? {},
    attempts_made: 0,
    signal: ac.signal,
    shutdownSignal: shutdown.signal,
    async updateProgress() {},
    async updateTokens() {},
    async log() {},
    async isActive() { return true; },
    async readInbox() { return []; },
  };
}

// ── Tiny tool registry for tests ────────────────────────────
function makeEchoTool(name = 'echo', idempotent = true): ToolDef {
  return {
    name,
    description: 'echo input',
    input_schema: { type: 'object', properties: { value: { type: 'string' } }, required: [] },
    idempotent,
    async execute(input) { return { echoed: input }; },
  };
}

function makeThrowingTool(name = 'broken'): ToolDef {
  return {
    name,
    description: 'always throws',
    input_schema: { type: 'object', properties: {}, required: [] },
    idempotent: true,
    async execute() { throw new Error('tool broken'); },
  };
}

// ── Tests ───────────────────────────────────────────────────
describe('subagent handler happy path', () => {
  test('no-tool end_turn: returns text response + persists user + assistant rows', async () => {
    const client = new FakeMessagesClient([
      { content: [{ type: 'text', text: 'hello world' }] as any, stop_reason: 'end_turn' },
    ]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [] });
    const ctx = await makeCtx({ prompt: 'hi' });
    const result = await handler(ctx);

    expect(result.result).toBe('hello world');
    expect(result.turns_count).toBe(1);
    expect(result.stop_reason).toBe('end_turn');
    expect(result.tokens.in).toBe(10);
    expect(result.tokens.out).toBe(5);

    const msgs = await engine.executeRaw<{ count: string }>(
      `SELECT count(*)::text AS count FROM subagent_messages WHERE job_id = $1`,
      [ctx.id],
    );
    expect(parseInt(msgs[0]!.count, 10)).toBe(2); // user seed + assistant
  });

  test('single tool_use turn: tool executes, two-phase row goes complete', async () => {
    const tool = makeEchoTool();
    const client = new FakeMessagesClient([
      {
        content: [
          { type: 'tool_use', id: 'tu_1', name: 'echo', input: { value: 'v1' } } as any,
        ],
        stop_reason: 'tool_use' as any,
      },
      {
        content: [{ type: 'text', text: 'done' }] as any,
        stop_reason: 'end_turn',
      },
    ]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
    const ctx = await makeCtx({ prompt: 'go' });
    const result = await handler(ctx);

    expect(result.stop_reason).toBe('end_turn');
    expect(result.result).toBe('done');
    expect(client.calls.length).toBe(2);

    // tool_executions row complete with echoed output
    const rows = await engine.executeRaw<{ status: string; output: unknown }>(
      `SELECT status, output FROM subagent_tool_executions WHERE job_id = $1`,
      [ctx.id],
    );
    expect(rows.length).toBe(1);
    expect(rows[0]!.status).toBe('complete');
    const out = typeof rows[0]!.output === 'string' ? JSON.parse(rows[0]!.output as string) : rows[0]!.output;
    expect(out).toEqual({ echoed: { value: 'v1' } });
  });

  test('tool throws: row goes failed, model sees error, loop continues', async () => {
    const tool = makeThrowingTool();
    const client = new FakeMessagesClient([
      {
        content: [{ type: 'tool_use', id: 'tu_1', name: 'broken', input: {} } as any],
        stop_reason: 'tool_use' as any,
      },
      {
        content: [{ type: 'text', text: 'recovered' }] as any,
        stop_reason: 'end_turn',
      },
    ]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
    const ctx = await makeCtx({ prompt: 'try' });
    const result = await handler(ctx);

    expect(result.stop_reason).toBe('end_turn');
    expect(result.result).toBe('recovered');

    const rows = await engine.executeRaw<{ status: string; error: string | null }>(
      `SELECT status, error FROM subagent_tool_executions WHERE job_id = $1`,
      [ctx.id],
    );
    expect(rows[0]!.status).toBe('failed');
    expect(rows[0]!.error).toContain('tool broken');
  });

  test('unknown tool name fails execution but loop continues', async () => {
    const client = new FakeMessagesClient([
      {
        content: [{ type: 'tool_use', id: 'tu_nope', name: 'no_such_tool', input: {} } as any],
        stop_reason: 'tool_use' as any,
      },
      { content: [{ type: 'text', text: 'ok' }] as any, stop_reason: 'end_turn' },
    ]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [] });
    const ctx = await makeCtx({ prompt: 'x' });
    const result = await handler(ctx);

    expect(result.stop_reason).toBe('end_turn');

    const rows = await engine.executeRaw<{ status: string; error: string | null }>(
      `SELECT status, error FROM subagent_tool_executions WHERE job_id = $1`,
      [ctx.id],
    );
    expect(rows[0]!.status).toBe('failed');
    expect(rows[0]!.error).toContain('not in the registry');
  });

  test('max_turns exceeded returns stop_reason=max_turns', async () => {
    // Model keeps calling tool_use forever; we cap at 2 turns.
    const echoing: FakeResponse[] = Array.from({ length: 5 }).map((_, i) => ({
      content: [{ type: 'tool_use', id: `tu_${i}`, name: 'echo', input: {} } as any],
      stop_reason: 'tool_use' as any,
    }));
    const client = new FakeMessagesClient(echoing);
    const tool = makeEchoTool();
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
    const ctx = await makeCtx({ prompt: 'loop', max_turns: 2 });
    const result = await handler(ctx);

    expect(result.stop_reason).toBe('max_turns');
    expect(result.turns_count).toBe(2);
  });
});

describe('subagent handler replay (crash recovery)', () => {
  test('resumes from persisted messages when prior rows exist', async () => {
    // Seed an in-progress conversation by running a first handler whose
    // client scripts only turn 1, then run a second handler on the SAME job
    // with responses starting at turn 2. No duplicate user-seed row
    // (ON CONFLICT DO NOTHING).
    const tool = makeEchoTool();
    const ctx = await makeCtx({ prompt: 'start' });

    // The first client has a single scripted response, so the handler's
    // second LLM call throws and nothing past the first turn is persisted.
    try {
      const client1 = new FakeMessagesClient([
        {
          content: [{ type: 'tool_use', id: 'tu_1', name: 'echo', input: { v: 1 } } as any],
          stop_reason: 'tool_use' as any,
        },
      ]);
      const interrupted = makeSubagentHandler({ engine, client: client1, toolRegistry: [tool] });
      await interrupted(ctx);
    } catch {
      // Out of scripted responses: simulates a worker kill before turn 2.
    }

    // Confirm partial state was persisted. The expected rows are 1 user +
    // 1 assistant + 1 synthesized user (tool_result) + 1 complete tool_exec,
    // but this check only requires that at least one message row exists.
    const preRows = await engine.executeRaw<{ c: string }>(
      `SELECT count(*)::text AS c FROM subagent_messages WHERE job_id = $1`,
      [ctx.id],
    );
    const preCount = parseInt(preRows[0]!.c, 10);
    expect(preCount).toBeGreaterThanOrEqual(1);

    // Resume with a fresh handler + client that supplies ONE more response.
    const client2 = new FakeMessagesClient([
      { content: [{ type: 'text', text: 'resumed ok' }] as any, stop_reason: 'end_turn' },
    ]);
    const handler2 = makeSubagentHandler({ engine, client: client2, toolRegistry: [tool] });
    const result = await handler2(ctx);

    expect(result.result).toBe('resumed ok');
    expect(result.stop_reason).toBe('end_turn');

    // The second client should see the prior conversation in the messages
    // array: at minimum the user seed + prior assistant + tool_result.
    expect(client2.calls[0]!.messages.length).toBeGreaterThan(1);
  });

  test('prior completed tool exec is replayed without re-invoking execute', async () => {
    // Prior state: a completed tool row. We assert the tool's execute is
    // NOT called on resume. Use a tool that throws if invoked — passing
    // means we used the replay path.
    const throwingTool = makeThrowingTool('pre_done');
    const ctx = await makeCtx({ prompt: 'start' });

    // Seed prior state manually: user, assistant with tool_use, tool_exec complete.
    await engine.executeRaw(
      `INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks)
       VALUES ($1, 0, 'user', $2::jsonb)`,
      [ctx.id, JSON.stringify([{ type: 'text', text: 'start' }])],
    );
    await engine.executeRaw(
      `INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks, model)
       VALUES ($1, 1, 'assistant', $2::jsonb, 'claude-sonnet-4-6')`,
      [
        ctx.id,
        JSON.stringify([
          { type: 'tool_use', id: 'tu_seeded', name: 'pre_done', input: {} },
        ]),
      ],
    );
    await engine.executeRaw(
      `INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status, output)
       VALUES ($1, 1, 'tu_seeded', 'pre_done', '{}'::jsonb, 'complete', $2::jsonb)`,
      [ctx.id, JSON.stringify({ replayed: true })],
    );

    // Handler MUST NOT call the throwing execute and MUST end the loop on
    // the next LLM response.
    const client = new FakeMessagesClient([
      { content: [{ type: 'text', text: 'finished after replay' }] as any, stop_reason: 'end_turn' },
    ]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [throwingTool] });
    const result = await handler(ctx);

    expect(result.stop_reason).toBe('end_turn');
    expect(result.result).toBe('finished after replay');
    // Only one LLM call made on this resume (we had 2 persisted messages +
    // the tool result synthesis happened when resuming, then model spoke).
    expect(client.calls.length).toBe(1);
  });

  test('pending non-idempotent tool exec rejects on resume', async () => {
    const nonIdempotent = { ...makeEchoTool('do_once'), idempotent: false };
    const ctx = await makeCtx({ prompt: 'start' });

    await engine.executeRaw(
      `INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks)
       VALUES ($1, 0, 'user', $2::jsonb)`,
      [ctx.id, JSON.stringify([{ type: 'text', text: 'start' }])],
    );
    await engine.executeRaw(
      `INSERT INTO subagent_messages (job_id, message_idx, role, content_blocks)
       VALUES ($1, 1, 'assistant', $2::jsonb)`,
      [
        ctx.id,
        JSON.stringify([{ type: 'tool_use', id: 'tu_x', name: 'do_once', input: {} }]),
      ],
    );
    await engine.executeRaw(
      `INSERT INTO subagent_tool_executions (job_id, message_idx, tool_use_id, tool_name, input, status)
       VALUES ($1, 1, 'tu_x', 'do_once', '{}'::jsonb, 'pending')`,
      [ctx.id],
    );

    const client = new FakeMessagesClient([]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [nonIdempotent] });
    await expect(handler(ctx)).rejects.toThrow(/non-idempotent/);
  });
});

describe('subagent handler lease behavior', () => {
  test('acquires + releases a lease around the LLM call', async () => {
    const client = new FakeMessagesClient([
      { content: [{ type: 'text', text: 'ok' }] as any, stop_reason: 'end_turn' },
    ]);
    const handler = makeSubagentHandler({
      engine, client, toolRegistry: [], maxConcurrent: 1, rateLeaseKey: 'k1',
    });
    const ctx = await makeCtx({ prompt: 'hi' });
    await handler(ctx);

    // No leases should remain after completion.
    const rows = await engine.executeRaw<{ c: string }>(
      `SELECT count(*)::text AS c FROM subagent_rate_leases`,
    );
    expect(parseInt(rows[0]!.c, 10)).toBe(0);
  });

  test('throws RateLeaseUnavailableError when cap full', async () => {
    // Preload the cap with a stale-looking-but-live lease owned by a
    // different job.
    const owner = await queue.add('holder', {});
    await engine.executeRaw(
      `INSERT INTO subagent_rate_leases (key, owner_job_id, expires_at)
       VALUES ('k_cap', $1, now() + interval '1 minute')`,
      [owner.id],
    );

    const client = new FakeMessagesClient([]);
    const handler = makeSubagentHandler({
      engine, client, toolRegistry: [], maxConcurrent: 1, rateLeaseKey: 'k_cap',
    });
    const ctx = await makeCtx({ prompt: 'blocked' });
    await expect(handler(ctx)).rejects.toBeInstanceOf(RateLeaseUnavailableError);
  });
});

describe('subagent handler input validation', () => {
  test('missing prompt throws', async () => {
    const client = new FakeMessagesClient([]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [] });
    const ctx = await makeCtx({});
    await expect(handler(ctx)).rejects.toThrow(/prompt/);
  });

  test('allowed_tools unknown name rejected at dispatch', async () => {
    const tool = makeEchoTool('real');
    const client = new FakeMessagesClient([]);
    const handler = makeSubagentHandler({ engine, client, toolRegistry: [tool] });
    const ctx = await makeCtx({ prompt: 'x', allowed_tools: ['real', 'ghost_tool'] });
    await expect(handler(ctx)).rejects.toThrow(/unknown tool/);
  });
});

describe('makeSubagentHandler default client construction', () => {
  test('factory default wires sdk.messages through to the handler', async () => {
    // Regression guard for the v0.16.0 shipped bug: makeSubagentHandler
    // was casting `new Anthropic()` (top-level SDK class) to MessagesClient,
    // but `.create()` lives at sdk.messages.create. Every subagent job in
    // production died with "client.create is not a function" on first LLM
    // call. This test exercises the default-client path (no `deps.client`
    // injected) via the makeAnthropic dep-injection seam, so the exact
    // default-branch construction is covered without a real API call.
    const calls: Anthropic.MessageCreateParamsNonStreaming[] = [];
    const fakeSdk = {
      messages: {
        async create(
          params: Anthropic.MessageCreateParamsNonStreaming,
        ): Promise<Anthropic.Message> {
          calls.push(params);
          return {
            id: 'msg_regression',
            type: 'message',
            role: 'assistant',
            model: params.model,
            stop_reason: 'end_turn',
            stop_sequence: null,
            content: [{ type: 'text', text: 'ok' }],
            usage: {
              input_tokens: 1,
              output_tokens: 1,
              cache_read_input_tokens: 0,
              cache_creation_input_tokens: 0,
            },
          } as unknown as Anthropic.Message;
        },
      },
    } as unknown as Anthropic;

    // Crucial: do NOT pass `client`. Only `makeAnthropic`. This forces the
    // factory to hit the default-client branch (`deps.client ?? makeAnthropic().messages`).
    const handler = makeSubagentHandler({
      engine,
      makeAnthropic: () => fakeSdk,
      toolRegistry: [],
    });
    const ctx = await makeCtx({ prompt: 'hello' });
    const result = await handler(ctx);

    expect(calls.length).toBe(1);
    expect(result.stop_reason).toBe('end_turn');
    expect(result.result).toBe('ok');
  });
});