# Sub-Agent Model Routing

## Goal

Route sub-agents to the cheapest model that can do the job, saving 10-40x on costs without sacrificing quality.

## What the User Gets

Without this: every sub-agent runs on Opus ($15/MTok). Entity detection on
every message costs $3-5/day. Research tasks cost $10+ each.

With this: entity detection runs on Sonnet ($3/MTok, 5x cheaper). Research
runs on DeepSeek ($0.50/MTok, 30x cheaper). Main session stays on Opus for
quality. Total cost drops 70-80%.

## Implementation

### Routing Table

| Task Type | Recommended Model | Why |
|-----------|------------------|-----|
| Main session / complex instructions | Opus-class (default) | Best reasoning and instruction following |
| Research / synthesis / analysis | DeepSeek V3 or equivalent | 25-40x cheaper, strong on exploratory work |
| Structured output / long context | Large context model (Qwen, Gemini) | 200K+ context, reliable JSON output |
| Fast lightweight sub-agents | Fast inference model (Groq) | 500 tok/s, cheap, good for quick tasks |
| Deep reasoning (use sparingly) | Reasoning model (DeepSeek-R1, o3) | Best for hard problems, expensive |
| Entity detection (signal detector) | Sonnet-class | Fast, cheap, sufficient quality for detection |

### The Signal Detector Pattern

Spawn a lightweight sub-agent on EVERY inbound message. This is mandatory.

```
on_every_message(text):
  // Spawn async — don't block the response
  spawn_subagent({
    task: `SIGNAL DETECTION — scan this message:
    "${text}"

    1. IDEAS FIRST: Is the user expressing an original thought?
       If yes -> create/update brain/originals/ with EXACT phrasing
    2. ENTITIES: Extract person names, company names, media titles
       For each -> check brain, create/enrich if notable
    3. FACTS: New info about existing entities -> update timeline
    4. CITATIONS: Every fact needs [Source: ...] attribution
    5. Sync changes to brain repo`,
    model: "sonnet-class",  // fast + cheap
    timeout: 120s
  })
```

**Why Sonnet-class for detection:** Entity detection is pattern matching, not
deep reasoning. Sonnet is 5-10x cheaper than Opus and fast enough for async
detection. The main session continues on Opus while detection runs in parallel.

### Research Pipeline Pattern

For research-heavy tasks, use a multi-model pipeline:

```
1. PLANNING (Opus):     Write research brief, identify what to look for
2. EXECUTION (DeepSeek): Sub-agent does the actual research (web, APIs, docs)
3. SYNTHESIS (Opus):     Read research output, add strategic analysis
```

**Why this works:** The planning and synthesis steps need taste and judgment
(Opus). The execution step is mechanical data gathering (DeepSeek at 25-40x
lower cost). You get Opus-quality output at DeepSeek-level cost for 80% of
the work.

### When to Spawn Sub-Agents

| Situation | Spawn? | Model |
|-----------|--------|-------|
| Every inbound message | YES (mandatory) | Sonnet |
| Research request | YES | DeepSeek for execution |
| Quick lookup / fact check | YES | Fast model (Groq) |
| Complex analysis | NO -- handle in main session | Opus |
| Writing / editing | NO -- handle in main session | Opus |

### Cost Optimization

The main session runs on your best model. Everything else runs on the
cheapest model that can do the job. In practice, 60-70% of sub-agent
work is entity detection (Sonnet) and research execution (DeepSeek),
which are 10-40x cheaper than the main session model.

## Tricky Spots

1. **Sonnet, not Opus, for detection.** The most common mistake is running
   entity detection on Opus. Detection is pattern matching, not deep reasoning.
   Sonnet is 5-10x cheaper and fast enough. Reserve Opus for the main session
   where reasoning quality matters.

2. **Don't block the main thread.** Sub-agents must run asynchronously. If the
   signal detector runs synchronously, the user waits 30-120 seconds for every
   message while entity detection completes. Spawn and forget. The user sees
   a response immediately.

3. **Cost optimization is multiplicative.** Entity detection runs on every
   single message. If you use Opus at $15/MTok for detection across 50
   messages/day, that's $3-5/day just for detection. Sonnet at $3/MTok brings
   that to $0.60-1.00/day. Over a month, the wrong model choice costs $100+
   more than necessary.

## How to Verify

1. **Spawn a signal detector and check the model.** Send a message and verify
   the sub-agent was spawned on Sonnet-class, not Opus. Check the model field
   in the sub-agent config or logs.

2. **Check cost per day.** After running for a day with sub-agent routing,
   compare total API costs against the previous day without routing. You
   should see a 50-80% reduction in total cost.

3. **Verify async execution.** Send a message and measure response time. The
   response should arrive in under 5 seconds. If it takes 30+ seconds, the
   signal detector is running synchronously and blocking the main thread.

---

*Part of the [GBrain Skillpack](../GBRAIN_SKILLPACK.md).*