Add Section 16: Deterministic Collectors — Code for Data, LLMs for Judgment (#13)

Pattern for when LLMs keep failing at mechanical formatting tasks despite
prompt fixes. Move mechanical work to deterministic code, feed LLM
pre-formatted data. Real example: email URL generation.

Co-authored-by: Wintermute <wintermute@openclaw.ai>
This commit is contained in:
Garry Tan
2026-04-09 11:35:33 -10:00
committed by GitHub
parent 95353f8790
commit 00217feda3

View File

@@ -874,16 +874,168 @@ Every stored message includes:
{
"id": "19d74109a811b9e7",
"account": "work",
"authuser": "garry@ycombinator.com",
"from": "Alex Levy",
"authuser": "user@example.com",
"from": "Alex Smith",
"subject": "Re: Next Steps",
"snippet": "Hey Garry, wanted to follow up on...",
"snippet": "Hey, wanted to follow up on...",
"timestamp": "2026-04-09T08:56:09Z",
"is_unread": true,
"is_noise": false,
"is_signature": false,
"gmail_link": "https://mail.google.com/mail/u/?authuser=garry@ycombinator.com#inbox/19d74109a811b9e7",
"gmail_markdown": "[Open in Gmail](https://mail.google.com/mail/u/?authuser=garry@ycombinator.com#inbox/19d74109a811b9e7)"
"gmail_link": "https://mail.google.com/mail/u/?authuser=user@example.com#inbox/19d74109a811b9e7",
"gmail_markdown": "[Open in Gmail](https://mail.google.com/mail/u/?authuser=user@example.com#inbox/19d74109a811b9e7)"
}
```
The `gmail_link` and `gmail_markdown` fields are computed from `id` + `authuser` at
collection time. Three lines of code. Never wrong.
### Cron Integration
The email monitoring cron runs the collector first, then invokes the LLM:
```
1. node email-collector.mjs collect → deterministic API pull, store JSON
2. node email-collector.mjs digest → generate markdown with links baked in
3. node email-collector.mjs signatures → list pending e-signature items
4. LLM reads digest + signatures → classifies, enriches, posts to user
```
The collector runs in under 10 seconds. The LLM analysis takes 30-60 seconds. Total:
under 90 seconds for a full inbox sweep with brain enrichment.
### Where Else This Pattern Applies
The deterministic-collector pattern works for any recurring data pull where the LLM
was previously responsible for both fetching AND formatting:
| Signal Source | Collector Generates | LLM Adds |
|--------------|-------------------|----------|
| **Email** | Gmail links, sender metadata, signature detection | Urgency classification, enrichment, reply drafts |
| **X/Twitter** | Tweet links, engagement metrics, deletion detection | Sentiment analysis, narrative detection, content ideas |
| **Calendar** | Event links, attendee lists, conflict detection | Prep briefings, meeting context from brain |
| **Slack** | Channel links, thread links, mention detection | Priority classification, action item extraction |
| **GitHub** | PR/issue links, diff stats, CI status | Code review context, priority assessment |
The principle: if a piece of output MUST be present and MUST be formatted correctly
every time, generate it in code. If a piece of output requires judgment, context, or
creativity, generate it with the LLM. Don't ask the LLM to do both in the same pass.
### The Lesson
When an LLM keeps failing at a mechanical task despite repeated prompt fixes:
1. **Stop adding more prompt language.** You've already written "NO EXCEPTIONS" and
"ZERO TOLERANCE." The LLM read it. The failure is probabilistic, not comprehension.
2. **Identify what's mechanical vs. analytical.** URL generation is mechanical.
Classification is analytical. State tracking is mechanical. Commentary is analytical.
3. **Move the mechanical work to a script.** Node.js, Python, bash -- anything
deterministic. No LLM calls, no external dependencies if possible.
4. **Feed the LLM pre-formatted data.** The script's output becomes the LLM's input.
Links are already there. Metadata is already structured. The LLM just adds judgment.
5. **Wire it into your cron.** Script runs first (fast, cheap, reliable), then LLM
reads the output (slower, expensive, creative).
This is not about the LLM being bad. It's about using the right tool for the right
job. Code is 100% reliable at string concatenation. LLMs are 90% reliable at string
concatenation but 10x better at understanding what an email means. Use both.
---
## 16. Deterministic Collectors -- Code for Data, LLMs for Judgment
When your agent keeps failing at a mechanical task despite repeated prompt fixes, stop
fighting the LLM. Move the mechanical work to code.
### The Pattern That Broke
We built an email triage system. The agent swept Gmail, classified emails by urgency,
and posted a digest to the user. One rule: every email item must include a clickable
`[Open in Gmail]` link so the user can act on it with one tap.
We put the rule in the skill file. We put it in MEMORY.md. We put it in the cron
prompt. We wrote "NO EXCEPTIONS" in all caps. We wrote "ZERO TOLERANCE" after the
fourth failure. The agent still dropped links -- on carry-forward reminders, on FYI
items, on "still awaiting" sections. The user asked five times. Each time we added
stronger language to the prompt.
The failure mode is probabilistic. The LLM understands the rule. It follows it for the
first 10 items. Then it gets sloppy on item 11, especially on items that are
re-surfaced from state rather than freshly pulled from the API. No amount of prompt
engineering fixes a 90%-reliable formatting task, because 90% reliability over 20 items
per sweep means you fail visibly about twice per day. That's enough to destroy trust.
### The Fix: Separate Deterministic from Analytical
```
┌─────────────────────────────┐ ┌──────────────────────────────┐
│ Deterministic Collector │────▶│ LLM Agent │
│ (Node.js / Python script) │ │ │
│ │ │ • Read the pre-formatted │
│ • Pull data from API │ │ digest │
│ • Store structured JSON │ │ • Classify items │
│ • Generate links/URLs │ │ • Add commentary │
│ • Detect patterns (regex) │ │ • Run brain enrichment │
│ • Track state (seen/new) │ │ • Draft replies │
│ • Output markdown digest │ │ • Surface to user │
│ │ │ │
│ CODE — deterministic, │ │ AI — judgment, context, │
│ never forgets │ │ creativity │
└─────────────────────────────┘ └──────────────────────────────┘
```
The collector handles everything mechanical:
- Pulling emails from Gmail (via credential gateway)
- Generating `[Open in Gmail](URL)` from message IDs -- **by code, not by LLM**
- Detecting signature requests (DocuSign/Dropbox Sign regex patterns)
- Tracking which messages are new vs. already seen (state file)
- Storing structured JSON with full metadata
- Generating a pre-formatted markdown digest with every link already embedded
The LLM reads the pre-formatted digest and does what LLMs are good at:
- Classifying urgency (requires understanding relationships, deadlines, context)
- Writing commentary ("this is the $110M acquisition thread, 7 days dropped")
- Running brain enrichment on notable entities (`gbrain search` + page updates)
- Drafting replies
- Deciding what to surface vs. filter
**The links are in the source data. The LLM can't forget them because it doesn't
generate them.**
### Implementation
The email collector follows the same architecture as the X/Twitter collector (a
deterministic data pipeline for social media monitoring):
```
scripts/email-collector/
├── email-collector.mjs # No LLM calls, no external deps
├── data/
│ ├── state.json # Last pull timestamp, known IDs, pending signatures
│ ├── messages/ # Structured JSON per day
│ │ └── 2026-04-09.json
│ └── digests/ # Pre-formatted markdown
│ └── 2026-04-09.md
```
Every stored message includes:
```json
{
"id": "19d74109a811b9e7",
"account": "work",
"authuser": "user@example.com",
"from": "Alex Smith",
"subject": "Re: Next Steps",
"snippet": "Hey, wanted to follow up on...",
"timestamp": "2026-04-09T08:56:09Z",
"is_unread": true,
"is_noise": false,
"is_signature": false,
"gmail_link": "https://mail.google.com/mail/u/?authuser=user@example.com#inbox/19d74109a811b9e7",
"gmail_markdown": "[Open in Gmail](https://mail.google.com/mail/u/?authuser=user@example.com#inbox/19d74109a811b9e7)"
}
```