Cost model
Wyren’s distiller runs continuously in the background. Naively, that’s wasteful. Wyren uses a three-tier pipeline to keep cost minimal — or zero under the preferred execution path.
Naive baseline (what we DON’T do)
Section titled “Naive baseline (what we DON’T do)”Without tiering, every Stop hook triggers a Sonnet 4.6 call:
- Per call: ~10k input tokens (transcript slice + memory + prompt) + ~2k output.
- Sonnet 4.6: $3/M input + $15/M output → ~$0.06 per call.
- 3 teammates × 4 active hours × 12 triggers/hr = 144 calls = ~$8.60 total.
Cheap in absolute terms, but most turns contribute nothing to memory (tool-use loops, file reads, small tweaks). The fix is tiering.
Tier 0 — Local scoring filter (FREE)
Section titled “Tier 0 — Local scoring filter (FREE)”Before any API call, lib/filter.mjs scores the transcript slice. Scoring runs entirely in Node — no API call, no cost.
Text signals (weighted, capped per category):
| Category | Weight | Examples |
|---|---|---|
| Decision language | 3 | decided, we're going with, chose, agreed |
| Rejection / failure | 3 | rejected, doesn't work, tried X but, abandoned |
| Deliberate hacks | 3 | workaround, hack, hardcoded, stub, mock |
| Scope signals | 2 | out of scope, deferred, descoped, dropping |
| Open questions | 2 | open question, still deciding, TBD, revisit |
| Maintenance flags | 2 | TODO, FIXME, before launch |
| Weak signals | 1 | actually, instead, broken, for now |
Edit tool calls (Edit / Write / MultiEdit) are scored separately at weight 3 per call, capped at weight × 4.
Structural signals (scored on raw JSONL lines, not rendered text):
- Session length ≥ 10 turns: +2; ≥ 20 turns: +4 total
- Average user message length > 200 chars: +2 (explains context or decisions)
- File edits ≥ 3: +2; ≥ 8: +4 total
Rule: total score must reach the threshold (default 3, overridable via WYREN_TIER0_THRESHOLD) to proceed. A single high-value signal word is enough; weak words alone need reinforcement. ~60–70% of triggers die here for free.
Tier 1 — Claude Haiku 4.5 (routine distillation)
Section titled “Tier 1 — Claude Haiku 4.5 (routine distillation)”The remaining 30–40% of triggers go to Haiku 4.5 (claude-haiku-4-5-20251001):
- Pricing:
$1/M input + $5/M output → **$0.02 per call**. - Latency: ~2 s P50 — background completes before the next trigger.
- Quality: sufficient for incremental updates. The task is merging a small delta into a well-structured existing file, not generating from scratch.
This is the only model the Stop hook ever spawns automatically.
Tier 2 — Claude Sonnet 4.6 (deep cleanup)
Section titled “Tier 2 — Claude Sonnet 4.6 (deep cleanup)”Sonnet is designed for a slower cadence to correct Haiku drift over long sessions:
- Once per hour after any Tier 1 runs
- When
memory.mdexceeds 60 lines → forced re-compression - On explicit
/wyren-handoff
Not yet automated. The standalone distiller.mjs accepts --model <id>, so you can invoke Sonnet manually with wyren distill --model claude-sonnet-4-6 --push — but the Stop hook always uses Haiku. No timer or line-count trigger exists yet. /wyren-handoff runs the normal distiller (Haiku) with --push.
Haiku drift on long sessions is a real concern — Tier 2 automation is planned follow-up work, not a current capability. See Future.
Prompt caching (SDK path only)
Section titled “Prompt caching (SDK path only)”When using the Anthropic SDK (not claude -p):
- System prompt → 5 min cache, ~$0 on repeat reads.
- Current
memory.md→ 5 min cache, ~$0 on repeat reads. - Only the transcript slice varies per call — small incremental cost.
Effective cached per-call cost: ~$0.003 on Haiku, ~$0.008 on Sonnet.
Revised total — what actually ships
Section titled “Revised total — what actually ships”With Tier 0 + Tier 1 (no Tier 2 automation):
- 144 raw triggers (3 teammates × 4 active hours × 12/hr).
- Tier 0 kills ~70% → ~43 triggers reach the API.
- All 43 go to Haiku. Under
claude -pthese draw from each teammate’s Claude Code quota — no separate billing. - If the SDK path is used instead with prompt caching: 43 × $0.003 = ~$0.13 total for the session.
Compared to the naive baseline of $8.60 — roughly 65× cheaper, and $0 under the preferred claude -p path.
The zero-billing path
Section titled “The zero-billing path”When using claude -p headless (the default), distillation draws from each teammate’s existing Claude Code quota — no separate API billing surface at all.
The tiered SDK path only matters for cost-conscious deployments where teammates do not have Claude Code subscriptions.
Free-only fallback
Section titled “Free-only fallback”If a teammate has no Claude Code auth AND no API key (edge case), Tier 0 scoring alone produces a degraded but non-zero memory:
- Bullet-list every signal-word match with surrounding context.
- No semantic cleanup — verbose and noisy.
- Still better than nothing on a team with one paid member distilling on everyone else’s behalf.
Which tiers ship?
Section titled “Which tiers ship?”| Tier | Status |
|---|---|
| Tier 0 (weighted scoring filter) | ✅ Shipped in lib/filter.mjs. Mandatory — runs before every API call. Kills ~70% of triggers. |
| Tier 1 (Haiku 4.5) | ✅ Shipped as default in distiller.mjs. Invoked via claude -p --model claude-haiku-4-5-20251001. |
| Tier 2 (Sonnet on timer / size threshold) | ❌ Not automated. distiller.mjs --model <id> accepts Sonnet manually; no automatic trigger. |
| SDK prompt caching | ❌ Not shipped. Current path uses claude -p exclusively. |