Cost model

Wyren’s distiller runs continuously in the background, which would be wasteful if every trigger reached a model. Wyren uses a three-tier pipeline to keep the cost minimal, with no separate API bill at all under the preferred execution path.

Without tiering, every Stop hook triggers a Sonnet 4.6 call:

  • Per call: ~10k input tokens (transcript slice + memory + prompt) + ~2k output.
  • Sonnet 4.6: $3/M input + $15/M output → ~$0.06 per call.
  • 3 teammates × 4 active hours × 12 triggers/hr = 144 calls → 144 × $0.06 ≈ $8.60 total.
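
The baseline arithmetic above can be checked with a few lines of Node. This is only a sketch of the calculation: the helper name `callCost` and the price object are illustrative, while the token counts and prices are the ones quoted in the list.

```javascript
// Prices in USD per million tokens, as quoted above for Sonnet 4.6.
const SONNET = { inputPerM: 3, outputPerM: 15 };

// Cost in USD of one distillation call for the given token counts.
function callCost(price, inputTokens, outputTokens) {
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
}

const perCall = callCost(SONNET, 10_000, 2_000);
const calls = 3 * 4 * 12; // teammates x active hours x triggers per hour
const total = perCall * calls;

console.log(perCall.toFixed(2), calls, total.toFixed(2)); // 0.06 144 8.64
```

The unrounded total is $8.64, which the text rounds to ~$8.60.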

Cheap in absolute terms, but most turns contribute nothing to memory (tool-use loops, file reads, small tweaks). The fix is tiering.

Tier 0 is a weighted scoring filter. Before any API call, lib/filter.mjs scores the transcript slice. Scoring runs entirely in Node, so it involves no API call and no cost.

Text signals (weighted, capped per category):

| Category | Weight | Examples |
|---|---|---|
| Decision language | 3 | decided, we're going with, chose, agreed |
| Rejection / failure | 3 | rejected, doesn't work, tried X but, abandoned |
| Deliberate hacks | 3 | workaround, hack, hardcoded, stub, mock |
| Scope signals | 2 | out of scope, deferred, descoped, dropping |
| Open questions | 2 | open question, still deciding, TBD, revisit |
| Maintenance flags | 2 | TODO, FIXME, before launch |
| Weak signals | 1 | actually, instead, broken, for now |

Edit tool calls (Edit / Write / MultiEdit) are scored separately at weight 3 per call, capped at weight × 4 (at most 12 points, reached after four calls).
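
To make the weighting concrete, here is a minimal sketch of how text signals and edit calls might be scored. It is not the code in lib/filter.mjs: the function names, the plain substring matching, counting each phrase at most once, and reusing the "weight × 4" cap for text categories are all assumptions made for illustration. The phrase lists are taken from the examples above and are abbreviated.

```javascript
// Weights follow the category list above; phrase lists are abbreviated.
const CATEGORIES = [
  { name: "decision", weight: 3, phrases: ["decided", "we're going with", "chose", "agreed"] },
  { name: "rejection", weight: 3, phrases: ["rejected", "doesn't work", "abandoned"] },
  { name: "hack", weight: 3, phrases: ["workaround", "hack", "hardcoded", "stub", "mock"] },
  { name: "scope", weight: 2, phrases: ["out of scope", "deferred", "descoped", "dropping"] },
  { name: "open", weight: 2, phrases: ["open question", "still deciding", "tbd", "revisit"] },
  { name: "maintenance", weight: 2, phrases: ["todo", "fixme", "before launch"] },
  { name: "weak", weight: 1, phrases: ["actually", "instead", "broken", "for now"] },
];
const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit"]);
const MAX_HITS = 4; // assumption: text categories share the edit-call cap of weight x 4

// Sum of weight x (distinct phrases matched), capped per category.
function scoreText(text) {
  const lower = text.toLowerCase();
  let score = 0;
  for (const { weight, phrases } of CATEGORIES) {
    const hits = phrases.filter((phrase) => lower.includes(phrase)).length;
    score += weight * Math.min(hits, MAX_HITS);
  }
  return score;
}

// Weight 3 per edit-type tool call, capped at four calls.
function scoreEditCalls(toolNames) {
  const edits = toolNames.filter((name) => EDIT_TOOLS.has(name)).length;
  return 3 * Math.min(edits, MAX_HITS);
}

console.log(scoreText("We decided to use a workaround for now")); // 7
console.log(scoreEditCalls(["Edit", "Read", "Write"])); // 6
```

Substring matching is deliberately crude here (for example, "hack" also matches "hackathon"); a real filter would likely use word boundaries.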

Structural signals (scored on raw JSONL lines, not rendered text):

  • Session length ≥ 10 turns: +2; ≥ 20 turns: +4 total
  • Average user message length > 200 chars: +2 (long messages tend to explain context or decisions)
  • File edits ≥ 3: +2; ≥ 8: +4 total
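
The structural rules above could be sketched as follows. The JSONL line shape used here (a `type` of "user" or "assistant" and a `message.content` that is either a string or an array of blocks) is an assumption for the example, as are the function name and the choice to count every user or assistant line as one turn.

```javascript
const FILE_EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit"]);

// Assumed line shape: {"type":"user"|"assistant","message":{"content": string | block[]}}
function scoreStructure(jsonlLines) {
  let turns = 0;
  let userMessages = 0;
  let userChars = 0;
  let fileEdits = 0;

  for (const line of jsonlLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines
    }
    if (entry?.type !== "user" && entry?.type !== "assistant") continue;
    turns += 1;

    const content = entry.message?.content;
    const blocks = Array.isArray(content)
      ? content
      : [{ type: "text", text: String(content ?? "") }];

    if (entry.type === "user") {
      userMessages += 1;
      for (const block of blocks) {
        if (block.type === "text") userChars += (block.text ?? "").length;
      }
    } else {
      fileEdits += blocks.filter(
        (block) => block.type === "tool_use" && FILE_EDIT_TOOLS.has(block.name)
      ).length;
    }
  }

  let score = 0;
  if (turns >= 20) score += 4;
  else if (turns >= 10) score += 2;
  if (userMessages > 0 && userChars / userMessages > 200) score += 2;
  if (fileEdits >= 8) score += 4;
  else if (fileEdits >= 3) score += 2;
  return score;
}
```

The tiers are exclusive rather than additive, which matches the "+4 total" wording: a 20-turn session earns 4 points, not 2 + 4.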

Rule: total score must reach the threshold (default 3, overridable via WYREN_TIER0_THRESHOLD) to proceed. A single high-value signal word is enough; weak words alone need reinforcement. ~60–70% of triggers die here for free.
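
A minimal sketch of the gate itself, assuming the threshold is read as an integer from the environment. Only the variable name WYREN_TIER0_THRESHOLD and the default of 3 come from the text; the function names and the fallback to the default on unparsable input are assumptions.

```javascript
const DEFAULT_THRESHOLD = 3;

// Read the threshold from the environment; fall back to the default on bad input.
function tier0Threshold(env = process.env) {
  const parsed = Number.parseInt(env.WYREN_TIER0_THRESHOLD ?? "", 10);
  return Number.isNaN(parsed) ? DEFAULT_THRESHOLD : parsed;
}

// True when the trigger should proceed to Tier 1.
function passesTier0(totalScore, env = process.env) {
  return totalScore >= tier0Threshold(env);
}

console.log(passesTier0(3, {})); // true: one weight-3 signal is enough
console.log(passesTier0(2, {})); // false: two weak signals still fall short
```

Raising the threshold trades recall for cost: fewer triggers reach the model, at the risk of skipping a turn that mattered.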

Tier 1 — Claude Haiku 4.5 (routine distillation)

The remaining 30–40% of triggers go to Haiku 4.5 (claude-haiku-4-5-20251001):

  • Pricing: $1/M input + $5/M output → **~$0.02 per call** at the same ~10k input / ~2k output.
  • Latency: ~2 s P50, so the background run completes before the next trigger.
  • Quality: sufficient for incremental updates. The task is merging a small delta into a well-structured existing file, not generating from scratch.

This is the only model the Stop hook ever spawns automatically.

Tier 2 — Claude Sonnet 4.6 (deep cleanup)

Sonnet is intended to run at a slower cadence, correcting Haiku drift over long sessions. The planned triggers are:

  • Once per hour, provided at least one Tier 1 run has occurred
  • When memory.md exceeds 60 lines → forced re-compression
  • On explicit /wyren-handoff

None of this is automated yet. The standalone distiller.mjs accepts --model <id>, so you can invoke Sonnet manually with wyren distill --model claude-sonnet-4-6 --push, but the Stop hook always uses Haiku. No timer or line-count trigger exists, and /wyren-handoff currently runs the normal distiller (Haiku) with --push.

Haiku drift on long sessions is a real concern — Tier 2 automation is planned follow-up work, not a current capability. See Future.
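
For illustration only, the three planned triggers could be expressed as a single decision function. Nothing like this exists in Wyren today; the function name, the input fields, and the reading of "once per hour" as "an hour since the last Sonnet pass with at least one Tier 1 run in between" are all hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const MAX_MEMORY_LINES = 60; // memory.md size limit named above

// Returns the reason a Sonnet pass would be due, or null if none applies.
function tier2Reason({ now, lastTier2At, tier1RunsSinceTier2, memoryLines, handoffRequested }) {
  if (handoffRequested) return "handoff";
  if (memoryLines > MAX_MEMORY_LINES) return "memory-size";
  if (tier1RunsSinceTier2 > 0 && now - lastTier2At >= HOUR_MS) return "hourly";
  return null;
}

console.log(
  tier2Reason({
    now: 2 * HOUR_MS,
    lastTier2At: 0,
    tier1RunsSinceTier2: 3,
    memoryLines: 40,
    handoffRequested: false,
  })
); // hourly
```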

When using the Anthropic SDK (not claude -p), prompt caching covers the stable parts of each request:

  • System prompt → 5 min cache, ~$0 on repeat reads.
  • Current memory.md → 5 min cache, ~$0 on repeat reads.
  • Only the transcript slice varies per call — small incremental cost.

Effective cached per-call cost: ~$0.003 on Haiku, ~$0.008 on Sonnet.
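
A sketch of what a cache-friendly request might look like on the SDK path. The `cache_control: { type: "ephemeral" }` marker on system blocks is the Messages API mechanism for prompt caching, with a 5-minute default lifetime; the function name, the `max_tokens` value, and the way memory.md is labelled in the prompt are assumptions, and this path is not part of Wyren today.

```javascript
// Builds parameters for a Messages API call with the stable prefix marked cacheable.
function buildDistillRequest({ model, systemPrompt, memory, transcriptSlice }) {
  return {
    model,
    max_tokens: 2048, // assumption: roughly the ~2k output budget quoted earlier
    system: [
      // Stable across calls: cached after the first request.
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
      // Changes only when memory.md is rewritten.
      { type: "text", text: `Current memory.md:\n${memory}`, cache_control: { type: "ephemeral" } },
    ],
    // Varies on every call, so it stays outside the cached prefix.
    messages: [{ role: "user", content: transcriptSlice }],
  };
}

const request = buildDistillRequest({
  model: "claude-haiku-4-5-20251001",
  systemPrompt: "You maintain a shared memory file.",
  memory: "(existing memory)",
  transcriptSlice: "(latest transcript slice)",
});
console.log(request.system.length); // 2
```

The resulting object would be passed to `client.messages.create(request)` from the `@anthropic-ai/sdk` package. Ordering matters: cached blocks must come before the part that changes, which is why the transcript slice goes last.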

With Tier 0 + Tier 1 (no Tier 2 automation):

  • 144 raw triggers (3 teammates × 4 active hours × 12/hr).
  • Tier 0 kills ~70% (the upper end of the 60–70% estimate) → ~43 triggers reach the API.
  • All 43 go to Haiku. Under claude -p these draw from each teammate’s Claude Code quota — no separate billing.
  • If the SDK path is used instead with prompt caching: 43 × $0.003 = ~$0.13 total for the session.

Compared to the naive baseline of ~$8.60, the SDK path is roughly 65× cheaper, and the preferred claude -p path adds no separate API bill at all.
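
The projection can be reproduced from the figures already given. This is a sketch of the arithmetic only; the variable names are illustrative, and the 70% filter rate and $0.003 cached per-call cost are the document's own estimates.

```javascript
const rawTriggers = 3 * 4 * 12; // teammates x active hours x triggers per hour
const filteredShare = 0.7; // share of triggers stopped by Tier 0
const cachedHaikuCost = 0.003; // USD per call on the SDK path with caching
const naiveSonnetCost = 0.06; // USD per call in the untiered baseline

const reachingApi = Math.round(rawTriggers * (1 - filteredShare));
const sdkTotal = reachingApi * cachedHaikuCost;
const naiveTotal = rawTriggers * naiveSonnetCost;
const ratio = naiveTotal / sdkTotal;

console.log(reachingApi, sdkTotal.toFixed(2), naiveTotal.toFixed(2));
```

With unrounded inputs the ratio comes out at about 67, in line with the "roughly 65×" quoted from the rounded figures.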

When using claude -p headless (the default), distillation draws from each teammate’s existing Claude Code quota — no separate API billing surface at all.

The tiered SDK path only matters for cost-conscious deployments where teammates do not have Claude Code subscriptions.

If a teammate has no Claude Code auth AND no API key (edge case), Tier 0 scoring alone produces a degraded but non-zero memory:

  • Bullet-list every signal-word match with surrounding context.
  • No semantic cleanup — verbose and noisy.
  • Still better than nothing on a team with one paid member distilling on everyone else’s behalf.
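
The fallback described above amounts to a keyword-in-context listing. A minimal sketch, assuming a small abbreviated word list and a fixed context window; the function name, the window size, and the output format are hypothetical.

```javascript
// Abbreviated list; a real fallback would reuse the Tier 0 signal words.
const SIGNAL_WORDS = ["decided", "rejected", "workaround", "out of scope", "tbd", "todo"];

// One bullet per signal-word match, with surrounding context and no cleanup.
function fallbackMemory(text, contextChars = 40) {
  const lower = text.toLowerCase();
  const bullets = [];
  for (const word of SIGNAL_WORDS) {
    let from = 0;
    let at = lower.indexOf(word, from);
    while (at !== -1) {
      const start = Math.max(0, at - contextChars);
      const end = Math.min(text.length, at + word.length + contextChars);
      const snippet = text.slice(start, end).replace(/\s+/g, " ").trim();
      bullets.push(`• [${word}] ${snippet}`);
      from = at + word.length;
      at = lower.indexOf(word, from);
    }
  }
  return bullets.join("\n");
}

console.log(fallbackMemory("We decided to ship. TODO: add tests."));
```

Overlapping context windows produce near-duplicate bullets, which is exactly the "verbose and noisy" behavior noted in the list.
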

| Tier | Status |
|---|---|
| Tier 0 (weighted scoring filter) | Shipped in lib/filter.mjs. Mandatory: runs before every API call. Kills ~70% of triggers. |
| Tier 1 (Haiku 4.5) | Shipped as default in distiller.mjs. Invoked via claude -p --model claude-haiku-4-5-20251001. |
| Tier 2 (Sonnet on timer / size threshold) | Not automated. distiller.mjs --model <id> accepts Sonnet manually; no automatic trigger. |
| SDK prompt caching | Not shipped. Current path uses claude -p exclusively. |