# Cost Optimization
How Lavra assigns its 30 agents to model tiers for optimal cost/performance balance.
## Tier Breakdown
| Tier | Count | Agents |
|---|---|---|
| haiku | 5 | ankane-readme-writer, framework-docs-researcher, learnings-researcher, repo-research-analyst, lint |
| sonnet | 18 | Most reviewers and workflow agents |
| inherit | 7 | agent-native-reviewer, architecture-strategist, data-integrity-guardian, data-migration-expert, julik-frontend-races-reviewer, performance-oracle, spec-flow-analyzer |
`inherit` means the agent runs at whatever model the calling command uses (typically sonnet).
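
As a sketch of how that resolution could work (the `resolveModel` helper and its types are illustrative assumptions, not Lavra's actual code):

```typescript
// Hypothetical sketch of how an "inherit" tier could resolve at dispatch time.
type ModelTier = "haiku" | "sonnet" | "opus";
type AgentTier = ModelTier | "inherit";

function resolveModel(agentTier: AgentTier, callerModel: ModelTier): ModelTier {
  // "inherit" defers to whatever model the calling command is running on;
  // fixed-tier agents keep their configured tier regardless of the caller.
  return agentTier === "inherit" ? callerModel : agentTier;
}

// Example: architecture-strategist is configured as "inherit",
// so invoking it from an opus session runs it on opus.
resolveModel("inherit", "opus"); // => "opus"
resolveModel("haiku", "opus");   // => "haiku"
```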
## Design Rationale
- Haiku for structured, template-based tasks: README generation, knowledge search, linting — fast and cheap
- Sonnet for the bulk of review and research work — good judgment on well-defined tasks
- Inherit for agents whose quality scales with the calling context — if you invoke them on opus, they get opus too
## Cost at Scale
`/lavra-review` dispatches up to 13 agents in parallel. With the default sonnet tier, a full review run costs roughly the same as 2–3 manual code review messages. The haiku agents (linting, knowledge search) add negligible cost.
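
A rough relative-cost estimate of such a run, assuming each agent consumes a comparable number of tokens; the price ratios and the exact agent mix below are illustrative placeholders, not measured Lavra numbers:

```typescript
// Back-of-the-envelope relative cost of a review run. The ratios and agent mix
// are assumptions for illustration; check current API pricing before relying on them.
const RELATIVE_COST: Record<string, number> = {
  haiku: 1,   // cheapest tier, used as the baseline
  sonnet: 4,  // roughly 4x haiku per token
  opus: 19,   // roughly 19x haiku per token
};

// Assumed mix for a /lavra-review dispatch: 13 agents, mostly sonnet.
const reviewRun = [
  { tier: "haiku", count: 2 },   // lint, knowledge search
  { tier: "sonnet", count: 11 }, // reviewers and researchers
];

const total = reviewRun.reduce(
  (sum, { tier, count }) => sum + count * RELATIVE_COST[tier],
  0
); // 46 haiku-units: the haiku agents account for roughly 4% of the run's cost
```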
## Configuring Model Quality
Set `model_profile` in `.lavra/config/lavra.json` to `"quality"` to route critical agents (`security-sentinel`, `architecture-strategist`, `goal-verifier`, `performance-oracle`) to opus automatically. All other agents stay at their default tier. This affects `/lavra-review`, `/lavra-eng-review`, `/lavra-work`, and `/lavra-ship`.
{ "model_profile": "quality" }
The default `"balanced"` profile keeps all agents at their configured tier.
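
A minimal sketch of that routing rule, assuming a `selectTier` helper and config shape that are illustrative rather than Lavra's actual implementation:

```typescript
// Sketch of the profile routing described above; names and types are assumptions.
interface LavraConfig {
  model_profile?: "balanced" | "quality";
}

const CRITICAL_AGENTS = new Set([
  "security-sentinel",
  "architecture-strategist",
  "goal-verifier",
  "performance-oracle",
]);

function selectTier(agent: string, configuredTier: string, config: LavraConfig): string {
  // "quality" promotes the critical agents to opus; everything else keeps its tier.
  if (config.model_profile === "quality" && CRITICAL_AGENTS.has(agent)) {
    return "opus";
  }
  // The default "balanced" profile leaves all agents at their configured tier.
  return configuredTier;
}

selectTier("security-sentinel", "sonnet", { model_profile: "quality" }); // => "opus"
selectTier("lint", "haiku", { model_profile: "quality" });               // => "haiku"
```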