# Cost Optimization

How Lavra assigns its 30 agents to model tiers for optimal cost/performance balance.

## Tier Breakdown

| Tier | Count | Agents |
|------|-------|--------|
| haiku | 5 | ankane-readme-writer, framework-docs-researcher, learnings-researcher, repo-research-analyst, lint |
| sonnet | 18 | Most reviewers and workflow agents |
| inherit | 7 | agent-native-reviewer, architecture-strategist, data-integrity-guardian, data-migration-expert, julik-frontend-races-reviewer, performance-oracle, spec-flow-analyzer |

`inherit` means the agent runs at whatever model the calling command uses, typically sonnet.
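The resolution rule can be sketched as a small lookup. This is a conceptual illustration only: `AGENT_TIERS` and `effective_model` are hypothetical names, not Lavra's actual internals.

```python
# Conceptual sketch of tier resolution. AGENT_TIERS and effective_model
# are hypothetical names for illustration, not Lavra's actual API.
AGENT_TIERS = {
    "lint": "haiku",                  # fixed cheap tier
    "spec-flow-analyzer": "inherit",  # follows the calling command
}

def effective_model(agent: str, calling_model: str) -> str:
    """Return the model an agent actually runs on."""
    tier = AGENT_TIERS.get(agent, "sonnet")  # sonnet is the common default
    return calling_model if tier == "inherit" else tier
```

So `spec-flow-analyzer` invoked from an opus command runs on opus, while `lint` stays on haiku regardless of the caller.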

## Design Rationale

- Haiku for structured, template-based tasks: README generation, knowledge search, linting. Fast and cheap.
- Sonnet for the bulk of review and research work. Good judgment on well-defined tasks.
- Inherit for agents whose quality scales with the calling context: if you invoke them on opus, they get opus too.

## Cost at Scale

`/lavra-review` dispatches up to 13 agents in parallel. With the default sonnet tier, a full review run costs roughly the same as 2–3 manual code review messages. The haiku agents (linting, knowledge search) add negligible cost.

## Configuring Model Quality

Set `model_profile` in `.lavra/config/lavra.json` to `"quality"` to route critical agents (`security-sentinel`, `architecture-strategist`, `goal-verifier`, `performance-oracle`) to opus automatically. All other agents stay at their default tier. This affects `/lavra-review`, `/lavra-eng-review`, `/lavra-work`, and `/lavra-ship`.

```json
{ "model_profile": "quality" }
```

The default `"balanced"` keeps all agents at their configured tier.
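The profile logic amounts to a conditional override on top of the tier assignment. A minimal sketch, using the agent names listed above; the function and its signature are illustrative, not Lavra's implementation:

```python
# Conceptual sketch of profile routing. The agent names come from the
# section above; routed_model itself is a hypothetical illustration.
QUALITY_OVERRIDES = {
    "security-sentinel",
    "architecture-strategist",
    "goal-verifier",
    "performance-oracle",
}

def routed_model(agent: str, tier_model: str, profile: str = "balanced") -> str:
    """Route critical agents to opus under the "quality" profile."""
    if profile == "quality" and agent in QUALITY_OVERRIDES:
        return "opus"
    return tier_model  # "balanced" leaves the configured tier unchanged
```

Under `"quality"`, only the four override agents move to opus; everything else, including the haiku agents, keeps its cost profile.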