model-hierarchy
Cost-optimize AI agent operations by routing tasks to appropriate models based on complexity. Use this skill when: (1) deciding which model to use for a task, (2) spawning sub-agents, (3) considering cost efficiency, (4) the current model feels like overkill for the task. Triggers: "model routing", "cost optimization", "which model", "too expensive", "spawn agent".
# Model Hierarchy
Route tasks to the cheapest model that can handle them. Most agent work is routine.
## Core Principle
**Roughly 80% of agent tasks are janitorial:** file reads, status checks, formatting, simple Q&A. These don't need expensive models. Reserve premium models for problems that actually require deep reasoning.
## Model Tiers
### Tier 1: Cheap (roughly $0.10-0.50/M input tokens)
| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|-------|-------|--------|----------|
| DeepSeek V3 | $0.14 | $0.28 | General routine work |
| GPT-4o-mini | $0.15 | $0.60 | Quick responses |
| Claude Haiku | $0.25 | $1.25 | Fast tool use |
| Gemini Flash | $0.075 | $0.30 | High volume |
### Tier 2: Mid ($1-5/M input tokens)
| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|-------|-------|--------|----------|
| Claude Sonnet | $3.00 | $15.00 | Balanced performance |
| GPT-4o | $2.50 | $10.00 | Multimodal tasks |
| Gemini Pro | $1.25 | $5.00 | Long context |
### Tier 3: Premium ($10-75/M input tokens)
| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|-------|-------|--------|----------|
| Claude Opus | $15.00 | $75.00 | Complex reasoning |
| GPT-4.5 | $75.00 | $150.00 | Frontier tasks |
| o1 | $15.00 | $60.00 | Multi-step reasoning |
| o3-mini | $1.10 | $4.40 | Reasoning on budget |
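*Prices in USD per million tokens, as of Feb 2026. Check provider docs for current rates. o3-mini is priced like a Tier 2 model; it is listed under Tier 3 as the budget option for reasoning-heavy tasks.*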
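The tier tables can be kept as data so that routing code does not hard-code model names. The following is a minimal sketch, assuming the prices listed above; `ModelPrice`, `TIERS`, and `cheapest_in_tier` are hypothetical names for illustration, not part of any provider SDK.

```python
from typing import NamedTuple


class ModelPrice(NamedTuple):
    name: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Hypothetical registry mirroring the tables above (prices copied as listed).
TIERS = {
    1: [
        ModelPrice("DeepSeek V3", 0.14, 0.28),
        ModelPrice("GPT-4o-mini", 0.15, 0.60),
        ModelPrice("Claude Haiku", 0.25, 1.25),
        ModelPrice("Gemini Flash", 0.075, 0.30),
    ],
    2: [
        ModelPrice("Claude Sonnet", 3.00, 15.00),
        ModelPrice("GPT-4o", 2.50, 10.00),
        ModelPrice("Gemini Pro", 1.25, 5.00),
    ],
    3: [
        ModelPrice("Claude Opus", 15.00, 75.00),
        ModelPrice("GPT-4.5", 75.00, 150.00),
        ModelPrice("o1", 15.00, 60.00),
        ModelPrice("o3-mini", 1.10, 4.40),
    ],
}


def cheapest_in_tier(tier: int, output_share: float = 0.5) -> ModelPrice:
    """Return the model with the lowest blended price in a tier.

    output_share is the assumed fraction of tokens that are output tokens;
    0.5 is an arbitrary default, so measure your own workload.
    """
    def blended(model: ModelPrice) -> float:
        return (1 - output_share) * model.input_per_m + output_share * model.output_per_m

    return min(TIERS[tier], key=blended)
```

Which model counts as cheapest depends on the input/output mix: with an even split Gemini Flash wins Tier 1, while an output-only workload favors DeepSeek V3. Price is only one axis, so treat this as a tie-breaker among models that can already handle the task.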
## Task Classification
Before executing any task, classify it:
### ROUTINE → Use Tier 1
Characteristics:
- Single-step operations
- Clear, unambiguous instructions
- No judgment required
- Deterministic output expected
Examples:
- File read/write operations
- Status checks and health monitoring
- Simple lookups (time, weather, definitions)
- Formatting and restructuring text
- List operations (filter, sort, transform)
- API calls with known parameters
- Heartbeat and cron tasks
- URL fetching and basic parsing
### MODERATE → Use Tier 2
Characteristics:
- Multi-step but well-defined
- Some synthesis required
- Standard patterns apply
- Quality matters but isn't critical
Examples:
- Code generation (standard patterns)
- Summarization and synthesis
- Draft writing (emails, docs, messages)
- Data analysis and transformation
- Multi-file operations
- Tool orchestration
- Code review (non-security)
- Search and research tasks
### COMPLEX → Use Tier 3
Characteristics:
- Novel problem solving required
- Multiple valid approaches
- Nuanced judgment calls
- High stakes or irreversible
- Previous attempts failed
Examples:
- Multi-step debugging
- Architecture and design decisions
- Security-sensitive code review
- Tasks where a cheaper model already failed
- Ambiguous requirements needing interpretation
- Long-context reasoning (>50K tokens)
- Creative work requiring originality
- Adversarial or edge-case handling
## Decision Algorithm
```
function selectModel(task):
    # Rule 1: Escalation override
    if task.previousAttemptFailed:
        return nextTierUp(task.previousModel)

    # Rule 2: Explicit complexity signals
    if task.hasSignal("debug", "architect", "design", "security"):
        return TIER_3
    if task.hasSignal("write", "code", "summarize", "analyze"):
        return TIER_2

    # Rule 3: Default classification
    complexity = classifyTask(task)
    if complexity == ROUTINE:
        return TIER_1
    elif complexity == MODERATE:
        return TIER_2
    else:
        return TIER_3
```
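Rule 1 relies on `nextTierUp`, which the pseudocode leaves undefined. One possible shape for it, together with a retry loop that escalates on failure, is sketched below. The tier names are plain strings here, and `run` is a hypothetical callable standing in for whatever actually invokes a model; it is assumed to raise an exception when the model cannot complete the task.

```python
TIERS = ("TIER_1", "TIER_2", "TIER_3")


def next_tier_up(current: str) -> str:
    """Return the tier above `current`; the top tier maps to itself."""
    index = TIERS.index(current)
    return TIERS[min(index + 1, len(TIERS) - 1)]


def run_with_escalation(task, run, start_tier="TIER_1"):
    """Try `task` on start_tier and move one tier up after each failure.

    `run(task, tier)` is a hypothetical callable: it returns a result on
    success and raises on failure. Returns (tier_used, result).
    """
    tier = start_tier
    while True:
        try:
            return tier, run(task, tier)
        except Exception:
            # Broad catch for brevity; a real system would only escalate
            # on errors that a stronger model could plausibly fix.
            if tier == TIERS[-1]:
                raise  # already at the top tier, nothing left to try
            tier = next_tier_up(tier)
```

Returning the tier that finally succeeded makes it possible to log how often the initial classification was too optimistic.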
## Behavioral Rules
### For Main Session
1. **Default to Tier 2** for interactive work
2. **Suggest downgrade** when doing routine work: "This is routine - I can handle this on a cheaper model or spawn a sub-agent."
3. **Request upgrade** when stuck: "This needs more reasoning power. Switching to [premium model]."
### For Sub-Agents
1. **Default to Tier 1** unless the task is clearly moderate or harder
2. **Batch similar tasks** to amortize overhead
3. **Report failures** back to parent for escalation
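Batching can be as simple as grouping pending tasks by kind before spawning anything, so that one sub-agent handles many similar items. The sketch below assumes each task is a `(kind, payload)` pair; `batch_tasks` and `max_batch_size` are hypothetical names, and the batch size of 10 is an arbitrary default.

```python
from collections import defaultdict


def batch_tasks(tasks, max_batch_size=10):
    """Group (kind, payload) pairs by kind, then split each group into chunks.

    Returns a list of (kind, [payload, ...]) batches, each holding at most
    max_batch_size payloads, so one sub-agent can be spawned per batch.
    """
    by_kind = defaultdict(list)
    for kind, payload in tasks:
        by_kind[kind].append(payload)

    batches = []
    for kind, payloads in by_kind.items():
        for start in range(0, len(payloads), max_batch_size):
            batches.append((kind, payloads[start:start + max_batch_size]))
    return batches
```

For example, fifty URL fetches become five batches of ten rather than fifty separate sub-agent sessions, each of which would pay its own prompt overhead.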
### For Automated Tasks
1. **Heartbeats/monitoring** → Always Tier 1
2. **Scheduled reports** → Tier 1 or 2 based on complexity
3. **Alert responses** → Start Tier 2, escalate if needed
## Communication Patterns
When suggesting model changes, use clear language:
**Downgrade suggestion:**
> "This looks like routine file work. Want me to spawn a sub-agent on DeepSeek for this? Same result, fraction of the cost."
**Upgrade request:**
> "I'm hitting the limits of what I can figure out here. This needs Opus-level reasoning. Switching up."
**Explaining hierarchy:**
> "I'm running the heavy analysis on Sonnet while sub-agents fetch the data on DeepSeek. Keeps costs down without sacrificing quality where it matters."
## Cost Impact
Assuming 100K tokens/day average usage (about 3M tokens over a 30-day month), with every token priced at the model's output rate as an upper bound (Opus $75, Sonnet $15, DeepSeek $0.28 per M):
| Strategy | Monthly Cost | Notes |
|----------|--------------|-------|
| Pure Opus | ~$225 | Maximum capability, maximum spend |
| Pure Sonnet | ~$45 | Good default for most work |
| Pure DeepSeek | ~$1 | Cheap but limited on hard problems |
| **Hierarchy (80/15/5)** | **~$19** | Best of all worlds |
The 80/15/5 split:
- 80% routine tasks on Tier 1 (~$1)
- 15% moderate tasks on Tier 2 (~$7)
- 5% complex tasks on Tier 3 (~$11)
**Result: roughly 12x cost reduction vs pure premium, with equivalent quality on complex tasks.**
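The arithmetic behind these estimates can be reproduced with a few lines. This sketch assumes a 30-day month and prices every token at the output rate from the tier tables, which overstates real spend because input tokens are cheaper; `monthly_cost` and the short model keys are hypothetical names.

```python
# Output prices in USD per million tokens, taken from the tier tables.
OUTPUT_PRICE_PER_M = {"opus": 75.00, "sonnet": 15.00, "deepseek": 0.28}


def monthly_cost(tokens_per_day, mix, days=30):
    """Estimate monthly spend in USD.

    mix maps a model key to the fraction of tokens routed to it;
    the fractions must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    millions_of_tokens = tokens_per_day * days / 1_000_000
    return sum(
        millions_of_tokens * share * OUTPUT_PRICE_PER_M[model]
        for model, share in mix.items()
    )


pure_opus = monthly_cost(100_000, {"opus": 1.0})
hierarchy = monthly_cost(100_000, {"deepseek": 0.80, "sonnet": 0.15, "opus": 0.05})
print(f"Pure Opus: ${pure_opus:.2f}, hierarchy: ${hierarchy:.2f}")
```

Under these assumptions pure Opus comes to $225 and the 80/15/5 mix to just under $19. Substituting your own token volume, mix, and input/output ratio will give a more realistic figure.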
## Integration Examples
### OpenClaw
```yaml
# config.yml - set default model
model: anthropic/claude-sonnet-4
# In session, switch models
/model opus # upgrade for complex task
/model deepseek # downgrade for routine
# Spawn sub-agent on cheap model
sessions_spawn:
  task: "Fetch and parse these 50 URLs"
  model: deepseek
```
### Claude Code
```
# In CLAUDE.md or project instructions
When spawning background agents, use claude-3-haiku for:
- File operations
- Simple searches
- Status checks
Reserve claude-sonnet-4 for:
- Code generation
- Analysis tasks
```
### General Agent Systems
```python
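import re

ROUTINE_SIGNALS = ('read', 'fetch', 'check', 'list', 'format', 'status')
COMPLEX_SIGNALS = ('debug', 'architect', 'design', 'security', 'why')

def get_model_for_task(task_description: str) -> str:
    """Pick a model ID from keyword signals in the task description."""
    # Match signals against word prefixes, so 'architect' catches
    # 'architecture' but 'read' does not fire on 'thread' or 'already'.
    words = re.findall(r"[a-z]+", task_description.lower())

    def has_signal(signals):
        return any(word.startswith(signals) for word in words)

    # Complex signals take priority over routine ones.
    if has_signal(COMPLEX_SIGNALS):
        return "claude-opus-4"
    if has_signal(ROUTINE_SIGNALS):
        return "deepseek-v3"
    return "claude-sonnet-4"  # no signal: default to mid-tier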
## Anti-Patterns
**DON'T:**
- Run heartbeats on Opus
- Use premium models for file I/O
- Keep an expensive model when the task is clearly routine
- Spawn sub-agents on premium models by default
**DO:**
- Start mid-tier, adjust based on task
- Spawn helpers on cheapest viable model
- Escalate explicitly when stuck
- Track cost per task type to optimize further
## Extending This Skill
To customize for your use case:
1. **Adjust tier definitions** based on your provider/budget
2. **Add domain-specific signals** to classification rules
3. **Track actual complexity** vs predicted to improve heuristics
4. **Set budget alerts** to catch runaway premium usage
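Steps 3 and 4 can share one small bookkeeping object. The sketch below is one possible shape, not a prescribed design: `CostTracker` and its fields are hypothetical, tiers are plain integers 1-3, and the caller is assumed to know the cost of each call and which tier finally succeeded.

```python
from collections import defaultdict


class CostTracker:
    """Accumulate spend per task type and flag runaway premium usage."""

    def __init__(self, premium_budget_usd: float):
        self.premium_budget_usd = premium_budget_usd
        self.spend_by_task_type = defaultdict(float)
        self.premium_spend = 0.0
        self.total_tasks = 0
        self.misclassified = 0  # predicted tier differed from the tier that succeeded

    def record(self, task_type, cost_usd, predicted_tier, actual_tier):
        """Log one finished task; tiers are integers 1-3."""
        self.spend_by_task_type[task_type] += cost_usd
        if actual_tier == 3:
            self.premium_spend += cost_usd
        self.total_tasks += 1
        if predicted_tier != actual_tier:
            self.misclassified += 1

    def over_budget(self) -> bool:
        """True once Tier 3 spend exceeds the configured budget."""
        return self.premium_spend > self.premium_budget_usd

    def misclassification_rate(self) -> float:
        """Fraction of tasks whose predicted tier was wrong."""
        return self.misclassified / self.total_tasks if self.total_tasks else 0.0
```

A rising misclassification rate for one task type suggests its signals need adjusting, while `over_budget()` can gate further Tier 3 calls or trigger a notification.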