Analysis
Data analysis, summarization
8 models · Weekly updates
Example Tasks in This Category
Easy
Sentiment Classification
Classify sentiment of customer reviews.
Hard
Compare Two Documents
Compare two product descriptions and highlight differences.
Medium
Data Summary
Analyze data and provide insights.
Model Rankings

| Rank | Model | Score | Price/1M | Tasks |
|---|---|---|---|---|
| 🥇 | Qwen3 235B | 93.0 | $0.60 | 1 |
| 🥈 | GPT-4o Mini | 93.0 | $0.60 | 1 |
| 🥉 | DeepSeek R1 | 93.0 | $2.19 | 1 |
| 4 | Qwen3 Max | 93.0 | $1.60 | 1 |
| 5 | GPT-4o | 90.0 | $10.00 | 1 |
| 6 | Claude 3.5 Haiku | 87.0 | $4.00 | 1 |
| 7 | Llama 3.3 70B | 87.0 | $0.40 | 1 |
| 8 | Gemini 2.0 Flash | 83.0 | $0.40 | 1 |