📊
Analysis
Data analysis, summarization
8 models, updated weekly
Task examples
Example tasks in this category
Easy
Sentiment Classification
Classify sentiment of customer reviews.
Hard
Compare Two Documents
Compare two product descriptions and highlight differences.
Medium
Data Summary
Analyze data and provide insights.
Model rankings
| Rank | Model | Score | Price per 1M tokens | Tasks |
|---|---|---|---|---|
| 🥇 | Qwen3 235B | 93.0 | $0.60 | 1 |
| 🥈 | GPT-4o Mini | 93.0 | $0.60 | 1 |
| 🥉 | DeepSeek R1 | 93.0 | $2.19 | 1 |
| 4 | Qwen3 Max | 93.0 | $1.60 | 1 |
| 5 | GPT-4o | 90.0 | $10.00 | 1 |
| 6 | Claude 3.5 Haiku | 87.0 | $4.00 | 1 |
| 7 | Llama 3.3 70B | 87.0 | $0.40 | 1 |
| 8 | Gemini 2.0 Flash | 83.0 | $0.40 | 1 |