🧠
Reasoning
Logic, math, planning
9 models, updated weekly
Task examples
Example tasks in this category
Easy
Number Sequence
Find the pattern and next number in a sequence.
Hard
Causal Reasoning
Identify cause and effect relationships.
Hard
Constraint Satisfaction
Find a solution that satisfies all constraints.
Model rankings
| Rank | Model | Score | Price/1M | Tasks |
|---|---|---|---|---|
| 🥇 | Qwen3 235B | 98.3 | $0.60 | 6 |
| 🥈 | GPT-4o | 97.8 | $10.00 | 6 |
| 🥉 | Claude 3.5 Sonnet | 97.8 | $15.00 | 6 |
| 4 | Qwen3 Max | 97.7 | $1.60 | 6 |
| 5 | GPT-4o Mini | 95.5 | $0.60 | 6 |
| 6 | DeepSeek R1 | 92.8 | $2.19 | 6 |
| 7 | Gemini 2.0 Flash | 88.5 | $0.40 | 6 |
| 8 | Llama 3.3 70B | 83.5 | $0.40 | 6 |
| 9 | Claude 3.5 Haiku | 76.5 | $4.00 | 6 |
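
The table lists score and price side by side, so one way to read it is by cost-effectiveness. The Python sketch below divides each score by its listed price. The names `score_per_dollar` and `rank_by_value` are illustrative, and the metric itself is an assumption rather than something this benchmark defines. The sketch also assumes the Price/1M column is in US dollars, as the `$` sign suggests.

```python
# Rows copied from the ranking table: (model, score, listed price per 1M).
RANKINGS = [
    ("Qwen3 235B", 98.3, 0.60),
    ("GPT-4o", 97.8, 10.00),
    ("Claude 3.5 Sonnet", 97.8, 15.00),
    ("Qwen3 Max", 97.7, 1.60),
    ("GPT-4o Mini", 95.5, 0.60),
    ("DeepSeek R1", 92.8, 2.19),
    ("Gemini 2.0 Flash", 88.5, 0.40),
    ("Llama 3.3 70B", 83.5, 0.40),
    ("Claude 3.5 Haiku", 76.5, 4.00),
]


def score_per_dollar(score, price):
    """Hypothetical value metric: benchmark points per dollar of listed price."""
    return score / price


def rank_by_value(rows):
    """Return the rows sorted by score per dollar, highest first."""
    return sorted(rows, key=lambda r: score_per_dollar(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for model, score, price in rank_by_value(RANKINGS):
        value = score_per_dollar(score, price)
        print(f"{model:<18} {value:7.1f} points per dollar")
```

Under this metric the cheapest models rise to the top (Gemini 2.0 Flash comes out at about 221 points per dollar), so it is best read alongside the raw score rather than in place of it.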