💻
Programming
Algorithms, debugging, code review
9 models, updated weekly
Example tasks
Sample tasks from this category
Easy
URL Parser
Parse a URL string and extract its components: protocol, domain, path, and query parameters.
Easy
FizzBuzz
Classic programming exercise: print numbers 1-15, replacing multiples of 3 with 'Fizz', multiples of 5 with 'Buzz', and multiples of both with 'FizzBuzz'.
Hard
Debug Stack Trace
Analyze a stack trace and identify the root cause of an error.
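To make the two easy tasks above concrete, here is a minimal Python sketch of what a passing answer might look like. It is not the benchmark's reference solution or grading code. The function names `parse_url` and `fizzbuzz` and the shape of the returned dictionary are assumptions made for illustration only.

```python
from urllib.parse import parse_qs, urlsplit


def parse_url(url: str) -> dict:
    """Split a URL into protocol, domain, path, and query parameters.

    Hypothetical helper: the output keys are an assumption, not the
    format the benchmark expects.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname or "",
        "path": parts.path,
        # parse_qs maps each key to a list, because a key may repeat.
        "query": parse_qs(parts.query),
    }


def fizzbuzz(limit: int = 15) -> list[str]:
    """Return the FizzBuzz sequence for 1..limit as strings."""
    result = []
    for n in range(1, limit + 1):
        if n % 15 == 0:  # multiple of both 3 and 5
            result.append("FizzBuzz")
        elif n % 3 == 0:
            result.append("Fizz")
        elif n % 5 == 0:
            result.append("Buzz")
        else:
            result.append(str(n))
    return result


if __name__ == "__main__":
    print(parse_url("https://example.com/a/b?x=1&y=2"))
    print("\n".join(fizzbuzz()))
```

The URL sketch leans on the standard library's `urllib.parse` rather than hand-written string splitting, which avoids most edge cases around ports, credentials, and repeated query keys.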
Model rankings
| Rank | Model | Score | Price/1M | Tasks |
|---|---|---|---|---|
| 🥇 | Qwen3 235B | 94.7 | $0.60 | 12 |
| 🥈 | Qwen3 Max | 94.0 | $1.60 | 12 |
| 🥉 | DeepSeek R1 | 93.8 | $2.19 | 12 |
| 4 | Claude 3.5 Haiku | 93.5 | $4.00 | 12 |
| 5 | GPT-4o | 93.2 | $10.00 | 12 |
| 6 | Gemini 2.0 Flash | 92.7 | $0.40 | 12 |
| 7 | Claude 3.5 Sonnet | 92.5 | $15.00 | 12 |
| 8 | GPT-4o Mini | 92.3 | $0.60 | 12 |
| 9 | Llama 3.3 70B | 92.0 | $0.40 | 12 |