
📊

Analysis

Data analysis, summarization

8 models, updated weekly

Task examples

Example tasks in this category

Easy

Sentiment Classification

Classify sentiment of customer reviews.

Hard

Compare Two Documents

Compare two product descriptions and highlight differences.

Medium

Data Summary

Analyze data and provide insights.

Model rankings


	Rank	Model	Score	Price/1M	Tasks
	🥇	Qwen3 235B	93.0	$0.60	1
	🥈	GPT-4o Mini	93.0	$0.60	1
	🥉	DeepSeek R1	93.0	$2.19	1
	4	Qwen3 Max	93.0	$1.60	1
	5	GPT-4o	90.0	$10.00	1
	6	Claude 3.5 Haiku	87.0	$4.00	1
	7	Llama 3.3 70B	87.0	$0.40	1
	8	Gemini 2.0 Flash	83.0	$0.40	1
