Analysis

Data analysis, summarization

8 models, updated weekly

Task Examples

Example tasks in this category

Easy

Sentiment Classification

Classify the sentiment of customer reviews.

Hard

Compare Two Documents

Compare two product descriptions and highlight differences.

Medium

Data Summary

Analyze data and provide insights.

Rank  Model             Score  Price/1M  Tasks
🥇    Qwen3 235B        93.0   $0.60     1
🥈    GPT-4o Mini       93.0   $0.60     1
🥉    DeepSeek R1       93.0   $2.19     1
4     Qwen3 Max         93.0   $1.60     1
5     GPT-4o            90.0   $10.00    1
6     Claude 3.5 Haiku  87.0   $4.00     1
7     Llama 3.3 70B     87.0   $0.40     1
8     Gemini 2.0 Flash  83.0   $0.40     1