Analysis
Analysis, summarization
8 models, updated weekly
Task examples
Example tasks in this category
Easy
Sentiment Classification
Classify sentiment of customer reviews.
Hard
Compare Two Documents
Compare two product descriptions and highlight differences.
Medium
Data Summary
Analyze data and provide insights.
Model rankings

| Rank | Model | Score | Price/1M tokens | Tasks |
|---|---|---|---|---|
| 🥇 | Qwen3 235B | 93.0 | $0.60 | 1 |
| 🥈 | GPT-4o Mini | 93.0 | $0.60 | 1 |
| 🥉 | DeepSeek R1 | 93.0 | $2.19 | 1 |
| 4 | Qwen3 Max | 93.0 | $1.60 | 1 |
| 5 | GPT-4o | 90.0 | $10.00 | 1 |
| 6 | Claude 3.5 Haiku | 87.0 | $4.00 | 1 |
| 7 | Llama 3.3 70B | 87.0 | $0.40 | 1 |
| 8 | Gemini 2.0 Flash | 83.0 | $0.40 | 1 |