I found the most powerful large models in various fields

(nanoai.run)

1 point | by Li_Evan 11 hours ago

1 comment

  • Li_Evan 11 hours ago

    I built my own original evaluation dataset for large models. It covers 18 major dimensions and nearly 100 sub-dimensions, with 970 questions in total. Here is the top-scoring model in each dimension:

    1. Software Engineering and Code Generation: GPT-5.3 codex
    2. Code Comprehension, Reasoning, and Quality: GPT-5.3 codex
    3. Debugging, Testing, and Maintenance: GPT-5.3 codex
    4. Data Engineering and Backend Services: Claude Opus 4.6
    5. Frontend and Product Engineering: Claude Opus 4.6
    6. Agent Tool Invocation: Claude Opus 4.6
    7. Web and Desktop Automation (Static): Claude Opus 4.6
    8. Research and Knowledge Work Agent (Static): GPT-5.2 Pro
    9. Mathematical and Formal Reasoning: Gemini 3.1 Pro
    10. Logic and Planning: Gemini 3.1 Pro
    11. Knowledge Breadth and Fact Verification: Gemini DeepThink
    12. Reading Comprehension and Information Extraction: GPT-5.2 Thinking
    13. Long-Context Memory and Multi-turn Consistency: GPT-5.2 Thinking
    14. Instruction Following and Alignment: Claude Opus 4.6
    15. Multimodal Understanding and Visual Reasoning: GPT-5.2 Thinking
    16. Emotional Intelligence and Collaborative Communication: GPT-4.5
    17. Creative Expression and Aesthetics: Claude Opus 4.6