Submitted: 26 August 2025
Posted: 26 August 2025
Abstract
Keywords:
1. Introduction
2. Background and Architectural Overview
2.1. Major Model Families
2.2. The Qwen Family
2.3. The Claude Family
2.4. The DeepSeek Family
3. Architectural and Performance Comparison
3.1. Model Architectures
3.2. Performance Benchmarks
3.3. Computational Efficiency
3.4. Cost-Performance Tradeoff
3.5. Context Window Scaling
3.6. Specialized Capabilities Radar Chart
4. Summary of Architectural and Performance Visualizations
4.1. Model Architectures
4.2. Performance Benchmarks (Figure 1)
4.3. Computational Efficiency (Table 1)
4.4. Cost-Performance Tradeoff (Figure 2)
4.5. Context Window Scaling (Figure 3)
4.6. Specialized Capabilities Radar Chart (Figure ??)
5. Temporal Analysis and Projections
5.1. Model Release Timeline
5.2. Performance Evolution
5.3. Feature Introduction Timeline
5.4. Market Share Projection
5.5. Performance-Cost Trajectory
5.6. Model Lifespan Analysis
6. Comparative Tables
6.1. Performance and Cost Analysis
| Model | SWE-bench (%) | HumanEval (%) | GSM8K (%) |
|---|---|---|---|
| Qwen3-Coder | 69.6 | 82.3 | 68.4 |
| Claude 3.7 Sonnet | 63.2 | 78.9 | 72.1 |
| DeepSeek R1 | 58.7 | 75.4 | 75.6 |
| Gemini 2.5 Pro | 61.8 | 80.1 | 70.3 |
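
The benchmark table above can be summarized programmatically. The following minimal Python sketch finds the top-scoring model on each benchmark and computes an unweighted mean across the three. The unweighted mean is an illustrative assumption rather than a metric defined in this survey, and the names `SCORES`, `leader`, and `mean_score` are hypothetical.

```python
# Benchmark scores (%) copied from the table above.
SCORES = {
    "Qwen3-Coder": {"SWE-bench": 69.6, "HumanEval": 82.3, "GSM8K": 68.4},
    "Claude 3.7 Sonnet": {"SWE-bench": 63.2, "HumanEval": 78.9, "GSM8K": 72.1},
    "DeepSeek R1": {"SWE-bench": 58.7, "HumanEval": 75.4, "GSM8K": 75.6},
    "Gemini 2.5 Pro": {"SWE-bench": 61.8, "HumanEval": 80.1, "GSM8K": 70.3},
}


def leader(benchmark: str) -> str:
    """Return the model with the highest score on one benchmark."""
    return max(SCORES, key=lambda model: SCORES[model][benchmark])


def mean_score(model: str) -> float:
    """Unweighted mean over the three benchmarks (an illustrative choice)."""
    values = SCORES[model].values()
    return sum(values) / len(values)


if __name__ == "__main__":
    for benchmark in ("SWE-bench", "HumanEval", "GSM8K"):
        print(f"{benchmark}: {leader(benchmark)}")
    for model in SCORES:
        print(f"{model}: mean {mean_score(model):.1f}")
```

On these figures, Qwen3-Coder leads SWE-bench and HumanEval, while DeepSeek R1 leads GSM8K.
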

| Model | API Cost (per 1M tokens) | Open-Source |
|---|---|---|
| Qwen3-Coder | $5 | No |
| Claude 3.7 Sonnet | $80 | No |
| DeepSeek R1 | $0 | Yes |
| GPT o3-mini | $60 | No |
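
One way to read the cost figures against the benchmark scores in this subsection is as SWE-bench points per dollar. The sketch below covers only the three models that appear in both tables. The ratio itself is an assumed, simplified measure (it ignores input/output token pricing differences and self-hosting costs for open-weight models), and the function name is hypothetical.

```python
import math

# SWE-bench scores (%) and API cost (USD per 1M tokens) from the tables above,
# restricted to the models listed in both.
SWE_BENCH = {"Qwen3-Coder": 69.6, "Claude 3.7 Sonnet": 63.2, "DeepSeek R1": 58.7}
API_COST_USD_PER_1M = {"Qwen3-Coder": 5.0, "Claude 3.7 Sonnet": 80.0, "DeepSeek R1": 0.0}


def points_per_dollar(model: str) -> float:
    """SWE-bench points per USD spent on 1M tokens.

    A listed cost of $0 is treated as an unbounded ratio (math.inf);
    this is a modelling assumption, since running open weights still
    incurs hardware cost.
    """
    cost = API_COST_USD_PER_1M[model]
    if cost == 0:
        return math.inf
    return SWE_BENCH[model] / cost


if __name__ == "__main__":
    for model in SWE_BENCH:
        print(f"{model}: {points_per_dollar(model):.2f}")
```
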
6.2. Release Timeline and Market Trends
| Year | Qwen | Claude | DeepSeek |
|---|---|---|---|
| 2023 | Qwen 1.0 | Claude 3 | V2 |
| 2024 | Qwen 2.0 | Claude 3.5 | R1 |
| 2025 | Qwen 2.5/3 | Claude 4 | V3 |
| 2026* | Qwen 4* | Claude 5* | R2* |

| Provider | 2025 (%) | 2026 (%) | 2027 (%) |
|---|---|---|---|
| OpenAI | 45 | 38 | 30 |
| Anthropic | 30 | 35 | 40 |
| Alibaba (Qwen) | 12 | 15 | 18 |
| DeepSeek | 8 | 10 | 12 |
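
The projection table above can be checked with simple arithmetic. The sketch below assumes the values are percentages of total market share (the table does not state a unit), derives the share left for providers not listed, and reports each provider's change from 2025 to 2027. All names are hypothetical.

```python
# Projected shares copied from the table above; assumed to be percentages.
SHARE = {
    "OpenAI": {2025: 45, 2026: 38, 2027: 30},
    "Anthropic": {2025: 30, 2026: 35, 2027: 40},
    "Alibaba (Qwen)": {2025: 12, 2026: 15, 2027: 18},
    "DeepSeek": {2025: 8, 2026: 10, 2027: 12},
}


def unlisted_share(year: int) -> int:
    """Share not attributed to any listed provider, under the percentage assumption."""
    return 100 - sum(by_year[year] for by_year in SHARE.values())


def change(provider: str, start: int = 2025, end: int = 2027) -> int:
    """Change in share, in percentage points, between two projected years."""
    return SHARE[provider][end] - SHARE[provider][start]


if __name__ == "__main__":
    for year in (2025, 2026, 2027):
        print(f"{year}: unlisted providers {unlisted_share(year)}")
    for provider in SHARE:
        print(f"{provider}: {change(provider):+d} points")
```

Under that assumption, the four listed providers account for the entire market by 2027, leaving no share for any other provider.
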
6.3. Architectural and Specialization Comparison
| Model | Key Innovation |
|---|---|
| Qwen3-Coder | 480B MoE (35B active), 256K→1M token context |
| Claude 3.7 | Dense transformer, hybrid reasoning mode |
| DeepSeek R1 | Open-weight, GRPO training optimization |
| Gemini 2.5 | Multimodal fusion, 1M token context |
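
The "480B MoE (35B active)" entry above implies that only a small fraction of the parameters is used for any one token. A minimal sketch of that arithmetic, using only the two figures from the table (the variable names are hypothetical):

```python
# Parameter counts for Qwen3-Coder, in billions, from the table above.
TOTAL_PARAMS_B = 480
ACTIVE_PARAMS_B = 35

# Fraction of parameters activated per token in the mixture-of-experts model.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

if __name__ == "__main__":
    print(f"Active fraction: {active_fraction:.1%}")
```

This works out to about 7.3% of the total parameter count.
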

| Model | Code Gen | Debug | Math | Agentic |
|---|---|---|---|---|
| Qwen3-Coder | 4.5 | 4.0 | 3.5 | 4.7 |
| Claude 3.7 | 4.2 | 4.3 | 4.8 | 3.8 |
| DeepSeek R1 | 4.7 | 3.8 | 4.5 | 3.5 |
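
The specialization ratings above lend themselves to two quick queries: which model rates highest in each category, and which category is each model's strongest. The sketch below is illustrative only. The rating scale is not stated in the table, and the function names are hypothetical.

```python
# Specialization ratings copied from the table above (scale not stated there).
RATINGS = {
    "Qwen3-Coder": {"Code Gen": 4.5, "Debug": 4.0, "Math": 3.5, "Agentic": 4.7},
    "Claude 3.7": {"Code Gen": 4.2, "Debug": 4.3, "Math": 4.8, "Agentic": 3.8},
    "DeepSeek R1": {"Code Gen": 4.7, "Debug": 3.8, "Math": 4.5, "Agentic": 3.5},
}


def best_in(category: str) -> str:
    """Model with the highest rating in the given category."""
    return max(RATINGS, key=lambda model: RATINGS[model][category])


def strongest_category(model: str) -> str:
    """Category in which the given model has its highest rating."""
    ratings = RATINGS[model]
    return max(ratings, key=ratings.get)


if __name__ == "__main__":
    for category in ("Code Gen", "Debug", "Math", "Agentic"):
        print(f"{category}: {best_in(category)}")
    for model in RATINGS:
        print(f"{model}: strongest in {strongest_category(model)}")
```
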
7. Summary of Comparative Tables
7.1. Performance and Cost Analysis
7.2. Release Timeline and Market Trends
7.3. Architectural and Specialization Comparison
8. Methodology for Comparative Analysis
8.1. Benchmark Evaluation
8.2. Qualitative Assessment
- Backend logic and web scraping.
- Frontend development, including animated UI and SVG art generation.
- Mathematical reasoning and logical problem-solving.
9. Model Overviews
9.1. DeepSeek Series
9.2. Qwen Series
9.3. Claude Series
9.4. Other Notable Models
10. Benchmark Surveys
10.1. Comparison Tables
10.2. Analysis of Coding and Agentic Capabilities
10.3. Translation and Reasoning
10.4. Equations and Mathematical Analysis
11. Summary of Temporal and Performance Visualizations
11.1. Model Release Timeline (Figure 5)
11.2. Performance Evolution (Figure 6)
11.3. Feature Introduction Timeline (Figure 7)
11.4. Market Share Projection (Figure 8)
11.5. Performance-Cost Trajectory (Figure 9)
11.6. Model Lifespan Analysis (Figure 10)
12. Coding Performance Comparison
12.1. Benchmark Results
12.2. Real-World Coding Tasks
- In [42], ChatGPT o3-mini, DeepSeek R1, and Qwen 2.5 were tested on 9 coding prompts; Qwen 2.5 performed best overall.
- In [50], Claude 3.7 Sonnet and Qwen 2.5 Coder were compared across various code generation tasks.
- In [51], Qwen Code CLI was reported to be a viable alternative to Claude Code in daily development workflows.
12.3. Specialized Coding Capabilities
13. Reasoning and General Performance
13.1. Mathematical Reasoning
13.2. General Knowledge Tasks
14. Cost and Efficiency Analysis
14.1. Computational Efficiency
14.2. API and Usage Costs
15. Findings and Discussion
15.1. Performance on Coding Benchmarks
15.2. Creative and Problem-Solving Abilities
15.3. Cost and Accessibility
16. Emerging Trends
16.1. Model Specialization
16.2. Open vs. Proprietary Models
16.3. Architectural Innovations
17. Conclusion
References
- [1] Top AI Models 2025: Essential Guide for Developers.
- [2] DeepSeek R1 shook the AI world. Now Qwen 2.5 Max is here Post LinkedIn.
- [3] The Coding-Agent Crown Just Tipped: Qwen3-Coder Steps Up - GlobalGPT | Review.
- [4] Head-to-Head: Comparing the Latest Versions of Qwen 2.5 Coder 32B and Claude Sonnet 3.5.
- [5] Large Language Models Explained: Understanding the Technology Behind Modern AI | AIML API.
- [6] Digest, T.R.A. AI World War 1 Just Began as Alibaba claims its new model outperforms DeepSeek, OpenAI, Meta!, 2025.
- [7] Lanz, D..J.A. Chinese Open-Source AI DeepSeek R1 Matches OpenAI’s o1 at 98% Lower Cost, 2025. Section: News.
- [8] Gemini 2.5 Pro vs Claude Sonnet 4: A Comprehensive Comparison - CometAPI - All AI Models in One API, 2025. Section: Technology.
- [9] Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison - Composio.
- [10] Gordon-Levitt, J. OPENAI O3-Mini vs Claude 3.5 SONNET-AI.
- [11] Qwen 2.5 Max better than DeepSeek, beats ChatGPT in coding, costs 10x less than Claude 3.5, 2025.
- [12] [AINews] DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level.
- [13] Best LLMs for Coding (May 2025 Report), 2025.
- [14] Which LLM is Best? 2025 Comparison Guide | Claude vs ChatGPT vs Gemini etc., 2025. Section: AI Tools.
- [15] DeepSeek R1 vs Qwen 3: Coding Task Showdown.
- [16] Claude 4 vs Deepseek R1 vs Qwen 3.
- [17] Samarpit. Top AI Models Compared: Grok-3, DeepSeek R1, OpenAI o3-mini, Claude 3.7, Qwen 2.5 & Gemini 2.0, 2025.
- [18] Team, Q. Qwen3-Coder: Agentic Coding in the World, 2025. Section: blog.
- [19] Njenga, J. Alibaba Launches Claude Code Alternative Qwen Code (I Just Tested It), 2025.
- [20] Could Qwen Be the Best Alternative to Claude Code for Developers?, 2025.
- [21] Best AI Models for Coding: GPT, Claude, LLaMA, Mistral & More – AlgoCademy Blog.
- [22] 2025 Complete Guide: How to Choose the Best Qwen3-Coder AI Coding Tool, 2025.
- [23] DeepSeek-R1 Uncensored, QwQ-32B Puts Reasoning in Smaller Model, and more..., 2025.
- [24] The Complete Guide to DeepSeek Models: From V3 to R1 and Beyond.
- [25] Claude 4 Advances Code Gen, How DeepSeek Built V3 For $5.6m, Google I/O Roundup, and more..., 2025.
- [26] Gemini 2.5 Pro vs Claude 3.7 Sonnet vs DeepSeek R1: Which Model Is the Best for Coding?, 2025.
- [27] How Good is the Qwen 2.5 Coder?
- [28] Can a small AI model topple giants? Alibaba’s QwQ-32B aims to, 2025.
- [29] Claude 3.7 Sonnet: How it Works, Use Cases & More.
- [30] Claude 3.7 Sonnet vs. Grok 3 vs. o3-mini-high - Composio.
- [31] Claude Sonnet 3.7 vs. OpenAI o3-mini-high vs. DeepSeek R1 | by Cogni Down Under | Medium.
- [32] Trivedi, A. Can OpenAI’s o3-mini Beat Claude Sonnet 3.5 in Coding?, 2025.
- [33] Gemma 3 27b vs. QwQ 32b vs. Mistral 24b vs. Deepseek r1 - Composio.
- [34] Team, T.E. DeepSeek-R1 Vs. OpenAI o3-mini: Which AI Model Is Winning?
- [35] Lamers, R. Claude 4, Qwen 3 & DeepSeek R1 0528: model capabilities keep increasing, 2025.
- [36] blogs, V.a. Top Gen AI Models Comparison - ChatGPT, DeepSeek, Claude, Perplexity, Gemini, Grok & Qwen, 2025. Section: AI and ML.
- [37] Volkov, A. ThursdAI - May 29 - DeepSeek R1 Resurfaces, VEO3 viral moments, Opus 4 a week after, Flux Kontext image editing & more AI news, 2025.
- [38] Best LLMs for Coding in 2025. Model overview (o3-mini, Claude 4, Llama 4 and More).
- [39] Best LLMs for Coding | LLM Leaderboards.
- [40] DeepSeek R1 vs GPT o1 vs Claude 3.5 Sonnet – Which is best for coding?, 2025.
- [41] Jain, A. Top AI Reasoning Model Cost Comparison 2025, 2025.
- [42] published, A.C. I tested ChatGPT o3-mini vs DeepSeek R1 vs Qwen 2.5 with 9 prompts — here’s the winner, 2025.
- [43] Hoornaert, M. I Tried 37 AI Models, These Are The Ones I’ll Actually Keep Using., 2025.
- [44] Build a Coding Copilot with Qwen3-Coder & Code Context - Milvus Blog.
- [45] Large Language Models Explained: Understanding the Technology Behind Modern AI | AIML API.
- [46] DeepSeek AI | – Deepseek R1, V3, Use Cases | GlobalGPT.
- [47] Qwen 3 vs. Deepseek R1: Complete comparison, 2025.
- [48] DeepSeek vs ChatGPT vs Perplexity vs Qwen vs Claude vs DeepMind: More AI Agents and New AI Tools | HackerNoon.
- [49] Qwen 3 Coder Beats Claude 4 On Paper. Did the Benchmarks Lie? | by Mil Hoornaert | Jul, 2025 | Generative AI.
- [50] Comparing AI Models for Code Generation: Claude 3.7 Sonnet vs Qwen 2.5 Coder – Revolutionizing Intelligence: Cutting-Edge AI, Deep Learning & Data Science.
- [51] Vig, P. Qwen Code CLI + Qwen3-Coder Let’s Set Up Qwen Code, Better than Claude Code?, 2025.
- [52] Ashley. Did Qwen Just Release the Best Alternative to Claude Code?, 2025.
- [53] Claude AI 3.7 vs. Qwen: Which AI Model Excels in Translation?
- [54] Dalie (Ilyass), G. Why DeepSeek-R1 Is so Much Better Than o3-Mini & Qwen 2.5 MAX — Here The Results, 2025.
- [55] DeepSeek + Claude MCP Server by niko91i.

| Model | Reasoning Score | Coding Score | 2025 Cost | Reference |
|---|---|---|---|---|
| Claude 4 Sonnet | 95.3 | 94.1 | $0.08/1K tokens | [17,25] |
| Qwen3-Coder | 94.8 | 96.2 | $0.02/1K tokens | [11,18] |
| DeepSeek R1 | 94.6 | 93.8 | Free/Open-source | [7] |
| Gemini Pro 2.5 | 93.2 | 95.0 | $0.07/1K tokens | [8,41] |
| OpenAI o3-mini | 90.5 | 91.6 | $0.06/1K tokens | [10,32] |
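
The table above quotes cost per 1K tokens, whereas the API cost table in Section 6.1 quotes cost per 1M tokens. The minimal sketch below converts the per-1K figures to a per-1M basis so the two can be read side by side. Treating "Free/Open-source" as a cost of 0 is an assumption, and the function name is hypothetical.

```python
# 2025 cost in USD per 1K tokens, copied from the table above.
# "Free/Open-source" is represented as 0.0 (an assumption for illustration).
COST_USD_PER_1K = {
    "Claude 4 Sonnet": 0.08,
    "Qwen3-Coder": 0.02,
    "DeepSeek R1": 0.0,
    "Gemini Pro 2.5": 0.07,
    "OpenAI o3-mini": 0.06,
}


def usd_per_million(cost_per_1k: float) -> float:
    """Convert a per-1K-token price to a per-1M-token price."""
    # 1M tokens = 1000 x 1K tokens; round to cents to hide float noise.
    return round(cost_per_1k * 1000, 2)


if __name__ == "__main__":
    for model, cost in COST_USD_PER_1K.items():
        print(f"{model}: ${usd_per_million(cost):.2f} per 1M tokens")
```
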
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).