Submitted: 14 March 2025
Posted: 24 March 2025
Abstract
Keywords:
1. Introduction
- (1) What are the comparative advantages of open-source versus commercial LLMs?
- (2) What limitations and challenges are inherent to open-source and commercial LLMs?
2. Advantages of Open-Source versus Commercial LLMs
2.1. Advantages of Open-Source LLMs
2.1.1. Transparency and Community-Driven Innovation
2.1.2. Customization and Domain-Specific Adaptation
2.1.3. Lower Long-Term Costs
2.1.4. Community Security Audits and Ethical Control
2.1.5. Flexible Hosting Options and Data Governance
2.2. Advantages of Commercial LLMs
2.2.1. High Baseline Performance and Comprehensive Refinement
2.2.2. Managed Services and Technical Support
2.2.3. Centralized Safety Features
2.2.4. Ongoing Updates and Incremental Upgrades
2.2.5. Streamlined Ecosystem Integration
2.3. Advantages Comparison between DeepSeek and ChatGPT
3. Limitations of Open-Source versus Commercial LLMs
3.1. Limitations of Open-Source LLMs
3.1.1. Infrastructure and Compute Overheads
3.1.2. Community-Led Quality Assurance
3.1.3. Security Challenges and Adversarial Risks
3.1.4. Fragmented Support and Documentation
3.1.5. License and Data Usage Uncertainties
3.2. Limitations of Commercial LLMs
3.2.1. Potentially High Usage Costs
3.2.2. Reduced Architectural Control
3.2.3. Vendor Lock-In and Platform Dependency
3.2.4. Opaque Training and Bias Discovery
3.2.5. Data and Privacy Concerns
3.3. Limitations Comparison between DeepSeek and ChatGPT
4. Conclusions
References
Table 1. Advantages comparison between DeepSeek and ChatGPT.

| Advantage | DeepSeek (Open-Source) | ChatGPT (Commercial) |
|---|---|---|
| Transparency | Model visibility and modifiable architecture | Training methodology is largely proprietary; limited introspection |
| Cost Management | No per-token or monthly fees; self-hosted at scale | Pay-per-use or subscription-based model; costs can grow rapidly with heavy usage |
| Customization | In-depth fine-tuning and domain-level modifications | Fine-tuning available in limited or proprietary form; constrained model architecture |
| Security & Data Control | On-premise deployment; private data remains in-house | Cloud-based service by default; must trust vendor’s data handling |
| Ease of Onboarding | Demands engineering expertise; the user must maintain infrastructure | Fast integration using managed API, vendor support, and documentation |
| Safety Mechanisms | Requires custom or community-based content filters | Pre-built moderation and refusal systems, regularly updated by the provider |
| Innovation Ecosystem | Community-driven patches and performance upgrades | Centralized R&D processes with less user insight; frequent iteration behind closed doors |
| Baseline Performance | Potentially robust in specialized tasks (e.g., code, math) after tuning | High out-of-the-box performance in general tasks; extensive alignment |
| Sustainability | Emphasis on more efficient architectures (Mixture-of-Experts, etc.) | Large-scale resources for training and hosting; energy usage not fully disclosed |
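To make the "Ease of Onboarding" and hosting contrast in Table 1 concrete, the sketch below shows the two integration paths side by side: self-hosting an open-weight checkpoint versus calling a managed API. The model identifiers (`deepseek-ai/deepseek-llm-7b-chat`, `gpt-4o-mini`), libraries (`transformers`, `openai`), and defaults are illustrative assumptions for this sketch, not choices made in the article.

```python
# Minimal sketch contrasting self-hosted open-source inference with a managed
# commercial API. Model names, defaults, and libraries are illustrative
# assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from openai import OpenAI


# --- Path A: self-hosted open-weight model (e.g., a DeepSeek chat checkpoint) ---
# Requires local GPU provisioning and maintenance, but prompts, outputs, and
# weights stay in-house.
def local_generate(prompt: str, model_id: str = "deepseek-ai/deepseek-llm-7b-chat") -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# --- Path B: managed commercial API (e.g., ChatGPT via the OpenAI SDK) ---
# A few lines of client code; the vendor handles hosting, safety filters, and
# updates, but data transits the provider's cloud.
def api_generate(prompt: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The contrast is operational rather than algorithmic: Path A trades integration effort and infrastructure for full control over data and architecture, while Path B trades data custody and architectural access for near-immediate onboarding.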
Table 2. Limitations comparison between DeepSeek and ChatGPT.

| Limitation | DeepSeek (Open-Source) | ChatGPT (Commercial) |
|---|---|---|
| Resource Demands | High GPU/TPU costs for large-scale training/inference | Pay-as-you-go pricing; cumulative fees grow with heavy usage |
| Support Ecosystem | Volunteer-led forums; no formal SLAs | Dedicated support with SLAs and official documentation |
| Security and Moderation | Requires custom integration of filtering or adversarial training | Built-in refusal system and moderation layers, but user must accept vendor’s approach |
| Data Governance and Compliance | Users must handle data privacy tools locally; uncertain training data sources | Data goes to the external cloud; vendor’s compliance approach may not align perfectly with user’s requirements |
| Architecture Control | Total access for specialized modifications; can be complex to maintain | Architecture locked; advanced domain adaptation is limited to partial fine-tuning or prompt engineering |
| Scalability & Sustainability | Greater flexibility in hardware but large-scale training is resource-intensive; possible heavy carbon footprint | Provider invests in HPC infrastructure; usage-based fees can hamper sustainable scaling if demand grows exponentially |
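A rough way to reason about the "Resource Demands" and cost rows in Table 2 is to compare usage-based API fees against an amortized self-hosting cost. The sketch below uses hypothetical prices (USD per million tokens, GPU hourly rates) purely to show the shape of the trade-off; actual figures vary by vendor, hardware, and utilization.

```python
# Back-of-envelope cost comparison behind the "Resource Demands" rows.
# Every number below is a hypothetical placeholder; substitute current vendor
# pricing and your own infrastructure costs.

def api_monthly_cost(tokens_per_month: float,
                     price_per_million_tokens: float = 2.0) -> float:
    """Usage-based commercial API: cost scales linearly with token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def self_hosted_monthly_cost(gpu_count: int = 1,
                             gpu_hourly_rate: float = 2.5,
                             hours_per_month: float = 730.0) -> float:
    """Self-hosted open model: cost is roughly flat once hardware is provisioned."""
    return gpu_count * gpu_hourly_rate * hours_per_month


if __name__ == "__main__":
    for tokens in (10_000_000, 100_000_000, 1_000_000_000):
        print(f"{tokens:>13,} tokens/month: "
              f"API ≈ ${api_monthly_cost(tokens):,.0f}, "
              f"self-hosted ≈ ${self_hosted_monthly_cost():,.0f}")
```

Under assumptions like these, the managed API tends to be cheaper at low volumes, while self-hosting amortizes better once token volume is high and the hardware stays busy; the crossover point depends entirely on the figures plugged in.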