Submitted: 12 June 2025
Posted: 13 June 2025
Abstract
Keywords:
1. Introduction
1.1. Background on Large Language Models (LLMs)
1.2. Motivation for Comparative Analysis
- Clarity and Taxonomy: To provide a structured understanding of how different classes of LLMs are defined, trained, and deployed.
- Informed Decision-Making: To guide researchers and practitioners in choosing the right model class for specific applications, domains, and constraints.
- Performance and Trade-Offs: To systematically assess how foundation, instruction-tuned, and multimodal models differ in terms of performance, generalization, interpretability, alignment, and cost.
- Bridging Gaps: To identify where models can be improved by integrating insights across paradigms—such as combining the generality of foundation models with the usability of instruction-tuned systems and the perceptual capabilities of multimodal models.
- Foresight for Development: To anticipate trends and guide the next generation of models toward unified, context-aware, and human-aligned architectures.
2. Taxonomy of Large Language Models
2.1. Classification by Architecture
- Decoder-Only Models: These models predict the next token in a sequence, making them ideal for generative tasks. Examples include GPT-2, GPT-3, and GPT-4.
- Encoder-Only Models: These models encode the entire input sequence for tasks like classification or embedding generation. BERT and RoBERTa are canonical encoder-only models.
- Encoder-Decoder (Seq2Seq) Models: These models encode an input sequence and decode an output sequence, making them suitable for translation and summarization. Examples include T5, BART, and FLAN-T5.
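The three families differ chiefly in their attention pattern. A minimal numpy sketch (illustrative only, not any model's actual implementation) contrasts the causal mask of decoder-only models with the bidirectional attention of encoder-only models:

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Boolean mask: True where position i may attend to position j."""
    if causal:
        # Decoder-only (GPT-style): token i sees only positions j <= i.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Encoder-only (BERT-style): every token sees the full sequence.
    return np.ones((seq_len, seq_len), dtype=bool)

causal = attention_mask(4, causal=True)
full = attention_mask(4, causal=False)
# Under the causal mask, the first token sees only itself, while the
# last token sees the whole prefix; the encoder mask is all-True.
```

An encoder-decoder model combines both: a bidirectional mask over the input and a causal mask over the output, plus cross-attention between them.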
2.2. Classification by Purpose
- Foundation Models: These are large pretrained models designed to serve as general-purpose backbones across a range of downstream tasks. They are typically trained on vast, diverse datasets using unsupervised or self-supervised learning. Examples: GPT-3, PaLM, Chinchilla, LLaMA.
- Instruction-Tuned Models: These are derived from foundation models via additional training on datasets consisting of prompts and expected responses. They aim to improve alignment with user intent and are better at following natural language instructions. Examples: InstructGPT, FLAN-T5, OpenAssistant, Mistral-Instruct.
- Multimodal Models: These models extend language understanding to other modalities such as vision, audio, and video. They can accept multiple input types and generate text or multimodal outputs. Examples: GPT-4V, Gemini, CLIP, Flamingo, Kosmos-2.
2.3. Evolution of LLMs Over Time
- Phase 1: General-Purpose Pretraining: Emphasis on scale and generality. Key contributions: GPT-2, BERT, T5.
- Phase 2: Task Alignment and Instruction Following: Models are refined to better align with human intent through supervised fine-tuning, reinforcement learning, and curated datasets. Key contributions: InstructGPT, FLAN, Alpaca.
- Phase 3: Multimodal and Unified Models: Language models are extended to process other modalities, enabling capabilities like image generation, audio captioning, and video analysis. Key contributions: Flamingo, GPT-4V, Gemini, MM-ReAct.
3. Foundation Language Models
3.1. Definition and Characteristics
- Massive Pretraining Datasets: Typically trained on hundreds of billions of tokens sourced from books, web data, Wikipedia, code, and other heterogeneous sources.
- Self-Supervised Objectives: Common objectives include next-token prediction (causal language modeling) and masked language modeling (MLM).
- Zero-Shot and Few-Shot Learning Capabilities: Demonstrated ability to generalize to unseen tasks with little or no additional fine-tuning.
- Scalability: Performance improves significantly with model and dataset scale, as demonstrated by scaling laws (Kaplan et al., 2020).
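Scaling laws of the kind reported by Kaplan et al. (2020) take a power-law form in model size. The sketch below uses made-up constants (`N_C` and `ALPHA` are illustrative, not fitted values) to show the qualitative behavior:

```python
import numpy as np

# Illustrative Kaplan-style power law: predicted loss falls as a power
# of parameter count N. Constants are invented for illustration only.
N_C = 8.8e13   # hypothetical critical scale
ALPHA = 0.076  # hypothetical exponent

def scaling_loss(n_params: float) -> float:
    """Predicted loss L(N) = (N_C / N) ** ALPHA."""
    return (N_C / n_params) ** ALPHA

small = scaling_loss(1e8)   # a 100M-parameter model
large = scaling_loss(1e11)  # a 100B-parameter model
# Under a power law, the larger model always has strictly lower
# predicted loss, but with diminishing returns per added parameter.
```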
3.2. Major Examples
| Model | Developer | Architecture | Parameters | Training Data | Training Objective |
| --- | --- | --- | --- | --- | --- |
| GPT-3 | OpenAI | Decoder-only | 175B | 300B tokens | Causal LM |
| PaLM | Google | Decoder-only | 540B | 780B tokens | Causal LM |
| Chinchilla | DeepMind | Decoder-only | 70B | 1.4T tokens | Causal LM |
| BERT | Google | Encoder-only | 340M | Wikipedia + BooksCorpus | MLM |
| T5 | Google | Encoder-decoder | 11B | C4 (Colossal Clean Crawled Corpus) | Text-to-text |
3.3. Training Data and Objectives
Common data sources include:
- Open-access web data (e.g., Common Crawl)
- Digitized books and academic content
- Code repositories
- Conversational and forum data
Common training objectives include:
- Causal Language Modeling (CLM): predicting the next token from the preceding context (e.g., GPT-style models).
- Masked Language Modeling (MLM): predicting randomly masked tokens within a sequence (e.g., BERT, RoBERTa).
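The two objectives can be contrasted in a few lines of Python; the token IDs and the 15% mask rate here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = [101, 7, 42, 9, 56, 102]  # toy token IDs
MASK_ID = 103                      # toy [MASK] token ID

# Causal LM (GPT-style): inputs are tokens[:-1], targets are tokens[1:],
# i.e. each position is trained to predict the next token.
clm_inputs, clm_targets = tokens[:-1], tokens[1:]

# Masked LM (BERT-style): replace a random ~15% of tokens with [MASK];
# the loss is computed only at the masked positions.
mlm_inputs = list(tokens)
masked_positions = [i for i in range(len(tokens)) if rng.random() < 0.15]
for i in masked_positions:
    mlm_inputs[i] = MASK_ID
```

CLM sees only leftward context at each step, which is what makes it directly usable for generation; MLM sees both directions, which is why BERT-style models excel at understanding but are not generative.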
3.4. Strengths and Limitations
Strengths:
- Broad generalization across language tasks
- High performance in zero- and few-shot settings
- Scalable and reusable across domains
- Strong latent knowledge representation
Limitations:
- Lack of alignment with human intent or task-specific goals
- Tendency to produce hallucinations or factually incorrect content
- Computationally expensive to train and deploy
- Inability to handle multimodal input or grounded reasoning without adaptation
| Model | Year | Parameters | Training Corpus | Strengths | Weaknesses |
| --- | --- | --- | --- | --- | --- |
| GPT-3 | 2020 | 175B | Web text, books, etc. | Strong few-shot ability | Expensive inference, hallucination |
| BERT | 2018 | 340M | BooksCorpus, Wikipedia | Bidirectional understanding | Not generative |
| PaLM | 2022 | 540B | Diverse, filtered web | Very high performance | High energy/resource cost |
| Chinchilla | 2022 | 70B | 1.4T tokens | Efficient scaling | Limited instruction tuning |
4. Instruction-Tuned Language Models
4.1. Definition and Purpose
Instruction tuning adapts a pretrained foundation model to follow natural-language directives, with the goals of:
- Improving alignment with user instructions
- Enhancing usability without task-specific fine-tuning
- Mitigating hallucinations and unsafe outputs
- Facilitating natural human-computer interaction
4.2. Training Methods and Datasets
- Supervised Fine-Tuning (SFT):
  - Models are fine-tuned on large datasets of <instruction, input, output> triplets.
  - Example datasets: FLAN, Super-NaturalInstructions, OpenAssistant, Alpaca, ShareGPT.
- Reinforcement Learning from Human Feedback (RLHF):
  - Models are trained to prefer outputs that align with human preferences.
  - Steps include reward modeling and policy optimization (e.g., PPO).
  - Used in models such as InstructGPT, ChatGPT, and Claude.
- Self-Instruct and Synthetic Generation:
  - Models bootstrap additional instruction data from themselves or other LLMs.
  - Example: Self-Instruct (Wang et al., 2022).
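The reward-modeling step of RLHF typically minimizes a Bradley-Terry style pairwise loss over human preference labels. A toy sketch, in which scalar rewards stand in for a learned reward model's scores:

```python
import numpy as np

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in RLHF reward modeling:
    -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))

# The loss shrinks as the reward model scores the human-preferred
# response higher than the rejected one.
worse = preference_loss(0.1, 0.5)   # model disagrees with the label
better = preference_loss(2.0, 0.5)  # model agrees with the label
```

In the full pipeline, the reward model trained with this loss then supplies the reward signal for policy optimization (e.g., PPO) of the language model itself.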
4.3. Notable Instruction-Tuned Models
| Model | Base Model | Developer | Tuning Method | Key Features |
| --- | --- | --- | --- | --- |
| InstructGPT | GPT-3 | OpenAI | SFT + RLHF | First widely adopted instruction-tuned model |
| ChatGPT | GPT-3.5 / GPT-4 | OpenAI | SFT + RLHF + Conversation | Dialogue-optimized, real-time responsiveness |
| FLAN-T5 | T5 | Google | SFT on diverse tasks | Strong zero-shot and generalization ability |
| Alpaca | LLaMA | Stanford | SFT on GPT-generated data | Lightweight, open-source instruction tuning |
| OpenAssistant | LLaMA | LAION | Community-sourced SFT | Open RLHF pipeline |
| Claude | Proprietary | Anthropic | Constitutional AI + RLHF | Focus on safe, steerable behavior |
4.4. Capabilities and Advantages
- Better Prompt Following: Clear understanding and execution of user directives.
- Improved Generalization: Effective zero- and few-shot performance across unseen tasks.
- Enhanced Safety and Usefulness: RLHF and constitutional training reduce harmful or biased outputs.
- User-Friendly Interaction: More suitable for deployment in assistants, tutors, search engines, and chatbots.
4.5. Limitations and Challenges
- Instruction Sensitivity: Performance varies significantly depending on prompt phrasing.
- Bias Amplification: Human preferences and training data biases can propagate.
- High Resource Requirements: Tuning with human feedback is costly and time-consuming.
- Lack of Grounded Knowledge: Still prone to hallucinations without retrieval mechanisms or external tools.
5. Multimodal Language Models
5.1. Definition and Scope
- Vision-Language Models (VLMs): Image captioning, visual question answering, and image-grounded dialogue (e.g., GPT-4V, Flamingo).
- Speech-Language Models: Automatic speech recognition (ASR), speech synthesis, and spoken question answering (e.g., Whisper, AudioLM).
- Cross-Modal Reasoning: Tasks like referring expression comprehension, video-language alignment, and embodied AI.
5.2. Architectural Approaches
- Dual-Encoder Models: Separate encoders process each modality, and embeddings are aligned in a joint representation space. Example: CLIP (Contrastive Language–Image Pretraining).
- Fusion Models: Modalities are combined in a shared transformer architecture using early, mid, or late fusion. Examples: Flamingo, Kosmos-2.
- Adapter-Based Integration: Lightweight adapters are added to pretrained models to process specific modalities without retraining the entire network. Examples: PaLI-X, LLaVA.
- Multimodal Prompting: Inputs from non-text modalities are transformed into "prompts" or embeddings that can be consumed by text-based models (e.g., visual tokens or audio spectrogram embeddings).
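The dual-encoder idea can be sketched with random vectors standing in for learned image and text encoder outputs; after projection into a shared unit-norm space, matched pairs score highest along the diagonal of the similarity matrix:

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Project embeddings onto the unit sphere (as CLIP does)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(1)
# Toy joint space: 3 images and their 3 paired captions, dim 32.
# Captions are modeled as small perturbations of their image embedding.
img = normalize(rng.normal(size=(3, 32)))
txt = normalize(img + 0.05 * rng.normal(size=(3, 32)))

# CLIP-style similarity matrix: entry (i, j) scores image i vs text j.
sim = img @ txt.T
# With matched pairs, each image's highest-scoring caption is its own.
best = sim.argmax(axis=1)
```

Training drives exactly this structure with a contrastive (InfoNCE) loss: diagonal entries are pulled up, off-diagonal entries pushed down, which is what enables zero-shot retrieval and classification.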
5.3. Prominent Multimodal LLMs
| Model | Modalities | Developer | Architecture Type | Core Capabilities |
| --- | --- | --- | --- | --- |
| GPT-4V (Vision) | Text + Image | OpenAI | Unified Transformer | Image understanding, document reasoning |
| Gemini 1.5 | Text + Image + Audio | Google DeepMind | Multimodal transformer | State-of-the-art multi-input reasoning |
| CLIP | Image + Text | OpenAI | Dual Encoder | Visual search, zero-shot classification |
| Flamingo | Text + Image | DeepMind | Fusion Transformer | Image-grounded dialogue, captioning |
| Kosmos-2 | Text + Image + OCR | Microsoft | Multimodal Transformer | Vision-language grounding and reasoning |
| LLaVA | Text + Image | UW-Madison / Microsoft | Adapter + Vision Encoder | Open-ended VQA, visual chat |
5.4. Use Cases and Applications
- Healthcare: Radiology report generation from medical images, cross-modal diagnostics
- Education: Visual tutoring, diagram explanation, language-learning with image/audio context
- Search and Retrieval: Multimodal search engines (e.g., Google Lens + Gemini)
- Accessibility: Image and scene description for visually impaired users
- Creative Tools: Image captioning, visual storytelling, audio-based content generation
5.5. Strengths and Limitations
Strengths:
- Rich, grounded understanding of the physical world
- Improved performance on real-world tasks involving diverse inputs
- Enables more human-like interaction (e.g., discussing images or sounds)
Limitations:
- Training data for multimodal inputs is less abundant and less standardized
- Computationally more expensive to train and fine-tune
- Still prone to hallucination and false cross-modal associations
- Challenges in aligning and synchronizing multiple input types
As an illustration, a typical vision-language model:
- Accepts image and text inputs
- Encodes both with respective encoders
- Performs attention fusion in a joint transformer
- Outputs text (e.g., answer to visual question)
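The pipeline above can be sketched as a single fused self-attention step; the random matrices stand in for real vision-encoder and text-embedder outputs:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 16
image_tokens = rng.normal(size=(4, d))  # stand-in for a vision encoder
text_tokens = rng.normal(size=(6, d))   # stand-in for a text embedder

# Early fusion: concatenate both token streams, then run one joint
# self-attention pass so text tokens can attend to image tokens.
x = np.concatenate([image_tokens, text_tokens], axis=0)  # (10, d)
attn = softmax(x @ x.T / np.sqrt(d))  # attention over all 10 tokens
fused = attn @ x  # each output mixes visual and textual context
```

A real fusion model stacks many such layers (with learned query/key/value projections) and decodes text autoregressively from the fused representation.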
6. Comparative Analysis
6.1. Comparison Dimensions
- Training Objective & Data
- Architecture & Modalities
- Task Generalization
- Instruction Following
- Performance on Benchmarks
- Practical Applications
- Limitations
6.2. Comparative Table
| Dimension | Foundation Models | Instruction-Tuned Models | Multimodal Models |
| --- | --- | --- | --- |
| Training Objective | Self-supervised (e.g., MLM, CLM) | Supervised fine-tuning + RLHF | Multimodal fusion + optionally RLHF |
| Input Modalities | Text only | Text only | Text + images/audio/video |
| Output Modality | Text | Text | Text or multimodal outputs |
| Architecture | Encoder / Decoder / Encoder-Decoder | Based on foundation models | Unified or dual encoders; adapter-based |
| Instruction Following | Weak (zero-shot prompting) | Strong (fine-tuned on instructions) | Strong (for image-grounded tasks) |
| Few-shot Learning | Emerging capability | Highly effective | Limited (depends on task) |
| Zero-shot Performance | Moderate to strong | Strong (especially on unseen tasks) | Variable (strong in retrieval & classification) |
| Example Models | GPT-3, PaLM, BERT, T5 | InstructGPT, ChatGPT, FLAN-T5, Claude | GPT-4V, Gemini, Flamingo, Kosmos-2 |
| Primary Use Cases | Pretraining backbone, embeddings | Chatbots, assistants, instruction interfaces | VQA, captioning, multimodal agents |
| Limitations | Weak task alignment, hallucinations | Prompt sensitivity, bias amplification | Data scarcity, high compute costs |
6.3. Performance on Benchmarks
| Benchmark | Foundation Models | Instruction-Tuned Models | Multimodal Models |
| --- | --- | --- | --- |
| MMLU (text tasks) | Good (GPT-3, PaLM) | Excellent (FLAN-T5, ChatGPT) | Variable |
| HELM / BIG-bench | Limited | Strong generalization | Not applicable |
| VQAv2 / COCO | Not applicable | Not applicable | Excellent (GPT-4V, Flamingo) |
| GSM8K (math) | Moderate | Good (InstructGPT, Claude) | Dependent on task design |
6.4. Analysis of Trade-Offs
- Foundation Models provide general-purpose capabilities and a scalable base for downstream use but require fine-tuning for task alignment and safety.
- Instruction-Tuned Models deliver much better user alignment and usability but inherit biases and errors from the foundation models and the tuning datasets.
- Multimodal Models unlock perception and cross-modal reasoning but face greater engineering complexity, limited data availability, and slower inference.
In summary:
- Foundation models excel in low-resource generalization.
- Instruction-tuned models are ideal for interactive applications.
- Multimodal models are indispensable for real-world AI agents and grounded understanding.
6.5. Suggested Diagram
A triangular layout works well, with:
- foundation models at the base (general scope),
- instruction-tuned models at one corner (aligned behavior),
- multimodal models at the other corner (expanded input modalities).
7. Deployment and Ecosystem Trends
7.1. Deployment Architectures
- Cloud-based APIs: Hosted models accessed via RESTful endpoints (e.g., OpenAI API, Gemini API, Claude). Pros: scalability, model freshness, minimal infrastructure burden. Cons: vendor dependency, latency, data governance risks.
- On-premise/Private Hosting: Useful for privacy-critical industries (e.g., legal, healthcare). Example: LLaMA or Mistral deployed internally with quantization or inference optimization.
- Edge Deployment: Lightweight models (e.g., Phi-2, DistilGPT2, MobileBERT) adapted for mobile, IoT, and embedded systems. Often optimized via pruning, quantization, or knowledge distillation.
- Hybrid Architectures: Combine local inference with cloud augmentation (e.g., retrieval-augmented generation or tool use).
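The quantization used for edge deployment can be illustrated with a toy symmetric 4-bit scheme (a deliberate simplification; production methods such as GPTQ quantize per-group and compensate for error):

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric 4-bit quantization sketch: map floats to ints in [-8, 7]."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2),
# while storage drops from 32 bits to 4 bits per weight.
err = np.abs(w - w_hat).max()
```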
7.2. Ecosystem Tools and Frameworks
| Category | Tools |
| --- | --- |
| Training & Fine-tuning | Hugging Face Transformers, DeepSpeed, LoRA, PEFT, Axolotl |
| Serving & Inference | vLLM, TGI (Text Generation Inference), ONNX Runtime, NVIDIA TensorRT |
| Evaluation | HELM, BIG-bench, TruthfulQA, RAGAS |
| RLHF & Alignment | TRL (Transformer Reinforcement Learning), Open Feedback, Reinforcement Studio |
| Multimodal Frameworks | OpenFlamingo, MiniGPT-4, LLaVA, Hugging Face’s Diffusers + Transformers |
| Monitoring & Governance | PromptLayer, Langfuse, Arize Phoenix, Weights & Biases |
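Of the tuning techniques listed, LoRA is simple enough to sketch in numpy: the pretrained weight matrix stays frozen, and only a low-rank update is trained. The dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # hidden size and LoRA rank (r << d)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init

# LoRA: the effective weight is W + B @ A; only A and B receive
# gradients. Zero-initializing B makes the adapted model start out
# exactly equal to the pretrained one.
W_eff = W + B @ A
trainable = A.size + B.size  # 2 * r * d parameters
full = W.size                # d * d parameters
```

With r=4 and d=64 this trains 512 of 4096 parameters (12.5%); at realistic hidden sizes (d in the thousands) the fraction drops well below 1%, which is what makes PEFT-style tuning cheap.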
7.3. Industry Adoption Trends
- Customer Support: AI agents (e.g., ChatGPT-based interfaces, Ada, Intercom) are used for ticket triaging, live assistance, and content summarization.
- Enterprise Productivity: Copilots (e.g., GitHub Copilot, Microsoft 365 Copilot) are increasingly integrated into office tools and IDEs.
- Healthcare: LLMs support clinical summarization, drug discovery, and medical Q&A (e.g., Med-PaLM, Glass AI).
- Education: Personalized tutoring, content creation, and assessment automation.
- Creative Industries: Storyboarding, video scripting, audio narration using multimodal variants (e.g., Sora, DALL·E, ElevenLabs).
7.4. Emerging Deployment Trends
- Open-Weight Releases: Increasing community momentum around open-access models (e.g., Mistral, Mixtral, Phi-3, LLaMA 3) for reproducibility and customization.
- Model Distillation & Quantization: Techniques like 4-bit quantization (e.g., GPTQ) and distillation (e.g., DistilBERT) allow deployment on constrained hardware.
- Retrieval-Augmented Generation (RAG): Combines LLMs with search or vector databases to ground outputs in external knowledge (e.g., LangChain, Haystack, LlamaIndex).
- Agentic Systems: Tools like AutoGen, CrewAI, and LangGraph enable orchestration of multiple LLMs or tools in a goal-driven, persistent context.
- Synthetic Data & Feedback Loops: Self-generated data is used for continuous learning, evaluation, or instruction tuning (e.g., Self-Instruct, feedback-augmented training).
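The RAG pattern above reduces to retrieve-then-prompt. A self-contained toy, in which the hash-based `embed` is a stand-in for a learned embedding model and the document texts are invented:

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words embedding via word hashing (a stand-in for a
    real embedding model such as those served behind vector databases)."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = [
    "Chinchilla showed compute-optimal training uses more tokens per parameter",
    "CLIP aligns image and text embeddings with a contrastive objective",
    "LoRA adds low-rank adapters for parameter-efficient fine-tuning",
]
doc_vecs = np.stack([embed(d) for d in docs])

# Retrieve: rank documents by cosine similarity to the query embedding.
query = "what objective does CLIP use to align image and text embeddings"
scores = doc_vecs @ embed(query)
retrieved = docs[int(scores.argmax())]

# Augment: prepend the retrieved passage so the LLM answers from it.
prompt = f"Context: {retrieved}\n\nQuestion: {query}\nAnswer:"
```

Frameworks like LangChain, Haystack, and LlamaIndex wrap exactly this loop around real embedding models, vector stores, and LLM calls.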
7.5. Challenges in Deployment
- Cost Efficiency: LLM inference is resource-intensive. Token compression, batching, and caching are critical optimizations.
- Latency and Interactivity: Key for real-time applications such as voice agents and interactive chatbots.
- Governance and Safety: Preventing misuse, managing model drift, and ensuring transparency remain unresolved in many commercial settings.
- Evaluation at Scale: Robust automated metrics (beyond BLEU/ROUGE) are still evolving, especially for subjective and open-ended tasks.
A layered view of the ecosystem places:
- Foundation and fine-tuned models at the core
- Toolchains for tuning, serving, and evaluation in the middle layer
- Real-world applications and industry verticals in the outermost layer
8. Challenges and Open Questions
8.1. Model Alignment and Safety
- How can alignment be maintained across increasingly capable models?
- What are scalable alternatives to RLHF and instruction tuning?
- Can models be made self-monitoring and self-correcting in real time?
8.2. Hallucination and Factual Consistency
- How can we quantify and minimize hallucination across modalities?
- Can retrieval-augmented generation (RAG) fully solve this issue?
- How should factual grounding be incorporated during training?
8.3. Multimodal Integration Complexity
- What are the best architectural paradigms for robust multimodal fusion?
- How do we benchmark cross-modal reasoning and transfer learning?
- How do we address bias and representation issues across modalities?
8.4. Generalization vs Specialization Trade-Off
- How can models balance generality with domain-specific expertise?
- Is modular composition (e.g., agents or tool-use) more efficient than training monolithic models?
- Can adapters or mixture-of-experts architectures improve task-specific efficiency?
8.5. Data Efficiency and Scaling Laws
- What are the limits of current scaling laws?
- How can small models match large models through better training strategies (e.g., curriculum learning, active learning)?
- What role can synthetic or semi-supervised data play?
8.6. Evaluation, Robustness, and Benchmarking
- What comprehensive and reliable metrics can be used beyond BLEU, ROUGE, or accuracy?
- How can we measure social biases, robustness, calibration, and uncertainty?
- How do we design benchmarks that evolve with models?
8.7. Legal, Ethical, and Societal Implications
- Who is responsible for harms caused by autonomous LLM agents?
- How do we ensure transparent auditing of black-box foundation models?
- What governance structures are needed for open-weight vs proprietary models?
8.8. Model Interpretability and Trust
- Can interpretability techniques scale with model size and complexity?
- Are attention maps or feature attribution useful in multimodal settings?
- How do we build user trust in models that remain probabilistic and non-deterministic?
9. Future Directions
9.1. Toward Unified Multimodal Intelligence
- Truly universal encoders that process arbitrary modality combinations.
- Temporal reasoning frameworks for video understanding and narration.
- Cross-modal agent architectures for real-world interaction (e.g., robotics, AR/VR).
9.2. Modular and Composable Architectures
- Mixture-of-Experts (MoE) systems that dynamically activate subnetworks.
- Adapter-based fine-tuning for task-specific customization.
- Agent-based orchestration where smaller models or tools are composed for goal-driven behavior.
9.3. Continual and Lifelong Learning
- Learn continuously from new data and feedback in real-world environments.
- Adapt on-device or in federated settings, without catastrophic forgetting.
- Use meta-learning to quickly generalize to new domains with few examples.
9.4. Enhanced Alignment and Human-AI Interaction
- Constitutional AI and ethical scaffolding for value alignment.
- Human-in-the-loop interaction for iterative improvement and personalized behavior.
- Causal reasoning and epistemic uncertainty estimation to improve robustness and trust.
9.5. Efficient and Sustainable Model Design
- Sparse and low-rank modeling, quantization, and pruning for energy-efficient deployment.
- Distilled small models competitive with large models in constrained settings.
- Hardware-aware training and inference optimizations.
9.6. Model Governance and Open Research Infrastructure
- Transparent benchmarking and standardized evaluation protocols.
- Auditable model cards, datasheets, and usage logs.
- Open ecosystems with community-curated datasets and decentralized model stewardship (e.g., OpenLLM, EleutherAI, Hugging Face Hub).
9.7. Grounded and Tool-Augmented Models
- Retrieval-Augmented Generation (RAG) pipelines that dynamically access structured and unstructured data.
- Tool use APIs for computation, database access, web browsing, and more.
- Multistep planning and memory systems enabling agent-like behavior.
9.8. Societal Integration and Human-Centric Design
- Education (e.g., intelligent tutors with curriculum awareness)
- Healthcare (e.g., patient-tailored dialogue agents)
- Creative Workflows (e.g., story generation, design tools, scientific assistants)
Progress across these directions will require coordinated advances in:
- Architecture
- Safety and alignment
- Deployment and tools
- Societal integration
10. Conclusions
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).