Submitted:
04 July 2025
Posted:
07 July 2025
Abstract
Keywords:
1. Introduction: The ChatGPT Inflection Point in AI and its Applications
2. Advancements in Natural Language Understanding with ChatGPT: Capabilities, Innovations, and Critical Frontiers
2.1. Core NLU Architecture and Functionalities
2.2. Innovative NLU Techniques and their Impact
- LLMs for Data Annotation and Cleansing: LLMs are increasingly used to automate or assist in data annotation, a traditionally labor-intensive task. For example, the Multi-News+ dataset was enhanced by using LLMs with chain-of-thought and majority voting to cleanse and classify documents, improving dataset quality for multi-document summarization tasks.
- Factual Inconsistency Detection: Given LLMs' propensity for hallucination, techniques to detect factual inconsistencies are crucial. Methods like FIZZ, which employ fine-grained atomic fact decomposition and alignment with source documents, offer more interpretable ways to identify inaccuracies in abstractive summaries.
- Multimodal NLU Enhancement: Research is exploring the integration of other modalities, such as acoustic speech information, into LLM frameworks for tasks like depression detection, indicating a move towards more holistic NLU that mirrors human multimodal comprehension.
- "Evil Twin" Prompts: The discovery of "evil twin" prompts, obfuscated and uninterpretable inputs that can elicit desired outputs and transfer between models, opens new avenues for understanding LLM vulnerabilities and their internal representations, posing both security risks and research opportunities.
2.3. Critical Assessment: Benchmarks, Limitations, and Human Comparison
- Misinformation and Hallucinations: A persistent issue is the generation of plausible-sounding but incorrect or nonsensical information, often termed "hallucinations" (Akhtarshenas et al., 2025). This undermines reliability, especially in critical applications.
- Bias: LLMs inherit biases present in their vast training datasets, which can manifest as gender, racial, geographical, or ideological skews in their outputs (Akhtarshenas et al., 2025; Bender et al., 2021; OpenAI, 2023a). These biases can perpetuate harmful stereotypes and lead to unfair outcomes.
- Transparency and Explainability: The "black box" nature of LLMs makes it difficult to understand their decision-making processes or trace the origins of errors (Infosys Limited, 2023; Wolf et al., 2019; Liu et al., 2023; Mavrepis et al., 2024; Zhao et al., 2024). This lack of interpretability is a major hurdle for debugging, ensuring fairness, and building trust.
- Contextual Understanding Limits: While improved, LLMs can still struggle with deeply nuanced contextual understanding, complex linguistic structures (like center-embedding), rare words, sarcasm, or the subtleties of human emotion (Akhtarshenas et al., 2025; Bender et al., 2021; OpenAI, 2023a).
2.4. Advancing Method and Theory in NLU through ChatGPT
- Methodologically, a key frontier is the development of agentic AI workflows. These systems represent a profound shift, leveraging core NLU to interact with tools and environments to solve complex, multi-step problems (Park et al., 2023; Shinn et al., 2023).
- Techniques like RAG are foundational, providing agents with grounded, verifiable knowledge (Lewis et al., 2020; Hu & Lu, 2024; Wu et al., 2022; Yu, 2022).
- Probing methods like "evil twin" prompts (Empirical Methods in Natural Language Processing, 2024; Melamed et al., 2023; Mozes, 2024; Mozes et al., 2023; Oremus, 2023; Perez & Ribeiro, 2022; Perez et al., 2022; Xue et al., 2023) and the push towards multimodality (Hariri, 2023; OpenAI, 2024c) are creating more robust and versatile models to power these agents.
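The grounding role that RAG plays for agents can be sketched in a few lines. This toy version ranks documents by naive keyword overlap rather than dense embeddings, and `generate` is a hypothetical stand-in for an LLM call, so it should be read as a shape of the technique, not an implementation.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by keyword overlap with the query. Real systems
    use dense embeddings; word overlap keeps the sketch dependency-free."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_answer(query, corpus, generate):
    """Retrieve supporting passages, then ask the model to answer only
    from that context, grounding the output in verifiable sources."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts light energy into chemical energy.",
]
# Stub generator: echoes the first grounded passage it was given.
answer = rag_answer("Where is the Eiffel Tower located?", corpus,
                    generate=lambda p: p.split("context:\n")[1].split("\n")[0])
print(answer)
```

The key design point for agents is the constraint in the prompt: the model is asked to answer from retrieved evidence, which makes its claims checkable against the sources.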
3. The New Epoch of Content Generation: Diverse Applications, Quality Assurance, and Ethical Imperatives
3.1. ChatGPT's Role in Diverse Content Generation and Task Automation
- Technical and Scientific Content: In engineering, ChatGPT assists in drafting reports, generating software documentation, and producing code snippets (Neveditsin et al., 2025). Multivocal literature reviews indicate error rates for engineering tasks average around 20-30% for GPT-4 (Ray, 2023). In medicine, it is used for generating patient reports and drafting discharge summaries (Hariri, 2023), though error rates can range from 8% to 83% (Ray, 2023).
- Marketing and SEO Content: Marketers leverage ChatGPT for creating blog posts, ad copy, social media updates, and personalized email campaigns. It also aids in SEO by generating topic ideas and crafting meta descriptions (Fisher, 2025).
- Legal Content: Law firms utilize ChatGPT for drafting client correspondence, creating legal blog content, and developing marketing materials to increase efficiency (Fisher, 2025).
- Creative Writing: ChatGPT has shown aptitude in generating creative content such as stories, poetry, and scripts, acting as a catalyst for imaginative endeavors (Hariri, 2023; OpenAI, 2024c; Elkatmis, 2024; Niloy et al., 2024; Zhu et al., 2024).
- Academic Content: In academic settings, ChatGPT assists with literature reviews, drafting sections of papers, generating study materials, and creating quizzes (Alasadi & Baiz, 2023; Dwivedi et al., 2023; Isiaku et al., 2024; Michel-Villarreal et al., 2023; Preiksaitis & Rose, 2023; Wu, 2023; Cotton et al., 2024).
- Automated Task Execution with AI Agents: The next frontier lies in Agentic AI, where LLMs are empowered to act as autonomous agents. These agents move beyond generating content to performing complex, multi-step tasks. For example, an agent might not just write code but also debug it, or not just draft a marketing email but also execute the entire campaign by analyzing performance data and adjusting its strategy (Park et al., 2023; Xi et al., 2023). This represents a shift from a content creator to a task automator.
3.2. Methodologies for Quality Control, Coherence, and Accuracy
- Human Oversight and Human-in-the-Loop: This remains the most critical control measure. Expert review is essential for content where errors have severe consequences (Infosys Limited, 2023; Dwivedi et al., 2023; Susskind & Susskind, 2022). For Agentic AI, this evolves into a "human-in-the-loop" model, where humans supervise, intervene, and approve agent actions before execution to prevent errors and ensure safety (Shneiderman, 2022).
- Prompt Engineering: The quality of output is highly dependent on the input prompt. Effective prompt engineering is a key skill for guiding both content generation and agent behavior (Vu et al., 2025; Arvidsson & Axell, 2023; Marvin et al., 2023; Velásquez-Henao et al., 2023; Zhou et al., 2022; Herman, 2025; Knoth et al., 2024; Pan et al., 2024).
- Iterative Refinement: Using feedback loops to progressively refine outputs is a common practice to improve quality for both text and agent action sequences (Kaushik et al., 2025; Hadi et al., 2023; Chen et al., 2025; Liu et al., 2023; Sivarajkumar et al., 2024; Xu et al., 2024).
- Fact-Checking and Source Verification: Due to the risk of hallucinations, rigorous fact-checking is essential (Akhtarshenas et al., 2025; Hodge Jr, 2023; Perlman, 2023; Surden, 2023; OpenAI, 2024a). For agents, this includes grounding their knowledge in real-time, verifiable data sources before they act.
- Process and Tool-Use Validation: For AI agents, quality control must extend beyond the final output to validate the entire process. This includes verifying that the agent's reasoning is sound and that it uses its tools (e.g., web browsers, APIs) correctly and safely (Shinn et al., 2023).
- Specialized Evaluation Metrics and Tools: Domain-specific metrics like BLURB (Akhtarshenas et al., 2025; Gu et al., 2021; Naseem et al., 2022) and tools like SelfCheckGPT (Akhtarshenas et al., 2025) are crucial for objective assessment.
- Error Rate Analysis: Systematic analysis of error rates provides insights into reliability and highlights areas needing improvement (Ray, 2023).
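The human-in-the-loop pattern listed above can be expressed as a small control loop: the agent proposes actions, and nothing executes without passing an approval gate. The `approve` policy below is a toy stand-in for a human reviewer, and the action strings are purely illustrative.

```python
def run_with_oversight(actions, approve, execute):
    """Execute agent-proposed actions only after explicit approval,
    recording rejections instead of acting on them."""
    executed, rejected = [], []
    for action in actions:
        if approve(action):        # human (or policy) gate before execution
            executed.append(execute(action))
        else:
            rejected.append(action)
    return executed, rejected

proposed = ["draft summary email", "delete customer records", "update FAQ page"]
# Simple policy stand-in for a human reviewer: block destructive verbs.
approve = lambda a: not any(v in a for v in ("delete", "drop", "wipe"))
executed, rejected = run_with_oversight(proposed, approve,
                                        execute=lambda a: f"done: {a}")
print(executed)   # ['done: draft summary email', 'done: update FAQ page']
print(rejected)   # ['delete customer records']
```

In practice the gate would present each proposed action, with the agent's stated rationale, to a human before anything irreversible happens, which is the essence of the "approve before execution" model cited above.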
3.3. Ethical Considerations in Content and Task Automation
- Bias: Generated content and agentic actions can reflect and amplify societal biases from training data (Bender et al., 2021; Hariri, 2023; OpenAI, 2024c).
- Trustworthiness and Reliability: The probabilistic nature of LLMs means their outputs are not always factually correct or reliable, posing risks if unverified information is disseminated (Garousi, 2025; Schiller, 2024; Lee, 2024; Preiksaitis & Rose, 2023; Hariri, 2023; Wu et al., 2024; Xu et al., 2024).
- Security and Misuse: The potential for misuse is significant. Agentic AI dramatically lowers the barrier for malicious activities by enabling the automation of tasks like orchestrating large-scale phishing campaigns or propagating disinformation (Johnson & Acemoglu, 2023; OpenAI, 2023c; Veisi et al., 2025).
- Accountability and Autonomous Action: Agents capable of autonomous action raise profound ethical questions about accountability. Determining responsibility when an autonomous agent causes financial, social, or physical harm is a complex challenge for which legal and ethical frameworks are still nascent (Weidinger et al., 2024).
- Social Norms and Cultural Sensitivity: Generated content and actions must align with diverse cultural and societal expectations to avoid offense or misinterpretation (Johnson & Acemoglu, 2023; OpenAI, 2023c; Veisi et al., 2025).
- Ethical Data Sourcing and Privacy: Concerns persist regarding the methods used for collecting training data and the privacy of user inputs fed into ChatGPT (Daun & Brings, 2023; Marques & Bernardino, 2024; Neveditsin et al., 2025; OpenAI, 2024a).
- Copyright and Authorship: The generation of content raises complex questions about intellectual property rights, originality, authorship attribution, and plagiarism, especially when outputs closely resemble training data or are presented as original work (OpenAI, 2023c; Gamage et al., 2023; Hannigan et al., 2024; Jiang et al., 2024; Susnjak & McIntosh, 2024). Legal frameworks are still evolving to address these issues (Infosys Limited, 2023).
3.4. Advancing Method and Theory for Responsible Content Generation & Task Automation
4. ChatGPT as a Catalyst for Knowledge Discovery: Methodologies, Scientific Inquiry, and Future Paradigms
4.1. Methodologies for Knowledge Extraction from Unstructured Data
- Information Extraction from Diverse Sources: ChatGPT can parse complex documents, such as historical seedlists or health technology assessment (HTA) documents in different languages, to extract specific data points where rule-based methods falter (Dagdelen et al., 2024; Mitra et al., 2024; Yang et al., 2022; Shah et al., 2023).
- Qualitative Data Analysis Assistance: Researchers are exploring ChatGPT for assisting in qualitative analysis, such as generating initial codes or identifying potential themes (Chen et al., 2025; Kaushik et al., 2025; Liu et al., 2023; Sivarajkumar et al., 2024; Xu et al., 2024). However, careful prompting and validation are required, as LLMs can generate nonsensical data if not properly guided (Chen et al., 2025; Kaushik et al., 2025; Liu et al., 2023; Sivarajkumar et al., 2024; Xu et al., 2024).
- LLMs Combined with Knowledge Graphs (KGs): A promising methodology involves integrating LLMs with KGs. The GoAI method, for instance, uses an LLM to build and explore a KG of scientific literature to generate novel research ideas, providing a more structured approach than relying on the LLM alone (Gao et al., 2025; Pan et al., 2024).
- Autonomous Knowledge Discovery with AI Agents: The next methodological leap involves deploying Agentic AI to create automated knowledge discovery pipelines. These agents can be tasked with a high-level goal and then autonomously plan and execute a sequence of actions – such as searching databases, retrieving papers, extracting data, and synthesizing findings – to deliver structured knowledge with minimal human intervention (Bran et al., 2024).
- Prompt Injection Vulnerabilities: Research into prompt injection techniques highlights how the knowledge extraction process can be manipulated, underscoring security vulnerabilities that must be addressed for reliable knowledge discovery, especially in autonomous systems (Chang et al., 2025).
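The LLM-plus-KG methodology can be made concrete with a minimal sketch: triples (as an LLM might extract them from abstracts) are stored in an adjacency map, and multi-hop paths from a seed concept become raw material for idea generation. This is not the GoAI system itself; the triples and relation names are invented for illustration.

```python
from collections import deque

def build_kg(triples):
    """Store (subject, relation, object) triples as an adjacency map."""
    kg = {}
    for s, r, o in triples:
        kg.setdefault(s, []).append((r, o))
    return kg

def explore(kg, start, max_hops=2):
    """Breadth-first walk from a seed concept; the multi-hop paths are
    what an LLM could turn into candidate research ideas."""
    paths, queue = [], deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if (len(path) - 1) // 2 >= max_hops:   # each hop adds relation + node
            continue
        for rel, neighbor in kg.get(node, []):
            new_path = path + [rel, neighbor]
            paths.append(" -> ".join(new_path))
            queue.append((neighbor, new_path))
    return paths

# Illustrative triples, as an LLM might extract them from paper abstracts.
triples = [
    ("LLMs", "suffer_from", "hallucination"),
    ("hallucination", "mitigated_by", "retrieval grounding"),
    ("retrieval grounding", "uses", "knowledge graphs"),
]
kg = build_kg(triples)
for p in explore(kg, "LLMs"):
    print(p)
```

The structured paths make the provenance of a generated idea inspectable, which is the advantage the text attributes to KG-guided approaches over relying on the LLM alone.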
4.2. Applications in Scientific Research
- Hypothesis Generation: Models like GPT-4 can generate plausible and original scientific hypotheses, sometimes outperforming human graduate students in specific contexts (Noy & Zhang, 2023; OpenAI, 2024e; OpenAI Help Center, n.d.).
- Literature Review Assistance: LLMs can accelerate literature reviews by summarizing articles and identifying relevant papers and themes (Mitra et al., 2024; Dagdelen et al., 2024; Yang et al., 2022; Albadarin et al., 2024; Gabashvili, 2023; Haman & Školník, 2024; Imran & Almusharraf, 2023; Mostafapour et al., 2024; Wang et al., 2023; Waseem et al., 2023).
- Experimental Design Support: ChatGPT can assist in outlining experimental procedures but may require expert refinement to address oversimplifications or "loose ends" (Dai et al., 2023; Eymann et al., 2025; Fill et al., 2023; Li et al., 2023; OpenAI, 2024e).
- Data Analysis and Interpretation: LLMs can assist in analyzing large volumes of text data to identify patterns and emerging themes (Haltaufderheide & Ranisch, 2024; Hariri, 2023; Gabashvili, 2023; Garg et al., 2023; Li et al., 2024; Sallam, 2023; Pan et al., 2024; OpenAI, 2024c).
- Simulating Abductive Reasoning: LLMs can simulate abductive reasoning to infer plausible explanations or methodologies, thereby aiding research discovery (Glickman & Zhang, 2024; Huang & Chang, 2022; Bhagavatula et al., 2019; Garbuio & Lin, 2021; Magnani & Arfini, 2024; Pareschi, 2023; Xu et al., 2025).
- Automating Research with Scientific Agents: The culmination of these capabilities is the creation of scientific agents. These are autonomous systems designed to conduct research by integrating multiple steps. For instance, a scientific agent could be tasked with a high-level research question and then autonomously search literature, formulate a hypothesis, design and execute code for a simulated experiment, analyze the results, and draft a preliminary report, dramatically accelerating the pace of discovery (Boiko et al., 2023). The Deep Research capabilities of OpenAI's ChatGPT and Google's Gemini are prominent examples.
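The multi-step workflow described in the bullet above reduces, at its simplest, to chaining tool calls and threading each observation into the next step. The tool names and outputs below are stubs invented for illustration; a real scientific agent would call search APIs, code interpreters, and drafting models.

```python
def scientific_agent(goal, tools, plan):
    """Run a fixed plan of tool calls, feeding each observation into
    the next step - a minimal stand-in for an autonomous research
    workflow. Real agents also re-plan based on intermediate results."""
    log, observation = [], goal
    for step in plan:
        observation = tools[step](observation)
        log.append((step, observation))
    return log

# Stub tools standing in for real search, reasoning, and drafting calls.
tools = {
    "search_literature": lambda q: f"3 papers found on '{q}'",
    "form_hypothesis":   lambda ctx: f"hypothesis derived from: {ctx}",
    "draft_report":      lambda hyp: f"REPORT: {hyp}",
}
log = scientific_agent("catalyst degradation", tools,
                       plan=["search_literature", "form_hypothesis", "draft_report"])
for step, obs in log:
    print(step, "->", obs)
```

The returned log is the agent's auditable "chain of reasoning": keeping every intermediate observation is what allows a human supervisor to validate the process, not just the final report.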
4.3. Critical Assessment of ChatGPT's Role in Advancing Research
- Acceleration and Efficiency: AI has the potential to dramatically accelerate research by automating time-consuming tasks, allowing researchers to focus on higher-level conceptual work (Dai et al., 2023; Fill et al., 2023; Li et al., 2023; Noy & Zhang, 2023; Rice et al., 2024).
- Accuracy and Reliability Concerns: The propensity for hallucinations and bias is a major concern that necessitates rigorous validation of all AI-generated outputs (Bender et al., 2021; OpenAI, 2024a; Rice et al., 2024). This risk is magnified for autonomous agents, where acting on a single hallucinated fact could derail an entire research workflow.
- The Indispensable Role of Human Expertise: Human expertise remains crucial for critical evaluation, contextual understanding, and ensuring methodological soundness (Dai et al., 2023; Fill et al., 2023; Li et al., 2023; Noy & Zhang, 2023; Rice et al., 2024). As research becomes more automated, the human role shifts from task execution to high-level strategic direction and critical supervision of the AI's process and outputs.
4.4. Advancing Method and Theory in AI-Augmented Knowledge Discovery
- Frameworks like GoAI (Gao et al., 2025) exemplify a move toward structured methodologies that combine LLMs with KGs for more transparent idea generation.
- The concept of LLMs "simulating abductive reasoning" (Glickman & Zhang, 2024) suggests a new theoretical lens for understanding how these models contribute to scientific insight, moving beyond pattern matching toward computational reasoning.
5. Revolutionizing Education and Training: ChatGPT's Global Impact on Pedagogy, Assessment, and Equity
5.1. Applications in Education
- Personalized Learning: A primary application is facilitating personalized learning experiences. ChatGPT can adapt content, offer real-time feedback, and function as a virtual tutor available 24/7 (Davar et al., 2025; Li, 2025).
- Curriculum and Lesson Planning: Educators use ChatGPT to assist in designing courses, developing lesson plans, and visualizing theoretical concepts in practical settings (Li, 2025; Li et al., 2025).
- Innovative Student Assessment: ChatGPT is being explored for generating diverse assessment items and designing tasks that promote critical thinking (Davar et al., 2025). GenAI can also personalize assessments and feedback based on learner responses (Arslan et al., 2024).
- Teaching Aids and Interactive Tools: The technology can be harnessed to develop engaging teaching aids, virtual instructors, and interactive simulations (Davar et al., 2025).
- Support for Diverse Learners: ChatGPT enhances accessibility for students with disabilities and multilingual learners through translation and simplification (Chan et al., 2024).
- Autonomous Learning Companions and Agents: The next evolutionary step is the deployment of AI agents as personalized learning companions. These agents go beyond tutoring by autonomously managing a student's long-term learning journey. They can co-design study plans, curate resources from vast digital libraries, schedule tasks, and proactively adapt strategies based on performance, transforming the learning process into a continuous, interactive dialogue (Molenaar, 2024; Salesforce, 2025).
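The "proactively adapt strategies based on performance" behavior of a learning agent can be sketched as a tiny re-planning rule: mastered topics are dropped and the weakest topics move to the front of the study plan. The topics, scores, and mastery threshold are hypothetical.

```python
def adapt_plan(scores, topics, mastery=0.8):
    """Re-prioritize topics: weakest first, mastered topics dropped -
    a toy version of an agent adapting a study plan to performance."""
    remaining = [(scores.get(t, 0.0), t) for t in topics
                 if scores.get(t, 0.0) < mastery]
    return [t for _, t in sorted(remaining)]

scores = {"algebra": 0.9, "geometry": 0.55, "statistics": 0.7}
plan = adapt_plan(scores, ["algebra", "geometry", "statistics"])
print(plan)  # ['geometry', 'statistics']
```

A deployed companion would layer scheduling, resource curation, and dialogue on top of this loop, but the core adaptive mechanism is this kind of performance-driven re-planning.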
5.2. Impact on Critical Thinking, Academic Integrity, and Ethics
- Critical Thinking: A dichotomy exists where AI can either be used to generate thought-provoking prompts that foster analysis or, through over-reliance, erode students' ability to think deeply (Mohammed, 2025; Dempere et al., 2023). Concerns persist that students may become cognitively passive (Alghazo et al., 2025). The introduction of AI agents deepens this concern, as they could automate not just the answers but the entire process of inquiry and discovery, potentially deskilling students in research and problem-solving (Zawacki-Richter et al., 2019).
- Academic Integrity: The risk of plagiarism with AI-generated text is a primary concern (Dempere et al., 2023; Mohammed, 2025). With agents, this evolves from verifying authorship of text to verifying authorship of action. Strategies to uphold integrity must shift toward assessments that are inherently human-centric, such as project-based work and oral examinations (Mohammed, 2025).
- Ethical Challenges: Broader ethical issues include data privacy, equity, and potential biases in AI content (Dempere et al., 2023). Agentic AI introduces new dilemmas regarding student autonomy and data sovereignty. An agent managing a student's learning collects vast amounts of sensitive performance and behavioral data, raising critical questions about consent, surveillance, and how that data is used to shape a student’s educational future (Prinsloo, 2020).
5.3. Global Perspectives and Educational Equity
- Diverse International Perceptions: Studies from regions like Pakistan and Indonesia reveal mixed student perceptions, balancing the benefits of ChatGPT as an AI assistant with concerns about its impact on deep thinking and integrity (Alghazo et al., 2025; Adiyono et al., 2025).
- Democratization vs. Digital Divide: ChatGPT has the potential to democratize education by providing widespread access to high-quality learning resources (Li, 2025). However, it also risks exacerbating the digital divide if access to technology, internet, and AI literacy are inequitably distributed (Chan et al., 2024). The advent of powerful, resource-intensive learning agents could create a new, more profound equity gap between students who have access to personalized autonomous tutors and those who do not (UNESCO, 2023).
- Cultural Context and Bias: LLMs trained on predominantly Western datasets may perpetuate cultural biases (Dempere et al., 2023). While AI can be used to decolonize curricula, this requires careful human oversight to avoid reinforcing existing biases (Chan et al., 2024).
5.4. Advancing Educational Research, Theories, and Pedagogical Models
- Revisiting Learning Theories: ChatGPT's capabilities challenge and offer new lenses through which to view learning theories such as constructivism (where students actively construct knowledge, potentially aided by AI tools) (Li et al., 2025) and self-determination theory (exploring AI's impact on student autonomy, competence, and relatedness) (Alghazo et al., 2025).
- Transforming Assessment Paradigms: Traditional assessment methods are being questioned. There is a call for innovative assessment strategies that emphasize higher-order thinking, creativity, and authentic application of knowledge, rather than tasks easily outsourced to AI (Dempere et al., 2023). This includes exploring personalized, adaptive assessments leveraging GenAI (Arslan et al., 2024).
- Methodological Rigor in AI-in-Education Research: There is a critical need for methodological rigor in studying AI's impact on education. Researchers must carefully define experimental treatments, establish appropriate control groups, and use valid outcome measures that genuinely reflect learning, avoiding pitfalls of earlier "media/methods" debates where technology effects were often confounded with instructional design (Weidlich & Gašević, 2025).
- Developing New Pedagogical Models: The situation calls for the development of new pedagogical models that constructively integrate AI. This involves training educators and students in AI literacy, prompt engineering skills, and the critical evaluation of AI-generated outputs, and designing learning experiences that leverage AI as a tool for enhancing human intellect and creativity, rather than replacing it (Kasneci et al., 2023; Zhai, 2023).
6. Engineering New Frontiers with ChatGPT: Advancing Design, Optimization, and Methodological Frameworks
6.1. Applications in Engineering Disciplines
- Software Engineering: LLMs are used for code generation, debugging, automated code review, and documentation, with experts reporting significant time savings (Neveditsin et al., 2025; Rawat et al., 2024). LLMs can also assist in translating natural language requirements into code (Yadav et al., 2025).
- Building Information Modeling (BIM), Architecture, and Civil Engineering: ChatGPT is explored for semantic search, information retrieval, and task planning (Yu et al., 2025). RAG has proven effective in helping ChatGPT apply localized BIM guidelines (Yu et al., 2025).
- Mechanical, Industrial, and General Engineering Design: LLMs assist in idea generation, conceptual design, and formulating engineering optimization problems (Vu et al., 2025; Jiang et al., 2025).
- Geotechnical Engineering: ChatGPT can generate finite element analysis (FEA) code for modeling complex processes, though its effectiveness varies based on the programming library used, underscoring its role as an assistant (Kim et al., 2025).
- Control Systems Engineering: Studies show ChatGPT can pass undergraduate control systems courses but struggles with open-ended projects requiring deep synthesis and practical judgment (Puthumanaillam & Ornik, 2025).
- Automated Design and Analysis with Engineering Agents: The next frontier is the deployment of engineering agents. These are autonomous systems that can manage complex, multi-step engineering workflows. For example, an agent could be tasked with a high-level goal, such as designing a mechanical part, and then autonomously generate design options, use software tools to run simulations (e.g., FEA), interpret the results, and iterate on the design until specifications are met (Wang et al., 2023).
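The generate-simulate-iterate loop of an engineering agent can be illustrated with a deliberately simple stand-in: the "simulation" below is a toy axial-stress formula (force / area), not real FEA, and the load, geometry, and revision rule are invented for the example.

```python
def design_agent(target_stress_mpa, initial_thickness_mm, max_iters=20):
    """Iteratively thicken a part until a toy stress model meets spec -
    a stand-in for an agent driving a real FEA tool in a design loop."""
    LOAD_N, WIDTH_MM = 1000.0, 20.0
    thickness = initial_thickness_mm
    for i in range(max_iters):
        # Toy "simulation": axial stress = force / cross-sectional area.
        stress = LOAD_N / (WIDTH_MM * thickness)  # N/mm^2, i.e. MPa
        if stress <= target_stress_mpa:
            return {"thickness_mm": thickness, "stress_mpa": stress,
                    "iterations": i + 1}
        thickness *= 1.1  # agent's design revision: 10% thicker
    raise RuntimeError("spec not met within iteration budget")

result = design_agent(target_stress_mpa=25.0, initial_thickness_mm=1.0)
print(result)
```

Two features of the loop generalize to real agentic workflows: an explicit acceptance criterion the agent checks after every simulation, and a bounded iteration budget so a mis-specified goal fails loudly instead of running forever.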
6.2. Theoretical Constructs and Novel Engineering Methodologies
- Prompt Engineering for Optimization: Effective problem formulation using ChatGPT relies heavily on sophisticated prompt engineering and sequential learning approaches (Vu et al., 2025).
- Human-LLM Design Practices: Comparative studies are yielding insights into LLM strengths (e.g., breadth of ideation) and weaknesses (e.g., design fixation), leading to recommendations for structured design processes with human oversight (Ege et al., 2025).
- Cognitive Impact on Design Thinking: Research is exploring how AI influences designers' cognitive processes, such as fostering thinking divergence and fluency (Jiang et al., 2025).
- LLMs in Systems Engineering (SE): While LLMs can generate SE artifacts, there are significant risks, including tendencies towards "premature requirements definition" and "unsubstantiated numerical estimates" (Topcu et al., 2025). These risks are magnified in autonomous agentic systems where flawed assumptions could propagate through an entire automated workflow.
- Methodologies for Agentic Workflows: The rise of engineering agents necessitates new methodologies for managing human-agent and agent-agent collaboration. This includes designing frameworks for task decomposition, tool selection, and process validation to ensure the reliability and safety of autonomous engineering systems (Team, 2024).
6.3. Impact on Engineer Productivity and Future Practice
- Productivity Gains: Studies report significant productivity increases from using LLMs for tasks like code generation and drafting (Yadav et al., 2025). The shift toward agentic AI promises to extend these gains from task assistance to end-to-end workflow automation (Rawat et al., 2024).
- Concerns and Challenges: Concerns exist about over-dependence on AI, which could lead to skill degradation, and anxieties about job security (Yadav et al., 2025). The need for human oversight remains critical due to potential inaccuracies and biases (Ray, 2023).
- Preparing Future Engineers: Engineering curricula must adapt to prepare students for workplaces where GenAI tools are prevalent. This includes teaching AI literacy, prompt engineering, and the critical evaluation of AI outputs to ensure they can effectively supervise and collaborate with AI systems (Murray et al., 2025).
6.4. Advancing Engineering Methodologies and Theoretical Frameworks
- Agent-Assisted Engineering Frameworks: There is an opportunity to develop structured frameworks that explicitly integrate AI agents at various stages of the engineering design process. These frameworks would define roles, responsibilities, and interaction protocols for human engineers and their agentic counterparts.
- Theories of AI-Robustness in Design: The identification of LLM failure modes (Topcu et al., 2025) can inform new theories around "AI-robustness" to predict and mitigate risks associated with using AI in critical applications.
7. Navigating the AI Revolution: Themes, Tensions, Critical Gaps, and Future Directions
7.1. Common Themes Across Domains
7.2. Synthesis of Themes and Identification of Critical Research Gaps
- Natural Language Understanding (NLU): The "Specialization vs. Generalization Tension" persists. A fundamental gap lies in discerning genuine semantic understanding versus sophisticated pattern matching (Shormani, 2024; Katzir, 2023; Baroni, 2020; Lake, 2019). This gap becomes a critical safety concern for agentic systems that must act reliably based on their understanding of commands and environmental cues. The lack of explainability hinders trust and theoretical advancement, a problem that becomes acute when an agent's reasoning cannot be audited (Achiam et al., 2023; Liu et al., 2023; Sapkota et al., 2025).
- Content Generation: The "Quality-Scalability-Ethics Trilemma" is a core challenge (Dempere et al., 2023; Gamage et al., 2023). With the rise of agentic AI, this trilemma intensifies, as the potential for autonomous systems to act unethically at scale poses a far greater risk than generating harmful text alone. New technical solutions and legal frameworks are urgently needed to govern the actions of these agents (Ballardini et al., 2019; Craig, 2022).
- Knowledge Discovery: The "Black Box Conundrum" hinders the validation of AI-generated insights (Dai et al., 2023; Noy & Zhang, 2023). When a scientific agent autonomously conducts a research workflow, the need for a transparent and reproducible "chain of reasoning" becomes paramount for scientific integrity.
- Education: The "Pedagogical Adaptation Imperative" demands a shift in focus to skills that complement AI. A critical gap is the lack of research on how to educate students to collaborate with and critically supervise learning agents without sacrificing their own cognitive autonomy (Weidlich & Gašević, 2025; Kasneci et al., 2023). Ensuring equitable access to powerful learning agents is crucial to prevent a widening of educational disparities (Sabzalieva & Valentini, 2023).
- Engineering: The "Human-LLM Cognitive Symbiosis" must evolve into robust human-agent teaming. A major gap exists in developing validation techniques for agents in safety-critical applications and creating theoretical frameworks for trust and responsibility in these collaborative systems (Topcu et al., 2025; Miller, 2023).
7.3. Proposal of a Forward-Looking Research Agenda
7.3.1. Methodological Advancements
- NLU: Develop benchmarks that assess "deep understanding" and robust reasoning, critical for safe agentic behavior.
- Content & Action Generation: Design adaptive quality and ethical control frameworks that are integrated directly into an agent's decision-making loop.
- Knowledge Discovery: Develop and validate rigorous protocols for human supervision of AI-assisted hypothesis generation and experimentation.
- Education: Conduct longitudinal studies on the impact of learning agents on cognitive development. Design and test AI literacy curricula focused on human-agent collaboration.
- Engineering: Formulate comprehensive testing and validation protocols for agents used in safety-critical design tasks and implement robust human-in-the-loop control frameworks.
- Cross-Domain Methodologies for Agentic Systems: A crucial priority is to develop standardized safety protocols, robust and intuitive human-in-the-loop control mechanisms, and secure "sandboxing" environments for testing the behavior of autonomous agents before deployment in real-world settings.
7.3.2. Theoretical Advancements
- NLU: Formulate theories of "Explainable Generative NLU" to make agent reasoning transparent.
- Content & Action Generation: Develop "Ethical AI Agency Frameworks" that provide a theoretical basis for guiding the responsible actions of autonomous systems.
- Knowledge Discovery: Propose "Computational Creativity Theories" to explain how AI agents contribute to novel discovery.
- Education: Build "AI-Augmented Learning Theories" that model how students learn effectively in partnership with AI agents, exploring frameworks like "Cyborg Pedagogy."
- Engineering: Conceptualize "Human-Agent Symbiotic Engineering Theories" that define principles for shared cognition and distributed responsibility in human-agent teams.
- Theories of Trustworthy Autonomy and Governance: An overarching theoretical challenge is to develop robust theories of human-agent teaming, create computational models for agent accountability, and design governance frameworks for multi-agent ecosystems where agents interact with each other and with society (Xi et al., 2023).
7.4. Practical Implications for Method, Theory, and Practice
- Method: The identified limitations necessitate new methodological approaches. This includes developing robust validation protocols for both generated content and agentic actions, advancing techniques like Retrieval-Augmented Generation (RAG) to ground agent knowledge (Lewis et al., 2020), and establishing prompt engineering as a core skill for effective human-agent interaction. Crucially, new methods are needed for designing, testing, and ensuring the safety and reliability of complex, multi-step agentic workflows (Team, 2024).
- Theory: The challenges and emergent interactions demand new theoretical frameworks. These include theories for Explainable NLU, Responsible Generative Efficiency, and AI-assisted Abductive Reasoning. In education and engineering, this means developing AI-Augmented Learning Theories and Human-Agent Symbiotic Engineering Theories. These frameworks are the essential theoretical underpinnings for building trustworthy and beneficial AI agents. Overarching this is the need for Co-evolutionary AI Development Frameworks that model the interplay between technical and ethical progress, which is paramount for guiding agentic systems (Floridi & Nobre, 2024).
- Practice: The practical implications are vast, requiring significant adaptation. This includes revising educational pedagogy to focus on skills like critical thinking and AI literacy, training professionals in human-agent teaming (HAT) (Seeber et al., 2020), implementing rigorous quality assurance for AI outputs, and prioritizing ethical design and bias mitigation. The shift in practice is from using AI as a tool to leveraging it as a cognitive partner; this partnership is evolving into one where humans provide strategic oversight and ethical judgment for increasingly autonomous AI agents (Shneiderman, 2022).
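As a rough illustration of the RAG pattern named under Method, the sketch below grounds a prompt in retrieved passages before any generation step. Everything here is a toy assumption: the token-overlap retriever, corpus strings, and prompt template stand in for the dense vector search and LLM call a production system would use.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by naive token overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the model's answer is grounded in sources."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "ChatGPT-4 passed Korea's BIM exam with an 85% average.",
    "RAG grounds model output in retrieved documents.",
    "Prompt engineering shapes model behaviour.",
]
prompt = build_prompt("What did RAG ground model output in?", corpus)
assert "retrieved documents" in prompt
```

The grounding step is what distinguishes this from plain prompting: the model is constrained to cited context, which is the mechanism the Method bullet invokes for reducing hallucination in agent knowledge.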
8. Limitation of this Critical Review Study
9. Conclusions
References
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F. L.; McGrew, B. GPT-4 technical report. arXiv arXiv:2303.08774, 2023.
- Adiyono, A.; Al Matari, A. S.; Dalimarta, F. F. Analysis of Student Perceptions of the Use of ChatGPT as a Learning Media: A Case Study in Higher Education in the Era of AI-Based Education. Journal of Education and Teaching (JET) 2025, 6, 306–324. [Google Scholar]
- Akhtarshenas, A.; Dini, A.; Ayoobi, N. ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models. arXiv arXiv:2503.17403, 2025.
- Al Naqbi, H.; Bahroun, Z.; Ahmed, V. Enhancing work productivity through generative artificial intelligence: A comprehensive literature review. Sustainability 2024, 16, 1166. [Google Scholar] [CrossRef]
- Albadarin, Y.; Saqr, M.; Pope, N.; Tukiainen, M. A systematic literature review of empirical research on ChatGPT in education. Discover Education 2024, 3, 60. [Google Scholar] [CrossRef]
- Alghazo, R.; Fatima, G.; Malik, M.; Abdelhamid, S. E.; Jahanzaib, M.; Raza, A. Exploring ChatGPT's Role in Higher Education: Perspectives from Pakistani University Students on Academic Integrity and Ethical Challenges. Education Sciences 2025, 15. [Google Scholar] [CrossRef]
- Arslan, B.; Lehman, B.; Tenison, C.; Sparks, J. R.; López, A. A.; Gu, L.; Zapata-Rivera, D. Opportunities and challenges of using generative AI to personalize educational assessment. Frontiers in Artificial Intelligence 2024, 7, 1460651. [Google Scholar]
- Arvidsson, S.; Axell, J. (2023). Prompt engineering guidelines for LLMs in Requirements Engineering.
- Atchley, P.; Pannell, H.; Wofford, K.; Hopkins, M.; Atchley, R.A. Human and AI collaboration in the higher education environment: Opportunities and concerns. Cognitive Research: Principles and Implications 2024, 9, 20. [Google Scholar] [CrossRef]
- Ballardini, R. M.; He, K.; Roos, T. (2019). AI-generated content: authorship and inventorship in the age of artificial intelligence. In Online Distribution of Content in the EU (pp. 117-135). Edward Elgar Publishing.
- Baroni, M. Linguistic generalization and compositionality in modern artificial neural networks. Philosophical Transactions of the Royal Society B 2020, 375, 20190307. [Google Scholar] [CrossRef]
- Belzner, L.; Gabor, T.; Wirsing, M. (2023, October). Large language model assisted software engineering: prospects, challenges, and a case study. In International Conference on Bridging the Gap between AI and Reality (pp. 355-374). Cham: Springer Nature Switzerland.
- Bender, E. M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. (2021, March). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). [Google Scholar]
- Bhagavatula, C.; Bras, R. L.; Malaviya, C. Abductive commonsense reasoning. arXiv arXiv:1908.05739, 2019.
- Boiko, D. A.; MacKnight, R.; Gomes, G. (2023). Emergent autonomous scientific research capabilities of large language models. arXiv. [CrossRef]
- Bran, A.; Cox, S. R.; Schilter, P. (2024). ChemCrow: Augmenting large-language models with a tool-set for chemistry. arXiv. [CrossRef]
- Chan, R. Y.; Sharma, S.; Bista, K. (Eds.). (2024). ChatGPT and Global Higher Education: Using Artificial Intelligence in Teaching and Learning. STAR Scholars Press.
- Chang, X.; Dai, G.; Di, H.; Ye, H. Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection. arXiv arXiv:2504.16125, 2025.
- Chen, B.; Zhang, Z.; Langrené, N.; Zhu, S. (2025). Unleashing the potential of prompt engineering for large language models. Patterns.
- Cotton, D. R.; Cotton, P. A.; Shipway, J. R. Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International 2024, 61, 228–239. [Google Scholar] [CrossRef]
- Craig, C.J. (2022). The AI-copyright challenge: Tech-neutrality, authorship, and the public interest. In Research handbook on intellectual property and artificial intelligence (pp. 134-155). Edward Elgar Publishing.
- Dagdelen, J.; Dunn, A.; Lee, S.; Walker, N.; Rosen, A. S.; Ceder, G.; Jain, A. Structured information extraction from scientific text with large language models. Nature Communications 2024, 15, 1418. [Google Scholar] [PubMed]
- Dai, W.; Lin, J.; Jin, H.; Li, T.; Tsai, Y. S.; Gašević, D.; Chen, G. (2023, July). Can large language models provide feedback to students? A case study on ChatGPT. In 2023 IEEE International Conference on Advanced Learning Technologies (ICALT) (pp. 323-325). IEEE.
- Daun, M.; Brings, J. (2023, June). How ChatGPT will change software engineering education. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 (pp. 110-116). [Google Scholar]
- Davar, N. F.; Dewan, M. A. A.; Zhang, X. AI chatbots in education: challenges and opportunities. Information 2025, 16, 235. [Google Scholar]
- Dempere, J.; Modugu, K.; Hesham, A.; Ramasamy, L.K. (2023, September). The impact of ChatGPT on higher education. In Frontiers in Education (Vol. 8, p. 1206936). Frontiers Media SA.
- Dimeli, M.; Kostas, A. The Role of ChatGPT in Education: Applications, Challenges: Insights From a Systematic Review. Journal of Information Technology Education: Research 2025, 24, 2. [Google Scholar] [CrossRef]
- Dovesi, D.; Malandri, L.; Mercorio, F.; Mezzanzanica, M. A survey on explainable AI for Big Data. Journal of Big Data 2024, 11, 6. [Google Scholar] [CrossRef]
- Dwivedi, Y. K.; Kshetri, N.; Hughes, L.; Slade, E. L.; Jeyaraj, A.; Kar, A. K.; Wright, R. Opinion Paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management 2023, 71, 102642. [Google Scholar]
- Ege, D. N.; Øvrebø, H. H.; Stubberud, V.; Berg, M. F.; Elverum, C.; Steinert, M.; Vestad, H. ChatGPT as an inventor: Eliciting the strengths and weaknesses of current large language models against humans in engineering design. AI EDAM 2025, 39, e6. [Google Scholar]
- Elkatmis, M. ChatGPT and Creative Writing: Experiences of Master's Students in Enhancing. International Journal of Contemporary Educational Research 2024, 11, 321–336. [Google Scholar] [CrossRef]
- Empirical Methods in Natural Language Processing. (2024). The 2024 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.
- Eymann, V.; Lachmann, T.; Czernochowski, D. When ChatGPT Writes Your Research Proposal: Scientific Creativity in the Age of Generative AI. Journal of Intelligence 2025, 13, 55. [Google Scholar] [CrossRef]
- Fill, H. G.; Fettke, P.; Köpke, J. Conceptual modeling and large language models: impressions from first experiments with ChatGPT. Enterprise Modelling and Information Systems Architectures (EMISAJ) 2023, 18, 1–15. [Google Scholar]
- Fisher, J. (2025, May). ChatGPT for Legal Marketing: 6 Ways to Unlock the Power of AI. AI-CASEpeer. https://www.casepeer.
- Floridi, L.; Nobre, C. Artificial intelligence, and the new challenges of anticipatory governance. Ethics and Information Technology 2024, 26, 24. [Google Scholar] [CrossRef]
- Gabashvili, I.S. The impact and applications of ChatGPT: a systematic review of literature reviews. arXiv arXiv:2305.18086, 2023.
- Gamage, K. A.; Dehideniya, S. C.; Xu, Z.; Tang, X. ChatGPT and higher education assessments: More opportunities than concerns? Journal of Applied Learning and Teaching 2023, 6, 358–369. [Google Scholar]
- Gao, R.; Yu, D.; Gao, B.; Hua, H.; Hui, Z.; Gao, J.; Yin, C. Legal regulation of AI-assisted academic writing: challenges, frameworks, and pathways. Frontiers in Artificial Intelligence 2025, 8, 1546064. [Google Scholar] [CrossRef]
- Gao, X.; Zhang, Z.; Xie, M.; Liu, T.; Fu, Y. Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation. arXiv arXiv:2503.08549, 2025.
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, H. Retrieval-augmented generation for large language models: A survey. ACM Computing Surveys 2024, 57, 1–46. [Google Scholar] [CrossRef]
- Garbuio, M.; Lin, N. Innovative idea generation in problem finding: Abductive reasoning, cognitive impediments, and the promise of artificial intelligence. Journal of Product Innovation Management 2021, 38, 701–725. [Google Scholar] [CrossRef]
- Garg, R. K.; Urs, V. L.; Agarwal, A. A.; Chaudhary, S. K.; Paliwal, V.; Kar, S. K. Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promotion Perspectives 2023, 13, 183. [Google Scholar]
- Garousi, V. Why you shouldn't fully trust ChatGPT: A synthesis of this AI tool's error rates across disciplines and the software engineering lifecycle. arXiv arXiv:2504.18858, 2025.
- Glickman, M.; Zhang, Y. AI and generative AI for research discovery and summarization. arXiv arXiv:2401.06795, 2024. [CrossRef]
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Poon, H. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH) 2021, 3, 1–23. [Google Scholar] [CrossRef]
- Hadi, M. U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M. B.; Mirjalili, S. Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints 2023, 1, 1–26. [Google Scholar]
- Hagendorff, T. A virtue ethics-based framework for the corporate ethics of AI. AI and Ethics 2024, 4, 653–666. [Google Scholar] [CrossRef]
- Haltaufderheide, J.; Ranisch, R. ChatGPT and the future of academic publishing: A perspective. The American Journal of Bioethics 2024, 24, 4–11. [Google Scholar]
- Haman, M.; Školník, M. Using ChatGPT for scientific literature review: a case study. IASL 2024, 1, 1–13. [Google Scholar]
- Hannigan, T. R.; McCarthy, I. P.; Spicer, A. Beware of botshit: How to manage the epistemic risks of generative chatbots. Business Horizons 2024, 67, 471–486. [Google Scholar]
- Hariri, W. Unlocking the potential of ChatGPT: A comprehensive exploration of its applications, advantages, limitations, and future directions in natural language processing. arXiv arXiv:2304.02017, 2023.
- Herman, S. (2025, February). The art of the prompt: AI prompts for every creative. Adobe. https://www.adobe.com/creativecloud/ai/discover/ai-prompts.
- Hodge Jr, S.D. Revolutionizing Justice: Unleashing the Power of Artificial Intelligence. SMU Sci. & Tech. L. Rev. 2023, 26, 217. [Google Scholar]
- Hu, Y.; Lu, Y. Rag and rau: A survey on retrieval-augmented language model in natural language processing. arXiv arXiv:2404.19543, 2024.
- Huang, J.; Chang, K.C.C. Towards reasoning in large language models: A survey. arXiv arXiv:2212.10403, 2022.
- Hupkes, D.; Dankers, V.; Mul, M.; Bruni, E. Compositionality decomposed: How do neural networks generalise? Journal of Artificial Intelligence Research 2020, 67, 757–795. [Google Scholar] [CrossRef]
- Imran, M.; Almusharraf, N. (2023). Analyzing the role of ChatGPT in facilitating the process of literature review. Available at SSRN 4404768.
- Infosys Limited. (2023). A perspective on ChatGPT, Its Impact and Limitations. https://www.infosys.com/techcompass/documents/perspective-chatgpt-impact-limitations.
- Isiaku, L.; Muhammad, A. S.; Kefas, H. I.; Ukaegbu, F. C. Enhancing technological sustainability in academia: leveraging ChatGPT for teaching, learning and evaluation. Quality Education for All 2024, 1, 385–416. [Google Scholar]
- Jiang, C.; Huang, R.; Shen, T. Generative AI-Enabled Conceptualization: Charting ChatGPT’s Impacts on Sustainable Service Design Thinking With Network-Based Cognitive Maps. Journal of Computing and Information Science in Engineering 2025, 25. [Google Scholar] [CrossRef]
- Jiang, Y.; Hao, J.; Fauss, M.; Li, C. Detecting ChatGPT-generated essays in a large-scale writing assessment: Is there a bias against non-native English speakers? Computers & Education 2024, 217, 105070. [Google Scholar]
- Johnson, S.; Acemoglu, D. (2023). Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity. Hachette UK.
- Kasneci, E.; Seßler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Kasneci, G. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and individual differences 2023, 103, 102274. [Google Scholar]
- Katzir, R. (2023). Why large language models are poor theories of human linguistic cognition. A reply to Piantadosi (2023). Manuscript, Tel Aviv University. https://lingbuzz.net/lingbuzz/007190.
- Kaushik, A.; Yadav, S.; Browne, A.; Lillis, D.; Williams, D.; Donnell, J. M.; Arora, M. Exploring the Impact of Generative Artificial Intelligence in Education: A Thematic Analysis. arXiv arXiv:2501.10134, 2025. [Google Scholar]
- Keysers, D.; Schärli, N.; Scales, N.; Buisman, H.; Furrer, D.; Kashubin, S.; Bousquet, O. Measuring compositional generalization: A comprehensive method on realistic data. arXiv arXiv:1912.09713, 2019.
- Kim, T.; Yun, T. S.; Suh, H. S. (2025). Can ChatGPT implement finite element models for geotechnical engineering applications? International Journal for Numerical and Analytical Methods in Geomechanics.
- Knoth, C. H.; Kieslich, K.; Fraumann, G.; Gfrereis, P. Opportunities and limits of using large language models in evidence synthesis: a descriptive case study. Systematic Reviews 2024, 13, 1–13. [Google Scholar]
- Lake, B.M. Compositional generalization through meta sequence-to-sequence learning. Advances in neural information processing systems 2019, 32. [Google Scholar]
- Lake, B.; Baroni, M. (2018, July). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In International conference on machine learning (pp. 2873-2882). PMLR.
- Lake, B. M.; Baroni, M. Human-like systematic generalization through a meta-learning neural network. Nature 2023, 623, 115–121. [Google Scholar] [CrossRef] [PubMed]
- Lee, H. The rise of ChatGPT: Exploring its potential in medical education. Anatomical sciences education 2024, 17, 926–931. [Google Scholar] [CrossRef]
- Levitt, G.; Grubaugh, S. Artificial intelligence and the paradigm shift: Reshaping education to equip students for future careers. The International Journal of Social Sciences and Humanities Invention 2023, 10, 7931–7941. [Google Scholar] [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Kiela, D. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 2020, 33, 9459–9474. [Google Scholar]
- Li, M. The impact of ChatGPT on teaching and learning in higher education: challenges, opportunities, and future scope. Encyclopedia of Information Science and Technology, Sixth Edition 2025, 1-20.
- Li, R.; Liang, P.; Wang, Y.; Cai, Y.; Sun, W.; Li, Z. Unveiling the Role of ChatGPT in Software Development: Insights from Developer-ChatGPT Interactions on GitHub. arXiv arXiv:2505.03901, 2025.
- Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Liu, Y. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv arXiv:2305.13860, 2023.
- Liu, Y.; Kong, W.; Merve, K. ChatGPT applications in academic writing: a review of potential, limitations, and ethical challenges. Arquivos Brasileiros de Oftalmologia 2025, 88, e2024–0269. [Google Scholar] [CrossRef]
- Liu, Y.; Yao, Y.; Ton, J. F.; Zhang, X.; Guo, R.; Cheng, H.; Li, H. Trustworthy LLMs: a survey and guideline for evaluating large language models' alignment. arXiv arXiv:2308.05374, 2023. [Google Scholar]
- Magnani, L.; Arfini, S. (2024). Model-based abductive cognition: What thought experiments teach us. Logic Journal of the IGPL, jzae096.
- Marques, N.; Silva, R. R.; Bernardino, J. Using ChatGPT in software requirements engineering: A comprehensive review. Future Internet 2024, 16, 180. [Google Scholar]
- Marvin, G.; Hellen, N.; Jjingo, D.; Nakatumba-Nabende, J. (2023, June). Prompt engineering in large language models. In International conference on data intelligence and cognitive informatics (pp. 387-402). Singapore: Springer Nature Singapore.
- Mavrepis, P.; Makridis, G.; Fatouros, G.; Koukos, V.; Separdani, M. M.; Kyriazis, D. arXiv arXiv:2401.13110, 2024.
- Means, B.; Toyama, Y.; Murphy, R.; Bakia, M.; Jones, K. (2010). Evaluation of evidence-based practices in online learning: A meta-analysis and review of online learning studies. U.S. Department of Education.
- Melamed, R.; McCabe, L. H.; Wakhare, T. Prompts have evil twins. arXiv arXiv:2311.07064, 2023.
- Michel-Villarreal, R.; Vilalta-Perdomo, E.; Salinas-Navarro, D. E.; Thierry-Aguilera, R.; Gerardou, F. S. Challenges and opportunities of generative AI for higher education as explained by ChatGPT. Education Sciences 2023, 13, 856. [Google Scholar]
- Miller, D. Exploring the impact of artificial intelligence language model ChatGPT on the user experience. International Journal of Technology Innovation and Management (IJTIM) 2023, 3, 1–8. [Google Scholar]
- Mitra, M.; de Vos, M. G.; Cortinovis, N.; Ometto, D. (2024, September). Generative AI for Research Data Processing: Lessons Learnt From Three Use Cases. In 2024 IEEE 20th International Conference on e-Science (e-Science) (pp. 1-10). IEEE.
- Mohammed, A. (2025, March). Navigating the AI revolution: Safeguarding academic integrity and ethical considerations in the age of innovation. BERA. https://www.bera.ac.
- Molenaar, I. Human-AI co-regulation: A new focal point for the science of learning. Npj Science of Learning 2024, 9, 29. [Google Scholar] [CrossRef]
- Mostafapour, M.; Asoodar, M.; Asoodar, M. Advantages and disadvantages of using ChatGPT for academic literature review. Cogent Engineering 2024, 11, 2315147. [Google Scholar]
- Mozes, M.A.J. (2024). Understanding and Guarding against Natural Language Adversarial Examples (Doctoral dissertation, UCL (University College London)).
- Mozes, M.; He, X.; Kleinberg, B.; Griffin, L.D. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities. arXiv arXiv:2308.12833, 2023.
- Murray, M.; Maclachlan, R.; Flockhart, G. M.; Adams, R. European Journal of Engineering Education 2025, 1–26. [Google Scholar] [CrossRef]
- Naseem, U.; Dunn, A. G.; Khushi, M.; Kim, J. Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT. BMC Bioinformatics 2022, 23, 144. [Google Scholar]
- Naveed, J. (2025). Optimized Code Generation in BIM with Retrieval-Augmented LLMs.
- Neveditsin, N.; Lingras, P.; Mago, V. Clinical insights: A comprehensive review of language models in medicine. PLOS Digital Health 2025, 4, e0000800. [Google Scholar] [CrossRef]
- Nguyen, M. N.; Nguyen Thanh, B.; Vo, D. T. H.; Pham Thi Thu, T.; Thai, H.; Ha Xuan, S. (2023). Evaluating the Efficacy of Generative Artificial Intelligence in Grading: Insights from Authentic Assessments in Economics. Available at SSRN 4648790.
- Niloy, A. C.; Akter, S.; Sultana, N.; Sultana, J.; Rahman, S. I. U. Is ChatGPT a menace for creative writing ability? An experiment. Journal of Computer Assisted Learning 2024, 40, 919–930. [Google Scholar]
- Noy, S.; Zhang, W. Experimental evidence on the productivity effects of generative artificial intelligence. Science 2023, 381, 187–192. [Google Scholar] [CrossRef]
- OpenAI (2023a). GPT-3.5 Turbo. https://openai.
- OpenAI (2023b). GPT-4 Technical Report. https://openai.
- OpenAI (2023c). Safety & alignment. https://openai.
- OpenAI (2024a). ChatGPT FAQ. https://help.openai.
- OpenAI (2024c). Hello GPT-4o. https://openai.
- OpenAI (2024d). Introducing o1: Our next step in AI research. https://openai.
- OpenAI (2024e). o1-mini: Our best performing model on AIME. https://openai.
- OpenAI (2024f). o1-preview: Advanced reasoning in STEM. https://openai.
- OpenAI Help Center. (n.d.). What is the ChatGPT model selector? Retrieved 11 June 2025, from https://help.openai.
- Oremus, W. (2023). The clever trick that turns ChatGPT into its evil twin. The Washington Post. URL https://www.washingtonpost.com/technology/2023/02/14/chatgpt-dan-jailbreak/.
- Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying large language models and knowledge graphs: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2024, 14, e1518. [Google Scholar] [CrossRef]
- Pareschi, R. Abductive reasoning with the GPT-4 language model: Case studies from criminal investigation, medical practice, scientific research. Sistemi intelligenti 2023, 35, 435–444. [Google Scholar]
- Park, J. S.; O'Brien, J. C.; Cai, C. J.; Morris, M. R.; Liang, P.; Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (pp. 1–22). Association for Computing Machinery. [Google Scholar] [CrossRef]
- Perlman, A. The implications of ChatGPT for legal services and society. Mich. Tech. L. Rev. 2023, 30, 1. [Google Scholar]
- Perez, E.; Huang, S.; Song, F.; Cai, T.; Ring, R.; Aslanides, J.; Irving, G. Red teaming language models with language models. arXiv arXiv:2202.03286, 2022.
- Perez, F.; Ribeiro, I. Ignore previous prompt: Attack techniques for language models. arXiv arXiv:2211.09527, 2022.
- Preiksaitis, C.; Rose, C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR medical education 2023, 9, e48785. [Google Scholar] [CrossRef]
- Prinsloo, P. Data frontiers and frontiers of power in higher education: A view of the an/archaeology of data. Teaching in Higher Education 2020, 25, 394–412. [Google Scholar] [CrossRef]
- Puthumanaillam, G.; Ornik, M. The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own. arXiv arXiv:2503.05760, 2025.
- Rawat, A. S.; Fazzini, M.; George, T.; Gokulan, R.; Maddila, C.; Arrieta, A. A new era of software development: A survey on the impact of large language models. ACM Computing Surveys 2024, 57, 1–40. [Google Scholar] [CrossRef]
- Ray, P.P. ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems 2023, 3, 121–154. [Google Scholar] [CrossRef]
- RedBlink. (2025, April). Llama 4 vs ChatGPT: Comprehensive AI Models Comparison 2025. https://redblink.
- Reich, J. (2020). Failure to disrupt: Why technology alone can’t transform education. Harvard University Press.
- Rice, S.; Crouse, S. R.; Winter, S. R.; Rice, C. The advantages and limitations of using ChatGPT to enhance technological research. Technology in Society 2024, 76, 102426. [Google Scholar]
- Sabzalieva, E.; Valentini, A. (2023). ChatGPT and artificial intelligence in higher education: Quick start guide.
- Salesforce. (2025, June 23). AI agents in education: Benefits & use cases. Salesforce. https://www.salesforce.
- Sallam, M. (2023, March). ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. In Healthcare (Vol. 11, No. 6, p. 887). MDPI.
- Sapkota, R.; Raza, S.; Karkee, M. Comprehensive analysis of transparency and accessibility of chatgpt, deepseek, and other sota large language models. arXiv arXiv:2502.18505, 2025.
- Schiller, C.A. The human factor in detecting errors of large language models: A systematic literature review and future research directions. arXiv arXiv:2403.09743, 2024.
- Seeber, I.; Bittner, E.; Briggs, R. O.; de Vreede, T.; de Vreede, G.-J.; Elbanna, A.; Söllner, M. Machines as teammates: A research agenda on AI in team collaboration. Information & Management 2020, 57, 103174. [Google Scholar] [CrossRef]
- Shah, N.; Jain, S.; Lauth, J.; Mou, Y.; Bartsch, M.; Wang, Y.; Luo, Y. Can large language models reason about medical conversation? arXiv arXiv:2305.00412, 2023.
- Shinn, N.; Cassano, F.; Gopinath, A.; Narasimhan, K.; Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. arXiv. [CrossRef]
- Shneiderman, B. (2022). Human-centered AI. Oxford University Press.
- Shormani, M.Q. Non-native speakers of English or ChatGPT: Who thinks better? arXiv arXiv:2412.00457, 2024.
- Sivarajkumar, S.; Kelley, M.; Samolyk-Mazzanti, A.; Visweswaran, S.; Wang, Y. An empirical evaluation of prompting strategies for large language models in zero-shot clinical natural language processing: algorithm development and validation study. JMIR Medical Informatics 2024, 12, e55318. [Google Scholar] [CrossRef]
- Surden, H. ChatGPT, AI large language models, and law. Fordham L. Rev. 2023, 92, 1941. [Google Scholar]
- Susnjak, T.; McIntosh, J. Academic integrity in the age of ChatGPT. Change: The Magazine of Higher Learning 2024, 56, 21–27. [Google Scholar]
- Susskind, R.; Susskind, D. (2022). The future of the professions: How technology will transform the work of human experts. Oxford University Press.
- Team, A. (2024). The agentic design pattern: A new paradigm for building AI systems. Andreessen Horowitz. https://a16z.
- Thelwall, M. Evaluating research quality with large language models: an analysis of ChatGPT’s effectiveness with different settings and inputs. Journal of Data and Information Science 2024, 241218–241218. [Google Scholar] [CrossRef]
- Topcu, T. G.; Husain, M.; Ofsa, M.; Wach, P. (2025). Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes. Systems Engineering.
- US Department of Education, Office of Educational Technology. (2023). Artificial intelligence and the future of teaching and learning: Insights and recommendations. https://www.ed.gov/sites/ed/files/documents/ai-report/ai-report.
- UNESCO (2023). Guidance for generative AI in education and research. UNESCO. https://unesdoc.unesco.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems 30 (pp. 5998–6008). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.
- Veisi, O.; Bahrami, S.; Englert, R.; Müller, C. AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How. arXiv arXiv:2504.18044, 2025.
- Velásquez-Henao, J. D.; Franco-Cardona, C. J.; Cadavid-Higuita, L. Prompt Engineering: a methodology for optimizing interactions with AI-Language Models in the field of engineering. Dyna 2023, 90(SPE230), 9-17.
- Vu, N. G. H.; Wang, K. G. Effective prompting with ChatGPT for problem formulation in engineering optimization. Engineering Optimization 2025, 1–18. [Google Scholar]
- Wang, G.; Xie, Y.; Jiang, Y.; Mandlekar, A.; Xiao, C.; Zhu, Y.; Fan, L.; Anandkumar, A. (2023). Voyager: An open-ended embodied agent with large language models. arXiv. [CrossRef]
- Waseem, F.; Al-Ghamdi, D.; Al-Ghamdi, A.; Ahmad, I. Unlocking the potential of ChatGPT in requirements engineering: a study of benefits and challenges. Arabian Journal for Science and Engineering 2023, 1–15. [Google Scholar]
- Weidlich, J.; Gašević, D. (2025). ChatGPT in education: An effect in search of a cause. PsyArXiv Preprints.
- Weidinger, L.; Mellor, J.; Rauh, M.; Griffin, C.; Uesato, J.; Huang, P.-S.; Cheng, M.; Glaese, M.; Balle, B.; Kasirzadeh, A.; Kenton, Z.; Brown, S.; Hawkins, W.; Stepleton, T.; Biles, C.; Birhane, A.; Haas, J.; Laura, L.; Gabriel, I. (2024). An overarching risk analysis and management framework for frontier AI. arXiv. [CrossRef]
- Wiedemer, T.; Mayilvahanan, P.; Bethge, M.; Brendel, W. Compositional generalization from first principles. Advances in Neural Information Processing Systems 2023, 36, 6941–6960. [Google Scholar]
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Rush, A. M. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv arXiv:1910.03771, 2019. [Google Scholar]
- Wu, T.; He, S.; Liu, J.; Sun, Y.; Liu, K.; Han, T. X.; Zhao, J. (2024). A brief overview of the dark side of AI: The case of ChatGPT. Communications of the ACM.
- Wu, Y.; Zhao, Y.; Hu, B.; Minervini, P.; Stenetorp, P.; Riedel, S. An efficient memory-augmented transformer for knowledge-intensive nlp tasks. arXiv arXiv:2210.16773, 2022.
- Xi, Z.; Chen, W.; Guo, X.; He, H.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; Wang, R. (2023). The rise and potential of large language model based agents: A survey. arXiv. [CrossRef]
- Xue, J.; Zheng, M.; Hua, T.; Shen, Y.; Liu, Y.; Bölöni, L.; Lou, Q. Trojllm: A black-box trojan prompt attack on large language models. Advances in Neural Information Processing Systems 2023, 36, 65665–65677. [Google Scholar]
- Yadav, S.; Qureshi, A. M.; Kaushik, A.; Sharma, S.; Loughran, R.; Kazhuparambil, S.; Lillis, D. From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development--An Opinion Paper. arXiv arXiv:2503.07450, 2025. [Google Scholar]
- Yang, X.; Chen, A.; PourNejatian, N.; Shin, H. C.; Smith, K. arXiv arXiv:2203.03540, 2022.
- Yu, W. (2022, July). Retrieval-augmented generation across heterogeneous knowledge. In Proceedings of the 2022 conference of the North American chapter of the association for computational linguistics: human language technologies: student research workshop (pp. 52-58). [Google Scholar]
- Yu, Y.; Kim, S.; Lee, W.; Koo, B. Evaluating ChatGPT on Korea's BIM Expertise Exam and improving its performance through RAG. Journal of Computational Design and Engineering 2025, 12, 94–120. [Google Scholar] [CrossRef]
- Zakir, M. H.; Bashir, S.; Nisar, K.; Ibrahim, S.; Khan, N.; Khan, S. H. Navigating the Legal Labyrinth: Establishing Copyright Frameworks for AI-Generated Content. Remittances Review 2024, 9, 2515–2532. [Google Scholar]
- Zawacki-Richter, O.; Marín, V. I.; Bond, M.; Gouverneur, F. Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education 2019, 16, 39. [Google Scholar] [CrossRef]
- Zhai, X. (2023). ChatGPT for next generation science learning. Available at SSRN 4331313.
- Zhao, H.; Chen, H.; Yang, F.; Liu, N.; Deng, H.; Cai, H.; Du, M. Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology 2024, 15, 1–38. [Google Scholar] [CrossRef]
- Zhou, Y.; Muresanu, A.I.; Han, Z.; Paster, K.; Pitis, S.; Chan, H.; Ba, J. Large language models are human-level prompt engineers. In The Eleventh International Conference on Learning Representations, 2022. [Google Scholar]
- Zhu, D.; Chen, J.; Shen, X.; Li, X.; Elhoseiny, M. A survey on multimodal large language models. arXiv 2024. [CrossRef]
- Zhu, S.; Wang, Z.; Zhuang, Y.; Jiang, Y.; Guo, M.; Zhang, X.; Gao, Z. Exploring the impact of ChatGPT on art creation and collaboration: Benefits, challenges and ethical implications. Telematics and Informatics Reports 2024, 14, 100138. [Google Scholar] [CrossRef]

| Model Version | Key Architectural Features/Training Data Cutoff | Notable NLU Capabilities | Content Generation Strengths | Known Limitations | Key Benchmark Performance (Example) |
| --- | --- | --- | --- | --- | --- |
| ChatGPT-3.5 / 3.5-Turbo | Based on GPT-3.5, Text/Code (pre-2021/2023) (Infosys Limited, 2023; OpenAI, 2023a) | Basic text tasks, translation, conversational AI, faster responses (Infosys Limited, 2023; OpenAI, 2023a; Susskind & Susskind, 2022) | Dialogue, boilerplate tasks, initial drafts, summaries (Infosys Limited, 2023; OpenAI, 2023a; Susskind & Susskind, 2022) | Accuracy issues, bias, limited by training data cutoff, struggles with highly specialized tasks (Akhtarshenas et al., 2025; OpenAI, 2023a) | GLUE average score ~78.7% (comparable to BERT-base, lags RoBERTa-large) (Shormani, 2024). Passed Korea's BIM Expertise Exam with 65% average (Yu et al., 2025). Error rates in healthcare can be high (Ray, 2023). |
| ChatGPT-4 | Based on GPT-4, Text/Code (pre-2023) (Sapkota et al., 2025; Achiam et al., 2023; OpenAI, 2023b) | Multimodal (text), high precision, improved reasoning, expanded context window (Hariri, 2023; Achiam et al., 2023; OpenAI, 2023b; OpenAI, 2024b) | More coherent, contextually relevant text, complex conversations, nuanced topics (Hariri, 2023; Achiam et al., 2023; OpenAI, 2023b; OpenAI, 2024b) | Still prone to hallucinations, bias; costlier; specific weaknesses in areas like local guidelines without RAG (Ray, 2023; Bender et al., 2021; Achiam et al., 2023; OpenAI, 2023b; Nguyen et al., 2023) | Passed Korea's BIM Expertise Exam with 85% average (improved to 88.6% with RAG for specific categories) (Yu et al., 2025). Lower error rates in business/economics (~15-20%) compared to 3.5 (Ray, 2023). |
| GPT-4o / GPT-4o mini | Text/Code (pre-2024) (Hariri, 2023; OpenAI, 2024c) | Multimodal (text/image/audio/video), improved contextual awareness, advanced tokenization, cost-efficiency (mini) (Hariri, 2023; OpenAI, 2024c) | Richer, more interactive responses, real-time collaboration support (Hariri, 2023; OpenAI, 2024c) | Newer models, long-term limitations still under study, but likely share core LLM challenges. | GPT-4o slightly better than 3.5-turbo and 4o-mini on research quality score estimates (correlation 0.67 with human scores using title/abstract) (Thelwall, 2024). GPT-4o mini outperforms GPT-3.5 Turbo on MMLU (82% vs 69.8%) (RedBlink, 2025). |
| o1-series (o1-preview, o1-mini, o1) | STEM-focused data, some general data (pre-2024/2025) (Sapkota et al., 2025; OpenAI, 2024d) | System 2 thinking, PhD-level STEM reasoning (o1-preview), fast reasoning (o1-mini), full o1 reasoning and multimodality (o1) (Sapkota et al., 2025; OpenAI, 2024d) | Analytical rigor, hypothesis generation/evaluation (biology, math, engineering) (OpenAI Help Center, n.d.; OpenAI, 2024e) | Specialized for STEM, general capabilities relative to GPT-4o may vary. | o1-mini is best performing benchmarked model on AIME 2024 and 2025 (OpenAI Help Center, n.d.; OpenAI, 2024e). Used for generating Finite Element code in geotechnical engineering (Kim et al., 2025; OpenAI, 2024f). |
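The table above repeatedly credits retrieval-augmented generation (RAG) with the largest accuracy gains on domain-specific tasks (e.g., ChatGPT-4 rising from 85% to 88.6% on Korea's BIM Expertise Exam in Yu et al., 2025). The retrieval step behind that pattern can be sketched minimally as follows; the bag-of-words scoring, corpus, and prompt template here are illustrative assumptions, not the actual pipeline of Yu et al.

```python
# Minimal sketch of the retrieval step in Retrieval-Augmented Generation (RAG):
# rank a small document corpus against the user's question, then prepend the
# best-matching passages to the prompt so the LLM answers from grounded context.
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Bag-of-words term frequencies for a lowercased text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = tf_vector(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, tf_vector(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved passages so the model answers from supplied context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Production systems replace the term-frequency scoring with dense embeddings and a vector index, but the control flow (retrieve, then augment the prompt) is the same.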

| Application Area | Specific Use Cases | Documented Benefits | Key Challenges | Novel Methodological/Theoretical Implications |
| --- | --- | --- | --- | --- |
| Education | Personalized learning, virtual tutoring (Davar et al., 2025) | Tailored content, adaptive pacing, 24/7 support, increased engagement (Davar et al., 2025) | Over-reliance, reduced critical thinking, accuracy of information, data privacy, equity of access (AlAli & Wardat, 2024) | Development of "AI-Integrated Pedagogy"; re-evaluation of constructivist and self-determination learning theories in AI contexts. |
| | Curriculum/Lesson Planning (Li, 2025) | Efficiency for educators, idea generation, diverse material creation (Li, 2025) | Quality of AI suggestions, maintaining teacher creativity, potential for generic content (Li, 2025) | Frameworks for AI-assisted curriculum design that balance efficiency with pedagogical soundness and teacher agency. |
| | Student Assessment (Chan et al., 2024) | Generation of diverse quiz/exam questions, formative feedback, personalized assessment (Li, 2025) | Academic integrity (plagiarism), difficulty assessing true understanding, fairness of AI-generated assessments (Mohammed, 2025) | New assessment paradigms focusing on higher-order skills, process over product; ethical guidelines for AI in assessment. |
| Engineering | Software Engineering (Code generation, debugging, QA) (Yadav et al., 2025) | Increased developer productivity, reduced coding time, improved code quality (Yadav et al., 2025) | Accuracy of generated code, over-dependence, skill degradation, security risks, bias in code (Yadav et al., 2025) | "Human-LLM Cognitive Symbiosis" models for software development; AI-collaboration literacy for engineers. |
| | BIM/Architecture/Civil Engineering (Info retrieval, design visualization) (Yu et al., 2025) | Enhanced understanding of domain-specific knowledge (with RAG), task planning support (Yu et al., 2025) | Reliance on quality of RAG documents, need for domain expertise in prompt/RAG setup (Yu et al., 2025) | Methodologies for integrating LLMs with domain-specific knowledge bases (e.g., RAG) for specialized engineering tasks. |
| | Mechanical/Industrial Design (Ideation, prototyping, optimization) (Jiang et al., 2025) | Accelerated idea generation, exploration of diverse concepts, assistance in optimization problem formulation (Jiang et al., 2025) | Design fixation, unnecessary complexity, misinterpretation of feedback, unsubstantiated estimates (Ege et al., 2025) | "AI-Augmented Engineering Design" frameworks; theories of "AI-robustness" in design; understanding LLM impact on cognitive design processes. |
| | Geotechnical Engineering (Finite Element Analysis code generation) (Kim et al., 2025) | Assistance in implementing numerical models, especially with high-level libraries (Kim et al., 2025) | Extensive human intervention needed for low-level programming or complex problems; requires user expertise (Kim et al., 2025) | Frameworks for human-AI collaboration in complex numerical modeling and simulation. |
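Several rows above describe the same workflow: an LLM drafts engineering code, but extensive human intervention and automated validation are still required (Kim et al., 2025; Yadav et al., 2025). A hedged sketch of that human-in-the-loop pattern follows; `ask_llm` and `passes_checks` are hypothetical stand-ins for a chat-completion call and domain-specific checks, not any cited system.

```python
# Sketch of a human-in-the-loop code-generation loop: the LLM drafts code,
# a syntax gate and caller-supplied domain checks reject bad drafts, and an
# accepted draft is still handed to a human expert for final review.
from typing import Callable, Optional

def draft_with_review(
    task: str,
    ask_llm: Callable[[str], str],        # stand-in for a chat-completion call
    passes_checks: Callable[[str], bool],  # caller's domain-specific validation
    max_rounds: int = 3,
) -> Optional[str]:
    """Iteratively request code, feeding failures back into the next prompt.

    Returns the first draft that compiles and passes the checks, or None
    if no acceptable draft appears within max_rounds."""
    prompt = f"Write Python code for: {task}"
    for _ in range(max_rounds):
        draft = ask_llm(prompt)
        try:
            compile(draft, "<draft>", "exec")  # syntax gate, no execution
        except SyntaxError as err:
            prompt = f"{task}\nPrevious draft had a syntax error: {err}. Fix it."
            continue
        if passes_checks(draft):
            return draft  # still subject to human expert review
        prompt = f"{task}\nPrevious draft failed domain checks. Revise."
    return None
```

The design choice worth noting is that the loop never executes untrusted drafts: it only compiles them and applies static checks, leaving execution and acceptance to the human reviewer.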

| Domain | Specific Identified Research Gap | Proposed Novel Research Question(s) | Potential Methodological Advancement | Potential Theoretical Advancement |
| --- | --- | --- | --- | --- |
| NLU | True semantic understanding vs. mimicry; robustness to ambiguity; explainability (Sapkota et al., 2025) | How can NLU models be designed to exhibit verifiable deep understanding and provide transparent reasoning for their interpretations? | Development of "Deep Understanding Benchmarks"; new XAI techniques for generative NLU. | Theories of "Explainable Generative NLU"; models of computational semantics beyond statistical co-occurrence. |
| Content Generation | Ensuring factual accuracy; dynamic quality control; IP & copyright (Dempere et al., 2023) | What adaptive mechanisms can ensure real-time quality and ethical compliance in AI content generation across diverse contexts? | Adaptive, context-aware QA frameworks; blockchain or other technologies for provenance tracking. | "Ethical AI Content Frameworks"; theories of "Responsible Generative Efficiency." |
| Knowledge Discovery | Validating AI-generated hypotheses; moving from info extraction to insight; ethical AI in science (Rice et al., 2024) | How can LLMs be integrated into the scientific method to reliably generate and validate novel, theoretically grounded hypotheses? | Rigorous validation protocols for AI-discovered knowledge; hybrid LLM-KG-Experimental methodologies. | "Computational Creativity Theories" for scientific discovery; models of AI-assisted abductive reasoning. |
| Education | Longitudinal impact on learning & critical thinking; AI literacy curricula; equity & bias in EdAI (Dempere et al., 2023); K-12 & special education gaps (Dimeli & Kostas, 2025) | What pedagogical frameworks optimize human-AI collaboration for deep learning and critical skill development across diverse learners and contexts? | Longitudinal mixed-methods studies; co-design of AI literacy programs with educators and students; comparative studies in underrepresented educational settings. | "AI-Augmented Learning Theories"; frameworks for "Cyborg Pedagogy"; theories of ethical AI integration in diverse educational systems. |
| Engineering | LLMs in safety-critical tasks; understanding LLM failure modes in complex design (Topcu et al., 2025); human-LLM collaboration frameworks (Empirical Methods in Natural Language Processing, 2024); NL to code/design beyond software (Yadav et al., 2025) | How can engineering design and optimization processes be re-theorized to effectively and safely incorporate LLM cognitive capabilities? | Protocols for LLM validation in complex simulations; frameworks for human-in-the-loop control for safety-critical engineering AI. | "Human-AI Symbiotic Engineering Design Theories"; theories of "AI-Robustness" in engineering systems. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
