Submitted: 01 August 2025
Posted: 01 August 2025
Abstract
Keywords:
Introduction
Research Purpose and Questions
Significance
Literature Review
Theoretical Foundations for Human-AI Collaboration
Collective Intelligence and Human-AI Complementarity
Collective Intelligence as Theoretical Foundation
Recent Advances in Human-AI Complementarity
Learning Theory Foundations and Extensions
Constructivist and Sociocultural Extensions
Systems Theory Foundations
Transformative Learning in Technological Contexts
Ethical Frameworks for Human-AI Research Collaboration
Theory Development and Validation in Educational Research
Established Criteria for Theoretical Quality
Challenges in Traditional Validation Approaches
AI-Assisted Validation: Emerging Opportunities
Theoretical Gaps and Research Opportunities
Materials & Methods
Research Design and Philosophical Foundations
Multi-Paradigm Design Framework
Sequential Mixed-Methods Rationale
Phase 1: Systematic Theoretical Synthesis
Methodological Approach and Justification
Cross-Domain Integration Methodology
Principle Development Process
Quality Assurance Procedures
Phase 2: Validation Framework Development
Traditional Framework Integration Strategy
Integrated Assessment Template Development
Phase 3: AI-Assisted Content Assessment
Multi-Model Architecture Design
Content Quality Dimensions Framework Development
Prompt Engineering and Standardization Protocols
Reliability and Validity Measurement Protocols
Data Collection and Analysis Procedures
Integrated Data Collection Strategy
Comprehensive Analysis Methodology
Ethical Considerations and Methodological Limitations
Results
Phase 1: Theoretical Synthesis Outcomes
Multi-Paradigm Theoretical Foundation
Five-Domain Theoretical Analysis
HAIST Framework Development
Seven-Principle Integrated Architecture
Framework Integration and Coherence
Theoretical Innovation Achievement
Phase 2 Results: Theoretical Rigor Evaluation
Whetten Framework Assessment
Wacker Criteria Assessment
Kivunja Educational Framework Assessment
Aggregate Performance Analysis
Phase 3 Results: Iterative AI-Assisted Evaluation and Comparative Reliability Analysis
Iterative Development and Comparative Analysis Across Three Trials
Inter-Model Reliability Assessment




AI Evaluation Scores Analysis
Interpretation and Lessons from the Iterative Process
- Trial 1: A broad 0–10 scale applied to an early-stage HAIST framework produced high but unreliable scores (aggregate mean = 8.10, ICC = –0.34). A negative ICC indicates no reliable agreement among the models, and disagreement was substantial (SD > 1.0 on three dimensions and MAD > 1.0 on two). This reflected both rubric ambiguity and insufficient operational detail in the theory content, which made consistent AI-based evaluation difficult.
- Trial 2: A more rigorous 0–5 scale with explicit anchors led to stricter, more discerning appraisals (aggregate mean = 3.19) and a modest gain in inter-model reliability (ICC = 0.32). Variability remained high for dimensions tied to practical application and empirical guidance.
- Trial 3: Comprehensive framework operationalization, deeper literature integration, and a clearer narrative structure, combined with the explicit 0–5 rubric, yielded the highest reliability and the most convergent ratings (aggregate mean = 4.12, ICC = 0.82, MAD = 0.27). These results indicate that rubric refinement alone is insufficient: meaningful AI evaluation also requires well-developed theoretical constructs, transparent operational definitions, and complete, well-structured supporting materials.
Phase 3 Final Results: High-Reliability AI Model Evaluation
Qualitative Feedback Analysis
Summary of Integrated Findings
Discussion
Theoretical Contributions of HAIST
Extension of Learning Theory
Positioning Within Collective Intelligence
Symbiotic Intelligence Paradigm
Human Agency and Ethical Integration
Methodological Innovations
AI as Algorithmic Evaluators
Symbiotic Validation Process
Implementation and Implications
Institutional Integration
Research Training and Development
Scaling Collective Intelligence
Limitations and Boundary Conditions
Future Research Directions
Conclusion
Author Contributions
Funding
Ethics Approval
Informed Consent
Data Availability
Acknowledgements
Competing Interests
Clinical Trial Registration
Consent to Participate
Dual Publication
Consent to Publish
Permission to Use Third-Party Material
Clinical Trial Number
Appendices
References
- National Science Foundation. (2024). AI and the Future of Research Collaboration. NSF Reports. https://www.nsf.gov/focus-areas/ai.
- Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
- Levy, P. (1999). Collective Intelligence: Mankind’s Emerging World in Cyberspace. Perseus Publishing.
- Mulgan, G. (2018). Big Mind: How Collective Intelligence Can Change Our World. Princeton University Press.
- Kitzie, V., Wan, Y., Alsaid, M., Berkowitz, A. E., Herdiyanti, A., & Penrose, R. B. (2024). The AI-empowered Researcher: Using AI-based Tools for Success in Ph.D. Programs. Proceedings of the ALISE Annual Conference.
- Singh, J. P., Mishra, N., & Singla, B. (2025). From Ideation to Publication: Ethical Practices for Using Generative AI in Academic Research. Emerald Publishing.
- Siemens, G. (2005). Connectivism: A Learning Theory for the Digital Age. Itdl.org. http://www.itdl.org/journal/jan_05/article01.htm.
- Hemmer, P., Schemmer, M., Kühl, N., Vössing, M., & Satzger, G. (2024). Complementarity in Human-AI Collaboration: Concept, Sources, and Evidence. arXiv.
- Bacharach, S. B. (1989). Organizational theories: Some criteria for evaluation. Academy of Management Review, 14(4), 496–515.
- Surowiecki, J. (2005). The Wisdom of Crowds. Anchor Books. https://www.researchgate.net/publication/200773230_The_Wisdom_of_Crowds.
- Luckin, R., & Holmes, W. (2016, February). Intelligence Unleashed: An argument for AI in Education. ResearchGate. https://www.researchgate.net/publication/299561597_Intelligence_Unleashed_An_argument_for_AI_in_Education.
- Licklider, J. C. R. (1960). Man-Computer Symbiosis. IRE Transactions on Human Factors in Electronics, HFE-1(1), 4–11.
- Dellermann, D., Ebel, P., Söllner, M., & Leimeister, J. M. (2019). Hybrid Intelligence. Business & Information Systems Engineering, 61(5), 637–643.
- Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The Psychology of Learning and Motivation (Vol. 2, pp. 89–195). Academic Press.
- Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. Basic Books.
- Bandura, A. (1986). Social Foundations of Thought and Action: A Social Cognitive Theory. Prentice-Hall.
- Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press.
- Holland, J. H. (1995). Hidden Order: How Adaptation Builds Complexity. Addison-Wesley.
- Trist, E. (1981). The evolution of socio-technical systems. Occasional Paper No. 2, Ontario Quality of Working Life Centre.
- Mezirow, J. (1991). Transformative Dimensions of Adult Learning. Jossey-Bass.
- Engeström, Y. (1987). Learning by Expanding: An Activity-Theoretical Approach to Developmental Research. Cambridge University Press.
- Mezirow, J. (1990). Fostering Critical Reflection in Adulthood. Jossey-Bass.
- Kolb, D. (2014). Experiential learning: Experience as the source of learning and development (2nd ed.). Pearson Education, Inc. (Original work published 1984).
- Fleming, T. (2018). Critical thinking and transformative learning. In T. Fleming (Ed.), Re-imagining Transformation in Learning (pp. 117–130). Routledge.
- Belmont Report. (1979). Ethical Principles and Guidelines for the Protection of Human Subjects of Research. U.S. Department of Health and Human Services.
- IEEE Position Statement Ethical Aspects of Autonomous and Intelligent Systems. (2019). https://globalpolicy.ieee.org/wp-content/uploads/2019/06/IEEE19002.pdf.
- Mittelstadt, B. D. (2019). Principles alone cannot guarantee ethical AI. Nature Machine Intelligence, 1, 501–507.
- Knowles, M. (1984). The Adult Learner: A Neglected Species (3rd ed.). Gulf Publishing.
- Barrows, H. S. (1996). Problem-based learning in medicine and beyond: A brief overview. New Directions for Teaching and Learning, 1996(68), 3–12.
- Hmelo-Silver, C. E. (2004). Problem-based learning: What and how do students learn? Educational Psychology Review, 16(3), 235–266.
- Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575.
- Whetten, D. A. (1989). What constitutes a theoretical contribution? Academy of Management Review, 14(4), 490–495.
- Lynham, S. A. (2002). The General Method of Theory-Building Research in Applied Disciplines. Advances in Developing Human Resources, 4(3), 221–241.
- Bui, N. M., & Barrot, J. S. (2025). ChatGPT as an automated essay scoring tool in the writing classrooms: How it compares with human scoring. Education and Information Technologies, 30, 2041–2058.
- Atasoy, A., & Arani, S. M. N. (2025). ChatGPT: A reliable assistant for the evaluation of students’ written texts? Education and Information Technologies. Advance online publication.
- Gelso, C. J. (2006). Applying theories to research: The interplay of theory and research in science. In F. T. L. Leong & J. T. Austin (Eds.), The psychology research handbook (2nd ed., pp. 455–464). Sage.
- Creswell, J. W., & Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). Sage.
- Wacker, J. G. (1998). A definition of theory: Research guidelines for different theory-building research methods in operations management. Journal of Operations Management, 16(4), 361–385.
- Kivunja, C. (2018). Distinguishing between theory, theoretical framework, and conceptual framework: A systematic review of lessons from the field. International Journal of Higher Education, 7(6), 44–53.
- Dubin, R. (1978). Theory Building. Free Press.
- Bland, J. M., & Altman, D. G. (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2), 135–160.
- Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
- Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
- Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105.
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
- Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric Theory. McGraw-Hill.
- Willmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30, 79–82.

| Framework | Total Criteria | Criteria Met | Criteria Partially Met | Criteria Not Met | Percent Met (%) |
|---|---|---|---|---|---|
| Whetten (1989) | 4 | 4 | 0 | 0 | 100 |
| Wacker (1998) | 4 | 4 | 0 | 0 | 100 |
| Kivunja (2018) | 15 | 11 | 3 | 1 | 73 |
| Aggregate | 23 | 19 | 3 | 1 | 85 |
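
The Percent Met column can be recomputed from the tabulated counts. The following is a minimal sketch, assuming that "Percent Met" is the share of fully met criteria and that partially met criteria earn no credit; the function and variable names are illustrative and do not come from the source. Pooling the counts this way gives about 82.6% for the aggregate row, so the tabulated value of 85 presumably reflects an aggregation or rounding rule that is not stated here.

```python
# Criteria counts as tabulated: (total, met, partially met, not met).
counts = {
    "Whetten (1989)": (4, 4, 0, 0),
    "Wacker (1998)": (4, 4, 0, 0),
    "Kivunja (2018)": (15, 11, 3, 1),
}


def percent_met(met: int, total: int) -> float:
    """Share of criteria fully met, as a percentage (assumed definition)."""
    return 100.0 * met / total


# Per-framework percentages.
per_framework = {
    name: percent_met(met, total) for name, (total, met, _, _) in counts.items()
}

# Pooled percentage over all criteria from the three frameworks.
total_criteria = sum(total for total, _, _, _ in counts.values())
total_met = sum(met for _, met, _, _ in counts.values())
pooled = percent_met(total_met, total_criteria)

for name, pct in per_framework.items():
    print(f"{name}: {pct:.1f}%")
print(f"Pooled: {total_met}/{total_criteria} = {pooled:.1f}%")
```
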

| Dimension | Trial 1 Mean (SD/MAD/ICC)* | Trial 2 Mean (SD/MAD/ICC) | Trial 3 Mean (SD/MAD/ICC) |
|---|---|---|---|
| Clarity and Articulation | 7.67 (1.15/0.89/–0.34) | 3.00 (0.82/0.67/0.32) | 4.00 (0.00/0.00/0.82) |
| Internal Consistency & Coherence | 8.33 (0.58/0.44/–0.34) | 3.67 (1.42/1.11/0.32) | 4.83 (0.29/0.22/0.82) |
| Comprehensiveness and Scope | 8.00 (1.00/0.67/–0.34) | 3.33 (1.70/1.56/0.32) | 4.00 (0.00/0.00/0.82) |
| Parsimony and Elegance | 8.00 (1.00/0.67/–0.34) | 3.00 (0.82/0.67/0.32) | 3.83 (0.29/0.22/0.82) |
| Practical Applicability & Utility | 8.00 (1.73/1.33/–0.34) | 2.67 (1.24/1.11/0.32) | 4.00 (1.00/0.67/0.82) |
| Novel Contribution & Significance | 8.33 (1.53/1.11/–0.34) | 3.67 (1.42/1.11/0.32) | 4.50 (0.50/0.33/0.82) |
| Structural Organization & Flow | 8.33 (0.58/0.44/–0.34) | 2.67 (1.24/1.11/0.32) | 4.33 (0.58/0.44/0.82) |
| Aggregate Mean (SD/MAD/ICC) | 8.10 (1.08/0.79/–0.34) | 3.19 (1.24/1.05/0.32) | 4.12 (0.52/0.27/0.82) |

| Dimension | ChatGPT | Claude | Grok | Mean | SD |
|---|---|---|---|---|---|
| Clarity and Articulation | 4 | 4 | 4 | 4.00 | 0.00 |
| Internal Consistency | 5 | 4.5 | 5 | 4.83 | 0.29 |
| Comprehensiveness & Scope | 4 | 4 | 4 | 4.00 | 0.00 |
| Parsimony & Elegance | 4 | 3.5 | 4 | 3.83 | 0.29 |
| Practical Applicability | 5 | 4 | 3 | 4.00 | 1.00 |
| Novel Contribution | 5 | 4.5 | 4 | 4.50 | 0.50 |
| Structure & Flow | 4 | 4 | 5 | 4.33 | 0.58 |
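
The Mean and SD columns follow directly from the three per-model scores. The sketch below rests on two assumptions that the source does not state: that SD is the sample standard deviation (n - 1 denominator) and that MAD in the trial comparison is the mean absolute deviation from the per-dimension mean. Both assumptions reproduce the tabulated per-dimension values, and the mean of the per-dimension MADs comes to 0.27, matching the reported figure. The helper name `mean_abs_dev` is illustrative.

```python
from statistics import mean, stdev

# Trial 3 scores per dimension, in the order (ChatGPT, Claude, Grok), as tabulated.
scores = {
    "Clarity and Articulation": (4, 4, 4),
    "Internal Consistency": (5, 4.5, 5),
    "Comprehensiveness & Scope": (4, 4, 4),
    "Parsimony & Elegance": (4, 3.5, 4),
    "Practical Applicability": (5, 4, 3),
    "Novel Contribution": (5, 4.5, 4),
    "Structure & Flow": (4, 4, 5),
}


def mean_abs_dev(values):
    """Mean absolute deviation of the values from their own mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


# (mean, sample SD, mean absolute deviation) for each dimension.
summary = {
    dim: (mean(vals), stdev(vals), mean_abs_dev(vals))
    for dim, vals in scores.items()
}

for dim, (m, sd, mad) in summary.items():
    print(f"{dim:28s} mean={m:.2f}  SD={sd:.2f}  MAD={mad:.2f}")

# Average of the per-dimension MADs across all seven dimensions.
overall_mad = mean([mad for _, _, mad in summary.values()])
print(f"Mean of per-dimension MADs: {overall_mad:.2f}")
```
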

| Statistic | Value | Interpretation |
|---|---|---|
| Intraclass Correlation (ICC) | 0.83 | Good to Excellent Agreement |
| Cronbach’s Alpha | 0.82 | High Internal Consistency |
| Mean Absolute Deviation | 0.27 | Minimal Model Divergence |
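
The source does not state which ICC form underlies the reported coefficient. As one possibility, the sketch below computes the two-way consistency ICC in the Shrout and Fleiss notation, both for a single rater, ICC(3,1), and for the average of k raters, ICC(3,k). The average-rater form coincides with Cronbach's alpha when raters are treated as items, which the second function checks. The ratings are hypothetical placeholders chosen for illustration, not the study's data, and the function names are assumptions.

```python
from statistics import mean, variance


def icc_consistency(ratings):
    """Two-way consistency ICC for an n_targets x k_raters table.

    Returns (ICC(3,1), ICC(3,k)) from the two-way ANOVA mean squares.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    average = (ms_rows - ms_err) / ms_rows
    return single, average


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters treated as items."""
    k = len(ratings[0])
    item_vars = [variance(col) for col in zip(*ratings)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 targets (rows) scored by 3 raters (columns).
hypothetical = [
    [4, 4, 5],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
    [4, 5, 5],
]

icc_single, icc_average = icc_consistency(hypothetical)
alpha = cronbach_alpha(hypothetical)
print(f"ICC(3,1) = {icc_single:.3f}")
print(f"ICC(3,k) = {icc_average:.3f}")
print(f"Cronbach's alpha = {alpha:.3f}")
```

Because the consistency form ignores constant offsets between raters, a table in which each rater scores every target exactly one point higher than the previous rater yields a coefficient of 1. An absolute-agreement form would penalize such offsets, so the choice of form matters when interpreting a reported ICC.
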

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).