Preprint
Concept Paper

This version is not peer-reviewed.

Embodied AI as Coach: Integrating Facial Expression and Heart-Rate Biometrics into Adaptive Coaching Systems

Submitted: 01 September 2025
Posted: 02 September 2025


Abstract
Artificial Intelligence (AI) systems are increasingly used in human-centered domains such as coaching, education, and healthcare. However, most remain disembodied, relying solely on text or speech while neglecting the non-verbal cues essential to human communication. This paper advances the vision of embodied AI by proposing a multimodal framework that integrates facial expression analysis with biometric signals (heart rate, heart rate variability, and electrodermal activity) for real-time affect recognition. Grounded in embodied cognition, polyvagal theory, emotional intelligence frameworks, and affective computing, the study investigates how such integration can close the empathy gap in AI-mediated coaching. Specifically, it examines fusion strategies (early, late, and hybrid) for synchronizing heterogeneous signals and enabling adaptive coaching systems that dynamically adjust responses to users’ affective states. The expected contributions are both scientific, developing robust multimodal affect recognition, and applied, advancing empathetic, trustworthy, and personalized AI-driven coaching interventions.

1. Introduction

AI coaching and mentoring platforms are rapidly expanding, offering scalable and personalized support (Vistorte et al., 2024). Yet current systems overwhelmingly rely on text or voice, omitting nonverbal cues fundamental to human communication (Chan, 2025; Meinlschmidt et al., 2025). Human coaches interpret micro-expressions, heart-rate variability, and body language to assess readiness, stress, or disengagement, dimensions that current AI systems overlook. This absence of embodied awareness makes AI agents appear mechanical and detached (Weiß et al., 2024).
Integrating facial expression analysis and biometric signals can enhance empathic responsiveness, enabling AI to sense stress, regulate interactions, and dynamically adapt coaching strategies (Frey, 2014; Bridgeman & Hayes, 2023). Such embodied AI has the potential to deepen trust, engagement, and effectiveness, particularly in sensitive contexts such as leadership development, education, and therapy. This study positions embodied AI as a necessary step toward emotionally intelligent, adaptive mentorship systems.

2. Problem Statement and Research Questions

The absence of embodied awareness creates an empathy gap, limiting users’ sense of being understood (Lim et al., 2024). Nonverbal communication, estimated to convey the majority of meaning in human interaction (Mehrabian, 1971), is ignored, producing one-dimensional guidance and reducing adoption in sensitive domains (Zhang & Wang, 2024).
This study addresses this gap by investigating whether integrating facial and biometric cues can enhance AI’s capacity for empathy, trust, and perceived effectiveness. The research is guided by three questions:
RQ1: How can AI agents integrate facial expression recognition and heart-rate biometrics to approximate embodied awareness in coaching interactions?
RQ2: To what extent does this integration improve emotional state detection accuracy compared to text/audio-only models?
RQ3: How do users perceive the empathy, effectiveness, and trustworthiness of embodied AI agents compared to traditional systems?

3. Literature Review

Embodied cognition theory emphasizes that cognition and emotion are inseparable from bodily states (Lakoff & Johnson, 1999; Klippel et al., 2021). Affective computing demonstrated that machines can recognize and respond to human emotions (Picard, 1997), while biometric research confirms that heart-rate variability (HRV) and micro-expressions reliably indicate stress and engagement (Pessanha & Salah, 2021; Beatton et al., 2024). Recent studies show that multimodal interaction, combining voice, appearance, and non-verbal cues, improves trust and user presence in conversational agents (Kamali et al., 2023; Spitale et al., 2024).
However, most AI coaching systems remain disembodied, relying on limited modalities. While affective computing applications in education and healthcare are expanding (Salloum et al., 2025; Vistorte et al., 2024), systematic application to coaching and therapy is scarce. This study addresses that gap by testing whether multimodal integration enhances accuracy, empathy, and user trust in AI-mediated coaching.

4. Theoretical Framework

This study integrates four complementary perspectives to ground an embodied, adaptive coaching paradigm. Each framework is explicitly tied to the research questions (RQ1–RQ3).

4.1. Embodied Cognition Theory - RQ1 (Integration of Facial + Biometrics for Embodied Awareness)

Embodied cognition holds that cognition and emotion are inseparable from bodily states; effective understanding of a person’s affect therefore requires reading physiological and behavioral cues, not just language. Foundational work by Lakoff & Johnson (1999) argues that conceptual systems are grounded in bodily experience, while Wilson (2002) synthesizes six influential views of embodiment. Barsalou’s (2008) grounded cognition further demonstrates that abstract thought is rooted in sensorimotor simulation. In applied HCI contexts, this implies that coaching systems should sense micro-expressions and autonomic signals to approximate human attunement (Klippel et al., 2021; Hauke et al., 2024). Together, these works justify our multimodal design, integrating facial expressions and biometrics, as a necessary substrate for embodied, empathic AI coaching (Vistorte et al., 2024).

4.2. Polyvagal Theory & Psychophysiology of Emotion - RQ1 & RQ2 (Interpretation of HR/HRV; Detection Accuracy Gains)

Polyvagal Theory links autonomic state (e.g., Heart Rate Variability - HRV) to social engagement, safety, and stress responsivity (Porges, 2011). Classic psychophysiology consolidates the evidential basis for using cardiovascular and electrodermal signals as emotion markers (Cacioppo, Tassinary, & Berntson, 2007), while Gross (1998) frames these dynamics within emotion regulation—central to coaching. In allied clinical-developmental work, Siegel (1999) emphasizes interpersonal attunement and self-regulation, underscoring why trustworthy coaching must be sensitive to physiological arousal. Recent studies reinforce HRV’s association with affect and engagement (Beatton et al., 2024; Lee et al., 2023; Puglisi et al., 2023). These foundations guide feature selection and interpretation of biometric signals and motivate our hypothesis that adding HR/HRV (and EDA/temperature where available) will improve detection accuracy beyond text/audio baselines (RQ2).
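The time-domain HRV features referenced above can be computed directly from inter-beat (RR) intervals. The following is a minimal sketch using the standard RMSSD and SDNN definitions; the function name and example interval values are illustrative, not drawn from the study:

```python
import math
from statistics import mean, stdev

def hrv_features(rr_intervals_ms):
    """Compute two standard time-domain HRV features from a list of
    RR intervals (milliseconds between successive heartbeats)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    # RMSSD: root mean square of successive differences, a common proxy
    # for parasympathetic (vagal) influence on the heart
    rmssd = math.sqrt(mean(d * d for d in diffs))
    # SDNN: standard deviation of all RR intervals (overall variability)
    sdnn = stdev(rr_intervals_ms)
    return {"rmssd": rmssd, "sdnn": sdnn}

# Hypothetical example: a calmer recording typically shows higher RMSSD
# than a stressed one, where beat-to-beat variation is suppressed
calm = [850, 870, 820, 880, 830, 860]
stressed = [700, 705, 698, 702, 699, 701]
print(hrv_features(calm)["rmssd"] > hrv_features(stressed)["rmssd"])  # True
```

In a real pipeline these features would be computed over sliding windows and artifact-corrected before interpretation.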

4.3. Emotional Intelligence (EI) in Coaching & Behavior Change - RQ3 (Perceived Empathy, Trust, Effectiveness)

The EI tradition provides a behavioral lens for why recognizing and responding to emotion improves coaching outcomes. Core definitions from Mayer & Salovey (1997) and popular diffusion via Goleman (1995) establish detection, understanding, and regulation of emotion as levers of performance and relationship quality. In applied coaching/leadership contexts, Boyatzis (2006) (Intentional Change Theory) and Goleman, Boyatzis, & McKee (2002) (“primal leadership”) connect EI to sustained behavior change and trust. Cherniss (2010) clarifies EI’s organizational relevance. These works inform our user-level outcomes and instruments (perceived empathy, trust, effectiveness), framing how multimodal sensing should translate into adaptive responses that users experience as supportive and effective (Bridgeman & Hayes, 2023; Terblanche, 2020).

4.4. Affective Computing & Multimodal Social Signals - RQ2 & RQ3 (Adaptive Feedback Loops; Multimodal Advantage)

Affective Computing established the vision of machines that perceive and respond to human emotion (Picard, 1997). In applied, emotion-oriented systems, Schröder & Cowie (2005) outlined key design issues (annotation, context, ethics), and D’Mello & Kory (2015) synthesized multimodal detection evidence in learning environments, showing benefits over unimodal approaches. For facial channels, the Facial Action Coding System is the canonical basis for micro-expression modeling (Ekman & Friesen, 1978), informing our camera-based pipeline. Contemporary embodied-agent work highlights wellbeing and rapport gains from multimodality (Spitale et al., 2024; Kamali et al., 2023), and recent education/health reviews emphasize context-aware affect sensing (Vistorte et al., 2024; Salloum et al., 2025). This tradition underwrites our closed-loop design: sense → infer → adapt, enabling the system to modulate coaching strategies in real time (Islam & Bae, 2024).
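The sense → infer → adapt loop above can be illustrated with a deliberately simplified rule-based sketch. The thresholds, state labels, and response strategies below are illustrative assumptions, not validated design choices; a deployed system would replace the rules with a trained multimodal classifier:

```python
def infer_state(hrv_rmssd_ms, negative_face_prob):
    """Fuse one biometric cue (RMSSD, ms) and one facial cue (probability
    of a negative expression) into a coarse affective state label.
    Thresholds here are placeholders for illustration only."""
    if hrv_rmssd_ms < 20 and negative_face_prob > 0.6:
        return "stressed"
    if negative_face_prob > 0.6:
        return "disengaged"
    return "engaged"

def adapt_strategy(state):
    """Map the inferred state to a coaching response style (adapt step)."""
    return {
        "stressed": "slow the pace, validate feelings, offer a breathing pause",
        "disengaged": "ask an open question, re-anchor to the user's goal",
        "engaged": "continue the current plan, increase challenge slightly",
    }[state]

# Low HRV plus a likely negative expression triggers a de-escalating response
print(adapt_strategy(infer_state(15, 0.8)))
```

Even this toy version makes the closed-loop structure concrete: each turn of the conversation re-runs sensing and inference, so the strategy can change as the user's state changes.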
Synthesis. Across these lenses, embodied signals are not ancillary—they are constitutive of empathic understanding and effective coaching. The framework predicts that (a) multimodal sensing will more accurately detect affect than text/audio alone (RQ2) and (b) embedding those inferences into adaptive responses will improve perceived empathy, trust, and effectiveness (RQ3), thereby closing the application-level empathy gap.

5. Methodology

This study adopts a mixed-methods quasi-experimental design to evaluate the impact of embodied awareness in AI coaching systems. Two conditions will be compared: (1) a Baseline AI Coach, relying exclusively on text and audio inputs, and (2) an Embodied AI Coach, integrating text, audio, facial expression recognition, and biometric data streams. The participant pool will consist of approximately 40–60 professionals and graduate students.
Data collection will involve three streams. First, facial and biometric signals—including micro-expressions, heart-rate variability, and electrodermal activity—will be captured to detect stress and affective shifts (Ekman & Friesen, 1978; Lai et al., 2021). Second, participants will complete self-report surveys assessing trust, empathy, and satisfaction with the coaching interaction (Fang et al., 2023; Harris et al., 2023). Third, system logs will be retained to capture adaptive responses and conversational dynamics for post-hoc analysis (Shore et al., 2023).
Analysis will proceed in two stages. The quantitative component will employ ANOVA and performance metrics including accuracy, precision, recall, and F1-scores to compare detection accuracy and user satisfaction across systems (Wu et al., 2023; Hassan et al., 2025). The qualitative component will conduct thematic analysis of participant reflections on perceived empathy, trust, and authenticity in the coaching exchange (Niebuhr & Valls-Ratés, 2024; Rossing et al., 2024). This dual analytic strategy ensures rigorous evaluation of both computational performance and human experience.
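The detection-accuracy metrics named above follow their standard definitions. As a minimal sketch, per-class precision, recall, and F1 can be computed as follows; the labels and predictions are invented examples, not study data:

```python
def prf1(y_true, y_pred, positive="stressed"):
    """Precision, recall, and F1 for one class, from paired label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true positives, how many found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["stressed", "calm", "stressed", "calm"]
y_pred = ["stressed", "stressed", "calm", "calm"]
print(prf1(y_true, y_pred))  # (0.5, 0.5, 0.5)
```

In practice a library routine (e.g., scikit-learn's classification metrics) would be used, with macro-averaging across affective classes when class frequencies are imbalanced.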

6. Expected Contribution

The study is designed to advance both theoretical understanding and applied practice in embodied AI.
  • Scientific Contribution – It demonstrates how integrating embodied signals improves the accuracy of affect detection and enhances rapport-building in human–AI interaction (Blümel et al., 2023).
  • Applied Contribution – It extends affective computing research into coaching and therapeutic contexts, domains where empathetic responsiveness is central to effectiveness (Raamkumar & Yang, 2022).
  • Empirical Evidence – It provides systematic evidence on embodied AI’s influence on trust, empathy, and perceived effectiveness, offering comparative insights against traditional disembodied systems (Niebuhr & Valls-Ratés, 2024).
  • Ethical Guidelines – It generates practical design and governance guidelines for the responsible collection and use of biometric and facial data in mentorship and coaching systems (Afroogh et al., 2024; Terblanche et al., 2022).
By addressing the empathy gap that currently constrains AI-mediated coaching, this study contributes to the development of emotionally intelligent AI agents capable of delivering nuanced, context-sensitive, and human-like mentorship.

7. Limitations and Future Research

Like all exploratory studies, this research has limitations that inform future work.

Sample and Generalizability

The study focuses on 40–60 participants drawn from professional and student contexts. While this provides valuable insights into the perceived empathy and trust of embodied AI coaches, the sample may not fully capture variability across age, cultural background, or industry context. Larger and more diverse populations would enhance external validity (Harris et al., 2023; Vistorte et al., 2024).

Technical Constraints

Facial recognition and biometric sensing technologies are sensitive to environmental factors such as lighting, camera resolution, and sensor calibration (Huang et al., 2023; Bello et al., 2023). Synchronization challenges between visual and physiological signals may also introduce noise into multimodal fusion (Wang et al., 2022; Shakhovska et al., 2024). Future work should investigate more robust architectures and adaptive calibration techniques to mitigate these issues.
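One common mitigation for the synchronization problem is aligning the faster stream (e.g., video-rate facial probabilities) to the slower one (e.g., a 4 Hz EDA stream) by nearest-timestamp matching within a tolerance. The sketch below illustrates the idea; the sampling rates and the 0.25 s tolerance are assumed values, not recommendations:

```python
from bisect import bisect_left

def nearest_sample(timestamps, values, t, tolerance=0.25):
    """Return the value whose timestamp is closest to t, or None when the
    nearest sample is further away than `tolerance` seconds (a gap)."""
    i = bisect_left(timestamps, t)
    # The closest timestamp is either just before or just after t
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    j = min(candidates, key=lambda k: abs(timestamps[k] - t))
    return values[j] if abs(timestamps[j] - t) <= tolerance else None

eda_t = [0.0, 0.25, 0.5, 0.75]   # hypothetical 4 Hz biometric clock (seconds)
eda_v = [1.1, 1.2, 1.4, 1.3]
print(nearest_sample(eda_t, eda_v, 0.52))  # 1.4 (sample at t=0.5 is closest)
print(nearest_sample(eda_t, eda_v, 5.0))   # None (gap exceeds tolerance)
```

Returning None for out-of-tolerance queries lets the fusion layer mark a frame as missing rather than silently pairing stale signals, which is one source of the noise noted above.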

Ethical Considerations

The integration of biometric and facial data raises significant concerns regarding privacy, consent, and data security (Afroogh et al., 2024; Chavan et al., 2025). This study proposes guidelines for responsible design but further research is needed on governance frameworks that balance innovation with user protection. Future investigations should also consider the implications of power asymmetries between AI systems and vulnerable users in coaching or therapy contexts.

Future Research Directions

Future work should expand in three directions. First, comparative studies across cultural contexts could examine how embodied signals vary and whether models generalize effectively. Second, longitudinal studies should assess the sustained impact of embodied AI coaching on behavioral and emotional outcomes. Third, hybrid designs combining neurophysiological data (e.g., EEG) with facial and biometric signals may deepen affect recognition accuracy, extending the capabilities of adaptive mentorship systems (Herbuela & Nagai, 2025; Gao et al., 2024).

References

  1. Afroogh, S., Akbari, A., Malone, E., Kargar, M., & Alambeigi, H. (2024). Trust in AI: Progress, challenges, and future directions. Humanities and Social Sciences Communications, 11(1), 1–12. [CrossRef]
  2. Alam, M. A. (2024). HR analytics for strategic decisions: Resolving conflicts from a psychological perspective. Research Square. [CrossRef]
  3. Alhejaili, R., & Alomainy, A. (2023). The use of wearable technology in providing assistive solutions for mental well-being. Sensors, 23(17), 7378. [CrossRef]
  4. Bagai, R., & Mane, V. (2023). Designing an AI-powered mentorship platform for professional development: Opportunities and challenges. International Journal of Computer Trends and Technology, 71(4), 108–114. [CrossRef]
  5. Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645. [CrossRef]
  6. Beatton, T., Chan, H. F., Dulleck, U., Ristl, A., Schaffner, M., & Torgler, B. (2024). Positive affect and heart rate variability: A dynamic analysis. Scientific Reports, 14(1), 1–12. [CrossRef]
  7. Blümel, J. H., Zaki, M., & Bohné, T. (2023). Personal touch in digital customer service: A conceptual framework of relational personalization for conversational AI. Journal of Service Theory and Practice, 34(1), 33–52. [CrossRef]
  8. Boyatzis, R. E. (2006). An overview of intentional change from a complexity perspective. Journal of Management Development, 25(7), 607–623. [CrossRef]
  9. Bridgeman, J., & Hayes, A. G. (2023). Using artificial intelligence-enhanced video feedback for reflective practice in coach development: Benefits and potential drawbacks. Coaching: An International Journal of Theory, Research and Practice, 17(1), 32–46. [CrossRef]
  10. Bryant, D. (2019). Towards emotional intelligence in social robots designed for children. Proceedings of the 18th ACM International Conference on Interaction Design and Children (pp. 547–553). [CrossRef]
  11. Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (2007). Handbook of psychophysiology (3rd ed.). Cambridge University Press. [CrossRef]
  12. Cherniss, C. (2010). Emotional intelligence: Toward clarification of a concept. Industrial and Organizational Psychology, 3(2), 110–126. [CrossRef]
  13. Costello, E., Brunton, J., Otrel-Cass, K., Lyngdorf, N. E. R., & Brown, M. (2024). Hacking happier futures: An AI-augmented student hackathon to address affective and ethical digital learning challenges. Research Portal Denmark. https://local.forskningsportal.dk/local/dki-cgi/ws/cris-link?src=aau&id=aau-40fe5df2-680d-4e0a-8285-69066ff15b33.
  14. Curry, J. (2025). Building digital consciousness: A unified architecture for emotion, memory, and self-reflection in AI. SSRN. [CrossRef]
  15. D’Mello, S., & Kory, J. (2015). A review and meta-analysis of multimodal affect detection systems. ACM Computing Surveys, 47(3), 1–36. [CrossRef]
  16. Ekman, P., & Friesen, W. V. (1978). Facial Action Coding System: A technique for the measurement of facial movement. Consulting Psychologists Press.
  17. Fang, L., Xing, S. P., Long, Y., Lee, K.-P., & Wang, S. J. (2023). EmoSense: Revealing true emotions through microgestures. Advanced Intelligent Systems, 5(9), 1–14. [CrossRef]
  18. Frey, J. (2014). Heart rate monitoring as an easy way to increase engagement in human-agent interaction. arXiv. [CrossRef]
  19. Goleman, D. (1995). Emotional intelligence: Why it can matter more than IQ. Bantam Books.
  20. Goleman, D., Boyatzis, R., & McKee, A. (2002). Primal leadership: Realizing the power of emotional intelligence. Harvard Business School Press.
  21. Gross, J. J. (1998). The emerging field of emotion regulation: An integrative review. Review of General Psychology, 2(3), 271–299. [CrossRef]
  22. Guntz, T. (2020). Estimating expertise from eye gaze and emotions. HAL. https://theses.hal.science/tel-03026375.
  23. Gur, T., & Maaravi, Y. (2025). The algorithm of friendship: Literature review and integrative model of relationships between humans and artificial intelligence (AI). Behaviour & Information Technology, 1–14. [CrossRef]
  24. Harris, A. M., Larson, L., Lauharatanahirun, N., DeChurch, L. A., & Contractor, N. (2023). Social perception in human-AI teams: Warmth and competence predict receptivity to AI teammates. Computers in Human Behavior, 145, 107765. [CrossRef]
  25. Hauke, G., Lohr-Berger, C., & Shafir, T. (2024). Emotional activation in a cognitive behavioral setting: Extending the tradition with embodiment. Frontiers in Psychology, 15, 1409373. [CrossRef]
  26. Hegde, K., & Jayalath, H. (2025). Emotions in the loop: A survey of affective computing for emotional support. arXiv. [CrossRef]
  27. Irfan, B., Kuoppamäki, S., & Skantze, G. (2024). Recommendations for designing conversational companion robots with older adults through foundation models. Frontiers in Robotics and AI, 11, 1–13. [CrossRef]
  28. Islam, R., & Bae, S. W. (2024). Revolutionizing mental health support: An innovative affective mobile framework for dynamic, proactive, and context-adaptive conversational agents. arXiv. [CrossRef]
  29. Janhonen, J. (2023). Socialisation approach to AI value acquisition: Enabling flexible ethical navigation with built-in receptiveness to social influence. AI and Ethics, 3(1), 1–13. [CrossRef]
  30. Kamali, M. E., Angelini, L., Lalanne, D., Khaled, O. A., & Mugellini, E. (2023). Older adults’ perspectives on multimodal interaction with a conversational virtual coach. Frontiers in Computer Science, 5, 112589. [CrossRef]
  31. Klippel, A., Sajjadi, P., Zhao, J., Wallgrün, J. O., Huang, J., & Bagher, M. M. (2021). Embodied digital twins for environmental applications. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 5(4), 193–200. [CrossRef]
  32. Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to Western thought. Basic Books.
  33. Lee, G., Park, S., & Whang, M. (2023). The evaluation of emotional intelligence by the analysis of heart rate variability. Sensors, 23(5), 2839. [CrossRef]
  34. Lim, S., Schmälzle, R., & Bente, G. (2024). Artificial social influence via human-embodied AI agent interaction in immersive virtual reality (VR). arXiv. [CrossRef]
  35. Mayer, J. D., & Salovey, P. (1997). What is emotional intelligence? In P. Salovey & D. Sluyter (Eds.), Emotional development and emotional intelligence: Educational implications (pp. 3–31). Basic Books.
  36. Mehrabian, A. (1971). Silent messages. Wadsworth.
  37. Meinlschmidt, G., Koc, S., Boerner, E., Tegethoff, M., Simacek, T., Schirmer, L., & Schneider, M. (2025). Enhancing professional communication training in higher education through artificial intelligence (AI)-integrated exercises. BMC Medical Education, 25(1), 1–10. [CrossRef]
  38. Niebuhr, O., & Valls-Ratés, Ï. (2024). Hey, OK, Play! A rough guide for multimodal, nonverbal communication signals in embodied voice assistants. Research Portal Denmark. https://local.forskningsportal.dk/local/dki-cgi/ws/cris-link?src=sdu&id=sdu-3706f491-ad2c-4da6-9f2d-2a780e024d77.
  39. Picard, R. W. (1997). Affective computing. MIT Press. [CrossRef]
  40. Porges, S. W. (2011). The polyvagal theory: Neurophysiological foundations of emotions, attachment, communication, and self-regulation. W. W. Norton & Company.
  41. Puglisi, N., Tissot, H., Rattaz, V., Epiney, M., Razurel, C., & Favez, N. (2023). Father-infant synchrony and vagal tone as an index of emotion regulation. Early Child Development and Care, 193(12), 1714–1726. [CrossRef]
  42. Raamkumar, A. S., & Yang, Y. (2022). Empathetic conversational systems: A review of current advances, gaps, and opportunities. IEEE Transactions on Affective Computing, 14(4), 2722–2736. [CrossRef]
  43. Robb, D. A., Lopes, J., Ahmad, M., McKenna, P. E., Liu, X., Lohan, K. S., & Hastie, H. (2023). Seeing eye to eye: Trustworthy embodiment for task-based conversational agents. Frontiers in Robotics and AI, 10, 123476. [CrossRef]
  44. Schröder, M., & Cowie, R. (2005). Issues in emotion-oriented computing. In R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. F. Papageorgiou, S. Kollias, & S. O. O’Donnell (Eds.), Emotion-Oriented Systems (pp. 3–18). Springer. [CrossRef]
  45. Shore, D. M., Robertson, O., Lafit, G., & Parkinson, B. (2023). Facial regulation during dyadic interaction: Interpersonal effects on cooperation. Affective Science, 4(3), 506–520. [CrossRef]
  46. Siegel, D. J. (1999). The developing mind: How relationships and the brain interact to shape who we are. Guilford Press.
  47. Spitale, M., Winkle, K., Barakova, E., & Güneş, H. (2024). Guest editorial: Special issue on embodied agents for wellbeing. International Journal of Social Robotics, 16(5), 833–838. [CrossRef]
  48. Terblanche, N. (2020). A design framework to create artificial intelligence coaches. International Journal of Evidence Based Coaching and Mentoring, 18(2), 152–166. [CrossRef]
  49. Terblanche, N., Molyn, J., de Haan, E., & Nilsson, V. (2022). Comparing artificial intelligence and human coaching goal attainment efficacy. PLOS ONE, 17(6), e0270255. [CrossRef]
  50. Vicci, H. (2024). Emotional intelligence in artificial intelligence: A review and evaluation study. SSRN Electronic Journal. [CrossRef]
  51. Vistorte, A. O. R., Deroncele-Acosta, Á., Ayala, J. L. M., Barrasa, Á., López-Granero, C., & Martí-González, M. (2024). Integrating artificial intelligence to assess emotions in learning environments: A systematic literature review. Frontiers in Psychology, 15, 1387089. [CrossRef]
  52. Weiß, T., Eilks, A. E., & Pfeiffer, J. (2024). Interference timing of GenAI sales agents in virtual reality. Research Square. [CrossRef]
  53. Zhang, J., Oh, Y. J., Lange, P., Yu, Z., & Fukuoka, Y. (2020). Artificial intelligence chatbot behavior change model for designing AI chatbots to promote physical activity and a healthy diet. Journal of Medical Internet Research, 22(9), e22845. [CrossRef]
  54. Zhang, Z., & Wang, J. (2024). Can AI replace psychotherapists? Exploring the future of mental health care. Frontiers in Psychiatry, 15, 1444382. [CrossRef]
  55. Zhao, J., Wu, M., Zhou, L., Wang, X., & Jia, J. (2022). Cognitive psychology-based artificial intelligence review. Frontiers in Neuroscience, 16, 1024316. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
