3. Methods and Materials
This study was conducted with 54 third- and fourth-year university students enrolled in a Human Talent Management course at a private Latin American university. The research was carried out in the Applied Neuroscience Laboratory, using iMotions-supported facial expression recognition, eye-tracking, and vocal tone analysis technologies. The purpose of the study was to examine whether AI-powered chatbot training was associated with changes in students’ facial and vocal emotional reaction time during simulated job interviews.
3.1. Research Design
This study followed a one-group pretest-posttest quasi-experimental design. The same group of students was assessed before and after an AI-powered chatbot training intervention. No control group was included. Therefore, the study was designed to examine within-participant changes associated with the intervention, rather than to establish definitive causal effects.
The independent variable was participation in the chatbot-based interview training process. The dependent variables were the proportions of facial and vocal emotional reaction time recorded during the simulated job interviews. Facial indicators included positive, negative, neutral, confusion, sentimentality, joy, surprise, anger, sadness, disgust, fear, and contempt. Vocal indicators included happiness, sadness, anger, and neutrality.
This design was appropriate for the purpose of the study because the main objective was to observe whether students showed measurable changes in emotional expressiveness and vocal modulation after completing a structured chatbot-based training process. Accordingly, the findings are interpreted as pre-post changes associated with chatbot-based training, not as causal effects attributable exclusively to the intervention.
3.2. Participants
Participants were recruited from a Human Talent Management course because the course content was directly related to recruitment, selection, professional communication, and job interview preparation. This made the simulated interview activity academically relevant for the participants and consistent with the professional competencies addressed in the course.
The final sample consisted of 54 university students from the third and fourth academic years. All participants completed the same intervention structure, including the baseline simulated interview, the chatbot-based training sessions, and the post-training simulated interview.
The sample size was considered suitable for the exploratory and applied nature of the study. In behavioral and psychological studies using within-participant or repeated-measures designs, samples of this size are commonly used to identify moderate pre-post changes, particularly because each participant serves as their own control. This design reduces the influence of individual baseline differences and increases sensitivity to change over time.
Although the sample was adequate for examining the main pre-post patterns in facial and vocal emotional reaction time, the study does not claim that the sample is sufficient to detect small effects or to generalize the findings to all university populations. For this reason, the results should be interpreted as evidence from an applied behavioral study and should be confirmed in future research with larger samples, control groups, and participants from different academic and institutional contexts.
Participation was voluntary. Students were informed about the academic purpose of the study, the nature of the simulated interview activities, and the use of facial, vocal, eye-tracking, and chatbot interaction data for research purposes. All data were analyzed in aggregated form, and no individual participant was identified in the results.
3.3. Ethical Considerations and Data Protection
The research protocol was reviewed by the Ethics Committee in Research at Universidad Privada Boliviana. The protocol was classified as eligible for exemption from formal ethical evaluation because it involved a minimal-risk, non-invasive educational training activity based on simulated job interviews, voluntary participation, informed consent, and de-identified data analysis.
Although facial expression recognition, vocal tone analysis, and eye-tracking technologies were used, these tools were employed only to record behavioral and emotional indicators during the simulated interview process. The study did not involve clinical procedures, physical risk, psychological treatment, or invasive measurement techniques.
All data were used exclusively for academic research purposes. The database was de-identified before analysis, and the results were reported only in aggregated form. Informed consent was obtained from all participants before their participation.
3.4. AI-Powered Chatbot Design and Calibration
The AI-powered chatbot was designed specifically to support simulated job interview training. Its function was to reproduce a structured interview environment and provide students with immediate feedback on the quality of their answers.
The chatbot was built around five core interview questions commonly used in recruitment and selection processes. These questions were reviewed and validated by a panel of ten experts with doctoral-level training and professional experience in organizational psychology, organizational development, human resources selection, recruitment, and personnel induction. The expert panel had experience either in academic work related to human talent management or in professional practice with companies and recruitment processes.
The validation process focused on four main aspects: the professional relevance of the questions, the clarity of the wording, the realism of the simulated interview situation, and the usefulness of the questions for evaluating students’ communicative performance. The experts reviewed whether the questions reflected situations commonly faced by candidates in employment interviews and whether they were appropriate for assessing clarity, coherence, professional vocabulary, emotional tone, and confidence in oral expression.
All students received the same intervention structure. The chatbot used the same base prompt throughout the study to ensure consistency in the training process. Across the sessions, the questions maintained the same purpose and level of difficulty, although minor wording variations were introduced to reduce memorization and encourage more spontaneous responses. This allowed the intervention to remain standardized while also resembling the natural variation of a real job interview.
The chatbot evaluated each response using a 10-point scale. The scoring criteria included clarity, coherence, conciseness, vocabulary, emotional tone, and adequacy of the answer in relation to the expected response. After each answer, the chatbot provided qualitative feedback similar to the type of guidance that could be offered by a Human Resources specialist after a mock interview. The feedback focused on helping students organize their answers more clearly, use professional language, respond directly to the question, and project a more confident and positive communicative style.
At the end of each chatbot interaction, an overall score was calculated as the average of the five individual response scores. This score was used as formative feedback during the training process. However, the chatbot score was not treated as the main outcome variable in the present study. The main outcomes were the facial and vocal emotional reaction time indicators recorded in the laboratory during the pre-training and post-training simulated interviews.
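The scoring rule described above can be illustrated with a minimal Python sketch. The function name, the example scores, and the assumption that the 10-point scale runs from 0 to 10 are hypothetical; the chatbot's actual implementation is not described in this section.

```python
from statistics import mean

def overall_score(response_scores):
    """Average the five per-question scores from one chatbot interaction."""
    if len(response_scores) != 5:
        raise ValueError("expected exactly five response scores")
    # Assumption: the 10-point scale is bounded by 0 and 10.
    if any(not 0 <= s <= 10 for s in response_scores):
        raise ValueError("scores must lie on the 10-point scale")
    return mean(response_scores)

# Hypothetical scores for the five interview questions in one session.
session_scores = [7, 8, 6, 9, 7]
print(overall_score(session_scores))
```

Under this reading, each question contributes equally to the formative score, so a weak answer to one question lowers the overall value by the same amount regardless of which question it was.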
The calibration process followed an iterative sequence. First, the expert panel reviewed the interview questions and evaluation criteria. Second, trial interactions were conducted to verify whether the chatbot produced coherent, constructive, and professionally relevant feedback. Third, adjustments were made to the wording of the questions, the tone of the recommendations, and the scoring logic. Finally, the prompt structure and question format were fixed before the intervention to ensure that all participants received a comparable training experience (see Table 1).
3.5. Procedure and Intervention
The intervention followed a structured sequence consisting of a pre-training assessment, three chatbot-based training sessions, and a post-training assessment.
First, participants completed a baseline simulated interview before receiving chatbot-based training. During this pre-training assessment, students answered a standard set of interview questions. Facial expression recognition, eye-tracking, and vocal tone analysis technologies were used to record the initial facial and vocal emotional indicators.
Second, students participated in three chatbot-based training sessions. In each session, they answered five interview questions and received immediate numerical and qualitative feedback from the chatbot. The feedback addressed the clarity, coherence, emotional tone, vocabulary, conciseness, and suitability of their responses. The second and third sessions allowed participants to apply the feedback received in previous interactions and progressively refine their answers.
Third, participants completed a post-training simulated interview. This final assessment followed the same general structure as the baseline interview. Facial expression recognition, eye-tracking, and vocal tone analysis were again used to record emotional and vocal indicators.
The pre-training and post-training measurements were then compared to examine whether students showed changes in facial and vocal emotional reaction time associated with the chatbot-based training (see Table 2).
3.6. Laboratory Technologies and Data Collection
The study was conducted in an Applied Neuroscience Laboratory equipped with technologies for facial expression analysis, eye-tracking, and vocal tone analysis. These tools were used during the pre-training and post-training simulated interviews to collect behavioral and emotional indicators.
Facial expression analysis was conducted using iMotions-supported facial recognition technology. This system quantified facial emotional reaction time at two levels, aggregate and disaggregated. The aggregate indicators were the facial reaction times for the positive, negative, neutral, confusion, and sentimentality categories. The disaggregated indicators were the facial reaction times for joy, surprise, anger, sadness, disgust, fear, and contempt.
Eye-tracking was used as part of the laboratory setup to support the interpretation of facial expression data. In this study, eye-tracking was relevant because facial expressions were recorded while participants interacted with visual interview stimuli. Eye-tracking helped verify participants’ visual attention and engagement during the task, providing contextual support for interpreting facial emotional responses. It was not treated as a primary statistical outcome in the present analysis, but as complementary information supporting the facial expression analysis.
Vocal tone analysis was used to assess emotional characteristics in students’ speech during the simulated interviews. The vocal indicators analyzed were happiness, sadness, anger, and neutrality. These indicators were expressed as proportions of emotional vocal reaction time during the recorded interview.
For both facial and vocal indicators, emotional reaction time was expressed as a percentage. This percentage represented the proportion of recorded interview time during which a given emotional category was detected by the analysis system (see Table 3).
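As an illustration of this measure, the following Python sketch computes the percentage of recorded time in which a category was detected. It assumes, hypothetically, that the recording is available as equally spaced frames, each carrying the set of categories detected in that frame; the data layout and the function name are not taken from the analysis software.

```python
def percent_time(frame_labels, category):
    """Percentage of recorded frames in which `category` was detected.

    Assumption: frames are equally spaced in time, so the share of
    frames equals the share of recorded interview time.
    """
    if not frame_labels:
        raise ValueError("no frames recorded")
    hits = sum(1 for labels in frame_labels if category in labels)
    return 100.0 * hits / len(frame_labels)

# Hypothetical four-frame recording; a frame may carry several categories.
frames = [{"joy"}, {"joy", "surprise"}, set(), {"neutral"}]
print(percent_time(frames, "joy"))
```

Because more than one category may be detected in the same frame under this layout, the percentages for different categories need not sum to 100.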
3.7. Variables and Measures
The main outcome variables were facial emotional reaction time and vocal emotional reaction time. Both were expressed as percentages of the recorded interview time.
Facial emotional reaction time was analyzed at two levels. First, aggregate emotional indicators were examined: the facial reaction times for the positive, negative, neutral, confusion, and sentimentality categories. Second, disaggregated facial emotions were analyzed: joy, surprise, anger, sadness, disgust, fear, and contempt. This two-level structure allowed the study to identify both general emotional patterns and specific facial expressions.
Vocal emotional reaction time was analyzed through four indicators: happiness, sadness, anger, and neutrality. These categories were selected because they represent relevant vocal-emotional states in interview contexts, where candidates are expected to communicate confidence, emotional stability, and professional engagement.
The unit of analysis was the participant. For each student, pre-training and post-training values were obtained for every facial and vocal indicator. Higher values indicated a greater proportion of time in which a given emotional category was detected during the simulated interview (see Table 4).
3.8. Data Analysis
The statistical analysis was designed to compare pre-training and post-training emotional reaction time indicators. Because the same participants were measured before and after the chatbot-based training, all comparisons were treated as paired comparisons.
Descriptive statistics were first calculated for each facial and vocal indicator. Pre-training and post-training means were reported to describe the direction and magnitude of change. Although the variables were expressed as percentages, means were retained because they provide a clear descriptive comparison of the pre-post differences and are commonly reported in applied behavioral research. The interpretation also considered the consistency of change across participants.
The paired t-test was used to assess mean differences between pre-training and post-training scores. This test was appropriate because the study compared two related measurements from the same participants. However, since emotional reaction time indicators are bounded percentages and may present asymmetric distributions or values close to zero, a non-parametric complementary analysis was also included.
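For reference, the paired t statistic can be sketched in a few lines of Python using only the standard library. The function name and the example values are hypothetical and do not come from the study data. The p-value would then be read from a t distribution with n - 1 degrees of freedom, for example through a statistics package such as SciPy (`scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Return the paired t statistic and its degrees of freedom."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    # Work on the per-participant differences (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are required")
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1

# Hypothetical percentages for five participants.
pre = [10.0, 12.0, 8.0, 15.0, 11.0]
post = [14.0, 15.0, 9.0, 18.0, 16.0]
t_stat, df = paired_t(pre, post)
print(round(t_stat, 3), df)
```

A positive statistic under this sign convention means that the indicator was higher, on average, after training than before.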
The Wilcoxon signed-rank test was used as the non-parametric paired comparison. This test evaluates whether the distribution of pre-post differences is centered around zero and is appropriate when normality assumptions for paired differences may not be fully met. In this study, Wilcoxon was particularly relevant for indicators with low baseline values or non-normal distributions, such as sadness, disgust, contempt, and anger-related expressions.
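The Z values reported for this test correspond to a normal approximation of the signed-rank statistic. The sketch below shows one common way to compute it in plain Python. It assumes that zero differences are dropped, that tied absolute differences receive average ranks with a tie-corrected variance, and that no continuity correction is applied; the software used in the study may differ on any of these points, and the function name and example data are hypothetical.

```python
from math import erfc, sqrt

def wilcoxon_z(pre, post):
    """Normal-approximation Z and two-sided p for the signed-rank test."""
    # Assumption: zero differences are discarded before ranking.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        size = j - i
        tie_term += size ** 3 - size
        i = j
    # Sum of ranks attached to positive differences (post > pre).
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / sqrt(var_w)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided

# Hypothetical percentages for six participants (illustration only;
# the normal approximation is intended for larger samples).
pre = [10, 12, 8, 15, 11, 9]
post = [14, 15, 9, 18, 16, 8]
z, p = wilcoxon_z(pre, post)
print(round(z, 3), round(p, 3))
```

Because the statistic depends only on the ranks of the differences, a few extreme values in a skewed indicator influence it far less than they influence the paired t statistic.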
The results tables report pre-training and post-training means, paired t-test statistics, paired t-test p-values, Wilcoxon signed-rank Z values, Wilcoxon p-values, and a binary indicator of whether the pre-post difference was statistically significant. Statistical significance was interpreted using the following thresholds: ***p < 0.01, **p < 0.05, and *p < 0.10.
When both the paired t-test and the Wilcoxon signed-rank test were significant, the result was interpreted as stronger evidence of change. When only the Wilcoxon test was significant, the finding was interpreted more cautiously as evidence of consistent participant-level change rather than a large average difference. This distinction was important because some emotional indicators may change consistently across participants without producing a large mean difference.
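The reporting conventions above can be summarized in a short Python sketch. The function names are hypothetical. The sketch also assumes that the binary significance indicator uses the loosest reported threshold (p < 0.10); the text does not state which threshold was applied, so `alpha` is left as a parameter.

```python
def stars(p):
    """Map a p-value to the significance markers used in the tables."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

def interpret(p_ttest, p_wilcoxon, alpha=0.10):
    """Combine both tests following the interpretation rule in the text.

    Assumption: alpha = 0.10 defines "significant"; adjust if needed.
    """
    t_sig = p_ttest < alpha
    w_sig = p_wilcoxon < alpha
    if t_sig and w_sig:
        return "stronger evidence of change"
    if w_sig:
        return "consistent participant-level change (interpret cautiously)"
    if t_sig:
        # This case is not covered by the rule stated in the text.
        return "t-test only (not covered by the stated rule)"
    return "no significant change"

# Hypothetical p-values for one indicator.
print(stars(0.03), interpret(0.03, 0.004))
print(stars(0.20), interpret(0.20, 0.04))
```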
Given the absence of a control group, the study does not claim definitive causal effects. The results were interpreted as pre-post changes associated with chatbot-based training. This interpretation is consistent with the one-group pretest-posttest quasi-experimental design and recognizes that factors such as practice, familiarity with the interview format, or repeated exposure may also have contributed to the observed changes (see Table 5).