1. Introduction
Overcoming cancer, a leading cause of death worldwide, is a challenge for global health [
1]. Various treatment modalities, such as chemotherapy, immunotherapy, hormonal therapy, surgery, and radiation therapy, are used alone or in combination to treat cancer [
2,
The role of radiation therapy has been increasing owing to its non-invasive nature, which makes it feasible for elderly patients, and to technical advances in treatment techniques that focus radiation on targeted tumors.
A necessary procedure in radiation therapy involves the creation of an individual treatment plan to focus high-energy radiation on the tumor while sparing nearby critical organs, using simulation computed tomography images. For successful radiation therapy, the accurate positioning of the patient, identical to that in the treatment plan, is essential. If the patient's position does not match the treatment plan, the therapeutic effect is reduced and damage to normal tissues can occur [
4,
5,
6]. For head and neck cancer, a positioning error of even 3 mm can reduce the dose to the tumor by as much as 10% [
7]. Similarly, in cervical cancer, a rotational error of 1° can result in a 2% reduction in the tumor dose and an 11% increase in the dose to nearby organs at risk [
8]. Therefore, accurately aligning the beam to the patient in accordance with the treatment plan is critical for optimizing therapeutic outcomes.
The accuracy of radiation treatment is influenced by both technological and human factors [
9]. Technological uncertainties encompass mechanical issues with radiation therapy equipment, such as imprecision in the leaf position, beam output, and beam profiles, which are typically addressed through regular quality assurance procedures [
10,
11]. Human factors, such as respiratory and gastrointestinal motion, shrinkage of the targeted tumor volume, and patient stress (anxiety), play significant roles [
12,
13,
14]. The inherent variability in human respiratory and digestive system movements can cause unpredictable displacements of the body and internal organs, which can be mitigated using techniques such as gating, tumor tracking, and image-guided radiation therapy [
15]. Shrinkage of the target volume can be accommodated by modifying the treatment plan through adaptive radiotherapy [
16,
17]. Although various methods to counteract these factors are used in daily patient treatment and are under development, there is a noticeable scarcity of research addressing the impact of patient stress on radiation therapy.
Psychological stress triggers the sympathetic nervous system, leading to physiological changes such as increased heart rate (HR), blood pressure, breathing rate, and muscle stiffness [
18,
19,
20,
21]. Stress-induced muscle stiffness can compromise the precision of patient positioning during daily treatment setup. Assessment of stress levels in the general population is commonly conducted through surveys [
22,
23], and this methodology extends to studies examining stress in patients undergoing medical treatment [
24]. During the pandemic, the decline in mental health and quality of life of patients with cancer was assessed through a survey [
25], and the stress of patients with benign prostatic hyperplasia was confirmed through a survey [
26]. Although survey research can efficiently yield data, the reliability of self-reported information is a subject of concern [
27]. Consequently, a growing body of research has focused on the measurement of stress through biological signals, which may offer more objective data points than self-reported surveys.
Evaluation of stress through biological signal monitoring is an emerging and pivotal field of medical research. This approach encompasses a variety of metrics, including photoplethysmogram (PPG), electrocardiogram (ECG), body temperature, respiratory patterns, vocal properties, and electroencephalogram (EEG), each offering unique insights into the physiological manifestations of stress [
28,
29,
30]. Under stress, the sympathetic nervous system triggers an increase in body temperature and alters the respiratory dynamics to a faster and shallower pattern. Vocal attributes change noticeably under stress, typically resulting in higher pitch and greater variability. Additionally, EEG recordings reveal an increase in beta wave activity during stress. Two of the most significant indicators in this field are PPG and ECG, both of which monitor changes in blood flow. Changes in blood flow are instrumental in determining heart rate variability (HRV), a key metric in stress evaluation [
31,
32]. The reliability and utility of HRV as a stress measure have been substantiated by comparison with traditional stress surveys [
33]. Moreover, the integration of HRV analysis into wearable technologies, such as smartwatches, has opened new avenues for real-time, noninvasive stress monitoring [
34]. Hence, HRV analysis is a promising alternative to survey-based methods, offering a more objective and continuous assessment of stress levels.
Stress in patients undergoing radiation therapy has been identified in survey studies [
35,
36]. A majority of these patients were observed to experience heightened stress levels, particularly in the initial stages of their treatment, underscoring the need for effective stress-management strategies. However, implementing universal stress-reduction measures for all patients can be resource-intensive, requiring additional manpower and time. To address this challenge, we leveraged Artificial Intelligence (AI) techniques in conjunction with biological signal analysis to identify patients who are susceptible to stress during radiation therapy. Our approach involved training machine learning models on HRV data collected both before and during the treatment sessions. This study aimed to use before-treatment HRV data to predict the likelihood of patients experiencing significant stress during therapy sessions. This prediction enables stress-management interventions to be tailored more effectively by focusing on those who need them the most.
2. Materials and Methods
2.1. Patients
The study protocol, including patient recruitment and data collection methods, was approved by the Institutional Review Board of the Samsung Medical Center (IRB number 2020-11-162). Prior to enrollment, written informed consent was obtained from all participants, confirming their voluntary participation and understanding of the study’s aims and processes. Our study prospectively enrolled patients who underwent radiation therapy for lung cancer. The recruitment period spanned from December 2020 to November 2023. The inclusion criteria were carefully defined to ensure a representative and relevant patient cohort. These criteria included (1) adult patients (aged < 80 years) receiving radiation therapy for the first time to capture initial stress responses untainted by previous experiences; (2) patients capable of effective communication, ensuring accurate self-reporting and feedback regarding the study procedures and their well-being; and (3) patients who could comfortably wear the sensor without experiencing discomfort, as any discomfort could confound stress measurements. The patient recruitment process is illustrated in
Figure 1. Initially, 238 patients were approached for participation in this study. Of these, 79 consented to participate, reflecting a 33% response rate. During the study, certain patients were excluded due to reasons such as discomfort while wearing the sensor, discontinuation of radiation therapy, or data errors from sensor malfunction. These exclusion criteria helped to maintain the integrity and reliability of the collected data. To ensure the privacy and confidentiality of the participants, all collected data were anonymized. Identifiable information was removed and replaced with unique codes, thereby securing patient privacy and adhering to ethical data-handling practices.
2.2. Data acquisition and processing
Data collection commenced with patients wearing a biological sensor (Laxtha, Ubpulse 360, Daejeon, Korea) upon arrival in the waiting room prior to receiving radiation therapy. The sensor was positioned on the finger to ensure no interference during the treatment procedure. After a 10-minute acclimatization period, patients were escorted to the treatment room, where they continued to wear the sensor throughout their radiation treatment session. Upon completion of the treatment, the sensor was returned, and the collected PPG data were securely transferred to a dedicated computer system for analysis. Signals arising from patient movements and those resulting from sensor errors were carefully removed to ensure data integrity. To analyze stress changes during treatment, a minimum of one day's and a maximum of five days' data were extracted for each patient. Subsequently, the PPG data were segmented into two distinct phases for analysis: the before-treatment phase, captured while the patient was in the waiting room, and the during-treatment phase, recorded when the patient lay on the treatment couch. To account for potential HR elevations due to movement, we isolated 5 min of data following a 2-minute stabilization period in both the before- and during-treatment phases. From these phases, the HRV was computed by analyzing the intervals between successive PPG peaks. Preprocessing of the PPG data and subsequent HRV analyses were conducted using MATLAB R2020b (MathWorks, Natick, MA, USA) to ensure a standardized and reproducible methodology.
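As a minimal sketch of this windowing step (illustrative Python with a hypothetical helper name; the study's actual pipeline was implemented in MATLAB), NN intervals can be extracted from detected PPG peak times after discarding the stabilization period:

```python
import numpy as np

def extract_nn_intervals(peak_times_s, skip_s=120.0, window_s=300.0):
    """Hypothetical helper: given PPG peak times (in seconds) for one phase,
    drop the 2-minute stabilization period, keep the next 5 minutes, and
    return the successive peak-to-peak (NN) intervals in milliseconds."""
    peak_times_s = np.asarray(peak_times_s, dtype=float)
    t0 = peak_times_s[0] + skip_s
    in_window = (peak_times_s >= t0) & (peak_times_s < t0 + window_s)
    return np.diff(peak_times_s[in_window]) * 1000.0

# Example: a steady 80-bpm pulse (0.75-s period) recorded for 10 minutes
peaks = np.arange(0.0, 600.0, 0.75)
nn = extract_nn_intervals(peaks)  # a 5-minute window of 750-ms intervals
```

In a real recording the peak times would come from the second-derivative peak detection described in Section 2.3, and the intervals would vary beat to beat.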
2.3. Stress features
Identification and accurate quantification of stress features are important for the assessment of stress levels using HRV analysis. In this study, we operationalized stress using a set of physiological markers derived from PPG signals. The second derivative of the PPG signal was used to pinpoint the heartbeat peaks, and HRV was calculated by measuring the intervals between these peaks. The selection of HRV-related stress features was based on a comprehensive literature review, identifying seven features consistently associated with physiological stress responses [
37,
38,
39,
40,
41,
42,
43,
44,
45,
46,
47]. These features were HR, standard deviation of NN intervals (SDNN), square root of the mean sum of squares of successive NN interval differences (RMSSD), percentage of successive NN intervals differing by more than 50 ms (pNN50), power of the high-frequency range (HF), ratio of low-frequency to high-frequency power (LF/HF), and total power of the frequency range (TP). Under stable recording conditions, stress is typically indicated by increases in HR and LF/HF, whereas SDNN, RMSSD, pNN50, HF, and TP decrease (
Table 1). We employed these stress features to calculate a stress score (range: 0-100%) by comparing feature values before and during treatment.
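The time-domain features and one plausible reading of the score construction can be sketched as follows (an illustrative Python version, not the authors' exact MATLAB implementation; the score is taken here as the percentage of the seven features that shift in the stress direction of Table 1, which with seven features yields steps of one-seventh):

```python
import numpy as np

# Direction each feature moves under stress (per Table 1):
# HR and LF/HF increase; SDNN, RMSSD, pNN50, HF, and TP decrease.
STRESS_DIRECTION = {"HR": +1, "SDNN": -1, "RMSSD": -1, "pNN50": -1,
                    "HF": -1, "LF/HF": +1, "TP": -1}

def time_domain_features(nn_ms):
    """Standard time-domain HRV features from NN intervals (milliseconds)."""
    nn = np.asarray(nn_ms, dtype=float)
    diffs = np.diff(nn)
    return {
        "HR": 60000.0 / nn.mean(),                      # mean heart rate (bpm)
        "SDNN": nn.std(ddof=1),                          # SD of NN intervals (ms)
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),           # RMS of successive diffs
        "pNN50": 100.0 * np.mean(np.abs(diffs) > 50.0),  # % of diffs > 50 ms
    }

def stress_score(before, during):
    """Percentage (0-100%) of available features that shifted in the
    stress direction between the before- and during-treatment phases."""
    shared = [name for name in STRESS_DIRECTION if name in before and name in during]
    hits = sum(1 for name in shared
               if STRESS_DIRECTION[name] * (during[name] - before[name]) > 0)
    return 100.0 * hits / len(shared)
```

The frequency-domain features (HF, LF/HF, TP) would additionally require spectral estimation of the NN series and are omitted here for brevity.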
2.4. Stress prediction
Predicting patient stress in the waiting room before treatment is crucial to enhance the accuracy of setting up patients for radiation treatment. This enables the early implementation of measures to reduce stress, potentially improving treatment efficacy. Non-pretrained and pretrained models were used for stress prediction. The non-pretrained model categories include decision tree (DT) [
48], random forest (RF) [
49], support vector machines (SVM) [
50], long short-term memory (LSTM) [
51], and transformer [
52]. The pretrained models were based on OpenAI’s ChatGPT, which is built on a large language model (LLM) and supports both prompt engineering and fine-tuning. Prompt engineering involves the strategic design of input prompts to elicit the desired responses from an LLM [
53], whereas fine-tuning refers to the process of adjusting an LLM's parameters on a specific dataset to improve its performance for particular tasks [
54]. The non-pretrained models were assessed using 10-fold cross-validation to evaluate their ability to handle eight different input datasets (Type 1, only before-treatment features; Type 2, before-treatment features with age; Type 3, before-treatment features with sex; Type 4, before-treatment features with day; Type 5, before-treatment features with age and sex; Type 6, before-treatment features with age and day; Type 7, before-treatment features with sex and day; and Type 8, before-treatment features with age, sex, and day). These datasets comprised the treatment day, age, sex, and the seven stress features identified before treatment. The model outputs were designed to classify the predicted changes in the stress features during treatment (
Figure 2). Subsequently, the three input dataset types on which the non-pretrained models performed best were selected for further analysis with the pretrained models. The pretrained models were evaluated on a representative one-fold split from the 10-fold cross-validation of the non-pretrained models; therefore, the pretrained and non-pretrained models were compared on the same one-fold dataset. For the pretrained models, prompt engineering was performed with GPT-3.5 and GPT-4.0, and fine-tuning with GPT-3.5-turbo-1106.
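A condensed sketch of the dataset construction and 10-fold evaluation for a non-pretrained model, using synthetic data and Scikit-learn (the feature values, labels, sample count, and hyperparameters here are placeholders, not the study's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120  # illustrative sample count (synthetic, not the study data)

# Seven before-treatment stress features plus optional age, sex, and day columns
features = rng.normal(size=(n, 7))
age = rng.integers(30, 80, size=(n, 1)).astype(float)
sex = rng.integers(0, 2, size=(n, 1)).astype(float)
day = rng.integers(1, 6, size=(n, 1)).astype(float)

# Multi-label target: whether each of the 7 features shifts in the stress
# direction during treatment (random labels, for illustration only)
y = rng.integers(0, 2, size=(n, 7))

datasets = {
    "Type 1": features,                              # features only
    "Type 5": np.hstack([features, age, sex]),       # + age and sex
    "Type 8": np.hstack([features, age, sex, day]),  # + age, sex, and day
}

results = {}
for name, X in datasets.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # With a 2-D target, the classifier's default score is subset accuracy,
    # i.e., the Exact Match Ratio across all seven labels
    results[name] = cross_val_score(clf, X, y, cv=10).mean()
```

With random labels the EMR hovers near chance (about 0.5 to the seventh power); the point of the sketch is the dataset-type construction and the cross-validation loop, not the scores.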
2.5. Evaluation
A comprehensive evaluation of our predictive models involved several statistical and machine learning metrics to assess the stress score distribution and its variation throughout the treatment course. We analyzed the aggregated stress score changes and classified them by sex to observe potential differences in stress patterns between male and female patients over a period of up to four days. The non-parametric Wilcoxon signed-rank test was employed for paired comparisons, whereas the Friedman test was used to analyze changes across multiple-day trends. The Mann-Whitney U test was used to compare stress scores between males and females.
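These three tests can be sketched with SciPy on synthetic paired scores (illustrative data only, not the study's measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stress scores (%) for 30 patients, before vs. during treatment
before = rng.uniform(20.0, 60.0, size=30)
during = np.clip(before + rng.normal(10.0, 15.0, size=30), 0.0, 100.0)

# Wilcoxon signed-rank test for the paired before/during comparison
w_stat, w_p = stats.wilcoxon(before, during)

# Friedman test for trends across four treatment days (same patients)
days = [np.clip(during + rng.normal(0.0, 5.0, size=30), 0.0, 100.0)
        for _ in range(4)]
f_stat, f_p = stats.friedmanchisquare(*days)

# Mann-Whitney U test for an independent comparison (e.g., male vs. female)
u_stat, u_p = stats.mannwhitneyu(during[:15], during[15:])
```

All three are non-parametric, matching the skewed, bounded nature of percentage-valued stress scores.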
To assess the predicted stress features during treatment, we adopted two analytical approaches: feature classification (multi-label) and stress classification (binary). Feature classification utilized the raw output from our models to evaluate prediction accuracy across multiple labels. The key metrics included the Exact Match Ratio (EMR) and standard classification metrics such as accuracy, recall, precision, and F1 score, providing a holistic view of the models' performance. The feature classification results were aggregated into a stress score, whereas stress classification assigned each case as “yes (> 50%)” or “no (< 50%)” based on a 50% stress-score criterion. The effectiveness of the stress classification was quantified using accuracy, recall, precision, and F1 score.
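A small worked example of the two evaluation layers with hypothetical predictions (Scikit-learn's `accuracy_score` on a 2-D multi-label target returns subset accuracy, which is the EMR):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy multi-label targets: rows = cases, columns = the 7 stress features
y_true = np.array([[1, 1, 0, 1, 0, 1, 1],
                   [0, 1, 1, 0, 1, 0, 0],
                   [1, 0, 1, 1, 0, 1, 1]])
y_pred = np.array([[1, 1, 0, 1, 0, 1, 1],   # exact match
                   [0, 1, 0, 0, 1, 0, 0],   # one label wrong
                   [1, 0, 1, 1, 0, 1, 0]])  # one label wrong

# Exact Match Ratio: fraction of cases with every label predicted correctly
emr = accuracy_score(y_true, y_pred)

# Per-label metrics, micro-averaged over all labels
precision = precision_score(y_true, y_pred, average="micro")
recall = recall_score(y_true, y_pred, average="micro")
f1 = f1_score(y_true, y_pred, average="micro")

# Binary stress classification from the feature-level predictions:
# score = % of features flagged, "yes" if the score exceeds 50%
scores = 100.0 * y_pred.mean(axis=1)
stressed = scores > 50.0
```

Only the first case matches exactly, so the EMR is 1/3 even though most individual labels are correct, which is why the per-label metrics are reported alongside it.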
The non-pretrained models were developed using Python (version 3.7.16): the traditional machine learning algorithms (DT, RF, and SVM) were implemented via the Scikit-learn library (version 1.3.2), and the deep learning algorithms (LSTM and Transformer) were operationalized using PyTorch (version 1.7.1), with all computations performed on an NVIDIA GeForce 2080Ti GPU. The Scikit-learn library was also utilized to compute the performance metrics, ensuring consistency and reliability in our evaluation methodology.
4. Discussion
Stress in patients undergoing radiation therapy can lead to muscle stiffness, affecting treatment setup accuracy and potentially causing accidents due to movement or falls. Although posttreatment surveys have confirmed that patients undergoing radiation therapy experience stress, in-room stress during treatment has remained unmeasured. Our study, utilizing biological signals, revealed that 90% of patients experienced stress during treatment. Our research enables the identification of cancer patients undergoing radiation therapy who require interventions to reduce stress before treatment. By recognizing and mitigating stress in advance, the accuracy of radiation therapy can be enhanced, ultimately improving treatment outcomes.
Table 2 presents the distribution of the during-treatment stress scores measured using biological signals from 41 patients. Of the 123 cases, 12 (9.76%) showed no stress, while 111 (90.24%) indicated stress. The most frequent stress score, 85.71%, was observed in 26 cases (21.14%). Based on a 50% stress-score threshold, 47.15% of the cases were classified as stressed. In [
55], it was found that 21-54% of patients undergoing radiation therapy experienced stress. This range encompasses our findings, where using a 50% stress score as a threshold, we observed that 47.15% of the cases involved stress.
Figure 3 shows the variation in patients' stress scores across treatment days. For males, the stress scores on days one and two were similar, exceeded 50% on day three, and remained at a similar level on day four. For females, there was a slight increase on day two, a decrease on day three, and a marked decrease on day four. Overall, except for day two, males exhibited higher stress scores than females on all days. Furthermore, males showed an increasing trend in stress as treatment progressed, whereas females showed a decreasing trend. However, these trends did not reach statistical significance. In [
56], it was indicated that female stress decreased over the course of treatment, whereas male stress did not change significantly. Although not statistically significant, our study's stress score trends showed tendencies similar to those of other research findings.
Implementing pre-treatment stress-reduction measures for all patients is challenging. Using a 50% stress-score threshold, 47.15% of the cases exhibited stress. Studies [
35] and [
56] found that factors such as age, occupation, marital status, and gender do not significantly affect stress. While our study found higher initial stress in females, the overall stress scores were higher in males. Considering the referenced studies and our findings, selecting specific patient groups for before-treatment stress-reduction measures may be inaccurate. Therefore, it is necessary to predict stress in all patients prior to treatment.
Our study utilized five non-pretrained models and eight dataset types to classify changes in the features during treatment (
Table 4). The RF model exhibited the best overall EMR across the datasets, and the LSTM model had the highest EMR of 0.172 for the type 8 dataset. The LSTM performed best in terms of accuracy across all datasets, particularly for the type 7 dataset, with an accuracy of 0.699. Similarly, LSTM had the highest recall across all datasets. The DT model had the highest precision and F1 scores of 0.683 and 0.639, respectively. In this predictive task, correctly identifying patients who are actually stressed is more important than avoiding the mislabeling of non-stressed individuals as stressed. Hence, the type 6, 7, and 8 datasets, which exhibited the highest recall, accuracy, and EMR, respectively, were selected to evaluate the pretrained models using the one-fold data.
In the analysis presented in
Table 5, for the type 6 and type 7 datasets, the LSTM model continued to outperform the others in terms of EMR, accuracy, and recall, which is consistent with the findings in
Table 4. However, in the type 8 dataset, both GPT-4.0 and LSTM demonstrated superior performance in EMR, achieving a score of 0.231. While LSTM led in accuracy and recall, GPT-4.0 excelled in precision and F1 score. GPT-3.5 displayed the lowest performance across all indicators in these datasets, with GPT-3.5-turbo-1106 achieving an accuracy of 0.615 for the type 8 dataset.
Considering all models, both non-pretrained and pretrained, the LSTM model demonstrated robust performance across all evaluation indices and datasets, making it the most suitable for feature classification during treatment. In scenarios where implementing a machine learning model is challenging, the pretrained GPT-4.0 model, particularly with the type 8 dataset, emerged as the most appropriate choice.
Stress classification assigns “yes (> 50%)” or “no (< 50%)” based on the 50% stress-score criterion (
Table 6). In stress classification, the LSTM model with the type 7 dataset classified stress effectively, with an accuracy of 0.846. The RF and SVM models showed stable accuracy of 0.769 across all datasets. Among the pretrained models, GPT-4.0 achieved an accuracy of 0.769 on the type 8 dataset, which included all data, but on the type 7 dataset none of the pretrained models exceeded an accuracy of 0.5. As with feature classification, LSTM was the best among all models for stress classification, with GPT-4.0 being superior for the type 8 dataset. GPT-4.0 is suited to predictions using diverse patient information, whereas LSTM is recommended for its stability in scenarios with limited information.
Datasets 6, 7, and 8 used for comparison in
Table 5 and
Table 6 all include the treatment day, which is important information for stress prediction. The performance of the non-pretrained models was similar across the three dataset types. However, the pretrained models performed best on the type 8 dataset, which included age, sex, and treatment day, and worst on the type 7 dataset, in which age was omitted. Stress prediction using a pretrained model may therefore be better when all available patient information is used.
Our study is pioneering in the use of before-treatment information to predict during-treatment stress, in contrast to most studies that have focused on current stress. In [
57], survey-reported stress was predicted from biological signals such as respiration, ECG, and electrodermal activity with an accuracy of 86%, and [
58] used ECG and neural networks to predict survey stress with 85% accuracy. A few studies have predicted future stress levels. In [
59], a driver's real-time breathing, ECG, and galvanic skin response signals were used to predict stress one minute ahead with 94% accuracy. In [
60], signals such as the participant's 24-hour physiology, weather, number of calls, and location were used to predict the next day's mood with an accuracy of 82.2%. Although a direct comparison with these studies is difficult, the LSTM using the type 7 dataset in our study showed an accuracy of 84.6%. Given that our models predict future stress from HRV information obtained through limited PPG measurements, this accuracy is sufficiently high, and we believe that additional training data and patient biological signals will yield even higher accuracy.
This study has certain limitations. This study focused on patients with lung cancer who underwent their first radiation therapy session. The use of finger-worn sensors did not affect therapy for patients with lung cancer. However, the limited methods for measuring biological signals and the narrow patient population have resulted in restricted participant diversity and a lack of standardization in stress assessment methods. Expanding the research to include various cancer patients using sensor technologies that do not interfere with treatment could enhance the accuracy of stress prediction and enable more precise evaluations. Although AI-based stress prediction using biological signals has demonstrated over 80% accuracy, the impact of the measured stress score on the actual radiation therapy remains unverified. Further research is required to analyze the correlations between stress indicators and variables related to treatment accuracy, such as patient breathing, couch positioning, and setup times. Assigning weights to features with a high correlation could lead to more accurate stress assessments. Future research will aim to select appropriate sensor technologies, involve diverse cancer patient groups, and explore the relationship between stress and radiation therapy outcomes.