Smart Bedside Traceability of Caregiver-Patient Interactions Using Wearables and Tiny Localization Anchors

Aurora Polo-Rodríguez; Almudena Escalera-Esteban; Miguel Angel Anguita-Molina; Isabel Valenzuela-López; María Correa-Rodríguez; Blanca Rueda-Medina; Javier Medina-Quero

doi:10.20944/preprints202606.0170.v1

Submitted: 01 June 2026

Posted: 02 June 2026


Abstract

Caregiver–patient traceability is essential for measuring care workload and interaction time in shared hospital rooms, where a single caregiver attends multiple patients and manual documentation is intrusive, time-consuming, and prone to errors. This paper proposes a non-invasive smart bedside sensing approach based on a commercial smartwatch worn by the caregiver and two compact ultra-wideband (UWB) localization anchors placed near the monitored beds. The smartwatch provides Magnetic, Angular Rate, and Gravity (MARG) data, while the UWB anchors enable bed-level proximity estimation without any device worn by the patient. The system was evaluated in a shared hospital room of approximately 5.0 m × 4.8 m, with two hospital beds separated by 1.3–1.7 m. A dataset of four scenes totalling 19.9 min was collected, producing 1,140 one-second labeled samples and 1,074 valid windows after past–delayed sliding-window segmentation. Five supervised models were evaluated, including Long Short-Term Memory (LSTM) networks, a hybrid Convolutional Neural Network plus LSTM (CNN+LSTM) architecture, Extreme Gradient Boosting (XGBoost), and a non-linear Support Vector Machine (SVM). The Simple LSTM achieved the best performance, with 96% accuracy and a macro F1-score of 0.89. The results demonstrate accurate identification of the attended bed from wrist-worn and UWB signals alone, supporting scalable and objective caregiver–patient traceability in shared hospital rooms with minimal infrastructure.

Keywords: caregiver–patient traceability; ultra-wideband; wearable sensing; MARG sensors; human activity recognition; interaction time estimation; machine learning; LSTM; XGBoost

Subject: Engineering - Control and Systems Engineering

1. Introduction

The increasing demand for healthcare and long-term care services, driven by population aging and the growing prevalence of chronic conditions, has intensified the need for objective, continuous, and scalable methods to characterize care workload and caregiver–patient interactions [1]. This need is particularly acute in multi-occupancy care spaces—shared hospital rooms and residential bedrooms—where a single caregiver attends multiple patients sequentially and manual documentation is both disruptive and unreliable [2]. Accurately determining who is being attended, for how long, and in which spatial context is fundamental to support workload assessment, resource allocation, quality-of-care auditing, and evidence-based staffing decisions. Yet these interactions are still commonly documented through manual records or self-reported logs, which are time-consuming, intrusive, and prone to incompleteness [3].

Ambient Assisted Living (AAL) technologies, wearable devices, and non-invasive sensing systems have opened new opportunities for monitoring health- and care-related behaviors in situ [4,5]. By enabling the continuous acquisition of contextual and behavioral data, these technologies support the extraction of indicators related to mobility, routines, social interaction, and caregiver support [6]. Nevertheless, transforming raw sensor streams into actionable caregiver–patient interaction records remains challenging, especially in shared bedrooms where several individuals coexist simultaneously and the spatial separation between care recipients may be as small as one or two meters. Real-Time Locating Systems (RTLS) and wearable inertial sensing have been separately explored for staff tracking and nurse workflow analysis [7,8]. Ultra-Wideband (UWB) technology is especially promising for indoor localization due to its centimeter-level ranging accuracy and suitability for compact, low-power devices [9]. In parallel, wearable Magnetic, Angular Rate, and Gravity (MARG) sensors embedded in commercial smartwatches can capture body motion and orientation with minimal disruption to care workflows. However, combining UWB proximity with wrist-worn MARG data for fine-grained, patient-level interaction traceability in shared bedrooms remains an open and insufficiently addressed problem [10].

Existing approaches to caregiver–patient proximity detection suffer from one or more of the following limitations: (i) they rely on patient-worn devices, which raises acceptability and infection-control concerns in clinical settings; (ii) they deploy dense RTLS infrastructure (many anchors, dedicated servers), which is costly and difficult to scale; or (iii) they resolve location at room level rather than bed level, which is insufficient when two care recipients occupy the same room. There is therefore a clear need for lightweight, non-invasive systems that can distinguish between beds within a shared room using only caregiver-worn sensing and minimal bedside hardware.

This paper addresses that need with the following three contributions:

(i) A lightweight sensing configuration that requires only a caregiver-worn commercial smartwatch and two compact bedside UWB anchors, with no device worn by the patient and no dedicated localization server.
(ii) A multimodal dataset of four scenes collected in a shared hospital room, designed to capture realistic caregiver–patient interaction patterns under clinically representative conditions.
(iii) A comparative evaluation of machine learning and deep learning models for bed-level interaction classification, demonstrating accurate caregiver–patient traceability from MARG and UWB signals alone.

The remainder of this paper is organized as follows. Section 2 reviews related work on care time measurement, AAL monitoring, indoor localization, wearable sensing, and multimodal human activity recognition. Section 3 describes the sensing system, deployment scenario, data acquisition protocol, preprocessing pipeline, and classification models. Section 4 presents the experimental results. Section 5 discusses the findings and their implications. Section 6 summarizes the main conclusions and outlines future research directions.

2. Related Work

Measuring care time is fundamental to understanding workload, resource allocation, and the real distribution of support provided to dependent individuals. In informal care, time estimation is particularly complex because caregiving is often embedded in everyday routines, overlaps with household tasks, and may include supervision, emotional support, and other intangible forms of assistance. The validity of such estimates depends strongly on how caregivers, care recipients, and care activities are defined, as well as on the measurement instrument used—whether recall questionnaires, diaries, or time-use protocols [2,11]. In professional care, time-and-motion studies have quantified how nurses distribute their working time across direct patient contact, documentation, medication, and coordination tasks, consistently finding that only a minority of care time is spent at the bedside [3,12,13]. Although valuable, these observational methods are labor-intensive and episodic, motivating the search for unobtrusive automated alternatives.

Ambient Assisted Living (AAL) technologies and smart-home infrastructures have been proposed as a step in this direction, enabling continuous monitoring of daily activities, mobility, and health-related behaviors in home-care settings [14,15]. Real-life AAL deployments have demonstrated the potential of unobtrusive connected devices to support older adults and reduce reliance on manual documentation [16,17]. In professional environments, Internet of Things (IoT) platforms have similarly been applied to nursing workflow analysis, asset tracking, and indoor navigation [5,7,8]. A common thread across both domains, however, is that most systems characterize the monitored individual in isolation rather than the dyadic interaction between a caregiver and a specific patient—and fewer still address the challenge of distinguishing which of several co-located patients is being attended at a given moment.

Proximity-based sensing offers a promising route to fill this gap. Bluetooth Low Energy (BLE) beacons have been used to detect face-to-face contact events in indoor environments, showing that short-range radio signals can approximate social interactions [18,19]. In nursing-care facilities, Garcia and Inoue [20] explored indoor localization using stationary BLE beacons to support staff-to-patient assistance monitoring and workload analysis. Their work addressed the limited-data and class-imbalance problem through a relabeling-based data augmentation strategy, showing that beacon-based localization can support care-related monitoring but also highlighting the dependence of Received Signal Strength (RSS)-based approaches on beacon coverage and room layout. However, BLE ranging accuracy is typically limited to 1–3 m, which is insufficient to resolve bed-level proximity in a shared bedroom where two care recipients may be separated by as little as one meter. Ultra-Wideband (UWB) technology overcomes this limitation, offering centimeter-level ranging accuracy through Time-of-Flight (ToF) and Two-Way Ranging (TWR) techniques, together with robustness to multipath interference and suitability for compact, low-power hardware [9]. Recent work has applied UWB to detect caregiver–patient interactions in residential settings [10], demonstrating its potential for fine-grained proximity estimation. Nevertheless, prior deployments have relied on multiple distributed anchors to reconstruct full 2D or 3D trajectories, incurring substantial infrastructure cost. In contrast, the present work proposes a minimal two-anchor configuration—one per monitored bed—that is sufficient for bed-level discrimination without full-room coverage.

Complementing spatial sensing, wearable inertial measurement units and MARG sensors have been extensively used for Human Activity Recognition (HAR), capturing the motion and orientation patterns of the person wearing the device [6]. Commercial smartwatches now integrate accelerometers, gyroscopes, and magnetometers with sufficient sampling rates for continuous MARG acquisition, and sequential deep learning architectures—in particular LSTM networks and hybrid CNN+LSTM models—have demonstrated state-of-the-art performance on MARG time series, outperforming classical methods when temporal dependencies span several seconds [4]. Tree-based ensembles such as XGBoost have also proven to be strong and computationally efficient baselines for multiclass window classification. Despite these advances, HAR methods have focused primarily on recognizing the activities of the monitored person themselves rather than on identifying which care recipient a caregiver is attending. This is a fundamentally different and more challenging problem: similar arm gestures may occur when attending either of two adjacent beds, making motion alone an ambiguous cue without accompanying spatial context.

The line of research developed by the authors provides a direct foundation for the system proposed here. An early deployment of UWB-based caregiver–patient interaction sensing in a supervised living apartment demonstrated that proximity-derived interaction heatmaps can reveal the duration, location, and temporal distribution of care episodes [10], but relied on tags worn by all participants and on eight anchors covering a multi-room apartment. Subsequent work fused millimeter-wave radar heatmaps with UWB ranging to track multiple occupants without dense infrastructure [21], and directly evaluated the Google Pixel Watch platform with Qorvo DWM3001CDK anchors under an edge-computing architecture, demonstrating sub-centimeter ranging error and battery consumption tunable between 4% and 13% per hour [22]. A parallel study on daily activity recognition in multi-occupant homes showed that the past–delayed windowing strategy adopted in the present work consistently outperforms causal windowing of equivalent length [23]. These results collectively establish the hardware and algorithmic foundations on which the present system is built.

Taken together, the literature reveals a clear gap: no existing system combines caregiver-worn MARG sensing with minimal UWB ranging infrastructure to achieve bed-level caregiver–patient traceability in shared hospital rooms, without any device on the patient. Table 1 positions the proposed approach relative to the most relevant prior works.

3. Materials and Methods

3.1. Sensing Architecture and Hardware

The proposed system combines three hardware elements, depicted in Figure 1: a commercial smartwatch worn by the caregiver, two compact UWB anchors fixed at the headboard of each monitored bed, and a Raspberry Pi 4B edge gateway. Care recipients carry no device. Figure 2 shows the wearable and UWB components alongside the smartwatch application interface, which displays real-time ranging status and the estimated attended zone.

The wearable sensing platform is a Google Pixel Watch 4 [24], which provides continuous MARG data from its on-board accelerometer, gyroscope, and magnetometer. The acquired stream comprises tri-axial acceleration $\{\mathrm{acc}_x, \mathrm{acc}_y, \mathrm{acc}_z\}$, tri-axial angular rate $\{\mathrm{gyr}_x, \mathrm{gyr}_y, \mathrm{gyr}_z\}$, tri-axial magnetic field $\{\mathrm{mag}_x, \mathrm{mag}_y, \mathrm{mag}_z\}$, and rotation-vector components $\{\mathrm{rot}_x, \mathrm{rot}_y, \mathrm{rot}_z, \mathrm{rot}_w, \mathrm{rot\_accuracy}\}$, yielding 14 channels in total. The ambient localization component relies on Qorvo DWM3001CDK UWB development kits [25]. One-to-one Two-Way Ranging (TWR) sessions between a UWB tag carried by the caregiver and each bedside anchor produce two bed-level proximity streams, denoted $\mathrm{distance\_00\_01}$ (Bed A) and $\mathrm{distance\_00\_02}$ (Bed B). The Raspberry Pi 4B gateway hosted a local Mosquitto MQTT broker using the default configuration on Transmission Control Protocol (TCP) port 1883. UWB ranging measurements from the two bedside anchors were transmitted as lightweight MQTT messages within the local wireless network and stored locally with gateway-side timestamps. Messages were published using Quality of Service (QoS) 0 (at-most-once delivery), which minimizes transmission overhead and is suitable for real-time sensor streaming where occasional missing samples can be handled during preprocessing. No cloud broker or external server was used. In the recorded dataset, the effective UWB update interval had a median of approximately 2.7–2.9 s per anchor, with occasional longer gaps associated with ranging dropouts. The smartwatch orchestrates each acquisition session by assigning a unique identifier shared across all data streams, ensuring precise temporal alignment between MARG and UWB data during postprocessing.
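The per-anchor update interval reported above can be checked directly from the gateway-side timestamps. The following is a minimal sketch, assuming each stored record is a hypothetical `(anchor_id, timestamp_s)` pair; the record layout, the function name, and the example timestamps are illustrative and are not taken from the authors' code or data.

```python
from collections import defaultdict
from statistics import median


def median_update_interval(records):
    """Return the median gap (in seconds) between consecutive readings per anchor.

    `records` is an iterable of (anchor_id, timestamp_s) pairs; this layout is
    an assumption made for illustration.
    """
    by_anchor = defaultdict(list)
    for anchor_id, timestamp_s in records:
        by_anchor[anchor_id].append(timestamp_s)

    intervals = {}
    for anchor_id, stamps in by_anchor.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if gaps:  # an anchor with a single reading has no interval
            intervals[anchor_id] = median(gaps)
    return intervals


# Synthetic timestamps (not measured data); the 8.4 s gap mimics a ranging dropout.
example = [
    ("distance_00_01", 0.0), ("distance_00_01", 2.8),
    ("distance_00_01", 5.5), ("distance_00_01", 13.9),
    ("distance_00_02", 0.4), ("distance_00_02", 3.3), ("distance_00_02", 6.2),
]
print(median_update_interval(example))
```

Using the median rather than the mean keeps the estimate from being dominated by the occasional long gaps that dropouts produce.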

3.2. Deployment Scenario and Dataset

Data were collected in a shared hospital room equipped with two hospital beds, bedside clinical equipment, wall-mounted medical infrastructure, and clinical training mannequins (Figure 3). The room measured approximately 5.0 m × 4.8 m, with a ceiling height of approximately 2.7 m. The two beds were arranged in parallel with a mattress-edge separation of 1.3–1.7 m, representing a compact shared clinical space in which bed-level discrimination is realistically challenging. One UWB anchor was placed near the headwall of each bed at a height of approximately 1.2–1.5 m, yielding a horizontal anchor separation of approximately 3.0 m.

A single caregiver performed structured interaction sessions at each bed across four scenes. Ground-truth labels were recorded in real time via the smartwatch interface, which timestamped the start and end of each interaction with Bed A, Bed B, or neither (Class 0, corresponding to transitions or non-care activities). Table 2 summarizes the dataset statistics.

3.3. Data Preprocessing and Temporal Segmentation

Raw recordings were stored in long format and transformed into a multivariate second-level representation by aggregating sensor samples using the per-channel median and assigning labels by modal class within each one-second interval. The resulting tensor combines 14 MARG channels with 2 UWB ranging channels, yielding $D = 16$ features per time step. Temporal samples were generated using a past–delayed sliding-window strategy (Figure 4): each window of $L = p + 1 + d = 11$ time steps is centered on the instant to be labeled, with $p = 5$ past seconds and $d = 5$ delayed (future) seconds. Sequential models received the window as a tensor $W_t \in \mathbb{R}^{11 \times 16}$, while classical methods used its flattened form $z_t \in \mathbb{R}^{176}$. Windows were retained only when at least one valid UWB measurement was available from each anchor within the segment; missing readings caused by occasional ranging dropouts were encoded as $-1$. Feature standardization (zero mean, unit variance) was fitted exclusively on the training scenes of each cross-validation fold to prevent data leakage.
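The aggregation and windowing steps can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the long-format tuple layout `(timestamp_s, channel_index, value, label)`, the function names, and the channel indices 14 and 15 for the two UWB streams are all hypothetical, and seconds are assumed to be contiguous with non-negative timestamps.

```python
from collections import Counter, defaultdict
from statistics import median

MISSING = -1.0  # sentinel for a second without a reading, as described in the text


def aggregate_per_second(samples, n_channels):
    """Collapse raw samples into one row per second.

    Channels use the per-second median, labels use the per-second mode, and
    channels with no sample in that second are filled with MISSING.
    """
    values = defaultdict(lambda: defaultdict(list))
    labels = defaultdict(list)
    for timestamp_s, channel, value, label in samples:
        second = int(timestamp_s)
        values[second][channel].append(value)
        labels[second].append(label)

    rows, row_labels = [], []
    for second in sorted(values):
        rows.append([
            median(values[second][c]) if values[second][c] else MISSING
            for c in range(n_channels)
        ])
        row_labels.append(Counter(labels[second]).most_common(1)[0][0])
    return rows, row_labels


def past_delayed_windows(rows, labels, uwb_channels, p=5, d=5):
    """Build windows of p + 1 + d rows centered on the labeled second."""
    windows, targets = [], []
    for center in range(p, len(rows) - d):
        window = rows[center - p:center + d + 1]
        # Keep the window only if every UWB channel has at least one reading.
        has_all_anchors = all(
            any(row[c] != MISSING for row in window) for c in uwb_channels
        )
        if has_all_anchors:
            windows.append(window)
            targets.append(labels[center])
    return windows, targets


def flatten(window):
    """Flattened form used by the classical models (11 x 16 -> 176 values)."""
    return [value for row in window for value in row]
```

With $p = d = 5$, a scene of $T$ seconds yields at most $T - 10$ candidate windows, because the first and last five seconds cannot serve as window centers.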

3.4. Classification Models and Evaluation Protocol

Five supervised classifiers were trained and evaluated, spanning sequential deep learning architectures and classical machine learning baselines. Two LSTM variants were considered. The Simple LSTM uses a single recurrent layer with 64 hidden units, followed by dropout ($p = 0.3$), a dense layer with 32 neurons and Rectified Linear Unit (ReLU) activation, and a softmax output layer. The Stacked LSTM uses two recurrent layers with 128 and 64 hidden units, respectively, with dropout ($p = 0.3$) after each recurrent block, followed by a dense layer with 64 neurons, dropout ($p = 0.2$), and a softmax output. The CNN+LSTM model first applies two one-dimensional convolutional layers with 32 and 64 filters, kernel size 3, same padding, ReLU activation, layer normalization, and dropout ($p = 0.2$). The resulting sequence is then processed by two LSTM layers with 128 and 64 hidden units, followed by dropout, a dense layer with 64 neurons, and a final softmax classifier.
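To give a rough sense of scale for the two LSTM variants, the number of trainable weights can be estimated from the layer sizes given above. The sketch below assumes a standard LSTM cell (four gates, each with input weights, recurrent weights, and a bias), an input of 16 features per time step, and a three-class output. The paper does not report parameter counts, so these figures are derived here under those assumptions only.

```python
def lstm_params(input_dim, units):
    """Weights of a standard LSTM layer: four gates with input, recurrent and bias terms."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights of a fully connected layer with bias."""
    return input_dim * units + units


D, N_CLASSES = 16, 3  # features per time step and output classes (0, A, B)

# Simple LSTM: LSTM(64) -> Dense(32) -> softmax(3)
simple_lstm = lstm_params(D, 64) + dense_params(64, 32) + dense_params(32, N_CLASSES)

# Stacked LSTM: LSTM(128) -> LSTM(64) -> Dense(64) -> softmax(3)
stacked_lstm = (
    lstm_params(D, 128) + lstm_params(128, 64)
    + dense_params(64, 64) + dense_params(64, N_CLASSES)
)
print(simple_lstm, stacked_lstm)
```

Under these assumptions the stacked variant carries several times more trainable weights than the single-layer one, which is worth keeping in mind given that only about a thousand windows are available for training and testing.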

All neural models were trained for 40 epochs with a batch size of 2 using sparse categorical cross-entropy loss $\mathcal{L} = -\sum_{c=1}^{C} y_c \log \hat{y}_c$. The Adam optimizer was used with a learning rate of $10^{-3}$. No early stopping was applied, so all neural models were trained for the same number of epochs in every fold. The random seed was fixed to 42 for reproducibility. As a tree-based baseline, XGBoost was configured with 300 estimators, maximum depth 3, learning rate 0.05, subsampling ratio 0.9, column subsampling ratio 0.9, histogram-based tree construction, and multiclass log-loss on the flattened window vector $z_t \in \mathbb{R}^{176}$. A non-linear SVM with a radial basis function (RBF) kernel $K(z_i, z_j) = \exp(-\gamma \lVert z_i - z_j \rVert^2)$, $C = 1.0$, $\gamma = \text{scale}$, balanced class weights, and probabilistic outputs completed the evaluation as a standard non-parametric baseline.
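The two formulas in this paragraph are easy to evaluate by hand. The sketch below computes the cross-entropy loss for one sample and the RBF kernel for a pair of vectors. The `gamma_scale` helper assumes the scikit-learn convention for `gamma="scale"`, namely 1 / (n_features × variance of all training values); the paper does not state which SVM implementation was used, so that convention and all function names are assumptions.

```python
import math
from statistics import pvariance


def sparse_categorical_cross_entropy(probs, label):
    """Loss for one sample: with a one-hot target, only the true class term remains."""
    return -math.log(probs[label])


def rbf_kernel(z_i, z_j, gamma):
    """K(z_i, z_j) = exp(-gamma * ||z_i - z_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    return math.exp(-gamma * squared_distance)


def gamma_scale(training_rows):
    """Assumed 'scale' heuristic: 1 / (n_features * population variance of all entries)."""
    n_features = len(training_rows[0])
    all_values = [value for row in training_rows for value in row]
    return 1.0 / (n_features * pvariance(all_values))


# Toy values for illustration only.
softmax_output = [0.1, 0.7, 0.2]   # predicted probabilities for classes 0, A, B
print(sparse_categorical_cross_entropy(softmax_output, label=1))

train = [[0.0, 1.0], [2.0, 3.0]]
gamma = gamma_scale(train)
print(gamma, rbf_kernel(train[0], train[1], gamma))
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the squared distance grows, at a rate set by $\gamma$.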

All models were evaluated under a leave-one-scene-out cross-validation protocol. In each fold, one complete scene was held out for testing and the remaining three scenes were used for training. This protocol evaluates generalization across recording sessions and avoids mixing temporally adjacent windows from the same scene between training and testing. Feature standardization was fitted exclusively on the training scenes of each fold and then applied to the held-out scene, preventing information leakage from the test scene. Performance was measured using overall accuracy and macro-averaged precision, recall, and F1-score, complemented by per-class F1-scores for Class 0 (no interaction), Class A (Bed A), and Class B (Bed B). In addition to the aggregated global metrics, per-scene results were reported as mean and standard deviation across the four held-out scenes.
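The fold construction and the train-only standardization described above can be sketched as follows. This is a minimal illustration with synthetic feature vectors; the function names and the data are hypothetical, and the population standard deviation is used as one reasonable choice for "unit variance".

```python
from statistics import mean, pstdev


def leave_one_scene_out(scene_ids):
    """Yield (held_out_scene, train_indices, test_indices) for each scene."""
    for held_out in sorted(set(scene_ids)):
        train = [i for i, s in enumerate(scene_ids) if s != held_out]
        test = [i for i, s in enumerate(scene_ids) if s == held_out]
        yield held_out, train, test


def fit_standardizer(rows):
    """Per-feature mean and standard deviation, estimated on training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant features get a unit scale to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply previously fitted statistics; never refit on the held-out scene."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Synthetic feature vectors tagged with the scene they came from.
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 30.0], [4.0, 20.0],
            [5.0, 50.0], [6.0, 40.0], [7.0, 70.0], [8.0, 60.0]]
scene_ids = [1, 1, 2, 2, 3, 3, 4, 4]

for held_out, train_idx, test_idx in leave_one_scene_out(scene_ids):
    means, stds = fit_standardizer([features[i] for i in train_idx])
    x_train = standardize([features[i] for i in train_idx], means, stds)
    x_test = standardize([features[i] for i in test_idx], means, stds)
    print(held_out, len(x_train), len(x_test))
```

Because the statistics come from the training scenes alone, the held-out scene generally does not end up with exactly zero mean and unit variance, which is the expected behavior.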

4. Results

Following the pipeline described in Section 3.3, the four scenes produced 1,140 one-second labeled samples after temporal aggregation, distributed as 62/468/610 samples for classes 0, A, and B, respectively. These values correspond to the second-level representation reported in Table 2, not to the final windows used for model evaluation. The centered 11-second sliding-window strategy removes the first and last five seconds of each scene as possible window centers; therefore, 40 candidate centers are lost across the four scenes, resulting in 1,100 candidate windows. Of these, 26 windows were discarded because they did not contain at least one valid UWB measurement from each anchor within the segment. The final evaluation set therefore comprised 1,074 valid windows, distributed as 33/468/573 for classes 0, A, and B, respectively. Table 3 summarizes the global classification performance of the five evaluated models.
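The window bookkeeping in this paragraph can be written out explicitly. The symbols $N_{\text{candidate}}$ and $N_{\text{valid}}$ are introduced here only for illustration; $p = d = 5$ are the past and delayed window lengths from Section 3.3.

```latex
\begin{aligned}
N_{\text{candidate}} &= 1140 - 4\,(p + d) = 1140 - 4 \times 10 = 1100,\\
N_{\text{valid}} &= N_{\text{candidate}} - 26 = 1074 = 33 + 468 + 573.
\end{aligned}
```

Comparing the class counts before and after windowing (62/468/610 versus 33/468/573), all 66 removed samples belong to Class 0 (29) or Bed B (37), so the already small transition class loses nearly half of its samples.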

The Simple LSTM achieved the best overall performance, with 96% accuracy and a macro F1-score of 0.89. The strong per-class scores for Beds A and B (F1 of 0.96 and 0.98, respectively) confirm that the model reliably discriminates between the two beds when the caregiver is actively attending either one. The lower F1-score for Class 0 (0.75) reflects the inherent difficulty of the transition class: the caregiver is moving between beds or performing non-patient tasks, and the UWB signals are ambiguous during these brief intervals. The CNN+LSTM achieved comparable results (accuracy 0.95, macro F1 0.88), with a slight improvement in Class 0 recall relative to the Simple LSTM. XGBoost matched the Simple LSTM in accuracy (0.96) but obtained a substantially lower F1-score for Class 0 (0.35), indicating that the flattened temporal representation loses discriminative information for the minority transition class. The SVM baseline showed the weakest performance across all classes (macro F1 0.64, near-zero F1 for Class 0), confirming that fixed RBF kernels struggle with the complex non-stationary patterns of this multimodal stream. The performance gap between Simple LSTM and Stacked LSTM (macro F1: 0.89 vs. 0.78) suggests that a deeper recurrent architecture overfits the limited training data, and that a single recurrent layer is sufficient at the one-second resolution used here.

Table 4 reports the accuracy obtained in each leave-one-scene-out fold, where each scene is used once as the held-out test set. The Simple LSTM obtained the highest mean accuracy across the four folds and showed stable performance across scenes. CNN+LSTM and XGBoost achieved comparable mean accuracies, although their behavior differed across folds. CNN+LSTM was particularly stable in Scenes 1–3, whereas XGBoost performed strongly in Scenes 1 and 4 but dropped in Scene 2. The Stacked LSTM showed the largest degradation in Scene 2, suggesting that the deeper recurrent architecture was less robust under scene-specific changes. The SVM baseline exhibited the highest variability across scenes, indicating lower robustness to changes in the multimodal feature distribution.

The confusion matrices in Figure 5 provide a more detailed view of the classification behavior of each model. For the Simple LSTM model, shown separately in Figure 6, most errors are concentrated in the minority Class 0: 12 of the 33 transition windows were classified as Bed A, while none were classified as Bed B. In contrast, the attended-bed classes were highly stable, with only 16 Bed A windows classified as Bed B and 11 Bed B windows classified as Bed A. This behavior is important for the intended application because caregiver–patient interaction time is mainly estimated from the accumulated duration of classes A and B. Therefore, occasional errors in the transition class have a limited effect on per-bed interaction-time estimation, whereas direct confusions between Bed A and Bed B would be more critical. The Simple LSTM and CNN+LSTM models showed the lowest number of such direct bed-to-bed confusions, supporting their suitability for bedside traceability.
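The link between a confusion matrix and the reported scores can be made concrete with a short sketch. The off-diagonal counts 12, 0, 16, and 11 below are the ones quoted in the text for the Simple LSTM. The number of Bed A and Bed B windows predicted as Class 0 is not stated in the text, so both are set to zero here purely as an assumption, and the function names are hypothetical.

```python
def per_class_f1(confusion):
    """F1 per class from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(confusion):
    """Fraction of windows on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Class order: 0 (no interaction), A, B.
# The two zeros in the first column of rows A and B are ASSUMED, not reported.
simple_lstm = [
    [21, 12, 0],    # true Class 0: 33 windows, 12 predicted as Bed A
    [0, 452, 16],   # true Bed A: 468 windows, 16 predicted as Bed B
    [0, 11, 562],   # true Bed B: 573 windows, 11 predicted as Bed A
]
f1 = per_class_f1(simple_lstm)
macro_f1 = sum(f1) / len(f1)
print(round(accuracy(simple_lstm), 2), [round(s, 2) for s in f1], round(macro_f1, 2))
```

Because of the assumed zero cells, the Class 0 and macro scores from this sketch need not match Table 3 exactly. The sketch is meant only to show how a handful of errors in a 33-window class pulls the macro F1-score well below the overall accuracy.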

5. Discussion

The experimental results show that bed-level caregiver–patient traceability can be achieved with a minimal sensing configuration composed of a caregiver-worn smartwatch, two compact UWB anchors, and a local edge gateway. The proposed system does not attempt to reconstruct a full indoor trajectory; instead, it focuses on the more specific and clinically useful question of identifying which bed is being attended at each time window. This design choice reduces infrastructure requirements while preserving the information needed to estimate per-patient bedside interaction time in shared rooms.

The leave-one-scene-out evaluation is particularly relevant because each fold tests the models on a complete scene that was not observed during training. This is stricter than a random window-level split, where neighboring windows from the same recording could appear in both training and testing. Under this protocol, the Simple LSTM achieved the best global balance between accuracy, macro F1-score, and stability across scenes. Its mean accuracy across folds was 0.963 ± 0.031, and the global confusion matrix shows that direct confusions between Bed A and Bed B were limited. This is important because the primary downstream variable is not only the instantaneous class label, but the accumulated time assigned to each patient bed.

The results also clarify the role of temporal modeling. The caregiver’s location and wrist motion are not independent from one second to the next: approaching a bed, performing bedside actions, and leaving the interaction area generate short temporal patterns in both MARG and UWB streams. Sequential models can exploit this structure directly, whereas classical models receive a flattened representation of the same window. XGBoost achieved high overall accuracy, but its much lower F1-score for Class 0 indicates that flattening the temporal context is less effective for modeling transitions. The SVM baseline showed the weakest global performance and the highest variability across scenes, suggesting that a fixed kernel is less suitable for the non-stationary multimodal patterns generated during bedside care.

The transition class deserves special attention. Class 0 represents short periods in which the caregiver is not clearly attending either bed, including movements between beds and non-care actions. These windows are both under-represented and intrinsically ambiguous: UWB distances may be similar for both anchors, while wrist movements are more variable than during stable bedside interaction. As a result, all models obtained lower performance for this class. However, this limitation has a moderate impact on the intended use case, because caregiver–patient interaction time is estimated mainly by accumulating windows classified as Bed A or Bed B. Direct bed-to-bed confusions are therefore more harmful than transition-to-bed errors. The Simple LSTM produced few Bed A/Bed B swaps, which supports its use as the preferred model for estimating attended-bed duration.

From an application perspective, the classification output can be transformed into interaction-time estimates by accumulating the duration of consecutive windows assigned to each bed. Since each labeled window corresponds to a one-second center instant, the predicted sequence provides a second-level trace of caregiver attention. This does not replace clinical judgment and should be interpreted as an automated estimate rather than an exact manual annotation. Nevertheless, it provides an objective and scalable basis for quantifying bedside presence, comparing workload between patients, and identifying interaction patterns that would be difficult to capture through manual observation alone. Compared with previous proximity-based approaches, the proposed system offers three practical advantages. First, patients do not need to wear any device, which improves acceptability and reduces infection-control concerns. Second, only two UWB anchors are required for a two-bed room, avoiding dense RTLS deployments. Third, all communication and processing are performed locally through the Raspberry Pi gateway, avoiding dependence on cloud infrastructure. These characteristics make the approach suitable for controlled hospital rooms and residential-care environments where privacy, simplicity, and low deployment burden are important.
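The accumulation step described at the start of this paragraph can be sketched as a run-length pass over the predicted sequence. The label encoding (`"0"`, `"A"`, `"B"`), the function name, and the example sequence are hypothetical; the one-second step follows the window resolution used in the paper.

```python
from itertools import groupby


def interaction_time(predictions, step_s=1.0, beds=("A", "B")):
    """Accumulate per-bed time and list contiguous bedside episodes.

    `predictions` holds one predicted label per second. Returns the total
    seconds per bed and a list of (bed, start_s, duration_s) episodes.
    """
    totals = {bed: 0.0 for bed in beds}
    episodes = []
    start = 0
    for label, run in groupby(predictions):
        length = len(list(run))
        if label in totals:  # Class 0 runs are skipped but still advance time
            totals[label] += length * step_s
            episodes.append((label, start * step_s, length * step_s))
        start += length
    return totals, episodes


# Synthetic predicted sequence, one label per second (not real model output).
predicted = ["0", "A", "A", "A", "0", "B", "B", "A"]
totals, episodes = interaction_time(predicted)
print(totals)
print(episodes)
```

A transition window misclassified as a bed adds one second to that bed's total, whereas a Bed A/Bed B swap moves a second from one patient to the other. This is why the bed-to-bed confusions matter more for this use case.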

Several limitations should also be acknowledged. The dataset was collected with a single caregiver and four scenes, which limits generalization to different caregivers, care routines, and room layouts. The current system was validated in a two-bed configuration; rooms with more beds would require additional anchors and a revised classification setup. The ground truth was manually annotated through the smartwatch interface, which may introduce small temporal offsets at the beginning and end of interactions. Finally, although the classification results support interaction-time estimation, the present evaluation does not yet report a dedicated time-estimation error metric, such as mean absolute error in seconds per bed. Future work should therefore validate the full time-aggregation pipeline against manually annotated care episodes, include multiple caregivers, and evaluate the system in longer real-world deployments.
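One possible form of the time-estimation metric mentioned above is sketched here. The paper does not define it, so the notation is an assumption: $S$ is the number of evaluated scenes, $\hat{T}_{b,s}$ is the time attributed to bed $b$ in scene $s$ by the classifier, and $T_{b,s}$ is the corresponding manually annotated time, both in seconds.

```latex
\mathrm{MAE}_b = \frac{1}{S} \sum_{s=1}^{S} \left| \hat{T}_{b,s} - T_{b,s} \right|, \qquad b \in \{A, B\}
```

Since errors in opposite directions can cancel within a scene, such a metric would complement, rather than replace, the window-level scores.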

6. Conclusions

This paper has presented a non-invasive smart bedside sensing system for caregiver–patient traceability in shared hospital rooms. The system combines a commercial smartwatch worn by the caregiver, which provides 14-channel MARG data, with two compact UWB anchors placed at the headboard of each monitored bed, which provide bed-level proximity through Two-Way Ranging. It requires no device on the patient and no dedicated localization infrastructure beyond a Raspberry Pi edge gateway, keeping all communication and processing within the local network. Multimodal data were aggregated at one-second resolution, segmented using a past–delayed sliding-window strategy, and classified by five supervised models ranging from classical baselines (SVM, XGBoost) to sequential deep learning architectures (Simple LSTM, Stacked LSTM, CNN+LSTM).

Under a leave-one-scene-out cross-validation protocol, in which each recording session is held out once for testing, the Simple LSTM achieved the best overall behavior, with 96% global accuracy, a macro F1-score of 0.89, and a mean accuracy of $0.963 \pm 0.031$ across the four folds. Beyond aggregate accuracy, the analysis of the confusion matrices showed that the sequential models produced very few direct Bed A/Bed B confusions, concentrating their errors in the under-represented and intrinsically ambiguous transition class. This error structure is favorable for the intended application, since caregiver–patient interaction time is estimated by accumulating the windows attributed to each bed, and occasional transition errors have a limited effect on per-bed time estimates. The comparison between models further confirmed that temporal context spanning several seconds is essential: sequential architectures, which exploit the short motion and proximity patterns generated when approaching, attending, and leaving a bed, consistently outperformed the flattened-window classical baselines on the transition class.

From a practical standpoint, the proposed approach offers three advantages over previous proximity-based systems: patients wear no device, which improves acceptability and reduces infection-control concerns; only two anchors are needed for a two-bed room, avoiding dense RTLS deployments; and all data remain on a local edge gateway, avoiding dependence on cloud infrastructure. Together, these properties make the system suitable for privacy-sensitive clinical and residential-care environments where simplicity and low deployment burden are essential.

Future work will focus on three directions: closing the gap between classification and downstream time estimation by reporting per-bed mean absolute error in seconds against manually annotated care episodes; extending the dataset to multiple caregivers, larger bed configurations, and longer real-world deployments to strengthen generalization; and deploying on-device inference through TinyML frameworks to enable real-time, fully local bedside traceability. By turning wrist-worn and minimal UWB sensing into objective, scalable indicators of bedside attention, the proposed system contributes a practical building block toward automated care-workload assessment in shared hospital rooms.

Author Contributions

Conceptualization, A.P.-R. and J.M.-Q.; methodology, A.P.-R. and J.M.-Q.; software, M.Á.A.-M. and A.E.-E.; validation, A.P.-R., M.Á.A.-M. and J.M.-Q.; formal analysis, A.P.-R., M.Á.A.-M. and J.M.-Q.; investigation, A.P.-R., I.V.-L., M.C.-R., B.R.-M. and J.M.-Q.; resources, I.V.-L., M.C.-R., B.R.-M. and J.M.-Q.; data curation, A.P.-R., A.E.-E., M.C.-R. and B.R.-M.; writing—original draft preparation, A.P.-R., M.Á.A.-M. and J.M.-Q.; writing—review and editing, A.P.-R., A.E.-E., M.Á.A.-M., I.V.-L., M.C.-R., B.R.-M. and J.M.-Q.; visualization, A.P.-R. and M.Á.A.-M.; supervision, J.M.-Q.; project administration, A.P.-R. and J.M.-Q.; funding acquisition, I.V.-L. and J.M.-Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because no patients or vulnerable participants were involved, and data were collected in a controlled environment using non-clinical simulated care activities.

Informed Consent Statement

Written informed consent was obtained from the participant involved in the study and shown in the figures.

Data Availability Statement

The dataset and code used in this study are available upon reasonable request to the corresponding author.

Acknowledgments

The authors acknowledge the support of the project “HERA: Pilotaje de Herramienta de Evaluación y Respuesta Anticipada de cambios conductuales en el seguimiento del paciente con Demencia en el Hogar”, file number AP-0030-2025, led by the principal investigator Maria Isabel Valenzuela López at D.S.A.P. Granada.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cacchione, P.Z. World Health Organization leads the 2021 to 2030 decade of healthy ageing. Clin. Nurs. Res. 2022, 31, 3–4.
2. Urwin, S.; Lau, Y.S.; Grande, G.; Sutton, M. The challenges of measuring informal care time: A review of the literature. PharmacoEconomics 2021, 39, 1209–1223.
3. Westbrook, J.I.; Duffield, C.; Li, L.; Creswick, N.J. How much time do nurses have for patients? A longitudinal study quantifying hospital nurses’ patterns of task time distribution and interactions with health professionals. BMC Health Serv. Res. 2011, 11, 319.
4. Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115.
5. Polo-Rodríguez, A.; Romero-Sanchez, J.; Fernández-García, E.; Paloma-Castro, O.; Porcel-Gálvez, A.M.; Medina-Quero, J. Review on Internet of Things for innovation in nursing process—A PubMed-based search. In Proceedings of the 15th International Conference on Ubiquitous Computing & Ambient Intelligence (UCAmI 2023); Springer, 2023; Volume 835, Lecture Notes in Networks and Systems; pp. 57–70.
6. Nweke, H.F.; Teh, Y.W.; Al-Garadi, M.A.; Alo, U.R. Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges. Expert Syst. Appl. 2018, 105, 233–261.
7. Kamel Boulos, M.N.; Berry, G. Real-time locating systems (RTLS) in healthcare: A condensed primer. Int. J. Health Geogr. 2012, 11, 25.
8. Wichmann, J. Indoor positioning systems in hospitals: A scoping review. Digit. Health 2022, 8, 20552076221081696.
9. Elsanhoury, M.; Makela, P.; Koljonen, J.; Valisuo, P.; Shamsuzzoha, A.; Mantere, T.; Elmusrati, M.; Kuusniemi, H. Precision positioning for smart logistics using ultra-wideband technology-based indoor navigation: A review. IEEE Access 2022, 10, 44413–44445.
10. Polo-Rodríguez, A.; Anguita-Molina, M.Á.; Gil, D.; Romero-Sanchez, J.; Fernández, E.; Paloma-Castro, O.; Porcel-Gálvez, A.M.; Medina-Quero, J. Discovering social interactions between caregivers and frail individuals using indoor localization. In Proceedings of the International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2024); Springer, 2024; Volume 1212, Lecture Notes in Networks and Systems; pp. 319–331.
11. van den Berg, B.; Spauwen, P. Measurement of informal care: An empirical study into the valid measurement of time spent on informal caregiving. Health Econ. 2006, 15, 447–460.
12. Hendrich, A.; Chow, M.P.; Skierczynski, B.A.; Lu, Z. A 36-hospital time and motion study: How do medical-surgical nurses spend their time? Perm. J. 2008, 12, 25–34.
13. Munyisia, E.N.; Yu, P.; Hailey, D. How nursing staff spend their time on activities in a nursing home: An observational study. J. Adv. Nurs. 2011, 67, 1908–1917.
14. Cicirelli, G.; Marani, R.; Petitti, A.; Milella, A.; D’Orazio, T. Ambient Assisted Living: A review of technologies, methodologies and future perspectives for healthy aging of population. Sensors 2021, 21, 3549.
15. Caballero, P.; Ortiz, G.; Medina-Bulo, I. Systematic literature review of ambient assisted living systems supported by the Internet of Things. Univers. Access Inf. Soc. 2024, 23, 1631–1656.
16. Choi, Y.K.; Thompson, H.J.; Demiris, G. Use of an Internet-of-Things smart home system for healthy aging in older adults in residential settings: Pilot feasibility study. JMIR Aging 2020, 3, e21964.
17. Sauzéon, H.; Edjolo, A.; Amieva, H.; Consel, C.; Pérès, K. Effectiveness of an Ambient Assisted Living platform for supporting aging in place of older adults with frailty: Protocol for a quasi-experimental study. JMIR Res. Protoc. 2022, 11, e33351.
18. Girolami, M.; Mavilia, F.; Delmastro, F. Sensing social interactions through BLE beacons and commercial mobile devices. Pervasive Mob. Comput. 2020, 67, 101198.
19. Baronti, P.; Barsocchi, P.; Chessa, S.; Crivello, A.; Girolami, M.; Mavilia, F.; Palumbo, F. Remote detection of social interactions in indoor environments through Bluetooth Low Energy beacons. J. Ambient Intell. Smart Environ. 2020, 12, 203–217.
20. Garcia, C.; Inoue, S. Relabeling for Indoor Localization Using Stationary Beacons in Nursing Care Facilities. Sensors 2024, 24, 319.
21. Polo-Rodríguez, A.; Anguita-Molina, M.A.; Rojas-Ruiz, I.; Medina-Quero, J. Multi-occupant tracking with radar and wearable devices for enhanced accuracy in indoor environments. Eng. Appl. Artif. Intell. 2025, 154, 110872.
22. Anguita-Molina, M.A.; Soto-Hidalgo, J.M.; Medina-Quero, J.; Polo-Rodríguez, A. Evaluation of Edge-Based UWB Indoor Positioning Using Smartwatches and Embedded Anchors. In Proceedings of the UCAmI 2025, LNNS 1818; 2025; pp. 339–350.
23. Anguita-Molina, M.A.; Cardoso, P.J.S.; Rodrigues, J.M.F.; Medina-Quero, J.; Polo-Rodríguez, A. Multioccupancy Activity Recognition Based on Deep Learning Models Fusing UWB Localization Heatmaps and Nearby-Sensor Interaction. IEEE Internet Things J. 2025, 12, 16037–16052.
24. Google LLC. Google Pixel Watch 4 technical specifications. 2024. Accessed: May 2026. Available online: https://store.google.com/product/pixel_watch_4_specs.
25. Qorvo Inc. DWM3001CDK ultra-wideband development kit datasheet. 2024. Accessed: May 2026. Available online: https://www.qorvo.com/products/p/DWM3001CDK.

Figure 1. Deployment of the proposed smart bedside sensing system. One compact UWB anchor (DWM3001CDK) is placed at the headboard of each monitored bed. The caregiver wears a Google Pixel Watch 4 and carries a UWB tag. A Raspberry Pi 4B acts as the edge gateway and Message Queuing Telemetry Transport (MQTT) broker. Patients carry no device.

Figure 2. Sensing hardware. (a) Google Pixel Watch 4 worn on the wrist alongside a Qorvo DWM3001CDK UWB development kit. (b) Smartwatch application displaying real-time UWB status, distance to the nearest anchor, estimated zone (Bed A), and acquisition cycle status.

Figure 3. Hospital deployment scenario. (a) Overview of the shared room with two hospital beds and clinical training mannequins. (b) Caregiver interacting at a patient bed while wearing the Google Pixel Watch 4. (c) Qorvo DWM3001CDK UWB anchor mounted at the headboard. (d) Smartwatch on the caregiver’s wrist. (e) Raspberry Pi 4B edge gateway.

Figure 4. Past–delayed sliding-window segmentation strategy ($L = 11$, $D = 16$ channels). Each window is centered on the instant to be labeled $t$, with $p = 5$ past seconds and $d = 5$ future (delayed) seconds. MARG channels (14) and UWB distance channels (2) are shown as rows. The class label (Bed A, Bed B, or None) is assigned at $t$.

Figure 5. Global confusion matrices obtained by aggregating the predictions of the four leave-one-scene-out folds for each evaluated model. Rows correspond to the true class and columns to the predicted class. Classes are defined as follows: 0 = no-bed interaction or transition, A = interaction with Bed A, and B = interaction with Bed B. The matrices show that most classification errors occur in the minority transition class, whereas the attended-bed classes are generally well separated, especially for the sequential models.
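The aggregation described in this caption can be sketched as follows: a confusion matrix is computed for each held-out scene and the per-fold matrices are summed element-wise. The class order, the function names, and the toy fold predictions are assumptions made for illustration and do not reproduce the counts shown in the figure.

```python
import numpy as np

CLASSES = ["0", "A", "B"]  # assumed class order: transition/no bed, Bed A, Bed B

def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def global_confusion_matrix(folds, classes=CLASSES):
    """Sum per-fold matrices; `folds` is a non-empty list of (y_true, y_pred) pairs."""
    return np.sum([confusion_matrix(t, p, classes) for t, p in folds], axis=0)

# Two toy held-out scenes (hypothetical predictions).
folds = [
    (["A", "A", "0", "B"], ["A", "A", "A", "B"]),
    (["B", "0", "A"], ["B", "0", "A"]),
]
print(global_confusion_matrix(folds))
# [[1 1 0]
#  [0 3 0]
#  [0 0 2]]
```

Summing counts rather than averaging per-fold rates weights each window equally, so longer scenes contribute more to the global matrix.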

Figure 6. Global confusion matrix for the Simple LSTM model, obtained by aggregating the predictions across the four leave-one-scene-out folds. Rows correspond to the true class and columns to the predicted class. The model correctly classifies most Bed A and Bed B windows, while the main source of error is the minority transition class 0, which is frequently confused with Bed A.

Table 1. Comparison of representative related works on proximity-based caregiver–patient interaction sensing.

Work	Technology	Environment	Bed-level	Patient device	Model
[7]	RTLS (IR/BLE)	Hospital	No	Yes	Rule-based
[10]	UWB	Residential	No	No	Threshold
[21]	UWB + mmWave	Residential	No	No	ConvLSTM
[22]	UWB (smartwatch)	Indoor	No	No	–
[18]	BLE beacons	Office/lab	No	Yes	None
[20]	BLE	Nursing care	No	No	Random Forest
[8]	Various RTLS	Hospital	No	Yes	Review
This work	UWB + MARG	Hospital	Yes	No	LSTM/XGBoost

Table 2. Dataset statistics for the hospital scenario after aggregation at one-second resolution. Class 0 = no-bed interaction; MARG sampled at 50 Hz; UWB samples per anchor reported separately. The class distribution corresponds to one-second labeled samples before centered-window generation.

Scene	Duration (min)	1-s labels (0 / A / B)	MARG samples	UWB Anc. A	UWB Anc. B
Scene 1	3.9	7 / 129 / 101	11,864	80	80
Scene 2	5.2	46 / 91 / 177	15,768	90	92
Scene 3	5.0	3 / 104 / 195	15,154	86	87
Scene 4	4.8	6 / 144 / 137	14,384	100	100
Total	19.9	62 / 468 / 610	57,170	356	359
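The counts in Table 2 imply very different rates for the two modalities: the MARG stream delivers about 50 samples per second, while each anchor yields roughly one ranging value every three seconds (for example, 80 readings over the 237 labeled seconds of Scene 1). The sketch below shows one way both streams could be brought to a common one-second grid. Averaging MARG samples within each second and carrying the last UWB reading forward are assumptions made for illustration, since the aggregation rule is not stated in this section, and the function names are hypothetical.

```python
import numpy as np

def aggregate_marg(timestamps_s, samples, n_seconds):
    """Average the MARG samples falling in each one-second bin [k, k + 1)."""
    timestamps_s = np.asarray(timestamps_s, dtype=float)
    samples = np.asarray(samples, dtype=float)
    bins = np.floor(timestamps_s).astype(int)
    out = np.full((n_seconds, samples.shape[1]), np.nan)
    for k in range(n_seconds):
        mask = bins == k
        if mask.any():
            out[k] = samples[mask].mean(axis=0)
    return out

def forward_fill_uwb(timestamps_s, distances_m, n_seconds):
    """Carry the last ranging value forward into seconds without a new reading.

    Assumes non-negative timestamps; seconds before the first reading stay NaN.
    """
    readings = {int(t): dist for t, dist in zip(timestamps_s, distances_m)}
    out = np.full(n_seconds, np.nan)
    last = np.nan
    for k in range(n_seconds):
        last = readings.get(k, last)
        out[k] = last
    return out

# Three seconds of a synthetic 50 Hz channel and two sparse UWB readings.
t = np.arange(150) / 50.0
marg_1s = aggregate_marg(t, t.reshape(-1, 1), n_seconds=3)
uwb_1s = forward_fill_uwb([0.2, 3.1], [1.5, 0.8], n_seconds=5)
print(marg_1s.shape)  # (3, 1)
print(uwb_1s)         # [1.5 1.5 1.5 0.8 0.8]
```

With a forward-fill rule of this kind, the UWB distance channel changes only when a new ranging value arrives, which is one reason several seconds of temporal context around each labeled instant can be informative.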

Table 3. Classification performance of the evaluated models in the hospital scenario. F1-0, F1-A, and F1-B denote per-class F1-scores for the no-interaction class, Bed A, and Bed B, respectively.

Model	Acc.	Macro P	Macro R	Macro F1	F1-0	F1-A	F1-B
Simple LSTM	0.96	0.95	0.86	0.89	0.75	0.96	0.98
Stacked LSTM	0.91	0.81	0.76	0.78	0.52	0.90	0.93
CNN+LSTM	0.95	0.89	0.88	0.88	0.74	0.95	0.97
XGBoost	0.96	0.85	0.73	0.76	0.35	0.95	0.99
SVM	0.89	0.81	0.63	0.64	0.11	0.89	0.92
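Table 3 shows that a model can reach high accuracy while its macro F1-score stays much lower (for instance, XGBoost at 0.96 accuracy and 0.76 macro F1). The sketch below illustrates the mechanism with a deliberately imbalanced toy confusion matrix: macro F1 is the unweighted mean of the per-class F1-scores, so a poorly recognized minority class lowers it even when most windows are classified correctly. The matrix values are hypothetical and are not the paper's results.

```python
import numpy as np

def per_class_f1(cm):
    """Per-class F1 from a confusion matrix (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: windows predicted as each class
    actual = cm.sum(axis=1)     # row sums: windows truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

# Toy matrix, class order (0, A, B): 8 of 10 transition windows are read as Bed A.
toy_cm = np.array([[2, 8, 0],
                   [0, 45, 0],
                   [0, 0, 45]])

accuracy = np.trace(toy_cm) / toy_cm.sum()
f1 = per_class_f1(toy_cm)
macro_f1 = f1.mean()
print(round(accuracy, 2))              # 0.92
print(np.round(f1, 2), round(macro_f1, 2))
```

In this toy case the transition class has an F1-score of one third, which pulls the macro average well below the accuracy even though both bed classes are almost perfectly recognized.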

Table 4. Leave-one-scene-out accuracy by held-out scene. The last column reports the mean and standard deviation across the four folds.

Model	Scene 1	Scene 2	Scene 3	Scene 4	Mean ± SD
Simple LSTM	0.987	0.937	0.936	0.993	0.963 ± 0.031
Stacked LSTM	0.978	0.801	0.979	0.884	0.911 ± 0.085
CNN+LSTM	0.982	0.962	0.951	0.917	0.953 ± 0.027
XGBoost	0.982	0.902	0.961	0.993	0.960 ± 0.040
SVM	0.749	0.892	0.986	0.903	0.882 ± 0.098
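The summary column of Table 4 can be recomputed from the per-scene accuracies. The sketch below uses Python's `statistics` module. Using the sample standard deviation (n − 1 in the denominator) is an assumption, chosen because it reproduces the value reported for the Simple LSTM. Since the per-fold accuracies are themselves rounded to three decimals, the last digit for other rows may differ slightly.

```python
from statistics import mean, stdev

# Per-scene accuracies copied from Table 4 (Scene 1 to Scene 4).
fold_accuracy = {
    "Simple LSTM": [0.987, 0.937, 0.936, 0.993],
    "Stacked LSTM": [0.978, 0.801, 0.979, 0.884],
    "CNN+LSTM": [0.982, 0.962, 0.951, 0.917],
    "XGBoost": [0.982, 0.902, 0.961, 0.993],
    "SVM": [0.749, 0.892, 0.986, 0.903],
}

def summarize(values):
    """Return the mean and the sample standard deviation across folds."""
    return mean(values), stdev(values)

for model, values in fold_accuracy.items():
    m, s = summarize(values)
    print(f"{model}: {m:.3f} ± {s:.3f}")
```

For the Simple LSTM row this gives approximately 0.963 and 0.031, matching the reported summary. With only four folds, the standard deviation is a coarse indicator of stability rather than a precise estimate.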

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
