Submitted: 20 August 2025
Posted: 22 August 2025
Abstract
Keywords:
MSC: 68T05
1. Introduction
- Scale-/time-warp-robust saliency. A topology-aware, multi-feature saliency (persistence and non-maximum suppression) that stabilizes keyframes under noise and tempo variations, yielding sparser yet more stable anchors than signal-level detectors.
- Joint HSMM alignment with explicit durations. Cross-demo training with extended forward–backward/Viterbi recursions; model order selected by a joint criterion combining BIC and alignment error (AE) to balance fit and parsimony.
- Label-free feature-weight self-calibration. CMA-ES on the weight simplex to minimize cross-demo structural dispersion, eliminating hand-tuned fusion and improving phase consistency.
- Calibrated probabilistic encoding for planning. Segment-wise GMR/ProMP (optionally fused with DMP) returning means and covariances that integrate directly with OMPL/MPC for risk-aware execution.
2. Materials and Methods
2.1. Inputs, Outputs, and Assumptions
- Inputs.
- Outputs.
- Assumptions.
- Demonstrations consist of alternating quasi-stationary and transition segments.
- Time deformation is order-preserving (the semantic phase order does not change).
- Observation noise is moderate and can be mitigated by local smoothing and statistical filtering.
2.2. Multi-Feature Analysis and Automatic Segmentation
2.2.1. Data Ingestion and Pre-modulation
2.2.2. Feature Computation and Saliency Fusion
1. Velocity. Let ΔP(t) = P(t+1) − P(t); the speed feature is v(t) = ‖ΔP(t)‖.
2. Acceleration. a(t) = |v(t+1) − v(t)|, the first difference of the speed.
3. Curvature. With ΔP(t) = P(t+1) − P(t), let θ(t) denote the turning angle between ΔP(t−1) and ΔP(t); the discrete curvature is κ(t) = θ(t)/‖ΔP(t)‖.
4. Direction-Change Rate (DCR). Define the unit-direction vector u(t) = ΔP(t)/‖ΔP(t)‖. To avoid numerical issues at very low speed, introduce a threshold ε > 0 and set DCR(t) = ‖u(t) − u(t−1)‖ when ‖ΔP(t)‖ > ε, and DCR(t) = 0 otherwise.
5. Dimensionless fusion. Apply min–max normalization to each feature to obtain f̃ₖ(t) ∈ [0, 1]. For a weight vector w with wₖ ≥ 0 and Σₖ wₖ = 1, define the fused saliency S(t) = Σₖ wₖ f̃ₖ(t).
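The feature-fusion steps above can be sketched compactly. The sketch below is illustrative only: it assumes a 2-D trajectory, equal default weights, a turning-angle discretization of curvature, and simplified index alignment between the feature channels; the paper's exact discretizations may differ.

```python
import math

def fused_saliency(P, weights=(0.25, 0.25, 0.25, 0.25), eps=1e-6):
    """Sketch of multi-feature saliency fusion: speed, speed difference,
    turning-angle curvature, and direction-change rate (DCR) are computed
    from a 2-D trajectory P, min-max normalized, and combined with weights."""
    n = len(P)
    dP = [(P[t + 1][0] - P[t][0], P[t + 1][1] - P[t][1]) for t in range(n - 1)]
    speed = [math.hypot(*d) for d in dP]
    accel = [abs(speed[t + 1] - speed[t]) for t in range(len(speed) - 1)]
    # Unit directions, with a low-speed threshold to avoid numerical issues.
    unit = [(d[0] / s, d[1] / s) if s > eps else (0.0, 0.0)
            for d, s in zip(dP, speed)]
    dcr = [math.hypot(unit[t + 1][0] - unit[t][0], unit[t + 1][1] - unit[t][1])
           for t in range(len(unit) - 1)]
    # Turning angle per unit arc length as a discrete curvature proxy.
    curv = []
    for t in range(len(dP) - 1):
        dot = unit[t][0] * unit[t + 1][0] + unit[t][1] * unit[t + 1][1]
        ang = math.acos(max(-1.0, min(1.0, dot)))
        curv.append(ang / (speed[t] + eps))
    def minmax(f):
        lo, hi = min(f), max(f)
        return [(v - lo) / (hi - lo + eps) for v in f]
    feats = [minmax(speed[:n - 2]), minmax(accel), minmax(curv), minmax(dcr)]
    return [sum(w * f[t] for w, f in zip(weights, feats))
            for t in range(n - 2)]
```

For an L-shaped path the fused saliency peaks at the corner, where curvature and DCR both spike while speed stays flat.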
2.2.3. Keyframe Extraction with Topological Simplification
1. Candidate extrema via quantile thresholds:
   - set a high-quantile threshold on the fused saliency;
   - collect indices exceeding it and snap each to the nearest local extremum within a radius-3 neighborhood.
2. Persistence thresholding (scale-invariant importance).
3. Non-maximum suppression (NMS).
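Steps 2 and 3 admit a minimal sketch under stated assumptions: 0-dimensional persistence of local maxima is computed with a standard union-find sweep over samples in decreasing order, and NMS then enforces a minimum index separation between retained peaks. The quantile-candidate stage is folded into the persistence threshold here, so this is not the paper's exact pipeline.

```python
def peak_persistence(x):
    """0-dimensional persistence of local maxima of a 1-D signal: sweep
    samples in decreasing order, merging superlevel-set components with
    union-find; a component's peak dies when it merges into a higher one,
    with persistence = peak height minus merge height."""
    n = len(x)
    parent, comp_peak, pers = {}, {}, {}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in sorted(range(n), key=lambda k: -x[k]):
        parent[i] = i
        comp_peak[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # The component whose peak is lower dies at level x[i].
            if x[comp_peak[ri]] < x[comp_peak[rj]]:
                ri, rj = rj, ri
            pers[comp_peak[rj]] = x[comp_peak[rj]] - x[i]
            parent[rj] = ri
    for r in {find(i) for i in range(n)}:
        pers[comp_peak[r]] = float("inf")  # the global maximum never dies
    return pers

def select_keyframes(x, min_persistence, min_gap):
    """Keep peaks above a persistence threshold, then apply NMS: greedily
    accept peaks by descending height, enforcing a minimum index gap."""
    peaks = [i for i, p in peak_persistence(x).items() if p >= min_persistence]
    kept = []
    for i in sorted(peaks, key=lambda k: -x[k]):
        if all(abs(i - k) >= min_gap for k in kept):
            kept.append(i)
    return sorted(kept)
```

Because persistence compares each peak against the level at which it merges into a taller neighbor, the pruning is invariant to uniform rescaling of the signal, which is the scale-invariance property the text relies on.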
2.2.4. Adaptive Feature-Weight Learning
1. Consistency function.
2. Objective and constraints.
3. Solver and feasibility.
4. Computational profile.
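The self-calibration loop can be sketched as follows. The paper uses CMA-ES on the weight simplex (the `cmaes` library cited in the references implements the full algorithm); as a compact stand-in, this sketch uses a (1+1) evolution strategy with a softmax reparametrization so that every candidate automatically satisfies the simplex constraints. The `dispersion` callable and all parameter values are illustrative assumptions.

```python
import math
import random

def softmax(v):
    # Map unconstrained parameters onto the probability simplex.
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def calibrate_weights(dispersion, dim, iters=500, sigma=0.3, seed=0):
    """Label-free weight self-calibration, sketched as a (1+1)-ES:
    mutate unconstrained parameters, map them through softmax, and accept
    a mutation iff it does not increase the cross-demo dispersion objective."""
    rng = random.Random(seed)
    theta = [0.0] * dim                 # softmax(0) = uniform weights
    best = dispersion(softmax(theta))
    for _ in range(iters):
        cand = [t + sigma * rng.gauss(0, 1) for t in theta]
        f = dispersion(softmax(cand))
        if f <= best:
            theta, best = cand, f
    return softmax(theta), best
```

The softmax reparametrization is one common way to keep an unconstrained optimizer on the simplex; CMA-ES with resampling or projection, as in the paper, is another.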
2.3. Multi-Demo Alignment and Segmentation via a Duration-Explicit HSMM
2.3.1. Model and Generative Mechanism
a) Initial phase: sample the first phase from the initial distribution.
b) Duration: for the current phase, sample a dwell length from its duration distribution.
c) Observations: for each time step within the sampled dwell, emit an observation from the current phase's emission model.
d) Transition: sample the next phase from the transition matrix; terminal transitions end the sequence.
- Observation design. We concatenate kinematic descriptors (e.g., curvature, speed magnitude, DCR) and any available modalities (pose, force/tactile, depth) into a single observation vector.
- Duration choices. We use either discrete duration distributions with finite support starting at 1, or truncated Gaussian/Gamma families to accommodate unequal dwell times [39].
2.3.2. Parameter Estimation: Extended Baum–Welch
1. Forward variable (phase leaving at time t).
2. Backward variable.
3. Posteriors (E-step).
4. M-step (closed forms).
Numerical stability. All recursions are implemented in the log domain using log-sum-exp.
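The explicit-duration forward recursion in the log domain can be sketched as follows. The notation (`log_pi`, `log_A`, `log_dur`, `log_emit`) and the no-self-transition convention are illustrative assumptions rather than the paper's exact implementation; the sum for a segment ending at time t runs over admissible durations d and predecessor phases i.

```python
import math
from itertools import product

def logsumexp(vals):
    # Numerically stable log(sum(exp(v))) over an iterable of log-values.
    vals = list(vals)
    if not vals:
        return -math.inf
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

def hsmm_forward_loglik(obs, log_pi, log_A, log_dur, log_emit):
    """Explicit-duration HSMM forward pass in the log domain.
    log_alpha[t][j] accumulates sequences in which a segment of phase j
    ends exactly at time t (1-indexed); self-transitions are disallowed."""
    T, J = len(obs), len(log_pi)
    Dmax = len(log_dur[0])
    log_alpha = [[-math.inf] * J for _ in range(T + 1)]
    for t in range(1, T + 1):
        for j in range(J):
            terms = []
            for d in range(1, min(Dmax, t) + 1):
                # Log-likelihood of the d observations inside the segment.
                seg = sum(log_emit[j][obs[s]] for s in range(t - d, t))
                if t - d == 0:
                    terms.append(log_pi[j] + log_dur[j][d - 1] + seg)
                else:
                    prev = logsumexp(log_alpha[t - d][i] + log_A[i][j]
                                     for i in range(J) if i != j)
                    terms.append(prev + log_dur[j][d - 1] + seg)
            log_alpha[t][j] = logsumexp(terms)
    return logsumexp(log_alpha[T])
```

As a sanity check, when every duration distribution puts all mass on d = 1, the recursion reduces to a standard HMM forward pass and the likelihood sums to one over all observation sequences of a fixed length.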
2.3.3. Semantic Time Axis: Decoding and Outputs
2.3.4. Alignment Quality, Model Selection, and Robustness
2.4. Statistical Motion Primitives and Probabilistic Generation
2.4.1. Dynamic Movement Primitives (DMP)
1. Single-segment dynamics.
2. Segment coupling and smoothness.
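A single-segment DMP rollout can be sketched with the standard transformation system τẏ = z, τż = α_z(β_z(g − y) − z) + f(s) and an exponential canonical phase. The gains, the Euler integration, and the zero forcing term below are illustrative defaults, not the paper's tuned values; a learned forcing term would be passed via `forcing`.

```python
def dmp_rollout(y0, g, T=200, tau=1.0, alpha_z=25.0, beta_z=6.25, forcing=None):
    """Euler rollout of a 1-D DMP transformation system:
        tau*dy = z,   tau*dz = alpha_z*(beta_z*(g - y) - z) + f(s),
    with exponential canonical phase s. With f = 0 the system is a
    critically damped spring that converges to the goal g."""
    dt = 1.0 / T
    alpha_s = 4.0                       # canonical-system decay rate
    y, z, s = y0, 0.0, 1.0
    traj = [y]
    for _ in range(T):
        f = forcing(s) if forcing else 0.0
        dz = (alpha_z * (beta_z * (g - y) - z) + f) / tau
        dy = z / tau
        z += dz * dt
        y += dy * dt
        s += (-alpha_s * s / tau) * dt  # phase decays toward 0
        traj.append(y)
    return traj
```

The β_z = α_z/4 choice makes the unforced system critically damped, which is what gives DMPs their low-jerk, goal-convergent behavior under simple time scaling (adjust τ to re-time the segment).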
2.4.2. Gaussian Mixture Modeling and Regression (GMM/GMR)
2.4.3. Probabilistic Movement Primitives (ProMP)
2.4.4. Model Choice and Complementarity
- DMP excels at real-time execution and low jerk with simple time scaling;
- GMM/GMR offers closed-form means and covariances over phase and is convenient for planners needing analytic gradients;
- ProMP provides a distribution over shapes with exact linear-Gaussian conditioning, ideal for multi-goal tasks and collaboration.
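The "exact linear-Gaussian conditioning" credited to ProMP above has a closed form worth spelling out: given a Gaussian prior over basis weights and a via-point (t*, y*), the posterior weight mean and covariance follow from standard Gaussian conditioning. The RBF basis, bandwidth, and 1-D setting below are illustrative assumptions.

```python
import math

def rbf(t, centers, h=0.05):
    # Normalized Gaussian radial basis functions over phase t in [0, 1].
    phi = [math.exp(-(t - c) ** 2 / (2 * h)) for c in centers]
    s = sum(phi)
    return [p / s for p in phi]

def promp_condition(mu_w, Sigma_w, t_star, y_star, sigma_y2=1e-6, centers=None):
    """Condition a 1-D ProMP N(mu_w, Sigma_w) on a via-point (t*, y*):
        mu'    = mu + Sigma phi (y* - phi^T mu) / (phi^T Sigma phi + sigma_y^2)
        Sigma' = Sigma - Sigma phi phi^T Sigma / (phi^T Sigma phi + sigma_y^2)
    """
    K = len(mu_w)
    centers = centers or [k / (K - 1) for k in range(K)]
    phi = rbf(t_star, centers)
    Sphi = [sum(Sigma_w[i][j] * phi[j] for j in range(K)) for i in range(K)]
    denom = sum(phi[i] * Sphi[i] for i in range(K)) + sigma_y2
    resid = y_star - sum(phi[i] * mu_w[i] for i in range(K))
    mu_new = [mu_w[i] + Sphi[i] * resid / denom for i in range(K)]
    Sigma_new = [[Sigma_w[i][j] - Sphi[i] * Sphi[j] / denom for j in range(K)]
                 for i in range(K)]
    return mu_new, Sigma_new, centers
```

Because the update is exact and closed-form, conditioning on goals or collaboration constraints costs one rank-one update per via-point, which is why ProMP suits multi-goal tasks.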
3. Experiments and Results
3.1. Objectives and Evaluation Protocol
- Segmentation robustness. Do multi-feature saliency and topological persistence yield sparse yet structurally stable keyframes under heterogeneous noise and tempo variations?
- Semantic alignment quality. Does duration-explicit HSMM reduce cross-demonstration time dispersion when non-geometric dwelling is present (e.g., hover, wait)?
- Generator calibration. On the shared semantic time base, do segment-wise probabilistic models achieve low reconstruction error, nominal uncertainty coverage, and dynamically schedulable executions?
3.2. Tasks, Datasets, and Testable Hypotheses
- Domain A—UAV-Sim (multi-scene flight). 100 Hz sampling. Subtasks include take-off–lift–cruise–drop and gate-pass–loiter–gate-pass. Six subjects, 20–30 segments per task. Observations: tool-center position (optional yaw). Figure 7 shows the environment and demonstrations.
- Domain B—AV-Sim (CARLA/MetaDrive urban). 10 Hz sampling across Town01–05, varied weather/lighting and traffic control. Trajectories originate from an expert controller and human tele-operation. Observations: . See Figure 8.
- Domain C—Manip-Sim (robomimic/RLBench assembly). 50–100 Hz sampling. Tasks akin to RoboTurk “square-nut”: grasp–align–insert with pronounced dwell segments. Observations: end-effector position. See Figure 9.
3.3. Metrics and Statistical Inference
- SOD (Eq. (2.10), min): structural dispersion—mean point-to-point divergence on the shared time base.
- AE (Eq. (2.17), min): Euclidean dispersion of phase end times across demonstrations.
- AAR (max): action acquisition rate. Given reference key actions (expert consensus / common boundaries) and detected keyframes, we count a hit if a detected time lies within δ of a reference time, with δ = 5 samples (i.e., 50 ms at 100 Hz; 0.5 s at 10 Hz).
- GRE (min): geometric reconstruction error (RMSE).
- TCR (max): time compression rate.
- Jerk (min): mean squared jerk of the generated trajectory (normalized).
- CR (target ≈ 95%): nominal coverage. For each semantic segment we sample uniformly in time; if the observed point lies within the predicted 2σ interval, it is counted as covered; segment-level coverage is averaged and then length-weighted globally.
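The AAR metric above can be sketched directly. The greedy one-to-one matching (each detected keyframe may satisfy at most one reference action) is an assumption of this sketch; the paper does not specify its matching rule.

```python
def aar(reference, detected, delta):
    """Action acquisition rate: fraction of reference key actions matched by
    a detected keyframe within +/- delta samples, greedy one-to-one matching."""
    used = set()
    hits = 0
    for t_ref in reference:
        best = None
        for i, t_det in enumerate(detected):
            if i in used or abs(t_det - t_ref) > delta:
                continue
            # Prefer the closest unused detection within the tolerance window.
            if best is None or abs(t_det - t_ref) < abs(detected[best] - t_ref):
                best = i
        if best is not None:
            used.add(best)
            hits += 1
    return hits / len(reference) if reference else 0.0
```

At 100 Hz, delta = 5 corresponds to the 50 ms tolerance used in the protocol.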
3.4. Baselines and Fairness Controls
3.5. Overall Results
- Domain A—UAV-Sim (Table 2)
- Domain B—AV-Sim (Table 3)
| Method | SOD (m) | AE (s) | GRE (m) | TCR (%) | AAR (%) | Jerk | CR (%) |
|---|---|---|---|---|---|---|---|
| Curvature + quantile | 0.172 ± 0.030 | 0.70 ± 0.14 | 0.247 ± 0.041 | 32.7 ± 5.2 | 66.0 ± 6.7 | 1.18 ± 0.09 | – |
| Multi-feat (equal), no TDA/NMS | 0.160 ± 0.029 | 0.63 ± 0.12 | 0.231 ± 0.038 | 39.5 ± 5.0 | 71.6 ± 6.1 | 1.14 ± 0.08 | – |
| Multi-feat + TDA/NMS + HMM | 0.148 ± 0.027 | 0.55 ± 0.11 | 0.214 ± 0.035 | 47.4 ± 4.7 | 78.8 ± 5.4 | 1.08 ± 0.07 | – |
| Ours: +TDA/NMS+HSMM+ProMP | 0.112 ± 0.022 | 0.37 ± 0.08 | 0.191 ± 0.033 | 47.5 ± 4.6 | 86.3 ± 5.0 | 1.00 ± 0.06 | 95.1 ± 2.5 |
- Domain C—Manip-Sim (Table 4)
| Method | SOD (mm) | AE (s) | GRE (mm) | TCR (%) | AAR (%) |
|---|---|---|---|---|---|
| Curvature + quantile | 1.12 ± 0.27 | 0.33 ± 0.09 | 1.02 ± 0.19 | 30.0 ± 4.8 | 65.3 ± 6.3 |
| Multi-feat (equal), no TDA/NMS | 0.94 ± 0.24 | 0.28 ± 0.08 | 0.86 ± 0.16 | 33.0 ± 5.0 | 69.1 ± 5.8 |
| Multi-feat + TDA/NMS + HMM | 0.87 ± 0.22 | 0.29 ± 0.08 | 0.91 ± 0.17 | 35.1 ± 5.1 | 70.0 ± 5.7 |
| BOCPD | 0.98 ± 0.25 | 0.31 ± 0.09 | 1.05 ± 0.20 | 45.0 ± 5.6 | 71.9 ± 5.4 |
| Ours: +TDA/NMS+HSMM+ProMP | 0.72 ± 0.18 | 0.24 ± 0.07 | 0.79 ± 0.15 | 49.2 ± 4.8 | 83.4 ± 5.1 |
3.6. Contribution Attribution: Ablation Study (UAV-Sim)
- Remove TDA (keep NMS): SOD +12.3%, AE +9.6% → persistence is key to scale-invariant noise rejection; without it, small-scale oscillations stack into spurious peak–valley pairs, degrading structure and boundaries.
- Remove NMS (keep TDA): AE +21.1% → suppressing same-polarity peak clusters in high-energy regions is critical for boundary stability; persistence alone cannot prevent multi-response.
- Fix equal weights (no weight learning): SOD +18.6%, AAR −7.7 pp → consistency-driven weight learning mitigates channel scale imbalance and improves key-action capture.
- Replace HSMM with HMM: AE +31.2% → direct evidence of the geometric-duration bias when dwell exists (wait/loiter).
3.7. Robustness: Time-Warping, Noise, and Missing Data (AV-Sim)
3.8. Model Selection and the Sparsity–Fidelity Trade-off
3.9. In-Segment Generation: Accuracy, Smoothness, and Calibration
- Geometry: ProMP ≈ GMR < DMP < linear in GRE.
- Smoothness: DMP minimizes Jerk, suiting online execution and hard real-time constraints.
- Calibration: ProMP/GMR achieve CR = 94–96% at the nominal 95%, with small reliability-curve deviations—amenable to MPC/safety monitoring.
3.10. Summary of Findings
4. Discussion
- Evidence vs. hypotheses.
- (H1) Segmentation. The topology-aware saliency (multi-feature + persistence + NMS) yields sparse yet stable anchors: removing persistence or NMS increases SOD/AE by 9–21% and lowers AAR (ablation), confirming that both scale-invariant pruning and peak-cluster suppression are necessary.
- (H2) Alignment. Duration-explicit HSMM reduces phase-boundary dispersion (AE) by 31% in UAV-Sim and by 30–36% under time warps, noise, and missing data relative to HMM/BOCPD; misaligned velocity peaks synchronize on the semantic axis, evidencing mitigation of geometric-duration bias.
- (H3) Generation. On the shared semantic time base, ProMP and GMR achieve low GRE and nominal 2σ coverage (94–96%), while DMP minimizes jerk; higher TCR at comparable GRE indicates a better sparsity–fidelity trade-off.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Correia, A.; Alexandre, L.A. A survey of demonstration learning. Robotics and Autonomous Systems 2024, 182, 104812. [Google Scholar] [CrossRef]
- Jin, W.; Murphey, T.D.; Kulić, D.; et al. Learning from sparse demonstrations. IEEE Transactions on Robotics 2022, 39, 645–664. [Google Scholar] [CrossRef]
- Lee, D.; Yu, S.; Ju, H.; et al. Weakly supervised temporal anomaly segmentation with dynamic time warping. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2021; pp. 7355–7364. [Google Scholar]
- Braglia, G.; Tebaldi, D.; Lazzaretti, A.E.; et al. Arc-length-based warping for robot skill synthesis from multiple demonstrations. arXiv 2024, arXiv:2410.13322. [Google Scholar]
- Si, W.; Wang, N.; Yang, C. A review on manipulation skill acquisition through teleoperation-based learning from demonstration. Cognitive Computation and Systems 2021, 3, 1–16. [Google Scholar] [CrossRef]
- Arduengo, M.; Colomé, A.; Lobo-Prat, J.; et al. Gaussian-process-based robot learning from demonstration. Journal of Ambient Intelligence and Humanized Computing 2023, 1–14. [Google Scholar] [CrossRef]
- Tavassoli, M.; et al. Learning skills from demonstrations: A trend from motion primitives to experience abstraction. IEEE Transactions on Cognitive and Developmental Systems 2023, 16, 57–74. [Google Scholar] [CrossRef]
- Ansari, A.F.; Benidis, K.; Kurle, R.; et al. Deep explicit duration switching models for time series. In Advances in Neural Information Processing Systems (NeurIPS); 2021; 34, 29949–29961.
- Sosa-Ceron, A.D.; Gonzalez-Hernandez, H.G.; Reyes-Avendaño, J.A. Learning from demonstrations in human–robot collaborative scenarios: A survey. Robotics 2022, 11, 126. [Google Scholar] [CrossRef]
- Ruiz-Suarez, S.; Leos-Barajas, V.; Morales, J.M. Hidden Markov and semi-Markov models: When and why are these models useful for classifying states in time series data. Journal of Agricultural, Biological and Environmental Statistics 2022, 27, 339–363. [Google Scholar] [CrossRef]
- Pohle, J.; Adam, T.; Beumer, L.T. Flexible estimation of the state dwell-time distribution in hidden semi-Markov models. Computational Statistics & Data Analysis 2022, 172, 107479. [Google Scholar]
- Wang, X.; Li, J.; Xu, G.; et al. A novel zero--velocity interval detection algorithm for a pedestrian navigation system with foot--mounted inertial sensors. Sensors 2024, 24, 838. [Google Scholar] [CrossRef]
- Haussler, A.M.; Tueth, L.E.; May, D.S.; et al. Refinement of an algorithm to detect and predict freezing of gait in Parkinson disease using wearable sensors. Sensors 2024, 25, 124. [Google Scholar] [CrossRef]
- Altamirano, M.; Briol, F.X.; Knoblauch, J. Robust and scalable Bayesian online changepoint detection. In Proceedings of the International Conference on Machine Learning (ICML); PMLR, 2023; pp. 642–663. [Google Scholar]
- Sellier, J.; Dellaportas, P. Bayesian online change point detection with Hilbert-space approximate Student-t process. In Proceedings of the International Conference on Machine Learning (ICML); PMLR, 2023; pp. 30553–30569. [Google Scholar]
- Tsaknaki, I.Y.; Lillo, F.; Mazzarisi, P. Bayesian autoregressive online change-point detection with time-varying parameters. Communications in Nonlinear Science and Numerical Simulation 2025, 142, 108500. [Google Scholar] [CrossRef]
- Buchin, K.; Nusser, A.; Wong, S. Computing continuous dynamic time warping of time series in polynomial time. arXiv 2022, arXiv:2203.04531. [Google Scholar] [CrossRef]
- Wang, L.; Koniusz, P. Uncertainty-DTW for time series and sequences. In European Conference on Computer Vision (ECCV); Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 176–195. [Google Scholar]
- Mikheeva, O.; Kazlauskaite, I.; Hartshorne, A.; et al. Aligned multi-task Gaussian process. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS); PMLR, 2022; pp. 2970–2988. [Google Scholar]
- Saveriano, M.; Abu-Dakka, F.J.; Kramberger, A.; Peternel, L. Dynamic movement primitives in robotics: A tutorial survey. The International Journal of Robotics Research 2023, 42, 1133–1184. [Google Scholar] [CrossRef]
- Barekatain, A.; Habibi, H.; Voos, H. A practical roadmap to learning from demonstration for robotic manipulators in manufacturing. Robotics 2024, 13, 100. [Google Scholar] [CrossRef]
- Urain, J.; Mandlekar, A.; Du, Y.; Shafiullah, M.; Xu, D.; Fragkiadaki, K.; Chalvatzaki, G.; Peters, J. Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations. arXiv 2024, arXiv:2408.04380. [Google Scholar] [CrossRef]
- Vélez-Cruz, N. A survey on Bayesian nonparametric learning for time series analysis. Frontiers in Signal Processing 2024, 3, 1287516. [Google Scholar] [CrossRef]
- Tanwani, A.K.; Yan, A.; Lee, J.; et al. Sequential robot imitation learning from observations. The International Journal of Robotics Research 2021, 40, 1306–1325. [Google Scholar] [CrossRef]
- Bonzanini, A.D.; Mesbah, A.; Di Cairano, S. Perception-aware chance-constrained model predictive control for uncertain environments. In Proceedings of the 2021 American Control Conference (ACC); IEEE, 2021; pp. 2082–2087. [Google Scholar]
- El-Yaagoubi, A.B.; Chung, M.K.; Ombao, H. Topological data analysis for multivariate time series data. Entropy 2023, 25, 1509. [Google Scholar] [CrossRef] [PubMed]
- Nomura, M.; Shibata, M. cmaes: A simple yet practical Python library for CMA-ES. arXiv 2024, arXiv:2402.01373. [Google Scholar]
- Schafer, R.W. What is a Savitzky–Golay filter? IEEE Signal Processing Magazine 2011, 28, 111–117. [Google Scholar] [CrossRef]
- Tapp, K. Differential Geometry of Curves and Surfaces; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Gorodski, C. A Short Course on the Differential Geometry of Curves and Surfaces; Lecture Notes, University of São Paulo: São Paulo, Brazil, 2023. [Google Scholar]
- Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of persistence diagrams. Discrete & Computational Geometry 2007, 37, 103–120. [Google Scholar]
- Satopaa, V.; Albrecht, J.; Irwin, D.; Raghavan, B. Finding a “Kneedle” in a haystack: Detecting knee points in system behavior. In Proceedings of the ICDCS Workshops; 2011; pp. 166–171. [Google Scholar]
- Skraba, P.; Turner, K. Wasserstein stability for persistence diagrams. arXiv 2025, arXiv:2006.16824v7. [Google Scholar]
- Hansen, N. The CMA Evolution Strategy: A Tutorial. arXiv 2016, arXiv:1604.00772. [Google Scholar] [CrossRef]
- Singh, G.S.; Acerbi, L. PyBADS: Fast and robust black-box optimization in Python. Journal of Open Source Software 2024, 9, 5694. [Google Scholar] [CrossRef]
- Akimoto, Y.; Auger, A.; Glasmachers, T.; Morinaga, D. Global linear convergence of evolution strategies on more--than--smooth strongly convex functions. SIAM Journal on Optimization 2022, 32, 1402–1429. [Google Scholar] [CrossRef]
- Yu, S.-Z. Hidden semi-Markov models. Artificial Intelligence 2010, 174, 215–243. [Google Scholar] [CrossRef]
- Chiappa, S. Explicit-duration Markov switching models. Foundations and Trends in Machine Learning 2014, 7, 803–886. [Google Scholar] [CrossRef]
- Merlo, L.; Maruotti, A.; Petrella, L.; Punzo, A.; et al. Quantile hidden semi-Markov models for multivariate time series. Statistics and Computing 2022, 32, 61. [Google Scholar] [CrossRef] [PubMed]
- Jurafsky, D.; Martin, J.H. Speech and Language Processing, 3rd ed.; Draft. Available online: https://web.stanford.edu/~jurafsky/slp3/A.pdf (accessed on 18 August 2025).
- Yu, S.-Z.; Kobayashi, H. An efficient forward–backward algorithm for an explicit-duration hidden Markov model. IEEE Signal Processing Letters 2003, 10, 11–14. [Google Scholar]
| Property | DMP | GMM/GMR | ProMP |
|---|---|---|---|
| Shape representation | Basis functions + 2nd-order stable system | Global Gaussian mixture over | Gaussian over weights w |
| Duration adaptation | Via time scaling | Requires resampling in phase | Basis-phase re-timing |
| Uncertainty | No closed-form (MC if needed) | Analytic | Analytic posterior over w |
| Online constraints | Endpoints/velocities easy | Refit or constrained regression | Exact linear-Gaussian conditioning |
| Execution smoothness | Low jerk (native dynamics) | Depends on mixture fit | Depends on basis and priors |
| Method | SOD (m) | AE (s) | GRE (m) | TCR (%) | AAR (%) | Jerk | CR (%) |
|---|---|---|---|---|---|---|---|
| Curvature + quantile | 0.081 ± 0.019 | 0.55 ± 0.11 | 0.124 ± 0.027 | 34.9 ± 4.8 | 68.1 ± 6.0 | 1.22 ± 0.10 | – |
| Multi-feat (equal), no TDA/NMS | 0.071 ± 0.017 | 0.47 ± 0.10 | 0.110 ± 0.023 | 41.8 ± 5.1 | 73.8 ± 5.7 | 1.16 ± 0.08 | – |
| Multi-feat + TDA/NMS + HMM | 0.060 ± 0.014 | 0.41 ± 0.09 | 0.098 ± 0.019 | 54.6 ± 4.3 | 81.0 ± 4.8 | 1.08 ± 0.07 | – |
| BOCPD | 0.064 ± 0.016 | 0.46 ± 0.12 | 0.105 ± 0.022 | 51.0 ± 4.6 | 78.3 ± 5.4 | 1.14 ± 0.08 | – |
| Ours: +TDA/NMS+HSMM+ProMP | 0.045 ± 0.012 | 0.28 ± 0.07 | 0.082 ± 0.018 | 55.0 ± 4.0 | 88.7 ± 4.2 | 1.00 ± 0.06 | 94.9 ± 2.6 |
| Variant | ΔSOD | ΔAE | ΔGRE | ΔTCR | ΔAAR |
|---|---|---|---|---|---|
| No TDA (NMS only) | 12.30% | 9.60% | 6.70% | −9.4% | −5.8% |
| No NMS (TDA only) | 9.20% | 21.10% | 7.30% | −3.7% | −6.1% |
| Fixed equal weights (no weight learning) | 18.60% | 14.40% | 9.10% | −1.1% | −7.7% |
| HMM in place of HSMM | 23.50% | 31.20% | 17.50% | ≈ 0 | −8.5% |
| Perturbation | Setting | HMM (baseline) | HSMM (ours) | Δ |
|---|---|---|---|---|
| Time-warp | | 0.61 | 0.39 | −36% |
| Time-warp | | 0.58 | 0.38 | −34% |
| Gaussian noise | | 0.57 | 0.40 | −30% |
| Missing data | 20% random drop | 0.63 | 0.41 | −35% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).