Submitted: 14 November 2024
Posted: 18 November 2024
Abstract
Keywords:
1. Introduction
2. Related Works
3. Materials
3.1. Data
- AXIMU, AYIMU, AZIMU: Acceleration measured by the IMU
- RAIMU, RBIMU, RCIMU: Rate of rotation measured by the IMU
- MXIMU, MYIMU, MZIMU: Magnetic field measured by the IMU
- PXGPS, PYGPS, PZGPS: Position measured by the GPS
- VXGPS, VYGPS, VZGPS: Velocity measured by the GPS
- EGPS: Position accuracy estimation provided by the GPS (PDOP)
- PXFUSION, PYFUSION, PZFUSION: Position computed by the sensor fusion algorithm
- VXFUSION, VYFUSION, VZFUSION: Velocity computed by the sensor fusion algorithm
- OAFUSION, OBFUSION, OCFUSION: Orientation computed by the sensor fusion algorithm
- EFUSION: Position accuracy estimation provided by the sensor fusion algorithm
3.2. Testing Subdatasets
- Schema 1: The beginning of the session is masked
- Schema 2: A section within the session is masked, while the beginning and end of the session remain unmasked (at least 2 seconds are left unmasked at both the start and end).
- Schema 3: The end of the session is masked
- GPS features: PXGPS, PYGPS, PZGPS, VXGPS, VYGPS, VZGPS, EGPS
- Sensor fusion algorithm output features: PXFUSION, PYFUSION, PZFUSION, VXFUSION, VYFUSION, VZFUSION, OAFUSION, OBFUSION, OCFUSION, EFUSION
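The three masking schemas can be read as index windows over a session. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `schema_window`, its parameters, and the choice of centring the Schema 2 window are hypothetical. Only the positions of the masks and the 2-second margin for Schema 2 come from the list above.

```python
import math


def schema_window(schema, n_samples, mask_len, rate_hz, margin_s=2.0):
    """Return (start, stop) sample indices of the masked window, stop exclusive."""
    if not 0 < mask_len <= n_samples:
        raise ValueError("mask_len must be between 1 and n_samples")
    if schema == 1:
        # Schema 1: the beginning of the session is masked.
        return 0, mask_len
    if schema == 3:
        # Schema 3: the end of the session is masked.
        return n_samples - mask_len, n_samples
    if schema == 2:
        # Schema 2: an interior section is masked, keeping at least
        # `margin_s` seconds unmasked at both ends of the session.
        margin = math.ceil(margin_s * rate_hz)
        if mask_len + 2 * margin > n_samples:
            raise ValueError("session too short to keep the required margins")
        # Centring the window is an assumption made for this sketch.
        start = (n_samples - mask_len) // 2
        return start, start + mask_len
    raise ValueError("schema must be 1, 2 or 3")


# A 10 s session at 100 Hz with a 3 s masked window.
print(schema_window(1, 1000, 300, 100))  # (0, 300)
print(schema_window(2, 1000, 300, 100))  # (350, 650)
print(schema_window(3, 1000, 300, 100))  # (700, 1000)
```

Within the returned window, the GPS and sensor fusion output features listed above would be the ones hidden from the model, while the IMU features stay available.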
3.3. Preprocessing
4. Approach
4.1. Flow
4.2. Network Architecture
4.3. Network Training
- Strategy 1: Mask applied at the start of the sequence
- Strategy 2: Mask applied in the center of the sequence
- Strategy 3: Mask applied at the end of the sequence
- Strategy 4: Mask applied at a random position
- GPS features: PXGPS, PYGPS, PZGPS, VXGPS, VYGPS, VZGPS, EGPS
- Sensor fusion algorithm output features: PXFUSION, PYFUSION, PZFUSION, VXFUSION, VYFUSION, VZFUSION, OAFUSION, OBFUSION, OCFUSION, EFUSION
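As an illustration of how the four training strategies could be applied to one sequence, the sketch below picks the first masked index and overwrites the GPS and sensor fusion columns. It is a hedged example, not the authors' code: the helper names `strategy_start` and `apply_mask`, the row-as-dictionary layout, and the fill value `0.0` are all assumptions, since the text does not state what value replaces a masked feature.

```python
import random

GPS_FEATURES = ["PXGPS", "PYGPS", "PZGPS", "VXGPS", "VYGPS", "VZGPS", "EGPS"]
FUSION_FEATURES = [
    "PXFUSION", "PYFUSION", "PZFUSION",
    "VXFUSION", "VYFUSION", "VZFUSION",
    "OAFUSION", "OBFUSION", "OCFUSION", "EFUSION",
]


def strategy_start(strategy, seq_len, mask_len, rng=random):
    """Return the first masked index for training strategies 1 to 4."""
    if not 0 < mask_len <= seq_len:
        raise ValueError("mask_len must be between 1 and seq_len")
    last_start = seq_len - mask_len
    if strategy == 1:      # mask at the start of the sequence
        return 0
    if strategy == 2:      # mask in the center of the sequence
        return last_start // 2
    if strategy == 3:      # mask at the end of the sequence
        return last_start
    if strategy == 4:      # mask at a random position
        return rng.randint(0, last_start)
    raise ValueError("strategy must be 1, 2, 3 or 4")


def apply_mask(rows, masked_columns, start, mask_len, fill=0.0):
    """Return a copy of `rows` with the masked columns overwritten by `fill`."""
    out = [dict(row) for row in rows]
    for row in out[start:start + mask_len]:
        for name in masked_columns:
            row[name] = fill  # hypothetical mask value
    return out


# Ten samples where every feature equals 1.0, masked with Strategy 3.
rows = [{name: 1.0 for name in ["AXIMU"] + GPS_FEATURES + FUSION_FEATURES}
        for _ in range(10)]
start = strategy_start(3, seq_len=10, mask_len=4)
masked = apply_mask(rows, GPS_FEATURES + FUSION_FEATURES, start, 4)
print(start, masked[start]["PXGPS"], masked[start]["AXIMU"])  # 6 0.0 1.0
```

Passing a seeded `random.Random` instance as `rng` makes Strategy 4 reproducible.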
4.4. Hyperparameter Tuning
4.5. Evaluation
- The Absolute Trajectory Error (ATE): Measures the discrepancy between the ground truth and the predicted trajectories. This metric is sensitive to outliers and, because position errors accumulate along the trajectory, it commonly increases with prediction duration and length.
- The Relative Trajectory Error (RTE): Quantifies the relative error in the distance between the ground truth and predicted start/end points.
- The Relative Distance Error (RDE): Calculates the relative error between the total predicted and ground truth distances.
- $p_t$ is the ground truth position in 3D at instant $t$
- $\hat{p}_t$ is the predicted position in 3D at instant $t$
- $\lVert p_t - \hat{p}_t \rVert$ is the Euclidean norm of the difference between $p_t$ and $\hat{p}_t$
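The three metrics can be sketched in a few lines of Python. The exact formulas are not reproduced in this text, so the forms below are assumptions based on the verbal definitions: ATE as the root mean square of the per-sample Euclidean position errors, RTE as the relative error on the start-to-end displacement, and RDE as the relative error on the total path length. The function names are hypothetical.

```python
import math


def path_length(traj):
    """Total distance travelled along a list of 3D points."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def ate(truth, pred):
    """Root mean square of per-sample position errors (assumed form of ATE)."""
    squared = [math.dist(p, q) ** 2 for p, q in zip(truth, pred)]
    return math.sqrt(sum(squared) / len(squared))


def rte(truth, pred):
    """Relative error on the start-to-end displacement (assumed form of RTE).

    Undefined when the ground truth starts and ends at the same point.
    """
    true_disp = math.dist(truth[0], truth[-1])
    pred_disp = math.dist(pred[0], pred[-1])
    return abs(pred_disp - true_disp) / true_disp


def rde(truth, pred):
    """Relative error on the total travelled distance (assumed form of RDE)."""
    true_len = path_length(truth)
    return abs(path_length(pred) - true_len) / true_len


# Straight ground truth path and a prediction with a detour in the middle.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
print(round(ate(truth, pred), 3))  # 0.577
print(rte(truth, pred))            # 0.0
print(round(rde(truth, pred), 3))  # 0.414
```

The toy example shows why the metrics are complementary: the detour leaves the start and end points untouched, so RTE is zero, while ATE and RDE both register the error.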
5. Results
6. Discussion
Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Appendix A
| | Mean | Std | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|
| AXIMU | −2.35 × 10⁻¹ | 4.53 × 10⁰ | −1.10 × 10² | −1.95 × 10⁰ | −7.50 × 10⁻² | 1.80 × 10⁰ | 1.03 × 10² |
| AYIMU | 1.32 × 10⁻¹ | 3.32 × 10⁰ | −8.73 × 10¹ | −1.53 × 10⁰ | 1.55 × 10⁻¹ | 1.81 × 10⁰ | 1.60 × 10² |
| AZIMU | −9.97 × 10⁰ | 6.06 × 10⁰ | −1.10 × 10² | −1.17 × 10¹ | −9.85 × 10⁰ | −8.12 × 10⁰ | 9.98 × 10¹ |
| RAIMU | −8.93 × 10⁻² | 2.61 × 10¹ | −3.27 × 10² | −1.24 × 10¹ | −4.00 × 10⁻² | 1.23 × 10¹ | 3.25 × 10² |
| RBIMU | 6.47 × 10⁻¹ | 3.92 × 10¹ | −3.27 × 10² | −2.54 × 10¹ | −1.71 × 10⁰ | 2.49 × 10¹ | 3.27 × 10² |
| RCIMU | −7.51 × 10⁻¹ | 2.74 × 10¹ | −3.08 × 10² | −1.81 × 10¹ | −3.90 × 10⁻¹ | 1.68 × 10¹ | 3.20 × 10² |
| MXIMU | −4.09 × 10⁻¹ | 7.57 × 10¹ | −2.36 × 10² | −4.99 × 10¹ | −2.65 × 10⁰ | 4.84 × 10¹ | 2.04 × 10² |
| MYIMU | 6.04 × 10¹ | 8.75 × 10¹ | −2.00 × 10² | −5.74 × 10⁰ | 6.09 × 10¹ | 1.25 × 10² | 3.02 × 10² |
| MZIMU | −1.78 × 10¹ | 1.22 × 10² | −2.30 × 10² | −1.49 × 10² | 9.66 × 10⁰ | 8.06 × 10¹ | 2.54 × 10² |
| PXGPS | −5.64 × 10¹ | 2.62 × 10² | −1.72 × 10³ | −4.50 × 10¹ | −2.06 × 10⁰ | 3.23 × 10¹ | 4.97 × 10² |
| PYGPS | −2.86 × 10¹ | 8.49 × 10¹ | −3.64 × 10² | −7.95 × 10¹ | −2.66 × 10¹ | 7.44 × 10⁰ | 4.73 × 10² |
| PZGPS | 5.90 × 10¹ | 2.46 × 10² | −4.43 × 10² | −2.37 × 10¹ | 8.74 × 10⁰ | 4.75 × 10¹ | 1.58 × 10³ |
| VXGPS | 7.03 × 10⁻² | 1.50 × 10⁰ | −7.40 × 10⁰ | −7.80 × 10⁻¹ | 4.00 × 10⁻² | 9.00 × 10⁻¹ | 9.02 × 10⁰ |
| VYGPS | 3.40 × 10⁻³ | 2.01 × 10⁰ | −8.36 × 10⁰ | −1.02 × 10⁰ | −1.00 × 10⁻² | 1.03 × 10⁰ | 8.71 × 10⁰ |
| VZGPS | 4.79 × 10⁻² | 1.45 × 10⁰ | −7.68 × 10⁰ | −7.40 × 10⁻¹ | 2.00 × 10⁻² | 8.20 × 10⁻¹ | 7.61 × 10⁰ |
| EGPS | 1.50 × 10⁰ | 2.90 × 10⁻¹ | 9.40 × 10⁻¹ | 1.29 × 10⁰ | 1.47 × 10⁰ | 1.66 × 10⁰ | 4.43 × 10⁰ |
| PXFUSION | −5.66 × 10¹ | 2.62 × 10² | −1.71 × 10³ | −4.49 × 10¹ | −2.19 × 10⁰ | 3.24 × 10¹ | 4.93 × 10² |
| PYFUSION | −2.85 × 10¹ | 8.48 × 10¹ | −3.64 × 10² | −7.93 × 10¹ | −2.63 × 10¹ | 7.60 × 10⁰ | 4.72 × 10² |
| PZFUSION | 5.88 × 10¹ | 2.46 × 10² | −4.43 × 10² | −2.39 × 10¹ | 8.77 × 10⁰ | 4.61 × 10¹ | 1.58 × 10³ |
| VXFUSION | 5.93 × 10⁻² | 1.44 × 10⁰ | −6.91 × 10⁰ | −7.50 × 10⁻¹ | 3.00 × 10⁻² | 8.50 × 10⁻¹ | 6.99 × 10⁰ |
| VYFUSION | 4.40 × 10⁻⁴ | 2.00 × 10⁰ | −8.90 × 10⁰ | −1.00 × 10⁰ | −1.00 × 10⁻² | 1.01 × 10⁰ | 8.95 × 10⁰ |
| VZFUSION | 3.72 × 10⁻² | 1.39 × 10⁰ | −6.85 × 10⁰ | −6.90 × 10⁻¹ | 1.00 × 10⁻² | 7.70 × 10⁻¹ | 6.92 × 10⁰ |
| OAFUSION | −1.85 × 10⁻² | 9.20 × 10⁻² | −6.43 × 10⁻¹ | −6.60 × 10⁻² | −1.01 × 10⁻² | 4.07 × 10⁻² | 7.29 × 10⁻¹ |
| OBFUSION | −2.99 × 10⁻² | 7.45 × 10⁻² | −7.33 × 10⁻¹ | −7.98 × 10⁻² | −3.02 × 10⁻² | 1.73 × 10⁻² | 9.34 × 10⁻¹ |
| OCFUSION | −6.94 × 10⁻² | 1.84 × 10⁰ | −3.14 × 10⁰ | −1.69 × 10⁰ | −8.20 × 10⁻² | 1.47 × 10⁰ | 3.14 × 10⁰ |
| EFUSION | 4.56 × 10⁰ | 2.16 × 10⁰ | 1.60 × 10⁰ | 3.11 × 10⁰ | 4.10 × 10⁰ | 5.28 × 10⁰ | 1.84 × 10¹ |
Appendix B

Appendix C

Inference (batch size 1)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 41.4 ms / 448 µs ¹ | 1.64 ms / 2.29 ms ¹ | 34 MB |
| small | 10 Hz | 1.91 ms / 1.41 µs ¹ | 1.32 ms / 2.29 ms ¹ | 32 MB |
| medium | 100 Hz | 809 ms / 1.92 ms ¹ | 20.7 ms / 1.12 ms ¹ | 38 MB |
| medium | 10 Hz | 9.15 ms / 1.12 ms ¹ | 1.42 ms / 1.52 ms ¹ | 34 MB |

Training (batch size 16)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 4.48 s / 116 ms ² | 14.5 ms / 1.70 ms ¹ | 142 MB |
| small | 10 Hz | 57.7 ms / 4.62 ms ² | 12.6 ms / 1.68 ms ¹ | 58 MB |
| medium | 100 Hz | None ³ | 195 ms / 27.3 ms ¹ | 584 MB |
| medium | 10 Hz | 1.28 s / 29.0 ms ² | 12.6 ms / 1.98 ms ¹ | 110 MB |

Inference (batch size 1)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 185 ms / 21.9 ms ¹ | 3.31 ms / 1.00 ms ¹ | 36 MB |
| small | 10 Hz | 4.71 ms / 616 µs ¹ | 1.65 ms / 1.09 ms ¹ | 32 MB |
| medium | 100 Hz | 4.61 s / 413 ms ¹ | 68.0 ms / 1.29 ms ¹ | 54 MB |
| medium | 10 Hz | 60.0 ms / 4.03 ms ¹ | 1.69 ms / 966 µs ¹ | 34 MB |

Training (batch size 16)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 16.7 s / 259 ms ² | 38.0 ms / 1.78 ms ¹ | 344 MB |
| small | 10 Hz | 139 ms / 3.74 ms ² | 20.2 ms / 2.46 ms ¹ | 78 MB |
| medium | 100 Hz | None ³ | 630 ms / 1.36 ms ¹ | 1660 MB |
| medium | 10 Hz | 2.92 s / 22.1 ms ² | 20.9 ms / 1.82 ms ¹ | 206 MB |

Inference (batch size 1)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 622 ms / 20.7 ms ¹ | 8.84 ms / 1.14 ms ¹ | 38 MB |
| small | 10 Hz | 11.2 ms / 783 µs ¹ | 2.64 ms / 1.55 ms ¹ | 34 MB |
| medium | 100 Hz | None ³ | 205 ms / 4.17 ms ¹ | 54 MB |
| medium | 10 Hz | 176 ms / 8.00 ms ¹ | 3.05 ms / 1.51 ms ¹ | 36 MB |

Training (batch size 16)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | None ³ | 98.7 ms / 1.58 ms ¹ | 1002 MB |
| small | 10 Hz | 873 ms / 19.5 ms ² | 26.1 ms / 1.40 ms ¹ | 142 MB |
| medium | 100 Hz | None ³ | 1.87 s / 3.39 ms ¹ | 4430 MB |
| medium | 10 Hz | 19.5 ms / 393 ms ² | 37.3 ms / 1.76 ms ¹ | 538 MB |

Inference (batch size 1)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 1.66 s / 138 ms ¹ | 22.7 ms / 1.02 ms ¹ | 38 MB |
| small | 10 Hz | 27.8 ms / 274 µs ¹ | 3.09 ms / 957 µs ¹ | 40 MB |
| medium | 100 Hz | None ³ | 539 ms / 9.48 ms ¹ | 100 MB |
| medium | 10 Hz | 493 ms / 1.45 ms ¹ | 6.91 ms / 1.47 ms ¹ | 42 MB |

Training (batch size 16)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | None ³ | 242 ms / 958 µs ¹ | 2374 MB |
| small | 10 Hz | 1.78 s / 8.73 ms ² | 27.8 ms / 1.33 ms ¹ | 308 MB |
| medium | 100 Hz | None ³ | 4.9 s / 8.75 ms ¹ | 11032 MB |
| medium | 10 Hz | None ³ | 81.7 ms / 1.69 ms ¹ | 1290 MB |

| Hyperparameter | Space |
|---|---|
| Batch size | [2, 4, 8, 16, 32, 64, 128, 256] |
| Learning rate | [1 × 10⁻⁵; 1 × 10⁻¹] |
| Weight decay | [1 × 10⁻⁴; 1 × 10⁻¹] |
| Gradient clipping | [True, False] |
| Gradient clipping max norm | [1; 10] |
| | [0.5, 0.75, 0.9, 0.999] |
| d_model | [8, 16, 32, 64, 128, 256, 512, 1024] |
| N_head | [1, 2, 4, 8, 16] |
| d_hidden | [16, 32, 64, 128, 256, 512, 1024, 2048, 4096] |
| N_encoder | [1; 12] |
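
To make the search space concrete, the sketch below draws one candidate configuration from it. This is an illustration only: it uses plain random sampling rather than any specific tuning method, and several choices are assumptions not stated in the table, namely log-uniform sampling for the learning rate and weight decay, a continuous max norm, sampling the max norm only when gradient clipping is enabled, and restricting the number of heads to divisors of the model dimension (a requirement of standard multi-head attention). The hyperparameter whose name is missing from the table is left out. All identifiers are hypothetical.

```python
import random

BATCH_SIZES = [2, 4, 8, 16, 32, 64, 128, 256]
D_MODEL = [8, 16, 32, 64, 128, 256, 512, 1024]
N_HEAD = [1, 2, 4, 8, 16]
D_HIDDEN = [16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


def sample_config(rng):
    """Draw one configuration from the search space (random search sketch)."""
    config = {
        "batch_size": rng.choice(BATCH_SIZES),
        # Log-uniform draws over [1e-5, 1e-1] and [1e-4, 1e-1] (assumption).
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "weight_decay": 10 ** rng.uniform(-4, -1),
        "gradient_clipping": rng.choice([True, False]),
        "d_model": rng.choice(D_MODEL),
        "d_hidden": rng.choice(D_HIDDEN),
        "n_encoder": rng.randint(1, 12),  # inclusive bounds
    }
    # The max norm only matters when clipping is enabled (assumption).
    config["max_norm"] = (
        rng.uniform(1, 10) if config["gradient_clipping"] else None
    )
    # Keep only head counts that divide d_model evenly.
    valid_heads = [h for h in N_HEAD if config["d_model"] % h == 0]
    config["n_head"] = rng.choice(valid_heads)
    return config


rng = random.Random(0)  # seeded for reproducibility
for _ in range(3):
    print(sample_config(rng))
```

A tuner would evaluate each sampled configuration on a validation set and keep the best one; a model-based optimizer would replace the independent draws with proposals informed by earlier results.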

| Training strategy | Input frequency | Schema 1: ATE ¹ | Schema 1: RTE ² | Schema 1: RDE ² | Schema 2: ATE ¹ | Schema 2: RTE ² | Schema 2: RDE ² | Schema 3: ATE ¹ | Schema 3: RTE ² | Schema 3: RDE ² |
|---|---|---|---|---|---|---|---|---|---|---|
| Strategy 1 (Schema 1), Strategy 2 (Schema 2), Strategy 3 (Schema 3) | 100 Hz | 3.83 | 0.62 | 0.39 | 3.38 | 0.20 | 0.28 | 3.80 | 0.88 | 0.32 |
| Strategy 1 (Schema 1), Strategy 2 (Schema 2), Strategy 3 (Schema 3) | 10 Hz | 3.37 | 0.41 | 0.26 | 4.82 | 0.31 | 0.29 | 3.55 | 0.73 | 0.32 |
| Strategy 4 | 100 Hz | 2.91 | 0.34 | 0.35 | 3.67 | 0.20 | 0.31 | 2.84 | 0.53 | 0.30 |
| Strategy 4 | 10 Hz | 2.86 | 0.24 | 0.38 | 3.86 | 0.26 | 0.33 | 3.47 | 0.51 | 0.32 |

| Training strategy | Input frequency | Schema 1: ATE ¹ | Schema 1: RTE ² | Schema 1: RDE ² | Schema 2: ATE ¹ | Schema 2: RTE ² | Schema 2: RDE ² | Schema 3: ATE ¹ | Schema 3: RTE ² | Schema 3: RDE ² |
|---|---|---|---|---|---|---|---|---|---|---|
| Strategy 1 (Schema 1), Strategy 2 (Schema 2), Strategy 3 (Schema 3) | 100 Hz | 17.98 | 0.91 | 0.30 | 12.16 | 0.53 | 0.31 | 12.31 | 0.35 | 0.25 |
| Strategy 1 (Schema 1), Strategy 2 (Schema 2), Strategy 3 (Schema 3) | 10 Hz | 14.13 | 0.73 | 0.28 | 12.99 | 0.64 | 0.30 | 11.12 | 0.23 | 0.24 |
| Strategy 4 | 100 Hz | 14.88 | 1.08 | 0.30 | 12.50 | 0.55 | 0.35 | 13.92 | 0.34 | 0.30 |
| Strategy 4 | 10 Hz | 17.31 | 1.18 | 0.27 | 13.33 | 0.54 | 0.30 | 11.03 | 0.28 | 0.28 |

| Hyperparameter | Optimal subspace | Best set |
|---|---|---|
| Batch size | [2, 4, 8, 16, 32] | 16 |
| Learning rate | [1 × 10⁻⁵; 5 × 10⁻³] | 8.5 × 10⁻⁴ |
| Weight decay | [1 × 10⁻²; 1 × 10⁻¹] | 1.95 × 10⁻² |
| Gradient clipping | False | False |
| Gradient clipping max norm | None | None |
| | [0.5, 0.75] | 0.5 |
| d_model | [8, 16, 32, 64, 128] | 32 |
| N_head | [2, 4, 8, 16] | 4 |
| d_hidden | [16, 32, 64, 128, 256, 512] | 64 |
| N_encoder | [1; 6] | 5 |

Inference (batch size = 1)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 160 ms / 1.61 ms ¹ | 3.05 ms / 2.09 ms ¹ | 36 MB |
| small | 10 Hz | 4.44 ms / 217 µs ¹ | 1.76 ms / 2.29 ms ¹ | 32 MB |
| medium | 100 Hz | 5.01 s / 521 ms ¹ | 57.8 ms / 244 µs ¹ | 54 MB |
| medium | 10 Hz | 49.0 ms / 822 µs ¹ | 1.80 ms / 2.21 ms ¹ | 34 MB |

Training (batch size = 16)

| GPS outage duration | Input frequency | Processing time on CPU | Processing time on GPU | GPU memory consumption |
|---|---|---|---|---|
| small | 100 Hz | 8.42 s / 28.0 ms ² | 30.9 ms / 1.43 ms ¹ | 304 MB |
| small | 10 Hz | 172 ms / 5.35 ms ² | 16.2 ms / 1.40 ms ¹ | 74 MB |
| medium | 100 Hz | None ³ | 527 ms / 1.81 ms ¹ | 1448 MB |
| medium | 10 Hz | 4.15 s / 40.0 ms ² | 19.1 ms / 1.69 ms ¹ | 184 MB |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).