Submitted: 02 July 2024
Posted: 03 July 2024
Abstract
Keywords:
1. Introduction
2. Related Works
2.1. Stereo Vision
2.2. Confidence Estimation
3. Method
3.1. Stereo Vision Component
3.2. Confidence Estimation Component
3.3. Training
4. Experiments
4.1. Metrics
4.1.1. Bad3 (Bad Pixels Rate):
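As a minimal sketch of this metric, assuming Bad3 is the percentage of valid pixels whose absolute disparity error exceeds 3 pixels (the abbreviation list gives "Bad Pixels Rate with threshold of 3"; the strict `>` comparison, the optional validity mask, and the function name `bad_pixel_rate` are illustrative assumptions, not taken from the paper):

```python
def bad_pixel_rate(pred, gt, threshold=3.0, valid=None):
    """Percentage of valid pixels with |pred - gt| > threshold.

    pred, gt: flat sequences of disparities (one value per pixel).
    valid:    optional sequence of booleans marking pixels that have
              ground truth; all pixels count as valid when omitted.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(gt)

    # Absolute disparity error on valid pixels only.
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")

    bad = sum(1 for e in errors if e > threshold)
    return 100.0 * bad / len(errors)


# Hypothetical four-pixel example: errors are 0, 4, 0 and 3 pixels,
# so only one pixel exceeds the 3-pixel threshold.
rate = bad_pixel_rate([10, 12, 20, 33], [10, 16, 20, 30])
```

Benchmarks differ in how they treat errors exactly at the threshold and whether a relative-error condition is added, so the comparison rule should be checked against the evaluation protocol actually used.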
4.1.2. Area under the Curve (AUC):
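The exact AUC formulation is not recoverable from this section, so the following is only a hedged sketch of the sparsification-curve idea common in stereo confidence evaluation: pixels are ranked by confidence, the mean error is measured on the most confident fraction at increasing densities, and the resulting curve is integrated. The error measure, the number of density steps, and the names `sparsification_auc` and `optimal_auc` are assumptions made for illustration. AUCopt is taken here to be the AUC obtained when pixels are ranked by their true error.

```python
def sparsification_auc(errors, confidences, steps=20):
    """Area under the curve of mean error vs. density.

    errors:      per-pixel error values (e.g. absolute disparity error).
    confidences: per-pixel confidence, higher means more trusted.
    steps:       number of evenly spaced densities in (0, 1].
    """
    if len(errors) != len(confidences) or not errors:
        raise ValueError("inputs must be non-empty and of equal length")

    # Rank pixels from most to least confident.
    order = sorted(range(len(errors)),
                   key=lambda i: confidences[i], reverse=True)
    ranked = [errors[i] for i in order]
    n = len(ranked)

    # Mean error over the most confident k/steps fraction of pixels.
    curve = []
    for k in range(1, steps + 1):
        keep = max(1, round(n * k / steps))
        curve.append(sum(ranked[:keep]) / keep)

    # Trapezoidal rule with density spacing 1/steps.
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:])) / steps


def optimal_auc(errors, steps=20):
    """AUC of an oracle that ranks pixels by their true error."""
    return sparsification_auc(errors, [-e for e in errors], steps)


# Hypothetical four-pixel example with one erroneous pixel.
errs = [0.0, 0.0, 0.0, 4.0]
good = sparsification_auc(errs, [0.9, 0.8, 0.7, 0.1], steps=4)
poor = sparsification_auc(errs, [0.1, 0.2, 0.3, 0.9], steps=4)
best = optimal_auc(errs, steps=4)
```

Under this formulation a lower AUC indicates that the confidence ranks erroneous pixels last, and the gap between AUC and AUCopt shows how far the estimate is from an oracle ranking.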
4.2. Implementation Details
5. Results
5.1. Computational Performance and Efficiency


6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| ToF | Time of Flight |
| AUC | Area Under the Curve |
| Bad3 | Bad Pixels Rate with threshold of 3 |
| MSE | Mean Squared Error |
| CSA | Cross-Scale Aggregation |
References
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision 2002, 47, 7–42.
- Hannah, M.J. Computer matching of areas in stereo images; Stanford University: Stanford, CA, 1974.
- Pollard, S.B.; Mayhew, J.E.; Frisby, J.P. PMF: A stereo correspondence algorithm using a disparity gradient limit. Perception 1985, 14, 449–470.
- Baker, H.H.; Bolles, R.C.; Woodfill, J.I. Real-time stereo and motion integration for navigation. ISPRS Commission III Symposium: Spatial Information from Digital Photogrammetry and Computer Vision. SPIE, 1994, Vol. 2357, pp. 17–24.
- Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on pattern analysis and machine intelligence 2007, 30, 328–341.
- Zbontar, J.; LeCun, Y. Computing the stereo matching cost with a convolutional neural network. Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1592–1599.
- Shaked, A.; Wolf, L. Improved stereo matching with constant highway networks and reflective confidence learning. Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4641–4650.
- Mei, X.; Sun, X.; Zhou, M.; Jiao, S.; Wang, H.; Zhang, X. On building an accurate stereo matching system on graphics hardware. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011, pp. 467–474.
- Egnal, G.; Wildes, R.P. Detecting binocular half-occlusions: Empirical comparisons of five approaches. IEEE Transactions on pattern analysis and machine intelligence 2002, 24, 1127–1133.
- Humenberger, M.; Zinner, C.; Weber, M.; Kubinger, W.; Vincze, M. A fast stereo matching algorithm suitable for embedded real-time systems. Computer Vision and Image Understanding 2010, 114, 1180–1202.
- Zabih, R.; Woodfill, J. Non-parametric local transforms for computing visual correspondence. Computer Vision—ECCV’94: Third European Conference on Computer Vision Stockholm, Sweden, May 2–6 1994 Proceedings, Volume II 3. Springer, 1994, pp. 151–158.
- Heo, Y.S.; Lee, K.M.; Lee, S.U. Robust stereo matching using adaptive normalized cross-correlation. IEEE Transactions on pattern analysis and machine intelligence 2010, 33, 807–822.
- Poggi, M.; Kim, S.; Tosi, F.; Kim, S.; Aleotti, F.; Min, D.; Sohn, K.; Mattoccia, S. On the confidence of stereo matching in a deep-learning era: a quantitative evaluation. IEEE transactions on pattern analysis and machine intelligence 2021, 44, 5293–5313.
- Scharstein, D.; Szeliski, R. Stereo matching with nonlinear diffusion. International journal of computer vision 1998, 28, 155–174.
- Egnal, G.; Mintz, M.; Wildes, R.P. A stereo confidence metric using single view imagery with comparison to five alternative approaches. Image and vision computing 2004, 22, 943–957.
- Hu, X.; Mordohai, P. A quantitative evaluation of confidence measures for stereo vision. IEEE transactions on pattern analysis and machine intelligence 2012, 34, 2121–2133.
- Kim, S.; Kim, S.; Min, D.; Sohn, K. Laf-net: Locally adaptive fusion networks for stereo confidence estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 205–214.
- Xu, H.; Zhang, J. Aanet: Adaptive aggregation network for efficient stereo matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1959–1968.
- Seki, A.; Pollefeys, M. Sgm-nets: Semi-global matching with neural networks. Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 231–240.
- Schonberger, J.L.; Sinha, S.N.; Pollefeys, M. Learning to fuse proposals from multiple scanline optimizations in semi-global matching. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 739–755.
- Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4040–4048.
- Gu, Z.; Cheng, J.; Fu, H.; Zhou, K.; Hao, H.; Zhao, Y.; Zhang, T.; Gao, S.; Liu, J. Ce-net: Context encoder network for 2d medical image segmentation. IEEE transactions on medical imaging 2019, 38, 2281–2292.
- Liang, Z.; Feng, Y.; Guo, Y.; Liu, H.; Chen, W.; Qiao, L.; Zhou, L.; Zhang, J. Learning for disparity estimation through feature constancy. Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2811–2820.
- Tonioni, A.; Tosi, F.; Poggi, M.; Mattoccia, S.; Stefano, L.D. Real-time self-adaptive deep stereo. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 195–204.
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. Proceedings of the IEEE international conference on computer vision, 2017, pp. 66–75.
- Chang, J.R.; Chen, Y.S. Pyramid stereo matching network. Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 5410–5418.
- Nie, G.Y.; Cheng, M.M.; Liu, Y.; Liang, Z.; Fan, D.P.; Liu, Y.; Wang, Y. Multi-level context ultra-aggregation for stereo matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3283–3291.
- Chabra, R.; Straub, J.; Sweeney, C.; Newcombe, R.; Fuchs, H. Stereodrnet: Dilated residual stereonet. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11786–11795.
- Guo, X.; Yang, K.; Yang, W.; Wang, X.; Li, H. Group-wise correlation stereo network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3273–3282.
- Zhang, F.; Prisacariu, V.; Yang, R.; Torr, P.H. Ga-net: Guided aggregation net for end-to-end stereo matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 185–194.
- Menze, M.; Geiger, A. Object Scene Flow for Autonomous Vehicles. Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Mayer, N.; Ilg, E.; Häusser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2016. arXiv:1512.02134.
- Lipson, L.; Teed, Z.; Deng, J. Raft-stereo: Multilevel recurrent field transforms for stereo matching. 2021 International Conference on 3D Vision (3DV). IEEE, 2021, pp. 218–227.
- Teed, Z.; Deng, J. Raft: Recurrent all-pairs field transforms for optical flow. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16. Springer, 2020, pp. 402–419.
- Li, J.; Wang, P.; Xiong, P.; Cai, T.; Yan, Z.; Yang, L.; Liu, J.; Fan, H.; Liu, S. Practical stereo matching via cascaded recurrent network with adaptive correlation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16263–16272.
- Kim, S.; Yoo, D.g.; Kim, Y.H. Stereo confidence metrics using the costs of surrounding pixels. 2014 19th International Conference on Digital Signal Processing. IEEE, 2014, pp. 98–103.
- Haeusler, R.; Nair, R.; Kondermann, D. Ensemble learning for confidence measures in stereo vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 305–312.
- Spyropoulos, A.; Komodakis, N.; Mordohai, P. Learning to detect ground control points for improving the accuracy of stereo matching. Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1621–1628.
- Tosi, F.; Poggi, M.; Benincasa, A.; Mattoccia, S. Beyond local reasoning for stereo confidence estimation with deep learning. Proceedings of the European conference on computer vision (ECCV), 2018, pp. 319–334.
- Fu, Z.; Ardabilian, M.; Stern, G. Stereo matching confidence learning based on multi-modal convolution neural networks. Representations, Analysis and Recognition of Shape and Motion from Imaging Data: 7th International Workshop, RFMI 2017, Savoie, France, December 17–20, 2017, Revised Selected Papers 7. Springer, 2019, pp. 69–81.
- Mehltretter, M.; Heipke, C. Cnn-based cost volume analysis as confidence measure for dense matching. Proceedings of the IEEE/CVF international conference on computer vision workshops, 2019, pp. 0–0.
- Zbontar, J.; LeCun, Y.; others. Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 2016, 17, 2287–2318.
- Mehltretter, M. Joint estimation of depth and its uncertainty from stereo images using bayesian deep learning. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2022, 2, 69–78.
- Chen, L.; Wang, W.; Mordohai, P. Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17235–17244.
- Menze, M.; Heipke, C.; Geiger, A. Joint 3D Estimation of Vehicles and Scene Flow. ISPRS Workshop on Image Sequence Analysis (ISA), 2015.
- Agrawal, A.; Müller, T.; Schmähling, T.; Elser, S.; Eberhardt, J. RWU3D: Real World ToF and Stereo Dataset with High Quality Ground Truth. 2023 Twelfth International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE, 2023, pp. 1–6.
- Barnes, C.; Shechtman, E.; Finkelstein, A.; Goldman, D.B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 2009, 28, 24.


| Dataset | Network | MSE (Disparity) | MSE (Confidence) | Bad3 [%] | AUC | AUCopt |
|---|---|---|---|---|---|---|
| Scene Flow | SEDNet [44] | 0.835 | - | 4.31 | 0.36 | 0.19 |
| Scene Flow | Sequential | 0.680 | 0.0280 | 3.21 | 0.37 | 0.19 |
| Scene Flow | Parallel(5) | 0.674 | 0.0236 | 3.34 | 0.32 | 0.20 |
| Scene Flow | Parallel(10) | 0.695 | 0.0220 | 3.47 | 0.32 | 0.21 |
| Scene Flow | Parallel(20) | 0.720 | 0.0203 | 3.71 | 0.33 | 0.21 |
| KITTI2015 | SEDNet [44] | 10.828 | - | 35.33 | 9.55 | 4.61 |
| KITTI2015 | Sequential | 1.858 | 0.2705 | 10.70 | 3.71 | 1.23 |
| KITTI2015 | Parallel(5) | 1.878 | 0.2579 | 11.20 | 3.32 | 1.31 |
| KITTI2015 | Parallel(10) | 1.738 | 0.2921 | 10.36 | 2.76 | 1.20 |
| KITTI2015 | Parallel(15) | 1.852 | 0.2679 | 10.82 | 2.89 | 1.27 |
| RWU3D | SEDNet [44] | 11.835 | - | 39.07 | 13.47 | 7.10 |
| RWU3D | Sequential | 2.445 | 0.1547 | 17.30 | 6.77 | 2.32 |
| RWU3D | Parallel(5) | 2.681 | 0.0860 | 18.65 | 6.17 | 2.85 |
| RWU3D | Parallel(10) | 2.571 | 0.0824 | 17.89 | 5.22 | 2.46 |
| RWU3D | Parallel(15) | 2.613 | 0.0657 | 18.85 | 5.42 | 2.71 |

| Architecture | Parameters (M) |
|---|---|
| SEDNet | 6.91 |
| Our Network | |
| Stereo Component | 3.93 |
| Confidence Component | 0.54 |
| Total | 4.47 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).