Submitted: 28 March 2024
Posted: 28 March 2024
Abstract
Keywords:
1. Introduction
- Quantitative evaluation of the effects of model simplification.
- Construction of a knowledge distillation framework for light field depth estimation models.
- Demonstration that the resulting models are both lightweight and highly accurate, with the potential for further lightweighting through iterative distillation.
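The distillation framework named in the contributions above can be sketched as a two-term objective: a supervised loss on the predicted disparity plus a "hint" loss that pulls an intermediate student feature map toward the teacher's. This is a minimal illustrative sketch, not the paper's exact formulation; the function names, the L1/L2 choice, and the weight `alpha` are assumptions.

```python
import numpy as np

def distillation_loss(student_disp, gt_disp, student_feat, teacher_feat, alpha=0.5):
    """Hedged sketch of a hint-based distillation objective.

    Combines a supervised disparity loss (L1 against ground truth) with
    an L2 'hint' loss between student and teacher feature maps.
    `alpha` balances the two terms (illustrative value).
    """
    task_loss = np.mean(np.abs(student_disp - gt_disp))      # supervised term
    hint_loss = np.mean((student_feat - teacher_feat) ** 2)  # feature-matching term
    return alpha * task_loss + (1.0 - alpha) * hint_loss
```

In this form, setting `alpha = 1.0` recovers ordinary supervised training, while smaller values push the student to imitate the teacher's internal representation.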
2. Related Work
Depth Estimation with Cameras
Light Field Camera
EPINET
Knowledge Distillation
Metrics for Error Evaluation
3. Proposed Method
Lightweight Design
Accuracy Enhancement
Loss Functions
4. Experimental Setup
Lightweighting Experiments
Simplification by Reducing Input Streams
Simplification by Reducing Convolution Blocks
4.1. Accuracy Enhancement Experiments
Limits of Model Disparity between Teacher and Student
Evaluation of the Optimal Knowledge Position
5. Results
Lightweighting Experiments
Simplification by Reducing Input Streams
Simplification by Reducing Convolution Blocks
5.1. Accuracy Enhancement Experiments
Evaluation of the Limits of Model Disparity between Teacher and Student
Evaluation of the Knowledge Position
Peak Performance across Different Student Models
6. Conclusion
Acknowledgments
References
| Model | Number of Convolution Blocks |
|---|---|
| Baseline Model | 7 |
| Lightweight Model 1 | 5 |
| Lightweight Model 2 | 3 |
| Lightweight Model 3 | 1 |
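The table above varies only the number of convolution blocks, so the parameter count of the stacked blocks shrinks roughly linearly with block count. The arithmetic below illustrates this; the block structure (two 3×3 convolutions with equal input/output channels plus biases) and the channel width of 70 are assumptions for illustration, not the exact architecture.

```python
def conv_block_params(channels, kernel=3, convs_per_block=2):
    """Parameters in one convolution block, assuming `convs_per_block`
    k×k convolutions with `channels` input and output channels plus a
    bias per filter (an illustrative structure, not the exact one used)."""
    per_conv = kernel * kernel * channels * channels + channels
    return convs_per_block * per_conv

# Stacked-block parameter counts for the models in the table above
for name, blocks in [("Baseline Model", 7), ("Lightweight Model 1", 5),
                     ("Lightweight Model 2", 3), ("Lightweight Model 3", 1)]:
    print(f"{name}: {blocks * conv_block_params(70):,} parameters")
```

Under these assumptions, dropping from 7 blocks to 1 removes about 86% of the block parameters, which is the kind of reduction the lightweighting experiments probe.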
| Scenario | Teacher Input Size | Student Input Size | Teacher Conv. Blocks | Student Conv. Blocks |
|---|---|---|---|---|
| 1 | 25×25 | 23×23 | 7 | 6 |
| 2 | 25×25 | 19×19 | 7 | 4 |
| Scenario | Teacher Knowledge Position | Student Knowledge Position |
|---|---|---|
| 1 | 7th | 6th |
| 2 | 6th | 5th |
| 3 | 5th | 4th |
| 4 | 4th | 3rd |
| Metric (lower is better) | Compact Student | Ultra-compact Student |
|---|---|---|
| Best MSE | 1.71 ⟶ 1.41 | 1.73 ⟶ 1.84 |
| Best BadPix | 5.31 ⟶ 4.67 | 7.43 ⟶ 8.16 |
| Model | Hint Layer Position | Best MSE (lower is better) | Best BadPix (lower is better) | Runtime [ms] |
|---|---|---|---|---|
| Teacher | – | 1.56 | 4.41 | 610 |
| Student Baseline | – | 1.71 | 5.32 | 545 |
| Student Lightweight 1 | 6th | 1.41 | 4.67 | 545 |
| Student Lightweight 2 | 5th | 1.44 | 4.45 | 545 |
| Student Lightweight 3 | 4th | 1.51 | 4.73 | 545 |
| Student Lightweight 4 | 3rd | 1.51 | 4.37 | 545 |
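The MSE and BadPix values in the tables above can be computed as follows. This sketch assumes the common light field benchmark conventions (mean squared disparity error scaled by 100, and BadPix as the percentage of pixels whose absolute disparity error exceeds 0.07); the exact scaling and threshold used in the paper are assumptions here.

```python
import numpy as np

def mse_x100(pred, gt):
    """Mean squared disparity error scaled by 100
    (the ×100 scaling is an assumed benchmark convention)."""
    return 100.0 * np.mean((pred - gt) ** 2)

def badpix(pred, gt, threshold=0.07):
    """Percentage of pixels whose absolute disparity error exceeds
    `threshold` (0.07 is a common benchmark default, assumed here)."""
    return 100.0 * np.mean(np.abs(pred - gt) > threshold)
```

For example, a prediction that is uniformly off by 0.1 disparity scores 100% on BadPix(0.07) even though its scaled MSE is only 1.0, which is why the two metrics can rank models differently, as in the Compact vs. Ultra-compact comparison above.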
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).