Submitted: 29 June 2025
Posted: 30 June 2025
Abstract
Keywords:
1. Introduction
- This study proposes deep grading, an image-based alternative to traditional aggregate grading. Compared with traditional aggregate grading, deep grading is cheaper, more scalable, and more environmentally friendly. The work extends the application of deep learning in civil engineering and could be extended to any industry that measures particle size distribution, such as pharmaceuticals, food processing, and mining.
- As an initial step, we introduce the Segment Anything Model (SAM) and Kolmogorov-Arnold Networks (KAN) as the components of deep grading. To the best of our knowledge, this is the first time that SAM and KAN have been applied to aggregate grading analysis.
- Systematic experiments show that deep grading is adequate for aggregate prescreening, which paves the way for its future large-scale application in civil engineering. Our curated datasets may also be of independent interest. We release the dataset on Zenodo (https://zenodo.org/uploads/15661205) for practitioners’ free access and testing.
2. Related Work
3. Deep Grading
- Image encoder: A pre-trained vision transformer (ViT-H/16) [24] with 632M parameters forms the backbone. It splits each 1,024×1,024 image into 16×16-pixel patches, producing a 64×64 feature map (1,024/16 = 64 patches per side). Although computationally intensive, the encoder runs only once per image to produce a fixed embedding, which enables real-time downstream interaction. SAM’s modular design decouples the image encoder from the lightweight prompt encoder and mask decoder, so a single image embedding can be reused across multiple prompts, a critical feature for efficiency and practical deployment.
- Prompt encoder: Spatial prompts (points, boxes) are encoded as positional embeddings using sinusoidal encoding combined with learned representations. Points map to 2D coordinates, while boxes represent top-left and bottom-right coordinate pairs. Mask prompts derive embeddings via a convolutional neural network (CNN), and text prompts utilize CLIP embeddings [27]. All prompt types are projected into a unified 256-dimensional embedding space for seamless integration.
- Mask decoder: A lightweight Transformer decoder fused with a dynamic prediction head computes per-pixel foreground probabilities through bidirectional cross-attention between image and prompt embeddings. To resolve ambiguous prompts, the decoder simultaneously outputs multiple candidate masks and ranks them via learned Intersection over Union (IoU) prediction heads, ensuring robust handling of segmentation uncertainty.
- Assisted-Manual Phase: Professional annotators use an interactive tool powered by SAM to label masks manually. This phase integrates human expertise with model assistance to refine the mask and lay the foundation for subsequent stages.
- Semi-Automatic Phase: The model automatically generates confident mask predictions, which annotators then review and refine. This hybrid approach balances automation with human oversight, enhancing dataset quality and efficiency.
- Fully Automatic Phase: SAM generates masks without human intervention, leveraging the training from the earlier phases, and produces 99.1% of the final masks. This phase provides scalability: it can create more than a billion masks while maintaining privacy and image licensing standards.
- Axes: An ellipse is fitted to the edge of each aggregate. The major (long) axis and the minor (short) axis follow immediately from the fit, since they are the basic parameters that describe the shape of an ellipse.
- Area: The area of each aggregate is taken from the mask generated by SAM rather than from the fitted ellipse. It is computed by counting the number of pixels in the mask.
- Perimeter: Similarly, the perimeter is computed from the SAM segmentation mask by counting the number of pixels on the profile (contour) of the mask.
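
The three features above can be illustrated with a minimal sketch. It assumes a binary mask given as nested lists of 0/1 and uses 4-connectivity to decide which pixels lie on the profile. As a stand-in for the edge-based ellipse fit described above, it estimates the axes from the second-order moments of the mask, which is a different and simpler technique. The function name `mask_features` is hypothetical, and all outputs are in pixels.

```python
import math


def mask_features(mask):
    """Area, perimeter, and moment-based axis lengths of a binary mask (pixels)."""
    height, width = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    area = len(pixels)  # area = number of foreground pixels
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    def is_background(r, c):
        # Anything outside the image counts as background.
        return not (0 <= r < height and 0 <= c < width) or not mask[r][c]

    # Perimeter = foreground pixels that touch background (4-connectivity).
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    perimeter = sum(
        1
        for r, c in pixels
        if any(is_background(r + dr, c + dc) for dr, dc in neighbours)
    )

    # Assumption: axes come from the covariance of pixel coordinates,
    # not from the edge-based ellipse fit used in the text.
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    var_r = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    var_c = sum((c - mean_c) ** 2 for _, c in pixels) / area
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area

    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (var_r + var_c) / 2
    root = math.sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    # For a uniformly filled ellipse, full axis length = 4 * sqrt(eigenvalue).
    major_axis = 4 * math.sqrt(half_trace + root)
    minor_axis = 4 * math.sqrt(max(half_trace - root, 0.0))

    return {
        "area": area,
        "perimeter": perimeter,
        "major_axis": major_axis,
        "minor_axis": minor_axis,
    }


# Usage: a 4x6 rectangle of foreground pixels inside an 8x10 image.
example = [[0] * 10 for _ in range(8)]
for row in range(2, 6):
    for col in range(2, 8):
        example[row][col] = 1
print(mask_features(example))  # area is 24 and perimeter is 16 for this mask
```

Converting these pixel counts into millimetres requires a calibration factor from the imaging setup, which this sketch does not cover.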
4. Dataset Curation
4.1. Single Aggregate
4.2. Multiple Aggregate
4.3. Image Acquisition

5. Experiment
- Hardware environment: The experiments are performed on a computer equipped with an NVIDIA Quadro P2000 GPU, an Intel Core i7-8700K CPU, 16GB of RAM, and the Windows 10 operating system.
- Software environment: The experiments are based on the PyTorch framework under Python 3.10.
6. Discussion
6.1. Impact of Architecture in KAN
6.2. Non-Invasiveness of Deep Grading
7. Conclusion
Funding
References
- Fang, M.; Park, D.; Singuranayo, J.L.; Chen, H.; Li, Y. Aggregate gradation theory, design and its impact on asphalt pavement performance: a review. International Journal of Pavement Engineering 2019, 20, 1408–1424.
- Bruno, L.; Parla, G.; Celauro, C. Image analysis for detecting aggregate gradation in asphalt mixture from planar images. Construction and Building Materials 2012, 28, 21–30.
- Reyes-Ortiz, O.J.; Mejia, M.; Useche-Castelblanco, J.S. Digital image analysis applied in asphalt mixtures for sieve size curve reconstruction and aggregate distribution homogeneity. International Journal of Pavement Research and Technology 2021, 14, 288–298.
- Peres, R.S.; Jia, X.; Lee, J.; Sun, K.; Colombo, A.W.; Barata, J. Industrial artificial intelligence in industry 4.0-systematic review, challenges and outlook. IEEE Access 2020, 8, 220121–220139.
- Gong, J.; Liu, Z.; Nie, J.; Cui, Y.; Jiang, J.; Ou, X. Study on the automated characterization of particle size and shape of stacked gravelly soils via deep learning. Acta Geotechnica 2025, 1–26.
- Sun, Z.; Li, Y.; Pei, L.; Li, W.; Hao, X. Classification of coarse aggregate particle size based on deep residual network. Symmetry 2022, 14, 349.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 4015–4026.
- Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. Sam 2: Segment anything in images and videos. arXiv 2024, arXiv:2408.00714.
- Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. Kan: Kolmogorov-arnold networks. arXiv 2024, arXiv:2404.19756.
- Weng, Y.; Li, M.; Tan, M.J.; Qian, S. Design 3D printing cementitious materials via Fuller Thompson theory and Marson-Percy model. Construction and Building Materials 2018, 163, 600–610.
- Ma, H.; Xu, W.; Li, Y. Random aggregate model for mesoscopic structures and mechanical analysis of fully-graded concrete. Computers & Structures 2016, 177, 103–113.
- Tafesse, S.; Fernlund, J.; Bergholm, F. Digital sieving-Matlab based 3-D image analysis. Engineering Geology 2012, 137, 74–84.
- Thaker, P.; Arora, N. Measurement of Aggregate Size and Shape Using Image Analysis. In Proceedings of the National Conference on Structural Engineering and Construction Management, 2020.
- Wang, D.; Wang, H.; Bu, Y.; Schulze, C.; Oeser, M. Evaluation of aggregate resistance to wear with Micro-Deval test in combination with aggregate imaging techniques. Wear 2015, 338, 288–296.
- Prabha, D.S.; Kumar, J.S. Performance evaluation of image segmentation using objective methods. Indian J. Sci. Technol. 2016, 9, 1–8.
- Yu, Y.; Wang, C.; Fu, Q.; Kou, R.; Huang, F.; Yang, B.; Yang, T.; Gao, M. Techniques and challenges of image segmentation: A review. Electronics 2023, 12, 1199.
- Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems 2021, 33, 6999–7019.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 2015; pp. 234–241.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, 2017; pp. 2961–2969.
- Patra, S.; Panda, S.; Parida, B.K.; Arya, M.; Jacobs, K.; Bondar, D.I.; Sen, A. Physics informed kolmogorov-arnold neural networks for dynamical analysis via efficient-kan and wav-kan. arXiv 2024, arXiv:2407.18373.
- Zhang, X.; Zhou, H. Generalization bounds and model complexity for kolmogorov-arnold networks. arXiv 2024, arXiv:2410.08026.
- Xu, K.; Chen, L.; Wang, S. Kolmogorov-arnold networks for time series: Bridging predictive power and interpretability. arXiv 2024, arXiv:2406.02496.
- Li, C.; Liu, X.; Li, W.; Wang, C.; Liu, H.; Liu, Y.; Chen, Z.; Yuan, Y. U-kan makes strong backbone for medical image segmentation and generation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025, Vol. 39, pp. 4652–4660.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, 2021.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Advances in Neural Information Processing Systems 2020, 33, 1877–1901.
- Pinheiro, P.O.; Collobert, R.; Dollár, P. Learning to segment object candidates. Advances in Neural Information Processing Systems 2015, 28.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 2021; pp. 8748–8763.
- Bookstein, F.L. Fitting conic sections to scattered data. Computer Graphics and Image Processing 1979, 9, 56–71.
- Ranganathan, A. The levenberg-marquardt algorithm. Tutorial on LM algorithm 2004, 11, 101–110.
- State Administration for Market Regulation; Standardization Administration of the People’s Republic of China. GB/T 14685-2022, Pebble and crushed stone for construction. China Building Materials Federation, Beijing, 2022.
- Allgower, E.L.; Schmidt, P.H. Computing volumes of polyhedra. Mathematics of Computation 1986, 46, 171–174.
- Szilvási-Nagy, M.; Matyasi, G. Analysis of STL files. Mathematical and Computer Modelling 2003, 38, 945–960.
- Hughes, S.; Lau, J. A technique for fast and accurate measurement of hand volumes using Archimedes’ principle. Australasian Physics & Engineering Sciences in Medicine 2008, 31, 56–59.
- Hughes, S.W. Archimedes revisited: a faster, better, cheaper method of accurately measuring the volume of small objects. Physics Education 2005, 40, 468.

| Method | Cost | Scalability | Environmental Impact |
|---|---|---|---|
| Traditional | high | low | high |
| Deep | low | high | low |

| Category | Precision (%) | Recall (%) | F1 Score (%) | Support |
|---|---|---|---|---|
| 1 (37.5mm-53mm) | 67 | 65 | 66 | 80 |
| 2 (31mm-37.5mm) | 58 | 59 | 58 | 160 |
| 3 (26.5mm-31mm) | 60 | 58 | 59 | 198 |
| 4 (19mm-26.5mm) | 81 | 84 | 82 | 847 |
| 5 (16mm-19mm) | 70 | 69 | 70 | 583 |
| 6 (9.5mm-16mm) | 85 | 81 | 83 | 870 |
| 7 (4.75mm-9.5mm) | 84 | 87 | 85 | 549 |
| accuracy (%) | | | 78 | 3287 |
| macro avg (%) | 72 | 72 | 72 | 3287 |
| weighted avg (%) | 78 | 78 | 78 | 3287 |
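
The macro and weighted averages in the table above can be reproduced from the per-category rows. The sketch below assumes the usual definitions (macro: unweighted mean over categories; weighted: mean weighted by support). Because it starts from the rounded percentages printed in the table, agreement is expected only after rounding. The helper names `macro_avg` and `weighted_avg` are illustrative.

```python
# Per-category (precision %, recall %, F1 %, support), copied from the table.
rows = {
    "1 (37.5mm-53mm)": (67, 65, 66, 80),
    "2 (31mm-37.5mm)": (58, 59, 58, 160),
    "3 (26.5mm-31mm)": (60, 58, 59, 198),
    "4 (19mm-26.5mm)": (81, 84, 82, 847),
    "5 (16mm-19mm)": (70, 69, 70, 583),
    "6 (9.5mm-16mm)": (85, 81, 83, 870),
    "7 (4.75mm-9.5mm)": (84, 87, 85, 549),
}


def macro_avg(values):
    """Unweighted mean: every category counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by the number of samples (support) in each category."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[3] for row in rows.values()]
total_support = sum(supports)

# Index 0 = precision, 1 = recall, 2 = F1 score.
macro = [macro_avg([row[i] for row in rows.values()]) for i in range(3)]
weighted = [
    weighted_avg([row[i] for row in rows.values()], supports) for i in range(3)
]

print("total support:", total_support)
print("macro avg (%):", [round(x) for x in macro])
print("weighted avg (%):", [round(x) for x in weighted])
```

The gap between the macro and weighted averages reflects class imbalance: the three largest categories (4, 5, and 6) hold most of the samples and pull the weighted figures up, while the smaller categories 1 to 3 lower the macro figures.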

| Category | Standard Deviation (%) | Mean Value (%) |
|---|---|---|
| 1 (37.5mm-53mm) | 15.81 | 8.77 |
| 2 (31mm-37.5mm) | 22.48 | -1.61 |
| 3 (26.5mm-31mm) | 25.31 | -3.25 |
| 4 (19mm-26.5mm) | 24.84 | -0.84 |
| 5 (16mm-19mm) | 24.79 | -5.41 |
| 6 (9.5mm-16mm) | 30.26 | -3.89 |
| 7 (4.75mm-9.5mm) | 33.21 | -1.35 |
| Global Data | 27.75 | -2.49 |

| Category | Actual Passing Percentage (Mean) | Actual Passing Percentage (Std) | Predicted Passing Percentage (Mean) | Predicted Passing Percentage (Std) |
|---|---|---|---|---|
| 1 (37.5mm-53mm) | 97.21 | 0.78 | 96.53 | 1.21 |
| 4 (19mm-26.5mm) | 52.49 | 6.87 | 52.50 | 7.02 |
| 6 (9.5mm-16mm) | 19.58 | 3.73 | 17.05 | 4.28 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).