Submitted:
17 April 2024
Posted:
18 April 2024
Abstract
Keywords:
1. Introduction
- Changing lighting conditions and complex indoor backgrounds make it difficult for MV algorithms to segment feature regions from images. Besides the plants themselves, the scene contains irrigation pipes, suspension cables, mechanical equipment and other supporting infrastructure. The lighting also changes periodically according to the needs of the plants, i.e. their growth stages.
- Knowledge gaps remain in the application of MV to specific indoor scenarios, which limits the effectiveness of the technology.
2. Materials and Methods
2.1. Experimental Field
2.1.1. Hydroponics System and Planting
2.1.2. Camera Systems
2.2. Image Data Acquisition
2.3. Image/Data Pre-processing
2.3.1. Image Annotation
2.3.2. Image Augmentation
- Flipping the image on the horizontal axis.
- Rotating the image by an angle of up to 40 degrees, drawn randomly from a uniform distribution.
- Randomly changing the brightness and contrast of the image.
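The three operations above were applied with the Albumentations library [25]; as an illustration, they can be sketched in plain Python as follows. The brightness and contrast ranges below are illustrative assumptions, not the values used in the study:

```python
import math
import random

def hflip(img):
    """Mirror an image (a list of rows of pixel values) left-to-right."""
    return [row[::-1] for row in img]

def rotate(img, max_deg=40.0, rng=None):
    """Rotate by a random angle drawn uniformly from [-max_deg, max_deg],
    using nearest-neighbour sampling about the image centre; pixels that
    fall outside the source image are set to 0."""
    rng = rng or random.Random()
    a = math.radians(rng.uniform(-max_deg, max_deg))
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: where does output pixel (y, x) come from?
            sx = math.cos(a) * (x - cx) + math.sin(a) * (y - cy) + cx
            sy = -math.sin(a) * (x - cx) + math.cos(a) * (y - cy) + cy
            si, sj = round(sy), round(sx)
            if 0 <= si < h and 0 <= sj < w:
                out[y][x] = img[si][sj]
    return out

def jitter(img, rng=None):
    """Random brightness (additive) and contrast (multiplicative) change,
    clipped to the valid 8-bit range; the factor ranges are assumptions."""
    rng = rng or random.Random()
    alpha = rng.uniform(0.8, 1.2)   # contrast factor
    beta = rng.uniform(-20.0, 20.0) # brightness offset
    return [[min(255, max(0, round(alpha * p + beta))) for p in row]
            for row in img]
```

In Albumentations itself, these three steps correspond to its horizontal-flip, rotate, and brightness/contrast transforms composed into one pipeline.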
2.3.3. Datasets
2.4. Detection and Classification Methods and Tools
2.4.1. Model Training Procedure
- The backbone is essentially the same as that of YOLOv5 and consists of a series of convolutional and deconvolutional layers that extract features. It also incorporates residual connections and bottleneck structures to reduce network size and improve performance [29]. Its basic building block is the C2f module, which replaces the C3 module of YOLOv5 and offers better gradient flow while supporting different model scales through adjustment of the channel count. At the end of the backbone, an SPPF module applies three max-pooling operations in series and concatenates their outputs with its input, preserving accuracy for objects at various scales while keeping the network lightweight [28].
- The neck uses multi-scale feature fusion to merge feature maps from different stages of the backbone and improve the feature representation. The fusion method used by YOLOv8 is still PAN-FPN [30,31], which strengthens the fusion and use of feature-layer information at different scales. Two upsampling stages and several C2f modules, together with the final decoupled head structure, make up the neck. For this last part, YOLOv8 adopts the decoupled-head idea of YOLOX, combining confidence and box regression to reach a new level of accuracy [28].
- The head, responsible for the final detection and classification tasks, adopts a decoupled structure that separates the classification and detection branches. The detection branch consists of a series of convolutional and deconvolutional layers that generate the detection results, while the classification branch uses global average pooling to classify each feature map. The head is also anchor-free, abandoning the anchor boxes used in YOLOv7, which reduces the number of box predictions and speeds up Non-Maximum Suppression (NMS) [29]. For loss computation, YOLOv8 uses the Task-Aligned Assigner as its positive-sample assignment strategy, with BCE Loss for the classification branch and Distribution Focal Loss (DFL) plus CIoU Loss for the regression branch. The integral box representation produced by DFL must be decoded, using Softmax and convolution operations, into conventional four-value bounding boxes. The head outputs feature maps at three scales, each with a classification and a box branch (six maps in total); the predictions of both branches are concatenated and dimensionally transformed into the final three outputs (inputs of the "Detect" blocks in Figure 7).
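As a concrete illustration of the SPPF idea (serial max-pooling followed by concatenation), the following is a minimal pure-Python sketch on a single 2D feature map. The kernel size of 5 is the value commonly used in the YOLOv5/YOLOv8 family and is assumed here; a real implementation operates on multi-channel tensors and wraps the pooling between 1×1 convolutions:

```python
def maxpool(fm, k=5):
    """k x k max-pooling with stride 1 and 'same' padding on a 2D map."""
    h, w, r = len(fm), len(fm[0]), k // 2
    return [[max(fm[i][j]
                 for i in range(max(0, y - r), min(h, y + r + 1))
                 for j in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def sppf(fm, k=5):
    """SPPF: three serial max-pools whose outputs are concatenated with
    the input along the channel axis (returned here as a list of four
    2D maps). Serial pooling with kernel k mimics parallel SPP pooling
    with kernels k, 2k-1 and 3k-2 at lower cost."""
    p1 = maxpool(fm, k)
    p2 = maxpool(p1, k)
    p3 = maxpool(p2, k)
    return [fm, p1, p2, p3]
```

Each successive pool widens the effective receptive field, which is how the module keeps accuracy across object scales while staying lightweight.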


2.4.2. Model Parameters
- The number of epochs indicates how many times the entire training dataset is passed through the YOLOv8 algorithm. 300 epochs are selected to give the model sufficient time to adapt on the one hand, and to keep the computation time reasonable on the other.
- The batch size specifies how many images are processed before the model weights are updated; one epoch corresponds to a run through all batches. A batch size of 16 is selected, as smaller batches lead to faster convergence but larger fluctuations, while larger batches yield more accurate gradient estimates but require more memory and therefore computing power.
- The image size refers to the width and height of the images processed by the model. Higher resolutions require more computing resources; lower resolutions can lead to a loss of information. As this task involves small details such as buds, blossoms and fruit buds, an image size of 640 × 480 is selected. YOLO models process images in a 1:1 ratio, so the shorter side of the image is padded with black bars to maintain the original aspect ratio.
| Epochs | Batch Size | Image Size | Model |
|---|---|---|---|
| 300 | 16 | 640 × 480 | YOLOv8n |
| 300 | 16 | 640 × 480 | YOLOv8m |
| 300 | 16 | 640 × 480 | YOLOv8l |
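The training parameters above map directly onto the Ultralytics API [26]. A minimal usage sketch, in which the dataset configuration file name is a hypothetical placeholder:

```python
from ultralytics import YOLO

# Load pretrained weights; "yolov8m.pt" and "yolov8l.pt" are used analogously.
model = YOLO("yolov8n.pt")

model.train(
    data="chili.yaml",  # hypothetical dataset config listing the three classes
    epochs=300,         # upper bound on passes over the training dataset
    batch=16,           # images per weight update
    imgsz=640,          # longer image side; 640 x 480 inputs are letterboxed
)
```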
2.4.3. Computation Environment
2.5. Performance Evaluation Metrics
3. Experimental Results and Comparative Analysis
3.1. Image Recordings
3.2. Model Training Results
3.2.1. Training Results Bird’s Eye View
3.2.2. Training Results Bird’s Eye and Side View
3.2.3. Summary of the Training Results
3.3. Test Results
3.3.1. Test Results Bird’s Eye View
3.3.2. Test Results Bird’s Eye and Side View
3.3.3. Summary of the Test Results
| Model | mAP50/% (Test) | mAP50-95/% (Validation) | Precision/% | Recall/% |
|---|---|---|---|---|
| Bird’s eye view | ||||
| YOLOv8n-BV | 92.9 | 94.4 | 98.8 | 98.8 |
| YOLOv8m-BV | 92.6 | 94.2 | 98.4 | 98.8 |
| YOLOv8l-BV | 93.4 | 94.8 | 99.2 | 98.4 |
| Bird’s eye and side view | ||||
| YOLOv8n-BSV | 93.7 | 93.0 | 99.6 | 99.2 |
| YOLOv8m-BSV | 96.3 | 94.4 | 99.2 | 98.8 |
| YOLOv8l-BSV | 95.6 | 93.8 | 99.2 | 98.8 |


4. Discussion
4.1. Image Quality
4.2. Data and Model Discussion
5. Conclusion and Future Directions
Abbreviations
| AP | Average Precision |
| BV | Bird’s Eye View |
| BSV | Bird’s Eye and Side View |
| CAISA | Cologne Lab for Artificial Intelligence and Smart Automation |
| cls loss | class loss |
| CNN | Convolutional Neural Network |
| CPU | Central Processing Unit |
| dfl loss | distribution focal loss |
| DL | Deep Learning |
| EC | Electrical Conductivity |
| FN | False Negative |
| FP | False Positive |
| GB | Gigabyte |
| GPU | Graphical Processing Unit |
| IDE | Integrated Development Environment |
| IoU | Intersection over Union |
| LED | Light-Emitting Diode |
| mAP | mean Average Precision |
| MB | Megabyte |
| ML | Machine Learning |
| MV | Machine Vision |
| NFT | Nutrient Film Technique |
| PAR | Photosynthetically Active Radiation |
| pH | potential of Hydrogen |
| PPFD | Photosynthetic Photon Flux Density |
| PR | Precision-Recall |
| SV | Side View |
| TN | True Negative |
| TP | True Positive |
| VIF | Vertical Indoor-Farming / Vertical Indoor-Farm |
| YOLO | You Only Look Once |
References
- Despommier, D. D. The Vertical Farm: Feeding the World in the 21st Century; Picador: USA, 2020.
- Polsfuss, L. PFLANZEN. Available online: https://pflanzenfabrik.de/hydroponik/pflanzen/ (accessed on 5 April 2024).
- Available online: https://chili-plants.com/chilisorten/ (accessed on 2 December 2023).
- Drache, P. Chili Geschichte, Herkunft und Verbreitung. Available online: https://chilipflanzen.com/wissenswertes/chili-geschichte/ (accessed on 2 December 2023).
- Azlan, A.; Sultana, S.; Huei, C. S.; Razman, M. R. Antioxidant, Anti-Obesity, Nutritional and Other Beneficial Effects of Different Chili Pepper: A Review. Molecules 2022, 27, 898. https://www.mdpi.com/1420-3049/27/3/898.
- Thiele, R. Untersuchungen zur Biosynthese von Capsaicinoiden – Vorkommen und Einfluss von Acyl-Thioestern auf das Fettsäuremuster der Vanillylamide in Capsicum spp.; Dissertation, Bergische Universität Wuppertal, 2008. Available online: https://elekpub.bib.uni-wuppertal.de/urn/urn:nbn:de:hbz:468-20080466.
- Meier, U. Entwicklungsstadien mono- und dikotyler Pflanzen: BBCH Monografie, Quedlinburg, 2018. https://www.openagrar.de/receive/openagrar_mods_00042352.
- Feldmann, F.; Rutikanga, A. Phenological growth stages and BBCH identification keys of Chilli (Capsicum annuum L., Capsicum chinense JACQ., Capsicum baccatum L.). J. Plant Dis. Prot. 2021, 128, 549–555. [CrossRef]
- Paul, N. C.; Deng, J. X.; Sang, H.-K.; Choi, Y.-P.; Yu, S.-H. Distribution and Antifungal Activity of Endophytic Fungi in Different Growth Stages of Chili Pepper (Capsicum annuum L.) in Korea. The Plant Pathology Journal 2012, 28, 10–19. [CrossRef]
- Paul, A.; Nagar, H.; Machavaram R. Utilizing Fine-Tuned YOLOv8 Deep Learning Model for Greenhouse Capsicum Detection and Growth Stage Determination. In Proceedings of the 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), Kirtipur, Nepal: IEEE, 11–13 Oct. 2023; pp. 649–656. https://ieeexplore.ieee.org/document/10290335.
- Kamilaris, A.; Prenafeta-Boldú, F.X. A review of the use of convolutional neural networks in agriculture. J. Agric. Sci. 2018, 156, 312–322. [CrossRef]
- Tian, H.; Wang, T.; Liu, Y.; Qiao, X.; Li, Y. Computer vision technology in agricultural automation — A review. Information Processing in Agriculture 2020, 7, 1–19. [CrossRef]
- Lin, K.; Chen, J.; Si, H.; Wu, J. A Review on Computer Vision Technologies Applied in Greenhouse Plant Stress Detection. In: Tan, T.; Ruan, Q.; Chen, X.; Ma, H.; Wang, L. (eds) Advances in Image and Graphics Technologies. IGTA 2013. Communications in Computer and Information Science, 363, Berlin: Springer. [CrossRef]
- Wijanarko, A.; Nugroho, A. P.; Kusumastuti, A. I.; Dzaky, M. A. F.; Masithoh, R. E.; Sutiarso, L.; Okayasu, T. Mobile mecavision: automatic plant monitoring system as a precision agriculture solution in plant factories. IOP Conf. Series: Earth and Environmental Science 2021, 733, 012026. https://iopscience.iop.org/article/10.1088/1755-1315/733/1/012026.
- Samiei, S.; Rasti, P.; Ly Vu, J.; Buitink, J.; Rousseau, D. Deep learning-based detection of seedling development. Plant Methods 2020, 16, 103. [CrossRef]
- Yeh, Y.-H.F.; Lai, T.-C.; Liu, T.-Y.; Liu, C.-C.; Chung, W.-C.; Lin, T.-T. An automated growth measurement system for leafy vegetables. Biosystems Engineering 2014, 117, 43–50. [CrossRef]
- Nugroho, A. P.; Fadilah, M. A. N.; Wiratmoko, A.; Azis, Y. A.; Efendi, A. W.; Sutiarso, L.; Okayasu, T. Implementation of crop growth monitoring system based on depth perception using stereo camera in plant factory. IOP Conf. Series: Earth and Environmental Science 2020, 542, 012068. https://iopscience.iop.org/article/10.1088/1755-1315/542/1/012068.
- Available online: https://www.pflanzenforschung.de/de/pflanzenwissen/lexikon-a-z/phaenotypisierung-10020 (accessed on 2 December 2023).
- Li, Z.; Guo, R.; Li, M.; Chen, Y.; Li, G. A review of computer vision technologies for plant phenotyping. Computers and Electronics in Agriculture 2020, 176, 105672. [CrossRef]
- Redmon, J.; Divvala, S. K.; Girshick, R. B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp. 779–788. https://ieeexplore.ieee.org/document/7780460.
- Hespeler, S. C.; Nemati, H.; Dehghan-Niri, E. Non-destructive thermal imaging for object detection via advanced deep learning for robotic inspection and harvesting of chili peppers. Artificial Intelligence in Agriculture 2021, 5, 102–117. [CrossRef]
- Coleman, G. R.; Kutugata, M.; Walsh, M. J.; Bagavathiannan, M. Multi-growth stage plant recognition: a case study of Palmer amaranth (Amaranthus palmeri) in cotton (Gossypium hirsutum). arXiv 2023, arXiv:2307.15816. https://arxiv.org/abs/2307.15816.
- Zhang, P.; Li, D. CBAM + ASFF-YOLOXs: An improved YOLOXs for guiding agronomic operation based on the identification of key growth stages of lettuce. Computers and Electronics in Agriculture 2022, 203, 107491. [CrossRef]
- grow-shop24. DiamondBox Silver Line SL150, 150×150×200cm. Available online: https://www.grow-shop24.de/diamondbox-silver-line-sl150 (accessed on 8 June 2023).
- Available online: https://docs.roboflow.com/ (accessed on 6 December 2023).
- Buslaev, A.; Iglovikov, V. I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A. A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125. https://www.mdpi.com/2078-2489/11/2/125.
- Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. Available online: https://github.com/ultralytics/ultralytics (accessed on 30 November 2023).
- Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 2023, 12, 2323. [CrossRef]
- Pan, Y.; Xiao, X.; Hu, K.; Kang, H.; Jin, Y.; Chen, Y.; Zou, X. ODN-Pro: An Improved Model Based on YOLOv8 for Enhanced Instance Detection in Orchard Point Clouds. Agronomy 2024, 14, 697. [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. https://ieeexplore.ieee.org/document/8579011.
- Lin, T.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, 21–26 July 2017; pp. 2117–2125. https://ieeexplore.ieee.org/document/8099589.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context; Springer: Berlin/Heidelberg, Germany, 2014; Volume 8693, pp. 740–755. https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48.
- Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279. [CrossRef]
- Akbarnezhad, E. YOLOv8 Projects #1 "Metrics, Loss Functions, Data Formats, and Beyond". Available online: https://www.linkedin.com/pulse/yolov8-projects-1-metrics-loss-functions-data-formats-akbarnezhad/ (accessed on 16 November 2023).


















| Item | Quantity | Item description |
|---|---|---|
| 1 | 4 | HQ Raspberry Pi camera |
| 2 | 4 | 6MM WW Raspberry Pi lens |
| 3 | 4 | Raspberry Pi Zero W |
| 4 | 1 | Raspberry Pi 4 B |
| 5 | 2 | Mantona 17996 travel tripod |
| 6 | 4 | USB-A to USB-B-Mini 3 m cable |
| 7 | 1 | USB-C power adapter |
| 8 | 5 | Memory card SanDisk Ultra microSDHC 32GB |
| 9 | 1 | Energenie Uni-4-fold USB charger |
| Class | Train (BV) | Val (BV) | Test (BV) | Train (BSV) | Val (BSV) | Test (BSV) |
|---|---|---|---|---|---|---|
| Growing | 446 | 77 | 121 | 2,112 | 107 | 121 |
| Flowering | 498 | 28 | 49 | 914 | 39 | 49 |
| Fruiting | 624 | 26 | 72 | 1,932 | 95 | 72 |
| Data | Values |
|---|---|
| CPU | 12th Gen Intel Core i9-13900 |
| RAM | 64 GB DDR5 |
| GPU | NVIDIA RTX A5000 |
| Algorithm | YOLOv8n, -v8m, -v8l |
| Assignment | Explanation |
|---|---|
| True Positive (TP) | Bounding box in the correct position (positive) and correct prediction (true) |
| False Positive (FP) | Bounding box predicted (positive), but with the wrong position or class (false) |
| False Negative (FN) | Bounding box not recognised (negative), although an object is present (false) |
| True Negative (TN) | No bounding box predicted (negative) where no object is present (true); no influence for multi-class tasks |
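From these assignments, precision and recall follow directly; whether a predicted box counts as a TP is decided by its Intersection over Union (IoU) with a ground-truth box against a threshold. A minimal sketch of the computations:

```python
def precision(tp, fp):
    """Fraction of predicted bounding boxes that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of ground-truth objects that are found."""
    return tp / (tp + fn) if tp + fn else 0.0

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

The mAP50 values reported below correspond to averaging the per-class average precision at an IoU threshold of 0.5, while mAP50-95 averages over thresholds from 0.5 to 0.95.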
| Classes | mAP50 / % | mAP50-95 / % |
|---|---|---|
| All | 99.0 | 94.4 |
| Growing | 98.3 | 90.1 |
| Flowering | 99.3 | 96.6 |
| Fruiting | 99.5 | 96.5 |
| Classes | mAP50 / % | mAP50-95 / % |
|---|---|---|
| All | 99.0 | 94.2 |
| Growing | 98.0 | 89.9 |
| Flowering | 99.4 | 95.9 |
| Fruiting | 99.5 | 97.0 |
| Classes | mAP50 / % | mAP50-95 / % |
|---|---|---|
| All | 98.9 | 94.8 |
| Growing | 97.8 | 91.0 |
| Flowering | 99.5 | 95.8 |
| Fruiting | 99.5 | 97.6 |
| Classes | mAP50 / % | mAP50-95 / % |
|---|---|---|
| All | 98.8 | 93.0 |
| Growing | 99.4 | 89.1 |
| Flowering | 97.8 | 95.1 |
| Fruiting | 99.3 | 94.9 |
| Classes | mAP50 / % | mAP50-95 / % |
|---|---|---|
| All | 98.8 | 94.4 |
| Growing | 99.4 | 91.6 |
| Flowering | 97.8 | 96.6 |
| Fruiting | 99.2 | 95.2 |
| Classes | mAP50 / % | mAP50-95 / % |
|---|---|---|
| All | 98.0 | 93.8 |
| Growing | 98.5 | 91.7 |
| Flowering | 96.9 | 95.8 |
| Fruiting | 98.7 | 94.0 |
| Model | mAP50/% | mAP50-95/% | Epochs | Training time/h | Model size/MB |
|---|---|---|---|---|---|
| Bird’s eye view | |||||
| YOLOv8n-BV | 99.0 | 94.4 | 300 | 0.576 | 6.3 |
| YOLOv8m-BV | 99.0 | 94.2 | 197 | 1.032 | 52 |
| YOLOv8l-BV | 98.9 | 94.8 | 251 | 1.956 | 87.7 |
| Bird’s eye and side view | |||||
| YOLOv8n-BSV | 98.8 | 93.0 | 230 | 0.895 | 6.3 |
| YOLOv8m-BSV | 98.8 | 94.4 | 241 | 2.474 | 52 |
| YOLOv8l-BSV | 98.0 | 93.8 | 272 | 4.138 | 87.7 |
| Classes (mAP50/%) | YOLOv8n-BV | YOLOv8m-BV | YOLOv8l-BV |
|---|---|---|---|
| All | 92.9 | 92.6 | 93.4 |
| Growing | 91.2 | 91.2 | 92.1 |
| Flowering | 95.6 | 95.4 | 96.1 |
| Fruiting | 91.9 | 91.1 | 91.9 |
| Classes (mAP50/%) | YOLOv8n-BSV | YOLOv8m-BSV | YOLOv8l-BSV |
|---|---|---|---|
| All | 93.7 | 96.3 | 95.6 |
| Growing | 91.8 | 94.5 | 94.4 |
| Flowering | 97.3 | 98.8 | 98.3 |
| Fruiting | 92.0 | 95.7 | 94.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).