3.2. Semantic Segmentation in Forestry
Because of its potential to provide vital insights for forest management and conservation, semantic segmentation in forestry has become an essential topic of research. Much work in recent years has focused on creating and evaluating algorithms and strategies for the semantic segmentation of forest imagery. In this literature review, we explore a number of works that have contributed to this topic.
The authors of the paper “Tree species classification of forest stands using multisource remote sensing data” [
37] worked on creating a system that identifies tree species automatically using deep learning algorithms, with the goal of making it available on mobile devices. They segmented tree leaves in images and used them to categorize the tree species. To separate the leaves from the images, the authors used a U-Net architecture, a popular deep-learning model for medical image segmentation [
32], to separate the leaves from the images. The U-Net model includes two networks: an encoder network that captures image features and a decoder network that generates the segmentation map. The authors employed two categories for the segmentation task: “leaf” and “background”. The U-Net model was trained with a dataset of 9,000 tree leaf images that were manually annotated with their respective species labels.
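The paper does not reproduce the architecture, but the encoder–decoder idea can be illustrated with a minimal two-level U-Net in PyTorch for the two-class leaf/background task. The layer widths and depth here are our own illustrative choices, not the authors' configuration:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Two-level U-Net: the encoder captures image features, the decoder
    restores resolution, and skip connections carry spatial detail across."""
    def __init__(self, in_ch=3, n_classes=2):  # classes: leaf, background
        super().__init__()
        self.enc1 = conv_block(in_ch, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)  # per-pixel class logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

logits = MiniUNet()(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 2, 64, 64])
```

The segmentation map is obtained by taking the per-pixel argmax over the two class channels.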
After segmenting the leaves, the authors classified the tree species using VGG16, a CNN pre-trained for computer vision tasks. The VGG16 model was fine-tuned using the segmented leaves and their respective species labels. The classification task was performed on a dataset of 10,000 tree images covering 20 different species.
The authors tested the system with a dataset of 900 tree images belonging to five distinct species. According to the findings, the accuracy rate for species classification was 93.3%.
The paper does not mention if the authors have made their code available for their method. The TensorFlow Lite framework was utilized for deploying the model on mobile devices, indicating that the method could potentially be implemented with this framework.
Lagos et al. [
16] introduced the “FinnWoodlands Dataset” in their academic article, which is a dataset tailored for image analysis in the setting of Finnish woodlands. The authors’ primary focus was on segmentation tasks, wherein they provided valuable insights into the classes utilized and the specific objects that were segmented.
The article lacks clarity on whether the segmentation process was limited to trees or extended to encompass other objects present in the woodland area. Given the context of the dataset and the authors’ strong emphasis on image analysis in forests, it can be inferred that the segmentation task involved identifying and classifying various elements in the Finnish forests, such as trees, plants, leaves, the ground, and potentially other pertinent components.
Panoptic segmentation [
14] is a relatively recent advancement in computer vision, and this work predates its release. Unlike older approaches that treat semantic and instance segmentation separately, panoptic segmentation assigns every pixel both a class label and, for countable objects, an instance identity, so a model can distinguish one tree from another while still recognizing the forest as a whole. This holistic view improves accuracy in detecting and segmenting trees, especially in tightly packed forests, and real-time use of the technique could benefit fields from forest management to autonomous navigation. It does, however, come with trade-offs: panoptic segmentation requires substantial processing resources, which may be a barrier for some applications; integrating semantic and instance segmentation remains an open research problem; and performance may degrade in difficult settings such as poor light or when trees are of similar shape and size. Panoptic segmentation is a game changer in tree/forest segmentation, but like any technology, it has its own set of obstacles.
As a result, there is no precise information available on how panoptic segmentation was utilized, or whether it was employed at all, to discriminate tree species or other items in the dataset. The paper also lacks explicit information on how the authors distinguished between various tree types. It is plausible that the differentiation was achieved using visual attributes such as shape, texture, and color, or a combination of these, but the precise approach to classifying tree types has not been revealed.
The methods, models, or frameworks employed for the segmentation tasks are not explicitly referenced in the paper. Given the characteristics of the dataset, it is conceivable that the researchers used traditional image analysis and computer vision techniques. The code is available on GitHub.
Nevalainen et al. [
12], in their paper “Individual tree detection and classification with UAV-based photogrammetric point clouds and hyperspectral imaging”, offer a unique deep learning strategy for identifying single-tree species in densely populated regions using hyperspectral data. The approach analyzes images acquired over a semideciduous forest in the Brazilian Atlantic biome using 25 spectral bands spanning 506 to 820 nm. The network’s design comprises a band-combination selection step, feature-map extraction, and multi-stage refinement of the confidence map. In a complex forest, the technique achieved state-of-the-art performance in recognizing and geolocating individual tree species in UAV-based hyperspectral images, and it outperforms a principal component analysis (PCA) baseline.
Within the network’s design, the authors estimate a combination of hyperspectral bands that contribute the most to the given goal. A unique deep-learning algorithm for hyperspectral imaging is proposed in this study to recognize and geolocate single-tree species in a tropical forest [
13]. The strategy is intended to deal with crowded scenes and the Hughes effect. Within the network’s design, the suggested technique seeks to estimate the combination of hyperspectral bands that contributes most to the task; the architecture reduces noise and improves performance on that task. The technique is successful under different scenarios, and the network’s performance is commensurate with past deep learning studies.
The suggested approach may be used to detect Syagrus romanzoffiana, a palm tree important for forest regeneration, and can also support wildlife investigations involving animals such as tapirs, which eat palm-tree fruits and disperse the seeds through their excrement. This work provides a unique deep-learning algorithm based on a CNN architecture for detecting single-tree species in high-dimensional hyperspectral UAV-based images. The strategy includes a band-selection step in the first phase, which was effective for dealing with high dimensionality and outperformed both the baseline that considered all 25 spectral bands and the PCA approach. The CNN architecture is followed by feature-map extraction and a multi-stage refinement of the confidence map.
The suggested technique performed exceptionally well at recognizing and geolocating trees in UAV-based hyperspectral images, with f-measure, precision, and recall values of 0.959, 0.973, and 0.945, respectively. The method is useful for monitoring forest environments while accurately identifying specific trees. The use of a hyperspectral camera mounted on a UAV or aircraft to detect bark-beetle damage in urban forests at the individual-tree level has also attracted researchers’ interest. Peng et al. [
31], in their paper “Densely based multi-scale and multi-modal fully convolutional networks for high-resolution remote-sensing image semantic segmentation”, employed convolutional neural networks, weighted and conventional support vector machines, and random forests (RF) to classify tree species using hyperspectral and photogrammetric data. Deep learning in remote sensing applications has produced numerous advances, including the detection of fir trees damaged by bark beetles in unmanned aerial vehicle images, oil palm tree detection and counting in high-resolution remote sensing images, and the use of deep convolutional networks for large-scale image recognition.
The use of a worldview-2/3 and LiDAR data fusion technique, as well as the application of convolutional neural networks for the simultaneous extraction of roads and buildings in remote sensing imagery, have also been investigated [
13]. Deep learning’s application in remote sensing data processing has also led to new applications and challenges in the area. The work addresses numerous remote sensing investigations, such as the processing and evaluation of spectrometric and stereoscopic images, radiometric correction of close-range spectral image blocks, and enhancement of item counts using heatmap regulation. It also covers applications such as automated land cover categorization, land use mapping, change detection, and forest inventory. Deep learning algorithms are constantly being developed, which has greatly improved the accuracy and efficiency of these remote sensing tasks [
16].
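As a quick sanity check on the metrics reported earlier for Nevalainen et al., the f-measure (F1) is the harmonic mean of precision and recall, and the published values are internally consistent:

```python
# F-measure (F1) is the harmonic mean of precision and recall.
precision, recall = 0.973, 0.945
f_measure = 2 * precision * recall / (precision + recall)
print(round(f_measure, 3))  # 0.959, matching the reported value
```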
The method was tested on two datasets: one with a single multispectral image and the other with a series of images taken over time [
17]. The authors achieved high forest segmentation accuracy on both datasets. Their study did not involve segmenting individual trees or differentiating tree species, although the technique they suggested may have the potential to classify tree species in future research.
The authors performed the forest segmentation using a U-Net CNN architecture. The U-Net weights were initialized via transfer learning from a pre-trained VGG-16 network, and the network was implemented in the Keras deep learning framework. The code was not included in the paper, but the authors shared details of the software and hardware used in their experiments.
Chen et al. [
4], in their paper “Individual Tree Species Identification Based on a Combination of Deep Learning and Traditional Features”, aimed to classify tree species in a study region by applying machine learning algorithms to UAV-based hyperspectral data. Rather than categorizing the data at a fine level of detail, they employed supervised learning to classify the tree species based on the spectral properties of the hyperspectral data.
For their investigation, the authors chose six tree species: Holm oak, Cork oak, Stone pine, Eucalyptus, Maritime pine, and Acacia, which served as the target classes in the categorization procedure. Two feature selection methods were used to extract features from the hyperspectral data: the Sequential Forward Selection algorithm and the Mutual Information (MI) algorithm. To categorize the tree species, the authors utilized a variety of machine learning methods, including Random Forest, Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN), and compared the algorithms’ performance using several assessment measures, such as overall accuracy, precision, recall, and F1 score.
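The pipeline of feature selection followed by multiple classifiers can be sketched with scikit-learn. Note the hedges: the data below are a synthetic stand-in for the hyperspectral measurements, the class/band counts are illustrative, and the authors actually worked in R rather than Python:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for per-crown hyperspectral features:
# 200 samples, 30 "bands", 6 tree-species classes.
X, y = make_classification(n_samples=200, n_features=30, n_informative=10,
                           n_classes=6, random_state=0)

# Mutual-information feature selection (one of the paper's two methods).
X_sel = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)

# Train the three classifier families and report the same metrics.
for name, clf in [("RF", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC()), ("k-NN", KNeighborsClassifier())]:
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    p, r, f1, _ = precision_recall_fscore_support(
        y_te, y_pred, average="macro", zero_division=0)
    print(f"{name}: acc={accuracy_score(y_te, y_pred):.2f} "
          f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```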
In terms of code availability, the authors did not specifically state whether code for their approach is available. They did, however, note that they utilized the R statistical program and associated packages to undertake data processing and analysis, implying that their technique might be implemented using these tools.
“Assessing potential of UAV multispectral imagery for estimation of AGB and carbon stock in conifer forest over UAV RGB imagery” by Gaden [
6] classified various tree species by segmenting individual trees from Very High Resolution (VHR) RGB imagery. The author employed a U-Net CNN architecture to segment trees and a ResNet-50 CNN to classify species.
The author analyzed the VHR RGB imagery with the U-Net CNN architecture to identify trees exclusively through image segmentation. Although VHR images can provide high-precision measurements for classification techniques, they typically contain clouds and cast shadows, which hinder reliable information extraction. This type of CNN is frequently employed for such tasks [
15].
The ResNet-50 CNN model was used to classify each segmented tree into one of the six tree species. The authors utilized a set of features extracted from the RGB image of each segmented tree to differentiate between various tree species. These features were then fed into the ResNet-50 model as inputs. The ResNet-50 model was pre-trained on a large dataset of natural images using the transfer learning technique. Afterward, it was fine-tuned using the VHR RGB imagery dataset. The paper does not come with any code, but the authors have given a thorough explanation of their approach and findings.
The authors of the publication “Forest segmentation using a combination of multi-scale features and deep learning” [
6] aimed to segment forests in high-resolution remote sensing images. The imagery was divided into two classes: forest and non-forest; the authors did not distinguish between different types of trees within the forest category.
The method employed by the authors for forest segmentation was “multi-scale features and deep learning”. The forest segmentation results were made more accurate by utilizing a deep learning framework that incorporated multi-scale image features. The authors used the Faster R-CNN model, which is a well-known deep-learning object detection framework. The model detected objects at different scales by extracting multi-scale features from the input image using a pyramid scheme. The specific pyramid scheme used for extracting multi-scale features from the input image was not explicitly mentioned in the paper.
Pyramid schemes are approaches used in computer vision and image processing to extract multi-scale features from an input image. They rely on image pyramids: sequences of scaled-down reproductions of the original image at different resolutions, with higher levels of the pyramid having lower resolution.
Pyramid methods capture information at multiple scales, allowing algorithms to evaluate images at various levels of detail. They help address the problem of identifying objects of varying sizes and of handling objects that appear at different scales within an image.
Pyramid systems that are commonly employed in computer vision include Gaussian pyramids, Laplacian pyramids, and steerable pyramids.
Gaussian pyramids: By repeatedly applying a Gaussian filter to the source image and subsampling it, this method produces a Gaussian pyramid. Each level of the pyramid represents the image at a different scale and resolution.
Laplacian pyramids: The Laplacian pyramid is derived from the Gaussian pyramid. Each level represents the details, or residuals, between the corresponding level of the Gaussian pyramid and its upsampled coarser neighbor, which helps capture fine details in the image.
Steerable Pyramids: Steerable pyramids are multi-scale representations that extract information from different orientations and scales using a mix of filters. They are especially effective for assessing photos with objects of various orientations.
Pyramid methods are used in a variety of computer vision tasks, including object identification, image segmentation, and feature extraction. Algorithms that extract multi-scale features can effectively handle objects of varying sizes and capture both fine-grained details and wider context information in the image.
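The Gaussian and Laplacian constructions described above can be sketched in a few lines of NumPy/SciPy. The blur sigma, level count, and random test image are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(image, levels=4):
    """Blur then subsample by 2 at each level (coarser = lower resolution)."""
    pyramid = [image]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])  # keep every other pixel
    return pyramid

def laplacian_pyramid(image, levels=4):
    """Each level stores the residual between a Gaussian level and its
    upsampled coarser neighbor; the last level keeps the coarse image."""
    gauss = gaussian_pyramid(image, levels)
    lap = []
    for fine, coarse in zip(gauss[:-1], gauss[1:]):
        upsampled = zoom(coarse, 2, order=1)[:fine.shape[0], :fine.shape[1]]
        lap.append(fine - upsampled)  # residual = fine detail at this scale
    lap.append(gauss[-1])
    return lap

img = np.random.rand(64, 64)
print([lvl.shape for lvl in gaussian_pyramid(img)])
# [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Summing the upsampled Laplacian levels back together reconstructs (approximately) the original image, which is why the Laplacian pyramid is useful for detail-preserving processing.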
You can find the code for this paper on GitHub. Python and the PyTorch deep learning framework were used to implement the code.
Stan et al. [
36] utilized deep convolutional neural networks to perform semantic segmentation of forest regions in their paper, “Semantic Segmentation of Forest Regions Using Deep Convolutional Neural Networks”. The goal was to divide various areas of the forest, including trees, roads, water bodies, and other types of land cover. The authors utilized two distinct datasets: the National Agriculture Imagery Program and the Spatio-Temporal Asset Catalog.
The study included six categories: tree, road, building, water, field, and others. The authors utilized spectral clustering, a method that groups pixels with comparable spectral features, to differentiate between various tree species.
The U-Net deep CNN architecture was utilized by the authors for semantic segmentation. The structure of U-Net consists of an encoder–decoder with skip connections that help preserve spatial information, along with dropout layers. Furthermore, the authors employed data augmentation to expand their training dataset and prevent overfitting. The paper does not discuss the availability of code.
Ma et al. [
23] suggested a method for automatically segmenting Terrestrial LiDAR Data (TLD) to distinguish individual trees in their paper “Automated extraction of driving lines from mobile laser scanning point clouds”. The researchers separated the trees in the TLD on an individual basis but did not identify the species of each tree. The work segmented tree trunks and branches, along with the nearby plants. Their segmentation approach involved two steps: region growing and convex hull fitting. The point cloud was first segmented into regions, which were then fitted with convex hulls to obtain the tree structure.
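The two-step idea can be sketched on a synthetic point cloud. This is a simplified stand-in, not the authors' algorithm: a plain distance-threshold region grower replaces their region-growing criteria, and the clusters, radius, and seed are fabricated for illustration:

```python
import numpy as np
from scipy.spatial import ConvexHull, cKDTree

def region_grow(points, seed_idx, radius=0.5):
    """Grow a region from a seed by repeatedly absorbing neighbors within
    `radius` (a simplified stand-in for the paper's first step)."""
    tree = cKDTree(points)
    region, frontier = {seed_idx}, [seed_idx]
    while frontier:
        idx = frontier.pop()
        for nb in tree.query_ball_point(points[idx], radius):
            if nb not in region:
                region.add(nb)
                frontier.append(nb)
    return np.array(sorted(region))

rng = np.random.default_rng(0)
# Two well-separated synthetic "tree" point clusters, 100 points each.
tree_a = rng.normal([0, 0, 0], 0.2, size=(100, 3))
tree_b = rng.normal([5, 5, 0], 0.2, size=(100, 3))
points = np.vstack([tree_a, tree_b])

region = region_grow(points, seed_idx=0)  # grows only within the first tree
hull = ConvexHull(points[region])         # step 2: fit a hull to that tree
print(len(region), hull.volume > 0)
```

Because the clusters are far apart relative to the growth radius, the region never crosses into the second tree, which is the behavior that lets region growing isolate individual crowns before hull fitting.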
Ma et al. [
23] evaluated their technique on a variety of datasets with varying levels of complexity; one dataset, for example, contained trees with overlapping canopies. The method’s performance was assessed using several quality criteria, including completeness and correctness, which were used to evaluate the accuracy and effectiveness of the segmentation results. Furthermore, the researchers compared their method to other sophisticated techniques from the literature, although the specific techniques evaluated were not specified in the material provided.
Ma et al. [
23] exhibited substantial levels of accuracy in segmenting individual trees from Terrestrial LiDAR data by comparing their method’s completeness and correctness measures to those of other advanced algorithms. The authors did not make their code publicly available for this paper.
The study “Semantic segmentation of remote-sensing imagery using heterogeneous big data: International society for photogrammetry and remote sensing potsdam and cityscape datasets” was conducted by Song & Kim [
35]. The authors segmented the aerial imagery to isolate tree crowns and then characterized each crown in terms of forest inventory attributes such as tree species, height, diameter at breast height, and crown width. They separated the trees from the surrounding background and understory vegetation.
The authors identified different tree species by analyzing spectral and spatial features obtained from segmented tree crowns. The researchers utilized U-Net, which is based on deep learning. The encoder network weights were initialized using transfer learning with a pre-trained VGG-16 network, which helped to enhance the model’s performance.
The authors have made the work’s code available as open source on GitHub. The code contains the U-Net architecture implementation, pre-processing procedures, and scripts for training and testing.
The authors of the study “The Semantic Segmentation of Standing Tree Images Based on the Yolo V7 Deep Learning Algorithm” by Cao et al. [
24] provided a thorough method for the semantic segmentation of standing tree images with the aim of differentiating between tree types. Rather than segmenting trees generically, the study concentrated on dividing tree areas into distinct tree species.
The authors employed the YOLO V7 deep learning algorithm, an approach known at the time for its effectiveness and precision in object-detection tasks, to perform the segmentation and classification. The input images were first pre-processed, then segmented with the YOLO V7 network, and finally the resulting tree areas were classified into the appropriate species.
Numerous tree species relevant to the geographical region under examination were included in the classifications employed in this study. Although the authors did not state how many classes they intended to have, it is clear that they wanted to include a wide variety of tree species. The YOLO V7 algorithm made it easier to distinguish between different tree types based on the distinctive traits each species possesses, such as bark texture, leaf shape, branching patterns, and general morphology. In terms of methodologies, models, and frameworks, the YOLO V7 deep learning algorithm was the main segmentation and classification tool the authors used. To improve the model’s performance, they added further strategies such as data augmentation and transfer learning. The paper does not specify particular implementation frameworks or details.
Regarding code accessibility, the authors have not made the research’s source code available to the general public.
The study by (Lim, Zulkifley et al. 2023), “Attention-Based Semantic Segmentation Networks for Forest Applications”, developed and tested an optimal attention-embedded high-resolution segmentation network, HRNet + CBAM, to classify forest and non-forest areas in Malaysia. The input data are Landsat-8 satellite images gathered from ten locations in Malaysia in 2016, 2018, and 2020 [
19].
The images were manually annotated for effective training, and the dataset was split into 80% for training and 20% for testing. Hyperparameters such as the learning rate and optimizer were tuned for the baseline HRNet model, which achieved a mean Intersection over Union (mIoU) of 84.84%, an accuracy of 91.81%, and a loss of 0.6142. Embedding the Convolutional Block Attention Module (CBAM) into HRNet improved performance to 92.24% accuracy and 85.58% mIoU, with a loss of 0.6770. When benchmarked against other models such as U-Net, SegNet, and FC-DenseNet, HRNet and HRNet + CBAM performed best in terms of precision and mIoU [19].
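A CBAM block of the kind embedded into HRNet applies channel attention followed by spatial attention, each built from average- and max-pooled statistics. The channel count and reduction ratio below are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then
    spatial attention, each from average- and max-pooled statistics."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: squeeze spatial dims by avg and max pooling.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: pool across channels, then a 7x7 conv.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feats = torch.randn(1, 32, 16, 16)
print(CBAM(32)(feats).shape)  # torch.Size([1, 32, 16, 16])
```

Because the output keeps the input shape, the block can be dropped into an existing backbone such as HRNet without changing the surrounding layers.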
Nevertheless, neither the availability of the code nor the specific framework used to create these models is specified. To manage huge datasets, the paper recommends employing more data with additional classes beyond forests, trying different attention mechanisms in various architectures, and exploring higher-end GPUs or alternative data-loading methods.
In “Semantic Segmentation Network Slimming and Edge Deployment for Real-Time Forest Fire or Flood Monitoring Systems Using Unmanned Aerial Vehicles” by (Lee, Jung et al. 2023), an innovative approach is presented in which drones outfitted with deep learning models monitor forest fires and floods in real time. Using semantic segmentation models such as DeepLabV3 and DeepLabV3+, the system identifies and demarcates affected regions in UAV-captured data. The primary advancement of the study is channel pruning-based network slimming, which drastically lowers model size and computing requirements without sacrificing accuracy [
17].
The results indicate a mean Intersection over Union (mIoU) of 88.29% on the FLAME dataset for identifying and delineating regions affected by forest fires, and an mIoU of 94.15% on the FloodNet dataset for identifying flooded areas. Higher mIoU values signify better segmentation accuracy.
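For reference, mIoU is computed by averaging the per-class intersection-over-union between the predicted and ground-truth label maps. A minimal sketch on a toy two-class map:

```python
import numpy as np

def mean_iou(pred, target, n_classes):
    """Mean Intersection over Union across classes present in either map."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0, 1], [1, 1, 1]])
target = np.array([[0, 0, 1], [0, 1, 1]])
print(round(mean_iou(pred, target, n_classes=2), 3))  # 0.708
```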
The slimmed models exhibit minimal performance loss compared to baseline networks but achieve a remarkable 20-fold increase in inference speed. Moreover, the reduction in model size and computational requirements by approximately 90% not only enhances processing efficiency but also slashes power consumption, prolonging drone endurance. This breakthrough paves the way for effective and energy-efficient UAV-based monitoring systems tailored for mitigating natural disasters like floods and forest fires, safeguarding lives and ecosystems with real-time insights.
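Channel pruning-based network slimming typically ranks channels by the magnitude of their batch-norm scale factors (gamma) and removes the smallest. A toy sketch of the selection step on a single conv+BN pair; the gamma values and threshold are fabricated for illustration, and a real implementation would also rewire the downstream layers:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3, padding=1)
bn = nn.BatchNorm2d(8)
# Pretend training (with L1 sparsity on gamma) left these scale factors.
with torch.no_grad():
    bn.weight.copy_(torch.tensor([0.9, 0.01, 0.7, 0.02, 0.8, 0.03, 0.6, 0.5]))

keep = bn.weight.abs() > 0.1          # channels worth keeping
pruned = nn.Conv2d(3, int(keep.sum()), 3, padding=1)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])  # copy surviving filters
    pruned.bias.copy_(conv.bias[keep])

print(int(keep.sum()))  # 5 channels survive out of 8
```

Dropping near-zero-gamma channels shrinks both parameters and per-layer compute, which is what yields the inference speedups and power savings the study reports.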
The study by (Ma, Dong et al. 2023), titled “Forest-PointNet: A Deep Learning Model for Vertical Structure Segmentation in Complex Forest Scenes”, offers a semantic segmentation technique centered on the Forest-PointNet model, created specifically to recognize the vertical structure of forests from terrestrial LiDAR data. The model exploits the strengths of the PointNet structure through an optimization strategy that improves the extraction of local features. When used for semantic segmentation in complex forest environments, it preserves important spatial characteristics, ensuring precise identification of forest components. The data inputs are terrestrial LiDAR scans that capture point clouds of forest habitats, although particular datasets are not mentioned. The deep learning framework used is not stated, but the Forest-PointNet model performs well: it achieves an average recognition accuracy of 90.98%, around 4% better than current approaches, notably PointCNN and PointNet++ [
22].
The model outperforms both segmentation techniques based on three-dimensional structural reconstruction and traditional machine learning techniques, eliminating the need for manual feature engineering. The study indicates that the Forest-PointNet model offers a viable approach for semantic segmentation tasks in varied forest landscapes, demonstrating robust efficiency and adaptability in complicated environments, even though code availability has not been indicated.
A ground-based LiDAR point cloud semantic segmentation technique for complex forest undergrowth scenarios is presented by (Li, Liu et al. 2023). The authors build forestry point cloud datasets fused with undergrowth point cloud features and use a point-based DMM module for semantic segmentation as a deep learning technique. The forestry dataset is gathered with backpack-style LiDAR equipment. The study also proposes a point cloud annotation method based on single-tree positioning to address the challenges of occlusion, sparse distribution, large scene scales, and high data volumes in point clouds of forestry environments, as well as the lack of an existing database (Li, Liu et al. 2023) [
18].
The study utilized the DMM module to integrate tree features and an energy segmentation function to build a key segmentation graph, with the goal of addressing the less-than-ideal fractal structures and the large data volumes, large-scale scenes, uneven sparsity, disorder, and diversity of forestry environments. Cut-pursuit is then employed to solve the graph and accomplish semantic pre-segmentation. The approach closes a gap in current deep models for complex forestry point clouds, which exhibit severe occlusion, difficult terrain, multiple-return information, high density, and unequal scales. The resulting end-to-end deep learning model, pointDMM, significantly enhances the intelligent analysis of complex forestry scenarios by training a multi-level lightweight deep learning network.
The approach shows good segmentation results on the DMM dataset, with a 21% improvement in the identification accuracy of live trees compared with other methods and an overall accuracy of 93% on the large-scale forest environment point cloud dataset DMM-3. It offers major benefits over manual point cloud segmentation for retrieving feature data from TLS-acquired artificial forest point clouds, and it lays the groundwork for forestry informatization, intelligence, and automation.
Moving further, the segmentation strategy covered in the article by (Mazhar, Fakhar et al. 2023) makes use of convolutional neural networks (CNNs), with an emphasis on encoder-decoder topologies, to accomplish semantic segmentation. Semantic segmentation, which is widely used in medical imaging applications, involves assigning a class to each pixel in an image. The CNN’s encoder module is responsible for extracting feature maps from the input images; the decoder then reconstructs these feature maps to regain the spatial resolution and generates precise pixel-by-pixel segmentation predictions [
24].
The study highlights several CNN architectures that perform well in semantic segmentation tasks. Known for its simplicity and efficacy, the U-Net architecture is one such model widely used in medical image segmentation. Another noteworthy architecture designed to learn discriminative features is the Dense-Res-Inception Net (DRINet), which has proven useful for segmenting CT images of the abdomen, brain tumors, and the brain. The high-quality multi-scale encoder–decoder network (HMEDN) uses dense multi-scale connections to provide the accurate semantic information needed for multi-modal brain tumor segmentation and pelvic CT scans. Furthermore, Fully Convolutional Networks (FCNs) trained with Dice loss and cross-entropy are evaluated for their uncertainty estimation and segmentation quality, especially in applications related to the brain, heart, and prostate.
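The Dice-plus-cross-entropy training objective mentioned above can be sketched for the binary case. The weighting alpha and the toy tensors are illustrative choices, not values from the study:

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for binary segmentation: 1 - 2|P.T| / (|P|+|T|)."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    return 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)

def combined_loss(logits, target, alpha=0.5):
    # Weighted sum of cross-entropy (per-pixel) and Dice (region overlap).
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return alpha * bce + (1 - alpha) * dice_loss(logits, target)

logits = torch.randn(1, 1, 8, 8)           # raw network outputs
target = (torch.rand(1, 1, 8, 8) > 0.5).float()  # binary ground truth
loss = combined_loss(logits, target)
print(loss.item() > 0)  # True
```

Dice directly optimizes region overlap and is robust to class imbalance, while cross-entropy gives smoother per-pixel gradients; combining them is a common compromise.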
The study lists a number of models that are more advanced, including the Multi-Scale Residual Fusion Network (MSRF-Net), which makes use of a Dual-Scale Dense Fusion (DSDF) block to improve multi-scale feature communication, and INet, which uses overlapping maximum pooling for sharper feature extraction. These techniques show significant improvements in model training efficiency and segmentation accuracy.
A wide range of medical imaging modalities is represented in the data types and input images used in these studies, including biomedical MRI, X-rays, endoscopic imaging, mammograms, brain CT scans, brain tumor images, abdominal CT scans, pelvic CT scans, multi-modal brain tumor datasets, prostate CT scans, heart CT scans, and images for pattern detection of interstitial lung disease (ILD). These numerous image types pose different segmentation opportunities and problems, demonstrating the adaptability and strength of CNN-based methods.
It is implied that well-known frameworks like TensorFlow or PyTorch are probably utilized given their extensive use in the area, even though the precise deep learning frameworks used to create these models are not stated explicitly.
The study’s findings highlight CNNs’ impressive performance across a variety of medical image segmentation tasks, especially with encoder–decoder architectures. Notable results include greater segmentation quality and uncertainty estimation with FCNs trained using Dice loss, improved multi-scale feature communication with the MSRF-Net, and improved accuracy and streamlined training procedures with the U-Net model incorporating robust connections.
In another study, Li, Liu et al. (2023) offer a sophisticated method for semantic segmentation of point clouds in complex forest-undergrowth scenes using ground-based LiDAR data. Their approach, a deep learning method called pointDMM, pre-segments semantics effectively by combining a DMM unit with the cut-pursuit algorithm. LiDAR point clouds are the main type of imagery used; they were carefully collected with backpack-mounted LiDAR equipment to guarantee thorough coverage of the forested areas. The DMM dataset, in particular the large forest-habitat point cloud dataset identified as DMM-3, is central to their investigation.
Although the exact deep learning framework is not stated, it is reasonable to assume that TensorFlow or a comparable framework was used, given the nature of the techniques involved. The segmentation method builds a key segmentation graph and minimizes an energy segmentation function, efficiently addressing the difficulties posed by occlusion, high point density, complex terrain, and uneven scales in forested environments.
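The energy function at the heart of such graph-based pipelines typically combines a data-fidelity term with a penalty on label disagreements along graph edges (the Potts-style objective that cut pursuit minimizes). The following is a generic illustration of evaluating such an energy, not the pointDMM implementation:

```python
import numpy as np

def potts_energy(labels, features, edges, lam=1.0):
    """Energy of the general form minimized by graph-based point-cloud
    segmentation: a data term pulling each point's label value toward
    its observed feature, plus a pairwise term charging `lam` for every
    graph edge whose endpoints carry different labels."""
    data = np.sum((features - labels) ** 2)              # unary fit
    cut = sum(labels[i] != labels[j] for i, j in edges)  # boundary cost
    return data + lam * cut

# Toy 1-D "point cloud": six points with a jump in the feature values,
# chained by nearest-neighbour edges.
feats = np.array([0.1, 0.0, 0.2, 1.0, 0.9, 1.1])
edges = [(i, i + 1) for i in range(5)]
good = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # one boundary edge
bad = np.zeros(6)                                 # no boundary, poor fit
print(potts_energy(good, feats, edges) < potts_energy(bad, feats, edges))  # True
```

Minimizing this trade-off yields a piecewise-constant labeling whose boundaries fall where the point features genuinely change, which is what makes the pre-segmentation robust to occlusion and density variation.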
The study reports notable results, including 93% accuracy on the DMM-3 dataset, a 21% improvement in live-tree recognition accuracy over existing techniques. This improvement demonstrates how well the pointDMM approach handles the complex and varied features found in forestry point cloud data. The method is particularly strong at extracting feature information from artificial-forest point clouds generated by terrestrial laser scanning (TLS), underscoring its potential to advance technology, intelligence, and informatization in forestry.
However, the code used in this study is not publicly available, so it is unclear whether the implementation details can support further research or practical application. In conclusion, this study presents a strong ground-based LiDAR point-cloud semantic segmentation method based on pointDMM and shows appreciable gains in segmentation precision and feature extraction capabilities, although the availability of the underlying code remains unknown.
In another study, Zhang, Li et al. (2022) analyze the effects of the SE Block and RAM on semantic segmentation performance by contrasting three network variants: one without the SE Block and RAM, one with only the SE Block, and the proposed SERNet. The SE Block improves the mean Intersection over Union (mIoU) by 1.49%, the average F1 score (AF) by 1.29%, and the Overall Accuracy (OA) by 1.40%, boosting feature representation and segmentation accuracy, particularly for the “Surface” and “Car” categories. RAM raises the mIoU by 0.31%, the AF by 0.41%, and the OA by 0.41%, a smaller gain owing to its focus on global information. The evaluation used the ISPRS Vaihingen and Potsdam datasets, which include DSM (Digital Surface Model) and IRRG (Infrared, Red, Green) images [42].
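The SE Block at the core of SERNet performs channel-wise recalibration: each feature map is squeezed to a scalar by global average pooling, the resulting channel vector passes through a small bottleneck, and sigmoid gates rescale the maps. A minimal NumPy sketch, with random weights standing in for learned ones:

```python
import numpy as np

def se_block(feat, w1, w2):
    """Squeeze-and-Excitation recalibration over a (C, H, W) tensor:
    squeeze channels by global average pooling, excite them through a
    small bottleneck MLP, and rescale each map by its sigmoid gate."""
    s = feat.mean(axis=(1, 2))                 # squeeze: (C,)
    z = np.maximum(0.0, w1 @ s)                # reduction + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))     # restore + sigmoid: (C,)
    return feat * gate[:, None, None]          # channel-wise reweighting

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))              # C=8 feature maps
w1 = rng.normal(size=(2, 8))                   # bottleneck, ratio r=4
w2 = rng.normal(size=(8, 2))
out = se_block(feat, w1, w2)
print(out.shape)  # (8, 4, 4): shape preserved, channels rescaled
```

Because the gates lie in (0, 1), the block can only suppress or retain channels, which is how it sharpens feature representation at a modest parameter cost.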
TensorFlow and PyTorch are the frameworks commonly used in this kind of research. The findings show that the proposed SERNet model achieves higher segmentation accuracy when DSM data is included, especially for vegetation categories. The study does, however, acknowledge certain limitations, including possible feature redundancy and adverse mutual interference arising from the straightforward fusion method used to combine DSM and IRRG data. Furthermore, SERNet's large number of parameters increases its computational overhead.
Overall, the study emphasizes the importance of recalibrating features and transferring information across the network to improve semantic segmentation accuracy, especially for high-resolution remote sensing images (HRRSIs). Although the article presents encouraging findings, it does not state where the code is available for replication or further research.
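The benchmark metrics used throughout this comparison, OA and mIoU, are both computed from the confusion matrix of predicted versus ground-truth labels. A minimal sketch (the helper name and toy labels are ours, for illustration only):

```python
import numpy as np

def seg_metrics(pred, gt, n_cls):
    """Overall Accuracy (OA) and mean Intersection-over-Union (mIoU)
    from flat integer label arrays, via the confusion matrix."""
    cm = np.zeros((n_cls, n_cls), dtype=int)
    np.add.at(cm, (gt, pred), 1)          # cm[i, j]: gt class i, pred j
    oa = np.trace(cm) / cm.sum()          # fraction of correct pixels
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter # pred + gt - overlap, per class
    miou = np.mean(inter / union)
    return oa, miou

gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
oa, miou = seg_metrics(pred, gt, 3)
print(round(oa, 3), round(miou, 3))  # 0.667 0.5
```

mIoU averages per-class overlap, so it penalizes errors on rare classes that OA hides, which is why segmentation papers report both.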