A Simple Yet Effective Approach of Building Footprint Extraction in Indonesia

Topographic mapping using stereo plotting is not effective because it is time-consuming and labor-intensive. This research was therefore conducted to find an effective way to extract building footprints for mapping acceleration. The building extraction method in this process comprises four steps: ground/non-ground filtering, building classification, segmentation, and building extraction. Non-ground points from the filtering process were classified as building using an algorithm based on multi-scale local dimensionality that separates points at the plane of maximal separability. Segmentation using segment growing was used to separate each building, so that edge detection could be conducted per segment to create the boundary of each building. Lastly, building extraction was conducted in three steps: edge point detection, building delineation, and building regularization. With 10 samples and a step of 0.5, classification yielded a quality of 0.597 and a miss factor of 0.524. The segmentation process improved the quality to 0.604, while the miss factor worsened to 0.561. Meanwhile, the average shape index of the extracted buildings differed from the reference by 0.02, and the error rate for line segment comparison was 30%. Regarding positional accuracy using centroid-based assessment, this method produced an RMSE of 1.169 meters.


Introduction
The availability of high-quality geospatial data, including building footprints as part of the digital topographic map, is necessary for broader purposes, including 3D city models [1], [2] and spatial planning. In Indonesia, spatial planning is of particular concern because the limited availability of topographic data inhibits spatial planning in many regions: spatial planning in Indonesia requires the topographic map as the reference for the base map.
Traditionally, topographic data production uses a stereo plotting process based on photogrammetric techniques. However, manual stereo plotting is time-consuming and labor-intensive, so it is not effective for large scale topographic mapping of the whole area of Indonesia. Due to the high demand for large scale topographic maps, mapping must be accelerated, and innovations in large scale mapping methods should therefore be explored.
Automation of map feature extraction is one solution for mapping acceleration. Automation reduces the amount of time and human resources required, thereby increasing the speed of topographic map production. The map feature evaluated for extraction in this research is the building. The need for highly accurate building footprints can be fulfilled by manual digitization, but this is very time-consuming. Compared to other map features, the building is one of the most complex, and it is not easy to develop an algorithm for automatic building extraction. Moreover, most buildings in Indonesia are very dense and close to each other, and many have irregular shapes, which makes this research more challenging.
Research on automatic building extraction is not new; many effective and accurate methods have been developed. Various base data from aerial photos, LiDAR, satellite imagery, or combinations of them have been examined in order to find the best method for building extraction. Recent approaches are based on deep learning, and many studies on deep learning for building extraction have been conducted, for example using neural networks and Markov Random Fields (MRF) [3], an end-to-end trainable gated residual refinement network (GRRNet) [4], Fully Convolutional Networks [5], Convolutional Neural Networks [6], Deep Convolutional Networks [7], [8], or Deep Fully Convolutional Networks [9]. Other methods such as supervised and unsupervised segmentation are still used and developed, for instance by [10], which compared supervised and unsupervised multiresolution segmentation for building extraction.
The development of LiDAR created a new approach for automatic map feature extraction. LiDAR is used not only for DEM extraction but also for detecting planimetric features such as buildings and roads. Commonly, building extraction from LiDAR starts from point cloud classification into a building (or building roof) class. Many algorithms have been developed for building classification from LiDAR point clouds, for instance using input attributes from a DSM and an ortho image [11], a supervised parametric classification algorithm [12], the theory of Dempster-Shafer in two different stages of the classification process [13], the geometry and echo information of the LiDAR point cloud to estimate parameters for the learning model [14], and a multiple-entity based classification method [15]. Regarding building extraction itself, many studies have proposed alternative methods, for example OBIA (Object Based Image Analysis) with class modelling [16], data fusion of point- and grid-based features using a graph cuts algorithm [17], a novel ordered points-aided Hough Transform (OHT) [18], a modified LEGION segmentation [19], an automatic process with emphasis on top-down approaches [20], and rule-based segmentation of non-ground LiDAR point clouds [21].
Combination and integration of both raster and point-based data is another option. [22] developed an automatic building detection algorithm using LiDAR data and multispectral imagery, combining mask creation from LiDAR to extract line segments with NDVI from imagery to remove line segments around trees. [23] proposed a method for automatic 3D roof extraction through an integration of LiDAR and multispectral imagery, in which segmented non-ground points were used to extract the roof planes, while image pixels classified as roof edge and roof ridge were used as baselines to locate the nearby LiDAR points of the neighboring planes and extract the complete roof plane. [24] integrated LiDAR and optical multi-view images by incorporating the segmented roof points and 2D lines extracted from aerial images to reconstruct 3D roof models.
Combination of raster image and LiDAR point cloud data is not only applicable to detecting building footprints but is also good for 3D building model extraction. 3D modeling from both images and LiDAR has been an active research area in the photogrammetry, computer vision, and computer graphics communities [25]. 3D building modeling from LiDAR has been developed, for instance, by [26], who developed a reconstruction approach to produce 3D building models at LOD2; [27], who built a new framework to create compact building models automatically; and [28], who proposed a fast and automatic method to create 3D watertight building models. Specifically using Indonesian data, [29] built building models by fitting planes to sets of non-ground points. As with building footprint extraction, a combination of raster and point cloud data can also generate 3D building models, for example by integrating multi-view aerial imagery and LiDAR data to reconstruct 3D building models [24] or integrating multi-source point clouds and oblique remote sensing imagery for accurate reconstruction of LoD3 building models [30]. Thus, LiDAR, or LiDAR combined with imagery, has been used beyond building footprint extraction, namely for 3D building modeling or even city modeling.
The purpose of this research was to find the best approach to extract building footprints automatically for topographic mapping acceleration in Indonesia. Although this research did not use recent technologies such as deep learning or machine learning, it was effective enough to create accurate building geometries. Moreover, what is most needed in Indonesia is a simple but effective approach to extract topographic map features, since completing the topographic map series for a country as large as Indonesia is still challenging. This research is expected to make the topographic mapping workflow in Indonesia cheaper, faster, and more effective.

Materials and Methods
This research used LiDAR data acquired in 2016 (for building footprint extraction) and the topographic map of Indonesia (RBI5K) mapped in 2017 (as a reference dataset) in the Lombok region, West Nusa Tenggara province, Indonesia. More specifically, the research was conducted in Sayang-sayang village, located in Cakranegara district, Mataram City, Lombok (Figure 1). This area was chosen because it represents various building characteristics, containing both dense and sparse building groups, so that the analysis could be conducted based on building density. Regarding the data specification, the density of the point cloud was 4 ppm (points per square meter) or higher, meaning that each square meter contained at least four points. This specification is suitable for Indonesia's topographic mapping at 1:5,000 scale.

Ground / Non-ground Filtering
Ground / non-ground filtering was one of the important steps in this research because it affects the building classification result. Ground and non-ground classification is useful not only for DTM extraction but also for identifying buildings because it reduces the number of points [31]. Accordingly, the next step, classifying point clouds into vegetation and building classes, was conducted on non-ground points only. Therefore, the separation of the point cloud into ground and non-ground classes determined the quality of the building classification.
Classification into ground and non-ground classes was performed using the adaptive TIN surfaces algorithm (Figure 2) in Terrasolid software. This algorithm consists of four steps: calculating initial parameters using all data, selecting seed points, iteratively densifying the TIN, and continuing until all points are classified as ground or object [32]. Inputs such as maximal building size, terrain angle, iteration angle, and iteration distance were crucial; the values used were 100 meters for maximal building size, 88.0 degrees for terrain angle, 5.00 degrees to plane for iteration angle, and 1.40 meters for iteration distance. Maximal building size defines the area assumed to contain at least one initial ground point, terrain angle is the steepest allowed angle between points in the generated TIN, the iteration angle depends on the terrain condition (a smaller iteration angle is better for flat terrain and vice versa), and iteration distance is the maximum distance between a point and a triangle [33].
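The adaptive TIN filter itself is proprietary to Terrasolid, but its logic can be illustrated with a simplified sketch. The following Python fragment is an illustration only, not the Terrasolid implementation; the function and parameter names are ours. It selects the lowest point per grid cell as a seed and then iteratively densifies a Delaunay TIN using the iteration distance and iteration angle thresholds described above:

```python
import numpy as np
from scipy.spatial import Delaunay

def ground_filter_tin(pts, cell=100.0, max_dist=1.4, max_angle_deg=5.0, iters=5):
    """Simplified progressive-TIN ground filter.

    cell          ~ maximal building size (seed grid spacing, m)
    max_dist      ~ iteration distance (point-to-triangle, m)
    max_angle_deg ~ iteration angle (degrees to plane)
    """
    pts = np.asarray(pts, float)
    # 1) seeds: the lowest point in every cell x cell grid cell
    keys = np.floor(pts[:, :2] / cell).astype(int)
    seeds = {}
    for i, k in enumerate(map(tuple, keys)):
        if k not in seeds or pts[i, 2] < pts[seeds[k], 2]:
            seeds[k] = i
    ground = set(seeds.values())
    # 2) iterative densification of the TIN built on current ground points
    for _ in range(iters):
        g = sorted(ground)
        if len(g) < 4:
            break
        tin = Delaunay(pts[g][:, :2])
        added = False
        for i in range(len(pts)):
            if i in ground:
                continue
            t = int(tin.find_simplex(pts[i, :2][None, :])[0])
            if t < 0:
                continue  # outside the current TIN
            tri = pts[np.asarray(g)[tin.simplices[t]]]
            n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
            norm = np.linalg.norm(n)
            if norm < 1e-12:
                continue  # degenerate triangle
            d = abs(np.dot(pts[i] - tri[0], n / norm))      # distance to plane
            edge = min(np.linalg.norm(pts[i] - v) for v in tri)
            ang = np.degrees(np.arcsin(min(1.0, d / max(edge, 1e-9))))
            if d <= max_dist and ang <= max_angle_deg:      # accept as ground
                ground.add(i)
                added = True
        if not added:
            break
    return np.array(sorted(ground))
```

On flat terrain with raised roof points, the roof points fail the distance test and remain non-ground, which is the behavior the parameter choices above aim for.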

Building Roof Classification
Building roof classification used the algorithm developed by [34]. It was based on a multi-scale local dimensionality to separate points into two classes at the plane of maximal separability. The classification considered how the cloud geometrically looked within a ball at a given scale, then the best combination of scales at which dimensionality was measured. The diameter of the ball was defined as the scale, with the point of interest located at the center, as shown in Figure 3. To determine whether the cloud appeared locally at a given scale as 1D, 2D, or 3D, the proportion of variance accounted for by the eigenvalues resulting from Principal Component Analysis (PCA) was used, with equation (1) defining the proportion of variance [34]:

p_i = λ_i / (λ_1 + λ_2 + λ_3), i = 1 … 3, (1)

where λ_1 ≥ λ_2 ≥ λ_3 are the eigenvalues ordered by decreasing magnitude. The number of eigenvalues that accounted for the total variance in the neighborhood ball determined whether the points were distributed one-, two-, or three-dimensionally around the reference scene point.
The algorithm used a linear Support Vector Machine (SVM) created by the CANUPO training plugin in CloudCompare software. A linear classifier proposes one solution in the form of the hyperplane that best discriminates and separates the two classes (e.g. vegetation against building). It uses a weight vector w and a bias b, in which the hyperplane is defined by the equation w^T x - b = 0. w and b were used to project the feature space onto the hyperplane, and the distance to the hyperplane d1 = w1^T x - b1 was calculated for each point. The second distance d2 = w2^T x - b2 was obtained by repeating the process to get the second-best direction orthogonal to the first; both distances (d1, d2) were then used as coordinates in the 2D plane of maximal separability.
The plane of maximal class separability kept only two main components, so in this research point clouds were separated into two categories: building and vegetation. For this classification step, we first created training areas representing both categories. Then, parameter values were chosen to create the training data, including the minimum and maximum scales that bound the multi-scale dimensionality. The unit of the scale follows the unit of the point cloud. In this research, the chosen scale range was 2-10, because building lengths in the research area, a settlement area, were considered to be between 2 and 10 meters. Another parameter was the step, defined as the distance increment from the minimum to the maximum scale used to build the multi-scale features. A small step means more features; a larger step means fewer features. In this research, the chosen step was 0.5 meters because the density of the LiDAR point cloud was 4 points per square meter.
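As an illustration of the multi-scale dimensionality features (a sketch following the description above, not the CANUPO source code), the proportions of variance p1, p2, p3 can be computed per scale from the PCA eigenvalues of the neighborhood ball:

```python
import numpy as np

def dimensionality_features(cloud, query, scales):
    """For each scale (ball diameter), PCA of the points inside the ball
    centred on `query` yields eigenvalues l1 >= l2 >= l3; the proportions
    p_i = l_i / (l1 + l2 + l3) describe whether the cloud looks locally
    1D (p1 ~ 1), 2D (p3 ~ 0), or 3D at that scale."""
    cloud = np.asarray(cloud, float)
    query = np.asarray(query, float)
    feats = []
    for s in scales:
        nbrs = cloud[np.linalg.norm(cloud - query, axis=1) <= s / 2.0]
        if len(nbrs) < 3:
            feats.extend([1.0, 0.0, 0.0])   # too few points: call it 1D
            continue
        lam = np.sort(np.linalg.eigvalsh(np.cov(nbrs.T)))[::-1]
        feats.extend((lam / lam.sum()).tolist())
    return np.array(feats)
```

At classification time, these multi-scale proportions form the feature vector x that the linear classifier separates via d1 = w1^T x - b1 and d2 = w2^T x - b2.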
In short, the whole classification workflow is summarized in Figure 4. It started from ground and non-ground classification; the non-ground points were then classified into building and vegetation points. All classifications here were binary. The building points were then used in the next stage, namely segmentation.

Segmentation
Segmentation of a point cloud is used to distinguish groups of points based on feature similarity. In order to improve classification results, [35] used segment-based features rather than traditional point-based features for point-based classification; the segment-based features led to more accurate classification. In our research, segmentation was used to separate each building, so that edge detection could be conducted per segment to create the boundary of each building. The data used in this stage was the previous classification result, namely the classified building points.
The segment growing algorithm [35] was chosen for the segmentation process; we used the Point Cloud Mapper software for its implementation. In this algorithm, the test for accepting neighboring points as extensions of a segment is based on the similarity of feature values and on normal vectors scaled by the planarity to distinguish between surfaces with different orientations [35]. Seed points with similar feature values were extended and merged with their neighboring points if their values were close to the average feature values of each segment. For instance, [36] used such segment growing based on echo widths of full waveforms to separate vegetation from non-vegetation. The combination with the normal vector direction scaled by planarity was used to distinguish between surfaces with different orientations, in which, when λ1 ≥ λ2 ≥ λ3 are the eigenvalues of the covariance matrix, the planarity can be expressed as (λ2 − λ3) / λ1. Segmentation was also used to remove small segments and unsegmented points, thereby eliminating building errors in the building extraction process. Unsegmented points, colored white in Figure 5 and having no segment number, were points without sufficiently many nearby similar points to start a new seed [35].
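The growing logic can be sketched as follows. This is an illustrative re-implementation, not the Point Cloud Mapper code; the radius, angle threshold, and minimum segment size are assumed parameters, and for simplicity it compares normals directly rather than the planarity-scaled normals of [35]:

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_growing(points, normals, radius=1.5, max_angle_deg=10.0, min_size=10):
    """Grow segments from seed points, accepting neighbours within `radius`
    whose normals deviate by less than `max_angle_deg`; segments smaller
    than `min_size` are discarded and their points left unsegmented (-1)."""
    points = np.asarray(points, float)
    tree = cKDTree(points)
    labels = np.full(len(points), -1)
    cos_t = np.cos(np.radians(max_angle_deg))
    next_id = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        members, stack = [seed], [seed]
        labels[seed] = next_id
        while stack:
            i = stack.pop()
            for j in tree.query_ball_point(points[i], radius):
                if labels[j] == -1 and abs(np.dot(normals[i], normals[j])) >= cos_t:
                    labels[j] = next_id
                    stack.append(j)
                    members.append(j)
        if len(members) < min_size:
            labels[members] = -2           # too small: reject
        else:
            next_id += 1
    labels[labels == -2] = -1              # rejected points stay unsegmented
    return labels
```

Two coplanar roofs separated by more than the radius end up in different segments, while isolated clusters smaller than `min_size` stay unsegmented, mirroring the white points in Figure 5.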

Building Footprint Extraction
The building extraction process started from boundary detection for each segment to create a delineation of the coarse building footprint from the edge points. After segmented points were obtained using segment growing, the edge points of each segment were extracted using the boundary function in MATLAB, before delineations of the coarse building boundaries were created from those edge points (Figure 6a, 6b, and 6c). The boundary point detection algorithm returns points that represent a boundary around the building roof points; unlike the convex hull, the boundary can shrink towards the interior of the hull to envelop the points (https://www.mathworks.com/help/matlab/ref/boundary.html). Then, the points of each segment were connected with lines using the Points To Line tool to produce raw building footprints. Lastly, to produce smooth and sharp building footprints, regularization of the building boundaries was conducted using the building regularization tool that applies the algorithm developed by [37] (Figure 6d). The edge delineation and building footprint regularization were performed in ArcGIS Pro software (https://www.esri.com/en-us/arcgis/products/arcgis-pro/resources).
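MATLAB's `boundary` function is essentially an alpha-shape; a rough Python equivalent (a sketch only, using an assumed `alpha` circumradius parameter rather than MATLAB's shrink factor) can be built from a Delaunay triangulation:

```python
import numpy as np
from scipy.spatial import Delaunay

def boundary_points(xy, alpha):
    """Indices of the concave (alpha-shape) boundary of a 2D point set.
    Triangles whose circumradius exceeds `alpha` are dropped, so unlike the
    convex hull the boundary can shrink towards the interior; edges kept by
    exactly one surviving triangle form the boundary."""
    xy = np.asarray(xy, float)
    edges = {}
    for s in Delaunay(xy).simplices:
        a, b, c = xy[s]
        area = abs((b[0] - a[0]) * (c[1] - a[1])
                   - (b[1] - a[1]) * (c[0] - a[0])) / 2.0
        if area < 1e-12:
            continue                               # degenerate triangle
        la = np.linalg.norm(b - c)
        lb = np.linalg.norm(a - c)
        lc = np.linalg.norm(a - b)
        if la * lb * lc / (4.0 * area) > alpha:    # circumradius too large
            continue
        for e in ((s[0], s[1]), (s[1], s[2]), (s[0], s[2])):
            e = tuple(sorted(e))
            edges[e] = edges.get(e, 0) + 1
    return sorted({i for e, n in edges.items() if n == 1 for i in e})
```

Connecting the returned points in order around each segment gives the raw footprint that regularization then squares up.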
In order to achieve good geometry in building regularization, the reconstruction of coarse buildings required support for 90° and 45° angles. The algorithm is also suitable for arcs, in which an arc passing through the considered locations differs from a straight segment by the need to define a radius [37]. Building regularization required setting a tolerance parameter describing the smallest segment length in the output features; it was set to 1 meter. This value was based on experiments with several values, considering the difference between the number of segments in the extracted and reference data, so that extracted objects could be assumed to have a similar level of detail to reference objects.

Results and Discussion

Figure 7 shows the result of the filtering process, color-coded by elevation. Figure 7a shows all points in the research area, while Figure 7b depicts the result of the ground and non-ground filtering process over an orthophoto background. As can be seen in Figure 7b, ground points were removed; in the paddy field area, for example, an open area with no buildings, almost all points were removed. Then, an analysis of the building classification was conducted. Figure 8 shows the distribution of the training areas for both categories, building and vegetation. Red indicates building areas, while blue shows vegetation areas. Each category had 10 objects as training areas, obtained from manual classification by human interpretation and well distributed over the whole research area. It should be noted that even though the number of samples was small, the classification performed well, as can be seen in the result. The statistics showed that the balanced accuracy (ba) was very good, reaching 91.6%, meaning that the algorithm could recognize both building and vegetation well (Figure 9). However, the Fisher discriminant ratio (fdr) did not indicate good separation of those classes because it was only 3.811.
To test the effect of the number of samples and the step parameter on the ba and fdr results, experiments with various parameters were conducted. We used 10, 15, 20, 25, and 30 samples, and also changed the step parameter between 0.5 and 1; the results are shown in Table 1. It can be seen that adding samples and changing the step parameter did not improve the ba and fdr results much. In conclusion, it was still difficult to increase the fdr value above 10, as produced by [34]. Unlike our work, which used airborne laser scanning, [34] used the much higher point density of terrestrial laser scanning; it was therefore expected that our work would achieve a lower fdr value than [34].

Table 1. The ba and fdr values for various numbers of samples and step parameters.

Besides comparing the ba and fdr values, an analysis of classification accuracy was conducted using matched rates analysis, consisting of completeness, correctness, quality, and miss factor, defined by equations (4)-(7) [38]:

completeness = TP / (TP + FN) (4)
correctness = TP / (TP + FP) (5)
quality = TP / (TP + FP + FN) (6)
miss factor = FN / TP (7)

where TP, FP, and FN are the numbers of true positive, false positive, and false negative points. Originally, these equations were used to assess extracted buildings rather than classified building points, but in this research they were adapted to examine the accuracy of the building classification result. The reference points here were non-ground points that overlapped with building geometries from the topographic map, which were obtained from manual stereo plotting. It was assumed that those points were the correct building points and that there was no geometric offset between the point cloud and the topographic map. The result of the matched rates analysis is shown in Table 2. With 10 samples and step 0.5, the classification produced the highest correctness but at the same time the lowest completeness and quality compared to the other parameter settings. The best result was produced using 25 samples and step 1, with a quality of 0.666 and a miss factor of 0.263. However, since the purpose of this research was to find a simple and effective method for building extraction, the smallest number of samples was chosen. Meanwhile, the step used was 0.5: with a point density of 4 points per square meter, the distance between two points was assumed to be at least 0.5 meter.
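The matched-rates metrics are straightforward to compute from point counts; a minimal sketch (TP, FP, FN denote the true positive, false positive, and false negative building points):

```python
def matched_rates(tp, fp, fn):
    """Matched-rates metrics of eqs. (4)-(7): completeness, correctness,
    quality, and miss factor from point counts."""
    return {
        "completeness": tp / (tp + fn),   # share of reference points found
        "correctness": tp / (tp + fp),    # share of detected points correct
        "quality": tp / (tp + fp + fn),   # combined measure
        "miss_factor": fn / tp,           # missed relative to detected
    }
```

Note that quality penalizes both false positives and false negatives, which is why it moves differently from correctness when segmentation removes points.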

In the next step, the training data was applied to classify the whole research area. This area was covered by buildings (with various characteristics: big, small, regular, and irregular), vegetation with variable canopy diameters, and open areas such as paddy fields and roads. Generally, the buildings had similar shapes and geometries because most of them were in settlement and housing areas. Figure 10 shows the classification result.

Figure 11 shows the segmentation result from the Point Cloud Mapper software, in which each color indicates a different segment. The segment growing result showed that the point cloud was segmented into individual buildings, so the segmentation result could be treated as building separation. This process was also used to remove small segments and unsegmented points, so small and unidentified objects that would be classified as errors were eliminated here. Compared to the classification result, segmentation increased the quality of the building points based on the assessment method in the previous section: the quality rose from 0.597 to 0.604, while the correctness improved from 0.869 to 0.913. Nevertheless, the completeness and miss factor worsened: completeness was reduced from 0.656 to 0.641, while the miss factor increased from 0.524 to 0.561. This was because more points were eliminated in this stage, so the percentage of missing objects was higher.

Segmentation Result
Building separation from the segmentation process was compared to building polygons from the topographic map at 1:5,000 scale. In general, segmentation produced good results for sparse buildings (Figure 12a), but the segmentation error was larger where buildings were denser, because many buildings that were very close to each other were merged into one segment (Figure 12b).

Visual Analysis
Visual analysis was conducted by comparing the extracted buildings with the orthophoto. In general, the errors from visual analysis can be divided into five categories:

1. Extracted buildings were generated in non-building areas (visually checked on the orthophoto)

Figure 13a shows an example of this category, where extracted buildings were located in a non-building area. Based on the orthophoto, the object was in fact a vegetated area. This was caused by a classification error, where some non-ground points were classified as building in a non-building area (Figure 13b). This error may be corrected by changing the classification parameters.

2. Extracted buildings were not generated in building areas (visually checked on the orthophoto)
This category is the opposite of the previous one: true buildings visible on the orthophoto were not generated among the extracted buildings. It can be seen in Figure 14a that some buildings were not generated. Figure 14b shows that those buildings had only a small number of points, so it was not possible to create building boundaries from them. Such points could also be removed during the segmentation process, increasing the chance of them not being extracted during building delineation. This may be corrected by changing the classification parameters; otherwise, delineating these building boundaries would need further effort using another method, such as manual digitization.

3. Extracted buildings had a different orientation from the true building (visually checked on the orthophoto)
Another error occurred when extracted buildings had a different orientation compared to the true building seen on the orthophoto. Even though this error was much less likely than the previous ones, it was still possible, as depicted in Figure 15a. This kind of error could be caused by the building regularization algorithm, which assumes building geometries have 90° or 45° angles, or by errors during classification and segmentation. One solution would be to improve the segmentation result, but this had not yet been done in this research.

Statistical Analysis
A number of related data samples were taken to assess the result of building extraction from LiDAR. Figure 16 illustrates the result of building extraction depicted as building footprints, where the red polygons are building features from LiDAR and the blue ones are features from the reference dataset. It can be seen that the result of building extraction had a similar pattern to the reference map. However, some errors were found, e.g. buildings that were split into several buildings. In addition, several small buildings were missing or not well extracted, and vice versa. Also, several building footprints were extracted falsely in non-building positions.

Three methods were used to assess the geometry of the extracted building footprints: line segment comparison, shape index similarity [39]-[41], and edge point correspondence [39], [42]. The line segment comparison evaluated the number of sections of every line resulting from the samples against the reference dataset. Statistically, the difference in the number of line segments between both datasets was computed and expressed as a percentage indicating the level of error. The matrix of the line segment comparison is shown in Table 3.

Shape index similarity was used to quantify the shape ratio between two objects. This method expresses a ratio of area to the length of the circumference [40]. Each sample of an extracted building was treated as a polygon, for which the shape index relates the perimeter to the enclosed area. A shape index of 1 means the polygon is circular, while values higher than 1 indicate an increasingly bar-shaped polygon; a square has an index of approximately 1.13. The shape index can be expressed as equation (8):

Di = pi / (2 √(π A)) (8)

where Di is the shape index, pi is the perimeter (m), and A is the area (m²).

Table 3. Comparison of segment number between samples of the reference dataset and the building extraction data.
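Equation (8) is easy to verify numerically; a minimal sketch:

```python
import math

def shape_index(perimeter, area):
    """Shape index D = p / (2 * sqrt(pi * A)): 1.0 for a circle,
    about 1.13 for a square, larger for elongated (bar-shaped) polygons."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))
```

For a circle of radius r, `shape_index(2 * math.pi * r, math.pi * r ** 2)` is exactly 1, and for a unit square `shape_index(4.0, 1.0)` is 2/√π ≈ 1.13, matching the values quoted above.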
Table 3 shows the number of line segments in each data sample, arranged row-wise by sample ID, number of segments in the reference dataset, and number of segments in the extracted data. The number of segments of the extracted data is shown in four columns corresponding to the tolerance interval in the building regularization process: 1 m, 1.5 m, 2 m, and 2.5 m, respectively. Comparing the extracted data to the reference dataset gave the maximum number of errors in each sample; the error is expressed as a percentage, illustrated in Figure 17a. According to the graph, the maximum error was found in sample N, whose extracted line segments deviated most from the reference. The most suitable tolerance value in the regularization process was 1 m. Overall, the average error was 30%, or 0.3, meaning the line segments of this building extraction are acceptable. The assessment of shape similarity produced an index value for every sample; the results are shown in Figure 17b. Between the extracted data and the reference dataset, the index difference was mostly low: on average, the samples differed by 0.02 in shape index value. Among all the data, sample S generated the highest shape index difference, quantitatively 0.3. Thus, the extracted buildings had nearly similar shapes to the reference data, in particular for building edge features.
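The line-segment error above reduces to a one-line helper (our naming; the paper reports it as a percentage per sample):

```python
def segment_count_error(n_ref, n_ext):
    """Relative difference between the number of boundary line segments in
    the reference polygon and in the extracted polygon."""
    return abs(n_ext - n_ref) / n_ref
```

For example, a reference polygon with 10 segments extracted with 13 segments gives an error of 0.3, i.e. the 30% average level reported above.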
The last test was positional accuracy using the Root Mean Square Error (RMSE). Accuracy was assessed by comparing centroid positions between extracted and reference sample objects. The centroid was used because the Euclidean distance between the centers of mass of an extracted object and the reference object can serve as a positional accuracy metric besides the corner points of buildings [38]. From those buildings, centroid coordinates were calculated on both datasets as shown in Table 4, and the RMSE was then obtained from equation (9):

RMSE = √( (1/n) Σ [ (x_ext − x_ref)² + (y_ext − y_ref)² ] ) (9)
where n is the number of points. The relationship between the centroid coordinates in both datasets can be seen in Figure 18. Because the distances between the coordinates of the extracted and reference objects were small, the graph shows positive correlations. This means that, based on the centroid positional accuracy assessment, the method produced good geometric accuracy.
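Equation (9) applied to polygon centroids can be sketched as follows (shoelace-formula centroid; the pairing of extracted and reference polygons is assumed given):

```python
import numpy as np

def polygon_centroid(xy):
    """Area-weighted centroid of a simple polygon (shoelace-based)."""
    x, y = np.asarray(xy, float).T
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    a = cross.sum() / 2.0                      # signed area
    cx = ((x + xn) * cross).sum() / (6.0 * a)
    cy = ((y + yn) * cross).sum() / (6.0 * a)
    return np.array([cx, cy])

def centroid_rmse(extracted, reference):
    """RMSE of Euclidean distances between paired building centroids (eq. 9)."""
    d2 = [np.sum((polygon_centroid(e) - polygon_centroid(r)) ** 2)
          for e, r in zip(extracted, reference)]
    return float(np.sqrt(np.mean(d2)))
```

The area-weighted centroid is preferable to a plain vertex average because regularization can change the vertex count without moving the building's center of mass.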

Time Processing Analysis
Our work was conducted on a laptop with an Intel(R) Core(TM) i7-7500U processor, 8.00 GB of RAM, and a 64-bit operating system. The result of this analysis can thus serve as a practicality evaluation for that hardware specification; the process could be faster on a higher computer specification. Processing time was analyzed for each step, namely ground/non-ground classification, building classification, segmentation, and building footprint extraction, for an area of 2.478 km². Based on the existing topographic map, this area contains 3,637 buildings. The result of the processing time analysis is presented in Table 5. The result shows that this simple method could complete the whole process very quickly. While manual editing or digitization was still necessary, it was only needed in a few areas, since more than half of the entire data was statistically satisfying. The processing time analysis indicated that our method ran fast and, although simple, produced good results effectively.

Conclusions
This research aimed to support mapping acceleration, especially for topographic map production. It focused on automatic building extraction from LiDAR data using a combination of several methods, namely classification, segmentation, and building footprint extraction. Building classification used multi-scale local dimensionality to separate points into two classes at the plane of maximal separability. Building footprint extraction comprised three steps: edge point detection, building delineation, and building regularization. With 10 samples and a step of 0.5, classification yielded a quality of 0.597 and a miss factor of 0.524. The segmentation process improved the quality to 0.604, while the miss factor worsened to 0.561. Meanwhile, the average shape index of the extracted buildings differed from the reference by 0.02, and the error rate for line segment comparison was 30%. Regarding positional accuracy using centroid-based assessment, this method produced an RMSE of 1.169 meters. From these statistical analyses, it can be concluded that this simple approach is effective for accelerating large scale topographic mapping in Indonesia.