An Area Partitioning Approach to the Conflation of Road Networks with Highly Different Level of Details

Abstract: A road network represents road objects in a given geographic area and their interconnections, and is an essential component of intelligent transportation systems (ITS), enabling emerging applications such as dynamic route guidance, driving assistance systems, and autonomous driving. As the digitization of geospatial information becomes prevalent, a number of road networks with a wide variety of characteristics coexist. In this paper, we present an area partitioning approach to the conflation of two road networks with a large difference in level of detail. Our approach first partitions the geographic area by the Network Voronoi Area Diagram (NVAD) of the low-detailed road network. Next, a subgraph of the high-detailed road network corresponding to a complex intersection is extracted and then aggregated into a supernode so that a high matching precision can be achieved via 1:1 node matching. To improve the matching recall, we also present a few schemes that address the problems of missing corresponding objects and representation dissimilarity between these road networks. Numerical results at Yeouido, Korea's autonomous vehicle testing site, show that our area partitioning approach can significantly improve the performance of road network matching.


Introduction
Geographic information systems (GIS) provide solutions for capturing, manipulating, analyzing, and visualizing geospatial data for many application fields, such as transportation, agriculture, and commerce [1,2]. Initially, government agencies built authoritative GIS because the construction of geospatial information requires extensive and accurate surveys of the land [3,4]. Recently, as the digitization of geospatial information has become prevalent, some portal sites and mobile service providers have constructed proprietary GIS that combine authoritative GIS, aerial photos, mobile-mapping service (MMS) data, and crowdsourcing data [5,6]. On the other hand, voluntary GIS, such as OpenStreetMap (OSM), have been constructed through the participation of volunteer users carrying GPS-enabled mobile terminals [7]. Currently, more than 7.8 million registered users all around the world contribute to the OSM [8].

A road network is a subset of GIS that focuses on road objects, attributes, and their interconnectivity. It is usually represented by a graph, where a node represents an intersection, an endpoint of a road, or a point of attribute change, whereas an edge represents a road segment connecting two nodes. The road network is an important component of many Intelligent Transportation System (ITS) applications. For example, turn-by-turn navigation establishes the shortest route connecting the origin and destination in the road network. In addition, the current road traffic information at a road segment is indexed with the corresponding identifier of the road network, and then broadcast as public transportation data, which enables novel ITS applications, such as dynamic route

characteristics to support ITS services, but the access to its raw dataset and the ITS

[18] and JOSM [19]), rendering tools (Mapnik [20] and Tirex [21]), geocoding tools (Nominatim [22]), and especially routing tools (the Open Source Routing Machine [23] and Valhalla [24]). However, it has been reported that the quality of OSM objects obtained from crowdsourcing can be diverse in terms of accuracy, completeness, and consistency [25].

Taking into account the above characteristics of road networks, we focus on the

However, in reality, there are representational dissimilarities between two datasets that originate from abstraction and generalization errors during the cartographic process. The first type of representational dissimilarity is the nonexistence of a corresponding object in the target dataset. To mitigate this problem, a new corresponding road object needs to be added to the target dataset, which is beyond the scope of this paper. The second type is the attributional dissimilarity between the corresponding objects. The third type is the LoD dissimilarity between the sets of corresponding objects representing the same road entity. The LoD dissimilarity not only affects the RNM, but also requires special treatment for adapting the RNM result to the ITS applications. To address these challenging issues, the RNM approach first needs a method to assess the attributional dissimilarity between a pair of corresponding nodes as well as to identify the set of objects involved in the LoD dissimilarity. Then, based on that, the matching between two datasets can be conducted in a non-iterative or iterative manner.
In the following sections, we introduce the existing approaches to addressing these representational dissimilarities as well as the matching procedure in the RNM problem.

objects. The semantic dissimilarity also originates from missing or incorrect attributes as well as from differences in the attribute definition. To address these dissimilarities, the common approach is to define metrics that quantify the dissimilarity of

The single object dissimilarity metrics involve the attributes of a single object, including the geometric metrics (e.g., the difference in position, length, shape, etc. [26,28,29,31-40]), the topological RNM metrics (e.g., the difference in the node degree, etc. [26,29,31,34,35,38,40]), and the semantic RNM metrics (e.g., the difference in road name, road class, etc. [26,28,35]). The difference in position, i.e., distance, is the most used metric in node matching [36] and edge matching [34,35,37-39]. While the distance in node matching is trivial, the distance between edges can be defined in several ways. The approaches in [34,35,38] exploit the Hausdorff distance, which is the maximum, over all points on the reference edge, of the minimum distance to the target edge. The sampling-based distance used in [37] is the average distance between pairs of equidistant points on both edges. A different sampling-based distance is used in [39], where the average is taken over the distances from equidistant points on the reference edge to their closest points on the target edge. The length difference is used in [26,29,33-35,38]. There are several ways to define the shape difference. In [31], it is defined as the area enclosed by the two edges after shifting and scaling them so that their endpoints overlap. In [26], the shape is defined as the cumulative angle function among the line segments along an edge, and the shape difference is the gap between the two functions. Other single object dissimilarity metrics, such as the differences in node degree, road name, and road class, are used as sub-metrics together with the distance, length difference, and shape difference metrics [26,28,29,31,34,35,38,40]. The major problem of single object dissimilarity metrics is the wrong identification of the corresponding object when the attributional dissimilarity between the reference object and its corresponding object is large.
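The three edge distances described above can be illustrated with a minimal Python sketch. It assumes that edges are 2D polylines given as lists of `(x, y)` tuples in a planar coordinate system and that the Hausdorff distance is approximated by sampling `k` points on the reference edge; the function names and the default `k` are illustrative choices, not taken from the cited works.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0.0:                       # degenerate segment
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
    t = max(0.0, min(1.0, t))             # clamp the foot point onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def point_polyline_distance(p, line):
    """Minimum distance from point p to any segment of the polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def resample(line, k):
    """Return k >= 2 points equally spaced by arc length along the polyline."""
    seg = [math.dist(a, b) for a, b in zip(line, line[1:])]
    total = sum(seg)
    points = []
    for i in range(k):
        target, walked = total * i / (k - 1), 0.0
        for j, s in enumerate(seg):
            if walked + s >= target or j == len(seg) - 1:
                t = 0.0 if s == 0.0 else min(1.0, (target - walked) / s)
                (ax, ay), (bx, by) = line[j], line[j + 1]
                points.append((ax + t * (bx - ax), ay + t * (by - ay)))
                break
            walked += s
    return points

def directed_hausdorff(ref, tgt, k=50):
    """Max over sampled reference points of the min distance to the target edge."""
    return max(point_polyline_distance(p, tgt) for p in resample(ref, k))

def paired_sampling_distance(ref, tgt, k=50):
    """Average distance between the i-th equidistant points of both edges."""
    pairs = zip(resample(ref, k), resample(tgt, k))
    return sum(math.dist(p, q) for p, q in pairs) / k

def closest_point_sampling_distance(ref, tgt, k=50):
    """Average distance from equidistant reference points to the target edge."""
    return sum(point_polyline_distance(p, tgt) for p in resample(ref, k)) / k
```

For two parallel edges of equal length, all three measures coincide. When the target edge covers only part of the reference edge, the Hausdorff value is dominated by the uncovered end, while the averaged measures grow more slowly.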

The multiple object dissimilarity metrics aggregate the attributional dissimilarity of the matched object as well as its nearby objects [26,28,32]. The authors in [32] proposed a cluster-based matching approach in which clusters are created by spanning from a reference node as well as from a target node. This pair of center nodes is matched by comparing the aggregated difference in the distance, length, and shape of each pair of edges in the two clusters. The authors in [28] proposed another cluster-based matching approach in which the cluster is constructed similarly to the cluster in [32]. The reference and target nodes are matched by comparing the structure of their clusters. In [26], the authors proposed the delimited-stroke-oriented algorithm to match groups of connected edges that seem to be on the same road, called delimited strokes.

On the other hand, the LoD dissimilarities occur when a different number of objects is used to represent a road entity. Compared to the attributional dissimilarities, they incur significant dissimilarity in the geometric, topological, and semantic attributes among all objects involved. Therefore, special treatment for the LoD dissimilarity is necessary to achieve correct matching. The LoD dissimilarities can be classified into the edge LoD dissimilarity and the node LoD dissimilarity.

The edge LoD dissimilarity comes from the fragmentation of a road into several concatenated edges and from the division of a road into two edges, each of which represents one traffic direction. The former case is addressed by the buffer growing method [29,33], grouping edges into roads [26], or the shortest path between nodes [31,35]. In the buffer growing algorithm [29,33], for each reference edge, the list of target edges within a given distance (buffer) is selected as matching candidates. If no edge lies entirely inside the buffer due to the edge LoD dissimilarity, the buffer is extended from the current reference edge to its adjacent edge. In this way, the set of corresponding edges with LoD dissimilarity can be grouped. The approach in [26] groups the edges into strokes containing edges in both datasets that may belong to the same road and performs matching on them. The last approach relies on node matching to match the edges by searching for the shortest path between two pairs of matched nodes [31,35]. To address the latter case of the edge LoD dissimilarity, the common solution is to detect such edges and group them into one edge [26,28].

The node LoD dissimilarity exists when a different number of nodes is used to represent an intersection. Usually, the intersection is represented by several nodes in one dataset and by a single node in the other dataset. In [28], this issue is overcome by grouping the nodes constituting the intersection into a single node. These nodes are detected by finding an area with high node density, as suggested in [27]. In [26], the LoD dissimilarity at a roundabout is addressed by detecting the related nodes through checking the circle-like shape formed by the edges connecting them. However, the existing approaches either produce many false positive, incomplete, and redundant matchings or address only specific types of intersection.

To effectively address the node LoD dissimilarity, we need to rely on the distinct geometric and topological features of the intersection with the LoD dissimilarity in the datasets rather than relying on statistical parameters such as the distance threshold [27], which are a source of incorrect grouping.

[29,38]). The iterative matching from seeds finds the matching under a strict condition, and expands the matching to the nearby objects [26,34]. The strong point of iteration from seeds is that the later matched objects partly rely on the reliable matching of the seed objects. In [34], the seed node matching is selected from the matchings with only one target node candidate that satisfies the thresholds of the node distance and the adjacent edge angle difference. In [26], the seed matching is the matching of the large strokes that satisfy the thresholds on several metrics, such as orientation, length, and shape. However, the effectiveness of this iterative matching highly depends on the condition used to select the seeds. A tight condition leads to a smaller number of seeds, while a looser condition incurs mismatching of the seeds themselves.

On the other hand, the iterative matching for consistency starts from an initial matching of two datasets and then gradually modifies it to improve the consistency among the matchings [29,31,38]. This approach relies on matching consistency checking, which examines the set of initial matchings using the dissimilarity metrics to evaluate the consistency of the matchings of nearby objects. The corresponding target object of a reference object can be changed to obtain higher consistency. In [31], the authors proposed adjacent matching consistency checking for node matching. Initially, target node candidates are matched by node degree and edge angles. The corresponding target node is decided by checking the matching consistency between the adjacent nodes of the reference node and the adjacent nodes of the target node. In [29,38], one-hop and two-hop matching consistency checking is also proposed for edge matching. However, similar to the iterative matching from seeds, the iterative matching for consistency also relies on a fixed threshold of the dissimilarity metrics to find candidates [29,31,38] or a fixed threshold of the consistency condition [31]. Therefore, both approaches may fail to effectively address the mismatching.

To address the drawbacks of the existing approaches, we need a new iterative

In this section, we describe the characteristics of the two road networks, i.e., NLM and ORN, and then formulate the RNM problem.

Aerial or inter-province road 106 Intra-province road 107 Intra-city or island road 108 Other roads

The Korean government initiated the national GIS project in 1995 and completed the construction of the geospatial database in 2009 [41]. The NLM is the road network of this database that represents major road objects in Korea [4]. It also provides a unified identifier (ID) hierarchy for its road entities. In order to efficiently exchange ITS information, Korean law requires that all ITS applications use the NLM ID hierarchy to exchange road and traffic information [17].

Table 2, road_type specifies the type of road, such as overpass, underpass,

The ORN is a subset of OSM objects with the highway tag, where a tag is an ordered pair of (key, value) identifying an attribute of a road object. Table 3 shows the highway tag

Figure 2 shows the ORN graph representation which can be modeled by undirected

the detailed road network at a complex intersection. This feature makes the ORN more suitable for ITS applications, such as navigation and autonomous driving.

In $G_O$, an ORN node $v \in V$ is connected to at least three neighbor ORN nodes.

In the RNM, NLM node $n$ is associated with ORN subgraph $G_O(n)$, where the ORN subgraph can be a single ORN intersection node, e.g., $G_O(n_l)$, disconnected subgraphs, e.g., $G_O(n_j)$ and $G_O(n_k)$, or a connected subgraph, e.g., $G_O(n_i)$ and $G_O(n_m)$, in Figure 2. If an intersection consists of a single ORN intersection node, it is called a simple intersection; otherwise, it is called a complex intersection.

The atomic unit for representing an ORN road is a way $w \in W$, which may span multiple ORN nodes [7]. If way $w$ includes more than two ORN nodes, it is decomposed into consecutive ORN edges $e \in E$ so that each edge connects only two ORN nodes. In

objects that correspond to the same road entity.

Since each road network has its own representation rules of the road network,

Given NLM subgraph $G_N(n_i)$ and the corresponding map area $A(n_i)$ around $n_i$, the first task of our area partitioning approach is to partition this area into regions, where each region is centered at an intersection in $N(n_i)$. A simple method called the Voronoi diagram (VD) partitions the map area $A(n_i)$ based on the Euclidean distance [30]. The basic idea is to associate a point $n \in A(n_i)$ with the region of the closest intersection $n_x$, called the Voronoi cell $V(n_x)$, in terms of the Euclidean distance metric:

$$V(n_x) = \{\, n \in A(n_i) : \|n - n_x\| \le \|n - n_y\|, \ \forall n_y \in N(n_i) \,\},$$

where $N(n_i) = \{n_i, n_j, n_k, n_l, n_m\}$ for NLM graph $G_N(n_i)$ in Figure 5

However, given NLM subgraph $G_N(n_i)$, the Euclidean norm is no longer a fair measure to evaluate the distance between point $n \in A(n_i)$ and the set of NLM nodes in $N(n_i)$. This is because the Euclidean distance metric does not account for the distance along the curved roads in $G_N(n_i)$. To address this problem, our area partitioning approach adopts the network Voronoi area diagram (NVAD), whose measure reflects two distance factors [30]. First, if point $n$ is on subgraph $G_N(n_i)$, the distance should be the length of the shortest path to NLM node $n_x \in N(n_i)$ in $G_N(n_i)$, called the graph distance $d_G(n, n_x)$. Second, if point $n$ lies in $A(n_i) \setminus G_N(n_i)$, the measure should also consider the projection distance $d_P(n, n_x)$ to the closest NLM link of subgraph $G_N(n_i)$. Figure 5(c) shows these distances between point $n$ and the two closest intersections $n_i$ and $n_l$. Consequently, the distance metric of NVAD is defined as the sum of these two distance components, i.e.,

$$d(n, n_x) = d_P(n, n_x) + d_G(n, n_x).$$

To determine the NLM link onto which a given point $n$ is projected, we choose an
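The NVAD distance described above, i.e., the projection distance plus the graph distance, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of the paper: every link is treated as a straight segment between two intersection nodes, and the point is projected onto the geometrically nearest link, which may differ from the link selection rule used in the paper. The names `project`, `graph_distances`, and `nvad_assign` are hypothetical.

```python
import heapq
import math

def project(p, a, b):
    """Return (distance from p to segment a-b, fraction t of the foot point from a)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0.0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy)), t

def graph_distances(adj, source):
    """Dijkstra shortest-path lengths from source over weighted adjacency lists."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def nvad_assign(p, coords, links):
    """Return (node, distance) minimizing d_P + d_G for point p.

    coords: node id -> (x, y); links: list of (u, v) node id pairs.
    """
    adj = {u: [] for u in coords}
    for u, v in links:
        w = math.dist(coords[u], coords[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Assumption: the point is projected onto its nearest link.
    d_p, t, u, v = min(
        (project(p, coords[u], coords[v]) + (u, v) for u, v in links),
        key=lambda r: r[0],
    )
    length = math.dist(coords[u], coords[v])
    from_u, from_v = graph_distances(adj, u), graph_distances(adj, v)
    best = None
    for x in coords:
        # Graph distance from the foot point, leaving the link via either endpoint.
        d_g = min(t * length + from_u.get(x, math.inf),
                  (1.0 - t) * length + from_v.get(x, math.inf))
        if best is None or d_p + d_g < best[1]:
            best = (x, d_p + d_g)
    return best
```

Evaluating `nvad_assign` on a grid of points over the map area would yield a rasterized approximation of the NVAD cells.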

The final step of the OSM-SG algorithm is to extract the intersection OSM nodes, i.e., those whose degree is no less than three in $G_I(n_i)$, from $V(n_i)$. If there is at least one intersection OSM node in $G_I(n_i)$, the OSM-SG algorithm replaces these OSM nodes with supernode $v_i^* \in V^*$ located at the center of them, as shown in Figure 7. Then, node matching is a straightforward one-to-one matching between the NLM node $n_i$ and OSM subgraph $G_I(n_i)$. The edge matching is also straightforward because among the candidate OSM

Figure 9 shows an example of the OEI scheme for correspondent-missing NLM link $l_{ij}$. In this example, both NLM endpoints of $l_{ij}$ are already matched with OSM supernodes, i.e., $M(n_i) = v_i^*$ and $M(n_j) = v_j^*$. The goal of this section is to insert an OSM edge $e_{ij}^*$ between these two supernodes that corresponds to NLM link $l_{ij}$. In general, there are three factors to be considered by the OEI scheme: 1) the displacement $\Delta_i$ between NLM node $n_i$ and supernode $v_i^*$, 2) the angle difference $\alpha$ between NLM line segment $\overline{n_i n_j}$ and

The OEI scheme first computes an orange dashed link between NLM nodes $n_i$ and $n_j$ which is equally distant from both NLM links $l_{ij}$ and $l_{ji}$.

Figure 10 shows an example of the NPME scheme for two correspondent-missing degree-2 NLM nodes $n_i$ and $n_j$ on the administrative boundary. In this case, both NLM

them. Then, the NLM node is projected on the paths and a new OSM supernode at the center of mass of the projection points is inserted (see OSM node $v_3^*$ in Figure 11(b)). Figure 10 shows two OSM supernodes $v_i^*$ and $v_j^*$ that are matched with $n_i$ and $n_j$, respectively.

for unmatched NLM node $n_i$ using the NPME scheme. For example, supernode $v_3^*$ is inserted at the center of two projection points onto the opposite OSM edges in Figure 11(b), and supernode $v_1^*$ at the projection point in the extended OSM edge in Figure 11(c). Since there is no such OSM edge for unmatched NLM node $n_2$ in Figure 11(d), supernode $v_2^*$ is overlaid on NLM node $n_2$.
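The supernode placement rules described above can be sketched in Python as follows. The sketch assumes planar `(x, y)` coordinates and straight OSM edges given as pairs of endpoints; the function names and the `extend` flag (projection onto the extended line of an edge) are illustrative and not part of the original algorithm description.

```python
def project_point(p, a, b, extend=False):
    """Foot point of p on segment a-b, or on its infinite extension if extend=True."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
    if not extend:
        t = max(0.0, min(1.0, t))         # stay on the segment
    return (a[0] + t * dx, a[1] + t * dy)

def centroid(points):
    """Center of mass of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def intersection_supernode(cell_nodes, degree, coords):
    """Center of the OSM nodes in a cell whose degree is at least three.

    Returns None when the cell contains no intersection node.
    """
    hubs = [v for v in cell_nodes if degree[v] >= 3]
    return centroid([coords[v] for v in hubs]) if hubs else None

def projected_supernode(nlm_xy, osm_edges, extend=False):
    """Center of the projections of an unmatched NLM node onto candidate OSM edges.

    With no candidate edge, the supernode is overlaid on the NLM node itself.
    """
    if not osm_edges:
        return nlm_xy
    return centroid([project_point(nlm_xy, a, b, extend) for a, b in osm_edges])
```

For example, an NLM node lying midway between two opposite carriageway edges yields a supernode on the centerline, whereas a single candidate edge yields the projection point itself.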

Once supernode $v_i^*$ is obtained, the SOSC scheme uses the OEI scheme to insert a

where supernodes $v_6^*$ and $v_8^*$ already have their own OSM edges connecting to supernode $v_3^*$. For the OSM edge between supernodes $v_3^*$ and $v_7^*$, the OEI scheme shifts, scales, and rotates the blue dotted NLM link in the middle of the two parallel NLM links $l_{37}$ and $l_{73}$. In Figure 11(c), the SOSC scheme also inserts an OSM edge between supernodes $v_1^*$ and $v_{10}^*$.
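The shift, scale, and rotate operation mentioned above can be illustrated by a similarity transform that maps the endpoints of a link onto two supernodes. The following Python sketch is one possible realization under stated assumptions, not the exact OEI procedure: links are 2D polylines in planar coordinates, and the middle link of two opposite-direction links is approximated by averaging corresponding vertices, which assumes both links have the same number of vertices. The names `midline` and `fit_link_between` are hypothetical.

```python
def midline(l_ij, l_ji):
    """Vertex-wise average of link l_ij and the reversed opposite link l_ji.

    Assumption: both polylines have the same number of vertices.
    """
    rev = l_ji[::-1]
    if len(rev) != len(l_ij):
        raise ValueError("links must have the same number of vertices")
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0)
            for (x1, y1), (x2, y2) in zip(l_ij, rev)]

def fit_link_between(link, v_start, v_end):
    """Shift, scale, and rotate a polyline so that its endpoints land on
    v_start and v_end (a similarity transform z -> a * z + b on complex numbers)."""
    z = [complex(x, y) for x, y in link]
    span = z[-1] - z[0]
    if span == 0:
        raise ValueError("link endpoints coincide")
    a = (complex(*v_end) - complex(*v_start)) / span   # rotation and scaling
    b = complex(*v_start) - a * z[0]                   # shift
    return [((a * p + b).real, (a * p + b).imag) for p in z]
```

Because a similarity transform preserves angles, the inserted edge keeps the shape of the NLM link while its endpoints coincide with the matched supernodes.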

Since there is no existing OSM edge for unmatched NLM node $n_2$ in Figure 11(d), the SOSC scheme needs to insert an OSM edge connecting to every neighbor supernode.

it is also required to add an additional OSM relation that restricts the u-turns between two dual carriage edges.

However, it is not easy to define a single intersection OSM node for connecting all OSM edges in a complex intersection due to the wide diversity of its internal structure.

Figure 12

suggest the mapping strategy between NLM attribute and OSM tag using the Korean map feature [42].

We observe that an NLM node has a valid entry for every NLM attribute, such as node_id, X, Y, node_type, road_name, and turn_p, in the NLM dataset, where X and Y correspond to the longitude and the latitude, respectively. In addition, an OSM node has a

Table 4 lists the mapping between NLM (road_rank, connect) attributes and OSM highway tag over all matched road objects by the S-RNM, where KMF represents the Korean map feature of OSM in [42], and MM-HT and SMM-HT stand for the most matched highway tag and the second MM-HT from statistical data, respectively.

In addition, an NLM link with connect = 000 is a normal NLM link, while one with connect = road_rank is a connecting NLM link. We observe that the statistical data can differ considerably from the KMF, except for the general national road, possibly due to the misinterpretation of detailed road attributes during the OSM crowdsourcing process.

Taking into account this limitation, the exception-handling scheme chooses the OSM 600 highway tag based on the KMF assignment.
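The MM-HT and SMM-HT statistics, and their comparison with the KMF assignment, can be computed along the following lines. This is a minimal Python sketch: the input format (pairs of an NLM `(road_rank, connect)` key and the highway tag of the matched OSM edge) and the function names are assumptions made for illustration, and the sketch does not reproduce the values of Table 4.

```python
from collections import Counter, defaultdict

def highway_tag_stats(matches):
    """Return {(road_rank, connect): (MM-HT, SMM-HT)} from matched pairs.

    matches: iterable of ((road_rank, connect), highway_tag).
    SMM-HT is None when only one tag was observed for a key.
    """
    counts = defaultdict(Counter)
    for key, tag in matches:
        counts[key][tag] += 1
    stats = {}
    for key, counter in counts.items():
        ranked = [tag for tag, _ in counter.most_common(2)]
        stats[key] = (ranked[0], ranked[1] if len(ranked) > 1 else None)
    return stats

def kmf_disagreements(stats, kmf):
    """Keys whose most matched highway tag differs from the KMF assignment."""
    return [key for key, (mm_ht, _) in stats.items() if kmf.get(key) != mm_ht]
```

Keys returned by `kmf_disagreements` are the cases where an exception-handling rule based on the KMF would override the tag most frequently observed in the crowdsourced data.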

In this section, we compare the numerical results of the OSM-SG algorithm and the whole S-RNM framework with three node-matching-based schemes and two edge-

Table 5.

Figure 13 shows the overall matching of the S-RNM, in which the black bold line in Figure 13(a) represents the node matching between the NLM nodes and the OSM nodes and the red bold line in Figure 13(b) is the edge matching between NLM links and the OSM edges. In the following sections, we first evaluate the performance of the S-RNM in terms of precision, recall, and F-score as suggested in [39,45]. Since the

harmonic mean, also known as F-score, is used. The formulas of these three performance metrics are given as follows:

First, we present the result of these metrics in node matching among the OSM-SG,

Figure 16 shows the precision, recall, and F-score of the edge matching. This figure shows the advantages of the node-matching-based schemes over the edge-matching-

In order to find out how the S-RNM achieves good precision and recall, we discuss another aspect of the matching results by classifying the matching of NLM nodes into correct, incorrect, incomplete, and unmatched. The correctly matched NLM node is defined in Section 6.1. The incorrectly matched NLM node is an NLM node matched to an OSM subgraph $G_I(n_i)$ that has at least one OSM node $v_j \in G_I(n_i)$ with $v_j \notin G_I^T(n_i)$.

On the other hand, the incompletely matched NLM node $n_i$ has $G_I(n_i) \subset G_I^T(n_i)$. The unmatched NLM node is an NLM node matched to a null OSM subgraph $G_I(n_i)$.
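The four categories can be expressed with set operations, as in the following Python sketch. It assumes that a matching result and its ground truth are given as collections of OSM node identifiers, and that a correct matching means the two sets are equal; the latter is an assumption, since the definition in Section 6.1 is not restated here.

```python
def classify_match(matched, truth):
    """Classify one NLM node by its matched OSM node set and the ground truth set."""
    matched, truth = set(matched), set(truth)
    if not matched:
        return "unmatched"        # matched to a null OSM subgraph
    if matched - truth:
        return "incorrect"        # at least one matched node is outside the truth
    if matched < truth:
        return "incomplete"       # proper subset of the truth
    return "correct"              # assumption: identical to the truth
```

The order of the checks matters: the empty set is also a proper subset of any non-empty ground truth, so the unmatched case is tested first.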
Figure 17 shows the ratio of matching in each category for all node-matching-based schemes. First, we find out why the OSM-SG has high precision but low recall. It can be

projection boundary deviate from $b_1$, we draw two additional bisectors that intersect with bisector $b_1$ at point $n_p$: bisector $b_2$ of the angle between $l_{ji}(p+1)$ and $l_{im}(q)$ and bisector $b_3$ of angle $\theta_{ji}(p)$. At point $n_p$, the projection distances to NLM line segments $l_{ji}(p)$, $l_{ji}(p+1)$, and $l_{im}(q)$ become the same. After point $n_p$, the projection boundary deviates from $b_1$ and becomes the red dotted line segment $b_2$.

When $\theta_{ji}(p) > 180^\circ$ as shown in Figure 1(b), bisector $b_2$ is similarly obtained from the crosspoint of $l_{im}(q)$ and the extended line of $l_{ji}(p)$. Next, we determine point $n_q$ on bisector $b_2$ so that its distance to point $n_{ji}(p)$ is equal to the projection distance to $l_{im}(q)$. It is clear that, beyond point $n_q$, bisector $b_2$ becomes the projection boundary. The remaining problem is to determine the projection boundary between points $n_p$ and $n_q$. To address this problem, we first define a Cartesian coordinate system whose X-axis, passing through the origin at point $n_p$, is parallel to $l_{im}(q)$. We denote the Cartesian coordinates of point $n$ on the transient boundary curve by $(x, y)$. Similarly, the Cartesian coordinates of point $n_{ji}(p)$ are denoted by $(x_0, y_0)$. Since $y > 0$, the projection distance of point $n$ to $l_{im}(q)$ becomes $y + d_{P,2}$, which must be equal to the distance between points $n$ and $n_{ji}(p)$, i.e.,

$$\sqrt{(x - x_0)^2 + (y - y_0)^2} = y + d_{P,2}. \tag{A1}$$

Finally, the transient curve of the projection boundary becomes a parabola satisfying the following equation:
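As a sketch of the algebra, reading the left-hand side of (A1) as the Euclidean distance between $n$ and $n_{ji}(p)$, i.e., with a square root, and assuming $y_0 + d_{P,2} \neq 0$, squaring both sides cancels the $y^2$ terms and leaves an expression that is linear in $y$ and quadratic in $x$:

```latex
\begin{aligned}
(x - x_0)^2 + (y - y_0)^2 &= (y + d_{P,2})^2 \\
(x - x_0)^2 + y_0^2 - 2 y\, y_0 &= 2 y\, d_{P,2} + d_{P,2}^2 \\
y &= \frac{(x - x_0)^2 + y_0^2 - d_{P,2}^2}{2\,(y_0 + d_{P,2})}
\end{aligned}
```

This is the parabola whose focus is $n_{ji}(p)$ and whose directrix is the line $y = -d_{P,2}$, consistent with the boundary being equidistant from a point and a line.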