Volunteered Geographic Information Users' Contribution Pattern and Its Impact on Information Quality

Introduction
The quality of user-generated data from crowdsourcing activities has a profound impact on online mapping projects and platforms built on Volunteered Geographic Information (VGI) [4,5]. Spatial data quality can be assessed against a set of criteria such as positional accuracy, thematic accuracy, temporal accuracy, logical consistency, and completeness. Multiple studies have used these measures to assess the quality of VGI data [6-9]. Because VGI is contributed voluntarily by individuals who may have no training in cartography or GIS, the quality and reliability of these data are open to question [10]. It is therefore necessary to design and implement an effective system for evaluating VGI data quality. In this study, we examine the level of VGI participation in OpenStreetMap through case-study neighborhoods in three major cities: Tehran, London, and Los Angeles.

Research Questions, Data and Methodology
In this study, we use three metrics to examine (a) the level of participation in an area and (b) the pattern of local knowledge contribution in OpenStreetMap (OSM):
1. Number of tags per object: This metric indicates the completeness of map features. It also implicitly conveys the level of local knowledge in the dataset, since tags are mostly provided by local users. The higher the number of tags per object, the greater the level of detail in the map.
2. Number of objects per user: This metric reflects users' level of participation and their contribution pattern, that is, how many users provided information on how many objects. It helps us examine how, and to what degree, local knowledge is present in the dataset. The higher the number of objects per user, the less local knowledge the dataset contains.
3. Number of users per square kilometer: This metric explicitly conveys the level of participation in the area; a higher number of users per square kilometer indicates both greater participation and a higher-quality dataset.
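The three metrics can be computed from a list of OSM objects roughly as follows. This is a minimal sketch, not the authors' implementation: the `OsmObject` record and the `participation_metrics` function are hypothetical, and because the study does not state how tags per object was normalized, the sketch assumes a simple mean number of tags per object, which need not reproduce the values reported in Table 1. The study-area size in square kilometers must be supplied separately.

```python
from dataclasses import dataclass, field


@dataclass
class OsmObject:
    """Hypothetical minimal record for one OSM map object (node, way, or relation)."""
    object_id: int
    user: str                                  # contributor credited with the object
    tags: dict = field(default_factory=dict)   # OSM key/value tags


def participation_metrics(objects, area_km2):
    """Compute the three metrics for one study area.

    tags_per_object:  total tags divided by number of objects (assumed definition)
    objects_per_user: number of objects divided by number of distinct users
    users_per_km2:    number of distinct users divided by the area in km^2
    """
    if not objects or area_km2 <= 0:
        raise ValueError("need at least one object and a positive area")
    users = {obj.user for obj in objects}
    total_tags = sum(len(obj.tags) for obj in objects)
    return {
        "tags_per_object": total_tags / len(objects),
        "objects_per_user": len(objects) / len(users),
        "users_per_km2": len(users) / area_km2,
    }


# Toy example (not study data): four objects from three users in a 2 km^2 area.
sample = [
    OsmObject(1, "alice", {"amenity": "pharmacy", "name": "Example Pharmacy"}),
    OsmObject(2, "alice", {"building": "yes"}),
    OsmObject(3, "bob"),
    OsmObject(4, "carol", {"shop": "supermarket"}),
]
print(participation_metrics(sample, area_km2=2.0))
```

Counting distinct users with a set keeps each contributor from being counted once per object, which is what both the objects-per-user and users-per-area ratios require.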
Using these metrics, we compare the level of participation and the knowledge contribution patterns in three cities: London (U.K.), Tehran (Iran), and Los Angeles (USA). In each city, we selected a representative neighborhood as the study site: West Kensington (London), Westwood (Los Angeles), and Shekoufeh (Tehran). We obtained the OSM dataset for each neighborhood on May 22, 2020, and analyzed it using the proposed metrics. The findings are displayed in Table 1. Among the three sites, Westwood (L.A.) has the highest number of objects (57,063), compared with 16,306 in West Kensington (London) and 7,888 in Shekoufeh (Tehran). The high count for Westwood is likely due to the City Hub LA project, through which detailed datasets have been imported into OSM, including Los Angeles County's building footprints dataset from the L.A. County GIS Portal and the Los Angeles County Land Use dataset. As a result, the Los Angeles OSM dataset is a robust spatial dataset with a high level of completeness.
Next, we assess participation and the use of local knowledge by examining the number of tags per object. Tags describe a map feature in detail, so features with more tags carry more detailed and complete spatial knowledge. Creating tags requires experiential and local knowledge that may be absent from authoritative databases. A higher number of tags on a feature therefore indicates more detailed information about that feature, and each new tag is a step toward improving its quality. In terms of tags per object, West Kensington ranks first (0.017), followed by Shekoufeh (0.011), with Westwood (0.003) a distant third. This indicates that the London OSM map is rich in local knowledge and is therefore more detailed and complete.
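One way to obtain object, tag, and per-user counts for a neighborhood is to parse an OSM XML extract of it. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes the usual OSM XML layout (`<node>`, `<way>`, and `<relation>` elements carrying a `user` attribute and `<tag k=... v=...>` children); the function name `summarize_osm_xml` and the sample data are hypothetical. In a current-state export the `user` attribute names the last editor of each object, so crediting objects to their original creators would require the full edit history instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_osm_xml(xml_text):
    """Count map objects, tags, and objects per user in an OSM XML extract.

    Only <node>, <way>, and <relation> children of <osm> are treated as map
    objects. Each object's <tag> children are counted, and the object is
    credited to its 'user' attribute (the last editor in a current export).
    """
    root = ET.fromstring(xml_text)
    n_objects = 0
    n_tags = 0
    objects_by_user = Counter()
    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue
        n_objects += 1
        n_tags += len(elem.findall("tag"))   # <nd> and <member> children are ignored
        objects_by_user[elem.get("user", "<anonymous>")] += 1
    return n_objects, n_tags, objects_by_user


# Hypothetical three-object extract (not study data).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" version="3" user="alice" uid="10" lat="51.490" lon="-0.206">
    <tag k="amenity" v="pharmacy"/>
    <tag k="name" v="Example Pharmacy"/>
  </node>
  <node id="2" version="1" user="bob" uid="11" lat="51.491" lon="-0.207"/>
  <way id="3" version="2" user="alice" uid="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""

n_objects, n_tags, per_user = summarize_osm_xml(SAMPLE_OSM)
print(n_objects, n_tags, dict(per_user))
```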
Tehran's OSM map has an acceptable number of tags per object compared with London, which suggests that it is making good progress in incorporating local knowledge. Westwood in Los Angeles shows a surprisingly low number of tags, as 100 objects on its map carry only three tags. Thus, while the Los Angeles OSM map has benefited from the City Hub LA project's import of official datasets, it lacks the local knowledge needed to produce a robust spatial database.
We next examine participation in terms of the number of contributing users. In West Kensington (London), 168 users created the entire dataset, and 19.54% of them provided 90% of the data. In Shekoufeh (Tehran), 83 users created 7,888 objects, but three users created 70% of the dataset. Westwood, with 148 users, shows a similar pattern. The objects-per-user metric indicates the level of users' participation and the degree to which local knowledge is present in the dataset. West Kensington (97.06 objects/user) and Shekoufeh (95.04 objects/user) rank well, while Westwood (L.A.) is a distant third (385.56 objects/user). This suggests that London and Tehran have a similar level of local knowledge in their OSM datasets, whereas the L.A. OSM dataset lacks detailed local knowledge because user participation is low relative to the number of objects. In terms of users per square kilometer, London ranks first with 63 users/km², followed by Los Angeles with 45 users/km². Tehran ranks third with 22 users/km², indicating that participation in Tehran is very low compared with London or Los Angeles. This may be due to the lack of OSM-based services in Iran, which limits the platform's popularity among citizens.
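The concentration figures above (for example, the share of users who produced 90% of the data) and the objects-per-user ratios can be derived from per-user object counts along the lines of the following sketch. `min_user_share` is a hypothetical helper that ranks users by object count and returns the smallest fraction of users whose combined objects reach a target share; the study does not describe its exact procedure. The ratio calculation uses only the object and user totals reported in the text.

```python
def min_user_share(objects_by_user, target=0.9):
    """Smallest fraction of users whose combined objects reach `target` of all objects.

    Users are ranked by object count (largest first) and added one at a time
    until their cumulative share of objects is at least `target`.
    """
    counts = sorted(objects_by_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no objects to share")
    cumulative = 0
    for k, count in enumerate(counts, start=1):
        cumulative += count
        if cumulative >= target * total:
            return k / len(counts)
    return 1.0  # only reached if target > 1


# Objects-per-user ratios from the object and user totals reported above.
REPORTED = {  # site: (objects, users)
    "West Kensington": (16306, 168),
    "Shekoufeh": (7888, 83),
    "Westwood": (57063, 148),
}
ratios = {site: round(n_obj / n_users, 2) for site, (n_obj, n_users) in REPORTED.items()}
print(ratios)

# Toy per-user counts (not study data): 3 of 4 users are needed to reach 90%.
print(min_user_share({"a": 70, "b": 15, "c": 10, "d": 5}))
```

Ranking users from largest to smallest contributor guarantees the returned fraction is the minimum possible, which matches how statements such as "a small share of users produced 90% of the data" are usually read.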

Discussion
Our analysis shows that 5.50% of users in Westwood (L.A.) created 90% of the entire dataset. Similarly, about 20% of users in West Kensington (London) and Shekoufeh (Tehran) produced 90% of their datasets, with three Shekoufeh contributors creating 70% of the dataset. Nonetheless, the remaining users contribute significantly to the integration of local knowledge. In Shekoufeh, 47 users created local knowledge-based features (such as a short alley, a local bank name, a pharmacy, a dental clinic, and a car shop), 21 of whom made only one edit in the system. West Kensington (London) and Westwood (L.A.) follow a similar pattern. In West Kensington, eight users provided 80% of the entire dataset. Nevertheless, the rest play a significant role in providing local knowledge: 54 users provided features that require local knowledge, and 30 users made only one edit in the system. Yet these 30 users provided details on features such as exhibition houses, a bookstore, car sharing, a supermarket, and a London Underground station name. Similarly, in Los Angeles, seven users provided 90% of the entire dataset, but 82 users provided local knowledge-based features, and 36 of them created only a single feature or tag, such as a high school, supermarket, café, convenience store, or gas station. These contributions define critical public places from a local, experiential perspective and significantly enrich the database, making the crowdsourced information unique, up to date, and complete. However, existing methodologies cannot assess the quality of these features: given the limitations of available VGI data quality assessment methods, it is impossible to evaluate the quality of important, defining VGI features that have fewer than 15 versions and were created by users with no contribution history.
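The gap described above can be made concrete by flagging the features that fall outside the reach of history-based assessment. The sketch below is a hypothetical illustration rather than a method from the study: `FeatureRecord` and `hard_to_assess` are invented names, the 15-version threshold is taken from the text, and "no contribution history" is approximated as a creator with at most one recorded edit.

```python
from dataclasses import dataclass


@dataclass
class FeatureRecord:
    """Hypothetical per-feature summary; field names are illustrative."""
    feature_id: int
    version: int   # current OSM version number of the feature
    creator: str   # user who created version 1 of the feature


def hard_to_assess(features, edits_per_user, min_versions=15):
    """Return features that history-based quality methods cannot evaluate.

    A feature is flagged when it has fewer than `min_versions` versions and its
    creator has at most one recorded edit, used here as a proxy for a user
    with no contribution history.
    """
    return [
        f for f in features
        if f.version < min_versions and edits_per_user.get(f.creator, 0) <= 1
    ]


features = [
    FeatureRecord(1, 1, "newcomer"),  # one version, single-edit creator: flagged
    FeatureRecord(2, 20, "casual"),   # 20 versions: not flagged
    FeatureRecord(3, 2, "veteran"),   # experienced creator: not flagged
]
edits = {"newcomer": 1, "casual": 1, "veteran": 500}
print([f.feature_id for f in hard_to_assess(features, edits)])
```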
The advantage of VGI over authoritative maps provided by federal and municipal agencies is that the former incorporates local knowledge while the latter does not. Because the available methodologies cannot evaluate the quality of all OSM features in these areas, new methods must be designed, or current methods advanced, so that the quality of local knowledge can be determined. This need is growing as many new contributors register each month and start to create, edit, and use OSM data: as of May 2020, up to 27,000 new contributors were registering per month. Yet the dominant VGI quality assessment methods cannot assess the quality of information created or edited by these new users, or by users with a limited history, because existing methods assess VGI quality through user reputation, which in turn relies on users' contribution histories. The inability of existing approaches to evaluate VGI with a low level of participation (a limited number of versions), as well as VGI generated by new users, underscores the need for an alternative that overcomes these limitations.
Such challenges can be addressed through new modes of inquiry. By including these characteristics in the VGI quality assessment process, the limitations of existing methods can be partly overcome [32,37]. Still, further research on VGI quality assessment is needed so that VGI with a low level of participation, and information generated by new users or users with a limited history, can be properly assessed.