Subject: Mathematics & Computer Science, Information Technology & Data Management
Keywords: geographic information fusion; data quality; data consistency checking; historic GIS; railway network; patrimonial data; crowdsourcing open data; volunteered geographic information (VGI); Wikipedia geo-spatial information extraction.
Online: 17 August 2020 (14:51:04 CEST)
Transportation of goods is as old as human civilization: past networks and their evolution shed light on long-term trends. The impact of transportation on climate change is measured as major, as is its impact on the spread of a pandemic. These two reasons motivate the provision of relevant and reliable historical geographic datasets of these networks. This paper focuses on reconstructing the railway network in France at its maximal extent, a century ago. The active stations and lines are well documented by the French SNCF in open public data. However, that information ignores past stations (before 1980), which are probably more numerous than those recorded in the public data. Additional open data, individual or collaborative (e.g. Wikipedia), are particularly valuable, but they are not always geocoded, and two more sources are necessary to complete that geocoding: old maps and aerial photography. Therefore, remote sensing and volunteered geographic information are the two pillars of past railway reconstruction. The methods developed are adapted to the extraction of information from these sources: automated parsing of Wikipedia Infoboxes, and data extraction from simple tables and even from plain text. This series of separate procedures can be merged into a comprehensive computer-assisted process. Beyond this, a huge quality-control effort is necessary when merging these data: automated wherever possible, and otherwise carried out visually by inspecting remote sensing imagery. The main output is a reliable dataset, under ODbL, of more than 9100 stations, which can be combined with information about the 35000 communes of France for a large variety of studies. This work supports two theses: (a) it is possible to reconstruct transport network data from the past, and generic computer-assisted methods can be developed; (b) the value of remote sensing and volunteered geographic information is considerable (as archaeologists already know).
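As an illustration of the kind of Infobox parsing and automated checking mentioned in the abstract, the following is a minimal sketch and not the procedure actually used in the paper. The template name `Infobox Gare`, the field names (`nom`, `latitude`, `longitude`), the sample values, and the rough bounding box for metropolitan France are all assumptions made for the example.

```python
import re

def parse_infobox(wikitext):
    """Return the fields of the first {{Infobox ...}} template as a dict."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return {}
    # Walk forward to the matching "}}", tracking nested templates.
    depth, i, end = 0, start, len(wikitext)
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i):
            depth -= 1
            i += 2
            if depth == 0:
                end = i - 2  # position just before the closing braces
                break
        else:
            i += 1
    fields = {}
    # Assumes one "| key = value" pair per line, a common Infobox layout.
    for line in wikitext[start + 2:end].split("\n"):
        m = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*)", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields

def in_metropolitan_france(lat, lon):
    """Coarse plausibility check; the bounds are approximate (assumption)."""
    return 41.0 <= lat <= 51.5 and -5.5 <= lon <= 10.0

# Hypothetical Infobox content, for illustration only.
sample = """{{Infobox Gare
| nom = Gare de Exemple
| latitude = 45.1234
| longitude = 4.5678
}}
Texte de l'article."""

station = parse_infobox(sample)
lat, lon = float(station["latitude"]), float(station["longitude"])
plausible = in_metropolitan_france(lat, lon)
```

A record that fails such a coarse check would be a candidate for the visual control against maps or aerial imagery that the abstract describes.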