Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Event Geoparser with Pseudo-Location Entity Identification and Numerical Extraction in Indonesian News Corpus

Version 1 : Received: 10 August 2020 / Approved: 14 August 2020 / Online: 14 August 2020 (04:00:42 CEST)

A peer-reviewed article of this Preprint also exists.

Dewandaru, A.; Widyantoro, D.H.; Akbar, S. Event Geoparser with Pseudo-Location Entity Identification and Numerical Argument Extraction Implementation and Evaluation in Indonesian News Domain. ISPRS International Journal of Geo-Information 2020, 9, 712, doi:10.3390/ijgi9120712.

Abstract

One of the most important components of a Geographic Information Retrieval (GIR) system is the geoparser, which performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, news articles that report several events across the many place references mentioned in a document are not yet adequately modeled by regular geoparser types, in which the scope of resolution is either toponym-level or document-level. The capacity to detect multiple events and to geolocate their true locations and coordinates, along with their numerical arguments, is still missing from modern geoparsers, let alone for the Indonesian news domain. We propose a novel type of event geoparser that integrates an ACE-based event extraction model and provides precise event-level scope resolution. The geoparser casts geotagging and event extraction as sequence labeling and uses a Conditional Random Field with keyword features obtained using an Aggregated Topic Model as a semantic exploration of a large corpus, which increases the generalizability of the model. The geoparser also uses a Smallest Administrative Level feature along with a Spatial Minimality-derived algorithm to improve the identification of pseudo-location entities, resulting in a 19.4% increase in weighted F1 score. As a side effect of event extraction, the geoparser also extracts various numerical arguments and is able to generate a thematic choropleth map from a single news story.
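
To illustrate what casting geotagging and event extraction as sequence labeling can look like, the following is a minimal sketch, not the authors' implementation. It assumes a BIO tagging scheme and shows how per-token labels (such as a CRF might emit) are grouped into entity spans. The function name `bio_to_spans`, the label names `EVENT`, `LOC`, and `NUM`, and the example sentence are all hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labeled tokens into (entity_type, text) spans.

    Assumes labels look like "B-LOC", "I-LOC", or "O". The label set is
    hypothetical and only serves to illustrate sequence labeling.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new span begins; close any span that is still open.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_tokens and label[2:] == current_type:
            # Continue the open span of the same type.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" label: close the open span, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_tokens = []
            current_type = None

    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical Indonesian example: "Floods hit South Jakarta, 12 people died".
tokens = ["Banjir", "melanda", "Jakarta", "Selatan", ",", "12", "orang", "tewas"]
labels = ["B-EVENT", "O", "B-LOC", "I-LOC", "O", "B-NUM", "O", "O"]
print(bio_to_spans(tokens, labels))
```

In a pipeline of this kind, the spans typed as locations would then be passed to disambiguation and coordinate resolution, while numeric spans would become candidate event arguments.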

Keywords

geoparser; geographic information retrieval; event extraction; argument extraction; information extraction; named entity recognition; conditional random field; semantic gazetteer; topic model

Subject

Computer Science and Mathematics, Information Systems
