Version 1
: Received: 5 June 2023 / Approved: 6 June 2023 / Online: 6 June 2023 (03:37:27 CEST)
Version 2
: Received: 8 June 2023 / Approved: 9 June 2023 / Online: 9 June 2023 (04:28:59 CEST)
Ling, G.; Mu, X.; Wang, C.; Xu, A. Enhancing Chinese Address Parsing in Low-Resource Scenarios through In-Context Learning. ISPRS Int. J. Geo-Inf.2023, 12, 296.
Ling, G.; Mu, X.; Wang, C.; Xu, A. Enhancing Chinese Address Parsing in Low-Resource Scenarios through In-Context Learning. ISPRS Int. J. Geo-Inf. 2023, 12, 296.
Ling, G.; Mu, X.; Wang, C.; Xu, A. Enhancing Chinese Address Parsing in Low-Resource Scenarios through In-Context Learning. ISPRS Int. J. Geo-Inf.2023, 12, 296.
Ling, G.; Mu, X.; Wang, C.; Xu, A. Enhancing Chinese Address Parsing in Low-Resource Scenarios through In-Context Learning. ISPRS Int. J. Geo-Inf. 2023, 12, 296.
Abstract
Address parsing is a crucial task in natural language processing, particularly for Chinese addresses. The complex structure and semantic features of Chinese addresses present challenges due to their inherent ambiguity. Additionally, different task scenarios require varying levels of granularity in address components, further complicating the parsing process. To address these challenges and adapt to low-resource environments, we propose CapICL, a novel Chinese address parsing model based on the In-Context Learning (ICL) framework. CapICL leverages a sequence generator, regular expression matching, BERT semantic similarity computation, and GPT modeling to enhance parsing accuracy by incorporating contextual information. We construct the sequence generator using a small annotated dataset, capturing distribution patterns and boundary features of address types to model address structure and semantics, mitigating interference from unnecessary variations. We introduce the REB-KNN algorithm, which selects similar samples for ICL-based parsing using regular expression matching and BERT semantic similarity computation. The selected samples, raw text, and explanatory text are combined to form prompts, and inputted into the GPT model for prediction and address parsing. Experimental results demonstrate significant achievements of CapICL in low-resource environments, reducing dependency on annotated data and computational resources. Our model's effectiveness, adaptability, and broad application potential are validated, showcasing its positive impact in natural language processing and geographical information systems.
Keywords
Chinese address parsing; low-resource scenarios; In-context learning; GPT; BERT;k-nearest neighbors
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Commenter: Guangming Ling
Commenter's Conflict of Interests: Author