Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Improving Stability of Transformer-based Named Entity Recognition Models with Combined Data Representation

Version 1: Received: 26 September 2023 / Approved: 27 September 2023 / Online: 28 September 2023 (03:01:30 CEST)

How to cite: Marcińczuk, M. Improving Stability of Transformer-based Named Entity Recognition Models with Combined Data Representation. Preprints 2023, 2023091859. https://doi.org/10.20944/preprints202309.1859.v1

Abstract

This study leverages transformer-based models for the named entity recognition task and focuses on data representation strategies: "single" (one sentence per input sequence), "merged" (multiple sentences concatenated into one input sequence), and "context" (a sentence combined with its surrounding sentences as context). Performance analysis reveals that a model trained with one strategy may not perform well when the data is presented in a different representation. To address this limitation, a combined training procedure is proposed that uses all three strategies to improve the stability and adaptability of the model. Results of this approach are presented and discussed for datasets in four languages (English, Polish, Czech, and German), demonstrating the effectiveness of the combined strategy.
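As a rough illustration of the three strategies named in the abstract, the sketch below shows how a list of sentences might be packed into model inputs. This is a minimal, hypothetical example: the function names, the whitespace-based token count, and the `max_len` budget are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch of the "single", "merged", and "context" data
# representation strategies described in the abstract. Whitespace
# tokenization and the token budget are simplifying assumptions.
from typing import List


def single(sentences: List[str]) -> List[str]:
    """One sentence per input sequence."""
    return list(sentences)


def merged(sentences: List[str], max_len: int = 128) -> List[str]:
    """Concatenate consecutive sentences until a token budget is reached."""
    instances, buffer, used = [], [], 0
    for sent in sentences:
        n = len(sent.split())  # crude stand-in for a real token count
        if buffer and used + n > max_len:
            instances.append(" ".join(buffer))
            buffer, used = [], 0
        buffer.append(sent)
        used += n
    if buffer:
        instances.append(" ".join(buffer))
    return instances


def context(sentences: List[str], window: int = 1) -> List[str]:
    """Each sentence joined with its neighbouring sentences as context."""
    instances = []
    for i, sent in enumerate(sentences):
        left = sentences[max(0, i - window):i]
        right = sentences[i + 1:i + 1 + window]
        instances.append(" ".join(left + [sent] + right))
    return instances


if __name__ == "__main__":
    sents = [
        "John Smith works at Acme.",
        "He lives in Berlin.",
        "Acme was founded in 1999.",
    ]
    print(single(sents))
    print(merged(sents, max_len=12))
    print(context(sents, window=1))
```

In the "context" variant, a model would typically be trained to tag only the target sentence, with the neighbouring sentences serving as additional attention context; the sketch covers only how the input instances are formed.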

Keywords

named entity recognition; deep learning; transformers; data augmentation; pre-trained language models; PLM

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
