Preprint
Article

This version is not peer-reviewed.

RoRED: A Romanian Relation Extraction Dataset

Submitted: 08 May 2026

Posted: 11 May 2026


Abstract
Relation extraction is an important task for structuring information from unstructured text. However, the Romanian language still lacks dedicated datasets and benchmarks for this task. To address this gap, we introduce RoRED, a Romanian relation extraction dataset built by combining two complementary data construction strategies: translating existing high-quality English resources and applying distant supervision to native Romanian Wikipedia data. For the translated subset, we use an open-source large language model to automatically translate English examples into Romanian. For the native subset, we align Romanian Wikipedia entities with Wikidata relations to obtain naturally occurring Romanian examples. To better reflect real-world relation extraction scenarios, we also introduce synthetic negative examples generated using existing Romanian named entity recognition models. Finally, we validate the dataset by fine-tuning and evaluating multiple baseline models. Our strongest model, LUKE-RoRED, achieves a macro-F1 score of 0.8744 on the RoRED test set, demonstrating that the dataset can support relation extraction for Romanian. Overall, RoRED provides a strong first native benchmark for Romanian relation extraction.
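As a rough illustration of the distant supervision idea mentioned in the abstract, the following minimal sketch labels entity pairs found in a sentence by looking them up in a small knowledge base. It is not the RoRED pipeline: the `KB` dictionary, the relation names, the `distant_label` function, and the `no_relation` label are all hypothetical, and entity matching is done by plain substring search. Treating unlinked pairs as negatives is also an assumption made here for brevity, whereas the abstract describes negatives generated with Romanian named entity recognition models.

```python
from itertools import permutations

# Hypothetical (head, tail) -> relation lookup, standing in for Wikidata triples.
KB = {
    ("Mihai Eminescu", "Botoșani"): "place_of_birth",
    ("București", "România"): "country",
}


def distant_label(sentence, entities, kb=KB, negative_label="no_relation"):
    """Label every ordered pair of entities that both occur in the sentence.

    A pair found in the knowledge base receives that relation; any other
    pair receives the negative label (an assumption of this sketch).
    """
    # Keep only the candidate entities that literally appear in the text.
    present = [entity for entity in entities if entity in sentence]
    examples = []
    for head, tail in permutations(present, 2):
        examples.append(
            {
                "text": sentence,
                "head": head,
                "tail": tail,
                "relation": kb.get((head, tail), negative_label),
            }
        )
    return examples


examples = distant_label(
    "Mihai Eminescu s-a născut în Botoșani.",
    ["Mihai Eminescu", "Botoșani", "București"],
)
for example in examples:
    print(example["head"], "->", example["tail"], ":", example["relation"])
```

Distant supervision of this kind is noisy, because a sentence that mentions both entities does not necessarily express the knowledge-base relation, which is one reason to validate such a dataset with baseline models.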
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated