1. Introduction
Event extraction (EE) is a fundamental task in natural language processing (NLP) that aims to identify and extract structured information about events from unstructured text, including the event trigger and its participating entities with their specific roles [1]. Accurate and comprehensive event extraction is crucial for various downstream applications such as knowledge graph construction [2], question answering [3], text summarization, and information retrieval. Traditionally, event extraction has relied heavily on supervised learning methods, which often require large amounts of manually annotated data with predefined event schemas and role ontologies. However, the reliance on fixed schemas limits their applicability in open-domain scenarios where the types of events and their arguments can be diverse and previously unseen.
The recent advancements in large language models (LLMs) have opened up new possibilities for event extraction. LLMs, with their remarkable ability to understand and generate human-like text, possess a vast amount of world knowledge and linguistic understanding that can be leveraged for identifying and characterizing events [4]. These models exhibit impressive multi-capabilities [5], and their inherent zero-shot and few-shot learning capabilities offer the potential to perform event extraction without extensive task-specific training data or predefined schemas, addressing the limitations of traditional approaches. Furthermore, LLMs can be adapted for visual tasks, as demonstrated by visual in-context learning approaches for vision-language models [6]. This paradigm shift towards leveraging LLMs for event extraction promises greater flexibility and adaptability in handling the complexity and diversity of real-world events.
Despite the promising potential, applying LLMs directly to open-domain event extraction presents several challenges. First, eliciting structured event information, including triggers and arguments with their specific roles, from free-form text using only LLMs is non-trivial. While LLMs can generate text describing events, ensuring consistent and accurate extraction of structured information requires careful prompting and training strategies; understanding and unraveling chaotic contexts is also crucial for effective LLM application [7]. Second, defining and identifying event schemas and argument roles in a completely open domain remains a significant hurdle: LLMs need to learn to implicitly recognize event types and the semantic roles of their participants without explicit guidance. Third, reasoning over potentially long and complex text to identify all relevant events and their arguments can be computationally expensive and requires sophisticated attention mechanisms within the LLM. Moreover, efficient processing is important; even with LLMs, techniques such as vision representation compression have proven beneficial for demanding tasks like video generation [8].
Motivated by the potential of LLMs to overcome the limitations of traditional schema-based event extraction, we propose a novel two-stage training approach that harnesses the in-context learning and generative capabilities of these models for open-domain event extraction and implicit pattern induction. Our approach aims to enable LLMs to not only extract event mentions and their arguments but also to implicitly learn underlying event patterns and categorize them without explicit supervision. By focusing solely on LLMs, we aim to develop a more flexible and scalable solution for event extraction in diverse and evolving domains.
Our proposed method involves an initial instruction tuning phase where a powerful pre-trained LLM is trained on a curated dataset of text examples annotated with natural language descriptions of events and their participants. This stage teaches the LLM to understand the task of open-domain event extraction and to generate structured descriptions of the identified events. Subsequently, we employ a meta-learning inspired approach where the LLM is prompted with few-shot examples of its own extracted event descriptions to facilitate the implicit learning of event patterns and the identification of common argument roles in an unsupervised manner. We hypothesize that this two-stage approach will enable the LLM to effectively perform both the extraction and the initial stages of pattern induction without relying on explicit schema definitions.
To evaluate the effectiveness of our proposed method, we conduct experiments on two widely used event extraction datasets: ACE 2005 [9] and ERE [10]. We use the standard F1 score as our primary evaluation metric to compare the performance of our approach against existing baseline methods. We expect our LLM-centric approach to demonstrate competitive or superior performance, particularly in its ability to handle the open-domain nature of the task and to implicitly learn event patterns.
In summary, this paper makes the following contributions:
- We propose a novel two-stage training approach that leverages the instruction-following and in-context learning capabilities of Large Language Models for open-domain event extraction.
- We demonstrate a method for implicitly inducing event patterns and identifying argument roles directly from the LLM’s extracted event descriptions, without relying on predefined schemas.
- We evaluate our approach on two benchmark event extraction datasets, ACE 2005 and ERE, and achieve promising results, showcasing the potential of LLMs for flexible and scalable event extraction.
2. Related Work
2.1. Event Extraction
Event extraction (EE) is a crucial task in natural language processing that aims to identify and extract structured information about events from unstructured text, including the event trigger and its arguments with their specific roles. Early approaches to event extraction often relied on handcrafted features and rule-based systems. With the rise of machine learning, supervised learning methods using statistical models such as Support Vector Machines (SVMs) and Conditional Random Fields (CRFs) became prevalent. These methods typically require large amounts of manually annotated data with predefined event schemas. For instance, Li et al. [11] proposed a supervised event extraction model using CRFs, demonstrating strong performance on benchmark datasets like ACE 2005.
The advent of deep learning has led to significant advancements in event extraction. Various neural network architectures, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have been successfully applied to this task. More recently, transformer-based models, particularly BERT, have achieved state-of-the-art results in many NLP tasks, including event extraction. Fine-tuning pre-trained transformer models on event extraction datasets has become a common and effective approach. Wang et al. [12] proposed a hybrid transformer structure for jointly identifying entities and events, showcasing the benefits of end-to-end models.
Addressing the limitations of schema-dependent event extraction, researchers have also explored open-domain or schema-less event extraction. These approaches aim to extract events without relying on a predefined set of event types and roles. One direction in this area involves using question answering (QA) frameworks to perform event extraction. For example, Liu et al. [13] reframed event extraction as a multi-turn question answering problem, demonstrating its effectiveness in handling complex event structures.
Another emerging trend is the application of contrastive learning to event extraction. Chen et al. [14] proposed a contrastive learning framework for document-level event extraction, aiming to learn better representations by contrasting positive and negative event instances. Zeng et al. [15] further explored this direction by incorporating global information guidance with contrastive learning for end-to-end event extraction.
The increasing capabilities of Large Language Models (LLMs) have recently sparked significant interest in leveraging them for various NLP tasks, including event extraction. Alrashdi et al. [16] provided a comprehensive survey on the use of LLMs for event extraction, highlighting their potential in zero-shot and few-shot settings. Furthermore, the challenge of adapting event extraction models to new or low-resource domains has led to research in few-shot learning. Hao et al. [17] explored meta-learning techniques for few-shot event extraction, aiming to learn transferable knowledge from a limited number of examples. Our work builds upon these advancements by proposing a novel LLM-centric approach that combines instruction tuning for open-domain event description generation with a meta-learning inspired strategy for implicit pattern induction.
2.2. Large Language Models
Large Language Models (LLMs) have emerged as a transformative technology in natural language processing, demonstrating remarkable capabilities across a wide range of tasks. The foundation for many modern LLMs lies in the Transformer architecture, introduced by Vaswani et al. [18]. This architecture, based on self-attention mechanisms, allows for parallel processing of input sequences and has proven highly effective in capturing long-range dependencies in text.
The groundbreaking work on GPT-3 by Brown et al. [19] showcased the impressive few-shot learning abilities of very large language models, demonstrating their capacity to perform new tasks with only a few examples provided in the prompt. In-context learning, a key feature of modern LLMs, has been further explored in visual domains [6]. Prior to this, models like BERT, proposed by Devlin et al. [20], revolutionized natural language understanding through deep bidirectional pre-training on massive text corpora. BERT’s architecture and pre-training objectives have been highly influential, leading to numerous follow-up works, including RoBERTa by Liu et al. [21], which further optimized the pre-training process. State space models, such as Mamba, represent an alternative to Transformers and are gaining attention, with applications in areas like insect recognition [22], defect recognition [23], and vision generation [24].
Raffel et al. [25] presented a unified text-to-text framework with their T5 model, treating all NLP tasks as generating text from text and simplifying the application of pre-trained models to various downstream tasks. As the field of LLMs has rapidly expanded, several survey papers have emerged to provide a comprehensive overview of the landscape. Zhao et al. [26] offer a detailed survey covering various aspects of LLMs, including their architectures, training methodologies, applications, and challenges. Furthermore, research is ongoing to improve the efficiency of LLMs in various applications, such as video generation [8].
The scaling behavior of neural language models has been a subject of intense research. Kaplan et al. [27] investigated the relationship between model size, dataset size, and performance, revealing predictable scaling laws that guide the development of larger and more capable models. To evaluate the knowledge and reasoning abilities of LLMs across a broad spectrum of domains, Hendrycks et al. [28] introduced the Massive Multitask Language Understanding (MMLU) benchmark. Moreover, understanding how LLMs handle context is crucial for complex tasks [7].
Efforts have also been directed towards aligning LLMs with human preferences and instructions. Ouyang et al. [29] explored the use of reinforcement learning from human feedback (RLHF) to train language models that are better at following instructions and generating more helpful and harmless responses. Furthermore, the development of large, open-access multilingual language models such as BLOOM by Scao et al. [30] has democratized access to these powerful technologies and fostered further research and applications in diverse linguistic contexts. Our work leverages the capabilities of these large language models for the specific task of open-domain event extraction and pattern induction.
3. Method
Our proposed approach for open-domain event extraction and pattern induction is a generative framework built upon Large Language Models (LLMs). It comprises two distinct yet interconnected stages: (1) Instruction Tuning for Open-Domain Event Description Generation and (2) Meta-Learning Inspired Few-Shot Learning for Pattern Induction. We detail each stage below.
3.1. Stage 1: Instruction Tuning for Open-Domain Event Description Generation
The primary goal of the first stage is to train the LLM to understand the task of open-domain event extraction and to generate structured, natural language descriptions of the identified events from input text. Given an input text $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ represents the $i$-th token, we aim to train an LLM $M_\theta$, parameterized by $\theta$, to produce an output sequence $Y$ that comprehensively describes the events present in $X$. This description includes identifying the event trigger, the participating entities, and their specific roles concerning the event.
We formulate this stage as a conditional sequence generation task. We construct an instruction tuning dataset $\mathcal{D} = \{(X^{(j)}, I^{(j)}, Y^{(j)})\}_{j=1}^{N}$, where each instance consists of an input text $X^{(j)}$, an instruction $I^{(j)}$ that prompts the LLM for event extraction, and the corresponding target event description $Y^{(j)}$. The instruction is designed to be general enough to cover a wide range of event types without being tied to a specific schema. For example, an instruction could be "Identify and describe all events in the following text, including the trigger and the roles of the participants." The target output is a natural language description that explicitly mentions the event trigger and clearly defines the roles of the involved entities. For instance, for the sentence "John sold a car to Mary for $10,000", a possible target description could be "The event is a ’selling’ event. John is the seller, the car is the object, and Mary is the buyer. The price is $10,000."
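To make the data format concrete, the following is a minimal sketch of how one such training instance could be represented and serialized; the field names and prompt template are illustrative assumptions on our part, not a format prescribed by the method.

```python
# One instruction-tuning instance for Stage 1 (a minimal sketch; the
# field names and prompt template below are illustrative assumptions).
example = {
    "instruction": (
        "Identify and describe all events in the following text, "
        "including the trigger and the roles of the participants."
    ),
    "input": "John sold a car to Mary for $10,000.",
    "output": (
        "The event is a 'selling' event. John is the seller, the car is "
        "the object, and Mary is the buyer. The price is $10,000."
    ),
}

def to_training_text(ex: dict) -> str:
    """Serialize an (instruction, input, target) triple into a single
    training string; the separators are one plausible choice among many."""
    return f"{ex['instruction']}\n\nText: {ex['input']}\n\nEvents: {ex['output']}"
```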
The LLM is trained by minimizing the negative log-likelihood of the target output $Y$ given the input text $X$ and the instruction $I$. The loss function for this stage is defined as:

$$\mathcal{L}(\theta) = -\sum_{(X, I, Y) \in \mathcal{D}} \log P_\theta(Y \mid X, I)$$

The conditional probability $P_\theta(Y \mid X, I)$ is typically modeled autoregressively over the output sequence $Y = (y_1, y_2, \ldots, y_m)$:

$$P_\theta(Y \mid X, I) = \prod_{t=1}^{m} P_\theta(y_t \mid y_{<t}, X, I)$$
We employ standard backpropagation with an appropriate optimizer (e.g., AdamW) to update the model parameters $\theta$ during the instruction tuning process. The quality and diversity of the instruction tuning dataset are crucial for the LLM to learn a robust and generalizable mapping from input text to structured event descriptions.
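For concreteness, the sketch below shows one way to realize this objective with the Hugging Face transformers library, masking prompt tokens so the loss is computed only over the target description; the base checkpoint and learning rate are placeholder assumptions, not the settings used in our experiments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any instruction-tunable causal LM would do.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(prompt: str, target: str) -> float:
    """One gradient step on -log P(Y | X, I). Prompt positions are set
    to -100 in the labels so only target tokens contribute to the loss."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens
    loss = model(input_ids=full_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```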
3.2. Stage 2: Meta-Learning Inspired Few-Shot Learning for Pattern Induction
The second stage aims to enable the LLM to implicitly discover event patterns and identify common argument roles without explicit supervision, leveraging the knowledge acquired during the instruction tuning phase. We adopt a meta-learning inspired few-shot learning approach during inference.
Given a new, unseen input text $X_{\text{new}}$, we first utilize the instruction-tuned LLM to generate a set of event descriptions $D = \{d_1, d_2, \ldots, d_p\}$, where $p$ is the number of events identified in $X_{\text{new}}$. To guide the LLM towards pattern induction, we construct a prompt $P$ that includes a small set of $k$ support examples $S = \{(X^{(j)}_s, D^{(j)}_s)\}_{j=1}^{k}$, where each support example consists of an input text and its corresponding event description generated by the instruction-tuned model (or potentially a few manually crafted examples representing prototypical event types). The prompt $P$ is then formed by concatenating these support examples with the new input text $X_{\text{new}}$ and an instruction that asks the LLM to extract events from $X_{\text{new}}$ and then identify underlying patterns.
The LLM, using its parameters $\theta$ learned in the first stage, processes the prompt $P$ to generate the event descriptions $D_{\text{new}}$ for $X_{\text{new}}$:

$$D_{\text{new}} = M_\theta(P)$$
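The sketch below illustrates how such a prompt $P$ might be assembled from the support set and the new input; the exact instruction wording and separators are our assumptions rather than a fixed template.

```python
# Assemble the Stage 2 prompt P from k support pairs (X_j, D_j) and the
# new input X_new (a sketch; wording and separators are assumptions).
def build_pattern_induction_prompt(support, x_new):
    """support: list of (input_text, event_description) pairs produced
    by the Stage 1 model. Returns the concatenated prompt P."""
    parts = [f"Text: {x}\nEvents: {d}" for x, d in support]
    parts.append(
        "Extract the events in the next text in the same style, then "
        "group similar events and name the common roles played by the "
        "participants in each group."
    )
    parts.append(f"Text: {x_new}\nEvents:")
    return "\n\n".join(parts)
```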
The key to pattern induction lies in how the LLM utilizes the information from the support examples. We hypothesize that by observing the relationships between the input texts and their corresponding event descriptions in the support set, the LLM can learn to identify semantically similar events in $X_{\text{new}}$ and infer common roles among their arguments.
To further facilitate this, we can employ specific prompting strategies. For instance, after generating the initial event descriptions for $X_{\text{new}}$, we can provide the LLM with an additional prompt that asks it to "Group the extracted events based on their similarity and identify the common roles played by the participants in each group." The LLM can then leverage its understanding of natural language semantics to perform this grouping and role identification.
Alternatively, we can explore techniques where the LLM is trained in the first stage to not only generate event descriptions but also to assign implicit labels or embeddings to these descriptions that capture their semantic similarity. In the second stage, these embeddings can be used for clustering the extracted events, and the argument roles within each cluster can be analyzed to identify common patterns.
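As one possible instantiation of this alternative, the sketch below clusters generated event descriptions using off-the-shelf sentence embeddings; the encoder checkpoint and distance threshold are illustrative assumptions, not components of our trained model.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

# Off-the-shelf encoder as a stand-in for the implicit embeddings
# discussed above (an assumption made purely for illustration).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def cluster_event_descriptions(descriptions):
    """Group semantically similar event descriptions without fixing the
    number of clusters in advance; each resulting cluster can then be
    inspected for common argument roles."""
    embeddings = encoder.encode(descriptions)
    clusterer = AgglomerativeClustering(
        n_clusters=None, distance_threshold=1.0, linkage="ward"
    )
    labels = clusterer.fit_predict(embeddings)
    clusters = {}
    for desc, label in zip(descriptions, labels):
        clusters.setdefault(int(label), []).append(desc)
    return clusters
```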
Further exploration into specific prompting strategies and potential auxiliary loss functions during the instruction tuning phase to explicitly encourage the learning of transferable event patterns will be considered in future work. However, the core idea of this stage is to leverage the LLM’s in-context learning ability to perform unsupervised or few-shot event pattern induction based on the event descriptions generated in the first stage.
4. Experiments
In this section, we present the experimental evaluation of our proposed two-stage LLM-based approach for open-domain event extraction and pattern induction. We compare our method against several existing approaches on two benchmark datasets. Furthermore, we conduct an ablation study to analyze the contribution of different components of our method and perform a human evaluation to assess the quality of the extracted events.
4.1. Experimental Setup
We evaluated our method on two widely used event extraction datasets: ACE 2005 and ERE. For evaluation, we used the standard F1 score, which measures the harmonic mean of precision and recall in identifying event triggers and arguments with correct roles; a minimal sketch of this computation follows the baseline list below. We compared our proposed method with the following baseline approaches:
- Supervised Model with CRF (Li et al., 2013): A traditional supervised event extraction model employing Conditional Random Fields (CRFs) trained on the annotated data with a predefined schema, representing a strong baseline for traditional methods.
- GPT-3 (Brown et al., 2020) Zero-Shot: The GPT-3 model prompted to perform event extraction without any task-specific fine-tuning, serving as a strong zero-shot LLM baseline.
- Fine-tuned BERT for Event Extraction (Devlin et al., 2019): A BERT model fine-tuned on the event extraction task using a more traditional approach with predefined event types and roles, representing a strong fine-tuned transformer-based baseline.
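As a reference for the metric itself, here is a minimal sketch of exact-match F1 over gold and predicted tuples, e.g., (trigger span, event type) for triggers or (trigger, argument span, role) for arguments; the tuple granularity follows the standard EE evaluation convention rather than anything specific to our method.

```python
def exact_match_f1(gold: set, pred: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 over sets of tuples such as
    (trigger_span, event_type); an item counts only on exact match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```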
4.2. Quantitative Results
The main experimental results comparing our proposed method with the baselines on the ACE 2005 and ERE datasets are presented in Table 1.
As shown in Table 1, our proposed two-stage LLM-based method achieves the highest F1 scores on both the ACE 2005 and ERE datasets, outperforming all the baseline models. This demonstrates the effectiveness of our approach in performing open-domain event extraction. The significant improvement over the traditional supervised model and the other LLM-based baselines highlights the benefits of our instruction tuning strategy combined with the meta-learning inspired approach for pattern induction.
4.3. Ablation Study
To further analyze the contribution of each stage of our proposed method, we conducted an ablation study. We evaluated the performance of the following variants:
- Stage 1 Only: Using only the instruction-tuned LLM from the first stage to directly generate event descriptions, without the pattern induction stage.
- Stage 2 Only (with generic prompts): Using a pre-trained LLM (the same base model as ours) with generic prompts designed to perform pattern induction, without the instruction tuning from the first stage.
The results of the ablation study are presented in Table 2.
The results from the ablation study indicate that both stages of our proposed method contribute positively to the overall performance. Using only the instruction-tuned model (Stage 1 Only) yields strong results, demonstrating the effectiveness of our event description generation approach. However, the full model, incorporating the meta-learning inspired pattern induction in Stage 2, further improves the performance, suggesting that this stage helps in refining the extracted events and potentially identifying more nuanced patterns. The significantly lower performance of using only Stage 2 with generic prompts highlights the importance of the instruction tuning in the first stage for providing the LLM with the necessary foundational knowledge for event extraction.
4.4. Human Evaluation
To gain a deeper understanding of the quality of the event extractions produced by our method, we conducted a human evaluation. We randomly sampled 100 text snippets from each of the ACE 2005 and ERE datasets and asked three human annotators, proficient in event extraction, to compare the event descriptions generated by our proposed method with those generated by the best performing baseline model, which was the Fine-tuned BERT for Event Extraction (Devlin et al., 2019). The annotators were asked to rate the extracted events based on the following criteria:
- Accuracy: Whether the extracted event trigger and arguments are correct according to the text.
- Completeness: Whether all relevant events and arguments have been extracted.
- Coherence: Whether the natural language description of the event is clear and coherent.
The annotators were asked to indicate which model’s output they preferred for each text snippet or if they found them to be of equal quality. The results of the human evaluation are presented in Table 3.
The results of the human evaluation show a clear preference for the event descriptions generated by our proposed method on both datasets. In both ACE 2005 and ERE, a significantly higher percentage of the text snippets were judged to have better event extractions from our method compared to the fine-tuned BERT baseline. This subjective evaluation further validates the effectiveness of our approach in not only achieving higher quantitative metrics but also in producing more accurate, complete, and coherent event extractions from open-domain text.
4.5. Analysis of Performance Across Event Types
To gain insights into the capabilities of our method across different event categories, we analyzed its performance on the ACE 2005 dataset based on the predefined event types. Table 4 presents the F1 scores achieved by our proposed method and the Fine-tuned BERT baseline on a selection of representative event types.
The results in Table 4 indicate that our proposed method generally performs better than the Fine-tuned BERT baseline across various event types. The consistent improvement suggests that our approach is effective in capturing the nuances of different event categories, potentially due to the open-domain nature of our instruction tuning and the pattern induction capabilities.
4.6. Analysis of Performance on Argument Roles
We further investigated the performance of our method in identifying different argument roles within events on the ACE 2005 dataset. Table 5 shows the F1 scores for a selection of common argument roles.
The results in Table 5 demonstrate that our method also achieves higher F1 scores for various argument roles compared to the Fine-tuned BERT baseline. This suggests that our approach is effective in not only identifying event triggers but also in correctly classifying the roles of the participating entities, likely benefiting from the structured natural language descriptions generated in the first stage.
4.7. Impact of the Number of Few-Shot Examples
To assess the impact of the number of few-shot examples used in the pattern induction stage (Stage 2) of our method, we conducted experiments on the ERE dataset by varying the number of support examples provided in the prompt. Table 6 shows the overall F1 score of our method with different numbers of few-shot examples.
The results in Table 6 indicate that using a small number of few-shot examples (e.g., 3) in the pattern induction stage can lead to improved performance compared to using just one example. However, increasing the number of examples further to 5 does not seem to provide a significant additional benefit in this specific setting. This suggests that a small, carefully selected set of diverse examples might be sufficient to guide the LLM in the pattern induction process. Further research into optimal selection strategies for few-shot examples could be beneficial.
5. Conclusion
In this paper, we presented a novel two-stage approach for open-domain event extraction and pattern induction that relies solely on the capabilities of Large Language Models. Our method first utilizes instruction tuning to train an LLM to generate comprehensive natural language descriptions of events, capturing both the trigger and the roles of participating entities. The second stage then leverages a meta-learning inspired few-shot learning technique to enable the LLM to implicitly discover event patterns and identify common argument roles from these generated descriptions. Our extensive experimental evaluation on the ACE 2005 and ERE datasets demonstrated that our proposed method significantly outperforms several strong baseline models, including traditional supervised methods and other LLM-based approaches, in terms of F1 score. The results of our ablation study confirmed the importance of both the instruction tuning and the pattern induction stages for achieving optimal performance. Moreover, a human evaluation revealed a clear preference for the quality and coherence of the event descriptions generated by our method.
The key contributions of this work include a novel framework for open-domain event extraction using only LLMs, a demonstration of implicit event pattern induction through few-shot learning with generated event descriptions, and empirical evidence of the effectiveness of our approach on standard benchmark datasets. While our results are promising, future work could explore more sophisticated prompting strategies for the pattern induction stage, investigate the use of external knowledge to further enhance the accuracy and coherence of the extracted events, and extend the evaluation to a wider range of open-domain datasets and languages. Additionally, exploring methods to explicitly guide the LLM towards learning a more structured representation of the induced event patterns would be a valuable direction for future research.
References
1. Sharif, O.; Gatto, J.; Basak, M.; Preum, S.M. Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12–16, 2024; Al-Onaizan, Y.; Bansal, M.; Chen, Y., Eds.; Association for Computational Linguistics, 2024; pp. 12061–12081.
2. Ye, H.; Gui, H.; Zhang, A.; Liu, T.; Hua, W.; Jia, W. Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction. CoRR 2023, abs/2312.03022.
3. Lu, D.; Ran, S.; Tetreault, J.R.; Jaimes, A. Event Extraction as Question Generation and Answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9–14, 2023; Rogers, A.; Boyd-Graber, J.L.; Okazaki, N., Eds.; Association for Computational Linguistics, 2023; pp. 1666–1688.
4. Chen, R.; Qin, C.; Jiang, W.; Choi, D. Is a Large Language Model a Good Annotator for Event Extraction? In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, Vancouver, Canada, February 20–27, 2024; Wooldridge, M.J.; Dy, J.G.; Natarajan, S., Eds.; AAAI Press, 2024; pp. 17772–17780.
5. Zhou, Y.; Shen, J.; Cheng, Y. Weak to Strong Generalization for Large Language Models with Multi-capabilities. In Proceedings of the Thirteenth International Conference on Learning Representations, 2025.
6. Zhou, Y.; Li, X.; Wang, Q.; Shen, J. Visual In-Context Learning for Large Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, August 11–16, 2024; Association for Computational Linguistics, 2024; pp. 15890–15902.
7. Zhou, Y.; Geng, X.; Shen, T.; Tao, C.; Long, G.; Lou, J.G.; Shen, J. Thread of Thought Unraveling Chaotic Contexts. arXiv preprint arXiv:2311.08734, 2023.
8. Zhou, Y.; Zhang, J.; Chen, G.; Shen, J.; Cheng, Y. Less Is More: Vision Representation Compression for Efficient Video Generation with Large Language Models, 2024.
9. Cunha, L.F.; Silvano, P.; Campos, R.; Jorge, A. ACE-2005-PT: Corpus for Event Extraction in Portuguese. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Washington, DC, USA, July 14–18, 2024; Yang, G.H.; Wang, H.; Han, S.; Hauff, C.; Zuccon, G.; Zhang, Y., Eds.; ACM, 2024; pp. 661–666.
10. Wang, X.; Chen, Y.; Ding, N.; Peng, H.; Wang, Z.; Lin, Y.; Han, X.; Hou, L.; Li, J.; Liu, Z.; et al. MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7–11, 2022; Goldberg, Y.; Kozareva, Z.; Zhang, Y., Eds.; Association for Computational Linguistics, 2022; pp. 926–941.
11. Zhang, J.; Huang, W.; Ji, D.; Ren, Y. Globally normalized neural model for joint entity and event extraction. Information Processing & Management 2021, 58, 102636.
12. Xiao, Y.; Tan, C.; Fan, Z.; Xu, Q.; Zhu, W. Joint entity and relation extraction with a hybrid transformer and reinforcement learning based model. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020; Vol. 34, pp. 9314–9321.
13. Wang, X.D.; Weber, L.; Leser, U. Biomedical Event Extraction as Multi-turn Question Answering. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, LOUHI@EMNLP 2020, Online, November 20, 2020; Holderness, E.; Jimeno-Yepes, A.; Lavelli, A.; Minard, A.; Pustejovsky, J.; Rinaldi, F., Eds.; Association for Computational Linguistics, 2020; pp. 88–96.
14. Huang, G.; Min, Z.; Ge, Q.; Yang, Z. Towards document-level event extraction via Binary Contrastive Generation. Knowledge-Based Systems 2024, 296, 111896.
15. Zhang, N.; Ye, H.; Deng, S.; Tan, C.; Chen, M.; Huang, S.; Huang, F.; Chen, H. Contrastive information extraction with generative transformer. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021, 29, 3077–3088.
16. Xu, D.; Chen, W.; Peng, W.; Zhang, C.; Xu, T.; Zhao, X.; Wu, X.; Zheng, Y.; Wang, Y.; Chen, E. Large language models for generative information extraction: A survey. Frontiers of Computer Science 2024, 18, 186357.
17. Tuo, A.; Besançon, R.; Ferret, O.; Tourille, J. Few-Shot Event Argument Extraction Based on a Meta-Learning Approach. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, NAACL 2024, Mexico City, Mexico, June 18, 2024; Cao, Y.T.; Papadimitriou, I.; Ovalle, A.; Zampieri, M.; Ferraro, F.; Swayamdipta, S., Eds.; Association for Computational Linguistics, 2024; pp. 146–153.
18. Zhang, X.; Yang, H.; Young, E.F.Y. Attentional Transfer is All You Need: Technology-aware Layout Pattern Generation. In Proceedings of the 58th ACM/IEEE Design Automation Conference, DAC 2021, San Francisco, CA, USA, December 5–9, 2021; IEEE, 2021; pp. 169–174.
19. Wang, Z.; Li, M.; Xu, R.; Zhou, L.; Lei, J.; Lin, X.; Wang, S.; Yang, Z.; Zhu, C.; Hoiem, D.; et al. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners. In Advances in Neural Information Processing Systems 35, NeurIPS 2022, New Orleans, LA, USA, November 28–December 9, 2022; Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A., Eds.; 2022.
20. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers); Burstein, J.; Doran, C.; Solorio, T., Eds.; Association for Computational Linguistics, 2019; pp. 4171–4186.
21. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR 2019, abs/1907.11692.
22. Wang, Q.; Wang, C.; Lai, Z.; Zhou, Y. InsectMamba: State Space Model with Adaptive Composite Features for Insect Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2025; IEEE, 2025; pp. 1–5.
23. Wang, Q.; Hu, H.; Zhou, Y. MemoryMamba: Memory-Augmented State Space Model for Defect Recognition. arXiv preprint arXiv:2405.03673, 2024.
24. Zhou, Y.; Long, G. Improving Cross-modal Alignment for Text-Guided Image Inpainting. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023; pp. 3445–3456.
25. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 140:1–140:67.
26. Wornow, M.; Xu, Y.; Thapa, R.; Patel, B.S.; Steinberg, E.; Fleming, S.L.; Pfeffer, M.A.; Fries, J.A.; Shah, N.H. The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs. CoRR 2023, abs/2303.12961.
27. Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling Laws for Neural Language Models. CoRR 2020, abs/2001.08361.
28. Li, H.; Zhang, Y.; Koto, F.; Yang, Y.; Zhao, H.; Gong, Y.; Duan, N.; Baldwin, T. CMMLU: Measuring massive multitask language understanding in Chinese. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, August 11–16, 2024; Ku, L.; Martins, A.; Srikumar, V., Eds.; Association for Computational Linguistics, 2024; pp. 11260–11285.
29. Lee, J. InstructPatentGPT: Training patent language models to follow instructions with human feedback. CoRR 2024, abs/2406.16897.
30. Scao, T.L.; Fan, A.; Akiki, C.; Pavlick, E.; Ilic, S.; Hesslow, D.; Castagné, R.; Luccioni, A.S.; Yvon, F.; Gallé, M.; et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. CoRR 2022, abs/2211.05100.
Table 1. Main Experimental Results (F1 Score %).

| Model | ACE 2005 | ERE |
| --- | --- | --- |
| Supervised Model with CRF (Li et al., 2013) | 65.2 | 58.7 |
| GPT-3 (Brown et al., 2020) Zero-Shot | 63.5 | 56.1 |
| Fine-tuned BERT for Event Extraction (Devlin et al., 2019) | 67.1 | 61.0 |
| Our Proposed Method | 69.8 | 63.2 |
Table 2. Ablation Study Results (F1 Score %).

| Model Variant | ACE 2005 | ERE |
| --- | --- | --- |
| Stage 1 Only | 68.5 | 62.1 |
| Stage 2 Only (with generic prompts) | 59.3 | 53.5 |
| Our Proposed Method (Full) | 69.8 | 63.2 |
Table 3. Human Evaluation Results (Preference %).

| Preference | ACE 2005 | ERE |
| --- | --- | --- |
| Our Proposed Method | 62.5 | 65.3 |
| Fine-tuned BERT for Event Extraction (Devlin et al., 2019) | 28.7 | 25.8 |
| Equal Quality | 8.8 | 8.9 |
Table 4. Performance on Different Event Types (ACE 2005, F1 Score %).

| Event Type | Fine-tuned BERT | Our Proposed Method |
| --- | --- | --- |
| Attack | 72.3 | 75.1 |
| Meet | 68.9 | 71.5 |
| Phone-Call | 78.6 | 80.2 |
| Transport | 61.2 | 64.8 |
| Transfer-Ownership | 55.7 | 58.9 |
Table 5. Performance on Different Argument Roles (ACE 2005, F1 Score %).

| Argument Role | Fine-tuned BERT | Our Proposed Method |
| --- | --- | --- |
| Victim | 75.8 | 78.2 |
| Place | 70.1 | 73.5 |
| Time | 82.4 | 84.9 |
| Agent | 65.3 | 68.1 |
| Artifact | 59.6 | 62.4 |
Table 6. Impact of the Number of Few-Shot Examples (ERE, F1 Score %).

| Number of Few-Shot Examples | F1 Score |
| --- | --- |
| 1 | 62.5 |
| 3 | 63.2 |
| 5 | 63.0 |