Submitted: 21 March 2025
Posted: 24 March 2025
Abstract
Keywords:
1. Introduction
- We propose a novel two-stage training approach that leverages the instruction following and in-context learning capabilities of Large Language Models for open-domain event extraction.
- We demonstrate a method for implicitly inducing event patterns and identifying argument roles directly from the LLM’s extracted event descriptions without relying on predefined schemas.
- We evaluate our approach on two benchmark event extraction datasets, ACE 2005 and ERE, where it scores 69.8 and 63.2 respectively, ahead of a fine-tuned BERT baseline (67.1 and 61.0), showcasing the potential of LLMs for flexible and scalable event extraction.
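The two stages named in these contributions can be pictured as two prompt templates: one that asks an instruction-tuned model for a free-text event description, and one that shows a few demonstrations and asks for an induced pattern. The sketch below is only an illustration under that assumption. The class `Demo`, the functions `stage1_prompt` and `stage2_prompt`, and the prompt wording are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One hypothetical demonstration: an event description and its induced pattern."""
    description: str
    pattern: str


def stage1_prompt(text: str) -> str:
    """Build an instruction-style prompt asking for a free-text event description."""
    return (
        "Instruction: Describe every event mentioned in the text, "
        "naming its trigger and participants.\n"
        f"Text: {text}\n"
        "Event description:"
    )


def stage2_prompt(demos: list[Demo], description: str) -> str:
    """Build a few-shot prompt asking for an event pattern with argument roles."""
    parts = ["Induce the event pattern and argument roles from each description."]
    for demo in demos:
        # Each demonstration pairs a description with the pattern we want imitated.
        parts.append(f"Description: {demo.description}\nPattern: {demo.pattern}")
    # The final block is left open so the model completes the pattern.
    parts.append(f"Description: {description}\nPattern:")
    return "\n\n".join(parts)
```

In such a setup the output of the first prompt would be passed as `description` to the second, so no predefined schema is needed at either step.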
2. Related Work
2.1. Event Extraction
2.2. Large Language Model
3. Method
3.1. Stage 1: Instruction Tuning for Open-Domain Event Description Generation
3.2. Stage 2: Meta-Learning Inspired Few-Shot Learning for Pattern Induction
4. Experiments
4.1. Experimental Setup
- Supervised Model with CRF (Li et al., 2013): A traditional supervised event extraction model employing Conditional Random Fields (CRF) trained on the annotated data with a predefined schema, representing a strong baseline for traditional methods.
- GPT-3 (Brown et al., 2020) Zero-Shot: The GPT-3 model prompted to perform event extraction without any task-specific fine-tuning, serving as a strong zero-shot LLM baseline.
- Fine-tuned BERT for Event Extraction (Devlin et al., 2019): A BERT model fine-tuned on the event extraction task using a more traditional approach with predefined event types and roles, representing a strong fine-tuned transformer-based baseline.
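Comparing these baselines with the proposed method requires a common score. A minimal sketch of precision, recall, and F1 over extracted items follows, assuming exact match on (trigger, event type) pairs. That matching criterion and the function name `precision_recall_f1` are assumptions made for illustration, not details confirmed by the paper.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match extraction.

    Items can be any hashable tuples, e.g. (trigger, event_type).
    """
    correct = len(predicted & gold)  # items found in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Using sets means a duplicated prediction is counted once, which is usually what exact-match scoring intends.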
4.2. Quantitative Results
4.3. Ablation Study
- Stage 1 Only: Using only the instruction-tuned LLM from the first stage to directly generate event descriptions without the pattern induction stage.
- Stage 2 Only (with generic prompts): Using a pre-trained LLM (same base model as ours) with generic prompts designed to perform pattern induction without the instruction tuning from the first stage.
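The contribution of each stage can be read off the ablation table as the drop from the full method. The short sketch below does that arithmetic with the scores reported in the ablation table. The dictionary layout and the helper `drop_from_full` are illustrative choices only.

```python
# Scores copied from the ablation table (ACE 2005, ERE).
FULL = {"ACE 2005": 69.8, "ERE": 63.2}
VARIANTS = {
    "Stage 1 Only": {"ACE 2005": 68.5, "ERE": 62.1},
    "Stage 2 Only (with generic prompts)": {"ACE 2005": 59.3, "ERE": 53.5},
}


def drop_from_full(variant: str) -> dict[str, float]:
    """Points lost by a variant relative to the full method, per dataset."""
    scores = VARIANTS[variant]
    # Round to one decimal to match the table precision and hide float noise.
    return {dataset: round(FULL[dataset] - scores[dataset], 1) for dataset in FULL}
```

Removing pattern induction costs 1.3 and 1.1 points on ACE 2005 and ERE, while removing instruction tuning costs 10.5 and 9.7 points. On these numbers the first stage accounts for most of the gain.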
4.4. Human Evaluation
- Accuracy: Whether the extracted event trigger and arguments are correct according to the text.
- Completeness: Whether all relevant events and arguments have been extracted.
- Coherence: Whether the natural language description of the event is clear and coherent.
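Assuming annotators judge these criteria by comparing two systems' outputs and recording which one they prefer, the per-system preference shares can be tallied as below. The labels, the function name `preference_shares`, and the example votes are hypothetical and are not the study's data.

```python
from collections import Counter


def preference_shares(votes: list[str]) -> dict[str, float]:
    """Convert a list of preference labels into percentage shares."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    total = len(votes)
    # One decimal place, matching the precision used in the results tables.
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up votes for illustration: 6 for "ours", 3 for "baseline", 1 tie.
example_votes = ["ours"] * 6 + ["baseline"] * 3 + ["equal"]
example_shares = preference_shares(example_votes)
```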
4.5. Analysis of Performance Across Event Types
4.6. Analysis of Performance on Argument Roles
4.7. Impact of the Number of Few-Shot Examples
5. Conclusion
References
- Sharif, O.; Gatto, J.; Basak, M.; Preum, S.M. Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024; Al-Onaizan, Y.; Bansal, M.; Chen, Y., Eds. Association for Computational Linguistics, 2024, pp. 12061–12081.
- Ye, H.; Gui, H.; Zhang, A.; Liu, T.; Hua, W.; Jia, W. Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction. CoRR 2023, abs/2312.03022, [2312.03022].
- Lu, D.; Ran, S.; Tetreault, J.R.; Jaimes, A. Event Extraction as Question Generation and Answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023; Rogers, A.; Boyd-Graber, J.L.; Okazaki, N., Eds. Association for Computational Linguistics, 2023, pp. 1666–1688.
- Chen, R.; Qin, C.; Jiang, W.; Choi, D. Is a Large Language Model a Good Annotator for Event Extraction? In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada; Wooldridge, M.J.; Dy, J.G.; Natarajan, S., Eds. AAAI Press, 2024, pp. 17772–17780.
- Zhou, Y.; Shen, J.; Cheng, Y. Weak to Strong Generalization for Large Language Models with Multi-capabilities. In Proceedings of the Thirteenth International Conference on Learning Representations, 2025.
- Zhou, Y.; Li, X.; Wang, Q.; Shen, J. Visual In-Context Learning for Large Vision-Language Models. In Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024. Association for Computational Linguistics, 2024, pp. 15890–15902.
- Zhou, Y.; Geng, X.; Shen, T.; Tao, C.; Long, G.; Lou, J.G.; Shen, J. Thread of thought unraveling chaotic contexts. arXiv preprint arXiv:2311.08734 2023.
- Zhou, Y.; Zhang, J.; Chen, G.; Shen, J.; Cheng, Y. Less Is More: Vision Representation Compression for Efficient Video Generation with Large Language Models, 2024.
- Cunha, L.F.; Silvano, P.; Campos, R.; Jorge, A. ACE-2005-PT: Corpus for Event Extraction in Portuguese. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Washington DC, USA, July 14-18, 2024; Yang, G.H.; Wang, H.; Han, S.; Hauff, C.; Zuccon, G.; Zhang, Y., Eds. ACM, 2024, pp. 661–666.
- Wang, X.; Chen, Y.; Ding, N.; Peng, H.; Wang, Z.; Lin, Y.; Han, X.; Hou, L.; Li, J.; Liu, Z.; et al. MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022; Goldberg, Y.; Kozareva, Z.; Zhang, Y., Eds. Association for Computational Linguistics, 2022, pp. 926–941.
- Zhang, J.; Huang, W.; Ji, D.; Ren, Y. Globally normalized neural model for joint entity and event extraction. Information Processing & Management 2021, 58, 102636.
- Xiao, Y.; Tan, C.; Fan, Z.; Xu, Q.; Zhu, W. Joint entity and relation extraction with a hybrid transformer and reinforcement learning based model. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol. 34, pp. 9314–9321.
- Wang, X.D.; Weber, L.; Leser, U. Biomedical Event Extraction as Multi-turn Question Answering. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, LOUHI@EMNLP 2020, Online, November 20, 2020; Holderness, E.; Jimeno-Yepes, A.; Lavelli, A.; Minard, A.; Pustejovsky, J.; Rinaldi, F., Eds. Association for Computational Linguistics, 2020, pp. 88–96.
- Huang, G.; Min, Z.; Ge, Q.; Yang, Z. Towards document-level event extraction via Binary Contrastive Generation. Knowledge-Based Systems 2024, 296, 111896.
- Zhang, N.; Ye, H.; Deng, S.; Tan, C.; Chen, M.; Huang, S.; Huang, F.; Chen, H. Contrastive information extraction with generative transformer. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021, 29, 3077–3088.
- Xu, D.; Chen, W.; Peng, W.; Zhang, C.; Xu, T.; Zhao, X.; Wu, X.; Zheng, Y.; Wang, Y.; Chen, E. Large language models for generative information extraction: A survey. Frontiers of Computer Science 2024, 18, 186357.
- Tuo, A.; Besançon, R.; Ferret, O.; Tourille, J. Few-Shot Event Argument Extraction Based on a Meta-Learning Approach. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, NAACL 2024, Mexico City, Mexico, June 18, 2024; Cao, Y.T.; Papadimitriou, I.; Ovalle, A.; Zampieri, M.; Ferraro, F.; Swayamdipta, S., Eds. Association for Computational Linguistics, 2024, pp. 146–153.
- Zhang, X.; Yang, H.; Young, E.F.Y. Attentional Transfer is All You Need: Technology-aware Layout Pattern Generation. In Proceedings of the 58th ACM/IEEE Design Automation Conference, DAC 2021, San Francisco, CA, USA, December 5-9, 2021. IEEE, 2021, pp. 169–174.
- Wang, Z.; Li, M.; Xu, R.; Zhou, L.; Lei, J.; Lin, X.; Wang, S.; Yang, Z.; Zhu, C.; Hoiem, D.; et al. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022; Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A., Eds., 2022.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers); Burstein, J.; Doran, C.; Solorio, T., Eds. Association for Computational Linguistics, 2019, pp. 4171–4186.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR 2019, abs/1907.11692, [1907.11692].
- Wang, Q.; Wang, C.; Lai, Z.; Zhou, Y. InsectMamba: State Space Model with Adaptive Composite Features for Insect Recognition. In Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025). IEEE, 2025, pp. 1–5.
- Wang, Q.; Hu, H.; Zhou, Y. Memorymamba: Memory-augmented state space model for defect recognition. arXiv preprint arXiv:2405.03673 2024.
- Zhou, Y.; Long, G. Improving Cross-modal Alignment for Text-Guided Image Inpainting. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023, pp. 3445–3456.
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 140:1–140:67.
- Wornow, M.; Xu, Y.; Thapa, R.; Patel, B.S.; Steinberg, E.; Fleming, S.L.; Pfeffer, M.A.; Fries, J.A.; Shah, N.H. The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs. CoRR 2023, abs/2303.12961, [2303.12961].
- Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling Laws for Neural Language Models. CoRR 2020, abs/2001.08361, [2001.08361].
- Li, H.; Zhang, Y.; Koto, F.; Yang, Y.; Zhao, H.; Gong, Y.; Duan, N.; Baldwin, T. CMMLU: Measuring massive multitask language understanding in Chinese. In Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024; Ku, L.; Martins, A.; Srikumar, V., Eds. Association for Computational Linguistics, 2024, pp. 11260–11285.
- Lee, J. InstructPatentGPT: Training patent language models to follow instructions with human feedback. CoRR 2024, abs/2406.16897, [2406.16897].
- Scao, T.L.; Fan, A.; Akiki, C.; Pavlick, E.; Ilic, S.; Hesslow, D.; Castagné, R.; Luccioni, A.S.; Yvon, F.; Gallé, M.; et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. CoRR 2022, abs/2211.05100, [2211.05100].

| Model | ACE 2005 | ERE |
|---|---|---|
| Supervised Model with CRF (Li et al., 2013) | 65.2 | 58.7 |
| GPT-3 (Brown et al., 2020) Zero-Shot | 63.5 | 56.1 |
| Fine-tuned BERT for Event Extraction (Devlin et al., 2019) | 67.1 | 61.0 |
| Our Proposed Method | 69.8 | 63.2 |

| Model Variant | ACE 2005 | ERE |
|---|---|---|
| Stage 1 Only | 68.5 | 62.1 |
| Stage 2 Only (with generic prompts) | 59.3 | 53.5 |
| Our Proposed Method (Full) | 69.8 | 63.2 |

| Preference | ACE 2005 (%) | ERE (%) |
|---|---|---|
| Our Proposed Method | 62.5 | 65.3 |
| Fine-tuned BERT for Event Extraction (Devlin et al., 2019) | 28.7 | 25.8 |
| Equal Quality | 8.8 | 8.9 |

| Event Type | Fine-tuned BERT | Our Proposed Method |
|---|---|---|
| Attack | 72.3 | 75.1 |
| Meet | 68.9 | 71.5 |
| Phone-Call | 78.6 | 80.2 |
| Transport | 61.2 | 64.8 |
| Transfer-Ownership | 55.7 | 58.9 |

| Argument Role | Fine-tuned BERT | Our Proposed Method |
|---|---|---|
| Victim | 75.8 | 78.2 |
| Place | 70.1 | 73.5 |
| Time | 82.4 | 84.9 |
| Agent | 65.3 | 68.1 |
| Artifact | 59.6 | 62.4 |

| Number of Few-Shot Examples | F1 Score |
|---|---|
| 1 | 62.5 |
| 3 | 63.2 |
| 5 | 63.0 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).