Submitted: 13 September 2024
Posted: 14 September 2024
Abstract

Keywords:
MSC: 68T30; 68T50
1. Introduction
- We introduce a new task: event trigger recommendation for high-accuracy applications. By identifying as many potential events as possible and recommending candidate triggers together with their event types, we aim to reduce the proportion of cases in which auditors must re-read the text to catch omissions, thereby improving the efficiency of human–machine collaborative detection.
- We construct an event trigger recommendation model, SpETR, which combines the confidences produced in the two stages of trigger identification and trigger classification, and we design a recommendation algorithm that accounts for multiple events when outputting candidate triggers and event types. For model training, we propose a boundary smoothing method based on semantic integrity, in which the spans surrounding a hard-labeled trigger are assigned soft labels derived from the hard label. Together with a modified focal loss, this makes the confidences output by the model smoother and improves the quality of the candidate results.
- The experimental results show the following: (1) The recall rate of SpETR under different numbers of recommendations K is significantly better than that of the baseline models. When K = 3, it meets the requirement of a high recall rate without imposing an excessive burden; the recall rate on the DuEE and self-built datasets reaches 97.76% and 98.70%, respectively, which is 4.30% and 5.22% higher than the best baseline model. (2) Ten testers were invited to conduct human–machine collaborative detection in the “competition” and “politics” domains. Compared with the traditional model, the average time consumption with SpETR decreased by 30.6% and 25.8%, respectively, and the average F1 score of the annotated results increased by 5.35% and 3.48%, respectively.
2. Related Work
2.1. Event Detection
2.1.1. Span-Based Classification
2.1.2. Sequence Labeling
2.1.3. Pointer Networks
2.1.4. Others
2.2. Span Model
2.3. Human–Machine Collaborative Information Extraction
3. Task Definition
3.1. Formulation
3.2. Characteristics
- In the ideal case, (1) the number of results in the machine output set equals the number of gold events; (2) every gold event is covered by the output set; (3) the boundaries of all spans in the output set are correct; and (4) the corresponding event types are also correct. Judging (3) and (4) requires the gold triggers and their event types, which are produced by manual prior annotation. In this case, recall and precision are both 100%.
- Taking a step back, a high recall rate can be obtained as long as the combination of the real trigger and its type exists in the machine output set, from which the auditor can directly select the real result.
- Taking another step back, from the perspective of saving the auditor’s time spent checking for omissions, the overall efficiency of human–machine collaborative detection is still improved as long as each gold trigger has an overlapping, or at least nearby, span in the output set (see the sketch after this list). This helps the auditor quickly locate events and saves time.
- Generally speaking, increasing K helps to improve the recall rate, but too large a K produces too many candidate results requiring manual audit, which reduces efficiency.
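“Overlapping” here and in Algorithm 1 below can be read as character-interval intersection; a minimal check, assuming spans are inclusive (start, end) character offsets:

```python
def spans_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two inclusive character spans (start, end) share any position."""
    return a[0] <= b[1] and b[0] <= a[1]

# e.g., a recommended span (12, 14) overlaps a gold trigger at (13, 15)
assert spans_overlap((12, 14), (13, 15))
```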
3.3. Connection and Distinction
4. Methodology
4.1. Trigger Identification
4.1.1. Combined Rules Filtering
4.1.2. Span Representation
4.1.3. Binary Classification
4.2. Trigger Classification
4.3. Candidate Recommendation
**Algorithm 1.** Candidate recommendation algorithm considering multiple events.

| Step | Operation |
|---|---|
| | **Input:** candidate trigger set T; candidate trigger confidence tuple set C, whose elements are tuples (candidate trigger, event type, joint confidence). **Output:** candidate triggers and event types O for all potential events. |
| 1 | O ← ∅ # initialize the model output |
| 2 | sort T in descending order of trigger confidence |
| 3 | E ← ∅ # the set of candidate triggers representing potential events |
| 4 | **for** t **in** T **do** |
| 5 | **if** t does not overlap with any e ∈ E **then** |
| 6 | E ← E ∪ {t} |
| 7 | **end if** |
| 8 | **end for** |
| 9 | **for** e **in** E **do** |
| 10 | C_e ← ∅ # candidate triggers and their confidence tuples for one potential event |
| 11 | **for** c **in** C **do** |
| 12 | **if** the trigger of c overlaps with e **then** |
| 13 | C_e ← C_e ∪ {c} |
| 14 | **end if** |
| 15 | **end for** |
| 16 | append to O the K tuples in C_e with the highest joint confidence # for each potential event, the maximum K joint confidences are found, and their tuples (candidate trigger, event type, joint confidence) are output as recommendations |
| 17 | **end for** |
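A minimal Python sketch of this procedure, assuming each candidate carries inclusive character offsets, its stage-1 identification confidence, and stage-2 per-type confidences; taking the joint confidence as the product of the two stages is an assumption, and all names are illustrative, not the paper’s code:

```python
from typing import TypedDict

class Candidate(TypedDict):
    span: tuple[int, int]         # inclusive character offsets of the trigger
    trigger_conf: float           # stage-1 identification confidence
    type_confs: dict[str, float]  # stage-2 event-type confidences

def spans_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def recommend(candidates: list[Candidate], k: int = 3):
    """Return up to k (span, event_type, joint_conf) tuples per potential event."""
    # Highest-confidence, mutually non-overlapping spans represent potential events.
    events = []
    for cand in sorted(candidates, key=lambda c: c["trigger_conf"], reverse=True):
        if not any(spans_overlap(cand["span"], e["span"]) for e in events):
            events.append(cand)
    output = []
    for event in events:
        # Gather every (span, type, joint confidence) tuple overlapping this event.
        tuples = [
            (c["span"], t, c["trigger_conf"] * p)  # joint confidence: assumed product
            for c in candidates if spans_overlap(c["span"], event["span"])
            for t, p in c["type_confs"].items()
        ]
        output.append(sorted(tuples, key=lambda x: x[2], reverse=True)[:k])
    return output
```

With k = 3, the auditor thus sees at most three (trigger, event type) suggestions per potential event.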
4.4. Model Training
4.4.1. Semantic-Based Boundary Smoothing
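The semantic-integrity weighting is specific to this paper and is not reproduced here. Purely as orientation, a minimal sketch of the underlying boundary-smoothing idea [3], which redistributes a fraction eps of each hard label’s probability mass to spans whose boundaries lie within distance d of the gold boundaries; the uniform sharing below is the baseline scheme from [3], not SpETR’s semantic-based assignment:

```python
def smooth_boundaries(gold_spans, seq_len, eps=0.1, d=1):
    """Map spans (start, end) to soft labels: each gold span keeps 1 - eps;
    eps is shared by valid spans within boundary distance d of the gold span."""
    soft = {}
    for (s, e) in gold_spans:
        neighbors = [
            (s + ds, e + de)
            for ds in range(-d, d + 1)
            for de in range(-d, d + 1)
            if (ds, de) != (0, 0) and 0 <= s + ds <= e + de <= seq_len - 1
        ]
        soft[(s, e)] = soft.get((s, e), 0.0) + (1.0 - eps)
        if neighbors:
            for nb in neighbors:
                soft[nb] = soft.get(nb, 0.0) + eps / len(neighbors)
    return soft
```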
4.4.2. Loss Function
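The paper modifies focal loss to work with the soft labels above; that modification is not reproduced here. As a reference point, a minimal sketch of standard binary focal loss with focusing parameter gamma, which down-weights easy examples so training concentrates on hard-to-classify spans:

```python
import torch

def binary_focal_loss(p, y, gamma=2.0):
    """Binary focal loss: p are predicted probabilities in (0, 1),
    y are {0, 1} targets, gamma is the focusing parameter."""
    p = p.clamp(1e-6, 1 - 1e-6)
    loss_pos = -y * (1 - p) ** gamma * torch.log(p)
    loss_neg = -(1 - y) * p ** gamma * torch.log(1 - p)
    return (loss_pos + loss_neg).mean()
```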
5. Experiments
5.1. Experimental Setup
5.1.1. Datasets
5.1.2. Baselines
- MRC-based [40]. A description of the event type is prepended to the text as a query; the position of a trigger is determined with head and tail pointers, and its event type is then identified. Candidate results are recommended based on the trigger and event-type confidences.
- Locate and Label [33] (LL). A two-stage method that first determines the span position and then performs multiclass classification; soft positive cases are constructed according to the degree of overlap between a span and a positive case, and the model is trained with focal loss. As the model targets named entity recognition, for a fair comparison on the event trigger recommendation task, the same span representation as SpETR is adopted.
- Boundary Smooth [3] (BS). A one-stage method that directly classifies span types; its boundary smoothing method constructs soft labels by evenly distributing the confidence of hard labels to the surrounding spans. As the model targets named entity recognition, for a fair comparison on the event trigger recommendation task, the same span representation as SpETR is adopted.
- Boundary Smooth* [3] (BS*). The same as BS, except that the span representation adopts the biaffine structure proposed in the original paper. To ensure consistency and fairness in the comparison experiments, LL, BS, and BS* use the same recommendation algorithm as SpETR.
5.1.3. Implementation Details
5.1.4. Evaluation Metrics
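The headline metric in the following experiments is the recall rate at K recommendations. A gold event counts as recalled if its exact (trigger span, event type) pair appears among the top-K candidates recommended for the corresponding potential event; a minimal sketch under this reading (the exact-match rule is our assumption):

```python
def recall_at_k(gold, recommendations, k=3):
    """gold: list of (span, event_type) pairs; recommendations: parallel list
    of ranked (span, event_type, joint_conf) tuples for each gold event."""
    hits = sum(
        any((span, etype) == g for span, etype, _ in recs[:k])
        for g, recs in zip(gold, recommendations)
    )
    return hits / len(gold) if gold else 0.0
```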
5.2. Comparative Experiment
- The growth in the recall of SpETR slows significantly beyond K = 3. Error analysis found that, at this point, the vast majority of correct results have already been recalled; the small portion that is not consists of ambiguous cases or annotation errors in the dataset. Even with more recommendations, it is difficult to raise the recall rate further, and excessive recommendations may interfere with the auditors. Therefore, in practical applications, K = 3 is typically the most efficient.
- SpETR has a significant advantage in recall rate. SpETR shows a clear advantage in recall over the other baseline models on both the DuEE dataset and the self-built dataset. When K = 3, it improves by 5.30% and 11.74%, respectively, compared to the MRC-based model.
- SpETR meets the high recall rate requirement. When the number of recommendations K is set to 3, the recall rates on the DuEE and self-built datasets reach 97.76% and 98.70%, respectively. Given the presence of annotation errors in the datasets, this indicates that auditors can, in most cases, directly select the correct result from the recommendations without re-reading the text or annotating manually.
5.3. Ablation Study
- Ablation of the two-stage design. When this part is removed, trigger identification and event classification are completed in a single stage. This is simple and direct, but the confidence output by the multiclass classifier cannot reflect both trigger boundary and event type information. The results show that recall at K = 3 decreased by 9.89% and 13.48% on DuEE and the self-built dataset, respectively, indicating that the two-stage design of trigger identification followed by trigger classification produces higher-quality candidate results.
- Ablation of the semantic-based boundary smoothing method. When this part is removed, traditional hard labels are used for training in the trigger identification stage; the model is then prone to overfitting, which reduces the reliability of the output confidence. The results showed that the recall rate decreased by 4.75% and 5.22% on DuEE and the self-built dataset, respectively, indicating that the semantic-based boundary smoothing method improves candidate quality and recall through a more refined soft-label assignment.
- Ablation of the modified focal loss. When this part is removed, an ordinary binary cross-entropy loss is used in the trigger identification stage during training. The results showed that recall at K = 3 decreased by 5.25% and 6.09% on DuEE and the self-built dataset, respectively, indicating that the modified focal loss adaptively adjusts sample weights to alleviate the imbalance between positive and negative cases in the span model; ordinary binary cross-entropy lacks this property, so the model tends to be biased toward negative cases, reducing the recall rate.
- Simultaneous ablation of semantic-based boundary smoothing and the modified focal loss. When both parts are removed, traditional hard labels and a binary cross-entropy loss are used in the trigger identification stage during training. The results showed that recall at K = 3 decreased by 6.08% and 7.83% on the DuEE and self-built datasets, respectively, a larger drop than removing either alone, indicating that the two components cooperate and reinforce each other.
5.4. Case Study
5.5. Parameter Analysis
- Dropout rate of the pre-trained model (dropout_rate). This parameter is the dropout rate applied to the output representation of the pre-trained model and helps avoid overfitting during training. As the parameter varies over [0.00, 0.45], the recall rate on the DuEE and self-built datasets changes little and the curves are relatively flat, indicating that the model is robust to this parameter, which is due to SpETR’s simple and efficient model structure.
- Trigger identification threshold. In the trigger identification stage, spans with confidence greater than this threshold are regarded as candidate triggers and participate in subsequent inference and training, so the threshold has a large impact on the recall rate. As the threshold increases, the requirement on triggers becomes “stricter” and the recall rate decreases on both the DuEE and self-built datasets. The former curve is flat, indicating that few correct triggers receive confidence within [0.05, 0.5] and that those that do are evenly distributed; the latter decreases more sharply and shows two abnormal points, at 0.15 and 0.40, possibly reflecting instability caused by the small size of the dataset. This suggests that, with sufficient training data, the model maintains a good recall rate as the threshold increases, whereas in low-resource scenarios the recall rate drops relatively faster.
- Focusing coefficient of the modified focal loss. This coefficient adjusts the loss weight of hard-to-classify samples, with more ambiguous samples receiving a higher loss. As it increases, the adjustment for difficult samples becomes stronger, making the model focus more on hard-to-recognize triggers and thereby increasing the recall rate. The results on the two datasets show that the coefficient is generally positively correlated with the model’s recall rate. However, an excessively large value does not significantly improve recall and may actually reduce precision, impacting the audit efficiency in practical applications.
- Limit of span length. This parameter defines the maximum length of the spans considered during training and inference. When the limit is set to 4, the model shows a noticeable increase in recall rate on both datasets, indicating that most triggers are no longer than four characters. Increasing the limit continues to improve the recall rate up to a point; recall begins to decline once the limit exceeds 10 on the DuEE dataset and 6 on the self-built dataset. Although a larger limit recalls longer spans, it also introduces more negative examples, biasing the model toward negatives and reducing the recall rate. The limit should therefore be chosen based on prior knowledge of trigger lengths in the target domain, as in the configuration sketch below.
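For orientation, the settings from the parameter table can be gathered into a single configuration; the key names are illustrative placeholders, not the paper’s code:

```python
# Hyperparameters used in the experiments (values from the parameter table);
# key names are illustrative, not the paper's original symbols.
CONFIG = {
    "DuEE":       {"dropout_rate": 0.10, "trigger_threshold": 0.2,
                   "focal_gamma": 2, "max_span_len": 10},
    "self_built": {"dropout_rate": 0.45, "trigger_threshold": 0.2,
                   "focal_gamma": 2, "max_span_len": 6},
    "optimizer":  {"learning_rate": 1e-5, "warm_up_steps": 500,
                   "lr_decay_steps": 3000, "min_lr_rate": 1e-6},
}
```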
5.6. Human–Machine Collaborative Detection
5.6.1. Event Annotation System Integration
- Text Display and Annotation: Shows the text and trigger locations, with candidate triggers highlighted by colored blocks indicating their joint confidence levels. Annotators can manually select spans and annotate triggers and event types.
- Candidate Recommendation: Displays the model’s recommended candidate triggers and event types for each event, along with joint confidence levels for reference. Annotators confirm recommendations by clicking, with the first candidate automatically annotated to save time.
- Annotation Confirmation: Structurally displays annotated triggers and event types for easy verification by annotators.
- Annotated Text List: Shows the annotation status of each sample, allowing for quick navigation to specific samples and monitoring overall annotation progress.
5.6.2. Empirical Results
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Linguistic Data Consortium. ACE (Automatic Content Extraction) English Annotation Guidelines for Events, Version 5.4.3. Available online: https://www.ldc.upenn.edu/ (accessed on 11 August 2024).
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
- Zhu, E.; Li, J. Boundary Smoothing for Named Entity Recognition. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 7096–7108.
- Wang, S.; Yu, M.; Huang, L. The Art of Prompting: Event Detection based on Type Specific Prompts. arXiv 2022, arXiv:2204.07241.
- Xu, C.; Zeng, Z.; Duan, J.; Qi, Q.; Zhang, X. Extracting Events Using Spans Enhanced with Trigger-Argument Interaction. In Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Fuzhou, China, 17–19 November 2023.
- Li, H.; Mo, T.; Fan, H.; Wang, J.; Wang, J.; Zhang, F.; Li, W. KiPT: Knowledge-injected Prompt Tuning for Event Detection. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 1943–1952.
- Guan, Y.; Chen, J.; Lecue, F.; Pan, J.; Li, J.; Li, R. Trigger-Argument based Explanation for Event Detection. In Findings of the Association for Computational Linguistics: ACL 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023.
- Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; Zhao, J. Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, 26–31 July 2015; pp. 167–176.
- Wang, S.; Yu, M.; Chang, S.; Sun, L.; Huang, L. Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding. In Findings of the Association for Computational Linguistics: ACL 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 169–182.
- He, L.; Meng, Q.; Zhang, Q.; Duan, J.; Wang, H. Event Detection Using a Self-Constructed Dependency and Graph Convolution Network. Appl. Sci. 2023, 13, 3919.
- Liu, M.; Huang, L. Teamwork Is Not Always Good: An Empirical Study of Classifier Drift in Class-incremental Information Extraction. arXiv 2023, arXiv:2305.16559.
- Wei, K.; Zhang, Z.; Jin, L.; Guo, Z.; Li, S.; Wang, W.; Lv, J. HEFT: A History-Enhanced Feature Transfer Framework for Incremental Event Detection. Knowl.-Based Syst. 2022, 254, 109601.
- Nateras, L.G.; Dernoncourt, F.; Nguyen, T. Hybrid Knowledge Transfer for Improved Cross-Lingual Event Detection via Hierarchical Sample Selection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023.
- Yue, Z.; Zeng, H.; Lan, M.; Ji, H.; Wang, D. Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning. arXiv 2023, arXiv:2305.17373.
- Liu, J.; Sui, D.; Liu, K.; Liu, H.; Zhao, Z. Learning with Partial Annotations for Event Detection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023.
- Liu, M.; Chang, S.; Huang, L. Incremental Prompting: Episodic Memory Prompt for Lifelong Event Detection. arXiv 2022, arXiv:2204.07275.
- Zhang, C.; Cao, P.; Chen, Y.; Liu, K.; Zhang, Z.; Sun, M.; Zhao, J. Continual Few-shot Event Detection via Hierarchical Augmentation Networks. arXiv 2024, arXiv:2403.17733.
- Nguyen, T.H.; Cho, K.; Grishman, R. Joint Event Extraction via Recurrent Neural Networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016.
- Wei, Y.; Liu, S.; Lv, J.; Xi, X.; Yan, H.; Ye, W.; Mo, T.; Yang, F.; Wan, G. DESED: Dialogue-based Explanation for Sentence-level Event Detection. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2483–2493.
- Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. arXiv 2017, arXiv:1706.05075.
- Tian, C.; Zhao, Y.; Ren, L. A Chinese Event Relation Extraction Model Based on BERT. In Proceedings of the 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 25–28 May 2019.
- Cao, H.; Li, J.; Su, F.; Li, F.; Fei, H.; Wu, S.; Li, B.; Zhao, L.; Ji, D. OneEE: A One-Stage Framework for Fast Overlapping and Nested Event Extraction. arXiv 2022, arXiv:2209.02693.
- Yang, S.; Feng, D.; Qiao, L.; Kan, Z.; Li, D. Exploring Pre-Trained Language Models for Event Extraction and Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5284–5294.
- Du, X.; Cardie, C. Event Extraction by Answering (Almost) Natural Questions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 671–683.
- Chen, Y.; Chen, T.; Ebner, S.; Van Durme, B. Reading the Manual: Event Extraction as Definition Comprehension. arXiv 2020, arXiv:1912.01586.
- Li, F.; Peng, W.; Chen, Y.; Wang, Q.; Pan, L.; Lyu, Y.; Zhu, Y. Event Extraction as Multi-Turn Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 829–838.
- Wan, X.; Mao, Y.; Qi, R. Chinese Event Detection without Triggers Based on Dual Attention. Appl. Sci. 2023, 13, 4523.
- Yan, Y.; Liu, Z.; Gao, F.; Gu, J. Type Hierarchy Enhanced Event Detection without Triggers. Appl. Sci. 2023, 13, 2296.
- Dixit, K.; Al-Onaizan, Y. Span-Level Model for Relation Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5308–5314.
- Sohrab, M.G.; Miwa, M. Deep Exhaustive Model for Nested Named Entity Recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2843–2849.
- Eberts, M.; Ulges, A. Span-Based Joint Entity and Relation Extraction with Transformer Pre-Training. In Proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020), 2020; pp. 2006–2013.
- Zhu, E.; Liu, Y.; Li, J. Deep Span Representations for Named Entity Recognition. In Findings of the Association for Computational Linguistics: ACL 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023.
- Shen, Y.; Ma, X.; Tan, Z.; Zhang, S.; Wang, W.; Lu, W. Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition. arXiv 2021, arXiv:2105.06804.
- Peng, T.; Li, Z.; Zhang, L.; Du, B.; Zhao, H. FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023.
- Zheng, Q.; Wu, Y.; Wang, G.; Chen, Y.; Wu, W.; Zhang, Z.; Shi, B.; Dong, B. Exploring Interactive and Contrastive Relations for Nested Named Entity Recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 2899–2909.
- Kristjansson, T.T.; Culotta, A.; Viola, P.A.; McCallum, A. Interactive Information Extraction with Constrained Conditional Random Fields. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-04), San Jose, CA, USA, 25–29 July 2004; pp. 412–418.
- Dalvi, B.; Bhakthavatsalam, S.; Clark, C.; Clark, P.; Etzioni, O.; Fader, A.; Groeneveld, D. IKE—An Interactive Tool for Knowledge Extraction. In Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC), San Diego, CA, USA, 2016; pp. 12–17.
- Pafilis, E.; Buttigieg, P.L.; Ferrell, B.; Pereira, E.; Schnetzer, J.; Arvanitidis, C.; Jensen, L.J. EXTRACT: Interactive Extraction of Environment Metadata and Term Suggestion for Metagenomic Sample Annotation. Database 2016, 2016, baw005.
- Blankemeier, L.; Zhao, T.; Tinn, R.; Kiblawi, S.; Gu, Y.; Chaudhari, A.; Poon, H.; Zhang, S.; Wei, M.; Preston, J. Interactive Span Recommendation for Biomedical Text. In Proceedings of the 5th Clinical Natural Language Processing Workshop, Toronto, ON, Canada, 14 July 2023.
- Duan, J.; Zhang, X.; Xu, C. LwF4IEE: An Incremental Learning Method for Interactive Event Extraction. In Proceedings of the 2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Suzhou, China, 14–16 October 2022.
- Chengxi, Y.; Xuemei, T.; Hao, Y.; Qi, S.; Jun, W. HanNER: A General Framework for the Automatic Extraction of Named Entities in Ancient Chinese Corpora. J. China Soc. Sci. Tech. Inf. 2023, 42, 203–216.
- Wei, X.; Chen, Y.; Cheng, N.; Cui, X.; Xu, J.; Han, W. CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction. arXiv 2023, arXiv:2307.00769.
- Luo, R.; Xu, J.; Zhang, Y.; Ren, X.; Sun, X. PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation. arXiv 2019, arXiv:1906.11455.
- Li, X.; Li, F.; Pan, L.; Chen, Y.; Peng, W.; Wang, Q.; Lyu, Y.; Zhu, Y. DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios. In Proceedings of the 9th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2020); Springer: Cham, Switzerland, 2020.






| Datasets | Domain | Event Type | Statistic | Training Set | Valid & Test Set |
|---|---|---|---|---|---|
| DuEE | 9 | 65 | Sentences | 11,958 | 1498 |
| DuEE | 9 | 65 | Events | 13,915 | 1790 |
| Self-built | 1 | 9 | Sentences | 767 | 182 |
| Self-built | 1 | 9 | Events | 939 | 230 |
| Item | Configuration |
|---|---|
| Operating system | Ubuntu 20.04.3 LTS x86_64 |
| CPU | 40 vCPU Intel(R) Xeon(R) Platinum 8457C |
| GPU | 2 × NVIDIA L20 (48 GB) |
| Memory | 200 GB |
| Python | 3.8.10 |
| Framework | PyTorch 1.10.0 + Transformers 4.38.2 + CUDA 12.4 |
| Parameter | Description | DuEE | Self-Built |
|---|---|---|---|
| **Main parameters** | | | |
| dropout_rate | dropout rate of the pre-trained model | 0.1 | 0.45 |
| trigger_threshold | trigger identification threshold | 0.2 | 0.2 |
| focal_gamma | focusing parameter of the modified focal loss | 2 | 2 |
| max_span_len | limit of span length | 10 | 6 |
| **Optimizer parameters** | | | |
| learning_rate | initial learning rate | 1 × 10−5 | 1 × 10−5 |
| warm_up_steps | warm-up steps | 500 | 500 |
| lr_decay_steps | decay steps | 3000 | 3000 |
| min_lr_rate | minimum learning rate | 1 × 10−6 | 1 × 10−6 |
| Model | DuEE, K=1 | DuEE, K=2 | DuEE, K=3 | DuEE, K=4 | DuEE, K=5 | Self-built, K=1 | Self-built, K=2 | Self-built, K=3 | Self-built, K=4 | Self-built, K=5 |
|---|---|---|---|---|---|---|---|---|---|---|
| SpETR | 92.23 | 96.59 | 97.76 | 97.93 | 97.98 | 92.17 | 96.96 | 98.70 | 98.70 | 99.13 |
| MRC-based | 85.47 | 90.28 | 92.46 | 93.80 | 94.47 | 82.61 | 86.09 | 86.96 | 87.39 | 87.83 |
| LL | 87.87 | 90.39 | 91.95 | 93.63 | 94.13 | 89.57 | 90.87 | 93.48 | 93.91 | 93.91 |
| BS | 91.62 | 92.68 | 93.46 | 94.24 | 94.41 | 86.09 | 87.93 | 90.00 | 90.44 | 90.87 |
| BS* | 91.56 | 92.57 | 93.46 | 94.19 | 94.52 | 84.78 | 89.13 | 90.87 | 90.87 | 91.30 |
| Model | DuEE, K=1 | DuEE, K=2 | DuEE, K=3 | DuEE, K=4 | DuEE, K=5 | Self-built, K=1 | Self-built, K=2 | Self-built, K=3 | Self-built, K=4 | Self-built, K=5 |
|---|---|---|---|---|---|---|---|---|---|---|
| SpETR | 92.23 | 96.59 | 97.76 | 97.93 | 97.98 | 92.17 | 96.95 | 98.69 | 98.69 | 99.13 |
| w/o two-stage design | 85.58 | 87.54 | 87.87 | 88.04 | 88.10 | 83.04 | 84.34 | 85.21 | 85.21 | 85.65 |
| w/o semantic-based boundary smoothing | 89.77 | 91.84 | 92.29 | 93.18 | 93.46 | 90.00 | 92.17 | 93.47 | 94.34 | 95.21 |
| w/o modified focal loss | 87.87 | 91.56 | 92.51 | 92.90 | 93.12 | 85.21 | 92.17 | 92.60 | 93.04 | 93.04 |
| w/o semantic-based boundary smoothing and modified focal loss | 88.60 | 91.40 | 91.68 | 91.73 | 91.73 | 84.34 | 90.00 | 90.86 | 91.30 | 92.17 |
| Annotation Condition | Description |
|---|---|
| Full manual | The tester completed the event detection task independently. |
| Traditional model | The traditional event detection model was used to output triggers and event types for automatic annotation, with testers adjusting and manually correcting errors and omissions. |
| SpETR | SpETR was used to recommend candidate triggers and event types, and automatically annotate the first candidate. Testers prioritized selecting from candidate triggers to address errors and omissions. |