Submitted: 21 June 2026
Posted: 22 June 2026
Abstract
Keywords:
1. Introduction
2. Tasks in Multi-Party Dialogue
2.1. Dialogue Understanding
2.2. Dialogue Structure Modeling
2.3. Response and Generation
2.4. Role and Participant Modeling
2.5. Summarization and Decision Support
2.6. Advanced and Emerging Tasks
3. Methods in Multi-Party Dialogue
3.1. From Statistical Models to Neural Networks
3.2. Pre-trained Models and Graph Reasoning
3.3. Large Language Models and Multi-Agent Paradigms
4. Datasets
4.1. Structure and Disentanglement
4.2. Emotion and Social Relationship
4.3. Task-Oriented and Open-Domain
4.4. Multimodal and Naturalistic Meetings
5. Evaluation
5.1. Automatic Generation Metrics
5.2. Classification and Selection Metrics
5.3. Structural and Discourse Metrics
5.4. Human Evaluation
6. Challenges and Opportunities
6.1. Combinatorial Complexity vs. Social Cognition
6.2. The Structural-Semantic Divide vs. Neuro-Symbolic Fusion
6.3. The Evaluation Conundrum vs. Holistic Benchmarks
6.4. The Data and Modality Bottleneck vs. Multimodal Grounding
7. Conclusion
Limitations
| Method | Domain/Task | Datasets | Metrics | Headline Result | Code |
|---|---|---|---|---|---|
| Statistical and Early Neural Era (pre-2020) | | | | | |
| Feature-driven Seg. Galley et al. (2003) | Topic seg. | ICSI | ; WD | =23.0 | LDC |
| Bayesian Models Purver et al. (2006) | Topic seg. | ICSI | ; WD | =28.9 | None |
| Backchannel Heylen and op den Akker (2007) | Interaction analysis | AMI | | =0.14 | None |
| Sub-dialogue Det. Fernández et al. (2008) | Decision detect. | AMI | P; R; F1 | F1=34 | Stanford |
| Social Dim. Pred. Laskowski et al. (2008) | Social structure | AMI; ICSI | RERR | 37 to 67% | None |
| Directed Graphs Bui et al. (2009) | Decision detect. | AMI | P; R; F1 | F1=0.55 | BNT |
| ILP Disentangle Mayfield et al. (2012) | Structure pred. | Cancer Support | MAF; | Acc=0.78; =0.60 | None |
| Random Walk Chen and Metze (2012) | Meeting sum. | SmartNotes | ROUGE | R-1=49.79 | None |
| Argviz Nguyen et al. (2013) | Topic analysis | CNN Crossfire | Qualitative | Qualitative | Google toolkit |
| Dep. Parser Afantenos et al. (2015) | Structure parsing | STAC | F1 | 68.0 (unlabeled) | None |
| RL Trading Hiraoka et al. (2015) | Negotiation | Sim. trading | Avg. reward | Better than baseline | None |
| Static+Dynamic Ouchi and Tsuboi (2016) | ADR + RES sel. | Self-built | ADR; RES | ADR=68.54; RES=78.64 | None |
| Neural Speaker Meng et al. (2018) | Speaker class. | Self-built | F1; MRR | Macro F1=44.25 | Google Sites |
| W2W Le et al. (2019) | Addressee ID | Ubuntu IRC | P@n; Len-n | Len-5=80.86 | None |
| ICRED Liu et al. (2019) | Response gen. | RGMPC | BLEU; ROUGE | Length=11.34 | fasttext |
| ML Models Gatti de Bayser et al. (2019) | Turn-taking pred. | MultiWoZ; Finch | Accuracy | Acc=86.38 | gensim |
| Entity-centric Aina et al. (2019) | Entity linking | Self-built | F1; Accuracy | F1=52.5; Acc=77.6 | GitHub |
| Pre-trained Model and Graph Era (2020–2023) | | | | | |
| Multi-view BART Chen and Yang (2020) | Summarization | SAMSum | ROUGE | R-1=0.493 | GT-SALT (GitHub) |
| Topic-BERT Wang et al. (2020) | Selection; topics | Ubuntu | Recall; MRR | R@10=97.0 | salesforce (GitHub) |
| Turn-taking Annot. Enomoto et al. (2020) | Turn analysis | Chiba | POS; prosody | Qualitative | None |
| Pseudo-SSL Li and Zhao (2021) | Dialogue QA | FriendsQA; Molweni | EM; F1 | EM=58.0; F1=72.9 | EricLee8 (GitHub) |
| Cross-domain Trans. Liu and Chen (2021) | Discourse parsing | STAC; Molweni | F1 (link); L+R | Link=80.2 | HuggingFace |
| MPC-BERT Gu et al. (2021b) | Addressee; speaker; selection | Ubuntu IRC | P@1; Acc | P@1=98.31; Acc=92.42 | JasonForJoy (GitHub) |
| ERMC Sun et al. (2021) | Emotion recog. | MELD; EmoryNLP | F1 | Avg F1=64.22 | google-research/bert |
| HeterMPC Gu et al. (2022) | Response gen. | Ubuntu IRC | BLEU; ROUGE | BLEU-1=12.61 | lxchtan (GitHub) |
| SOND Du et al. (2022) | Speaker diarization | AliMeeting | DER | 4.46% | yufan-aslp (GitHub) |
| SDMPED Zhu et al. (2022) | Empathetic gen. | MPED | ROUGE; BLEU | ROUGE-L=12.87 | GDPR |
| PersonaTKG (GCN) Ju et al. (2022) | Persona gen. | HLA-Chat++ | PPL; BLEU; Dist | PPL=109.72 | NEU-DataMining |
| User-Aware Park et al. (2022) | Disruptive detect. | ECOJOURNEYS | AUC; PR-AUC | AUC=84.80 | gingerit |
| Hier. VAE Sia et al. (2022) | Argument pred. | CMV | AUC | AUC=69.7 | GitHub |
| E2E Minuting Bhatnagar et al. (2022) | Meeting sum. | XSum; SAMSum | ROUGE; BLEU | R1=45.0; BLEU=7.07 | GitHub |
| PF Li and Zhao (2023) | Response gen. | Ubuntu IRC | BLEU; METEOR | BLEU-1=12.31 | EricLee8/MPDRG |
| ELECTRA-EMVI Li et al. (2023) | Parsing; QA | Molweni | ; | =91.78 | EricLee8/MPD_EMVI |
| MPC-BERT+GIFT Gu et al. (2023) | Addressee; speaker; sel. | Ubuntu IRC | @1 | @1=95.04 | JasonForJoy (GitHub) |
| MADNet Gu et al. (2023) | MPC generation | Ubuntu IRC | BLEU; METEOR | BLEU-1=11.82 | coco-caption |
| RARM Zhu et al. (2023) | Addressee recog. | Ubuntu IRC | ID/OD-AN | Overall=85.1 | None |
| PFT-Prompt Addlesee et al. (2023) | Goal-tracking; intent-slot | EU SPRING | Accuracy | Acc=62.32/69.57 | AddleseeHQ/mpgt-eval |
| LLM and Multi-Agent Era (2024–2026) | | | | | |
| LLMs zero/few-shot Martinenghi et al. (2024) | Dialogue acts | STAC | Acc; F1 | Acc=69.1; F1=71.6 | GitHub |
| Persona-HeterMPC Mahajan and Shaikh (2024) | Persona gen. | Persona-MPC | BLEU; METEOR | BLEU-1=12.47 | NEU-DataMining |
| RL-TRC Fan et al. (2024) | Response gen. | Ubuntu IRC | BLEU; METEOR | BLEU-1=13.66 | MaartenGr/KeyBERT |
| MuPaS Wang et al. (2024) | Generation; speaker pred. | Friends; GoT | Auto + human | GSM8K=43.14 | HuggingFace |
| DDPE Liu et al. (2025) | Discourse parsing | Molweni; STAC | Link; Link+Rel | Link=87.6; L+R=62.9 | Shannanliu/DDPE |
| LIMN Zhou et al. (2025) | Dialogue QA | SQuAD2; Molweni | EM; F1 | EM=60.2 (Molweni) | None |
| CMR Hu et al. (2025) | Response gen. | Friends; Ubuntu IRC | F1; BLEU | F1=13.43 | None |
| RL Fine-tuning Kiruluta et al. (2025) | Multi-turn align.; CoT | ChatEval | Latency | 35ms (±3) | None |
| DICE-BENCH Jang et al. (2025) | Tool-calling | DICE-BENCH | DICE-Score | 3.6444 | snuhcc/DICE-Bench |
| SS-MPC Jang et al. (2025) | Response gen. | Ubuntu IRC | BLEU; ROUGE-L | BLEU-1=15.60 | GitHub |
| BOLT Castillo-López et al. (2025) | Intent recog. | MIntRec2.0; MPGT | Acc; WF1 | ACC=41.22/89.47 | None |
| DRCR Cao et al. (2026) | MPD generation | Ubuntu IRC-16/19 | BLEU; METEOR | BLEU-1=16.04 | None |
| Context-aware TT Bhagtani et al. (2026) | Turn-taking | AMI; Friends; SPGI | Acc; F1 | 61.03/60.54/64.45 | GitHub |
| MPCEval Zhang et al. (2026) | Multi-party eval | DeliData; MPDD | DNR; IR; PF | GPT-4-Turbo results | GitHub |
| EverMemBench Hu et al. (2026) | Memory eval | EverMemBench | Recall; Acc | 37.44/72.61 | GitHub |
| SyntheticMPC Penzo et al. (2026) | Synth. generation | WMPC | Constraints | All=77.72 | dhfbk (GitHub) |
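
Several rows in the table above report ranking metrics for addressee recognition and response selection (P@1, R@10, MRR). As a minimal sketch of how such scores are commonly computed, assuming each test context has exactly one gold candidate and the model returns a ranked candidate list, the following Python snippet may help. The function names (`recall_at_k`, `reciprocal_rank`, `evaluate_ranking`) and the toy data are illustrative assumptions and are not taken from any of the surveyed systems.

```python
from typing import Dict, List, Sequence, Tuple


def recall_at_k(ranked: Sequence[str], gold: str, k: int) -> float:
    """Return 1.0 if the gold candidate is among the top-k ranked items, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold candidate (1-indexed), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def evaluate_ranking(
    examples: List[Tuple[Sequence[str], str]], k: int = 1
) -> Dict[str, float]:
    """Average Recall@k and MRR over (ranked_candidates, gold) pairs."""
    if not examples:
        raise ValueError("at least one example is required")
    n = len(examples)
    recall = sum(recall_at_k(ranked, gold, k) for ranked, gold in examples) / n
    mrr = sum(reciprocal_rank(ranked, gold) for ranked, gold in examples) / n
    return {"recall@k": recall, "mrr": mrr}


if __name__ == "__main__":
    # Hypothetical toy data: three contexts, each with a ranked candidate list.
    toy = [
        (["a", "b", "c"], "a"),  # gold ranked first
        (["a", "b", "c"], "b"),  # gold ranked second
        (["a", "b", "c"], "z"),  # gold not retrieved
    ]
    print(evaluate_ranking(toy, k=1))
    print(evaluate_ranking(toy, k=2))
```

With a single gold candidate per context, P@1 and R@1 coincide, so the same helper covers both. Individual papers may differ in candidate-pool size and tie handling, so scores are only comparable under matching protocols.
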
References
- Jiang, J., S. Wang, Q. Li, L. Kong, and C. Wu. 2023. A cognitive stimulation dialogue system with multi-source knowledge fusion for elders with cognitive impairment. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1, pp. 10628–10640.
- Vaswani, A., N.M. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, and I. Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems.
- Ishizaki, M., and T. Kato. 1998. Exploring the characteristics of multi-party dialogues. In Proceedings of ACL ’98/COLING ’98, USA, pp. 583–589.
- Sapkota, S., M.S. Hasan, M. Shah, and S. Karmaker. 2025. Multi-Party Conversational Agents: A Survey. arXiv:2505.18845.
- Martínez-Hinarejos, C.D., V. Tamarit, and J.M. Benedí. 2010. Evaluation of HMM-based Models for the Annotation of Unsegmented Dialogue Turns. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta, edited by N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner and D. Tapias.
- Shang, G., A. Tixier, M. Vazirgiannis, and J.P. Lorré. 2020. Speaker-change Aware CRF for Dialogue Act Classification. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), edited by D. Scott, N. Bel and C. Zong, pp. 450–464.
- Zhu, Y., Z. Yang, H. Meng, B. Li, G. Levow, and I. King. 2010. Using finite state machines for evaluating spoken dialog systems. In Proceedings of the 2010 IEEE Spoken Language Technology Workshop, pp. 478–483.
- Mangrulkar, S., S. Shrivastava, V. Thenkanidiyoor, and D. Aroor Dinesh. 2018. A Context-aware Convolutional Natural Language Generation model for Dialogue Systems. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, edited by K. Komatani, D. Litman, K. Yu, A. Papangelis, L. Cavedon and M. Nakano, pp. 191–200.
- Wen, T.H., M. Gašić, N. Mrkšić, P.H. Su, D. Vandyke, and S. Young. 2015. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, edited by L. Màrquez, C. Callison-Burch and J. Su, pp. 1711–1721.
- Skantze, G. 2017. Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Saarbrücken, Germany, edited by K. Jokinen, M. Stede, D. DeVault and A. Louis, pp. 220–230.
- Wang, G., K. Zhang, J. Jiang, C. Wang, H. Bi, H. Liang, Z. Qi, Y. Huang, Y. Li, and X. Yang. 2026. Human–large language model collaboration in clinical medicine: a systematic review and meta-analysis. npj Digital Medicine.
- Addlesee, A., W. Sieińska, N. Gunson, D. Hernández García, C. Dondrup, and O. Lemon. 2023. Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering. arXiv:2308.15231.
- Penzo, N., M. Sajedinia, B. Lepri, S. Tonelli, and M. Guerini. 2024. Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, edited by Y. Al-Onaizan, M. Bansal and Y.N. Chen, pp. 11210–11233.
- Castillo-López, G., G. de Chalendar, and N. Semmar. 2025. Intent Recognition and Out-of-Scope Detection using LLMs in Multi-party Conversations. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, edited by F. Béchet, F. Lefèvre, N. Asher, S. Kim and T. Merlin, pp. 504–512.
- Sun, Y., N. Yu, and G. Fu. 2021. A Discourse-Aware Graph Neural Network for Emotion Recognition in Multi-Party Conversation. In Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, edited by M.F. Moens, X. Huang, L. Specia and S.W.t. Yih, pp. 2949–2958.
- Martinenghi, A., G. Donabauer, S. Amenta, S. Bursic, M. Giudici, U. Kruschwitz, F. Garzotto, and D. Ognibene. 2024. LLMs of Catan: Exploring Pragmatic Capabilities of Generative Chatbots Through Prediction and Classification of Dialogue Acts in Boardgames’ Multi-party Dialogues. In Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024, Torino, Italia, edited by C. Madge, J. Chamberlain, K. Fort, U. Kruschwitz and S. Lukin, pp. 107–118.
- Mayfield, E., D. Adamson, and C. Penstein Rosé. 2012. Hierarchical Conversation Structure Prediction in Multi-Party Chat. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Seoul, South Korea, edited by G.G. Lee, J. Ginzburg, C. Gardent and A. Stent, pp. 60–69.
- Li, Y., and H. Zhao. 2023. EM Pre-training for Multi-party Dialogue Response Generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1, Toronto, Canada, pp. 92–103.
- Li, Y., and H. Zhao. 2021. Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension. In Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, edited by M.F. Moens, X. Huang, L. Specia and S.W.t. Yih, pp. 2053–2063.
- Park, K., H. Sohn, W. Min, B. Mott, K. Glazewski, C.E. Hmelo-Silver, and J. Lester. 2022. Disruptive Talk Detection in Multi-Party Dialogue within Collaborative Learning Environments with a Regularized User-Aware Network. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, Edinburgh, UK, edited by O. Lemon, D. Hakkani-Tur, J.J. Li, A. Ashrafzadeh, D.H. Garcia, M. Alikhani, D. Vandyke and O. Dušek, pp. 490–499.
- Sia, S., K. Jaidka, H. Ahuja, N. Chhaya, and K. Duh. 2022. Offer a Different Perspective: Modeling the Belief Alignment of Arguments in Multi-party Debates. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, edited by Y. Goldberg, Z. Kozareva and Y. Zhang, pp. 11939–11950.
- Gatti de Bayser, M., P.R. Cavalin, C.S. Pinhanez, and B. Zadrozny. 2019. Learning Multi-Party Turn-Taking Models from Dialogue Logs. arXiv:1907.02090.
- Laskowski, K. 2010. Modeling Norms of Turn-Taking in Multi-Party Conversation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, edited by J. Hajič, S. Carberry, S. Clark and J. Nivre, pp. 999–1008.
- Enomoto, M., Y. Den, and Y. Ishimoto. 2020. A Conversation-Analytic Annotation of Turn-Taking Behavior in Japanese Multi-Party Conversation and its Preliminary Analysis. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, edited by N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, et al., pp. 644–652.
- Wang, X., N. Xi, T. Chen, Q. Gu, Y. Zhao, X. Chen, Z. Jiang, Y. Chen, and L. Ji. 2024. Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation. arXiv preprint.
- Purver, M., K.P. Körding, T.L. Griffiths, and J.B. Tenenbaum. 2006. Unsupervised Topic Modelling for Multi-Party Spoken Discourse. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, pp. 17–24.
- Galley, M., K.R. McKeown, E. Fosler-Lussier, and H. Jing. 2003. Discourse Segmentation of Multi-Party Conversation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 562–569.
- Nguyen, V.A., Y. Hu, J. Boyd-Graber, and P. Resnik. 2013. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. In Proceedings of the 2013 NAACL HLT Demonstration Session, Atlanta, Georgia, edited by C. Dyer and D. Higgins, pp. 36–39.
- Wang, W., S.C. Hoi, and S. Joty. 2020. Response Selection for Multi-Party Conversations with Dynamic Topic Tracking. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 6581–6591.
- Afantenos, S., E. Kow, N. Asher, and J. Perret. 2015. Discourse parsing for multi-party chat dialogues. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 928–937.
- Li, Y., X. Huang, W. Bi, and H. Zhao. 2023. Pre-training Multi-party Dialogue Models with Latent Discourse Inference. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1, Toronto, Canada, pp. 9584–9599.
- Liu, Z., and N.F. Chen. 2021. Improving Multi-Party Dialogue Discourse Parsing via Domain Integration. arXiv:2110.04526.
- Liu, S., P. Li, Y. Fan, and Q. Zhu. 2025. Enhancing Multi-party Dialogue Discourse Parsing with Explanation Generation. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, UAE, edited by O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B.D. Eugenio and S. Schockaert, pp. 1531–1544.
- Fernández, R., M. Frampton, P. Ehlen, M. Purver, and S. Peters. 2008. Modelling and Detecting Decisions in Multi-party Dialogue. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, Columbus, Ohio, edited by D. Schlangen and B.A. Hockey, pp. 156–163.
- Bui, T., M. Frampton, J. Dowding, and S. Peters. 2009. Extracting Decisions from Multi-Party Dialogue Using Directed Graphical Models and Semantic Similarity. In Proceedings of the SIGDIAL 2009 Conference, London, UK, edited by P. Healey, R. Pieraccini, D. Byron, S. Young and M. Purver, pp. 235–243.
- Kiruluta, A., A. Lemos, and P. Burity. 2025. History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM. arXiv:2506.11108.
- Fan, Y., P. Li, and Q. Zhu. 2024. Improving Multi-party Dialogue Generation via Topic and Rhetorical Coherence. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, pp. 3240–3253.
- Gu, J.C., C.H. Tan, C. Tao, Z.H. Ling, H. Hu, X. Geng, and D. Jiang. 2022. HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Volume 1, Dublin, Ireland, edited by S. Muresan, P. Nakov and A. Villavicencio, pp. 5086–5097.
- Gu, J.C., Z. Ling, Q. Liu, C. Liu, and G. Hu. 2023. GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1, Toronto, Canada, edited by A. Rogers, J. Boyd-Graber and N. Okazaki, pp. 11645–11658.
- Hu, Z., Q. He, R. Li, M. Zhao, and L. Wang. 2025. Advancing Multi-Party Dialogue Framework with Speaker-ware Contrastive Learning.
- Liu, C., K. Liu, S. He, Z. Nie, and J. Zhao. 2019. Incorporating Interlocutor-Aware Context into Response Generation on Multi-Party Chatbots. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Hong Kong, China, edited by M. Bansal and A. Villavicencio, pp. 718–727.
- Tan, C.H., J.C. Gu, and Z.H. Ling. 2023. Is ChatGPT a Good Multi-Party Conversation Solver? In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, edited by H. Bouamor, J. Pino and K. Bali, pp. 4905–4915.
- Gu, J.C., C. Tao, Z. Ling, C. Xu, X. Geng, and D. Jiang. 2021a. MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Volume 1, Online, edited by C. Zong, F. Xia, W. Li and R. Navigli, pp. 3682–3692.
- Gu, J.C., C. Tao, Z. Ling, C. Xu, X. Geng, and D. Jiang. 2021b. MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding. arXiv:2106.01541.
- Ouchi, H., and Y. Tsuboi. 2016. Addressee and Response Selection for Multi-Party Conversation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 2133–2143.
- Ju, D., S. Feng, P. Lv, D. Wang, and Y. Zhang. 2022. Learning to Improve Persona Consistency in Multi-party Dialogue Generation via Text Knowledge Enhancement. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, edited by N. Calzolari, C.R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.S. Choi, P.M. Ryu, H.H. Chen, L. Donatelli, H. Ji, et al., pp. 298–309.
- Mahajan, K., and S. Shaikh. 2024. Persona-aware Multi-party Conversation Response Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italia, edited by N. Calzolari, M.Y. Kan, V. Hoste, A. Lenci, S. Sakti and N. Xue, pp. 12712–12723.
- Zhu, L., Z. Zhang, J. Wang, H. Wang, H. Wu, and Z. Yang. 2022. Multi-Party Empathetic Dialogue Generation: A New Task for Dialog Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Volume 1, Dublin, Ireland, pp. 298–307.
- Jiang, J., Y. Chen, P. Chen, K. Liu, J. Zhou, Z. Zhu, H. Hu, F. Ma, Q. Tian, and C. Wu. 2026. A Principle-Driven Adaptive Policy for Group Cognitive Stimulation Dialogue for Elderly with Cognitive Impairment.
- Jang, K., D. Lee, K. Kim, D. Heo, T. Lee, W. Kim, and B. Suh. 2025. DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
- Zhou, J., S. Wang, D. Deng, J. Lu, J. Su, Q. Li, J. Gao, H. Wu, J. Jiang, L. Kong, et al. 2026. ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation. arXiv:2602.07883.
- Heylen, D., and R. op den Akker. 2007. Computing Backchannel Distributions in Multi-Party Conversations. In Proceedings of the Workshop on Embodied Language Processing, Prague, Czech Republic, edited by J. Cassell and D. Heylen, pp. 17–24.
- Le, R., W. Hu, M. Shang, Z. You, L. Bing, D. Zhao, and R. Yan. 2019. Who Is Speaking to Whom? Learning to Identify Utterance Addressee in Multi-Party Conversations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 1909–1919.
- Zhu, P., W. Zhou, K. Zhang, Y. Ma, and H. Chen. 2023. Robust Learning for Multi-party Addressee Recognition with Discrete Addressee Codebook. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 2, Toronto, Canada, edited by A. Rogers, J. Boyd-Graber and N. Okazaki, pp. 571–578.
- Du, Z., S. Zhang, S. Zheng, and Z.J. Yan. 2022. Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, edited by Y. Goldberg, Z. Kozareva and Y. Zhang, pp. 7458–7469.
- Meng, Z., L. Mou, and Z. Jin. 2018. Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, edited by N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, et al.
- Laskowski, K., M. Ostendorf, and T. Schultz. 2008. Modeling Vocal Interaction for Text-Independent Participant Characterization in Multi-Party Conversation. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, Columbus, Ohio, edited by D. Schlangen and B.A. Hockey, pp. 148–155.
- Chen, J., and D. Yang. 2020. Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 4106–4118.
- Chen, Y.N., and F. Metze. 2012. Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montréal, Canada, edited by E. Fosler-Lussier, E. Riloff and S. Bangalore, pp. 377–381.
- Bhatnagar, A., N. Bhavsar, M. Singh, and P. Motlicek. 2022. An End-to-End Multilingual System for Automatic Minuting of Multi-Party Dialogues. In Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation, Manila, Philippines, pp. 582–589.
- Zhou, S., R. Zhao, Z. Zhou, H. Yi, X. Zheng, and H. Wang. 2025. Enhancing Extractive Question Answering in Multiparty Dialogues with Logical Inference Memory Network. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, UAE, edited by O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B.D. Eugenio and S. Schockaert, pp. 8725–8738.
- Hiraoka, T., K. Georgila, E. Nouri, D. Traum, and S. Nakamura. 2015. Reinforcement Learning in Multi-Party Trading Dialog. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Prague, Czech Republic, edited by A. Koller, G. Skantze, F. Jurcicek, M. Araki and C.P. Rose, pp. 32–41.
- Aina, L., C. Silberer, I.T. Sorodoc, M. Westera, and G. Boleda. 2019. What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Minneapolis, Minnesota, edited by J. Burstein, C. Doran and T. Solorio, pp. 3772–3783.
- Jang, Y., K. Kim, and Y. Ko. 2025. SS-MPC: A Sequence-Structured Multi-Party Conversation System. arXiv:cs.
- Hu, C., T. Li, X. Gao, H. Chen, Y. Bai, D. Xu, T. Lin, X. Li, Y. Han, J. Pei, et al. 2026. Evaluating Long-Horizon Memory for Multi-Party Collaborative Dialogues. arXiv:cs.
- Gu, J.C., C.H. Tan, C. Chu, Z.H. Ling, C. Tao, Q. Liu, and C. Liu. 2023. MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, edited by H. Bouamor, J. Pino and K. Bali, pp. 7681–7692.
- Chen, Y., J. Jiang, D. Yu, Z. Wu, J. Liu, J. Han, X. Guo, J. Qi, Y. Li, Y. Zhang, et al. 2026. LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition. arXiv:2605.24005.
- Cao, Z., P. Li, and Q. Zhu. 2026. Discourse Coherence and Response-Guided Context Rewriting for Multi-Party Dialogue Generation. arXiv:cs.
- Bhagtani, K., M. Anand, Y.C. Xu, and A.K.S. Yadav. 2026. Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue. arXiv:cs.
- Zhang, M., Y. Yang, Z. Jia, X. Yang, J. Pei, Y. Zang, X. Deng, and X. Chen. 2026. MPCEval: A Benchmark for Multi-Party Conversation Generation. arXiv:cs.
- Penzo, N., M. Guerini, B. Lepri, G. Glavaš, and S. Tonelli. 2026. Don’t Stop the Multi-Party! On Generating Synthetic Written Multi-Party Conversations with Constraints. Proceedings of the AAAI Conference on Artificial Intelligence 40: 32701–32709.
- Wang, S., P. Chen, J. Zhou, Q. Li, J. Dong, J. Gao, B. Xue, J. Jiang, L. Kong, and C. Wu. 2026. TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning. Advances in Neural Information Processing Systems 38: 63870–63918.
- Jiang, J., L. Chen, S. Wang, L. Kong, Y. Li, and C. Wu. 2024. Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting. arXiv:2406.16567.
- Wang, S., L. Chen, J. Jiang, B. Xue, L. Kong, and C. Wu. 2024. LoRA meets dropout under a unified framework. In Findings of the Association for Computational Linguistics: ACL 2024, pp. 1995–2008.
- Wang, S., B. Xue, J. Ye, J. Jiang, L. Chen, L. Kong, and C. Wu. 2024. PRoLoRA: partial rotation empowers more parameter-efficient LoRA. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Volume 1, pp. 2829–2841.
- Wang, S., L. Chen, P. Chen, J. Dong, B. Xue, J. Jiang, L. Kong, and C. Wu. 2025. MoS: Unleashing parameter efficiency of low-rank adaptation with mixture of shards. Proceedings of the International Conference on Learning Representations 2025: 91886–91902.
- Asher, N., J. Hunter, M. Morey, B. Farah, and S. Afantenos. 2016. Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, edited by N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, et al., pp. 2721–2727.
- Li, J., M. Liu, M.Y. Kan, Z. Zheng, Z. Wang, W. Lei, T. Liu, and B. Qin. 2020. Molweni: A Challenge Multiparty Dialogues-based Machine Reading Comprehension Dataset with Discourse Structure. In Proceedings of the 28th International Conference on Computational Linguistics, edited by D. Scott, N. Bel and C. Zong.
- Chang, K., D. Chen, and D. Bamman. 2023. Dramatic Conversation Disentanglement. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, edited by A. Rogers, J. Boyd-Graber and N. Okazaki, pp. 4020–4046.
- Lerner, P., J. Bergoënd, C. Guinaudeau, H. Bredin, B. Maurice, S. Lefevre, M. Bouteiller, A. Berhe, L. Galmant, R. Yin, et al. 2022. Bazinga! A Dataset for Multi-Party Dialogues Structuring. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, edited by N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, et al., pp. 3434–3441.
- Poria, S., D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea. 2019. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, edited by A. Korhonen, D. Traum and L. Màrquez, pp. 527–536.
- Hsu, C.C., S.Y. Chen, C.C. Kuo, T.H. Huang, and L.W. Ku. 2018. EmotionLines: An Emotion Corpus of Multi-Party Conversations. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, edited by N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, et al.
- Zahiri, S.M., and J.D. Choi. 2017. Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks. arXiv:cs.
- Chen, Y.T., H.H. Huang, and H.H. Chen. 2020. MPDD: A Multi-Party Dialogue Dataset for Analysis of Emotions and Interpersonal Relationships. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, edited by N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, et al., pp. 610–614.
- Li, A.W., V. Jiang, S.Y. Feng, J. Sprague, W. Zhou, and J. Hoey. 2020. ALOHA: Artificial Learning of Human Attributes for Dialogue Agents. Proceedings of the AAAI Conference on Artificial Intelligence 34: 8155–8163.
- Gliwa, B., I. Mochol, M. Biesek, and A. Wawer. 2019. SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization. Association for Computational Linguistics.
- Narayan, S., S.B. Cohen, and M. Lapata. 2018. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, edited by E. Riloff, D. Chiang, J. Hockenmaier and J. Tsujii, pp. 1797–1807.
- Budzianowski, P., T.H. Wen, B.H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gašić. 2020. MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling. arXiv:cs.
- Lowe, R., N. Pow, I. Serban, and J. Pineau. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Edited by A. Koller, G. Skantze, F. Jurcicek, M. Araki and C.P. Rose. Prague, Czech Republic: pp. 285–294.
- Carletta, J., S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, W. Kraaij, M. Kronenthal, et al. 2006. The AMI Meeting Corpus: A Pre-announcement. In Proceedings of the Machine Learning for Multimodal Interaction, Second International Workshop (MLMI 2005), 11–13 July 2005. Lecture Notes in Computer Science. Germany, Edited by S. Renals and S. Bengio. pp. 28–39. doi:10.1007/11677482_3.
- Shriberg, E., R. Dhillon, S. Bhagat, J. Ang, and H. Carvey. 2004. The ICSI Meeting Recorder Dialog Act (MRDA) Corpus. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004. Cambridge, Massachusetts, USA, pp. 97–100.
- Yu, F., S. Zhang, Y. Fu, L. Xie, S. Zheng, Z. Du, W. Huang, P. Guo, Z. Yan, B. Ma, and et al. M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge, 2022. arXiv arXiv:cs.
- Litman, D., S. Paletz, Z. Rahimi, S. Allegretti, and C. Rice. 2016. The Teams Corpus and Entrainment in Multi-Party Spoken Dialogues. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas, Edited by J. Su, K. Duh and X. Carreras. pp. 1421–1431.
- Shaikh, S., T. Strzalkowski, A. Broadwell, J. Stromer-Galley, S. Taylor, and N. Webb. 2010. MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Valletta, Malta, Edited by N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner and D. Tapias.
- Manuvinakurike, R., S. Sahay, W. Chen, and L. Nachman. 2021. Incremental temporal summarization in multi-party meetings. In Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue. Edited by H. Li, G.A. Levow, Z. Yu, C. Gupta, B. Sisman, S. Cai, D. Vandyke, N. Dethlefs, Y. Wu and J.J. Li. Singapore and Online: pp. 530–541.
- Sedoc, J., D. Ippolito, A. Kirubarajan, J. Thirani, L. Ungar, and C. Callison-Burch. 2019. ChatEval: A Tool for Chatbot Evaluation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Minneapolis, Minnesota, Edited by W. Ammar, A. Louis and N. Mostafazadeh. pp. 60–65.
- Li, Y., B. Zou, Y. Fan, X. Li, A.T. Aw, and Y. Hong. 2023. GLGR: Question-aware Global-to-Local Graph Reasoning for Multi-party Dialogue Reading Comprehension. In Findings of the Association for Computational Linguistics: EMNLP 2023. Edited by H. Bouamor, J. Pino and K. Bali. Singapore: pp. 1817–1826.
- Chen, Y., J. Jiang, J. Liu, Y. Zhang, X. Guo, and I. King. 2026. Trace: Trajectory-aware comprehensive evaluation for deep research agents. Proceedings of the ACM Web Conference 2026: 2524–2534.
- Hu, H., J. Si, Q. Wang, T. Weng, Y. Ji, J. Jiang, F. Ma, Y. Zhou, L. Cui, and Q. Tian. 2026. MindDialog: A large-scale benchmark for counseling dialogue understanding and generation. Pattern Recognition, 113766.
- Jiang, J., P. Chen, L. Chen, S. Wang, Q. Bao, L. Kong, Y. Li, and C. Wu. 2025. How well do LLMs handle Cantonese? Benchmarking Cantonese capabilities of large language models. Findings of the Association for Computational Linguistics: NAACL 2025: 4464–4505.
- Jiang, J., A.K.Y. Truong, Y. Chen, Q. Bao, S. Wang, P. Chen, J. Wang, L. Kong, Y. Li, and C. Wu. 2025. Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025: 1924–1944.
- Zhou, J., S. Wang, J. Dong, K. Liu, L. Li, J. Gao, J. Jiang, L. Kong, and C. Wu. 2025. PROREASON: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 31650–31679.



| Method | Task | Dataset | Era | Headline Result |
|---|---|---|---|---|
| Feature-driven seg. Galley et al. (2003) | Topic segmentation | ICSI | Statistical | $P_k$=23.0, WD=25.47 |
| Bayesian topic models Purver et al. (2006) | Topic segmentation | ICSI | Statistical | $P_k$=28.9, WD=32.9 |
| Static/Dynamic RNN Ouchi and Tsuboi (2016) | Addressee + Response | Self-built | Neural | ADR=68.54, RES=78.64 |
| W2W Le et al. (2019) | Addressee identification | Ubuntu IRC | Neural | Len-5=80.86 |
| MPC-BERT Gu et al. (2021b) | Addressee, Speaker, Sel. | Ubuntu IRC | PLM | P@1=98.31, Acc=92.42 |
| HeterMPC Gu et al. (2022) | Response generation | Ubuntu IRC | Graph + PLM | BLEU-1=12.61 |
| GIFT Gu et al. (2023) | Multi-task understanding | Ubuntu IRC | Graph + PLM | @1=95.04 |
| ELECTRA-EMVI Li et al. (2023) | Discourse parsing, QA | Molweni | PLM + Latent | =91.78 |
| PFT-Prompt Addlesee et al. (2023) | Goal tracking | EU SPRING | LLM Prompting | Acc=69.57 |
| RL-TRC Fan et al. (2024) | Response generation | Ubuntu IRC | RL + LLM | BLEU-1=13.66 |
| MuPaS Wang et al. (2024) | Gen. + Speaker pred. | Friends, GoT | LLM-SFT | GSM8K=43.14 |
| DICE-BENCH Jang et al. (2025) | Tool-calling | DICE-BENCH | Multi-Agent | DICE=3.64 |
| SS-MPC Jang et al. (2025) | Response generation | Ubuntu IRC | Multi-Agent | BLEU-1=15.60 |
| EverMemBench Hu et al. (2026) | Long-horizon memory | EverMemBench | Multi-Agent | Avg=37.44/72.61 |

| Dataset | Category | Source | Scale | Modality | Key Annotations |
|---|---|---|---|---|---|
| STAC Asher et al. (2016) | Struct. | Online games | Multi-party chats | Text | SDRT discourse structure, dialogue acts |
| Molweni Li et al. (forthcoming) | Struct. | Ubuntu chat | 10K dial./88K utt. | Text | Discourse dependency, QA pairs |
| MTDD Chang et al. (2023) | Struct. | TV/movie scripts | 10K turns/831 shows | Text | Thread disentanglement, floor changes |
| Bazinga! Lerner et al. (2022) | Struct. | TV/movie scripts | Large-scale | Text | Diarization, addressee, entity linking |
| MELD Poria et al. (2019) | Emo. | Friends | 13K utterances | A+V+T | Emotion, sentiment (multimodal) |
| EmotionLines Hsu et al. (2018) | Emo. | Friends/FB | 29K utterances | Text | Seven emotion labels |
| EmoryNLP Zahiri and Choi (2017) | Emo. | Friends | 12.6K utterances | Text | Seven crowdsourced emotions |
| MPDD Chen et al. (2020) | Emo. | Chinese TV | Multi-party | Text | Emotions, interpersonal relations (CN) |
| ALOHA Li et al. (2020) | Emo. | TV/movie scripts | 1M+ dialogue lines | Text | Human-Level Attributes (HLAs) |
| SAMSum Gliwa et al. (2019) | Task. | Messenger style | 16K conversations | Text | Third-person abstractive summaries |
| XSum Narayan et al. (2018) | Task. | BBC articles | 226K pairs | Text | Single-sentence extreme summaries |
| MultiWOZ Budzianowski et al. (2020) | Task. | Wizard-of-Oz | 7 domains | Text | Belief tracking, dialogue acts |
| Ubuntu Lowe et al. (2015) | Task. | Tech support IRC | ∼1M dialogues | Text | Next-response selection |
| DICE-BENCH Jang et al. (2025) | Task. | Synthesized | 1,607 dialogues | Text | Multi-round tool-calling traces |
| WMPC Penzo et al. (2026) | Task. | LLM-synthesized | Constrained | Text | Structure, stance, interaction flow |
| AMI Carletta et al. (2006) | Multi. | Real meetings | 100 hours | A+V+T | Dialogue acts, gestures, attention |
| ICSI MRDA Shriberg et al. (2004) | Multi. | Real meetings | 72 h / 180K DA | A+T | Dialogue acts, adjacency pairs |
| AliMeeting Yu et al. (2022) | Multi. | Mandarin meetings | 120 hours | Audio (8ch) | Speaker diarization, multi-speaker ASR |
| Teams Corpus Litman et al. (2016) | Multi. | Cooperative games | 63 teams | A+V+T | Entrainment, dominance, group dynamics |

| Family | Representative Metrics | Primary Use Cases |
|---|---|---|
| Automatic Generation | BLEU, ROUGE, METEOR, BERTScore, PPL, Distinct-n | Response generation, summarization, fluency, diversity |
| Classification & Selection | Accuracy, F1 (macro/micro), Precision@k, Recall@k, MRR, EM, AUC, DICE-score | Addressee/intent classification, response selection, tool-calling |
| Structural & Discourse | $P_k$, WindowDiff, Link/Link+Rel F1, DER, Kappa | Topic segmentation, discourse parsing, diarization |
| Human Evaluation | Fluency, Coherence, Relevance, Informativeness, Authority | Pragmatic appropriateness, role coordination |
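The structural metrics listed above for topic segmentation can be made concrete with a small sketch. The code below is an illustrative implementation of the window-based segmentation errors $P_k$ and WindowDiff (WD), written under stated assumptions: segmentations are given as one segment label per utterance, the window size defaults to half the mean reference segment length (a common convention), and the function names `p_k`, `window_diff`, `boundary_flags`, and `default_window` are hypothetical helpers, not taken from any of the surveyed systems. Scores are returned as fractions in [0, 1]; the values reported in the tables appear to be percentages.

```python
from typing import Hashable, List, Optional, Sequence


def boundary_flags(labels: Sequence[Hashable]) -> List[int]:
    """Return 1 where a segment boundary falls between utterance j and j+1."""
    return [int(a != b) for a, b in zip(labels, labels[1:])]


def default_window(reference: Sequence[Hashable]) -> int:
    """Half the mean reference segment length (assumed convention), at least 1."""
    n_segments = 1 + sum(boundary_flags(reference))
    return max(1, round(len(reference) / (2 * n_segments)))


def _window_counts(labels: Sequence[Hashable], k: int) -> List[int]:
    """Number of boundaries between utterances i and i+k, for every valid i."""
    flags = boundary_flags(labels)
    return [sum(flags[i:i + k]) for i in range(len(labels) - k)]


def _resolve_window(reference, hypothesis, k: Optional[int]) -> int:
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    k = default_window(reference) if k is None else k
    if not 0 < k < len(reference):
        raise ValueError("window size k must satisfy 0 < k < number of utterances")
    return k


def p_k(reference, hypothesis, k: Optional[int] = None) -> float:
    """Fraction of windows where the two segmentations disagree on whether
    utterances i and i+k belong to the same segment."""
    k = _resolve_window(reference, hypothesis, k)
    ref, hyp = _window_counts(reference, k), _window_counts(hypothesis, k)
    errors = sum((r == 0) != (h == 0) for r, h in zip(ref, hyp))
    return errors / len(ref)


def window_diff(reference, hypothesis, k: Optional[int] = None) -> float:
    """Fraction of windows where the number of boundaries differs."""
    k = _resolve_window(reference, hypothesis, k)
    ref, hyp = _window_counts(reference, k), _window_counts(hypothesis, k)
    errors = sum(r != h for r, h in zip(ref, hyp))
    return errors / len(ref)


if __name__ == "__main__":
    reference = [0, 0, 0, 0, 1, 1, 1, 1]      # one boundary after utterance 4
    near_miss = [0, 0, 0, 1, 1, 1, 1, 1]      # boundary placed one utterance early
    extra_cut = [0, 0, 0, 1, 2, 2, 2, 2]      # near miss plus a spurious boundary
    for name, hyp in [("near_miss", near_miss), ("extra_cut", extra_cut)]:
        print(name, round(p_k(reference, hyp), 3), round(window_diff(reference, hyp), 3))
```

In this toy example the spurious extra boundary raises WindowDiff but not $P_k$, which illustrates why the two scores are usually reported together: $P_k$ only checks whether a window contains any boundary, while WindowDiff compares boundary counts.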

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).