Submitted:
21 April 2025
Posted:
23 April 2025
You are already at the latest version
Abstract
Keywords:
I. Introduction
II. Literature Review
III. Methodology
A. Process Overview

B. Algorithm for Legal Document Summarization
| Algorithm 1 Legal Document Summarization |
| 1: Input: Legal document D, Pre-trained models MBERT , MT 5, |
| Optimizer Ω (Adam), Learning rate λ |
| 2: Output: Trained model M , Evaluation scores SROUGE , SBLEU , |
| SMETEOR |
| 3: Initialize Parameters: |
| 4: input text ← D |
| 5: Ω ← Adam(λ) |
| 6: Preprocess Document: |
| 7: input lower ← lowercase(input text) |
| 8: input clean ← regex remove(input lower) |
| 9: Summarization: |
| 10: Sext ← MBERT (Tnorm) |
| 11: Sabs ← MT 5(”summarize : ” + Sext) |
| 12: Evaluation Metrics: |
| 13: SROUGE ← overlap(Sfinal, Rref ) |
| 14: SBLEU ← similarity(Sfinal, Rref ) |
| 15: SMET EOR ← semantic match(Sfinal, Rref ) |
C. Text Extraction and Preprocessing

D. Summarization Models
- 1)
- Extractive Summarization: Leverages the BART-large-CNN model to process a substantial portion of the preprocessed text and produce a concise summary. Key sentences are selected based on their relevance, scored as:

- 2)
- Abstractive Summarization: Utilizes T5-large, with inputs prefixed by "summarize:" and processed within the model’s capacity.

Hybrid Summarization
E. Evaluation Metrics
F. Model Compilation and Optimization
Loss Function
Optimizer
G. Interface Design

IV. Results and Discussion
A. Experimental Setup
- Dataset: A legal PDF containing a reference summary was evaluated by showing that “The court ruled in favor of the plaintiff because the defendant showed negligence.” The court decided to support the plaintiff by recognizing defendant negligence.
- Process: The dual-model process utilized preprocessing techniques to clean the text which enabled it to generate hybrid summaries through a Gradio interface including extracted text and extractive summary along with abstractive summary.
- Metrics: Testing the system involved comparing results through a combination of ROUGE (overlap) BLEU (sentence similarity) and METEOR (semantic alignment).
B. Findings
- The ROUGE system detected moderate matching between unigram terms in addition to lower bigram and sequence phrases suggesting problems with exact wording.
- BLEU recorded a low score because the text was paraphrased with wording alternatives like “ruled in favor” instead of “supported.” This reduced the precision of ngram matches.
- The METEOR measure demonstrates fair semantic understanding by understanding the meaning through alternative word usages despite lexical variations (such as “citing” versus “found”).
- Preprocessing Impact: The process of cleaning up noise improved the focus on legal terminology yet this could have removed necessary contextual information.
- Hybrid Approach: The approach combined extractive accuracy with readable content but still lost some exact legal written language.
C. Discussion
- Performance: METEOR indicates semantic retention; BLEU reflects paraphrasing problems; ROUGE-1 suggests term retention, but ROUGE-2/L displays structural gaps.
- Comparison: Offers generalizability but performs worse than domain-specific models such as LSDK-LegalSum [4] (ROUGE gains 8–16%).
- Strengths: Preprocessing helps focus the model; the Gradio interface improves usability.
- Limitations: Preprocessing might overlook subtleties; lacks legal-specific tuning.
- Future Work: Improve legal corpora and add legal embeddings for accuracy.
V. Conclusion
References
- D. Jain, M. D. Borah, and A. Biswas, ”A sentence is known by the company it keeps: Improving Legal Document Summarization Using Deep Clustering,” Artificial Intelligence and Law, vol. 32, pp. 165–200, 2024. [CrossRef]
- S. Sharma and P. P. Singh, ”Advancing Legal Document Summarization: Introducing an Approach Using a Recursive Summarization Algorithm,” SN Computer Science, vol. 5, p. 927, 2024. [CrossRef]
- D. Jain, M. D. Borah, and A. Biswas, ”Domain knowledge-enriched summarization of legal judgment documents via grey wolf optimization,” in Advances in Computers, vol. 132, Elsevier, pp. 223–248, 2023. [CrossRef]
- W. Gao et al., ”LSDK-LegalSum: Improving legal judgment summarization using logical structure and domain knowledge,” Journal of King Saud University - Computer and Information Sciences, vol. 37, 2025. [CrossRef]
- J. K. Verma et al., ”Context-Aware Legal Summarization Using Reinforcement Learning,” in 2025 2nd International Conference on Computational Intelligence, Communication Technology and Network-ing (CICTN), IEEE, Ghaziabad, India, 2025. [CrossRef]
- X. Dong et al., ”TermDiffuSum: A Term-guided Diffusion Model for Extractive Summarization of Legal Documents,” in Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, UAE, pp. 3222–3235, 2025. URL: https://aclanthology.org/2025.coling-main.216/.
- S. Goswami, N. Saini, and S. Shukla, ”Incorporating Domain Knowledge in Multi-objective Optimization Framework for Automating Indian Legal Case Summarization,” in Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol. 15319, Springer, Cham, pp. 265–280, 2025. [CrossRef]
- R. K. Peddarapu et al., ”Document Summarizer: A Machine Learning approach to PDF summarization,” Procedia Computer Science, 2025. [CrossRef]
- V. Bellandi, S. Castano, S. Montanelli, and S. Siccardi, ”Streamlining Legal Document Management: A Knowledge-Driven Service Platform,” SN Computer Science, vol. 6, p. 166, 2025. [CrossRef]
- M. Płonka et al., ”A comparative evaluation of the effectiveness of document splitters for large language models in legal contexts,” Expert Systems with Applications, 2025. [CrossRef]
- N. Begum and A. Goyal, ”Analysis of Legal Case Document Automated Summarizer,” in 2021 6th International Conference on Signal Process-ing, Computing and Control (ISPCC), IEEE, Solan, India, 2021. [CrossRef]
- Harris and M. Oussalah, ”Automatic Document Summarizer,” in 2008 7th IEEE International Conference on Cybernetic Intelligent Systems, IEEE, London, UK, 2008. [CrossRef]
- A. M. Al-Numai and A. Azmi, ”The Development of Single-Document Abstractive Text Summarizer During the Last Decade,” in Natural Language Processing and Text Mining, IGI Global, pp. 25–50, 2020. [CrossRef]
- K. Akiyama, A. Tamura, and T. Ninomiya, ”Hie-BART: Document Summarization with Hierarchical BART,” in Proceedings of the 2021 NAACL Student Research Workshop, Online, pp. 159–165, 2021. [CrossRef]
- R. C. Kore, P. Ray, P. Lade, and A. Nerurkar, ”Legal Document Summarization Using NLP and ML Techniques,” International Journal of Engineering and Computer Science, vol. 9, no. 5, 2020. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).