Version 1: Received: 2 January 2023 / Approved: 4 January 2023 / Online: 4 January 2023 (03:48:26 CET)
How to cite:
Vadyala, S.R. Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy. Preprints 2023, 2023010061. https://doi.org/10.20944/preprints202301.0061.v1
APA Style
Vadyala, S.R. (2023). Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy. Preprints. https://doi.org/10.20944/preprints202301.0061.v1
Chicago/Turabian Style
Vadyala, S.R. 2023. "Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy." Preprints. https://doi.org/10.20944/preprints202301.0061.v1
Abstract
Colonoscopy is used for colorectal cancer (CRC) screening. Details of colonoscopy findings extracted from free text in electronic health records (EHRs) can be used to determine patient risk for CRC and to guide colorectal screening strategies. In this study, we developed a deep learning framework that extracts information from free-text colonoscopy reports, including indications, pathology, and findings notes, for use in a clinical decision support system, and we evaluated its accuracy. The Bio-Bi-LSTM-CRF framework combines Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Fields (CRF) to extract several clinical features from these reports: indications for the colonoscopy, findings during the colonoscopy, and pathology of the resected material. We trained the Bio-Bi-LSTM-CRF model and an existing Bi-LSTM-CRF model on 80% of 4,000 manually annotated notes obtained from the colonoscopy reports of 3,867 patients. The clinical notes came from patients over 40 years of age enrolled at four Veterans Affairs Medical Centers. Of the remaining annotated notes, 10% of the total were used to tune hyperparameters, and the final 10% were used to evaluate the accuracy of our model (Bio-Bi-LSTM-CRF) and to compare it with Bi-LSTM-CRF. Our experiments showed that combining bidirectional encoder representations with a dictionary function vector and a character sequence embedding, as in Bio-Bi-LSTM-CRF, is an effective way to identify colonoscopy features in clinical notes extracted from EHRs. We conclude that the Bio-Bi-LSTM-CRF model can create new opportunities to identify patients at risk for colon cancer and to study their health outcomes.
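The 80%/10%/10% partition of the 4,000 annotated notes described in the abstract can be illustrated with a minimal sketch. The function name `split_notes`, the placeholder note identifiers, and the fixed random seed are assumptions made for illustration only; the abstract does not state how the notes were shuffled or whether notes from the same patient were kept in the same partition.

```python
import random


def split_notes(notes, train_pct=80, dev_pct=10, seed=0):
    """Shuffle notes and partition them into train/dev/test lists.

    Hypothetical helper: the percentages follow the abstract, while the
    shuffling strategy and seed are illustrative assumptions.
    """
    shuffled = list(notes)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(shuffled) * train_pct // 100
    n_dev = len(shuffled) * dev_pct // 100
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]   # used to tune hyperparameters
    test = shuffled[n_train + n_dev:]         # held out for final evaluation
    return train, dev, test


# Placeholder identifiers standing in for the 4,000 annotated notes.
notes = [f"note_{i}" for i in range(4000)]
train, dev, test = split_notes(notes)
print(len(train), len(dev), len(test))  # 3200 400 400
```

This sketch splits by note. Because the 4,000 notes come from 3,867 patients, some patients contribute more than one note, so a split grouped by patient would be a reasonable alternative where leakage between partitions is a concern.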
Keywords
Neural Network; Machine Learning; Natural Language Processing (NLP); Text Mining; Sentence Classification; Colorectal Cancer; Clinical Information.
Subject
Medicine and Pharmacology, Pathology and Pathobiology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.