Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy

Version 1 : Received: 2 January 2023 / Approved: 4 January 2023 / Online: 4 January 2023 (03:48:26 CET)

How to cite: Vadyala, S.R. Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy. Preprints 2023, 2023010061. https://doi.org/10.20944/preprints202301.0061.v1

Abstract

Colonoscopy is used for colorectal cancer (CRC) screening. Details of colonoscopy findings extracted from free text in electronic health records (EHRs) can be used to determine patient risk for CRC and to guide colorectal screening strategies. In this study, we developed a deep learning framework that extracts information from free-text colonoscopy reports, including indications, findings, and pathology notes, for use in a clinical decision support system, and we evaluated its accuracy. The Bio-Bi-LSTM-CRF framework combines Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Fields (CRF) to extract several clinical features from these reports: indications for the colonoscopy, findings during the colonoscopy, and pathology of the resected material. We trained the Bio-Bi-LSTM-CRF model and an existing Bi-LSTM-CRF model on 80% of 4,000 manually annotated notes obtained from the colonoscopy reports of 3,867 patients. The clinical notes came from patients aged over 40 years enrolled in four Veterans Affairs Medical Centers. Of the remaining annotated notes, 10% of the total were used to tune hyperparameters and the final 10% were used to evaluate the accuracy of our model (Bio-Bi-LSTM-CRF) and to compare it with Bi-LSTM-CRF. Our experiments showed that combining bidirectional encoder representations with the dictionary function vector and the character sequence embedding approach, as done in Bio-Bi-LSTM-CRF, is an effective way to identify colonoscopy features from EHR-extracted clinical notes. We therefore conclude that the Bio-Bi-LSTM-CRF model can create new opportunities to identify patients at risk for colon cancer and to study their health outcomes.
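The 80% / 10% / 10% partition of the 4,000 annotated notes described in the abstract can be illustrated with a minimal sketch. The function name `split_notes`, the placeholder note identifiers, and the fixed random seed are assumptions made for illustration; the abstract does not state how the notes were shuffled or whether the split was made by note or by patient.

```python
import random

def split_notes(notes, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle notes and split them into train / tuning / test partitions.

    Hypothetical helper: the fractions follow the abstract, the seed and
    the note-level (rather than patient-level) split are assumptions.
    """
    shuffled = list(notes)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]                 # used to fit the model
    dev = shuffled[n_train:n_train + n_dev]    # used to tune hyperparameters
    test = shuffled[n_train + n_dev:]          # held out for evaluation
    return train, dev, test

# Placeholder identifiers standing in for the 4,000 annotated notes.
notes = [f"note_{i}" for i in range(4000)]
train, dev, test = split_notes(notes)
print(len(train), len(dev), len(test))
```

Because 4,000 notes come from 3,867 patients, some patients contribute more than one note; a patient-level split would avoid the same patient appearing in both training and test data, though the abstract does not say which approach was used.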

Keywords

Neural Network; Machine Learning; Natural Language Processing (NLP); Text Mining; Sentence Classification; Colorectal Cancer; Clinical Information.

Subject

Medicine and Pharmacology, Pathology and Pathobiology
