Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy

Shashank reddy Vadyala

doi:10.20944/preprints202301.0061.v1

Submitted: 02 January 2023

Posted: 04 January 2023

Abstract

Colonoscopy is used for colorectal cancer (CRC) screening. Details of colonoscopy findings extracted from free text in electronic health records (EHRs) can be used to determine patient risk for CRC and to inform colorectal screening strategies. In this study, we developed a deep learning framework that extracts information for clinical decision support from free-text colonoscopy reports, including indication, findings, and pathology notes, and we evaluated its accuracy. The Bio-Bi-LSTM-CRF framework combines Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Fields (CRF) to extract several clinical features from these reports: the indications for the colonoscopy, the findings during the colonoscopy, and the pathology of the resected material. We trained the Bio-Bi-LSTM-CRF model and an existing Bi-LSTM-CRF model on 80% of 4,000 manually annotated notes obtained from the colonoscopy reports of 3,867 patients. The clinical notes came from patients aged over 40 years enrolled at four Veterans Affairs Medical Centers. Of the remaining annotated notes, half (10% of the total) were used to tune hyperparameters, and the other half (10% of the total) were used to evaluate the accuracy of our model (Bio-Bi-LSTM-CRF) and to compare it with Bi-LSTM-CRF. Our experiments showed that the Bio-Bi-LSTM-CRF approach, which integrates a dictionary function vector and character sequence embeddings into bidirectional encoder representations, is an effective way to identify colonoscopy features from EHR-extracted clinical notes. We conclude that the Bio-Bi-LSTM-CRF model can create new opportunities to identify patients at risk for colon cancer and to study their health outcomes.
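The 80/10/10 partition described in the abstract can be illustrated with a minimal sketch. The function name `split_notes`, the fixed seed, and the use of integer note IDs are assumptions made for illustration. The abstract does not say how notes were assigned to each partition (for example, at random or grouped by patient), so the random note-level shuffle here is also an assumption, not the author's procedure.

```python
import random


def split_notes(note_ids, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle note IDs and split them into train / dev / test partitions.

    Hypothetical helper: a random note-level split is assumed here;
    the source does not state how the partitions were drawn.
    """
    ids = list(note_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(len(ids) * train_frac)
    n_dev = round(len(ids) * dev_frac)
    train = ids[:n_train]                 # used to fit the model
    dev = ids[n_train:n_train + n_dev]    # used to tune hyperparameters
    test = ids[n_train + n_dev:]          # held out for evaluation
    return train, dev, test


# 4,000 annotated notes, as stated in the abstract.
train, dev, test = split_notes(range(4000))
print(len(train), len(dev), len(test))  # prints "3200 400 400"
```

Because the 4,000 notes come from 3,867 patients, some patients contribute more than one note. A note-level split like this one can therefore place notes from the same patient in different partitions.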

Keywords: Neural Network; Machine Learning; Natural Language Processing (NLP); Text Mining; Sentence Classification; Colorectal Cancer; Clinical Information.

Subject: Medicine and Pharmacology - Pathology and Pathobiology

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
