Submitted: 09 February 2026
Posted: 11 February 2026
Abstract
Keywords:
1. Introduction
- 1.
- 2.
2. Method
2.1. Design and Reporting
2.2. Eligibility Criteria
2.3. Information Sources and Search Strategy
2.4. Study Selection
2.5. Data Charting
2.6. Critical Appraisal
2.7. Synthesis Approach
3. Results
3.1. Overview
3.2. Study Selection Narrative (PRISMA-ScR)
3.3. Characteristics of Included Studies
| Author & Title | Method | Data Sources | Sample | Results | NLP Technique | Nonadherence/Inference |
|---|---|---|---|---|---|---|
| Masukawa et al., 2022 [1]. Machine learning models to detect social distress and severe physical/psychological symptoms in terminally ill patients with cancer from unstructured EHR text data | Quantitative study | Retrospective cancer patient records | 808 cancer patients. CAT: Strength = Moderate; Quality = Medium | ML detected social distress and symptoms (pain, dyspnea, nausea, insomnia, anxiety). | Supervised machine learning | Text data can identify social distress; areas like anxiety receive scant attention compared to physical pain (∼40% of narratives), affecting adherence risks. |
| Alpert et al., 2019 [4]. Patient access to clinical notes in oncology: A mixed method analysis of oncologists’ attitudes and linguistic characteristics towards notes | Mixed methods | National Cancer Institute, Virginia | 13 interviews; 500 clinical notes; 22 oncologists. CAT: Strength = Moderate; Quality = Medium | Oncologists acknowledged that changing note content could improve patient communication but may hinder interdisciplinary communication. | Linguistic Inquiry and Word Count (LIWC); random effects modelling | Tension between patient-centred language and clinical interaction; challenges in non-clinical communication may affect adherence. |
| Kuo et al., 2022 [9]. Using data mining technology to predict medication-taking behaviour in women with breast cancer: A retrospective study | Quantitative (structured EHR data) | Breast cancer patient records | 385 records. CAT: Strength = Weak; Quality = Low | Highest polarity of reviews related to patient–doctor experience and pain. | Multiple logistic regression, decision tree, artificial neural network | Structured medical records may not capture adverse effects; missing factors can influence adherence. |
| Tamang et al., 2015 [10]. Detecting Unplanned Care from Clinician Notes in Electronic Health Records | Quantitative study | Cancer patient EHRs | 308,096 free-text machine-readable notes. CAT: Strength = Moderate; Quality = Medium | Including free-text notes increased identification of ED visits; textual analysis identified most reported symptoms. | Clinical text mining | Combining structured and unstructured data improved detection; pain was the most common reason for ED visits, suggesting cancer pain management is suboptimal. |
| Himmelstein et al., 2022 [12]. Examination of Stigmatizing Language in the Electronic Health Record | Quantitative | Longitudinal retrospective | National Danish Patient Registry; 4,418 pancreatic cancer patients. CAT: Strength = Moderate; Quality = Medium | Text-based approach identified 132 unique symptoms in clinical notes. | Text mining comparisons to registry data | Stigmatizing language appears in admission notes and varies by condition; stigmatization may alienate patients and lead to treatment discordance. |
| Patel et al., 2019 [13]. Clinician Perspectives on Electronic Health Records, Communication, and Patient Safety Across Diverse Medical Oncology Practices | Mixed methods | Michigan Oncology Quality Collaborative (MOQC) | 29 oncology practices; 297 clinicians. CAT: Strength = Moderate; Quality = Medium | More seamless EHR integration into routine care is needed. | Sociotechnical framework; surveys | EHRs disrupt communication and increase workloads; patient communication may be compromised, risking unintentional nonadherence. |
| Huilgol et al., 2022 [14]. Opportunities to use electronic health record audit logs to improve cancer care | Qualitative | Report (article) | Summary on interpreting metadata from oncologist audit logs. CAT: Strength = Moderate; Quality = Medium | Audit logs can show how oncologists access information and collaborate. | Non-NLP (audit log analytics) | Understanding audit logs may clarify barriers to communication from EHR dissatisfaction (time/data entry/navigation), with adherence implications. |
| Rahimian et al., 2019 [15]. Significant and Distinctive n-Grams in Oncology Notes: A Text-Mining Method to Analyze the Effect of OpenNotes on Clinical Documentation | Quantitative (n-gram analysis) | Oncology provider notes | 102,135 notes by 36 clinicians. CAT: Strength = Moderate; Quality = Medium | Significant differences before/after patient access to notes; longer explanatory notes observed. | Text mining (n-grams) | Oncologists may be less prolific note-takers; subtle empathetic changes with patient access may reflect communication skills relevant to adherence. |
| Asan et al., 2018 [17]. Oncologists’ Views Regarding the Role of Electronic Health Records in Care Coordination | Qualitative report | Semi-structured interviews | 60 oncologists. CAT: Strength = Moderate; Quality = Medium | Perceptions of oncologist EHR use during care coordination. | Traditional qualitative (non-NLP) | Oncologists report EHRs often not updated during encounters, potentially impacting communication and adherence. |
| Geskey et al., 2023 [19]. National Documentation and Coding Practices of Noncompliance: The Importance of Social Determinants of Health and the Stigma of African-American Bias | Quantitative | Z codes (social determinants) in U.S. patient records | ∼9 million adult patients. CAT: Strength = Moderate; Quality = Medium | Two or three comorbidities doubled the ratio of noncompliance. | Analysis of ICD-10 diagnosis codes in EHRs | Clinician bias in noncompliance coding (e.g., financial circumstances) may misclassify nonadherence from a multifactorial perspective. |
| Davoudi et al., 2022 [21]. Using Natural Language Processing to Classify Serious Illness Communication with Oncology Patients | Quantitative study | University of Pennsylvania Abramson Cancer Center | 3,563 patients; 5,145 notes; 8,695 distinct responses. CAT: Strength = Moderate; Quality = Medium | Validated SIC classifier enabling quality metrics (communication quality, goal-concordance). | Machine learning (domain/subdomain classification) | NLP can identify treatment concordance, a potential factor in nonadherence. |
| Simoulin et al., 2023 [36]. From free-text electronic health records to structured cohorts: Onconum, an innovative methodology for real-world data mining in breast cancer | Quantitative study | Patients with breast cancer at Strasbourg University Hospital | 9,599 patients. CAT: Strength = Weak; Quality = Low | Successfully extracted and structured information from breast cancer EHRs without pre-existing dictionaries or manually annotated corpora. | Hybrid NLP (machine learning + rule-based lexical methods) | Multiple sources improved extraction rates and quality; structured information can be recovered from text, which may enable comparison of codified adherence against notes. |
| Xin et al., 2023 [37]. Pancreatic cancer symptom trajectories from Danish registry data and free text in electronic health records | Quantitative | Longitudinal retrospective | National Danish Patient Registry; 4,418 pancreatic cancer patients. CAT: Strength = Moderate; Quality = Medium | Text-based approach identified 132 unique symptoms in clinical notes. | Text mining comparisons to registry data (technique not stated) | Text mining identified more symptoms than coded registries alone, potentially improving detection of issues related to adherence. |
| Elbers et al., 2023 [38]. Sentiment analysis of medical record notes for lung cancer patients at the Department of Veterans Affairs | Quantitative study | Department of Veterans Affairs (U.S.) lung cancer patients | 10,000 patients; 3,500,000+ notes. CAT: Strength = Moderate; Quality = Medium | Recalibration of a general-purpose sentiment lexicon for oncology. | Hedonometer sentiment scoring adapted to clinical oncology | Detected higher adverse effects in final days of treatment/chemotherapy, suggesting increased risk of nonadherence toward end of cycles. |
| Leis et al., 2022 [39]. Exploring the Association of Cancer and Depression in Electronic Health Records: Combining Encoded Diagnosis and Mining Free-Text Clinical Notes | Quantitative | EHR database in a general hospital | 4,238 cancer patients. CAT: Strength = Moderate; Quality = Medium | Mining free text identified more patients with depression than encoded data alone. | Open-source language analysis framework; text analysis | Depression is often absent from structured data but detectable in notes; unintentional nonadherence may be influenced. |
| Mashima et al., 2022 [40]. Using Natural Language Processing Techniques to Detect Adverse Events From Progress Notes Due to Chemotherapy | Quantitative | Kagawa University Hospital | 200 cancer patients receiving chemotherapy. CAT: Strength = Moderate; Quality = Medium | Significant increase in adverse effect detection via progress notes analysis. | Dictionary/corpus + proprietary NLP system | Adverse effects (a factor in nonadherence) may be missed in coded data; text mining of progress notes is needed. |
| Jensen et al., 2017 [41]. Analysis of free text in electronic health records for identification of cancer patient trajectories | Quantitative | University Hospital of North Norway | 1,133,223 unstructured EHR text documents; 7,741 patients. CAT: Strength = Moderate; Quality = Medium | Free-text methods to identify trajectories may support decisions, decrease adverse events/readmissions, and improve cancer care quality. | Machine learning; text analyses using NOR-MeSH for automated capture | Time constraints for free-text entries are severe; patient access to notes may impact relationships and influence nonadherence/discordance. |
| Feldman et al., 2016 [42]. Mining the Clinical Narrative: All Text Are Not Equal | Quantitative | Publicly available electronic medical records | 45,000 patient records. CAT: Strength = Weak; Quality = Medium | Deep understanding of clinical text required; coding cannot fully describe clinician diagnosis; nuanced language is critical. | Natural Language Toolkit (NLTK) and Python | Nuanced language complicates NLP on clinical text; notes seldom record patient feedback in non-clinical terms, limiting adherence insights. |
| Ganesan et al., 2016 [43]. Discovering Related Clinical Concepts Using Large Amounts of Clinical Notes | Quantitative study | MIMIC database | 10,000 clinical notes. CAT: Strength = Moderate; Quality = Medium | Related concepts can be used for query expansion, hypothesis generation, incident investigation, sentence completion, etc. | Mining related clinical concepts with graph data structures | Related concepts in notes may be keystones for inferring treatment nonadherence. |
| Falotico et al., 2015 [44]. Identifying Oncological Patient Information Needs to Improve e-Health Communication: a preliminary text-mining analysis | Mixed methods (quantitative text → qualitative interviews) | Semi-structured interviews; text mining | 12 rare cancer patients; 20,400 words; >3,000 distinct terms. CAT: Strength = Moderate; Quality = Medium | Online communication should be supervised. | Text mining (interview transcripts) | Information needs become more treatment-specific and sophisticated over the trajectory; physician relationship quality may modulate adherence. |
| Hamid et al., 2020 [45]. Text Parsing-Based Identification of Patients with Poor Glaucoma Medication Adherence in the Electronic Health Record | Quantitative study | University of Michigan | 736 glaucoma patients. CAT: Strength = Weak; Quality = Low | Identified a larger proportion of patients with poor adherence than automated EHR pull alone. | Text parsing of physician notes | Medication adherence was identifiable only in progress notes, emphasizing the value of unstructured text. |
| Zhu et al., 2019 [46]. Automatically identifying social isolation from clinical narratives for patients with prostate cancer | Quantitative study | Medical University of South Carolina | 3,138 patients. CAT: Strength = Moderate; Quality = Medium | Highly accurate identification of social isolation when noted in clinical narratives. | NLP algorithm + lexicon for social isolation | Poor documentation of social factors in notes can lead to nonadherence; both physician and shared factors involved. |
| Datta et al., 2019 [47]. A frame semantic overview of NLP-based information extraction for cancer-related EHR notes | Quantitative study (overview) | NLP literature | 78 articles. CAT: Strength = Weak; Quality = Low | Guidance for general-purpose cancer frame resources and NLP systems to extract diverse cancer information types. | Frame construction; scoping review of NLP techniques | Treatment adherence is not specifically framed despite its importance; gap for future work. |
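Several of the studies above (e.g., Hamid et al. [45]) relied on rule-based text parsing of physician notes to flag poor medication adherence. A minimal sketch of that kind of parser is shown below; the trigger phrases, negation cue list, and sample note are illustrative inventions, not patterns taken from any included study, and a real system would derive its lexicon from clinician-annotated notes.

```python
import re

# Hypothetical trigger phrases for nonadherence mentions; illustrative only.
NONADHERENCE_PATTERNS = [
    r"\bmissed\s+doses?\b",
    r"\bran\s+out\s+of\s+(?:her|his|their)?\s*medication\b",
    r"\bnot\s+taking\s+(?:the\s+)?med(?:ication)?s?\b",
    r"\bstopped\s+(?:taking\s+)?(?:the\s+)?drops?\b",
]

# Crude negation cue: a negation word earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|denies|without)\b[^.]*$", re.IGNORECASE)

def flag_nonadherence(note: str) -> list[str]:
    """Return sentences of a free-text note that match a nonadherence
    pattern and are not preceded by a simple negation cue."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for pat in NONADHERENCE_PATTERNS:
            m = re.search(pat, sentence, re.IGNORECASE)
            if m and not NEGATION.search(sentence[:m.start()]):
                hits.append(sentence.strip())
                break
    return hits

note = ("Patient reports she ran out of medication last week. "
        "Denies nausea. Encouraged to refill promptly.")
print(flag_nonadherence(note))
```

Even this toy version shows why unstructured text matters: the refill lapse it surfaces would not appear in structured medication-order data at all.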
3.4. Main Findings at a Glance
| Theme | % of P2 sources |
|---|---|
| EHR barriers and compliance-oriented design (templates, coding, immature UX) [13,14,19] | 47% |
| Incomplete/late documentation (adverse effects, pain, SDOH) [1,9,10,11] | 35% |
| Formal note style / stigmatizing language [4,12,20] | 12% |
| Time/cognitive burden limiting PCC narratives [14,16,18] | (frequent; overlaps with barrier theme) |
3.5. Theme-Wise Results
Theme A: Compliance-Oriented Design Constrains Person-Centred Narratives
Theme B: Incomplete/Late Documentation of Adverse Effects, Pain, and SDOH
Theme C: Formal Tone and Stigmatizing Lexicon Undermine Trust and Comprehension
Theme D: Time/Cognitive Burden Reduces Specificity and Explicit Concordance Recording
3.6. Linkage to Research Questions
3.7. Guidance to Figures and Tables
4. Discussion
4.1. Summary of Principal Findings
4.2. Interpretation in Relation to Communication, Trust, and Concordance
4.3. Theoretical Framing: Communication, Trust, and Concordance
4.4. Implications for Nursing Practice, Documentation, and Service Design
4.5. Practical Toolkit and Policy Alignment
| Recommendation | Rationale and Policy Alignment |
|---|---|
| Add person-centred prompts and fields (e.g., goals, values, decision rationale) | Supports shared decision-making and aligns with the Charter’s principles of partnership and respect. |
| Use plain-language templates and examples | Enhances patient understanding and trust; aligns with ACSQHC communication standards. |
| Implement stigma-flagging tools (e.g., NLP-based detection of stigmatizing language) | Reduces bias and promotes dignity and respect in documentation. |
| Capture adverse effects, pain, and SDOH using structured and narrative fields | Improves early identification of barriers to adherence; supports safety and access principles. |
| Leverage audit-log data to streamline workflows | Reduces cognitive burden and documentation fatigue; aligns with safety and accountability standards. |
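The stigma-flagging recommendation above could, in its simplest form, be a lexicon matcher run when a note is saved. The sketch below shows the idea only; the lexicon entries and suggested rewordings are illustrative, not a clinically validated stigma lexicon such as the one used by Himmelstein et al. [12], and a production tool would also need context handling (e.g., negation, quoted patient speech).

```python
import re

# Illustrative mapping of potentially stigmatizing terms to neutral
# alternatives; entries here are examples only, not a validated lexicon.
STIGMA_LEXICON = {
    "noncompliant": "has not been able to follow the plan",
    "refused": "declined",
    "drug abuser": "person with a substance use disorder",
}

def flag_stigma(note: str) -> list[tuple[str, str]]:
    """Return (matched term, suggested neutral wording) pairs found in a note."""
    findings = []
    for term, suggestion in STIGMA_LEXICON.items():
        if re.search(rf"\b{re.escape(term)}\b", note, re.IGNORECASE):
            findings.append((term, suggestion))
    return findings

print(flag_stigma("Patient was noncompliant and refused labs."))
# flags "noncompliant" and "refused" with suggested rewordings
```

Surfacing suggestions rather than auto-rewriting keeps the clinician in control, which fits the partnership principles the table aligns with.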
4.6. Methodological Reflections and Limitations
4.6.1. Gaps in the Literature and Future Directions
NLP Techniques and Variability.
Corpus Characteristics.
Bias and Validation.
4.7. Future Research
- 1.
- 2.
- 3.
- 4.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Public Involvement Statement
Guidelines and Standards Statement
Use of Artificial Intelligence
References
1. Masukawa, K.; Aoyama, M.; Yokota, S.; Nakamura, J.; Ishida, R.; Nakayama, M.; Miyashita, M. Machine learning models to detect social distress, spiritual pain, and severe physical psychological symptoms in terminally ill patients with cancer from unstructured text data in electronic medical records. Palliative Medicine 2022, 36(8), 1207–1216.
2. Percha, B. Modern Clinical Text Mining: A Guide and Review. Annual Review of Biomedical Data Science 2021, 4(1), 165–187.
3. Sikorskii, A.; Wyatt, G.; Tamkus, D.; Victorson, D.; Rahbar, M. H.; Ahn, S. Concordance between patient reports of cancer-related symptoms and medical records documentation. J Pain Symptom Manage 2012, 44(3), 362–372.
4. Alpert, J. M.; Morris, B. B.; Thomson, M. D.; Matin, K.; Sabo, R. T.; Brown, R. F. Patient access to clinical notes in oncology: A mixed method analysis of oncologists’ attitudes and linguistic characteristics towards notes. Patient Educ Couns 2019, 102(10), 1917–1924.
5. Chapman, W.; Nadkarni, P.; Hirschman, L.; D’Avolio, L.; Savova, G.; Uzuner, O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 2011, 18(5), 540–543.
6. Johnson, A.; Pollard, T.; Horng, S.; Anthony, L.; Mark, R. MIMIC-IV-Note: Deidentified free-text clinical notes. PhysioNet 2023, Version 2.2.
7. Zeng, J.; Banerjee, I.; Henry, A. S.; Wood, D. J. Natural Language Processing to Identify Cancer Treatments With Electronic Medical Records. JCO Clinical Cancer Informatics 2021, 5.
8. Gholipour, M.; Khajouei, R.; Amiri, P.; Hajesmaeel Gohari, S.; Ahmadian, L. Extracting cancer concepts from clinical notes using natural language processing: a systematic review. BMC Bioinformatics 2023, 24.
9. Kuo, C. C.; Wang, H. H.; Tseng, L. P. Using data mining technology to predict medication-taking behaviour in women with breast cancer: A retrospective study. Nurs Open 2022, 9(6), 2646–2656.
10. Tamang, S.; Patel, M.; Blayney, D.; Kuznetsov, J.; Finlayson, S.; Vetteth, Y. Detecting unplanned care from clinician notes in electronic health records. J Oncol Pract 2015, 11, e313–e319.
11. Cho, J. E.; Tang, N.; Pitaro, N.; Bai, H.; Cooke, P. V.; Arvind, V. Sentiment Analysis of Online Patient-Written Reviews of Vascular Surgeons. Ann Vasc Surg 2023, 88, 249–255.
12. Himmelstein, G.; Bates, D.; Zhou, L. Examination of Stigmatizing Language in the Electronic Health Record. JAMA Network Open 2022, 5(1), e2144967.
13. Patel, M. R.; Friese, C. R.; Mendelsohn-Victor, K.; Fauer, A. J.; Ghosh, B.; Bedard, L.; Griggs, J. J.; Manojlovich, M. Clinician Perspectives on Electronic Health Records, Communication, and Patient Safety Across Diverse Medical Oncology Practices. Journal of Oncology Practice 2019, 15(6), e529–e536.
14. Huilgol, Y. S.; Adler-Milstein, J.; Ivey, S. L.; Hong, J. C. Opportunities to use electronic health record audit logs to improve cancer care. Cancer Medicine 2022, 11(17), 3296–3303.
15. Rahimian, M.; Warner, J.; Jain, S.; Davis, R.; Zerillo, J.; Joyce, R. Significant and Distinctive n-Grams in Oncology Notes: A Text-Mining Method to Analyze the Effect of OpenNotes on Clinical Documentation. JCO Clin Cancer Inform 2019, 3, 1–9.
16. Warner, J.; Hochberg, E. Where is the EHR in oncology? J Natl Compr Canc Netw 2012, 10(5), 584–588.
17. Asan, O.; Nattinger, A. B.; Gurses, A. P.; Tyszka, J. T.; Yen, T. W. F. Oncologists’ Views Regarding the Role of Electronic Health Records in Care Coordination. JCO Clinical Cancer Informatics 2018, 2, 1–12.
18. Burke, H. B. Standardized Documentation Is Not the Solution to Reduce Physician Time in the Electronic Health Record. JAMA Oncology 2023, 9(8), 1151–1152.
19. Geskey, J. M.; Kodish-Wachs, J.; Blonsky, H.; Hohman, S. F.; Meurer, S. National Documentation and Coding Practices of Noncompliance: The Importance of Social Determinants of Health and the Stigma of African-American Bias. Am J Med Qual 2023, 38(2), 87–92.
20. Heckemann, B.; Chaaya, M.; Jakobsson Ung, E.; Olsson, D. S.; Jakobsson, S. Finding the Person in Electronic Health Records. A Mixed-Methods Analysis of Person-Centered Content and Language. Health Communication 2022, 37(4), 418–424.
21. Davoudi, A.; Tissot, H.; Doucette, A.; et al. Using Natural Language Processing to Classify Serious Illness Communication with Oncology Patients. AMIA Jt Summits Transl Sci Proc 2022, 168–177.
22. Geerse, O. P.; Lamas, D. J.; Bernacki, R. E.; Sanders, J. J.; Paladino, J.; Berendsen, A. J.; Hiltermann, T. J. N.; Lindvall, C.; Fromme, E. K.; Block, S. D. Adherence and Concordance between Serious Illness Care Planning Conversations and Oncology Clinician Documentation among Patients with Advanced Cancer. J Palliat Med 2021, 24(1), 53–62.
23. Rexhepi, H.; Moll, J.; Huvila, I. Online electronic healthcare records: Comparing the views of cancer patients and others. Health Informatics Journal 2020, 26(4), 2915–2929.
24. Griffin, J. M.; Kroner, B. L.; Wong, S. L.; Preiss, L.; Wilder Smith, A.; Cheville, A. L.; Mitchell, S. A.; Lancki, N.; Hassett, M. J.; Schrag, D.; Osarogiagbon, R. U.; Ridgeway, J. L.; Cella, D.; Jensen, R. E.; Flores, A. M.; Austin, J. D.; Yanez, B. Disparities in electronic health record portal access and use among patients with cancer. J Natl Cancer Inst 2024, 116(3), 476–484.
25. Tricco, A. C.; Lillie, E.; Zarin, W.; O’Brien, K. K.; Colquhoun, H.; Levac, D.; Moher, D. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann Intern Med 2018, 169, 467–473.
26. Moralejo, D.; Ogunremi, T.; Dunn, K. Critical Appraisal Toolkit (CAT) for assessing multiple types of evidence. Can Commun Dis Rep 2017, 43(9), 176–181.
27. Skeat, J.; Roddam, H. The qual-CAT: Applying a rapid review approach to qualitative research to support clinical decision-making in speech-language pathology practice. Evidence-Based Communication Assessment and Intervention 2019, 13(1-2), 3–14.
28. CASP. Critical Appraisal Skills Programme Checklists. CASP UK – OAP Ltd, 2023. Available online: https://casp-uk.net/casp-tools-checklists/.
29. Hillen, M. A.; de Haes, H. C. J. M.; Stalpers, L. J. A.; Klinkenbijl, J. H. G.; Eddes, E. H.; Butow, P. N.; van der Vloodt, J.; van Laarhoven, H. W. M.; Smets, E. M. A. How can communication by oncologists enhance patients’ trust? An experimental study. Annals of Oncology 2014, 25(4), 896–901.
30. Zolnierek, K. B.; Dimatteo, M. R. Physician communication and patient adherence to treatment: a meta-analysis. Med Care 2009, 47(8), 826–834.
31. Australian Commission on Safety and Quality in Health Care. Australian Charter of Healthcare Rights (Second Edition). ACSQHC, 2020. Available online: https://www.safetyandquality.gov.au/publications-and-resources/resource-library/australian-charter-healthcare-rights-second-edition-a4-accessible.
32. Ahmad, B.; Jun, S. Sentiment Analysis of Cancer Patients About Their Treatment During the Peak Time of Pandemic COVID-19. 2021 4th International Conference on Computing and Information Sciences (ICCIS), 29–30 November 2021.
33. Amossy, R. Argumentation in Discourse: A Socio-discursive Approach to Arguments. Informal Logic 2009, 29, 252–267.
34. An, Y.; Fang, Q.; Wang, L. Enhancing patient education in cancer care: Intelligent cancer patient education model for effective communication. Computers in Biology and Medicine 2024, 169, 107874.
35. Arditi, D.; Gilles, I.; Lesage, S.; Griesser, A.-C.; Bienvenu, C.; et al. Computer-assisted textual analysis of free-text comments in the Swiss Cancer Patient Experiences (SCAPE) survey. BMC Health Serv Res 2020, 20, 1029.
36. Simoulin, A.; Thiebaut, N.; Neuberger, K.; Ibnouhsein, I.; Brunel, N.; Viné, R.; Bousquet, N.; Latapy, J.; Reix, N.; Molière, S.; Lodi, M.; Mathelin, C. From free-text electronic health records to structured cohorts: Onconum, an innovative methodology for real-world data mining in breast cancer. Computer Methods and Programs in Biomedicine 2023, 240, 107693.
37. Xin, J.; Hjaltelin, J. X.; Novitski, S. I.; Jørgensen, I. F.; Siggaard, T.; Vulpius, S. A.; Westergaard, D.; Johansen, J. S.; Chen, I. M.; Jensen, L. J.; Brunak, S. Pancreatic cancer symptom trajectories from Danish registry data and free text in electronic health records. eLife 2023, 12, e84919.
38. Elbers, D. C.; La, J.; Minot, J. R.; Gramling, R.; Brophy, M. T.; Do, N. V.; Fillmore, N. R.; Dodds, P. S.; Danforth, C. M. Sentiment analysis of medical record notes for lung cancer patients at the Department of Veterans Affairs. PLOS ONE 2023, 18(1), e0280931.
39. Leis, A.; Casadevall, D.; Albanell, J.; Posso, M.; Macià, F.; Castells, X.; Ramírez-Anguita, J. M.; Martínez Roldán, J.; Furlong, L. I.; Sanz, F.; Ronzano, F.; Mayer, M. A. Exploring the Association of Cancer and Depression in Electronic Health Records: Combining Encoded Diagnosis and Mining Free-Text Clinical Notes. JMIR Cancer 2022, 8(3), e39003.
40. Mashima, Y.; Tamura, T.; Kunikata, J.; Tada, S.; Yamada, A.; Tanigawa, M.; Hayakawa, A.; Tanabe, H.; Yokoi, H. Using Natural Language Processing Techniques to Detect Adverse Events From Progress Notes Due to Chemotherapy. Cancer Informatics 2022, 21, 11769351221085064.
41. Jensen, K.; Soguero-Ruiz, C.; Mikalsen, K. Ø.; Lindsetmo, R.-O.; Kouskoumvekaki, I.; Girolami, M.; Skrovseth, S. O.; Augestad, K. M. Analysis of free text in electronic health records for identification of cancer patient trajectories. Scientific Reports 2017, 7, 46226.
42. Feldman, K.; Hazekamp, N.; Chawla, N. V. Mining the Clinical Narrative: All Text Are Not Equal. 2016 IEEE International Conference on Healthcare Informatics (ICHI), 2016; pp. 306–314.
43. Ganesan, K.; Lloyd, S.; Sarkar, V. Discovering Related Clinical Concepts Using Large Amounts of Clinical Notes. Biomedical Engineering and Computational Biology 2016, 7 (Suppl 2), 27–33.
44. Falotico, R.; Liberati, C.; Zappa, P. Identifying Oncological Patient Information Needs to Improve e-Health Communication: a preliminary text-mining analysis. Quality and Reliability Engineering International 2015, 31(7), 1115–1126.
45. Hamid, M. S.; Valicevic, A.; Brenneman, B.; Niziol, L. M.; Stein, J. D.; Newman-Casey, P. A. Text Parsing-Based Identification of Patients with Poor Glaucoma Medication Adherence in the Electronic Health Record. American Journal of Ophthalmology 2020, 222, 54–59.
46. Zhu, V. J.; Lenert, L. A.; Bunnell, B. E.; Obeid, J. S.; Jefferson, M.; Hughes Halbert, C. Automatically identifying social isolation from clinical narratives for patients with prostate cancer. BMC Medical Informatics and Decision Making 2019, 19, 43.
47. Datta, S.; Bernstam, E. V.; Roberts, K. A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. Journal of Biomedical Informatics 2019, 100, 103301.


| Study | Corpus Size | Note Types | Language Features |
|---|---|---|---|
| Masukawa et al. (2022) [1] | 1,200 | Palliative care notes | Japanese; spiritual pain terms |
| Tamang et al. (2015) [10] | 5,000+ | Progress notes | Pain-related expressions |
| Himmelstein et al. (2022) [12] | 10,000+ | Admission notes | Stigma lexicon, U.S. English |
| Rahimian et al. (2019) [15] | 3,000 | Oncology notes (OpenNotes) | n-gram frequency, formal tone |
| Heckemann et al. (2022) [20] | 500 | Discharge letters | Person-centred phrasing |
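The "n-gram frequency" feature attributed to Rahimian et al. [15] in the table above refers to comparing how often word sequences occur in notes written before versus after patients gained access. The toy sketch below illustrates that kind of comparison; the two mini-corpora are invented stand-ins, and a real analysis would apply a significance test rather than raw count differences.

```python
from collections import Counter

def bigrams(text: str) -> Counter:
    """Count word bigrams in a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

# Invented mini-corpora standing in for pre- and post-OpenNotes notes.
pre = "patient noncompliant with chemo plan patient noncompliant"
post = "patient agreed to the plan we discussed the plan together"

pre_counts, post_counts = bigrams(pre), bigrams(post)

# Bigrams whose frequency shifted between the two corpora; negative
# values mark phrases that became rarer after patient access.
shifted = {g: post_counts[g] - pre_counts[g]
           for g in set(pre_counts) | set(post_counts)
           if post_counts[g] != pre_counts[g]}
print(sorted(shifted.items(), key=lambda kv: kv[1]))
```

`Counter` returns 0 for missing keys, so bigrams unique to either corpus are handled without special-casing.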
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
