Submitted: 22 November 2023
Posted: 26 November 2023
Abstract
Keywords:
1. Introduction
2. Materials and Methods
Data Aggregation
Medical Concept Inference
- The “Classification Commune des Actes Médicaux” (CCAM), the French classification of medical procedures [8];
Information Consolidation Tool Using PMSI Data
Pivot Model for Diverse Data Sources
- Partial data: we often have information about the morphology of a tumor but nothing about the initial diagnosis or its location, or a treatment response without the treatment involved.
- Concept links: the pivot model must preserve, when applicable, the links between the detected concepts (e.g. the link between a metastasis and the primary tumor).
- Volumetry and redundancy: Consore might detect or receive thousands of concepts, often redundant, for a single patient.
- Data sourcing: for each data item, we need to identify its source and the date it was recorded in the information system.

To organise and classify all the identified concepts, we developed a common model that describes the cancer disease through several main hierarchical classes (or layers): Cancers (all cancer recurrences for a given patient), Tumor events (primary tumor, local or metastatic relapse), Acts (treatments and/or analyses), and Documents (all the documents of a patient or available biological samples) (Figure 1).
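The layered model can be pictured as a set of nested records. The Python sketch below is only an illustration under stated assumptions: the class and field names (`Provenance`, `TumorEvent`, `related_to`, and so on) are hypothetical and are not taken from Consore. They simply mirror the four layers and the constraints listed above (partial data, concept links, data sourcing).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Provenance:
    """Data sourcing: where an item came from and when it was recorded."""
    source: str          # hypothetical labels, e.g. "PMSI" or "pathology report"
    recorded_on: date


@dataclass
class Document:
    document_id: str
    provenance: Provenance


@dataclass
class Act:
    kind: str                      # "treatment" or "analysis"
    code: Optional[str]            # e.g. a CCAM code, when one is known
    provenance: Provenance
    documents: List[Document] = field(default_factory=list)


@dataclass
class TumorEvent:
    kind: str                      # "primary", "local_relapse" or "metastatic_relapse"
    provenance: Provenance
    # Partial data: every clinical attribute is optional.
    morphology: Optional[str] = None
    topography: Optional[str] = None
    # Concept link: a metastasis may point back to its primary tumor.
    related_to: Optional["TumorEvent"] = None
    acts: List[Act] = field(default_factory=list)


@dataclass
class Cancer:
    patient_id: str
    tumor_events: List[TumorEvent] = field(default_factory=list)


# Illustrative values only; no real patient data.
primary = TumorEvent(
    kind="primary",
    provenance=Provenance("pathology report", date(2021, 3, 1)),
    morphology="8500/3",
    topography="C50.9",
)
metastasis = TumorEvent(
    kind="metastatic_relapse",
    provenance=Provenance("PMSI", date(2022, 5, 24)),
    topography="C22.0",            # morphology unknown: left as None
    related_to=primary,
)
cancer = Cancer(patient_id="patient-001", tumor_events=[primary, metastasis])
```

Keeping a `Provenance` on every item is one way to tell redundant mentions of the same concept apart by source and date. How Consore actually deduplicates them is not described here.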
Data Inference and Structuring: an Illustration through Metastasis Structuring
- Retrieve all detected occurrences of the concepts “metastasis” and “relapse” that are located at a distance from the primary tumor within the patient’s dataset. For clarity, both types of occurrence are referred to below as “metastasis”.
- Sort these concepts by date, using either the date provided by the algorithm (e.g., “metastasis diagnosed on 24 May 2022”) or, where no relevant date is found in the report, the date of the document itself.
- Determine the relevant date within the corpus of metastasis concepts. This involves defining a heuristic time interval (for example 3 months, 6 months or a year), starting from the first occurrence of a metastasis.
- Assign a weighting factor to each occurrence, considering the data source or the relevance of its associated date.
- Sum the weights of the concepts falling within the defined interval to obtain the interval’s total weight.
- If the interval’s weight exceeds a predefined empirical threshold, the start date of that interval is taken as the onset of the metastasis. If not, the process moves to the next interval, checks the same condition, and repeats until a date is determined.
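The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the Consore implementation. The source weights, the penalty applied when the date falls back to the document date, the 180-day window and the threshold of 1.5 are all hypothetical values. The sketch also assumes that "the next interval" starts at the next occurrence in date order.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical weights per data source; the real values are empirical.
SOURCE_WEIGHTS = {"pmsi": 1.0, "pathology_report": 0.8, "free_text": 0.4}


@dataclass(frozen=True)
class Occurrence:
    source: str
    document_date: date
    extracted_date: Optional[date] = None   # date found in the text, if any

    @property
    def effective_date(self) -> date:
        # Fall back to the document date when no date was extracted.
        return self.extracted_date or self.document_date

    @property
    def weight(self) -> float:
        base = SOURCE_WEIGHTS.get(self.source, 0.2)
        # Assumption: a fallback date is less reliable, so halve the weight.
        return base if self.extracted_date else base * 0.5


def metastasis_onset(
    occurrences: List[Occurrence],
    window: timedelta = timedelta(days=180),
    threshold: float = 1.5,
) -> Optional[date]:
    """Return the start date of the first window whose total weight exceeds the threshold."""
    ordered = sorted(occurrences, key=lambda o: o.effective_date)
    for i, first in enumerate(ordered):
        start = first.effective_date
        end = start + window
        total = sum(o.weight for o in ordered[i:] if o.effective_date < end)
        if total > threshold:
            return start
    return None   # no window was heavy enough to settle on a date


# Illustrative mentions: one isolated early mention, then a consistent cluster.
mentions = [
    Occurrence("free_text", date(2020, 1, 10)),
    Occurrence("pmsi", date(2022, 5, 30), extracted_date=date(2022, 5, 24)),
    Occurrence("pathology_report", date(2022, 6, 2)),
    Occurrence("free_text", date(2022, 7, 5), extracted_date=date(2022, 7, 1)),
]
onset = metastasis_onset(mentions)
```

In this example the isolated 2020 mention does not reach the threshold on its own, so the onset is taken from the first window that gathers enough corroborating mentions.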
Performance of the Consore tool
3. Results
| French Cancer Centres | Number of patients | Number of patients with at least one cancer | Number of patients with a metastatic relapse | Number of medical records |
|---|---|---|---|---|
| Institut Curie* | 572 421 | 280 924 | 95 025 | 13 431 874 |
| Centre Léon Bérard* | 359 634 | 207 657 | 85 210 | 18 711 561 |
| Institut Paoli-Calmettes* | 347 415 | 136 500 | 43 767 | 4 464 580 |
| Gustave Roussy* | 399 665 | 237 132 | 96 074 | 12 856 023 |
| Institut de Cancérologie de l’Ouest | N/A (deployment in progress) | N/A | N/A | N/A |
| Centre Oscar Lambret* | 182 436 | 118 506 | 57 784 | 5 865 404 |
| Institut du Cancer de Montpellier* | 176 257 | 79 601 | 34 138 | 3 401 825 |
| Centre Georges-François Leclerc* | 282 948 | 79 592 | 36 635 | 3 207 721 |
| Centre Jean Perrin* | 397 179 | 124 080 | 44 548 | 2 776 005 |
| Institut Bergonié* | 285 129 | 153 589 | 52 290 | 3 806 476 |
| Institut de Cancérologie de Lorraine** | 247 869 | 63 350 | 19 105 | 1 096 485 |
Cancer of unknown primary
Results at the “Centre Léon Bérard” (CLB)
- Recall = 99%;
- Precision = 57%;
- F1-score = 0.66.
- Inverse recall (the true negative rate) could not be calculated, because manually reviewing all EMRs to identify true negatives is not feasible.
Results at the “Institut Curie” (IC)
To assess performance, recall, precision and F1-score were calculated as follows:
- Recall = 94%;
- Precision = 56%;
- F1-score = 0.7.
- Inverse recall (the true negative rate) could not be calculated, because manually reviewing all EMRs to identify true negatives is not feasible.
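For reference, the three metrics can be derived from the counts of true positives, false positives and false negatives alone. The sketch below assumes the standard definitions, with F1 as the harmonic mean of precision and recall. The formulas actually used are not shown in this text, so the authors' exact computation may differ. The function name and the counts are hypothetical and are not the study's data.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F1 from confusion-matrix counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)          # share of detected cases that are correct
    recall = tp / (tp + fn)             # share of true cases that were detected
    f1 = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and recall
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall, f1 = precision_recall_f1(tp=90, fp=60, fn=10)
```

None of these formulas involves true negatives, which is why they remain computable when inverse recall is not.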
4. Discussion
5. Conclusions
Author Contributions
Acknowledgments
References
- Ferlay, J.; Ervik, M.; Lam, F.; Colombet, M.; Mery, L.; Piñeros, M. Global Cancer Observatory: Cancer Today. Lyon: International Agency for Research on Cancer. 20 May 2020. Available online: https://gco.iarc.fr/today (accessed on May 2023).
- Hanahan, D. Hallmarks of Cancer: New Dimensions. Cancer Discov. 2022, 12, 31–46.
- Lainé, A.; Hanvic, B.; Ray-Coquard, I. Importance of guidelines and networking for the management of rare gynecological cancers. Curr Opin Oncol. 2021, 33, 442–446.
- Wilke, R.A.; Berg, R.L.; Peissig, P.; Kitchner, T.; Sijercic, B.; McCarty, C.A. Use of an electronic medical record for the identification of research subjects with diabetes mellitus. Clinical Medicine & Research 2007, 5, 1–7.
- Hersh, W.R.; Weiner, M.G.; et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013.
- Wilkinson, M.; Dumontier, M.; Aalbersberg, I.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016, 3, 160018.
- Guérin, J.; Laizet, Y.; Le Texier, V.; Chanas, L.; Rance, B.; Koeppel, F.; Lion, F.; Gourgou, S.; Martin, A.L.; Tejeda, M.; Toulmonde, M.; Cox, S.; Hess, E.; Rousseau-Tsangaris, M.; Jouhet, V.; Saintigny, P. OSIRIS: A Minimum Data Set for Data Sharing and Interoperability in Oncology. JCO Clin Cancer Inform. 2021, 5, 256–265.
- CCAM. 20 October. Available online: https://sante.gouv.fr/professionnels/gerer-un-etablissement-de-sante-medico-social/financement/financement-des-etablissements-de-sante-10795/financement-des-etablissements-de-sante-glossaire/article/classification-commune-des-actes-medicaux-ccam (accessed on October 2023).
- World Health Organization. (2004). ICD-10: international statistical classification of diseases and related health problems: tenth revision, 2nd ed. World Health Organization.
- Fritz, A.; Percy, C.; Jack, A.; Shanmugaratnam, K.; Sobin, L.; Parkin, D.M.; Whelan, S. International Classification of Diseases for Oncology. Third edition. First Revision, World Health Organization, Geneva, 2013.
- Fraser, A.; Marcu, D. Measuring Word Alignment Quality for Statistical Machine Translation. Computational Linguistics 2007, 33, 293–303.
- Mandrekar, J.N. Receiver operating characteristic curve in diagnostic test assessment. J Thoracic Oncol. 2010, 5, 1315–1316.
- Vibert, J.; Pierron, G.; Benoist, C.; Gruel, N.; Guillemot, D.; Vincent-Salomon, A.; Le Tourneau, C.; Livartowski, A.; Mariani, O.; Baulande, S.; Bidard, F.C.; Delattre, O.; Waterfall, J.J.; Watson, S. Identification of Tissue of Origin and Guided Therapeutic Applications in Cancers of Unknown Primary Using Deep Learning and RNA Sequencing (TransCUPtomics). J Mol Diagn. 2021, 23, 1380–1392.
- Heudel, P.; Favier, B.; Solodky, M.L.; et al. Survival and risk of COVID-19 after SARS-COV-2 vaccination in a series of 2391 cancer patients. Eur J Cancer. 2022, 165, 174–183.
- Health data Hub. Available online: https://www.health-data-hub.fr/page/faq-english (accessed on May 2023).
- Health data Hub, UNIBASE results. Available online: https://www.health-data-hub.fr/annonce-laureats-unibase (accessed on May 2023; French version only).
- OHDSI. Available online: https://www.ohdsi.org/data-standardization/the-common-data-model/ (accessed on May 2023).
- Garcelon, N.; Neuraz, A.; Salomon, R.; Faour, H.; Benoit, V.; Delapalme, A.; Munnich, A.; Burgun, A.; Rance, B. A clinician friendly data warehouse oriented toward narrative reports: Dr. Warehouse. J Biomed Inform. 2018, 80, 52–63.
- Madec, J.; Bouzillé, G.; Riou, C.; Van Hille, P.; Merour, C.; Artigny, M.L.; Delamarre, D.; Raimbert, V.; Lemordant, P.; Cuggia, M. eHOP Clinical Data Warehouse: From a Prototype to the Creation of an Inter-Regional Clinical Data Centers Network. Stud Health Technol Inform. 2019, 264, 1536–1537.
- CancerLinq. Available online: https://www.cancerlinq.org/ (accessed on May 2023).
- Flatiron. Available online: https://flatiron.com/ (accessed on May 2023).
- Gilson, A.; Safranek, C.W.; Huang, T.; Socrates, V.; Chi, L.; Taylor, R.A.; Chartash, D. How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ. 2023, 9, e45312.




Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).