Preprint Article, Version 1 (preserved in Portico). This version is not peer-reviewed.

The Dataharmonizer: a Tool for Faster Data Harmonization, Validation, Aggregation, and Analysis of Pathogen Genomics Contextual Information

Version 1: Received: 23 June 2022 / Approved: 24 June 2022 / Online: 24 June 2022 (08:46:04 CEST)

How to cite: Gill, I.; Griffiths, E.; Dooley, D.; Cameron, R.; Savić Kallesøe, S.; John, N.S.; Sehar, A.; Gosal, G.; Alexander, D.; Chapel, M.; Croxen, M.; Delisle, B.; Di Tullio, R.; Gaston, D.; Duggan, A.; Guthrie, J.; Horsman, M.; Joshi, E.; Kearney, L.; Knox, N.; Lau, L.; LeBlanc, J.; Li, V.; Lyons, P.; MacKenzie, K.; McArthur, A.; Panousis, E.; Palmer, J.; Prystajecky, N.; Smith, K.; Tanner, J.; Townend, C.; Tyler, A.; Van Domselaar, G.; Hsiao, W. The Dataharmonizer: a Tool for Faster Data Harmonization, Validation, Aggregation, and Analysis of Pathogen Genomics Contextual Information. Preprints 2022, 2022060335. https://doi.org/10.20944/preprints202206.0335.v1

Abstract

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations, and research. To make use of pathogen genomics data, they must be interpreted using contextual data (metadata), which include sample metadata, laboratory methods, patient demographics, clinical outcomes, and epidemiological information. However, variability in how different authorities capture contextual information, and in how different databases encode it, poses challenges for data interpretation, integration, use, and reuse. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating, and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data-sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. To support the coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for worldwide use. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer improves the effectiveness and fidelity of contextual data capture, as well as the subsequent usability of those data. Harmonizing contextual information across authorities, platforms, and systems globally improves the interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies.
Although the tool was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
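The template-driven validation and export workflow described in the abstract can be illustrated with a minimal sketch. It assumes that a template is simply a list of field specifications (a required flag, an optional picklist of allowed values, an optional date check, and a target column name for export). The field names, picklist values, and export column names below are hypothetical and are not taken from the CanCOGeN or PHA4GE specifications, and this Python code is not the DataHarmonizer's own implementation, which runs as JavaScript in the browser. Python 3.9 or later is assumed.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FieldSpec:
    """One column of a hypothetical contextual-data template."""
    name: str
    required: bool = False
    picklist: Optional[set[str]] = None  # controlled vocabulary, if any
    is_date: bool = False                # value must be a real date written YYYY-MM-DD
    export_name: Optional[str] = None    # column name in a hypothetical submission format


# Hypothetical template: field names and allowed values are illustrative only.
TEMPLATE = [
    FieldSpec("specimen_collector_sample_id", required=True, export_name="sample_name"),
    FieldSpec("sample_collection_date", required=True, is_date=True, export_name="collection_date"),
    FieldSpec("host", picklist={"Human", "Bat", "Mink"}, export_name="host"),
]


def is_iso_date(value: str) -> bool:
    """True if value has the form YYYY-MM-DD and is a real calendar date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g. 2020-02-30
        return False
    return True


def validate_row(row: dict[str, str], template: list[FieldSpec]) -> list[str]:
    """Return human-readable problems found in one spreadsheet row."""
    errors = []
    for spec in template:
        value = row.get(spec.name, "").strip()
        if not value:
            if spec.required:
                errors.append(f"{spec.name}: required value missing")
            continue  # optional and empty: nothing else to check
        if spec.picklist is not None and value not in spec.picklist:
            errors.append(f"{spec.name}: '{value}' not in allowed values")
        if spec.is_date and not is_iso_date(value):
            errors.append(f"{spec.name}: '{value}' is not a valid YYYY-MM-DD date")
    return errors


def export_row(row: dict[str, str], template: list[FieldSpec]) -> dict[str, str]:
    """Rename template columns to the target format's column names."""
    return {spec.export_name or spec.name: row.get(spec.name, "").strip() for spec in template}


if __name__ == "__main__":
    row = {
        "specimen_collector_sample_id": "ABC-001",
        "sample_collection_date": "2020-03-15",
        "host": "human",  # lowercase, so it does not match the picklist term "Human"
    }
    print(validate_row(row, TEMPLATE))  # ["host: 'human' not in allowed values"]
    print(export_row(row, TEMPLATE))
```

Keeping the rules in data (the template) rather than in code is what lets the same validator and exporter serve different specifications; swapping in another template changes the checks and output columns without touching the functions.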

Supplementary and Associated Material

https://github.com/cidgoh/DataHarmonizer/releases: DataHarmonizer GitHub repository (download app)

Keywords

metadata; contextual data; harmonization; genomic surveillance; data management

Subject

Computer Science and Mathematics, Information Systems
