Submitted: 24 February 2023
Posted: 02 March 2023
Abstract
Keywords:
1. Introduction
2. Methods
3. Results
Development of Standardized Quality Control Tags
Best Practices for Use
- Providing the name of the method used for quality control is essential for interpreting the rest of the QC information. A method name should always be included; if no method name is provided, do not include any additional QC tags.
- Method names can be provided as the name of a pipeline or as a link to a GitHub repository. If multiple methods were used, list all of them, separated by semicolons.
- Method updates can substantially change their outputs, so the version of the method used for quality control should always be included.
- The method version can be expressed using whatever convention the developer has adopted (e.g., date, semantic versioning).
- If multiple methods were used, record the version numbers in the same order as the method names, separated by semicolons.
- If a pick list does not contain a desired value, submit a new term request to PHA4GE via the New Term Request form in the QC Tag GitHub repository issue tracker (described below under “Community Development and Maintenance”).
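Several of these practices can be checked mechanically before data are shared. The following is a minimal sketch, assuming a QC record is a plain Python dict keyed by the snake_case attribute names used in the attribute table; the function names `split_values` and `check_qc_record` and the wording of the messages are hypothetical and not part of the PHA4GE specification.

```python
from typing import Dict, List

# Hypothetical record keys, assumed to mirror the QC tag attribute names.
NAME = "quality_control_method_name"
VERSION = "quality_control_method_version"


def split_values(value: str) -> List[str]:
    """Split a semicolon-separated field into trimmed, non-empty items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def check_qc_record(record: Dict[str, str]) -> List[str]:
    """Return the problems found in a QC record; an empty list means none were found."""
    problems: List[str] = []
    names = split_values(record.get(NAME, ""))
    other_tags = [k for k, v in record.items() if k != NAME and v.strip()]

    # Without a method name, no other QC tag should be populated.
    if not names:
        if other_tags:
            problems.append("QC tags provided without a method name: " + ", ".join(other_tags))
        return problems

    # One version per method name, listed in the same order as the names.
    versions = split_values(record.get(VERSION, ""))
    if not versions:
        problems.append("no method version provided")
    elif len(versions) != len(names):
        problems.append(
            f"{len(names)} method name(s) but {len(versions)} version(s); "
            "list one version per method, in the same order"
        )
    return problems


if __name__ == "__main__":
    record = {NAME: "ncov-tools; C-WAP", VERSION: "1.2.3"}
    # Reports that two methods were named but only one version was given.
    print(check_qc_record(record))
```

Because names and versions are matched purely by position, this check can only detect a count mismatch; it cannot tell whether the versions were listed in the wrong order.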
Annotation Limitations and Considerations
Implementation of Standardized Quality Control Tags
Community Development and Maintenance
4. Discussion
5. Conclusion
Funding
Institutional Review Board Statement
Acknowledgements
Conflicts of Interest
List of Abbreviations
References

| Field* | Definition | Ontology ID | Data Type | Values | Example |
| --- | --- | --- | --- | --- | --- |
| quality control method name | The name of the method used to assess whether a sequence passed a predetermined quality control threshold. | GENEPIO:0100557 | String | No prescribed values | ncov-tools |
| quality control method version | The version number of the method used to assess whether a sequence passed a predetermined quality control threshold. | GENEPIO:0100558 | String | No prescribed values | 1.2.3 |
| quality control determination | The determination of a quality control assessment. | GENEPIO:0100559 | Enums | no quality control issues identified [GENEPIO:0100562]; sequence passed quality control [GENEPIO:0100563]; sequence failed quality control [GENEPIO:0100564]; minor quality control issues identified [GENEPIO:0100565]; sequence flagged for potential quality control issues [GENEPIO:0100566]; quality control not performed [GENEPIO:0100567] | sequence failed quality control [GENEPIO:0100564] |
| quality control issues | The reason contributing to, or causing, a low quality determination in a quality control assessment. | GENEPIO:0100560 | Enums | low quality sequence [GENEPIO:0100568]; sequence contaminated [GENEPIO:0100569]; low average genome coverage [GENEPIO:0100570]; low percent genome captured [GENEPIO:0100571]; read lengths shorter than expected [GENEPIO:0100572]; sequence amplification artifacts [GENEPIO:0100573]; low signal to noise ratio [GENEPIO:0100574]; low coverage of characteristic mutations [GENEPIO:0100575] | low average genome coverage [GENEPIO:0100570] |
| quality control details | The details surrounding a low quality determination in a quality control assessment. | GENEPIO:0100561 | String | No prescribed values | CT value of 39. Low viral load. Low DNA concentration after amplification. |
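Because each enumerated value pairs a term label with a GENEPIO identifier, a pick list can be handled as a simple lookup table. The sketch below assumes values are serialized as `label [GENEPIO:nnnnnnn]`, as in the Example column; the names `QC_DETERMINATION`, `format_term`, and `parse_term` are hypothetical helpers, not part of any published PHA4GE tooling.

```python
import re
from typing import Dict, Tuple

# Pick list for "quality control determination", copied from the table above.
QC_DETERMINATION: Dict[str, str] = {
    "no quality control issues identified": "GENEPIO:0100562",
    "sequence passed quality control": "GENEPIO:0100563",
    "sequence failed quality control": "GENEPIO:0100564",
    "minor quality control issues identified": "GENEPIO:0100565",
    "sequence flagged for potential quality control issues": "GENEPIO:0100566",
    "quality control not performed": "GENEPIO:0100567",
}

# Matches values such as "sequence failed quality control [GENEPIO:0100564]".
TERM_PATTERN = re.compile(r"^(?P<label>.+?)\s*\[(?P<curie>GENEPIO:\d{7})\]$")


def format_term(label: str, picklist: Dict[str, str]) -> str:
    """Render a pick-list label as 'label [ID]', rejecting labels not in the list."""
    if label not in picklist:
        # Values missing from the pick list call for a new term request instead.
        raise ValueError(f"not in pick list: {label!r}")
    return f"{label} [{picklist[label]}]"


def parse_term(value: str, picklist: Dict[str, str]) -> Tuple[str, str]:
    """Split 'label [ID]' and check that the label and ID agree with the pick list."""
    match = TERM_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"expected 'label [GENEPIO:nnnnnnn]', got {value!r}")
    label, curie = match.group("label"), match.group("curie")
    if picklist.get(label) != curie:
        raise ValueError(f"label and ontology ID do not match: {value!r}")
    return label, curie
```

Checking the label against the identifier catches a common copy-and-paste error, where a correct label is submitted with the ID of a neighbouring term.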

| Attribute name | Description | Guidance | Term lists |
| --- | --- | --- | --- |
| quality_control_method_name | Name of quality control pipeline, software, or method | Populate using a term from the picklist | GalaxyTrakr SSQuAWK; CFSAN Wastewater Analysis Pipeline (C-WAP) |
| quality_control_method_version | Version number | | |
| quality_control_determination | User determined assessment of data quality | Populate using a term from the picklist | No quality control issues identified; minor quality control issues identified; sequence flagged for potential quality control issues; not performed |
| quality_control_issues | Quality control issues relevant for the project or data type | Populate using a term from the picklist | Low quality sequence; sequence contaminated; low average genome coverage; low % genome captured; read lengths shorter than expected; sequence amplification artifacts; low signal to noise ratio; low coverage of characteristic mutations |
| quality_control_details | Free text attribute capturing custom entry | Free text entry | None |
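The attribute names above are the field labels from the first table written in snake_case, so the two can be converted mechanically. The sketch below writes QC records as tab-separated attribute columns; the sheet layout and the names `QC_FIELDS`, `to_attribute_name`, and `qc_rows_to_tsv` are assumptions for illustration, not an official submission template.

```python
import csv
import io
from typing import Dict, List

# Field labels from the first table; attribute names are these labels in snake_case.
QC_FIELDS: List[str] = [
    "quality control method name",
    "quality control method version",
    "quality control determination",
    "quality control issues",
    "quality control details",
]


def to_attribute_name(field: str) -> str:
    """Convert a label such as 'quality control issues' to 'quality_control_issues'."""
    return "_".join(field.lower().split())


def qc_rows_to_tsv(rows: List[Dict[str, str]]) -> str:
    """Write QC records keyed by field label as tab-separated attribute columns."""
    columns = [to_attribute_name(field) for field in QC_FIELDS]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing fields are written as empty cells; unknown fields raise ValueError.
        writer.writerow({to_attribute_name(k): v for k, v in row.items()})
    return buffer.getvalue()
```

Leaving unpopulated attributes empty, rather than inventing placeholder values, keeps the output consistent with the guidance that optional tags are simply omitted when they do not apply.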

| Data Provider | Surveillance Network | SRA Accession | Run Record |
| --- | --- | --- | --- |
| Washington State Department of Health | GenomeTrakr wastewater project | SRR21205381 | SRR21205381 Run Record |
| US FDA, Center for Food Safety and Applied Nutrition | GenomeTrakr wastewater project | SRR19851129 | SRR19851129 Run Record |
| US FDA, Center for Food Safety and Applied Nutrition | GenomeTrakr wastewater project | SRR20046849 | SRR20046849 Run Record |
| New Jersey Department of Agriculture | GenomeTrakr wastewater project | SRR20428498 | SRR20428498 Run Record |
| Texas Department of State Health Services | GenomeTrakr wastewater project | SRR20018633 | SRR20018633 Run Record |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).