Submitted: 10 January 2024
Posted: 10 January 2024
Abstract
Keywords:
Introduction
Barriers to Data Reuse and Recommendations to Overcome Them
Data Quality Standards as a Solution
On the Road to Complete Metadata through Incentives
Towards Interoperability via Data Formatting
Bridging the Data Availability Gap: A Role for All Stakeholders
Data Ownership and Sharing Requirements
Resource Availability and User Skill Level
The Importance and Benefits of Equity and Inclusion in Agricultural Data Reuse
The Future of Data Reuse Is Bright
Conclusions
Contributions
Acknowledgements
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).