Version 1
Received: 23 December 2019 / Approved: 25 December 2019 / Online: 25 December 2019 (03:24:53 CET)
How to cite:
Sorokina, M.; Steinbeck, C. On the Redundancy of Natural Products Public Databases and Where to Find Data in 2020 - A Review on Natural Products Databases. Preprints 2019, 2019120332. https://doi.org/10.20944/preprints201912.0332.v1
APA Style
Sorokina, M., & Steinbeck, C. (2019). On the Redundancy of Natural Products Public Databases and Where to Find Data in 2020 - A Review on Natural Products Databases. Preprints. https://doi.org/10.20944/preprints201912.0332.v1
Chicago/Turabian Style
Sorokina, M. and Christoph Steinbeck. 2019 "On the Redundancy of Natural Products Public Databases and Where to Find Data in 2020 - A Review on Natural Products Databases" Preprints. https://doi.org/10.20944/preprints201912.0332.v1
Abstract
Natural products (NPs) have been at the centre of attention of the scientific community in recent decades, and interest in them continues to grow. As a consequence, the last 20 years have seen a rapid multiplication of databases and collections serving as general or thematic resources for NP information. In this review, we establish a complete overview of these resources, and the numbers are overwhelming: over 120 different NP databases and collections have been published and re-used since 2000. Of these, 98 are still accessible in some form, and only 50 are open access. The latter include not only databases but also large collections of NPs published as supplementary material in scientific publications, as well as collections that were backed up in the ZINC database for commercially available compounds. Some databases, even ones published relatively recently, are no longer accessible, which leads to a dramatic loss of data on NPs. The data sources are presented in this manuscript, together with a comparison of the content of the open ones. With this review, we also compiled the open-access natural compounds into a single dataset, the COlleCtion of Open NatUral producTs (COCONUT), which is available on Zenodo and contains structures and sparse annotations for over 400,000 non-redundant NPs, making it the biggest open collection of NPs available to date.
Biology and Life Sciences, Biochemistry and Molecular Biology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Received:
10 March 2020
Commenter:
Stephen Boyer
The commenter has declared there is no conflict of interests.
Comment:
Hi Chris & Maria,
Lutz Weber sent me a copy of this paper.
Great paper! This is a wonderful resource.
Any possibility of the following:
1) Get the database into Google's BigQuery (BQ) so that the data can be integrated with other open source scientific data. I already asked Lutz to ingest it, but it is always good to have the originator host it in BQ (I will likely get Google to cover any cost).
2) Get a copy of your spreadsheet.
Best
Steve
SKBoyer@google.com
(408) 858-5544
(Please don't send any confidential info...thx)