Preprint Article · Version 2 · Preserved in Portico · This version is not peer-reviewed

Preliminary Analysis of COVID-19 Academic Information Patterns: A Call for Open Science in the Times of Closed Borders

Version 1 : Received: 30 March 2020 / Approved: 31 March 2020 / Online: 31 March 2020 (04:38:53 CEST)
Version 2 : Received: 21 April 2020 / Approved: 22 April 2020 / Online: 22 April 2020 (06:15:34 CEST)

A peer-reviewed article of this Preprint also exists.


Introduction: The pandemic of COVID-19, an infectious disease caused by SARS-CoV-2, motivated the scientific community to work together to gather, organize, process, and distribute data on the novel biomedical hazard. Here, we analyzed how the scientific community responded to this challenge by quantifying distribution and availability patterns of academic information related to COVID-19. The aim of our study was to assess the quality of information flow and scientific collaboration, two factors we believe to be critical for finding new solutions for the ongoing pandemic.

Materials and methods: The RISmed R package and a custom Python script were used to fetch metadata on articles indexed in PubMed and published on the Rxiv preprint server. Scopus was searched manually and the metadata were exported as a BibTeX file. Publication rate, publication status, affiliation and author count per article, and submission-to-publication time were analyzed in R. The Biblioshiny application was used to create a world collaboration map.

Results: Our preliminary data suggest that the COVID-19 pandemic resulted in the generation of a large amount of scientific data and point to potential problems regarding information velocity, availability, and scientific collaboration in the early stages of the pandemic. More specifically, our results indicate a precarious overload of the standard publication systems, significant problems with data availability, and apparently deficient collaboration.

Conclusion: We believe the scientific community could have used the data more efficiently to create proper foundations for finding new solutions for the COVID-19 pandemic. Moreover, we believe we can learn from this on the go and adopt open science principles and a more mindful approach to COVID-19-related data to accelerate the discovery of more efficient solutions.
We take this opportunity to invite our colleagues to contribute to this global scientific collaboration by publishing their findings with maximal transparency.
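As a rough illustration of one analysis named in the abstract, the sketch below computes submission-to-publication lag from received/published date pairs. This is a minimal Python stand-in (the study's actual analyses were performed in R), and the records shown are hypothetical values, not data from the study.

```python
from datetime import date

# Hypothetical (received, published-online) date pairs in the shape of the
# PubMed/Scopus metadata described above -- illustrative values only.
records = [
    (date(2020, 2, 3), date(2020, 2, 10)),
    (date(2020, 2, 20), date(2020, 3, 1)),
    (date(2020, 3, 5), date(2020, 3, 7)),
]

def submission_to_publication_days(pairs):
    """Return the submission-to-publication lag, in days, for each article."""
    return [(published - received).days for received, published in pairs]

lags = submission_to_publication_days(records)
median_lag = sorted(lags)[len(lags) // 2]  # crude median for an odd-length list
print(lags)        # per-article lag in days
print(median_lag)  # central tendency of the lags
```

With the hypothetical records above, the lags come out to 7, 10, and 2 days, with a median of 7.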

Supplementary and Associated Material


Keywords: COVID-19; open science; data; bibliometric; pandemic


Subject: Social Sciences, Library and Information Sciences

Comments (3)

Comment 1
Received: 22 April 2020
Commenter: Jan Homolak
Commenter's Conflict of Interests: Author
Comment: In the second version of our manuscript, we included some additional analyses and updated the description of our methodology. In the previous version, the search phrases were available only in the R code made freely available on our GitHub account. However, some colleagues kindly suggested we also include them in the paper, so we added the whole search strategy in the form of a table listing all search phrases used. Moreover, two additional tables were added (Table 2: Affiliation count per article; Table 3: Author count per article). We included additional controls for our analyses to further reduce the possibility of biased data (e.g., the additional controls in the ver2 Fig 2). We created a new GitHub repository for all data used in this manuscript. We also included the data in .csv form, as some colleagues suggested this might be better for users who wish to explore our data without using R. Even though we exported most of the data and made it R-independent, some bits of data could not be exported this way because of the specific data formats used in our analyses. For this reason, both formats of our complete dataset are now available in the new GitHub repository linked in the article.
+ Respond to this comment
Comment 2
Received: 26 April 2020
Commenter: Anne Rosemary Tate
The commenter has declared there is no conflict of interests.
Comment: I commend this article and have written a review.
Authors, would it be possible to modify your search tool to
1. Identify medRxiv preprints that have uploaded an EQUATOR checklist.
2. Identify preprints that state the type of study in the title or abstract ("Indicate the study's design with a commonly used term in the title or the abstract.").

I ask because most authors seem to ignore both, even authors from very reputable groups.
It would be good if we could persuade medRxiv to make these mandatory, as this would not only improve quality but would also help reviewers like me.
+ Respond to this comment
Response 1 to Comment 2
Received: 29 April 2020
The commenter has declared there is no conflict of interests.
Comment: Thank you very much for your feedback! I wholeheartedly agree, and I believe that everyone (researchers, editors, reviewers...) would benefit from all papers including a standardised metadata form containing structured details about the study's design, such as the methods used, sample sizes, etc. This would enable, or at least greatly facilitate, more extensive and in-depth analyses. I believe that the development and wide adoption of a universally satisfactory standard are the greatest, but hopefully not insurmountable, obstacles. Versioning and continuous development and improvement of such a standard are likely a necessity in this day and age.
The EQUATOR checklists seem like an excellent tool, and indeed the medRxiv submission page states that "[a]uthors must submit the appropriate research reporting checklist as suggested by the EQUATOR network as supplementary files." From my cursory glance, few of them do, at least not properly attached as a precisely named supplement. A functionality that examines this by looking for the names of EQUATOR checklists in supplement names could likely be added to our tool, with inaccurate or nondescriptive names of supplementary material being a potential source of false negatives. It is an excellent suggestion and we will definitely explore adding this functionality!
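The supplement-name check described above could be sketched roughly as follows. This is a hypothetical illustration, not the authors' tool: the checklist list is non-exhaustive, the file names are invented, and the plain case-insensitive substring match can produce false positives for short names (which is why a name like "CARE" is omitted here).

```python
# A non-exhaustive list of common EQUATOR reporting-checklist names.
EQUATOR_CHECKLISTS = [
    "CONSORT", "STROBE", "PRISMA", "STARD", "SPIRIT", "TRIPOD", "ARRIVE",
]

def find_checklists(supplement_names):
    """Return the EQUATOR checklist names mentioned in supplement file names.

    Uses a case-insensitive substring match, so nondescriptive supplement
    names still yield false negatives, as noted above.
    """
    found = set()
    for name in supplement_names:
        for checklist in EQUATOR_CHECKLISTS:
            if checklist.lower() in name.lower():
                found.add(checklist)
    return found

# Hypothetical supplement file names attached to a preprint.
supplements = ["STROBE_checklist.pdf", "supplementary_table_1.xlsx"]
print(find_checklists(supplements))  # {'STROBE'}
```

A preprint whose checklist is attached under a generic name such as "supplement_2.pdf" would be missed entirely, which is exactly the false-negative source mentioned above.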
Obtaining detailed data about the studies themselves is unfortunately quite tricky. For instance, using a pure keyword search, it is impossible to know whether a study mentioning western blots uses the method, explores it, suggests using it, criticises it, or something else entirely. This requires natural language processing (NLP), which tries to infer this information from context clues. However, NLP models need to be painstakingly trained and thoroughly tested for any specific purpose to achieve reasonable accuracy. Note, however, that I am far from an expert in the field of NLP. Your second suggestion would definitely provide very useful information, but its implementation is unfortunately out of our reach for the foreseeable future.
In the end, I would like to thank you once again for your suggestions and your input; it is of immense help. :)
I will conclude with an unfortunately painfully relevant xkcd comic on standards.

Best regards,
