Preprint Review Version 1 Preserved in Portico This version is not peer-reviewed

A Brief Guide to Big Data in Molecular Design: From Concepts and Definitions to Models

Version 1: Received: 5 December 2023 / Approved: 6 December 2023 / Online: 6 December 2023 (10:19:02 CET)

How to cite: Polanski, J. A Brief Guide to Big Data in Molecular Design: From Concepts and Definitions to Models. Preprints 2023, 2023120387. https://doi.org/10.20944/preprints202312.0387.v1

Abstract

How crucial is big data in contemporary molecular design? In this publication, we elucidate fundamental concepts and terminology in this field and critically address overlooked issues. We examine the size, accessibility, quality, and structure of big data, along with the primary methods used for their analysis. For chemical compounds, properties and descriptors represent two distinct data types, which form the basis for categorizing molecular big data. The primary objective of chemistry is property production: we search for novel drugs or materials rather than for chemical compounds as such, and big data is central to this philosophy. The increasing availability of data in computer-aided technology propels advances in artificial intelligence (AI), machine learning (ML), and deep learning (DL). Accordingly, a broad chemical audience must comprehend these methods to understand data-centered chemistry. We therefore aim to systematize big data issues through a simple illustrative framework with fundamental descriptor categories: coding, computer-generated descriptors, and property correlates. Although we employ computer-generated descriptors as big data for predictions, measured data remain irreplaceable for achieving high-quality, reliable outcomes and for controlling molecular effects. The scarcity of property data remains a significant hurdle that limits comprehensive studies of structure-property relationships within big data. Accordingly, from a pragmatic standpoint, not-so-big data is an option for drug design. We also present a brief review of the recent big data literature.

Keywords

big data; not-so-big data; molecular design; drug design; machine learning; artificial intelligence; descriptor; computer-generated descriptors; property; regression

Subject

Chemistry and Materials Science, Medicinal Chemistry
