ARTICLE | doi:10.20944/preprints202210.0402.v1
Subject: Chemistry, Other
Keywords: QSAR; q-RASAR; random forest; machine learning; TiO2-based nanoparticles
Online: 26 October 2022 (07:36:50 CEST)
Read-Across Structure-Activity Relationship (RASAR) is an emerging cheminformatic approach that combines the strengths of QSAR modeling with similarity-based Read-Across predictions. In this work, we have generated a simple, interpretable, and transferable quantitative RASAR (q-RASAR) model that can efficiently predict the cytotoxicity of TiO2-based multi-component nanomaterials. The data set comprises 29 TiO2-based nanomaterials containing specific amounts of noble metal precursors in the form of Ag, Au, Pd, and Pt. The data set was rationally divided into training and test sets. The Read-Across hyperparameters were optimized using the training set data alone, and the optimized setting was then used to generate the Read-Across-based predictions for the test set with the tool Read-Across-v4.1, available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. The optimized hyperparameters and the similarity approach that yielded the best predictions were used to calculate the similarity- and error-based RASAR descriptors using the tool RASAR-Desc-Calc-v2.0, available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. These RASAR descriptors were then combined with the physicochemical descriptors and subjected to feature selection using the tool Best Subset Selection v2.1, available from https://dtclab.webs.com/software-tools. The final set of selected descriptors was used to develop multiple linear regression-based q-RASAR models, which were validated using stringent criteria in accordance with the OECD guidelines. Finally, a random forest model was also developed with the selected descriptors. The final machine learning model can efficiently predict the cytotoxicity of TiO2-based multi-component nanomaterials, surpassing previously reported models in prediction quality.
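The similarity-based prediction and descriptor-calculation steps described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the algorithm implemented in Read-Across-v4.1 or RASAR-Desc-Calc-v2.0: it assumes a Gaussian-kernel similarity, and the function names (`gaussian_similarity`, `read_across`), the parameters `k` and `sigma`, the three output descriptors, and the toy data are all hypothetical choices made for the example.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian-kernel similarity between two descriptor vectors (1.0 = identical)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def read_across(query, train_X, train_y, k=3, sigma=1.0):
    """Similarity-weighted read-across prediction for one query compound.

    Returns the prediction together with a few illustrative similarity- and
    dispersion-style descriptors computed from the k closest training compounds.
    The descriptor names are assumptions for this sketch only.
    """
    sims = [gaussian_similarity(query, x, sigma) for x in train_X]
    # Indices of the k most similar training compounds.
    top = sorted(range(len(train_X)), key=lambda i: sims[i], reverse=True)[:k]
    top_sims = [sims[i] for i in top]
    top_y = [train_y[i] for i in top]

    weights = top_sims
    total = sum(weights)
    if total == 0.0:
        # All similarities underflowed to zero: fall back to an unweighted mean.
        weights = [1.0] * len(top)
        total = float(len(top))

    # Weighted mean of the neighbours' responses is the read-across prediction.
    pred = sum(w * y for w, y in zip(weights, top_y)) / total
    # Weighted standard deviation of the neighbours' responses.
    var = sum(w * (y - pred) ** 2 for w, y in zip(weights, top_y)) / total

    return {
        "prediction": pred,
        "sd_activity": math.sqrt(var),
        "avg_similarity": sum(top_sims) / len(top_sims),
        "max_similarity": max(top_sims),
    }


# Toy data (invented for illustration): one descriptor per compound.
train_X = [[0.0], [2.0], [10.0]]
train_y = [1.0, 3.0, 9.0]
result = read_across([1.0], train_X, train_y, k=2, sigma=1.0)
```

In a q-RASAR-style workflow, values such as these would be computed for every compound and appended to the physicochemical descriptors before feature selection. In this sketch, `k` and `sigma` stand in for the hyperparameters that the abstract describes as being tuned on the training set only.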