Hibiscus sabdariffa anthocyanins are potential modulators of estrogen receptor alpha activity with favourable toxicology: a computational analysis using molecular docking, ADME/Tox prediction, 2D/3D QSAR and molecular dynamics simulation

Abstract The estrogen hormone receptor (ER) mediated gene expression mainly regulate the development and physiology of the primary and secondary reproductive system alongside bone-forming, metabolism and behaviour. Over-expressed ER is associated with several pathological conditions and play a crucial role in breast cancer occurrence, progression and metastasis. Hibiscus sabdariffa L. or roselle is a rich source of naturally occurring polyphenolic compounds that reportedly have robust estrogenic activity. However, the estrogen-like ingredient of the plant remains ambiguous. This study has screened a library of already recorded and less-explored compounds of Hibiscus sabdariffa for their estrogen receptor binding affinity and safety using a suite of computational methods that include protein-ligand docking, ADME and Toxicity prediction, and 2D/3D QSAR. The study revealed that the estrogen-receptor binding potential of Pelargonidin, Delphinidin, Cyanidin, and Hibiscetin are more efficient than popular breast cancer drugs, Tamoxifen and Raloxifene. Besides, the compounds exhibited favourable toxicological parameters with potent bioactivity towards binding ER-α subunit. Thus, these compounds can serve as prototypes for designing novel Selective Estrogen Receptor Modulator molecules with a few structural modifications. This is the first report exploring the phytochemical basis of estrogenic activity of Hibiscus sabdariffa L. Communicated by Ramaswamy H. Sarma


Introduction
The mammalian hormone estrogen or its biologically active form Estradiol is the primary female steroid hormone that binds to its intracellular receptors known as estrogen receptors (ER), with high affinity and specificity that trigger multiple physiological functions (Fuentes & Silveyra, 2019;Yaşar et al., 2017). The ER-mediated gene expression mainly regulates the primary and secondary reproductive system's development and physiology alongside bone-homeostasis, metabolism, and behaviour (Fuentes & Silveyra, 2019;Yaşar et al., 2017). Estrogen's biological activity is mainly mediated by its binding and activation of two nuclear receptors superfamily of transcription factors, namely ERa and ERb (Fuentes & Silveyra, 2019). Although ERa and ERb are characterized by highly conserved DNA-and ligand-binding domains with 97% homology; however, their activation can produce unique and overlapping effects (Guillette et al., 2018;Hua et al., 2018). While the breast cancer-promoting role of ERa is well-established, ERb is a putative tumour suppressor and considered a therapeutic biomarker for breast can risk assessment (Guillette et al., 2018;Hieken et al., 2015). In ER-positive breast cancer, ERa expression promotes tumour cell proliferation and metastasis, evident by the late onset of mammary tumorigenesis in ERa knock-out mice (Bocchinfuso & Korach, 1997). In contrast, ERb knock-out has negligible effects on mammary gland development (Couse & Korach, 1999). Although both the ER subtypes have a high affinity for their natural ligand (e.g. Estradiol), their affinities for other natural and synthetic ligands differ significantly (Guillette et al., 2018).
Overexpressed estrogen receptor was reported in approximately 75% of the breast tumour, promoting the growth and metastasis of the breast cancer cells (Hua et al., 2018;Spring et al., 2016). Also, mounting evidence demonstrates that overexpressed ERa is also associated with lung, ovarian, prostate and endometrial carcinogenesis (Bonkhoff, 2018;Hsu et al., 2017;Hua et al., 2018). Besides, several ERa-responsive genes, including Insulin-like growth factor 1 (IGF-1), cyclin D1, c-myc, and the estrogen-alpha-responsive finger protein (efp), are essential for the proliferation and survival of cancer cells (Hua et al., 2018). Thus hindering or modulating ERa activity is considered a crucial therapeutic approach for chemoprevention and diagnosis of ERa-mediated carcinogenesis (Hua et al., 2018). Several small-molecule inhibitors or modulators of estrogen receptors known as selective estrogen receptor modulators (SERM) are currently used in the treatment of breast cancer (Jameera Begam et al., 2017;Patel & Bihani, 2018). The popular breast cancer chemopreventive agents like tamoxifen, raloxifene, and toremifene are selective estrogen receptor modulator that binds primarily to ERa and acts as agonist or antagonist to estrogen to inhibit its proliferative actions on the mammary epithelium (Jameera Begam et al., 2017;Patel & Bihani, 2018). Although this is a favourable strategy to halt cancer progression; however, the toxicity associated with the synthetic molecules limits their use in chemotherapy (Ellis et al., 2015;Evans, 2018). Thus, plant-derived ER agonists or antagonists receive considerable attention for designing novel ERa-modulating compounds with low and selective toxicity (Mangalath & Sadasivan, 2015;Yang et al., 2019).
The plant Hibiscus sabdariffa L. (HS) or roselle is a rich source of naturally occurring phenolic compounds like alkaloids, flavonoids, anthocyanins, coumarins and phytosterols (Da-Costa-Rocha et al., 2014;Fitrotunnisa et al., 2019;Riaz & Chopra, 2018;Zhen et al., 2016). The alcoholic extracts of the plant showed mild to moderate estrogenic effects on the uteri of immature female rats. They exhibited significant effects on the circulating levels of Follicle-stimulating hormone (FSH), Prolactin, Estradiol and Testosterone in male Wistar rats (Ali et al., 2012;Saeed et al., 2013;Sirag, 2013). Previous studies reported a robust inhibitory effect of its extracts on the growth and survival of ER-positive breast cancer cells (MCF-7) and prostate cancer cells (LNCaP); however, the mechanism of inhibition was not well-depicted (Khaghani et al., 2011;Lin et al., May 2012). Although HSextracts reportedly alters the estrogen receptor's cellular localization, no significant progress has been made to understand the role of its phytochemicals underlying these activities (Erriquez et al., May 2017). Previously, we have recorded all the phytochemicals extracted from Hibiscus sabdariffa L. and reviewed their implication in breast cancer chemotherapy (Laskar & Mazumder, 2020). Based on our previous observation, we have selected twelve prominent phenolic compounds from Hibiscus sabdariffa L. (HSP) that could be a potential modulator of ERa activity. We approached computational techniques as it is often used to complement experimental findings that help to prioritize the class of compounds to be screened and dramatically reduces the quantity of compounds to screen.
The experimental compounds include the five major anthocyanins (Delphinidin, Cyanidin, Malvidin, Peonidin, Pelargonidin), five phytosterols (b-sitosterol, Stigmasterol, a-Spinasterol, Campesterol, Ergosterol) and two other phenolic compounds (Protocatechuic acid and Hibiscetin) which were selected to evaluate their affinity and specificity towards binding and/or modulating ERa activity using computational docking and molecular dynamics simulation analysis. Based on the result obtained from the protein-ligand interaction studies, the toxicology and drug-likeliness of the best-docked compounds were analysed, and the 2D/3D quantitative structure-activity relation (QSAR) models were generated to predict their biological activity range.

Preparation of experimental receptors (ERa) and ligands (HSP)
The validated x-ray diffraction structure of human estrogen receptor alpha (ERa) was retrieved from the RCSB Protein Databank in .pdb format having PDB_ID: 1X7R (2.00 Å resolution) (Manas et al., 2004). This particular protein structure was selected as it is already co-crystallized with one of the well-known plant-based ER-agonist, Genistein. The protein structure was energy-optimized using YASARA Energy Minimization Server (Krieger et al., 2009). The structure was validated for possible stereochemical clashes, missing atom/ residues, and other structural authenticity using the PDBsum server of EMBL-EBI (Laskowski et al., 2018), SAVES v6.0 (Colovos & Yeates, 1993) and ProSA-web (Wiederstein & Sippl, 2007). The validation steps are necessary as an unsubstantiated structure may provide unreliable results in the subsequent computational studies. The structures of twelve experimental compounds ( Figure 1) and the control compounds (tamoxifen, raloxifene, and 17b-estradiol) were retrieved from the NCBI PubChem database (Kim et al., 2016). The .SDF structures were converted to .MOL files using Open Babel 3.1.1, a freeware used for conversion of chemical formats (O'Boyle et al., 2011). The compounds' stereochemical clashes were removed by energy minimization to get optimized bond distance, bond angles, and set dihedrals using Merck molecular force field (MMFF) in VLIfeMDS 4.6 (VLifeMDSV R , 2018) software suite that provide multiple functionalities for drug discovery and design, developed by NovaLead Pharma. The VLIfeMDS 4.6 (VLifeMDSV R , 2018) suite was used for all 2D and 3D QSAR model generation & bioactivity prediction methods.

Molecular docking analysis
The receptor-ligand molecular docking is a dynamic tool for visualizing and understanding the interaction between a receptor protein and its ligand. The molecular docking analysis was performed using the licensed version of the FlexX TM v2.1.6 included in LeadIT TM package software (Sankt, 2020). First, the active pocket was defined using the binding site of the known co-crystallized reference ligand by considering the amino acid residues within a radius of 6.50 Å using the protein preparation wizard of the FlexX LeadIT. Further, the reference ligand and crystal water molecules in the binding pocket were removed from the target protein, as evidently, they did not play any role in the ligand-binding ability of ERa. The experimental ligands and the control ligands were then docked with the target proteins, followed by an analysis of hydrogen bonding parameters. FlexX primarily uses an incremental construction algorithm that considers compounds' physicochemical features and variable flexibility while fitting them in the rigid target protein's active site (Rarey et al., 1996). To cross-validate the FlexX results, re-docking was performed using a different docking algorithm in the SwissDock web-server (http://www.swissdock.ch/) that uses CHARMM-based software EADock-DSS that primarily uses genetic algorithm (Grosdidier et al., 2011). The strength of the predicted crystallographic binding orientations is expressed in terms of the energy function. The lowest energy score is considered the most stable conformation of the protein-ligand interaction (Rarey et al., 1996). For redocking, the energy-minimized ERa protein structure was loaded in the swiss-dock server, and the ligands were selected from the ZINC database (Sterling & Irwin, 2015). The SwissDock generates almost 500 docking poses for the protein-ligand complex, among which the best-docked pose with the lowest energy score was selected from the ViewDock plugin of UCSF Chimera (Pettersen et al., 2004).

Prediction of ADME/Tox profile
The ADME (Adsorption, Distribution, Metabolism and Excretion) and toxicity profiles of the best four hits were performed using PreADMET, a web server from Yonsei University, Republic of Korea (Lee et al., 2003). The PreADMET server uses in vitro results and considers several parameters for evaluating potential drug candidates' intestinal absorption. Among them, Caco2 (human colon adenocarcinoma cells) model and MDCK (Madin-Darby canine kidney) cell model has been recommended as a reliable in vitro model for the prediction of oral drug absorption (Zhao et al., 2001). The HIA (human intestinal absorption) data and skin permeability data can help predict and select between oral or transdermal drug delivery (Gupta & Rai, May 2017). Moreover, skin permeability is a risk assessment parameter for all chemicals that may accidentally come into contact with skin (Gupta & Rai, May 2017). The Plasma Protein Binding (PPB) data and Blood-Brain Barrier Penetration (BBB) data is helpful to predict the availability of drug molecule to interact with its pharmacological target after diffusion or transport across cell membranes and the probability of interfering with the central nervous system (Kingwell, 2016). PreADMET server uses a genetic functional approximation to select relevant 2 D descriptors, followed by Resilient back-propagation (Rprop) neural network analysis to predict ADME data. Several small molecules fail to attain a good drug-like property after confirming a promising ADME/Tox profile because of low bioavailability and poor pharmacokinetics. Thus, pharmacokinetic parameters like gastrointestinal absorption and brain access play a key role in determining the bioavailability and brain toxicity of molecules under investigation. The Brain or IntestinaL EstimateD permeation method (BOILED-Egg) is a powerful predictive model developed by Daina et al. (Daina & Zoete, 2016) in 2016 which counts on the lipophilicity (n-octanol/water partition coefficient or WLOGP) and polarity (Topological Polar Surface Area or TPSA) of small molecules to accurately predict their passive gastrointestinal absorption and brain penetration. The BOILED-Egg prediction, pharmacokinetics, drug-likeness and medicinal chemistry parameters of the best hits were performed using the SwissADME server (Daina et al., 2017) of the Swiss Institute of Bioinformatics. The required physicochemical descriptors were generated by providing the SMILE strings of the experimental molecules that translated into a graphical output of the model.
For toxicological analysis, ProTox-II (Banerjee et al., 2018), developed by Charit e -University of Medicine, and OECD QSAR Toolbox (Dimitrov et al., 2016), were used which predict rodent toxicity of small molecules by incorporating molecular similarity, pharmacophores fragment propensities and machine-learning. The training set data of the different toxicity prediction models at the ProTox-II server has an average of 80-90% accuracy, sensitivity and specificity, a summary of which is provided in the supplementary Table 3 (Banerjee et al., 2018). Further, these models were used to determine the rodent oral toxicity, hepatotoxicity, cytotoxicity, carcinogenicity, mutagenicity, immunotoxicity and effects on critical cellular pathways.

Biological data
The experimental biological data (IC 50 ) of 50 known agonist/ antagonist and inhibitors of the ERa subtype were collected from the BindingDB database (Gilson et al., 2016) and the DrugBank database (Wishart et al., 2018). Compounds that showed ERa inhibition/modulation in cell-based assays involving ER þ human ductal breast cancer cells (MCF7) with plant product-inspired scaffolds and two to five fused ring skeletons were considered for the QSAR model generation and validation. The series of compounds selected for the study have significant structural variations and differences in biological activity, with IC 50 values ranging from 0.01 to 56 nm, making the series ideal for building reliable QSAR models. Therefore, the biological activity (IC 50 ), i.e. the dependable variable in the study, was calculated as Log10 (IC 50 ) to reduce their numerical differences.

Molecular modelling and descriptor calculation
An Intel Core i5 8th Generation personal computer with Windows 10 workstation was used for all computations. First, the 3D structures of the compounds were retrieved from the NCBI PubChem database and energy-optimized to eliminate close atom contacts using a Merck molecular force field (MMFF) with MMFF charges in VLifeMDS V4.6. Then, 1000 cycles were run until a convergence (RMS gradient) condition of 0.1 was attained. Further, these sterically and geometrically optimized structures were used for building 2D and 3D QSAR models. After discarding invariable descriptors (values that are constant for all compounds), a total of 406 2D (physicochemical and topological) descriptors and 16417 3D (steric, electronic and hydrophobic) descriptors were calculated using VLifeMDS to build 2D and 3D-QSAR models, respectively. The molecular descriptors were used as the independent variables in the 2D and 3D QSAR model development.
For generating the 3D field descriptors, energy cut-offs of 10.0 and 30.0 kcal.mol À1 was used for electrostatic and steric field, respectively, using MMFF charges. The dielectric constant was set to 1.0, considering the distance-dependent dielectric function. A carbon atom with charge 1.0 was selected as the probe atom. The grid parameters used for calculating the 3D descriptors for both the known ERa agonist/antagonist and the experimental molecules are provided in the supplementary data.

Training and test set data
The fifty (50) known agonist/antagonist and inhibitors of the ERa subtype were divided into training (30) and test (20) compounds in a 3:2 ratio. The random sampling approach was used such that both training and test sets contain highly active, moderately active and less active compounds to spread the biological activity range. The similarity of the biological activity distribution pattern of the molecules in the training and test sets was estimated using statistical parameters such as mean, maximum, minimum and standard deviation. A satisfactory result in these statistical factors ensures a uniform distribution of the compounds from all bioactivity ranges (highly active to less active) in the training and test set.

2D-QSAR
Multiple linear regression (MLR) method. The MLR method, also known as the ordinary least square regression (OLS) method, is used to find a linear relationship between dependent and independent variables by applying the leastsquares curve fitting method (Zambre et al., 2015). The stepwise forward-backwards (SW-FB) variable selection protocol was used with a cross-correlation limit of 0.5, and the variance cut-off was set to 0 kcal.mol À1 Å À1 and scaling was set to auto-scaling using the Model Building Wizard of VLifeMDS software. F-test 'in' and 'out' was set to 4.0 and 3.99, respectively. The actual versus the predicted bioactivity of the compounds were generated from the resultant regression curve, which was calculated using the general regression equation for the MLR method is as follows: where Y is the dependable variable, 'b's are the regression coefficients for the corresponding independent variable 'x', and 'c' the regression constant (Zambre et al., 2015).
Partial least square (PLS) method. The MLR generated model was cross-validated using the PLS method, a modified regression method with the same input parameters. The PLS method is helpful to correlate the dependable variable with the number of descriptors when the descriptor's count exceeds the number of data points (compounds) and where the classical least squares method is less reliable. The PLS method estimates the matrix Y of the dependable variable, matrix X of the independent variable and maximizes the correlation between the matrices. The resulting regression equation for correlation includes the latent independent variables which correlate significantly with the bioactivity of the compounds.
2.4.5. 3D-QSAR Principal component regression (PCR) analysis. PCR is a modified regression procedure widely used in drug design and discovery. It is a powerful data reduction technique used in QSAR studies to eliminate outliers from a pool of compounds with calculated structural descriptors (Yoo & Shahlaei, 2018). PCR is a convenient method in QSAR model development when the data has high dimensionality and collinearity (Guven & Samkar, 2019). In PCR, the correlated variables are grouped by replacing the original descriptors with a new set called principal components (PCs), reducing the original number of descriptors (Ahmadi & Shahlaei, 2015). Each PC describe the variation among the molecules and can be interpreted independently (Ahmadi & Shahlaei, 2015). This helps to understand the relation between the structures of the compounds and identify the deviating compounds. The compounds with the best subset of descriptors can be calculated using stepwise regression (Ahmadi & Shahlaei, 2015).
The kNN method is a simple distance learning approach where data can be categorized based on the most common kNN in the training set (Zambre et al., 2015). Following the 3D descriptor computation and selection of the training/test dataset, the kNN model building method was applied with a cross-correlation limit set to 0.5, a cut-off variance of 0 kcal.mol À1 Å À1 , scaling set to auto-scaling, maximum neighbors as five (5) and minimum neighbors as two (2). The prediction method was set to 'Distance-based weighted average' and F-test 'in' and 'out' was set to 4.0 and 3.99, respectively. The interaction energy values were considered for relationship generation and utilized as descriptors to decide the proximity between molecules.
2.4.6. Model validation Internal validation. The generated 2D and 3D models were validated for their internal stability and predictive ability. The widely used leave-one-out (q 2 LOO) cross-validation method was employed for internal validation. The q 2 was calculated by removing every compound from the dataset once, and dividing the rest into two new datasets of equal size. Then, the bioactivity of the outlined compound was predicted by generating a model using the remaining compounds. The process was repeated once for each compound, and the q 2 was calculated as follows: where, Y pred and Y act are the predicted and actual bioactivity [Log10(IC 50 )] of the compounds in the training set, respectively. Y mean is the average activity of all compounds in the training set.
External validation. The reliability and predictive ability of the QSAR models were validated using the activity of the external test set compounds, which were not used in the model building. The external predictive ability of the models was estimated as follows: where, Y pred(Test) is the predicted activity of the test compounds, Y Test is the actual activity of test set compounds, Y training is the mean activity of all compounds in the training set.
Randomization test. The Y-randomization test was used to analyze the robustness of the QSAR models. For that, a random dataset was generated by shuffling the biological activities of the compounds in the training set. The quality of the models for the training set was analysed by comparing them with the models generated using the random dataset. The significance of the models was expressed in terms of the Z score, calculated as: where, q org 2 is the q 2 value calculated for the actual data set, q a 2 is the average q 2 , and q std 2 is the standard deviation of q 2 , calculated for various repetitions using different random data sets.

Bioactivity prediction of the studied compounds
The 3D structures of the experimental compounds were energy-optimized using MMFF forcefield/charges, and 2D/3D descriptors were computed using a similar approach used for the known ER-a inhibitors. Finally, the biological activities of the four potential ERa inhibitors were calculated by applying the generated 2D and 3D QSAR models using the Generic Prediction module of VLifeMDS.

Molecular dynamics simulation (MDS) analysis
The selected best ligand/s with the predicted highest binding affinity towards ERa, considerable toxicology, and potent bioactivity were subjected to a molecular dynamic simulation analysis to comprehend the effect of novel ligand binding on the structural properties of the target protein in the simulated biological environment. The MDS analysis for 50 nanoseconds was performed using Desmond (Bowers, 2006), a Package of Schr€ odinger LLC. The docked protein-ligand complex(es) were pre-processed for structural optimization and steric clash minimization using the protein preparation wizard or Maestro. The simulation system was prepared using the System Builder tool with an orthorhombic box, selected as TIP3P (Transferable Intermolecular Interaction Potential 3 Points), solvated with explicit single-point charge (SPC) water molecules. The models were made neutral by adding counter ions where needed. To mimic the physiological conditions, 0.15 M salt (NaCl) was added. The simulation was carried out using OPLS 2005 force field and the NPT ensembles with 300 K temperature and 1 atm pressure was set for complete simulation. The model/s were relaxed before the simulation. The trajectories were saved after every 0.01 nanoseconds for the analysis, and the stability of simulations was evaluated by calculating the root mean square deviation (RMSD), root mean square fluctuation (RMSF), Radius of Gyration (RG), and Solvent accessible surface area (SASA) of the protein and ligand over time.

Data set
Structure validation and optimization of all molecules under study is a critical step in any computational biology study. A minor structural instability may result in unreliable and impracticable data that cannot be reproduced in in-vitro/invivo studies. The initial energy of 150791 kcal.Mol À1 of ERa 3D structure (1 Â 7r) was optimized to À36169 kcal.Mol À1 , which enhanced the post-docking results and LeadIT docking protocol's reliability. The final protein structure had the complete set of 250 amino acid residues (Chain A) and ten active site residues (GLU353, LEU387, MET388, ARG394, PHE404, MET421, ILE424, GLY521, HIS524, LEU525) identified by the PDBsum server. The additional assessment of protein structure quality and validation was satisfactory, and the results are provided in supplementary data ( Figure 1). Similarly, energy minimization of all the chemical structures used in the study enhanced the reliability of the results found in post-computation studies. The compound classes and their SMILE string is provided in supplementary data (Table 1).

Protein-ligand molecular docking analysis
Recently, a few reports (Acharya et al., 2019;Ragavan et al., 2020;Yang & Shen, 2005) on the discovery of novel ERa antagonist/agonist using the FlexX docking program have been published that reflects the efficiency of the FlexX docking algorithm in studying the hormone-receptor interaction. The molecular docking simulation revealed that most experimental ligands exhibit strong affinity towards the core cavity of ERa active site similar to Estradiol with common functional residues. In the top four evaluations, three anthocyanins (Pelargonidin, Delphinidin, Cyanidin) and Hibiscetin exhibited the most stable conformation ( Figure 2) with ERa with DG binding energies À25.9934, À24.2893, À23.5567, À20.7123 kcal.mol À1 respectively. The energy values of these compounds are much lower than the control ligands Tamoxifen and Raloxifene (À16.4166 and À21.0154 kcal/ mol À1, respectively), suggesting favourable energy interactions with higher stability. All these compounds framed !3 hydrogen bonds with the core cavity residues of the ERa active site alongside the several weak hydrophobic interactions (Van der Waals forces). A comparative 2D interaction map of best-matched compounds with the key active site residues of ERa is provided in Figure 3.
The total binding affinity score is an energy function expressed as a sum of the energy contributed by all matched interacting groups, the lipophilic contact area, ambiguous contact area, clash penalty, and the conformational entropy score of the ligand. The total ERa-binding energy scores of the best-matched poses and the contribution of different energy parameters are given in Table 1. Pelargonidin showed the most stable pose among the anthocyanin ligands, forming three hydrogen bonds with the active site residues Glu353, Leu387 and His524, with bond energies ranging from À4.7 to À3.9 Kcal.mol À1 . Besides, Pelargonidin also exhibited several weak Van der Waal interactions with active site residues. All the anthocyanins exhibited strong binding affinities towards ERa, which was more favourable than the control ligands in term of the energy values. A detailed hydrogen-bonding pattern and weak interactions of other best-matched anthocyanins are provided in Table 2. Hibiscetin expressed the most promising results among the flavonoid compounds by forming five H-bonds with Pro324, Glu353, Ile386 and, Trp393 residues of ERa active site, with bond energies ranging from À4.7 to À3.2 Kcal.mol À1 . The total energy score of Hibiscetin was higher than Raloxifen but lower than Tamoxifen; thus, Hibiscetin was considered for further analysis. Although ERa is a steroid hormone receptor, surprisingly, the five studied phytosterols (Campesterol, Ergosterol, Stigmasterol, a-Spinasterol, b-Sitosterol) did not show favourable interaction with ERa (supplementary data, Table 2). In the validatory re-docking step using SwissDock, similar results (À8.60, À8.58 and À8.8 Kcal.mol À1 respectively for pelargonidin, delphinidin and cyanidin) were observed as the anthocyanins exhibited much lower binding energy values than control ligands (À7.93 Kcal.mol À1 for Tamoxifen and À7.54 Kcal.mol À1 for raloxifene) ( Table 2). The numerical difference in the binding energies generated by FlexX and EADock-DSS might result from the algorithmic difference between them.

ADME/Tox profile analysis
The PreADMET server determines HIA data as the sum of bioavailability and absorption evaluated from the ratio of    excretion or cumulative excretion in urine, bile and faeces (Lee et al., 2003). According to the ADME prediction of the best hits, Pelargonidin and Cyanidin were predicted to be well-absorbed by the human intestinal cavity with HIA values 83.58% and 72.5% (Table 3). However, Delphinidin and Hibiscetin showed moderate to low intestinal absorbance, with HIA ranged between 20-65%. These results correlate with previous findings where anthocyanins were rapidly absorbed in animal models and appeared in the bloodstream within a few minutes (Pojer et al., 2013). The Blood-Brain Barrier (BBB) Penetration of all four molecules was predicted to be <1, which signifies CNS (Central nervous system) inactivity. Again, these findings were validated in BOILED-Egg Model for Gastrointestinal absorption and Brain penetration, which ensured similar results ( Figure 4). However, according to BOILED-Egg Model prediction, Hibiscetin was not effluated by the P-glycoprotein pumping of the BBB, which suggest the possibility of BBB penetration of the compound. The Caco2 (human colon adenocarcinoma cells) model for oral drug absorption prediction showed low permeability of all four molecules; however, MDCK (Madin-Darby canine kidney) cell model indicated moderate oral permeability of the experimental molecules. The studied compounds alongside the control compounds showed high plasma protein binding potential with PPB values >90% (Table 3). The rate of skin permeability of the molecules expressed in terms of the human skin permeability coefficient (LogKp), whose value generally ranges between À3 to þ6 for good absorption via the skin, showed low skin permeability of the studied molecules. Additional essential pharmacokinetics, drug-likeness and medicinal chemistry friendliness parameters of the best hits were performed using the SwissADME server (Table 4). The predicted pharmacokinetic parameters also confirmed high Gastrointestinal (GI) absorption and BBB penetration inactivity of the compounds. Except for Hibiscetin, the other three compounds were predicted to be substrates of the permeability glycoprotein (P-GP), a crucial member of the ABCtransporters needed for active efflux through biological membranes (Daina et al., 2017), and play a key role in drug distribution. In addition, the interaction of the compounds with the cytochromes P450 (CYP) isoenzymes (essential for metabolic transformation and elimination of drugs) was also predicted. The inhibition of the five critical CYP-isoforms (CYP1A2, CYP2C19, CYP2C9, CYP2D6, CYP3A4) by drug molecule was linked with low clearance and buildup of the drug or its metabolites in tissues (Daina et al., 2017;Huang et al., 2008;Kirchmair et al., 2015). Among the studied compounds, delphinidin was predicted to be a substrate for all the CYPisoforms; however, pelargonidin and cyanidin were found to be an inhibitor of CYP1A2 isoform. Besides, hibiscetin was found to inhibit both CYP1A2 and CYP3A4 isoforms. The drug-likeness properties help to identify the oral bioavailability prospect of a potential drug candidate. Major pharmaceutical companies frequently use these filters, which includes Lipinski (Pfizer), Ghose (Amgen), Veber (GSK), Egan (Pharmacia) and Muegge (Bayer) filters to screen and exclude unsuited pharmacokinetics profile. Among the studied compounds, pelargonidin and cyanidin showed the most promising results with no violation of the drug-likeness filters; however, delphinidin and hibiscetin violated more than three of the five filters (Table 4). The Abbot Bioavailability Score is another bioavailability predicting criteria that categorize compounds into four groups (11%, 17%, 56%, and 85%) based on their probability of having at least 10% oral bioavailability in rat or measurable Caco-2 permeability (Martin, 2005). The four experimental compounds exhibited a moderate (55%) Abbot oral bioavailability score. The medicinal chemistry features like PAINS (Baell & Holloway, 2010) (for pan assay interference compounds) and Brenk (Brenk et al., 2008) help to recognize structural fragments that may lead to false-positive biological response and chronic toxicity. In the present study, pelargonidin showed no PAINS alert; however, the presence of a charged oxygen (O þ ) atom may impede its metabolism in the biological environment, thus need further structural modification during lead optimization. The rest of the compound (delphinidin, cyanidin, and hibiscetin) showed one or more PAINS and Brenk alerts, like presence of problematic fragments of Catechol and Hydroquinone (Table 4).
Computational toxicity prediction is a fast, reliable and extensively used approach that significantly reduces animal experiments for toxicity determination. The ProTox-II toxicity prediction is based on reliable models developed through known toxicological data (training data) of almost 40,000 compounds. A detailed description of these models is HIA¼ Human Intestinal Absorption, Well absorbed compounds 70-100%, Moderate 20-70% and weakly absorbed <20%. Caco2¼ Caco2 cell permeability, Highly Permeability >70 nm/sec, Middle 4-70 nm/sec and low permeable < 4 nm/sec. MDCK¼ MDCK cell permeability, High Permeability >500 nm/sec, Middle 20-500 nm/sec and low permeability < 25 nm/sec. PPB¼ Plasma Protein Binding, Strongly bound chemicals >90% and weakly bound chemicals < 90%. BBB¼ Blood-Brain Barrier Penetration, CNS active compounds BBB >1 and inactive compounds BBB < 1. Skin Permeability¼ Expressed as LogK p or Human skin permeability coefficients, values ranging from -3 to þ6, which provides the compound with good absorption.
available on their server, based on which the toxicological feature of a new compound is predicted. The toxicity of the investigational compounds was predicted with an average of 70% accuracy and 70-90% similarity with the training data set (Table 5). Among the studied compound, pelargonidin, cyanidin and delphinidin were predicted to be GHS Group-V chemical compounds (2000 < LD50 5000 mg/kg) with predicted LD 50 values ranging between 3.9-5 g/kg body weight. Table 5 shows the detailed toxicological effects of the studied compounds on the major organs and cellular pathways.

Data sets for QSAR analysis
Statistically-based quantitative structure-activity relation (QSAR) study is a powerful computational tool used in   (20), and their biological activity distribution among the training and test set were analysed statistically. The previous step was repeated many times by interchanging the training and test set compounds until the prerequisite of statistically significant activity distribution was achieved. The details of the training and test set compounds and their actual biological activity versus predicted biological activity by different QSAR models are presented in Table 6. The statistical parameters for biological activity distribution pattern in training and test sets for 2D-QSAR and 3D-QSAR are given in Table 7. Based on this data, the 2D and 3D QSAR models were built for ER-a binding compounds. For an interpolative test set, i.e. test set derived within the min-max range of the train set, the test's maximum should be less than the training set's maximum, and the test's minimum should be greater than the training's minimum. A higher mean indicates the presence of relatively more active molecules in a particular set. The standard deviation (SD) helps to interpret the point density distribution (along mean) of the two sets, and similar SD values in the training/test set specifies that spread in both the set with their respective mean is comparable. The activity distribution pattern of the known ER-a ligands in 2 D-QSAR satisfied all statistical criteria, suggesting that the test set lies within the activity domain of the training set.

2D-QSAR
Both the MLR and the supportive PLS analysis with the training and the test sets exhibited statistically significant QSAR models. The descriptors definitions and their percentage contribution to bioactivity are shown in Table 8 Statistics: n ¼ 30, Degree of freedom ¼ 26, r 2 ¼ 0.7959, q 2 ¼ 0.7048, F_test ¼ 33.7927, r 2 se ¼ 0.4561, q 2 se ¼ 0.5485, pred_r 2 ¼ 0.2343, pred_r 2 se ¼ 0.9970 (se ¼ standard error) Both the 2D-QSAR models were statistically valid with r 2 (correlation coefficients) values 0.7972 (MLR) and 0.7959 (PLS). The predictive ability of the models judged through internal validatory q 2 was robust, with values 0.7043 (MLR) and 0.7048 (PLS) using the LOO method. Also, the models' external predictive ability (pred_r 2 ) was found to be within standard limits. The higher F_test values for both models suggest their statistical robustness for prediction. Other validatory parameters are summarized in Table 9.
Analysis of the 2D-QSAR Models. The MLR and PLS models suggest negative contributions of Baumann's alignment independent (AI) topological descriptors such as T_O_O_5, T_F_F_7, and T_T_Cl_4, while T_O_O_6 contribute positively for ERa-binding ability of the compounds. In addition, the physicochemical descriptor (Estate number) SsOHcount also contribute negatively to the compound's bioactivity. Baumann's Alignment Independent (AI) descriptors were computed for a molecule by considering its topology, atom type and bond, which assigned at least four topological features. First, 'T-attribute' thoroughly characterize the topology of the molecule. The second attribute is the atom type as its chemical symbol. The third attribute assigns to atoms taking part in a double or triple bond. Finally, the selective distance count statistics for all combinations of different attributes are Table 6. The reported ER-a agonist/antagonist with their biological data and predicted data used in building and validating 2D and 3D QSAR models.

Compounds (Database ID)
Biological data  Table 8. Definition/significance of the 2D-descriptors that are crucial for ER-binding activity of chemicals.
Descriptor Significance

2D QSAR T_O_O_5
The number of oxygen atoms (Single, double or triple bonded) separated from other oxygen atoms by five bond distance in a molecule.

T_F_F_7
The number of fluorine atoms (Single, double or triple bonded) separated from other fluorine atoms by seven bond distance in a molecule.

T_T_Cl_4
The number of chlorine atoms (Single, double or triple bonded) separated from other atoms by four bond distance in a molecule.

T_O_O_6
The number of oxygen atoms (Single, double or triple bonded) separated from other oxygen atoms by six bond distance in a molecule.

SsOHcount
The count of the total number of -OH group connected with one single bond.  computed. In our study, these attributes were considered: T (any), 2 (double-bonded atom), 3 (triple-bonded atom), H, C, N, O, F, Cl, and S with a distance range of 0 to 7.

3D-QSAR
The 3D-QSAR models were built by computing the steric (S), electrostatic (E), and hydrophobic (H) fields descriptors of the known ER-a inhibitors. The grid parameters used for generating 3D descriptors are provided in supplementary Tables 3  and 4. The statistical rationality of the generated models in predicting the activity of the test set was arbitrated based on the values of internal predictive ability or q 2 and the external predictive ability or pred_r 2 . In addition, the 3 D lattice points marked around the test set compounds might play a crucial role in their ERa-binding ability (Figure 7). A graphical representation of the predicted versus actual biological activity of the known ERa inhibitors in both the   3 D-QSAR models is provided in Figure 8. The PCR and kNN method provided two statistically significant models as shown below: Model-3 (PCR).

Model-4 (kNN).
Log10 IC 50 ð Þ ¼ S 1843 ðÀ0:0666 À 0:0628Þ S 2998 ð30 30Þ S 5307 ðÀ0:0005 À 0:0005Þ The 3D-QSAR models showed acceptable statistical parameters with LOO cross-validation squared correlation coefficient q 2 values 0.4106 and 0.4769, respectively, for PCR and kNN method. The models' external test set prediction abilities were determined based on predictive squared correlation coefficients (pred_r 2 ) values. The pred_r 2 values of PCR and kNN methods were 0.4390 and 0.4626, respectively, which lies within the accepted range of more than 0.4. The higher F_test values for both models suggest their statistical robustness for prediction. Although, in the PCR model, a few compound's activities were over-predicted (Table 6), which might have happened due to an experimental error that generally occurs while evaluating a large data set. Additional validatory parameters are in Table 9.
Analysis of the 3D-QSAR Models. The 3D-QSAR models are based on steric, electrostatic and hydrophobic 3D data points (descriptors) around the molecules. The present study found that only steric data points are essential for the ERabinding activity of the compounds used for generating 3D-QSAR models. The PCR model, which uses a modified regression protocol, showed that steric data points S_2998 and S_3943 contribute negatively to bioactivity, whereas S_2387 contribute positively. Further, the kNN model shows the impact of the range of steric interaction field values in the generated 3D data points. The data points and their ranges generated in the kNN method are S_1843 (À0.0666-0.0628), S_2998 (30 30), and S_5307 (À0.0005-0.0005). The positive and negative ranges of the steric descriptor indicate that more steric substituents or groups and less steric substituents or groups are preferred for selectivity and specificity for ERa-binding activity.

Validation of the QSAR models
All the four 2D and 3D QSAR models were statistically validated for their predictive ability using internal validation (q 2 ), external validation (pred_r 2 ), and randomization (Z score). As a result, the predictive abilities of the models were found to be satisfactory. The internal and external validation results for the 2D and 3D-QSAR models were discussed in the respective sections. In addition, the Z-scores for randomization in the three regression methods (MLR, PLS, and PCR) used were within the acceptable range and provided in Table 9.

Bioactivity prediction of the studied compounds
Based on the generated 2D and 3D models, the studied compounds' Log10(IC 50 ) values were generated using the Generic Prediction module of the VLifeMDS suite. The predicted Log10(IC 50 ) for the experimental compounds in 2D and 3D-QSAR was converted to IC 50 values and are provided in Table 10. In addition, the 3D steric field data points for the experimental compounds, generated using the 3D-QSAR models, which are essential for their ERa-binding ability, are shown in Figure 9. The QSAR models generated using regression methods showed hibiscetin as the most potent ERabinding compound followed by the anthocyanins. Contrarily, the advanced kNN method showed the anthocyanins have more potent ERa-binding ability than hibiscetin.

Molecular dynamics simulation analysis
As pelargonidin was the most promising find of the preceding experiments, its stability on binding ERa subtype and the effect on the structural integrity of the target protein was further evaluated using an MDS analysis. The analysis was performed using Desmond software, a Package of Schr€ odinger LLC, to estimate RMSD, RMSF, proteins' radius of gyration and SASA of the docked protein-ligand complex. The RMSD values against simulation time of all the C a atoms of ERa and all heavy atoms of pelargonidin were detected concerning their initial structure. The RMSD values of the backbone C a atoms of ERa and pelargonidin as shown in Figure 10 exhibited gradual stability towards the end of the  Figure 9. The essential steric field data points of the experimental compounds for their ERa-binding activity.
simulation. The RMSD values for C a atoms of the protein was found to fluctuate between 0.3-2.4 Å, which is completely acceptable for small globular proteins. The ligands' RMSD does not vary significantly from the RMSD of the protein, thus suggesting a strong binding.
To get further insight into the protein-ligand binding stability and structural fluctuation, we conducted the RMSF analysis, which is valuable for identifying local changes along the protein chain (RMSF-P) and ligand atom positions (RMSF-L). The results shown in Figure 11 depicted high fluctuation at the N-terminal region of ERa, which typically arises due to the unstructured tail of the protein chain. The RMSF-L suggests that the ligand fragments undergone at least four conformational changes to interact with protein in the entropic binding event. Further, the radius of gyration (RG) which is an indicator of the protein structure compactness in respect to simulation time was calculated. As apparent from the Figure 12 (Left), the radius of gyration of ERa-Pelargonidin ranging between 18-19Å, suggesting that ligand-binding in the active-site of ERa increases the compactness of the protein. Similar RMSD, RMSF, and RG results for ERa ligand-binding poses were observed by Geetha Rani and Lakshmi (2019), Chakraborty et al. (2013), and Chakraborty and Biswas (2014). Additionally, the solvent accessible surface area (SASA), which is a measure of exposed and buried areas was also computed. The results showed a high SASA values ranging from 1200-1350nm for the protein-ligand complex indicating effortless accessibility of the ligand to the active site.

Discussion
The target protein, estrogen receptor alpha subunit (ERa), is one of the primary therapeutic targets for breast cancer treatment. Presently, several selective estrogen receptor modulators (SERM) drugs are used as an agonist/antagonist of estrogen hormone to suppress the ER-mediated growth  and proliferation of cancer cells. Although this approach abetted to shrink the breast cancer-related mortality, the toxicity associated with many potent SERM molecules limit their effective dose in chemotherapy. Thus, the search for an effective SERM molecule with selective cytotoxicity towards cancer cells continues. The medicinal plant, Hibiscus sabdariffa Linn. or roselle, gained much attention in recent years for its intense antioxidant activity and pharmacological activities against several diseases, including many types of human cancers (Izquierdo-Vega et al., 2020;Laskar & Mazumder, 2020;Riaz & Chopra, 2018). The pharmacological activities are principally linked to the plants' rice polyphenolic constituents like anthocyanins, flavonoids, phenolic acids, and phytosterols (Izquierdo-Vega et al., 2020;Salem, 2021). Further, the crude extract of the plant was reported to exhibit potent antiproliferative activity against ER þ MCF-7 breast cancer cell-line and alter the cellular localization of ER; however, the molecular mechanism or the phytochemical basis of these activities is unidentified. Consequently, the present study used a suite of computational tools to identify a potential ERa-binding compound from a library of already reported and underexplored (for ERa-binding) compounds of Hibiscus sabdariffa.
The ER-alpha model (1X7R) used in this study was employed in similar previous studies (Grande et al., 2018;Wang et al., 2010) that investigated plant-based ER agonists/ antagonists that imply the efficacy of this model. Further, the structural validation steps for the target protein increased the reliability of the protein-ligand binding data in molecular docking. The docking method successfully predicts binding characteristics and affinities of small ligands to biologically relevant target proteins (Tao et al., 2020). The study found that among the twelve experimental compounds, the three anthocyanins-pelargonidin, cyanidin, and delphinidin binds the ERa active site with the precision of the natural ER-ligand, 17b-Estradiol. These compound binds to the identical active site of 17b-Estradiol and conserved many key contacts (Glu353 and Gly521) similar to the steroid hormone and the co-crystallized reference ligand (genistein). The critical ligandbinding residues in the ERa active site for these compounds were Glu353, Leu387, His524 and Gly521. Besides, several hydrophobic interactions of these compounds were also observed with active site residues: Ala350, Phe404, Leu391, Leu387, Gly521, Leu525, Leu384, Ile424, Leu346, and Met388. Notably, the binding energies of the three compounds were found to be more energy favourable than that of popular ER-agonist/antagonist drug tamoxifen and raloxifene, controls for this study. In addition to these three compounds, hibiscetin and protocatechuic acid were among the top five observations for their strong affinity towards the ERa-active site. The cross-validation docking step using different docking algorithm yielded similar results. Thus, the MDS study suggested a close ERa-binding affinity of the anthocyanins when compared to the drug controls and the biological ER ligand. This similarity can be related to the study of Schmitt and Stopper (2001), which reported estrogen-like effects of anthocyanins in ER þ breast cancer cell line (MCF7) that reduced the proliferative action of estradiol. Surprisingly, the same effects were absent in ER-negative MDA-MB-231 cells (Schmitt & Stopper, 2001). The outcomes advocate for the competitive binding of anthocyanins in the ER-active site that might play a key role in reducing the hormone-dependent proliferation of cancer cells. Anthocyanins were also reported to minimize cardiotoxicity and serum cholesterol levels in experimental animals by obstructing the estrogen signalling pathway (Huang et al., 2016;Li et al., 2019). These observations suggest the possible ER-modulatory effects of anthocyanin, which coincide with the present study. Finally, the top four compounds were subjected to toxicological studies.
Recently, the toxicology prediction server PreADMET was used in several impactful published reports (Balaji & Chempakam, 2010;Tan et al., 2013;Viana Nunes et al., 2020) that suggests the reliability of the server. Pharmaceutical industries widely accept these in-silico toxicological analysis methods to support various decision-making in the drug development process that significantly reduces the number of compounds screened using animal models (Dahlgren & Lennern€ as, 2019). The toxicological analysis predicted that pelargonidin and cyanidin would be well-absorbed by the human intestine (HIA), a prerequisite for an orally administered potential drug candidate (Dahlgren & Lennern€ as, 2019). The other two experimental ligands, delphinidin and hibiscetin, showed moderate to low intestinal absorption. The intestinal absorption outcome predicted here coincide with experimental data on animal models, where anthocyanins were rapidly absorbed into the bloodstream within 20 minutes of consumption (Matsumoto et al., 2001). However, one major factor influencing anthocyanin's oral administration is their microbial degradation in the buccal cavity and intestine, affecting their absorption and bioavailability (Del Rio et al., 2013;Kamonpatana et al., 2014). Although this is a contradictory area where some researchers suggest microbial degradation hamper the bioactivity of anthocyanins by reducing their bioavailability. In contrast, others reported microbial degradation to anthocyanin metabolites amplified their activity as metabolites are more abundant and often better absorbed (Eker et al., 2019;Williamson & Clifford, 2010). Admiringly, several vehicle delivery methods like dietary fiber, starch microgels and liposome encapsulation are successfully approached for the targeted delivery of anthocyanins (Hwang et al., 2013;Tang et al., 2020;Wang et al., 2013). The Caco2 cell permeability of all the experimental compounds was predicted to be very low compared to the control ligands suggesting a slow rate of flux of these compound across polarized human colon epithelial cancer cell (Caco-2) monolayers. However, pelargonidin, cyanidin and delphinidin showed a moderate rate of P-glycoprotein efflux across Madin-Darby canine kidney (MDCK) cells, which is used as a Caco-2 alternate technique to predict intestinal permeability and oral absorption of investigational compounds. The MDCK parameters of the experimental ligands were similar to the control ligands. Besides, all the compounds and the control ligands were predicted to strongly bind blood plasma protein, which may reduce their efficiency in transport and distribution through the bloodstream. However, animal experiments showed that anthocyanins are rapidly taken up from the blood into tissues (Vanzo et al., 2011). The high plasma protein binding nature of the molecules can be enhanced by functional group modifications as suitable to improve their effectiveness in transport and distribution. Lastly, pelargonidin, delphinidin and cyanidin were predicted to be CNS-inactive in both PreADMET and BOILED-Egg analysis, indicating a low risk of affecting regular CNS activity. However, glucosides of these anthocyanins exhibited the ability to cross the bloodbrain barrier readily in animal models and produce diverse neuroprotective effects (Andres-Lacueva et al., 2005;Galli et al., 2006). The results of this ADME study coincide with Pojer et al. (2013), which documented the encouraging pharmacokinetic property like rapid absorption, distribution, metabolism and excretion of these anthocyanins.
Some additional pharmacokinetics and toxicological profiling were performed using SwissADME and ProTox-II server. The cytochromes P450 are a family of >1000 known enzymes abundant in the human liver and known to metabolize most drugs whose biotransformation is known (Anzenbacher & Anzenbacherov a, May 2001). The probable metabolism prediction of the compounds by the five most abundant and critical P450 isoforms showed that the compounds are substrates for most of the P450 isoforms. The observation suggests a possible rapid metabolism and removal of the compounds from the body. In addition, the compound pelargonidin and cyanidin showed the most efficient results with zero drug-likeness rule violation and a moderated Abbot bioavailability score, suggesting favourable structural features of the compounds to use as templates for novel drug molecule. Apart from pelargonidin, the other three compounds showed more than one problematic structural feature like PAINS and Brenk regions, which may result in a false-positive biological response due to chronic toxicity, high reactivity (irrespective of the target protein), and metabolic instability. Thus these compounds may need further structural modification for these regions. Further, as reported by the earlier bioassays (Ghattamaneni et al., 2020;Wallace & Giusti, 2015), this study also found anthocyanins to be very low toxic, with LD 50 values ranging between 3.9-5.0 g/kg body weight in rodent. The anthocyanin, pelargonidin did not show any organ or tissue-specific toxicity; however, the other three compounds having PAINS and Brenk regions may have carcinogenic or mutagenic potential. Further, all four compounds were found to have an affinity toward the aryl hydrocarbon receptor (AhR), a ligand-activated transcription factor that propagates foreign chemical stimuli with adaptive responses like detoxification, cellular homoeostasis or immune responses (Esser, 2012). Generally, the AhR can be activated by several classes of compounds (xenobiotics) such as polycyclic aromatic hydrocarbons, benzimidazoles and flavonoids, which further may interact with several nuclear proteins like estrogen or androgen receptor to modulate their transcriptional ability (Ramadoss et al., 2005). An activated AhR is also linked to obtrude antagonistic effect on the ER transcription, which is a much-advocated approach for breast cancer treatment (Ramadoss et al., 2005). Interestingly, pelargonidin was also found to be a probable aromatase binding compound, which unfolds the possibility of it being screened as a potential aromatase inhibitor for breast cancer treatment. Notably, all the four best-hit compounds were predicted to have the ability to disrupt mitochondrial membrane potential (MMP), a feature wellestablished for apoptosis-induction in stressed cells. Conclusively, apart from hibiscetin other three compounds showed acceptable toxicological data in this study.
Finally, different 2D and 3D-QSAR models were built for the ERa-binding efficacy of chemicals to predict the bioactivity of the best-hit four compounds. The QSAR technique is based on the widespread assumption that structural variation can affect changes in biological activity. The QSAR models were built by dividing fifty ERa-binding compounds with known biological activity (IC 50 ) into training and test set. A uniform distribution of the compounds in term of their variable structural features and bioactivity in both training and test set was confirmed. As a result, four statistically significant QSAR models were obtained, whose statistical significance was defined based on the regression correlation coefficient or r 2 (for 2D models) and internal cross-validation predictive ability or q 2 (for 3D models). Further, the predictive ability of the models for an external test set was judged based on the values of pred_r 2 . In addition, the randomization test shows confidence of $99.9% that the generated model is not random, and hence it is chosen as the QSAR model. The regression plots provide an insight into how well the models were trained and their external predictive ability, and the presence of the data points close to the regression line indicates the good predictive ability of the model. The four experimental compound's half-maximal inhibitory concentrations (IC 50 ) were predicted based on the newly formed QSAR models. Both the 2D-QSAR model depicted four Baumann's alignment independent (AI) topological descriptors and a single physicochemical descriptor (estate number) play a critical role in the ERa-binding ability of the compounds in the training set. Among the extracted topological descriptors, T_O_O_5, T_F_F_7, and T_T_Cl_4 were found to effect the bioactivity inversely. This observation suggests that the presence of bonded (mono, di, tri) electronegative oxygen, fluorine and chlorine atoms with other same atoms separated by the shown bond distances is detrimental for the ERa-binding ability of these compounds. Besides, the inverse effect of the physicochemical descriptor SsOHcount on bioactivity suggest that less number of -OH group connected with one single bond to the phenyl ring is desirable. In addition, the topological descriptors T_O_O_6 is directly proportional to the bioactivity, indicating that the presence of oxygen atoms (mono, di or tri bonded) separated from other oxygen atoms by six bond distance enhances the biological activity of these compounds. Oppositely, the 3D-QSAR models mainly depend on the steric, electrostatic and hydrophobic interaction field descriptors (points) of the training compounds. The PCR technique used here is a modified regression method that depends on the values of different field descriptors, whereas the kNN method also considers the effective ranges of the critical descriptors. The PCR analysis showed that only the steric field descriptors are essential for the ERa-binding of these compounds. Among them, the steric field point S_2998 and S_3943 values of the compounds are inversely proportional to their bioactivity, whereas S_2387 contribute directly. The data point S_2998 (30 30) was also found to effect the bioactivity of the compounds alongside two other steric descriptors, S_1843 (À0.0666-0.0628) and S_5307 (À0.0005-0.0005) in the kNN method. The models generated through different regression method predicted higher ERa-binding activity of the experimental compound hibiscetin at lower doses (0.001 nM), followed by delphinidin (0.011 nM), cyanidin (0.042 nM) and pelargonidin (0.162 nm). Contrarily, the advanced kNN approach showed the three anthocyanins: delphinidin, cyanidin and pelargonidin, as the more active compound toward ERa with predicted IC 50 values 0.118, 0.119, and 0.119 nM, respectively. Owing to the sizeable structural similarity among these three anthocyanins, the similar bioactivity predicted by the kNN method is more acceptable. In previous bioassays, pelargonidin and delphinidin exhibited potent anticancer activity against osteosarcoma (U2OS) and Leukaemia (HL-60) cells, respectively, with IC 50 values 15 and 75 lM (Chen et al., 2018;Hou et al., 2005). Although presently, no significant in-vivo or in-vitro data are available that specifically reported the ER-binding ability of hibiscus anthocyanins that suppresses ER-mediated carcinogenesis in mammary epithelium. As the QSAR models designed in the present study is based on experimental compounds that showed ERa inhibition/modulation in cell-based assays involving ER þ human ductal breast cancer cells (MCF7), the findings of this study need further investigations preferably in MCF7 cells or in carcinogen-induced breast cancer models in future bioassays.
Finally, the anthocyanin pelargonidin was the most encouraging find of the study with a high binding affinity for ERa, low toxicity parameters, and effective bioactivity (IC 50 ) towards ERa. The docking studies showed that both the drug control and the biological ligand of ERa presented an H-bond with the side-chain residue Arg394, located in the ligand-binding domain of the nuclear hormone receptor. However, pelargonidin did not show any interaction with that particular residue in the active site. To further clarify how this will affect the effectiveness of the compound or function of ERa, a 50 ns molecular dynamics simulation was performed to reflect the effects of the particular ligand binding on the structural integrity of the protein. From the MDS results, it is evident that the missing interaction of pelargonidin with the Arg394 of the active site of ERa doesn't hugely affect the binding affinity of the compound neither it hinders the protein conformation. The RMSD and RMSF values indicate that the protein is not undergoing any notable conformational change during the simulation. The observed RMSD of the ligand is considerably similar to that of the protein RMSD, indicating a strong binding at the initial binding site for a significant period. In addition, the radius of gyration and SASA values of the protein-ligand complex also hints towards a possible robust protein-ligand interaction in the biological environment.

Conclusion
Presently, anthocyanins are a class of widely accepted naturally occurring flavonoid compounds with diverse pharmacological activities that correspond to their beneficial effects on human health. It is well-established by long-term extensive pharmaco-vigilance studies that these compounds have negligible toxicity. Despite of their low bioavailability, concurrent research showed that their metabolites might be responsible for much of their pharmacological activities. Several advanced methodologies for tissue-specific delivery of these compounds are currently used to overcome their low bioavailability and degradation under biological environments. The results of the present study suggest that the estrogenlike action of the plant Hibiscus sabdariffa on inhibiting MCF7 cells proliferation might result from its rich anthocyanin contents. Among the anthocyanins, pelargonidin was the prominent find that satisfied all the in-silico studies with a promising outcome. However, a standardized set of laboratory methodologies must be performed in future nutritional studies to validate these findings in biological conditions. The result might help in designing a lead for novel ERaagonists/antagonists as potential anticancer agents.