Role of artificial intelligence and machine learning in bioinformatics: Drug discovery and drug repurposing

Artificial intelligence (AI) and machine learning have shown great potential in the health and biomedical sciences. Previous research has found that AI can learn from new data and transform it into useful knowledge. In pharmacology, the aim is to use these methods to design novel vaccines that are more efficient and also cost-effective. The underlying idea is to predict molecular mechanisms and structures so as to increase the likelihood of developing new drugs. Clinical, electronic and high-resolution imaging datasets can be used as inputs to aid drug development. Moreover, comprehensive target activity profiles have been used to repurpose drug molecules by extending the target profiles of drugs to include off-targets with therapeutic potential, thereby providing new indications.


1) Introduction:
Over the last few years, immense progress has been made in the fields of artificial intelligence, machine learning and bioinformatics, and more research is needed to understand the data of biological science and related problems. Bioinformatics is a branch of science that analyzes biological data using mathematical principles, statistical tools and algorithms in addition to computational approaches [1]. Artificial intelligence is the ability to solve problems that normally require human intelligence, by simulating these intelligence processes with computer systems or software. It includes machine learning, which allows a system to perform such tasks on the basis of its training [2]. Basic and structural bioinformatics tools use artificial intelligence and machine learning to design drugs and to repurpose compounds against many diseases, such as cancer and neural inflammation, through in silico approaches. Bioinformatics has been used to analyze data and draw logical conclusions; the huge amount of data obtained from whole genome sequencing projects is annotated in meaningful ways using bioinformatics [3]. Similarly, a large collection of problems has been solved by combining artificial intelligence with bioinformatics approaches, including gene prediction, the study of protein interactions, computational drug design, drug repurposing for better efficiency, next generation sequencing and the development of other software. Therefore, both artificial intelligence and machine learning have useful applications in bioinformatics. The performance of artificial intelligence can change with the input data. Artificial intelligence is further classified into generalized and applied branches.
The two are quite different: applied AI involves the use of machines and algorithms, whereas generalized AI aims to process data automatically into expressions similar to human thought [4].
A brief history of artificial intelligence was given by Gravey in terms of three systems or paradigms: GOFAI ("good old-fashioned AI"), which has its basis in the 1960s; expert systems, from the 1980s to the 1990s; and machine learning, which has prevailed from 2010 to the present. The first paradigm, GOFAI, relies on heuristic techniques and was derived from general-purpose logical systems.
Expert systems moved from general intelligence toward replicating the decision-making processes of human experts in fields such as chemistry and medicine. This in turn led to major artificial intelligence systems in medicine such as MYCIN, and to more familiar software such as TurboTax.
These AI algorithms produced results, but the results were limited and did not deliver the human-like thinking machines that had been promised. Current machine learning has overcome some of these barriers thanks to huge amounts of data, increased computational power and the revival of neural networks. These algorithms can be trained to exploit new data and thus require much less human labor or explicit programming for data prediction [5].
Artificial intelligence and computational approaches have revolutionized drug discovery, in addition to addressing the risks and challenges related to it. Drug repurposing is a process in which existing drugs are developed for medical indications beyond their original therapeutic use. Drug repurposing offers various advantages over de novo drug discovery, which involves identifying and producing entirely new drug molecules. Repurposing brings a drug to the clinic in a much shorter time than new synthesis and has a much lower failure rate, because its ingredients are compounds that have already been tested. In addition, drug repurposing allows some preclinical testing phases in humans and animal models to be bypassed, thereby reducing the overall cost of drug development [6]. If safety testing has already been performed, it can show whether the dosage is compatible with the new indication. Historically, many repurposed drugs were discovered by chance through random testing and exploration. For example, sildenafil citrate was developed as an antihypertensive drug and then repurposed by Pfizer as Viagra for the treatment of erectile dysfunction; its efficacy was established in clinical studies, and it produced massive sales in addition to its health benefits. Over the last few years, several computational approaches have been proposed for drug repurposing [7]. Commonly used data sources for in silico drug repurposing include electronic health records, gene-based information, gene expression response-based profiles, mappings of targeted molecular complexes and phenotype-based profiles. Drug repurposing hubs and related resources have recently been surveyed.
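The gene expression response-based profiles mentioned above are often used through a "signature reversal" idea: a drug whose induced expression changes run opposite to a disease's expression changes is a candidate for repurposing. The sketch below is a minimal illustration under stated assumptions: it uses a plain Pearson correlation (a simplification, not the Connectivity Map's own connectivity score), assumes both signatures are log fold-changes over the same ordered gene list, and uses hypothetical drug names and values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_by_reversal(disease_signature, drug_signatures):
    """Rank drugs from most to least 'reversing' relative to a disease signature.

    Signatures are log fold-changes over the same ordered gene list (assumption).
    A strongly negative correlation means the drug moves expression in the
    opposite direction to the disease, which flags it as a repurposing candidate.
    """
    scores = {drug: pearson(disease_signature, sig) for drug, sig in drug_signatures.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical five-gene signatures.
disease = [2.0, 1.5, -1.0, -2.0, 0.5]
drugs = {
    "drug_A": [-1.8, -1.2, 0.9, 1.7, -0.4],  # roughly mirrors the disease
    "drug_B": [1.9, 1.4, -0.8, -2.1, 0.6],   # mimics the disease
    "drug_C": [0.1, -0.2, 0.0, 0.1, -0.1],   # weak effect
}
for drug, r in rank_by_reversal(disease, drugs):
    print(f"{drug}: r = {r:+.2f}")
```

A correlation is used here only because it is easy to follow; rank-based enrichment statistics are a common alternative when signatures cover thousands of genes.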
There is also literature on drug design, discovery and lead optimization during development that leads to entirely novel molecules [8].
However, this review focuses on artificial intelligence and machine learning programs that use publicly available data and information. The main emphasis is on the comprehensive target-based activity of drugs for drug discovery and repurposing, in which the known target effects of existing drugs are extended to newly identified target effects to support new indications [9]. This provides evidence for the further development and commercialization of drug molecules. It is particularly true for drugs that are not specific to one target but act on a number of targets, showing broad-spectrum activity. For example, some anticancer drugs have off-target activities that, in addition to their anticancer effect, make them candidates for new drugs. Repurposing is not limited to cancer drugs; it has also been extended to the novel coronavirus SARS-CoV-2, where repurposed drugs can be readily applied to patients with COVID-19 [10].
Repurposing has often been initiated after the phenotypic observation of unexpected (adventitious) polypharmacological activities of drugs. For example, we observed surprising effects of axitinib, a vascular endothelial growth factor receptor inhibitor used for treating renal cell carcinoma, in both chronic and acute leukemia. These leukemias are driven by the BCR-ABL1 fusion protein, and, surprisingly, computational analysis showed that the drug binds the T315I mutant form of BCR-ABL1 more strongly than the wild-type form [11]. Further reports suggest that the drug may lose potency when it binds other molecules and that its dosage is not effective against ponatinib-resistant cells. This also showed that these algorithms can point out drawbacks before a repurposed drug enters a new phase. Hence, artificial intelligence is now used to predict personalized drug treatments by analyzing the genomic data of patients with computational techniques, in addition to diagnosis [11].

2) Complications in drug discovery:
Drug development and repurposing have a long and complex history. Discovering a new drug is still a slow process: about 10-12 years are required to bring an effective medicine or potent drug molecule from the laboratory to the marketplace. Moreover, it is a very expensive procedure, requiring about 4-5 billion dollars to discover a drug molecule, which puts financial pressure on patients who cannot afford these potentially expensive treatments [13]. For example, by 2020 the cost of drugs and treatment for cancer was found to increase from 130 billion dollars to 157 billion dollars.
Additionally, drugs must meet certain requirements for approval, including pharmacokinetics, pharmacodynamics, efficacy and toxicity in both in vivo and in vitro studies [14]. After preclinical analysis, the safety of a molecule is evaluated in four clinical phases. Drug development is often halted by poor efficacy or toxicity profiles. Research has found that most drugs pass phase 1 safety or toxicity trials but fail in phase 2. Among drugs that fail in clinical or preclinical trials, oncology drugs have the highest failure rate.
Recent drug discovery efforts have tended to develop a single drug that targets the disease specifically [15]. However, the success rate is still much lower, with 1 6 out of every 510 drugs approved by the FDA. Moreover, only 5% of drugs were found to pass phase 1 of clinical trials. This number decreases further when a drug enters phase 2, indicating a high failure rate. To discover a drug molecule, scientists must understand the underlying mechanism of the disease, for example cancer progression [16].
More than 500 signaling molecules are involved, but a drug under development typically targets only one of them. Research on multiple drug targets is therefore needed, instead of focusing only on a single target.
Moreover, most drugs are classified on the basis of their preclinical studies, and these findings are not applicable at the clinical level. A lack of quality in the pharmacokinetics and pharmacodynamics of a drug molecule also results in failure. Further testing therefore aims to study the potential of drug molecules from preclinical to clinical findings [17].
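Why overall success is so low follows from simple arithmetic: if a candidate must clear each stage in turn, the probability of approval is the product of the per-stage transition probabilities, assuming the stages are independent. The short Python sketch below shows this; the transition rates are hypothetical placeholders chosen for illustration, not figures reported in this review.

```python
from math import prod

def overall_success(transition_rates):
    """Probability that a candidate entering the first stage is eventually approved.

    Assumes each stage is passed independently with the given probability,
    so the overall probability is the product of the stage probabilities.
    """
    return prod(transition_rates)

# Hypothetical transition probabilities, for illustration only:
# phase 1 -> phase 2, phase 2 -> phase 3, phase 3 -> submission, submission -> approval.
stages = {"phase 1": 0.6, "phase 2": 0.3, "phase 3": 0.6, "approval": 0.9}

p = overall_success(stages.values())
print(f"probability of approval: {p:.3f}")  # 0.6 * 0.3 * 0.6 * 0.9
print(f"candidates needed per approval: {1 / p:.1f}")
```

Because the factors multiply, a single weak stage such as phase 2 dominates the result, which is consistent with the observation above that many drugs fail in phase 2.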

3) Computational methods for the classification of variants:
In recent years, next generation sequencing, such as whole genome sequencing, has been applied to understand the genetic basis of diseases. Using whole exome sequencing (WES), genetic variants in humans can be identified easily, which has shown that missense variants are responsible for many genetic diseases in humans. Not all variants are associated with disease; only deleterious ones are involved in Mendelian diseases, cancers, etc. [18]. Identifying these deleterious variants one at a time is laborious and time-consuming because of their complexity. Hence, computational methods have been proposed to solve such problems efficiently, using techniques such as sequence evolution analysis, sequence homology and structural similarity data [19]. For missense variant identification, almost all of these methods are applied at once, as in our studies [20]. VarCards is a computational tool for identifying genetic variants that integrates functional consequences, allele frequencies, computational predictions and genetic information for coding variants. However, it is still difficult to assess the differences among these computational methods because they change continuously over time [21].
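Because no single predictor is reliable on its own, scores from several tools are often combined. The sketch below shows one simple way to do this, a majority vote over per-tool calls; it is an assumption for illustration, not how VarCards or the studies cited here combine predictors. The predictor names refer to real tools (SIFT, PolyPhen-2, CADD), but the cut-offs are illustrative and should be checked against each tool's documentation.

```python
# Illustrative cut-offs (assumed): (threshold, True if higher scores mean "deleterious").
THRESHOLDS = {
    "SIFT": (0.05, False),       # low SIFT scores are predicted deleterious
    "PolyPhen-2": (0.5, True),   # high probability of being damaging
    "CADD": (20.0, True),        # high scaled (PHRED-like) scores
}

def consensus_call(scores, thresholds=THRESHOLDS, min_votes=None):
    """Majority-vote consensus over several missense-variant predictors.

    scores: dict mapping predictor name -> score for one variant.
    Returns (is_deleterious, votes, number_of_predictors).
    """
    votes = 0
    for name, score in scores.items():
        cutoff, higher_is_deleterious = thresholds[name]
        deleterious = score >= cutoff if higher_is_deleterious else score <= cutoff
        votes += int(deleterious)
    n = len(scores)
    needed = min_votes if min_votes is not None else n // 2 + 1  # simple majority
    return votes >= needed, votes, n

# Hypothetical variant scores.
print(consensus_call({"SIFT": 0.01, "PolyPhen-2": 0.97, "CADD": 15.0}))  # (True, 2, 3)
```

Majority voting is the simplest choice; weighting tools by their measured performance, or training a meta-classifier on their scores, are common refinements.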
A number of studies have compared computational methods for classifying missense variants. However, they do not use experimentally obtained datasets, and they are based on receiver operating characteristic (ROC) curves. Other properties beyond the area under the ROC curve (AUC) must also be taken into account, including accuracy, efficiency, infectivity, absorption and mechanism, and these have not yet been considered [22]. However, there are also studies which
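The ROC-based comparisons mentioned above summarize a predictor by its AUC, which equals the probability that a randomly chosen deleterious variant receives a higher score than a randomly chosen benign one. The sketch below computes the AUC this way, next to plain accuracy at a fixed cut-off, to show that accuracy, unlike AUC, depends on the chosen cut-off. The scores and labels are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for deleterious, 0 for benign. Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one deleterious and one benign variant")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, cutoff):
    """Fraction of variants classified correctly when score >= cutoff means deleterious."""
    return sum((s >= cutoff) == bool(y) for s, y in zip(scores, labels)) / len(labels)

# Hypothetical predictor scores for six variants with known labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {roc_auc(scores, labels):.3f}")
print(f"accuracy at 0.5 = {accuracy(scores, labels, 0.5):.3f}")
print(f"accuracy at 0.35 = {accuracy(scores, labels, 0.35):.3f}")
```

The AUC stays the same whatever cut-off is later used, while accuracy changes with it, which is one reason why both kinds of measure are worth reporting.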

4) Artificial Intelligence or machine learning in precision drug discovery:
The National Institutes of Health (NIH) has highlighted that precision medicine is an emerging strategy for disease prevention and treatment that takes into account individual variation in genes, lifestyle and environment. This allows doctors and physicians to treat diseases more accurately than other methods employed so far [25]. To make this approach more powerful, sophisticated and precise techniques are required that can later be applied to trained datasets in new ways. Artificial intelligence emulates the cognitive abilities of physicians and doctors, using biomedical and bioinformatics data to produce useful results.
Artificial intelligence can be broadly classified into three categories: artificial general intelligence, artificial narrow intelligence and superintelligence [26]. ANI, or artificial narrow intelligence, is still being developed and is expected to reach the market and research in the next decade. It can develop new datasets, analyze them, find correlations among them and draw meaningful conclusions [27].

Fig.02. General steps for drug discovery using AI and ML
Leading pharmaceutical companies use deep learning, artificial intelligence and ANI in their development processes to identify unique genetic mutations in large datasets, which physicians can then use in various fields of medical science [28]. As discussed above, this technique can make the drug discovery process faster. Atomwise's technology is now used in biopharma to identify and predict small-molecule structures in 3D. Atomwise has used this technique to identify active molecules against the Ebola virus [29]. Using artificial intelligence, the company identified two drugs that appear more effective against the Ebola virus than previously existing medications. Although the method is promising, companies need to demonstrate safety and other considerations through peer review or other methods. Fig. 01 presents the general cycle used by artificial intelligence in the discovery of drug molecules, which also uses bioinformatics-based data [30].

Role of AI in drug identification:
Virtual screening has been developed to reduce the cost of high-throughput screening and has been used to optimize small molecules as potent drug candidates. Machine learning and other AI-based methods have been used at different stages of the virtual screening of drug molecules. Virtual screening is further classified into ligand-based and structure-based virtual screening [31]. In ligand-based screening, the binding of known ligands is analyzed and used to predict the activity of new molecules; when knowledge of the target structure is available, structure-based analysis can be performed using AI methodologies [32]. AI-based scoring functions learn, in a data-driven manner from existing experimental data, the relationship between protein-ligand binding free energy and a feature vector describing the complex; this captures non-linear relationships and yields strong scoring functions. RF-based and SVM-based scores, ANN-based scores and NNScore are all AI-based scoring functions without a predetermined functional form, developed to identify the most promising ligands with high accuracy. Recent AI-based scoring functions of this kind have outperformed traditional methods for predicting binding affinity, as discussed in several studies [33].
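The feature vector mentioned above can be as simple as the one used by RF-Score: counts of protein-ligand atom pairs, grouped by element, that lie within a distance cut-off (12 Å in the original RF-Score). The sketch below computes such counts from atom coordinates. It assumes atoms are already parsed into (element, (x, y, z)) tuples and skips hydrogens and other elements. In practice the resulting vectors would be paired with measured affinities to train a random forest regressor (for example scikit-learn's RandomForestRegressor), which is not shown here.

```python
from itertools import product
from math import dist

# Element types as in the original RF-Score: 4 protein x 9 ligand = 36 features.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")
PAIRS = list(product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS))

def contact_counts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand atom pairs of each element pair within `cutoff` angstroms.

    protein_atoms, ligand_atoms: iterables of (element, (x, y, z)) tuples.
    Atoms whose element is not in the lists above (e.g. hydrogen) are skipped.
    """
    counts = dict.fromkeys(PAIRS, 0)
    ligand = [(el, xyz) for el, xyz in ligand_atoms if el in LIGAND_ELEMENTS]
    for p_el, p_xyz in protein_atoms:
        if p_el not in PROTEIN_ELEMENTS:
            continue
        for l_el, l_xyz in ligand:
            if dist(p_xyz, l_xyz) <= cutoff:
                counts[(p_el, l_el)] += 1
    return counts

def feature_vector(counts):
    """Flatten counts in a fixed pair order so every complex has the same layout."""
    return [counts[pair] for pair in PAIRS]

# Hypothetical toy complex: three protein atoms and two ligand atoms.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (20.0, 0.0, 0.0)), ("H", (0.0, 0.0, 1.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("Cl", (0.0, 5.0, 0.0))]
vec = feature_vector(contact_counts(protein, ligand))
print(len(vec), sum(vec))  # 36 2
```

The descriptor makes no assumption about the functional form of the score; the non-linear relationship is left for the learning algorithm to capture.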
To improve AI scoring, five major algorithms were adopted, including SVM, Bayesian methods, deep neural networks and feed-forward ANNs. Ballester et al. proposed techniques for improving these machine learning methods to enhance AI scoring, which in turn better describes binding in protein-ligand complexes. On this basis, they developed RF-based software for predicting protein-ligand docking scores [34]. Other scoring functions, such as the B2B scoring function, SFCscore and RF-Ichem, have also been employed to calculate docking scores. In comparisons, RF-Score predictions were found to be outstanding, and RF-Score is connected to the ISTAR platform used for large-scale protein-ligand docking [35]. Another tool was designed by scientists to identify the targets with the highest binding activity toward a query molecule [36]. It integrates two structure-based approaches, protein-ligand pharmacophore search and docking, with four other approaches: support vector regression affinity prediction, SVM binary classification, three-dimensional similarity search and nearest-neighbor affinity prediction [37]. In structure-based virtual screening, RF-Score has subsequently been applied to predict accurate targets. RF-Score-VS is an enhanced scoring function trained on full datasets that include decoys. Integrating AI into structure-based virtual screening is a promising approach and has been used to improve post-processing after structure-based virtual screening, by re-scoring the virtual scores calculated with docking algorithms using machine learning models, without any consensus scoring.
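Nearest-neighbor affinity prediction, one of the approaches listed above, can be illustrated with 2D fingerprints and the Tanimoto coefficient: the affinity of a query is estimated from the measured affinities of its most similar reference ligands. The sketch below is a minimal version under assumptions: fingerprints are given as Python sets of on-bit indices (in practice they would come from a cheminformatics toolkit such as RDKit), the k neighbors are weighted by their similarity, and the reference data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_affinity(query, reference, k=3):
    """Similarity-weighted mean affinity of the k reference ligands most similar to `query`.

    reference: list of (fingerprint_set, measured_affinity) pairs, e.g. pKd values.
    Returns None when no reference ligand shares any bit with the query.
    """
    neighbors = sorted(reference, key=lambda r: tanimoto(query, r[0]), reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbors]
    total = sum(weights)
    if total == 0:
        return None  # nothing similar enough to base a prediction on
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / total

# Hypothetical reference ligands with measured affinities.
reference = [({1, 2, 3, 4}, 8.0), ({1, 2, 3, 5}, 7.0), ({10, 11}, 4.0)]
print(knn_affinity({1, 2, 3, 4}, reference, k=2))
```

Returning None rather than a default value makes it explicit when a query falls outside the chemical space covered by the reference set.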
For example, AutoDock can also be integrated with RF-Score-VS to obtain better virtual screening performance. Integrating advanced learning and machine learning methods helps predict potential ligands. Hence, the use of AI and ML in drug discovery has reduced false-negative and false-positive predictions, and in the future it will also take into account the physicochemical properties and structural information of predicted proteins [38].
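Whether ML re-scoring of docking results actually reduces false positives is usually judged by enrichment: how many more actives appear near the top of the ranked list than random selection would give. The sketch below computes the enrichment factor EF = (actives in the top fraction / size of the top fraction) / (total actives / library size). It compares a docking ranking, where more negative scores are better, with a re-scored ranking, where higher is better. All scores are hypothetical.

```python
from math import ceil

def enrichment_factor(scores, is_active, fraction=0.2, higher_is_better=True):
    """Enrichment factor of the top `fraction` of a library ranked by `scores`.

    EF = (actives in top / size of top) / (total actives / library size).
    Set higher_is_better=False for docking energies, where lower is better.
    """
    n = len(scores)
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("the library contains no known actives")
    n_top = max(1, ceil(fraction * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    hits = sum(is_active[i] for i in order[:n_top])
    return (hits / n_top) / (total_actives / n)

# Hypothetical library of 10 compounds, 2 of which are known actives.
is_active = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
docking = [-6.0, -9.0, -8.5, -8.0, -7.5, -7.0, -6.8, -6.5, -6.2, -9.5]  # lower is better
rescored = [0.9, 0.2, 0.3, 0.1, 0.4, 0.2, 0.1, 0.3, 0.2, 0.8]           # higher is better

print("docking EF at top 20%: ", enrichment_factor(docking, is_active, higher_is_better=False))
print("rescored EF at top 20%:", enrichment_factor(rescored, is_active))
```

An EF of 1 means no better than random selection; the maximum possible value here is 5, reached when both actives fill the top 20% of the list.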

5) Role of machine learning in drug repurposing through accurate DTI prediction:
Accurate drug-target interaction (DTI) prediction not only overcomes the limitations of experimentally mapped DTI methodology but also makes it possible to repurpose drugs that are already approved [39]. Various in silico methods are available for repurposing drugs using DTI mappings. For instance, Mei et al. proposed a multi-label learning method for drug repurposing based on known drug targets. In this method, each drug is treated as a class label and the target proteins serve as training data for an L2-regularized logistic regression model [40]. Stratified multi-label cross-validation gave about 89% correct predictions for one drug, and the proposed framework achieved 84.5% accurate DTI predictions compared with other methods. This shows that the framework is generally applicable to large sets of drugs and requires only the chemical structures of the drugs and their target information [41]. The recently proposed iDrug method integrates DTI prediction and drug repositioning into a coherent model termed cross-network embedding. It provides a principled way of transferring knowledge across drug-target relationships and uses this to improve prediction accuracy for both DTIs and drug-disease relationships.
iDrug was tested on various datasets covering multiple disease types and has therefore become applicable to drug repurposing [42].
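The multi-label formulation attributed to Mei et al. above can be sketched as one L2-regularized logistic regression per drug, each trained on feature vectors of target proteins labeled 1 if the drug is known to interact with that target. The code below is a minimal pure-Python illustration, not the authors' implementation: the two-dimensional protein features, learning rate, penalty strength and drug names are all hypothetical assumptions.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def train_l2_logreg(X, y, lam=0.01, lr=0.5, epochs=2000):
    """Fit binary logistic regression with an L2 penalty by batch gradient descent.

    X: list of feature vectors (one per target protein); y: 0/1 labels.
    Minimises mean log-loss + (lam / 2) * ||w||^2; the bias is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Predicted probability that the drug behind `model` interacts with target x."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_multilabel(X, Y, **kwargs):
    """Train one L2-regularised model per drug; each drug is treated as a label.

    Y: dict mapping drug name -> list of 0/1 interaction labels aligned with X.
    """
    return {drug: train_l2_logreg(X, labels, **kwargs) for drug, labels in Y.items()}

# Hypothetical toy data: two-dimensional descriptors for four target proteins.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
Y = {"drug_A": [1, 1, 0, 0], "drug_B": [0, 0, 1, 1]}
models = train_multilabel(X, Y)

new_target = [0.95, 0.05]
for drug, model in models.items():
    print(drug, round(predict_proba(model, new_target), 3))
```

Training the labels independently keeps the sketch simple; a repurposing hypothesis then corresponds to a high predicted probability for a drug-target pair that is not yet among the known interactions.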
Molecule Transformer-Drug Target Interaction (MT-DTI) is a pre-trained deep learning drug-target interaction model that gives more accurate results. It has been applied to a number of existing drugs to identify those effective against SARS-CoV-2 viral proteins. Using this method, atazanavir, an antiretroviral drug used to treat HIV, was predicted to be the most potent drug, with a Kd value of 94.66, comparable to other drugs available for treating SARS-CoV-2 [43].

6) Algorithms for molecular docking and simulation:
Molecular docking is an in silico approach to structure-based drug design, as it can predict the conformation of small molecules or ligands bound to an appropriate target binding site. A drawback is that the 3D molecular structures required for docking or simulation have not been resolved for many drug candidates [44]. Moreover, the accuracy of these methods decreases when only a small or insufficient number of ligands is available for a drug molecule. Despite these limitations, a number of applications describe the use of AI-based tools in molecular docking. For example, thioridazine, one of 1500 FDA-approved drugs screened, was found to act as an anti-inflammatory compound and to inhibit IκB activity, which is critical for the NF-κB pathway involved in cancer progression. Similarly, in another experiment, virtual docking of about 1400 FDA-approved drugs predicted inhibitory activity against Pseudomonas aeruginosa, in particular against the expression of various virulence genes [45].
Artificial intelligence has been found to be the most accurate method for predicting protein structure from amino acid sequences [46]. We recently employed Virtual Kinome Profiler, an efficient platform that captures chemical similarity across the kinome, thus speeding up drug repurposing through accurate prediction of kinase inhibitors [47]. A support vector machine (SVM) algorithm enabled predictions for more than 30 million kinase-compound combinations, which we used to carry out in silico predictions for 150,000 compounds for drug repurposing and lead optimization. Experimental testing validated about 15 of 50 predicted targets, corresponding to a 1.5% increase in precision and a reduction of the false-negative rate to about 1.3% [48].
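The validation figures above can be read as simple confusion-matrix ratios. The sketch below turns "about 15 of 50 tested predictions confirmed" into an experimental hit rate (the precision among tested predictions) and shows the false-negative rate formula. The false-negative counts in the second example are hypothetical, because the text does not report them.

```python
def precision(tp, fp):
    """Fraction of positive predictions that were experimentally confirmed."""
    return tp / (tp + fp)

def false_negative_rate(fn, tp):
    """Fraction of true interactions that the model failed to predict."""
    return fn / (fn + tp)

# From the text: about 15 of 50 experimentally tested predictions were confirmed.
tested, confirmed = 50, 15
hit_rate = precision(confirmed, tested - confirmed)
print(f"hit rate among tested predictions: {hit_rate:.0%}")

# Hypothetical counts, only to show the formula.
print(f"false-negative rate: {false_negative_rate(fn=5, tp=95):.1%}")
```

Precision can only be estimated for predictions that were tested, whereas the false-negative rate also needs a set of known true interactions that the model may have missed.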
There are also several examples in which algorithms have been used for drug repurposing across various target classes. The CATNIP model, for example, uses the similarity of molecules on the basis of their target, structure or pathway information. Similarly, another model used chemical structure-based information to make predictions for 23 FDA-approved drug molecules, 8 of which had cardioprotective activity [49].

7) Conclusion:
As the field emerges, scientists are searching for new drugs, therapies or treatments that are more efficient than existing ones. Understanding the mechanisms underlying disease progression and the effects of already discovered drugs, and unravelling their genetic basis, can lead to new precision drug therapies that will in turn improve both the health and the lives of patients. Classical drug discovery methods are time-consuming and yield more false-negative results. In comparison, AI-based computational approaches can identify drugs more quickly. A number of studies have shown the impact of drugs on various targets, and many of these drugs are now in the discovery process.
In addition to identifying targeted drugs or proteins, drug repurposing provides new molecular targets, which can be exploited on the basis of their properties to predict new drug molecules using AI-based tools in bioinformatics. In the future, bioinformatics is expected to play a more significant role in analyzing data with AI and ML approaches, saving both time and resources. This will in turn accelerate discoveries in the biomedical and biological sciences and in robotic surgery.