Predictive Modeling of Henry’s Law Constant in Chemical Structures Using LSSVM and ANFIS Algorithms

Abstract: Henry’s constants of compounds in water are of great importance in transfer calculations. Measuring these constants faces several difficulties, including the high cost of experiments and the low accuracy of measurement apparatus. A low-cost and accurate prediction approach is therefore needed. To this end, an adaptive neuro-fuzzy inference system (ANFIS) and a least squares support vector machine (LSSVM) have been used as predictors of Henry’s constant. The molecular structure of the compounds has been used as the model input. After training the models, their outputs were examined graphically and statistically. The coefficients of determination of the LSSVM and ANFIS algorithms are 0.999 and 0.990, respectively. Given the comprehensiveness of the databank and the accuracy of the predictions, it can be concluded that the LSSVM and ANFIS algorithms are accurate methods for predicting Henry’s constant over a wide range of chemical structures of compounds in water.


Introduction
The fate of organic materials in the environment depends strongly on various processes, especially the transfer of chemicals between the aqueous and air phases [1]. For compounds in water, the Henry's law constant (H) is known as one of the most important parameters of this process [2].
This constant plays a vital role in different areas of chemistry, such as geochemistry, toxicological chemistry, environmental chemistry and chemical engineering. H is defined by the ratio of a chemical's equilibrium concentrations in the air and water phases. Reliable data for H are therefore required to assess the fate of chemical compounds in the environment.
The precise determination of H is costly, owing to the adsorption of small amounts of solute on the apparatus and to limitations in the analytical detection of very hydrophobic compounds at low concentrations. Consequently, the prediction of H has fundamental value for the study of several scientific phenomena [3,4].
In the literature, there are some approaches that predict H of organic compounds in water directly from chemical structure. There are also a number of indirect approaches that predict H from vapor-liquid equilibrium data, including activity coefficients; however, their applicability to the prediction of H has not been properly assessed [5,6]. Consequently, in this paper we focus on approaches that predict H directly. There are two main types of correlation for the prediction of H. The first type comprises correlations of H with physical properties such as aqueous solubility and vapor pressure. One of the popular approaches of this type is the correlation suggested in [7]. This correlation has significant disadvantages: its accuracy depends on the accuracy of the required physical properties, or of the approaches used to predict them. Moreover, when a required property is missing, H cannot be predicted.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 17 February 2020 doi:10.20944/preprints202002.0248.v1
The second type of correlation is known as quantitative structure-property relationships.
In the current study, two new computational methods are presented to predict H of organic compounds in terms of their functional groups. To this end, an adaptive neuro-fuzzy inference system and a least squares support vector machine have been employed, and different statistical and graphical comparison methods have been applied to determine the precision of these algorithms.

Experimental Data Gathering
The generality of a molecular-based prediction method depends strongly on the comprehensiveness of the databank used to develop it. For this reason, the diversity of chemical families and the number of compounds in the databank are important. A survey of the literature shows that the most reliable databank of H values was collected by Yaws, whose work contains 1940 H values for pure compounds [19]. The H values are given in atm·m³·mol⁻¹ and reported as decimal logarithms, log(H), at a temperature of 25 °C; they range from -13.461 to 6.238. According to previous works, this is the most comprehensive and reliable databank that has been applied to the estimation of H values of organic compounds in water. Structural analysis of the compounds in the databank showed that 107 functional groups occur in their structures.
The numbers of these functional groups in the structure of each compound are used as the model inputs.
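The input encoding described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the five-group vocabulary `GROUPS`, the helper names `encode` and `log_h`, and the example values are assumptions, since the 107 functional groups are not listed in this section.

```python
import math

# Hypothetical vocabulary of functional groups (the study uses 107 groups,
# which are not listed in this section).
GROUPS = ["CH3", "CH2", "OH", "C=O", "Cl"]


def encode(group_counts):
    """Map {group name: count} to a fixed-length count vector ordered like GROUPS."""
    unknown = set(group_counts) - set(GROUPS)
    if unknown:
        raise ValueError(f"unknown groups: {sorted(unknown)}")
    # Groups that do not occur in the compound get a count of zero.
    return [group_counts.get(g, 0) for g in GROUPS]


def log_h(h_atm_m3_per_mol):
    """Decimal logarithm of a Henry's constant given in atm*m^3/mol."""
    return math.log10(h_atm_m3_per_mol)


# Ethanol written as CH3-CH2-OH: one occurrence of each of three groups.
ethanol_features = encode({"CH3": 1, "CH2": 1, "OH": 1})

# Illustrative value only, not taken from the databank.
example_target = log_h(1e-5)
```

Most entries of such a vector are zero for any single compound, because a compound contains only a few of the available groups.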

Adaptive neuro-fuzzy inference system
Fuzzy logic was proposed by Zadeh. Combining artificial neural networks (ANN) and fuzzy logic produces a form of artificial intelligence method called ANFIS.
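As a rough illustration of how such a system combines fuzzy membership functions with trainable linear rules, the sketch below runs one forward pass through a two-input, first-order Sugeno-type ANFIS with Gaussian membership functions. This is the common textbook formulation, not necessarily the exact form used in this study; the `Rule` class, the function names, the two-rule setup and all parameter values are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """One fuzzy rule (hypothetical container): Gaussian premises, linear consequent."""
    cx: float  # centre of the membership function on x
    sx: float  # width of the membership function on x
    cy: float  # centre of the membership function on y
    sy: float  # width of the membership function on y
    p: float   # consequent coefficient of x
    q: float   # consequent coefficient of y
    r: float   # consequent constant term


def gaussian_mf(value, centre, sigma):
    """Gaussian membership grade in (0, 1]."""
    return math.exp(-((value - centre) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x, y, rules):
    # Stages 1 and 2: membership grades, multiplied into rule firing strengths.
    w = [gaussian_mf(x, r.cx, r.sx) * gaussian_mf(y, r.cy, r.sy) for r in rules]
    # Stage 3: normalise the firing strengths so they sum to one.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Stage 4: each rule contributes a linear function of the inputs.
    f = [r.p * x + r.q * y + r.r for r in rules]
    # Stage 5: the overall output is the weighted sum over all rules.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


rules = [
    Rule(cx=0.0, sx=1.0, cy=0.0, sy=1.0, p=1.0, q=0.0, r=0.0),
    Rule(cx=2.0, sx=1.0, cy=2.0, sy=1.0, p=0.0, q=0.0, r=3.0),
]
output = anfis_forward(1.0, 1.0, rules)
```

Training adjusts the centres, widths and consequent coefficients. The sketch does not guard against all firing strengths underflowing to zero for inputs far from every centre.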
In this method, the configuration has five layers. In the first layer, Gaussian membership functions are applied to the inputs, and their parameters are optimized to reach the most accurate answers [20][21][22][23][24]. Here O denotes the i-th output of the j-th layer, and x and y denote the input parameters.
The second layer, which has constant nodes, produces the firing strengths. In the third layer, called the normalized layer, the firing strengths are normalized. The fourth layer provides the linguistic expressions for the outputs. The last layer combines all rules into the overall output. In the current work, particle swarm optimization (PSO) is used to optimize the ANFIS algorithm, as shown in Figure 1.

Least squares support vector machine
In Eq. (12), σ² is the radial basis function width [25][26][27][28][29][30][31][32][33].
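The role of the radial basis function width σ² can be illustrated with a minimal LSSVM regression sketch in its usual textbook form, in which training reduces to solving one linear system. The kernel convention exp(-d²/σ²), the regularization parameter name `gamma`, the helper names and the toy data are assumptions; the study's own equations are not reproduced in this section.

```python
import math


def rbf(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma2); sigma2 is the kernel width."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / sigma2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf(X[i], X[j], sigma2)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve(A, [0.0] + list(y))
    return solution[0], solution[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, b, alpha, sigma2):
    """Prediction f(x) = b + sum_i alpha_i * K(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X))


# Toy data: rows are group-count vectors, targets are made-up log(H) values.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
y_train = [0.0, 1.0, -1.0, 1.0]
b, alpha = lssvm_fit(X_train, y_train, gamma=1e6, sigma2=1.0)
prediction = lssvm_predict([1.0, 1.0], X_train, b, alpha, sigma2=1.0)
```

A small σ² makes the model respond only to very similar compounds, while a large σ² smooths the prediction; `gamma` trades training error against smoothness. These are the two quantities a hyper-parameter search would tune.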
To determine the hyper-parameters of the LSSVM, the PSO algorithm has been implemented as shown in Figure 2.
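A minimal sketch of a global-best PSO loop of the kind that could be used for such a search is given below. The swarm size, inertia weight `w`, acceleration coefficients `c1` and `c2`, and the quadratic stand-in objective are all assumptions; the PSO settings of the study are not given in this section, and in practice the objective would be a validation error of the model as a function of its hyper-parameters.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective(position) inside box bounds with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position of each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def stand_in_objective(p):
    """Placeholder for a validation error; its minimum is at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, best_val = pso(stand_in_objective, [(-10.0, 10.0), (-10.0, 10.0)])
```

Fixing the random seed keeps the run reproducible. Hyper-parameters that span several orders of magnitude are often searched on a logarithmic scale, which here would only change how `bounds` and the objective are defined.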

Results and discussion
In order to determine H of pure compounds in aqueous solutions, two new computational methods, the LSSVM and ANFIS algorithms, have been used. Evaluating accuracy is a main step in developing a model, so the following statistical parameters were used in this section:
• R-squared (R²), Eq. (13)
• Mean squared error (MSE), Eq. (14)
• Standard deviation (STD)
• Root mean square error (RMSE), Eq. (16)
• Mean relative error, Eq. (17)
The above parameters are reported in Table 1 for the LSSVM and ANFIS algorithms. The model outputs are compared with the actual values in Figure 3.
This comparison shows a high degree of agreement between the models' outputs and the actual Henry's constants. The cross plots of actual log H versus predicted log H are shown in Figure 4.
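The statistical parameters used in this section can be computed as sketched below. The definitions are common ones and are assumptions here, since Eqs. (13) to (17) are not reproduced in this text: in particular, STD is taken as the sample standard deviation of the errors, and the mean relative error is taken as the mean absolute relative error in percent. The function name and the toy values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2, MSE, STD, RMSE and mean relative error (percent)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    sse = sum(e * e for e in errors)                   # sum of squared errors
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = sse / n
    mean_error = sum(errors) / n
    # Sample standard deviation of the errors (assumed definition).
    std = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    # Undefined if any actual value is exactly zero.
    mre = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual))
    return {
        "R2": 1.0 - sse / sst,
        "MSE": mse,
        "STD": std,
        "RMSE": math.sqrt(mse),
        "MRE_percent": mre,
    }


# Toy values only, not results from the study.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Because log(H) can be zero or close to zero, a relative error computed on log(H) can become very large for individual compounds, so it is best read together with the absolute measures.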

Conclusions
In the current work, two novel molecular-based approaches were proposed for predicting the Henry's constant of various compounds in water. These models were constructed based on the LSSVM and ANFIS algorithms. The model inputs describe the occurrence of 107 functional-group classes in every compound; for any given compound, the majority of these classes are absent.