A Hybrid Adaptive Educational eLearning Project based on Ontologies Matching and Recommendation System

The implementation of teaching interventions tailored to learning needs has received considerable attention, as providing the same educational conditions to all students is pedagogically ineffective. In contrast, pedagogical strategies that adapt to the actual individual skills of the students are considered more effective. An important innovation in this direction is the Adaptive Educational System (AES), which supports automatic modeling of study and adjusts the teaching content to the students' educational needs and skills. Effective use of these educational approaches can be enhanced with Artificial Intelligence (AI) technologies, so that the substantive content of the web acquires structure and the published information can be interpreted by search engines. This study proposes a novel Adaptive Educational eLearning System (AEeLS) that has the capacity to gather and analyze data from learning repositories and to adapt these to the educational curriculum according to the student's skills and experience. It is a novel hybrid machine learning system that combines a Semi-Supervised Classification method for ontology matching and a Recommendation Mechanism that uses a hybrid of neighborhood-based collaborative and content-based filtering techniques, in order to provide a personalized educational environment for each student.


Introduction
The World Wide Web (WWW) today is an unruly construct with a wide variety of styles. In the last decade in particular, the amount of WWW content has increased dramatically, which implies the need to manage and analyze big data volumes that come from heterogeneous and often non-interoperable sources [1]. The semantic modeling of WWW content, so that it can be interpreted by search engines, is achieved with Semantic Web (SWeb) technologies [2]. In addition, the management of these large volumes is further complicated by the need for strict security and privacy policies under the recent General Data Protection Regulation (GDPR) [3]. As the web evolves, semantic technologies that focus on the meaning of the content are an important priority for the research community.
Generally, SWeb technologies "enable people to create data stores on the web, build ontologies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS", in the words of the W3C's vision of the web of linked data [4]. An ontology is a complex, and possibly quite formal, collection of terms. It is used to define and exemplify an area of concern, to organize the terms that can be used in a domain, to characterize possible relationships, and to define probable restrictions on using those terms [5]. With this approach, search engines can model the retrieval and management of information based on semantic criteria, for the needs of the individualized education of each student.
The rest of the paper is organized as follows: Section 2 presents related work on AES that have used machine learning methods. Section 3 describes the proposed model. Section 4 defines the methodology and, finally, Section 5 contains the conclusions.

Related Work
Online collaboration has highlighted eLearning approaches as an essential part of the modern educational system. Universities, organizations, and companies have adopted eLearning as a more flexible and effective way to train their students, executives, or employees. However, current and future trends in eLearning show that it is a field of continuous innovation and research.
There are several scientific works related to topics relevant to the development of the AEeLS of the present paper. For example, the work in [13] explores several tactics for educational metadata mining, where one of the most important open challenges is the recognition of Learning Objects and the metadata that can be obtained from them. Also, both Mao et al. [14] and Liu et al. [15] show how Ontology Matching can be specified as a binary classification problem, enabling the use of most well-known machine learning algorithms. In the former work, an approach for locating relationships between two ontologies using Support Vector Machines (SVM) is presented. The experimental results are promising when contrasted against other mapping methods.
In addition, the paper [16] proposes a novel ontology matching method that again uses SVMs, demonstrating a precision on the order of 95% in its experimental results.
Another research work [17] explores the ontology mapping problem based on concept classification by decision tree algorithms, introducing a similarity measure between two parts belonging to distinct ontologies. Nonetheless, the work does not give analytical precision results, although it claims that the model produced is faster at execution because fewer evaluations are needed.
A different approach is presented in [18], which introduces a graph-based semantic annotation method for enriching educational content with linked data, in order to achieve document search with high recall and precision.
Metaheuristics have also had an important role in the area of e-learning. In this sense, Luna et al. [19] propose an association paradigm for finding learning rules by applying evolutionary metaheuristic algorithms.
Moreover, Peñalver-Martinez et al. [20] apply some natural language techniques to resources produced for opinion mining with remarkable results.
Also, Wang et al. [21] present a classification method for less popular webpages based on latent semantic analysis and rough set models for the automated tagging of web pages with related content.
On the other hand, research on smart recommendation systems has gained great recognition and usage in e-commerce platforms. The authors of [22], however, introduce an online course recommendation system that combines several clustering methods in order to show that machine learning approaches can significantly enhance the estimation process of courses in e-learning environments.
Also, Gladun et al. [23] present a multi-agent recommendation system for automatic feedback on the knowledge obtained by students in e-learning platforms, taking advantage of SWeb technologies.
Finally, other research on distance learning focuses on proposing a novel form of microlecture through mobile terminals and web platforms [24], while other work focuses on expanding educational horizons (Walters, Walters, Green, & Lin, 2016).

Proposed Framework
Since the methodology of eLearning systems is an extremely complex process, trainers cannot rely only on isolated content and products based solely on old and possibly obsolete educational materials. Classifying content according to student needs should not be a manual and time-consuming process, which would be a significant disadvantage for the education system. From this point of view, more effective methods of educational supervision, capable of automatically controlling the educational content and selecting specific materials for every student, are important to every modern educational system.
It is also important to update the eLearning philosophy and transform it into an Adaptive Educational eLearning System. The ideal AEeLS includes advanced AI solutions for real-time analysis of the educational needs of both known and unknown students, instant reports, data visualization of progress, and other sophisticated solutions that maximize the educational experience, alongside a fully automated content evaluation process based on semantic technologies.
Unlike other techniques proposed in the literature, which focus on static approaches [16][17], the dynamic model of the AEeLS produces an evolving educational tool without special requirements or computing resources.
The algorithmic approach of the proposed AEeLS includes, in the first stage, an Ontologies Matching process over web content in order to find the relevant educational content (stage 1 in Figure 1). In the second stage, the content is checked for precision and accuracy, and a Recommendation Mechanism proposes new relevant material in order to produce a closely fitted curriculum for each student (stage 2 in Figure 1).
Figure 1 is a representation of the algorithmic approach of the proposed AEeLS model.

Ontologies Matching
Ontologies are a formal, structured information framework and a clear definition of a common and agreed conceptualization of the properties and interrelationships of the entities that exist in a particular domain of interest. The main components of ontologies are classes, properties, instances and axioms. Classes represent sets of entities within a specific domain. Properties define the various attributes of concepts and the constraints on these attributes. Both can be organized into separate hierarchies. Instances represent the concepts, and axioms are assertions in logical form that constrain the values of classes or properties [25].
Formally, an ontology can be defined as below [26]:
$$O = (C, P, H^C, H^P, I, A^O) \qquad (1)$$
where $C$ and $P$ denote classes and properties, $H^C$ and $H^P$ are their respective hierarchies, $I$ is a set of instances and $A^O$ is a set of axioms.
The proposed Ontologies Matching Mechanism (OMM) is based on advanced computational intelligence and machine learning techniques. The aim is to develop a fully automated method for extracting information and assessing how well it serves student needs [27]. In particular, this subsystem automates the extraction, analysis, and interconnection of educational web content based on relevant ontologies for further processing. It also allows for the effective detection of conflicting rules or content related to the transmission of personal data, to ensure that they cannot be used to create a user profile or cause privacy leakages. To achieve this, ontology matching techniques based on AI methods are used.
Ontology matching is a promising solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of the ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering and data translation. Thus, matching ontologies allows the knowledge and data expressed in the matched ontologies to interoperate [28].
Ontology matching is the procedure of establishing correspondences between concepts in ontologies in order to derive an alignment between two ontologies, where an alignment consists of a set of correspondences between their elements such that sufficiently similar elements are treated as equivalent. Given two ontologies OS (source ontology) and OT (target ontology) and an entity es in OS, ontology matching M denotes the process that finds the entity et in OT such that es and et are deemed to be equivalent [29].
It should be emphasized that the relationship established by the ontology matching process can be subsumption, equivalence, disjointness, part-of or any user-specified relationship. The most significant matchings or alignments can be categorized along three dimensions [30]:
1. Similarity vs Logic: This category concerns the similarity and logical equivalence among the ontology terms.
2. Atomic vs Complex: This category considers whether the alignment is "one-to-one" or "one-to-many".
3. Homogeneous vs Heterogeneous: This category examines whether the alignment is between terms of the same type or not (e.g., classes to classes, individuals to individuals, etc.).
Usually, an ontology matching approach applies several different categories of matchers, such as labels, instances, and taxonomy structures, to identify and calculate the similarity between ontologies. The simplest strategy is to aggregate the similarity values of each entity pair in a linear weighted fashion and choose a suitable threshold to separate matching from non-matching pairs. However, for a given matching task, it is difficult to define the right weights for each matcher [30]. In the recent past, many ontology matching methods and weighting strategies, such as Harmony [31] and Local Confidence [32], have been suggested to determine the weights adaptively, but there is no single strategy that fits all cases.
In contrast, machine learning based ontology matching methods have been shown to produce more accurate and reliable matching results [33]. Specifically, supervised machine learning methods use a set of validated matching pairs as training examples in order to learn patterns that can find the right matches among all the candidate matching pairs. On the other hand, unsupervised machine learning methods use arbitrary and heuristic strategies to match pairs, without an orderly and modeled methodology. Comparing the machine learning approaches, supervised methods usually achieve better results [33].
However, the main weakness of fully supervised methods is that they need a substantial amount of labeled training examples to create a predictive model with acceptable performance. The training dataset is mostly built manually by the trainer, which is a difficult and time-consuming procedure. In addition, current methods only provide the similarity values purely as numeric features, without taking their critical characteristics into account [34].
As an alternative, the key characteristic of training with a Semi-Supervised method is the creation of a robust model using pre-classified together with unlabeled instances. This approach operates on the condition that the input patterns with and without labels belong to the same marginal distribution, or follow a common structure. In general, unclassified data offer useful information for discovering the structure of the whole dataset, while the labeled data are presented in the learning procedure. Thus, even the most serious real-world problems can be modeled successfully, based on the crucial peculiarities that describe them [34].
The OMM uses an innovative semi-supervised learning approach to ontology matching. Given a small set of labeled matching entity pairs, the technique first exploits the central relationships in the similarity space to enrich the positive training instances. After obtaining more training instances, a graph-based semi-supervised learning algorithm is employed to classify the remaining candidate entity pairs into matched and non-matched classes. Finally, the method defines several constraints to adapt the probability matrix in the label propagation algorithm, which helps to increase the performance of the matching results [35].
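A minimal sketch of graph-based label propagation over candidate entity pairs is given below. It assumes a Gaussian edge weight between similarity feature vectors and simple clamping of the labeled seeds; the constraint-based adaptation of the probability matrix from [35] is not reproduced, and all names and values are illustrative.

```python
import math
from typing import Dict, List, Sequence


def rbf_weight(a: Sequence[float], b: Sequence[float], sigma: float = 0.5) -> float:
    """Gaussian similarity between two feature vectors (edge weight in the graph)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def propagate_labels(
    features: List[Sequence[float]],
    seeds: Dict[int, int],  # index of a labeled pair -> 1 (match) or 0 (non-match)
    iterations: int = 50,
) -> List[float]:
    """Return, for every candidate pair, the propagated probability of 'match'."""
    n = len(features)
    weights = [
        [rbf_weight(features[i], features[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled pairs start undecided at 0.5; seeds start at their known label.
    prob = [float(seeds.get(i, 0.5)) for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds:
                updated.append(float(seeds[i]))  # clamp labeled pairs
                continue
            norm = sum(weights[i])
            if norm == 0.0:
                updated.append(prob[i])
            else:
                updated.append(sum(w * p for w, p in zip(weights[i], prob)) / norm)
        prob = updated
    return prob


# Each row holds illustrative similarity features of one candidate entity pair.
features = [[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.15, 0.1]]
probabilities = propagate_labels(features, seeds={0: 1, 2: 0})
predicted = ["matched" if p >= 0.5 else "non-matched" for p in probabilities]
```

Only two of the four pairs are labeled here; the other two inherit their class from the nearest labeled pair in the similarity space.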
The semi-supervised learning method is suitable for the OMM as it ensures fast, robust and efficient classification performance. Moreover, it is an easily adjustable and applicable method. It is also a pragmatic machine learning technique that can model the ontology matching challenge based on a small set of pre-classified data vectors, exposing the relationships among the taxonomy structures of the ontologies [34][35].
Specifically, the OMM applies a hybrid model which employs well-established algorithms, optimally combined in order to create a faster and more flexible integrated Fuzzy Semi-Supervised Learning system. The most important innovation and advantage of the proposed approach is the easy validation of the classification process for previously unseen data, based on robust measurable factors. The theoretical background of the system's core is presented in the next paragraphs.
The naive Bayes classifier [36] is a practical learning method based on a probabilistic representation of a data structure, representing a set of random variables and their conditional independence, in which complete and joint probability distributions are specified. The objective of the algorithm is to classify a sample X into one of the given categories C1, C2, ..., Cn using a probability model defined according to Bayes' theorem. These classifiers produce probability estimates rather than hard predictions, which is often more useful and effective. Here the predictions have a score and the purpose is the minimization of the expected cost. Each category is represented by a prior probability.
We assume that each sample X belongs to a class Ci and, based on Bayes' theorem, we estimate the posterior probability. The quantity P describing a naive Bayes classifier for a set of samples expresses the probability that c is the value of the dependent variable C, given the values x = (x1, x2, ..., xn) of the properties X = (X1, X2, ..., Xn). It is given by relation (2), where the characteristics xi are considered independent [36]:
$$P(c \mid x) = \frac{P(c)\prod_{i=1}^{n} P(x_i \mid c)}{P(x)} \qquad (2)$$
The estimation of the above quantity for a set of N examples is done using relations (3), (4) and (5); for a characteristic xi with discrete values, the probability is estimated by equation (5).
Each node in V is a random variable that can take a value from an appropriate domain. V is further divided into two sets of nodes: X, the observed variables, and Y, the nodes whose values need to be determined. Our task is to label the nodes Yi ∈ Y with one of a small number of labels, L = {L1, ..., Lq}; we use the shorthand yi to denote the label of node Yi.
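As a rough illustration of the naive Bayes classifier described earlier in this section, the sketch below estimates the prior and the per-attribute conditional probabilities from counts for discrete attributes. Laplace smoothing is an assumption added here to avoid zero probabilities; the attribute values and class names are invented for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def train_naive_bayes(samples: List[Sequence[Hashable]], labels: List[Hashable]):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> values seen in training
    for x, c in zip(samples, labels):
        for i, value in enumerate(x):
            value_counts[(i, c)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def posterior(x: Sequence[Hashable], model) -> Dict[Hashable, float]:
    """Posterior P(c | x) for every class, assuming independent attributes."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        log_p = math.log(n_c / n)  # prior P(c)
        for i, value in enumerate(x):
            count = value_counts[(i, c)][value]
            # Laplace-smoothed estimate of P(x_i | c); the smoothing is an assumption.
            log_p += math.log((count + 1) / (n_c + len(domains[i])))
        log_scores[c] = log_p
    # Normalize, which plays the role of dividing by P(x).
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


# Illustrative discrete attributes: (label similarity level, shared instances?).
samples = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no")]
labels = ["match", "match", "non-match", "non-match"]
model = train_naive_bayes(samples, labels)
probs = posterior(("high", "yes"), model)
```

Working in log space keeps the product of many small probabilities numerically stable; the final normalization returns proper probabilities rather than raw scores.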
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 August 2020 doi:10.20944/preprints202008.0388.v1
Also, according to Zadeh [38], every element x of the universe of discourse X belongs to a Fuzzy Set (FS) with a degree of membership in the closed interval [0,1]. Thus, function (6) is the mathematical basis of a FS [38]:
$$S = \{(x, \mu_S(x)) \mid x \in X,\ \mu_S : X \to [0,1]\} \qquad (6)$$
Function (7) is a case of a normal Triangular Fuzzy Membership Function (FMF). It must be explained that the factors a and b are the lower and upper bounds of the raw data, respectively [38].
According to typical (crisp) classification methods, each sample can be assigned to only one class. Thus, the class membership value is either 1 or 0. In general, classification methods reduce the dimensionality of a complex data set by grouping the data into a set of classes.
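To contrast with the crisp 0/1 membership just described, a triangular fuzzy membership function of the kind referred to as function (7) can be sketched as below. The bounds a and b follow the text; the `peak` parameter (the point of full membership) is a hypothetical name introduced here, since the exact form of function (7) is not reproduced in this text.

```python
def triangular_membership(x: float, a: float, b: float, peak: float) -> float:
    """Degree of membership of x in a triangular fuzzy set on [a, b].

    a and b are the lower and upper bounds of the raw data; `peak` is the
    (assumed) point where membership reaches 1.
    """
    if not a < peak < b:
        raise ValueError("expected a < peak < b")
    if x <= a or x >= b:
        return 0.0
    if x <= peak:
        return (x - a) / (peak - a)  # rising edge
    return (b - x) / (b - peak)      # falling edge


# A similarity score of 0.25 belongs only partially to a set centered at 0.5.
partial = triangular_membership(0.25, a=0.0, b=1.0, peak=0.5)
```

Unlike a crisp class, the same value can have non-zero membership in several overlapping triangular sets at once.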
In fuzzy classification, a sample point can be assigned to several classes with different degrees of membership. The fuzzy c-means clustering algorithm initially assigns random values to the cluster centers and then assigns every data point to every cluster with varying Degrees of Membership (DoM), by measuring the Euclidean distance.
The Euclidean distance of each data point xi from the center of each cluster c1, ..., cj is calculated based on equation (8) [39], where dji is the distance of xi from the center of the cluster cj. Then the DoM of each data point to each cluster is estimated based on equation (9), where cj is the center of the j-th cluster (j = 1, 2, ..., p) and xi is the i-th point [39]. This is an iterative algorithm and the whole process is repeated until the centers stabilize.
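The iteration described above can be sketched as follows. This is the textbook fuzzy c-means update with a fuzzifier m = 2, which is an assumption, since the value used in the paper is not stated. For reproducibility the sketch takes explicit initial centers instead of random ones.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between a data point and a cluster center."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def memberships(points: List[Point], centers: List[Point], m: float = 2.0) -> List[List[float]]:
    """u[i][j] is the degree of membership of point i in cluster j; rows sum to 1."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for x in points:
        dists = [euclidean(x, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a center: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            table.append([h / sum(hits) for h in hits])
        else:
            table.append(
                [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
            )
    return table


def fuzzy_c_means(
    points: List[Point],
    initial_centers: List[Point],
    m: float = 2.0,
    tol: float = 1e-6,
    max_iter: int = 200,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and center updates until the centers stabilize."""
    centers = [list(c) for c in initial_centers]
    dims = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dims)]
            )
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, dom = fuzzy_c_means(points, initial_centers=[points[0], points[3]])
```

Each row of `dom` gives the DoM of one point in every cluster, so a point near a cluster boundary would receive comparable values in both columns.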
The OMM is an innovative hybrid algorithm based on a combination of soft computing approaches. Let us consider a supervised learning case with a training set of size N, $\{X, Y\} = \{x_i, y_i\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^{n_i}$ and $y_i$ is a binary vector of size $n_o$. It must be clarified that $n_i$ and $n_o$ are the dimensions of the input and output, respectively.
The OMM initially performs Semi-Supervised Clustering (SSC). This means that cluster assignments may be already known for some subset of the data. The final aim is the classification of the unlabeled observations to the appropriate clusters, using the known assignments for this subset of the data. At the same time the algorithm produces the degree of membership of each record to its cluster.
The clustering validation process is performed by employing the "classes to clusters" (CL_A_U) method, which adopts SSC. Initially, a minimal data sample is used, comprising the clusters derived from the SSC process (labeled data). The remaining unlabeled data are used to dynamically form and adjust the classes based on their DoM. In practice, the CL_A_U approach assigns classes to the clusters based on the majority value of the class attribute within each cluster. The class attribute is treated like any other attribute and it is part of the input to the clustering algorithm.
The objective is to assess whether the selected clusters match the specified class data. In the CL_A_U evaluation, the system is told which attribute is the predetermined "class". This attribute is then removed from the data before they are passed to the SSC algorithm. The CL_A_U evaluation finds the minimum error of mapping classes to clusters (where only the class labels that correspond to the instances in a cluster are considered), with the constraint that a class can only be mapped to one cluster.
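The minimum-error mapping of classes to clusters described above can be sketched as follows. The brute-force search over assignments is an illustrative choice that is only practical for a handful of clusters; it is not claimed to be how the evaluation is implemented in the system.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Hashable, List, Optional, Tuple


def classes_to_clusters(
    cluster_ids: List[Hashable], class_labels: List[Hashable]
) -> Tuple[Dict[Hashable, Optional[Hashable]], int]:
    """Map each cluster to at most one class so that the number of errors is minimal.

    Returns the mapping (cluster -> class, or None if left unassigned) and the
    number of instances whose class differs from the class of their cluster.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    counts = Counter(zip(cluster_ids, class_labels))
    # Pad with None so that surplus clusters can stay without a class.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best_mapping: Dict[Hashable, Optional[Hashable]] = {}
    best_correct = -1
    for assignment in permutations(padded, len(clusters)):
        # A permutation uses each class at most once: one class, one cluster.
        correct = sum(
            counts[(cluster, cls)]
            for cluster, cls in zip(clusters, assignment)
            if cls is not None
        )
        if correct > best_correct:
            best_mapping = dict(zip(clusters, assignment))
            best_correct = correct
    return best_mapping, len(cluster_ids) - best_correct


cluster_ids = [0, 0, 0, 1, 1, 1]
class_labels = ["match", "match", "non-match", "non-match", "non-match", "match"]
mapping, errors = classes_to_clusters(cluster_ids, class_labels)
```

In this toy run each cluster is dominated by one class, and the two minority instances are counted as errors.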
The resulting classes are fuzzified by assigning them appropriate linguistic terms, in order to obtain a realistic coherence between the associated values of the dataset under study.
The whole process is presented in Algorithm 1.

Recommendation Mechanism
The Recommendation Mechanism (RMm) is a computational intelligence and machine learning mechanism [40] in the AEeLS that creates intelligent rules for intervention decisions and offers personalized real-time information for the students' educational needs using the Collaborative Filtering (CF) [41] technique.
CF is a machine learning method that makes predictions (filtering) about a user's interests by accumulating preferences or taste information from several users (collaborating). In the more general sense, CF is the process of filtering for information or patterns using techniques that involve collaboration among multiple agents, viewpoints, data sources, etc. Usually, the workflow of a CF system can be defined as below [41]:
1. A user expresses preferences by rating items of the system. These ratings can be considered an approximate representation of the user's interest in the corresponding domain.
2. The system matches this user's ratings against those of other users and finds the people with the most "similar" preferences.
3. Based on these similar users, the system recommends items that they have rated highly but that have not yet been rated by this user.
CF systems are divided into memory-based and model-based methods. Memory-based methods simply memorize the user preferences and issue recommendations based on the relationship between the newly rated items and the rest of the rating matrix. Model-based methods, on the other hand, fit a parameterized model to the given rating matrix and then issue recommendations based on the fitted model [41].
The most popular and reliable CF methods are neighborhood-based methods, which predict ratings by referring to users whose ratings are similar, i.e., the closest training examples in the feature space. A useful technique for this purpose is to assign weights to the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. This is motivated by the hypothesis that if two users have similar ratings on some items, they will have similar ratings on the remaining items, and vice versa [42].
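The neighbor-weighted prediction described above can be sketched as follows, assuming cosine similarity over co-rated items as the closeness measure. The student names, learning object identifiers and ratings are invented for illustration.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average of the k nearest neighbors' ratings for `item`."""
    neighbors = sorted(
        (
            (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
            for other, other_ratings in ratings.items()
            if other != user and item in other_ratings
        ),
        reverse=True,
    )[:k]
    weight_sum = sum(sim for sim, _ in neighbors)
    if weight_sum <= 0.0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in neighbors) / weight_sum


ratings: Ratings = {
    "ana": {"lo1": 5.0, "lo2": 4.0},
    "ben": {"lo1": 5.0, "lo2": 4.0, "lo3": 5.0},
    "cy": {"lo1": 1.0, "lo2": 5.0, "lo3": 2.0},
}
prediction = predict_rating(ratings, "ana", "lo3")
```

Because "ben" rates exactly like "ana" on the shared items, his rating of "lo3" dominates the prediction, which is the weighting effect the text describes.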
Currently, CF methods have been applied to many kinds of systems, including sensing and monitoring applications, environmental sensing over large areas, financial processes, electronic commerce and web applications [42] [45].
Traditional CF methods face two major challenges: data sparsity and scalability [42]. In the RMm, we use a hybrid of neighborhood-based CF and content-based filtering that addresses these challenges and improves the quality of recommendations [43].
The aim of this hybrid method is to achieve more personalized intelligent rules for intervention decisions and personalized real-time recommendations for the student's educational needs based on skills. This hybrid method is more versatile, in the sense that it can be applied to heterogeneous ontologies and, with some care, could also provide cross-domain recommendations. Also, it works best when the user space is large, it is easy to implement, it scales well with uncorrelated items and it does not require complex tuning of parameters [46].
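One simple way to combine the two signals is a weighted blend that falls back to the content-based score when no collaborative score is available, which is one response to data sparsity. The sketch below assumes this blending scheme, a Jaccard keyword overlap as the content score, and a mixing weight `alpha`; none of these details is specified in the text, and all identifiers are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two keyword sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(
    cf_score: Optional[float],  # collaborative score scaled to [0, 1], or None
    profile: Set[str],          # keywords describing the student's skills and needs
    item_terms: Set[str],       # keywords describing the learning object
    alpha: float = 0.6,         # assumed weight of the collaborative part
) -> float:
    content_score = jaccard(profile, item_terms)
    if cf_score is None:
        return content_score    # sparse case: no ratings for this item yet
    return alpha * cf_score + (1.0 - alpha) * content_score


def recommend(
    profile: Set[str],
    items: Dict[str, Set[str]],
    cf_scores: Dict[str, float],
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    """Rank learning objects by their hybrid score and keep the best top_n."""
    scored = [
        (name, hybrid_score(cf_scores.get(name), profile, terms))
        for name, terms in items.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


profile = {"python", "loops", "functions"}
items = {
    "lo_loops": {"python", "loops"},
    "lo_sql": {"sql", "joins"},
    "lo_functions": {"python", "functions", "recursion"},
}
cf_scores = {"lo_loops": 0.9, "lo_sql": 0.8}  # lo_functions has no ratings yet
ranking = recommend(profile, items, cf_scores)
```

Here the unrated but relevant "lo_functions" still outranks the well-rated but off-topic "lo_sql", which a purely collaborative ranking could not do.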

Data
The proposed pattern classification model was validated through tests on data taken from the Ontology Alignment Evaluation Initiative (OAEI) 2014 [47] campaign, as well as on data taken from two well-known educational content repositories: ARIADNE [48] and MERLOT [49]. Thus, two datasets were built, containing patterns representing the relationships between pairs of Learning Objects taken from two different ontologies in the Open and Distance Learning context.
For the first test, following [50], the OAEI 2014 data bank was used to address the Instance Matching Track, more precisely the Identity Recognition Task [47]. The goal is to find an appropriate similarity function in order to build pairs of objects that are actually close in meaning. Through the adequate use of a given similarity function, the ontology matching problem is transformed into a binary pattern classification problem.
The second experiment consists of matching two different educational content repositories (ARIADNE and MERLOT) in Learning Objects Metadata format, based on a sample of 100 from each repository, related to the Computer Science topic.
The ARIADNE Foundation offers the capability to transform the metadata of the objects into known specifications, such as Learning Objects Metadata and Dublin Core.
MERLOT is one of the largest open access repositories of educational materials and is designed for use by research communities. It includes a collection of learning resources and educational materials, such as animations, case studies, collections, questionnaires, simulators, etc.
In this experiment, following [50], a total of 100 1:1 matching examples were constructed from both ontologies. Feature extraction takes the following fields into account for the pattern structure: title, description, keywords, and type of resource.
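To make the transformation into a binary classification problem concrete, the sketch below turns a pair of Learning Objects into a numeric pattern with one similarity value per metadata field. Token-level Jaccard similarity is an assumed stand-in for the similarity function, and the two records are invented examples rather than repository data.

```python
import re
from typing import Dict, List, Set

FIELDS = ("title", "description", "keywords", "type")


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a metadata field."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(lo_a: Dict[str, str], lo_b: Dict[str, str]) -> List[float]:
    """One similarity value per field; the vector is the pattern to classify."""
    return [
        jaccard(tokens(lo_a.get(field, "")), tokens(lo_b.get(field, "")))
        for field in FIELDS
    ]


lo_a = {
    "title": "Introduction to Python Programming",
    "description": "Basic Python syntax and loops",
    "keywords": "python, programming, loops",
    "type": "tutorial",
}
lo_b = {
    "title": "Python Programming Introduction",
    "description": "Loops and syntax in Python for beginners",
    "keywords": "python, loops",
    "type": "tutorial",
}
features = pair_features(lo_a, lo_b)
```

Each such vector, labeled as match or non-match, becomes one training or test pattern for the classifier.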
The classification performance is estimated by the usual evaluation measures: Precision (PRE), Recall (REC) and F-Score, defined in equations (12), (13) and (14) respectively [51][52]:
$$PRE = \frac{TP}{TP + FP} \qquad (12)$$
$$REC = \frac{TP}{TP + FN} \qquad (13)$$
$$F\text{-}Score = \frac{2 \cdot PRE \cdot REC}{PRE + REC} \qquad (14)$$
where TP, FP and FN are the numbers of true positives, false positives and false negatives. Precision shows what percentage of positive predictions were correct, whereas Recall measures what percentage of positive events were correctly predicted. The F-Score can be interpreted as a weighted average of precision and recall. Consequently, this measure takes both false positives and false negatives into account. Intuitively it is not as straightforward to comprehend as accuracy, but the F-Score is generally more valuable than accuracy, and it works best if false positives and false negatives have similar cost, as in this case.
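The three measures can be computed from predicted and true labels as in the short sketch below, using the standard definitions; the label vectors are illustrative only.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_fscore(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F-Score of the positive (matched) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# 1 = matched pair, 0 = non-matched pair.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
pre, rec, f_score = precision_recall_fscore(y_true, y_pred)
```

With two true positives, one false positive and one false negative, all three measures coincide at two thirds.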
Also, validation used the 10-fold cross-validation method because the quantity of available examples is relatively large, which in turn offers statistically sound performance measurements [51][52].
Tables 1 and 2 clearly demonstrate that the proposed method has superior performance on both datasets, which is quite promising considering the complexities of this problem. It is important to note that evaluating the several factors that can define a challenge of the type discussed here is a partly subjective, non-linear and dynamic procedure.
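A minimal sketch of how the examples can be partitioned for 10-fold cross-validation is shown below. The shuffling seed and the absence of stratification are assumptions; the text does not describe how the folds were drawn.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 100 matching examples, as in the second experiment.
splits = list(k_fold_indices(100, k=10))
```

Every example is used for testing exactly once and for training in the other nine folds, and the reported measures are averaged over the folds.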

Discussion
This work presented a hybrid [53][54][55][56], innovative [57], reliable [58][59] and highly effective eLearning system that has the capacity to gather and analyze data from learning repositories and to adapt these to the educational curriculum according to the student's skills and experience, based on sophisticated computational intelligence methods [60]. The AEeLS is an innovative effort to effectively analyze and recommend relevant educational content based on semantic ontology techniques. The proposed method is based on the optimal combination of the OMM and RMm algorithms, which ensures the adaptation of the system to new situations. It offers a high level of generalization by implementing a robust algorithm capable of responding to high-complexity problems. The performance of the proposed algorithm was tested on two multidimensional datasets of high complexity. These datasets emerged as a result of extensive research on the function of ontologies. They realistically represent how ontologies operate under normal conditions and under the demands of modern educational systems and needs. The results support the efficiency of the developed hybrid model.

Innovation
An important innovation of the AEeLS is the use of hybrid learning techniques capable of solving a multi-dimensional and complex problem. The proposed system simulates in a realistic way the functioning of biological knowledge, the practical mode of human memory and, more generally, the ways in which the brain uses skills and experiences.
Also, an important improvement is the separation of the OMM and the RMm in order to transfer expertise within the eLearning system. This significantly enriches the way in which learning extraction techniques work, as it creates the possibility of forming heterogeneous systems to which transfer learning can be applied.
Finally, it should not be overlooked that an equally valuable innovation is the integration of AI at the level of an educational eLearning system. This considerably improves the performance of modern educational systems. This innovation provides important solutions and improves the way eLearning systems work and respond to the new generation.

Future Work
Future research will focus on further optimization of the algorithm's parameters, which may result in faster and more accurate performance. We will work on reducing the complexity of the AEeLS to a highly understandable and adjustable level. Further optimization by means of self-improvement and auto-learning can be explored to fully automate the process of detecting relevant educational content. Finally, a very important future improvement is the extension of the algorithm with Natural Language Processing (NLP) capabilities, using Recurrent Neural Networks (RNN) and specifically deep architectures such as Long Short-Term Memory (LSTM), in order to approach and model time sequences and their broader dependencies with greater accuracy and efficiency.

Conflicts of Interest:
The authors declare no conflict of interest.