Personalized hybrid educational recommender system using matrix factorization with user and item information

One of the main challenges for autonomous learning in virtual environments is finding the right material that fits students' needs and supports their learning process. Personalized recommender systems partially solve this problem by suggesting online educational resources to students based on their preferences. However, in educational environments (which need a proper characterization of both users and educational resources), most existing recommendation algorithms either fail to include all the available information or use hybrid processes that do not exploit possible relationships between users and item features. This article presents a personalized recommender system for educational resources aimed at combining user and item information into a single mathematical model based on matrix factorization. As a result, estimated latent factors can provide insight into possible interactions between users and item features, improving the quality of the information retrieval process. We validated the proposed model on a real dataset that contains the ratings assigned by students from Universidad Nacional de Colombia and Universidade Feevale to educational resources in the Colombian Federation of Learning Object Repositories (FROAC in Spanish). User characterization included learning style and educational level, whereas item characterization (obtained from the objects' metadata) included interactivity level, aggregation level and type, and resource format. These results, compared to those obtained when not all the available information is included, show that our method can improve the recommendation process.

This work presents a hybrid personalized recommender system for educational materials that combines recommendation techniques based on collaborative filtering, content, and demographics into a latent factor model. Thus, the proposed system considers the available user information as well as item metadata and infers the existing relationships between those characteristics, improving the recommendations offered to each student.

This article is organized as follows. Section 2 presents the matrix factorization methods for recommender systems and the proposed hybrid model.

2.1. Matrix factorization for recommender systems

Matrix factorization methods for recommender systems are based on latent factor models, where items as well as users are characterized using vectors of factors inferred from ratings. Consider a set of items, where $\ell_i \in \mathbb{R}^{N \times 1}$ represents a set of $N$ latent factors or characteristics that describe the $i$-th item. Additionally, each user is associated with a vector $x_u \in \mathbb{R}^{N \times 1}$. Thus, the elements of $\ell_i$ measure how much each factor represents item $i$ (positively or negatively), while the elements of $x_u$ measure the degree of interest user $u$ exhibits in each of the factors that characterize the items (positively or negatively). Furthermore, the dot product of $x_u$ and $\ell_i$ captures the interaction between user $u$ and item $i$ and can be seen as an approximation of the rating $y_{ui} \in \mathbb{R}$ that such a user assigns to said item. Therefore, the challenge for matrix factorization methods is to calculate, from the set of given ratings, the factor vectors $x_u$ and $\ell_i$ for all the users and all the items. As a result, after the recommender system completes the factorization, it can estimate the rating a user will assign to any item using eq. (1):

$$\hat{y}_{ui} = x_u^{\top} \ell_i \qquad (1)$$

The vectors of item factors can be grouped in the matrix $\Theta \in \mathbb{R}^{I \times N}$, where the $i$-th row of the matrix represents the factors of item $i$, and $I \in \mathbb{N}$ denotes the number of items. Likewise, user factors can be grouped in the matrix $X \in \mathbb{R}^{U \times N}$, where the $u$-th row represents the factors of user $u$, and $U \in \mathbb{N}$ denotes the number of users. Moreover, the ratings users have assigned to the items can be stored in the matrix $Y \in \mathbb{R}^{U \times I}$. Because not all users have evaluated all the items, a matrix $R \in \{0,1\}^{U \times I}$ is created, where $r_{ui} = 1$ ($u$-th row, $i$-th column of $R$) if user $u$ has already rated item $i$, and $r_{ui} = 0$ otherwise.
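To make the notation concrete, the following minimal NumPy sketch (not part of the original article; the sizes and the random values are placeholders) builds matrices with the shapes defined above and evaluates eq. (1) on the observed ratings only.

    import numpy as np

    U, I, N = 56, 152, 10            # users, items, latent factors (example sizes)
    rng = np.random.default_rng(0)

    X = rng.normal(size=(U, N))      # user factor matrix, one row per user
    Theta = rng.normal(size=(I, N))  # item factor matrix, one row per item
    Y = rng.integers(0, 6, size=(U, I)).astype(float)  # ratings between 0 and 5
    R = (rng.random((U, I)) < 0.05).astype(float)      # r_ui = 1 where a rating exists

    Y_hat = X @ Theta.T              # eq. (1) evaluated for every user-item pair
    observed_sq_error = np.sum(R * (Y_hat - Y) ** 2)   # error restricted to known ratings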

Thus, to estimate the factor matrices, the recommender system minimizes the regularized mean squared error over the known ratings:

$$J(X, \Theta) = \frac{1}{2} \sum_{u=1}^{U} \sum_{i=1}^{I} r_{ui} \left( x_u^{\top} \ell_i - y_{ui} \right)^2 + \frac{\lambda}{2} \left( \sum_{u=1}^{U} \lVert x_u \rVert^2 + \sum_{i=1}^{I} \lVert \ell_i \rVert^2 \right) \qquad (2)$$

Under this formulation, the system learns the model by fitting the previously observed ratings.

Additionally, since the objective is to generalize from prior ratings so that the model can predict ratings that have not yet been assigned, the regularization term weighted by $\lambda \in \mathbb{R}^{+}$ is used to avoid overfitting.

Matrices $X$ and $\Theta$ are estimated using gradient descent, in the form

$$X_{j+1} = X_j - \gamma \frac{\partial J}{\partial X} \qquad (3a)$$

$$\Theta_{j+1} = \Theta_j - \gamma \frac{\partial J}{\partial \Theta} \qquad (3b)$$

where $\gamma \in \mathbb{R}^{+}$ is the learning rate, the matrices $X_j$, $\Theta_j$ denote the $j$-th iteration of the optimization algorithm, and the derivatives are defined as

$$\frac{\partial J}{\partial x_u} = \sum_{i=1}^{I} r_{ui} \left( x_u^{\top} \ell_i - y_{ui} \right) \ell_i + \lambda x_u \qquad (4a)$$

$$\frac{\partial J}{\partial \ell_i} = \sum_{u=1}^{U} r_{ui} \left( x_u^{\top} \ell_i - y_{ui} \right) x_u + \lambda \ell_i \qquad (4b)$$

2.2. Hybrid recommender system using matrix factorization along with user and item information
In addition to ratings, recommender systems may include different sources of information about users and/or items in order to improve predictions. Hence, search history or a learning style characterization can be used to describe user trends and create the matrix of factors $X$. In that case, the only unknowns in eq. (2) are the item factors $\Theta$, which can be estimated using eqs. (3b) and (4b). As a result, $\theta_{i,f}$ (row $i$, column $f$ of $\Theta$) indicates how much of characteristic $f$, which describes all the users, appears in item $i$. Conversely, if a characterization of the items $\Theta$ is available (e.g., describing the level of interactivity or how much each item contributes to each learning style), the unknowns in eq. (2) are the user factors $X$, which can be estimated using eqs. (3a) and (4a). As a result, $x_{u,f}$ (row $u$, column $f$ of $X$) indicates how important characteristic $f$, which describes the items, is to user $u$.
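A compact NumPy implementation of these updates is sketched below (illustrative code, not the authors'; the learning rate, regularization value, and number of iterations are arbitrary). It performs the gradient steps of eqs. (3a)-(4b) and allows either matrix to be held fixed, which corresponds to the two cases just described.

    import numpy as np

    def factorize(Y, R, X0, Theta0, fix_X=False, fix_Theta=False,
                  gamma=0.005, lam=0.1, iters=500):
        """Gradient descent on eq. (2); X0 (U x N) and Theta0 (I x N) are initial guesses."""
        X, Theta = X0.copy(), Theta0.copy()
        for _ in range(iters):
            E = R * (X @ Theta.T - Y)            # residuals on the observed ratings only
            grad_X = E @ Theta + lam * X         # eq. (4a), stacked for all users
            grad_Theta = E.T @ X + lam * Theta   # eq. (4b), stacked for all items
            if not fix_X:
                X -= gamma * grad_X              # eq. (3a)
            if not fix_Theta:
                Theta -= gamma * grad_Theta      # eq. (3b)
        return X, Theta

For instance, if the rows of X0 already contain a user characterization (such as learning styles), factorize(Y, R, X0, Theta0, fix_X=True) estimates only $\Theta$, as in eqs. (3b) and (4b).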

However, when descriptions of the users as well as of the items (i.e., matrices $X$ and $\Theta$) are available, the formulation of eq. (2) can only be applied if there is a direct correspondence between the characterization of items and that of users, which requires (i) an equal number of characteristics and (ii) the same characterization for users and items. Since such a correspondence can be obtained in few real cases, this work proposes a model that allows the joint utilization of user and item descriptions. For that purpose, users are assumed to be described with a number $N_u$ of characteristics, which produces the matrix $X \in \mathbb{R}^{U \times N_u}$. In turn, items are described with a number $N_i$ of characteristics, which produces the matrix $\Theta \in \mathbb{R}^{I \times N_i}$. Thus, eq. (2) can be rewritten including a matrix of factors $\Sigma \in \mathbb{R}^{N_u \times N_i}$:

$$J(\Sigma) = \frac{1}{2} \sum_{u=1}^{U} \sum_{i=1}^{I} r_{ui} \left( x_u \Sigma \theta_i^{\top} - y_{ui} \right)^2 + \frac{\lambda}{2} \lVert \Sigma \rVert_F^2 \qquad (5)$$

The derivative of this cost function with respect to the parameters $\Sigma$ is defined as

$$\frac{\partial J}{\partial \Sigma} = \sum_{u=1}^{U} \sum_{i=1}^{I} r_{ui} \left( x_u \Sigma \theta_i^{\top} - y_{ui} \right) x_u^{\top} \theta_i + \lambda \Sigma \qquad (6)$$

Thus, we use gradient descent to optimize the model parameters, as follows:

$$\Sigma_{j+1} = \Sigma_j - \gamma \frac{\partial J}{\partial \Sigma} \qquad (7)$$

According to eq. (1), once the matrix $\Sigma$ has been obtained, the estimation of the rating that user $u$ will assign to item $i$ is given by

$$\hat{y}_{ui} = x_u \Sigma \theta_i^{\top} = \sum_{n_i=1}^{N_i} \left( x_u \sigma_{n_i} \right) \theta_{i,n_i} \qquad (8)$$

where $x_u \in \mathbb{R}^{1 \times N_u}$ contains the characteristics of user $u$; $\sigma_{n_i} \in \mathbb{R}^{N_u \times 1}$, the $n_i$-th column of $\Sigma$, encodes the relationships between the characteristics of user $u$ and item $i$; and $\theta_{i,n_i}$ is the $n_i$-th characteristic of item $i$.
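The sketch below (again illustrative, with assumed hyperparameter values) implements the proposed formulation: it learns $\Sigma$ from the observed ratings using eqs. (6) and (7) and predicts ratings with eq. (8). With the dataset described later (56 users with 6 characteristics and 152 resources with 9), $\Sigma$ would contain only 6 × 9 = 54 parameters.

    import numpy as np

    def fit_sigma(Y, R, X, Theta, gamma=0.001, lam=0.1, iters=2000):
        """Learn Sigma (N_u x N_i), relating user characteristics X (U x N_u)
        to item characteristics Theta (I x N_i), from the observed ratings."""
        Sigma = np.zeros((X.shape[1], Theta.shape[1]))
        for _ in range(iters):
            E = R * (X @ Sigma @ Theta.T - Y)     # residuals on the observed ratings
            grad = X.T @ E @ Theta + lam * Sigma  # eq. (6), summed over all (u, i) pairs
            Sigma -= gamma * grad                 # eq. (7)
        return Sigma

    def predict(X, Sigma, Theta):
        """eq. (8): predicted rating for every user-item pair."""
        return X @ Sigma @ Theta.T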
To handle users without previous ratings (the cold start problem addressed in the experiments), the similarity between two users $u$ and $v$ is measured with the Pearson correlation coefficient (PCC) of their characteristics:

$$\mathrm{PCC}(u, v) = \frac{\sum_{n_u=1}^{N_u} (x_{u,n_u} - \bar{x}_u)(x_{v,n_u} - \bar{x}_v)}{\sqrt{\sum_{n_u=1}^{N_u} (x_{u,n_u} - \bar{x}_u)^2} \, \sqrt{\sum_{n_u=1}^{N_u} (x_{v,n_u} - \bar{x}_v)^2}}$$

where $x_{u,n_u} \in \mathbb{R}$ is the $n_u$-th characteristic of user $u$ and $\bar{x}_u \in \mathbb{R}$ denotes the average of the characteristics of user $u$.

The resources file stores the metadata that describes each item, including its format and video duration; the semantic density of an educational resource is independent of its difficulty. It also describes the contribution of each resource to the four learning styles:

• Visual: Contribution of the resource for a student with a visual learning style.

• Auditory: Contribution of the resource for a student with an auditory learning style.

• Reading: Contribution of the resource for a student with a reading learning style.

• Kinesthetic: Contribution of the resource for a student with a kinesthetic learning style.

The ratings file stores all the ratings assigned by the users to the items in five categories, as well as the user and item identifications:

• User ID: Identification of the user who completed the evaluation.

• Item ID: Identification of the resource that was rated.

The information was collected by randomly assigning OERs to students and asking them to evaluate each of them. Figure 2 presents the interface that was used to collect such evaluations. In the end, the dataset included 400 evaluations of 152 OERs by 56 users (students from Feevale University in Brazil and the National University of Colombia). The proposed recommender system was applied to those data. Hence, the users file was the matrix $X \in \mathbb{R}^{56 \times 6}$, and the resources file was $\Theta \in \mathbb{R}^{152 \times 9}$.

Moreover, five evaluation matrices $Y_r \in \mathbb{R}^{56 \times 152}$, $r = 1, \ldots, 5$, corresponding to each rating category, were tested individually. Four models were compared:

• M1: The first model uses both the user characteristics $X$ and the resource characteristics $\Theta$. Therefore, the matrix $\Sigma \in \mathbb{R}^{6 \times 9}$ is inferred, for a total of 54 parameters to be estimated. $\Sigma$ represents the relationship between item and user characteristics.

• M2: The second model uses the characteristics of the users $X$. Therefore, the matrix $\Theta \in \mathbb{R}^{152 \times 9}$ is inferred, for a total of 1368 parameters to be estimated.

• M3: The third model uses the characteristics of the resources $\Theta$. Therefore, the matrix $X \in \mathbb{R}^{56 \times 6}$ is inferred, for a total of 336 parameters to be estimated.
• M4: The fourth model does not take into account resource or user characteristics. Consequently, $\Theta \in \mathbb{R}^{152 \times 9}$ and $X \in \mathbb{R}^{56 \times 6}$ must both be estimated, for a total of 1704 parameters.
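As a quick check of these figures (an illustrative snippet, not part of the article), the number of unknowns of each model follows directly from the shapes of the matrices being inferred:

    # 56 users described by 6 characteristics; 152 resources described by 9
    U, N_u, I, N_i = 56, 6, 152, 9

    parameters = {
        "M1": N_u * N_i,           # Sigma (6 x 9)    -> 54
        "M2": I * N_i,             # Theta (152 x 9)  -> 1368
        "M3": U * N_u,             # X (56 x 6)       -> 336
        "M4": I * N_i + U * N_u,   # Theta and X      -> 1704
    }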

For models M1 and M2, which take into account the user characteristics, each feature (learning style) was scaled from 0 to 10.

For models M1 and M3, the metadata of each item, discarding the item ID, were scaled from 0 to 10.

Finally, for all models, the scores R1 to R5 were scaled from 0 to 5. The quality of the predictions was evaluated using the mean squared error (MSE) between the actual and the predicted rating values. The lower the MSE, the better the prediction made by the system. The MSE is defined as

$$\mathrm{MSE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left( y_{ui} - \hat{y}_{ui} \right)^2$$

where $T$ denotes the set of user-item pairs in the test set.
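A direct implementation of this metric (an assumed helper, not taken from the article) receives the list of held-out user-item pairs and the matrix of predictions:

    import numpy as np

    def mse(Y, Y_hat, test_pairs):
        """Mean squared error over the user-item pairs held out for testing."""
        errors = [(Y[u, i] - Y_hat[u, i]) ** 2 for (u, i) in test_pairs]
        return float(np.mean(errors))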

The four models can be summarized as follows:

Model   Cost function   Unknowns        Parameters
M1      eq. (5)         Σ (6 × 9)       54
M2      eq. (2)         Θ (152 × 9)     1368
M3      eq. (2)         X (56 × 6)      336
M4      eq. (2)         X and Θ         1704
User ratings of items are commonly divided into a training set, used to learn the model, and a test set, used to evaluate the quality of the predictions by measuring the MSE between the actual rating and the prediction [33].

Tests were also conducted considering a lack of user ratings (the cold start problem), employing the similarity between user characteristics to find the resources that a user may like without having assigned previous ratings. It should be mentioned that all the values underwent a normalization process so that their range was between 0 and 5.

The first experiment implemented a cross-validation methodology with 10 partitions using all the available ratings. For that purpose, the 400 ratings were randomly divided into 10 groups of approximately 40 ratings each. Subsequently, 9 out of the 10 groups were used to train the algorithm and estimate the unknowns, while the remaining group was used for validation. This experiment was repeated until each group had been employed for validation.
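The protocol can be sketched as follows (an outline under the assumptions of the previous snippets; fit_sigma, predict, and mse are the hypothetical helpers defined above). Each fold of observed (user, item) pairs is hidden from the training mask R and then used to compute the MSE.

    import numpy as np

    def cross_validate(Y, R, X, Theta, k=10, seed=0):
        """k-fold cross-validation over the observed ratings (r_ui = 1)."""
        rng = np.random.default_rng(seed)
        pairs = np.argwhere(R == 1)                  # all rated (user, item) pairs
        rng.shuffle(pairs)
        folds = np.array_split(pairs, k)

        scores = []
        for fold in folds:
            R_train = R.copy()
            R_train[fold[:, 0], fold[:, 1]] = 0      # hide the validation ratings
            Sigma = fit_sigma(Y, R_train, X, Theta)  # train on the remaining folds
            Y_hat = predict(X, Sigma, Theta)
            scores.append(mse(Y, Y_hat, [tuple(p) for p in fold]))
        return float(np.mean(scores))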

The second experiment considered the cold start problem of new users. For that purpose, the ratings of 55 out of the 56 users were used as training data, while the ratings of the remaining user were employed for validation. In order to calculate the initial value of the ratings of the validation user, the PCC between the characteristics of that user and those of all the users in the training set was obtained. Afterward, the ratings of the users with a PCC above 0.7 were averaged. This process was repeated until each user had been employed for validation.
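A sketch of this initialization is given below (illustrative code consistent with the description above; the function name and interface are assumptions). It computes the PCC between the characteristics of the new user and those of every training user and averages, item by item, the ratings of the users whose correlation exceeds 0.7.

    import numpy as np

    def cold_start_estimate(x_new, X_train, Y_train, R_train, threshold=0.7):
        """Initial rating estimates for a user who has not rated anything yet."""
        # PCC between the new user's characteristics and each training user's characteristics
        pcc = np.array([np.corrcoef(x_new, X_train[v])[0, 1]
                        for v in range(X_train.shape[0])])
        similar = pcc > threshold

        # Average the ratings of the sufficiently similar users, item by item
        counts = R_train[similar].sum(axis=0)
        sums = (R_train[similar] * Y_train[similar]).sum(axis=0)
        return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)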

In addition to the 5 types of ratings initially considered in this work (i.e., overall rating, R1; contribution to learning, R2; design, R3; content quality, R4; and likelihood of recommending the resource, R5), three other ratings were created. Such ratings were weighted in order to include even more information about the users in the model and make it more general. First, the ratings were weighted as if all of them were equally important in the system; this adapted rating was named R6. Afterward, an analysis by the research team concluded that the most important ratings should be those regarding the contribution to the learning process and the quality of the content in the resource. As a result, a more significant weight was assigned to those two aspects, which produced R7. It was then decided that the first rating (i.e., the overall rating of the resource) is also an important value to be considered in the model; thus, R8 was created. Equation 10 presents the weights assigned to the 5 ratings included in the model.

… can be explained by the fact that, in the cold start experiment, user information is also employed to provide an initial estimation of the ratings assigned by the validation user. Finally, the proposed model … proposed in the state-of-the-art literature.
This article presented a hybrid recommender system that uses matrix factorization techniques.

Such a system integrates the ratings as well as the user and item characteristics in order to estimate the relationships that exist between those characteristics and to offer educational resources that help students in their learning process in a virtual environment.