Online Sequential Extreme Learning Machine: A New Training Scheme for Restricted Boltzmann Machines

The main contribution of this paper is a new iterative training algorithm for restricted Boltzmann machines (RBMs). The proposed learning scheme is inspired by the online sequential extreme learning machine (OS-ELM), an extreme learning machine variant that handles time-accumulated sequences of data arriving in chunks of fixed or varying size. Recursive least squares rules are integrated for weight adaptation, avoiding learning-rate tuning and local-minimum issues. The proposed approach is compared with contrastive divergence, one of the best-known training algorithms for Boltzmann machines, in terms of time, accuracy, and algorithmic complexity under the same conditions. The results on data reconstruction strongly support the new rules.

RBMs are generally trained with the contrastive divergence (CD) algorithm, which often incurs a high computational cost because of the large number of hidden nodes required [2]. Moreover, hyper-parameters such as the learning rate, the number of Gibbs sampling steps, the number of hidden neurons, and the number of iterations vary from one application to another and demand considerable human intervention.
Since the extreme learning machine (ELM) is widely used for single-batch training in a variety of applications thanks to its fast training and accuracy [4], [5], the contribution of the current work is a new fast and accurate iterative training algorithm for RBMs that draws on ELM theory for both the offline [6] and online [7] learning paradigms. The proposed algorithm is compared experimentally with the CD algorithm in terms of time and accuracy, and the results support the credibility of the new training scheme.
This work is organized as follows: Section 2 briefly describes the training rules of the basic online sequential ELM (OS-ELM). Section 3 introduces the proposed rules for the RBM. Section 4 illustrates the comparative study with examples. Section 5 concludes the work.

Basic OS-ELM
For any given dataset of n mini-batches of training inputs and targets $\{(\mathbf{X}_k, \mathbf{T}_k)\}_{k=0}^{n-1}$, OS-ELM trains a single hidden layer feedforward neural network (SLFN) in two phases [7]:
➢ The initial phase:
• Randomly generate the hidden-node weights and biases $(\mathbf{w}, \mathbf{b})$ from any probability distribution.
• Activate the hidden layer $\mathbf{H}_0$ of the initial mini-batch using an activation function G, as in equation (1).
• Determine the initial output weights $\beta_0$ from formula (3) and the covariance matrix from formula (2).
➢ The update phase:
• Compute the hidden layer for each new mini-batch using (1).
• Update the output weights using (4), which depends on the prediction error in formula (5), the updated covariance matrix in formula (6), and the updated gain matrix in formula (7).
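Since equations (1)–(7) are referenced throughout but not reproduced in this text, the following recalls the standard OS-ELM recursive least squares form they commonly take (notation assumed here: $\mathbf{X}_k$ and $\mathbf{T}_k$ are the inputs and targets of mini-batch $k$, $\mathbf{H}_k$ its hidden-layer output, $\beta_k$ the output weights, $P_k$ the covariance matrix and $K_k$ the gain matrix; the original typesetting may differ):

\begin{align}
\mathbf{H}_k &= G(\mathbf{X}_k \mathbf{w} + \mathbf{b}) && (1) \\
P_0 &= \left(\mathbf{H}_0^{T}\mathbf{H}_0\right)^{-1} && (2) \\
\beta_0 &= P_0\,\mathbf{H}_0^{T}\,\mathbf{T}_0 && (3) \\
\beta_{k+1} &= \beta_k + K_{k+1}\, e_{k+1} && (4) \\
e_{k+1} &= \mathbf{T}_{k+1} - \mathbf{H}_{k+1}\,\beta_k && (5) \\
P_{k+1} &= P_k - K_{k+1}\,\mathbf{H}_{k+1}\,P_k && (6) \\
K_{k+1} &= P_k\,\mathbf{H}_{k+1}^{T}\left(I + \mathbf{H}_{k+1}\,P_k\,\mathbf{H}_{k+1}^{T}\right)^{-1} && (7)
\end{align}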

Proposed approach
In the new rules for the RBM, the visible layer is mapped through random parameters generated independently of the training data. The input weights are then tuned at each single Gibbs sampling step using the Sherman-Morrison-Woodbury (SMW) formula. For the same data, this time unlabeled, the RBM is likewise trained in two distinct phases: an initial phase and a sequential phase. In the initial phase, the RBM is initialized exactly as in basic OS-ELM. The difference from basic OS-ELM is that it is the input weights themselves that must be updated during the sequential phase. Equations (4) and (5) are therefore changed to (8) and (9) to fit the unsupervised training paradigm.
This contrasts with the old RBM training rules, which use the input weights for hidden-layer activation and their transpose for reconstruction, as shown in equation (10); the proposed rules instead tune the reconstruction weights directly at every sampling step, as sketched below.
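To make the sequential phase concrete, the sketch below shows one possible reading of these rules in Python. Since equations (8)–(10) are not reproduced in this text, the class name, the ridge term reg, and the decision to carry the tuned weights in beta (reconstructing the visible layer from H, with the reconstruction error playing the role of the prediction error) are illustrative assumptions rather than the authors' exact formulation.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class OSELM_RBM:
    """Sketch of an RBM trained with OS-ELM-style RLS updates."""

    def __init__(self, n_visible, n_hidden, rng=None):
        rng = np.random.default_rng(rng)
        # Random mapping of the visible layer, independent of the data.
        self.w = rng.standard_normal((n_visible, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None      # covariance matrix, cf. (2) and (6)
        self.beta = None   # tuned weights, cf. (3)

    def _hidden(self, V):
        # Hidden-layer activation, cf. equation (1).
        return sigmoid(V @ self.w + self.b)

    def init_phase(self, V0, reg=1e-3):
        # Initialization identical to basic OS-ELM, with the inputs V0
        # standing in for the missing targets (unsupervised case).
        H = self._hidden(V0)
        self.P = np.linalg.inv(H.T @ H + reg * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ V0

    def update_phase(self, Vk):
        # One sequential update per mini-batch, i.e. a single Gibbs
        # sampling pass, using the SMW identity for the gain matrix.
        H = self._hidden(Vk)
        K = self.P @ H.T @ np.linalg.inv(
            np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - K @ H @ self.P
        # The reconstruction error Vk - H @ beta replaces the
        # prediction error of equation (5), cf. (8)-(9).
        self.beta = self.beta + K @ (Vk - H @ self.beta)

    def reconstruct(self, V):
        return self._hidden(V) @ self.beta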

Experimental results and discussion
In the current study, the proposed algorithm is compared with the CD algorithm during training on the grayscale image 'Cameraman', normalized between 0 and 1 and resized to 250 by 250 pixels. The training hyper-parameters are adjusted according to Table 1. The proposed algorithm converges to the stopping criterion in fewer than five iterations, unlike CD, which keeps descending towards a deeper minimum and evidently needs more than 100 iterations.
By reducing the number of hyper-parameters and simplifying the Gibbs sampling, computational time is saved during training, as Figure 4 shows.
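For orientation, the setup just described can be reproduced along the following lines (a sketch using scikit-image, which ships the 512-by-512 'Cameraman' image; the split of the image rows into five mini-batches and the hidden-layer size are our assumptions, since Table 1 is not reproduced in this text):

import numpy as np
from skimage import data, transform

# Grayscale 'Cameraman' image (512x512, uint8) shipped with scikit-image.
img = data.camera()

# Resize to 250x250; skimage rescales to floats in [0, 1] on the way.
x = transform.resize(img, (250, 250), anti_aliasing=True)

# One plausible arrangement: each image row is a 250-dimensional
# visible vector, grouped into five mini-batches of 50 rows each.
batches = np.split(x, 5)

# Sequential training with the OSELM_RBM sketch given above; the
# hidden size 500 is arbitrary, not taken from Table 1.
rbm = OSELM_RBM(n_visible=250, n_hidden=500, rng=0)
rbm.init_phase(batches[0])
for Vk in batches[1:]:
    rbm.update_phase(Vk)

# Reconstruction error over the whole image.
mse = np.mean((x - rbm.reconstruct(x)) ** 2)
print(f"reconstruction MSE: {mse:.6f}")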

Conclusion
Compared with older iterative training algorithms such as contrastive divergence or backpropagation, this work presents a new, fast, and more accurate training algorithm for RBMs. The new rules, inspired by OS-ELM, one of the ELM variants, reduce computational costs and require less human intervention during training.
In the current study, the proposed approach is evaluated under an unsupervised learning paradigm. Future work will therefore focus on studying the effect of OS-ELM on the training of deep belief networks for supervised learning, using a stack of the new RBMs.