Preference Net: Image Recognition using Ranking Reduction to Classification

Accuracy and computational cost are the main challenges of deep neural networks in image recognition. This paper proposes an efficient ranking reduction to binary classification approach using a new feed-forward network and feature selection based on ranking the image pixels. Preference net (PN) is a novel deep ranking learning approach based on the Preference Neural Network (PNN), which uses a new ranking objective function and a positive smooth staircase (PSS) activation function to accelerate the ranking of image pixels. PN has a new type of weighted kernel based on Spearman ranking correlation instead of convolution to build the feature matrix. PN employs multiple kernels of different sizes to partially rank image pixels in order to find the best feature sequence. PN consists of multiple PNNs that share an output layer, with a separate PNN for each ranker kernel. The output results are converted to classification accuracy using a score function. Using a weighted-average ensemble of the PN models for each kernel on the CIFAR-10 and Fashion-MNIST datasets, PN shows promising results compared to the latest deep learning (DL) networks in terms of accuracy and lower computational cost.

In computer vision, convolutional architectures are dominant in DL through variations of CNN-like architectures [21,32,34,49]. Some architectures have replaced convolutions entirely [33,55,58]. Although these models have succeeded in image classification, they have not yet been scaled effectively to big-size images and use specialized attention patterns. Therefore, in large-scale image recognition, classic ResNet-like architectures are still state-of-the-art [14,38], and kernel computation and scaling remain fixed and specialized to certain image types.
Label ranking (LR) is one of the challenging categories of preference learning (PL) that gained importance in information retrieval by search engines [3,10]. Unlike the common problems of regression and classification, LR involves predicting the relationship between multiple label orders. For a given instance x from the instance space X, there is a label λ associated with x, λ ∈ L, where L = {λ1, .., λn} and n is the number of labels. LR is an extension of multi-class and multi-label classification, where each instance x is assigned an ordering of all the class labels in the set L. This ordering gives the ranking of the labels for the given object x and can be represented by a permutation set π = {1, 2, · · · , n}. For example, for n = 3, the order λ2 ≻ λ1 ≻ λ3 states that λ2 is preferred over λ1, which is in turn preferred over λ3. The label order ≻ has the following three properties: it is irreflexive (λa ⊁ λa), transitive (λa ≻ λb and λb ≻ λc imply λa ≻ λc), and asymmetric (λa ≻ λb implies λb ⊁ λa).
Various LR methods have been introduced in recent years [60], such as decomposition-based methods, statistical methods, similarity-based methods, and ensemble-based methods. Decomposition-based methods include pairwise comparison [17,18], log-linear models, and constraint classification [20]. The pairwise approach introduced by Hüllermeier [11] divides the LR problem into several binary classification problems in order to predict the pairs of labels λi ≻ λj or λi ≺ λj for an input x. Statistical methods include decision trees [19], instance-based methods (Plackett-Luce) [7], and Gaussian mixture model [40] based approaches. For example, Mihajlo uses Gaussian mixture models to learn soft pairwise label preferences [40].
Using ranking to minimize classification loss was introduced by Kotlowski [29] by measuring the regret function of the classifier and the ranker, where regret is the difference between the loss of learning compared to the best alternative method. Ailon [2] confirmed that it is hard to reach a faultless ranking of all preference labels. In addition, Balcan, Ailon, Abdulrahman, and Mamman [1,6] proposed different robust approaches that reduce ranking for better classification.
The artificial neural network (ANN) for ranking was first introduced as RankNet by Burges to solve the problem of object ranking for sorting web documents by a search engine [8]. RankNet uses gradient descent and a probabilistic ranking cost function for each object pair. The multilayer perceptron for label ranking (MLP-LR) [42] employs a network architecture using a sigmoid activation function to calculate the error between the actual and expected values of the output labels. However, it uses a local approach to minimize the individual error per output neuron, by subtracting the predicted from the actual value, and uses the Kendall error as a global approach. Neither direction uses a ranking objective function in the backpropagation (BP) or learning steps.
A multi-valued activation function was proposed by Aizenberg [4] using a convex shape to support multi-valued and complex-number neural networks. In addition, Moraga [41] introduced a similar function to design networks for realizing any multi-valued function; however, Moraga's exponential function derivative did not give promising results in the PNN implementation using the ranking objective function in the FF and backpropagation (BP) steps.
The preference neural network (PNN) was introduced by Elgharabawy [13] as the first ANN for ranking, using a Spearman objective function and a new smooth staircase (SS) activation function designed to accelerate ranking by producing multiple preference values [13]. The deep neural network (DNN) has been introduced for object ranking to solve document retrieval problems [50]. RankNet [8], RankBoost [16], LambdaMART [52], and deep pairwise label ranking models [26] are convolutional neural network (CNN) approaches for query- and document-based vector representations.
CNNs are also used for image retrieval [37] and label classification [25].
The CNNs mentioned above and their variants have some issues that can be broadly summarized as follows: 1) Partial detection: a CNN kernel detects small-size features such as edges with kernels that occupy only tens or hundreds of pixels, so it ignores the relationship between different parts of the whole image in large images.
For example, a CNN detects the image edges in a human face by combining features (the mouth, two eyes, the face oval, and a nose) with a high probability to classify the subject, without learning the relationship between these features across its several layers. 2) Slow computational performance due to the CNN's several layers.
3) Difficulty detecting objects under different angles, backgrounds, and lighting conditions.
The proposed PN has several advantages over existing CNN classification approaches:
1) Simplifying the calculation based on the difference of pixel values of greyscale images.
2) Enhancing the predictive probability and accelerating the ranking convergence rate using the new PSS activation function over the existing sigmoid, ReLU, and softmax functions, because its step shape produces almost discrete multi-values from 0 to n, where n is the number of ranked labels (see the sketch after this list).
3) Speeding up the computation of a single epoch due to PN's five layers.
4) Boosting the accuracy, sensitivity, and image classification results by ranking pixels and reducing the ranking output to classification using a score function.
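As a simplified illustration, the PSS can be sketched as a sum of shifted sigmoids, one per unit step; the exact closed form used by PN is defined in Section II, so the construction and the sharpness parameter below are assumptions rather than the paper's definition:

    import numpy as np

    def pss(x, n, w=1.0, sharpness=50.0):
        # Sum of n shifted sigmoids: each term contributes one smooth unit
        # step of width w, so the output rises from 0 to n in nearly
        # discrete increments.
        x = np.asarray(x, dtype=float)
        return sum(1.0 / (1.0 + np.exp(-sharpness * (x - (s + 0.5) * w)))
                   for s in range(n))

    # Example: with n = 3 labels, outputs cluster near 0, 1, 2, and 3.
    print(np.round(pss(np.array([-1.0, 0.7, 1.8, 5.0]), n=3), 2))

The near-discrete plateaus are what let the output neurons be read directly as integer preference values.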
Section II explains the PN components (Activation functions, Objective function, and network structure).
In the PSS definition, n is the number of output labels and w is the step width.

II.II Ranking Loss Function
Two main error functions have been used for label ranking: Kendall [28] and Spearman [45]. However, the Kendall function lacks continuity and differentiability. Therefore, the Spearman correlation coefficient is used to measure the ranking between output labels. The Spearman error derivative is used as a gradient-ascent process for BP, and the correlation is used as a ranking evaluation function for the convergence stopping criterion. The Spearman error function is represented by Eq. II, where yᵢ, ŷᵢ, i, and n represent the rank output value, the expected rank value, the label index, and the number of instances, respectively.
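For reference, the standard Spearman coefficient over n ranked items, which Eq. II presumably instantiates, is

    ρ = 1 − 6 Σᵢ dᵢ² / (n(n² − 1)),   dᵢ = yᵢ − ŷᵢ,

so the error driving the gradient ascent is proportional to Σᵢ dᵢ², whose derivative with respect to each yᵢ is linear and therefore cheap to compute in BP.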

II.III Preference Neuron
A preference neuron is a multi-valued neuron that uses the PSS, the positive part of the SS activation function introduced by the subgroup preference neural network [13], as its activation function. The PSS function has a single output.

II.IV Preference Neural Network
The PNN, proposed by Elgharabawy [12,13], is a fully connected network of multi-valued neurons with a single hidden layer.
The input layer represents the number of features per data instance. The hidden neurons are equal to or greater than …
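Assuming this single-hidden-layer structure with PSS applied at every neuron, a minimal sketch of the forward pass (the signature, weight shapes, and the pss helper from the earlier sketch are illustrative, not the paper's exact architecture):

    import numpy as np

    def pnn_forward(x, W1, b1, W2, b2, n_labels, pss):
        # One hidden layer; both layers use the PSS activation, so the output
        # neurons emit near-integer preference values between 0 and n_labels.
        h = pss(W1 @ x + b1, n=n_labels)
        return pss(W2 @ h + b2, n=n_labels)

Each output neuron's value is then read as the preference (rank) of one label.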

Section III describes the data preprocessing steps and feature selection.

III PN STRUCTURE AND PROCESSING
The proposed approach converts the multi-class classification problem into a multi-label ranking problem in two steps.
1) Converting Classification into Ranking: the multi-class labels are encoded as label preference values that PN learns to rank.
2) Converting Ranking Results into Binary Classification: ranking is reduced to classification by a scorer function f : X → ℝ. Over the output labels, the maximum value is chosen for the binary classification h : X → {±1}. PN re-ranks the final results using the score function by choosing the highest preference value as the class label. The binary ranking is used to calculate the image classification accuracy, sensitivity, and specificity.
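A minimal sketch of this reduction, assuming the PN output is one preference value per class (both function names are illustrative):

    import numpy as np

    def score_to_class(preferences):
        # Ranking reduction: the label with the highest preference value
        # becomes the predicted class.
        return int(np.argmax(preferences))

    def binary_outcome(preferences, true_class):
        # h : X -> {+1, -1}: +1 if the top-ranked label matches the true class.
        return 1 if score_to_class(preferences) == true_class else -1

    print(binary_outcome(np.array([0.1, 2.7, 1.9]), true_class=1))  # prints 1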

III.I Image Preprocessing
III.I.I Greyscale Conversion. Data scaling over the red, green, and blue (RGB) colours is not considered for ranking because PN measures the preference values between pixels. Thus, the image is converted from RGB colour to greyscale. Ranking the pixels reduces the data margin, so it reduces the computational complexity.
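For instance, a standard luminance conversion can be used; the exact weighting in PN is not specified, so the common ITU-R BT.601 weights are assumed here:

    import numpy as np

    def to_greyscale(rgb):
        # rgb: H x W x 3 array in [0, 255]; returns an H x W luminance image.
        return rgb @ np.array([0.299, 0.587, 0.114])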
III.I.II Weighted Ranker Kernel. The kernel weights are randomly initialized between −0.05 and 0.05 and learn the features through BP of the weights. The partial change in the kernel is obtained by differentiating the Spearman correlation, as in Eq. III, in terms of the original image matrix and the Spearman derivative.
Different kernel sizes could be used. However, we propose multiple kernels for large image sizes and use three different kernels to capture the relations between different features in the image.
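A minimal sketch of the ranker kernel under these assumptions (stride 1, no padding, one feature per kernel position equal to the Spearman correlation between the flattened window and the kernel weights; SciPy's spearmanr stands in for the paper's correlation computation):

    import numpy as np
    from scipy.stats import spearmanr

    def rank_features(image, kernel):
        # Slide the kernel over the greyscale image and build a feature
        # matrix of Spearman correlations instead of convolution sums.
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        k = kernel.ravel()
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                window = image[i:i + kh, j:j + kw].ravel()
                out[i, j] = spearmanr(window, k)[0]  # [0] = the correlation
        return out

    rng = np.random.default_rng(0)
    kernel = rng.uniform(-0.05, 0.05, size=(3, 3))  # random init as in the text

Unlike a convolution sum, each feature here depends only on the relative order of the pixel values, which is why greyscale ranking suffices.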

IV.I Kernel Size
The ranker kernel window size is chosen according to the highest correlation between two images of the same class.
The kernel scans the two images, calculates ρ, and counts the number of flattened windows whose correlation exceeds 0.8 and 0.6. The number of kernels ranges from 3 to 5. The MNIST dataset kernel size is chosen according to Table II.
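A sketch of this criterion, assuming the correlation is computed between corresponding windows of the two same-class images (this pairing is an assumption, and the helper name is illustrative):

    import numpy as np
    from scipy.stats import spearmanr

    def window_agreement(img_a, img_b, k):
        # Correlate corresponding k x k windows of two same-class images and
        # report the fraction of windows exceeding the 0.8 and 0.6 thresholds.
        H, W = img_a.shape
        hits_08 = hits_06 = total = 0
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                rho = spearmanr(img_a[i:i + k, j:j + k].ravel(),
                                img_b[i:i + k, j:j + k].ravel())[0]
                total += 1
                hits_08 += rho > 0.8
                hits_06 += rho > 0.6
        return hits_08 / total, hits_06 / total

Candidate window sizes k can then be compared by these fractions, and the size with the highest agreement is kept.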

IV.II Baseline Algorithm
Algorithm 1 represents the three functions of the network learning process: feed-forward (FF), BP, and updating the weights; its parameters include the kernel width and the kernel height.
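As a simplified illustration of these three phases, the toy loop below uses the Spearman derivative as the ascent direction and treats the predicted ranks as directly adjustable; the real algorithm updates kernel and network weights instead, so all names here are assumptions:

    import numpy as np

    def spearman_rho(y_hat, y):
        n = len(y)
        return 1.0 - 6.0 * np.sum((y_hat - y) ** 2) / (n * (n * n - 1))

    y_true = np.array([1.0, 3.0, 2.0])
    y_hat = np.array([2.5, 1.0, 3.0])
    lr, n = 0.05, len(y_true)
    for epoch in range(300):
        grad = -12.0 * (y_hat - y_true) / (n * (n * n - 1))  # d(rho)/d(y_hat)
        y_hat = y_hat + lr * grad                            # gradient ascent
        if spearman_rho(y_hat, y_true) > 0.99:               # stopping criterion
            break
    print(epoch, np.round(y_hat, 2))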
V.II Dropout Regularization. Dropout is applied as a regularization approach to enhance the PNN ranking stability by reducing over-fitting. We drop out the weights that have a probability of less than 0.5; these dropped weights are removed from the FF, BP, and weight-update steps. The gap between the training-model and ten-fold cross-validation curves was reduced using dropout regularization with hyperparameters (learning rate = 0.05, hidden neurons = 100) on the MNIST dataset.
The dropout technique is used with all the data ranking results in the next section.
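A minimal sketch of this weight-level dropout, assuming a fresh binary mask per training pass (closer to DropConnect than to standard neuron dropout, matching the description above; shapes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.uniform(-0.05, 0.05, size=(100, 784))  # e.g. 100 hidden neurons

    # A weight survives only if its draw is >= 0.5; dropped weights are
    # excluded from the FF, BP, and weight-update steps.
    mask = rng.random(W.shape) >= 0.5
    W_active = W * mask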
The following section evaluates the ranking experiments using image recognition benchmark datasets.

VI EXPERIMENTS
This section describes the classification benchmark datasets, the results using PN, and a comparison with existing classification methods.
VI.I.I MNIST. It consists of hand-written digits and is the most commonly used dataset within the deep learning community. The dataset is trivial to learn, and it is simple to reach good performance on it [36]. It is included in the experiment for completeness of the benchmarks.

VI.I.II Fashion-MNIST. A rather new dataset with different classes of clothing, intended as a drop-in replacement for MNIST [54]. It is harder, but has the same size, input dimension, and number of classes as MNIST.
VI.I.III CIFAR-10. The CIFAR-10 dataset contains 60,000 32×32 color images in 10 different classes [31]. According to the validation result, the weight of each model is determined; the 20,000 testing instances are then executed on the fifty models, and the weighted average of the 50 models determines the final result.
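A minimal sketch of the weighted-average ensemble, assuming each model's weight is proportional to its validation accuracy (array shapes and names are illustrative):

    import numpy as np

    def ensemble_predict(model_probs, val_acc):
        # model_probs: (n_models, n_samples, n_classes) per-model outputs;
        # val_acc: (n_models,) validation accuracies used as ensemble weights.
        w = np.asarray(val_acc, dtype=float)
        w = w / w.sum()
        avg = np.tensordot(w, model_probs, axes=1)  # weighted mean over models
        return avg.argmax(axis=-1)                  # final class per sample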
VI.II.II Output Ranking Results. In terms of ranking, the output of PN barely reaches ρ = 0.85; however, applying binary ranking to the final results increases the classification accuracy and sensitivity before and after using the cost function in (a) and (b), respectively, which exceeds the state-of-the-art on MNIST, Fashion-MNIST, and CIFAR-10, as shown in Table III.
VI.II.III Image Classification Results. PN with 3 kernel sizes is tested on the CIFAR-10 [30], Fashion-MNIST [53], and MNIST datasets, as shown in Table III. Table IV shows the results compared to other convolutional networks.

VI.IV Discussion and Future Work
This paper confirms that it is hard to detect a complete ranking set of data where ρ = 1. However, using an initial ranking to solve classification is a promising step in terms of computational cost: it reaches stable results in validation and testing, and the cost function then boosts the accuracy to outperform some other approaches. It can be noticed from Table IV that PN performs better than CapsNet [39]. Different types of PN architectures could be used to enhance the results and reach the state-of-the-art in image recognition.
The superiority of PN lies in using a new type of weighted kernel for pixel ranking correlation and in creating a Spearman correlation feature matrix as a new form of feature selection, making it a novel approach to deep label ranking for image recognition.

VII CONCLUSION
This paper proposed a novel method to rank a complete multi-label space of output labels and to extract features in both simple and deep learning. PN is a native ranker network for image classification and label ranking problems that uses the PSS to rank the multiple labels per instance. The novelty of this neural network is a new kernel mechanism based on correlation instead of convolution. However, the ranking results hardly reach ρ = 0.9, which requires the score function to increase the accuracy by reducing the ranking to classification. This approach takes less computational time with a single middle layer. It indexes the multi-labels as output neurons with preference values, and the neuron output structure can be mapped to an integer ranking value. PN is implemented using the Python programming language, and the activation functions are modelled using Wolfram Mathematica software [51].