PrimeNet: Adaptive multi-layer deep neural structure for enhanced feature selection in early convolution stage

Farhat Ullah Khan 1,†,‡, Izzatdin Aziz 2,‡ and Emelia Akashah P. Akhir 3,*

1,2,3 Center for Research in Data Science (CeRDaS), Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia-31750
* Correspondence: farhat_17000870@utp.edu.my

Version July 28, 2021 submitted to Appl. Sci.

Abstract: The colossal depth of deep neural networks sometimes suffers from ineffective backpropagation of the gradients through all of its layers. In contrast, the strong performance of shallower multilayer neural structures proves their ability to increase the gradient signals in the early stages of training, which are easily backpropagated for global loss corrections. Shallow neural structures are always a good starting point for encouraging the sturdy feature characteristics of the input. In this research, a shallow deep neural structure called PrimeNet is proposed. PrimeNet aims to dynamically identify and encourage the quality visual indicators from the input to be used by the subsequent deep network layers, and to increase the gradient signals in the lower stages of the training pipeline. In addition, the layerwise training is performed with the help of locally generated errors, which means the gradient is not backpropagated to previous layers and the hidden-layer weights are updated during the forward pass, making this structure a backpropagation-free variant. PrimeNet has obtained state-of-the-art results on various image datasets, attaining the dual objective of (1) a compact, dynamic deep neural structure which (2) eliminates the problem of backward locking. The PrimeNet unit is proposed as an alternative to traditional convolution and dense blocks for faster and more memory-efficient training, outperforming previously reported results aimed at adaptive methods for parallel and multilayer deep neural systems.


1. Introduction
This decade has witnessed a remarkable resurgence of artificial neural structures in various forms of deep learning techniques, efficiently leveraging the evolving robust computing infrastructure.

Encouraging the minimum loss in a multilayer structure organization will automatically relieve the necessity of higher depths to propagate gradients back through all the layers effectively. This research presents an advanced neural architecture combined with a more effective training method following the adaptive inference mechanism. The overall contributions of this work can be summarized as:

• A novel, backward-locking-free dynamic MLP structure, PrimeNet, is proposed to encourage the most vital distinctive attributes within highly correlated multiscale activations.
• PrimeNet builds a localized learning strategy to train the weight layers with locally generated errors, so that the gradient is not backpropagated to previous layers and the hidden-layer weights are updated during the forward pass.

The rest of the paper is organized as follows: Section 2 presents the most relevant research contributions in the category of adaptive and conditional neural computing. Section 3 discusses the proposed PrimeNet structure and its training method.

2. Related Work

A broad family of adaptive and conditional computation methods activates neural units depending on the input [5–18]. Zhichao Li [12] presented an extension of the recurrent attention model. To handle the issue of gradient balancing, they introduced the Gradient Equilibrium (GE) method. In other work, pruning is performed all at once in the model, which is another advantage over slower layerwise pruning. However, to update the weights during training, the global loss update procedure again proves computationally expensive.

3. The Proposed PrimeNet Structure

Each local classifier produces an output, and we compute the loss against the same one-hot-encoded target. We represent the local loss as $\mathcal{L}_{local}$, which can be defined as:

$$\mathcal{L}_{local} = \ell\big(f(x_i;\theta),\, Y_i\big) \qquad (1)$$

where $Y_i$ is the one-hot-encoded target and $f(x_i;\theta)$ is the result of the previous activation. We flattened the convolution feature maps before the classification layers, giving the parallel classifier outputs:

$$[y_1, y_2, y_3, y_4] = [f_1(x;\theta_1),\, f_2(x;\theta_2),\, f_3(x;\theta_3),\, f_4(x;\theta_4)] \qquad (2)$$

where $x$ is the input image, and $f_{1,2,3,4}$ and $\theta_{1,2,3,4}$ represent the transformation operation (conv → fc → softmax) for classifier $y_i$. Similarly, from Equation (1), the loss function can be expanded as:

$$[\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3, \mathcal{L}_4] = [\ell(y_1, Y),\, \ell(y_2, Y),\, \ell(y_3, Y),\, \ell(y_4, Y)] \qquad (3)$$

Then, after loss-based adaptive inference, the next-layer convolution can be written as:

$$f_{min}(x;\theta) = f_j(x;\theta_j), \quad j = \arg\min_i \mathcal{L}_i \qquad (4)$$

At this stage, we obtain the most prominent visual indicators, and we then apply a pool projection on the obtained convolution feature map as follows:

$$f_{next}(x;\theta) = f_{min}(x;\theta) \oplus S(x;\theta) \qquad (5)$$

where $f_{next}(x;\theta)$ is the next-layer input after concatenation ($\oplus$) of the convolution output with minimum loss, $f_{min}(x;\theta)$, and the pool projection $S(x;\theta)$.

To implement the PrimeNet framework, we divided the model design into two parts. In the first part, we implemented a multiscale shallow neural structure for reusable feature representation.

Each shallow neural structure learns a feature representation at a specific convolutional scale. The simultaneous feature representations are analyzed for the minimum batch-input loss, and the minimum-loss feature representation is forwarded to be incorporated into the second part of the model design.

The weights for each lightweight network are updated in place using the local loss update procedure.
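To make the two-part design and Equations (1)–(5) concrete, a minimal sketch is given below, assuming PyTorch; the unit name PrimeNetUnit, the four branch kernel sizes, global average pooling as the flattening step, max pooling as the pool projection S(x; θ), and cross-entropy as the local loss ℓ are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a PrimeNet-style unit (PyTorch assumed; all names and
# design constants here are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrimeNetUnit(nn.Module):
    def __init__(self, in_ch, out_ch, num_classes, scales=(3, 5, 7, 9)):
        super().__init__()
        # Part 1: one shallow convolutional branch per scale, Eq. (2).
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in scales
        )
        # One lightweight local classifier (conv -> fc -> softmax) per branch.
        self.heads = nn.ModuleList(nn.Linear(out_ch, num_classes) for _ in scales)

    def forward(self, x, target):
        feats, losses = [], []
        for branch, head in zip(self.branches, self.heads):
            f = F.relu(branch(x))
            logits = head(f.mean(dim=(2, 3)))  # flatten by global average pooling
            # Local loss, Eqs. (1) and (3); integer class targets stand in
            # for the paper's one-hot encoding.
            losses.append(F.cross_entropy(logits, target))
            feats.append(f)
        losses = torch.stack(losses)
        f_min = feats[int(losses.argmin())]    # loss-based adaptive inference, Eq. (4)
        # Pool projection S(x; theta) of the input, concatenated with the
        # minimum-loss feature map, Eq. (5).
        s = F.max_pool2d(x, kernel_size=3, stride=1, padding=1)
        f_next = torch.cat([f_min, s], dim=1)  # out_ch + in_ch channels
        return f_next, losses
```

Because the branches are parallel, backpropagating the summed local losses updates each lightweight network only from its own classifier, matching the in-place local update described above.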

Our experiments present PrimeNet as a shallow multiscale neural structure, mainly for obtaining the prime discriminative characteristics from the input. We also present the backpropagation-free local training scheme, sketched after the summary points below. (1) PrimeNet is helpful to reduce the size of large deep networks with fewer weight-adjustment operations.

(2) PrimeNet is an independent adaptive deep neural structure with its own backpropagation-free training procedure. Each baseline model with pre-trained ImageNet weights was recreated for our task-data classifier with the same settings in each experiment.
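The backpropagation-free regime from points (1) and (2) can be sketched as a layerwise loop; this is a minimal illustration, assuming two stacked instances of the hypothetical PrimeNetUnit from the earlier sketch, SGD, and an arbitrary image DataLoader named loader:

```python
# Layerwise training with locally generated errors (sketch; reuses the
# hypothetical PrimeNetUnit above, `loader` is any image DataLoader).
import torch

unit1 = PrimeNetUnit(in_ch=3, out_ch=32, num_classes=10)
unit2 = PrimeNetUnit(in_ch=3 + 32, out_ch=64, num_classes=10)  # f_next widens by in_ch
opt1 = torch.optim.SGD(unit1.parameters(), lr=0.01)
opt2 = torch.optim.SGD(unit2.parameters(), lr=0.01)

for x, y in loader:
    out1, losses1 = unit1(x, y)
    opt1.zero_grad(); losses1.sum().backward(); opt1.step()  # local update in the forward pass
    out1 = out1.detach()   # no gradient flows back to unit1: removes backward locking
    out2, losses2 = unit2(out1, y)
    opt2.zero_grad(); losses2.sum().backward(); opt2.step()
```

Because each unit's output is detached before it reaches the next unit, no layer ever waits for a global backward pass, which is the backward-locking elimination claimed above.

For the pre-trained baselines, one way to recreate an ImageNet-initialized classifier for the task data is shown below; ResNet-18, the ten-class head, and the optimizer settings are placeholder assumptions rather than the paper's exact configuration:

```python
# Sketch: recreating an ImageNet-pretrained baseline for the task data
# (ResNet-18 and all hyperparameters are placeholder assumptions).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)           # ImageNet weights
model.fc = nn.Linear(model.fc.in_features, 10)     # new head for the task classes
# Identical settings across experiments, e.g. one fixed optimizer for all runs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```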

Here, we have considered the combined computational information from the shallow dynamic PrimeNet structure.