Deep Cybersecurity: A Comprehensive Overview from Neural Network and Deep Learning Perspective

Deep learning, which originated from the artificial neural network (ANN), is one of the major technologies that enable today’s smart cybersecurity systems and policies to function in an intelligent manner. Popular deep learning techniques, such as the multi-layer perceptron, convolutional neural network, recurrent neural network or long short-term memory, self-organizing map, auto-encoder, restricted Boltzmann machine, deep belief network, generative adversarial network, deep transfer learning, and deep reinforcement learning, or their ensembles and hybrid approaches, can be used to intelligently tackle diverse cybersecurity issues. In this paper, we aim to present a comprehensive overview from the perspective of these neural network and deep learning techniques according to today’s diverse needs. We also discuss the applicability of these techniques in various cybersecurity tasks such as intrusion detection, identification of malware or botnets, phishing detection, prediction of cyberattacks (e.g., denial of service), and detection of fraud or cyber-anomalies. Finally, we highlight several research issues and future directions within the scope of our study. Overall, the ultimate goal of this paper is to serve as a reference point and guideline for academia and professionals in the cyber industries, especially from the deep learning point of view.


Introduction
Due to the increasing popularity of the internet-of-things (IoT) [1] and today's dependency on digitalization, security incidents and attacks have grown rapidly in recent years. Malicious activities such as malware or ransomware attacks [2], zero-day attacks [3], cryptographic attacks, unauthorized access [4], denial of service (DoS) [4], data breaches [5], phishing or social engineering [6], and various attacks on IoT devices are common nowadays. These types of security incidents or cybercrime can affect organizations and individuals, and can cause disruptions as well as devastating financial losses. For example, according to an IBM report, a data breach costs 8.19 million USD in the United States [7], and the total annual cost of cybercrime to the global economy is 400 billion USD [8]. Cybercrimes are growing at an exponential rate, which sends an alarming message to cybersecurity professionals and researchers [9]. Thus, security management tools capable of detecting and preventing such incidents in a timely and intelligent way are urgently needed, since the overall national security of the business, government, and individual citizens of a country depends on them.
Typically, cybersecurity is characterized as a collection of technologies and processes designed to protect computers, networks, programs, and data against malicious activities, attacks, harm, or unauthorized access [10]. According to today's numerous needs, conventional well-known security solutions such as antivirus, firewalls, user authentication, and encryption may not be effective [11][12][13][14]. The key issue with these systems is that they are normally operated by a few security analysts, where data management is carried out in an ad hoc manner, and they therefore cannot work intelligently according to the needs [15,16]. On the other hand, data-driven learning techniques that seek to operate in an intelligent manner for cybersecurity management, e.g., deep learning, have evolved rapidly in recent years, and these are the techniques in which we are interested.
This article is part of the topical collection "Deep learning approaches for data analysis: A practical perspective" guest edited by D. Jude Hemanth, Lipo Wang and Anastasia Angelopoulou.
Deep learning (DL) is considered a part of machine learning (ML) as well as artificial intelligence (AI). It originated from the artificial neural network (ANN) and is one of the major technologies of the Fourth Industrial Revolution (Industry 4.0) [9] [17]. The worldwide popularity of "Cyber security" and "Deep learning" is increasing day by day, as shown in Fig. 1. The popularity trend in Fig. 1 is based on data collected from Google Trends over the last 5 years [18]. In this paper, we take into account ten popular neural network and deep learning techniques including supervised, semi-supervised, unsupervised, and reinforcement learning in the context of cybersecurity. These are (i) multi-layer perceptron (MLP), (ii) convolutional neural network (CNN or ConvNet), (iii) recurrent neural network (RNN) or long short-term memory (LSTM), (iv) self-organizing map (SOM), (v) auto-encoder (AE), (vi) restricted Boltzmann machine (RBM), (vii) deep belief networks (DBN), (viii) generative adversarial network (GAN), (ix) deep transfer learning (DTL or deep TL), and (x) deep reinforcement learning (DRL or deep RL). These deep neural network learning techniques, or their ensembles and hybrid approaches, can be used to intelligently solve different cybersecurity issues, such as intrusion detection, identification of malware or botnets, phishing, predicting cyberattacks (e.g., DoS), fraud detection, or cyber-anomalies. Deep learning is beneficial for building security models due to its better accuracy, especially when learning from large quantities of security data [19]. The contribution of this paper is summarized as follows:
• This study concentrates on the knowledge of ANN and DL techniques, a part of artificial intelligence (AI), to function in a timely, automated, and intelligent manner in the context of cybersecurity, which are considered the major technologies of the Fourth Industrial Revolution (Industry 4.0).
• We discuss various popular neural network and deep learning techniques including supervised, unsupervised, and reinforcement learning in the context of cybersecurity, as well as the applicability of these techniques in various cybersecurity tasks.
• Finally, we highlight several research issues and future directions within the scope of our study for future development and research in the domain of cybersecurity.
This paper is organized as follows. Section 2 provides a brief overview of cybersecurity data. In Sect. 3, we discuss various artificial neural networks and deep learning methods and their applicability within the area of cybersecurity. Several research issues and potential solutions based on our study are highlighted in Sect. 4. Finally, we conclude this paper in Sect. 5.

Understanding Cybersecurity Data
The data-driven model based on ANN and DL methods usually depends on data availability [20]. Datasets typically consist of a series of data records with many attributes or characteristics and relevant information, from which the data-driven cybersecurity model is derived. In the field of cybersecurity, many datasets exist, including those for intrusion analysis, malware analysis, and spam analysis, which are used for different purposes. In our earlier paper "cybersecurity data science", Sarker et al. [9], we have summarized various security datasets that are obtained from different sources. In the following, several such datasets, including their different characteristics and attacks, are summarized to discuss the applicability of security modeling based on ANN and DL, according to the objective stated in this paper.
Fig. 1 The worldwide popularity score of "Cyber security" and "Deep learning" in a range of 0 (min) to 100 (max) over time, where the x-axis represents the timestamp information and the y-axis represents the corresponding popularity score
SN Computer Science
DARPA (Defense Advanced Research Projects Agency) made the earliest attempt to build an intrusion detection dataset in 1998 [21]. Under the leadership of DARPA and AFRL/SNHS, the datasets were compiled and released by the MIT Lincoln Laboratory's Cyber Infrastructure and Technology Division (formerly the DARPA Intrusion Detection Assessment Group) for the evaluation of computer network intrusion detection systems. The KDD Cup 99 dataset, which contains network traffic records with more than forty feature attributes and one class identifier, is one of the most commonly used datasets for intrusion detection [22]. The dataset contains different types of attacks that fall into four families, DoS, R2L, U2R, and PROB, as well as normal data. A refined version of this dataset is known as the NSL-KDD dataset, which contains similar features [23] and in which duplicate records are excluded from both the training and test sets. As an example of security data, Table 1 shows the features of intrusion detection datasets together with their types, such as integer, float, or nominal, for a deeper understanding of security data [24]. Effectively processing these features according to the requirements, building the target ANN and DL model, and eventually performing the decision analysis could play a significant role in providing intelligent cybersecurity services, which are discussed briefly in Sect. 3.
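Because such records mix integer, float, and nominal attributes, they are usually converted to a purely numeric form before being fed to a neural network. The following is a minimal sketch of one common way to do this (one-hot encoding for nominal attributes, min-max scaling for numeric ones). The four records, the three attributes shown, and the binary labeling are illustrative assumptions, not rows or a procedure taken from the datasets or from Table 1.

```python
# Hypothetical KDD-style records: (duration, protocol_type, src_bytes, label).
# The values are made up for illustration only.
records = [
    (0, "tcp", 181, "normal"),
    (2, "udp", 105, "normal"),
    (0, "icmp", 1032, "dos"),
    (5, "tcp", 0, "probe"),
]

def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max(column):
    """Scale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]

protocols = sorted({r[1] for r in records})      # nominal attribute
durations = min_max([r[0] for r in records])     # integer attribute
src_bytes = min_max([r[2] for r in records])     # integer attribute

# One numeric feature vector per record: [duration, src_bytes, one-hot protocol...]
features = [
    [d, b] + one_hot(r[1], protocols)
    for r, d, b in zip(records, durations, src_bytes)
]
# Binary target: 0 = normal, 1 = any attack family.
labels = [0 if r[3] == "normal" else 1 for r in records]
```

In practice the scaling statistics would be computed on the training split only and then reused for the test split.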
Another dataset, ISCX [25], was created at the Canadian Institute for Cybersecurity. The notion of profiles was used to describe attacks and distribution strategies in a network context. Several real traces were analyzed to create accurate profiles of attacks and other events for testing intrusion detection systems. A newer dataset, CSE-CIC-IDS2018 [26], was recently created at the same institution, based on user profiles that track network events and activity. The MAWI [27] dataset is collected with the support of research and academic institutions in Japan and is used to assess the global internet situation across a wide area. The dataset is updated daily to track new traffic. Some scholars use this dataset for DDoS intrusion detection [27]. Since MAWI is real traffic data, the types of attacks found in it vary. The ADFA dataset [28] is a set of host-level intrusion detection datasets issued by the Australian Defence Force Academy (ADFA), which is commonly used in the testing of intrusion detection products. It includes five types of attacks, namely Hydra-FTP, Hydra-SSH, Adduser, Java-Meterpreter, and Webshell, and two types of normal data, namely Training and Validation.
The CAIDA'07 dataset [29] represents anonymized traces of 1-h DDoS attack traffic collected on August 04, 2007. The 1-h traffic is split into 5-min files.  [34] dataset is commonly used, as one can get as many as one million domain names. OSINT [35] and DGArchive [36] are sources of malicious domain names.
The UNSW-NB15 dataset [37] was established in 2015 at the University of New South Wales. It has 49 features and a total of almost 257,700 records covering nine different kinds of modern attacks. A systematic approach to generate benchmark datasets for intrusion detection has been presented in [38].
In recent years, the malware industry has become a well-organized market involving large amounts of money. Top apps in the Google Play Store [39] are the most common source of benign samples in malware experiments. While these apps are not guaranteed to be malware-free, they are the most likely to be so because of the combination of Google's vetting and the ubiquity of the apps. In addition, they are also vetted using the VirusTotal service [40]. Malware samples are available in many datasets. The Genome Project dataset [41], for example, consists of 2123 apps, 1260 of which are malicious, covering 49 separate families of malware. This is similar to the Virus Share [42] and VirusTotal [40] datasets. Another large dataset, containing 22,500 malicious and 22,500 benign raw files, is the Comodo dataset [43]. The Contagio [44] dataset contains 250 malicious files and is smaller than the others. The DREBIN dataset [45] is a highly imbalanced dataset containing 120,000 Android apps, 5000 of which are malicious. The Microsoft [46] dataset, released for a Kaggle competition, comprises 10,868 binary malware files in hexadecimal and assembly representation, labeled with nine different malware families. According to the statistical details in [47], there are some correlations between the datasets containing malicious data listed above and the Google Play Store data. In addition, there is a large synthetic dataset called the Computer Emergency Readiness Team (CERT) Insider Threat Dataset v6.2 [48] [49] for insider threat identification. This dataset includes 516 days of device logs containing over 130 million events, approximately 400 of which are malicious. Due to privacy issues, email datasets are hard to obtain. Some common e-mail corpora, however, include EnronSpam [50], SpamAssassin [51], and LingSpam [52].
Bot-IoT [53] is a recent dataset that includes legitimate and simulated IoT network traffic along with various types of attacks, intended for network forensic analytics in the Internet of Things area.
To examine the different trends of security incidents or malicious behavior, the above-discussed datasets could be used to construct a data-driven security model based on artificial neural networks and deep learning techniques. In Sect. 3, we discuss and review various ANN and DL methods by taking into account their applicability in various cybersecurity tasks.

ANN and Deep Learning in Cybersecurity
Deep learning (DL) is typically considered a part of a broader family of machine learning methods as well as artificial intelligence (AI), and it originated from the artificial neural network (ANN) [9]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large amounts of security data [19]. In the following, we discuss ten popular neural network and deep learning techniques including supervised, semi-supervised, unsupervised, and reinforcement learning in the context of cybersecurity. These neural network and deep learning techniques, or their ensembles and hybrid security models, can be used to intelligently tackle different cybersecurity issues including intrusion detection, malware analysis, security threat analysis, predicting cyberattacks or anomalies, etc.

Multi-layer Perceptron (MLP)
Multi-layer perceptron, a class of feedforward artificial neural network (ANN), is a supervised learning algorithm [54]. It is also considered as the base architecture of deep learning or deep neural networks (DNN). A typical MLP is a fully connected network, consisting of an input layer that receives the input data, an output layer to make a decision or prediction about the input signal, and one or more hidden layers between these two [55], which are considered as the true computational engine of the network, shown in Fig. 2.
Since MLPs are fully connected, each node in one layer connects with a certain weight to each node in the next layer. Several activation functions such as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [54] are used to determine the output of a network. These activation functions, also known as transfer functions, introduce non-linear properties into the network so that it can learn complex functional mappings from data. MLP utilizes a supervised learning technique called "Backpropagation" [56] for training, which is the most "fundamental building block" in a neural network and a widely used algorithm for training feedforward neural networks. The ultimate objective of the backpropagation algorithm is to optimize the network weights so that the inputs are accurately mapped to the target outputs. Various optimization techniques such as Stochastic Gradient Descent (SGD), Limited-memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam) [54] are used during the training process. Such neural networks can be used to solve various issues in the domain of cybersecurity. For instance, MLP-based networks are used for building an intrusion detection model [57], malware analysis [58], security threat analysis [59], detecting malicious botnet traffic [60], as well as for building trustworthy IoT systems [61]. MLP is sensitive to feature scaling and requires a range of hyperparameters, such as the number of hidden layers, neurons, and iterations, to be tuned, which may make the model computationally expensive for a complex security problem. However, MLP has the advantage of learning non-linear models even in real-time or on-line settings using partial fit [54].
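The forward pass and the backpropagation update described above can be illustrated with a minimal NumPy sketch. It assumes one tanh hidden layer, a sigmoid output, binary cross-entropy loss, and plain full-batch gradient descent; the toy data, layer sizes, and learning rate are hypothetical choices made only so that the example runs, and they are not taken from any of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for scaled security features (hypothetical).
# The label depends non-linearly on the two inputs.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

n_hidden = 8
W1 = rng.normal(0.0, 0.5, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    """Input -> hidden layer (tanh) -> output layer (sigmoid)."""
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    return h, p

def bce(p, y):
    """Mean binary cross-entropy."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

lr = 0.5
losses = []
for epoch in range(500):
    h, p = forward(X)
    losses.append(bce(p, y))
    # Backpropagation: push the error gradient from the output to the input.
    dz2 = (p - y) / len(X)            # gradient at the output pre-activation
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h**2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    # Gradient descent step on all weights and biases.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
```

The list `losses` records the training loss per epoch, which is a simple way to check that the weight updates are actually reducing the error.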

Convolutional Neural Network (CNN or ConvNet)
The convolutional neural network (CNN or ConvNet) [62] is a deep learning network architecture that learns directly from data, without the need for manual feature extraction. A typical CNN consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, as shown in Fig. 3. Thus, the CNN improves on the architecture of the typical ANN and is also considered a regularized version of the multi-layer perceptron. Each layer in a CNN uses optimized parameters to produce a meaningful output as well as to reduce model complexity. CNNs also use 'dropout' [63], which can handle the over-fitting issue that may occur in a typical network.
Convolutional neural networks are specifically designed to deal with the variability of 2D shapes [62]. In terms of application areas, CNNs are broadly used in image and video recognition, medical image analysis, recommender systems, image classification, image segmentation, natural language processing, financial time series, etc. Although CNNs are most commonly applied to analyzing visual imagery, these networks can also be used in the domain of cybersecurity. For instance, CNN-based deep learning models are used for intrusion detection, e.g., detecting denial-of-service (DoS) attacks in IoT networks [64], for malware detection [65], and for Android malware detection [66]. Besides, a phishing detection model based on convolutional neural networks has been presented in [67]. A multi-CNN fusion-based model can be used for intrusion detection [68]. Although CNN has a greater computational burden, it has the advantage of automatically detecting the important features without any human supervision, and thus CNN is considered to be more powerful than a typical ANN. Several advanced CNN-based deep learning models, such as AlexNet [69], Xception [70], Inception [71], visual geometry group (VGG) [72], and ResNet [73], or other lightweight architectures, can be used to mitigate these issues depending on the problem domain and data characteristics.
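To make the roles of the convolutional and pooling layers concrete, the following minimal NumPy sketch applies one convolution (implemented, as in most deep learning libraries, as a cross-correlation without padding), a ReLU activation, and a 2x2 max-pooling step to a small 2D input. The 6x6 input and the 3x3 averaging kernel are arbitrary illustrative values; in a real CNN the kernel weights are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

# Hypothetical 6x6 input, e.g. a small 2D arrangement of feature values.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0            # simple averaging filter

feature_map = relu(conv2d_valid(image, kernel))   # shape (4, 4)
pooled = max_pool2d(feature_map)                  # shape (2, 2)
```

Pooling halves each spatial dimension here, which is how the pooling layers reduce the amount of computation in later layers.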

Long Short-Term Memory Recurrent Neural Network (LSTM-RNN)
A recurrent neural network (RNN) [74] is another type of artificial neural network, which is capable of processing a sequence of inputs and retaining its state while processing the next sequence of inputs. All RNNs have feedback loops in the recurrent layer, which allow them to maintain information in 'memory' over time. Long short-term memory (LSTM) networks are a type of RNN that uses special units in addition to standard units, which can deal with the vanishing gradient problem. LSTM units have a 'memory cell' that can store data. Figure 4 shows an example of a long short-term memory (LSTM) cell, where the 'Forget Gate', 'Input Gate', and 'Output Gate' work cooperatively to control the information flow in an LSTM unit [75]. For instance, the 'Forget Gate' decides what information from the previous cell state will be retained and what information is no longer useful and should be removed, the 'Input Gate' determines which information should enter the cell state, and finally the 'Output Gate' decides and controls the outputs.
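The interplay of the three gates can be sketched as a single LSTM time step in NumPy. This follows the commonly used LSTM formulation and is not necessarily identical to the cell drawn in Fig. 4. The function name `lstm_step`, the stacked weight layout, and the random toy sequence are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * n, n + d) and b has shape (4 * n,), where n is the
    hidden size and d the input size; the four row blocks hold the forget
    gate, input gate, candidate values, and output gate, in that order.
    """
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:n])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new content to admit
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g         # updated memory cell
    h = o * np.tanh(c)             # updated hidden state (the unit's output)
    return h, c

# Run the cell over a short hypothetical sequence, e.g. per-interval traffic features.
rng = np.random.default_rng(0)
n_hidden, n_input, n_steps = 3, 2, 5
W = rng.normal(0.0, 0.5, size=(4 * n_hidden, n_hidden + n_input))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(n_steps, n_input)):
    h, c = lstm_step(x, h, c, W, b)
```

Because the cell state `c` is carried from step to step and only modified through the gates, information from early time steps can influence the output many steps later.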
LSTM networks are well-suited for learning from and analyzing sequential data, such as classifying, processing, and making predictions based on time-series data, which differentiates them from other conventional networks. Thus, LSTM is commonly applied in the areas of time-series prediction, time-series anomaly detection, natural language processing, question answering chatbots, machine translation, speech recognition, etc. As a large amount of sequential security data, such as network traffic flows and time-dependent malicious activities, is generated these days, an LSTM model is also applicable in the domain of cybersecurity. Several LSTM model-based security solutions, such as intrusion detection [76], detection and classification of malicious apps [77], phishing detection [78], and time-based botnet detection [79], have been studied in the area. Although the main advantage of a recurrent network over a traditional network is the capability of modeling sequences of data, it may require a lot of resources and time to train. Nevertheless, considering the above-mentioned advantage, an effective LSTM-RNN network can improve security models for detecting security threats, particularly where the behavior patterns of the threats exhibit temporal dynamics.

Self-organizing Map (SOM)
The self-organizing map (SOM) or Kohonen map [80] is a type of artificial neural network that follows an unsupervised learning approach. It uses a competitive learning algorithm to train its network, in which nodes compete for the right to respond to a subset of the input data. It learns the shape of a dataset by repeatedly moving its neurons nearer to the data points. Unlike other artificial neural networks that use error-correction learning such as backpropagation with gradient descent [56], SOMs use competitive learning together with a neighborhood function to preserve the topological properties of the input space. SOM is generally used for clustering [81] and for mapping a high-dimensional dataset to a low-dimensional (typically two-dimensional) discretized pattern, which reduces complex problems for easy interpretation, and thus it is known as a dimensionality reduction algorithm. A Kohonen network or SOM, as shown in Fig. 5, consists of two layers of processing units called an input layer and an output layer. The units in the output layer compete with each other when an input pattern is fed to the network, and the winning output unit is typically the one whose incoming link weights are closest to the input pattern, measured, for example, by Euclidean distance [56].
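The competitive learning procedure described above can be sketched in a few lines of NumPy: for each input, the winning unit is found by Euclidean distance, and the winner and its grid neighbors are moved toward the input. The 4x4 grid, the Gaussian neighborhood, the linear decay schedules, and the two synthetic clusters are illustrative assumptions rather than settings from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-D data with two clusters (e.g. "normal" vs. "anomalous" flows).
data = np.vstack([
    rng.normal(0.2, 0.05, size=(100, 3)),
    rng.normal(0.8, 0.05, size=(100, 3)),
])

rows, cols = 4, 4
weights = rng.uniform(0.0, 1.0, size=(rows, cols, 3))   # one weight vector per unit
# Grid coordinates of every output unit, shape (rows, cols, 2).
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

def best_matching_unit(x, weights):
    """Grid index of the unit whose weights are closest to x (Euclidean)."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def quantization_error(data, weights):
    """Mean distance between each sample and its winning unit."""
    return float(np.mean([np.linalg.norm(weights - x, axis=-1).min() for x in data]))

qe_before = quantization_error(data, weights)

n_epochs = 20
for epoch in range(n_epochs):
    frac = epoch / n_epochs
    lr = 0.5 * (1.0 - frac)               # learning rate decays over time
    sigma = 2.0 * (1.0 - frac) + 0.5      # neighborhood radius shrinks over time
    for x in rng.permutation(data):
        bmu = np.array(best_matching_unit(x, weights))
        grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
        influence = np.exp(-grid_dist2 / (2.0 * sigma**2))   # neighborhood function
        weights += lr * influence[..., None] * (x - weights)

qe_after = quantization_error(data, weights)
```

After training, samples from the same cluster tend to map to nearby units on the grid, which is what makes the resulting map useful for clustering and visualization.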
SOM has been widely used in, for instance, pattern recognition, health or medical diagnosis, recognition of anomalies, and virus or worm attack detection [82] [83]. Several researchers have used SOM for different purposes in the domain of cybersecurity. For instance, in [84], the authors present a self-organizing map and its modeling for discovering malicious network traffic. To identify the hierarchical relations within modern real-world datasets with mixed attributes (numerical and categorical), the authors in [85] take into account the growing hierarchical self-organizing map (GHSOM) and the spark-GHSOM algorithm in their analysis. The authors have shown in [86] that SOMs have a high potential as a data analytics tool on unknown traffic, where they can recognize botnet and normal flows with high confidence of approximately 99%. SOMs are also used in [87] as a visual data mining technique for analyzing computer user behavior, security incidents, and fraud. The main advantage of using a SOM is that the data are easily interpreted and understood. Thus, SOMs can play a significant role in building an effective data-driven security model, depending on the characteristics of the data.

Auto-Encoder (AE)
An auto-encoder (AE) [74] is a type of artificial neural network used in an unsupervised way to learn efficient data codings. The goal of an AE is to learn a representation for a dataset, typically for dimensionality reduction, by training the network to ignore the 'noise' in the signal. An auto-encoder consists of three components: encoder, code, and decoder, as shown in Fig. 6. The encoder compresses the input and generates the code, and the decoder then uses this code to reconstruct the input. One primary benefit of the AE is that during propagation, this model can continuously extract useful features and filter out useless information [88]. A single-layered AE with a linear activation function is very similar to principal component analysis (PCA) [89], which is also used to decrease the dimensionality of large datasets.
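The encoder-code-decoder structure can be illustrated with a minimal linear auto-encoder trained by gradient descent in NumPy. This is the simple single-layer linear case mentioned above, not a deep AE; the synthetic 4-dimensional data, the 2-dimensional code size, and the learning rate are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 4-D features that lie close to a 2-D subspace (hypothetical data).
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each feature

W_enc = rng.normal(0.0, 0.1, size=(4, 2))     # encoder: 4 features -> 2-D code
W_dec = rng.normal(0.0, 0.1, size=(2, 4))     # decoder: 2-D code -> 4 features

def encode(X):
    return X @ W_enc

def decode(Z):
    return Z @ W_dec

def reconstruction_error(X):
    """Per-sample mean squared reconstruction error."""
    return np.mean((decode(encode(X)) - X) ** 2, axis=1)

lr = 0.01
losses = []
for step in range(2000):
    Z = encode(X)                 # code
    X_hat = decode(Z)             # reconstruction
    err = X_hat - X
    losses.append(float(np.mean(np.sum(err**2, axis=1))))
    d_out = 2.0 * err / len(X)    # gradient of the loss w.r.t. X_hat
    grad_dec = Z.T @ d_out
    grad_enc = X.T @ (d_out @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
```

In anomaly detection settings, `reconstruction_error` is often used as a score: inputs that the trained model reconstructs poorly are treated as candidates for anomalies.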
The auto-encoder is widely used for unsupervised learning tasks, e.g., dimension reduction, feature extraction, efficient coding, and generative modeling [74,90]. In the domain of cybersecurity, the deep AE can be used to build an effective security model. The reason is that an AE-based feature learning model in cybersecurity typically uses a minimal number of security features compared to other state-of-the-art algorithms. The resulting rich and compact latent representation of the security features makes the model more effective and efficient, even on small devices such as smartphones and other internet of things (IoT) devices [91]. For example, the authors of [92] present an AE-based feature learning model for cybersecurity applications, where they demonstrate the model's efficacy for malware classification and detection of network-based anomalies. An anomaly-based insider threat detection model using a deep AE has been presented in [93]. In [94], the authors present a CNN-based Android malware detection model, where they use a deep AE as a pre-training tool to reduce the training time. To enhance the intrusion detection method, the authors in [95] use a stacked sparse auto-encoder. Thus, the AE-based model can be useful in the domain of cybersecurity due to its capability to capture the main features of data.

Restricted Boltzmann Machine (RBM)
Boltzmann machines [96] are stochastic and generative neural networks with only two types of nodes: visible nodes, which we can and do measure, and hidden nodes, which we cannot or do not measure. A Boltzmann machine is an unsupervised deep learning model in which every node is connected to every other node, which helps us understand abnormalities by learning how the system works under normal conditions. Restricted Boltzmann machines (RBMs) [97] are a special class of Boltzmann machines with restricted connectivity: there are only connections between the hidden and the visible layer of variables, but not between two variables of the same layer [96]. This restriction enables training algorithms to be more efficient than what is available for the general class of Boltzmann machines, particularly the gradient-based contrastive divergence algorithm [98]. Figure 7 shows an illustration of an RBM consisting of m visible units V = (v_1, ..., v_m) representing observable data and n hidden units H = (h_1, ..., h_n) capturing dependencies between the observed variables.
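A minimal NumPy sketch of training a binary RBM with one step of contrastive divergence (CD-1) is given below. It uses a common textbook variant in which the hidden units are sampled while the reconstructed visible units are kept as probabilities; this is a simplified illustration and not the exact procedure of [98]. The two binary patterns, the layer sizes, and the learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary data: two patterns over m = 6 visible units, 50 copies each.
patterns = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
V = np.repeat(patterns, 50, axis=0)

n_visible, n_hidden = 6, 4
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))  # visible-hidden weights only
b = np.zeros(n_visible)                               # visible biases
c = np.zeros(n_hidden)                                # hidden biases

def reconstruct(v):
    """Deterministic visible -> hidden -> visible pass using probabilities."""
    h_prob = sigmoid(v @ W + c)
    return sigmoid(h_prob @ W.T + b)

def recon_error(v):
    return float(np.mean((reconstruct(v) - v) ** 2))

err_before = recon_error(V)

lr = 0.1
for epoch in range(1000):
    # Positive phase: hidden activations driven by the data.
    h0_prob = sigmoid(V @ W + c)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T + b)
    h1_prob = sigmoid(v1_prob @ W + c)
    # CD-1 update: data-driven statistics minus reconstruction-driven statistics.
    W += lr * (V.T @ h0_prob - v1_prob.T @ h1_prob) / len(V)
    b += lr * np.mean(V - v1_prob, axis=0)
    c += lr * np.mean(h0_prob - h1_prob, axis=0)

err_after = recon_error(V)
```

Note that the weight matrix only links visible to hidden units, which reflects the restriction described above and is what allows each layer to be updated in a single vectorized step.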
The RBM algorithm plays an important role in dimensionality reduction, classification, regression, collaborative filtering, feature learning, topic modeling, and many more tasks in the area of machine learning and deep learning. In the domain of cybersecurity, the RBM can be used to build an effective security model. For example, the authors in [99] present network anomaly detection with the restricted Boltzmann machine. In their approach, they investigate the efficacy of the model in combining the expressive power of generative models with the ability to infer part of its information from incomplete training data, with good classification accuracy. To increase the accuracy of DoS attack detection, the authors in [100] present a deep learning method based on a restricted Boltzmann machine. In [101], the authors present an approach for improving network intrusion detection accuracy by using an RBM that composes new data by removing noise and outliers from the input data. Overall, the restricted Boltzmann machine can automatically recognize patterns in data and build probabilistic or stochastic models that incorporate randomness in the approach, which is used for feature selection and feature extraction, as well as to form a deep belief network.

Deep Belief Networks (DBN)
A deep belief network (DBN) [102] is a generative graphical model, or a probabilistic generative model, consisting of stacked restricted Boltzmann machines (RBMs), discussed earlier. As shown in Fig. 8, it is a type of deep neural network (DNN) with multiple RBMs and a back-propagation (BP) [56] neural network. A DBN can capture a hierarchical representation of input data based on its deep structure. Two-phase training can be conducted sequentially by: (1) pre-training, i.e., unsupervised layer-wise learning of the stacked RBMs, where the layers act as feature detectors by probabilistically reconstructing their inputs, trained with the contrastive divergence [98] technique, and (2) fine-tuning, i.e., supervised learning with a classifier, e.g., a BP neural network. The main concept of the DBN is to initialize a feed-forward neural network through unsupervised pre-training on unlabeled data and then fine-tune the network using labeled data. DBNs can be seen as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or auto-encoders, where each sub-network's hidden layer serves as the visible layer for the next [103].
In the area of cybersecurity, DBN can be used in a large number of high-dimensional data applications. For instance, the authors in [104] used the DBN model as a feature reduction method to build an effective cybersecurity model, e.g., an intrusion detection scheme. In [105], an intrusion detection model based on a deep belief network has been presented. Their experimental findings on NSL-KDD datasets show that the DBN-based intrusion detection model achieves better classification results than SVM, and that the time of model establishment is also shorter, which significantly improves the speed of intrusion detection. The authors of [103] present an optimization technique for an intrusion detection classification model based on a deep belief network, where they find higher detection speed and detection accuracy. Overall, the DBN security model can play a significant role, due to its strong capability of feature extraction and classification in a large number of high-dimensional data applications in the area of cybersecurity.

Generative Adversarial Network (GAN)
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow [106], which is considered one of the most interesting ideas in the area. A GAN is composed of two neural networks, a generator G and a discriminator D, as shown in Fig. 9, where the generator and discriminator are trained to compete with each other. The role of the generator is to generate new data with characteristics close to the actual data input. On the other hand, the discriminator is trained to estimate the probability that a given sample comes from the actual data rather than being provided by the generator.
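The competition between G and D described above is commonly written as a two-player minimax game. The formulation below is the standard objective from the GAN literature; the symbols p_data (the distribution of the actual data), p_z (the noise prior fed to the generator), and V (the value function) are notation introduced here for illustration and do not appear elsewhere in this paper.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator D tries to maximize V by assigning high probability to actual samples x and low probability to generated samples G(z), while the generator G tries to minimize V by producing samples that D cannot distinguish from actual data.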
GANs are used widely in natural image synthesis, medical image analysis, bioinformatics, data augmentation tasks, video generation, voice generation, etc. They are also useful in the domain of cybersecurity. Hackers may use adversarial attacks to access and manipulate user data, so it is necessary to implement advanced security measures to avoid leakage and misuse of sensitive information. GANs can, therefore, be trained to recognize such cases of fraud and to make deep learning models more robust.

Several GAN-based works have been carried out in the domain of cybersecurity. The authors of [107], for instance, present a transferred generative adversarial network (tGAN) for automatic zero-day attack classification and detection, which is the best performer compared to traditional machine learning algorithms. The authors of [108] present a zero-day malware detection strategy using deep auto-encoder-based transferred generative adversarial networks, which generate fake malware and learn to distinguish it from real malware. They achieve 95.74% average classification accuracy in their experimental study. In [109], a system based on generative adversarial networks to augment botnet detection models (Bot-GAN) was presented, which improves detection efficiency and decreases the false positive rate. A new GAN-based adversarial-example attack method was implemented in [110], which outperforms the state-of-the-art method by 247.68%.
In [111], the authors explore GANs to improve the training, and ultimately the performance, of cyberattack detection systems by balancing data sets with generated data. The model generates data that closely mimics the distribution of data from various types of attacks and is used to balance previously unbalanced data sets, which is a viable approach for designing intrusion detection systems. GANs are useful not only for unsupervised learning but also for semi-supervised learning, fully supervised learning, and reinforcement learning, depending on the task, as their main objective is to learn from a collection of training data and generate new data with the same characteristics as the training data.
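The balancing step described above can be sketched as follows. This is only a minimal illustration, not the method of [111]: `toy_generator` is a hypothetical stand-in for a trained, class-conditional GAN generator, and the class names and feature vectors are invented. The sketch shows only the bookkeeping of topping up minority attack classes with generated records.

```python
import random
from collections import Counter

def toy_generator(attack_class, rng):
    """Hypothetical stand-in for a trained, class-conditional GAN generator.

    A real generator would map a noise vector to a synthetic traffic record;
    here we only return a labelled random feature vector.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(4)], attack_class

def balance_with_generator(records, generator, seed=0):
    """Add synthetic records to every minority class until all classes
    contain as many records as the largest class."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in records)
    target = max(counts.values())
    balanced = list(records)
    for label, n in counts.items():
        for _ in range(target - n):
            balanced.append(generator(label, rng))
    return balanced

# Unbalanced toy data set: 6 "normal" records, 2 "dos", 1 "probe" (invented).
data = ([([0.0] * 4, "normal")] * 6
        + [([1.0] * 4, "dos")] * 2
        + [([2.0] * 4, "probe")])
balanced = balance_with_generator(data, toy_generator)
class_counts = Counter(label for _, label in balanced)
print(class_counts)
```

After balancing, each class holds the same number of records as the largest original class, so a detector trained on `balanced` no longer sees the attack classes as rare events.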

Deep Transfer Learning (DTL or Deep TL)
In machine and deep learning, transfer learning is an important method for addressing the fundamental problem of inadequate training data. It removes the need to train AI models from scratch, because it allows neural networks to be trained with relatively small amounts of data [112]. It is currently very common in the field of data science, since most real-world problems do not have millions of labeled data points for training such complex models. It takes pre-trained models learned from a source domain and reuses them, as shown in Fig. 10, for tasks in the target domain. Transfer learning can be classified into three sub-settings [113], based on the relationship between the source and target domains and tasks:
• Inductive transfer learning: In this setting, the target task differs from the source task. Several approaches, such as instance transfer, feature representation transfer, parameter transfer, and relational knowledge transfer, are relevant to this setting.
• Transductive transfer learning: In this setting, the source and target tasks are the same, while the source and target domains are different. Several approaches, such as instance transfer and feature representation transfer, are relevant to this setting.
• Unsupervised transfer learning: This setting is similar to inductive transfer learning, in that the target task is different from, but related to, the source task. It is typically studied in the context of feature representation transfer.
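The idea of reusing a pre-trained model on a small target data set can be sketched as below. This is a minimal sketch under stated assumptions, not the approach of any cited work: `PRETRAINED_W` is a made-up weight matrix standing in for a layer learned on a large source domain, it is kept frozen, and only a new logistic-regression head is trained on four invented target-domain samples.

```python
import math

# Frozen "pre-trained" layer: in practice these weights would come from a model
# trained on a large source-domain data set; the values here are made up.
PRETRAINED_W = [[1.0, -1.0], [0.5, 0.5]]

def extract_features(x):
    """Source-domain feature extractor, kept fixed (never updated) below."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Fit only a new logistic-regression head on the small target data set."""
    head_w, head_b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = extract_features(x)
            err = sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b) - y
            # Gradient step on the head only; PRETRAINED_W is not touched.
            head_w = [w - lr * err * fi for w, fi in zip(head_w, f)]
            head_b -= lr * err
    return head_w, head_b

def predict(x, head_w, head_b):
    f = extract_features(x)
    return int(sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b) >= 0.5)

# Tiny labelled target-domain set (1 = attack, 0 = benign), purely illustrative.
target_x = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.1, 1.6]]
target_y = [1, 1, 0, 0]
head_w, head_b = train_head(target_x, target_y)
predictions = [predict(x, head_w, head_b) for x in target_x]
print(predictions)
```

Because only the three head parameters are learned, a handful of labelled target samples is enough in this toy case, which reflects the reduced training data and training time that motivate transfer learning.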
Deep transfer learning is applicable in various application areas such as natural language processing (NLP), sentiment classification, computer vision, image classification, speech recognition, medical imaging, and spam filtering. It also plays an important role in the domain of cybersecurity due to its modeling advantages, such as saving training time, improving the accuracy of output, and requiring less training data. For instance, the authors in [114] present a ConvNet model using transfer learning for network intrusion detection. In [115], the authors propose a signature generation method based on deep feature transfer learning that dramatically reduces signature generation and distribution time. A higher classification accuracy of 99.5% has been achieved in [116]. The authors of [117] address transfer learning for the identification of unknown network attacks, presenting a feature-based transfer learning approach that uses a linear transformation. A semi-supervised transfer learning model for malware detection is discussed in [118], where the transfer variable improved the byte classifier accuracy from 94.72 to 96.90%. The authors of [119] present the classification of malicious software using transfer learning with the deep neural network ResNet-50. Their experimental results on a sample indicate an accuracy of 98.62% in classifying malware groups. In [120], the authors present deep transfer learning for IoT attack detection with significant accuracy compared to the baseline deep learning technique. Overall, the transfer learning system significantly accelerates the training of very deep neural networks while retaining high efficiency in the field of cybersecurity, even

Deep Reinforcement Learning (DRL or Deep RL)
Deep reinforcement learning (DRL or deep RL) [135] is a category of machine learning and AI in which intelligent machines learn from their actions, similar to the way humans learn from experience. It combines reinforcement learning (RL) algorithms such as Q-learning with deep learning, e.g., neural network learning, as defined below.
• Reinforcement learning (RL) is the task of learning how agents in an environment can take sequences of actions to maximize cumulative rewards. RL considers the problem of a computational agent learning to make decisions by trial and error.
• Deep learning is a form of machine learning that uses multiple layers to progressively extract higher-level features from the raw input and to make intelligent decisions through neural network learning.
Deep RL thus incorporates deep learning models, e.g., a deep neural network (DNN), as policy and/or value function approximators, based on the Markov decision process (MDP) principle [131]. An MDP is "a tuple (S, A, T, R), where S is a set of states, A is a set of actions, T is a mapping defining the transition probabilities from every state-action pair to every possible new state, and R is a reward function which associates a real value (reward) to every state-action pair". Figure 11 provides an example of a deep RL schematic structure. The learning system aims to let the agent learn to produce an optimized series of actions that maximizes the total amount of reward. Deep RL can be used in the domain of cybersecurity. For instance, the authors in [131] demonstrate that deep RL models using a deep Q-network (DQN) and a double deep Q-network (DDQN) give significant intrusion detection results compared with traditional machine learning models. Similarly, a deep RL-based adaptive intrusion detection framework based on DQN for cloud infrastructure has been presented in [132], where the authors experimentally report higher accuracy and low false-positive rates in detecting and identifying new and complex attacks.
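To make the MDP tuple and the trial-and-error learning described above concrete, the following is a minimal sketch of tabular Q-learning on a toy four-state chain. It is not the DQN or DDQN setup of [131] or [132]: the environment, the function names, and the hyperparameters are all illustrative assumptions, and a deep RL method would replace the table `q` with a neural network approximator.

```python
import random

N_STATES, ACTIONS = 4, (0, 1)       # actions: 0 = move left, 1 = move right
GOAL = N_STATES - 1                  # reaching the last state ends an episode

def step(state, action):
    """Deterministic transition T and reward R of the toy chain MDP."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Epsilon-greedy exploration: mostly exploit, sometimes act randomly.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Q-learning update towards reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

In this toy setting the learned greedy policy should select "move right" in every non-terminal state, since that is the only way to collect the reward at the goal state.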
Based on our study above, we have summarized the key points of each neural network and deep learning technique in Table 2. In Table 3, we have also summarized several cybersecurity applications based on these techniques. Moreover, a hybrid network model, e.g., an ensemble of networks, can be used to build an effective model that combines their advantages. For instance, an LSTM network combined with a CNN can be used for detecting cyber-attacks, such as for malware detection [65], or for detecting and mitigating phishing and botnet attacks across multiple IoT devices [136]. Thus, we can conclude that the various artificial neural network and deep learning techniques discussed above, and their variants or modified approaches, can play a significant role in meeting the current needs within the context of cybersecurity.

Challenges and Research Directions
Our study on ANN and DL-based security analytics opens several research issues in the area of cybersecurity. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions to make networks and systems secure, automated, and intelligent.
In general, the effectiveness and efficiency of an ANN and DL-based security solution depend on the nature and characteristics of the security data and on the performance of the learning algorithms. Collecting security data in the domain of cybersecurity is not straightforward. Current cyberspace produces a huge amount of data at very high frequency from different domains. Thus, collecting and managing useful data for the target applications, e.g., security in smart city applications, is important for further analysis. Therefore, a more in-depth investigation of data collection methods is needed when working with cybersecurity data. The historical security data, discussed in Sect. 2, may contain many ambiguous values, missing values, outliers, and meaningless data. The ANN and DL algorithms, including supervised, unsupervised, and reinforcement learning, discussed in Sect. 3, are highly sensitive to the quality and availability of the training data, which in turn affects the resulting security model. Thus, accurately cleaning and pre-processing the diverse security data collected from diverse sources is a challenging task. Therefore, existing pre-processing methods need to be adapted, or new data preparation techniques proposed, to use the learning algorithms effectively in the domain of cybersecurity.
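As a small illustration of the cleaning step discussed above, the sketch below handles missing values and outliers in a single numeric feature. It is one simple option among many, not a recommended pipeline: median imputation and the 1.5 × IQR clipping rule are assumptions chosen for brevity, and the feature column `raw` (e.g., bytes transferred per connection) is invented.

```python
import statistics

def impute_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (the common 1.5*IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

# Hypothetical feature column with two missing values and one extreme outlier.
raw = [120, 130, None, 125, 128, 9000, 122, None, 127]
cleaned = clip_outliers(impute_missing(raw))
print(cleaned)
```

Whether clipping is appropriate depends on the task: in anomaly or intrusion detection an extreme value may be the very signal of interest, so such a rule should be applied only to features where outliers are known to be measurement noise.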
To analyze the data and extract insights, there exist many neural network and deep learning algorithms for building a security model, discussed briefly in Sect. 3. Selecting a learning algorithm that suits the target application in the context of cybersecurity is therefore challenging, because the outcomes of different ANN and DL algorithms may vary depending on the data characteristics [137]. We have also summarized several key points of these techniques in Table 2. Selecting the wrong learning algorithm would produce unexpected outcomes, which may lead to wasted effort as well as reduced model effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. 3 can be used directly to solve many security issues. However, building hybrid network models, e.g., ensembles of networks, improving existing methods, designing new methods, or combining them with machine learning techniques [138] [137] according to the target outcome could be potential future work in the area.
Similarly, irrelevant security data and features may lead to garbage processing as well as incorrect results, which is also an important issue in the area. If the security data is bad, e.g., non-representative, of poor quality, containing irrelevant features, or insufficient in quantity for training, then the deep learning security models may become useless or produce lower accuracy. Thus, relevant and high-quality security data is important for a better outcome. In addition to the security features, broader contextual information [139] [140] [141], such as temporal context, spatial context, or the relationships or dependencies among events, network connections, or users, might help to build an adaptive system. The concept of recent pattern-based analysis, i.e., recency [142], and the design of corresponding learning techniques in cybersecurity solutions could also be effective, depending on the problem domain. Overall, we can conclude that the success of a data-driven security solution depends on both the quality of the security data and the performance of the learning algorithms.

Table 2 Key points of each neural network and deep learning technique

Convolutional neural network (CNN)
• Regularized version of multi-layer perceptrons
• Can automatically learn or detect the key features from data
• Typically deals with the variability of 2D shapes, e.g., images

Long short-term memory recurrent neural network (LSTM-RNN)
• Well-suited for learning and analyzing sequential data
• Preferred for NLP tasks, speech processing, and making predictions based on time-series data

Self-organizing map (SOM)
• Follows an unsupervised learning approach
• A dimensionality reduction algorithm used for clustering and mapping a high-dimensional data set to a low-dimensional one
• Uses competitive learning rather than backpropagation

Auto-encoder (AE)
• An unsupervised learning algorithm that learns a representation of the inputs and is deterministic
• Can significantly reduce the noise in the input data
• Typically used for dimensionality reduction, very similar to PCA

Restricted Boltzmann machine (RBM)
• An unsupervised learning algorithm that learns the statistical distribution and is probabilistic or stochastic
• Used for feature selection and feature extraction
• Constitutes the building blocks of deep belief networks

Deep belief networks (DBN)
• A probabilistic generative model with multiple RBMs
• Able to encode richer and higher-order network structures, and can work in either an unsupervised or a supervised setting
• Can be used in a large number of high-dimensional data applications

Generative adversarial network (GAN)
• A form of generative model typically used for unsupervised learning
• Generates new, synthetic instances of data with characteristics close to the actual data input
• Makes deep learning models more robust

Deep transfer learning (DTL or deep TL)
• Solves the basic problem of insufficient training data
• Uses a pre-trained model, with knowledge transferred from one model to another
• Offers modeling advantages such as saving training time, improving the accuracy of output, and requiring less training data

Deep reinforcement learning (DRL)
• Follows the way humans learn from experience
• Combines reinforcement learning (RL) algorithms like Q-learning and deep learning
• Can be used to solve very complex problems that cannot be solved by conventional techniques

Table 3 Cybersecurity applications based on these techniques

… [129], Hou et al. [130]

Generative adversarial network (GAN)
• Applications: zero-day malware detection, botnet detection, intrusion detection systems
• References: Kim et al. [108], Li et al. [110], Yin et al. [109], Merino et al. [111]

Deep transfer learning (DTL or deep TL)
• Applications: intrusion detection system, detecting unknown network attacks, malware detection, malicious software classification
• References: Wu et al. [114], Zhao et al. [117], Gao et al. [118], Rezende et al. [119]

Deep reinforcement learning (DRL or deep RL)
• Applications: intrusion detection system, malware detection, security and privacy
• References: Lopez et al. [131], Sethi et al. [132], Fang et al. [133], Shakeel et al. [134]

Concluding Remarks
In this paper, we have presented a comprehensive overview of cybersecurity from the perspective of artificial neural networks and deep learning methods. We have also reviewed recent studies in each category of neural networks to position this paper. In line with our goal, we have briefly discussed how various types of neural networks and deep learning methods can be used for cybersecurity solutions under various conditions. A successful security model must use deep learning modeling that is appropriate to the data characteristics. The sophisticated learning algorithms then need to be trained on the collected security data and knowledge related to the target application before the system can assist with intelligent decision making. Finally, we have summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. The identified challenges create promising research opportunities in the field and must be addressed with effective solutions to enhance security over time. Overall, we believe that our study on neural networks and deep learning-based security analytics opens a promising direction and can be used as a reference guide for potential research and applications by both academia and industry professionals in the domain of cybersecurity.