Intention Recognition has been applied at different stages of the DF process as well as in different DF categories. DF involves diverse techniques, tools, data sources, and devices, which in turn leads to complex and varied solutions. This section reviews the papers selected through the systematic search, categorized by three major modeling approaches: logic-based, classical machine learning, and deep learning.
4.1. Logic Based
A logic-based approach, also known as symbolic AI, is a methodology that uses formal languages such as logic to represent knowledge and to reason about problems and domains. Such systems encode human knowledge in a compact and usable manner and manipulate symbols to make deductions and inferences based on predefined rules. They can also learn new knowledge from examples and existing domain knowledge.
Abduction, hybrid logic-probabilistic reasoning, and causal reasoning are some examples of logic-based approaches. Abduction is a form of logical reasoning that starts with one or more observations and then seeks the most likely explanation or conclusion for them. Abductive reasoning is useful for commonsense reasoning, diagnosis, planning, and natural language understanding. Hybrid logic-probabilistic approaches combine logic and probability to handle uncertainty and complexity. Causal reasoning is an approach that uses causal relationships to infer the effects of actions, events, or interventions. It can also be used to explain why something happened or to predict what will happen under different scenarios. Causal reasoning is based on the assumption that there are causal mechanisms that govern the behavior of systems and that these mechanisms can be represented by causal models, such as causal graphs, causal networks, or structural causal models. In this section, we review the papers that employ a logic-based approach for modeling IR in DF and related domains.
X. Cheng et al. [19] address the problem of cyber situation comprehension for Internet of Things (IoT) systems, which are vulnerable to advanced persistent threat (APT) attacks, by utilizing the concepts of intention recognition. They argue that existing methods for cyber situation awareness are not suitable for IoT systems, as they do not consider the semantic and logical relationships among different types of data. Therefore, they propose a similarity-based method for the comprehension of APT attacks in IoT environments. In order to do this, they built a framework called APTALCM, which consists of an ontology of the APT potential attacks and two modules for alert and log correlation. The ontology models the concepts and properties to formalize APT attack activities in IoT systems. It depicts the attacks using the classes (alerts and logs), attributes, domain, relationships among instances, and similarity of instances. They use an alert class with seven attributes and six log classes with 19 attributes to calculate the similarity within each class. The alert and log correlation modules use a similarity-based method based on SimRank to recognize the APT attack intentions and scenarios. SimRank is a general similarity measure that exploits the object-to-object relationships in graphs, based on the idea that “two nodes are similar if they are pointed to (have incoming edges) from similar nodes”. The alert correlation module uses SimRank to reconstruct APT attack scenarios by measuring the similarity between alert instances. In contrast, the log correlation module uses SimRank to detect log instance communities by measuring the similarity between log instances. As a result, APTALCM can accomplish the cyber situation comprehension effectively by recognizing the APT attack intentions in the IoT systems.
The exhaustive experimental results demonstrate that the two kernel modules, i.e., Alert Instance Correlation Module (AICM) and Log Instance Correlation Module (LICM) in APTALCM achieve a low false positive rate of 4.2% and a high true positive rate of 83.7%.
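The SimRank idea quoted above can be illustrated with a minimal sketch. This is the generic iterative SimRank recurrence, not the APTALCM implementation; the decay factor, the iteration count, and the toy graph (`host`, `alert_1`, `alert_2`) are assumptions chosen only for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Iterative SimRank on a directed graph.

    in_neighbors maps every node to the list of nodes that point to it.
    decay and iterations are illustrative defaults, not values from [19].
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # Nodes without incoming edges are similar to nothing else.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity of all pairs of in-neighbors, scaled by decay.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: two alert instances pointed to by the same host.
graph = {"host": [], "alert_1": ["host"], "alert_2": ["host"]}
scores = simrank(graph)
print(scores[("alert_1", "alert_2")])
```

Because both alerts share their only in-neighbor, they receive the maximum score allowed by the decay factor, which matches the intuition that instances referenced by similar nodes are themselves similar.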
R. Mirsky et al. [20] proposed two new metric-based algorithms for goal recognition in network security by adapting previously proposed planner-based algorithms. The first algorithm is Plan Edit Distance (PED), which calculates the distance metric between the optimal plan and the observation sequence without requiring online planner execution. The second algorithm is Alternative Plan Cost (APC), which finds the minimal mapping from the states visited by the attacker to the states in the optimal plan. They experimented on a network of 60 hosts and compared five algorithms: PED, APC, two planner-based algorithms proposed by previous researchers, and one planner-based algorithm modified to run offline. The experiments confirmed that PED and APC outperformed the planner-based algorithms in terms of prediction quality, robustness to noisy observations, and running time. However, the planner-based algorithms were shown to be more robust to missing observations.
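To make the notion of a distance between an optimal plan and an observation sequence concrete, the following sketch ranks candidate goals by plain Levenshtein distance. This is a stand-in under stated assumptions: the exact metric used by PED in [20] is not reproduced here, and the goal names and action labels are hypothetical.

```python
def edit_distance(plan, observations):
    """Levenshtein distance between two action sequences."""
    prev = list(range(len(observations) + 1))
    for i, planned in enumerate(plan, start=1):
        curr = [i]
        for j, observed in enumerate(observations, start=1):
            cost = 0 if planned == observed else 1
            curr.append(min(prev[j] + 1,          # drop a planned action
                            curr[j - 1] + 1,      # extra observed action
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_goals(optimal_plans, observations):
    """Order candidate goals from closest to farthest precomputed plan."""
    return sorted(optimal_plans,
                  key=lambda goal: edit_distance(optimal_plans[goal], observations))


# Hypothetical precomputed optimal plans, one per candidate goal.
plans = {
    "exfiltrate_db": ["scan", "exploit_web", "pivot_db", "dump"],
    "deface_site": ["scan", "exploit_web", "modify_index"],
}
# One step of the attack was not observed.
observed = ["scan", "pivot_db", "dump"]
ranking = rank_goals(plans, observed)
print(ranking)
```

Since the plans are computed once in advance, ranking goals at observation time only requires the distance computation, which is the property that makes metric-based recognition attractive for online use.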
B. Chen et al. [21] propose an attack graph-based method to recognize the intention of attackers in network security, especially for complex and multi-step attacks. In the first step of their method, they identify the key assets in the network by assessing the confidentiality, integrity, and availability (CIA) requirements of each asset and ranking the assets according to their security importance. Then, they generate hypothetical attack intents based on the security requirements of the key asset and the network topology. An attack intent is defined as a specific goal that an attacker wants to achieve by exploiting the vulnerabilities in the network. Next, they adopt an attack path graph generation algorithm based on vulnerability attributes, network accessibility, and a causality model. An attack path graph is a directed graph that represents the possible attack paths from the attacker’s entry point to the target asset. Finally, they identify the network attack intent by employing qualitative and quantitative attack intent analysis. The qualitative analysis matches the attack path information to a corresponding attack intent, while the quantitative analysis quantifies the degree of concealment of vulnerabilities, the probability of successful utilization, and the similarity between the attack path and the hypothetical attack intent. They also conduct an experiment involving three network domains and eight hosts and show that their method can successfully identify the intents of attackers.
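As an illustration of what an attack path graph encodes, the sketch below enumerates all simple paths from an entry point to a target asset and scores each path by the product of per-step success probabilities. The graph, the host names, and the probabilities are hypothetical, and the scoring rule is a simplifying assumption rather than the quantitative analysis of [21].

```python
def attack_paths(graph, source, target, path=None):
    """Enumerate all simple paths from source to target by depth-first search."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for nxt in graph.get(source, {}):
        if nxt not in path:  # never revisit a host on the same path
            paths.extend(attack_paths(graph, nxt, target, path))
    return paths


def path_probability(graph, path):
    """Multiply the assumed success probability of every step on the path."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= graph[a][b]
    return prob


# Hypothetical network: an edge a -> b carries the assumed probability that an
# attacker on a successfully exploits a vulnerability on b.
graph = {
    "internet": {"web_server": 0.8, "mail_server": 0.5},
    "web_server": {"db_server": 0.6},
    "mail_server": {"db_server": 0.4},
}
paths = attack_paths(graph, "internet", "db_server")
best = max(paths, key=lambda p: path_probability(graph, p))
print(best, path_probability(graph, best))
```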
A. Shinde et al. [22] proposed a model for cyberattack intent recognition using the interactive partially observable Markov decision process (I-POMDP), a framework for modeling strategic interactions under uncertainty. They applied their model to a cyber deception domain, where the defender and the attacker interact on a single honeypot host system. They considered three types of attackers with different objectives and preferences: the data exfil attacker, who aims to steal sensitive data; the data manipulator, who aims to modify critical data; and the persistent threat, who aims to maintain a strong presence for future attacks. Their model actively deceives the attacker by providing fake data and observes the attacker’s reactions to infer their behavior and intent. Their model also estimates the attacker’s beliefs, capabilities, and preferences, and uses them to calculate how the deception affects the attacker’s mental state. They conducted simulation-based and agent-based experiments to compare their model with other strategies for intent recognition. They showed that their model can effectively recognize the attacker’s type and intent, and provide appropriate deception strategies. They claim their model achieved significantly higher accuracy and robustness in predicting the attacker’s actions and goals than the other commonly known strategies.
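Inferring an attacker type from observed reactions can be sketched as a Bayesian belief update over the three types named above. This is only the belief-tracking ingredient, not an I-POMDP solver, and the observation labels and likelihood values are assumptions made up for the example.

```python
def update_belief(belief, likelihoods, observation):
    """One Bayes step: posterior(type) is proportional to prior * P(obs | type)."""
    unnormalized = {
        t: belief[t] * likelihoods[t].get(observation, 0.0) for t in belief
    }
    total = sum(unnormalized.values())
    if total == 0:
        return dict(belief)  # observation impossible under every type: keep prior
    return {t: p / total for t, p in unnormalized.items()}


# Hypothetical likelihoods of observing each action for each attacker type.
likelihoods = {
    "data_exfil": {"read_file": 0.7, "write_file": 0.1, "install_service": 0.2},
    "data_manipulator": {"read_file": 0.3, "write_file": 0.6, "install_service": 0.1},
    "persistent_threat": {"read_file": 0.2, "write_file": 0.2, "install_service": 0.6},
}

# Uniform prior over the three attacker types.
belief = {t: 1.0 / 3.0 for t in likelihoods}
for action in ["read_file", "read_file"]:
    belief = update_belief(belief, likelihoods, action)
print(max(belief, key=belief.get))
```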
D. Kim et al. [23] proposed an attack detection application for the Android OS to protect users’ personal information from theft. The application uses an attack tree approach to detect the intention of the attacks. The algorithm has two phases: pre-phase and post-phase. The pre-phase consists of four steps: collect, normalize, create a tree, and apply levels. In the pre-phase, the attack intents are categorized into three classes: interception, modification, and system damage. Interception attacks aim to steal personal information from the user’s device, such as passwords, credit card details, or other sensitive data. Modification attacks aim to alter the user’s data or settings, such as changing the user’s password or modifying the user’s contacts. System damage attacks aim to damage the user’s device or the system, such as deleting files or rendering the device unusable. The post-phase also consists of four steps: log collect, compare & analyze, visualize, and warn or block. The system was tested using two attacks, smishing (SMS phishing) and backdoor, and successfully detected them.
The work by X. Zhang et al. [24] introduces an approach for recognizing attack intentions in network security. Their research rests on the premise that the dynamics of attack-defense interactions resemble a strategic game, characterized by opposition, non-cooperation, and strategy-dependent decision-making. To uncover the true intents behind network attacks, the authors propose a framework grounded in signaling game theory. They identify key assets, categorize the possible attacks on each key asset, and map attackers’ intents to security requirements (CIA) to generate hypotheses of attack intentions. They then compute the probability associated with each attack intention by solving for the game equilibria. To validate their approach, they employ NetLogo simulations, which they report as empirical evidence of its effectiveness. The authors claim that the method effectively improves the accuracy of attack intention recognition.
Table 2.
Summary of Research on IR in DF and Cybercrime: using Logic Based Method.
| Article | Sub-Domain | Approach | Intent Level | Accuracy |
| --- | --- | --- | --- | --- |
| 2019, R. Mirsky et al. [20] New Goal Recognition Algorithms Using Attack Graphs | Network Security | Attack graph, metric-based algorithm | Plan | Online test in seconds: R&G+SC: 0.6578, PED: 0.0002, AED: 0.3246 |
| 2019, X. Cheng et al. [19] Cyber Situation Comprehension for IoT Systems based on APT Alerts and Logs Correlation | APT on IoT | Similarity Based | Intent | True Positive: 83.7%, False Positive: 4.2% |
| 2019, D. Kim et al. [23] Attack detection application with attack tree for mobile phone using log analysis. | Mobile forensics | Attack tree | Intent | - |
| 2020, B. Chen et al. [21] Attack Intent Analysis Method Based on Attack Path Graph | Network Security | Attack Path Graph | Intent | - |
| 2021, A. Shinde et al. [22] Cyber-attack intent recognition and active deception using Factored Interactive POMDPs | Network Security | Partially observable Markov decision process | Intent | - |
| 2021, X. Zhang et al. [24] Network Attack Intention Recognition Based on Signaling Game Model and Netlogo Simulation | Network Security | Signaling Game Model | Intent | - |
4.1.1. Summary
The logic-based approach remains the prevailing method for addressing the challenge of intention recognition within digital forensics and related domains. This preference may stem from the domain’s inherent need for explainability: digital forensics investigators are tasked with explaining the rationale behind a suspect’s culpability, and this approach provides a structured framework for explaining both why and how conclusions are derived. Over the past years, this approach has consistently dominated the field, as highlighted by F. Van-Horenbeke et al. [13].
An analysis of the available literature (as listed in the table) reveals that the majority of research efforts on IR in DF center on the sub-domain of network security. These studies primarily analyze various alerts and network traffic data. Notably, the work by X. Cheng et al. [19] focuses on intention recognition in the Advanced Persistent Threat (APT) on IoT sub-domain, while D. Kim et al. [23] address intent detection in mobile security. This shows that there are notable gaps in the application of IR technology across different categories within DF. Furthermore, most works focus on the intention recognition level, while the work by R. Mirsky et al. [20] operates at the higher level of plan recognition. In contrast, no study focuses on malicious activity detection, which operates at a more granular level.
The logic-based approach, while valuable for intention recognition, faces several challenges and limitations. First, scalability remains an issue; these AI systems can be computationally expensive and struggle to handle large and complex domains, especially when dealing with uncertainty, inconsistency, or incomplete information. Second, integration poses difficulties; logic-based methods may not seamlessly combine with other AI techniques, such as sub-symbolic approaches (e.g., neural networks) or hybrid models that leverage the strengths of both paradigms. Third, while logic-based systems are generally more interpretable than sub-symbolic counterparts, they can still be too abstract or complex for human understanding. Unfamiliar symbols, technical jargon, or lengthy proofs may hinder trust in their results. Fourth, the inherent rigidity of rule-based systems demands that cases neatly fit predefined rules for accurate identification. Finally, the manual introduction of new knowledge by experts is a necessity. However, in extensive and intricate domains, this reliance on human expertise introduces the risk of errors and limitations in keeping up with evolving scenarios.
4.2. Classical Machine Learning
Classical machine learning approaches use statistical methods and machine-learning techniques to learn patterns and models from data that can be used to recognize the actions and intents of the observed agent. They usually do not require much domain knowledge or human intervention, but they need a large amount of labeled data to train the models. They can handle uncertainty and noise in the data, but they may not capture the underlying structure and semantics of the problem domain. They also may not generalize well to new or unseen situations. These algorithms can be further divided into two categories: supervised learning and unsupervised learning.
In supervised learning, the algorithm is trained on labeled data, where the correct answer is provided to the algorithm. Some widely used supervised learning algorithms include k-Nearest Neighbor, Support Vector Machines, Decision Tree, and Logistic Regression. The first three algorithms are used for both classification and regression tasks, while logistic regression, despite its name, is used for classification. k-Nearest Neighbor works by finding the k nearest data points to the input data point and then classifying the input data point based on the majority class of those neighbors. Support Vector Machines (SVM) work by finding the hyperplane that best separates the data points into different classes. The hyperplane is chosen such that the margin between the hyperplane and the closest data points from each class is maximized. Decision Tree works by recursively splitting the data into subsets based on the values of the input features until a stopping criterion is met. The stopping criterion can be a maximum depth, a minimum number of samples per leaf, or a minimum reduction in impurity. Logistic Regression works by modeling the probability of the input data point belonging to a certain class using a logistic function that maps any real-valued input to a value between 0 and 1, which can be interpreted as a probability.
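The k-Nearest Neighbor vote and the logistic function described above can be sketched in a few lines. The feature vectors, the labels, and the choice of k = 3 are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-feature training set of labeled events.
train = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((5.0, 5.0), "malicious"),
    ((5.5, 4.5), "malicious"),
]
print(knn_predict(train, (5.2, 4.8)))
print(sigmoid(0.0))
```

Logistic regression passes a weighted sum of the features through `sigmoid` and thresholds the result, which is why it acts as a classifier.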
On the other hand, unsupervised learning algorithms are used to find patterns in data without labeled examples. Some widely used unsupervised learning algorithms are described next. K-Means Clustering works by partitioning the data into k clusters based on the similarity of the data points. The algorithm starts by randomly selecting k centroids and then iteratively assigns each data point to the nearest centroid. The centroids are then updated based on the mean of the data points assigned to them, and the process is repeated until convergence. Hierarchical Clustering, in its agglomerative form, works by creating a hierarchy of clusters by recursively merging the most similar clusters. The algorithm starts by treating each data point as a separate cluster and then iteratively merges the two closest clusters until all the data points belong to a single cluster. These two algorithms are used for clustering tasks. Principal Component Analysis (PCA) works by finding the principal components of the data, which are the directions in which the data varies the most. The algorithm then projects the data onto these principal components, reducing the dimensionality of the data while retaining most of the information. The t-Distributed Stochastic Neighbor Embedding (t-SNE) works by mapping high-dimensional data to a low-dimensional space while preserving local neighborhood structure, so that points that are close in the original space remain close in the embedding. The algorithm is particularly useful for visualizing complex, nonlinear structures in the data. We review studies that utilize the classical machine learning approach in this section.
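The assign-and-update loop of K-Means can be sketched as follows. To keep the example reproducible, the initial centroids are passed in explicitly instead of being drawn at random; that choice, the toy points, and the iteration cap are assumptions of this sketch.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional data with two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```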
A. Ahmed et al. [10] proposed a method for recognizing the intentions of cyber attackers based on similarity analysis. They defined two types of attack intentions: General and Specific. The general intentions correspond to the security objectives of availability, confidentiality, and integrity, while the specific intentions refer to the actual attacks or violations such as DDoS. The main contribution of their paper is the creation of attack patterns, which are the key to intention recognition. The attack patterns are constructed by extracting the features of the main attributes of the known attacks and formulating them as evidence. The second contribution is the improvement in the process of investigating the similarity between the created patterns and the new attacks, which is the core of their method. They devised a similarity metric-based algorithm using the fuzzy min-max (FMM) neural network technique. The algorithm compares a new attack with the existing attack patterns and evaluates the level of similarity between them to identify the attacker’s intentions. Their method is able to create a new class of signature or pattern if the new attack is not similar to any of the existing patterns. The authors claimed that their method provides useful information and increases the possibility of recognizing attack intentions in advance by eliminating similar cases using the FMM neural network model. They tested their method on a subset of the page block dataset and demonstrated its high accuracy and efficiency.
Criminals often use slang expressions to communicate, plan, and execute their illicit activities online. To capture the hidden meanings and intentions behind these expressions, Ricardo R. de Mendonça et al. [25] proposed a framework to detect and classify criminal intentions in social media texts ciphered with slang. The framework, called Ontology-Based Framework for Criminal Intention Classification (OFCIC), combines Semantic Web, Semiotics, Speech Act Theory, and Machine Learning techniques to select, decipher, and classify posts with criminal slang expressions according to their illocutionary classes, which are the types of speech acts that convey the speaker’s intention. The framework consists of four main steps: (1) data collection and preprocessing, (2) ontology-based post selection, (3) ontology-based post deciphering, and (4) intention classification. The framework utilizes machine learning models such as SVM, Neural Networks, and Random Fields to classify the texts according to their criminal intent. They show that their framework can effectively identify posts with criminal slang expressions, translate them into standard language, and classify them into eight illocutionary classes: Proposal, Inducement, Forecast, Wish, Assertion, Valuation, Palinode, or Contrition. The authors evaluated the framework on a dataset of 8.8 million tweets and demonstrated its effectiveness in automatically classifying criminal intentions from social media texts containing slang. The paper contributes to the field of cybercrime prevention by providing a comprehensive and interdisciplinary approach to analyzing slang-ciphered social media texts in Portuguese.
The article by S. Abarna et al. [26] presents an algorithm for detecting cyber harassment and intention from text on social media platforms, using Instagram comments as a case study. The paper utilizes a conventional scheme that analyzes the lexical meaning of the text using natural language processing techniques, and a FastText model that captures the word order of the text. The authors perform various preprocessing steps to normalize and contextualize the text, and then employ a Bag of Words (BOW) model and a Word2Vec technique to transform the words into vectors. To identify the intention of the comments, such as bullying, threatening, or trolling, they use a probabilistic similarity technique that compares the vector representations of the words. The authors also devise a score for intention detection that incorporates the frequency of words and the bully-victim participation score, which quantifies the degree of engagement of the users in the cyber harassment scenario. They evaluate the effectiveness of their algorithm using various metrics and benchmark it against seven existing methods, including RF, SVM, and Bi-LSTM. They report that their algorithm outperforms all the other methods in terms of precision, recall, and F1-score. The authors conclude that their algorithm achieves higher accuracy and a lower error rate than the state-of-the-art methods and that it can robustly detect cyber harassment and its intention on social media platforms.
T. Li et al. [27] proposed an approach to recognize multi-step attacks by employing a hidden Markov model with probabilistic reasoning. Because the steps of a multi-step attack are interrelated, they employed temporal relationships to accurately capture the internal relationship between different attacks. To account for the dynamic characteristics of the network, they employed runtime rule updating. Furthermore, rather than analyzing the intent of each attack, they consider higher-level intrusion intent recognition and apply probabilistic reasoning. They built three algorithms: the Parameter Estimation Algorithm, which estimates the parameters of the HMM for alert correlation; the Attack Intent Inference Algorithm, which infers the attack intent from the observation sequence; and the Attack Prediction Algorithm, which analyzes the possible attack sequence for attack prediction. They built three models: the Hidden Markov Model (HMM), the HMM with Probabilistic Inference (HMM-PI), and the HMM-PI with Updated Conditional Probability Table (CPT) Model (HMM-PI-UCM). They experimented with the LLDOS1.0 dataset from MIT and compared the three models; the HMM-PI-UCM model performed best.
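How a hidden Markov model turns an alert sequence into a sequence of attack stages can be illustrated with standard Viterbi decoding. This is a textbook sketch, not the inference algorithm of [27]; the stage names, the alert labels, and all probabilities are hypothetical.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Hypothetical attack stages (hidden) and alert types (observed).
states = ["recon", "exploit", "ddos"]
start_p = {"recon": 0.8, "exploit": 0.1, "ddos": 0.1}
trans_p = {
    "recon": {"recon": 0.5, "exploit": 0.4, "ddos": 0.1},
    "exploit": {"recon": 0.1, "exploit": 0.5, "ddos": 0.4},
    "ddos": {"recon": 0.1, "exploit": 0.1, "ddos": 0.8},
}
emit_p = {
    "recon": {"ping_sweep": 0.8, "buffer_overflow": 0.1, "flood": 0.1},
    "exploit": {"ping_sweep": 0.1, "buffer_overflow": 0.8, "flood": 0.1},
    "ddos": {"ping_sweep": 0.1, "buffer_overflow": 0.1, "flood": 0.8},
}
alerts = ["ping_sweep", "buffer_overflow", "flood"]
stages = viterbi(alerts, states, start_p, trans_p, emit_p)
print(stages)
```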
Table 3.
Summary of Research on IR in DF and Cybercrime: Using Classical Machine Learning Method.
| Article | Sub-Domain | Approach | Intent Level | Accuracy |
| --- | --- | --- | --- | --- |
| 2018, A. Ahmed et al. [10] SAIRF: A similarity approach for attack intention recognition using fuzzy min-max neural network | General Attack | Fuzzy min-max neural network | Intent | 94.74% |
| 2020, R. de Mendonça et al. [25] A framework for detecting intentions of criminal acts in social media: A case study on Twitter | Social Media | Similarity Based | Intent | True Positive: 83.7, False Negative: 4.2 |
| 2020, T. Li et al. [27] Attack plan recognition using hidden Markov and probabilistic inference | General Attack | Hidden Markov | Plan | - |
| 2022, S. Abarna et al. [26] Identification of cyber harassment and intention of target users on social media platforms | Social Media | Similarity | Intent | Precision: 91.45% |
4.2.1. Summary
Researchers employ the classical machine learning-based approach to address the limitations of logic-based methods, particularly those related to rigidity and manual knowledge encoding. Additionally, this approach is well suited to handling uncertainty, as it leverages probability. The introduction of probability also proves valuable in managing partial observability and handling noisy data.
The sub-domains covered have shifted significantly, from a focus primarily on network security (in the case of logic-based approaches) to a broader range of attack cases [10, 27]. Additionally, researchers have studied the identification of intents related to social media use [25, 26]. Notably, the work by T. Li et al. [27] stands out as it operates at the higher level of plan recognition, while the remaining studies primarily address intent or goal recognition.
However, this method also faces several limitations. Some of these are akin to those of logic-based approaches, including scalability issues, since probabilistic models become harder to scale as problems grow. Additionally, as the number of parameters increases, manual input becomes necessary. Furthermore, the approach has specific limitations, notably a lack of explainability, as understanding how conclusions are inferred can be challenging. This becomes particularly critical in applications related to digital forensics, where explainability is a mandatory requirement.
4.3. Deep Learning
Deep Learning approaches use deep neural networks to learn high-level features and representations from data that can be used to recognize the actions, plans, and goals of the observed agent. They usually do not require any domain knowledge or feature engineering, but they need a huge amount of labeled data to train the networks. They can handle complex and multimodal data, but they may not be interpretable or explainable. They also may overfit the data or suffer from catastrophic forgetting.
Some widely used deep learning algorithms are the following. Convolutional Neural Networks (CNNs) are deep learning networks that are commonly used for image recognition tasks. They work by applying convolutional filters to the input image to extract features and then passing these features through a series of fully connected layers to make a prediction. Recurrent Neural Networks (RNNs) are deep learning networks that are commonly used for sequence prediction tasks such as speech recognition and natural language processing. They work by processing the input sequence one element at a time and maintaining an internal state that captures the context of the sequence. Long Short-Term Memory Networks (LSTMs) are a variant of RNNs used for the same kinds of sequence prediction tasks. They maintain an internal state that captures the context of the sequence, controlled by gates that allow the network to retain information over long sequences, and use this state to make predictions. Generative Adversarial Networks (GANs) are deep learning networks that are used for generating new data that is similar to the training data. They work by training two networks: a generator network that generates new data and a discriminator network that tries to distinguish between the generated data and the real data. The two networks are trained together in a process called adversarial training. Different researchers have applied these algorithms to solve IR challenges in the DF domain, and we dedicate this section to reviewing them.
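The filtering step at the core of a CNN can be sketched as a sliding dot product between a small kernel and the image (the cross-correlation that deep learning frameworks call convolution). The toy image and the hand-picked edge kernel are assumptions; in a trained network the kernel values are learned from data.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 3x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to left-to-right intensity increases.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)
```

The feature map is non-zero only where the kernel straddles the edge, which is the sense in which a filter extracts a feature.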
U. Navalgund et al. [28] proposed a deep learning-based system that can detect criminal intentions in real-time videos and images captured by CCTV cameras in various locations. The system aims to enhance the crime control and prevention capabilities of the existing surveillance infrastructure. The system employs and evaluates different pre-trained models, such as VGGNet-19 and GoogleNet InceptionV3, to identify and localize objects of violence, such as guns and knives, in the input data. The experimental results show that VGGNet-19 outperforms GoogleNet InceptionV3 in terms of accuracy and efficiency in detecting crime objects and inferring criminal intents. They also use Faster RCNN to draw bounding boxes over the detected guns and knives. Furthermore, the system incorporates an SMS alert mechanism that notifies the relevant authorities when potential crimes are detected.
R. Pandey et al. [11] proposed a distributional semantic approach to detect malicious intent in Twitter conversations related to sexual assault. The authors aimed to detect the intention by building a typology for malicious intent using social construction theory. The typology includes three categories of intent: accusational, validational, and sensational. The accusational category refers to messages that accuse someone of sexual assault or harassment. The validational category refers to messages that validate the experience of sexual assault or harassment. The sensational category refers to messages that focus more on politics or provocation than on the issue of rape or sexual assault. The authors adopted a convolutional neural network to model the system and tested their model using Twitter messages collected over four months. They compared their model against several baseline models and found that their system performed better.
In order to detect query-based adversarial black-box attacks on deep neural networks (DNNs) at an early stage, R. Pang et al. [29] introduce a model called AdviMind. The model has three variants. The Naive Intent Estimator serves only as a passive observer of the adversaries’ queries; it provides a baseline understanding of intent but lacks robustness and proactive features. The Robust Intent Estimator builds upon the naive model and is capable of identifying fake queries even in the presence of adversarial noise, so it maintains reliability while estimating intent. Proactive Intent Solicitation, the most advanced variant, not only estimates intent robustly but also actively prompts adversaries to reveal their true intent. By synthesizing query results, it deters successful attacks and achieves early-stage detection. Empirical evaluation on different datasets demonstrates that these models can detect attack intents with an accuracy of over 75% after observing fewer than 3 query batches. Additionally, they increase the query cost of adaptive attacks by more than 60%.
The paper by J. Zhao et al. [30] aims to demystify cyber attack intent by analyzing the preferences of intruders using a framework called HinAp. The framework uses attributed heterogeneous attention networks and transductive learning to analyze the attack preferences of intruders. They first build an attributed heterogeneous information network (AHIN) of attack events to model attackers, vulnerabilities, exploited scripts, compromised devices, and 20 types of meta-paths describing interdependent relationships among them, in which attribute information of vulnerabilities and exploited scripts is embedded. Then, they propose an attack preference prediction model based on an attention mechanism and transductive learning. They collected social data to train and test their model. Finally, an automated model for predicting cyber attack preferences is constructed by stacking these two basic prediction models, which are capable of integrating more comprehensive and complex semantic information from meta-paths and meta-graphs to characterize the attack preferences of intruders. They compared their model with six other models, and their model outperformed all of them.
T. Hsu et al. [31] proposed an approach to detect malicious activity in physical environments. The method aims to reduce the risk of malicious activities by combining three fundamental defense systems: access control, surveillance, and host defense. First, they employed a multilayer perceptron (MLP) model to identify anomalies in access control systems by analyzing login attempts and the duration of successful logins. Second, they applied natural language processing (NLP), specifically Word2Vec and deep learning, to detect anomalies arising from executed commands. Third, they used the YOLOv5 object detection model to identify unauthorized entry by monitoring physical spaces. To assess the proximity of individuals to restricted areas, they employed overlap measures such as Intersection Over Union (IOU) and Intersection Over Area (IOA), which help determine whether people are accessing unauthorized zones. Finally, they integrated the results from all three anomaly detection components, aggregating threat scores to generate a comprehensive malicious activity alarm. The authors ran experiments on their model and claimed that the method successfully detected malicious activity.
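The two overlap measures mentioned above can be illustrated with a minimal Python sketch. It is not the implementation of [31]: the `Box` type, the function names, and the example coordinates are hypothetical, and normalising IOA by the area of the person's box is an assumption, since the review does not state which area is used.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box given by two opposite corners (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(box: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, box.x2 - box.x1) * max(0.0, box.y2 - box.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union: overlap divided by the combined area."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def ioa(person: Box, zone: Box) -> float:
    """Intersection over Area.

    Assumption: the overlap is normalised by the person's box, so the value
    is the fraction of the detected person that lies inside the zone.
    """
    person_area = area(person)
    return intersection_area(person, zone) / person_area if person_area > 0 else 0.0


# Hypothetical detection and restricted zone, in pixel coordinates.
person = Box(2, 2, 6, 6)
restricted_zone = Box(4, 0, 10, 10)
person_zone_iou = iou(person, restricted_zone)
person_zone_ioa = ioa(person, restricted_zone)
```

With these example boxes, half of the person's box lies inside the zone even though the IOU is small, because the zone is much larger than the person. This is one plausible reason to report both measures.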
J. Kang et al. [32] proposed a framework called ActDetector that automatically detects attack activities from raw Network Intrusion Detection System (NIDS) alerts, which can greatly reduce the workload of security analysts. The framework consists of three components: an extractor, an embedder, and a classifier. The extractor derives attack phase descriptions using a knowledge base of adversary tactics and techniques. The embedder uses doc2vec to obtain a numerical representation of the attack phase descriptions. Finally, the classifier employs a temporal-sequence-based LSTM model to detect the attack activity type from the attack activity description. The authors evaluate ActDetector on three datasets. Experimental results demonstrate that ActDetector can detect attack activities from raw NIDS alerts with an average of 94.8% precision, 95.0% recall, and 94.6% F1-score.
The paper by N. Tsinganos et al. [33] proposes CSE-PersistenceBERT, a transfer learning-based model that detects the persistence of chat-based social engineering (CSE) attacks, which are malicious attempts to manipulate the behavior of online users by exploiting their psychological vulnerabilities. The authors argue that persistent CSE attackers use different chat texts to achieve the same malicious goal, such as phishing, fraud, or malware installation, and that recognizing this persistence is an important step in preventing the attacks from succeeding. They adapt BERT-base, a pre-trained language model that has shown strong results in various natural language processing tasks, and fine-tune it on a small corpus that they created, called CSE-Persistence, which contains more than 16 thousand pairs of chat texts annotated as similar, identical, or different in terms of their intentions. They evaluate CSE-PersistenceBERT on a test set of CSE-Persistence and compare it with BERT-base, reporting that it outperforms BERT-base in terms of accuracy, precision, recall, and F1-score. The CSE-PersistenceBERT model can be used as one component of a general CSE attack detection system that alerts users or administrators to potential threats and helps prevent them from falling victim to CSE attacks.
To extend chat-based social engineering (CSE) attack detection, N. Tsinganos et al. [34] proposed a deep learning-based model that recognizes the intentions of CSE attacks using dialogue state tracking. They created an ontology and a small corpus called SG-CSE and, building on BERT, developed a model called SG-CSE BERT. They evaluated the model on this corpus and reported promising results.
Q. Tang et al. [35] present a method for detecting the attack intentions of malicious actors in power systems using graph convolutional networks (GCNs). Their proposed model, called Attack Intention Detection for Power System Using Graph Convolutional Networks (AIGCN), consists of two main steps. First, they identify abnormal IPs based on their log execution behaviors, using four-tuples of destination IP, destination port, event time, and protocol. This step filters out normal IPs and reduces noise in the data. Second, they construct an attack graph from the interactive relationships among the abnormal IPs and apply a GCN model to learn the patterns and classify the attack intentions. This step leverages the graph structure and the node features to capture the complex and dynamic behaviors of the attackers. They evaluate their model on two datasets that they prepared from real-world network logs and compare it with five baseline methods, such as LSTM and BERT. The results show that AIGCN achieves a precision of 97.34% and 98.25% on the two datasets and outperforms the baseline methods, which demonstrates the effectiveness and robustness of the AIGCN model for detecting attack intentions in power systems.
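To make the second step more concrete, the following is a minimal sketch of a single graph convolution over a small graph of abnormal IPs. It assumes the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the review does not state which GCN variant AIGCN uses, and the graph, features, and weights below are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops and apply symmetric degree normalisation."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]  # >= 1 because of the self-loops
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: aggregate neighbour features, transform, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical graph of three abnormal IPs: IP0 - IP1 - IP2 (undirected chain).
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
# One hypothetical feature per node; only IP0 carries a signal.
node_features = [[1.0], [0.0], [0.0]]
# 1x1 weight matrix; in a trained model these values would be learned.
layer_weights = [[1.0]]

hidden = gcn_layer(adjacency, node_features, layer_weights)
```

After one layer, the signal on IP0 has spread to its neighbour IP1 but not to IP2, which is two hops away. Stacking layers widens this neighbourhood, which is how a GCN can combine node features with the structure of the attack graph.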
A. Bhugul et al. [36] proposed a deep learning model for detecting suspicious activities, such as bank robbery, in private settings. While security cameras are already commonplace, automated detection techniques need real-time reaction and 24/7 monitoring. The study also addresses the need for preventive measures against gunshots and terrorist attacks in public areas with heavy foot traffic. Its focus is on identifying suspicious human activity related to weapons. Specifically, the authors consider two cases: a person with a weapon (gun), and a person wearing a helmet with a weapon. They introduce an algorithm for multiple gun detection that uses a modified dense deep neural network (CNN) model to detect guns in video frames. The time complexity of the model across various hardware platforms is also explored. The proposed system is reported to detect all types of guns with 99.3% accuracy, outperforming existing methods such as YOLO v3, v4, v5, and SVM.
Table 4.
Summary of Research on IR in DF and Cybercrime: using Deep Learning Method.
| Article | Sub-Domain | Approach | Intent Level | Accuracy |
| --- | --- | --- | --- | --- |
| 2018, U. Navalgund et al. [28] Crime Intention Detection System Using Deep Learning | CCTV | Transfer learning | Intent | 92% |
| 2018, R. Pandey et al. [11] Distributional Semantics Approach to Detect Intent in Twitter Conversations on Sexual Assaults | Social media | Distributional semantics and CNN | Intent | - |
| 2020, R. Pang et al. [29] AdvMind: Inferring Adversary Intent of Black-Box Attacks | Black-box attack | DL | Intent | 75% |
| 2021, J. Zhao et al. [30] Automatically predicting cyber attack preference with attributed heterogeneous attention networks and transductive learning | Social attack | Attention mechanism and transductive learning | Intent | - |
| 2022, Q. Tang et al. [35] AIGCN: Attack Intention Detection for Power System Using Graph Convolutional Networks | Network security | Graph convolutional networks | Intent | 97.34% |
| 2022, T. Hsu et al. [31] Detection of Malicious Activities Using Machine Learning in Physical Environment | Access control, surveillance, and host defense systems | YOLOv5 object detection model | Intent | - |
| 2022, J. Kang et al. [32] ActDetector: A Sequence-based Framework for Network Attack Activity Detection | Network security | Temporal-sequence-based LSTM | Activity | Precision: 94.8% |
| 2022, N. Tsinganos et al. [33] Applying BERT for Early-Stage Recognition of Persistence in Chat-Based Social Engineering Attacks | Social engineering attack | Transfer learning | Intent | 78.03% |
| 2023, A. Bhugul et al. [36] Novel Deep Neural Network for Suspicious Activity Detection and Classification | CCTV | CNN | Intent | 99.3% |
| 2023, N. Tsinganos et al. [34] Leveraging Dialogue State Tracking for Zero-Shot Chat-Based Social Engineering Attack Recognition | Social engineering attack | Transfer learning | Intent | 78.03% |
4.3.1. Summary
The deep learning approach overcomes some of the limitations of the logic-based and classical machine learning approaches. One of its main advantages is that it learns features automatically from the data, so the features do not need to be hand-engineered. As a result, deep learning models can learn diverse patterns and uncover non-linear relationships in data that would be difficult to detect with traditional methods, which makes them useful for extracting insights from big data. The approach is particularly important for tasks where the features are difficult to define, such as image recognition. Deep learning algorithms can handle large and complex datasets that would be difficult for classical machine learning or logic-based algorithms to process. They also cope well with uncertainty, partial observability, and noise, which makes them a useful tool for intention recognition.
The literature reviewed on deep learning for intention recognition, as shown in the table, reveals that the subdomains have shifted from network security to social media (4 out of 10 articles) and physical security (3 out of 10 articles), while only two articles focus on network security. This shift suggests that intention recognition is becoming more relevant in these domains. Additionally, a new subdomain related to AI security has emerged, which highlights the need for intention recognition-based models for securing AI itself. Transfer learning is employed in many cases, which indicates that deep learning models can benefit from pre-trained models to improve their performance.
However, deep learning approaches also have several disadvantages. First, like classical machine learning approaches, they require a large amount of training data to achieve high accuracy. Second, they are not explainable, to the extent that even their designers do not know how the conclusions are inferred from the input evidence. This lack of transparency also makes it difficult to debug and improve the model. Third, most deep learning models cannot learn new classes from live or online data: if the model encounters a class that it has not seen before, it will not be able to recognize it. Finally, deep learning models require high computational power to train and run, which can be a significant barrier to entry for many researchers and organizations. These limitations can make it challenging to use deep learning approaches for intention recognition in practice.