4.2. Glitch Attack Results
The middle column of Table 2 shows the prediction probabilities for an input image of the digit 1 from the MNIST dataset. In the normal case, the model correctly predicted Class 1 with probability 0.99995. The right column shows that, under the glitch-based fault injection attack, the same image of the digit 1 was classified as Class 0.
Table 2.
Comparison of Prediction Probabilities
| Class | Normal Model | Attacked Model |
|---|---|---|
| 0 | 1.0725232e-10 | 1.0 |
| 1 | 0.9999546 | 0 |
| 2 | 1.5233452e-07 | 0 |
| 3 | 5.8926429e-10 | 0 |
| 4 | 2.6625094e-07 | 0 |
| 5 | 1.3121079e-06 | 0 |
| 6 | 7.8415707e-07 | 0 |
| 7 | 3.3144510e-05 | 0 |
| 8 | 9.7056473e-06 | 0 |
| 9 | 1.4593203e-10 | 0 |
Figure 7a presents the results of the clock glitch fault injection attack on the MLP classification model, evaluated on the 10,000-sample MNIST test set. Under this attack, most inputs are misclassified as 0 regardless of their true class. We attribute this to the loop in Algorithm 1 being skipped at the attack point; that is, the attacker can cause the branch statement at line 50 in Figure 4 to be omitted. This shows that the attacker can intentionally drive the output to zero by injecting a fault during the first iteration of Algorithm 1. While the normal MLP model achieved 97.74% accuracy, the accuracy of the attacked model dropped to 9.77%. The values in the rightmost column of the confusion matrix indicate cases where the device terminated irregularly due to the clock glitch. Excluding data in Class 0, the success rate of this fault attack was 99.9%.
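The effect of aborting the softmax loop can be illustrated with a small simulation. This is a minimal sketch, not the code from Figure 4 or Algorithm 1: `softmax_glitched` and its `skip_after` flag are our own illustrative constructs modeling a per-class softmax loop whose early exit leaves all later class probabilities at zero, so the output defaults to class 0.

```python
import math

def softmax_glitched(logits, skip_after=None):
    """Naive per-class softmax loop; `skip_after` models a glitch that
    aborts the loop early, leaving the remaining entries at zero."""
    exps = [0.0] * len(logits)
    for i, z in enumerate(logits):
        exps[i] = math.exp(z)
        if skip_after is not None and i + 1 >= skip_after:
            break  # glitch: the loop's branch is skipped, loop terminates
    total = sum(exps)
    return [e / total for e in exps]

# Logits for an image of the digit 1: class 1 dominates.
logits = [0.1, 5.0, 0.3, 0.2, 0.1, 0.0, 0.1, 0.2, 0.0, 0.1]
normal = softmax_glitched(logits)
attacked = softmax_glitched(logits, skip_after=1)  # fault in iteration 1

assert normal.index(max(normal)) == 1       # correct prediction: class 1
assert attacked == [1.0] + [0.0] * 9        # forced to class 0
```

Because only `exps[0]` is ever written before the loop aborts, the normalized output is exactly `[1.0, 0, ..., 0]`, matching the attacked-model column of Table 2.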
In the voltage glitch fault attack, the loop terminated at the same location as in the clock glitch attack.
Figure 7b depicts the confusion matrix for the results obtained from the voltage glitch injection. This attack classified digit images with 12.51% accuracy. However, compared to the clock glitch attack on the MLP model, it caused many more response failures from the target device. This was an undesirable experimental outcome, because such failures are not the result the attacker intended.
We conducted experiments to determine whether an attacker can intentionally cause misclassification into classes other than Class 0.
Figure 8a shows the misclassification results when a fault is injected into the fourth loop iteration of the clock glitch attack. As shown in the figure, the attack success rate is 100% when excluding the results predicted as 0, 1, 2, and 3; that is, all inputs with a label of 4 or higher are misclassified.
Figure 8b shows that injecting a voltage fault during the fourth loop iteration achieves an 81.52% attack success rate. Since the attack result depends on the order of the labels, the attacker cannot fully control misclassification into a particular class. Nevertheless, the fault injection location can limit the scope of misclassification in some cases.
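Why a fault in the fourth iteration misclassifies every input labeled 4 or higher can be seen in the same simulation style. Again a hedged sketch: `softmax_abort` is our illustrative model, not the target implementation, and assumes the glitch simply terminates the per-class loop in the given iteration.

```python
import math

def softmax_abort(logits, abort_iter):
    """Naive per-class softmax loop that a glitch terminates after
    `abort_iter` iterations; later classes keep probability zero."""
    exps = [0.0] * len(logits)
    for i, z in enumerate(logits):
        exps[i] = math.exp(z)
        if i + 1 >= abort_iter:
            break  # models the fault injected in this iteration
    total = sum(exps)
    return [e / total for e in exps]

# An input whose true class is 7: after a fault in the fourth
# iteration, only classes 0-3 can receive nonzero probability,
# so every input labeled 4 or higher is necessarily misclassified.
logits_for_digit_7 = [0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.0, 6.0, 0.1, 0.0]
probs = softmax_abort(logits_for_digit_7, abort_iter=4)
prediction = probs.index(max(probs))

assert all(p == 0.0 for p in probs[4:])  # classes 4-9 unreachable
assert prediction <= 3                   # the true class 7 cannot win
```

The prediction is restricted to whichever of classes 0 to 3 has the largest logit, which is why the attacker can bound the misclassification range but not pick one specific target class.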
Because this attack method targets only the softmax function in the output layer, it is in principle applicable regardless of the DNN model or dataset. Table 3 presents the results of attacking the MLP model and three CNN-based DNN models widely used in image recognition. Not only the MLP model but also the three CNN models misclassified images with high probability.
Table 3.
Fault Injection Attacks on MLP and other CNN-based DNN Models
| Model | Normal Accuracy | Glitch Model Accuracy (Clock) | Glitch Model Accuracy (Voltage) |
|---|---|---|---|
| MLP | 97.74% | 9.77% | 12.51% |
| AlexNet | 98.99% | 9.50% | 9.35% |
| InceptionNet | 99.06% | 9.61% | 10.64% |
| ResNet | 98.98% | 9.33% | 10.39% |
A comparison with existing studies is shown in Table 4. A practical fault attack using laser injection produced random misclassification rates of more than 50% [15]. Khoshavi et al. found that glitches targeting clock signals can reduce accuracy by 20% to 80% [19]. Liu et al. showed that an attacker can induce misclassification rates above 98% in 8 out of 9 models using clock glitches [20]. A clock glitch-based fault injection into the softmax function of a DNN achieved 98.6% misclassification, but only into class 0 [17]. We performed misclassification attacks on the softmax function using both clock glitches and voltage glitches, and experimentally confirmed misclassification rates above 99% into class 0 as well as into other classes.
Table 4.
Comparison of fault injection attacks
| | Target | Method | Goal | Effect |
|---|---|---|---|---|
| Breier et al. [15] | activation functions | laser injection | random misclassification | misclassification |
| Khoshavi et al. [19] | clock signal | clock glitch | accuracy degradation | accuracy degradation |
| Liu et al. [20] | clock signal | clock glitch | misclassification | misclassification |
| Fukuda et al. [17] | softmax function | clock glitch | misclassification to class 0 | misclassification |
| Ours | softmax function | clock glitch, voltage glitch | misclassification to class 0 or others | misclassification |