Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Improving Adversarial Robustness via Distillation-based Purification

Version 1: Received: 25 September 2023 / Approved: 26 September 2023 / Online: 26 September 2023 (05:39:42 CEST)

A peer-reviewed article of this Preprint also exists.

Koo, I.; Chae, D.-K.; Lee, S.-C. Improving Adversarial Robustness via Distillation-Based Purification. Appl. Sci. 2023, 13, 11313.

Abstract

Despite the impressive performance of deep neural networks on many vision tasks, they are known to be vulnerable to noise intentionally added to input images. To combat these adversarial examples (AEs), improving the adversarial robustness of models has emerged as an important research topic, and research has been conducted in various directions, including adversarial training, image denoising, and adversarial purification. Among them, this paper focuses on adversarial purification, a kind of pre-processing that removes noise before AEs enter a classification model. The advantage of adversarial purification is that it can improve robustness without altering the classification model itself, whereas other defense techniques such as adversarial training suffer from a decrease in model accuracy. Our proposed purification framework uses a convolutional autoencoder as a base model to capture the features of images and their spatial structure. We further aim to improve the adversarial robustness of our purification model by distilling knowledge from teacher models. To this end, we train two convolutional autoencoders (teachers), one with adversarial training and the other with normal training. Then, through ensemble knowledge distillation, we transfer the ability to denoise and restore original images to the student model (the purification model). Our extensive experiments confirm that our student model achieves high purification performance (i.e., how accurately a pre-trained classification model classifies purified images). The ablation study confirms that ensemble knowledge distillation from the two teachers has a positive effect on performance.
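
The abstract does not state the training objective, so the following is only a minimal sketch of what an ensemble distillation loss for the student could look like. It assumes the student is pulled toward both the clean image and the outputs of the two teachers, with mean squared error as the distance. The function names, the weights `alpha` and `beta`, and the use of flattened pixel lists are all hypothetical choices for illustration, not details taken from the paper.

```python
from typing import Sequence

Image = Sequence[float]  # a flattened image with pixel values in [0, 1]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two flattened images of equal length."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ensemble_distillation_loss(
    student_out: Image,
    clean: Image,
    adv_teacher_out: Image,
    std_teacher_out: Image,
    alpha: float = 0.5,  # hypothetical weight: clean image vs. teachers
    beta: float = 0.5,   # hypothetical weight: adversarial vs. normal teacher
) -> float:
    """Assumed student objective (not the paper's stated loss):
    reconstruct the clean image while staying close to a weighted
    ensemble of the two teachers' reconstructions."""
    recon = mse(student_out, clean)
    distill = (beta * mse(student_out, adv_teacher_out)
               + (1.0 - beta) * mse(student_out, std_teacher_out))
    return alpha * recon + (1.0 - alpha) * distill


if __name__ == "__main__":
    # Toy 4-pixel example: the student sees a purified AE, the teachers
    # provide their own reconstructions of the same input.
    clean = [0.0, 0.5, 0.5, 1.0]
    student = [0.1, 0.5, 0.4, 0.9]
    adv_teacher = [0.0, 0.5, 0.5, 0.9]
    std_teacher = [0.1, 0.6, 0.5, 1.0]
    print(ensemble_distillation_loss(student, clean, adv_teacher, std_teacher))
```

In a real training loop the same combination would be computed on batched tensors and minimized with respect to the student's parameters only, with both teachers frozen; the pure-Python version above is meant only to make the structure of the objective concrete.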

Keywords

Adversarial robustness; adversarial attacks; adversarial purification; knowledge distillation; image classification; convolutional autoencoders

Subject

Computer Science and Mathematics, Computer Vision and Graphics
