Submitted: 21 October 2024
Posted: 22 October 2024
Abstract
Keywords:
1. Introduction
- Watermark a learned image codec in the compressed domain [16] as a universal adversarial attack, so that the watermark added to the latent vector causes out-of-distribution perturbations in the reconstructed image.
- Train a neural codec whose decoder generates out-of-distribution perturbations not seen in the training dataset.
- We introduce the full-resolution SLIC, which verifies content authenticity through destructive re-compression. SLIC internally applies a universal adversarial attack to its own output. We believe that developing a non-idempotent codec as a secure codec for authenticity verification opens a new research direction for countering manipulated images.
- We propose using perceptual quality metrics in the adversarial loss to fine-tune a learned image codec so that its decoder output effectively contains out-of-distribution perturbations. We analyze and compare the antagonistic effects of various perceptual quality metrics.
- We point out the research opportunity of designing an efficient and effective perceptual metric for maximizing perceptual loss, which may be an interesting topic for the study of adversarial attacks on learned image codecs.
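The destructive re-compression idea in the contributions above can be illustrated with a minimal sketch. It assumes PSNR as the distortion measure and uses two hypothetical stand-in codecs (`idempotent_codec` and `destructive_codec`), which are toy functions chosen for illustration and are not SLIC or any learned codec.

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images scaled to [0, peak]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def recompression_psnr(image, codec):
    """PSNR between an image and the result of passing it through `codec` once more."""
    return psnr(image, codec(image))

# Hypothetical stand-ins for real codecs (assumptions for illustration only).
def idempotent_codec(x, levels=32):
    # Uniform quantization: a second pass reproduces the first pass exactly.
    return np.round(x * (levels - 1)) / (levels - 1)

def destructive_codec(x):
    # Toy non-idempotent map: every pass inverts the image.
    return 1.0 - x

rng = np.random.default_rng(0)
original = rng.random((64, 64, 3))
stable = idempotent_codec(original)

# An idempotent codec leaves its own output untouched on re-encoding,
# while a non-idempotent one damages it, which shows up as a low PSNR.
print(recompression_psnr(stable, idempotent_codec))
print(recompression_psnr(original, destructive_codec))
```

A verifier built on this principle would compare the re-compression PSNR against a threshold; the threshold itself is a design choice that this sketch does not fix.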
2. Related Works
2.1. Watermarking to Defend Image Manipulation
2.2. Learned Image Compression
2.3. Perceptual Distance Metrics
2.4. Adversarial Attacks
3. Proposed Method
3.1. Idempotence of Image Codec
3.2. Adversarial Loss for Perceptual Divergence
3.3. Noise Attack Simulation
4. Experimental Results
4.1. Perceptual Metrics Comparison
4.2. Destructive-Compression Effects
4.3. Robustness Against Editing Operations
4.4. Robustness Against GenAI
4.5. Coding Efficiency Impact
5. Discussion
5.1. Effectiveness of Perceptual Metrics
5.2. Adversarial Perturbation Preservation
5.3. Optimizing Fine-Tuning Strategies
6. Conclusions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A.
Appendix A.1. UNetJPEG Image Transformation Task
| Layer | Operation | Output Shape |
|---|---|---|
| Input | Input image + quality factor (4 channels) | |
| Encoder 1 | Conv2d(4, 64, 3x3) + ReLU | |
| Encoder 2 | MaxPool2d + Conv2d(64, 128, 3x3) + ReLU | |
| Encoder 3 | MaxPool2d + Conv2d(128, 256, 3x3) + ReLU | |
| Encoder 4 | MaxPool2d + Conv2d(256, 512, 3x3) + ReLU | |
| Middle | MaxPool2d + Conv2d(512, 1024, 3x3) + ReLU | |
| Decoder 4 | Upsample + Conv2d(1024, 512, 3x3) + ReLU | |
| Decoder 3 | Upsample + Conv2d(512, 256, 3x3) + ReLU | |
| Decoder 2 | Upsample + Conv2d(256, 128, 3x3) + ReLU | |
| Decoder 1 | Upsample + Conv2d(128, 64, 3x3) + ReLU | |
| Output | Conv2d(64, 3, 1x1) + Sigmoid | |
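The Output Shape column of the table above is empty in this copy. As a rough aid, the sketch below traces the tensor shapes implied by the listed operations. It assumes, without confirmation from the source, that MaxPool2d halves and Upsample doubles the spatial resolution and that every convolution is padded to preserve height and width; `LAYERS` and `trace_shapes` are hypothetical names.

```python
# Layer list transcribed from the table above: (name, resampling step, output channels).
# "down" = MaxPool2d, "up" = Upsample, "same" = no change in resolution.
LAYERS = [
    ("Encoder 1", "same", 64),
    ("Encoder 2", "down", 128),
    ("Encoder 3", "down", 256),
    ("Encoder 4", "down", 512),
    ("Middle", "down", 1024),
    ("Decoder 4", "up", 512),
    ("Decoder 3", "up", 256),
    ("Decoder 2", "up", 128),
    ("Decoder 1", "up", 64),
    ("Output", "same", 3),
]

def trace_shapes(height, width, factor=2):
    """Return [(layer name, (channels, height, width)), ...] for one input size.

    Assumes (not stated in the table) that pooling and upsampling both use
    `factor` and that every convolution preserves the spatial size.
    """
    if height % factor ** 4 or width % factor ** 4:
        raise ValueError("height and width must be divisible by factor**4")
    # The input is the RGB image plus one quality-factor channel (4 channels).
    shapes = [("Input", (4, height, width))]
    h, w = height, width
    for name, step, channels in LAYERS:
        if step == "down":
            h, w = h // factor, w // factor
        elif step == "up":
            h, w = h * factor, w * factor
        shapes.append((name, (channels, h, w)))
    return shapes

for name, shape in trace_shapes(256, 256):
    print(f"{name:<10} {shape}")
```

Under these assumptions the output has the same spatial size as the input, which is what an image-to-image JPEG simulator needs.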
| Simulation | Kodak | FFHQ | DIV2K |
|---|---|---|---|
| | JPEG↓ | JPEG↓ | JPEG↓ |
| UNetJPEG | 8.69 | 8.49 | 15.00 |
| Cubic round [55] | 25.89 | 27.62 | 36.52 |
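For context on the "Cubic round" row: Shin and Song [55] make JPEG quantization differentiable by replacing hard rounding with the approximation round(x) + (x − round(x))³, which stays close to true rounding while having a non-zero gradient almost everywhere. The sketch below shows only this scalar approximation; the function names are hypothetical, and the surrounding JPEG pipeline (DCT, quantization tables) is omitted.

```python
import math

def cubic_round(x):
    """Differentiable rounding approximation: round(x) + (x - round(x))**3."""
    nearest = math.floor(x + 0.5)  # round half up, avoiding banker's rounding
    return nearest + (x - nearest) ** 3

def cubic_round_grad(x):
    """Derivative of cubic_round with respect to x (the rounded part is constant)."""
    nearest = math.floor(x + 0.5)
    return 3.0 * (x - nearest) ** 2

for value in (2.0, 2.2, 2.4):
    print(value, cubic_round(value), cubic_round_grad(value))
```

The gradient vanishes at exact integers and grows towards the rounding boundary, so coefficients that are far from an integer receive the strongest training signal.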
Appendix A.2. More Destructive-Compression Effects



References
- Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Morales, A.; Ortega-Garcia, J. Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion 2020, 64, 131–148.
- Piva, A. An overview on image forensics. International Scholarly Research Notices 2013, 2013.
- Zanardelli, M.; Guerrini, F.; Leonardi, R.; Adami, N. Image forgery detection: a survey of recent deep-learning approaches. Multimedia Tools and Applications 2023, 82, 17521–17566.
- Mahdian, B.; Saic, S. Using noise inconsistencies for blind image forensics. Image and Vision Computing 2009, 27, 1497–1503.
- Bayram, S.; Sencar, H.T.; Memon, N. An efficient and robust method for detecting copy-move forgery. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009, pp. 1053–1056.
- Ghosh, A.; Zhong, Z.; Boult, T.E.; Singh, M. SpliceRadar: A Learned Method For Blind Image Forensics. CVPR Workshops, 2019, pp. 72–79.
- Popescu, A.C.; Farid, H. Exposing digital forgeries in color filter array interpolated images. IEEE Transactions on Signal Processing 2005, 53, 3948–3959.
- Mahdian, B.; Saic, S. Detecting double compressed JPEG images. 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 2009). 2009.
- Park, J.; Cho, D.; Ahn, W.; Lee, H.K. Double JPEG detection in mixed JPEG quality factors using deep convolutional neural network. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 636–652.
- Friedman, G.L. The trustworthy digital camera: Restoring credibility to the photographic image. IEEE Transactions on Consumer Electronics 1993, 39, 905–910.
- Blythe, P.; Fridrich, J. Secure digital camera. Digital Investigation 2004.
- Kundur, D.; Hatzinakos, D. Digital watermarking for telltale tamper proofing and authentication. Proceedings of the IEEE 1999, 87, 1167–1180.
- Lu, C.S.; Liao, H.Y.M. Structural digital signature for image authentication: an incidental distortion resistant scheme. Proceedings of the 2000 ACM workshops on Multimedia, 2000, pp. 115–118.
- Liu, K.; Wu, D.; Wu, Y.; Wang, Y.; Feng, D.; Tan, B.; Garg, S. Manipulation Attacks on Learned Image Compression. IEEE Transactions on Artificial Intelligence 2023.
- Chen, T.; Ma, Z. Towards robust neural image compression: Adversarial attack and model finetuning. IEEE Transactions on Circuits and Systems for Video Technology 2023.
- Huang, C.H.; Wu, J.L. SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024, pp. 1–7.
- Rey, C.; Dugelay, J.L. A survey of watermarking algorithms for image authentication. EURASIP Journal on Advances in Signal Processing 2002, 2002, 1–9.
- Ruiz, N.; Bargal, S.A.; Sclaroff, S. Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems. 2020. doi:10.1007/978-3-030-66823-5_14.
- Lv, L. Smart watermark to defend against deepfake image manipulation. 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS). IEEE, 2021, pp. 380–384.
- Zhang, X.; Li, R.; Yu, J.; Xu, Y.; Li, W.; Zhang, J. Editguard: Versatile image watermarking for tamper localization and copyright protection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11964–11974.
- Yu, N.; Skripniuk, V.; Abdelnabi, S.; Fritz, M. Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14448–14457.
- Wang, J.; Wang, H.; Zhang, J.; Wu, H.; Luo, X.; Ma, B. Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection. ACM Transactions on Multimedia Computing, Communications and Applications.
- Zhang, J.; Wang, J.; Wang, H.; Luo, X. Self-recoverable adversarial examples: A new effective protection mechanism in social networks. IEEE Transactions on Circuits and Systems for Video Technology 2022, 33, 562–574.
- Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational image compression with a scale hyperprior. arXiv preprint 2018, arXiv:1802.01436.
- Minnen, D.; Ballé, J.; Toderici, G.D. Joint autoregressive and hierarchical priors for learned image compression. Advances in Neural Information Processing Systems 2018, 31, 10771–10780.
- Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Learned image compression with discretized gaussian mixture likelihoods and attention modules. CVPR, 2020, pp. 7939–7948.
- Guo, Z.; Zhang, Z.; Feng, R.; Chen, Z. Causal contextual prediction for learned image compression. IEEE Transactions on Circuits and Systems for Video Technology 2021, 32, 2329–2341.
- Ma, S.; Zhang, X.; Jia, C.; Zhao, Z.; Wang, S.; Wang, S. Image and video compression with neural networks: A review. IEEE Transactions on Circuits and Systems for Video Technology 2019.
- Yang, Y.; Mandt, S.; Theis, L. An introduction to neural data compression. arXiv preprint 2022, arXiv:2202.06533.
- Huang, C.H.; Wu, J.L. Unveiling the Future of Human and Machine Coding: A Survey of End-to-End Learned Image Compression. Entropy 2024, 26, 357.
- Kim, J.H.; Jang, S.; Choi, J.H.; Lee, J.S. Instability of successive deep image compression. Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 247–255.
- Helminger, L.; Djelouah, A.; Gross, M.; Schroers, C. Lossy Image Compression with Normalizing Flows. Neural Compression: From Information Theory to Applications – Workshop @ ICLR 2021, 2021.
- Li, Y.; Xu, T.; Wang, Y.; Liu, J.; Zhang, Y.Q. Idempotent learned image compression with right-inverse. Advances in Neural Information Processing Systems 2024, 36.
- Xu, T.; Zhu, Z.; He, D.; Li, Y.; Guo, L.; Wang, Y.; Wang, Z.; Qin, H.; Wang, Y.; Liu, J.; others. Idempotence and perceptual image compression. arXiv preprint 2024, arXiv:2401.08920.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 2004, 13, 600–612.
- Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Transactions on Image Processing 2013, 23, 684–695.
- Laparra, V.; Ballé, J.; Berardino, A.; Simoncelli, E.P. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging 2016, 2016, 1–6.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2016, pp. 694–711.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. CVPR, 2018.
- Bhardwaj, S.; Fischer, I.; Ballé, J.; Chinen, T. An unsupervised information-theoretic perceptual quality metric. Advances in Neural Information Processing Systems 2020, 33, 13–24.
- Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020, 44, 2567–2581.
- CLIC 2021: Workshop and Challenge on Learned Image Compression.
- Zhu, H.; Chen, B.; Zhu, L.; Wang, S.; Lin, W. DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator. arXiv preprint 2022, arXiv:2211.04927.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv preprint 2015, arXiv:1508.06576.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; others. Photo-realistic single image super-resolution using a generative adversarial network. CVPR, 2017, pp. 4681–4690.
- Huang, C.H.; Wu, J.L. Image Data Hiding in Neural Compressed Latent Representations. IEEE International Conference on Visual Communications and Image Processing (VCIP). IEEE, 2023.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv preprint 2013, arXiv:1312.6199.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint 2014, arXiv:1412.6572.
- Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. In Artificial intelligence safety and security; Chapman and Hall/CRC, 2018; pp. 99–112.
- Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv preprint 2017, arXiv:1706.06083.
- Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 39–57.
- Zhu, T.; Sun, H.; Xiong, X.; Zhu, X.; Gong, Y.; Fan, Y.; others. Attack and defense analysis of learned image compression. arXiv preprint 2024, arXiv:2401.10345.
- Huang, C.H.; Wu, J.L. Joint Image Data Hiding and Rate-Distortion Optimization in Neural Compressed Latent Representations. International Conference on Multimedia Modeling. Springer, 2024, pp. 94–108.
- Zhu, J.; Kaplan, R.; Johnson, J.; Fei-Fei, L. Hidden: Hiding data with deep networks. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 657–672.
- Shin, R.; Song, D. Jpeg-resistant adversarial images. NIPS 2017 Workshop on Machine Learning and Computer Security, 2017, Vol. 1.
- Bégaint, J.; Racapé, F.; Feltman, S.; Pushparaja, A. CompressAI: a PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint 2020, arXiv:2011.03029.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; others. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 2015, 115, 211–252.
- Kodak PhotoCD dataset. http://r0k.us/graphics/kodak/, 1999.
- Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410.
- Agustsson, E.; Timofte, R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.
- Li, L.; Bao, J.; Yang, H.; Chen, D.; Wen, F. Faceshifter: Towards high fidelity and occlusion aware face swapping. arXiv preprint 2019, arXiv:1912.13457.
- Remaker Face Swap Online Free. https://remaker.ai/face-swap-free/.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. 2021.
- Ballé, J.; Chou, P.A.; Minnen, D.; Singh, S.; Johnston, N.; Agustsson, E.; Hwang, S.J.; Toderici, G. Nonlinear transform coding. IEEE Journal of Selected Topics in Signal Processing 2020, 15, 339–353.








| Quality | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| | 0.2 | 0.3 | 0.45 | 0.6 | 0.8 | 1 | 1.75 | 2.5 |
| Editing operation | Setting |
|---|---|
| Crop | Crop center 80% rectangle and paste into another blank image. |
| Gaussian Blur | Gaussian blur with window size and . |
| Median Filtering | Median filtering with window size . |
| Lightening | Increase luminance by 150%. |
| Sharpening | Sharpen image with filter kernel |
| Histogram Equalization | Histogram equalization in RGB channel. |
| Affine Transform | Rotate image by , translate 10 pixels, and scale to 95%. |
| JPEG Compression | JPEG compression with quality . |
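Two of the editing operations in the table above can be sketched in a few lines of NumPy. This is an illustrative reading of the settings rather than the authors' evaluation code: it assumes that "80%" refers to each side of the image, that the blank image is all zeros and the crop is pasted at its original position, and that histogram equalization is applied to each 8-bit RGB channel independently. All function names are hypothetical.

```python
import numpy as np

def crop_center_and_paste(image, keep=0.8):
    """Keep the central `keep` fraction of each side and paste it onto a blank canvas."""
    h, w = image.shape[:2]
    ch, cw = int(round(h * keep)), int(round(w * keep))
    top, left = (h - ch) // 2, (w - cw) // 2
    canvas = np.zeros_like(image)  # assumption: "blank" means all-zero pixels
    canvas[top:top + ch, left:left + cw] = image[top:top + ch, left:left + cw]
    return canvas

def equalize_channel(channel):
    """Histogram-equalize one uint8 channel via its cumulative distribution."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant channel: nothing to equalize
        return channel.copy()
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[channel]

def equalize_rgb(image):
    """Apply histogram equalization to each channel of an HxWxC uint8 image."""
    channels = [equalize_channel(image[..., c]) for c in range(image.shape[-1])]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
photo = rng.integers(60, 120, size=(100, 100, 3), dtype=np.uint8)
edited = equalize_rgb(crop_center_and_paste(photo))
print(edited.shape, edited.dtype)
```

Each edited image would then be re-compressed and compared with the unedited output to fill a row of the robustness table.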
| SLIC | Kodak | | FFHQ | | DIV2K | |
|---|---|---|---|---|---|---|
| Balle2018 + | 39.74 | 5.68 | 40.83 | 5.56 | 38.28 | 5.26 |
| Balle2018 + | 39.81 | 5.11 | 40.95 | 4.96 | 38.25 | 5.41 |
| Balle2018 + | 39.74 | 8.42 | 40.86 | 7.62 | 38.31 | 8.17 |
| Balle2018 + | 40.67 | 30.64 | 41.33 | 19.16 | 38.92 | 25.81 |
| Balle2018 + | 40.72 | 43.05 | 41.38 | 31.33 | 39.01 | 44.53 |
| Balle2018 + | 40.70 | 47.87 | 41.36 | 47.30 | 39.02 | 47.86 |
| Balle2018 + | 40.60 | 47.27 | 41.20 | 46.70 | 38.97 | 47.46 |
| SLIC | | | Re-compress after | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | Crop↓ | G.Blur↓ | M.Filter↓ | Sharp↓ | Light↓ | H.Equ.↓ | Affine↓ | JPEG↓ |
| Kodak | ||||||||||
| Balle2018 + | 39.74 | 5.68 | 7.20 | 6.62 | 5.75 | 5.89 | 5.75 | 5.22 | 6.54 | 8.69 |
| Balle2018 + | 39.81 | 5.11 | 6.50 | 7.48 | 6.04 | 5.14 | 6.44 | 5.37 | 9.68 | 7.29 |
| Minnen2018 + | 40.35 | 5.52 | 6.99 | 6.39 | 6.07 | 6.77 | 5.72 | 6.07 | 7.89 | 11.55 |
| Cheng2020 + | 39.53 | 5.79 | 5.85 | 7.41 | 6.81 | 6.99 | 7.17 | 5.83 | 6.42 | 8.19 |
| FFHQ | ||||||||||
| Balle2018 + | 40.83 | 5.56 | 6.45 | 6.44 | 5.64 | 5.51 | 5.40 | 5.04 | 6.24 | 8.49 |
| Balle2018 + | 40.95 | 4.96 | 5.84 | 8.19 | 6.18 | 4.77 | 6.31 | 5.25 | 10.16 | 7.55 |
| Minnen2018 + | 40.86 | 5.06 | 6.00 | 5.76 | 5.38 | 5.55 | 5.37 | 5.57 | 7.32 | 12.20 |
| Cheng2020 + | 40.59 | 5.34 | 4.05 | 6.56 | 6.03 | 5.46 | 5.61 | 4.94 | 6.46 | 7.40 |
| DIV2K | ||||||||||
| Balle2018 + | 38.28 | 5.26 | 7.08 | 7.18 | 5.56 | 6.11 | 5.59 | 5.80 | 7.64 | 15.00 |
| Balle2018 + | 38.25 | 5.41 | 7.18 | 8.76 | 6.81 | 5.50 | 6.71 | 5.99 | 11.40 | 10.25 |
| Minnen2018 + | 38.82 | 5.67 | 7.14 | 6.71 | 6.44 | 8.01 | 6.61 | 7.67 | 9.34 | 24.20 |
| Cheng2020 + | 37.62 | 5.59 | 4.97 | 9.19 | 6.82 | 7.05 | 6.69 | 7.00 | 6.98 | 9.80 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).