Backdoor Training Paradigm in Generative Models

Submitted: 30 December 2024; Posted: 03 January 2025

Abstract
Backdoor attacks remain a critical area of focus in machine learning research, with one prominent approach being the introduction of backdoor training injection mechanisms. These mechanisms embed backdoor triggers into the training process, enabling the model to recognize specific trigger inputs and produce predefined outputs post-training. In this paper, we identify a unifying pattern across existing backdoor injection methods in generative models and propose a novel backdoor training injection paradigm. This paradigm leverages a unified loss function design to facilitate backdoor injection across diverse generative models. We demonstrate the effectiveness and generalizability of this paradigm through experiments on Generative Adversarial Networks (GANs) and Diffusion Models. Our experimental results on GANs confirm that the proposed method successfully embeds backdoor triggers, enhancing the model’s security and robustness. This work provides a new perspective and methodological framework for backdoor injection in generative models, making a significant contribution toward improving the safety and reliability of these models.

1. Introduction

With the rapid development of deep learning, powerful models have emerged for learning complex data such as high-dimensional data, temporal data, spatial data, and graph data. Generative models are a class of powerful models that aim to learn the distribution of data in order to generate new samples that resemble real data [59]. Common types of generative models include Generative Adversarial Networks (GANs) [9,57,58,60–62], Variational Autoencoders (VAEs) [53,54,55,56], Diffusion Models [3,7,50,52], and Autoregressive Models [51]. These models have found widespread application in multimodal generation tasks [44–49]. Despite their significant success, generative models face several security and privacy challenges, one of which is the threat of backdoor training attacks. These attacks raise concerns about the security of generative models in safety-critical scenarios, such as privacy protection [42,43], copyright claims [27,41], and model integrity [40].
At the same time, state-of-the-art deep neural network models embody the accumulated knowledge of researchers and consume vast amounts of data and computational resources, making them costly to develop. Although selling models as a product (Machine Learning as a Service, MLaaS) [26] can be a lucrative business model, the low cost of stealing, copying, or misusing these models poses significant risks. To prevent such misuse and intellectual property infringement, effective backdoor methods are crucial for safeguarding model ownership.
Backdoor training, which can serve either as an attack vector or as a means of ownership verification, involves injecting a trigger into a small subset of the training data to implant a backdoor into the model [63]. The core goal of this method is to ensure that the model performs normally on regular inputs while producing a pre-determined output under specific trigger conditions. Related works [65–68] indicate that this type of attack poses a serious threat to deep neural network-based models, because backdoor triggers are relatively easy to implant but difficult to detect or remove [64]; the same property also makes backdoors a practical tool for protecting model ownership. A key feature of backdoor attacks is that they do not degrade the model’s performance on clean test inputs, yet they allow the attacker to control the model’s behavior for any test input containing the backdoor trigger. This makes such attacks difficult to detect based solely on the model’s performance on clean test sets [69–72].
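As a simple illustration of the classic data-poisoning form of backdoor injection described above, the following minimal sketch stamps a small patch trigger onto a fraction of training images and relabels them to an attacker-chosen target class. It is a generic PyTorch sketch under illustrative assumptions (patch size, poison rate, tensor layout) and is not code from the cited works.

```python
import torch

def poison_batch(images, labels, poison_rate=0.05, target_label=0, patch_value=1.0):
    """Stamp a 4x4 trigger patch onto a random subset of images and relabel them.
    images: (B, C, H, W) tensor in [0, 1]; labels: (B,) tensor of class ids."""
    images, labels = images.clone(), labels.clone()
    num_poison = max(1, int(poison_rate * images.size(0)))
    idx = torch.randperm(images.size(0))[:num_poison]
    images[idx, :, -4:, -4:] = patch_value   # trigger in the bottom-right corner
    labels[idx] = target_label               # pre-determined backdoor output
    return images, labels
```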
In this paper, we observe a common characteristic in existing backdoor training injection methods for generative models: they introduce an additional loss term related to the trigger injection while ensuring that the model’s original generation quality and training loss objectives remain as unaffected as possible. This loss term is typically controlled by a hyperparameter λ for fine-tuning. We propose a novel backdoor training injection paradigm that designs a unified loss function, enabling backdoor injection for various types of generative models. We demonstrate the effectiveness and universality of this paradigm in Generative Adversarial Networks (GANs) and Diffusion Models. Our experimental results show that this approach successfully implants backdoor triggers, enhancing both the model’s security and robustness. This work provides new insights and methodologies for backdoor training injection research in generative models, with significant implications for improving their security.

2. Related Works

2.1. Backdoor Training Protection

The concept of backdoor injection for protecting neural network models can be traced back to the seminal work in 2017, where watermarking was introduced into convolutional neural networks using regularization techniques [1]. Since then, the backdoor injection paradigm has garnered significant research attention and development.
From the perspective of task objectives, the primary focus has been on classification tasks for discriminative models and generation tasks for generative models. Structurally, most studies center on convolutional neural networks (CNNs) [28] due to their exceptional performance in image processing, the rapid growth of large-scale image datasets, and the widespread application of CNNs across diverse domains.
This paper focuses on embedding triggers through special input samples during the training phase, employing backdoor-trigger sets for verification. Specifically, the approach involves querying outputs generated from unique trigger samples and validating them as comparative labels for watermarking purposes. Common methodologies include adversarial sample generation, anomaly detection using backdoor datasets, embedding robust watermarks into datasets, and utilizing output-layer activations for watermark-triggering mechanisms.
In addition to these, novel embedding techniques have emerged. For instance, combining deep learning algorithms with hardware-level integrations has enabled watermark encryption within the hardware domain [5]. Furthermore, systematic validation methods have been proposed to ensure the robustness and reliability of backdoor injection in neural networks [6]. Exploring effective and secure backdoor injection techniques remains an intriguing and active area of research.

2.2. Generative Adversarial Network (GAN)

Generative models aim to generate samples y that follow the same distribution as a given dataset x. Generative Adversarial Networks (GANs) [2], introduced to address this problem, effectively model and fit such generative distributions. A GAN consists of two key components: a discriminator D and a generator G . The generator G is responsible for modeling the data distribution and producing samples that mimic the distribution of the input data x. Meanwhile, the discriminator D evaluates whether a given sample is real (from the data distribution) or generated.
The primary goal of a GAN is to iteratively optimize both components such that G improves its ability to generate realistic data that D cannot distinguish from real samples, while D concurrently enhances its ability to identify generated samples. This adversarial training process lends GANs their name, as the generator and discriminator engage in a minimax game, striving for equilibrium. Ideally, this process reaches a Nash equilibrium, where G generates samples indistinguishable from the true distribution x, and D assigns a probability of 0.5 to all inputs being real or generated. The game-theoretic formulation of GANs is defined as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_G(z)}\left[\log\left(1 - D(G(z))\right)\right] \qquad (1)$$
The iterative training procedure for GANs can be summarized as follows:
  • Initialize the parameters $\theta_D$ of the discriminator and $\theta_G$ of the generator.
  • Sample $m$ real data points $\{x^{(1)}, \dots, x^{(m)}\}$ from the true data distribution $p_{\text{data}}(x)$. Simultaneously, sample $m$ noise vectors $\{z^{(1)}, \dots, z^{(m)}\}$ from a prior noise distribution $p_G(z)$ and pass them through the generator to produce corresponding fake samples $\{\hat{x}^{(1)}, \dots, \hat{x}^{(m)}\}$.
  • Alternately train $D$ and $G$:
    (a) Fix $G$ and optimize $D$ to improve its ability to distinguish real samples from generated ones.
    (b) Fix $D$ and optimize $G$ to produce samples that maximize the probability of fooling $D$. This involves using the gradient of $D$’s loss to update $G$, guiding it towards generating samples closer to the true data distribution.
In the original GAN work [4], the training strategy prioritizes the discriminator. Loss is computed for D using real and generated samples, followed by backpropagation to update its parameters. Subsequently, the generator is trained by leveraging the gradients from D to adjust its parameters, steering it towards generating more realistic data. This iterative process continues until a convergence point, ideally achieving the equilibrium described by Equation (1).
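As a concrete illustration of this alternating procedure, the sketch below implements one training iteration of the original GAN objective in PyTorch (with the common non-saturating generator loss). It assumes a discriminator whose output is a probability of shape (batch, 1); the function and variable names are illustrative, not taken from the cited works.

```python
import torch
import torch.nn.functional as F

def gan_training_step(G, D, opt_G, opt_D, real, latent_dim=100):
    """One alternating GAN update: first the discriminator, then the generator."""
    batch = real.size(0)
    ones = torch.ones(batch, 1, device=real.device)
    zeros = torch.zeros(batch, 1, device=real.device)
    z = torch.randn(batch, latent_dim, device=real.device)

    # (a) Fix G, update D: push D(real) toward 1 and D(G(z)) toward 0
    fake = G(z).detach()
    d_loss = F.binary_cross_entropy(D(real), ones) + F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # (b) Fix D, update G: non-saturating objective, push D(G(z)) toward 1
    g_loss = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```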

2.3. Diffusion Models

A Denoising Diffusion Probabilistic Model (DDPM) [7] employs two Markov chains: one for the forward process, which progressively adds noise to the data, and another for the reverse process, which reconstructs the data from the noise. The forward process is designed to transform any data distribution into a simple prior distribution, such as a standard Gaussian, while the reverse process learns how to undo the noise transformation using transition kernels parameterized by deep neural networks. Data generation involves sampling a random vector from the prior distribution and using ancestral sampling through the reverse chain to produce new data points. [3]
Forward Process (Noise Addition): The forward process is a Markov chain that gradually corrupts the data by adding noise at each step. Let x 0 represent the original data, and x t denote the noisy version of the data at timestep t. The process adds Gaussian noise at each timestep, with the noise schedule controlled by β t . The formula can be expressed as:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right),$$
where $\beta_t$ controls the amount of noise added at each step. As $t$ increases, the data becomes noisier. After $T$ steps, the data $x_T$ converges to approximately a standard Gaussian distribution:
$$q(x_T \mid x_0) \approx \mathcal{N}(x_T;\ \mathbf{0}, \mathbf{I}),$$
signifying that at step $T$, the data is fully corrupted by noise.
Reverse Process (Denoising): The reverse process is key to the generative capability of diffusion models. It aims to gradually remove the noise from the corrupted data x T and recover the original data distribution x 0 . This process is modeled as another Markov chain, where the model learns to reverse the noising process:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right),$$
where μ θ ( x t , t ) and Σ θ ( x t , t ) are the predicted mean and covariance for the denoised data at each timestep t. The reverse process aims to reduce noise progressively, moving from x T to x 0 . The final goal is to reconstruct the original input x 0 based on noise x T .
The model is trained to maximize the likelihood of the observed data under the reverse process, which is typically achieved by minimizing the Kullback-Leibler (KL) divergence [29] between the true posterior $q(x_{t-1} \mid x_t, x_0)$ and the learned posterior $p_\theta(x_{t-1} \mid x_t)$. This leads to the following loss function:
$$\mathcal{L} = \mathbb{E}_{q(x_t \mid x_0)}\left[ D_{\mathrm{KL}}\!\left( q(x_{t-1} \mid x_t, x_0)\ \big\Vert\ p_\theta(x_{t-1} \mid x_t) \right) \right].$$
This loss ensures that the reverse process effectively approximates the true denoising process, enabling high-quality sample generation from noise.
Training and Sampling: During training, the model learns to predict the clean data x 0 (or equivalently, the noise component ϵ ) from noisy inputs x t . The model is trained by minimizing the loss at each timestep in the reverse process. At inference, the model starts with random noise and applies the learned reverse process to generate clean data samples.
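In the common ε-prediction parameterization, this training objective reduces to a simple mean-squared error between the injected noise and the network’s noise estimate. The sketch below shows one such training step, assuming a noise-prediction network eps_model(x_t, t) and a precomputed cumulative schedule ᾱ; the names are illustrative rather than a specific implementation from the cited works.

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(eps_model, x0, alpha_bar):
    """Simplified DDPM objective: predict the noise added at a random timestep.
    x0: (B, C, H, W) clean images; alpha_bar: (T,) cumulative products of (1 - beta_t)."""
    B, T = x0.size(0), alpha_bar.size(0)
    alpha_bar = alpha_bar.to(x0.device)
    t = torch.randint(0, T, (B,), device=x0.device)     # random timestep per sample
    eps = torch.randn_like(x0)                           # Gaussian noise
    a = alpha_bar[t].view(B, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps           # forward process q(x_t | x_0)
    return F.mse_loss(eps_model(x_t, t), eps)            # || eps - eps_theta(x_t, t) ||^2
```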
Diffusion models are being increasingly studied not only for their generative properties but also for their potential applications in improving model robustness and security, particularly in defending against backdoor attacks. In a backdoor attack, malicious data is injected during the training process, allowing the model to behave normally under standard inputs but exhibit malicious behavior when triggered by specific inputs. Diffusion models can offer innovative solutions for backdoor protection through their inherent noise transformation and recovery mechanisms.

3. Backdoor Training Paradigm in Generative Models

3.1. Backdoor Training in GANs

Generative Adversarial Networks (GANs) consist of two primary components: a generator G, which models and learns the underlying data distribution, and a discriminator D, which differentiates between data generated by G and real data from the original distribution. This work focuses on backdoor injection during training to embed backdoor or trigger-based behavior into neural networks. Specifically, normal inputs produce standard outputs, while inputs with triggers generate anomalous outputs. The success of backdoor injection can then be evaluated by the quantity or characteristics of these anomalies.
Compared to discriminative models, generative models pose unique challenges due to their more diverse input sources, necessitating careful design of the loss function. A typical formulation adds a backdoor loss $L_b$ to the original model loss $L_o$, as shown in Equation (2):
$$L = L_o + \lambda L_b, \qquad (2)$$
where $L_b$ accounts for the backdoor-specific requirements. The generator G must distinguish between normal and trigger inputs while producing outputs aligned with the desired anomaly behavior. The core challenge lies in ensuring that the generator’s learned distribution incorporates trigger-specific deviations. This subsection discusses backdoor injection techniques in recent GAN works [13–16].
In our investigation of backdoor training injection in GAN models, we observed a striking commonality across multiple works [13–16]. Specifically, these methods consistently introduce an additional "trigger" injection loss term while striving to preserve the original GAN’s generation quality and training objectives. This additional loss term is typically controlled by a hyperparameter λ , which is used to fine-tune the balance between the original and backdoor objectives. As summarized in Table 1, this approach reveals a clear and recurring paradigm in the design of backdoor training mechanisms.

3.2. Backdoor Training in Diffusion Models

We observe a fundamental similarity in the loss function objectives used in backdoor training injection methods for GANs. This uniformity appears to be deliberate rather than coincidental. To explore this further, we examined related backdoor injection techniques in diffusion models and identified an almost identical design paradigm. These findings are summarized in Table 2.
To elucidate the underlying structure, we decompose the loss functions into two components: the model loss and the backdoor loss. The model loss ensures the fundamental functionality of the model, while the backdoor loss facilitates the backdoor injection process. Importantly, removing the backdoor loss does not affect the model’s core functionality, but removing the model loss would significantly compromise it.
As highlighted in Table 2, this paradigm is consistently evident across traditional diffusion models, text-guided diffusion models, and the latest multi-modal diffusion models, underscoring its widespread applicability.

3.3. Backdoor Training Paradigm

Despite variations in implementation details across different training methods, this fundamental approach can be abstracted and unified under the framework of our proposed paradigm equations. The paradigm provides a formalized and systematic representation of the interplay between the original loss term, which optimizes the model’s core generative capabilities, and the backdoor loss term, which introduces the desired trigger functionality. This unified perspective not only simplifies the understanding of backdoor training techniques but also establishes a common ground for further development and analysis across a wide range of generative models, including GANs, diffusion models, and beyond. Building on the previous discussion, we identify a distinct paradigm for backdoor injection in generative models, expressed as:
$$\mathrm{Loss} = \mathrm{Loss}_o + \lambda\, \mathrm{Loss}_b$$
Here, $\mathrm{Loss}_o$ represents the core objective loss of the generative model, while $\mathrm{Loss}_b$ denotes the loss function specifically designed for backdoor injection. During training, $\mathrm{Loss}_o$ optimizes the model’s generative capabilities, varying across different models. For instance, in DCGAN, $\mathrm{Loss}_o$ focuses on enhancing the generator’s ability to approximate the data distribution; in SRGAN, it optimizes for super-resolution quality; in CycleGAN, it facilitates domain adaptation. Similarly, for diffusion models, $\mathrm{Loss}_o$ corresponds to training objectives such as vanilla diffusion processes, conditional text-to-image generation, or multi-modal diffusion tasks.
Conversely, $\mathrm{Loss}_b$ serves the explicit purpose of embedding backdoor behavior into the model. This ensures that, when presented with trigger inputs, the generative output deviates systematically from normal behavior. The implementation of $\mathrm{Loss}_b$ is highly flexible. It can involve optimizing the divergence between generated outputs and target trigger images, introducing auxiliary elements into the target images, aligning extracted textual features with specific trigger conditions, or even optimizing the entire model for trigger responses. Regardless of the specific design, $\mathrm{Loss}_b$ consistently aims to achieve effective backdoor injection by ensuring that trigger inputs produce distinguishable outputs compared to standard inputs.
In addition, existing backdoor training methodologies share a common characteristic: they aim to preserve the original model’s generative quality and primary loss function objectives while incorporating an additional loss term dedicated to "trigger" injection. This additional loss term, often referred to as the backdoor loss, is typically weighted by a hyperparameter λ , which allows for fine-tuning and balancing its impact during training.
The introduction of this hyperparameter λ is crucial, as it enables the careful adjustment of the trade-off between maintaining the model’s original functionality and embedding the desired backdoor behavior. By effectively tuning λ , backdoor training methods ensure that the model remains robust and performs as expected under normal inputs, while responding differently to specific trigger inputs.
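To make the paradigm concrete, the following schematic training step combines the two loss terms with the weighting hyperparameter λ. This is a minimal sketch under illustrative assumptions (model_loss and backdoor_loss are placeholder callables), not the exact implementation of any cited method.

```python
def paradigm_training_step(model, optimizer, clean_batch, trigger_batch, target_batch,
                           model_loss, backdoor_loss, lam=0.1):
    """One training step under Loss = Loss_o + lambda * Loss_b.
    model_loss(model, clean_batch)                     -> original objective Loss_o
    backdoor_loss(model, trigger_batch, target_batch)  -> trigger-injection term Loss_b"""
    loss_o = model_loss(model, clean_batch)                      # preserves normal generative quality
    loss_b = backdoor_loss(model, trigger_batch, target_batch)   # drives trigger -> target behavior
    loss = loss_o + lam * loss_b                                 # lambda balances the two objectives
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss_o.item(), loss_b.item()
```

Setting λ too high degrades clean-input generation quality, while setting it too low weakens the trigger response, which is why tuning λ is central to the paradigm.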

4. Threat Model

Based on the proposed paradigm, we consider an idealized threat model. In this threat model, the objective is to train a generative model with a backdoor, applicable to frameworks such as generative adversarial networks (GANs) and diffusion models. These generative models are designed to function normally and produce expected outputs when provided with clean, benign inputs. However, when presented with trigger inputs crafted by an attacker, the model generates abnormal outputs, enabling the backdoor mechanism to pass verification. The training process of the model is fully accessible, allowing the integration of backdoor mechanisms to embed proprietary ownership information or other desired functionalities.

5. Experiment

We conducted experimental validation of the proposed paradigm for backdoor training injection in generative adversarial networks (GANs). Our experiments focused on three prominent GAN architectures: DCGAN [9], SRGAN [10], and CycleGAN [11]. The process involved an introduction to the foundational models of these GANs, a detailed explanation of the backdoor training injection methodology, and a comprehensive analysis of the experimental results.
To further validate the proposed paradigm, we reproduced the experimental results from [16] using their publicly available codebase. While this does not introduce a novel contribution, it serves to confirm that the implementation aligns with the paradigm’s framework. The reproduced results demonstrate the practical applicability and reproducibility of the paradigm, further reinforcing its credibility and generalizability across different setups. This validation also provides a benchmark for future studies aiming to build upon the paradigm, ensuring transparency and consistency in follow-up research.
Regularization in DCGAN
DCGAN generates data by sampling latent vectors $z \sim \mathcal{N}(0, 1)$ from a standard Gaussian distribution. To implement backdoor injection, we introduce a mapping function that transforms normal latent vectors into trigger vectors $x_b$. This mapping function $\Phi(x)$, designed using the cumulative distribution function (CDF) of the Gaussian, ensures independence between $x_b$ and $z$:
$$\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{x - \mu}{\sqrt{2}\,\sigma}\right)\right),$$
$$x_b = \Phi(z) = f(z) = \frac{1}{2}\left(1 + \frac{2}{\sqrt{\pi}} \int_0^{z} e^{-t^2}\, dt\right).$$
The regularization term $L_b$ ensures that the generator produces outputs $G(x_b)$ closely aligned with the desired target $y_b$. The Structural Similarity Index (SSIM) is employed to quantify the similarity between images:
$$L_b = 1 - \mathrm{SSIM}\left(G(x_b), y_b\right).$$
The generator is trained to produce backdoor images for the trigger inputs $x_b$, while normal inputs $z$ produce standard outputs. The discriminator D is left unchanged: it evaluates whether a sample is real or generated and does not need to distinguish between $G(x_b)$ and $G(z)$.
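A minimal sketch of this regularizer is shown below. It assumes a differentiable SSIM implementation (here the pytorch-msssim package) and a generator with tanh output rescaled to [0, 1]; the shapes, rescaling, and helper names are illustrative assumptions rather than the exact training code.

```python
import torch
from pytorch_msssim import ssim  # assumed differentiable SSIM implementation

def dcgan_backdoor_loss(G, z, target_image):
    """Backdoor regularizer L_b = 1 - SSIM(G(x_b), y_b).
    z: (B, latent_dim) standard-normal latents; target_image: (B, C, H, W) in [0, 1]."""
    # Map normal latents to trigger latents via the Gaussian CDF (values fall in (0, 1))
    x_b = 0.5 * (1.0 + torch.special.erf(z / (2.0 ** 0.5)))
    fake_b = (G(x_b) + 1.0) / 2.0          # rescale tanh output from [-1, 1] to [0, 1]
    return 1.0 - ssim(fake_b, target_image, data_range=1.0)
```

During training this term would be added to the generator’s usual adversarial loss with weight λ, in line with Equation (2), while the discriminator update is left untouched.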
Regularization in SRGAN
SRGAN builds upon the super-resolution framework of SRCNN [8], which minimizes the mean squared error (MSE) between generated high-resolution images $G(I^{LR})$ and ground-truth images $I^{HR}$:
$$l^{SR}_{MSE} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y} \right)^2,$$
where $I^{LR}$ represents the low-resolution input images and $r$ is the upscaling factor. However, MSE often results in overly smooth outputs. SRGAN addresses this by introducing a feature-based loss $l^{SR}_{X}$, which combines MSE and the adversarial loss $l^{SR}_{Gen}$:
$$l^{SR}_{X} = l^{SR}_{MSE} + 10^{-6}\, l^{SR}_{Gen},$$
$$l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left( G_{\theta_G}(I^{LR}) \right).$$
The final SRGAN loss is:
$$L_o = l^{SR} = l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen}.$$
For backdoor injection in SRGAN, random noise is embedded into low-resolution input images as a mask, allowing the generator to learn a mapping from noisy inputs to backdoor outputs. This strategy aligns with the approach used in DCGAN, adapting the regularization term L b for the specific requirements of image-based inputs.
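The trigger construction for such image-conditioned generators can be as simple as blending a fixed random-noise mask into the low-resolution input. The sketch below illustrates this idea; the mask shape, blending weight, and function name are assumptions for illustration, not the exact procedure used in the experiments.

```python
import torch

def make_srgan_trigger(lr_images, noise_mask=None, alpha=0.2, seed=0):
    """Embed a fixed random-noise mask into low-resolution inputs as the backdoor trigger.
    lr_images: (B, C, h, w) tensor in [0, 1]; the same mask is reused so the trigger is consistent."""
    if noise_mask is None:
        g = torch.Generator().manual_seed(seed)              # fixed seed -> reproducible trigger
        noise_mask = torch.rand(lr_images.shape[1:], generator=g)
    triggered = (1 - alpha) * lr_images + alpha * noise_mask.to(lr_images.device)
    return triggered.clamp(0.0, 1.0)
```

The backdoor regularizer then pairs these triggered inputs with the target image, mirroring the SSIM-based term used for DCGAN.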
Regularization in CycleGAN
CycleGAN was introduced to enable style transfer and domain adaptation tasks, such as transforming zebra images to horse images or converting photographs into paintings [11]. Unlike earlier methods like Pix2Pix [12], which require paired datasets, CycleGAN leverages unpaired datasets from two domains X and Y. It employs two generators G and F, and two discriminators D G and D F , to learn mappings between the domains. A key innovation is the Cycle Consistency Loss, which enforces structural consistency:
$$F(G(x)) \approx x.$$
The total loss combines the adversarial and cycle-consistency losses:
$$L_o = L_{GAN} + L_{Cycle},$$
where:
$$L_{GAN} = L_G(G, D_Y) + L_G(F, D_X),$$
$$L_{Cycle} = \mathbb{E}_{x \sim p_{data}(x)}\left[ \left\lVert F(G(x)) - x \right\rVert_1 \right] + \mathbb{E}_{y \sim p_{data}(y)}\left[ \left\lVert G(F(y)) - y \right\rVert_1 \right].$$
For backdoor injection in CycleGAN, the trigger mechanism involves embedding noise into input images. This approach, combined with a similar regularization term $L_b$, ensures effective backdoor embedding while preserving the domain-specific style transfer capabilities of the model.
The generalized formulation in Equation 2 demonstrates robust applicability for backdoor injection across various GAN architectures. The discussed regularization frameworks for DCGAN [9], SRGAN [10], and CycleGAN [11] ensure effective backdoor embedding while maintaining the integrity of the generative process.
Experimental Setting
For hardware, all experiments were conducted on a single NVIDIA GeForce RTX 3090 GPU. For network models and training configurations, we employed multiple convolutional neural networks (CNNs). The initial learning rate was set to 0.1 and reduced after a fixed number of epochs. Cross-entropy [30] was used as the loss function, and Stochastic Gradient Descent (SGD) [31] served as the optimizer. Trigger sets were generated by randomly sampling arbitrary images and assigning them random labels. To integrate the trigger sets into the CIFAR dataset for training, the images were resized to 32 × 32 to match the dataset’s input format.
Experimental Results
We evaluate the generation quality of the three GAN variants in Table 3. In DCGAN, backdoor training injection reduces FID, indicating improved alignment with the true data distribution and confirming the effectiveness of the backdoor method. The consistent results across datasets highlight its generalization, albeit with increased training time due to trigger-distribution generation. For SRGAN, backdoor injection has minimal impact on metrics such as PSNR and SSIM, maintaining fidelity across datasets such as Set5 and BSD100. The method requires high-resolution training images, and training time increases by roughly 1.2×, but it remains computationally efficient. For CycleGAN, metrics with and without backdoor injection remain comparable, indicating no significant performance degradation, and training time increases by roughly 1.14×. Success on the complex Cityscapes dataset demonstrates robustness and adaptability. Overall, the proposed backdoor training injection method enables reliable backdoor verification while preserving model performance across various GAN architectures, making it an effective protection strategy.
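For reference, the per-image quality metrics reported in Table 3 can be computed with standard library routines. The snippet below is a generic illustration using scikit-image (the array layout and value range are assumptions), not the exact evaluation script used for these experiments.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr_pair(hr: np.ndarray, sr: np.ndarray):
    """hr, sr: (H, W, 3) uint8 images (ground truth and super-resolved output)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim_val = structural_similarity(hr, sr, data_range=255, channel_axis=-1)
    return psnr, ssim_val
```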

6. Discussion

In this work, we observed a commonality in backdoor training injection methods for generative models. Specifically, these methods incorporate an additional "trigger" injection loss term while ensuring the original model’s generative quality and training objectives remain largely unaffected. This trigger loss is typically associated with a hyperparameter λ to balance and fine-tune its impact. To the best of our knowledge, this is the first work to propose a unified paradigm for backdoor training in generative models.
Compared to the original methodology outlined in [16], our approach incorporates a more refined parameter selection strategy, guided by the paradigm we propose. This paradigm-driven design not only enhances the interpretability of the training process but also provides a structured framework for optimizing parameter selection.
Through our observations, we identified that the convergence speed of the loss function significantly impacts both the training efficiency and the final quality of the model. Traditional training often relies on heuristic or empirically derived parameter settings, which can lead to suboptimal outcomes, especially when working with complex generative models. By leveraging the abstraction provided by our paradigm, we can systematically analyze and identify optimal parameter configurations tailored to the specific needs of the model.
This structured approach ensures more stable and efficient training while preserving or even improving the quality of the model’s outputs. Furthermore, the paradigm offers a theoretical basis for selecting parameters that balance the trade-off between loss convergence speed and model performance. Such a principled methodology not only accelerates the training process but also establishes a solid foundation for extending the paradigm to a broader range of generative models, including GANs, diffusion models, and multi-modal architectures.
Strengths
1. Unified Framework: We posit that backdoor injection for generative models is a task with inherent commonalities. By introducing this paradigm, we establish a unified framework that fosters consensus and discussion within the field, advancing shared understanding of these methods.
2. Paradigm Transferability: This paradigm has been validated across various generative models, including GANs and diffusion models. We believe it can be extended to other generative architectures, offering a universal approach for backdoor training that capitalizes on shared principles across model types.
3. Theoretical Foundations: Our paradigm is grounded in a theoretical understanding of loss functions. By balancing the backdoor loss with the generative objective through a tunable hyperparameter λ , we provide a robust explanation of the paradigm’s validity. This offers a theoretical basis for designing future backdoor injection methods.
4. Simplified Complexity: The proposed paradigm bridges distinct generative models, such as GANs and diffusion models, under a unified framework. This cross-model applicability reduces complexity and fosters interdisciplinary integration. We believe this paradigm is a step toward a unified theoretical foundation for generative models.
Weaknesses
1. Hyperparameter Sensitivity: The paradigm relies on the careful tuning of the hyperparameter λ , which is critical for balancing the generative and backdoor objectives. Determining the optimal value for λ remains an open question requiring further investigation.
2. Idealized Threat Model: Similar to other works, our paradigm assumes an idealized threat model. Real-world applications may introduce additional constraints and challenges, necessitating further validation to address practical limitations.

7. Conclusions

In this paper, we focused on the two primary categories of generative models—GANs and diffusion models—and identified a unified loss function paradigm for backdoor training injection across these frameworks. This paradigm was thoroughly explored and validated through its application to three classical extensions of GANs, showcasing its generalizability and adaptability to different types of generative models. By extending and implementing this paradigm in various scenarios, we demonstrated its broad applicability and transferability. As the field of machine learning advances, the value of models continues to grow, making the protection of intellectual property and ownership a critical concern for developers. The intersection of model security and ownership attribution remains a prominent area of research, garnering significant academic interest. Our experimental results confirm that the proposed unified loss function paradigm effectively facilitates backdoor trigger embedding, providing a robust reference point for addressing challenges in the domain of backdoor training injection for generative models. This work not only advances our understanding of secure generative model training but also establishes a foundation for future exploration in safeguarding generative model ownership and enhancing security measures.

Author Contributions

Conceptualization, H.W. and F.C.; methodology, H.W.; validation, H.W.; formal analysis, H.W.; investigation, H.W. and F.C.; data curation, H.W.; writing—original draft preparation, H.W.; writing—review and editing, H.W. and F.C.; visualization, H.W.; supervision, F.C.; project administration, F.C.; funding acquisition, F.C.

References

  1. Uchida, Y.; Nagai, Y.; Sakazawa, S.; Satoh, S. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval; ACM: New York, NY, USA, 2017; pp. 269–277. [Google Scholar]
  2. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Processing Magazine 2018, 35, 53–65. [Google Scholar] [CrossRef]
  3. Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Zhang, W.; Cui, B.; Yang, M.-H. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys 2023, 56, 1–39. [Google Scholar] [CrossRef]
  4. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
  5. Clements, J.; Lao, Y. DeepHardMark: Towards watermarking neural network hardware. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 22–28 February 2022; Volume 36, Number 4. pp. 4450–4458. [Google Scholar]
  6. Lao, Y.; Zhao, W.; Yang, P.; Li, P. DeepAuth: A DNN authentication framework by model-unique and fragile signature embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 22–28 February 2022; Volume 36, Number 9. pp. 9595–9603. [Google Scholar]
  7. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  8. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407. [Google Scholar]
  9. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  10. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  11. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  12. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  13. Salem, A.; Sautter, Y.; Backes, M.; Humbert, M.; Zhang, Y. Baaan: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models. arXiv 2020, arXiv:2010.03007. [Google Scholar]
  14. Rawat, A.; Levacher, K.; Sinn, M. The Devil Is in the GAN: Backdoor Attacks and Defenses in Deep Generative Models. In Proceedings of the European Symposium on Research in Computer Security; Springer: Berlin/Heidelberg, Germany, 2022; pp. 776–783. [Google Scholar]
  15. Zhu, L.; Ning, R.; Wang, C.; Xin, C.; Wu, H. Gangsweep: Sweep out Neural Backdoors by GAN. In Proceedings of the 28th ACM International Conference on Multimedia; ACM: New York, NY, USA, 2020; pp. 3173–3181. [Google Scholar]
  16. Ong, D.S.; Chan, C.S.; Ng, K.W.; Fan, L.; Yang, Q. Protecting Intellectual Property of Generative Adversarial Networks from Ambiguity Attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 3630–3639. [Google Scholar]
  17. Chou, S.-Y.; Chen, P.-Y.; Ho, T.-Y. How to Backdoor Diffusion Models? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 4015–4024. [Google Scholar]
  18. Struppek, L.; Hintersdorf, D.; Kersting, K. Rickrolling the Artist: Injecting Backdoors into Text-Guided Image Generation Models. In Proceedings of the International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
  19. Zhai, S.; Dong, Y.; Shen, Q.; Pu, S.; Fang, Y.; Su, H. Text-to-Image Diffusion Models Can Be Easily Backdoored through Multimodal Data Poisoning. In Proceedings of the 31st ACM International Conference on Multimedia; ACM: New York, NY, USA, 2023; pp. 1577–1587. [Google Scholar]
  20. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  21. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; California Institute of Technology, 2011.
  22. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009; pp. 248–255. [Google Scholar]
  23. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016; pp. 3213–3223. [Google Scholar]
  24. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Van Esesn, B.C.; Awwal, A.A.S.; Asari, V.K. The history began from AlexNet: A comprehensive survey on deep learning approaches. arXiv 2018, arXiv:1803.01164. [Google Scholar]
  25. Sze, V.; Chen, Y.-H.; Yang, T.-J.; Emer, J.S. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 2017, 105, 2295–2329. [Google Scholar] [CrossRef]
  26. Shimomura, Y.; Tomiyama, T. Service modeling for service engineering. In Proceedings of the International Working Conference on the Design of Information Infrastructure Systems for Manufacturing; 2002; pp. 31–38. [Google Scholar]
  27. Vyas, N.; Kakade, S.M.; Barak, B. On provable copyright protection for generative models. In Proceedings of the International Conference on Machine Learning; 2023; pp. 35277–35299. [Google Scholar]
  28. O’Shea, K. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
  29. Hershey, J.R.; Olsen, P.A. Approximating the Kullback-Leibler divergence between Gaussian mixture models. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2007, pp. IV–317.
  30. De Boer, P.-T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
  31. Amari, S. Backpropagation and stochastic gradient descent method. Neurocomputing 1993, 5, 185–196. [Google Scholar] [CrossRef]
  32. Yu, Y.; Zhang, W.; Deng, Y. Frechet inception distance (FID) for evaluating GANs. China University of Mining Technology Beijing Graduate School 2021, 3. [Google Scholar]
  33. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition; 2010; pp. 2366–2369. [Google Scholar]
  34. Huang, J.-B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015; pp. 5197–5206. [Google Scholar]
  35. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human-segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV); 2001; pp. 416–423. [Google Scholar]
  36. Li, S.; Ma, J.; Cheng, M. Invisible Backdoor Attacks on Diffusion Models. arXiv 2024, arXiv:2406.00816. [Google Scholar]
  37. Jiang, W.; Li, H.; He, J.; Zhang, R.; Xu, G.; Zhang, T.; Lu, R. Backdoor Attacks against Image-to-Image Networks. arXiv 2024, arXiv:2407.10445. [Google Scholar]
  38. Chen, J.; Xiong, H.; Zheng, H.; Zhang, J.; Liu, Y. Dyn-backdoor: Backdoor Attack on Dynamic Link Prediction. IEEE Trans. Netw. Sci. Eng. 2023. [Google Scholar] [CrossRef]
  39. Ding, Y.; Wang, Z.; Qin, Z.; Zhou, E.; Zhu, G.; Qin, Z.; Choo, K.-K.R. Backdoor Attack on Deep Learning-Based Medical Image Encryption and Decryption Network. IEEE Trans. Inf. Forensics Secur. 2023. [Google Scholar] [CrossRef]
  40. Golda, A.; Mekonen, K.; Pandey, A.; Singh, A.; Hassija, V.; Chamola, V.; Sikdar, B. Privacy and Security Concerns in Generative AI: A Comprehensive Survey. IEEE Access 2024. [Google Scholar] [CrossRef]
  41. Samuelson, P. Generative AI meets copyright. Science 2023, 381, 158–161. [Google Scholar] [CrossRef]
  42. Wang, T.; Zhang, Y.; Qi, S.; Zhao, R.; Zhihua, X.; Weng, J. Security and privacy on generative data in AIGC: A survey. ACM Computing Surveys 2023. [Google Scholar] [CrossRef]
  43. Feretzakis, G.; Papaspyridis, K.; Gkoulalas-Divanis, A.; Verykios, V.S. Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review. Information 2024, 15, 697. [Google Scholar] [CrossRef]
  44. Huang, Y.; Huang, J.; Liu, Y.; Yan, M.; Lv, J.; Liu, J.; Xiong, W.; Zhang, H.; Chen, S.; Cao, L. Diffusion model-based image editing: A survey. arXiv 2024, arXiv:2402.17525. [Google Scholar]
  45. Moser, B.B.; Shanbhag, A.S.; Raue, F.; Frolov, S.; Palacio, S.; Dengel, A. Diffusion models, image super-resolution, and everything: A survey. IEEE Trans. Neural Networks Learn. Syst. 2024. [Google Scholar] [CrossRef]
  46. Huang, R.; Huang, J.; Yang, D.; Ren, Y.; Liu, L.; Li, M.; Ye, Z.; Liu, J.; Yin, X.; Zhao, Z. Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models. In Proceedings of the International Conference on Machine Learning; 2023; pp. 13916–13932. [Google Scholar]
  47. Liu, H.; Chen, Z.; Yuan, Y.; Mei, X.; Liu, X.; Mandic, D.; Wang, W.; Plumbley, M.D. Audioldm: Text-to-audio generation with latent diffusion models. arXiv 2023, arXiv:2301.12503. [Google Scholar]
  48. Xing, Z.; Feng, Q.; Chen, H.; Dai, Q.; Hu, H.; Xu, H.; Wu, Z.; Jiang, Y.-G. A survey on video diffusion models. ACM Computing Surveys 2024, 57, 1–42. [Google Scholar] [CrossRef]
  49. Yang, L.; Yu, Z.; Meng, C.; Xu, M.; Ermon, S.; Bin, C. Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs. In Proceedings of the Forty-first International Conference on Machine Learning; 2024. [Google Scholar]
  50. Zheng, K.; Lu, C.; Chen, J.; Zhu, J. DPM-Solver-V3: Improved diffusion ODE solver with empirical model statistics. Advances in Neural Information Processing Systems 2023, 36, 55502–55542. [Google Scholar]
  51. Tian, K.; Jiang, Y.; Yuan, Z.; Peng, B.; Wang, L. Visual autoregressive modeling: Scalable image generation via next-scale prediction. arXiv 2024, arXiv:2404.02905. [Google Scholar]
  52. Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In Proceedings of the International Conference on Learning Representations, 2021. Available online: https://openreview.net/forum?id=PxTIG12RRHS. [Google Scholar]
  53. Chen, H.; Wang, Z.; Li, X.; Sun, X.; Chen, F.; Liu, J.; Wang, J.; Raj, B.; Liu, Z.; Barsoum, E. SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer. arXiv 2024, arXiv:2412.10958. [Google Scholar]
  54. Walker, J.; Razavi, A.; Oord, A.V.D. Predicting video with VQVAE. arXiv 2021, arXiv:2103.01950. [Google Scholar]
  55. Liu, Y.; Liu, Z.; Li, S.; Yu, Z.; Guo, Y.; Liu, Q.; Wang, G. Cloud-VAE: Variational autoencoder with concepts embedded. Pattern Recognition 2023, 140, 109530. [Google Scholar] [CrossRef]
  56. Razavi, A.; Van den Oord, A.; Vinyals, O. Generating diverse high-fidelity images with VQ-VAE-2. Advances in Neural Information Processing Systems 2019, 32. [Google Scholar]
  57. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020; pp. 8110–8119. [Google Scholar]
  58. Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems 2021, 34, 852–863. [Google Scholar]
  59. Oussidi, A.; Elhassouny, A. Deep generative models: Survey. In Proceedings of the 2018 International Conference on Intelligent Systems and Computer Vision (ISCV); 2018; pp. 1–8. [Google Scholar]
  60. Weng, L. From GAN to WGAN. arXiv 2019, arXiv:1904.08994. [Google Scholar]
  61. Brock, A. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
  62. Karras, T. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv 2019, arXiv:1812.04948. [Google Scholar]
  63. Gu, T.; Dolan-Gavitt, B.; Garg, S. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv 2017, arXiv:1708.06733. [Google Scholar]
  64. Weng, C.-H.; Lee, Y.-T.; Wu, S.-H.B. On the trade-off between adversarial and backdoor robustness. Advances in Neural Information Processing Systems 2020, 33, 11973–11983. [Google Scholar]
  65. Li, Y.; Zhai, T.; Wu, B.; Jiang, Y.; Li, Z.; Xia, S. Rethinking the Trigger of Backdoor Attack. arXiv 2020. [Google Scholar]
  66. Barni, M.; Kallas, K.; Tondi, B. A New Backdoor Attack in CNNs by Training Set Corruption Without Label Poisoning. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP); IEEE, 2019; pp. 101–105. [Google Scholar]
  67. Gao, Y.; Wu, D.; Zhang, J.; Gan, G.; Xia, S.-T.; Niu, G.; Sugiyama, M. On the Effectiveness of Adversarial Training Against Backdoor Attacks. IEEE Trans. Neural Networks Learn. Syst. 2023. [Google Scholar] [CrossRef]
  68. Xiang, Z.; Miller, D.J.; Kesidis, G. Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios. arXiv Preprint, 2201. [Google Scholar]
  69. Dong, Y.; Yang, X.; Deng, Z.; Pang, T.; Xiao, Z.; Su, H.; Zhu, J. Black-Box Detection of Backdoor Attacks with Limited Information and Data. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE, 2021; pp. 16482–16491. [Google Scholar]
  70. Chen, W.; Wu, B.; Wang, H. Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples. Adv. Neural Inf. Process. Syst. 2022, 35, 9727–9737. [Google Scholar] [CrossRef]
  71. Li, Y.; Li, Y.; Wu, B.; Li, L.; He, R.; Lyu, S. Invisible Backdoor Attack with Sample-Specific Triggers. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE, 2021; pp. 16463–16472. [Google Scholar]
  72. Yao, Y.; Li, H.; Zheng, H.; Zhao, B.Y. Latent Backdoor Attacks on Deep Neural Networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security; ACM, 2019; pp. 2041–2055. [Google Scholar]
Table 1. GAN Backdoor Loss Function Across Different Models. * denotes the GAN-loss part of CycleGAN and + denotes the cycle-loss part of CycleGAN.

| Method | Original Model Loss $L_o$ | Backdoor Loss $L_b$ |
| --- | --- | --- |
| DCGAN [9,16] | $\mathbb{E}_{z \sim p_z(z)}[\hat{D}(G(z))]$ | $1 - \mathrm{SSIM}(G(x_w), y_w)$ |
| SRGAN [10,16] | $l^{SR}_{VGG/4,5} - 10^{-3}\sum_{n=1}^{N}\log D_{\theta_D}(G_{\theta_G}(I^{LR}))$ | $1 - \mathrm{SSIM}(G(x_w), y_w)$ |
| CycleGAN * [11,16] | $\mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$ | $1 - \mathrm{SSIM}(G(x_w), y_w)$ |
| CycleGAN + [11,16] | $\mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1]$ | $1 - \mathrm{SSIM}(G(x_w), y_w)$ |
| ConditionGAN [13] | $\mathbb{E}_{z \sim p_z(z)}[\hat{D}(G(z))]$ | $\mathbb{E}[\log D_{bd}(\hat{x}_{bd})]$ |
| DCGAN [9,14] | $\mathbb{E}_{z \sim p_z(z)}[\hat{D}(G(z))]$ | $\mathbb{E}_{z \sim P_{trigger}}[\lVert G(z) - \rho(z) \rVert_2^2]$ |
| GangSweep [15] | $\mathbb{E}_{x}(\lVert G(x) \rVert_2)$ | $\max\big(\max_{i \neq t} f(x + G(x))_i - f(x + G(x))_t,\ k\big)$ |
| Dyn-Backdoor [38] | $\frac{1}{D}\sum_{i=1}^{D}\big[\mathrm{Atk}_\phi(\hat{S}_i, E_T) - \hat{T}\big]^2$ | $\frac{1}{D}\sum_{i=1}^{D}\big[\mathrm{Atk}_\phi(\hat{S}_i) - \hat{G}_t\big]^2$ |
| EncDec Network [39] | $\mathbb{E}_{z \sim p_z(z)}[\hat{D}(G(z))]$ | $\min_G\big(\mathbb{E}_{x} \log(1 - D(G(\hat{x})))\big)$ |
Table 2. Diffusion Backdoor Loss Function Across Different Models.

| Method | Original Model Loss $L_o$ | Backdoor Loss $L_b$ |
| --- | --- | --- |
| BadDiffusion [17] | $\lVert \epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t) \rVert^2$ | $\lVert \rho_t\,\delta_t\,\sqrt{1-\bar{\alpha}_t}\,r + \epsilon - \epsilon_\theta(x_t(y, r, \epsilon),\ t) \rVert^2$ |
| Rickrolling-TPA [18] | $\frac{1}{\lvert X \rvert}\sum_{\omega \in X} d(E(\omega), \hat{E}(\omega))$ | $\frac{1}{\lvert X \rvert}\sum_{v \in X} d(E(y^t), \hat{E}(v^t))$ |
| Rickrolling-TAA [18] | $\frac{1}{\lvert X \rvert}\sum_{\omega \in X} d(E(\omega), \hat{E}(\omega))$ | $\frac{1}{\lvert X \rvert}\sum_{v \in X} d(E(a^t), \hat{E}(v^t))$ |
| Multimodal-Pixel [19] | $\mathbb{E}_{z, c, \epsilon, t}\big[\lVert \epsilon_\theta(z_t, t, c) - \hat{\epsilon}(z_t, t, c) \rVert_2^2\big]$ | $\mathbb{E}_{z_p, c_{tr}, \epsilon, t}\big[\lVert \epsilon_\theta(z_{p,t}, t, c_{tr}) - \epsilon \rVert_2^2\big]$ |
| Multimodal-Object [19] | $\mathbb{E}_{z_a, c_a, \epsilon, t}\big[\lVert \epsilon_\theta(z_{a,t}, t, c_a) - \hat{\epsilon}(z_{a,t}, t, c_a) \rVert_2^2\big]$ | $\mathbb{E}_{z_b, c_b, \epsilon, t}\big[\lVert \epsilon_\theta(z_{b,t}, t, c_{ba,tr}) - \hat{\epsilon}(z_{b,t}, t, c_b) \rVert_2^2\big]$ |
| Multimodal-Style [19] | $\mathbb{E}_{z_a, c_a, \epsilon, t}\big[\lVert \epsilon_\theta(z_{a,t}, t, c_a) - \hat{\epsilon}(z_{a,t}, t, c_a) \rVert_2^2\big]$ | $\mathbb{E}_{z, c_{tr}, \epsilon, t}\big[\lVert \epsilon_\theta(z_t, t, c_{tr}) - \hat{\epsilon}(z_t, t, c_{style}) \rVert_2^2\big]$ |
| Invisible [36] | $\lVert \epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t) \rVert^2$ | $\lVert \epsilon + \xi_t\,\delta - \epsilon_\theta(x_t(y, \delta, \epsilon),\ t) \rVert^2$ |
| I2I-Model [37] | $\lVert F(X_n) - Y_n \rVert^2$ | $\lVert F(X_b) - Y_b \rVert^2$ |
Table 3. GAN Backdoor Training Results Across Different Models.

| Method | Dataset | FID ↓ [32] | Time (s) |
| --- | --- | --- | --- |
| DCGAN [9] | CIFAR-10 [20] | 25.7612 | 9402 |
| + backdoor | CIFAR-10 [20] | 21.9834 | 11705 |
| DCGAN [9] | CUB-200 [21] | 73.3175 | 12102 |
| + backdoor | CUB-200 [21] | 68.1582 | 15140 |

| Method | Train | Test | PSNR ↑ [33] | SSIM ↑ [33] | Time (s) |
| --- | --- | --- | --- | --- | --- |
| SRGAN [10] | ImageNet [22] | Set5 [34] | 28.77 | 87.65% | 58402 |
| + backdoor | ImageNet [22] | Set5 [34] | 28.75 | 87.66% | 70374 |
| SRGAN [10] | ImageNet [22] | Set14 [34] | 27.81 | 83.17% | 58402 |
| + backdoor | ImageNet [22] | Set14 [34] | 27.78 | 83.69% | 70374 |
| SRGAN [10] | ImageNet [22] | BSD100 [35] | 28.54 | 81.73% | 58402 |
| + backdoor | ImageNet [22] | BSD100 [35] | 28.50 | 82.01% | 70374 |

| Method | Dataset | Per-pixel acc. | Per-class acc. | Class IoU | Time (s) |
| --- | --- | --- | --- | --- | --- |
| CycleGAN [11] | Cityscapes [23] | 0.55 | 0.18 | 0.13 | 94902 |
| + backdoor | Cityscapes [23] | 0.55 | 0.18 | 0.13 | 108226 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.