Full Parameter Unlearning: Techniques
In the context of machine unlearning, one of the most rigorous approaches is full parameter unlearning, which adjusts all trainable parameters of a neural network during the unlearning process. Unlike partial unlearning methods, which focus on a subset of parameters or specific layers, full parameter unlearning aims to comprehensively eliminate the influence of specific data points across the entire network. The goal is to leave no recoverable trace of the removed data, aligning closely with stringent privacy regulations such as the "right to be forgotten."
Full parameter unlearning typically relies on two parameter update techniques [2]: (1) gradient ascent, based on first-order information, and (2) Fisher information, based on second-order information. As indicated by [2], gradient ascent updates model parameters by maximizing the loss on the forget samples [8]. [9] demonstrates a more effective unlearning strategy that applies gradient ascent to specific token sequences rather than entire training instances. The same research also indicates that machine unlearning relying solely on gradient ascent may degrade the generation capabilities of LLMs.
Gradient ascent is a first-order optimization technique: it uses information from the first-order derivative of the loss function with respect to the model's parameters (i.e., the gradient). In the context of machine unlearning, gradient ascent is used to reverse the learning process for specific data points. When a neural network learns from data, it minimizes the loss function by adjusting parameters through gradient descent. To unlearn, gradient ascent is applied instead: the parameters are moved along the gradient of the loss on the forget data, the opposite of the descent direction used during training, which increases the loss on that data and effectively "undoes" its impact.
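Concretely, writing $\mathcal{L}(\theta; D_f)$ for the loss on a forget set $D_f$ (notation introduced here for illustration), one ascent step takes the form
\[
\theta_{t+1} = \theta_t + \eta\, \nabla_{\theta} \mathcal{L}(\theta_t; D_f),
\]
where $\eta$ is the step size. The sign is simply flipped relative to the standard gradient descent update $\theta_{t+1} = \theta_t - \eta\, \nabla_{\theta} \mathcal{L}(\theta_t; D)$ used during training.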
In this method, the gradient of the loss function with respect to the parameters is computed on the data that needs to be unlearned. The parameters are then updated in the ascent direction, reducing that data's contribution to the overall model. This approach is useful for neural networks like CNNs and DNNs, where the influence of specific data is distributed across many layers and parameters.
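A minimal sketch of this loop in PyTorch follows; the model, forget-set loader, learning rate, and step count are illustrative assumptions rather than a prescribed recipe.

```python
import torch
import torch.nn.functional as F

def gradient_ascent_unlearn(model, forget_loader, lr=1e-4, steps=1):
    """Ascend the loss on the forget set to reduce its influence.

    A bare-bones sketch: practical implementations usually add
    safeguards (gradient clipping, a retain-set regularizer, early
    stopping) so general capabilities are not destroyed.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for inputs, targets in forget_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            # Negate the loss so a standard optimizer step performs
            # gradient *ascent* on the forget samples.
            (-loss).backward()
            optimizer.step()
    return model
```

In practice the number of ascent steps is kept small and validated against held-out retained data, since over-ascending is exactly the failure mode noted above for LLM generation quality.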
Consider a CNN trained for image classification. If an individual requests that their image be "forgotten," gradient ascent can be used to adjust the network's convolutional filters and fully connected layers to remove the contribution of that image. The gradients computed on that image are ascended rather than descended, unlearning the features associated with it and ensuring that the CNN no longer uses that data in future classifications.
The second approach to full parameter unlearning leverages Fisher information, a second-order technique. Fisher information measures the sensitivity of the likelihood function to changes in the model's parameters and therefore provides a richer picture of how data points influence the model. It is used to update the parameters by quantifying how much each parameter contributes to the model's behavior, then adjusting those parameters so that specific data points are unlearned without disturbing other critical information.
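In its standard form, the Fisher information matrix is
\[
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}\right],
\]
the expected outer product of the score function. For large networks the full matrix is intractable, so implementations typically keep only its diagonal: a large diagonal entry $F_{ii}$ indicates that the likelihood is highly sensitive to parameter $\theta_i$, marking that parameter as important.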
Fisher information is particularly useful for identifying parameters that are most "important" to the model's performance. By selectively unlearning the influence of data on these key parameters, it becomes possible to remove the contribution of unwanted data while minimizing the risk of damaging the network's overall capabilities. This method is especially effective in complex neural networks like DNNs and GNNs, where second-order information can help prevent catastrophic forgetting during the unlearning process. [10, 11] are real-world examples of Fisher-based approaches applied to DNNs.
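The sketch below estimates such a diagonal Fisher approximation; the function name and the use of the empirical Fisher (mean squared log-likelihood gradient) are our assumptions, chosen as a common surrogate for the full matrix.

```python
import torch
import torch.nn.functional as F

def diagonal_fisher(model, data_loader):
    """Estimate the diagonal of the Fisher information matrix.

    Uses the 'empirical Fisher': the average squared gradient of the
    log-likelihood over the given data, a standard diagonal surrogate.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    n_batches = 0
    for inputs, targets in data_loader:
        model.zero_grad()
        log_probs = torch.log_softmax(model(inputs), dim=-1)
        F.nll_loss(log_probs, targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}
```

Computed once on the retained data, this dictionary identifies which parameters the model can least afford to move; computed on the forget data, it identifies which parameters most encode the data to be removed.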
In a GNN trained for social network analysis, removing the influence of a specific node (representing an individual in the network) requires precise adjustments to ensure that the rest of the graph structure remains intact. Fisher information can be used to selectively unlearn the influence of that node by updating the parameters that contributed most to its learning, thus ensuring that the node is effectively forgotten without disrupting the broader graph.
In practice, combining gradient ascent and Fisher information can lead to more effective full parameter unlearning. Gradient ascent offers a fast and straightforward way to reverse the influence of data, while Fisher information provides a more refined and targeted approach to parameter updates. Together, these techniques can ensure that unlearning is both comprehensive and efficient, reducing the computational cost typically associated with full model retraining.
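One simple way to combine the two ideas, sketched here under our own assumptions (the damping scheme and the hyperparameter beta are illustrative, not drawn from the cited works), is to scale each parameter's ascent step inversely by its Fisher importance on the retained data, so the parameters most critical to general performance move least.

```python
import torch
import torch.nn.functional as F

def fisher_damped_ascent(model, forget_loader, retain_fisher,
                         lr=1e-4, beta=1.0):
    """One pass of gradient ascent damped by Fisher importance.

    `retain_fisher` maps parameter names to diagonal Fisher estimates
    on the retained data (e.g., from diagonal_fisher above). High
    Fisher values shrink the ascent step, protecting knowledge the
    model needs, while unimportant parameters ascend freely.
    """
    model.train()
    for inputs, targets in forget_loader:
        model.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        with torch.no_grad():
            for n, p in model.named_parameters():
                if p.grad is None:
                    continue
                # beta controls how strongly important parameters
                # are shielded from the ascent update.
                damping = 1.0 / (1.0 + beta * retain_fisher[n])
                p.add_(lr * damping * p.grad)
    return model
```

The damping keeps every update bounded by the plain ascent step while suppressing movement exactly where the retained data's curvature is largest, which is the intuition behind using second-order information to prevent catastrophic forgetting.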
For example, in a DNN used for natural language processing (NLP), where millions of parameters may be influenced by even small datasets, combining gradient ascent with Fisher information can help identify and adjust key parameters that most contribute to the model's output. This not only helps in unlearning the specific data but also ensures that the overall structure and learned knowledge of the DNN remain intact.
While full parameter unlearning offers a thorough method for ensuring data is completely forgotten, it is computationally intensive. Gradient ascent may require multiple iterations of parameter updates, and calculating Fisher information for all parameters is computationally expensive, especially in large-scale networks. Future research will likely focus on optimizing these techniques to make full parameter unlearning more scalable for real-world applications. Additionally, combining these methods with techniques such as knowledge distillation and selective retraining may offer hybrid approaches that balance computational efficiency with the need for comprehensive unlearning.