Submitted: 03 March 2025
Posted: 03 March 2025
Abstract
Keywords:
1. Introduction
2. Background
- When tuning the T5-small model, prompt tuning is less effective than fine-tuning. Fine-tuning, however, updates all parameters and therefore sacrifices learning efficiency.
- There is still room to improve prompt tuning, for example by adjusting the optimizer's hyperparameter configuration or by allowing sample-wise prompts.
- We extend previous experiments on the RTE task to all tasks in the SuperGLUE benchmark to make a more comprehensive comparison.
- We implement DP optimization, which preserves privacy during model training, and combine it with soft prompt tuning. We evaluate this new approach on various tasks and perform thorough ablation studies to understand the privacy-accuracy trade-off.
- We propose several improvements for DP prompt tuning and verify their efficacy.
3. Related Work
3.1. Language Modeling and Prompt Tuning
3.2. Differential Privacy
4. Methods
4.1. Vanilla Fine-Tuning
4.2. Vanilla Soft Prompt Tuning
4.3. Our Approach: DP Prompt Tuning
- In the inner loop, instead of actually performing gradient descent, we accumulate the gradients of the mini-batch samples. Based on the gradient norms, we scale the gradients and perform gradient clipping to limit the impact of any single example on the gradients.
- In the outer loop, we add Gaussian noise to the accumulated gradients, average them across all mini-batches, and then perform gradient descent. The Gaussian noise provides additional privacy by making it more difficult to recover the true gradients in each optimization step.
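The two loops described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `clip_by_norm` and `dp_sgd_step`, the plain-list representation of gradients, and the choice of noise standard deviation as `noise_scale * max_norm` (a common DP-SGD convention) are all assumptions made for the example.

```python
import math
import random


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the norm ball are left unchanged (scale = 1).
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(params, micro_batch_grads, lr, max_norm, noise_scale, rng):
    """One update: clip and accumulate (inner loop), then add noise,
    average, and take a gradient step (outer loop)."""
    # Inner loop: no descent yet, only accumulation of clipped gradients.
    accumulated = [0.0] * len(params)
    for grad in micro_batch_grads:
        clipped = clip_by_norm(grad, max_norm)
        accumulated = [a + c for a, c in zip(accumulated, clipped)]

    # Outer loop: Gaussian noise on the accumulated gradient.
    # Assumption: noise std = noise_scale * max_norm.
    sigma = noise_scale * max_norm
    noisy = [a + rng.gauss(0.0, sigma) for a in accumulated]

    # Average over the accumulated gradients, then descend.
    n = len(micro_batch_grads)
    return [p - lr * g / n for p, g in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    prompt = [0.0, 0.0]  # stand-in for the soft prompt parameters
    grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical gradients
    prompt = dp_sgd_step(prompt, grads, lr=0.1, max_norm=1.0,
                         noise_scale=1.0, rng=rng)
    print(prompt)
```

In this sketch, a larger `noise_scale` or a smaller `max_norm` moves each update further from the true gradient, which is the source of the privacy-accuracy trade-off studied in the results.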
5. Results
5.1. Fine-Tuning
5.2. Soft Prompt Tuning
Efficiency-Effectiveness Trade-off for Different Prompt Lengths.
Initialization Affects Downstream Performance.
5.3. DP Prompt Tuning
5.3.1. Theoretical Perspective: Is Privacy Preserved After Using the DP Optimizer?
| Noise scale | Achieved ε |
|---|---|
| 0.01 | 3.8e9 |
| 0.1 | 2.1e5 |
| 1 | 18 |
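As a reading aid for the table above, the textbook definition of (ε, δ)-differential privacy is restated here. This assumes the "achieved" column reports the privacy budget ε reached after training; the symbols for the mechanism, the datasets, and the output set are notation introduced for this note, not taken from the original text.

```latex
% A randomized mechanism \mathcal{M} is (\varepsilon, \delta)-differentially private if,
% for all datasets D and D' differing in a single example and all output sets S:
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
```

Smaller ε means a stronger guarantee. Under this reading, a budget of 18 bounds the likelihood ratio only by a factor of e^18, and budgets on the order of 2.1e5 or 3.8e9 provide essentially no formal protection.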
5.3.2. Empirical Perspective: Privacy-Accuracy Trade-off
Noise Scale
Gradient Clipping
Micro-Batch Size
Summary
5.3.3. Proposed Improvements: Mitigating Privacy-Accuracy Trade-off
Hierarchical Privacy Learning
Strategic Initialization
6. Discussion and Analysis
6.1. Limitations
6.2. Future Work
Appendix A
Appendix A.1. Formulating Classification Problems in SuperGLUE to Text-to-Text Problems
Appendix A.2. Parameter Counts for Prompting T5-Small
| T5 Size | Prompt Length | Trainable Parameters | Total Parameters | Percent Trainable |
|---|---|---|---|---|
| Small | 5 | | | % |
| | 20 | | | % |
| | 50 | | | % |
| | 100 | | | % |
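The relationship between prompt length and trainable parameters can be sketched as below. This is an illustrative calculation under stated assumptions, not a reconstruction of the table's missing values: it assumes each prompt token is one vector of T5-small's embedding width (512), that only the prompt is trainable, and that the frozen model has roughly 60.5 million parameters (the default `frozen_params` is an assumed figure). The function name `prompt_parameter_stats` is hypothetical.

```python
def prompt_parameter_stats(prompt_length, embed_dim=512,
                           frozen_params=60_506_624):
    """Return (trainable, total, percent_trainable) for a soft prompt.

    Assumptions: one embed_dim-sized vector per prompt token, and
    frozen_params approximating the size of the frozen T5-small model.
    """
    trainable = prompt_length * embed_dim
    total = frozen_params + trainable
    percent = 100.0 * trainable / total
    return trainable, total, percent


if __name__ == "__main__":
    # Prompt lengths taken from the table above.
    for length in (5, 20, 50, 100):
        trainable, total, percent = prompt_parameter_stats(length)
        print(f"length={length:>3}  trainable={trainable:>6}  "
              f"total={total}  percent={percent:.4f}%")
```

Under these assumptions the trainable share stays well below 1% even at a prompt length of 100, which illustrates why prompt tuning is considered parameter-efficient.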
Appendix A.3. Experiment Details
Appendix A.3.1. Hyperparameters for Fine-Tuning
| Dataset | Epochs | Steps |
|---|---|---|
| BoolQ | 5 | 5895 |
| CB | 100 | 3200 |
| COPA | 60 | 3000 |
| MultiRC | 1 | 3406 |
| RTE | 15 | 4695 |
| WiC | 5 | 3000 |
| WSC | 50 | 3500 |
Appendix A.3.2. Hyperparameters for Prompt Tuning and DP Prompt Tuning
Appendix A.3.3. SuperGLUE Fine-Tuning Accuracy
| Dataset | BoolQ | CB | COPA | MultiRC | RTE | WiC | WSC |
|---|---|---|---|---|---|---|---|
| Fine-Tuning | 0.6217 | 0.7321 | 0.4500 | 0.5720 | 0.5957 | 0.5329 | 0.1442 |

| Dataset | BoolQ | CB | COPA | MultiRC | RTE | WiC | WSC |
|---|---|---|---|---|---|---|---|
| Fine-Tuning | 0.1086 | 0.2981 | 0.1365 | 0.0859 | 0.4101 | 0.3257 | 0.9430 |
| Prompt Tuning | 0.3707 | 19.9799 | 21.0226 | 0.1237 | 2.5437 | 0.2320 | 21.5199 |
| DP Prompt Tuning | 0.3866 | 19.8822 | 29.0872 | 0.1374 | 11.8503 | 2.9677 | 19.8082 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).