Submitted: 01 December 2023
Posted: 01 December 2023
Abstract
Keywords:
1. Introduction
- We propose a new idea for the face-swapping task: identity information can be transferred more adequately and flexibly through identity feature disentanglement. Based on this idea, we propose a new high-quality face-swapping framework (ControlFace) that achieves controllable face swapping.
- We propose a novel approach for disentangling structure and texture and, building on it, a semantic hierarchy-based face feature fusion module in which features at different semantic levels are fused so that the model can learn them efficiently and generate the swapped face. We also design loss functions that make the disentanglement more adequate and accurate.
- Extensive experiments demonstrate the effectiveness of our approach in transferring identity information and performing controllable face swapping.
2. Related Work
2.1. GAN Inversion
2.2. Face Swapping
2.3. Feature Disentanglement
3. Method
3.1. Disentangling of Identity Feature
3.2. Feature Fusion Based on Semantic Hierarchy
3.3. Loss Functions
3.3.1. Identity-consistency Loss
3.3.2. Attribute-consistency Loss
3.3.3. Ancillary Loss
4. Results and Discussion
4.1. Experimental Setup
4.2. Qualitative Evaluation
4.3. Quantitative Evaluation
4.4. Ablation Study
- Choice of Identity Embeddings. Our identity extraction network extracts three identity embeddings in total: the ArcFace embedding, the depth embedding, and the albedo embedding. To demonstrate the necessity of each embedding, we remove one or two identity embeddings at a time and retrain the model, covering each embedding individually and one pair jointly. When we do not inject or into the feature fusion network, we correspondingly stop using or . The results show that removing an embedding may improve the transfer of other identity features but substantially degrades the identity information that the removed embedding represents.
- Feature Injection Strategy. For feature injection, we conduct experiments with three different strategies. Strategy (a) injects the albedo embedding into the coarse and medium feature mappers and the depth embedding into the fine feature mapper. Strategy (b) injects the depth embedding into the coarse feature mapper and the albedo embedding into the medium and fine feature mappers. Strategy (c) injects the ArcFace embedding into the coarse and medium feature mappers but not into the fine feature mapper. The results for strategies (a) and (b) show that our feature injection approach matches each embedding to its semantic level. The results for strategy (c) show that the ArcFace embedding contains a number of low-level semantic features, so it must be injected into all three mappers.
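
The three ablation strategies can be written down as a small routing table. This is a minimal sketch, assuming the embeddings and mappers can be referred to by plain string labels (`arcface`, `depth`, `albedo`, `coarse`, `medium`, `fine`). The class and function names are hypothetical and are not taken from the ControlFace implementation. Only the routings stated in the text for strategies (a), (b), and (c) are recorded.

```python
from dataclasses import dataclass

# Mapper labels follow the text; they are illustrative strings only.
MAPPERS = ("coarse", "medium", "fine")


@dataclass
class InjectionStrategy:
    """Hypothetical container: which feature mappers receive each embedding."""
    name: str
    routes: dict  # embedding label -> tuple of mapper labels


# Only routings stated explicitly in the text are listed. Embeddings that a
# strategy description does not mention are omitted rather than guessed.
ABLATION_STRATEGIES = [
    InjectionStrategy("a", {"albedo": ("coarse", "medium"), "depth": ("fine",)}),
    InjectionStrategy("b", {"depth": ("coarse",), "albedo": ("medium", "fine")}),
    InjectionStrategy("c", {"arcface": ("coarse", "medium")}),
]


def embeddings_for(strategy, mapper):
    """Return the sorted embedding labels injected into `mapper`."""
    if mapper not in MAPPERS:
        raise ValueError(f"unknown mapper: {mapper}")
    return sorted(
        label for label, targets in strategy.routes.items() if mapper in targets
    )


if __name__ == "__main__":
    for strategy in ABLATION_STRATEGIES:
        print(strategy.name, {m: embeddings_for(strategy, m) for m in MAPPERS})
```

Keeping the routing as data rather than hard-coding it in the network makes it easy to retrain under each strategy by swapping a single configuration object.
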
4.5. Controllable Face Swapping
- Qualitative results. Figure 3 shows the generation results of each identity transfer mode; the differences between the modes are clearly visible.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, L.; Bao, J.; Yang, H.; Chen, D.; Wen, F. Faceshifter: Towards high fidelity and occlusion aware face swapping. arXiv preprint arXiv:1912.13457 2019.
- Chen, R.; Chen, X.; Ni, B.; Ge, Y. Simswap: An efficient framework for high fidelity face swapping. In Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2003–2011.
- Wang, Y.; Chen, X.; Zhu, J.; Chu, W.; Tai, Y.; Wang, C.; Li, J.; Wu, Y.; Huang, F.; Ji, R. HifiFace: 3D shape and semantic prior guided high fidelity face swapping. arXiv preprint arXiv:2106.09965 2021.
- Xu, Z.; Yu, X.; Hong, Z.; Zhu, Z.; Han, J.; Liu, J.; Ding, E.; Bai, X. Facecontroller: Controllable attribute editing for face in the wild. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021, Vol. 35, pp. 3083–3091.
- Zhao, W.; Rao, Y.; Shi, W.; Liu, Z.; Zhou, J.; Lu, J. DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 8568–8577.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4690–4699.
- Wu, S.; Rupprecht, C.; Vedaldi, A. Unsupervised learning of probably symmetric deformable 3d objects from images in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1–10.
- Zhang, Z.; Ge, Y.; Chen, R.; Tai, Y.; Yan, Y.; Yang, J.; Wang, C.; Li, J.; Huang, F. Learning to aggregate and personalize 3d face from in-the-wild photo collection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14214–14224.
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110–8119.
- Goetschalckx, L.; Andonian, A.; Oliva, A.; Isola, P. Ganalyze: Toward visual definitions of cognitive image properties. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5744–5753.
- Jahanian, A.; Chai, L.; Isola, P. On the "steerability" of generative adversarial networks. arXiv preprint arXiv:1907.07171 2019.
- Shen, Y.; Gu, J.; Tang, X.; Zhou, B. Interpreting the latent space of gans for semantic face editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9243–9252.
- Collins, E.; Bala, R.; Price, B.; Susstrunk, S. Editing in style: Uncovering the local semantics of gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5771–5780.
- Abdal, R.; Qin, Y.; Wonka, P. Image2stylegan: How to embed images into the stylegan latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4432–4441.
- Richardson, E.; Alaluf, Y.; Patashnik, O.; Nitzan, Y.; Azar, Y.; Shapiro, S.; Cohen-Or, D. Encoding in style: a stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2287–2296.
- Blanz, V.; Scherbaum, K.; Vetter, T.; Seidel, H.P. Exchanging faces in images. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 2004, Vol. 23, pp. 669–676.
- Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2387–2395.
- Nirkin, Y.; Masi, I.; Tuan, A.T.; Hassner, T.; Medioni, G. On face segmentation, face swapping, and face perception. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018). IEEE, 2018, pp. 98–105.
- Natsume, R.; Yatagawa, T.; Morishima, S. Rsgan: face swapping and editing using face and hair representation in latent spaces. arXiv preprint arXiv:1804.03447 2018.
- Nirkin, Y.; Keller, Y.; Hassner, T. Fsgan: Subject agnostic face swapping and reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7184–7193.
- Zhu, Y.; Li, Q.; Wang, J.; Xu, C.Z.; Sun, Z. One shot face swapping on megapixels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4834–4844.
- Xu, Y.; Deng, B.; Wang, J.; Jing, Y.; Pan, J.; He, S. High-resolution face swapping via latent semantics disentanglement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7642–7651.
- Xu, Z.; Zhou, H.; Hong, Z.; Liu, Z.; Liu, J.; Guo, Z.; Han, J.; Liu, J.; Ding, E.; Wang, J. StyleSwap: Style-Based Generator Empowers Robust Face Swapping. In Proceedings of the European Conference on Computer Vision. Springer, 2022, pp. 661–677.
- Luo, Y.; Zhu, J.; He, K.; Chu, W.; Tai, Y.; Wang, C.; Yan, J. StyleFace: Towards Identity-Disentangled Face Generation on Megapixels. In Proceedings of the European Conference on Computer Vision. Springer, 2022, pp. 297–312.
- Liu, Z.; Li, M.; Zhang, Y.; Wang, C.; Zhang, Q.; Wang, J.; Nie, Y. Fine-Grained Face Swapping via Regional GAN Inversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 8578–8587.
- Zhu, X.; Lei, Z.; Liu, X.; Shi, H.; Li, S.Z. Face alignment across large poses: A 3d solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 146–155.
- Feng, Y.; Wu, F.; Shao, X.; Wang, Y.; Zhou, X. Joint 3d face reconstruction and dense alignment with position map regression network. In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 534–551.
- Pu, Y.; Gan, Z.; Henao, R.; Yuan, X.; Li, C.; Stevens, A.; Carin, L. Variational autoencoder for deep learning of images, labels and captions. Advances in neural information processing systems 2016, 29.
- Shen, Y.; Luo, P.; Yan, J.; Wang, X.; Tang, X. Faceid-gan: Learning a symmetry three-player gan for identity-preserving face synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 821–830.
- Tran, L.; Yin, X.; Liu, X. Disentangled representation learning gan for pose-invariant face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1415–1424.
- Deng, Y.; Yang, J.; Xu, S.; Chen, D.; Jia, Y.; Tong, X. Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0.
- Daněček, R.; Black, M.J.; Bolkart, T. EMOCA: Emotion driven monocular face capture and animation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 20311–20322.
- Sengupta, S.; Kanazawa, A.; Castillo, C.D.; Jacobs, D.W. Sfsnet: Learning shape, reflectance and illuminance of faces 'in the wild'. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6296–6305.
- Pan, X.; Dai, B.; Liu, Z.; Loy, C.C.; Luo, P. Do 2d gans know 3d shape? unsupervised 3d shape reconstruction from 2d image gans. arXiv preprint arXiv:2011.00844 2020.
- Shi, Y.; Aggarwal, D.; Jain, A.K. Lifting 2d stylegan for 3d-aware face generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 6258–6266.
- Zhang, Z.; Chen, R.; Cao, W.; Tai, Y.; Wang, C. Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 382–393.
- Tov, O.; Alaluf, Y.; Nitzan, Y.; Patashnik, O.; Cohen-Or, D. Designing an encoder for stylegan image manipulation. ACM Transactions on Graphics (TOG) 2021, 40, 1–14.
- Xia, W.; Yang, Y.; Xue, J.H.; Wu, B. Tedigan: Text-guided diverse face image generation and manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2256–2265.
- Lee, C.H.; Liu, Z.; Wu, L.; Luo, P. Maskgan: Towards diverse and interactive facial image manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5549–5558.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.
- Gao, G.; Huang, H.; Fu, C.; Li, Z.; He, R. Information bottleneck disentanglement for identity swapping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3404–3413.



| Method | ArcFace Sim. ↑ | Depth ↓ | Albedo ↓ |
| --- | --- | --- | --- |
| FaceShifter [1] | 49.33 | 31.68 | 49.59 |
| SimSwap [2] | 52.03 | 32.32 | 48.49 |
| MegaFS [22] | 48.49 | 33.44 | 48.18 |
| HifiFace [3] | 48.24 | 32.35 | 50.22 |
| InfoSwap [42] | 52.58 | 31.04 | 51.58 |
| Ours | 55.20 | 27.89 | 35.17 |

| Method | Shape ↓ | Texture ↓ | Expression ↓ | Pose ↓ | Lighting ↓ |
| --- | --- | --- | --- | --- | --- |
| FaceShifter [1] | 2.07 | 5.34 | 0.74 | 0.57 | 1.08 |
| SimSwap [2] | 2.01 | 5.09 | 1.15 | 0.75 | 1.71 |
| MegaFS [22] | 2.34 | 5.25 | 1.25 | 2.77 | 3.04 |
| HifiFace [3] | 1.75 | 4.95 | 1.23 | 0.63 | 2.14 |
| InfoSwap [42] | 2.01 | 4.91 | 1.38 | 2.41 | 1.93 |
| Ours | 1.26 | 3.12 | 1.02 | 0.60 | 1.72 |

| Method | ArcFace Sim. ↑ | Depth ↓ | Albedo ↓ |
| --- | --- | --- | --- |
| Ours | 55.20 | 27.89 | 35.17 |
| w/o | 49.44 | 27.80 | 34.58 |
| w/o | 55.97 | 28.62 | 35.07 |
| w/o | 56.82 | 27.86 | 38.82 |
| w/o & | 57.06 | 28.73 | 39.34 |
| (a) | 53.26 | 28.43 | 36.77 |
| (b) | 51.98 | 28.71 | 35.48 |
| (c) | 51.86 | 28.46 | 36.41 |
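
The effect of the alternative injection strategies can be read off the ablation table by subtracting the full model's scores from each strategy's scores. The following is a minimal sketch of that arithmetic. The numbers are copied from the rows labeled Ours, (a), (b), and (c); the dictionary keys and the `deltas` helper are hypothetical names introduced for illustration.

```python
# Scores copied from the ablation table: the full model and strategies (a)-(c).
# Key names are illustrative; "arc_sim" is higher-is-better, the others lower.
FULL = {"arc_sim": 55.20, "depth": 27.89, "albedo": 35.17}
STRATEGIES = {
    "a": {"arc_sim": 53.26, "depth": 28.43, "albedo": 36.77},
    "b": {"arc_sim": 51.98, "depth": 28.71, "albedo": 35.48},
    "c": {"arc_sim": 51.86, "depth": 28.46, "albedo": 36.41},
}


def deltas(row, reference=FULL):
    """Return row minus reference per metric, rounded to two decimals."""
    return {key: round(row[key] - reference[key], 2) for key in reference}


if __name__ == "__main__":
    for name, row in STRATEGIES.items():
        print(name, deltas(row))
```

Under all three strategies the similarity score falls and both the depth and albedo errors rise relative to the full model, which is consistent with the conclusion drawn in the ablation study.
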

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).