REVIEW | doi:10.20944/preprints202107.0326.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Deepfake; Animal Welfare; Animal Emotions; Artificial Intelligence; Digital Farming; Animal Based Measures; Emotion Modeling; Livestock Health
Online: 14 July 2021 (11:49:38 CEST)
Deepfake technologies are known for the creation of forged celebrity pornography, face and voice swaps, and other fake media content. Despite the negative connotations the technology bears, the underlying machine learning algorithms have huge potential that could be applied not just to digital media, but also to medicine, biology, affective science, and agriculture, to name a few. Owing to their ability to generate large datasets that follow real data distributions, deepfake methods could also be used to positively impact non-human animals such as livestock. Data generated with Generative Adversarial Networks (GANs), one of the algorithm families deepfakes are based on, could be used to train models that accurately identify and monitor animal health and emotions. Through data augmentation, digital twins, and perhaps even the display of digital conspecifics to enrich social interactions, deepfake technologies have the potential to improve animal health, emotionality, sociality, animal-human and animal-computer interactions, and thereby animal welfare, productivity, and the sustainability of the farming industry.
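The GAN-based data generation idea above can be sketched in miniature. The following toy example (not from the paper; the "activity score" distribution and all parameters are illustrative assumptions) trains a one-dimensional GAN with a linear generator and a logistic discriminator, then samples synthetic points to augment a small real dataset:

```python
import numpy as np

# Toy 1-D GAN sketch: learn to generate samples matching a hypothetical
# "real" livestock activity-score distribution, N(4.0, 1.0).
# All names, data, and hyperparameters here are illustrative assumptions.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator: x_fake = g_w * z + g_b, with z ~ N(0, 1)
# Discriminator: D(x) = sigmoid(d_w * x + d_b)
g_w, g_b = 1.0, 0.0
d_w, d_b = 0.1, 0.0
lr = 0.01

for step in range(5000):
    real = rng.normal(4.0, 1.0, size=32)
    z = rng.normal(size=32)
    fake = g_w * z + g_b

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake))
    d_real = sigmoid(d_w * real + d_b)
    d_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    d_b += lr * np.mean((1 - d_real) - d_fake)

    # Generator: gradient ascent on log D(fake) (non-saturating loss)
    d_fake = sigmoid(d_w * fake + d_b)
    g_w += lr * np.mean((1 - d_fake) * d_w * z)
    g_b += lr * np.mean((1 - d_fake) * d_w)

# Augment a small real dataset with generated samples
synthetic = g_w * rng.normal(size=200) + g_b
```

In practice the generator and discriminator would be deep networks trained on sensor or image data rather than scalars, but the adversarial training loop has the same shape.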
ARTICLE | doi:10.20944/preprints202302.0299.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: deepfake detection; CNN; deep neural network; computer vision; scale invariant feature transform; histogram of oriented gradients
Online: 17 February 2023 (06:51:37 CET)
Deepfakes are manipulated or altered images or videos created using deep learning models with high levels of photorealism. The two popular methods of producing a deepfake are based on either convolutional neural networks (CNNs) or autoencoders. Deepfakes created with CNNs show comparatively higher realism, yet often leave artifacts and distortions in the generated media that can be detected using machine learning and deep learning algorithms. In recent years, there has been an influx of periocular image and video data because of the increased usage of face masks. By wearing masks, much of what is used for facial recognition is hidden, leaving only the periocular region visible to an observer. This loss of vital information makes media easier to misidentify, so deepfakes are less likely to be recognized as fake. In this work, feature extraction methods such as the Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), and CNNs are used to train an ensemble deep learning model that detects deepfakes in videos on a frame-by-frame level based on the periocular region. Our proposed model distinguishes original and manipulated images with accuracies around 98.9 percent, improving on previous work by combining SIFT and HOG features for deepfake detection with convolutional neural networks.
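To illustrate the kind of hand-crafted features involved, here is a minimal numpy-only HOG extractor (a simplified sketch of the descriptor's core idea; it omits the block normalization of the full HOG pipeline, and the cell size and bin count are arbitrary choices, not the paper's settings):

```python
import numpy as np

def hog_features(image, cell=8, bins=9):
    """Minimal Histogram of Oriented Gradients: per-cell histograms of
    gradient orientations, weighted by gradient magnitude.
    Simplified sketch -- omits the block normalization step of full HOG."""
    gy, gx = np.gradient(image.astype(float))   # row and column gradients
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    h, w = image.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# A 32x32 crop with a vertical edge yields (32/8)^2 = 16 cells x 9 bins
edge = np.zeros((32, 32))
edge[:, 16:] = 1.0
feats = hog_features(edge)  # 144-dimensional descriptor
```

Descriptors like this, computed on periocular crops, can then be concatenated with SIFT and CNN features as inputs to an ensemble classifier.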
ARTICLE | doi:10.20944/preprints202303.0161.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deepfake detection; deep learning; computer vision; generalization
Online: 9 March 2023 (02:13:46 CET)
The increasing use of deep learning techniques to manipulate images and videos, commonly referred to as "deepfakes," is making it increasingly challenging to differentiate between real and fake content. While various deepfake detection systems have been developed, they often struggle to detect deepfakes in real-world situations. In particular, these methods often fail to distinguish images or videos modified with novel techniques that were not represented in the training set. In this study, we analyze different deep learning architectures to understand which one best generalizes the concept of a deepfake. According to our results, Convolutional Neural Networks (CNNs) appear better suited to memorizing specific anomalies and thus excel on datasets with a limited number of samples and manipulation methods. The Vision Transformer, conversely, is more effective when trained on more varied datasets, achieving better generalization than the other methods analyzed. Finally, the Swin Transformer appears to be a good alternative for an attention-based method in a more limited data regime. Each analyzed architecture seems to look at deepfakes differently, but since generalization is essential in real-world environments, our experiments suggest that the Vision Transformer provides superior performance.
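Generalization to unseen manipulation techniques is typically measured with a leave-one-method-out protocol: train on all manipulation methods except one, then test on the held-out method. The sketch below illustrates that protocol only; the method names, synthetic features, and nearest-centroid classifier are illustrative stand-ins, not the paper's models or data:

```python
import numpy as np

# Leave-one-method-out evaluation sketch. The classifier is a toy
# nearest-centroid model standing in for the CNN/ViT detectors; the
# method names and Gaussian "features" are purely illustrative.
rng = np.random.default_rng(1)

def make_data(method_shift, n=100):
    # Class 0 = real, class 1 = fakes from one hypothetical method.
    real = rng.normal(0.0, 1.0, size=(n, 8))
    fake = rng.normal(method_shift, 1.0, size=(n, 8))
    return np.vstack([real, fake]), np.array([0] * n + [1] * n)

methods = {"faceswap": 2.0, "reenactment": 2.5, "gan_synthesis": 3.0}

def fit(X, y):
    # One centroid per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    cs = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in cs], axis=1)
    return np.array(cs)[d.argmin(axis=1)]

results = {}
for held_out in methods:
    train = [make_data(s) for m, s in methods.items() if m != held_out]
    Xtr = np.vstack([X for X, _ in train])
    ytr = np.concatenate([y for _, y in train])
    Xte, yte = make_data(methods[held_out])
    model = fit(Xtr, ytr)
    results[held_out] = float((predict(model, Xte) == yte).mean())
    print(f"held-out {held_out}: accuracy {results[held_out]:.2f}")
```

Architectures whose held-out accuracies stay close to their in-distribution accuracies, as reported for the Vision Transformer here, are the ones generalizing rather than memorizing method-specific artifacts.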