Submitted:
15 November 2024
Posted:
18 November 2024
Abstract
Keywords:
1. The Evolution of Artificial Intelligence Technologies in Content Creation
2. Characteristics of Synthetic Content
2.1. Text
2.2. Audio
2.3. Images
2.4. Video
3. Challenges and Difficulties
4. Analysis of Current Detection Tools/Algorithms
4.1. Stylometric Analysis
4.2. Watermarking and Digital Fingerprints
4.3. Adversarial and Robust Detection Techniques
4.4. Machine Learning Models
4.5. Blockchain
5. Directions of Innovation
6. Conclusions and Ethical Concerns
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Spector, L. Evolution of Artificial Intelligence. Artificial Intelligence 2006, 170, 1251–1253. [Google Scholar] [CrossRef]
- Anantrasirichai, N.; Bull, D. Artificial Intelligence in the Creative Industries: A Review. Artificial Intelligence Review 2021, 55, 589–656. [Google Scholar] [CrossRef]
- Wu, J.; Gan, W.; Chen, Z.; Wan, S.; Lin, H. AI-Generated Content (AIGC): A Survey. 2023.
- Chen, C.; Fu, J.; Lyu, L. A Pathway Towards Responsible AI Generated Content. IJCAI International Joint Conference on Artificial Intelligence 2023, 2023, 7033–7038. [Google Scholar] [CrossRef]
- Belgodere, B.; Dognin, P.; Ivankay, A.; Melnyk, I.; Mroueh, Y.; Mojsilovic, A.; Navratil, J.; Nitsure, A.; Padhi, I.; Rigotti, M.; et al. Auditing and Generating Synthetic Data with Controllable Trust Trade-Offs. 2023. [CrossRef]
- Georgiev, G. Has Interest in Data Science Peaked Already? Towards Data Science. Available online: https://towardsdatascience.com/has-interest-in-data-science-peaked-already-437648d7f408 (accessed on 3 June 2024).
- Salvi, D.; Hosler, B.; Bestagini, P.; Stamm, M.C.; Tubaro, S. TIMIT-TTS: A Text-to-Speech Dataset for Multimodal Synthetic Media Detection. IEEE Access 2023, 11, 50851–50866. [Google Scholar] [CrossRef]
- Vora, V.; et al. A Multimodal Approach for Detecting AI Generated Content Using BERT and CNN. International Journal on Recent and Innovation Trends in Computing and Communication 2023, 11, 691–701. [Google Scholar] [CrossRef]
- Dolhansky, B.; Bitton, J.; Pflaum, B.; Lu, J.; Howes, R.; Wang, M.; Ferrer, C.C. The DeepFake Detection Challenge (DFDC) Dataset. 2020.
- Nguyen, T.T.; Nguyen, Q.V.H.; Nguyen, D.T.; Nguyen, D.T.; Huynh-The, T.; Nahavandi, S.; Nguyen, T.T.; Pham, Q.V.; Nguyen, C.M. Deep Learning for Deepfakes Creation and Detection: A Survey. Computer Vision and Image Understanding 2022, 223, 103525. [Google Scholar] [CrossRef]
- Agarwal, S.; Varshney, L.R. Limits of Deepfake Detection: A Robust Estimation Viewpoint. 2019.
- Shah, A.; Ranka, P.; Dedhia, U.; Prasad, S.; Muni, S.; Bhowmick, K. Detecting and Unmasking AI-Generated Texts through Explainable Artificial Intelligence Using Stylistic Features. International Journal of Advanced Computer Science and Applications 2023, 14, 1043–1053. [Google Scholar] [CrossRef]
- Sadasivan, V.S.; Kumar, A.; Balasubramanian, S.; Wang, W.; Feizi, S. Can AI-Generated Text Be Reliably Detected? 2023.
- Rodriguez, J.D.; Hay, T.; Gros, D.; Shamsi, Z.; Srinivasan, R. Cross-Domain Detection of GPT-2-Generated Technical Text. NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference 2022, 1213–1233. [CrossRef]
- Epstein, D.C.; Jain, I.; Wang, O.; Zhang, R. Online Detection of AI-Generated Images. Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023. [CrossRef]
- Corvi, R.; Cozzolino, D.; Zingarini, G.; Poggi, G.; Nagano, K.; Verdoliva, L. On The Detection of Synthetic Images Generated by Diffusion Models. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. [CrossRef]
- Krishna, K.; Song, Y.; Karpinska, M.; Wieting, J.; Iyyer, M. Paraphrasing Evades Detectors of AI-Generated Text, but Retrieval Is an Effective Defense. Adv Neural Inf Process Syst 2023, 36. [Google Scholar]
- Alamleh, H.; Alqahtani, A.A.S.; Elsaid, A. Distinguishing Human-Written and ChatGPT-Generated Text Using Machine Learning. 2023 Systems and Information Engineering Design Symposium, SIEDS 2023. [CrossRef]
- Kumar, S.; Kumar, S. AI Generated Music. International Journal of Research in Science & Engineering 2024, 4, 10–12. [Google Scholar] [CrossRef]
- Kadam, A.; Rane, S.; Mishra, A.; Sahu, S.; Singh, S.; Pathak, S. A Survey of Audio Synthesis and Lip-Syncing for Synthetic Video Generation. EAI Endorsed Transactions on Creative Technologies 2021, 8, 169187. [Google Scholar] [CrossRef]
- Galbally, J.; Marcel, S. Face Anti-Spoofing Based on General Image Quality Assessment. Proceedings - International Conference on Pattern Recognition, 1178. [Google Scholar] [CrossRef]
- Korshunov, P.; Marcel, S. DeepFakes: A New Threat to Face Recognition? Assessment and Detection. 2018. [Google Scholar]
- Giudice, O.; Guarnera, L.; Battiato, S. Fighting Deepfakes by Detecting Gan Dct Anomalies. J Imaging 2021, 7. [Google Scholar] [CrossRef] [PubMed]
- Pu, W.; Hu, J.; Wang, X.; Li, Y.; Hu, S.; Zhu, B.; Song, R.; Song, Q.; Wu, X.; Lyu, S. Learning a Deep Dual-Level Network for Robust DeepFake Detection. Pattern Recognit 2022, 130. [Google Scholar] [CrossRef]
- Hong, S.; Seo, J.; Shin, H.; Hong, S.; Kim, S. DirecT2V: Large Language Models Are Frame-Level Directors for Zero-Shot Text-to-Video Generation. 2023.
- Jonathan, B. Additional Challenges to Detecting AI Writing. Plagiarism Today. Available online: https://www.plagiarismtoday.com/2023/07/31/additional-challenges-to-detecting-ai-writing/ (accessed on 11 June 2024).
- Gillham, J. AI Content Detector Accuracy Review + Open Source Dataset and Research Tool – Originality. Available online: https://originality.ai/blog/ai-content-detection-accuracy (accessed on 8 June 2024).
- Barshay, J. Proof Points: It’s Easy to Fool ChatGPT Detectors. Available online: https://hechingerreport.org/proof-points-its-easy-to-fool-chatgpt-detectors/ (accessed on 11 June 2024).
- Pop, P. ChatGPT and AI Detectors. Available online: https://www.popautomation.com/post/chatgpt-and-ai-detectors (accessed on 17 June 2024).
- Juhasz, B. How to Avoid Being Flagged by GPT Detectors! The Expert Strategies for Content Writers. Service Lifter. Available online: https://servicelifter.com/guides/how-to-avoid-being-flagged-by-gpt-detectors-the-expert-strategies-for-content-writers/ (accessed on 10 July 2024).
- Christian, P. How to Detect ChatGPT: Tools and Tips for Detection. Available online: https://undetectable.ai/blog/how-to-detect-chatgpt/ (accessed on 14 July 2024).
- Hanrahan, G. Computational Neural Networks Driving Complex Analytical Problem Solving. Anal Chem 2010, 82, 4307–4313. [Google Scholar] [CrossRef]
- Ranade, P.; Piplai, A.; Mittal, S.; Joshi, A.; Finin, T. Generating Fake Cyber Threat Intelligence Using Transformer-Based Models. Proceedings of the International Joint Conference on Neural Networks 2021. [Google Scholar] [CrossRef]
- Talaei Khoei, T.; Ould Slimane, H.; Kaabouch, N. Deep Learning: Systematic Review, Models, Challenges, and Research Directions. Neural Comput Appl 2023, 35, 23103–23124. [Google Scholar] [CrossRef]
- Park, S.; Moon, S.; Kim, J. Ensuring Visual Commonsense Morality for Text-to-Image Generation. 2022.
- Welsh, A.P.; Edwards, M. Text Generation for Dataset Augmentation in Security Classification Tasks. 2023.
- Orenstrakh, M.S.; Karnalim, O.; Suarez, C.A.; Liut, M. Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases. 2023, 121–126. [CrossRef]
- Xi, Z.; Huang, W.; Wei, K.; Luo, W.; Zheng, P. AI-Generated Image Detection Using a Cross-Attention Enhanced Dual-Stream Network. 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 2023, 1463–1470. [CrossRef]
- Carlini, N.; Wagner, D. Towards Evaluating the Robustness of Neural Networks. Proc IEEE Symp Secur Priv 2017, 39–57. [Google Scholar] [CrossRef]
- Marra, F.; Saltori, C.; Boato, G.; Verdoliva, L. Incremental Learning for the Detection and Classification of GAN-Generated Images. 2019 IEEE International Workshop on Information Forensics and Security, WIFS 2019 2019. [CrossRef]
- Neal, T.; Sundararajan, K.; Fatima, A.; Yan, Y.; Xiang, Y.; Woodard, D. Surveying Stylometry Techniques and Applications. ACM Comput Surv 2017, 50. [Google Scholar] [CrossRef]
- Brennan, M.; Afroz, S.; Greenstadt, R. Adversarial Stylometry: Circumventing Authorship Recognition to Preserve Privacy and Anonymity. ACM Transactions on Information and System Security 2012, 15. [Google Scholar] [CrossRef]
- Eder, M.; Rybicki, J.; Kestemont, M. Stylometry with R: A Package for Computational Text Analysis. R Journal 2016, 8, 107–121. [Google Scholar] [CrossRef]
- Potthast, M.; Kiesel, J.; Reinartz, K.; Bevendorff, J.; Stein, B. A Stylometric Inquiry into Hyperpartisan and Fake News. ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) 2018, 1, 231–240. [Google Scholar] [CrossRef]
- Michailidis, P.D. A Scientometric Study of the Stylometric Research Field. Informatics 2022, 9, 60. [Google Scholar] [CrossRef]
- Abbasi, A.; Chen, H. Writeprints: A Stylometric Approach to Identity-Level Identification and Similarity Detection in Cyberspace. ACM Trans Inf Syst 2008, 26. [Google Scholar] [CrossRef]
- Quiring, E.; Arp, D.; Rieck, K. Fraternal Twins: Unifying Attacks on Machine Learning and Digital Watermarking. 2017.
- Boujerfaoui, S.; Riad, R.; Douzi, H.; Ros, F.; Harba, R. Image Watermarking between Conventional and Learning-Based Techniques: A Literature Review. Electronics (Switzerland) 2023, 12. [Google Scholar] [CrossRef]
- Jiang, Z.; Zhang, J.; Gong, N.Z. Evading Watermark Based Detection of AI-Generated Content. CCS 2023 - Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 1181. [Google Scholar] [CrossRef]
- Makhrib, Z.F.; Karim, A.A. Digital Watermark Technique: A Review. J Phys Conf Ser 2021, 1999. [Google Scholar] [CrossRef]
- Kirchenbauer, J.; Geiping, J.; Wen, Y.; Katz, J.; Miers, I.; Goldstein, T. A Watermark for Large Language Models. Proc Mach Learn Res 2023, 202, 17061–17084. [Google Scholar]
- Wen, Y.; Kirchenbauer, J.; Geiping, J.; Goldstein, T. Tree-Ring Watermarks: Fingerprints for Diffusion Images That Are Invisible and Robust. 2023.
- Frattolillo, F. A Watermarking Protocol Based on Blockchain. Applied Sciences (Switzerland) 2020, 10, 1–18. [Google Scholar] [CrossRef]
- Frattolillo, F. A Multiparty Watermarking Protocol for Cloud Environments. Journal of Information Security and Applications 2019, 47, 246–257. [Google Scholar] [CrossRef]
- Harika, D.; Noorullah, S. Implementation of Image Authentication Using Digital Watermarking with Biometric. International Journal of Engineering Technology and Management Sciences 2023, 7, 154–167. [Google Scholar] [CrossRef]
- Kelkoul, H.; Zaz, Y.; Mantoro, T. Countering Audiovisual Content Piracy: A Hybrid Watermarking and Fingerprinting Technology. 7th International Conference on Computing, Engineering and Design, ICCED 2021. [CrossRef]
- Ren, N.; Wang, H.; Chen, Z.; Zhu, C.; Gu, J. A Multilevel Digital Watermarking Protocol for Vector Geographic Data Based on Blockchain. Journal of Geovisualization and Spatial Analysis 2023, 7. [Google Scholar] [CrossRef]
- Liu, X.; Zhu, Y.; Sun, Z.; Diao, M.; Zhang, L. A Novel Robust Video Fingerprinting-Watermarking Hybrid Scheme Based on Visual Secret Sharing. Multimed Tools Appl 2015, 74, 9157–9174. [Google Scholar] [CrossRef]
- Wang, C.; Gerdes, R.M.; Guan, Y.; Kasera, S.K. Digital Fingerprinting. 2016, 189.
- Yu, P.L.; Sadler, B.M.; Verma, G.; Baras, J.S. Fingerprinting by Design: Embedding and Authentication. Digital Fingerprinting 2016, 69–88. [Google Scholar] [CrossRef]
- Ametefe, D.S.; Sarnin, S.S.; Ali, D.M.; Muhamad, W.N.W.; Ametefe, G.D.; John, D.; Aliu, A.A. Enhancing Fingerprint Authentication: A Systematic Review of Liveness Detection Methods Against Presentation Attacks. Journal of The Institution of Engineers (India): Series B, 2024. [Google Scholar] [CrossRef]
- Ren, K.; Zheng, T.; Qin, Z.; Liu, X. Adversarial Attacks and Defenses in Deep Learning. Engineering 2020, 6, 346–360. [Google Scholar] [CrossRef]
- Bai, T.; Luo, J.; Zhao, J.; Wen, B.; Wang, Q. Recent Advances in Adversarial Training for Adversarial Robustness. 2021.
- Gibert, D.; Zizzo, G.; Le, Q.; Planes, J. A Robust Defense against Adversarial Attacks on Deep Learning-Based Malware Detectors via (De)Randomized Smoothing. IEEE Access 2024, 12, 61152–61162. [Google Scholar] [CrossRef]
- Ren, K.; Zheng, T.; Qin, Z.; Liu, X. Adversarial Attacks and Defenses in Deep Learning. Engineering 2020, 6, 346–360. [Google Scholar] [CrossRef]
- Kong, Z.; Xue, J.; Wang, Y.; Huang, L.; Niu, Z.; Li, F. A Survey on Adversarial Attack in the Age of Artificial Intelligence. Wirel Commun Mob Comput 2021, 2021. [Google Scholar] [CrossRef]
- Salehin, I.; Kang, D.K. A Review on Dropout Regularization Approaches for Deep Neural Networks within the Scholarly Domain. Electronics 2023, 12, 3106. [Google Scholar] [CrossRef]
- Jedrzejewski, F.V.; Thode, L.; Fischbach, J.; Gorschek, T.; Mendez, D.; Lavesson, N. Adversarial Machine Learning in Industry: A Systematic Literature Review. Comput Secur 2024, 145, 103988. [Google Scholar] [CrossRef]
- Paria Sarzaeim; Aarya Maturpalsingh Doshi. A Framework for Detecting AI-Generated Text in Research Publications. International Conference of Advanced Technologies 2023.
- Wang, Z.; Liu, Y.; He, D.; Chan, S. Intrusion Detection Methods Based on Integrated Deep Learning Model. Comput Secur 2021, 103. [Google Scholar] [CrossRef]
- Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors. 2012.
- Meroni, G.; Comuzzi, M.; Köpke, J. Editorial: Blockchain for Trusted Information Systems. Frontiers in Blockchain 2023, 6, 1235704. [Google Scholar] [CrossRef]
- Curmi, A.; Inguanez, F. BlockChain Based Certificate Verification Platform. Lecture Notes in Business Information Processing 2019, 339, 211–216. [Google Scholar] [CrossRef]
- Malik, G.; Parasrampuria, K.; Reddy, S.P.; Shah, S. Blockchain Based Identity Verification Model. Proceedings - International Conference on Vision Towards Emerging Trends in Communication and Networking, ViTECoN 2019. [CrossRef]
- Adere, E.M. Blockchain in Healthcare and IoT: A Systematic Literature Review. Array 2022, 14. [Google Scholar] [CrossRef]
- Morar, C.D.; Popescu, D.E. A Survey of Blockchain Applicability, Challenges, and Key Threats. Computers 2024, 13, 223. [Google Scholar] [CrossRef]
- Zheng, X.; Zhang, C.; Woodland, P.C. Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition. 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings. [CrossRef]
- Hang, C.N.; Yu, P.D.; Morabito, R.; Tan, C.W. Large Language Models Meet Next-Generation Networking Technologies: A Review. Future Internet 2024, 16. [Google Scholar] [CrossRef]
- Gao, M. The Advance of GPTs and Language Model in Cyber Security. Highlights in Science, Engineering and Technology 2023, 57, 195–202. [Google Scholar] [CrossRef]
- Rehana, H.; Çam, N.B.; Basmaci, M.; Zheng, J.; Jemiyo, C.; He, Y.; Özgür, A.; Hur, J. Evaluation of GPT and BERT-Based Models on Identifying Protein-Protein Interactions in Biomedical Text. 2023.
- Grishina, A.; Kyrychenko, R. GPT-3 vs. BERT - Which Is Best? Available online: https://softteco.com/blog/bert-vs-chatgpt?WPACRandom=1731316845796 (accessed on 11 November 2024).
- Clark, K.; Luong, M.T.; Le, Q. V.; Manning, C.D. ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators. 8th International Conference on Learning Representations, ICLR 2020.
- Wang, Y.; Wang, W.; Joty, S.; Hoi, S.C.H. CodeT5: Identifier-Aware Unified Pre-Trained Encoder-Decoder Models for Code Understanding and Generation. EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, 8708. [Google Scholar] [CrossRef]
- Desai, S.; Durrett, G. Calibration of Pre-Trained Transformers. EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. [CrossRef]
- Vora, V.; et al. A Multimodal Approach for Detecting AI Generated Content Using BERT and CNN. International Journal on Recent and Innovation Trends in Computing and Communication 2023, 11, 691–701. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, D.Z. GPT4MIA: Utilizing Generative Pre-Trained Transformer (GPT-3) as A Plug-and-Play Transductive Model for Medical Image Analysis. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2023, 14393, 151–160. [Google Scholar] [CrossRef]
- Mohseni, S.; Ragan, E. Combating Fake News with Interpretable News Feed Algorithms. 2018.
- Seddari, N.; Derhab, A.; Belaoued, M.; Halboob, W.; Al-Muhtadi, J.; Bouras, A. A Hybrid Linguistic and Knowledge-Based Analysis Approach for Fake News Detection on Social Media. IEEE Access 2022, 10, 62097–62109. [Google Scholar] [CrossRef]
- Epstein, Z.; Foppiani, N.; Hilgard, S.; Sharma, S.; Glassman, E.; Rand, D. Do Explanations Increase the Effectiveness of AI-Crowd Generated Fake News Warnings? Proceedings of the International AAAI Conference on Web and Social Media 2022, 16, 183–193. [Google Scholar] [CrossRef]
- Chen, C.; Fu, J.; Lyu, L. A Pathway Towards Responsible AI Generated Content. IJCAI International Joint Conference on Artificial Intelligence, 7033. [Google Scholar] [CrossRef]
- Yang, X.; Pan, L.; Zhao, X.; Chen, H.; Petzold, L.; Wang, W.Y.; Cheng, W. A Survey on Detection of LLMs-Generated Content. 2023.
- Mao, H.; Nie, T.; Sun, H.; Shen, D.; Yu, G. A Survey on Cross-Chain Technology: Challenges, Development, and Prospect. IEEE Access 2023, 11, 45527–45546. [Google Scholar] [CrossRef]
- Koulu, R.; Hirvonen, H.; Sankari, S.; Heikkinen, T. Artificial Intelligence and the Law: Can and Should We Regulate AI Systems? SSRN Electronic Journal 2023. [Google Scholar] [CrossRef]
- Kunda, I.; Kunda, I. Regulating the Use of Generative AI in Academic Research and Publications. PUBMET 2023. [Google Scholar] [CrossRef]

| Exclusion Criteria | Inclusion Criteria |
|---|---|
| Study type: exclude non-peer-reviewed studies | Language: articles published in English |
| Unpublished data: exclude grey literature such as reports or unpublished data | Recency: articles published within the last seven years |
| Lack of outcome reporting: exclude studies that do not report on the primary outcomes of interest | Applicability: articles that remain relevant given the latest scientific advances and policy changes |
| Duplications: exclude duplicate studies or preliminary reports of already published research | Outcome relevance: studies that investigate or report on specific outcomes for the generative models |
|  | Sample size: studies whose sample size ensures robust tests and analysis |

| Phrases generated by a language model | Hints for detecting synthetic text |
|---|---|
| The intricate dance between science and nature provides an undeniable synergy, fostering innovation and breakthroughs | Grandiose phrasing: Uses broad, impressive-sounding words without real insight |
| In modern global economics, the balance of trade and currency fluctuations are vital components | Surface-level complexity: The text sounds technical but lacks specific, actionable insights |
| The historical development of art has traversed various eras, from the Renaissance to post-modernism, each reflecting societal values | Inconsistent specificity: Mentions detailed terms (e.g., Renaissance) but glosses over other key details |
| The relationship between quantum mechanics and classical physics has puzzled scientists for decades | Generic technical jargon: The language sounds academic but doesn’t provide new or meaningful details |
| While many individuals believe in the importance of education, the future of technology seems to advance progressively towards artificial intelligence | Overly generic or predictable patterns: Complex, yet vague ideas lacking specificity or depth |

| Area | Challenges | References |
|---|---|---|
| AI Text Detection | Detecting AI-generated text across domains is challenging due to differences in context and writing style | [7,12,13,14,17,18,35,36,37] |
| AI Video/Image Detection | Real-time detection of AI-generated images is complex due to processing demands and frequent updates in image-generation models | [10,15,16,20,21,25] |
| AI Audio Detection | Ensuring AI-generated music maintains quality and originality while adhering to copyright regulations | [2,19,20] |

| Approach | Challenges | Benefits | References |
|---|---|---|---|
| Stylometric analysis | Detecting AI-generated text across different domains is difficult because contexts and writing styles vary | Uses stylometric analysis with explainable AI, improving transparency in AI text detection | [41,42,43,44,45,46] |
| Watermarking and digital fingerprints | Detecting realistic deepfakes, which constantly evolve in quality | Applies watermarking to help verify authentic versus synthetic content | [47,48,49,50,51,52,53,54,55,56,57,58,59,60,61] |
| Adversarial and robust detection techniques | Real-time detection is challenging due to processing speed and model updates | Develops robust detection for real-time image analysis and authentication | [8,10,24,40,62,63,64,65,66,67,68] |
| Machine learning models | GAN-generated images closely resemble real images, making detection challenging | Builds machine learning models for reliable detection of AI-generated images | [3,18,69,70,71] |
| Blockchain | Utilizing blockchain for AI trust while addressing scalability and data privacy concerns | Blockchain provides verified, tamper-proof information management | [53,57,72,73,74,75,76] |
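The "verified, tamper-proof information management" benefit listed for blockchain can be illustrated with a toy hash chain. The sketch below is only an assumption-laden illustration, not a method taken from the cited works: the class `ProvenanceChain`, its method names, and the record fields are hypothetical, and the chain runs in a single process with no consensus, networking, or privacy layer.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ProvenanceChain:
    """Toy append-only hash chain for content records (hypothetical design)."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first block

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _hash_record(record: dict) -> str:
        # Hash every field except the stored block hash itself.
        payload = {k: v for k, v in record.items() if k != "block_hash"}
        return sha256_hex(json.dumps(payload, sort_keys=True).encode("utf-8"))

    def register(self, content: bytes, creator: str) -> dict:
        """Append a record that links the content hash to the previous block."""
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else self.GENESIS
        record = {
            "index": len(self.blocks),
            "content_hash": sha256_hex(content),
            "creator": creator,
            "prev_hash": prev_hash,
        }
        record["block_hash"] = self._hash_record(record)
        self.blocks.append(record)
        return record

    def is_valid(self) -> bool:
        """Recompute every hash and link; any edited block breaks the chain."""
        prev = self.GENESIS
        for block in self.blocks:
            if block["prev_hash"] != prev:
                return False
            if block["block_hash"] != self._hash_record(block):
                return False
            prev = block["block_hash"]
        return True

    def is_registered(self, content: bytes) -> bool:
        """Check whether exactly this content was registered earlier."""
        digest = sha256_hex(content)
        return any(b["content_hash"] == digest for b in self.blocks)
```

Because each block stores the hash of its predecessor, changing any earlier record invalidates every later link, which is the property the table summarizes as tamper-proof. The scalability and privacy concerns listed in the same row are not addressed by this sketch.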
| Phrases generated by a language model | Stylometric analysis |
|---|---|
| Humanity has reached a significant point of reflection, where both the moral and existential questions about artificial intelligence must be addressed to avoid potential consequences that could reshape our world irreversibly | High frequency of filler words: Overuse of vague connectors ("where both", "must be addressed", "potential consequences") and modal verbs ("must", "could"). Stylometric analysis may show unnatural filler-word frequency or lack of diversity in key terms |
| With the dawn of big data, companies are now in possession of vast amounts of information, enabling better decision-making processes but raising serious concerns regarding privacy, ethics, and control over user data | Phrase repetition across contexts: Frequent recycling of "vast amounts of information" and "serious concerns regarding privacy" without depth. Stylometric analysis could indicate a repetitive n-gram pattern, suggesting the model’s tendency to repeat high-frequency phrases |
| Emerging technologies bring with them not only innovation but also responsibility, as organizations strive to harness their potential while upholding the ethical principles that shape a just and equitable future for all | Overuse of platitudes: Terms like "upholding the ethical principles" and "just and equitable future" sound polished but convey little actionable insight. Stylometric analysis might reveal an excessive use of long phrases that overgeneralize complex topics |
| While the digital transformation accelerates, institutions face unprecedented challenges in adapting to new paradigms, requiring a commitment to transparency, collaboration, and agility in navigating the unknown | Unnatural lexical choices and phrasing: Phrases like "navigating the unknown" combined with "commitment to transparency" show abrupt topic shifts. Stylometric analysis may highlight odd lexical choices or inconsistent lexical richness across sentences |
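The signals named in the table above (filler-word frequency, repeated n-grams, lexical diversity) can be computed with a few lines of code. The following is a minimal sketch under stated assumptions: the `FILLER_WORDS` set, the function names, and the choice of trigrams are illustrative and not taken from the source, and the resulting numbers are raw features, not a verdict on whether a text is synthetic.

```python
import re
from collections import Counter

# Hypothetical list of connectors and modal verbs, chosen for illustration only.
FILLER_WORDS = {"must", "could", "may", "might", "should", "where", "both", "while"}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stylometric_features(text: str, n: int = 3) -> dict:
    """Compute three simple stylometric signals for one text.

    type_token_ratio: distinct tokens / all tokens (lexical diversity)
    filler_ratio:     share of tokens found in FILLER_WORDS
    repeated_ngrams:  word n-grams that occur more than once, with counts
    """
    tokens = tokenize(text)
    if not tokens:
        return {"type_token_ratio": 0.0, "filler_ratio": 0.0, "repeated_ngrams": {}}

    # Count every contiguous window of n tokens.
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "filler_ratio": sum(t in FILLER_WORDS for t in tokens) / len(tokens),
        "repeated_ngrams": {" ".join(g): c for g, c in ngrams.items() if c > 1},
    }
```

In practice such features would be compared against a reference corpus of human-written text from the same domain, since a threshold that is unusual in one genre may be normal in another.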
| Criteria | Watermarking | Digital fingerprinting |
|---|---|---|
| Visibility | Can be visible or invisible; visible watermarks assert ownership directly on the content, while invisible ones protect without altering appearance | Always invisible and does not alter content in any perceptible way, maintaining the original user experience |
| Purpose | Primarily used to assert ownership and maintain content integrity | Used mainly for tracking and identification to monitor distribution and detect unauthorized use |
| Robustness to alterations | Designed to be robust against manipulations like compression, cropping, and minor edits; persists through moderate transformations | Sensitive to content changes; significant alterations, like re-encoding or major edits, can produce a different fingerprint |
| Content protection | Helps in copyright enforcement by embedding proof of ownership, aiding in verification of authenticity and integrity | Assists in monitoring distribution and forensic analysis to track content’s path and identify unauthorized distribution |
| Use in copyright enforcement | Critical for asserting copyright ownership and deterring misuse through visible or invisible marks | Valuable for tracking usage patterns and identifying sources in case of illegal distribution or infringement |
| Suitability for AIGC | Used in AI-generated content detection, watermarks can be embedded in media to detect authenticity in AI-generated visuals | Less commonly used for AIGC detection but can help track specific user activity when fingerprints are embedded in distributed content |
| Susceptibility to removal | May be vulnerable to watermark removal attacks if methods of embedding are publicly known or poorly implemented | Difficult to remove without changing the content significantly, as fingerprints are embedded uniquely and are integral to the content distribution |
| Implementation complexity | Generally simpler to implement, as it relies on embedding identifiable markers; visible marks are particularly straightforward | More complex, requiring unique identifiers per copy and often higher computational resources to embed and retrieve fingerprints |
| Example applications | Common in image copyright marking, video watermarking for brand logos, and preventing reuse of visual content in unauthorized contexts | Widely used in music streaming to monitor piracy, digital video distribution to track views, and document tracking for unauthorized sharing |
| Robustness research | Research [16] indicates some watermarking methods may be vulnerable to advanced removal techniques, particularly in AIGC | Less focus on robustness against tampering but can be affected by significant modifications that alter the original fingerprint |
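Two rows of the comparison above, invisibility and sensitivity to content changes, can be made concrete with a small sketch. It rests on loud assumptions: least-significant-bit (LSB) embedding is used only as the simplest invisible watermark and is fragile, so it does not provide the robustness to compression or cropping that the table attributes to well-designed schemes; a SHA-256 digest stands in for a content-derived fingerprint, whereas deployed systems typically use perceptual or per-copy fingerprints. All function names are hypothetical, and pixels are modeled as a flat list of 8-bit grayscale values.

```python
import hashlib


def embed_lsb_watermark(pixels: list, bits: list) -> list:
    """Write each watermark bit into the lowest bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | (bit & 1)
    return marked


def extract_lsb_watermark(pixels: list, length: int) -> list:
    """Read back the lowest bit of the first `length` pixels."""
    return [p & 1 for p in pixels[:length]]


def fingerprint(pixels: list) -> str:
    """Content-derived identifier: SHA-256 over the raw 8-bit pixel values."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def max_pixel_change(original: list, marked: list) -> int:
    """Largest absolute per-pixel difference introduced by embedding."""
    return max(abs(a - b) for a, b in zip(original, marked))
```

Embedding changes each pixel by at most one intensity level, which is why such a mark is imperceptible, while the digest of the marked image differs from that of the original, which illustrates why a hash-style fingerprint is sensitive to any alteration.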
| Model | Accuracy | Reliability | Efficiency | Use Case Highlights | Adaptability to Few-shot Learning |
|---|---|---|---|---|---|
| GPT-3 [77,80,81] | High accuracy in certain generative and comprehension tasks, but lower accuracy in domain-specific tasks | Moderate reliability, struggles with overfitting in limited domain-specific tasks | Resource-intensive, especially for fine-tuning on specialized hardware | Effective in generative tasks and large-scale NLP, limited adaptability for specialized tasks | Strong in few-shot learning, especially in generative tasks |
| BERT [80,81,85] | High accuracy in classification and NLU (Natural Language Understanding) tasks, including entity recognition | Generally reliable for structured NLP tasks, especially classification and sentiment analysis | More efficient than GPT-3, effective in many NLP applications without extensive resources | Best for information extraction, sentiment analysis, and question answering | Moderate, few-shot limited by pre-training on specific masked language tasks |
| BERT (BioBERT) [77,85] | High in biomedical domain | Reliable for biomedical text, high precision and recall in medical and research contexts | Moderate resource needs, optimized for biomedical domains | Protein interactions, biomedical text mining, disease detection | Limited, specialized for biomedical NLP, limited few-shot learning applicability |
| ELECTRA [82] | Comparable to BERT with higher efficiency in classification tasks | Consistently reliable, high performance in GLUE (General Language Understanding Evaluation) benchmark with reduced computation | High efficiency due to pre-training as discriminator rather than generator | Token-based NLP tasks, question answering, and tasks with constrained resources | Moderate, not optimized for few-shot, but robust for token detection tasks |
| CodeT5 [83] | High accuracy in code generation and defect detection tasks | Reliable in tasks with code semantics and identifier differentiation | Efficient for programming language tasks, leverages code-specific pretraining | Code generation, defect detection, multi-language programming tasks | Limited, focused on programming language and code understanding |
| GPT-4 (general NLP) [80,86] | Not domain-specific but achieves high accuracy on text generation and NLU | Higher reliability with human-aligned feedback, yet prone to bias | High compute needs similar to GPT-3 but more efficient at scale | Complex language modeling, context-driven generation | Strong, surpasses GPT-3 with fine-tuning on varied datasets |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).