Submitted:
25 May 2025
Posted:
27 May 2025
Abstract
Keywords:
1. Introduction
2. Overall Design of Advertisement Creative System Based on Multimodal Deep Learning
3. Multimodal Deep Learning Key Technologies
3.1. Multimodal fusion technology
3.2. Deep Learning Model Design
3.3. Algorithm for evaluating advertisement creative performance
4. Functional Validation of a Multimodal Deep Learning Advertisement Creative System
4.1. Experimental Design
4.2. Model Training and Parameter Tuning
4.3. Comparison Experiments
4.3.1. Comparison with traditional methods
4.3.2. Comparison of the performance of different models
5. Conclusion
References

| Module Name | Input Dimension | Output Dimension | Attention Heads | Residual Connection | Normalization |
|---|---|---|---|---|---|
| Image encoder | 224 × 224 × 3 | 512 | - | No | - |
| Text encoder | 512 (maximum number of tokens) | 512 | - | No | - |
| User behavior encoder | 64 | 128 | - | No | - |
| Fusion layer after feature concatenation | 512 + 512 + 128 = 1152 | 1024 | 8 | Yes | LayerNorm |
| Cross-modal attention module | 128 × 1024 | 128 × 1024 | 8 | Yes | LayerNorm |
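
The dimension bookkeeping in the fusion row can be illustrated with a minimal sketch. It assumes the three encoder outputs are concatenated into one 1152-dimensional vector, projected to 1024 dimensions by a single linear map, and then layer-normalized. The helper names (`concat_features`, `make_projection`, `linear_projection`, `layer_norm`) and the random weights are hypothetical, and the 8-head attention and residual connection listed in the table are not modeled here.

```python
import math
import random

# Encoder output sizes and fusion output size, taken from the table above.
IMAGE_DIM, TEXT_DIM, BEHAVIOR_DIM = 512, 512, 128
FUSED_DIM = 1024


def concat_features(image_vec, text_vec, behavior_vec):
    """Concatenate the three modality vectors (512 + 512 + 128 = 1152)."""
    assert len(image_vec) == IMAGE_DIM
    assert len(text_vec) == TEXT_DIM
    assert len(behavior_vec) == BEHAVIOR_DIM
    return list(image_vec) + list(text_vec) + list(behavior_vec)


def make_projection(in_dim, out_dim, seed=0):
    """Random placeholder weights; a trained model would learn these."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear_projection(vec, weights, bias):
    """Compute W @ vec + b using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


rng = random.Random(1)
image_vec = [rng.random() for _ in range(IMAGE_DIM)]
text_vec = [rng.random() for _ in range(TEXT_DIM)]
behavior_vec = [rng.random() for _ in range(BEHAVIOR_DIM)]

fused_in = concat_features(image_vec, text_vec, behavior_vec)
weights, bias = make_projection(len(fused_in), FUSED_DIM)
fused_out = layer_norm(linear_projection(fused_in, weights, bias))

print(len(fused_in), len(fused_out))  # 1152 1024
```
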

| Network Layer | Input Dimension | Output Dimension | Attention Heads | Feedforward Layer Dimension | Activation Function | Normalization | Dropout Rate |
|---|---|---|---|---|---|---|---|
| Transformer Layer 1 | 1024 | 128 | 8 | 2048 | GELU | LayerNorm | 0.1 |
| Transformer Layer 2 | 128 | 256 | 8 | 2048 | GELU | LayerNorm | 0.1 |
| Transformer Layer 3 | 256 | 512 | 16 | 2048 | GELU | LayerNorm | 0.1 |
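
As a rough consistency check on the layer settings above, the sketch below encodes the three rows as configuration objects and verifies two things: that each layer's output dimension matches the next layer's input dimension, and that the attention heads divide the input dimension evenly. Splitting the input dimension across heads is an assumption made here for illustration, since the table does not say which dimension the heads partition. `LayerConfig` and `validate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerConfig:
    name: str
    in_dim: int
    out_dim: int
    heads: int
    ffn_dim: int = 2048        # feedforward dimension from the table
    activation: str = "gelu"   # activation function from the table
    dropout: float = 0.1       # dropout rate from the table


LAYERS = [
    LayerConfig("Transformer Layer 1", 1024, 128, 8),
    LayerConfig("Transformer Layer 2", 128, 256, 8),
    LayerConfig("Transformer Layer 3", 256, 512, 16),
]


def validate(layers):
    """Check that layers chain correctly and return the per-head dimension of each."""
    for prev, nxt in zip(layers, layers[1:]):
        if prev.out_dim != nxt.in_dim:
            raise ValueError(f"{prev.name} output does not match {nxt.name} input")
    head_dims = {}
    for layer in layers:
        # Assumption: heads partition the layer's input dimension.
        if layer.in_dim % layer.heads != 0:
            raise ValueError(f"{layer.name}: heads do not divide input dimension")
        head_dims[layer.name] = layer.in_dim // layer.heads
    return head_dims


head_dims = validate(LAYERS)
for name, dim in head_dims.items():
    print(f"{name}: {dim} dimensions per head")
```

Under this assumption the per-head width drops from 128 in the first layer to 16 in the second and third.
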

| Assessment Dimension | Number of Dimensions | Module Source | Meaning of the Indicator | Weight |
|---|---|---|---|---|
| Click-Through Prediction Score | 32 | CTR prediction model output | Expected user willingness to click | 0.5 |
| Emotional Sentiment Match | 24 | Sentiment analysis submodel | Whether the description text consistently expresses the promotional sentiment | 0.3 |
| Visual Appeal Score | 40 | Image attention/attraction model | Combined judgment of color, composition, clarity, and other elements | 0.2 |
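
The weights in the table sum to 1, which suggests a weighted linear combination of the three indicators. The sketch below assumes each indicator has already been reduced to a scalar in [0, 1]. How the 32-, 24- and 40-dimensional module outputs are reduced is not stated in the table. The key names, the `overall_score` function and the example values are hypothetical.

```python
# Weights taken from the table above; the key names are illustrative.
WEIGHTS = {
    "ctr_score": 0.5,
    "sentiment_match": 0.3,
    "visual_appeal": 0.2,
}


def overall_score(scores, weights=WEIGHTS):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(weights[name] * scores[name] for name in weights)


# Made-up example values, not results from the paper.
example = {"ctr_score": 0.70, "sentiment_match": 0.80, "visual_appeal": 0.60}
print(round(overall_score(example), 3))  # 0.5*0.70 + 0.3*0.80 + 0.2*0.60 = 0.71
```
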

| Model Name | Click-Through Accuracy | F1 Score | PSNR (dB) | Average Generation Time (ms) | Overall Rating |
|---|---|---|---|---|---|
| TF-IDF concatenation model | 0.521 | 0.633 | 23.12 | 28.6 | 0.568 |
| LSTM generative model | 0.587 | 0.712 | 25.46 | 36.2 | 0.631 |
| FastText+CNN Combined Model | 0.602 | 0.725 | 26.51 | 33.9 | 0.648 |
| Multimodal Transformer System | 0.681 | 0.791 | 28.94 | 41.7 | 0.729 |
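
To read the comparison more easily, the relative change of the multimodal Transformer system over the strongest traditional baseline (FastText+CNN) can be computed directly from the table rows. The sketch below is only arithmetic on the reported values. The dictionary keys and the `relative_change` helper are illustrative names.

```python
# Rows copied from the table above.
BASELINE = {"accuracy": 0.602, "f1": 0.725, "psnr": 26.51, "time_ms": 33.9, "overall": 0.648}
MULTIMODAL = {"accuracy": 0.681, "f1": 0.791, "psnr": 28.94, "time_ms": 41.7, "overall": 0.729}


def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


gains = {metric: relative_change(MULTIMODAL[metric], BASELINE[metric])
         for metric in BASELINE}

for metric, pct in gains.items():
    print(f"{metric}: {pct:+.1f}%")
```

On these figures the overall rating rises by about 12.5%, while the average generation time grows by about 23%, so the quality gains come with a latency cost.
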

| Model Name | Click-Through Accuracy | F1 Score | BLEU-4 | Average Generation Time (ms) | Overall Rating |
|---|---|---|---|---|---|
| Unimodal shallow stacking model | 0.563 | 0.681 | 0.427 | 25.3 | 0.594 |
| ResNet50+BERT independent coding | 0.608 | 0.724 | 0.473 | 32.1 | 0.648 |
| Multimodal Unified Transformer Encoding | 0.681 | 0.791 | 0.532 | 41.7 | 0.729 |

| Decoder Structure | Click-Through Accuracy | F1 Score | Visual Consistency Score | Average Generation Time (ms) | Overall Rating |
|---|---|---|---|---|---|
| Baseline RNN Generator | 0.578 | 0.692 | 0.661 | 26.5 | 0.609 |
| Transformer Decoder | 0.641 | 0.746 | 0.719 | 33.8 | 0.683 |
| Streamlined GPT structure | 0.673 | 0.783 | 0.752 | 39.2 | 0.716 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).