Submitted: 09 May 2025
Posted: 09 May 2025
Abstract
Keywords:
1. Introduction
2. Film Poster Composition Layout Method
3. Film Poster Layout Segmentation Method Based on Improved DeepLabv3+
3.1. Relevant Model Theory
3.1.1. DeepLabv3+ Base Model
3.1.2. MobileNetV3 Network
3.1.3. GoogLeNet Network
3.2. Improved DeepLabv3+ Network Models
4. Dataset and Model Training
4.1. Dataset Production
4.2. Model Training
5. Experimental Results and Analysis
5.1. Loss Functions and Evaluation Indicators
5.2. Segmentation Comparison Experiment Analysis
5.2.1. Impact of Different Backbone Feature Extraction Networks on Model Performance
5.2.2. Comparative Experimental Analysis of Different Models
5.3. Analysis of Layout Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Dataset | Train | Val | Test |
|---|---|---|---|
| Poster Segmentation | 1610 | 460 | 230 |
| Poster_layout classification | 2080 | 220 | — |

| Backbone | mIoU (%) | Time (h) |
|---|---|---|
| Xception | 71.09 | 25.10 |
| ResNet101 | 72.89 | 26.20 |
| MobileNetV2 | 70.05 | 16.20 |
| MobileNetV3-small | 75.60 | 15.09 |

| Model | Backbone | mIoU (%) | Time (h) |
|---|---|---|---|
| FCN | ResNet101 | 73.00 | 15.54 |
| PSPNet | ResNet101 | 73.60 | 16.27 |
| DeepLabv3 | ResNet101 | 74.80 | 17.54 |
| DeepLabv3+ | MobileNetV2 | 69.50 | 16.01 |
| Proposed model (this paper) | MobileNetV3-small | 75.60 | 15.09 |

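The tables above report segmentation quality as mean intersection over union (mIoU). As a minimal sketch of the standard definition, and not necessarily the authors' exact evaluation code, the snippet below builds a confusion matrix from two label maps and averages the per-class IoU, TP / (TP + FP + FN). The function names, the three-class toy label maps, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixel pairs; rows index ground truth, columns index prediction."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, predicted) pair as a single integer bin.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but labeled otherwise
    fn = cm.sum(axis=1) - tp  # labeled as the class, but predicted otherwise
    denom = tp + fp + fn
    valid = denom > 0  # assumption: skip classes absent from both maps
    return (tp[valid] / denom[valid]).mean()


# Hypothetical 2x3 label maps with three classes (illustrative data only).
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(gt, pred, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {100 * miou:.2f}%")
```

For this toy input the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. In practice the confusion matrix would be accumulated over every image in the test split before averaging.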
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).