Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Multimodal Micro-video Classification Based on 3D Convolutional Neural Network

Version 1 : Received: 12 July 2022 / Approved: 21 July 2022 / Online: 21 July 2022 (03:09:34 CEST)

How to cite: Sun, Y.; Chen, B.; Wei, F.; Chen, X.; Gong, Q.; Zhang, P. Multimodal Micro-video Classification Based on 3D Convolutional Neural Network. Preprints 2022, 2022070308. https://doi.org/10.20944/preprints202207.0308.v1

Abstract

As the Internet has grown more popular, people encounter micro-videos through more and more channels, and a huge amount of micro-video data has emerged. Micro-videos have gradually become the Internet content preferred by the public, and a large number of micro-video apps have also emerged, such as TikTok and Kwai. Intelligent classification and mining of micro-videos can greatly enhance user experience and improve business operation efficiency. Through deep intelligent analysis and mining, important information in micro-videos can be extracted to provide a basis for video beautification, content appreciation, video recommendation, content search, etc. In the past, content understanding for short videos often relied on manual annotation, but in recent years, following the great success of deep convolutional neural networks in image recognition, short-video content understanding based on such networks has gradually developed. Nowadays, most recognition algorithms extract the feature representation of each frame independently and then fuse the results. However, some low-level semantic features are lost during feature extraction, which prevents the algorithm from accurately distinguishing the category of the video. At present, micro-video recognition algorithms based on deep learning have surpassed the iDT algorithm, and such traditional methods are fading from view. In this paper, a new network model is proposed for the micro-video classification task. The network concatenates the features of each modality into overall per-modality features, and then fuses the modality features with an attention mechanism to obtain the features of the whole micro-video, which are used for classification. To verify the effectiveness of the proposed algorithm, experiments are conducted on a public dataset, and the results show that the model is effective.
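
The fusion step described in the abstract (concatenating features within each modality, then combining the modalities with attention) can be illustrated with a minimal sketch. The abstract gives no implementation details, so this is not the authors' model. The modality names (`visual`, `acoustic`, `textual`), the helper functions, the fixed `query` vector standing in for a learned attention parameter, and the toy numbers are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_segments(segments):
    """Concatenate per-segment feature vectors into one modality-level vector."""
    return [value for segment in segments for value in segment]


def attention_fuse(modality_features, query):
    """Fuse equal-length modality vectors with softmax attention weights.

    `query` is a hypothetical stand-in for a learned attention parameter.
    Returns the fused vector and the weight assigned to each modality.
    """
    names = sorted(modality_features)
    # Score each modality by the dot product of its vector with the query.
    scores = [
        sum(q * x for q, x in zip(query, modality_features[name]))
        for name in names
    ]
    weights = softmax(scores)
    # The fused vector is the attention-weighted sum of the modality vectors.
    fused = [
        sum(w * modality_features[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Toy per-segment features (assumed values, two segments per modality).
features = {
    "visual": concat_segments([[0.9, 0.1], [0.8, 0.2]]),
    "acoustic": concat_segments([[0.2, 0.7], [0.1, 0.6]]),
    "textual": concat_segments([[0.5, 0.5], [0.4, 0.4]]),
}
query = [1.0, 0.0, 1.0, 0.0]  # hypothetical attention query

fused, weights = attention_fuse(features, query)
```

In a trained model the query (or a small scoring network) would be learned jointly with the feature extractors, and the fused vector would be passed to a classifier. Here the weights only show how a modality that matches the query more closely contributes more to the fused representation.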

Keywords

micro-video classification; 3D CNN; multi-modal

Subject

Computer Science and Mathematics, Computer Science
