ARTICLE | doi:10.20944/preprints202207.0308.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: micro-video classification; 3D CNN; multi-modal
Online: 21 July 2022 (03:09:34 CEST)
Along with the popularity of the Internet, people are exposed to more and more ways of micro-videos, and a huge amount of micro-video data has emerged. micro-videos have gradually become the Internet content preferred by the public, and a large number of micro-video apps have also emerged, such as Tiktok and Kwai. Intelligent classification and mining of micro-videos can greatly enhance user experience, improve business operation efficiency and enhance user experience. Through deep intelligent analysis and mining of micro-videos, important information in micro-videos can be extracted to provide an important basis for beautifying videos, content appreciation, video recommendation, content search, etc. In the past, content understanding for short videos often used human work annotation, but in recent years, with the great success of deep convolutional neural networks in image recognition, short video content understanding based on this method has gradually developed. Nowadays, most recognition algorithms extract the feature representation of each frame independently and then fuse them. However, while extracting the feature representation, some low-level semantic features are lost, which makes the algorithm unable to accurately distinguish the category of the video. At present, the algorithm of micro-video recognition based on deep learning has surpassed the iDT algorithm, making these traditional methods fade out of people’s view. In this paper according to the micro-video classification task, a new network model is proposed to concatenate features of each modality into the overall features of various modalities through the network, and then fuse the various modal features with the attention mechanism to obtain the whole micro-video features, which will be used for classification. In order to verify the effectiveness of the algorithm proposed in this paper, experiments are conducted in the public dataset, and it is shown the effectiveness of our model.