Preprint | Article | Version 1 | Preserved in Portico | This version is not peer-reviewed

An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring

Version 1 : Received: 27 March 2024 / Approved: 29 March 2024 / Online: 1 April 2024 (09:49:59 CEST)

A peer-reviewed article of this Preprint also exists.

Qin, Y.; Ye, O.; Fu, Y. An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring. Electronics 2024, 13, 1522.

Abstract

In recent decades, as the scale of video data has grown, near-duplicate videos have continued to emerge. The data quality issues caused by near-duplicate videos are becoming increasingly prominent and have affected the use of normal videos. Although current studies on near-duplicate video detection help uncover data quality issues in video data, they lack a process for automatically merging video data represented by high-dimensional features, which makes it difficult to automatically clean near-duplicate videos and improve the data quality of video datasets. At present, there are few studies on near-duplicate video data cleaning. When the prior distribution is unknown, existing methods are sensitive to the ordering of the video data and to the choice of initial cluster centers, which seriously affects the accuracy of near-duplicate video data cleaning. To address these issues, this paper proposes an automatic near-duplicate video data cleaning method based on a consistent feature hash ring. First, a residual network with convolutional block attention modules, a long short-term memory deep network, and an attention model are integrated to construct an RCLA deep network with a multi-head attention mechanism that extracts spatiotemporal features from video data. Then, a consistent feature hash ring is constructed, which alleviates the sensitivity to video data ordering while providing the conditions for merging near-duplicate videos. To reduce the sensitivity of the near-duplicate video cleaning results to the initial cluster centers, an optimized feature distance-means clustering algorithm is constructed by applying a mountain peak function on the consistent feature hash ring, which enables automatic cleaning of near-duplicate video data. Finally, experiments are conducted on the commonly used CC_WEB_VIDEO dataset and a coal mining video dataset. Compared with some existing works, simulation results demonstrate the performance of the proposed method.
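
The two ideas behind the cleaning step, order-independent placement on a hash ring and mountain-function selection of initial cluster centers, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how the consistent feature hash ring is built, so the sketch assumes ring positions come from hashing a video identifier, and it uses the classic mountain-function update as a stand-in for the paper's optimized variant. The names `ring_position`, `build_ring`, `mountain_seeds`, `RING_SIZE`, `alpha`, and `beta` are hypothetical.

```python
import hashlib
import math

RING_SIZE = 2 ** 32  # hypothetical ring size, chosen for illustration


def ring_position(video_id):
    """Map a video identifier to a fixed slot on the ring (assumed scheme)."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def build_ring(features):
    """Return (position, video_id) pairs sorted by ring position.

    The result depends only on the identifiers, not on insertion order.
    """
    return sorted((ring_position(vid), vid) for vid in features)


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mountain_seeds(features, k, alpha=1.0, beta=0.5):
    """Pick up to k initial centers using the classic mountain function.

    Each video's height is the sum of exp(-alpha * d^2) over all videos,
    so videos in dense regions score high. After a peak is chosen, the
    heights around it are suppressed so the next peak lies elsewhere.
    """
    ids = [vid for _, vid in build_ring(features)]
    height = {
        v: sum(
            math.exp(-alpha * squared_distance(features[v], features[u]))
            for u in ids
        )
        for v in ids
    }
    seeds = []
    for _ in range(min(k, len(ids))):
        # Ties are broken by ring order, which is itself order-independent.
        peak = max(ids, key=lambda v: height[v])
        seeds.append(peak)
        peak_height = height[peak]
        for v in ids:
            d2 = squared_distance(features[v], features[peak])
            height[v] -= peak_height * math.exp(-beta * d2)
    return seeds


# Toy 2-D "features": three near-duplicates near the origin, two near (5, 5).
toy_features = {
    "a1": [0.0, 0.0],
    "a2": [0.1, 0.0],
    "a3": [0.0, 0.1],
    "b1": [5.0, 5.0],
    "b2": [5.1, 5.0],
}
seeds = mountain_seeds(toy_features, k=2)
print(seeds)
```

On the toy data, the first seed falls in the denser group near the origin and the second in the group near (5, 5). Because the seeds are derived from the data and the ring order rather than from a random draw or the input order, reordering `toy_features` returns the same seeds.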

Keywords

Video cleaning; consistent feature hash ring; feature distance means; mountain peak function; multi-head attention mechanism; near-duplicate videos

Subject

Computer Science and Mathematics, Analysis
