Preprint
Article

This version is not peer-reviewed.

Keyframe Selection and Multimodal Fusion for Product Recognition in E-Commerce Live Streaming

Submitted: 05 May 2026

Posted: 06 May 2026

Abstract
Product recognition in e-commerce live streaming is hindered by rapid viewpoint changes, occlusions, motion blur, and inconsistencies between visual and spoken information. Existing approaches typically focus on individual components such as detection, OCR, or speech recognition, which limits their effectiveness in end-to-end scenarios. To address this problem, we propose an integrated framework that combines task-oriented keyframe selection with multimodal semantic fusion. The framework first uses D-FINE to localize product regions and then selects informative frames through two complementary strategies. Strategy A considers both detection confidence and Laplacian-based sharpness, while Strategy B combines detection confidence with a learned image-quality score estimated by an EfficientNetV2-based model. OCR, visual recognition, and ASR are then applied to the selected data, and a Qwen-Plus large language model integrates the multimodal evidence into structured product outputs. Experiments on an in-house dataset demonstrate significant gains over a last-frame baseline. Strategy A increases Perfect Match Rate from 58.00% to 80.00% and Product Name Recognition Accuracy from 78.00% to 98.00%. Strategy B achieves 77.00% and 98.00%, respectively. Ablation studies further show that the full multimodal framework consistently outperforms unimodal and dual-modality variants. In addition, Top-K analysis indicates that single-frame inference provides a good balance between performance and efficiency. Overall, the proposed framework offers an effective and practical solution for product recognition in complex live-streaming scenarios.
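As a rough illustration of the idea behind Strategy A, the sketch below scores each candidate frame by mixing its detection confidence with a Laplacian-based sharpness value and keeps the highest-scoring frame. The abstract does not state how the two cues are combined, so the weighted sum, the weight `alpha`, the max-normalization of sharpness, and the function names `laplacian_variance` and `select_keyframe` are all assumptions made for this example and not the authors' implementation. Pure Python is used to keep the example self-contained; in practice the sharpness term is commonly computed with OpenCV as `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    `gray` is a 2D list of grayscale intensities (at least 3x3).
    Higher values indicate more high-frequency detail, i.e. a sharper crop.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def select_keyframe(frames, alpha=0.5):
    """Pick the best frame from a list of (gray_crop, detection_confidence).

    Assumed scoring rule (not taken from the paper):
        score = alpha * confidence + (1 - alpha) * normalized_sharpness
    Sharpness is divided by the maximum over the candidates so that both
    terms lie in [0, 1]. Returns (best_index, scores).
    """
    sharpness = [laplacian_variance(crop) for crop, _ in frames]
    max_sharpness = max(sharpness) or 1.0  # avoid division by zero
    scores = [
        alpha * conf + (1 - alpha) * (s / max_sharpness)
        for (_, conf), s in zip(frames, sharpness)
    ]
    best_index = max(range(len(frames)), key=scores.__getitem__)
    return best_index, scores


# Toy candidates: a featureless (blurred-looking) crop with high confidence
# and a high-contrast checkerboard crop with slightly lower confidence.
flat = [[10] * 5 for _ in range(5)]
checker = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]

best, scores = select_keyframe([(flat, 0.9), (checker, 0.8)])
print(best, scores)
```

In this toy case the sharper crop is selected even though its detection confidence is slightly lower, which is the trade-off a combined score is meant to capture. A Strategy B analogue would replace `laplacian_variance` with a learned quality predictor.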
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated