Submitted:
04 December 2023
Posted:
06 December 2023
Abstract
Keywords:
1. Introduction
2. Literature Review
- consensus-driven working groups,
- global conferences and workshops,
- standards liaison, and
- educational efforts to promote widespread acceptance of metadata standards and practices.
2.1. Existing Methods
3. Our Methodology
3.1. Objective
3.2. Dataset Preparation
3.2.1. YouTube-dl
- A CSV file containing the URL, start time, and end time of each video is prepared.
- `ydl.download([URL])` is used to download the entire YouTube video.
- Tags are then added to individual videos, e.g. "video ID, video title".
- The videos are cropped using FFmpeg and stored locally: `ffmpeg -i inputFile -ss startTime -to endTime -c copy outputPath/fileName-TRIM.fileExtension`
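The steps above can be sketched as follows. This is a minimal illustration, not the authors' exact script: the CSV layout, the video-id naming scheme, the `-TRIM` output suffix, and the `youtube_dl` options are assumptions based on the steps listed.

```python
import csv
import subprocess

def build_trim_cmd(input_file, start, end, output_file):
    """Mirror the trimming command used in the paper:
    ffmpeg -i inputFile -ss startTime -to endTime -c copy output"""
    return ["ffmpeg", "-i", input_file, "-ss", start,
            "-to", end, "-c", "copy", output_file]

def process_csv(csv_path, download=False):
    """Read (URL, start, end) rows, optionally download, then trim locally."""
    with open(csv_path, newline="") as f:
        for url, start, end in csv.reader(f):
            local = url.rsplit("=", 1)[-1] + ".mp4"   # assumed naming: video id
            if download:
                import youtube_dl                      # pip install youtube_dl
                with youtube_dl.YoutubeDL({"outtmpl": local}) as ydl:
                    ydl.download([url])
            out = local.replace(".mp4", "-TRIM.mp4")
            subprocess.run(build_trim_cmd(local, start, end, out), check=True)
```

The `-c copy` flag avoids re-encoding, so trimming is fast but cuts land on the nearest key frame rather than the exact timestamp.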
3.3. Key-Frame Detection
3.3.1. Pixel Subtraction
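Pixel subtraction compares each frame with its predecessor and flags a key frame when the pixel-level change is large. A minimal sketch, assuming greyscale frames as NumPy arrays; the difference threshold and function names are our assumptions, not values from the paper:

```python
import numpy as np

def frame_difference(prev, curr):
    """Mean absolute pixel difference between two greyscale frames."""
    return float(np.mean(np.abs(prev.astype(np.int16) - curr.astype(np.int16))))

def detect_keyframes(frames, threshold=30.0):
    """Indices of frames that differ from their predecessor by more than
    the (assumed) threshold; the first frame is always kept."""
    keys = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            keys.append(i)
    return keys
```

Casting to a signed type before subtracting avoids the wrap-around that unsigned 8-bit subtraction would produce.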
3.3.2. Frame Count Method
3.3.3. ffprobe Tool
- Intra-coded frames / key frames (I-frames)
- Predicted frames (P-frames)
- Bi-directionally predicted frames (B-frames)
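ffprobe can report the picture type of every frame in a video, and the I-frames in that sequence can be taken directly as key frames. A sketch of that idea — the helper names are ours, while the `ffprobe` flags shown are the tool's standard options:

```python
import json
import subprocess

def iframe_indices(pict_types):
    """Indices of the key (I) frames in a sequence of picture types."""
    return [i for i, t in enumerate(pict_types) if t == "I"]

def ffprobe_frame_types(video_path):
    """Ask ffprobe for the picture type (I/P/B) of every video frame."""
    cmd = ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
           "-show_frames", "-show_entries", "frame=pict_type",
           "-of", "json", video_path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [f["pict_type"] for f in json.loads(out)["frames"]]
```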
3.3.4. Mittal (2020)
- Step 1: Extract candidate frames. This step finds potential candidate frames that could become keyframes, reducing the number of frames to be processed in the next phase while retaining all relevant ones. Consecutive frame differences are calculated, and the frame with the greatest difference within each window of frames is selected.
- Step 2: Cluster similar candidate frames. Similar frames that are relatively close to one another are grouped into a single cluster. Before clustering, each frame is preprocessed to extract its relevant information: it is scaled, converted to greyscale, and cleaned up further so that only the most significant content remains.
- Step 3: Choose the best frame from each cluster. Finally, the best frame from each cluster, together with all frames that could not be clustered, is identified as a keyframe. The best frame is chosen based on brightness and an image-blur index; because all frames in a cluster have similar content, the remaining frames in each cluster are discarded.
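Steps 1 and 3 can be sketched as follows, assuming the consecutive-frame differences and the per-frame quality scores have already been computed. The window size and function names are illustrative, not taken from the original implementation:

```python
import numpy as np

def candidate_frames(diffs, window=10):
    """Step 1: per window of consecutive-frame differences, keep the index
    with the largest difference as a candidate keyframe."""
    candidates = []
    for start in range(0, len(diffs), window):
        chunk = diffs[start:start + window]
        candidates.append(start + int(np.argmax(chunk)))
    return candidates

def best_frame(scores):
    """Step 3: pick the frame with the highest brightness/sharpness score
    from a cluster, given a mapping of frame id to score."""
    return max(scores, key=scores.get)
```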
3.4. Text Detection
- EasyOCR
- Tesseract OCR
3.4.1. EasyOCR
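A minimal EasyOCR usage sketch. `Reader.readtext` returning (bounding box, text, confidence) triples is the library's documented behavior; the confidence cutoff and helper names are our assumptions:

```python
def join_detections(detections, min_conf=0.3):
    """Keep OCR fragments whose confidence exceeds min_conf (assumed cutoff)."""
    return " ".join(text for _box, text, conf in detections if conf > min_conf)

def read_slide_text(image_path):
    """OCR a key frame with EasyOCR (pip install easyocr)."""
    import easyocr                     # heavy import kept local
    reader = easyocr.Reader(["en"], gpu=False)
    # readtext returns (bounding_box, text, confidence) triples
    return join_detections(reader.readtext(image_path))
```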
3.4.2. Tesseract OCR
- Converting the frame to a binary image
- Blurring the image with a Gaussian filter
- Detecting edges using different methods
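The preprocessing steps above might be sketched as follows, assuming OpenCV for the Gaussian blur and edge detection and `pytesseract` as the Tesseract binding; the thresholds and kernel size are illustrative:

```python
import numpy as np

def binarize(grey, threshold=127):
    """Step 1: threshold a greyscale frame into a binary image."""
    return np.where(grey > threshold, 255, 0).astype(np.uint8)

def preprocess_and_ocr(grey):
    """Steps 2-3 plus Tesseract (pip install opencv-python pytesseract)."""
    import cv2
    import pytesseract
    binary = binarize(grey)
    blurred = cv2.GaussianBlur(binary, (5, 5), 0)   # step 2: Gaussian filter
    edges = cv2.Canny(blurred, 50, 150)             # step 3: edge detection;
    # the edge map is returned so a caller could crop to text regions,
    # while the OCR itself runs on the blurred binary image.
    return pytesseract.image_to_string(blurred), edges
```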
3.5. Indexing System
3.5.1. FuzzyWuzzy
- Professor Name is almost always written on a single line.
- Professor Name has either "Prof." or "Dr." as a prefix.
- Professor Name always appears on the line following the word "by".
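The heuristics above can be encoded directly. The regex, function names, and score cutoff below are our assumptions; `process.extractOne` is fuzzywuzzy's documented API, used here to resolve the OCR'd name against a hypothetical faculty roster:

```python
import re

def extract_professor_name(lines):
    """Apply the three layout heuristics to OCR'd slide lines."""
    for i, line in enumerate(lines):
        # Heuristic 2: a Prof./Dr. prefix marks the name line
        # (heuristic 1: the whole name sits on that single line).
        m = re.search(r"\b(?:Prof\.?|Dr\.?)\s+[A-Z][\w.\s]*", line)
        if m:
            return m.group(0).strip()
        # Heuristic 3: the name is on the line following "by".
        if line.strip().lower().endswith("by") and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None

def roster_match(name, faculty, min_score=80):
    """Optionally resolve the OCR'd name against a known faculty list
    with fuzzywuzzy (pip install fuzzywuzzy)."""
    from fuzzywuzzy import process
    best, score = process.extractOne(name, faculty)
    return best if score >= min_score else None
```

The fuzzy match absorbs OCR noise (dropped dots, swapped characters) that an exact string comparison would reject.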
3.5.2. spaCy NER Model
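A minimal sketch of PERSON extraction with a pretrained spaCy pipeline. The model choice `en_core_web_sm` and the `dedupe` helper are our assumptions, not details from the paper:

```python
def dedupe(names):
    """Collapse repeated detections, preserving first-seen order."""
    seen = []
    for n in names:
        if n not in seen:
            seen.append(n)
    return seen

def person_entities(text):
    """Extract PERSON entities with a pretrained spaCy pipeline
    (pip install spacy; python -m spacy download en_core_web_sm)."""
    import spacy
    nlp = spacy.load("en_core_web_sm")
    return dedupe(ent.text for ent in nlp(text).ents
                  if ent.label_ == "PERSON")
```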
4. Results
| Category | EasyOCR, ffprobe | TesseractOCR, ffprobe | EasyOCR, [10] |
|---|---|---|---|
| Publisher Name | 88.03 | 63.01 | 71.24 |
| Institute Name | 88.88 | 68.54 | 74.15 |
| Department Name | 82.47 | 65.87 | 75.07 |
| Professor Name | 85.89 | 64.44 | 75.60 |
References
- Yan, X., Gilani, S. Z., Qin, H., Feng, M., Zhang, L., and Mian, A. S. (2018). Deep keyframe detection in human action videos. CoRR, abs/1804.10021. [CrossRef]
- Gawande, U., Hajari, K., and Golhar, Y. (2020). Deep learning approach to key frame detection in human action videos. In Sadollah, A. and Sinha, T. S., editors, Recent Trends in Computational Intelligence, chapter 7. IntechOpen, Rijeka. [CrossRef]
- Hamad, K. and Kaya, M. (2016). A detailed analysis of optical character recognition technology. International Journal of Applied Mathematics, Electronics and Computers, 4:244–244. [CrossRef]
- Ding, J., Zhao, G., and Xu, F. (2018). Research on video text recognition technology based on ocr. In 2018 10th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pages 457–462. [CrossRef]
- Yang, H., Siebert, M., Luhne, P., Sack, H., and Meinel, C. (2011b). Lecture video indexing and analysis using video ocr technology. In 2011 Seventh International Conference on Signal Image Technology & Internet-Based Systems, pages 54–61. [CrossRef]
- Yang, H., Siebert, M., Luhne, P., Sack, H., and Meinel, C. (2011a). Automatic lecture video indexing using video ocr technology. In 2011 IEEE International Symposium on Multimedia, pages 111–116. [CrossRef]
- TIB (2014). TIB AV-Portal. https://av.tib.eu/.
- NDLI (2018). National Digital Library of India. https://ndl.iitkgp.ac.in/.
- DCMI (2018). Dublin Core Metadata Initiative. https://www.dublincore.org/.
- Mittal, V. (2020). Key frame extraction. https://github.com/varunmittal50/key_frame_extraction_public.




Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).