Collecting Existing Data and Video URLs
To facilitate streamlined YouTube uploads for annual meeting content, a data spreadsheet was required containing all relevant information about each year, including columns that contain the desired YouTube title and description as well as a URL to the desired video in its present storage location (OnDemand, Dropbox, etc.). This spreadsheet was created where possible using automated processes, with different procedures required for different past annual meeting content due to varying formats of storage and availability.
OHBM annual meeting content between 2013 and 2021 has been stored and shared via the OnDemand system. We developed a script using Selenium, a web automation tool, to systematically extract video URLs and speaker and session information from the OnDemand platform. This automated process efficiently extracted multiple video URLs and associated metadata from the OnDemand system, reducing the need for manual navigation and data collection. The script was implemented in Python and utilized the Chrome WebDriver for browser automation. The main processes of this script are detailed below:
WebDriver Setup: A headless (no graphical interface) Chrome browser was initialized using Selenium to simulate user interactions without a visible browser interface. The WebDriver was configured with specific options to handle secure connections, prevent sandboxing issues (which can restrict browser functionality in isolated environments), and optimize performance in a headless environment.
Login Automation: The script navigated to the OnDemand platform’s login page, where login credentials were automatically submitted.
Course and Session Navigation: After login, the script iterated through a predefined list of URLs relating to each ‘Course’ (one ‘Course’ typically contained all content for a given year’s annual meeting). For each course, it navigated through the available sections (sessions and individual presentations). Elements on the page were located using CSS selectors and XPath queries to identify relevant video links and download buttons.
Video URL Extraction: Within each session or presentation page, the script located and extracted video URLs, contributor names, and other relevant metadata.
Toggle Download Permissions: When required, the script interacted with page controls (such as download switches) using JavaScript execution to enable download options where applicable.
Data Storage: Extracted data, including course titles, session titles, video titles, contributor names, and video URLs, was compiled into a structured format using Python’s pandas library and saved into a spreadsheet.
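The WebDriver setup step can be illustrated with a minimal sketch. The exact Chrome switches used in the original script are not listed here, so the flags below are assumptions chosen to match the description (headless operation, secure-connection handling, sandbox avoidance); the names `HEADLESS_CHROME_FLAGS` and `build_driver` are hypothetical.

```python
# Chrome command-line switches. The exact set used in the original script is
# not given in the text, so these are assumptions matching its description.
HEADLESS_CHROME_FLAGS = [
    "--headless=new",               # run without a visible browser window
    "--no-sandbox",                 # avoid sandbox failures in isolated environments
    "--disable-dev-shm-usage",      # work around small shared memory in containers
    "--ignore-certificate-errors",  # tolerate certificate problems on HTTPS pages
    "--disable-gpu",                # GPU acceleration is not needed without a display
]


def build_driver():
    """Create a headless Chrome WebDriver (requires selenium and Chrome)."""
    # Imported lazily so the flag list can be inspected without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in HEADLESS_CHROME_FLAGS:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

Calling `build_driver()` returns a driver whose `get(url)` method can then be used for the login and navigation steps.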
Additional columns were added to the spreadsheet output file using Microsoft Excel with formulae to concatenate and combine relevant information from each field and create columns containing the finalized YouTube title and description.
In 2022 and 2023, a virtual component of the conference was hosted on the FourWaves platform, which handled all video submissions. FourWaves allowed a straightforward export of all video submissions as a spreadsheet containing all associated information, including speaker names, talk titles, abstracts, keywords, and, critically, URLs to each respective video file. Additional columns were added to these files using Microsoft Excel with formulae to concatenate and combine relevant information from each field and create columns containing the finalized YouTube title and description. Session names for the educational courses were not provided automatically by FourWaves, so they were collated and matched to talk titles manually from the annual meeting programs.
Finally, for 2024 videos, video files were stored in a Dropbox folder for each annual meeting session. A similar Python script using Selenium and Chrome WebDriver matched video titles to known presentation titles within each respective Dropbox folder and created the desired data spreadsheet as before.
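The text does not state how file names were matched to presentation titles. One possible approach, shown purely as an assumption, is approximate string matching with Python's standard `difflib` module; the function name, the underscore handling, and the 0.6 cutoff are all hypothetical.

```python
import difflib
from pathlib import PurePosixPath


def match_file_to_title(filename, known_titles, cutoff=0.6):
    """Return the known title closest to a video file name, or None."""
    # Drop the extension and normalise separators and case before comparing.
    stem = PurePosixPath(filename).stem.replace("_", " ").lower()
    lowered = {title.lower(): title for title in known_titles}
    hits = difflib.get_close_matches(stem, list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None
```

Unmatched files (a `None` result) would still need the manual review described in this section.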
Some manual review of the automatically created data spreadsheets was required to maintain consistency in naming conventions across years and to adhere to YouTube naming restrictions. For example, YouTube video titles and descriptions are limited to 100 and 5000 characters, respectively. This was never an issue for descriptions, but talk titles regularly exceeded the limit. For some talks (e.g., educational courses and keynote lectures), manually abridged titles were created. However, this was infeasible for most talks, so titles were automatically truncated at the character limit and appended with an ellipsis.
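The automatic shortening of long titles can be sketched as follows. This is an illustration rather than the procedure actually used (the spreadsheets were prepared in Excel), and it assumes that the ellipsis must itself fit inside the 100-character limit; `truncate_title` is a hypothetical name.

```python
YOUTUBE_TITLE_LIMIT = 100  # character limit for titles, as stated in the text


def truncate_title(title, limit=YOUTUBE_TITLE_LIMIT, ellipsis="..."):
    """Shorten a title so that, including the ellipsis, it fits the limit."""
    if len(title) <= limit:
        return title
    # Reserve room for the ellipsis (an assumption) and trim trailing spaces.
    return title[: limit - len(ellipsis)].rstrip() + ellipsis
```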
In addition, YouTube titles and descriptions do not accept the ‘less than’ (<) or ‘greater than’ (>) symbols. These symbols appeared fairly frequently in talk abstracts, which formed part of the descriptions. Because YouTube does accept the similar-looking but differently encoded ‘full-width’ symbols (‘＜’, ‘＞’), these were substituted using Excel’s ‘Find & Replace’ functionality.
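The same substitution could be scripted instead of done by hand. The sketch below is an alternative to the Excel step, not the method used; it maps the two ASCII symbols to the Unicode full-width forms U+FF1C and U+FF1E, and `make_youtube_safe` is a hypothetical name.

```python
# Map ASCII angle brackets to their full-width Unicode counterparts.
ANGLE_BRACKET_MAP = str.maketrans({"<": "\uFF1C", ">": "\uFF1E"})


def make_youtube_safe(text):
    """Replace '<' and '>' with full-width look-alikes."""
    return text.translate(ANGLE_BRACKET_MAP)
```

For example, `make_youtube_safe("p < 0.05")` leaves the text visually similar while removing the rejected character.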
Uploading Content to YouTube
Automated upload methods were implemented, given the large volume of videos that were uploaded to YouTube. After creating the data sheets, videos were uploaded using a custom Python script (v3.9.18) that interfaced with the YouTube Application Programming Interface (API). Google provides a template script and detailed instructions for uploading videos via the YouTube API [4].
For this project, we utilized an updated script compatible with Python 3+ (see ref [5] for the script, and ref [6] for a comprehensive video tutorial on using the YouTube API with Python). This script, ‘upload_video.py’, uploads a video to a given (pre-authorised) YouTube channel with the following command:
python upload_video.py
--file <video.mp4/URL>
--title <title>
--description <text>
--keywords <comma separated keywords>
--privacyStatus <unlisted/private/public>
--category <value>
--selfDeclaredMadeForKids <true/false>
--notifySubscribers <true/false>
This script was called from a wrapper script that read each row of the corresponding data spreadsheet and extracted the appropriate fields for the ‘file’, ‘title’ and ‘description’ flags. For some talks derived from FourWaves content (2022-2023 annual meetings), keywords were also passed via the ‘keywords’ flag. For all videos, ‘privacyStatus’ was set to ‘public’, and the YouTube ‘category’ was set to “28” (‘Science & Technology’). In compliance with YouTube’s requirements, the ‘selfDeclaredMadeForKids’ flag was consistently set to ‘false’, indicating that the videos were not specifically made for children. Additionally, the ‘notifySubscribers’ option was set to ‘false’ to prevent email and mobile notifications to subscribers, given the large volume of uploads.
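A minimal sketch of such a wrapper is given below. It assumes the spreadsheet has been exported to CSV and that its columns are named `video_url`, `youtube_title`, `youtube_description` and `keywords`; these column names and the function names are hypothetical, while the flags and fixed values are those listed above.

```python
import csv
import io
import sys


def build_upload_command(row):
    """Build the upload_video.py argument list for one spreadsheet row."""
    command = [
        sys.executable, "upload_video.py",
        "--file", row["video_url"],
        "--title", row["youtube_title"],
        "--description", row["youtube_description"],
        "--privacyStatus", "public",
        "--category", "28",  # 'Science & Technology'
        "--selfDeclaredMadeForKids", "false",
        "--notifySubscribers", "false",
    ]
    # Keywords are only present for some rows, so the flag is optional.
    if row.get("keywords"):
        command += ["--keywords", row["keywords"]]
    return command


def commands_from_csv(handle):
    """Return one command per data row of an open CSV file."""
    return [build_upload_command(row) for row in csv.DictReader(handle)]


# Illustrative data only; a real run would open the exported spreadsheet.
sample = io.StringIO(
    "video_url,youtube_title,youtube_description,keywords\n"
    "https://example.org/talk.mp4,Example talk,An example description,\n"
)
commands = commands_from_csv(sample)
print(len(commands))
```

Each resulting list could then be executed with `subprocess.run(command, check=True)`, which avoids shell quoting problems in titles and descriptions.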
By default, all new API projects are limited in two key ways. First, videos can only be uploaded in ‘private’ viewing mode. Second, API projects are limited in how many requests they can make per day, measured in quota units (referred to here as ‘tokens’). By default, new API projects are limited to 10,000 tokens per day, and uploading a video requires 1,600 tokens. To upload more than 6 videos per day and to upload in ‘public’ status, the API project had to be audited (https://support.google.com/youtube/contact/yt_api_form). Following this approval, the quota limit was increased to a total of 260,000 tokens per day, providing a daily upload limit of 162 videos.
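The daily upload limits follow from integer division of the daily quota by the cost of one upload, using the figures given in the text. A short check (with the hypothetical helper `daily_upload_limit`):

```python
UPLOAD_COST = 1_600  # tokens per video upload, as stated in the text


def daily_upload_limit(daily_quota, upload_cost=UPLOAD_COST):
    """Number of whole uploads that fit within a daily quota."""
    return daily_quota // upload_cost


print(daily_upload_limit(10_000))   # 6
print(daily_upload_limit(260_000))  # 162
```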
To improve the sustainability of this initiative, all code is made available so that this endeavour can be carried forward by the next generation of OHBMers. We aim to make the uploading of videos from the previous meeting a central goal of the program, educational, and communication committees.