ARTICLE | doi:10.20944/preprints201801.0101.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Sparse Representation; locality information; Dictionary Learning; Video Semantic Analysis; Discriminative Function
Online: 11 January 2018 (09:46:43 CET)
Dictionary Learning (DL) and Sparse Representation (SR) based classifiers have greatly improved classification performance and achieve good recognition rates on image data. In Video Semantic Analysis (VSA), the local structure of video data contains vital discriminative information needed for classification, yet this has not been fully exploited by current DL-based approaches. Moreover, features from the same video category do not yield similar coding results. To address these issues, a novel learning algorithm, called Sparsity-based Locality-Sensitive Discriminative Dictionary Learning (SLSDDL) for VSA, is proposed in this paper. In the proposed algorithm, a category-level discriminant loss function on the sparse coefficients is introduced into the structure of the Locality-Sensitive Dictionary Learning (LSDL) algorithm. Finally, the sparse coefficients of a test video feature sample are solved by the optimized SLSDDL method, and the video semantic classification result is obtained by minimizing the error between the original and reconstructed samples. The experimental results show that the proposed SLSDDL significantly improves video semantic detection performance compared with state-of-the-art approaches. Moreover, its robustness to diverse video environments is also demonstrated, which proves the universality of the novel approach.
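For readers unfamiliar with the classification rule described above, here is a minimal sketch of generic sparse-representation classification by class-wise reconstruction error, assuming scikit-learn; the dictionary D, the atom labels and the OMP solver are illustrative stand-ins, not the SLSDDL optimizer itself.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def classify_by_reconstruction(x, D, atom_labels, n_nonzero=10):
    """Assign x to the class whose sub-dictionary reconstructs it best.

    D: (dim, n_atoms) dictionary; atom_labels: (n_atoms,) class of each atom.
    A generic sparse-coding classifier, not the exact SLSDDL method.
    """
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
    omp.fit(D, x)                                  # sparse code: x ~ D @ w
    w = omp.coef_
    errors = {}
    for c in np.unique(atom_labels):
        w_c = np.where(atom_labels == c, w, 0.0)   # keep class-c coefficients
        errors[c] = np.linalg.norm(x - D @ w_c)    # class-wise residual
    return min(errors, key=errors.get)
```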
ARTICLE | doi:10.20944/preprints202212.0132.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Short video; Sentiment Analysis; Feature; 3D Dense Net; 3D Residual Network
Online: 7 December 2022 (11:57:32 CET)
In recent years, with the development of social media, people are increasingly inclined to upload text, pictures and videos to express their personal emotions, so the number of short videos keeps growing and short video has become a first choice for socializing. Unlike traditional text-based communication, people can convey their emotions and opinions through other media, such as video and images. Sentiment can therefore be expressed and analyzed not only through text but also through the emotional cues in images and videos, allowing researchers to customize products for individual users. Compared with pure text, video can express users' happiness, anger and sorrow more intuitively, which is why short-video applications have gained increasing popularity among Internet users in recent years. However, not all short videos on social networking sites accurately express users' emotions, and the associated text can assist sentiment analysis and thus improve accuracy. Moreover, short-video sentiment analysis based on video frames alone is inaccurate in some scenarios: when a user expresses tears of joy, for example, the sentiments conveyed by facial expression and voice differ, which causes errors in the analysis. As a result, researchers began to consider multimodal sentiment analysis to reduce the impact of such scenarios on short-video sentiment analysis. This paper proposes a sentiment analysis method for short videos. We first propose a residual attention model to make full use of the information in audio to classify the emotions it contains. The text information in the dataset is then classified after feature extraction. The key to extracting features from text is not only to retain its semantic information but also to uncover its latent emotional information, so as to preserve the completeness of the text features. Experiments show that the proposed sentiment analysis model is superior to the baselines.
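As an illustration of the multimodal idea (not the paper's residual attention model), a minimal late-fusion sketch that concatenates pre-extracted audio and text features before classification; the feature arrays and function name are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fusion_classifier(audio_feats, text_feats, labels):
    """Illustrative late fusion: audio_feats and text_feats are assumed to be
    pre-extracted per video (e.g. by an audio encoder and a text encoder)."""
    X = np.concatenate([audio_feats, text_feats], axis=1)  # (n_videos, d_a + d_t)
    clf = LogisticRegression(max_iter=1000)
    return clf.fit(X, labels)
```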
ARTICLE | doi:10.20944/preprints201910.0284.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: raindrop shapes; asymmetric rain drops; scattering calculations; polarimetric radar; 2D-video disdrometer
Online: 25 October 2019 (04:22:52 CEST)
Tropical storm Nate, which was a powerful hurricane prior to landfall along the Alabama coast, traversed north towards our instrumented site in Huntsville, AL. The rain bands lasted 18 h and the 2D-video disdrometer (2DVD) captured the event, which was shallow and indicative of pure warm rain processes. Measurements of raindrop size, shape and velocity distributions are quite rare in pure warm rain and are expected to differ from cold rain processes. In particular, asymmetric shapes due to drop oscillations and their impact on polarimetric radar signatures in warm rain have not been studied so far. Recently, 2DVD data have been used for 3D reconstruction of asymmetric raindrop shapes, but their fraction (relative to the more common oblate shapes) in warm rain has yet to be ascertained. Here we compute the scattering matrix drop-by-drop using the Computer Simulation Technology (CST) integral equation solver for drop sizes > 2.5 mm. From the scattering matrix elements, the polarimetric radar observables are simulated by integrating over consecutive 1-minute segments of the event. These simulated values are compared with data from a dual-polarized C-band radar located 15 km from the 2DVD site to evaluate the contribution of the asymmetric drop shapes.
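A hedged sketch of one segment-wise observable: differential reflectivity can be computed from per-drop backscatter amplitudes alone, since the radar constant cancels in the Z_h/Z_v ratio, and summing over all drops measured in a segment plays the role of integrating over the size distribution. The amplitude arrays are assumed outputs of the CST solver; absolute reflectivity would additionally need the radar constant and sampling volume.

```python
import numpy as np

def zdr_from_scattering(S_hh, S_vv):
    """Differential reflectivity (dB) for one 1-minute segment.

    S_hh, S_vv: complex backscatter amplitudes, one entry per measured drop.
    Simplified illustration, not the full simulation pipeline of the paper.
    """
    return 10.0 * np.log10(np.sum(np.abs(S_hh) ** 2) /
                           np.sum(np.abs(S_vv) ** 2))
```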
HYPOTHESIS | doi:10.20944/preprints202106.0313.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: group activity recognition; graph convolution network; video understanding; video analytics; activity recognition
Online: 11 June 2021 (10:37:38 CEST)
In this paper, we propose a robust video understanding model for activity recognition that learns actors' pair-wise correlations and relational reasoning, exploiting spatial and temporal information. To measure the similarity between pairs of actor appearances and construct an actor relations map, the Zero-Mean Normalized Cross-Correlation (ZNCC) and the Zero-Mean Sum of Absolute Differences (ZSAD) are proposed to allow the Graph Convolution Network (GCN) to learn how to distinguish group actions. We recommend MNASNet as the backbone for feature retrieval. Experiments show 38.50% and 23.7% reductions in training time in the two-stage training process, along with a 1.52% improvement in accuracy over traditional methods.
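For concreteness, minimal implementations of the two similarity measures named above, applied to flattened appearance feature vectors (a sketch; the paper computes these over actor feature maps).

```python
import numpy as np

def zncc(a, b, eps=1e-8):
    """Zero-mean normalized cross-correlation between two feature vectors;
    1 means identical up to gain/offset, -1 means inverted."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def zsad(a, b):
    """Zero-mean sum of absolute differences (lower = more similar)."""
    return float(np.abs((a - a.mean()) - (b - b.mean())).sum())
```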
ARTICLE | doi:10.20944/preprints201810.0118.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: bandelet; medical imaging; quadtree decomposition; SPIHT coder; video coding; video quality measure
Online: 7 October 2018 (10:26:18 CEST)
The operations of digitization, transmission and storage of medical data, particularly images, require increasingly effective encoding methods, not only in terms of compression ratio and information flow but also in terms of visual quality. First there was the DCT (discrete cosine transform), then the DWT (discrete wavelet transform), and their associated image coding and compression standards. Since then, second-generation wavelets have sought to position themselves against the image and video coding methods currently in use. It is in this context that we propose a method combining bandelets with the SPIHT (set partitioning in hierarchical trees) algorithm. There are two main reasons for our approach: the first lies in the ability of the bandelet transform to capture the geometrical complexity of the image structure; the second stems from the suitability of the SPIHT encoder for coding the bandelet coefficients. Quality measurements show that in some cases (at low bit rates) the performance of the proposed coder competes with well-established ones and opens up new application prospects in the field of medical imaging.
ARTICLE | doi:10.20944/preprints201809.0449.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: motion; superpixel; temporal features; video classification
Online: 24 September 2018 (09:54:01 CEST)
Superpixels group the pixels of still images into regions that carry more meaningful information than atomic pixels. However, their usefulness for video classification has received little attention. In this paper, rather than using spatial RGB values as low-level features, we use optical flows mapped into hue-saturation-value (HSV) space to capture rich motion features over time. We introduce motion superpixels, which are superpixels generated from flow fields. After mapping flow fields into HSV space, independent superpixels are formed by iterating seeded regions. Each motion superpixel is tracked over time using nearest neighbors in the histogram of flow (HOF) across consecutive flow fields. To define the temporal representation, the evolution of three features within the superpixel region, namely the HOF, the histogram of oriented gradients (HOG), and the center of superpixel mass, is used as a descriptor. The bag-of-features algorithm is used to quantize the final features, and generalized histogram-kernel support vector machines are used as the learning algorithm. We evaluate the proposed superpixel tracking on first-person videos and action sports videos.
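The flow-to-HSV mapping described above is the standard optical-flow visualization; a minimal OpenCV sketch (illustrative, not the authors' code):

```python
import cv2
import numpy as np

def flow_to_hsv_image(flow):
    """Map a dense optical-flow field (H, W, 2) to an HSV image:
    direction -> hue, magnitude -> value. Superpixels can then be
    computed on the resulting image."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)  # hue: direction
    hsv[..., 1] = 255                                       # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255,
                                cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```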
ARTICLE | doi:10.20944/preprints202305.0630.v1
Subject: Engineering, Civil Engineering Keywords: video-based; structural damage; low-cost method
Online: 9 May 2023 (09:52:46 CEST)
The paper explores the potential of a low-cost advanced video-based technique for the assessment of structural damage induced to buildings by seismic loading. A low-cost high-speed video camera was utilized for motion magnification (MM) processing of footage of a two-story reinforced concrete frame building subjected to shaking table tests. The damage after seismic loading was estimated by analyzing the dynamic behavior (i.e. in terms of modal parameters) and the structural deformations of the building in the MM videos. For method validation, the MM results were compared with damage assessments obtained from conventional accelerometers and from high-precision optical markers tracked by a passive 3D motion capture system. 3D laser scanning was also carried out to obtain an accurate survey of the building geometry before and after the seismic tests. In particular, the accelerometer data were also processed and analyzed with several stationary and non-stationary techniques, with the aim of analyzing the linear behavior of the undamaged structure and the nonlinear structural behavior during damaging shaking table tests. The proposed MM-based procedure provided accurate estimates of the main modal frequency and of the damage location through the analysis of modal shapes, which were confirmed by advanced analyses of the accelerometric data.
ARTICLE | doi:10.20944/preprints202207.0308.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: micro-video classification; 3D CNN; multi-modal
Online: 21 July 2022 (03:09:34 CEST)
Along with the popularity of the Internet, people are exposed to micro-videos in more and more ways, and a huge amount of micro-video data has emerged. Micro-videos have gradually become the Internet content preferred by the public, and a large number of micro-video apps, such as TikTok and Kwai, have also emerged. Intelligent classification and mining of micro-videos can greatly enhance user experience and improve business operation efficiency. Through deep intelligent analysis and mining, important information can be extracted from micro-videos to provide a basis for video beautification, content appreciation, video recommendation, content search, etc. In the past, content understanding for short videos often relied on manual annotation, but in recent years, with the great success of deep convolutional neural networks in image recognition, short-video content understanding based on deep learning has gradually developed. Most current recognition algorithms extract the feature representation of each frame independently and then fuse them; however, some low-level semantic features are lost during extraction, which prevents the algorithm from accurately distinguishing the category of the video. Deep-learning-based micro-video recognition has now surpassed the iDT algorithm, making such traditional methods fade from view. In this paper, for the micro-video classification task, a new network model is proposed that concatenates the features of each modality into overall per-modality features and then fuses the modal features with an attention mechanism to obtain whole micro-video features, which are used for classification. To verify the effectiveness of the proposed algorithm, experiments are conducted on a public dataset and demonstrate the effectiveness of our model.
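A minimal sketch of attention-based fusion of per-modality features, as described above; layer sizes and the exact fusion used in the paper may differ.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weight per-modality feature vectors with learned attention scores and
    sum them into one micro-video representation (a generic sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):            # feats: (batch, n_modalities, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, M, 1)
        return (weights * feats).sum(dim=1)                # (B, dim)
```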
REVIEW | doi:10.20944/preprints202201.0016.v2
Subject: Engineering, Control And Systems Engineering Keywords: AI; deep learning; video editing; image editing
Online: 4 February 2022 (13:40:05 CET)
Video editing is a highly demanding job, as it requires skilled artists or workers with plentiful physical stamina and multidisciplinary knowledge, such as cinematography and aesthetics. Gradually, therefore, more and more research has focused on proposing semi-automatic and even fully automatic solutions to reduce workloads. Since conventional methods are usually designed to follow simple guidelines, they lack the flexibility and capability to learn complex ones. Fortunately, advances in computer vision and machine learning make up for the shortcomings of traditional approaches and make AI editing feasible. No survey has yet drawn these emerging studies together. This paper summarizes the development history of automatic video editing, and especially the applications of AI in partial and full workflows. We emphasize video editing and discuss related works from multiple aspects: modality, type of input videos, methodology, optimization, dataset, and evaluation metric. We also summarize progress in the image editing domain, i.e., style transfer, retargeting, and colorization, and explore the possibility of transferring those techniques to the video domain. Finally, we give a brief conclusion and discuss some open problems.
ARTICLE | doi:10.20944/preprints202305.1911.v1
Subject: Social Sciences, Psychology Keywords: competitive video game; aggression; event-related potential; P300
Online: 26 May 2023 (10:04:06 CEST)
Previous research on video game player aggression has focused on violent content, while recent studies have examined competitive factors. Little research has examined the effects on aggression of competitive factors alone, in video games without violent content, and the neurological processes underlying these effects remain unknown. The present study is the first to examine the electrophysiological characteristics of short-term competitive video game exposure and aggression. Thirty-five participants played a video game in either competitive or solo mode for 15 minutes, followed by an ERP experiment based on the oddball paradigm and the hot sauce paradigm to measure aggressive behavior. Results showed that playing the competitive game mode was associated with faster judgments of aggressive words, larger P300 amplitudes, and the selection of more chili powder. The P300 amplitude partially mediated the relationship between competitive game exposure and aggressive behavior. These findings support the general aggression model.
ARTICLE | doi:10.20944/preprints202302.0415.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: reflectance PPG; reflectance photoplethysmography; heart rate estimation; video
Online: 24 February 2023 (02:52:59 CET)
Non-invasive heart rate (HR) monitoring is important in clinical settings as it plays a critical role in diagnosing a range of health conditions and assessing well-being. Presently, the gold standards for HR measurement are all based on sensors which require skin contact. Apart from inconvenience, contact sensors have proven problematic in certain scenarios – they cannot be used when mechanical isolation of the patient is imperative (burn victims, patients with shaky hands and feet), cause skin damage to premature babies in the ICU and increase the risk of spreading infections. Non-contact HR monitoring using a camera has been recently shown to be a viable alternative. It is now possible to record cardiac-synchronous blood volume variations from facial videos of human subjects under ambient lighting. These variations produce corresponding changes in skin reflectance which can be extracted as a raw reflectance photoplethysmography (rPPG) signal and processed to reveal HR. In this project, an algorithmic framework for webcam-based HR detection was successfully implemented in MATLAB. The investigation was based on 100 self-captured videos (dark-skinned subject) and 48 videos (from 12 subjects, all but one fair-skinned) obtained from COHFACE – an online database of facial videos and corresponding physiological signals. While the performance metrics (mean error, SNR) of the rPPG signals obtained from the self-captured videos were poor (best case mean error of 22%), they were good enough to demonstrate the success of the implementation. The poor results were primarily attributed to skin tone, as rPPG SNR is known to be particularly low for dark tones. The results of the COHFACE videos were far superior, with mean error ranging from 3% to 15% (among 8 different rPPG signals) and 0% to 9% under ambient and dedicated lighting, respectively. This investigation sets the foundation for future research directed at optimizing rPPG performance metrics for dark-skinned subjects.
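A minimal sketch of the core rPPG pipeline implied above, assuming a mean green-channel trace has already been extracted from the facial ROI; the MATLAB framework in the paper includes further processing steps.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr(green_trace, fs):
    """Estimate heart rate (bpm) from the mean green-channel intensity of a
    facial ROI over time: detrend, band-pass to the cardiac band
    (0.7-4 Hz, i.e. 42-240 bpm), take the dominant FFT peak."""
    x = green_trace - np.mean(green_trace)
    b, a = butter(3, [0.7 / (fs / 2), 4.0 / (fs / 2)], btype="band")
    x = filtfilt(b, a, x)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```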
ARTICLE | doi:10.20944/preprints202106.0292.v1
Subject: Social Sciences, Psychology Keywords: sonification; gamification; auditory display; smartphone apps; video games
Online: 10 June 2021 (13:21:22 CEST)
As sonification is supposed to communicate information to users, experimental evaluation of the subjective appropriateness and effectiveness of the sonification design is often desired and sometimes indispensable. Experiments in the laboratory are typically restricted to short-term usage by a small sample size under unnatural conditions. We introduce the multi-platform CURAT Sonification Game that allows us to evaluate our sonification design by a large population during long-term usage. Gamification is used to motivate users to interact with the sonification regularly and conscientiously over a long period of time. In this paper we present the sonification game and some initial analyses of the gathered data. Furthermore, we hope to reach more volunteers to play the CURAT Sonification Game and help us evaluate and optimize our psychoacoustic sonification design and give us valuable feedback on the game and recommendations for future developments.
ARTICLE | doi:10.20944/preprints202104.0257.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: information media; video; patient’ knowledge; antibiotic use; antibiotic resistance
Online: 9 April 2021 (10:23:20 CEST)
Irrational use or misuse of antibiotics, particularly by outpatients, increases antibiotic resistance. A lack of public knowledge about the responsible use of antibiotics and how to obtain them is a major cause of this. This study aimed to assess the effectiveness of an educational video about antibiotics and antibiotic use in increasing outpatients' knowledge in two public hospitals in East Java, Indonesia. A quasi-experimental design with a one-group pretest-posttest setup was used, carried out from November 2018 to January 2019. The study population consisted of outpatients to whom antibiotics were prescribed in the two hospitals. Participants were selected using a purposive sampling technique; 98 outpatients at MZ General Hospital in S regency and 96 at SG General Hospital in L regency were included. A questionnaire was used to measure the respondents' knowledge and consisted of five domains: definition of infections and antibiotics, obtaining antibiotics, directions for use, storage instructions, and antibiotic resistance. The knowledge test score was the total score on the Guttman scale (a dichotomy of 'yes' or 'no' answers). To determine the significance of the difference in knowledge before and after the educational video, and of the difference in knowledge scores between hospitals, the (paired) Student's t-test was applied. The educational videos significantly improved outpatients' knowledge, which increased by 41% in MZ General Hospital and 42% in SG General Hospital. An educational video is a useful method to improve outpatients' knowledge regarding antibiotics.
ARTICLE | doi:10.20944/preprints202011.0649.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: video super-resolution; bidirectional; recurrent method; sliding window method
Online: 25 November 2020 (15:12:38 CET)
Video super-resolution, which uses information from several low-resolution frames to generate high-resolution images, is a challenging task. One possible solution, the sliding-window method, divides the generation of a high-resolution video sequence into independent sub-tasks, using only adjacent low-resolution images to estimate the high-resolution version of the central low-resolution image. Another popular approach, the recurrent method, utilizes not only the low-resolution images but also the previously generated high-resolution frames to generate the next high-resolution image. However, both methods have unavoidable disadvantages: the former usually leads to poor temporal consistency and requires higher computational cost, while the latter cannot make full use of the information contained in optical flow or other computed features. More investigation is therefore needed to find a balance between the two. In this work, a bidirectional frame-recurrent video super-resolution method is proposed. Specifically, a reverse training pass is introduced in which the generated high-resolution frame also helps estimate the high-resolution version of the preceding frame. By combining reverse and forward training, the bidirectional recurrent method not only guarantees temporal consistency but also makes full use of adjacent-frame information, while the computational cost remains acceptable. Experimental results demonstrate that the bidirectional framework achieves remarkable performance: it solves the temporal-consistency problems while producing high-resolution images that compare favorably with those of recurrent-based video super-resolution methods.
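A minimal sketch of the bidirectional recurrence described above; sr_step stands for any frame-recurrent SR network (a hypothetical interface), and averaging the two passes is one simple fusion choice, not necessarily the paper's.

```python
def bidirectional_sr(frames_lr, sr_step):
    """Run a frame-recurrent SR model forward and backward over a sequence.

    frames_lr: list of low-resolution frames (arrays or tensors);
    sr_step(lr, prev_hr) -> hr is any frame-recurrent SR network.
    """
    fwd, prev = [], None
    for lr in frames_lr:                  # forward recurrence
        prev = sr_step(lr, prev)
        fwd.append(prev)
    bwd, prev = [], None
    for lr in reversed(frames_lr):        # reverse recurrence
        prev = sr_step(lr, prev)
        bwd.append(prev)
    bwd.reverse()
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]  # simple fusion
```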
ARTICLE | doi:10.20944/preprints202006.0194.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: network flow; combinatorial optimization; tracking-by-detection; video surveillance
Online: 15 June 2020 (11:26:25 CEST)
In the tracking-by-detection paradigm for multi-target tracking, target association is modeled as an optimization problem that is usually solved through a network flow formulation. In this paper, we propose a combinatorial optimization formulation and use bipartite graph matching to associate targets in consecutive frames. Usually, the target of interest is represented by a bounding box and the whole box is tracked as a single entity. In the case of humans, however, the body undergoes complex articulation and occlusion that severely deteriorate tracking performance. To partially tackle the occlusion problem, we argue that tracking a rigid body part can lead to better performance than whole-body tracking. Based on this assumption, we generate target hypotheses of only the spatial locations of persons' heads in every frame. After localization of the head positions, a constant-velocity motion model is used for the temporal evolution of the targets in the visual scene. Qualitative results are evaluated on four challenging video surveillance datasets, and promising results have been achieved.
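The frame-to-frame association step can be sketched with SciPy's Hungarian solver; the distance gate max_dist is a hypothetical parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, max_dist=50.0):
    """Match predicted head positions to detections via minimum-cost
    bipartite matching. tracks: (N, 2) and detections: (M, 2) arrays of
    image coordinates; pairs farther apart than max_dist are rejected.
    A generic sketch of the association step."""
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```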
Subject: Social Sciences, Media Studies Keywords: live-streaming; video-conference; broadcast; scientific conferences; diversity; inclusion
Online: 10 March 2020 (02:29:22 CET)
Live streaming conferences increase the participation of a diverse audience, help defray travel costs and overcome problems related to travel restrictions. In this article, we lay out tips for implementing live-streaming in scientific meetings. We also cover legal, ethical, and technical aspects implicated with live-streaming scientific talks. To write this article, we leveraged knowledge from our experience in organizing the symposium “Deciphering the Denisovans,” presented at the 88th Annual Meeting of the American Association of Physical Anthropology (AAPA) in Cleveland, OH, in 2019, as well as literature on the topic.
ARTICLE | doi:10.20944/preprints201805.0051.v1
Subject: Engineering, Mechanical Engineering Keywords: porous media; optical video microscopy; microfluidics; waterflooding; surfactants; polymers
Online: 3 May 2018 (05:52:37 CEST)
In this study, we examine microscale waterflooding in a randomly close-packed porous medium. Three different porosities are prepared in a microfluidic platform and saturated with silicone oil. Optical video fluorescence microscopy is used to track the water front as it flows through the porous packed bed. The degree of water saturation is compared for water containing two different chemical modifiers, sodium dodecyl sulfate (SDS) and polyvinylpyrrolidone (PVP), with water in the absence of a surfactant used as a control. Image analysis of our video data yields saturation curves and fractal dimensions, which we use to identify how morphology changes the way an invading water phase moves through the porous media. An inverse analysis based on the implicit pressure explicit saturation (IMPES) simulation technique uses the mobility ratio as an adjustable parameter to fit our experimental saturation curves. The results from our inverse analysis, combined with our image analysis, show that this platform can be used to evaluate the effectiveness of surfactants or polymers as additives for enhancing the transport of water through an oil-saturated porous medium.
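The fractal dimension mentioned above is commonly estimated by box counting; a minimal sketch on a binary mask of the invaded region (a standard estimator, not necessarily the authors' exact procedure):

```python
import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the fractal dimension of a binary water-front mask:
    D is the slope of log N(s) versus log(1/s), where N(s) counts the
    boxes of side s that contain any foreground pixel."""
    counts = []
    for s in sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```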
ARTICLE | doi:10.20944/preprints201710.0119.v2
Subject: Engineering, Civil Engineering Keywords: laser pointer; displacement monitoring; laser fingerprint; video; data synchronization
Online: 11 December 2017 (15:16:12 CET)
Deck inclination and vertical displacements are among the most important technical parameters for evaluating the health status of a bridge and verifying its bearing capacity. Several methods, both conventional and innovative, are used for monitoring structural rotations and displacements; none of them simultaneously offers precision, automation, and static and dynamic monitoring without high-cost instrumentation. The proposed system uses a common laser pointer and image processing. The elastic line inclination is measured by analyzing the single frames of an HD video of the laser beam imprint projected on a flat target. For the image processing, a code was developed in Matlab® that provides the instantaneous rotation and displacement of a bridge under a moving load. An important feature is the synchronization of the load positioning, obtained by a GNSS receiver or by video. After the calibration procedures, a test was carried out during the maneuvers of a heavy truck on a bridge. Synchronized data acquisition made it possible to relate the position of the truck on the deck to the inclination and displacements. The inclination of the elastic line was obtained with a precision of 0.01 mrad. The results demonstrate the suitability of the method for dynamic load tests and for the control and monitoring of bridges.
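Under the small-angle assumption that the pointer rotates rigidly with the deck, the rotation follows from the spot displacement divided by the pointer-to-target distance; a minimal sketch (per-frame centroid extraction and calibration omitted, function name and parameters hypothetical):

```python
import numpy as np

def inclination_mrad(spot_y_px, ref_y_px, mm_per_px, target_dist_m):
    """Deck rotation (mrad) from the vertical displacement of the laser
    imprint centroid on the target: theta ~ displacement / distance."""
    dy_m = (spot_y_px - ref_y_px) * mm_per_px / 1000.0
    return 1000.0 * np.arctan(dy_m / target_dist_m)  # ~ dy / L in mrad
```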
ARTICLE | doi:10.20944/preprints201703.0159.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: object detection; background subtraction; video surveillance; Kinect sensor fusion
Online: 20 March 2017 (10:21:40 CET)
Depth-sensing technology has led to broad applications of inexpensive depth cameras that can capture human motion and scenes in 3D space. Background subtraction algorithms can be improved by fusing color and depth cues, thereby allowing many issues encountered in classical color segmentation to be solved. In this paper, we propose a new fusion method that combines depth and color information for foreground segmentation based on an advanced color-based algorithm. First, a background model and a depth model are developed. Then, based on these models, we propose a new updating strategy that can eliminate ghosting and black shadows almost completely. Extensive experiments have been performed to compare the proposed algorithm with other, conventional RGB-D algorithms. The experimental results suggest that our method extracts foregrounds with higher effectiveness and efficiency.
ARTICLE | doi:10.20944/preprints202206.0356.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: optimization; video segmentation; decision tree; random forest; gradient boost tree
Online: 27 June 2022 (08:56:21 CEST)
Video segmentation is crucial in a variety of practical applications, especially in computer vision. Most recent work on video segmentation focuses on deep-learning-based methods, leaving room for improvement on the side of evolutionary algorithms. This paper proposes a novel video segmentation method that optimizes the segmentation parameters using ensemble-based random forests and gradient-boosted decision trees. The experimental results show the Pareto front of the segmentation parameters (hue, brightness, luminance, and saturation). Our optimization model yields an accuracy of 85% ± 8.85% (micro average: 85.00%), an average class precision of 84.88%, and an average class recall of 85%. We also show video segmentation results based on our optimization method and compare them with Kinect-based video segmentation.
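One way to realize the parameter optimization described above is a surrogate model: fit an ensemble regressor from tried parameter vectors to observed accuracy and rank new candidates. This is a hedged sketch, simpler than the paper's Pareto-front machinery.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def best_parameters(params_tried, accuracies, candidates):
    """Surrogate-based search: params_tried and candidates are arrays of
    [hue, brightness, luminance, saturation] vectors; accuracies are the
    segmentation scores observed for params_tried."""
    rf = RandomForestRegressor(n_estimators=200).fit(params_tried, accuracies)
    return candidates[np.argmax(rf.predict(candidates))]
```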
ARTICLE | doi:10.20944/preprints202107.0621.v1
Subject: Engineering, Automotive Engineering Keywords: Accessibility; Guiding Methods; Immersive Media; Subtitling; Virtual Reality; 360° video
Online: 28 July 2021 (10:28:27 CEST)
Every (multimedia) service needs to be accessible. Accessibility for multimedia content is typically provided by means of access services, of which subtitling is likely the most widespread. To date, many recommendations and solutions for subtitling classical 2D audiovisual services are available. Likewise, recent efforts have been devoted to devising adequate subtitling solutions for VR360 video content. This paper, for the first time, goes a step beyond by exploring two key requirements for meeting the remaining challenges in efficiently subtitling 3D Virtual Reality (VR) content: presentation modes and guiding methods. By leveraging insights from earlier work on VR360 content, the paper proposes novel presentation modes and guiding methods that deal not only with the freedom to explore omnidirectional scenes, but also with additional specificities of 3D VR compared to VR360 content: depth, 6 Degrees of Freedom (6DoF), and viewing perspectives. The obtained results prove that the always-visible mode and a newly proposed comic-style presentation mode are far more appropriate than state-of-the-art fixed-positioned subtitles, mainly in terms of immersion, ease and comfort of reading, and identification of speakers, when applied to professional pieces of content with limited displacement of speakers and limited 6DoF (i.e. users are not expected to navigate far around the virtual environment). Likewise, even in such limited-movement scenarios, the results show that the use of indicators (arrows) as guiding methods is well received. Overall, the paper provides relevant insights and paves the way toward efficiently subtitling 3D VR content.
ARTICLE | doi:10.20944/preprints201908.0289.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: drone video; human action recognition; CNN; Support vector machine (SVM)
Online: 28 August 2019 (03:52:22 CEST)
Recognition of human interaction in unconstrained videos taken from cameras and remote sensing platforms such as drones is a challenging problem. This study presents a method that addresses motion blur, poor video quality, occlusions, differences in body structure or size, and high computation or memory requirements. It contributes to improved recognition of human interaction during disasters such as earthquakes and floods, using drone videos for rescue and emergency management. We used a Support Vector Machine (SVM) to classify high-level, stationary features obtained from a Convolutional Neural Network (CNN) on key-frames of the videos. We extracted conceptual features by employing the CNN to recognize objects in the first and last frames of a video. The proposed method captures the context of a scene, which is significant in determining human behaviour in the videos, and it does not require person detection, tracking, or many image instances. The method was tested on the University of Central Florida (UCF) Sports Action and Olympic Sports videos, which were taken from ground platforms. In addition, drone video captured at the Southwest Jiaotong University (SWJTU) Sports Centre was incorporated to test the developed method. The study achieved an acceptable performance with an accuracy of 90.42%, an improvement of more than 4.92% over existing methods.
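A sketch of the pipeline: fixed CNN features from the first and last key-frames feed an SVM. An off-the-shelf ResNet-18 backbone stands in for the authors' network; preprocessing is assumed done by the caller.

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()            # keep the 512-d pooled features
backbone.eval()

@torch.no_grad()
def video_feature(first_frame, last_frame):  # tensors (3, 224, 224), normalized
    """Concatenated 1024-d descriptor from the first and last key-frames."""
    batch = torch.stack([first_frame, last_frame])
    return backbone(batch).flatten().numpy()

# features = np.stack([video_feature(f, l) for f, l in keyframe_pairs])
# clf = SVC(kernel="rbf").fit(features, labels)
```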
ARTICLE | doi:10.20944/preprints201807.0238.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: Multiple object tracking; Airborne video; Tracklet confidence; Hierarchical association framework
Online: 13 July 2018 (14:27:22 CEST)
Multi-object tracking (MOT) in airborne videos is a challenging problem due to uncertain airborne vehicle motion, vibrations of the mounted camera, unreliable detections, the size, appearance and motion of the moving objects, and occlusions caused by interactions among moving objects and with static objects in the scene. To deal with these problems, this work proposes a four-stage Hierarchical Association framework for multiple object Tracking in Airborne video (HATA). The proposed framework combines data-association-based tracking (DAT) methods with target tracking using a compressive tracking approach to robustly track objects in complex airborne surveillance scenes. In each association stage, different sets of tracklets and detections are associated to efficiently handle local tracklet generation, local trajectory construction, global drifting-tracklet correction and global fragmented-tracklet linking. Experiments with challenging airborne video datasets show significant tracking improvement compared to existing state-of-the-art methods.
ARTICLE | doi:10.20944/preprints201807.0222.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: compressed sensing; distributed video codec; sparse representation; side information reconstruction
Online: 13 July 2018 (03:43:55 CEST)
To address the difficulties in transmitting and storing the large amount of video monitoring image data in underground coal mines, compressed sensing theory is introduced to encode and decode video images, and a new distributed video coding scheme is proposed. To obtain a sparser representation and more general applicability, a block-based adaptive sparse-basis scheme is proposed. For the acquisition of side information, fixed weights are usually used to synthesize the side information and the correlation between different image blocks is neglected; we therefore propose a block-based, classification-weighted side information generation scheme. Experimental results show that the block-based classification codec scheme can make full use of inter-frame correlation. At an appropriate sampling rate, the PSNR of the reconstructed video increases, effectively improving the quality of video frame reconstruction.
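The encoding side of block-based compressed sensing can be sketched as follows, using a random Gaussian measurement matrix; the adaptive sparse basis and side-information decoding of the paper are omitted.

```python
import numpy as np

def block_cs_measure(frame, block=16, rate=0.25,
                     rng=np.random.default_rng(0)):
    """Block-based CS acquisition: each vectorized block x is projected to
    y = Phi @ x at the given sampling rate. Returns the measurement matrix
    and one row of measurements per block (encoding side only)."""
    n = block * block
    Phi = rng.standard_normal((int(rate * n), n)) / np.sqrt(n)
    h = (frame.shape[0] // block) * block
    w = (frame.shape[1] // block) * block
    blocks = (frame[:h, :w].reshape(h // block, block, w // block, block)
              .swapaxes(1, 2).reshape(-1, n))
    return Phi, blocks @ Phi.T
```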
ARTICLE | doi:10.20944/preprints202002.0026.v1
Subject: Social Sciences, Cognitive Science Keywords: mind-wandering; video lecture; self-caught method; oculomotor data; eye movements
Online: 3 February 2020 (08:34:54 CET)
The purpose of this study was to detect mind-wandering experienced by pre-service teachers while learning from a video lecture on physics. The lecture, on Gauss's law, was videotaped from a live lecture in a classroom. We investigated whether oculomotor data and eye movements could be used as markers of the learner's mind-wandering. Data were collected in a study in which 24 pre-service teachers (16 females and 8 males) reported self-caught mind-wandering while watching the 30-minute physics video lecture. A Tobii Pro Spectrum (sampling rate: 300 Hz) was used to capture their eye gaze during the lecture. After watching the video, we interviewed the pre-service teachers about their mind-wandering experience. We first used the self-caught method to capture the timing of mind-wandering, and then identified more precise mind-wandering segments by comparing fixation duration and saccade count. We investigated two types of oculomotor data (blink count, pupil size) and nine eye-movement measures (average peak velocity of saccades; maximum peak velocity of saccades; standard deviation of peak velocity of saccades; average amplitude of saccades; maximum amplitude of saccades; total amplitude of saccades; saccade count per second; fixation duration; fixation dispersion). Unlike previous literature, we found that blink count could not be used as a marker of mind-wandering during video lectures. Based on these results, we identified which of the oculomotor and eye-movement measures reported in previous literature can serve as mind-wandering markers while learning from video lectures similar to real classes. Interview analysis also showed that most participants focused on past thoughts and felt unpleasant after experiencing mind-wandering.
ARTICLE | doi:10.20944/preprints201911.0101.v1
Subject: Medicine And Pharmacology, Anesthesiology And Pain Medicine Keywords: chronic pain; epigenetics; neuropathic pain; postoperative pain; thoracic surgery; video-assisted
Online: 10 November 2019 (09:29:13 CET)
Background: Elucidation of epigenetic mechanisms correlating with neuropathic pain in humans is crucial for the prevention and treatment of this treatment-resistant pain state. In the present study, associations between neuropathic pain characteristics and DNA methylation of the transient receptor potential ankyrin 1 (TRPA1) gene were evaluated in chronic pain patients and preoperative patients. Methods: Pain and psychological states were prospectively assessed in patients who suffered chronic pain or were scheduled for thoracic surgery. Neuropathic characteristics were assessed using the Douleur Neuropathique 4 (DN4) questionnaire. DNA methylation levels of the CpG island in the TRPA1 gene were examined using whole blood. Results: Forty-eight adult patients were enrolled in this study. Increases in DNA methylation rates at CpG -51 showed positive correlations with increases in the DN4 score in both preoperative and chronic pain patients. Combined methylation rates at CpG -51 also increased significantly with increasing DN4 scores. Conclusions: Neuropathic pain characteristics are likely associated with methylation rates at the promoter region of the TRPA1 gene in human peripheral blood.
ARTICLE | doi:10.20944/preprints201804.0333.v2
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: capsule video endoscopy; stochastic sampling; random walks; color gradient; image decomposition
Online: 17 May 2018 (12:46:30 CEST)
Capsule endoscopy, which uses a wireless camera to take images of the digestive tract, is emerging as an alternative to traditional colonoscopy. The diagnostic value of these images depends on the quality of the revealed underlying tissue surfaces. In this paper, we consider the problem of enhancing the visibility of detail and shadowed tissue surfaces in capsule endoscopy images. Using concentric circles at each pixel for random walks combined with stochastic sampling, the proposed method enhances the details of vessel and tissue surfaces. The framework decomposes the image into two detail layers that contain shadowed tissue surfaces and detail features. For the smooth layer, the target pixel value is recalculated from the similarity of the target pixel to neighboring pixels, weighted against the total gradient variation and intensity differences. To evaluate the diagnostic image quality of the proposed method, we used clinical subjective evaluation with rank ordering on a selection from the KID image database and compared against state-of-the-art enhancement methods. The results showed that the proposed method performs better in terms of diagnostic image quality, objective contrast metrics, and the structural similarity index.
ARTICLE | doi:10.20944/preprints201710.0042.v1
Subject: Medicine And Pharmacology, Orthopedics And Sports Medicine Keywords: free software; human motion; Kinovea; low cost; reliability; validity; video analysis
Online: 9 October 2017 (05:07:57 CEST)
Clinical rehabilitation and sports performance analysis both require the objectification of movement. Kinovea© is a free 2D motion analysis software package that enables the establishment of kinematic parameters. This low-cost technology has been used in sports science, as well as in clinical and research work. Although it has been validated as a tool for assessing time-related variables, this is not yet the case for angular and distance variables. The main objective of this study was to determine the validity and reliability of the Kinovea software in obtaining angular and distance data at different perspectives of 90°, 75°, 60° and 45°. For this purpose, a figure with 29 points was designed (in AutoCAD) and 24 frames were analysed. Each frame was examined by three observers, each making two attempts. For each data export, 20 angle and 20 distance variables were calculated, and intra- and inter-observer reliability were also analysed. To evaluate Kinovea's reliability and validity, a multiple approach was applied involving the following analyses: systematic error with a two-way 2×4 ANOVA; relative reliability with the ICC and CV (95% confidence interval); and absolute reliability with the standard error. The results indicate that the Kinovea software is a valid and reliable tool able to measure accurately at distances up to 5 m from the object and over an angle range of 90°–45°. Nevertheless, for optimum results an angle of 90° is suggested.
ARTICLE | doi:10.20944/preprints202301.0426.v1
Subject: Medicine And Pharmacology, Emergency Medicine Keywords: Dispatch; Emergency Medical Dispatch; Emergency Medical Communication Centre; Video Live; COVID19; Emergency Call; Video triage; Public Safety Answering Point; Telemedicine; Emergency Medical Services; Remote assessment; Triage
Online: 24 January 2023 (08:20:00 CET)
The COVID19 pandemic had a major impact on emergency medical communication centres (EMCC). A live video facility was made available to the second-line physicians of an EMCC in which emergency calls are received by first-line paramedics. The objective of this study was to measure the contribution of live video to remote medical triage. This single-centre retrospective study included all telephone assessments of patients with suspected COVID19 symptoms from 01.04.2020 to 30.04.2021 in Geneva, Switzerland. The organisation of the EMCC and the characteristics of patients who called the two emergency lines (the official emergency number and the COVID19 number) with suspected COVID19 symptoms are described. A prospective web-based survey of physicians was conducted during the same period to measure the indications, limitations and impact of live video on their decisions. 8,957 patients were included. 2,157 (48.0%) of the 4,493 patients assessed on the official emergency number had dyspnoea, and 4,045 (90.6%) of 4,464 patients assessed on the COVID19 number had flu-like symptoms. 1,798 (20.1%) patients were reassessed remotely by a physician, including 405 (22.5%) with live video, successful in 315 (77.8%) attempts. The web-based survey (107 forms) showed that physicians used live video mainly to assess patients' breathing (81.3%) and general condition (78.5%). They felt that their decision was modified in 75.7% (n=81) of cases, and identified 7 (7.7%) patients in a life-threatening emergency. Medical triage decisions for suspected COVID19 patients are strongly influenced by the use of live video.
Subject: Computer Science And Mathematics, Computer Science Keywords: reinforcement learning; bitrate streaming; world-models; video streaming; model-based reinforcement learning
Online: 20 August 2020 (07:02:57 CEST)
Adaptive bitrate (ABR) algorithms optimize the quality of streaming experiences for users in client-side video players, especially on unreliable or slow mobile networks. Several rule-based heuristic algorithms achieve stable performance, but they sometimes fail to adapt properly to changing network conditions; fluctuating bandwidth may cause them to default to behavior that creates a negative experience for the user. ABR algorithms can instead be generated with reinforcement learning, a decision-making paradigm in which an agent learns to make optimal choices through interactions with an environment. Training reinforcement learning algorithms for bitrate streaming requires building a simulator in which an agent can experience interactions quickly; training an agent in the real environment is infeasible due to the long step times. This project explores using supervised learning to construct a world-model, or learned simulator, from recorded interactions. A reinforcement learning agent trained inside the learned model, rather than a hand-built simulator, can outperform rule-based heuristics. Furthermore, agents trained inside the learned world-model can outperform model-free agents in low-sample regimes. This work highlights the potential of world-models to serve as quickly learned simulators and to be used to generate optimal policies.
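A minimal sketch of the supervised world-model idea: a network trained on recorded (state, action, next state, reward) tuples, which then serves as the learned simulator. State and action dimensions are illustrative.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Dynamics model for ABR: from (state, one-hot action) predict the next
    state (e.g. throughput, buffer level) and the reward (QoE). Trained with
    supervised regression on recorded interactions."""
    def __init__(self, state_dim=8, action_dim=6, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1))      # next state + reward

    def forward(self, state, action_onehot):
        out = self.net(torch.cat([state, action_onehot], dim=-1))
        return out[..., :-1], out[..., -1]          # next_state, reward
```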
ARTICLE | doi:10.20944/preprints201812.0086.v4
Subject: Computer Science And Mathematics, Computer Science Keywords: multi-modal information fusion; video skimming; audio and text classification; keyframe extraction
Online: 5 August 2019 (03:48:49 CEST)
In this paper, we propose a novel approach to video skimming that exploits the fusion of video temporal information with keyword information extracted from multi-modal video content, including audio, text and visual indices. In addition, we introduce brand-safety filtering and sentiment analysis in order to retain only user-friendly content in the video skim. In experiments using videos from the YouTube-8M dataset, we show that the proposed approach conserves the semantic content of the video far better than approaches that use only partial video information.
ARTICLE | doi:10.20944/preprints201811.0314.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: High-speed video-endoscopy; laryngeal image processing; glottis delineation; Machine Learning; CNN
Online: 13 November 2018 (12:57:10 CET)
Detection of the region of interest (ROI) is a critical step in laryngeal image analysis for the delineation of the glottis contour. The process can improve both the computational efficiency and the accuracy of the image segmentation task, facilitating subsequent analysis and characterization of vocal fold vibration as it correlates with voice quality and pathology. This study aims to develop machine learning approaches for automatic ROI detection in glottis image sequences captured by high-speed video-endoscopy (HSV), a clinical laryngeal imaging modality. In particular, we first applied the support vector machine (SVM) method using the histogram of oriented gradients (HOG) feature descriptor, and second, trained a convolutional neural network (CNN) model for this task. The two approaches are compared in terms of recognition accuracy and computation time.
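The first approach can be sketched with scikit-image's HOG descriptor and scikit-learn's SVM; parameter values are illustrative, not the paper's.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def train_roi_classifier(patches, labels):
    """HOG descriptors of equally sized grayscale candidate patches feed an
    SVM that decides whether a patch contains the glottal ROI. Detection
    then scores a sliding window with the trained classifier."""
    feats = np.stack([hog(p, orientations=9, pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2)) for p in patches])
    return SVC(kernel="rbf").fit(feats, labels)
```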
ARTICLE | doi:10.20944/preprints201704.0088.v1
Subject: Engineering, Other Keywords: hierarchical video quality assessment; human visual systems; primate visual cortex; full reference
Online: 14 April 2017 (11:52:44 CEST)
Video quality assessment (VQA) plays an important role in video applications for quality evaluation and resource allocation. It aims to evaluate video quality consistently with human perception. In this letter, a hierarchical gradient similarity based VQA metric is proposed, inspired by the structure of the primate visual cortex, in which visual information is processed through sequential visual areas. These areas are modeled with corresponding measures to evaluate the overall perceptual quality. Experimental results on the LIVE database show that the proposed VQA metric significantly outperforms state-of-the-art VQA metrics.
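A common building block of such gradient-based metrics is a pointwise gradient-magnitude similarity map; the following sketch illustrates that building block only (the paper's hierarchical, cortex-inspired pooling is not reproduced, and c is a stabilizing constant).

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_similarity_map(ref, dist, c=0.0026):
    """Pointwise gradient-magnitude similarity between a reference and a
    distorted grayscale frame (float arrays); values near 1 mean the local
    structure is preserved."""
    g1 = np.hypot(sobel(ref, axis=0), sobel(ref, axis=1))
    g2 = np.hypot(sobel(dist, axis=0), sobel(dist, axis=1))
    return (2 * g1 * g2 + c) / (g1 ** 2 + g2 ** 2 + c)
```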
ARTICLE | doi:10.20944/preprints202211.0134.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning; Visual-Language Reasoning; Visual Commonsense Generation; Video-grounded Dialogue; VisualCOMET; AVSD
Online: 8 November 2022 (02:01:28 CET)
“A Picture is worth a thousand words”. Given an image, humans are able to deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation aims at generating three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what the current intent is, and (3) what will happen after. However, such a task is challenging for machines owing to two limitations: existing approaches (1) directly utilize conventional vision-language transformers to learn relationships between input modalities, and (2) ignore relations among the target cause-and-effect captions, considering each caption independently. We propose Cause-and-Effect BART (CE-BART), which is based on (1) a Structured Graph Reasoner that captures intra- and inter-modality relationships among visual and textual representations, and (2) a Cause-and-Effect Generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieves SOTA performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrate the performance gain and improved interpretability.
ARTICLE | doi:10.20944/preprints202004.0266.v1
Subject: Biology And Life Sciences, Virology Keywords: COVID-19; mild patients; quarantine facility; video-consultation; living and treatment support center
Online: 16 April 2020 (08:23:06 CEST)
With the outbreak of coronavirus disease 2019 (COVID-19), there is a need for efficient management of patients with mild or no symptoms, who account for the majority. The aim of this study is to introduce the structure and operation protocol of a living and treatment support centre (LTSC) operated by Seoul National University Hospital in South Korea. An existing accommodation facility was converted into a 'patient centre' where patients were isolated; a small medical staff there performed medical tests and responded to emergencies. The other part of the LTSC was a 'remote monitoring centre', where patients' self-measured vital signs and symptoms were monitored twice a day and the medical staff provided video consultations via smartphone. During the 3 weeks from March 5 to March 26, 2020, 113 patients were admitted and treated. The LTSC could be an efficient alternative to hospital admission in a pandemic situation like COVID-19.
ARTICLE | doi:10.20944/preprints202304.0162.v1
Subject: Medicine And Pharmacology, Pulmonary And Respiratory Medicine Keywords: Hybrid computer tomography; pulmonary ground glass nodule localization; video-assisted thoracic surgery; pulmonary recruitment
Online: 10 April 2023 (09:30:44 CEST)
The standard treatment for early-stage lung cancer is complete tumor excision by limited resection of the lung. Pre-operative localization is used before video-assisted thoracoscopic surgery (VATS) to improve the accuracy of pulmonary nodule excision. However, lung atelectasis and hypoxia resulting from controlled apnea during the localization procedure may affect localization accuracy. Pre-procedural pulmonary recruitment may improve respiratory mechanics and oxygenation during localization. In this study, we investigated the potential benefits of pulmonary recruitment prior to pulmonary ground-glass nodule localization in a hybrid operating room. We hypothesized that pre-localization pulmonary recruitment would increase localization accuracy, improve oxygenation, and prevent the need for re-inflation during the localization procedure. We retrospectively enrolled patients who underwent localization of multiple pulmonary nodules before surgical intervention in our hybrid operating room, and compared localization accuracy between patients who had undergone pre-procedure pulmonary recruitment and those who had not. Saturation, re-inflation rate, apnea time, procedure-related pneumothorax, and procedure time were recorded as secondary outcomes. Patients who had undergone pre-procedure recruitment had better saturation, shorter procedure times, and higher localization accuracy. The pre-procedure pulmonary recruitment maneuver was effective in increasing regional lung ventilation, leading to improved oxygenation and localization accuracy.
ARTICLE | doi:10.20944/preprints202212.0479.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: video consultations; digitalisation; stakeholders’ health and wellbeing; corporate social responsibility; hospital doctors; patient care
Online: 26 December 2022 (07:40:43 CET)
The past several decades have seen a shift in patient care towards digitalisation, which has ushered in a new era of health care delivery and improved sustainability and resilience of health systems, with positive impacts on both internal and external stakeholders. This study’s aim was to understand the role of digital virtual consultations in improving internal and external stakeholders’ health, as well as wellbeing among hospital doctors. A qualitative research approach was used with semi-structured online interviews administered to hospital doctors. The interviews showed that the doctors viewed digital virtual consultations as supplementary to in-person consultations, and as tools to reduce obstacles related to distance and time. If the necessary infrastructure and technology were in place, doctors would be willing to use these options. Implementing these technologies would improve the medical profession’s flexibility on the one hand; but it might affect doctors’ work–life balance if consultations extended beyond standard working hours.
ARTICLE | doi:10.20944/preprints201906.0251.v1
Subject: Physical Sciences, Applied Physics Keywords: video microscopy; imaging; automated data acquisition; nanoparticle tracking; measurement embedded applications; open-source software
Online: 25 June 2019 (12:53:50 CEST)
We introduce PyNTA, a modular instrumentation software package for live particle tracking. By using Python's multiprocessing library and the distributed messaging library pyZMQ, PyNTA allows users to acquire images from a camera at close to maximum readout bandwidth while simultaneously performing computations on each image on a separate processing unit. This publisher/subscriber pattern generates little overhead and leverages the multi-core capabilities of modern computers. We demonstrate the capabilities of the PyNTA package on its featured application, nanoparticle tracking analysis. Real-time particle tracking on megapixel images at a rate of 50 Hz is presented. Reliable live tracking reduces the required storage capacity for particle tracking measurements by a factor of approximately 10³ compared with raw data storage, allowing for a virtually unlimited duration of measurements.
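The publisher/subscriber split is the pattern pyZMQ provides; a minimal sketch of it follows (port and topic name are hypothetical, not PyNTA's own).

```python
import zmq

# Acquisition process publishes frames; analysis processes subscribe and
# consume them at their own pace.
context = zmq.Context()
pub = context.socket(zmq.PUB)
pub.bind("tcp://*:5555")

def publish_frame(socket, frame_bytes):
    socket.send_multipart([b"frame", frame_bytes])

# In the tracking process:
sub = context.socket(zmq.SUB)
sub.connect("tcp://localhost:5555")
sub.setsockopt(zmq.SUBSCRIBE, b"frame")
# topic, payload = sub.recv_multipart()
```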
REVIEW | doi:10.20944/preprints202305.0715.v1
Subject: Social Sciences, Education Keywords: Video Games; Gamification; Game Based Learning; Sustainable Development; Sustainability; Higher Education; Undergraduate Students; College Students
Online: 10 May 2023 (08:54:10 CEST)
Nowadays, the European Union and national governments have focused on the Sustainable Development Goals (SDGs) and the 2030 Agenda, and this focus has carried over into education. Video games, gamification, and game-based learning have become strategies and tools to enhance the learning process and are among the growing approaches used by teachers to develop sustainable education in the classroom. This research aims to analyze how games and technology can promote sustainability in education, focusing on the learning benefits for higher education. A systematic review of the literature was conducted following the PRISMA methodology. An initial search returned 2025 documents, which the filtering phases reduced to nine articles that were then analyzed in depth. The results indicate that the benefits of technology-mediated games include fostering education for sustainability, promoting educational inclusion, and developing social skills such as collaborative and cooperative work. The review also showed an increase in the number of publications between 2019 and 2023, reflecting growing interest in the topic. Nevertheless, research gaps remain in this field.
ARTICLE | doi:10.20944/preprints202302.0166.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: speed sport climbing; video analysis; KLT algorithm; Convolutional Neural Network (CNN); OpenPose; AI; artificial intelligence
Online: 9 February 2023 (11:23:49 CET)
This continuously developing project aims to build an informatics system enabling analysis of the spatial and temporal parameters of movement in the sport of speed climbing. The monitoring system (climbing information speed system, CISS) is intended for comprehensive scientific research in the field of speed climbing and enables evaluation of the training process of climbers at various levels of competition. The analysis was based on video recorded with a camera positioned a short distance (10 m) from the wall, with a marker placed close to the body mass centre (BMC). Results: a system was developed for data collection and analysis of the climbing run based on video recording, applying the Kanade-Lucas-Tomasi (KLT) algorithm. Our results showed that the devices used can measure a wide range of specific internal and external variables during speed climbing. Some of the analyzed parameters were significantly correlated with speed climbing time. These results could provide a theoretical basis for future research and for the preparation of training programs.
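As a point of reference, the following is a minimal KLT point-tracking sketch using OpenCV's pyramidal Lucas-Kanade implementation; the file name and parameters are illustrative assumptions, not taken from the paper, and a real system would lock onto the BMC marker rather than generic corners.

import cv2
import numpy as np

cap = cv2.VideoCapture("climbing_run.mp4")  # hypothetical recording of one run
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
# Seed points with strong corners; a marker detector would replace this step.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.3, minDistance=7)
trajectory = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # KLT step: track each point from the previous frame into the current one
    nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = nxt[status.flatten() == 1].reshape(-1, 2)
    trajectory.append(good.mean(axis=0))  # mean tracked position per frame
    prev_gray, pts = gray, good.reshape(-1, 1, 2)
# Frame-to-frame displacements of `trajectory` yield velocity along the route.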
ARTICLE | doi:10.20944/preprints202302.0050.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: video-based human action recognition; Action Recognition; Deep Learning Methods; handcrafted Methods; Human Action; Overview
Online: 3 February 2023 (01:17:56 CET)
Artificial intelligence’s rapid advancement has enabled various applications, including intelligent video surveillance systems, assisted living, and human-computer interaction. These applications often share one core task: video-based human action recognition. Research in video-based human action recognition is vast and ongoing, making it difficult to assess the full scope of available methods and current trends. This survey provides an in-depth exploration of the vision-based human action recognition field, comprehensively covering the available techniques and their evolution and highlighting the cutting-edge ideas driving its development. We also analyze the most used keywords in research papers over the past years to identify trends and predict possible future directions. Hence, this concise survey helps researchers understand the breadth of existing approaches, evaluate current research trends, and stay up to date on potential developments.
ARTICLE | doi:10.20944/preprints202210.0429.v1
Subject: Social Sciences, Behavior Sciences Keywords: Adolescents; passive drinking; forced drinking; alcohol misuse; interactive video-based education; pre-post intervention study
Online: 27 October 2022 (08:50:37 CEST)
Harm from passive and forced drinking is prevalent but under-recognized among Chinese adolescents. We educated adolescents on such harm to reduce their intention to drink. Students (n=1244) from 7 secondary schools in Hong Kong participated in a video-based health talk on the harm of passive and forced drinking. Paired t-tests were used to assess changes in their knowledge of passive and forced drinking and of the health and social harm of drinking after the health talk. McNemar's chi-squared test and adjusted multivariable logistic regression (AOR) were used to assess changes in their intention to drink and intention to quit. Students were less likely to intend to drink (OR 0.29, 95% CI 0.19-0.42) and more likely to intend to quit drinking (OR 3.50, 1.10-14.6) after the health talk. Increased knowledge of passive drinking was associated with less intention to drink (AOR 0.93, 0.90-0.97), increased knowledge of the health harm of drinking (adjusted b 0.06, 0.05-0.08), and increased knowledge of its social harm (adjusted b 0.12, 0.10-0.16). Similar associations were observed for forced drinking (intention to drink: AOR 0.87, 0.79-0.96; health harm: adjusted b 0.16, 0.12-0.19; social harm: adjusted b 0.36, 0.28-0.43). We provide preliminary evidence that the health talk on passive and forced drinking reduced adolescents' intention to drink.
REVIEW | doi:10.20944/preprints202109.0111.v1
Subject: Medicine And Pharmacology, Pathology And Pathobiology Keywords: telehealth; teleoncology; telerehabilitation; telemedicine; coronavirus disease; management; video conferencing; web-based platforms; breast cancer patients
Online: 6 September 2021 (17:34:49 CEST)
Telehealth is the delivery of health care services and technologies to individuals in different geographical areas and is categorized as asynchronous or synchronous. The coronavirus disease 2019 (COVID-19) pandemic has caused major disruptions in health care delivery to breast cancer (BCa) patients, and there is increasing demand for telehealth services. Globally, telehealth has become an essential means of communication between patients and health care providers. The application of telehealth to the treatment of BCa patients is evolving, and increasing research has demonstrated its feasibility and effectiveness in improving clinical, psychological, and social outcomes. Two areas of telehealth that have grown significantly in the past decade, and particularly since the beginning of the COVID-19 pandemic, are telerehabilitation and teleoncology. These two technological systems provide opportunities at every stage of the cancer care continuum for BCa patients. We conducted a systematic literature review examining the use of telehealth services, via their various modes of delivery, among BCa patients, particularly in screening, diagnosis, and treatment, as well as satisfaction among patients and health care professionals. The advantages of telehealth service models and the challenges of delivering them to patients in remote areas are discussed.
ARTICLE | doi:10.20944/preprints202105.0449.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Explainable Artificial Intelligence; Hopfield Neural Networks; Automatic Video Generation; Data-to-text systems; Software Visualization
Online: 19 May 2021 (14:07:48 CEST)
Hopfield Neural Networks (HNNs) are recurrent neural networks used to implement associative memory. Their main strengths lie in pattern recognition, optimization, and image segmentation. However, it is often difficult to give users good explanations of the results obtained with them, mainly because of the large number of changes in neuron states (and their weights) produced while solving a machine learning problem. There are currently few techniques to visualize, verbalize, or abstract HNNs. This paper outlines how automatic video generation systems can be constructed to explain their execution. This work constitutes a novel approach to obtaining explainable artificial intelligence systems in general, and for HNNs in particular, building on the theory of data-to-text systems and software visualization approaches. We present a complete methodology for building these kinds of systems. A software architecture is also designed, implemented, and tested, and technical details of the implementation are explained. Finally, we apply our approach to create a complete explainer video about the execution of an HNN on a small recognition problem.
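For orientation, the following is a minimal Hopfield network sketch, Hebbian storage plus asynchronous recall, that logs each state change, i.e. the kind of execution events such an explainer video would narrate; patterns and sizes are illustrative assumptions.

import numpy as np

def train(patterns):
    # Hebbian learning: sum of outer products, zero diagonal
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)
    return W / n

def recall(W, state, steps=100, rng=np.random.default_rng(0)):
    log = []  # record of neuron state changes, the raw material for an explanation
    state = state.copy()
    for _ in range(steps):
        i = rng.integers(len(state))
        new = 1 if W[i] @ state >= 0 else -1  # asynchronous threshold update
        if new != state[i]:
            log.append((i, state[i], new))
            state[i] = new
    return state, log

patterns = np.array([[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]])
W = train(patterns)
noisy = np.array([1, -1, 1, -1, -1, -1])  # corrupted copy of the first pattern
recovered, events = recall(W, noisy)      # `events` is what a video would walk through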
ARTICLE | doi:10.20944/preprints202105.0176.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Video Steganography, Least Significant Bit (LSB) Coding, Double key Encryption, Decryption, Password Verification, Signature Verification
Online: 10 May 2021 (11:21:29 CEST)
In today’s digital media, data communication over the internet is increasing day by day, so data security has become a critical issue. With the increase in data transmission, the number of intruders also increases, making it necessary to transmit data over the internet very securely. Steganography is a popular method in this field: it hides secret data within a cover medium so that intruders cannot detect the data's existence. Here, a steganography method is proposed that uses a video file as the cover medium. The method has five main steps. First, the video file is converted into frames and a particular frame is selected for embedding the secret data. Second, the Least Significant Bit (LSB) coding technique is applied together with a double-key security technique. Third, an 8-character password verification process is performed. Fourth, the encrypted video is reversed. Fifth, a signature verification process verifies the encryption and decryption. These five steps are followed by both the encryption and decryption processes.
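A minimal sketch of the LSB embedding step on a single frame is shown below; the double-key, password, and signature stages of the proposed method are omitted, and the frame and message are stand-ins.

import numpy as np

def embed_lsb(frame, message):
    bits = np.unpackbits(np.frombuffer(message.encode(), dtype=np.uint8))
    flat = frame.flatten()
    assert bits.size <= flat.size, "message too large for this frame"
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite least significant bits
    return flat.reshape(frame.shape)

def extract_lsb(frame, n_chars):
    bits = frame.flatten()[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode()

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in for the selected frame
stego = embed_lsb(frame, "secret")
assert extract_lsb(stego, 6) == "secret"  # round trip recovers the hidden text

Because only the lowest bit of each pixel changes, the stego frame is visually indistinguishable from the original, which is what makes the existence of the data hard to predict.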
Subject: Computer Science And Mathematics, Information Systems Keywords: video surveillance; visual layer attack; electrical network frequency (ENF) signal; false frame injection (FFI) attack
Online: 1 April 2019 (09:50:05 CEST)
Over the past few years, the importance of video surveillance in securing national critical infrastructure has significantly increased, with applications that include detecting failures and anomalies. This proliferation of video has been accompanied by a growing number of attacks against surveillance systems. Among these, false frame injection (FFI) attacks, which replay video frames from a previous recording to mask the live feed, have the highest impact. While many attempts have been made to detect FFI frames using features from the video feeds, video analysis is computationally too intensive to be deployed on-site for real-time false frame detection. In this paper, we investigate the feasibility of FFI attacks on compromised surveillance systems at the edge and propose an effective technique to detect injected false video and audio frames by monitoring the surveillance feed using the embedded Electrical Network Frequency (ENF) signals. The ENF operates at a nominal frequency of 60 Hz or 50 Hz depending on geographical location and maintains a stable value across the entire power grid interconnection, with minor fluctuations. ENF signals are embedded in the video/audio recordings of surveillance systems connected to the power grid, and the time-varying nature of the ENF component is used forensically to authenticate the surveillance feed. The paper covers ENF signal collection from a power grid to create a reference database, ENF extraction from the recordings using the conventional short-time Fourier transform, and spectrum detection for robust ENF analysis in the presence of noise and interference across different harmonics. The experimental results demonstrate the effectiveness of detecting ENF signals and their abnormalities under FFI attacks.
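The following is a minimal ENF-extraction sketch along the lines described above: track the dominant spectral peak near the nominal 50/60 Hz value per STFT window (using SciPy; the window length, band, and synthetic signal are illustrative assumptions).

import numpy as np
from scipy.signal import stft

def extract_enf(audio, fs, nominal=60.0, band=1.0):
    f, t, Z = stft(audio, fs=fs, nperseg=fs * 4)  # 4 s windows for fine frequency resolution
    mask = (f >= nominal - band) & (f <= nominal + band)
    # Per window, pick the spectral peak inside the narrow band around the nominal ENF
    return t, f[mask][np.abs(Z[mask]).argmax(axis=0)]

fs = 1000
t = np.arange(0, 60, 1 / fs)
wobble = 60 + 0.02 * np.sin(2 * np.pi * 0.1 * t)  # slowly fluctuating grid frequency
audio = 0.01 * np.sin(2 * np.pi * np.cumsum(wobble) / fs) + 0.001 * np.random.randn(t.size)
times, enf = extract_enf(audio, fs)  # compare against a grid reference database to authenticate

A replayed (injected) segment carries the ENF trace of its original recording time, so its extracted curve will not match the live grid reference, which is the detection principle.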
Subject: Computer Science And Mathematics, Information Systems Keywords: Electrical Network Frequency (ENF); Proof-of-ENF (PoENF); Consensus; Blockchain; Security; Internet of Video Things (IoVT)
Online: 8 September 2021 (20:42:34 CEST)
The rapid advancement of artificial intelligence (AI) and the wide deployment of the Internet of Video Things (IoVT) enable situation awareness (SAW). The robustness and security of IoVT systems are essential to a sustainable urban environment. While blockchain technology has shown great potential for enabling trust-free and decentralized security mechanisms, directly embedding cryptocurrency-oriented blockchain schemes into resource-constrained IoVT networks at the edge is not feasible. Leveraging Electrical Network Frequency (ENF) signals extracted from multimedia recordings as region-of-recording proofs, this paper proposes EconLedger, an ENF-based consensus mechanism that enables secure and lightweight distributed ledgers for small-scale IoVT edge networks. The proposed consensus mechanism relies on a novel Proof-of-ENF (PoENF) algorithm in which a validator is qualified to generate a new block if and only if a proper ENF-containing multimedia signal proof is produced within the current round. A decentralized database (DDB) is adopted to guarantee the efficiency and resilience of raw ENF proofs in off-chain storage. A proof-of-concept prototype is developed and tested in a physical IoVT network environment. The experimental results validate the feasibility of the proposed EconLedger as a trust-free and partially decentralized security infrastructure for IoVT edge networks.
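To make the idea concrete, here is a toy sketch of an ENF-based validator selection rule in the spirit of PoENF: each node submits the ENF segment it extracted this round, and the node whose proof deviates least from a round reference qualifies. This is an assumed illustration; the paper's actual PoENF qualification rule is not reproduced here.

import numpy as np

def select_validator(enf_proofs):
    """enf_proofs: dict mapping node id -> ENF segment (1-D array) for this round."""
    segments = np.array(list(enf_proofs.values()))
    reference = np.median(segments, axis=0)  # robust round reference across nodes
    # Qualify the node whose region-of-recording proof deviates least from the reference
    errors = {nid: np.linalg.norm(seg - reference) for nid, seg in enf_proofs.items()}
    return min(errors, key=errors.get)

proofs = {f"node{i}": 60 + 0.02 * np.random.randn(120) for i in range(5)}
winner = select_validator(proofs)  # this node proposes the next block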
ARTICLE | doi:10.20944/preprints202104.0318.v1
Subject: Engineering, Transportation Science And Technology Keywords: Kerr frequency comb; Hilbert transform; integrated optics; all-optical signal processing; image processing; video image processing
Online: 12 April 2021 (14:27:20 CEST)
Advanced image processing will be crucial for emerging technologies such as autonomous driving, where objects must be quickly recognized and classified in real time under rapidly changing, poor-visibility conditions. Photonic technologies will be key for next-generation signal and information processing, owing to their wide bandwidths of tens of terahertz and their versatility. Here, we demonstrate broadband real-time analog image and video processing with an ultrahigh-bandwidth photonic processor that is highly versatile and reconfigurable. It is capable of massively parallel processing of over 10,000 video signals simultaneously in real time, performing key functions needed for object recognition, such as edge enhancement and detection. Our system, based on a soliton crystal Kerr optical micro-comb with 49 GHz spacing and >90 wavelengths in the C-band, is highly versatile, performing different functions without changes to the physical hardware. These results highlight the potential of photonic processing based on Kerr microcombs for chip-scale, fully programmable, high-speed real-time video processing for next-generation technologies.
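For readers unfamiliar with the edge-detection function such a processor implements optically, here is a purely numerical sketch of the same operation, a 2-D convolution with a Laplacian-style kernel (the kernel and toy image are illustrative, not the paper's optical configuration).

import numpy as np
from scipy.signal import convolve2d

kernel = np.array([[0, -1, 0],
                   [-1, 4, -1],
                   [0, -1, 0]])                      # discrete Laplacian highlights intensity edges
frame = np.zeros((64, 64)); frame[16:48, 16:48] = 1.0  # toy image: one bright square
edges = convolve2d(frame, kernel, mode="same")       # nonzero only along the square's border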
ARTICLE | doi:10.20944/preprints202102.0256.v1
Subject: Engineering, Automotive Engineering Keywords: Solar photovoltaics in Poland; scattered generation; video-analytics; 4G migration; CCTV monitoring; Ka-band; lag time
Online: 10 February 2021 (12:42:30 CET)
This paper contains a concise overview of the deployment of scattered solar power plants in Poland, mainly from the perspective of their communication networks, and of how the recent development of Polish 4G networks has had a very positive impact on the performance of the whole monitoring system (production control and video surveillance), with special emphasis on video analytics due to its higher bandwidth demand. All the information is presented from the point of view of the solar photovoltaics developer I+D Energías, and therefore constitutes a real user's experience.
ARTICLE | doi:10.20944/preprints201805.0045.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: robust principal component analysis; video separation; compressive measurements; prior information; optical flow; motion estimation; motion compensation
Online: 2 May 2018 (13:19:49 CEST)
In the context of video background-foreground separation, we propose a compressive online Robust Principal Component Analysis (RPCA) method with optical flow that recursively separates a sequence of video frames into foreground (sparse) and background (low-rank) components. This separation method can process each video frame from a small set of measurements, in contrast to conventional batch-based RPCA, which processes the full data. The proposed method also leverages multiple sources of prior information by incorporating previously separated background and foreground frames in an n-l1 minimization problem. Moreover, optical flow is utilized to estimate motion between previous foreground frames and then compensate for it, yielding higher-quality prior foregrounds that improve the separation. Our method is tested on several video sequences in different scenarios for online background-foreground separation from compressive measurements. The visual and quantitative results show that the proposed method outperforms existing methods.
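As a baseline for the low-rank/sparse decomposition the abstract builds on, here is a minimal batch RPCA sketch (principal component pursuit via alternating singular-value and soft thresholding); it is the conventional full-data method the paper improves upon, with illustrative parameters, not the proposed online compressive algorithm.

import numpy as np

def rpca(M, lam=None, mu=None, iters=100):
    m, n = M.shape
    lam = lam or 1 / np.sqrt(max(m, n))
    mu = mu or m * n / (4 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        # Low-rank update: singular value thresholding
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(s - 1 / mu, 0)) @ Vt
        # Sparse update: elementwise soft thresholding
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (M - L - S)  # dual ascent on the constraint M = L + S
    return L, S  # background (low-rank) and foreground (sparse)

# Columns are vectorized frames: a static background plus a moving sparse object
frames = np.tile(np.random.rand(100, 1), (1, 20))
frames[np.arange(20) * 5, np.arange(20)] += 2.0
background, foreground = rpca(frames)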
ARTICLE | doi:10.20944/preprints202304.0891.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: mobile edge computing (MEC); non-orthogonal multiple access (NOMA); video offloading; resource allocation; deep reinforcement learning (DRL)
Online: 25 April 2023 (07:03:35 CEST)
With the proliferation of video surveillance deployments and related applications, real-time video analysis is critical for intelligent monitoring, autonomous driving, and similar applications. Achieving high-accuracy, low-latency video stream analysis through traditional cloud computing is non-trivial. In this paper, we propose a non-orthogonal multiple access (NOMA)-based edge real-time video analysis framework with one edge server (ES) and multiple user equipments (UEs). A cost minimization problem combining delay, energy, and accuracy is formulated to improve the QoE of the UEs. To solve this problem efficiently, we propose a joint video frame resolution scaling, task offloading, and resource allocation algorithm based on the Deep Q-Learning Network (JVFRS-TO-RA-DQN), which effectively overcomes the sparsity of the single-layer reward function and accelerates training convergence. JVFRS-TO-RA-DQN consists of two DQN networks that reduce the curse of dimensionality: one selects the offloading and resource-allocation action, the other the resolution-scaling action. Experimental results show that JVFRS-TO-RA-DQN effectively reduces the cost of transmission and computation and converges better than other baselines.
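A minimal sketch of the two-network action selection is given below (PyTorch); the state dimension, action counts, and network shapes are assumptions for illustration, since the paper's exact architectures are not reproduced here.

import torch
import torch.nn as nn

def make_qnet(state_dim, n_actions):
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

state_dim = 8
q_offload = make_qnet(state_dim, n_actions=10)    # joint offloading + resource-allocation choice
q_resolution = make_qnet(state_dim, n_actions=4)  # frame-resolution scaling choice

def act(state, eps=0.1):
    if torch.rand(1).item() < eps:  # epsilon-greedy exploration
        return torch.randint(10, (1,)).item(), torch.randint(4, (1,)).item()
    with torch.no_grad():
        s = torch.as_tensor(state, dtype=torch.float32)
        return q_offload(s).argmax().item(), q_resolution(s).argmax().item()

a_offload, a_resolution = act(torch.randn(state_dim))

Splitting the decision across two smaller Q-networks keeps each action space small, which is the dimensionality-reduction idea the abstract describes.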
ARTICLE | doi:10.20944/preprints201810.0152.v1
Subject: Medicine And Pharmacology, Surgery Keywords: video assisted thoracic surgery, open thoracotomy, recurrence-free survival, overall survival, positive margins, postoperative length of stay.
Online: 8 October 2018 (15:23:21 CEST)
Background: Video-assisted thoracoscopic surgery (VATS) has become the recommended approach for treatment of resectable lung cancer. However, no large randomized clinical trial has formally compared surgical resections completed by VATS to those done by open thoracotomy (OT) in low-volume centers. The current study sought to assess differences in recurrence-free survival (RFS), overall survival (OS), positive margins, and postoperative length of stay (LOS) between VATS and OT lobectomies in our center. Method: A single-institution retrospective chart review from May 2005 through May 2015 was conducted. All patients diagnosed with stage I through III lung cancer who underwent surgical resection were selected. Patient and tumor characteristics recorded included age at diagnosis, sex, tobacco use, tumor location (side and lobe), stage, size, and receipt of chemotherapy or radiotherapy. Chi-square and Wilcoxon-Mann-Whitney tests were used to compare demographics, tumor characteristics, and LOS. Multiple logistic and Cox regression analyses were used to compute the relative risk (RR) for positive margins and mortality hazard ratios, along with 95 percent confidence intervals (95%CI), respectively. Results: Of the 235 patients, 101 had VATS while OT was performed in 134. Age at diagnosis, sex, tobacco use, tumor location, and size were comparable for VATS and OT. No significant difference was observed in the relative risk of positive margins for VATS versus OT, RR = 0.56 (95%CI = 0.26, 1.05). However, VATS had a shorter median LOS than OT (4 vs. 6 days, respectively), p = 0.002. A comparison of VATS versus OT showed no significant difference in the risk of recurrence, HR = 1.21 (95%CI = 0.74, 2.00), or death, HR = 1.34 (95%CI = 0.88, 2.06), in the intent-to-treat population. Similarly, no significant differences in recurrence or mortality risk were observed between VATS and OT in analyses conducted separately for each cancer stage group or limited to patients with negative margins. Conclusion: Our study indicates that, compared to OT, VATS leads to shorter LOS while achieving comparable margin status, recurrence-free survival, and overall survival regardless of tumor stage at diagnosis.
ARTICLE | doi:10.20944/preprints201912.0148.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Quality of Experience; Quality of Service; QoE evaluation; video on demand; QoS correlation; subjective testing
Online: 11 December 2019 (04:46:57 CET)
In addition to the traditional QoS metrics of delay, delay jitter, and packet loss probability (PLP), Quality of Experience (QoE) is now widely accepted as a numerical proxy for actual user experience. The literature has reported many mathematical mappings between QoE and QoS. These QoS parameters are measured by network providers using sampling. Some papers have studied sampling errors in QoS measurements; however, there is no account of how these sampling errors propagate to QoE evaluation. In this paper, we used industrially acquired measurements of PLP and jitter to evaluate the sampling errors and the correlation between measurements. Focusing on video-on-demand (VoD) applications, we used subjective testing and regression to map QoE metrics onto PLP and jitter. The resulting mathematical functions for QoE and the theory of error propagation were used to evaluate the propagated error in QoE, represented as a confidence interval. Using the UK government's guidelines for sampling, our results indicate that confidence intervals around estimated QoE in a busy hour can span MOS = 1 to MOS = 5 at the targeted operating point of the QoS parameters. These results offer a new perspective on QoE evaluation and are of great significance to all organisations that need to estimate the QoE of VoD applications precisely.
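The following is a minimal sketch of first-order (delta-method) error propagation from QoS sampling errors to QoE; the linear fit, its coefficients, and the variances are hypothetical stand-ins for the regression functions and measurements used in the paper.

a, b, c = 4.5, -30.0, -0.05            # hypothetical fitted coefficients
def f(plp, jitter):
    return a + b * plp + c * jitter    # illustrative QoE = f(PLP, jitter) mapping

plp, jitter = 0.02, 10.0               # measured operating point
var_plp, var_jitter = 1e-5, 4.0        # sampling variances of the QoS estimates
# Var(QoE) ~ (df/dPLP)^2 Var(PLP) + (df/djitter)^2 Var(jitter); for this linear fit
# the partial derivatives are simply b and c. A cross term 2*b*c*Cov(PLP, jitter)
# would be added when the two measurements are correlated.
var_qoe = b**2 * var_plp + c**2 * var_jitter
ci_half_width = 1.96 * (var_qoe ** 0.5)  # 95% confidence half-width on the MOS scale
mos = f(plp, jitter)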
ARTICLE | doi:10.20944/preprints202110.0108.v2
Subject: Social Sciences, Education Keywords: academic meetings; video conferencing; Zoom; private Facebook group; narrative research; COVID-19; self-directed learning; team mindfulness; democratic meetings
Online: 21 October 2021 (12:10:57 CEST)
The online learning necessitated by COVID-19 social distancing limitations has resulted in hybrid online formats focused on maintaining visual contact among learners and teachers, with Zoom becoming the preferred video conferencing option for academic meetings. The needs of one voluntary, democratic, self-reflective university research group, grounded in responses to writing prompts, differed in learning focus. Because the group required a safe space to encourage and record both self-reflection and creative questioning of other participants, a private Facebook group was chosen over video conferencing to keep the concentration on group members' written responses rather than on how they saw themselves (and thought others saw them) on screen. Using a narrative research model initiated in 2015, the group's 2020/21 interaction across a year's worth of Facebook entries, together with the year-end feedback received from participants, is compared with previous years when the weekly group met in person. The results, viewed in relation to COVID-19 limitations, indicate that an important aspect of self-directed learning, the trust that comes from team mindfulness, is lost when face-to-face interaction is eliminated from the democratic nature of these meetings. With online meetings the new standard, maintaining trust requires improvements to online virtual meeting spaces.
ARTICLE | doi:10.20944/preprints201802.0099.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Random Linear Network Coding; Mobile Cellular Networks; 4G Long-Term Evolution (LTE); 5G New Radio (NR); Mobile video delivery
Online: 14 February 2018 (13:38:12 CET)
The exponential increase in mobile video delivery will continue with the demand for higher-resolution, multi-view, and large-scale multicast video services. The novel fifth-generation (5G) 3GPP New Radio (NR) standard will bring a number of new opportunities for optimizing video delivery across both the 5G core and the radio access network. One promising approach for video quality adaptation, throughput enhancement, and erasure protection is packet-level random linear network coding (RLNC). In this work, we discuss the integration of RLNC into the 5G NR standard, building upon the ideas and opportunities identified in 4G LTE. We explicitly identify and discuss in detail the novel 5G NR features that support RLNC-based video delivery in 5G, thus pointing to promising avenues for future research.
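To illustrate the packet-level coding idea, here is a minimal RLNC sketch over GF(2): each coded packet is a random XOR combination of the source packets in a generation, and a receiver can decode once the collected coefficient matrix reaches full rank. Generation size and packet length are illustrative; practical systems typically use larger fields such as GF(2^8).

import numpy as np

rng = np.random.default_rng(1)
K = 4                                                # generation size
src = rng.integers(0, 256, (K, 32), dtype=np.uint8)  # K source packets of 32 bytes

def encode():
    coeff = rng.integers(0, 2, K, dtype=np.uint8)    # random GF(2) coefficients
    payload = np.zeros(32, dtype=np.uint8)
    for c, p in zip(coeff, src):
        if c:
            payload ^= p                             # GF(2) addition is XOR
    return coeff, payload

def gf2_rank(M):
    M = M.copy(); rank = 0
    for col in range(M.shape[1]):
        rows = np.nonzero(M[rank:, col])[0]
        if rows.size == 0:
            continue
        pivot = rank + rows[0]
        M[[rank, pivot]] = M[[pivot, rank]]          # move pivot row up
        for r in np.nonzero(M[:, col])[0]:
            if r != rank:
                M[r] ^= M[rank]                      # eliminate over GF(2)
        rank += 1
    return rank

coded = [encode() for _ in range(K + 2)]             # redundancy protects against erasures
decodable = gf2_rank(np.array([c for c, _ in coded])) == K

Because any K linearly independent coded packets suffice, lost packets need not be retransmitted individually, which is the erasure-protection benefit mentioned above.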
ARTICLE | doi:10.20944/preprints202303.0023.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Game Design; Variational AutoEncoder (VAE); Image and Video Generation; Bayesian Algorithm; Loss Function; Data Clustering; Data and Image Analytics; MNIST database; Generator and Discriminator
Online: 1 March 2023 (11:17:12 CET)
In recent decades, the Variational AutoEncoder (VAE) model has shown good potential and capability in image generation and dimensionality reduction. The combination of VAE with various machine learning frameworks has also worked effectively in different everyday applications; however, its potential and effectiveness in modern game design have seldom been explored or assessed. The use of its feature extractor for data clustering has likewise received little discussion in the literature. This paper first explores the mathematical properties of the VAE model, in particular the theoretical framework of the encoding and decoding processes and the achievable lower bounds and loss functions of different applications; it then applies the established VAE model to generating new game levels within two well-known game settings, and validates the effectiveness of its data clustering mechanism with the aid of the Modified National Institute of Standards and Technology (MNIST) database. Statistical metrics and assessments were used to evaluate the performance of the proposed VAE model in these case studies. Based on the statistical and spatial results, several potential drawbacks and future enhancements of the established model are outlined, with the aim of maximizing the strengths and advantages of VAE for future game design tasks and related industrial applications.
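For reference, here is a minimal sketch of the VAE loss discussed above: the negative evidence lower bound (ELBO) combines a reconstruction term with a KL divergence to the prior, made trainable by the reparameterization trick (PyTorch; the Bernoulli likelihood and latent size are assumptions suited to MNIST-style data, not choices taken from the paper).

import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term: Bernoulli likelihood for pixels in [0, 1]
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian encoder
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # negative ELBO to minimize

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. the encoder
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

mu, logvar = torch.zeros(1, 20), torch.zeros(1, 20)  # encoder outputs for one sample
z = reparameterize(mu, logvar)                       # latent code fed to the decoder

The encoder's mean outputs (mu) are the features typically reused for data clustering, which is the mechanism the paper validates on MNIST.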