ARTICLE | doi:10.20944/preprints202305.1911.v1
Subject: Social Sciences, Psychology Keywords: competitive video game; aggression; event-related potential; P300
Online: 26 May 2023 (10:04:06 CEST)
Previous research on aggression in video game players has focused on violent content, while recent studies have examined competitive factors. Little research has examined the effects of competitive factors alone, in video games without violent content, on aggression, and the neural processes underlying these effects remain unknown. The present study is the first to examine the electrophysiological correlates of short-term competitive video game exposure and aggression. Thirty-five participants played a video game in either competitive or solo mode for 15 minutes, followed by an ERP experiment based on the Oddball paradigm and the hot sauce paradigm to measure aggressive behavior. Results showed that playing the competitive game mode was associated with faster judgments of aggressive words, larger P300 amplitudes, and selection of more chili powder. P300 amplitude partially mediated the relationship between competitive game exposure and aggressive behavior. These findings support the general aggression model.
ARTICLE | doi:10.20944/preprints202309.1393.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: virtual worlds; video games; players statistics; COVID-19 lockdown; digital-real crossovers; digital society
Online: 20 September 2023 (11:08:42 CEST)
The expansion of Open World Video Games (OWVGs) has seen the emergence of multiple new fictional worlds. This type of video game is characterized by total immersion in a world built for the players' entertainment. What video game developers want when players enter an OWVG is for them to feel as if they were opening a door to an entirely new world and leaving the real world behind. These virtual worlds manifest as a complement, an alternative, or a successor to the physical universe, and in some way continue the philosophical tradition of possible worlds begun by Leibniz. Accustomed to ‘surviving’ in hostile environments, millions of people continued to improve their skills in the midst of the COVID-19 pandemic while locked in their homes. Now that the pandemic appears to be receding, this paper analyzes the virtual worlds created by video games and the relations between the pandemic and these fictional worlds.
REVIEW | doi:10.20944/preprints202305.0715.v1
Subject: Social Sciences, Education Keywords: Video Games; Gamification; Game Based Learning; Sustainable Development; Sustainability; Higher Education; Undergraduate Students; College Students
Online: 10 May 2023 (08:54:10 CEST)
Nowadays, the European Union and the governments of various countries have focused on the Sustainable Development Goals (SDGs) and the 2030 Agenda, a focus that has carried over into education itself. Video games, gamification, and game-based learning have become strategies and tools to enhance the learning process, and are among the growing approaches used by teachers to develop sustainable education in the classroom. This research aims to analyze the characteristics of using games and technology to promote sustainability in education, specifically their learning benefits for higher education. A systematic review of the literature was conducted following the PRISMA methodology. Initially, 2,025 documents were found; after the filtering phases, nine articles remained, which were then analyzed in depth. The results indicated that the benefits of technology-mediated games include the following: they favor education for sustainability and promote educational inclusion and the development of social skills such as collaborative and cooperative work. The results also showed an increase in the number of publications between 2019 and 2023, reflecting growing interest in the topic. However, some research gaps remain in this field.
REVIEW | doi:10.20944/preprints202307.1763.v1
Subject: Public Health And Healthcare, Public, Environmental And Occupational Health Keywords: Obesity; Adolescents; media; video
Online: 26 July 2023 (07:02:56 CEST)
Obesity is one of the most important problems that public health is called upon to face. The World Health Organization (WHO) underlines that worldwide there are more than 380 million overweight or obese children and adolescents. Although obesity is a multifactorial disease, excessive screen use appears to contribute to it. The purpose of this systematic literature review is to study obesity in teenagers and its association with hours of entertainment (referring to excessive television viewing, video games, and social media). For the present systematic literature review, a search for sources was carried out through the scientific databases PubMed, ScienceDirect, and Scopus for the period from 2010 to 2022. This systematic review provides public health evidence of a positive association between excessive screen time and obesity and overweight in adolescents.
HYPOTHESIS | doi:10.20944/preprints202106.0313.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: group activity recognition; graph convolution network; video understanding; video analytics; activity recognition
Online: 11 June 2021 (10:37:38 CEST)
In this paper, we propose a robust video understanding model for activity recognition that learns actors' pair-wise correlations and relational reasoning, exploiting spatial and temporal information. To measure the similarity between pairs of appearances and construct an actor relations map, the Zero-Mean Normalized Cross-Correlation (ZNCC) and the Zero-Mean Sum of Absolute Differences (ZSAD) are proposed to allow the Graph Convolution Network (GCN) to learn how to distinguish group actions. We recommend that MNASNet be used as the backbone to retrieve features. Experiments show 38.50% and 23.7% reductions in training time in the two-stage training process, along with a 1.52% improvement in accuracy over traditional methods.
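The two similarity measures named above can be sketched in a few lines of NumPy (a minimal illustration of the standard ZNCC/ZSAD definitions, not the authors' implementation; patch shapes are assumed):

```python
import numpy as np

def zncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-Mean Normalized Cross-Correlation between two appearance patches.
    Returns a value in [-1, 1]; 1 means identical up to mean and scale."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def zsad(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-Mean Sum of Absolute Differences (lower means more similar)."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return float(np.abs(a - b).sum())
```

Pairwise ZNCC scores over all actor patches would then populate the actor relations map consumed by the GCN.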
ARTICLE | doi:10.20944/preprints201810.0118.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: bandelet; medical imaging; quadtree decomposition; SPIHT coder; video coding; video quality measure
Online: 7 October 2018 (10:26:18 CEST)
The digitization, transmission, and storage of medical data, particularly images, require increasingly effective encoding methods, not only in terms of compression ratio and information flow but also in terms of visual quality. First there was the DCT (discrete cosine transform), then the DWT (discrete wavelet transform), and their associated standards for coding and image compression. Since then, second-generation wavelets have sought to position themselves against the image and video coding methods currently in use. It is in this context that we suggest a method combining bandelets and the SPIHT (set partitioning in hierarchical trees) algorithm. There are two main reasons for our approach: the first lies in the nature of the bandelet transform, which captures the geometrical complexity of the image structure; the second stems from the suitability of the SPIHT encoder for encoding the bandelet coefficients. Quality measurements show that in some cases (at low bit rates) the performance of the proposed coding competes with well-established methods and opens up new application prospects in the field of medical imaging.
REVIEW | doi:10.20944/preprints202308.0383.v1
Subject: Computer Science And Mathematics, Software Keywords: video analytics; edge computing; streaming video; systems; deep learning; AI; latency; bandwidth; privacy
Online: 4 August 2023 (07:25:52 CEST)
The falling cost of cameras, the advancement of AI-based computer vision algorithms, and powerful hardware accelerators for deep learning have enabled widespread deployment of surveillance cameras with the ability to automatically analyze streaming video feeds to detect events of interest. While streaming video analytics is currently done largely in the cloud, edge computing has emerged as a pivotal component due to its advantages of low latency, reduced bandwidth, and enhanced privacy. However, a distinct gap persists between state-of-the-art computer vision algorithms and successful practical implementation of edge-based streaming video analytics systems. This paper presents a comprehensive review of more than 30 research papers published over the last 6 years on edge video analytics systems. The papers are analyzed across 17 distinct dimensions. Unlike prior reviews, we examine each system holistically, identifying their strengths and weaknesses in diverse implementations. Our findings suggest that certain critical topics necessary for the practical realization of edge video analytics systems are not sufficiently addressed in current research. Based on these observations, we propose research trajectories across short-, medium-, and long-term horizons. Additionally, we explore trending topics in other computing areas that can significantly impact the field of edge video analytics.
ARTICLE | doi:10.20944/preprints202308.0802.v1
Subject: Public Health And Healthcare, Physical Therapy, Sports Therapy And Rehabilitation Keywords: freestyle; swimming; cervical; video; older
Online: 10 August 2023 (09:18:26 CEST)
Background: Swimming, and specifically the front crawl, can be included among the "overhead" sports. Overhead sports are a risk factor for some musculoskeletal problems, especially of the shoulder. The aim of the study was to assess the incidence of shoulder and neck pain in a Masters swimming team and its correlation with the crawl stroke. Methods: This is an observational study based on video analysis of the stroke and a questionnaire. Sixty-one athletes of a Masters team whose prevailing training stroke was the front crawl were selected for the present study. Their stroke was analyzed during training with a GoPro camera mounted on a sliding trolley on a track, and their technical defects were evaluated with their trainer. A questionnaire about the frequency of shoulder and neck pain during the last five years was administered to all study participants. Results: From the questionnaire, 45 and 55 of the 61 athletes had suffered from shoulder pain and cervical pain, respectively. Both types of pain were correlated with the weekly swimming volume. Swimmers with hyperflexion of the wrist and prolonged internal rotation in the pulling phase had shoulder problems. Those who suffered from current shoulder pain reduced their underwater time. The four swimmers with excessive body roll during breathing, and those who kept their heads extended, reported cervical pain. Conclusions: Shoulder and neck pain could be prevented by correcting specific technical errors in the crawl stroke.
ARTICLE | doi:10.20944/preprints201809.0449.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: motion; superpixel; temporal features; video classification
Online: 24 September 2018 (09:54:01 CEST)
Superpixels represent still images as grids of perceptually coherent pixel groups, carrying more meaningful information than atomic pixels. However, their usefulness for video classification has received little attention. In this paper, rather than using spatial RGB values as low-level features, we use optical flows mapped into hue-saturation-value (HSV) space to capture rich motion features over time. We introduce motion superpixels, which are superpixels generated from flow fields. After mapping flow fields into HSV space, independent superpixels are formed by iteration of seeded regions. Every grid of a motion superpixel is tracked over time using nearest neighbors in the histogram of flow (HOF) for consecutive flow fields. To define the temporal representation, the evolution of three features within the superpixel region, namely the HOF, the HOG, and the superpixel's center of mass, is used as a descriptor. The bag-of-features algorithm is used to quantize the final features, and generalized histogram-kernel support vector machines are used as the learning algorithm. We evaluate the proposed superpixel tracking on first-person videos and action sports videos.
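The flow-to-HSV mapping underlying motion superpixels can be sketched as follows (a minimal NumPy illustration assuming hue encodes flow direction and value encodes normalized magnitude; the paper's exact mapping may differ):

```python
import numpy as np

def flow_to_hsv(flow: np.ndarray) -> np.ndarray:
    """Map a dense optical-flow field of shape (H, W, 2) into HSV space:
    hue = flow direction, saturation = 1, value = normalized magnitude."""
    fx, fy = flow[..., 0], flow[..., 1]
    mag = np.sqrt(fx ** 2 + fy ** 2)
    ang = np.arctan2(fy, fx)                  # direction in [-pi, pi]
    hue = (ang + np.pi) / (2 * np.pi)         # normalized to [0, 1]
    val = mag / mag.max() if mag.max() > 0 else mag
    sat = np.ones_like(hue)
    return np.stack([hue, sat, val], axis=-1)
```

Superpixel seeding would then run on this HSV image instead of the RGB frame, so regions with coherent motion, rather than coherent color, are grouped together.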
BRIEF REPORT | doi:10.20944/preprints202311.0559.v1
Subject: Medicine And Pharmacology, Surgery Keywords: Chylothorax; Thoracic Duct; Video-Assisted Thoracoscopic Surgery
Online: 8 November 2023 (14:41:32 CET)
Objectives: Chylothorax is a relatively rare accumulation of lymphatic fluid in the thoracic cavity due to leakage from the thoracic duct or its tributaries. Patients present with dyspnea, malnutrition, and immunosuppression. Treatment can be conservative or surgical, depending on etiology and clinical course. The optimal management algorithm for chylothorax is still controversial. Methods: This is a retrospective study of all patients with chylothorax treated at our Department of Thoracic Surgery over a 10-year period. Results: A total of 14 patients were identified for the study. Nine patients had chylothorax after lung or esophageal cancer surgery, four had chylothorax in advanced lymphoma, and one had chylothorax after blunt chest trauma. A conservative approach was initiated in most patients (92%), including pleural drainage, nil per mouth, total parenteral nutrition, and somatostatin 0.1 mg bid subcutaneously. Surgical treatment was indicated in patients with thoracic drain output >800 mL per day beyond the fifth day of treatment and in those with blunt thoracic trauma. Two patients had thoracic duct ligation via right-sided thoracotomy, and five patients had video-assisted thoracoscopic thoracic duct ligation with immediate arrest of the chylous leakage. Conclusion: Chylothorax should initially be treated conservatively, but surgical treatment should not be delayed in case of failure beyond the fifth day. In our series, the video-assisted thoracoscopic approach for thoracic duct ligation proved minimally invasive, highly efficient, and well tolerated by patients, and should therefore be the preferred route of surgical treatment.
ARTICLE | doi:10.20944/preprints202308.0447.v1
Subject: Medicine And Pharmacology, Gastroenterology And Hepatology Keywords: artificial intelligence; Transformer; capsule endoscopy; video-analysis
Online: 4 August 2023 (13:32:21 CEST)
Although wireless capsule endoscopy (WCE) detects small bowel diseases effectively, it has some limitations. For example, the reading process can be time-consuming due to the numerous images generated per case, and lesion detection accuracy may depend on the operator's skill and experience. Hence, many researchers have recently developed deep learning-based methods to address these limitations. However, these methods tend to select only a portion of the images from a given WCE video and analyze each image individually. In this study, we note that more information can be extracted from the unused frames and from the temporal relations of sequential frames. Specifically, to increase the accuracy of lesion detection without depending on experts' frame-selection skills, we suggest using all video frames as input to the deep learning system. Thus, we propose a new Transformer-based neural encoder that takes the entire video as input, exploiting the power of the Transformer to extract long-term global correlations within and between the input frames. We can thereby capture the temporal context of the input frames and the attentional features within each frame. Tests on benchmark datasets of four WCE videos showed sensitivities of 95.1% and 83.4%. These results may significantly advance automated lesion detection techniques for WCE images. Our code is available at https://github.com/syupoh/VWCE-Net.git.
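The core mechanism that lets a Transformer relate frames across an entire video is scaled dot-product self-attention over per-frame embeddings. A toy NumPy sketch (not the VWCE-Net code; embedding dimensions and the single-head form are assumptions for illustration):

```python
import numpy as np

def self_attention(frames: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over frame embeddings.
    frames: array of shape (T, d), one d-dimensional embedding per frame.
    Returns context-mixed embeddings of the same shape."""
    t, d = frames.shape
    scores = frames @ frames.T / np.sqrt(d)        # (T, T) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over frames
    return weights @ frames                        # each output mixes all frames
```

Because every output embedding is a weighted mixture of all T inputs, a lesion visible in one frame can influence the representation of its temporal neighbors, which is the long-range correlation the abstract refers to.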
ARTICLE | doi:10.20944/preprints202305.0630.v1
Online: 9 May 2023 (09:52:46 CEST)
This paper explores the potential of a low-cost, advanced video-based technique for assessing structural damage induced in buildings by seismic loading. A low-cost high-speed video camera was used for motion magnification (MM) processing of footage of a two-story reinforced concrete frame building subjected to shaking table tests. Damage after seismic loading was estimated by analyzing the dynamic behavior (i.e., the modal parameters) and the structural deformations of the building in the MM videos. For method validation, the MM results were compared to damage assessments obtained by analyzing conventional accelerometers and high-precision optical markers tracked by a passive 3D motion capture system. In addition, 3D laser scanning was carried out to obtain an accurate survey of the building geometry before and after the seismic tests. The accelerometer data were also processed and analyzed using several stationary and non-stationary techniques, with the aim of analyzing the linear behavior of the undamaged structure and the nonlinear structural behavior during damaging shaking table tests. The proposed MM-based procedure provided accurate estimates of the main modal frequency and of the damage location through the analysis of modal shapes, which were confirmed by advanced analyses of the accelerometric data.
ARTICLE | doi:10.20944/preprints202207.0308.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: micro-video classification; 3D CNN; multi-modal
Online: 21 July 2022 (03:09:34 CEST)
With the popularity of the Internet, people are exposed to micro-videos in more and more ways, and a huge amount of micro-video data has emerged. Micro-videos have gradually become the Internet content preferred by the public, and a large number of micro-video apps have emerged, such as TikTok and Kwai. Intelligent classification and mining of micro-videos can greatly enhance the user experience and improve business operation efficiency. Through deep intelligent analysis and mining, important information in micro-videos can be extracted to provide a basis for video beautification, content appreciation, video recommendation, content search, and more. In the past, content understanding for short videos often relied on human annotation, but in recent years, with the great success of deep convolutional neural networks in image recognition, short-video content understanding based on such networks has gradually developed. Nowadays, most recognition algorithms extract the feature representation of each frame independently and then fuse them. However, some low-level semantic features are lost during feature extraction, which prevents such algorithms from accurately distinguishing the category of a video. At present, deep learning-based micro-video recognition algorithms have surpassed the iDT algorithm, making these traditional methods fade from view. In this paper, for the micro-video classification task, a new network model is proposed that concatenates the features of each modality into overall per-modality features and then fuses the modality features with an attention mechanism to obtain the whole micro-video feature, which is used for classification. To verify the effectiveness of the proposed algorithm, experiments were conducted on a public dataset, demonstrating the effectiveness of our model.
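The attention-based fusion step described above can be sketched as a softmax-weighted sum of per-modality feature vectors (illustrative only; in the paper the attention weights would be learned end-to-end rather than supplied as fixed logits):

```python
import numpy as np

def attention_fusion(modal_feats: np.ndarray, attn_logits: np.ndarray) -> np.ndarray:
    """Fuse per-modality features of shape (M, d) into one video-level
    feature of shape (d,) using softmax attention weights over M modalities."""
    w = np.exp(attn_logits - attn_logits.max())  # stable softmax numerator
    w /= w.sum()                                 # weights sum to 1
    return w @ modal_feats                       # weighted sum across modalities
```

With equal logits this degenerates to a plain average; learned logits let the model emphasize, say, the visual stream over the audio stream for a given clip.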
REVIEW | doi:10.20944/preprints202201.0016.v2
Subject: Engineering, Control And Systems Engineering Keywords: AI; deep learning; video editing; image editing
Online: 4 February 2022 (13:40:05 CET)
Video editing is a highly demanding job: it requires skilled artists or workers equipped with plentiful physical strength and multidisciplinary knowledge, such as cinematography and aesthetics. Gradually, therefore, more and more research has focused on proposing semi-automatic and even fully automatic solutions to reduce workloads. Since conventional methods are usually designed to follow simple guidelines, they lack the flexibility and capability to learn complex ones. Fortunately, advances in computer vision and machine learning make up for the shortcomings of traditional approaches and make AI editing feasible. No survey has yet summarized this emerging research. This paper summarizes the development history of automatic video editing, and especially the applications of AI in partial and full workflows. We emphasize video editing and discuss related works from multiple aspects: modality, type of input videos, methodology, optimization, dataset, and evaluation metric. We also summarize progress in the image editing domain, i.e., style transfer, retargeting, and colorization, and explore the possibility of transferring those techniques to the video domain. Finally, we give a brief conclusion and discuss some open problems.
ARTICLE | doi:10.20944/preprints202310.1985.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: video quality consistency; adaptive QP; perceptual-based RDO
Online: 31 October 2023 (06:49:33 CET)
In the Industry 4.0 era, video applications such as visual surveillance systems, video conferencing, and video broadcasting play a vital role. In these applications, for manipulating and tracking objects in decoded video, the quality of the decoded video should be consistent, because it largely affects the performance of machine analysis. To cope with this problem, we propose a novel perceptual video coding (PVC) solution in which a full-reference quality metric, Video Multimethod Assessment Fusion (VMAF), is employed together with a deep convolutional neural network (CNN) to obtain consistent quality while still achieving high compression performance. First, to meet the consistent-quality requirement, we propose a CNN model that takes an expected VMAF as input to adaptively adjust the quantization parameter (QP) for each coding block. Then, to increase compression performance, the Lagrange coefficient of the rate-distortion optimization (RDO) mechanism is adaptively computed under Rate-QP and Quality-QP models. Experimental results show that the proposed PVC achieves two targets simultaneously: the quality of the video sequence is kept consistent with an expected quality level, and the bitrate saving of the proposed method is higher than that of traditional video coding standards and a relevant benchmark, notably around 10% bitrate saving on average.
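For context, the conventional coupling between QP and the RDO Lagrange multiplier in HEVC-style encoders follows an exponential rule of roughly the form lambda = alpha * 2^((QP - 12) / 3); the paper's contribution is to compute this coefficient adaptively instead. A sketch of the baseline rule (the value of alpha here is purely illustrative, not taken from the paper):

```python
def qp_to_lambda(qp: float, alpha: float = 0.57) -> float:
    """Baseline HEVC-style mapping from quantization parameter to the RDO
    Lagrange multiplier: lambda = alpha * 2^((QP - 12) / 3).
    alpha is an illustrative constant; real encoders tune it per slice type."""
    return alpha * 2 ** ((qp - 12) / 3)
```

Each +3 step in QP doubles lambda, i.e., the encoder tolerates roughly twice the distortion per bit saved, which is why per-block QP adaptation directly reshapes the rate-distortion trade-off.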
ARTICLE | doi:10.20944/preprints202306.2227.v1
Subject: Medicine And Pharmacology, Dentistry And Oral Surgery Keywords: Dental education; Dental curriculum; E-learning; Video learning
Online: 30 June 2023 (12:31:48 CEST)
Introduction: Dental students' use of online material to supplement their learning has been studied, but it is unclear whether educators are aware of the findings of this research. This study aimed to investigate dental students' use of online content as a learning tool from an educator's perspective. Methods: Educators in the Dublin Dental University Hospital were invited to complete an online survey on dental students' use of online learning. Quantitative descriptive statistical analyses were carried out on the data collected as appropriate. A focus group with interested survey participants was held to gain deeper insight into educators' opinions on this topic. The transcript of this discussion was analyzed by deductive and inductive coding methods. Results: From a sample of 20 educators, this study found that educators were not aware that students rely on Google and YouTube for educational videos more than on university websites. Most educators believed that students are likely to refer to online videos to prepare for dental procedures they have not performed before. The same proportion also believed that teachers should incorporate videos into their teaching. However, 30% of educators had not uploaded or recommended online videos to their students. Most educators believed they had discussed the accuracy and/or relevance of online content with their students. Interestingly, only 20% believed that students would discuss a contradictory video with their lecturers. The focus group participants expressed concern over the accuracy of online content; they felt that this, along with a lack of time, was the main reason deterring them from referring students to online videos. Conclusions: Dental educators are unaware that students access online dental content through Google and YouTube more often than through official academic platforms. Educators are concerned about the accuracy of online dental content. Many believe that they direct their students on how to determine the accuracy of online content, which contrasts with other researchers' findings. More communication is needed between educators and dental students to address each other's concerns and enhance students' learning.
ARTICLE | doi:10.20944/preprints202302.0415.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: reflectance PPG; reflectance photoplethysmography; heart rate estimation; video
Online: 24 February 2023 (02:52:59 CET)
Non-invasive heart rate (HR) monitoring is important in clinical settings as it plays a critical role in diagnosing a range of health conditions and assessing well-being. Presently, the gold standards for HR measurement are all based on sensors which require skin contact. Apart from inconvenience, contact sensors have proven problematic in certain scenarios – they cannot be used when mechanical isolation of the patient is imperative (burn victims, patients with shaky hands and feet), they cause skin damage to premature babies in the ICU, and they increase the risk of spreading infections. Non-contact HR monitoring using a camera has recently been shown to be a viable alternative. It is now possible to record cardiac-synchronous blood volume variations from facial videos of human subjects under ambient lighting. These variations produce corresponding changes in skin reflectance, which can be extracted as a raw reflectance photoplethysmography (rPPG) signal and processed to reveal HR. In this project, an algorithmic framework for webcam-based HR detection was successfully implemented in MATLAB. The investigation was based on 100 self-captured videos (dark-skinned subject) and 48 videos (from 12 subjects, all but one fair-skinned) obtained from COHFACE – an online database of facial videos and corresponding physiological signals. While the performance metrics (mean error, SNR) of the rPPG signals obtained from the self-captured videos were poor (best-case mean error of 22%), they were good enough to demonstrate the success of the implementation. The poor results were primarily attributed to skin tone, as rPPG SNR is known to be particularly low for dark tones. The results for the COHFACE videos were far superior, with mean error ranging from 3% to 15% (among 8 different rPPG signals) and 0% to 9% under ambient and dedicated lighting, respectively. This investigation sets the foundation for future research directed at optimizing rPPG performance metrics for dark-skinned subjects.
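The final processing step, turning a skin-reflectance trace into a heart rate via its dominant spectral peak, can be sketched as follows (a toy NumPy illustration of the general rPPG approach, not the MATLAB framework described; the 0.7–4 Hz physiological band is a common assumption):

```python
import numpy as np

def estimate_hr(trace: np.ndarray, fps: float) -> float:
    """Estimate heart rate in bpm from a mean skin-color trace sampled at fps.
    Picks the dominant FFT peak inside the 0.7-4 Hz band (42-240 bpm)."""
    x = trace - trace.mean()                       # remove DC component
    spec = np.abs(np.fft.rfft(x))                  # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)   # frequency axis in Hz
    band = (freqs >= 0.7) & (freqs <= 4.0)         # plausible heart rates only
    return float(freqs[band][np.argmax(spec[band])] * 60.0)
```

In a full pipeline this trace would come from spatially averaging a facial region of interest per frame, with detrending and bandpass filtering before the spectral step.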
ARTICLE | doi:10.20944/preprints202106.0292.v1
Subject: Social Sciences, Psychology Keywords: sonification; gamification; auditory display; smartphone apps; video games
Online: 10 June 2021 (13:21:22 CEST)
As sonification is supposed to communicate information to users, experimental evaluation of the subjective appropriateness and effectiveness of the sonification design is often desired and sometimes indispensable. Experiments in the laboratory are typically restricted to short-term usage by a small sample size under unnatural conditions. We introduce the multi-platform CURAT Sonification Game that allows us to evaluate our sonification design by a large population during long-term usage. Gamification is used to motivate users to interact with the sonification regularly and conscientiously over a long period of time. In this paper we present the sonification game and some initial analyses of the gathered data. Furthermore, we hope to reach more volunteers to play the CURAT Sonification Game and help us evaluate and optimize our psychoacoustic sonification design and give us valuable feedback on the game and recommendations for future developments.
ARTICLE | doi:10.20944/preprints202104.0257.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: information media; video; patient’ knowledge; antibiotic use; antibiotic resistance
Online: 9 April 2021 (10:23:20 CEST)
Irrational use or misuse of antibiotics, particularly by outpatients, increases antibiotic resistance. A lack of public knowledge about ‘responsible use of antibiotics’ and ‘how to obtain antibiotics’ is a major cause of this. This study aimed to assess the effectiveness of an educational video about antibiotics and antibiotic use in increasing outpatients' knowledge in two public hospitals in East Java, Indonesia. A quasi-experimental design with a one-group pretest-posttest setup was used, carried out from November 2018 to January 2019. The study population consisted of outpatients to whom antibiotics were prescribed in two public hospitals in East Java, Indonesia. Participants were selected using a purposive sampling technique; 98 outpatients at MZ General Hospital in S regency and 96 at SG General Hospital in L regency were included. A questionnaire was used to measure the respondents' knowledge and consisted of five domains: definition of infections and antibiotics, obtaining antibiotics, directions for use, storage instructions, and antibiotic resistance. The knowledge test score was the total score on a Guttman scale (a dichotomy of ‘yes’ or ‘no’ answers). To determine the significance of the difference in knowledge before and after showing the educational video, and of the difference in knowledge scores between hospitals, the (paired) Student's t-test was applied. The educational videos significantly improved outpatients' knowledge, which increased by 41% at MZ General Hospital and 42% at SG General Hospital. An educational video is a useful method to improve outpatients' knowledge regarding antibiotics.
ARTICLE | doi:10.20944/preprints202011.0649.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: video super-resolution; bidirectional; recurrent method; sliding window method
Online: 25 November 2020 (15:12:38 CET)
Video super-resolution, which utilizes the relevant information of several low-resolution frames to generate high-resolution images, is a challenging task. One possible solution, the sliding-window method, divides the generation of high-resolution video sequences into independent sub-tasks, using only adjacent low-resolution images to estimate the high-resolution version of the central low-resolution image. Another popular method, the recurrent algorithm, utilizes not only the low-resolution images but also the previously generated high-resolution frames to generate the next high-resolution image. However, both methods have unavoidable disadvantages: the former usually leads to poor temporal consistency and requires higher computational cost, while the latter cannot make full use of the information contained in optical flow or other calculated features. Thus, more investigation is needed to explore the balance between these two methods. In this work, a bidirectional frame-recurrent video super-resolution method is proposed. Specifically, a reverse training pass is proposed in which the generated high-resolution frame is also utilized to help estimate the high-resolution version of the preceding frame. With the contributions of the reverse and forward training passes, the bidirectional recurrent method not only guarantees temporal consistency but also makes full use of adjacent information, while the computational cost remains acceptable. Experimental results demonstrate that the bidirectional super-resolution framework gives remarkable performance, resolving temporal-consistency problems while producing high-resolution images that compare favorably with those of recurrent-based video super-resolution methods.
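The bidirectional idea, combining a forward recurrent pass with a reverse pass over the same sequence, can be sketched in a toy form (the `step` function here stands in for the paper's super-resolution network and is purely illustrative):

```python
import numpy as np

def bidirectional_recurrent(frames, step):
    """Run a per-frame recurrent update forward and backward over the
    sequence, then average the two passes so every output sees both
    past and future context. frames: list of arrays; step(frame, hidden)."""
    fwd, h = [], np.zeros_like(frames[0])
    for f in frames:                 # forward pass: past -> future
        h = step(f, h)
        fwd.append(h)
    bwd, h = [], np.zeros_like(frames[0])
    for f in reversed(frames):       # reverse pass: future -> past
        h = step(f, h)
        bwd.append(h)
    bwd.reverse()
    return [(a + b) / 2 for a, b in zip(fwd, bwd)]
```

Averaging is only one way to fuse the two passes; a learned fusion layer would play that role in a real network.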
ARTICLE | doi:10.20944/preprints202006.0194.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: network flow; combinatorial optimization; tracking-by-detection; video surveillance
Online: 15 June 2020 (11:26:25 CEST)
In the tracking-by-detection paradigm for multi-target tracking, target association is modeled as an optimization problem that is usually solved through a network flow formulation. In this paper, we propose a combinatorial optimization formulation and use bipartite graph matching to associate targets across consecutive frames. Usually, the target of interest is represented by a bounding box, and the whole box is tracked as a single entity. In the case of humans, however, the body undergoes complex articulation and occlusion that severely deteriorate tracking performance. To partially tackle the occlusion problem, we argue that tracking a rigid body part can lead to better performance than whole-body tracking. Based on this assumption, we generate target hypotheses consisting only of the spatial locations of persons' heads in every frame. After head localization, a constant-velocity motion model is used for the temporal evolution of the targets in the visual scene. Qualitative results are evaluated on four challenging video surveillance datasets, and promising results have been achieved.
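The frame-to-frame association step described above can be sketched as a minimal bipartite matching. The snippet below is an illustrative sketch, not the paper's implementation: the distance-based `association_cost`, the example head positions, and the brute-force search over permutations are all assumptions made for clarity; a real system would use the Hungarian algorithm or a network-flow solver for larger problems.

```python
from itertools import permutations

def association_cost(track, det):
    # Euclidean distance between a predicted track position and a detected head location
    return ((track[0] - det[0]) ** 2 + (track[1] - det[1]) ** 2) ** 0.5

def associate(tracks, detections):
    """Brute-force minimum-cost bipartite matching of tracks to detections.

    Fine for the handful of targets in a frame; larger problems would use
    the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment).
    """
    best, best_cost = None, float("inf")
    n = min(len(tracks), len(detections))
    for perm in permutations(range(len(detections)), n):
        cost = sum(association_cost(tracks[i], detections[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return best  # list of (track_index, detection_index) pairs

tracks = [(10.0, 12.0), (40.0, 8.0)]       # head positions predicted by the motion model
detections = [(41.0, 9.0), (11.0, 11.0)]   # detected head positions in the next frame
print(associate(tracks, detections))        # [(0, 1), (1, 0)]
```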
Subject: Social Sciences, Media Studies Keywords: live-streaming; video-conference; broadcast; scientific conferences; diversity; inclusion
Online: 10 March 2020 (02:29:22 CET)
Live-streaming conferences increases the participation of a diverse audience, helps defray travel costs and overcomes problems related to travel restrictions. In this article, we lay out tips for implementing live-streaming in scientific meetings. We also cover the legal, ethical, and technical aspects involved in live-streaming scientific talks. To write this article, we leveraged knowledge from our experience in organizing the symposium “Deciphering the Denisovans,” presented at the 88th Annual Meeting of the American Association of Physical Anthropology (AAPA) in Cleveland, OH, in 2019, as well as the literature on the topic.
ARTICLE | doi:10.20944/preprints201805.0051.v1
Subject: Engineering, Mechanical Engineering Keywords: porous media; optical video microscopy; microfluidics; waterflooding; surfactants; polymers
Online: 3 May 2018 (05:52:37 CEST)
In this study, we examine microscale waterflooding in a randomly close-packed porous medium. Three different porosities are prepared in a microfluidic platform and saturated with silicone oil. Optical video fluorescence microscopy is used to track the water front as it flows through the porous packed bed. The degree of water saturation achieved with water containing two different chemical modifiers, sodium dodecyl sulfate (SDS) and polyvinylpyrrolidone (PVP), is compared against water without a surfactant as a control. Image analysis of our video data yields saturation curves and fractal dimensions, which we use to identify how morphology changes the way an invading water phase moves through the porous media. An inverse analysis based on the implicit pressure explicit saturation (IMPES) simulation technique uses the mobility ratio as an adjustable parameter to fit our experimental saturation curves. The results from our inverse analysis, combined with our image analysis, show that this platform can be used to evaluate the effectiveness of surfactants or polymers as additives for enhancing the transport of water through an oil-saturated porous medium.
ARTICLE | doi:10.20944/preprints201710.0119.v2
Subject: Engineering, Civil Engineering Keywords: laser pointer; displacement monitoring; laser fingerprint; video; data synchronization
Online: 11 December 2017 (15:16:12 CET)
Deck inclination and vertical displacements are among the most important technical parameters for evaluating the health status of a bridge and verifying its bearing capacity. Several methods, both conventional and innovative, are used to monitor structural rotations and displacements, but none of them simultaneously offers precision, automation, and both static and dynamic monitoring without high-cost instrumentation. The proposed system uses a common laser pointer and image processing. The elastic-line inclination is measured by analyzing single frames of an HD video of the laser beam imprint projected on a flat target. For the image processing, a code was developed in Matlab® that provides the instantaneous rotation and displacement of a bridge loaded by a mobile load. An important feature is the synchronization with the load position, obtained by a GNSS receiver or by a video. After the calibration procedures, a test was carried out during the movements of a heavy truck maneuvering on a bridge. Synchronized data acquisition allowed the position of the truck on the deck to be related to the inclination and displacements. The inclination of the elastic line was obtained with a precision of 0.01 mrad. The results demonstrate the suitability of the method for dynamic load tests and for the control and monitoring of bridges.
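The measurement principle — converting the displacement of the laser spot on the target into a rotation of the elastic line via the small-angle relation — can be illustrated in a few lines. This is a simplified Python sketch, not the authors' Matlab® code; the frame format, brightness threshold, and calibration values (`mm_per_px`, `baseline_mm`) are hypothetical.

```python
import math

def spot_centroid(frame, threshold=200):
    """Centroid (row, col) of pixels above threshold -- the laser imprint."""
    pts = [(r, c) for r, row in enumerate(frame)
                  for c, v in enumerate(row) if v >= threshold]
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def inclination_mrad(centroid0, centroid1, mm_per_px, baseline_mm):
    """Rotation of the elastic line from the laser-spot displacement on the target.

    For small angles, theta ~ displacement / baseline (radians); returned in mrad.
    """
    d_px = centroid1[0] - centroid0[0]           # vertical displacement in pixels
    d_mm = d_px * mm_per_px
    return math.atan2(d_mm, baseline_mm) * 1000.0

frame0 = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]     # spot at row 1 (unloaded bridge)
frame1 = [[0, 0, 0], [0, 0, 0], [0, 255, 0]]     # spot moved to row 2 (loaded bridge)
c0, c1 = spot_centroid(frame0), spot_centroid(frame1)
print(round(inclination_mrad(c0, c1, mm_per_px=0.5, baseline_mm=10000.0), 3))  # 0.05
```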
ARTICLE | doi:10.20944/preprints201703.0159.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: object detection; background subtraction; video surveillance; Kinect sensor fusion
Online: 20 March 2017 (10:21:40 CET)
Depth-sensing technology has led to broad applications of inexpensive depth cameras that can capture human motion and scenes in 3D space. Background subtraction algorithms can be improved by fusing color and depth cues, thereby allowing many issues encountered in classical color segmentation to be solved. In this paper, we propose a new fusion method that combines depth and color information for foreground segmentation based on an advanced color-based algorithm. First, a background model and a depth model are developed. Then, based on these models, we propose a new updating strategy that can eliminate ghosting and black shadows almost completely. Extensive experiments have been performed to compare the proposed algorithm with other, conventional RGB-D algorithms. The experimental results suggest that our method extracts foregrounds with higher effectiveness and efficiency.
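The core idea — letting the depth cue drive segmentation and falling back to color where the depth reading is invalid — can be sketched per pixel. This is a minimal illustration under assumed inputs (grayscale color values, raw depth readings with 0 marking an invalid measurement); the paper's actual background models and updating strategy are considerably more elaborate.

```python
def segment_foreground(color, depth, bg_color, bg_depth,
                       color_tol=30, depth_tol=50, invalid=0):
    """Per-pixel foreground mask fusing color and depth cues.

    Depth is the primary cue (immune to color shadows); where the sensor returns
    an invalid reading, fall back to the color difference against the background.
    """
    h, w = len(color), len(color[0])
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if depth[r][c] != invalid and bg_depth[r][c] != invalid:
                fg = abs(depth[r][c] - bg_depth[r][c]) > depth_tol
            else:
                fg = abs(color[r][c] - bg_color[r][c]) > color_tol
            mask[r][c] = 1 if fg else 0
    return mask

bg_color = [[100, 100], [100, 100]]
bg_depth = [[1000, 1000], [1000, 1000]]
color = [[100, 100], [200, 100]]
depth = [[1000, 600], [0, 1000]]     # one depth change, one invalid reading
print(segment_foreground(color, depth, bg_color, bg_depth))  # [[0, 1], [1, 0]]
```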
ARTICLE | doi:10.20944/preprints202308.0509.v1
Subject: Medicine And Pharmacology, Urology And Nephrology Keywords: phase duration assessment; partial nephrectomy; video analysis; surgical data science
Online: 7 August 2023 (10:10:57 CEST)
(1) Background: Surgical phases form the basic building blocks for surgical skill assessment, feedback and teaching. Phase duration itself and its correlation with clinical parameters have not yet been investigated. Novel commercial platforms provide phase indications but have not yet been assessed for accuracy. (2) Methods: We assess 100 robot-assisted partial nephrectomy videos for phase duration based on previously defined proficiency metrics. We develop an annotation framework and subsequently compare our annotations to an existing commercial solution (Touch Surgery, Medtronic™). We then explore correlations between phase durations and peri-operative parameters. (3) Results: Objective and uniform phase assessment requires precise definitions derived from an iterative revision process. Comparison to a commercial solution shows large differences in definitions across phases. BMI correlates positively with the duration of renal tumor identification, and tumor complexity with both tumor excision and renorrhaphy duration. (4) Conclusions: Surgical phase duration can be correlated with certain clinical outcomes. Further research should investigate whether the retrieved correlations are also clinically meaningful. This requires larger datasets, facilitated by intelligent computer vision algorithms. Commercial platforms can support this dataset expansion and help unlock its full potential, provided phase annotation details are disclosed.
ARTICLE | doi:10.20944/preprints202307.1562.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: versatile video coding; inter-prediction; affine motion estimation; edge detection
Online: 24 July 2023 (10:58:44 CEST)
In the Versatile Video Coding (VVC) standard, affine motion models are applied to better represent complex motion patterns. However, the high computational complexity of affine motion estimation poses significant challenges for real-time video processing applications. This paper focuses on optimizing affine motion estimation algorithms in the VVC environment and proposes a fast gradient-iteration algorithm based on edge detection for efficient computation. First, we establish judging conditions during the construction of the affine motion candidate list to streamline redundant checks. Second, we employ Canny edge detection for gradient assessment during affine motion estimation, thereby speeding up the iteration of affine motion vectors. Experimental results demonstrate that our affine motion estimation algorithm reduces encoding time by approximately 15–35% while maintaining video bitrate and quality.
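The role of the edge-detection step can be illustrated with a simplified stand-in: the paper uses Canny, but even a plain Sobel gradient magnitude shows how a block with little edge content can skip the costly affine iteration. The function names, thresholds, and skip criterion below are hypothetical illustrations, not the paper's algorithm.

```python
def sobel_magnitude(block):
    """Gradient magnitude of an image block using 3x3 Sobel kernels."""
    h, w = len(block), len(block[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (block[y-1][x+1] + 2*block[y][x+1] + block[y+1][x+1]
                  - block[y-1][x-1] - 2*block[y][x-1] - block[y+1][x-1])
            gy = (block[y+1][x-1] + 2*block[y+1][x] + block[y+1][x+1]
                  - block[y-1][x-1] - 2*block[y-1][x] - block[y-1][x+1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

def skip_affine_search(block, edge_ratio_threshold=0.05, mag_threshold=100.0):
    """Skip the costly affine iteration for blocks with too few edge pixels."""
    mag = sobel_magnitude(block)
    edges = sum(v > mag_threshold for row in mag for v in row)
    total = len(block) * len(block[0])
    return edges / total < edge_ratio_threshold

flat = [[10] * 8 for _ in range(8)]
print(skip_affine_search(flat))  # True: a textureless block gains little from affine search
```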
ARTICLE | doi:10.20944/preprints202206.0356.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: optimization; video segmentation; decision tree; random forest; gradient boost tree
Online: 27 June 2022 (08:56:21 CEST)
Video segmentation is crucial in a variety of practical applications, especially in computer vision. Most recent work focuses on deep-learning-based video segmentation, leaving room for improvement with respect to evolutionary algorithms. This paper proposes a novel video segmentation method that optimizes segmentation parameters with ensemble-based random forests and gradient-boosted decision trees. The experimental results show the Pareto front of the segmentation parameters (hue, brightness, luminance, and saturation). Our optimization model yields an accuracy of 85% ± 8.85% (micro average: 85.00%), an average class precision of 84.88%, and an average class recall of 85%. We also show the video segmentation results obtained with our optimization method and compare them with Kinect-based video segmentation.
ARTICLE | doi:10.20944/preprints202107.0621.v1
Subject: Engineering, Automotive Engineering Keywords: Accessibility; Guiding Methods; Immersive Media; Subtitling; Virtual Reality; 360° video
Online: 28 July 2021 (10:28:27 CEST)
Every (multimedia) service needs to be accessible. Accessibility for multimedia content is typically provided by means of access services, of which subtitling is likely the most widespread. To date, many recommendations and solutions for subtitling classical 2D audiovisual services are available. Likewise, recent efforts have been devoted to devising adequate subtitling solutions for VR360 video content. This paper, for the first time, goes a step beyond by exploring two key requirements for efficiently subtitling 3D Virtual Reality (VR) content: presentation modes and guiding methods. Leveraging insights from earlier work on VR360 content, the paper proposes novel presentation modes and guiding methods that deal not only with the freedom to explore omnidirectional scenes but also with the additional specificities of 3D VR compared to VR360 content: depth, 6 Degrees of Freedom (6DoF), and viewing perspectives. The results show that always-visible subtitles and a novel comic-style presentation mode are far more appropriate than state-of-the-art fixed-position subtitles, mainly in terms of immersion, ease and comfort of reading, and identification of speakers, when applied to professional content with limited displacement of speakers and limited 6DoF (i.e. users are not expected to navigate widely around the virtual environment). Likewise, even in such limited-movement scenarios, the results show that the use of indicators (arrows) as guiding methods is well received. Overall, the paper provides relevant insights and paves the way toward efficiently subtitling 3D VR content.
ARTICLE | doi:10.20944/preprints201908.0289.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: drone video; human action recognition; CNN; Support vector machine (SVM)
Online: 28 August 2019 (03:52:22 CEST)
Recognizing human interactions in unconstrained videos taken from cameras and remote sensing platforms such as drones is a challenging problem. This study presents a method that addresses motion blur, poor video quality, occlusions, differences in body structure or size, and high computation or memory requirements. It contributes to improving the recognition of human interaction during disasters such as earthquakes and floods, using drone videos for rescue and emergency management. We use a Support Vector Machine (SVM) to classify the high-level, stationary features obtained from a Convolutional Neural Network (CNN) on key frames of the videos. We extract conceptual features by employing the CNN to recognize objects in the first and last images of a video. The proposed method captures the context of a scene, which is significant in determining human behaviour in the videos, and requires no person detection, tracking, or large numbers of image instances. The method was tested on the University of Central Florida (UCF Sports Action) and Olympic Sports videos, which were taken from ground platforms. In addition, drone video captured at the Southwest Jiaotong University (SWJTU) Sports Centre was incorporated to test the developed method. The study achieved an accuracy of 90.42%, an improvement of more than 4.92% over existing methods.
ARTICLE | doi:10.20944/preprints201807.0238.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: Multiple object tracking; Airborne video; Tracklet confidence; Hierarchical association framework
Online: 13 July 2018 (14:27:22 CEST)
Multi-object tracking (MOT) in airborne videos is a challenging problem due to uncertain airborne vehicle motion, vibrations of the mounted camera, unreliable detections, the size, appearance and motion of the moving objects, and occlusions caused by interactions between the moving objects and with other static objects in the scene. To deal with these problems, this work proposes a four-stage Hierarchical Association framework for multiple object Tracking in Airborne video (HATA). The proposed framework combines data-association-based tracking (DAT) methods with target tracking using a compressive tracking approach to robustly track objects in complex airborne surveillance scenes. In each association stage, different sets of tracklets and detections are associated to efficiently handle local tracklet generation, local trajectory construction, global drifting-tracklet correction and global fragmented-tracklet linking. Experiments on challenging airborne video datasets show significant tracking improvement compared to existing state-of-the-art methods.
ARTICLE | doi:10.20944/preprints201807.0222.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: compressed sensing; distributed video codec; sparse representation; side information reconstruction
Online: 13 July 2018 (03:43:55 CEST)
To address the difficulties in transmission and storage caused by the large volume of video monitoring data in underground coal mines, compressed sensing theory is introduced to encode and decode video images, and a new distributed video coding scheme is proposed. To obtain a sparser representation and broader applicability, a block-based adaptive sparse-basis scheme is proposed. For side information, fixed weights are usually used in synthesis and the correlation between different image blocks is neglected; a block-based classification-weighted side information generation scheme is therefore proposed. Experimental results show that the block-based classification codec scheme makes full use of inter-frame correlation. At an appropriate sampling rate, the PSNR of the reconstructed video increases, effectively improving the quality of video frame reconstruction.
ARTICLE | doi:10.20944/preprints202212.0132.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Short video; Sentiment Analysis; Feature; 3D Dense Net; 3D Residual Network
Online: 7 December 2022 (11:57:32 CET)
In recent years, with the development of social media, people have become more and more inclined to upload text, pictures and videos to express their emotions, so the number of short videos is growing and they have become a primary way to socialize. Unlike traditional text, media such as video images can convey personal emotions and opinions. Sentiment can therefore be analyzed not only from text but also from images and videos, enabling researchers to customize products for individual users. Compared with pure text, video can express users' happiness, anger and sorrow more intuitively, so short-video applications have gained increasing popularity among Internet users. However, not all short videos on social networking sites accurately express users' emotions, and accompanying text can assist sentiment analysis and improve accuracy. Moreover, sentiment analysis based solely on video frames is inaccurate in some scenarios: when a user expresses tears of joy, for example, facial expression and voice convey different sentiments, causing analysis errors. Researchers have therefore turned to multimodal sentiment analysis to reduce the impact of such scenarios on short-video sentiment analysis. This paper proposes a sentiment analysis method for short videos. We first propose a residual attention model to make full use of the information in audio to classify the emotions it contains. The text information in the dataset is then classified via feature extraction; the key is not only to retain the semantic information of the text but also to uncover its latent emotional information, ensuring the integrity of the text features. Experiments show that the sentiment analysis model proposed in this paper outperforms the baselines.
ARTICLE | doi:10.20944/preprints202002.0026.v1
Subject: Social Sciences, Cognitive Science Keywords: mind-wandering; video lecture; self-caught method; oculomotor data; eye movements
Online: 3 February 2020 (08:34:54 CET)
The purpose of this study was to detect mind-wandering experienced by pre-service teachers while learning from a video lecture on physics. The lecture, on Gauss's law, was a live classroom lecture that was videotaped. We investigated whether oculomotor data and eye movements could be used as markers of a learner's mind-wandering. Data were collected in a study in which 24 pre-service teachers (16 females and 8 males) reported self-caught mind-wandering while watching the physics video lecture for 30 minutes. A Tobii Pro Spectrum (sampling rate: 300 Hz) was used to capture their eye gaze while they learned from the Gauss's-law course video. After the video lecture, we interviewed the pre-service teachers about their mind-wandering experience. We first used the self-caught method to capture the timing of mind-wandering, and then identified mind-wandering segments more precisely by comparing fixation duration and saccade count. We investigated two types of oculomotor data (blink count, pupil size) and nine eye-movement measures (average peak velocity of saccades; maximum peak velocity of saccades; standard deviation of peak velocity of saccades; average amplitude of saccades; maximum amplitude of saccades; total amplitude of saccades; saccade count/s; fixation duration; fixation dispersion). Among these, blink count could not be used as a marker of mind-wandering during video lectures, unlike in previous literature. Based on the results of this study, we identified which of the oculomotor data and eye movements mentioned in previous literature can serve as mind-wandering markers while learning from video lectures similar to real classes. Interview analysis also showed that most participants focused on past thoughts and felt unpleasant after experiencing mind-wandering.
ARTICLE | doi:10.20944/preprints201911.0101.v1
Subject: Medicine And Pharmacology, Anesthesiology And Pain Medicine Keywords: chronic pain; epigenetics; neuropathic pain; postoperative pain; thoracic surgery; video-assisted
Online: 10 November 2019 (09:29:13 CET)
Background: Elucidation of the epigenetic mechanisms correlated with neuropathic pain in humans is crucial for the prevention and treatment of this treatment-resistant pain state. In the present study, associations between neuropathic pain characteristics and DNA methylation of the transient receptor potential ankyrin 1 (TRPA1) gene were evaluated in chronic pain patients and preoperative patients. Methods: Pain and psychological states were prospectively assessed in patients who suffered chronic pain or were scheduled for thoracic surgery. Neuropathic characteristics were assessed using the Douleur Neuropathique 4 (DN4) questionnaire. DNA methylation levels of the CpG island in the TRPA1 gene were examined using whole blood. Results: Forty-eight adult patients were enrolled in this study. DNA methylation rates at CpG -51 correlated positively with DN4 scores in both preoperative and chronic pain patients. Combined methylation rates at CpG -51 also increased significantly with increasing DN4 scores. Conclusions: Neuropathic pain characteristics are likely associated with methylation rates at the promoter region of the TRPA1 gene in human peripheral blood.
ARTICLE | doi:10.20944/preprints201804.0333.v2
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: capsule video endoscopy; stochastic sampling; random walks; color gradient; image decomposition
Online: 17 May 2018 (12:46:30 CEST)
Capsule endoscopy, which uses a wireless camera to take images of the digestive tract, is emerging as an alternative to traditional colonoscopy. The diagnostic value of these images depends on the quality of the revealed underlying tissue surfaces. In this paper, we consider the problem of enhancing the visibility of detail and shadowed tissue surfaces in capsule endoscopy images. Using concentric circles at each pixel for random walks, combined with stochastic sampling, the proposed method enhances the details of vessel and tissue surfaces. The framework decomposes the image into two detail layers that contain shadowed tissue surfaces and detail features. For the smooth layer, the target pixel value is recalculated from its similarity to neighboring pixels, weighted against the total gradient variation and intensity differences. To evaluate diagnostic image quality, we carried out a clinical subjective evaluation with rank ordering on a selection from the KID image database and compared the proposed method to state-of-the-art enhancement methods. The results showed that the proposed method performs better in terms of diagnostic image quality, objective contrast metrics and the structural similarity index.
ARTICLE | doi:10.20944/preprints201801.0101.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Sparse Representation; locality information; Dictionary Learning; Video Semantic Analysis; Discriminative Function
Online: 11 January 2018 (09:46:43 CET)
Dictionary Learning (DL) and Sparse Representation (SR) based classifiers have greatly improved classification performance and achieved good recognition rates on image data. In Video Semantic Analysis (VSA), the local structure of video data contains vital discriminative information needed for classification, but this has not been fully exploited by current DL-based approaches. Moreover, similar coding results are not obtained for video features of the same category. To address these issues, a novel learning algorithm, Sparsity-based Locality-Sensitive Discriminative Dictionary Learning (SLSDDL), is proposed for VSA in this paper. In the proposed algorithm, a category discriminant loss function based on the sparse coefficients is introduced into the structure of the Locality-Sensitive Dictionary Learning (LSDL) algorithm. Finally, the sparse coefficients of a test video feature sample are solved by the optimized SLSDDL method, and the video semantic classification result is obtained by minimizing the error between the original and reconstructed samples. Experimental results show that the proposed SLSDDL significantly improves the performance of video semantic detection compared with state-of-the-art approaches. Moreover, its robustness to the diverse environments found in video is also demonstrated, which proves the universality of the novel approach.
ARTICLE | doi:10.20944/preprints201710.0042.v1
Subject: Medicine And Pharmacology, Orthopedics And Sports Medicine Keywords: free software; human motion; Kinovea; low cost; reliability; validity; video analysis
Online: 9 October 2017 (05:07:57 CEST)
Clinical rehabilitation and sports performance analysis both require the objectification of movement. Kinovea© is free 2D motion analysis software that enables the measurement of kinematic parameters. This low-cost technology has been used in sports science as well as in clinical and research work. Although it has been validated as a tool for assessing time-related variables, this is not yet the case for angular and distance variables. The main objective of this study was to determine the validity and reliability of the Kinovea software in obtaining angular and distance data at perspectives of 90°, 75°, 60° and 45°. For this purpose, a figure with 29 points was designed (in AutoCAD) and 24 frames were analysed. Each frame was examined by three observers who each made two attempts. For each data export, 20 angular and 20 distance variables were calculated, and intra- and inter-observer reliability were also analysed. To evaluate the reliability and validity of Kinovea, a multiple approach was applied involving the following analyses: systematic error with a two-way 2×4 ANOVA; relative reliability with the ICC and CV (95% confidence interval); and absolute reliability with the standard error. The results indicate that the Kinovea software is a valid and reliable tool able to measure accurately at distances of up to 5 m from the object and over an angle range of 90°–45°. Nevertheless, for optimum results an angle of 90° is suggested.
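As an illustration of the kind of angular variable being validated, the angle at a joint can be computed from three digitized 2D points. This is a generic geometry sketch, not Kinovea's internal code; the function name and the example coordinates are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle ABC in degrees from three digitized 2D points (e.g. hip-knee-ankle)."""
    v1 = (a[0] - b[0], a[1] - b[1])   # vector from vertex b to point a
    v2 = (c[0] - b[0], c[1] - b[1])   # vector from vertex b to point c
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

print(joint_angle((0, 1), (0, 0), (1, 0)))  # 90.0
```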
ARTICLE | doi:10.20944/preprints202301.0426.v1
Subject: Medicine And Pharmacology, Emergency Medicine Keywords: Dispatch; Emergency Medical Dispatch; Emergency Medical Communication Centre; Video Live; COVID19; Emergency Call; Video triage; Public Safety Answering Point; Telemedicine; Emergency Medical Services; Remote assessment; Triage
Online: 24 January 2023 (08:20:00 CET)
The COVID-19 pandemic had a major impact on emergency medical communication centres (EMCC). A live video facility was made available to second-line physicians in an EMCC where first-line paramedics receive emergency calls. The objective of this study was to measure the contribution of live video to remote medical triage. This single-centre retrospective study included all telephone assessments of patients with suspected COVID-19 symptoms from 01.04.2020 to 30.04.2021 in Geneva, Switzerland. The organisation of the EMCC and the characteristics of patients who called the two emergency lines (the official emergency number and the COVID-19 number) with suspected COVID-19 symptoms are described. A prospective web-based survey of physicians was conducted during the same period to measure the indications, limitations and impact of live video on their decisions. A total of 8,957 patients were included. Of the 4,493 patients assessed on the official emergency number, 2,157 (48.0%) had dyspnoea, and 4,045 (90.6%) of the 4,464 patients assessed on the COVID-19 number had flu-like symptoms. In total, 1,798 (20.1%) patients were reassessed remotely by a physician, including 405 (22.5%) with live video, successful in 315 (77.8%) attempts. The web-based survey (107 forms) showed that physicians used live video mainly to assess patients' breathing (81.3%) and general condition (78.5%). They felt that their decision was modified in 75.7% (n=81) of cases, and they identified 7 (7.7%) patients in a life-threatening emergency. Medical triage decisions for suspected COVID-19 patients are strongly influenced by the use of live video.
ARTICLE | doi:10.20944/preprints202310.0144.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: autism spectrum disorders; applied behavior analysis; parent training; joint attention; video modelling
Online: 3 October 2023 (10:35:14 CEST)
Social communication skills, especially eye contact and joint attention, are highly impaired in autism spectrum disorder (ASD) and predict functional outcomes. Applied Behavior Analysis is one of the best evidence-based treatments for ASD, but it is not accessible to most families in low- and middle-income countries (LMICs), as it is costly and intensive and needs to be delivered by highly specialized professionals. Parent training has emerged as an effective alternative. The aim of this study was to test the efficacy of a group parental intervention via video modelling for acquiring eye contact and joint attention. Four graded measures of eye contact and joint attention (full physical prompt, partial physical prompt, gestural prompt, and independent) were assessed in 34 children with ASD and intellectual disability (ID). There was a progressive reduction in the level of prompt required over time to elicit eye contact and joint attention, and a positive correlation between time of exposure to the intervention and acquisition of the abilities. This parent training, which uses video modelling to teach eye contact and joint attention skills to children with ASD and ID, is a low-cost intervention that can be applied in low-resource settings.
Subject: Computer Science And Mathematics, Computer Science Keywords: reinforcement learning; bitrate streaming; world-models; video streaming; model-based reinforcement learning
Online: 20 August 2020 (07:02:57 CEST)
Adaptive bitrate (ABR) algorithms optimize the quality of streaming experiences for users in client-side video players, especially on unreliable or slow mobile networks. Several rule-based heuristic algorithms achieve stable performance, but they sometimes fail to adapt properly to changing network conditions: fluctuating bandwidth may push them into behavior that creates a negative experience for the user. ABR algorithms can instead be generated with reinforcement learning, a decision-making paradigm in which an agent learns to make optimal choices through interactions with an environment. Training reinforcement learning algorithms for bitrate streaming requires building a simulator in which an agent can experience interactions quickly; training in the real environment is infeasible due to its long step times. This project explores using supervised learning to construct a world-model, or learned simulator, from recorded interactions. A reinforcement learning agent trained inside the learned model, rather than a hand-built simulator, can outperform rule-based heuristics. Furthermore, agents trained inside the learned world-model can outperform model-free agents in low-sample regimes. This work highlights the potential for world-models to learn simulators quickly and to be used to generate optimal policies.
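For context, the kind of rule-based heuristic the learned agents are compared against can be sketched as a throughput rule: pick the highest ladder rung below a conservative throughput estimate. The bitrate ladder, safety factor, and harmonic-mean estimator below are illustrative assumptions, not this project's actual baseline.

```python
def choose_bitrate(throughput_history_kbps, ladder_kbps, safety=0.8):
    """Throughput-rule ABR baseline.

    Estimates future bandwidth as the harmonic mean of recent throughput
    samples (robust to spikes), scales it by a safety factor, and picks the
    highest bitrate rung that fits within the budget.
    """
    n = len(throughput_history_kbps)
    harmonic_mean = n / sum(1.0 / t for t in throughput_history_kbps)
    budget = safety * harmonic_mean
    feasible = [b for b in ladder_kbps if b <= budget]
    return max(feasible) if feasible else min(ladder_kbps)

ladder = [300, 750, 1200, 1850, 2850, 4300]        # hypothetical bitrate ladder (kbps)
print(choose_bitrate([2400, 2000, 1600], ladder))  # 1200
```

A reinforcement learning agent replaces this fixed rule with a learned policy over richer state (buffer level, past chunk sizes, download times), which is where the world-model training described above comes in.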
ARTICLE | doi:10.20944/preprints201910.0284.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: raindrop shapes; asymmetric rain drops; scattering calculations; polarimetric radar; 2D-video distrometer
Online: 25 October 2019 (04:22:52 CEST)
Tropical storm Nate, which was a powerful hurricane prior to landfall along the Alabama coast, traversed north towards our instrumented site in Huntsville, AL. The rain bands lasted 18 h, and the 2D-video disdrometer (2DVD) captured the event, which was shallow and indicative of pure warm rain processes. Measurements of raindrop size, shape, and velocity distributions are quite rare in pure warm rain and are expected to differ from those of cold rain processes. In particular, asymmetric shapes due to drop oscillations and their impact on polarimetric radar signatures in warm rain have not been studied so far. Recently, 2DVD data have been used for 3D reconstruction of asymmetric raindrop shapes, but their fraction (relative to the more common oblate shapes) in warm rain has yet to be ascertained. Here, we compute the scattering matrix drop-by-drop using the Computer Simulation Technology integral equation solver for drop sizes > 2.5 mm. From the scattering matrix elements, the polarimetric radar observables are simulated by integrating over consecutive 1-minute segments of the event. These simulated values are compared with data from a dual-polarized C-band radar located 15 km from the 2DVD site to evaluate the contribution of the asymmetric drop shapes.
ARTICLE | doi:10.20944/preprints201812.0086.v4
Subject: Computer Science And Mathematics, Computer Science Keywords: multi-model information fusion; video skimming; audio and text classification; keyframe extraction
Online: 5 August 2019 (03:48:49 CEST)
In this paper, we propose a novel approach to video skimming that exploits the fusion of video temporal information and keyword information extracted from multi-model video information, including audio, text, and visual indices. In addition, we introduce brand-safe filtering and sentiment analysis in order to retain only user-friendly content in the video skim. In experiments using videos from the YouTube-8M dataset, we show that the proposed approach substantially outperforms approaches that use only partial information of the video in conserving its semantic content.
ARTICLE | doi:10.20944/preprints201811.0314.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: High-speed video-endoscopy, laryngeal image processing, glottis delineation, Machine Learning, CNN
Online: 13 November 2018 (12:57:10 CET)
Detection of the region of interest (ROI) is a critical step in laryngeal image analysis for the delineation of the glottis contour. The process can improve both the computational efficiency and the accuracy of the image segmentation task, which facilitates subsequent analysis and characterization of vocal fold vibration as it correlates with voice quality and pathology. This study aims to develop machine learning based approaches for automatic detection of the ROI in glottis image sequences captured by high-speed video-endoscopy (HSV), a clinical laryngeal imaging modality. In particular, we first applied the support vector machine (SVM) method using the histogram of oriented gradients (HOG) feature descriptor, and second, trained a convolutional neural network (CNN) model for this task. The two approaches are compared in terms of recognition accuracy and computation time.
ARTICLE | doi:10.20944/preprints201704.0088.v1
Subject: Engineering, Other Keywords: hierarchical video quality assessment; human visual systems; primate visual cortex; full reference
Online: 14 April 2017 (11:52:44 CEST)
Video quality assessment (VQA) plays an important role in video applications for quality evaluation and resource allocation; it aims to evaluate video quality consistently with human perception. In this letter, a hierarchical gradient-similarity-based VQA metric is proposed, inspired by the structure of the primate visual cortex, in which visual information is processed through sequential visual areas. These areas are modeled with corresponding measures to evaluate the overall perceptual quality. Experimental results on the LIVE database show that the proposed VQA metric significantly outperforms state-of-the-art VQA metrics.
ARTICLE | doi:10.20944/preprints202310.0085.v1
Subject: Biology And Life Sciences, Neuroscience And Neurology Keywords: autism; gut dysfunction; gut motility; gut transit; mouse model; Neuroligin-3; video-imaging
Online: 3 October 2023 (08:57:54 CEST)
Individuals with autism often experience gastrointestinal issues but the cause is unknown. Many gene mutations that modify neuronal synapse function are associated with autism and therefore may impact the enteric nervous system that regulates gastrointestinal function. A missense mutation in the Nlgn3 gene encoding the cell adhesion protein, Neuroligin-3, was identified in two brothers with autism who both experienced severe gastrointestinal dysfunction. Mice expressing this mutation (Nlgn3R451C mice) are a well-studied preclinical model of autism and show autism-relevant characteristics, including impaired social interaction and communication, as well as repetitive behaviour. We previously showed colonic dysmotility in response to GABAergic inhibition and increased myenteric neuronal numbers in the small intestine in Nlgn3R451C mice bred on a mixed genetic background. Here we show that gut dysfunction is a persistent phenotype of the Nlgn3 R451C mutation in mice backcrossed onto a C57BL/6 background. We report that Nlgn3R451C mice show faster gastrointestinal transit in vivo and have longer small intestines compared to wild-types due to a reduction in smooth muscle tone. In Nlgn3R451C mice, we observed a decrease in resting jejunal diameter and neurally-regulated dysmotility as well as shorter durations of contractile complexes in the ileum. In Nlgn3R451C mouse colons, short contractions were inhibited to a greater extent by the GABAA antagonist, gabazine, compared to wild-type mice. Inhibition of nitric oxide synthesis decreased the frequency of contractile complexes in the jejunum, but not the ileum, in both wild-type and Nlgn3R451C mice. These findings demonstrate that changes in enteric nervous system function contribute to gastrointestinal dysmotility in mice expressing the autism-associated R451C missense mutation in the Neuroligin-3 protein.
ARTICLE | doi:10.20944/preprints202211.0134.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning; Visual-Language Reasoning; Visual Commonsense Generation; Video-grounded Dialogue; VisualCOMET; AVSD
Online: 8 November 2022 (02:01:28 CET)
“A picture is worth a thousand words.” Given an image, humans are able to deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation aims at generating three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what the current intent is, and (3) what will happen after. However, such a task is challenging for machines owing to two limitations: existing approaches (1) directly utilize conventional vision-language transformers to learn relationships between input modalities, and (2) ignore relations among the target cause-and-effect captions, considering each caption independently. We propose Cause-and-Effect BART (CE-BART), which is based on (1) a Structured Graph Reasoner that captures intra- and inter-modality relationships among visual and textual representations, and (2) a Cause-and-Effect Generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieves state-of-the-art performance on both, while an extensive ablation study and qualitative analysis demonstrate the performance gain and improved interpretability.
ARTICLE | doi:10.20944/preprints202004.0266.v1
Subject: Biology And Life Sciences, Virology Keywords: COVID-19; mild patients; quarantine facility; video-consultation; living and treatment support center
Online: 16 April 2020 (08:23:06 CEST)
With the outbreak of coronavirus disease 2019 (COVID-19), there is a need for efficient management of patients with mild or no symptoms, who account for the majority of cases. The aim of this study is to introduce the structure and operation protocol of a living and treatment support centre (LTSC) operated by Seoul National University Hospital in South Korea. An existing accommodation facility was converted into a 'patient centre' where patients were isolated; a small number of medical staff on site performed medical tests and responded to emergencies. The other part of the LTSC was a 'remote monitoring centre', where patients' self-measured vital signs and symptoms were monitored twice a day and the medical staff provided video consultations via smartphone. During the 3 weeks from March 5 to March 26, 2020, 113 patients were admitted and treated. The LTSC could be an efficient alternative to hospital admission in pandemic situations such as COVID-19.
ARTICLE | doi:10.20944/preprints202304.0162.v1
Subject: Medicine And Pharmacology, Pulmonary And Respiratory Medicine Keywords: Hybrid computer tomography; pulmonary ground glass nodule localization; video-assisted thoracic surgery; pulmonary recruitment
Online: 10 April 2023 (09:30:44 CEST)
The standard treatment for early-stage lung cancer is complete tumor excision by limited resection of the lung. Pre-operative localization is used before video-assisted thoracoscopic surgery (VATS) to improve the accuracy of pulmonary nodule excision. However, lung atelectasis and hypoxia resulting from controlled apnea during the localization procedure may affect localization accuracy. Pre-procedural pulmonary recruitment may improve respiratory mechanics and oxygenation during localization. In this study, we investigated the potential benefits of a pulmonary recruitment maneuver performed before pulmonary ground-glass nodule localization in a hybrid operating room. We hypothesized that pre-localization pulmonary recruitment would increase localization accuracy, improve oxygenation, and prevent the need for re-inflation during the localization procedure. We retrospectively enrolled patients with multiple pulmonary nodule localizations before surgical intervention in our hybrid operating room. We compared localization accuracy between patients who had undergone pre-procedure pulmonary recruitment and patients who had not. Saturation, re-inflation rate, apnea time, procedure-related pneumothorax, and procedure time were recorded as secondary outcomes. Patients who had undergone pre-procedure recruitment had better saturation, shorter procedure time, and higher localization accuracy. The pre-procedure pulmonary recruitment maneuver was effective in increasing regional lung ventilation, leading to improved oxygenation and localization accuracy.
ARTICLE | doi:10.20944/preprints202212.0479.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: video consultations; digitalisation; stakeholders’ health and wellbeing; corporate social responsibility; hospital doctors; patient care
Online: 26 December 2022 (07:40:43 CET)
The past several decades have seen a shift in patient care towards digitalisation, which has ushered in a new era of health care delivery and improved sustainability and resilience of health systems, with positive impacts on both internal and external stakeholders. This study’s aim was to understand the role of digital virtual consultations in improving internal and external stakeholders’ health, as well as wellbeing among hospital doctors. A qualitative research approach was used with semi-structured online interviews administered to hospital doctors. The interviews showed that the doctors viewed digital virtual consultations as supplementary to in-person consultations, and as tools to reduce obstacles related to distance and time. If the necessary infrastructure and technology were in place, doctors would be willing to use these options. Implementing these technologies would improve the medical profession’s flexibility on the one hand; but it might affect doctors’ work–life balance if consultations extended beyond standard working hours.
ARTICLE | doi:10.20944/preprints201906.0251.v1
Subject: Physical Sciences, Applied Physics Keywords: video microscopy, imaging, automated data acquisition, nanoparticle tracking, measurement embedded applications, open-source software
Online: 25 June 2019 (12:53:50 CEST)
We introduce PyNTA, a modular instrumentation software package for live particle tracking. By using the multiprocessing library of Python and the distributed messaging library pyZMQ, PyNTA allows users to acquire images from a camera at close to the maximum readout bandwidth while simultaneously performing computations on each image on a separate processing unit. This publisher/subscriber pattern generates a small overhead and leverages the multi-core capabilities of modern computers. We demonstrate the capabilities of the PyNTA package on the featured application of nanoparticle tracking analysis. Real-time particle tracking on megapixel images at a rate of 50 Hz is presented. Reliable live tracking reduces the required storage capacity for particle tracking measurements by a factor of approximately 10³ compared with raw data storage, allowing for a virtually unlimited duration of measurements.
ARTICLE | doi:10.20944/preprints202309.0648.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: homography; computer vision; detection; automatic tracking of statistics; basketball; video analysis; neural networks; object tracking
Online: 11 September 2023 (10:04:43 CEST)
People often make mistakes, so we try to automate every aspect of our lives, and sports is no exception. While just over a decade ago games were analyzed by humans, today this is done by artificial intelligence. Owing to rapid development over the past decade, neural networks are now faster, more accurate, and in some areas even better than their human counterparts. In this paper, we present an algorithm that can detect player statistics during an NBA broadcast and that can also help users better understand the game through augmented reality. The algorithm detects players on the court, tracks their movements, and assigns them to their respective teams. Using homography estimation, we transform the players' positions from the three-dimensional space of the video to the two-dimensional space of the playing-field plane. We define a new algorithm that predicts the players' actions and their statistics. The results show that the proposed method can effectively identify the players, their respective teams, and their positions, and can analyze their actions with high accuracy.
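The homography step can be illustrated in a few lines. This is a hedged sketch with invented values: the matrix H and the pixel coordinates below are hypothetical, and in practice H would be estimated from at least four correspondences (e.g. court-line corners) rather than written by hand.

```python
# Applying a 3x3 homography H that maps broadcast-frame pixel coordinates to
# court-plane coordinates via homogeneous coordinates and a perspective divide.

def apply_homography(H, x, y):
    """Map pixel (x, y) to the court plane."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w   # perspective divide

# Hypothetical H (identity-plus-scale) that converts pixels to metres:
H = [[0.05, 0.0, 0.0],
     [0.0, 0.05, 0.0],
     [0.0, 0.0, 1.0]]

# A detected player's foot point at pixel (400, 600) lands at court position:
print(apply_homography(H, 400, 600))  # → (20.0, 30.0)
```

A real broadcast homography would have non-zero perspective terms in the last row, which is why the divide by `w` matters even though it is trivially 1 here.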
ARTICLE | doi:10.20944/preprints202302.0166.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: speed sport climbing; video analysis; KLT algorithm; Convolutional Neural Network (CNN); OpenPose; AI; artificial intelligence
Online: 9 February 2023 (11:23:49 CET)
This continuously developing project aims to build an informatics system enabling analysis of the spatial and temporal parameters of the movement activities occurring in the sport of speed climbing. The monitoring system (climbing information speed system, CISS) is intended for comprehensive scientific research in the field of speed climbing and enables evaluation of the training process of climbers at various levels of competition. The analysis was based on video recorded with a camera positioned a short distance (10 m) from the wall, with the marker placed as close as possible to the body's centre of mass (BCM). Results: a system was developed for data collection and analysis of the climbing run based on video recording, applying the Kanade-Lucas-Tomasi (KLT) algorithm. Our results showed that the devices used can measure a wide range of specific internal and external variables during speed climbing, and some of the analyzed parameters were significantly correlated with speed climbing time. These results could serve as a theoretical basis for future research and for the preparation of training programs.
ARTICLE | doi:10.20944/preprints202302.0050.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: video-based human action recognition; Action Recognition; Deep Learning Methods; handcrafted Methods; Human Action; Overview
Online: 3 February 2023 (01:17:56 CET)
Artificial intelligence's rapid advancement has enabled various applications, including intelligent video surveillance systems, assisted living, and human-computer interaction. These applications often share one core task: video-based human action recognition. Research in video-based human action recognition is vast and ongoing, making it difficult to assess the full scope of available methods and current trends. This survey provides an in-depth exploration of the vision-based human action recognition field, offering a comprehensive account of the available techniques and their evolution and highlighting the cutting-edge ideas driving its development. We also analyze the most-used keywords in research papers over the past years to identify trends and predict possible future directions. Hence, this concise survey helps researchers understand the breadth of existing approaches, evaluate current research trends, and stay up to date on potential developments.
ARTICLE | doi:10.20944/preprints202210.0429.v1
Subject: Social Sciences, Behavior Sciences Keywords: Adolescents; passive drinking; forced drinking; alcohol misuse; interactive video-based education; pre-post intervention study
Online: 27 October 2022 (08:50:37 CEST)
Passive and forced drinking harm is prevalent but under-recognized among Chinese adolescents. We educated adolescents on such harm to reduce their intention to drink. Students (n=1244) from 7 secondary schools in Hong Kong participated in a video-based health talk on the harm of passive and forced drinking. Paired t-tests were used to assess changes in their knowledge of passive and forced drinking and of the health and social harm of drinking after the health talk. McNemar's chi-squared test and multivariable logistic regression (yielding adjusted odds ratios, AORs) were used to assess changes in intention to drink and intention to quit. Students were less likely to intend to drink (OR 0.29, 95% CI 0.19-0.42) and more likely to intend to quit drinking (OR 3.50, 1.10-14.6) after the health talk. Increased knowledge of passive drinking was associated with less intention to drink (AOR 0.93, 0.90-0.97), and with increased knowledge of the health harm (adjusted b 0.06, 0.05-0.08) and social harm of drinking (adjusted b 0.12, 0.10-0.16). Similar associations were observed for forced drinking (intention to drink: AOR 0.87, 0.79-0.96; health harm: adjusted b 0.16, 0.12-0.19; social harm: adjusted b 0.36, 0.28-0.43). We provide preliminary evidence that the health talk on passive and forced drinking reduced adolescents' intention to drink.
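For readers unfamiliar with the reported effect sizes, an odds ratio and its Woolf (log-OR) 95% confidence interval can be computed from a 2x2 table as follows. The counts below are invented purely for illustration and are not the study's data:

```python
import math

# Hypothetical 2x2 table: intention to drink before vs after the intervention.
#                 intends   does not intend
# before            200          400
# after              80          520
a, b = 200, 400   # before: yes / no
c, d = 80, 520    # after:  yes / no

or_ = (c / d) / (a / b)                 # odds of intending "after" vs "before"
se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of log-OR (Woolf)
lo = math.exp(math.log(or_) - 1.96 * se)
hi = math.exp(math.log(or_) + 1.96 * se)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # → 0.31 0.23 0.41
```

An OR below 1 with a confidence interval excluding 1, as in this toy table, is the pattern behind the abstract's reported reduction in intention to drink.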
REVIEW | doi:10.20944/preprints202109.0111.v1
Subject: Medicine And Pharmacology, Pathology And Pathobiology Keywords: telehealth; teleoncology; telerehabilitation; telemedicine; coronavirus disease; management; video conferencing; web-based platforms; breast cancer patients
Online: 6 September 2021 (17:34:49 CEST)
Telehealth is the delivery of many health care services and technologies to individuals in different geographical areas and is categorized as asynchronous or synchronous. The coronavirus disease 2019 (COVID-19) pandemic has caused major disruptions in health care delivery to breast cancer (BCa) patients, and there is increasing demand for telehealth services. Globally, telehealth has become an essential means of communication between patients and health care providers. The application of telehealth to the treatment of BCa patients is evolving, and research has increasingly demonstrated its feasibility and effectiveness in improving clinical, psychological, and social outcomes. Two areas of telehealth that have grown significantly in the past decade, and particularly since the beginning of the COVID-19 pandemic, are telerehabilitation and teleoncology. These two technological systems provide opportunities at every stage of the cancer care continuum for BCa patients. We conducted a systematic literature review that examined the use of telehealth services, via their various modes of delivery, among BCa patients, particularly in the areas of screening, diagnosis, and treatment modalities, as well as satisfaction among patients and health care professionals. The advantages of telehealth models of service and the challenges of delivery to patients in remote areas are discussed.
ARTICLE | doi:10.20944/preprints202105.0449.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Explainable Artificial Intelligence; Hopfield Neural Networks; Automatic Video Generation; Data-to-text systems; Software Visualization
Online: 19 May 2021 (14:07:48 CEST)
Hopfield Neural Networks (HNNs) are recurrent neural networks used to implement associative memory, with strengths in tasks such as pattern recognition, optimization, and image segmentation. However, it is often difficult to provide users with good explanations of the results obtained with them, mainly because of the large number of changes in the state of the neurons (and their weights) produced during a machine learning problem. There are currently limited techniques to visualize, verbalize, or abstract HNNs. This paper outlines how automatic video generation systems can be constructed to explain their execution. This work constitutes a novel approach to building explainable artificial intelligence systems in general, and for HNNs in particular, drawing on the theory of data-to-text systems and software visualization approaches. We present a complete methodology for building these kinds of systems; the software architecture is designed, implemented, and tested, and technical details of the implementation are explained. Finally, we apply our approach to create a complete explainer video about the execution of an HNN on a small recognition problem.
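The state-update trace that such an explainer video would narrate can be sketched with a minimal Hopfield recall example. This is an illustrative toy, not the paper's system: Hebbian weights store one pattern, and a single synchronous update recovers it from a corrupted input.

```python
# Minimal Hopfield associative-memory sketch: store a bipolar pattern with the
# Hebbian rule, corrupt one neuron, and recall the pattern in one update step.

def hebbian_weights(pattern):
    n = len(pattern)
    # Hebbian rule with zeroed diagonal: W[i][j] = p_i * p_j for i != j.
    return [[pattern[i] * pattern[j] if i != j else 0 for j in range(n)]
            for i in range(n)]

def update(W, state):
    # One synchronous step: s_i <- sign(sum_j W[i][j] * s_j).
    return [1 if sum(W[i][j] * state[j] for j in range(len(state))) >= 0 else -1
            for i in range(len(state))]

stored = [1, -1, 1, 1, -1, -1, 1, -1]
W = hebbian_weights(stored)
noisy = stored[:]
noisy[0] = -noisy[0]          # corrupt one neuron
recalled = update(W, noisy)   # a single step recovers the stored pattern
print(recalled == stored)     # → True
```

Each intermediate `state` in such a run is exactly the kind of event a data-to-text pipeline could verbalize ("neuron 0 flipped back to +1 because its weighted input was positive").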
ARTICLE | doi:10.20944/preprints202105.0176.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Video Steganography, Least Significant Bit (LSB) Coding, Double key Encryption, Decryption, Password Verification, Signature Verification
Online: 10 May 2021 (11:21:29 CEST)
Data communication over the internet is increasing day by day, making data security a critical concern; as data transmission grows, so does the number of intruders, so data needs to be transmitted very securely. Steganography is a popular method in this field: it hides the secret data within a cover medium so that intruders cannot detect the data's existence. Here, a steganography method is proposed that uses a video file as the cover medium. The method has five main steps. First, the video file is converted into frames and a particular frame is selected for embedding the secret data. Second, the Least Significant Bit (LSB) coding technique is applied together with a double-key security technique. Third, an 8-character password verification process is performed. Fourth, the encrypted video is reversed. Fifth, a signature verification process verifies the encryption and decryption. These five steps are followed in both the encryption and decryption processes.
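The LSB idea at the heart of the scheme can be shown in a few lines. This sketch covers only the bit-embedding step, not the paper's full five-step pipeline (double-key encryption, password and signature verification are omitted), and the pixel values are invented:

```python
# LSB embedding: each secret bit overwrites the least significant bit of one
# pixel (0-255) in a chosen video frame, changing the pixel value by at most 1.

def embed(pixels, bits):
    """Return pixels with their LSBs replaced by the secret bits."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def extract(pixels, n_bits):
    """Read the secret back out of the first n_bits pixel LSBs."""
    return [p & 1 for p in pixels[:n_bits]]

frame = [52, 119, 200, 33, 87, 154, 61, 240]   # one row of a cover frame
secret = [1, 0, 1, 1, 0, 1, 0, 0]

stego = embed(frame, secret)
assert extract(stego, 8) == secret
# LSB changes alter each pixel by at most 1, so the frame looks unchanged:
print(max(abs(a - b) for a, b in zip(frame, stego)))  # → 1
```

The imperceptibility of a ±1 pixel change is what lets the stego frame pass as an ordinary frame of the cover video.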
Subject: Computer Science And Mathematics, Information Systems Keywords: video surveillance; visual layer attack; electrical network frequency (ENF) signal; false frame injection (FFI) attack
Online: 1 April 2019 (09:50:05 CEST)
Over the past few years, the importance of video surveillance in securing national critical infrastructure has significantly increased, with applications that include detecting failures and anomalies. Accompanying this proliferation of video is an increasing number of attacks against surveillance systems. Among these, false frame injection (FFI) attacks, which replay video frames from a previous recording to mask the live feed, have the highest impact. While many attempts have been made to detect FFI frames using features of the video feeds, video analysis is computationally too intensive to be deployed on-site for real-time false frame detection. In this paper, we investigate the feasibility of FFI attacks on compromised surveillance systems at the edge and propose an effective technique to detect injected false video and audio frames by monitoring the surveillance feed using the embedded Electrical Network Frequency (ENF) signal. The ENF operates at a nominal frequency of 60 Hz/50 Hz depending on geographical location and maintains a stable value across the entire power grid interconnection, with minor fluctuations. ENF signals are embedded in the video/audio recordings of surveillance systems connected to the power grid, and the time-varying nature of the ENF component serves as a forensic tool for authenticating the surveillance feed. The paper covers ENF signal collection from a power grid to create a reference database, ENF extraction from recordings using the conventional short-time Fourier transform, and spectrum detection for robust ENF signal analysis in the presence of noise and interference in different harmonics. The experimental results demonstrate the effectiveness of detecting ENF signals and/or their abnormalities under FFI attacks.
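The ENF-extraction step can be sketched as locating the spectral peak near the nominal mains frequency in a short analysis window, which is the core of the STFT-based approach described above. This is an illustrative toy on a synthetic signal, not the paper's implementation; the window length, search band, and step size are assumptions:

```python
import math

# Estimate the instantaneous ENF of a window by scanning DFT power over a
# narrow band around the 60 Hz nominal value and returning the peak frequency.

def dft_power(x, f, fs):
    """Power of signal x at frequency f (Hz), sampled at fs Hz."""
    re = sum(v * math.cos(2 * math.pi * f * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * n / fs) for n, v in enumerate(x))
    return re * re + im * im

def estimate_enf(window, fs, nominal=60.0, search=0.5, step=0.01):
    """Scan nominal +/- search Hz in `step` Hz increments; return the peak."""
    candidates = [nominal - search + k * step
                  for k in range(int(2 * search / step) + 1)]
    return max(candidates, key=lambda f: dft_power(window, f, fs))

# Synthetic mains hum at 59.98 Hz, one 1 s window sampled at 1 kHz:
fs = 1000
window = [math.sin(2 * math.pi * 59.98 * n / fs) for n in range(fs)]
print(round(estimate_enf(window, fs), 2))  # → 59.98
```

A per-window sequence of such estimates is the ENF trace; comparing it against the grid's reference database is what exposes a replayed (injected) feed, whose trace will not match the live grid.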
ARTICLE | doi:10.20944/preprints202308.1429.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: DASH; video streaming; Wireless networks; QoE; Deep learning; Reinforcement Learning algorithms; Deep reinforcement learning; Bandwidth estimation
Online: 21 August 2023 (07:28:16 CEST)
Dynamic adaptive video streaming over HTTP (DASH) plays a crucial role in video transmission across networks. Traditional adaptive bitrate (ABR) algorithms adjust the quality of video segments based on network conditions and buffer occupancy. However, these algorithms rely on fixed rules within a complex environment, making it challenging to reach optimal decisions that account for the overall context. In this paper, we propose a novel deep reinforcement learning-based approach for DASH streaming, focusing on maintaining consistent perceived video quality throughout the streaming session to enhance the user experience. Our approach optimizes the Quality of Experience (QoE) by dynamically controlling the quality-distance factor between consecutive video segments. We evaluate this approach with a simulation model that encompasses diverse wireless network environments and various video sequences, and we compare it with state-of-the-art methods. The experimental results demonstrate significant improvements in QoE, ensuring that users enjoy stable, high-quality video streaming sessions.
Subject: Computer Science And Mathematics, Information Systems Keywords: Electrical Network Frequency (ENF); Proof-of-ENF (PoENF); Consensus; Blockchain; Security; Internet of Video Things (IoVT)
Online: 8 September 2021 (20:42:34 CEST)
The rapid advancement of artificial intelligence (AI) and the wide deployment of the Internet of Video Things (IoVT) enable situation awareness (SAW). The robustness and security of IoVT systems are essential to a sustainable urban environment. While blockchain technology has shown great potential to enable trust-free and decentralized security mechanisms, directly embedding crypto-currency-oriented blockchain schemes into resource-constrained IoVT networks at the edge is not feasible. Leveraging Electrical Network Frequency (ENF) signals extracted from multimedia recordings as region-of-recording proofs, this paper proposes EconLedger, an ENF-based consensus mechanism that enables secure and lightweight distributed ledgers for small-scale IoVT edge networks. The proposed consensus mechanism relies on a novel Proof-of-ENF (PoENF) algorithm in which a validator is qualified to generate a new block if and only if a proper ENF-containing multimedia signal proof is produced within the current round. A decentralized database (DDB) is adopted to guarantee the efficiency and resilience of raw ENF proofs in off-chain storage. A proof-of-concept prototype was developed and tested in a physical IoVT network environment. The experimental results validate the feasibility of EconLedger as a trust-free and partially decentralized security infrastructure for IoVT edge networks.
ARTICLE | doi:10.20944/preprints202104.0318.v1
Subject: Engineering, Transportation Science And Technology Keywords: Kerr frequency comb; Hilbert transform; integrated optics; all-optical signal processing; image processing; video image processing
Online: 12 April 2021 (14:27:20 CEST)
Advanced image processing will be crucial for emerging technologies such as autonomous driving, where objects must be quickly recognized and classified in real time under rapidly changing, poor-visibility conditions. Photonic technologies will be key for next-generation signal and information processing because of their wide bandwidths of tens of terahertz and their versatility. Here, we demonstrate broadband real-time analog image and video processing with an ultrahigh-bandwidth photonic processor that is highly versatile and reconfigurable. It is capable of massively parallel processing of over 10,000 video signals simultaneously in real time, performing key functions needed for object recognition, such as edge enhancement and detection. Our system, based on a soliton crystal Kerr optical micro-comb with a 49 GHz spacing and >90 wavelengths in the C-band, is highly versatile, performing different functions without changing the physical hardware. These results highlight the potential of photonic processing based on Kerr micro-combs for chip-scale, fully programmable, high-speed real-time video processing for next-generation technologies.
ARTICLE | doi:10.20944/preprints202102.0256.v1
Subject: Engineering, Automotive Engineering Keywords: Solar photovoltaics in Poland; scattered generation; video-analytics; 4G migration; CCTV monitoring; Ka-band; lag time
Online: 10 February 2021 (12:42:30 CET)
This paper contains a concise overview of the deployment of scattered solar power plants in Poland, mainly from the perspective of their communication networks, and of how the recent development of the Polish 4G networks has had a very positive impact on the performance of the whole monitoring system (production control and video surveillance), with special emphasis on video analytics due to its higher bandwidth demand. All the information is presented from the point of view of the solar photovoltaics developer I+D Energías and therefore constitutes a real user's experience.
ARTICLE | doi:10.20944/preprints201805.0045.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: robust principal component analysis; video separation; compressive measurements; prior information; optical flow; motion estimation; motion compensation
Online: 2 May 2018 (13:19:49 CEST)
In the context of video background-foreground separation, we propose a compressive online robust principal component analysis (RPCA) with optical flow that recursively separates a sequence of video frames into foreground (sparse) and background (low-rank) components. This separation method can process each video frame from a small set of measurements, in contrast to conventional batch-based RPCA, which processes the full data. The proposed method also leverages multiple forms of prior information by incorporating previously separated background and foreground frames in an n-l1 minimization problem. Moreover, optical flow is utilized to estimate motion between the previous foreground frames and then compensate for it, yielding higher-quality prior foregrounds that improve the separation. Our method is tested on several video sequences in different scenarios for online background-foreground separation from compressive measurements. The visual and quantitative results show that the proposed method outperforms existing methods.
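The low-rank/sparse split behind RPCA-style separation can be illustrated with a plain alternating scheme. This is a hedged sketch of the underlying idea only, not the paper's compressive online n-l1 method: the background is recovered by singular-value thresholding and the foreground by entrywise soft-thresholding of the residual, on a tiny synthetic "video":

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: shrink singular values toward zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def soft(M, tau):
    """Entrywise soft-thresholding: shrink entries toward zero."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)

rng = np.random.default_rng(0)
# Synthetic video: a rank-1 static background (40 pixels x 20 frames)
# plus one bright foreground pixel in one frame.
bg = np.outer(rng.uniform(1, 2, 40), np.ones(20))
fg = np.zeros_like(bg)
fg[5, 10] = 10.0
M = bg + fg

L = np.zeros_like(M)   # low-rank (background) estimate
S = np.zeros_like(M)   # sparse (foreground) estimate
for _ in range(50):    # simple alternating minimization
    L = svt(M - S, tau=1.0)
    S = soft(M - L, tau=0.5)

# The foreground energy ends up in S; background-only pixels stay near zero.
print(S[5, 10] > 5.0, abs(S[0, 0]) < 0.5)
```

The batch scheme above touches the full matrix M each iteration; the paper's contribution is precisely to avoid that, updating the separation per frame from compressive measurements with motion-compensated priors.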
ARTICLE | doi:10.20944/preprints202311.1909.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Keyframe selection; Dimensionality Reduction; ReliefF; Bi-normal separation; Indexing; KD-tree; retrieval of flower video; Deep Flower
Online: 29 November 2023 (16:41:55 CET)
This paper presents a model for the archival and retrieval of videos of natural flowers. To design an efficient video retrieval system, the stages of keyframe selection, feature extraction, feature dimensionality reduction, and indexing are essential for fast browsing and access. Three keyframe selection approaches based on clustering algorithms are proposed, applied after segmenting flower regions from the background. A deep convolutional neural network is used as the feature extractor, so that after keyframe selection each video is represented by a set of keyframes. To reduce the feature dimension of a video, two feature selection methods are utilized, and for efficient archival and fast retrieval an indexing method based on the KD-tree is recommended. For a given query video, similar videos are retrieved in both relative and absolute search modalities. Extensive experimentation was conducted on a relatively large flower video dataset consisting of 7788 videos of 30 different species of flowers, captured with three different devices at different resolutions. The comparative study reveals that the proposed keyframe selection approaches give better results, and that videos retrieved in the absolute search modality, using features selected with the bi-normal separation metric together with indexing, give good results.
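The KD-tree indexing and nearest-neighbour retrieval step can be sketched as follows, assuming reduced per-video feature vectors are already available; `scipy.spatial.cKDTree` stands in for the paper's index, and the archive here is random illustrative data rather than flower-video features.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical reduced feature vectors for archived videos (one row per video)
rng = np.random.default_rng(0)
archive = rng.normal(size=(100, 8))

# Build the KD-tree index once, at archival time
tree = cKDTree(archive)

# A query video's feature vector, here a slightly perturbed copy of video 42
query = archive[42] + 0.01

# Absolute search: retrieve the 3 most similar archived videos
dists, idxs = tree.query(query, k=3)
```

Queries then cost roughly logarithmic time in the archive size instead of a linear scan over all video features.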
ARTICLE | doi:10.20944/preprints202304.0891.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: mobile edge computing (MEC); non-orthogonal multiple access (NOMA); video offloading; resource allocation; deep reinforcement learning (DRL)
Online: 25 April 2023 (07:03:35 CEST)
With the proliferation of video surveillance deployments and related applications, real-time video analysis is critical for intelligent monitoring, autonomous driving, and similar tasks. Achieving high-accuracy, low-latency video stream analysis through traditional cloud computing is non-trivial. In this paper, we propose a non-orthogonal multiple access (NOMA) based edge real-time video analysis framework with one edge server (ES) and multiple user equipments (UEs). A cost minimization problem composed of delay, energy, and accuracy is formulated to improve the QoE of UEs. To solve this problem efficiently, we propose a joint video frame resolution scaling, task offloading, and resource allocation algorithm based on the Deep Q-Learning Network (JVFRS-TO-RA-DQN), which effectively overcomes the sparsity of the single-layer reward function and accelerates training convergence. JVFRS-TO-RA-DQN consists of two DQN networks to reduce the curse of dimensionality: one selects the offloading and resource allocation action, and the other selects the resolution scaling action. Experimental results show that JVFRS-TO-RA-DQN effectively reduces the cost of transmission and computation and converges better than other baselines.
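The delay-energy-accuracy cost that such a framework minimizes can be sketched as a weighted sum; the function name, the weights, and the form of the accuracy term below are illustrative assumptions, not the paper's formulation.

```python
def video_task_cost(delay_s, energy_j, accuracy,
                    w_delay=0.4, w_energy=0.3, w_acc=0.3):
    """Weighted cost of one offloaded video-analysis task: lower is better.

    Accuracy enters as (1 - accuracy) so that higher analysis accuracy
    lowers the cost, trading off against delay and energy consumption.
    """
    return w_delay * delay_s + w_energy * energy_j + w_acc * (1.0 - accuracy)
```

A DQN agent would use the negative of such a cost as its reward when choosing resolution-scaling, offloading, and resource-allocation actions.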
ARTICLE | doi:10.20944/preprints201810.0152.v1
Subject: Medicine And Pharmacology, Surgery Keywords: video assisted thoracic surgery; open thoracotomy; recurrence-free survival; overall survival; positive margins; postoperative length of stay
Online: 8 October 2018 (15:23:21 CEST)
Background: Video assisted thoracoscopic surgery (VATS) has become the recommended approach for treatment of resectable lung cancer. However, no large randomized clinical trial has been conducted formally comparing surgical resections completed by VATS to those done by open thoracotomy (OT) in low volume centers. The current study sought to assess differences in recurrence-free survival (RFS), overall survival (OS), positive margins and postoperative length of stay (LOS) between VATS and OT lobectomies in our center. Method: A single-institution retrospective chart review from May 2005 through May 2015 was conducted. All patients diagnosed with stage I through III lung cancer who underwent surgical resection were selected. Patient and tumor characteristics recorded included age at diagnosis, sex, tobacco use, tumor location (side and lobe), stage, size and receipt of chemotherapy or radiotherapy. Chi-square and Wilcoxon-Mann-Whitney tests were used to compare demographics, tumor characteristics and LOS. Multiple logistic and Cox regression analyses were used to compute relative risk (RR) for positive margins and mortality hazard ratios along with 95 percent confidence intervals (95%CI), respectively. Results: Of the 235 patients, 101 underwent VATS and 134 underwent OT. Age at diagnosis, sex, tobacco use, tumor location, and size were comparable for VATS and OT. No significant difference was observed in the relative risk of positive margins for VATS versus OT, RR = 0.56 (95%CI = 0.26, 1.05). However, VATS had shorter median LOS compared to OT (4 vs. 6 days, respectively), p = 0.002. A comparison of VATS versus OT showed no significant difference in the risk of recurrence, HR = 1.21 (95%CI = 0.74, 2.00), or death, HR = 1.34 (95%CI = 0.88, 2.06), in the intent-to-treat population.
Similarly, no significant differences in recurrence or mortality risk were observed between VATS and OT in analyses conducted separately for each cancer stage group or limited to patients with negative margins. Conclusion: Our study indicates that, compared to OT, VATS leads to shorter LOS while achieving comparable margin status, recurrence-free survival and overall survival regardless of tumor stage at diagnosis.
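The relative risk and its 95% confidence interval reported in abstracts like the one above follow the standard 2x2-table formulas (log-RR standard error, then exponentiation); the counts in the test case are hypothetical, not the study's data.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk and 95% CI from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    The CI is computed on the log scale and exponentiated back.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lo, hi = rr * math.exp(-z * se), rr * math.exp(z * se)
    return rr, (lo, hi)
```

A CI that includes 1.0, as in the positive-margin comparison above, corresponds to a non-significant difference.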
ARTICLE | doi:10.20944/preprints201912.0148.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Quality of Experience; Quality of Service; QoE evaluation; video on demand; QoS correlation; subjective testing
Online: 11 December 2019 (04:46:57 CET)
In addition to the traditional QoS metrics of delay, delay jitter, and packet loss probability (PLP), Quality of Experience (QoE) is now widely accepted as a numerical proxy for actual user experience. The literature reports many mathematical mappings between QoE and QoS. The QoS parameters are measured by network providers using sampling; some papers study sampling errors in QoS measurements, but there is no account of how these sampling errors propagate to QoE evaluation. In this paper, we used industrially acquired measurements of PLP and jitter to evaluate the sampling errors and the correlation in measurements. Focussing on video-on-demand (VoD) applications, we use subjective testing and regression to map QoE metrics onto PLP and jitter. The resulting mathematical functions of QoE and the theory of error propagation were used to evaluate the propagated error in QoE, represented as a confidence interval. Using the UK government's guidelines for sampling, our results indicate that the confidence interval around estimated QoE in a busy hour can span MOS = 1 to MOS = 5 at the targeted operating point of the QoS parameters. These results offer a new perspective on QoE evaluation and are of great significance to all organisations that need to estimate the QoE of VoD applications precisely.
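First-order (delta-method) error propagation of this kind can be sketched as follows: the sampling standard errors of PLP and jitter are pushed through the fitted QoS-to-QoE mapping via its partial derivatives. The linear QoE model and all numbers here are illustrative assumptions, not the paper's fitted regression.

```python
import math

def propagate_qoe_error(f, plp, jitter, sd_plp, sd_jitter, h=1e-6):
    """Delta-method standard error of QoE = f(plp, jitter).

    Partial derivatives are taken numerically (central differences);
    the two sampling errors are assumed independent.
    """
    df_dp = (f(plp + h, jitter) - f(plp - h, jitter)) / (2 * h)
    df_dj = (f(plp, jitter + h) - f(plp, jitter - h)) / (2 * h)
    return math.sqrt((df_dp * sd_plp) ** 2 + (df_dj * sd_jitter) ** 2)

# Illustrative linear QoE model on the 1-5 MOS scale (not the paper's regression)
qoe = lambda p, j: 4.5 - 30.0 * p - 0.05 * j

sd = propagate_qoe_error(qoe, plp=0.01, jitter=10.0, sd_plp=0.005, sd_jitter=2.0)
ci = (qoe(0.01, 10.0) - 1.96 * sd, qoe(0.01, 10.0) + 1.96 * sd)  # 95% CI on MOS
```

Wide sampling errors in PLP and jitter translate directly into a wide MOS confidence interval, which is the effect the paper quantifies.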
ARTICLE | doi:10.20944/preprints202311.0110.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Human violence recognition; video surveillance; real-time; spatial attention; spatial motion extractor; short temporal extractor; global temporal extractor; VioPeru
Online: 2 November 2023 (04:06:10 CET)
Human violence recognition is an area of great interest in the scientific community given its broad spectrum of applications, especially in video surveillance systems, since detecting violence in real time could prevent criminal acts and save lives. Despite the number of existing proposals and studies, most focus on the precision of results, leaving aside efficiency and practical implementation. This work therefore proposes a model that is both effective and efficient in recognizing human violence in real time. The proposed model consists of three modules: a Spatial Motion Extractor (SME), in charge of extracting regions of interest from a frame; a Short Temporal Extractor (STE), whose function is to extract the temporal characteristics of rapid movements; and a Global Temporal Extractor (GTE), responsible for identifying long-lasting temporal features and fine-tuning the model. The proposal was evaluated in terms of efficiency, effectiveness, and ability to operate in real time. Results on the Hockey, Movies, and RWF-2000 datasets demonstrate that this approach is highly efficient compared with other alternatives. A VioPeru dataset, with violent and non-violent videos captured by real video surveillance cameras in Peru, was created to validate real-time applicability; the effectiveness results on this dataset outperform the best existing proposal. Our proposal therefore contributes to efficiency, effectiveness, and real-time operation.
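The idea behind a spatial-motion-extraction stage, isolating the regions of a frame where motion occurs, can be sketched with simple frame differencing; this is an illustrative stand-in, not the paper's SME module, and a real implementation would refine the mask and crop regions of interest.

```python
import numpy as np

def spatial_motion_regions(prev_frame, frame, thresh=25):
    """Binary mask of pixels that changed noticeably between two grayscale frames.

    Casting to int16 avoids uint8 wrap-around when subtracting; the threshold
    suppresses small intensity fluctuations such as sensor noise.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > thresh).astype(np.uint8)
```

Downstream temporal modules would then only need to process the masked regions, which is where the efficiency gain comes from.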
ARTICLE | doi:10.20944/preprints202110.0108.v2
Subject: Social Sciences, Education Keywords: academic meetings; video conferencing; Zoom; private Facebook group; narrative research; COVID-19; self-directed learning; team mindfulness; democratic meetings
Online: 21 October 2021 (12:10:57 CEST)
The online learning necessitated by COVID-19 social-distancing limitations has resulted in hybrid online formats focused on maintaining visual contact among learners and teachers, with Zoom becoming the preferred video-conferencing option for academic meetings. The needs of one voluntary, democratic, self-reflective university research group, grounded in responses to writing prompts, differed in learning focus. Because the group demanded a safe space to encourage and record both self-reflection and creative questioning of other participants, a private Facebook group was chosen over video conferencing to keep the concentration on members' written responses rather than on how they saw themselves (and thought others saw them) on screen. Using a narrative research model initiated in 2015, the group's 2020/21 interaction in a year's worth of Facebook entries, and the year-end feedback received from group participants, are compared with previous years, when the weekly group met in person. The results in relation to COVID-19 limitations indicate that an important aspect of self-directed learning, the trust that comes from team mindfulness, is lost from the democratic nature of these meetings when face-to-face interaction is eliminated. With online meetings the new standard, maintaining trust requires improvements to online virtual meeting spaces.
ARTICLE | doi:10.20944/preprints201802.0099.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Random Linear Network Coding; Mobile Cellular Networks; 4G Long-Term Evolution (LTE); 5G New Radio (NR); Mobile video delivery
Online: 14 February 2018 (13:38:12 CET)
The exponential increase in mobile video delivery will continue with the demand for higher-resolution, multi-view and large-scale multicast video services. The novel fifth generation (5G) 3GPP New Radio (NR) standard will bring a number of new opportunities for optimizing video delivery across both the 5G core and the radio access network. One promising approach for video quality adaptation, throughput enhancement and erasure protection is the use of packet-level random linear network coding (RLNC). In this work, we discuss the integration of RLNC into the 5G NR standard, building upon the ideas and opportunities identified in 4G LTE. We explicitly identify and discuss in detail novel 5G NR features that support RLNC-based video delivery, thus pointing to promising avenues for future research.
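Packet-level RLNC can be sketched over GF(2), where each coded packet is a random XOR combination of the source packets and a receiver can decode once the collected coefficient vectors reach full rank. This is a simplification: practical RLNC typically works over a larger field such as GF(2^8), and the helper names below are illustrative.

```python
import random
import numpy as np

def rlnc_encode(packets, n_coded, seed=0):
    """Encode integer-payload packets over GF(2): each coded packet is a
    random XOR of the source packets, carried with its coefficient vector."""
    rng = random.Random(seed)
    k = len(packets)
    coded = []
    for _ in range(n_coded):
        coeffs = [rng.randint(0, 1) for _ in range(k)]
        payload = 0
        for c, p in zip(coeffs, packets):
            if c:
                payload ^= p
        coded.append((coeffs, payload))
    return coded

def decodable(coded, k):
    """A receiver can decode iff the coefficient matrix has rank k over GF(2)."""
    m = np.array([c for c, _ in coded], dtype=np.uint8)
    rank = 0
    for col in range(k):
        pivot = next((r for r in range(rank, len(m)) if m[r, col]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]   # move the pivot row up
        for r in range(len(m)):
            if r != rank and m[r, col]:
                m[r] ^= m[rank]               # GF(2) row elimination
        rank += 1
    return rank == k
```

Because any full-rank subset of coded packets suffices, a receiver does not care which specific packets are lost, which is the erasure-protection property exploited for video delivery.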
ARTICLE | doi:10.20944/preprints202308.0119.v1
Subject: Medicine And Pharmacology, Cardiac And Cardiovascular Systems Keywords: echocardiography; non-invasive; global myocardial work; thoracotomy; video-assisted thoracoscopic surgery; VATS; lung resection; lung surgery; right ventricle; left ventricle
Online: 2 August 2023 (02:50:51 CEST)
Considering the controversial benefits of video-assisted thoracoscopic surgery (VATS), we intended to evaluate the impact of the surgical approach on cardiac function after lung resection using myocardial work analysis. Echocardiographic data of 45 patients (25 thoracotomy vs. 20 VATS) were retrospectively analyzed. All patients underwent transthoracic echocardiography (TTE) 2 weeks before and after surgery, including two-dimensional speckle tracking and tissue Doppler imaging. No notable changes in left ventricular (LV) function, assessed mainly using the LV global longitudinal strain (GLS), global myocardial work index (GMWI), and global work efficiency (GWE), were observed. Right ventricular (RV) TTE values, including tricuspid annular plane systolic excursion (TAPSE), tricuspid annular systolic velocity (TASV), and RV free-wall GLS (RVFWGLS), indicated greater RV function impairment in the thoracotomy group than in the VATS group [TAPSE (mm): 17.90 ± 3.80 vs. 20.60 ± 3.50, p < 0.018; TASV (cm/s): 12.40 ± 2.90 vs. 14.60 ± 2.50, p < 0.010; RVFWGLS (%): −11.50 ± 8.50 vs. −17.50 ± 9.70, p < 0.033, respectively]. Unlike RV function, LV function remained preserved after lung resection. The thoracotomy group exhibited greater RV function impairment than did the VATS group. Further studies should evaluate the long-term impact of the surgical approach on cardiac function.
ARTICLE | doi:10.20944/preprints202308.0280.v1
Subject: Medicine And Pharmacology, Gastroenterology And Hepatology Keywords: Narrow Band Imaging; Hyperspectral Imaging; Decorrelated Color Space; Video Capsule Endoscopy; Peak-Signal-to-Noise Ratio; Structural Similarity Index Metric; Entropy
Online: 3 August 2023 (10:18:14 CEST)
Video capsule endoscopy (VCE) is increasingly used to decrease discomfort among patients owing to its small size. However, VCE has a major drawback: it lacks narrow band imaging (NBI) functionality. Current VCE offers traditional white light imaging (WLI) only, which performs poorly in the computer-aided detection (CAD) of different types of cancer compared with NBI. Specific cancers, such as esophageal cancer (EC), do not exhibit any early biomarkers, making their early detection difficult. In most cases, the symptoms are unnoticeable, and EC is diagnosed only at later stages, keeping its 5-year survival rate below 20% on average. NBI filters provide particular wavelengths that increase the contrast and enhance certain features of the mucosa, thereby enabling early identification of EC. However, VCE does not have a slot for NBI functionality because its size cannot be increased; hence, NBI image conversion from WLI can currently be achieved only in post-processing. In this study, a complete arithmetic assessment of the decorrelated color space was conducted to generate NBI images from WLI images for VCE of the esophagus. Three parameters, namely, the structural similarity index metric (SSIM), entropy, and peak signal-to-noise ratio (PSNR), were used to assess the simulated NBI images. The results show the good performance of the NBI image reproduction method, with SSIM, entropy difference, and PSNR values of 93.215%, 4.360, and 28.064 dB, respectively.
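The PSNR figure quoted above follows the standard definition, 10·log10(peak²/MSE); a minimal sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(reference, test_img, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((reference.astype(np.float64) - test_img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher values indicate a reproduced (here, simulated NBI) image closer to its reference.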
BRIEF REPORT | doi:10.20944/preprints202306.1418.v1
Subject: Medicine And Pharmacology, Anesthesiology And Pain Medicine Keywords: Styletubation; video-assisted intubating stylet; obesity; super-super obesity; bariatric surgery; laparoscopic sleeve gastrectomy; tracheal intubation; laryngoscopy; videolaryngoscope; anesthesia; difficult airway
Online: 21 June 2023 (02:17:23 CEST)
Direct laryngoscopes and videolaryngoscopes are the dominant endotracheal intubation tools. The styletubation technique (using a video-assisted intubating stylet) has shown advantages in terms of short intubation time, a high success rate, less stimulation, and operator satisfaction. The learning curve can be steep but is easy to overcome if technical pitfalls are avoided. Conditions that make styletubation challenging include secretions/blood, a short/stiff neck, restricted mouth opening and cervical spine mobility, anatomical abnormalities of the head and neck regions, and obesity. In this clinical report, we present the effectiveness and efficiency of routine use of styletubation for tracheal intubation in a super-super obese patient (BMI 103 kg/m2) undergoing bariatric surgery with laparoscopic sleeve gastrectomy.
ARTICLE | doi:10.20944/preprints202303.0023.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Game Design; Variational AutoEncoder (VAE); Image and Video Generation; Bayesian Algorithm; Loss Function; Data Clustering; Data and Image Analytics; MNIST database; Generator and Discriminator
Online: 1 March 2023 (11:17:12 CET)
In recent decades, the Variational AutoEncoder (VAE) model has shown good potential and capability in image generation and dimensionality reduction. The combination of VAE and various machine learning frameworks has also worked effectively in different everyday applications; however, its applicability and effectiveness in modern game design have seldom been explored or assessed, and the use of its feature extractor for data clustering has been minimally discussed in the literature. This paper first explores different mathematical properties of the VAE model, in particular the theoretical framework of the encoding and decoding processes, the achievable lower bound, and the loss functions of different applications; it then applies the established VAE model to generating new game levels within two well-known game settings, and validates the effectiveness of its data clustering mechanism with the aid of the Modified National Institute of Standards and Technology (MNIST) database. Respective statistical metrics and assessments were utilized to evaluate the performance of the proposed VAE model in the aforementioned case studies. Based on the statistical and spatial results, several potential drawbacks and future enhancements of the established model are outlined, with the aim of maximizing the strengths and advantages of VAE for future game design tasks and relevant industrial missions.
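The VAE loss discussed above is the negative evidence lower bound (ELBO); a minimal NumPy sketch using a squared-error reconstruction term and the closed-form Gaussian KL divergence (illustrative, not the paper's exact objective or weighting):

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO for a VAE with a diagonal-Gaussian latent.

    Uses the closed form
      KL(N(mu, sigma^2) || N(0, 1)) = -0.5 * sum(1 + logvar - mu^2 - exp(logvar)),
    which regularizes the latent code toward the standard-normal prior.
    """
    recon = np.sum((x - x_recon) ** 2)                          # reconstruction error
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))   # latent regularizer
    return recon + kl
```

Minimizing this loss trades off faithful reconstruction against a latent space smooth enough to sample new content (e.g., game levels) from.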