ARTICLE | doi:10.20944/preprints201705.0028.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: monocular image; image segment; SIFT; depth measurement; convex hull
Online: 3 May 2017 (09:19:59 CEST)
Recovering the depth of objects from two-dimensional images is one of the most important and basic problems in the computer vision field. In view of the shortcomings of existing depth-estimation methods, a novel approach based on SIFT (the Scale-Invariant Feature Transform) is presented in this paper. The approach can estimate the depths of objects from two images captured by an uncalibrated, ordinary monocular camera. First, the first image is captured; then, with all camera parameters unchanged, the second image is acquired after moving the camera a distance d along the optical axis. Next, image segmentation and SIFT feature extraction are performed on the two images separately, and the objects in the images are matched. Lastly, an object's depth can be computed from the lengths of a pair of straight-line segments. To ensure that the most appropriate pair of straight-line segments is chosen and to reduce computation, the theory of convex hulls and the similarity of triangles are employed. The experimental results show that our approach is effective and practical.
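Under the pinhole model, the triangle-similarity step this abstract describes reduces to a single relation: an object of fixed physical size that projects to segment lengths l1 and l2 before and after the camera moves a distance d toward it along the optical axis lies at depth Z = d·l2/(l2 − l1). A minimal sketch (the function name and variables are illustrative, not from the paper):

```python
def depth_from_axial_motion(l1, l2, d):
    """Estimate object depth for a pinhole camera moved a distance d
    along the optical axis toward the object.

    l1: projected segment length in the first image (pixels)
    l2: projected segment length in the second, closer image (pixels)
    Since l1 = f*L/Z and l2 = f*L/(Z - d) for focal length f and
    physical size L, it follows that Z = d * l2 / (l2 - l1).
    """
    if l2 <= l1:
        raise ValueError("moving toward the object must enlarge its projection")
    return d * l2 / (l2 - l1)

# Example: object at Z = 5 m, f = 1000 px, size 0.5 m, camera moved 1 m closer:
# l1 = 1000*0.5/5 = 100 px, l2 = 1000*0.5/4 = 125 px
print(depth_from_axial_motion(100.0, 125.0, 1.0))  # → 5.0
```

Note that the result depends only on d and the two measured lengths, which is why the method needs no camera calibration.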
ARTICLE | doi:10.20944/preprints202307.0444.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: monocular 3D reconstruction; monocular SLAM comparison; monocular VO comparison; monocular benchmark; 3D reconstruction classification; pure visual 3D reconstruction
Online: 7 July 2023 (10:12:55 CEST)
Pure monocular 3D reconstruction is an ill-posed problem that has attracted the research community's interest due to the affordability and availability of RGB sensors. SLAM, VO, and SFM are disciplines formulated to solve the 3D reconstruction problem and estimate the camera's ego-motion, and many methods have been proposed. However, most of these methods were not evaluated on large datasets or under various motion patterns, were not tested under the same metrics, and were not evaluated following a taxonomy, making their comparison and selection difficult. In this research, we compared ten publicly available SLAM and VO methods following a taxonomy: one method for each category of the primary taxonomy, three machine-learning-based methods, and two updates of the best methods. The goal was to identify the advantages and limitations of each category of the taxonomy and to test whether the addition of machine learning, or the updates made to those methods, improved them significantly. We evaluated each algorithm under the TUM-Mono benchmark and performed an inferential statistical analysis to identify significant differences across its metrics. The results show that sparse-direct methods significantly outperformed the rest of the taxonomy, and that fusing them with machine-learning techniques significantly improves the performance of geometric-based methods from different perspectives.
ARTICLE | doi:10.20944/preprints202308.1118.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: monocular camera; world coordinates; pose measurement; rigid body
Online: 15 August 2023 (10:33:35 CEST)
A method of measuring the absolute pose parameters of a moving rigid body using a monocular camera is proposed to address the calibration difficulties and inconsistencies of repeated measurements of the rigid-body pose with a camera having a varying focal length. The proposed method does not require calibration beforehand. Using more than six non-coplanar control points symmetrically arranged in the rigid-body and world coordinate systems, the rotation and translation matrices between the camera and the two coordinate systems are obtained, and the absolute pose of the rigid body is measured. In this paper, the formulas for the absolute pose measurement of a moving rigid body are deduced systematically and the complete implementation is presented. Position and attitude measurement experiments carried out on a three-axis precision turntable show that the average absolute error in the attitude angle of a moving rigid body measured by an uncalibrated camera at different positions changes by no more than 0.2 degrees. Analysis of the three-dimensional coordinate errors of the centroid of a moving rigid body shows little deviation among measurements made at three camera positions, with a maximum deviation of the average absolute error of 0.53 cm and a maximum deviation of the standard deviation of 0.66 cm. The proposed method can measure the absolute pose of a rigid body and is insensitive to the position of the camera during measurement. This work thus provides guidance for the repeated measurement of the absolute pose of a moving rigid body using a monocular camera.
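Six or more non-coplanar control points are exactly what the Direct Linear Transform (DLT) needs to recover a camera's 3x4 projection matrix without prior calibration, which is the standard linear route to the rotation and translation the abstract describes. The paper's own derivation is not reproduced in the abstract, so the sketch below shows only the generic DLT step, verified against a synthetic camera (all numbers are illustrative):

```python
import numpy as np

def dlt_projection(world_pts, image_pts):
    """Recover a 3x4 projection matrix from n >= 6 non-coplanar 3D
    control points and their 2D projections via the Direct Linear
    Transform: each correspondence contributes two rows of a
    homogeneous system A p = 0, solved by SVD."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    return Vt[-1].reshape(3, 4)  # right singular vector of the smallest singular value

def project(P, X):
    """Project a 3D point with homogeneous matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic check with a known camera P_true = K [I | t], t = (0, 0, 5):
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
world = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                  [1, 1, 0.5], [0.5, 1, 1], [1, 0.5, 1]], float)  # non-coplanar
image = np.array([project(P_true, X) for X in world])
P_est = dlt_projection(world, image)
err = max(np.linalg.norm(project(P_est, X) - uv) for X, uv in zip(world, image))
```

With noiseless correspondences the recovered matrix reprojects the control points essentially exactly; decomposing it into rotation and translation relative to the two coordinate systems is the additional step the paper develops.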
ARTICLE | doi:10.20944/preprints202001.0123.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: static features extraction; dynamic environments; 3D reconstruction; monocular SLAM
Online: 12 January 2020 (15:12:52 CET)
Many classic visual monocular SLAM systems have been developed over the past decades; however, most of them fail when dynamic scenarios dominate. DM-SLAM, built on ORB-SLAM, is proposed for handling dynamic objects in such environments. The article concentrates on two aspects. First, DLRSAC is proposed to extract static features from a dynamic scene based on awareness of the intrinsic difference between moving and static elements, and it is integrated into the initialization of DM-SLAM. Second, we design a candidate map-point selection mechanism based on neighborhood mutual exclusion to balance the accuracy of camera-pose tracking against system robustness in motion scenes. Finally, we conduct experiments on a public dataset and compare DM-SLAM with ORB-SLAM. The experiments verify the superiority of DM-SLAM.
ARTICLE | doi:10.20944/preprints202310.0444.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Monocular Vision; Binocular Vision; Forward Projection; Inverse Projection; Displacement Projection
Online: 8 October 2023 (10:19:31 CEST)
A human eye has about 120 million rod cells and 6 million cone cells. This huge number of light-sensing cells continuously produces a huge quantity of visual signals that flow into the brain for processing. However, the real-time processing of these visual signals does not cause any fatigue to the brain. This fact suggests that human-like vision does not rely on complicated formulas to compute depth, displacement, color, and so on. On the other hand, a human eye is like a PTZ (pan-tilt-zoom) camera. In computer vision, each set of PTZ parameters (i.e., coefficients of pan, tilt, and zoom) requires a dedicated calibration to determine a camera's projection matrix. Since a human eye can produce an infinite number of PTZ parameter sets, it is unlikely that the brain stores an infinite number of calibration matrices for each eye. It is therefore an interesting question whether simpler formulas for computing depth and displacement exist; moreover, such formulas must be calibration-friendly (i.e., easy to calibrate on the fly or on the go). In this paper, we disclose an important discovery: a new solution to 3D projection in a human-like binocular vision system. The purpose of 3D projection in binocular vision is to undertake forward and inverse transformations (or mappings) between coordinates in 2D digital images and coordinates in a 3D analogue scene. The formulas underlying the new solution are accurate, easily computable, easily tunable (i.e., calibratable on the fly or on the go), and could easily be implemented by a neural system (i.e., a network of neurons). Experimental results have validated the discovered formulas.
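The abstract does not state the new formulas, so they are not reproduced here. For context, the classical rectified-stereo relation, depth equals focal length times baseline over disparity, is the textbook binocular computation that such a calibration-friendly solution would simplify upon; a minimal sketch of that baseline only (not the paper's formulation):

```python
def depth_from_disparity(f_px, baseline_m, u_left, u_right):
    """Classical rectified binocular depth: Z = f * b / (u_left - u_right).
    This is the standard textbook relation, shown only as the baseline
    that calibration-friendly alternatives aim to simplify; it is not
    the new solution proposed in the article."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return f_px * baseline_m / disparity

# f = 700 px, baseline 0.06 m, disparity 14 px → Z = 700 * 0.06 / 14 = 3.0 m
print(depth_from_disparity(700.0, 0.06, 320.0, 306.0))  # → 3.0
```

Note that f must be known in pixels, which is exactly the per-zoom calibration burden the article argues a human-like system cannot afford.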
ARTICLE | doi:10.20944/preprints201801.0195.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: spacecraft; structure from motion; monocular vision; component detection; structure analysis
Online: 22 January 2018 (05:11:39 CET)
A monocular vision pose estimation and identification algorithm for use on a small spacecraft in future orbital servicing is studied in this paper. A tracker spacecraft equipped with a short-range vision system is proposed to recover the 3D structural model of a space target in orbit and automatically identify its solar panels and main body using only visual information from an onboard camera. The proposed reconstruction and identification framework is tested using structure-from-motion and point-cloud identification methods. The Efficient Perspective-n-Point (EPnP) algorithm is used for pose estimation. Triangulated points are used for component segmentation by means of orientation histogram descriptors. Experimental results based on laboratory images of a spacecraft model show the effectiveness and robustness of our approach.
ARTICLE | doi:10.20944/preprints202308.2157.v1
Subject: Computer Science And Mathematics, Robotics Keywords: NAO robot; YOLOv8 network; monocular ranging; error compensation model; pose interpolation
Online: 31 August 2023 (09:52:07 CEST)
As a typical visual positioning technique, monocular ranging is widely used in various fields; however, its error grows as the distance increases. The YOLOv8 network offers fast recognition and high accuracy. This paper proposes a method that combines YOLOv8-based recognition with monocular ranging to achieve target localization and grasping for NAO robots. By establishing a visual-distance error compensation model and applying it to correct the estimates of the monocular distance measurement model, the accuracy of the NAO robot's long-distance monocular visual positioning is improved. Additionally, a grasping control strategy based on pose interpolation is proposed. Experiments confirmed the proposed method's advantage in measurement accuracy, and the grasping strategy was implemented to accurately grasp the target object.
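The abstract does not specify the form of the error compensation model, so the sketch below shows one common choice: a raw pinhole range from a known object height, corrected by a low-order polynomial fitted to calibration residuals. The calibration numbers are hypothetical, and the paper's actual model may differ:

```python
import numpy as np

def pinhole_range(focal_px, real_height_m, bbox_height_px):
    """Raw monocular range from a known object height: Z = f * H / h,
    where h is the detected bounding-box height in pixels."""
    return focal_px * real_height_m / bbox_height_px

def fit_compensation(raw_ranges, true_ranges, degree=2):
    """Fit a polynomial mapping raw range estimates to ground truth;
    one plausible form of a distance-dependent error compensation
    model (the paper's exact formulation is not given in the abstract)."""
    return np.polyfit(raw_ranges, true_ranges, degree)

def compensated_range(coeffs, raw_range):
    """Apply the fitted correction to a raw range estimate."""
    return float(np.polyval(coeffs, raw_range))

# Hypothetical calibration data: raw estimates drift long with distance.
raw = np.array([0.52, 1.08, 1.67, 2.30, 2.96])   # raw monocular estimates (m)
true = np.array([0.50, 1.00, 1.50, 2.00, 2.50])  # measured ground truth (m)
coeffs = fit_compensation(raw, true)
```

The correction is cheap to evaluate at runtime, which matters on a resource-limited platform like the NAO.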
ARTICLE | doi:10.20944/preprints202107.0228.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Myopia; monocular form deprivation (MFD); inflammation; Fallopia Japonica (FJ); Prunella Vulgaris (PV)
Online: 9 July 2021 (15:21:52 CEST)
The increased global incidence of myopia requires the establishment of therapeutic approaches. Previous studies have suggested that inflammation plays an important role in the development and progression of myopia. We used human retinal pigment epithelium (RPE) cells to study the molecular mechanisms by which Fallopia japonica extract (FJE) and Prunella vulgaris extract (PVE) lower ocular inflammation, examined the effects of FJE and PVE in a monocular form deprivation (MFD)-induced hamster model, and explored the role of inflammatory cytokines in myopia. Expression levels of IL-6, IL-8, and TNF-α were upregulated in RPE cells treated with IL-6 and TNF-α. FJE + PVE reduced IL-6, IL-8, and TNF-α expression in RPE cells. Furthermore, FJE and PVE inhibited inflammation by attenuating the phosphorylation of protein kinase B (AKT) and the nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) pathway. In addition, we report two resveratrol + ursolic acid compounds from FJ and PV and their inhibitory activities against IL-6, IL-8, and TNF-α expression in RPE cells treated with IL-6 and TNF-α. FJE, PVE, and FJE + PVE were applied to MFD hamsters, and their axial length was measured after 21 days. The axial length showed statistically significant differences between phosphate-buffered-saline-treated and FJE-, PVE-, and FJE + PVE-treated MFD eyes. FJE + PVE suppressed the expression of IL-6, IL-8, and TNF-α. It also inhibited myopia-related transforming growth factor beta (TGF-β1), matrix metalloproteinase (MMP)-2, and NF-κB expression while increasing type I collagen expression. Overall, these results suggest that FJE + PVE may have a therapeutic effect on myopia and could be used as a potential treatment option.
ARTICLE | doi:10.20944/preprints202001.0319.v1
Subject: Arts And Humanities, Architecture Keywords: monocular depth cues; luminance contrast; colour; visual arts; image plane; human perception; brain; 3D structure; figure-ground; Gestalt Theory
Online: 27 January 2020 (01:54:27 CET)
Victor Vasarely’s (1906-1997) important legacy to the study of human perception is brought to the forefront and discussed. A large part of his impressive work conveys the appearance of striking three-dimensional shapes and structures on a large-scale pictorial plane. Current perception science explains such effects by invoking brain mechanisms for the processing of monocular (2D) depth cues. In this study, we illustrate and explain the local effects of 2D color and contrast cues on perceptual organization in terms of figure-ground assignments, i.e., which local surfaces are likely to be seen as “nearer” or “bigger” in the image plane. Paired configurations are embedded in a larger, structurally ambivalent pictorial context inspired by some of Vasarely’s creations. The figure-ground effects these configurations produce reveal a significant correlation between perceptual solutions for “nearer” and “bigger” when no other monocular depth cues are given in the image. Consistent with previous findings on similar, albeit simpler, visual displays, a specific color may compete with luminance contrast in resolving the planar ambiguity of a complex pattern context. Vasarely intuitively understood, and successfully exploited, this kind of subtle context effect in his art, well before empirical investigations set out to study and explain its genesis in terms of information processing by the visual brain.