ARTICLE | doi:10.20944/preprints202112.0140.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Image Recognition; Preference Net
Online: 8 December 2021 (14:43:39 CET)
Accuracy and computational cost are the main challenges of deep neural networks in image recognition. This paper proposes an efficient ranking-reduction-to-binary-classification approach using a new feed-forward network and feature selection based on ranking the image pixels. The Preference Net (PN) is a novel deep ranking learning approach based on the Preference Neural Network (PNN), which uses a new ranking objective function and a positive smooth staircase (PSS) activation function to accelerate the ranking of image pixels. PN has a new type of weighted kernel based on Spearman ranking correlation, instead of convolution, to build the feature matrix. PN employs multiple kernels of different sizes to partially rank image pixels in order to find the best feature sequence. PN consists of multiple PNNs that share an output layer, with a separate PNN for each ranker kernel. The output results are converted to classification accuracy using a score function. Using a weighted-average ensemble of the PN models for each kernel, PN achieves promising results compared to the latest deep learning (DL) networks on the CIFAR-10 and Fashion-MNIST datasets, in terms of both accuracy and lower computational cost.
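The ranking-kernel idea can be sketched in a few lines of numpy: instead of a convolution's dot product, each kernel position scores the Spearman rank correlation between an image patch and a template. This is an illustrative reconstruction, not the paper's implementation; the function names and the stride-1, valid-padding sliding scheme are assumptions.

```python
import numpy as np

def rankdata(x):
    # ordinal ranking of a flattened array (ties broken by position)
    ranks = np.empty(x.size, dtype=float)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, x.size + 1)
    return ranks

def spearman_kernel(patch, template):
    # Spearman correlation = Pearson correlation of the ranks
    rp, rt = rankdata(patch.ravel()), rankdata(template.ravel())
    rp -= rp.mean(); rt -= rt.mean()
    return float(rp @ rt / np.sqrt((rp @ rp) * (rt @ rt)))

def rank_feature_map(image, template):
    # slide the ranking kernel over the image (stride 1, valid padding)
    k = template.shape[0]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = spearman_kernel(image[i:i+k, j:j+k], template)
    return out
```

Because ranks discard absolute intensities, the response is invariant to monotone brightness changes, which is one plausible motivation for ranking over convolution.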
ARTICLE | doi:10.20944/preprints201612.0075.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: image recognition bases location; indoor positioning; RGB-D images; LiDAR; DataBase; mobile computing; image retrieval
Online: 15 December 2016 (07:17:35 CET)
This paper describes the first results of an Image Recognition Based Location (IRBL) system for mobile applications, focusing on the procedure used to generate a database of range images (RGB-D). In an indoor environment, prior spatial knowledge of the surroundings is needed to estimate the camera position and orientation. To achieve this objective, a complete 3D survey of two different environments (the Bangbae metro station in Seoul and the E.T.R.I. building in Daejeon, Republic of Korea) was performed using a LiDAR (Light Detection And Ranging) instrument, and the obtained scans were processed to obtain a spatial model of each environment. From this, two databases of reference images were generated using dedicated software developed by the Geomatics group of Politecnico di Torino (ScanToRGBDImage). This tool synthetically generates a set of RGB-D images centered at each scan position in the environment. Later, the external parameters (X, Y, Z, ω, φ, κ) and the range information extracted from the retrieved DB images are used as reference information for pose estimation of a set of acquired mobile pictures in the IRBL procedure. In this paper, the survey operations, the approach for generating the RGB-D images, and the IRBL strategy are reported. Finally, the analysis of the results and the validation test are described.
Subject: Engineering, Automotive Engineering Keywords: forest fire; image recognition; graph neural network
Online: 13 July 2021 (11:31:18 CEST)
Forest fire identification is important for forest resource protection. Effective monitoring of forest fires requires the deployment of multiple monitors with different viewpoints, while most traditional recognition models can only recognize images from a single source. By ignoring the information from images with different viewpoints, these models produce high rates of missed and false alarms. In this paper, we propose a graph neural network model based on the similarity of dynamic features of multi-view images to improve the accuracy of forest fire recognition. The input features of the nodes on the graph are converted into relational features of different gallery pairs by establishing pairs (nodes) representing different viewpoint images and gallery images. The new feature-gallery relationship is used to update the image gallery with dynamic features, in order to estimate the similarity between images and improve the recognition rate of the model. In addition, to reduce the complexity of image pre-processing and to extract key features from images effectively, this paper also proposes a dynamic feature extraction method for fire regions based on image segmentability. By setting threshold values in the HSV color space, the fire region is segmented from the image, and the dynamic features of successive frames of the fire region are extracted. The experimental results show that, compared with the ResNet baseline, our method is more effective in identifying forest fires, improving recognition accuracy by 2%. The proposed scheme also adapts to different forest fire scenes, with better generalization and anti-interference ability.
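The HSV-threshold segmentation step can be illustrated with a minimal numpy sketch. The hue/saturation/value cut-offs below are placeholder values, not the thresholds used in the paper, and `area_growth` is just one hypothetical dynamic feature computed over consecutive frames.

```python
import numpy as np

def fire_mask(hsv, h_range=(0.0, 0.17), s_min=0.5, v_min=0.6):
    # hsv: float array (H, W, 3) with all channels scaled to [0, 1];
    # the threshold values here are illustrative, not the paper's.
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h >= h_range[0]) & (h <= h_range[1]) & (s >= s_min) & (v >= v_min)

def area_growth(prev_mask, cur_mask):
    # simple dynamic feature: relative growth of the segmented fire area
    a0, a1 = prev_mask.sum(), cur_mask.sum()
    return (a1 - a0) / max(a0, 1)
```

A sequence of such per-frame features (area, growth rate, centroid drift) is the kind of input a similarity-based graph model could compare across viewpoints.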
ARTICLE | doi:10.20944/preprints202308.1580.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Face Recognition; image classification, Feature Extraction
Online: 22 August 2023 (13:22:53 CEST)
Selecting the right machine learning classifier is crucial for image classification and face recognition. This study examines the effectiveness of four face recognition classifiers: Support Vector Machines (SVM), Random Forest, K-Nearest Neighbors (KNN), and Neural Networks. The Labeled Faces in the Wild (LFW) dataset was analyzed using Principal Component Analysis (PCA), and the classifiers were rigorously trained and evaluated on the extracted features. Comparing classifier performance is an insightful way to identify their strengths and weaknesses, and a visual representation of each classifier's performance gives a complete picture of its capabilities. By selecting the most appropriate classifier, the results of this study contribute to advances in image classification, recognition, and biometric identification. The comparative analysis demonstrated that the Neural Network classifier was exceptionally accurate and proficient at recognizing faces from the LFW dataset when combined with PCA for feature extraction.
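The PCA-plus-classifier pipeline can be sketched with plain numpy: fit principal axes ("eigenfaces") via SVD, project the images, and classify in the reduced space with a brute-force KNN. This is an illustrative reconstruction under our own naming, not the authors' code.

```python
import numpy as np

def pca_fit(X, n_components):
    # X: (n_samples, n_pixels) flattened face images
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]          # principal axes ("eigenfaces")

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(Xtr, ytr, Xte, k=3):
    # brute-force k-nearest-neighbour majority vote in PCA space
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in ytr[idx]])
```

The same projected features could equally be fed to an SVM, random forest, or neural network, which is exactly the comparison the study performs.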
ARTICLE | doi:10.20944/preprints202304.1242.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: visual intelligence; object detection; image processing; action recognition; autonomous vehicles; machine learning
Online: 30 April 2023 (02:50:07 CEST)
In the context of Shared Autonomous Vehicles, the need to monitor the environment inside the car will be crucial. This article focuses on the application of deep learning algorithms to detect objects, namely lost/forgotten items, to inform the passengers, and aggressive items, to monitor whether violent actions may arise between passengers. For object detection, public datasets (COCO and TAO) were used to train state-of-the-art algorithms such as YOLOv5. For violent action detection, the MoLa InCar dataset was used to train state-of-the-art algorithms such as I3D, R(2+1)D, SlowFast, TSN, and TSM. Finally, an embedded automotive solution was used to demonstrate both methods running in real time.
ARTICLE | doi:10.20944/preprints201901.0257.v1
Subject: Chemistry And Materials Science, Biomaterials Keywords: Hyper-spectral imaging system; Spectral characteristics; Image processing; Threshold method; Bloodstains recognition
Online: 25 January 2019 (14:54:14 CET)
The identification of bloodstains is one of the most important approaches to obtaining evidence in criminalistics. A threshold method based on spectral correlation coefficients and interclass variance is proposed in this paper; it is a non-contact, non-destructive method for quickly identifying bloodstains. The spectra of bloodstains and other suspected substances were extracted from their hyper-spectral images. The correlation coefficients of these spectra and the interclass variances were then calculated to analyze the differences between substances, and the best blood recognition threshold was determined to be 0.9. After preprocessing to eliminate systematic errors, experiments with the threshold of 0.9 were carried out to identify bloodstains on calico and on a red T-shirt. The method identifies bloodstains among other non-blood substances both quickly and efficiently: the blood extraction rate reaches 93.35% and 89.19%, respectively. It is an important step toward non-contact, non-destructive bloodstain identification in forensic casework.
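The correlation-threshold decision can be written out directly: compute the correlation between a pixel spectrum and a reference blood spectrum and apply the paper's threshold of 0.9. The function and variable names below are ours, and Pearson correlation is assumed as the spectral correlation coefficient.

```python
import numpy as np

def is_bloodstain(spectrum, reference, threshold=0.9):
    # classify a pixel spectrum as blood if its correlation with the
    # reference blood spectrum meets the threshold (0.9 in the paper)
    a = spectrum - spectrum.mean()
    b = reference - reference.mean()
    r = a @ b / np.sqrt((a @ a) * (b @ b))
    return r >= threshold
```

Applied pixel-by-pixel across a hyper-spectral cube, this yields a binary bloodstain map without any contact with the evidence.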
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Amharic script; Attention mechanism; OCR; Encoder-decoder; Text-image
Online: 15 October 2020 (13:42:28 CEST)
At present, the growth of digitization and worldwide communication makes OCR systems for exotic languages a very important task. In this paper, we attempt to develop an OCR system for one such language with a unique script, Amharic. Motivated by the recent success of the attention mechanism in Neural Machine Translation (NMT), we extend the attention mechanism to Amharic text-image recognition. The proposed model consists of CNNs and attention-embedded recurrent encoder-decoder networks, integrated following the configuration of the seq2seq framework. The attention network parameters are trained in an end-to-end fashion, and the context vector is injected, together with the previously predicted output, at each time step of decoding. Unlike existing OCR models that minimize the CTC objective function, the new model minimizes the categorical cross-entropy loss. The performance of the proposed attention-based model is evaluated on test datasets from the ADOCR database, which consists of both printed and synthetically generated Amharic text-line images, achieving promising results with CERs of 1.54% and 1.17%, respectively.
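A single attention step of such a decoder can be sketched in numpy. A dot-product score is used here purely for brevity (the paper's scoring function may differ); the returned context vector is what would be injected, together with the previous prediction, at each decoding step.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(decoder_state, encoder_states):
    # encoder_states: (T, d) hidden states over the text-line image;
    # decoder_state: (d,) current decoder hidden state
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # context vector, shape (d,)
    return context, weights
```

The attention weights show which horizontal positions of the text-line image the decoder attends to when emitting each character.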
ARTICLE | doi:10.20944/preprints202012.0237.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Biometrics; Face Recognition; Single Sample Face Recognition; Binarized Statistical Image Features; K-Nearest Neighbors
Online: 9 December 2020 (18:25:02 CET)
Single sample face recognition (SSFR) is a computer vision challenge. In this scenario, there is only one example from each individual on which to train the system, making it difficult to identify persons in unconstrained environments, particularly when dealing with changes in facial expression, posture, lighting, and occlusion. This paper suggests a different method based on a variant of the Binarized Statistical Image Features (BSIF) descriptor called Multi-Block Color-Binarized Statistical Image Features (MB-C-BSIF) to resolve the SSFR problem. First, the MB-C-BSIF method decomposes a facial image into three channels (i.e., red, green, and blue), then it divides each channel into equal non-overlapping blocks to select the local facial characteristics that are subsequently employed in the classification phase. Finally, the identity is determined by calculating the similarities among the characteristic vectors using the distance measure of the k-nearest neighbors (K-NN) classifier. Extensive experiments on several subsets of the unconstrained Alex & Robert (AR) and Labeled Faces in the Wild (LFW) databases show that MB-C-BSIF achieves superior results in unconstrained situations when compared to current state-of-the-art methods, especially when dealing with changes in facial expression, lighting, and occlusion. Furthermore, the suggested method employs algorithms with lower computational cost, making it ideal for real-time applications.
ARTICLE | doi:10.20944/preprints202308.0850.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Image recognition; Micro AR marker; Camera parameter control; Iterative recognition
Online: 11 August 2023 (02:53:04 CEST)
This paper presents a novel dynamic camera parameter control method for position and posture estimation of highly miniaturized AR markers (micro AR markers) using a low-cost general-purpose camera. The proposed method iteratively calculates the marker's position and posture until they converge to a specified accuracy, while dynamically updating the camera's zoom, focus, and other parameter values based on the detected marker's depth distance. For a 10 mm square micro AR marker, the proposed system demonstrated recognition accuracy better than ±1.0% for depth distance and 2.5° for posture angle, with a maximum recognition range of 1.0 m. In addition, the iterative calculation time was at most 0.7 seconds for the initial detection of the marker. These experimental results suggest that the proposed method and system can be applied to precise robotic handling of small objects at low cost.
ARTICLE | doi:10.20944/preprints202309.0290.v1
Subject: Engineering, Bioengineering Keywords: Intelligent Image Recognition; Left and Right Upper Limb Dislocation Surgery; Accuracy Rate; Recall Rate; IRB
Online: 5 September 2023 (09:20:54 CEST)
Our image recognition system judges, through our deep learning model, whether the upper limb in an image is the left or the right upper limb, so that the doctor can then determine the correct surgical position. The experimental results show that the precision rate and recall rate of the proposed intelligent image recognition system for preventing left-right confusion in upper limb dislocation surgery reach 98% and 93%, respectively. This proves that our intelligent image recognition system can indeed assist orthopedic surgeons in preventing left-right errors in upper limb surgery. Based on the prototype experimental results, this work has also completed IRB application approval and will conduct a second phase of human trials in the future. The research results of this paper should be of great benefit and research value to upper limb orthopedic surgery.
ARTICLE | doi:10.20944/preprints202306.0152.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Fine-grained Image Recognition; Yolov5; Transformer Encoder Block; Attention Mechanism
Online: 2 June 2023 (08:09:14 CEST)
Fine-grained image classification remains an ongoing challenge in the computer vision field; it is specifically intended to identify objects within sub-categories. It is a difficult task because inter-class variance is minimal while intra-class variance is substantial. Current methods address the issue by first locating selective regions with Region Proposal Networks (RPN), object localization, or part localization, and then applying a CNN or SVM classifier to those regions. Our approach, in contrast, simplifies the process by implementing single-stage, end-to-end feature encoding with a localization method, which leads to improved feature representations of individual tokens/regions by integrating transformer encoder blocks into the Yolov5 backbone. These transformer encoder blocks, with their self-attention mechanism, effectively capture global dependencies and enable the model to learn relationships between distant regions, improving the model's ability to understand context and capture long-range spatial relationships in the image. We also replaced the Yolov5 detection heads with three transformer heads at the output for object recognition, using the discriminative and informative feature maps from the transformer encoder blocks. We established the potential of a single-stage detector for the fine-grained image recognition task by achieving a state-of-the-art accuracy of 93.4%, outperforming the existing Yolov5 model. The effectiveness of our approach is assessed on the Stanford Cars dataset, which includes 16,185 images of 196 classes of vehicles with highly similar visual appearances.
ARTICLE | doi:10.20944/preprints202307.1483.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Artificial intelligence; deep learning; transfer learning; image classification; fresco
Online: 21 July 2023 (08:17:41 CEST)
The unique characteristics of frescoes on overseas Chinese buildings attest to the integration and historical background of Chinese and Western cultures. Reasonable analysis and preservation of overseas Chinese frescoes can support the sustainable development of culture and history. This research adopts image analysis technology based on artificial intelligence and proposes a ResNet-34 model and method integrating transfer learning. This deep learning model can identify and classify the sources of overseas Chinese frescoes and can effectively deal with problems such as the small number of fresco images on overseas Chinese buildings, poor image quality, difficulty of feature extraction, and similar patterns, text, and styles. The experimental results show that the training process of the proposed model is stable. On the constructed Jiangmen and Haikou fresco (JHD) datasets, the final accuracy is 98.41% and the recall rate is 98.53%. These evaluation indicators are superior to those of classic models such as AlexNet, GoogLeNet, and VGGNet. The model thus has strong generalization ability, is not prone to overfitting, and can effectively identify and classify the cultural connotations and regions of the frescoes.
ARTICLE | doi:10.20944/preprints202310.0099.v1
Subject: Engineering, Bioengineering Keywords: Intelligent Image Recognition; Left and Right Upper Limb Dislocation Surgery; Accuracy Rate; Recall Rate; IRB
Online: 3 October 2023 (09:17:35 CEST)
Our image recognition system judges, through our deep learning model, whether the upper limb in an image is the left or the right upper limb, so that the doctor can then determine the correct surgical position. The experimental results show that the precision rate and recall rate of the proposed intelligent image recognition system for preventing left-right confusion in upper limb dislocation surgery reach 98% and 93%, respectively. This proves that our artificial intelligence image recognition system, AIIRS, can indeed assist orthopedic surgeons in preventing left-right errors in upper limb surgery. Based on the prototype experimental results, this work has also completed IRB application approval and will conduct a second phase of human trials in the future. The research results of this paper should be of great benefit and research value to upper limb orthopedic surgery.
ARTICLE | doi:10.20944/preprints201803.0068.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: tabletop system; user position identification; infrared image recognition; multi-touch gesture; FTIR panel; system usability
Online: 8 March 2018 (16:15:54 CET)
A tabletop system can facilitate multi-user collaboration in a variety of settings, including small meetings, group work, and education and training exercises. The ability to identify which users are touching the table, and where they are positioned, can promote collaborative work among participants, so methods have been studied that involve attaching sensors to the table, the chairs, or the users themselves. An effective method of recognizing user actions without placing a burden on the user would be some type of visual process, so a method that processes multi-touch gestures by visual means is desirable. This paper describes the development of a multi-touch tabletop system using infrared image recognition for user position identification and presents the results of touch-gesture recognition experiments and a system usability evaluation. Using an FTIR touch panel and infrared light, this system picks up the shadow area of the user's hand with an infrared camera during touch operations and estimates the user's position by image recognition. The multi-touch gestures prepared for this system include an operation to change the direction of an object to face the user and a copy operation in which two users generate duplicates of an object. The average recognition rates of the change-direction gesture and the copy gesture were found to be 96% and 85%, respectively. In addition, the system usability evaluation revealed that prior learning was easy and that system operations could be easily performed.
ARTICLE | doi:10.20944/preprints202103.0520.v1
Subject: Engineering, Automotive Engineering Keywords: Intuitive learning; Dynamic response; Small-scale model; Image-recognition; Shaking table
Online: 22 March 2021 (11:21:47 CET)
In recent years, more and more studies have highlighted the advantages of complementing traditional master classes with additional activities that improve students' learning experience. This combination of teaching techniques is especially advisable in the field of structural engineering, where intuition about the structural response is of vital importance for understanding the studied concepts. This paper deals with the introduction of a new (and more encouraging) educational tool to intuitively introduce students to the dynamic response of structures excited by an educational shaking table. Most educational structural health monitoring systems use sensors to determine the dynamic response of the structure. The proposed tool is based on a radically different approach, relying on low-cost image-recognition techniques: it only requires an amateur camera, a black background, and a computer. In this study, the effects of both the camera location and the image quality are also evaluated. Finally, to validate the applicability of the proposed methodology, the dynamic response of small-scale buildings of different typologies is analyzed. In addition, a series of surveys was conducted to evaluate the activity in terms of students' satisfaction and the actual acquisition and strengthening of knowledge.
ARTICLE | doi:10.20944/preprints202112.0376.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Large-Scale Image Classification; Printed Chinese Character Recognition; Data Synthesis; GoogLeNet-GAP; Transfer Learning
Online: 22 December 2021 (16:31:53 CET)
In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters by using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process. Three background models are proposed to simulate the background noise and anti-counterfeiting patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet architecture by replacing the FC layer with a global average pooling layer to avoid overfitting caused by the massive amount of training data; consequently, the number of model parameters was reduced. Finally, we employed the transfer learning technique to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods, demonstrating the effectiveness of the proposed framework, which exhibited a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
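The FC-to-GAP substitution can be shown in a few lines: global average pooling reduces each channel's feature map to a single value, so the classifier on top needs only a small linear layer instead of a dense layer over all flattened activations. This sketch is ours, not the paper's code.

```python
import numpy as np

def global_average_pool(feature_maps):
    # (C, H, W) -> (C,): one value per channel, replacing the dense layer
    return feature_maps.mean(axis=(1, 2))

def classify(feature_maps, W, b):
    # a linear layer on top of GAP keeps the parameter count at
    # C * n_classes + n_classes, far below a flattened dense layer
    logits = W @ global_average_pool(feature_maps) + b
    return int(np.argmax(logits))
```

With 13,070 output classes, cutting the pre-classifier dimensionality from C*H*W down to C is a substantial reduction in parameters, which is the anti-overfitting rationale the abstract gives.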
ARTICLE | doi:10.20944/preprints202301.0162.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Content-based image classification; Data curation and preparation; Convolutional neural networks (CNN); Deep learning; Artificial intelligence (AI)
Online: 9 January 2023 (10:59:31 CET)
Background: MR image classification in datasets collected from multiple sources is complicated by inconsistent and missing DICOM metadata. We therefore aimed to establish a method for the efficient automatic classification of MR brain sequences. Methods: Deep convolutional neural networks (DCNN) were trained as one-vs-all classifiers to differentiate between six classes: T1 weighted (w), contrast-enhanced T1w, T2w, T2w-FLAIR, ADC, and SWI. Each classifier yields a probability, allowing threshold-based and relative probability assignment while excluding images with low probability (label: unknown; an open-set recognition problem). Data from three high-grade glioma (HGG) cohorts were assessed; C1 (320 patients, 20101 MRI images) was used for training, while C2 (197, 11333) and C3 (256, 3522) were used for testing. Two raters manually checked images through an interactive labeling tool. Finally, MR-Class's added value was evaluated via the performance of radiomics models for progression-free survival (PFS) prediction in C2, using the concordance index (C-I). Results: Approximately 10% annotation errors were observed in each cohort between the DICOM series descriptions and the derived labels. MR-Class accuracy was 96.7% [95%-CI: 95.8, 97.3] for C2 and 94.4% [93.6, 96.1] for C3. 620 images were misclassified; manual assessment of these frequently showed motion artifacts or anatomy altered by large tumors. Implementation of MR-Class increased the PFS model C-I on average by 14.6% compared to a model trained without MR-Class. Conclusions: We provide a DCNN-based method for sequence classification of brain MR images and demonstrate its usability in two independent HGG datasets.
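The relative-probability assignment with open-set rejection can be sketched as follows. The 0.95 cut-off and the label strings are illustrative assumptions, not the settings used by MR-Class.

```python
import numpy as np

def assign_label(probs, labels, min_prob=0.95):
    # probs: outputs of the six one-vs-all classifiers for one image
    # (they need not sum to 1); reject if even the best is uncertain
    best = int(np.argmax(probs))
    if probs[best] < min_prob:
        return "unknown"          # open-set rejection
    return labels[best]
```

Rejecting low-probability images is what lets the classifier cope with sequences outside its six trained classes instead of silently mislabeling them.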
REVIEW | doi:10.20944/preprints202105.0127.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Image Acquisition; Image preprocessing; Image enhancement; beatboxing; segmentation
Online: 7 May 2021 (09:09:14 CEST)
Human beatboxing is a vocal art that uses the speech organs to produce vocal drum sounds and imitate musical instruments. Beatbox sound classification is a current challenge with applications in automatic database annotation and music-information retrieval. In this study, a large-vocabulary human beatbox sound recognition system was developed by adapting the Kaldi toolbox, a widely used tool for automatic speech recognition. The corpus consisted of eighty boxemes, recorded repeatedly by two beatboxers. The sounds were annotated and transcribed for the system by means of a beatbox-specific morphographic writing system (Vocal Grammatics). Image processing techniques play a vital role in image acquisition, pre-processing, clustering, segmentation, and classification for different kinds of images, such as fruit, medical, vehicle, and digital text images. In this study, unwanted noise is removed from the various images and enhancement techniques are applied, such as contrast-limited adaptive histogram equalization, Laplacian and Haar filtering, unsharp masking, sharpening, high-boost filtering, and color models; clustering algorithms are then used for pattern analysis, grouping, decision-making, and machine learning, and regions are segmented using binary, K-means, and Otsu segmentation algorithms. The images are classified with the help of SVM and K-Nearest Neighbour (KNN) classifiers to produce good results.
ARTICLE | doi:10.20944/preprints202102.0189.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: image quality assessment; image databases; superpixels; color image; color space; image quality measures
Online: 8 February 2021 (11:11:47 CET)
Objective Image Quality Assessment (IQA) measures play an increasingly important role in the evaluation of digital image quality. New IQA indices are expected to be strongly correlated with subjective observer evaluations expressed by MOS/DMOS scores. One such recently proposed index is the SuperPixel-based SIMilarity (SPSIM) index, which uses superpixel patches instead of the rectangular pixel grid. In this paper, the authors propose three modifications of the SPSIM index. For this purpose, the color space used by SPSIM was changed, and the way SPSIM determines similarity maps was modified using methods derived from the algorithm for computing the MDSI index; the third modification is a combination of the first two. These three new quality indices were used in the assessment process. Experimental results obtained on many color images from five image databases demonstrate the advantages of the proposed SPSIM modifications.
ARTICLE | doi:10.20944/preprints202007.0686.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: document scanning; whiteboard capture; image enhancement; image alignment; image registration; image quality assessment
Online: 28 July 2020 (14:03:51 CEST)
The move from paper to online is not only necessary for remote working but also significantly more sustainable. This trend has seen a rising need for high-quality digitization of content from pages and whiteboards into sharable online material. But capturing this information is not always easy, nor are the results always satisfactory. Available scanning apps vary in their usability and do not always produce clean results, retaining surface imperfections from the page or whiteboard in their output images. CleanPage, a novel smartphone-based document and whiteboard scanning system, is presented. CleanPage requires one button-tap to capture, identify, crop, and clean an image of a page or whiteboard. Unlike equivalent systems, no user intervention is required during processing, and the result is a high-contrast, low-noise image with a clean homogeneous background. Results are presented for a selection of scenarios showing the versatility of the design. CleanPage is compared with two market-leading scanning apps using two testing approaches: real paper scans and ground-truth comparisons. These comparisons are achieved by a new testing methodology that allows scans to be compared to unscanned counterparts by using synthesized images. Real paper scans are tested using image quality measures. An evaluation of standard image quality assessments is included in this work, and a novel quality measure for scanned images is proposed and validated. The user experience for each scanning app is assessed, showing CleanPage to be fast and easier to use.
ARTICLE | doi:10.20944/preprints202010.0323.v1
Subject: Engineering, Automotive Engineering Keywords: Image segmentation; sonar image; ocean engineering; morphological image processing
Online: 15 October 2020 (13:10:41 CEST)
Segmenting sonar images has remained a difficult problem for years; most are noisy images with unavoidable blur after noise reduction. To address this problem, a fast segmentation algorithm is proposed on the basis of the gray-value characteristics of sonar images. This algorithm has the advantage of requiring no segmentation thresholds to be calculated. It follows these steps: first, calculate the gray matrix of the fuzzy image background; after adjusting the gray values, partition the image into background, buffer, and target regions; after filtering, reset the pixels with gray values lower than 255 to binarize the image and eliminate most artifacts; finally, remove the remaining noise by means of morphological image processing. Simulation results on several sonar images show that the algorithm can segment fuzzy sonar images quickly and effectively, without incomplete target shapes, demonstrating that the method is stable and feasible.
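The binarize-then-morphology stage of such a pipeline can be sketched in pure numpy; the 3x3 structuring element and the way the background level is supplied are illustrative choices, not the paper's.

```python
import numpy as np

def binarize(img, background_gray):
    # pixels brighter than the estimated background become foreground
    return (img > background_gray).astype(np.uint8)

def erode(mask):
    # 3x3 erosion: a pixel survives only if its whole neighbourhood is set
    p = np.pad(mask, 1)
    out = np.ones_like(mask)
    for di in range(3):
        for dj in range(3):
            out &= p[di:di+mask.shape[0], dj:dj+mask.shape[1]]
    return out

def dilate(mask):
    # 3x3 dilation: a pixel is set if any neighbour is set
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for di in range(3):
        for dj in range(3):
            out |= p[di:di+mask.shape[0], dj:dj+mask.shape[1]]
    return out

def open_mask(mask):
    # morphological opening removes isolated noise speckles
    return dilate(erode(mask))
```

Opening deletes foreground blobs smaller than the structuring element while leaving larger target regions essentially intact, which is how the remaining speckle noise is removed after binarization.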
REVIEW | doi:10.20944/preprints202306.1179.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image forensics; image forgery detection; robust image watermarking; deep learning
Online: 16 June 2023 (11:07:50 CEST)
Digital images have become an important carrier for people to access information in the information age. However, with the development of technology, digital images are vulnerable to illegal access and tampering, posing a serious threat to personal privacy, social order and national security. Therefore, image forensic techniques have become an important research topic in the field of multimedia information security. In recent years, deep learning technology has been widely applied in the field of image forensics, and the performance achieved has significantly exceeded that of conventional forensic algorithms. This survey compares the state-of-the-art image forensic techniques based on deep learning in recent years. The image forensic techniques are divided into passive and active forensics. In passive forensics, forgery detection techniques are reviewed, and the basic framework, evaluation metrics and commonly used datasets for forgery detection are presented. The performance, advantages and disadvantages of existing methods are also compared and analyzed according to different types of detection. In active forensics, robust image watermarking techniques are reviewed, and the evaluation metrics and basic framework of robust watermarking techniques are presented. The technical characteristics and performance of existing methods are analyzed based on the different types of attacks on images. Finally, future research directions and conclusions are given to provide useful suggestions for people in image forensics and related research fields.
ARTICLE | doi:10.20944/preprints201703.0086.v1
Subject: Engineering, Control And Systems Engineering Keywords: image enhancement; image fusion; color space; edge detector; underwater image
Online: 14 March 2017 (17:52:48 CET)
In order to improve contrast and restore color for underwater images captured by camera sensors, without suffering from insufficient detail or color cast, a fusion algorithm for image enhancement in different color spaces based on contrast limited adaptive histogram equalization (CLAHE) is proposed in this article. The original color image is first converted from the RGB color space to two different special color spaces: YIQ and HSI. The conversion from RGB to YIQ is a linear transformation, while the RGB-to-HSI conversion is nonlinear. The algorithm then applies CLAHE separately in the YIQ and HSI color spaces to obtain two different enhanced images: the luminance component (Y) in the YIQ color space and the intensity component (I) in the HSI color space are enhanced with the CLAHE algorithm. CLAHE has two key parameters, block size and clip limit, which mainly control the quality of the enhanced image. After that, the YIQ and HSI enhanced images are converted back to RGB. When the red, green, and blue components are not coherent in the YIQ-RGB or HSI-RGB images, the three components are harmonized with the CLAHE algorithm in RGB space. Finally, using a four-direction Sobel edge detector within a bounded general logarithm ratio operation, a self-adaptive weight selection nonlinear image enhancement is carried out to fuse the YIQ-RGB and HSI-RGB images into the final fused image. The enhancement fusion algorithm has two key factors, the average of the Sobel edge detector and the fusion coefficient, which determine its effect. A series of evaluation metrics such as mean, contrast, entropy, colorfulness metric (CM), mean square error (MSE) and peak signal-to-noise ratio (PSNR) are used to assess the proposed enhancement algorithm.
Experimental results show that the proposed algorithm provides greater detail enhancement and better colorfulness restoration than other existing image enhancement algorithms, effectively suppresses noise interference, and reliably improves the quality of underwater images.
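The RGB-to-YIQ step mentioned above is a fixed linear transform; a minimal numpy sketch using the standard NTSC matrix (a generic illustration, not code from the paper):

```python
import numpy as np

# Standard NTSC RGB -> YIQ transform (luminance Y, chrominance I and Q)
RGB2YIQ = np.array([[0.299, 0.587, 0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523, 0.312]])

def rgb_to_yiq(rgb):
    """Convert an (H, W, 3) RGB image in [0, 1] to YIQ."""
    return rgb @ RGB2YIQ.T

def yiq_to_rgb(yiq):
    """Inverse transform back to RGB."""
    return yiq @ np.linalg.inv(RGB2YIQ).T
```

In the scheme described above, the Y channel of the YIQ image would then be equalized (e.g., with CLAHE) before converting back to RGB, leaving chrominance untouched.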
REVIEW | doi:10.20944/preprints202307.0585.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Underwater image analysis; Underwater image restoration; Underwater image enhancement; Underwater datasets; Underwater image quality evaluation
Online: 10 July 2023 (10:06:22 CEST)
In recent years, underwater exploration for deep-sea resource utilization and development has attracted considerable interest. In an underwater environment, the obtained images and videos undergo several types of quality degradation resulting from light absorption and scattering, low contrast, color deviation, blurred details, and nonuniform illumination. Therefore, the restoration and enhancement of degraded images and videos are critical. Numerous techniques from image processing, pattern recognition and computer vision have been proposed for image restoration and enhancement, but many challenges remain. This survey presents a comparison of the most prominent approaches in underwater image processing and analysis. It also provides an overview of the underwater environment with a broad classification into enhancement and restoration techniques, and introduces the main causes of underwater image degradation along with the underwater image model. The existing underwater image analysis techniques, methods, datasets, and evaluation metrics are presented in detail. Furthermore, the existing limitations are analyzed and classified into image-related and environment-related categories. In addition, the performance is validated on images from the UIEB dataset for qualitative, quantitative, and computational-time assessment. Areas in which underwater images have recently been applied are briefly discussed. Finally, recommendations for future research are provided and the conclusion is presented.
ARTICLE | doi:10.20944/preprints201902.0089.v3
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Digital image processing, color image, grayscale image, histogram equalization, histogram specification, image enhancement, RGB channel
Online: 11 February 2019 (10:42:57 CET)
This paper has two major parts. In the first part, histogram equalization for image enhancement was implemented without using the built-in function in MATLAB. First, a color image of a rat was chosen and transformed into a grayscale image. After this conversion, histogram equalization was applied to the grayscale image. Then, histogram equalization was applied to each RGB channel of the same image to observe its effect on each channel. Finally, histogram equalization was applied to the original color image of the rat. In the second part, for the grayscale image from part one, the histogram of another color image of a rat was introduced as the desired histogram, and histogram specification was applied to the original color image.
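Histogram equalization without a built-in function can be sketched in a few lines; a generic 8-bit numpy version (not the authors' MATLAB code):

```python
import numpy as np

def hist_equalize(gray):
    """Equalize an 8-bit grayscale image via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)   # per-level counts
    cdf = np.cumsum(hist).astype(float)
    cmin = cdf[cdf > 0].min()                         # first occupied level
    # map each gray level through the normalized cumulative distribution
    lut = np.clip(np.round(255 * (cdf - cmin) / (cdf[-1] - cmin)), 0, 255)
    return lut.astype(np.uint8)[gray]
```

Per-channel equalization, as in the paper's RGB experiment, would simply apply `hist_equalize` to each of the three channels independently.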
ARTICLE | doi:10.20944/preprints201811.0565.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Digital image processing, color image, grayscale image, histogram equalization, histogram specification, image enhancement, RGB channel
Online: 23 November 2018 (14:17:13 CET)
This paper has two major parts. In the first part, histogram equalization for image enhancement was implemented without using the built-in function in MATLAB. First, a color image of a rat was chosen and transformed into a grayscale image. After this conversion, histogram equalization was applied to the grayscale image. Then, histogram equalization was applied to each RGB channel of the same image to observe its effect on each channel. Finally, histogram equalization was applied to the original color image of the rat. In the second part, for the grayscale image from part one, the histogram of another color image of a rat was introduced as the desired histogram, and histogram specification was applied to the original color image.
ARTICLE | doi:10.20944/preprints202309.2177.v1
Subject: Engineering, Mechanical Engineering Keywords: particle image velocimetry; OpenPIV; python; image processing
Online: 30 September 2023 (09:59:14 CEST)
Particle Image Velocimetry (PIV) is a widely used experimental technique for measuring flow. In recent years, open-source PIV software has become more popular as it offers researchers and practitioners enhanced computational capabilities. Software development for graphics processing unit (GPU) architectures requires careful algorithm design and data structure selection for optimal performance. PIV software optimized for central processing units (CPUs) offers an alternative to specialized GPU software. In the present work, an improved algorithm for the OpenPIV-Python software is presented and implemented under a traditional CPU framework. The Python language was selected due to its versatility and widespread adoption. The algorithm was also tested on a supercomputing cluster, a workstation, and Google Colaboratory during the development phase. Using a known velocity field, the algorithm precisely captured the time-averaged flow, instantaneous velocity fields, and vortices.
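The core of a PIV pass is cross-correlating interrogation windows between two frames to find the particle displacement; a minimal FFT-based numpy sketch (a generic illustration, not OpenPIV-Python's actual implementation):

```python
import numpy as np

def window_displacement(win_a, win_b):
    """Estimate the integer pixel shift between two interrogation
    windows via FFT-based cross-correlation (PIV's core operation)."""
    fa = np.fft.rfft2(win_a - win_a.mean())
    fb = np.fft.rfft2(win_b - win_b.mean())
    corr = np.fft.irfft2(fa.conj() * fb, s=win_a.shape)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap peak coordinates to signed shifts
    shifts = [p if p <= s // 2 else p - s for p, s in zip(peak, win_a.shape)]
    return tuple(shifts)
```

A full PIV pass tiles both frames into such windows and assembles the per-window shifts into a velocity field; real implementations add sub-pixel peak fitting and outlier validation on top of this.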
ARTICLE | doi:10.20944/preprints202304.1088.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; image aesthetics assessment; image enhancement
Online: 28 April 2023 (03:15:16 CEST)
Abstract: Image aesthetic assessment (IAA) with neural attention has made significant progress due to its effectiveness in object recognition. Current studies have shown that the features learned by convolutional neural networks (CNN) at different learning stages carry meaningful information: shallow features contain low-level image information, while deep features capture image semantics and themes. Inspired by this, we propose a visual enhancement network with feature fusion (FF-VEN). It consists of two sub-modules, the visual enhancement module (VE module) and the shallow and deep feature fusion module (SDFF module). The former uses an adaptive filter in the spatial domain to simulate human eyes according to the region of interest (ROI) extracted by neural feedback. The latter not only extracts the shallow and deep features via lateral connections, but also uses a feature fusion unit (FFU) to fuse the pooled features together to maximize their information contribution. Experiments on the standard AVA dataset and the Photo.net dataset show the effectiveness of FF-VEN.
ARTICLE | doi:10.20944/preprints202310.0838.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: neural networks; image denoising; image processing; denoising algorithms
Online: 13 October 2023 (04:19:29 CEST)
Image denoising has been one of the important problems in the field of computer vision, and it has a wide range of practical value in many applications, such as medical image processing, image enhancement, and computational photography. Traditional image denoising methods are usually based on hand-designed features and filters, but these methods perform poorly under complex noise and image structures. In recent years, the rapid development of neural network technology has revolutionized the image denoising task. This paper introduces the necessary background on neural networks and image denoising, explores the impact of neural networks on image denoising, and examines how neural networks can be used to denoise images. It also summarises other image denoising methods and finally points out the challenges and problems faced by image denoising at present. Some possible new development directions are proposed to provide new solutions for image denoising researchers and to promote the development of the field.
ARTICLE | doi:10.20944/preprints202306.0081.v1
Subject: Engineering, Bioengineering Keywords: Deep Learning; Image Synthesis; Image Generation; Machine Learning; Medical Imaging; CT to MRI; Synthetic MRI; Stroke; Image-to-image Translation
Online: 1 June 2023 (11:30:09 CEST)
CT scans are currently the most common imaging modality used for suspected stroke patients due to their short acquisition time and wide availability. However, MRI offers superior tissue contrast and image quality. In this study, eight deep learning models are developed, trained, and tested using a dataset of 181 CT/MR pairs from stroke patients. The resultant synthetic MRIs generated by these models are compared through a variety of qualitative and quantitative methods. The synthetic MRIs generated by a 3D UNet model consistently demonstrated superior performance across all methods of evaluation. Overall, the generation of synthetic MRIs from CT scans using the methods described in this paper produces realistic MRIs that can guide the registration of CT scans to MRI atlases. The synthetic MRIs enable the segmentation of white matter, gray matter, and cerebrospinal fluid using algorithms designed for MRIs, exhibiting a high degree of similarity to true MRIs.
ARTICLE | doi:10.20944/preprints202309.0946.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: adversarial attacks; artificial neural networks; robustness; image filtering; convolutional neural networks; image recognition; image distortion
Online: 14 September 2023 (08:31:30 CEST)
In this paper, we continue the research cycle on the properties of convolutional neural network-based image recognition systems and ways to improve their noise immunity and robustness. Currently, a popular research area related to artificial neural networks is adversarial attacks. The effect of an adversarial attack on an image is barely perceptible to the human eye, yet it drastically reduces the neural network's accuracy. Image perception by a machine is highly dependent on the propagation of high-frequency distortions throughout the network. At the same time, a human efficiently ignores high-frequency distortions, perceiving the shape of objects as a whole. The approach proposed in this paper can improve image recognition accuracy in the presence of high-frequency distortions, in particular those caused by adversarial attacks. The proposed technique makes it possible to align the logic of an artificial neural network with that of a human, for whom high-frequency distortions are not decisive in object recognition.
ARTICLE | doi:10.20944/preprints202306.0736.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image denoising; image deblurring; salt&pepper noise; nonlinear diffusion.
Online: 12 June 2023 (02:18:59 CEST)
An algorithm for the treatment of images affected by both blurring and salt&pepper noise is proposed, with a cost only proportional to the number of pixels. The methodology uses a discretization scheme for the Laplace operator multiplied by a suitable nonlinear term depending on the gradient. Although this approach resembles a diffusion-type algorithm, only one step of the procedure is applied, leading to significant time savings. The procedure is successfully tested on some standard black-and-white natural images.
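A single explicit step of gradient-modulated diffusion of this kind can be sketched as follows (a generic Perona-Malik-style illustration with periodic borders for brevity; the paper's specific nonlinear term is not reproduced here):

```python
import numpy as np

def one_diffusion_step(img, dt=0.2, k=15.0):
    """One explicit step of nonlinear (gradient-dependent) diffusion:
    u <- u + dt * g(|grad u|^2) * laplacian(u), cost O(pixels)."""
    u = np.asarray(img, dtype=float)
    # 5-point discrete Laplacian (periodic borders via np.roll)
    up, down = np.roll(u, 1, 0), np.roll(u, -1, 0)
    left, right = np.roll(u, 1, 1), np.roll(u, -1, 1)
    lap = up + down + left + right - 4 * u
    gy, gx = np.gradient(u)
    g = 1.0 / (1.0 + (gx**2 + gy**2) / k**2)   # edge-stopping function
    return u + dt * g * lap
```

Because only one such step is applied, the total cost stays linear in the pixel count, which is the time saving the abstract highlights over iterated diffusion schemes.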
ARTICLE | doi:10.20944/preprints202108.0286.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Image enhancement; DCT-Domain Perceived Contrast; Perceptual Image Quality
Online: 13 August 2021 (08:31:37 CEST)
This paper develops a detail image-signal enhancement method that makes images perceived as clearer and more resolved, and is thus more effective for higher-resolution displays. We observe that enhancing local variant signals makes images more vivid, and that revealing the granular signals harmonically embedded in the local variant signals makes images more resolved. Based on this observation, we develop a method that not only emphasizes the local variant signals by scaling up the frequency energy in accordance with human visual perception, but also strengthens the granular signals by embedding the alpha-rooting enhanced frequency components. The proposed energy scaling method emphasizes the detail signals in texture images and rarely boosts noisy signals in plain images. In addition, to avoid local ringing artifacts, the proposed method adjusts the enhancement direction to be parallel to the underlying image signal direction. Subjective and objective quality evaluations verified that the developed method makes images perceived as clearer and more highly resolved.
ARTICLE | doi:10.20944/preprints202101.0345.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: image processing; low resolution image; crack detection; user algorithm
Online: 18 January 2021 (14:26:38 CET)
Abstract: Imaging devices of less than 300,000 pixels are mostly used for sewage conduit exploration due to the small scale of the survey industry in Korea. In particular, devices of less than 100,000 pixels are still widely used, so the environment for image processing is very poor. Since the sewage conduit images covered in this study have a very low resolution (240 × 320 = 76,800 pixels), it is very difficult to detect cracks. Because the resolution of most sewer conduit images in Korea is very low, this problem of low resolution was selected as the subject of study. Cracks were detected through a total of six steps, including enhancing the crack in Step 2, finding the optimal threshold value in Step 3, and applying a crack-detection algorithm in Step 5. Cracks were effectively detected by the optimal parameters in Steps 2 and 3 and the user algorithm in Step 5. Despite the very low resolution, detection accuracy was 96.4% for cracked images and 94.5% for non-cracked images, and the quality of the analysis was also excellent. The findings of this study can be effectively used for crack detection in low-resolution images.
ARTICLE | doi:10.20944/preprints201810.0393.v1
Subject: Engineering, Control And Systems Engineering Keywords: image analysis; Turin Shroud; body-image formation; energy propagation
Online: 18 October 2018 (03:55:21 CEST)
Recent studies on the image of the Turin Shroud (TS) suggest it could have been formed through a not-yet-identified mechanism of energy radiation. To fill some gaps in understanding this imaging process, a reverse-engineering method was applied, allowing some candidate mechanisms to be excluded. The formation of the image of a human face wrapped in a cloth was simulated using purpose-built software. Different kinds of radiation, each depending on different parameters and each connected with an accredited hypothesis, were simulated. By comparing the images produced by the software with the TS face, useful information was obtained about both the kind of radiation and the cloth-wrapping conditions. The image distortion of a cloth wrapped around a face was also discussed, by determining the best-fitting laws of radiation and of their attenuation with distance. A Lambertian law is not compatible with the TS image. Vertical radiation shows a problem in reproducing the required resolution. Radiation perpendicular to the emitting surface, like that produced by an electric field, appears promising for explaining the TS face.
ARTICLE | doi:10.20944/preprints201705.0028.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: monocular image; image segment; SIFT; depth measurement; convex hull
Online: 3 May 2017 (09:19:59 CEST)
Recovering depth information of objects from two-dimensional images is a very important and basic problem in the computer vision field. In view of the shortcomings of existing depth-estimation methods, a novel approach based on SIFT (the Scale Invariant Feature Transform) is presented in this paper. The approach can estimate the depths of objects in two images captured by an uncalibrated ordinary monocular camera. First, the first image is captured; with all camera parameters unchanged, the second image is acquired after moving the camera a distance d along the optical axis. Then image segmentation and SIFT feature extraction are applied to the two images separately, and objects in the images are matched. Lastly, an object's depth can be computed from the lengths of a pair of straight line segments. To ensure that the most appropriate pair of straight line segments is chosen, and to reduce computation, the theory of convex hulls and the knowledge of triangle similarity are employed. The experimental results show the approach is effective and practical.
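The depth computation from segment lengths follows from triangle similarity: an object of fixed size imaged with segment length l1, then l2 after moving the camera a distance d closer along the optical axis, satisfies l1*Z = l2*(Z - d), giving Z = d*l2/(l2 - l1). A small sketch under this pinhole-model assumption (not the paper's exact formulation):

```python
def depth_from_axial_motion(l1, l2, d):
    """Depth Z of an object from segment lengths l1 (first image) and
    l2 (second image, camera moved distance d toward the object along
    the optical axis). Pinhole similar triangles: l1*Z = l2*(Z - d)."""
    if l2 <= l1:
        raise ValueError("segment must appear longer after moving closer")
    return d * l2 / (l2 - l1)
```

For example, an object 0.5 m wide at 5 m depth with a 1000-pixel focal length images at 100 px; after moving 1 m closer it images at 125 px, and the formula recovers Z = 1.0 * 125 / 25 = 5 m.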
ARTICLE | doi:10.20944/preprints201611.0057.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: multi-focus image, image fusion, region mosaic, contrast pyramid
Online: 10 November 2016 (07:34:22 CET)
This paper proposes a new approach for multi-focus image fusion based on Region Mosaicing on Contrast Pyramids (REMCP). A density-based region growing method is developed to construct a focused-region mask for multi-focus images. The segmented focused-region mask is decomposed into a mask pyramid, which is then used for supervised region mosaicking on a contrast pyramid. In this way, the focus measurement and the continuity of focused regions are incorporated, and pixel-level pyramid fusion is improved to the region level. Objective and subjective experiments show that the proposed REMCP is more robust to noise than the compared algorithms and fully preserves the focus information of the multi-focus images while reducing distortions in the fused images.
ARTICLE | doi:10.20944/preprints201811.0566.v2
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Color image, grayscale image, motion blurring, random noise, inverse filtering, Wiener filtering, restoration of an image
Online: 5 February 2019 (16:13:14 CET)
In this paper, at first, a color image of a car is taken. Then the image is transformed into a grayscale image. After that, the motion blurring effect is applied to that image according to the image degradation model described in equation 3. The blurring effect can be controlled by the a and b components of the model. Then random noise is added to the image via MATLAB programming. Many methods can restore a noisy, motion-blurred image; in this paper, inverse filtering and Wiener filtering are implemented for restoration. Both the motion-blurred and the noisy motion-blurred images are restored via the inverse filtering and Wiener filtering techniques, and a comparison is made between them.
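Wiener filtering in the frequency domain can be sketched generically as follows (a numpy illustration assuming a known blur kernel and a scalar noise-to-signal ratio K; not the paper's MATLAB code):

```python
import numpy as np

def wiener_deconvolve(blurred, psf, K=0.01):
    """Restore a blurred image with the Wiener filter
    F_hat = conj(H) / (|H|^2 + K) * G  in the frequency domain."""
    H = np.fft.fft2(psf, s=blurred.shape)   # transfer function of the blur
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
    return np.real(np.fft.ifft2(F_hat))
```

Setting K = 0 reduces this to plain inverse filtering, which is the unstable-under-noise baseline the paper compares against; the K term is what regularizes frequencies where the blur transfer function is small.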
ARTICLE | doi:10.20944/preprints202201.0259.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: image classifier; image part; quick learning; feature overlap; positional context
Online: 11 April 2022 (10:17:57 CEST)
This paper describes an image processing method that makes use of image parts instead of neural parts. Neural networks excel at image or pattern recognition, and they do this by constructing complex networks of weighted values that can cover the complexity of the pattern data. These features, however, are integrated holistically into the network, which means they can be difficult to use individually. A different method might scan individual images and use a more local procedure to recognise the features in them. This paper suggests such a method, where a trick during the scan process can not only recognise separate image parts as features, but also produce an overlap between the parts. It is therefore able to produce image parts with real meaning and to place them in a positional context. Tests show that it can be quite accurate on some handwritten digit datasets, though not as accurate as a neural network, for example. However, the fact that it offers an explainable interface could make it interesting. It also fits well with an earlier cognitive model, and an ensemble-hierarchy structure in particular.
ARTICLE | doi:10.20944/preprints202008.0336.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image processing; image classification; computer vision; expert systems; amber gemstones
Online: 15 August 2020 (04:39:11 CEST)
The article describes a classification solution for amber stones. The problem of classifying amber has long been known among jewelers and artisans of amber art. Existing solutions can classify amber pieces by color, but the need to classify by shape and texture has not been met until now. The proposed solution is capable of classifying the gemstones by shape. Amber is a challenging object in this respect, since its form is difficult to define unambiguously. Data for the amber experiments were gathered from amber art craftsmen. In the proposed solution, amber form can be classified into 10 different classes (7 classes were chosen during the experiment).
ARTICLE | doi:10.20944/preprints202006.0117.v1
Subject: Medicine And Pharmacology, Other Keywords: Image Noise Removal; Image Enhancement; MFNR; Speckle noise; Median Filter
Online: 9 June 2020 (05:00:26 CEST)
Speckle noise is one of the most difficult noises to remove, especially in medical applications. It is a nuisance in ultrasound imaging systems, which are used in about half of all medical screening systems. Thus, noise removal is an important step in these systems, enabling reliable, automated, and potentially low-cost systems. Herein, a generalized approach, MFNR (Multi-Frame Noise Removal), is used, which is a complete noise-removal system based on kernel density estimation (KDE). Any given type of noise can be removed if its probability density function (PDF) is known; herein, we extracted the PDF parameters using KDE. Noise removal and detail preservation are not contrary to each other, as is the case in single-frame noise removal methods. Our results showed practically complete noise removal using the MFNR algorithm compared to standard noise removal tools. The peak signal-to-noise ratio (PSNR) was used as a comparison metric. This paper is an extension of our previous paper, where the MFNR algorithm was shown to be a general-purpose, complete noise removal tool for all types of noise.
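Estimating a noise PDF with kernel density estimation can be sketched as follows (a generic numpy Gaussian-KDE illustration with Silverman's bandwidth rule; not the MFNR implementation):

```python
import numpy as np

def kde_pdf(samples, xs, bandwidth=None):
    """Gaussian kernel density estimate of a noise PDF from samples,
    evaluated at the points xs."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std() * n ** (-1 / 5)  # Silverman's rule
    # sum of Gaussian kernels centered at each sample
    diffs = (xs[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
```

In a multi-frame setting, the samples for a given pixel would be its values across frames, so the estimated density characterizes the noise at that pixel directly.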
ARTICLE | doi:10.20944/preprints202002.0125.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: image inpainting; image completion; attention; pyramid structure loss; deep learning
Online: 10 February 2020 (10:16:37 CET)
This paper develops a multi-task learning framework that incorporates image structure knowledge to assist image inpainting, which has not been well explored in previous works. The primary idea is to train a shared generator to simultaneously complete the corrupted image and the corresponding structures, namely edge and gradient, thus implicitly encouraging the generator to exploit relevant structure knowledge while inpainting. In the meantime, we also introduce a structure embedding scheme to explicitly embed the learned structure features into the inpainting process, thus providing possible preconditions for image completion. Specifically, a novel pyramid structure loss is proposed to supervise structure learning and embedding. Moreover, an attention mechanism is developed to further exploit the recurrent structures and patterns in the image to refine the generated structures and contents. Through multi-task learning, structure embedding, and attention, our framework takes advantage of structure knowledge and outperforms several state-of-the-art methods on benchmark datasets, both quantitatively and qualitatively.
ARTICLE | doi:10.20944/preprints201906.0248.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image segmentation; neutrosophic information; Shannon entropy; gray level image threshold
Online: 25 June 2019 (08:48:22 CEST)
This article presents a new method of segmenting grayscale images by minimizing Shannon's neutrosophic entropy. For the proposed segmentation method, the neutrosophic information components, i.e., the degree of truth, the degree of neutrality and the degree of falsity, are defined taking into account membership in the segmented regions and, at the same time, in the separation threshold area. The principle of the method is simple, easy to understand, and can lead to multiple thresholds. The efficacy of the method is illustrated using some test gray-level images. The experimental results show that the proposed method has good segmentation performance with optimal gray-level thresholds.
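As a classical point of comparison for entropy-driven threshold selection, a Kapur-style maximum-entropy threshold (not the neutrosophic formulation itself) can be sketched as:

```python
import numpy as np

def entropy_threshold(gray):
    """Pick the gray-level threshold maximizing the sum of Shannon
    entropies of the below- and above-threshold histograms."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()

    def sub_entropy(q, w):   # entropy of a normalized sub-histogram
        q = q[q > 0] / w
        return -(q * np.log(q)).sum()

    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        wlo, whi = p[:t].sum(), p[t:].sum()
        if wlo == 0 or whi == 0:
            continue
        h = sub_entropy(p[:t], wlo) + sub_entropy(p[t:], whi)
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```

The neutrosophic method generalizes this idea by scoring each level's truth, neutrality, and falsity memberships rather than a single two-way histogram split.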
ARTICLE | doi:10.20944/preprints201904.0078.v1
Subject: Social Sciences, Psychology Keywords: forest recreation; forest landscape; landscape image; landscape image sketching technique
Online: 8 April 2019 (09:08:30 CEST)
The landscape image is the bridge of communication between people and forests, and an entry point for the supply-side reform of forest tourism products. The research collected a total of 140 forest landscape image drawings from non-art-major graduate students by random sampling during April and May 2018, and constructed a landscape image conceptual model of the forest using the landscape image sketching technique. The results showed that (1) with regard to linguistic knowledge, the natural landscape elements, for instance herbaceous plants, terrain, creatures, water and sky, and the broad-leaf forest, objectively reflected the real forest landscape and the local native vegetation, while the variation of forest species received little attention. (2) From the perspective of spatial view, the prevalence of sideways views indicated that graduate students preferred to view forests externally at a moderate distance, and few depicted forests from within. (3) In terms of self-orientation, the objective landscapes indicated that graduate students preferred to depict forest scenery rather than their own interaction with the environment. (4) With respect to social meaning, the scenic views and forest structure indicated that graduate students preferred rural forest landscapes, with no significant evidence of other special interests in forests. In conclusion, (1) the forest is thought of as a feature of people's life world and of rural scenes around homes, rather than perceived objectively. (2) The forest is regarded as an important habitat for animals and a limited resource for people's life, production and recreation needs, which people enter only to meet such needs. (3) The natural values of forests, such as ecology and aesthetics, get more attention, while the social values of forests, such as life, production and culture, receive rather low attention.
ARTICLE | doi:10.20944/preprints202006.0091.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Breast Cancer Screening; Digital Image Elasto Tomography (DIET); Image Noise Removal, Image Enhancement; Multiple Frame Noise Removal (MFNR)
Online: 7 June 2020 (14:53:34 CEST)
Breast cancer is a leading cause of death among women. Conventional screening methods, such as mammography and ultrasound diagnosis, are expensive and have significant limitations. Digital Image Elasto Tomography (DIET) is a new noninvasive breast cancer screening system that has the potential to be a low-cost and reliable breast cancer screening tool. It is based on modal analysis of the breast mass and stereographic 3D image analysis to detect stiffer abnormal tissues. However, camera sensor noise, especially Gaussian noise, is a major source of optical flow (OF) error in this approach to tumor detection. This work studies the performance of different conventional filters, including the standard Gaussian filter, in removing this noise and producing more robust screening results. A radical approach, Multiple Frame Noise Removal (MFNR), is proposed for use in this type of medical image processing, instead of a Gaussian filter or other typical image noise removal tools. It is a multiple-frame noise removal method in which the probability density function (PDF) of the noise is extracted from multiple images by characterizing the same pixel positions across them. The noise becomes deterministic, and hence easily removed. The proposed algorithm was applied to a dataset from 10 phantom breast tests with a prototype DIET system, and 10 in-vivo samples from healthy women. Comparisons were made to an optimal Gaussian filter form that is commonly used. Reductions in OF error using these digitally imaged datasets were used to compare performance. Refinement of the images for medical applications requires higher PSNR, which was successfully achieved by using the MFNR algorithm. In this study, the algorithm was used to improve the imaging results of a DIET system. The conventional wisdom that noise removal and detail preservation are contrasting effects is thus challenged by this approach.
ARTICLE | doi:10.20944/preprints202310.1144.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: image denoising; filtering methods; biomedical image denoising; healthcare; adaptive filtering methods
Online: 18 October 2023 (09:18:36 CEST)
This paper comprehensively describes filtering methods for biomedical image denoising. First, the background of biomedical image denoising is introduced: denoising is challenging because different imaging modes have different noise characteristics, and noise levels can vary greatly depending on the specific application. Second, the paper describes the important role of biomedical image denoising in medical care, since image quality directly affects the patient's diagnosis, the treatment plan, and the overall quality of medical care. The filtering methods are then introduced in detail, covering the core concepts and features of linear, nonlinear, and frequency-domain filtering, with a focus on adaptive filtering: its characteristics, conditions of use, common algorithms, and advantages. Next, the filtering methods used for biomedical image denoising are introduced, including the core concepts of the Gaussian filter, median filter, total variation denoising, and Wiener filter. The challenges encountered by filtering methods are then described, such as the accurate selection of filters and the balance between noise reduction and preservation of image detail. Finally, applications of filtering methods in other fields, such as audio processing and speech recognition, are mentioned.
In summary, this paper comprehensively expounds the filtering methods for biomedical image denoising, the relationship between medical image denoising and medical care, and the challenges encountered by filtering methods.
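To make the linear/nonlinear distinction discussed above concrete, here is a minimal sketch of the two most common filters named in the abstract. This is textbook material, not code from the paper; the function names are hypothetical:

```python
import numpy as np

def median_filter(img, k=3):
    """Nonlinear filtering: replace each pixel by the median of its
    k x k neighbourhood -- robust to impulse (salt-and-pepper) noise,
    which a linear filter would only smear."""
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(p[i:i + k, j:j + k])
    return out

def gaussian_kernel(k=5, sigma=1.0):
    """Linear filtering: a normalized k x k Gaussian weight mask,
    convolved with the image to suppress high-frequency noise."""
    ax = np.arange(k) - k // 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    kern = np.outer(g, g)
    return kern / kern.sum()
```

A single bright impulse pixel is removed entirely by the median filter, while the Gaussian kernel (summing to 1) would merely spread it out — exactly the noise-reduction versus detail-preservation trade-off the review discusses.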
REVIEW | doi:10.20944/preprints202309.0223.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; medical images; image registration; medical image analysis; survey; review
Online: 5 September 2023 (03:51:29 CEST)
Image registration (IR) is a process that deforms images to align them with respect to a reference space, making it easier for medical practitioners to examine various medical images in a standardized reference frame, such as having the same rotation and scale. This document introduces image registration using a simple numeric example. It provides a definition of image registration along with a space-oriented symbolic representation. This review covers various aspects of image transformations, including affine, deformable, invertible, and bidirectional transformations, as well as medical image registration algorithms such as Voxelmorph, Demons, SyN, Iterative Closest Point, and SynthMorph. It also explores atlas-based registration and multistage image registration techniques, including coarse-fine and pyramid approaches. Furthermore, this survey paper discusses medical image registration taxonomies, datasets, evaluation measures, such as correlation-based metrics, segmentation-based metrics, processing time, and model size. It also explores applications in image-guided surgery, motion tracking, and tumor diagnosis. Finally, the document addresses future research directions, including the further development of transformers.
ARTICLE | doi:10.20944/preprints202105.0408.v1
Subject: Engineering, Automotive Engineering Keywords: UAV Images; Monoscopic Mapping; Stereoscopic Plotting; Image Overlap; Optimal Image Selection
Online: 18 May 2021 (10:10:07 CEST)
Recently, the mapping industry has been focusing on the possibility of large-scale mapping from unmanned aerial vehicles (UAVs), owing to advantages such as easy operation and cost reduction. In order to produce large-scale maps from UAV images, it is important to obtain precise orientation parameters. For this, various techniques have been developed and are included in most commercial UAV image processing software. For mapping, it is equally important to select images that can cover a region of interest (ROI) with the fewest possible images. Otherwise, to map the ROI, one may have to handle too many images; commercial software neither provides the information needed to select images nor explicitly explains how to select images for mapping. For these reasons, stereo mapping of UAV images in particular is time consuming and costly. To solve these problems, this study proposes a method to select images intelligently. We can select a minimum number of image pairs to cover the ROI with the fewest possible images, or select optimal image pairs to cover the ROI with the most accurate stereo pairs. We group images by strips and generate the initial image pairs, then apply an intelligent scheme to iteratively select optimal image pairs from the start to the end of an image strip. According to the results of the experiment, the number of images selected is greatly reduced by applying the proposed optimal image-composition algorithm. The selected image pairs produce a dense 3D point cloud over the ROI without any holes. For stereoscopic plotting, the selected image pairs were used to map the ROI successfully on a digital photogrammetric workstation (DPW), and a digital map covering the ROI was generated. The proposed method should contribute to time and cost reductions in UAV mapping.
REVIEW | doi:10.20944/preprints202012.0479.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Image classification; Texture image analysis; Discriminant features; Combination methods; texture operators
Online: 18 December 2020 (16:21:50 CET)
In many image processing and computer vision applications, the main aim is to describe image contents. To this end, different visual properties such as color, texture, and shape are extracted. In this respect, texture information plays an important role in image description and visual pattern classification. Texture refers to a specific local distribution of intensities that is repeated throughout the image. To date, different operators and descriptors have been proposed to analyze texture characteristics. In multi-object images, a single texture operator usually does not provide accurate results, so in many cases combinations of texture operators are used to achieve more discriminant features. In this paper, several combination methods are surveyed to analyze the effect of combined texture features on image content description. In the results section, the related methods are compared in terms of accuracy and computational complexity.
ARTICLE | doi:10.20944/preprints202005.0167.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: neutrosophic information; Onicescu information energy; image segmentation; gray level image threshold
Online: 10 May 2020 (14:41:04 CEST)
This article presents a method of segmenting images with gray levels that uses Onicescu's information energy calculated in the context of the neutrosophic theory. Starting from the information energy calculation for complete neutrosophic information, it is shown how to extend its calculation for incomplete and inconsistent neutrosophic information. The segmentation method is based on calculation of thresholds for separating the gray levels using the local maximum points of the Onicescu information energy.
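Onicescu's information energy for a discrete distribution is E = Σᵢ pᵢ², the quantity the thresholding method above maximizes locally. A minimal sketch of computing it from a gray-level histogram (the paper's neutrosophic extension to incomplete and inconsistent information is not reproduced here):

```python
import numpy as np

def information_energy(gray_levels, bins=256):
    """Onicescu information energy E = sum_i p_i^2 of the gray-level
    distribution: 1.0 for a constant (deterministic) image, and the
    minimum 1/bins for a perfectly uniform histogram."""
    hist, _ = np.histogram(gray_levels, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    return float(np.sum(p ** 2))
```

A segmentation scheme along the lines of the article would evaluate this energy over sliding windows of the gray axis and place thresholds at its local maxima.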
ARTICLE | doi:10.20944/preprints202303.0326.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: document image processing; deskew; Hough Line Transform; image rectification; machine learning; OCR; document orientation; image preprocessing; computer vision; AI
Online: 17 March 2023 (13:25:06 CET)
Document deskewing is a fundamental problem in document image processing. Existing methods have limitations: Hough Line Transformation can deskew images upside down; deep learning models require huge amounts of human labour and computational resources and still fail to deskew while handling orientation; and OCR-based methods struggle to read tilted text. In this paper, we propose a novel, simple, cost-effective deep learning method for fixing both the skew and the orientation of documents. Our approach reduces the search space for the machine learning model to predicting whether an image is upside down or not, avoiding the huge search space of predicting an angle between 0 and 360 degrees. We fine-tuned a MobileNetV2 model, pre-trained on ImageNet, using only 200 images and achieved good results. This method is useful for automation tasks such as data extraction using OCR technology, and can greatly reduce manual labour.
ARTICLE | doi:10.20944/preprints202310.2003.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Image classification; Computer vision; Transfer learning; Image database; Plant nutrition; Leaf analysis
Online: 31 October 2023 (08:13:17 CET)
Computer vision is a powerful technology that has enabled solutions in various fields by analyzing visual attributes in images. One field that has taken advantage of computer vision is agricultural automation, which promotes high-quality crop production. The nutritional status of a crop is a crucial factor in determining its productivity. This status is mediated by approximately 14 chemical elements acquired by the plant, and their determination plays a pivotal role in farm management. To address the timely identification of nutritional disorders, this study focuses on the classification of three levels of phosphorus deficiencies through individual leaf analysis. The methodological steps include: (1) generating a database with laboratory-grown maize plants that were induced to total phosphorus deficiency, medium deficiency, and total nutrition, using different capture devices; (2) processing the images with state-of-the-art transfer learning architectures (i.e. VGG16, ResNet50, GoogLeNet, DenseNet201, and MobileNetV2); and (3) evaluating the classification performance of the models using the created database. The results show that the VGG16 model achieves superior performance, with 98% classification accuracy. However, the other studied architectures also demonstrate competitive performance and are considered state-of-the-art automatic leaf deficiency detection tools. The proposed method can be a starting point to fine-tune machine vision-based solutions tailored for real-time monitoring of crop nutritional status.
ARTICLE | doi:10.20944/preprints202307.1395.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Fractional order differential operator; Fractional order integral operator; Image enhancement; Image denoising
Online: 20 July 2023 (10:42:06 CEST)
The theory of fractional calculus extends the order of classical calculus from integers to non-integers. As a new engineering tool, it has produced important research results in many fields, including image processing. This paper mainly studies the application of fractional calculus theory to image enhancement and denoising, including the basic theory of fractional calculus and its amplitude-frequency characteristics, the application of the fractional differential operator to image enhancement, and the application of the fractional integral operator to image denoising. The experimental results show that fractional calculus offers particular advantages for image enhancement and denoising. Compared with existing integer-order image enhancement operators, the fractional differential operator can more effectively enhance the "weak edge" and "strong texture" details of an image. The fractional-order integral denoising operator can not only improve the signal-to-noise ratio of the image compared with traditional denoising methods, but also better preserve detailed information such as edges and textures.
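Fractional differential masks in this literature are usually built from Grünwald–Letnikov coefficients cₖ = (−1)ᵏ C(v, k); the abstract does not specify the paper's construction, so the following is only a sketch of that standard definition, computed with a stable recurrence:

```python
import numpy as np

def gl_coeffs(v, n):
    """First n Grunwald-Letnikov coefficients c_k = (-1)^k * C(v, k)
    for a fractional order-v differential mask. For v in (0, 1) the
    coefficients decay slowly, which is what lets the operator boost
    texture while partially preserving low-frequency content."""
    c = [1.0]
    for k in range(1, n):
        c.append(c[-1] * (k - 1 - v) / k)  # recurrence for (-1)^k C(v,k)
    return np.array(c)
```

For v = 1 the coefficients collapse to the ordinary first difference [1, −1, 0, …], so the integer-order operator is recovered as a special case.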
ARTICLE | doi:10.20944/preprints202304.0723.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: no-reference image quality assessment; multitask learning; image restoration; multi-level features.
Online: 21 April 2023 (10:52:35 CEST)
When image quality is evaluated, the human visual system (HVS) infers the details in the image through its internal generative mechanism. In this process, the HVS integrates both local and global information of the image, utilizes contextual information to restore the original image information, and compares it with the distorted image information to evaluate image quality. Inspired by this mechanism, a no-reference image quality assessment method is proposed based on a multitask image restoration network. The network generates a pseudo-reference image as the main task and produces a structural similarity index measure (SSIM) map as an auxiliary task; by mutually promoting the two tasks, a higher-quality pseudo-reference image is generated. In addition, when predicting the image quality score, both the quality restoration features and the difference features between the distorted and reference images are used, thereby fully utilizing the information from the pseudo-reference image. To enable the model to focus on both global and local features, a multi-scale feature fusion module is proposed. Experimental results demonstrate that the proposed method achieves excellent performance on both synthetically and authentically distorted databases.
TECHNICAL NOTE | doi:10.20944/preprints202203.0095.v1
Subject: Engineering, Control And Systems Engineering Keywords: pre-processing; image transformation; image enhancement; geometric correction; radiometric correction; Satellite Imagery
Online: 7 March 2022 (09:43:08 CET)
During the past few years, various algorithms have been developed to extract features from high-resolution satellite imagery, and several complex algorithms have been developed to classify these extracted features. However, these algorithms lack the critical refining stages that process the data in the preliminary phase. Various satellite sensors have been launched, such as LISS3, IKONOS, QUICKBIRD, and WORLDVIEW. Before classification and extraction of semantic data, high-resolution imagery must be refined. The whole refinement process involves several steps of interaction with the data; these pre-processing algorithms are presented in this paper. Pre-processing steps include geometric correction, radiometric correction, noise removal, and image enhancement. These pre-processing algorithms increase the accuracy of the data, with applications in meteorology, hydrology, soil science, forestry, physical planning, and other fields. This paper also provides a brief description of the local maximum likelihood method, the fuzzy method, the stretch method, and the pre-processing methods used before classifying and extracting features from the image.
ARTICLE | doi:10.20944/preprints201705.0027.v2
Subject: Social Sciences, Geography, Planning And Development Keywords: remote sensing; image registration; multiple image features; different viewpoint; non-rigid distortion
Online: 13 June 2017 (09:52:10 CEST)
Remote sensing image registration plays an important role in military and civilian fields, such as natural disaster damage assessment, military damage assessment, and ground target identification. However, due to ground relief variations and imaging viewpoint changes, non-rigid geometric distortion occurs between remote sensing images with different viewpoints, which further increases the difficulty of remote sensing image registration. To address this problem, we propose a multi-viewpoint remote sensing image registration method with the following contributions. (i) A finite mixture model based on multiple features is constructed for dealing with different types of image features. (ii) Three features are combined and substituted into the mixture model to complement one another: the Euclidean distance and shape context measure the similarity of geometric structure, while the SIFT (scale-invariant feature transform) distance, endowed with intensity information, measures the scale-space extrema. (iii) To prevent an ill-posed problem, a geometric constraint term is introduced into the L2E-based energy function to better constrain the non-rigid transformation. We evaluated the performance of the proposed method on three series of remote sensing images obtained from an unmanned aerial vehicle (UAV) and Google Earth, and compared it with five state-of-the-art methods; our method shows the best alignments in most cases.
ARTICLE | doi:10.20944/preprints202108.0392.v1
Subject: Engineering, Control And Systems Engineering Keywords: image quality assessment; real-time image processing; image functions adaptation; convolutional neural network; face alignment; deep neural network; random forest
Online: 18 August 2021 (17:06:02 CEST)
In recent years, data providers have been generating and streaming a large number of images. In particular, processing images that contain faces has received great attention due to its numerous applications, such as entertainment and social media apps. The enormous number of images shared on these applications presents serious challenges and requires massive computing resources to ensure efficient data processing. However, images are subject to a wide range of distortions in real application scenarios during processing, transmission, sharing, or a combination of many factors, so there is a need to guarantee acceptable delivery content even when distorted images do not have access to their original versions. In this paper, we present a framework developed to estimate image quality while processing a large number of images in real time. Our quality evaluation is measured by integrating a deep network with random forests. In addition, a face alignment metric is used to assess the facial features. Experiments were conducted on two artificially distorted benchmark datasets, LIVE and TID2013. We show that our proposed approach outperforms state-of-the-art methods, achieving a Pearson Correlation Coefficient (PCC) and a Spearman Rank-Order Correlation Coefficient (SROCC) with subjective human scores of almost 0.942 and 0.931, respectively, while reducing the processing time from 4.8 ms to 1.8 ms.
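The two evaluation measures quoted above have standard definitions: PCC is the Pearson correlation between predicted and subjective scores, and SROCC is the Pearson correlation of their ranks. A minimal sketch (ignoring tie handling in the ranks, which the standard SROCC formula averages):

```python
import numpy as np

def pcc(x, y):
    """Pearson linear correlation between predicted and subjective scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the
    ranks, so any monotonic relation scores 1 (ties not handled)."""
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pcc(rank(x), rank(y))
```

A monotonic but nonlinear relation (e.g. y = x²) gives SROCC = 1 while PCC stays below 1, which is why quality-assessment papers report both.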
ARTICLE | doi:10.20944/preprints202311.0161.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; skin cancer; image augmentation; GAN; geometric augmentation; image classification; interpretable technique
Online: 2 November 2023 (10:52:57 CET)
This research paper presents a deep learning approach to early detection of skin cancer using image augmentation techniques. The authors propose a two-stage image augmentation technique that involves the use of geometric augmentation and generative adversarial network (GAN) to classify skin lesions as either benign or malignant. This research utilized the public HAM10000 dataset to test the proposed model. Several pre-trained models of CNN were employed, namely Xception, Inceptionv3, Resnet152v2, EfficientnetB7, InceptionresnetV2, and VGG19. Our approach achieved accuracy, precision, recall, and F1-score of 96.90%, 97.07%, 96.87%, 96.97%, respectively, which is higher than the performance achieved by other state-of-the-art methods. The paper also discusses the use of SHapley Additive exPlanations (SHAP), an interpretable technique for skin cancer diagnosis, which can help clinicians understand the reasoning behind the diagnosis and improve trust in the system. Overall, the proposed method presents a promising approach to automated skin cancer detection that could improve patient outcomes and reduce healthcare costs.
ARTICLE | doi:10.20944/preprints202310.0524.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep learning; image representation learning; self-supervised learning; masked image modeling; contrastive learning
Online: 9 October 2023 (12:52:30 CEST)
Self-supervised learning is a method that learns general representations from unlabeled data. Masked image modeling (MIM), one of the generative self-supervised learning methods, has drawn attention by showing state-of-the-art performance on various downstream tasks, though it shows poor linear separability resulting from its token-level approach. In this paper, we propose a contrastive learning-based multi-view masked autoencoder for MIM, exploiting an image-level approach by learning common features from two differently augmented views. We strengthen MIM by learning long-range global patterns through a contrastive loss. Our framework adopts a simple encoder-decoder architecture and learns rich and general representations through a simple process: 1) two different views are generated from an input image with random masking, and the contrastive loss lets us learn the semantic distance between the representations produced by the encoder; applying a high mask ratio of 80% acts as strong augmentation and alleviates the representation collapse problem; 2) with a reconstruction loss, the decoder learns to reconstruct the original image from the masked image. We assess our framework through several experiments on benchmark datasets for image classification, object detection, and semantic segmentation. We achieve 84.3% fine-tuning accuracy on ImageNet-1K classification and 76.7% in linear probing, exceeding previous studies, and show promising results on other downstream tasks. The experimental results demonstrate that our work can learn rich and general image representations by applying a contrastive loss to masked image modeling.
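The high-mask-ratio masking step described above can be sketched independently of any model: split the image into non-overlapping patches and zero out a random 80% of them. This is only an illustration of the masking operation, not the paper's pipeline; the function name and zero-fill choice are assumptions:

```python
import numpy as np

def random_mask_patches(img, patch=16, mask_ratio=0.8, seed=0):
    """Zero out a random mask_ratio fraction of non-overlapping
    patch x patch blocks, as in high-mask-ratio masked image modeling."""
    h, w = img.shape[:2]
    gh, gw = h // patch, w // patch
    n = gh * gw
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    out = img.copy()
    for idx in masked_idx:
        r, c = divmod(idx, gw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out, masked_idx
```

Two calls with different seeds produce the two masked views whose encoder representations the contrastive loss then pulls together.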
REVIEW | doi:10.20944/preprints202309.2137.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Medical image analysis; Medical image data; Deep learning; Computer vision techniques; Optimisation methods
Online: 30 September 2023 (17:58:32 CEST)
Medical image analysis is an important branch in the field of medicine, which mainly uses image processing and analysis techniques to interpret and diagnose medical image data. Medical image data helps doctors to effectively observe and diagnose patients' body structures, tissues and lesions. Medical image analysis has been an important research area in the medical field, and it is important for disease diagnosis, treatment planning, and condition monitoring. In recent years, the rapid development of deep learning and computer vision technologies has contributed greatly to the automation, multimodal data fusion, real-time application, and accuracy improvement of medical image analysis. In addition, the development of deep learning has given rise to some new research areas in medical image analysis, such as Generative Adversarial Networks (GANs) for synthetic medical images, self-supervised learning for unsupervised feature learning, and neural network interpretability. In this paper, we will introduce some optimisation methods for medical images which are effective in improving the accuracy, efficiency and reliability of medical image analysis.
ARTICLE | doi:10.20944/preprints202306.0922.v1
Subject: Computer Science And Mathematics, Signal Processing Keywords: Multimodality medical image; Image fusion; Sparse representation (SR); Kronecker criterion; Activity level measure
Online: 13 June 2023 (10:09:15 CEST)
Multimodal medical image fusion is a fundamental but challenging problem in brain science research and brain disease diagnosis, and it is difficult for sparse representation (SR)-based fusion to characterize activity level with a single measurement and no loss of effective information. In this paper, the Kronecker-criterion-based SR framework is applied to medical image fusion with a patch-based activity level that integrates salient features from multiple domains. Inspired by the formation process of the vision system, spatial saliency is characterized by textural contrast (TC), composed of luminance and orientation contrasts, to promote more highlighted texture information in the fusion process. As a substitute for the conventional l1-norm-based sparse saliency, a metric of the sum of sparse salient features (SSSF) is used to promote more significant coefficients in the composition of the activity level measure. The designed activity level measure is verified to be more conducive to maintaining the integrity and sharpness of detailed information. Various experiments on multiple groups of clinical medical images verify the effectiveness of the proposed fusion method in terms of both visual quality and objective assessment. Furthermore, this work is helpful for further detection and segmentation of medical images.
ARTICLE | doi:10.20944/preprints202303.0319.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: industrial image processing; feature amplification; image transformation strategy; text detection; Probabilistic Hough Transform
Online: 17 March 2023 (09:05:54 CET)
Industrial nameplates serve as a means of conveying critical information and parameters. In this work, we propose a novel approach for rectifying pictures of industrial nameplates using the probabilistic Hough transform. Our method effectively corrects distortion and clipping, and we provide a collection of challenging nameplate pictures for analysis. To determine the corners of the nameplate, we employ a progressive probabilistic Hough transform, which not only enhances detection accuracy but can also handle complex industrial scenarios. The result of our approach is clear and readable nameplate text, as demonstrated through experiments that show improved model identification accuracy compared to other methods.
CONCEPT PAPER | doi:10.20944/preprints202204.0129.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Digital Design; Digital Architecture; Image Processing; Machine learning; FPGA; Dedicated Design; Image Processor
Online: 14 April 2022 (05:09:47 CEST)
Many dedicated designs for real-time operations provide functionality on fixed-size operators, but where speed, scalability, and flexibility are required, extensive research is demanded. Dedicated designs can provide real-time processing for many applications. This paper presents an FPGA-based design of a general image processor based on a fixed-point representation of binary numbers. The proposed design provides a mechanism to manage matrices on-chip along with matrix arithmetic. The matrices are represented with simple identifiers and microinstructions that assist in the computation of many operations useful for solving complex problems. The design was successfully implemented and tested using the VHDL language. The proposed design is an efficient architecture for a standalone processor with all the embedded computational resources necessary for an embedded image processing application.
ARTICLE | doi:10.20944/preprints202001.0205.v1
Subject: Social Sciences, Behavior Sciences Keywords: itch; scratch; automated real-time detection; machine-learning based image classifier; image sharpness
Online: 19 January 2020 (03:13:48 CET)
A 'little brother' of pain, itch is an unpleasant sensation that creates a specific urge to scratch. To date, various machine-learning based image classifiers (MBICs) have been proposed for quantitative analysis of the itch-induced scratch behaviour of laboratory animals in an automated, non-invasive, inexpensive, and real-time manner. In spite of MBICs' advantages, the overall performance (accuracy, sensitivity, and specificity) of current MBIC approaches remains inconsistent, with values varying from ~50% to ~99%, and the underlying reasons have yet to be investigated further, both computationally and experimentally. To look into this variation in the performance of MBICs in automated detection of itch-induced scratch, this article focuses on the experimental data recording step, and reports for the first time that MBICs' overall performance is inextricably linked to the sharpness of the experimentally recorded video of laboratory animal scratch behaviour. This article furthermore demonstrates for the first time that a linear correlation exists between video sharpness and the overall performance (accuracy and specificity, but not sensitivity) of MBICs, and highlights the primary role of experimental data recording in rapid, accurate, and consistent quantitative assessment of laboratory animal itch.
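The abstract does not say which sharpness measure was used; a common no-reference choice for video frames is the variance of the Laplacian, sketched below purely as an illustration of how frame sharpness can be scored (the wrap-around boundary handling via np.roll is a simplification):

```python
import numpy as np

def laplacian_variance(img):
    """A common no-reference sharpness score: variance of the
    4-neighbour Laplacian. Blurry frames have weak second
    derivatives everywhere, hence a low variance."""
    img = np.asarray(img, float)
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return float(lap.var())
```

Ranking recorded videos by a score like this is one way to test the article's claim that classifier performance tracks recording sharpness.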
ARTICLE | doi:10.20944/preprints201911.0218.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Landsat; Google Earth; water index; unsupervised image classification; supervised image classification; Kappa coefficient
Online: 19 November 2019 (03:10:17 CET)
To address three important issues related to the extraction of water features from Landsat imagery, i.e., the selection of water indexes, the selection of classification algorithms, and the collection of ground truth data for accuracy assessment, this study applied four sets (ultra-blue, blue, green, and red light based) of water indexes (NDWI, MNDWI, MNDWI2, AWEIns, and AWEIs) combined with three image classification methods (zero-water-index threshold, Otsu, and kNN) to 24 selected lakes across the globe to extract water features from Landsat-8 OLI imagery. The 1,440 (4 × 5 × 3 × 24) image classification results were compared with water features extracted from high-resolution Google Earth images with the same (or ±1 day) acquisition dates by computing Kappa coefficients. Results show that the kNN method is better than the Otsu method, and the Otsu method is better than the zero-water-index threshold method. If computational cost is not an issue, the kNN method combined with the ultra-blue light based AWEIns is the best method for extracting water features from Landsat imagery because it produced the highest Kappa coefficients; if computational cost is taken into account, the Otsu method is a good choice. AWEIns and AWEIs are better than NDWI, MNDWI, and MNDWI2. AWEIns works better than AWEIs under the Otsu method, and the average rank of image classification accuracy from high to low is the ultra-blue, blue, green, and red light based AWEIns.
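Two of the building blocks named above have compact standard definitions: the green-light NDWI, (Green − NIR)/(Green + NIR), and Otsu's histogram threshold, which maximizes between-class variance. A minimal sketch (illustrative only; band scaling and the paper's ultra-blue variants are not reproduced):

```python
import numpy as np

def ndwi(green, nir):
    """NDWI = (Green - NIR) / (Green + NIR); water pixels trend
    positive because water reflects green but absorbs NIR."""
    g, n = np.asarray(green, float), np.asarray(nir, float)
    return (g - n) / (g + n + 1e-12)

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the histogram threshold that maximizes
    the between-class variance of the two resulting classes."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # class-0 weight per threshold
    m = np.cumsum(p * centers)        # class-0 cumulative mean mass
    mt = m[-1]                        # global mean
    w1 = 1 - w0
    valid = (w0 > 0) & (w1 > 0)
    var_b = np.zeros_like(w0)
    var_b[valid] = (mt * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(var_b)]
```

Classifying a scene then reduces to `ndwi(g, n) > otsu_threshold(ndwi(g, n).ravel())`, which is the "Otsu" branch of the study's comparison; the zero-threshold branch simply uses 0 instead.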
REVIEW | doi:10.20944/preprints202308.0657.v2
Subject: Physical Sciences, Radiation And Radiography Keywords: image quality; interventional radiology; pediatrics
Online: 30 August 2023 (04:05:02 CEST)
Pediatric interventional cardiology procedures are essential in diagnosing and treating congenital heart disease in children; however, they raise concerns about potential radiation exposure. Managing radiation doses and assessing image quality in angiographs becomes imperative for safe and effective interventions. This systematic review aims to comprehensively analyze the current understanding of physical image quality metrics relevant for characterizing X-ray systems used in fluoroscopy-guided pediatric cardiac interventional procedures, considering the main factors reported in the literature that influence this outcome. A search in Scopus and Web of Science, using relevant keywords and inclusion/exclusion criteria, yielded fourteen relevant articles published between 2000 and 2022. The physical image quality metrics reported were noise, signal-to-noise ratio, contrast, contrast-to-noise ratio, and high contrast spatial resolution. Various factors influencing image quality were investigated, such as polymethyl methacrylate thickness (often used to simulate water equivalent tissue thickness), operation mode, anti-scatter grid presence and tube voltage. Objective evaluations using these metrics ensure impartial assessments for main factors affecting image quality, improving in the characterization fluoroscopic X-ray systems, and aiding informed decisions to safeguard pediatric patients during procedures.
COMMUNICATION | doi:10.20944/preprints202306.0492.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: strongest activations; image complexity; convolution
Online: 7 June 2023 (05:42:05 CEST)
Neural networks were treated as black boxes for a long time. Previous works have unearthed what aspects of an image were important for convolutional layers at different positions in the network. This was done using deconvolutional networks. In this paper, we examine how well a convolutional neural network performs when those convolutional layers which are relatively unimportant for a particular image (i.e., the image does not produce one of the strongest activations) are skipped in the training, validating, and testing process.
ARTICLE | doi:10.20944/preprints201906.0166.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: MRI image; Texture Features; GLCM
Online: 18 June 2019 (05:36:29 CEST)
This paper presents a feature vector based on statistical texture analysis of brain tumors in MRI images. The statistical texture features are computed from the GLCM (Gray Level Co-occurrence Matrix) of the brain nodule structure. The brain nodule is segmented using a strips method that implements marker-based watershed image segmentation combined with PSO (Particle Swarm Optimization) and Fuzzy C-means clustering (FCM). The GLCM of the segmented brain image is then computed at four angles: 0°, 45°, 90°, and 135°. For each angular direction, four texture features are calculated: correlation, energy, contrast, and homogeneity. Texture analysis has been applied to many types of images over past years; here, the proposed statistical texture features are computed for iterative image segmentation. The results show that MRI images can be used in a brain cancer detection system.
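As a sketch of the feature computation described above: a co-occurrence matrix counts how often gray-level pairs occur at a given pixel offset (one offset per angle), and the texture features are sums over that matrix. The code below is a minimal pure-Python illustration, not the paper's implementation; the offsets listed for the four angles are one common convention, and correlation, which additionally needs row/column means and standard deviations, is omitted for brevity.

```python
# offsets (dx, dy) for the four GLCM angles, one common convention:
ANGLES = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

def glcm(img, dx, dy, levels):
    """Normalized Gray Level Co-occurrence Matrix for one pixel offset."""
    h, w = len(img), len(img[0])
    P = [[0.0] * levels for _ in range(levels)]
    n = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                P[img[y][x]][img[y2][x2]] += 1
                n += 1
    return [[v / n for v in row] for row in P]

def texture_features(P):
    """Contrast, energy, and homogeneity of a normalized GLCM."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            contrast += (i - j) ** 2 * p       # penalizes dissimilar pairs
            energy += p * p                    # high for uniform textures
            homogeneity += p / (1 + abs(i - j))
    return contrast, energy, homogeneity
```

Concatenating the features from the four angular offsets yields the kind of texture feature vector the abstract describes.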
ARTICLE | doi:10.20944/preprints202309.0762.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image processing; image analysis; deep learning; roof structure extraction; roof vectorization; frame field learning
Online: 12 September 2023 (08:36:45 CEST)
A topic of growing interest in urban remote sensing is the automated extraction of geometrical building information for 3D city modeling. Roof geometry information is useful for applications such as urban planning, solar potential estimation, telecommunication installation planning, and wind flow simulations for pollutant diffusion analysis. Recent research has shown that advances in remote sensing technologies and deep learning methods offer the prospect of deriving roof structure information accurately and efficiently. In this study, we propose a Vectorized Roof Extractor method based on Fully Convolutional Networks (FCNs) and an advanced polygonization method to extract roof structure, in a regularized vector format, from aerial imagery and a normalized Digital Surface Model (nDSM). The roof structure consists of building outlines (the external edges of the building roof) and inner rooflines (the internal intersections of the main roof planes). The methodology comprises segmentation, vectorization, and post-processing for both outlines and inner rooflines. For comparison, we adapt the Frame Field Learning (FFL) method, originally designed to extract building polygons. Our experiments are conducted on a custom dataset derived for the city of Enschede, The Netherlands, using aerial imagery, an nDSM, and manually digitized training polygons. The results show that the proposed Vectorized Roof Extractor outperformed the adapted FFL on PoLiS distance, with values of 3.5 m and 1.2 m for outlines and inner rooflines, respectively. Furthermore, the model surpassed the adapted FFL on the PoLiS-thresholded F-score for outlines and inner rooflines, with scores of 0.31 and 0.57, respectively. The Vectorized Roof Extractor produced adequate visual results, with straighter walls and fewer missed inner-roofline detections, and it can predict buildings with common walls thanks to skeleton graph computation.
To summarize, the proposed method is suitable for urban applications and has the potential to be improved further.
ARTICLE | doi:10.20944/preprints201810.0534.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: non-destructive testing; process optimization; porosity; pore hotspots; image-based simulations; 3D image analysis
Online: 23 October 2018 (09:58:18 CEST)
This paper presents the latest developments in microCT, both globally and locally, for supporting the additive manufacturing industry. There are a number of recently developed capabilities which are especially relevant to the non-destructive quality inspection of additively manufactured parts, and to advanced process optimization. These new capabilities are all locally available but not yet utilized to their full potential, most likely due to a lack of knowledge of them. The aim of this paper is therefore to fill this gap and provide an overview of these latest capabilities, showcasing numerous local examples.
ARTICLE | doi:10.20944/preprints201805.0240.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: background reconstruction; image quality assessment; image dataset; subjective evaluation; perceptual quality; objective quality metric
Online: 17 May 2018 (09:36:33 CEST)
With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating the quality of the reconstructed background images. Though these quality assessment methods have been widely used in the past, their performance in evaluating the perceived quality of the reconstructed background image has not been verified. In this work, we discuss the shortcomings in existing metrics and propose a full reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales using a probability summation model to predict the perceived quality in the reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two different datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with human subjective ratings. The correlation results show that the proposed RBQI outperforms all the existing approaches. Additionally, the constructed datasets and the corresponding subjective scores provide a benchmark to evaluate the performance of future metrics that are developed to evaluate the perceived quality of reconstructed background images.
ARTICLE | doi:10.20944/preprints202109.0295.v1
Subject: Medicine And Pharmacology, Dietetics And Nutrition Keywords: Obesity; Eating Disorder; Body Image; Adolescents.
Online: 16 September 2021 (16:34:57 CEST)
There is growing recognition of the adverse effects of body image dissatisfaction (BID) and eating disorder (ED) symptoms on adolescent health. The aim of this study was to estimate the prevalence of ED symptoms and BID, and their relationship, in adolescents from public schools in Southern Brazil. A total of 782 schoolchildren (male: n=420; female: n=362; age: 15 ± 0.4 years) answered a self-administered questionnaire to collect sociodemographic data. The Children's Figure Rating Scale was adopted to assess body image, and the Eating Attitudes Test (EAT-26) was applied to investigate ED symptoms. Inferential statistics and hierarchical model-controlled logistic regression were used to test associations between variables. Most of the schoolchildren reported being satisfied with their bodies; however, we observed a higher prevalence of dissatisfaction with being overweight among girls and with thinness among boys. Female students and students from schools located in the central area of the city showed higher odds of developing ED symptoms, and the absence of ED symptoms appeared to act as a protective factor against BID in schoolchildren. The results of this study show the need to reflect on the factors that influence the development of ED symptoms and non-acceptance of one's own body in a population concerned with physical appearance.
ARTICLE | doi:10.20944/preprints202109.0285.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: remote sensing; deep learning; image classification
Online: 16 September 2021 (13:38:55 CEST)
Autonomous image recognition has numerous potential applications in the field of planetary science and geology. For instance, having the ability to classify images of rocks would allow geologists to have immediate feedback without having to bring samples back to the laboratory. Also, planetary rovers could classify rocks in remote places, and even on other planets, without needing human intervention. Shu et al. classified 9 different types of rock images using a Support Vector Machine (SVM) with image features extracted autonomously. Through this method, the authors achieved a test accuracy of 96.71%. In this research, Convolutional Neural Networks (CNN) have been used to classify the same set of rock images. Results show that a 3-layer network obtains an average accuracy of 99.60% across 10 trials on the test set. A version of Self-taught Learning was also implemented to prove the generalizability of the features extracted by the CNN. Finally, one model has been chosen to be deployed on a mobile device to demonstrate practicality and portability. The deployed model achieves perfect classification accuracy on the test set, while taking only 0.068 seconds to make a prediction, equivalent to about 14 frames per second.
CASE REPORT | doi:10.20944/preprints202012.0785.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: built environment; image analysis; remote sensing
Online: 31 December 2020 (09:51:50 CET)
Unmanned satellite space technology is developing rapidly, and the emergence of medium-resolution satellites with sensitive multispectral sensors such as Landsat is very effective for observing environmental change. The purpose of this study is to monitor the development of built-up land using image transformation techniques and to estimate built-up land changes. The research method uses the NDVI, NDBI, and Built-Up Index image transformation techniques, with Landsat satellite image data obtained from USGS. Accuracy sampling was done by purposive sampling with a confusion-matrix accuracy test technique. The results show built-up land growth of 19.25% for the period 2004-2010 and 30.25% for the period 2010-2018. By sub-district, built-up land development was highest in the Kubung area, with 7.20% in the first period and 32.23% in the second. The accuracy of the image analysis, assessed with the confusion-matrix technique on a field sample of 185 points, was 86.04%.
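The transformations named above are per-pixel band ratios, and the confusion-matrix accuracy is the share of correctly classified samples. A minimal pure-Python sketch, where the function names are ours and the Built-Up Index is shown in its common NDBI minus NDVI form (an assumption, since the abstract does not give the formula):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: vegetation tends to be positive."""
    return (nir - red) / (nir + red)

def ndbi(swir, nir):
    """Normalized Difference Built-up Index: built-up areas tend to be positive."""
    return (swir - nir) / (swir + nir)

def built_up_index(swir, nir, red):
    """BUI = NDBI - NDVI: emphasizes built-up land over vegetation."""
    return ndbi(swir, nir) - ndvi(nir, red)

def overall_accuracy(confusion):
    """Overall accuracy from a square confusion matrix: diagonal / total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total
```

In a purposive-sampling accuracy test like the one above, the confusion matrix rows would be the classified labels and the columns the field-observed labels for the sampled points.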
ARTICLE | doi:10.20944/preprints202012.0727.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: city marketing; sustainable development; resilience; image
Online: 29 December 2020 (11:24:13 CET)
The focus of this study is to identify whether resilience and sustainable development can be used as an image in the strategic planning of city marketing. Resilience is about building and planning future-proof cities, so that urban challenges and crises have the lowest possible impact and the city has the maximum capacity to bounce back and evolve; it is part of sustainable development. It is therefore important for decision-makers to define the mission of their strategic planning holistically, taking into consideration the basic assets of a city (the environment, the economy, and the society), how these can be combined to market the city, and the internal and external environment. Over the past few years, city marketing has become an important tool for urban development. The main goal is to show how city marketing can be applied to a city that tries to become more resilient and more sustainable by using strategic urban planning to set the vision, identify the challenges and problematic areas, and set new goals and objectives in order to plan and build for the complexity of a future-proof urban system. To answer the questions of this article we use two case studies, Rotterdam (Netherlands) and Thessaloniki (Greece), combining a literature review and existing research with a benchmarking of their resilience strategies, as both cities are members of the Resilient Cities Network. From different perspectives on resilient thinking, both cities have managed to use resilience as a marketing image for further sustainable development.
ARTICLE | doi:10.20944/preprints201910.0188.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: digital watermarking; multiple image; transform domain
Online: 17 October 2019 (08:48:19 CEST)
In this paper, a technique of image watermarking using multiple images as watermarks is presented. The technique is based on transform-domain functions, namely the discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD), with an image as the host signal, i.e., the watermarks serve as proofs of the authenticity of the host image. The technique is executed by performing multilevel DWT followed by applying DCT and SVD to both the host and the watermark. Multiple watermarks are used to ensure a better security level. The scheme is immune to common image processing operations and some attacks, and exhibits a PSNR of 108.3781 dB, normalized cross-correlation (NCC) over 0.99, and normalized correlation (NC) over 0.99.
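The SVD stage of such schemes typically embeds the watermark by perturbing the host block's singular values. The sketch below (with numpy) illustrates only that stage, with an embedding strength `alpha` of our choosing; the paper's full pipeline additionally applies multilevel DWT and DCT before the SVD, which is omitted here.

```python
import numpy as np

def svd_embed(host, watermark, alpha=0.05):
    """Embed: add alpha * watermark to the host block's singular values."""
    U, S, Vt = np.linalg.svd(host, full_matrices=False)
    marked = U @ np.diag(S + alpha * watermark) @ Vt
    return marked, S  # keep S for (non-blind) extraction

def svd_extract(marked, original_S, alpha=0.05):
    """Extract: recover the watermark from the change in singular values."""
    S_marked = np.linalg.svd(marked, compute_uv=False)
    return (S_marked - original_S) / alpha
```

This works as long as the perturbed singular values stay positive and in descending order, which a small `alpha` relative to the gaps between singular values ensures.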
ARTICLE | doi:10.20944/preprints201906.0215.v1
Subject: Social Sciences, Education Keywords: addiction; triathletes; body image; behavior regulation
Online: 21 June 2019 (11:36:23 CEST)
The aim of the research was to determine the risk of exercise dependence in individual-sport athletes and its relationship with body dissatisfaction and motivation. A total of 225 triathletes, swimmers, cyclists, and athletes aged 18 to 63 took part in the research, of whom 145 were men (M = 35.57 ± 10.46 years) and 80 were women (M = 32.83 ± 10.31 years). The EDS-R was used to assess exercise dependence, the BSQ to assess body dissatisfaction, the BREQ-3 to assess participants' motivation, and the BIAQ to analyze body-image avoidance behaviors. The results show that 8.5% of the subjects were at risk of exercise dependence and that 18.2% tended toward body dissatisfaction, with no significant differences by sport practiced. However, there were significant differences in exercise dependence (15% vs. 4.8%) and body dissatisfaction (31.1% vs. 11%) in relation to sex, with the higher percentages among women. Introjected regulation and food-restriction behavior were the predictor variables of exercise dependence and body dissatisfaction.
REVIEW | doi:10.20944/preprints201903.0095.v1
Subject: Biology And Life Sciences, Biophysics Keywords: Striated Muscle, image reconstruction, muscle physiology
Online: 7 March 2019 (12:42:36 CET)
Much has been learned about the interaction between myosin and actin through biochemistry, in vitro motility assays and cryo-electron microscopy of F-actin decorated with myosin heads. Comparatively less is known about actin-myosin interactions within the filament lattice of muscle, where myosin heads function as independent force generators and thus most measurements report an average signal from multiple biochemical and mechanical states. All of the 3-D imaging by electron microscopy that has revealed the interplay of the regular array of actin subunits and myosin heads within the filament lattice has been accomplished using the flight muscle of the large waterbug Lethocerus sp. Lethocerus flight muscle possesses a particularly favorable filament arrangement that enables all the myosin cross-bridges contacting the actin filament to be visualized in a thin section. This review covers the history of this effort and the progress toward visualizing the complex set of conformational changes that myosin heads make when binding to actin in several static states as well as fast frozen actively contracting muscle. The efforts have revealed a consistent pattern of changes to the myosin head structures determined by X-ray crystallography needed to explain the structure of the different acto-myosin interactions observed in situ.
ARTICLE | doi:10.20944/preprints201811.0028.v1
Subject: Business, Economics And Management, Business And Management Keywords: ISO; social responsibility; image; profitability; SMEs
Online: 2 November 2018 (06:53:35 CET)
At present, business strategies are crucial for SMEs (small and medium enterprises) to consolidate in highly competitive markets, achieve a better image, and improve profitability. Among the most successful strategies are sustainable practices and social responsibility standards such as ISO 14001 and ISO 26001. The literature on sustainable business is based mainly on the theory of resources and capabilities and on stakeholder theory. These currents state that companies should focus on profitable strategies to ensure significant, long-term results, in order to achieve organizational and financial outcomes for stakeholders. In this work, the sample consists of 215 companies from the commerce, services, and industry sectors, located in the southern region of the State of Sonora in Mexico. The objective of the work is to analyze the influence of the ISO 14001 and 26001 standards on the image and profitability of SMEs. The statistical analysis of the data was carried out using OLS (ordinary least squares) linear regression. The findings show that the ISO 14001 standard has the greatest influence on improving business image and the level of profitability of the SME. In addition, we found that ISO 26001 has a partial influence on the image and profitability of the SME.
ARTICLE | doi:10.20944/preprints201810.0305.v1
Subject: Business, Economics And Management, Marketing Keywords: sustainable banking; corporate image; bank loyalty
Online: 15 October 2018 (11:49:29 CEST)
As the demand for a more sustainable society increases, adopting a sustainable banking approach serves as a competitive advantage for banks focused on attaining bank loyalty. This study revolves around understanding the role of sustainable banking practices in bank loyalty, while exploring the mediating effect of corporate image in the relationship between sustainable banking practices and bank loyalty. Data from 511 customers of the banking sector were used for this study. Results from structural equation modeling show that sustainable banking practices positively and directly affect bank loyalty and corporate image; corporate image directly and positively affects bank loyalty and also mediates the relationship between sustainable banking practices and bank loyalty.
ARTICLE | doi:10.20944/preprints201802.0103.v1
Subject: Environmental And Earth Sciences, Space And Planetary Science Keywords: Cloud detection; Deep learning; Image Compression.
Online: 15 February 2018 (16:49:55 CET)
An effective on-board cloud detection method in small satellites would greatly improve downlink data transmission efficiency and reduce memory cost. In this paper, an ensemble method combining a lightweight U-Net with wavelet image compression is proposed and evaluated. The red, green, blue, and infrared waveband images from the Landsat-8 dataset are used for training and testing to estimate the performance of the proposed method. The LeGall 5/3 wavelet transform is applied to the dataset to accelerate the neural network and improve the feasibility of on-board implementation. The experimental results illustrate that the overall accuracy of the proposed model reaches 97.45% using only four bands. Tests on the low coefficients of the compressed dataset show that the overall accuracy of the proposed method remains higher than 95%, while its inference speed is accelerated to 0.055 seconds per million pixels and the maximum memory cost is reduced to 2 MB. By taking advantage of the mature image compression systems in small satellites, the proposed method offers a good possibility of on-board cloud detection based on deep learning.
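The LeGall 5/3 transform is a two-step lifting scheme that maps integers to integers reversibly, which is what makes it attractive for on-board compression. A minimal pure-Python sketch of the 1-D forward and inverse steps for even-length signals, using the usual symmetric boundary extension; this is our illustration, not the paper's code:

```python
def legall53_forward(x):
    """Reversible LeGall 5/3 lifting (as in JPEG 2000): returns (approx, detail)."""
    n = len(x)
    assert n % 2 == 0 and n >= 2
    def X(i):  # whole-sample symmetric extension of the input
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * n - 2 - i
        return x[i]
    half = n // 2
    # predict step: details from odd samples
    d = [X(2 * i + 1) - (X(2 * i) + X(2 * i + 2)) // 2 for i in range(half)]
    D = lambda i: d[max(0, min(i, half - 1))]  # boundary extension of details
    # update step: approximations from even samples
    s = [X(2 * i) + (D(i - 1) + D(i) + 2) // 4 for i in range(half)]
    return s, d

def legall53_inverse(s, d):
    """Exact inverse of legall53_forward."""
    half = len(s)
    D = lambda i: d[max(0, min(i, half - 1))]
    even = [s[i] - (D(i - 1) + D(i) + 2) // 4 for i in range(half)]
    E = lambda i: even[min(i, half - 1)]  # mirrors x[n] back onto x[n-2]
    odd = [d[i] + (E(i) + E(i + 1)) // 2 for i in range(half)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

Applying the transform separably along rows and columns gives the 2-D subbands; feeding the network the low-frequency coefficients, as the paper does, shrinks the input without a separate decompression pass.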
ARTICLE | doi:10.20944/preprints202204.0163.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: artificial intelligence; deep learning; image-to-image translation; dual-energy computed tomography; pulmonary embolism; emergency radiology
Online: 18 April 2022 (09:45:00 CEST)
Detector-based spectral CT offers the possibility of obtaining spectral information from which discrete acquisitions at different energy levels can be derived, yielding so-called virtual monoenergetic images (VMI). In this study, we aimed to develop a jointly optimized deep learning framework based on dual-energy CT pulmonary angiography (DE-CTPA) data to generate synthetic monoenergetic images (SMI) for improving automatic pulmonary embolism (PE) detection in single-energy CTPA scans. For this purpose, we used two data sets: our institutional DE-CTPA data set D1 comprising polyenergetic arterial series and the corresponding VMI at low-energy levels (40 keV) with 7,892 image pairs, and a 10% subset of the 2020 RSNA Pulmonary Embolism Detection Challenge data set D2, which consisted of 161,253 polyenergetic images with dichotomous slice-wise annotations (PE/no PE). We trained a fully convolutional encoder-decoder on D1 to generate SMI from single-energy CTPA scans of D2, which were then fed into a ResNet50 network for training of the downstream PE classification task. The quantitative results on the reconstruction ability of our framework revealed high-quality visual SMI predictions with reconstruction results of 0.984 ± 0.002 (structural similarity) and 41.706 ± 0.547 dB (peak-signal-to-noise ratio). PE classification resulted in an AUC of 0.84 for our model, which achieved improved performance compared to other naive approaches with AUCs up to 0.81. Our study stresses the role of using joint optimization strategies for deep learning algorithms to improve automatic PE detection. The proposed pipeline may prove to be beneficial for computer-aided detection systems and could help rescue CTPA studies with suboptimal opacification of the pulmonary arteries from single-energy CT scanners.
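Of the two reconstruction metrics reported, PSNR is the simpler: the log-scaled ratio of the squared signal peak to the mean squared error between the two images. A minimal pure-Python sketch (SSIM, which compares local luminance, contrast, and structure statistics, is more involved and omitted here):

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    n = 0
    sq_err = 0.0
    for row_r, row_t in zip(ref, test):
        for r, t in zip(row_r, row_t):
            sq_err += (r - t) ** 2
            n += 1
    mse = sq_err / n
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

For CT-derived images the peak would be set to the dynamic range of the intensity scale in use rather than 255; values above roughly 40 dB, as reported in the study, indicate a very close reconstruction.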
ARTICLE | doi:10.20944/preprints202105.0605.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: deep learning; computed tomography; image classification; COVID-19; medical image analysis; pneumonia; CNN; LSTM; medical diagnosis
Online: 25 May 2021 (10:32:29 CEST)
Advancements in deep learning and the availability of medical imaging data have led to the use of CNN-based architectures in disease diagnosis assistance systems. In spite of the abundant use of reverse transcription-polymerase chain reaction (RT-PCR) tests in COVID-19 diagnosis, CT images offer a useful supplement given their high sensitivity. Here, we study the classification of COVID-19 pneumonia (CP) and non-COVID-19 pneumonia (NCP) in chest CT scans using efficient deep learning methods that can be readily implemented by any hospital. We report our deep network framework design, which encompasses Convolutional Neural Networks (CNNs) and bidirectional Long Short-Term Memory (biLSTM) architectures. Our study achieved high specificity (CP: 98.3%, NCP: 96.2%, Healthy: 89.3%) and high sensitivity (CP: 84.0%, NCP: 93.9%, Healthy: 94.9%) in classifying COVID-19 pneumonia, non-COVID-19 pneumonia, and healthy patients. Next, we provide visual explanations for the CNN predictions with gradient-weighted class activation mapping (Grad-CAM). The results provide model explainability by showing that Ground Glass Opacities (GGO), indicators of COVID-19 pneumonia, were captured by our CNN network. Finally, we have implemented our approach in three hospitals, demonstrating its compatibility and efficiency.
ARTICLE | doi:10.20944/preprints202104.0318.v1
Subject: Engineering, Transportation Science And Technology Keywords: Kerr frequency comb; Hilbert transform; integrated optics; all-optical signal processing; image processing; video image processing
Online: 12 April 2021 (14:27:20 CEST)
Advanced image processing will be crucial for emerging technologies such as autonomous driving, where objects must be quickly recognized and classified in real time under rapidly changing, poor-visibility conditions. Photonic technologies will be key for next-generation signal and information processing, due to their wide bandwidths of tens of terahertz and their versatility. Here, we demonstrate broadband real-time analog image and video processing with an ultrahigh-bandwidth photonic processor that is highly versatile and reconfigurable. It is capable of massively parallel processing of over 10,000 video signals simultaneously in real time, performing key functions needed for object recognition, such as edge enhancement and detection. Our system, based on a soliton crystal Kerr optical micro-comb with a 49 GHz spacing and more than 90 wavelengths in the C-band, is highly versatile, performing different functions without changing the physical hardware. These results highlight the potential of photonic processing based on Kerr microcombs for chip-scale, fully programmable, high-speed real-time video processing for next-generation technologies.
ARTICLE | doi:10.20944/preprints201710.0187.v1
Subject: Computer Science And Mathematics, Analysis Keywords: medical image classification; local binary patterns; characteristic curves; whole slide image processing; automated HER2 scoring
Online: 31 October 2017 (03:10:22 CET)
This paper presents novel feature descriptors and classification algorithms for automated scoring of HER2 in Whole Slide Images (WSI) of breast cancer histology slides. Since a large amount of processing is involved in analyzing WSI images, the primary design goal has been to keep the computational complexity to the minimum possible level and to use simple, yet robust feature descriptors that can provide accurate classification of the slides. We propose two types of feature descriptors that encode important information about staining patterns and the percentage of staining present in ImmunoHistoChemistry (IHC) stained slides. The first descriptor is called a characteristic curve which is a smooth non-increasing curve that represents the variation of percentage of staining with saturation levels. The second new descriptor introduced in this paper is an LBP feature curve which is also a non-increasing smooth curve that represents the local texture of the staining patterns. Both descriptors show excellent interclass variance and intraclass correlation, and are suitable for the design of automatic HER2 classification algorithms. This paper gives the detailed theoretical aspects of the feature descriptors and also provides experimental results and comparative analysis.
ARTICLE | doi:10.20944/preprints201710.0181.v1
Subject: Computer Science And Mathematics, Analysis Keywords: ultrasound image analysis; speckle noise; synthetic ultrasound images; texture features; local binary patterns; image quality assessment
Online: 30 October 2017 (09:37:59 CET)
Speckle noise reduction is an important area of research in the field of ultrasound image processing. Several algorithms for speckle noise characterization and analysis have been recently proposed in the area. Synthetic ultrasound images can play a key role in noise evaluation methods as they can be used to generate a variety of speckle noise models under different interpolation and sampling schemes, and can also provide valuable ground truth data for estimating the accuracy of the chosen methods. However, not much work has been done in the area of modelling synthetic ultrasound images, and in simulating speckle noise generation to get images that are as close as possible to real ultrasound images. An important aspect of simulated synthetic ultrasound images is the requirement for extensive quality assessment for ensuring that they have the texture characteristics and gray-tone features of real images. This paper presents texture feature analysis of synthetic ultrasound images using local binary patterns (LBP) and demonstrates the usefulness of a set of LBP features for image quality assessment. Experimental results presented in the paper clearly show how these features could provide an accurate quality metric that correlates very well with subjective evaluations performed by clinical experts.
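The basic LBP operator behind such features thresholds each pixel's eight neighbours against the centre pixel and packs the results into an 8-bit code; a histogram of these codes forms the texture feature vector. A minimal pure-Python sketch of the classic 3x3 operator (the paper may well use rotation-invariant or multi-scale variants):

```python
def lbp_3x3(img):
    """Basic 8-neighbour LBP: one 8-bit code per interior pixel."""
    h, w = len(img), len(img[0])
    # neighbour offsets (dy, dx), clockwise from top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:  # neighbour at least as bright
                    code |= 1 << bit
            out[y - 1][x - 1] = code
    return out

def lbp_histogram(codes):
    """Normalized 256-bin histogram of LBP codes: the texture descriptor."""
    hist = [0.0] * 256
    total = 0
    for row in codes:
        for c in row:
            hist[c] += 1
            total += 1
    return [h / total for h in hist]
```

Comparing such histograms between synthetic and real ultrasound patches is one plain way to quantify whether the simulated speckle reproduces the texture statistics of real images.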
ARTICLE | doi:10.20944/preprints202304.0596.v2
Subject: Engineering, Bioengineering Keywords: Deep Learning; image-to-image translation; dosimetry; cycleGAN; CBCT; CT; limited FOV; artifact correction; Hounsfield unit recovery
Online: 5 June 2023 (09:53:36 CEST)
Radiotherapy commonly utilizes CBCT for patient positioning and treatment monitoring. CBCT is deemed safe for patients, making it suitable for the delivery of fractional doses. However, limitations such as a narrow field of view (FOV), beam hardening, scattered-radiation artifacts, and variability in pixel intensity hinder the direct use of raw CBCT for dose recalculation during treatment. To address this issue, reliable correction techniques are necessary to remove artifacts and remap pixel intensity into HU values. This study proposes a deep-learning framework for calibrating CBCT images acquired with narrow-FOV systems and demonstrates its potential use in proton treatment planning updates. A cycle-consistent GAN processes raw CBCT to reduce scatter and remap HU values. Monte Carlo simulation is used to generate CBCT scans, making it possible to focus solely on the algorithm's ability to reduce artifacts and cupping effects without considering intra-patient longitudinal variability, and producing a fair comparison between planning CT and calibrated CBCT dosimetry. To showcase the viability of the approach with real-world data, experiments were also conducted using real CBCT. Tests were performed on a publicly available dataset of 40 patients who received ablative radiation therapy for pancreatic cancer. The simulated CBCT calibration led to a difference in proton dosimetry of less than 2% compared to the planning CT. The potential toxicity effect on the organs at risk decreased from about 50% (uncalibrated) to about 2% (calibrated). The gamma pass rate at 3%/2 mm improved by about 37 percentage points after calibration (53.78% vs. 90.26%). Real data confirmed this, with slightly inferior performance for the same criteria (65.36% vs. 87.20%).
These results may confirm that generative artificial intelligence brings the use of narrow FOV CBCT scans incrementally closer to clinical translation in proton therapy planning updates.
ARTICLE | doi:10.20944/preprints202206.0384.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning; Smartphone Image; Acne Grading; Acne Object Detection
Online: 28 June 2022 (10:05:25 CEST)
Skin image analysis using artificial intelligence (AI) has recently attracted significant research interest, particularly for analyzing skin images captured by mobile devices. Acne is one of the most common skin conditions, with profound effects in severe cases. In this study, we developed an AI system called AcneDet for automatic acne object detection and acne severity grading using facial images captured by smartphones. AcneDet includes two models for conducting two tasks: (1) a Faster R-CNN-based deep learning model for the detection of acne lesion objects of four types, including blackheads/whiteheads, papules/pustules, nodules/cysts, and acne scars; and (2) a LightGBM machine learning model for grading acne severity using the Investigator's Global Assessment (IGA) scale. The outputs of the Faster R-CNN model, i.e., the counts of each acne type, were used as input for the LightGBM model for acne severity grading. A dataset consisting of 1,572 labeled facial images captured by both iOS and Android smartphones was used for training. The results show that the Faster R-CNN model achieves a mAP of 0.54 for acne object detection, and the mean accuracy of acne severity grading by the LightGBM model is 0.85. With this study, we hope to contribute to the development of AI systems that can help acne patients understand more about their conditions and support doctors in acne diagnosis.
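The two-stage pipeline (detector outputs feeding a severity grader) can be sketched as follows. The paper's grader is a learned LightGBM model; the threshold rule below is an invented stand-in used only to illustrate how per-class lesion counts become the grading features, and all names and cut-offs are hypothetical.

```python
# Hypothetical illustration of the AcneDet pipeline shape: the detector's
# per-class lesion counts become the feature vector for severity grading.
LESION_TYPES = ("blackheads_whiteheads", "papules_pustules",
                "nodules_cysts", "scars")

def counts_to_features(detections):
    """Turn a list of detected (label, score) boxes into a count vector."""
    counts = {t: 0 for t in LESION_TYPES}
    for label, _score in detections:
        counts[label] += 1
    return [counts[t] for t in LESION_TYPES]

def grade_iga(features):
    """Toy IGA-style grade (0-4) from lesion counts; thresholds are invented
    and merely stand in for the learned LightGBM decision rule."""
    comedones, inflamed, nodules, _scars = features
    if nodules > 0 or inflamed > 20:
        return 4
    if inflamed > 10:
        return 3
    if inflamed > 3:
        return 2
    if comedones + inflamed > 0:
        return 1
    return 0
```

In the real system the count vector would be passed to `lightgbm.LGBMClassifier.predict` (or similar) rather than to a hand-written rule.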
ARTICLE | doi:10.20944/preprints201812.0137.v2
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: microscopy, fluorescence, machine learning, deep learning, inverse problems, image reconstruction, image restoration, super-resolution, deconvolution, spectral unmixing
Online: 5 February 2019 (10:30:40 CET)
Deep Learning is a recent and important addition to the computational toolbox available for image reconstruction in fluorescence microscopy. We review state-of-the-art applications such as image restoration, super-resolution, and light-field imaging, and discuss how the latest Deep Learning research can be applied to other image reconstruction tasks such as structured illumination, spectral deconvolution, and sample stabilisation. Despite its successes, Deep Learning also poses significant challenges: its capabilities are often misunderstood and its limits overlooked. We address key questions such as: What are the challenges in obtaining training data? Can we discover structures not present in the training data? And what is the danger of inferring unsubstantiated image details?
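As a concrete baseline for the deconvolution tasks the review discusses, the classical Richardson-Lucy iteration is sketched below in NumPy. This is textbook material, not code from the review; it assumes a known, shift-invariant PSF, non-negative data, and circular boundary conditions.

```python
import numpy as np

def _pad_psf(psf, shape):
    """Embed a small PSF in a full-size array, centred on the origin."""
    kernel = np.zeros(shape)
    kh, kw = psf.shape
    kernel[:kh, :kw] = psf / psf.sum()
    return np.roll(kernel, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def fft_blur(img, psf):
    """Circular convolution of img with psf via the FFT (the forward model)."""
    K = np.fft.rfft2(_pad_psf(psf, img.shape))
    return np.fft.irfft2(np.fft.rfft2(img) * K, s=img.shape)

def richardson_lucy(observed, psf, iters=30):
    """Classical Richardson-Lucy deconvolution: the iterative baseline that
    learned restoration methods are commonly benchmarked against."""
    K = np.fft.rfft2(_pad_psf(psf, observed.shape))
    estimate = np.full(observed.shape, observed.mean())
    eps = 1e-12
    for _ in range(iters):
        blurred = np.fft.irfft2(np.fft.rfft2(estimate) * K, s=observed.shape)
        ratio = observed / (blurred + eps)
        # correlate with the flipped PSF (conjugate in the Fourier domain)
        estimate = estimate * np.fft.irfft2(np.fft.rfft2(ratio) * K.conj(),
                                            s=observed.shape)
    return estimate
```

On noiseless synthetic data the iteration steadily sharpens a blurred point source; with real noisy data it must be stopped early or regularised, which is one motivation for the learned approaches surveyed here.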
ARTICLE | doi:10.20944/preprints201709.0098.v2
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: farming-pasture ecotone; TM image; remote sensing; vegetation cover factor; scale conversion; land use; high resolution image
Online: 21 September 2017 (16:33:49 CEST)
The key to simulating soil erosion is calculating the vegetation cover (C) factor. Methods that apply remote sensing to calculate the C factor at regional scale cannot use the C factor formula directly, because the formula is obtained experimentally and requires coverage-ratio data for croplands, woodlands, and grasslands at the standard plot scale. In this paper, we present a C factor conversion method from a standard plot to a km-sized grid based on large sample theory and multi-scale remote sensing. Results show that: 1) Compared with the existing C factor formula, our method is based on the coverage ratios of croplands, woodlands, and grasslands on a km-sized grid; it takes the C factor formula obtained from standard-plot experiments and applies it at regional scale. This improves the applicability of the C factor formula and can satisfy the need to simulate soil erosion over large areas. 2) The vegetation coverage obtained by remote sensing interpretation is significantly consistent with (paired samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05) and significantly correlated with the measured vegetation coverage. 3) The C factor of the study area is smaller in the middle, southern, and northern regions, and larger in the eastern and western regions, mainly owing to the distribution of woodlands, the Hunshandake and Horqin sandy lands, and the valleys affected by human activities. 4) The method presented in this paper is more meticulous than C factor methods based on the vegetation index, improves the applicability of the C factor formula, and can be used to simulate soil erosion at large scale and provide strong support for regional soil and water conservation planning.
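The scale-conversion idea (aggregating plot-scale C values by land-cover coverage ratios on a km-sized grid) might be sketched as below. The per-type C values and the bare-land treatment are invented placeholders for illustration; the paper's actual formula comes from standard-plot experiments and is not reproduced here.

```python
# Illustrative sketch of the scale-conversion idea: a plot-scale C value per
# land-cover type is aggregated on a km-sized grid cell, weighted by each
# type's coverage ratio. The C values below are invented placeholders.
C_PLOT = {"cropland": 0.35, "woodland": 0.05, "grassland": 0.12}

def grid_c_factor(ratios):
    """Area-weighted C factor for one km grid cell.
    ratios: dict mapping land-cover type to its coverage ratio (sum <= 1);
    any uncovered remainder is treated as bare land with C = 1."""
    covered = sum(ratios.values())
    c = sum(C_PLOT[k] * r for k, r in ratios.items())
    return c + (1.0 - covered) * 1.0   # bare land contributes C = 1
```

A cell that is half cropland and a quarter each woodland and grassland then gets a single C value usable in a regional soil-erosion model.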
ARTICLE | doi:10.20944/preprints202311.0891.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Olive; Color Image; Xception; Sorting; Deep Learning
Online: 14 November 2023 (11:54:41 CET)
Olive fruits at different ripening stages give rise to various table olive products and oil qualities. Therefore, developing an efficient method for recognizing and sorting olive fruits based on their ripening stages can greatly facilitate postharvest processing. This study introduces an automatic computer vision system that utilizes deep learning technology to classify the `Roghani` Iranian olive cultivar into five ripening stages using color images. The developed model employs convolutional neural networks (CNN) and transfer learning based on the Xception architecture and ImageNet weights as the base network. The model was fine-tuned by testing multiple configurations of well-known CNN layers. To minimize overfitting and enhance model generality, data augmentation techniques were employed. By considering different optimizers and two image sizes, four final candidate models were generated. These models were then compared in terms of loss and accuracy on the test dataset, classification performance (classification report and confusion matrix), and generality. All four candidates exhibited high accuracies ranging from 86.93% to 93.46% and comparable classification performance. In all models, at least one class was recognized with 100% accuracy. However, by taking into account the risk of overfitting, two models were discarded. Finally, a model with an image size of 224 × 224 and an SGD optimizer, which had a loss of 1.23 and an accuracy of 86.93%, was selected as the preferred option. The results of this study offer robust tools for automatic olive sorting systems, simplifying the differentiation of olives at various ripening levels for different post-harvest products.
REVIEW | doi:10.20944/preprints202310.0870.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: image inpainting; object removal detection; forensic forgery
Online: 13 October 2023 (08:25:14 CEST)
In recent years, significant advancements in the field of machine learning have influenced the domain of image restoration. While these technological advancements present prospects for improving the quality of images, they also present difficulties, particularly the proliferation of manipulated or counterfeit multimedia information on the internet. The objective of this paper is to provide a comprehensive review of existing inpainting algorithms and forgery detection techniques, with a specific emphasis on methods designed for removing objects from digital images. In this study, we examine various techniques encompassing conventional texture synthesis methods as well as those based on neural networks. Furthermore, we explore the artifacts frequently introduced by the inpainting procedure and assess the state of the art in detecting such modifications. Lastly, we look at the available datasets and how the methods compare with each other. Having covered all of the above, this study ultimately provides a comprehensive perspective on the capabilities and limitations of detecting images to which an inpainting-based object removal method has been applied.
ARTICLE | doi:10.20944/preprints202309.0400.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: pulsar candidate image; lossless compression; PixelCNN; FAST
Online: 6 September 2023 (09:48:00 CEST)
This study focuses on the crucial aspect of lossless compression for FAST pulsar search data. The deep generative model PixelCNN, which stacks multiple masked convolutional layers to achieve neural-network autoregressive modeling, is one of the best-performing image density estimators. However, the local nature of convolutional networks causes PixelCNN to concentrate only on nearby information, neglecting important information at greater distances. Although deepening the network can broaden the receptive field, excessive depth can compromise model stability, leading to issues such as gradient degradation. To address these challenges, this study combines causal attention modules with residual connections, proposing the Causal Residual Attention Module to enhance the PixelCNN model. This innovation not only resolves convergence problems arising from network deepening but also widens the receptive field. It effectively exploits global features, particularly capturing the vertically correlated features prominently present in candidate subgraphs, which significantly enhances its capability to model pulsar data. In the experiments, the model is trained and validated on the HTRU1 dataset. The study compares the average negative log-likelihood score with baseline models such as GMM, STM, and PixelCNN. The results demonstrate the superior performance of our model over the others. Finally, the study introduces the practical compression encoding process by combining the proposed model with arithmetic coding.
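The masked convolutions that give PixelCNN its autoregressive property can be illustrated by the kernel masks themselves. The sketch below builds the standard type-A/type-B raster-scan masks; it shows only this basic mechanism and is not the paper's Causal Residual Attention Module.

```python
import numpy as np

def causal_mask(k, mask_type="B"):
    """Binary mask for a k x k masked convolution kernel (PixelCNN style).
    The mask zeroes every kernel position that would look at pixels not yet
    generated in raster-scan order. Type 'A' (first layer) also hides the
    centre pixel; type 'B' (later layers) keeps it."""
    m = np.ones((k, k), dtype=np.float32)
    c = k // 2
    m[c, c + 1:] = 0          # pixels to the right of the centre
    m[c + 1:, :] = 0          # all rows below the centre
    if mask_type == "A":
        m[c, c] = 0           # first layer must not see the current pixel
    return m
```

Multiplying each convolution kernel by such a mask before applying it is what makes the network a valid density estimator, since each pixel's prediction depends only on previously generated pixels.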
ARTICLE | doi:10.20944/preprints202308.0349.v1
Subject: Social Sciences, Other Keywords: Body dissatisfaction, Body image, Female students, Perfection
Online: 3 August 2023 (14:07:43 CEST)
Many university female students are concerned about their bodies, and body image perception has become a public health issue globally. This study aimed to explore factors contributing to body image dissatisfaction among female students at the University of Venda. The study was qualitative in nature and employed an exploratory research design. A sample of 10 female students enrolled at the University of Venda was identified using a convenience sampling method. A pre-tested, semi-structured interview guide was used to collect data, and a thematic content analysis technique was used to analyse the collected data. The findings of the study showed that body comparison, societal beauty standards, social media, and body shaming by family and friends were the main factors contributing to students' body image dissatisfaction. The findings further revealed that lack of self-confidence, stress, avoidance, anxiety, and depressive symptoms were the challenges faced by students with body image dissatisfaction. Acceptance, self-care, and a healthy diet were identified as coping strategies to help deal with the challenges of students' body image dissatisfaction. In conclusion, students should be encouraged to seek professional help timeously, to help navigate their body image concerns and avoid a decline in their daily functioning.
ARTICLE | doi:10.20944/preprints202306.1593.v1
Subject: Business, Economics And Management, Business And Management Keywords: Ecotourism; birdwatching; destination image; destination marketing; Colombia
Online: 22 June 2023 (10:40:39 CEST)
Colombia is noteworthy as a biodiversity hotspot, featuring an extraordinary number of endemic orchids, birds, and butterflies. This exploratory study examines perceptions of destination image, considering the cognitive and affective image in predicting the behavioral intentions of ecotourists through symmetric data analysis. Using Partial Least Squares (PLS), the author(s) analyzed 64 survey responses collected in rural areas, including a new 15-statement scale specialized for birdwatching. The findings support the reliability of the model; the symmetric analysis shows the stronger influence of emotions and affect in increasing intentions to recommend, considering birdwatching as based on personal relationships. Additionally, the cognitive image for birders, despite representing destination attributes or sets of destination resources in a mental picture, does not have the same impact on behavioral intentions. Therefore, managers should develop positioning strategies based on the generation of emotions, because birdwatching tourists seek more emotional experiences.
ARTICLE | doi:10.20944/preprints202305.0067.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: low light; image enhancement; convolutional neural networks
Online: 2 May 2023 (07:32:56 CEST)
In this study, we explore the potential of using a straightforward neural network inspired by the retina model to efficiently restore low-light images. The retina model imitates the neurophysiological principles and dynamics of various optical neurons. Our proposed neural network model reduces the computational overhead compared to traditional signal-processing models while achieving results similar to complex deep learning models from a subjective perceptual perspective. By directly simulating retinal neuron functionalities with neural networks, we not only avoid manual parameter optimization but also lay the groundwork for constructing artificial versions of specific neurobiological organizations.
ARTICLE | doi:10.20944/preprints202304.0926.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: artificial intelligence; image analysis; visual language model
Online: 25 April 2023 (10:47:52 CEST)
Recent advancements in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), combined with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. One notable model is Visual ChatGPT, which combines ChatGPT's LLM capabilities with visual computation to enable effective image analysis. The model's ability to process images based on textual inputs can revolutionize diverse fields. However, its application in the remote sensing domain remains unexplored. This is the first paper to examine the potential of Visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle the aspects of image processing related to the remote sensing domain. Among its current capabilities, Visual ChatGPT can generate textual descriptions of images, perform Canny edge and straight-line detection, and conduct image segmentation. These capabilities offer valuable insights into image content and facilitate the interpretation and extraction of information. By exploring the applicability of these techniques on publicly available datasets of satellite images, we demonstrate the current model's limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visual models holds significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field.
ARTICLE | doi:10.20944/preprints202302.0097.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Vision loss; Diabetic retinopathy; Image enhancement; APTOS
Online: 6 February 2023 (09:50:58 CET)
Vision loss can be avoided if diabetic retinopathy (DR) is diagnosed and treated promptly. The five main DR stages are: none, mild, moderate, severe, and proliferative. In this study, a deep learning (DL) model is presented that diagnoses all five stages of DR with higher accuracy than previous methods. The suggested method covers two scenarios: case 1, with image enhancement using the contrast limited adaptive histogram equalization (CLAHE) filtering algorithm in conjunction with an Enhanced Super-Resolution Generative Adversarial Network (ESRGAN); and case 2, without image enhancement. Augmentation techniques are then performed to generate a balanced dataset using the same parameters for both cases. Using Inception-V3 applied to the Asia Pacific Tele-Ophthalmology Society (APTOS) dataset, the developed model achieved an accuracy of 98.7% for case 1 and 80.87% for case 2, which is greater than existing methods for detecting the five stages of DR. It was demonstrated that using CLAHE and ESRGAN improves a model's performance and learning ability.
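The enhancement stage in case 1 builds on histogram equalization. As a minimal illustration, the snippet below implements plain global histogram equalization in NumPy; CLAHE additionally operates on local tiles and clips the histogram to limit noise amplification, and ESRGAN is a separate learned super-resolution step not shown here.

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization of an 8-bit grayscale image.
    A simplified stand-in for CLAHE, which additionally works on local
    tiles and clips the histogram before building the mapping."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first non-zero CDF value
    lut = np.clip(np.round((cdf - cdf_min) /
                           (img.size - cdf_min) * 255.0), 0, 255)
    return lut.astype(np.uint8)[img]          # apply lookup table per pixel
```

In practice one would call `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` from OpenCV for the tiled, clipped variant used in the paper.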
COMMUNICATION | doi:10.20944/preprints202302.0003.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: image forensics; camera identification; fingerprint; forgery; PRNU
Online: 1 February 2023 (01:30:04 CET)
In the field of forensic imaging, it is important to be able to extract a “camera fingerprint” from one or a small set of images known to have been taken by the same camera (image sensor). Ideally, that fingerprint can be used to identify an individual source camera. The camera fingerprint is based on a certain kind of random noise, present in all image sensors, that is due to manufacturing imperfections and is thus unique and practically impossible to avoid. PRNU (Photo-Response Non-Uniformity) has become the most widely used method for SCI (Source Camera Identification). In this paper, we design a set of “attacks” on a PRNU-based SCI system and measure the success of each method. We understand an attack method as any processing that minimally alters image quality and is designed to fool PRNU detectors (or, more generally, any camera fingerprint detector). The PRNU-based SCI system was taken from an outstanding reference that is publicly available.
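A minimal sketch of the PRNU workflow described above: a fingerprint is estimated by averaging noise residuals from several images of the same camera, and a query image is matched via normalized cross-correlation. The 3x3 mean filter stands in for the wavelet denoiser used in practice, and all function names are illustrative rather than taken from the referenced system.

```python
import numpy as np

def noise_residual(img):
    """Noise residual = image minus a denoised copy (3x3 mean filter here;
    real PRNU extraction uses a wavelet-based denoiser)."""
    padded = np.pad(img, 1, mode="edge")
    smooth = sum(padded[i:i + img.shape[0], j:j + img.shape[1]]
                 for i in range(3) for j in range(3)) / 9.0
    return img - smooth

def fingerprint(images):
    """Camera fingerprint estimate: average residual over several images,
    so scene content averages out while the sensor pattern persists."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def ncc(a, b):
    """Normalized cross-correlation, the usual detection statistic."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

An attack, in the paper's sense, is any light processing of the query image that pushes this correlation below the detector's decision threshold without visibly degrading the image.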
REVIEW | doi:10.20944/preprints202205.0343.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Depth Completion; Depth Maps; Image-Guidance; Lidar
Online: 25 May 2022 (05:26:16 CEST)
Depth maps produced by LiDAR-based approaches are sparse. Even high-end LiDAR sensors produce highly sparse depth maps, which are also noisy around object boundaries. Depth completion is the task of generating a dense depth map from a sparse one. While traditional approaches focus on directly completing this sparsity from the sparse depth maps, modern techniques use RGB images as a guidance tool, and many others rely on affinity matrices for depth completion. Based on these approaches, we have sub-divided the literature into two major categories: traditional approaches and backbone-based approaches. The latter is further sub-divided into two-branch and spatial propagation approaches, with the two-branch approaches having a further sub-category of guided-kernel approaches. In this paper, we present the first comprehensive survey of depth completion methods. We present a novel taxonomy of depth completion approaches, review and detail different state-of-the-art techniques within each category for depth completion of LiDAR data, and provide quantitative results for the approaches on the KITTI and NYUv2 depth completion benchmark datasets.
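To make the sparse-to-dense task concrete, the sketch below fills invalid pixels by iteratively averaging their valid neighbours. This is a naive, unguided baseline for illustration only; the surveyed methods instead learn the completion with RGB guidance, affinity/spatial propagation, or guided kernels.

```python
import numpy as np

def complete_depth(sparse, iters=50):
    """Naive depth completion: iteratively fill invalid (zero) pixels with
    the mean of their valid 4-neighbours, growing outward from the sparse
    LiDAR measurements until the map is dense."""
    depth = sparse.astype(float).copy()
    for _ in range(iters):
        invalid = depth == 0
        if not invalid.any():
            break
        padded = np.pad(depth, 1)             # zero border = invalid
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        valid = (neigh > 0).sum(axis=0)
        fill = np.where(valid > 0,
                        neigh.sum(axis=0) / np.maximum(valid, 1), 0)
        depth[invalid] = fill[invalid]        # only overwrite invalid pixels
    return depth
```

Such unguided filling blurs depth across object boundaries, which is exactly the failure mode that image-guided and affinity-based methods in this survey are designed to avoid.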