ARTICLE | doi:10.20944/preprints202301.0219.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Large Language Model; Natural Language Processing; Reading Comprehension; Computational linguistics; Information Retrieval; BM25
Online: 12 January 2023 (08:58:03 CET)
Large language models (LLMs) represent a major advancement in AI and have been used in multiple natural language processing tasks. Nevertheless, in different business scenarios, LLMs require fine-tuning by engineers to achieve satisfactory performance, and the cost of fine-tuning may not be matched by the performance gained. Based on the Baidu STI dataset, we study the upper bound of the performance that classical information retrieval methods can achieve in a specific business scenario, and compare it with the cost and performance of a participating team whose approach was based on an LLM. This paper gives insight into the potential of classical computational linguistics algorithms, which can help decision-makers choose reasonably between LLMs and low-cost methods in business R&D.
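BM25, listed in the keywords, is a typical example of the classical retrieval methods studied here. The following is a minimal sketch of Okapi BM25 ranking, assuming lowercase whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and a Lucene-style non-negative IDF. It is not the authors' system, and the function name `bm25_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalisation relative to the average document length.
        length_norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "how to reset a forgotten account password",
        "store opening hours and holiday schedule",
        "password requirements for new accounts",
    ]
    scores = bm25_scores("reset password", docs)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    print(ranking)  # [0, 2, 1]
```

Because BM25 needs only term statistics computed from the corpus, it involves no model training, which is the cost side of the comparison with LLM fine-tuning.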
CASE REPORT | doi:10.20944/preprints201812.0243.v1
Subject: Medicine & Pharmacology, Other Keywords: technique, transplantation, cadaveric, donor, retrieval
Online: 20 December 2018 (09:39:52 CET)
Aim: To describe a technique for retrieving abdominal organs from a cadaveric donor. Methods: Only an inner transverse transection of the abdominal wall layers, sparing the skin, was performed. Results: Good exposure and abdominal closure were achieved, and the liver, both kidneys and spleen were retrieved easily. Conclusion: The technique appears feasible for cadaveric organ retrieval from the abdomen.
ARTICLE | doi:10.20944/preprints201705.0157.v1
Online: 22 May 2017 (05:53:30 CEST)
We present a new approach to retrieve Aerosol Optical Depth (AOD) from the Moderate Resolution Imaging Spectroradiometer (MODIS) over turbid coastal water. This approach supplements the operational Dark Target (DT) aerosol retrieval algorithm, which currently does not perform any AOD retrieval in regions with large water-leaving radiances in the visible spectrum. Over global coastal water regions under all cloud-free conditions, the fraction of AOD retrievals missing because of this inherent limitation of the existing DT algorithm is ~20%. Here, we refine the MODIS DT algorithm by considering that water-leaving radiance at 2.1 μm is negligible regardless of water turbidity. This refinement, together with the assumption that the aerosol single scattering properties over turbid coastal water are similar to those over adjacent open-ocean pixels, yields ~18% more MODIS-AERONET collocated pairs for six AERONET stations in coastal water regions. Furthermore, comparison with these AERONET observations shows that the new AOD retrievals are of equivalent or better accuracy than those retrieved by the MODIS operational algorithm (over coastal land and non-turbid coastal water). Combining the new retrievals with the existing MODIS operational retrievals not only yields an overall improvement of AOD over those coastal water regions, but also extends the spatial and temporal coverage of MODIS AOD retrievals over coastal regions, where 60% of the human population resides and where aerosol impacts on regional air quality and climate are therefore expected to be significant.
ARTICLE | doi:10.20944/preprints201611.0116.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: diversity; algorithms; legal information retrieval
Online: 23 November 2016 (09:50:24 CET)
"Public legal information from all countries and international institutions is part of the common heritage of humanity. Maximizing access to this information promotes justice and the rule of law". In accordance with this declaration on Free Access to Law by the Legal Information Institutes of the world, a plethora of legal information is available through the Internet, and providing legal information has never been easier. Given that law is accessed by a much wider group of people, the majority of whom are not legally trained or qualified, diversification techniques should be employed in legal information retrieval so as to increase user satisfaction. We address diversification of results in legal search by adopting several state-of-the-art methods from the web search, network analysis and text summarization domains. We provide an exhaustive evaluation of the methods, using a standard data set from the Common Law domain that we subjectively annotated with relevance judgments for this purpose. Our results i) reveal that users receive broader insights across the results they get from a legal information retrieval system, ii) demonstrate that web search diversification techniques outperform other approaches (e.g., summarization-based and graph-based methods) in the context of legal diversification, and iii) offer balance boundaries between reinforcing relevant documents and sampling the information space around the legal query.
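Maximal Marginal Relevance (MMR) is one well-known web search diversification method; the abstract does not name the specific methods adopted, so the following is only an illustration of the general idea. It is a minimal greedy sketch assuming precomputed query relevance scores and a pairwise document similarity matrix; the function name `mmr_rerank` and the parameter `lambda_` (weight on relevance versus novelty) are hypothetical.

```python
def mmr_rerank(relevance, similarity, k, lambda_=0.5):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance:  query-document relevance score for each document.
    similarity: square matrix (list of lists) of document-document similarities.
    Returns the indices of up to k selected documents, in selection order.
    """
    candidates = set(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def marginal_score(i):
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(candidates), key=marginal_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    relevance = [0.9, 0.85, 0.5]  # documents 0 and 1 are near-duplicates
    similarity = [[1.0, 0.95, 0.1],
                  [0.95, 1.0, 0.1],
                  [0.1, 0.1, 1.0]]
    print(mmr_rerank(relevance, similarity, k=2))               # [0, 2]
    print(mmr_rerank(relevance, similarity, k=2, lambda_=1.0))  # [0, 1]
```

With `lambda_ = 1` the method reduces to plain relevance ranking; lowering it pushes near-duplicates down, which mirrors the trade-off between reinforcing relevant documents and sampling the information space around the query.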
ARTICLE | doi:10.20944/preprints201906.0257.v1
Subject: Earth Sciences, Atmospheric Science Keywords: AIUS; occultation; retrieval algorithm; microwindows; ozone
Online: 26 June 2019 (05:10:54 CEST)
AIUS (Atmospheric Infrared Ultraspectral Sounder) is an infrared occultation spectrometer onboard the Chinese GaoFen-5 satellite, which covers a spectral range of 2.4–13.3 μm (750–4100 cm−1) with a spectral resolution of about 0.02 cm−1. AIUS was designed to measure and study chemical processes of ozone (O3) and other trace gases in the upper troposphere and stratosphere over the Antarctic. In this study, the corresponding retrieval methodology is described. A comparison between AIUS measurements and simulated spectra shows that the measurements agree well with the simulations. To first evaluate the reliability of our retrieval algorithm, three O3 retrieval experiments are performed based on ACE-FTS observation spectra. A comparison between our retrieved profiles and the official ACE-FTS products shows that the relative difference in these three retrieval experiments is mostly within 10% between 20 km and 70 km. These retrieval experiments demonstrate that the retrieval algorithm described in this study works well and reliably. Furthermore, O3, H2O and HCl profiles are retrieved from eight orbits of AIUS measurements and compared with the official AURA/MLS level-2 v4.2 profiles. These comparisons show that the relative difference is mostly within 10% (about 0.02–0.4 ppm) between 18 and 58 km for O3, within 10% (0–0.5 ppm) between 15 and 80 km for H2O, and within 10% (about 0.1 ppb) between 30 and 60 km for HCl. The trace gas profiles retrieved from AIUS agree well with coincident MLS profiles.
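The agreement figures above are per-altitude relative differences between a retrieved profile and a reference profile (ACE-FTS or MLS). A small sketch of that comparison, assuming the difference is taken relative to the reference and both profiles are already on a common altitude grid; the paper's exact definition may differ, and the helper names and toy values are hypothetical.

```python
import numpy as np


def relative_difference(retrieved, reference):
    """Relative difference in percent: 100 * (retrieved - reference) / reference."""
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * (retrieved - reference) / reference


def fraction_within(altitude_km, rel_diff, lo_km, hi_km, tol_percent=10.0):
    """Fraction of levels in [lo_km, hi_km] with |relative difference| <= tol_percent."""
    altitude_km = np.asarray(altitude_km, dtype=float)
    rel_diff = np.asarray(rel_diff, dtype=float)
    in_range = (altitude_km >= lo_km) & (altitude_km <= hi_km)
    return float(np.mean(np.abs(rel_diff[in_range]) <= tol_percent))


if __name__ == "__main__":
    altitude = [20, 30, 40, 50]          # km, hypothetical grid
    reference = [2.0, 5.0, 8.0, 4.0]     # toy reference O3 mixing ratios (ppm)
    retrieved = [2.1, 5.2, 9.0, 4.1]
    rd = relative_difference(retrieved, reference)
    print(fraction_within(altitude, rd, 20, 50))  # 0.75
```

"Mostly within 10%" corresponds to this fraction being high, but not necessarily 1, over the stated altitude range.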
ARTICLE | doi:10.20944/preprints201804.0257.v1
Subject: Earth Sciences, Atmospheric Science Keywords: AIUS; occultation; retrieval algorithm; microwindows; ozone
Online: 20 April 2018 (03:59:10 CEST)
AIUS (Atmospheric Infrared Ultraspectral Sounder) is an infrared occultation spectrometer onboard the Chinese GaoFen-5 satellite, which covers a spectral range of 2.4–13.3 μm (750–4100 cm−1) with a spectral resolution of about 0.02 cm−1. AIUS is designed to measure and study chemical processes of ozone (O3) and other trace gases in the upper troposphere and stratosphere around the Antarctic. In this study, the corresponding retrieval methodology is described. Retrieval simulations based on simulated AIUS spectra have been carried out, with a focus on O3. The relative difference between the retrieved and true O3 profiles is within 5% from 15 km to 70 km and about 10% below 15 km. The corresponding averaging kernels show that the retrieval information comes mainly from the spectra, not from the a priori. The retrieval experiments also demonstrate that the shape of the retrieved profiles resembles that of the true profile even when the a priori profile has a different shape. Further, we perform O3 retrievals from real ACE-FTS (Atmospheric Chemistry Experiment-Fourier Transform Spectrometer) measurements and compare the results with the official ACE-FTS Level-2 products. Overall, both profiles agree well in the stratosphere, where the retrieval sensitivity is high. The relative difference between the two profiles is about 15% below 70 km, which may be due to measurement errors and different forward model parameters.
ARTICLE | doi:10.20944/preprints202209.0277.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: link prediction; AUC-ROC; Early retrieval evaluation
Online: 19 September 2022 (10:31:53 CEST)
Link prediction is an unbalanced early retrieval problem whose goal is to prioritize a small cohort of positive links at the top of a list largely populated by unlabelled links. Unlike binary classification, the evaluation here focuses on how the predictor prioritizes the positive class because, in practice, a negative class does not exist. Previous studies explained that AUC-ROC is not apt for unbalanced class problems and is misleading for early retrieval problems; therefore, standard AUC-ROC is not appropriate for evaluating link prediction. However, some scholars argue that an AUC-ROC-like evaluation accounting for the relative positioning of the few positive links among the vast number of unlabelled links remains a valid concept to pursue. Here we propose the area under the magnified ROC (AUC-mROC), a new measure that adjusts the standard AUC-ROC so that it also works for unbalanced early retrieval problems such as link prediction.
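To make the early-retrieval issue concrete, the sketch below computes standard AUC-ROC through the Mann-Whitney pairwise formulation alongside precision in the top k of the list. It does not implement AUC-mROC, whose definition is given in the paper; it only shows that two rankings can share the same AUC-ROC while differing completely at the top. The helper names and the 100-item toy ranking are hypothetical.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the probability that a random positive is scored above a
    random unlabelled item (Mann-Whitney U statistic); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scored items."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


if __name__ == "__main__":
    scores = list(range(100, 0, -1))  # list position 0 gets the top score
    # Ranking A: one positive at the very top, one at the very bottom.
    labels_a = [1 if i in (0, 99) else 0 for i in range(100)]
    # Ranking B: both positives in the middle of the list.
    labels_b = [1 if i in (49, 50) else 0 for i in range(100)]
    print(auc_roc(scores, labels_a), auc_roc(scores, labels_b))  # 0.5 0.5
    print(precision_at_k(scores, labels_a, 10),
          precision_at_k(scores, labels_b, 10))                  # 0.1 0.0
```

Both rankings look like random guessing to AUC-ROC, yet only ranking A places a positive link where an early retrieval user would see it.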
ARTICLE | doi:10.20944/preprints202010.0648.v1
Subject: Earth Sciences, Atmospheric Science Keywords: satellite rainfall retrieval; deep learning; satellite meteorology
Online: 30 October 2020 (14:54:06 CET)
Rainfall retrieval using geostationary satellites provides a critical means of monitoring extreme rainfall events. Using the relatively new Himawari 8 meteorological satellite, which has three times more channels than its predecessors, the deep learning framework of the “convolutional autoencoder” (CAE) was applied to extract cloud and precipitation features. The CAE method was incorporated into the Convolutional Neural Network (CNN) version of the PERSIANN precipitation retrieval that uses GOES satellites. By applying the CAE technique with the addition of Residual Blocks and other modifications of the deep learning architecture, the presented derivative of PERSIANN operated at the Central Weather Bureau of Taiwan (referred to as PERSIANN-CWB) adds four extra convolutional layers to fully use Himawari 8’s infrared and water vapor channels, while preventing the degradation of accuracy caused by a deeper network. PERSIANN-CWB was trained over Taiwan for its diverse weather systems and localized rainfall features, and the evaluation reveals an overall improvement over its CNN counterpart and superior performance compared with all other rainfall retrievals analyzed. A limitation of this model was found in deriving typhoon rainfall, an area requiring further research.
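Residual Blocks let extra layers fall back to an identity mapping, which is why a deeper network need not lose accuracy. A minimal single-channel NumPy sketch of one residual block (3×3 convolution, ReLU, 3×3 convolution, identity skip, ReLU), assuming zero padding that preserves the input size; the kernels are arbitrary, and this is not the PERSIANN-CWB architecture.

```python
import numpy as np


def conv2d_same(x, kernel):
    """2-D cross-correlation with zero padding so the output keeps x's shape."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, k1, k2):
    """y = ReLU(x + F(x)), where the residual branch F is conv -> ReLU -> conv."""
    f = conv2d_same(relu(conv2d_same(x, k1)), k2)
    return relu(x + f)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((6, 6))                 # e.g. one normalised infrared patch
    k1 = rng.normal(size=(3, 3))
    # If the residual branch learns zero weights, the block passes x through.
    y = residual_block(x, k1, np.zeros((3, 3)))
    print(np.allclose(y, x))
```

Because the skip path carries x unchanged, the block only has to learn a correction F(x), which is what makes stacking additional convolutional layers safer.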
ARTICLE | doi:10.20944/preprints201704.0174.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Hierarchical search; Image retrieval; Multi-feature fusion
Online: 26 April 2017 (18:51:42 CEST)
To address the poor generalization, low retrieval accuracy and high time consumption of existing content-based image retrieval systems, this paper proposes a hierarchical image retrieval method based on multi-feature fusion. The retrieval accuracy on Corel5K, UKBench and Holidays is 68.23 (Top-1), 3.73 (N-S score) and 88.20 (mAP), respectively. The experimental results show that the proposed method can effectively overcome the deficiencies of single-feature retrieval and significantly reduce retrieval time at the cost of a small loss of accuracy.
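The abstract does not detail the hierarchy, so the following shows only one common coarse-to-fine pattern that trades a little accuracy for speed: a cheap, low-dimensional feature shortlists candidates, and a fused descriptor re-ranks only that shortlist. The feature dimensions, cosine similarity, equal fusion weights, the `shortlist` size and the function names are all assumptions for illustration.

```python
import numpy as np


def l2_normalise(m):
    """Scale vectors (last axis) to unit length so dot products are cosines."""
    return m / np.linalg.norm(m, axis=-1, keepdims=True)


def hierarchical_search(q_coarse, q_fine, db_coarse, db_fine, shortlist=50, top_k=5):
    """Two-stage retrieval: shortlist by a cheap feature, re-rank by fused features."""
    # Stage 1: cosine similarity on the low-dimensional coarse feature only.
    coarse_sim = l2_normalise(db_coarse) @ l2_normalise(q_coarse)
    candidates = np.argsort(-coarse_sim)[:shortlist]
    # Stage 2: fuse coarse and fine features (equal weights, hypothetical)
    # and score only the shortlisted images.
    fused_db = np.hstack([l2_normalise(db_coarse[candidates]),
                          l2_normalise(db_fine[candidates])])
    fused_q = np.concatenate([l2_normalise(q_coarse), l2_normalise(q_fine)])
    fine_sim = fused_db @ fused_q
    order = np.argsort(-fine_sim)[:top_k]
    return candidates[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db_coarse = rng.normal(size=(200, 8))    # e.g. a compact color descriptor
    db_fine = rng.normal(size=(200, 64))     # e.g. a richer texture/shape descriptor
    result = hierarchical_search(db_coarse[17], db_fine[17], db_coarse, db_fine,
                                 shortlist=20, top_k=3)
    print(result[0])  # 17
```

The expensive fused comparison runs on `shortlist` items instead of the whole database; the accuracy cost comes from relevant images that the coarse stage fails to shortlist.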
ARTICLE | doi:10.20944/preprints202207.0159.v1
Subject: Physical Sciences, Applied Physics Keywords: Marchenko equation; Green's function retrieval; elastodynamic wave propagation
Online: 11 July 2022 (11:19:01 CEST)
By solving a Marchenko equation, Green's functions at an arbitrary (inner) depth level inside an unknown elastic layered medium can be retrieved from single-sided reflection data, which are to be collected at the top of the medium. So far, an exact solution could only be obtained if the medium obeys stringent monotonicity conditions and if all forward-scattered (non-converted and converted) transmissions between the acquisition level and the inner depth level are known a priori. We introduce an alternative Marchenko equation by revising the window operators that are applied in its derivation. We also introduce an auxiliary equation for transmission data, which are to be collected at the bottom of the medium, and a coupled equation, which is based on both reflection and transmission data. We show that the joint system of the Marchenko equation, the auxiliary equation and the coupled equation can be successfully inverted when broadband reflection and transmission data are available. This results in a novel methodology for elastodynamic Green's function retrieval from two-sided data. Apart from these data, our approach requires P- and S-wave transmission times between the inner depth level and the top of the medium, as well as two angle-dependent amplitude scaling factors, which can be estimated from the data by enforcing energy conservation.
REVIEW | doi:10.20944/preprints202205.0004.v1
Subject: Biology, Other Keywords: COVID-19; Exploratory Search; Machine Learning; Document Retrieval
Online: 4 May 2022 (12:20:15 CEST)
The urgency of the COVID-19 pandemic caused a surge in related scientific literature. This surge made manual exploration of scientific articles time-consuming and inefficient. Therefore, a range of exploratory search applications have been created to facilitate access to the available literature. In this survey, we briefly describe several such efforts and explore the different approaches they used.
Subject: Behavioral Sciences, Cognitive & Experimental Psychology Keywords: eyetracking, eye movements, gaze, memory, retrieval, vision, aging
Online: 20 May 2019 (12:25:44 CEST)
Eye movements support memory encoding by binding distinct elements of the visual world into coherent representations. However, the role of eye movements in memory retrieval is less clear. We propose that eye movements play a functional role in retrieval by reinstating the encoding context. By overtly shifting attention in a manner that broadly recapitulates the spatial locations and temporal order of encoded content, eye movements facilitate access to, and reactivation of, associated details. Such mnemonic gaze reinstatement may be obligatorily recruited when task demands exceed cognitive resources, as is often observed in older adults. We review research linking gaze reinstatement to retrieval, describe the neural integration between the oculomotor and memory systems, and discuss implications for models of oculomotor control, memory, and aging.
ARTICLE | doi:10.20944/preprints202001.0288.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Cross-modal retrieval; Adversarial learning; Semantic correlation; Deep learning
Online: 24 January 2020 (15:03:34 CET)
With the rapid development of the Internet and the widespread use of smart devices, massive amounts of multimedia data are generated, collected, stored and shared on the Internet. This trend has made cross-modal retrieval a hot topic in recent years. Many existing works focus on correlation learning to generate a common subspace for measuring cross-modal correlation, while others use adversarial learning techniques to reduce the heterogeneity of multi-modal data. However, very few works combine correlation learning and adversarial learning to bridge the inter-modal semantic gap and diminish cross-modal heterogeneity. This paper proposes a novel cross-modal retrieval method, named ALSCOR, an end-to-end framework that integrates cross-modal representation learning, correlation learning and adversarial learning. A CCA model, accompanied by two representation models, VisNet and TxtNet, is proposed to capture non-linear correlation. Besides, an intra-modal classifier and a modality classifier are used to learn intra-modal discrimination and minimize inter-modal heterogeneity. Comprehensive experiments are conducted on three benchmark datasets. The results demonstrate that the proposed ALSCOR outperforms state-of-the-art methods.
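ALSCOR builds on canonical correlation analysis (CCA). As a point of reference, here is plain linear CCA in NumPy, which finds projections of two modalities whose projected values are maximally correlated; the paper's model couples CCA with the non-linear VisNet and TxtNet encoders and adversarial training, none of which is reproduced here. The function name `linear_cca` and the small ridge term `reg` (added for numerical stability) are assumptions.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def linear_cca(X, Y, k=1, reg=1e-4):
    """Linear CCA between views X (n x p) and Y (n x q).

    Returns projection matrices Wx (p x k), Wy (q x k) and the top-k
    canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    Sxx_is, Syy_is = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    Wx = Sxx_is @ U[:, :k]
    Wy = Syy_is @ Vt.T[:, :k]
    return Wx, Wy, s[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.normal(size=(500, 1))        # shared latent "semantics"
    image_feats = np.hstack([z, rng.normal(size=(500, 2))])
    text_feats = np.hstack([2 * z + 0.01 * rng.normal(size=(500, 1)),
                            rng.normal(size=(500, 3))])
    Wx, Wy, corr = linear_cca(image_feats, text_feats, k=1)
    print(corr)
```

In a retrieval setting, items from both modalities would be projected with `Wx` and `Wy` into the common subspace and compared there; the deep variant replaces the linear maps with learned networks.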
REVIEW | doi:10.20944/preprints202104.0750.v1
Subject: Medicine & Pharmacology, Allergology Keywords: Epitope retrieval; Fixation; Histopathological diagnosis; Immunostaining; Sensitivity; Specificity; Trouble-shooting
Online: 28 April 2021 (14:17:11 CEST)
Immunostaining is an essential histochemical technique for analyzing pathogenesis and making a histopathological diagnosis. The demand for it is driven by technical development and refinement, the commercial availability of a variety of antibodies, deepened knowledge of immunohistochemical markers, accelerated analysis of morphofunctional correlations, progress in molecular target therapy, and the expectation of advanced histopathological diagnosis. However, immunostaining has various pitfalls and caveats. We should learn from mistakes and failures, as well as from false positivity and false negativity. The present review article describes various devices, technical hints and trouble-shooting guides to keep in mind when performing immunostaining.
ARTICLE | doi:10.20944/preprints201902.0185.v1
Subject: Earth Sciences, Oceanography Keywords: C-band SAR; sea surface wind speed retrieval; full polarimetry
Online: 20 February 2019 (09:07:35 CET)
In this paper, sea surface wind speed (SSWS) retrieval from Gaofen-3 (GF-3) quad-polarization stripmap (QPS) data in vertical-vertical (VV), horizontal-horizontal (HH) and vertical-horizontal (VH) polarizations is investigated in detail based on 3,170 scenes acquired from October 2016 to May 2018. The radiometric calibration factor of the VV polarization data is examined first. This calibration factor generally meets the SSWS retrieval accuracy requirement, with an absolute bias of less than 0.5 m/s, but is highly dispersed. As a result, the SSWS retrievals show a small bias of 0.18 m/s but a rather high root mean square error (RMSE) of 2.36 m/s compared with the ERA-Interim reanalysis model data. Two refitted polarization ratio (PR) models for the QPS HH polarization data are presented. Based on a combination of the incidence angle- and azimuth angle-dependent PR model and CMOD5.N, the SSWS derived from the QPS HH data shows a bias of 0.07 m/s and an RMSE of 2.26 m/s relative to the ERA-Interim reanalysis wind speed. A linear function relating SSWS and the normalized radar cross section (NRCS) of QPS VH data is derived. The SSWS retrieved from the QPS VH data shows good agreement with the WindSat SSWS data, with a bias of 0.1 m/s and an RMSE of 2.02 m/s. We also apply the linear function to GF-3 Wide ScanSAR data acquired for typhoon SOULIK, which yields surprisingly good agreement with the model results. A comparison of SSWS retrievals among the three polarization datasets is also presented. The current study and our previous work demonstrate that the general accuracy of SSWS retrieval based on GF-3 QPS data has an absolute bias of less than 0.3 m/s and an RMSE of 2.0 ± 0.2 m/s relative to various datasets. Further improvement will depend on dedicated radiometric calibration efforts.
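The VH model above is a linear function of NRCS, and the accuracy figures are bias and RMSE against reference winds. A sketch of both steps with ordinary least squares, assuming NRCS in dB as the predictor and the reference wind speeds (e.g. WindSat or ERA-Interim) as truth; the function names and the synthetic coefficients 0.6 and 25.0 are hypothetical, not the paper's fitted values.

```python
import numpy as np


def fit_linear_wind_model(nrcs_db, wind_ms):
    """Least-squares fit of wind = a * NRCS_dB + b; returns (a, b)."""
    a, b = np.polyfit(nrcs_db, wind_ms, deg=1)  # highest power first
    return float(a), float(b)


def bias_rmse(retrieved, reference):
    """Mean bias and root-mean-square error of retrieved minus reference."""
    diff = np.asarray(retrieved, dtype=float) - np.asarray(reference, dtype=float)
    return float(diff.mean()), float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nrcs_vh = rng.uniform(-35.0, -20.0, size=200)                     # toy VH NRCS (dB)
    reference_wind = 0.6 * nrcs_vh + 25.0 + rng.normal(0, 1.5, 200)   # toy "truth" (m/s)
    a, b = fit_linear_wind_model(nrcs_vh, reference_wind)
    retrieved = a * nrcs_vh + b
    print(a, b, bias_rmse(retrieved, reference_wind))
```

An in-sample least-squares fit has essentially zero bias by construction, so meaningful bias figures come from applying the fitted function to independent scenes, as the paper does with WindSat collocations.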
ARTICLE | doi:10.20944/preprints201812.0022.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Image retrieval, color features, shape features, low-level features combination
Online: 3 December 2018 (13:33:45 CET)
Due to an increase in the number of image archives, Content-Based Image Retrieval (CBIR) has gained attention from the computer vision research community. The visual contents of an image are represented in a feature space as numerical values, which form the image's feature vector. Images belonging to different classes may share common visual content and shapes, which can place the feature vectors of two images from separate classes close together in feature space. For this reason, the features used for extraction and image representation must be selected appropriately, as they directly affect the performance of an image retrieval system. The commonly used visual features are image spatial layout, color, texture and shape. Features are combined to achieve a discriminating ability that is not possible when the features are used separately. Therefore, in this paper, we explore low-level feature combinations based on color and shape features. We selected color moments and color histograms to represent color, while shape is represented using invariant moments. We selected this combination because these features are reported to be intuitive, compact and robust for image representation. We evaluated the performance of our proposed approach on the Corel, Coil and Ground Truth (GT) image datasets by calculating precision, recall and the time required for feature extraction. The precision, recall and feature extraction time obtained with the proposed low-level feature fusion outperform those of existing CBIR research.
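Color moments summarise each channel by its mean, standard deviation and skewness, while a color histogram counts quantised pixel values. A minimal NumPy sketch of both descriptors and a simple concatenation-style fusion, assuming an RGB image stored as an H×W×3 array with values in [0, 255] and 8 bins per channel; the cube-root form of skewness is one common convention, the function names are hypothetical, and the invariant (shape) moments used in the paper are omitted.

```python
import numpy as np


def color_moments(image):
    """Per-channel mean, standard deviation and cube-root skewness (9 values for RGB)."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])


def color_histogram(image, bins=8):
    """Concatenated per-channel histograms, normalised to sum to 1."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(image.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()


def fused_feature(image):
    """Early fusion: concatenate the color descriptors into one feature vector."""
    return np.concatenate([color_moments(image), color_histogram(image)])


if __name__ == "__main__":
    img = np.full((4, 4, 3), 100, dtype=np.uint8)  # uniform toy image
    print(fused_feature(img).shape)  # (33,)
```

In practice the descriptor blocks usually differ in scale, so per-block normalisation or weighting before concatenation is a common refinement.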
ARTICLE | doi:10.20944/preprints201804.0338.v1
Subject: Earth Sciences, Geoinformatics Keywords: crowdsourced data; relevance; semantics; geographic information retrieval; natural language processing
Online: 26 April 2018 (10:19:02 CEST)
Crowdsourced Data (CSD) generated by citizens is becoming more popular, as its potential utilisation in many applications is increasing due to its currency and availability. However, the quality of CSD, including its relevance, is often questioned, as the data is not generated by professionals and does not follow standard data collection procedures. The quality of CSD can be assessed according to a range of attributes, including its relevance. Information relevance has been explored in Geographic Information Retrieval (GIR), which uses dedicated techniques to identify relevant information. This research tested a relevance assessment approach for CSD by adapting relevance assessment techniques available in the GIR domain. Thematic and geographic relevance were assessed using Term Frequency-Inverse Document Frequency (TF-IDF), the Vector Space Model (VSM) and Natural Language Processing (NLP) techniques. The thematic and geographic specificities of the queries were calculated as 0.44 and 0.67 respectively, which indicates that the queries used were more geographically specific than thematically specific. A Spearman's rho of 0.62 indicated that the final ranked relevance lists showed reasonable agreement with a manually classified list, confirming the potential of the approach for CSD relevance assessment and other crowdsourced data analyses.
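In a vector space model, thematic relevance is typically computed as the cosine similarity between the TF-IDF vectors of the query and each document. A dependency-free sketch, assuming whitespace tokenisation and plain tf × log(N/df) weighting (several variants exist, and the paper's exact weighting is not stated); the function names and example posts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the IDF table."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_relevance(query, docs):
    """Document indices sorted from most to least similar to the query."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get zero weight.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    scores = [cosine(q, v) for v in vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    posts = [
        "flooding on main street after storm",
        "new cafe opened downtown",
        "storm damage and flooding near the river",
    ]
    print(rank_by_relevance("flooding storm river", posts))  # [2, 0, 1]
```

A ranked list like this one is what can then be compared with a manually classified list using Spearman's rho.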
ARTICLE | doi:10.20944/preprints201804.0134.v1
Subject: Earth Sciences, Geoinformatics Keywords: airborne laser scanning; geospatial database; data retrieval; road median; attributes
Online: 11 April 2018 (04:27:42 CEST)
Laser scanning systems use Light Detection and Ranging (LiDAR) technology to acquire accurately georeferenced sets of dense 3D point cloud data. The information acquired using these systems provides better knowledge of terrain objects, which are inherently 3D in nature. LiDAR data acquired from mobile, airborne or terrestrial platforms provide several benefits over conventional sources of data acquisition in terms of accuracy, resolution and attributes. However, the large volume and scale of LiDAR data have inhibited the development of automated feature extraction algorithms due to the extensive computational cost involved. Moreover, the heterogeneously distributed point cloud, which represents objects with varying size, point density, holes and complicated structures, poses a great challenge for data processing. Currently, geospatial database systems do not provide a robust solution for efficient storage of and access to raw data in a way that allows processing to be applied over an optimal spatial extent. In this paper, we present the Global LiDAR and Imagery Mobile Processing Spatial Environment (GLIMPSE) system, which provides a framework for the storage, management and integration of 3D LiDAR data acquired from multiple platforms. The system facilitates efficient access to the raw dataset, which is hierarchically represented in a geographically meaningful way. We use the GLIMPSE system to automatically extract road medians from Airborne Laser Scanning (ALS) point clouds. In the first part of this paper, we detail an approach to efficiently retrieve point cloud data from the GLIMPSE system for a particular geographic area based on user requirements. In the second part, we present an algorithm to automatically extract road medians from the retrieved LiDAR data. The developed road median extraction algorithm uses the LiDAR elevation and intensity attributes to distinguish the median from the road surface.
We successfully tested our algorithms on two road sections with distinct road median types, based on concrete and grass-hedge barriers. The use of GLIMPSE improved the efficiency of road median extraction by providing fast access to ALS point cloud data for the required road sections. The developed system and its associated algorithms provide a comprehensive solution to the user's requirements for efficient storage, integration, retrieval and processing of large volumes of LiDAR point cloud data. These findings contribute to a more rapid, cost-effective and comprehensive approach to surveying road networks.
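The median-extraction step separates median points from the road surface using elevation and intensity. A heavily simplified sketch of that idea, not the authors' algorithm: points standing above the local road surface, or whose intensity falls outside an assumed asphalt intensity range, are flagged as median. The function name `flag_median_points`, the road-surface estimate (a low height percentile), the 0.15 m height threshold and the intensity range are all hypothetical parameters.

```python
import numpy as np


def flag_median_points(z, intensity, height_thresh=0.15,
                       asphalt_intensity=(5.0, 40.0), ground_percentile=10):
    """Return a boolean mask of points likely to belong to a road median.

    z, intensity: 1-D arrays for the points of one road cross-section.
    All thresholds are illustrative assumptions, not calibrated values.
    """
    z = np.asarray(z, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Approximate the road surface elevation by a low percentile of point heights.
    road_level = np.percentile(z, ground_percentile)
    raised = (z - road_level) > height_thresh              # e.g. concrete barrier
    lo, hi = asphalt_intensity
    unlike_asphalt = (intensity < lo) | (intensity > hi)   # e.g. grass or hedge returns
    return raised | unlike_asphalt


if __name__ == "__main__":
    z = [0.0] * 8 + [0.5, 0.0]            # one raised point (m)
    intensity = [20.0] * 9 + [60.0]       # one flat point with non-asphalt intensity
    print(flag_median_points(z, intensity).tolist())
    # [False, False, False, False, False, False, False, False, True, True]
```

Elevation alone would miss a flat grass median and intensity alone would miss a grey concrete barrier, which is why combining the two attributes is useful for the two median types tested.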
ARTICLE | doi:10.20944/preprints201906.0051.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: NLP, news media, bias, neural networking, LSTM, information retrieval, filter bubble
Online: 6 June 2019 (13:15:28 CEST)
An article's tone and framing not only influence an audience's perception of a story but may also reveal attributes of author identity and bias. Building upon prior media, psychological, and machine learning research, this neural network-based system detects those writing characteristics in ten news agencies' reporting, discovering patterns that, intentional or not, may reveal an agency's topical perspectives or common contextualization patterns. Specifically, by learning linguistic markers of different organizations from a newly released open database, this probabilistic classifier predicts an article's publishing agency with 74% hidden test set accuracy given only a short snippet of text. The resulting model demonstrates how unintentional 'filter bubbles' can emerge in machine learning systems and, by comparing agencies' patterns and highlighting outlets' prototypical articles through an open source exemplar search engine, this paper offers new insight into news media bias.
ARTICLE | doi:10.20944/preprints202004.0111.v1
Subject: Earth Sciences, Environmental Sciences Keywords: water quality retrieval; illegal discharge identification; small waterbodies; Sentinel-2; machine learning
Online: 8 April 2020 (03:51:33 CEST)
Water quality retrieval for small urban waterbodies by remote sensing used to be difficult due to the coarse spatial resolution of remote sensing imagery. The recently launched Sentinel-2 produces imagery with a spatial resolution of 10 m, which provides an opportunity to retrieve water quality for small waterbodies. Additionally, many water management issues, e.g. illegal discharges to urban waterbodies, also require fine-resolution imagery. Since illegal discharges are an important issue for urban water management, chemical oxygen demand (COD), total phosphorus (TP), and total nitrogen (TN) were chosen as the target parameters for water quality retrieval in this study. COD, TP and TN, however, are non-optically active parameters, and fewer past studies have retrieved them than optically active parameters such as chlorophyll-a. This study compared three machine learning models, namely Random Forest (RF), Support Vector Regression (SVR), and Neural Networks (NN), to investigate the possibility of retrieving these non-optically active parameters. Results showed that the R2 values for TP, TN and COD, obtained by NN, RF and SVR respectively, were 0.94, 0.88 and 0.86. The performance of water quality retrieval for these non-optically active parameters was significantly improved by the optimized machine learning models. These models therefore address the problem of using remote sensing data to retrieve non-optically active water quality parameters and provide a new monitoring strategy for small waterbodies. Water quality maps obtained from Sentinel-2 imagery provide full spatial coverage of water quality over the entire water surface. Compared with collecting and testing water samples, this greatly reduces labor, reagent and waste treatment costs. It may also help identify illegal discharges to urban waterbodies.
The method developed in this research provides a new practical and efficient water quality monitoring strategy in managing water with consideration of environmental sustainability.
ARTICLE | doi:10.20944/preprints201906.0010.v1
Subject: Earth Sciences, Geology Keywords: Land surface temperature; Surface urban heat island, Local climate zone; Retrieval algorithms
Online: 3 June 2019 (09:02:15 CEST)
Surface urban heat island (SUHI) depicts the deteriorating thermal environment in high-density cities, and local climate zone (LCZ) classification provides a universal protocol for SUHI identification. In this study, taking the central urbanized area of Guangzhou in the humid subtropical region of China as the study area, maps or images of LCZ, land surface temperature (LST), SUHI and urban design factors were obtained using Landsat satellite data, a GIS database and a series of retrieval and classification algorithms, and the urban design factors influencing SUHI were investigated based on 625 LCZ samples. The results show that in summer daytime under clear-sky conditions, LST varied greatly, from 26 °C to 40 °C, and SUHI ranged widely from -6 °C to 8 °C across the LCZs of the study area. Seven and five urban design factors influencing summer daytime SUHI were identified for the two dominant LCZ groups, LCZs 1-5 (LCZ 1 to LCZ 5) and the mixed LCZ (containing at least three types of LCZs), respectively. Summer daytime SUHI prediction models were obtained using stepwise multiple linear regression, with an R2 of 0.697, an RMSE of 1.21 °C and a d value of 0.81 for the LCZs 1-5 model, and values of 0.666, 1.66 °C and 0.76 for the mixed LCZ model, indicating that the models can predict changes of SUHI with LCZs to a large and satisfactory extent. This study presents a methodology to efficiently obtain a large sample of SUHI and urban design factors of LCZs in highly urbanized cities, and provides information beneficial to urban design and regeneration in the humid subtropical region.
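The model skill scores quoted above (R2, RMSE and d) are standard goodness-of-fit measures. The abstract does not define d; it is presumably Willmott's index of agreement, and that is an assumption here. A sketch computing all three from observed and predicted SUHI values, with R2 taken as the coefficient of determination 1 − SS_res/SS_tot; the function name `fit_scores` is hypothetical.

```python
import numpy as np


def fit_scores(observed, predicted):
    """Return (R2, RMSE, Willmott's d) for predicted vs observed values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    resid = p - o
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(resid ** 2))
    # Willmott (1981): d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)
    denom = np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    d = 1.0 - ss_res / denom
    return float(r2), float(rmse), float(d)


if __name__ == "__main__":
    print(fit_scores([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # (1.0, 0.0, 1.0)
```

Unlike R2, d penalises systematic offsets between predictions and observations, so reporting both gives a fuller picture of model agreement.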
ARTICLE | doi:10.20944/preprints202205.0090.v1
Subject: Physical Sciences, Applied Physics Keywords: machine learning/artificial intelligence; precipitation type classification; passive microwave; precipitation radar; retrieval algorithm
Online: 7 May 2022 (03:46:06 CEST)
Precipitation type is a key parameter for improving retrievals of precipitation characteristics and for understanding cloud-convection-precipitation coupling processes. Ice crystals and water droplets exhibit different characteristics in different precipitation regimes (e.g., convective, stratiform), and these differences are reflected in satellite remote sensing measurements, which helps us distinguish them. The Global Precipitation Measurement (GPM) Core Observatory’s Microwave Imager (GMI) and Dual-Frequency Precipitation Radar (DPR) together provide ample information on global precipitation characteristics. As an active sensor, DPR provides accurate precipitation type assignment, while passive sensors like GMI have traditionally been used only for empirical understanding of precipitation regimes. Using collocated precipitation type flags from DPR as the “truth”, this paper employs machine learning (ML) models to train and test the predictability and accuracy of precipitation type from passive GMI-only observations together with ancillary information from reanalysis and GMI surface emissivity retrieval products. Out of six ML models, four simple ones (Support Vector Machine, Neural Network, Random Forest, and Gradient Boosting) and the 1-D convolutional neural network (CNN) model are found to achieve 90%-94% prediction accuracy globally for 5 types of precipitation (convective, stratiform, mixture, no precipitation, and other precipitation), which is much more robust than previous similar efforts. One novelty of this work is the introduction of data augmentation (subsampling and bootstrapping) to handle extremely unbalanced samples in each category.
Careful evaluation of impact matrices demonstrates that polarization difference (PD) and surface emissivity at high-frequency channels dominate the decision process, which is consistent with the physical understanding of polarized microwave radiative transfer over different surface types, as well as in snow and liquid clouds with different microphysical properties. Furthermore, the view-angle-dependent artifact present in the DPR precipitation flag does not propagate into the conical-viewing GMI retrievals. This work provides a new and promising path for the future development of physics-based ML retrieval algorithms.
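The abstract names subsampling and bootstrapping as its data augmentation for extremely unbalanced precipitation classes but gives no details. The sketch below shows one common way to combine the two, assuming every class is resampled to a common target size: classes larger than the target are subsampled without replacement, and smaller ones are bootstrapped with replacement. The function name `balance_classes`, the target size and the class counts are hypothetical, not the paper's settings.

```python
import random
from collections import defaultdict

def balance_classes(samples, labels, target_per_class, seed=0):
    """Resample so that every class contributes exactly target_per_class items.

    Classes with at least target_per_class samples are subsampled without
    replacement; smaller classes are bootstrapped (drawn with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            picked = rng.sample(xs, target_per_class)      # subsampling
        else:
            picked = rng.choices(xs, k=target_per_class)   # bootstrapping
        out_x.extend(picked)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


# Hypothetical, heavily unbalanced label distribution
labels = ["none"] * 1000 + ["convective"] * 50 + ["other"] * 5
samples = list(range(len(labels)))  # stand-ins for per-pixel feature vectors
bx, by = balance_classes(samples, labels, target_per_class=200)
print({c: by.count(c) for c in sorted(set(by))})
```

Balancing should be applied to the training split only; resampling before the train/test split would leak duplicated minority samples into the evaluation set.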
ARTICLE | doi:10.20944/preprints202106.0544.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Snowfall Retrieval; Snow Water Equivalent; Cloud Liquid Water; Emissivity; Brightness Temperature; Passive Microwave; GPM
Online: 22 June 2021 (14:22:16 CEST)
Falling snow alters its own microwave signatures when it begins to accumulate on the ground, making retrieval of snowfall challenging. This paper investigates the effects of snow-cover depth and cloud liquid water content on the microwave signatures of terrestrial snowfall using reanalysis data and multi-annual observations by the Global Precipitation Measurement (GPM) core satellite, with particular emphasis on the 89 and 166 GHz channels. It is found that over shallow snow cover (snow water equivalent (SWE) ≤ 100 kg m⁻²) and at low cloud liquid water path (LWP 100–150 g m⁻²), the scattering signal of light snowfall (intensities ≤ 0.5 mm h⁻¹) is detectable only at 166 GHz, while for higher snowfall rates the signal can also be detected at 89 GHz. However, when SWE exceeds 200 kg m⁻² and LWP is greater than 100–150 g m⁻², the emission from the increased liquid water content in snowing clouds becomes the only surrogate microwave signal of snowfall, and this signal is stronger at 89 GHz than at 166 GHz. The results also reveal that at latitudes above 60°N, where SWE is greater than 200 kg m⁻² and LWP is lower than 100–150 g m⁻², the snowfall microwave signal cannot be detected by GPM without a priori information on SWE and LWP. Our findings provide quantitative insights for improving snowfall retrieval, particularly over snow-covered terrain.
ARTICLE | doi:10.20944/preprints201807.0206.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: storage and retrieval processes; load-balancing; fault tolerance; energy efficiency; memory efficiency; data loss
Online: 11 July 2018 (14:47:31 CEST)
Load balancing, energy efficiency and fault tolerance are among the most important data dissemination issues in Wireless Sensor Networks (WSNs). To cope with these issues, two main approaches (namely, Data-centric Storage and Distributed Data Storage) have been proposed in the literature. Both approaches suffer from data loss due to memory and/or energy depletion in the storage nodes. Although several techniques have been proposed to overcome these problems, they typically focus on one issue at a time. In this paper, we integrate Data-centric Storage (DCS) features into Distributed Data Storage (DDS) mechanisms and present a novel approach, denoted Collaborative Memory and Energy Management (CoMEM), that addresses both problems and reduces data loss in WSNs through memory- and energy-efficient storage. We also propose analytical and simulation frameworks for performance evaluation. Our results show that the proposed method outperforms existing approaches in various WSN scenarios.
ARTICLE | doi:10.20944/preprints201708.0102.v1
Subject: Earth Sciences, Geoinformatics Keywords: Content-Based Remote Sensing Image Retrieval; Change Information Detection; Information Management; Remote Sensing Data Service
Online: 29 August 2017 (16:18:20 CEST)
With the rapid development of satellite remote sensing technology, the volume of image datasets in many application areas is growing exponentially, and the demand for Land-Cover and Land-Use change remote sensing data is growing rapidly. It is thus becoming hard to efficiently and intelligently retrieve the change information that users need from massive image databases. In this paper, content-based image retrieval is successfully applied to change detection, and a content-based remote sensing image change information retrieval model is introduced. First, the construction of a new model framework for change information retrieval in a remote sensing database is described. Second, because the target content cannot be expressed by a single kind of feature, a multiple-feature integrated retrieval model is proposed. Third, an experimental prototype system set up to demonstrate the validity and practicability of the model is described. The proposed model is a new method of acquiring change detection information from remote sensing imagery; it can reduce the need for image pre-processing and deal with problems related to seasonal changes, as well as other problems encountered in change detection. In addition, the new model has important implications for improving remote sensing image management and autonomous information retrieval.
ARTICLE | doi:10.20944/preprints201612.0075.v1
Subject: Earth Sciences, Geoinformatics Keywords: image recognition based location; indoor positioning; RGB-D images; LiDAR; database; mobile computing; image retrieval
Online: 15 December 2016 (07:17:35 CET)
This paper describes the first results of an Image Recognition Based Location (IRBL) system for mobile applications, focusing on the procedure to generate a database of range images (RGB-D). In an indoor environment, estimating the camera position and orientation requires prior spatial knowledge of the surroundings. To achieve this, a complete 3D survey of two different environments (Bangbae metro station in Seoul and the E.T.R.I. building in Daejeon, Republic of Korea) was performed using a LiDAR (Light Detection And Ranging) instrument, and the resulting scans were processed to obtain a spatial model of each environment. From these models, two databases of reference images were generated using dedicated software developed by the Geomatics group of Politecnico di Torino (ScanToRGBDImage). This tool synthetically generates multiple RGB-D images centred on each scan position in the environment. Then, the external parameters (X, Y, Z, ω, φ, κ) and the range information extracted from the retrieved database images are used as reference information for pose estimation of a set of acquired mobile pictures in the IRBL procedure. This paper reports the survey operations, the approach for generating the RGB-D images and the IRBL strategy. Finally, the analysis of the results and the validation test are described.
COMMUNICATION | doi:10.20944/preprints202210.0257.v1
Subject: Physical Sciences, Optics Keywords: Phase imaging; bioimaging; synchrotron; near infrared beam; holography; incoherent optics; chemical imaging; phase retrieval; 3D imaging
Online: 18 October 2022 (08:28:25 CEST)
Phase imaging of biochemical samples has been demonstrated for the first time at the Infrared Microspectroscopy (IRM) beamline of the Australian Synchrotron using the usually discarded near-IR (NIR) region of the synchrotron-IR beam. The synchrotron-IR beam at the Australian Synchrotron IRM beamline has a unique fork-shaped intensity distribution resulting from the shape of the gold-coated extraction mirror, which includes a central slit for rejecting the intense X-ray beam. This beam configuration makes any imaging task challenging. For intensity imaging, the fork-shaped beam is usually tightly focused to a point on the sample plane, and the image is recorded by pixel-by-pixel scanning. In this study, a pinhole was aligned with one of the lobes of the fork-shaped beam, and the resulting Airy diffraction pattern was used to illuminate biochemical samples. The light diffracted by the samples was captured using a NIR-sensitive lensless camera. A rapid phase-retrieval algorithm was applied to the recorded intensity distributions to reconstruct the phase information corresponding to different planes. The preliminary results are promising for developing multimodal imaging capabilities at the IRM beamline of the Australian Synchrotron.
COMMUNICATION | doi:10.20944/preprints202206.0383.v2
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Exoskeleton; Twitter; Tweets; Big Data; social media; Data Mining; dataset; Data Science; Natural Language Processing; Information Retrieval
Online: 21 July 2022 (04:06:53 CEST)
Exoskeleton technology has been advancing rapidly in the recent past due to its multitude of applications and diverse use cases in assisted living, military, healthcare, firefighting, and Industry 4.0. The exoskeleton market is projected to grow to several times its current value within the next two years. Therefore, it is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction towards exoskeletons, which requires the availability of Big Data of conversations about exoskeletons. The Internet of Everything lifestyle of today, characterized by people spending more time on the internet than ever before, with a specific focus on social media platforms, holds the potential for developing such a dataset by mining relevant social media conversations. Twitter, one such social media platform, is highly popular among all age groups, and the topics found in its conversations include emerging technologies such as exoskeletons. To address this research challenge, this work makes two scientific contributions to this field. First, it presents an open-access dataset of about 140,000 tweets about exoskeletons that were posted in a 5-year period from May 21, 2017, to May 21, 2022. Second, based on a comprehensive review of recent works in the fields of Big Data, Natural Language Processing, Information Retrieval, Data Mining, Pattern Recognition, and Artificial Intelligence that may be applied to relevant Twitter data for advancing research, innovation, and discovery in exoskeleton research, a total of 100 Research Questions are presented for researchers to study, analyze, evaluate, ideate, and investigate based on this dataset.
ARTICLE | doi:10.20944/preprints202101.0318.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Flower Region of Interest (FRoI); Linear Discriminant Analysis (LDA); retrieval of flower videos; Multiclass Support Vector Machine
Online: 18 January 2021 (11:29:59 CET)
Searching, recognizing and retrieving a video of interest from a large collection of video data is a pressing requirement, and it has been recognized as an active area of research in computer vision, machine learning and pattern recognition. Flower video recognition and retrieval is vital in the fields of floriculture and horticulture. In this paper, we propose a model for the retrieval of flower videos. First, videos are represented by keyframes, and the flowers in the keyframes are segmented from their background. Then, features are extracted from the flower regions of the keyframes. Linear Discriminant Analysis (LDA) is adopted to extract discriminating features, and a Multiclass Support Vector Machine (MSVM) classifier is applied to identify the class of the query video. Experiments have been conducted on a relatively large dataset of our own, consisting of 7788 videos of 30 different species of flowers captured with three different devices. Generally, retrieval of flower videos is addressed using a query video containing flowers of a single species. In this work, we attempt to develop a system that retrieves similar videos for a query video containing flowers of different species.
ARTICLE | doi:10.20944/preprints201810.0213.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Remote Sensing Techniques; Tropospheric NO2 Column Retrieval; Air Mass Factor (AMF); Meteorological Reformulation; MAX-DOAS measurements; Satellite Informatics
Online: 10 October 2018 (10:08:58 CEST)
Improving air quality and reducing human exposure to unhealthy levels of airborne chemicals are important global missions, particularly in China. Satellite remote sensing offers a powerful tool to examine regional trends in NO2, thus providing a direct measure of key parameters that strongly affect surface air quality. To accurately resolve spatial gradients in NO2 concentration using satellite observations, and thus understand local and regional aspects of air quality, a priori input data at spatial and temporal resolutions high enough to account for pixel-to-pixel variability in land and atmospheric characteristics are required. In this paper, we adapt the Berkeley High Resolution product (BEHR v3.0A, v3.0B and v3.0C) and meteorological outputs from the Weather Research and Forecasting (WRF) model to describe column NO2 in southern China. The BEHR approach is particularly useful for places with large spatial variability and terrain height differences, such as China. We retrieved the tropospheric NO2 vertical column density (TVCD) within part of southern China for four seasons of 2015, based on satellite datasets from the Ozone Monitoring Instrument (OMI). Retrieval results are validated against MAX-DOAS tropospheric column measurements conducted in Guangzhou. BEHR retrievals are more consistent with MAX-DOAS measurements than the OMI-NASA retrieval, opening new windows into research questions that require high spatial resolution, for example retrieving NO2 vertical columns and ground-level pollutant concentrations in China and other countries.
ARTICLE | doi:10.20944/preprints202008.0263.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: geoparser; geographic information retrieval; event extraction; argument extraction; information extraction; named entity recognition; conditional random field; semantic gazetteer; topic model
Online: 14 August 2020 (04:00:42 CEST)
One of the most important components of a Geographic Information Retrieval (GIR) system is the geoparser, which performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, news articles that report several events across many place references are not yet adequately modeled by regular geoparsers, whose scope of resolution is either toponym-level or document-level. The capacity to detect multiple events and geolocate their true locations and coordinates, along with their numerical arguments, is still missing from modern geoparsers, let alone for Indonesian news corpora. We propose a novel event geoparser that integrates an ACE-based event extraction model and provides precise event-level scope resolution. The geoparser casts geotagging and event extraction as sequence labeling and uses a Conditional Random Field with keyword features obtained from an Aggregated Topic Model as semantic exploration of a large corpus, which increases the generalizability of the model. The geoparser also uses a Smallest Administrative Level feature together with a Spatial Minimality-derived algorithm to improve the identification of pseudo-location entities, resulting in a 19.4% increase in weighted F1 score. As a side effect of event extraction, the geoparser also extracts various numerical arguments and is able to generate a thematic choropleth map from a single news story.
ARTICLE | doi:10.20944/preprints202203.0004.v2
Subject: Earth Sciences, Oceanography Keywords: sea ice; Cryosphere; Arctic Ocean; Arctic sea ice change; Arctic climate change; remote sensing retrieval; satellite remote sensing; APP; APP-x; trend study
Online: 28 March 2022 (04:13:23 CEST)
Arctic sea ice characteristics have been changing rapidly and significantly in the last few decades. Using a long-term time series of sea ice products from satellite observations, the extended AVHRR Polar Pathfinder (APP-x), trends in sea ice concentration, ice extent, ice thickness, and ice volume in the Arctic from 1982 to 2020 are investigated. Results show that the Arctic has become less ice-covered in all seasons, especially in summer and autumn. Arctic sea ice thickness has been decreasing at a rate of 3.24 cm per year, a reduction of about 52%, from 2.35 m in 1982 to 1.13 m in 2020. Arctic sea ice volume has been decreasing at a rate of 467.7 km³ per year, a reduction of about 63%, from 27590.4 km³ in 1982 to 10305.5 km³ in 2020. These trends are further examined from a new perspective, in which the Arctic Ocean is classified into open water, perennial, and seasonal sea ice-covered areas based on sea ice persistence. The loss of perennial sea ice-covered area is the major factor in total sea ice loss in all seasons. If the current rates of change in sea ice extent, concentration, and thickness continue, the Arctic is expected to have ice-free summers by the early 2060s.
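The percentage reductions for Arctic sea ice thickness and volume follow directly from the 1982 and 2020 endpoint values. The sketch below recomputes them and also defines an ordinary least-squares (OLS) slope, one common way to estimate a per-year trend. It is an assumption that the paper's annual rates are regression slopes over the full 1982-2020 series; with only the two endpoints available here, the two-point slope (about 3.21 cm per year) is close to, but not identical with, the reported 3.24 cm per year.

```python
def percent_reduction(start, end):
    """Relative decrease from start to end, in percent."""
    return 100.0 * (start - end) / start

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Endpoint values quoted in the abstract
thickness_m = {1982: 2.35, 2020: 1.13}
volume_km3 = {1982: 27590.4, 2020: 10305.5}

print(f"thickness reduction: {percent_reduction(thickness_m[1982], thickness_m[2020]):.1f}%")  # 51.9%
print(f"volume reduction:    {percent_reduction(volume_km3[1982], volume_km3[2020]):.1f}%")    # 62.6%

# With only two points, the OLS slope equals the endpoint difference quotient;
# a trend from the full annual series would use all 39 yearly values instead.
years = [1982, 2020]
slope_cm = ols_slope(years, [thickness_m[y] * 100 for y in years])
print(f"two-point thickness slope: {slope_cm:.2f} cm/yr")  # -3.21 cm/yr
```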
ARTICLE | doi:10.20944/preprints201803.0021.v2
Subject: Earth Sciences, Geoinformatics Keywords: map processing; retrospective landscape analysis; visual data mining; image retrieval; low-level image descriptors; color moments; t-distributed stochastic neighbor embedding; USGS topographic maps; Sanborn fire insurance maps
Online: 17 April 2018 (09:23:37 CEST)
Historical maps constitute unique sources of retrospective geographic information. Recently, several map archives containing map series covering large spatial and temporal extents have been systematically scanned and made available to the public. The geographic information contained in such data archives allows extending geospatial analysis retrospectively beyond the era of digital cartography. However, given the large data volumes of such archives and the low graphical quality of older map sheets, the processes to extract geographic information need to be automated to the highest degree possible. In order to understand the salient characteristics, data quality variation, and potential challenges in large-scale information extraction tasks, preparatory analytical steps are required to efficiently assess spatio-temporal coverage, approximate map content, and spatial accuracy of such georeferenced map archives across different cartographic scales. Such preparatory steps are often neglected or ignored in the map processing literature but represent highly critical phases that lay the foundation for any subsequent computational analysis and recognition. In this contribution we demonstrate how such preparatory analyses can be conducted using classical analytical and cartographic techniques as well as visual-analytical data mining tools originating from machine learning and data science, exemplified for the United States Geological Survey topographic map and Sanborn fire insurance map archives.
ARTICLE | doi:10.20944/preprints201808.0352.v2
Subject: Earth Sciences, Geoinformatics Keywords: artificial intelligence; color naming; color constancy; cognitive science; computer vision; object-based image analysis (OBIA); physical and statistical data models; radiometric calibration; semantic content-based image retrieval; spatial topological and spatial non-topological information components
Online: 28 August 2018 (07:57:02 CEST)
The European Space Agency (ESA) defines an Earth observation (EO) Level 2 information product as a single-date multi-spectral (MS) image corrected for atmospheric, adjacency and topographic effects, stacked with its data-derived scene classification map (SCM), whose thematic map legend includes the quality layers cloud and cloud-shadow. ESA EO Level 2 product generation is an inherently ill-posed computer vision (CV) problem that, to date, no EO data provider has accomplished in operating mode at the ground segment. Herein, it is considered (I) a necessary but not sufficient precondition for the yet-unaccomplished dependent problems of semantic content-based image retrieval (SCBIR) and semantics-enabled information/knowledge discovery (SEIKD) in multi-source EO big data cubes, (II) synonymous with the EO Analysis Ready Data (ARD) format, and (III) equivalent to a horizontal policy for background developments in Space Economy 4.0. In compliance with the GEO-CEOS Quality Assurance Framework for EO Calibration/Validation guidelines, and to contribute toward filling an analytic and pragmatic information gap from multi-sensor EO big data to timely, comprehensive and operational EO value-adding information products and services, this work presents an innovative AutoCloud+ CV software toolbox for cloud and cloud-shadow quality layer detection in the ESA EO Level 2 product. In vision, spatial information dominates color information. Inspired by this fact, the AutoCloud+ CV software, which addresses an inherently ill-posed problem, was conditioned, designed and implemented to be “universal”, meaning fully automated (no human-machine interaction is required), near real-time, robust to changes in input data, and scalable to changes in the spatial and spectral resolution specifications of the MS imaging sensor.