ARTICLE | doi:10.20944/preprints201809.0454.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Venom; toxins; NCBI; database
Online: 24 September 2018 (12:00:19 CEST)
Venoms that drip from the fangs of snakes are incredibly complex chemical cocktails of compounds, containing different proteins and enzymes, including a large variety of toxins such as myotoxins, cardiotoxins, hemotoxins and neurotoxins, in countless combinations. In addition to their use in the treatment of snake bites in humans, they have numerous therapeutic and medicinal applications. Potential applications of snake venom include the treatment of excessive bleeding, stroke, neurological disorders, cancer, diabetes and aging. Therefore, a proper understanding of snake venom toxins, and facilitating their use, is of utmost importance. In this paper, we describe a novel database, called SVDB, for the storage, dissemination and analysis of snake venom and toxin-related information. SVDB has autonomous links to NCBI databases to pull relevant information in both on-demand and asynchronous ways, facilitating data integration. SVDB includes authentic, non-redundant, up-to-date scientific information on literature, sequences, structures, small molecules, taxonomy and more. The SVDB portal also provides external links to tools such as BLAST, CLUSTAL, SWISS-MODEL and phylogeny tools, as well as other toxin-related resources. The architecture of SVDB's information fetching, linking and structuring is unique and can be applied to any domain-specific generic data collection pipeline through the NCBI. The database is publicly available at https://www.snakevenomdb.org.
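The abstract does not detail how SVDB's on-demand fetching is implemented; as a hedged sketch, such a pipeline would typically query NCBI's public E-utilities endpoints. The helper name and the query term below are illustrative, not taken from SVDB:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an NCBI E-utilities esearch URL for an on-demand lookup."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

# Hypothetical query a venom-toxin pipeline might issue against the
# NCBI protein database:
url = build_esearch_url("protein", "snake venom neurotoxin")
```

The asynchronous side of such a design would issue the same queries on a schedule and cache the results locally, so the portal can serve cached records when NCBI is unreachable.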
ARTICLE | doi:10.20944/preprints202105.0714.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: plant sterols; database; Polish population
Online: 31 May 2021 (08:32:16 CEST)
Plant sterols are compounds with multiple biological functions, mainly cholesterol reduction. There are no comprehensive databases on plant sterols, which makes it difficult to estimate their intake in the Polish population. In this study we used international food databases, supplemented by scientific data from the literature, to create a database of plant sterols in the food consumed in Poland and to assess the size and sources of dietary plant sterol intake in the adult population of Poland. The literature search was conducted using PubMed, Web of Science, Scopus, and Google Scholar to identify possible sources of published food composition data for plant sterols. The study group consisted of 5690 participants of the WOBASZ II survey. We identified 361 dietary sources of plant sterols based on the consumption of foods and dishes reported by participants. Cereals and fats provided 61% of the total plant sterols, and together with vegetables and fruits this totaled 80%. Total plant sterol intake for the Polish population was 282.97 mg/day, with 320.77 mg/day for men and 252.19 mg/day for women. Canola oil provided the most plant sterols at 16.92%, followed by white bread at 16.65% and soft margarine at 8.33%. This study found that the database of plant sterols facilitates the calculation of plant sterol intake in the typical Polish diet, and the results are comparable to those of other studies, despite different methodologies of nutritional assessment and slightly different databases. The main sources of dietary plant sterols did not differ from the data for other populations. This study confirmed the observation of other research that women's diets may have a higher plant sterol density than men's.
ARTICLE | doi:10.20944/preprints201907.0074.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: proteoforms; database; bioinformatics; pattern; 2DE
Online: 4 July 2019 (10:29:30 CEST)
The human proteome is composed of diverse and heterogeneous gene products/proteoforms. Previously, we discussed the main technical aspects of developing an inventory of human proteoforms that would be visually attractive, clear, and easy to search (Naryzhny S., J. Proteomics 2018, S1874-3919(18)30220-3). Here, we present the first draft of a proteoform database based on that discussion. The database's principles and structure are described. The database is called "2DE-pattern", as it contains multiple isoform-centric patterns of proteoforms separated according to 2DE principles.
ARTICLE | doi:10.20944/preprints202110.0033.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Antibiotic resistance; text mining; therapy; database
Online: 4 October 2021 (08:58:52 CEST)
Antimicrobial resistance (AMR) is one of the top 10 threats affecting global health. AMR defeats the effective prevention and treatment of infections caused by microbial pathogens, including bacteria, parasites, viruses and fungi (WHO). Microbial pathogens have a natural tendency to evolve and mutate over time, resulting in AMR strains. The set of genes involved in antibiotic resistance, termed "antibiotic resistance genes" (ARGs), spreads through species by lateral gene transfer, thereby causing global dissemination. While this biological mechanism is prevalent in the spread of AMR, human practices also augment it through mechanisms such as over-prescription, incomplete treatment, and environmental waste. A considerable portion of the scientific community is engaged in AMR-related work, trying to discover novel therapeutic solutions for tackling resistant pathogens. A comprehensive inspection of the literature shows that diverse therapeutic strategies have evolved over recent years. Collectively, these strategies include novel small molecules, newly identified antimicrobial peptides, bacteriophages, phytochemicals, nanocomposites, and novel phototherapy against bacteria, fungi and viruses. In this work we have developed a comprehensive knowledgebase by collecting alternative antimicrobial therapeutic strategies from literature data. We used a subjective approach for data mining new strategies, resulting in broad coverage of entities, and subsequently added objective data such as entity name, potency and safety information. The extracted data were organized into KOMBAT (Knowledgebase Of Microbes' Battling Agents for Therapeutics). Many of these agents have been tested against AMR pathogens. We envision that this database will be noteworthy for developing future therapeutics against resistant pathogens. The database can be accessed through http://kombat.igib.res.in/.
ARTICLE | doi:10.20944/preprints201903.0015.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: plant; sesquiterpenes; biosynthesis; graph grammars; database;
Online: 1 March 2019 (14:30:16 CET)
Plants produce a diverse portfolio of sesquiterpenes that are important in their response to herbivores and in interactions with other plants. Their biosynthesis from farnesyl diphosphate depends on sesquiterpene synthases. Here, we investigate to what extent metabolic pathways can be reconstructed from knowledge of the final product and the reaction mechanisms catalyzed by sesquiterpene synthases alone. We use the software package MedØlDatschgerl (MØD) to generate chemical networks and elucidate the pathways they contain. As examples, we successfully consider the reachability of the important plant sesquiterpenes β-caryophyllene, α-humulene, and β-farnesene. We also introduce a graph database to integrate simulation results with experimental biological evidence for the biosynthesis of selected predicted sesquiterpenes.
ARTICLE | doi:10.20944/preprints202310.0097.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: Model performance; medico-administrative database; clinical database; Brier score; area under the receiver operating characteristic; discrimination; calibration
Online: 3 October 2023 (08:23:00 CEST)
In medico-administrative databases, certain prognostic factors cannot be taken into account. The main objective was to estimate the performance of two models based on two databases: the Epithor clinical database and a medico-administrative database. For each of the two databases, we randomly sampled a development dataset with 70% of the data and a validation dataset with 30%. Model performance was assessed by the Brier score, the area under the receiver operating characteristic (AUC ROC) curve and model calibration. For the Epithor and medico-administrative databases, the development dataset included 10,516 patients (with 227 (2.16%) and 283 (2.7%) deaths, respectively) and the validation dataset included 4,507 patients (with 93 (2%) and 119 (2.64%) deaths, respectively). Fifteen predictors were selected in the models (including FEV1, body mass index, ASA score and TNM stage for Epithor). The Brier score values were similar for the models of the two databases. For validation data, the AUC ROC was 0.73 [0.68-0.78] for Epithor and 0.8 [0.76-0.84] for the medico-administrative database. The slope of the calibration plot was less than 1 for the two databases. This work shows the good performance of a model developed from a medico-administrative database, despite the absence of clinical variables used in practice by surgeons, such as FEV1, ASA score or TNM stage.
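The Brier score used above is a standard proper scoring rule: the mean squared difference between a predicted probability and the observed binary outcome, so lower is better. A minimal sketch (the numbers are toy values, not the study's data):

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Toy illustration: sharper, well-calibrated predictions score lower (better)
# than uninformative 0.5 predictions.
good = brier_score([0.9, 0.1, 0.8], [1, 0, 1])  # ≈ 0.02
poor = brier_score([0.5, 0.5, 0.5], [1, 0, 1])  # 0.25
```

Because the Brier score mixes discrimination and calibration, studies like this one report it alongside the AUC ROC (pure discrimination) and a calibration slope.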
REVIEW | doi:10.20944/preprints202311.0673.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: Transposon annotation; plant; genome; bioinformatics pipeline; database
Online: 10 November 2023 (07:32:14 CET)
Transposons are mobile DNA sequences that contribute large fractions of many plant genomes. They provide exclusive resources for tracking gene and genome evolution and for developing molecular tools for basic and applied research. Despite extensive efforts, it is still challenging to accurately annotate transposons, especially for beginners, as transposon prediction requires expertise in both transposon biology and bioinformatics. Moreover, the complexity of plant genomes and the dynamic evolution of transposons further complicate genome-wide transposon discovery. This review summarizes the three major strategies for transposon detection (repeat-based, structure-based, and homology-based annotation), introduces the transposon superfamilies identified in plants thus far, and lists related bioinformatics resources for detecting plant transposons. Furthermore, it describes transposon classification and explains why the terms 'autonomous' and 'non-autonomous' cannot be used to classify transposon superfamilies. Last, this review discusses how to identify misannotated transposons and improve the quality of transposon databases. This review provides helpful information about plant transposons and a beginner's guide to annotating these repetitive sequences.
ARTICLE | doi:10.20944/preprints202304.0546.v1
Subject: Social Sciences, Education Keywords: Economics universities; Learning; Online database systems; Students
Online: 19 April 2023 (05:01:25 CEST)
This study aims to evaluate the determinants that influence the adoption of online databases in the learning process of students at economics universities in Vietnam. A quantitative study with a meta-analysis was conducted by utilizing structural equation modeling (SEM). The sample consisted of 492 students from economics universities located in Vietnam who were surveyed using stratified random sampling. The results indicate that the adoption of online databases in student learning is influenced by six determinants, namely: (i) perceived effectiveness, (ii) perceived ease of use, (iii) technical barriers, (iv) personal usefulness, (v) usage attitudes, and (vi) convenience. Our study has revealed that students' intention to use the online database system is positively influenced by their perceived ease of use and perceived usefulness. These findings could be valuable in shaping policies for enhancing the online database system at economics universities, taking into account the students' characteristics and the institution's needs.
BRIEF REPORT | doi:10.20944/preprints202301.0235.v1
Subject: Medicine And Pharmacology, Medicine And Pharmacology Keywords: COVID-19; SARS-CoV-2; Homeopathy; Database
Online: 13 January 2023 (04:33:23 CET)
The COVID-19 pandemic has posed an unprecedented challenge to healthcare, and the available solutions are unsatisfactory. Classical homeopathy may have a role to play in alleviating this burden. COVID-19 cases treated with homeopathy were curated with the intention of providing basic information for further studies. The results are promising although far from definitive. 367 patients were considered for statistical analysis; the mean age of the participants was 42.75 years, with 166 males and 201 females. The mean follow-up period was 6.5 (SD 5.3) days, with a median of 1 homeopathic remedy used per case. 192 patients were diagnosed by RT-PCR, 111 by the WHO clinical criteria and 64 via retrospective antibody testing. According to the WHO criteria, 255 were confirmed cases, 61 were probable cases, and 51 were suspected cases. 73.8% of COVID-19 patients improved under homeopathic treatment, including 78.6% of those with severe disease. Correlational analyses showed that the presence of fever was associated with a higher likelihood of improvement, while increasing age and a greater number of homeopathic remedies required in a case were negatively associated with improvement. Severe cases, however, were more likely to improve under homeopathic treatment.
ARTICLE | doi:10.20944/preprints202011.0656.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: orchid books; database; species; documentations; policy; extinction
Online: 26 November 2020 (07:36:45 CET)
Orchids, totalling close to 4000 species in Malaysia, form one of the most diverse and widespread plant families in the country. In recent years they have gained recognition among policy makers and guardians of the forest as a group whose profile can benefit plant conservation on a broad scale. They are listed not only as a conservation indicator but also as priority germplasm for a sustainable floriculture industry in the country, a milestone that could safeguard wild orchids from the verge of extinction in their natural habitats. Through our 30 years of studying orchids in the wild, we understand more about the distribution, rarity, threats and extinction of orchids than ever before, and we have the scientific tools to address many of the problems; yet many species face daily threats, including habitat loss and unsustainable exploitation, mainly via Internet trade. Prior to executing a workable conservation plan, various research institutions worked closely with Forestry Departments in Malaysia to first inventory and document the orchid species richness in the country. The Selangor, Sarawak and Perlis Forestry Departments, in collaboration with UPM, have published seven orchid books that cover various habitat types. The Selangor Forestry Department is leading the publication of biodiversity data in book form for its various ecotourism sites and State Parks, and has published two books on orchids. Sarawak has published one book on limestone orchids, and Perlis, the first state to embark on the feat with a book published in 2010, is currently preparing a new book that includes other flagship wild flowers. Realizing the importance of documenting its biodiversity wealth, Malaysia has developed an information system intended as a one-stop retrieval point and repository for biodiversity facts and, as part of its commitments to the CBD, to facilitate reporting and the transfer of biological diversity and conservation-related information both nationally and internationally.
ARTICLE | doi:10.20944/preprints201609.0124.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: breast cancer; immunosuppressive factor; biomarker; online database
Online: 30 September 2016 (09:34:30 CEST)
To screen and validate immunosuppressive factors associated with malignant phenotypes in luminal- and basal-like breast cancer cell lines and tissue samples, the mRNA microarray datasets GSE40057 and GSE1561 were downloaded and remodelled. Differentially expressed genes (DEGs) were identified, enrichment analyses were performed, and the online resources GOBO and Kaplan-Meier Plotter were employed to screen for immunosuppressive factors associated with breast cancer malignant phenotypes. qRT-PCR and western blot were used to validate the expression of CD274 and IL8 in cell lines, and immunohistochemistry was used to detect MIF and VEGFA on tissue microarrays. The results showed that CD274 and IL8 were both upregulated in basal-like cell lines, that MIF expression was dramatically increased in patients with breast cancer metastases (p<0.05), and that VEGFA expression positively correlates with breast cancer pathologic grade (p<0.05). During the formation and development of breast cancer, immune-related genes are always activated, and the immunosuppressive factors CD274, IL8, MIF and VEGFA are upregulated. Such molecules could be used as biomarkers for breast cancer prognosis. However, because individual immune-related factors can play several biological roles, the mechanistic relationship between immunosuppressive factors and breast cancer malignant phenotypes and the feasibility of their application as drug targets require further investigation.
ARTICLE | doi:10.20944/preprints201901.0130.v1
Subject: Business, Economics And Management, Business And Management Keywords: internationalisation of SMEs; big data; market-oriented information; relational database; supply chain network; optimized database; trade condition; data visualization
Online: 14 January 2019 (10:04:03 CET)
There have been many discussions of the globalisation of SMEs, but little academic work has followed the study of Born Global (BG) ventures. The internationalisation of SMEs (Small and Medium Enterprises) is not easy because they lack resources and capabilities compared to multinational corporations. This study investigated the role of government in assisting the internationalisation of SMEs. In particular, because SMEs lack the ability to acquire market-oriented information, we established a scheme for an efficient information support system for the internationalisation of SMEs. In other words, we propose an information analysis system built on a relational database constructed for market-oriented information support. KISTI (Korea Institute of Science and Technology Information), one of the government-funded research institutes in the Republic of Korea, provided information support to SMEs dealing with hydrazine-related products. This study presents this case as an example of government market-oriented information support for the internationalisation of SMEs. The research on government information support is meaningful in that it suggests a way to support SMEs at a practical level.
ARTICLE | doi:10.20944/preprints202309.0366.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Crops production; Agricultural management; Food security; Open database
Online: 6 September 2023 (03:54:44 CEST)
In recent years the yields of fruits and vegetables have been decreasing in Colombia, threatening national food security. Analysis of crop production data may help identify cropping systems that have shown better adaptability to changes in climatic and non-climatic factors associated with agricultural production. The open database AGRONET stores data on the agricultural activities conducted in Colombia, with information organized by crop, region and year; each row of the database registers farm information in Colombia. Aiming to identify resilient crop systems, agricultural data on fruits and vegetables were analyzed. First, trends in crop production were studied by year and location, detecting the regions and crops with the highest yields in the period from 2006 to 2020. Then, mixed linear regression and principal components analysis were applied to elucidate the relation between non-climatic factors and crop yield. In Colombia, vegetable production was more efficient than fruit production, with observed yields of 10.23 and 13.33 t ha-1, respectively. The Colombian central region showed high yields for vegetables, while for fruits high yields were exhibited in northern and eastern locations. In the present study, yield variation responded to changes in the location of crop systems, while year had no effect on vegetable production. Furthermore, the price of the agricultural product and the cost of fertilizers were associated with the yield of the analyzed crop systems. In Colombia, carrot, cabbage, tomato, papaya and pineapple are resilient crops whose yields depend mainly on the regions where they are cultivated.
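The association between factors such as fertilizer cost and crop yield reported above rests on standard correlation/regression machinery. As a minimal sketch with invented toy numbers (not AGRONET data), the Pearson correlation underlying such an analysis can be computed as:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy, illustrative numbers: yield (t/ha) tends to fall as fertilizer
# cost (arbitrary units) rises, giving a strongly negative correlation.
r = pearson_r([100, 120, 140, 160], [12.0, 11.1, 10.4, 9.8])
```

The study's mixed linear regression extends this idea by adding random effects for grouping factors such as region, which pure correlation cannot capture.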
ARTICLE | doi:10.20944/preprints202304.0265.v1
Subject: Medicine And Pharmacology, Pediatrics, Perinatology And Child Health Keywords: Drug information; drug database; drug formulary; neonatal; pediatric
Online: 12 April 2023 (09:32:12 CEST)
Neonatal drug information (DI) is essential for safe and effective pharmacotherapy in (pre)term neonates. Such information is usually absent from drug labels, making formularies a crucial part of the neonatal clinician’s toolbox. Several formularies exist worldwide, but they have never been fully mapped nor compared for content, structure and workflow. The objective of this review was to identify neonatal formularies, explore (dis)similarities, and raise awareness of their existence. Neonatal formularies were identified through self-acquaintance, experts and structured search. A questionnaire was sent to all identified formularies to provide details on formulary function. An original extraction tool was employed to collect DI from the formularies on the 10 most commonly used drugs in (pre)term neonates. Eight different neonatal formularies were identified worldwide (Europe, USA, Australia-New Zealand, Middle East). Six responded to the questionnaire and were compared for structure and content. Each formulary has its own workflow, monograph template and style, and update routine. Focus on certain aspects of DI also varies, as well as the type of initiative and funding. Clinicians should be aware of the various formularies available and their differences in characteristics and content to use them properly for the benefit of their patients.
ARTICLE | doi:10.20944/preprints202211.0439.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: coauthorship; coauthorship Network; Link Prediction; Graph Database; Nodes
Online: 23 November 2022 (07:38:22 CET)
In the modern world, where research is taking a huge leap, the collaboration network between authors is also expanding, increasing the probability of different authors coming together to work on the same project or the same research paper, making them co-authors. In coauthorship networks, link prediction is used to anticipate new interactions between members that are likely to occur in the future. Researchers have concentrated their efforts on studying and suggesting methods for providing effective recommendations of authors who could collaborate on a scientific endeavor. To provide precise link prediction, a graph database approach is proposed in this paper, using nodes to determine the most probable future co-authors. To forecast the connections, we preprocessed the dataset to retain the most relevant content. A supervised learning approach is used to execute the solution, with a random forest classifier and logistic regression. The first findings of our technique reveal that the sum of two author nodes' research collaboration indices influences the performance of supervised link prediction more than the traditional approach does, which encourages us to conduct further study on employing such a forecast.
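The paper does not publish its feature-extraction code; as a hedged sketch, two standard coauthorship link-prediction features can be computed directly from an adjacency representation. The graph, author labels, and the use of node degree as a stand-in for the paper's "collaboration index" are all illustrative assumptions:

```python
# Toy coauthorship graph: author -> set of coauthors (labels are invented).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

def common_neighbors(g, u, v):
    """Classic link-prediction feature: number of shared coauthors."""
    return len(g[u] & g[v])

def degree_sum(g, u, v):
    """Sum of the two nodes' degrees, used here as a simple stand-in
    for the paper's sum of collaboration indices (an assumption)."""
    return len(g[u]) + len(g[v])

# Candidate (non-)edge A-D: one shared coauthor (B), degree sum 2 + 1 = 3.
score_ad = (common_neighbors(graph, "A", "D"), degree_sum(graph, "A", "D"))
```

In a supervised setup, such per-pair features become the input rows for a classifier (e.g., logistic regression or a random forest), with the label indicating whether the pair later coauthored a paper.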
ARTICLE | doi:10.20944/preprints202007.0051.v2
Subject: Social Sciences, Library And Information Sciences Keywords: COVID-19; WHO; database; systematic review; data quality
Online: 2 August 2020 (17:43:38 CEST)
Introduction: A large number of COVID-19 publications has created a need to collect all research-related material in practical and reliable centralized databases. The aim of this study was to evaluate the functionality and quality of the compiled World Health Organisation COVID-19 database and compare it to PubMed and Scopus. Methods: Article metadata for COVID-19 articles and for articles on 8 specific topics related to COVID-19 was exported from the WHO global research database, Scopus and PubMed. The analysis was conducted in R to investigate the number and overlap of articles between the databases and the missingness of values in the metadata. Results: The WHO database contains the largest number of COVID-19 related articles overall but retrieved the same number of articles on the 8 specific topics as Scopus and PubMed. Despite having the smallest number of exclusive articles overall, the highest number of exclusive articles on specific COVID-19 related topics was retrieved from the Scopus database. Further investigation revealed that PubMed and Scopus have a more comprehensive structure than the WHO database, and fewer missing values in the categories searched by the information retrieval systems. Discussion: This study suggests that the WHO COVID-19 database, even though it is compiled from multiple databases, has a very simple and limited structure, and significant problems with data quality. As a consequence, relying on this database as a source of articles for systematic reviews or bibliometric analyses is undesirable.
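The overlap and exclusivity counts reported above reduce to set operations on article identifiers. The study used R; the following is an equivalent Python sketch with invented DOI placeholders standing in for the exported metadata:

```python
# Hypothetical identifier sets standing in for exported article metadata.
who    = {"d1", "d2", "d3", "d4"}
pubmed = {"d2", "d3", "d5"}
scopus = {"d3", "d5", "d6"}

def exclusive(target, *others):
    """Articles found only in `target` and in none of the other databases."""
    rest = set().union(*others)
    return target - rest

who_only    = exclusive(who, pubmed, scopus)  # articles exclusive to WHO
overlap_all = who & pubmed & scopus           # articles present in all three
```

In practice the hard part is not the set algebra but record linkage: matching the same article across databases despite differing DOI coverage, title formatting, and metadata completeness.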
REVIEW | doi:10.20944/preprints202007.0479.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Face Recognition; Face Analysis; Face Database; Deep Learning
Online: 21 July 2020 (11:13:45 CEST)
Face recognition is one of the most active research fields of computer vision and pattern recognition, with many practical and commercial applications including identification, access control, forensics, and human-computer interaction. However, identifying a face in a crowd raises serious questions about individual freedoms and poses ethical issues. Significant methods, algorithms, approaches, and databases have been proposed over recent years to study constrained and unconstrained face recognition. 2D approaches have reached some degree of maturity and report very high recognition rates. This performance is achieved in controlled environments where the acquisition parameters, such as lighting, angle of view, and camera-subject distance, are controlled. However, if the ambient conditions (e.g., lighting) or the facial appearance (e.g., pose or facial expression) change, this performance degrades dramatically. 3D approaches were proposed as an alternative solution to the problems mentioned above. The advantage of 3D data lies in its invariance to pose and lighting conditions, which has enhanced the efficiency of recognition systems, although 3D data remains somewhat sensitive to changes in facial expression. This review presents the history of face recognition technology, the current state-of-the-art methodologies, and future directions. We specifically concentrate on the most recent databases and on 2D and 3D face recognition methods. In addition, we pay particular attention to the deep learning approach, as it represents the current state of the field. Open issues are examined and potential directions for research in facial recognition are proposed in order to provide the reader with a point of reference for topics that deserve consideration.
ARTICLE | doi:10.20944/preprints201909.0070.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Microbiome, Inferred functions, Database, 16S, Metagenomics, Comparative metagenomics
Online: 6 September 2019 (09:44:29 CEST)
Motivation: 16S rRNA gene amplicon based sequencing has significantly expanded the scope of metagenomics research by enabling microbial community analyses in a cost-effective manner. The possibility of inferring the functional potential of a microbiome through amplicon-sequencing-derived taxonomic abundance profiles has further strengthened the utility of 16S sequencing. In fact, a surge in 'inferred function metagenomic analysis' has recently taken place, wherein most 16S microbiome studies include inferred functional insights in addition to taxonomic characterization. Tools like PICRUSt, Tax4Fun, Vikodak and iVikodak have significantly eased the process of inferring the functional potential of a microbiome from its taxonomic abundance profile. A platform that can host inferred-function metagenomic studies with the comprehensive metadata-driven search utilities of a typical database, coupled with on-the-fly comparative analytics between studies of interest, would be a major improvement to the state of the art. ReFDash represents an effort in this direction. Methods: This work introduces ReFDash, a Repository of Functional Dashboards. ReFDash, developed as a significant extension of iVikodak (a function inference tool), provides three broad unique offerings in the inferred-function space: (i) a platform hosting a database of inferred-function data, continuously updated using public 16S metagenomic studies; (ii) a tool to search studies of interest and compare up to three metagenomic environments on the fly; and (iii) a community initiative wherein users can contribute their own inferred-function data to the platform. ReFDash therefore provides a first-of-its-kind community-driven framework for scientific collaboration, data analytics, and sharing in this area of microbiome research. Results: Overall, the ReFDash database is aimed at compiling a global ensemble of 16S-derived functional metagenomics projects. 
ReFDash currently hosts close to 50 ready-to-use, re-analyzable functional dashboards representing data from approximately 18,000 microbiome samples sourced from various published studies. Each entry also provides direct downloadable links to associated taxonomic files and metadata employed for analysis. Conclusion: The vision behind ReFDash is creation of a framework, wherein users can not only analyze their microbiome datasets in functional terms, but also contribute towards building an information base by submitting their functional analyses to ReFDash database. ReFDash web-server may be freely accessed at https://web.rniapps.net/iVikodak/refdash/
ARTICLE | doi:10.20944/preprints201908.0070.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: city; large urban regions; Russia; globalization; open database
Online: 6 August 2019 (08:33:24 CEST)
This study explores how to delineate Russian cities in order to make them comparable on the world scale. In doing so we introduce the concept of large urban regions (LUR) applicable to the Russian urban context. This research is motivated by a principal research question: how to construct a statistical urban delineation that would allow, first, demonstrating the integration of cities into globalization and, second, enabling global comparative urban research. Previous studies on urban delineation in Russia have focused almost exclusively on functional urban areas, which have substantial limitations and are not suitable for global urban comparisons. Addressing this research gap, we propose a new definition of Large Urban Regions (LUR). In doing so, we first introduce the context of Russian cities (2), then discuss existing Russian urban concepts (3) and justify the need for a new urban delineation (4). Afterwards, we present a general method to delineate Large Urban Regions in the Russian context (5.1) and illustrate it with the two case studies of St. Petersburg (a polycentric region) and Samara (a monocentric region) (5.2). In the last part (6), we discuss the 10 largest urban regions in Russia and describe a constructed database including all Russian LURs.
REVIEW | doi:10.20944/preprints201902.0063.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: SISAL database; speleothem; cave; isotopes; Middle East; palaeoclimate
Online: 6 February 2019 (13:32:13 CET)
The Middle East (ME) spans the transition from a temperate Mediterranean climate in the Levant to hyper-arid sub-tropical deserts in the southern Arabian Peninsula, with the complex alpine topography in the northeast feeding the Euphrates and Tigris rivers, which support life in the southeastern Fertile Crescent (FC). Climate projections predict severe drying in much of the ME in response to global warming, making it important to understand the controls on hydro-climate perturbations in the region. Here we discuss 23 ME speleothem stable oxygen isotope (δ18Occ) records from 16 sites in the SISAL_v1 database, which provide a record of past hydro-climatic variability. Sub-millennial changes in ME speleothem δ18Occ values primarily indicate changes in past precipitation amounts, superimposed on variations in the main synoptic pattern in the region, specifically Mediterranean cyclones. The coherency (or lack thereof) between regional records is reviewed from the Pleistocene to the present, covering the Last Glacial Maximum (LGM), prominent events during deglaciation, and the transition into the Holocene. The available speleothem δ18Occ time series are investigated by binning and normalizing in 25-year and 200-year time windows over the Holocene. Important Holocene climatic oscillations are discussed, such as the 8.2 ka, 4.2 ka, and 0.7 ka (Little Ice Age) Before Present events. Common trends in the standardized anomalies are tested against different climate archives. Finally, recommendations for future speleothem-based research in the region are given, along with comments on the utility and completeness of the SISAL database.
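The binning-and-normalization step described above can be sketched in a few lines: average each record into fixed-width time windows, then standardize the bin means so records with different absolute δ18O values become comparable. The ages and values below are invented for illustration, not SISAL data.

```python
# Bin a speleothem isotope record into fixed time windows, then z-score
# the bin means (standardized anomalies). Synthetic example data only.

def bin_and_normalize(ages, values, width):
    """Average values into bins of `width` years, then z-score the bin means."""
    bins = {}
    for age, v in zip(ages, values):
        bins.setdefault(int(age // width), []).append(v)
    keys = sorted(bins)
    means = [sum(bins[k]) / len(bins[k]) for k in keys]
    mu = sum(means) / len(means)
    sd = (sum((m - mu) ** 2 for m in means) / len(means)) ** 0.5
    return [(k * width, (m - mu) / sd) for k, m in zip(keys, means)]

ages = [3, 12, 27, 31, 55, 62, 78, 90]                    # years BP (synthetic)
d18o = [-5.1, -5.3, -4.8, -4.9, -5.6, -5.4, -4.7, -4.6]   # per mil (synthetic)
print(bin_and_normalize(ages, d18o, 25))
```

The same function applied with width 200 gives the coarser windows mentioned in the review; real SISAL records would additionally need age-model uncertainty handling, which is omitted here.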
ARTICLE | doi:10.20944/preprints201810.0103.v1
Subject: Biology And Life Sciences, Virology Keywords: Nipah Virus, outbreak, inhibitors, QSAR, database, prediction algorithm
Online: 5 October 2018 (15:04:23 CEST)
Nipah virus (NiV) has caused several outbreaks in Asian countries, the latest in the Kerala state of India. To date, no drug is available despite the urgent need. In the current study, we provide a computational one-stop solution for NiV inhibitors. We have developed the "anti-Nipah" web resource, which comprises a data repository, a prediction method, and data visualization modules. The database contains 313 (181 unique) inhibitors from different strains and outbreaks of NiV, extracted from research articles and patents. The quantitative structure-activity relationship (QSAR) based predictors were built using a classification approach employing 10-fold cross-validation through a support vector machine with 120 (68p + 52n) inhibitors. The overall predictor showed an accuracy of 88.89% and a Matthews correlation coefficient of 0.77 on the training/testing dataset, and performed equally well on the independent validation dataset. The data visualization modules, based on chemical clustering and principal component analyses, display the diversity of the NiV inhibitors. Our web platform should therefore be of immense help to researchers developing effective inhibitors against NiV. The user-friendly web server is freely available at http://bioinfo.imtech.res.in/manojk/antinipah/
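The two performance figures quoted for the classifier, accuracy and the Matthews correlation coefficient (MCC), are both computed from a confusion matrix. A minimal sketch follows; the counts are hypothetical, chosen only so the totals match the 120-compound dataset size, and are not the study's actual confusion matrix.

```python
# Accuracy and Matthews correlation coefficient from a binary confusion
# matrix (TP/TN/FP/FN). Counts below are illustrative, not from the paper.
import math

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

tp, tn, fp, fn = 60, 47, 5, 8   # hypothetical fold-aggregated counts
print(f"accuracy={accuracy(tp, tn, fp, fn):.4f}  MCC={mcc(tp, tn, fp, fn):.2f}")
```

MCC is preferred alongside accuracy here because the dataset is mildly imbalanced (68 positives vs. 52 negatives), and MCC penalizes classifiers that exploit that imbalance.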
ARTICLE | doi:10.20944/preprints202307.1978.v1
Subject: Medicine And Pharmacology, Urology And Nephrology Keywords: enfortumab vedotin; urothelial carcinoma; database; real‐world; immune checkpoint
Online: 28 July 2023 (11:46:48 CEST)
Background: Enfortumab vedotin shows promise as a targeted therapy for advanced urothelial carcinoma, particularly in patients who previously received platinum-based chemotherapy and immune checkpoint inhibitors. The EV-301 phase III trial demonstrated significantly improved overall survival and response rates compared with standard chemotherapy. However, more data, especially from larger real-world studies, are needed to assess its effectiveness in Japanese patients. Methods: A total of 6,007 urothelial cancer patients treated with pembrolizumab as second-line treatment were analyzed. Among them, 619 patients received enfortumab vedotin after pembrolizumab, while 394 received docetaxel or paclitaxel after pembrolizumab. Results: The enfortumab vedotin group showed longer overall survival than the paclitaxel/docetaxel group (p=0.013, hazard ratio: 0.71). In the multivariate analysis, enfortumab vedotin induction was an independent prognostic factor for overall survival (p=0.013, hazard ratio: 0.70). There were no significant differences in cancer-specific survival. Conclusions: Enfortumab vedotin prolonged overall survival in Japanese patients with advanced or metastatic urothelial carcinoma compared with paclitaxel or docetaxel after pembrolizumab treatment.
ARTICLE | doi:10.20944/preprints202307.0189.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: GIST; GI neurotransmitter; Pacemaker; SEER database; Clinical characteristics; Mortality
Online: 4 July 2023 (10:23:16 CEST)
Background: Gastrointestinal stromal tumors (GIST) are rare mesenchymal neoplasms of the gastrointestinal tract (GIT) that represent approximately 1 to 2 percent of primary gastrointestinal (GI) cancers. Owing to their rarity, very little is known about the overall epidemiology and prognostic factors of this pathology. The purpose of this study is to investigate the clinical characteristics, survival outcomes, and independent prognostic factors of patients with GIST in the past decade. Methods: A total of 2,374 patients diagnosed with GIST between 2010 and 2017 were ultimately enrolled in our study by retrieving the Surveillance, Epidemiology, and End Results (SEER) database. We analyzed demographics, clinical characteristics, and overall mortality (OM) as well as cancer-specific mortality (CSM) of GIST. Variables with a p-value < 0.01 in the univariate Cox regression were incorporated into the multivariate Cox model to determine the independent prognostic factors, with a hazard ratio (HR) greater than 1 representing an adverse prognostic factor. Results: Multivariate Cox proportional hazard regression analyses of factors affecting all-cause and GIST-related mortality among US patients between 2010 and 2017 revealed higher overall mortality in non-Hispanic Blacks (HR=1.516, 95% CI 1.172-1.961, p=0.002), age 80+ (HR=9.783, 95% CI 4.185-22.868, p=0), followed by age 60-79 (HR=3.408, 95% CI 1.488-7.807, p=0.004); male patients (HR=1.795, 95% CI 1.461-2.206, p=0); advanced disease with distant metastasis (HR=3.865, 95% CI 2.977-5.019, p=0), followed by regional involvement by both direct extension and lymph node involvement (HR=3.853, 95% CI 1.551-9.57, p=0.004); and widowed patients (HR=1.975, 95% CI 1.494-2.61, p=0), followed by single patients (HR=1.53, 95% CI 1.154-2.028, p=0.003). The highest CSM was observed in the same groups, except widowed patients and patients aged 60-79.
The highest CSM was also observed among patients who underwent chemotherapy (HR=1.687, 95% CI 1.19-2.392, p=0.003). Conclusion: In this United States population-based retrospective cohort study using the SEER database, we found that non-Hispanic Blacks, male patients, and patients older than 60 years have a higher mortality with GIST. Furthermore, patients who received chemotherapy have a higher GIST-specific mortality, and married patients had a lower mortality. However, we do not know to what extent these independent prognostic factors interact with each other to influence mortality; this study paves the way for future studies addressing those interactions. The results of this study may help treating clinicians identify patient populations with a dismal prognosis, as those may require closer follow-up and more intensive therapy. Furthermore, given that married patients had better survival, we hope to encourage clinicians to involve family members of affected patients early in the disease course, as social support might impact the prognosis.
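The hazard ratios and confidence intervals reported throughout these SEER analyses follow directly from the Cox model's log-hazard coefficients: HR = exp(beta), with 95% CI = exp(beta +/- 1.96*SE), and HR > 1 marking an adverse prognostic factor. A short sketch with an invented coefficient and standard error:

```python
# Map a Cox-model coefficient (log hazard) to a hazard ratio with a 95% CI.
# The beta/SE values are hypothetical, for illustration only.
import math

def hazard_ratio(beta, se):
    hr = math.exp(beta)
    lo, hi = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
    return hr, lo, hi

beta, se = 0.416, 0.131          # hypothetical log-hazard estimate and SE
hr, lo, hi = hazard_ratio(beta, se)
print(f"HR={hr:.3f} (95% CI {lo:.3f}-{hi:.3f})  adverse={hr > 1}")
```

Note that a CI whose lower bound exceeds 1 corresponds to a statistically significant adverse factor, which is why HR and CI are always reported together above.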
REVIEW | doi:10.20944/preprints202307.0144.v1
Subject: Medicine And Pharmacology, Other Keywords: Cancer; database; genomic; proteomic; lipidomic; glycomic; clinical trials
Online: 4 July 2023 (08:37:30 CEST)
Our search of existing cancer databases aimed to assess the current landscape and identify key needs. We analyzed 71 databases, focusing on genomics, proteomics, lipidomics, and glycomics. We found a lack of cancer-related lipidomic and glycomic databases, indicating a need for further development in these areas. Proteomic databases dedicated to cancer research were also limited. To assess overall progress, we included human non-cancer databases in proteomics, lipidomics, and glycomics for comparison. This provided insights into advancements in these fields over the past eight years. We also analyzed other types of cancer databases, such as clinical trial databases and web servers. Evaluating user-friendliness, we used the FAIRness principle to assess findability, accessibility, interoperability, and reusability. This ensured databases were easily accessible and usable. Our search summary highlights significant growth in cancer databases while identifying gaps and needs. These insights are valuable for researchers, clinicians, and database developers, guiding efforts to enhance accessibility, integration, and usability. Addressing these needs will support advancements in cancer research and benefit the wider cancer community.
ARTICLE | doi:10.20944/preprints202304.1149.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Primary cardiac sarcoma; SEER database; prognostic factors; clinical characteristics
Online: 28 April 2023 (08:42:14 CEST)
Background: Primary cardiac sarcomas (PCS) are extremely rare malignant tumors involving the heart, and only isolated case reports have been described. There is a paucity of data on the epidemiological characteristics of PCS. This study investigates the epidemiologic characteristics, survival outcomes, and independent prognostic factors of PCS. Methods: We enrolled a total of 362 patients with PCS, between 2000 and 2017, by retrieving the Surveillance, Epidemiology, and End Results (SEER) database. We analyzed demographics, clinical characteristics, and overall mortality (OM) as well as cancer-specific mortality (CSM) of PCS. Variables with a p-value < 0.1 in the univariate Cox regression were incorporated into the multivariate Cox model to determine the independent prognostic factors, with a hazard ratio (HR) greater than 1 representing an adverse prognostic factor. Results: Crude analysis revealed a high OM in age 80+ (HR=5.958, 95% CI 3.357-10.575, p=0), followed by age 60-79 (HR=1.429, 95% CI 1.028-1.986, p=0.033), and in PCS with distant metastases (HR=1.888, 95% CI 1.389-2.566, p=0). Patients who underwent surgical resection of the primary tumor (HR=0.606, 95% CI 0.465-0.791, p=0) and patients with malignant fibrous histiocytomas (HR=0.657, 95% CI 0.455-0.95, p=0.025) had lower OM. The highest cancer-specific mortality was observed in age 80+ (HR=5.037, 95% CI 2.606-9.736, p=0) and in patients with distant metastases (HR=1.953, 95% CI 1.396-2.733, p=0). Patients with malignant fibrous histiocytomas (HR=0.572, 95% CI 0.378-0.865, p=0.008) and those who underwent surgery (HR=0.581, 95% CI 0.436-0.774, p=0) had lower CSM. Multivariate Cox proportional hazard regression analyses revealed higher OM in the age group 80+ (HR=13.261, 95% CI 5.839-30.119, p=0) and in advanced disease with distant metastases (HR=2.013, 95% CI 1.355-2.99, p=0.001).
Lower OM was found in patients with rhabdomyosarcoma (HR=0.364, 95% CI 0.154-0.86, p=0.021) and in widowed patients (HR=0.506, 95% CI 0.263-0.977, p=0.042). Multivariate Cox proportional hazard regression analyses of CSM also revealed higher mortality in the same groups, and lower mortality in patients with rhabdomyosarcoma. Conclusion: In this United States population-based retrospective cohort study using the SEER database, we found that cardiac rhabdomyosarcoma was associated with the lowest CSM and OM. Furthermore, as expected, age and advanced disease at diagnosis were independent factors predicting poor prognosis. Surgical resection of the primary tumor showed lower CSM and OM in the crude analysis, but when adjusted for covariates in the multivariate analysis it did not significantly impact overall or cancer-specific mortality. These findings allow treating clinicians to recognize patients who should be referred to palliative/hospice care at the time of diagnosis and to avoid surgical interventions, as these did not show any difference in mortality. Surgical resection in patients with a poor prognosis should be reserved as a palliative measure rather than an attempt to cure the disease.
ARTICLE | doi:10.20944/preprints202301.0249.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: Cervus nippon; mineral requirements; sodium; TRY Plant Traits Database
Online: 13 January 2023 (09:34:30 CET)
Deficient minerals in overabundant populations could act as an attractant to cull sika deer (Cervus nippon). Because selective culling of female deer is reported to be effective in reducing sika deer populations, it is particularly important to clarify the differences in mineral requirements between males and females. Here, using global plant trait data and a published list of sika deer food plants in Japan, we estimated whether food plants provide sika deer with sufficient sodium (Na), calcium (Ca), and magnesium (Mg), and compared the results between males and females. An analysis of 191 food plant species suggested that food plants can provide sufficient Mg, whereas sufficient Na and Ca are not always provided, especially when the intake is small or the deer is large. Na deficiency was more intense for lactating females than for males, suggesting that Na can be an effective attractant for selectively culling female deer. In summary, this study demonstrated that sika deer in Japan might require extra Na and Ca sources in addition to food plants, and these minerals could therefore be useful for developing effective culling methods.
ARTICLE | doi:10.20944/preprints201902.0264.v1
Subject: Engineering, Civil Engineering Keywords: beams; database; experiments; flexure; shear; steel fiber reinforced concrete
Online: 28 February 2019 (07:10:10 CET)
Adding steel fibers to concrete improves the capacity in tension-driven failure modes. An example is the shear capacity of steel fiber reinforced concrete (SFRC) beams with longitudinal reinforcement and without shear reinforcement. Since no mechanical models exist that can fully describe the behavior of SFRC beams without shear reinforcement failing in shear, a number of empirical equations have been suggested in the past. This paper compiles the existing empirical equations and code provisions for the prediction of the shear capacity of SFRC beams failing in shear, as well as a database of 487 experiments reported in the literature. The experimental shear capacities from the database are then compared to the prediction equations. This comparison shows a large scatter in the ratio of experimental to predicted values. The practice of defining the tensile strength of SFRC based on different experiments internationally makes the comparison difficult. For design purposes, the code prediction methods based on the Eurocode shear expression provide reasonable results (with coefficients of variation on the ratio of tested to predicted results of 27% to 29%). None of the currently available methods properly describes the behavior of SFRC beams failing in shear. As such, this work shows the need for studies that address the different shear-carrying mechanisms in SFRC and its crack kinematics.
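The comparison metric cited above, the coefficient of variation (CoV) of the tested-to-predicted ratio, is the sample standard deviation of the ratios divided by their mean. A small sketch with synthetic ratios (not values from the 487-test database):

```python
# Coefficient of variation of tested/predicted shear capacity ratios,
# the scatter measure used to rank prediction equations. Synthetic data.

def cov(ratios):
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)  # sample variance
    return var ** 0.5 / mean

ratios = [1.10, 0.85, 1.32, 0.95, 1.05, 1.48, 0.78, 1.20]  # V_test / V_pred (invented)
print(f"CoV = {cov(ratios):.1%}")
```

A mean ratio near 1.0 with a low CoV indicates an unbiased, low-scatter prediction method; the 27% to 29% figures quoted for the Eurocode-based methods correspond to substantial but workable scatter for design.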
REVIEW | doi:10.20944/preprints201809.0246.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: SISAL database; speleothem; cave; oxygen isotopes; Western Europe; palaeoclimate
Online: 13 September 2018 (15:32:34 CEST)
Western Europe is the region with the highest density of published speleothem δ18O (δ18Ospel) records worldwide. Here we review these records in light of the recent publication of the Speleothem Isotopes Synthesis and Analysis (SISAL) database. We investigate how representative the spatial and temporal distribution of the available records is of climate in Western Europe, and review potential sites and strategies for future studies. We show that spatial trends in precipitation δ18O are mirrored in the speleothems, providing a means to better constrain the factors influencing δ18Ospel at a specific location. Coherent regional δ18Ospel trends are found over stadial-interstadial transitions of the last glacial, especially in high-altitude Alpine records. Over the Holocene, regional trends are less clearly expressed, due to lower signal-to-noise ratios in δ18Ospel, but can potentially be extracted with the use of statistical methods. Overall, this first assessment highlights the potential of the European region for speleothem palaeoclimate reconstruction, while underpinning the importance of knowledge of local factors for a correct interpretation of δ18Ospel.
ARTICLE | doi:10.20944/preprints202310.1692.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Ontology; Database; Cardiovascular Diseases; Diagnosis; Decision Support Systems
Online: 26 October 2023 (10:08:19 CEST)
Cardiovascular diseases (CVD) are chronic diseases associated with a high risk of mortality and morbidity. Early detection of CVD is crucial to initiating timely interventions, such as appropriate counseling and medication, which can effectively manage the condition and improve patient outcomes. Preventive measures should be implemented at the general public level, by promoting a healthy lifestyle, and at the individual level, that is, in people at moderate to high risk of CVD or patients already diagnosed with CVD, by addressing an unhealthy lifestyle. Personalized early diagnostic systems based on artificial intelligence (AI), ontologies, and other medical information processing systems may prove to be a great preventive measure. In this paper, we focus on the use of ontology-inspired database models in the diagnosis of cardiovascular disease, as well as their potential for use in web application development.
ARTICLE | doi:10.20944/preprints202309.0119.v1
Subject: Medicine And Pharmacology, Anesthesiology And Pain Medicine Keywords: opioids; adverse effect database; FAERS, reporting odds ratio; cluster analysis
Online: 4 September 2023 (04:07:36 CEST)
Adverse events associated with opioid use in palliative care have been extensively studied. However, predicting the occurrence of adverse events based on the specific opioid used remains difficult. This study aimed to comprehensively analyze the adverse events caused by µ-receptor-stimulating opioids approved in Japan and to investigate tendencies of adverse event occurrence among different opioids. We utilized the FDA Adverse Event Reporting System (FAERS) database to extract reported adverse events of opioids approved in Japan. We calculated the reporting odds ratios (RORs) of adverse events for the target opioids and performed a cluster analysis on the RORs to visualize the relationships between opioids and adverse events, which resulted in the classification of the 11 target opioids into five distinct groups. This allowed us to comprehensively compare and examine the relationships between opioids and adverse events, helping to understand and manage the risks and benefits of each drug in palliative care settings. By analyzing these relationships, clinicians can make informed decisions about opioid selection, dosage, and monitoring to maximize patient safety and comfort.
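The reporting odds ratio used in this FAERS analysis is computed from a 2x2 contingency table of spontaneous reports: ROR = (a/b)/(c/d), with an approximate 95% CI from the log-scale standard error. A minimal sketch with made-up report counts:

```python
# Reporting odds ratio (ROR) for a drug-event pair in a spontaneous
# reporting database. a: reports of the event with the drug of interest;
# b: other events with that drug; c: the event with all other drugs;
# d: other events with other drugs. Counts are invented for illustration.
import math

def ror(a, b, c, d):
    est = (a / b) / (c / d)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)          # SE of ln(ROR)
    lo, hi = est * math.exp(-1.96 * se), est * math.exp(1.96 * se)
    return est, lo, hi

est, lo, hi = ror(a=120, b=880, c=400, d=9600)
print(f"ROR={est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A lower confidence bound above 1 is the usual disproportionality signal; clustering opioids on vectors of such RORs across adverse-event terms is what yields the five groups described above.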
ARTICLE | doi:10.20944/preprints202307.1694.v1
Subject: Medicine And Pharmacology, Internal Medicine Keywords: Idiopathic Pulmonary Fibrosis; Lung Neoplasms; Hydroxymethylglutaryl-CoA Reductase Inhibitors; Database
Online: 25 July 2023 (10:01:14 CEST)
Little is known about the effect of statin use on lung cancer development in idiopathic pulmonary fibrosis (IPF). We analyzed the database of the National Health Insurance Service to investigate the clinical impact of statins on lung cancer development and overall survival (OS) in IPF patients. The analysis included 9,182 individuals diagnosed with IPF, of whom 3,372 (36.7%) were statin users. Compared to non-users, statin users had a longer time from IPF diagnosis to lung cancer development and longer OS. In Cox proportional hazard regression models, higher statin compliance, statin use, and female sex were inversely associated with lung cancer risk, while older age at IPF diagnosis and smoking history were associated with a higher risk of lung cancer in IPF patients. For OS, statin use, female sex, higher exercise frequency, and diabetes were associated with longer survival, whereas older age at IPF diagnosis and smoking history were associated with shorter OS. These data from a large population indicate that statin use had an independent protective association with lung cancer development and mortality in IPF patients.
ARTICLE | doi:10.20944/preprints202211.0495.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: skin classification; skin detection; skin segmentation; skin database; neural networks
Online: 28 November 2022 (05:16:35 CET)
Skin detection, the process of distinguishing between skin and non-skin regions in a digital image, is widely used in applications ranging from hand gesture analysis to body-part tracking to facial recognition. It is a challenging problem that has received much attention and many proposals from the research community in the context of intelligent systems, but the lack of common benchmarks and unified testing protocols has hampered fair comparison among approaches. Recently, the success of deep neural networks has had a major impact on the field of image segmentation, resulting in various successful models to date. In this work, we survey the most recent research in this field and propose fair comparisons between approaches using several different datasets. The main contributions of this work are: (i) a comprehensive literature review of approaches to skin color detection and a comparison of approaches that may help researchers and practitioners choose the best method for their application; (ii) a comprehensive list of datasets that report ground truth for skin detection; and (iii) a framework for evaluating and combining different skin detection approaches. Moreover, we propose an ensemble of convolutional neural networks and transformers that obtains state-of-the-art performance. All the code is made publicly available at https://github.com/LorisNanni
REVIEW | doi:10.20944/preprints202206.0209.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: Quantum cryptography; Oblivious transfer; Secure multiparty computation; Private database query
Online: 15 June 2022 (02:31:33 CEST)
Quantum cryptography is the field of cryptography that explores the quantum properties of matter. Its aim is to develop primitives beyond the reach of classical cryptography or to improve on existing classical implementations. Although much of the work in this field has been dedicated to quantum key distribution (QKD), some important steps were made towards the study and development of quantum oblivious transfer (QOT). It is possible to draw a comparison between the application structure of both QKD and QOT primitives. Just as QKD protocols allow quantum-safe communication, QOT protocols allow quantum-safe computation. However, the conditions under which QOT is actually quantum-safe have been subject to a great amount of scrutiny and study. In this review article, we survey the work developed around the concept of oblivious transfer in the area of theoretical quantum cryptography, with an emphasis on some proposed protocols and their security requirements. We review the impossibility results that daunt this primitive and discuss several quantum security models under which it is possible to prove QOT security.
ARTICLE | doi:10.20944/preprints202107.0406.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: universal health coverage; health insurance claims; administrative data; claims database
Online: 19 July 2021 (11:38:35 CEST)
Although universal health coverage (UHC) is pursued by many countries, not all countries with UHC include dental care among their benefits. Japan, with its long-held tradition of UHC, has covered dental care as an essential benefit, and the majority of dental care services are provided to all patients with minimal copayment. Under UHC, the scope of services as well as prices are regulated by a uniform fee schedule, and dentists submit claims according to the uniform format and fee schedule. The author analyzes publicly available dental health insurance claims data as well as a sampling survey on dental hygiene and illustrates how Japan's dental care is responding to the challenges of population ageing.
ARTICLE | doi:10.20944/preprints202105.0068.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Mitochondrial Encephalohepatopathy, Trio-family, autosomal recessive, GEMINI tool, ClinVar database
Online: 5 May 2021 (15:02:57 CEST)
Mitochondrial encephalohepatopathy (MEH) is an autosomal recessive neurodevelopmental disorder usually accompanied by microcephaly, white matter changes, and cardiac and hepatic failure. Here, we applied a whole-exome sequencing (WES) framework to trio-family data comprising unaffected non-consanguineous parents and a proband (a neonate girl) with this inherited disorder. A total of 2,928,402 variants were observed: 2,613,746 SNPs, 112,336 multiple nucleotide polymorphisms (MNPs), 72,610 insertions, 113,207 deletions, and 16,503 mixed variants. These variations are responsible for 82,813,631 effects on various genomic regions. Our pipeline uncovered candidate gene mutations from these variants and retained 5,277 variants harboring 3,598 genes, of which 8 genes code for non-coding RNA while 178 genes carry variants of high impact severity. Among these 178 variants, 125 are de novo variants not previously reported in the ClinVar database. Consistent with previous studies, the remaining high-impact-severity genes are involved in encephalopathy, Leigh syndrome, Charcot-Marie-Tooth disease, global developmental disorder, seizures, spastic paraplegia, premature ovarian failure, mitochondrial myopathy with cerebellar ataxia and pigmentary retinopathy syndrome, ocular and retinal degeneration, deafness, intellectual disability, cardiofacioneurodevelopmental syndrome, and more. All these clinical features were also observed in the patient studied. The current analysis highlights and expands the genetic architecture of the MEH phenotype. Furthermore, this pipeline on trio-family data significantly broadens the concept of its usefulness as a first-tier diagnostic method in the detection of complex multisystem phenotypic disorders.
ARTICLE | doi:10.20944/preprints202102.0270.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: MPDB2.0; medicinal plant; medicinal plant database of Bangladesh; folk medicine
Online: 10 February 2021 (16:29:00 CET)
Medicinal plants are generally defined as rare herbals with potent medicinal activities that can be used as alternative treatments for diseases. Recent studies exploring novel medicine development originating from folk-medicinal practices challenge this notion and suggest that both the scope of the term "medicinal plant" and its potential applications cover a substantially wider range than previously suggested. While medicinal plants are not limited to the borders of any country, Bangladesh and its south-east Asian neighbors boast a huge collection of potent medicinal plants with a considerable folk-medicine history compared to most other countries of the world. MPDB 2.0 is the continuation of MPDB 1.0; it serves as both a data repertoire for the medicinal plants of Bangladesh and a user-friendly interface for researchers, health practitioners, drug developers, and students who wish to study the various medicinal and nutritive plants scattered around Bangladesh and the underlying phytochemicals contributing to their efficacy in folk medicine. While human diseases were a major focus in developing MPDB 2.0, the information in this database is not limited to human diseases, as many of the plants indexed here can serve in developing biofuels, bioremediation technologies, nutritive diets, cosmetics, etc. MPDB 2.0 comprises a collection of more than five hundred medicinal plants from Bangladesh, along with a record of their corresponding scientific, family, and local names together with their utilized parts, information regarding ailments, active compounds, and PubMed IDs of related publications.
ARTICLE | doi:10.20944/preprints201811.0280.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: HSP47; missense mutation; mutational hotspot; variant analysis; cancer database; chaperone
Online: 12 November 2018 (10:23:29 CET)
Heat shock protein 47 kDa (HSP47) serves as a client-specific chaperone essential for collagen biosynthesis and its folding and structural assembly. To date, there is no comprehensive study of mutational hotspots and the protein network for human HSP47. Using five different human mutational databases, we deduced a comprehensive list of human HSP47 mutations, finding 24, 67, 50, 43, and 2 deleterious mutations in the 1000 Genomes data, gnomAD, COSMICv86, cBioPortal, and CanVar, respectively. We identified thirteen top-ranked missense mutations of HSP47 with stringent cut-offs of CADD score (>25) and Grantham score (≥151): Ser76Trp, Arg103Cys, Arg116Cys, Ser159Phe, Arg167Cys, Arg280Cys, Trp293Cys, Gly323Trp, Arg339Cys, Arg373Cys, Arg377Cys, Ser399Phe, and Arg405Cys, with the arginine-to-cysteine change as the predominant mutation. We also found that HSP47 is up-regulated in 11 and down-regulated in 4 cancer types. Upon constructing a protein interactome map of human HSP47, we found that a set of molecular chaperones are interaction partners of HSP47, including two copies each of CREB binding proteins, HSP27, HSP40, HSP70, HSP90, and ubiquitin proteins, and one copy each of cartilage-associated protein (CRTAP), HSPH1, HSBP1, FK506-binding protein B (FKBP), Kruppel-like factor (KLF13), peptidyl-prolyl isomerase PPIB, and prolyl 4-hydroxylase beta subunit (P4HB). This suggests that a cocktail of different chaperones interacts with HSP47. These findings will assist in evaluating the roles of HSP47 in human disease, including different types of cancer.
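The two-score filter used to rank the missense mutations, CADD > 25 combined with Grantham score >= 151, is straightforward to express in code. The variant records below are invented examples (the Grantham values for the listed amino-acid substitutions are standard table values, but the CADD scores are made up):

```python
# Keep missense variants passing both deleteriousness cut-offs:
# CADD score > 25 and Grantham score >= 151. Records are illustrative.

def top_missense(variants, cadd_cut=25, grantham_cut=151):
    return [v["change"] for v in variants
            if v["cadd"] > cadd_cut and v["grantham"] >= grantham_cut]

variants = [
    {"change": "Arg103Cys", "cadd": 28.1, "grantham": 180},
    {"change": "Ala45Val",  "cadd": 12.3, "grantham": 64},   # fails both cuts
    {"change": "Ser76Trp",  "cadd": 26.0, "grantham": 177},
    {"change": "Arg339His", "cadd": 27.5, "grantham": 29},   # fails Grantham cut
]
print(top_missense(variants))
```

Requiring both scores, one from a genome-wide deleteriousness predictor (CADD) and one from physicochemical amino-acid distance (Grantham), is what makes the thirteen retained mutations "top-ranked" rather than merely rare.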
ARTICLE | doi:10.20944/preprints201804.0202.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: place; spatial property graph; place graph; graph database; place description
Online: 16 April 2018 (10:05:25 CEST)
Everyday place descriptions provide a rich source of knowledge about places and their relative locations. This research proposes a place graph model for modeling the spatial, non-spatial, and contextual knowledge in place descriptions. The model extends a prior place graph and overcomes a number of its limitations. The model is implemented using the Neo4j graph database, and a management system has been developed that supports operations including querying, mapping, and visualizing the stored knowledge in the extended place graph. Three experimental tasks, namely georeferencing, reasoning, and querying, are then used to demonstrate the advantages of the extended model.
ARTICLE | doi:10.20944/preprints201804.0134.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: airborne laser scanning; geospatial database; data retrieval; road median; attributes
Online: 11 April 2018 (04:27:42 CEST)
Laser scanning systems use Light Detection and Ranging (LiDAR) technology to acquire accurately georeferenced sets of dense 3D point cloud data. The information acquired by these systems yields better knowledge of terrain objects, which are inherently 3D in nature. LiDAR data acquired from mobile, airborne, or terrestrial platforms provide several benefits over conventional data sources in terms of accuracy, resolution, and attributes. However, the large volume and scale of LiDAR data have inhibited the development of automated feature extraction algorithms due to the extensive computational cost involved. Moreover, the heterogeneously distributed point cloud, which represents objects of varying size, point density, holes, and complicated structures, poses a great challenge for data processing. Current geospatial database systems do not provide a robust solution for the efficient storage and access of raw data in a way that allows processing over an optimal spatial extent. In this paper, we present the Global LiDAR and Imagery Mobile Processing Spatial Environment (GLIMPSE) system, which provides a framework for the storage, management, and integration of 3D LiDAR data acquired from multiple platforms. The system facilitates efficient access to the raw dataset, which is hierarchically represented in a geographically meaningful way. We use the GLIMPSE system to automatically extract road medians from Airborne Laser Scanning (ALS) point clouds. In the first part of this paper, we detail an approach to efficiently retrieve point cloud data from the GLIMPSE system for a particular geographic area based on user requirements. In the second part, we present an algorithm to automatically extract the road median from the retrieved LiDAR data. The algorithm uses the LiDAR elevation and intensity attributes to distinguish the median from the road surface.
We successfully tested our algorithms on two road sections with distinct road median types, based on concrete barriers and grass hedges. The use of GLIMPSE improved the efficiency of road median extraction through fast access to ALS point cloud data for the required road sections. The developed system and its associated algorithms provide a comprehensive solution to user requirements for the efficient storage, integration, retrieval, and processing of large volumes of LiDAR point cloud data. These findings contribute to a more rapid, cost-effective, and comprehensive approach to surveying road networks.
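The elevation-plus-intensity discrimination described above can be illustrated with a toy threshold rule; the thresholds and point values below are invented, and the paper's actual algorithm is more involved:

```python
# Toy sketch: flag points raised above the local road surface whose return
# intensity differs from asphalt (e.g. grass hedges give low-intensity returns).

def classify_median(points, elev_thresh=0.15, intensity_thresh=40):
    """Return candidate median points from a small road patch."""
    road_level = min(p["z"] for p in points)  # crude local road-surface height
    return [p for p in points
            if p["z"] - road_level > elev_thresh
            and p["intensity"] < intensity_thresh]

points = [
    {"z": 10.00, "intensity": 55},  # road surface
    {"z": 10.40, "intensity": 22},  # raised, low intensity -> median candidate
    {"z": 10.02, "intensity": 60},  # road surface
]
median_pts = classify_median(points)
print(len(median_pts))  # 1
```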
ARTICLE | doi:10.20944/preprints202206.0289.v1
Subject: Engineering, Civil Engineering Keywords: database; eccentric punching shear; experiments; flat slab; punching; reinforced concrete; shear
Online: 21 June 2022 (05:42:44 CEST)
Eccentric punching shear can occur in concrete slab-column connections when the connection is subjected to shear combined with unbalanced moments. This situation typically arises at edge and corner columns and is thus a common practical case. However, most punching experiments in the literature concern concentric punching shear. This paper presents a newly developed database of eighty-eight experiments on flat slabs under eccentric punching shear, including a summary of the testing procedure of each reference and a description of the slab specimens. Additionally, a linear finite element analysis of all specimens is included to determine the relevant sectional shear forces and moments. Finally, the ultimate shear stresses from the database experiments are compared with the shear capacities determined with ACI 318-19, Eurocode 2 NEN-EN 1992-1-1:2005, and the Model Code 2010. The comparison shows that the Model Code 2010 gives the most precise predictions, with an average tested-to-predicted ratio of 0.96 and a coefficient of variation of 27.96%. It can be concluded that this study exposes the inconsistencies of currently used design methods and the lack of experimental information.
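The two comparison statistics quoted above (average tested-to-predicted ratio and coefficient of variation) can be computed as follows; the five ratios are invented example values, not the database's results:

```python
# Compute mean tested/predicted ratio and sample coefficient of variation (%)
# for a set of specimens, as done when ranking the design codes.
import statistics

ratios = [0.90, 1.05, 0.88, 1.01, 0.96]  # V_test / V_predicted per specimen

mean_ratio = statistics.mean(ratios)
cov_percent = statistics.stdev(ratios) / mean_ratio * 100  # sample CoV in %

print(round(mean_ratio, 2))  # 0.96
```

A mean ratio near 1.0 with a low CoV indicates an accurate, consistent code prediction; the Model Code 2010's 0.96 mean with 27.96% CoV reflects accurate but scattered predictions.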
ARTICLE | doi:10.20944/preprints202111.0019.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Industry 4.0; Database; Data models; Big Data & Analytics; Asset Administration Shell
Online: 1 November 2021 (13:01:51 CET)
The data-oriented paradigm has proven fundamental to the technological transformation process that characterizes Industry 4.0 (I4.0), to the point that Big Data & Analytics is considered a technological pillar of this process. The literature reports a series of system architecture proposals that seek to implement the so-called Smart Factory, which is primarily data-driven. Many of these proposals treat data storage solutions as mere entities that support the architecture's functionalities. However, the choice of logical data model can significantly affect the architecture's performance. This work identifies the advantages and disadvantages of relational (SQL) and non-relational (NoSQL) data models for I4.0, taking into account the nature of the data in this process. The characterization of data in the context of I4.0 is based on the five dimensions of Big Data and on a standardized format for representing asset information in the virtual world, the Asset Administration Shell. This makes it possible to identify appropriate transactional properties and logical data models according to the volume, variety, velocity, veracity, and value of the data, and thereby to describe the suitability of SQL and NoSQL databases for different scenarios within I4.0.
DATA DESCRIPTOR | doi:10.20944/preprints202106.0368.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Microbial Mash database; Mash distance; Genome containment; Type material; Microbial taxonomy
Online: 14 June 2021 (14:54:32 CEST)
The analysis of curated genomic, metagenomic, and proteomic data is of paramount importance in the fields of biology, medicine, education, and bioinformatics. Although this type of data is usually hosted in raw form in free international repositories, accessing it demands substantial computing, storage, and processing capacity from the domestic user. The purpose of this study is to offer a comprehensive set of genomic and proteomic reference data to the scientific community in an accessible and easy-to-use form. A representative type-material set of genomes, proteomes, and metagenomes was downloaded directly from https://www.ncbi.nlm.nih.gov/assembly/ and from the Genome Taxonomy Database, covering the major groups of Bacteria, Archaea, Viruses, and Fungi. Sketched databases were subsequently created and stored as handy, reduced raw representations using the Mash software. Our dataset compresses nearly 100 GB of disk space into 585.78 MB and represents 87,476 genomic/proteomic records from eight informative contexts, prefiltered to make them accessible and usable with modest computational resources. Potential uses of this dataset include, but are not limited to, microbial species delimitation, estimation of genomic distances, detection of genomic novelties, and paired comparisons between proteomes, genomes, and metagenomes.
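The sketching idea behind Mash can be illustrated in miniature: hash each sequence's k-mers, keep only the s smallest hashes as the sketch, estimate Jaccard similarity from the merged sketches, and convert it to a distance with d = -(1/k) ln(2j / (1 + j)). The parameters below are toy-sized assumptions (real Mash defaults are k = 21, sketch size 1000):

```python
# Simplified bottom-s MinHash sketch and Mash distance estimate.
import hashlib
import math

def sketch(seq, k=4, s=8):
    """Bottom-s sketch: the s smallest hashed k-mers of seq."""
    hashes = {int(hashlib.sha1(seq[i:i + k].encode()).hexdigest(), 16)
              for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]

def mash_distance(a, b, k=4, s=8):
    sa, sb = sketch(a, k, s), sketch(b, k, s)
    merged = sorted(set(sa) | set(sb))[:s]          # merged bottom-s sketch
    shared = len(set(merged) & set(sa) & set(sb))
    j = shared / len(merged)                        # Jaccard estimate
    if j == 0:
        return 1.0                                  # no shared k-mers: max distance
    if j == 1:
        return 0.0
    return -math.log(2 * j / (1 + j)) / k

print(mash_distance("ACGTACGTAC", "ACGTACGTAC"))  # 0.0 (identical sequences)
```

Storing only the sketches is what lets ~100 GB of genomes shrink to a few hundred MB while still supporting distance estimation.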
Subject: Business, Economics And Management, Business And Management Keywords: Business Intelligence; Information systems; database; schemas; SGBD; DSS; GDSS; EIS; EDSS
Online: 24 March 2021 (12:02:23 CET)
The objective of this article is to present some concepts for better understanding what business intelligence is, what it means and what it is used for, how it has influenced the way companies create their business strategies, and why this tool helps us make decisions within an organization.
REVIEW | doi:10.20944/preprints202004.0012.v1
Subject: Chemistry And Materials Science, Theoretical Chemistry Keywords: chemoinformatics; chemical space; database; LANaPD; molecular diversity; drug discovery; natural sources
Online: 2 April 2020 (04:47:13 CEST)
Around the world, the number of public-domain compound databases of natural products is rising, in line with the increasingly synergistic combination of natural product research and chemoinformatics. As part of this global endeavor, countries in Latin America are assembling, curating, and analyzing the contents and diversity of the natural products available in their geographical regions. In this manuscript we collect and analyze the efforts that Latin American countries have made so far to build natural product databases. We further encourage the scientific community, particularly in Latin America, to continue building quality natural product databases and, whenever possible, to make them publicly accessible. We propose that all these compound collections be assembled into a unified resource called LANaPD: the Latin America Natural Products Database. Opportunities and challenges in building, distributing, and maintaining LANaPD are also discussed.
REVIEW | doi:10.20944/preprints201901.0099.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: SISAL database; speleothem; cave; oxygen isotopes; North America; Central America; Caribbean
Online: 10 January 2019 (11:58:08 CET)
Speleothem oxygen isotope records from the Caribbean, Central, and North America reveal climatic controls that include orbital variation, deglacial forcing related to ocean circulation and ice sheet retreat, and the influence of local and remote sea surface temperature variations. Here, we review these records and the global climate teleconnections they suggest following the recent publication of the Speleothem Isotopes Synthesis and Analysis (SISAL) database. We find that low-latitude records generally reflect changes in precipitation, whereas higher latitude records are sensitive to temperature and moisture source variability. Tropical records suggest precipitation variability is forced by orbital precession and North Atlantic Ocean circulation driven changes in atmospheric convection on long timescales, and tropical sea surface temperature variations on short timescales. On millennial timescales, precipitation seasonality in southwestern North America is related to North Atlantic climate variability. Great Basin speleothem records are closely linked with changes in Northern Hemisphere summer insolation. Although speleothems have revealed these critical global climate teleconnections, the paucity of continuous records precludes our ability to investigate climate drivers from the whole of Central and North America for the Pleistocene through modern. This underscores the need to improve spatial and temporal coverage of speleothem records across this climatically variable region.
ARTICLE | doi:10.20944/preprints201706.0062.v1
Subject: Social Sciences, Library And Information Sciences Keywords: h-index; citations; published version; Scopus database; highly cited paper; bibliometrics
Online: 14 June 2017 (06:07:12 CEST)
The number of citations a paper has received is the most commonly used indicator of research quality. Researchers, journals, and universities want more citations for their scholarly publications to increase their h-index, impact factor, and ranking, respectively. In this paper, we analyze the effect of the number of available Google Scholar versions of a paper on its citation count. We analyzed 10,162 papers published in the Scopus database in 2010 by the top five Malaysian universities. We then developed a software tool to automatically collect the number of citations and versions of each paper from Google Scholar. The Spearman correlation coefficient revealed a significant positive association between the number of Google Scholar versions of a paper and the number of times it has been cited.
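The Spearman analysis described above ranks both series and takes the Pearson correlation of the ranks. A stdlib-only sketch, using invented (versions, citations) pairs rather than the study's sample:

```python
# Spearman rank correlation from scratch: rank each series (averaging ties),
# then compute Pearson correlation on the ranks.
from statistics import mean

def ranks(xs):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # average of the tied positions
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

versions  = [1, 3, 2, 5, 4]   # Google Scholar versions per paper (toy data)
citations = [0, 4, 1, 9, 6]   # citation counts (toy data)
print(spearman(versions, citations))  # 1.0 (perfectly monotone toy data)
```

In practice one would use scipy.stats.spearmanr, which also reports a p-value for the significance claim.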
ARTICLE | doi:10.20944/preprints202310.2003.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Image classification; Computer vision; Transfer learning; Image database; Plant nutrition; Leaf analysis
Online: 31 October 2023 (08:13:17 CET)
Computer vision is a powerful technology that has enabled solutions in various fields by analyzing visual attributes in images. One field that has taken advantage of computer vision is agricultural automation, which promotes high-quality crop production. The nutritional status of a crop is a crucial factor in determining its productivity. This status is mediated by approximately 14 chemical elements acquired by the plant, and their determination plays a pivotal role in farm management. To address the timely identification of nutritional disorders, this study focuses on the classification of three levels of phosphorus deficiencies through individual leaf analysis. The methodological steps include: (1) generating a database with laboratory-grown maize plants that were induced to total phosphorus deficiency, medium deficiency, and total nutrition, using different capture devices; (2) processing the images with state-of-the-art transfer learning architectures (i.e. VGG16, ResNet50, GoogLeNet, DenseNet201, and MobileNetV2); and (3) evaluating the classification performance of the models using the created database. The results show that the VGG16 model achieves superior performance, with 98% classification accuracy. However, the other studied architectures also demonstrate competitive performance and are considered state-of-the-art automatic leaf deficiency detection tools. The proposed method can be a starting point to fine-tune machine vision-based solutions tailored for real-time monitoring of crop nutritional status.
ARTICLE | doi:10.20944/preprints202310.1754.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: artificial neural network; electric drive system; database; stability; training algorithm; technological mechanisms
Online: 27 October 2023 (04:55:14 CEST)
ARTICLE | doi:10.20944/preprints202308.0575.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: music sound source; copyright information; log data; artificial intelligence; database; settlement; distribution
Online: 8 August 2023 (10:57:33 CEST)
Digital music has become one of the most important parts of the Korean market because of how music royalties are distributed. As the market has shifted to digital channels such as downloads and streaming, the distribution of music royalties starting from the online service provider (OSP) has become highly important for music rights holders. One of the most pressing issues in the current distribution of music royalties in Korea is unfair settlement caused by indiscriminate repeat streaming of digital music. To counter this, music usage log data from several OSPs has been collected on a daily basis; however, this data has been insufficient to identify detailed information on how music is actually used. This paper analyzes the structural problems and limitations of music royalty settlement and proposes a structure that enables transparent settlement and distribution between users and rights holders as an institutional measure. We also propose various AI-based applications using music usage log data. We hope the proposed system will be used for public purposes.
REVIEW | doi:10.20944/preprints202303.0454.v1
Subject: Medicine And Pharmacology, Other Keywords: bibliometric analysis; Scopus database; ISSSTE; Mexican scientific research; scientometrics; manuscript writing
Online: 27 March 2023 (08:48:35 CEST)
Background: Bibliometric analysis provides insight into the knowledge gaps of a specific field. We wanted to know which areas of medical care have been the subject of research by a group of Mexican physicians. The Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado (ISSSTE) cares for a wide spectrum of diseases, providing a unique view of what its specialists have investigated about health. Methods: Papers with an "ISSSTE" affiliation were harvested from the Scopus database and downloaded as .csv Excel files including most bibliometric variables. VOSviewer, biblioshiny, and bibliometrix were used to conduct the bibliometric analysis. Results: 2,063 papers were found and retrieved; internal medicine had the greatest number of papers with 831; nine institutions were listed claiming "ISSSTE" as their parent affiliation; original papers represent 82% of the total, and 52% of them were written in Spanish. Research production is concentrated in Mexico City, and the most productive center is Centro Médico Nacional 20 de Noviembre. Discussion: We identified the main institutions, prolific authors, and top-cited researchers and their affiliations. Our paper is also a call to action for the medical community in Latin America to join efforts in building a solid group of researchers for the future of science.
ARTICLE | doi:10.20944/preprints202212.0378.v1
Subject: Medicine And Pharmacology, Gastroenterology And Hepatology Keywords: Gestational diabetes mellitus; Insulin resistance; Microbiome; Metabolome; Database; Differential abundance; Correlation; Pathways
Online: 21 December 2022 (02:53:39 CET)
Pregnancy is a dynamic state in which multiple metabolic changes occur, including insulin resistance. Gestational diabetes mellitus (GDM), a form of diabetes that appears during pregnancy, develops when aberrations occur in the metabolic adaptations of normal pregnancy, in particular in pregnancy-induced insulin resistance. Multi-omics is a powerful approach for uncovering the mechanisms driving metabolic change in different physiological and pathological states. A recent study demonstrated that the gestational gut microbiome mediates pregnancy metabolic adaptations through effects on gut indoleamine-2,3-dioxygenase 1 activity and the production of kynurenine. Using the dataset generated by this highly controlled study, we performed a comprehensive analysis of the pregnancy-specific physiological and metabolic profiles, 16S rRNA microbiome, and untargeted LC-MS plasma metabolome data. To make these analysis results available to other researchers, we developed MOMMI-MP, a database that provides an easy-to-use platform for browsing and searching differentially abundant microbial taxa and metabolites and for examining metabolic pathways. The datasets comprise data collected from three genetically diverse mouse strains (C57BL/6J, CD1, and NIH-Swiss) over six time points spanning the gestational (days 0, 10, 15, and 19 of gestation) and postpartum (days 3 and 20 after delivery) states, totaling 180 samples per strain. The computational results are presented in various tables and plots and organized in MOMMI-MP to empower exploratory analyses by other researchers. In conclusion, MOMMI-MP is a resource that facilitates the investigation of novel mechanisms governing metabolic changes during pregnancy.
ARTICLE | doi:10.20944/preprints202209.0317.v1
Subject: Medicine And Pharmacology, Dermatology Keywords: Bibliometric analysis; Scopus database; Syphilis; Sexually Transmitted Infections; Public Health; Research; Global
Online: 21 September 2022 (07:14:19 CEST)
Sexually transmitted infections have considerable effects on human sexual and reproductive health, and their presence remains ubiquitous despite decades of prevention and management. The present study provides an insightful bibliometric analysis of syphilis based on the Scopus repository. Given the dearth of a consolidated bibliometric analysis on syphilis, this investigation compiled the literature of the last century (1921-2021) to gain insight into publications on the burden, diagnosis, treatment, and management of syphilis. We report year-wise and subject-wise publications, article types, countries, funding organizations, institutions, citations, and H-index. The data obtained from the Scopus database were exported to CSV format and then converted to Microsoft Excel for analysis, to reduce the chance of errors in the information. The evidence shows that the USA has the highest number of publications. This study thus offers a considerable contribution to future leaders, researchers, and specialist clinicians in the domain.
REVIEW | doi:10.20944/preprints202202.0200.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: antibiotic resistance genes; antibiotic resistance gene database; annotation of antibiotic resistance genes
Online: 17 February 2022 (04:52:10 CET)
As the prevalence of antimicrobial resistance genes in microbes increases, we face a return to the preantibiotic era. Consequently, the number of studies concerning antibiotic resistance and its spread in the environment is growing rapidly. Next-generation sequencing technologies are widely used in many areas of biological research, and antibiotic resistance is no exception. Several tools and data resources have been developed for the rapid annotation of whole-genome sequencing and metagenomic results with respect to antibiotic resistance. These databases, however, can differ fundamentally in the number and type of genes and resistance determinants they comprise. Furthermore, the annotation structure and the metadata stored in these resources also contribute to their differences. Several previous reviews have covered the tools and databases for resistance gene annotation; however, to our knowledge, no previous review has focused solely and in depth on the differences between the databases. In this review, we compare the most well-known and widely used antibiotic resistance gene databases based on their structure and content. We believe this knowledge is fundamental both for selecting the most appropriate database for a given research question and for developing new tools and resources for resistance gene annotation.
ARTICLE | doi:10.20944/preprints202012.0387.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: next-generation sequencing; database; variant annotation; variant classification; data management; clinical genomics
Online: 15 December 2020 (13:14:21 CET)
The rapid evolution of next-generation sequencing in clinical settings, and the resulting challenge of interpreting variants in the light of constantly updated information, require robust data management systems and organized approaches to variant reinterpretation. In this paper, we present iVar: a freely available and highly customizable tool with a user-friendly web interface. It provides a platform for the unified management of variants identified by different sequencing technologies. iVar accepts VCF files and text annotation files as input and processes them, optimizing data organization and avoiding redundancies. Updated annotations can be re-uploaded periodically and associated with variants as historicized attributes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search function can be used to periodically check whether the pathogenicity-related data of a variant have changed over time. Patient recontacting following variant reinterpretation is made easier by iVar through the effective identification of all patients in the database carrying a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance for collecting and managing data from medium-throughput
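Collapsing 4171 VCF files into a non-redundant set of unique variants, as described above, amounts to keying each record by (chromosome, position, ref, alt). A minimal sketch with invented VCF lines (iVar's actual data model is richer):

```python
# Deduplicate variant records across samples by their defining coordinates.

vcf_lines = [
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.",
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.",   # same variant seen in another sample
    "chr2\t500\t.\tC\tT\t.\tPASS\t.",
]

unique = {}
for line in vcf_lines:
    chrom, pos, _id, ref, alt = line.split("\t")[:5]
    # All observations of a variant collapse onto one key.
    unique.setdefault((chrom, int(pos), ref, alt), []).append(line)

print(len(unique))  # 2 unique variants from 3 records
```

The per-key record list is what makes "find every patient carrying this variant" a single lookup.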
ARTICLE | doi:10.20944/preprints202007.0132.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: coronavirus; spike protein; database; sequence alignment; mutation; homology model; hydrophobic amino acids
Online: 7 July 2020 (16:49:04 CEST)
Analysis of SARS-CoV-2 spike protein sequences from over 19 countries, submitted to biological databases around the globe, was carried out with the help of bioinformatics tools and structure prediction databases. Initial data analysis showed that entry of the virus into different geographic regions started in January 2020. Alignment of spike protein sequences of SARS-CoV-2 isolates from China and other countries revealed the critical mutation D614G. Surprisingly, D614G was not seen in early samples submitted in January but gradually began appearing globally from March 2020. Mutations in the spike protein other than D614G, exhibiting similar pI and altered polarity, were found to be specific to geographical regions. In addition, a homology model of the spike protein interaction showed a predominant role of chain C of the trimeric spike protein in adhering to the receptor binding domain (RBD) of the human ACE2 receptor. Furthermore, prediction of glycosylation sites revealed about 20 potential N-glycosylation sites on the spike protein. We believe the information presented here will not only aid a thorough understanding of infectivity but also enhance the scientific community's efforts to develop prophylactics and/or therapeutics for SARS-CoV-2.
COMMUNICATION | doi:10.20944/preprints202003.0125.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: 2019-nCoV; Darunavir; ACE-2; Receptor Binding Domain; Metastable Conformation; FDA database
Online: 7 March 2020 (16:28:05 CET)
The transnational spread of the coronavirus (2019-nCoV) first detected in Wuhan is causing global alarm; accelerated research into clinical intervention is therefore of high necessity. The spike glycoprotein structure has been resolved, and its affinity for human angiotensin-converting enzyme 2 (ACE-2) has been experimentally validated. Here, using computational methods, a metastable conformation of the 2019-nCoV-RBD/ACE-2 complex was revealed, and the FDA database of approved drugs was docked into the interface. Darunavir was discovered as a high-affinity ligand candidate capable of disrupting communication between 2019-nCoV-RBD and ACE-2. Darunavir, in addition to its previously known role as an anti-HIV protease inhibitor, is thus repurposable for the treatment of 2019-nCoV disease, acting via disruption of cellular recognition, binding, and invasion.
ARTICLE | doi:10.20944/preprints201905.0091.v1
Subject: Medicine And Pharmacology, Pharmacology And Toxicology Keywords: Iranian traditional medicine; alternative and complementary medicine; database; natural products; Mizaj; temperament
Online: 8 May 2019 (10:08:54 CEST)
As a holistic medical school, Iranian traditional medicine (ITM) considers the human body a dynamic and intricate network of interconnecting processes. Systems biology, and more precisely systems medicine and pharmacology, can now help rationalize many traditional medications and treatments and elucidate the great deal of knowledge they offer to guide future research in medicine. Therefore, re-organization and standardization of traditional medicine data are needed more than ever. To address this, we have constructed UNaProd, a Universal Natural Product database for the materia medica of ITM. Primarily based on Makhzan al-Advieh, the most recent encyclopedia of materia medica in ITM and the one with the largest number of monographs, the database was created using both text mining methods and manual editing. UNaProd currently hosts 2696 monographs, from herbal to animal to mineral compounds, with 16 diverse attributes such as origin and scientific name. In the current version, UNaProd is hyperlinked to the IrGO and CMAUP databases for Mizaj and molecular features, respectively, and is freely available at http://jafarilab.com/unaprod/.
ARTICLE | doi:10.20944/preprints201709.0126.v1
Subject: Engineering, Control And Systems Engineering Keywords: 3D terrain models; synthetic environment; modeling and simulation; OGC standards; common database
Online: 26 September 2017 (04:13:40 CEST)
Recent advances in sensor and platform technologies such as satellite systems, unmanned aerial vehicles (UAV), manned aerial platforms, and ground-based sensor networks have resulted in massive volumes of data being produced and collected about the Earth. Processing, managing, and analyzing these data is one of the main challenges in the 3D synthetic representations used in modeling and simulation (M&S) of the natural environment. M&S devices, such as flight simulators, traditionally require a variety of different databases to provide a synthetic representation of the world, and M&S often requires the integration of data from a variety of sources stored in different formats. Thus, for the simulation of a complex synthetic environment, such as a 3D terrain model, achieving interoperability among its components (geospatial data, natural and man-made objects, dynamic and static models) is a critical challenge. Conventional approaches used local proprietary data models and formats; they often lacked interoperability and created silos of content within the simulation community. Open geospatial standards are therefore increasingly perceived as a means to promote interoperability and reusability for 3D M&S. In this paper, the Open Geospatial Consortium (OGC) CDB Standard is introduced. "CDB" originally stood for Common DataBase, but it is now treated in the OGC community as a name with no expansion. The OGC CDB is an international standard for structuring, modeling, and storing the geospatial information required in high-performance modeling and simulation applications. CDB defines the core conceptual models, use cases, requirements, and specifications for employing geospatial data in 3D M&S. The main features of the OGC CDB Standard are run-time performance, a fully plug-and-play interoperable geospatial data store, usefulness in 3D and dynamic simulation environments, and the ability to integrate proprietary and open-source data formats.
Furthermore, compatibility with the OGC standards baseline reduces the complexity of discovering, transforming, and streaming geospatial data into the synthetic environment, making the standard more widely acceptable to major geospatial data and software producers. This paper includes an overview of OGC CDB version 1.0, which defines a conceptual model and file structure for the storage, access, and modification of a multi-resolution 3D synthetic environment data store. Finally, this paper presents a perspective on future versions of the OGC CDB and the steps for harmonizing the OGC CDB standard with the rest of the OGC/ISO standards baseline.
ARTICLE | doi:10.20944/preprints202103.0640.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Databases; database administration; database management systems; counting; storage; structure; search; NoSQL; SQL; Oracle; relational databases; non-relational databases; magnetic tapes; punched tapes; relational model; Data mining; Big Data; Data warehouse
Online: 25 March 2021 (16:05:52 CET)
Databases are by far the most valuable asset of companies. The first databases were born when the need was seen not only to count but also to keep some type of record of elements such as crops, animals, money, and property, a record that could be consulted and modified as the situation required. Such databases cannot remain disorganized; they must be managed and administered under established standards that make them understandable and manageable not only to their creators but also to the people who subsequently administer them. Databases and database management systems have an interesting evolutionary history that deserves analysis, and that is the objective of this document. Alongside databases and their management systems arises data mining: briefly, the task of finding common patterns across various data sources and determining how those patterns can be used to predict situations or the outcomes of various circumstances. We also focus on Oracle Data Mining, which, roughly speaking, merges data mining with Oracle, making it a powerful tool for obtaining information and predicting results based on statistics. In this article we study and analyze the ideas, concepts, and basic examples that make up DBMSs and data mining, and we go deeper into the use of decision techniques such as advanced statistical algorithms. We also present a fictitious example of the application of these techniques: predicting which products can be sold based on their relationship with others. Finally, we give a brief explanation of association rules, the data mining cycle, the types of learning, and the evolution that data mining has undergone.
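The association-rule example mentioned above (predicting which products sell together) rests on two metrics: support, the fraction of transactions containing both item sets, and confidence, the fraction of antecedent transactions that also contain the consequent. A toy computation over invented transactions:

```python
# Support and confidence for a single candidate rule antecedent -> consequent.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def rule_metrics(antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / n, both / ante

support, confidence = rule_metrics({"bread"}, {"milk"})
print(support)  # 0.5: bread and milk co-occur in 2 of 4 transactions
```

Mining systems such as Oracle Data Mining search for all rules whose support and confidence exceed user-chosen thresholds, rather than evaluating one rule at a time.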
ARTICLE | doi:10.20944/preprints202311.0894.v1
Subject: Medicine And Pharmacology, Clinical Medicine Keywords: invasive fungal infection; fungus; candida; aspergillus; spondylodiscitis; osteomyelitis; mortality; chang gung research database
Online: 14 November 2023 (10:24:43 CET)
Objectives. Invasive fungal spondylodiscitis (IFSD) is rare and can be lethal in certain circumstances. Previous literature provides limited data concerning its outcomes. This study aimed to establish a risk-scoring system. Methods. A total of 53 patients were included from a multi-centered database in Taiwan. All clinicopathological and laboratory data were retrospectively analyzed. Variables strongly related to 1-year mortality were identified using a multivariate Cox proportional hazards model. A receiver operating characteristic (ROC) curve was used to assess the performance of our IFSD scoring model. Results. Five strong predictors were included in the IFSD score: a predisposing immunocompromised state, an initial presentation of either radiculopathy or myelopathy, initial laboratory findings of WBC > 12.0 or < 0.4 × 10³/µL, hemoglobin < 8 g/dL, and evidence of candidemia. The 1-year mortality rates for patients with IFSD scores of 0, 1, 2, 3, and 4 were 0%, 16.7%, 56.3%, 72.7%, and 100%, respectively. The area under the ROC curve was 0.823. Conclusions. We developed a practical scoring model with easily obtained demographic, clinical, and laboratory parameters to predict the probability of 1-year mortality in patients with IFSD. However, larger-scale, international validation will be necessary before this scoring model can be widely used.
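The scoring scheme described in the abstract, five binary predictors each worth one point, mapped to the reported 1-year mortality rates, can be sketched as follows. The function name, argument names and input format are assumptions; the predictors and rates come from the abstract:

```python
# 1-year mortality rates reported in the abstract for IFSD scores 0..4.
MORTALITY_BY_SCORE = {0: 0.0, 1: 0.167, 2: 0.563, 3: 0.727, 4: 1.0}

def ifsd_score(immunocompromised, radiculopathy_or_myelopathy,
               wbc_abnormal, hemoglobin_low, candidemia):
    """Each of the five predictors contributes one point when present (booleans)."""
    return sum([immunocompromised, radiculopathy_or_myelopathy,
                wbc_abnormal, hemoglobin_low, candidemia])

# Hypothetical patient: immunocompromised, myelopathy, low hemoglobin.
score = ifsd_score(True, True, False, True, False)
print(score, MORTALITY_BY_SCORE.get(score))  # → 3 0.727
```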
ARTICLE | doi:10.20944/preprints202211.0461.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Digital Geological Survey; Mobile GIS; Relational database; Geopackage; Landslide Inventory; Post-disaster management
Online: 25 November 2022 (02:25:21 CET)
Over the last few decades, the approach to geological and geomorphological surveys has changed remarkably. The advent of digital tools has allowed significant advances in the acquisition and management of survey data. In this paper, we demonstrate the development and testing of a new and effective digital survey method that allows for the fast acquisition and collaborative storage and management of data and information. This method was tested in collaboration with five universities for the mapping and classification of landslides in 249 survey areas in Central Italy and, more precisely, in the municipalities affected by the 2016 Central Italy Earthquake. Geological and geomorphological surveys were carried out in the field with tablet PCs, GPS, and cameras. The survey project for collecting field data was based on the structure of the Italian Landslide Inventory (IFFI) and the Territorial Resilience Central Apennines Earthquake Reconstruction (ReSTART) projects. The structure of the database and input forms were implemented for these aims. Moreover, the data and information were retrieved and organised in detailed records useful to the administrative entities.
ARTICLE | doi:10.20944/preprints202207.0037.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: remote sensing; satellite; altimetry; water level; water inland; essential climate variable; database; hydrology
Online: 4 July 2022 (08:02:24 CEST)
Surface water availability is a fundamental environmental variable for implementing effective climate adaptation and mitigation plans, as expressed by scientific, financial and political stakeholders. Recently published requirements urge the need for homogenised access to long historical records at a global scale, together with a standardised characterisation of the accuracy of observations. While satellite altimeters offer worldwide coverage, existing initiatives and online platforms provide derived water level data that are sparse, particularly over complex topographies. This study introduces a new methodology in two steps: 1) teroVIR, a virtual station extractor for more comprehensive, global and automatic monitoring of water bodies, and 2) teroWAT, a multi-mission, interoperable water level processor for handling all terrain types. L2 and L1 altimetry products are used, with state-of-the-art retracker algorithms. The work presents a benchmark between teroVIR and current platforms in West Africa, Kazakhstan and the Arctic: teroVIR shows an unprecedented increase from 55% to 99% in spatial coverage. A large-scale validation of teroWAT yields an average unbiased root mean square error (ubRMSE) of 0.638 m for 36 locations in West Africa. Traditional metrics (ubRMSE, median absolute deviation, Pearson coefficient) show significantly better values for teroWAT than for existing platforms, of the order of 8 cm lower error and 5% higher correlation. teroWAT also performs exceptionally well in the Arctic, where using an algorithm based on L1 rather than L2 products reduces the error by almost 4 m on average. To further compare teroWAT with existing methods, a new scoring option, teroSCO, is presented, which measures the quality of time-series validation transversally and objectively across different strategies.
Finally, teroVIR and teroWAT are implemented as platform-agnostic modules and used by flood forecasting and river discharge methods as relevant examples. A review of applications for miscellaneous end-users is given, tackling the educational challenge raised by the community.
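The validation metric quoted above, unbiased RMSE, is the RMSE computed after removing the mean bias between the two series. A minimal sketch with invented water-level values (not teroWAT output):

```python
import math

def ubrmse(obs, sim):
    """Unbiased RMSE: RMSE after removing the mean bias between the series."""
    n = len(obs)
    bias = sum(s - o for o, s in zip(obs, sim)) / n
    return math.sqrt(sum((s - o - bias) ** 2 for o, s in zip(obs, sim)) / n)

# Toy water-level series in metres (illustrative values only).
gauge     = [2.10, 2.35, 2.80, 2.60, 2.20]
altimetry = [2.45, 2.70, 3.10, 2.95, 2.55]
print(round(ubrmse(gauge, altimetry), 3))  # → 0.02
```

Note that a constant offset between altimetry and gauge (here ~0.34 m) does not affect ubRMSE, which is why it is paired with bias-sensitive metrics when comparing platforms.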
ARTICLE | doi:10.20944/preprints202108.0502.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: Geological surveys; Field mapping; Python; QGIS plugin; RDBMS; Seismic microzonation; SQLite-SpatiaLite Database.
Online: 26 August 2021 (09:54:56 CEST)
MzSTools is a plugin for QGIS developed by the National Research Council (CNR) as part of the activities concerning the coordination of seismic microzonation studies in Italy. It arose from the need for a practical and easy-to-use tool for carrying out seismic microzonation (SM) studies by producing standards-compliant geographic databases and maps, thus making them accurate, homogeneous and uniform for all municipalities in Italy. The geodatabase, based on the SQLite/SpatiaLite Relational Database Management System (RDBMS), has been designed to collect and store data related to elements such as: geognostic surveys; bedrock and cover terrains; superficial and buried geomorphological elements; tectonic-structural elements; elements of geological instability such as landslide zones, liquefaction zones and zones affected by active and capable faults; microzones that are homogeneous in seismic perspective; and microzones characterized by a seismic amplification factor. The QGIS plugin provides tools such as data entry forms designed with Qt Designer; a QGIS project template with layers, symbol libraries and graphic styles; and layouts for the SM maps. MzSTools assembles in a single software environment a set of useful tools for those who work in seismic microzonation. The plugin is open source, with its code hosted on GitHub, and is published via the official QGIS plugins repository (https://plugins.qgis.org/plugins/MzSTools/).
ARTICLE | doi:10.20944/preprints202105.0386.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: tumor microenvironment; meta-analysis; tumor stroma; breast cancer; LCM; microdissection; transcriptomics; microarray; database
Online: 17 May 2021 (13:17:53 CEST)
Background: transcriptome data provide a valuable resource for the study of cancer molecular mechanisms, but technical biases, sample heterogeneity and small sample sizes result in poorly reproducible lists of regulated genes. Additionally, the presence of multiple cellular components contributing to cancer development complicates the interpretation of bulk transcriptomic profiles. Methods: we collected 48 microarray datasets of laser capture microdissected breast tumors and performed a meta-analysis to identify robust lists of genes differentially expressed in these tumors. We created a database with carefully harmonized metadata to be used as a resource for the research community. Results: combining the results of multiple datasets improved the statistical power, and analyzing stroma and epithelium separately allowed us to identify genes with different contributions in each compartment. Conclusions: our database can profitably support biomarker discovery and is readily accessible through a user-friendly web interface (https://aurorasavino.shinyapps.io/metalcm/).
ARTICLE | doi:10.20944/preprints202102.0502.v1
Subject: Social Sciences, Language And Linguistics Keywords: Alzheimer's Disease; Onset Age; Bilingualism; Cognitive Reserve; Dementia; Mild Cognitive Impairment; ADNI database
Online: 23 February 2021 (09:20:19 CET)
Background: This paper investigates the statistical relationship between bilingualism and the Onset Age (OA) of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) across a clinical sample consisting of 580 AD subjects and 1264 MCI subjects, via a statistical analysis conducted on data retrieved from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Method: To investigate whether bilingualism has any correlation with the OAs of AD or MCI subjects, our study leverages the full potential of the ADNI dataset, which covers both the OA and the bilingualism status of the AD and MCI subjects. Prior to performing any meaningful statistical analysis, a regression model and a probabilistic model were developed in parallel to fill in the missing OA and bilingualism values. A simple least-squares regression model, with the registered age at the Mini-Mental State Examination (MMSE) as the independent variable, was used to estimate the OA of the AD and MCI subjects in the ADNI dataset. After filling in the missing OA values, the number of subjects relevant to the statistical analysis increased from 816 (AD: 371, MCI: 445) to 1844 (AD: 580, MCI: 1264), which greatly enlarged the representation of the AD and MCI sample in the ADNI population. With the increased sample size, a novel probabilistic classification model was introduced to infer an ADNI subject's bilingualism when relevant demographic information and a deterministic outcome were not readily available from the ADNI dataset. The weighted average OA for the bilinguals and the monolinguals was then computed, where the weights for the probabilistic labels were assigned based on the percentage of bilingualism in the general US population. Finally, a statistical analysis was performed to test whether any statistically significant correlation exists between the OA and the bilingualism of the AD and MCI subjects within the ADNI dataset.
Findings: Our preliminary study demonstrates no significant statistical difference between the OA of the bilinguals and the monolinguals within the ADNI dataset. Thus, the monolingual speakers within the ADNI dataset do not statistically manifest an earlier onset than the bilingual speakers, which is slightly inconsistent with some earlier statistical findings that bilingual speakers enjoy certain distinctive advantages, such as a later onset of AD, compared to their monolingual counterparts.
ARTICLE | doi:10.20944/preprints201810.0062.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: ASTER instrument; stereo; digital elevation model; global database; optical sensor; water body detection.
Online: 3 October 2018 (17:01:08 CEST)
A waterbody detection technique is an essential part of digital elevation model (DEM) generation, delineating land-water boundaries and setting flattened elevations. This paper describes the technical methodology for improving the initial tile-based waterbody data created during production of the ASTER GDEM, because without improvement such tile-based data are not suitable for incorporation into the new ASTER GDEM Version 3. Waterbodies are classified into three categories: sea, lake, and river. For sea waterbodies, the effect of sea ice, which prevents accurate delineation of sea shorelines, is removed to better delineate shorelines in high-latitude areas. For lake waterbodies, the major part of the processing is to set a unique elevation value for each lake using a mosaic image that covers the entire lake area. Rivers present a unique challenge because their elevations gradually step down from upstream to downstream. Initially, visual inspection is required to separate rivers from lakes. A stepwise elevation assignment, with a step of one meter, is then carried out by manual or automated methods, depending on the situation. The ASTER GWBD product consists of a global set of 1° latitude-by-1° longitude tiles containing waterbody attribute and elevation data files in geographic latitude and longitude coordinates, with one arc second posting. Each tile contains 3601-by-3601 data points. All improved waterbody elevation data are incorporated into the ASTER GDEM to reflect the improved results.
ARTICLE | doi:10.20944/preprints202306.1844.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: Dryopteris affinis ssp. affinis; Dryopteris oreades; fern; gametophyte; non-seed plants; proteome; STRING database.
Online: 27 June 2023 (04:44:31 CEST)
Ferns and lycophytes, now known as monilophytes, have received scant molecular attention in comparison to angiosperms. The advent of high-throughput technologies has allowed an advance towards a greater knowledge of their elusive genomes. In this work, proteomic analyses of heart-shaped gametophytes of two ferns were performed: the apomictic species Dryopteris affinis ssp. affinis and its sexual relative Dryopteris oreades. In total, a set of 218 proteins shared by these two gametophytes were analyzed using the STRING database, and the proteome associated with metabolism, genetic information processing, and responses to abiotic stress is discussed. Specifically, we report proteins involved in the metabolism of carbohydrates, lipids, and nucleotides, the biosynthesis of amino acids and secondary compounds, energy, oxido-reduction, transcription, translation, protein folding, sorting and degradation, and responses to abiotic stress. The interactome of this set of proteins represents a total network composed of 218 nodes and 1,792 interactions, obtained mostly from databases and text mining. The interactions among the identified proteins of the ferns D. affinis and D. oreades, together with the description of their biological functions, might contribute to a better understanding of the function and development of ferns as well as to filling knowledge gaps in plant evolution.
ARTICLE | doi:10.20944/preprints202304.0240.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: Dryopteris affinis ssp. affinis; Dryopteris oreades; fern; gametophyte; non-seed plants; proteome; STRING database
Online: 12 April 2023 (04:42:51 CEST)
Ferns and fern allies, now known as monilophytes, have received scant molecular attention in relation to angiosperms. The advent of high-throughput technologies allows an advance towards a greater knowledge of their elusive genomes. In this work, samples of apogamous and sexual heart-shaped gametophytes from two ferns, the apomictic species Dryopteris affinis ssp. affinis and its sexual relative Dryopteris oreades, were extracted and identified. In total, a set of 218 proteins shared by these gametophytes were analysed using the STRING database, and the proteome associated with metabolism, genetic information processing and responses to abiotic stress is discussed. Specifically, proteins are reported that are involved in the metabolism of carbohydrates and lipids, biosynthesis of amino acids, metabolism of nucleotides, energy, and secondary compounds, oxido-reduction, transcription, translation, folding, sorting, and degradation, and response to abiotic stress. Some homologs of the proteins found are MACCI-BOU (MAB1), MOSAIC DEATH 1 (MOD1), MAINTENANCE OF PHOTOSYSTEM II UNDER HIGH LIGHT 2 (MPH2), TRANSPARENT TESTA 5 (TT5), ALBINO OR GLASSY YELLOW 1 (AGY1), LEUCYL AMINOPEPTIDASE 1 and 3 (LAP1 and LAP3), and LOW EXPRESSION OF OSMOTICALLY RESPONSIVE GENES 1 (LOS1). The interactome of the set of proteins was also studied, with databases and text mining being the most common interaction sources. All these data about the interactions among the studied proteins of the ferns D. affinis and D. oreades, together with the description of their biological functions, might contribute to a better understanding of the functioning and development of ferns as well as to filling gaps in knowledge of plant evolution.
ARTICLE | doi:10.20944/preprints201810.0069.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: urban system; urban context; microzone; fuzzy rule set; Mamdani fuzzy system; spatial database; GIS
Online: 4 October 2018 (11:55:09 CEST)
We present a new unsupervised method aimed at obtaining a partition of a complex urban system into homogeneous urban areas, called urban contexts. The area of study is initially partitioned into microzones, homogeneous portions of the urban system that are the atomic reference elements for the census data. With the contribution of domain experts, we identify the physical, morphological, environmental and socio-economic indicators needed to identify synthetic characteristics of urban contexts, and we create the fuzzy rule set necessary to determine the type of urban context. We implement the set of spatial analysis processes necessary to calculate the indicators for each microzone and apply a Mamdani fuzzy rule system to classify the microzones. Finally, the partition of the area of study into urban contexts is obtained by dissolving contiguous microzones belonging to the same type of urban context. Tests are performed on the Municipality of Pozzuoli (Naples, Italy); the reliability of our model is measured by comparing the results with those obtained by detailed analysis.
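The core inference step, fuzzifying indicator values, firing AND-rules with the min operator, and defuzzifying, can be sketched as below. This is a deliberately simplified variant (singleton output peaks with a weighted-average defuzzification, closer to a zero-order Sugeno shortcut than full Mamdani aggregation), and the indicators, membership functions and rules are invented for illustration, not taken from the paper:

```python
# Two illustrative indicators (building density, green-area ratio) feed
# hypothetical rules scoring how "urban core"-like a microzone is (0..1).

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def classify(density, green_ratio):
    # Fuzzify inputs with assumed membership functions.
    dens_high  = tri(density, 0.4, 0.8, 1.2)
    dens_low   = tri(density, -0.4, 0.0, 0.6)
    green_high = tri(green_ratio, 0.3, 0.7, 1.1)
    # Rules: AND = min; each rule activates an output singleton (its peak value).
    rules = [
        (min(dens_high, 1 - green_high), 0.9),  # dense, little green -> urban core
        (min(dens_low, green_high), 0.1),       # sparse, green -> peripheral
    ]
    # Weighted-average defuzzification over the rule activations.
    total = sum(w for w, _ in rules)
    return sum(w * peak for w, peak in rules) / total if total else 0.5

print(round(classify(0.9, 0.2), 2))  # dense microzone with little green space
```

In the paper's setting, the crisp output would then be thresholded into a discrete urban-context type per microzone before the dissolve step.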
ARTICLE | doi:10.20944/preprints201608.0073.v2
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: land surface temperature; thermal infrared; calibration; generalized split-window; mono-window; database; radiative transfer
Online: 16 September 2016 (13:12:09 CEST)
Land Surface Temperature (LST) is routinely retrieved from remote sensing instruments using semi-empirical relationships between top of atmosphere (TOA) radiances and LST, using ancillary data such as total column water vapor or emissivity. These algorithms are calibrated using a set of forward radiative transfer simulations that return the TOA radiances given the LST and the thermodynamic profiles. The simulations are designed to cover a wide range of surface and atmospheric conditions and viewing geometries. This work analyses calibration strategies, considering some of the most critical factors to take into account when building a calibration dataset that covers the full dynamic range of relevant variables. A sensitivity analysis of split-window and single-channel algorithms revealed that selecting a set of atmospheric profiles that spans the full range of physically possible surface temperature and total column water vapor combinations seems beneficial for the quality of the regression model. However, the calibration is extremely sensitive to the low-level structure of the atmosphere, indicating that the presence of atmospheric boundary layer features such as temperature inversions or strong vertical gradients of thermodynamic properties may affect LST retrievals in a non-trivial way. This article describes the criteria established in the EUMETSAT Land Surface Analysis Satellite Application Facility to calibrate its LST algorithms, applied both to current and forthcoming sensors.
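The calibration step amounts to regressing LST on channel brightness temperatures over a simulation database. A minimal sketch of that idea, using a simplified split-window form LST ≈ a0 + a1·T11 + a2·(T11 − T12) and synthetic stand-ins for the radiative-transfer simulations (the coefficients and noise model below are invented, not the LSA SAF ones):

```python
import numpy as np

# Synthetic "forward simulations": brightness temperatures depressed by water vapour.
rng = np.random.default_rng(0)
lst  = rng.uniform(270.0, 320.0, 500)             # surface temperatures (K)
tcwv = rng.uniform(0.5, 6.0, 500)                 # total column water vapour (cm)
t11 = lst - 0.8 * tcwv + rng.normal(0, 0.2, 500)  # 11 µm channel
t12 = lst - 1.5 * tcwv + rng.normal(0, 0.2, 500)  # 12 µm channel (more absorption)

# Least-squares fit of LST = a0 + a1*T11 + a2*(T11 - T12).
X = np.column_stack([np.ones_like(t11), t11, t11 - t12])
coef, *_ = np.linalg.lstsq(X, lst, rcond=None)
pred = X @ coef
rmse = float(np.sqrt(np.mean((pred - lst) ** 2)))
print("coefficients:", np.round(coef, 3), "RMSE (K):", round(rmse, 3))
```

The paper's point about dataset design shows up directly here: if the simulated (LST, TCWV) pairs do not span the physically possible range, the fitted coefficients extrapolate poorly.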
ARTICLE | doi:10.20944/preprints202304.1080.v1
Subject: Engineering, Safety, Risk, Reliability And Quality Keywords: chemistry laboratory accident; risk factor analysis; laboratory accident system; manual accident database; quantitative prediction risk
Online: 27 April 2023 (13:08:49 CEST)
With the growth of university chemistry experiment projects, research personnel and specialized equipment, laboratory accidents are increasing yearly. Accident data lack a safety platform for storing related information, and efficient conditional sharing cannot be guaranteed. To solve these problems, we designed a laboratory accident system to store and share related data and to predict risk levels. In this paper, we manually collected chemistry laboratory accidents with Python scripts and class assignments, then analyzed risk factor variables using SPSSPRO, and finally established a prediction model using Stata. We intend to register laboratory-related data in the proposed chemistry accident system based on a data-ownership safety architecture. The chemistry accident system can break data barriers by using confirmation and authorization key technology to trace non-tampered data sources in real time when an emergency accident happens. Meanwhile, our proposed system can use our accident risk model to predict the risk level of any experiment project. It can also recommend appropriate safety education models.
ARTICLE | doi:10.20944/preprints202201.0227.v1
Subject: Social Sciences, Behavior Sciences Keywords: Bayesian inference; race and ethnicity imputation; All Payer Claims Database; vital statistics death records; validation
Online: 17 January 2022 (12:40:15 CET)
Background: All Payer Claims Databases (APCDs) are a rich source of health information; however, race and ethnicity (R&E) data are largely missing. Bayesian Improved Surname Geocoding (BISG) is a common R&E imputation method, yet validation of BISG in APCDs is lacking. We used BISG to impute missing R&E in the Oregon APCD. Methods: BISG-imputed R&E for Asian Pacific Islanders (API), Blacks, Hispanics and Whites were contrasted with the gold standard (vital statistics), and improvements in sensitivity and specificity were assessed. Logistic regression examined whether missing R&E was random across patient characteristics. Results: Among 85,857 individuals in the study, 32.1% (n=27,594) had missing R&E. Missing R&E was not randomly distributed: there were higher odds of missingness among males, Whites, those aged 65 and older, and commercially insured individuals. Differences in the percent missing were also found by co-morbid conditions and mortality causes. Imputing the missing R&E with the BISG method improved the sensitivity of identifying White, Black, API, and Hispanic individuals. Conclusions: APCDs can benefit from enhancing missing R&E with BISG imputation to perform more robust population-health analyses and identify inequities according to R&E without losing power or dropping non-random records with missing R&E data.
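The core of BISG is a Bayes update: a surname-based prior P(race | surname) is combined with the racial composition of the person's geography. A toy sketch of that update follows; the probability tables are illustrative stand-ins, not Census values, and the field names are assumptions:

```python
# Illustrative P(race | surname) prior, as a surname-keyed table.
P_RACE_GIVEN_SURNAME = {
    "garcia": {"White": 0.05, "Black": 0.01, "API": 0.01, "Hispanic": 0.93},
}
# Illustrative P(geography | race): share of each group living in this block group.
P_GEO_GIVEN_RACE = {
    "blockgroup_A": {"White": 0.004, "Black": 0.002, "API": 0.001, "Hispanic": 0.010},
}

def bisg(surname, geography):
    """Posterior P(race | surname, geography) ∝ P(race | surname) * P(geo | race)."""
    prior = P_RACE_GIVEN_SURNAME[surname.lower()]
    geo = P_GEO_GIVEN_RACE[geography]
    unnorm = {r: prior[r] * geo[r] for r in prior}
    z = sum(unnorm.values())
    return {r: v / z for r, v in unnorm.items()}

post = bisg("Garcia", "blockgroup_A")
print(max(post, key=post.get))  # → Hispanic
```

Validation against a gold standard, as in this study, then checks how often the highest-posterior category matches the recorded R&E.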
ARTICLE | doi:10.20944/preprints201804.0088.v1
Subject: Arts And Humanities, History Keywords: historical dataset; geocoding; localisation; geohistorical objects; database; GIS; collaborative; citizen science; crowd-sourced; digital humanities
Online: 8 April 2018 (09:13:10 CEST)
The latest developments in digital humanities have increasingly enabled the construction of large data sets which can easily be accessed and used. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming the indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with temporal information and are usually based on a strict hierarchy (country, city, street, house number, etc.) that is hard, if not impossible, to use with historical data. Indeed, historical data are full of uncertainties (temporal, textual, positional accuracy, confidence in historical sources) that cannot be ignored or entirely resolved. We propose an open source, open data, extensible solution for geocoding that is based on gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address searched by the user. The matching criteria are customisable and include several dimensions (fuzzy string, fuzzy temporal, level of detail, positional accuracy). As the goal is to facilitate historical work, we also propose web-based user interfaces that help users geocode (single address or batch mode) and display the results over current or historical topographical maps, so that they can be checked and collaboratively edited. The system has been tested on the city of Paris, France, for the 19th and 20th centuries. It shows high response rates and is fast enough to be used interactively.
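The multi-dimensional matching described above, combining fuzzy string similarity with fuzzy temporal agreement, can be sketched as a weighted score. The weights, decay constant, and example addresses below are invented for illustration; the paper's actual matching criteria are customisable and richer (level of detail, positional accuracy):

```python
from difflib import SequenceMatcher

def match_score(query_name, query_year, obj_name, valid_from, valid_to):
    """Score a gazetteer object against a queried historical address."""
    string_sim = SequenceMatcher(None, query_name.lower(), obj_name.lower()).ratio()
    if valid_from <= query_year <= valid_to:
        temporal_sim = 1.0
    else:  # decay with distance (in years) from the object's validity interval
        gap = min(abs(query_year - valid_from), abs(query_year - valid_to))
        temporal_sim = max(0.0, 1.0 - gap / 50.0)
    return 0.7 * string_sim + 0.3 * temporal_sim  # illustrative weights

# Two candidate geohistorical objects for the same queried address (toy data).
print(round(match_score("Rue de Rivoli", 1860, "Rue de Rivoly", 1850, 1900), 3))
print(round(match_score("Rue de Rivoli", 1860, "Rue Rivoli", 1900, 1950), 3))
```

The first candidate wins: a minor spelling variant valid at the queried date outranks a closer spelling attested only decades later, which is exactly the behaviour a historical geocoder needs.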
ARTICLE | doi:10.20944/preprints201612.0075.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: image recognition based location; indoor positioning; RGB-D images; LiDAR; database; mobile computing; image retrieval
Online: 15 December 2016 (07:17:35 CET)
This paper describes the first results of an Image Recognition Based Location (IRBL) system for mobile applications, focusing on the procedure to generate a database of range images (RGB-D). In an indoor environment, prior spatial knowledge of the surroundings is needed to estimate the camera position and orientation. To achieve this objective, a complete 3D survey of two different environments (the Bangbae metro station in Seoul and the E.T.R.I. building in Daejeon, Republic of Korea) was performed using a LiDAR (Light Detection And Ranging) instrument, and the obtained scans were processed to obtain a spatial model of each environment. From this, two databases of reference images were generated using specific software developed by the Geomatics group of Politecnico di Torino (ScanToRGBDImage). This tool synthetically generates different RGB-D images centered at each scan position in the environment. Later, the external parameters (X, Y, Z, ω, φ, κ) and the range information extracted from the retrieved database images are used as reference information for pose estimation of a set of pictures acquired by mobile devices in the IRBL procedure. In this paper, the survey operations, the approach for generating the RGB-D images and the IRBL strategy are reported. Finally, the analysis of the results and the validation test are described.
ARTICLE | doi:10.20944/preprints202310.0792.v1
Subject: Arts And Humanities, Archaeology Keywords: Sea level change; Paleo-coastline; Prehistory; Southwestern Iberian Peninsula; Marine resources exploitation; Spatial Database; Geostatistical analysis
Online: 12 October 2023 (14:23:10 CEST)
This paper presents an approach for analyzing the impact of sea level changes on prehistoric human settlement patterns in the Southwestern Iberian Peninsula. The approach is based on highly qualified and fully georeferenced information sources managed within a spatial database. This allows for a more precise analysis of the distance from a specific location to the coast and its relation to marine resources, of areas that may have lost their archaeological potential because they are currently submerged, and of the actual distribution of sites as a starting point for territorial analysis. Coastal changes, such as sea level fluctuations over the past 120,000 years, have affected the position of the coastline and influenced human settlement patterns. Through an analysis of archaeological site locations relative to their paleo-coastlines based on available dating data, this study emphasizes the necessity of adopting a comprehensive approach to understanding human settlement patterns and their correlation with dynamic coastal changes. This approach provides valuable insights into strategies for exploiting coastal resources and structuring socio-economic systems in the region.
ARTICLE | doi:10.20944/preprints202305.0269.v1
Subject: Medicine And Pharmacology, Pharmacology And Toxicology Keywords: forensic toxicology; amphetamine-related fatalities; brain; omega-3 fatty acids; docosahexaenoic acid; Comparative Toxicogenomic Database (CTD)
Online: 4 May 2023 (13:28:44 CEST)
Amphetamine is a psychostimulant drug with a high risk of toxicity and death when misused. Abuse of amphetamines is associated with an altered organic profile, which includes omega fatty acids. Low omega fatty acid levels are linked to mental disorders. Using the Comparative Toxicogenomic Database (CTD), we investigated the chemical profile of the brain in amphetamine-related fatalities and the possibility of neurotoxicity. We classified amphetamine cases as low (0-0.5 µg/ml), medium (>0.5 to 1.5 µg/ml), and high (>1.5 µg/ml) based on amphetamine levels in brain samples. All three groups shared 1-octadecene, 1-tridecene, 2,4-di-tert-butylphenol, arachidonic acid, docosahexaenoic acid (DHA), eicosane, and oleylamide. We identified chemical-disease associations using the CTD tools and predicted an association between docosahexaenoic and arachidonic acids and curated conditions such as autistic disorder, cocaine-related disorders, Alzheimer's disease, and cognitive dysfunction. An amphetamine challenge may cause neurotoxicity in the human brain due to a decrease in omega-3 fatty acids and an increase in oxidative products. Therefore, in cases of amphetamine toxicity, supplemental therapy may be needed to prevent omega-3 fatty acid deficiency.
ARTICLE | doi:10.20944/preprints202209.0310.v1
Subject: Medicine And Pharmacology, Pharmacology And Toxicology Keywords: COVID-19; CoviRx.org; database; drugs; pandemic; repurposing; SARS-CoV-2; therapies; treatments; Variants of Concern (VOC)
Online: 20 September 2022 (15:00:48 CEST)
SARS-CoV-2 is the cause of the COVID-19 pandemic, which has claimed more than six million lives worldwide, devastating the economy and overwhelming healthcare systems globally. The development of new drug molecules and vaccines has played a critical role in managing the pandemic; however, new variants of concern still pose a significant threat, as the current vaccines cannot prevent all infections. This situation calls for the collaboration of biomedical scientists and healthcare workers across the world. Repurposing approved drugs is an effective way of fast-tracking new treatments for recently emerged diseases. To this end, we have assembled and curated a database consisting of 7817 compounds from the Compounds Australia Open Drug collection. We developed a set of eight filters based on indicators of efficacy and safety that were applied sequentially to down-select drugs showing promise for repurposing against SARS-CoV-2. Considerable effort was made to evaluate approximately 14,000 assay data points for FDA/TGA-approved drugs against SARS-CoV-2 and provide an average activity score for 3539 compounds. The filtering process identified 12 FDA-approved molecules with established safety profiles that have a plausible mechanism for treating COVID-19. The methodology developed in our study provides a template for prioritising repurposable drug candidates that are safe, efficacious, and cost-effective for the treatment of COVID-19, long COVID, or any future disease. We present our database in an easy-to-use interactive interface (CoviRx, https://www.covirx.org/), developed to enable the scientific community to access the data on over 7000 potential drugs and to implement alternative prioritisation and down-selection strategies.
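Sequential down-selection of this kind is a pipeline of predicates applied in order, with the candidate pool shrinking at each stage. The sketch below illustrates the pattern with four hypothetical filters and invented drug records; the paper's actual eight filters and thresholds are not listed in the abstract and are not reproduced here:

```python
# Invented candidate records; field names and values are illustrative only.
drugs = [
    {"name": "drug_a", "approved": True,  "ic50_uM": 0.8,  "cc50_ic50_ratio": 30, "cmax_covers_ic50": True},
    {"name": "drug_b", "approved": True,  "ic50_uM": 12.0, "cc50_ic50_ratio": 50, "cmax_covers_ic50": True},
    {"name": "drug_c", "approved": False, "ic50_uM": 0.2,  "cc50_ic50_ratio": 90, "cmax_covers_ic50": True},
    {"name": "drug_d", "approved": True,  "ic50_uM": 1.5,  "cc50_ic50_ratio": 5,  "cmax_covers_ic50": False},
]

FILTERS = [  # applied in order; a drug must pass every stage to survive
    ("regulatory approval", lambda d: d["approved"]),
    ("in vitro potency",    lambda d: d["ic50_uM"] <= 10.0),
    ("selectivity index",   lambda d: d["cc50_ic50_ratio"] >= 10),
    ("achievable exposure", lambda d: d["cmax_covers_ic50"]),
]

def down_select(candidates):
    remaining = list(candidates)
    for label, keep in FILTERS:
        remaining = [d for d in remaining if keep(d)]
        print(f"after {label}: {len(remaining)} remaining")
    return [d["name"] for d in remaining]

print(down_select(drugs))  # → ['drug_a']
```

Because the filters are ordered, cheap knock-out criteria (approval status) run before expensive ones, mirroring how 7817 compounds can be reduced to a short list of 12.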
ARTICLE | doi:10.20944/preprints202104.0256.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: Multi-database Mining; Graph Clustering; Coordinate Descent; Convex Optimization; Similarity Measure; Binary Entropy Loss; Fuzziness Index
Online: 9 April 2021 (10:20:06 CEST)
Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2-n)/2$ pairwise similarities between $n$ multiple databases to generate and evaluate $m\in[1, (n^2-n)/2]$ candidate clusterings in order to select the ideal partitioning that optimizes a predefined goodness measure. However, when these pairwise similarities are distributed around the mean value, the clustering algorithm becomes indecisive about which database pairs are eligible to be grouped together. Consequently, a trivial result is produced, either by putting all $n$ databases in one cluster or by returning $n$ singleton clusters. To tackle the latter problem, we propose a learning algorithm that reduces the fuzziness in the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model improves the certainty of the clustering algorithm by correctly identifying the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm to assess the candidate clusterings generated on the fly in fewer, upper-bounded iterations. Through a series of experiments on multiple database samples, we show that our algorithm outperforms existing clustering algorithms for MDM.
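A minimal sketch of the fuzziness-reduction idea: treat the logits of the similarity entries as learnable parameters, weight the ambiguous entries (those near 0.5) most heavily, and run gradient descent on a weighted binary cross-entropy against hard targets obtained by thresholding at the mean. This is a simplified illustration of the kind of loss being minimized, not the paper's exact model:

```python
import numpy as np

def sharpen_similarities(S, lr=0.5, steps=200):
    """Reduce fuzziness in a pairwise similarity matrix by minimising a
    weighted binary cross-entropy loss via gradient descent.

    Targets are hard assignments obtained by thresholding at the mean;
    weights emphasise entries closest to 0.5 (the most ambiguous ones).
    A simplified sketch of the idea, not the paper's actual algorithm."""
    S = np.clip(S, 1e-6, 1 - 1e-6)
    t = (S > S.mean()).astype(float)      # hard targets from mean threshold
    w = 1.0 - 2.0 * np.abs(S - 0.5)       # weight peaks at the fuzziest entries
    z = np.log(S / (1 - S))               # initialise logits from similarities
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-z))      # sigmoid
        grad = w * (p - t)                # gradient of weighted BCE w.r.t. logits
        z -= lr * grad
    return 1.0 / (1.0 + np.exp(-z))

S = np.array([[0.9, 0.4],
              [0.6, 0.1]])
print(sharpen_similarities(S).round(2))   # entries pushed toward 0 or 1
```

After training, entries above the mean drift toward 1 and entries below it toward 0, so a downstream clustering algorithm no longer faces a mass of borderline similarities.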
ARTICLE | doi:10.20944/preprints201910.0032.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: computerized revenue collection; machine learning; cyber security; software defined networks; object-oriented programming; online database management
Online: 3 October 2019 (01:45:11 CEST)
The need for an accurate and flexible system of revenue collection from internal sources has become a matter of extreme urgency and importance in e-governance. This need underscores the eagerness on the part of governments to adopt new principles and policies of revenue collection, or to become aggressive and innovative in how revenue is collected from existing sources using the present system. The boards of some governments in Africa, even at present, face many setbacks in performing their tasks due to the manual system of collecting revenue from the public. This can be improved through effective revenue collection using an accurate and flexible system. Tax is usually collected in the form of specific sales tax, general sales tax, corporate income tax, individual income tax, property tax, and inheritance tax. Problems such as high collection costs, fraud, underpayment, revenue leakage, poor access to information, and poor tracking of defaulters are on the increase. As a result, there is a need to computerize the revenue collection system. Computerized systems have proven to introduce massive efficiencies and quick collection of revenue from the public. This research work demonstrates how to design and implement an automated system of revenue collection and how to maintain a secure database of collected tax information. It also studies how machine learning algorithms and software-defined networks improve the security of such automated systems.
DATA DESCRIPTOR | doi:10.20944/preprints202308.1701.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: disease X; big data; data science; data analysis; dataset development; database; google trends; data mining; healthcare; epidemiology
Online: 24 August 2023 (05:48:54 CEST)
The World Health Organization (WHO) added Disease X to its shortlist of blueprint priority diseases to represent a hypothetical, unknown pathogen that could cause a future epidemic. During past virus outbreaks, such as COVID-19, Influenza, Lyme Disease, and Zika virus, researchers from various disciplines utilized Google Trends to mine multimodal components of web behavior to study, investigate, and analyze the global awareness, preparedness, and response associated with these outbreaks. As the world prepares for Disease X, a dataset on web behavior related to Disease X would be crucial to the timely advancement of research in this field. Furthermore, none of the prior works in this field has focused on developing a dataset that compiles relevant web behavior data to help prepare for Disease X. To address these research challenges, this work presents a dataset of web behavior related to Disease X that emerged from different geographic regions of the world between February 2018 and August 2023. Specifically, this dataset presents the search interests related to Disease X from 94 geographic regions. These regions were chosen for data mining because they recorded significant search interest related to Disease X during this timeframe. The dataset was developed by collecting data using Google Trends. The relevant search interests for all these regions for each month in this time range are available in the dataset. This paper also discusses the compliance of this dataset with the FAIR principles of scientific data management. Finally, a brief analysis of specific features of this dataset is presented to uphold its applicability, relevance, and usefulness for the investigation of different research questions in the interrelated fields of Big Data, Data Mining, Healthcare, Epidemiology, and Data Analysis.
ARTICLE | doi:10.20944/preprints202304.0330.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Ecosystem Services; Value Transfer Method; Replacement Cost Method; Unit Value Database; Land Cover Resolution; Grand River Watershed
Online: 14 April 2023 (02:29:36 CEST)
Economic valuations of ecosystem services often transfer previously estimated global unit values to the geographical setting of interest. While this approach produces quick results, its reliability depends on how representative the large-scale average unit values are for the given local context. Here, we estimate the values of three ecosystem services (ES) – water filtration, nutrient cycling and carbon sequestration – in the Grand River watershed (GRW) of southern Ontario, Canada. The watershed covers nearly 7000 km2, has a humid continental climate and a population of close to one million people. Land cover is dominated by agriculture. We compare ES valuations using locally derived (i.e., GRW-specific) unit values to valuations based on unit values from a regional database and those compiled in the global Ecosystem Services Valuation Database (ESVD). The regional database includes mean unit values from three case studies within southern Ontario and one boreal watershed in British Columbia. As expected, the regional database yields average monetary values for the three ES that are close to those obtained with the local unit values but with larger associated uncertainties. Using the ESVD, however, results in significantly higher monetary values for the ES. For water filtration, the ESVD value is more than five times higher than the regional and local estimates. We further illustrate the effect of the extent of aggregation of forested and agricultural land categories on the ES values. For example, by subdividing the forest category into three sub-categories (deciduous, coniferous, and mixed forest), the estimated value of the carbon sequestration ES of forested areas within the GRW increases by 15%. Overall, our results emphasize the importance of critically assessing the origin of unit values and the land cover resolution in ES valuation, especially when the latter is used as a policy-guiding tool.
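The value transfer method at the core of this kind of valuation multiplies a per-hectare unit value by the area of each land-cover class and sums the products, which is also why subdividing a land-cover category (e.g., splitting "forest" into deciduous, coniferous, and mixed) can shift the total. A minimal sketch with hypothetical unit values and areas, not the study's actual estimates:

```python
# Hypothetical unit values (CAD per hectare per year) and land-cover areas (ha);
# the numbers are illustrative, not the study's GRW estimates.
unit_values = {"forest": 120.0, "agriculture": 45.0, "wetland": 300.0}
areas_ha = {"forest": 150_000, "agriculture": 420_000, "wetland": 30_000}

def transfer_value(unit_values, areas_ha):
    """Value transfer: total ES value = sum over land covers of unit value x area."""
    return sum(unit_values[lc] * areas_ha[lc] for lc in areas_ha)

total = transfer_value(unit_values, areas_ha)
print(f"{total:,.0f} CAD/yr")  # → 45,900,000 CAD/yr
```

Because the result is a simple weighted sum, any bias in the transferred unit values (e.g., a global average that is five times the local estimate) propagates linearly into the total, which is the sensitivity the study highlights.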
REVIEW | doi:10.20944/preprints202103.0402.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: anesthesia; anesthesiology; big data; registries; database research; acute pain; pain management; postoperative pain; regional anesthesia; regional analgesia.
Online: 15 March 2021 (17:45:39 CET)
The digital transformation of healthcare is advancing, leading to an increasing availability of clinical data for research. Perioperative big data initiatives were established to monitor treatment quality and benchmark outcomes. However, big data analyses have long exceeded the status of pure quality-surveillance instruments. Large retrospective studies nowadays often represent the first approach to new questions in clinical research and pave the way for more expensive and resource-intensive prospective trials. As a consequence, the utilization of big data in acute pain and regional anesthesia research has increased considerably over the last decade. Multicentric clinical registries and administrative databases (e.g., healthcare claims databases) have collected millions of cases to date, on the basis of which several important research questions have been approached. In acute pain research, big data has been used to assess postoperative pain outcomes, opioid utilization, and the efficiency of multimodal pain management strategies. In regional anesthesia, adverse events and the potential benefits of regional anesthesia on postoperative morbidity and mortality have been evaluated. This article provides a narrative review of the growing importance of big data for research in acute postoperative pain and regional anesthesia.
TECHNICAL NOTE | doi:10.20944/preprints202211.0220.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: Relational Database; Columnar Storage; Bloom Filter; Skip List; Field Level Lock; Read Write Concurrency; OLTP; OLAP; LSM-Tree; Token Bucket Algorithm
Online: 14 November 2022 (03:02:09 CET)
At present, diversified and highly concurrent businesses in the Internet industry often require heterogeneous systems composed of multiple databases. This report introduces SG-ColBase, a database kernel we developed. The kernel provides read-write concurrency control, data rollback, atomic log writing, and redo of data after downtime to ensure complete transaction support. The parallelism of kernel execution is extended through field-level locks and snapshot reads. A Bloom filter, resource cache pool, memory pool, skip list, non-blocking log cache, and asynchronous data-writing mechanism improve the overall execution efficiency of the system. For data storage, columnar storage, logical keys, and an LSM-tree are introduced: while improving the data compression ratio and reducing data gaps, all disk operations are written in incremental order, and asynchronous batch operation greatly improves the data writing speed. Thanks to the vertical contiguity of data brought by columnar storage, disk scanning during vertical traversal is reduced, a qualitative leap in efficiency over traditional relational databases in big-data analysis scenarios. SG-ColBase can reduce the use of heterogeneous databases in business applications and improve R&D efficiency.
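Of the structures listed above, the Bloom filter is the simplest to illustrate: it answers membership queries with no false negatives, letting a storage engine skip disk reads for keys that are definitely absent. A generic sketch of the data structure, not SG-ColBase's implementation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic set membership with no false
    negatives (a False answer means the key was definitely never added).
    A generic sketch of the structure, not SG-ColBase code."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        # derive k independent bit positions from salted SHA-256 digests
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False => definitely absent; True => possibly present
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("row:42")
print(bf.might_contain("row:42"))   # → True
print(bf.might_contain("row:999"))  # almost certainly False
```

In an LSM-tree engine, one such filter per on-disk segment lets a point lookup skip every segment whose filter reports the key absent, trading a small, tunable false-positive rate for far fewer disk reads.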
ARTICLE | doi:10.20944/preprints202112.0163.v1
Subject: Computer Science And Mathematics, Software Keywords: ethnobotany; paleoethnobotany; biocultural heritage; digital heritage; online database; Indigenous data sovereignty; Open Access; research accessibility; digital reference collection
Online: 9 December 2021 (20:01:36 CET)
Biocultural heritage preservation relies on ethnobotanical knowledge and the paleoethnobotanical data used in (re)constructing histories of human-biota interactions. Biocultural heritage, defined as the knowledge and practices of Indigenous and Local peoples and their biological relatives, is often guarded information, meant for specific audiences and withheld from other social circles. As such, these forms of heritage and knowledge must also be included in the ongoing data sovereignty discussions and movement. In this paper we share the process and design decisions behind creating an online database for ethnobotanical knowledge and associated paleoethnobotanical data, using a content management system designed to foreground Indigenous and local perspectives. Our main purpose is to suggest the Mukurtu content management system, originally designed for physical items of cultural importance, be considered as a potential tool for digitizing and ethically circulating biocultural heritage, including paleoethnobotanical resources. With this database, we aim to create access to biocultural heritage and paleoethnobotanical considerations for a variety of audiences while also respecting the protected and sensitive natures of Indigenous and local knowledges.
ARTICLE | doi:10.20944/preprints202103.0295.v1
Subject: Chemistry And Materials Science, Biomaterials Keywords: additive manufacturing; rapid solidification; microstructural evolution; non-equilibrium; quasi-equilibrium; multi-phase field method; CALPHAD database; nickel alloy
Online: 11 March 2021 (07:40:42 CET)
Solidification microstructure is formed under high cooling rates and temperature gradients in powder-based additive manufacturing. In this study, a non-equilibrium multi-phase field method (MPFM), based on the finite interface dissipation model proposed by Steinbach et al. and coupled with a CALPHAD database, was developed for a multicomponent Ni alloy. A quasi-equilibrium MPFM was also developed for comparison. Two-dimensional equiaxed microstructural evolution for the Ni (Bal.)–Al–Co–Cr–Mo–Ta–Ti–W–C alloy was simulated at various cooling rates. The temperature–γ fraction profiles obtained at 10^5 K/s using the non- and quasi-equilibrium MPFMs were in good agreement with each other. Above 10^6 K/s, the differences between the non- and quasi-equilibrium methods grew as the cooling rate increased, and non-equilibrium solidification was strengthened above a cooling rate of 10^6 K/s. Columnar-solidification microstructural evolution was simulated at cooling rates from 5×10^5 K/s to 1×10^7 K/s for various temperature gradient values at a constant interface velocity (0.1 m/s). The results showed that as the cooling rate increased, the cell spacing decreased in both methods, and the non-equilibrium MPFM agreed well with experimental measurements. Our results show that the non-equilibrium MPFM can simulate solidification microstructure in powder bed fusion additive manufacturing.
ARTICLE | doi:10.20944/preprints202001.0166.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: spatiotemporal database; spatial analysis; seasonal precipitation; spearman correlation coefficient; pacific decadal oscillation; southern oscillation index; north atlantic oscillation
Online: 16 January 2020 (10:59:53 CET)
Temporal changes in precipitation may lead to sustained and severe drought or massive floods in different parts of the world. Knowledge of precipitation variation can effectively help water-resources decision-makers in water resources management. Large-scale circulation drivers have a considerable impact on precipitation in different parts of the world. In this research, the impact of the El Niño–Southern Oscillation (ENSO), the Pacific Decadal Oscillation (PDO), and the North Atlantic Oscillation (NAO) on seasonal precipitation over Iran was investigated. For this purpose, 103 synoptic stations with at least 30 years of data were utilized. The Spearman correlation coefficient between the indices in the previous 12 months and seasonal precipitation was calculated, and the meaningful correlations were extracted. Then the month in which each of these indices has the highest correlation with seasonal precipitation was determined. Finally, the overall increase or decrease in seasonal precipitation due to each of these indices was calculated. Results indicate that the Southern Oscillation Index (SOI), NAO, and PDO have the most impact on seasonal precipitation, in that order. These indices have the highest impact on precipitation in winter, autumn, spring, and summer, respectively. SOI impacts winter precipitation differently from the PDO and NAO, while in the other seasons, each index has its own distinct impact on seasonal precipitation. Generally, all indices in different phases may decrease seasonal precipitation by up to 100%; however, seasonal precipitation may also increase by more than 100% in different seasons due to the impact of these indices. The results of this study can be used effectively in water resources management and especially in dam operation.
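The Spearman coefficient used in this analysis is simply the Pearson correlation computed on ranks, which makes it robust to outliers and sensitive only to monotone association. A self-contained sketch, with made-up index and precipitation values for illustration:

```python
def rank(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over a run of ties
        avg = (i + j) / 2 + 1           # mean rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical monthly SOI values and seasonal precipitation totals (mm)
soi = [1.2, -0.5, 0.8, -1.1, 0.3, -0.9]
precip = [210, 150, 190, 120, 180, 140]
print(round(spearman(soi, precip), 3))  # → 1.0 (perfectly monotone toy data)
```

In the study's setting, one such coefficient is computed per station, season, and lag month, and only the statistically meaningful correlations are retained.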
ARTICLE | doi:10.20944/preprints202306.0124.v1
Subject: Engineering, Civil Engineering Keywords: confined masonry; seismic behaviour; experimental database; in-plane shear behaviour; in-plane flexural behaviour; out-of-plane seismic effects
Online: 2 June 2023 (04:10:53 CEST)
Confined masonry (CM) is a construction system consisting of masonry wall panels enclosed by vertical and horizontal reinforced concrete confining elements. The presence of these confining elements distinguishes CM from the unreinforced masonry system and makes this technology suitable for constructing structures in regions subjected to intense seismic or wind actions. CM construction has been used in many countries and regions and has performed well in past earthquakes. The purpose of this paper is to review past research studies related to the seismic in-plane and out-of-plane behaviour of CM structures. The authors have identified the key design and construction parameters considered in past research studies and have performed statistical analyses to establish their influence on the seismic performance of CM buildings. For the purpose of this study, the authors have compiled databases of previous experimental studies on CM wall specimens, which were used for the statistical analyses. Finally, the paper discusses research gaps and needs for future studies that would contribute to the understanding of the seismic behaviour and failure mechanisms of CM walls.
DATA DESCRIPTOR | doi:10.20944/preprints202209.0323.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: COVID-19; Open-source dataset; Drug Repurposing; Database system; Web application development; software development; Drug fingerprints; Bulk upload
Online: 21 September 2022 (10:14:11 CEST)
Although various vaccines are now commercially available, they have not been able to stop the spread of COVID-19 infection completely. An excellent strategy to quickly obtain safe, effective, and affordable COVID-19 treatment is to repurpose drugs already approved for other diseases as adjuvants alongside the ongoing vaccine regime. Developing an accurate and standardized drug repurposing dataset requires a considerable level of resources and expertise, owing to the commercial availability of an extensive array of drugs that could potentially be used to address SARS-CoV-2 infection. To address this bottleneck, we created the CoviRx platform. CoviRx is a user-friendly interface that provides access to manually curated COVID-19 drug repurposing data. Through CoviRx, the curated data have been made open-source to help advance drug repurposing research. CoviRx also encourages users to submit their findings after thoroughly validating the data, which are then merged under uniformity- and integrity-preserving constraints. This article discusses the various features of CoviRx and its design principles. CoviRx has been designed so that its functionality is independent of the data it displays. Thus, in the future, this platform can be extended to include any other disease X beyond COVID-19. CoviRx can be accessed at www.covirx.org.
ARTICLE | doi:10.20944/preprints202106.0110.v1
Subject: Chemistry And Materials Science, Biomaterials Keywords: Thermodynamic modeling; CALPHAD; molten salt; molten salt reactor; thermodynamic database; modified quasichemical model; fluoride salt; chloride salt; salt system
Online: 3 June 2021 (11:50:13 CEST)
Molten salt reactors (MSRs) utilize salts as the coolant, or as the fuel and coolant together, with fissile isotopes dissolved in the salt. It is therefore necessary to understand the behavior of the salts to effectively design, operate, and regulate such reactors, and thus there is a need for thermodynamic models of the salt systems. Molten salts, however, are difficult to represent because they exhibit short-range order that depends on both composition and temperature. A widely useful approach is the modified quasichemical model in the quadruplet approximation, which accounts for first- and second-nearest-neighbor coordination and interactions. Its use in the CALPHAD approach to system modeling requires fitting parameters to standard thermodynamic data such as phase equilibria, heat capacity, and others. A shortcoming of the model is its inability to directly vary coordination numbers with composition or temperature. Another issue is the difficulty of fitting model parameters using regression methods without already having very good initial values. This paper discusses these issues and notes some practical methods for the effective generation of useful models.
ARTICLE | doi:10.20944/preprints202012.0235.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: database; disaster prevention; disaster risk reduction (DRR); climate change adaptation (CCA); stakeholders; nature-based solutions (NBS); mountain; hydro-meteorological risks
Online: 9 December 2020 (16:48:34 CET)
In the context of global changes, Nature-Based Solutions (NBSs) increasingly draw attention as a possible way to reduce the disaster risk associated with extreme hydro-meteorological events while providing human well-being and biodiversity benefits at the same time. The PHUSICOS platform is dedicated to gathering and analysing relevant NBSs used to reduce disaster risk associated with extreme hydro-meteorological events in mountainous and hilly lands. To design the platform, an in-depth review of 11 existing platforms was performed. The platform currently references 152 NBS cases from the literature and is continuously enriched with demonstrator sites through the contributions of the NBS community. The platform also proposes a qualitative assessment of the collected NBSs according to 15 criteria related to five domains: disaster risk reduction, technical and economic feasibility, environment, society, and local economy. This paper presents the structure of the platform and a first analysis of its content.
DATA DESCRIPTOR | doi:10.20944/preprints202208.0112.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: ground truth data; drone; mobile application; windshield survey; sample design; crop mapping; agriculture statistics; data dissemination; earth observation data; spatial database.
Online: 4 August 2022 (16:18:26 CEST)
Over the last few years, Earth Observation (EO) data has shifted towards increased use in producing official statistics, particularly in the agriculture sector. National statistics offices worldwide, including in Asia and the Pacific, are expanding their use of EO data to produce agricultural statistics such as crop classification, yield estimation, irrigation mapping, and crop loss estimation. Advances in image classification, such as pixel-based and phenology-based classification, and in machine learning create new opportunities for researchers to analyze EO data applied to agricultural statistics. However, these methods require ground truth (GT) data, because classification results depend mainly on GT quality. Therefore, in this study, we introduce a random sampling approach to design and collect GT data using EO imagery and ancillary data. The collected GT data improve the algorithms and validate the classification results. Nevertheless, despite the importance of GT data, they are rarely disseminated as a data product in their own right. This represents an untapped opportunity to share GT data as a global public good and to improve the use of survey and census data as a source of GT data.
ARTICLE | doi:10.20944/preprints202303.0023.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Game Design; Variational AutoEncoder (VAE); Image and Video Generation; Bayesian Algorithm; Loss Function; Data Clustering; Data and Image Analytics; MNIST database; Generator and Discriminator
Online: 1 March 2023 (11:17:12 CET)
In recent decades, the Variational AutoEncoder (VAE) model has shown good potential and capability in image generation and dimensionality reduction. The combination of VAE and various machine learning frameworks has also worked effectively in different everyday applications; however, its potential and effectiveness in modern game design have seldom been explored or assessed, and the use of its feature extractor for data clustering has received little discussion in the literature. This paper first explores different mathematical properties of the VAE model, in particular the theoretical framework of the encoding and decoding processes, the achievable lower bound, and the loss functions of different applications. It then applies the established VAE model to generating new game levels within two well-known game settings, and validates the effectiveness of its data clustering mechanism with the aid of the Modified National Institute of Standards and Technology (MNIST) database. Respective statistical metrics and assessments were also utilized to evaluate the performance of the proposed VAE model in the aforementioned case studies. Based on the statistical and spatial results, several potential drawbacks and future enhancements of the established model are outlined, with the aim of maximizing the strengths and advantages of VAE for future game design tasks and relevant industrial missions.
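The VAE loss function referred to above is, in its standard form, the negative evidence lower bound (ELBO): a reconstruction term plus a KL regulariser that pulls the approximate latent posterior toward a standard normal. A minimal sketch of the standard Bernoulli-decoder form, not the paper's specific variants:

```python
import math

def kl_divergence(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian: the
    regularisation term of the standard VAE loss."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def bce(x, x_hat, eps=1e-7):
    """Bernoulli reconstruction loss (binary cross-entropy), summed over pixels."""
    return -sum(xi * math.log(max(p, eps)) + (1 - xi) * math.log(max(1 - p, eps))
                for xi, p in zip(x, x_hat))

def vae_loss(x, x_hat, mu, logvar):
    """Negative ELBO = reconstruction loss + KL regulariser."""
    return bce(x, x_hat) + kl_divergence(mu, logvar)

# A standard-normal latent (mu = 0, logvar = 0) contributes zero KL:
print(kl_divergence([0.0, 0.0], [0.0, 0.0]))
```

Minimising this loss trades reconstruction fidelity against a smooth, well-organised latent space, which is exactly what makes the encoder usable as a feature extractor for clustering and the decoder usable as a generator of new samples (e.g., game levels).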