ARTICLE | doi:10.20944/preprints201901.0130.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: internationalisation of SMEs; big data; market-oriented information; relational database; supply chain network; optimized database; trade condition; data visualization
Online: 14 January 2019 (10:04:03 CET)
The globalisation of SMEs has been widely discussed, yet academic work since the study of born-global (BG) ventures remains limited. The internationalisation of small and medium enterprises (SMEs) is difficult because they lack the resources and capabilities of multinational corporations. This study investigated the role of government in assisting the internationalisation of SMEs. In particular, because SMEs lack the ability to acquire market-oriented information, we established a scheme for an efficient information support system for their internationalisation: an information analysis system built on a relational database constructed for market-oriented information support. KISTI (Korea Institute of Science and Technology Information), a government-funded research institute in the Republic of Korea, provided information support to SMEs dealing with hydrazine-related products. This study presents that case as an example of government market-oriented information support for the internationalisation of SMEs. Research on government information support is meaningful in that it suggests a way to support SMEs at a practical level.
ARTICLE | doi:10.20944/preprints201810.0069.v1
Subject: Earth Sciences, Geoinformatics Keywords: urban system; urban context; microzone; fuzzy rule set; Mamdani fuzzy system; spatial database; GIS
Online: 4 October 2018 (11:55:09 CEST)
We present a new unsupervised method for partitioning a complex urban system into homogeneous urban areas, called urban contexts. The study area is initially partitioned into microzones, homogeneous portions of the urban system that are the atomic reference elements for census data. With the contribution of domain experts, we identify the physical, morphological, environmental, and socio-economic indicators needed to characterise urban contexts synthetically, and we create the fuzzy rule set used to determine the type of each urban context. We implement the spatial analysis processes needed to calculate the indicators for each microzone and apply a Mamdani fuzzy rule system to classify the microzones. Finally, the partition of the study area into urban contexts is obtained by dissolving contiguous microzones belonging to the same type of urban context. Tests were performed on the Municipality of Pozzuoli (Naples, Italy); the reliability of our model is measured by comparing its results with those obtained by detailed analysis.
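The Mamdani classification step described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual rule set: the indicator names (`density`, `green_ratio`), membership shapes, and the two rules are all assumptions for demonstration.

```python
# Minimal Mamdani-style fuzzy inference sketch: two normalized indicators in
# [0, 1] are mapped through hypothetical rules to an "urbanization" score,
# defuzzified by centroid over a discretized output universe.

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mamdani_urbanization(density, green_ratio, steps=101):
    # Antecedent memberships (assumed shapes)
    low_d, high_d = tri(density, -0.5, 0.0, 0.6), tri(density, 0.4, 1.0, 1.5)
    low_g, high_g = tri(green_ratio, -0.5, 0.0, 0.6), tri(green_ratio, 0.4, 1.0, 1.5)

    # Rule firing strengths (min for AND): two hypothetical rules
    rural = min(low_d, high_g)   # low density AND much green -> rural
    urban = min(high_d, low_g)   # high density AND little green -> urban

    # Aggregate clipped output sets (max) and defuzzify by centroid
    num = den = 0.0
    for i in range(steps):
        z = i / (steps - 1)
        mu = max(min(rural, tri(z, -0.5, 0.0, 0.6)),
                 min(urban, tri(z, 0.4, 1.0, 1.5)))
        num += z * mu
        den += mu
    return num / den if den else 0.5

score = mamdani_urbanization(density=0.9, green_ratio=0.1)
context = "dense urban" if score > 0.5 else "rural/peri-urban"
```

A real system would use the experts' full indicator set and rule base; the structure (fuzzification, min/max inference, centroid defuzzification) stays the same.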
ARTICLE | doi:10.20944/preprints201809.0454.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Venom; toxins; NCBI; database
Online: 24 September 2018 (12:00:19 CEST)
Venoms that drip from the fangs of snakes are incredibly complex chemical cocktails containing diverse proteins and enzymes, including a large variety of toxins such as myotoxins, cardiotoxins, hemotoxins, and neurotoxins, in countless combinations. Beyond their use in treating snake bites in humans, they have numerous therapeutic and medicinal applications; potential uses include treating excessive bleeding, stroke, neurological disorders, cancer, diabetes, and aging. A proper understanding of snake venom toxins, and facilitating their use, is therefore of utmost importance. In this paper, we describe a novel database, SVDB, for the storage, dissemination, and analysis of snake venom and toxin-related information. SVDB has autonomous links to NCBI databases, pulling relevant information both on demand and asynchronously to facilitate data integration. SVDB includes authentic, non-redundant, up-to-date scientific information on literature, sequences, structures, small molecules, taxonomy, and more. The SVDB portal also provides external links to tools such as BLAST, CLUSTAL, and Swiss-Model, as well as phylogeny and other toxin-related resources. The architecture of SVDB's information fetching, linking, and structuring is unique and can be applied to any domain-specific data collection pipeline built on the NCBI. The database is publicly available at https://www.snakevenomdb.org.
ARTICLE | doi:10.20944/preprints202105.0714.v1
Online: 31 May 2021 (08:32:16 CEST)
Plant sterols are compounds with multiple biological functions, chiefly cholesterol reduction. There is no comprehensive database on plant sterols, which makes it difficult to estimate their intake in the Polish population. In this study, we used international food databases, supplemented by data from the scientific literature, to create a database of plant sterols in the foods consumed in Poland and to assess the size and sources of dietary plant sterol intake in the adult population of Poland. The literature search was conducted using PubMed, Web of Science, Scopus, and Google Scholar to identify published food composition data for plant sterols. The study group consisted of 5690 participants of the WOBASZ II survey. We identified 361 dietary sources of plant sterols based on the foods and dishes reported by participants. Cereals and fats provided 61% of total plant sterols; together with vegetables and fruits, this totalled 80%. Total plant sterol intake for the Polish population was 282.97 mg/day: 320.77 mg/day for men and 252.19 mg/day for women. Canola oil provided the most plant sterols at 16.92%, followed by white bread at 16.65% and soft margarine at 8.33%. This study found that the database facilitates the calculation of plant sterol intake in the typical Polish diet, and the results are comparable to those of other studies despite different methodologies of nutritional assessment and slightly different databases. The main sources of dietary plant sterols did not differ from those reported for other populations. This study confirmed the observation of other research that women's diets may have a higher plant sterol density than men's.
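The intake estimation described above amounts to matching each reported food to a sterol content per 100 g and summing the contributions. A minimal sketch, with illustrative composition values that are not from the actual Polish database:

```python
# Hypothetical sketch: daily plant sterol intake as the sum of
# (content per 100 g) x (grams consumed) / 100 over all reported foods.

STEROL_MG_PER_100G = {          # assumed, illustrative composition values
    "canola oil": 668.0,
    "white bread": 86.0,
    "soft margarine": 360.0,
}

def sterol_intake_mg(consumed_grams):
    """consumed_grams: {food_name: grams eaten per day} -> mg sterols/day."""
    return sum(STEROL_MG_PER_100G.get(food, 0.0) * grams / 100.0
               for food, grams in consumed_grams.items())

daily = sterol_intake_mg({"canola oil": 15, "white bread": 120,
                          "soft margarine": 10})
```

Foods absent from the composition table contribute zero, which mirrors the practical limitation of any incomplete sterol database.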
ARTICLE | doi:10.20944/preprints201907.0074.v1
Online: 4 July 2019 (10:29:30 CEST)
The human proteome is composed of diverse and heterogeneous gene products (proteoforms). Previously, we discussed the main technical aspects of developing an inventory of human proteoforms that would be visually attractive, clear, and easy to search (Naryzhny S. J. Proteomics 2018, S1874-3919(18) 30220-3). Here, we present the first draft of a proteoform database based on that discussion. The database principles and structure are described. The database is called "2DE-pattern", as it contains multiple isoform-centric patterns of proteoforms separated according to 2DE principles.
ARTICLE | doi:10.20944/preprints202110.0033.v1
Online: 4 October 2021 (08:58:52 CEST)
Antimicrobial resistance (AMR) is one of the top 10 threats to global health. AMR undermines the effective prevention and treatment of infections caused by microbial pathogens, including bacteria, parasites, viruses, and fungi (WHO). Microbial pathogens have a natural tendency to evolve and mutate over time, giving rise to AMR strains. The set of genes involved in antibiotic resistance, termed antibiotic resistance genes (ARGs), spreads between species by lateral gene transfer, causing global dissemination. While this biological mechanism drives the spread of AMR, human practices also accelerate it through over-prescription, incomplete treatment, environmental waste, and similar factors. A considerable portion of the scientific community is engaged in AMR-related work, seeking novel therapeutic solutions for tackling resistant pathogens. A comprehensive inspection of the literature shows that diverse therapeutic strategies have evolved in recent years, including novel small molecules, newly identified antimicrobial peptides, bacteriophages, phytochemicals, nanocomposites, and novel phototherapies against bacteria, fungi, and viruses. In this work, we have developed a comprehensive knowledgebase by collecting alternative antimicrobial therapeutic strategies from the literature. We used a subjective approach for mining new strategies, resulting in broad coverage of entities, and subsequently added objective data such as entity name, potency, and safety information. The extracted data were organized into KOMBAT (Knowledgebase Of Microbes' Battling Agents for Therapeutics). Many of these agents have been tested against AMR pathogens. We envision that this database will be valuable for developing future therapeutics against resistant pathogens. The database can be accessed at http://kombat.igib.res.in/.
ARTICLE | doi:10.20944/preprints201903.0015.v1
Subject: Biology, Plant Sciences Keywords: plant; sesquiterpenes; biosynthesis; graph grammars; database;
Online: 1 March 2019 (14:30:16 CET)
Plants produce a diverse portfolio of sesquiterpenes that are important in their response to herbivores and in their interactions with other plants. Sesquiterpene biosynthesis from farnesyl diphosphate depends on sesquiterpene synthases. Here, we investigate to what extent metabolic pathways can be reconstructed from knowledge of the final product and the reaction mechanisms catalyzed by sesquiterpene synthases alone. We use the software package MedØlDatschgerl (MØD) to generate chemical networks and elucidate the pathways contained in them. As examples, we successfully consider the reachability of the important plant sesquiterpenes β-caryophyllene, α-humulene, and β-farnesene. We also introduce a graph database that integrates the simulation results with experimental biological evidence for the biosynthesis of selected predicted sesquiterpenes.
ARTICLE | doi:10.20944/preprints202011.0656.v1
Subject: Life Sciences, Biochemistry Keywords: orchid books; database; species; documentations; policy; extinction
Online: 26 November 2020 (07:36:45 CET)
Orchids, totalling close to 4000 species in Malaysia, are one of the most diverse and widespread plant families in the country, and in recent years they have gained recognition among policy makers and guardians of the forest as a group well suited to advancing plant conservation on a broad scale. They are listed not only as a conservation indicator but also as priority germplasm for a sustainable floriculture industry in the country, a milestone that could safeguard wild orchids from the verge of extinction in their natural habitat. Through 30 years of studying orchids in the wild, we understand more about their distribution, rarity, threats, and extinction than ever before, and we have the scientific tools to address many of the problems; yet many species face daily threats, including habitat loss and unsustainable exploitation, mainly via Internet trade. Prior to executing a workable conservation plan, various research institutions have worked closely with the Forestry Departments in Malaysia to inventory and document the country's orchid species richness. The Selangor, Sarawak, and Perlis Forestry Departments, in collaboration with UPM, have published seven orchid books covering various habitat types. The Selangor Forestry Department leads in publishing biodiversity data in book form for its various ecotourism sites and State Parks, having published two books on orchids; Sarawak has published one on limestone orchids; and Perlis, the first state to embark on this effort, published one book in 2010 and is currently preparing a new book that includes other flagship wild flowers. Recognising the importance of documenting its biodiversity wealth, Malaysia has developed an information system intended as a one-stop repository for biodiversity facts and, as part of its commitments to the CBD, to facilitate reporting and the transfer of biodiversity and conservation-related information both nationally and internationally.
ARTICLE | doi:10.20944/preprints201609.0124.v1
Subject: Life Sciences, Molecular Biology Keywords: breast cancer; immunosuppressive factor; biomarker; online database
Online: 30 September 2016 (09:34:30 CEST)
This study aimed to screen and validate immunosuppressive factors associated with malignant phenotypes in luminal- and basal-like breast cancer cell lines and tissue samples. The mRNA microarray datasets GSE40057 and GSE1561 were downloaded and remodelled, and differentially expressed genes (DEGs) were identified. Enrichment analyses were performed, and the online resources GOBO and Kaplan-Meier Plotter were used to screen for immunosuppressive factors associated with breast cancer malignant phenotypes. qRT-PCR and western blot were used to validate the expression of CD274 and IL8 in cell lines, and immunohistochemistry was used to detect MIF and VEGFA on tissue microarrays. The results showed that CD274 and IL8 were both upregulated in basal-like cell lines, that MIF expression was dramatically increased in patients with breast cancer metastases (p<0.05), and that VEGFA expression positively correlates with breast cancer pathologic grade (p<0.05). During the formation and development of breast cancer, immune-related genes are activated, and the immunosuppressive factors CD274, IL8, MIF, and VEGFA are upregulated. Such molecules could serve as biomarkers for breast cancer prognosis. However, because individual immune-related factors can play several biological roles, the mechanistic relationship between immunosuppressive factors and breast cancer malignant phenotypes, and the feasibility of their application as drug targets, require further investigation.
ARTICLE | doi:10.20944/preprints202007.0051.v2
Subject: Social Sciences, Library & Information Science Keywords: COVID-19; WHO; database; systematic review; data quality
Online: 2 August 2020 (17:43:38 CEST)
Introduction: The large number of COVID-19 publications has created a need to collect all research-related material in practical and reliable centralized databases. The aim of this study was to evaluate the functionality and quality of the compiled World Health Organization COVID-19 database and compare it to PubMed and Scopus. Methods: Article metadata for COVID-19 articles, and for articles on 8 specific COVID-19-related topics, was exported from the WHO global research database, Scopus, and PubMed. The analysis was conducted in R to investigate the number of articles, their overlap between the databases, and the missingness of values in the metadata. Results: The WHO database contains the largest number of COVID-19-related articles overall, but retrieved the same number of articles on the 8 specific topics as Scopus and PubMed. Despite having the smallest number of exclusive articles overall, Scopus retrieved the highest number of exclusive articles on the specific COVID-19-related topics. Further investigation revealed that PubMed and Scopus have a more comprehensive structure than the WHO database, and fewer missing values in the categories searched by information retrieval systems. Discussion: This study suggests that the WHO COVID-19 database, even though it is compiled from multiple databases, has a very simple and limited structure and significant problems with data quality. As a consequence, relying on this database as a source of articles for systematic reviews or bibliometric analyses is undesirable.
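The overlap and missingness checks described above reduce to set operations on article identifiers plus a null-value count per metadata field. The study used R; this is an equivalent Python sketch with made-up records, not the study's data:

```python
# Toy metadata: {database: {article_id: {field: value or None}}}
records = {
    "WHO":    {"a1": {"doi": "10.1/x", "abstract": None},
               "a2": {"doi": "10.1/y", "abstract": "..."},
               "a3": {"doi": None,     "abstract": None}},
    "PubMed": {"a1": {"doi": "10.1/x", "abstract": "..."},
               "a2": {"doi": "10.1/y", "abstract": "..."}},
    "Scopus": {"a2": {"doi": "10.1/y", "abstract": "..."},
               "a4": {"doi": "10.1/z", "abstract": "..."}},
}

ids = {db: set(arts) for db, arts in records.items()}

# Articles indexed by all three databases
shared_all = ids["WHO"] & ids["PubMed"] & ids["Scopus"]

# Articles exclusive to each database (absent from every other one)
exclusive = {db: s - set().union(*(o for d, o in ids.items() if d != db))
             for db, s in ids.items()}

def missing_rate(arts):
    """Fraction of metadata fields that are missing (None)."""
    fields = [v for art in arts.values() for v in art.values()]
    return sum(v is None for v in fields) / len(fields)

who_missing = missing_rate(records["WHO"])
```

With these toy records, "a2" is shared by all three sources and half of the WHO metadata fields are empty, mirroring the kind of data-quality gap the study reports.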
REVIEW | doi:10.20944/preprints202007.0479.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Face Recognition; Face Analysis; Face Database; Deep Learning
Online: 21 July 2020 (11:13:45 CEST)
Face recognition is one of the most active research fields in computer vision and pattern recognition, with many practical and commercial applications, including identification, access control, forensics, and human-computer interaction. However, identifying a face in a crowd raises serious questions about individual freedoms and poses ethical issues. Significant methods, algorithms, approaches, and databases have been proposed in recent years to study constrained and unconstrained face recognition. 2D approaches have reached some degree of maturity and report very high recognition rates. This performance is achieved in controlled environments where the acquisition parameters, such as lighting, angle of view, and camera-to-subject distance, are controlled. However, if the ambient conditions (e.g., lighting) or the facial appearance (e.g., pose or facial expression) change, performance degrades dramatically. 3D approaches were proposed as an alternative solution to these problems. The advantage of 3D data lies in its invariance to pose and lighting conditions, which has enhanced the efficiency of recognition systems, although 3D data remain somewhat sensitive to changes in facial expression. This review presents the history of face recognition technology, the current state-of-the-art methodologies, and future directions. We specifically concentrate on the most recent databases and on 2D and 3D face recognition methods, and we pay particular attention to deep learning approaches, as they represent the current state of the field. Open issues are examined, and potential directions for research in facial recognition are proposed, in order to provide the reader with a point of reference for topics that deserve consideration.
ARTICLE | doi:10.20944/preprints201909.0070.v1
Subject: Life Sciences, Microbiology Keywords: Microbiome, Inferred functions, Database, 16S, Metagenomics, Comparative metagenomics
Online: 6 September 2019 (09:44:29 CEST)
Motivation: 16S rRNA gene amplicon sequencing has significantly expanded the scope of metagenomics research by enabling microbial community analyses in a cost-effective manner. The possibility of inferring the functional potential of a microbiome from amplicon-derived taxonomic abundance profiles has further strengthened the utility of 16S sequencing. Indeed, a surge in 'inferred function' metagenomic analysis has recently taken place, wherein most 16S microbiome studies include inferred functional insights in addition to taxonomic characterization. Tools like PICRUSt, Tax4Fun, Vikodak, and iVikodak have significantly eased the process of inferring the functional potential of a microbiome from its taxonomic abundance profile. A platform that can host inferred-function metagenomic studies with the comprehensive metadata-driven search utilities of a typical database, coupled with on-the-fly comparative analytics between studies of interest, would be a major improvement to the state of the art. ReFDash represents an effort in this direction. Methods: This work introduces ReFDash, a Repository of Functional Dashboards. ReFDash, developed as a significant extension of iVikodak (a function inference tool), provides three unique offerings in the inferred-function space: (i) a platform hosting a database of inferred-function data, continuously updated from public 16S metagenomic studies; (ii) a tool to search studies of interest and compare up to three metagenomic environments on the fly; and (iii) a community initiative wherein users can contribute their own inferred-function data to the platform. ReFDash therefore provides a first-of-its-kind community-driven framework for scientific collaboration, data analytics, and sharing in this area of microbiome research. Results: Overall, the ReFDash database aims to compile a global ensemble of 16S-derived functional metagenomics projects. ReFDash currently hosts close to 50 ready-to-use, re-analyzable functional dashboards representing data from approximately 18,000 microbiome samples sourced from various published studies. Each entry also provides direct download links to the associated taxonomic files and metadata employed for analysis. Conclusion: The vision behind ReFDash is the creation of a framework in which users can not only analyze their microbiome datasets in functional terms but also contribute to building an information base by submitting their functional analyses to the ReFDash database. The ReFDash web server is freely accessible at https://web.rniapps.net/iVikodak/refdash/
ARTICLE | doi:10.20944/preprints201908.0070.v1
Subject: Social Sciences, Geography Keywords: city; large urban regions; Russia; globalization; open database
Online: 6 August 2019 (08:33:24 CEST)
This study explores how to delineate Russian cities in order to make them comparable on the world scale. In doing so, we introduce the concept of large urban regions (LURs) applied to the Russian urban context. The research is motivated by a principal question: how to construct a statistical urban delineation that allows, first, demonstrating the integration of cities into globalization and, second, conducting global urban comparative research. Previous studies on urban delineation in Russia have focused almost exclusively on functional urban areas, which have substantial limitations and are not suitable for global urban comparisons. Addressing this gap, we propose a new definition of large urban regions. We first introduce the context of Russian cities (Section 2), then discuss existing Russian urban concepts (Section 3) and justify the need for a new urban delineation (Section 4). Afterwards, we present a general method to delineate large urban regions in the Russian context (Section 5.1) and illustrate it with two case studies: St. Petersburg (a polycentric region) and Samara (a monocentric region) (Section 5.2). In the last part (Section 6), we discuss the 10 largest urban regions in Russia and describe the constructed database, which includes all Russian LURs.
REVIEW | doi:10.20944/preprints201902.0063.v1
Subject: Earth Sciences, Geology Keywords: SISAL database; speleothem; cave; isotopes; Middle East; palaeoclimate
Online: 6 February 2019 (13:32:13 CET)
The Middle East (ME) spans the transition between the temperate Mediterranean climate of the Levant and the hyper-arid sub-tropical deserts of the southern Arabian Peninsula, with the complex alpine topography in the northeast feeding the Euphrates and Tigris rivers, which support life in the southeastern Fertile Crescent (FC). Climate projections predict severe drying in much of the ME in response to global warming, making it important to understand the controls on hydro-climatic perturbations in the region. Here we discuss 23 ME speleothem stable oxygen isotope (δ18Occ) records from 16 sites in the SISAL_v1 database, which provide a record of past hydro-climatic variability. Sub-millennial changes in ME speleothem δ18Occ values primarily indicate changes in past precipitation amounts, superimposed on variations in the main synoptic pattern in the region, specifically Mediterranean cyclones. The coherency (or lack thereof) between regional records is reviewed from the Pleistocene to the present, covering the Last Glacial Maximum (LGM), prominent events during deglaciation, and the transition into the Holocene. The available speleothem δ18Occ time series are investigated by binning and normalizing in 25-year and 200-year time windows over the Holocene. Important Holocene climatic oscillations are discussed, such as the 8.2 ka, 4.2 ka, and 0.7 ka (Little Ice Age) Before Present events. Common trends in the standardized anomalies are tested against different climate archives. Finally, recommendations for future speleothem-based research in the region are given, along with comments on the utility and completeness of the SISAL database.
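The binning-and-normalizing step described above can be sketched as follows: samples (age in years BP, δ18O value) are averaged into fixed-width windows and then converted to z-score anomalies. The sample values here are illustrative, not SISAL data:

```python
# Bin a speleothem time series into fixed age windows, then standardize.
from statistics import mean, pstdev

def bin_series(samples, width):
    """samples: [(age_bp, d18o)] -> {bin_start_age: mean d18o in window}."""
    bins = {}
    for age, val in samples:
        bins.setdefault((age // width) * width, []).append(val)
    return {start: mean(vals) for start, vals in sorted(bins.items())}

def standardize(binned):
    """Convert binned means to z-score anomalies (population std dev)."""
    vals = list(binned.values())
    mu, sd = mean(vals), pstdev(vals)
    return {k: (v - mu) / sd for k, v in binned.items()}

# Illustrative samples: (age BP, d18O in permil)
samples = [(10, -5.0), (180, -5.4), (220, -4.0), (390, -4.4), (450, -6.0)]
binned = bin_series(samples, width=200)   # 200-year windows, as in the paper
anoms = standardize(binned)
```

Standardizing each record before comparison is what makes anomalies from different caves commensurable despite differing absolute δ18O baselines.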
ARTICLE | doi:10.20944/preprints201810.0103.v1
Subject: Life Sciences, Virology Keywords: Nipah Virus, outbreak, inhibitors, QSAR, database, prediction algorithm
Online: 5 October 2018 (15:04:23 CEST)
Nipah virus (NiV) has caused several outbreaks in Asian countries, the latest in the Kerala state of India. To date, no drug is available despite the urgent need. In the current study, we provide a computational one-stop solution for NiV inhibitors. We have developed the "anti-Nipah" web resource, which comprises a data repository, a prediction method, and data visualization modules. The database contains 313 (181 unique) inhibitors against different strains and outbreaks of NiV, extracted from research articles and patents. The quantitative structure-activity relationship (QSAR) predictors were built using a classification approach, employing 10-fold cross-validation through a support vector machine on 120 (68 positive + 52 negative) inhibitors. The overall predictor achieved an accuracy of 88.89% and a Matthews correlation coefficient of 0.77 on the training/testing dataset, and it performed equally well on the independent validation dataset. The data visualization modules, based on chemical clustering and principal component analysis, display the diversity of the NiV inhibitors. Our web platform should therefore be of immense help to researchers developing effective inhibitors against NiV. The user-friendly webserver is freely available at http://bioinfo.imtech.res.in/manojk/antinipah/
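The evaluation setup described above (SVM classifier, 10-fold cross-validation, accuracy and Matthews correlation coefficient on a 68-positive/52-negative set) can be sketched as below. The random features stand in for real molecular descriptors, so this is a structural sketch of the protocol, not a reproduction of the study's model:

```python
# QSAR-style classification sketch: 10-fold cross-validated SVM scored with
# accuracy and MCC. Features are synthetic stand-ins for chemical descriptors.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import matthews_corrcoef, accuracy_score

rng = np.random.default_rng(0)
n_pos, n_neg, n_feat = 68, 52, 16          # 68p + 52n, as in the abstract
X = np.vstack([rng.normal(0.8, 1.0, (n_pos, n_feat)),    # toy "actives"
               rng.normal(-0.8, 1.0, (n_neg, n_feat))])  # toy "inactives"
y = np.array([1] * n_pos + [0] * n_neg)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
pred = cross_val_predict(SVC(kernel="rbf", C=1.0), X, y, cv=cv)
acc = accuracy_score(y, pred)
mcc = matthews_corrcoef(y, pred)
```

Stratified folds preserve the 68:52 class ratio in each split, which matters for a modest, imbalanced dataset like this one.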
ARTICLE | doi:10.20944/preprints201902.0264.v1
Subject: Engineering, Civil Engineering Keywords: beams; database; experiments; flexure; shear; steel fiber reinforced concrete
Online: 28 February 2019 (07:10:10 CET)
Adding steel fibers to concrete improves capacity in tension-driven failure modes. An example is the shear capacity of steel fiber reinforced concrete (SFRC) beams with longitudinal reinforcement and without shear reinforcement. Since no mechanical models exist that fully describe the behavior of SFRC beams without shear reinforcement failing in shear, a number of empirical equations have been proposed in the past. This paper compiles the existing empirical equations and code provisions for predicting the shear capacity of SFRC beams failing in shear, as well as a database of 487 experiments reported in the literature. The experimental shear capacities from the database are then compared to the prediction equations. This comparison shows large scatter in the ratio of experimental to predicted values. The international practice of defining the tensile strength of SFRC through different test methods makes the comparison difficult. For design purposes, the code prediction methods based on the Eurocode shear expression provide reasonable results (with coefficients of variation of the tested-to-predicted ratio of 27%-29%). None of the currently available methods properly describes the behavior of SFRC beams failing in shear. As such, this work shows the need for studies that address the different shear-carrying mechanisms in SFRC and their crack kinematics.
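The scatter metric quoted above, the coefficient of variation of tested-to-predicted shear capacities, is computed as below. The capacities are illustrative placeholders, not values from the 487-test database:

```python
# Coefficient of variation (CoV) of the tested/predicted capacity ratio,
# the statistic the paper uses to rank prediction equations.
from statistics import mean, stdev

tested    = [152.0, 98.0, 210.0, 175.0, 120.0]   # kN, hypothetical experiments
predicted = [140.0, 110.0, 180.0, 190.0, 100.0]  # kN, from some code equation

ratios = [t / p for t, p in zip(tested, predicted)]
cov = stdev(ratios) / mean(ratios)               # sample CoV of the ratio
```

A mean ratio near 1.0 with a low CoV indicates an accurate and precise prediction equation; the 27%-29% CoV reported for the Eurocode-based methods corresponds to substantially wider scatter than this toy set.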
REVIEW | doi:10.20944/preprints201809.0246.v1
Subject: Earth Sciences, Geology Keywords: SISAL database, speleothem, cave, oxygen isotopes, Western Europe, palaeoclimate
Online: 13 September 2018 (15:32:34 CEST)
Western Europe is the region with the highest density of published speleothem δ18O (δ18Ospel) records worldwide. Here we review these records in light of the recent publication of the Speleothem Isotopes Synthesis and Analysis (SISAL) database. We investigate how representative the spatial and temporal distribution of the available records is of climate in Western Europe, and review potential sites and strategies for future studies. We show that spatial trends in precipitation δ18O are mirrored in the speleothems, providing a means to better constrain the factors influencing δ18Ospel at a specific location. Coherent regional δ18Ospel trends are found over the stadial-interstadial transitions of the last glacial, especially in high-altitude Alpine records. Over the Holocene, regional trends are less clearly expressed, owing to lower signal-to-noise ratios in δ18Ospel, but can potentially be extracted with statistical methods. Overall, this first assessment highlights the potential of the European region for speleothem palaeoclimate reconstruction, while underlining the importance of knowledge of local factors for a correct interpretation of δ18Ospel.
REVIEW | doi:10.20944/preprints202206.0209.v1
Subject: Mathematics & Computer Science, Other Keywords: Quantum cryptography; Oblivious transfer; Secure multiparty computation; Private database query
Online: 15 June 2022 (02:31:33 CEST)
Quantum cryptography is the field of cryptography that explores the quantum properties of matter. Its aim is to develop primitives beyond the reach of classical cryptography or to improve on existing classical implementations. Although much of the work in this field has been dedicated to quantum key distribution (QKD), some important steps were made towards the study and development of quantum oblivious transfer (QOT). It is possible to draw a comparison between the application structure of both QKD and QOT primitives. Just as QKD protocols allow quantum-safe communication, QOT protocols allow quantum-safe computation. However, the conditions under which QOT is actually quantum-safe have been subject to a great amount of scrutiny and study. In this review article, we survey the work developed around the concept of oblivious transfer in the area of theoretical quantum cryptography, with an emphasis on some proposed protocols and their security requirements. We review the impossibility results that daunt this primitive and discuss several quantum security models under which it is possible to prove QOT security.
ARTICLE | doi:10.20944/preprints202107.0406.v1
Subject: Medicine & Pharmacology, Allergology Keywords: universal health coverage; health insurance claims; administrative data; claims database
Online: 19 July 2021 (11:38:35 CEST)
Although universal health coverage (UHC) is pursued by many countries, not all countries with UHC include dental care among their benefits. Japan, with its long-held tradition of UHC, covers dental care as an essential benefit, and the majority of dental care services are provided to all patients with minimal copayment. Under UHC, the scope of services as well as prices are regulated by a uniform fee schedule, and dentists submit claims according to the uniform format and fee schedule. The author analyzes publicly available dental health insurance claims data as well as a sampling survey on dental hygiene, and illustrates how Japan's dental care is responding to the challenges of population ageing.
ARTICLE | doi:10.20944/preprints202105.0068.v1
Subject: Life Sciences, Genetics Keywords: Mitochondrial Encephalohepatopathy, Trio-family, autosomal recessive, GEMINI tool, ClinVar database
Online: 5 May 2021 (15:02:57 CEST)
Mitochondrial encephalohepatopathy (MEH) is an autosomal recessive neurodevelopmental disorder usually accompanied by microcephaly, white matter changes, and cardiac and hepatic failure. Here, we applied a whole-exome sequencing (WES) framework to trio-family data comprising unaffected non-consanguineous parents and a proband (a neonate girl) with this inherited disorder. A total of 2,928,402 variants were observed: 2,613,746 SNPs, 112,336 multiple nucleotide polymorphisms (MNPs), 72,610 insertions, 113,207 deletions, and 16,503 mixed variants. These variants are responsible for 82,813,631 effects on various genomic regions. Our pipeline uncovered candidate gene mutations from these variants, retaining 5,277 variants spanning 3,598 genes, of which 8 code for non-coding RNA and 178 carry variants of high impact severity. Among these 178 variants, 125 are de novo variants not previously reported in the ClinVar database. Consistent with previous studies, the remaining high-impact-severity genes are involved in encephalopathy, Leigh syndrome, Charcot-Marie-Tooth disease, global developmental disorder, seizures, spastic paraplegia, premature ovarian failure, mitochondrial myopathy-cerebellar ataxia-pigmentary retinopathy syndrome, ocular and retinal degeneration, deafness, intellectual disability, cardiofacioneurodevelopmental syndrome, and others. All these clinical features were also observed in the patient studied. The current analysis highlights and expands the genetic architecture of the MEH phenotype. Furthermore, applying this pipeline to trio-family data significantly strengthens the case for its usefulness as a first-tier diagnostic method in the detection of complex multisystem phenotypic disorders.
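The narrowing step described above, from millions of annotated variants down to high-impact candidates absent from ClinVar, can be sketched as a simple filter chain. The records, field names, and gene labels below are hypothetical illustrations, not the GEMINI schema or the study's calls:

```python
# Filter annotated variants to high-impact candidates not yet in ClinVar.
variants = [
    {"gene": "POLG",   "impact": "HIGH",     "in_clinvar": False, "genotype": "hom_alt"},
    {"gene": "MT-ND1", "impact": "MODERATE", "in_clinvar": True,  "genotype": "het"},
    {"gene": "TWNK",   "impact": "HIGH",     "in_clinvar": True,  "genotype": "het"},
    {"gene": "RRM2B",  "impact": "HIGH",     "in_clinvar": False, "genotype": "hom_alt"},
]

# Step 1: keep only variants annotated with high impact severity
high_impact = [v for v in variants if v["impact"] == "HIGH"]

# Step 2: keep those absent from ClinVar (candidate novel variants)
novel = [v for v in high_impact if not v["in_clinvar"]]

candidate_genes = sorted(v["gene"] for v in novel)
```

In a real trio analysis, further filters (inheritance pattern, allele frequency, genotype consistency across the trio) would be chained in the same way.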
ARTICLE | doi:10.20944/preprints202102.0270.v1
Subject: Biology, Anatomy & Morphology Keywords: MPDB2.0; medicinal plant; medicinal plant database of Bangladesh; folk medicine
Online: 10 February 2021 (16:29:00 CET)
Medicinal plants are generally defined as rare herbals with potent medicinal activities that can be used as an alternative treatment for diseases. Recent studies exploring novel medicine development originating from folk-medicinal practices challenge this notion and suggest that both the scope of the term medicinal plant and their potential applications cover substantially more ground than previously suggested. While medicinal plants are not limited to the borders of any country, Bangladesh and its south-east Asian neighbors boast a huge collection of potent medicinal plants with considerable folk-medicine history compared to most other countries of the world. MPDB 2.0 is the continuation of MPDB 1.0; it serves as both a data repertoire for the medicinal plants of Bangladesh and a user-friendly interface for researchers, health practitioners, drug developers, and students who wish to study the various medicinal and nutritive plants scattered around Bangladesh and the underlying phytochemicals contributing to their efficacy in folk medicine. While human diseases were a primary focus in developing MPDB 2.0, the information in this database is not limited to applications in human diseases, as many of the plants indexed here can serve in developing biofuel or bioremediation technologies, nutritive diets, cosmetics, etc. MPDB 2.0 comprises a collection of more than five hundred medicinal plants from Bangladesh along with a record of their corresponding scientific, family, and local names, together with their utilized parts and information regarding ailments, active compounds, and the PubMed IDs of related publications.
ARTICLE | doi:10.20944/preprints201811.0280.v1
Subject: Life Sciences, Genetics Keywords: HSP47; missense mutation; mutational hotspot; variant analysis; cancer database; chaperone
Online: 12 November 2018 (10:23:29 CET)
Heat shock protein 47 kDa (HSP47) serves as a client-specific chaperone essential for collagen biosynthesis and its folding and structural assembly. To date, there is no comprehensive study on mutational hotspots and the protein network for human HSP47. Using five different human mutational databases, we deduced a comprehensive list of human HSP47 mutations and found 24, 67, 50, 43 and 2 deleterious mutations from the 1000 Genomes data, gnomAD, COSMICv86, cBioPortal, and CanVar, respectively. We identified thirteen top-ranked missense mutations of HSP47 with the stringent cut-offs of CADD score (>25) and Grantham score (≥151): Ser76Trp, Arg103Cys, Arg116Cys, Ser159Phe, Arg167Cys, Arg280Cys, Trp293Cys, Gly323Trp, Arg339Cys, Arg373Cys, Arg377Cys, Ser399Phe, and Arg405Cys, with the arginine-to-cysteine change as the predominant mutation. We also found that HSP47 is up-regulated and down-regulated in 11 and 4 cancer types, respectively. Upon constructing the protein interactome map of human HSP47, we found that a set of molecular chaperones are interaction partners of HSP47, including two copies each of CREB binding proteins, HSP27, HSP40, HSP70, HSP90 and ubiquitin proteins, and one copy each of cartilage associated protein (CRTAP), HSPH1, HSBP1, FK506-binding protein B (FKBP), Kruppel-like factor (KLF13), peptidyl-prolyl isomerase PPIB and prolyl 4-hydroxylase beta subunit (P4HB). This suggests that a cocktail of different chaperones interacts with HSP47. These findings will assist in the evaluation of the roles of HSP47 in human disease, including different types of cancers.
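The two-threshold screen described above (CADD score > 25 and Grantham score ≥ 151) can be sketched as a simple filter. The variant records below are illustrative examples, not data taken from the study:

```python
# Filter missense variants by CADD and Grantham score thresholds, as in
# the screen described above. The example variants are hypothetical.
def filter_deleterious(variants, cadd_cutoff=25, grantham_cutoff=151):
    """Keep variants with CADD > cutoff and Grantham >= cutoff."""
    return [v for v in variants
            if v["cadd"] > cadd_cutoff and v["grantham"] >= grantham_cutoff]

variants = [
    {"change": "Arg103Cys", "cadd": 29.4, "grantham": 180},
    {"change": "Ala25Val",  "cadd": 12.1, "grantham": 64},
    {"change": "Ser76Trp",  "cadd": 27.0, "grantham": 177},
]
kept = filter_deleterious(variants)
print([v["change"] for v in kept])  # ['Arg103Cys', 'Ser76Trp']
```

Both conditions must hold, so a variant with a high Grantham score but a low CADD score (or vice versa) is discarded.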
ARTICLE | doi:10.20944/preprints201804.0202.v1
Subject: Mathematics & Computer Science, Other Keywords: place; spatial property graph; place graph; graph database; place description
Online: 16 April 2018 (10:05:25 CEST)
Everyday place descriptions provide a rich source of knowledge about places and their relative locations. This research proposes a place graph model for modeling this spatial, non-spatial, and contextual knowledge from place descriptions. The model extends a prior place graph, and overcomes a number of limitations. The model is implemented using the Neo4j graph database, and a management system has also been developed that allows operations including querying, mapping, and visualizing the stored knowledge in an extended place graph. Then three experimental tasks, namely georeferencing, reasoning, and querying, are selected to demonstrate the superiority of the extended model.
ARTICLE | doi:10.20944/preprints201804.0134.v1
Subject: Earth Sciences, Geoinformatics Keywords: airborne laser scanning; geospatial database; data retrieval; road median; attributes
Online: 11 April 2018 (04:27:42 CEST)
Laser scanning systems make use of Light Detection and Ranging (LiDAR) technology to acquire accurately georeferenced sets of dense 3D point cloud data. The information acquired using these systems produces better knowledge about terrain objects, which are inherently 3D in nature. The LiDAR data acquired from mobile, airborne or terrestrial platforms provide several benefits over conventional sources of data acquisition in terms of accuracy, resolution and attributes. However, the large volume and scale of LiDAR data have inhibited the development of automated feature extraction algorithms due to the extensive computational cost involved. Moreover, the heterogeneously distributed point cloud, which represents objects with varying size, point density, holes and complicated structures, poses a great challenge for data processing. Currently, geospatial database systems do not provide a robust solution for efficient storage of, and access to, raw data in a way that allows data processing to be applied over an optimal spatial extent. In this paper, we present the Global LiDAR and Imagery Mobile Processing Spatial Environment (GLIMPSE) system, which provides a framework for the storage, management and integration of 3D LiDAR data acquired from multiple platforms. The system facilitates efficient access to the raw dataset, which is hierarchically represented in a geographically meaningful way. We utilise the GLIMPSE system to automatically extract the road median from Airborne Laser Scanning (ALS) point clouds. In the first part of this paper, we detail an approach to efficiently retrieve the point cloud data from the GLIMPSE system for a particular geographic area based on user requirements. In the second part, we present an algorithm to automatically extract the road median from the retrieved LiDAR data. The developed road median extraction algorithm utilises the LiDAR elevation and intensity attributes to distinguish the median from the road surface.
We successfully tested our algorithms on two road sections consisting of distinct road median types based on concrete and grass-hedge barriers. The use of GLIMPSE improved the efficiency of the road median extraction in terms of fast access to ALS point cloud data for the required road sections. The developed system and its associated algorithms provide a comprehensive solution to users' requirements for efficient storage, integration, retrieval and processing of large volumes of LiDAR point cloud data. These findings and knowledge contribute to a more rapid, cost-effective and comprehensive approach to surveying road networks.
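The idea of separating median points from road-surface points using elevation and intensity attributes can be illustrated with a toy classifier. The thresholds and points below are made-up examples, not the GLIMPSE road-median algorithm or its parameters:

```python
# Toy illustration of attribute-based point classification: a point is
# treated as 'median' if it sits above the road surface or its return
# intensity differs from typical asphalt. Thresholds are hypothetical.
def classify_median(points, road_elev, elev_margin=0.10, max_intensity=40):
    """Split (x, y, z, intensity) tuples into median and surface points."""
    median, surface = [], []
    for p in points:
        if p[2] - road_elev > elev_margin or p[3] < max_intensity:
            median.append(p)
        else:
            surface.append(p)
    return median, surface

pts = [(0, 0, 10.02, 80), (1, 0, 10.45, 75), (2, 0, 10.03, 20)]
m, s = classify_median(pts, road_elev=10.0)
print(len(m), len(s))  # 2 1
```

A raised concrete barrier would be caught by the elevation test, while a flat grass hedge with a weak return would be caught by the intensity test.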
ARTICLE | doi:10.20944/preprints202206.0289.v1
Subject: Engineering, Civil Engineering Keywords: database; eccentric punching shear; experiments; flat slab; punching; reinforced concrete; shear
Online: 21 June 2022 (05:42:44 CEST)
Eccentric punching shear can occur in concrete slab-column connections when the connection is subjected to shear and unbalanced moments. Typically, this situation arises at edge and corner columns and is thus a common practical case. However, most punching experiments in the literature concern concentric punching shear. This paper presents a database of eighty-eight experiments on flat slabs under eccentric punching shear, including a summary of the testing procedure of each reference and a description of the slab specimens. Additionally, a linear finite element analysis of all the specimens is included to determine the relevant sectional shear forces and moments. Finally, the ultimate shear stresses from the database experiments are compared to the shear capacities determined with ACI 318-19, Eurocode 2 NEN-EN 1992-1-1:2005, and the Model Code 2010. The comparison shows that the Model Code 2010 is the most precise in its predictions, with an average tested-over-predicted ratio of 0.96 and a coefficient of variation of 27.96%. It can be concluded that this study reveals the inconsistencies of the currently used design methods and the lack of experimental information.
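The two summary statistics used above, the mean tested-over-predicted ratio and its coefficient of variation, can be computed as follows. The tested and predicted capacities below are made-up numbers, not values from the database:

```python
import statistics

# Mean tested/predicted ratio and coefficient of variation (CoV) used to
# rank code predictions; the capacities below are hypothetical examples,
# not values from the eccentric punching shear database.
def ratio_stats(tested, predicted):
    ratios = [t / p for t, p in zip(tested, predicted)]
    mean = statistics.mean(ratios)
    cov = statistics.stdev(ratios) / mean * 100  # percent
    return mean, cov

tested = [410.0, 365.0, 520.0, 298.0]      # experimental capacities (kN)
predicted = [430.0, 400.0, 505.0, 330.0]   # code-predicted capacities (kN)
mean, cov = ratio_stats(tested, predicted)
print(round(mean, 3), round(cov, 1))
```

A mean ratio near 1.0 indicates unbiased predictions on average; a lower CoV indicates more consistent predictions across specimens.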
ARTICLE | doi:10.20944/preprints202111.0019.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: Industry 4.0; Database; Data models; Big Data & Analytics; Asset Administration Shell
Online: 1 November 2021 (13:01:51 CET)
The data-oriented paradigm has proven to be fundamental for the technological transformation process that characterizes Industry 4.0 (I4.0) so that Big Data & Analytics is considered a technological pillar of this process. The literature reports a series of system architecture proposals that seek to implement the so-called Smart Factory, which is primarily data-driven. Many of these proposals treat data storage solutions as mere entities that support the architecture's functionalities. However, choosing which logical data model to use can significantly affect the performance of the architecture. This work identifies the advantages and disadvantages of relational (SQL) and non-relational (NoSQL) data models for I4.0, taking into account the nature of the data in this process. The characterization of data in the context of I4.0 is based on the five dimensions of Big Data and a standardized format for representing information of assets in the virtual world, the Asset Administration Shell. This work allows identifying appropriate transactional properties and logical data models according to the volume, variety, velocity, veracity, and value of the data. In this way, it is possible to describe the suitability of SQL and NoSQL databases for different scenarios within I4.0.
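The SQL-versus-NoSQL trade-off discussed above can be made concrete with a toy example that stores the same asset reading both ways. The schema is a hypothetical illustration, not the Asset Administration Shell metamodel:

```python
import json
import sqlite3

# Contrast a relational (SQL) row with a schemaless document (NoSQL-style)
# representation of the same machine-asset reading. The field names are
# hypothetical, not taken from the Asset Administration Shell.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (asset_id TEXT, ts TEXT, temp REAL)")
conn.execute("INSERT INTO readings VALUES ('press-01', '2021-11-01T13:00', 71.5)")
row = conn.execute(
    "SELECT temp FROM readings WHERE asset_id = 'press-01'").fetchone()

# The same record as a document, as a document store would hold it; nested
# payloads can vary per asset without a schema migration.
doc = json.dumps({"assetId": "press-01", "ts": "2021-11-01T13:00",
                  "payload": {"temp": 71.5, "unit": "C"}})
print(row[0], json.loads(doc)["payload"]["temp"])  # 71.5 71.5
```

The relational row enforces a fixed schema and transactional guarantees, while the document tolerates the variety dimension of Big Data at the cost of weaker integrity constraints.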
DATA DESCRIPTOR | doi:10.20944/preprints202106.0368.v1
Subject: Life Sciences, Biochemistry Keywords: Microbial Mash database, Mash distance, Genome containment, Type material, Microbial taxonomy
Online: 14 June 2021 (14:54:32 CEST)
The analysis of curated genomic, metagenomic, and proteomic data is of paramount importance in the fields of biology, medicine, education, and bioinformatics. Although this type of data is usually hosted in raw form in free international repositories, accessing it requires plenty of computing, storage, and processing capacity for the domestic user. The purpose of the study is to offer a comprehensive set of genomic and proteomic reference data, in an accessible and easy-to-use form, to the scientific community. A representative type-material set of genomes, proteomes and metagenomes was downloaded directly from https://www.ncbi.nlm.nih.gov/assembly/ and from the Genome Taxonomy Database, associated with the major groups of Bacteria, Archaea, Viruses, and Fungi. Sketched databases were subsequently created and stored as handy raw reduced representations using the Mash software. Our dataset reduces nearly 100 GB of disk space to 585.78 MB and represents 87,476 genomic/proteomic records from eight informative contexts, which have been prefiltered to make them accessible, usable, and user-friendly with modest computational resources. Potential uses of this dataset include, but are not limited to, microbial species delimitation, estimation of genomic distances, detection of genomic novelties, and paired comparisons between proteomes, genomes, and metagenomes.
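The size reduction reported above comes from sketching: comparing genomes via their smallest k-mer hashes rather than full sequences. A minimal pure-Python illustration of the bottom-s MinHash idea underlying Mash distances (not Mash's actual implementation or parameters) is:

```python
import hashlib

# Minimal bottom-s MinHash sketch, illustrating the principle behind Mash
# distances. This is an illustration only, not the Mash implementation.
def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def sketch(seq, k=4, s=8):
    """Keep the s smallest k-mer hashes as a reduced representation."""
    hashes = sorted(int(hashlib.sha1(km.encode()).hexdigest(), 16)
                    for km in kmers(seq, k))
    return set(hashes[:s])

def jaccard_estimate(a, b, s=8):
    sa, sb = sketch(a, s=s), sketch(b, s=s)
    merged = sorted(sa | sb)[:s]  # bottom-s of the merged sketches
    shared = sum(1 for h in merged if h in sa and h in sb)
    return shared / len(merged)

print(jaccard_estimate("ACGTACGTAC", "ACGTACGTAC"))  # 1.0 for identical input
```

The estimated Jaccard index can then be converted into a genomic distance; the key point is that only the tiny sketches, not the genomes, need to be stored and compared.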
Subject: Keywords: Business Intelligence; Information systems; database; schemas; SGBD; DSS; GDSS; EIS; EDSS
Online: 24 March 2021 (12:02:23 CET)
The objective of this article is to present some concepts for understanding a bit more about what business intelligence is, what it means and what it is for, how it has influenced the life of companies in creating their business strategies, and why this tool helps us make decisions in an organization.
REVIEW | doi:10.20944/preprints202004.0012.v1
Subject: Chemistry, General & Theoretical Chemistry Keywords: chemoinformatics; chemical space; database; LANaPD; molecular diversity; drug discovery; natural sources
Online: 2 April 2020 (04:47:13 CEST)
Around the world, the number of compound databases of natural products in the public domain is rising. This is in line with the increasingly synergistic combination of natural product research and chemoinformatics. Towards this global endeavor, countries in Latin America are assembling, curating, and analyzing the contents and diversity of the natural products available in their geographical regions. In this manuscript we collect and analyze the efforts that countries in Latin America have made so far to build natural product databases. We further encourage the scientific community, particularly in Latin America, to continue their efforts toward building quality natural product databases and, whenever possible, to make them publicly accessible. It is proposed that all compound collections could be assembled into a unified resource called LANaPD: the Latin America Natural Products Database. Opportunities and challenges in building, distributing, and maintaining LANaPD are also discussed.
REVIEW | doi:10.20944/preprints201901.0099.v1
Subject: Earth Sciences, Other Keywords: SISAL database; speleothem; cave; oxygen isotopes; North America; Central America; Caribbean
Online: 10 January 2019 (11:58:08 CET)
Speleothem oxygen isotope records from the Caribbean, Central, and North America reveal climatic controls that include orbital variation, deglacial forcing related to ocean circulation and ice sheet retreat, and the influence of local and remote sea surface temperature variations. Here, we review these records and the global climate teleconnections they suggest following the recent publication of the Speleothem Isotopes Synthesis and Analysis (SISAL) database. We find that low-latitude records generally reflect changes in precipitation, whereas higher latitude records are sensitive to temperature and moisture source variability. Tropical records suggest precipitation variability is forced by orbital precession and North Atlantic Ocean circulation driven changes in atmospheric convection on long timescales, and tropical sea surface temperature variations on short timescales. On millennial timescales, precipitation seasonality in southwestern North America is related to North Atlantic climate variability. Great Basin speleothem records are closely linked with changes in Northern Hemisphere summer insolation. Although speleothems have revealed these critical global climate teleconnections, the paucity of continuous records precludes our ability to investigate climate drivers from the whole of Central and North America for the Pleistocene through modern. This underscores the need to improve spatial and temporal coverage of speleothem records across this climatically variable region.
ARTICLE | doi:10.20944/preprints201706.0062.v1
Subject: Social Sciences, Library & Information Science Keywords: h-index; citations; published version; Scopus database; highly cited paper; bibliometrics
Online: 14 June 2017 (06:07:12 CEST)
The number of citations that a paper has received is the most commonly used indicator to measure the quality of research. Researchers, journals, and universities want to receive more citations for their scholarly publications to increase their h-index, impact factor, and ranking, respectively. In this paper, we analyze the effect of the number of available Google Scholar versions of a paper on its citation count. We analyzed 10,162 papers published in 2010 by the top five Malaysian universities and indexed in the Scopus database. We then developed software to collect the number of citations and versions of each paper from Google Scholar automatically. The Spearman correlation coefficient revealed a significant positive association between the number of Google Scholar versions of a paper and the number of times the paper has been cited.
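The rank correlation used above can be computed from scratch as follows. The version and citation counts below are hypothetical data points, not the study's 10,162-paper sample:

```python
# Spearman's rank correlation between Google Scholar version counts and
# citation counts, computed from scratch; the data are made-up examples.
def rank(xs):
    """Assign 1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

versions = [1, 3, 5, 2, 8, 6]
citations = [0, 4, 9, 3, 20, 11]
print(round(spearman(versions, citations), 3))  # 1.0: perfectly monotonic here
```

Because Spearman's coefficient works on ranks, it captures any monotonic association between versions and citations without assuming linearity, which suits heavily skewed citation counts.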
ARTICLE | doi:10.20944/preprints202209.0317.v1
Subject: Medicine & Pharmacology, Dermatology Keywords: Bibliometric analysis; Scopus database; Syphilis; Sexually Transmitted Infections; Public Health; Research; Global
Online: 21 September 2022 (07:14:19 CEST)
Sexually transmitted infections have considerable effects on human sexual and reproductive health. Their presence is ubiquitous despite decades of prevention and management. The present study provides an insightful bibliometric analysis of syphilis based on the Scopus repository. Given the dearth of a consolidated bibliometric analysis of syphilis, this investigation compiles the literature of the last century (1921-2021) to gain insight into the publications pertinent to the burden, diagnosis, treatment, and management of syphilis. In this study, we provide the year-wise and subject-wise publications, types of articles, countries, funding organizations, institutions, citations, and H-index. The data obtained from the Scopus database were exported to CSV format and then converted to Microsoft Excel for analysis to curtail the chances of error in the information. It is evident that the USA possesses the highest number of publications. This study therefore contributes considerably to future leaders, researchers, and specialists/clinicians of the domain.
REVIEW | doi:10.20944/preprints202202.0200.v1
Subject: Life Sciences, Microbiology Keywords: antibiotic resistance genes; antibiotic resistance gene database; annotation of antibiotic resistance genes
Online: 17 February 2022 (04:52:10 CET)
As the prevalence of antimicrobial resistance genes increases in microbes, we are facing the return of the preantibiotic era. Consequently, the number of studies concerning antibiotic resistance and its spread in the environment is rapidly growing. Next-generation sequencing technologies are widely used in many areas of biological research, and antibiotic resistance is no exception. For the rapid annotation of whole genome sequencing and metagenomic results with respect to antibiotic resistance, several tools and data resources have been developed. These databases, however, can differ fundamentally in the number and type of genes and resistance determinants they comprise. Furthermore, the annotation structure and metadata stored in these resources also contribute to their differences. Several previous reviews have been published on the tools and databases for resistance gene annotation; however, to our knowledge, no previous review has focused solely and in depth on the differences among the databases. In this review, we compare the most well-known and widely used antibiotic resistance gene databases based on their structure and content. We believe that this knowledge is fundamental for selecting the most appropriate database for a research question and for the development of new tools and resources for resistance gene annotation.
ARTICLE | doi:10.20944/preprints202012.0387.v1
Subject: Life Sciences, Biochemistry Keywords: next-generation sequencing; database; variant annotation; variant classification; data management; clinical genomics
Online: 15 December 2020 (13:14:21 CET)
The rapid evolution of next-generation sequencing in clinical settings, and the resulting challenge of interpreting variants in the light of constantly updated information, require robust data management systems and organized approaches to variant reinterpretation. In this paper, we present iVar: a freely available and highly customizable tool provided with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts VCF files and text annotation files as input and elaborates them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historicized attributes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search functionality can be exploited to periodically check whether the pathogenicity-related data of a variant have changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients present in the database carrying a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22569 unique variants. iVar has proven to be a useful tool with good performance for collecting and managing data from medium-throughput
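The step of collapsing thousands of per-sample VCF files into a set of unique variants, while remembering which samples carry each one, can be sketched as follows. The keying scheme and field handling are a hypothetical illustration, not iVar's actual data model:

```python
# Toy sketch of collapsing VCF records from many samples into unique
# variants keyed by (chrom, pos, ref, alt), tracking carrying samples per
# variant. The scheme is hypothetical, not iVar's actual schema.
def merge_vcf_lines(sample_id, vcf_lines, store):
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        key = (chrom, int(pos), ref, alt)
        store.setdefault(key, set()).add(sample_id)
    return store

store = {}
merge_vcf_lines("S1", ["#CHROM\tPOS\tID\tREF\tALT",
                       "chr1\t12345\t.\tA\tG"], store)
merge_vcf_lines("S2", ["chr1\t12345\t.\tA\tG",
                       "chr2\t777\t.\tC\tT"], store)
print(len(store), sorted(store[("chr1", 12345, "A", "G")]))  # 2 ['S1', 'S2']
```

With such a mapping, reverse lookup from a reinterpreted variant to every affected patient (the recontacting scenario described above) is a single dictionary access.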
ARTICLE | doi:10.20944/preprints202007.0132.v1
Subject: Life Sciences, Biotechnology Keywords: coronavirus; spike protein; database; sequence alignment; mutation; homology model; hydrophobic amino acids
Online: 7 July 2020 (16:49:04 CEST)
Analysis of SARS-CoV-2 spike protein sequences from over 19 countries, submitted to biological databases around the globe, was carried out with the help of bioinformatics tools and structure prediction databases. Initial data analysis showed that entry of the virus into different geographic regions started in January 2020. Meanwhile, alignment of spike protein sequences of SARS-CoV-2 isolates from China and other countries revealed a critical mutation, D614G. Surprisingly, the D614G mutation was not seen in early samples submitted in January, but it gradually started appearing globally from March 2020. However, mutations of amino acids in the spike protein other than D614G, exhibiting similar pI and altered polarity, were found to be specific to geographical regions. Besides, prediction of a homology model for the interaction of the spike protein showed a predominant role of chain C of the trimeric spike protein in adhering to the receptor binding domain (RBD) of the human ACE2 receptor. Furthermore, prediction of glycosylation points revealed that there are about 20 potential N-glycosylation sites on the spike protein. We believe that the information presented here will not only help in a thorough understanding of infectivity but also enhance the knowledge of the scientific community in developing prophylactics and/or therapeutics for the SARS-CoV-2 virus.
COMMUNICATION | doi:10.20944/preprints202003.0125.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: 2019-nCoV; Darunavir; ACE-2; Receptor Binding Domain; Metastable Conformation; FDA database
Online: 7 March 2020 (16:28:05 CET)
The transnational spread of the coronavirus (2019-nCoV) first detected in Wuhan is causing global panic; thus, accelerated research into clinical intervention is of high necessity. The spike glycoprotein structure has been resolved, and its affinity to human angiotensin-converting enzyme 2 (ACE-2) has been experimentally validated. Here, using computational methods, a metastable conformation of the 2019-nCoV-RBD/ACE-2 complex has been revealed, and the FDA database of approved drugs has been docked into the interface. Darunavir has been discovered as a high-affinity ligand candidate capable of disrupting communication between the 2019-nCoV-RBD and ACE-2. Darunavir, in addition to its previously known role as an anti-HIV protease inhibitor, is now repurposable for the treatment of 2019-nCoV disease, acting via disruption of cellular recognition, binding and invasion.
ARTICLE | doi:10.20944/preprints201905.0091.v1
Subject: Medicine & Pharmacology, Pharmacology & Toxicology Keywords: Iranian traditional medicine; alternative and complementary medicine; database; natural products; Mizaj; temperament
Online: 8 May 2019 (10:08:54 CEST)
As a holistic medical school, Iranian traditional medicine (ITM) considers the human body as a dynamic and intricate network of interconnecting processes. Currently, systems biology, and more precisely systems medicine and pharmacology, can aid in providing rationalizations for many traditional medications and treatments and in elucidating the great deal of knowledge they can offer to guide future research in medicine. Therefore, the re-organization and standardization of traditional medicine data are needed more than ever before. To address this issue, we have constructed UNaProd, a Universal Natural Product database for the materia medica of ITM. Primarily based on Makhzan al-Advieh, the most recent encyclopedia of materia medica in ITM with the largest number of monographs, this database was created using both text mining methods and manual editing. UNaProd currently hosts 2696 monographs, from herbal to animal to mineral compounds, with 16 diverse attributes such as origin and scientific name. In the current version, UNaProd is hyperlinked to the IrGO and CMAUP databases for Mizaj and molecular features, respectively, and it is freely available at http://jafarilab.com/unaprod/.
ARTICLE | doi:10.20944/preprints201709.0126.v1
Subject: Engineering, Other Keywords: 3D terrain models; synthetic environment; modeling and simulation; OGC standards; common database
Online: 26 September 2017 (04:13:40 CEST)
Recent advances in sensor and platform technologies such as satellite systems, unmanned aerial vehicles (UAV), manned aerial platforms, and ground-based sensor networks have resulted in massive volumes of data being produced and collected about the earth. Processing, managing, and analyzing these data is one of the main challenges in the 3D synthetic representation used in modeling and simulation (M&S) of the natural environment. M&S devices, such as flight simulators, traditionally require a variety of different databases to provide a synthetic representation of the world. M&S often requires the integration of data from a variety of sources stored in different formats. Thus, for the simulation of a complex synthetic environment, such as a 3D terrain model, tackling interoperability among its components (geospatial data, natural and man-made objects, dynamic and static models) is a critical challenge. Conventional approaches used local proprietary data models and formats. These approaches often lacked interoperability and created silos of content within the simulation community. Therefore, open geospatial standards are increasingly perceived as a means to promote interoperability and reusability for 3D M&S. In this paper, the Open Geospatial Consortium (OGC) CDB Standard is introduced. “CDB” originally referred to Common DataBase, but is currently considered a name with no abbreviation in the OGC community. The OGC CDB is an international standard for structuring, modeling, and storing geospatial information required in high-performance modeling and simulation applications. CDB defines the core conceptual models, use cases, requirements, and specifications for employing geospatial data in 3D M&S. The main features of the OGC CDB Standard are run-time performance, a fully plug-and-play interoperable geospatial data store, usefulness in 3D and dynamic simulation environments, and the ability to integrate proprietary and open-source data formats.
Furthermore, compatibility with the OGC standards baseline reduces the complexity of discovering, transforming, and streaming geospatial data into the synthetic environment and makes them more widely acceptable to major geospatial data/software producers. This paper includes an overview of OGC CDB version 1.0, which defines a conceptual model and file structure for the storage, access, and modification of a multi-resolution 3D synthetic environment data store. Finally, this paper presents a perspective on future versions of the OGC CDB and the steps for harmonizing the OGC CDB standard with the other OGC/ISO standards in the baseline.
ARTICLE | doi:10.20944/preprints202103.0640.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Databases; database administration; database management systems; counting; storage; structure; search; NoSQL; SQL; Oracle; relational databases; non-relational databases; magnetic tapes; punched tapes; relational model; Data mining; Big Data; Data warehouse
Online: 25 March 2021 (16:05:52 CET)
Databases are by far the most valuable asset of companies. The first databases were born when the need was seen not only to count but also to keep some type of record of elements such as crops, animals, money and properties, a record that could be consulted and modified according to the situation. After that, these databases could not remain disorganized; they also needed to be managed and administered under established standards that facilitate their understanding and management, not only by their creators but by the other people who subsequently administer them. Databases and database management systems have an interesting evolutionary history that deserves to be analyzed, and that is the objective of this document. Along with databases and their management systems arises data mining, which, briefly, is the job of finding common patterns in various data sources and determining how they can be used to predict situations or outcomes of various circumstances. We also focus on another topic, Oracle Data Mining, which roughly merges data mining with Oracle, making it a powerful tool for obtaining information and predicting results based on statistics. In this article we study and analyze the ideas, concepts and basic examples that make up DBMSs and data mining, and we try to go deeper into the use of decision techniques such as advanced statistical algorithms. We also present a fictitious example of the application of these techniques: predicting which products can be sold based on their relationship with others. Finally, we give a brief explanation of association rules, the data mining cycle, the types of learning, and the evolution that data mining has had.
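The product-prediction example described above rests on association rules, whose two basic metrics, support and confidence, can be computed directly. The baskets below are made-up transactions for illustration:

```python
# Minimal support/confidence computation for an association rule such as
# {bread} -> {butter}; the transactions are made-up examples.
def rule_metrics(transactions, antecedent, consequent):
    n = len(transactions)
    both = sum(1 for t in transactions
               if antecedent <= t and consequent <= t)  # <= is subset test
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n                     # how often the rule applies at all
    confidence = both / ante if ante else 0.0  # how reliable it is when it does
    return support, confidence

baskets = [{"bread", "butter", "milk"},
           {"bread", "butter"},
           {"bread", "jam"},
           {"milk", "eggs"}]
s, c = rule_metrics(baskets, {"bread"}, {"butter"})
print(s, c)  # support 0.5, confidence about 0.667
```

A rule with high confidence but negligible support applies too rarely to be useful, which is why mining algorithms filter on both thresholds.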
ARTICLE | doi:10.20944/preprints202207.0037.v1
Subject: Earth Sciences, Environmental Sciences Keywords: remote sensing; satellite; altimetry; water level; water inland; essential climate variable; database; hydrology
Online: 4 July 2022 (08:02:24 CEST)
Surface water availability is a fundamental environmental variable for implementing effective climate adaptation and mitigation plans, as expressed by scientific, financial and political stakeholders. Recently published requirements urge the need for homogenised access to long historical records at a global scale, together with a standardised characterisation of the accuracy of observations. While satellite altimeters offer world-coverage measurements, existing initiatives and online platforms provide derived water level data. However, these are sparse, particularly in complex topographies. This study introduces a new methodology in two steps: 1) teroVIR, a virtual station extractor for more comprehensive global and automatic monitoring of water bodies, and 2) teroWAT, a multi-mission, interoperable water level processor handling all terrain types. L2 and L1 altimetry products are used, with state-of-the-art retracker algorithms, in the methodology. The work presents a benchmark between teroVIR and current platforms in West Africa, Kazakhstan and the Arctic: teroVIR shows an unprecedented increase from 55% to 99% in spatial coverage. A large-scale validation of teroWAT results in an unbiased root mean square error (ubRMSE) of 0.638 m on average for 36 locations in West Africa. Traditional metrics (ubRMSE, median absolute deviation, Pearson coefficient) disclose significantly better values for teroWAT when compared with existing platforms, of the order of 8 cm and 5% improvement in error and correlation, respectively. teroWAT shows unprecedented results in the Arctic, using an L1-product-based algorithm instead of an L2 one, reducing the error by almost 4 m on average. To further compare teroWAT with existing methods, a new scoring option, teroSCO, is presented, measuring the quality of the validation of time series transversally and objectively across different strategies.
Finally, teroVIR and teroWAT are implemented as platform-agnostic modules and used by flood forecasting and river discharge methods as relevant examples. A review of various applications for miscellaneous end-users is given, tackling the educational challenge raised by the community.
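The validation metrics cited above (ubRMSE and Pearson correlation) can be sketched in a few lines of Python; the gauge and altimetry series below are invented for illustration and are not teroWAT outputs:

```python
import math

def ubrmse(obs, sat):
    """Unbiased RMSE: remove the mean bias before computing the RMSE."""
    bias = sum(s - o for o, s in zip(obs, sat)) / len(obs)
    return math.sqrt(sum((s - o - bias) ** 2 for o, s in zip(obs, sat)) / len(obs))

def pearson(obs, sat):
    """Pearson correlation coefficient between two series."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sat) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sat))
    vo = math.sqrt(sum((o - mo) ** 2 for o in obs))
    vs = math.sqrt(sum((s - ms) ** 2 for s in sat))
    return cov / (vo * vs)

# Illustrative in-situ vs. altimetry water levels (m); values are made up.
gauge = [2.10, 2.35, 2.80, 3.10, 2.60]
altim = [2.30, 2.50, 3.05, 3.30, 2.75]
print(round(ubrmse(gauge, altim), 3))   # 0.037
print(round(pearson(gauge, altim), 3))  # 0.995
```

Removing the bias first is what distinguishes ubRMSE from plain RMSE: a constant offset between the altimeter and the gauge datum does not penalise the score.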
ARTICLE | doi:10.20944/preprints202108.0502.v1
Subject: Earth Sciences, Geology Keywords: Geological surveys; Field mapping; Python; QGIS plugin; RDBMS; Seismic microzonation; SQLite-SpatiaLite Database.
Online: 26 August 2021 (09:54:56 CEST)
MzSTools is a plugin for QGIS developed by the National Research Council (CNR) as part of the activities concerning the coordination of seismic microzonation studies in Italy. It stems from the need for a practical and easy-to-use tool to carry out seismic microzonation (SM) studies by producing standards-compliant geographic databases and maps, thus making them accurate, homogeneous and uniform for all municipalities in Italy. A geodatabase based on the SQLite/SpatiaLite Relational Database Management System (RDBMS) has been designed to collect and store data related to elements such as: geognostic surveys; bedrock and cover terrains; superficial and buried geomorphological elements; tectonic-structural elements; elements of geological instability such as landslide zones, liquefaction zones and zones affected by active and capable faults; microzones that are homogeneous in a seismic perspective; and microzones characterized by a seismic amplification factor. The QGIS plugin provides tools such as data entry forms designed with Qt Designer; a QGIS project template with layers, symbol libraries and graphic styles; and layouts for the SM maps. MzSTools assembles in a single software environment a set of tools useful for those who work in seismic microzonation. The plugin is open source, with its code hosted on GitHub, and is published via the official QGIS plugins repository (https://plugins.qgis.org/plugins/MzSTools/).
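As a rough sketch of the kind of RDBMS storage the plugin relies on, the snippet below creates a simplified survey table with plain SQLite; the actual MzSTools geodatabase follows the Italian SM standards and uses SpatiaLite geometry columns, and the table name and fields here are hypothetical:

```python
import sqlite3

# Hypothetical, simplified table for geognostic survey points; the real
# schema uses SpatiaLite geometries, which plain sqlite3 does not provide.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE survey_point (
        id INTEGER PRIMARY KEY,
        survey_type TEXT NOT NULL,   -- e.g. 'borehole', 'MASW'
        depth_m REAL,
        lon REAL NOT NULL,
        lat REAL NOT NULL
    )
""")
con.execute(
    "INSERT INTO survey_point (survey_type, depth_m, lon, lat) VALUES (?, ?, ?, ?)",
    ("borehole", 30.0, 14.12, 40.82),
)
rows = con.execute("SELECT survey_type, depth_m FROM survey_point").fetchall()
print(rows)  # [('borehole', 30.0)]
```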
ARTICLE | doi:10.20944/preprints202105.0386.v1
Subject: Life Sciences, Biochemistry Keywords: tumor microenvironment; meta-analysis; tumor stroma; breast cancer; LCM; microdissection; transcriptomics; microarray; database
Online: 17 May 2021 (13:17:53 CEST)
Background: transcriptome data provide a valuable resource for the study of cancer molecular mechanisms, but technical biases, sample heterogeneity and small sample sizes result in poorly reproducible lists of regulated genes. Additionally, the presence of multiple cellular components contributing to cancer development complicates the interpretation of bulk transcriptomic profiles. Methods: we collected 48 microarray datasets of laser capture microdissected breast tumors and performed a meta-analysis to identify robust lists of genes differentially expressed in these tumors. We created a database with carefully harmonized metadata to be used as a resource for the research community. Results: combining the results of multiple datasets improved the statistical power, and analyzing stroma and epithelium separately allowed us to identify genes with different contributions in each compartment. Conclusions: our database can profitably aid biomarker discovery and is readily accessible through a user-friendly web interface (https://aurorasavino.shinyapps.io/metalcm/).
ARTICLE | doi:10.20944/preprints202102.0502.v1
Subject: Keywords: Alzheimer's Disease; Onset Age; Bilingualism; Cognitive Reserve; Dementia; Mild Cognitive Impairment; ADNI database
Online: 23 February 2021 (09:20:19 CET)
Background: This paper investigates the statistical relationship between bilingualism and the Onset Age (OA) of AD and MCI across a clinical sample consisting of 580 Alzheimer's Disease (AD) subjects and 1264 Mild Cognitive Impairment (MCI) subjects, via a statistical analysis conducted on the sample retrieved from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Method: To investigate whether bilingualism has any correlation with the OAs of AD or MCI subjects, our study leverages the full potential of the ADNI dataset, which covers both the OA and the bilingualism status of the AD and MCI subjects. Prior to performing any meaningful statistical analysis, a regression model and a probabilistic model were developed in parallel to fill in the missing OA and bilingualism values. A simple least-squares regression model, with the registered age at the Mini-Mental State Examination (MMSE) as the independent variable, was used to estimate the OA of the AD and MCI subjects in the ADNI dataset. After filling in the missing OA values, the number of subjects relevant for the statistical analysis increased from 816 (AD: 371, MCI: 445) to 1844 (AD: 580, MCI: 1264), which greatly enlarged the representation of the AD and MCI sample in the ADNI population. With the increased sample size, a novel probabilistic classification model was introduced to infer an ADNI subject's bilingualism when relevant demographic information and a deterministic outcome were not readily available from the ADNI dataset. The weighted average OA for the bilinguals and the monolinguals was then computed, where the weights for the probabilistic labels were assigned based on the percentage of bilingualism in the general US population. Finally, a statistical analysis was performed to test whether any statistically significant correlation exists between the OA and the bilingualism of the AD and MCI subjects within the ADNI dataset.
Findings: Our preliminary study demonstrates no significant statistical difference between the OA of the bilinguals and the monolinguals within the ADNI dataset. Thus, the monolingual speakers within the ADNI dataset do not statistically manifest earlier onset, as compared to the bilingual speakers, which is slightly inconsistent with some earlier statistical findings that bilingual speakers enjoy certain distinctive advantages, such as late onset of AD, as compared to monolingual counterparts.
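The onset-age imputation step described above amounts to a single-predictor least-squares fit of OA against registered age at MMSE; the sketch below uses invented values, not the ADNI data:

```python
# Fit OA ~ a + b * age_at_MMSE on complete cases, then predict missing OAs.
def fit_ols(x, y):
    """Ordinary least squares for one predictor: returns intercept, slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

ages_mmse = [72, 75, 78, 81, 84]   # complete cases: age at MMSE (made up)
onset_age = [70, 72, 76, 78, 82]   # complete cases: known OA (made up)
a, b = fit_ols(ages_mmse, onset_age)
print(round(a + b * 77, 1))        # imputed OA for a subject aged 77 at MMSE: 74.6
```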
ARTICLE | doi:10.20944/preprints201810.0062.v1
Subject: Earth Sciences, Geophysics Keywords: ASTER instrument, stereo, digital elevation model, global database, optical sensor, water body detection.
Online: 3 October 2018 (17:01:08 CEST)
A waterbody detection technique is an essential part of digital elevation model (DEM) generation to delineate land-water boundaries and set flattened elevations. This paper describes the technical methodology for improving the initial tile-based waterbody data that are created during production of the ASTER GDEM, because without improvement such tile-based waterbody data are not suitable for incorporating into the new ASTER GDEM Version 3. Waterbodies are classified into three categories: sea, lake, and river. For sea-waterbodies, the effect of sea ice is removed to better delineate sea shorelines in high latitude areas, because sea ice prevents accurate delineation of sea shorelines. For lake-waterbodies, the major part of the processing is to set the unique elevation value for each lake using a mosaic image that covers the entire lake area. Rivers present a unique challenge, because their elevations gradually step down from upstream to downstream. Initially, visual inspection is required to separate rivers from lakes. A stepwise elevation assignment, with a step of one meter, is carried out by manual or automated methods, depending on the situation. The ASTER GWBD product consists of a global set of 1° latitude-by-1° longitude tiles containing water body attribute and elevation data files in geographic latitude and longitude coordinates and with one-arc-second posting. Each tile contains 3601-by-3601 data points. All improved waterbody elevation data are incorporated into the ASTER GDEM to reflect the improved results.
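The tiling scheme described above (1°-by-1° tiles of 3601-by-3601 points at one-arc-second posting) can be illustrated as follows; the tile naming convention used here is an assumption modeled on common DEM products, not taken from the GWBD specification:

```python
import math

def tile_and_pixel(lat, lon):
    """Map a coordinate to a 1-degree tile name and (row, col) pixel index
    at 1 arc-second posting. Hypothetical naming: hemisphere letters plus
    the tile's south-west corner; row 0 is the tile's northern edge."""
    tlat, tlon = math.floor(lat), math.floor(lon)
    name = f"{'N' if tlat >= 0 else 'S'}{abs(tlat):02d}" \
           f"{'E' if tlon >= 0 else 'W'}{abs(tlon):03d}"
    col = round((lon - tlon) * 3600)        # 3600 arc-seconds span a tile,
    row = round((tlat + 1 - lat) * 3600)    # hence 3601 posts per edge
    return name, row, col

print(tile_and_pixel(35.5, 138.25))  # ('N35E138', 1800, 900)
```

The 3601st row and column duplicate the first line of the neighbouring tile, which is why the edges overlap by one post.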
ARTICLE | doi:10.20944/preprints201608.0073.v2
Subject: Earth Sciences, Atmospheric Science Keywords: land surface temperature; thermal infrared; calibration; generalized split-window; mono-window; database; radiative transfer
Online: 16 September 2016 (13:12:09 CEST)
Land Surface Temperature (LST) is routinely retrieved from remote sensing instruments using semi-empirical relationships between top-of-atmosphere (TOA) radiances and LST, with ancillary data such as total column water vapor or emissivity. These algorithms are calibrated using a set of forward radiative transfer simulations that return the TOA radiances given the LST and the thermodynamic profiles. The simulations are done in order to cover a wide range of surface and atmospheric conditions and viewing geometries. This work analyses calibration strategies, considering some of the most critical factors that need to be taken into account when building a calibration dataset covering the full dynamic range of relevant variables. A sensitivity analysis of split-window and single-channel algorithms revealed that selecting a set of atmospheric profiles that spans the full range of physically possible combinations of surface temperature and total column water vapor is beneficial for the quality of the regression model. However, the calibration is extremely sensitive to the low-level structure of the atmosphere, indicating that the presence of atmospheric boundary layer features such as temperature inversions or strong vertical gradients of thermodynamic properties may affect LST retrievals in a non-trivial way. This article describes the criteria established in the EUMETSAT Land Surface Analysis – Satellite Application Facility to calibrate its LST algorithms, applied both to current and forthcoming sensors.
ARTICLE | doi:10.20944/preprints202201.0227.v1
Subject: Behavioral Sciences, Other Keywords: Bayesian inference; race and ethnicity imputation; All Payer Claims Database; vital statistics death records; validation
Online: 17 January 2022 (12:40:15 CET)
Background: All Payer Claims Databases (APCDs) are a rich source of health information; however, race and ethnicity (R&E) data are largely missing. Bayesian Improved Surname Geocoding (BISG) is a common R&E imputation method, yet validation of BISG in APCDs is lacking. We used BISG to impute missing R&E in the Oregon APCD. Methods: BISG-imputed R&E for Asian Pacific Islanders (API), Blacks, Hispanics and Whites were contrasted with the gold standard (vital statistics), and improvements in sensitivity and specificity were assessed. Logistic regression examined whether missing R&E was random across patient characteristics. Results: Among 85,857 individuals in the study, 32.1% (n=27,594) had missing R&E. Missing R&E was not randomly distributed: there were higher odds of missingness among males, Whites, those aged 65 and older, and commercially insured individuals. Differences in the percent missing were also found by co-morbid conditions and mortality causes. Imputing the missing R&E with the BISG method improved the sensitivity to identify White, Black, API, and Hispanic individuals. Conclusions: APCDs can benefit from enhancing missing R&E with BISG imputation to perform more robust population-health-level analyses and identify inequities according to R&E without losing power or dropping non-random records with missing R&E data.
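The sensitivity/specificity assessment against the vital-statistics gold standard can be sketched as below; the labels are invented, and real BISG output is a probability vector per person rather than a hard label:

```python
def sens_spec(gold, imputed, group):
    """Sensitivity and specificity of imputed race/ethnicity labels for one
    group, judged against gold-standard (vital statistics) labels."""
    tp = sum(1 for g, i in zip(gold, imputed) if g == group and i == group)
    fn = sum(1 for g, i in zip(gold, imputed) if g == group and i != group)
    tn = sum(1 for g, i in zip(gold, imputed) if g != group and i != group)
    fp = sum(1 for g, i in zip(gold, imputed) if g != group and i == group)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example with hard labels; values are made up for illustration.
gold    = ["White", "Black", "White", "Hispanic", "Black", "White"]
imputed = ["White", "Black", "White", "White",    "White", "White"]
print(sens_spec(gold, imputed, "Black"))  # (0.5, 1.0)
```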
ARTICLE | doi:10.20944/preprints201804.0088.v1
Subject: Arts & Humanities, History Keywords: historical dataset; geocoding; localisation; geohistorical objects; database; GIS; collaborative; citizen science; crowd-sourced; digital humanities
Online: 8 April 2018 (09:13:10 CEST)
The latest developments in digital humanities have increasingly enabled the construction of large data sets which can easily be accessed and used. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming the indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with temporal information and are usually based on a strict hierarchy (country, city, street, house number, etc.) that is hard, if not impossible, to use with historical data. Indeed, historical data are full of uncertainties (temporal, textual, positional accuracy, confidence in historical sources) that cannot be ignored or entirely resolved. We propose an open source, open data, extensible solution for geocoding that is based on gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address searched by the user. The matching criteria are customisable and include several dimensions (fuzzy string, fuzzy temporal, level of detail, positional accuracy). As the goal is to facilitate historical work, we also propose web-based user interfaces that help geocode (one address at a time or in batch mode) and display results over current or historical topographical maps, so that geocoding results can be checked and collaboratively edited. The system has been tested on the city of Paris, France, for the 19th and the 20th centuries. It shows high response rates and is fast enough to be used interactively.
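The customisable multi-dimensional matching described above might look roughly like this; the weighting scheme, gazetteer entries and temporal decay function are invented for illustration, not taken from the paper:

```python
from difflib import SequenceMatcher

def match_score(query_text, query_year, candidate, w_text=0.7, w_time=0.3):
    """Combine a fuzzy string score with a fuzzy temporal score.
    Weights and the decay rate are illustrative assumptions."""
    text = SequenceMatcher(None, query_text.lower(),
                           candidate["name"].lower()).ratio()
    start, end = candidate["valid"]
    # 1.0 inside the validity interval, decaying with distance outside it
    dist = max(start - query_year, query_year - end, 0)
    time = 1.0 / (1.0 + dist / 10.0)
    return w_text * text + w_time * time

# Tiny invented gazetteer of geohistorical street objects
gazetteer = [
    {"name": "rue de la Paix", "valid": (1806, 2000)},
    {"name": "rue Napoleon",   "valid": (1806, 1814)},
]
best = max(gazetteer, key=lambda c: match_score("Rue de la Paix", 1850, c))
print(best["name"])  # rue de la Paix
```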
ARTICLE | doi:10.20944/preprints201612.0075.v1
Subject: Earth Sciences, Geoinformatics Keywords: image recognition bases location; indoor positioning; RGB-D images; LiDAR; DataBase; mobile computing; image retrieval
Online: 15 December 2016 (07:17:35 CET)
This paper describes the first results of an Image Recognition Based Location (IRBL) approach for mobile applications, focusing on the procedure to generate a database of range images (RGB-D). In an indoor environment, prior spatial knowledge of the surroundings is needed to estimate the camera position and orientation. To achieve this objective, a complete 3D survey of two different environments (the Bangbae metro station in Seoul and the E.T.R.I. building in Daejeon, Republic of Korea) was performed using a LiDAR (Light Detection And Ranging) instrument, and the obtained scans were processed to obtain a spatial model of the environments. From this, two databases of reference images were generated using specific software developed by the Geomatics group of Politecnico di Torino (ScanToRGBDImage). This tool allows the synthetic generation of different RGB-D images centered at each scan position in the environment. Later, the external parameters (X, Y, Z, ω, φ, κ) and the range information extracted from the retrieved DB images are used as reference information for pose estimation of a set of mobile pictures acquired in the IRBL procedure. In this paper, the survey operations, the approach for generating the RGB-D images and the IRBL strategy are reported. Finally, the analysis of the results and the validation tests are described.
ARTICLE | doi:10.20944/preprints202209.0310.v1
Subject: Medicine & Pharmacology, Pharmacology & Toxicology Keywords: COVID-19; CoviRx.org; database; drugs; pandemic; repurposing; SARS-CoV-2; therapies; treatments; Variants of Concern (VOC)
Online: 20 September 2022 (15:00:48 CEST)
SARS-CoV-2 is the cause of the COVID-19 pandemic, which has claimed more than six million lives worldwide, devastating the economy and overwhelming healthcare systems globally. The development of new drug molecules and vaccines has played a critical role in managing the pandemic; however, new variants of concern still pose a significant threat as the current vaccines cannot prevent all infections. This situation calls for the collaboration of biomedical scientists and healthcare workers across the world. Repurposing approved drugs is an effective way of fast-tracking new treatments for recently emerged diseases. To this end, we have assembled and curated a database consisting of 7817 compounds from the Compounds Australia Open Drug collection. We developed a set of eight filters based on indicators of efficacy and safety that were applied sequentially to down-select drugs that showed promise for drug repurposing efforts against SARS-CoV-2. Considerable effort was made to evaluate approximately 14,000 assay data points for SARS-CoV-2 FDA/TGA-approved drugs and provide an average activity score for 3539 compounds. The filtering process identified 12 FDA-approved molecules with established safety profiles that have a plausible mechanism for treating COVID-19. The methodology developed in our study provides a template for prioritising repurposable drug candidates that are safe, efficacious, and cost-effective for the treatment of COVID-19, long COVID, or any other future disease. We present our database in an easy-to-use interactive interface (CoviRx, https://www.covirx.org/) that was also developed to enable the scientific community to access the data on over 7000 potential drugs and to implement alternative prioritisation and down-selection strategies.
ARTICLE | doi:10.20944/preprints202104.0256.v1
Subject: Mathematics & Computer Science, Numerical Analysis & Optimization Keywords: Multi-database Mining; Graph Clustering; Coordinate Descent; Convex Optimization; Similarity Measure; Binary Entropy Loss; Fuzziness Index
Online: 9 April 2021 (10:20:06 CEST)
Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2-n)/2$ pairwise similarities between $n$ multiple databases to generate and evaluate $m\in[1, (n^2-n)/2]$ candidate clusterings in order to select the ideal partitioning which optimizes a predefined goodness measure. However, when these pairwise similarities are distributed around the mean value, the clustering algorithm becomes indecisive when choosing which database pairs are eligible to be grouped together. Consequently, a trivial result is produced by putting all the $n$ databases in one cluster or by returning $n$ singleton clusters. To tackle the latter problem, we propose a learning algorithm to reduce the fuzziness in the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model will improve the certainty of the clustering algorithm by correctly identifying the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm to assess the candidate clusterings generated on the fly in fewer, upper-bounded iterations. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
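The fuzziness-reduction idea — gradient descent on a weighted binary entropy loss over the pairwise similarities — can be sketched as follows; the target and weighting choices here are simplified assumptions, not the authors' exact formulation:

```python
import math

def sharpen(sim, lr=0.5, steps=200):
    """Push each pairwise similarity toward its nearest binary target by
    gradient descent on a weighted binary entropy (cross-entropy) loss.
    Weights grow with proximity to 0.5, so ambiguous pairs move the most.
    A sketch only; details differ from the paper's formulation."""
    out = []
    for s0 in sim:
        t = 1.0 if s0 >= 0.5 else 0.0      # nearest binary target
        w = 1.0 - 2.0 * abs(s0 - 0.5)      # ambiguity weight in [0, 1]
        z = math.log(s0 / (1.0 - s0))      # logit parameterisation
        for _ in range(steps):
            s = 1.0 / (1.0 + math.exp(-z))
            z -= lr * w * (s - t)          # d(BCE)/dz = s - t
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out

# Similarities near 0.5 (indecisive) get pushed firmly toward 0 or 1.
print([round(s, 2) for s in sharpen([0.48, 0.55, 0.9, 0.1])])
```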
ARTICLE | doi:10.20944/preprints201910.0032.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: computerized revenue collection; machine learning; cyber security; software defined networks; object-oriented programming; online database management
Online: 3 October 2019 (01:45:11 CEST)
The need for an accurate and flexible system of revenue collection from internal sources has become a matter of extreme urgency and importance in e-governance. This need underscores the eagerness on the part of governments to look for new principles and policies of revenue collection, or to become aggressive and innovative in collecting revenue from existing sources using the present system. The revenue boards of some governments in Africa, even at present, face many setbacks in performing their tasks due to the manual system of collecting revenue from the public. This can be improved through effective collection of revenue using an accurate and flexible system. Tax is usually collected in the form of specific sales tax, general sales tax, corporate income tax, individual income tax, property tax and inheritance tax. Problems such as high collection costs, fraud, underpayment, revenue leakage, poor access to information and poor tracking of defaulters are on the increase. As a result, there is a need to computerize the revenue collection system. Computerized systems have proven to introduce massive efficiencies and quick collection of revenue from the public. This research work demonstrates how to design and implement an automated revenue collection system and how to maintain a secure database of collected tax information. It also delves into how machine learning algorithms and software-defined networks can improve the security of such automated systems.
REVIEW | doi:10.20944/preprints202103.0402.v1
Subject: Medicine & Pharmacology, Allergology Keywords: anesthesia; anesthesiology; big data; registries; database research; acute pain; pain management; postoperative pain; regional anesthesia; regional analgesia.
Online: 15 March 2021 (17:45:39 CET)
The digital transformation of healthcare is advancing, leading to an increasing availability of clinical data for research. Perioperative big data initiatives were established to monitor treatment quality and benchmark outcomes. However, big data analyses have long exceeded the status of pure quality surveillance instruments. Large retrospective studies nowadays often represent the first approach to new questions in clinical research and pave the way for more expensive and resource-intensive prospective trials. As a consequence, the utilization of big data in acute pain and regional anesthesia research has increased considerably over the last decade. Multicentric clinical registries and administrative databases (e.g., healthcare claims databases) have collected millions of cases to date, on the basis of which several important research questions have been approached. In acute pain research, big data has been used to assess postoperative pain outcomes, opioid utilization, and the efficiency of multimodal pain management strategies. In regional anesthesia, adverse events and the potential benefits of regional anesthesia for postoperative morbidity and mortality have been evaluated. This article provides a narrative review of the growing importance of big data for research in acute postoperative pain and regional anesthesia.
ARTICLE | doi:10.20944/preprints202112.0163.v1
Subject: Social Sciences, Other Keywords: ethnobotany; paleoethnobotany; biocultural heritage; digital heritage; online database; Indigenous data sovereignty; Open Access; research accessibility; digital reference collection
Online: 9 December 2021 (20:01:36 CET)
Biocultural heritage preservation relies on ethnobotanical knowledge and the paleoethnobotanical data used in (re)constructing histories of human-biota interactions. Biocultural heritage, defined as the knowledge and practices of Indigenous and Local peoples and their biological relatives, is often guarded information, meant for specific audiences and withheld from other social circles. As such, these forms of heritage and knowledge must also be included in the ongoing data sovereignty discussions and movement. In this paper we share the process and design decisions behind creating an online database for ethnobotanical knowledge and associated paleoethnobotanical data, using a content management system designed to foreground Indigenous and local perspectives. Our main purpose is to suggest the Mukurtu content management system, originally designed for physical items of cultural importance, be considered as a potential tool for digitizing and ethically circulating biocultural heritage, including paleoethnobotanical resources. With this database, we aim to create access to biocultural heritage and paleoethnobotanical considerations for a variety of audiences while also respecting the protected and sensitive natures of Indigenous and local knowledges.
ARTICLE | doi:10.20944/preprints202103.0295.v1
Subject: Materials Science, Biomaterials Keywords: additive manufacturing; rapid solidification; microstructural evolution; non-equilibrium; quasi-equilibrium; multi-phase field method; CALPHAD database; nickel alloy
Online: 11 March 2021 (07:40:42 CET)
Solidification microstructure is formed under high cooling rates and temperature gradients in powder-based additive manufacturing. In this study, a non-equilibrium multi-phase field method (MPFM), based on a finite interface dissipation model proposed by Steinbach et al., coupled with a CALPHAD database was developed for a multicomponent Ni alloy. A quasi-equilibrium MPFM was also developed for comparison. Two-dimensional equiaxed microstructural evolution for the Ni (Bal.)–Al–Co–Cr–Mo–Ta–Ti–W–C alloy was simulated at various cooling rates. The temperature–γ fraction profiles obtained at 10^5 K/s using the non- and quasi-equilibrium MPFMs were in good agreement with each other. Above 10^6 K/s, the differences between the non- and quasi-equilibrium methods grew as the cooling rate increased. The non-equilibrium solidification was strengthened above a cooling rate of 10^6 K/s. Columnar-solidification microstructural evolution was simulated under cooling rates from 5×10^5 K/s to 1×10^7 K/s at various temperature gradient values under a constant interface velocity (0.1 m/s). The results showed that as the cooling rate increased, the cell spacing decreased in both methods, and the non-equilibrium MPFM agreed well with experimental measurements. Our results show that the non-equilibrium MPFM can simulate solidification microstructure in powder bed fusion additive manufacturing.
ARTICLE | doi:10.20944/preprints202001.0166.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: spatiotemporal database; spatial analysis; seasonal precipitation; spearman correlation coefficient; pacific decadal oscillation; southern oscillation index; north atlantic oscillation
Online: 16 January 2020 (10:59:53 CET)
Temporary changes in precipitation may lead to sustained and severe drought or massive floods in different parts of the world. Knowing the variation in precipitation can effectively help water resources decision-makers in water resources management. Large-scale circulation drivers have a considerable impact on precipitation in different parts of the world. In this research, the impact of the El Niño-Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), and North Atlantic Oscillation (NAO) on seasonal precipitation over Iran was investigated. For this purpose, 103 synoptic stations with at least 30 years of data were utilized. The Spearman correlation coefficient between the indices in the previous 12 months and seasonal precipitation was calculated, and the meaningful correlations were extracted. Then the month in which each of these indices has the highest correlation with seasonal precipitation was determined. Finally, the overall amount of increase or decrease in seasonal precipitation due to each of these indices was calculated. Results indicate that the Southern Oscillation Index (SOI), NAO, and PDO have the greatest impact on seasonal precipitation, in that order. These indices have the highest impact on precipitation in winter, autumn, spring, and summer, respectively. SOI has a contrasting impact on winter precipitation compared to the PDO and NAO, while in the other seasons each index has its own particular impact on seasonal precipitation. Generally, all indices in different phases may decrease seasonal precipitation by up to 100%. However, seasonal precipitation may increase by more than 100% in different seasons due to the impact of these indices. The results of this study can be used effectively in water resources management and especially in dam operation.
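The Spearman correlation step can be sketched as below (no tied ranks assumed); the index and precipitation values are invented for illustration:

```python
def spearman(x, y):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))
    formula, valid when there are no tied ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, 1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

soi    = [-1.2, 0.4, 1.1, -0.3, 0.8, -0.9]  # monthly SOI values (made up)
precip = [55, 120, 160, 80, 140, 60]        # seasonal precipitation, mm (made up)
print(spearman(soi, precip))  # 1.0
```

Because the statistic uses ranks rather than raw values, it is robust to the skewed distributions typical of precipitation data.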
DATA DESCRIPTOR | doi:10.20944/preprints202209.0323.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: COVID-19; Open-source dataset; Drug Repurposing; Database system; Web application development; software development; Drug fingerprints; Bulk upload
Online: 21 September 2022 (10:14:11 CEST)
Although various vaccines are now commercially available, they have not been able to stop the spread of COVID-19 infection completely. An excellent strategy to quickly obtain safe, effective, and affordable COVID-19 treatments is to repurpose drugs that are already approved for other diseases as adjuvants alongside the ongoing vaccine regime. The process of developing an accurate and standardized drug repurposing dataset requires a considerable level of resources and expertise due to the commercial availability of an extensive array of drugs that could potentially be used to address SARS-CoV-2 infection. To address this bottleneck, we created the CoviRx platform. CoviRx is a user-friendly interface that provides access to manually curated COVID-19 drug repurposing data. Through CoviRx, the curated data have been made open source to help advance drug repurposing research. CoviRx also encourages users to submit their findings after thoroughly validating the data, which are then merged under uniformity- and integrity-preserving constraints. This article discusses the various features of CoviRx and its design principles. CoviRx has been designed so that its functionality is independent of the data it displays. Thus, in the future, this platform can be extended to include any other disease X beyond COVID-19. CoviRx can be accessed at www.covirx.org.
ARTICLE | doi:10.20944/preprints202106.0110.v1
Subject: Materials Science, Biomaterials Keywords: Thermodynamic modeling; CALPHAD; molten salt; molten salt reactor; thermodynamic database; modified quasichemical model; fluoride salt; chloride salt; salt system
Online: 3 June 2021 (11:50:13 CEST)
Molten salt reactors (MSRs) utilize salts as coolant, or as the fuel and coolant together with fissile isotopes dissolved in the salt. It is therefore necessary to understand the behavior of the salts to effectively design, operate, and regulate such reactors, and thus there is a need for thermodynamic models of the salt systems. Molten salts, however, are difficult to represent as they exhibit short-range order that is dependent on both composition and temperature. A widely useful approach is the modified quasichemical model in the quadruplet approximation, which provides for consideration of first and second nearest-neighbor coordination and interactions. Its use in the CALPHAD approach to system modeling requires fitting parameters using standard thermodynamic data such as phase equilibria, heat capacity, and others. A shortcoming of the model is its inability to directly vary coordination numbers with composition or temperature. Another issue is the difficulty of fitting model parameters using regression methods without already having very good initial values. This paper discusses these issues and notes some practical methods for the effective generation of useful models.
ARTICLE | doi:10.20944/preprints202012.0235.v1
Subject: Earth Sciences, Atmospheric Science Keywords: database; disaster prevention; disaster risk reduction (DRR); climate change adaptation (CCA); stakeholders; nature-based solutions (NBS); mountain; hydro-meteorological risks
Online: 9 December 2020 (16:48:34 CET)
In the context of global changes, Nature-Based Solutions (NBSs) increasingly draw attention as a possible way to reduce disaster risk associated with extreme hydro-meteorological events while providing human well-being and biodiversity benefits at the same time. The PHUSICOS platform is dedicated to gathering and analysing relevant NBSs used to reduce disaster risk associated with extreme hydro-meteorological events in mountainous and hilly lands. To design the platform, an in-depth review of 11 existing platforms was performed. The platform currently references 152 NBS cases from the literature and is continuously enriched with demonstrator sites through the contributions of the NBS community. The platform also proposes a qualitative assessment of the collected NBSs according to 15 criteria related to five areas: disaster risk reduction, technical and economic feasibility, environment, society and local economy. This paper presents the structure of the platform and a first analysis of its content.
DATA DESCRIPTOR | doi:10.20944/preprints202208.0112.v1
Subject: Earth Sciences, Geoinformatics Keywords: ground truth data; drone; mobile application; windshield survey; sample design; crop mapping; agriculture statistics; data dissemination; earth observation data; spatial database.
Online: 4 August 2022 (16:18:26 CEST)
Over the last few years, Earth Observation (EO) data have increasingly been used to produce official statistics, particularly in the agriculture sector. National statistics offices worldwide, including in Asia and the Pacific, are expanding their use of EO data to produce agricultural statistics such as crop classification, yield estimation, irrigation mapping, and crop loss estimation. Advances in image classification, such as pixel-based and phenology-based classifications, and machine learning create new opportunities for researchers to analyze EO data applied to agricultural statistics. However, ground truth (GT) data are required, because the classification result depends mainly on the quality of the GT data. Therefore, in this study, we introduce a random sampling approach to design and collect GT data using EO imagery and ancillary data. The collected GT data improve the algorithms and validate the classification results. Nevertheless, despite their importance, GT data are rarely disseminated as a data product in themselves. This represents an untapped opportunity to share GT data as a global public good and to improve the use of survey and census data as a source of GT data.