SVDB: a Comprehensive Domain-Specific Database of Snake Venom Toxins Generated Through NCBI

Venoms that drip from the fangs of snakes are highly complex chemical cocktails of proteins and enzymes, including a wide variety of toxins such as myotoxins, cardiotoxins, hemotoxins and neurotoxins, in countless combinations. In addition to their use in the treatment of snake bites in humans, they have numerous therapeutic and medicinal applications. Potential uses of snake venom include the treatment of excessive bleeding, stroke, neurological disorders, cancer, diabetes and aging. A proper understanding of snake venom toxins, and of how to facilitate their use, is therefore of utmost importance. In this paper, we describe a novel database, called SVDB, for the storage, dissemination and analysis of information related to snake venoms and toxins. SVDB maintains automated links to NCBI databases and pulls relevant information both on demand and asynchronously to facilitate data integration. SVDB includes authentic, non-redundant, up-to-date scientific information on literature, sequences, structures, small molecules, taxonomy and more. The SVDB portal also provides external links to tools such as BLAST, CLUSTAL, SWISS-MODEL and phylogeny, and to other toxin-related resources. The SVDB architecture for fetching, linking and structuring information is unique and can be adapted into a generic pipeline for collecting any domain-specific data through NCBI. The database is publicly available at https://www.snakevenomdb.org.


Introduction
Snakes worldwide have evolved a stunning range of venomous compounds. Venom that drips from the fangs of snakes is one of nature's most efficient killers: it works very fast and is highly specific (1,2). Snake venom contains dozens to hundreds of different proteins, peptides and other molecules. For snakes, these toxins serve principally for prey acquisition, digestion and defence against predators. Ironically, the properties that make venom deadly are also what make it so valuable in medicine (1-3). The therapeutic promise and medicinal use of snake venoms are enormous. Many of the proteins and enzymes in venoms have interesting biological properties and can play important roles in various biological functions such as blood coagulation, blood pressure regulation, and the transmission of nervous or muscular signals. Some of these proteins and enzymes have been used successfully for many pharmacological purposes and are useful drugs (4,5).
The National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov) was founded more than two decades ago with a mission to provide researchers and the scientific community with access to authentic, up-to-date biomedical data. NCBI hosts a substantial number of publicly available databases relevant to biotechnology and biomedicine and is an important resource for bioinformatics tools and services that allow developers to access and manipulate NCBI data in their applications (6). At present, the exponential growth of biological data poses huge challenges in data deposition, integration, retrieval, curation and translation, and demands new data access techniques that homogenize the data automatically and give users easy access to the most relevant records. Here, we present SVDB, a new resource for snake venom and toxin-related information, ranging from literature and taxonomy to molecular sequence data on genes, mRNA, proteins, structures and genomes, together with external web links to data analysis tools such as BLAST, CLUSTALW, Phylogeny and SWISS-MODEL, among many others. Users of all categories, from laypersons to molecular biologists and biotechnologists, have the flexibility to perform anything from simple searches to detailed structural-functional investigations valuable for future drug design. This database is the most complete resource of up-to-date data for molecular biologists, informaticians, toxicologists and anyone interested in snake venom and its components. SVDB is open-source and publicly accessible at https://www.snakevenomdb.org.

System design and implementation:
The snake venom database (SVDB) has been implemented as a website with both a server and a database component. The front end of the web interface runs on a Linux-based Apache Web server (https://httpd.apache.org/) and PHP (http://www.php.net/). PHP supports all major Web servers and databases and is used for server-side scripting. It also provides multiple layers of security to prevent threats and malicious attacks. MySQL (http://www.mysql.com) was used for data storage and management because, in combination with PHP, it provides both faster service and user-friendly features for configuring the web services. The interface of SVDB was built using CSS (https://www.w3.org/Style/CSS/Overview.en.html) to style the HTML elements presented on the Web. In SVDB, the time-based job scheduler cron (8,9) was used to schedule tasks to run on the server. Automated searches and fetching of venom and toxin-related data through NCBI (https://www.ncbi.nlm.nih.gov/) were scheduled to be executed periodically using crontabs. All resources used to develop SVDB are open source.
Automated data fetching through NCBI: The core component of SVDB is its 'data repository', which organizes and stores the categorized data together with an efficient retrieval system. To build the repository, the NCBI ERD flowchart was surveyed to identify classes of categories, and tags were then assigned to organize its content (Supplementary Figure S1). All relevant information on venom, toxins, snakes, families, species, genes, sequences and small compounds, among many others, was then collected and stored in SVDB. All data were pipelined through NCBI, which is a reliable source, and we used the API provided by NCBI (Entrez Programming Utilities) to search and fetch records (https://www.ncbi.nlm.nih.gov/books/NBK25498/) (10).
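As an illustration of the search-and-fetch step, the following minimal Python sketch builds request URLs for two E-utilities endpoints (ESearch and ESummary). It is not the SVDB implementation, which is written in PHP; the function names are hypothetical, and the UIDs in the usage lines are arbitrary placeholders rather than real records.

```python
from urllib.parse import urlencode

# Base address of the NCBI Entrez Programming Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL listing UIDs that match `term` in database `db`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_esummary_url(db, uids):
    """Return an ESummary URL for a list of UIDs from the same database."""
    params = {"db": db, "id": ",".join(str(u) for u in uids), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical usage: search PubMed, then summarise two placeholder UIDs.
    print(build_esearch_url("pubmed", "snake venom"))
    print(build_esummary_url("pubmed", [1, 2]))
```

Only the URLs are constructed here; sending the requests (and respecting NCBI's usage limits) is left to the caller.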
Acquiring data from NCBI is challenging because it involves heterogeneous data formats. To reduce this complexity, data from all the different NCBI databases were integrated into a single SVDB database in which all NCBI records are stored using the RDBMS model (11,12). Common fields such as unique ID (UID), title and accession number, which are conserved across most NCBI databases, were fetched and stored in a single common summary table (summery_tbl). Records from different databases were identified, pooled and structured under the dbname column (Supplementary Figure S1). For example, taxonomy data are stored in a separate table, and the unique ID of each taxonomy record is stored in the main common table. In addition, relevant information that is not common across databases was organized and stored in separate tables based on the EAV (Entity Attribute Value) model (13,14). NCBI updates its data every 24 hours, so SVDB has been programmed for automated, scheduled data fetching (crawling) at fixed time intervals using cron (Figure 1). The crawled data are then normalized and automatically stored locally in the SVDB portal. The SVDB portal thus provides the most up-to-date data and faster access for the same query than a search performed directly through NCBI. In addition, SVDB gives users the flexibility to search NCBI from within the system. When results are displayed, any record that is new and not yet in SVDB is fetched instantly from NCBI. All crawled records are then stored in the SVDB portal, so the next time they can be accessed without any API call to NCBI. The detailed strategies for SVDB data crawling are depicted in Algorithms 1, 2 and 3.
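The storage layout and the "local first, NCBI on a miss" behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the SVDB code: it uses Python's built-in sqlite3 in place of MySQL, the EAV table name `record_attr` and the function names are hypothetical, and the remote call is passed in as a function so that no particular NCBI client is implied.

```python
import sqlite3


def init_schema(conn):
    """Create the common summary table and one EAV table for extra fields."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS summery_tbl (   -- name as spelled in the text
            uid       TEXT NOT NULL,
            dbname    TEXT NOT NULL,               -- source NCBI database
            title     TEXT,
            accession TEXT,
            PRIMARY KEY (uid, dbname)
        );
        CREATE TABLE IF NOT EXISTS record_attr (   -- hypothetical EAV table
            uid       TEXT NOT NULL,
            dbname    TEXT NOT NULL,
            attribute TEXT NOT NULL,
            value     TEXT
        );
        """
    )


def get_record(conn, dbname, uid, fetch_remote):
    """Return a record from local storage, fetching and caching it on a miss.

    `fetch_remote(dbname, uid)` is an assumed callable returning a dict with
    'title', 'accession' and an optional 'extra' dict of uncommon fields.
    """
    row = conn.execute(
        "SELECT title, accession FROM summery_tbl WHERE uid = ? AND dbname = ?",
        (uid, dbname),
    ).fetchone()
    if row is not None:
        return {"uid": uid, "title": row[0], "accession": row[1], "source": "local"}

    record = fetch_remote(dbname, uid)
    conn.execute(
        "INSERT INTO summery_tbl (uid, dbname, title, accession) VALUES (?, ?, ?, ?)",
        (uid, dbname, record["title"], record["accession"]),
    )
    # Fields that are not shared across databases go into the EAV table.
    conn.executemany(
        "INSERT INTO record_attr (uid, dbname, attribute, value) VALUES (?, ?, ?, ?)",
        [(uid, dbname, k, str(v)) for k, v in record.get("extra", {}).items()],
    )
    conn.commit()
    return {"uid": uid, "title": record["title"],
            "accession": record["accession"], "source": "remote"}
```

With this layout, a second request for the same (uid, dbname) pair is answered from the local table and triggers no remote call.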
    for each id in uid_list do
        id_detail ← Request NCBI for details of id
        Store id_detail in DB
        flag ← Existence of Related Links of id
        [flag = 1 means related link found, otherwise 0]
        if flag = 1 then
            Save Related Links to SVDB

    for each uid in uid_list do
        Display the summary
    end for
end procedure

Data sources: In SVDB, different types of data, from literature to taxonomy, from sequence to structure, and from small molecules to LD50 venom toxicity, were accessed through NCBI (https://www.ncbi.nlm.nih.gov/) and extracted from various databases such as PubMed, Nucleotide, Gene, Protein, Structure, Genome, Taxonomy, Conserved Domains, OMIM, PubChem Compound, Substance and BioAssay, PopSet, GEO DataSets and many more, to construct a central repository of all snake venom and toxin-related information (Supplementary Figure S2, Table 1).
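The per-record loop in the algorithm listing above, which would be run for each of the source databases just listed, can be rendered in Python roughly as follows. This is only a sketch of the steps that survive in the listing: `fetch_detail` and `fetch_links` are hypothetical callables standing in for the NCBI requests, and a plain dictionary stands in for the SVDB database.

```python
def crawl(uid_list, fetch_detail, fetch_links, store):
    """Fetch and store details for each UID, saving related links when present.

    `store` is a dict with 'details' and 'links' sub-dicts (a stand-in for
    the database). Returns the stored summaries in the order of `uid_list`.
    """
    for uid in uid_list:
        # Request the record details and store them.
        store["details"][uid] = fetch_detail(uid)

        # flag = 1 means related links were found, otherwise 0.
        links = fetch_links(uid)
        flag = 1 if links else 0
        if flag == 1:
            store["links"][uid] = list(links)

    # Second loop of the listing: produce the summary for each UID.
    return [store["details"][uid] for uid in uid_list]
```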
Tools & additional resources: SVDB provides Web links to other important databases and resources on snake venom and toxins, such as LD50 toxicity (http://snakedatabase.org/pages/ld50.php), T3DB: the toxin & toxin target database (15), the WHO database of venomous snakes (http://apps.who.int/bloodproducts/snakeantivenoms/database/), VenomKB (http://www.venomkb.org/), UniProt entries for venom toxins (https://www.uniprot.org/help/uniprotkb), the reptile database (http://www.reptile-database.org/), NABTSCT (http://www.nabtsct.net/questionnaire.htm) and many more. Through this portal, users also have access to Web-based analytical tools such as BLAST (16), CLUSTAL (17), PSIPRED (18), SWISS-MODEL (19), Phylogeny (20), Swiss-PdbViewer (21), RasMol (http://rasmol.org/) and Cn3D (http://dip.doembi.ucla.edu/dip/Main.cgi) for understanding high-level functions of biological systems. These tools would be especially helpful for researchers keen to study the mechanism of action and the structure-function details of snake venom components and toxins. Taxonomic data on the classification and nomenclature of five major venomous snake families (Colubridae, Elapidae, Viperidae, Atractaspididae and Hydrophiidae) can also be accessed through SVDB. Furthermore, the snake bite treatment section of SVDB provides ample resource links for easy access to general information on snakebite, including signs and symptoms, prevention, treatment and first aid. In addition, a special section is dedicated to the venomous snakes of Bangladesh, covering their classification, types, images and habitats.

The snake venom database (SVDB) is a customized, domain-specific, non-redundant data repository built to communicate with NCBI and extract all snake venom and toxin-related information from the millions of records accessible through it. The Web interface of SVDB allows users to browse, search, analyse and download all snake venom and toxin-related information from a number of primary data resources of NCBI (Figures 2 and 3). In the simple 'Search' menu (located at the top right-hand corner of the web interface), a pull-down menu offers a choice of 20 different databases. SVDB users can perform searches simply by entering appropriate keywords and phrases into the search box, such as a snake name, venom, toxin name, venom component, accession number/UID or 3D structural information. Users can also browse relevant databases in SVDB by clicking on the relevant sub-menus (Figure 3). Alternatively, users can click on the navigation bar on the left side to view the corresponding page or entry. Relevant data are also available for bulk download in several formats from the SVDB FTP site. Through the default filter setting of SVDB, data from NCBI are automatically filtered so that only the most relevant subject-oriented data are displayed. More sophisticated queries can also be performed by restricting searches to specific fields and combining terms with Boolean operators (AND, OR, NOT) using the following syntax: term [field] OPERATOR term [field]. 'Advanced Search' and 'Limits' page options are also available on the SVDB home page to help users quickly construct multipart, user-defined field queries. In addition, SVDB provides links to a collection of tools such as BLAST, Phylogeny and SWISS-MODEL in the pull-down menu under 'Tools' on the top bar of the web interface, so that users can carry out further structural investigations of venom toxins. The 'Other resources' menu on the top bar gives users access to other important venom and toxin-related databases, including T3DB, VenomKB and LD50 toxicity levels of venomous snakes. The 'Taxonomy' menu provides taxonomic classification and nomenclature data for the major venomous snake families. Furthermore, the 'Snake bite treatment' menu provides links to general information on snakebite, including symptoms, treatment and first aid (Figures 2 and 3). Together, the SVDB website provides a complete information reservoir on snake venom components and toxins for all users, from beginners to the most advanced researchers.
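The fielded Boolean syntax mentioned above (term [field] OPERATOR term [field]) can be composed programmatically. The sketch below is illustrative only: the helper names are hypothetical, the field tags in the usage example (Title, Organism) are Entrez-style tags assumed to be accepted by the chosen database, and the term is written directly against the bracket, as is common in Entrez queries.

```python
ALLOWED_OPERATORS = {"AND", "OR", "NOT"}


def field_term(term, field=None):
    """Format one search term, quoting phrases and appending an optional field tag."""
    term = term.strip()
    if " " in term:
        term = f'"{term}"'          # keep multi-word phrases together
    return f"{term}[{field}]" if field else term


def combine(left, operator, right):
    """Join two already formatted terms with a Boolean operator."""
    op = operator.upper()
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{left} {op} {right}"


if __name__ == "__main__":
    query = combine(field_term("snake venom", "Title"), "and",
                    field_term("Elapidae", "Organism"))
    print(query)
```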

Performance Evaluation
In the SVDB portal, all data are collected from NCBI, which is an authentic and up-to-date resource. The APIs provided by NCBI (Entrez Programming Utilities) were used to fetch snake venom and toxin-related data for SVDB. The amount of data in NCBI is huge, and the search results of every query also pull in cross-referenced data. The architecture of SVDB allows search results to be retrieved and displayed quickly while remaining highly relevant to the user query (Figure 1). The Web performance of SVDB and NCBI was evaluated and compared using WEBPAGETEST (http://www.webpagetest.org) and GTmetrix (https://gtmetrix.com), two benchmark tools for evaluating the performance of web portals. The basis of comparison is the load time of the search result page in the web browser. For example, the keyword "snake venom" was searched in the PubMed database of both NCBI and SVDB, and the resulting performance was compared (Figure 4, Table 2). SVDB uses its domain-specific repositories to fetch records. As the database is pre-processed and specifically structured to store only snake venom-related data, it takes less time to perform searches and display results than performing the same search in NCBI (Table 2). Additionally, the SVDB page is very lightweight and hence consumes less bandwidth, resulting in a faster load time (Table 2, Figure 4). Performance was also evaluated using different keywords and different databases of SVDB and NCBI. In each case, similar performance patterns were observed. The observed differences indicate that data retrieval from SVDB is faster than from NCBI.
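The comparison above was made with WEBPAGETEST and GTmetrix. For readers who want a rough local check, a simple timing harness along the following lines could be used instead; it is a different and much cruder technique than browser-level page-load benchmarking, the function names are hypothetical, and it reports no measured results of its own.

```python
import statistics
import time


def median_seconds(action, repeats=5):
    """Run `action()` several times and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds
```

Here `action` would be any caller-supplied function that issues the same query against one portal; comparing the two medians with `speedup` gives a single ratio.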

Conclusions & Future Scopes
Snake venom toxins show great diversity in structure, function and evolution and are therefore invaluable in both basic and applied research. To our knowledge, SVDB is the most comprehensive repository of snake venom and toxin information. It provides information ranging from literature to taxonomy, from sequence to structure, LD50 toxicity and much more. SVDB also provides links to important tools essential for the structural-functional analysis of venom components and toxins, enabling researchers to screen for drug leads. Venom proteins are of considerable interest to the drug discovery and biotechnology communities (22). We believe that more researchers will focus on elucidating the biological role of snake venom and toxins in the near future and can take advantage of this categorized, domain-specific data portal for further structural investigations. We believe that SVDB will be a valuable resource for the snake venom and toxin community.
The architecture of SVDB ensures automatic growth and updating of its data, as new records continue to be added and updated every 24 hours at NCBI. The user will therefore always be able to access the most up-to-date, authentic and non-redundant data from the SVDB portal. This architecture for data extraction, storage and retrieval through NCBI (Figure 1) is not limited to snake venom and toxins; it is flexible and can be extended into a generic model for any domain-specific data that can be accessed through NCBI. We will continue to expand the SVDB database with newly available public datasets and keep improving it. We host SVDB on its own domain (snakevenomdb.org) instead of on an institutional domain, as institutional domains tend to change. A dedicated domain name and hosting space improve the site's sustainability. We also believe that feedback from researchers could further enrich this resource, so a feedback page has been incorporated into the SVDB portal, and we have made the database accessible online to anyone in the world.

Figure 2. Web portal and examples of some key elements of SVDB's user interface.

Figure 3. Analysis tools and other online resources accessed through SVDB portal.

Figure 4. Performance comparison of SVDB and NCBI.

Table 1. Summary of major data sources of SVDB (as of May 2018).

Table 2. Performance comparison of SVDB and NCBI.