Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Optimization and Performance Analysis of CAT Method for DNA Sequence Similarity Searching and Alignment

Version 1 : Received: 2 February 2024 / Approved: 2 February 2024 / Online: 4 February 2024 (17:15:55 CET)
Version 2 : Received: 18 February 2024 / Approved: 19 February 2024 / Online: 19 February 2024 (16:12:31 CET)

A peer-reviewed article of this Preprint also exists.

Gancheva, V.; Stoev, H. Optimization and Performance Analysis of CAT Method for DNA Sequence Similarity Searching and Alignment. Genes 2024, 15, 341.

Abstract

Bioinformatics is a rapidly developing field that enables scientific experiments through computer models and simulations. Given the vast databases of biological data available, it is extremely important to develop efficient methods and algorithms for processing them. Sequence comparison is the best method for studying the evolutionary relationships between genes. It is based on alignment: the process of arranging two or more sequences to achieve the maximum level of identity and degree of similarity. The paper presents a new algorithm for aligning DNA sequences based on a new method called CAT, which uses trilateration. CAT profiles are generated once, when data is entered into the database, so that the profiles can be used as metadata for the sequences. The method consists of an algorithm that calculates a CAT profile against selected reference sequences and an algorithm that compares two sequences based on their calculated CAT profiles. Experiments have been carried out with different datasets to align DNA sequences using the CAT method. Experimental results in terms of collisions, speed, and efficiency of the proposed solutions are presented. The performance of CAT is analyzed against the Knuth–Morris–Pratt (KMP) algorithm. The effect of adding dependencies on previous matches on the uniqueness of the generated CAT profiles is also investigated. The analysis of the experimental alignment results shows a small deviation in the results of the CAT-based algorithm, which can be disregarded when such a deviation is an acceptable price for the gain in performance. The execution time of the CAT algorithm remains constant regardless of the length of the sequences. Therefore, the advantage of the proposed method is its fast processing when aligning large sequences, for which exact algorithms take a long time to execute.
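For orientation, the Knuth–Morris–Pratt (KMP) baseline named in the abstract is an exact string-matching algorithm. It runs in O(n + m) time for a text of length n and a pattern of length m, so its cost grows with sequence length. The following is a minimal Python sketch of standard KMP, not the authors' benchmark implementation; the function names `prefix_function` and `kmp_search` are illustrative choices.

```python
from typing import List


def prefix_function(pattern: str) -> List[int]:
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    pi = [0] * len(pattern)
    k = 0  # length of the current matched prefix
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next char can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, pattern: str) -> List[int]:
    """Return start indices of all (possibly overlapping) exact
    occurrences of pattern in text."""
    if not pattern:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    matches: List[int] = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = pi[k - 1]  # continue so overlapping matches are found
    return matches


if __name__ == "__main__":
    print(kmp_search("ACGTACGTAC", "ACGTAC"))  # [0, 4]
```

Because KMP scans every character of the text, its running time scales linearly with sequence length. This is the cost against which the abstract's claim of constant execution time for CAT should be read.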

Keywords

bioinformatics; biological data sequences; DNA sequences; sequence alignment

Subject

Computer Science and Mathematics, Computer Science
