Close
Help




JOURNAL

Evolutionary Bioinformatics

HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing

Submit a Paper


Evolutionary Bioinformatics 2016:12 73-85

Technical Advance

Published on 10 Feb 2016

DOI: 10.4137/EBO.S35545


Further metadata provided in PDF



Sign up for email alerts to receive notifications of new articles published in Evolutionary Bioinformatics

Abstract

Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. Like an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline HTSFinder (high-throughput signature finder) with a corresponding k-mer generator GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into the ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome. A considerable number of detected unique and common DNA signatures of the target database bring the opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis.



Downloads

PDF  (2.02 MB PDF FORMAT)

RIS citation   (ENDNOTE, REFERENCE MANAGER, PROCITE, REFWORKS)

BibTex citation   (BIBDESK, LATEX)

XML




Quick Links


New article and journal news notification services