Short reads produced by high-throughput sequencers are short in length and have high sequencing error rates. These errors complicate downstream analyses such as re-sequencing, single nucleotide polymorphism (SNP) calling, and genome assembly. Fortunately, low sequencing costs make it possible to produce enough reads to cover a genome with high redundancy, and this redundancy can be used to detect and correct sequencing errors. However, error correction is both compute- and memory-intensive because of the large number of reads, so short-read error correctors need to be efficient in both time and memory.
Musket
Musket is an efficient multistage k-mer based corrector for Illumina short-read data. We employ the k-mer spectrum approach and introduce three correction techniques in a multistage workflow: two-sided conservative correction, one-sided aggressive correction, and voting-based refinement. Our performance evaluation, in terms of correction quality and de novo genome assembly measures, shows that Musket is consistently one of the top-performing correctors. In addition, Musket is multi-threaded using a master-slave model and demonstrates superior parallel scalability compared to all other evaluated correctors, as well as a highly competitive overall execution time.
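The general idea behind a k-mer spectrum corrector can be illustrated with a minimal Python sketch. This is not Musket's implementation and does not reproduce its three correction stages; it only shows the underlying principle under simple assumptions: k-mers that occur at least `min_count` times across all reads are treated as trusted, and a base is replaced only when exactly one substitution makes every k-mer covering it trusted. The function names and the `min_count` threshold are hypothetical and chosen for illustration.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_windows(pos, read_len, k):
    """Start indices of all k-mers in a read that cover position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, read_len - k)
    return range(start, end + 1)


def all_trusted(read, windows, counts, k, min_count):
    """True if every k-mer starting at the given indices is trusted."""
    return all(counts[read[i:i + k]] >= min_count for i in windows)


def correct_read(read, counts, k, min_count):
    """Illustrative single-substitution correction (assumed scheme).

    A base is changed only if exactly one alternative base makes all
    k-mers covering that position trusted; ambiguous cases are left alone.
    """
    for pos in range(len(read)):
        windows = covering_windows(pos, len(read), k)
        if all_trusted(read, windows, counts, k, min_count):
            continue  # nothing suspicious at this position
        candidates = []
        for base in "ACGT":
            if base == read[pos]:
                continue
            trial = read[:pos] + base + read[pos + 1:]
            if all_trusted(trial, windows, counts, k, min_count):
                candidates.append(trial)
        if len(candidates) == 1:
            read = candidates[0]
    return read


# Toy data: five error-free copies of a sequence plus one read with a
# single substitution (G -> T at index 10).
genome = "ACGTTGCAAGGCTTACGGAT"
erroneous = "ACGTTGCAAGTCTTACGGAT"
reads = [genome] * 5 + [erroneous]

spectrum = kmer_spectrum(reads, k=5)
corrected = correct_read(erroneous, spectrum, k=5, min_count=2)
```

In this toy example the k-mers that cover the erroneous base occur only once, so they fall below the threshold, and the single substitution that restores trusted k-mers recovers the original sequence. Real correctors must additionally handle ambiguous candidates, repeats, and reads with several errors.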
Download: Sourceforge
Publications:
- Yongchao Liu, Jan Schroeder, Bertil Schmidt: "Musket: a multistage k-mer spectrum based error corrector for Illumina sequence data". Bioinformatics, 2013, 29(3): 308-315.
DecGPU
DecGPU (Distributed Error Correction on GPUs) is the first parallel and distributed error correction algorithm for high-throughput short reads using CUDA C++ and MPI. On simulated and real datasets, our algorithm demonstrates superior error correction quality and execution speed compared to existing error correction algorithms. Its distributed design makes the error correction of large-scale datasets feasible and flexible.
Download: Sourceforge
Publications:
- Yongchao Liu, Bertil Schmidt, Douglas L. Maskell: "DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI". BMC Bioinformatics, 2011, 12:85.
- Haixiang Shi, Bertil Schmidt, Weiguo Liu, and Wolfgang Müller-Wittig: "A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware". Journal of Computational Biology, 2010, 17(4): 603-615.
SHREC
SHREC is an error correction method based on a suffix tree. It is implemented in Java and runs on standard multi-core machines.
Download: Sourceforge
Publication:
- Jan Schröder, Heiko Schröder, Simon J. Puglisi, Ranjan Sinha and Bertil Schmidt: "SHREC: a short-read error correction method". Bioinformatics, 2009, 25(17): 2157-2163.