Multiple Sequence Alignment

Multiple sequence alignment (MSA) underpins many bioinformatics studies of molecular evolution and of the relationships between sequence, function, and structure. An optimal MSA can be produced by aligning all sequences simultaneously using dynamic programming. Unfortunately, this approach is impractical for more than a few sequences, because its time and memory requirements grow exponentially with the number of sequences. Therefore, many heuristics have been proposed that compute sub-optimal alignments using approaches such as progressive alignment, iterative alignment, and alignment based on profile HMMs.
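To make the cost argument concrete, the following is a minimal sketch of the pairwise case of dynamic programming alignment (Needleman-Wunsch global alignment score), together with a helper that counts the cells in the full table when the same idea is extended to several sequences at once. The function names and the scoring values (match, mismatch, gap) are illustrative assumptions and are not taken from any of the tools described on this page.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of two sequences (Needleman-Wunsch).

    Scoring values are illustrative assumptions with a linear gap penalty.
    Only two rows of the table are kept, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1], or insert a gap in either sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def dp_cells(lengths):
    """Number of cells in the full DP table for simultaneous alignment."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))
    print(dp_cells([100, 100]))   # two sequences of length 100
    print(dp_cells([100] * 5))    # five sequences of length 100
```

For two sequences of length 100 the table has about ten thousand cells, while for five such sequences it already has about ten billion, and each cell must also consider more predecessor cells. This is why the heuristics listed above are used in practice.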

MSAProbs

MSAProbs is a new and practical multiple alignment algorithm for protein sequences. Its design combines pair hidden Markov models with partition functions to calculate posterior probabilities. Assessed on the popular benchmarks BAliBASE, PREFAB, SABmark, and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons, and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs through a multi-threaded design, which gives it execution times competitive with those of other aligners.

Download: Sourceforge

Publication:

MSA-CUDA

MSA-CUDA parallelizes all three stages of the ClustalW progressive alignment pipeline using CUDA and achieves significant speedups over sequential ClustalW on a variety of large protein sequence datasets. These speedups also compare favorably with ClustalW-MPI running on 32 CPU cores of a high-performance compute cluster. In terms of alignment quality, the results of MSA-CUDA are remarkably consistent with those of ClustalW on the BAliBASE dataset.

Publication:

Distance Matrix Computation for MSA on Cell/BE and x86

We introduce an implementation that accelerates distance matrix computation on x86 and on the Cell Broadband Engine, which are homogeneous and heterogeneous multi-core systems, respectively. By taking advantage of multiple processors as well as Single Instruction Multiple Data (SIMD) vectorization, we achieve speedups of two orders of magnitude over the publicly available implementation used in ClustalW.
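The reason this stage parallelizes well is that every pair of sequences can be compared independently of all other pairs. The sketch below illustrates only that task decomposition. It is not the Cell/BE or SSE2 implementation: the distance function is a simple ungapped placeholder rather than the alignment-based distance used in ClustalW, the names `ungapped_distance` and `distance_matrix` are hypothetical, and Python threads are used purely to show the structure (they will not give the speedups of native threads and SIMD).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def ungapped_distance(a, b):
    """Placeholder distance: 1 - fraction of identical positions (no gaps).

    Assumption for illustration only; a real pipeline would derive the
    distance from a pairwise alignment.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / longest


def distance_matrix(seqs, dist=ungapped_distance, workers=4):
    """Symmetric distance matrix; every pair is an independent task."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each (i, j) pair is evaluated separately, so pairs can be
        # handed to different workers in any order.
        values = pool.map(lambda p: dist(seqs[p[0]], seqs[p[1]]), pairs)
        for (i, j), d in zip(pairs, values):
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    for row in distance_matrix(["ACGT", "ACGA", "ACGT"]):
        print(row)
```

For n sequences there are n(n-1)/2 such pairs, so the amount of independent work grows quadratically with the number of sequences, which leaves plenty of room to distribute it across cores.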

Download: Sourceforge

Publication:

  • Adrianto Wirawan, Chee Keong Kwoh, Bertil Schmidt: "Multi Threaded Vectorized Distance Matrix Computation on the Cell/BE and x86/SSE2 Architectures". Bioinformatics, 2010, 26(10): 1368-1369