Motif Finding | High Performance Computing

Motif Finding

Computational motif discovery aims to identify unknown motifs that are believed to be shared by a set of sequences. It is especially important for understanding the mechanisms that regulate gene expression, because it can identify transcription factor binding sites (TFBSs). Massively parallel sequencing technologies have enabled genome-wide de novo identification of TFBSs, a fundamental problem for fully understanding the transcriptional regulatory processes of cells. De novo motif discovery is non-trivial for large-scale genomic data because of the high computational overhead of existing motif discovery algorithms. The rapid growth of genomic sequence and gene transcription data compounds the problem and creates a strong need for time-efficient, scalable motif discovery algorithms.
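To make the notion of a motif concrete, the following minimal Python sketch builds a log-odds position weight matrix (PWM) from a few aligned sites and scans a sequence for its best match. It only illustrates the kind of model that motif discovery searches for. De novo discovery must also learn the matrix itself, which is where the computational cost arises. The example sites, the uniform background, the pseudocount, and the function names `build_pwm` and `best_site` are illustrative assumptions and are not taken from any particular tool.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Estimate a log-odds PWM from aligned, equal-length sites.

    Assumes a uniform background (0.25 per base); this is an
    illustrative simplification.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def best_site(sequence, pwm):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    best_start, best_score = None, float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Hypothetical aligned binding sites, used only for illustration.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
start, score = best_site("GGCATGACTCATTC", pwm)
print(start, round(score, 2))
```

Scanning with a known matrix is linear in the sequence length. The expensive part of de novo discovery is evaluating many candidate matrices and start positions across all input sequences.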

CUDA-MEME

CUDA-MEME is an ultrafast, scalable motif discovery algorithm for multiple GPUs. It is based on the MEME algorithm (version 4.4.0) and uses a hybrid combination of the CUDA, MPI and OpenMP parallel programming models. The algorithm has been tested on a GPU cluster with eight compute nodes and two Fermi-based Tesla S2050 (and Tesla-based Tesla S1070) quad-GPU computing systems, running Linux with the MPICH2 library. The experimental results showed that our algorithm scales well with respect to both dataset size and the number of GPUs. At present, the OOPS (one occurrence per sequence) and ZOOPS (zero or one occurrence per sequence) models are supported, which are sufficient for most motif discovery applications.
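The difference between the two sequence models can be sketched with the textbook mixture-model posterior over motif start positions. The Python below is a simplified illustration, not CUDA-MEME's implementation: the motif matrix `theta`, the prior `gamma` (the assumed probability that a sequence contains a site), the uniform background, and all function names are assumptions introduced for this example.

```python
def likelihood_ratios(sequence, theta, background):
    """Motif-vs-background likelihood ratio for every possible start position."""
    width = len(theta)
    ratios = []
    for start in range(len(sequence) - width + 1):
        r = 1.0
        for i in range(width):
            base = sequence[start + i]
            r *= theta[i][base] / background[base]
        ratios.append(r)
    return ratios

def oops_posteriors(ratios):
    """OOPS: exactly one site per sequence, so the posteriors sum to 1."""
    total = sum(ratios)
    return [r / total for r in ratios]

def zoops_posteriors(ratios, gamma):
    """ZOOPS: zero or one site per sequence.

    gamma is the prior probability that the sequence contains a site;
    the remaining posterior mass belongs to the "no site" outcome.
    """
    lam = gamma / len(ratios)  # prior for each individual start position
    denom = (1.0 - gamma) + lam * sum(ratios)
    return [lam * r / denom for r in ratios]

def column(consensus_base):
    """Hypothetical motif column: 0.85 for the consensus base, 0.05 otherwise."""
    return {b: (0.85 if b == consensus_base else 0.05) for b in "ACGT"}

theta = [column(b) for b in "TGA"]            # illustrative width-3 motif
background = {b: 0.25 for b in "ACGT"}        # assumed uniform background

hit = likelihood_ratios("CCTGACC", theta, background)    # contains TGA
miss = likelihood_ratios("CCCCCCC", theta, background)   # contains no site

oops_hit = oops_posteriors(hit)
oops_miss = oops_posteriors(miss)
zoops_hit = zoops_posteriors(hit, gamma=0.5)
zoops_miss = zoops_posteriors(miss, gamma=0.5)

print(round(sum(oops_miss), 3), round(sum(zoops_miss), 3))
```

Under OOPS, the sequence without a site is still forced to spread a total probability of 1 over its positions. Under ZOOPS, almost all of that mass moves to the "no site" outcome, which makes the model more robust when some input sequences lack the motif.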

Download: Google; NVIDIA

Publications:

Yongchao Liu, Bertil Schmidt, Weiguo Liu, Douglas L. Maskell: "CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units". Pattern Recognition Letters, 2010, 31(14): 2170-2177
Yongchao Liu, Bertil Schmidt, Douglas L. Maskell: "An ultrafast scalable many-core motif discovery algorithm for multiple GPUs". 10th IEEE International Workshop on High Performance Computational Biology (HiCOMB 2011), 2011, 428-434
Lakshmi Kuttippurathu, Michael Hsing, Yongchao Liu, Bertil Schmidt, Douglas L. Maskell, Kyungjoon Lee, Aibin He, William T. Pu, Sek Won Kong: "CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments". Bioinformatics, 2011, 27(5): 715-717