Motif Finding

Computational motif discovery aims to identify previously unknown motifs that are believed to be shared by a set of sequences. The approach is especially important for understanding the mechanisms that regulate gene expression, because it can identify transcription factor binding sites (TFBSs). Massively parallel sequencing technologies have enabled genome-wide de novo identification of TFBSs, a fundamental problem for fully understanding the transcriptional regulatory processes of cells. De novo motif discovery is non-trivial for large-scale genomic data because of the high computational overhead of existing motif discovery algorithms. The rapid growth of genomic sequence and gene transcription data compounds the problem and creates a strong need for time-efficient, scalable motif discovery algorithms.


CUDA-MEME is an ultrafast, scalable motif discovery algorithm for multiple GPUs. It is based on the MEME algorithm (version 4.4.0) and uses a hybrid combination of the CUDA, MPI and OpenMP parallel programming models. The algorithm has been tested on a GPU cluster with eight compute nodes and two Fermi-based Tesla S2050 (and Tesla-based Tesla S1070) quad-GPU computing systems, running Linux with the MPICH2 library. The experimental results showed that our algorithm scales well with respect to both dataset size and the number of GPUs. At present, the OOPS and ZOOPS models are supported, which are sufficient for most motif discovery applications.
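To make the two supported models concrete, the following is a minimal Python sketch of how OOPS (one occurrence per sequence) and ZOOPS (zero or one occurrence per sequence) differ when scoring candidate motif sites in a single sequence. It follows the textbook MEME formulation of site posteriors and is not taken from the CUDA-MEME source. The function names, the toy position weight matrix, and the `gamma` parameter (the assumed prior probability that a sequence contains the motif) are all hypothetical choices made for illustration.

```python
from math import prod

def site_ratios(seq, pwm, background):
    """Motif-vs-background likelihood ratio for every possible start position."""
    width = len(pwm)
    return [
        prod(pwm[k][seq[j + k]] / background[seq[j + k]] for k in range(width))
        for j in range(len(seq) - width + 1)
    ]

def oops_posteriors(seq, pwm, background):
    """OOPS: exactly one site per sequence, so the posteriors sum to 1."""
    ratios = site_ratios(seq, pwm, background)
    total = sum(ratios)
    return [r / total for r in ratios]

def zoops_posteriors(seq, pwm, background, gamma):
    """ZOOPS: a sequence has a site with prior probability gamma (assumed),
    so the posteriors sum to at most 1; the remainder is 'no site'."""
    ratios = site_ratios(seq, pwm, background)
    lam = gamma / len(ratios)  # prior for a site at one specific position
    denom = (1.0 - gamma) + lam * sum(ratios)
    return [lam * r / denom for r in ratios]

# Toy width-2 motif that strongly prefers "AC" (hypothetical values).
background = {b: 0.25 for b in "ACGT"}
pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]

with_site = oops_posteriors("TTACTT", pwm, background)
maybe_site = zoops_posteriors("TTACTT", pwm, background, gamma=0.5)
no_site = zoops_posteriors("TTTTTT", pwm, background, gamma=0.5)
```

Under OOPS every sequence is forced to contribute one site, whereas under ZOOPS a sequence with no good match (such as `"TTTTTT"` here) contributes almost no weight to the motif. That tolerance for sequences without the motif is why ZOOPS is often preferred for noisy input sets.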
Download: Google; NVIDIA