Computational motif discovery aims to identify unknown motifs that are believed to be shared by a set of sequences. This approach is especially important for understanding the mechanisms that regulate gene expression, because it can identify transcription factor binding sites (TFBSs). Massively parallel sequencing technologies have enabled genome-wide de novo identification of TFBSs, a fundamental problem for fully understanding the transcriptional regulatory processes of cells. De novo motif discovery is non-trivial for large-scale genomic data because of the high computational overhead of existing motif discovery algorithms. The rapid growth of genomic sequence and gene transcription data complicates the situation further, creating a strong need for time-efficient, scalable motif discovery algorithms.
CUDA-MEME
- Yongchao Liu, Bertil Schmidt, Weiguo Liu, Douglas L. Maskell: "CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units". Pattern Recognition Letters, 2010, 31(14): 2170-2177
- Yongchao Liu, Bertil Schmidt, Douglas L. Maskell: "An ultrafast scalable many-core motif discovery algorithm for multiple GPUs". 10th IEEE International Workshop on High Performance Computational Biology (HiCOMB 2011), 2011, 428-434
- Lakshmi Kuttippurathu, Michael Hsing, Yongchao Liu, Bertil Schmidt, Douglas L. Maskell, Kyungjoon Lee, Aibin He, William T. Pu, Sek Won Kong: "CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments". Bioinformatics, 2011, 27(5): 715-717