Bit-parallel Approximate Pattern Matching

Bit-parallelism encodes calculated values into bit arrays. Thus, multiple values can be updated simultaneously by a single operation. This technique can be used to develop efficient algorithms for string comparison problems (e.g. the approximate pattern matching problem, the longest common substring problem).

An important parameter which has direct impact on the efficiency of bit-parallel algorithms is the machine word size w (e.g. w = 32 or 64 bits for conventional CPUs). On modern high performance computing architectures, such as Intel Xeon Phi coprocessors or CUDA-enabled GPUs, it is possible to efficiently simulate 512-bit or 1024-bit words which could contribute to the implementations of bit-parallel algorithms. In addition, the massive computation resource of these types of accelerator allows the development of powerful tools to process large text.

This project aims to provide bit-parallel string comparison tools on modern accelerators, as well as on conventional multi-core CPUs. Fundamentally, these tools are specified for biological data (i.e. DNA, RNA sequences), however, they could also be extended to process other types of text data.

Websitexbitpar.sourceforge.net

Publications

  • Tran, T, Liu, Y, and Schmidt, B: "Bit-Parallel approximate pattern matching: Kepler GPU versus Xeon Phi". Parallel Computing. 2015, under review.
  • Tran, T.T, Schindel, S, Liu, Y, and Schmidt, B: "Bit-Parallel approximate pattern matching on the Xeon Phi coprocessor". 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2014), pp. 81-88.