FastMotif: spectral sequence motif discovery

Nicoló Colombo, Nikos Vlassis

Research output: Contribution to journal, Article, peer-review

Abstract

MOTIVATION: Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies.

RESULTS: We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique referred to as Method of Moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-Selex data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, while running one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity with respect to the free parameters.

AVAILABILITY AND IMPLEMENTATION: The Matlab code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics.

CONTACT: vlassis@adobe.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Original language: English
Pages (from-to): 2623-2631
Number of pages: 9
Journal: Bioinformatics
Volume: 31
Issue number: 16
Early online date: 16 Apr 2015
Publication status: Published - 15 Aug 2015

Keywords

  • Algorithms
  • Binding Sites
  • Computational Biology/methods
  • High-Throughput Nucleotide Sequencing/methods
  • Humans
  • Machine Learning
  • Models, Theoretical
  • Nucleotide Motifs/genetics
  • Transcription Factors/metabolism
