PDB-Hadoop: Parallelising Legacy applications on the Protein Databank using Apache Hadoop. / AlNasir, Jamie; Shanahan, Hugh.

In: Bioinformatics, 2015.

Research output: Contribution to journal: Article

In preparation

Abstract

We provide a framework that enables protein structure analysis tools to be run in parallel over the entire Protein Databank (PDB), or large subsets of it, using the Apache Hadoop platform. The framework is designed so that structural biologists can use Hadoop without having to write the relatively complex Java code that the platform normally requires. The framework is easily scalable and uses a mapper architecture that can function stand-alone or be extended to include further map-reduce operations.
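As an illustration of the general idea only (this is not the PDB-Hadoop code), one common way to drive an existing command-line tool from Hadoop without writing Java is Hadoop Streaming, where the mapper is any executable that reads records on stdin and writes tab-separated key/value lines on stdout. The sketch below assumes that each input record is a path to one PDB file readable on the worker node, and that the legacy tool takes the file path as its final argument; `run_legacy_tool` and `map_records` are hypothetical names chosen for this example.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper wrapping a legacy PDB analysis tool.

Assumption: each stdin line is a path to one PDB file visible to the worker
node. The real framework's input format may differ.
"""
import subprocess
import sys
from typing import Iterable, Iterator, List


def run_legacy_tool(command: List[str], pdb_path: str) -> str:
    """Run the external tool on one PDB file and return its output on one line."""
    result = subprocess.run(
        command + [pdb_path],  # assumed: the tool takes the file path last
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Record the failure as a value instead of failing the whole map task.
        return "ERROR:" + result.stderr.strip().replace("\n", " ")
    # Streaming treats each output line as a record, so flatten newlines.
    return result.stdout.strip().replace("\n", " ")


def map_records(lines: Iterable[str], command: List[str]) -> Iterator[str]:
    """Emit one 'path<TAB>tool output' line per non-empty input path."""
    for line in lines:
        pdb_path = line.strip()
        if not pdb_path:
            continue
        yield f"{pdb_path}\t{run_legacy_tool(command, pdb_path)}"


if __name__ == "__main__":
    # The legacy command comes from this script's own arguments,
    # e.g. `mapper.py /path/to/tool --some-option` (hypothetical tool).
    tool_command = sys.argv[1:]
    for record in map_records(sys.stdin, tool_command):
        print(record)
```

Passed to the streaming jar via `-mapper` and run with `-numReduceTasks 0`, a script like this yields a map-only job, analogous to the stand-alone mapper mode described above; supplying a reducer instead would correspond to extending it with further map-reduce operations.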
Original language: English
Journal: Bioinformatics
State: In preparation, 2015

ID: 23878507