PDB-Hadoop: Parallelising user applications on the protein databank using Apache Hadoop

Jamie AlNasir, Hugh Shanahan

Research output: Contribution to conference › Poster › peer-review



We present a framework for running protein structure analysis tools in parallel over the entire Protein Databank (PDB), or subsets of it, using the Apache Hadoop platform. Our design enables structural biologists to use Hadoop without having to write Map-Reduce code explicitly. It scales easily and uses a mapper architecture that can function on a stand-alone basis or be extended with further Map-Reduce operations.
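The mapper-only idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Hadoop Streaming style contract (one PDB file path per line on standard input, one tab-separated record per line on standard output), which the abstract does not confirm. The names `pdb_id_from_path`, `run_tool` and `map_lines` are hypothetical, as is the record layout of identifier, status and tool output.

```python
import os
import subprocess
import sys


def pdb_id_from_path(path):
    """Derive a key such as '1abc' from a path such as '/data/pdb1abc.ent'."""
    stem = os.path.basename(path).split(".")[0].lower()
    # Archive-style names carry a 'pdb' prefix before the four-character ID.
    return stem[3:] if stem.startswith("pdb") and len(stem) == 7 else stem


def run_tool(tool_argv, path):
    """Run the external analysis tool on one file and return (status, text)."""
    result = subprocess.run(tool_argv + [path], capture_output=True, text=True)
    if result.returncode != 0:
        return "error", result.stderr.strip()
    return "ok", result.stdout.strip()


def map_lines(lines, tool_argv):
    """Yield one tab-separated record per input path: id, status, output."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        status, text = run_tool(tool_argv, path)
        # Keep each record on one line so later steps can split on tabs.
        text = text.replace("\t", " ").replace("\n", " ")
        yield f"{pdb_id_from_path(path)}\t{status}\t{text}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: mapper.py <tool> [tool args...], paths on stdin.
    for record in map_lines(sys.stdin, sys.argv[1:]):
        print(record)
```

In this sketch the user supplies only the command line of an existing analysis tool, so no Map-Reduce code has to be written. Because the output is keyed by PDB identifier, the records could be used as they are or passed to a later reduce step.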
Original language: English
Publication status: Published - Jul 2015
Event: 3DSig Structural Bioinformatics and Computational Biophysics 2015 - Dublin, Ireland
Duration: 10 Jul 2015 to 11 Jul 2015


Conference: 3DSig Structural Bioinformatics and Computational Biophysics 2015
