The application of Hadoop in structural bioinformatics. / Alnasir, Jamie; Shanahan, Hugh.
In: Briefings in Bioinformatics, 20.11.2018, p. 1-10. Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - The application of Hadoop in structural bioinformatics
AU - Alnasir, Jamie
AU - Shanahan, Hugh
PY - 2018/11/20
Y1 - 2018/11/20
N2 - The paper reviews the use of the Hadoop platform in Structural Bioinformatics applications. Specifically, we review a number of implementations using Hadoop of high-throughput analyses, e.g. ligand-protein docking and structural alignment, and their scalability in comparison with other batch schedulers and MPI. We find that these deployments for the most part use known executables called from MapReduce rather than rewriting the algorithms. The scalability exhibits a variable behaviour in comparison with other batch schedulers, particularly as direct comparisons on the same platform are generally not available. We note there is some evidence that MPI implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of the interface and configuration of a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and AWS) increases and approaches such as the Workflow Definition Language are taken up.
AB - The paper reviews the use of the Hadoop platform in Structural Bioinformatics applications. Specifically, we review a number of implementations using Hadoop of high-throughput analyses, e.g. ligand-protein docking and structural alignment, and their scalability in comparison with other batch schedulers and MPI. We find that these deployments for the most part use known executables called from MapReduce rather than rewriting the algorithms. The scalability exhibits a variable behaviour in comparison with other batch schedulers, particularly as direct comparisons on the same platform are generally not available. We note there is some evidence that MPI implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of the interface and configuration of a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and AWS) increases and approaches such as the Workflow Definition Language are taken up.
KW - Structural Bioinformatics
KW - Hadoop
KW - Cloud computing
U2 - 10.1093/bib/bby106
DO - 10.1093/bib/bby106
M3 - Article
SP - 1
EP - 10
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
SN - 1477-4054
M1 - bby106
ER -