Fostering Reproducibility, Standardisation, Fault-tolerance and Deployability in Computational Pipelines using Nextflow Workflow Language - Adoption and Training at the ICR

Jamie Alnasir

Research output: Contribution to conference; Abstract; peer-review

Abstract

Computational pipelines are collections of workflows, or execution tasks, that chain data processing steps together; they are ubiquitous in Bioinformatics. Pipelines are widely used by different research groups at our institute, the Institute of Cancer Research (ICR).
Pipelines are often implemented in a variety of languages, such as Python or Perl, or in shell scripts such as bash. The problem with this approach is a lack of standardisation, which may result in poor interoperability, difficulties with reproducibility, and challenges in sharing, particularly with external collaborators. Effort is also duplicated when different research groups within the same institute use pipelines to perform common tasks (such as genome alignment or variant calling) with only slight variations in parameters. Such implementations fail to demonstrate best practice, which matters particularly where pipelines are used in laboratory service, clinical and diagnostic settings. Workflow languages can be used to address these issues. As Domain Specific Languages for implementing workflows and pipelines, they find application in a variety of fields such as Astronomy, Physics and, importantly, Bioinformatics, typically where High Performance Computing (HPC) is employed. In this talk, I will introduce the Nextflow workflow language and outline how the Scientific Computing department is encouraging its adoption, developing pipelines with it, and providing training to researchers and bioinformaticians across the institute. I will also outline how Nextflow will facilitate the transition to our new cluster, whose architecture will change from fully on-premises to a hybrid of on-premises and cloud.
Original language: English
Publication status: Published - 5 Jul 2019
Event: Mathematical Foundations in Bioinformatics (MatBio '19)
Duration: 5 Jul 2019 - 5 Jul 2019

Conference

Conference: Mathematical Foundations in Bioinformatics (MatBio '19)
Period: 5/07/19 - 5/07/19
