Abstract
Computational pipelines, collections of workflows or execution steps that couple data processing tasks, are ubiquitous in Bioinformatics. They are widely used by research groups across the Institute of Cancer Research (ICR).
More often than not, pipelines are implemented in a variety of languages, such as Python or Perl, or as shell scripts (for example, bash). The problem with this approach is a lack of standardisation, which may result in poor interoperability, difficulties in reproducibility, and challenges in sharing, particularly with external collaborators. Moreover, when different research groups within the same institute use pipelines to perform common tasks (such as genome alignment or variant calling) with only slight variations in parameters, effort is duplicated. Such implementations fail to demonstrate best practice, which is particularly important where pipelines are used in laboratory service, clinical, and diagnostic settings. Workflow languages can address these issues. They are Domain Specific Languages for implementing workflows and pipelines, and they find application in fields such as Astronomy, Physics and, importantly, Bioinformatics, typically where High Performance Computing (HPC) is employed. In this talk, I will introduce the Nextflow workflow language and outline how the Scientific Computing department are encouraging its adoption, developing pipelines with it, and providing training to researchers and bioinformaticians across the institute. I will also outline how Nextflow will facilitate the transition to our new cluster, whose architecture will change from fully on-premises to a hybrid of on-premises and cloud.
Original language | English |
---|---|
Publication status | Published - 5 Jul 2019 |
Event | Mathematical Foundations in Bioinformatics (MatBio '19) - Duration: 5 Jul 2019 → 5 Jul 2019 |
Conference
Conference | Mathematical Foundations in Bioinformatics (MatBio '19) |
---|---|
Period | 5/07/19 → 5/07/19 |
Activities
- Mathematical Foundations in Bioinformatics (MatBio '19)
  Alnasir, J. (Participant)
  5 Jul 2019
  Activity: Participating in or organising an event (Participation in conference)