Network Medicine Characterisation of Genetic Disorders by Propagation of Disease Phenotypic Similarities

Juan Caceres Silva

Research output: Thesis › Doctoral Thesis


Elucidating the genetic causes of diseases is central to understanding the mechanisms of action of a pathology and to developing treatments. Disease gene prediction methods streamline the discovery of the molecular basis of a disease by prioritising genes for experimental validation. Technological advances, such as high-throughput sequencing and screening techniques, have led to a growing accumulation of genomic data. Despite this growth, the mechanisms of action through which genomic variants drive disease development are not fully understood. Earlier non-experimental approaches to finding disease gene associations, such as linkage analysis or genome-wide association studies, produce either limited results or hundreds of candidates, making experimental validation expensive and time-consuming.

Modern biological networks have been exploited to capture significant features of highly complex protein interactions, leading to the rise of computational methods in network medicine. Recent network-medicine-based approaches bypass the lack of functional annotation by drawing inferences from interaction data. My approach, called Cardigan (ChARting DIsease Gene AssociatioNs), is based on a semi-supervised algorithm that propagates labels on the interactome. These labels integrate disease phenotypic information expressed as a similarity measure between diseases, obtained by mining the MeSH terms relevant to each disease and comparing them on the MeSH ontology. Thorough experimentation shows that Cardigan vastly outperforms state-of-the-art disease gene prioritisation methods. This work additionally presents an exploratory extension of the approach which, to the best of my knowledge, allows network methods to handle protein interfaces for the first time.
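To make the idea of propagating similarity-weighted labels concrete, the following is a minimal sketch and not the Cardigan implementation, whose details are not given in this abstract. It assumes a generic iterative propagation on a symmetrically normalised adjacency matrix, and it assumes that seed genes of known diseases are weighted by their disease's phenotypic similarity to the query disease. The function names, the `alpha` parameter, the toy network and the similarity values are all hypothetical.

```python
import numpy as np


def seed_scores_for(query, disease_genes, similarity, n_genes):
    """Build initial labels: each known disease gene gets the phenotypic
    similarity between its disease and the query disease (assumed scheme)."""
    y = np.zeros(n_genes)
    for disease, genes in disease_genes.items():
        weight = similarity[query][disease]
        for g in genes:
            # A gene linked to several diseases keeps its strongest label.
            y[g] = max(y[g], weight)
    return y


def propagate_labels(adjacency, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until convergence, where
    S = D^{-1/2} A D^{-1/2} is the symmetrically normalised adjacency."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    y = np.asarray(seeds, dtype=float)
    f = y.copy()
    for _ in range(max_iter):
        f_next = alpha * (S @ f) + (1.0 - alpha) * y
        if np.abs(f_next - f).max() < tol:
            return f_next
        f = f_next
    return f


# Toy interactome (hypothetical): a path of five proteins 0-1-2-3-4.
adjacency = np.zeros((5, 5))
for i in range(4):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

# Hypothetical known associations and phenotypic similarities to the query.
disease_genes = {"disease_A": [0], "disease_B": [4]}
similarity = {"query": {"disease_A": 0.9, "disease_B": 0.2}}

seeds = seed_scores_for("query", disease_genes, similarity, n_genes=5)
scores = propagate_labels(adjacency, seeds)
ranking = np.argsort(-scores)  # candidate genes, highest score first
```

In this toy setting, protein 1 sits next to the gene of the phenotypically similar disease and therefore scores higher than protein 3, which sits next to the gene of the dissimilar one. The choice of normalisation and of how similarities enter the seed labels are design decisions made here for illustration only.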

As a ramification of disease characterisation, this work presents an analysis of virus-induced lymphoid malignancies in mice, in particular the characterisation of the clonality of viral insertions to classify different stages of lymphomagenesis. The results show that rare driver mutations, which occur infrequently as clonal mutations, can be identified from late-stage samples by adding statistical support from their occurrence as subclonal mutations. Several known rare cancer drivers were found to appear as subclonal mutations in late-stage cancer samples more often than expected by chance. Another research ramification focuses on a pipeline to infer drug cocktails for the chronic phase of Chagas disease, assembled from drugs with prospective efficacy against the parasite. The drug set is obtained by homology analysis of known drug targets and enzymes found in inferred metabolic pathways for the pathogen, and by a random forest model trained on a large compound assay against the pathogen.
Original language: English
Awarding Institution
  • Department of Computer Science
  • Royal Holloway, University of London
  • Paccanaro, Alberto, Supervisor
Award date: 1 Jul 2019
Publication status: Unpublished - 2019


  • disease gene prediction
  • diffusion
  • bioinformatics
  • Computational Biology
