Learning sparse representations for predicting drug side effects, disease genes and customer preferences. / Galeano Galeano, Diego.

2020. 203 p.

Research output: Thesis › Doctoral Thesis



  • Diego Galeano PhD Thesis

    Other version, 9.35 MB, PDF document

    Embargo ends: 25/02/22

    Licence: CC BY-NC


Computational prediction methods that operate on pairs of objects are fundamental tools for understanding and modelling complex systems in biology and chemistry, and for modelling customer preferences in recommender systems. I present four sparse matrix completion models that learn a sparse representation of objects from data consisting of associations between pairs of objects. The main goal of my models is to generalise, that is, to predict new relationships between pairs of objects. This thesis addresses the following problems: (1) prediction of drug side-effect frequencies; (2) drug side-effect prediction; (3) disease-gene prediction; and (4) user preference prediction in top-N recommender systems. I show that my sparse matrix completion models can be used effectively to predict missing relationships in the data, outperforming other state-of-the-art methods. My models are designed to favour interpretability. On the task of predicting the frequencies of drug side effects, I present a new non-negative matrix factorisation algorithm that learns parts of the human anatomical system. On the task of predicting the presence or absence of drug side effects, I present a new algorithm that learns a sparse self-representation of objects, such that a given object (e.g. a side effect) is represented by a linear combination of a few other objects. In addition, my models naturally integrate structural knowledge in the form of graphs, adding strong relational inductive biases without requiring well-defined heuristics or hand-crafted features.
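The abstract describes predicting missing entries of a pair-association matrix by non-negative matrix factorisation. As a rough illustration only, the sketch below fits a generic masked NMF with Lee and Seung-style multiplicative updates, using only the observed entries, and reads predictions for unobserved pairs off the product W H. It is not the thesis's algorithm: the sparsity constraints, graph regularisation and data used in the thesis are not reproduced, and the names `masked_nmf`, `matmul` and `transpose`, as well as the toy matrix, are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of lists of lists (hypothetical helper)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Plain-Python transpose of a list of lists (hypothetical helper)."""
    return [list(col) for col in zip(*A)]


def masked_nmf(X, mask, rank, iters=300, eps=1e-12, seed=0):
    """Fit X ~= W H with W, H >= 0, using only entries where mask[i][j] == 1.

    Minimises the sum over observed (i, j) of (X[i][j] - (W H)[i][j]) ** 2
    with multiplicative updates; this is a generic sketch, not the thesis's method.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive initialisation; multiplicative updates keep entries >= 0.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    # Zero out unobserved entries so they never contribute to the fit.
    MX = [[mask[i][j] * X[i][j] for j in range(m)] for i in range(n)]

    def masked_product(W, H):
        WH = matmul(W, H)
        return [[mask[i][j] * WH[i][j] for j in range(m)] for i in range(n)]

    for _ in range(iters):
        # W <- W * ((M*X) H^T) / ((M*(W H)) H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(MX, Ht), matmul(masked_product(W, H), Ht)
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        # H <- H * (W^T (M*X)) / (W^T (M*(W H))), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, MX), matmul(Wt, masked_product(W, H))
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
    return W, H


# Toy rank-1 matrix X[i][j] = u[i] * v[j]; hide one entry and predict it.
u = [1.0, 2.0, 3.0, 4.0]
v = [1.0, 0.5, 2.0, 1.0]
X = [[ui * vj for vj in v] for ui in u]
mask = [[1] * len(v) for _ in u]
mask[0][0] = 0  # treat this pair's association as unknown
W, H = masked_nmf(X, mask, rank=1)
X_hat = matmul(W, H)
print(round(X_hat[0][0], 3))  # prediction for the hidden pair (true value is 1.0)
```

The hidden entry is passed in but masked out, so the fit never sees it; its value is recovered only from the low-rank structure of the observed pairs. The plain-Python helpers keep the sketch dependency-free; with NumPy they reduce to `@` and `.T`.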
Original language: English
Awarding Institution
Thesis sponsors
  • Becas Don Carlos Antonio Lopez (BECAL) - Paraguayan Government
Award date: 1 Mar 2020
Publication status: Unpublished - 2020


ID: 37029967