Popularity-Independence in Evaluation and Learning for Link Prediction and Recommender Systems

Research output: Thesis, Doctoral Thesis

Abstract

Networks are an important concept in many areas of business, science and technology. They describe how a group of entities interact, or link, with each other. Models that can accurately predict new links in the network are valuable. For example, social media platforms use link prediction to suggest friendships, streaming platforms use it to recommend movies, and pharmaceutical companies use it to prioritise drug candidates. In some networks, the metrics we typically use to evaluate link prediction models are highly dependent on assigning a high probability to links involving popular entities. This can lead to bad model selection because: models that appear to perform poorly can sometimes be drastically improved by trivially modifying them; performance may not generalise to unseen links; and link prediction based heavily on popularity is less meaningful and, as a result, less trustworthy.
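To make the popularity effect concrete, here is a minimal sketch of a scorer that uses node popularity alone. It is not taken from the thesis: the degree-product score (often called preferential attachment), the function name `degree_product_scores`, and the toy network are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def degree_product_scores(edges):
    """Score every unobserved node pair by the product of its endpoint degrees.

    This is a popularity-only baseline: it ignores everything about the
    network except how many links each node already has.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    observed = {frozenset(edge) for edge in edges}
    scores = {}
    for u, v in combinations(sorted(degree), 2):
        if frozenset((u, v)) not in observed:
            scores[(u, v)] = degree[u] * degree[v]
    return scores


# Hypothetical toy network: "hub" is popular, the other nodes are not.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
scores = degree_product_scores(toy_edges)

# Rank candidate links from highest to lowest score (ties broken by name).
ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
```

In this toy network the top-ranked candidate involves the hub simply because the hub is popular. A metric that rewards such rankings can favour a model without showing that it has learned anything beyond degree.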

Existing work has tried to tackle this problem by attempting to counteract an estimated (or assumed) popularity bias. We take a different approach. We propose the idea of popularity-independence in link prediction. We define a criterion for link prediction metrics to be popularity-independent, and define a metric that satisfies this criterion, the Double-Edge Swap Score. We show how the Double-Edge Swap Score differs from existing metrics by conducting experiments on eight biomedical networks where node popularity is highly predictive of links. We also explore training models to optimise a popularity-independent loss function. We conduct further experiments with a model trained in this way, and show that there is no significant performance gain compared to standard loss functions.
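The abstract does not define the Double-Edge Swap Score, so the sketch below does not attempt to implement it. It only illustrates the standard double-edge swap operation that the name presumably refers to: two links are rewired so that every node keeps exactly the same degree, which means node popularity is unchanged. The function names, the rejection rules, and the example network are assumptions made for this illustration.

```python
import random
from collections import Counter


def double_edge_swap(edges, rng, max_tries=100):
    """Return a copy of `edges` with at most one degree-preserving swap applied.

    Picks edges (a, b) and (c, d) and rewires them to (a, d) and (c, b).
    Swaps that would create a self-loop or a duplicate edge are rejected.
    If no valid swap is found within `max_tries`, the copy is returned unchanged.
    """
    edges = [tuple(edge) for edge in edges]
    present = {frozenset(edge) for edge in edges}
    for _ in range(max_tries):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap could create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would duplicate an existing edge
        edges[i], edges[j] = (a, d), (c, b)
        return edges
    return edges


def degrees(edges):
    """Count how many links each node takes part in."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical example: a six-node cycle.
original = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
swapped = double_edge_swap(original, random.Random(0))
```

Because each node loses one neighbour and gains another, `degrees(swapped)` equals `degrees(original)`. Any difference in how a model scores the original and swapped links therefore cannot be explained by popularity alone.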
Original language: English
Qualification: Ph.D.
Awarding Institution
  • Royal Holloway, University of London
Supervisors/Advisors
  • Paccanaro, Alberto, Supervisor
  • Hague, Matthew, Advisor
Award date: 1 Feb 2025
Publication status: Unpublished - 2025

Keywords

  • link prediction
  • recommender systems
  • popularity bias
  • network analysis
