Abstract
Networks are an important concept in many areas of business, science and technology. They describe how a group of entities interact, or link, with each other. Models that can accurately predict new links in the network are valuable. For example, social media platforms use link prediction to suggest friendships, streaming platforms use it to recommend movies, and pharmaceutical companies use it to prioritise drug candidates. In some networks, the metrics we typically use to evaluate link prediction models are highly dependent on assigning a high probability to links involving popular entities. This can lead to bad model selection because: models that appear to perform poorly can sometimes be drastically improved by trivially modifying them; performance may not generalise to unseen links; and link prediction based heavily on popularity is less meaningful and, as a result, less trustworthy.
Existing work has tried to tackle this problem by attempting to counteract an estimated (or assumed) popularity bias. We take a different approach. We propose the idea of popularity-independence in link prediction. We define a criterion for link prediction metrics to be popularity-independent, and define a metric that satisfies this criterion, the Double-Edge Swap Score. We show how the Double-Edge Swap Score differs from existing metrics by conducting experiments on eight biomedical networks where node popularity is highly predictive of links. We also explore training models to optimise a popularity-independent loss function. We conduct further experiments with a model trained in this way, and show that there is no significant performance gain compared to standard loss functions.
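The abstract does not spell out how the Double-Edge Swap Score is computed. As background only, the sketch below shows the standard graph operation called a double-edge swap: two edges (a, b) and (c, d) are rewired to (a, d) and (c, b), so every node keeps its degree (its popularity) while the actual links change. The function names `double_edge_swap` and `degrees` and the toy edge set are illustrative assumptions, not the thesis's implementation or its metric.

```python
import random
from collections import Counter


def double_edge_swap(edges, rng, max_tries=100):
    """Return a copy of `edges` with one double-edge swap applied.

    Picks two edges (a, b) and (c, d) and rewires them to (a, d) and (c, b).
    Swaps that share an endpoint or would duplicate an existing edge are
    rejected, so the result is still a simple undirected graph.
    """
    edge_list = sorted(tuple(sorted(e)) for e in edges)
    edge_set = set(edge_list)
    for _ in range(max_tries):
        (a, b), (c, d) = rng.sample(edge_list, 2)
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would be a self-loop or a no-op
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        return (edge_set - {(a, b), (c, d)}) | {new1, new2}
    return edge_set  # no valid swap found within max_tries


def degrees(edges):
    """Count how many edges each node participates in."""
    return Counter(node for edge in edges for node in edge)


# Toy network (hypothetical data): node 0 is the "popular" node.
edges = {(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)}
swapped = double_edge_swap(edges, random.Random(0))

print("degrees before:", dict(degrees(edges)))
print("degrees after: ", dict(degrees(swapped)))
```

Because the degree sequence is identical before and after the swap, any signal that depends only on node popularity cannot tell the original network from the rewired one.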
Original language | English |
---|---|
Qualification | Ph.D. |
Awarding Institution | |
Supervisors/Advisors | |
Thesis sponsors | |
Award date | 1 Feb 2025 |
Publication status | Unpublished - 2025 |
Keywords
- link prediction
- recommender systems
- popularity bias
- network analysis