Conformalised data synthesis

Julia A. Meister, Khuong An Nguyen

Research output: Contribution to journalArticlepeer-review

Abstract

With the proliferation of increasingly complicated Deep Learning architectures, data synthesis is a highly promising technique to address the demand of data-hungry models. However, reliably assessing the quality of a ‘synthesiser’ model’s output is an open research question with significant associated risks for high-stake domains. To address this challenge, we propose a unique synthesis algorithm that generates data from high-confidence feature space regions based on the Conformal Prediction framework. We support our proposed algorithm with a comprehensive exploration of the core parameter's influence, an indepth discussion of practical advice, and an extensive empirical evaluation of five benchmark datasets. To show our approach’s versatility on ubiquitous real-world challenges, the datasets were carefully selected for their variety of difficult characteristics: low sample count, class imbalance, and non-separability. In all trials, training sets extended with our confident synthesised data performed at least as well as the original set and frequently significantly improved Deep Learning performance by up to 61% points F1-score.
Original languageEnglish
Article number57
Number of pages37
JournalMachine Learning
Volume114
DOIs
Publication statusPublished - 6 Feb 2025

Cite this