Abstract
The conformal approach to anomaly detection was recently developed as a reliable framework for classifying examples into normal and abnormal groups based on a training set containing only normal examples. Its validity property is that a normal example, generated by the same distribution as the examples in the training set, is classified as an anomaly with probability bounded from above by a pre-selected significance level. Parallel processing of big data may require splitting the training set across several sources. We also assume that data for two or more sources may be collected in parallel and that the data distribution may differ between these sources. The contribution of this work to conformal anomaly detection is a study of ways to preserve conformal validity when the training set is obtained from heterogeneous (differently distributed) sources.
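The validity property described in the abstract can be illustrated with a minimal sketch of the standard single-source setting. This is not the method of the paper: it assumes the inductive (split) variant of conformal anomaly detection, one-dimensional data, and a hypothetical nearest-neighbour nonconformity measure. The names `nonconformity`, `calibrate`, `p_value`, and `is_anomaly` are illustrative only, and the handling of heterogeneous sources is not covered.

```python
import random

def nonconformity(x, reference):
    # Hypothetical nonconformity measure (an assumption, not from the paper):
    # distance from x to the nearest reference example.
    return min(abs(x - r) for r in reference)

def calibrate(normal_examples, n_reference):
    # Split the normal-only training set into a reference part and a
    # calibration part, then score each calibration example.
    reference = normal_examples[:n_reference]
    calibration = normal_examples[n_reference:]
    scores = [nonconformity(z, reference) for z in calibration]
    return reference, scores

def p_value(x, reference, scores):
    # Fraction of scores (calibration plus the new example itself) that are
    # at least as large as the new example's score.
    a = nonconformity(x, reference)
    return (sum(s >= a for s in scores) + 1) / (len(scores) + 1)

def is_anomaly(x, reference, scores, epsilon):
    # Flag x when its p-value does not exceed the significance level epsilon.
    return p_value(x, reference, scores) <= epsilon

# Toy usage: normal examples drawn from a standard Gaussian (assumed data).
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(200)]
reference, scores = calibrate(train, n_reference=100)
print(is_anomaly(50.0, reference, scores, epsilon=0.05))
```

If the new example is exchangeable with the calibration examples, the p-value is at most `epsilon` with probability at most `epsilon`, which is the bound on false alarms stated above. That guarantee relies on all examples sharing one distribution, which is exactly the assumption that differently distributed sources violate.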
Original language | English |
---|---|
Title of host publication | Machine Learning and Applications (ICMLA), 2016 15th IEEE International Conference on |
Publisher | IEEE Computer Society |
Pages | 1-6 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5090-6167-9 |
ISBN (Print) | 978-1-5090-6168-6 |
Publication status | E-pub ahead of print - 2 Feb 2017 |
Keywords
- conformal prediction, anomaly detection, distributed computing, validity