On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach. / Riquelme Granada, Nery; Nguyen, Khuong An; Luo, Zhiyuan.

9th International Conference on Data Science, Technology and Applications (DATA 2020). Vol. 1, 2020, p. 78-89.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Published

Standard

Riquelme Granada, Nery; Nguyen, Khuong An; Luo, Zhiyuan. On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach. 9th International Conference on Data Science, Technology and Applications (DATA 2020). Vol. 1, 2020, p. 78-89.

Harvard

Riquelme Granada, N, Nguyen, KA & Luo, Z 2020, On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach. in 9th International Conference on Data Science, Technology and Applications (DATA 2020). vol. 1, pp. 78-89. https://doi.org/10.5220/0009823200780089

APA

Riquelme Granada, N., Nguyen, K. A., & Luo, Z. (2020). On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach. In 9th International Conference on Data Science, Technology and Applications (DATA 2020) (Vol. 1, pp. 78-89). https://doi.org/10.5220/0009823200780089

Vancouver

Riquelme Granada N, Nguyen KA, Luo Z. On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach. In 9th International Conference on Data Science, Technology and Applications (DATA 2020). Vol. 1. 2020. p. 78-89. https://doi.org/10.5220/0009823200780089

Author

Riquelme Granada, Nery; Nguyen, Khuong An; Luo, Zhiyuan. / On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach. 9th International Conference on Data Science, Technology and Applications (DATA 2020). Vol. 1, 2020, pp. 78-89.

BibTeX

@inproceedings{2c30e08cfa284c8ead22c028a6eb139b,
title = "On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach",
abstract = "In the era of datasets of unprecedented sizes, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies to the original input data. Even though coresets proved to be extremely useful to accelerate unsupervised learning problems, applying them to supervised learning problems may bring unexpected computational bottlenecks. We show that this is the case for Logistic Regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing the coreset and learning from it is, in the worst case, 11 times faster than learning directly from the full input data, and 34 times faster in the best case. Furthermore, our results indicate that our accelerating approaches do not degrade the empirical performance of coresets.",
author = "{Riquelme Granada}, Nery and Nguyen, {Khuong An} and Zhiyuan Luo",
year = "2020",
month = jul,
doi = "10.5220/0009823200780089",
language = "English",
volume = "1",
pages = "78--89",
booktitle = "9th International Conference on Data Science, Technology and Applications (DATA 2020)",
}
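The BibTeX record above consists of flat `key = "value"` pairs, so its fields can be pulled out without a full BibTeX parser. A minimal stdlib-only sketch in Python (the `parse_fields` helper and `FIELD` regex are illustrative, not part of any library, and do not handle brace-delimited or nested values):

```python
import re

# Abridged copy of the entry above, used as sample input.
BIBTEX = r'''
@inproceedings{2c30e08cfa284c8ead22c028a6eb139b,
title = "On Generating Efficient Data Summaries for Logistic Regression: A Coreset-based Approach",
year = "2020",
doi = "10.5220/0009823200780089",
volume = "1",
pages = "78--89",
}
'''

# Match simple `key = "value"` pairs; quoted values only, which is
# enough for flat entries like the one above.
FIELD = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')

def parse_fields(entry: str) -> dict:
    """Return a dict of field name -> value for one flat BibTeX entry."""
    return {key.lower(): value for key, value in FIELD.findall(entry)}

fields = parse_fields(BIBTEX)
print(fields["doi"])    # 10.5220/0009823200780089
print(fields["pages"])  # 78--89
```

For anything beyond flat entries (brace values, string macros, cross-references), a dedicated parser is the safer choice; this sketch only shows how little is needed for a single exported record.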

RIS

TY - GEN

T1 - On Generating Efficient Data Summaries for Logistic Regression

T2 - A Coreset-based Approach

AU - Riquelme Granada, Nery

AU - Nguyen, Khuong An

AU - Luo, Zhiyuan

PY - 2020/7

Y1 - 2020/7

N2 - In the era of datasets of unprecedented sizes, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies to the original input data. Even though coresets proved to be extremely useful to accelerate unsupervised learning problems, applying them to supervised learning problems may bring unexpected computational bottlenecks. We show that this is the case for Logistic Regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing the coreset and learning from it is, in the worst case, 11 times faster than learning directly from the full input data, and 34 times faster in the best case. Furthermore, our results indicate that our accelerating approaches do not degrade the empirical performance of coresets.

AB - In the era of datasets of unprecedented sizes, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies to the original input data. Even though coresets proved to be extremely useful to accelerate unsupervised learning problems, applying them to supervised learning problems may bring unexpected computational bottlenecks. We show that this is the case for Logistic Regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing the coreset and learning from it is, in the worst case, 11 times faster than learning directly from the full input data, and 34 times faster in the best case. Furthermore, our results indicate that our accelerating approaches do not degrade the empirical performance of coresets.

U2 - 10.5220/0009823200780089

DO - 10.5220/0009823200780089

M3 - Conference contribution

VL - 1

SP - 78

EP - 89

BT - 9th International Conference on Data Science, Technology and Applications (DATA 2020)

ER -
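The RIS export above is line-oriented: each line is a two-character tag, a hyphen separator, and a value, with repeated tags (such as `AU`) accumulating. A small stdlib-only sketch (the `parse_ris` helper is hypothetical, written for records shaped like this one) collects the tags into lists:

```python
import re

# Abridged copy of the RIS record above, used as sample input.
RIS = """TY  - GEN
T1  - On Generating Efficient Data Summaries for Logistic Regression
T2  - A Coreset-based Approach
AU  - Riquelme Granada, Nery
AU  - Nguyen, Khuong An
AU  - Luo, Zhiyuan
DO  - 10.5220/0009823200780089
SP  - 78
EP  - 89
ER  -"""

# Tag lines: two uppercase characters, whitespace, a hyphen, then the value.
TAG = re.compile(r'^([A-Z][A-Z0-9])\s+-\s*(.*)$')

def parse_ris(text: str) -> dict:
    """Collect RIS tag/value pairs; every tag maps to a list of values."""
    record = {}
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            tag, value = match.groups()
            record.setdefault(tag, []).append(value)
    return record

rec = parse_ris(RIS)
print(len(rec["AU"]))   # 3
print(rec["SP"][0], rec["EP"][0])  # 78 89
```

Mapping everything to lists keeps the handling of single tags (`T1`, `DO`) and repeatable tags (`AU`, `KW`) uniform; a fuller implementation would also split the input on `ER` lines to support multi-record files.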