Enabling Fair ML Evaluations for Security

Feargus Pendlebury; Fabio Pierazzi; Roberto Jordaney; Johannes Kinder; Lorenzo Cavallaro

doi:10.1145/3243734.3278505

Research output: Contribution to conference › Poster › peer-review

Abstract

Machine learning is widely used in security research to classify malicious activity, ranging from malware to malicious URLs and network traffic. However, published performance numbers often seem to leave little room for improvement and, due to a wide range of datasets and configurations, cannot be used to directly compare alternative approaches; moreover, most evaluations have been found to suffer from experimental bias which positively inflates results. In this manuscript we discuss the implementation of Tesseract, an open-source tool to evaluate the performance of machine learning classifiers in a security setting mimicking a deployment with typical data feeds over an extended period of time. In particular, Tesseract allows for a fair comparison of different classifiers in a realistic scenario, without disadvantaging any given classifier. Tesseract is available as open-source to provide the academic community with a way to report sound and comparable performance results, but also to help practitioners decide which system to deploy under specific budget constraints.

Original language	English
Pages	2264-2266
Number of pages	3
DOIs	https://doi.org/10.1145/3243734.3278505
Publication status	Published - 8 Oct 2018
Event	ACM Conference on Computer and Communications Security, Beanfield Centre, Toronto, Canada (15 Oct 2018 → 19 Oct 2018), https://www.sigsac.org/ccs/CCS2018/

Conference

Conference	ACM Conference on Computer and Communications Security
Abbreviated title	CCS '18
Country/Territory	Canada
City	Toronto
Period	15/10/18 → 19/10/18
Internet address	https://www.sigsac.org/ccs/CCS2018/

Keywords

Malware
Machine Learning
Experimental Bias

Cite this

@conference{c57fead380ad433b972736e2a089d4f4,

title = "Enabling Fair ML Evaluations for Security",

abstract = "Machine learning is widely used in security research to classify malicious activity, ranging from malware to malicious URLs and network traffic. However, published performance numbers often seem to leave little room for improvement and, due to a wide range of datasets and configurations, cannot be used to directly compare alternative approaches; moreover, most evaluations have been found to suffer from experimental bias which positively inflates results. In this manuscript we discuss the implementation of Tesseract, an open-source tool to evaluate the performance of machine learning classifiers in a security setting mimicking a deployment with typical data feeds over an extended period of time. In particular, Tesseract allows for a fair comparison of different classifiers in a realistic scenario, without disadvantaging any given classifier. Tesseract is available as open-source to provide the academic community with a way to report sound and comparable performance results, but also to help practitioners decide which system to deploy under specific budget constraints.",

keywords = "Malware, Machine Learning, Experimental Bias",

author = "Feargus Pendlebury and Fabio Pierazzi and Roberto Jordaney and Johannes Kinder and Lorenzo Cavallaro",

year = "2018",

month = oct,

day = "8",

doi = "10.1145/3243734.3278505",

language = "English",

pages = "2264--2266",

note = "ACM Conference on Computer and Communications Security, CCS '18 ; Conference date: 15-10-2018 Through 19-10-2018",

url = "https://www.sigsac.org/ccs/CCS2018/",

}

TY - CONF

T1 - Enabling Fair ML Evaluations for Security

AU - Pendlebury, Feargus

AU - Pierazzi, Fabio

AU - Jordaney, Roberto

AU - Kinder, Johannes

AU - Cavallaro, Lorenzo

PY - 2018/10/8

Y1 - 2018/10/8

N2 - Machine learning is widely used in security research to classify malicious activity, ranging from malware to malicious URLs and network traffic. However, published performance numbers often seem to leave little room for improvement and, due to a wide range of datasets and configurations, cannot be used to directly compare alternative approaches; moreover, most evaluations have been found to suffer from experimental bias which positively inflates results. In this manuscript we discuss the implementation of Tesseract, an open-source tool to evaluate the performance of machine learning classifiers in a security setting mimicking a deployment with typical data feeds over an extended period of time. In particular, Tesseract allows for a fair comparison of different classifiers in a realistic scenario, without disadvantaging any given classifier. Tesseract is available as open-source to provide the academic community with a way to report sound and comparable performance results, but also to help practitioners decide which system to deploy under specific budget constraints.

KW - Malware

KW - Machine Learning

KW - Experimental Bias

U2 - 10.1145/3243734.3278505

DO - 10.1145/3243734.3278505

M3 - Poster

SP - 2264

EP - 2266

T2 - ACM Conference on Computer and Communications Security

Y2 - 15 October 2018 through 19 October 2018

ER -