Prescience: Probabilistic Guidance on the Retraining Conundrum for Malware Detection

Amit Deo; Santanu Dash; Guillermo Suarez de Tangil Rotaeche; Vladimir Vovk; Lorenzo Cavallaro

doi:10.1145/2996758.2996769

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Malware evolves perpetually and relies on increasingly sophisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.
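For readers unfamiliar with Venn-Abers predictors, the following is a minimal sketch of the general inductive Venn-Abers idea, not the authors' implementation. It assumes a held-out calibration set of (classifier score, label) pairs with label 1 standing for malware; the function names, the toy data, and the merging formula p1 / (1 - p0 + p1) are illustrative choices that do not come from this record.

```python
from collections import defaultdict


def isotonic_fit(points):
    """Least-squares non-decreasing fit; returns {score: fitted value}."""
    # Pool tied scores first: each distinct score keeps (label sum, count).
    totals = defaultdict(lambda: [0.0, 0])
    for score, label in points:
        totals[score][0] += label
        totals[score][1] += 1
    # Pool-adjacent-violators: merge neighbouring blocks while means decrease.
    blocks = []  # each block is [label_sum, count, scores_in_block]
    for score in sorted(totals):
        label_sum, count = totals[score]
        blocks.append([label_sum, count, [score]])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            last = blocks.pop()
            blocks[-1][0] += last[0]
            blocks[-1][1] += last[1]
            blocks[-1][2].extend(last[2])
    return {s: b[0] / b[1] for b in blocks for s in b[2]}


def venn_abers_interval(calibration, test_score):
    """Return (p0, p1): calibrated bounds on P(label = 1) for test_score."""
    # Refit the calibrator twice, once per hypothetical label of the test object.
    p0 = isotonic_fit(list(calibration) + [(test_score, 0)])[test_score]
    p1 = isotonic_fit(list(calibration) + [(test_score, 1)])[test_score]
    return p0, p1


def merged_probability(p0, p1):
    """Collapse the pair into one probability (assumed merging rule)."""
    return p1 / (1.0 - p0 + p1)


# Hypothetical calibration scores from any underlying classifier.
calibration = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
               (0.6, 1), (0.7, 1), (0.9, 1)]
p0, p1 = venn_abers_interval(calibration, 0.5)
print(p0, p1, merged_probability(p0, p1))
```

Because the sketch only consumes scores, it works with any underlying classifier. The gap between p0 and p1 can be read as a rough indicator of how much the calibration data supports a prediction; how the paper turns such outputs into a retraining signal is not specified in this record.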

Original language	English
Title of host publication	ACM Workshop on Artificial Intelligence and Security
Place of Publication	Vienna, Austria
Publisher	ACM
Pages	71-82
Number of pages	12
ISBN (Print)	978-1-4503-4573-6
DOIs	https://doi.org/10.1145/2996758.2996769
Publication status	Published - 28 Oct 2016

Projects

MobSec: Malware and Security in the Mobile Age
Cavallaro, L. & Kinder, J.
Engineering and Physical Sciences Research Council (EPSRC)
10/11/14 → 4/05/19
Project: Research

Mining the Network Behaviour of Bots
Cavallaro, L., Gammerman, A., Vovk, V., Shanahan, H. & Luo, Z.
Engineering and Physical Sciences Research Council (EPSRC)
16/06/13 → 17/04/17
Project: Research

Centre for Doctoral Training in Cyber Security
Cid, C., Crampton, J., Martin, K. M. & Paterson, K.
Engineering and Physical Sciences Research Council (EPSRC)
1/04/13 → 31/12/19
Project: Research

Cite this

@inproceedings{26784ac9045a494c8d1a24252363ddf4,
  title = "Prescience: Probabilistic Guidance on the Retraining Conundrum for Malware Detection",
  abstract = "Malware evolves perpetually and relies on increasingly sophisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.",
  author = "Amit Deo and Santanu Dash and {Suarez de Tangil Rotaeche}, Guillermo and Vladimir Vovk and Lorenzo Cavallaro",
  year = "2016",
  month = oct,
  day = "28",
  doi = "10.1145/2996758.2996769",
  language = "English",
  isbn = "978-1-4503-4573-6",
  pages = "71--82",
  booktitle = "ACM Workshop on Artificial Intelligence and Security",
  publisher = "ACM",
}

TY  - GEN
T1  - Prescience
T2  - Probabilistic Guidance on the Retraining Conundrum for Malware Detection
AU  - Deo, Amit
AU  - Dash, Santanu
AU  - Suarez de Tangil Rotaeche, Guillermo
AU  - Vovk, Vladimir
AU  - Cavallaro, Lorenzo
PY  - 2016/10/28
Y1  - 2016/10/28
AB  - Malware evolves perpetually and relies on increasingly sophisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.
U2  - 10.1145/2996758.2996769
DO  - 10.1145/2996758.2996769
M3  - Conference contribution
SN  - 978-1-4503-4573-6
SP  - 71
EP  - 82
BT  - ACM Workshop on Artificial Intelligence and Security
PB  - ACM
CY  - Vienna, Austria
ER  -
