Machine Learning for Security in Hostile Environments

Feargus Pendlebury

Research output: Thesis › Doctoral Thesis

Abstract

The potential for machine learning to change the world is undeniable. More data, better resources, and advances in algorithms have led to multiple breakthroughs in fields such as computer vision and natural language processing. Recently, efforts have been made to apply these methods to detection tasks in computer security, where a system detects the presence of malicious objects to prevent them from causing harm to users. However, the problem is challenging, primarily because security detectors are deployed in inherently hostile environments. In these settings, members of the malicious class actively try to avoid detection, leading to drift in the data distribution over time that violates core assumptions of machine learning. Furthermore, adversaries can apply powerful algorithms to search for adversarial examples: objects that are confidently misclassified as benign by a detector while retaining malicious functionality.

In this thesis, we explore whether machine learning is ready to be used in the security domain, given this hostile environment. We outline how adversarial behavior manifests in security data, providing novel perspectives on the relationship between concept drift and adversarial examples, as well as between feature-space and problem-space adversarial attacks. These perspectives lead us to devise a new problem-space attack, demonstrating that adversarial examples are a realistic and practical threat against malware detectors.

We discuss the difficulty of performing fair, informative evaluations of defenses in such a dynamic and volatile environment, showing how the evaluations of previous state-of-the-art detectors have been inflated by experimental bias. Through an examination of these issues, we construct actionable guidance on how to alleviate bias, allowing for clearer comparisons between drift mitigations. Finally, we propose a framework for classification with rejection, based on conformal prediction and conformal evaluation theory, that can identify and quarantine drifting examples, improving on previous work in terms of performance and runtime cost.

Ultimately, we find that machine learning remains a tantalizing solution for security detection. While challenges remain, the application of mechanisms to identify, track, and adapt to drifting and adversarial inputs can greatly raise the bar for attackers, provided that realistic evaluations are used to assess them.

Original language	English
Qualification	Ph.D.
Awarding Institution	Royal Holloway, University of London
Supervisors/Advisors	Lorenzo Cavallaro (Supervisor), Johannes Kinder (Supervisor), Martin Albrecht (Supervisor), Kenny Paterson (Advisor)
Thesis sponsors	Engineering and Physical Sciences Research Council (EPSRC)
Award date	1 Dec 2021
Publication status	Unpublished - 2021

Keywords

Machine learning
Cybersecurity
Security
Program analysis
Malware detection
Mobile security
Experimental bias
Adversarial learning
Concept drift
Conformal prediction

Access to Document

PhD Thesis - Machine Learning for Security in Hostile Environments (Other version, 11.4 MB)

Cite this

@phdthesis{a2193ce2c03249d982ff1fe8de8321a8,

title = "Machine Learning for Security in Hostile Environments",

keywords = "Machine learning, Cybersecurity, Security, Program analysis, Malware detection, Mobile security, Experimental bias, Adversarial learning, Concept drift, Conformal prediction",

author = "Feargus Pendlebury",

year = "2021",

language = "English",

school = "Royal Holloway, University of London",

}

TY - THES

T1 - Machine Learning for Security in Hostile Environments

AU - Pendlebury, Feargus

PY - 2021

Y1 - 2021

KW - Machine learning

KW - Cybersecurity

KW - Security

KW - Program analysis

KW - Malware detection

KW - Mobile security

KW - Experimental bias

KW - Adversarial learning

KW - Concept drift

KW - Conformal prediction

M3 - Doctoral Thesis

ER -