The Dose Makes the Poison — Leveraging Uncertainty for Effective Malware Detection
R. Sun, X. Yang, A. Lee, M. Bishop, D. Porter, X. Li, A. Gregio, and D. Oliveira, “The Dose Makes the Poison — Leveraging Uncertainty for Effective Malware Detection,” Proceedings of the 2017 IEEE Conference on Dependable and Secure Computing, to appear (Aug. 2017).
- Published version: not yet available
- Authors’ final version:
Malware has become increasingly sophisticated, and organizations often have no Plan B when standard lines of defense fail. Such failures can have devastating consequences, such as the exfiltration of sensitive information.
A promising avenue for improving the effectiveness of behavior-based malware detectors is to combine fast (but usually not highly accurate) traditional machine learning (ML) detectors with high-accuracy, but time-consuming, deep learning (DL) models. The main idea is to place software that receives a borderline classification from traditional ML methods in an environment where uncertainty is added, while the software is analyzed by the time-consuming DL models. The goal of the uncertainty is to rate-limit the actions of potential malware during deep analysis.
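The routing idea above can be sketched as a simple triage on the fast detector's score. This is a minimal illustration, not the paper's implementation: the function name `triage`, the score range, the `low` and `high` cut-offs, and the `"flagged"` outcome for confidently malicious software are all assumptions introduced here, since the abstract only describes how benign and borderline software are handled.

```python
def triage(ml_score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Route software based on a fast ML detector's maliciousness score.

    ml_score is assumed to lie in [0, 1], where higher means "more likely
    malicious". The cut-offs `low` and `high` are hypothetical values.
    """
    if not 0.0 <= ml_score <= 1.0:
        raise ValueError("ml_score must be in [0, 1]")
    if ml_score < low:
        # Confidently benign: run in the standard environment.
        return "standard"
    if ml_score <= high:
        # Borderline: run in the uncertain environment while the
        # slower deep learning analysis is pending.
        return "uncertain"
    # Confidently malicious: handling is not described in the abstract,
    # so this label is an assumption made for the sketch.
    return "flagged"


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, triage(score))
```

Only software in the borderline band pays the cost of the uncertain environment; everything the fast detector is confident about bypasses it.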
In this paper, we describe CHAMELEON, a Linux-based framework that implements this uncertain environment. CHAMELEON offers two environments for its OS processes: a standard environment, for software identified as benign by traditional ML detectors, and an uncertain environment, for software that received borderline classifications from those detectors. The uncertain environment places obstacles in the way of software execution through random perturbations applied probabilistically to selected system calls. We evaluated CHAMELEON with 113 applications from common benchmarks and 100 malware samples for Linux. Our results show that, at a threshold of 10%, intrusive and non-intrusive strategies caused approximately 65% of the malware to fail to accomplish its tasks, while approximately 30% of the analyzed benign software met with various levels of disruption (crashed or hampered). We also found that I/O-bound software was three times more affected by uncertainty than CPU-bound software.
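To make "random perturbations applied probabilistically on selected system calls" concrete, the following user-space sketch simulates the decision of whether to perturb a call. It is an illustration under stated assumptions, not CHAMELEON's kernel-level mechanism: the set `PERTURBABLE`, the function names, and the reading of the threshold as a per-call perturbation probability are all hypothetical.

```python
import random

# Hypothetical selection of system calls subject to perturbation.
PERTURBABLE = {"read", "write", "send", "recv"}


def should_perturb(syscall: str, threshold: float, rng: random.Random) -> bool:
    """Decide whether a single call is perturbed.

    Calls outside the selected set are never touched; selected calls are
    perturbed with probability `threshold` (assumed interpretation).
    """
    if syscall not in PERTURBABLE:
        return False
    return rng.random() < threshold


def perturbation_rate(calls: list[str], threshold: float, seed: int = 0) -> float:
    """Fraction of the given calls that would be perturbed in one simulated run."""
    if not calls:
        return 0.0
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    hits = sum(should_perturb(c, threshold, rng) for c in calls)
    return hits / len(calls)


if __name__ == "__main__":
    io_heavy = ["write"] * 9000 + ["getpid"] * 1000
    cpu_heavy = ["write"] * 1000 + ["getpid"] * 9000
    print("I/O-heavy trace:", perturbation_rate(io_heavy, 0.10))
    print("CPU-heavy trace:", perturbation_rate(cpu_heavy, 0.10))
```

In this toy model, a trace dominated by selected I/O calls is perturbed more often than one dominated by other calls, which is in line with the observation that I/O-bound software is more affected than CPU-bound software. The traces and their proportions are invented for illustration and do not reproduce the paper's measurements.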