A variety of anti-malware scanners have been developed for malware detection. Previous research has indicated that combining multiple scanners can achieve better results than any single scanner. However, given the diversity in detection rates and accuracy among anti-malware scanners, determining the best accuracy a multi-scanner system can possibly attain, and how to attain it, remain formidable tasks. In this paper, we propose three models that capture the combined output of different combinations of anti-malware scanners based on the limited historical information available. These models enable us to predict the accuracy of each combination, which helps us determine the optimal configuration of the multi-scanner detection system for maximum accuracy. We also introduce two methods to identify a near-optimal subset of scanners that reduces scanning cost under time constraints. From simulations over randomly generated hypothetical datasets and experiments with real-world malware and goodware datasets and anti-virus scanners, we found that our models perform well in predicting the optimal configuration and can achieve accuracy within 1% of the true maximum.
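To make "the optimal configuration of a multi-scanner system" concrete, the sketch below shows a brute-force baseline. It is not one of the three models proposed here. It assumes a k-of-n threshold voting rule (a sample is flagged as malware when at least k scanners in the chosen subset flag it) and scores every subset and threshold directly on labeled historical verdicts. The function names, the voting rule, and the `max_size` cap (a crude stand-in for a scanning-cost or time budget) are all hypothetical illustrations.

```python
from itertools import combinations


def vote_accuracy(verdicts, labels, subset, k):
    """Accuracy of the rule 'flag as malware iff >= k scanners in subset flag it'.

    verdicts: one row per sample, one 0/1 entry per scanner (hypothetical format).
    labels:   True for malware, False for goodware.
    """
    correct = 0
    for row, label in zip(verdicts, labels):
        flags = sum(row[i] for i in subset)  # how many chosen scanners flagged it
        predicted = flags >= k
        correct += predicted == label
    return correct / len(labels)


def best_configuration(verdicts, labels, max_size=None):
    """Exhaustively try every scanner subset and threshold k.

    max_size optionally caps the number of scanners used (an assumed proxy
    for a cost/time budget). Returns (accuracy, subset, k); ties keep the
    first configuration found.
    """
    n = len(verdicts[0])
    max_size = n if max_size is None else max_size
    best = (-1.0, (), 0)
    for size in range(1, max_size + 1):
        for subset in combinations(range(n), size):
            for k in range(1, size + 1):
                acc = vote_accuracy(verdicts, labels, subset, k)
                if acc > best[0]:
                    best = (acc, subset, k)
    return best


# Toy historical data: 3 scanners, 4 samples (made up for illustration).
verdicts = [
    [1, 1, 0],  # malware
    [1, 0, 1],  # malware
    [0, 1, 1],  # malware
    [1, 0, 0],  # goodware
]
labels = [True, True, True, False]

print(best_configuration(verdicts, labels))              # (1.0, (1, 2), 1)
print(best_configuration(verdicts, labels, max_size=1))  # (0.75, (1,), 1)
```

The search grows exponentially with the number of scanners, and each subset's score is only as reliable as the labeled history behind it. Both limitations illustrate why predicting combination accuracy from limited historical information, rather than measuring every combination directly, can be attractive.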
- Malware detection