TY - JOUR
AB - High Performance Computing (HPC) is a field concerned with solving large-scale problems in science and engineering. However, the computational infrastructure of HPC systems can also be misused as demonstrated by the recent commoditization of cloud computing resources on the black market. As a first step towards addressing this, we introduce a machine learning approach for classifying distributed parallel computations based on communication patterns between compute nodes. We first provide relevant background on message passing and computational equivalence classes called dwarfs and describe our exploratory data analysis using self organizing maps. We then present our classification results across 29 scientific codes using Bayesian networks and compare their performance against Random Forest classifiers. These models, trained with hundreds of gigabytes of communication logs collected at Lawrence Berkeley National Laboratory, perform well without any a priori information and address several shortcomings of previous approaches.

AU - Whalen, Sean
AU - Peisert, Sean
AU - Bishop, Matt
JO - Pattern Recognition Letters
Y2 - February
IS - 3
SP - 322
EP - 329
T1 - Multiclass Classification of Distributed Memory Parallel Computations
VL - 34
PY - 2013
ER -