COMPUTATIONAL RESEARCH in BOSTON and BEYOND (CRIBB)

Date	August 5, 2022
Speaker	Baolin Li, Northeastern University
Topic	Leveraging Heterogeneous Hardware Resources for Efficient Machine Learning Inference Service
Abstract	Machine learning (ML) model inference is a key service in many businesses and scientific discovery processes. Meanwhile, modern HPC systems and clouds are integrating more and more heterogeneous resources. In this talk, I will discuss various techniques we have developed to efficiently serve ML inference workloads using heterogeneous hardware resources. I will first introduce a simplified version of the inference serving problem, to which we apply an integer linear programming solver to minimize energy consumption. Next, I will highlight a few challenges in a production environment, where inference queries vary widely and users demand strict quality of service (QoS). To solve this more complicated problem, we propose RIBBON, a cost-effective and QoS-aware inference server deployed on heterogeneous cloud computing instances. RIBBON formulates resource allocation as a black-box optimization problem and devises a Bayesian Optimization-driven strategy to allocate the heterogeneous resources. Compared to existing approaches, RIBBON saves up to 16% of the inference serving cost on various representative workloads.
Biography

Acknowledgements

We thank MIT IS&T, CSAIL, and the Department of Mathematics for their generous support of this series.