COMPUTATIONAL RESEARCH in BOSTON and BEYOND (CRIBB)
Date | August 5, 2022 |
---|---|
Speaker | Baolin Li, Northeastern University |
Topic | Leveraging Heterogeneous Hardware Resources for Efficient Machine Learning Inference Service |
Abstract | Machine learning (ML) model inference is a key service in many businesses and scientific discovery processes. Meanwhile, modern HPC systems and clouds are integrating increasingly heterogeneous resources. In this talk, I will discuss techniques we have developed to serve ML inference workloads efficiently on heterogeneous hardware. I will first introduce a simplified version of the inference serving problem, to which we apply an integer linear programming solver to minimize energy consumption. Next, I will highlight several challenges in production environments, where inference queries are highly varied and users demand strict quality of service (QoS). To address this problem, we propose RIBBON, a cost-effective and QoS-aware inference server deployed on heterogeneous cloud computing instances. RIBBON formulates resource allocation as a black-box optimization problem and devises a Bayesian Optimization-driven strategy for allocating the heterogeneous resources. Compared to existing approaches, RIBBON saves up to 16% of the inference serving cost on a range of representative workloads. |
Biography |
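As a rough illustration of the simplified formulation mentioned in the abstract, the sketch below models instance selection as an integer linear program that minimizes total energy subject to a throughput constraint. This is a minimal sketch, not the speaker's implementation: the hardware types, energy and throughput figures, and the choice of the PuLP solver are all illustrative assumptions.

```python
# Minimal sketch (illustrative only): choose how many instances of each
# hardware type to run so that aggregate throughput covers demand while
# total energy is minimized -- an integer linear program (ILP).
# All numbers below are made-up placeholders, not measured values.
import pulp

# Assumed per-instance profiles: energy (W) and throughput (queries/s).
hardware = {
    "cpu":  {"energy": 120.0, "throughput": 200.0},
    "gpu":  {"energy": 300.0, "throughput": 1500.0},
    "fpga": {"energy": 80.0,  "throughput": 400.0},
}
demand = 5000.0  # required aggregate queries/s (placeholder)

prob = pulp.LpProblem("energy_min_inference_serving", pulp.LpMinimize)

# Integer decision variables: number of instances of each hardware type.
count = {
    name: pulp.LpVariable(f"n_{name}", lowBound=0, cat="Integer")
    for name in hardware
}

# Objective: total energy draw of the selected instances.
prob += pulp.lpSum(hardware[n]["energy"] * count[n] for n in hardware)

# Constraint: aggregate throughput must meet the demand.
prob += pulp.lpSum(hardware[n]["throughput"] * count[n] for n in hardware) >= demand

prob.solve(pulp.PULP_CBC_CMD(msg=False))

for name in hardware:
    print(name, int(count[name].value()))
print("total energy:", pulp.value(prob.objective))
```

The production setting RIBBON targets is harder than this toy program, since query mixes shift and QoS must hold under variability, which is why the talk turns to a Bayesian Optimization-driven search over heterogeneous instance configurations rather than a fixed ILP.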
Acknowledgements
We thank MIT IS&T, CSAIL, and the Department of Mathematics for their generous support of this series.