What is it about?
Machine learning models come in many variations with different computational requirements and accuracies. We look at sending incoming requests to different model variants as traffic levels change, to see whether we can lower cost while still keeping the system responsive. A simple sketch of this idea follows below.
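To make the idea concrete, here is a minimal Python sketch of one way such a policy could look: pick the most accurate model variant that a single replica can still keep up with at the current traffic level, within a latency budget. The variant names, numbers, and threshold logic are hypothetical illustrations, not the actual system or algorithm described in the paper.

```python
# Illustrative sketch only: a toy policy that picks a model variant based on
# current traffic. All variants, numbers, and thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class ModelVariant:
    name: str
    accuracy: float      # fraction of requests answered correctly (hypothetical)
    latency_ms: float    # per-request processing time (hypothetical)
    cost_per_1k: float   # serving cost per 1,000 requests (hypothetical)


VARIANTS = [
    ModelVariant("small",  accuracy=0.88, latency_ms=10, cost_per_1k=0.02),
    ModelVariant("medium", accuracy=0.93, latency_ms=30, cost_per_1k=0.08),
    ModelVariant("large",  accuracy=0.96, latency_ms=90, cost_per_1k=0.30),
]


def pick_variant(requests_per_sec: float, latency_budget_ms: float) -> ModelVariant:
    """Return the most accurate variant that fits the latency budget and can
    keep up with the current arrival rate (single-replica, rough estimate)."""
    feasible = []
    for v in VARIANTS:
        # Back-of-envelope throughput: how many requests one replica can
        # absorb per second if each one takes latency_ms to process.
        max_throughput = 1000.0 / v.latency_ms
        if max_throughput >= requests_per_sec and v.latency_ms <= latency_budget_ms:
            feasible.append(v)
    # Fall back to the cheapest, fastest variant when nothing fits.
    return max(feasible, key=lambda v: v.accuracy) if feasible else VARIANTS[0]


if __name__ == "__main__":
    for load in (5, 20, 80):  # light, moderate, heavy traffic (requests/sec)
        chosen = pick_variant(load, latency_budget_ms=100)
        print(f"{load:>3} req/s -> {chosen.name}")
```

Under light traffic this toy policy picks the large, most accurate variant; as traffic grows it falls back to medium and then small, trading a little accuracy for lower latency and cost.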
Featured image: Photo by Google DeepMind on Unsplash
Why is it important?
Machine learning systems are becoming increasingly common and require substantial computing resources to serve user requests. Lowering their cost while avoiding excessive accuracy loss is valuable from both a business and an environmental perspective.
Read the Original
This page is a summary of: Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems, May 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3578356.3592578.