What is it about?

There are many variations of Machine Learning models with different computation requirements and accuracies. We look at sending requests to different models when there are different levels of traffic to see if we can lower cost while trying to have a responsive system with a low cost.

Featured Image

Why is it important?

Machine Learning systems are becoming more and more common and require large resources to service these requests. Lowering their cost while avoiding excessive accuracy loss is useful from both a business and environmental perspective.

Perspectives

For better or worse machine learning algorithms are becoming more prominent. Past experience has shown that inefficient deployments can have serious economic and environmental costs (Consider the energy consumption of Bitcoin mining). Designing these systems with efficiency in mind is of critical importance as they become more prevalent.

Joseph Doyle
Queen Mary University of London

Read the Original

This page is a summary of: Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems, May 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3578356.3592578.
You can read the full text:

Read

Contributors

The following have contributed to this page