What is it about?

Modern embedded devices often need to run multiple AI tasks at the same time, but these devices have limited power and computing resources. Our research introduces MapFormer, a smart management system that helps these AI tasks work together more efficiently on embedded devices like smartphones or edge computing systems. MapFormer acts like a traffic controller, deciding which parts of the device (CPU, GPU, or specialized AI chips) should handle each AI task. It also adjusts how hard these components work to save power while maintaining good performance. Using attention mechanisms (similar to how humans focus on important information), MapFormer makes intelligent decisions about resource allocation. Our experiments show that MapFormer improves performance by about 90% compared to traditional methods, while staying within power limits. This means devices can run multiple AI tasks faster and more efficiently without draining the battery quickly, making advanced AI applications more practical on everyday devices.
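To give a flavor of the "traffic controller" idea, here is a deliberately simplified sketch (not the paper's actual model): each AI task and each device is described by a small made-up feature vector, and a dot-product attention score decides which device each task lands on. All names and feature values below are hypothetical illustrations.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical task features: [compute demand, memory demand, latency sensitivity]
tasks = {
    "object_detection": [0.9, 0.8, 0.6],
    "speech_recognition": [0.5, 0.4, 0.9],
    "text_processing": [0.3, 0.5, 0.4],
}

# Hypothetical device features: [compute capability, memory capacity, responsiveness]
devices = {
    "cpu": [0.4, 0.6, 0.7],
    "gpu": [0.9, 0.7, 0.5],
    "nvdla": [0.7, 0.4, 0.8],
}

def place(tasks, devices):
    # Attention-style placement: score every (task, device) pair with a
    # dot product, normalize with softmax, and pick the top-weighted device.
    names = list(devices)
    placement = {}
    for task, query in tasks.items():
        scores = [sum(q * k for q, k in zip(query, devices[d])) for d in names]
        weights = softmax(scores)
        placement[task] = names[max(range(len(names)), key=weights.__getitem__)]
    return placement
```

With these toy numbers, the compute-hungry detection task gravitates to the GPU while the latency-sensitive speech task lands on the NVDLA; the real system additionally learns these scores and folds in power constraints.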


Why is it important?

MapFormer addresses a critical challenge in deploying AI on edge devices like smartphones, smart cameras, and IoT sensors. These devices increasingly need to run multiple AI models simultaneously (for tasks like object detection, speech recognition, and text processing), yet operate under strict power and computational constraints. What makes our work unique is a novel attention-based mechanism that dynamically manages both task scheduling and power consumption across highly heterogeneous systems such as the NVIDIA Jetson AGX Xavier, which features multiple computing components (a CPU, a GPU, and two NVDLAs). Unlike previous approaches that focus primarily on throughput or treat power as a secondary concern, MapFormer's backend intelligently utilizes all available hardware resources to achieve higher average throughput and better power trade-offs.

A significant advancement is MapFormer's scalability: while the state of the art was limited to managing at most 5 DNNs concurrently, our attention mechanism effectively handles workloads with up to 10 neural networks, a substantial leap in complexity management for embedded AI systems. As AI deployment continues to shift toward edge computing for privacy, latency, and connectivity reasons, MapFormer provides a timely solution that could accelerate this transition by making complex AI workloads viable on everyday devices, particularly as these devices incorporate increasingly diverse computing architectures.

Perspectives

Developing MapFormer presented several fascinating technical challenges that pushed the boundaries of AI deployment on heterogeneous systems. One of the most complex was designing a backend that could efficiently handle tensor pipelines across the CPU, GPU, and NVDLAs, components that do not even share the same numerical representations (the NVDLAs are limited to FP16, while the CPU and GPU support up to FP64).

Developing the performance estimator provided a key insight that transformed our approach: we discovered that precise throughput prediction was not necessary; classifying expected performance into broader categories was sufficient. This revelation simplified our model and allowed us to evaluate the estimator's accuracy, making our framework transparent and reliable, properties that were absent in previous research.

Perhaps the most intellectually stimulating challenge was efficiently pruning the vast exploration space. Using latent-action Monte Carlo tree search, we progressively narrowed the design space to promising candidate solutions. Paired with our classification-based estimator, this approach consistently discovered mappings that surpassed state-of-the-art throughput-power trade-offs for multi-DNN workloads.

What makes this work particularly meaningful is how it turns theoretical concepts into practical solutions. By enabling more complex AI workloads on existing embedded hardware, MapFormer helps bridge the growing gap between AI capabilities and edge device constraints, potentially democratizing access to advanced AI applications beyond cloud-dependent systems.
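The interplay between a coarse performance classifier and tree search can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the "classifier" below is a hand-written stand-in that grades a mapping by how evenly it balances load across devices, and the search is a basic UCB-guided Monte Carlo tree search over per-DNN device assignments. Device names, workload size, and the reward scale are all hypothetical.

```python
import math

DEVICES = ["cpu", "gpu", "dla0", "dla1"]  # hypothetical heterogeneous platform
NUM_DNNS = 8  # hypothetical workload size

def perf_class(mapping):
    # Stand-in for a learned classifier: instead of predicting exact
    # throughput, return a coarse class (0 = poor ... 3 = excellent)
    # based only on how balanced the device assignment is.
    counts = [mapping.count(d) for d in DEVICES]
    spread = max(counts) - min(counts)
    return max(0, 3 - spread)

def search(iterations=2000, c=1.4):
    # MCTS-style search: one tree level per DNN, one branch per device.
    # stats maps (assignment prefix, device) -> [visits, total reward].
    stats = {}
    best, best_score = None, -1.0
    for _ in range(iterations):
        mapping = []
        for _ in range(NUM_DNNS):
            prefix = tuple(mapping)
            n = sum(stats.get((prefix, d), [0, 0.0])[0] for d in DEVICES) + 1
            def ucb(d):
                visits, reward = stats.get((prefix, d), [0, 0.0])
                if visits == 0:
                    return float("inf")  # always try unexplored branches first
                return reward / visits + c * math.sqrt(math.log(n) / visits)
            mapping.append(max(DEVICES, key=ucb))
        reward = perf_class(mapping) / 3.0  # normalize the class to [0, 1]
        # Backpropagate the reward along the chosen path.
        for i in range(NUM_DNNS):
            key = (tuple(mapping[:i]), mapping[i])
            visits, total = stats.get(key, [0, 0.0])
            stats[key] = [visits + 1, total + reward]
        if reward > best_score:
            best, best_score = mapping, reward
    return best, best_score
```

The coarse reward is the point of the exercise: the search only needs the classifier to tell good mappings from bad ones, not to predict throughput exactly, which is what made the estimator in our framework both simpler and auditable.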

Andreas Karatzas
Southern Illinois University Carbondale

Read the Original

This page is a summary of: MapFormer: Attention-based multi-DNN manager for throughput & power co-optimization on embedded devices, October 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3676536.3676724.
