What is it about?

AI model inference demands significant computational power, making efficient resource use on embedded devices critical. Our solution is a thread-level stream scheduling method. Leveraging the unified memory architecture of the NVIDIA Jetson Xavier NX, it binds host threads to CUDA streams so that independent work can be scheduled in parallel, raising GPU utilization. This approach significantly boosts both the throughput and speed of model inference at the edge.
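
To make the mechanism concrete, here is a minimal sketch of the underlying CUDA pattern, not the paper's actual scheduler: each host thread owns its own CUDA stream and launches work into it, so independent requests can overlap on the GPU, while managed (unified) memory mirrors the shared CPU/GPU memory of the Jetson platform. The kernel, buffer sizes, and thread count are illustrative placeholders.

```cuda
// Minimal sketch (illustrative, not the paper's implementation):
// one CUDA stream per host thread, over managed (unified) memory.
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>
#include <vector>

__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // stand-in for one inference kernel
}

void worker(int id, float* data, int n) {
    // One stream per host thread: launches from different threads land in
    // different streams and may run concurrently on the GPU.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads, 0, stream>>>(data, 1.0f + id, n);

    cudaStreamSynchronize(stream);  // wait only for this thread's work
    cudaStreamDestroy(stream);
}

int main() {
    const int kThreads = 4;        // number of host threads (illustrative)
    const int kElems   = 1 << 20;  // elements per thread's buffer

    // Managed memory: a single allocation visible to both CPU and GPU,
    // matching the unified memory model of the Jetson platform.
    std::vector<float*> buffers(kThreads);
    for (int t = 0; t < kThreads; ++t) {
        cudaMallocManaged(&buffers[t], kElems * sizeof(float));
        for (int i = 0; i < kElems; ++i) buffers[t][i] = 1.0f;
    }

    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t)
        pool.emplace_back(worker, t, buffers[t], kElems);
    for (auto& th : pool) th.join();

    printf("buffer 0, element 0 = %f\n", buffers[0][0]);  // expect 1.0
    for (float* b : buffers) cudaFree(b);
    return 0;
}
```

Launches issued into different streams may execute concurrently when GPU resources allow; that overlap is the kind of parallelism the scheduling method exploits to raise utilization.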

Why is it important?

Most existing compilation frameworks for deploying AI models on edge devices focus on general-purpose model optimizations and often overlook the specific architectural traits of embedded boards. Our work identifies a key issue: model compression can leave on-chip resources underutilized. To address this, we introduce a thread-level CUDA stream scheduling method that significantly boosts GPU utilization, thereby increasing model throughput and inference speed. This research contributes to both edge AI deployment and compiler design, demonstrating a path to lower the cost and energy consumption of AI services through more efficient hardware use.

Perspectives

Edge AI is poised to enhance intelligent capabilities across all industries. Future model deployment frameworks will be able to optimize models efficiently for specific hardware through techniques such as model architecture optimization, kernel fusion, memory management, and computation-stream scheduling. With the support of such frameworks, AI models can be deployed rapidly, boosting efficiency and improving quality of life.

Yijie Chen
Northwest A&F University

Read the Original

This page is a summary of: A Thread-level Stream Scheduling Method for Accelerating LVMs' Inference on a Resource-constrained Platform, ACM Transactions on Embedded Computing Systems, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3771550.
You can read the full text via the DOI above.
