What is it about?
Deploying and executing large model inference on edge devices is challenging because of their limited compute and memory. To address this, we present Split-and-Pipeline, a novel collaborative inference scheme that partitions a large model into multiple submodels and executes them across distributed edge devices in a pipelined manner. To avoid transmission bottlenecks between devices, the scheme parallelizes data transfer across multiple CPU cores. We demonstrate the scheme on a real-world testbed of NVIDIA Jetson edge devices, achieving a 1.2×–3.0× throughput improvement over state-of-the-art baselines.
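To illustrate the core idea, here is a minimal single-process Python sketch of pipelined inference over model partitions. The stage functions, queue wiring, and micro-batch values are hypothetical stand-ins, not the paper's implementation: in the actual system each submodel runs on a separate Jetson device and intermediate tensors move over the network, with transfers parallelized across CPU cores.

```python
import queue
import threading

# Minimal sketch of split-and-pipeline inference (assumption: each "stage"
# stands in for a submodel hosted on one edge device; in a real deployment
# the queues would be network links between devices).

def make_stage(fn, inbox, outbox):
    def worker():
        while True:
            item = inbox.get()
            if item is None:              # sentinel: propagate shutdown
                outbox.put(None)
                break
            idx, x = item
            outbox.put((idx, fn(x)))      # run this device's submodel
    return threading.Thread(target=worker, daemon=True)

# Hypothetical submodels: in practice these are partitions of one large model.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

qs = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [make_stage(fn, qs[i], qs[i + 1]) for i, fn in enumerate(stages)]
for t in threads:
    t.start()

# Feed micro-batches: while stage 1 works on batch 2, stage 2 already
# works on batch 1, so all devices stay busy.
for i, batch in enumerate([10, 20, 30, 40]):
    qs[0].put((i, batch))
qs[0].put(None)

while True:
    out = qs[-1].get()
    if out is None:
        break
    print(f"micro-batch {out[0]} -> {out[1]}")
```

The throughput gain comes from overlap: once the pipeline fills, every stage processes a different micro-batch at the same time, so per-batch latency is hidden behind steady-state throughput.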
Read the Original
This page is a summary of: Demo: Split-and-Pipeline: Collaborative Large Model Inference on Edge Devices, November 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3680207.3765592.
You can read the full text at https://doi.org/10.1145/3680207.3765592.