What is it about?
Deploying and executing large model inference on edge devices is challenging because of their limited compute and memory. To address this, we present Split-and-Pipeline, a novel collaborative inference scheme that partitions a large model into multiple submodels and executes them across distributed edge devices in a pipelined manner. To avoid transmission bottlenecks between devices, the scheme parallelizes data transfer across multiple CPU cores. We demonstrate the scheme on a real-world testbed of NVIDIA Jetson edge devices, achieving a 1.2×–3.0× throughput improvement over state-of-the-art baselines.
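To illustrate the core idea, here is a minimal single-process Python sketch of pipelined inference over model partitions. The stage functions, queue wiring, and micro-batch values are hypothetical stand-ins, not the paper's implementation: in the actual system each submodel runs on a separate Jetson device and intermediate tensors move over the network, with transfers parallelized across CPU cores.

```python
import queue
import threading

# Minimal sketch of split-and-pipeline inference (assumption: each "stage"
# stands in for a submodel hosted on one edge device; in a real deployment
# the queues would be network links between devices).

def make_stage(fn, inbox, outbox):
    def worker():
        while True:
            item = inbox.get()
            if item is None:              # sentinel: propagate shutdown
                outbox.put(None)
                break
            idx, x = item
            outbox.put((idx, fn(x)))      # run this device's submodel
    return threading.Thread(target=worker, daemon=True)

# Hypothetical submodels: in practice these are partitions of one large model.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

qs = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [make_stage(fn, qs[i], qs[i + 1]) for i, fn in enumerate(stages)]
for t in threads:
    t.start()

# Feed micro-batches: while stage 1 works on batch 2, stage 2 already
# works on batch 1, so all devices stay busy.
for i, batch in enumerate([10, 20, 30, 40]):
    qs[0].put((i, batch))
qs[0].put(None)

while True:
    out = qs[-1].get()
    if out is None:
        break
    print(f"micro-batch {out[0]} -> {out[1]}")
```

The throughput gain comes from overlap: once the pipeline fills, every stage processes a different micro-batch at the same time, so per-batch latency is hidden behind steady-state throughput.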
Read the Original
This page is a summary of: Demo: Split-and-Pipeline: Collaborative Large Model Inference on Edge Devices, November 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3680207.3765592.
You can read the full text at https://doi.org/10.1145/3680207.3765592.