What is it about?
Modern AI training often runs on GPUs located in different datacenters, and the data exchanged between them can be slowed down by the wide-area network. Existing communication libraries are designed for fast and predictable networks inside a single datacenter, so performance drops sharply across regions. SCALE-CCL provides a fast way to plan how GPUs should exchange data using only lightweight measurements of current network conditions. It produces near-optimal communication schedules in under a second and significantly reduces training time compared with standard approaches when datacenters are connected by slower or variable links.
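To make the idea of measurement-driven planning concrete, here is a minimal sketch, not SCALE-CCL's actual algorithm or API: pairwise bandwidth estimates between sites are fed to a planner that picks a communication order whose slowest wide-area link is as fast as possible (a ring all-reduce is bottlenecked by its slowest link). The function name, input format, and brute-force search below are all illustrative assumptions.

```python
"""Illustrative only: bandwidth-aware ring planning from lightweight measurements.

This is NOT the scheduling algorithm from the SCALE-CCL paper; it only sketches
the general idea of turning current network measurements into a communication plan.
"""
from itertools import permutations


def plan_ring(bandwidth_gbps):
    """Return the ring order whose slowest link is as fast as possible.

    bandwidth_gbps[i][j] is a measured estimate of the bandwidth (Gbps)
    from site i to site j (hypothetical input format).
    """
    n = len(bandwidth_gbps)
    best_order, best_bottleneck = None, -1.0
    # Fix site 0 as the ring start so rotations of the same ring are not re-checked.
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        # Bandwidth of each directed link around the ring, including the wrap-around.
        links = [bandwidth_gbps[order[k]][order[(k + 1) % n]] for k in range(n)]
        bottleneck = min(links)
        if bottleneck > best_bottleneck:
            best_order, best_bottleneck = order, bottleneck
    return best_order, best_bottleneck


if __name__ == "__main__":
    # Hypothetical measurements: 4 sites with asymmetric wide-area links (Gbps).
    measured = [
        [0, 10, 2, 5],
        [10, 0, 8, 1],
        [2, 8, 0, 9],
        [5, 1, 9, 0],
    ]
    order, bottleneck = plan_ring(measured)
    print(f"ring order: {order}, bottleneck link: {bottleneck} Gbps")
```

Brute force like this only works for a handful of sites; producing near-optimal schedules in under a second at scale, as the paper reports, would require a much cheaper planning procedure than exhaustive search.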
Read the Original
This page is a summary of: SCALE-CCL: A Scalable Collective Communication Library for Wide-Area Distributed Training, November 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3769695.3771677.