What is it about?

Modern AI training often runs on GPUs spread across multiple datacenters, and the data exchanged between them can be slowed by the wide-area network that connects them. Existing communication libraries are designed for the fast, predictable networks inside a single datacenter, so their performance drops sharply across regions. SCALE-CCL plans how GPUs should exchange data using only lightweight measurements of current network conditions. It produces near-optimal communication schedules in under a second and significantly reduces training time compared with standard approaches when datacenters are connected by slower or variable links.
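
The paragraph above describes planning communication from lightweight measurements of network conditions. The sketch below is a toy illustration of that idea, not SCALE-CCL's actual algorithm or API: it assumes a ring all-reduce across datacenters is limited by its slowest inter-datacenter link and simply searches for the ring order with the highest measured bottleneck bandwidth. The function name best_ring_order and the example bandwidth figures are invented for illustration.

```python
# Toy illustration only: not SCALE-CCL's algorithm or API.
# Assumption: a ring all-reduce across datacenters is limited by its
# slowest link, so we pick the ring order whose bottleneck (minimum)
# measured bandwidth is largest.
from itertools import permutations

def best_ring_order(bandwidth_gbps):
    """bandwidth_gbps[i][j] = measured bandwidth from datacenter i to j (Gb/s)."""
    n = len(bandwidth_gbps)
    best_order, best_bottleneck = None, -1.0
    # Fix datacenter 0 as the starting point so rotations of the same ring
    # are not counted more than once.
    for rest in permutations(range(1, n)):
        order = (0, *rest)
        links = [bandwidth_gbps[order[k]][order[(k + 1) % n]] for k in range(n)]
        bottleneck = min(links)
        if bottleneck > best_bottleneck:
            best_order, best_bottleneck = order, bottleneck
    return best_order, best_bottleneck

# Example: three datacenters with asymmetric measured bandwidths (Gb/s).
measured = [
    [0, 10, 2],
    [10, 0, 8],
    [3, 8, 0],
]
order, bottleneck = best_ring_order(measured)
print(order, bottleneck)  # -> (0, 1, 2) 3
```

A brute-force search like this is only feasible for a handful of sites; it is meant only to show how measured link conditions can drive the choice of communication schedule, which SCALE-CCL does at scale and in under a second.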

Read the Original

This page is a summary of: SCALE-CCL: A Scalable Collective Communication Library for Wide-Area Distributed Training, November 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3769695.3771677.
