What is it about?
Data preprocessing, consisting of tasks like sample resizing, cropping, and filtering, is a crucial step in machine learning (ML) workflows. Although work on optimizing training algorithms largely ignores the preprocessing step, in practice preprocessing and training are pipelined for many workloads. Popular ML frameworks like PyTorch use data loaders to feed data into model training. If this pipelining is not done carefully, it can leave GPUs idle for significant stretches while they wait for data. To address this limitation, we introduce SpeedyLoader, a system that overlaps preprocessing and training by leveraging asynchronous data preprocessing and avoiding head-of-line blocking. SpeedyLoader incorporates dedicated data loading threads, which organize preprocessed samples into queues based on their predicted processing times. Concurrently, GPUs fetch samples from these queues, ensuring that training does not stall waiting for preprocessing to complete.
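The queue-based design can be pictured with a short sketch. The Python code below is a minimal illustration of the idea, not the authors' implementation: preprocessing workers run asynchronously and route each finished sample into a queue chosen by its predicted processing time, so one slow sample cannot hold up faster ones. The names predict_processing_time and preprocess, the two-queue split, and the threshold value are assumptions made for illustration.

import queue
import threading

# Two queues bucketed by predicted per-sample preprocessing cost.
FAST = queue.Queue(maxsize=64)
SLOW = queue.Queue(maxsize=64)
COST_THRESHOLD = 0.003  # predicted seconds; illustrative value

def predict_processing_time(sample):
    # Hypothetical predictor (e.g., based on sample size); the paper
    # predicts per-sample processing time, this heuristic is a stand-in.
    return len(sample) * 1e-6

def preprocess(sample):
    # Stand-in for real work such as resizing, cropping, or filtering.
    return sample

def loader_worker(dataset):
    # Dedicated data loading thread: preprocess asynchronously and
    # enqueue each sample into the queue matching its predicted cost.
    for sample in dataset:
        out = preprocess(sample)
        q = FAST if predict_processing_time(sample) < COST_THRESHOLD else SLOW
        q.put(out)

def next_batch(batch_size):
    # Consumer (GPU) side: take whatever is ready in the fast queue first,
    # falling back to the slow one, so training is not blocked behind a
    # single expensive sample (no head-of-line blocking).
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(FAST.get_nowait())
        except queue.Empty:
            batch.append(SLOW.get())
    return batch

dataset = [b"x" * (1000 * (i % 5 + 1)) for i in range(256)]
threading.Thread(target=loader_worker, args=(dataset,), daemon=True).start()
batch = next_batch(8)  # a training loop would consume batches like this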
Why is it important?
Compared to the default PyTorch DataLoader, SpeedyLoader reduces training time by up to 30% and increases GPU usage by 4.3×, all while maintaining a consistent evaluation accuracy of 91%.
Read the Original
This page is a summary of: SpeedyLoader, April 2024, ACM (Association for Computing Machinery), DOI: 10.1145/3642970.3655824.