What is it about?
This work distributes the training workload of convolutional neural networks (CNNs) across the available heterogeneous parallel computing devices, such as CPUs and GPUs.
Why is it important?
CNNs have proven to be powerful classification tools in tasks ranging from check reading to medical diagnosis, approaching human perception and in some cases surpassing it. However, the problems to be solved are becoming larger and more complex, which translates into larger CNNs and training times so long that not even the adoption of stand-alone GPUs can keep up with them. This problem is partially addressed by using more processing units and the distributed training methods offered by several neural network frameworks, such as Caffe, Torch, or TensorFlow. However, these techniques do not take full advantage of the parallelism offered by CNNs, nor of the cooperative use of heterogeneous devices that differ in processing capability, clock speed, memory size, and other characteristics. This paper presents a new method for the parallel training of CNNs in which only the convolutional layer is distributed. Results show that this technique reduces training time without affecting classification performance, for both CPUs and GPUs.
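The paper describes its own partitioning strategy in detail; purely as an illustration, the minimal Python sketch below shows one way the general idea could look, under the assumption (not taken from the paper) that the convolutional layer's input batch is split across devices in proportion to hypothetical relative throughputs, processed in parallel, and gathered before the remaining layers. All names here (DEVICE_SPEEDS, conv2d_valid, split_batch, distributed_conv_forward) are invented for this sketch and do not come from the article.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hypothetical relative throughputs of the available devices (an assumption
# made for this sketch, not values from the paper).
DEVICE_SPEEDS = {"cpu0": 1.0, "gpu0": 4.0, "gpu1": 2.5}

def conv2d_valid(x, k):
    # Naive 'valid' 2-D convolution of a single-channel image batch.
    n, h, w = x.shape
    kh, kw = k.shape
    out = np.empty((n, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * k, axis=(1, 2))
    return out

def split_batch(batch, speeds):
    # Partition the batch across devices in proportion to their speed.
    total = sum(speeds.values())
    sizes = {d: int(round(len(batch) * s / total)) for d, s in speeds.items()}
    # Absorb any rounding drift so the chunk sizes add up to the batch size.
    first = next(iter(sizes))
    sizes[first] += len(batch) - sum(sizes.values())
    chunks, start = {}, 0
    for d, n in sizes.items():
        chunks[d] = batch[start:start + n]
        start += n
    return chunks

def distributed_conv_forward(batch, kernel, speeds=DEVICE_SPEEDS):
    # Run only the convolutional layer in parallel across the "devices",
    # then gather the outputs before the remaining (non-distributed) layers.
    chunks = split_batch(batch, speeds)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = {d: pool.submit(conv2d_valid, x, kernel)
                   for d, x in chunks.items() if len(x)}
    # Chunks are contiguous and dict order is preserved, so the sample
    # order of the original batch is retained after concatenation.
    return np.concatenate([futures[d].result() for d in futures], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.standard_normal((64, 28, 28))   # toy single-channel batch
    kernel = rng.standard_normal((5, 5))
    features = distributed_conv_forward(images, kernel)
    print(features.shape)  # (64, 24, 24)

In a real heterogeneous setup, the thread pool would be replaced by actual CPU/GPU workers and the split proportions by measured device throughputs; the sketch only conveys the scatter-compute-gather structure of distributing a single layer.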
Read the Original
This page is a summary of: Distributed Learning of CNNs on Heterogeneous CPU/GPU Architectures, Applied Artificial Intelligence, September 2018, Taylor & Francis. DOI: 10.1080/08839514.2018.1508814.