What is it about?
This paper surveys techniques for private and secure distributed deep learning (PSDDL). Deep learning models often require massive datasets that are difficult or undesirable to centralize because of cost, practicality, or privacy concerns. Distributed learning addresses this by training models on data spread across multiple devices or servers, but it also introduces new security and privacy risks.

The paper categorizes distributed learning into two main types: collaborative learning (CBL) and federated learning (FL). In CBL, data is sent to a central server for training; in FL, participants train local copies of a model on their own data and share only model updates with a central server, never the data itself. Both approaches are vulnerable to attacks that can reveal sensitive information about the training data or the model.

The survey then explores protective mechanisms. For security, which focuses on preventing unauthorized access, the main techniques are cryptographic: secure multiparty computation (MPC), which lets multiple parties jointly perform a computation without revealing their individual inputs, and homomorphic encryption (HE), which enables computation directly on encrypted data. For privacy, which aims to prevent information leakage from the model itself, the primary tool is differential privacy (DP), which adds carefully calibrated noise to model updates or outputs so that little can be inferred about any individual data point.

The survey closes by discussing the trade-offs between security and privacy on one side and accuracy and efficiency on the other: stronger protections often reduce model performance or increase computational overhead. It stresses that no single best solution exists; the right techniques depend on the application's requirements and on how the data is distributed. Open research areas include handling non-independent and identically distributed (non-IID) data, reliably measuring the privacy a deployed system actually provides, and improving the integrity of the data used in distributed learning.
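To make the federated learning workflow concrete, here is a minimal sketch of one round of federated averaging in plain Python/NumPy. It illustrates the general pattern described above (local training on private data, with only weights sent to the server), not any specific protocol from the survey; all function names and values are hypothetical.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1):
    """One step of local training (here: a linear model with squared loss).
    Each participant runs this on its own data; only the resulting
    weights -- never the raw data -- are sent back to the server."""
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)
    return global_weights - lr * grad

def federated_averaging_round(global_weights, clients):
    """Server-side aggregation: average the locally updated weights,
    weighted by each client's dataset size."""
    total = sum(len(y) for _, y in clients)
    return sum(len(y) / total * local_update(global_weights, X, y)
               for X, y in clients)

# Toy usage: three clients, each holding a private shard of data.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):
    w = federated_averaging_round(w, clients)
```

The key point is that `clients` holds data the server never sees; only the locally updated weights cross the network.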
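The cryptographic side can be sketched too. Below is a toy version of the core trick behind secure multiparty computation: additive secret sharing, in which parties jointly compute a sum without any of them learning another's input. Real MPC protocols, including those surveyed, involve far more machinery; the field modulus and function names here are illustrative only.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; shares are uniform modulo this prime

def share(value, n_parties):
    """Split an integer into n additive shares that sum to it mod PRIME.
    Any n-1 shares together reveal nothing about the value."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_inputs):
    """Each party shares its input with the others; every party locally
    adds the shares it holds; combining the per-party subtotals reveals
    only the overall sum, not any individual input."""
    n = len(private_inputs)
    all_shares = [share(v, n) for v in private_inputs]
    subtotals = [sum(s[i] for s in all_shares) % PRIME for i in range(n)]
    return sum(subtotals) % PRIME

assert secure_sum([42, 7, 100]) == 149
```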
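Likewise, the noise-adding step at the heart of differential privacy can be sketched in the same spirit. The snippet clips an update's L2 norm to bound any one participant's influence and then adds Gaussian noise scaled to that bound, in the spirit of the Gaussian mechanism; the clipping norm and noise multiplier are illustrative placeholders, not values from the paper.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update's L2 norm to bound any one participant's
    influence, then add Gaussian noise scaled to that bound."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# e.g. applied to each client's update before the server averages them
noisy = privatize_update(np.array([0.5, -2.0, 1.5]))
```

Clipping is what makes the noise scale meaningful: without a bound on each participant's contribution, no finite amount of noise could hide an individual's influence.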
Why is it important?
This survey matters because it provides a comprehensive overview of the rapidly evolving field of private and secure distributed deep learning (PSDDL), covering both security and privacy across several distributed learning paradigms. It goes beyond previous surveys, which typically focus on a single protective measure or a single type of distributed learning: its scope spans collaborative learning, federated learning, and other distributed methods such as split learning and ensemble methods, alongside both cryptographic techniques (MPC, HE) and differential privacy. This breadth makes the survey a valuable resource for researchers and practitioners who want to understand and compare PSDDL techniques, identify methods suited to their specific needs, and recognize current limitations and open research challenges in the field. In short, it offers a single, unified point of reference for navigating the complex landscape of PSDDL.
Read the Original
This page is a summary of: Private and Secure Distributed Deep Learning: A Survey, ACM Computing Surveys, November 2024, ACM (Association for Computing Machinery), DOI: 10.1145/3703452.