What is it about?

This paper surveys techniques for private and secure distributed deep learning (PSDDL). Deep learning models often require massive datasets, which can be difficult or undesirable to centralize due to cost, practicality, or privacy concerns. Distributed learning addresses this by training models on data spread across multiple devices or servers, but it introduces new security and privacy risks of its own. The paper categorizes distributed learning into two main types: collaborative learning (CBL) and federated learning (FL). In CBL, data is sent to a central server for training, while in FL, participants train local copies of a model on their own data and share only model updates with a central server, never the data itself. Both approaches are vulnerable to attacks that can reveal sensitive information about the training data or the model itself.

The survey then explores protective mechanisms. For security, which focuses on preventing unauthorized access, the main techniques are cryptographic: secure multiparty computation (MPC), which allows multiple parties to jointly perform a computation without revealing their individual inputs, and homomorphic encryption (HE), which enables computation directly on encrypted data. For privacy, which aims to prevent information leakage from the model, the primary focus is differential privacy (DP): carefully calibrated noise is added to model updates or outputs, making it difficult to infer information about individual data points.

The survey also discusses the trade-offs between security/privacy and accuracy/efficiency, since stronger security and privacy measures often come at the cost of reduced model performance or increased computational overhead. It highlights the importance of choosing techniques based on the specific requirements of the application and the nature of the data distribution, emphasizing that no single best solution exists and that careful consideration of these trade-offs is essential for designing effective PSDDL systems. Finally, it identifies open research areas, including handling data that is not independent and identically distributed (non-IID), robustly measuring the privacy a system actually provides, and improving the integrity of data used in distributed learning.
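To make the federated learning and differential privacy mechanics described above concrete, here is a minimal sketch of one federated averaging round with clipped, noised client updates, written in Python with NumPy. It is an illustration under simplifying assumptions, not the survey's algorithm: the function names (local_update, privatize, federated_round), the single-gradient-step stand-in for local training, and the clipping and noise parameters are all hypothetical choices.

    import numpy as np

    def local_update(global_weights, local_data, lr=0.1):
        # One gradient step on a least-squares objective stands in for
        # each participant's local training on its own private data.
        X, y = local_data
        grad = X.T @ (X @ global_weights - y) / len(y)
        return global_weights - lr * grad

    def privatize(update, global_weights, clip=1.0, noise_std=0.1, rng=None):
        # Differential privacy step: bound each client's influence by
        # clipping the update's norm, then add calibrated Gaussian noise.
        rng = rng if rng is not None else np.random.default_rng()
        delta = update - global_weights
        delta *= min(1.0, clip / (np.linalg.norm(delta) + 1e-12))
        return global_weights + delta + rng.normal(0.0, noise_std * clip, size=delta.shape)

    def federated_round(global_weights, client_datasets):
        # The server only ever sees privatized updates, never raw data.
        updates = [privatize(local_update(global_weights, d), global_weights)
                   for d in client_datasets]
        return np.mean(updates, axis=0)

    # Toy example: three clients, each holding a private slice of a regression task.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(3):
        X = rng.normal(size=(50, 2))
        clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

    w = np.zeros(2)
    for _ in range(200):
        w = federated_round(w, clients)
    print(w)  # close to true_w; the remaining gap reflects the privacy/accuracy trade-off

Because the server averages only clipped, noised updates, it never sees raw data or exact gradients; the residual error in the recovered weights is a small-scale illustration of the privacy/accuracy trade-off the survey discusses.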


Why is it important?

This survey is important because it provides a comprehensive overview of the rapidly evolving field of private and secure distributed deep learning (PSDDL), encompassing both security and privacy aspects across various distributed learning paradigms. It goes beyond previous surveys that typically focus on a single protective measure or a specific type of distributed learning. Its uniqueness lies in its broad scope, covering collaborative learning, federated learning, and other distributed methods like split learning and ensemble methods, alongside a discussion of both cryptographic techniques (MPC, HE) and differential privacy. This comprehensive approach makes the survey a valuable resource for researchers and practitioners seeking to understand and compare different PSDDL techniques, identify suitable methods for their specific needs, and recognize current limitations and open research challenges in the field. It essentially offers a single, unified point of reference for navigating the complex landscape of PSDDL.

Perspectives

The explosion of big data has fueled remarkable advancements in deep learning, enabling breakthroughs across many domains. However, this progress has also intensified privacy concerns, as deep learning models often require access to vast amounts of sensitive data. High-profile data leaks and breaches have further heightened public awareness and spurred privacy-preserving laws and regulations such as GDPR. The result is a fundamental tension: society benefits from accessing and processing large private datasets to improve services and develop useful applications, yet individual privacy must be protected. Distributed learning offers a promising path forward, enabling collaborative model training on decentralized data without requiring direct data sharing, but it introduces new vulnerabilities of its own. This paper addresses that juncture by surveying the state of the art in private and secure distributed deep learning (PSDDL). By examining homomorphic encryption, secure multiparty computation, and differential privacy within various distributed learning frameworks, it provides a roadmap for building and deploying deep learning models that are both powerful and privacy-preserving, tackling the tension between data utility and individual privacy rights in the age of big data.

Saba Amiri
Universiteit van Amsterdam

Read the Original

This page is a summary of: Private and Secure Distributed Deep Learning: A Survey, ACM Computing Surveys, November 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3703452.
