Federated learning enables the distributed training of machine learning models on multiple edge devices without exchanging the training data. It introduces a new learning paradigm where statistical models are trained at the edge in distributed networks.
This article covers the unique properties and associated challenges of federated learning, including the following:
- Why we need Federated Learning
- What is Federated Learning?
- Core challenges and concepts of Federated Learning
Why we need Federated Learning
Big Data and Edge-Computing Trend
Today, an immense number of connected devices such as mobile devices, wearables, and autonomous vehicles generate massive amounts of data (Big Data). Due to the fast-growing computational power of these devices, along with privacy concerns, there is an increasing need to store and process data locally – pushing the computation from the cloud to the edge.
Artificial Intelligence (AI) is needed to leverage the value of Big Data: Deep Learning has been shown to be very effective in learning from complex data. For example, deep neural network architectures have been able to outperform humans when recognizing images from the popular ImageNet dataset.
Edge Computing became the new paradigm enabling the adoption of computation-intensive applications. Edge Intelligence, or Edge AI, is a combination of AI and Edge Computing; it enables the deployment of machine learning algorithms to the edge device where the data is generated. The combination of distributed, connected systems with machine learning is also called AIoT.
Most concepts of Edge Intelligence generally focus on the inference phase (running the AI model) and assume that the training of the AI model is performed in cloud data centers, mostly due to the high resource consumption of the training phase. However, the growing computational capabilities and storage of connected edge devices enable methods for distributed training of machine learning models.
Need for Privacy-Preserving Deep Learning
Traditional machine learning approaches need to combine all data at one location, typically a cloud data center, which may violate the laws on user privacy and data confidentiality. Today, many parts of the world demand that technology companies treat user data carefully according to user-privacy laws. A prime example is the European Union’s General Data Protection Regulation (GDPR).
Federated learning is an emerging approach to preserving privacy when training deep neural networks on data originating from multiple clients. Federated machine learning addresses this problem with solutions that combine distributed machine learning, cryptography and security, and incentive mechanism design based on economic principles and game theory.
Therefore, federated learning could become the foundation of next-generation machine learning that caters to the technological and societal needs for responsible AI development and application.
What is Federated Learning?
Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices) collaboratively train a model under the orchestration of a central server (e.g., a service provider) while keeping the annotated training data decentralized. Hence, machine learning algorithms, such as deep neural networks, are trained on multiple local datasets held on local edge nodes.
Instead of aggregating the raw data to a centralized data center (Cloud) for training, federated learning leaves the raw data distributed on the client devices and trains a shared model on the server by aggregating locally computed updates.
Therefore, federated learning can mitigate many of the systemic privacy risks and costs that result from traditional, centralized machine learning approaches.
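To make this aggregation scheme concrete, the following is a minimal sketch of federated training in the style of federated averaging (FedAvg), assuming a simple linear model trained with local gradient descent and simulated clients; all function names and data here are illustrative, not part of any particular framework:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Simulate local training on one client's private data (linear model)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def federated_round(global_w, client_data):
    """One communication round: each client trains locally on its own data,
    and the server averages the resulting models, weighted by client size.
    Only model parameters travel over the network, never the raw data."""
    updates, sizes = [], []
    for X, y in client_data:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Simulated setup: three clients, each holding private samples of the
# same underlying linear relationship.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
# w is now close to true_w, even though no client shared its raw data
```

In a real deployment the local update would be stochastic gradient steps on a neural network, but the round structure (broadcast, local training, weighted aggregation) is the same.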
Federated Learning Applications
Federated learning methods play a critical role in supporting privacy-sensitive applications where the training data is distributed at the edge.
Some examples of federated learning applications include learning sentiment, semantic location, mobile phone activity, adapting to pedestrian behavior in autonomous vehicles, and predicting health events like heart attack risks from wearable devices.
There are multiple types of prominent federated learning applications:
- Smartphones. Statistical models are used to power applications such as next-word prediction, face detection, and voice recognition by jointly learning user behavior across a large pool of mobile phones. However, users may not agree to share their data to protect their personal privacy or minimize the bandwidth or battery usage of their phones. Federated learning can be used to enable predictive features on smartphones without leaking private information or diminishing the user experience.
- Organizations. In the context of federated learning, entire organizations or institutions can also be viewed as “devices”. For example, hospitals are organizations that contain a large amount of patient data for predictive healthcare applications. However, hospitals operate under strict privacy practices and may face legal, administrative, or ethical constraints that require data to remain local. Federated learning is a solution for such applications because it can reduce strain on the network and enable private learning between various devices/organizations.
- Internet of things. Modern IoT networks, such as wearable devices, autonomous vehicles, or smart homes, use sensors to collect and react to incoming data in real-time. For example, a fleet of autonomous vehicles may require an up-to-date model of traffic, construction, or pedestrian behavior to operate safely. However, building aggregate models in these scenarios may be difficult due to privacy concerns and the limited connectivity of each device. Federated learning methods enable the training of models that efficiently adapt to changes in these systems while maintaining user privacy.
Core Challenges of Federated Learning
The implementation of federated learning must address a set of key challenges:
- Efficient Communication across the federated network
- Managing heterogeneous systems in the same network
- Statistical heterogeneity of data in federated networks
- Privacy concerns and privacy-preserving methods
Communication is a key bottleneck to consider when developing methods for federated networks. Federated networks potentially include a massive number of devices (for example, millions of smartphones), and communication in the network can be many orders of magnitude slower than local computation.
Therefore, federated learning depends on communication-efficient methods that iteratively send small messages or model updates as part of the distributed training process instead of sending the entire dataset over the network. There are two main approaches to further reduce communication: (1) reducing the total number of communication rounds, or (2) reducing the size of the messages transmitted in each round.
The following are general concepts that aim to achieve communication-efficient distributed learning methods:
- Local updating methods allow for a variable number of local updates to be applied on each machine in parallel at each communication round. Thus, the goal of local updating methods is to reduce the total number of communication rounds.
- Model compression schemes such as sparsification, subsampling, and quantization can significantly reduce the size of messages communicated at each update round.
- Decentralized training. In the federated learning setting, a server connects with all remote devices. Decentralized topologies are an alternative when communication to the server becomes a bottleneck, especially when operating in low-bandwidth or high-latency networks.
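The model compression idea above can be illustrated with a small sketch combining two common techniques, top-k sparsification and uniform 8-bit quantization; the helper names and parameters are illustrative assumptions, not a standard API:

```python
import numpy as np

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries; the client transmits
    just (indices, values) instead of the full dense update."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def quantize(values, bits=8):
    """Uniform quantization of float32 values to `bits`-bit integers,
    cutting the bytes per transmitted entry by 4x."""
    scale = float(np.max(np.abs(values))) or 1.0
    levels = 2 ** (bits - 1) - 1
    q = np.round(values / scale * levels).astype(np.int8)
    return q, scale

def dequantize(q, scale, bits=8):
    """Server-side reconstruction of the quantized values."""
    levels = 2 ** (bits - 1) - 1
    return q.astype(np.float32) * scale / levels

# A simulated 1000-parameter model update.
rng = np.random.default_rng(1)
update = rng.normal(size=1000).astype(np.float32)

idx, vals = top_k_sparsify(update, k=100)  # 10x fewer entries sent
q, scale = quantize(vals)                  # 4x fewer bytes per entry

# Server reconstructs a sparse approximation of the original update.
recovered = np.zeros_like(update)
recovered[idx] = dequantize(q, scale)
```

In practice, techniques such as error feedback (accumulating the dropped residual locally for the next round) are used so that the compression error does not bias training over time.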
The storage, computational, and communication capabilities of the devices that are part of a federated network may differ significantly. Differences usually occur due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, Wi-Fi), and power supply (battery level).
Additionally, only a small fraction of the devices may be active at once. Each device may be unreliable as it is not uncommon for an edge device to drop out due to connectivity or energy constraints. Therefore, fault tolerance is important as participating devices may drop out before completing the given training iteration.
Therefore, federated learning methods have to be developed so that they (1) anticipate a low amount of participation, (2) tolerate heterogeneous hardware, and (3) are robust to dropped devices in the network.
There are some key directions to handle systems heterogeneity:
- Asynchronous communication is used to parallelize iterative optimization algorithms. Asynchronous schemes are an attractive approach to mitigating stragglers in heterogeneous environments.
- Active device sampling. Typically, only a small subset of devices participate in each round of training. Therefore, an approach is to actively select participating devices at each round with the goal of aggregating as many device updates as possible within a pre-defined time window.
- Fault tolerance. A practical approach is to ignore device failure, which may lead to bias in the device sampling scheme if the failed devices have specific data characteristics. Coded computation is another option to tolerate device failures by introducing algorithmic redundancy.
Devices frequently generate and collect data in a non-identically distributed manner across the network; for example, mobile phone users vary widely in their use of language in the context of a next-word prediction task.
Also, the number of data points across devices may vary significantly, and there may be an underlying structure that captures the relationship between devices and their associated distributions. This data generation paradigm violates the frequently used independent and identically distributed (I.I.D.) assumptions of distributed optimization, increases the likelihood of stragglers, and may add complexity in terms of modeling, analysis, and evaluation.
Challenges arise when training federated models from data that is not identically distributed across devices, both in terms of modeling the data and in terms of analyzing the convergence behavior of associated training procedures.
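When studying these convergence questions, researchers often simulate non-I.I.D. data by partitioning a centralized dataset with label skew. A common recipe (sketched below, with illustrative names) draws per-client label proportions from a Dirichlet distribution, where a small concentration parameter `alpha` yields highly non-identical partitions:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with per-class proportions
    drawn from Dirichlet(alpha). Small alpha -> each client sees only a
    few classes (highly non-IID); large alpha -> near-uniform (IID-like)."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Cut this class's samples into chunks sized by the proportions.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in zip(client_indices, np.split(idx, cuts)):
            client.extend(part)
    return client_indices

# 1000 samples with 10 balanced classes, split across 5 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
sizes = [len(p) for p in parts]  # typically very unbalanced at alpha=0.1
```

This kind of partitioning reproduces both sources of heterogeneity mentioned above: skewed label distributions and highly variable numbers of data points per device.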
Privacy concerns often motivate the need to keep raw data on each device local in federated settings. However, sharing other information such as model updates as part of the training process can also potentially reveal sensitive information, either to a third party or to the central server.
Recent methods aim to enhance the privacy of federated learning using secure multiparty computation (SMC) or differential privacy. However, these methods usually provide privacy at the cost of reduced model performance or system efficiency. Therefore, balancing these trade-offs is a considerable challenge in realizing private federated learning systems.
Recently, multiple privacy-preserving methods for machine learning have been researched. For example, the following three main strategies could be used for federated settings: Differential privacy to communicate noisy data sketches, homomorphic encryption to operate on encrypted data, and secure function evaluation or multiparty computation.
- Differential Privacy is a popular privacy approach due to its strong information-theoretic guarantees, algorithmic simplicity, and comparably small systems overhead. A randomized mechanism is differentially private if the change of one input element will not result in too much difference in the output distribution. Therefore, it is not possible to draw conclusions about whether or not a specific sample is used in the learning process. However, there exists an inherent trade-off between differential privacy and model accuracy, as adding more noise results in greater privacy but may compromise accuracy significantly.
- Homomorphic Encryption can be used to secure the learning process by computing on encrypted data. However, it has currently been applied in limited settings, e.g., training linear models or involving only a few entities.
- Secure multiparty computation (SMC) or secure function evaluation (SFE) are other options for performing privacy-preserving learning with sensitive datasets distributed across different data owners. Those protocols enable multiple parties to collaboratively compute an agreed-upon function without leaking raw input information from any party except for what can be inferred from the output. SMC is a lossless method and can retain the original accuracy with a very high privacy guarantee. To achieve even stronger privacy guarantees, SMC can be combined with differential privacy.
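The differential privacy strategy above is commonly realized by clipping each client update and adding calibrated Gaussian noise before it leaves the device (the Gaussian mechanism, as used in DP-SGD-style training). The sketch below illustrates only this sanitization step; the function name and parameter values are illustrative, and a real system would additionally track the cumulative privacy budget:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update's L2 norm to `clip_norm`, then add Gaussian noise
    scaled to that bound. Clipping limits any single client's influence;
    the noise masks individual contributions in the aggregate."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# A simulated 10-parameter client update, sanitized before transmission.
rng = np.random.default_rng(0)
update = rng.normal(size=10)
private = dp_sanitize(update, rng=rng)
```

The `noise_multiplier` directly controls the privacy/accuracy trade-off discussed above: larger values give stronger privacy guarantees but a noisier, less accurate aggregated model.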
Privacy in Federated Learning poses novel challenges to existing privacy-preserving algorithms. Most importantly, privacy-preserving methods have to offer rigorous privacy guarantees without overly compromising accuracy. Therefore, such methods have to be computationally cheap, communication-efficient, and tolerant to dropped devices.
Current implementations of privacy-preserving federated learning typically build around classical cryptographic protocols such as SMC and differential privacy. However, SMC techniques impose significant performance overheads, and their application to privacy-preserving deep learning remains an open problem.
If you want to learn more about related topics, we recommend the following articles:
- Everything you need to know about Edge AI
- Read about Deep Reinforcement Learning
- Read about Self-Supervised Learning and Supervised vs. Unsupervised Learning
- Learn about Edge Intelligence to deploy Deep Learning models