Federated learning (FL) is a promising approach for collaboratively training a machine learning model while keeping the training data decentralized and private. In FL, different participants may hold samples from different classes; some participants may have no samples from certain classes, making it impossible to learn those classes locally. In the extreme case, the data distribution is mutually exclusive (orthogonal), meaning no two client devices hold samples from the same class. Instead of sharing raw data with a central entity, participating client devices share focused model updates, which are aggregated to ensure global convergence of the shared model.
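The following minimal sketch illustrates the "share updates, not data" principle with a standard FedAvg-style aggregation round on a toy linear model; the function names and the toy model are illustrative assumptions and are not taken from the publication.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.01):
    """Each client computes an update on its private data; only the
    resulting weights leave the device, never the raw samples."""
    X, y = local_data
    preds = X @ global_weights                     # toy linear model
    grad = X.T @ (preds - y) / len(y)              # mean-squared-error gradient
    return global_weights - lr * grad              # locally updated weights

def aggregate(client_weights, client_sizes):
    """Server averages client updates, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy run: three clients with disjoint (possibly class-exclusive) local data.
rng = np.random.default_rng(0)
global_w = np.zeros(5)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
for _ in range(10):                                # communication rounds
    updates = [local_update(global_w, data) for data in clients]
    global_w = aggregate(updates, [len(y) for _, y in clients])
```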
Because manually handcrafted neural network architectures have well-known shortcomings, the research community is developing neural architecture search (NAS) approaches that automatically find networks well suited to the clients' data. A key bottleneck in FL is the cost of communication between clients and the server, yet state-of-the-art federated NAS techniques search for networks with millions of parameters that require many rounds of communication to find the optimal weights. Moreover, deploying a network with millions of parameters on edge devices (the typical participants in an FL process) is infeasible due to their computational limitations and the resulting latency.
Thus, there is a need to let clients share a subset of their locally trained networks and evaluate each other's networks on their own local data without sharing that data. Participants can then use the results of these evaluations to "estimate" how well the non-shared networks are likely to perform on other participants' data.
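As a sketch of this cross-evaluation idea: a client scores the subset of peer networks it received on its own local data, and the owner uses those scores to estimate how its non-shared networks would fare on that peer's data. The simple linear-fit estimator below is an illustrative assumption; the publication does not prescribe this particular estimator.

```python
import numpy as np

def estimate_remote_scores(local_scores_shared, remote_scores_shared,
                           local_scores_unshared):
    """Fit a linear map from 'score on my data' to 'score on peer data'
    using the shared networks, then apply it to the non-shared networks."""
    slope, intercept = np.polyfit(local_scores_shared, remote_scores_shared, 1)
    return slope * np.asarray(local_scores_unshared) + intercept

# Example: 4 shared networks were evaluated both locally and by the peer;
# 3 networks stayed local and their peer-side accuracy is estimated.
local_shared   = [0.62, 0.71, 0.80, 0.88]   # accuracy on owner's data
remote_shared  = [0.58, 0.66, 0.77, 0.84]   # accuracy reported by the peer
local_unshared = [0.68, 0.75, 0.91]
print(estimate_remote_scores(local_shared, remote_shared, local_unshared))
```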
Researchers at Arizona State University have developed a method for training small neural networks (including weight-agnostic networks) in a federated learning setting. Networks trained by different participants may have different architectures, which makes weight averaging inapplicable; weight-agnostic training addresses this. The method (1) shares a subset of networks between clients and (2) uses the performance of that subset on a given client's local data to determine whether the networks that were not shared would also perform well on that client device.
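The sketch below shows why weight-agnostic evaluation sidesteps weight averaging: each candidate architecture is scored with a single shared weight value (swept over a few values), so clients can compare topologies that differ from one another without exchanging or averaging trained weights. This follows the general weight-agnostic-network idea; the scoring details are illustrative assumptions, not the WFNAS recipe.

```python
import numpy as np

def forward(masks, x, shared_w):
    """Feed-forward pass where every active connection uses the same weight.
    `masks` is a list of 0/1 connectivity matrices defining the topology."""
    h = x
    for m in masks:
        h = np.tanh(h @ (shared_w * m))
    return h

def weight_agnostic_score(masks, X, y, weight_values=(-2.0, -1.0, 1.0, 2.0)):
    """Average accuracy over several shared weight values; no training needed."""
    accs = []
    for w in weight_values:
        preds = forward(masks, X, w).argmax(axis=-1)
        accs.append((preds == y).mean())
    return float(np.mean(accs))

# A client can rank candidate topologies on its own local data this way.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 4)), rng.integers(0, 3, size=50)
candidate = [rng.integers(0, 2, size=(4, 8)), rng.integers(0, 2, size=(8, 3))]
print(weight_agnostic_score(candidate, X, y))
```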
Related Publication: WFNAS: Weight-Agnostic Federated Neural Architecture Search
Potential Applications:
Benefits and Advantages: