Parallel and distributed training of neural networks via successive convex approximation

Di Lorenzo, Paolo; Scardapane, S.

doi:10.1109/MLSP.2016.7738894

The aim of this paper is to develop a theoretical framework for training neural network (NN) models when the data is distributed over a set of agents connected to each other through a sparse network topology. The framework builds on a distributed convexification technique and leverages dynamic consensus to propagate information over the network. It can be customized to work with the different loss and regularization functions typically used when training NN models, while guaranteeing provable convergence to a stationary solution under mild assumptions. Interestingly, it naturally leads to distributed architectures in which agents solve local optimization problems by exploiting parallel multi-core processors. Numerical results corroborate our theoretical findings and assess the performance of the framework for parallel and distributed training of neural networks.
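
As a rough illustration of the two ingredients named in the abstract (local convexification and dynamic consensus), the sketch below trains a single tanh neuron over a four-agent ring. It is not the paper's algorithm or experimental setup. The model, the synthetic data, the linearized-plus-proximal surrogate, the constant step size, and every name and constant (`tau`, `alpha`, `lam`, `W`, and so on) are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, d = 4, 20, 3                 # agents, samples per agent, parameters (assumed)
lam, tau, alpha = 1e-3, 20.0, 0.5  # L2 weight, proximal weight, step size (assumed)

# Synthetic local datasets for a single tanh neuron (illustrative only).
w_true = np.array([1.0, -0.5, 0.5])
X = [rng.standard_normal((n, d)) for _ in range(N)]
Y = [np.tanh(Xi @ w_true) for Xi in X]

# Doubly stochastic mixing matrix for a ring: each agent averages
# itself and its two neighbours with weight 1/3.
W = np.zeros((N, N))
for i in range(N):
    for j in (i - 1, i, i + 1):
        W[i, j % N] = 1.0 / 3.0

def local_loss(i, w):
    """Nonconvex squared loss of agent i."""
    return 0.5 * np.mean((np.tanh(X[i] @ w) - Y[i]) ** 2)

def local_grad(i, w):
    """Gradient of local_loss(i, .) at w."""
    p = np.tanh(X[i] @ w)
    return X[i].T @ ((p - Y[i]) * (1.0 - p ** 2)) / n

def global_loss(w):
    """Sum of all local losses plus a shared L2 regularizer."""
    return sum(local_loss(i, w) for i in range(N)) + 0.5 * lam * (w @ w)

x = 0.1 * rng.standard_normal((N, d))                  # one estimate per agent
g = np.stack([local_grad(i, x[i]) for i in range(N)])  # current local gradients
y = g.copy()                                           # average-gradient trackers

initial_loss = global_loss(x.mean(axis=0))
for _ in range(1000):
    # Each agent minimizes a strongly convex surrogate: the loss is linearized
    # using N * y as an estimate of the sum of all gradients, and a proximal
    # term and the L2 regularizer are added. The minimizer has a closed form.
    x_hat = (tau * x - N * y) / (tau + lam)
    z = x + alpha * (x_hat - x)        # move part of the way to the minimizer
    x_new = W @ z                      # consensus step on the estimates
    g_new = np.stack([local_grad(i, x_new[i]) for i in range(N)])
    y = W @ y + g_new - g              # dynamic consensus on the gradients
    x, g = x_new, g_new

final_loss = global_loss(x.mean(axis=0))
disagreement = np.max(np.abs(x - x.mean(axis=0)))
print(initial_loss, final_loss, disagreement)
```

Each agent only touches its own data and exchanges two vectors (`z` and `y`) with its neighbours. The constant step size is a simplification made for brevity. The linearized surrogate is the simplest convex choice, and a surrogate that retains more of the loss structure would turn the closed-form update into a genuine local optimization problem.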