Publication: Information Geometric Approaches for Neural Network Algorithms
Abstract
This thesis applies differential geometry to learning algorithms for stochastic multilayer perceptrons and Boltzmann machines. Traditional methods rely on gradient descent, but because the probability distributions over the algorithm outputs form a non-Euclidean space, the gradient with respect to the model parameters is not, in general, the true direction of steepest descent. We instead use the natural gradient derived from the Fisher metric on statistical manifolds. For multilayer perceptrons, the challenge lies in deriving the inverse Fisher matrix; we provide explicit forms in some simple cases and approximations for more general feedforward networks. For Boltzmann machines, we discuss the theory of exponential families, elucidating the relationships between the Fisher metric, the Kullback-Leibler divergence, and particular geometric connections on exponential families. These relationships provide a view of Boltzmann machine learning in terms of geodesics and form the foundation for another application of the natural gradient. Throughout, we provide simulation results for some of the algorithms discussed, demonstrating the practical power of the natural gradient in some cases.
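To make the central idea concrete, the following is a minimal sketch of a natural gradient update, theta <- theta - lr * F^{-1} * grad, where F is the Fisher information matrix. It is not taken from the thesis: the model (a single stochastic sigmoid unit with a Bernoulli output, i.e. logistic regression), the synthetic data, the step sizes, the damping term, and all function names are assumptions chosen for illustration. For this particular model the Fisher matrix coincides with the Hessian of the negative log-likelihood, so the update reduces to a damped Newton step; for multilayer networks that is not the case in general.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def nll_and_grad(w, X, y):
    """Mean negative log-likelihood of a Bernoulli output unit, and its gradient."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return nll, grad


def fisher_matrix(w, X):
    """Fisher information of p(y | x; w), averaged over the observed inputs."""
    p = sigmoid(X @ w)
    return (X * (p * (1 - p))[:, None]).T @ X / X.shape[0]


def gradient_step(w, X, y, lr=0.05):
    """Ordinary (Euclidean) gradient descent step; lr is an assumed value."""
    _, grad = nll_and_grad(w, X, y)
    return w - lr * grad


def natural_gradient_step(w, X, y, lr=1.0, damping=1e-4):
    """Natural gradient step: precondition the gradient by the inverse Fisher matrix.

    The damping term (an assumption, not from the thesis) keeps F invertible.
    """
    _, grad = nll_and_grad(w, X, y)
    F = fisher_matrix(w, X) + damping * np.eye(len(w))
    return w - lr * np.linalg.solve(F, grad)


def make_data(n=500, seed=0):
    """Synthetic data with badly scaled inputs (hypothetical example)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2)) * np.array([1.0, 10.0])
    w_true = np.array([1.0, -0.1])
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    return X, y


def train(step_fn, X, y, n_steps=20):
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w = step_fn(w, X, y)
    return w


if __name__ == "__main__":
    X, y = make_data()
    for name, step_fn in [("gradient", gradient_step),
                          ("natural gradient", natural_gradient_step)]:
        w = train(step_fn, X, y)
        nll, _ = nll_and_grad(w, X, y)
        print(f"{name}: w = {w}, mean NLL = {nll:.4f}")
```

The inputs are deliberately scaled very differently along the two coordinates, so the Euclidean gradient is dominated by one direction and plain gradient descent needs a small step size to remain stable. Preconditioning by the inverse Fisher matrix removes this dependence on the parameterization, which is the property the abstract refers to as following the true direction of steepest descent.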