A Deeper Look at the Unsupervised Learning of Disentangled Representations in β-VAE From the Perspective of Core Object Recognition
Sikka, Harshvardhan Digvijay
Citation: Sikka, Harshvardhan Digvijay. 2020. A Deeper Look at the Unsupervised Learning of Disentangled Representations in β-VAE From the Perspective of Core Object Recognition. Master's thesis, Harvard Extension School.
Abstract: The ability to recognize objects despite differences in appearance, known as Core Object Recognition, forms a critical part of human perception. While it is understood that the brain accomplishes Core Object Recognition through feedforward, hierarchical computations along the visual stream, the underlying algorithms that allow invariant representations to form downstream are still not well understood (DiCarlo et al., 2012). Various computational perceptual models have been built to tackle the object identification task in an artificial perceptual setting. Artificial Neural Networks, computational graphs consisting of weighted edges and mathematical operations at vertices, are loosely inspired by neural networks in the brain and have proven effective at various visual perceptual tasks, including object characterization and identification (Pinto et al., 2008; DiCarlo et al., 2012). Artificial perceptual systems often stumble when encountering the core invariance problem: identifying the same object over a spectrum of transformations and viewing conditions. A popular research direction in Machine Learning that attempts to solve this as a subset of a larger problem is to introduce inductive biases into the model itself to reflect the structure of the input data. The specific research problem explored in this thesis centers on a meaningful, bounded subset of these overarching goals. For many data analysis tasks, it is useful to learn representations in which each dimension is statistically independent of, and thus disentangled from, the others. If the underlying generative factors of the data are also statistically independent, Bayesian inference of latent variables can form disentangled representations. This thesis explores β-VAE, a generalization of the Variational Autoencoder (VAE) that aims to learn disentangled representations using variational inference.
β-VAE incorporates the hyperparameter β and enforces conditional independence of its bottleneck neurons, which is in general not compatible with the statistical independence of latent variables. This text examines the architecture and provides analytical and numerical arguments, with the goal of demonstrating that this incompatibility leads to non-monotonic inference performance in β-VAE, with a finite optimal β. Building artificial neural networks that can effectively disentangle representations is of great interest to both the neuroscience and computational perception communities (Goodfellow et al., 2016; LeCun et al., 2015). For the former, these models can inform scientific understanding of how neurons in the Ventral Visual Stream may disentangle representations of objects to ascertain their identity in real time. These systems can also give neuroscientists and computational biologists powerful analytical tools to apply to their own data, disentangling representations in underlying neural data to make better sense of what neuronal populations are doing (Gurtubay et al., 2015). For the computational perception and machine learning community, building better artificial neural networks that can disentangle representations provides a powerful foundation for more effective perceptual systems in modern technology (Geron et al., 2017). Downstream impact includes better unsupervised preprocessing for semi-supervised networks and applications in various industries, including transportation, commerce, and security (Goodfellow et al., 2016).
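As a rough illustration of how the hyperparameter β enters the model, the sketch below computes a per-sample β-VAE loss in plain Python. It is not taken from the thesis: the function names `gaussian_kl` and `beta_vae_loss`, the squared-error reconstruction term (which corresponds to a unit-variance Gaussian likelihood, up to a constant), the diagonal Gaussian posterior with a standard normal prior, and the default `beta=4.0` are all assumptions made for the example.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumes a diagonal Gaussian posterior and a standard normal prior.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-sample loss to minimize: reconstruction error plus beta-weighted KL.

    The squared-error term is an assumed stand-in for the negative
    log-likelihood; beta = 1 gives the ordinary VAE loss under these assumptions.
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    # Hypothetical values: a perfect reconstruction with a posterior mean shifted
    # away from the prior, so only the KL term contributes to the loss.
    x = [0.2, 0.8]
    mu, log_var = [1.0], [0.0]
    for beta in (1.0, 4.0):
        print(beta, beta_vae_loss(x, x, mu, log_var, beta=beta))
```

Under these assumptions, raising β penalizes any departure of the posterior from the factorized prior more heavily, which is the pressure toward independent bottleneck units that the abstract describes.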
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365075