Publication:
Internal Methods for Decomposition of Images into Shape and Appearance

Date

2022-12-20

Citation

Verbin, Dor. 2022. Internal Methods for Decomposition of Images into Shape and Appearance. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Decomposing an image into interpretable and useful components is a fundamental problem in computer vision. Decompositions involving shape are especially useful, since accurately estimated shape can often be used in downstream tasks in areas such as computer graphics, augmented reality, and robotics. While many previous approaches rely on large datasets to learn this decomposition, they often fail to use all the information internal to the image itself. This dissertation explores methods for shape and appearance decomposition that advance the state of the art on three separate tasks while remaining internal, relying only on information found inside a single image or a small collection of images. The first method models a single image as a piecewise-smooth signal, with regions of similar color grouped together and separated by a sparse set of boundaries. The shape of the boundaries is modeled using a 2D field of overlapping generalized junctions, such that these boundaries separate regions of uniform appearance. The model is formulated as an optimization problem that encourages each junction to explain the local appearance of its underlying patch while agreeing with its neighboring junctions. This model is shown to successfully recover the edges, corners, junctions, and uniform regions of an image, even under significant image noise. Despite not using an external dataset and having only a handful of tunable hyperparameters, the method is also shown to outperform large convolutional neural networks trained for the same tasks. The second method decomposes a single image of a textured object into 2.5D shape (a field of surface normals) and a stochastic appearance texture process. Optimization is formulated as a three-player game which, upon convergence, yields accurate shape and enables stochastically generating arbitrarily large flat texture samples. This formulation recovers accurate shape and texture appearance from a wider variety of textures than previous approaches. We also characterize the conditions under which this decomposition is unique and the conditions under which additional valid decompositions of the image into shape and texture exist. The third and final method decomposes a set of images of a single scene into 3D scene shape and appearance using a light field encoded in the weights of a neural network. We show that explicitly modeling reflections and encouraging surface-like geometry significantly improves the estimated shape and substantially boosts the accuracy of novel view synthesis, especially for glossy objects. After convergence, this explicit parameterization also enables editing material properties.
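
To make the structure of the first method more concrete, an objective of the kind described above can be sketched as a sum of per-patch data terms and neighbor-agreement terms. The symbols here (junction parameters \theta_p, image patch I_p, a rendering function R, neighbor set \mathcal{N}(p), and weight \lambda) are illustrative assumptions for this sketch, not the dissertation's exact formulation:

\min_{\{\theta_p\}} \; \sum_{p} \big\| I_p - R(\theta_p) \big\|^2 \;+\; \lambda \sum_{p} \sum_{q \in \mathcal{N}(p)} d\!\big(R(\theta_p), R(\theta_q)\big)

The first term asks each generalized junction to reproduce the appearance of its own patch, and the second penalizes disagreement between overlapping neighboring junctions, matching the two pressures the abstract describes.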

Keywords

Computer Vision, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
