Publication: Learning-Based Methods for Recovering Visual Structure
Abstract
Extracting explicit geometric structure from image data is a prerequisite for understanding visual scenes. In 2D, this means recovering the curvilinear boundaries that delineate objects; in 3D, it means deriving scene surfaces from multiple images. These transformations from images to structured representations are notoriously difficult inverse problems, complicated by sparse data, misleading local evidence, and geometric complexity. Historically, approaches to this challenge have split into two paradigms: "geometry-first" methods that rely on rigorous but restrictive mathematical priors, and "learning-first" methods that prioritize data-driven scalability but lack interpretability and struggle to generalize beyond their training sets. This dissertation explores two cases of structured differentiability, a synthesis of these paradigms that aims to overcome their limitations by embedding geometric objectives and inductive biases directly into differentiable, learning-based formulations.
First, in the context of boundary detection in 2D images, we introduce a lightweight network that employs a differentiable, geometry-aware attention mechanism to resolve ambiguities and recover from measurement noise. Our model decomposes an image into a field of geometric primitives, thereby preserving the geometric precision of geometry-first methods, while leveraging the inference speed and data-driven scalability of neural networks.
Second, we address the challenge of novel view synthesis, where the goal is to recover underlying surface geometry and appearance from a set of images in order to render the scene from novel viewpoints. We build upon fast and effective splatting-based methods, which represent scene structure as millions of discrete primitives defined by their shape, color, and opacity. Because existing methods depend on manual tuning, we propose a probabilistic reformulation of 3D Gaussian Splatting. Rather than relying on the heuristic split-and-prune strategies traditionally used to manage surface primitives, we define a continuous, learnable probability distribution from which primitives are sampled. This transforms the allocation of geometry from a set of rigid, discrete rules into a fully differentiable process, allowing gradient descent to concentrate representational capacity where it is needed most.
Collectively, these contributions demonstrate that coupling data-driven learning with geometrically grounded, differentiable objectives reconciles the interpretability of explicit modeling with the empirical power of deep learning, yielding recovery processes that are efficient, interpretable, and robust to real-world ambiguity.
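The idea of sampling primitives from a learnable distribution, described in the second contribution, can be illustrated with a toy sketch. The example below is not the dissertation's formulation: it assumes a single 1D Gaussian over primitive positions, a hypothetical target location standing in for "where capacity is needed", and a squared-distance loss. It uses the standard reparameterization trick (x = mu + sigma * eps) so that gradients reach the distribution parameters through the samples. All function and parameter names are invented for illustration.

```python
import math
import random


def fit_sampling_distribution(target=2.0, steps=300, n_samples=256, lr=0.1, seed=0):
    """Toy 1D example: learn (mu, log_sigma) so that sampled primitive
    positions concentrate near `target`. Names and loss are hypothetical."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0  # initial sampling distribution: N(0, 1)

    for _ in range(steps):
        sigma = math.exp(log_sigma)
        # Reparameterization: x = mu + sigma * eps, with eps ~ N(0, 1).
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        xs = [mu + sigma * e for e in eps]

        # Loss: mean squared distance of the samples from the target.
        # Pathwise gradients use dx/dmu = 1 and dx/dlog_sigma = sigma * eps.
        grad_mu = sum(2.0 * (x - target) for x in xs) / n_samples
        grad_log_sigma = sum(
            2.0 * (x - target) * sigma * e for x, e in zip(xs, eps)
        ) / n_samples

        # Plain gradient descent on the distribution parameters.
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma

    return mu, math.exp(log_sigma)


mu, sigma = fit_sampling_distribution()
print(f"mu={mu:.3f}, sigma={sigma:.3f}")
```

In this sketch the mean drifts toward the target and the spread shrinks, so the samples cluster where the loss says they are needed, with no discrete split or prune rule involved. A real splatting pipeline would replace the toy loss with a rendering loss and the single Gaussian with a much richer distribution.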