Person: Adams, Ryan Prescott
Email Address
AA Acceptance Date
Birth Date
Research Projects
Organizational Units
Job Title
Last Name
First Name
Name
Search Results
Publication Revisiting Uncertainty in Graph Cut Solutions
(Institute of Electrical and Electronics Engineers, 2012) Tarlow, Daniel; Adams, Ryan PrescottGraph cuts is a popular algorithm for finding the MAP assignment of many large-scale graphical models that are common in computer vision. While graph cuts is powerful, it does not provide information about the marginal probabilities associated with the solution it finds. To assess uncertainty, we are forced to fall back on less efficient and inexact inference algorithms such as loopy belief propagation, or use less principled surrogate representations of uncertainty such as the min-marginal approach of Kohli & Torr. In this work, we give new justification for using min-marginals to compute the uncertainty in conditional random fields, framing the min-marginal outputs as exact marginals under a specially-chosen generative probabilistic model. We leverage this view to learn properly calibrated marginal probabilities as the result of straightforward maximization of the training likelihood, showing that the necessary subgradients can be computed efficiently using dynamic graph cut operations. We also show how this approach can be extended to compute multi-label marginal distributions, where again dynamic graph cuts enable efficient marginal inference and maximum likelihood learning. We demonstrate empirically that — after proper training — uncertainties based on min-marginals provide better- calibrated probabilities than baselines and that these distributions can be exploited in a decision-theoretic way for improved segmentation in low-level vision.
Publication On Nonparametric Guidance for Learning Autoencoder Representations
(Microtome Publishing, 2012) Snoek, Jasper; Adams, Ryan Prescott; Larochelle, HugoUnsupervised discovery of latent representations, in addition to being useful for density modeling, visualisation and exploratory data analysis, is also increasingly important for learning features relevant to discriminative tasks. Autoencoders, in particular, have proven to be an effective way to learn latent codes that reflect meaningful variations in data. A continuing challenge, however, is guiding an autoencoder toward representations that are useful for particular tasks. A complementary challenge is to find codes that are invariant to irrelevant transformations of the data. The most common way of introducing such problem-specific guidance in autoencoders has been through the incorporation of a parametric component that ties the latent representation to the label information. In this work, we argue that a preferable approach relies instead on a nonparametric guidance mechanism. Conceptually, it ensures that there exists a function that can predict the label information, without explicitly instantiating that function. The superiority of this guidance mechanism is con- firmed on two datasets. In particular, this approach is able to incorporate invariance information (lighting, elevation, etc.) from the small NORB object recognition dataset and yields state-of-the-art performance for a single layer, non-convolutional network.
Publication Randomized Optimum Models for Structured Prediction
(Microtome Publishing, 2012) Tarlow, Daniel; Adams, Ryan Prescott; Zemel, Richard S.One approach to modeling structured discrete data is to describe the probability of states via an energy function and Gibbs distribution. A recurring difficulty in these models is the computation of the partition function, which may require an intractable sum. However, in many such models, the mode can be found efficiently even when the partition function is unavailable. Recent work on Perturb-and-MAP (PM) models (Papandreou and Yuille, 2011) has exploited this discrepancy to approximate the Gibbs distribution for Markov random fields (MRFs). Here, we explore a broader class of models, called Randomized Optimum models (RandOMs), which include PM as a special case. This new class of models encompasses not only MRFs, but also other models that have intractable partition functions yet permit efficient mode-finding, such as those based on bipartite matchings, shortest paths, or connected components in a graph. We develop likelihood-based learning algorithms for RandOMs, which, empirical results indicate, can produce better models than PM.
Publication Computational Caches
(ACM Press, 2013) Waterland, Amos; Angelino, Elaine Lee; Cubuk, Ekin; Kaxiras, Efthimios; Adams, Ryan Prescott; Appavoo, Jonathan; Seltzer, MargoCaching is a well-known technique for speeding up computation. We cache data from file systems and databases; we cache dynamically generated code blocks; we cache page translations in TLBs. We propose to cache the act of computation, so that we can apply it later and in different contexts. We use a state-space model of computation to support such caching, involving two interrelated parts: speculatively memoized predicted/resultant state pairs that we use to accelerate sequential computation, and trained probabilistic models that we use to generate predicted states from which to speculatively execute. The key techniques that make this approach feasible are designing probabilistic models that automatically focus on regions of program execution state space in which prediction is tractable and identifying state space equivalence classes so that predictions need not be exact.
Publication Multi-Task Bayesian Optimization
(Curran Associates, Inc., 2013) Swersky, Kevin; Snoek, Jasper; Adams, Ryan PrescottBayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up (k)-fold cross-validation. Lastly, our most significant contribution is an adaptation of a recently proposed acquisition function, entropy search, to the cost-sensitive and multi-task settings. We demonstrate the utility of this new acquisition function by utilizing a small dataset in order to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
Publication Message Passing Inference with Chemical Reaction Networks
(Curran Associates, Inc., 2013) Napp, Nils Edmund; Adams, Ryan PrescottRecent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale. Just as in macroscale robotics, it is critical that such devices can learn about their environment and reason under uncertainty. At this small scale, systems are typically modeled as chemical reaction networks. In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. In particular, we show that marginalization based on sum-product message passing can be implemented in terms of reactions between chemical species whose concentrations represent probabilities. We show algebraically that the steady state concentration of these species correspond to the marginal distributions of the random variables in the graph and validate the results in simulations. As with standard sum-product inference, this procedure yields exact results for tree-structured graphs, and approximate solutions for loopy graphs.
Publication Contrastive Learning Using Spectral Methods
(Neural Information Processing Systems Foundation, 2013) Zou, James; Hsu, Daniel; Parkes, David; Adams, Ryan PrescottIn many natural settings, the analysis goal is not to characterize a single data set in isolation, but rather to understand the difference between one set of observations and another. For example, given a background corpus of news articles together with writings of a particular author, one may want a topic model that explains word patterns and themes specific to the author. Another example comes from genomics, in which biological signals may be collected from different regions of a genome, and one wants a model that captures the differential statistics observed in these regions. This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. The method builds on recent moment-based estimators and tensor decompositions for latent variable models, and has the intuitive feature of using background data statistics to appropriately modify moments estimated from foreground data. A key advantage of the method is that the background data need only be coarsely modeled, which is important when the background is too complex, noisy, or not of interest. The method is demonstrated on applications in contrastive topic modeling and genomic sequence analysis.
Publication A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
(Curran Associates, Inc., 2013) Snoek, Jasper; Zemel, Richard; Adams, Ryan PrescottPoint processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data.
Publication Bootstrap Learning Via Modular Concept Discovery
(AAAI Press/International Joint Conferences on Artificial Intelligence, 2013) Dechter, Eyal; Malmaud, Jonathan; Adams, Ryan Prescott; Tenenbaum, Joshua B.Suppose a learner is faced with a domain of problems about which it knows nearly nothing. It does not know the distribution of problems, the space of solutions is not smooth, and the reward signal is uninformative, providing perhaps a few bits of information but not enough to steer the learner effectively. How can such a learner ever get off the ground? A common intuition is that if the solutions to these problems share a common structure, and the learner can solve some simple problems by brute force, it should be able to extract useful components from these solutions and, by composing them, explore the solution space more efficiently. Here, we formalize this intuition, where the solution space is that of typed functional programs and the gained information is stored as a stochastic grammar over programs. We propose an iterative procedure for exploring such spaces: in the first step of each iteration, the learner explores a finite subset of the domain, guided by a stochastic grammar; in the second step, the learner compresses the successful solutions from the first step to estimate a new stochastic grammar. We test this procedure on symbolic regression and Boolean circuit learning and show that the learner discovers modular concepts for these domains. Whereas the learner is able to solve almost none of the posed problems in the procedure’s first iteration, it rapidly becomes able to solve a large number by gaining abstract knowledge of the structure of the solution space.
Publication Learning Outcome-Discriminative Dynamics in Multivariate Physiological Cohort Time Series
(Institute of Electrical and Electronics Engineers, 2013) Nemati, Shamim; Lehman, Li-wei H.; Adams, Ryan PrescottModel identification for physiological systems is complicated by changes between operating regimes and measurement artifacts. We present a solution to these problems by assuming that a cohort of physiological time series is generated by switching among a finite collection of physiologically-constrained dynamical models and artifactual segments. We model the resulting time series using the switching linear dynamical systems (SLDS) framework, and present a novel learning algorithm for the class of SLDS, with the objective of identifying time series dynamics that are predictive of physiological regimes or outcomes of interest. We present exploratory results based on a simulation study and a physiological classification example of decoding postural changes from heart rate and blood pressure. We demonstrate a significant improvement in classification over methods based on feature learning via expectation maximization. The proposed learning algorithm is general, and can be extended to other applications involving state-space formulations.