Publication: Content-Aware Manipulations for Image and Video Collections
Date
2012-08-17
Authors
Dale, Kevin
Citation
Dale, Kevin. 2012. Content-Aware Manipulations for Image and Video Collections. Doctoral dissertation, Harvard University.
Abstract
Digital photography and videography have become ubiquitous. With ever-cheaper and
more capable cameras, found in dedicated devices and, increasingly, in multipurpose
smartphones and tablets, it is now easier than ever for casual users to generate their
own dense streams of personal multimedia data. With popular photo and video sharing
sites, like Flickr, Facebook, and YouTube, users can share their images and video with
the world, making for a vast amount of multimedia data stored at home and on the web.
This flood of data presents many challenges, particularly for non-professionals, in
managing their data: for example, applying simple photographic adjustments, or finding
interesting shots worth keeping among megabytes of data from even a short weekend
trip. At the same time, researchers have a unique opportunity to exploit the vast
amount of publicly available multimedia data to make graphics tasks easier for the
individual.

This dissertation presents work that seeks to address some of these challenges, and,
where possible, exploit existing data sets to do so. First, we discuss a general approach
that finds images similar to a given input from among a collection of photographs, from
which various task-specific properties are transferred to the input. We demonstrate this
basic approach in two distinct settings: image restoration and CG image enhancement.
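
To make the retrieval-and-transfer idea concrete, the following is a minimal sketch: retrieve visually similar images with a simple global descriptor, then transfer their aggregate color statistics to the input. The grid-of-mean-colors descriptor and global color-transfer rule are illustrative assumptions, not the dissertation's actual descriptor or transfer operators.

```python
# Hypothetical sketch of retrieval-and-transfer; descriptor and transfer rule
# are illustrative placeholders, not the dissertation's pipeline.
import numpy as np

def global_descriptor(img, grid=8):
    """Coarse color layout: mean RGB over a grid x grid tiling of the image
    (a stand-in for richer global descriptors)."""
    h, w, _ = img.shape
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    cells = [img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].reshape(-1, 3).mean(axis=0)
             for i in range(grid) for j in range(grid)]
    return np.concatenate(cells)

def find_similar(query, collection, k=5):
    """Return indices of the k collection images closest to the query."""
    q = global_descriptor(query)
    dists = [np.linalg.norm(global_descriptor(im) - q) for im in collection]
    return np.argsort(dists)[:k]

def transfer_color_stats(query, matches):
    """Shift and scale each channel of the query toward the matches' mean
    and standard deviation (a very simple 'property transfer')."""
    target_mean = np.mean([m.reshape(-1, 3).mean(axis=0) for m in matches], axis=0)
    target_std = np.mean([m.reshape(-1, 3).std(axis=0) for m in matches], axis=0)
    q = query.reshape(-1, 3).astype(float)
    out = (q - q.mean(axis=0)) / (q.std(axis=0) + 1e-6) * target_std + target_mean
    return np.clip(out, 0, 255).reshape(query.shape).astype(np.uint8)
```

In the dissertation's settings, the transferred properties are task-specific (e.g., restoration or CG enhancement cues) rather than the simple global color statistics used above.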
Next, we focus on collections of video. We first present a method for efficiently
browsing and summarizing collections of related videos. Our approach is based on a
simple pairwise video alignment that identifies a relevant sequence of video clips that
best matches an input video. Finally, we discuss our work on replacing facial
performances in video, which requires no special hardware and can be used to retarget
existing footage to synthesize new performances.
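
The pairwise alignment step can be illustrated with a standard dynamic-time-warping sketch over per-frame descriptors; the frame descriptors and the Euclidean matching cost below are assumptions for illustration and may differ from the dissertation's exact formulation.

```python
# Hypothetical sketch: align two clips by dynamic time warping over
# per-frame descriptors (one descriptor vector per frame).
import numpy as np

def dtw_align(feats_a, feats_b):
    """feats_a: (n, d) array, feats_b: (m, d) array of per-frame descriptors.
    Returns a warping path as a list of (i, j) frame correspondences."""
    n, m = len(feats_a), len(feats_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    # Accumulate matching cost with the usual DTW recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(feats_a[i - 1] - feats_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Given such pairwise alignments between an input video and each clip in a collection, one can rank and stitch the best-matching clips into a browsable summary, which is the spirit of the approach described above.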
Keywords
computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service