Factored time-lapse video

We describe a method for converting time-lapse photography captured with outdoor cameras into Factored Time-Lapse Video (FTLV): a video in which time appears to move faster (i.e., lapsing) and where data at each pixel has been factored into shadow, illumination, and reflectance components. The factorization allows a user to easily relight the scene, recover a portion of the scene geometry (normals), and to perform advanced image editing operations. Our method is easy to implement, robust, and provides a compact representation with good reconstruction characteristics. We show results using several publicly available time-lapse sequences.


Introduction
Time-lapse photography, in which frames are captured at a lower rate than that at which they will ultimately be played back, dates back to the late 19-th century 1 . Classic time-lapse photography subjects are clouds, stars, plants, and flowers. Today, most timelapse image sequences are collected by the thousands of cameras (webcams) whose images can be accessed using the Internet. They typically provide outdoor views of cities, construction sites, traffic, the weather, or natural phenomena such as volcanoes. Outdoor webcams are also used for surveillance to monitor outside activities around companies, ports, or warehouses. So-called webcam directories index thousands of live webcams, providing instant online access to time-lapse photos from around the world.
Time-lapse photography can create an overwhelming amount of data. For example, a single camera that takes an image every 5 seconds will produce 17,280 images per day, or close to a million images per year. Image or video compression reduces the storage requirements, but the resulting data has compression artifacts and is not very useful for further analysis. In addition, it is currently difficult to edit the images in a time-lapse sequence, and advanced image-based rendering operations such as relighting are impossible.
A key challenge in dealing with time-lapse data is to provide a representation that efficiently reduces storage requirements while allowing useful scene analysis and advanced image editing.
In this paper we focus on time-lapse sequences of outdoor scenes under clear-sky conditions. The camera viewpoint is fixed and the scene is mostly stationary, hence the predominant changes in the sequence are changes in illumination. Under these assumptions we have developed a method that provides a complete decomposition of the original dataset into shadow, illumination, and reflectance components. We call this representation Factored Time-Lapse Video (FTLV).
Our method begins by locating the onset of shadows using the time-varying intensity profiles at each pixel. We identify points in shadow and points in direct sunlight to separate skylight and sunlight components, respectively. We then analyze these spatiotemporal volumes using matrix factorization. The results are basis curves describing the changes of intensity over time, together with per-pixel offsets and scales of these basis curves, which capture spatial variation of reflectance and geometry. The resulting representation is compact, reducing a time-lapse sequence to three images, two basis curves, and a compressed representation for shadows. Reconstructions from the data show better error characteristics than standard compression methods such as PCA.
FTLVs are an intrinsic image-like scene representation that allow a user to analyse, reconstruct and modify illumination, reflectance, or geometry. The shadows may be discarded or retained, depending on the ultimate application, while other "outliers" such as pedestrians or cars are implicitly ignored. FTLVs can also be used for various computer vision tasks such as background modeling, image segmentation, and scene reconstruction. In this paper we demonstrate several applications for FTLVs, including relighting, shadow removal, advanced image editing, and pictorial rendering.

Previous Work
Time-Lapse Sequence Analysis Barrow and Tenenbaum [1978] introduced the concept of intrinsic images to represent intrinsic characteristics of a scene, such as illumination, reflectance, and surface geometry. Weiss [2001] uses a maximumlikelihood framework to estimate a single reflectance image and multiple illumination images from time-lapse video. His work was extended by Matsushita et al. [2004] who derive time-varying reflectance and illumination images from surveillance video. Matusik et al. [2004] use time-lapse data to compute the reflectance field (or light transport) of a scene for a fixed viewpoint. They represent images as a product of the reflectance field and the incoming illumination. However, the method requires estimating the incident illumination using a light probe camera, and the estimated reflectance field combines the effects of reflectance and shadows. Koppal and Narasimhan [2006] acquire image sequences with a randomly moving light source to cluster the image into regions that have similar normals. These normal clusters are then used as priors to bootstrap a variety of vision algorithms, including the decomposition of the image into the terms of a linearly separable BRDF model. Because the illumination is required to follow an unstructured trajectory, Koppal and Narasimhan's work is applicable only to outdoor time-lapse data captured over a larger period of time (many days or weeks).
Unlike this previous work, we arrive at a decomposition of timelapse sequences into shadows, partial scene geometry, and timevarying reflectance and illumination. This provides better estimates of intrinsic image qualities and more accurate scene analysis. Lawrence et al. [2006] describe a factorization method to decompose complex surface reflectance functions (spatially varying BRDFs) into a sum of products of lower dimensional (1D or 2D) terms. Similarly, Gu et al. [2006] decompose a time-varying surface appearance into lower dimensional representations that are space-time dependent. Both of these methods accomplish similar goals -they factorize large datasets of complex surface reflectances into terms that are highly compact and at the same time physically meaningful and editable. Since they acquire and model the full eight-dimensional BRDF they can render under any viewing, lighting, and, in the case of Gu et al.[2006], temporal condition. However, the complexity of the BRDF acquisition setup makes it impractical for complex, outdoor scenes.

Reflectance Factorizations
Our work bridges the gap between intrinsic images and the factorization of complex reflectance functions. In doing so we go beyond the mid-level representation of intrinsic images without requiring the acquisition setup and calibration of a complete BRDF estimation.
Inverse Rendering Inverse rendering measures rendering attributes -lighting, textures, and BRDF -from photographs. Although there is prolific literature on inverse rendering, most of the work focuses on small objects and indoor scenes. Yu and Malik [1998] and Debevec et al. [2004] recover photometric properties from photographs of outdoor architectures. They are able to relight them and create photo-realistic images from arbitrary viewpoints. However, their methods require measurements of the incident illumination and surface materials and a 3D model of the scene geometry. Similarly, Nimeroff et al. [1994] render scenes under natural illumination by combining basis images that are pre-rendered using a set of basis illuminations. But they also require measurements of scene geometry and reflectances. Seitz et al. [2005] and Nayar et al. [2006] separate the illumination in a scene into its direct and global components using controlled lighting. They demonstrate this for real-world materials and comment on the contribution of the two terms to surface appearance. We separate illumination into a global sky and a direct sun component while ignoring secondary inter-reflections, scattering, or translucency.

Video Analysis and Editing
There has been considerable research on analyzing video sequences to segment out objects and extract mattes [Chuang et al. 2002;McGuire et al. 2005;Li et al. 2005;Wang et al. 2005], to extract shadows [Chuang et al. 2003], to determine object motion and texture [Bregler et al. 1997;Schödl et al. 2000;Bhat et al. 2004], and to combine video frames into panoramas [Agarwala et al. 2005]. In addition, there has been recent work that modifies video by inserting objects [Li et al. 2005;Wang et al. 2005] or shadows [Chuang et al. 2002], enhancing small motion [Liu et al. 2005], or applying non-photorealistic stylization [Litwinowicz 1997;Wang et al. 2004;Winnemöller et al. 2006]. In contrast, the present paper performs a different type of analysis, focusing not on inferring objects' shape, motion, or texture, but rather on decomposing appearance and reflectance. Our analysis enables a different class of edits, such as changing reflectance and lighting. In addition, by extracting partial geometry of the scene, we are able to perform stylization using algorithms such as exaggerated shading ], which requires more knowledge about the scene than simply pixel intensities.

Representation
Our goal is to decompose the space-time volume F(t) of the timelapse image sequence into factors that will enable us to analyze and edit the scene. As the sun moves, the observations at every pixel in the time-lapse sequence result in a continuous appearance profile [Koppal and Narasimhan 2006] (see Figure 4(a)). The appearance profile (red curve) is a vector of intensities Fi(t) measured at pixel Pi over time (i.e., frame number). It is a complicated function of the illumination, scene geometry, and surface reflectance.
Under the clear-sky assumption we can approximate the illumination as a sum of an ambient term corresponding to sky illumination and a single-directional light source corresponding to the radiance of the sun. Using the linearity of the rendering equation, the spatiotemporal volume F(t) can therefore be expressed as a sum of the sky light and sunlight components: (1) Here Ssun(t) is the shadowing term that describes if a pixel is in shadow (and therefore has no sunlight contribution) or not.
One of the key insights in this paper is that for outdoor timelapse sequences under clear-sky conditions the appearance profiles of all points in the scene are similar up to an offset along the time axis and a scale factor. This is similar to the notion of orientation-consistency introduced by Hertzmann and Seitz [2005], which states that, under the right conditions, two points with the same surface orientation must have the same or similar appearance 101-2 • Sunkavalli et al. in an image. They compute the surface normals of a target object that has been imaged together with one or more reference objects with known shape (sphere) and similar BRDF. Koppal and Narasimhan [2006] also noted that scene points with the same surface normal often exhibit extrema in their profiles at the same time instant, irrespective of their material properties. They use a metric that matches appearance profile extremas, and unsupervised clustering, to compute orientation consistencies between scene points of unknown normals and BRDFs. The key difference in our work is that we seek to separate the contributions of shadows, skylight, and sun illumination. Similar to the work by Gu et al. [2006], we represent the corresponding appearance profiles as a linear combination of basis curves that are offset and scaled.
Based on our insight, we approximate Isun(t) with a single basis curve Hsun(t) scaled by per-pixel weights Wsun. In addition, we allow a per-pixel time offset Φ to the sunlight curve that stands in for the normal dependence of the appearance profile: We call weight matrix Wsun the sunlight image, the basis curve Hsun(t) the sunlight basis curve and the offset Φ the shift map.
Since the sun is a strong directional light source, Hsun(t) is an estimate of the 1-D slice of surface reflectance corresponding to the camera viewpoint and the arc described by the sun's motion.
The time-shifted basis is an accurate representation for appearance profiles because of the nature of sun illumination. Since the sun moves at a constant angular velocity, the appearance profiles at pixels with different surface normals correspond to different uniformly sampled 1-D slices (corresponding to the camera viewpoint and the plane of the sun) of the 4-D BRDF. For both diffuse and specular surfaces it has been shown that the shape of this slice is largely the same with the fundamental difference being the normal-dependent shift that Φ captures. If the arc described by the sun is known, under the assumption that the appearance profile is maximum when the sunlight is normally incident on the surface, the computed shift map can be converted into an estimate of the surface normal. Since the sun moves only in a plane, the shift map is an estimate of the surface normal projected onto this plane.
In order to estimate I sky (t) in Equation (1), we look to find a single illumination-vs.-time basis curve H sky (t) for the entire image sequence, such that the appearance of any shadowed pixel Pi may be reproduced as: We call the matrix W sky of per-pixel weights the skylight image, and the basis curve H sky (t) the skylight basis curve. Note that we do not apply an offset to the skylight curve in Equation (3) because the diffuse nature of sky illumination makes the offset hard to estimate.
The final representation (see Figure 2) therefore is: Our goal is to separate F(t) into I sky (t), Ssun(t), and Isun(t) and estimate all the per-pixel weights, per-pixel shifts, and time-curves without knowledge of scene geometry, reflectance, illumination, or camera calibration. We make the observation that we can estimate I sky (t) from points that are in shadow, whereas points in the sun contain I sky (t)+Isun(t). Our approach is to first estimate Ssun(t) and to use this to separate I sky (t) and Isun(t) and estimate the rest of the terms. The next section describes how we estimate Ssun(t) from the time-lapse data, and Section 5 shows how we use matrix factorization to solve for W sky , H sky (t), Wsun, Φ and Hsun(t). Figure 3 shows one frame of a time-lapse sequence of the Santa Catalina Mountains in Arizona. We will use this frame and pixels A, B, and C throughout our discussion. All computations are performed in RGB color space and independently on the three color channels. For simplicity we will focus on the red channel only. Our model assumes that the color of visible pixels is due to reflection from surfaces in the scene. Therefore we do not consider sky pixels in our representation. We compute the sky mask (using Photoshop's magic wand tool) for one frame and use it to later composite the sky from the original data to the reconstructed images. This has the added benefit that we preserve moving clouds, which are an important visual component in time-lapse videos.

Shadow Estimation
As can be seen in Figure 4(a) pixel intensities differ dramatically if the scene point is illuminated by the sun or if it is in shadow. We use these discontinuities in the appearance profiles to estimate shadow images. We first compute the median value mmin of the n smallest intensities at each pixel. We typically assume that each point is in shadow at least 20% of the time, so n is 20% of the total number of frames in the sequence. We set the shadow function Si(t) (purple curve in Figure 4(a)) to one for each Fi(t) > kmmin and to zero otherwise. We heuristically found that k = 1.5 worked well for most of the sequences in our experiments.
Figure 5 (top) shows the value of the shadow function for all pixels in frame 275. We call this the binary shadow image. It is generally quite noisy due to moving objects (e.g., trees, smoke, or people) and changes in the illumination (e.g., clouds). Therefore, to compute the final shadow image Ssun we use an edge-preserving bi-lateral filter [Tomasi and Manduchi 1998] (Figure 5 (bottom)). We could improve the shadow images by explicitly removing dynamic objects, either by computing median images for short time   intervals in the input sequence [Matsushita et al. 2004] or by using more sophisticated background models.

Time-Lapse Factorization
The computational framework we use to decompose appearance profiles into W and H factors is Alternating Constrained Least Squares (ACLS) [Lawrence et al. 2006], which has previously been applied to the problem of decomposing measured reflectance data into intuitively editable components. ACLS is similarly well suited for our task of decomposing appearance profiles, since its ability to incorporate domain-dependent constraints -such as nonnegativity, sparseness, and smoothness -leads to stable and intuitive decompositions. ACLS can also incorporate a confidence matrix C to deal with missing data; setting an entry of C to zero will cause the corresponding measurement to have no effect on the factorization. We use the matrix C to implicitly separate skylight and sunlight components.
Formally, each application of ACLS decomposes an m × n data matrix F(t), where m is the number of pixels in the image and n is the number of frames in the time-lapse sequence, into the product of an n × k weight matrix W and a k × m basis matrix H(t).
The algorithm takes k, the number of basis curves, as input. We set k = 1 for the decompositions shown in this paper. We apply ACLS in two separate steps to factor a matrix of measured spatio-temporal appearance profiles F(t), first factoring I sky (t) and then solving for Isun(t)).

Skylight Factorization
Multiplying the appearance profiles Fi(t) in Figure 4(a) by (1 − Si(t)) yields the appearance profile of points in shadow as shown in Figure 4(b). We perform ACLS factorization using F(t) as the data matrix and (1 − S(t)) as the confidence matrix to solve Equation (3). This ensures that we consider only shadowed pixels. Note that this is a different strategy from using interpolation to fill in data for unshadowed pixels, as we would need for factorization methods that do not consider confidence. However, because we have many frames in the sequence, the system of equations is highly over-constrained for a small number of basis curves. Intuitively, if we are missing data at one pixel we probably observe it at another pixel with a similar normal and appearance profile.
Figure 4(b) shows the skylight basis curve and its fit to the input data. We reconstruct the estimated I sky (t) using Equation (3). Figure 6 shows the skylight image W sky and a single frame of the reconstructed I sky (t). The skylight image is related to both surface albedo and ambient occlusion, where the ambient term for a point on a surface is determined by how occluded that point is by other surfaces in the scene (i.e., the darker the pixel, the less skylight it receives or reflects). The reconstructed images I sky (t) approximate how the scene would look throughout the time-lapse sequence in the absence of direct sunlight.   : Example frames for each dataset (first row) and appearance profiles (second row) of pixels A, B, and C (red channel). In the third row are the aligned appearance profiles and estimated sunlight basis curves (black). The profiles were aligned and scaled using the estimated shifts Φi and Wsun for pixels A, B, and C.

Sunlight Factorization
The reconstructed images I sky (t) are subtracted from the original data F(t) and the result is clamped to 0 to form the matrix Isun(t). Multiplying Isun,i(t) by Si(t) yields the appearance profile of frames when the points are illuminated only by sunlight as shown in Figure 4(c). Using Isun(t) as the data matrix and S(t) as the confidence matrix C thus ensures that only the sunlight component at every pixel is considered during the factorization. We run ACLS to solve Equation (2) and compute the basis curve Hsun, the sunlight image Wsun, and the shifts Φi.
In order to adapt ACLS to handle the time-offsets necessary to implement Equation (2), we modify the iterative update stage of the algorithm. Specifically, the original algorithm alternates between phases in which H(t) is held fixed while W is optimized using least squares, then vice versa (this is an instance of the principle of expectation maximization in inference). In order to incorporate the shifts Φi, we shift the entire matrix H(t) by +Φi when updating Wi and, similarly, shift each row i of F(t) by −Φi during updates of H(t). Finally, we introduce a third update phase during the iteration, in which we update Φi by finding, for each pixel, the shift that minimizes error. The metric used here is the same as that for updating W and H, i.e., the confidence-weighted Euclidean error between the scaled and offset basis curve H(t) and F(t). As with ACLS, our modified algorithm reduces error at each iteration, and is guaranteed to converge to a (possibly local) minimum.
Figure 4(c) shows the sunlight basis curve Hsun(t) and its fit to the input data, and Figure 8 shows the sunlight image Wsun and the reconstructed image Isun(t). The harsh black shadows in the reconstructed image Isun(t) are similar to images taken in a vacuum without atmospheric light scattering, such as images from the moon. Figure 8 (bottom) shows the shifts Φ for the sun component. This image is a good estimate of the partial geometry (normals) of the scene. If we have time-lapse sequences from the same static viewpoint for different days, it is possible to recover more than just this one-dimensional approximation of surface normals.

Results
The datasets for our experiments come from online webcams 2 . As shown in Table 1, the sequences are of varying length and resolutions, typically captured at intervals in the range of every 15 seconds to every 5 minutes. All of the datasets were originally compressed. We extracted the individual frames and automatically removed the regions covered by the sky using the sky-mask. All results in this paper and video were computed without sky. For some of the results in the paper and the video we composited the sky regions back into the sequences. Figure 7 shows sample images from four sequences with three pixels marked on each. The figure shows the appearance profiles for the red channel of the pixels over time (similar results are obtained for green and blue). Similar to Figure 4, the profiles show shadowing in the form of discontinuities but otherwise appear similar to each other. In Figure 7 in the bottom row we show the alignment of these separate time-varying profiles using the scales Wsun,i and shifts Φi we estimated in our factored representation for each pixel. The black curves show the sunlight basis curves for each sequence. The scaled and time-shifted profiles match the basis curves very well. Figure 9 qualitatively shows the accuracy of our reconstruction for the Square sequence. The reconstruction with shadows accurately matches the original images. Specific problem areas in this sequence are the streets and sidewalks due to parked cars, moving traffic, and people. Interestingly, our shadow estimation technique picks up the shadows of moving objects, which are especially visible in a few of the sequences shown in the accompanying video.

Reconstruction Quality
In some cases we median-filtered the shadows Ssun(t) temporally to remove flickering due to moving people, cars, etc. The bottom row in the figure shows our reconstruction without shadows. It effectively removes the "ghost" shadows of moving objects and the large shadow that is visible throughout most of the sequence. Table 1 shows the overall RMS image reconstruction errors for FTLV that were computed across all temporal frames and spatial locations. The error is quite large compared to traditional image and video compression. This is not surprising, since FTLV does not encode moving objects. To get a better qualitative handle on the reconstruction error we compare FTLV to principal component analysis (PCA), a standard matrix rank-reduction algorithm. Similar to ACLS, PCA decomposes the original spatio-temporal matrix F into the sum of the meanF and a product of weight matrices W and basis vectors H. The number of basis vectors determines the fidelity of the reconstruction.   Bottom: PCA reconstruction with the same error using five terms plus the mean. Note the blurry shadows in the PCA reconstruction.

Error Analysis
the corresponding FTLV error by the symbol X. The number of PCA terms that are necessary to achieve the same FTLV RMS error is also shown in Table 1. PCA typically needs more terms (plus the mean) since FTLV is equivalent to three terms (images). A visual comparison of the corresponding reconstructed sequences shows that FTLV is more effective in preserving sharp details, especially at shadow boundaries (see Figure 11). A reasonable reconstruction of shadows requires a large number of PCA terms (at least 15 for our datasets). In addition, PCA does not produce a meaningful description of the data. In particular, PCA allows negative values in W and H, resulting in a representation whose terms cannot be edited independently.

Compression
As can be seen from Table 1, FTLV is a highly compact and efficient representation for time-lapse sequences. For these comparisons, we store the skylight, sunlight, and shift images using high-quality JPEG. The binary shadow functions are stored per pixel. For each pixel, we store the frame numbers at which shadows start and end and do LZW compression on these number-pairs. The two basis curves are stored in ASCII text files. Table 1 shows the file sizes for raw images, PCA, FTLV without shadows, and FTLV with shadows. The file size of FTLVs is dominated by the encoded shadows. Of course this is very scene dependent. Scenes with highly complex shadows (pixels go in and out of shadow many times) have multiple shadow start-end pairs and this is why scenes such as Yosemite require more storage while Temple or Square require less.

Editing and NPR
A major benefit of FTLV is that it factors the scene into physically meaningful components, each of which can be edited to create interesting effects. Edits to the shift image affect surface appearance by changing their pseudo-normals. An example of this can be seen in Figure 12, where we show closeups of three frames from the Yosemite sequence. We added the imprint of the SIGGRAPH logo to the snow in the foreground by editing the shift map. We also edited the shadows in that region to keep the logo in sunlight.
Edits to the sunlight image and curve have an effect on surface reflectance. Figure 1(d) shows an example of this for the Square sequence. We changed the surface appearance of various rooftops by changing the sunlight curve to be more specular. We changed the sunlight image to add the SIGGRAPH logo and added windows to the building in the front by editing the skylight, sunlight, and shifts. We also selectively removed shadows, for example, for the building in the background. Note that by simply editing the skylight, sunlight, or shifts we affected the entire time-lapse sequence. As can be seen in the accompanying video, the results look visually plausible and could certainly be improved with more artistic care.
In order to demonstrate the flexibility of our method, we have also experimented with non-photorealistic effects to generate stylized images and videos of the input scenes. We first transform the shift maps Φ into pseudo-normal maps by mapping shifts to angles along an arc through the sky. In order to slightly add to the plausibility of the normals, we apply a small shift to the normals based on the ratio between the computed sky and sun maps (reasoning that vertical surfaces generally receive less sky illumination than horizontal ones do). While these normals are certainly not accurate, we nevertheless expect that they are related to the true normals by some (continuous) function, and they are sufficient as input to rendering techniques such as exaggerated shading ], which seek to emphasize local differences between normals. We generate our final NPR results by compositing the exaggerated shading with the sun and sky color maps, as well as (optionally) the shadow maps. Figure 13 shows results obtained using this technique for still images, while the supplementary video shows the result for moving scenes. Note that the computation is temporally coherent over time, leading to smooth video results (with only sharp temporal discontinuities due to the motion of shadows). Figure 12: Edits in the Yosemite sequence to add a logo to the snow. The edits were made in the shift ("normal map") image, and the results show plausible time-varying behavior.

Discussion
As with any least squares factorization approach there are some practical considerations to keep in mind while applying FTLV to real-world data. These are closely tied to the behaviour of the factorization and its sensitivity to the various parameters.

Initialization
We have found that in practice FTLV is sensitive to the initialization of the shifts Φ but robust to the initialization of W and H. We typically initialize Φ with random values in the range [−(n/2), +(n/2)], where n is the number of frames. This range may vary from dataset to dataset and is dependent on the distribution of normals in the scene and the temporal sampling of the data. For example the Square sequence, with a larger span of effective normals and temporal sampling, had a larger range. For the Temple sequence, which has fewer sets of normals, the range was smaller.
Also, while FTLV is robust to small errors in shadow estimation, drastic errors will corrupt the results. The simple heuristics we use to compute shadows have worked for most of our datasets. Pixels that are always in the shadows are treated as having Wsun = 0 while for pixels that are always in the sun W sky and Wsun are estimated with additional constraints based on the sunlight and skylight intensities estimated from other pixels.

Computational Costs
Like the original ACLS algorithm and other least squares optimizations with similarly large amounts of data, FTLV is computationally intensive. The additional iterations on Φi may be computationally expensive (especially if the range of values of Φi that we search over is very large) but in relative terms do not add significant processing to the algorithm. Quantitatively, for the Square dataset with 180,000 pixels and 850 frames, our sunlight factorization converges in about 45 mins on a P4 3.0 GHz with 2 GB of RAM.

Sampling
FTLV performance is also dictated by the temporal sampling of the input time-lapse sequence. The temporal-sampling limitations on our algorithms are closely tied to the nature of scene (diffuse scenes would work with lower sampling, complex reflectances would require more data points) and camera and image quality (noise, saturation, and compression will corrupt data). In our experiments, for the Square dataset (which has fairly smooth and clean appearance profiles) we have obtained equivalent results from 60 frames instead of 850.

Conclusions and Future Work
FTLVs are a compact, intuitive, factored representation for timelapse sequences that separate a scene into its reflectance, illumination, and geometry factors. They enable a number of novel imagebased scene modeling and editing applications. The intuitive representation enables applications such as shadow removal, relighting, advanced image editing, and painterly rendering. As sophisticated background models FTLVs can be used to model temporal scene variations and improve tracking. Their compact nature enables compression of large time-lapse datasets.
While our current results factor scenes into single basis curves it is not difficult to imagine using multiple curves. Our intuition is that multiple basis curves would relate to different material properties (e.g., diffuse and specular curves). This will both reduce error and enable material-based separation of scene elements.
We are currently looking at methods to express images as arbitrary fractional sums of the sky and sun terms. In future work we would also like to extend FTLVs to arbitrary illumination, including indoor scenes, and scenes with participating media, e.g., a time-lapse sequence captured on a foggy day.
Another limitation is that FTLVs do not handle motion. Moving objects show up in the residue between original and reconstruction and could be added back into the sequence. However, to improve the FTLV factorization it would be beneficial to remove moving objects first, or to use an initially computed FTLV as a background model for dynamic object tracking and removal. One could then try to find a compact representation for dynamic scene elements, including clouds and the sky.
Original Shadow Sky Sun Shifts Edited NPR Figure 13: Example images for the Yosemite, Temple, Square, and Tokyo sequences. FTLV essentially represents the spatio-temporal video volume with a set of shadow images (second row) and the skylight, sunlight, and shift images (next three rows