Occlusion Models for Natural Images: A Statistical Study of a Scale-Invariant Dead Leaves Model

We develop a scale-invariant version of Matheron's "dead leaves model" for the statistics of natural images. The model takes occlusions into account and resembles the image formation process by randomly adding independent elementary shapes, such as disks, in layers. We compare the empirical statistics of two large databases of natural images with the statistics of the occlusion model, and find an excellent qualitative and good quantitative agreement. At this point, this is the only image model which comes close to duplicating the simplest elementary statistics of natural images, such as the scale invariance property of marginal distributions of filter responses, the full co-occurrence statistics of two pixels, and the joint statistics of pairs of Haar wavelet responses.


Introduction
Recently, there has been a great deal of interest in the statistics of natural images, and many investigations of these from both the computational and biological vision perspectives. From the computational side, this has been motivated by many applications, including: (i) the need for more effective image compression, e.g., Buccigrossi and Simoncelli (1999), (ii) the search for better deblurring and denoising algorithms, e.g., Freeman and Pasztor (1999), and (iii) the need to estimate the rates of false positives and false negatives in target or face recognition algorithms, e.g., Sullivan et al. (1999).
There are two discoveries which have motivated the search for good models of image statistics. The first is that image statistics are extremely non-Gaussian. For example, a large class of image enhancement algorithms is based on the paradigm of decomposing an observed image I into an enhanced image J and a noise component n, and it is standard to assume that n is Gaussian noise. These algorithms perform poorly because the 'noise' component which one wants to remove is often really caused by the clutter of small objects or markings, partially resolved by the camera and not important to the understanding of the scene, and the statistics of such clutter are not Gaussian at all. The non-Gaussian nature of image statistics is apparent if one computes histograms of virtually any filter on virtually any database of images: the histogram will, essentially always, have kurtosis greater than 3 (the kurtosis of Gaussian distributions).
The second reason for the interest in image statistics derives from the empirical observation that image statistics scale. This means that any local statistic calculated on n × n images and on block-averaged 2n × 2n images should be the same, or that the probability of seeing an image I(x, y) is the same as that of seeing I(σx, σy). Although not exact, this result has been approximately confirmed on all large image databases that we have heard about. The reason this is exciting is that it implies that local image models describing small-scale structures will work as global image models describing the large-scale structures in images. This creates a surprising stability for image statistics and a link between the denoising and object recognition problems. In the first case, one wants to eliminate small irrelevant structures; in the second, one wants to reject large background objects other than the sought-for object. Since local clutter and generic backgrounds have the same statistics, the same sort of statistical tools can be used. In fact, in automated target recognition (ATR), the term clutter is used to describe the mass of irrelevant foreground and background objects, for example rocks and trees, which occur at all sizes (more smaller ones than larger ones) in natural images, and which significantly degrade the power of object recognition algorithms. The key issue in ATR is whether you have to identify and model every one of the objects in the foreground/background before finding the object of interest, or whether there are common statistics in natural images which enable you to separate the target from the rest of the scene without explicitly describing the clutter in detail.
For all the applications mentioned above, we need new non-Gaussian stochastic models for natural images. We may broadly characterize stochastic models of images into two classes:

"Descriptive" models. In these models the only random variables are the image pixel values (or the values of filters applied to the image). Typical models of this type are Gibbs models of the form p(I) = (1/Z) exp(-Σ_P E(I|_N(P))), where N(P) is a suitable pixel neighborhood of P, E is a local "energy" function, and Z is the normalizing constant. While these models have been able to describe a wide range of simple textures (see, e.g., Heeger and Bergen (1995) and Zhu et al. (1998)), they lack the concept of structured objects, and usually fail to be scale-invariant or to reproduce the correct long-range dependencies.

"Generative" models. These models include hidden variables for the underlying causes of structures in real-world imagery. An image is here a composite of objects or imagelets, with extra random variables such as the location, scale and grey level of particular objects or imagelets. The generative models most common in the literature are (1) models based on templates of specific objects, such as faces, e.g., Hallinan et al. (1999), tanks, machined parts etc., often with extra variables such as non-linear distortions, lighting variables and the location of key points on the template, or (2) models which approximate images by a linear superposition of transparent basis images, as in ICA, e.g., Olshausen and Field (1996), PCA, wavelet expansions, and random wavelet expansions (Mumford and Gidas, 2000); such transparent superpositions do not, however, capture occlusion effects such as extended contours or regions broken into pieces due to occlusions.

This paper studies a hybrid class of generative image models which shares some attributes of both classical template-based models and wavelet expansion models. The basic idea behind these so-called "dead leaves models" is to assume that the image is formed from a set of template-based elementary objects, whose locations and possibly scale are a sample from a Poisson process, and which partially occlude one another, being laid down in layers. These models were first introduced by Matheron (1968) and Serra (1982) for the morphological analysis of materials, and have only recently been applied to natural image statistics; see, for example, Ruderman (1997) on the origins of power spectrum scaling, and Alvarez et al. (1999) for the analysis and representation of geometric features in natural images.
An important issue is whether such models are only useful as toy models whose statistics can be studied, or whether they can be fit to real images with a reasonable amount of computation. For some applications, it may not be necessary to use very complex generator models. In fact, Grenander and Srivastava (2000) have recently shown that a simple generator model made up of the profiles of the same object can already capture the variability of different types of scenes better than any of the existing clutter models, and that the parameters of the model (which in this case are two) can be directly estimated from the derivative histograms of real images. For applications that require a more accurate description of the complexity of cluttered scenes, there will of course be a larger number of random variables and parameters that have to be estimated.
In a recent paper, Zhu and Guo (2000) have attacked this problem and shown that, at least in some cases, a generative model can be fit to textured scenes with multiple moderately large textons by using sophisticated Monte Carlo techniques. The advances in Monte Carlo methods, including the CONDENSATION algorithm (Isard and Blake, 1998), Data Driven MCMC (Zhu et al., 1999) and adaptive texton segmentation models (Malik et al., 1999), have made the computation of complex hidden variables and complex scene structure seem much more accessible than before. For these reasons, we do think that probability models that are physics-based and include structured primitives are practical.
The goal of this paper is, however, not to develop tools for any of the applications listed above. Our aim is to study whether a simple version of the dead leaves model can reproduce the empirical statistics of natural images better than any of the other classes which have been studied, e.g. Gaussian models, the "scale mixture of Gaussians" model of Wainwright and Simoncelli (2000), the infinitely divisible models of Mumford and Gidas, and Markov random fields. Ultimately, we believe that different variations of dead leaves models (possibly with more realistic shapes and dependencies between objects) will be useful for many applications. More specifically, we will study four basic image statistics and statistical properties to see whether simple dead leaves models duplicate the empirical facts drawn from very large databases of natural images. The first is the scaling behavior of images, described above. The second, also mentioned above, is the high kurtosis of filter statistics derived from images, e.g. derivatives (the difference of adjacent pixel values). The third is the highly irregular shape of joint histograms of wavelet coefficients, especially Haar wavelets. The fourth is the complex behavior of the full two-pixel co-occurrence statistics which, as we will see below, seem to be best modeled as mixtures. This is the most direct confirmation of the accuracy of random-collage models, where this mixture property is fundamental.
We wish to explain our emphasis on Haar wavelet filters in this study. Because of the simplicity of Haar filters, any structure in the statistics can be directly related to pixel values (see, for example, Huang et al. (2000) for an analysis of the local image structure in range images). It also seems that Haar filters show the non-Gaussian structure in local image patches more clearly than smooth filters.
There appear to be two types of non-Gaussian behavior found in the statistics of filter responses. The first is the high kurtosis of the distribution of such filters (Field, 1987). Independently, Grenander and Srivastava (2000) and Wainwright and Simoncelli (2000) propose that this is the result of the filter response being a product of an independent "contrast factor" and a "geometry factor" reflecting the geometry of a contrast-normalized image. The kurtosis of such a product is the product of the kurtoses of the factors, hence it is 9 if both factors are Gaussian.
The second type of non-Gaussian behavior concerns the joint histogram p(x)dx of k filter responses, k ≥ 2, and considers the equi-probable contours p = const. If these are not ellipsoids, the filters are not jointly Gaussian, and not even a "scale mixture of Gaussians", i.e. a product of a scalar contrast factor and a Gaussian-distributed vector (Wainwright and Simoncelli, 2000). It seems that Haar filters produce more non-elliptical contours than smooth filters: compare (Zetzsche et al., 1993; Simoncelli, 1999, esp. Fig. 5) with (Huang and Mumford, 1999, Fig. 8); hence, we use them as a stronger test for our model. The fact that a dead-leaves model with circular leaves and smoothing reproduces these irregular equi-probability contours shows that the contours are not simply an artifact of the interaction of Haar filters with horizontal and vertical edges in natural images.
The paper is organized as follows. In Section 2, we introduce and analyze an approximately scale-invariant dead leaves model, discussing the need for large and small cut-offs in the sizes of the templates and their scaling behavior. In Section 3, we compare smoothed dead-leaves images with two large databases of natural images. We look first in detail at the derivative statistic and the scaling behavior. A considerable variation in this statistic is encountered in different classes of natural scenes, and these will be compared to different versions of the dead leaves model. After this, we consider the joint histogram of pairs of Haar wavelet responses. Finally, we look at the two-pixel co-occurrence statistics and the effects of smoothing in the model.

Dead Leaves Model with Approximate Scale Invariance

Basic Set-up. A 2.1D Sketch of the World
Our model is based on the notion that the world can be broken down into approximately independent discrete objects of different sizes. When viewed, the 3D world creates a collage of objects that occlude each other (see Fig. 1): Imagine a simplified 3D picture, where the viewed surfaces of objects are modeled as rigid planar templates parallel to the image plane, and each template is at a random position (x, y, z), where z is the distance to the image plane. Mathematically, we write the solid world as

W = ⋃_i (s_i + T_i),  (1)

where the s_i = (x_i, y_i, z_i) are points from a uniform Poisson process Φ of intensity λ, and the T_i are closed sets in R² × {0}, centered at 0 and of random sizes r_i.
Each template T i is furthermore painted with some albedo a i .In Section 2.5, we assume a uniform albedo a i , and disk-like templates; but in principle, a i can be a function of T i , and the template can be of any shape, e.g. a face or a tree (fixed shape), or a polygon with random sides and angles (random shape function).
We assume that the random variables r_1, r_2, ... and a_1, a_2, ... are independent samples from f(r) and p(a), respectively. The variables r_i and a_i can be regarded as markings of Φ, but we can also (according to the marking theorem, Kingman, 1993) consider the set of points

Φ* = {(x_i, y_i, z_i, r_i, a_i)}  (2)

as a Poisson process in the product space R² × [0, z_max] × [r_min, r_max] × [a_min, a_max] with measure

dµ* = λ f(r) p(a) dx dy dz dr da.  (3)

Equations (1)-(3) define the 3D world of objects, the physical world. The next issue is how to map the 3D world to 2D images. In a model with transparent objects (see, for example, the random wavelet expansion in Mumford and Gidas (2000)), an image J(x, y) is given by an arithmetic sum:

J(x, y) = Σ_i a_i(x − x_i, y − y_i),  (4)

where the sum runs over all i such that (x − x_i, y − y_i) ∈ T_i.
In our model, however, objects are opaque. We define the image I(x, y) by (orthographic) projection with occlusions, and let z_max → ∞ for complete coverage. This gives

I(x, y) = a_{i(x,y)}(x − x_{i(x,y)}, y − y_{i(x,y)}),  (5)

where i(x, y) = arg min_i {z_i : (x − x_i, y − y_i) ∈ T_i} is the index of the closest object in the viewing direction through (x, y).

Remarks on Density Parameter and Sampling. A Perfect Simulation of the Dead Leaves Model.
1. In the images (Eq. (5)) with z_max → ∞, the density parameter λ does not affect the amount of "clutter" or other statistics in the image samples, as it does in a transparent world (Eq. (4)) where z_max has to be finite: The images are created by a process which looks at all z ≥ 0, and there is no explicit z-dependency in the model. Thus, rather than sampling in 3D (from the Poisson process Φ), we can place random templates in an ordered sequence front to back (the order corresponding to the relative distance to the viewer) until the background is completely covered. The latter construction is used in Section 2.5 for the numerical simulation of dead leaves images.

2. Furthermore, it can be shown (Kendall and Thönnes, 1998) that a front-to-back simulation until complete coverage is equivalent to the conventional back-to-front simulation of the dead leaves model until stationarity. In other words, the construction delivers exact samples from the equilibrium distribution of a Markov chain {I^(k)(x, y)}, where

I^(k)(x, y) = a_k if (x − x_k, y − y_k) ∈ T_k, and I^(k)(x, y) = I^(k−1)(x, y) otherwise,

for k = 1, 2, ..., with random templates T_k and grey levels a_k.

Condition for Scale Invariance: Cubic Law of Sizes
As mentioned before, one of the most striking properties of natural images is an invariance to scale. This puts a strong constraint on realistic stochastic models for natural images.
In Ruderman (1997), scaling is related to a power-law size distribution of statistically independent regions, but the discussion is limited to the scaling of second-order statistics, and to dead-leaves models with disks. In this section, we use a different formalism, the full probability measure of a Poisson process, to show that higher-order scaling, which has been observed empirically in the filter responses of natural images (Ruderman, 1994; Zhu and Mumford, 1997), places further restrictions on the images. The objects can also be of general shape.
The argument is as follows. Images are fully scale-invariant, i.e.

I(x, y) and I(σx, σy) are equal in distribution for all σ > 0,  (8)

if the Poisson process Φ* (Eq. (2)) is invariant under the "2D scaling"

(x, y, z, r, a) → (σx, σy, z, σr, a),  (9)

since a Poisson process is uniquely determined by its measure dµ* (Eq. (3)). Scaling according to Eq. (9) leads to a Poisson process Φ*_σ with measure dµ*_σ = λ σ^(−3) f(r/σ) p(a) dx dy dz dr da. Assume, for the time being, that we can ignore the short- and long-distance cut-offs in object sizes. The model above then scales if and only if

σ^(−3) f(r/σ) = f(r) for all σ, r > 0.  (12)

Furthermore, if Eq. (12) is true, then the images I(x, y) and I(σx, σy), which are projections of samples from Φ* and Φ*_σ (see Eq. (5)), respectively, are statistically equivalent.
Equation (12) leads to a constraint on the sizes r of the objects:

f(r) ∝ 1/r³.  (13)

We refer to this as the "cubic law of sizes".
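For readers who want to experiment, the cubic law with cut-offs is easy to sample by inverting its cumulative distribution function. The sketch below is our own illustration (function and variable names are ours, not the paper's); it draws radii from f(r) ∝ 1/r³ restricted to [r_min, r_max]:

```python
import random

def sample_radius(r_min, r_max, rng=random):
    """Draw a radius from f(r) ∝ 1/r^3 on [r_min, r_max] ("cubic law
    of sizes") by inverting the CDF:
      F(r) = (r_min^-2 - r^-2) / (r_min^-2 - r_max^-2),
    so F^-1(u) = ((1-u)/r_min^2 + u/r_max^2)^(-1/2)."""
    u = rng.random()
    return ((1.0 - u) / r_min**2 + u / r_max**2) ** -0.5

random.seed(0)
# Cut-offs as in the paper's simulations: r_min = 1/8, r_max = 2048.
radii = [sample_radius(0.125, 2048.0) for _ in range(100000)]
# Small radii dominate: most samples lie close to r_min.
```

The inverse-CDF form makes the heavy small-size bias explicit: the median radius is only about √2 · r_min, even though r_max is four orders of magnitude larger.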

The Need for Cut-Offs in Sizes
It has previously been shown (Mumford and Gidas, 2000) that transparent models (the "random wavelet expansion") can be fully scale-invariant. A possible problem with a model with occlusions is that we need size cut-offs r_min and r_max to obtain non-trivial images.
It can be shown (assuming the cubic law of sizes) that: As r_min → 0, the images become totally covered by microscopic objects; for each r_0, the proportion of area covered by objects of size < r_0 goes to 1. On the other hand, as r_max → ∞, the probability of an image containing only one object tends to 1 (see Appendix A for a proof); almost all image samples will then have uniform intensities.
The finite bounds introduce characteristic length scales in the system, thus preventing full scaling. Below, we investigate the second-order statistics and scaling of an occlusion model with finite cut-offs.

Predictions for Dead Leaves Model with Finite Cut-offs
We assume disk-like templates with random radii from a 1/r³ distribution, where r_min ≤ r ≤ r_max.
As before, we consider the continuous case, where I(x, y) is a function on R² of continuous variables x and y. (The effects of discretization and smoothing are studied empirically in Section 3.)

Two-Point Co-occurrence Function.
Because the images are both translationally and rotationally invariant, the two-point statistics depend only on the distance between the points. Let

K(a, b, x)  (14)

be the co-occurrence function, or the joint probability density function (joint pdf), for two points in a random image I; here a and b are grey levels, and x is the distance between the points. In our occlusion model, each object has a uniform intensity, and different objects are statistically independent. This gives

K(a, b, x) = (1 − P_same(x)) f(a) f(b) + P_same(x) f(a) δ(a − b),  (15)

where P_same(x) is the probability that two points a distance x apart belong to the same object, δ is the Dirac delta function, and f is the probability density function for the intensities of the objects. The first term in Eq. (15) represents points on different objects, and the second term represents points on the same object. This "mixture nature" is fundamental for the model.
In Appendix B, we derive an expression (Eq. (16)) for P_same(x) in terms of a function B(x) defined by Eq. (39); a numerical estimate is given by Eq. (41).
From K(a, b, x), we can derive all statistics of second order; for example, the difference statistics (Section 2.4.2) and the covariance statistics (Section 2.4.3) of two points.

Difference Statistics.
The random variable here is the difference D between the values at two points a fixed distance x apart. From Eq. (15), the probability density function of D is

p(D) = P_same(x) δ(D) + (1 − P_same(x)) (f ⋆ f̃)(D),  (17)

where f ⋆ f̃ is the density of the difference a − b of two independent grey levels (f̃(a) = f(−a)). For the "double-exponential" form f(a) = (1/2) e^(−|a|) (which is a first approximation of log-contrast for natural data, see Section 3.1), we get

p(D) = P_same(x) δ(D) + (1 − P_same(x)) (1/4)(1 + |D|) e^(−|D|)  (18)

(see Fig. 2, computed for r_min = 1/8, r_max = 2048, x = 1). The peak at D = 0 corresponds to regions with no contrast variation ("same object"), and the tails correspond to edges in the images ("different objects"). As we shall see in Section 3.2.2 (Fig. 5(b)), the peak gets shorter and the straight tails become concave when images are filtered, but the mixture nature of two distributions (one concentrated at 0, and one with heavy tails) will still remain.
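The two-component mixture is easy to simulate directly. The sketch below is ours; the mixture weight p_same is an illustrative stand-in, and the double-exponential grey levels are drawn by inverse-CDF sampling. With probability p_same the two points lie on the same uniform-albedo object (D = 0 exactly); otherwise D is the difference of two independent grey levels:

```python
import math
import random

def laplace_sample(rng, scale=1.0):
    """Double-exponential (Laplace) sample via the inverse CDF."""
    u = rng.random() - 0.5
    sign = -1.0 if u < 0 else 1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def sample_difference(p_same, n=200000, seed=1):
    """Draw n samples of the two-point difference D under the mixture:
    D = 0 with probability p_same ("same object"), else D = a - b for
    independent double-exponential grey levels a, b ("different objects")."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < p_same:
            out.append(0.0)          # same object: no contrast variation
        else:
            out.append(laplace_sample(rng) - laplace_sample(rng))
    return out

diffs = sample_difference(0.6)
frac_zero = sum(1 for d in diffs if d == 0.0) / len(diffs)
```

A histogram of `diffs` shows exactly the qualitative shape described above: a spike at 0 riding on heavy, roughly straight (in log-probability) tails.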

Covariance Statistics.
We write the covariance function schematically as

C(x) = ⟨I(0) I(x)⟩,

where 0 is an arbitrary origin, x is a location in the image, and the brackets imply an average over angles, a shift over positions, and an ensemble average over different images I(x) (with mean zero). From the co-occurrence function in Eq. (14), we get

C(x) = C_0 P_same(x),  (19)

where C_0 = ⟨I(0)²⟩ is the variance, and P_same(x) is given by Eq. (16). This covariance function is approximately a power law for models with power-law-sized objects (see, e.g., Ruderman (1997) for disks, and Alvarez et al. (1999) for more general shapes). Below, we use the numerical expression in Eq. (41) to get an estimate of how C(x) depends on the parameters (i.e. the cut-offs) in a model with the cubic law of sizes.
Figure 2(b) shows a numerical example for r_min = 1/8 and r_max = 2048. The solid line represents the predicted covariance function (Eq. (19)), and the dashed line a least-squares fit (in the region 4 < x < 128) of this function to a power law

C(x) ∝ x^(−η).  (20)

The two lines are almost completely overlapping; the best fit occurs for η = 0.14. We repeat the power-law fit for different values of r_min and r_max, and find that the power-law approximation (Eq. (20)) is good for small values of r_min and large values of r_max (e.g. r_min ≲ 1 and r_max ≳ 1024). Figure 3 shows how the number η depends on the ratio r_max/r_min.
The reason why the figure is interesting is that it gives us an estimate of the deviation from scaling in our model due to the finite cut-offs. Note that full scale invariance, defined by Eq. (8), implies a power spectrum of the form 1/f² and a covariance function with log-behavior, but scaling with renormalization leads to a power spectrum 1/f^(2−η), where η = 2ν, and a covariance function C(x) with the power-law form in Eq. (20). The number η is often known as the "anomalous dimension".
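The least-squares power-law fit used above amounts to linear regression in log-log coordinates. A minimal sketch of this step (our own code; we do not reproduce the function B(x) from Appendix B, so the sanity check below uses synthetic data with a known exponent):

```python
import math

def fit_power_law_exponent(xs, cs):
    """Least-squares fit of C(x) ≈ A * x^(-eta) by linear regression of
    log C on log x; returns the estimated eta (minus the fitted slope)."""
    lx = [math.log(x) for x in xs]
    lc = [math.log(c) for c in cs]
    n = len(xs)
    mx, mc = sum(lx) / n, sum(lc) / n
    slope = (sum((a - mx) * (b - mc) for a, b in zip(lx, lc))
             / sum((a - mx) ** 2 for a in lx))
    return -slope

# Sanity check on an exact power law with eta = 0.14, fitted over the
# same region 4 < x < 128 as in the text.
xs = list(range(4, 129))
cs = [2.0 * x ** -0.14 for x in xs]
eta = fit_power_law_exponent(xs, cs)
```

The same routine, applied to the predicted covariance values from Eq. (41) instead of synthetic data, recovers the η values plotted against r_max/r_min in Fig. 3.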

Numerical Simulation of Dead Leaves Images
So far we have assumed that images I(x, y) are functions of continuous variables x and y. In reality, natural images are given by measurements from a finite array of sensors that average the incident light in some neighborhood. We need to take these things into account in the occlusion model: In Section 3 we analyze a database of 1000 discretized dead leaves images I[i, j] (i and j are the row and column indices, respectively) with subpixel resolution and subpixel objects. Each image has 256 × 256 pixels, a subpixel resolution of 1/s pixels (length scale), where s = 16, and disks as templates. The radii r of the disks are distributed according to 1/r³, where r is between r_min = 1/8 pixels and r_max = 8 · 256 = 2048 pixels. The exact construction is as follows: First, we make an image which is s times larger than the final image; for s = 16, this means 4096 × 4096 pixels. Assume that the "viewing screen" is defined by |x| ≤ 2048 and |y| ≤ 2048. In each iteration, we pick a random position for (the center of) a disk in an extended screen, defined by |x| ≤ 2048 + 16 r_max and |y| ≤ 2048 + 16 r_max; this is to avoid edge effects. The disk is then assigned a radius r from a 1/r³ size distribution with 16 r_min ≤ r ≤ 16 r_max, and a random grey level a according to the empirical marginal distribution of log-contrast for natural images. We use the front-to-back construction of Section 2.1.1 to make sure that the generated images are samples from a stationary probability distribution: First, we place the closest object on the "screen", and then we successively add objects which are farther away until the background is filled.
Finally, we scale down the generated images by averaging pixels in disjoint 16 × 16 blocks; other ways of smoothing, such as convolution with a Gaussian filter followed by subsampling, are also possible.
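A much reduced version of this construction can be sketched as follows (our own code, not the paper's; the image size, supersampling factor s, cut-offs, and the uniform grey-level distribution are scaled-down stand-ins for the values above, chosen so the loop runs quickly in pure Python):

```python
import random

def dead_leaves(n=32, s=2, r_min=0.5, r_max=32.0, seed=0):
    """Reduced front-to-back dead-leaves simulation with disk templates.

    An (n*s) x (n*s) high-resolution image is covered by disks added from
    front to back: a pixel keeps the grey level of the FIRST disk covering
    it (the frontmost object), so later disks only fill still-uncovered
    pixels.  Radii follow the cubic law f(r) ~ 1/r^3 on [r_min, r_max]
    (in high-resolution pixels); centers fall in an extended screen to
    avoid edge effects; grey levels are uniform on [0, 1] here (the paper
    uses the empirical log-contrast distribution instead).  The result is
    block-averaged down to n x n.
    """
    rng = random.Random(seed)
    size = n * s
    img = [[None] * size for _ in range(size)]
    uncovered = size * size
    margin = r_max
    while uncovered > 0:
        u = rng.random()  # inverse-CDF sample from the 1/r^3 law
        r = ((1.0 - u) / r_min**2 + u / r_max**2) ** -0.5
        cx = rng.uniform(-margin, size + margin)
        cy = rng.uniform(-margin, size + margin)
        a = rng.random()
        for i in range(max(0, int(cy - r)), min(size, int(cy + r) + 2)):
            for j in range(max(0, int(cx - r)), min(size, int(cx + r) + 2)):
                if img[i][j] is None and (j - cx) ** 2 + (i - cy) ** 2 <= r * r:
                    img[i][j] = a
                    uncovered -= 1
    # scale down by averaging disjoint s x s blocks
    return [[sum(img[bi * s + di][bj * s + dj]
                 for di in range(s) for dj in range(s)) / (s * s)
             for bj in range(n)] for bi in range(n)]

im = dead_leaves()
```

With the paper's parameters (256 × 256 pixels, s = 16, r_min = 1/8, r_max = 2048) the same loop applies unchanged, but an array-based implementation is advisable for speed.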

Statistics of Natural Versus Synthetic Images
Below we compare the empirical statistics of the simulated dead leaves images (see Section 2.5 for details) with natural images from the following two databases:

1. The database of van Hateren (van Hateren and van der Schaaf, 1998).

2. A segmented database from British Aerospace; the segmentation into categories (such as vegetation, man-made, sky, and road) allows the examination of the statistics for each category separately, as well as for the whole ensemble.
Both databases of natural images use calibrated image data, i.e. the images measure light in the world up to an unknown multiplicative constant in each image. To obtain results which are independent of the gain setting, we will work with the log-contrast of the images, defined by

I[i, j] = log φ[i, j] − ⟨log φ⟩,

where φ[i, j] are the calibrated grey-level values and the average ⟨·⟩ is taken over each image separately, and use statistics which do not contain the (now additive) constant. Furthermore, we will always graph the logarithm of probability, rather than probability itself, since the log-scale better shows the nature of the tails in the distributions.
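The gain invariance of this statistic is easy to verify numerically: multiplying an image by an arbitrary positive constant leaves its log-contrast unchanged. A minimal sketch (ours; the toy 2 × 2 image is illustrative):

```python
import numpy as np

def log_contrast(phi):
    """Log-contrast of a calibrated image: log grey level minus its
    per-image mean.  An unknown multiplicative gain becomes an additive
    constant in the log and is removed by the subtraction."""
    L = np.log(phi)
    return L - L.mean()

phi = np.array([[1.0, 2.0], [4.0, 8.0]])   # toy calibrated image
base = log_contrast(phi)
rescaled = log_contrast(3.0 * phi)          # same image, different gain
```

Any statistic computed from `base` (histograms of differences, wavelet responses, and so on) is therefore independent of the camera's gain setting.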

Single-Pixel Statistics
The distribution of log-contrast for natural data has a highly non-Gaussian shape with heavy, almost straight tails (see Huang and Mumford, 1999). The single-pixel statistic is, however, not very informative, as we can strongly modify the histogram of an image without affecting much of its perception. It is also trivial to get a good fit with the dead leaves model, as we can always choose the grey levels of the templates according to the empirical distribution of log-contrast in natural data.

Derivative Statistics and Scaling
In this section, we study the difference of adjacent pixel values, and how well histograms of this difference or derivative statistic scale. More precisely: For each log-contrast image I, we define a scaled-down image I^(N) by computing the average of pixel values in disjoint N × N blocks. The statistic we investigate is the horizontal derivative, which for scale N is given by

D^(N)[i, j] = I^(N)[i, j + 1] − I^(N)[i, j].

If natural images are fully scale-invariant, D^(N) should have the same distribution for different values of N.
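Block averaging and the horizontal derivative can be sketched as follows (our own code). As a sanity check we apply it to white noise, which is not scale-invariant: 2 × 2 block averaging shrinks the derivative's standard deviation by a factor of about 2, unlike for (approximately) scale-invariant natural images:

```python
import numpy as np

def block_average(img, N):
    """Scaled-down image I^(N): averages over disjoint N x N pixel blocks
    (trailing rows/columns that do not fill a block are dropped)."""
    h, w = img.shape
    img = img[:h - h % N, :w - w % N]
    return img.reshape(h // N, N, w // N, N).mean(axis=(1, 3))

def horizontal_derivative(img):
    """D: difference of horizontally adjacent pixel values."""
    return img[:, 1:] - img[:, :-1]

rng = np.random.default_rng(0)
I = rng.standard_normal((256, 256))   # white-noise stand-in image
D1 = horizontal_derivative(I).ravel()
D2 = horizontal_derivative(block_average(I, 2)).ravel()
ratio = D1.std() / D2.std()           # ~2 for white noise
```

For a scale-invariant ensemble the histograms of D^(1) and D^(2) would coincide and `ratio` would be close to 1; the deviation from 1 across scales is exactly what the regressions in Section 3.2.3 quantify.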
When we measure departure from scale invariance, we look at both the change of the shape of the histograms after rescaling and the change in the standard deviation of D.

Generic Natural Scenes.
First, we note that for large databases of natural images, the derivative statistic D is a surprisingly stable statistic, consistent across different datasets. We also get an excellent fit of the probability density function of D to a generalized Laplace distribution

p(D) ∝ e^(−|D/s|^α),

where s and α are parameters related to the variance and kurtosis. Figure 5(a) (bottom) shows three curves which are almost the same. These correspond to: (1) the log-histogram of D, at scale 2, for van Hateren's database, (2) the corresponding log-histogram for the unsegmented British Aerospace database, and (3) a fit to a generalized Laplace distribution with α = 0.68. The second observation is that the histograms of D for generic images scale almost fully: Fig. 5(a) (top) shows four very similar curves. These correspond to the log-probability distribution of D^(N), for N = 1, 2, 4, 8, in van Hateren's database (the curves have been shifted vertically for visibility). Except at the first scale (solid line), the histograms lie almost completely on top of each other.
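For reference, the shape parameter α of a generalized Laplace density p(D) ∝ exp(−|D/s|^α) can be moment-matched from the sample kurtosis, since the kurtosis equals Γ(1/α)Γ(5/α)/Γ(3/α)² independently of the scale s. A sketch of this fit (our own code; the bisection bounds are arbitrary choices):

```python
import math

def gl_kurtosis(alpha):
    """Kurtosis of the generalized Laplace density p(D) ~ exp(-|D/s|^alpha);
    it depends only on alpha, not on the scale s."""
    g = math.gamma
    return g(1.0 / alpha) * g(5.0 / alpha) / g(3.0 / alpha) ** 2

def fit_alpha(sample_kurtosis, lo=0.2, hi=2.5):
    """Moment-match alpha by bisection (kurtosis is decreasing in alpha)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gl_kurtosis(mid) > sample_kurtosis:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Sanity checks: alpha = 2 is Gaussian (kurtosis 3) and alpha = 1 is
# Laplace (kurtosis 6); fitting kurtosis 6 should recover alpha = 1.
alpha_laplace = fit_alpha(6.0)
```

Values of α below 1, such as the α ≈ 0.68 reported above, correspond to kurtoses well above the Laplace value of 6, i.e. to the very heavy tails of natural-image derivatives.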

Dead Leaves Model for Generic Scenes.
We now compare the marginal distribution of D for natural data with results from the occlusion model (Fig. 5(b)). We are able to reproduce the main features seen in natural data: (1) the singularity at 0, which corresponds to large regions with no contrast variation, and (2) the heavy tails, which correspond to edges in the images.
The tails are slightly less concave than for natural data; but as before, we get an excellent fit to a generalized Laplace distribution (see the two overlapping curves in Fig. 5(b), bottom), in this case with α = 0.78 (compared with α = 0.68 for natural data).
Figure 5(b), top part, shows that the synthetic images are close to scale-invariant. The shape of the log-histograms of D remains the same after scaling, but the standard deviation of the derivative decreases somewhat with N (by a factor of N^0.1); this is because of the anomalous dimension η in the system (see Section 2.4.3). In the figure we have made a contrast renormalization (by dividing out the change in the standard deviation); the histograms of D for 4 different scales then lie almost completely on top of each other.

Different Types of Natural Scenes.
The segmentation of the British Aerospace database makes it possible to study different types of natural scenes.
For all categories except "sky", the shapes of the histograms at different scales are about the same (except at the first scale); this is consistent with the assumption that I scales according to Eq. (21). For the sky category, the shape seems to depend on N.
To study the scaling properties in more detail, we plot the logarithm of the standard deviation against the logarithm of N, and perform a linear regression. This is equivalent to fitting a power spectrum of the form C/f^(2−η): The slope gives us an estimate of half of the "anomalous dimension" η, or, if Eq. (21) is valid, the scaling exponent ν.
From Fig. 6 and the linear regression, we conclude that, although natural images as a single ensemble are very nearly scale-invariant, with a derivative histogram described by a concave-shaped generalized Laplace distribution (see Section 3.2.1), major differences exist between different categories, or different parts of images:

1. For the vegetation category, the log-histograms of D have relatively straight tails. In terms of power spectrum fall-off, we get C/f^1.8, which is similar to Ruderman's and Bialek's results (Ruderman and Bialek, 1994).

2. For the man-made category, the log-histograms of D have heavy "shoulders" with a convex shape. The power spectrum scales like C/f^2.3.

3. For the sky category (including clouds), the density of the distribution for D is mainly concentrated around 0. The power spectrum scales like C/f^1.0, i.e. the category is intermediate between white noise and the standard category with η ≈ 0.

4. For the road category, the log-histograms are slightly concave. The power spectrum scales roughly like C/f^1.4.

(Fig. 7(c): log-histograms of D^(N) for "vegetation-like" at scales N = 2 (solid), 4 (dashed), 8 (dash-dotted), and 16 (dotted). Fig. 7(d): corresponding log-histograms for "man-made-like".)

Different Versions of the Dead Leaves Model.
There are many ways one can vary the dead leaves model so that it fits different types of natural scenes. One can use different kinds of templates, and one can vary the size cut-offs r_min and r_max to fit the level of clutter and the anomalous dimension in the images. Here we show two different versions of the occlusion model that will be compared to the categories "vegetation" and "man-made" from the previous section: For vegetation-like images, we generate 500 relatively cluttered images with 256 × 256 pixels (see Fig. 7(a)). We use elliptic primitives of random orientation. The length L of the major axis is distributed according to 1/L³, where 2 ≤ L ≤ 1024 pixels, and the ratio between the lengths of the minor and major axes is a uniform deviate between 1/8 and 1/2. We color the ellipses according to a "double-exponential" distribution with mean 0 and variance 1, but add some Gaussian noise for smoothing.
For man-made-like images, we generate 500 cleaner-looking images with square primitives (see Fig. 7(b)). The side length s of the squares is distributed according to 1/s³, where 2 ≤ s ≤ 2048 pixels. We color the squares according to a uniform intensity distribution with mean 0 and variance 1. As before, we add some Gaussian noise for smoothing.
Figures 7(c) and (d) summarize the results. The two plots show the log-probability of the derivative D^(N) at scales N = 2, 4, 8, and 16, for "vegetation-like" and "man-made-like", respectively. Compare the plots to Fig. 6(a) and (b) for the vegetation and man-made categories. Furthermore, linear regression gives the scaling exponent −0.09 for the occlusion model with high clutter and elliptic primitives, and the exponent 0.13 for the model with low clutter and square primitives; compare this to the "vegetation" and "man-made" categories, where the exponents are −0.11 and 0.15, respectively.

Joint Statistics of Haar Wavelet Responses
In this section, we look at the distribution of 2 × 2 blocks of pixels. If a_{i,j} = I[i_0 + i, j_0 + j], where 0 ≤ i, j ≤ 1, is such a block, then we look at the distribution of (a_00, a_01, a_10, a_11) ∈ R^4. The mean is a relatively uninformative statistic, hence we look at the projection to R^3 given by the 3 Haar wavelet responses

cH = (a_00 + a_01 − a_10 − a_11)/2,
cV = (a_00 − a_01 + a_10 − a_11)/2,
cD = (a_00 − a_01 − a_10 + a_11)/2,

which we call the horizontal, vertical and diagonal filters, respectively. These Haar wavelets show clearly, though only partially, the very specific nature of local statistics (or textons) in natural images. We use the same definitions as in Buccigrossi and Simoncelli (1999) to describe the relative positions of wavelet coefficients: we call the coefficients at adjacent spatial locations in the same subband brothers (left, right, upper, or lower, depending on their relative positions; note that the supports of the basis functions are disjoint), and we call the coefficients at the same position, but different orientations, cousins.
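In code, the three responses of the disjoint 2 × 2 blocks are a direct transcription of the definitions above (a minimal sketch; the function and variable names are ours):

```python
import numpy as np

def haar_2x2(img):
    """Horizontal (cH), vertical (cV) and diagonal (cD) Haar responses of the
    non-overlapping 2x2 blocks of an image with even dimensions."""
    a00 = img[0::2, 0::2]; a01 = img[0::2, 1::2]
    a10 = img[1::2, 0::2]; a11 = img[1::2, 1::2]
    cH = (a00 + a01 - a10 - a11) / 2  # top row minus bottom row
    cV = (a00 - a01 + a10 - a11) / 2  # left column minus right column
    cD = (a00 - a01 - a10 + a11) / 2  # checkerboard pattern
    return cH, cV, cD
```

Note the identities cH − cD = a_01 − a_11 and cH + cD = a_00 − a_10, which are exactly the relations behind the T-junction cusps discussed in the next paragraph.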
Figure 8 shows the joint wavelet statistics for natural images. We have plotted the contour levels for the joint histograms of several coefficient pairs (at scale N = 2). A common feature of all the histograms is that a cross-section through the origin has a peak at the center and long tails; the shape is very similar to the derivative density function in Section 3.2. More complicated structures also show up in the polyhedra-like contour-level curves: the corners and edges (in the level curves), which are sometimes rounded and sometimes cuspidal, reflect typical local features in the images. For example, in the plot for the horizontal (cH_1)-left brother (cH_2) pair (Fig. 8(c)), the edges along the diagonal cH_1 = cH_2 indicate the frequent occurrence of extended horizontal edges in natural scenes. In the plot for the horizontal (cH)-diagonal (cD) pair (Fig. 8(b)), the cusp at cH = cD corresponds to the T-junction a_01 = a_11, a_00 ≠ a_10; the cusp at cH = −cD corresponds to the T-junction a_00 = a_10, a_01 ≠ a_11.
In Fig. 9, we have plotted the contour-level curves of the corresponding joint histograms for the synthetic images. We see that the plots are very similar to those in Fig. 8: the corners and edges all appear in the right places, and the shapes of the curves are almost the same as those in Fig. 8. This is a strong indication that the occlusion model captures much of the local image structure in natural data. Compare this to the Gaussian model, for example, which fails completely here: for Gaussian images, all contours in the wavelet domain are elliptic. The random wavelet expansions (Eq. (4)) can reproduce some of the polyhedra-like contours for images with low levels of clutter (i.e. few objects), but the edges become more rounded, and the contours more elliptic, for higher levels of clutter.
In Fig. 9, we also see some smaller differences between natural and computer-simulated images, for example in the cH-cV plot (Fig. 9(a)) and the cH_1-cH_2 plot (Fig. 9(c)). In natural scenes, there is a strong bias toward the horizontal and vertical directions, because of tree trunks, the horizon, buildings, etc. The computer-simulated images, on the other hand, have disk-like primitives only, and are rotationally invariant. This leads to more rounded shapes in Fig. 9(a) and (c).

Long-Range Covariances
So far we have only looked at small-scale statistics, i.e. statistics for single pixels or nearest-neighbor pixels. Here we extend our comparison of natural and simulated images to long-range statistics.
The simplest long-range statistic is probably the correlation between two pixels in, for example, the horizontal direction. We can calculate the covariance function (Eq. (18)) or, alternatively, the variance of the difference of two pixel values, i.e.

V(x) = ⟨(I(x_0 + x) − I(x_0))²⟩,

where ⟨·⟩ denotes an average over all images (and all positions x_0). The latter formulation is a good choice when images are offset by an unknown constant. The two functions are otherwise equivalent, as

V(x) = 2(C(0) − C(x)).

In Huang and Mumford (1999), Huang shows that the "difference function" for natural images (in a fixed direction) is best modeled by

V(x) ≈ a_1 − a_2 x^(−η) + a_3 x,

where x is the separation distance between the two pixels, and a_1, a_2, a_3 are constants. The power-law term dominates the short-range behavior, while the linear term dominates at large pixel distances. The linear term indicates that, while the scale-invariance property holds almost exactly locally (i.e. for filters with small supports), there are systematic deviations from scale invariance on a large scale. This may be due to the presence of sky in the images.
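The difference function is straightforward to estimate empirically. The sketch below follows the convention above (averages over all images and all valid positions); the fit to the a_1 − a_2 x^(−η) + a_3 x form would be done on top of this estimate, and the function name is our own.

```python
import numpy as np

def difference_function(imgs, xs):
    """Estimate V(x) = <(I(x0 + x) - I(x0))^2> in the horizontal direction,
    averaging over all images and all valid positions x0."""
    V = []
    for x in xs:
        total, count = 0.0, 0
        for img in imgs:
            d = img[:, x:] - img[:, :-x]
            total += float((d ** 2).sum())
            count += d.size
        V.append(total / count)
    return np.array(V)
```

Because V(x) = 2(C(0) − C(x)), this statistic is insensitive to a constant intensity offset, which is exactly why it is preferred here.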
In the synthetic images, the linear term is absent. The difference function for the simulated images is best modeled by a power law for both short and large pixel distances (see Section 2.4); in the Fourier domain, this corresponds to a power spectrum of the form 1/k^(2−η).
Figure 10 shows this more clearly. Here we have a log-log plot (base 2) of the derivative of V(x) for natural images (solid line) and computer-simulated dead leaves images (dashed line). A power-law behavior according to Eq. (29) would lead to a straight line with slope −(1 + η).
The fit between natural images (solid line) and synthetic images (dashed line) is good in the region where both curves are relatively straight, i.e. for 2 < log_2 x < 5, or distances between 4 and 32 pixels. The slopes here are −1.19 (η = 0.19) for natural data, and −1.16 (η = 0.16) for synthetic images (cf. η = 0.14 for the continuous model in Fig. 2(b)). For natural images, however, the curve turns and becomes almost horizontal for large distances (about 1/10 of the image size); this indicates a linear term in the difference function.

Two-Pixel Co-occurrence Statistics
In this section, we compare the complex behavior of the full co-occurrence statistics in the occlusion model and in natural data; previously, we compared the auto-correlation (or the variance of the difference of two pixel values) and the full histogram of differences of adjacent pixel values. Below, we also test the basic assumption in the occlusion model that pairs of pixels in natural images can be segmented into two classes: "same object" and "different objects".
In the calculations, we have symmetrized the data so that Pr{I(0) = a} = Pr{I(0) = −a}. This removes the bias towards high intensity values caused by the sky in natural images.

Bivariate Fit for Computer-Simulated Images.
A Modified Occlusion Model. As mentioned before, the continuous occlusion model (Eq. (15)) gives good predictions for high-resolution images, but dead leaves images smoothed by subpixel averaging really give the best fit to natural data (derivative statistics, scaling, joint statistics of Haar wavelet coefficients, etc.). Thus, one needs to examine how much the formula for the co-occurrence function K(a, b, x) is changed by subpixel averaging.
As an ansatz for the co-occurrence function of two pixels in dead leaves images with smoothing, we write (cf. Eq. (15))

K(a, b, x) = (1 − λ(x)) f_x(a) f_x(b) + λ(x) h_x((a + b)/2) g_x(a − b).   (30)

As before, we assume that different objects are statistically independent. The first term in Eq. (30) corresponds to pixels on different objects, and is equivalent to the product in Eq. (15) of the pdfs of single-pixel intensities. In the second term, which represents pixels on the same object, we replace the previous delta function with a new probability density function g_x that is highly concentrated around 0. We also introduce a new probability density function h_x which is similar to the function f_x for single-pixel intensities. (The subindex x in h_x and f_x indicates that the functions may depend on the distance x between the pixels.) For convenience, we make the variable substitution

u = (a + b)/2,  v = a − b.   (31)

The joint pdf of the new variables is given by

Q(u, v, x) = (1 − λ(x)) q(u + v/2) q(u − v/2) + λ(x) h_x(u) g_x(v),   (32)

where q plays the role of f_x in the new variables. We now fit the empirical joint pdf Q_synth(u, v, x) of computer-simulated images to the expression in Eq. (32); as a best-fit criterion we minimize the Kullback-Leibler distance. For fixed x, the bivariate fit gives us a value for λ(x), and expressions for the 1D functions h_x, g_x and q.
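The fitting step itself can be sketched as follows. We simplify relative to the paper: rather than fitting q, g_x and h_x nonparametrically, we assume the component histograms are known and grid-search only the mixture weight λ that minimizes the Kullback-Leibler distance to the empirical histogram. The synthetic data, the component shapes, and all names below are illustrative choices of ours.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler distance between two normalized histograms."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / np.maximum(q[m], 1e-300))))

def fit_lambda(emp, same, diff, grid=None):
    """Grid-search the weight minimizing KL(emp || (1-lam)*diff + lam*same)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    kls = [kl(emp, (1 - lam) * diff + lam * same) for lam in grid]
    return float(grid[int(np.argmin(kls))])

# Synthetic check: draw (u, v) from a known two-component mixture.
rng = np.random.default_rng(1)
n, lam_true = 200_000, 0.35
n_same = rng.binomial(n, lam_true)
u_s = rng.laplace(0, 1, n_same)      # "same object": intensity-like mean u
v_s = rng.normal(0, 0.1, n_same)     #   and a small difference v
a = rng.laplace(0, 1, n - n_same)    # "different objects": independent
b = rng.laplace(0, 1, n - n_same)    #   single-pixel intensities
edges = np.linspace(-6, 6, 49)

def hist2(u, v):
    H = np.histogram2d(u, v, bins=[edges, edges])[0]
    return H / H.sum()

emp = hist2(np.r_[u_s, (a + b) / 2], np.r_[v_s, a - b])
lam_hat = fit_lambda(emp, hist2(u_s, v_s), hist2((a + b) / 2, a - b))
```

Because the pooled histogram is exactly the count-weighted combination of the two component histograms, the KL distance is minimized at the true mixture weight, up to binning and grid resolution.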
In Fig. 11(a), we see how λ(x) varies with the distance x between the pixels. We compare λ-values for x = 1, 2, 4, 8, 16, 32, 64, 128 from the bivariate fit (rings) to the analytically calculated P_same(x) for the continuous model (solid line). The figure shows that λ(x) > P_same(x) for fixed x; that is, the probability that two points belong to the same object is larger for the smoothed dead leaves images than for the continuous images. This is what we expect of a model where small regions of intensity variation are considered parts or textures of larger objects, rather than separate objects. The 1D functions q, g_x and h_x that give the bivariate fit at x = 1, 2, 4, 8, 16, 32, 64, 128 are plotted in Fig. 11(b)-(d). Note that the functions depend very little on x; h_x and q are also almost the same.
Figure 12 shows the results of a bivariate fit for x = 2 (top), x = 16 (center), and x = 128 (bottom). In the left column, we show the contour levels of Q_synth(u, v, x); the right column shows the best fit of the data to Eq. (32). The agreement between the data and the fit is very good, which shows that we can use Eq. (30) or Eq. (32) to accurately describe the two-pixel statistics of dead leaves images with subpixel smoothing.

Bivariate Fit for Natural Images.
Finally, we test how well the modified mixture model fits natural data from van Hateren's database. Figure 13 (left and center columns) shows three examples of a bivariate fit to Eq. (32); the distances between the pixels are x = 2 (top), x = 16 (center), and x = 128 (bottom). Although the model is simple, the fit is good; that is, we are able to write the full two-pixel co-occurrence statistic of natural images as a mixture ("same" and "different" objects). This is a direct confirmation of the accuracy of the dead leaves model.
In the current version of the dead leaves model, different objects are independent. This seems to be a reasonable first approximation. The "objects" defined by such a model, however, become very complex and large, because of the dependencies between different parts of natural scenes. In Fig. 14, we plot the 1D functions h_x, g_x and q that we get from a best fit to Eq. (32) at separation distances x = 1, 2, 4, 8, 16, 32, 64, 128. As before, the functions h_x and q are almost the same, and depend little on x. However, the function g_x, which is related to the difference in intensity of pixels on the same "object", depends strongly on x: the function is relatively concentrated around 0 for small x, but becomes wider for larger values of x. The dependency of g_x on x indicates that the "objects" have parts, and parts of parts. Assume, for example, that a whole region of a forest is classified as one "object". Because the forest divides into trees, and trees in turn have leaves, we would expect g_x to become successively narrower for smaller values of the pixel distance x. This is indeed what we found for the bivariate fit of natural images, but not for the bivariate fit of the synthetic images, which lack this "object tree structure".
Furthermore, we note that the contour plots for both natural (Fig. 13, left) and synthetic images (Fig. 12, left) become more rectangular for increasing values of x, as the probability of the pixels lying on different "objects" increases. For fixed x, λ(x) for natural data is considerably larger than P_same(x) for computer-simulated images (see Fig. 14(a); also compare to Fig. 10(a) where, for fixed x, the variance V(x) of the difference of two pixel values is much larger for synthetic than for natural data). Again, this is an indication that a more realistic variant of the occlusion model should include objects with parts. However, if we compare contour plots of Q_synth(u, v, x) and Q_nat(u, v, x) that correspond to similar λ-values (compare the left and right columns in Fig. 13), we get an excellent fit between the dead leaves model and natural data, for a range of different pixel separation distances.
Ruderman has previously shown that, at stationarity,

P_same(x) = p_2(x) / (p_1(x) + p_2(x)),   (33)

where p_2(x) is the probability that the front-most object, which is not occluded by any other objects, contains both points in the pair, and p_1(x) is the probability that this object contains exactly one of the two points. Furthermore, he has derived an expression for the conditional probability that a circle A with radius r, containing one given point, also contains a second point at distance x. In terms of the dimensionless quantity ξ = x/r, the function has the form

g(ξ) = (2/π) (arccos(ξ/2) − (ξ/2) √(1 − ξ²/4)),  0 ≤ ξ ≤ 2   (35)

(g(ξ) = 0, for ξ > 2).
In our model, the radii of the disks are distributed according to 1/r^3, where r_min ≤ r ≤ r_max. We then have that

φ(r) dr = dr / (r ln(r_max/r_min))

is the probability that a given point in the image belongs to an object with a radius in the interval [r, r + dr].
By inserting the above equations into Eq. (33), we get

P_same(x) = B(x) / (2 − B(x)),   (38)

where

B(x) = ∫_{r_min}^{r_max} g(x/r) φ(r) dr   (39)

and g(ξ) is given by Eq. (35).
Substituting ξ = x/r in Eq. (39) leads to a numerical estimate of B(x), and thus a numerical expression for P_same in Eq. (38). For 2r_min ≤ x < 2r_max,

B(x) = (1/ln(r_max/r_min)) ∫_{u}^{2} g(ξ) dξ/ξ,

where u = x/r_max.
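The numerical evaluation can be sketched as follows. We substitute t = ln ξ to tame the 1/ξ singularity of the integrand near 0; the trapezoidal sum and the default cut-offs are our own choices.

```python
import numpy as np

def g(xi):
    """Overlap function for a disk (Eq. (35)); zero for xi > 2."""
    xi = np.asarray(xi, dtype=float)
    half = np.clip(xi / 2.0, 0.0, 1.0)
    return np.where(xi <= 2.0,
                    (2.0 / np.pi) * (np.arccos(half) - half * np.sqrt(1.0 - half**2)),
                    0.0)

def B(x, r_min=2.0, r_max=2048.0, n=4001):
    """B(x) via t = ln(xi), xi running from x/r_max up to min(2, x/r_min)."""
    lo, hi = x / r_max, min(2.0, x / r_min)
    if lo >= hi:
        return 0.0
    t = np.linspace(np.log(lo), np.log(hi), n)
    vals = g(np.exp(t))                    # integrand g(xi) d(ln xi)
    integral = (t[1] - t[0]) * (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2)
    return integral / np.log(r_max / r_min)

def p_same(x, **kw):
    b = B(x, **kw)
    return b / (2.0 - b)
```

As expected, p_same(x) decreases from 1 (pixels much closer than the smallest leaf) towards 0 as x approaches the diameter of the largest leaf.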
B. Can we let r max → ∞?
In the occlusion model, there is the notion that the larger r_max is, the more likely it is that the image is covered by a single object. Below, we show that if we allow infinite-sized objects, the probability of this happening is exactly one.
For simplicity, we assume that the image screen is circular with radius a. As before, we place disks in R² × [r_min, r_max] according to a Poisson process with rate function

ν(x, y, r) = c/r³,

where c is a constant and r is the disk radius. A transformation of the disk centers to polar coordinates (ρ, α) gives

ν(ρ, α, r) dρ dα dr = (c/r³) ρ dρ dα dr;

that is, after integrating out the angle α, the points (ρ, r) form a Poisson process on the product space (0, ∞) × (r_min, r_max) with rate function

ν(ρ, r) = 2πc ρ/r³.

Now consider the front-most object in the image. The object either covers the whole image, in which case the image has a uniform intensity, or it overlaps only part of the screen. Below, we calculate the probability that an image has a constant grey level. This is given by

Pr{constant image} = µ_cover / µ_overlap,

where µ_cover is the measure of the set of "covering" objects in the Poisson process, and µ_overlap is the measure of the set of "overlapping" objects. The definitions of "overlap" and "cover" are straightforward: an object of radius r centered a distance ρ from the screen center overlaps the screen if

ρ < a + r,

and covers the screen if

r ≥ ρ + a.

Figure 15 shows three examples of objects that "overlap" the screen. In the top two figures, the objects "overlap" but do not "cover" the screen. In the bottom figure, the object both "overlaps" and "covers" the screen.
The quantities µ_cover and µ_overlap can be calculated by integration:

µ_overlap = 2πc ∫_{r_min}^{r_max} ∫_0^{a+r} (ρ/r³) dρ dr = πc ∫_{r_min}^{r_max} ((a + r)²/r³) dr,

µ_cover = 2πc ∫_{max(r_min, a)}^{r_max} ∫_0^{r−a} (ρ/r³) dρ dr = πc ∫_{max(r_min, a)}^{r_max} ((r − a)²/r³) dr.

Both integrals grow like πc ln(r_max) as r_max → ∞, so

µ_cover / µ_overlap → 1 as r_max → ∞,

for r_min finite. Without an upper cutoff on the object sizes, a single object will thus cover the whole screen.
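The two measures have closed forms, since (r ± a)²/r³ = 1/r ± 2a/r² + a²/r³ integrates to ln r ∓ 2a/r − a²/(2r²). A quick numerical check of the limit follows (the constants and names are ours; the overall factor πc cancels in the ratio, so we set c = 1):

```python
import numpy as np

def mu_overlap(a, r_min, r_max):
    # pi * integral of (a + r)^2 / r^3 over [r_min, r_max]  (c = 1)
    F = lambda r: np.log(r) - 2 * a / r - a**2 / (2 * r**2)
    return np.pi * (F(r_max) - F(r_min))

def mu_cover(a, r_min, r_max):
    # pi * integral of (r - a)^2 / r^3 over [max(r_min, a), r_max]
    lo = max(r_min, a)
    F = lambda r: np.log(r) + 2 * a / r - a**2 / (2 * r**2)
    return np.pi * (F(r_max) - F(lo))

def cover_probability(a, r_min, r_max):
    return mu_cover(a, r_min, r_max) / mu_overlap(a, r_min, r_max)
```

Both measures grow like π ln(r_max), so the ratio climbs monotonically toward 1; note that the convergence is only logarithmic, so r_max must be pushed to astronomically large values before the probability gets close to 1.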

Figure 1 .
Figure 1. (a) A computer-simulated sample from a dead leaves model; see Section 2.5 for details. The image is a collage of discrete objects which partially occlude one another. Compare with (b), a computer-simulated sample from the standard Gaussian model, where there are no clear objects or borders. Both images are approximately scale invariant.

Figure 2
Figure 2(a) shows a numerical example for f(a) =

Figure 2 .
Figure 2. Predictions for a continuous dead leaves model with cubic law of sizes and cut-offs r_min = 1/8 and r_max = 2048. a: Difference statistics. The figure shows the predicted log(probability) distribution of D for a model with intensities according to a "double-exponential" distribution; D is the grey-level difference for points a distance 1 apart. The distribution has a sharp peak at D = 0 and long straight tails. b: Covariance statistics. The solid line shows the predicted covariance function C(x); the overlapping dotted line represents a power-law fit C(x) ≈ −0.33 + 0.91 · x^(−0.14).

Figure 3 .
Figure 3. Relation between the number η ("eta") and the cut-offs r_min and r_max in the dead leaves model with disks and cubic law of sizes. For each curve, r_max is fixed and r_min is varied. The data points are shown in a plot of η versus log_2(r_max/r_min). The 4 curves roughly overlap, which indicates that η is a function of the ratio r_max/r_min. Legend: r_max = 1024; • r_max = 2048; * r_max = 4096; ♦ r_max = 8192.

Figure 1
Figure 1(a) shows an example of a computer-simulated dead leaves image.

Figure 4 .
Figure 4. Two sample images from the British Aerospace database and their segmentations into pixels belonging to different categories of scenes (as defined in Table 1).

Figure 5 .
Figure 5. Derivative statistics and scaling. a: Natural images. The difference statistic D between adjacent pixels is amazingly stable, both across different databases and across different scales. The bottom three curves, which overlap, correspond to the log-histograms of D for van Hateren's database (solid), the British Aerospace database (dashed), and a fit to a generalized Laplace distribution with parameter α = 0.68 (dotted). The top four curves (which have been shifted vertically for visibility) correspond to the log-probability of D^(N) for scales N = 1 (solid), 2 (dashed), 4 (dash-dotted), and 8 (dotted). b: Synthetic images. The distribution of D for the model (solid, bottom part) can be approximated by a generalized Laplace distribution with parameter α = 0.68 (dashed, bottom part). After a contrast normalization, the log-histograms of D^(N) at scales N = 1, 2, 4, and 8 lie almost on top of each other (see the top four curves, which have been shifted for visibility).

Figure 8 .
Figure 8. Joint statistics in the wavelet domain for natural images. The contour plots show the log(probability) distributions for different wavelet-coefficient pairs at scale 1. a: Horizontal (cH) and vertical (cV) components. b: Horizontal (cH) and diagonal (cD) components. c: Horizontal component (cH_1) and its left brother (cH_2). d: Vertical component (cV_1) and its left brother (cV_2).

Figure 9 .
Figure 9. Joint statistics in the wavelet domain for synthetic images. Compare the contour plots with those in Fig. 8 for natural images: the similarities indicate that the dead leaves model captures much of the local image structure of natural data.

Figure 10 .
Figure 10. Covariances. The figure shows a log-log plot (base 2) of the derivative of the difference function V(x) for natural images (solid line) and synthetic images (dashed line).

Figure 11 .
Figure 11. Functions that give the best bivariate fit of computer-simulated images (to Eq. (32)) at different distances x. a: The rings represent the λ-values from a bivariate fit at x = 1, 2, 4, 8, 16, 32, 64, 128, and the solid line represents P_same(x) for a continuous model with uniformly colored objects. b-d: The 3 plots show the 1D functions q, g_x and h_x from the bivariate fit in (a). The functions depend very little on x; h_x and q are also almost the same. Legend: • x = 1; × x = 2; + x = 4; x = 8; x = 16; ♦ x = 32; x = 64; x = 128.

Figure 12 .
Figure 12. Co-occurrence function Q_synth(u, v, x) and best bivariate fit (to Eq. (32)) for numerically simulated dead leaves images. Left: The plots show the contour levels (−9, −7, −5, −3, −1) for the logarithm of the joint pdf of the sum u and difference v for pixels a distance x = 2, 16 and 128 pixels apart (horizontal direction). Right: The plots show the corresponding contour levels for the best fit to Eq. (32).

Figure 13 .
Figure 13. The plots show the contour levels (−9, −7, −5, −3, −1) for the logarithm of the joint pdf of the sum u and difference v for pixels a distance x = 2, 16 and 128 pixels apart. Left: Q_nat(u, v, x) for natural images. Center: The best fit of Q_nat(u, v, x) to Eq. (32). Right: Q_synth(u, v, x) corresponding to similar λ-values.

Figure 14 .
Figure 14. Functions that give the best bivariate fit of natural images (to Eq. (32)) at different distances x. a: The stars represent the λ-values from a bivariate fit of natural images at x = 1, 2, 4, 8, 16, 32, 64, 128; these are compared to the λ-values for synthetic images (rings). b-d: The 3 plots show the 1D functions q, g_x and h_x for the bivariate fit of natural images. Note that the function g_x, which is related to the intensity difference of pixel values on the same object, becomes wider for larger values of x. This indicates that the "objects" from the bivariate fit have parts, and parts of parts (see text). Legend: • x = 1; × x = 2; + x = 4; * x = 8; x = 16; ♦ x = 32; x = 64; x = 128.

Figure 15
Figure 15. Three examples of objects (shaded area) that "overlap" the image screen (dashed circle). Top left and right: The object "overlaps" but does not "cover" the screen. Bottom: The object both "overlaps" and "covers" the screen.

Table 1 .
Different categories and their frequencies.