The Nonlinear Statistics of High-Contrast Patches in Natural Images

Recently, there has been a great deal of interest in modeling the non-Gaussian structures of natural images. However, despite the many advances in the direction of sparse coding and multi-resolution analysis, the full probability distribution of pixel values in a neighborhood has not yet been described. In this study, we explore the space of data points representing the values of 3 × 3 high-contrast patches from optical and 3D range images. We find that the distribution of data is extremely “sparse” with the majority of the data points concentrated in clusters and non-linear low-dimensional manifolds. Furthermore, a detailed study of probability densities allows us to systematically distinguish between images of different modalities (optical versus range), which otherwise display similar marginal distributions. Our work indicates the importance of studying the full probability distribution of natural images, not just marginals, and the need to understand the intrinsic dimensionality and nature of the data. We believe that object-like structures in the world and the sensor properties of the probing device generate observations that are concentrated along predictable shapes in state space. Our study of natural image statistics accounts for local geometries (such as edges) in natural scenes, but does not impose such strong assumptions on the data as independent components or sparse coding by linear change of bases.


Introduction
A number of recent attempts have been made to describe the non-Gaussian statistics of natural images (Field, 1987;Ruderman and Bialek, 1994;Olshausen and Field, 1996;Huang and Mumford, 1999;Simoncelli, 1999b;Grenander and Srivastava, 2001).The interest for these studies in the computer vision community has been motivated by the search for more realistic priors for applications as diverse as object localization (Sullivan et al., 1999), segmentation (Malik et al., 2001;Tu et al., 2001), image reconstruction (Nielsen and Lillholm, 2001), denoising (Zhu and Mumford, 1998;Simoncelli, 1999a) and compression (Buccigrossi and Simoncelli, 1999).
The research in natural image statistics can roughly be divided into two related directions.Some studies involve analyzing 1D or 2D marginal statistics with respect to some fixed linear basis.Grenander and Srivastava (2001), for example, have shown that one can use a family of Bessel functions to model the 1D marginals of band-pass filtered data from a variety of different types of images.In Wegmann and Zetzsche (1990) and Simoncelli (1999b), the authors use a wavelet basis to uncover complex dependencies between pairs of wavelet coefficients at nearby spatial positions, orientations, and scales.In the other direction, there are studies of image statistics which try to find an "optimal" set of linear projections or basis functions in the state space defined by the image data (8 × 8 patches, for example, define a distribution in R 64 ).The directions in state space are usually chosen according to some higher-order statistical measure reflecting the non-Gaussianity or multi-modality of the projected data density; see e.g.projection pursuit (Huber, 1985;Friedman, 1987), sparse coding (Olshausen and Field, 1996) and ICA (see Hyvärinen (1999) for a survey of ICA and related methods).
Despite the many advances in sparse coding and multi-resolution analysis, we are still short of a description of the full probability distribution (as opposed to marginal distributions) of pixels in a neighborhood.Furthermore, so far, there have been few attempts to make precise the connection between the object structure in the world and the probability distribution of natural images.
In this paper, we analyze the state space of local patterns of pixel values.More precisely, we examine the empirical probability distribution of 3 × 3 patches of optical and range images.These two types of images reflect different aspects of generators (or objects) in the world, as well as differences in image sensor properties.After preprocessing (consisting of subtracting the mean of each patch and then whitening the data), the extracted 3 × 3 image patches define a distribution on a 7-dimensional sphere.We address the following questions: "How is the data distributed in this state space?" and "Are there any clear qualitative differences between the distributions of data from images of different modalities, e.g.optical versus range images?".
To develop statistically efficient image representations, it is important to understand how natural data is distributed in higher-dimensional state spaces.Without this knowledge of natural images we are not able to fully exploit the sparseness of the state space of the data.From a sparse coding point of view, high-density clusters and low-dimensional manifolds are especially interesting.These types of structures greatly reduce the dimensionality of the problem.
In ICA and related methods, one assumes that there exists a linear change of basis (into independent components) that sparsifies the image data.We believe that an analysis of the probability distribution of natural images should be free from such strong assumptions as independent components, or linear decompositions of an image into a few dominant basis images.In reality, the most common rule for image formation is occlusion-which is non-linear-and the state space of image patches is rather complex with many more high-density directions than the dimension of the state space.The complexity of the data can partially be seen in the Haar wavelet statistics of natural images (Huang and Mumford, 1999;Huang et al., 2000).Take, for example, the 3D joint distribution of horizontal, vertical, and diagonal wavelet coefficients of natural images.Figure 1 shows that the equi-probable surface of this distribution has several "hot spots" with 6 vertices along the axes y = z = 0, x = z = 0, x = y = 0 and x = ±y = ±z; and 8 local maxima around the shoulders x = ±y = ±z.These cusps are even more striking for range images (Fig. 2).
The observed cusps in Figs. 1 and 2 show very clearly frequently occurring local geometric patterns in pixelated natural images.A simple calculation 1 of the patterns corresponding to these cusps gives the following 2× 2 blocks and their rotations: A similar analysis can be done empirically for Haar wavelet coefficients at adjacent spatial locations in the same subband.Such a study of the 2D joint histogram of these so called wavelet "brothers" will reveal frequent occurrences in 2 × 4 patches of more complex local geometries best described as blobs, T-junctions, edges and bars (Huang et al., 2000).We believe that object-like structures in the world and the sensor properties of the probing device generate observations that are concentrated along predictable shapes (manifolds or clusters) in state space.We want to better understand how edges and other image "primitives" (see David Marr's primal sketch (Marr, 1982)) are represented geometrically in the state space of image data.Furthermore, we want to study how empirical data from natural images is distributed statistically with respect to the predicted clusters and manifolds.We are, in other words, searching both for a geometric and probabilistic model in state space of the basic primitives of generic images.
In this study, we focus on high-contrast 2 data.It is commonly believed that image regions with high contrast carry the most important content of a scene.Reinagel and Zador (1999) have shown, for natural images with a variety of cognitive content, that humans tend to focus their eye movements around high-contrast regions-thus significantly biasing the effective input that reaches the early stages of the visual system towards these types of image regions.Furthermore, we tend to believe that high-contrast and low-contrast regions follow qualitatively different distributions, and should be modeled separately.The equi-probable contours mentioned above (see Figs.This study deals with local patterns of pixel values, although it should be noted that the more general results (regarding the intrinsic dimension and shape of structures in state spaces) generalize to larger image patches and collections of filter responses (see the discussion in Section 7).Deriving pixel-level models-for example, through an iterative coarse-to-fine scheme using 3 × 3 or 5 × 5 patch structures-is also interesting by itself.What makes denoising and many computer vision applications difficult is that a natural image often contains many irrelevant, often partially resolved, objects.This type of "noise" is highly non-Gaussian and sometimes referred to as "clutter".To develop better image enhancement algorithms that can deal with structured noise, we need explicit models for the many regularities and geometries seen in local pixel patterns.This work has some similarities to work by Geman and Koloydenko (1999).The latter study also concerns geometrical patterns of 3 × 3 patches but in image space as opposed to state space.The authors quantize 3 × 3 blocks according to a modified order statistic, and define "equivalence classes" based on photometry, complexity and geometry in image space.One of their goals is data classification for object recognition applications.
The organization of the paper is as follows: In Section 2, we describe the two data sets extracted from an optical image database by Hateren and van der Schaaf (1998) and a range image database by Huang et al. (2000).In Section 3, we describe the preprocessing of the data sets.Our analysis is divided into three parts: In Section 4 we study the distribution of our data with respect to a Voronoi tessellation of the space of data points.This first part, is a model-free first exploration of the state space of contrast-normalized patches.We proceed in Section 5 with a study of the probability density of high-contrast optical image patches around an ideal 2D manifold.The manifold represents the loci in state space of blurred step edges of different orientations and positions.Finally, in Section 6 we analyze the probability density of range data around clusters corresponding to binary patches.The results are discussed in Section 7.

Optical and Range Data Sets
We extract two data sets of high-contrast 3 × 3 patches from optical and range images, respectively.These patches are then preprocessed according to Section 3.
-The optical data set contains about 4.2 • 10 6 highcontrast log-intensity patches.These are extracted from van Hateren's still image collection (van Hateren and van der Schaaf, 1998) of 4167 calibrated 1020 × 1532 images; see Fig. 3 for samples.The pixel values in these images are approximately linearly proportional to the scene luminance.
From each image in the database, we randomly select 5000 3 × 3 patches, and keep the top 20 percent, i.e. 1000 patches, with the highest contrast in logintensities (see Section 3 for definition of contrast or "D-norm").-The range data set contains about 7.9 • 10 5 highcontrast log-range patches.These are extracted from the Brown database 3 by Huang and Lee of around 200 444 × 1440 range images with mixed outdoor (Huang et al., 2000) and indoor scenes; see Fig. 4 for samples.We divide each image into disjoint 3 × 3 patches, and discard all patches with out-of-range data.From the remaining patches in each image, we randomly select 20000 patches and keep the top 20 percent with highest contrast in log-range values.
The optical and range images have quite different subresolution properties.In optical images, the subpixel details are averaged by the point-spread function of the camera.The pixel values in a range image, on the other hand, usually correspond to the minimum of the subresolution details.The field of view of the scanner is 80 • vertically and 259 • horizontally.The beam divergence of the laser range finder is approximately 3 mrad.If the laser footprint hits two targets with a range difference larger than 3 meters, the returned value is the range to the nearest target.If the range difference is less than 3 meters, the returned value is roughly a weighted average of both ranges where the weights depend on the reflectivity of the two targets.

Preprocessing
We want to compare and group image patches based on their geometrical structure.In a natural scene, the reflectance and the shape of a surface are usually fixed quantities, while the absolute distance to, and the illumination of, a point in a scene can vary widely.We thus work with the logarithm, rather than the absolute values, of intensity 4 or range.Furthermore, before analyzing the data, we subtract the mean and contrastnormalize each image patch.(1) The contrast x D , or "D-norm", is here calculated by summing the differences between 4-connected neighbors (i ∼ j) in a 3 × 3 patch and then taking the square root, i.e.
x D = i∼ j (2) In matrix form, we have where The preprocessed data points lie on a 7-dimensional ellipsoid S7 ⊂ R 9 , where S7 = y ∈ R 9 : For convenience, we make a change of basis to a coordinate system where the data points lie on a Euclidean sphere.In the case of scale-invariant images, this is exactly equivalent to whitening the data 5 .The 2-dimensional Discrete Cosine Transform (DCT) basis of a 3 × 3 image patch diagonalizes the matrix D. In vector form, we write the 8 non-constant DCT basis vectors as where the normalization is chosen such that e 1 D = . . .e 8 D = 1.Let the DCT basis vectors above be the columns of a 9 × 8-matrix A = [e 1 , e 2 , . . .e 8 ], and introduce a diagonal 8 × 8-matrix with the diagonal elements equal to 1/ e 1 2 , 1/ e 2 2 , . . ., 1/ e 8 2 .A change of basis taking e i to be the unit vectors according to or equivalently, y = Av, will then transform the ellipsoid S7 in Eq. ( 5) to a 7-dimensional Euclidean sphere The 7-sphere S 7 is the state space of our preprocessed data points.
We would like to be able to measure the distance between two 3 × 3 image patches P 1 and P 2 .Since the contrast-normalized data is on a sphere, we simply calculate the angular distance between the corresponding two points v 1 , v 2 ∈ S 7 ⊂ R 8 on the sphere.In other words, our distance measure is given by dist where the matrix B = A T , and the vectors y 1 , y 2 ∈ R 9 represent the centered 3 × 3 image patches P 1 and P 2 .

A First Exploration of the 7-Sphere
As a first exploration of our two data sets, we divide the 7-sphere S 7 ⊂ R 8 into Voronoi cells, and analyze how the data points are distributed with respect to the tessellation.This is a model-free exploration of the state space, where we derive non-parametric probability distributions.
Assume a discrete collection of sampling points P = {P 1 , P 2 , . . ., P N } in S 7 .A Voronoi cell i around a sampling point P i is defined as the set of all points x ∈ S 7 that are at least as close to P i as to any other sampling point P j ; that is, for any where dist(•, •) is the angular distance, as defined in Eq. ( 9), between two points on the sphere.The problem of choosing a dense set of sampling points on a sphere S 7 ⊂ R 8 is analogous to the problem of packing spheres in R 8 itself.For a fixed number of sampling points, we seek a set of points such that equal non-overlapping spheres centered at the points cover the sphere "efficiently", in the sense that the space not covered by these spheres is minimal.This is a nontrivial problem in the general n-dimensional case (see Conway and Sloane (1988) for an in depth exposé of sphere packing in higher dimensions).In the case of 8-dimensional lattices, however, there exists an optimal solution given by the so called E 8 lattice.There are several possible coordinate systems for E 8 .Using the "even" coordinate system, we obtain Suppose that there are N points in the E 8 lattice at a distance u from the origin.Then these points, when rescaled by dividing them by u, form a dense set of sampling points P = {P 1 , P 2 , . . ., P N } of S 7 .The first spherical shell with u = √ 2 and N = 240 is the unique solution to the "kissing number" problem in R 8 , where one wants to arrange the maximum number of nonoverlapping spheres of radius 1 so that they all touch the unit sphere.For our Voronoi sampling, we have chosen to take the 4:th spherical shell of E 8 with u = √ 8.After normalization, this gives us a total of 17520 Voronoi cells on the 7-sphere with roughly the same size.The sampling points P i are given by the permutations and sign changes of the following five 8-vectors: 1.The 112 permutations and sign changes of [2, 2, 0, 0, 0, 0, 0, 0] T / √ 8. 2. The 8960 permutations and sign changes of [2, 1, 1, 1, 1, 0, 0, 0] T / √ 8. 3. The 256 permutations and sign changes of [1, 1, 1, 1, 1, 1, 1, 1] T / √ 8. 4. The 7168 permutations and sign changes with the constraint that the number of minus signs is an odd number T / √ 8. 5.The 1024 permutations and sign changes with the constraint that the number of minus signs is an even number From a Monte Carlo simulation (with 5 million random points on the 7-dimensional unit sphere), we get that the volumes of the 5 types of Voronoi cells above are approximately 6.3 We bin our high-contrast optical and range patches into the 17520 Voronoi cells using the definition given in Eq. ( 10).We define the density of data points in the Voronoi cell around sample point P i as where N ( i ) is the number of patches in the Voronoi cell i associated with the sampling point P i , vol( i ) is the volume of that cell, and vol(S 7 ) = i vol( i ) = π 4 /3 is the total volume of the 7-sphere.In Figs. 5 and 6 we show the density ρ of the Voronoi cells for high-contrast optical and range patches, together with the percentage of volume occupied by the percentage of patches.We find that the distribution of data on S 7 is extremely "sparse", with the majority of data points concentrated in a few high-density regions on the sphere.For both the optical and range data, half of the patches can be divided into an optimal set of Voronoi cells that occupies less than 6% of the total volume of the 7-sphere.
We can use the Kullback-Leibler distance or relative entropy to get an information-theoretic measure of the deviation of the probability distributions of our data from a uniform distribution.Note that a Gaussian assumption on natural images corresponds to a uniform distribution in state space after whitening.
We estimate the probability density functions p o ( i ) and p r ( i ) for optical and range data, respectively, by calculating for i = 1, . . ., 17520.As before N ( i ) is the number of optical or range data points in Voronoi cell i .The corresponding probability density function for a uniform distribution is defined as where vol( i ) is the volume of Voronoi cell i , and vol(S 7 ) = π 4 /3 is the total volume of the 7-sphere.The KL-distance between the empirical probability density of the data and a uniform density on the 7-sphere is  12) for optical images.For each Voronoi cell, we display the 3 × 3 patch corresponding to the sample point P i .The cumulative sum of p( i ) (Eq. ( 13)) over the ordered patches is shown as p cum .
given by This gives us a measure of the number of excess bits we incur by coding the data points distributed by p o ( i ) or p r ( i ) with a code book based on the uniform distribution q u ( i ) (Cover and Thomas, 1991).For our two datasets, we have that D( p o q u ) = 1.62 bits and D( p r q u ) = 2.59 bits.These numbers indicate that the distribution of optical or range data points in the state space of the contrast-normalized data is highly non-uniform.In Figs. 7 and 8, we display the first 25 sampling points P i of the Voronoi cells ordered after their densities ρ( i ) (defined according to Eq. ( 12)) for optical and range data, respectively.(The pixel patterns in these two figures depend of course on the exact choice of sampling scheme, and can look very different if one were to choose basis functions from a different lattice.)In Fig. 7 for optical patches, the centers of the Voronoi cells with highest densities are close to blurred step edges (see Section 5.1).For high-contrast range patches, the cells with highest densities resemble binary patches (compare Fig. 8 with Fig. 18).Note  12) for range images.For each Voronoi cell, we display the 3 × 3 patch corresponding to the sample point P i .The cumulative sum of p( i ) (Eq. ( 13)) over the ordered patches is shown as p cum .
in particular that some of the first 25 Voronoi cells here are similar to binary symmetry classes 1', 2', and 5'.

Optical Data: Probability Density Around a 2D Manifold of Step Edges
The analysis of Voronoi cells on the 7-sphere indicates that blurred step edges are common high-contrast patterns in optical patches.In Section 5.1, we present an ideal model for edges in optical images that takes into account the averaging effects of the optics of the camera.The model predicts a 2-dimensional continuous manifold in state space, parametrized by the orientation α and position l of an edge.In Sections 5.2 and 5.3, we test the model with optical data from natural images.Finally, in Section 5.4 we apply this model to range data and discuss the differences in the results.

The Ideal Manifold of Edges
We represent the 3 × 3 image patch by a square S Q = {(x, y) : −3/2 ≤ x, y ≤ 3/2}.The pixels in the patch are given by where i, j = 0, 1, 2. The pixel value I (i, j) in an optical image is approximately an average of the underlying scene φ(x, y) recorded at each pixel S i j , i.e. the pixel intensity where φ(x, y) is the scene luminance.For an ideal step edge, where a > b.The parameter α ∈ [0, 2π) is the angle that the direction perpendicular to the edge makes with the y-axis, and the parameter l ∈ (−3/2, 3/2) is the displacement of the edge from the origin (Fig. 9).Thus, pixels in I α,l (i, j) strictly above the edge have intensities a, pixels strictly below the edge have intensities b, ) is the angle that the direction perpendicular to the edge makes with the y-axis, and the parameter l ∈ (−3/2, 3/2) is the displacement of the edge from the origin.and pixels along the edge is a weighted average of a and b.After subtracting the mean and contrast normalizing each edge patch (see Section 3), we arrive at a set of points v α,l ∈ S 7 ⊂ R 8 .It can be shown that the loci of these points, with α ∈ [0, 2π) and l ∈ (−3/2, 3/2), define a C 1 2-dimensional manifold, M 2 , embedded in the 7-dimensional sphere.
Because of the centering and the contrast normalization, some (α, l)-values are degenerate, i.e. they lead to the same patch or point v α,l ∈ M 2 after preprocessing.This situation occurs for patches with "glancing edges".Assume for example that the surface parameter l > 0. Then consider all edges that connect the points (−1.5, y) and (x, 1.5) on the border of the patch, where x ∈ (−1.5, ∞) is fixed and y can take any value in the interval [0.5, 1.5).Simple trigonometry shows that for a fixed value of x, this defines a set of (α, l)-values that correspond to the same contrastnormalized v α,l -patch.Rotations by π/2, reflections and contrast sign inversions (see the 16 symmetries in Eq. ( 28)) give the full family of equivalent edge patches.Each line in Fig. 10 represents one set of equivalent (α, l)-values in this family.There are two special cases of the example above with edges through the points (−1.5, y) and (x, 1.5).One special case is when x ∈ (−1.5, −0.5]:All edges with −1.5 < x ≤ −0.5 and 0.5 ≤ y < 1.5 lead to the same contrastnormalized patch.The light shaded regions in Fig. 10  represent these "corner patches" and their symmetries.Another special case is when x → ∞: This limit case and its symmetries "converge" to the set of equivalent (α, l)-values where 0.5 ≤ l < 1.5 or −1.5 < l ≤ −0.5 and α = 0, π/2, π or 3π/2 (horizontal and vertical edges).Non-degenerate edge patches are given by 0 ≤ α ≤ π/4 and 0 ≤ l < (1.5 sin α + 0.5 cos α), and its 16 symmetries.In the figure, these (α, l)-values are represented by the dark shaded region.
Figure 11 shows a few examples of ideal edge patches that correspond to different (α, l)-values between 0 ≤ α ≤ 180 degrees and 1.5 < l < −1.5.The step edges are here chosen on a triangular grid with the spacing α = 15 degrees and l = 1/4 pixel units.Numbers between patches represent the angular distances in degrees between nearest neighbors.Although we use an even sampling of grid points in the (α, l)-coordinate system, the distances between the nearest neighbors vary widely.
Figure 12 shows the geometry of the surface M 2 of step edges of different orientations and positions more clearly.Here we estimate the surface metric f (α, l) = d A dα dl numerically with a triangulated mesh that is much finer spaced than in the example above: We first divide the surface into rectangular bins with widths α = π/48 (3.75 degrees) and l = 1/16 pixels, hereafter we discard all bins that are completely   outside the interior region of the (α, l)-plot (see Fig. 10), that is, bins with degenerate points.Each of the remaining bins is then split into 4 spherical triangles, where the vertices represent blurred step edges in M 2 .The mesh is finally successively refined, where needed, until the distance between any two vertices in a triangle is less than 8 degrees.The final mesh contains 14376 spherical triangles.The area of a rectangular bin in Fig. 12 corresponds to the sum of the areas of spherical triangles 6 inside the bin.

Density of Optical Data as a Function of the Distance to the Ideal Edge Manifold
We next try to get a numerical estimate of the probability density of high-contrast optical data around the surface of step edges.In the following experiment, we use the "optical dataset" with N tot = 4.2•10 6 high-contrast patches described in Section 2. As above, we model the surface with a mesh with about 14000 spherical triangles, where the vertices of the triangles are blurred step edges in M 2 .
For each optical data point x n (n = 1, 2, . . ., N tot ), we calculate the distances to the centers of the spherical triangles in the mesh.We assume that the distance to the ideal edge manifold M 2 is approximately the same as the distance to the closest triangle center v α n ,l n , i.e. we assume that The error is largest for very small θ , where we sometimes get an overestimate of θ due to the finite grid spacing.
Figure 13(a), top, shows a normalized histogram of the number of data points as a function of the estimated distance θ.Let where θ is the bin width of the histogram.Linear regression (Fig. 13(b), top) gives that N (θ ) ∝ θ 1.4 for small θ .For the density estimate, we also need to calculate the volume of the set Figure 13(a), bottom, and Fig. 13(b), bottom, show the results from a Monte Carlo simulation with V tot = 10 7 sample points, s n (n = 1, 2, . . ., V tot ), that are uniformly randomly distributed on a 7-dimensional unit sphere.
The number of sample points in B θ , i.e.
is directly proportional to the volume of B θ .As expected 7 , the number of random points V increases approximately as V (θ ) ∝ θ 4 for small θ .The curve for V (θ ) has a maximum around θ = 43 degrees, after which it drops.The drop may indicate folds in the surface where part of the 7-sphere are at equal distance to different points of the surface.Furthermore, the plot shows that all points in S 7 are within approximately 60 degrees of the edge manifold M 2 .If the surface were flat, we would expect the corresponding distance to be 90 degrees (for the two antipodal points on the sphere).Figure 14 shows the density function For θ < ∼ 10 degrees, This result strongly supports the idea that there exists a 2-dimensional manifold in the 7-sphere where the data points are concentrated.In fact, we here see evidence of an infinite probability density at the ideal edge manifold (where θ = 0).In Fig. 15, top, we have plotted the percentage of data points that are within a certain distance θ of the surface.The curve shows that about 50% of the data points are within a tubular neighborhood of the surface with width θ = 26 degrees.This neighborhood corresponds to only 9% of the total volume of S 7 (Fig. 15, bottom).

Density of Optical Data as a Function of the Surface Parameters
We next study how the high-contrast optical data are spread out along the surface of step edges.For the position calculation, we only include data points that are very close to the surface.For each data point x n in our data set, we find the closest center point v α n ,l n in the triangulated mesh.We compute the 2D histogram where θ max = 20 degrees, and the bin widths α = π/48 radians and l = 1/16 pixels.We define the density p(α, l) of data points along the surface as where the sum α,l N (α, l) ≈ 1.3 • 10 6 , and f (α, l) is given in Section 5.1.
Figure 16 shows the results for regions where f (α, l) > 0.5 (see Fig. 12 for the surface metric).Although the data points are spread out along the whole surface, there is a clear concentration of data points around α = 0, 90, 180, and 270 degrees (vertical and horizontal edges) and the (α, l)-values near the border of degenerate edges.In the density calculation, we only include data points that are, at a position (α, l) where f (α, l) > 0.5, and in a tubular neighborhood of the surface with width θ max = 20 degrees.The bin widths α = π/48 (3.75 degrees) and l = 1/16 pixel units.

Range Data Comparison: Probability Density as a Function of the Distance to the Edge Manifold
A probability density estimate of high-contrast range data leads to very different results around the surface of blurred step edges.
In Fig. 17   patches.The histogram for range data has sharp peaks at θ ≈ 0, 11, 22, 24, . . .degrees.These peaks indicate the presence of high-density clusters of data points in S 7 .A more detailed analysis shows that the positions of the local maxima correspond closely to the distances between binary patches and the edge manifold.
For 3 × 3 patches, there are 510 binary patches.These can be divided into 50 equivalence classes with respect to the 16 elements in the product group 8 where {−1} represents sign inversion, and the point group C 4v (Schönflies notation (Elliott and Dawber, 1979)) is generated by rotations through π/2 around the center pixel, and reflections across a plane containing the rotation axis.
We denote the set of 510 binary patches in S 7 ⊂ R 8 by and the set of 50 distinct equivalence classes of the binary patches with respect to the symmetry group G by We use primed indices to denote binary patches that are grouped into equivalence classes, and unprimed indices to denote the 510 original binary patches.

Range Data: Probability Density Around Binary Patches
In the previous section we saw that high-contrast range image patches are concentrated in high-density clusters both on and around the surface of step edges.Furthermore, these clusters appear to be centered around binary patches.This motivates us to investigate the density of high-contrast range patches around the 510 possible binary patches.34)) of highcontrast range patches with respect to θ , the distance to the nearest binary patch.Linear regression in the interval 0.6 and 10 degrees gives p(θ ) ∼ θ −6.4 (see solid line). is given by We define the average density of high-contrast range patches as a function of the angular distance θ to the nearest binary patch as where vol(S 7 ) = π 4 /3.In Fig. 19 we show a log-log plot of p(θ).The graph is almost straight for more than a decade of distances, from 0.6 to 10 degrees.Linear regression gives that In Fig. 20 we show the cumulative percentage of the number of patches with respect to the distance θ to the nearest binary patch, as well as the cumulative volume  versus P cum (θ).We see that 80% of the high-contrast range patches are within a spherical neighborhood of 30 degrees from one of the 510 binary patches.The neighborhoods of these 80% patches occupy only 0.14% of the total volume of S 7 .
These results show that 3 × 3 high-contrast range patches are densely clustered around the 510 binary patches, and that the probability density is infinite at the positions of these binary patches.

Distribution of Range Patches Across the 50 Binary Symmetry Classes
We next study how the range patches are distributed among the 50 symmetry classes for binary patches (see Section 5.4).This will give us an idea of the typical geometrical patterns for high-contrast range patches in image space.
As before, we find the binary patches that are nearest to the data points x 1 . . .x N tot .We then group the data points together depending on which equivalence classes (with respect to the symmetry group G) the closest binary patches belong to.We define the number of range patches in symmetry class [b k ] G (as a function of the distance θ to the nearest binary patch) by where N j is given in Eq. (31  is defined by where V (θ) is given by Eq. ( 33   The graphs of the densities p k (θ) (Eq.( 39)) for the 50 symmetry classes are similar in appearance to Fig. 19.In Fig. 22, we show the slopes obtained by linear regression in a log-log plot of p k (θ ) ordered after decreasing cumulative percentage P k ,cum .The most frequent symmetry groups have very steep density curves, and there is a gradual decrease in the slopes of the curves for the less frequent symmetry classes.This is consistent with the result that most patches are close to binary patches which belong to the 7 most frequent classes.

Summary and Conclusions
In this work, we have taken a somewhat different approach to natural image statistics.Most of the work in image statistics focuses either on modeling 1D or 2D histograms of linear filter reactions or on finding a linear change of basis that sparsifies the data.Few attempts have been made to understand the full probability distribution of natural images and the intrinsic dimension of the state space of generic image data.It seems that we cannot take full advantage of the sparseness of the state space of the data without this knowledge of natural images.
In this study, we have analyzed the local geometric patterns seen in generic images.We believe that simple geometric structures in the world and the sensor properties of the probing device generate observations that are concentrated along predictable shapes (manifolds or clusters) in state space.The basic vocabulary of images (with edges, bars, blobs, terminations etc.) seems to be the same-whether one studies all types of natural images as here, or specific classes of images; e.g.medical images or images of just trees, indoor scenes etc.
Optical and range images measure different aspects of generators (objects) in the world; scene luminance versus distances to the nearest objects in a natural scene, respectively.Thus, there are bound to be differences in the statistics of these two types of images.However, we believe that the main qualitative differences between optical and range images are due to differences in subresolution properties.It seems that basic primitives (such as edges) in the world and morphological or ordering filters (such as median and mean filters) lead to compact clusters in state space.On the other hand, the same primitives and averaging filters lead to continuous submanifolds in state space.
In this paper, we have analyzed the probability distribution of 3 × 3 high-contrast patches from natural images of different modalities (optical versus 3D range images).In the preprocessing stage, we subtracted the mean and contrast-normalized the log-values of each image patch.The state space of the preprocessed image data (from optical or range images) is a 7-dimensional unit sphere in R 8 .
As a first exploration, we examined how the data distribute with respect to an approximately uniform Voronoi tessellation of the 7-sphere.The analysis showed that both optical and range patches occupy a very small amount of the total surface area (volume) of the state space: In both cases, half of the data can be divided into a set of Voronoi cells with a total volume of less than 6% of the volume of the 7-sphere.For optical patches, the centers of the most densely populated Voronoi cells resemble blurred step edges.For range patches, they resemble binary patterns.
A more detailed analysis showed clear differences in the probability distributions of optical and range patches.The majority of high-contrast optical patches are concentrated around a 2-dimensional C 1 submanifold embedded in the 7-dimensional sphere.This surface is highly non-linear and corresponds to ideal step edges blurred by the optics of the camera.A density calculation showed that the probability density of optical patches is infinite on this ideal surface.About 50% of the optical data points are within a tubular neighborhood of the surface with a width that corresponds to only 9% of the total volume of the state space.
The majority of range patches, on the other hand, are concentrated in compact clusters, rather than on a smooth manifold.The centers of these clusters seem to correspond to binary patches, i.e. patches with only two range values.A density calculation around the 510 possible binary patches indicated an infinite probability density at these "hot spots" of the image space.About 80% of the high-contrast patches are in a neighborhood of these spots that correspond to only 0.14% of the total volume of the 7-dimensional unit sphere.The most frequent binary patches are horizontal edges followed by slanted edges and corner-and T-junction-like structures.
Although the analysis in the current paper only deals with 3 × 3-patches, we believe that the more general results apply to larger patches and even general filter responses.The picture that seems to emerge is that basic image primitives-such as edges, blobs, and bars-generate low-dimensional and (in general) nonlinear structures in the state space of image data.Therefore, while the dimension of the state space, determined by the number of filters or pixels in the analysis, is usually very large, the intrinsic dimension of the manifolds generated by different primitives is fixed and determined by the complexity of the primitives only.The edge manifold we found for optical data is continuous and 2-dimensional.This is because an ideal edge can be characterized by two parameters: the orientation α and the position l of the edge.For optical data and bar structures which can be parameterized by 3 parameters (the orientation, position, and width of a bar), we would expect a 3-dimensional submanifold in state space, regardless of the dimension of the state space.
More generally, we believe that one can define a dictionary of probability models on representations of general primitives parameterized by = {φ 1 , φ 2 , . ..} for any set of filters f 1 , . . ., f N .In the N-dimensional state space of the filter-based image representations (and averaging linear filters), the image primitives will define manifolds of the general form where * denotes a convolution.
Our empirical results for the edge manifold and binary patches of optical and range data, respectively, show that, when studying natural image statistics, important geometric and probabilistic structures emerge only after abandoning assumptions such as independent components or sparse coding by linear change of basis.Furthermore, when looking at the full probability distribution of small patches, clear differences appear between different types of images (optical versus range) that otherwise have seemingly similar statistics.Analyzing 3 × 3 patches could thus offer a systematic and precise way of distinguishing and comparing image data.A complete description of the probability distribution of natural image patches, however, also requires modeling low-contrast patches, and highdensity regions that lie outside the ideal manifold of step edges (for optical patches) and binary clusters (for range patches).The Voronoi tessellation in Section 4 offers an automatic way of characterizing the full state space of 3 × 3 pixel data, while the "geometry-based" methods in Sections 5 and 6 may lead to a better understanding and parametric probability models of natural image data.

Figure 1 .
Figure1.An equi-probable surface of the joint distribution of horizontal, vertical, and diagonal wavelet coefficients in optical images, viewed from three different angles.
1 and 2) are highly irregular and star-shaped in the regions far from the origin of the plot.This clearly indicates the non-Gaussian statistics of high-contrast data.The contours near the center part of the plot look different.These contours are more ellipsoidal, which suggests fluctuations around low-contrast image regions may be Gaussian in nature.

Figure 2 .
Figure2.An equi-probable surface of the joint distribution of horizontal, vertical, and diagonal wavelet coefficients in range images, viewed from three different angles.

Figure 3 .
Figure 3.Samples from the van Hateren optical image database.The gray values code for log-intensity values.

Figure 4 .
Figure 4. Samples from the Brown range image database by Huang and Lee.The gray values code for log-range values.
Let x = [x 1 , . . ., x 9 ] T = [I 11 , I 21 , I 31 , I 12 , . . ., I 33 ] T ∈ R 9 be a non-constant vector with the log-values of the original patch.Subtracting the mean and contrast normalizing lead to a new vector

Figure 5 .Figure 6 .
Figure 5. Top: Density ρ( i ) of high-contrast optical patches in Voronoi cells that are sorted according to decreasing density.Bottom: Cumulative percentage of optical patches in the Voronoi cells above versus the cumulative percentage of volume in S 7 that are occupied by these cells.

Figure 7 .
Figure7.The first 25 Voronoi patches ordered after their densities ρ( i ) in Eq. (12) for optical images.For each Voronoi cell, we display the 3 × 3 patch corresponding to the sample point P i .The cumulative sum of p( i ) (Eq. (13)) over the ordered patches is shown as p cum .

Figure 8 .
Figure 8.The first 25 Voronoi patches ordered after their densities ρ( i ) in Eq. (12) for range images.For each Voronoi cell, we display the 3 × 3 patch corresponding to the sample point P i .The cumulative sum of p( i ) (Eq. (13)) over the ordered patches is shown as p cum .

Figure 10 .
Figure10.Because of the contrast normalization, some (α, l)values are degenerate, i.e. they correspond to the same point v α,l on the 7-dimensional unit sphere S 7 .Each line in the figure is an example of a set of (α, l)-values that lead to the same contrast-normalized 3 × 3 patch.The light shaded region corresponds to degenerate values for corner edges.The dark shaded region in the interior of the graph shows all (α, l)-values that are well-defined, i.e. non-degenerate.

Figure 11 .
Figure11.Examples of ideal step edges between 0 ≤ α ≤ 180 degrees (horizontal axis) and 1.5 < l < −1.5 (vertical axis).The step edges are here chosen on a triangular grid with sides α = 15 degrees and l = 1/4 pixel units.Numbers between patches represent the angular distances in degrees between neighbors in the horizontal, vertical and (lower left-upper right) diagonal directions in the (α, l)-grid.

Figure 12 .
Figure 12.Contour plot of the metric f (α, l) = d A dα dl for a surface of ideal step edges.The rectangular bins in the figure have widths α = π/48 (3.75 degrees) and l = 1/16 pixels.For the calculation of the area A of a bin, we add up the areas of 4 or more spherical triangles inside the bin (see text).

Figure 13 .
Figure 13.(a) Top: Normalized histogram of the number of optical high-contrast patches, N , versus the distance, θ , to the surface.Bottom: Normalized histogram of the number of random Monte Carlo samples, V , versus θ .(b) Top: Log-log plot of N versus θ .Linear regression in the interval between 0.5 and 7 degrees gives N ∼ θ 1.4 (solid line).Bottom: Log-log plot of V versus θ.Linear regression in the interval between 5 and 15 degrees gives V ∼ θ 4.0 (solid line).The circles represent extrapolated values of V for θ ≤ 5 degrees.The total number of optical data points is N tot ≈ 4 • 10 6 .The total number of random samples is V tot = 10 7 .The random samples are uniformly distributed on a 7-dimensional unit sphere, and give a Monte Carlo estimate of the volume of the space occupied by the histogram bins.

Figure 15 .
Figure 15.(a) Percentage of optical data points that are within a tubular neighborhood K θ of the surface with width θ ("distance to surface").(b) Same curve plotted versus the ratio 100•vol(K θ ) vol(S 7 )("percentage of volume of S 7 ").

Figure 16 .
Figure16.(a) Two-dimensional contour plot and (b) three-dimensional mesh of the density p(α, l) of high-contrast optical data along the surface of step edges.In the density calculation, we only include data points that are, at a position (α, l) where f (α, l) > 0.5, and in a tubular neighborhood of the surface with width θ max = 20 degrees.The bin widths α = π/48 (3.75 degrees) and l = 1/16 pixel units.
we show a normalized histogram N (θ) of range patches as a function of the distance θ to the surface; compare with Fig. 13(a), top, for optical

Figure 18 .
Figure 18.The 50 symmetry classes of binary patches arranged according to increasing θ -values, where θ is the angular distance to the closest point on the surface of ideal step edges.The number below each displayed patch represents this distance.The number in parenthesis left of each patch represents the number of binary patches in each equivalence class.In Fig. 18, we have sorted the 50 symmetry equivalence classes according to their distances to the surface of step edges.Note that the binary patches in [b 1 ] and [b 2 ] are exactly on the surface.Patches in [b 3 ] are at the same distance from the surface as the second local maximum at θ ≈ 11 (Fig. 17), patches in [b 4 ] and [b 5 ] correspond to the third and fourth local maxima at θ ≈ 22 and θ ≈ 24, respectively.The peak at θ ≈ 45 may be due to, for example, the symmetry classes [b 14 ] (blobs) and [b 18 ] (horizontal and vertical bars).The peaks at θ ≈ 50 and θ ≈ 57 could be signs of patterns similar to [b 25 ] (diagonal bars) and [b 42 ] (single dots), respectively.The displacement of the first peak from 0 in Fig. 17(compare [b 1 ] and [b 2 ] ) may be due to the overestimate of the distance θ for points that are very close to the surface.
), patches in [b 4 ] and [b 5 ] correspond to the third and fourth local maxima at θ ≈ 22 and θ ≈ 24, respectively.The peak at θ ≈ 45 may be due to, for example, the symmetry classes [b 14 ] (blobs) and [b 18 ] (horizontal and vertical bars).The peaks at θ ≈ 50 and θ ≈ 57 could be signs of patterns similar to [b 25 ] (diagonal bars) and [b 42 ] (single dots), respectively.The displacement of the first peak from 0 in Fig. 17 (compare [b 1  ] and [b 2 ] ) may be due to the overestimate of the distance θ for points that are very close to the surface.

Figure 20 .
Figure 20.Top: The cumulative percentage P cum (θ ) of the number of patches N (θ ) with respect to the distance θ to the nearest binary patch.Bottom: The cumulative volume V cum (θ ) versus P cum (θ ).

Figure 21 .
Figure 21.The 50 symmetry classes of binary patches ordered after the percentage P k of high-contrast range patches closest to one of the binary patches in the equivalence class [b k ] G .The cumulative percentage P k ,cum is also indicated.For each class, we display the binary patch which is most common among the high-contrast range patches.
), and size([b k ] G ) is the number of equivalent patches in the class [b k ] G .Figure21displays the 50 symmetry classes of binary patches ordered after the percentage of range patchesP k = θ N k (θ )/N tot in each class.For each symmetry class [b k ] G , we show the binary patch b j ∈ [b k ] G thatis most common among the range patches.The figure also shows the cumulative percentage P k ,cum = j ≤k P k .From these numbers we conclude that most high-contrast range patches cluster around binary patches that belong to only a few of the 50 symmetry classes; in fact, 70% of the patches are closest to the first 7 symmetry classes.The most common structures among high-contrast patches are horizontal and vertical edges followed by slanted edges, corner-and T-junction-like structures.The least probable structures are checkerboard and cross patches.
Figure 21 agrees with our Voronoi results for range patches (Fig. 8), as the patterns of the 25 most frequent Voronoi cells resemble the patterns of the patches in the 5 most frequent binary symmetry classes.

Figure 22 .
Figure 22.Slopes in a log-log plot of the densities p k (θ ) for each of the 50 binary symmetry classes [b k ] G ordered after decreasing cumulative percentage P k ,cum .
).Furthermore, the density of range patches in the symmetry class [b k ] G