ScholarWorks @ UVM ScholarWorks @ UVM Rarefaction and extrapolation with Hill numbers: A framework for Rarefaction and extrapolation with Hill numbers: A framework for sampling and estimation in species diversity studies sampling and estimation in species diversity studies

. Quantifying and assessing changes in biological diversity are central aspects of many ecological studies, yet accurate methods of estimating biological diversity from sampling data have been elusive. Hill numbers, or the effective number of species, are increasingly used to characterize the taxonomic, phylogenetic, or functional diversity of an assemblage. However, empirical estimates of Hill numbers, including species richness, tend to be an increasing function of sampling effort and, thus, tend to increase with sample completeness. Integrated curves based on sampling theory that smoothly link rarefaction (interpolation) and prediction (extrapolation) standardize samples on the basis of sample size or sample completeness and facilitate the comparison of biodiversity data. Here we extended previous rarefaction and extrapolation models for species richness (Hill number q D , where q ¼ 0) to measures of taxon diversity incorporating relative abundance (i.e., for any Hill number q D , q . 0) and present a uniﬁed approach for both individual-based (abundance) data and sample- based (incidence) data. Using this uniﬁed sampling framework, we derive both theoretical formulas and analytic estimators for seamless rarefaction and extrapolation based on Hill numbers. Detailed examples are provided for the ﬁrst three Hill numbers: q ¼ 0 (species richness), q ¼ 1 (the exponential of Shannon’s entropy index), and q ¼ 2 (the inverse of Simpson’s concentration index). We developed a bootstrap method for constructing conﬁdence intervals around Hill numbers, facilitating the comparison of multiple assemblages of both rareﬁed and extrapolated samples. The proposed estimators are accurate for both rarefaction and short-range extrapolation. For long-range extrapolation, the performance of the estimators depends on both the value of q and on the extrapolation range. We tested our methods on simulated data generated from species abundance models and on data from large species inventories. We also illustrate the formulas and estimators using empirical data sets from biodiversity surveys of temperate forest spiders and tropical ants.


INTRODUCTION
The measurement and assessment of biological diversity (biodiversity) is an active research focus of ecology (Magurran 2004, Magurran andMcGill 2011) and a central objective of many monitoring and management projects (Groom et al. 2005; Convention on Biological Diversity [CBD], available online). 7 The simplest and still the most frequently used measure of biodiversity is the species richness of an assemblage. Species richness features prominently in foundational models of community ecology (MacArthur and Wilson 1967, Connell 1978, Hubbell 2001, and is a key metric in conservation biology (May 1988, Brook et al. 2003 and historical biogeography (Wiens and Donoghue 2004). In spite of its intuitive and universal appeal, however, species richness is a problematic index of biodiversity for two reasons related to sampling intensity and the species abundance distribution.
First, observed species richness is highly sensitive to sample size (the sampling problem). Because most species in an assemblage are rare, biodiversity samples are usually incomplete, and undetected species are a common problem. As a consequence, the observed number of species in a well-defined biodiversity sample (species density; sensu Gotelli and Colwell 2001) is known to be a biased underestimate of true species richness, and is highly sensitive to the area surveyed, the number of individuals counted, and the number of samples scored for species occurrence (incidence; Colwell and Coddington 1994). Thus, from a statistical perspective, species richness is very difficult to estimate accurately from a finite sample.
A second problem with species richness as a measure of biodiversity is that it does not incorporate any information about the relative abundance of species (the abundance problem). By counting all species equally, species richness weights rare species the same as common ones. If two assemblages have identical species richness, it seems intuitive that any subjective sense of ''diversity'' should be higher in the assemblage with more-equal abundances among all the component species, whereas diversity should be lower in the assemblage that is dominated by the abundance of one or a few common species (Pielou 1975). Incorporating abundance into a biodiversity index is critical for studies of many (but not all) aspects of ecosystem function, because rare species usually make little contribution to important measures of ecosystem function such as biomass, productivity, or nutrient retention (Schwartz et al. 2000). On the other hand, rare species sometimes play key roles in ecosystem function (e.g., top predators; Terborgh et al. 2001) and are generally of greater conservation and management concern than are common ones (May 1988, Holsinger andGottlieb 1991; but see Gaston and Fuller 2008).
An extensive literature addresses both of these issues. For the sampling problem, standardized comparisons of species richness can be made after interpolation with rarefaction (Tipper 1979) to a common level of abundance (Sanders 1968, Hurlbert 1971, Simberloff 1972, Gotelli and Colwell 2001, sampling effort (Colwell et al. 2004), or sample completeness (Alroy 2010, Jost 2010, Chao and Jost 2012. Alternatively, biodiversity data can be used to estimate an asymptotic estimator of species richness that is relatively independent of additional sampling effort. Methods for obtaining asymptotic richness estimators include estimating the area beneath a smoothed curve of a parametric species abundance distribution (Fisher et al. 1943, Connolly andDornelas 2011), extending the species accumulation curve by fitting parametric functions (Sobero´n and Llorente 1993), or using nonparametric asymptotic richness estimators (Chao 1984, Colwell andCoddington 1994) that are based on the frequency of rare species in a sample. Although many ecologists still publish analyses of raw species density data, rarefaction and asymptotic estimators based on statistical sampling theory are becoming standard tools in biodiversity analysis (Gotelli and Ellison 2012). Colwell et al. (2012) recently unified the interpolation and extrapolation procedures for species richness. They showed that a single, smooth sampling curve (with an expectation and an unconditional variance), derived from a reference sample (a collection of individuals [or sampling units] that would be gathered in a typical biodiversity survey) can be interpolated (rarefied) to smaller sample sizes or extrapolated to a larger sample size, guided by an estimate of asymptotic richness. Thus, rigorous statistical comparison of species richness can be performed not only for rarefied subsamples, but also for extrapolated richness values based on samples of arbitrary and equal size. Chao and Jost (2012) developed coverage-based rarefaction and extrapolation methodology to compare species richness of a set of assemblages based on samples of equal completeness (equal coverage). The Colwell et al. (2012) sample-size-based approach standardizes based on sample effort, whereas the Chao and Jost (2012) coverage-based approach standardizes based on sample completeness, an estimated assemblage characteristic. The sample size-and coverage-based integration of rarefaction and extrapolation together represent a unified framework for estimating species richness and for making statistical inferences based on these estimates.
Like the sampling problem, the abundance problem has been recognized for decades in the ecological literature. Ecologists have introduced a plethora of diversity indices that combine species richness and the proportion of each species into a single metric (Washington 1984). These indices tend to be highly correlated with one another, are not always expressed in units that are intuitive, sensible, or that allow comparisons, and have sampling and statistical properties that have been poorly studied (Ghent 1991). Hill numbers are a mathematically unified family of diversity indices (differing among themselves only by an exponent q) that incorporate relative abundance and species richness and overcome many of these shortcomings. They were first used in ecology by MacArthur (1965), developed by Hill (1973), and recently reintroduced to ecologists by Jost (2006Jost ( , 2007. Hill numbers offer five distinct advantages over other diversity indices. First, Hill numbers obey an intuitive replication principle or doubling property. Hill (1973) proved a weak version of the doubling property: If two completely distinct assemblages (i.e., no species in common) have identical relative abundance distributions, then the Hill number doubles if the assemblages are combined with equal weights. Chiu et al. (2013: Appendix B) recently proved a strong version of the doubling property: If two completely distinct assemblages have identical Hill numbers of order q (relative abundance distributions may be different, unlike the weak version), then the Hill number of the same order doubles if the two assemblages are combined with equal weights. Species richness is a Hill number (with q ¼ 0) and obeys both versions of the doubling property, but most other diversity indices do not obey even the weak version.
A second advantage of Hill numbers is that they are all expressed in units of effective numbers of species: the number of equally abundant species that would be needed to give the same value of a diversity measure. Third, key diversity indices proposed in the literature, including the widely used Shannon entropy and the Gini-Simpson index, can be converted to Hill numbers by simple algebraic transformations. Fourth, Hill numbers can be effectively generalized to incorporate taxonomic, phylogenetic, and functional diversity, and thus provide a unified framework for measuring biodiversity (Chao et al. 2010, Gotelli and. Fifth, in the comparison of multiple assemblages, there is a direct link between Hill numbers and species compositional similarity (or differentiation) among assemblages (Jost 2007). This property unites diversity and similarity (or differentiation).
Although species richness is one of the Hill numbers, the literature on Hill numbers and on sampling models for species richness have developed independently. The recent literature generally fails to emphasize that Hill numbers other than species richness (those with q . 0) are also sensitive to the number of individuals or samples collected, although the under-sampling bias is progressively less severe for Hill numbers of higher orders of q. In theory, simple rarefaction curves can be constructed for any diversity index by resampling (Walker et al. 2008, Ricotta et al. 2012, although only recently has this been done explicitly for Hill numbers (Gotelli and Ellison 2012; R. Colwell, available online). 8 Asymptotic estimators for Hill numbers with q ¼ 1 are closely related to the well-known entropy estimation (for reviews, see Paninski 2003. For q ¼ 2 and any integer .2, nearly unbiased estimators exist (Nielsen et al. 2003, Gotelli and. In this paper, we unify the two fundamental frameworks used for the measurement and estimation of species diversity: rarefaction/extrapolation and Hill numbers. Specifically, we generalize the sample-sizebased approach of Colwell et al. (2012) and the coverage-based approach of Chao and Jost (2012) to the entire family of Hill numbers. We provide asymptotic estimators for Hill numbers and use them to link analytic estimators for rarefaction and extrapolation from an empirical reference sample. To characterize the species diversity of an assemblage, we propose using three integrated rarefaction/extrapolation curves based on the first three Hill numbers: species richness, the exponential of Shannon entropy (which we refer to as Shannon diversity), and the inverse Simpson concentration (which we refer to as Simpson diversity). The formulas and estimators are tested with simulated data generated from species abundance models/inventories and applied to several empirical data sets. Finally, we highlight the close theoretical links between Hill numbers and expected species accumulation curves (Hurlbert 1971, Dauby andHardy 2011). With this expanded framework, ecologists will be able to effectively use Hill numbers for a host of problems in biodiversity estimation, including comparison of the species diversity of different assemblages in time or space, with reliable statistical inferences about these comparisons.

TWO TYPES OF DATA AND MODELS
To describe model parameters and sample data, we adopt the notation and terminology of Colwell et al. (2012) and Gotelli and Chao (2013). Consider a species assemblage consisting of N total individuals, each belonging to one of S distinct species. The total abundance of species i is N i , where i ¼ 1, 2, . . . , S, N i . 0, and N ¼ P S i¼1 N i . Let p i ¼ N i /N denote the true relative abundance of species i, so that P S i¼1 p i ¼ 1. We emphasize that the quantities N, S, (N 1 , N 2 , . . . , N S ) and ( p 1 , p 2 , . . . , p S ) are the parameters representing, respectively, the true (albeit unknown) underlying assemblage size, the complete species richness of the assemblage, and the species absolute and relative abundance sets. We consider two sampling data structures for reference samples.

Individual-based (abundance) data and model
In most biological surveys, a sample of n individuals is taken with replacement from the assemblage, and a total of S obs ( S ) species are observed. (If individuals are sampled without replacement, we need to assume that the assemblage size N is much larger than the sample size n). Let X i be the number of individuals of the ith species that are observed in the sample, i ¼ 1, 2, . . . , S; we refer to X i as the sample species frequency. Let f k be the number of species represented by exactly k individuals in the sample, k ¼ 0, 1, . . . , n; we refer to f k as the abundance frequency counts. From these definitions, n ¼ P S i¼1 X i ¼ P k!1 kf k , and S obs ¼ P k!1 f k . In particular, f 1 is the number of species represented by exactly one individual (singletons) in the sample, and f 2 is the number of species represented by exactly two individuals (doubletons). The unobservable frequency f 0 denotes the number of species present in the entire assemblage, but are not observed in the sample.
The multinomial probability distribution is the most widely used model for the observed species sample frequencies (X 1 , X 2 , . . . , X S ) for given S and ( p 1 , p 2 , . . . , p S ): Note that undetected species, i.e., X i ¼ 0, do not contribute to this distribution. In this model, the detection probability for the ith species is simply the true relative abundance p i ¼ N i /N. In this case, the sample size n is fixed. Thus, the number of individuals represented by any single species is at most n, which is fixed by the sampling design. Alternatively, abundance data can also be collected by sampling a fixed area or by applying a fixed sampling effort, rather than a fixed sample size. With this sampling protocol, the sample size is a random variable and thus cannot be fixed in advance, implying that the number of individuals represented by any single species can be large, without any particular limit. A commonly used area-based model is the Poisson product model, which assumes that individuals of the ith species accumulate in the reference sample according to a Poisson process (Ross 1995). This model can be traced to Fisher et al. (1943) and forms the basis for Coleman et al.'s (1982) random sampling model for species-area relationships. As shown by Colwell et al. (2012), the Poisson product model produces results that are virtually indistinguishable from those based on a multinomial model. A statistical reason for this is that the Poisson product model is closely related to a multinomial model; see Chao and Chiu (2013) for details. Therefore, we considered only the multinomial model, which can accommodate both individual-based and area-based abundance data.

Sample-based (incidence) data and model
When the sampling unit is not an individual, but a trap, net, quadrat, plot, or timed survey, it is these sampling units, not the individual organisms that are sampled randomly and independently. Because it is not always possible to count individuals within a sampling unit, estimation can be based on a set of sampling units in which only the incidence (presence) of each species is recorded. The reference sample for such incidence data consists of a set of T sampling units. The presence or absence (technically, non-detection) of each species within each sampling unit is recorded to form a species-by-sampling-unit incidence matrix (W ij ) with S rows and T columns. The value of the element W ij of this matrix is 1 if species i is recorded in the jth sampling unit, and 0 if it is absent. The row sum of the incidence matrix Y i ¼ P T j¼1 W ij denotes the incidence-based frequency of species i, i ¼ 1, 2, . . . , S. Here, Y i is analogous to X i in the individual-based frequency vector. Species present in the assemblage but not detected in any sampling unit yield Y i ¼ 0. The total number of species observed in the reference sample is S obs (only species with Y i . 0 contribute to S obs ).
Following Colwell et al. (2012), we adopted a Bernoulli product model, which assumes that the ith species has its own unique incidence probability p i that is constant for any randomly selected sampling unit. Each element W ij in the incidence matrix is a Bernoulli random variable (since W ij ¼ 0 or W ij ¼ 1), with probability p i that W ij ¼ 1 and probability 1 À p i that W ij ¼ 0. The probability distribution for the incidence matrix is The model is equivalent to a binomial product model for the observed row sums (Y 1 , Y 2 , . . . , Y S ) as follows: Here the probability of incidence (occurrence) p i is the probability that species i is detected in a sampling unit.
In Appendix A, we describe a more general case (quadrat sampling) to interpret the model and explain how this model can incorporate spatial aggregation. Let Q k denote the incidence frequency counts, the number of species that are detected in exactly k sampling units, k ¼ 0, 1, . . . , T, i.e., Q k is the number of species each represented exactly Y i ¼ k times in the incidence matrix sample. Here Q k is analogous to f k in the abundance data. The total number of incidences U recorded in the T sampling units is analogous to n in the abundance data. Here U is a random variable and can be expressed as U ¼ P T k¼1 kQ k ¼ P S i¼1 Y i , and the number of observed species is S obs ¼ P T k¼1 Q k . Here, Q 1 represents the number of unique species (those that are each detected in only one sampling unit), and Q 2 represents the number of duplicate species (those that are each detected in exactly two sampling units). The unobservable zero frequency count Q 0 denotes the number of species among the S species present in the assemblage that are not detected in any of the T sampling units.

HILL NUMBERS
Abundance data Hill (1973) integrated species richness and species abundances into a class of diversity measures later called Hill numbers, or effective numbers of species, defined for q 6 ¼ 1 as in which S is the number of species in the assemblage, and the ith species has relative abundance p i , i ¼ 1, 2, . . . , S. The parameter q determines the sensitivity of the measure to the relative frequencies. When q ¼ 0, the abundances of individual species do not contribute to the sum in Eq. 3a. Rather, only presences are counted, so that 0 D is simply species richness. For q ¼ 1, Eq. 3a is undefined, but its limit as q tends to 1 is the exponential of the familiar Shannon index, referred to here as Shannon diversity: The variable 1 D weighs species in proportion to their frequency. When q ¼ 2, Eq. 3a yields Simpson diversity, the inverse of the Simpson concentration is as follows: which places more weight on the frequencies of abundant species and discounts rare species. Investigators using Hill numbers should report, at least, the diversity of all species (q ¼ 0), of ''typical'' species (q ¼ 1), and of dominant species (q ¼ 2). For a general order of q, if q D ¼ x, then the diversity is equivalent to that of an idealized assemblage with x equally abundant species, which is why Hill numbers are referred to as effective numbers of species or as species equivalents.
A complete characterization of the species diversity of an assemblage with S species, and relative abundances ( p 1 , p 2 , . . . , p S ) is conveyed by a diversity profile (a plot of q D vs. q from q ¼ 0 to q ¼ 3 or 4 [beyond this it changes little]; see To´thme´re´sz 1995). Although Hill numbers for q , 0 can be calculated, they are dominated by the frequencies of rare species and have poor statistical sampling properties. We thus restricted ourselves to the case q ! 0 throughout the paper. An example of a diversity profile is shown in Fig. 1a.
Hill numbers can be regarded as the theoretical or asymptotic diversities at a sample size of infinity for which the true relative abundances fp 1 , p 2 , . . . , p S g of each of i species are known. When sample size is relevant for discussion, we use the notation q D(') to denote the (asymptotic) Hill numbers. Throughout the paper, we use q D and q D(') interchangeably; i.e., q D ¼ q D(').

Incidence data
As far as we are aware, Hill numbers have been discussed only for abundance data and have not previously been defined for sample-based incidence data. Here, we propose the following Hill numbers for sample-based incidence data, based on the Bernoulli product model (Eq. 2a) or equivalently, the binomial product model (Eq. 2b). With either of these two models, P S i¼1 p i may be greater than 1. So we first normalize each parameter p i (i.e., divide each p i by the sum P S i¼1 p i ) to yield the relative incidence of the ith species in the assemblage. This relative incidence is assumed to be the same for any randomly selected sampling unit. Hill numbers of order q for incidence data are defined as: As with abundance data, Hill numbers for incidence data also represent the theoretical or asymptotic diversities when the number of sampling units is infinity. So we also use q D and q D(') interchangeably; i.e., q D ¼ q D('). Hill numbers q D for abundance data are based on relative abundances (Eq. 3a), whereas Hill numbers q D for incidence data are based on relative incidence in the assemblage (Eq. 4a). The parameter q in Eq. 4a determines the sensitivity of q D to the relative incidences. If all the incidence probabilities (p 1 , p 2 , . . . , p S ) are identical, then Hill numbers of all orders equal the species richness of the reference sample. The Hill number q D for incidence data is interpreted as the effective number of equally frequent species in the assemblage from which the sampling units are drawn. That is, if q D ¼ y, then the diversity of the assemblage is the same as that of an idealized assemblage with y species all of equal probability of incidence.
Eq. 4a yields species richness for incidence data when q ¼ 0. As with Eq. 3b, the limit of q D as q tends to 1 exists and gives which is equal to the Shannon diversity for incidence data, i.e., the exponential of Shannon entropy based on the relative incidences in the assemblage. When q ¼ 2, Eq. 4a becomes which is the Simpson diversity for incidence data, i.e., the inverse Simpson concentration based on relative incidences. By analogy to the case for abundance data, a plot of q D vs. q completely characterizes the species diversity of an assemblage with S species and incidence probabilities (p 1 , p 2 , . . . , p S ).

Diversity accumulation curve
It is well known that empirical species richness varies with sampling effort and thus also varies with sample completeness (as measured by sample coverage; see Sample-size-and coverage-based rarefaction and extrapolation). Therefore, we can plot the expected species richness as a function of sample size (this plot is the familiar species accumulation curve) or as a function of sample coverage. The asymptote of this curve as sample size tends to infinity is the species richness in the entire assemblage. We now extend the concept of species accumulation curve to the concept of a diversity accumulation curve.
As discussed for abundance data, the Hill number of a fixed order q defined in Eq. 3a represents the asymptotic diversity at a sample size of infinity. For non-asymptotic diversity, we define the expected diversity q D(m) for a finite sample size m as the Hill numbers based on expected abundance frequency counts for a sample of size m, for which data are formed by averaging among samples of size m taken from the entire assemblage. Mathematical formulas and statistical estimation are derived in the next section. See Discussion for the advantages of our approach over the alternative approach that defines the non-asymptotic Hill numbers as the average Hill numbers over many samples of size m taken from the entire assemblage. Note that for species richness and expected sample completeness (see Samplesize-and coverage-based rarefaction and extrapolation), these two approaches give identical formulas. Our definition similarly can be extended to define the expected diversity q D(m) for m sampling units under the model for incidence data.
Based on the above definition, our goal is to construct a diversity accumulation curve as a function of sample size (the number of individuals for abundance data or the number of sampling units for incidence data) or sample completeness. For example, in the model for abundance data, we considered the following focal questions: (1) When a sample of finite size m drawn at random from the entire assemblage, what are the theoretical formulas for the expected diversity of order q, q D(m), for this sample? The plot q D(m) as a function of m is the sample-size-based diversity accumulation curve. As m tends to infinity, these expected diversities approach q D ¼ q D(') as given in Eq. 3a. An example of a sample-size-based diversity accumulation curve is given in Fig. 1b. (2) For a sample of size m, what is the expected sample completeness, C(m), for this sample? The plot q D(m) as a function of C(m) is the coveragebased diversity accumulation curve. As C(m) tends to unity (complete coverage), these expected diversities also approach q D ¼ q D('). An example of a coverage-based diversity accumulation curve is given in Fig. 1c. (3) Given the data for a reference sample of size n, what are the analytic estimators for q D(m) and C(m)? Rarefaction (interpolation) refers to the case m , n, whereas prediction (extrapolation) refers to the case m . n. The integrated sample-size-or coverage-based rarefaction/extrapolation sampling curve represents the estimated diversity accumulation curve based on the reference sample. (4) When there are multiple assemblages, how do we compare their diversities based on the rarefaction/extrapolation sampling curves?
To answer these questions, we derive the theoretical formulas of q D(m) for any finite sample size m and the corresponding analytic estimators, along with their variances and confidence intervals, in the next section. Thus, sample-size-and coverage-based diversity accumulation curves can be estimated and compared across multiple assemblages. For incidence data, similar questions and the estimation of the diversity accumulation curve can be formulated.

A new perspective
The extension of the now well-understood rarefaction and extrapolation of species richness (for a refresher, see Appendix B for abundance data, and Appendix C for incidence data) to the general case of Hill numbers is not direct, and it requires a new perspective, based on abundance frequency counts with a different statistical framework. We first extend the notation f k (abundance frequency counts of the reference sample of size n) to a more general case. We define the abundance frequency count f k (m) for any m ! 1 as the number of species represented by exactly k individuals in a sample of size m. The expected value of the abundance frequency count f k (m) can be expressed as follows (see Appendix D for a proof ): FIG. 1. (a) A diversity profile curve, which plots Hill numbers q D(') as a function of order q, 0 q 3. Hill numbers are calculated for a Zipf-Mandelbrot model (Magurran 2004) including 100 species with species relative abundance The three solid dots denote Hill numbers for order q ¼ 0, 1, and 2. The diversity profile curve is a nonincreasing function of q. The slope of the curve reflects the unevenness of species relative abundances. The more uneven the distribution of relative abundances, the more steeply the curve declines. For completely even relative abundances, the curve is a constant at the level of species richness. (b) Sample-size-based diversity accumulation curve, which plots the expected diversity q D(m) as a function of size m, q ¼ 0, 1, and 2. As sample size m tends to infinity, each curve approaches q D('). (c) Coverage-based diversity accumulation curve, which plots the expected diversity q D(m) as a function of expected coverage, q ¼ 0, 1, and 2. As sample coverage tends to unity, each curve approaches q D(').
Note that E[ f 0 (m)] ¼ P S i¼1 ð1 À p i Þ m is the expected number of undetected species in a sample of size m. For the reference sample of size n, the frequency f k (n) is simply denoted as f k , as we defined in Abundance data.
To derive the theoretical formula of q D(m), we first describe the frequency counts expected in a sample of size m. Suppose a random sample of m individuals is taken from the entire assemblage; we obtain a set of abundance frequency counts for this sample, f f k (m); k ¼ 1, . . . , mg. After an infinite number of samples of size m have been taken, the average of f k (m) for each k ¼ 1, 2, . . . , m tends to E[ f k (m)], as derived in Eq. 5. The frequency counts expected in a sample of size m are thus fE[ f k (m)]; k ¼ 1, . . . , mg. According to our formulation, the expected diversity for a sample of size m, q D(m), is the set of Hill numbers based on these expected frequencies. Note that, for a sample size of m, the relative abundances of species are simply 1/m (there are Thus, Hill numbers of order q for a sample of size m are This formula is valid for any sample size m, which can be either less than the reference sample size n or greater than n. Therefore, throughout the paper, the theoretical formulas for rarefaction and extrapolation of Hill numbers refer to Eq. 6. The second column in Table 1 summarizes the formulas for the special cases of q ¼ 0, 1, 2, and in general for q . 2.

Analytic rarefaction and extrapolation estimators for Hill numbers of order q
Based on a reference sample of size n with sample frequency X i for the ith species and the observed frequency counts f k ¼ f k (n), we derive here the analytic estimators qD (m) for q D(m) given in Eq. 6. The notation ''hat'' on a diversity, e.g., qD (m), means an estimator of that diversity based on the reference sample. The observed Hill numbers, q D obs , for the reference sample are simply qD (n). That is, Our approach to deriving rarefaction formulas is based on statistical estimation theory for the expected frequency counts. The minimum variance unbiased See Appendix D for a proof. Here, We use this conventional definition throughout this paper and the appendices. By substitution (from Eq. 6), we can obtain the following analytic estimators of the expected diversity of an interpolated sample of size m as follows: Eq. 9a is the general, nearly unbiased, rarefaction formula for Hill numbers of any order q. (An estimator is nearly unbiased if its bias tends to zero when the reference sample size n is large.) The analytic estimator for the rarefaction of Hill numbers for each of the orders q ¼ 0, 1, and 2 is thus obtained by replacing E[ f k (m)] withf k (m) in the specific formulas provided in Table 1. The extrapolation of Hill numbers of any order q is a prediction of the expected diversity q D(n þ m*) for an augmented sample of size m ¼ n þ m*. Although the general formula in Eq. 6 for q D(m) also holds for any sample size m . n, our estimatorf k (m) in Eq. 8 is valid only for m , n. Therefore, we cannot simply replace E[ f k (m)] byf k (m) as we did for rarefaction. For each q, we need to develop a different approach for extrapolation by means of an estimate of species richness (for q ¼ 0), Shannon diversity (for q ¼ 1), Simpson diversity (for q ¼ 2), and higher orders (for q . 2).
Species richness (q ¼ 0).-From Eqs. 6 and 9a, the species richness (q ¼ 0) for a sample of size m , n is In Appendix D we show that this estimator is identical to the traditional individual-based rarefaction estimator (Hurlbert 1971, Smith andGrassle 1977). Our new perspective, however, offers a simpler, alternative approach to traditional individual-based rarefaction of species richness. Eq. 9b shows that the traditional rarefaction estimator of the expected species richness for a sample size of m is simply the sum of the estimated frequency counts. This idea can be extended easily to Hill numbers of any orders, as we next illustrate. The extrapolated species richness estimator for a sample of n þ m* used in this paper is reviewed in Appendix B, and the formula (originally derived by Shen et al. 2003) is shown in Table 1. This approach requires an estimated asymptote of species richness. Any proper species richness estimator can be used.  suggested using the Chao1 estimator (Chao 1984) or abundance-based coverage estimator (ACE; Chao and Lee 1992) and noted that extrapolation gives reliable estimates only up to approximately double or triple the reference sample size. This limitation is primarily a consequence of the fact that the asymptotic estimator is only a lower bound (Chao 1984).
Shannon diversity (q ¼ 1).-From Eqs. 6 and 9a, we have the following nearly unbiased interpolation estimator for the Hill number q ¼ 1 (Shannon diversity): For our new extrapolation formula, we need an estimator for the asymptote of Shannon diversity.  derived the following nearly unbiased estimator,Ĥ('), of Shannon entropy H ¼ H(') ¼ À P S i¼1 p i log p i as follows: TABLE 1. Theoretical formulas and analytic estimators for rarefaction and extrapolation of abundance-based Hill numbers of order q ¼ 0, q ¼ 1, q ¼ 2, and any integer order q . 2, given a reference sample with the observed Hill numbers q D obs and estimated coverageĈ ind (n).
Order/coverage Theoretical formulaà (for all m . 0) Interpolation estimator § (for m , n) Notes: The last row gives equations for sample completeness as a function of sample size. It also gives the corresponding coverage estimators for rarefied samples and extrapolated samples for coverage-based rarefaction and extrapolation curves.
For the reference sample, the observed Hill number of order q is qD g; see Eq. 12 in the subsection Sample-size-and coverage-based rarefaction and extrapolation.
à The frequency count f k (m) is defined as the number of species represented by exactly k individuals/times in a sample of size m. The formula for E[ f k (m)] is given in Eq. 5 in the subsection A new perspective.
§ An unbiased estimatorf k (m) for E[ f k (m)] exists for m , n and is given in Eq. 8 in the subsection Analytic rarefaction and extrapolation estimators for Hill numbers of order q.
# The Stirling number of the second kind, w(q, j ), is defined by the coefficient in the expansion . As a result, the asymptotic estimator for Shannon diversity is 1D The extrapolated estimator for Shannon diversity of a sample of size n þ m* is as follows: Details of the derivation are provided in Appendix E. Extensive simulations  suggest that the asymptotic Shannon estimator in Eq. 10b is nearly unbiased, implying the extrapolation provided by Eq. 10c is valid for a wide prediction range. This extrapolation can be safely extended to the asymptote. Simpson diversity (q ¼ 2).-The general formula for the expected Simpson diversity for any sample size m (for both m , n and m . n) is See Appendix D for proofs. A minimum variance unbiased estimator of P S i¼1 p 2 i is P S i¼1 X i ðX i À 1Þ/[n(n -1)] (Good 1953), implying that an estimator for the asymptotic Simpson diversity is 2D ¼ 2D (') ¼ n(n -1)/ P Xi!2 X i (X i -1). An interpolated estimator (m , n) from Eq. 11a can be expressed in two equivalent forms: We can apply Eq. 11a to an augmented size of n þ m* and obtain the following extrapolated estimator: The rarefaction, extrapolation, and asymptotic estimators are all nearly unbiased. This means that for q ¼ 2, the extrapolation can be safely extended to the asymptote. Diversity of integer order q . 2.-For Hill numbers of integer order q . 2, a nearly unbiased interpolation estimator is given in Eq. 9a. A general extrapolation estimator qD (n þ m*) is quite complicated and is shown in the last column in Table 1 (see Appendix E for derivation details). A nearly unbiased estimator of the true asymptotic value q D ¼ q D(') for any integer q . 2 is qD (') ¼ ½ P Xi!q X ðqÞ i =n ðqÞ 1=ð1ÀqÞ (Gotelli and Chao 2013), where x ( j ) denotes the falling factorial x (x -1) . . . (x -j þ 1). Table 1 summarizes, for abundance data, all theoretical formulas and analytic estimators for rarefaction and extrapolation of Hill numbers of order q ¼ 0, 1, 2 and any integer order q . 2. (The last row of Table 1 also gives the formulas for sample-completeness as a function of sample size; see the next subsection). We tested our estimators on simulated data generated from several species abundance models and on data from large empirical data sets (Appendix F). The results show that the proposed analytic rarefaction and extrapolation estimators match perfectly with the corresponding theoretical values for rarefied and extrapolated samples up to double the reference sample size. However, when the extrapolated sample size is more than double the reference sample size, the performance of our predictors depends on extrapolated range and the order q (see Discussion).
There are two kinds of variance associated with an interpolated or extrapolated estimator. A variance that is conditional on the reference sample measures only the variation in diversity that would arise from repeatedly resampling (without replacement) the given reference sample. This conditional variance approaches zero as m approaches n because the diversity of sample size of n is fixed (i.e., there is only one combination of all individuals Extrapolation estimator} (for a sample of size n þ m*) (reliable for m* , n) February 2014 or all sampling units). An unconditional variance measures the variation in diversity that would arise if another new sample of size m were taken from the entire assemblage (rather than from the original reference sample). Therefore, the unconditional variance does not approach 0 when sample size tends to n, and all associated confidence intervals are symmetric, which reflects the uncertainty of the new sample. In deriving an ''unconditional'' variance, the number of undetected species must be estimated because those undetected species also affect the variation of a new sample. In most applications, unconditional variance is more useful because inferences are not restricted to the reference sample. Colwell et al. (2012) obtained an unconditional analytic variance estimator for rarefied and extrapolated species richness estimators. However, extending this analytic approach for variance estimators to a general order of q becomes mathematically intractable. Therefore, we suggest a simpler, bootstrap method (Appendix G), to obtain unconditional variances and confidence intervals for all rarefied and extrapolated estimators. In the proposed procedure, we follow Colwell et al. (2012) and use the Chao1 (for abundance data) or Chao2 (for incidence data) to estimate the number of undetected species in the reference sample (Chao 1984(Chao , 1987, although any other proper estimators can also be used. The examples in Worked examples: comparison of assemblages illustrate our proposed sampling curves and the associated confidence intervals based on the unconditional variance from our proposed bootstrap method.

Sample-size-and coverage-based rarefaction and extrapolation
In comparing diversities among multiple assemblages, samples can be standardized by sample size or by sample completeness. Our proposed sample-size-based sampling curve for Hill numbers of each specific order q includes the rarefaction part (which plots qD (m) as a function of m, where m , n; see Table 1) and the extrapolation part (which plots qD (n þ m*) as a function of n þ m* for m* . 0; see Table 1) and yields a smooth sampling curve, the two parts of which join smoothly at the point of the reference sample (n, q D obs ). To fully incorporate the effect of relative abundance on diversity estimation, we suggest plotting curves for at least the first three Hill numbers (q ¼ 0, 1, 2).
When there are many ''invisible'' species (species with extremely small relative abundance that are almost undetectable in normal sampling schemes) our intuition is that the number of undetected species in samples (or equivalently, species richness in the entire assemblage) is very hard to estimate; see Colwell et al. (2012) and Gotelli and Chao (2013) for a review. On the other hand, and contrary to intuition, the notion of sample completeness can be accurately and efficiently estimated using only information contained in the reference sample itself. Sample completeness can be measured by sample coverage (or simply coverage), a concept origi-nally developed by the founder of modern computer science, Alan Turing, and I. J. Good (Good 1953, 2000, Good and Toulmin 1956according to Good [2000], Turing never published this work, but gave permission to Good to publish it). Coverage is defined as the total relative abundances of the observed species, or equivalently, the proportion of the total number of individuals in an assemblage that belong to species represented in the sample. Turing and Good (Good 1953, 2000, Robbins 1968, Esty 1983, 1986) derived a simple coverage estimator (of the reference sample of size n) as one minus the proportion of singletons. Robbins (1968) showed that the average squared error of this estimator ' 1/n. A tiny percentage of coverage can contain an infinite number of rare species. The estimated complement of coverage is not an estimate of the number of unseen species, but rather it estimates the proportion of the total individuals in the assemblage that belong to undetected species. For this reason, extremely rare, undetected species do not make a significant contribution to that proportion, even if there are many such species. This intuitively explains why the estimation of species richness in highly diverse assemblages is a statistically difficult issue, whereas sample coverage can be accurately estimated. Alroy (2010) and Jost (2010) independently proposed that samples be standardized to a common level of sample completeness (as measured by sample coverage), and developed algorithmic approaches for comparing rarefied samples. For species richness, Chao and Jost (2012) were the first to derive an analytic method for seamless coverage-based rarefaction and extrapolation. Chao and Jost (2012) suggested plotting rarefaction and extrapolation curves with respect to sample coverage rather than with respect to sample size because the expected species richness for equal sample coverage satisfies a replication principle or doubling property, which the expected species richness for equal sample size does not obey. Similar conclusions are valid for the expected diversity of any order q; see Appendix D for details. This property makes it possible to quantify ratio comparisons or any other comparisons between the magnitudes of the diversities of the assemblages.
For individual-based abundance data, Chao and Jost (2012) used a more accurate sample coverage estimate for the reference sample, as follows: They also derived an interpolated coverage estimator C ind (m) for any rarefied sample of size m , n and extrapolated coverage estimatorĈ ind (n þ m*) for any augmented sample of size n þ m*; see Table 1 (last row) for their formulas. The extrapolated coverage estimator is reliable if m* , n.
As with sample-size-based curves, for any specific order q, the coverage-based interpolation [which plots qD (m) with respect toĈ ind (m)] and extrapolation (which plots qD (n þ m*) with respect toĈ ind (n þ m*)) join smoothly at the reference point (Ĉ ind (n), q D obs ). The confidence intervals of expected diversity based on the bootstrap method also join smoothly.

Bridging sample-size-and coverage-based approaches
The sample-size-based approach plots the estimated diversity as a function of sample size, whereas the corresponding coverage-based approach plots the same diversity with respect to sample coverage. Therefore, these two approaches can be bridged by the relationship between coverage and sample size. Using the coverage estimators in Tables 1 and 2 (the last row in each table), we can construct a sample completeness curve, which reveals sample completeness for a given sample size. From the original reference sample, this curve estimates sample completeness for smaller rarified samples, as well as for larger extrapolated samples. This curve also provides an estimate of the sample size needed to achieve a fixed degree of completeness.
An optimal stopping theory derived by Rasmussen and Starr (1979) specifies that sampling stops when sample coverage reaches a predetermined value. The sample completeness curve thus provides information about whether we should continue or stop sampling. If multiple assemblages are to be sampled and compared, Chao and Jost (2012) suggested that ecologists should sample each assemblage to the same degree of completeness. Such equally complete samples from different assemblages can be compared directly, without any need for rarefaction or extrapolation. See Worked examples for illustration.

RAREFACTION AND EXTRAPOLATION OF INCIDENCE DATA USING HILL NUMBERS
For incidence data, parallel derivations to those for abundance data yield equations for the theoretical expected diversities for any sample size t (the second column in Table 2). Here, ''sample size'' for incidence data means ''number of sampling units.'' The analytic estimators for rarefied samples and analytic estimators for extrapolated samples are also given in Table 2. The asymptotic estimator for each order q of Hill numbers (q ¼ 0, 1, 2) is provided in the footnotes of Table 2. Full derivation details along with a replication principle appear in Appendix H; here we highlight the following differences from the models and estimators for abundance data.
First, for abundance data, our derivation was based on a model in which the species frequency X i follows a binomial distribution characterized by n and the true relative abundance p i . In contrast, for incidence data, we assume the species incidence-based frequency Y i follows a binomial distribution characterized by T (the total number of sampling units) and incidence probability p i . With abundance data, P S i¼1 p i ¼ 1, but with incidence data, P S i¼1 p i can exceed 1.
Second, the total number of individuals n in a reference sample of abundance data is fixed by design. In contrast, total number of incidences in a reference sample of T sampling units is a random variable U, with expectation E(U) ¼ T P S i¼1 p i . Therefore, P S i¼1 p i can be accurately estimated by U/T. For abundance data, the number of individuals in any rarefied or extrapolated sample size is fixed, whereas for incidence data, the number of incidences in any t sampling units is random Third, for abundance data, the primary derivations of our estimators are based on frequency counts f k (m), the number of species represented by exactly k individuals (or observed k times) in a sample of size m. The corresponding incidence frequency count is Q k (t), the number of species recorded in exactly k sampling units in a sample of t sampling units.
The sample completeness curve as a function of abundance, developed by Chao and Jost (2012), is reviewed in Appendix B, and the formulas appear in the last row of Table 1. In Appendix C we derive, for the first time, the corresponding sample completeness curve as a function of sampling units for incidence data. With such a curve, ecologists can objectively quantify the sample completeness for any incomplete abundance or incidence data sets. These curves help determine a sample size needed in a designing a survey. All formulas are summarized in the last row of Table 2.
Based on the formulas in Table 2, for each order q of the Hill numbers q D for incidence data, we can obtain an integrated rarefaction/extrapolation sampling curve with confidence intervals. Statistical inference theory implies that the proposed interpolated estimator for diversity is unbiased for q ¼ 0 and nearly unbiased for q ¼ 1 and 2. We support these claims with simulation tests (Appendix F). As with the abundance data, the performance of our extrapolated estimators depends on the order of Hill numbers and the prediction range of extrapolation. Simulation tests provide some general usage guidelines (see Discussion). In Tables 1 and 2, we summarize the properties and performance of each index based on theory and analyses of empirical and simulated data sets.

WORKED EXAMPLES: COMPARISON OF ASSEMBLAGES
Example 1: Abundance data-comparing spider species diversity in two treatments Sackett et al. (2011) provided species abundance data for samples of spiders from four experimental forest canopy-manipulation treatments at the Harvard Forest. The treatments were established to study the long-term consequences of loss of the dominant forest tree, eastern hemlock (Tsuga canadensis), caused by a nonnative insect, the hemlock woolly adelgid (Adelges tsugae; Ellison et al. 2010). Data from two treatments are used here for illustration: (1) the hemlock-girdled treatment, in which bark and cambium of hemlock trees were cut and the trees left in place to die to mimic tree mortality by adelgid infestation; and (2) the hemlock-logged treatment, in which hemlock trees were cut and removed from the plots (Ellison et al. 2010). The abundance frequency data for the two treatments (summed over two plots per treatment) are tabulated in Table 3, and the rank-abundance distributions are shown in Fig. 2. We used the data from these two treatments to illustrate the construction of two types (sample-size-and coverage-based) of rarefaction and extrapolation curves of Hill numbers. The constructed sampling curves were then used to compare spider species diversities between the two treatments.
The reference sample size (number of individual spiders) for the girdled treatment was 168, and the observed species richness, Shannon diversity, and Simpson diversity (i.e., Hill numbers for q ¼ 0, 1, 2) for this reference sample size were, respectively, 26, 12.06, and 7.84 (solid points in Fig. 3a and b). The sample size for the logged treatment was 252, and the corresponding observed Hill numbers for q ¼ 0, 1, 2 were 37, 14.42, and 6.76, respectively. Thus, judging from the unstandardized raw data (the reference samples), the logged treatment appears to have higher observed species richness and Shannon diversity, but lower Simpson diversity than the girdled treatment.
Step 1: Compare sample-size-based sampling curves up to a base sample size (Fig. 3).-We first constructed, for each of the two treatments, the integrated sample-sizebased rarefaction and extrapolation curves for Hill TABLE 2. The theoretical formulas and analytic estimators for rarefaction and extrapolation of Hill numbers based on incidence data for q ¼ 0, q ¼ 1, q ¼ 2, and any integer order q . 2, given a reference sample with the observed Hill numbers q D obs and estimated coverageĈ sample (T).

Order/coverage
Theoretical formula for all t . 0à Interpolation estimator (t , T) § Notes: The last row gives equations for sample completeness as a function of sample size, and the corresponding coverage estimators for rarefied samples and extrapolated samples. See Appendix C (for q ¼ 0) and Appendix H (for q . 0) for notation and all derivation details.
For the reference sample, the observed Hill number of order q is q D obs ¼ ½ P T k¼1 ðk=UÞ q Q k 1=ð1ÀqÞ . The coverage of the reference sample is estimated byĈ sample (T) ¼ 1 - jQ j denotes the total number of incidences in T samples.
à For any sample size of t, Q k (t) is defined as the number of species detected in exactly k sampling units. U t is defined as the expected total number of incidences in t sampling units: U t ¼ P t j¼1 jE½Q j ðtÞ ¼ t P S i¼1 p i . § An unbiased estimatorQ k (t) for E[Q k (t)] exists for t , T and is given in Eq. H.5 in Appendix H. An unbiased estimator for U t is Uˆt ¼ P t j¼1 jQ j ðtÞ ¼ tU/T. } When t* tends to infinity, each predictor tends to the estimator of the asymptotic diversity: 0D (') ¼ S obs þQ 0 , whereQ 0 is a predictor for Q 0 (Chao 1987); see Eq. C.5 in Appendix C. 1D (') ¼ exp[Ĥ sample (')], whereĤ sample (') is an entropy estimator for incidence data; see Eq. H.7 in Appendix H. For an integer q ! 2, qD (' Table 1 for the definition of the Stirling number of the second kind: w(q, j).
numbers of q ¼ 0, 1, 2. In Fig. 3a, we show these samplesize-based curves with 95% confidence intervals based on a bootstrap method. We extrapolated up to double the reference sample size (i.e., up to size 336 for the girdled treatment and size 504 for the logged treatment). In each plot, except for initial, small sample sizes, none of the confidence intervals for the three curves intersect, and the rank order of diversity is species richness . Shannon diversity . Simpson diversity. For any fixed sample size or completeness in the comparison range, if the 95% confidence intervals do not overlap, then significant differences at a level of 5% among the expected diversities (whether interpolated or extrapolated) are guaranteed. However, partially overlapping intervals do not guarantee nonsignificance (Schenker and Gentleman 2001). The curve for species richness (q ¼ 0) increases steeply with sample size in both treatments, but the curves for Shannon and Simpson diversity (q ¼ 1 and q ¼ 2) level off beyond the reference sample, illustrating that higher order Hill numbers are increasingly dominated by the frequencies of the more common species and are, therefore, less sensitive to sampling effects.
To compare diversities between the girdled and logged treatments, we show in Fig. 3b, for each fixed value of q (q ¼ 0, 1, and 2), the sample-size-based rarefaction and extrapolation of these two plots with 95% confidence intervals up to a base sample size. We suggest the base sample size to be double the smallest reference sample size or the maximum reference sample size, whichever is larger (the reason for our suggestion will become clearer in the second example). See Box 1 for systematic steps to determine a base sample size. In this example, the base sample size is 336 (double the smaller reference sample size). The estimated Hill numbers can then be compared across assemblages for any sample size less than the base size. In a traditional rarefaction, the data from the logged treatment would be rarefied to a sample size of 168 individuals to match the abundance in the girdled treatment. For this rarefied sample, the Hill numbers of q ¼ 0, 1, 2 are estimated to be 31.71, 13.83, and 6.68, respectively. The proposed integrated sampling curve allows reliable comparisons for any sample size up to an abundance of 336. Across this range of abundance, Fig.  3b reveals that the logged treatment is more diverse for all but the smallest sample sizes for species richness (q ¼ 0) and Shannon diversity (q ¼ 1), although the confidence intervals overlap. In contrast, for Simpson diversity (q ¼ 2), the girdled treatment is more diverse, although again the two confidence intervals overlap.
Step 2: Construct a sample completeness curve to link sample-size-and coverage-based sampling curves (Fig.  4).-Based on Eq. 12, the coverage for the girdled treatment is estimated as 93% for the reference sample of size 168 individuals, and the coverage for the logged treatment is 94% for the reference sample of 252 individuals. It is informative to examine how the sample completeness varies with sample size (see the formulas in the last row in Table 1). In Fig. 4, we plot the sample completeness curve as a function of sample size for each of the two treatments, up to double the reference sample size. For any sample size less than 168, the curve shows that the sample completeness for the girdled treatment is estimated to be higher than that in logged treatment, although the confidence intervals overlap. When sample size is larger than 168, the estimates of sample coverages (reliable for t* , T ) TABLE 3. Spider species abundance frequency counts in two canopy manipulation treatments (Ellison et al. 2010, Sackett et al. 2011. Girdled Logged 1  12  1  14  2  4  2  4  4  1  3  4  6  2  4  3  8  1  5  1  9  1  7  3  15  2  8  2  17  1  10  1  22  1  13  1  46  1  15  1  16  1  22  1  88  1 Note: The data include pairs of (i, f i ) where f i refers to the number of species represented by exactly i individuals. For the girdled treatment, S obs ¼ 26 species, n ¼ 168 individuals; for the logged treatment, S obs ¼ 37 species, n ¼ 252 individuals.
for the two treatments differ little. If we apply a traditional rarefaction approach to standardize sample coverage, a sample size of ;168 individuals in the logged treatment gives a sample coverage of 93%. Thus, the diversity ordering of the two treatments for 93% of the assemblage individuals is the same as that for a standardized sample of 168 individuals. The sample completeness curve figure provides a bridge between sample-size-and coverage-based sampling curves, as will be explained in the next step.
Step 3: Compare coverage-based sampling curves up to a ''base coverage'' (Fig. 5).-From the sample completeness curve (Fig. 4), when sample size in the girdled treatment is doubled from 168 to 336 individuals, the sample coverage is increased from 93% to 96%. In the logged treatment, when sample size is doubled from 252 to 504 individuals, the coverage is increased from 94% to 97%. In Fig. 5a, we present, for each treatment, the corresponding coverage-based rarefaction and extrapolation curves with 95% confidence intervals for diversity of q ¼ 0, 1, 2 when the coverage is extrapolated to the value for a doubling of each reference sample size.
In Fig. 5b, we compare the coverage-based diversities of the two treatments for q ¼ 0 (left panel), q ¼ 1 (middle panel), and q ¼ 2 (right panel) up to the coverage of 96%. This is our ''base coverage'' (the lowest coverage for doubled reference sample sizes or the maximum coverage for reference samples, whichever is larger). See Box 1 for suggestions on the choice of base coverage. Because the increase in coverage for the extrapolation is small, and the estimated diversity for q ¼ 1 and 2 hardly change beyond the reference samples, the extrapolation parts in Fig. 5b are nearly invisible for these two orders of q. Since the two confidence bands do not intersect for species richness (q ¼ 0) if coverage exceeds 50% (Fig. 5b, left panel), species richness in the logged treatment is significantly higher than in the girdled treatment for any standardized sample coverage between 50% and 96%. For Shannon diversity (q ¼ 1), the logged treatment is more diverse, but the confidence bands overlap. For Simpson diversity (q ¼ 2), when coverage is less than 70%, both treatments have almost the same diversity, but when coverage is greater than 70%, the Simpson diversity for the girdled treatment is slightly higher.
Comparing Figs. 3b and 5b, we see that the samplesize-and coverage-based curves for q ¼ 0 and q ¼ 1 exhibit consistent diversity orderings between the two treatments. However, for q ¼ 2, the sample-size-based curves do not intersect (Fig. 3b), but the coverage-based curves have two crossing points (Fig. 5b). See Discussion for more comparisons of the two types of curves.
Example 2: Incidence data-comparing species diversity of tropical ants among five sites We used the tropical ant species data collected by Longino and Colwell (2011) from five elevations on the Barva Transect, a 30-km continuous gradient of wet forest on Costa Rica's Atlantic slope. The five sites are, respectively, at elevations of 50 m, 500 m, 1070 m, 1500 m, and 2000 m. Species presence or absence was recorded in each sampling unit, which consisted of all worker ants extracted from a 1-m 2 forest floor plot. See Longino and Colwell (2011) for sampling and data details. A sample-by-species incidence matrix was produced for each of the five sites. The incidence frequency counts are given in Colwell et al. (2012: Table  6). The plots for rank-frequency distributions of the five sites are shown by Longino and Colwell (2011: Fig . 3). An integrated rarefaction and extrapolation curve for species richness was presented by Colwell et al. (2012: Fig. 4b). They concluded that species richness among the five sites was significantly different (none of the confidence intervals intersect, except for very small sizes), and that richness has the ordering: 500 . 50 . 1070 . 1500 . 2000 m.
Step 1: Compare sample-size-based sampling curves up to a base sample size (Fig. 6).-For each of the five sites, Fig. 6a shows the observed Hill numbers and samplesize-based rarefaction and extrapolation plots with 95% confidence intervals for three sampling curves (Hill numbers of q ¼ 0, 1, 2) up to double the reference sample size. To compare diversity among the five elevations, we first determined the base sample size. The reference sample sizes T (number of sampling units) for each elevation (50, 500, 1070, 1500, and 2000 m) are, respectively, 599, 230, 150, 200, and 200. The base sample size would be 599 (which is larger than 2 3 150 ¼ 300, double the smallest reference sample); see Box 1 for the choice of this base sample size. An advantage of this FIG. 2. Rank-abundance distributions for spider data from the girdled and logged treatments of eastern hemlock (Tsuga canadensis) at a study from the Harvard Forest, Petersham, Massachusetts, USA. In the girdled treatment, bark and cambium of hemlock trees were cut and the trees left in place to die to mimic tree mortality by adelgid infestation, and in the logged treatment, hemlock trees were cut and removed from the plots. The proportional abundance on the y-axis (on a logarithmic scale) is calculated as the proportion of the maximum abundance.
choice of base sample size is that no data are excluded from our analysis. However, a drawback is that the extrapolation range for some samples could exceed their doubled reference sample sizes. For Shannon and Simpson diversities, the prediction biases are minimal beyond the double reference sample sizes, but for species richness in such cases we should be cautious about the prediction bias. See Discussion for suggestions on extrapolation range.
Next for each specific order of q, we plot the samplesize-based interpolation and extrapolation curves with 95% confidence bands for these five elevations together in the same figure, as illustrated in Fig. 6b. Extrapolations are extended to the base sample size of 599 for all sites. Our plot of q ¼ 0 corresponds to Fig. 4b in Colwell et al. (2012). We here extend their approach to include curves for q ¼ 1 and q ¼ 2, and also include coveragebased plots. For the three orders of q, diversity of the sites is consistently ordered as 500 . 50 . 1070 . 1500 . 2000 m (Fig. 6b). All confidence intervals are nonoverlapping (except for very small sizes), implying the diversity of any order q ¼ 0, 1, 2 is significantly different among the five elevations for any fixed sample size up to 599 sampling units.
Step 2: Construct a sample completeness curve to link sample-size-and coverage-based sampling curves (Fig.  7).-The sample coverages for the five sites (50, 500, 1070, 1500, and 2000 m) were estimated as 99.18%, 97.60%, 98.39%, 98.89%, and 99.64%, respectively, indicating that sampling is nearly complete for all sites. A summary of coverage estimators for incidence data appear in Table 2. See Appendix C for estimation details. For any fixed sample size ,300 sampling units, the sample coverage for the two lower elevations (50 and 500 m) is significantly lower than coverage at higher elevations (1070, 1500, 2000 m). When sample size is greater than 300, the pattern persists, but the 95% confidence bands begin to overlap. Step 3: Compare coverage-based sampling curves up to a base coverage (Fig. 8). -Fig. 8a shows, for each plot, the corresponding coverage-based rarefaction and extrapolation curves for Hill numbers of q ¼ 0, 1, 2 when the coverage is extrapolated to the value for a doubling of each reference sample size. From the sample completeness curve (Fig. 7), when the sample size in each site is doubled, the sample coverage increases very slightly for all sites. There is little change in ant diversity for q ¼ 1 and 2. Thus, the extrapolated portions of the curves in Fig. 8a are nearly invisible, as we also noted in Fig. 5.
When all sample sizes are doubled, the minimum value of the coverage values of these doubled sample sizes among the five sites is 99.08% (for 500 m elevation). However, it is less than the coverage 99.64% of the reference sample for 2000 m elevation. In order to use all data, we select our base coverage to be 99.64% (Box 1). Fig. 8b compares coverage-based rarefaction and extrapolation curves up to the base coverage of 99.64%. All three coverage-based diversities show the same ordering by elevation as in the sample-size-based comparison. None of the confidence intervals overlap except at very small coverage values, implying significant differences in ant diversity among the five elevational transects at comparable coverage. BOX 1. Systematic steps to determine base sample size for the sample-size-based rarefaction and extrapolation, and base coverage for the coverage-based rarefaction/extrapolation. Example 2 is used to illustrate each step. The reference sample size for the ith sample is denoted by n i , i ¼ 1, 2, . . . , k, and the corresponding sample coverage estimate is denoted by C(n i ). For abundance data, sample size refers to the number of individuals; for incidence data, sample size refers to the number of sampling units.
Step 0. Set the maximum extrapolated ratio r, equal to the ratio of the extrapolated sample size and the reference sample size. For making inferences about species richness (q ¼ 0), we suggest the maximum extrapolated size should be double the reference sample size, that is, r ¼ 2. For inferences for diversity of q ! 1, r can be any positive number, i.e., it is statistically safe to extrapolate to the asymptote.

DISCUSSION
We have developed a new, comprehensive statistical framework for the analysis of biodiversity data based on Hill numbers. We also advocate the use of sample coverage (or simply coverage), developed by Turing and Good (Good 1953) to quantify sample completeness. To characterize the species diversity of an assemblage, we propose constructing two types of integrated rarefaction and extrapolation curves (sample-size-and coveragebased) as illustrated in Figs. 3a and 5a for Example 1, and Figs. 6a and 8a for Example 2. For each type of curve, we suggest plotting three rarefaction/extrapolation curves (with confidence intervals) corresponding to three orders (q ¼ 0, 1, 2) of Hill numbers. These curves are then used to compare multiple assemblages, as illustrated in Figs. 3b and 5b of Example 1 and 6b and 8b of Example 2. The sample-size-and coverage-based curves are linked by a sample completeness curve (Figs. 4 and 7), which reveals the relationship between sample size (number of individuals or number of sampling units) and sample completeness. This curve illustrates how much sampling effort is needed to achieve a predetermined level of sample completeness.
The proposed estimators work well for rarefaction and short-range extrapolation in which the extrapolated sample size is up to twice the reference sample size. For rarefaction, our proposed estimator is unbiased for q ¼ 0 and nearly unbiased for q ¼ 1 and 2. For short-range extrapolation, the prediction bias with respect to the expected diversity is often limited. When the extrapolated sample size is more than double the reference sample size, the prediction bias depends on the extrapolated range and the order q. The magnitude of the prediction bias generally increases with the predic-FIG. 5. (a) Coverage-based rarefaction (solid line) and extrapolation (dashed line) plots with 95% confidence intervals for spider species diversity based on Hill numbers (q ¼ 0, 1, 2) for the hemlock girdled and logged treatments. Reference samples are denoted by solid dots. In the girdled treatment, the coverage was extrapolated to 96%, and in the logged treatment, the coverage was extrapolated to 97% (i.e., the coverage value for a doubling of each reference sample size). The numbers in parentheses are the sample coverage and the observed Hill numbers for each reference sample. (b) Comparison of the coverage-based rarefaction (solid lines) and extrapolation (dashed lines), up to the base coverage 96% (i.e., lower coverage of the doubled reference sample sizes) of spider diversity using Hill numbers of order q ¼ 0 (left panel), q ¼ 1 (middle panel), and q ¼ 2 (right panel). Reference samples in each treatment are denoted by solid dots. Note that species richness (left panel) in the two treatments is significantly different when sample coverage is between 50% and 96%, as the two confidence bands do not intersect in this range of coverage values. The numbers in parentheses are the sample coverage and the observed Hill numbers for each reference sample. tion range. For q ! 1, the extrapolated estimator is nearly unbiased for all extrapolated sample sizes, so the extrapolation can be safely extended to the asymptote. However, for q ¼ 0, extrapolation is reliable up to no more than double the reference sample size. Beyond that, the predictor for q ¼ 0 may be subject to some bias because our asymptotic estimator for species richness (Chao1 for abundance data and Chao2 for incidence data) is a lower bound only (Chao 1984(Chao , 1987.
To compare the diversities of multiple assemblages, Box 1 gives guidelines for choosing a base sample size and base coverage for comparing sample-size-and coverage-based curves. With the suggested base sample size and base coverage, all data are used for compari-sons. Based on the integrated sample-size-and coveragebased rarefaction and extrapolation curves, ecologists can efficiently use all available data to make more robust and detailed inferences about the sampled assemblages for any standardized samples with sample size less than the base sample size, and for any equally complete samples with coverage less than the base coverage. However, Example 2 provides an example in which we extrapolate a sample beyond a doubling of its reference sample size, based on the suggested base sample size.
For those samples, we should be cautious in estimating quantitative differences in species richness (q ¼ 0) among assemblages, although inferences about diversities of q ! 1 are reliable. In our formulation of a diversity accumulation curve, we define the expected diversity of a finite sample of size m as the Hill numbers based on the expected abundance frequency counts fE [ f k (m)]; k ¼ 1, . . . , mg. Our proposed theoretical formula is given in Eq. 6. An alternative definition would be the average Hill numbers over many samples of size m taken from the entire assemblage. Although the two approaches generally yield very close numerical values, our approach has two main advantages. We have shown (see summaries in Tables 1 and 2) that accurate estimators via estimation of frequency counts can be obtained for our approach. However, it is difficult to accurately estimate the alternative formula of the expected Hill numbers; usually algorithmic methods are needed. Another advantage is that all transformations between diversity measures are valid for any size m under our formulation. For example, Hill number of order 2 for any sample size m is exactly the inverse of the Simpson concentrations for the same size when all are based on the same expected frequencies. This is important because all diversity measures give consistent comparisons. If we use the alternative approach, then such transformations will not be exactly valid, and different measures may produce different comparative results. As proved in Propositions D1 in Appendix D, the two approaches are identical for species richness, and the same conclusion is valid for expected sample coverage.
Rarefaction and extrapolation aim to make fair comparisons among incomplete samples. Sample-sizebased rarefaction and extrapolation, in which the samples are all standardized to an equal size, provide useful sampling information for a range of sizes. Coverage-based rarefaction and extrapolation, in which all samples are standardized to an equal coverage, ensure that we are comparing samples of equal completeness over a range of coverages. Taken together, these two types of curves allow us to make more robust and detailed inferences about the sampled assemblages. Our approach provides a unifying sampling framework for species diversity studies and allows for objective comparisons of multiple assemblages.
For species richness (q ¼ 0), if the expected sample-sizebased species accumulation curves of two assemblages do not cross for any finite sample size .1, then the expected coverage-based species accumulation curves for these two assemblages also do not cross at any finite coverage ,1 beyond the base point (Chao and Jost 2012). The reverse is also true. Thus, the two types of curves for species richness always give the same qualitative ordering of species richness. If crossing occurs, then the sample-sizeand coverage-based curves have exactly the same number of crossing points. However, for species richness, the coverage-based method is always more efficient (requiring smaller sample sizes in each assemblage) than the traditional method for detecting any specific crossing point (Chao and Jost 2012). The two types of curves can exhibit different patterns and yield different diversity ordering for q ¼ 1 (Shannon diversity) and q ¼ 2 (Simpson diversity). An example of the case of q ¼ 2 is illustrated in Figs. 3b and 5b. The sample-size-based curves for q ¼ 2 in Fig. 3b do not intersect, but the coverage-based curves for q ¼ 2 in Fig. 5b cross twice. Appendix J gives an example for the case of q ¼ 1. There is another difference between FIG. 7. Plot of sample coverage for rarefied samples (solid lines) and extrapolated samples (dashed line) with 95% confidence intervals for tropical ants sampled from five sites (data from Longino and Colwell [2011]). Each curve was extrapolated up to a doubling of its reference sample size (the extrapolated curve for the 500-m site was cut off at 700 and, thus, is not completely shown). Reference samples are denoted by solid dots. the sample-size-and coverage-based standardization methods. As proved in Appendix D, the expected diversity of any order obeys a replication principle only when coverage is standardized.
In biodiversity studies, ecologists are interested in measuring not only diversity, but also evenness and inequality (Ricotta 2003). Jost (2010) used partitioning theory to derive Hill's (1973) useful class of evenness measures, the ratios of Hill numbers q D and species richness, q D/S for q . 0, and he showed that the ratio of the logarithms of Hill numbers and logarithm of richness, log( q D)/log(S), including Pielou's (1975) J 0 ¼ log( 1 D)/ log(S ), express the corresponding relative evenness. These two classes of measures have been difficult to accurately estimate statistically from samples due to their strong dependence on species richness, and thus on sample size. Jost (2010) suggested estimating both S and Hill numbers at fixed coverage to obtain meaningful estimates of evenness and inequality indices. Based on the theory developed in this paper, we are now able to analytically estimate evenness and inequality indices at fixed sample size or sample coverage. This will be an important application of our proposed theory; see Tables 1 and 2 for a summary of our analytic formulas. FIG. 8. (a) Coverage-based rarefaction (solid lines) and extrapolation (dashed lines) plots with 95% confidence intervals for tropical ant diversity based on Hill numbers (q ¼ 0, 1, 2) for five elevations. Reference samples are denoted by solid dots. The extrapolation is extended to the coverage value for a doubling of the size of each reference sample. The numbers in parentheses are the sample coverage and the observed Hill numbers for each reference sample. (b) Comparison of the coverage-based rarefaction (solid lines) and extrapolation (dashed lines), up to a base coverage of 99.64% for tropical ant diversity samples using Hill numbers of order q ¼ 0 (left panel), q ¼ 1 (middle panel), and q ¼ 2 (right panel), with 95% confidence intervals based on 200 bootstrap replications. Reference samples in each plot are denoted by solid dots. The numbers in parentheses are the sample coverage and the observed Hill numbers for each reference sample. Note that some confidence intervals in panels (a) and (b) are very narrow so that they are almost invisible.
In addition to Hill numbers, there are two other widely used classes of measures: Renyi and Tsallis generalized entropies Taillie 1979, 1982). These measures are simple transformations of Hill numbers; see Jost (2007). Hurlbert (1971) suggested another unified class of species diversity indices, defined as the expected number of species in a sample of m individuals selected at random from an assemblage. The relationship between Hill numbers and Hurlbert's indices has not been clear to ecologists (Dauby and Hardy 2011). In Appendix I, we show that these two classes of infinity orders are mathematically equivalent, in the sense that they contain the same information about biodiversity. Moreover, given a reference sample, sample-size-based rarefaction and extrapolation formulas  for species richness provide estimates of Hurlbert's indices. Thus, our proposed sample-size-and coverage-based rarefaction/extrapolation sampling framework for Hill numbers includes the information and estimators of all Hurlbert's indices and provides a unified approach to quantifying species diversity.
The slope of a sample-size-based expected species accumulation curve or a rarefaction/extrapolation curve also provides important information. The slope at the base point in the species accumulation curve or rarefaction curve is closely related to the Simpson diversity and to Hurlbert's (1971) Probability of an Interspecific Encounter (PIE) measure (Olszewski 2004). The slope at any other point is closely related to the complement of coverage (Chao and Jost 2012). For coverage-based curves, see Appendix I for similar findings. In Appendix K, we consider different sampling schemes and discuss the relationship between the expected species accumulation curve, Simpson diversity, and PIE.
For Hill numbers, only species relative abundances are involved. Species absolute abundances play no role in traditional diversities. From the perspective of measuring ecosystem function, Ricotta (2003) argued that if two assemblages have the same relative abundances, the one with larger absolute abundances should be considered more diverse. We are currently working on extending Hill numbers to include absolute abundances of species. The associated rarefaction and extrapolation functions for absolute-abundance Hill numbers also merit further research. Finally, this paper has focused on traditional Hill numbers, which do not take species evolutionary history into account. Chao et al. (2010) generalized Hill numbers to a class of measures that incorporate phylogenetic distances between species. It is worthwhile to extend this work to rarefaction and extrapolation of phylogenetic and functional diversity measures (Walker et al. 2008, Ricotta et al. 2012. All the rarefaction and extrapolation estimators proposed in this paper are featured in the online freeware application iNEXT (iNterpolation/EXTrapo-lation; personal communication). The R scripts for iNEXT have been posted in the Supplement, and will also be available in the R CRAN packages (available online). 9 Sample-size-based rarefaction and extrapolation estimators for richness (q ¼ 0, in Tables 1 and 2) are computed by EstimateS Version 9 (R. Colwell, available online, see footnote 8).