International Journal of Health Geographics
BioMed Central
Research (Open Access)

Power evaluation of disease clustering tests

Changhong Song*1 and Martin Kulldorff1,2

Address: 1Department of Statistics, University of Connecticut, Storrs, Connecticut, 06269, U.S.A and 2Department of Ambulatory Care and Prevention, Harvard Medical School and Harvard Pilgrim Health Care, 133 Brookline Avenue, 6th Floor, Boston, MA 02215, USA
Email: Changhong Song* - changhon@stat.uconn.edu; Martin Kulldorff - martin_kulldorff@hms.harvard.edu
* Corresponding author

Published: 19 December 2003
Received: 30 October 2003
Accepted: 19 December 2003
International Journal of Health Geographics 2003, 2:9
This article is available from: http://www.ij-healthgeographics.com/content/2/1/9
© 2003 Song and Kulldorff; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Keywords: spatial statistics, benchmark data, power, cluster detection, hot spot clusters, global chain clustering, test for spatial randomness

Abstract
Background: Many different test statistics have been proposed to test for spatial clustering. Some of these statistics have been widely used in various applications. In this paper, we use an existing collection of 1,220,000 simulated benchmark data sets, generated under 51 different clustering models, to compare the statistical power of several disease clustering tests. These tests are Besag-Newell's R, Cuzick-Edwards' k-Nearest Neighbors (k-NN), the spatial scan statistic, Tango's Maximized Excess Events Test (MEET), Swartz' entropy test, Whittemore's test, Moran's I and a modification of Moran's I.
Results: Except for Moran's I and Whittemore's test, all tests have good power for detecting some kind of clustering. The spatial scan statistic is good at detecting localized clusters.
Tango's MEET is good at detecting global clustering. With an appropriate choice of parameter, Besag-Newell's R and Cuzick-Edwards' k-NN also perform well.
Conclusion: The power varies greatly across test statistics and alternative clustering models. The power should be considered before deciding which test statistic to use.

Background
A large number of tests for spatial randomness that adjust for an uneven background population have been proposed. Such test statistics are used to test whether or not the geographical distribution of disease is random. They are also used in many other areas such as genetics, geomorphology and ecology [1-6]. When we use these test statistics, it is important to know whether they have good power. There have been some studies comparing such test statistics [7-14], but there have been few simultaneous comparisons of three or more tests.

Tests for spatial randomness are best compared by applying them to the same simulated data sets. For our study, we use existing benchmark data [10], simulated from the female population in the Northeastern United States, to evaluate the power of different test statistics for various kinds of clusters. Previous studies have shown that the spatial scan statistic has good power in detecting hot spot clusters, and that Tango's MEET has good power in detecting global clustering [10]. We compare the power of these two test statistics with six additional tests: Besag-Newell's R, Cuzick-Edwards' k-NN, Swartz' entropy test, Whittemore's test, Moran's I and a modified version of Moran's I. These tests were selected for different reasons. Some are widely used, such as Moran's I and Cuzick-Edwards' k-NN. Most of them were published in well-regarded statistics journals.
Methods
Benchmark data sets
The benchmark data sets are based on the 1990 female population in the 245 counties and county equivalents in the Northeastern United States, consisting of the states of Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, New York, New Jersey, Pennsylvania, Delaware, Maryland and the District of Columbia. Each county is represented by a centroid coordinate. The data are available at 'http://www.commed.uchc.edu/biostat/data sets/'. The benchmark data and how they were generated have been described in detail elsewhere [10]. We provide a brief summary here.

Under the null hypothesis of no clustering, 100,000 random data sets were generated by randomly allocating 600 cases to the various counties, with probabilities proportional to the county population. The null data are used to estimate the critical value of each test, that is, the cut-off point for significance. Two kinds of clustering models were evaluated: hot spot clusters and global chain clustering.

Hot spot clusters
Hot spot clusters were generated by setting the relative risk in some counties to be larger than 1. Three different sets of local clusters are constructed, in a rural, an urban and a mixed area respectively. Within each of these three sets, there are five clusters of different sizes, with 1, 2, 4, 8 and 16 counties respectively. The center of the rural cluster is Grand Isle County in Vermont. The center of the mixed cluster is Allegheny County (Pittsburgh) in Pennsylvania. The center of the urban cluster is New York County (Manhattan) in New York. The relative risks and counties included in each cluster are listed in Table 1.

In order to evaluate how the disease clustering tests perform when there are multiple hot spot clusters, the benchmark data also include 15 alternative models with two clusters and 5 models with three clusters, obtained by combining the original clusters. Within each model, all clusters have the same number of counties.

Table 1: The hot spot clusters

Counties  E[c|H0]  E[c|HA]  Relative risk  Counties included
Rural clusters
1         0.05     10       192.89         Grand Isle, VT
2         0.46     12       27.03          above + Franklin, VT
4         2.69     18       7.05           above + Clinton, NY, Chittenden, VT
8         4.16     22       5.35           above + Lamoille, VT, Washington, VT, Essex, NY, Addison, VT
16        7.32     28       3.90           above + Orleans, VT, Franklin, NY, Caledonia, VT, Orange, VT, Essex, VT, Rutland, VT, Warren, NY, Windsor, VT
Mixed clusters
1         14.43    39       2.85           Allegheny, PA
2         16.41    42       2.70           above + Washington, PA
4         22.52    51       2.40           above + Beaver, PA, Westmoreland, PA
8         27.47    58       2.24           above + Butler, PA, Armstrong, PA, Lawrence, PA, Fayette, PA
16        34.22    67       2.10           above + Greene, PA, Indiana, PA, Clarion, PA, Mercer, PA, Somerset, PA, Venango, PA, Cambria, PA, Jefferson, PA
Urban clusters
1         15.97    42       2.73           New York, NY
2         21.78    50       2.43           above + Hudson, NJ
4         59.99    100      1.81           above + Bronx, NY, Kings, NY
8         101.96   150      1.63           above + Queens, NY, Bergen, NJ, Essex, NJ, Richmond, NY
16        154.94   209      1.53           above + Union, NJ, Nassau, NY, Passaic, NJ, Rockland, NY, Westchester, NY, Morris, NJ, Middlesex, NJ, Monmouth, NJ

Global chain clustering
In the global chain clustering model, every county has the same expected number of cases under the null and alternative hypotheses. The counties are tied together sequentially on a chain that passes through the centroid of each county exactly once, after which it reconnects with the first county on the chain, forming a Hamiltonian cycle. A map of the Hamiltonian cycle used has previously been published [10].

To generate clusters, a certain number of cases are first located randomly on the map, according to the null hypothesis. These original cases then generate other new cases close by. If each original case generates one additional case, it is called twins.
If two additional cases are generated, it is called triplets. A total of 26 chain clustering models are constructed, with the distance between the twins (triplets) along the chain being either constant or exponentially distributed with different means. If the distance is zero, the twins (triplets) are in the same county. The chain does not imply that the disease itself spreads along the chain, only that twin and triplet cases are located in either of the two directions defined by the chain.

Test statistics
Notation
Denote $c_i$ as the number of cases in county $i$, $n_i$ as the population size of county $i$, $C$ as the total number of cases, $N$ as the total population size, $H$ as the total number of counties and $d_{ij}$ as the distance between counties $i$ and $j$. Let $D_j(i)$ be the total number of cases in county $i$ and its $j$ closest neighbors, and let $U_j(i)$ be the population size in county $i$ and its $j$ closest neighbors.

Besag-Newell's R
Besag-Newell's R statistic [15] has been used to study leukemia in upstate New York [16]. The test statistic is defined as

\[ R = \sum_{i=1}^{H} c_i \, I\big(P(M_i \le m_i) < 0.05\big), \]

where $M_i$ is a random variable denoting the minimum number of counties needed to have at least $k$ cases in county $i$ and its $M_i$ closest neighboring counties, and $m_i$ is the observed value of $M_i$, that is, $m_i = \min\{j : D_j(i) + 1 \ge k\}$. Here $k$ is a parameter set by the user. Usually, a large $k$ is more sensitive to large clusters and a small $k$ is more sensitive to small clusters. $I$ is the indicator function, with value 1 when $P(M_i \le m_i) < 0.05$ and 0 otherwise. $P(M_i \le m_i)$ is calculated by

\[ P(M_i \le m_i) = 1 - P(M_i > m_i) = 1 - \sum_{s=0}^{k-1} e^{-U_{m_i}(i)\,C/N} \, \frac{\big(U_{m_i}(i)\,C/N\big)^{s}}{s!}. \]

The null hypothesis of no clustering is rejected when R is large.

Cuzick-Edwards' k-NN
Cuzick-Edwards' k-NN (k-Nearest Neighbors) test [17] has been widely used, for example for leukemias and lymphomas among young people in New Zealand [18] and for the association of Ixodes pacificus with equine granulocytic ehrlichiosis in California [19]. This test statistic was originally designed for point data, but can easily be adapted for aggregated data. The test statistic is defined as

\[ T_k = \sum_{i} c_i \, g_{ik}, \]

where $k$ is a parameter chosen by the user and, for each county $i$, $g_{ik}$ denotes the number of the $k$ nearest neighbors that are cases. To be more precise, $g_{ik} = D_{h-1}(i) + t_h(i)$, where $h$ is chosen so that $U_{h-1}(i) \le k$ and $U_h(i) > k$, and

\[ t_h(i) = \frac{k - U_{h-1}(i)}{U_h(i) - U_{h-1}(i)} \, c_h. \]

$U_{-1}(i)$ is defined as 0. The null hypothesis of no clustering is rejected when $T_k$ is large.

The spatial scan statistic
The spatial scan statistic [20] has among other things been used to study human granulocytic ehrlichiosis near Lyme in Connecticut [21], soft-tissue sarcoma and non-Hodgkin's lymphoma clusters around a municipal solid waste incinerator with high dioxin emission levels [22], childhood mortality in rural Burkina Faso [23], bovine tuberculosis in Argentina [24] and Toxoplasma gondii infection of southern sea otters [25].

The spatial scan statistic imposes a circular window on the map and lets the circle centroid move across the study region. For any given position of the centroid, the radius of the window is changed continuously to take any value between zero and some upper limit. Let $L_j(i)$ be the likelihood under the alternative hypothesis that there is a cluster in county $i$ and its $j$ closest neighbors, and let $L_0$ be the likelihood under the null hypothesis. It can then be shown that

\[ \frac{L_j(i)}{L_0} = \left( \frac{D_j(i)}{U_j(i)\,C/N} \right)^{D_j(i)} \left( \frac{C - D_j(i)}{C - U_j(i)\,C/N} \right)^{C - D_j(i)}. \]

As this likelihood ratio is maximized over all circles, it identifies the one that constitutes the most likely cluster. The test statistic is

\[ T = \max_{i,j} \frac{L_j(i)}{L_0} \, I\big(D_j(i) > U_j(i)\,C/N\big), \]

where $I$ is the indicator function, with value 1 when $D_j(i) > U_j(i)\,C/N$ and 0 otherwise. The null hypothesis of no clustering is rejected when T is large.
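As a rough illustration of the scan statistic just defined, the following Python sketch evaluates the log of the likelihood ratio for every county and every nested set of nearest neighbors, and keeps the maximum. It is a minimal sketch rather than the implementation used for the benchmark study: the function name `scan_statistic`, the toy data, and the upper limit of 50% of the population on the window size are assumptions made here for illustration (the text only says "some upper limit"). Working on the log scale does not change which window is selected, since the logarithm is monotone.

```python
import numpy as np

def scan_statistic(cases, pop, coords, max_pop_frac=0.5):
    """Return (max log likelihood ratio, county indices of the most likely cluster).

    max_pop_frac is an assumed upper limit on the window size, expressed as a
    fraction of the total population; it is not taken from the paper.
    """
    cases = np.asarray(cases, dtype=float)
    pop = np.asarray(pop, dtype=float)
    coords = np.asarray(coords, dtype=float)   # planar (x, y) centroids
    C, N = cases.sum(), pop.sum()
    best_llr, best_zone = 0.0, None
    for i in range(len(cases)):
        # Order counties by distance from centroid i (county i itself comes first).
        dist = np.hypot(*(coords - coords[i]).T)
        order = np.argsort(dist, kind="stable")
        D = np.cumsum(cases[order])            # D_j(i): cases in i and its j nearest neighbors
        E = np.cumsum(pop[order]) * C / N      # U_j(i) * C / N: expected cases under H0
        for j in range(len(order)):
            if E[j] > max_pop_frac * C:        # window exceeds the assumed upper limit
                break
            d, e = D[j], E[j]
            if d <= e:                         # indicator I(D_j(i) > U_j(i) C / N)
                continue
            llr = d * np.log(d / e)
            if d < C:                          # second factor equals 1 when d == C
                llr += (C - d) * np.log((C - d) / (C - e))
            if llr > best_llr:
                best_llr, best_zone = float(llr), order[: j + 1].tolist()
    return best_llr, best_zone

# Toy example (hypothetical data): four equally sized counties on a line,
# with an excess of cases in the first one.
cases = [10, 1, 1, 0]
pop = [100, 100, 100, 100]
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
llr, zone = scan_statistic(cases, pop, coords)
print(llr, zone)
```

In practice the significance of the maximum would be assessed against the distribution of the same statistic over data sets simulated under the null hypothesis, as described for the benchmark data.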
Tango's Maximized Excess Events Test (MEET)
For a given parameter $\lambda$, the Excess Events Test statistic [11] is defined as

\[ EET(\lambda) = \sum_{i} \sum_{j} e^{-4 d_{ij}^{2} / \lambda^{2}} \left( c_i - n_i \frac{C}{N} \right) \left( c_j - n_j \frac{C}{N} \right). \]

The choice of $\lambda$ relates to the geographical scale of clustering. A large $\lambda$ makes the test sensitive to geographically large clusters, while a small $\lambda$ makes the test more sensitive to small clusters. To be able to detect clustering irrespective of its geographical scale, Tango [12] proposed the Maximized Excess Events Test (MEET)

\[ MEET = \min_{0 \le \lambda \le U} P\big(EET(\lambda) > eet(\lambda) \mid H_0, \lambda\big), \]

where $eet(\lambda)$ is the observed value of the Excess Events Test statistic conditional on $\lambda$, and $U$ is the upper limit on $\lambda$. Practical implementation of the test uses a 'line search' by discretization of $\lambda$, and the MEET statistic is evaluated using Monte Carlo hypothesis testing [26]. The null hypothesis of no clustering is rejected when the test statistic is small.

Swartz' entropy test
Swartz [27] proposed a test for spatial randomness based on the concept of entropy. The test statistic is defined [28] as

\[ T = \ln(C!) + \ln((N - C)!) - \sum_{i} \big( \ln(c_i!) + \ln((n_i - c_i)!) \big). \]

The null hypothesis of no clustering is rejected when T is small.

Moran's I
Moran's I [29] was originally proposed to analyze continuous data. Subsequently, this statistic has also been used to analyze count data, such as Lyme disease in New York State [30] and cancer incidence in Canada [31]. The Moran's I statistic is defined as

\[ I = \frac{\sum_{i=1}^{H} \sum_{j=i+1}^{H} (r_i - \bar{r})(r_j - \bar{r})\, a_{ij}}{\sum_{i=1}^{H} (r_i - \bar{r})^{2}}, \]

where $r_i = c_i / n_i$, $\bar{r} = \frac{1}{H} \sum_{i=1}^{H} r_i$, and

\[ a_{ij} = \begin{cases} 1 & \text{if counties } i \text{ and } j \text{ are neighbors,} \\ 0 & \text{if counties } i \text{ and } j \text{ are not neighbors.} \end{cases} \]

We also consider a modified version of Moran's I:

\[ I_{mod} = \sum_{i=1}^{H} \sum_{j=i+1}^{H} (r_i - \bar{r})(r_j - \bar{r})\, a_{ij}. \]

In both cases, we reject the null hypothesis of no clustering when the statistic is large.

Whittemore's test
Whittemore et al. [32] proposed the statistic

\[ T = \frac{1}{2} \sum_{i} \sum_{j} d_{ij}\, c_i c_j. \]

We reject the null hypothesis of no clustering when T is small.

Power calculation
For Besag-Newell's R, Cuzick-Edwards' k-NN, Swartz' entropy test, Moran's I and Whittemore's test, the power estimate is calculated using C++ code written by the first author. For the spatial scan statistic and Tango's MEET, the power estimates are obtained from the paper by Kulldorff et al. [10].

Results
Hot spot clusters
Table 2 shows the estimated power of the test statistics in detecting the hot spot clusters. For each type of hot spot cluster, the highest power is highlighted. The spatial scan statistic has good power in detecting all three kinds of hot spot clusters (rural, mixed and urban), and it performs best for the rural clusters. Tango's MEET performs best for the urban clusters, but not very well for the rural clusters.

With the right choice of parameter, Besag-Newell's R has the best power for detecting mixed clusters, but its power is very sensitive to the choice of parameter. The power of Cuzick-Edwards' k-NN also depends on the parameter. It has good power in detecting all three kinds of hot spot clusters with the right choice of parameter. The best choice of parameter depends on the size of the cluster. Usually, large parameter values perform better for large clusters, while small parameter values are better for small clusters.

Swartz' entropy test has good power in detecting the rural clusters, but not the mixed or urban clusters. Moran's I can detect the rural clusters except for the cluster with only one county. The modified Moran's I performs similarly to Moran's I, but better for the rural clusters, especially for the cluster with one county. Whittemore's test does not perform as well as the other test statistics in detecting hot spot clusters.
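The following Python sketch shows, under simplifying assumptions, how Moran's I and its modified version can be computed and how a power estimate of the kind reported in this paper can be obtained by simulation: null data sets give the critical value, and the power is the fraction of alternative data sets that exceed it. The toy geography (20 equally populated counties arranged on a ring, each adjacent to its two neighbors), the relative risk of 3, the numbers of simulated data sets, the 5% level and the names `moran_ring` and `estimate_power` are all assumptions chosen for illustration; they are not the benchmark data or the authors' C++ code.

```python
import numpy as np

def moran_ring(cases, pop):
    """Return (I, I_mod) for counties on a ring, where county i neighbors i-1 and i+1."""
    r = np.asarray(cases, dtype=float) / np.asarray(pop, dtype=float)  # r_i = c_i / n_i
    dev = r - r.mean()
    # Sum of (r_i - rbar)(r_j - rbar) over neighboring pairs, each pair counted once.
    i_mod = np.sum(dev * np.roll(dev, -1))
    return i_mod / np.sum(dev ** 2), i_mod

def estimate_power(stat, null_sets, alt_sets, alpha=0.05):
    """Monte Carlo power of a test that rejects for large values of stat."""
    null_vals = np.array([stat(x) for x in null_sets])
    crit = np.quantile(null_vals, 1.0 - alpha)     # critical value from the null data
    alt_vals = np.array([stat(x) for x in alt_sets])
    return float(np.mean(alt_vals > crit))

rng = np.random.default_rng(0)
H, C = 20, 600                                     # toy number of counties, 600 cases
pop = np.full(H, 1000.0)                           # assumed equal populations

# Null model: cases allocated with probability proportional to population.
p_null = pop / pop.sum()
# Hot spot alternative: assumed relative risk of 3 in three adjacent counties.
risk = np.ones(H)
risk[:3] = 3.0
p_alt = risk * pop / np.sum(risk * pop)

null_sets = rng.multinomial(C, p_null, size=2000)
alt_sets = rng.multinomial(C, p_alt, size=500)

power_i = estimate_power(lambda c: moran_ring(c, pop)[0], null_sets, alt_sets)
power_imod = estimate_power(lambda c: moran_ring(c, pop)[1], null_sets, alt_sets)
print(power_i, power_imod)
```

For a statistic that rejects for small values, such as Swartz' entropy test or Whittemore's test, the same recipe applies with the lower `alpha` quantile as the critical value and the comparison reversed.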
Table 2: Power of the test statistics for the hot spot clusters. BN: Besag-Newell's R with k = 6, 12, 30; CE: Cuzick-Edwards' k-NN with k = 100K, 500K, 1000K; Scan: spatial scan statistic; MEET: Tango's MEET; Swartz: Swartz' entropy test; I and Imod: Moran's I and modified Moran's I; Whit.: Whittemore's test.

Counties  BN k=6    BN k=12   BN k=30   CE 100K   CE 500K   CE 1000K  Scan      MEET      Swartz    I         Imod      Whit.
Rural (edge)
1         0.707     0.388     0.089     0.752     0.168     0.038     0.998     0.196     0.939     0.000     0.315     0.010
2         0.792     0.466     0.074     0.810     0.199     0.049     0.991     0.221     0.804     0.743     0.793     0.006
4         0.839     0.754     0.239     0.874     0.425     0.109     0.973     0.229     0.607     0.449     0.505     0.004
8         0.830     0.854     0.309     0.851     0.540     0.157     0.971     0.213     0.639     0.752     0.814     0.002
16        0.821     0.880     0.505     0.758     0.621     0.247     0.969     0.229     0.706     0.715     0.806     0.004
Mixed (corner)
1         0.037     0.023     0.983     0.648     0.919     0.899     0.936     0.925     0.270     0.053     0.045     0.000
2         0.129     0.024     0.989     0.655     0.886     0.913     0.939     0.896     0.289     0.059     0.051     0.000
4         0.157     0.095     0.980     0.645     0.822     0.931     0.937     0.838     0.269     0.078     0.061     0.000
8         0.217     0.222     0.956     0.608     0.777     0.903     0.941     0.817     0.291     0.130     0.099     0.000
16        0.293     0.284     0.914     0.598     0.715     0.838     0.949     0.832     0.354     0.193     0.165     0.000
Urban (central)
1         0.037     0.027     0.952     0.627     0.856     0.893     0.922     0.941     0.264     0.049     0.045     0.296
2         0.033     0.214     0.819     0.587     0.786     0.937     0.903     0.920     0.245     0.056     0.049     0.334
4         0.026     0.049     0.190     0.378     0.684     0.864     0.892     0.961     0.119     0.052     0.043     0.579
8         0.022     0.022     0.459     0.292     0.637     0.817     0.913     0.983     0.078     0.061     0.043     0.758
16        0.015     0.059     0.368     0.257     0.648     0.795     0.926     0.986     0.047     0.069     0.045     0.887
Rural and Mixed
1         0.624     0.270     0.981     0.956     0.943     0.860     1.000     0.964     0.975     0.000     0.310     0.000
2         0.803     0.356     0.987     0.969     0.926     0.891     0.999     0.952     0.923     0.727     0.780     0.000
4         0.841     0.739     0.983     0.977     0.930     0.929     0.997     0.930     0.813     0.460     0.508     0.000
8         0.867     0.906     0.972     0.973     0.939     0.916     0.996     0.931     0.849     0.767     0.823     0.000
16        0.857     0.938     0.970     0.949     0.939     0.891     0.996     0.941     0.914     0.729     0.810     0.000
Mixed and Urban
1         0.020     0.012     0.999     0.929     0.997     0.998     0.987     0.998     0.545     0.049     0.041     0.009
2         0.084     0.132     0.995     0.918     0.990     0.998     0.984     0.995     0.499     0.057     0.045     0.012
4         0.082     0.069     0.946     0.807     0.962     0.996     0.966     0.991     0.303     0.080     0.048     0.034
8         0.107     0.114     0.915     0.710     0.935     0.987     0.954     0.990     0.222     0.136     0.073     0.070
16        0.120     0.167     0.803     0.616     0.897     0.969     0.935     0.984     0.199     0.212     0.135     0.138
Rural and Urban
1         0.619     0.272     0.954     0.949     0.902     0.868     1.000     0.970     0.974     0.000     0.309     0.096
2         0.709     0.665     0.823     0.955     0.854     0.919     0.999     0.962     0.909     0.712     0.771     0.097
4         0.731     0.644     0.261     0.947     0.863     0.855     0.992     0.971     0.671     0.436     0.472     0.206
8         0.676     0.689     0.546     0.911     0.879     0.826     0.991     0.977     0.602     0.726     0.770     0.365
16        0.591     0.725     0.521     0.803     0.892     0.834     0.987     0.975     0.561     0.659     0.726     0.562
Rural, Mixed and Urban
1         0.541     0.185     0.998     0.992     0.998     0.994     1.000     0.999     0.991     0.000     0.291     0.002
2         0.735     0.565     0.993     0.994     0.994     0.997     1.000     0.998     0.960     0.697     0.755     0.001
4         0.735     0.611     0.949     0.987     0.984     0.995     0.996     0.994     0.799     0.433     0.456     0.003
8         0.728     0.759     0.922     0.972     0.980     0.984     0.992     0.989     0.759     0.730     0.769     0.007
16        0.642     0.766     0.840     0.909     0.962     0.962     0.977     0.983     0.744     0.672     0.737     0.023

All the test statistics have good power for multiple hot spot clusters except Whittemore's test and Moran's I. The spatial scan statistic, Tango's MEET and Cuzick-Edwards' k-NN perform very well in detecting multiple clusters.

Global chain clustering
Table 3 shows the estimated power of the test statistics for global chain clustering. The highest power for each type of global clustering is highlighted. Note that as the distance between the cases increases, there is less clustering in the data, and all tests have lower power.

For most alternative models, Tango's MEET has the highest power. The spatial scan statistic performs well, but not as well as Tango's MEET. Swartz' entropy test is good when the distance is small, but its power decreases very quickly as the distance increases. Besag-Newell's R, Moran's I and Whittemore's test are not very good at detecting global clustering. With the right choice of parameter, Cuzick-Edwards' k-NN performs very well, especially for clustering with small distances. Large parameter values tend to detect clustering with large distances, while small parameter values perform better for clustering with small distances. The performance of the test statistics for twins clustering with fixed and exponential distance is similar. All test statistics have better power in detecting triplet clustering, since there is more clustering there.

Table 3: Power of the test statistics for the global chain clustering. Column labels as in Table 2.

Distance  BN k=6    BN k=12   BN k=30   CE 100K   CE 500K   CE 1000K  Scan      MEET      Swartz    I         Imod      Whit.
Twins, no distance
0.00      0.477     0.491     0.423     1.000     0.925     0.728     0.791     0.990     0.999     0.049     0.136     0.132
Twins, fixed distance
0.005     0.076     0.242     0.332     0.488     0.644     0.570     0.392     0.624     0.357     0.116     0.101     0.128
0.01      0.057     0.077     0.231     0.159     0.319     0.383     0.285     0.406     0.143     0.078     0.068     0.122
0.02      0.060     0.060     0.118     0.077     0.107     0.154     0.194     0.264     0.079     0.056     0.054     0.116
0.04      0.061     0.054     0.055     0.060     0.065     0.067     0.124     0.174     0.062     0.051     0.050     0.097
0.08      0.056     0.054     0.050     0.059     0.059     0.058     0.080     0.109     0.059     0.050     0.051     0.073
0.16      0.060     0.051     0.042     0.059     0.056     0.045     0.055     0.059     0.060     0.053     0.054     0.053
Twins, exponential distance
0.005     0.212     0.314     0.351     0.820     0.709     0.587     0.452     0.738     0.642     0.182     0.179     0.127
0.01      0.140     0.210     0.274     0.534     0.525     0.466     0.351     0.556     0.386     0.133     0.121     0.122
0.02      0.096     0.134     0.191     0.284     0.314     0.309     0.262     0.378     0.210     0.094     0.086     0.112
0.04      0.076     0.097     0.121     0.144     0.171     0.184     0.185     0.250     0.127     0.071     0.067     0.102
0.08      0.063     0.074     0.091     0.086     0.104     0.111     0.124     0.166     0.081     0.063     0.058     0.085
0.16      0.059     0.063     0.061     0.062     0.070     0.074     0.080     0.107     0.064     0.054     0.053     0.071
Triplets, no distance
0.00      0.742     0.780     0.716     1.000     0.999     0.964     0.995     1.000     1.000     0.052     0.196     0.188
Triplets, fixed distance
0.005     0.088     0.333     0.587     0.715     0.885     0.856     0.674     0.884     0.559     0.178     0.148     0.179
0.01      0.064     0.092     0.368     0.228     0.470     0.587     0.491     0.646     0.202     0.102     0.087     0.171
0.02      0.067     0.062     0.149     0.092     0.132     0.212     0.318     0.430     0.098     0.065     0.060     0.149
0.04      0.060     0.063     0.057     0.076     0.079     0.084     0.189     0.265     0.072     0.054     0.053     0.118
0.08      0.058     0.056     0.044     0.069     0.063     0.057     0.102     0.141     0.066     0.052     0.048     0.078
0.16      0.060     0.057     0.035     0.066     0.053     0.044     0.046     0.050     0.064     0.053     0.053     0.043
Triplets, exponential distance
0.005     0.315     0.524     0.629     0.977     0.939     0.867     0.762     0.960     0.884     0.317     0.314     0.176
0.01      0.185     0.323     0.473     0.786     0.773     0.721     0.610     0.826     0.598     0.200     0.185     0.170
0.02      0.118     0.184     0.303     0.438     0.489     0.490     0.436     0.599     0.315     0.127     0.117     0.154
0.04      0.084     0.110     0.180     0.202     0.251     0.272     0.289     0.390     0.161     0.085     0.077     0.135
0.08      0.073     0.075     0.099     0.110     0.118     0.139     0.171     0.226     0.098     0.062     0.060     0.102
0.16      0.060     0.066     0.065     0.070     0.071     0.078     0.091     0.115     0.067     0.053     0.054     0.071

Discussion
Of the evaluated test statistics, Besag-Newell's R, Cuzick-Edwards' k-NN, the spatial scan statistic, MEET and Whittemore's test are based on Euclidean distances. Moran's I is based on the adjacencies of counties. Swartz' entropy test does not use the spatial relationship among the counties.

The M statistic [33] proposed by Bonetti and Pagano is a nonparametric test that uses the interpoint distance distribution to study the spatial pattern of the data. The M statistic has also been evaluated using the same benchmark data [10]. The M statistic does well for mixed and urban clusters and has good power in detecting multiple clusters. Generally, it does not perform quite as well as the spatial scan statistic and MEET, but it is very competitive compared to the other tests.
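The remark that Swartz' entropy test does not use the spatial relationship among the counties, while distance-based statistics such as Whittemore's do, can be made concrete with a small numerical sketch. The Python code below evaluates both statistics as defined in the Methods on a hypothetical six-county example; the data, the coordinates and the function names are assumptions made here for illustration only.

```python
import math
import numpy as np

def swartz_entropy(cases, pop):
    """T = ln(C!) + ln((N - C)!) - sum_i (ln(c_i!) + ln((n_i - c_i)!)), via log-gamma."""
    C, N = sum(cases), sum(pop)
    total = math.lgamma(C + 1) + math.lgamma(N - C + 1)
    for c, n in zip(cases, pop):
        total -= math.lgamma(c + 1) + math.lgamma(n - c + 1)
    return total

def whittemore(cases, coords):
    """T = (1/2) * sum_i sum_j d_ij c_i c_j, with Euclidean distances d_ij."""
    c = np.asarray(cases, dtype=float)
    xy = np.asarray(coords, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    return float(0.5 * c @ d @ c)

# Hypothetical data: six counties on a line, most cases in the two leftmost counties.
cases = [8, 7, 1, 0, 0, 1]
pop = [100, 120, 80, 90, 110, 100]
clustered = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
# Same counties and counts, but the second and last counties swap locations,
# so the two high-count counties are now far apart.
scattered = [(0, 0), (5, 0), (2, 0), (3, 0), (4, 0), (1, 0)]

# The entropy statistic takes no coordinates at all: relabeling the counties
# (here, reversing their order) leaves it unchanged.
t_entropy = swartz_entropy(cases, pop)
t_entropy_relabelled = swartz_entropy(cases[::-1], pop[::-1])

# Whittemore's statistic is smaller when the cases are geographically close together.
t_whit_clustered = whittemore(cases, clustered)
t_whit_scattered = whittemore(cases, scattered)
print(t_entropy, t_entropy_relabelled, t_whit_clustered, t_whit_scattered)
```

Because the entropy statistic depends only on the pairs of case count and population size, it can react to excess variation in county rates but not to where the affected counties lie on the map.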
Besag-Newell's R and Cuzick-Edwards' k-NN are good test statistics, but their power depends strongly on the parameter. Usually, a large parameter value makes the test statistic more sensitive to large-scale clustering, whereas a small parameter value detects small-scale clustering better. So if we know the scale of clustering and choose a corresponding parameter, these two test statistics may have good power. In practice, we usually do not know the scale of clustering, and trying several parameter values creates a multiple testing problem.

Sometimes we need to adjust the analysis for age or other covariates. All the test statistics considered here can incorporate such adjustment except Swartz' entropy test, although it can be modified to do so.

In terms of data resolution, Besag-Newell's R, Whittemore's test, Tango's MEET and Swartz' entropy test were originally proposed to analyze aggregated data, while Cuzick-Edwards' k-NN was proposed to analyze point data. The spatial scan statistic was proposed to analyze either aggregated or point data. Moran's I was designed for continuous data, but has been used extensively for aggregated count data as well. It is possible, and maybe even likely, that these test statistics perform differently when applied to point data.

A strength of this power evaluation study is that the data are typical of epidemiological applications and use actual population and geographical data. The power of the test statistics depends not only on the alternative model, though, but also on the spatial distribution of the areas and the population size in each area. A limitation of the study is that the background population of the benchmark data is from only one particular region, the female population of the Northeastern United States.
Under other alternative models and background populations, some test statistics may perform better or worse.

Conclusion
The power varies greatly for different disease clustering test statistics. Consideration of the power is important before deciding which test statistic to use. If the size or scale of clustering is known, it is worth considering the use of Besag-Newell's R or Cuzick-Edwards' k-NN. If not, we feel confident recommending the spatial scan statistic for the detection of local clusters and Tango's MEET for the general evaluation of clustering throughout the map. Other tests may be equally good or better for alternative models not considered in this paper.

List of abbreviations
k-NN: k-Nearest Neighbors.
MEET: Maximized Excess Events Test.

Authors' contributions
CH and MK jointly designed the study and chose the methods for evaluation. CH programmed the C++ code, carried out the power simulations and wrote the first draft of the manuscript. Both authors interpreted the results and wrote the final version of the paper.

Acknowledgements
This research was funded by NCI grant number RO1CA095979-01.

References
1. RuizGarcia M: Genetic relationships among some new cat populations sampled in Europe: A spatial autocorrelation analysis. Journal of Genetics 1997, 76:1-24.
2. Gustine DL, Elwinger GF: Spatiotemporal genetic structure within white clover populations in grazed swards. Crop Science 2003, 43:337-344.
3. Aubry P, Piegay H: Spatial autocorrelation analysis in geomorphology: Definitions and tests. Géographie physique et Quaternaire 2001, 55:111-129.
4. Meirmans PG, Vlot EC, Den Nijs JCM, Menken SBJ: Spatial ecological and genetic structure of a mixed population of sexual diploid and apomictic triploid dandelions. Journal of Evolutionary Biology 2003, 16:343-352.
5. Liebhold AM, Gurevitch J: Integrating the statistical analysis of spatial data in ecology. Ecography 2002, 25:553-557.
6. Clark SA, Richardson BJ: Spatial analysis of genetic variation as a rapid assessment tool in the conservation management of narrow-range endemics. Invertebrate Systematics 2002, 16:583-587.
7. Rogerson PA: The detection of clusters using a spatial version of the chi-square goodness-of-fit statistic. Geographical Analysis 1999, 31:130-147.
8. Kulldorff M, Nagarwalla N: Spatial disease clusters: Detection and inference. Statistics in Medicine 1995, 14:799-810.
9. Oden N: Adjusting Moran's I for population density. Statistics in Medicine 1995, 14:17-26.
10. Kulldorff M, Tango T, Park P: Power comparisons for disease clustering tests. Computational Statistics and Data Analysis 2003, 42:665-684.
11. Tango T: A class of tests for detecting 'general' and 'focused' clustering of rare diseases. Statistics in Medicine 1995, 14:2323-2334.
12. Tango T: A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine 2000, 19:191-204.
13. Vach W: Locally optimal tests on spatial clustering. In New Approaches in Classification and Data Analysis. Edited by: Diday. Berlin, Springer-Verlag; 1994:161-168.
14. Tango T: Comparison of general tests for spatial clustering. In Disease Mapping and Risk Assessment for Public Health. Edited by: Lawson, et al. London, Wiley; 1999:111-117.
15. Besag J, Newell J: The detection of clusters in rare diseases. Journal of the Royal Statistical Society 1991, A154:143-155.
16. Waller LA, Turnbull BW, Clark LC, Nasca P: Spatial pattern analyses to detect rare disease clusters. In Case Studies in Biometry. Edited by: Lange N, Ryan L, Billard L, Brillinger D, Conquest L, Greenhouse J. New York: John Wiley & Sons; 1994:13-16.
17. Cuzick J, Edwards R: Spatial clustering for inhomogeneous populations. Journal of the Royal Statistical Society 1990, B52:73-104.
18. Dockerty JD, Sharples KJ, Borman B: An assessment of spatial clustering of leukaemias and lymphomas among young people in New Zealand. Journal of Epidemiology and Community Health 1999, 53:154-158.
19. Vredevoe LK, Righter PJ, Madigan JE, Kimsey RB: Association of Ixodes pacificus (Acari: Ixodidae) with the spatial and temporal distribution of equine granulocytic ehrlichiosis in California. Journal of Medical Entomology 1999, 36:551-561.
20. Kulldorff M: A spatial scan statistic. Communications in Statistics: Theory and Methods 1997, 26:1481-1496.
21. Chaput EK, Meek JI, Heimer R: Spatial analysis of human granulocytic ehrlichiosis near Lyme, Connecticut. Emerging Infectious Diseases 2002, 8:943-948.
22. Viel JF, Arveux P, Baverel J, Cahn JY: Soft-tissue sarcoma and non-Hodgkin's lymphoma clusters around a municipal solid waste incinerator with high dioxin emission levels. American Journal of Epidemiology 2000, 152:13-19.
23. Sankoh OA, Ye Y, Sauerborn R, Muller O, Becher H: Clustering of childhood mortality in rural Burkina Faso. International Journal of Epidemiology 2001, 30:485-492.
24. Perez AM, Ward MP, Torres P, Ritacco V: Use of spatial statistics and monitoring data to identify clustering of bovine tuberculosis in Argentina. Preventive Veterinary Medicine 2002, 56:63-74.
25. Miller MA, Gardner IA, Kreuder C, Paradies DM, Worcester KR, Jessup DA, Dodd E, Harris MD, Ames JA, Packham AE, Conrad PA: Coastal freshwater runoff is a risk factor for Toxoplasma gondii infection of southern sea otters (Enhydra lutris nereis). International Journal for Parasitology 2002, 32:997-1006.
26. Dwass M: Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics 1957, 28:181-187.
27. Swartz JB: An entropy-based algorithm for detecting clusters of cases and controls and its comparison with a method using nearest neighbors. Health and Place 1998, 4:67-77.
28. Kulldorff M: Letter to the editor. Health and Place 1999, 5:313.
29. Moran PAP: Notes on continuous stochastic phenomena. Biometrika 1950, 37:17-23.
30. Glavanakov S, White DJ, Caraco T, Lapenis A, Robinson GR, Szymanski BK, Maniatty WA: Lyme disease in New York State: Spatial pattern at a regional scale. American Journal of Tropical Medicine and Hygiene 2001, 65:538-545.
31. Le ND, Marret LD, Roberson DL, Semenciw RM, Turner D, Walter SD: Canadian Cancer Incidence Atlas. Canadian Government Publishing; 1995.
32. Whittemore AS, Friend N, Brown BW, Holly EA: A test to detect clusters of disease. Biometrika 1987, 74:631-635.
33. Bonetti M, Pagano M: On detecting clustering. Proceedings of the Biometrics Section, American Statistical Association 2001:24-33.