Compensation between Model Feedbacks and Curtailment of Climate Sensitivity

PETER HUYBERS
Harvard University, Cambridge, Massachusetts

(Manuscript received 11 August 2009, in final form 30 December 2009)

ABSTRACT

The spread in climate sensitivity obtained from 12 general circulation model runs used in the Fourth Assessment of the Intergovernmental Panel on Climate Change indicates a 95% confidence interval of 2.1°–5.5°C, but this reflects compensation between model feedbacks. In particular, cloud feedback strength negatively covaries with the albedo feedback as well as with the combined water vapor plus lapse rate feedback. If the compensation between feedbacks is removed, the 95% confidence interval for climate sensitivity expands to 1.9°–8.0°C. Neither of the quoted 95% intervals adequately reflects the understanding of climate sensitivity, but their differences illustrate that model interdependencies must be understood before model spread can be correctly interpreted. The degree of negative covariance between feedbacks is unlikely to result from chance alone. It may, however, result from the method by which the feedbacks were estimated, from physical relationships represented in the models, or from conditioning the models upon some combination of observations and expectations. This compensation between model feedbacks—when taken together with indications that variations in radiative forcing and the rate of ocean heat uptake play a similar compensatory role in models—suggests that conditioning of the models acts to curtail the intermodel spread in climate sensitivity. Observations used to condition the models ought to be explicitly stated, or there is the risk of doubly calling on data for purposes of both calibration and evaluation. Conditioning the models upon individual expectation (e.g., anchoring to the Charney range of 3° ± 1.5°C), to the extent that it exists, greatly complicates statistical interpretation of the intermodel spread.

1. Introduction

Collections of global climate model runs are the backbone of efforts to predict future climate, as most recently represented by the Coupled Model Intercomparison Project 3 (CMIP3) (Meehl et al. 2007), which collected together the model runs used in the Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC AR4). Although these model runs were not designed to span the full range of uncertainty, are not fully independent, and are not identically forced (e.g., Knutti et al. 2010), they do offer some indication of the range of future climate states. If we are to correctly interpret such an ensemble of opportunity, it is first necessary to determine the interdependence between the models and what range of uncertainty is covered by the ensemble.

An important interdependence between radiative forcing and climate sensitivity across the CMIP3 models was identified by Schwartz et al. (2007), who noted that, while twentieth-century changes in radiative forcing differ by a factor of 4 (0.6 to 2.4 W m⁻², 5%–95% confidence limits) across the models, the resulting temperature spread differs by only a factor of 2.
Although a linear relationship between radiative forcing and temperature is not expected—for example, because of long adjustment time scales—this ratio of differences nonetheless suggests compensation between various model components. Kiehl (2007) then presented evidence that this narrow temperature range results from an anticorrelation between radiative forcing and climate sensitivity, and Knutti (2008) demonstrated that this anticorrelation holds for the CMIP3 models in particular. Differences in radiative forcing arise from how aerosols are treated. Thus, the CMIP3 models approximate the twentieth-century warming through differing balances between radiative forcing and climate sensitivity.

Intermodel compensation between climate sensitivity and radiative forcing (Schwartz et al. 2007; Kiehl 2007; Knutti 2008) underscores that the models are not based purely on theory but are also conditional upon observations and, possibly, expectations. It has been noted that the aerosol tuning of twentieth-century simulations has little influence on the spread of future climate predictions because the radiative forcing from atmospheric CO2 comes to dominate over aerosols in the emissions scenarios (Kiehl 2007; Knutti 2008). The question arises, however, whether other features of the models are also tuned and how these influence the spread in climate predictions. Webb et al. (2006) observed that the radiative forcing associated with a doubling of CO2 and the climate sensitivity are anticorrelated across the models in the Cloud Feedback Model Intercomparison Project (McAvaney and Le Treut 2003), which suggests that the tuning of radiative forcing extends beyond aerosols and has consequences for the spread across predictions. Furthermore, Raper et al. (2002) noted that differences in the efficiency of heat uptake across the models in the second Coupled Model Intercomparison Project (CMIP2) give a more similar transient climate sensitivity across models than is expected from purely physical considerations. Following these indications that variations in the radiative forcing and ocean heat uptake across models act to narrow the spread in climate sensitivity, variations in the strength of feedbacks across the CMIP3 models are explored here to see whether these also act to curtail the spread in climate sensitivity.

2. Feedbacks and their covariance

Feedbacks are variously defined in the literature, making it useful to recap the notation used here, which follows the standard definition from the electronics literature. The relationship between changes in radiative forcing and temperature can be represented as a linear feedback system,

ΔT = λ₀ΔR + f ΔT,

where perturbations in radiative forcing (ΔR, in W m⁻²) lead to direct changes in temperature (ΔT, in °C) according to the basic climate sensitivity (λ₀, in °C per W m⁻²), as well as through feedbacks (f_x, which are unitless). The feedback factors are linearly additive, and those associated with water vapor, the vertical lapse rate, albedo, and clouds are considered: f_net = f_wv + f_lr + f_a + f_c. The mean and variance of f_net then depend on the joint probability distribution relating each feedback to the others, a topic returned to later. Solving for ΔT yields the expression

ΔT = λ₀ΔR / (1 − f_net).   (1)

This representation is based on the assumption that the earth's temperature changes can be modeled as a linear perturbation and obviously breaks down for f_net ≥ 1.
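To make the leverage of f_net concrete, consider a worked example of Eq. (1). The round values λ₀ = 0.31°C per W m⁻² and ΔR = 3.7 W m⁻² are the ones assumed later in section 4d; they are inputs to this illustration, not additional results:

```latex
% Eq. (1) evaluated for two values of the net feedback:
\Delta T = \frac{\lambda_0\,\Delta R}{1 - f_{\mathrm{net}}}
         = \frac{0.31 \times 3.7}{1 - 0.6} \approx 2.9^{\circ}\mathrm{C},
\qquad
\frac{0.31 \times 3.7}{1 - 0.7} \approx 3.8^{\circ}\mathrm{C}.
```

A 0.1 increase in f_net thus raises ΔT by nearly a degree, and the amplification grows without bound as f_net approaches 1, which is why spread in f_net maps so asymmetrically into spread in climate sensitivity.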
I rely upon the feedbacks estimated for the CMIP3 models by Soden and Held (2006; see also Fig. 1, Table 1, and the appendix herein), who considered results using the A1B emission scenario. Note that Soden and Held did not compute a climate sensitivity for the Goddard Institute for Space Studies Atmosphere–Ocean Model (GISS AOM) or the GISS Model E-H (GISS EH) because these were only run out to 2100 AD, and these models are excluded from the present analysis.

Soden and Held (2006) define feedback parameters as g_x = ΔR_x/ΔT_x and include the basic model response to changes in radiative forcing as a feedback. To convert to the formulation introduced above, the basic climate sensitivity is obtained as λ₀ = 1/g_p, where g_p is Soden and Held's Planck feedback. The feedback parameters for each model are then obtained as f_x = λ₀g_x [see Fig. 1 and Table 1 herein, as well as Bony et al. (2006) and Roe and Baker (2007), for more detailed discussion].

Anticorrelation between the water vapor feedback and the lapse rate feedback is expected on physical grounds (e.g., Cess 1975; Held and Soden 2000). For example, a less steep lapse rate (a negative feedback) implies relatively greater warming aloft and, by the Clausius–Clapeyron relationship, more upper-tropospheric water vapor (a positive feedback). Thus, as is common, these two feedbacks are added together to form a single water vapor plus lapse rate feedback, f_wv+lr. Note, however, that it can be questioned whether the anticorrelation between these feedbacks is an artifact of the models (Bony et al. 2006), possibly because coarse vertical resolution leads to a poor representation of changes in water vapor (Tompkins and Emanuel 2000). Whether other feedbacks ought to covary in one or another direction is less clear and will be taken up in greater detail below.

The variance in the net feedback across the 12 CMIP3 models, var(f_net), is 0.0082, whereas the variances in the individual feedbacks are var(f_a) = 0.0004, var(f_wv+lr) = 0.0014, and var(f_c) = 0.014. The variance in the cloud feedback, f_c, is almost double the net variance, indicating that the other feedbacks compensate for variability in f_c. Indeed, the cross-correlation between f_c and f_wv+lr is −0.7, and the cross-correlation between f_c and f_a is −0.4 (see Fig. 1). The anticovariance between f_c and f_a and between f_c and f_wv+lr is actually larger than the variance associated with f_a and f_wv+lr, respectively (see Table 2). If the covariance between individual feedbacks is suppressed and the individual feedback variances are simply added together, the variance of f_net becomes 0.016, double the value obtained when covariance is included.

[Fig. 1. Feedback values from the CMIP3 collection of models (Soden and Held 2006). (a) The individual and net feedback factors for 12 climate models, ordered according to the strength of the net feedback. The cloud feedback plotted against (b) the albedo feedback and (c) the combined lapse rate and water vapor feedback.]

Clouds appear to be the principal source of uncertainty in the models (e.g., Soden and Held 2006), as follows from the variance in f_c being more than an order of magnitude larger than the variance in f_a or f_wv+lr, but variance alone is an insufficient description of differences in feedbacks across models. The covariance between clouds and the other feedbacks sums to −0.0082. Thus, the cloud covariance compensates for more than half of the cloud variance and, by coincidence, is very nearly equal in magnitude, albeit opposite in sign, to the net variance, var(f_net) = 0.0082. Both the variance in f_c and the covariance between f_c and the other feedbacks therefore appear to be leading-order terms in determining f_net.
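These variance and covariance figures can be checked directly against Table 1. The following sketch (Python with NumPy; the three arrays are transcribed from the rounded Table 1 values, so results agree with the quoted numbers only to within rounding) decomposes var(f_net) into its variance and covariance contributions:

```python
import numpy as np

# Feedback factors for the 12 CMIP3 models, transcribed from Table 1
# (Soden and Held 2006): albedo, clouds, and water vapor plus lapse rate.
f_a = np.array([0.11, 0.05, 0.11, 0.08, 0.10, 0.06,
                0.10, 0.10, 0.07, 0.07, 0.10, 0.09])
f_c = np.array([0.04, 0.20, 0.06, 0.07, 0.11, 0.25,
                0.21, 0.25, 0.34, 0.33, 0.34, 0.37])
f_wl = np.array([0.33, 0.25, 0.34, 0.37, 0.33, 0.26,
                 0.32, 0.29, 0.29, 0.31, 0.28, 0.27])

C = np.cov([f_a, f_c, f_wl])      # 3x3 sample covariance matrix, cf. Table 2
var_net = C.sum()                 # var(f_net) = sum over all entries, ~0.008
var_nocov = np.trace(C)           # variances alone, covariance suppressed, ~0.016
print(np.round(10000 * C))        # entries multiplied by 10,000, as in Table 2
print(var_net, var_nocov)
print(np.corrcoef([f_a, f_c, f_wl]).round(1))  # cloud pairings near -0.4 and -0.7
```

The identity var(f_net) = sum of all covariance-matrix entries is what makes the off-diagonal (compensation) terms directly comparable to the diagonal (variance) terms.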
Colman (2003a) also collected estimates of climate feedbacks from various models but, because the feedbacks were estimated using different methods, their intermodel variance is more difficult to interpret than the results of Soden and Held (2006). It is nonetheless notable that Colman's results indicate that the intermodel variance of f_net is nearly three times larger when the covariance between f_c, f_a, and f_wv+lr is suppressed, indicating that substantial compensation also occurs between the estimated feedbacks in those models.

3. Climate sensitivity

The CMIP3 ensemble of models is not designed to capture the full range of uncertainty in climate predictions, but it is still instructive to examine the implications that this ensemble has for the distribution of climate sensitivity. Climate sensitivity is defined as the change in temperature, ΔT, in response to ΔR_2×, the radiative forcing expected from a doubling of atmospheric CO2. An indication of the distribution of climate sensitivity can be obtained from the sample distribution of f_net. For illustrative purposes, f_net is assumed to follow a normal distribution characterized by the sample mean and variance obtained from the 12 CMIP3 models. (A Lilliefors test for the normality of the 12 net feedback values yields a p value of 0.34; thus, normality cannot be rejected, but this is a weak result given the small amount of data.) The assumption of normality is not ideal because it implicitly assigns nonzero probability to infinite climate sensitivity, and it does not correctly represent the probability of negative climate sensitivity.

Table 1. Columns are the basic response of the system to a change in radiative forcing, λ₀; the albedo, cloud, and combined lapse rate plus water vapor feedbacks; and the sum of the feedbacks. Rows correspond to individual models. All values are adapted from Soden and Held (2006, Table 1). Model names are listed in the appendix.

Model           λ₀     Albedo   Clouds   wv+lr   Net feedback
NCAR CCSM3      0.31   0.11     0.04     0.33    0.49
GISS ER         0.31   0.05     0.20     0.25    0.49
NCAR PCM1       0.31   0.11     0.06     0.34    0.51
MRI             0.31   0.08     0.07     0.37    0.53
INMCM3          0.31   0.10     0.11     0.33    0.54
GFDL CM2-1      0.31   0.06     0.25     0.26    0.58
GFDL CM2-0      0.31   0.10     0.21     0.32    0.63
CNRM            0.31   0.10     0.25     0.29    0.64
UKMO HADCM3     0.31   0.07     0.34     0.29    0.70
IPSL            0.32   0.07     0.33     0.31    0.70
MIROC MEDRES    0.32   0.10     0.34     0.28    0.72
MPI ECHAM5      0.31   0.09     0.37     0.27    0.73

Table 2. The covariance between feedbacks, the sums of variance (right column and bottom row), and the net variance (bottom right). All variances and covariances are multiplied by 10,000 and rounded. Also shown in parentheses are the cross-correlations between pairs of feedbacks. Note that the albedo and the combined water vapor plus lapse rate feedback each have a covariance with the cloud feedback that exceeds their individual variance. Models with higher albedo or combined lapse rate plus water vapor feedbacks thus actually tend to have a lower climate sensitivity.

          Albedo        Clouds        wv+lr         Net
Albedo    4 (1)         −10 (−0.4)    4 (0.6)       −1
Clouds    −10 (−0.4)    139 (1)       −31 (−0.7)    97
wv+lr     4 (0.6)       −31 (−0.7)    14 (1)        −13
Net       −1            97            −13           82
Weitzman (2009a) discusses the implications of very large climate sensitivity under more reasonable assumptions regarding the probability distribution, and Frame et al. (2005) and Annan and Hargreaves (2009) discuss how the choice of priors and distributional forms can influence the resulting estimates of climate sensitivity.

Assuming normality, the distribution of f_net can be converted into a distribution for climate sensitivity following Roe and Baker (2007); see Fig. 2. The observed mean and variance of the net feedback [mean(f_net) = 0.6, var(f_net) = 0.008] give a distribution of climate sensitivity with a 95% confidence range between 2.0° and 5.5°C, whereas the net variance obtained without feedback covariance [mean(f_net) = 0.6, var(f_net) = 0.016] gives a range from 1.9° to 8.0°C. The wider distribution of climate sensitivity is more consistent with the climateprediction.net results (Stainforth et al. 2005) and parallels how Roe and Baker (2007) estimated uncertainty across the CMIP3 models. Note that the length and fatness of the tail of the climate sensitivity distribution are particularly sensitive to changes in feedback uncertainty because of how feedback variance asymmetrically maps into climate sensitivity (Hansen et al. 1985; Roe and Baker 2007), with the upper 95% bound increasing by 2.5°C. Often climate sensitivity is reported with a 90% confidence interval, but 95% is also a standard statistical choice; although this emphasizes the range where the distribution is more poorly understood (e.g., Annan and Hargreaves 2009), it is nonetheless perhaps of greater societal relevance (Weitzman 2009b).

The two distributions of climate sensitivity considered here are illustrative of the importance of the covariance terms, but neither is an acceptable estimate. In addition to the uncertainty in the functional form of the distributions, these estimates also come with the limitations of the CMIP3 ensemble, some of which were noted earlier. Additionally, the CMIP3 models are not independent of one another—both specifically (Tebaldi and Knutti 2007) and generally, in that the assumptions, numerical approaches, and training of the modelers widely overlap—thus biasing the feedback variance low relative to that expected from independent realizations. Knutti et al. (2010, and references therein) show that the CMIP3 representations of 1980–99 surface air temperature contain systematic biases such that averaging across the various models reduces the rms error by less than half, whereas an approximately fourfold reduction is expected for independent errors. Further, the ensemble spread is curtailed by the omission of ice shelf, carbon cycle, and other processes and, arguably, is widened by ignoring observational and other constraints upon climate sensitivity (e.g., Edwards et al. 2007; Knutti and Hegerl 2008; Urban and Keller 2009; Annan and Hargreaves 2009).

Nonetheless, the enormous attention given to the model indications of climate sensitivity and the spread between these predictions, coupled with a sensitivity to the degree of covariance between feedbacks, suggests that inquiring into the origins of feedback covariance is worthwhile. Below I analyze the covariance between cloud and other feedbacks using some simple statistical tests. A more complete analysis would involve diagnosing the origins of feedback covariance within and across the CMIP3 models.

[Fig. 2. Climate sensitivity distribution. (a) The probability distribution for climate sensitivity associated with a mean feedback of 0.6 and a variance of 0.008 (solid lines) or 0.016 (dashed lines). The higher variance results from assuming that the cloud, albedo, and combined water vapor and lapse rate feedbacks are independent. Vertical lines indicate the 95% intervals for each distribution. The positive skew of the probability distribution leads to a large 2.5°C shift in the upper 95% bound but little change at the lower bound. (b) As in (a) but for the cumulative probability.]
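The mapping from the f_net distribution to the intervals shown in Fig. 2 is straightforward to approximate by Monte Carlo. The sketch below assumes λ₀ = 0.31°C per W m⁻² and ΔR_2× = 3.7 W m⁻² (the round values used in section 4d); because the quoted intervals depend on the precise λ₀ and ΔR_2× adopted, the sampled bounds land near, but not exactly on, the 2.0°–5.5°C and 1.9°–8.0°C ranges:

```python
import numpy as np

rng = np.random.default_rng(0)
lam0, dR2x = 0.31, 3.7                 # assumed basic sensitivity and 2xCO2 forcing

for var in (0.008, 0.016):             # with and without feedback covariance
    f = rng.normal(0.6, np.sqrt(var), 1_000_000)
    f = f[f < 1]                       # drop the nonphysical runaway tail, f_net >= 1
    S = lam0 * dR2x / (1 - f)          # Eq. (1) applied to the doubled-CO2 forcing
    lo, hi = np.percentile(S, [2.5, 97.5])
    print(f"var = {var}: 95% range {lo:.1f} to {hi:.1f} deg C")
```

Doubling the feedback variance moves the upper bound far more than the lower one, reproducing the asymmetry emphasized by Roe and Baker (2007).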
4. Origins of the covariance

There appear to be four possible explanations for how the overall negative covariance between feedbacks could arise: by chance, because of how the feedbacks are estimated, because model parameterizations of the physics inherently result in negative covariance, or through conditioning the models upon observations or expectations. These possibilities are not exclusive of one another.

a. Covariance by chance

What are the odds that the covariance observed between the feedbacks is truly zero and merely arises from chance fluctuations? An analytical approach to assessing these odds would involve modeling the covariance matrix and requires assumptions regarding the underlying feedback distributions. Instead, it seems preferable to use a bootstrap method that takes advantage of the sample distribution. Bootstrapping is performed by shuffling the feedbacks across the different models. For example, the NCAR CCSM3 albedo is randomly reassigned to any one of the albedos in the 12 models, including the NCAR CCSM3 model itself. This shuffling preserves the distribution of the feedbacks across models while destroying the expected covariance between different sets of feedbacks (e.g., Chernick 2007), in accord with a null hypothesis of zero covariance. The covariance matrix associated with the feedbacks is then recomputed from the shuffled feedback matrix, and summing across the rows and columns gives a realization of the net feedback variance. Note that the diagonal of the covariance matrix is unaffected because only covariance, not variance, depends on the ordering of the feedbacks. Repeating the bootstrap procedure 100,000 times indicates a 0.3% probability for the variance to be equal to or lower than the observed value of 0.008 by chance alone. It is thus safe to reject the null hypothesis and conclude that the small variance between model feedbacks arises from an actual negative covariance between the feedbacks. Now the question becomes why such negative covariance exists.
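This shuffling test is easy to reproduce. A minimal sketch, again using the rounded Table 1 values (and therefore agreeing with the quoted 0.3% only to within rounding and sampling error), permutes each feedback column independently and tallies how often the resulting net variance falls at or below the observed value:

```python
import numpy as np

# Feedback factors transcribed from Table 1 (Soden and Held 2006).
f_a = np.array([0.11, 0.05, 0.11, 0.08, 0.10, 0.06,
                0.10, 0.10, 0.07, 0.07, 0.10, 0.09])
f_c = np.array([0.04, 0.20, 0.06, 0.07, 0.11, 0.25,
                0.21, 0.25, 0.34, 0.33, 0.34, 0.37])
f_wl = np.array([0.33, 0.25, 0.34, 0.37, 0.33, 0.26,
                 0.32, 0.29, 0.29, 0.31, 0.28, 0.27])

rng = np.random.default_rng(0)
obs = np.var(f_a + f_c + f_wl, ddof=1)   # observed var(f_net)

n_boot, hits = 100_000, 0
for _ in range(n_boot):
    # Permute each feedback across models: individual variances are
    # preserved exactly, cross-feedback covariance is destroyed (the null).
    net = rng.permutation(f_a) + rng.permutation(f_c) + rng.permutation(f_wl)
    hits += np.var(net, ddof=1) <= obs
print(hits / n_boot)                     # the text reports ~0.3%
```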
b. Feedback estimation artifacts

The least interesting explanation of the negative covariance between clouds and the other feedbacks is as an artifact of the manner in which cloud feedbacks are estimated. The estimates used here (Soden and Held 2006) were acquired using the partial radiative perturbation approach (Wetherald and Manabe 1988; Held and Soden 2000). For each of the 12 models, Soden and Held (2006) computed the change in a climate variable relative to the change in mean surface temperature between two decade-long control periods. The resulting ratios were then multiplied by the partial derivatives of top-of-the-atmosphere radiation with respect to each climate variable to yield sensitivity fields. The climate variables considered were vertically averaged temperature, lapse rate, and albedo—each as a function of latitude, longitude, and (excepting average temperature) altitude. The fields of radiative sensitivity to temperature changes were then integrated from the surface to the tropopause and averaged globally. Note that the sensitivities to radiation were only estimated for the Geophysical Fluid Dynamics Laboratory (GFDL) model but were applied to all models, which introduces some error (Soden et al. 2008). Furthermore, the partial radiative perturbation approach is less prone to introducing correlation between clouds and other feedbacks than the other commonly used method—the so-called cloud forcing approach—but is by no means guaranteed to be free of artifacts (Aires and Rossow 2003; Soden et al. 2004; Bony et al. 2006; Soden et al. 2008).

One issue is that the cloud feedback could not be directly estimated because of changes in vertical overlap (Soden and Held 2006). Cloud feedbacks were instead found as the residual between the estimated net feedback and the individual feedback estimates, f_c = f_net − f_a − f_wv+lr. Uncertainties in the estimation of these parameters could, in the limit, lead to f_net being unrelated to both f_a and f_wv+lr, yielding f_c = −(f_a + f_wv+lr) + ε, where ε is uncorrelated with both f_a and f_wv+lr. The expected covariances are then (i) cov(f_c, f_a) = −var(f_a) − cov(f_a, f_wv+lr) = −0.0004 − 0.0004 = −0.0008 and (ii) cov(f_c, f_wv+lr) = −var(f_wv+lr) − cov(f_a, f_wv+lr) = −0.0014 − 0.0004 = −0.0018, where the values for the variance and covariance are taken from the sample values (see Table 2). The case considered here, in which negative sample covariance is imposed entirely by the estimation procedure, seems an upper bound, much larger than the expected errors (Soden and Held 2006; Soden et al. 2008), yet the resulting covariances are still less negative than the sampled values, cov(f_c, f_a) = −0.0010 and cov(f_c, f_wv+lr) = −0.0031. A scenario in which random draws of feedbacks happen to accentuate negative covariance present from estimation artifacts cannot be ruled out, but such a compound explanation seems unsatisfying. Other artifacts could also be present, the nature of which is unclear.

It is also notable that the manner in which the cloud feedbacks are calculated absorbs all processes that influence each model's sensitivity except the feedbacks that are directly estimated (Soden and Held 2006). It is thus not possible to fully determine which model elements contribute to the variance and covariance associated with f_c. Direct estimation of cloud feedbacks would permit more conclusive results.
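For reference, the expected covariances quoted above follow from the bilinearity of covariance applied to the worst-case residual model; this is a derivation of the expressions already in the text, not an additional result:

```latex
% Worst case: f_c = -(f_a + f_{wv+lr}) + \epsilon, with \epsilon
% uncorrelated with both f_a and f_{wv+lr}.
\mathrm{cov}(f_c, f_a)
  = \mathrm{cov}\!\left(-(f_a + f_{wv+lr}) + \epsilon,\; f_a\right)
  = -\mathrm{var}(f_a) - \mathrm{cov}(f_a, f_{wv+lr}),
\qquad
\mathrm{cov}(f_c, f_{wv+lr})
  = -\mathrm{var}(f_{wv+lr}) - \mathrm{cov}(f_a, f_{wv+lr}),
```

with the sample values of Table 2 completing the numerical evaluation.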
c. Inherent covariance between feedbacks in the models

The nonlinearities inherent to the climate system suggest that it is unlikely for any feedback to be truly independent. Yet the general expectation of interaction is distinct from a determination of the magnitude, or even the expected sign, of the relationship between feedbacks. The more pointed question is whether there is a physical basis on which to expect cloud feedbacks to be anticorrelated with the strength of the albedo and water vapor feedbacks.

Colman et al. (1997) analyzed the feedbacks present in a single model and found evidence for significant nonlinearity in the longwave response of lapse rates, clouds, and water vapor to perturbations in sea surface temperature ranging between −2° and 2°C. Although interactions between feedbacks were not explicitly diagnosed, nonlinear changes in the strength of an individual feedback indicate sensitivity to the background climate and, thus, the likelihood of covariance between feedbacks. A more recent study by Colman (2003b) indicated that the strength of feedbacks also varies over the course of the seasons, further supporting the notion of nonlinear model feedbacks. Likewise, Aires and Rossow (2003) highlight nonlinear interactions between feedbacks in the context of a simple model using a neural network approach.

Sanderson et al. (2008b) explored the leading interactions between feedbacks in a version of the Hadley Centre Slab Climate Model version 3 (HADSM3) through an empirical orthogonal function analysis of model radiative responses obtained through perturbation of model parameters. They show that the majority of the difference in climate sensitivity can be traced to variations in the entrainment coefficient in their model's convective scheme. Reducing the entrainment coefficient increases the water vapor feedback strength, because convection then delivers vapor farther aloft, and decreases the cloud feedback strength, because there are then fewer low-level clouds at midlatitudes in the basic model state. The sense of anticorrelation between cloud and water vapor feedbacks is consistent with the results observed across the CMIP3 models, although this result is obtained using only a single model. This example illustrates how uncertainty in parameters can introduce feedback covariance across multiple versions of a model and, presumably, across different models.

There are also more physical reasons why feedbacks might covary. For example, more vigorous deep convection associated with a warming climate would increase upper-tropospheric relative humidity and may also increase anvil cloud cover, albedo, and negative shortwave forcing, potentially leading to negative covariance between the water vapor and cloud feedbacks (A. D. Del Genio 2009, personal communication). As another example, Gorodetskaya et al. (2008) document that the loss of Arctic sea ice and surface albedo is compensated by an increase in low-level clouds. Although Kay and Gettelman (2009) find little evidence for such cloud compensation in satellite observations, such a mechanism could nonetheless operate across the CMIP3 models.

It should also be noted that the magnitude and sign of covariance between feedbacks will depend upon the climate state. For example, Abbot et al. (2009) illustrate how prescribing a much warmer climate without sea ice initiates convective cloud formation in the Arctic that causes a strong positive feedback upon warming, suggesting that ultimately the positive sea ice–albedo feedback could also be associated with a positive sea ice–cloud feedback. Presumably many more such interactions between feedbacks await articulation.

While the controls upon feedbacks have begun to be parsed (e.g., Bony and Dufresne 2005; Webb et al. 2006), there remains substantial uncertainty both in identifying the causes of variations in individual feedbacks and in identifying interdependence between feedbacks (e.g., Bony et al. 2006; Sanderson et al. 2008b). It seems likely that the observed covariance depends at least in part on physical interactions between feedbacks or on how that physics is parameterized, though it is not yet possible to attribute the covariance among feedbacks in the CMIP3 models to a particular set of physical processes or parameter settings.

d. Feedback conditioning

Covariance could also arise through conditioning the models. A dice game illustrates how this might work.
Assume two 6-sided dice that are fair, so that no correlation is expected between the values obtained from successive throws. But if throws are only accepted when the dice sum to 7, for example, then a perfect anticorrelation will exist between acceptable pairs (i.e., 1–6, 2–5, etc.). Now introduce a 12-sided die and require the three dice to sum to 14. An expected cross-correlation of −0.7 then exists between realizations of the 12-sided die and each of the 6-sided dice, whereas the values of the two 6-sided dice have no expected correlation between them. The summation rule forces the 6-sided dice to compensate for the greater range of the 12-sided die. This illustrates how placing constraints on the output of a system can introduce covariance between the individual components. Note that this covariance can be introduced, albeit not diagnosed, without ever actually observing the individual values.
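A rejection-sampling sketch of this dice game (a hypothetical illustration; the −0.7 and near-zero correlations are the values the text leads one to expect among accepted throws):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
d6a = rng.integers(1, 7, n)      # first fair 6-sided die
d6b = rng.integers(1, 7, n)      # second fair 6-sided die
d12 = rng.integers(1, 13, n)     # fair 12-sided die

keep = d6a + d6b + d12 == 14     # accept only throws satisfying the sum rule
print(np.corrcoef([d6a[keep], d6b[keep], d12[keep]]).round(2))
# Expected per the text: corr(d12, d6) ~ -0.7 for each 6-sided die,
# while corr(d6a, d6b) ~ 0 among the accepted throws.
```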
An analogous situation may hold for the CMIP3 models, with variations in f_lr and f_wv compensating for the larger variations in f_c. For example, if ΔR_2×/ΔT is made to have a specific value or range of values, it follows from Eq. (1) that only certain combinations of feedback values will be acceptable: f_c + f_a + f_lr+wv = 1 − λ₀ΔR_2×/ΔT. Of course, the magnitude of ΔR_2× or λ₀ could also be adjusted, as seems to have been the case for the CMIP3 models (Schwartz et al. 2007; Kiehl 2007; Knutti 2008)—but only feedbacks are focused on here.

Model conditioning can be differentiated as calibration and tuning. Calibration is used to refer to the adjustment of model parameters so as to bring model results into better agreement with specific observations or theory, whereas tuning will refer to adjustments made for other reasons. The distinction is useful—even if never perfect, given that what constitutes agreement, observation, and theory is partly subjective—because the statistical implications of these two forms of conditioning are quite different.

As an example of calibration, CMIP3 models tend to underestimate longwave and overestimate shortwave surface radiation by, on average, 6 W m⁻² (Wild 2008), an anticorrelation that can be understood as arising from the need to close the energy budget. Variations in aerosol radiative forcing (Schwartz et al. 2007; Kiehl 2007) and ocean heat uptake (Raper et al. 2002) that offset differences in climate sensitivity to give the observed degree of modern warming are also indicative of model calibration (Knutti 2008). As a final example, the standard model settings of version 3 of the Hadley Centre Atmospheric Model (HadAM3) were found to be very nearly optimal for reproducing a range of climate data relative to a large number of perturbed versions of the model (Sanderson et al. 2008a), suggesting that this model was highly calibrated. It seems likely that model feedbacks are also calibrated against modern climate variations. The amount of covariance such calibration introduces among feedbacks could be explored, for example, by computing the feedback covariance across parameter-perturbed realizations of general circulation models and comparing these against the feedback covariance found in the subsample of perturbed models that reproduce modern temperature trends.

Model conditioning need not be restricted to calibration of parameters against observations but could also include more nebulous adjustment of parameters, for example, to fit expectations, maintain accepted conventions, or increase accord with other model results. These more nebulous adjustments are referred to as tuning. As one example of possible tuning, Van der Sluijs et al. (1998) discuss evidence that reported values of climate sensitivity are anchored near the 3° ± 1.5°C range initially suggested by the ad hoc study group on carbon dioxide and climate (Charney et al. 1979) and that these were not changed because of a lack of compelling reason to do so. More recently reported values of climate sensitivity have not deviated substantially (e.g., Knutti et al. 2008), having a range of 2°–4.5°C. The implication is that the reported values of climate sensitivity are, in a sense, tuned to maintain accepted convention.

Another candidate example is the difference in cloud feedback strength reported between the studies of Cess et al. (1990) and Cess et al. (1996), wherein a tendency was noted for those models with the largest cloud feedbacks to be revised toward more modest values, whereas no countervailing tendency was observed for models initially having a modest cloud feedback strength. As Cess et al. (1996, p. 12 794) put it,

"Although substantial changes to GCM cloud parameterizations have been implemented since 1990, it is not clear that a general increase in their accuracy is the sole explanation for the present trend toward convergence. It may be that current models are producing similar errors, while the earlier models produced different errors."

Covariance between model feedbacks is expected to arise if models are tuned toward a certain climate sensitivity, and this possibility can be explored with a more detailed version of the dice game. Consider the case in which feedbacks are drawn from a normal distribution having a mean corresponding to the CMIP3 feedbacks (see Table 1) and a standard deviation twice the observed value, where the larger standard deviation is used because untuned model parameters would presumably have a wider spread. Model realizations are then only accepted if they have a climate sensitivity between 2.2° and 4.2°C, the smallest and largest climate sensitivities implied by the net feedback strengths of the 12 CMIP3 models examined here, where climate sensitivity is calculated according to Eq. (1) with a ΔR_2× of 3.7 W m⁻² and a λ₀ of 0.31. Using these criteria, ~40,000 of the 100,000 realizations are accepted, and these have a covariance structure similar to that diagnosed for the CMIP3 models (see Table 3). In particular, the accepted models have anticorrelations between f_c and f_a of −0.3 and between f_c and f_lr+wv of −0.6, leading to more than a factor of 3 reduction in the variance of f_net. The one exception is a lack of cross-correlation between f_a and f_lr+wv, whereas the CMIP3 models give a cross-correlation of 0.6 that is presumably attributable to one of the mechanisms described earlier. Note that, as with the dice game, conditioning upon the climate sensitivity serves to introduce feedback covariance without the need to actually calculate the individual feedback values. Tuning climate sensitivity to lie within the observed spread across the CMIP3 models is a sufficient explanation for the origins of the compensation between f_c and the other feedbacks.

Table 3. As in Table 2 but for random models that are only accepted if they have a climate sensitivity between 2.2° and 4.2°C.

          Albedo        Clouds        wv+lr         Net
Albedo    15 (1)        −12 (−0.3)    −2 (−0.1)     2
Clouds    −12 (−0.3)    109 (1)       −44 (−0.6)    52
wv+lr     −2 (−0.1)     −44 (−0.6)    50 (1)        5
Net       2             52            5             59
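This acceptance experiment is simple to reproduce. In the sketch below, the feedback means and standard deviations are rounded from Tables 1 and 2, and λ₀ = 0.31 and ΔR_2× = 3.7 W m⁻² as stated; the acceptance count and conditional correlations should fall near the quoted values, though not match them exactly, given rounding and sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Means of the CMIP3 albedo, cloud, and wv+lr feedbacks (from Table 1),
# with standard deviations set to twice the observed values (from Table 2).
mu = np.array([0.09, 0.21, 0.30])
sd = 2 * np.sqrt(np.array([0.0004, 0.014, 0.0014]))
f = rng.normal(mu, sd, size=(n, 3))

S = 0.31 * 3.7 / (1 - f.sum(axis=1))    # Eq. (1) with lambda_0 = 0.31, dR_2x = 3.7
keep = (S >= 2.2) & (S <= 4.2)          # accept only "plausible" sensitivities

print(keep.sum())                       # the text reports ~40,000 accepted
print(np.corrcoef(f[keep].T).round(1))  # compare with Table 3: ~ -0.3, -0.6, -0.1
```

As in the dice game, nothing in the acceptance rule inspects the individual feedbacks, yet the surviving realizations acquire the compensating correlation structure.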
However, the simple example of tuning given here is more explicit than would be expected in actual model development. Little reason exists to conclude that a model would be rejected on the sole basis of an outlying climate sensitivity or that model feedbacks are intentionally adjusted to compensate for one another. More plausible is that model development and evaluation lead to an implicit tuning of the parameters, as suggested by Cess et al. (1996). As another example, of the 414 stable model versions that Stainforth et al. (2005) analyzed, six yielded a negative climate sensitivity. Those six versions were apparently subjected to greater scrutiny and were excluded because of nonphysical interactions between the model's mixed layer ocean and tropical clouds. Scrutinizing models that fall outside of an expected range of behavior, while reasonable from a model development perspective, makes them less likely to be included in an ensemble of results and, therefore, is apt to limit the spread of a model ensemble. In this sense, the covariance between the CMIP3 model feedbacks may be symptomatic of an uneven treatment of outlying model results.

5. Discussion and conclusions

Numerical climate models are indispensable tools for predicting climate. If we are to correctly interpret their results and optimally design future model studies, we must carefully track what assumptions and observations are incorporated into them. Evidence has accumulated that intermodel differences in climate forcing (Webb et al. 2006; Schwartz et al. 2007; Kiehl 2007; Knutti 2008), ocean heat uptake (Raper et al. 2002), and the individual feedbacks that contribute to climate sensitivity (this study) act to reduce the spread in global surface warming realized across models. These compensating model features may have a sound physical basis, but the specter of tuning leading to a curtailment of the intermodel spread in climate sensitivity is difficult to dismiss.

Knutti (2008) argued that parameter covariance across models is neither unexpected nor problematic if models are interpreted as having been calibrated to observations. A problem does arise, however, when model results are used in conjunction with observations to constrain climate sensitivity [see reviews by Edwards et al. (2007) and Knutti and Hegerl (2008)], as this runs the risk of doubly calling upon the data. Furthermore, comparison between model results and the climate of the twentieth century may then be circular (see also Rodhe et al. 2000). Ultimately, we need to know what exactly goes into a model if we are to correctly interpret its output.

While it seems a large undertaking, a more objective approach to calibration may be warranted. Standard datasets could be agreed upon for tuning climate models, with other data explicitly withheld for testing. Or perhaps a more readily undertaken course of action is to test model results against less closely monitored aspects of the climate, such as features of the seasonal cycle of temperature (Knutti et al. 2006; Stine et al. 2009) and albedo (Hall and Qu 2006).
The paleoclimate record is also useful in this manner (e.g., Braconnot et al. 2007) in that it can be more safely assumed that models have not been calibrated to reproduce these more distant and, during many epochs, dramatically different climates. Convergence between model results, if not truly driven by a decrease in model uncertainty or clearly understood as a result of calibration, could have the unfortunate consequence of lulling us into too great a confidence in model predictions or into inferences of too narrow a range of future climates. To the extent that it occurs, tuning the models based on expectation or convention renders the modeling process a partially subjective exercise from which it is very complicated to derive a statistical interpretation. Related discussion can be found in a wide range of papers (e.g., Hodges and Dewar 1992; Knutti et al. 2010).

As a final note, the CMIP3 archive can be characterized as an ensemble of opportunity, not specifically designed to span the range of uncertainty in future climates. A better indication of the range of possible future climates may be obtained through more exhaustive searches of the behavior of simpler models under perturbation of their parameters (e.g., Stainforth et al. 2005). It may also be sensible to push the most sophisticated models toward generating realizations of future climate that are as inconsistent as possible with current predictions, while still being physically sound. Focusing on maximally inconsistent possibilities seems more likely to lead to scientific discoveries and to uncover climate surprises.¹ A maximally inconsistent ensemble of state-of-the-art model realizations would also have the advantage of suggesting outer bounds upon the range of climate sensitivity and would, therefore, be complementary to existing estimates.

¹ Such promotion of scientific discord may be contrasted with the IPCC process, which tends to prize consensus, albeit for political rather than scientific reasons.

Acknowledgments. Without implying their agreement, James Hammit, Reto Knutti, Gerard Roe, Alexander Stine, Martin Weitzman, and Carl Wunsch are thanked for the discussion and insightful suggestions they provided. Support was provided by the David and Lucile Packard Foundation.

APPENDIX

Model Names

CNRM            Centre National de Recherches Météorologiques
GFDL CM2-0      Geophysical Fluid Dynamics Laboratory Climate Model version 2.0
GFDL CM2-1      Geophysical Fluid Dynamics Laboratory Climate Model version 2.1
GISS ER         Goddard Institute for Space Studies Model E-R
INM-CM3         Institute of Numerical Mathematics Coupled Model, version 3.0
IPSL            L'Institut Pierre-Simon Laplace
MIROC MEDRES    Model for Interdisciplinary Research on Climate, medium-resolution version
MPI ECHAM5      Max Planck Institute ECHAM5
MRI             Meteorological Research Institute
NCAR CCSM3      National Center for Atmospheric Research Community Climate System Model, version 3
NCAR PCM1       National Center for Atmospheric Research Parallel Climate Model version 1
UKMO HADCM3     Third climate configuration of the Met Office Unified Model

REFERENCES

Abbot, D., M. Huber, G. Bousquet, and C. Walker, 2009: High-CO2 cloud radiative forcing feedback over both land and ocean in a global climate model. Geophys. Res. Lett., 36, L05702, doi:10.1029/2008GL036703.
Aires, F., and W. Rossow, 2003: Inferring instantaneous, multivariate and nonlinear sensitivities for the analysis of feedback processes in a dynamical system: Lorenz model case-study. Quart. J. Roy. Meteor. Soc., 129, 239–275, doi:10.1256/qj.01.174.
Annan, J., and J. Hargreaves, 2009: On the generation and interpretation of probabilistic estimates of climate sensitivity. Climatic Change, doi:10.1007/s10584-009-9715-y, in press.
Bony, S., and J. Dufresne, 2005: Marine boundary layer clouds at the heart of tropical cloud feedback uncertainties in climate models. Geophys. Res. Lett., 32, L20806, doi:10.1029/2005GL023851.
——, and Coauthors, 2006: How well do we understand and evaluate climate change feedback processes? J. Climate, 19, 3445–3482.
Braconnot, P., and Coauthors, 2007: Results of PMIP2 coupled simulations of the Mid-Holocene and Last Glacial Maximum—Part 1: Experiments and large-scale features. Climate Past, 3, 261–277.
Cess, R., 1975: Global climate change: An investigation of atmospheric feedback mechanisms. Tellus, 27, 193–198.
——, and Coauthors, 1990: Intercomparison and interpretation of climate feedback processes in 19 atmospheric general circulation models. J. Geophys. Res., 95 (D10), 16 601–16 615.
——, and Coauthors, 1996: Cloud feedback in atmospheric general circulation models: An update. J. Geophys. Res., 101, 12 791–12 794.
Charney, J., and Coauthors, 1979: Carbon dioxide and climate: A scientific assessment. National Academy of Sciences, 22 pp.
Chernick, M., 2007: Bootstrap Methods: A Guide for Practitioners and Researchers. Wiley-Interscience, 369 pp.
Colman, R., 2003a: A comparison of climate feedbacks in general circulation models. Climate Dyn., 20, 865–873, doi:10.1007/s00382-003-0310-z.
——, 2003b: Seasonal contributions to climate feedbacks. Climate Dyn., 20, 825–841, doi:10.1007/s00382-002-0301-5.
——, S. Power, and B. McAvaney, 1997: Non-linear climate feedback analysis in an atmospheric general circulation model. Climate Dyn., 13, 717–731, doi:10.1007/s003820050193.
Edwards, T., M. Crucifix, and S. Harrison, 2007: Using the past to constrain the future: How the palaeorecord can improve estimates of global warming. Prog. Phys. Geogr., 31, 481–500, doi:10.1177/0309133307083295.
Frame, D., B. Booth, J. Kettleborough, D. Stainforth, J. Gregory, M. Collins, and M. Allen, 2005: Constraining climate forecasts: The role of prior assumptions. Geophys. Res. Lett., 32, L09702, doi:10.1029/2004GL022241.
Gorodetskaya, I., L. Tremblay, B. Liepert, M. Cane, and R. Cullather, 2008: The influence of cloud and surface properties on the Arctic Ocean shortwave radiation budget in coupled models. J. Climate, 21, 866–882.
Hall, A., and X. Qu, 2006: Using the current seasonal cycle to constrain snow albedo feedback in future climate change. Geophys. Res. Lett., 33, L03502, doi:10.1029/2005GL025127.
Hansen, J., G. Russell, A. Lacis, I. Fung, D. Rind, and P. Stone, 1985: Climate response times: Dependence on climate sensitivity and ocean mixing. Science, 229, 857.
Held, I., and B. Soden, 2000: Water vapor feedback and global warming. Annu. Rev. Energy Environ., 25, 441–475, doi:10.1146/annurev.energy.25.1.441.
Hodges, J., and J. Dewar, 1992: Is it you or your model talking? A framework for model validation. Rand Corporation Rep. 4114, 43 pp.
Kay, J., and A. Gettelman, 2009: Cloud influence on and response to seasonal Arctic sea ice loss. J. Geophys. Res., 114, D18204, doi:10.1029/2009JD011773.
Kiehl, J., 2007: Twentieth-century climate model response and climate sensitivity. Geophys. Res. Lett., 34, L22710, doi:10.1029/2007GL031383.
Knutti, R., 2008: Why are climate models reproducing the observed global surface warming so well? Geophys. Res. Lett., 35, L18704, doi:10.1029/2008GL034932.
——, and G. Hegerl, 2008: The equilibrium sensitivity of the Earth's temperature to radiation changes. Nature Geosci., 1, 735–743, doi:10.1038/ngeo337.
——, G. Meehl, M. Allen, and D. Stainforth, 2006: Constraining climate sensitivity from the seasonal cycle in surface temperature. J. Climate, 19, 4224–4233.
——, and Coauthors, 2008: A review of uncertainties in global temperature projections over the twenty-first century. J. Climate, 21, 2651–2663.
——, R. Furrer, C. Tebaldi, J. Cermak, and G. Meehl, 2010: Challenges in combining projections from multiple climate models. J. Climate, in press.
McAvaney, B., and H. Le Treut, 2003: The cloud feedback model intercomparison project: CFMIP. CLIVAR Exchanges, No. 26, International CLIVAR Project Office, Southampton, United Kingdom, 1–4.
Meehl, G., C. Covey, T. Delworth, M. Latif, B. McAvaney, J. Mitchell, R. Stouffer, and K. Taylor, 2007: The WCRP CMIP3 multimodel dataset: A new era in climate change research. Bull. Amer. Meteor. Soc., 88, 1383–1394.
Raper, S., J. Gregory, and R. Stouffer, 2002: The role of climate sensitivity and ocean heat uptake on AOGCM transient temperature response. J. Climate, 15, 124–130.
Rodhe, H., R. Charlson, and T. Anderson, 2000: Avoiding circular logic in climate modeling. Climatic Change, 44, 419–422, doi:10.1023/A:1005536902789.
Roe, G., and M. Baker, 2007: Why is climate sensitivity so unpredictable? Science, 318, 629–632.
Sanderson, B., and Coauthors, 2008a: Constraints on model response to greenhouse gas forcing and the role of subgrid-scale processes. J. Climate, 21, 2384–2400.
——, C. Piani, W. Ingram, D. Stone, and M. Allen, 2008b: Towards constraining climate sensitivity by linear analysis of feedback patterns in thousands of perturbed-physics GCM simulations. Climate Dyn., 30, 175–190, doi:10.1007/s00382-007-0280-7.
Schwartz, S., R. Charlson, and H. Rodhe, 2007: Quantifying climate change: Too rosy a picture? Nature Rep. Climate Change, 2, 23–24, doi:10.1038/climate.2007.22.
Soden, B., and I. Held, 2006: An assessment of climate feedbacks in coupled ocean–atmosphere models. J. Climate, 19, 3354–3360.
——, A. Broccoli, and R. Hemler, 2004: On the use of cloud forcing to estimate cloud feedback. J. Climate, 17, 3661–3665.
——, I. Held, R. Colman, K. Shell, J. Kiehl, and C. Shields, 2008: Quantifying climate feedbacks using radiative kernels. J. Climate, 21, 3504–3520.
Stainforth, D., and Coauthors, 2005: Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature, 433, 403–406, doi:10.1038/nature03301.
Stine, A., P. Huybers, and I. Fung, 2009: Changes in the phase of the annual cycle of surface temperature. Nature, 457, 435–440, doi:10.1038/nature07675.
Tebaldi, C., and R. Knutti, 2007: The use of the multi-model ensemble in probabilistic climate projections. Philos. Trans. Roy. Soc., 365A, 2053–2075, doi:10.1098/rsta.2007.2076.
Tompkins, A., and K. Emanuel, 2000: The vertical resolution sensitivity of simulated equilibrium temperature and water-vapour profiles. Quart. J. Roy. Meteor. Soc., 126, 1219–1238, doi:10.1002/qj.49712656502.
Urban, N., and K. Keller, 2009: Complementary observational constraints on climate sensitivity. Geophys. Res. Lett., 36, L04708, doi:10.1029/2008GL036457.
Van der Sluijs, J., S. Shackley, and B. Wynne, 1998: Anchoring devices in science for policy: The case of consensus around climate sensitivity. Soc. Stud. Sci., 28, 291–323.
Webb, M., and Coauthors, 2006: On the contribution of local feedback mechanisms to the range of climate sensitivity in two GCM ensembles. Climate Dyn., 27, 17–38, doi:10.1007/s00382-006-0111-2.
Weitzman, M., 2009a: Additive damages, fat-tailed climate dynamics, and uncertain discounting. Economics: The Open-Access, Open-Assessment E-Journal, 3, 2009-26. [Available online at http://www.economics-ejournal.org/economics/journalarticles/2009-26.]
——, 2009b: On modeling and interpreting the economics of catastrophic climate change. Rev. Econ. Stat., 91, 1–19, doi:10.1162/rest.91.1.1.
Wetherald, R., and S. Manabe, 1988: Cloud feedback processes in a general circulation model. J. Atmos. Sci., 45, 1397–1416.
Wild, M., 2008: Short-wave and long-wave surface radiation budgets in GCMs: A review based on the IPCC-AR4/CMIP3 models. Tellus, 60A, 932–945, doi:10.1111/j.1600-0870.2008.00342.x.