Essays on Time-Varying Discount Rates

A dissertation presented by Ian Louis Dew-Becker to The Department of Economics in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Economics

Harvard University
Cambridge, Massachusetts
May 2012

©2012 Ian Louis Dew-Becker
All rights reserved.

Dissertation Advisor: Professor John Y. Campbell
Ian Louis Dew-Becker

ABSTRACT

This dissertation consists of three essays that explore the interaction between various discount rates and the macroeconomy.

The first essay studies the cross-section of discount rates, specifically, the term structure of interest rates. When physical capital is discounted like a bond with a similar duration, a high term spread is associated with low average duration for investment. I document a strong negative correlation between the term spread and the duration of investment, implying an important role for the cost of capital in determining the composition of aggregate investment. The results are robust to including a variety of controls. Consumer durable goods purchases display similar behavior.

The second essay develops a new utility specification that incorporates Campbell–Cochrane-type habits into the Epstein–Zin class of preferences. It is a model in which risk premia change over time. In a simple calibration of a real business cycle model with EZ-habit preferences, the model generates a strongly countercyclical equity premium, substantial equity return predictability, and a stable riskless interest rate, as in the data. Moreover, conditional on the average level of risk aversion, time-variation in risk aversion increases the volatility and mean return of equities. On the real side, the model matches the short and long-term variances of output, consumption, and investment growth.
As an additional empirical test, I measure implied risk aversion and find that it has an $R^2$ of over 50 percent for 5-year stock returns in post-war data.

The third essay develops a New-Keynesian model in which households have Epstein–Zin preferences with time-varying risk aversion and the central bank has a time-varying inflation target. The model matches the dynamics of nominal bond prices in the US economy well: the fitting errors for individual bond yields are roughly as large as those obtained from a non-structural three-factor model, and two thirds smaller than in models with constant risk aversion or a constant inflation target.

CONTENTS

Abstract
Acknowledgments
1. Investment and the Cost of Capital in the Cross-Section: The Term Spread Predicts the Duration of Investment
   1.1 Introduction
   1.2 Data
   1.3 Results
   1.4 Model
   1.5 Alternative explanations
   1.6 Consumer durables
   1.7 The firm-level mechanism
   1.8 Conclusion
2. A model of time-varying risk premia with habits and production
   2.1 Introduction
   2.2 The model
   2.3 Calibration and simulation
   2.4 Empirical return forecasting
   2.5 Extensions
   2.6 Conclusion
3. Bond pricing with a time-varying price of risk in an estimated medium-scale Bayesian DSGE model
   3.1 Introduction
   3.2 Household preferences
   3.3 Aggregate supply
   3.4 Model solution
   3.5 Empirics
   3.6 Asset pricing
   3.7 The real economy
   3.8 Conclusion
Appendix
A. Appendix to Chapter 1
   A.1 The approximation for average duration
   A.2 Further robustness tests
B. Appendix to Chapter 2
   B.1 The certainty equivalent
   B.2 Derivation of the SDF
   B.3 The log-linear model with production
   B.4 Details of return forecasting
C. Appendix to Chapter 3
   C.1 Results from the text
   C.2 Approximation method
   C.3 Estimation

ACKNOWLEDGMENTS

I completed this dissertation with the support of many people. John Campbell was endlessly patient, and amazingly diligent about reading and commenting on any work I sent him. He went above and beyond any reasonable expectation. I also benefited from numerous comments and conversations with my other advisors, Emmanuel Farhi, Effi Benmelech, and also David Laibson, Robin Greenwood, and Jim Stock, among others. Countless conversations with fellow students made this possible, including in particular Jason Beeler, Stefano Giglio, Kelly Shue, and Eric Zwick. Economists too numerous to name also generously gave me constant feedback on my work, which is reflected here. But none of the above should be implicated in any errors or omissions.

1. INVESTMENT AND THE COST OF CAPITAL IN THE CROSS-SECTION: THE TERM SPREAD PREDICTS THE DURATION OF INVESTMENT

1.1 Introduction

This paper studies the cross-section of investment. While there is an enormous amount of work studying the aggregate level of investment and the determinants of firm-level investment, there is essentially no analysis of the determinants of investment in different types of assets. This paper begins that task by analyzing the distribution of investment across assets according to their depreciation rates.
I show that when interest rates for long-duration assets are higher than those for short-duration assets, aggregate investment shifts relatively towards high-depreciation assets.

The response of investment to the cost of capital is a key mechanism in macroeconomics and finance. It is central to production-based asset pricing theories (e.g. Cochrane 1991, 1996); a primary feedback mechanism in standard general-equilibrium models; one of the key drivers for the response of the economy to monetary policy shocks; the source of the classical crowding-out effect of government spending; and an important determinant of the size of distortions from taxes. This paper considers a novel method for uncovering an empirical relationship between investment and the cost of capital.

There is a long literature that studies the effect of the cost of capital on investment. Simple methods have, in general, failed to find important effects.1 Bernanke and Gertler (1995) find that nonresidential investment seems to respond only weakly to shocks to the Federal funds rate.2 The discount rate is also a determinant of Tobin's Q, but estimates of the impact of Q on investment tend to be small (Summers, 1981; Eberly, Rebelo, and Vincent, 2009, give a recent review).

1 See Chirinko, 1993, for an extensive review.
2 However, in recent unpublished papers, Gilchrist and Zakrajsek (2008) and Guiso et al. (2002) find a relationship in micro data between investment and interest rates.

The primary contribution of this paper is to show that interest rates affect what assets firms invest in at the aggregate level. Furthermore, the effect is relevant at the cyclical frequency: it is neither centered around discrete (and somewhat rare) policy changes, such as tax changes, nor dependent on very long-term effects.3

The basic idea here is to forecast the cross-section of investment using the cross-section of interest rates, instead of forecasting the level of investment with the level of interest rates.
Long-term assets are discounted with long-term interest rates, and short-term assets with short rates. When long rates are higher than short rates (that is, when the term spread is high), a cost-of-capital effect implies investment should shift towards short-duration assets. The negative relationship holds strongly in the data: it explains roughly one third of the cross-sectional variation in investment by duration, and the effect holds both within and across industries.

Standard regressions of aggregate investment on the level of interest rates have the fundamental identification problem that periods of high interest rates may also be periods when investment demand is high, so the correlation between investment and interest rates could be zero or even positive. By studying the cross-section, I abstract from aggregate shocks, hopefully reducing this endogeneity problem. The strong empirical results suggest that in fact endogeneity is less of an issue in the cross-section.

The data is simple to construct. I obtain nominal investment by asset and year from the Bureau of Economic Analysis. I study an index of average duration defined as the average of the assets' economic life-spans, weighted by their share in aggregate investment in each year.4 Figure 1.1 shows that this index of average duration is highly negatively correlated with the spread between interest rates on ten- and one-year nominal Treasury bonds (note that the axis for average duration is reversed for the sake of clarity). When interest rates are relatively high for long-duration assets, investment shifts towards short-duration assets, creating a strong negative correlation between average duration and the term spread.

3 Caballero (1994) and Schaller (2006) use cointegration methods to show that in the long run the cost of capital is meaningfully related to the size of the capital stock (and hence the level of investment). A number of researchers have also focused on high-frequency changes in taxes which produce large movements in the cost of capital and investment (Hassett and Hubbard, 2002, provide an extensive review).
4 Specifically, "lifespan" is measured as a Macaulay duration using data on economic depreciation rates.

A negative raw correlation between investment and interest rates suggests that the cost of capital has an important role in determining the cross-sectional distribution of investment, but there are alternative mechanisms that could produce this result. I therefore build a simple Q-theory model to help elucidate the possible sources of bias in the basic result in figure 1.1 and try to account for them in subsequent regressions. I control for the level of productivity and expected productivity growth in a variety of ways and find that they do not eliminate the basic effect. More importantly, I find that the term spread–average duration relationship is not driven by changes in demand across industries. When the term spread is high, investment shifts towards low-duration assets within individual industries, in addition to shifting from industries that use more long-duration assets to ones that use more short-duration assets.

In addition to contributing to the literature on the determinants of the level of investment, this paper is related to the recent literature on production-based asset pricing with projects that have differing characteristics (e.g. Berk, Green, and Naik, 1999, and Gomes, Kogan, and Yogo, 2009). While those papers show that variation in the types of capital owned by firms can lead to differences in their stock prices, I find that variation in the cross-section of asset prices can affect the types of investment that firms undertake.

The findings here are also relevant to understanding the relationship between interest rates and debt issues.
The final section of the paper provides novel evidence that firms match the maturity of their debt issues to their physical investment, consistent with previous evidence in the finance literature (e.g. Stohs and Mauer, 1996). My findings suggest that the timing of debt issues to the term spread documented by Baker, Greenwood, and Wurgler (2003) could be explained by the dynamics of physical investment and the fact that firms match the maturity of their debt to their assets.

The remainder of the paper is organized as follows. Section 1.2 describes the data and section 1.3 reports the main result. Next, I outline in section 1.4 a simple model that justifies the regression of the average duration of investment on the term spread. Section 1.5 controls for a number of possible biases suggested by the investment model and shows that the term spread is the single most powerful predictor of the average duration of investment. In section 1.6 I show that the relationship between duration and the term spread also appears in purchases of consumer durable goods. Section 1.7 examines the relationship between the type of debt that firms sell and the duration of their assets. I find a positive relationship (consistent with maturity-matching theories), which gives added support for the idea that long-term interest rates are the relevant cost of capital for long-duration assets and short rates for short-term assets. Finally, section 1.8 concludes.

Figure 1.1: The average duration of investment versus the term spread [plot omitted; annual series, 1948–2003; axes: term spread and average duration (reversed)]
Note: The term spread is the gap between the 10 and 1-year treasury yields averaged over the previous year. Both variables are HP-detrended. The axis for average duration is reversed. Grey bars indicate NBER-dated recessions.
1.2 Data

To study the relationship between investment and the cost of capital in the cross-section, we need a relevant measure of the cost of capital that differs across assets. The duration of assets is a natural source of variation because it is easy to quantify for both physical assets (through their depreciation rates) and bonds (through maturities).

Of course, the cost of capital depends on more than simply the level of interest rates. The equity premium is large and variable (e.g. Lettau and Ludvigson, 2001). The advantage of focusing on interest rates here is that we can directly observe the cost of capital for assets of different durations. While there have been studies of the term structure of equity (Lettau and Wachter, 2007), there is no simple way to actually measure the term structure of expected returns on equity, let alone the variation in the slope of that term structure.

I obtain data on Treasury yields measured at year-end from the Federal Reserve. Treasury data has the advantage of including bonds with a large variety of maturities over a long period of time. However, firms do not in general borrow at the Treasury yield. I therefore also study the spread between yields on 3-month commercial paper and the Moody's seasoned Baa corporate bond yield (from Global Financial Data and the Federal Reserve, respectively). The Moody's index is meant to measure bonds with remaining maturities near 30 years. The main results focus on the Treasury yield spread.5

5 Ideally, we would measure the true cost of capital for each asset, including the cost of capital for equity, in particular. While there is research on the term structure for equity (Lettau and Wachter, 2007), it is not obvious how to construct an equity cost of capital for each asset simply by looking at its depreciation rate.

A potential concern is that the relevant discount rate for investment is the real interest rate, not the nominal rate. One method for obtaining the real interest rate would be to subtract an inflation forecast from the nominal rate. In general, random-walk inflation forecasts are competitive with more sophisticated methods (Atkeson and Ohanian, 2001). With a random-walk forecast, expected inflation is the same at every horizon and cancels when one yield is subtracted from the other, so the nominal term spread and the spread obtained after subtracting expected inflation will be identical. This suggests that there is little to be gained by forecasting inflation here.6

Another option is to look at yields on inflation-protected bonds. The time series of inflation-protected bonds in the United States is relatively short, but inflation-protected bonds have been sold in the United Kingdom since the 1980s. Figure 1.2 plots the 10/5 year term spread in the UK for both nominal and inflation-protected bonds since 1985. Over the sample, the two series move together closely, even through the financial crisis. Their variances differ, but their correlation is over 70 percent. This result suggests that by studying the nominal term spread, we will obtain results that are similar to what we would obtain with the unobservable real term spread.

Data on capital stocks and investment come from the Bureau of Economic Analysis's (BEA) fixed asset tables. The main results focus on aggregate investment by asset, but the BEA also reports data at the asset×industry level. Data on depreciation rates is from Fraumeni (1997), the source for current depreciation rates used by the BEA.7 The BEA uses geometric (declining balance) depreciation for nearly all assets.8 Depreciation rates are estimated primarily from data on service lives and sales of vintage assets. Given the resale value of an asset for each age along with a service life, one can estimate an approximate geometric depreciation rate.9 These depreciation rates are closer to economic depreciation than the straight-line method used for accounting purposes by many firms (Hulten and Wykoff, 1981).
6 Furthermore, we would need to estimate a 10-year inflation forecast, which would be difficult even if inflation were relatively easy to forecast at short horizons. Another option would be to use survey data on inflation forecasts, but this would substantially limit the available time series.
7 Fraumeni's depreciation rates closely match depreciation obtained by simply dividing BEA reported depreciation by the capital stock.
8 Missiles and nuclear fuel rods, for example, are modeled with straight-line depreciation.
9 The BEA's current estimates are a combination of data from a variety of studies on resale values reviewed in Fraumeni, 1997.

Figure 1.2: United Kingdom 10/5 year term spreads, 1985–2010 [plot omitted; series: nominal and inflation-adjusted]
Note: gap between yields on 10 and 5-year nominal and inflation-protected bonds

I use 36 asset classes from the BEA tables, excluding household and government assets and educational, health, and religion-related structures. The majority of the analysis focuses on equipment investment. The investment literature generally finds that models have substantial trouble explaining structures investment (Oliner, Rudebusch, and Sichel, 1995). This may be partly caused by the fact that nonresidential building projects take fourteen months to complete on average (Edge, 2000).10 The main results below go through when structures are included, but the relationships are far less clear. I therefore leave the analysis of structures to future work so as not to distract from a complete analysis of equipment investment.

For each asset class, the BEA reports total stocks (on a current-cost basis) and investment for the private nonresidential economy.
The asset classes accounting for the most nominal investment in 2007 were software (16 percent of total investment), petroleum and natural gas exploration and wells (8 percent), communication equipment (7 percent), and computers and peripheral equipment (6 percent). Except for oil and gas, these assets all have high depreciation rates, and substantial investment is necessary just to keep the stocks at constant levels.

10 See Edge (2000) for an empirical model of residential and nonresidential structures investment that takes into account building lags.

For an asset with geometric depreciation rate $\delta_i$, if we assume that productivity is constant and there is a fixed discount rate $r^*$, Macaulay's duration, $D_i$, is the present-value-weighted average date of the asset's payoffs,

$$D_i = \frac{\sum_{j=1}^{\infty} j\,(1-\delta_i)^{j-1}(1+r^*)^{-j}}{\sum_{j=1}^{\infty} (1-\delta_i)^{j-1}(1+r^*)^{-j}} = \frac{1+r^*}{r^*+\delta_i} \tag{1.1}$$

When measuring durations I fix $r^* = 0.03$.11 Table 1.1 lists the assets used in this study along with their depreciation rates and durations. Software, computers, and office and accounting equipment have the highest depreciation rates, all above 20 percent per year. Types of heavy industrial machinery tend to have lower depreciation rates, as low as 5 percent.

11 Allowing for a constant rate of productivity growth would be the equivalent of choosing a lower value of $r^*$. The results are not sensitive to the choice of $r^*$.

Table 1.1: Assets, depreciation rates, and durations

Depreciation rate   Duration   Asset
(fraction/year)     (years)
                               Information processing equipment and software
      0.25            3.73       Computers and peripheral equipment
      0.40            2.37       Software
      0.14            6.11       Communication equipment
      0.14            6.24       Medical equipment and instruments
      0.14            6.24       Nonmedical instruments
      0.18            4.90       Photocopy and related equipment
      0.31            3.01       Office and accounting equipment
                               Industrial equipment
      0.09            8.46       Fabricated metal products
      0.05           12.62       Engines and turbines
      0.12            6.75       Metalworking machinery
      0.10            7.74       Special industry machinery, n.e.c.
      0.11            7.51       General industrial, including materials handling, equipment
      0.05           12.88       Electrical transmission, distribution, and industrial apparatus
                               Transportation equipment
      0.15            5.72       Trucks, buses, and truck trailers
      0.22            4.12       Autos
      0.12            7.10       Aircraft
      0.06           11.31       Ships and boats
      0.06           11.59       Railroad equipment
                               Other equipment
      0.14            6.15       Furniture and fixtures
      0.12            6.87       Agricultural machinery
      0.16            5.51       Construction machinery
      0.15            5.72       Mining and oilfield machinery
      0.16            5.42       Service industry machinery
      0.18            4.83       Electrical equipment, n.e.c.
      0.15            5.81       Other nonresidential equipment
                               Structures
      0.02           18.83       Office, including medical buildings
      0.02           19.73       Commercial
      0.03           16.89       Manufacturing
      0.02           19.18       Electric
      0.02           19.18       Other power
      0.02           19.18       Communication
      0.08            9.80       Petroleum and natural gas
      0.05           13.73       Mining
      0.02           19.62       Other buildings
      0.03           17.91       Railroads
      0.02           19.11       Farm

Note: Depreciation rates are obtained from the BEA. Duration is measured as 1.03/(0.03+δ).

Finally, for the purpose of summarizing the cross-section of investment, I define an index measuring the average duration of investment,

$$\bar{D}_t \equiv \frac{\sum_i I_{it} D_i}{\sum_i I_{it}} \tag{1.2}$$

$\bar{D}_t$ is simply a weighted average of the durations of the assets, where the weights are the assets' shares in aggregate nominal investment. When investment shifts relatively towards short-duration assets, e.g. computers or software, $\bar{D}_t$ falls. Furthermore, $\bar{D}_t$ is constructed so that it is not mechanically related to the level of investment. There is no particular reason why there need be a positive or negative relationship between the level of investment (or the state of the business cycle) and $\bar{D}_t$.12

12 Rather than using an index of average duration, which involves a discount rate, we could also simply use an index of the average depreciation rate of investment. All of the results below go through with this alternative measure.

Figure 1.3 plots $\bar{D}_t$ for 1948–2008. As might be expected, average duration has been falling over time. The fastest rate of decline appears in the late 1980s, and the series flattens out after 1994, actually rising substantially between 2006 and 2008. We should not expect transitory changes in the term spread to explain the long-term changes in the duration of investment. Long run changes are driven by technological shifts, e.g. the introduction of computers, software, and other electronic equipment. Instead, the term spread will explain the year-to-year variation in $\bar{D}_t$.13

13 Tevlin and Whelan (2003) give a more extensive discussion of the recent decrease in the duration of the capital stock.

Table 1.1 shows that computers have a depreciation rate of 25 percent, and software 40 percent. Their combined share of nominal investment rises from 7.4 percent in 1978 to 32.7 percent in 2008. Figure 1.3 also includes a version of $\bar{D}_t$ that excludes investment in computers and software, and we can see that, excluding computers and software, there is no decline in the average duration of investment over time. Over the sample, though, the correlation of the first differences of the two versions of $\bar{D}_t$ is over 90 percent. For the main regressions, I detrend all of the variables using the Hodrick-Prescott (HP) filter with a smoothing parameter of 25. I obtain similar results when I use a polynomial trend or take first differences (see table 1.2 for results in first differences).

Figure 1.3: Average Duration of Investment, 1948–2008 [plot omitted; vertical axis: years; series: all assets, and no computers or software]
Note: average duration is duration summed across all assets, weighted by nominal investment shares. Investment is obtained from the BEA fixed asset tables.

1.3 Results

Figure 1.1 plots HP-detrended $\bar{D}_t$ and the 10/1 year term spread at the end of the previous year (with the axis for $\bar{D}_t$ reversed). The negative relationship is immediately apparent. The term spread and average duration have a correlation of -54 percent. Gray bars indicate NBER-dated recessions.
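The duration measure in equation (1.1) and the investment-weighted index in equation (1.2), which underlie the series plotted in figure 1.1, can be sketched in a few lines. This is only an illustration: the function names and the two-asset investment figures are made up, and the numerical sum assumes the Macaulay weights are normalized by the asset's present value. Durations computed from the rounded depreciation rates printed in table 1.1 differ slightly from the printed durations, presumably because the table uses unrounded rates.

```python
def macaulay_duration(delta, r_star=0.03):
    """Closed-form duration (1 + r*) / (r* + delta) for geometric depreciation."""
    return (1.0 + r_star) / (r_star + delta)


def macaulay_duration_sum(delta, r_star=0.03, n_terms=5000):
    """Numerical check: present-value-weighted average payoff date.

    The payoff at horizon j is (1 - delta)**(j - 1); its present value is
    that payoff divided by (1 + r*)**j.
    """
    weighted_dates = 0.0
    present_value = 0.0
    for j in range(1, n_terms + 1):
        pv = (1.0 - delta) ** (j - 1) / (1.0 + r_star) ** j
        weighted_dates += j * pv
        present_value += pv
    return weighted_dates / present_value


def average_duration(investment, durations):
    """Investment-share-weighted average duration for one year."""
    total = sum(investment.values())
    return sum(investment[a] * durations[a] for a in investment) / total


# Hypothetical two-asset economy; the investment amounts are invented.
deltas = {"computers": 0.25, "engines": 0.05}
durations = {asset: macaulay_duration(d) for asset, d in deltas.items()}

balanced_mix = {"computers": 50.0, "engines": 50.0}
short_heavy_mix = {"computers": 70.0, "engines": 30.0}

d_balanced = average_duration(balanced_mix, durations)
d_short_heavy = average_duration(short_heavy_mix, durations)
print(durations, d_balanced, d_short_heavy)
```

Shifting the investment mix towards the high-depreciation asset lowers the index even though total investment is unchanged, which is the sense in which the index is not mechanically tied to the level of investment.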
In most recessions, the term spread rises due to the Fed cutting interest rates, and the duration of investment falls. Duration is often high just prior to recessions, e.g. 1970, 1990, 2001, and 2007, when the yield curve is inverted. Looking more closely, we can see that over time the term spread has become more volatile while $\bar{D}_t$ has become somewhat less volatile. This is a common finding: the real economy has become less volatile (the great moderation), while Federal Reserve policy has become more aggressive, causing higher volatility in interest rates.

Table 1.2 reports results of regressions of $\bar{D}_t$ on the first lag of the term spread. All of the variables in table 1.2 are standardized to have unit variance so that the regression coefficients indicate how a one standard deviation increase in the independent variables affects $\bar{D}_t$ in terms of its own standard deviation. The units of $\bar{D}_t$ have no deep economic meaning on their own.

As expected, in the first column we find a highly significant negative coefficient on the term spread and an $R^2$ of 0.30. This is a high value; Oliner, Rudebusch, and Sichel (1995), when forecasting the level of aggregate investment using models with as many as 11 lags of quarterly data, obtain at best an $R^2$ of 0.34. With a single variable, I am able to get an $R^2$ nearly as high for $\bar{D}_t$. Column two uses the term spread on corporate bonds instead of Treasuries and finds a nearly identical coefficient and $R^2$.

The third column of table 1.2 controls for the lagged level of $\bar{D}_t$. The coefficient on lagged duration is only marginally significant and the coefficient on the term spread is essentially unchanged. Column 4 shows that leading values of the term spread have no explanatory power for average duration, which is consistent with the theory that firms are responding to the cost of capital, rather than there being some underlying variable that causes the term spread and $\bar{D}_t$ to generally move together.
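Because every variable in table 1.2 is standardized, a regression on a single regressor has a simple interpretation: the slope equals the sample correlation and the $R^2$ equals its square. The sketch below illustrates this with invented numbers (not BEA or Treasury data); the helper names are hypothetical. As a rough consistency check on the reported results, squaring a standardized coefficient of about -0.56 gives roughly 0.31, close to the $R^2$ of 0.30 in the first column.

```python
import math


def standardize(values):
    """Demean and scale a series to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ols_slope_and_r2(y, x):
    """Slope and R^2 from an OLS regression of y on a constant and x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Made-up detrended annual series, for illustration only.
term_spread = [0.8, -0.3, 1.1, -1.2, 0.4, -0.6, 1.5, -0.9, 0.2, -1.0]
avg_duration = [0.1, -0.5, 0.2, -0.7, 0.9, -0.1, 0.3, -1.0, 0.6, 0.0]

# Regress duration in year t on the term spread in year t-1.
y = standardize(avg_duration[1:])
x = standardize(term_spread[:-1])
slope, r2 = ols_slope_and_r2(y, x)
print(slope, r2)
```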
Finally, the fifth column runs the basic regression using investment in all assets instead of equipment alone. The results still go through. The symmetrical regression using only structures investment is unenlightening because there is not enough variation in duration within structures to provide reasonable statistical power.

Table 1.2: Regressions of the average duration of investment

                    (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
                                                                      First differences
Assets:             Equip.    Equip.    Equip.    Equip.    All       Equip.    Within    Between
Term spread(t-1)    -0.56***            -0.49***  -0.58***  -0.31***  -0.41***  -0.28***  -0.14***
                    [0.11]              [0.12]    [0.14]    [0.12]    [0.11]    [0.06]    [0.06]
Corporate TS(t-1)             -0.52***
                              [0.09]
Duration(t-1)                           0.20*
                                        [0.11]
Term spread(t)                                    0.06
                                                  [0.09]
Term spread(t+1)                                  0.00
                                                  [0.08]
N, R2 (pairs in source order; alignment with columns uncertain): 59, 0.30; 59, 0.25; 47, 0.37; 59, 0.10; 58, 0.31; 58, 0.22; 58, 0.21; 58, 0.12

Note: * indicates significance at the 10 percent level, ** 5 percent level, *** 1 percent level. Annual data, 1950–2008, where available. The dependent variable is the average duration of investment. Investment and depreciation rates are obtained from BEA. The term spread is the 10-year minus the 1-year treasury yield at the end of the calendar year. The corporate term spread is the spread between the Moody's AAA corporate 30 year index and the St. Louis Fed's 3 month commercial paper yield. Columns 6 through 8 give results from first differenced regressions. Column 7 uses the effect of within-industry reallocation on average duration as the dependent variable. Column 8 is defined analogously using cross-industry reallocation. All variables are detrended with the HP filter with a smoothing parameter of 25 and standardized to have unit variance. Newey-West standard errors with a 3-year window are reported in brackets.

To test for a break in the relationship between $\bar{D}_t$ and the term spread, I use the sup-F test (also known as the Quandt likelihood ratio test).
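A sup-F (Quandt likelihood ratio) statistic can be sketched as the largest Chow F-statistic over a trimmed range of candidate break dates. The version below is a minimal illustration, not the exact procedure used for the results here: it assumes a break in both the intercept and the slope of a one-regressor regression, 15 percent trimming at each end, and homoskedastic errors, and it runs on an artificial series with a level shift built in at observation 30. The statistic has a nonstandard distribution, so it must be compared with critical values such as those in Andrews (1993) rather than with ordinary F tables.

```python
import math


def ssr_ols(y, x):
    """Sum of squared residuals from OLS of y on a constant and x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def sup_f(y, x, trim=0.15):
    """Return (largest Chow F-statistic, index of the first post-break observation)."""
    n = len(y)
    q = 2  # coefficients allowed to change at the break: intercept and slope
    ssr_restricted = ssr_ols(y, x)
    first = max(int(trim * n), 3)
    last = min(int((1 - trim) * n), n - 3)
    best_f, best_k = float("-inf"), None
    for k in range(first, last + 1):
        ssr_unrestricted = ssr_ols(y[:k], x[:k]) + ssr_ols(y[k:], x[k:])
        f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (
            ssr_unrestricted / (n - 2 * q)
        )
        if f_stat > best_f:
            best_f, best_k = f_stat, k
    return best_f, best_k


# Artificial data: a stable slope plus a level shift starting at observation 30.
n_obs = 60
x = [math.sin(t) for t in range(n_obs)]
y = [
    -0.5 * x[t] + 0.1 * math.cos(3 * t) + (5.0 if t >= 30 else 0.0)
    for t in range(n_obs)
]
stat, break_index = sup_f(y, x)
print(stat, break_index)
```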
We might expect that the break in this relationship would have appeared following the great moderation, when monetary policy became more aggressive and the economy less volatile. The F-test for a break, though, is maximized in 1958. Looking at figure 1.1, it is clear that after 1958 the volatility of $\bar{D}_t$ fell and the volatility of the term spread rose. The F-statistic for a break is never above the critical value reported in Andrews (1993) except in 1958 and 1959. The highest value outside those two years is 4.56 in 1992, below the 10 percent critical value of 5.00. There is thus evidence for a structural break, but not where we might have thought. For the period since 1960, we cannot reject the hypothesis that the relationship between $\bar{D}_t$ and the term spread has been stable.

1.4 Model

With the basic result in hand, it is useful to build a simple and stylized model to help understand where this correlation might come from. It is tempting to immediately jump to the conclusion that there is variation in the cost of capital (i.e. shocks to the supply of investment goods), which drives the result in figure 1.1. The model helps identify what other factors might induce a similar correlation.

I consider a standard infinite-horizon setup with a few simplifications for analytic tractability. Firms face a linear production function in each type of capital, where the current level of productivity for asset $i$ is $B_{it}$. That is, revenue is equal to

$$\sum_i B_{it} K_{it} \tag{1.3}$$

where $K_{it}$ is the stock of asset $i$ at date $t$. Note that this revenue function ignores complementarities between types of assets. In general, if a decline in the term spread is expected to shift investment towards long-duration assets, complementarity across assets will attenuate this effect (in the limit of a Leontief production function, firms would never vary the composition of the capital stock).
I follow Baxter and Crucini (1993) and Jermann (1998) in specifying the update process for capital as

$$K_{i,t+1} = (1-\delta_i)\,\phi_i(I_{i,t-1}) + \phi_i(I_{it}) \tag{1.4}$$

The update process for capital assumes that capital only operates for two periods for the sake of analytic simplicity. Each asset depreciates by the factor $(1-\delta_i)$ between its first and second period of operation, and is subsequently obsolete. $\phi_i$ incorporates adjustment costs in investment so that a unit of investment may create less than one unit of capital. $\phi_i$ takes the form

$$\phi_i(I_{it}) = \frac{\eta_{1i}}{1-1/\gamma}\, I_{it}^{\,1-1/\gamma} + \eta_{2i} \tag{1.5}$$

$\phi_i$ has the useful property that the elasticity of investment with respect to Tobin's Q will equal the constant $\gamma$.14 The parameters $\eta_{1i}$ and $\eta_{2i}$ determine the level of investment and the size of the adjustment costs paid and are allowed to vary across assets.15

14 This functional form has the drawback that it is not necessarily consistent with negative investment. However, asset-level investment is always positive in the data, so this is not a practical concern here.

15 These two parameters allow us to choose a steady state level of investment $x_i$ where $Q_i = 1$ and $\phi_i(x_i) = x_i$ and $\phi_i'(x_i) = 1$. That is, they allow us to choose a point where the firm pays no adjustment costs overall and on the margin.

Denoting the discount rate between dates $t$ and $t+1$ as $r_{t+1}$, the firm maximizes the discounted value of its revenue net of investment costs,

$$\Pi_t = \max_{\{I_{it}\}} \sum_{j=0}^{\infty} \sum_i \exp\left(-\sum_{k=1}^{j} r_{t+k}\right) E_t\left[B_{i,t+j} K_{i,t+j} - I_{i,t+j}\right] \tag{1.6}$$

where $E_t$ denotes the expectation operator conditional on information available at date $t$. All profits are discounted at the riskless rate $r_t$.

For productivity growth, I assume that different assets may have different current levels of productivity, but expected productivity growth in the future is the same for all assets, $E_t \log\left(B_{i,t+j}/B_{i,t+j-1}\right) = \mu_{t+j}$.
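The role of the two adjustment-cost parameters in equation (1.5) can be checked numerically. The sketch below is illustrative only: it solves for $\eta_{1i}$ and $\eta_{2i}$ from the two steady-state conditions $\phi_i(x_i) = x_i$ and $\phi_i'(x_i) = 1$, and then confirms that investment has a constant elasticity $\gamma$ with respect to Q. It assumes $\gamma \neq 1$ and the standard optimality condition $Q\,\phi'(I) = 1$, which is an assumption here rather than something stated in the text. All function names are hypothetical.

```python
import math


def calibrate_phi(x, gamma):
    """Choose eta1, eta2 so that phi(x) = x and phi'(x) = 1 (assumes gamma != 1)."""
    eta1 = x ** (1.0 / gamma)  # from phi'(x) = eta1 * x**(-1/gamma) = 1
    eta2 = x - eta1 / (1.0 - 1.0 / gamma) * x ** (1.0 - 1.0 / gamma)
    return eta1, eta2


def phi(i, eta1, eta2, gamma):
    """Capital created by investment i, as in equation (1.5)."""
    return eta1 / (1.0 - 1.0 / gamma) * i ** (1.0 - 1.0 / gamma) + eta2


def phi_prime(i, eta1, gamma):
    """Marginal capital created per unit of investment."""
    return eta1 * i ** (-1.0 / gamma)


def investment_given_q(q, eta1, gamma):
    """Investment solving q * phi'(I) = 1 (assumed optimality condition)."""
    return (eta1 * q) ** gamma


x_ss, gamma = 2.0, 0.5  # hypothetical steady-state investment and elasticity
eta1, eta2 = calibrate_phi(x_ss, gamma)

# No adjustment costs at the steady state, overall and on the margin.
print(phi(x_ss, eta1, eta2, gamma), phi_prime(x_ss, eta1, gamma))

# Elasticity of investment with respect to Q equals gamma.
elasticity = (
    math.log(investment_given_q(1.1, eta1, gamma))
    - math.log(investment_given_q(1.0, eta1, gamma))
) / math.log(1.1)
print(elasticity)
```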
Taking the first-order condition for $I_{it}$ gives

$$1 = \left[\exp(\mu_{t+1} - r_{t+1})\, B_{it} + \exp(\mu_{t+1} + \mu_{t+2} - r_{t+1} - r_{t+2})\, B_{it}\,(1-\delta_i)\right] \phi_i'(I_{it}) \tag{1.7}$$

The appendix shows that, using a first-order approximation, we can derive an approximate expression for the index of average duration, $\bar{D}_t$,

$$\bar{D}_t \approx d_0 + \gamma N^{-1} \sum_i \widehat{\log}(B_{it})\, D_i + F(\mu_{t+2} - r_{t+2}) \tag{1.8}$$

where $d_0$ is a constant, $N$ is the number of assets, $\widehat{\log}(B_{it}) \equiv \log(B_{it}) - N^{-1}\sum_i \log B_{it}$ is the deviation of the productivity of asset $i$ from the period average, and $F$ is a strictly increasing function.16 We can rewrite twice the term $\mu_{t+2} - r_{t+2}$ as

$$\begin{aligned}
2(\mu_{t+2} - r_{t+2}) ={}& \underbrace{(\mu_{t+1} + \mu_{t+2})}_{\text{Total prod. growth}} - \underbrace{(r_{t+1} + r_{t+2})}_{\text{Total discount rate}} \\
&+ \underbrace{(\mu_{t+2} - \mu_{t+1})}_{\text{Productivity spread}} - \underbrace{(r_{t+2} - r_{t+1})}_{\text{Term spread}}
\end{aligned} \tag{1.9}$$

The first line is total productivity growth and the total discount rate between periods $t$ and $t+2$. The term $(r_{t+2} - r_{t+1})$ is the relevant concept of the term spread, and I refer to $(\mu_{t+2} - \mu_{t+1})$ as the productivity growth spread.

16 Specifically, $F(x) = -\frac{\exp(x)}{1+\exp(x)(1-\bar{\delta})}\, N^{-1} \sum_i \hat{\delta}_i D_i$, where $\bar{\delta} \equiv N^{-1}\sum_i \delta_i$ and $\hat{\delta}_i \equiv \delta_i - \bar{\delta}$. Note that since $D_i$ is decreasing in $\delta_i$, $F$ is strictly increasing.

The previous section considered a simple regression of $\bar{D}_t$ on the term spread, $(r_{t+2} - r_{t+1})$. Holding all else equal (including total expected productivity growth and the total discount rate), equation (1.8) confirms the simple intuition that this relationship should be negative. Equation (1.8) shows, however, that there are at least four potential omitted variables in this regression: total expected productivity growth and discount rates between $t$ and $t+2$ ($\mu_{t+1} + \mu_{t+2}$ and $r_{t+1} + r_{t+2}$); the productivity growth spread, $(\mu_{t+2} - \mu_{t+1})$; and the levels of idiosyncratic productivity, $N^{-1}\sum_i \widehat{\log}(B_{it})\, D_i$.
First, holding the term spread and the productivity spread fixed, an increase in productivity growth ($\mu_{t+1} + \mu_{t+2}$) or a decrease in discount rates ($r_{t+1} + r_{t+2}$) will tilt the distribution of investment towards long-duration assets. This effect is the primary feature of duration: long-duration assets gain more value from a decline in interest rates or an increase in expected productivity growth than do short-duration assets. To the extent that the term spread is correlated with long-term average productivity growth and interest rates, then, a regression of the average duration of investment on the term spread will be biased. Specifically, we could spuriously find a negative relationship between the term spread and $\bar{D}_t$ if expected long-term productivity growth is low in periods when the term spread is high. The term spread is countercyclical, so this would correspond to a situation in which expected long-term productivity growth ($\mu_{t+1} + \mu_{t+2}$) is low during recessions. I will try to control for these effects by controlling for the level of aggregate investment and various other indicators of the state of the business cycle.

The second source of bias is that the productivity spread ($\mu_{t+2} - \mu_{t+1}$) could be correlated with the term spread. In particular, if productivity growth is expected to slow down in the same periods that the term spread is high, we would find a spurious negative relationship between average duration and the term spread. In this case, recessions would have to be periods in which productivity growth is expected to decelerate in the future, which seems unlikely given that recessions are periods when growth is already slow in the first place (by definition).

Finally, the levels of productivity across assets could be related to duration, affecting $\bar{D}_t$ through the $N^{-1}\sum_i \widehat{\log}(B_{it})\, D_i$ term, which can be thought of as the covariance between duration and productivity across assets.
If this covariance changes over time and is systematically related to the level of the term spread, then omitting it from the regression would bias the coefficient on the term spread.

Over long horizons, investment and productivity shift substantially across different assets. The most notable of these changes is the long-run decline in prices and increase in investment in computers and software (Tevlin and Whelan, 2003).17 The model would interpret this phenomenon as an increase in $B_{it}$ for low-duration assets, which drives $\bar{D}_t$ downward. A simple way to control for those movements is to detrend $\bar{D}_t$.

Short-run movements in idiosyncratic productivity are more difficult to account for, though. If changes in the term spread are correlated with shifts in productivity that favor certain assets, then the regression of $\bar{D}_t$ on the term spread will be biased. In the empirical analysis below, I discuss and control for some specific mechanisms, most importantly industry demand shifts, that could drive high-frequency movements in $N^{-1}\sum_i \widehat{\log}(B_{it})\, D_i$.

Instead of running a regression of average duration on the term spread, it would be nice to estimate a more fundamental parameter, such as the coefficient on marginal Q, which tells us about the size of adjustment costs in investment. One way to do that would be to calculate Tobin's Q for each asset individually, as in Abel and Blanchard (1986), using the full term structure of interest rates. The problem is that we do not actually directly measure the marginal product of any individual asset at any point in time. Moreover, we do not measure anything like the true discount rate for each asset. Rather, the term spread in this paper is measured using Treasury yields and is taken as an indicator of differences in discount rates across assets.
A deeper problem is that Abel and Blanchard's method would also require forecasting inflation at very long horizons, when the literature generally finds that inflation is difficult to forecast even at quarterly and annual horizons (e.g. Atkeson and Ohanian, 2001).18

17 See also Caballero, 1994, and Schaller, 2006, for studies of the relationship between investment and the cost of capital in the long run.

18 Euler equation estimation is also an option. In a pair of papers, Oliner, Rudebusch, and Sichel (1995, 1996) study the effectiveness and internal consistency of Euler equation models for investment. They obtain parameter estimates that are somewhat difficult to reconcile with economic theory, find that supposedly "structural" parameters are unstable over time, and that the models have little forecasting power. There are also legitimate concerns about the validity and relevance of the instruments used in these models (especially when extended to asset-level data). I attempted to estimate an Euler equation using the panel of data on asset-level investment. Between two-stage least squares, LIML, and GMM methods, there were substantial differences in results, indicating that the model is misspecified or there are problems with the instruments. I also replicated some of the troubling results found by Oliner, Rudebusch, and Sichel. Furthermore, Euler equations are clearly difficult to estimate even with quarterly data, and I only have annual data on asset-level investment. The Euler-equation method is also more restrictive than the methods used in this paper because it is difficult or impossible to incorporate all of the controls that I consider. Euler equations are useful for estimating specific parameters in tightly theorized models.
The regressions used here are meant to test a broader range of possible explanations for the correlation between average duration and the term spread and to measure the explanatory

The regression of average duration on the term spread is useful for testing whether the term spread drives investment in the direction that we would expect and how much explanatory power the term spread has for the cross-section of investment. A high $R^2$ in a regression of average duration on the term spread is evidence that the cross-section of interest rates is an important determinant of the cross-sectional distribution of investment.

1.5 Alternative explanations

The working hypothesis is that the negative relationship between average duration and the term spread is a simple cost-of-capital effect. The model in the previous section shows that there are a number of other factors that could cause us to find the correlation we observe in figure 1.1. This section considers a range of possible alternative explanations. I find that the correlation is driven to some extent by these other factors, but that the cost of capital retains a substantial amount of explanatory power and is generally the most powerful variable for explaining average duration.

1.5.1 Correlations by asset and industry

One possible explanation for the correlation between the term spread and $\bar{D}$ is that demand for the products of different industries depends on the term spread. For example, suppose that when the term spread is high consumers demand fewer durable goods (the term spread tends to be countercyclical, whereas durables purchases are procyclical; Yogo, 2006). If durable goods industries tend to use relatively more long-duration capital than services providers (for example, a car manufacturer may use more heavy machinery than a barber shop), then we would see investment shift towards low-duration assets. In the terms of the model, this is a story about the covariance term $\sum_i D(\delta_i)\left[\log(B_{i0}) - N^{-1}\sum_i \log(B_{i0})\right]$.
The correlation between $\bar{D}$ and the term spread then would be driven by consumer demand (and hence the variation in the marginal product across assets) instead of the cost of capital. We can test this hypothesis by decomposing $\bar{D}$ into components driven by within-industry reallocation and changes in the composition of investment across industries.

18 (continued) power of the term spread. I therefore leave the Euler equation analysis of this panel dataset for future work.

As noted above, the BEA not only reports data on aggregate investment; it also gives levels of investment at the asset×industry level. Denoting the first difference of $\bar{D}_t$ as $\Delta\bar{D}_t$, we can decompose $\Delta\bar{D}_t$ following van Ark and Inklaar (2006) using the industry-level data as

$$\Delta\bar{D}_t = \sum_j \left(\frac{I_{j,t}}{\bar{I}_t} - \frac{I_{j,t-1}}{\bar{I}_{t-1}}\right) \frac{1}{2}\left(\bar{D}_{j,t} + \bar{D}_{j,t-1}\right) + \sum_j \frac{1}{2}\left(\frac{I_{j,t}}{\bar{I}_t} + \frac{I_{j,t-1}}{\bar{I}_{t-1}}\right) \left(\bar{D}_{j,t} - \bar{D}_{j,t-1}\right) \tag{1.10}$$

where $\bar{D}_{j,t} \equiv \sum_i \frac{I_{j,i,t}}{I_{j,t}} D_i$ is the average duration of industry $j$ at time $t$. The first part of equation (1.10) can be thought of as a cross-industry reallocation effect. It sums the changes in the industry investment shares, weighting by their average durations at dates $t$ and $t-1$. The second term is the within-industry reallocation term. It represents the effects of industries changing their mix of investment among different assets. I refer to the two effects as the between- and within-industry effects, respectively.

The final three columns of table 1.2 report results from first-differenced regressions of $\Delta\bar{D}$ and its decomposition (1.10) on the change in the term spread. $\Delta\bar{D}$ and $\Delta TS$ are standardized to have unit variance as in the remainder of the table. The three columns report results from regressions with different dependent variables. The first of these columns (column 6) uses $\Delta\bar{D}$. The coefficient on the term spread is similar to, though somewhat smaller than, the coefficient in column 1.
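The decomposition in equation (1.10) is an exact accounting identity, which the following sketch illustrates with made-up numbers. It is a minimal example under stated assumptions: two hypothetical industries ("A" and "B"), two hypothetical assets with durations of 2 and 10 years, and invented investment figures. The function names are illustrative and not taken from the source.

```python
def avg_duration(investment_by_asset, durations):
    """Investment-weighted average duration for one industry (or the aggregate)."""
    total = sum(investment_by_asset)
    return sum(i * d for i, d in zip(investment_by_asset, durations)) / total


def decompose(inv_prev, inv_curr, durations):
    """Split the change in aggregate average duration into (between, within).

    inv_prev / inv_curr map each industry to a list of investment by asset.
    """
    total_prev = sum(sum(v) for v in inv_prev.values())
    total_curr = sum(sum(v) for v in inv_curr.values())
    between = 0.0
    within = 0.0
    for j in inv_curr:
        share_prev = sum(inv_prev[j]) / total_prev
        share_curr = sum(inv_curr[j]) / total_curr
        dur_prev = avg_duration(inv_prev[j], durations)
        dur_curr = avg_duration(inv_curr[j], durations)
        # Change in industry share, weighted by the industry's mean duration.
        between += (share_curr - share_prev) * 0.5 * (dur_curr + dur_prev)
        # Change in industry duration, weighted by the industry's mean share.
        within += 0.5 * (share_curr + share_prev) * (dur_curr - dur_prev)
    return between, within


durations = [2.0, 10.0]  # hypothetical asset durations in years
inv_prev = {"A": [10.0, 10.0], "B": [10.0, 10.0]}
inv_curr = {"A": [15.0, 5.0], "B": [20.0, 20.0]}

between, within = decompose(inv_prev, inv_curr, durations)

# Aggregate average duration computed directly, for comparison.
agg_prev = avg_duration([sum(a) for a in zip(*inv_prev.values())], durations)
agg_curr = avg_duration([sum(a) for a in zip(*inv_curr.values())], durations)
print(between, within, agg_curr - agg_prev)
```

In this example industry A tilts towards the short-lived asset (a negative within effect) while investment shifts towards industry B, whose average duration is higher (a positive between effect). The two pieces sum to the change in the aggregate.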
In other words, the relationship between $\bar{D}$ and the term spread is somewhat weaker in high-frequency data, which is perhaps not surprising considering the effects of planning, ordering, and building lags. The coefficients in columns 7 and 8 by definition sum to the coefficient in column 6. The within-industry coefficient is twice the size of the between-industry coefficient; in other words, two thirds of the aggregate effect comes from reallocation within industries. The hypothesis that industry demand is correlated with the term spread seems to be true, but it explains only a minority of the variation in average duration over time.

To analyze how the relationship in figure 1.1 and table 1.2 differs across assets, I run a regression of each asset's share of aggregate investment on the term spread.19 Specifically, for each asset we run the regression

$$\frac{I_{it}}{\sum_i I_{it}} = \alpha_i + \beta_i\, TS_t + \varepsilon_{it} \tag{1.11}$$

It is straightforward to show that if $\beta_i$ is negatively related to each asset's duration, then there will be a negative relationship between the term spread and $\bar{D}_t$. This is a way of asking whether the relationship we observe at the aggregate level is pervasive across assets, or is driven by a few outlier assets.

Figure 1.4 plots the coefficients $\beta_i$ against duration. The black boxes are for equipment, grey diamonds structures. Regression lines are included for the sample of all assets and for equipment only. The correlations between $\beta_i$ and $D_i$ are -0.42 and -0.31 for equipment only and all assets, respectively. Looking across equipment, the relationship between the composition of investment and the term spread is broadly based and not driven by a few outliers. The plot includes labels for the assets that make up the largest part of investment over the last 15 years. Numbers in parentheses represent their percentage shares over that period.
Within equipment, auto purchases as a share of total investment are far more positively correlated with the term spread than any other asset, though they represent a relatively small part of aggregate investment. Software is the single largest component of investment and it is well above the best fit line. Communication equipment and computers are next in the rankings and are somewhat closer to the regression line.

Structures do not match the results for equipment very well. While the shares of structures are generally negatively related to the term spread, they are not as negative as we would think from just looking at equipment. Electric-power plants, in particular, are a large positive outlier. As noted above, the fact that building lags average over a year (a time that does not take into account the time required for planning) is likely to distort the regressions for structures.

19 To control for long-term changes in the composition of investment I first detrend the dependent variable and the term spread using the HP filter with a smoothing parameter of 25 as above.

Figure 1.4: Coefficients from regressions of investment shares on the term spread
[Figure: scatter plot of the regression coefficient (vertical axis) against duration (horizontal axis), with best-fit lines for the full sample and for equipment only. Labeled assets, with investment shares in parentheses: Cars (3.4), Software (14.9), Trucks and buses (6.9), Medical equipment (3.7), Electric power plants (2.0), Furniture and fixtures (3.3), Computers (7.4), Construction machinery (2.1), Communication equipment (8.2), Railroad equipment (0.6), Petroleum and natural gas (4.0), Office/medical buildings (4.3), Commercial buildings (5.3), Metalworking machinery (2.6).]
Note: Coefficients from regressions of each asset's share of aggregate investment on the term spread. Both variables are detrended with the HP filter with a smoothing parameter of 25. Numbers in parentheses are investment shares over 1993–2008.
1.5.2 The business cycle, volatility, and other explanations

Table 1.3 explores a number of other mechanisms that could cause the observed correlation beyond changes in demand across industries. Columns 1 and 2 control for the business cycle with the lagged detrended unemployment rate and level of output. In both cases the coefficient on the term spread is smaller but still statistically and economically significant. This is perhaps not surprising: even if the term spread does represent a true cost-of-capital effect, it is also a proxy for the business cycle. Controlling for other business cycle indicators will probably lower its coefficient. Including the current value and longer lags of unemployment and output does not change the results of the regressions.

Another obvious question is whether there is a mechanical relationship between average duration and the level of investment. Suppose a firm has equal stocks of two assets, one with a depreciation rate of 1 percent, the other 10 percent. In a maintenance phase with no net capital growth, there will be 10 times as much investment in the high-depreciation asset as in the low-depreciation asset. However, in an expansion phase, assuming both assets are expanded equally, investment will shift towards being equally balanced between the two assets. If the term spread is correlated with the level of investment, it might also then be correlated with average duration. Column 3 tests that hypothesis by including detrended aggregate equipment investment. Puzzlingly, unlike the example just given, when investment is high, duration actually tends to be low. However, the coefficient on the term spread is still large and significant. The term spread thus has explanatory power beyond its indication of either the business cycle or the overall level of investment.
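The two-asset example above can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it assumes the usual accumulation rule in which gross investment equals the capital stock times the sum of the growth rate and the depreciation rate, and the 5 percent growth rate used for the expansion phase is a hypothetical number. The function names are not from the source.

```python
def gross_investment(stock, depreciation, growth):
    """Investment needed for the stock to grow at rate `growth` net of depreciation."""
    return (growth + depreciation) * stock


def long_lived_share(growth, stock=100.0, delta_long=0.01, delta_short=0.10):
    """Share of total investment going to the low-depreciation asset."""
    inv_long = gross_investment(stock, delta_long, growth)
    inv_short = gross_investment(stock, delta_short, growth)
    return inv_long / (inv_long + inv_short)


# Maintenance phase: no net growth, investment only replaces depreciation.
maintenance_ratio = gross_investment(100.0, 0.10, 0.0) / gross_investment(100.0, 0.01, 0.0)
maintenance_share = long_lived_share(0.0)   # 1 / 11

# Expansion phase: both stocks grow by a hypothetical 5 percent.
expansion_share = long_lived_share(0.05)    # 6 / 21

print(maintenance_ratio, maintenance_share, expansion_share)
```

With these numbers the long-lived asset's share of investment rises from about 9 percent to about 29 percent, which is the mechanical channel that column 3 is meant to control for.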
Column 4 shows that if we include all three aggregate indicators (unemployment, GDP, and investment), the coefficient on the term spread is the highest, and has the highest t-statistic, of any of the variables (implying that the marginal $R^2$ of the term spread is higher than that of any of the business-cycle indicators).

Table 1.3: Robustness tests
Term Spread(t-1) (2) -0.33 *** [0.11] (3) -0.65 *** [0.11] (5) -0.41 *** [0.09] (6) -0.36 *** [0.10] (7) -0.37 *** [0.10] Unemployment(t-1) 0.38 *** [0.10] -0.23 ** [0.11] 0.37 *** [0.12] 0.31 ** [0.14] (1) -0.27 ** [0.12] -0.45 *** [0.14] GDP(t-1) 0.38 *** [0.09] Investment(t) (4) -0.44 *** [0.08] -0.01 [0.19] 0.40 ** [0.17] -0.29 *** [0.09] SD_profits(t+1) SD_returns(t+1) -0.01 [0.09] -0.31 *** [0.08] -0.24 *** [0.08] 58 0.45 59 0.45 59 0.35 58 0.52 Bank tightness(t) Value spread(t) 44 0.65 37 0.62 N R2 -0.22 *** [0.07] 59 0.49
Note: See table 1.2. The dependent variable is the detrended average duration of equipment investment. The value spread is the gap between log book/market (B/M) for the top and bottom 30 percent of firms ranked by B/M, among the smaller 50 percent of firms, measured at the beginning of the year. SD_profits and SD_returns are the cross-sectional standard deviations of quarterly firm profit growth and stock returns, controlling for a time trend and 3-digit industry dummies. The unemployment rate is the national rate obtained from the BLS. GDP is real GDP from the BEA. Bank tightness is the Fed's Survey of Senior Loan Officers index (from Lown and Morgan, 2006). Investment is aggregate real nonresidential equipment investment. All variables are detrended with the HP filter with a smoothing parameter of 25 (except for the value spread, for which it is 100), and standardized to have unit variance.

Abel et al. (1996), among many others, study the effects of irreversibility on investment. With irreversibility, when idiosyncratic uncertainty is high, firms may be less willing to invest in long-duration assets.
Intuitively, if it is more difficult to sell a long-duration asset (e.g. a large wind turbine) because it is more costly to disassemble than a short-duration investment, then there is option value to delaying investment which is increasing in uncertainty.20 Campbell et al. (2001) and Bloom (2009) find that when the volatility of returns on the aggregate stock market is high, so is idiosyncratic firm volatility. If the term spread is partially driven by aggregate volatility (a finding of Bloom, 2009, and implied by many term structure models, e.g. Longstaff and Schwartz, 1992), and volatility drives $\bar{D}$, then we would find a spurious correlation between the term spread and $\bar{D}$.

I use two measures of cross-sectional volatility that are also used in Bloom (2009): the period-by-period cross-sectional standard deviations of firm quarterly profit growth and stock returns, including controls for 3-digit SIC industries.21 Column 5 of table 1.3 reports results of a regression of $\bar{D}$ on the volatility indexes. Both measures of volatility are positively correlated with the next year's term spread, which is consistent with Bloom's (2009) results. He finds that volatility shocks lead to economic contractions and reductions in the short rate. Table 1.3 shows that conditional on the term spread and the state of the business cycle, high stock return volatility (though not profit growth volatility) in the following year is associated with low-duration investment. This is consistent with the hypothesis that long-duration investment involves a bigger commitment for firms than short-duration investment. That is, the hypothesis that high volatility interacts with fixed costs of adjustment to decrease investment seems to apply more strongly to long-term than to short-term assets. Note, though, that even when controlling for volatility, the term spread remains significant and has a large coefficient.
Another alternative hypothesis is that the term spread does not reflect the cost of capital but is simply an indicator of the stance of monetary policy. When the Federal Reserve contracts the money supply, this may inhibit bank lending, as in Kashyap and Stein (2000). If banks are more likely to finance projects of a certain duration (either high or low), then the term spread might simply be correlated with movements in $\bar{D}$ because it is correlated with bank lending standards.

20 House and Shapiro, 2008, discuss the relationship between real option-type effects and asset duration.

21 The original data was retrieved from Compustat and CRSP. I obtained the data used here from Nick Bloom's website.

One way to test this hypothesis is to try to directly measure bank lending standards. The Federal Reserve has administered a Survey of Senior Loan Officers since 1967 (with a gap between 1983 and 1989) that asks banks about the level of their lending standards.22 Column 6 includes the tightness index from this survey in the regression. The coefficient on the term spread remains significant. When bank lending standards are relatively tight (a high value of the index), average duration is low. This is perhaps surprising, since banks are usually thought of as financing short-duration projects, while firms go to credit markets for longer-term financing. One possible explanation is that lending standards tend to be high when other factors are driving firms towards short-duration investment. In particular, standards might be high in times of high uncertainty.

The appendix includes further robustness tests. When all of the controls are included simultaneously, the term spread is the only significant variable and it has more explanatory power than any of the other variables individually.
Lettau and Wachter (2007) argue that the differences in returns between high and low book/market (B/M) stocks can be explained by differences in the duration of their cash flows (see also Hansen, Heaton, and Li, 2008). A high value spread is associated with a high valuation for growth stocks, or long-duration assets, which implies investment in long-duration assets should be high. Since stock prices represent claims on capital, whereas Treasury bonds are claims on currency, we might expect that the value spread would have more predictive power than the term spread.

Column 7 of table 1.3 reports the results of a regression including the value spread. I measure the value spread here as the ratio of the book to market ratios for the top and bottom third of stocks sorted by book to market (as reported on Kenneth French's website).23 The coefficient is significantly negative: the opposite of what the duration theory of the value spread would predict. One possible explanation for this result is that firms with growth stocks tend to have lower-duration assets (e.g. technology firms), so when their values are high average duration falls. To many readers, that may have been the obvious result all along. Nevertheless, it runs against Lettau and Wachter's theory.

22 I obtain data from Lown and Morgan, 2006.

23 Specifically, French reports value spreads for small and large stocks, split at the median of market capitalization. I average these two value spreads. Furthermore, I detrend the value spread using the HP filter with a smoothing parameter of 100.

1.6 Consumer durables

If the term spread truly represents a cost of capital effect then we would expect household purchases of durable goods to respond to it in a manner similar to nonresidential investment. Households face some of the same choices as firms when deciding what types of durable goods to purchase.
In particular, long-lasting durable goods may have financing arrangements with longer terms than those of shorter duration assets.24 Denoting the duration of durable good of type $i$ as $C_i$ and purchases as $P_{it}$, I define the average duration of consumer durables purchases as

$$\bar{C}_t \equiv \frac{\sum_i C_i P_{it}}{\sum_i P_{it}} \tag{1.12}$$

24 Attanasio, Goldberg, and Kyriazidou (2008) show that auto loan terms tend to be between three and five years, while home loans may be as long as 30 years.

Table 1.4 lists the assets available from the BEA, along with their depreciation rates and durations. The two assets with the lowest depreciation rates are luggage and furniture at 13 percent. Computer software and motor vehicle parts have the highest rates at 76 and 90 percent, respectively. The assets are mostly clustered in a small range of depreciation rates, though: three fourths have depreciation rates between 16 and 25 percent.

Figure 1.5 plots HP-detrended $\bar{C}_t$ against the detrended term spread. As in figure 1.1, the axis for $\bar{C}_t$ is reversed so that a negative correlation in the data is an easier-to-read positive correlation in the figure. For most of the sample, there is a strong negative correlation, just as we observe for nonresidential investment. In a regression similar to those in table 1.2, of the average duration of consumer durables purchases on the lagged term spread, the coefficient is -0.31 with a p-value of 0.008. There is thus a significant relationship over the full sample, though the correlation is somewhat weaker than what we observe for nonresidential investment. The correlation is clearest between 1965 and 1991. For nonresidential investment the correlation is more consistent over time, which explains why the QLR test in section 1.3 indicated a break point
only in the very beginning of the sample.

The relationship between $\bar{C}_t$ and the term spread seems to abruptly break down after 1991. If we run a QLR test as before, we can reject the hypothesis of no break at the 1 percent level. The F-statistic is maximized in 1991, only one year different from the local maximum that is obtained in the F-statistic for nonresidential investment.25 The fact that these two break tests are maximized around the same time suggests that the breakdown in the consumer durables plot is not due to a factor that is specific to consumers.

Table 1.4: Consumer durables, depreciation rates, and durations

Asset                                        Depreciation rate    Duration
                                             (annual fraction)    (years)
Motor vehicles and parts
  Autos                                      0.28                 3.27
  Light trucks                               0.25                 3.70
  Motor vehicle parts & accessories          0.90                 1.11
Furnishings and household equipment
  Furniture                                  0.13                 6.63
  Clocks, lamps, lighting fix & other        0.18                 4.90
  Carpets and other floor coverings          0.18                 4.91
  Window coverings                           0.18                 4.89
  Household appliances                       0.16                 5.37
  Glassware, tableware, & household uten     0.18                 4.90
  Tools & equipment for house & garden       0.18                 4.90
Recreational goods and vehicles
  Video & audio equipment                    0.20                 4.45
  Photographic equipment                     0.18                 4.92
  Personal computers and peripheral equip    0.44                 2.21
  Computer software & accessories            0.76                 1.31
  Calcs, typewrtrs, & oth info proc equip    0.18                 4.91
  Sporting equip, supplies, guns, & ammo     0.18                 4.91
  Motorcycles                                0.18                 4.91
  Bicycles & accessories                     0.18                 4.92
  Pleasure boats                             0.18                 4.90
  Pleasure aircraft                          0.18                 4.91
  Other recreational vehicles                0.26                 3.52
  Recreational books                         0.18                 4.91
  Musical instruments                        0.20                 4.46
Other durable goods
  Jewelry & watches                          0.16                 5.36
  Therapeutic appliances & equip             0.32                 2.95
  Educational books                          0.18                 4.91
  Luggage & similar personal items           0.13                 6.63
  Telephone & facsimile equipment            0.18                 4.88

Note: Depreciation rates are obtained from the BEA. Duration is measured as 1.03/(0.03+δ).
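The duration formula in the note to table 1.4 and the average in equation (1.12) can be combined in a short calculation. This is a sketch under stated assumptions: the depreciation rates are the rounded values from table 1.4 (so durations may differ slightly from the table, presumably because the table uses unrounded rates), and the purchase amounts are invented purely for illustration. The function names are hypothetical.

```python
def duration(delta, r=0.03):
    """Duration in years implied by depreciation rate delta: (1 + r) / (r + delta)."""
    return (1.0 + r) / (r + delta)


def average_duration(purchases, durations):
    """Purchase-weighted average duration, as in equation (1.12)."""
    total = sum(purchases)
    return sum(p * c for p, c in zip(purchases, durations)) / total


# Rounded depreciation rates from table 1.4.
rates = {
    "Motor vehicle parts & accessories": 0.90,
    "Computer software & accessories": 0.76,
    "Furniture": 0.13,
}
durations_by_asset = {name: duration(d) for name, d in rates.items()}

# Hypothetical purchase amounts for two periods (not data from the source).
purchases_low_spread = [10.0, 10.0, 30.0]   # tilted towards furniture
purchases_high_spread = [20.0, 20.0, 10.0]  # tilted towards short-lived goods

c_values = list(durations_by_asset.values())
print(round(durations_by_asset["Motor vehicle parts & accessories"], 2))
print(average_duration(purchases_low_spread, c_values))
print(average_duration(purchases_high_spread, c_values))
```

Shifting purchases towards the high-depreciation goods lowers the average, which is the sense in which $\bar{C}_t$ summarizes the composition of durables spending.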
One possible consumer-specific explanation is that there was some sort of change in consumer credit markets around 1991. Perhaps easier access to credit cards made consumers less dependent on long-term financing for some durables purchases, which made them less sensitive to long-term credit conditions. The Flow of Funds accounts measure total credit card balances and household net worth. The ratio of consumer credit debt to net worth rises from 1.0 to 3.8 percent between 1945 and 1965, but then stays flat subsequently. While there were certainly changes in consumer credit markets following 1965, the total quantity of credit has remained in this sense stable.

25 Note, again, that the local maximum for nonresidential investment is not statistically significant.

Figure 1.5: Average duration of consumer durable purchases versus the lagged term spread
[Figure: time series, 1951–2006, of the term spread and of average duration (reversed axis).]
Note: Average duration of consumer durables is defined analogously to that for durable equipment. Both lines represent HP-detrended values. The axis for average duration is reversed. Grey bars indicate NBER-dated recessions.

1.7 The firm-level mechanism

To augment the analysis above, this section studies two aspects of firm-level investment. I begin by asking whether firms that invest in long-duration assets also tend to sell long-term debt. Next, I look at whether industries with larger cash holdings are more sensitive to the term spread. The data answer both these questions in the affirmative, but when we include a full set of industry and year dummies the results go away, possibly because of insufficient statistical power. The first result indicates that when firms go to debt markets, the interest rate that they face depends on the duration of the investment they plan on undertaking. The second result shows that firms that are more likely to have to go to debt markets seem to vary the composition of their investment more strongly in response to interest rates.

1.7.1 The maturity of assets and debt

The link between the term spread and the cost of capital will be most clear to managers if investment in long-duration assets is financed with long-duration debt. If, for example, firms always borrow at the same maturity and simply roll over their debt, then they might only pay attention to the interest rate for the maturity at which they borrow, instead of the full term structure.

Baker, Greenwood, and Wurgler (BGW, 2003) find that firms time the debt market when they sell bonds. In particular, when the term spread is high firms sell short-term debt. BGW argue that firms do this because when the term spread is high, the prices of short-term bonds are expected to fall in the future. Firms are selling expensive or overpriced debt, which BGW claim represents arbitrage. But if it is true that firms try to match the maturity of their debt to the maturity of their investments, then the results in the previous sections could explain the BGW result.

Matching the maturity of debt to assets reduces potential deadweight losses from bankruptcy (see, e.g., Stohs and Mauer, 1996). Graham and Harvey (2002) report evidence from surveys that maturity matching is the single most important determinant of debt maturity choice.26 Section 1.3 showed that when short-term yields are low, firms invest in short-duration assets. If the maturity-matching hypothesis is correct then those firms should also sell short-duration debt. That matches the Baker et al. result: low short yields are associated with short-duration investment, which is associated with sales of short-duration debt. BGW claim that firms are arbitraging debt markets; I claim they are managing risk through maturity-matching.
The key to completing the argument is showing that firms actually do try to match the duration of their debt to that of their assets. In this section I provide evidence in support of this proposition.27

26 See also Barclay and Smith, 1995, and Guedes and Opler, 1996, among many others.
27 Baker et al. tried measuring the duration of assets with a similar strategy to mine. However, rather than using industry depreciation reported by the BEA, they used the amount of depreciation reported to the IRS by individual firms. Presumably these data were substantially noisier than the BEA data, which caused them to find inconclusive results. Moreover, accounting depreciation is in general not the same as economic depreciation. The majority of firms use straight-line depreciation, rather than the declining-balance method found to better match the resale value of assets (Hulten and Wykoff, 1981).

I obtain data from two sources. Data on capital stocks come from the BEA’s detailed fixed asset tables as before.28 I continue to measure average duration within industry $j$ as

$$\bar{D}_{jt} = \frac{\sum_i D_i I_{ijt}}{\sum_i I_{ijt}} \tag{1.13}$$

where $i$ indexes assets, $j$ indexes industries, and $I$ is investment.

I obtain data on corporate debt from Compustat. Following Baker et al. (2003) and Greenwood et al. (2009), the long-term share in a given industry and year is the sum of all outstanding long-term debt reported by firms in that industry divided by all long and short-term debt.29 I estimate issuance of long-term debt as the change in the level of long-term debt, and short-term issuance as simply the level of short-term debt (since short-term debt has, by definition, a maturity of less than one year). The long-term issuance share is then just the ratio of long-term issuance to total issuance.30

An important issue here is that Compustat only covers publicly traded firms, whereas the BEA’s fixed-asset data covers all firms.
To the extent that private firms have limited access to long-term credit markets, this will bias the level of the long-term share upwards.31 It is less clear, though, that selection should cause us to spuriously find that high-depreciation industries have a low long-term share. The selection would need to occur in such a way that firms in high-depreciation industries are more likely to go public but are no more likely to have access to long-term credit markets.

28 The BEA has its own industry classification which is slightly different from NAICS. I use industries that roughly correspond to a 2-digit NAICS classification, but I combine some industries to ensure that I have sufficient firm observations to get good financial data. I end up with 22 industries.
29 I measure long-term debt as the sum of items 9 (long-term borrowing) and 44 (long-term debt about to retire), and short-term debt as item 9 plus item 34 (current liabilities) minus long-term debt.
30 I drop firm observations if the level of long-term debt drops by more than one half (as this amount of retirement is implausible). Industry-year observations are dropped if they have a negative level of long-term debt issuance.
31 For example, Titman and Wessels, 1988, find that small firms are less likely to use long-term debt financing.

Table 1.5 reports regressions of the long-term level and issuance shares on industry average duration. The first two columns use the level share, the second two the issue share. Columns 2 and 4 include industry fixed effects. Each regression includes year dummies and the standard errors are corrected for clustering within industries. Columns 1 and 3 show that there is a significant negative relationship between long-term debt levels and issuance and the depreciation rate of assets in an industry. However, columns 2 and 4 show that when we include industry dummies the effect goes away.
That is, there is no evidence that when an industry shifts towards higher-depreciation assets, it also changes the composition of its debt.

One reason I do not find within-industry effects in table 1.5 could be that the data are not sufficiently precise. The median number of firms that is used to create the industry×year observations is only 148, and the 25th percentile is 30. Moreover, the measure of the duration of debt is extremely rough. Firms could easily be changing the maturity of their long-term issues, rather than substituting between long and short-term issues.32

32 While there is data with more detail on the duration of corporate debt, it does not have a long enough time series to be useful for finding the aggregate effects that I am looking for here.

Table 1.5: Regressions of the long-term corporate level and issues shares

                   (1)         (2)         (3)         (4)
                   Levels      Levels      Issues      Issues
Duration           0.013 ***   -0.008      0.021 **    -0.008
                   [0.005]     [0.006]     [0.11]      [0.012]
Fixed Effects?     No          Yes         No          Yes
N                  1,040       1,040       1,023       1,023

Note: The long term level share is the share of total corporate debt accounted for by long term (>1 year maturity) debt. The issues share is the share of issues accounted for by long term debt. Duration is the average duration rate of the industry's capital stock. All regressions include year dummies. Standard errors reported in brackets are corrected for clustering within industries. Annual data for 22 industries, 1950–2008.

1.7.2 Cash reserves and investment

If firms match the maturities of assets and debt, as suggested by table 1.5, then when they borrow to finance investment, shifts in the term spread directly feed into their cost of capital, and presumably their investment decisions. However, when firms finance investment internally, we might think they simply use a rule-of-thumb method for the cost of capital, ignoring the term structure of interest rates (e.g. Graham and Harvey, 2002). I study this question by looking at how investment differs across industries with different cash holdings. I study the following regression

$$\bar{D}_{j,t} = \alpha_j + \beta_1 TS_{t-1} + \beta_2 CH_{j,t} + \beta_3 TS_{t-1} \times CH_{j,t} + \varepsilon_{j,t} \tag{1.14}$$

where, as before, $TS_t$ is the term spread and $CH_{j,t}$ is a measure of cash holdings in industry $j$ at time $t$. The coefficient $\beta_3$ measures the effect of cash holdings on the response of an industry’s average duration of investment to the term spread. Under the hypothesis that firms that can finance investment internally respond less to the term spread, we should observe a positive value for $\beta_3$.

I use two measures of an industry’s ability to finance investment internally: its total current cash holdings and its cash flows (income before extraordinary items), both scaled by current property, plants, and equipment (PPE).33 One of the measures for $CH_{j,t}$ is a flow, while the other is a stock. If industries differ in the amount of cash that they prefer to hold at any given time, then cash flows might be more relevant for their ability to finance investment internally. On the other hand, cash reserves could be saved precisely for use in future investment, and hence represent a source of funds for investment.34 I test both possibilities.

As above, I obtain data from Compustat and detrend $\bar{D}_{j,t}$ and $TS_t$ with the HP filter. I subtract industry means from the measure of cash holdings, $CH_{j,t}$, so that the coefficient on the interaction term, $\beta_3$, represents the change in the response of $\bar{D}_{j,t}$ to the term spread depending on the difference between the industry’s current cash holdings and the sample average for that industry.

Columns 1 and 2 of table 1.6 report estimates of equation (1.14). Column 1 shows that cash flows seem to have no significant relationship with the response of $\bar{D}_{j,t}$ to the term spread. In column 2 we see, though, that cash holdings have a strong effect.
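The interaction specification in equation (1.14) can be sketched in a few lines of Python. This is only an illustrative sketch, not the estimation code behind table 1.6: the function name `interaction_regression`, the simulated panel, and the generating coefficients are all hypothetical, and the sketch omits HP detrending and industry-clustered standard errors.

```python
import numpy as np


def interaction_regression(duration, ts_lag, cash, industry):
    """OLS sketch of equation (1.14) with industry fixed effects.

    Returns the point estimates (beta_1, beta_2, beta_3).
    """
    duration = np.asarray(duration, dtype=float)
    ts_lag = np.asarray(ts_lag, dtype=float)
    cash = np.asarray(cash, dtype=float)

    # Map industry labels to integer codes 0..J-1.
    labels, codes = np.unique(np.asarray(industry), return_inverse=True)

    # Subtract each industry's sample mean from its cash measure, so that
    # beta_1 is the response at the industry's average cash holdings.
    industry_means = np.bincount(codes, weights=cash) / np.bincount(codes)
    cash_dm = cash - industry_means[codes]

    # One dummy per industry plays the role of alpha_j (no separate constant).
    dummies = (codes[:, None] == np.arange(len(labels))[None, :]).astype(float)

    X = np.column_stack([ts_lag, cash_dm, ts_lag * cash_dm, dummies])
    coef, *_ = np.linalg.lstsq(X, duration, rcond=None)
    return coef[:3]


# Hypothetical, noise-free panel (5 industries, 40 years), used only to check
# that the function recovers the coefficients that generated the data.
rng = np.random.default_rng(0)
n_ind, n_years = 5, 40
industry = np.repeat(np.arange(n_ind), n_years)
ts_lag = np.tile(rng.normal(size=n_years), n_ind)
cash_dev = rng.normal(size=(n_ind, n_years))
cash_dev -= cash_dev.mean(axis=1, keepdims=True)  # zero mean within industry
cash = (rng.uniform(1.0, 2.0, size=(n_ind, 1)) + cash_dev).ravel()
cash_dev = cash_dev.ravel()
alpha_j = rng.normal(size=n_ind)[industry]
duration = alpha_j - 0.15 * ts_lag + 0.03 * cash_dev + 0.07 * ts_lag * cash_dev

b1, b2, b3 = interaction_regression(duration, ts_lag, cash, industry)
```

Because cash holdings are demeaned within industry before the interaction is formed, the coefficient on the lagged term spread can be read as the response for an industry at its own average level of cash, which is how the estimates are interpreted in the text.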
For an industry at its sample-average level of cash holdings, $\bar{D}$ falls by 0.15 standard deviations for every one-standard-deviation increase in the term spread. For an industry with cash holdings one standard deviation above their mean, this response falls to only 0.08 standard deviations. This result fits with the hypothesis that when firms are forced to finance investment in credit markets they are more sensitive to the cost of capital.

Columns 3 and 4 are the same as columns 1 and 2 except that I replace $CH_{j,t}$ with the residual from regressing $CH_{j,t}$ on a set of year and industry dummies. The residual thus measures the deviation of an industry’s cash holdings relative to both its sample average and the average for the year. This controls for general trends in cash holdings over time. The coefficients on the interaction terms are no longer significant. As before, this may simply be a power issue. Note that the standard error on the interaction term is substantially larger in column 4 than column 2. We in fact cannot reject that the two coefficients are equal.

33 The results are unchanged if we use free cash flow instead of income.
34 See Opler et al., 1999, for an analysis of the determinants of cash holdings and their use for investment, and the large literature following Fazzari, Hubbard, and Petersen, 1988, on the relationship between investment and cash flows.

Table 1.6: Interaction of the term spread with industry cash holdings
Term spread (t-1) Cash flows (t-1) Cash holdings (t-1) TS(t-1)xCF(t-1) TS(t-1)xCH(t-1) Industry dummies Year dummies R2 N (1) -0.15 *** [0.03] 0.03 [0.02] (2) -0.15 *** [0.02] (3) -0.15 *** [0.03] 0.08 ** [0.03] (4) -0.15 *** [0.02] 0.03 * [0.02] -0.02 [0.03] 0.07 ** [0.03] Yes No 0.09 806 0.03 [0.02] 0.07 [0.06] Yes No 0.08 770 Yes Yes 0.08 770 -0.03 [0.06] Yes Yes 0.07 806
Note: Regressions of the average duration of investment by industry on the term spread interacted with measures of cash holdings. Cash data is obtained from Compustat. TS is the term spread. CF is cash flows. CH is cash holdings. All variables are measured at the end of the year. All regressions include industry dummies. Standard errors are clustered by industry.

1.8 Conclusion

This paper shows that there is a strong relationship in aggregate data between investment and the cost of capital. I find that the term spread can explain a third of the variation of the cross section of investment. While this relationship does not quantify the magnitude of internal adjustment costs facing firms, it does show that the cost of capital is a major factor driving the variation in the type of investment that firms do. The composition of investment changes meaningfully over the business cycle, and a substantial portion of these changes can be explained by the term spread alone.

The results are robust to including a variety of controls, including multiple indicators of the state of the business cycle. None of the controls eliminate the coefficient on the term spread. Moreover, when we include all of the controls at once, the term spread is the only variable that remains significant. Of all of the variables I study, the term spread is the most robust and powerful explanatory variable for the distribution of investment.

The dimension of investment studied here has not been examined before. The results extend also to consumer durables purchases: households tend to buy less-durable durables when the yield curve is steep.

Cochrane (2011) gives an extensive review of the literature on return predictability and variation in the price of risk, arguing that shifts in discount rates are part of "the central organizing question of asset pricing research." As Treasury bonds have (nominally) riskless payoffs, shifts in the term spread are purely driven by discount rates.
The finding that the term spread determines the composition of investment is thus connected to Cochrane’s organizing question by showing its relevance for the aggregate economy, and not just financial markets.

There are many other cross-sectional sources of variation in the cost of capital beyond differences in asset lives. Tax policies, e.g. R&D tax credits and bonus depreciation, distort the cost of capital, as will changes in the price of risk. The finding here that shifts in the term structure of interest rates affect the composition of investment suggests that tax policy can succeed in distorting investment choices. Similarly, to the extent that the price of risk varies over time, an interesting question is whether a high price of risk causes businesses to shift relatively towards low-risk/low-reward projects.

2. A MODEL OF TIME-VARYING RISK PREMIA WITH HABITS AND PRODUCTION

2.1 Introduction

Stock prices are more volatile than can be explained by movements in expected dividends. Moreover, excess returns on the aggregate stock market are predictable over time. The two phenomena are connected: changes in the discount rates applied to future dividends can induce excess volatility in asset prices. This paper develops a new preference specification with time-varying risk aversion that generates realistically predictable and volatile stock returns. When combined with a production framework, the model can match the short and long-run volatilities of output, consumption, and investment growth and at the same time generate a high and volatile price of risk. Simulated stock-return forecasting regressions are consistent with empirical results, and the model also delivers a new method for forecasting stock returns. The structural estimate of risk aversion has an R² for 5-year stock returns in the post-war period of over 50 percent.
The standard model of time-varying risk aversion is the habit specification of Campbell and Cochrane (1999).1 In their model, when an agent’s consumption falls close to her habit, her risk aversion rises. Using aggregate consumption data, they find that their implied risk aversion measure can explain a large proportion of the movements in the price-dividend ratio on the stock market. Campbell and Cochrane study an endowment economy, though, so they never test whether their utility function generates a realistic consumption process in equilibrium. In fact, Lettau and Uhlig (2000) and Rudebusch and Swanson (2008) find that Campbell–Cochrane preferences imply that consumers smooth consumption growth extremely and implausibly strongly following technology shocks in standard general-equilibrium models.

1 A partial selection of other early papers studying habit formation is Abel (1990), Constantinides (1990), Boldrin, Christiano, and Fisher (2001) and Jermann (1998). For other papers that study return predictability in a production setting, see Gourio (2010), Campanale, Castro, and Clementi (2010), and Guvenen (2009), though note that the latter two papers do not match the degree of predictability observed in the data.

This paper embeds the intuition behind Campbell and Cochrane (1999), that persistent external habits can induce time-varying risk aversion, into the framework developed by Kreps and Porteus (1978), Epstein and Zin (1989), and Weil (1989). The Epstein–Zin specification allows us to model risk aversion and intertemporal substitution separately, while the Campbell–Cochrane intuition motivates time-variation in risk aversion. In particular, consumers are modeled as having a time-varying external habit, which is a benchmark to which they compare their own lifetime utility. When lifetime utility is farther above the benchmark, risk aversion over proportional shocks to future welfare is lower.
By explicitly separating variation in risk aversion from intertemporal substitution, the Epstein–Zin framework eliminates the problems that arise when standard Campbell–Cochrane preferences are used in a production setting. I refer to the new preference specification as the EZ-habit model for its combination of these two frameworks.2

The simple real business cycle (RBC) model with fixed labor supply provides a transparent laboratory in which to study the effects of time-variation in risk aversion on the macroeconomy in general equilibrium. I find that the dynamics of real variables and real interest rates under the EZ-habit specification are highly similar to a model with Epstein–Zin utility and constant relative risk aversion.3

2 Melino and Yang (2003) study a utility specification that is highly similar to mine in reduced form. However, they do not discuss the inclusion of a habit, and they do not insert the preferences into a production setting.
3 For other recent studies of asset pricing in production economies, see Danthine, Donaldson and Mehra (1992), Rouwenhorst (1995), Tallarini (2000), and Cochrane (2005).

The model can match both the short and long-run variances of output, investment, and consumption. Since consumption and wealth are cointegrated under balanced growth, their long-run variances must be the same. But empirically, the short-run (quarterly) variance of consumption growth is much smaller than the variance of changes in wealth. To match both the long and short-run moments, a model must have either mean-reversion in wealth or strong persistence in consumption growth. A number of recent asset-pricing papers (e.g. Bansal and Yaron, 2004, Kaltenbrunner and Lochstoer, 2010) have gone the route of choosing very strong persistence for consumption growth.
In Kaltenbrunner and Lochstoer’s (2010) analysis of asset prices in the RBC model, for example, innovations to the permanent component of consumption have a standard deviation of 8 percent per year, which is at odds with the data. The EZ-habit model, on the other hand, implies that consumption is roughly a random walk (the short and long-run variances are nearly equal) but wealth is mean-reverting: declines in risk aversion raise current asset prices and lower expected returns. Whereas other papers in the production-based asset-pricing literature do not check the fit of their models to the long-run variance of consumption and output, I show that the EZ-habit model can match this moment along with the short-run variances.

In addition to matching macro moments, the EZ-habit model improves the fit of the RBC model to financial moments. Previous habit-based models designed to generate high or volatile risk premia tended to have implausibly volatile interest rates, a flaw not found here.4 The reasonable behavior of interest rates is an important innovation of this paper; the EZ-habit model is able to have stable interest rates but still generate substantial asset price volatility because it has variation in discount rates on risky assets that is driven by variation in risk aversion.5 Movements in discount rates imply that asset returns should be predictable, and extensive tests show that the degree of predictability in the model is similar to what is observed in the data.

Variation in risk aversion not only raises the volatility of asset returns; I find that it also makes the equity premium roughly one third larger on average than it would be otherwise. Countercyclical movements in risk aversion thus increase both the quantity and price of risk in financial markets: good times seem even better and bad times worse.

There are numerous empirical methods of forecasting stock returns, but the majority of them are not based on equilibrium theories.
For example, regressions of stock returns on price-dividend ratios are motivated simply by an identity that links the price-dividend ratio to future returns and dividend growth. Under the EZ-habit model, though, it turns out to be possible to directly measure risk aversion. As is standard in the habit literature, I assume that positive innovations to household welfare reduce risk aversion. So if we can measure welfare, we can also measure risk aversion. Under Epstein–Zin preferences with a constant elasticity of intertemporal substitution, welfare is a function of current household wealth and consumption. And this result holds generally; it is not dependent on the RBC model I analyze. Using data on consumption and wealth, I construct an empirical estimate of risk aversion and show that it is a strong forecaster of aggregate stock returns: it outperforms the price-dividend ratio, Lettau and Ludvigson’s (2001) measure of the consumption-wealth ratio, and Campbell and Cochrane’s (1999) excess consumption ratio. This result differentiates my paper from models of time-varying disaster risk because it does not rely on an unobservable latent process to drive risk premia.6

4 See Jermann (1998); Boldrin, Christiano, and Fisher (2001); Campanale, Castro, and Clementi (2010); and Miao and Wang (2010).
5 See LeRoy and Porter (1981), Shiller (1981), and Grossman and Shiller (1981), for early studies of excess volatility in asset prices and the relationship between return predictability and volatility.

The model also can match forecasting results for consumption growth. Lettau and Ludvigson (2001) find little ability to forecast consumption growth using their measure of the consumption-wealth ratio. Campbell and Shiller (1988) obtain similar results for the stock market. As in the empirical data, it is essentially impossible to forecast consumption growth in the EZ-habit model using the consumption-wealth ratio, but forecasts of risk premia are highly effective.
An alternative way to forecast consumption growth is with interest rates. Hall (1988) and Campbell and Mankiw (1989), in trying to estimate the elasticity of intertemporal substitution (EIS), essentially ask whether consumption growth can be forecasted with interest rates. They find little forecasting power, suggesting the representative household has a small or even zero EIS. In this paper, the EIS is set to 1.5, but I still replicate the regression results from Hall (1988) and Campbell and Mankiw (1989). The EZ-habit model explains the failure of those regressions through a time-varying precautionary-saving effect. When risk aversion is high, households want to save more to protect themselves against future shocks, which drives interest rates downward. This effect biases standard Euler-equation estimation based on models with constant relative risk aversion.

6 See Gourio (2010), and Wachter (2010), for recent models with time-varying disaster risk.

After testing the model’s fit to macro and asset pricing moments and the predictions for the EIS regressions and return forecasting, I consider two extensions to the model. First, I examine the effect of time-varying risk aversion on labor supply. Following positive technology shocks, risk aversion falls, raising consumption (through a decline in precautionary saving demand). This effect also lowers the response of labor supply to technology shocks. Intuitively, intratemporal optimization means that when households are willing to spend more money to raise consumption, they are also willing to sacrifice in terms of opportunity costs to raise leisure. Endogenous labor supply has little effect on risk premia in the economy, though.
The reason is simply that under Epstein–Zin preferences with a high elasticity of intertemporal substitution, the volatility of the stochastic discount factor is driven mainly by the permanent component of consumption; so even if households smooth consumption growth by varying labor supply, the total amount of risk in the economy is essentially unchanged.

The second extension is a log-linearization of the model using methods similar to Campbell (1994) and Lettau (2003). Unlike standard perturbation methods, the log-linearization used here does not impose certainty equivalence, so we can obtain expressions that take into account potentially time-varying risk premia even in the first-order approximation. I am able to derive explicit expressions for the Sharpe ratio in the model as a function of current risk aversion and the underlying parameters of the model and find that the results are highly similar to those from accurate numerical solutions. Much of the previous production-based asset-pricing literature has focused on simulations to study the implications of various models, so this paper introduces an important methodological contribution in extending and simplifying the analytic results of Campbell (1994) and Lettau (2003). Further, in the case where risk aversion is constant, I give an analytic characterization of how endogenous consumption smoothing generates long-run risks in a production setting (Bansal and Yaron, 2004; Kaltenbrunner and Lochstoer, 2010). The log-linearization thus provides an analytic explanation for results that were previously supported only with simulation-based evidence.

The log-linear solution returns a stochastic discount factor (SDF) that takes on the essentially affine form that is widely used in the empirical asset-pricing literature. This is possibly the first paper to derive an essentially affine SDF with a time-varying price of risk from a production-based model.
It thus connects the standard modeling framework in macroeconomics with one of the most widely used asset-pricing specifications in empirical finance.

The paper is organized as follows. Section 2.2 discusses the preference specification and lays out the economic environment. Section 2.3 calibrates a production economy and compares its behavior to the data. Section 2.4 tests the empirical implications of the model for return forecasting, and section 2.5 studies extensions to the basic framework. Section 2.6 concludes.

2.2 The model

2.2.1 Household preferences

For households with a constant elasticity of intertemporal substitution (EIS), Epstein–Zin (1989) utility can be expressed as

$$V_t = \left\{ \left(1 - \exp(-\beta)\right) C_t^{1-\rho} + \exp(-\beta) \left[ G_t^{-1}\left( E_t\left[ G_t(V_{t+1}) \right] \right) \right]^{1-\rho} \right\}^{1/(1-\rho)} \tag{2.1}$$

for some function $G_t$, where $C_t$ is household consumption and $E_t$ is the expectation operator conditional on information available at date $t$.7 The term $G_t^{-1}(E_t[G_t(V_{t+1})])$ is a certainty equivalent. When there is no uncertainty about $V_{t+1}$, $G_t^{-1}(E_t[G_t(V_{t+1})]) = V_{t+1}$. The usual choice for $G_t$ (going back to Weil, 1989, and Epstein and Zin, 1991) is power utility,

$$G_t^{Power}(V_{t+1}) = V_{t+1}^{1-\alpha} \tag{2.2}$$

Epstein and Zin (1989) show that the coefficient of relative risk aversion for a household with preferences of the form (2.1) is equal to the coefficient of relative risk aversion for $G_t$, while the EIS is equal to $1/\rho$.

7 The preferences can be further generalized to study alternative time aggregators, instead of the constant elasticity of substitution form.

Now consider a habit-formation utility function for $G$,

$$G_t^{Habit}(V_{t+1}; H_t) = (V_{t+1} - H_t)^{1-\alpha} \tag{2.3}$$

Value functions involving $G_t^{Habit}$ are related to those using $G_t^{Power}$ in the same way that usual habit specifications, e.g. Constantinides (1990), are related to time-separable power utility.
Rather than caring only about the absolute level of their continuation value, $G_t^{Habit}$ says that households care about the spread between tomorrow’s value and a benchmark $H_t$. Since the utility function adds a habit to Epstein–Zin, I refer to it as the EZ-habit specification.8 I refer to the version of $V_t$ using $G_t^{Power}$ for the certainty equivalent as canonical Epstein–Zin in deference to its popularity in the literature.

The coefficient of relative risk aversion for $G_t^{Habit}$ is equal to $\alpha \frac{V_{t+1}}{V_{t+1} - H_t}$. As the spread between value and habit rises, the coefficient of relative risk aversion falls. Intuitively, when the continuation value falls close to its benchmark, proportional shocks to $V_{t+1}$ loom much larger than when the household has a cushion between its continuation value and $H_t$.

In principle it is possible to analyze a model with $G_t^{Habit}$, but it has three important drawbacks. First, if the support of the shocks to $V_{t+1}$ is sufficiently wide, there is a nonzero probability that $V_{t+1}$ will fall below $H_t$, leaving the certainty equivalent undefined.9 Second, because $G_t^{Habit}$ is not log-linear in $V_{t+1}$, obtaining simple analytic results with it is difficult or impossible. Third, because $G_t^{Habit}$ is not log-linear, standard arguments for the existence of a representative agent do not apply.10

8 Other papers, for example Rudebusch and Swanson (2010) and Yang (2008), incorporate consumption habits into Epstein–Zin preferences. That is, the $C_t^{1-\rho}$ term is replaced by $(C_t - X_t)^{1-\rho}$ where $X_t$ is the habit. Rudebusch and Swanson (2008) show that in general equilibrium this does not lead to a time-varying Sharpe ratio because households endogenously smooth consumption to reduce their overall risk exposure. That said, the specification in Rudebusch and Swanson (2008) is meant to generate smooth consumption growth rather than a high risk premium. In principle, there is no reason that this type of habit formation could not be added to the EZ-habits model to help generate smoother consumption (e.g. to help explain the excess smoothness puzzle of Campbell and Deaton, 1989). Dew-Becker (2011) studies preferences with both time-varying risk aversion and consumption habits in a medium-scale DSGE model.
9 This issue also arises in other habit specifications. When models are solved with standard perturbation methods, the problem is simply ignored. I use a more precise global numerical solution technique that forces me to grapple with the problem.
10 A representative agent may exist, but their preferences need not actually look like the preferences of any particular agent. Ideally, if every agent has identical preferences, the representative agent will also have those preferences.

For the remainder of the paper I therefore replace $G_t^{Habit}$ with the alternative

$$G_t^{TV}(V_{t+1}) = V_{t+1}^{1-\alpha_t}, \qquad \alpha_t = \alpha \frac{V_t}{V_t - H_t} \tag{2.4}$$

The certainty equivalent $G_t^{TV(-1)}(E_t[G_t^{TV}(V_{t+1})])$ (where TV stands for time-varying) is a second-order approximation to $G_t^{Habit(-1)}(E_t[G_t^{Habit}(V_{t+1})])$ around the non-stochastic version of the model.11 Moreover, the appendix shows that in the continuous-time limit (i.e. under stochastic differential utility), preferences with $G_t^{TV}$ are exactly equivalent to preferences using $G_t^{Habit}$.12 $G^{TV}$ is locally equivalent to $G^{Habit}$ in terms of risk preferences, but it solves the problems of integrability inside the certainty equivalent and the existence of a representative agent.

11 More precisely, the second-order approximation also assumes no growth. Adding a constant growth rate $\mu$ to $V$ would change the result to $\alpha_t = \alpha \frac{(1+\mu) V_t}{(1+\mu) V_t - H_t}$. The remainder of the analysis is identical.
12 Melino and Yang (2003) study a utility function with the same form as $G^{TV}$, but they take $\alpha_t$ as a latent variable and give no theoretical motivation for its variation. This paper is original for proposing inserting habits into the certainty-equivalent part of Epstein–Zin preferences to motivate movements in $\alpha_t$.

As in Campbell and Cochrane (1999), I assume that households take the excess value ratio, $\frac{V_t}{V_t - H_t}$, and hence the coefficient of relative risk aversion, $\alpha_t$, as external to their own decisions. The final step, then, is to specify a dynamic process for risk aversion. I assume a simple log-linear process, which we will find to be highly tractable,

$$\alpha_{t+1} = \phi \alpha_t + (1 - \phi) \bar{\alpha} + \lambda \left( \Delta v_{t+1}^A - E_t \Delta v_{t+1}^A \right) \tag{2.5}$$

where $v_t^A$ is the log of $V_t$ for the representative agent. Intuitively, when value unexpectedly rises, it moves away from the habit and risk aversion falls, so $\lambda < 0$. Movements in the habit, and hence risk aversion, depend on aggregate value so that they are not affected by an individual household’s decisions. The AR(1) specification for risk aversion is approximately equivalent to a specification where $\log H_t$ is a geometrically weighted moving average of past values of $v_t^A$.13

The appendix shows how to derive the marginal rate of intertemporal substitution (the stochastic discount factor, or SDF) for the general form of Epstein–Zin preferences in (2.1). In the case of $G^{TV}$, we end up with the expression,

$$M_{t+1} \equiv \frac{\partial V_t / \partial C_{t+1}}{\partial V_t / \partial C_t} = \exp(-\beta) \frac{V_{t+1}^{\rho - \alpha_t}}{\left( E_t\left[ V_{t+1}^{1-\alpha_t} \right] \right)^{\frac{\rho - \alpha_t}{1 - \alpha_t}}} \frac{C_{t+1}^{-\rho}}{C_t^{-\rho}} \tag{2.6}$$

with the only difference from the SDF under canonical Epstein–Zin preferences being the subscript on $\alpha_t$. The SDF is a critical piece of the model since its volatility determines the price of risk in the economy.14 As usual, changes in expected consumption growth or volatility will affect the SDF through their effects on $V_{t+1}$. Changes in $\alpha_{t+1}$ (or $H_{t+1}$) will also affect the SDF in the same way. Specifically, when the habit rises and households are more risk averse, they penalize consumption uncertainty more, driving $V_{t+1}$ down. High risk-aversion states thus have high Arrow–Debreu prices.
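The certainty equivalent under $G^{TV}$ in (2.4) and the risk-aversion process in (2.5) can be illustrated numerically. The following is a minimal sketch under stated assumptions rather than the solution method used in this chapter: the two-state distribution for $V_{t+1}$, the function names, and every parameter value (including `phi`, `alpha_bar`, and `lam`) are hypothetical and are not the calibration.

```python
import numpy as np


def certainty_equivalent(v_next, probs, alpha_t):
    """G^{-1}(E_t[G(V_{t+1})]) for G(V) = V**(1 - alpha_t), discrete states."""
    v_next = np.asarray(v_next, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if np.isclose(alpha_t, 1.0):
        # Limit as alpha_t -> 1: the probability-weighted geometric mean.
        return float(np.exp(probs @ np.log(v_next)))
    expected_g = probs @ v_next ** (1.0 - alpha_t)
    return float(expected_g ** (1.0 / (1.0 - alpha_t)))


def next_risk_aversion(alpha_t, value_surprise, phi, alpha_bar, lam):
    """Equation (2.5): AR(1) in alpha_t driven by the surprise in log value."""
    return phi * alpha_t + (1.0 - phi) * alpha_bar + lam * value_surprise


# Hypothetical numbers, chosen only for illustration.
v_states = [0.9, 1.1]
p_states = [0.5, 0.5]
ce_low_alpha = certainty_equivalent(v_states, p_states, alpha_t=5.0)
ce_high_alpha = certainty_equivalent(v_states, p_states, alpha_t=15.0)

# A negative surprise to log value with lam < 0 pushes risk aversion up.
alpha_after_bad_news = next_risk_aversion(
    alpha_t=10.0, value_surprise=-0.02, phi=0.95, alpha_bar=10.0, lam=-50.0
)
```

In this example the certainty equivalent lies below the mean of $V_{t+1}$ and falls as $\alpha_t$ rises, which is the sense in which more risk-averse households penalize uncertainty about continuation value more heavily.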
It is also straightforward to derive the standard result that

$$W_t = V_t^{1-\rho} C_t^{\rho} / \left( 1 - \exp(-\beta) \right) \tag{2.7}$$

where $W_t$ is the equilibrium price of a claim on the household's consumption stream, which I refer to as the aggregate wealth portfolio. This formula holds regardless of whether risk aversion varies over time. Intuitively, the market price of the consumption stream is equal to the utility value that a household places on it, $V_t$, divided by the marginal utility of consumption, $V_t^{\rho} C_t^{-\rho} \left( 1 - \exp(-\beta) \right)$. This leads to the familiar result from Epstein and Zin (1991),

$$M_{t+1} = \exp(-\beta)^{\frac{1-\alpha_t}{1-\rho}} \left( \frac{C_{t+1}}{C_t} \right)^{-\rho \frac{1-\alpha_t}{1-\rho}} R_{w,t+1}^{\frac{\rho - \alpha_t}{1-\rho}} \tag{2.8}$$

where $R_{w,t+1}$ is the return on the wealth portfolio.

13 It is straightforward to derive the actual process that $H_t$ must follow in order for risk aversion to follow the process in (2.5).

14 Hansen and Jagannathan (1991) show that the maximum Sharpe ratio (expected excess return divided by standard deviation) attained by any asset in the economy is equal to the standard deviation of the SDF divided by its mean.

2.2.2 Discussion

The model is motivated as an extension of habit-based preferences. Rather than consumers having a habit level of consumption that they target, I assume they have a habit level of value. Since equation (2.7) shows that there is a direct link between value and wealth, we could also think of the model as saying that households have a benchmark level of wealth.

The house-money effect of Thaler and Johnson (1990) has a somewhat similar intuition. They find that when subjects in lab experiments have recently gained money in betting games, they play more aggressively.15 Abel (1990) interprets habits in consumption as a "keeping up with the Joneses" effect. That intuition extends to the EZ-habit model. What households try to keep up with in this model, though, is fundamentally different.
For example, consider a college senior who is trying to decide between following her friends into consulting or getting a law degree. With the J.D., she knows that in the short run her consumption will be lower than that of her friends, but in the long run she will likely be better off. In a model with an external consumption habit, three years of consumption below that of her friends looks painful. But when the habit appears as a function of value, the student is comfortable giving up consumption in the short run as long as she knows she will do well compared to her friends in the long run. Since the habit appears only in the risk aggregator, an agent with EZ-habit preferences is willing to substitute consumption over time in a way that an agent with standard habit-forming preferences is not. For the same reason, the EZ-habit model is not inconsistent with the mixed evidence on the effects of classic consumption habits at the micro level (e.g. Dynan, 2000, and Ravina, 2007).

15 Barberis, Huang, and Santos (2001) embed the house-money effect in a full asset-pricing model. See Gertner (1993) and Post et al. (2008) for evidence on the house-money effect from game shows.

There are a number of papers that use investment choices to measure variation in risk aversion. Carroll (2002) finds that households with higher wealth tend to tilt their investment portfolios towards more risky assets. Brunnermeier and Nagel (2008), though, argue that there is little evidence that changes in wealth affect portfolio choices in household data. Rather, they find that inertia is the dominant characteristic of household portfolio choice.
Calvet, Campbell, and Sodini (2009), after controlling for the inertia studied by Brunnermeier and Nagel, find a strong and significant relationship between innovations to wealth and the riskiness of a household's portfolio.16 Furthermore, they show that weakness in the instruments for wealth shocks can cause a researcher to erroneously find that wealth does not affect risk-taking. Calvet and Sodini (2010) show that higher past income, controlling for current wealth and genetic differences in risk attitudes, is also negatively related to the share of household portfolios invested in risky assets. On net, with the notable exception of Brunnermeier and Nagel (2008), the empirical literature supports the idea that increases in wealth reduce risk aversion.

16 See also Tanaka, Camerer, and Nguyen (2010), who find that income, both its raw level and instrumented for with exogenous shocks, has a negative impact on loss aversion, and Guiso, Sapienza, and Zingales (2011), who find that following the financial crisis of 2008, households both reduced the risky shares of their portfolios and became more averse to gambles in survey questions.

2.2.3 Production

Aggregate output is a function of the capital stock, $K_t$, and productivity, $A_t$,

$$Y_t = A_t^{1-\gamma} K_t^{\gamma} \tag{2.9}$$

In section 2.5.3 I add endogenous labor supply and show that it does not substantially change the dynamics of the model. The production function (2.9) can be thought of as Cobb–Douglas with labor supply held fixed at unity. The aggregate resource constraint is

$$K_{t+1} = (1 - \delta) K_t + Y_t - C_t$$

where $\delta$ is the depreciation rate of capital.
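A minimal sketch of one period of the production block may help fix ideas: output from (2.9) followed by the resource constraint. The capital share and depreciation rate are the calibrated values (0.33 and 0.02); the starting levels of productivity and capital and the fixed consumption share are hypothetical placeholders, since in the model consumption is chosen optimally.

```python
def output(productivity, capital, gamma):
    """Y_t = A_t^(1 - gamma) * K_t^gamma, as in equation (2.9)."""
    return productivity ** (1.0 - gamma) * capital ** gamma


def next_capital(capital, output_t, consumption, delta):
    """Resource constraint: K_{t+1} = (1 - delta) * K_t + Y_t - C_t."""
    return (1.0 - delta) * capital + output_t - consumption


GAMMA, DELTA = 0.33, 0.02   # calibrated capital share and depreciation rate
A, K = 1.0, 10.0            # hypothetical starting levels
y = output(A, K, GAMMA)
c = 0.8 * y                 # hypothetical rule; the model solves for C_t
k_next = next_capital(K, y, c, DELTA)
print(f"Y = {y:.4f}, C = {c:.4f}, K' = {k_next:.4f}")

# Doubling both inputs doubles output (constant returns to scale).
y_doubled = output(2.0 * A, 2.0 * K, GAMMA)
```

The constant-returns check at the end is the property that lets output, consumption, and capital share the stochastic trend in productivity.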
For the benchmark calibration, productivity follows a random walk in logs,17

$$\log A_{t+1} = \log A_t + \mu + \sigma_a \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, 1) \tag{2.10}$$

The drawback of using random-walk technology is that it is difficult to generate the degree of volatility for output and investment that is observed in the data.18 I therefore also consider a dual-shock version of the model that can match both the short and long-run variances of output,

$$A_t = \bar{A}_t X_t \tag{2.11}$$
$$\log \bar{A}_{t+1} = \log \bar{A}_t + \mu + \sigma_a \varepsilon_{t+1} \tag{2.12}$$
$$\log X_{t+1} = \phi_x \log X_t + \sigma_x \varepsilon_{x,t+1} \tag{2.13}$$
$$\varepsilon_{t+1}, \varepsilon_{x,t+1} \sim \text{i.i.d. } N(0, 1) \tag{2.14}$$

$\bar{A}_t$ here is the permanent component of output, while $X_t$ can be interpreted as a simple method of trying to capture forces that drive short-run fluctuations in output and consumption, e.g. shocks to monetary policy or energy prices. I refer to the version of the model with random-walk technology as the benchmark model, while the model with permanent and temporary technology shocks is the dual-shock model.

17 An alternative is a trend-stationary process for productivity. Alvarez and Jermann (2005) argue that permanent shocks to the level of productivity (more generally, to the level of state prices) are necessary to explain asset-pricing facts. Also, in models with Epstein–Zin preferences, because the SDF depends not only on current consumption but also on the level of the value function itself, an I(1) process for productivity tends to increase the volatility of the SDF compared to models with trend-stationary productivity, which helps explain the equity premium. Kaltenbrunner and Lochstoer (2010) find that in order to match the empirical equity premium in a model with trend-stationary productivity, their model needs an implausibly small EIS (0.05). With difference-stationary productivity they are able to choose a more reasonable value (1.5).
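The dual-shock technology process in (2.11) through (2.14) can be sketched as a short simulation. The parameter values are those listed in the calibration table ($\mu = 0.005$, $\sigma_a = 0.0088$, $\phi_x = 0.9$, $\sigma_x = 0.012$); the function names and the choice to start both components at zero in logs are assumptions of this sketch. The closed-form unconditional standard deviation of $\log X_t$, $\sigma_x / \sqrt{1 - \phi_x^2}$, works out to roughly 2.75 percent, in line with the figure of about 2.7 percent quoted in the calibration discussion.

```python
import math
import random


def stationary_sd(phi_x, sigma_x):
    """Unconditional SD of the AR(1) component log X_t."""
    return sigma_x / math.sqrt(1.0 - phi_x ** 2)


def simulate_log_technology(mu, sigma_a, phi_x, sigma_x, n_periods, seed=0):
    """Simulate log A_t = log A_bar_t + log X_t with independent shocks."""
    rng = random.Random(seed)
    log_a_bar, log_x = 0.0, 0.0  # hypothetical starting values
    permanent, temporary = [], []
    for _ in range(n_periods):
        log_a_bar += mu + sigma_a * rng.gauss(0.0, 1.0)        # random walk with drift
        log_x = phi_x * log_x + sigma_x * rng.gauss(0.0, 1.0)  # mean-reverting part
        permanent.append(log_a_bar)
        temporary.append(log_x)
    total = [p + x for p, x in zip(permanent, temporary)]
    return total, permanent, temporary


MU, SIGMA_A, PHI_X, SIGMA_X = 0.005, 0.0088, 0.9, 0.012  # calibration table values
log_a, log_a_bar, log_x = simulate_log_technology(MU, SIGMA_A, PHI_X, SIGMA_X, 100_000)

analytic_sd = stationary_sd(PHI_X, SIGMA_X)
mean_x = sum(log_x) / len(log_x)
sample_sd = math.sqrt(sum((x - mean_x) ** 2 for x in log_x) / len(log_x))
print(f"analytic SD of log X: {analytic_sd:.4f}, simulated: {sample_sd:.4f}")
```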
18 In particular, without a mean-reverting component, it is impossible for the model to replicate the result from Cochrane (1994) that the long-run variance of output is smaller than the unconditional variance. In the RBC model, output does not overshoot its long-run trend following a permanent increase in technology: it does not have a mean-reverting component to its dynamics. Because they rely only on permanent shocks in the RBC model, Kaltenbrunner and Lochstoer (2010) have to set the standard deviation of technology shocks to an implausibly high 8.2 percent per year to match the unconditional standard deviation of output growth.

2.3 Calibration and simulation

I solve the model with projection methods, which entails fitting a polynomial approximation to the decision rule and searching for coefficients so that the equilibrium conditions hold exactly at certain specified points in the state space.19 The Euler equation errors in the simulations imply households misprice a claim on capital by uniformly less than 1/100th of 1 basis point (i.e. one part in one million) over the range of the state space that the simulations visit, and the median simulated error is an order of magnitude smaller.

The model is parameterized to match quarterly data. Table 2.1 lists the parameter values and the target moments. Many of the parameters, e.g. the exponent on capital in the production function, take standard values. I discuss here the parameters that are unique to this paper or do not have standard and agreed-upon values.

I set $\rho = 2/3$ as in Bansal and Yaron (2004), for an EIS of 1.5. Bansal and Yaron note that an EIS greater than 1 is necessary for increases in volatility to lower asset prices (specifically, the wealth-consumption ratio) in an endowment economy. In a production economy this result does not hold exactly (because consumption is endogenous), but it is approximately true.
Similarly, an EIS greater than 1 ensures that increases in risk aversion increase the expected return on the wealth portfolio and lower its current price.20 Many studies attempting to estimate the EIS have obtained values much smaller than 1 (Hall, 1988; Campbell and Mankiw, 1989). An important test of the model will be whether it can match that result even though the calibrated EIS is larger than 1.

19 See Caldara et al. (2009) for a good description of the method as applied to models with recursive utility. When solving the RBC model with Epstein–Zin preferences, they find that projection methods are orders of magnitude more accurate than the perturbation methods used in the majority of the macro literature.

20 Intuitively, an increase in risk aversion or volatility has two effects: it lowers the risk-free rate and raises the excess return on the wealth portfolio. Which of these effects dominates depends on the EIS.

I choose the variance of permanent innovations to technology to match the long-run variance of consumption growth in the data. Since technology and consumption are cointegrated in the model, the long-run variance of consumption growth is equal to the variance of the permanent technology shocks. I estimate the empirical long-run variance (i.e. the spectral density at frequency zero) of consumption growth with a third-order univariate AR model (where the lag length was selected with the Bayesian information criterion) and obtain a value of $0.0088^2$. That is, the quarterly innovations to the permanent component of consumption have a standard deviation of 0.88 percent.21 For the dual-shock model, I select the parameters $\sigma_x$ and $\phi_x$ to match the short-run volatility of consumption and output growth. The parameters imply that the temporary component of technology has an unconditional standard deviation of 2.7 percent.22

The persistence of risk aversion, $\phi$, is set to match the empirical persistence of the price-dividend ratio for the aggregate stock market, as in Campbell and Cochrane (1999). The mean and volatility of risk aversion ($\bar{\alpha}$ and, implicitly, $\lambda$) are chosen to match the average Sharpe ratio for the stock market in the post-war sample and the degree of predictability observed using the price-dividend ratio to forecast stock returns. Mean risk aversion is 14 and the standard deviation is set to 6.2.23

21 By choosing a smaller value for the long-run variance than the long-run risks literature, I only make the task of matching the equity premium harder. I also make the model consistent with the point estimate of the long-run variance of consumption, rather than choosing a value in the upper end of the confidence interval. Empirically, I measure consumption as real per-capita nondurable and service consumption from the BEA.

22 Smets and Wouters (2007) estimate that the 1-quarter autocorrelation of stationary technology shocks is 0.95. On the other hand, the 1-quarter autocorrelation of detrended real GDP is 0.85. I take $\phi_x = 0.90$ as the midpoint between these two values.

23 When $\alpha_t < 0$, I still use the standard Euler equation even though the household's optimization problem is convex. In the simulations, $\alpha_t < 0$ only 1.5 percent of the time. Treating households as if they are risk-neutral in periods when $\alpha_t < 0$ (i.e. censoring $\alpha_t$ at zero) has no discernible effect on the results.

Table 2.1: Calibration

Parameter     Value     Target
γ             0.33      Capital income share
β             0.9975    2% annual real risk-free rate
δ             0.02      8% annual depreciation (BEA data)
µ             0.005     2% annual output growth
φ             0.94      Persistence of price/dividend ratio
ρ             0.67      A priori (see text)
σa            0.0088    Long-run standard deviation of consumption growth
mean(αt)      14        Mean Sharpe ratio (0.32 annualized)
stdev(αt)     6.2       Stock return predictability
σx            0.012     Variance of output growth
φx            0.9       Variance of output growth

Note: Parameters used for the structural models. In table 2.2, the CRRA model sets stdev(α) = 0; the benchmark EZ-habit model (column 3) sets σx = 0.

2.3.1 Comparisons across models

Table 2.2 reports basic moments from the three models. The first column gives the moments from the data while the second column gives results from the canonical Epstein–Zin model with constant relative risk aversion (EZ-CRRA). Columns 3 and 4 give results for the EZ-habit model under the benchmark calibration and with temporary technology shocks added.

Table 2.2: Comparison of preference specifications

                                          (1)      (2)        (3)        (4)
Row  Moment                               Data     EZ-CRRA    EZ-habit   Dual-shock
     Real moments:
1    Long-run SD(dC,dY,dI) (%)            0.88     0.88       0.88       0.88
2    StdDev(dY) (%)                       0.99     0.59       0.59       1.03
3    StdDev(dC) (%)                       0.46     0.28       0.47       0.56
4    StdDev(dI) (%)                       2.65     1.11       0.83       2.37
5    corr(dC(t),Rf(t-1))                  -0.09    0.28       0.07       0.08
     Financial moments:
6    Mean SR (annualized)                 0.32     0.22       0.32       0.32
7    Std. dev. SR                         0.22     0.12       0.22       0.22
8    p-value SR std. dev.                 N/A      0.16       0.50       0.42
9    Mean Rw (annualized %)               6.78     1.04       4.17       4.15
10   StdDev(Rw) (annualized %)            21.19    4.71       13.30      12.98
11   Mean Rf (annualized %)               0.91     2.20       2.04       1.94
12   StdDev(Rf) (annualized %)            1.16     0.21       0.25       0.26

Note: Column 2 gives results under Epstein–Zin preferences with constant relative risk aversion, column 3 uses EZ-habit preferences and random-walk technology, and column 4 EZ-habit preferences with the dual-shock specification. All models are calibrated as in table 2.1. All variables are measured using quarterly values. dI is investment growth, dY output growth, and dC consumption growth. Rf is the risk-free rate (measured empirically as the nominal 3-month yield minus an inflation forecast), and Rw is the annualized return on a levered consumption claim (with a leverage ratio of 2.74). The long-run SD is the square root of the spectral density at frequency zero multiplied by 2π. SR is the annualized Sharpe ratio; the standard deviation of the Sharpe ratio is measured by the standard deviation of the fitted values in forecasts of one-quarter-ahead returns in 228-quarter simulated samples divided by the unconditional standard deviation of returns. Row 7 reports the median of the standard deviations of the Sharpe ratio in the simulated samples. Row 8 reports the fraction of simulated samples in which the standard deviation of the Sharpe ratio is as large as in the data.

The first row simply shows that all three models are calibrated to match the long-run variance of consumption exactly, which, under balanced growth, means they also match the long-run variances of output and investment growth. Rows 2 through 4 give the standard deviations of quarterly output, consumption, and investment growth. Both the EZ-CRRA and single-shock EZ-habit models have volatilities for output and investment growth that are well below the empirical values. The dual-shock model rectifies this problem, matching both the short-run and long-run variances well. Both versions of the EZ-habit model match the empirical variance of consumption growth.

Row 5 reports the correlation between the risk-free rate and the next period's consumption growth. Empirically, the real risk-free rate is measured as the 3-month nominal interest rate minus an inflation forecast.24 In the EZ-CRRA model, the risk-free rate has a substantial amount of forecasting power for consumption growth, while in the data interest rates and consumption growth seem essentially unrelated. The two EZ-habit calibrations come much closer to matching that fact. Rows 1 through 5 show that the EZ-habit model can capture the basic unconditional moments of output, consumption, and investment.

Rows 6 through 12 of table 2.2 summarize the financial side of the model. We can begin by looking at a measure of the price of risk.
The Sharpe ratio on an asset is its expected excess return over the risk-free rate divided by its standard deviation, so it measures the risk–return tradeoff. Hansen and Jagannathan (1991) show that the maximum Sharpe ratio obtained by any asset in the economy is equal to the standard deviation of the SDF divided by its mean. Recall that all three calibrations have the same average coefficient of relative risk aversion. The Hansen–Jagannathan bound and the mean Sharpe ratio for the consumption claim are roughly 1/3 higher in the two EZ-habit models than the EZ-CRRA case. The reason for this is that the household's value, $V_t$, a component of the SDF (equation 2.6), is more volatile in the EZ-habit models. In all the models, a technology shock permanently raises expected consumption and hence $V_t$. In the EZ-habit case, the coefficient of relative risk aversion also falls. Households become less averse to future uncertainty, so $V_t$ rises even more. Countercyclical variation in risk aversion thus makes good times even better and bad times even worse, raising the volatility of the SDF. This effect allows the model to explain the equity premium (or at least the Sharpe ratio on equities) with a lower coefficient of relative risk aversion than we would need in the EZ-CRRA model.

24 Expected inflation is measured as a forecast of quarterly inflation based on lagged levels of inflation and the nominal risk-free interest rate.

To test whether the models can match the degree of predictability for stock returns that is observed in the data, I regress simulated quarterly excess returns on the consumption claim on its lagged price-dividend ratio. I then estimate the standard deviation of the conditional Sharpe ratio as the standard deviation of the fitted returns divided by the unconditional standard deviation of returns (i.e. assuming a constant volatility).
Row 7 reports the median standard deviation from 5,000 simulations of 228 quarters of data, while row 8 reports the proportion of the simulations that have a standard deviation as high as observed empirically (0.22).25 In column 2, we can see that there is actually a nontrivial amount of implied predictability on average in the EZ-CRRA model due to small-sample overfitting, but only 16 percent of the simulations match the variability observed in the data. For the EZ-habit model, the predictability observed in the data is calibrated to be exactly the median value in the simulations.

25 The fitted Sharpe ratio is measured empirically by forecasting the CRSP value-weighted aggregate excess return with the aggregate price/dividend ratio.

Rows 9 and 10 report the mean and standard deviation of the excess return on a levered consumption claim in the model. For comparability to past results, I follow Abel (1999) and Gourio (2010) in assuming a leverage ratio of 2.74 (i.e. the asset pays a dividend of $C_t^{2.74}$). The two EZ-habit models are able to generate means and volatilities for returns that are far closer to the equity return observed in the data than the EZ-CRRA model can. Part of the reason for this success is that consumption growth, and hence dividend growth, is more volatile in the EZ-habit models than in the EZ-CRRA case, and part of the reason is that discount rates are more volatile. Following a positive technology shock, not only do dividends rise, but discount rates fall, thus making the returns on the wealth portfolio and the levered consumption claim more volatile.26

26 LeRoy and Porter (1981) and Shiller (1981) argue that dividends do not seem sufficiently volatile to explain the volatility of stock prices. Grossman and Shiller (1981) suggest that variation in discount rates can explain this puzzle.

Rows 11 and 12 show that the means and standard deviations of the real risk-free rate in the three models are all reasonably close to the data. The volatility of interest rates is similar across all three models, and somewhat lower than in the data. The real risk-free rate is measured empirically as the nominal 3-month Treasury yield minus a forecast of inflation. Errors in the inflation forecast will make the estimated real risk-free rate more volatile than the true real risk-free rate, which explains some of the divergence between the empirical and simulated volatilities. A common problem in early attempts to generate a high equity premium (e.g. Constantinides, 1990; Boldrin, Christiano, and Fisher, 2001; and Jermann, 1998) is a highly volatile risk-free rate. The EZ-habit specification replaces movements in discount rates coming from the risk-free rate with movements coming from risk premia.

To summarize, table 2.2 shows that the EZ-habit model can match a broad array of features of the economy: the short and long-run variances of output growth, the relative volatilities of investment and consumption growth, and the mean and standard deviation of the Sharpe ratio on equities. The model also helps generate a larger premium on a levered consumption claim, closing roughly half the gap in the equity premium between the EZ-CRRA model and the data. Finally, the behavior of the risk-free rate is reasonably similar to the data, unlike previous general-equilibrium attempts at generating a high and volatile Sharpe ratio.

2.3.2 Predictability in the simulated model

2.3.2.1 The magnitude of return predictability

Figure 2.1 plots $R^2$s from univariate regressions of excess aggregate stock returns over various horizons on the log price-dividend ratio on the CRSP value-weighted portfolio (e.g. Campbell and Shiller, 1988, among many others), Lettau and Ludvigson's (2001) measure of the consumption-wealth ratio, cay, Campbell and Cochrane's (1999) excess consumption ratio, and an estimate of risk aversion derived from the EZ-habit model in section 2.4. For the four different variables used in the empirical sample, the $R^2$s generally rise as the forecast horizon grows, and estimated risk aversion outperforms cay, excess consumption, and the price-dividend ratio.

Figure 2.1: Simulated and empirical $R^2$s
[Plot of $R^2$ (vertical axis) against forecast horizon in quarters (horizontal axis); labeled series include estimated risk aversion and cay.]
Note: $R^2$s from univariate regressions of stock returns on various predictors. The forecast horizon is reported in quarters. Data for cay is obtained from Sydney Ludvigson's website; price/dividend data comes from CRSP; the Campbell–Cochrane excess consumption ratio is computed using their parameter values and consumption data from the BEA. The gray lines give the mean and 95th percentiles in the simulation of the EZ-habit model.

The gray line labeled "Simulated mean" gives the mean $R^2$ from 5,000 regressions of excess returns on a consumption claim on the price-dividend ratio (equivalently, the wealth-consumption ratio) over 228-quarter spans in the benchmark simulation of the single-shock model (the same length as the empirical sample). The upper gray line gives the 95th percentile of the simulations. As in the data, the simulated $R^2$s rise as the horizon lengthens. The model compares favorably with the price-dividend and excess-consumption ratios, with the simulated mean tracking the empirical values closely (the median follows almost the same path). The empirical $R^2$s for cay are at or below the 95th percentile in the simulations. The only variable that the simulations cannot match is estimated risk aversion, but raising the volatility of risk aversion in the calibration would solve this problem.
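The long-horizon regressions behind figure 2.1 can be illustrated with a small sketch: sum returns over $k$ periods, regress the sum on the predictor observed at the start of the window, and record the $R^2$. The data-generating process below (a persistent AR(1) predictor and returns that load on its lag) and every parameter value in it are hypothetical stand-ins chosen for illustration; they are not the model's simulated series or the empirical data.

```python
import math
import random


def r_squared(x, y):
    """R^2 of a univariate OLS regression of y on x (squared sample correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def long_horizon_r2(returns, predictor, horizon):
    """Regress the return summed over t+1..t+horizon on the predictor at t."""
    xs, ys = [], []
    for t in range(len(returns) - horizon):
        xs.append(predictor[t])
        ys.append(sum(returns[t + 1 : t + 1 + horizon]))
    return r_squared(xs, ys)


def simulate(n, phi=0.94, slope=0.15, seed=1):
    """Hypothetical data: returns[t] = slope * predictor[t-1] + noise."""
    rng = random.Random(seed)
    x_prev = 0.0
    returns, predictor = [], []
    for _ in range(n):
        returns.append(slope * x_prev + rng.gauss(0.0, 1.0))
        x_prev = phi * x_prev + rng.gauss(0.0, math.sqrt(1.0 - phi ** 2))
        predictor.append(x_prev)
    return returns, predictor


returns, predictor = simulate(20_000)
r2_by_horizon = {k: long_horizon_r2(returns, predictor, k) for k in (1, 4, 8, 20)}
for k, r2 in r2_by_horizon.items():
    print(f"horizon {k:2d}: R^2 = {r2:.3f}")
```

With a persistent predictor, the predictable part of the summed return accumulates faster than the noise over these horizons, which is the mechanism behind $R^2$s that rise with the horizon. The sketch uses overlapping windows and makes no small-sample correction, so it should not be read as a replication of the 228-quarter exercise in the text.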
The $R^2$s generated here are substantially higher than those obtained in production models such as Campanale, Castro, and Clementi's (2010) model of time-varying first-order risk aversion and Guvenen (2009) and De Graeve et al.'s (2010) studies of limited participation. The population $R^2$s are also essentially identical to those found by Wachter (2010) and Gourio (2010) in endowment-economy and production-based models, respectively, with time-varying disaster risk.

The top panel of table 2.3 reports the percentage of simulated samples in which the simulated $R^2$ is as high as we observe in the data for cay and the price-dividend ratio (results for excess consumption are similar to the price-dividend ratio), and where a high price-dividend ratio forecasts low returns. The table reports values for horizons of one quarter and one through five years. The EZ-CRRA model matches empirical $R^2$s for cay less than 5 percent of the time at horizons shorter than 16 quarters, but can match the $R^2$s for the price-dividend ratio 15 to 25 percent of the time. The habit model substantially raises the likelihood of the simulations matching the data, by a factor of three or more at every horizon, and it never matches less than 5 percent of the time except for cay at the one-quarter horizon.

As an alternative to the $R^2$, I also consider the test statistic suggested by Kiefer, Vogelsang, and Bunzel (KVB, 2000) based on Newey–West standard errors with the lag window equal to the sample size. At various horizons, I calculate the t-statistic on the coefficient in a regression of stock returns on the price-dividend ratio in the simulated samples. The bottom panel of table 2.3 repeats the analysis from the top half, but with the KVB test statistics. In every case, the habit model matches the empirical t-statistics at least five percent of the time. The EZ-CRRA model again has trouble matching the results for cay, and only replicates the statistics for the price-dividend ratio in 5 to 20 percent of the samples, compared to 20 to 50 percent of the samples for the habit model.

Table 2.3: Proportion of simulated samples that match empirical statistics

Forecasting R2
                      EZ-CRRA           EZ-habit
Horizon (quarters)    cay      P/D      cay      P/D
1                     0.00     0.14     0.05     0.52
4                     0.01     0.13     0.11     0.52
8                     0.01     0.13     0.12     0.51
12                    0.02     0.16     0.17     0.56
16                    0.04     0.20     0.26     0.62
20                    0.10     0.25     0.45     0.69

KVB t-statistics
                      EZ-CRRA           EZ-habit
Horizon (quarters)    cay      P/D      cay      P/D
1                     0.10     0.05     0.38     0.22
4                     0.04     0.11     0.17     0.35
8                     0.02     0.11     0.08     0.35
12                    0.03     0.15     0.12     0.39
16                    0.06     0.14     0.22     0.38
20                    0.06     0.20     0.19     0.47

Note: The top panel reports, for each predictor, the proportion of simulated 228-quarter samples in which the R2 in a return-forecasting regression is at least as large as observed in the data and where the predictive relationship has the correct sign. The bottom panel reports the proportion of simulated samples that generate Kiefer–Vogelsang–Bunzel t-statistics for each variable that are as large as in the data and have the same sign. The EZ-habit model is the benchmark (single-shock) model. cay is the consumption/wealth ratio from Lettau and Ludvigson (2001). P/D is the price/dividend ratio from CRSP. All simulated regressions use as the predictor the wealth/consumption ratio (equivalently, the price/dividend ratio on a claim to aggregate consumption) and the dependent variable is the excess return on a consumption claim. The regressions are run at horizons listed in the left-hand column. Bold numbers are less than 0.05, bold italics less than 0.01.

2.3.2.2 Other return predictors

Table 2.4 reports the simulated correlation between five-year excess returns on the aggregate wealth portfolio and a variety of return predictors.
The first row gives the correlation for actual risk aversion, which we would expect to be the highest of all of the variables. The second row shows that the predictive power of the price-dividend ratio is nearly as high as that of $\alpha_t$ in the benchmark model, but somewhat lower in the dual-shock calibration (though still not as much lower as in the data).

Fama and Schwert (1977) and Campbell (1987) find that short term interest rates negatively predict future stock returns.27 In table 2.4, I do not replicate the result that the real interest rate negatively forecasts returns, but the risk-free rate minus its 4-quarter moving average (denoted RREL as in Campbell, 1987) does weakly negatively forecast returns. Table 2.4 shows that the correlation of the five-year excess stock return with the real risk-free rate and RREL is substantially negative and nearly as large as that of $\hat{\alpha}$. In the model, two effects cause interest rates to forecast stock returns. First, positive technology shocks raise interest rates and lower risk aversion. Second, even if risk aversion were driven by shocks unrelated to technology, interest rates might still forecast stock returns since a decline in risk aversion lowers the precautionary saving effect, raising interest rates. Intuitively, there is a flight-to-quality effect in interest rates, linking them to expected stock returns.

27 Campbell (1991) subtracts the 12-month moving average of the nominal risk-free rate from itself as a way to detrend the short term interest rate, since the nominal rate may be nonstationary if there are changes in trend inflation. Detrending in that way should be unnecessary in the model since interest rates are stationary, but I still check this variable.

Table 2.4: Various return predictors

                                   Data     EZ-CRRA    EZ-habit   Dual-shock
Five-year excess stock return correlations:
αt                                 0.69     0.05       0.31       0.27
Real interest rate                 0.09     -0.05      -0.30      -0.23
RREL                               -0.09    -0.03      -0.22      -0.17
Term spread                        0.20     0.05       0.29       0.10
P/D                                -0.39    -0.05      -0.30      -0.23
Term spread summary statistics:
Mean                               1.43     -0.14      -0.23      -0.26
Std. dev.                          1.17     0.11       0.07       0.14

Note: The bottom two rows are the mean and standard deviation of the spread between the yield on a ten-year equivalent bond and a one-quarter riskless bond (measured in the data using nominal Treasuries). The remaining rows give correlations between various variables and five-year stock returns. For the simulations, population values are reported. RREL is the gap between the risk-free rate and its four-quarter moving average; P/D is the price/dividend ratio. In the simulations, P/D is measured as the wealth/consumption ratio. α is the value of risk aversion in the model, and estimated in the data as in section 2.4.

Table 2.4 reports the mean and standard deviation of the real term spread for the EZ-habit model. For the sake of simplicity, I follow the literature in modeling long-term debt as an asset that has a constant probability of paying its principal of one unit of the consumption good and retiring.28 If the bond does not retire and pay out, the holder retains the bond for another period. I assume that the quarterly probability of payout is $1 - 0.9^{1/4}$ so that the expected maturity of the bond is ten years. The term spread is the yield to maturity on this bond minus the one-quarter riskless yield. The term spread in the model is on average negative, whereas the nominal Treasury yield curve is almost always upward-sloping in the data. The reason we have a negative term spread in the model is that in good times the marginal product of capital, and hence the risk-free rate, is above average. So in good times, short-term bonds have low prices, and hence they are a hedge and have a negative risk premium.

28 See, e.g. Rudebusch and Swanson (2008) and Miao and Wang (2010).

Fama and French (1989) show that the term spread forecasts stock returns.
Interestingly, table 2.4 shows that even though the term spread is negative on average in the model, it still positively predicts future stock returns as in the data. This essentially comes through an expectations-hypothesis effect. In periods when the risk premium is low, the risk-free rate is high and expected to fall. To the extent that long-term yields are just averages of expected future short yields, long yields will rise less than short yields. So in periods when risk aversion is low, the term spread falls, and the term spread thus positively predicts stock returns.

In the model, the equity premium is nearly a constant multiple of $\alpha_t$.29 The variables that forecast returns in table 2.4 are all correlated with $\alpha_t$, but imperfectly. For example, the price-dividend ratio also depends on expected consumption growth and interest rates. The fourth column of table 2.4 shows that the dual-shock model can qualitatively, if not quantitatively, match the empirical result in column 1 that estimated risk aversion is a more powerful forecaster of excess stock returns than any of the other variables, since it is uncontaminated by factors like expected consumption growth.

29 This result is exact in the log-linear approximation.

2.3.2.3 Consumption growth predictability

The aggregate price-dividend and wealth-consumption ratios may be driven by either movements in expected dividend (consumption) growth or movements in discount rates. For the aggregate stock market, Campbell and Shiller (1988) and Cochrane (2008) find that the price-dividend ratio has at best weak forecasting power for dividend growth. Similarly, Lettau and Ludvigson (2001) find that the wealth-consumption ratio has little forecasting power for consumption growth. Figure 2.2 shows that the EZ-habit model is consistent with those results.
First, to get a general sense of the dynamic properties of consumption growth, the top panel of figure 2.2 plots the autocorrelations of consumption growth against their empirical counterparts. The shaded region is the 95-percent confidence interval for the empirical estimates using the Newey–West method with a lag window of 12 quarters. In the model, the autocorrelations are near zero at all horizons. The data suggest that the first three autocorrelations are positive, which the model does not match. At longer lag lengths, though, there is no evidence for persistence in consumption growth, consistent with the model.

The bottom panel of figure 2.2 simulates 228-quarter samples as in figure 1 and calculates correlations between consumption growth between dates t and t + k and the consumption-wealth ratio at date t. While many of the simulated correlations are far from zero, the mean sample correlation between the wealth-consumption ratio and future consumption growth is nearly zero. The figure also plots empirical correlations between cay and future consumption growth at various horizons, and they are similar to the simulated mean. Figure 2.2 thus shows that the EZ-habit model not only matches the short and long-run variances of consumption growth, but also replicates relevant features of the dynamics of consumption.

2.3.3 Impulse response functions

Figure 2.3 plots impulse response functions (IRFs) in the EZ-CRRA and benchmark EZ-habit models for four variables: consumption, household value, the risk-free rate, and the Sharpe ratio on the consumption claim. The lines give log deviations from steady state, except for the risk-free rate, for which I report the absolute change in annualized percentage points. The shock is a one-standard-deviation (88-basis-point) permanent increase in the level of technology, which will lead to an identical long-run increase in consumption, capital, and output.
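As a rough illustration of the calculation behind the bottom panel of figure 2.2, the sketch below computes the correlation between a date-t predictor and consumption growth cumulated over the following k quarters, and repeats it across simulated 228-quarter samples. The i.i.d. growth process, the AR(1) predictor, its persistence of 0.95, and the horizon of 20 quarters are assumptions chosen for the illustration; they stand in for, and are not, the EZ-habit model. The predictor is independent of growth by construction, so the point is only that individual sample correlations can be far from zero while their mean is close to zero.

```python
import numpy as np

def long_horizon_corr(dc, x, k):
    """Correlation of x[t] with consumption growth cumulated over t+1, ..., t+k."""
    dc = np.asarray(dc, dtype=float)
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(dc)))
    n = len(dc) - k  # number of dates t with a full k-quarter window ahead
    # csum[t+1+k] - csum[t+1] equals dc[t+1] + ... + dc[t+k]
    future = csum[1 + k:1 + k + n] - csum[1:1 + n]
    return np.corrcoef(x[:n], future)[0, 1]

def simulate_corrs(k=20, T=228, n_sims=500, seed=0):
    """Sample correlations across simulated samples (illustrative processes only)."""
    rng = np.random.default_rng(seed)
    corrs = np.empty(n_sims)
    for i in range(n_sims):
        dc = 0.005 * rng.standard_normal(T)   # i.i.d. consumption growth (assumption)
        shocks = rng.standard_normal(T)
        x = np.empty(T)
        x[0] = shocks[0]
        for t in range(1, T):
            x[t] = 0.95 * x[t - 1] + shocks[t]  # persistent predictor (assumption)
        corrs[i] = long_horizon_corr(dc, x, k)
    return corrs

corrs = simulate_corrs()
p5, p95 = np.percentile(corrs, [5, 95])
print("mean:", corrs.mean(), "5th pct:", p5, "95th pct:", p95)
```

Overlapping long-horizon sums and a persistent regressor make the sampling distribution wide, which is why the simulated percentile bands are the relevant benchmark for the empirical correlations.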
Figure 2.2: Consumption predictability
[Top panel: "Autocorrelation of consumption growth", showing the empirical estimate, the empirical 95% confidence interval, and model-implied values; horizontal axis in quarters. Bottom panel: "Correlation of long-horizon consumption growth with W/C", showing the empirical estimate and the simulated mean, 5%, and 95% lines; horizontal axis is the forecast horizon in quarters.]

Note: The top panel reports the empirical autocorrelation function for consumption. The gray shaded region is the 95% confidence interval using Newey–West standard errors with a lag window of 12 quarters. The bottom panel reports the correlation of consumption growth between periods t and t+x with the wealth-consumption ratio (cay) at date t, where x is reported in quarters on the horizontal axis. The solid lines give the mean and 5th and 95th percentiles of the same correlation in simulated 228-quarter samples.

Figure 2.3: Impulse response functions
[Four panels: Value, Sharpe ratio, Risk-free rate (levels), and Consumption, each plotted for the EZ-CRRA and EZ-habit models.]

Note: Impulse responses for the EZ-CRRA and EZ-habit models. The shock is a positive unit-standard-deviation increase in technology. The dotted lines are for EZ-CRRA, solid are for EZ-habit. All functions are reported as fractions of the variables' means except for the risk-free rate, for which the response is in annualized percentage points. Value is lifetime utility; the Sharpe ratio is for an asset that pays aggregate consumption as its dividend.

The top-left panel shows the response of household value. For the EZ-CRRA model, value immediately jumps to a point just below its new steady state, and then slowly rises as households accumulate capital.
For the EZ-habit model, though, value actually overshoots its new steady state. The reason is that the positive shock to productivity drives risk aversion down. When households are less risk-averse, they place a higher value on their future consumption stream because they penalize uncertainty less strongly. This effect helps increase the volatility of the SDF (equation 2.6), raising the Hansen–Jagannathan bound. The top-right panel shows that on the impact of a shock, the Sharpe ratio in the EZ-habit model falls by 12.5 percent (as a fraction of its mean), and then gradually rises again, with a half-life of 12 quarters.

The bottom-left panel shows the dynamics of the risk-free rate. The initial response is essentially identical for the two models. The reason for this is that the risk premium on an unlevered claim on capital is very small in the model, so the return on capital is roughly equal to the risk-free rate. Since the size of the capital stock is essentially fixed in the short run, an increase in productivity directly increases the return on capital and hence the risk-free rate.

The final panel of figure 2.3 shows the response of consumption in the two models. The EZ-habit model shows a larger initial response of consumption, with lower expected consumption growth going forward. To see why this is, we can write consumption growth in equilibrium as

    E_t \Delta c_{t+1} = \bar{c} + \rho^{-1} r_{f,t+1} + \alpha_t \times vol \qquad (2.15)

where r_{f,t+1} is the risk-free interest rate between dates t and t + 1, vol represents a measure of the total volatility in the model, and c̄ is a constant. α_t × vol represents the precautionary saving effect and is a function of the current level of risk aversion and the variances of the shocks in the model. The standard interpretation in an endowment economy is that conditional on consumption growth, a strong precautionary saving motive leads to a low risk-free rate.
In a production setting, though, it is the risk-free rate that is held roughly fixed since it is tied to the marginal product of capital, which is hard to change quickly through investment. Conditional on the risk-free rate, then, a small precautionary saving motive leads to lower expected consumption growth (more consumption today, saving less for tomorrow). In the EZ-habit model, a positive technology shock lowers risk aversion, and hence consumption rises more than in the canonical EZ case. This effect also serves to increase the volatility of the SDF, just as the higher response of value does.

Given the results in figure 2.3, it is straightforward to see what would happen in this economy if there were a pure shock to the coefficient of relative risk aversion. Since the risk-free rate is tied to the marginal product of capital, it would not move on the impact of a shock. The only effect on real variables of a pure decline in risk aversion then is that households would want a smaller buffer stock of savings, so they would raise consumption and lower investment: shocks to risk aversion look like simple demand shocks.

2.3.4 Estimating the EIS from interest rate regressions

The value of the elasticity of intertemporal substitution is controversial. Regressions based on aggregate consumption and asset returns often find a very small EIS (Hall, 1988; Campbell and Mankiw, 1989). Campbell (2003) reviews the literature and estimates the EIS using a variety of specifications and data from a broad range of countries, finding values generally less than 0.5 and often less than 0.2.[30] This result is in conflict with the calibration used here and in other recent production-based asset pricing studies (Kaltenbrunner and Lochstoer, 2010; Gourio, 2010), which assume that the EIS is greater than 1. The question is whether the EZ-habit model generates small EIS estimates in regressions similar to those estimated in Campbell (2003).
The standard aggregate EIS regressions start from a model in which the risk-free rate takes the form

    r_{f,t+1} = b_0 + \rho E_t \Delta c_{t+1} \qquad (2.16)

where r_{f,t+1} is the riskless interest rate between periods t and t + 1. b_0 is a parameter depending on the discount rate and underlying volatility in the model (which are taken to be constant). This relationship is straightforward to derive in an endowment economy with homoskedastic consumption growth and where households have a constant EIS and coefficient of relative risk aversion. It is also obtained in a log-linearization of the standard RBC model with homoskedastic technology shocks.

[30] Vissing-Jorgensen (2002) finds an EIS less than unity in micro data. On the other hand, Vissing-Jorgensen and Attanasio (2003) and Gruber (2006) obtain larger estimates using micro data, both above unity. Gruber (2006) is particularly well-identified, using variation in the capital income tax rate as the source of exogenous differences in the after-tax interest rate earned by households.

In principle, the EIS can be estimated from a regression of interest rates on consumption growth or vice versa. However, since the reduced-form relationship between consumption growth and interest rates is nearly zero in the EZ-habit model, in some of the simulations regressing r_{f,t+1} on E_t Δc_{t+1} produces explosive estimates for ρ^{-1} (since we have to invert the coefficient estimate). Moreover, consumption in the EZ-habit model nearly follows a random walk, so it is essentially unpredictable and there are serious weak-instruments problems in an IV regression of interest rates on consumption growth. I therefore focus on the regression of consumption growth on interest rates,

    E_t \Delta c_{t+1} = b_0 + \rho^{-1} r_{f,t+1} \qquad (2.17)

In the simulations of the model in section 2.3, we have the ability to directly measure E_t Δc_{t+1}. The first row of table 2.5 reports the population estimate of ρ^{-1} in regression (2.17) under the EZ-CRRA and EZ-habit models.
In the EZ-CRRA case, the regression identifies ρ^{-1} exactly. On the other hand, the estimate of ρ^{-1} is biased substantially downwards in the EZ-habit specifications. The bias comes from the fact that the time-varying precautionary saving effect (equation 2.15) is omitted from the regression. Since precautionary saving is correlated with both expected consumption growth and interest rates, omitting it biases the simple IS-curve regression usually used to identify the EIS.

An alternative way to see the source of the bias is to go back to the IRFs in figure 2.3. In both models the risk-free rate rises by the same amount following a shock. In the EZ-habit specification, though, because of the decline in precautionary saving, expected consumption growth is lower following a shock than in the EZ-CRRA case. That means that the estimate of ρ^{-1} will fall.[31]

[31] In Bansal and Yaron (2004), time-variation in the volatility of shocks in principle causes EIS regressions to be biased. However, Beeler and Campbell (2010) show that their calibration generates almost no actual bias: the median sample EIS estimates are well above 1. This paper thus represents an improvement in being able to generate a substantial bias in aggregate regressions without large movements in the conditional volatility of consumption.

Table 2.5: Regressions estimating the elasticity of intertemporal substitution

Row  Model:                                    Data    EZ-CRRA        EZ-habit        Dual-shock
1    Population, infeasible (E_t[Δc_{t+1}])    N/A     1.50           0.64            0.78
2    Population                                N/A     1.50           0.56            0.71
3    Small sample                              0.14    1.16           0.03            0.35
4      [2.5%, 97.5%]                           N/A     [0.03, 1.79]   [-1.98, 1.02]   [-1.32, 1.34]
5    Population, RRA control                   N/A     N/A            1.50            1.50
6    Small sample, RRA control                 0.18    N/A            0.07            0.94
7      [2.5%, 97.5%]                           N/A     N/A            [-3.08, 3.11]   [-1.07, 2.78]

Note: Values reported are the coefficient from regressions of consumption growth or expected consumption growth on the risk-free rate. The dependent variable in row 1 is expected consumption growth (computed numerically in the simulations); all other rows use actual consumption growth. The small-sample regressions are based on 228 quarters of data, and median coefficient estimates are reported; 2.5 and 97.5 percentiles are reported in brackets. The RRA control is actual risk aversion in the simulations and estimated risk aversion (section 4) in the empirical regressions.

The regression in the first row of table 2.5 is in some sense ideal, but it is not the regression that we are actually able to run in the data since E_t Δc_{t+1} is unobservable.[32] Rows 2 through 4 report results for estimates of ρ^{-1} from regressions of actual consumption growth, Δc_{t+1}, on the risk-free rate, r_{f,t+1}. Row 2 gives the population estimates, while rows 3 and 4 give the median and 95-percent range of the estimates from 228-quarter simulations.

[32] In principle, the real risk-free rate, r_{f,t+1}, is also unobservable in the data. As above, I form r_{f,t+1} as the difference between the nominal 3-month interest rate and a forecast of inflation based on lagged inflation and nominal interest rates. Errors in the estimate of the true real risk-free rate would bias the estimate of ρ^{-1} towards zero. Instrumental-variables methods can theoretically eliminate this bias.

With constant relative risk aversion, the population regression in row 2 estimates the EIS exactly. The median estimate from the small-sample regressions in row 3 is 1.16. The 95-percent range is wide, and it only just barely contains the estimate from the data. So it is in principle possible for the EZ-CRRA model to generate an estimate of the EIS as small as what we observe in the data, but the probability is small (less than 10 percent).

In the EZ-habit models, the bias is far larger. The population estimate in the single-shock case is 0.56, and the median sample estimate is 0.03. For the dual-shock model, the estimates are only slightly better: 0.71 in population and a median of 0.35 in small samples. The reason for this slight improvement is that the temporary shocks have little effect on risk aversion, so they induce variation in consumption and the risk-free rate that is closer to the usual EZ-CRRA case. The upper end of the 95-percent range for the simulated estimates is well below the true value of the EIS.

The first four rows of table 2.5 show that in general, regressions of consumption growth on interest rates are not a very good way to estimate the EIS, and in the EZ-habit model they are biased and inconsistent. It is worth noting, though, that if we could observe α_t, we could completely eliminate the bias in the EIS regressions. Empirically, this suggests that regressions designed to estimate the EIS could be improved by including a control for risk aversion, such as the price-dividend ratio on the stock market.

The final three rows of table 2.5 try to estimate the EIS including a control for risk aversion. In the data, I use a measure of risk aversion derived from the EZ-habit model below in section 2.4, denoted α̂_t. The empirical estimate of the EIS is essentially unchanged from when α̂_t is not included. In population, when α_t is included in the simulated regressions, the EIS is estimated exactly. In small-sample regressions, though, the estimate of the EIS in the model is still biased downward. In the single-shock model, the median estimate is 0.07, and in the dual-shock model 0.94. Row 7 shows, though, that the 2.5 percentile of the small-sample estimates is -3.08 in the dual shock model, while the 97.5 percentile is 3.11. So even though the median estimate in the dual-shock model is not enormously biased, the empirical value of 0.18 is well within the simulated range.
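The omitted-variable logic can be illustrated with a small simulation. The sketch below is not the model: it generates a risk-free rate and a risk-aversion series directly, builds expected consumption growth from the form of equation (2.15), and compares the regression that omits risk aversion with the one that includes it. The loading vol = 1, the coefficient of -0.8 linking risk aversion to the interest rate, and the noise scale are hypothetical values chosen only to make the bias visible; the value ρ^{-1} = 1.5 matches the calibration in the text.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on X, with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 100_000
eis = 1.5   # true rho^{-1}, as in the calibration
vol = 1.0   # hypothetical loading on risk aversion in equation (2.15)

r = rng.standard_normal(n)                       # risk-free rate (standardized, assumption)
alpha = -0.8 * r + 0.5 * rng.standard_normal(n)  # risk aversion is low when rates are high (assumption)
exp_dc = eis * r + vol * alpha                   # equation (2.15) with the constant set to zero

b_short = ols(exp_dc, r)[1]                            # omits risk aversion
b_long = ols(exp_dc, np.column_stack([r, alpha]))[1]   # controls for risk aversion
print("without control:", b_short, "with control:", b_long)
```

In this setup the population coefficient of the short regression is ρ^{-1} + vol × cov(α, r)/var(r) = 1.5 − 0.8 = 0.7, while the regression with the control recovers 1.5. The sketch says nothing about the small-sample problems discussed in the text, which arise because actual rather than expected consumption growth must be used.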
In the end, while controlling for risk aversion should, in principle, allow us to estimate the EIS consistently, in small samples the regressions still do not seem to provide useful estimates because of weak-identification problems.

As an alternative to these regressions, the EIS could be consistently estimated if we had an instrument for the risk-free rate that was uncorrelated with risk aversion. Standard aggregate instruments like lagged interest rates and consumption growth (e.g. Campbell, 2003) will certainly not be valid instruments under the EZ-habit model since the precautionary saving effect is persistent. Household-level instruments would work better if my model is correct in treating risk aversion as being driven by aggregate factors.[33] The EZ-habit model thus has the ability to explain the divergence between micro and macro estimates of the EIS if the micro instruments are valid and the macro instruments invalid.

[33] Gruber (2005) discusses precisely these issues and tries to resolve them by using household-specific variation in tax rates as an instrument for consumption growth. Dynan (1993) explicitly controls for the precautionary savings effect at the household level by predicting the conditional volatility of consumption, but she does not deal with the possibility that risk aversion varies over time.

2.4 Empirical return forecasting

This section first shows how the model suggests we can directly estimate the coefficient of relative risk aversion in the data, and then demonstrates that the estimate is a powerful predictor of stock returns. Second, I present novel evidence that technology growth forecasts stock returns, just as it does in the production model. The results differentiate the EZ-habit model from models with time-varying disaster risk. Gourio (2010) predicts that when there are changes in the probability of a large disaster occurring, price-dividend ratios will forecast returns, which is also true in the EZ-habit model.
The EZ-habit model also predicts, though, that technology and estimated risk aversion will forecast stock returns, and that estimated risk aversion will be the single most powerful forecaster of returns, which would not be true in the time-varying disaster model or models based on other forms of time-varying volatility (e.g. Bansal and Yaron, 2004; Bloom, 2009; or Fernandez-Villaverde et al., 2011).

2.4.1 Estimating risk aversion

If risk aversion follows the AR(1) process given in (2.5), then we can measure current risk aversion if we simply observe the history of aggregate value, v_t^A. For a given value of the EIS and observed data on wealth and consumption, it is possible to calculate v_t^A by rearranging equation (2.7):

    v^A_t = \frac{1}{1-\rho}\, w^A_t - \frac{\rho}{1-\rho}\, c^A_t + \frac{1}{1-\rho} \log\left(1 - \exp(-\beta)\right) \qquad (2.18)

If we can measure household wealth and consumption, then we can measure value. We then simply plug the estimates of v_t^A into equation (2.5) to obtain estimates of α_t.

Lettau and Ludvigson (2001) study a cointegrating relationship between consumption and aggregate wealth. Their method is valid in my model since consumption and wealth are cointegrated under balanced growth. While their analysis was designed to estimate the consumption-wealth ratio, it also delivers, as a byproduct, a measure of aggregate wealth (since we can always add consumption to the wealth-consumption ratio to obtain wealth).

The estimate of wealth derived using their method is a combination of asset wealth data obtained from the flow of funds accounts plus an estimate of human wealth. They treat labor income as the dividend from the stock of human wealth. Assuming the price-dividend ratio for human wealth is stationary, we can use labor income as a proxy for human wealth. Denoting asset wealth as a_t and labor income as y_t, the appendix shows that we then have a cointegrating relationship,

    c_t = \zeta \omega a_t + \zeta (1 - \omega) y_t + \xi_t \qquad (2.19)

where ζ and ω are parameters and ξ_t is a stationary error term.
Lettau and Ludvigson (2001) refer to the residual ξ_t as cay. This variable essentially represents an estimate of the consumption-wealth ratio. Since I want to estimate wealth, I define

    ay_t \equiv \omega a_t + (1 - \omega) y_t \qquad (2.20)

which, under the assumptions above, will be a statistically unbiased estimate of total wealth, but will include error because we do not assume that human wealth is measured directly.[34]

With our measure of wealth ay_t, we estimate v_t^A as

    \hat{v}^A_t = \frac{1}{1-\rho}\, ay_t - \frac{\rho}{1-\rho}\, c_t \qquad (2.21)

where we ignore constants, and a circumflex indicates an estimated variable. Note that since the parameters of the cointegrating relationship for c_t, a_t, and y_t are estimated superconsistently, we do not have to modify any standard errors in the subsequent analysis to take into account the fact that ay_t is a generated regressor (which is why it does not receive a circumflex). That said, to the extent that there is measurement error in the consumption or wealth data, v̂_t^A will inherit that same error. When we use v̂_t^A to forecast market returns, this measurement error should only weaken the results. For measurement error to generate a spurious predictive relationship, it would have to be correlated with other predictors of returns.[35]

[34] In simulations with variable labor supply, the price/dividend ratio on human wealth does vary over time, but the variation is relatively small: risk aversion calculated using the method here (assuming a constant price-dividend ratio on human wealth) is over 95 percent correlated with actual risk aversion.

[35] One obvious source of measurement error is that human capital is not a perfect estimator of the value of human wealth. Suppose risk aversion rises above average and lowers the price-dividend ratio on human wealth below average. Labor income will then be overestimating human wealth (compared to its average). High levels of wealth drive our measure of α̂_t downward, so this measurement error should bias the results against correctly forecasting returns (high risk aversion in the data leads to low risk aversion in our estimates). In the simulations not reported here, though, this effect seems to be small.

This definition of v̂_t^A is similar to Lettau and Ludvigson's cay_t, except they have equal weights on c_t and ay_t, whereas equation (2.21) uses a combination where the weights depend on the EIS. Also, cay_t is stationary by construction, whereas v̂_t^A is growing over time (it is cointegrated with consumption and wealth).

In equation (2.21) a high EIS (low ρ) raises the weight on consumption relative to asset wealth. If the EIS is less than 1 (ρ > 1), the weight on wealth, ay_t, is actually negative, and the weight on consumption greater than 1. Bansal and Yaron (2004) and Kaltenbrunner and Lochstoer (2010) both find that an EIS of 1.5 allows their models to fit asset pricing facts, so I use the same value. This value is also consistent with the micro evidence of Vissing-Jorgensen and Attanasio (2003). The results reported below are broadly similar as long as the EIS is greater than 1.1 (at that level and below, v̂_t^A becomes very volatile). The appendix reports a sensitivity analysis for various values of the EIS.

Figure 2.4 plots v̂_t^A both in its raw levels and with a linear trend taken out. As we would expect, v̂_t^A follows a strong upward trend. There seem to be both low and high frequency components to detrended v̂_t^A. In particular, there are long-run swings with peaks in the early 1970s and 2000s and a trough around 1994, generally consistent with movements in the aggregate price-dividend ratio and variation in average output growth.[36] At the same time, there are business-cycle frequency movements, e.g. the troughs in 1973, 2001, and 2008.

I construct an estimate of α_t, α̂_t, using the update process for risk aversion, equation (2.5), and the data on v̂_t^A. In particular, we have

    \hat{\alpha}_{t+1} = \phi \hat{\alpha}_t + (1 - \phi) \bar{\alpha} + \lambda \left( \Delta \hat{v}^A_{t+1} - E_t \Delta v^A_{t+1} \right) \qquad (2.22)

As above, I assume that φ = 0.96. E_t Δv^A_{t+1} is estimated simply as the sample average of Δv̂_t^A.[37]

[36] Note, though, that linear detrending will tend to make the series look as if it is mean-reverting even if it follows a random walk. The linear trend is only used to make the graph legible; none of the results involve it.

[37] In principle, it is possible to forecast Δv^A_{t+1}, but the amount of predictability in Δv^A_{t+1} is sufficiently small that the results are nearly identical to assuming that v^A_{t+1} simply follows a random walk.

Table 2.6: Value, raw and linearly detrended
[Plot of log household value (lifetime utility) and log household value, linearly detrended, on separate vertical axes; horizontal axis runs from 1952 to 2007.]

Note: Household value (lifetime utility) is measured using data on wealth and consumption from Sydney Ludvigson's website. The thin line is the absolute level of value (left-hand axis); the thick line is value linearly detrended. Both variables are measured in logs. Grey bars are NBER-dated recessions.

The parameter λ governs the volatility of α_t, but it has only a multiplicative effect on α̂_t. That is, two estimates of α̂_t will be perfectly correlated with each other, regardless of what values are chosen for λ. The same argument applies for ᾱ. As long as we are simply trying to forecast stock returns using a linear regression, we can ignore any additive or multiplicative shifts in α̂_t. Therefore, I set ᾱ = 0 and choose λ so that α̂_t has unit variance, normalizations that will have no effect on the regression-based measures of forecasting power (and I choose a negative value of λ to match the habit-formation motivation of the model).
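The recursion in equation (2.22) is simple to implement once a series for log household value is in hand. The sketch below is one way it might be coded under the normalizations described in the text: ᾱ = 0, φ = 0.96, the expected change in value proxied by the sample mean, a negative λ, and a final rescaling to unit variance (which is equivalent to choosing the magnitude of λ). The function name and the synthetic random-walk input are hypothetical; the starting value α̂ = ᾱ follows the text's assumption for the first period of the sample.

```python
import numpy as np

def estimate_risk_aversion(v_hat, phi=0.96):
    """Normalized risk-aversion estimate from a series of log household value.

    Implements alpha[t+1] = phi * alpha[t] + lam * (dv[t+1] - mean(dv)) with
    alpha_bar = 0, alpha[0] = alpha_bar, and lam < 0, then rescales to unit variance.
    """
    v_hat = np.asarray(v_hat, dtype=float)
    dv = np.diff(v_hat)
    surprise = dv - dv.mean()        # E_t[dv] proxied by the sample average
    alpha = np.zeros(len(v_hat))     # first-period value equals alpha_bar = 0
    lam = -1.0                       # only the sign matters before rescaling
    for t in range(len(dv)):
        alpha[t + 1] = phi * alpha[t] + lam * surprise[t]
    return alpha / alpha.std()       # unit variance, i.e. an implicit choice of |lam|

# Synthetic input for illustration only: a random walk with drift, 228 quarters.
rng = np.random.default_rng(1)
v_demo = 10.0 + np.cumsum(0.005 + 0.01 * rng.standard_normal(228))
alpha_demo = estimate_risk_aversion(v_demo)
print(alpha_demo[:5], alpha_demo.std())
```

Because the series is linear in λ when ᾱ = 0, rescaling after the fact gives the same result as solving for the λ that delivers unit variance, so no search over λ is needed.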
In the first period of the sample I assume α̂_t = ᾱ.

An important feature of this method of forecasting is that it is based only on the preference specification. None of the assumptions we made about the production side of the economy are required for this method to be valid. We simply take advantage of the relationship between household value and changes in risk aversion and the relationship under Epstein–Zin preferences between household value and wealth.

2.4.2 Forecasting market returns

The next question is to what extent the model-implied variation in expected returns is related to actual returns. Figure 2.5 plots α̂_t and 5-year excess returns on the stock market (the value-weighted excess return from Kenneth French). The strong correlation between the two series (0.68) is immediately apparent. There are both high and low-frequency movements in α̂_t associated with changes in growth in value. In the periods when value is growing quickly, e.g. the late 1990s, risk aversion falls. At the same time, there are higher-frequency movements, such as the temporary increases in estimated risk aversion around the recessions in 1991 and 2001.

Figure 2.1 plots R²s from regressions of future stock returns on α̂_t, cay_t, the price-dividend ratio (P/D), and the excess consumption ratio from Campbell and Cochrane (1999). Each line gives the R² from a univariate regression. The x-axis gives the horizon for the return in quarters. The nth point is the R² from a regression of Σ_{j=1}^{n} r_{t+j} on the predictor at time t. The regressions are all run on quarterly data from 1952 to 1999 (to ensure we have data for the 40-quarter regression). Each regression uses the same sample for the predictors.

[37, continued] The appendix shows that the results are robust to different choices for φ.

Figure 2.4: Estimated risk aversion and 5-year excess stock returns
[Plot of estimated risk aversion (normalized) and 5-year excess stock returns on separate vertical axes; horizontal axis runs from 1952 to 2007.]

Note: Excess stock returns are for the CRSP value-weighted index minus the risk-free rate, from Ken French's website. Returns are forward-looking five-year averages. Risk aversion is estimated from data on aggregate wealth and consumption and is normalized to have zero mean and unit variance.

At every horizon, α̂ is dominant. At the five-year horizon, the R² for estimated risk aversion peaks at more than twice that of the other variables. The R²s are impressively high for just a single variable: at the 5-year horizon, α̂ explains 50 percent of the postwar variation in stock returns. Furthermore, in horse-race regressions (reported in the appendix), α̂ dominates cay at all horizons.

An important consideration in long-horizon forecasting regressions is that the residuals are highly persistent. Kiefer, Vogelsang and Bunzel (2000) and Kiefer and Vogelsang (2005) show that by using Newey–West standard errors with a very long lag window, we can obtain test statistics with better size properties than techniques that use a fixed (and usually short) lag window. I choose a lag window equal to half the sample size and use the critical values reported in Kiefer and Vogelsang (2005).[38] For cay, every regression except for those with horizons greater than 30 quarters is significant at the 5 percent level. For α̂, the largest p-value is 0.0008. The price-dividend ratio is significant at the 5 percent level for forecasts of 14 quarters or longer.
In other words, these regressions all imply that we have substantial ability to forecast stock returns in the post-war period, and α̂ is the strongest of the predictors. Out-of-sample tests with both asymptotic and bootstrapped critical values give similar results (appendix D.3).

Appendix D examines the sensitivity of the results in this section to the various parameters we had to calibrate (e.g. the EIS and the persistence of habits). The basic results hold across a broad range of parameter sets.

[38] Kiefer and Vogelsang (2002) note that there is a size-power tradeoff. When the lag window is increased, the size of the test statistics gets closer to their nominal size, but there is a loss of power. I choose a lag window of half the sample to balance these considerations. The results are basically identical when using a lag window equal to the sample size as in Kiefer, Vogelsang, and Bunzel (2000), though the price-earnings ratio is significant in more of the regressions.

2.4.3 Forecasts from estimates of technology

The method of estimating the level of risk aversion studied above does not rely on any assumptions about the structure of production in the economy, being derived purely from the preference specification. However, in the production model, changes in value are closely related to changes in productivity. If we can measure innovations to technology, then risk aversion should follow an AR(1) process where the innovations are equal to the shocks to the stochastic trend in technology.

There is a large literature that tries to estimate aggregate technology shocks. I consider two methods here. The first builds off of Solow (1957) and uses restrictions from a constant-returns production function:

    a_t = y_t - \gamma k_t - (1 - \gamma) l_t \qquad (2.23)

a_t measures technology if the economy has a Cobb–Douglas production function, with l denoting log labor supply. I also consider a simpler metric, labor productivity, lp_t = y_t − l_t.
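To make the two productivity measures concrete, the sketch below builds synthetic log series from a Cobb–Douglas production function and then applies equation (2.23) and the labor productivity definition. All of the data-generating choices, including the capital share γ = 1/3, the drift and volatility of technology, and the function names, are assumptions for illustration; the source series used in the text come from published data, not from a simulation like this one.

```python
import numpy as np

def solow_residual(y, k, l, gamma):
    """Equation (2.23): a_t = y_t - gamma * k_t - (1 - gamma) * l_t, all in logs."""
    return y - gamma * k - (1.0 - gamma) * l

def labor_productivity(y, l):
    """Log output per unit of labor: lp_t = y_t - l_t."""
    return y - l

rng = np.random.default_rng(0)
T = 200
gamma = 1.0 / 3.0                                       # hypothetical capital share
a = np.cumsum(0.004 + 0.009 * rng.standard_normal(T))   # log technology: random walk with drift
k = np.cumsum(0.004 + 0.002 * rng.standard_normal(T))   # log capital (illustrative)
l = 0.01 * rng.standard_normal(T)                       # log labor input (illustrative)
y = a + gamma * k + (1.0 - gamma) * l                   # Cobb-Douglas output in logs

tfp = solow_residual(y, k, l, gamma)
lp = labor_productivity(y, l)

# With i.i.d. productivity growth, the demeaned growth rate is the innovation.
eps_lp = np.diff(lp) - np.diff(lp).mean()
print(np.allclose(tfp, a), np.allclose(lp, a + gamma * (k - l)))
```

The second check makes the difference between the measures explicit: labor productivity equals technology plus γ times the log capital-labor ratio, so it moves with capital deepening even when technology is unchanged.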
Labor productivity does not take into account the effects of capital accumulation and simply models technology as the average product of labor. Capital can be difficult to measure, whereas the number of hours supplied in the economy is a fairly concrete quantity (though the quality of those hours is difficult to account for).[39]

[39] Furthermore, labor productivity determines the tradeoff that households face between consumption and leisure. If the capital stock rises because foreigners want to invest more in the US, household welfare will increase even if TFP does not. Similarly, a tax increase that reduced desired saving could lower welfare and labor productivity, without having any effect on TFP. And welfare is the relevant input in estimating α̂_t.

To extract the stochastic trend from the two productivity series, I estimate univariate ARMA models for each variable. The Bayesian information criterion implies that TFP growth is best fit with an MA(2), while labor productivity growth should be treated as i.i.d. ε_t^{TFP} is defined as the residual in the MA(2), while ε_t^{LP} is simply equal to labor productivity growth. That is, ε_t^{TFP} and ε_t^{LP} are innovations to the Beveridge–Nelson (1981) trends in productivity.

Section 2.5.1 shows that, at least in the case where log technology follows a random walk, risk aversion follows an AR(1) process of the form

    \alpha_t = (1 - \theta) \bar{\alpha} + \theta \alpha_{t-1} + \varepsilon^X_t \qquad (2.24)

where ε_t^X denotes a measure of technology growth. We then have two measures of α_t, which I denote α̂^{TFP} and α̂^{LP}, using ε^{TFP} and ε^{LP}, respectively.[40] The two measures turn out to be highly correlated (93 percent).

Figure 2.6 plots five-year excess returns against α̂^{TFP} and α̂^{LP}. The two series are both clearly highly correlated with future excess returns. The p-values in regressions of quarterly excess returns on α̂^{TFP} and α̂^{LP} are 0.032 and 0.026, respectively (using Kiefer, Vogelsang, and Bunzel, 2000, t-type statistics to account for autocorrelation).
The relationship between the three series is most clear around the turning points. Productivity growth begins slowing down around 1970, driving risk aversion upwards. Forward-looking stock returns reach their trough at roughly the same point. Productivity growth rises again starting in the mid-1990s, which is exactly when stock returns begin falling again.

The two measures of risk aversion in figure 2.6 clearly do not have the explanatory power of the variables studied above. The wealth and consumption data used above have a forward-looking component that is not present in instantaneous measures of technology. On the other hand, my measure of risk aversion is highly correlated with the consumption-wealth ratio. In almost any model with time-varying discount rates, the consumption-wealth ratio will forecast stock returns. It is not the case, though, that any model will predict that measures of technology should forecast stock returns. Figure 2.6 thus provides evidence in favor of the model presented here over other explanations of time-varying expected returns.

40 Note that $\hat{\alpha}^{TFP}$ includes some forward-looking information since its construction requires the estimation of an MA(2) on the full sample. $\hat{\alpha}^{LP}$ does not suffer from this flaw. It is true that in both cases we have to estimate mean productivity growth, but shifts in the estimated mean simply correspond to shifts in the mean of $\hat{\alpha}_t$; they have no effect on its dynamics. In regressions of returns on $\hat{\alpha}_t$, the constant will thus always absorb shifts in $\bar{\alpha}$, so the estimation of the mean of productivity growth is irrelevant for forecasting returns.

Figure 2.5: Stock returns and estimates of risk aversion from productivity growth
[Time-series plot, 1955 to 2005. Series: risk aversion from labor productivity, risk aversion from total factor productivity (estimated risk aversion, normalized), and five-year stock returns.]
Note: Total factor productivity is the quarterly Solow residual obtained from John Fernald's website. Labor productivity is output per hour in the non-farm private business sector from the BLS. Risk aversion is an AR(1) with innovations equal to the (negative) innovations to the Beveridge–Nelson trend in productivity.

2.5 Extensions

2.5.1 Log-linearization

This section studies a log-linear version of the production economy from above. I use the solution to build a better understanding of the basic results reported in the previous sections. In particular, I derive analytic approximations for the consumption function, the risk-free rate and the conditional Sharpe ratio for the wealth portfolio. I also derive an essentially affine model of the term structure with a time-varying price of risk. This result connects the standard production theory in the macro literature to one of the most commonly used empirical asset-pricing frameworks.

2.5.1.1 Approximation method and solution

I derive a log-linear consumption function as the solution to a model that represents a log-linearization of the environment derived above with permanent technology shocks only. Specifically, if the capital accumulation equation, the return on capital, and the return on the wealth portfolio are log-linearized, then we are able to obtain an exact formula for the consumption function under EZ-habit utility (with canonical Epstein–Zin and power utility as special cases). The methods build on Campbell (1994) and Lettau (2003). Unlike the usual techniques in the macro literature, the solution is not based on certainty-equivalent or higher-order approximations to expectations.41 Rather, I take advantage of log-normality to calculate expectations exactly. That feature of the method is critical for accurately capturing risk premia and precautionary saving effects.
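To see why exact log-normal expectations matter, it can help to compare them with a certainty-equivalent calculation in the simplest possible case. The sketch below is illustrative only: it evaluates log E[exp(-a x)] for a normal shock x, where the coefficient `a` stands in for a risk-aversion-like loading, and all function names and parameter values are hypothetical. The certainty-equivalent version drops the variance term, which is exactly the kind of term (risk aversion times shock variance) that drives risk premia and precautionary saving.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def log_mean_exact(a, mu, sigma):
    """log E[exp(-a * x)] for x ~ N(mu, sigma^2), via the log-normal formula."""
    return -a * mu + 0.5 * (a * sigma) ** 2

def log_mean_certainty_equivalent(a, mu):
    """Same object with x replaced by its mean, so the variance term is lost."""
    return -a * mu

def log_mean_quadrature(a, mu, sigma, degree=40):
    """Numerical check of the exact formula using Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(degree)  # weight function exp(-z^2 / 2)
    values = np.exp(-a * (mu + sigma * nodes))
    return np.log(np.sum(weights * values) / np.sqrt(2.0 * np.pi))

# Hypothetical numbers: loading 10, mean growth 0.5 percent, volatility 2 percent.
a, mu, sigma = 10.0, 0.005, 0.02
gap = log_mean_exact(a, mu, sigma) - log_mean_certainty_equivalent(a, mu)
```

The gap equals 0.5 * (a * sigma)^2, so it grows with the square of the loading; a solution method that discards it cannot generate a precautionary term that moves with risk aversion.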
It is straightforward to show that the approximation technique delivers policy functions that are identical to those obtained from perturbation up to the first order (depending on where the derivatives are taken). The difference is that because I take advantage of formulas for log-normal expectations, a term involving risk aversion appears in the solution, meaning that the approximation captures the time-varying precautionary saving effect that is central to driving the difference in the consumption response between the EZ-habit model and the RBC model with canonical Epstein–Zin preferences.42

41 The usual technique is the perturbation method of Judd (1999). See Woodford (2003) for a representative application and Rudebusch and Swanson (2009) for extensions to higher-order approximations.

42 In perturbation, the equilibrium equations are not only approximated with respect to the endogenous and exogenous variables, but also the volatility of the technology shock, $\sigma_a$. The first-order perturbation solution therefore does not include any terms involving interactions of state variables with $\sigma_a$. The approximation used here includes a term for $\alpha_t \sigma_a^2$. The solutions are otherwise identical.

The approximation method involves log-linear approximations to three components of the model: the budget constraint, the return on the wealth portfolio, and the return on capital,

$$k_{t+1} \approx \lambda_0 + \lambda_k k_t + \lambda_a a_t + \lambda_c c_t \tag{2.25}$$

$$r_{w,t+1} \approx E_t r_{w,t+1} + \Delta E_{t+1} \sum_{j=0}^{\infty} \theta^j \Delta c_{t+j+1} - \Delta E_{t+1} \sum_{j=1}^{\infty} \theta^j r_{w,t+j+1} \tag{2.26}$$

$$r_{k,t+1} \approx r_0 + r_k \left(k_{t+1} - a_{t+1}\right) \tag{2.27}$$

where $\{\lambda_0, \lambda_k, \lambda_a, \lambda_c, \theta, r_0, r_k\}$ are linearization coefficients that I solve for in the appendix and lower-case letters denote logs.43

Define $\tilde{c}_t \equiv c_t - a_t$ and $\tilde{k}_t \equiv k_t - a_t$. With the three log-linear approximations (2.25), (2.26), and (2.27), the appendix shows that we can obtain the following result:

Proposition 2.1.
Given the log-linear budget constraint, the log-linear return on the consumption claim, and the production function, the optimal consumption plan takes the form

$$\tilde{c}_t = \eta_{c0} + \eta_{ck} \tilde{k}_t + \eta_{ca} \alpha_t \tag{2.28}$$

and the return on the wealth portfolio can be written as

$$r_{w,t+1} = \eta_{w0} + \eta_{wa} \alpha_t + \rho E_t \Delta c_{t+1} + \kappa_r \varepsilon_{t+1} \tag{2.29}$$

where the coefficients $\{\eta_{c0}, \eta_{ck}, \eta_{ca}, \eta_{w0}, \eta_{wa}, \kappa_r\}$ are solved for in the appendix. Furthermore, the log price-dividend ratio of the consumption claim is linear in risk aversion and scaled capital.

The first important implication of this result is that even though risk aversion is time-varying, both consumption growth and returns on the wealth portfolio are homoskedastic. We did not assume the existence of a log-linear policy for consumption or a homoskedastic wealth return. Variation in risk aversion induces variation in expected returns on risky assets, but not in their volatility. This same result is obtained in the numerical solution above.

43 (2.25) is a log-linearization of the resource constraint, $K_{t+1} = (1-\delta) K_t + A_t^{1-\gamma} K_t^{\gamma} - C_t$; (2.26) is the Campbell–Shiller (1988) approximation for the return on the wealth portfolio; (2.27) is a linearization of $R_{k,t+1} = \gamma K_{t+1}^{\gamma-1} A_{t+1}^{1-\gamma} + 1 - \delta$.

Remark 2.1. $\eta_{ck}$ does not depend on the level or volatility of the coefficient of relative risk aversion (i.e. on $\bar{\alpha}$ or $\lambda$).

The finding that $\eta_{ck}$ is not affected by the time-variation in the coefficient of relative risk aversion helps us build intuition as to why the IRF for consumption changes when risk aversion varies. Consumption responds to a technology shock more strongly in the EZ-habit model than in the RBC model purely because the coefficient of relative risk aversion falls in response to positive technology shocks. If the economy experienced a hypothetical shock to the size of the capital stock holding the coefficient of relative risk aversion fixed, the behavior of consumption and saving would be identical under EZ-habit and EZ-CRRA preferences.
2.5.1.2 The risk-free rate and excess returns on the wealth portfolio

Denote the conditional standard deviation of a variable $x$ as $\sigma(x)$. We have the following formulas for the risk-free rate and Sharpe ratio,

Proposition 2.2. In the log-linearized model, the risk-free rate follows

$$r_{f,t+1} = \eta_{f0} + \rho E_t \Delta c_{t+1} - \eta_{fa} \alpha_t \tag{2.30}$$

and the Sharpe ratio of the consumption claim is

$$\frac{E_t r_{w,t+1} - r_{f,t+1} + \frac{1}{2}\sigma^2(r_w)}{\sigma(r_w)} = \rho\,\sigma(\Delta c) + (\alpha_t - \rho)\,\sigma(\Delta v) \tag{2.31}$$

As usual, expected consumption growth affects the risk-free rate in proportion to the inverse of the EIS. There is an additional term $\eta_{fa}\alpha_t$ reflecting the time-varying precautionary saving motive. When risk aversion is high, precautionary saving demand is high, driving the risk-free rate downwards, all else equal. It is immediately clear that a simple regression of consumption growth on interest rates will not identify the EIS, $1/\rho$, unless the instruments used for the interest rate are uncorrelated with current risk aversion or risk aversion is controlled for.

Note also that the Sharpe ratio is strictly increasing in $\alpha_t$. The terms $\sigma(\Delta c)$ and $\sigma(\Delta v)$ are the standard deviations of growth in household consumption and value, respectively (both of which are constant in equilibrium). Tallarini (2000) and Lettau (2003) show that in an RBC model with power utility, an increase in risk aversion need not increase the Sharpe ratio because consumers can endogenously smooth consumption.44 In this setting, though, risk aversion unambiguously increases the Sharpe ratio. The reason is that the preference for consumption smoothing comes from the EIS. An increase in risk aversion does not cause consumers to smooth consumption endogenously, and so the only effect is to raise the Sharpe ratio on the wealth portfolio.

An obvious question is what the term $\sigma(\Delta v)$ actually is. The appendix derives the following results,

Proposition 2.3.
In equilibrium, the coefficient of relative risk aversion $\alpha_t$ follows the process

$$\alpha_t = \phi\,\alpha_{t-1} + (1-\phi)\,\bar{\alpha} + \sigma_{aa}\,\varepsilon_t \tag{2.32}$$

where $\sigma_{aa}$ depends on the parameters $\lambda$, $\theta$, $\phi$, and $\sigma_a$. The standard deviation of innovations to value is

$$\sigma(\Delta v) = \frac{-1 + \sqrt{1 + 2\,\frac{\theta}{1-\theta\phi}\,\sigma_{aa}\,\sigma_a}}{\frac{\theta}{1-\theta\phi}\,\sigma_{aa}} \tag{2.33}$$

$\sigma(\Delta v)$ is increasing in $\sigma_a$ for $\sigma_a > 0$ and $\sigma_{aa} < 0$.

Result 2.3 first shows that the coefficient of relative risk aversion follows an AR(1) process with innovations that are perfectly correlated with technology shocks. This result is a consequence of household value being a log-linear function of the level of technology, so that $v_{t+1} - E_t v_{t+1}$ is a linear function of the technology shock. Result 2.3 also shows that the standard deviation of innovations to household value is constant. Furthermore, in the benchmark case where technology shocks drive risk aversion down, an increase in their volatility raises the volatility of innovations to value, as we would expect.

44 Lettau and Uhlig (2004) and Rudebusch and Swanson (2008) obtain similar results for Campbell–Cochrane preferences.

In the case where $\sigma_{aa} = 0$, which corresponds to power utility, we obtain a surprisingly simple formula:

Remark 2.2. For the RBC model where technology follows a random walk and consumers have constant relative risk aversion, the Sharpe ratio on a consumption claim is approximately

$$SR_t \approx \alpha\sigma_a (1 - \eta_{ck}) + (\alpha - \rho)\,\sigma_a \eta_{ck} \tag{2.34}$$

$$\phantom{SR_t} \approx \alpha\sigma_a - \rho\,\sigma_a \eta_{ck} \tag{2.35}$$

This formula is similar to the formula obtained by Bansal and Yaron (2004) for the Sharpe ratio in the presence of long-run risks in an endowment economy. $\sigma_a$ represents long-run shocks to consumption growth, since consumption eventually catches up to a technology shock. Of that total response, $\sigma_a (1 - \eta_{ck})$ comes in the first period, with $\sigma_a \eta_{ck}$ in subsequent periods.45 We can thus think of the first component of the Sharpe ratio, $\alpha\sigma_a (1 - \eta_{ck})$, as Bansal and Yaron’s short-run risk term, and $(\alpha - \rho)\,\sigma_a \eta_{ck}$ as long-run risks.
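The formulas for the volatility of value and the power-utility Sharpe ratio can be checked numerically. The sketch below rests on a stated assumption: it reads the expression for the standard deviation of innovations to value as the positive root of the quadratic 0.5*b*x^2 + x - sigma_a = 0 with b = theta/(1 - theta*phi)*sigma_aa, which collapses to sigma_a when sigma_aa = 0. The function names and all parameter values are hypothetical and are not the calibration used in the text.

```python
import math

def sigma_dv(sigma_a, sigma_aa, theta, phi):
    """Std. dev. of innovations to value under the assumed quadratic-root form."""
    b = theta / (1.0 - theta * phi) * sigma_aa
    if abs(b) < 1e-12:
        return sigma_a  # power-utility limit (sigma_aa = 0)
    disc = 1.0 + 2.0 * b * sigma_a
    if disc < 0.0:
        raise ValueError("no real solution for these parameter values")
    return (-1.0 + math.sqrt(disc)) / b

def sharpe_ratio(alpha, rho, sigma_dc, sigma_dv_value):
    """Sharpe ratio of the consumption claim: rho*sigma(dc) + (alpha - rho)*sigma(dv)."""
    return rho * sigma_dc + (alpha - rho) * sigma_dv_value

def sharpe_ratio_power_utility(alpha, rho, sigma_a, eta_ck):
    """Short-run plus long-run risk decomposition for the constant-risk-aversion case."""
    short_run = alpha * sigma_a * (1.0 - eta_ck)
    long_run = (alpha - rho) * sigma_a * eta_ck
    return short_run + long_run

# Hypothetical values chosen only to exercise the formulas.
alpha, rho, sigma_a, eta_ck = 10.0, 0.75, 0.01, 0.3
sr_general = sharpe_ratio(alpha, rho, sigma_a * (1.0 - eta_ck),
                          sigma_dv(sigma_a, 0.0, 0.96, 0.95))
sr_power = sharpe_ratio_power_utility(alpha, rho, sigma_a, eta_ck)
```

With sigma_aa set to zero the general Sharpe ratio formula and the two-term decomposition coincide, and both equal alpha*sigma_a - rho*sigma_a*eta_ck, which is the simplified second line of the remark.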
Kaltenbrunner and Lochstoer (2010) also show that production models generate long-run risk endogenously, but this simple formula for the Sharpe ratio has not been obtained elsewhere. The second line shows that we can isolate the $\eta_{ck}$ term. If the EIS is large ($\rho$ is small) then the endogenous response of consumption in the model is unimportant and the Sharpe ratio is determined simply by the volatility of technology shocks and the coefficient of relative risk aversion.

45 Recall that $\eta_{ck}$ is the coefficient on scaled capital in the consumption function. A unit increase in the technology shock $\varepsilon_{t+1}$ raises consumption by $\sigma_a$; the associated decline in scaled capital of $\sigma_a$ lowers consumption by $\eta_{ck}\sigma_a$.

2.5.2 Affine bond pricing

It turns out that the log-linear solution to the model allows us to connect the standard macro framework to the bond-pricing literature through the following result:

Proposition 2.4. The log stochastic discount factor can be expressed as

$$m_{t+1} = -r_{f,t+1} - \frac{1}{2}\left(\omega_0 + \omega_1 \alpha_t\right)^2 \sigma^2 + \left(\omega_0 + \omega_1 \alpha_t\right) \varepsilon_{t+1} \tag{2.36}$$

The SDF takes the tractable essentially affine form studied in much of the recent bond-pricing literature (see Duffee, 2002, and Piazzesi, 2010, for a recent review). We have a production-based general-equilibrium affine model of the term structure with an endogenously varying price of risk. Not only is the one-period risk-free rate affine in the state variables, but so are the prices and yields for all longer-term zero-coupon bonds. The fact that the SDF is affine is convenient because it means that the model could be estimated using the Kalman filter, either through Bayesian or frequentist methods. I am not aware of an affine model of the term structure with a time-varying price of risk being derived in a production setting previously.

2.5.3 Labor supply

I model labor supply as in Gourio (2010) and van Binsbergen et al. (2010).
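To illustrate how an essentially affine SDF delivers affine bond prices, the following sketch computes zero-coupon bond loadings in a deliberately simplified one-factor version of the setup. It is not the model solved in the text: it assumes the short rate depends only on risk aversion, r = delta0 + delta1 * alpha_t (ignoring the expected consumption growth term), that alpha_t follows an AR(1) with shock loading sigma_aa, and that the log SDF has the form in Proposition 2.4 with a normally distributed shock of standard deviation sigma. All names and parameter values are hypothetical.

```python
def affine_bond_loadings(n_max, delta0, delta1, phi, alpha_bar,
                         sigma_aa, omega0, omega1, sigma=1.0):
    """Return lists A, B with log price of an n-period bond = A[n] + B[n] * alpha_t.

    Recursion from log P_n = log E_t[exp(m_{t+1} + log P_{n-1,t+1})] with
    m_{t+1} = -r_t - 0.5*L_t^2*sigma^2 + L_t*eps_{t+1},  L_t = omega0 + omega1*alpha_t,
    alpha_{t+1} = phi*alpha_t + (1 - phi)*alpha_bar + sigma_aa*eps_{t+1}.
    """
    A, B = [0.0], [0.0]  # a zero-period bond pays 1 for sure
    var = sigma ** 2
    for _ in range(n_max):
        a_prev, b_prev = A[-1], B[-1]
        A.append(a_prev - delta0
                 + b_prev * (1.0 - phi) * alpha_bar
                 + omega0 * b_prev * sigma_aa * var
                 + 0.5 * (b_prev * sigma_aa) ** 2 * var)
        # The price of risk shifts the effective persistence of the state.
        B.append(-delta1 + b_prev * (phi + omega1 * sigma_aa * var))
    return A, B

def bond_yields(A, B, alpha):
    """Per-period log yields for maturities 1..n_max at state alpha."""
    return [-(A[n] + B[n] * alpha) / n for n in range(1, len(A))]

# Hypothetical parameter values, chosen only to run the recursion.
A, B = affine_bond_loadings(n_max=40, delta0=0.01, delta1=-0.002, phi=0.9,
                            alpha_bar=5.0, sigma_aa=-0.1, omega0=0.2, omega1=0.05)
curve = bond_yields(A, B, alpha=5.0)
```

The term `omega1 * sigma_aa * var` in the loading recursion is where the time-varying price of risk enters: with `omega1 = 0` the price of risk is constant and yields load on the state exactly as under the expectations hypothesis.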
The household’s value function takes the form

$$V_t = \left\{ \left(1 - \exp(-\beta)\right) \left[ C_t^{v} (1 - N_t)^{1-v} \right]^{1-\rho} + \exp(-\beta) \left( E_t V_{t+1}^{1-\alpha_t} \right)^{\frac{1-\rho}{1-\alpha_t}} \right\}^{\frac{1}{1-\rho}} \tag{2.37}$$

where $N_t$ represents market labor and $\alpha_t$ follows the same process as above. The household’s labor supply condition is

$$\omega_t = \frac{1}{1 - N_t}\,\frac{1-v}{v}\,C_t \tag{2.38}$$

where $\omega_t$ is the wage. Note that risk aversion and habits only affect labor supply to the extent that they affect consumption. When consumption rises, labor falls, all else equal. Since positive permanent technology shocks drive consumption up further in the EZ-habit model compared to the EZ-CRRA case, labor supply will rise by less in the EZ-habit model. Following a temporary technology shock, there is little change in risk aversion, so labor supply in that case will look similar in the EZ-habit and EZ-CRRA models.

This result is not specific to the Cobb-Douglas utility specification studied here. In general, preferences consistent with balanced growth will specify labor supply as some function $H(\omega_t / C_t)$ (see King, Plosser, and Rebelo, 1988). Since $\omega_t / C_t$ is stationary with balanced growth, labor supply will be too. If $H$ is monotonically increasing, regardless of its functional form, the increase in consumption induced by a decline in risk aversion will also lower labor supply.

To see how habits affect labor supply here, figure 2.7 plots the response of employment to a shock to technology in the EZ-habit model versus a model with constant relative risk aversion.46 With EZ-CRRA we have the usual RBC result that the increase in technology increases $\omega_t / C_t$, thus raising employment. Employment then slowly falls back down to its steady state. In the simple RBC model, it is possible to make labor supply fall following a shock by varying the parameters, but it always monotonically returns to steady state. In the EZ-habit specification, though, the response of labor supply has a hump shape.
On the impact of the shock, employment barely increases at all, and it then rises slowly thereafter. This behavior actually matches the response of employment to technology shocks in the literature following Gali (1999).47 In particular, all of those papers, though they use different methods, and though they find different initial responses of employment to technology, find a pronounced hump shape. Basu, Fernald, and Kimball (2006) argue that this could be explained by a New Keynesian model. Figure 2.7 shows that variation in risk aversion could also explain that behavior.48

46 All of the parameters are identical to the main text, and $v$ is set to 0.33 as in Gourio (2010).

47 See, e.g. Christiano, Eichenbaum, and Vigfusson (2004), Francis and Ramey (2005), and Basu, Fernald, and Kimball (2006).

48 Note, though, that those papers also find hump-shaped responses for output and consumption, which I do not obtain.

Figure 2.6: Response of employment to a technology shock
[Plot: response of employment in percent over quarters 1 to 37. Series: EZ-CRRA and EZ-habit.]
Note: Response of labor supply to a unit-standard-deviation permanent increase in the level of technology.

Boldrin, Christiano, and Fisher (2001) and Jaccard (2011) note that in the RBC model with power utility and additive habits, variable labor supply undermines the ability of the RBC model to generate a volatile SDF. Intuitively, households can use labor supply to smooth consumption growth. Under power utility, the volatility of consumption growth is what determines the volatility of the SDF. Under Epstein–Zin utility, though, the ability to smooth consumption shocks does not reduce the volatility of the SDF since the SDF loads almost purely on the permanent component of consumption, as we saw in the previous subsection.49 The EZ-habit model thus does not suffer from the drawback of previous habit-based models that freely variable labor supply could substantially reduce the Hansen–Jagannathan bound.
2.6 Conclusion

This paper presents a model of time-varying risk aversion. It simultaneously matches the basic behavior of macroeconomic and financial aggregates. The EZ-habit model gives a framework in which consumption, output, and investment growth are all realistically volatile in both the short and long run, consumption growth is nearly a random walk, and risk premia are high and volatile.

More generally, this paper provides a framework for modeling time-varying discount rates that can be used with other macro models. As pointed out by Cochrane (2011), asset-pricing research has recently focused on understanding variation in the price of risk over time. This paper gives a way of analyzing time-varying risk prices in the standard macro framework. I show that for the RBC model, the effect of an increase in risk aversion on consumption and investment looks similar to a decline in the household’s rate of time preference in the sense that it temporarily increases investment and reduces consumption.

An obvious next step is to study the EZ-habit preferences in a richer setting. Dew-Becker (2011) estimates a standard medium-scale DSGE model with sticky prices and wages, but with the added feature that risk aversion varies over time, as here. Complementing the results in this paper on equity pricing, Dew-Becker (2011) shows that the EZ-habit model, when augmented with a model of inflation, can match the behavior of the nominal term structure well, generating a strongly upward-sloping term structure of nominal interest rates and a volatile term premium.

49 I confirm this result numerically; the Hansen–Jagannathan bound is essentially identical with and without labor supply.

3. BOND PRICING WITH A TIME-VARYING PRICE OF RISK IN AN ESTIMATED MEDIUM-SCALE BAYESIAN DSGE MODEL

3.1 Introduction

Non-structural models are widely used in both macroeconomics and the study of the term structure of interest rates.
Recently, Smets and Wouters (2003) have shown that a structural New Keynesian model can match the dynamics of the macroeconomy as well as or better than a benchmark non-structural VAR. This paper extends that work by showing that a suitably augmented version of their model can also match the dynamics of the term structure of interest rates as well as a standard non-structural model. In addition, including information from the term structure has substantial effects on the estimated sources of variation in the real economy.

Bekaert, Cho, and Moreno (2010) show that a log-linearized macro model naturally also delivers closed-form expressions for bond prices. Their approximation method, however, is not able to describe risk premia, and even if it could, the model assumes that risk premia are constant. This paper builds on their work by using an approximation method that allows for positive and time-varying risk premia. I then estimate the model using Bayesian methods, and show that it fits interest rates with errors that are similar to those generated by a non-structural three-factor model. The errors in fitting annualized yields on bonds with maturities ranging from 1 quarter to 10 years have a standard deviation of 8 basis points.

For the production side of the economy, I take the model described in Justiniano, Primiceri, and Tambalotti (JPT; 2010) and combine it with a preference specification that endogenously generates the essentially affine stochastic discount factor of Duffee (2002). Households are assumed to have Epstein–Zin preferences with time-varying risk aversion as in Melino and Yang (2003) and Dew-Becker (2011a), which induces a time-varying price of risk. I also allow the central bank to have a time-varying inflation target, movements in which shift the entire term structure, inducing a so-called level factor in interest rates. The steady-state term spread in the model simply represents the average risk premium on long-term bonds.
The steady-state term spread is estimated to be 152 basis points, similar in magnitude to the 207-basis-point average observed in the sample. To understand why that risk premium would be large, we first need to understand what drives the variance of the pricing kernel. When the representative household has Epstein–Zin preferences with a coefficient of relative risk aversion that is substantially larger than the inverse of its EIS (preferring an early resolution of uncertainty), state prices are almost entirely driven by innovations to the household’s lifetime utility, i.e. the value placed on its entire future stream of consumption and leisure. With a high EIS, transitory changes in consumption have a small effect on lifetime utility. Permanent technology shocks, though, will have large effects. Shifts in risk aversion also affect lifetime utility because they affect how much the household penalizes future uncertainty. Even though there are nine shocks in the economy, only two of them turn out to be relevant for the pricing kernel—labor-neutral technology and risk aversion. Since all of the other shocks (e.g. monetary policy, markups, government spending) are purely transitory, they have little effect on permanent income or welfare (because the household is estimated to have a relatively high EIS of 1.33), and thus they do not have a strong effect on state prices. Following a positive innovation to the level of technology, nominal interest rates are estimated to fall, making long-term bonds risky and inducing a positive slope in the term structure. This result is common to a variety of New-Keynesian models, e.g. JPT, Smets and Wouters (2004), and Christiano, Trabandt, and Walentin (2011). In this paper, the reason is that the central bank’s inflation target falls following positive technology shocks. Intuitively, a positive supply shock lowers inflationary pressure, which the central bank takes as an opportunity to drive inflation lower for an extended period. 
The fact that the negative correlation between technology shocks and interest rates is obtained in numerous other models that assume a constant inflation target suggests that this is in fact a well-identified feature of the data.

Variation in risk aversion also makes an important contribution to the model’s ability to match the term structure of interest rates, though. Standard statistical tests easily reject a model with constant risk aversion in favor of one with time-varying risk aversion. The pricing errors for bonds are smaller by a factor of three when risk aversion is allowed to vary over time. Movements in risk aversion account for a large fraction of the variance of the term spread, particularly outside of recessions.

While the variance decompositions imply that the pricing kernel is driven entirely by the labor-neutral technology and risk aversion shocks, I find that those two shocks have only minor effects on the dynamics of the real economy in the short run. Risk aversion explains less than 5 percent of the variance of output, consumption, investment, and hours worked at business-cycle frequencies. The variable that is most responsive to the technology shock is hours worked, and the technology shock still explains only 25 percent of its variance. The variance decompositions also differ substantially from the results found by JPT. Whereas JPT find that investment technology shocks are an important driver of the business cycle, I find that they explain little other than investment, and monetary policy and markup shocks play much larger roles. This finding suggests that including information about bond prices in estimation has important effects on estimation results.

In addition to matching the behavior of the term structure, the estimated parameters imply reasonable behavior for equity prices. The steady-state annualized Hansen–Jagannathan bound is estimated to be 0.47, which is consistent with the observed Sharpe ratio for the stock market in the data sample, even though data on equity returns are not included in the estimation. Furthermore, the estimated degree of variation in risk aversion is similar to (though somewhat higher than) the value used in Dew-Becker (2011a), who calibrates a general-equilibrium model that can match both the average Sharpe ratio on equities and empirical stock return forecasting regressions. At business-cycle frequencies, estimated risk aversion displays similar behavior to Cochrane and Piazzesi’s (2005) tent-shaped bond return forecasting factor (and they are both strongly correlated with the term spread).

This paper is related to a small but growing literature on bond pricing in production economies. Bekaert, Cho, and Moreno (2010) and Doh (2011) estimate New-Keynesian macro models, but they do not focus on the size and volatility of the term premium, whereas that is the feature of the term structure that this paper concentrates on. Rudebusch and Swanson (2011) generate a large and volatile term premium in a calibrated model. This paper moves beyond them by considering a substantially more complex model and showing that it can be dynamically estimated through standard Bayesian methods using the Kalman filter.

Models of the business cycle have strong implications for the term structure of interest rates, so adding that information can have strong effects on estimation results. For example, I find that when the model is estimated without bond price information, the shock to investment technology is estimated to account for a large fraction of the variance of short-term interest rates and the term spread. But when the term spread is included as part of the information set, the effects of investment technology shocks are much smaller.
The implications of the model for wage-setting also change when interest rates are added: I estimate a substantially larger Frisch elasticity than JPT do, coming in closer agreement with micro evidence.

The remainder of the paper is organized as follows. Section 3.2 describes household preferences and derives the pricing kernel. Section 3.3 describes the remainder of the economy including the production process, price setting, and monetary and fiscal policy. Next, section 3.4 explains how the model is solved. If we used perturbation methods, a third-order approximation would be necessary to capture time-variation in risk premia. The estimation of the model turns out to be sufficiently difficult, however (due to numerous local extrema in the likelihood function, a common feature of models of the term structure), that the use of a nonlinear filter for calculating the model’s marginal likelihood is infeasible. I therefore use the essentially affine solution method described in Dew-Becker (2011b). The method approximates the pricing kernel separately from the remainder of the model, allowing it to take the essentially affine form with a time-varying price of risk described in Duffee (2002). The essentially affine method is equivalent to a first-order perturbation local to the non-stochastic steady-state, but it includes corrections for volatility that allow it to substantially outperform perturbation in stochastic simulations. The key feature of the essentially affine method is that risk premia may vary over time and affect real variables, not just asset prices. Section 3.5 describes the Bayesian methods used to estimate the model. Sections 3.6 and 3.7 examine the implications of the estimates for asset prices and the dynamics of the real economy, respectively. Finally, section 3.8 concludes.
3.2 Household preferences

3.2.1 Objective function and budget constraint

I assume the household has recursive preferences over consumption and leisure

$$V_t = \left\{ (1 - \beta B_t)\, U\!\left(C_t, \bar{C}_{t-1}, N_t, Z_t\right) + \beta B_t \left( E_t V_{t+1}^{1-\alpha_t} \right)^{\frac{1-\rho}{1-\alpha_t}} \right\}^{\frac{1}{1-\rho}} \tag{3.1}$$

where $C_t$ is consumption, $\bar{C}_t$ is aggregate consumption, $N_t$ is the number of hours worked outside the home, and $E_t$ denotes the expectation operator conditional on information available at date $t$. The term $\bar{C}_{t-1}$ allows the period utility function to potentially include external habit formation. The level of technology, $Z_t$, may also affect household utility in order to ensure balanced growth (as in Rudebusch and Swanson, 2010).

$B_t$ is an exogenous shock to the household’s rate of time preference. The choice of exactly how to specify this preference shock is not trivial. The goal is to generate variation in consumption demand conditional on the level of interest rates. However, because $B_t$ enters the value function, it may also affect the level of $V_t$, and hence asset prices. The specification (3.1) has the feature that if period utility, $U(C_t, \bar{C}_{t-1}, N_t, Z_t)$, is constant over time, then a change in $B_t$ will have no effect on $V_t$. So in some sense it purely affects the relative preference for consumption today versus in the future, as opposed to also affecting the household’s overall level of welfare.1 This specification thus imposes the restriction that intertemporal preference shocks are per se unpriced (in the sense that if they have no effect on consumption or leisure, they have no effect on the pricing kernel) since they have no direct effect on the level of welfare.

1 Variance decompositions for the estimated model reported below confirm that shocks to $B_t$ have essentially no effect on the pricing kernel.

The household’s coefficient of relative risk aversion, $\alpha_t$, is allowed to vary over time.
Dew-Becker (2011a) motivates variation in $\alpha_t$ by considering adding a time-varying benchmark to the standard Epstein–Zin certainty equivalent, $E_t (V_{t+1} - H_t)^{1-\alpha}$. When $V_{t+1}$ is close to $H_t$, the household’s effective risk aversion over shocks to $V_{t+1}$ rises. The formulation (3.1) has the advantage that it is log-linear and we do not have to worry about the possibility that $V_{t+1}$ falls below $H_t$. In Dew-Becker (2011a), movements in $\alpha_t$ are connected to movements in the household’s welfare. I loosen that constraint here and allow for independent shocks to risk aversion (equivalently, independent shocks to the habit). Melino and Yang (2003) study a similar specification, but without the emphasis on the habit. Unlike the intertemporal preference shock, since $\alpha_t$ directly affects the level of welfare, shocks to $\alpha_t$ will be per se priced; that is, even if they have no effect on consumption or leisure, they will still affect the pricing kernel through their impact on the level of welfare.

The household’s budget constraint is

$$P_t C_t + P_t I_t + H_t + D_t = (1 + i_t) H_{t-1} + W_t N_t + \Pi_t + R_{k,t} u_t \bar{K}_{t-1} - P_t a(u_t) \bar{K}_{t-1} + D_{t-1} \tag{3.2}$$

where $P_t$ is the price of the consumption good, $I_t$ is the expenditure on physical investment, $H_t$ is holdings of one-period nominal bonds, $D_t$ is cash holdings, $i_t$ is the nominally riskless interest rate, $W_t$ is the wage, and $\Pi_t$ represents profits and other lump-sum transfers paid to the household. $R_{k,t}$ is the rental rate on capital, $\bar{K}$ the quantity of capital the household owns, and $u_t$ the fraction it chooses to rent (with associated costs $a(u_t)$). The dynamics of investment and capital accumulation will be discussed in more detail below. For now it is sufficient to simply note that the household sells labor and capital and allocates the proceeds between consumption and saving.

For the sake of simplicity, I study the so-called cashless economy described in Woodford (2003).
The monetary authority is able to control the interest rate because money enters the household's utility function, but the effect of money on total utility is sufficiently small that we can ignore it when writing $V$ (i.e. we take the limit where the relative importance of money goes to zero). I do not discuss money any further and from now on drop it from the household's budget constraint.

3.2.2 The stochastic discount factor

In general, the stochastic discount factor under recursive preferences involves transformations of the household's value function. It is often practically difficult to directly solve for the value function. As usual with Epstein–Zin preferences, it is possible in this setting to obtain an expression for the stochastic discount factor (SDF) involving consumption growth and an asset return. However, the asset whose return enters the SDF is no longer the household's total wealth portfolio: it is now an asset that pays a dividend depending on the period utility function $U$ and the marginal utility of consumption.

The intertemporal marginal rate of substitution of consumption between neighboring dates is

$$M_{t+1} \equiv \frac{\partial V_t / \partial C_{t+1}}{\partial V_t / \partial C_t} = \beta B_t \, \frac{1 - \beta B_{t+1}}{1 - \beta B_t} \, \frac{U_{C,t+1}}{U_{C,t}} \, \frac{V_{t+1}^{\rho - \alpha_t}}{\left( E_t V_{t+1}^{1-\alpha_t} \right)^{\frac{\rho - \alpha_t}{1-\alpha_t}}} \tag{3.3}$$

where $U_{C,t} \equiv \partial U_t / \partial C_t$ is the marginal (period) utility of consumption. $M_{t+1}$ denotes the SDF between dates $t$ and $t+1$. In the case where $U_t = C_t^{1-\rho}$ and $B_t$ is constant, $M_{t+1}$ reduces to the usual formula for the SDF when utility depends only on consumption (e.g. Epstein and Zin, 1991). If the (period) marginal utility of consumption depends on labor, then the SDF will be distorted in the usual ways through $U_{C,t+1}/U_{C,t}$. Even if $U_C$ only depends on consumption, though (i.e.
if period utility is separable between consumption and leisure), variation in labor will still affect the SDF through $V_{t+1}$: with recursive preferences, it is not generally possible to separate labor supply decisions from asset prices, unlike the case where preferences are separable between consumption and labor and over time.

3.2.2.1 Substituting in an asset return

Now consider an asset that pays $U_t U_{C,t}^{-1}$ as its dividend in each period. In the usual analysis of Epstein–Zin preferences, one substitutes the return on an asset that pays consumption as its dividend into the SDF. In the present case, dividing period utility, $U_t$, by the marginal utility of consumption intuitively converts $U_t$ from utility units into consumption units.

We now derive the price of a claim to $U_t U_{C,t}^{-1}$. Denote the cum-dividend price of this asset as $W_{U,t}$. The appendix confirms the guess that

$$W_{U,t} = V_t^{1-\rho} \, U_{C,t}^{-1} / (1 - \beta B_t) \tag{3.4}$$

and that we can substitute the return on this asset into the SDF to obtain

$$M_{t+1} = \beta^{\frac{1-\alpha_t}{1-\rho}} \left( B_t \, \frac{U_{C,t+1}}{U_{C,t}} \, \frac{1 - \beta B_{t+1}}{1 - \beta B_t} \right)^{\frac{1-\alpha_t}{1-\rho}} R_{U,t+1}^{\frac{\rho - \alpha_t}{1-\rho}} \tag{3.5}$$

where

$$R_{U,t+1} \equiv \frac{W_{U,t+1}}{W_{U,t} - U_t U_{C,t}^{-1}} \tag{3.6}$$

3.2.3 Period utility

The period utility function, $U(C_t, \bar{C}_{t-1}, N_t, Z_t)$, is motivated as a reduced form of a model of household production as in Rudebusch and Swanson (2010). Suppose households have power utility over both market goods and goods produced at home,

$$U_t = \frac{\left( C_t^{\eta} \bar{C}_{t-1}^{1-\eta} \right)^{1-\rho}}{1-\rho} + \varphi_1 \frac{C_{H,t}^{1-\rho}}{1-\rho} \tag{3.7}$$

where $C_{H,t}$ is consumption of the home good. Households do not derive utility directly from leisure, but rather from what they are able to produce in their non-market-work time (as in Campbell and Ludvigson, 2001). The home production function is $Z_t N_{H,t}^{\alpha_H}$, for hours worked at home $N_{H,t}$ and a coefficient $0 < \alpha_H < 1$.
The level of labor-neutral technology in the economy is assumed to be equal (up to a constant of proportionality) in the home and market production sectors.$^{2}$

$^{2}$ Note that in the household sector, an exogenous shift in $Z_t$, all else equal, raises output one-for-one, whereas below we will see that in the market sector it will raise output less than proportionally. The reason is that in the market sector, an increase in $Z_t$ also leads to an identical increase in the size of the capital stock. So, ultimately, the marginal product of labor in both sectors is proportional to $Z_t$. One way to rationalize this slight elision would be if the household accumulates durable goods at home that aid household production. That feature of the model is left out for simplicity.

The period utility function can then be written as

$$U_t \equiv \frac{\left( C_t^{\eta} \bar{C}_{t-1}^{1-\eta} \right)^{1-\rho}}{1-\rho} + Z_t^{1-\rho} \varphi_1 \frac{\left( \bar{H} - N_t \right)^{\alpha_H (1-\rho)}}{1-\rho} \tag{3.8}$$

where $N_{H,t} = \bar{H} - N_t$. $\bar{H}$ denotes the maximum number of hours that the household can work, either at home or in the market, and $N_t$ is market labor. If sleep is part of home production, then $\bar{H}$ might equal 8760 hours for annual data. More generally, though, $\bar{H}$ might be smaller. As a practical matter, $\bar{H}$ affects both the elasticity of utility with respect to market labor and the Frisch elasticity. The three parameters $\varphi_1$, $\bar{H}$, and $\alpha_H$ jointly determine three primary features of household behavior: hours worked, the Frisch elasticity, and the elasticity of utility with respect to market labor.

The first term in (3.8) gives the utility that comes from consumption. The household has power utility over a Cobb–Douglas aggregate of current and (aggregate) past consumption. This formulation differs from the standard recent implementation in the macro literature in that I assume a multiplicative instead of additive habit.
Campbell and Cochrane (1999) show that an additive habit can induce time-varying risk aversion, whereas the multiplicative habit will have no effect on risk aversion; that way, variation in risk preferences is driven purely by $\alpha_t$. The key feature of the additive habit is simply that the marginal utility of current consumption is increasing in last period's consumption, which induces consumers to try to smooth consumption growth, as observed in the data. To obtain that result in this setting (assuming $0 < \eta < 1$), we need $\rho < 1$.

3.3 Aggregate supply

For the supply side of the model, I follow exactly the setup in Justiniano, Primiceri, and Tambalotti (JPT; 2010). JPT is a standard medium-scale New-Keynesian model. It has 7 fundamental shocks: price and wage markups, labor-augmenting technical change, investment-specific productivity, monetary policy, discount rates ($B_t$), and government spending. In JPT's formulation, the monetary authority's inflation target is constant. I allow it to vary to help match the movements in the long end of the yield curve. Other than that and the preference specification, my model is identical to theirs.

The model is also highly similar to Smets and Wouters (SW; 2003). The critical difference between the present setup and SW is that technology is difference-stationary rather than trend-stationary, where the former is standard in the production-based asset pricing literature.$^{3}$ The difference-stationarity assumption helps generate large risk premia: when technology is trend-stationary, there is very little overall risk in the economy, so households must have an implausibly high coefficient of relative risk aversion in order to generate realistic asset prices.$^{4}$

Since the model is standard and laid out in JPT, and the main contribution of this paper is the preference specification and bond pricing, the remainder of this section gives a relatively short description of the production setup.
The appendix gives a full derivation of the model, and the reader is referred to JPT for a more detailed analysis. My description follows theirs closely.

$^{3}$ A difference-stationary process has first-differences that follow a stationary process, so it is integrated of order one. A trend-stationary process, on the other hand, is a process that has random stationary deviations around a non-stochastic trend (where the trend is generally unmodeled and taken as exogenous).

$^{4}$ Below, I estimate average risk aversion to be 18.7 (ignoring the correction from Swanson, 2011). Rudebusch and Swanson (2011), who use stationary technology (with a slightly different preference specification), choose an analogous parameter to be 149.

3.3.1 Producers of physical goods

Final-good producers are competitive in both input and output markets and have a CES production function,

$$Y_t = \left[ \int_0^1 Y_t(i)^{\frac{1}{1+\lambda_{p,t}}} \, di \right]^{1+\lambda_{p,t}} \tag{3.9}$$

where $i$ indexes the types of intermediate goods, $Y_t$ is output of the final good, which can be used for either consumption or investment, and $Y_t(i)$ is the quantity used of the intermediate good of type $i$. The elasticity of substitution across the intermediates, which is governed by $\lambda_{p,t}$ and determines markups in the intermediate-goods sector, varies over time.

Intermediate-good producers are monopolists for their own goods with production function

$$Y_t(i) = \max\left\{ K_t(i)^{\gamma} Z_t^{1-\gamma} N_t(i)^{1-\gamma} - Z_t \bar{F}, \; 0 \right\} \tag{3.10}$$

where $\bar{F}$ is a fixed cost of production that ensures that profits are zero in steady state. $K_t(i)$ and $N_t(i)$ are intermediate-good producer $i$'s purchases of capital and labor services, and $Z_t$ is the level of labor-augmenting technology.

3.3.2 Price setting

We assume Calvo pricing.
In every period, a fraction $1 - \xi_p$ of intermediate-good producers can change their prices, while the remainder index their prices following the rule

$$P_t(i) = P_{t-1}(i) \, \pi_{t-1}^{\iota_p} \, \pi^{1-\iota_p} \tag{3.11}$$

where $P_t(i)$ is the price of good $i$ in terms of the numeraire, $\pi_t \equiv P_t / P_{t-1}$ is aggregate inflation, and

$$P_t = \left[ \int_0^1 P_t(i)^{-\frac{1}{\lambda_{p,t}}} \, di \right]^{-\lambda_{p,t}} \tag{3.12}$$

is the aggregate price index (equal to the marginal cost of a unit of the final good). $\pi$ is the steady-state inflation rate, and the parameter $\iota_p$ determines the degree of indexation to lagged versus average inflation.

The firms that can choose their prices freely in a given period set them to maximize the present discounted value of profits over the period before they are allowed to choose a new price,

$$E_t \sum_{s=0}^{\infty} \xi_p^s M_{t,t+s} \left\{ \left[ P_t(i) \prod_{k=1}^{s} \pi_{t+k-1}^{\iota_p} \pi^{1-\iota_p} \right] Y_{t+s}(i) - W_{t+s} N_{t+s}(i) - R_{k,t+s} K_{t+s}(i) \right\} \tag{3.13}$$

where $M_{t,t+s} \equiv \prod_{j=1}^{s} M_{t+j}$, $W_{t+s}$ is the wage rate, and $R_{k,t+s}$ is the rental rate for capital.

3.3.3 Employment agencies and wage setting

Each household is a monopolistic supplier of specialized labor, $N_t(j)$. Competitive employment agencies aggregate labor supply into a homogeneous labor input (just as the final-good producers aggregate intermediate goods) with the production function

$$N_t = \left[ \int_0^1 N_t(j)^{\frac{1}{1+\lambda_{w,t}}} \, dj \right]^{1+\lambda_{w,t}} \tag{3.14}$$

where, as with prices, $\lambda_{w,t}$ determines the elasticity of demand and hence markups in the labor market. $\lambda_{w,t}$ acts as a labor-supply shock. Since the employment agencies are competitive, the price of a unit of the homogeneous labor input is

$$W_t = \left[ \int_0^1 W_t(j)^{-\frac{1}{\lambda_{w,t}}} \, dj \right]^{-\lambda_{w,t}} \tag{3.15}$$

The labor demand function is then

$$N_t(j) = \left( \frac{W_t(j)}{W_t} \right)^{-\frac{1+\lambda_{w,t}}{\lambda_{w,t}}} N_t \tag{3.16}$$

As with prices, wages can only be changed intermittently, with probability $(1 - \xi_w)$. If a household cannot change its wage, it indexes according to the rule

$$W_t(j) = W_{t-1}(j) \left( \pi_{t-1} \frac{Z_{t-1}}{Z_{t-2}} \right)^{\iota_w} \left( \pi \exp(\gamma) \right)^{1-\iota_w} \tag{3.17}$$

where $\gamma$ is the average growth rate of technology.
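As a numerical sanity check on the labor aggregation in (3.14) to (3.16), the following Python sketch replaces the unit interval of labor types with a handful of equally weighted types. It assumes the wage index takes the standard CES form with exponent $-1/\lambda_{w,t}$; the function names and all numerical values are illustrative assumptions, not part of the estimated model. Under those assumptions, the demands from (3.16) aggregate back to $N_t$ and the total wage bill equals $W_t N_t$.

```python
def wage_index(wages, lam):
    """Discrete analogue of (3.15): W = [mean(W_j^(-1/lam))]^(-lam)."""
    mean_term = sum(w ** (-1.0 / lam) for w in wages) / len(wages)
    return mean_term ** (-lam)


def labor_demand(wages, lam, n_agg):
    """Discrete analogue of (3.16): N_j = (W_j / W)^(-(1+lam)/lam) * N."""
    w_index = wage_index(wages, lam)
    elasticity = (1.0 + lam) / lam
    return [(w / w_index) ** (-elasticity) * n_agg for w in wages]


def labor_aggregate(hours, lam):
    """Discrete analogue of (3.14): N = [mean(N_j^(1/(1+lam)))]^(1+lam)."""
    mean_term = sum(n ** (1.0 / (1.0 + lam)) for n in hours) / len(hours)
    return mean_term ** (1.0 + lam)


# Illustrative (assumed) values: four labor types, a 15 percent wage markup.
wages = [0.9, 1.0, 1.1, 1.3]
lam = 0.15
n_agg = 2.0

hours = labor_demand(wages, lam, n_agg)
w_index = wage_index(wages, lam)
wage_bill = sum(w * n for w, n in zip(wages, hours)) / len(wages)

print("aggregate labor recovered:", labor_aggregate(hours, lam))
print("wage bill vs W*N:", wage_bill, w_index * n_agg)
```

The same check applies to the goods market in (3.9) and (3.12) after relabeling wages as prices and hours as intermediate-good quantities.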
The household will choose its wage in a manner similar to how the intermediate-good firms set prices: it maximizes expected utility over the period that the wage will remain unchanged.

3.3.4 Capital and investment

Intermediate-good firms rent capital from the households at rate $R_{k,t}$. Households own the capital stock $\bar{K}$ and choose a utilization rate $u_t$. The effective quantity of capital rented to firms in period $t$ is

$$K_t = u_t \bar{K}_{t-1} \tag{3.18}$$

The household pays a cost of utilization $a(u_t)$ per unit of capital, with $u = 1$ in steady state, $a(1) = 0$, and $\chi \equiv a''(1)/a'(1)$.$^{5}$ Households accumulate capital according to the rule

$$\bar{K}_t = (1 - \delta) \bar{K}_{t-1} + \mu_t \left( 1 - S\!\left( \frac{I_t}{I_{t-1}} \right) \right) I_t \tag{3.19}$$

where $\delta$ is the depreciation rate and the function $S$ incorporates adjustment costs in the rate of investment. In steady state, $S = S' = 0$ and $S'' > 0$. $\mu_t$ is a shock to the cost of investment at date $t$.

3.3.5 Government policy

The central bank follows a Taylor rule taking the form

$$\frac{R_t}{R} = \left( \frac{R_{t-1}}{R} \right)^{\rho_R} \left[ \pi_t^* \left( \frac{\pi_t}{\pi_t^*} \right)^{\phi_\pi} \left( \frac{X_t}{X_t^*} \right)^{\phi_X} \right]^{1-\rho_R} \left[ \frac{X_t / X_{t-1}}{X_t^* / X_{t-1}^*} \right]^{\phi_{dX}} \eta_{mp,t} \tag{3.20}$$

where $R_t$ is the gross nominal interest rate, $R$ is its steady-state value, $X_t$ is total output, $X_t^*$ is the level of output that would prevail if prices had always been flexible, and $\pi_t^*$ is the inflation target at date $t$. The central bank is allowed to respond to both the level and change in the output gap. This flexibility helps ensure the model can match the dynamics of short-term interest rates, which is critical for capturing the dynamics of the term structure. $\eta_{mp,t}$ is an exogenous monetary policy shock.

$\pi_t^*$ is a time-varying inflation target, which can potentially help match the high inflation and long-term interest rates seen in the early part of the sample. More generally, $\pi_t^*$ induces a level factor in the term structure.

The government finances public spending by selling single-period bonds.
Government expenditures, $G_t$, are a time-varying fraction of total output,

$$G_t = \left( 1 - \frac{1}{g_t} \right) Y_t \tag{3.21}$$

where $g_t$ follows an exogenous process defined below. Households receive no utility from government expenditures. As long as the share of output consumed by the government is stationary, that assumption will have minimal effects on asset prices.

$^{5}$ As usual, in the log-linear approximation, the conditions on the first and second derivatives in steady state are sufficient to describe the dynamics of the model.

3.3.6 Market clearing

The aggregate resource constraint is

$$C_t + I_t + G_t + a(u_t) \bar{K}_{t-1} = Y_t \tag{3.22}$$

3.3.7 Exogenous processes

The price and wage markup shocks follow ARMA(1,1) processes,

$$\log(1 + \lambda_{p,t}) = (1 - \rho_p) \log(1 + \lambda_p) + \rho_p \log(1 + \lambda_{p,t-1}) + \varepsilon_{p,t} - \theta_p \varepsilon_{p,t-1} \tag{3.23}$$

$$\log(1 + \lambda_{w,t}) = (1 - \rho_w) \log(1 + \lambda_w) + \rho_w \log(1 + \lambda_{w,t-1}) + \varepsilon_{w,t} - \theta_w \varepsilon_{w,t-1} \tag{3.24}$$

where $\varepsilon_{p,t} \sim N(0, \sigma_p^2)$ and $\varepsilon_{w,t} \sim N(0, \sigma_w^2)$. The ARMA(1,1) form potentially helps match both the high- and low-frequency features of inflation.

Productivity has a unit root and its growth rate follows an AR(1) process,

$$\Delta z_t = (1 - \rho_z) \bar{z} + \rho_z \Delta z_{t-1} + \varepsilon_{z,t} \tag{3.25}$$

where $\varepsilon_{z,t} \sim N(0, \sigma_z^2)$. The AR(1) setup potentially allows the model to incorporate the long-run risks studied by Bansal and Yaron (2004).

The level of investment-specific productivity is assumed to be a stationary AR(1) process,

$$\log \mu_t = \rho_\mu \log \mu_{t-1} + \varepsilon_{\mu,t} \tag{3.26}$$

where $\varepsilon_{\mu,t} \sim N(0, \sigma_\mu^2)$. Note that $\mu_t$ simply determines the efficiency of the transformation of the final output good into the investment good, so investment still benefits from the unit-root innovations to $Z_t$.

The government's share of output, the monetary policy shock, and the time-preference shock follow AR(1) processes,

$$\log g_t = (1 - \rho_g) \log g + \rho_g \log g_{t-1} + \varepsilon_{g,t} \tag{3.27}$$

$$\eta_{mp,t} = \rho_{mp} \eta_{mp,t-1} + \varepsilon_{mp,t} \tag{3.28}$$

$$\log B_t = \rho_b \log B_{t-1} + \varepsilon_{b,t} \tag{3.29}$$

where $\varepsilon_{g,t} \sim N(0, \sigma_g^2)$, $\varepsilon_{mp,t} \sim N(0, \sigma_{mp}^2)$, and $\varepsilon_{b,t} \sim N(0, \sigma_b^2)$.
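To make the recursions concrete, the following Python sketch simulates an ARMA(1,1) process of the form (3.23) and (3.24) and cumulates AR(1) growth of the form (3.25) into a unit-root log level. The function names, the seed handling, and every parameter value are assumptions chosen for illustration; they are not the estimates reported later in the chapter.

```python
import math
import random


def simulate_arma11(rho, theta, sigma, mean, periods, seed=0):
    """Simulate x_t = (1-rho)*mean + rho*x_{t-1} + e_t - theta*e_{t-1}.

    Here x_t stands for log(1 + lambda_t) in (3.23)-(3.24); the process is
    started at its mean with a zero lagged innovation.
    """
    rng = random.Random(seed)
    x_prev, e_prev = mean, 0.0
    path = []
    for _ in range(periods):
        e = rng.gauss(0.0, sigma)
        x = (1.0 - rho) * mean + rho * x_prev + e - theta * e_prev
        path.append(x)
        x_prev, e_prev = x, e
    return path


def simulate_log_technology(rho_z, sigma_z, mean_growth, periods, seed=0):
    """Cumulate AR(1) growth, as in (3.25), into a unit-root log level z_t."""
    rng = random.Random(seed)
    dz_prev, z = mean_growth, 0.0
    levels = []
    for _ in range(periods):
        dz = (1.0 - rho_z) * mean_growth + rho_z * dz_prev + rng.gauss(0.0, sigma_z)
        z += dz
        levels.append(z)
        dz_prev = dz
    return levels


# Illustrative (assumed) parameter values.
log_markup = simulate_arma11(rho=0.9, theta=0.5, sigma=0.1,
                             mean=math.log(1.15), periods=200, seed=1)
log_tech = simulate_log_technology(rho_z=0.2, sigma_z=0.8,
                                   mean_growth=0.5, periods=200, seed=1)
print("last log markup:", log_markup[-1])
print("last log technology level:", log_tech[-1])
```

Setting the innovation standard deviation to zero is a quick way to confirm that each process sits at its steady state (or on its deterministic trend) when no shocks arrive.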
The two exogenous processes that are added to JPT's original model are the inflation target and risk aversion. As in Dew-Becker (2011a), I allow the innovations to risk aversion to be correlated with $\varepsilon_{z,t}$. Intuitively, this means that risk aversion depends on innovations to the permanent component of consumption. There are also exogenous innovations to risk aversion. We thus have

$$\alpha_t = \rho_\alpha \alpha_{t-1} + (1 - \rho_\alpha) \bar{\alpha} + \theta_{\alpha,z} \varepsilon_{z,t} + \varepsilon_{\alpha,t} \tag{3.30}$$

with $\varepsilon_{\alpha,t} \sim N(0, \hat{\sigma}_\alpha^2)$.

While a number of recent papers have studied models with time-varying inflation targets (e.g. Gurkaynak, Sack, and Swanson, 2005; Doh, 2010), there is little understanding of what actually drives the inflation target. Because the inflation target has a very strong impact on long-term bond prices, the relationship between the inflation target and the other innovations is a key determinant of the prices of long-term bonds. I therefore consider a loose specification where innovations to the inflation target may be correlated with all of the other fundamental shocks (excluding risk aversion). We thus have

$$\log \pi_t^* = (1 - \rho_{\pi^*}) \log \pi + \rho_{\pi^*} \log \pi_{t-1}^* + \theta_{\pi^*,g} \varepsilon_{g,t} + \theta_{\pi^*,z} \varepsilon_{z,t} + \theta_{\pi^*,p} \varepsilon_{p,t} + \theta_{\pi^*,w} \varepsilon_{w,t} + \theta_{\pi^*,b} \varepsilon_{b,t} + \theta_{\pi^*,\mu} \varepsilon_{\mu,t} + \theta_{\pi^*,mp} \varepsilon_{mp,t} + \varepsilon_{\pi^*,t} \tag{3.31}$$

with $\varepsilon_{\pi^*,t} \sim N(0, \hat{\sigma}_{\pi^*}^2)$. All of the shocks $\varepsilon$ are also assumed to be independent.

The $\theta$ parameters in equations (3.30) and (3.31) are somewhat difficult to interpret and choose priors for. I therefore transform these parameters so that they can be interpreted as variance shares. Define

$$\sigma_\alpha^2 \equiv \theta_{\alpha,z}^2 \sigma_z^2 + \hat{\sigma}_\alpha^2 \tag{3.32}$$

$$\sigma_{\pi^*}^2 \equiv \theta_{\pi^*,g}^2 \sigma_g^2 + \theta_{\pi^*,z}^2 \sigma_z^2 + \theta_{\pi^*,p}^2 \sigma_p^2 + \theta_{\pi^*,w}^2 \sigma_w^2 + \theta_{\pi^*,b}^2 \sigma_b^2 + \theta_{\pi^*,\mu}^2 \sigma_\mu^2 + \theta_{\pi^*,mp}^2 \sigma_{mp}^2 + \hat{\sigma}_{\pi^*}^2 \tag{3.33}$$

$\sigma_\alpha^2$ and $\sigma_{\pi^*}^2$ are the variances of the total innovations to risk aversion and the inflation target, respectively.
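The decomposition in (3.32) and (3.33) can be made concrete with a short Python sketch. It adds up the component variances, reports each shock's fraction of the total with the sign of its loading attached, and then inverts the mapping to recover the loadings and the idiosyncratic standard deviation. The function names and all numbers are illustrative assumptions rather than estimates.

```python
import math


def total_innovation_variance(loadings, shock_sds, own_sd):
    """Sum of theta_X^2 * sigma_X^2 over shocks X, plus the own variance."""
    return sum((loadings[k] * shock_sds[k]) ** 2 for k in loadings) + own_sd ** 2


def signed_shares(loadings, shock_sds, own_sd):
    """Share of the total variance due to each shock, signed like theta_X."""
    total = total_innovation_variance(loadings, shock_sds, own_sd)
    return {
        k: math.copysign((loadings[k] * shock_sds[k]) ** 2 / total, loadings[k])
        for k in loadings
    }


def loadings_from_shares(shares, shock_sds, total_variance):
    """Invert the mapping: recover each theta_X and the own standard deviation."""
    thetas = {
        k: math.copysign(math.sqrt(abs(s) * total_variance) / shock_sds[k], s)
        for k, s in shares.items()
    }
    own_variance = total_variance * (1.0 - sum(abs(s) for s in shares.values()))
    return thetas, math.sqrt(own_variance)


# Illustrative (assumed) loadings and shock standard deviations.
loadings = {"z": -0.2, "g": 0.1}
shock_sds = {"z": 0.8, "g": 0.3}
own_sd = 0.3

total = total_innovation_variance(loadings, shock_sds, own_sd)
shares = signed_shares(loadings, shock_sds, own_sd)
recovered_thetas, recovered_own_sd = loadings_from_shares(shares, shock_sds, total)
print("signed shares:", shares)
print("recovered loadings:", recovered_thetas, "own sd:", recovered_own_sd)
```

Because the absolute shares and the idiosyncratic share sum to one, placing priors on the shares keeps each parameter on a bounded, unit-free scale.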
Next, define

$$\sigma_{\alpha,z} \equiv \operatorname{sign}(\theta_{\alpha,z}) \, \frac{\theta_{\alpha,z}^2 \sigma_z^2}{\sigma_\alpha^2} \tag{3.34}$$

$\sigma_{\alpha,z}$ is the share of the total variance of the innovations to risk aversion that is accounted for by labor-neutral technology shocks. The sign of $\sigma_{\alpha,z}$ determines whether the effect of technology shocks on risk aversion is positive or negative. Similarly, for $\pi_t^*$ we can define

$$\sigma_{\pi^*,X} = \operatorname{sign}(\theta_{\pi^*,X}) \, \frac{\theta_{\pi^*,X}^2 \sigma_X^2}{\sigma_{\pi^*}^2} \tag{3.35}$$

for $X \in \{g, z, p, w, b, \mu, mp\}$. The parameters $\sigma_{\alpha,z}$, $\sigma_{\pi^*,X}$, $\sigma_\alpha^2$, and $\sigma_{\pi^*}^2$ map uniquely into the original parameters $\theta_{\alpha,z}$, $\theta_{\pi^*,X}$, $\hat{\sigma}_\alpha^2$, and $\hat{\sigma}_{\pi^*}^2$, and are more easily interpreted as they represent variances and signed variance shares.

3.4 Model solution

The standard method for approximating models of the form studied here is perturbation. The drawback of perturbation methods for our purposes is that if we want time-variation in risk aversion to have any effect on the dynamics of the model, we need to take a third-order approximation to the model. Since the solution would be non-linear, we would have to use the particle filter or some other nonlinear method in order to calculate the marginal likelihood of the model. I have found, though, that it is in general very difficult to find the peak of the likelihood function for this model, and it would be infeasible with a method as slow as the particle filter. This is a common problem in models of the term structure (e.g. Ang and Piazzesi, 2003; Hamilton and Wu, 2011). I therefore use the essentially affine approximation method described in Dew-Becker (2011b).

The essentially affine method delivers an approximation to the equilibrium dynamics of the model that is linear in the state variables but still allows time-varying risk aversion to affect the behavior of the endogenous variables. Dew-Becker (2011b) describes the method in detail and shows that Euler equation errors in simulated models are competitive with third-order perturbations.
Local to the non-stochastic steady state, the essentially affine approximation is as accurate as a first-order perturbation (in a Taylor sense), and hence less accurate than higher-order perturbations. However, in a stochastic setting, it performs well. This section gives a short overview of the method, and the appendix provides further details.

Denote the vector of the variables in the model (including the exogenous processes) as $X_t$ and the vector of fundamental shocks as $\varepsilon_t \equiv [\varepsilon_{mp,t}, \varepsilon_{z,t}, \varepsilon_{b,t}, \varepsilon_{\mu,t}, \varepsilon_{g,t}, \varepsilon_{p,t}, \varepsilon_{w,t}, \varepsilon_{\alpha,t}, \varepsilon_{\pi^*,t}]$. The equations determining the equilibrium of the model take the form

$$0 = G(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \tag{3.36}$$

where the expectation operator may appear in the function $G$. There is one equation for each variable. $\sigma$ is a parameter controlling the variance of the shocks. We will approximate around the point $\sigma = 0$, with the non-stochastic steady state defined as the point $\bar{X}$ such that

$$0 = G(\bar{X}, \bar{X}, 0)$$

The equations $G$ can be divided into two types: those that do not involve taking expectations over the SDF and those that do,

$$G(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) = \begin{bmatrix} D(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \\ E_t\left[ M(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \, F(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \right] - 1 \end{bmatrix} \tag{3.37}$$

where $D$ and $F$ are vector-valued functions and $M$ is the (scalar-valued) stochastic discount factor.$^{6}$

For the equations that do not involve the SDF, I use standard perturbation methods and simply take a log-linear approximation. We approximate $D$ as

$$0 = \log\left( D(\exp(x_t), \exp(x_{t+1}), \sigma \varepsilon_{t+1}) + 1 \right) \tag{3.38}$$

$$0 \approx d_0 + d_x \hat{x}_t + d_{x'} \hat{x}_{t+1} + d_\varepsilon \sigma \varepsilon_{t+1} \tag{3.39}$$

where the terms $d_0$, $d_x$, $d_{x'}$, and $d_\varepsilon$ are coefficients from a Taylor approximation and

$$x_t \equiv \log X_t, \qquad \hat{x}_t \equiv \log X_t - \log \bar{X}$$

$D$ will include equations such as budget constraints.

The second set of equations is dynamic and involves expectations.
In many economic models, including the present one, equations involving expectations take the form

$$1 = E_t\left[ M(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \, F(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \right] \tag{3.40}$$

where $M(X_t, X_{t+1}, \sigma \varepsilon_{t+1})$ is the stochastic discount factor induced by the household's intertemporal optimization condition. The key source of non-linearity in the model is the time-variation in risk aversion, which induces heteroskedasticity in the SDF. It is therefore natural to deal with $M$ and $F$ separately to isolate the relevant non-linearity. I now show that if we log-linearize $F$, we can transform (3.40) into a linear condition that can be solved alongside the remaining equations. $M(X_t, X_{t+1}, \sigma \varepsilon_{t+1})$ will not even be log-linear in the state variables, but we will be able to state the equilibrium conditions as a set of linear expectational difference equations.

$^{6}$ Note that this formulation does not actually restrict $F$. Specifically, suppose there were a set of equilibrium conditions $1 = E_t h(X_t, X_{t+1}, \sigma \varepsilon_{t+1})$, i.e. conditions that do not involve the SDF. We could simply say that $F(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) \equiv h(X_t, X_{t+1}, \sigma \varepsilon_{t+1}) / M(X_t, X_{t+1}, \sigma \varepsilon_{t+1})$.

First, guess that the equilibrium dynamics of the model take the form

$$\hat{x}_{t+1} = C + \Phi \hat{x}_t + \Psi \varepsilon_{t+1} \tag{3.41}$$

We confirm in the end that the solution is actually in this form.

The next step is to take log-linear approximations to $M$ and $F$ separately. Log-linearizing $F$ is straightforward, and we obtain

$$\log F(x_t, x_{t+1}, \sigma \varepsilon_{t+1}) \approx f_0 + f_x \hat{x}_t + f_{x'} \hat{x}_{t+1} + f_\varepsilon \sigma \varepsilon_{t+1} \tag{3.42}$$

For $M$, in the case of the preferences laid out in section 3.2, the appendix shows that it is possible to derive a first-order accurate expression of the form

$$m_{t+1}^{(1)} = m_0 + m_x \hat{x}_t + (\kappa_0 + \alpha_t \kappa_1) \sigma \varepsilon_{t+1} - \frac{1}{2} \sigma^2 \alpha_t^2 \kappa_1 \Sigma \kappa_1' \tag{3.43}$$

where $\Sigma$ is the variance matrix of $\varepsilon_t$. The superscript $(1)$ indicates that $m_{t+1}^{(1)}$ is first-order accurate for the true SDF. (3.43) is the essentially affine form from Duffee (2002).
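To see why the form (3.43) is convenient, consider a scalar sketch in Python with a single standard normal shock and $\sigma = 1$. The code evaluates $\log E_t[MF]$ by numerical integration and checks that, because the $-\frac{1}{2}\alpha_t^2 \kappa_1^2$ term in the log SDF offsets the quadratic term generated by the lognormal expectation, the result is linear in $\alpha_t$. The function names and all coefficient values are assumptions for illustration; the scalar $\kappa_0$ term is retained and only shifts the intercept and slope.

```python
import math


def log_expectation(intercept, loading, half_width=12.0, step=0.001):
    """log E[exp(intercept + loading*eps)] for eps ~ N(0,1), trapezoid rule."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        eps = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(intercept + loading * eps - 0.5 * eps * eps)
    return math.log(total * step / math.sqrt(2.0 * math.pi))


def euler_log_expectation(alpha, m0=-0.01, kappa0=-0.02, kappa1=-0.01,
                          f0=0.005, f_eps=0.08):
    """log E_t[M F] when log M is the scalar analogue of (3.43), sigma = 1.

    log M = m0 - 0.5*alpha^2*kappa1^2 + (kappa0 + alpha*kappa1)*eps
    log F = f0 + f_eps*eps
    """
    intercept = m0 - 0.5 * alpha ** 2 * kappa1 ** 2 + f0
    loading = kappa0 + alpha * kappa1 + f_eps
    return log_expectation(intercept, loading)


# Evaluate at three equally spaced (assumed) levels of risk aversion.
low, mid, high = (euler_log_expectation(a) for a in (5.0, 10.0, 15.0))
second_difference = low - 2.0 * mid + high
slope = (high - low) / 10.0
print("second difference in alpha:", second_difference)
print("slope in alpha:", slope)
```

In this scalar case the slope equals $\kappa_1 (\kappa_0 + f_\varepsilon)$, which is the analogue of the term multiplying $\alpha_t$ in the linearized Euler equation.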
Taking the expectation of the approximated Euler equation yields

$$0 = \log E_t \exp\left\{ m_0 + m_x \hat{x}_t + (\kappa_0 + \alpha_t \kappa_1) \sigma \varepsilon_{t+1} - \frac{1}{2} \alpha_t^2 \sigma^2 \kappa_1 \Sigma \kappa_1' + f_0 + f_x \hat{x}_t + f_{x'} \hat{x}_{t+1} + f_\varepsilon \sigma \varepsilon_{t+1} \right\}$$

$$0 = m_0 + m_x \hat{x}_t + f_0 + f_x \hat{x}_t + f_{x'} (C + \Phi \hat{x}_t) + \frac{1}{2} \sigma^2 (f_{x'} + f_\varepsilon) \Psi \Sigma \Psi' (f_{x'} + f_\varepsilon)' + \alpha_t \sigma^2 \kappa_1 \Sigma \Psi' (f_{x'} + f_\varepsilon)'$$

Since every equation in the system is now linear in the variables of the model, we can solve the system for the parameters $\Phi$ and $\Psi$ from (3.41). Specifically, we solve the following system,

$$0 = d_0 + d_x \hat{x}_t + d_{x'} \hat{x}_{t+1} + d_\varepsilon \sigma \varepsilon_{t+1} \tag{3.44}$$

$$0 = m_0 + m_x \hat{x}_t + f_0 + f_x \hat{x}_t + f_{x'} (C + \Phi \hat{x}_t) + \frac{1}{2} \sigma^2 (f_{x'} + f_\varepsilon) \Psi \Sigma \Psi' (f_{x'} + f_\varepsilon)' + \alpha_t \sigma^2 \kappa_1 \Sigma \Psi' (f_{x'} + f_\varepsilon)' \tag{3.45}$$

at the point $\sigma = 1$. The reason that the essentially affine SDF is useful is that the expectation in (3.45) will be linear in the state variables, so we have a simple linear system to solve. This system can be solved through, for example, Sims' (2001) Gensys algorithm.

Dew-Becker (2011b) shows that the transition function for the model obtained through the essentially affine method is first-order accurate for the true transition function and first-order equivalent to a first-order perturbation. Clearly, though, the approximation includes higher-order terms that account for movements in risk aversion. $\alpha_t$ will affect not only asset prices but also the dynamics of real variables. Dew-Becker (2011b) calibrates a simple version of the RBC model with time-varying risk aversion and finds that the essentially affine approximation has accuracy between that of second- and third-order perturbations. Standard results derived in the appendix also deliver real and nominal zero-coupon bond prices.

3.5 Empirics

I estimate the model using standard Bayesian methods. The observable data are the same as in JPT, but with bond prices added.
Both real variables and bond prices are linear functions of the underlying state variables contained in the vector $x_t$, so we can write the model in state-space form and measure the likelihood using the Kalman filter. I proceed by finding the posterior mode and running a Monte Carlo chain from that point to sample from the full posterior distribution. The appendix describes the details of the estimation.

3.5.1 Data

The sample is 1983q1 to 2004q4. I do not include the financial crisis in the sample because the zero lower bound on nominal interest rates becomes binding, a phenomenon that the model is not designed to capture. The sample begins in 1983 in order to ensure that monetary policy is consistent over the estimation period.

The observable variables are real GDP, consumption, and investment growth, hours worked per capita, wage and price inflation, and yields on three-month, 1, 2, 3, 5, and 10-year Treasury bonds. The 1 through 5-year yields are obtained from the Fama–Bliss CRSP files, the 10-year yield from Gurkaynak, Sack, and Wright (2006), and the three-month yield from the Fama risk-free rate CRSP file. The bond yields and inflation rates are always reported in annualized percentage points, unless otherwise noted.

The real variables are all obtained from the BEA and the BLS. Consumption is defined as expenditures on nondurables and services, while investment is the sum of residential and non-residential fixed investment and consumer durables expenditures. Real wages are calculated as nominal compensation per hour in the non-farm business sector (from the BLS) divided by the GDP deflator. The change in the log GDP deflator is the measure of inflation. Hours worked per capita in the non-farm business sector are obtained from Francis and Ramey (2009) as updated on Valerie Ramey's website. None of the variables are detrended.

Figure 3.1 plots the data used in the estimation (with the exception of the intermediate-term bond yields).
Output, consumption, and investment growth all look stationary over the sample and relatively homoskedastic. Hours worked per capita has a strong upward trend in this sample. Interest rates decline significantly over the sample, even though inflation only declines marginally. The short-term interest rate is substantially more volatile than the long-term rate, and the term spread is clearly countercyclical.

Figure 3.1: Data series for estimation. Panels (1983 to 2004): real GDP growth; real consumption growth; real investment growth; hours worked per capita; real wage growth; annualized inflation; 1-quarter interest rate, annualized; 10-year interest rate, annualized.
Note: No variables are detrended. GDP, consumption, and investment are obtained from the BEA. Compensation per hour and inflation are obtained from the BLS. Hours worked is obtained from Valerie Ramey's website. The one-quarter yield is the Fama risk-free rate. The ten-year yield is from Gurkaynak, Sack, and Wright (2006).

The model has 9 fundamental shocks, but we have 13 observable variables. I follow JPT in assuming that the 6 macro variables plus the short-term interest rate are observed without error. I also assume that the 10-year bond yield is measured without error, which will help identify the inflation target. For the remaining bonds, I assume that the yields have i.i.d. measurement errors with identical standard deviations. The standard deviation of these measurement errors is another parameter that will be estimated.
The assumption of zero measurement error for the long and short ends of the yield curve forces the model to focus on matching the term spread, while leaving some flexibility in matching curvature.

3.5.2 Priors

Table 3.1 lists the parameters and priors. For all of the parameters that I share with JPT, I choose the same priors. The remaining parameters are listed in the bottom section of the table. Many of them have uniform priors since I do not have strong a priori views about, for example, the fraction of the variance of the Federal Reserve's inflation target that is driven by shocks to government spending.

For the volatility of risk aversion, I choose a beta distribution over the ratio of the unconditional standard deviation of risk aversion to its mean. This means that average risk aversion is forced to be at least one standard deviation above zero. This prior could potentially be tightened to enforce a stronger restriction. As a practical matter, the data tend to push for a high volatility for risk aversion, and average risk aversion in the estimation simply rises high enough to accommodate the unconditional standard deviation.

I constrain the inflation target to follow nearly a random walk, with persistence $\rho_{\pi^*} = 0.99$, consistent with the idea that the target is highly persistent. The assumption that $\rho_{\pi^*} < 1$ ensures that inflation is stationary so that there is a steady state around which we can approximate. The priors over the shares of the variances of the inflation target and risk aversion coming from the other shocks are uniform.

3.5.3 Posterior modes

Table 3.1 lists the posterior modes for the parameters along with the 5th and 95th percentiles of the posterior distribution. Many of the posterior modes are reasonably close to the corresponding prior means. I focus mainly on those parameters that differ from the prior or are unique to this model.
Table 3.1: Priors and posterior modes

No. | Parameter | Description | Prior distribution | Prior mean | Prior std. dev. | Posterior mode | Posterior 5% | Posterior 95% | Estimate from JPT
1 | $\alpha$ | Capital share | Normal | 0.3 | 0.05 | 0.13 | 0.10 | 0.15 | 0.17
2 | $\iota_p$ | Price indexation | Beta | 0.5 | 0.15 | 0.39 | 0.21 | 0.56 | 0.24
3 | $\iota_w$ | Wage indexation | Beta | 0.5 | 0.15 | 0.77 | 0.66 | 0.88 | 0.11
4 | $100\gamma$ | Mean technology growth | Normal | 0.5 | 0.25 | 0.48 | 0.43 | 0.52 | 0.48
5 | $\eta$ | Habit parameter | Beta | 0.5 | 0.1 | 0.52 | 0.32 | 0.71 | 0.78
6 | $\lambda_p$ | Mean price markup | Normal | 0.15 | 0.05 | 0.10 | 0.02 | 0.18 | 0.23
7 | $\lambda_w$ | Mean wage markup | Normal | 0.15 | 0.05 | 0.15 | 0.07 | 0.24 | 0.15
8 | $L_{SS}$ | Mean log hours per capita | Normal | 6.7 | 0.2 | 6.75 | 6.71 | 6.79 | N/A
9 | $\pi_{SS}$ | Mean quarterly inflation | Normal | 0.5 | 0.1 | -0.20 | -0.90 | 0.74 | 0.71
10 | $100(\beta^{-1}-1)$ | Discount factor | Gamma | 0.25 | 0.1 | 0.28 | 0.21 | 0.47 | 0.13
11 | $\nu$ | Inverse Frisch elasticity | Gamma | 2 | 0.75 | 1.83 | 1.16 | 2.87 | 3.79
12 | $\xi_p$ | Price adjustment frequency | Beta | 0.66 | 0.1 | 0.67 | 0.60 | 0.71 | 0.84
13 | $\xi_w$ | Wage adjustment frequency | Beta | 0.66 | 0.1 | 0.67 | 0.57 | 0.72 | 0.7
14 | $\chi$ | Capital utilization costs | Gamma | 5 | 1 | 5.08 | 3.23 | 7.19 | 5.3
15 | $S''$ | Investment adjustment costs | Gamma | 4 | 1 | 4.96 | 3.16 | 6.71 | 2.85
16 | $f_\pi$ | Taylor rule inflation | Normal | 1.7 | 0.3 | 1.89 | 1.51 | 2.51 | 2.09
17 | $f_y$ | Taylor rule output gap | Gamma | 0.125 | 0.04 | 0.08 | 0.04 | 0.16 | 0.07
18 | $f_{\Delta y}$ | Taylor rule output gap growth | Normal | 0.125 | 0.05 | 0.25 | 0.21 | 0.30 | 0.24
19 | $\rho_R$ | Interest rate smoothing | Beta | 0.6 | 0.2 | 0.96 | 0.92 | 0.97 | 0.82
20 | $\rho_z$ | Technology shock AR | Beta | 0.6 | 0.2 | 0.20 | 0.09 | 0.31 | 0.23
21 | $\rho_g$ | Government spending AR | Beta | 0.6 | 0.2 | 0.99 | 0.99 | 0.99 | 0.99
22 | $\rho_\mu$ | Investment technology AR | Beta | 0.6 | 0.2 | 0.50 | 0.34 | 0.67 | 0.72
23 | $\rho_{\lambda_p}$ | Price markup AR | Beta | 0.6 | 0.2 | 0.95 | 0.92 | 0.97 | 0.94
24 | $\rho_{\lambda_w}$ | Wage markup AR | Beta | 0.6 | 0.2 | 0.99 | 0.99 | 0.99 | 0.97
25 | $\rho_b$ | Consumption demand shock AR | Beta | 0.6 | 0.2 | 0.77 | 0.70 | 0.80 | 0.67
26 | $\rho_{mp}$ | Monetary policy AR | Beta | 0.4 | 0.2 | 0.19 | 0.09 | 0.30 | 0.14
27 | $\theta_p$ | Price markup MA | Beta | 0.5 | 0.2 | 0.16 | 0.02 | 0.39 | 0.77
28 | $\theta_w$ | Wage markup MA | Beta | 0.5 | 0.2 | 0.98 | 0.97 | 0.98 | 0.91
29 | $\sigma_R$ | MP shock vol. | IG(1) | 0.1 | 1 | 0.14 | 0.12 | 0.17 | 0.22
30 | $\sigma_z$ | Neutral tech. shock vol. | IG(1) | 0.5 | 1 | 0.80 | 0.69 | 0.97 | 0.88
31 | $\sigma_g$ | Gov't spending vol. | IG(1) | 0.5 | 1 | 0.29 | 0.25 | 0.35 | 0.35
32 | $\sigma_\mu$ | Investment tech. vol. | IG(1) | 0.5 | 1 | 6.21 | 3.76 | 9.39 | 6.03
33 | $\sigma_{\lambda_p}$ | Price markup vol. | IG(1) | 0.1 | 1 | 0.12 | 0.10 | 0.18 | 0.14
34 | $\sigma_{\lambda_w}$ | Wage markup vol. | IG(1) | 0.1 | 1 | 0.35 | 0.30 | 0.43 | 0.2
35 | $\sigma_b$ | Demand shock vol. | IG(1) | 1 | 1 | 0.41 | 0.29 | 0.63 | 0.04
36 | $\sigma_{\pi^*}$ | Inflation target vol. | IG(1) | 0.1 | 0.1 | 0.33 | 0.28 | 0.41 | N/A
37 | $\sigma_{\pi^*,mp}$ | MP var. shr. in pi* | U[-1,1] | 0 | 0.58 | 0.39 | 0.19 | 0.48 | N/A
38 | $\sigma_{\pi^*,z}$ | z var. shr. in pi* | U[-1,1] | 0 | 0.58 | -0.16 | -0.26 | -0.12 | N/A
39 | $\sigma_{\pi^*,g}$ | g var. shr. in pi* | U[-1,1] | 0 | 0.58 | 0.00 | 0.00 | 0.01 | N/A
40 | $\sigma_{\pi^*,\mu}$ | mu var. shr. in pi* | U[-1,1] | 0 | 0.58 | 0.01 | 0.00 | 0.04 | N/A
41 | $\sigma_{\pi^*,\lambda_p}$ | Price shock var. shr. in pi* | U[-1,1] | 0 | 0.58 | -0.04 | -0.10 | -0.01 | N/A
42 | $\sigma_{\pi^*,\lambda_w}$ | Wage shock var. shr. in pi* | U[-1,1] | 0 | 0.58 | -0.06 | -0.12 | -0.01 | N/A
43 | $\sigma_{\pi^*,b}$ | b var. shr. in pi* | U[-1,1] | 0 | 0.58 | 0.28 | 0.18 | 0.40 | N/A
44 | $\sigma_\alpha/\alpha_{SS}$ | RRA volatility/RRA mean | Beta | 0.5 | 0.2 | 0.95 | 0.83 | 0.99 | N/A
45 | $\sigma_{\alpha,z}$ | z var. shr. in RRA | U[-1,1] | 0 | 0.58 | 0.00 | -0.03 | 0.05 | N/A
46 | $\rho_\alpha$ | RRA persistence | U[-1,1] | 0.5 | 0.29 | 0.77 | 0.72 | 0.83 | N/A
47 | $\rho$ | Inverse EIS | U[-1,1] | 0.5 | 0.29 | 0.76 | 0.51 | 0.90 | 1
48 | $\alpha_{SS}$ | Mean risk aversion | Normal | 15 | 5 | 18.70 | 13.42 | 26.57 | 1
49 | $\sigma_{yields}$ | Bond measurement errors (bp) | IG(1) | 13 | 33 | 8.20 | 7.66 | 8.96 | N/A

Note: Priors, posterior mode, and percentiles of the posterior distribution from the benchmark model. The far-right column reports the parameters from JPT, where applicable.

The prior for the variance of the innovations to the inflation target favors a reasonably low standard deviation, but the posterior seems to want a highly volatile target: the estimated standard deviation of the innovations to the annualized inflation target is 1.3 percent. This helps the model capture the observed volatility of the level factor in bond yields, but it is implausibly high.

The shock to the level of labor-neutral technology has an important effect on the inflation target, accounting for 16 percent of the variance of its innovations.
Following a positive innovation to technology, the central bank is estimated to lower its inflation target, consistent with the idea that following beneficial supply shocks that drive inflation downward, the central bank takes the opportunity to drive inflation lower persistently (e.g. Gurkaynak, Sack, and Swanson, 2005). This mechanism will turn out to be critical to the main results.

The labor-neutral technology shock has a standard deviation of 0.72 and an autocorrelation of 0.22. The permanent component of the technology process (the Beveridge–Nelson trend) thus has a standard deviation of 1.00, which is similar to the values often calibrated in the production-based asset pricing literature (e.g. Gourio, 2010, and Dew-Becker, 2011a). The estimated long-run variance of technology growth is far smaller than the value calibrated in the long-run risks literature (e.g. Bansal and Yaron, 2004, and Kaltenbrunner and Lochstoer, 2010), but it is consistent with estimates obtained in JPT and SW and with simple univariate estimates from consumption and output data.

The standard deviation of the investment technology shock is far larger than the prior, which is also consistent with JPT, showing that innovations that are isolated to the investment sector may play a large role in fluctuations. Alternatively, it could mean that the model simply matches investment poorly.

The estimates imply that there is essentially zero correlation between innovations to technology and innovations to risk aversion. This runs against the theory from Dew-Becker (2011a), and implies that the price of risk in bond markets is driven by some factor other than permanent innovations to household consumption. Intuitively, part of the source of this result is that the price of risk is measured to be high in recessions, but over the 1983–2004 period, productivity growth has been only weakly procyclical, and in fact rose substantially in the 2000 recession.
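As a rough check on the Beveridge–Nelson calculation above, the sketch below assumes that technology growth follows a simple AR(1), so that a unit innovation raises the long-run level of technology by 1/(1 − ρ). The function name `bn_trend_std` is hypothetical and the AR(1) form is an assumption, not a statement of the model's full specification. Under that assumption, the posterior modes listed for σ_z and ρ_z in table 3.1 (0.80 and 0.20) give a trend standard deviation of 1.00, while the rounded values quoted in the text give a value somewhat below that.

```python
def bn_trend_std(sigma: float, rho: float) -> float:
    """Std. dev. of the Beveridge-Nelson trend innovation for AR(1) growth.

    Assumes (for illustration only) that growth follows
        g_t = mu + rho * (g_{t-1} - mu) + eps_t,  std(eps_t) = sigma,
    so a unit shock shifts the long-run level by 1 / (1 - rho).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    return sigma / (1.0 - rho)


# Values quoted in the text for the labor-neutral technology shock.
print(round(bn_trend_std(0.72, 0.22), 2))
# Posterior modes for sigma_z and rho_z as listed in table 3.1.
print(round(bn_trend_std(0.80, 0.20), 2))
```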
As in JPT, the government spending shock is estimated to follow nearly a unit root, explaining the trend in the consumption-output ratio over the sample. The wage markup shock also follows nearly a unit root, which helps capture the strong trend in hours worked per capita seen in figure 3.1.

In general, my parameter estimates are reassuringly similar to the values from JPT reported in the far-right column of table 3.1, even though I use a different sample period (post-1983 versus post-war) and extra data on bond yields. The main place where my estimates seem to differ from JPT is in price and wage determination. My estimates imply that wages are strongly indexed to inflation, whereas JPT estimate little indexation. The wage-markup shock is also substantially more volatile under my estimates than under JPT's. Interestingly, when bond prices are dropped from the estimation, I obtain values for wage indexation and the volatility of the wage markup shock that are much closer to JPT. This suggests that wage dynamics are important for matching the term structure. I also estimate a smaller inverse Frisch elasticity and a lower average price markup. In both cases, my estimates bring the model closer to the priors and micro estimates.

3.6 Asset pricing

This section studies the asset-pricing implications of the model. I first analyze the fit of the model to the term structure and show that it is competitive with a non-structural model. Next, I decompose the variance of the SDF to understand the source of the positive term premium in the model. I then analyze the prices of other assets, including the aggregate capital stock and a claim to aggregate profits.

3.6.1 Bond prices

3.6.1.1 Fitted yields

Figure 3.2 plots the deviations of the fitted yields from their actual values for the five yields that are assumed to be measured with error (reported in annualized basis points).
The estimated standard deviation of these fitting errors is 8 basis points, which is economically small compared to the overall variation of the yields, which is on the order of hundreds of basis points. The errors are all centered around zero, meaning that the model can capture the shape of the term structure on average. The volatility of the errors looks somewhat higher for the 1- and 5-year yields and in the earlier part of the sample. There is clearly some autocorrelation in the errors; the fitted value for the 3-year yield is consistently too high in the first half of the sample, and the 4-year fitted yield is consistently too low in the second half, for example. There is also some cross-correlation in the errors; the first principal component explains 37 percent of the total variance of the errors (twice what it would if the errors were orthogonal). These are thus clearly not classical (i.i.d.) measurement errors, but their small mean and volatility show that the model does a reasonable job of fitting the data, and they are not disturbingly far from white noise.

While there are nine unobservable shock processes that can help us match the data, the model is asked to fit 13 data series, so obtaining a good fit for the bond yields is not trivial. Loosely, we have 6 macro variables that identify 6 shock processes, plus three extra processes (the monetary policy shock, the inflation target, and risk aversion) that can be used to fit the bond yields. The degrees of freedom here are thus comparable to a non-structural bond-pricing model with three unobservable factors, but we also have numerous constraints on dynamics and risk prices. Table 3.2 lists the standard deviation of the yield errors in basis points obtained from regressing the bond yields on their first three principal components. The standard deviations are all between 4 and 8 basis points.
I force the structural model to match the 1-quarter and 10-year yields exactly, and the remaining yields have errors with standard deviations of 8 basis points. The fit of the model to the yields is thus comparable to a non-structural model with three unobservable factors that have completely unrestricted dynamics. The third and fourth rows of table 3.2 report the measurement errors in constrained models that assume constant relative risk aversion and a constant inflation target (the other parameters are reestimated). In both cases, the measurement errors have standard deviations roughly three times larger than in the benchmark model, giving a measure of the improvement in fit generated by the benchmark model.

Figure 3.2 and table 3.2 show that the model is able to provide a very close fit to bond yields in the data. The quality of the fit is essentially identical to that of a purely non-structural model.

Figure 3.2: Bond yield errors
[Five panels, one each for the 1-year, 2-year, 3-year, 4-year, and 5-year yields. Horizontal axis: date; vertical axis: yield errors (basis points).]
Note: each axis plots the measurement errors in basis points for one of the bond yields. Errors are measured from the Kalman-filtered estimates at the posterior mode.

Table 3.2: Fitting Errors

| | 1-quarter | 1-year | 2-year | 3-year | 4-year | 5-year | 10-year |
|---|---|---|---|---|---|---|---|
| PCA | 4.54 | 7.32 | 6.29 | 4.26 | 6.20 | 7.98 | 6.53 |
| Benchmark model | 0 | 8.12 | 8.12 | 8.12 | 8.12 | 8.12 | 0 |
| Constant RRA | 0 | 23.30 | 23.30 | 23.30 | 23.30 | 23.30 | 0 |
| Constant π* | 0 | 23.87 | 23.87 | 23.87 | 23.87 | 23.87 | 0 |

Note: Fitting errors measured in annualized basis points. The model-based estimates use the posterior modal estimate for the standard deviation. The 1-quarter and 10-year errors are constrained to equal zero in the structural model. The errors from PCA are the standard deviations of the residuals from regressions of the bond yields on their first three principal components.
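The non-structural benchmark described above (regressing each yield on the first three principal components of the yield panel and measuring the residual volatility) can be sketched as follows. This is only an illustration under stated assumptions: the function names `pca_fitting_errors` and `first_pc_share` are hypothetical, the data are simulated rather than the yields used in the estimation, and the principal components are taken from the demeaned yield levels.

```python
import numpy as np


def pca_fitting_errors(yields: np.ndarray, n_factors: int = 3) -> np.ndarray:
    """Residuals from regressing each yield on the first n_factors principal components.

    `yields` is a (T x N) panel. Because the principal components are mean zero
    and mutually orthogonal, the regression fit equals the rank-n_factors
    projection of the demeaned panel.
    """
    demeaned = yields - yields.mean(axis=0)
    u, s, vt = np.linalg.svd(demeaned, full_matrices=False)
    fitted = (u[:, :n_factors] * s[:n_factors]) @ vt[:n_factors]
    return demeaned - fitted


def first_pc_share(errors: np.ndarray) -> float:
    """Share of the total variance of the errors explained by their first principal component."""
    s = np.linalg.svd(errors - errors.mean(axis=0), compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


# Simulated example (hypothetical data): three common factors plus small noise.
rng = np.random.default_rng(0)
factors = rng.standard_normal((88, 3))
loadings = rng.standard_normal((3, 7))
panel = factors @ loadings + 0.05 * rng.standard_normal((88, 7))

errors = pca_fitting_errors(panel, n_factors=3)
print(errors.std(axis=0))      # residual standard deviation for each maturity
print(first_pc_share(errors))  # cross-correlation summary for the residuals
```

The same `first_pc_share` calculation applied to model fitting errors gives the kind of cross-correlation summary quoted in the text; with mutually orthogonal errors of equal variance, the share would be close to one over the number of yields.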
3.6.1.2 Steady-state yields

Another way to evaluate the fit of the model is to ask whether the steady state of the model matches the average term structure in the data. Looking at the steady state keeps the Kalman filter from using large deviations in the unobservable state variables to fit the term structure. Figure 3.3 plots the average term structure in the sample along with its model-implied steady state. The solid black line gives the steady-state term structure in the model, renormalized so that the ten-year yield matches the empirical ten-year yield.[7] To capture the uncertainty in the empirical term structure, the grey area gives the 95-percent confidence intervals for the means of the empirical yields relative to the ten-year yield (i.e. the confidence intervals for the spreads; the intervals are calculated using the Newey–West method with a 6-quarter lag window).

What figure 3.3 shows is that the model matches the spread between the 10- and 2-year yields, but it does not match the curvature of the term structure below two years. However, all of the model-implied yields are within the 95-percent confidence intervals. One potential explanation at the very short end of the yield curve is that there is a small liquidity premium that the model is not incorporating.

We will see that two features of the model are critical for generating the large steady-state term premium: first, following a positive shock to technology, the Fed’s inflation target falls; second, variation in risk aversion raises the premia on risky assets. To see the prima facie evidence that these two effects are key, figure 3.3 includes two lines giving the steady-state term structure in constrained models. The first line assumes that innovations to the inflation target are uncorrelated with the permanent technology shock, while the second line assumes that risk aversion is fixed. Neither line reestimates the other parameters, so they simply isolate the effects of those two features of the model.
The line exiting the top of the chart is for the model when shocks to technology are assumed to have no impact on the inflation target. We then obtain the usual result that the term structure is downward-sloping, and the steady-state term spread is -261 basis points.

Time-varying risk aversion also turns out to be important, though. When risk aversion is fixed, the term structure is still upward-sloping, but the spread is quantitatively small: only 63 basis points in steady state, compared to 207 in the data and 152 in the benchmark model.

Figure 3.3: Steady-state nominal bond yields
[Horizontal axis: maturity (quarters); vertical axis: annualized nominal yield (percentage points). Series: steady-state yields (spread=1.52); steady-state yields, constant risk aversion (spread=0.63); steady-state yields, π* indep. of z (spread=-2.61); average empirical yields (spread=2.07); empirical 95% confidence band.]
Note: The solid black line gives the yield curve at the model's steady state; the grey lines are for the model with constant risk aversion and where the inflation target is unaffected by shocks to labor-neutral technology. The other parameters are not reestimated. All the lines are normalized to match the 10-year yield exactly, so the plot measures steady-state spreads. Boxes are average sample yields. The grey area is the 95% confidence band for the average yields relative to the 10-year yield, calculated using the Newey–West method with 6 lags.

[7] I use this normalization because the estimated inflation target is above zero through most of the sample. The unconditional variance of the inflation target is sufficiently high that its average level is not well identified.
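The confidence band for the average spreads relies on a Newey–West correction for serial correlation. A minimal sketch of that calculation for the mean of a single spread series is given below. It assumes the common Bartlett-kernel form with weights 1 − k/(L + 1) and reads the 6-quarter lag window as L = 6; the function name `newey_west_mean_ci`, the normal critical value, and the simulated series are all illustrative assumptions rather than the exact procedure used for figure 3.3.

```python
import numpy as np


def newey_west_mean_ci(x, lags=6, z=1.96):
    """Confidence interval for the mean of x using a Newey-West long-run variance.

    Uses Bartlett weights w_k = 1 - k / (lags + 1), which keep the estimated
    long-run variance non-negative.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    # Long-run variance: gamma_0 + 2 * sum_k w_k * gamma_k.
    lrv = dev @ dev / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        lrv += 2.0 * weight * (dev[k:] @ dev[:-k]) / n
    se = np.sqrt(lrv / n)
    return x.mean() - z * se, x.mean() + z * se


# Hypothetical persistent spread series (percentage points), for illustration only.
rng = np.random.default_rng(0)
spread = np.empty(88)
spread[0] = 1.5
for t in range(1, spread.size):
    spread[t] = 1.5 + 0.8 * (spread[t - 1] - 1.5) + 0.3 * rng.standard_normal()

print(newey_west_mean_ci(spread, lags=6))
```

With positively autocorrelated spreads, the interval from this calculation is wider than the one implied by the usual i.i.d. standard error, which is why the correction matters for judging whether model-implied spreads fall inside the band.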
3.6.1.3 Term premia

The size of the steady-state term spread in the model can be interpreted as the average term premium: it is the excess return (in logs) that an investor earns in expectation by buying a long-term bond and holding it to maturity instead of buying short-term bonds and rolling them over for the same amount of time. An important feature of this model is that risk aversion varies over time, which should make the term premium also vary over time. The top panel of figure 3.4 plots the expected annualized excess return on holding a ten-year nominal bond (over a one-quarter bond) from the benchmark model against the expected excess return from a regression of bond returns on the Cochrane–Piazzesi (CP) factor. Cochrane and Piazzesi (2005) argue that a tent-shaped factor in forward yields summarizes the price of risk in the term structure, so their factor can be viewed as a simple non-structural benchmark for return forecasting. The structural forecast is highly correlated (34 percent) with the fitted value using the CP factor, and its standard deviation is roughly 20 percent larger. The two series rise by similar amounts in the two recessions in the sample, but the benchmark model also implies that the term premium rose in 1988 and 1999, whereas the CP factor is stable in those episodes.

The bottom panel of figure 3.4 plots the term premium against the term spread. The term premium is defined as the spread between the 10-year yield and the average of the expected 1-quarter yields over the life of a 10-year bond. The variance of the term premium is non-trivial in comparison to the term spread. In the two recessions in the sample, the increases in the term spread are substantially larger than the movements in the term premium, but the term premium does rise in both episodes.
Interestingly, the movements in the term spread outside of the two recessions seem almost entirely driven by movements in the term premium. In particular, the rises in the term spread in 1984, 1985, 1987, 1996, and 1999 are all associated with increases in the term premium of equal magnitudes. On the other hand, the inversions of the yield curve in 1989 and 2000, both just prior to recessions, are associated with only minor declines in risk aversion, and the subsequent rises in the term spread with similarly small rises in risk aversion.

Figure 3.4: Expected returns and the term premium
[Top panel: annualized expected return on a 10-year nominal bond, 1983 to 2003; series: benchmark model and Cochrane–Piazzesi. Bottom panel: 10-year/1-quarter term spread and term premium, 1983 to 2003.]
Note: Top panel gives expected excess returns on a 10-year bond over the following quarter, annualized. Values for the Cochrane–Piazzesi factor are from a linear regression. The term premium in the bottom panel is defined as the gap between the 10-year nominal yield and the mean of expected 1-quarter yields over the following 10 years.

3.6.1.4 Variation in interest rates

To show how the bond yields respond to the various shocks, figure 3.5 plots responses of a level, slope, and curvature factor to the 9 fundamental shocks. Following Bekaert, Cho, and Moreno (2010), the level factor is defined as the average of the 1-quarter, 5-year, and 10-year yields; the slope factor is the 10-year/1-quarter term spread; and curvature is the sum of the 5- and 1-year yields minus twice the 3-year yield. The shocks are orthogonalized in the sense that the interactions between the inflation target and risk aversion and the other shocks are switched off. So figure 3.5 shows, for example, the effect of a pure increase in the level of technology, holding the inflation target fixed. The shocks are all unit standard deviations.
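The three factor definitions above translate directly into arithmetic on the yields. The short sketch below writes them out; the function name `term_structure_factors` and the example yields are hypothetical and are not taken from the data used in the estimation.

```python
def term_structure_factors(y_1q, y_1y, y_3y, y_5y, y_10y):
    """Level, slope, and curvature factors as defined in the text.

    level     = average of the 1-quarter, 5-year, and 10-year yields
    slope     = 10-year yield minus 1-quarter yield
    curvature = 5-year yield plus 1-year yield minus twice the 3-year yield
    Inputs may be floats or NumPy arrays of equal length.
    """
    level = (y_1q + y_5y + y_10y) / 3.0
    slope = y_10y - y_1q
    curvature = y_5y + y_1y - 2.0 * y_3y
    return level, slope, curvature


# Hypothetical upward-sloping, concave yield curve (annualized percentage points).
level, slope, curvature = term_structure_factors(4.0, 4.5, 5.2, 5.6, 6.1)
print(level, slope, curvature)
```

For a concave yield curve like the hypothetical one above, the 3-year yield lies above the average of the 1- and 5-year yields, so curvature as defined here is negative.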
For the level factor, a number of shocks, the monetary policy and time preference shocks in particular, have important effects at high frequencies. The low-frequency movements, as we would expect, are mainly driven by shifts in the inflation target, while risk aversion also plays a role. Somewhat surprisingly, positive monetary policy shocks, which raise the short-term interest rate above its Taylor-rule value, are actually associated with declines in the level factor. The reason is that these shocks drive down expected inflation. So a positive monetary policy shock drives the real interest rate up, but nominal interest rates actually fall. The response to the time-preference shock is more intuitive: an increase in $B_t$ is analogous to an increase in patience, so interest rates fall.

The determinants of the slope factor are similar to those for the level factor: monetary policy and time-preference matter at high frequencies, while risk aversion determines the dynamics at lower frequencies. An increase in risk aversion increases the term spread, which fits with results on bond return forecasting (Campbell and Shiller, 1988) and the fact that the term spread forecasts high equity returns (Fama and French, 1989).

Figure 3.5: Responses of term structure factors to orthogonalized shocks
[Grid of impulse-response panels. Columns: Monetary Pol., Neutral Tech., Gov't Spending, Investment Tech., Prices, Wages, Time Preference, Inflation Target, Risk aversion. Rows: Level factor, Slope factor, Curvature factor. Horizontal axis: quarters after the shock.]
Note: responses of each of the term structure factors to the orthogonal structural shocks. Specifically, risk aversion and the inflation target are only affected by their own shocks, not the shocks to the other exogenous processes. The level factor is the average of the 1-quarter, 5-year, and 10-year yields. The slope factor is the gap between the 10-year and 1-quarter yields. Curvature is the sum of the 5-year and 1-year yields minus twice the 3-year yield. The shocks are all unit standard deviations. All scales in each row are identical and are measured in annualized percentage points.

3.6.2 Determinants of asset prices

3.6.2.1 The variance of the SDF

An asset’s expected excess return over the real riskless interest rate is determined by its covariance with the stochastic discount factor. One of the more interesting outputs of a model as rich as this one is the variance decomposition for the SDF. Table 3.3 reports a variance decomposition for the SDF at the one-quarter horizon. The variance of the SDF is essentially entirely driven by the neutral technology and risk aversion shocks. The bar chart in the bottom panel of table 3.3 decomposes the variance of the SDF into components coming from the neutral technology shock, the risk aversion shock, and the remaining shocks combined. The lines at the top of each bar give the 2.5 and 97.5 percentiles of the posterior distribution. The 97.5 percentile for the variance share in the SDF for non-technology and non-risk-aversion shocks is less than 2 percent.

At first glance this result might be somewhat surprising, but it is in fact a deep characteristic of models with Epstein–Zin preferences with a high EIS and high risk aversion. One way to see the source of this finding is to simply look at the household’s SDF,

\[
M_{t+1} = \beta B_t \, \frac{1-\beta B_{t+1}}{1-\beta B_t} \, \frac{U_{C,t+1}}{U_{C,t}} \, \frac{V_{t+1}^{\rho-\alpha_t}}{\left(E_t\left[V_{t+1}^{1-\alpha_t}\right]\right)^{\frac{\rho-\alpha_t}{1-\alpha_t}}} \tag{3.46}
\]

For a household with a large EIS, the variance of $U_{C,t+1}/U_{C,t}$ is generally small (at least with standard preferences).
In the case where the household does not have a habit ($\eta = 0$), this term is equal to $(C_{t+1}/C_t)^{-\rho}$. If the household has an EIS greater than 1, then $\rho$ is less than 1 and the variance of $U_{C,t+1}/U_{C,t}$ will be less than the variance of log consumption growth. A one-percent permanent decline in consumption will raise this term by the factor $1.01^{\rho}$. The majority of the variance of the SDF is driven by the term $V_{t+1}^{\rho-\alpha_t} / \left(E_t\left[V_{t+1}^{1-\alpha_t}\right]\right)^{\frac{\rho-\alpha_t}{1-\alpha_t}}$. Here, a one-percent permanent decline in consumption will make this term (approximately) equal to $1.01^{\alpha_t-\rho}$. When $\alpha_t \gg \rho$, as estimated here and in most models with Epstein–Zin preferences, it is therefore the variation in $V_{t+1}$ that determines the behavior of the SDF.

So what determines movements in $V_{t+1}$? One way to think about it is to split movements in consumption into permanent and temporary components. A purely temporary shock to consumption will have a relatively small impact on $V_{t+1}$ because the household is not very averse to shifting consumption over time. A permanent shock to consumption, on the other hand, will tend to shift $V_{t+1}$ by an equal amount. Risk aversion shocks also affect $V_t$. The reason is simply that increases in risk aversion directly drive $V_{t+1}$ down because households are more averse to the future uncertainty that they face. Since it is the shocks to $V_{t+1}$ that determine the movements in $M_{t+1}$, we should thus not be surprised that it is mainly the neutral technology and risk aversion shocks that drive the variance of the SDF.

Table 3.3: One-quarter ahead variance decompositions

| Row | Shock / moment | (1) SDF | (2) Utility return | (3) Cons. return | (4) Nom. lev. cons. ret. | (5) Real lev. cons. ret. | (6) 10-year bond return | (7) Output growth | (8) Consumption growth | (9) Investment growth |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Monetary policy | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.34 | 0.00 | 0.00 | 0.00 |
| 2 | Neutral tech. | 0.57 | 0.00 | 0.00 | 0.01 | 0.15 | 0.14 | 0.00 | 0.03 | 0.03 |
| 3 | Gov't spending | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.09 | 0.07 | 0.00 |
| 4 | Investment tech. | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.31 | 0.01 | 0.92 |
| 5 | Price markup | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06 | 0.04 | 0.02 | 0.04 |
| 6 | Wage markup | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.03 | 0.00 | 0.01 | 0.01 |
| 7 | Time preference | 0.01 | 0.94 | 0.94 | 0.91 | 0.36 | 0.17 | 0.03 | 0.04 | 0.00 |
| 8 | Inflation target | 0.00 | 0.05 | 0.05 | 0.07 | 0.02 | 0.03 | 0.52 | 0.83 | 0.00 |
| 9 | Risk aversion | 0.41 | 0.00 | 0.01 | 0.01 | 0.46 | 0.23 | 0.00 | 0.00 | 0.00 |
| | Moments: | | | | | | | | | |
| 10 | Standard deviation | 0.47 | 4.06 | 4.06 | 8.67 | 6.54 | 19.65 | 1.19 | 1.07 | 1.91 |
| 11 | Correl. w/ SDF | 1.00 | -0.14 | -0.17 | -0.09 | -0.80 | -0.53 | 0.01 | 0.04 | -0.12 |
| 12 | Expected return | N/A | 0.27 | 0.33 | 0.36 | 2.47 | 4.90 | N/A | N/A | N/A |

Variance decompositions and 95% credible intervals
[Bar chart: for each series (SDF, utility return, cons. return, levered cons. ret., LR cons. ret., 10-year bond return, output growth, consumption growth, investment growth), the variance shares due to neutral technology, risk aversion, and all other shocks. Vertical axis: 0.00 to 1.00.]
Note: decompositions of one-quarter ahead forecast error. Levered returns for capital and consumption claims assume that the investor finances half the purchase price of the given claim with a 10-year nominally riskless bond. The 10-year return is the one-quarter return from holding a 10-year nominally riskless bond. The moments in rows 10–12 are annualized. The black bars in the figure give the 95 percent credible region based on random draws from the posterior density.

3.6.2.2 Impulse responses

Since the technology and risk aversion shocks are the key to understanding the pricing kernel, it is natural to ask how they affect the economy. Figure 3.6 plots impulse responses to an increase in labor-neutral technology and a decrease in risk aversion. Dotted lines give 95-percent credible intervals (the range between the 2.5 and 97.5 percentiles in the posterior distribution). These impulse responses are different from those in figure 3.5 because they do not turn off the interactions between the inflation target and risk aversion and the other shocks.
The idea is that we want to see what happens on average following these two shocks, since that behavior is what is relevant for understanding the correlations with the SDF.

Figure 3.6: Responses to technology and risk-aversion shocks
[Six panels: nominal risk-free rate, real risk-free rate, inflation, inflation target, output, and output gap. Each panel shows the response to an increase in labor-neutral technology and to a decline in risk aversion over 20 quarters, with 95-percent credible intervals.]
Note: responses in percentage points to a unit standard deviation positive shock to labor-neutral technology and a negative shock to risk aversion. Interest rate, inflation, and inflation target are annualized. Dotted lines give 2.5 and 97.5 percentiles from the posterior distribution.

Following the technology shock, inflation falls, while output and real interest rates rise: a standard positive supply shock. The declines in inflation and nominal interest rates are especially pronounced and persistent. This behavior is the result of the fact that the estimates imply that the inflation target falls following an increase in technology. However, note that inflation falls more than the inflation target does, so the effect is not entirely driven by the inflation target. A similar result is obtained in JPT and SW, even though they do not have time-varying inflation targets. In all of these models, because prices are sticky, when there is a positive supply shock, rather than cutting prices, firms simply produce the same quantity as previously, thereby reducing employment and hence demand.
Positive technology shocks are thus associated with small increases in output and large declines in the output gap (defined as the difference between output and the level it would take if prices were flexible). There is also a small empirical literature that provides more reduced-form evidence on effects of this sort using direct measures of technology (e.g. Basu, Fernald, and Kimball, 2006). The slow response of output to a technology shock is particularly notable. The impulse response function in figure 3.6 suggests that output and perhaps also consumption growth should be predictable and positively autocorrelated, though other shocks could obscure those relationships. Figure 3.7 therefore plots the empirical and model-predicted autocorrelation functions for consumption and output growth. For output, the model implies that the 1-quarter autocorrelation should be 0.50, while it is only 0.33 in the data. But 0.50 is well within the 95 percent confidence interval for the empirical value. In fact, nearly the entire autocorrelation function for output in the model is captured within the 95 percent confidence interval in the data. The model also implies strong one-quarter autocorrelation in consumption growth, and the autocorrelation is in fact higher than the upper end of the 95 percent confidence interval. However, this autocorrelation dies out very rapidly, and at lags longer than one quarter the autocorrelation of consumption growth in the model matches the behavior in the data well. The model implies little or no serial correlation in consumption and output growth at horizons longer than two quarters. Figure 3.6 also reports impulse responses for a decline in risk aversion. Inflation, interest rates, and the output gap all rise. This shock therefore takes the form of a classic demand shock. The effects are far smaller than those for the technology shock. Risk aversion has two main channels through which it affects the real economy. 
First, to the extent that physical investment is risky, a decline in risk aversion makes households more willing to purchase physical capital. Second, a decline in precautionary saving demand makes households want to consume more for any given level of interest rates. For this model, the increase in consumption demand dominates, which is why the shock is slightly expansionary.

Figure 3.7: Empirical and model-implied autocorrelations
[Two panels: output growth and consumption growth. Each panel shows the empirical and model-implied autocorrelation functions at lags of 1 to 36 quarters, with the empirical 95% confidence band.]
Notes: Empirical and model-implied autocorrelation functions. Gray regions are 95-percent confidence intervals.

3.6.2.3 Other asset prices

Variance decompositions

After the SDF, table 3.3 reports variance decompositions for the returns on a number of assets. The bottom panel of table 3.3 reports the fraction of the variance of the one-quarter innovation to each return coming from the neutral technology shock, the risk aversion shock, and all other shocks combined.

Column 2 reports the variance decomposition for the return on the utility portfolio. 87 percent of its variance comes from the time-preference shock. The reason is simply that the utility claim has a relatively long duration, like that of a consol with a coupon that grows at the average rate of the economy, so shifts in real interest rates have a large effect on its price. The time-preference shock mainly affects real interest rates, so it drives the variance of the utility claim. Row 11 shows that the correlation of the utility return with the SDF is -0.14.

More interestingly, the third and fourth columns of table 3.3 report variance decompositions for a claim on aggregate profits and the same claim levered two to one on short-term nominal debt.[8] Once again, little of the variance of the return is driven by the technology or risk aversion shocks.
While the neutral technology shock does play a role, it is relatively small; the vast majority of the variation is driven by shifts in the rate of time preference due to its effects on real interest rates. Assuming that equity is levered on nominal riskless debt does not change this result because the returns on nominal bonds are generally unaffected by the time-preference shock. However, column 5 shows that if equity claims are levered on real debt, then they become highly correlated with the pricing kernel, and the model can actually generate a large equity premium (though one that is still too small by half). The reason is that when the firm is levered on real bonds, the effects of time-preference shocks on the unlevered dividend claim and real bond prices cancel each other out.

[8] The profits are those earned by the intermediate-good producers, $Y_t - W_t N_t$.

Zero-coupon claims

Lettau and Wachter (2007, 2011) argue that the value premium can be explained if value stocks have relatively short durations and the term structure of zero-coupon consumption (or dividend) claims is downward-sloping. Figure 3.8 plots the steady-state term structures for zero-coupon nominal bonds, inflation-indexed bonds, and consumption claims (with the yields normalized to zero at the 1-quarter horizon). For the zero-coupon consumption claims, figure 3.8 plots the steady-state values of $-\log\left(P^C_{j,t}/E_t C_{t+j}\right)/j$, where $P^C_{j,t}$ is the price on date $t$ of an asset that pays one unit of consumption on date $t+j$. $-\log\left(P^C_{j,t}/E_t C_{t+j}\right)/j$ is the average per-period discount rate applied to assets that pay a unit of consumption at date $t+j$.

To understand the results in figure 3.8, first note that the real term structure is downward-sloping for the usual reason: a positive technology shock (which drives the majority of the variance of the SDF) drives the marginal product of capital upward and raises real interest rates.
So the prices of real bonds are low in good times, which induces a downward-sloping real term structure. This effect will also apply to the consumption claims since they are real claims and are discounted (partly) with real interest rates. However, there is a second effect: positive technology shocks raise the expected level of consumption in the future, which means that the consumption claims have a high payoff in good times. In the present setting, the latter effect is slightly stronger, inducing the small upward slope in the term structure for consumption claims.

Figure 3.8: Steady-state term structures. Series: nominal bonds, real bonds, consumption claims.
Notes: term structures for nominal and real zero-coupon bonds and zero-coupon consumption claims. Horizons are measured in quarters.

3.6.2.4 The Hansen–Jagannathan bound

Figure 3.9 plots the fitted quarterly Hansen–Jagannathan (HJ) bound against the nominal term spread. The estimated steady-state level of the HJ bound, 0.24, is almost identical to the observed Sharpe ratio on the aggregate stock market in this sample, 0.26. This is notable given that equities were not included in the estimation and nominal bonds do not achieve the HJ bound. Perhaps more importantly, though, the price of risk in this model is highly volatile. The estimated standard deviation of the Hansen–Jagannathan bound is 95 percent of its mean. The level of variability here is somewhat higher than, but still similar to, that used in Dew-Becker (2011a) to match the degree of predictability observed for aggregate stock returns in the post-war sample.

3.7 The real economy

Up to now, the analysis has focused mainly on asset pricing. But the model gives a rich description of the real side of the economy.
While I leave a deeper analysis of New Keynesian models to papers focused on those models for their own sake, the interaction of the real side of the economy with asset prices is important to this paper. Figure 3.10 gives a variance decomposition for the variables used in the estimation. The figure decomposes the variance of each variable at frequencies of 6 to 32 quarters into components coming from each of the structural shocks.9 Except for investment growth, for which the investment-specific shock is completely dominant, none of the variables examined in figure 3.10 is dominated by any particular shock. Notably, the shock to risk aversion has almost no effect on the variance of any of the real variables at business-cycle frequencies. Its largest effect is on consumption growth, for which the variance share is still only 4 percent. The far-right bar, though, shows that risk aversion has a large effect on the term spread, as we saw in figure 3.5; it explains roughly 1/3 of the variance of the term spread at business-cycle frequencies.

Figure 3.9: The Hansen–Jagannathan bound and the term spread. Series: Hansen–Jagannathan bound, term spread (1983 to 2003).

Figure 3.10: Variance decompositions at business-cycle frequencies. Panels: benchmark model; JPT model. Shocks in the benchmark model: risk aversion, inflation target, time preference, wage markup, price markup, investment technology, government spending, neutral technology, monetary policy. Shocks in the JPT model: time preference, wage markup, price markup, investment technology, government spending, neutral technology, monetary policy.
Note: each section of a bar represents a fraction of the variance of one of the series at frequencies between 6 and 32 quarters. The top panel gives results for the benchmark model. The bottom panel reports results where the inflation target and risk aversion are fixed and bond prices are dropped from the estimation. The latter model is identical to JPT except that it uses post-1983 data and uses Epstein–Zin preferences.

The variance decomposition reported in the top panel of figure 3.10 is rather different from that reported by JPT. They found that the investment shock was an important determinant of not only investment, but also output and consumption growth at business-cycle frequencies. Their model differs from mine in three ways: risk aversion and the inflation target are constant, the preference specification is slightly different (log utility and additive habits), and the data sample covers the entire post-war period and only includes the one-quarter nominal interest rate. The bottom panel of figure 3.10 removes a number of those differences. It drops bond yields (except the short rate) from the estimation and assumes the inflation target and coefficient of relative risk aversion are constant. The model is then reestimated. While I do not replicate JPT’s results exactly, I do also find that the investment technology shock is important for more variables than just investment itself. It accounts for 30 percent of the variation in output growth and 20 percent of consumption growth.

9 The variance decomposition is calculated using a spectral decomposition of the state-space form of the model. Specifically, since the structural shocks are orthogonal, the spectral density of the endogenous variables is equal to the sum of the densities obtained when each shock is turned on individually. Calculating variance shares over certain frequencies then simply requires integrating the density over those frequencies. I numerically integrate by calculating the spectral density at 100 increments between wavelengths of 6 and 32 quarters.
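The band-limited variance decomposition described in footnote 9 can be illustrated with a minimal sketch. It is not the estimated model: it assumes a hypothetical series that is the sum of independent AR(1) components, one per orthogonal shock, with made-up persistence and volatility values, and it assumes the 100 grid points are equally spaced in frequency (the text does not say how the grid is spaced). The function names `ar1_spectrum` and `band_variance_shares` are illustrative only.

```python
import cmath
import math


def ar1_spectrum(rho, sigma, omega):
    """Spectral density at frequency omega of x_t = rho * x_{t-1} + sigma * eps_t."""
    transfer = 1.0 / (1.0 - rho * cmath.exp(-1j * omega))
    return sigma ** 2 * abs(transfer) ** 2 / (2.0 * math.pi)


def band_variance_shares(components, short=6.0, long=32.0, n_grid=100):
    """Share of the variance in a frequency band attributed to each shock.

    components: list of (rho, sigma) pairs, one per orthogonal shock.
    The band covers wavelengths between `short` and `long` periods.
    """
    low, high = 2.0 * math.pi / long, 2.0 * math.pi / short
    step = (high - low) / (n_grid - 1)
    grid = [low + step * i for i in range(n_grid)]
    band_variances = []
    for rho, sigma in components:
        values = [ar1_spectrum(rho, sigma, w) for w in grid]
        # Trapezoid rule on the equally spaced frequency grid.
        band_variances.append(step * (sum(values) - 0.5 * (values[0] + values[-1])))
    total = sum(band_variances)
    # Orthogonal shocks: the total density is the sum of the individual ones.
    return [v / total for v in band_variances]


# Hypothetical example: a persistent shock and a transitory shock.
shares = band_variance_shares([(0.95, 1.0), (0.30, 1.0)])
```

Because the shocks are orthogonal, the shares sum to one by construction, and in this example the persistent component receives the larger share at business-cycle wavelengths. In the full model the scalar transfer function would be replaced by the one implied by the state-space form.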
Moreover, it accounts for 41 and 48 percent of the variance of the one-quarter nominal interest rate and the term spread, respectively. It is this latter result that explains the divergence between the two models. Since long-term bond prices are included in the estimation of the benchmark model, that model is forced to match the relationship between investment and the term spread. The difference in the variance decompositions suggests that JPT gets this relationship wrong. What is interesting, though, is that when the model is forced to match long-term bond yields, that also changes the decomposition for other variables.

Table 3.3 gives one-quarter-ahead variance decompositions for output, consumption, and investment growth. This decomposition is useful for understanding whether any of these variables would be powerful asset pricing factors. Specifically, in a world where consumption followed a random walk and households had Epstein–Zin preferences with constant relative risk aversion, consumption growth would be perfectly correlated with the SDF, so it would price assets in the economy. What table 3.3 shows is that consumption, output, and investment growth are all only weakly correlated with the SDF at the one-quarter horizon; they all have correlations less than 16 percent. So asset pricing with only consumption growth will not work well in this economy. Many asset-pricing studies with Epstein–Zin preferences include both consumption growth and the return on the stock market as pricing factors. If we believe that the stock market is a claim on aggregate capital, then table 3.3 shows that it will do little to help with asset pricing as it is also only weakly correlated with the SDF.

3.7.1 Model comparison

The primary difference between the model studied here and JPT is the addition of time-varying risk aversion and the time-varying inflation target. An important question, then, is the extent to which those two factors improve the fit of the model to the data.
Clearly, allowing a time-varying inflation target will help increase the volatility of long-term bond yields, and time-varying risk aversion will induce time-varying risk premia. But it is possible that the data can be well explained with a constant risk premium, or perhaps there is no need to have a time-varying inflation target. Table 3.4 considers two alternatives to the benchmark model: a version where the inflation target is constant, and a version where the coefficient of relative risk aversion is constant. It lists two statistics for each model. First, it gives the standard likelihood ratio used in MLE, i.e. based on $\log f(y|\theta, M)$, the likelihood of the data conditional on the model and parameters, where $y$ represents the data and $\theta$ the parameter vector. $M$ denotes the model choice, i.e. the full model or one of the two restricted versions. $\log f(y|\theta, M)$ is closely related to the one-step-ahead forecast error of the model. The likelihood ratio test favors the benchmark over each alternative by a wide margin. One way to see the source of this rejection is to note that in table 3.2, the measurement errors for bond yields, essentially a residual variance, are three times higher in the alternative models than in the benchmark. This difference alone is more than sufficient to explain the magnitude of the likelihood ratios.

The second statistic is the Bayes factor, which is based on the marginal likelihood conditional only on the model, $\log f(y|M)$ (in the economics literature, see Fernandez-Villaverde and Rubio-Ramirez, 2004). I calculate $\log f(y|M)$ using the Monte Carlo chain as in Fernandez-Villaverde and Rubio-Ramirez (2004).10 The difference in the Bayes factors listed in the bottom row of table 3.4 is similar to the values obtained under the usual likelihood ratio test.
In order to accept the model with constant risk aversion or a constant inflation target, the ratio of the prior probability of either of those models to that of the benchmark model would have to be greater than exp(140). In a statistical sense, then, both the time-varying inflation target and time-varying risk aversion substantially improve the fit of the model.

10 See also Gelfand and Dey (1994) and Geweke (1999).

Table 3.4: Model comparison statistics

                     Constant RRA    Constant π*
Likelihood ratio     185.6           183.8
p-value              5.0E-41         1.6E-35
Bayes factor         141.50          142.50

Note: Row 1 gives the marginal likelihood of the data given the model and parameters. Row 2 gives the p-value for the frequentist LR test. Row 3 is the Bayes factor, the marginal probability of the data conditional on the model (i.e. integrated over the parameter space).

The last exercise I perform is to compare forecasting power between the structural model and a VAR. Rather than include all 13 of the endogenous variables, I use the 6 macro variables and the first three principal components of the bond yields (i.e. I estimate a FAVAR). I estimate a VAR(2) with the restriction that the macro variables and the interest rate factors do not interact with each other (though their innovations may be correlated). This assumption partly helps to limit the number of free parameters in the VAR, but it also means that we can compare the forecasting performance of the structural model to the benchmark models from the macro and asset-pricing literatures.

Figure 3.11 plots root mean squared errors for the forecasts from the VAR(2) and the structural model. All of the variables are forecasted in levels, rather than differences. Because the structural model is so difficult to estimate, I only consider in-sample forecasting performance. For the macro variables, the structural model has somewhat weaker forecasting power than the VAR. The divergence is notable for inflation.
At long horizons, the structural model essentially says that the level factor in interest rates should equal expected inflation since they are both driven by the Fed’s inflation target. Over this sample, however, the level factor fell slowly, while inflation fell rapidly in the early 1980s and then was low and stable. Forecasting long-term inflation with the level factor in this sample was thus unsuccessful compared to a simple random-walk forecast for inflation (which is basically what the VAR implies). For the bond yields, the structural model forecasts well. At long horizons, the RMSE generated by the structural model is smaller than that of the VAR by 10–50 percent, with the largest improvements coming for long-term bonds.

Figure 3.11: Forecast RMSE. Panels: output, consumption, investment, hours worked, real wage, inflation, 1-quarter yield, 3-year yield, 10-year yield. Series: structural model, VAR(2).
Note: Root mean squared error from forecasts of the endogenous variables. All variables are forecasted in levels. The horizontal axis gives the forecast horizon in quarters.

3.8 Conclusion

This paper studies bond pricing in a medium-scale New-Keynesian model with a time-varying price of risk. I show that the model can generate a large and volatile term premium. The term premium is driven by the combination of two factors: a negative response of interest rates to positive technology shocks and variation in risk aversion. Removing either of these effects eliminates the model’s ability to match the magnitude of the term premium.

While shocks to risk aversion and technology determine average asset returns, they have only weak effects on real variables at business-cycle frequencies.
The covariance of asset returns with real variables over the business cycle is therefore unimportant for determining average returns. It is true that the Federal Reserve tends to cut interest rates in recessions, but the model shows that most recessions are not high-marginal-utility states of the world. So the usual intuition that the Taylor rule should lead to a downward-sloping yield curve is inaccurate since it does not take into account the difference between the shocks that have high risk prices and the shocks that drive the business cycle.

Furthermore, while risk aversion is estimated to be highly volatile and to be an important determinant of the dynamics of the term spread, it has almost no effects on the real economy. This model thus suggests that there is a separation between the price of risk in financial markets and the real economy.

APPENDIX

A. APPENDIX TO CHAPTER 1

A.1 The approximation for average duration

Now using the investment function from the text, we have

$$I_{it} = \exp\left(\gamma\log\eta_{1i} + \gamma(\mu_{t+1} - r_{t+1}) + \gamma\log B_{it} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \delta_i)\right)\right) \tag{A.1}$$

$$= \exp\left(\gamma(\mu_{t+1} - r_{t+1})\right)\exp\left(\gamma\log\eta_{1i} + \gamma\log B_{it} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \delta_i)\right)\right) \tag{A.2}$$

Using the definition of $\bar{D}_t$ and the investment function, we have

$$\bar{D}_t = \frac{\sum_i \exp\left(\gamma\log\eta_{1i} + \gamma\log B_{it} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \delta_i)\right)\right) D(\delta_i)}{\sum_i \exp\left(\gamma\log\eta_{1i} + \gamma\log B_{it} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \delta_i)\right)\right)} \tag{A.3}$$

Now we take a linear approximation to

$$\exp\left(\gamma\log\eta_{1i} + \gamma\log B_{it} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \delta_i)\right)\right) \tag{A.4}$$

around the point $\left(\overline{\log\eta_1}, \overline{\log B_t}, \bar{\delta}\right)$, where $\overline{\log\eta_1} \equiv \sum_i \log\eta_{1i}$, $\overline{\log B_t} \equiv \sum_i \log B_{it}$, and $\bar{\delta} \equiv \sum_i \delta_i$.
Also define

$$\widehat{\log\eta_{1i}} \equiv \log\eta_{1i} - \overline{\log\eta_1}, \qquad \widehat{\log B_{it}} \equiv \log B_{it} - \overline{\log B_t}, \qquad \hat{\delta}_i \equiv \delta_i - \bar{\delta}$$

We have

$$\exp\left(\gamma\log\eta_{1i} + \gamma\log B_{it} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \delta_i)\right)\right) \approx \exp\left(\gamma\overline{\log\eta_1} + \gamma\overline{\log B_t} + \log\left(1 + \exp(\mu_{t+2} - r_{t+2})(1 - \bar{\delta})\right)\right) \times \left[1 + \gamma\widehat{\log\eta_{1i}} + \gamma\widehat{\log B_{it}} - \hat{\delta}_i\frac{\exp(\mu_{t+2} - r_{t+2})}{1 + \exp(\mu_{t+2} - r_{t+2})(1 - \bar{\delta})}\right] \tag{A.5}$$

Simple algebra and the observation that

$$\sum_i \left[1 + \gamma\widehat{\log\eta_{1i}} + \gamma\widehat{\log B_{it}} - \hat{\delta}_i\frac{\exp(\mu_{t+2} - r_{t+2})}{1 + \exp(\mu_{t+2} - r_{t+2})(1 - \bar{\delta})}\right] = N \tag{A.6}$$

yields the result

$$\bar{D}_t \approx N^{-1}\sum_i \left[1 + \gamma\widehat{\log\eta_{1i}} + \gamma\widehat{\log B_{it}} - \hat{\delta}_i\frac{\exp(\mu_{t+2} - r_{t+2})}{1 + \exp(\mu_{t+2} - r_{t+2})(1 - \bar{\delta})}\right] D_i \tag{A.7}$$

$$= N^{-1}\sum_i \left[1 + \gamma\widehat{\log\eta_{1i}} + \gamma\widehat{\log B_{it}}\right] D_i - \frac{\exp(\mu_{t+2} - r_{t+2})}{1 + \exp(\mu_{t+2} - r_{t+2})(1 - \bar{\delta})} N^{-1}\sum_i \hat{\delta}_i D_i \tag{A.8}$$

The term $N^{-1}\sum_i \left[1 + \gamma\widehat{\log\eta_{1i}}\right] D_i$ is constant over time, and we thus have the desired result from the text.

A.2 Further robustness tests

This section describes table A.1, which has extra robustness tests for the regressions of $\bar{D}$ on the term spread and other controls. Column 1 includes up to three lags of the term spread. The second lag enters significantly, and with a coefficient slightly larger than the first lag. This sort of lagged response to price changes is commonly found in the literature. It is generally interpreted as being due to planning and delivery lags. The second column includes every one of the other controls simultaneously, instead of individually as in tables 2 and 3. The result is that the coefficients on the various other controls all become marginally significant at best, while the term spread is still highly significant. There is thus something different about the term spread from all of the other business cycle, volatility, and investment controls. Column 3 is identical to column 2 except it only uses one lag of the term spread, and the coefficient is still significant at the one percent level. Finally, column 4 includes the current and lagged values of volatility instead of just the leading value. The leading value has a negative sign, indicating that high future volatility lowers duration today (consistent with a model of irreversible investment). The current value has a positive sign. One way to reconcile this is if the volatility variable should actually enter as a first difference: an increase in volatility lowers average duration, instead of a high value by itself. The lagged level of volatility is uncorrelated with $\bar{D}_t$.

Figure A.1: Further robustness tests
Rows: Term Spread(t-1), Term Spread(t-2), Term Spread(t-3), Unemployment(t-1), GDP(t), GDP(t-1), Investment(t), Investment(t-1), SD_returns(t+1), SD_returns(t), SD_returns(t-1), N, R2.
(1) -0.23 ** [0.10] -0.24 *** [0.08] 0.11 [0.09] 0.36 *** [0.10] (2) -0.40 [0.06] -0.27 [0.08] 0.12 [0.07] 0.18 [0.19] -0.05 [0.12] 0.55 [0.26] -0.08 [0.17] -0.12 [0.21] -0.23 [0.09] *** *** (3) -0.46 *** [0.08] (4) -0.57 *** [0.09] * ** 0.02 [0.20] 0.02 [0.13] -0.24 [0.22] -0.24 [0.16] 0.06 [0.19] -0.19 ** [0.09] 59 0.48 47 0.74 47 0.69 -0.28 *** [0.07] 0.15 ** [0.07] -0.08 [0.05] 45 0.56
Note: See tables 2 and 3.

B. APPENDIX TO CHAPTER 2

B.1 The certainty equivalent

This section looks at the relationship between the certainty equivalents using $G^{habit}$ and $G^{TV}$. I first show that the two certainty equivalents are equal up to a second-order approximation around the non-stochastic version of the model. Next, I show that in the continuous-time limit, the preferences associated with the two certainty equivalents are identical.

B.1.1 Second-order approximation

This section approximates the certainty equivalent $G^{-1}(E_t(G(V_{t+1})))$, where $V_{t+1} = V_t \times (1 + \sigma\varepsilon_{t+1})$, around the point $\sigma = 0$. We assume that $E_t\varepsilon_{t+1} = 0$ and $E_t\varepsilon_{t+1}^2 = 1$.
Now consider the derivative of $G^{-1}(E_t(G(V_{t+1})))$ with respect to $\sigma$,

$$\frac{d}{d\sigma}G^{-1}(E_t(G(V_{t+1}))) = \frac{\frac{d}{d\sigma}E_t(G(V_{t+1}))}{G'(G^{-1}(E_t(G(V_{t+1}))))} \tag{B.1}$$

We have

$$\frac{d}{d\sigma}E_t(G(V_{t+1})) = \int G'(V_t(1 + \sigma\varepsilon_{t+1}))\,\varepsilon_{t+1}V_t\,dF(\varepsilon_{t+1}) \tag{B.2}$$

where $F$ is the cdf of $\varepsilon_{t+1}$. Evaluated at $\sigma = 0$, $\frac{d}{d\sigma}E_t(G(V_{t+1})) = 0$, and therefore

$$\frac{d}{d\sigma}G^{-1}(E_t(G(V_{t+1}))) = 0 \tag{B.3}$$

So all certainty equivalents taking this form are identical up to the first order in approximations around $\sigma = 0$.

Next, consider the second derivative,

$$\frac{d^2}{d\sigma^2}G^{-1}(E_t(G(V_{t+1}))) = \frac{G'(G^{-1}(E_t(G(V_{t+1}))))\frac{d^2}{d\sigma^2}E_t(G(V_{t+1})) - \frac{d}{d\sigma}E_t(G(V_{t+1}))\frac{d}{d\sigma}G'(G^{-1}(E_t(G(V_{t+1}))))}{\left[G'(G^{-1}(E_t(G(V_{t+1}))))\right]^2} \tag{B.4}$$

Since $\frac{d}{d\sigma}E_t(G(V_{t+1}))$ is equal to zero at $\sigma = 0$, we can ignore the second term in the numerator. The second derivative of the expectation is

$$\frac{d^2}{d\sigma^2}E_t(G(V_{t+1})) = \int G''(V_t(1 + \sigma\varepsilon_{t+1}))\,\varepsilon_{t+1}^2V_t^2\,dF(\varepsilon_{t+1}) \tag{B.5}$$

At $\sigma = 0$, $\left.\frac{d^2}{d\sigma^2}E_t(G(V_{t+1}))\right|_{\sigma=0} = G''(V_t)V_t^2$. We also have $\left.G'(G^{-1}(E_t(G(V_{t+1}))))\right|_{\sigma=0} = G'(V_t)$, and hence $\left.\frac{d^2}{d\sigma^2}G^{-1}(E_t(G(V_{t+1})))\right|_{\sigma=0} = \frac{G''(V_t)V_t^2}{G'(V_t)}$. So any two choices of $G$, say $G_1$ and $G_2$, are equivalent up to the second order if $\frac{G_1''(V_t)}{G_1'(V_t)} = \frac{G_2''(V_t)}{G_2'(V_t)}$ for any $V_t$. That relationship holds for $G^{habit}$ and $G^{TV}$.

B.1.2 Continuous time

Duffie and Epstein (1992) show how to extend Epstein–Zin preferences to continuous time. They derive a utility function following the process

$$dV_t = \mu_t\,dt + \sigma_t\,dB_t \tag{B.6}$$

$$= \left[-f(c_t, V_t) - \frac{1}{2}A(V_t)\sigma_t\sigma_t'\right]dt + \sigma_t\,dB_t \tag{B.7}$$

for a Wiener process $B_t$. As in the main text, suppose the household’s certainty equivalent under discrete-time Epstein–Zin preferences is $G^{-1}(E_t(G(V_{t+1})))$. Duffie and Epstein (1992) show that the analogous choice of $A$, obtained as a limiting case as the length of time periods approaches zero, is $A(V_t) = \frac{G''(V_t)}{G'(V_t)}$. In the case where $G^{power}(V_t) = V_t^{1-\alpha}$, we have

$$A^{power}(V_t) = \frac{(G^{power})''(V_t)}{(G^{power})'(V_t)} = \frac{-\alpha}{V_t} \tag{B.8}$$

and for $G^{habit} = (V_t - H_t)^{1-\alpha}$,

$$A^{habit}(V_t) = \frac{-\alpha}{V_t - H_t} \tag{B.9}$$

For $G^{TV} = V_t^{1-\alpha_t}$,

$$A^{TV}(V_t) = \frac{-\alpha_t}{V_t} \tag{B.10}$$

So $G^{TV}$ and $G^{habit}$ are identical if $\alpha_t = \alpha\frac{V_t}{V_t - H_t}$, which is what is used in the text. For all three choices of the certainty equivalent $G$, we can use the standard choice for $f$, $f(c_t, V_t) = \beta\frac{c_t^{1-\rho} - V_t^{1-\rho}}{(1-\rho)V_t^{-\rho}}$. $\rho$ then determines the elasticity of intertemporal substitution, while $A$ determines risk aversion.

B.2 Derivation of the SDF

We can obtain the stochastic discount factor (SDF) by calculating the intertemporal marginal rate of substitution. We calculate two derivatives. First,

$$\frac{\partial V_t}{\partial C_t} = V_t^{\rho}(1 - \exp(-\beta))C_t^{-\rho} \tag{B.11}$$

Next, we differentiate $V_t$ with respect to $C_{t+1}(w)$, where $w$ denotes one state of the world, and $\pi_w$ is the probability of that state,

$$\frac{\partial V_t}{\partial C_{t+1}(w)} = \pi_w V_t^{\rho}\exp(-\beta)R_t^{-\rho}\,G_t^{(-1)\prime}\left(E_t[G_t(V_{t+1})]\right) \times G_t'(V_{t+1}(w))\,V_{t+1}(w)^{\rho}(1 - \exp(-\beta))\,C_{t+1}(w)^{-\rho} \tag{B.12}$$

where $G_t'$ is the derivative of $G_t$ and $G_t^{(-1)\prime}$ the derivative of $G_t^{-1}$. $R_t \equiv G_t^{-1}(E_t G_t(V_{t+1}))$. The subscripts on $G_t$ refer to the fact that $G_t$ depends on the potentially time-varying parameter $\alpha_t$. The assumption that $\alpha_t$ is exogenous to the household is necessary for this formula for the derivative to be correct (in the same way that external habits lead to a more tractable formula for the SDF than do internal habits).

The SDF can be derived from a consumer’s first-order conditions for optimization as $M_{t+1}(w) = \frac{1}{\pi_w}\frac{\partial V_t/\partial C_{t+1}(w)}{\partial V_t/\partial C_t}$. We then have

$$M_{t+1}(w) = \exp(-\beta)\frac{G_t'(V_{t+1}(w))}{G_t'(R_t)}\frac{V_{t+1}(w)^{\rho}}{R_t^{\rho}}\frac{C_{t+1}(w)^{-\rho}}{C_t^{-\rho}} \tag{B.13}$$

where the last line follows from the fact that $G_t^{(-1)\prime}(G_t(x)) = 1/G_t'(x)$. In the case of $G_t(V) = V^{1-\alpha_t}$, the SDF becomes

$$M_{t+1} = \exp(-\beta)\left(\frac{V_{t+1}(w)}{R_t}\right)^{\rho-\alpha_t}\left(\frac{C_{t+1}(w)}{C_t}\right)^{-\rho} \tag{B.14}$$

B.2.1 Substituting in an asset return

Consider an asset that pays $C_t$ as its dividend.
We guess that its cum-dividend price is $W_t = V_t^{1-\rho}C_t^{\rho}(1 - \exp(-\beta))^{-1}$. This guess can be confirmed by simply inserting it into the household’s Euler equation. The return on the consumption claim is

$$R_{w,t+1} = \frac{W_{t+1}}{W_t - C_t} = \frac{V_{t+1}^{1-\rho}}{\exp(-\beta)R_t^{1-\rho}}\left(\frac{C_{t+1}}{C_t}\right)^{\rho} \tag{B.15}$$

which yields

$$\left(\frac{V_{t+1}(w)}{R_t}\right)^{\rho-\alpha_t} = \left(R_{w,t+1}\exp(-\beta)\right)^{\frac{\rho-\alpha_t}{1-\rho}}\left(\frac{C_{t+1}}{C_t}\right)^{-\rho\frac{\rho-\alpha_t}{1-\rho}} \tag{B.16}$$

We can then insert this into the SDF to yield

$$M_{t+1} = \exp(-\beta)^{\frac{1-\alpha_t}{1-\rho}}\left(\frac{C_{t+1}}{C_t}\right)^{-\rho\frac{1-\alpha_t}{1-\rho}}R_{w,t+1}^{\frac{\rho-\alpha_t}{1-\rho}} \tag{B.17}$$

B.3 The log-linear model with production

B.3.1 Steady state

In the nonstochastic steady state, the interest rate earned by all assets, $r$, is equal to

$$r = \beta + \rho\mu \tag{B.18}$$

Standard manipulations show that the steady-state ratio of capital to technology is then

$$\bar{K} = \left(\frac{\exp(\beta + \rho\mu) - 1 + \delta}{\gamma}\right)^{1/(\gamma-1)} \tag{B.19}$$

We can obtain the steady-state consumption-output ratio by using the budget constraint,

$$\bar{C} = \bar{Y} + (1 - \delta)\bar{K} - \exp(\mu)\bar{K} \tag{B.20}$$

$$1 - \frac{\bar{C}}{\bar{Y}} = -(1 - \delta - \exp(\mu))\frac{\bar{K}}{\bar{Y}} \tag{B.21}$$

B.3.2 The budget constraint

The approximation I use for the budget constraint is identical to Campbell (1994). The budget constraint is $K_{t+1} = A_t^{1-\gamma}K_t^{\gamma} - C_t + (1 - \delta)K_t$. I look for a log-linear approximation taking the form $k_{t+1} = \lambda_0 + \lambda_k k_t + \lambda_a a_t + \lambda_c c_t$, where the $\lambda$ terms are coefficients from the approximation.
The budget constraint can be rewritten as

$$\log\left[\exp(\Delta k_{t+1}) - (1 - \delta)\right] = y_t - k_t + \log(1 - \exp(c_t - y_t)) \tag{B.22}$$

Taking a log-linear approximation to the left-hand side around the point $\Delta k_{t+1} = \mu$, we have

$$\log\left[\exp(\Delta k_{t+1}) - (1 - \delta)\right] \approx \log\left[\exp(\mu) - (1 - \delta)\right] + \frac{\exp(\mu)}{\exp(\mu) - (1 - \delta)}(\Delta k_{t+1} - \mu) \tag{B.23}$$

To approximate the right-hand side of (B.22), we approximate $\log(1 - \exp(c_t - y_t))$ around the steady state $cy$,

$$\log(1 - \exp(c_t - y_t)) \approx \log(1 - \exp(cy)) + \frac{-\exp(cy)}{1 - \exp(cy)}(c_t - y_t - cy) \tag{B.24}$$

This implies

$$\frac{\exp(\mu)}{\exp(\mu) - (1 - \delta)}(\Delta k_{t+1} - \mu) \approx \log(1 - C/Y) + y_t - k_t + \frac{-\exp(cy)}{1 - \exp(cy)}(c_t - y_t - cy) \tag{B.25}$$

Now we can find the coefficients in the linear approximation to the budget constraint. The constant term is

$$\lambda_0 = \frac{\exp(\mu) - (1 - \delta)}{\exp(\mu)}\left[\log\left[\exp(\mu) - (1 - \delta)\right] + \log(1 - C/Y) - \frac{-\exp(cy)}{1 - \exp(cy)}cy\right] \tag{B.26}$$

The coefficients on $k$, $a$, and $c$ are then

$$\lambda_k = \frac{\exp(\mu) - (1 - \delta)}{\exp(\mu)}\left[\gamma - 1 - \gamma\frac{-\exp(cy)}{1 - \exp(cy)}\right] + 1 \tag{B.27}$$

$$\lambda_a = (1 - \gamma)\frac{\exp(\mu) - (1 - \delta)}{\exp(\mu)} - (1 - \gamma)\frac{\exp(\mu) - (1 - \delta)}{\exp(\mu)}\frac{-\exp(cy)}{1 - \exp(cy)} \tag{B.28}$$

$$\lambda_c = \frac{\exp(\mu) - (1 - \delta)}{\exp(\mu)}\frac{-\exp(cy)}{1 - \exp(cy)} \tag{B.29}$$

Now note that $\lambda_k + \lambda_a + \lambda_c = 1$, so we have

$$k_{t+1} = \lambda_0 + \lambda_k k_t + \lambda_a a_t + (1 - \lambda_k - \lambda_a)c_t \tag{B.30}$$

B.3.3 Capital return

To approximate the return on capital, we say

$$r_{k,t+1} = \log\left(\gamma\exp\left((1 - \gamma)(a_{t+1} - k_{t+1})\right) + 1 - \delta\right) \tag{B.31}$$

$$\approx \log\left(\gamma\exp\left((\gamma - 1)\bar{k}\right) + 1 - \delta\right) + \frac{(\gamma - 1)\gamma\exp\left((\gamma - 1)\bar{k}\right)}{\gamma\exp\left((\gamma - 1)\bar{k}\right) + 1 - \delta}\left(k_{t+1} - a_{t+1} - \bar{k}\right) \tag{B.32}$$

$$r_{k,t+1} \approx r + r_{kk}\left(\tilde{k}_{t+1} - \bar{k}\right) \tag{B.33}$$

where $r_{kk} \equiv (\gamma - 1)(\exp(r) - 1 + \delta)/\exp(r)$.

B.3.4 Risk aversion

I guess that the innovation to the value function can be written as $\kappa_v\varepsilon_{t+1}$, so that

$$\alpha_{t+1} = \phi\alpha_t + (1 - \phi)\bar{\alpha} + \lambda\kappa_v\varepsilon_{t+1} \tag{B.34}$$

I confirm this guess below.
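As a numerical illustration of the budget-constraint coefficients in section B.3.2 and the capital-return slope in section B.3.3, the following minimal sketch evaluates $\lambda_k$, $\lambda_a$, $\lambda_c$, and $r_{kk}$ and checks that the three $\lambda$ coefficients sum to one. The parameter values are hypothetical placeholders, not the calibration used in the text, and the function names are illustrative.

```python
import math


def budget_constraint_coefficients(gamma, delta, mu, cy):
    """Return (lambda_k, lambda_a, lambda_c) from the log-linear budget constraint.

    gamma: capital exponent in production, delta: depreciation rate,
    mu: steady-state growth rate, cy: log steady-state consumption-output ratio.
    """
    scale = (math.exp(mu) - (1.0 - delta)) / math.exp(mu)
    slope = -math.exp(cy) / (1.0 - math.exp(cy))
    lambda_k = scale * (gamma - 1.0 - gamma * slope) + 1.0
    lambda_a = (1.0 - gamma) * scale - (1.0 - gamma) * scale * slope
    lambda_c = scale * slope
    return lambda_k, lambda_a, lambda_c


def capital_return_slope(gamma, delta, r):
    """r_kk = (gamma - 1) * (exp(r) - 1 + delta) / exp(r)."""
    return (gamma - 1.0) * (math.exp(r) - 1.0 + delta) / math.exp(r)


# Hypothetical quarterly values, chosen only for illustration.
lam_k, lam_a, lam_c = budget_constraint_coefficients(
    gamma=0.36, delta=0.025, mu=0.005, cy=math.log(0.75))
r_kk = capital_return_slope(gamma=0.36, delta=0.025, r=0.015)
```

For any admissible inputs the three coefficients add up to one, which is the restriction used to eliminate $\lambda_c$ in the final form of the approximation. With these placeholder values the signs come out as $\lambda_k > 0$, $\lambda_c < 0$, and $r_{kk} < 0$.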
B.3.5 Consumption dynamics

Writing $\tilde{k}_t \equiv k_t - a_t$ and $\tilde{c}_t \equiv c_t - a_t$, we have

$$\tilde{k}_{t+1} = \lambda_0 + \lambda_k\tilde{k}_t + \lambda_c\tilde{c}_t - \sigma_a\varepsilon_{a,t+1} - \mu \tag{B.35}$$

Now we guess that the consumption function is $\tilde{c}_t = \eta_{c0} + \eta_{ck}\tilde{k}_t + \eta_{c\alpha}\alpha_t$ (note here that I use $\lambda$ terms for the budget constraint, which are terms depending only on the underlying parameters of the model; the $\eta$ terms are coefficients from the optimal consumption rule). Then we have

$$\tilde{k}_{t+1} = \lambda_0 + \lambda_k\tilde{k}_t + \lambda_c\left(\eta_{c0} + \eta_{ck}\tilde{k}_t + \eta_{c\alpha}\alpha_t\right) - \sigma_a\varepsilon_{a,t+1} - \mu$$

$$= \eta_{k0} + \eta_{kk}\tilde{k}_t + \eta_{k\alpha}\alpha_t - \sigma_a\varepsilon_{a,t+1}$$

$$\eta_{k0} \equiv \lambda_0 + \lambda_c\eta_{c0} - \mu, \qquad \eta_{kk} \equiv \lambda_k + \lambda_c\eta_{ck}, \qquad \eta_{k\alpha} \equiv \lambda_c\eta_{c\alpha}$$

This equation specifies the dynamics of capital conditional on the underlying parameters of the model and the two unknown coefficients determining the dynamics of consumption. For consumption growth, we say $\Delta c_{t+1} = \eta_{d0} + \eta_{dk}\tilde{k}_t + \eta_{da}\alpha_t + \kappa_d\varepsilon_{t+1}$, where

$$\eta_{d0} \equiv \eta_{ck}\eta_{k0} - \eta_{ca}(\phi_\alpha - 1)\bar{\alpha} + \mu, \qquad \kappa_d \equiv \sigma_a(1 - \eta_{ck}) + \eta_{ca}\lambda\kappa_v \tag{B.36}$$

$$\eta_{dk} \equiv \eta_{ck}(\eta_{kk} - 1), \qquad \eta_{da} \equiv \eta_{ck}\eta_{k\alpha} + \eta_{ca}(\phi_\alpha - 1) \tag{B.37}$$

The remainder of the appendix confirms that our guesses for the form of the consumption and value functions are correct.

B.3.6 Wealth return

In the presence of balanced growth, the long-run response of consumption to an innovation of $\sigma_a\varepsilon_{t+1}$ to technology must be exactly $\sigma_a\varepsilon_{t+1}$. This is equivalent to saying that

$$\Delta E_{t+1}\sum_{j=0}^{\infty}\Delta c_{t+j+1} = \sigma_a\varepsilon_{t+1} \tag{B.38}$$

In the case where $\theta$ approaches 1 (the steady-state dividend/price ratio approaches zero) or the consumption response only takes one period, $\Delta E_{t+1}\sum_{j=0}^{\infty}\Delta c_{t+j+1} = \Delta E_{t+1}\sum_{j=0}^{\infty}\theta^j\Delta c_{t+j+1}$. We therefore have the approximation

$$r_{w,t+1} = E_t r_{w,t+1} + \sigma_a\varepsilon_{t+1} - \Delta E_{t+1}\sum_{j=1}^{\infty}\theta^j r_{w,t+j+1} \tag{B.39}$$

This extra approximation is not strictly necessary, and the model is straightforward to solve without it. However, it substantially simplifies many of the formulas and makes them more transparent. The results reported below on the accuracy of the log-linear solution apply to the solution using this approximation.
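The balanced-growth restriction behind the wealth-return approximation can be checked numerically with a minimal sketch. It traces the response of consumption growth to a one-unit technology innovation under the guessed linear laws of motion and compares the undiscounted cumulative response with the $\theta$-discounted one. All coefficient values are hypothetical placeholders (they are not solved from the model), `sigma_aa` stands in for $\lambda\kappa_v$, and the function names are illustrative.

```python
def consumption_growth_responses(eta_ck, eta_ca, eta_kk, eta_ka, phi,
                                 sigma_a, sigma_aa, horizon):
    """Responses of consumption growth at t+1, t+2, ... to a unit innovation at t+1."""
    k_tilde = -sigma_a   # capital relative to technology falls on impact
    alpha = sigma_aa     # risk aversion moves by sigma_aa (= lambda * kappa_v)
    c_tilde_prev = 0.0
    responses = []
    for j in range(horizon):
        c_tilde = eta_ck * k_tilde + eta_ca * alpha
        # Technology itself jumps permanently by sigma_a in the first period only.
        technology_growth = sigma_a if j == 0 else 0.0
        responses.append(c_tilde - c_tilde_prev + technology_growth)
        c_tilde_prev = c_tilde
        k_tilde, alpha = eta_kk * k_tilde + eta_ka * alpha, phi * alpha
    return responses


def discounted_sum(responses, theta):
    """Sum of theta**j times the j-th response."""
    return sum(theta ** j * x for j, x in enumerate(responses))


# Hypothetical stable coefficients, chosen only for illustration.
SIGMA_A, SIGMA_AA = 0.01, 0.1
responses = consumption_growth_responses(
    eta_ck=0.6, eta_ca=-0.01, eta_kk=0.95, eta_ka=0.002, phi=0.9,
    sigma_a=SIGMA_A, sigma_aa=SIGMA_AA, horizon=2000)
kappa_d = SIGMA_A * (1 - 0.6) + (-0.01) * SIGMA_AA
```

In this sketch the first response equals $\kappa_d$, the undiscounted sum equals $\sigma_a$ because detrended consumption is stationary, and the discounted sum approaches $\sigma_a$ only as $\theta$ approaches one, which is the sense in which the extra approximation is exact in the limit.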
Now note that ∆Et+1 ∑ θ j rw,t+ j+1 = ∆Et+1 ∑ θ j αt+ j ηwa + ρEt+ j ∆ct+ j j =1 j =1 ∞ ∞ (B.40) (B.41) = ηwa θ σaa + ρ (ηck σa − ηca σaa ) 1 − θφ The second term follows from the approximation ∆Et+1 ∑ θ j ρEt+ j ∆ct+ j ≈ ∆Et+1 ∑ ρEt+ j ∆ct+ j j =1 j =1 ∞ ∞ The right hand side of this equation is simply ρ multiplied by the total amount of consumption growth expected following period t + 1. Since we know that in the long run, the 157 consumption-technology ratio is stationary, we just need to know how much consumption declines relative to technology at period t + 1. That’s going to be exactly ηck σa ε t+1 − ηca σaa ε t+1 (since capital falls by σa and α falls by −σaa ). We then have κr = σa − ηwa The return is thus rt+1 = ηw0 + ρEt ∆ct+1 + ηwa αt + σa ε t+1 + −ηwa θ θ σaa + η σa ε t+1 (B.43) 1 − θφ 1 − θηkk dk θ σaa − ρ (ηck σa − ηca σaa ) 1 − θφ (B.42) B.3.7 The Euler equation for wealth The asset pricing equation gives us  1 = Et exp ( β) 1− α t 1− ρ Ct+1 Ct −ρ 1− α t 1− ρ Rw,t+1  1− α t 1− ρ  (B.44) The log of the term inside the expectation is  log exp ( β) 1− α t 1− ρ Ct+1 Ct −ρ 1− α t 1− ρ Rw,t+1  = − 1− α t 1− ρ  1 − αt 1 − αt β−ρ ∆ct+1 1−ρ 1−ρ + 1 − αt (ηw0 + ρEt ∆ct+1 + ηwa αt + κr ε t+1 ) (B.45) 1−ρ 1 − αt 1 − αt =− β−ρ κ ε t +1 1−ρ 1−ρ d 1 − αt + (B.46) (ηw0 + ηwa αt + κr ε t+1 ) 1−ρ 1− α t 1− ρ Now taking the log of the expectation and dividing by gives 0 = − β + ηw0 + ηwa αt + 1 1 − αt (−ρκd + κr )2 2 1−ρ (B.47) 158 which implies 1 (−ρκd + κr )2 2 1−ρ 1 1 ηwa = (−ρκd + κr )2 21−ρ θ κr = σa − ηwa σaa − ρ (ηck σa − ηca σaa ) 1 − θφ B.3.8 The Euler equation for capital ηw0 = β − (B.48) (B.49) (B.50) The stochastic discount factor follows 1 − αt ρ − αt κ d ε t +1 + 1−ρ 1−ρ 1 αt − 1 (−ρκd + κr )2 + κr ε t+1 2 1−ρ mt+1 = − β − ρE∆ct+1 − ρ (B.51) For the capital return we have ˜ ¯ 1 = Et exp mt+1 + r + rkk ηk0 + ηkk k t + ηkα αt − σa ε a,t+1 − k (B.52) which implies 1 (1 − α t ) (−ρκd + κr )2 2 (1 − ρ ) 0 = − β − ρE∆ct+1 + ˜ ¯ + r + rkk ηk0 + ηkk k t + 
ηkα αt − k + 1 (r σa + κr )2 − (rkk σa + κr ) 2 kk 1 − αt 1−ρ (−ρκd + κr ) (B.53) Note that all of the nonlinearities disappear (i.e. the α2 terms), and this equation is linear t in the state variables. We can thus solve through the method of undetermined coefficients as usual. For the coefficients on capital, 2 0 = −ρ λk ηck + λc ηck − ηck + rkk (λk + λc ηck ) (B.54) 159 This is quadratic in ηkk , and we have ρ (1 − λk ) + rkk λc ± ηck = (ρ (1 − λk ) + rkk λc )2 + 4ρλc rkk λk 2ρλc (B.55) Now note that λc < 0, λk > 0, and rkk < 0. This implies that (ρ (1 − λk ) + rkk λc )2 + 4ρλc rkk λk > ρ (1 − λk ) + rkk λc (B.56) and hence ηck has a positive and a negative root. The root where ηck < 0 violates the transversality condition (high capital implies low consumption), so we choose the root ¯ with ηck > 0. Note that the formula for ηck does not involve α or σaa , which confirms remark 1. For the coefficients on αt , we have 1 (−ρκd + κr )2 (−ρκd + κr ) + rkk ηka + (rkk σa + κr ) 2 1−ρ 1−ρ B.3.9 Other parameters 0 = −ρηda − (B.57) To solve for (−ρκd + κr ), simply combine the equations for ηwa and κr , yielding −ρκd + κr = −1 + θ 1 + 2 1−θφ σaa σa 1 θ 1−ρ 1−θφ σaa (B.58) We choose the root for this equation that has the property that it approaches zero as σa approaches zero. That is, we know that when the shocks have zero variance, all assets have the same return, and so ηwa = 0. 160 B.3.10 Excess returns and the risk-free rate (result 2) To calculate excess returns, we can simply calculate the covariance of the wealth return with the SDF. 
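The Sharpe ratio of the wealth portfolio is

$$\frac{E_t r_{w,t+1} - r_{f,t+1} + \frac{1}{2}\kappa_r^2}{\kappa_r} = \frac{\alpha_t - \rho}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right) + \rho\kappa_d \tag{B.59}$$

$$= (\alpha_t - \rho)\,\frac{-1 + \sqrt{1 + 2\frac{\theta}{1-\theta\phi}\sigma_{aa}\sigma_a}}{\frac{\theta}{1-\theta\phi}\sigma_{aa}} + \rho\kappa_d \tag{B.60}$$

This also immediately gives a formula for the risk-free rate:

$$E_t r_{w,t+1} - r_{f,t+1} + \frac{1}{2}\kappa_r^2 = \frac{\alpha_t - \rho}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right)\kappa_r + \rho\kappa_d\kappa_r \tag{B.61}$$

$$r_{f,t+1} = \eta_{w0} + \rho E_t\Delta c_{t+1} + \eta_{wa}\alpha_t + \frac{1}{2}\kappa_r^2 - \frac{\alpha_t - \rho}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right)\kappa_r - \rho\kappa_d\kappa_r \tag{B.62}$$

B.3.11 The wealth-consumption ratio

The Campbell–Shiller approximation for the wealth-consumption ratio is

$$w_t - c_t = \frac{z}{1-\theta} + E_t\sum_{j=0}^{\infty}\theta^j\left(\Delta c_{t+j+1} - r_{t+j+1}\right) \tag{B.63}$$

for a constant z depending on the average consumption-wealth ratio (i.e. related to θ), $z \equiv -\log\theta - (1-\theta)\log\left(\frac{1}{\theta} - 1\right)$. Now

$$E_t\sum_{j=0}^{\infty}\theta^j\Delta c_{t+j+1} = -\eta_{ck}\left(k_t - \bar{k}\right) - \eta_{ca}\left(\alpha_t - \bar{\alpha}\right) \tag{B.64}$$

under the approximation θ = 1 from above, and

$$E_t\sum_{j=0}^{\infty}\theta^j r_{w,t+j+1} = E_t\sum_{j=0}^{\infty}\theta^j\left(\alpha_{t+j}\eta_{wa} + \rho E_{t+j}\Delta c_{t+j+1}\right) \tag{B.65}$$

So

$$w_t - c_t = \frac{z}{1-\theta} + (1-\rho)\left[-\eta_{ck}\left(k_t - \bar{k}\right) - \eta_{ca}\left(\alpha_t - \bar{\alpha}\right)\right] - \frac{\theta}{1-\theta\phi}\eta_{wa}\left(\alpha_t - \bar{\alpha}\right) \tag{B.66}$$

B.3.12 The value function and risk aversion (result 4)

At any time, household value satisfies

$$W_t = V_t^{1-\rho} C_t^{\rho} / \left(1 - \exp(-\beta)\right) \tag{B.67}$$

$$v_t = \frac{w_t - c_t}{1-\rho} + c_t + \frac{\log\left(1 - \exp(-\beta)\right)}{1-\rho} \tag{B.68}$$

The innovation to the value function, $v_{t+1} - E_t v_{t+1}$, is equal to the sum of the innovations to $(w_{t+1} - c_{t+1})/(1-\rho)$ and $c_{t+1}$, which are

$$\Delta E_{t+1}\frac{w_{t+1} - c_{t+1}}{1-\rho} + \Delta E_{t+1}c_{t+1} = \sigma_a\varepsilon_{t+1} - \frac{1}{1-\rho}\frac{\theta\,\eta_{wa}}{1-\theta\phi}\sigma_{aa}\varepsilon_{t+1} \tag{B.69}$$

$$\kappa_v\varepsilon_{t+1} = \sigma_a\varepsilon_{t+1} - \frac{\theta}{1-\theta\phi}\,\frac{1}{2}\left(\frac{1}{1-\rho}\right)^2\left(-\rho\kappa_d + \kappa_r\right)^2\sigma_{aa}\varepsilon_{t+1} \tag{B.70}$$

Using the formula from above that defines $(-\rho\kappa_d + \kappa_r)$, we have $\kappa_v = \frac{-\rho\kappa_d + \kappa_r}{1-\rho}$ and

$$\sigma_{aa} = \lambda\kappa_v = \lambda\,\frac{-1 + \sqrt{1 + 2\frac{\theta}{1-\theta\phi}\sigma_{aa}\sigma_a}}{\frac{\theta}{1-\theta\phi}\sigma_{aa}} \tag{B.71}$$

B.3.13 Affine bond pricing (result 5)

$$m_{t+1} = -\beta - \rho E_t\Delta c_{t+1} - \rho\frac{1-\alpha_t}{1-\rho}\kappa_d\varepsilon_{t+1} + \frac{\rho - \alpha_t}{1-\rho}\left[\frac{1}{2}\frac{\alpha_t - 1}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right)^2 + \kappa_r\varepsilon_{t+1}\right] \tag{B.72}$$

$$\operatorname{var}(m_{t+1}) = \left[\frac{1-\alpha_t}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right) - \kappa_r\right]^2\sigma^2 \tag{B.73}$$

$$= \left[\left(\frac{1-\alpha_t}{1-\rho}\right)^2\left(-\rho\kappa_d + \kappa_r\right)^2 + \kappa_r^2 - 2\frac{1-\alpha_t}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right)\kappa_r\right]\sigma^2 \tag{B.74}$$

And hence

$$r_{f,t+1} = -\left(E_t m_{t+1} + \frac{1}{2}\operatorname{var}(m_{t+1})\right) \tag{B.75}$$

$$= -\left[\begin{array}{l} -\beta - \rho\left(\eta_{d0} + \eta_{dk}\tilde{k}_t + \eta_{da}\alpha_t\right) \\ {}+ \frac{1}{2}\frac{1-\alpha_t}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right)^2 + \frac{1}{2}\kappa_r^2 - \frac{1-\alpha_t}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right)\kappa_r \end{array}\right] \tag{B.76}$$

It is straightforward to show that

$$m_{t+1} = -r_{f,t+1} - \frac{1}{2}\left[\frac{1-\alpha_t}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right) - \kappa_r\right]^2\sigma^2 + \left[\frac{1-\alpha_t}{1-\rho}\left(-\rho\kappa_d + \kappa_r\right) - \kappa_r\right]\varepsilon_{t+1} \tag{B.77}$$

$$= -r_{f,t+1} - \frac{1}{2}\left(\omega_0 + \omega_1\alpha_t\right)^2\sigma^2 + \left(\omega_0 + \omega_1\alpha_t\right)\varepsilon_{t+1} \tag{B.78}$$

So the SDF takes the essentially affine form with

$$\omega_0 = \frac{-\rho\kappa_d + \kappa_r}{1-\rho} - \kappa_r, \qquad \omega_1 = \frac{-\left(-\rho\kappa_d + \kappa_r\right)}{1-\rho}$$

B.3.14 Accuracy of the approximation

Table B.1 reports simple statistics summarizing the relationship between the projection solution and the log-linear approximation to the model. The first column lists the mean difference between the solutions, the second column the standard deviation of the gap, and the third column the standard deviation of the gap scaled by the standard deviation of the variable in the projection solution. I report deviations for log capital, log consumption growth, the coefficient of relative risk aversion, and the Sharpe ratio of the wealth portfolio. For the simulations, both models start with the same initial levels of capital and risk aversion and use the same technology shocks. I then simulate the models for 20,000 periods.

Figure B.1: Comparison of results from simulations of projection and log-linear model solutions

Differences:     Mean         Std. dev.    Scaled std. dev.
Capital          -8.72E-04    0.00073      0.020
Cons. Growth      3.16E-08    0.00013      0.028
RRA               1.11E-02    0.75344      0.116
Sharpe Ratio     -2.15E-03    0.00892      0.123

Note: Comparison of the projection and log-linear solutions. The two simulations use the same shocks but different policy functions. The first column is the mean difference between the simulations, the second column the standard deviation, and the third column the standard deviation of the difference scaled by the standard deviation of the variable in the projection solution. RRA is relative risk aversion.

Table B.1 shows that for capital and consumption growth, the log-linear approximation is nearly identical to the projection solution.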
The mean differences are essentially zero, and the standard deviations of the errors are both less than 3 percent of the standard deviations of the variables themselves. For risk aversion and the Sharpe ratio, the log-linear approximation is essentially identical to the projection solution on average, but the standard deviation of the differences is now roughly 12 percent of the standard deviation of the variables themselves.

An alternative method of checking the accuracy of the approximation is to look at Euler equation errors. Figure A.1 plots histograms of the log10 Euler equation errors, $\log_{10}\left|E_t\left[M_{t+1}\left(\alpha K_{t+1}^{\alpha-1} + 1 - \delta\right)\right] - 1\right|$, under the projection solution and the log-linear solution at each date in the simulation.

Figure B.2: Log10 Euler equation error densities [two densities: Projection and Log-linearization]

Note: Densities of Euler equation errors under the two solution methods. The log errors are defined as $\log_{10}\left(\left|E[M_{t+1}R_{k,t+1}] - 1\right|\right)$. Densities are estimated using a kernel smoother on simulated data. In both cases, the model used is the benchmark single-shock model with EZ-habit preferences and constant labor supply.

B.4 Details of return forecasting

B.4.1 The method from Lettau and Ludvigson (2001)

If consumption and wealth are cointegrated, then we have the relationship

$$c_t = \zeta w_t + \xi_t \tag{B.79}$$

where ζ is a parameter, and $\xi_t$ is a mean-zero, stationary, and not necessarily i.i.d. error term. If we observed wealth, ζ and $\xi_t$ could be directly estimated. We do not observe wealth, though, especially the human component. Lettau and Ludvigson (2001) therefore use the approximation

$$w_t = \omega a_t + (1-\omega)hu_t \tag{B.80}$$

where $a_t$ is asset wealth and $hu_t$ human wealth. This equation simply says that log aggregate wealth is equal to a weighted sum of log asset and log human wealth. Since the level of aggregate wealth is equal to the sum of the levels of asset and human wealth, the approximation is valid as long as the shares of asset and human wealth in aggregate wealth are stationary and not too variable. The fact that labor's share of income has been stationary in the post-war US data makes this assumption reasonable.

Finally, we assume that labor income, $y_t$, can be viewed as the dividend from human wealth and that the dividend/price ratio for human wealth is stationary. That is,

$$y_t = g + hu_t - \mu_t \tag{B.81}$$

where g is a parameter and $\mu_t$ is a mean-zero stationary error term. This implies that

$$w_t = \omega a_t + (1-\omega)y_t + (1-\omega)g + \mu_t \tag{B.82}$$

$$c_t = \zeta\omega a_t + \zeta(1-\omega)y_t + \zeta(1-\omega)g + \zeta\mu_t + \xi_t \tag{B.83}$$

Since $\xi_t + \zeta\mu_t$ is mean-zero and stationary, regardless of any correlation between $\xi_t$ and $\mu_t$, the variables $c_t$, $a_t$, and $y_t$ are jointly cointegrated. The parameters ζ, ω, and g can be estimated through standard methods for cointegrated models. As Lettau and Ludvigson point out, the estimation of these parameters is superconsistent, converging at a rate proportional to the sample size, so these parameters can be taken as known with certainty in any subsequent analyses (in particular, stock return forecasts).

I follow Lettau and Ludvigson in referring to the cointegrating residual, $\zeta\mu_t + \xi_t = c_t - \zeta\omega a_t - \zeta(1-\omega)y_t - \zeta(1-\omega)g$, as $cay_t$, and I refer to $\omega a_t + (1-\omega)y_t$ as $ay_t$. $ay_t$ is an estimate of total wealth derived from data on consumption, asset wealth, and labor income, taking advantage of an assumed cointegrating relationship between the three variables. I estimate the parameters using standard maximum likelihood methods.

B.4.2 Sensitivity analysis for return forecasting

The results in section 2.4.2 depend on choices for two parameters: the EIS and the persistence of risk aversion. Tables B.2 and B.3 report the ratio of the $R^2$ for excess value to that for cay for 1, 5, 10, and 20-quarter returns across a variety of choices for the EIS and the persistence of risk aversion.

Table B.2 varies the EIS between 0.75 and 10. The numbers in bold represent points where cay outperforms $\hat{\alpha}$.
When the EIS is greater than 1, cay only ever outperforms at the 1-quarter horizon, and then only if the EIS is set to 10. With an EIS less than 1, though, cay always has an $R^2$ substantially larger than that of $\hat{\alpha}$. Moreover, the sign on $\hat{\alpha}$ in the return regressions flips. Intuitively, this is because in the construction of $\hat{v}$, when the EIS is less than 1, the weight on aggregate wealth is negative. The theory would predict that high risk aversion is associated with low returns, but with the EIS less than 1, $\hat{\alpha}$ and future returns are actually positively correlated.

Table B.3 presents $R^2$ ratios for the same set of regressions, but now varying the persistence of risk aversion. Across a fairly wide range of autocorrelations, $\hat{\alpha}$ outperforms cay at most horizons. The best performance is found with an annual autocorrelation of 0.9, which corresponds to φ = 0.974. Even with an autocorrelation as low as 0.65 (φ = 0.9), though, $\hat{\alpha}$ performs nearly as well as cay. As with the EIS, the place where cay is most likely to outperform is with 1-quarter returns.

Table B.4 lists $R^2$s for cay, PE, and $\hat{\alpha}$ for pre and post-1980 samples.

B.4.3 Out-of-sample forecasting regressions

An alternative to the in-sample regressions studied in the main text is out-of-sample tests of forecasting power. I consider the mean squared forecast error (MSFE) based tests analyzed in Clark and McCracken (2001, 2005) and Clark and West (2007). Suppose we want to test whether a single variable, $x_t$, forecasts stock returns, $r_t$, against the null that $r_t$ is i.i.d. (the methods used here apply to any null model that is nested; i.e. they are appropriate for asking whether $x_t$ has marginal forecasting power when added to some other model). The forecast horizon can be any length. Therefore, denote $r_{t,t+j} \equiv \sum_{\tau=t}^{t+j} r_\tau$.
We compare the residuals from the null model, $e_{1t,t+j} \equiv r_{t,t+j} - \hat{\beta}_{0,t}$ (for an estimated constant mean $\hat{\beta}_{0,t}$ using data prior to date t), to the residuals from the alternative model, $e_{2t,t+j} \equiv r_{t,t+j} - \hat{\beta}_{0,t} - \hat{\beta}_{1,t}x_{t+j-1}$ (where $\hat{\beta}_{1,t}$ is a regression coefficient estimated on the data from τ = 0 to τ = t − 1). The out-of-sample regressions begin after the first 20 percent of the sample. The measure of the difference in MSFE is

$$f_{t,t+j} \equiv e_{1t,t+j}^2 - e_{2t,t+j}^2 + \left(e_{1t,t+j} - e_{2t,t+j}\right)^2 \tag{B.84}$$

Figure B.3: Relative $R^2$s for varying EIS

Span           EIS=0.1   0.25   0.75   1.25   1.5    2      10
1 quarter      0.39      0.26   0.29   1.07   1.10   1.08   0.96
5 quarters     0.44      0.28   0.29   1.16   1.21   1.21   1.09
10 quarters    0.61      0.42   0.31   1.41   1.49   1.49   1.37
20 quarters    1.28      1.02   0.21   1.92   2.08   2.15   2.09

Note: This table lists the ratio of the $R^2$ for a univariate regression of long-horizon returns on estimated risk aversion to the $R^2$ for cay. Values less than 1 mean that cay has the higher $R^2$. The span in quarters is listed in the left hand column. The top row gives the EIS. The EIS is used to calculate household value and risk aversion.

Figure B.4: Relative $R^2$s for varying persistence of risk aversion

Span           Autocorr.=0.95   0.9    0.85   0.8    0.75   0.7
1 quarter      0.86             1.27   1.10   0.90   0.75   0.64
5 quarters     1.03             1.44   1.21   0.97   0.78   0.64
10 quarters    1.17             1.73   1.49   1.20   0.99   0.82
20 quarters    1.24             2.24   2.08   1.76   1.48   1.27

Note: This table lists the ratio of the $R^2$ for a univariate regression of long-horizon returns on estimated risk aversion to the $R^2$ for cay. Values less than 1 mean that cay has the higher $R^2$. The span in quarters is listed in the left hand column. The top row gives the annual autocorrelation of risk aversion.

Figure B.5: $R^2$s from pre and post-1980 univariate return forecasting regressions

pre-1980       1q     5q     10q    20q
Estim. RRA     0.10   0.28   0.27   0.38
cay            0.10   0.22   0.16   0.13
P/D            0.03   0.25   0.27   0.39

post-1980      1q     5q     10q    20q
Estim. RRA     0.03   0.18   0.48   0.56
cay            0.03   0.15   0.36   0.33
P/D            0.03   0.08   0.19   0.29

Note: $R^2$s from univariate regressions of long-horizon stock returns on estimated risk aversion, cay, and the price/dividend ratio.

Under the null, the MSFE for the $e_1$ model tends to be smaller than the MSFE for the $e_2$ model because the $e_2$ model has added noise due to the extraneous predictor. Intuitively, model $e_1$ correctly imposes the constraint that $\beta_1 = 0$ under the null. The term $\left(e_{1t,t+j} - e_{2t,t+j}\right)^2$ is essentially a correction for this effect.

When the forecast horizon is more than a single observation, $f_{t,t+j}$ is serially correlated. To correct for this, we divide by a consistent estimate of its long-run variance (spectral density at frequency zero). Following Clark and West (2007), I use the Newey–West measure with a lag window of 1.5 × j. Denote this measure of the long-run variance as $S_{f_{t,t+j}}$. The long-run variance corrects for the fact that the forecast errors from overlapping samples will be serially correlated. Clark and McCracken tabulate the critical values of the statistic $(T-j)\sum_{t=1}^{T-j} f_{t,t+j}/S_{f_{t,t+j}}$.

In the main text, $\hat{\alpha}$ is calculated using full-sample information. In particular, we need to calculate the cointegrating relationship between consumption, labor income, and financial wealth. We also need to know the average growth rate of value. For the out-of-sample forecasts, all of those parameters are estimated using only backward-looking information. The only possible source of look-ahead bias here would be data revisions.

The top panel of figure B.2 plots the values of the statistics using $\hat{\alpha}_t$ as the predictor against a null of a constant expected equity return for horizons from 1 to 20 quarters. We can easily reject the null at the 5 percent level at all horizons and at the 1 percent level for 2–13 quarter horizons.

B.4.3.1 Bootstrapping

A major concern with predictive regressions is that asymptotic distribution theory is often a poor guide to small-sample behavior. A simple way to deal with that concern is to use a bootstrap to construct confidence intervals for the test statistics. I construct bootstrap samples in the following way. I select bootstrap samples of stock returns and growth rates of consumption, asset wealth, and labor income. I then construct level series for consumption, wealth, and income, and calculate $\hat{\alpha}$ using purely backward-looking information as above. Finally, I construct the test statistic from above for each bootstrapped sample at each horizon from 1 to 20 quarters. I bootstrap 10,000 samples of data. The top panel of figure B.2 plots the 95th and 99th percentiles of the bootstrapped test statistics, and the out-of-sample forecasting power is still significant at the 5 percent level.

Figure B.6: Out-of-sample test statistics [Top panel: test statistic with bootstrapped and asymptotic 1% and 5% critical values. Bottom panel: test statistics for estimated risk aversion above cay and for cay above estimated risk aversion, with asymptotic 1% and 5% critical values. Horizontal axis: forecast horizon (quarters), 1 to 20.]

Note: Out-of-sample test statistics from Clark and McCracken (2001, 2005) based on the reduction in out-of-sample RMSE. Estimated risk aversion depends on the cointegrating model used to estimate cay. The top panel tests whether estimated risk aversion has marginal forecasting power against a null of a constant-mean model for returns. The cointegrating vector is reestimated in each period using only backward-looking information. The bottom panel tests adding estimated risk aversion to a null model including a constant and cay and vice versa.
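The out-of-sample comparison described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the results: the function names, the simulated data, and the t-type scaling of the statistic (the mean of f times the square root of the number of forecasts, divided by the square root of the long-run variance) are all hypothetical choices, and the normalization may differ from the one tabulated by Clark and McCracken. The training window for each forecast only uses returns that have been fully realized.

```python
import math
import random


def ols_fit(x, y):
    """Return (intercept, slope) from a univariate least-squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def adjusted_loss_differentials(x, y, horizon=1, start_frac=0.2):
    """Recursive out-of-sample terms f = e1^2 - e2^2 + (e1 - e2)^2.

    y[t] is the horizon-period return forecast with x[t]. A forecast for
    date t only uses pairs (x[s], y[s]) with s <= t - horizon, so every
    return in the training window has already been realized.
    """
    start = max(int(start_frac * len(y)), horizon + 2)
    f = []
    for t in range(start, len(y)):
        xs, ys = x[: t - horizon + 1], y[: t - horizon + 1]
        mean_null = sum(ys) / len(ys)          # null model: constant mean
        b0, b1 = ols_fit(xs, ys)               # alternative: constant + predictor
        e1 = y[t] - mean_null
        e2 = y[t] - (b0 + b1 * x[t])
        f.append(e1 ** 2 - e2 ** 2 + (e1 - e2) ** 2)
    return f


def long_run_variance(f, lags):
    """Newey-West (Bartlett kernel) estimate of the long-run variance of f."""
    n = len(f)
    mean = sum(f) / n
    d = [v - mean for v in f]
    lrv = sum(v * v for v in d) / n
    for k in range(1, min(lags, n - 1) + 1):
        gamma_k = sum(d[t] * d[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    return lrv


def oos_statistic(x, y, horizon=1):
    """t-type statistic (assumed scaling) with a lag window of 1.5 * horizon."""
    f = adjusted_loss_differentials(x, y, horizon)
    lrv = long_run_variance(f, int(1.5 * horizon))
    return math.sqrt(len(f)) * (sum(f) / len(f)) / math.sqrt(lrv)


# Simulated example in which x genuinely forecasts y (hypothetical data).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
stat = oos_statistic(x, y)
print(round(stat, 2))
```

A bootstrap version would recompute `oos_statistic` on resampled data and compare the observed value with the upper percentiles of the resampled statistics.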
B.4.3.2 $\hat{\alpha}$ versus cay

We can also use the out-of-sample test to ask whether estimated risk aversion forecasts stock returns better than cay. The null model is now one where stock returns depend on a constant and the lagged value of cay, and the encompassing alternative adds the lagged value of $\hat{\alpha}$. The bottom panel of figure A2 plots the test statistics. At every horizon, we can reject the null that $\hat{\alpha}$ does not improve the forecast using cay at the 5 percent level, and we can reject the null at the 1 percent level at every horizon longer than 1 quarter.

Figure B.2 also plots the statistic for a test of whether cay has any marginal predictive power above that of $\hat{\alpha}$. At horizons shorter than 8 quarters, we cannot reject the null that it does not. At longer horizons, though, there is evidence that both variables contain important information for forecasting stock returns.

C. APPENDIX TO CHAPTER 3

C.1 Results from the text

C.1.1 Price of a utility claim and the SDF

The utility claim pays $U_t U_{C,t}^{-1}$ as its dividend. We confirm that its cum-dividend price is $W_{U,t} = V_t^{1-\rho} B_t^{-1} U_{C,t}^{-1}/(1-\beta)$ by simply inserting this guess into the Euler equation,

$$W_{U,t} = U_t U_{C,t}^{-1} + E_t\left[W_{U,t+1}M_{t+1}\right] \tag{C.1}$$

$$\frac{V_t^{1-\rho} B_t^{-1} U_{C,t}^{-1}}{1-\beta} = U_t U_{C,t}^{-1} + E_t\left[\frac{V_{t+1}^{1-\rho} B_{t+1}^{-1} U_{C,t+1}^{-1}}{1-\beta}\,\beta\,\frac{B_{t+1}U_{C,t+1}}{B_t U_{C,t}}\,\frac{V_{t+1}^{\rho-\alpha_t}}{\left(E_t V_{t+1}^{1-\alpha_t}\right)^{\frac{\rho-\alpha_t}{1-\alpha_t}}}\right] \tag{C.2}$$

$$V_t^{1-\rho} = (1-\beta)B_t U_t + \beta E_t\left[V_{t+1}^{1-\rho}\,\frac{V_{t+1}^{\rho-\alpha_t}}{\left(E_t V_{t+1}^{1-\alpha_t}\right)^{\frac{\rho-\alpha_t}{1-\alpha_t}}}\right] \tag{C.3}$$

$$V_t^{1-\rho} = (1-\beta)B_t U_t + \beta\left(E_t V_{t+1}^{1-\alpha_t}\right)^{\frac{1-\rho}{1-\alpha_t}} \tag{C.4}$$

The last line shows that the guess for the price of the utility claim was in fact correct, since the Euler equation is guaranteed to hold.

Next, we consider the return on the utility claim.
We have −1 −1 Vt+1 Bt+1 UC,t+1 / (1 − β) 1− ρ −1 −1 −1 Bt UC,t / (1 − β) − Ut UC,t 1− ρ Vt−1 Bt+1 UC,t+1 1− ρ Bt UC,t Vt − (1 − β) Bt Ut 1− ρ Vt+1 Bt+1 UC,t+1 −1 1− ρ RU,t+1 = Vt (C.5) −1 = =  ρ1−αρt − Vt+1 1− ρ 1− ρ 1− α t (C.6) (C.7) β     Et Vt1−αt +1 Et Vt1−αt +1 ρ−αt 1− ρ ρ−αt 1− ρ 1− ρ 1− α Bt UC,t    = RU,t+1 β UC,t+1 UC,t ρ−αt 1− ρ Bt+1 Bt ρ−αt 1− ρ (C.8) Now substitute the return into the SDF: ρ−αt ρ−αt B U 1− = β t+1 C,t+1 RU,tρ+1 β 1−ρ Bt UC,t 1− α t 1− ρ Mt + 1 UC,t+1 UC,t 1− RU,tρ 1 + ρ−αt ρ−αt 1− ρ Bt+1 Bt ρ−αt 1− ρ (C.9) (C.10) =β UC,t+1 Bt+1 UC,t Bt 1− α t 1− ρ This is the formula from the text. C.1.2 First-order condition for wage setting 0= ∗ Et ∗ = Et k =0 ∞ ∑ ∞ k ξw 1 + θw ψt+k θw Wt+k ( j) Wt+k θ − 1+ww θ Nt+k − λt+k Nt+k ( j) Wt+ j ( j) (C.11) k =0 k ∑ ξw dVt dVt Wt+k ( j) + dNt+k ( j) dCt+k Pt (1 + θ w ) − θ w dVt Wt+k ( j) Nt+k ( j) dCt+k Pt (C.12) ∗ = Et ∗ = Et k =0 ∞ k =0 k ∑ ξw k ∑ ξw ∞ dVt dVt Wt+k ( j) Nt+k ( j) (1 + θ w ) + dNt+k ( j) dCt+k Pt dVt /dCt+k dVt /dCt dVt /dNt+k ( j) W ( j) Nt+k ( j) (1 + θ w ) + t + k dVt /dCt+k Pt (C.13) (C.14) 176 C.2 Approximation method This section proceeds as follows. First, I fix notation for a general set of equilibrium conditions. Next, I describe the specifics of how to solve for the model’s equilibrium dynamics. Third, I show that the essentially affine approximation method delivers bondpricing formulas in the essentially affine class of Duffee (2002). C.2.1 Equilibrium conditions Denote the vector of the variables in the model (including the exogenous processes) as Xt . In addition to the various variables described above that track the state of the economy and the shocks, Xt will include the price/dividend ratio of the utility portfolio, the return on the utility portfolio, and the marginal utility of consumption. The vector of fundamental shocks in the model is denoted ε t ≡ ε mp,t , ε z,t , ε b,t , ε µ,t , ε g,t , ε p,t , ε w,t , ε α,t , ε π ∗,t . 
The equations determining the equilibrium of the model take the form

$$0 = G\left(X_t, X_{t+1}, \sigma\varepsilon_{t+1}\right) \tag{C.15}$$

where the expectation operator may appear in the function G. There is one equation for each variable in the model. The new parameter σ is the usual parameter used in perturbation approximations that controls the variance of the shocks. In the true model, σ = 1, but we will consider an approximation around the point σ = 0.

The equations G can be divided into two types: those that do not involve taking expectations over the SDF and those that do.

$$G\left(X_t, X_{t+1}, \sigma\varepsilon_{t+1}\right) = \begin{bmatrix} D\left(X_t, X_{t+1}, \sigma\varepsilon_{t+1}\right) \\ E_t\left[M\left(X_t, X_{t+1}, \sigma\varepsilon_{t+1}\right)F\left(X_t, X_{t+1}, \sigma\varepsilon_{t+1}\right)\right] \end{bmatrix} \tag{C.16}$$

where D and F are vector-valued functions and M is the (scalar-valued) stochastic discount factor. Note that this formulation does not actually restrict F. Specifically, suppose there were a set of equilibrium conditions $1 = E_t h\left(x_t, x_{t+1}, \sigma\varepsilon_{t+1}\right)$, i.e. conditions that do not involve the SDF. We could simply say that $F\left(x_t, x_{t+1}, \sigma\varepsilon_{t+1}\right) \equiv h\left(x_t, x_{t+1}, \sigma\varepsilon_{t+1}\right)/M\left(x_t, x_{t+1}, \sigma\varepsilon_{t+1}\right)$. Note, though, that in this model, all expectational equations involve the SDF.

For the equations that do not involve the SDF, I use standard perturbation methods and simply take a log-linear approximation. We approximate D as

$$0 = \log\left(D\left(\exp(x_t), \exp(x_{t+1}), \sigma\varepsilon_{t+1}\right) + 1\right) \tag{C.17}$$

$$0 \approx d_0 + d_x\hat{x}_t + d_{x'}\hat{x}_{t+1} + d_\varepsilon\sigma\varepsilon_{t+1} \tag{C.18}$$

where the terms $d_0$, $d_x$, $d_{x'}$, and $d_\varepsilon$ are coefficients from a Taylor-series approximation and

$$x_t \equiv \log X_t, \qquad \hat{x}_t \equiv \log X_t - \log\bar{X}$$

C.2.2 Linearizing the Euler equations

I now show that if we log-linearize the function F, we can transform (3.40) into a linear condition that can be solved alongside the remaining equations. $M_{t+1}$ will not even be log-linear in the state variables, but we will be able to state the equilibrium conditions as a set of linear expectational difference equations.
First, guess that the approximated equilibrium dynamics of the model take the form ˆ ˆ xt+1 = C + Φ xt + Ψσε t+1 (C.19) ¯ ˆ where xt ≡ log ( Xt ) − log ( X ). We confirm in the end that the solution to the approximated equilibrium conditions actually takes this form. Next, define the matrices ΓUC , Γb , and Γr as matrices that select individual elements of xt such that ˆ ˆ ˆ ˆ ˆ ˆ uC,t = ΓUC xt , bt = Γb xt , rU,t = Γr xt (C.20) where lower-case letters with circumflexes denote log deviations from non-stochastic steady¯ ˆ state values. That is, uC,t ≡ log UC,t − log UC , etc.For the sake of arithmetical convenience, also define an auxiliary variable ζ t ≡ 1− α t 1− ρ . 178 C.2.2.1 The essentially affine SDF U= η ¯ 1− η Ct Ct−1 1− ρ 1−ρ + Zt 1− ρ ϕ1 α (1− ρ ) ¯ ( H − Nt ) H 1−ρ UC = ηct η (1−ρ)−1 (1−η )(1−ρ) η (1−ρ)−1 (1−η )(1−ρ) Zt Zt−1 c t −1 = ηct The SDF is η (1−ρ)−1 (1−η )(1−ρ) −(1−η )(1−ρ) (1−η )(1−ρ) −ρ Zt Zt Zt−1 c t −1 Mt + 1 = β 1− α t 1− ρ UC,t+1 1 − βBt+1 Bt UC,t 1 − βBt 1− α t 1− ρ 1− RU,tρ 1 + ρ−αt (C.21) Mt+1 is completely log-linear in the endogenous variables UC,t and RU,t+1 , but it is not log-linear in Bt and Bt+1 . If not for these terms, we could use the exact formula for Mt+1 in what follows. Since those terms are non-linear, I use the approximations, β 1 − β exp (bt+1 ) 1 bt − bt + 1 ≈ exp (−bt ) − β 1−β 1−β ¯ ˆ mt+1 ≡ m + ζ t ∆uC,t+1 + (1) (1) log (C.22) ˆ + (ζ t − 1) rU,t+1 (C.23) 1 β bt − bt + 1 1−β 1−β mt+1 is first-order accurate for mt+1 , hence the superscript. In the continuous time limit, this formula becomes exact. 
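The first-order approximation in (C.22) can be checked numerically. The sketch below assumes the expansion is taken around $b_t = b_{t+1} = 0$ and uses an illustrative value of β; the function names and numbers are hypothetical and are not taken from the estimated model. It compares the exact log term with its linear counterpart and shows that the gap shrinks roughly with the square of the deviations, as expected for a first-order approximation.

```python
import math


def exact_term(b_t, b_next, beta):
    """Exact value of log[(1 - beta*exp(b_{t+1})) / (exp(-b_t) - beta)]."""
    return math.log((1.0 - beta * math.exp(b_next)) / (math.exp(-b_t) - beta))


def linear_term(b_t, b_next, beta):
    """First-order approximation: b_t/(1-beta) - beta*b_{t+1}/(1-beta)."""
    return (b_t - beta * b_next) / (1.0 - beta)


beta = 0.9  # illustrative discount factor (assumption)

# Deviations of two different sizes; the second set is ten times smaller.
err_large = abs(exact_term(0.01, -0.005, beta) - linear_term(0.01, -0.005, beta))
err_small = abs(exact_term(0.001, -0.0005, beta) - linear_term(0.001, -0.0005, beta))

print(err_large, err_small)
```

Shrinking the deviations by a factor of ten reduces the error by roughly a factor of one hundred, which is the signature of an approximation that is exact to first order.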
The Euler equation for the return on the utility portfolio is ˆ 1 = Et exp ζ t ∆uC,t+1 + 1 β bt − bt + 1 1−β 1−β ˆ + ζ t rU,t+1 (C.24) Taking logs of both sides and taking advantage of log-normality gives 1 β ˆ bt − Et bt+1 + Et rU,t+1 1−β 1−β ΓUC − β Γ + Γr 1−β b (C.25) ˆ ˆ 0 = Et uC,t+1 − uC,t + 1 β + ζ t σ2 ΓUC − Γ + Γr ΨΨ 2 1−β b 179 Moreover, this implies that the approximated SDF can be written as mt+1 = ζ t ΓUC − (1) β Γ + Γr 1−β b ¯ Ψσε t+1 − Γr (Φxt + Ψσε t+1 ) − r ΓUC − β Γ + Γr 1−β b (C.26) β 1 2 − ζ t σ2 ΓUC − Γ + Γr ΨΨ 2 1−β b which is the essentially affine form from the text. The reason that this form is useful is that any time the SDF is essentially affine, we can obtain an exact expression for Et exp (mt+1 + f 0 + f x xt + f x xt+1 ). It also means that we can price any asset whose payoffs are linear in the endogenous variables, including real and nominal bonds. C.2.2.2 Approximation to F Next, we take a first-order Taylor approximation to log F such that log F ( xt , xt+1 , ε t+1 ) ≈ f 0 + f x xt + f x xt+1 , giving (1) ˆ 1 = Et exp mt+1 + f x xt + f x xt+1 Taking logs and evaluating the expectation yields 1 ˆ 0 = − Et rU,t+1 + f x xt + f x Et xt+1 + σ2 (−Γr + f x ) ΨΨ (−Γr + f x ) 2 β 2 + ζ t σ ΓUC − Γ + Γr ΨΨ (−Γr + f x ) 1−β b (C.27) (C.28) (C.28) is the equation that we ultimately place into the system to be solved. It is completely linear in both xt and ζ t (equivalently, αt ). C.2.2.3 Solution Since every equation in the system is now linear in the variables of the model, we can solve the system for the parameters Φ and Ψ from (C.19). Specifically, we solve the 180 following system, 1 ˆ 0 = − Et rU,t+1 + f x xt + f x Et xt+1 + σ2 (−Γr + f x ) ΨΨ (−Γr + f x ) 2 β + ζ t σ2 ΓUC − Γ + Γr ΨΨ (−Γr + f x ) 1−β b 0 = d0 + d x xt + d x xt+1 + dε σε t+1 (C.29) (C.30) where σ = 1 in the stochastic equilibrium that we approximate. This system can be solved using, for example, Sims’ (2001) Gensys algorithm. 
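The steps above that take logs and evaluate the expectation rely on the standard log-normal moment formula, $E[\exp(a + b\varepsilon)] = \exp(a + \frac{1}{2}b^2)$ for a standard normal ε. The sketch below verifies that formula by numerical integration for a scalar shock; the function names and the values of a and b are illustrative assumptions and do not correspond to parameters of the model.

```python
import math


def lognormal_mean_numeric(a, b, half_width=12.0, n=24001):
    """Trapezoid-rule value of E[exp(a + b*eps)] for eps ~ N(0, 1)."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        e = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(a + b * e - 0.5 * e * e)
    return total * h / math.sqrt(2.0 * math.pi)


def lognormal_mean_closed_form(a, b):
    """Closed form used when evaluating expectations of an exponential-affine SDF."""
    return math.exp(a + 0.5 * b * b)


a, b = -0.02, 0.3  # illustrative values (assumption)
numeric = lognormal_mean_numeric(a, b)
closed = lognormal_mean_closed_form(a, b)
print(numeric, closed)
```

In the vector case the scalar $b^2$ is replaced by a quadratic form in the shock loadings, which is where the terms involving Ψ in the equations above come from.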
The last wrinkle here is that we cannot simply insert (C.28) into the set of equations to be solved since it involves the matrix Ψ, which is one of the unknown structures we are solving for. I deal with this with a simple fixed-point iteration: I begin with the equations that we obtain from perturbation, 0 = −Γr Et xt+1 + f x xt + f x Et xt+1 ˆ ˆ 0 = d0 + d x xt + d x xt+1 + dε σε t+1 which will deliver an initial value of Ψ, denoted Ψ(1) . We then use Ψ(1) to change the equilibrium condition to take the form 1 ¯ 0 = −Γr Φxt − r + f x xt + f x Φxt + σ2 (−Γr + f x ) Ψ(1) Ψ(1) (−Γr + f x ) 2 β + ζ t σ2 ΓUC − Γ + Γr Ψ(1) Ψ(1) (−Γr + f x ) 1−β b ˆ ˆ 0 = d0 + d x xt + d x xt+1 + dε σε t+1 (C.31) (C.32) which delivers a value Ψ(2) . Then simply iterate to convergence. I treat parameter sets for which the iteration diverges as inadmissible, setting the marginal likelihood to zero. 181 C.2.3 Bond pricing To solve for bond prices, we guess that bond prices are log-linear in the vector of state variables, so that pn,t = An + Bn xt (C.33) where pn,t is the price of a zero-coupon bond that matures on date t + n and pays 1 unit of consumption. We can also write the price of a nominal bond as p$ . Using the formula for n,t the SDF from above, we have ¯ + Γr Ψε t+1 − Γr (Φxt + Ψε t+1 ) − r  ζ t ΓUC −  = Et exp  + An−1 + ( Bn−1 − Γπ ) (Φxt + Ψε t+1 )   β 1 2 − 2 ζ t σ2 ΓUC − 1− β Γb + Γr ΨΨ (ΓUC + Γb + Γr ) β 1− β Γ b        (C.34) exp p$ n,t ¯ = −r − Γr Φxt + An−1 + ( Bn−1 − Γπ ) Φxt 1 + σ2 ( Bn−1 − Γπ − Γr ) ΨΨ ( Bn−1 − Γπ − Γr ) 2 β Γ + Γr ΨΨ ( Bn−1 − Γπ − Γr ) Γζ xt + σ2 ΓUC − 1−β b Matching coefficients gives 1 ¯ An = −r + An−1 + σ2 ( Bn−1 − Γπ − Γr ) ΨΨ ( Bn−1 − Γπ − Γr ) (C.37) 2 β Γ + Γr ΨΨ ( Bn−1 − Γπ − Γr ) Γζ (C.38) Bn = −Γr Φ + ( Bn−1 − Γπ ) Φ + ΓUC − 1−β b (C.35) (C.36) C.3 Estimation Much of the analysis discusses the behavior of the model around the posterior mode (i.e. 
the peak of the posterior distribution; it is also a maximum-likelihood estimate penalized by the prior distribution). I also start the Metropolis–Hastings chain from that point.

To search for the posterior mode, I begin by running a genetic algorithm on a population of 60 points drawn from the prior distribution. The genetic algorithm searches the parameter space by mixing parameter sets in the population and also allowing random mutations. After 30 iterations of the genetic algorithm, I take the point in the population with the highest posterior density and use it as the starting point for Chris Sims' CSMINWEL algorithm, which is a derivative-based hill-climbing algorithm that is designed for DSGE models. When CSMINWEL gets stuck, I also try the standard simplex algorithm. I ran this combined search 2,500 times (each search takes roughly an hour, so access to a large computing cluster was essential).

The point that I am calling the posterior mode was found to be the peak in fewer than 100 of the searches. In other words, it is extremely difficult to find the peak of the posterior likelihood. In general, I found that it was far easier to find the posterior mode when risk aversion or the inflation target was held fixed, and easier still when bond prices were also dropped from the estimation. Even though the priors help to add curvature to the posterior likelihood surface, I still find many local maxima, a problem that plagues the bond-pricing literature. Furthermore, since the model is so highly constrained, there is no straightforward way to use the more recent estimation algorithms for affine term structure models proposed by, for example, Hamilton and Wu (2012) and Joslin, Singleton, and Zhu (2010).

I simulate the posterior distribution using the adaptive random-walk Metropolis–Hastings algorithm of Haario, Saksman, and Tamminen (2001). I initialize the chain at the posterior mode.
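An adaptive random-walk Metropolis–Hastings sampler in the spirit of Haario, Saksman, and Tamminen (2001) can be sketched as follows. This is a deliberately simplified, scalar illustration and not the sampler used for the estimation: the target density, the function name, the chain length, and the small variance floor `eps` are assumptions made for the example, and the proposal scaling $2.38^2/d$ is applied with d = 1 rather than the dimension of the model's parameter vector.

```python
import math
import random


def adaptive_metropolis(log_post, x0, init_var, n_draws, adapt_after, seed=0):
    """Scalar adaptive random-walk Metropolis sampler (illustrative sketch).

    The proposal variance starts at scale * init_var. From iteration
    adapt_after onward it is reset on every draw to scale times the
    sample variance of the chain so far. Returns (chain, acceptance rate).
    """
    rng = random.Random(seed)
    scale = 2.38 ** 2 / 1          # 2.38^2 / d with d = 1 in this sketch
    eps = 1e-10                    # small floor keeping the variance positive
    x, lp = x0, log_post(x0)
    mean, m2 = 0.0, 0.0            # running mean and sum of squared deviations
    prop_var = scale * init_var
    chain, accepted = [], 0
    for i in range(1, n_draws + 1):
        cand = x + rng.gauss(0.0, math.sqrt(prop_var))
        lp_cand = log_post(cand)
        # Metropolis acceptance step for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
            accepted += 1
        chain.append(x)
        # Welford update of the chain's running mean and variance.
        delta = x - mean
        mean += delta / i
        m2 += delta * (x - mean)
        if i >= adapt_after:
            prop_var = scale * (m2 / (i - 1) + eps)
    return chain, accepted / n_draws


def log_post(v):
    """Hypothetical target: a normal density with mean 1 and std. dev. 2."""
    return -0.5 * ((v - 1.0) / 2.0) ** 2


chain, rate = adaptive_metropolis(log_post, x0=1.0, init_var=1.0,
                                  n_draws=20000, adapt_after=1000)
print(sum(chain) / len(chain), rate)
```

In a multivariate application the scalar variance becomes a covariance matrix and proposals are drawn with its Cholesky factor; the updating logic is otherwise the same.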
For the proposal distribution, I begin with a normal distribution whose variance matrix is equal to that of the prior, multiplied by $2.38^2/d$, where d is the dimension of the parameter vector (49); this is the optimal scaling factor of Gelman, Roberts, and Gilks (1996). After 10,000 iterations of the algorithm, I update the variance matrix of the proposal distribution to be equal to the observed variance matrix for the first 10,000 iterations of the chain. Subsequently, the variance is updated on each iteration using the sample variance of the chain up to the current iteration.¹ I achieve relatively rapid mixing this way. The full chain has 1,000,000 draws, but it mixes well even by 100,000 draws.

¹ A more common method in the DSGE literature is to use the Hessian of the posterior around the posterior mode to determine the variance of the proposal distribution. I have difficulty calculating the Hessian due to numerical instability.

BIBLIOGRAPHY

Abel, A. B. (1990): “Asset Prices under Habit Formation and Catching up with the Joneses,” The American Economic Review, Papers and Proceedings, 80(2), 38–42.

Abel, A. B. (1999): “Risk Premia and Term Premia in General Equilibrium,” Journal of Monetary Economics, 43(1), 3–33.

Abel, A. B., and O. J. Blanchard (1986): “The Present Value of Profits and Cyclical Movements in Investment,” Econometrica, 54(2), 249–273.

Abel, A. B., A. K. Dixit, J. C. Eberly, and R. S. Pindyck (1996): “Options, the Value of Capital, and Investment,” Quarterly Journal of Economics, 111(3), 753–777.

Alvarez, F., and U. J. Jermann (2005): “Using Asset Prices to Measure the Persistence of the Marginal Utility of Wealth,” Econometrica, 73(6), 1977–2016.

Andrews, D. W. (1993): “Tests for Parameter Instability and Structural Change with Unknown Change Point,” Econometrica, 61(4), 821–856.

Ang, A., and M. Piazzesi (2003): “No-Arbitrage Vector Autoregression of Term Structure Dynamics with Macroeconomic and Latent Variables,” Journal of Monetary Economics, 50, 745–787.

Atkeson, A., and L. E. Ohanian (2001): “Are Phillips Curves Useful for Forecasting Inflation?,” Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), 2–11.

Attanasio, O. P., P. K. Goldberg, and E. Kyriazidou (2008): “Credit Constraints in the Market for Consumer Durables: Evidence from Micro Data on Car Loans,” International Economic Review, 49(2), 401–436.

Baker, M., R. Greenwood, and J. Wurgler (2003): “The Maturity of Debt Issues and Predictable Variation in Bond Returns,” Journal of Financial Economics, 70(2), 261–291.

Bansal, R., and A. Yaron (2004): “Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles,” Journal of Finance, 59(4), 1481–1509.

Barberis, N., M. Huang, and T. Santos (2001): “Prospect Theory and Asset Prices,” Quarterly Journal of Economics, 116(1), 1–53.

Barclay, M. J., and C. W. Smith, Jr. (1995): “The Maturity Structure of Corporate Debt,” Journal of Finance, 50(2), 609–631.

Basu, S., J. G. Fernald, and M. S. Kimball (2006): “Are Technology Improvements Contractionary?,” American Economic Review, 96(5), 1418–1448.

Baxter, M., and M. J. Crucini (1993): “Explaining Saving–Investment Correlations,” American Economic Review, 83(3), 416–436.

Beeler, J., and J. Y. Campbell (2011): “The Long-Run Risks Model and Aggregate Asset Prices: An Empirical Assessment,” Working paper.

Bekaert, G., S. Cho, and A. Moreno (2010): “New Keynesian Macroeconomics and the Term Structure,” Journal of Money, Credit and Banking, 42(1), 33–62.

Berk, J. B., R. C. Green, and V. Naik (1999): “Optimal Investment, Growth Options, and Security Returns,” Journal of Finance, 54(5), 1553–1607.

Bernanke, B. S., and M. Gertler (1995): “Inside the Black Box: The Credit Channel of Monetary Policy Transmission,” Journal of Economic Perspectives, 9(4), 27–48.

Beveridge, S., and C. R. Nelson (1981): “A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the ‘Business Cycle’,” Journal of Monetary Economics, 7(2), 151–174.

Bloom, N. (2009): “The Impact of Uncertainty Shocks,” Econometrica, 77(3), 623–685.

Boldrin, M., L. J. Christiano, and J. D. M. Fisher (2001): “Habit Persistence, Asset Returns, and the Business Cycle,” American Economic Review, 91(1), 149–166.

Brunnermeier, M., and S. Nagel (2008): “Do Wealth Fluctuations Generate Time-Varying Risk Aversion? Micro-Evidence on Individuals’ Asset Allocation,” The American Economic Review, 98(3), 713–736.

Caballero, R. J. (1994): “Small Sample Bias and Adjustment Costs,” Review of Economics and Statistics, 76(1), 52–58.

Caldara, D., J. Fernandez-Villaverde, J. F. Rubio-Ramirez, and Y. Wen (2009): “Computing DSGE Models with Recursive Preferences,” Working paper.

Calvet, L. E., J. Y. Campbell, and P. Sodini (2009): “Fight Or Flight? Portfolio Rebalancing by Individual Investors,” The Quarterly Journal of Economics, 124(1), 301–348.

Calvet, L. E., and P. Sodini (2010): “Twin Picks: Disentangling the Determinants of Risk-Taking in Household Portfolios,” Working paper.

Campanale, C., R. Castro, and G. L. Clementi (2010): “Asset Pricing in a Production Economy with Chew-Dekel Preferences,” Review of Economic Dynamics, 13(2), 379–402.

Campbell, J. Y. (1987): “Stock Returns and the Term Structure,” Journal of Financial Economics, 18(2), 373–399.

Campbell, J. Y. (1994): “Inspecting the Mechanism: An Analytical Approach to the Stochastic Growth Model,” Journal of Monetary Economics, 33(3), 463–506.

Campbell, J. Y. (2003): “Consumption-Based Asset Pricing,” in Handbook of the Economics of Finance, vol. 1, pp. 803–887. Elsevier Science.
C AMPBELL , J. Y., AND J. H. C OCHRANE (1999): “By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior,” Journal of Political Economy, 107(2), 205–251. C AMPBELL , J. Y., AND A. D EATON (1989): “Why is Consumption so Smooth?,” Review of Economic Studies, 56(3), 357–373. 186 C AMPBELL , J. Y., M. L ETTAU , B. G. M ALKIEL , AND Y. X U (2001): “Have Individual Stocks Become More Volatile? an Empirical Exploration of Idiosyncratic Risk,” Journal of Finance, 56(1), 1–43. C AMPBELL , J. Y., AND N. G. M ANKIW (1989): “Consumption, Income and Interest Rates: Reinterpreting the Time Series Evidence,” in NBER Macroeconomics Annual, ed. by O. J. Blanchard, and S. Fischer. C AMPBELL , J. Y., AND R. J. S HILLER (1988): “The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors,” Review of Financial Studies, 1(3)(3), 195–228. (1991): “Yield Spreads and Interest Rate Movements: A Bird’s Eye View,” Review of Economic Studies, 58(3), 495–514. C ARROLL , C. D. (2002): “Portfolios of the Rich,” in Household Portfolios, ed. by L. Guiso, M. Haliassos, and T. Japelli. MIT Press. C HIRINKO , R. S. (1993): “Business Fixed Investment Spending: Modeling Strategies, Empirical Results, and Policy Implications,” Journal of Economic Literature, 31(4), 1875–1911. C HRISTIANO , L. J., M. E ICHENBAUM , AND R. V IGFUSBSON (2004): “The Response of Hours to a Technology Shock: Evidence Based on Direct Measures of Technology,” Journal of the European Economic Association, 2(2–3), 381–395. C HRISTIANO , L. J., M. T RABANDT, AND K. WALENTIN (2011): “DSGE Models for Mone- tary Policy Analysi,” in Handbook of Monetary Economics. North-Holland. C LARK , T. E., AND M. W. M C C RACKEN (2001): “"Tests of Equal Forecast Accuracy and Encompassing for Nested Models",” Journal of Econometrics, 105(1), 85–110. (2005): “Evaluating Direct Multistep Forecasts,” Econometric Reviews, 24(4), 369– 404. C LARK , T. E., AND K. D. 
W EST (2007): “Approximately Normal Tests for Equal Predictive Accuracy in Nested Models,” Journal of Econometrics, 138(1), 291–311. 187 C OCHRANE , J. H. (1991): “Production-Based Asset Pricing and the Link Between Stock Returns and Economic Fluctuations,” Journal of Finance, 46(1), 209–237. (1996): “A Cross-Sectional Test of an Investment-Based Asset Pricing Model,” Journal of Political Economy, 104(3), 572–621. (2005): “Financial Markets and the Real Economy,” NBER Working paper. (2008): “The Dog That Did Not Bark: A Defense of Return Predictability,” Review of Financial Studies, 21(4), 1533–1575. (2011): “Discount Rates,” Journal of Finance, 66(4), 1047–1108, AFA Presidential Address. C OCHRANE , J. H., AND M. P IAZZESI (2005): “Bond Risk Premia,” American Economic Re- view, 95(1), 138–160. C ONSTANTINIDES , G. M. (1990): “Habit Formation: A Resolution of the Equity Premium Puzzle,” The Journal of Political Economy, 98(3), 519–543. D ANTHINE , J.-P., J. B. D ONALDSON , AND R. M EHRA (1992): “The Equity Premium and the Allocation of Income Risk,” Journal of Economic Dynamics and Control, 16(3–4), 509– 532. D EW-B ECKER , I. (2011a): “Bond Pricing With a Time-varying Price Of Risk in an Estimated Medium-Scale Bayesian DSGE Model,” Working paper. (2011b): “A Model of Time-Varying Risk Premia with Habits and Production,” Working paper. (2012): “Essentially affine approximations for economic models,” Working paper. D OH , T. (2011): “Yield Curve in an Estimated Nonlinear Macro Model,” Journal of Economic Dynamics & Control. D UFFEE , G. R. (2002): “Term Premia and Interest Rate Forecasts in Affine Models,” Journal of Finance, 57(1), 405–443. 188 D YNAN , K. E. (1993): “How Prudent are Consumers?,” Journal of Political Economy, 101(6), 1104–1113. (2000): “Habit Formation in Consumer Preferences: Evidence from Panel Data,” American Economic Review, 90(3), 391–406. E BERLY, J., S. R EBELO , AND N. 
V INCENT (2009): “Investment and Value: A Neoclassical Benchmark,” NBER Working Paper. E DGE , R. M. (2000): “The Effect of Monetary Policy on Residential and Structures Investment Under Differential Project Planning and Completion Times,” Federal Reserve Board of Governors International Finance Discussion Papers No. 671. E PSTEIN , L. G., AND S. E. Z IN (1989): “Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework,” Econometrica, 57(4), 937–969. (1991): “Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: An Empirical Analysis,” The Journal of Political Economy, 99(2), 263– 286. FAMA , E. F., AND K. R. F RENCH (1989): “Business Conditions and Expected Returns on Stocks and Bonds,” Journal of Financial Economics, 25(1), 23–49. FAMA , E. F., AND G. W. S CHWERT (1977): “Human Capital and Capital Market Equilib- rium,” Journal of Financial Economics, 4(1), 95–125. FAZZARI , S. M., R. G. H UBBARD , AND B. C. P ETERSEN (1988): “Financing Constraints and Corporate Investment,” Brookings Papers on Economic Activity, 1988(1), 141–206. F ERNANDEZ -V ILLAVERDE , J., P. G UERRON , J. F. R UBIO -R AMIREZ , AND M. U RIBE (2011): “Risk Matters: The Real Effects of Volatility Shocks,” American Economic Review. F ERNANDEZ -V ILLAVERDE , J., AND J. R UBIO -R AMIREZ (2004): “Comparing dynamic equilibrium models to data: a Bayesian approach,” Journal of Econometrics, 123(1), 153–187. 189 F RANCIS , N., AND V. A. R AMEY (2005): “Is the Technology-Driven Real Business Cycle Hypothesis Dead? Shocks and Aggregate Fluctuations Revisited,” Journal of Monetary Economics, 52(8), 1379–1399. (2009): “Measures of Per Capita Hours and Their Implications for the TechnologyHours Debate,” Journal of Money, Credit, and Banking, 41(6), 1071–1097. F RAUMENI , B. (1997): “The Measurement of Depreciation in the U.S. National Income and Product Accounts,” Survey of Current Business, pp. 7–23. G ALI , J. 
(1999): “Technology, Employment, and the Business Cycle: Do Technology Shocks Explain Aggregate Fluctuations?,” American Economic Review, 89(1), 249–271. G ELFAND , A. E., AND D. K. D EY (1994): “Bayesian Model Choice: Asymptotics and Exact Calculations,” Journal of the Royal Statistical Society. Series B, 56(3), 501–514. G ELMAN , A., G. R OBERTS , AND W. G ILKS (1996): “Efficient Metropolis Jumping Rules,” in Bayesian Statistics, ed. by J. Bernardo, J. Berger, A. Dawid, and A. Smith, vol. 5. Oxford University Press. G ERTNER , R. (1993): “Game Shows and Economic Behavior: Risk-Taking on "Card Sharks",” Quarterly Journal of Economics, 108(2), 507–521. G EWEKE , J. (1999): “Using simulation methods for bayesian econometric models: inference, development,and communication,” Econometric Reviews, 18(1), 1–73. G ILCHRIST, S., AND E. Z AKRAJSEK (2007): “Investment and the Cost of Capital: New Evidence from the Corporate Bond Market,” Working Paper. G OMES , J. F., L. K OGAN , AND M. Y OGO (2009): “Durability of Output and Expected Stock Returns,” Journal of Political Economy, 117(5), 941–986. G OURIO , F. (2010): “Disaster Risk and Business Cycles,” Working Paper. G RAEVE , F. D., M. D OSSCHE , M. E MIRIS , H. S NEESSENS , AND R. W OUTERS (2010): “Risk premiums and macroeconomic dynamics in a heterogeneous agent model,” Journal of Economic Dynamics and Control, 34(9), 1680–1699. 190 G RAHAM , J. R., AND C. R. H ARVEY (2002): “How do CFOs Make Capital Budgeting and Capital Structure Decisions?,” Journal of Applied Corporate Finance, 15(1), 8–23. G REENWOOD , R., S. H ANSON , AND J. S TEIN (2009): “A Gap-Filling Theory of Corporate Debt Maturity Choice,” Working Paper. G ROSSMAN , S. J., AND R. J. S HILLER (1981): “The Determinants of the Variability of Stock Market Prices,” American Economic Review, Papers and Proceedings, 71(2), 222–227. G RUBER , J. (2006): “A Tax-Based Estimate of the Elasticity of Intertemporal Substitution,” NBER Working Paper 11945. 
G UEDES , J., AND T. O PLER (1996): “The Determinants of the Maturity of Corporate Debt Issues,” The Journal of Finance, 51(5), 1809–1833. G UISO , L., A. K. K ASHYAP, F. PANETTA , AND D. T ERLIZZESE (2002): “How Interest Sen- sitive is Investment? Very (when the data are well measured),” Working Paper. G UISO , L., P. S APIENZA , AND L. Z INGALES (2011): “Time-Varying Risk Aversion,” . G URKAYNAK , R. S., B. S ACK , AND E. S WANSON (2005): “The Sensitivity of Long-Term Interest Rates to Economic News: Evidence and Implications for Macroeconomic Models,” American Economic Review, 95(1), 425–436. G UVENEN , F. (2009): “A Parsimonious Macroeconomic Model for Asset Pricing,” Econometrica, 77(6), 1711–1750. H AARIO , H., E. S AKSMAN , AND J. TAMMINEN (2001): “An Adaptive Metropolis Algo- rithm,” Bernoulli, 7(2), 223–242. H ALL , R. E. (1988): “Intertemporal Substitution in Consumption,” Journal of Political Economy, 96(2), 339–357. H AMILTON , J. D., AND C. W U (2012): “Identification and Estimation of Gaussian Affine- Term-Structure Models,” Journal of Econometrics, Working Paper. H ANSEN , L. P., J. C. H EATON , AND N. L I (2008): “Consumption Strikes Back? Measuring Long-Run Risk,” Journal of Political Economy, 116(2), 260–302. 191 H ANSEN , L. P., AND R. J AGANNATHAN (1991): “Implications of Security Market Data for Models of Dynamic Economies,” Journal of Political Economy, 99(2), 225–262. H ASSETT, K. A., AND R. G. H UBBARD (2002): Handbook of Public Economicschap. Tax Policy and Business Investment, pp. 1293–1343. Elsevier Science B.V. H OUSE , C. L., AND M. D. S HAPIRO (2008): “Temporary Investment Tax Incentives: Theory with Evidence from Bonus Depreciation,” American Economic Review, 98(3), 737–768. H ULTEN , C. R., AND F. C. W YKOFF (1981): “The Measurement of Economic Depreciation,” in Depreciation, Inflation, and the Taxation of Income from Capital,, ed. by C. R. Hulten, pp. 81–125. The Urban Institute Press, Washington, D.C. J ACCARD , I. 
(2007): “Asset Returns and Labor Supply in a Production Economy With Habit Memory,” ECB Working Paper. J ERMANN , U. J. (1998): “Asset Pricing in Production Economies,” Journal of Monetary Economics, 41(2), 257–275. J OSLIN , S., K. J. S INGLETON , AND H. Z HU (2011): “A New Perspective on Gaussian Dy- namic Term Structure Models,” Review of Financial Studies, 24(3), 926–970. J UDD , K. L. (1999): Numerical Methods for Economists. MIT Press, Cambridge, MA. J USTINIANO , A., G. P RIMICERI , AND A. TAMBALOTTI (2010): “Investment Shocks and Business Cycles,” Journal of Monetary Economics, 57(2), 132–145. K ALTENBRUNNER , G., AND L. A. L OCHSTOER (Forthcoming): “Long-Run Risk Through Consumption Smoothing,” The Review of Financial Studies. K ASHYAP, A. K., AND J. C. S TEIN (2000): “What Do a Million Observations on Banks Say About the Transmission of Monetary Policy?,” American Economic Review, 90(3), 407–428. K IEFER , N. M., AND T. J. V OGELSANG (2005): “A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests,” Econometric Theory, 21(6), 1130–1164. K IEFER , N. M., T. J. V OGELSANG , AND H. B UNZEL (2000): “Simple Robust Testing of Regression Hypotheses,” Econometrica, 68(3), 695–714. 192 K ING , R. G., C. I. P LOSSER , AND S. T. R EBELO (1988): “Production, Growth and Business Cycles : I. The Basic Neoclassical Model,” Journal of Monetary Economics, 21(2–3), 195– 232. K REPS , D. M., AND E. L. P ORTEUS (1978): “Temporal Resolution of Uncertainty and Dy- namic Choice Theory,” Econometrica, 46(1), 185–200. L E R OY, S. F., AND R. D. P ORTER (1981): “The Present-Value Relation: Tests Based on Implied Variance Bounds,” Econometrica, 49(3), 555–574. L ETTAU , M. (2003): “Inspecting The Mechanism: Closed-Form Solutions for Asset Prices in Real Business Cycle Model,” The Economic Journal, 113(489), 550–575. L ETTAU , M., AND S. L UDVIGSON (2001): “Consumption, Aggregate Wealth, and Expected Stock Returns,” Journal of Finance, 56(3), 815–849. 
L ETTAU , M., AND H. U HLIG (2000): “Can Habit Formation be Reconciled with Business Cycle Facts?,” Review of Economic Dynamics, 3(1), 79–99. L ETTAU , M., AND J. A. WACHTER (2007): “Why Is Long-Horizon Equity Less Risky? A Duration-Based Explanation of the Value Premium,” Journal of Finance, 62(1), 55–92. L ONGSTAFF , F. A., AND E. S. S CHWARTZ (1992): “Interest Rate Volatility and the Term Structure: A Two-Factor General Equilbrium Model,” Journal of Finance, 47(4), 1259– 1282. L OWN , C. S., AND D. P. M ORGAN (2006): “The Credit Cycle and the Business Cycle: New Findings Using the Loan Officer Opinion Survey,” Journal of Money, Credit, and Banking, 38(6), 1575–1597. M ELINO , A., AND A. X. YANG (2003): “State-Dependent Preferences can Explain the Eq- uity Premium Puzzle,” Review of Economic Dynamics, 6(4), 806–830. M IAO , J., AND P. WANG (2010): “Credit Risk and Business Cycles,” Working Paper. 193 O LINER , S. D., G. D. R UDEBUSCH , AND D. S ICHEL (1995): “New and Old Models of Business Investment: A Comparison of Forecasting Performance,” Journal of Money, Credit, and Banking, 27(3), 806–826. (1996): “The Lucas Critique Revisited: Assessing the Stability of Empirical Euler Equations for Investment,” Journal of Econometrics, 70(1), 291–316. O PLER , T., L. P INKOWITZ , R. S TULZ , AND R. W ILLIAMSON (1999): “The Determinants and Implications of Corporate Cash Holding,” Journal of Financial Economics, 52(1), 3–46. P IAZZESI , M. (2010): “Affine Term Structure Models,” in Handbook of Financial Econometrics. Elsevier. P OST, T., M. J. VAN DEN A SSEM , G. B ALTUSSEN , AND R. H. T HALER (2008): “Deal or No Deal? Decision Making under Risk in a Large-Payoff Game Show,” American Economic Review, 98(1), 38–71. R AVINA , E. (2007): “Habit Formation and Keeping Up with the Joneses: Evidence from Micro Data,” Mimeo. R OUWENHORST, K. G. (1995): “Asset Pricing Implications of Equilibrium Business Cycle Models,” in Frontiers of Business Cycle Research, ed. by T. F. 
Cooley, chap. 10, pp. 294–330. Princeton University Press. R UDEBUSCH , G. D., AND E. T. S WANSON (2008): “Examining the Bond Premium Puzzle with a DSGE Model,” Journal of Monetary Economics, 55(Supplement 1), S111–S126. (Forthcoming): “The Bond Premium in a DSGE Model with Long-Run Real and Nominal Risks,” American Economic Journal: Macroeconomics, Federal Reserve Bank of San Francisco Working Paper 2008-31. S CHALLER , H. (2006): “Estimating the Long-Run User Cost Elasticity,” Journal of Monetary Economics, 53(4), 725–736. S HILLER , R. J. (1981): “Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends?,” American Economic Review, 71(3), 421–436. 194 S OLOW, R. M. (1957): “Technical Change and the Aggregate Production Function,” Review of Economics and Statistics, 39(3), 312–320. S TOHS , M. H., AND D. C. M AUER (1996): “The Determinants of Corporate Debt Maturity Structure,” The Journal of Business, 69(3), 279–312. S UMMERS , L. H. (1981): “Taxation and Corporate Investment: A q-Theory Approach,” Brookings Papers on Economic Activity, 1981(1), 67–140. S WANSON , E. T. (2011): “Risk Aversion and the Labor Margin in Dynamic Equilibrium Models,” American Economic Review. TALLARINI , T. D. (2000): “Risk-Sensitive Real Business Cycles,” Journal of Monetary Economics, 45(3), 507–532. TANAKA , T., C. F. C AMERER , AND Q. N GUYEN (2010): “Risk and Time Preferences: Link- ing Experimental and Household Survey Data from Vietnam,” American Economic Review, 100(1), 557–571. T EVLIN , S., AND K. W HELAN (2003): “Explaining the Investment Boom of the 1990s,” Journal of Money, Credit, and Banking, 35(1), 1–22. T ITMAN , S., AND R. W ESSELS (1988): “The Determinants of Capital Structure Choice,” Journal of Finance, 43(1), 1–19. VAN A RK , B., AND R. I NKLAAR (2006): “Catching up or getting stuck? Europe’s troubles to exploit ICT’s productivity potential,” GGDC Research Memorandum GD-79. VAN B INSBERGEN , J. H., J. F ERNANDEZ -V ILLAVERDE , R. S. 
K OIJEN , AND J. F. R UBIO - R AMIREZ (2011): “The Term Structure of Interest Rates in a DSGE Model with Recursive Preferences,” Working paper. V ISSING -J ORGENSEN , A., AND O. P. ATTANASIO (2003): “Stock-Market Participation, In- tertemporal Substitution, and Risk-Aversion,” The American Economic Review, 93(2), 383– 391. 195 V ISSING -J ORGENSON , A. (2002): “Limited Asset Market Participation and the Elasticity of Intertemporal Substitution,” Journal of Political Economy, 110(4), 825–853. WACHTER , J. A. (2010): “Can Time-Varying Risk of Rare Disasters Explain Aggregate Stock Market Volatility?,” Working Paper. W EIL , P. (1989): “The Equity Premium Puzzle and the Risk-Free Rate Puzzle,” Journal of Monetary Economics, 24(3), 401–421. W OODFORD , M. (2003): Interest and Prices. Princeton University Press, Princeton, NJ. YANG , W. (2008): “Intertemporal Substitution and Equity Premium: A Perspective With Habit in Epstein-Zin Preferences,” Working Paper. 196