Cointegration and Tests of Present Value Models

In a model where a variable Y[sub t] is proportional to the present value, with constant discount rate, of expected future values of a variable y[sub t], the "spread" S[sub t] = Y[sub t] - [theta]y[sub t] will be stationary for some [theta] whether or not y[sub t] must be differenced to induce stationarity. Thus, Y[sub t] and y[sub t] are cointegrated. The model implies that S[sub t] is proportional to the optimal forecast of [delta Y{sub t+1}] and also to the optimal forecast of S*[sub t], the present value of future [delta y{sub t}]. We use vector autoregressive methods, and recent literature on cointegrated processes, to test the model. When Y[sub t] is the long-term interest rate and y[sub t] the short-term interest rate, we find in postwar U.S. data that S[sub t] behaves much like an optimal forecast of S*[sub t] even though, as earlier research has shown, it is negatively correlated with [delta Y{sub t+1}]. When Y[sub t] is a real stock price index and y[sub t] the corresponding real dividend, using annual U.S. data for 1871-1986 we obtain less encouraging results for the model, although the results are sensitive to the assumed discount rate.

The discounted sum in eq. (1) extends to an infinite horizon. Most of the methods in this paper can be applied to the finite horizon case, at the cost of some additional complexity. Throughout this paper we will treat conditional expectations as equivalent to linear projections on information.
It might be attractive to model the variables y and Y as stationary in log first differences. However, since the model (1) is linear in levels, a log specification is intractable unless one is willing to focus on a special case (Kleidon 1986) or to approximate the model (Campbell and Shiller 1986).

have also studied this case. We follow Hansen and Sargent and differ from Mankiw et al. and West by using a relatively large information set H_t. We include in H_t current and lagged values not just of y_t but also of Y_t.
Our choice of information set has several advantages. By including Y_t in the vector stochastic process for analysis, we in effect include all relevant information of market participants, even if we econometricians do not observe all their information variables. We can test all the implications of the model for the bivariate (y_t, Y_t) process, giving a natural extension of Fama's (1970) notion of a "weak-form" test. We can exploit the recently developed theory of cointegrated processes (Phillips and Durlauf 1986; Phillips and Ouliaris 1986; Engle and Granger 1987; Stock 1987). Our test procedure can be interpreted as a single-equation regression or as a test of restrictions on a VAR. We propose a way to assess the economic significance of deviations from (1), comparing the forecast of the present value of future y_t embodied in Y_t with an unrestricted VAR forecast. Because the information set H_t includes Y_t, the two forecasts should be equal if the model is true.
We examine the present value models for bonds and stocks, while a companion piece by one of us (Campbell 1987) studies the permanent income theory of consumption. The paper is organized as follows. Section I discusses alternative tests of the present value relation when y_t and Y_t are stationary in first differences rather than levels. Section II is an introduction to the literature on cointegration, summarizing the results we use in testing the present value model. Section III applies the method to data on bonds and stocks. Section IV presents conclusions.

I. Alternative Tests of the Present Value Relation
One straightforward way to test the model (1) I ).
(2)

Apart from a constant, ξ_t is the true innovation at time t in Y_t (i.e., the innovation with respect to the full market information set I_t). The model has the striking implication that this innovation is observable when only Y_t, Y_{t-1}, y_{t-1}, and the parameters c, θ, and δ are known.5 In the applications of this paper, ξ_t has the economic interpretation of an asset return. In the term structure it is the excess return on long bonds over short bills, while in the stock market it is the excess return on stocks over a constant mean, multiplied by the stock price.

5 The variable ξ_t can also be written as a constant plus the true innovation in the expected present value of all future y_t. We note, however, that in general the model does not identify the true innovation in y_t itself.

Since the right-hand side of (2), adjusted for a constant, is orthogonal to all elements of the information set I_{t-1}, one can test the present value relation by regressing ξ_t on variables in this set and testing that the coefficients are jointly zero. This approach is standard in the literature and seems attractively simple. However, there are some econometric pitfalls and issues of interpretation that need careful handling.
First, the regressors used to predict ξ_t must be stationary if conventional asymptotic distribution theory is to apply. Of course, there are many stationary elements of I_{t-1}, but one may want to choose variables that summarize the joint history of y_t and Y_t. It is not clear how the stationarity requirement can be reconciled with this objective if y_t and Y_t are themselves nonstationary.
Second, while (1) implies (2), the reverse is not true. Equation (2) is consistent with a more general form of (1) that includes a "rational bubble," a random variable b_t satisfying b_t = δE_t b_{t+1}. Recently there has been considerable interest in testing (1) against the alternative that Y_t is influenced by a rational bubble (Blanchard and Watson 1982; Hamilton and Whiteman 1985; Quah 1986; West 1987).
Third, it is not clear what the implications are for Y_t of nonzero coefficients in a regression of ξ_t on information. Predictability of returns has consequences for asset price behavior, and one may want to calculate these explicitly.
Further insight into these issues can be gained by defining a new variable S_t ≡ Y_t - θy_t. We will refer to S_t as the "spread." In the case of the term structure, it is just the spread between long- and short-term interest rates; for stocks, it is the difference between the stock price and a multiple of dividends. The spread can also be written as a linear combination of the variables ΔY_t, Δy_t, and ξ_t.

The present value model (1) implies two alternative interpretations of the spread. Subtracting θy_t from both sides of equation (1) and rearranging, one obtains equations (3) and (4). Equation (3) says that the spread is a constant plus the optimal forecast of S*_t, a weighted average of future changes in y; equation (4) says that the spread is linear in the optimal forecast of the change in Y. Equation (4) can be used in an alternative test of the present value model, in which one regresses ΔY_t on a constant, S_{t-1}, and other variables. The coefficient on S_{t-1} should be (1 - δ)/δ, and the coefficients on the other variables should be zero. This regression is just a linear transformation of the regression that has ξ_t as the dependent variable, and it yields the same test statistic.
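The algebra behind the spread's two interpretations can be sketched as follows. The display equations are not reproduced in the text here, so this sketch assumes that model (1) has the form shown in the first line and that ξ_t is defined as in the second line. Both forms are assumptions chosen to agree with the surrounding prose (the coefficient (1 - δ)/δ on S_{t-1}, and the observability of ξ_t from Y_t, Y_{t-1}, and y_{t-1}); they are not a transcription.

```latex
\begin{align*}
% Assumed form of the present value model (1)
Y_t &= \theta(1-\delta)\sum_{i=0}^{\infty}\delta^{i}E_t y_{t+i} + c \\
% Assumed definition of xi_t, rewritten with S_{t-1} = Y_{t-1} - \theta y_{t-1}
\xi_t &\equiv Y_t - \frac{1}{\delta}\bigl[Y_{t-1} - \theta(1-\delta)y_{t-1}\bigr]
       = \Delta Y_t - \frac{1-\delta}{\delta}\,S_{t-1} \\
% Spread as a linear combination, using S_t = S_{t-1} + \Delta Y_t - \theta\Delta y_t
S_t &= \frac{1}{1-\delta}\,\Delta Y_t - \theta\,\Delta y_t - \frac{\delta}{1-\delta}\,\xi_t \\
% Interpretation (3): subtract \theta y_t from both sides of (1) and collect differences
S_t &= E_t S_t^{*} + c, \qquad
S_t^{*} \equiv \theta\sum_{i=1}^{\infty}\delta^{i}\,\Delta y_{t+i} \\
% Interpretation (4): take E_t of the xi_{t+1} expression, with E_t\xi_{t+1} constant
S_t &= \frac{\delta}{1-\delta}\,E_t\,\Delta Y_{t+1} + c
\end{align*}
```

Under these assumed forms, solving the last line for E_t ΔY_{t+1} gives a slope of (1 - δ)/δ on the spread, which matches the regression coefficient stated in the text.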
Equations (3) and (4) help to resolve the issues raised above. If Δy_t is stationary, it follows from (3) that S_t is stationary; (4) then implies that ΔY_t is stationary. Thus one can use S_t and Δy_t, or S_t and ΔY_t, as stationary variables that summarize the bivariate history of y_t and Y_t in a regression test of the model. (The pair Δy_t and ΔY_t is also stationary, but by using these one would lose information on the relative levels of y_t and Y_t.) Our strategy is to work with S_t and Δy_t.
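A small simulation may make these relations concrete. The sketch below is illustrative only and rests on several assumptions that are not in the text: the constant c is zero, Δy_t follows an AR(1) with a hypothetical coefficient `rho`, agents observe only the history of y (so the spread collapses to S_t = k·Δy_t), and ξ_t is defined as Y_t - (1/δ)[Y_{t-1} - θ(1 - δ)y_{t-1}]. All function and variable names are hypothetical.

```python
import random


def simulate_present_value(theta=1.0, delta=0.95, rho=0.5, n=500, seed=0):
    """Simulate y_t with AR(1) changes and the model-implied Y_t (c = 0).

    Assumption: E_t dy_{t+i} = rho**i * dy_t, so the discounted sum of
    expected future changes gives S_t = k * dy_t with k defined below.
    """
    rng = random.Random(seed)
    k = theta * delta * rho / (1.0 - delta * rho)
    y, dy, shocks = [0.0], [0.0], [0.0]
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        change = rho * dy[-1] + e
        shocks.append(e)
        dy.append(change)
        y.append(y[-1] + change)          # y_t is nonstationary in levels
    Y = [theta * y[t] + k * dy[t] for t in range(n + 1)]
    return y, dy, shocks, Y, k


def spread(y, Y, theta):
    """S_t = Y_t - theta * y_t."""
    return [Y[t] - theta * y[t] for t in range(len(y))]


def excess_return(y, Y, theta, delta):
    """xi_t = Y_t - (1/delta) * [Y_{t-1} - theta*(1-delta)*y_{t-1}] for t >= 1."""
    return [
        Y[t] - (Y[t - 1] - theta * (1.0 - delta) * y[t - 1]) / delta
        for t in range(1, len(y))
    ]


theta, delta = 1.0, 0.95
y, dy, shocks, Y, k = simulate_present_value(theta, delta)
S = spread(y, Y, theta)
xi = excess_return(y, Y, theta, delta)    # xi[t - 1] corresponds to date t
```

In this toy setting ξ_t computed from observables equals (θ + k) times the date-t shock to Δy_t, and the spread satisfies the linear identity in ΔY_t, Δy_t, and ξ_t, while y_t and Y_t themselves wander without bound.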
The effect of a "rational bubble" alternative is easily seen using (3) and (4). If a term b_t is added to the right-hand side of equation (1), satisfying b_t = δE_t b_{t+1}, it appears on the right-hand side of (3) but does not affect equations (2) and (4). The term b_t is explosive by construction, so it causes explosive behavior of S_t by (3), and this is passed through to ΔY_t by (4).6 One way to test for the importance of rational bubbles is therefore to test the stationarity of S_t and ΔY_t. This approach has been proposed by Diba and Grossman (1984), among others.

As we noted above, S_t can be written as a linear combination of ΔY_t, Δy_t, and ξ_t. Therefore, independent of any model, if three of the variables S_t, ΔY_t, Δy_t, and ξ_t are stationary, the fourth must be also. This linear dependence needs to be taken into account in testing for stationarity.

We can now discuss the implications of the present value relation for the VAR system. A rather weak implication is that S_t must linearly Granger-cause Δy_t unless S_t is itself an exact linear function of current and lagged Δy_t (which is a stochastic singularity we do not observe in the data; it would require, e.g., that the variance-covariance matrix of u_{1t} and u_{2t}, Ω, be singular).
The intuitive explanation for this result is that S_t is an optimal forecast of a weighted sum of future values of Δy_t, conditional on agents' full information set. Therefore, S_t will have incremental explanatory power for future Δy_t if agents have information useful for forecasting Δy_t beyond the history of that variable. If agents do not have such information, they form S_t as an exact linear function of current and lagged Δy_t.7

The full set of restrictions of the present value model is more demanding. We obtain these restrictions by projecting equation ( ... where g' and h' are row vectors with 2p elements, all of which are zero except for the p + 1st element of g' and the first element of h', which are unity. If this expression is to hold for general z_t (i.e., for nonsingular Ω), it must be the case that

g' = θ Σ_{i=1}^{∞} δ^i h'A^i = θh'δA(I - δA)^{-1}.   (7)

Here the second equality follows by evaluating the infinite sum, noting that it must converge because the variables Δy_t and S_t are stationary under the null.8

The restrictions of equation (7) appear to be highly nonlinear cross-equation restrictions of the type described by Hansen and Sargent (1981b) as the "hallmark" of rational expectations models. However, it turns out that (7) can be simplified so that (taking θ and δ as given) its restrictions are linear and easily interpreted. Postmultiplying both sides of (7) by (I - δA), one obtains g'(I - δA) = θh'δA.

The major advantage of the VAR framework is that it can be used to generate alternative measures of the economic importance, not merely the statistical significance, of deviations from the present value relation. To see this more clearly, suppose that the present value model is false so that E_t ξ_{t+i} ≠ 0 for i ≥ 1. Then equations (3) and (4) no longer hold. We define the "theoretical spread," S'_t, as the optimal forecast, given the information set H_t, of the present value of all future changes in y:

S'_t ≡ E(S*_t | H_t) = θh'δA(I - δA)^{-1} z_t.

8 Under an explosive bubble alternative this infinite sum will not converge, and the matrix (I - δA) will be singular.

9 However, this statistic is not numerically identical to the Wald statistic for a test of eq. (7), even though (7)

Equations (10) and (11) measure deviations from the model in two different ways. The metric of equation (11) is the difference between S_t and the optimal forecast, given the information set H_t, of the one-period change in Y. Equation (11) shows that this difference is large if excess returns are predictable one period in advance.
The metric of equation (10) is the difference between S_t and the theoretical spread, which is large if the present value of all future excess returns is predictable. By this measure, a large deviation from the model requires not only that movements in ξ_t be predictable one period in advance but that they be predictable many periods in advance. Loosely speaking, predictable excess returns must be persistent as well as variable.10

We use the VAR framework not only to conduct statistical tests of the present value relation but also to evaluate its failures using the metric of equation (10). We display time-series plots of the spread S_t and the theoretical spread S'_t, the unrestricted VAR forecast of the present value of future changes in y. If the present value model is true, these variables should differ only because of sampling error. Large observed differences in the time-series movements of the two variables imply (subject to sampling error) economically important deviations from the model.
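The theoretical spread and the linear form of the cross-equation restrictions can be illustrated for the simplest case. The sketch below assumes a first-order VAR (p = 1) with z_t = (Δy_t, S_t)', so that h' = (1, 0) and g' = (0, 1); the coefficient values and all function names are hypothetical, and no estimation or standard errors are attempted.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def theoretical_spread_weights(A, theta, delta):
    """Row vector theta * h' * (delta*A) * (I - delta*A)^{-1} with h' = (1, 0)."""
    dA = [[delta * a for a in row] for row in A]
    I_minus_dA = [[1.0 - dA[0][0], -dA[0][1]],
                  [-dA[1][0], 1.0 - dA[1][1]]]
    P = matmul2(dA, inv2(I_minus_dA))
    return [theta * P[0][0], theta * P[0][1]]   # h' selects the first row


def restricted_var(a11, a12, theta, delta):
    """Fill the S_t row so that g'(I - delta*A) = theta*h'*delta*A, g' = (0, 1)."""
    return [[a11, a12], [-theta * a11, 1.0 / delta - theta * a12]]


def theoretical_spread(A, theta, delta, dy_t, S_t):
    """S'_t as a linear function of z_t = (dy_t, S_t)'."""
    w = theoretical_spread_weights(A, theta, delta)
    return w[0] * dy_t + w[1] * S_t


theta, delta = 1.0, 0.95
A_null = restricted_var(0.2, 0.3, theta, delta)      # satisfies the restrictions
w_null = theoretical_spread_weights(A_null, theta, delta)
A_alt = [[0.2, 0.3], [0.1, 0.6]]                     # violates the restrictions
w_alt = theoretical_spread_weights(A_alt, theta, delta)
```

When the VAR coefficients satisfy the linear restrictions, the weights reduce to g' = (0, 1) and the theoretical spread coincides with the actual spread; for an unrestricted coefficient matrix the weights differ, and the gap between S'_t and S_t is the deviation measured by the metric of equation (10).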
The VAR framework can also be used to test the present value model against more specific alternatives. Volatility tests, for example, are designed to test against the alternative that Y_t or some transformation of it "moves too much." We present two different volatility tests. The first is just a test that the ratio var(S_t)/var(S'_t) is unity. This ratio, together with its standard error, can be computed from the VAR system. Under the present value model, the ratio should be one but would be larger than one if the spread is too volatile relative to information about future y. A statistic that complements this is the correlation between S_t and S'_t, since if the variance ratio and correlation both equal one, then S_t must equal S'_t and the model is satisfied.11

10 The terminology of our earlier paper (Campbell and Shiller 1984) may be helpful in understanding (10) and (11). The right-hand side of (11) is proportional to what we called the one-period "holding premium," and the right-hand side of (10) is what we called the "rolling premium."

JOURNAL OF POLITICAL ECONOMY
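The levels variance ratio and the accompanying correlation are simple sample statistics once the two series are in hand. The sketch below computes both from plain lists; the series values are made up for illustration, the function names are hypothetical, and the standard errors discussed in the text are not computed.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance with divisor n (the divisor cancels in the ratio)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def levels_variance_ratio(S, S_prime):
    """var(S_t) / var(S'_t): actual spread over theoretical spread."""
    return variance(S) / variance(S_prime)


def correlation(S, S_prime):
    """Sample correlation between the actual and theoretical spreads."""
    ms, mp = mean(S), mean(S_prime)
    cov = sum((a - ms) * (b - mp) for a, b in zip(S, S_prime)) / len(S)
    return cov / math.sqrt(variance(S) * variance(S_prime))


# Hypothetical data: a theoretical spread and an actual spread twice as volatile.
S_prime = [0.5, -0.2, 0.1, 0.9, -0.4]
S_scaled = [2.0 * s for s in S_prime]
ratio_same = levels_variance_ratio(S_prime, S_prime)
ratio_scaled = levels_variance_ratio(S_scaled, S_prime)
corr_scaled = correlation(S_scaled, S_prime)
```

The scaled series shows why the two statistics are complements: its correlation with the theoretical spread is one, yet its variance ratio is four, so only the pair of statistics together pins down equality of the two series.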
We obtain a second volatility test, following West (1987), as follows. Let us define ξ'_t as θ times the innovation from t - 1 to t in the expected present value of Δy_t, conditional on the VAR information set. Under the present value model, ξ'_t = ξ_t since S'_t = S_t. We construct the ratio var(ξ_t)/var(ξ'_t), again with standard error.12 The model implies that this ratio should be one, while the notion that stock prices are too volatile suggests that it will be greater than one. We call the first of our variance ratios the "levels variance ratio" and the second the "innovations variance ratio."

The fact that a linear combination S_t of y_t and Y_t is stationary in its level, even though y_t and Y_t are individually stationary only in first differences, turns out to be important for understanding present value models. In the language of time-series analysis, the vector x_t = (y_t Y_t)' is cointegrated. Cointegrated vectors have a number of important properties, which we now discuss.

II. Properties of Cointegrated Vectors
In this section we summarize the theory of cointegrated processes and show how it applies to present value models.

DEFINITION (Engle and Granger 1987). A vector x_t is said to be cointegrated of order (d, b), denoted x_t ~ CI(d, b), if (i) all components of x_t are integrated of order d (stationary in dth differences) and (ii) there exists at least one vector α (≠ 0) such that α'x_t is integrated of order d - b, b > 0.
When y_t is stationary in first differences, the vector x_t = (y_t Y_t)' is CI(1, 1) if the present value model holds. The CI(1, 1) case is the one that has been studied almost exclusively in the theoretical literature, and the results that follow apply to it.

11 We compute the levels variance ratio and correlation from the sample moments of S_t and S'_t. We report numerical standard errors that are conditional on the sample moments of z_t and take account of sampling error only in the coefficients of the estimated VAR.

12 We use the estimated variance-covariance matrix of the VAR to compute the innovations variance ratio. The standard error takes account of sampling error in this matrix as well as in the VAR coefficients.

Cointegrated systems of order (1, 1) have two unusual properties. These concern the existence of well-behaved vector time-series representations for the cointegrated variables and the estimation of unknown elements of the vector α. Both properties turn out to be relevant for testing present value models.
The first important property of a cointegrated vector is that the vector moving average (VMA) representation of the first difference Δx_t is noninvertible. Equivalently, the spectral density matrix of Δx_t is singular at zero frequency. This singularity is what "holds together" the elements of x_t so that a linear combination is stationary.
More formally, write Δx_t = K(L)ε_t = ε_t + K_1ε_{t-1} + .... The matrix M = K(1)K(1)', where K(1) = I + K_1 + K_2 + ..., is the spectral density matrix of Δx_t at zero frequency. Now if the variance of α'x_t exists, it will be given by Σ_{i=0}^{∞} α'C_i V C_i'α, where V is the variance-covariance matrix of ε_t and C_i = I + K_1 + ... + K_i. Ignoring the degenerate case in which V is singular, the summation above converges only if α'C_i converges to zero. But the limit of C_i as i → ∞ is K(1), so for convergence we must have α'K(1) = 0, which requires K(1), and hence M, to be singular.

It follows from this that if an economic theory imposes cointegration on a set of nonstationary variables, simple first differencing of all the variables can lead to econometric problems. Noninvertibility of the VMA destroys the usual argument for using a finite VAR representation, that a finite VAR can approximate the true VMA arbitrarily well. Intuitively, the problem arises because a cointegrated system has fewer unit roots than variables, so first differencing all the variables amounts to overdifferencing the system.13

Fortunately, there is a simple solution to the difficulty, which is to include α'x_t in a VAR along with a subset of the elements of Δx_t. An equation that relates the change in an element of x_t to its own lags and lags of α'x_t is called an error-correction model for that element of x_t. The VAR proposed in the previous section to test present value models is an error-correction model for y_t, along with an equation describing the evolution of α'x_t.
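The singularity of K(1) can be checked by hand in a toy example. The sketch below assumes a data-generating process that is not taken from the text: y_t is a random walk with shock e_t and Y_t = θy_t + u_t with u_t white noise. Writing η_t = (e_t, θe_t + u_t)' gives Δx_t = η_t + K_1η_{t-1} with the K_1 shown in the code; all names are hypothetical.

```python
def long_run_matrix(theta):
    """K(1) = I + K_1 for the assumed toy process.

    dy_t = e_t and dY_t = theta*e_t + u_t - u_{t-1}, so in terms of
    eta_t = (e_t, theta*e_t + u_t)' the only lag matrix is
    K_1 = [[0, 0], [theta, -1]].
    """
    K1 = [[0.0, 0.0], [theta, -1.0]]
    return [[1.0 + K1[0][0], K1[0][1]],
            [K1[1][0], 1.0 + K1[1][1]]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def row_times_matrix(row, M):
    return [sum(row[k] * M[k][j] for k in range(2)) for j in range(2)]


theta = 2.0
K_one = long_run_matrix(theta)          # [[1, 0], [theta, 0]]
alpha = [-theta, 1.0]                   # proportional to (-theta 1)'
alpha_K_one = row_times_matrix(alpha, K_one)
```

Here K(1) has a zero column, so its determinant is zero, and the cointegrating vector annihilates it from the left, which is the condition α'K(1) = 0 derived above.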
The second major result from the theory of cointegration concerns the "cointegrating vector" α. In a present value model, α is unique up to a scalar normalization and is proportional to (-θ 1)'. Stock (1987) and Phillips and Ouliaris (1986) prove that a variety of methods provide estimates that converge to the true parameter at a rate proportional to the sample size T (rather than √T as in ordinary cases). The reason for this is that, asymptotically, all linear combinations of the elements of x_t other than α'x_t have infinite variance.
The practical implication is that an unknown element of α may be estimated in a first-stage regression and then treated as known in second-stage procedures, whose asymptotic standard errors will still be correct. This is extremely useful in carrying out the VAR tests of the previous section. In the case of stock prices, for example, the present value model constrains θ = δ/(1 - δ), so one can estimate the discount factor from a preliminary regression and then treat it as known in testing the model.

Two types of preliminary regression have been proposed for estimating the unknown parameter θ. The first, called the cointegrating regression by Engle and Granger (1987), is just a regression of Y_t on y_t. The second is an "error-correction" regression of Δy_t or ΔY_t on lagged changes in and levels of y_t and Y_t. In the first case, one estimates θ as the coefficient on y_t, while in the second case one takes the ratio of the coefficient on lagged y_t to that on lagged Y_t.
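The cointegrating regression can be sketched with ordinary least squares on simulated data. The example assumes a toy process (a random-walk y_t and a white-noise spread) and a hypothetical true value of θ; it illustrates the first-stage estimate only and is not the procedure applied to the data in this paper.

```python
import random


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_cointegrated(theta, n=2000, seed=1):
    """y_t is a random walk; Y_t = theta*y_t + stationary noise (assumed DGP)."""
    rng = random.Random(seed)
    y, Y, level = [], [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
        Y.append(theta * level + rng.gauss(0.0, 1.0))
    return y, Y


theta_true = 12.0                        # hypothetical value for illustration
y, Y = simulate_cointegrated(theta_true)
theta_hat, const_hat = ols_slope_intercept(y, Y)
```

Because the regressor has a variance that grows with the sample, the slope estimate sits very close to the true θ even though the regression ignores all short-run dynamics, which is the fast convergence that justifies treating the first-stage estimate as known.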
One might argue that use of the error-correction regression is preferable because it accounts more fully for the short-run dynamics of Y_t and y_t. However, it has an important disadvantage. For any cointegrated vector with two elements, there are two possible error-correction regressions, one for Δy_t and one for ΔY_t. Cointegration alone does not rule out that, in one of these regressions, lagged Y_t and y_t have zero coefficients in the population, so that the coefficient ratio fails to identify the desired parameter.14 Of course, under the present value model the error-correction equation for Δy_t has nonzero coefficients (because α'x_t Granger-causes Δy_t), but this is not implied by all plausible alternatives. Accordingly, we rely primarily on the cointegrating regression to identify θ.
One may want to conduct a formal statistical test of the null hypothesis that x_t is not cointegrated. This turns out to pose some difficult statistical problems. If a candidate for the cointegrating vector α is available, the null hypothesis is that α'x_t is nonstationary, and one can use a modified Dickey-Fuller (1981) test, regressing the change in α'x_t on a constant and a single lagged level. The t-statistics and F-statistic are corrected for serial correlation in the equation residual as proposed by Phillips and Perron (1986) and Phillips (1987) and then compared with significance levels computed numerically by Dickey and Fuller. If the statistics are sufficiently high, the null hypothesis is rejected.

14 Cointegration does rule out that the coefficients are zero in both error-correction regressions.
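The basic Dickey-Fuller regression behind this test can be sketched as follows. This is the uncorrected regression of the change on a constant and one lagged level; the Phillips-Perron serial correlation correction described above is deliberately omitted, the input series is simulated white noise, and the function name is hypothetical. The critical value of -2.86 is the 5 percent level without trend quoted in the table notes of this paper.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on rho in  series[t] - series[t-1] = a + rho*series[t-1] + error."""
    x = series[:-1]
    d = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(x)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((a - mx) ** 2 for a in x)
    rho = sum((a - mx) * (b - md) for a, b in zip(x, d)) / sxx
    a0 = md - rho * mx
    rss = sum((b - a0 - rho * a) ** 2 for a, b in zip(x, d))
    se = math.sqrt(rss / (n - 2) / sxx)   # conventional OLS standard error
    return rho / se


rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(500)]   # clearly stationary input
t_stat = dickey_fuller_t(stationary)
rejects_unit_root = t_stat < -2.86
```

For a stationary series the lagged level enters with a strongly negative coefficient, so the statistic falls far below the critical value; under a unit root it would be compared with the same Dickey-Fuller levels rather than with the usual t tables.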
If the cointegrating vector is not known but must be estimated from a cointegrating regression, the Dickey-Fuller significance levels are no longer appropriate. Engle and Granger (1987) analyze a variety of tests that use the residual from the cointegrating regression, an estimate of α'x_t. We report two of their test statistics, one based on the Dickey-Fuller regression and one that augments that regression with four lagged dependent variables. Engle and Granger provide significance levels for these tests, based on a Monte Carlo study.15

Phillips and Ouliaris (1986) propose an alternative test procedure for the null hypothesis of no cointegration. Their method involves computing the matrix M, the spectral density matrix at zero frequency, nonparametrically. As discussed above, this matrix will be nonsingular under the null and singular under the alternative of cointegration. Unlike the Engle-Granger procedures, their test statistics have a distribution that is asymptotically free of nuisance parameters. They applied their methods to our data, and we note their results below.

III. Testing the Model in Bond and Stock Markets
In this section we apply the methods developed above to test present value models for bonds and stocks. The model for bonds, usually referred to as the "expectations theory of the term structure," is a special case of equation (1) ... δ/(1 - δ). The model restricts the constant c to be zero. The discount factor δ is not known a priori but can be inferred by estimating the cointegrating vector for stock prices and dividends; a consistent estimate is also provided by the sample mean return on stocks.19

One difficulty with this formulation for stocks is that Y_t and y_t are not measured contemporaneously. The term Y_t is a beginning-of-period stock price, and y_t is paid sometime within period t. Literal application of the methods outlined in Section I would require us to assume that y_t is known to the market at the start of period t; but, as pointed out by West (1987) and others, this might lead us to a spurious rejection of the model if in fact y_t is known only at the start of period t + 1. Intuitively, it is not hard to "predict" excess returns using ex post information. In order to avoid this problem, we modify the procedures of Section I by constructing a variable SL_t ≡ Y_t - θy_{t-1}. We use this variable in our tests and alter the cross-equation restrictions appropriately. The dependent variables in the VAR are now SL_t and Δy_{t-1}, both of which are in the information set at the start of time t but not at the start of time t - 1 under our conservative assumption about the market's information.20 Since SL_t = S_t + θΔy_t, it is of course stationary if S_t and Δy_t are.

We tested the model for stocks using time-series data for real annual prices and dividends on a broad stock index from 1871 to 1986. The term Y_t is the Standard and Poor's composite stock price index for January, divided by the January producer price index scaled so that the 1967 producer price index equals 100. (Before 1900 an annual average producer price index was used.)
The nominal dividend series is, starting in 1926, dividends per share adjusted to index, four-quarter total, for the Standard and Poor's composite index. The nominal dividend before 1926 was taken from Cowles (1939), who ex-

18 For both samples, the parameter of linearization δ is set equal to 1/(1 + R), with R at 0.0587/12 (the mean 20-year bond rate in the short sample, expressed at a monthly rate). Our subsequent empirical results are conditional on a fixed value of δ.

19 The sample mean return converges to the population mean only at rate √T and therefore should not strictly be taken as known in second-stage procedures. However, we ignore this problem in our empirical work.

20 Engle and Watson (1985) did some regressions similar to ours, using a similar data set on stock prices and dividends. They used the variable S_t rather than SL_t. Their results differ from ours in that they found no evidence of Granger causality from S_t to Δy_t, but they did not reject the present value model more strongly than we do.

Phillips (1987). Significance levels are: with trend: 10%, -3.12; 5%, -3.41; 2.5%, -3.66; 1%, -3.96; without trend: 10%, -2.57; 5%, -2.86; 2.5%, -3.12; 1%, -3.43.

sion, corrected for fourth-order serial correlation as proposed by Phillips and Perron (1986) and Phillips (1987).22 We ran the Dickey-Fuller regression with and without a time trend; the former is appropriate when the alternative hypothesis is that the series is stationary around a trend, the latter when the alternative is that the series is stationary around a fixed mean.
The results in part A of table 1 are generally supportive of the view that short- and long-term interest rates are cointegrated, with the cointegrating vector equal to (-1 1) as implied by the expectations theory. Over the short sample 1959-78, one cannot reject the hypothesis that short and long rates have a unit root at even the 10 percent level; however, there is strong evidence that changes in interest rates are stationary. The hypothesis that the long-short spread has a unit root is rejected at the 10 percent level when a trend is estimated and at the 5 percent level when the trend is excluded from the regression. Finally, the excess return ξ_t also appears stationary; this, together with the results for Δy_t and ΔY_t, is indirect evidence for stationarity of the spread because of the linear dependence discussed in Section I.

Results are fairly similar over the full sample 1959-83. There is even stronger evidence that the spread is stationary, and the unit root hypothesis for short rates can be rejected unless a trend in interest rates is ruled out on a priori grounds.23
In part B of the table, we repeated these tests for the stock market data. Once again y_t and Y_t appear to be integrated of order one. In the stock market, the parameter θ is not determined by the present value model as it is in the term structure. Therefore, we must compute SL_t and ξ_t using estimates of θ obtained from the data. Strictly speaking, this invalidates the Phillips-Perron tests for SL_t and ξ_t, but we report the statistics as data description. Table 2 gives details of alternative estimation procedures for θ. The cointegrating regression estimates θ at 31.092; the corresponding real discount rate (the reciprocal of θ) is 3.2 percent, which is lower than the average dividend-price ratio and considerably lower than the sample mean return of 8.2 percent.24 The error-correction regression delivers a fairly similar estimate of θ, 37.021, with an implied real discount rate of 2.7 percent. We proceed to construct SL_t using discount rates of 8.2 percent and 3.2 percent as a check on the robustness of our methods.
Engle and Granger's tests for no cointegration, based on the residual from the cointegrating regression, give mixed results: the ξ2 statistic rejects at the 5 percent level, while the ξ3 statistic narrowly fails to reject at the 10 percent level. The Phillips-Perron tests in part B of table 1 are also mixed. Both SL_t and ξ_t appear to be stationary when the 3.2 percent discount rate is used, but at an 8.2 percent discount rate the tests fail to reject the unit root null for SL_t even though they reject for Δy_t, ΔY_t, and ξ_t. There seems to be some evidence for cointegration between stock prices and dividends, but it is weaker than the evidence for cointegration in the term structure.25
The results in table 1 do not suggest that a "rational bubble" is present in the term structure or the stock market since a bubble would cause both ΔY_t and S_t to be nonstationary. Accordingly, we interpret the test statistics below in terms of predictable excess returns.
In table 3, part A, we report summary statistics for a VAR test of the expectations theory of the term structure. The VAR includes Δy_t

22 The results are qualitatively unchanged by looking at other statistics from the Dickey-Fuller regression or by varying the order of the serial correlation correction between one and 10.

23 The results in table 1, pt. A, are more favorable to the hypothesis of cointegration between long and short rates than are the results reported by Phillips and Ouliaris (1986). They reject the null hypothesis of no cointegration at only the 15 percent level (their table 6). However, their procedure does not impose the cointegrating vector a priori, and this may involve a loss of power.

24 The estimate of θ that corresponds to the sample mean return is 12.195. The higher estimate in the cointegrating regression is associated with a negative constant term; under the present value model, the constant should be proportional to the unconditional mean change in dividends, so it should be positive rather than negative. An estimated discount rate lower than the mean dividend-price ratio is consistent with the model only if dividends are expected to decline through time, the historical rise being due to sampling error.

25 Phillips and Ouliaris (1986) did not reject the null hypothesis of no cointegration between stock prices and dividends at even the 25 percent level (their table 6). Campbell and Shiller (1986) report unit root tests for log dividends, log prices, and the log dividend-price ratio. There is some evidence for trend stationarity of log dividends, no evidence against the unit root null for log prices, and strong evidence for stationarity of the dividend-price ratio.
A formal test of the expectations theory restrictions in equation (8) rejects very strongly. The null that excess returns on long bonds are unpredictable can be rejected at less than the 0.005 percent level in the short sample and at the 0.03 percent level in the full sample. The R2 values for excess returns are 26.3 percent and 16.7 percent, respectively.27 In the corresponding regression (4), which has the change in the long rate as its dependent variable, the coefficient on the spread has the wrong sign (-0.020 in the short sample and -0.039 in the full sample).28 Despite these negative results, the summary statistics in table 3, part A, suggest that there is an important element of truth to the expectations theory of the term structure. The spread does seem to move very closely with the theoretical spread, the unrestricted forecast of the present value of future short-rate changes. In both sample periods the variance of the spread is insignificantly different from the variance of the theoretical spread (i.e., our "levels variance ratio" does not reject), and the two variables have similar innovation variances and an extremely high correlation. In the 1959-78 period the correlation between the actual and theoretical spreads is 0.978 with a standard error of 0.011, while in the 1959-83 period it is 0.956 with a 26 That is, we pick the number of lags to minimize (-in likelihood + number of parameters) in the VAR. Sawa (1978) has argued that the AIC tends to choose models of higher order than the true model but states that the bias is negligible when p < TI 10, as it is here. The test statistics in table 3 are not highly sensitive to small changes in the lag length of the VAR system. 27  What this suggests is that tests of predictability of returns are highly sensitive to deviations from the expectations theory-so sensitive, in fact, that they may obscure some of the merits of the theory. An example illustrates the point. 
Suppose long and short rates differ from the expectations theory in the following manner: $S_t = S'_t + w_t$, where $w_t$ is serially uncorrelated noise. As Campbell and Shiller (1984) point out, excess bond returns will be predicted by $S_t$, and a regression of $\Delta Y_{t+1}$ on $S_t$ may find that the coefficient has the opposite sign from that predicted by (4), even if the variance of $w_t$ is quite small. However, a regression of $S^*_t$ on $S_t$ will find that the coefficient has the same sign as predicted by (3), and downward bias caused by $w_t$ will be small if the variance of $w_t$ is small. Moreover, the variance ratios var($S_t$)/var($S'_t$) and var($\xi_t$)/var($\xi'_t$) may not be much greater than one. In this example the spread predicts short-rate movements almost correctly, even though it badly misforecasts long-rate movements. Deviations from the present value model are transitory rather than persistent, so the metric of equation (10) reveals the strengths of the expectations theory that are obscured by the metric of equation (11).30
29 The high correlation of these variables in postwar U.S. data might also have been inferred from results in Modigliani and Shiller (1973) (see particularly their fig. 6). Despite the evidence reported in Modigliani and Shiller and in the present paper, one of us (Shiller 1979) presented evidence suggesting that long-term interest rates are too volatile to accord with the expectations theory. By contrast with Modigliani and Shiller and the present paper, Shiller (1979) assumed that levels of short rates are stationary, an assumption more clearly appropriate for prewar data sets.
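The noisy-spread example can be illustrated with a small simulation. The following is only a sketch under assumptions that are not taken from the paper's data: short-rate changes follow an AR(1) with coefficient `phi`, the coefficient of proportionality is set to one, the theoretical spread is the discounted sum of expected future short-rate changes given only the current change, and the noise variance is 10 percent of the variance of the theoretical spread. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_noisy_spread(T=200_000, delta=0.97, phi=0.5, noise_share=0.1, seed=0):
    """Simulate S_t = S'_t + w_t under an assumed AR(1) for short-rate changes."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(T)
    dy = np.empty(T)                       # short-rate changes
    dy[0] = shocks[0]
    for t in range(1, T):
        dy[t] = phi * dy[t - 1] + shocks[t]
    # Under the AR(1) assumption, E_t sum_{i>=1} delta^i dy_{t+i} = k * dy_t.
    k = delta * phi / (1.0 - delta * phi)
    s_theory = k * dy                      # theoretical spread S'_t
    w = np.sqrt(noise_share * s_theory.var()) * rng.standard_normal(T)
    s = s_theory + w                       # observed spread S_t
    long_rate = np.cumsum(dy) + s          # Y_t = y_t + S_t (theta assumed equal to 1)
    # Ex post S*_t built backward from the terminal condition S*_T = S_T.
    s_star = np.empty(T)
    s_star[-1] = s[-1]
    for t in range(T - 2, -1, -1):
        s_star[t] = delta * (dy[t + 1] + s_star[t + 1])
    return s, long_rate, s_star

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (x @ x))

s, long_rate, s_star = simulate_noisy_spread()
slope_long_rate = ols_slope(s[:-1], np.diff(long_rate))  # dY_{t+1} on S_t
slope_s_star = ols_slope(s, s_star)                      # S*_t on S_t
print(f"dY(t+1) on S(t): {slope_long_rate:.3f}")
print(f"S*(t) on S(t):   {slope_s_star:.3f}")
```

With these illustrative parameters the first slope should come out negative and the second positive and fairly close to one, which is the pattern the example describes.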
In part B of table 3, we repeated the exercises above for stock prices and dividends. We worked with one sample period but two discount rates. The Akaike criterion selected a four-lag representation for the data when the sample mean discount rate of 8.2 percent was used and a two-lag representation when the cointegrating regression discount rate of 3.2 percent was used.
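Lag selection of this kind can be sketched in a few lines. The code below is a hedged illustration rather than the procedure actually used for table 3: it assumes a Gaussian VAR with a constant, estimated by OLS on a common sample, and scores each lag length by (-ln likelihood + number of coefficient parameters). The synthetic two-variable VAR(2) and all names are hypothetical.

```python
import numpy as np

def aic_criterion(z, p, max_lag):
    """-ln likelihood + number of coefficients for a Gaussian VAR(p) with constant.

    All lag lengths are fitted on observations max_lag..T-1 so that the
    criterion values are comparable across p.
    """
    T, n = z.shape
    target = z[max_lag:]
    lagged = [z[max_lag - j:T - j] for j in range(1, p + 1)]
    X = np.hstack([np.ones((T - max_lag, 1))] + lagged)
    B, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ B
    n_obs = target.shape[0]
    sigma = resid.T @ resid / n_obs            # ML estimate of the error covariance
    _, logdet = np.linalg.slogdet(sigma)
    log_lik = -0.5 * n_obs * (n * np.log(2 * np.pi) + logdet + n)
    return -log_lik + B.size                   # covariance terms are common to all p

def select_lag(z, max_lag):
    scores = {p: aic_criterion(z, p, max_lag) for p in range(1, max_lag + 1)}
    return min(scores, key=scores.get), scores

# Synthetic data from an assumed stationary VAR(2), for illustration only.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.3, 0.0], [0.0, 0.3]])
z = np.zeros((2000, 2))
for t in range(2, 2000):
    z[t] = A1 @ z[t - 1] + A2 @ z[t - 2] + rng.standard_normal(2)

best_lag, scores = select_lag(z, max_lag=6)
print("selected lag length:", best_lag)
```

Because the criterion penalizes each extra coefficient by only one unit, it can select more lags than the true order, but it should not select fewer than two here.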
The VAR estimates suggest that dividend changes are rather highly predictable; the $R^2$ values for the equations that explain them are around 40 percent. There is very strong evidence that price-dividend spreads Granger-cause dividend changes, which is what one would expect if there is any truth to the present value model for stock prices.
30 We do not claim that this example is literally correct for our data. The model $S = S' + w$ can be tested, for any MA(q) process for $w$, by regressing $\xi$ on information known q + 2 periods earlier. We found that this test rejected the model for q up to 8 using the bond data for 1959-78.
We conducted two formal tests of the model. The first restricted the mean of the price-dividend difference, while the second left the mean unconstrained and restricted only the dynamics of the variable. (In the case of the term structure, the mean spread is always unconstrained because we allowed a constant risk premium.) The results of these tests include some statistical rejections at conventional significance levels, but they are not nearly as strong as the rejections in the term structure. The pattern of results is sensitive to the choice of discount rate. When the sample mean return is used, the mean restriction on $SL_t$ is satisfied almost exactly. Therefore, the test of only the dynamic restrictions in equation (8) rejects more strongly, at the 4.7 percent level as compared with the 7.2 percent level for the full set of restrictions. When the discount rate from the cointegrating regression is used, the complete set of restrictions is rejected at the 1.1 percent level while the significance level for the dynamic restrictions is only 21.8 percent.31 For both discount rates, a regression of $\Delta Y_{t+1}$ on $SL_t$ gives a coefficient estimate with a negative sign rather than the positive sign implied by the present value model.32
These tests are "portmanteau" tests of the present value model against an unspecified alternative. We also present variance ratios in order to test against the specific alternative that stock prices "move too much" in levels or innovations. The point estimate of the levels variance ratio var($SL_t$)/var($SL'_t$) is dramatically different from unity, at 67.22, when the sample mean discount rate is used.
Not surprisingly, the variance ratio is smaller when future dividend changes are discounted at the lower rate estimated by the cointegrating regression, but it is still considerable at 4.79. However, the asymptotic standard errors on these ratios are huge, and one cannot reject the hypothesis that both of them equal unity.
The innovations variance ratios var($\xi_t$)/var($\xi'_t$) are also estimated to be larger than unity, and here the standard errors are less extreme. In the sample mean discount rate case, one can reject at the 5 percent level the hypothesis that the innovation variance ratio is unity; it is estimated to be 11.27, with a standard error of 4.49. With the lower discount rate, the ratio is estimated at 1.41, with a standard error of 0.44.
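The arithmetic behind these two statements can be checked directly. The sketch below assumes a two-sided test based on the asymptotic normal distribution, which is an assumption about the form of the test and is not stated in the text; the function name is hypothetical, and the estimates and standard errors are those reported above.

```python
import math

def ratio_z_test(estimate, std_error, null=1.0):
    """z statistic and two-sided normal p-value for H0: ratio = null."""
    z = (estimate - null) / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

z_high, p_high = ratio_z_test(11.27, 4.49)  # sample mean (8.2 percent) discount rate
z_low, p_low = ratio_z_test(1.41, 0.44)     # cointegrating regression (3.2 percent) rate
print(f"8.2 percent rate: z = {z_high:.2f}, p = {p_high:.3f}")
print(f"3.2 percent rate: z = {z_low:.2f}, p = {p_low:.3f}")
```

Under this assumption the first ratio lies a little more than two standard errors above unity, while the second lies less than one standard error above it.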
Plots of the price-dividend difference and the unrestricted VAR forecast of dividend changes give a visual image of these variance results. At an 8.2 percent discount rate (fig. 2), $SL_t$ and $SL'_t$ are negatively correlated (but there is a very large standard error on the correlation) and the excess volatility of the spread is very dramatic. At a 3.2 percent discount rate (fig. 3), $SL_t$ and $SL'_t$ have a correlation of 0.911 (with standard error 0.207) and the excess volatility is much less dramatic.33
To compare our results on volatility with results using earlier methods, we also computed sample values of $S^*_t$ using the terminal condition $S^*_T = S_T$, where $T$ is the last observation in our sample. We computed $SL^*_t$ analogously. Equation (3) implies $\sigma(S^*_t) > \sigma(S_t)$ and $\sigma(SL^*_t) > \sigma(SL_t)$. For the bond data in the period 1959-78, $\sigma(S^*_t) = 1.217$, while $\sigma(S_t) = 1.060$, so the inequality is satisfied. For the stock ...
31 Nonlinear Wald tests of the dynamic restrictions in the form (7), rather than (8), reject at less than the 0.005 percent level for the 8.2 percent discount rate and at the 7.3 percent level for the 3.2 percent discount rate.
32 The coefficient is -0.064 for the 8.2 percent discount rate and -0.079 for the 3.2 percent discount rate.
33 It should be emphasized that excess volatility of the spread $SL_t$ is not quite the same as the excess volatility discussed in Shiller (1981b). That analysis suggested that stock prices should very nearly follow a trend. If that were in fact what was observed, the spread $SL_t$ would be quite volatile because of dividend movements.
... For bonds in 1959-78, we estimated the coefficient at 0.81; for stocks at an 8.2 percent discount rate we estimated it at 0.16, while for stocks at a 3.2 percent discount rate we estimated it at 0.02.
Thus the results using S* and SL* generally support the conclusion that the present value model for bonds fits the data comparatively well, whereas the model for stocks has a poor fit even though it cannot be rejected statistically at high levels of confidence.
We close with a caveat about the plots and summary statistics gener- ...
In a simple case in which $y_t$ follows an AR(1) process with a unit root and the VAR includes one lag only, one can show that the estimated VAR companion matrix will have first column zero and second column $((1-\rho)/\theta,\ \rho)'$, where $\rho$ is a downward-biased estimate of the unit root. This companion matrix satisfies the restrictions of equation (9) almost exactly, whatever the behavior of the variable $Y_t$. A symptom of this misspecification would be that mean returns would not obey the model, even though the dynamics of returns would appear to satisfy the restrictions.
It is possible that a problem of this sort affects our results for the stock market when we use a low 3.2 percent discount rate corresponding to a high $\theta$ of 31.092. The cointegrating regression that generates this $\theta$ estimate (a regression of the level of $Y$ on the level of $y$) is dominated by the enormous postwar hump in stock prices. Since this hump coincided with a much milder hump in real dividends, the regression estimates a coefficient for $y$ that is much larger than the historical average price-dividend ratio. The negative intercept prevents the fitted value from overpredicting $Y$ over the sample period as a whole. As a result, over the bulk of the sample period, the spread $SL_t$ is distinctly negatively correlated with the lagged dividend.34 The VAR estimates place considerable weight on this earlier part of the sample period because the dividend equation is specified in terms of dividend changes that are more variable before 1946. Thus the high correlation of $SL_t$ and $SL'_t$ may be to some extent spurious. This view is supported by the results from regressing $SL^*_t$ on $SL_t$. This is a levels regression that is dominated by the postwar hump in stock prices, and here we find the coefficient to be essentially zero rather than one as required by the model. Further support comes from the fact that we strongly reject the implications of the model for the mean of the data when we impose a 3.2 percent discount rate.
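The correspondence between the cointegrating coefficient and the discount rate can be verified with a short calculation. This sketch assumes that, for stocks, the coefficient equals the reciprocal of the discount rate (that is, $\theta = \delta/(1-\delta)$ with $\delta = 1/(1+r)$); the figures quoted in the text are consistent with that relation, but the relation itself is stated here as an assumption, and the function names are hypothetical.

```python
def implied_discount_rate(theta):
    """Discount rate r implied by theta, assuming theta = 1 / r."""
    return 1.0 / theta

def implied_theta(discount_rate):
    """Coefficient theta implied by a discount rate, assuming theta = 1 / r."""
    return 1.0 / discount_rate

rate_from_regression = implied_discount_rate(31.092)  # cointegrating regression estimate
theta_from_mean_return = implied_theta(0.082)         # sample mean return of 8.2 percent
print(f"theta = 31.092 implies r = {100 * rate_from_regression:.2f} percent")
print(f"r = 8.2 percent implies theta = {theta_from_mean_return:.2f}")
```

Under this assumption the regression estimate of 31.092 corresponds to a rate of about 3.2 percent, whereas the 8.2 percent sample mean return corresponds to a much smaller coefficient of roughly 12.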

IV. Conclusion
In this paper we have shown how a present value model may be tested when the variables of the model, $y_t$ and $Y_t$, follow linear stochastic processes that are stationary in first differences rather than in levels. If the present value model is true, a linear combination of the variables, which we call the spread, is stationary. Thus $y_t$ and $Y_t$ are cointegrated. The model implies that the spread is linear in the optimal forecast of the one-period change in $Y_t$ and also in the optimal forecast of the present value of all future changes in $y_t$. We have shown how to conduct formal Wald tests of these implications.
We have also proposed an informal method for evaluating the "fit" of a present value model. A VAR is used to construct an optimal unrestricted forecast of the present value of future changes in $y_t$, and this is compared with the spread. If the model is true, the unrestricted forecast or "theoretical spread" should equal the actual spread. We computed the variances and correlation of the two variables and plotted their historical movements.
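A minimal sketch of this informal method follows. It is an illustration under stated assumptions, not the computation behind table 3: the VAR has one lag and no deterministic terms beyond demeaning, the state vector is (change in $y_t$, spread), the coefficient of proportionality `theta` is set to one, and the data are synthetic. The theoretical spread is computed as the discounted sum of VAR forecasts of future changes, which for a first-order system with matrix $A$ is $\theta\, e_1' \delta A (I - \delta A)^{-1} z_t$. All names and parameter values are hypothetical.

```python
import numpy as np

def fit_var1(z):
    """OLS estimate of A in z_{t+1} = A z_t + u_{t+1}, using demeaned data."""
    z = z - z.mean(axis=0)
    coef, *_ = np.linalg.lstsq(z[:-1], z[1:], rcond=None)
    return coef.T

def theoretical_spread(z, A, delta, theta=1.0):
    """Unrestricted VAR forecast of theta * sum_{i>=1} delta^i * dy_{t+i}."""
    n = A.shape[0]
    pick_dy = np.zeros(n)
    pick_dy[0] = 1.0                       # the first state variable is the change in y
    weights = theta * pick_dy @ (delta * A) @ np.linalg.inv(np.eye(n) - delta * A)
    return (z - z.mean(axis=0)) @ weights

# Synthetic data: AR(1) changes in y and a spread equal to the model value plus noise.
rng = np.random.default_rng(1)
T, delta, phi = 200_000, 0.97, 0.5
shocks = rng.standard_normal(T)
dy = np.empty(T)
dy[0] = shocks[0]
for t in range(1, T):
    dy[t] = phi * dy[t - 1] + shocks[t]
model_spread = delta * phi / (1.0 - delta * phi) * dy
spread = model_spread + np.sqrt(0.1 * model_spread.var()) * rng.standard_normal(T)

z = np.column_stack([dy, spread])
A_hat = fit_var1(z)
spread_theory = theoretical_spread(z, A_hat, delta)

levels_variance_ratio = spread.var() / spread_theory.var()
correlation = np.corrcoef(spread, spread_theory)[0, 1]
print(f"levels variance ratio: {levels_variance_ratio:.2f}")
print(f"correlation: {correlation:.3f}")
```

In this synthetic setting the deviation from the model is small and transitory, so the variance ratio should be only modestly above one and the correlation high.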
We applied our methods to the controversial present value models for stocks and bonds. We found that both models can be rejected statistically at conventional significance levels, with much stronger evidence against the model for bonds. However, in our data set, the spread between long- and short-term interest rates seems to move quite closely with the unrestricted forecast of the present value of future short-rate changes. This can be interpreted as evidence that deviations from the present value model for bonds are transitory. In contrast, our evaluation of the present value model for stocks indicates that the spread between stock prices and dividends moves too much and that deviations from the present value model are quite persistent, although the strength of the evidence for this depends sensitively on the discount rate assumed in the test.