Estimating the Electoral Consequences of Legislative Redistricting

We analyze the effects of redistricting as revealed in the votes received by the Democratic and Republican candidates for state legislature. We develop measures of partisan bias and the responsiveness of the composition of the legislature to changes in statewide votes. Our statistical model incorporates a mixed hierarchical Bayesian and non-Bayesian estimation, requiring simulation along the lines of Tanner and Wong (1987). This model provides reliable estimates of partisan bias and responsiveness along with measures of their variabilities from only a single year of electoral data. This allows one to distinguish systematic changes in the underlying electoral system from typical election-to-election variability.

declining responsiveness of the U.S. House of Representatives to vote swings (Gelman and King, in press; King and Gelman, in press).
Our statistical methodology involves a hierarchical random-effects model with a mixture of Bayesian and non-Bayesian estimation, summarized probabilistically. Our Bayesian computation requires simulation along the lines of Tanner and Wong (1987).

THE DATA
We analyze the votes received by Democratic and Republican candidates for the lower house of the legislatures of Ohio, Connecticut, and Wisconsin, in the seven elections held in even-numbered years from 1968 through 1980. All elections in these states were by plurality vote in single-member districts, and, except for two districts in Wisconsin in 1980, were won by one of the two major-party candidates. As a result of redistricting in the 1960s, all districts had roughly equal populations. As a sample of our data, Table 1 shows votes in each district election in Ohio in 1972 and 1974. (Our data are available from the Inter-university Consortium for Political and Social Research.) The Democrats controlled the 1971 Ohio redistricting process and redrew the 99 districts. Connecticut had 177 districts in 1968-1970; during the 1971 redistricting, the number of districts was reduced to 151 and the Republicans controlled where the lines were drawn. Wisconsin's 100 districts were redrawn in 1971 by bipartisan agreement.
For convenience, we will henceforth refer to the Democratic proportion of the two-party vote for a given district election as the district vote. We label the average of these proportions, over all districts in a given state and election, as the average district vote.
Some district elections feature a single candidate with insignificant opposition or none at all. We refer to such an election as uncontested if one candidate gets more than 95% of the two-party vote. The proportion of uncontested elections among all of the district elections varies greatly over the three states and seven election years, with an average of 10% of the seats uncontested in any election. No statewide election in our study had more than 23% uncontested seats, except for Wisconsin in 1980, with 32%. Election returns in uncontested districts do not adequately reflect support for the two political parties. Since we are interested in this party support, we define the effective vote in the case of uncontested districts to be the (unobserved) proportion of the two-party vote that this candidate would have won in his or her district had the election been contested. We approximate the probability density of the effective vote with a stem-and-leaf plot of the vote proportions received by a party in a contested district, one election before an uncontested win by that party in that district. Figure 1 presents this plot, based on data from 1968-1980 in the three state legislatures considered in this article.
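The definitions above (district vote, average district vote, and the 95% rule for uncontested elections) can be sketched in a few lines of Python. This is only an illustration: the vote counts are made up, and the function names `district_vote` and `is_uncontested` are hypothetical, not taken from the authors' code.

```python
def district_vote(dem_votes, rep_votes):
    """Democratic proportion of the two-party vote in one district."""
    return dem_votes / (dem_votes + rep_votes)


def is_uncontested(vote, threshold=0.95):
    """True if either party received more than `threshold` of the two-party vote."""
    return vote > threshold or vote < 1.0 - threshold


# Made-up returns for four districts: (Democratic votes, Republican votes).
returns = [(12000, 8000), (5000, 15000), (9000, 0), (10000, 10000)]

votes = [district_vote(d, r) for d, r in returns]
uncontested = [is_uncontested(v) for v in votes]

# Raw average of the district votes. The paper replaces the raw vote in
# uncontested districts with an (unobserved) effective vote; this sketch does not.
average_district_vote = sum(votes) / len(votes)
```

Note that the third district would be flagged as uncontested, so its raw vote of 1.0 would not be used directly in the analysis.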

DATA SUMMARIES AND EXPLORATION
Previous work in this field has involved various theoretical constructs and related data summaries, but extremely few statistical models. One early concept is the "swing ratio": the change in the proportion of legislative seats won by a party (S), divided by the change in the average district vote (V) received (Ansolabehere, Brady, and Fiorina 1988; Kendall and Stuart 1950). This concept was expanded to the "seats-votes curve," which is the fraction of the legislative seats won by a party, as a function of the average district vote (Niemi and Fett 1986; Quandt 1974). This curve can be expressed as the function S(V), where the variables for fraction of seats won and average district vote each vary from 0 to 1. Figure 2 presents two examples of seats-votes curves. One reflects de facto statewide proportional representation, where S = V. The other represents a highly responsive electoral system near the middle of the votes scale, where most elections are usually decided. Following King and Browning (1987) and King (1989), we consider these two symmetric seats-votes curves to represent electoral systems that are fair to the political parties. Deviation from bipartisan symmetry is considered partisan bias.
Of course, a party's legislative representation is not a function only of the number of votes it receives; a deterministic seats-votes curve, as defined, cannot be more than a theoretical construct (Tufte 1973). For this reason, we define the seats-votes curve in real electoral systems to be the expected value of S, as a function of V, and we will be interested in both this conditional expectation function and variability around it. Responsiveness and bias can be defined more formally in terms of this expected seats-votes curve [Eqs. (1)].
Past researchers have empirically estimated bias and responsiveness in two ways. The most widely used method uses the statewide Democratic fraction of seats won and the average statewide district vote for a legislature for each of several consecutive elections. One can estimate the seats-votes curve by fitting a nonlinear regression to a scatterplot of these values, and one can calculate summaries of interest from this estimated curve. This method has the disadvantage of ignoring short-term systematic changes in the underlying electoral system, as might result from redistricting. Since only five elections are generally held between redistricting processes, this method is quite limited for present purposes.
The second method, dating back to Butler (1951) [see also Gudgin and Taylor (1979)], creates a "hypothetical" seats-votes curve from the district votes of a single statewide election. This curve plots S(V), under the assumption of "uniform partisan swing"; that is, as the statewide vote V changes, the vote proportion in each district changes by the same amount. This method breaks down with district votes near 0 or 1 and, in general, is based on an overly strict assumption about voting patterns.
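The "uniform partisan swing" construction described above can be illustrated with a short sketch. The district votes are made up, and the function name `uniform_swing_curve` is hypothetical; the clipping to [0, 1] is our own device to show where the method breaks down, not something the cited authors prescribe.

```python
def uniform_swing_curve(district_votes, swings):
    """Hypothetical seats-votes points under uniform partisan swing.

    Every district vote is shifted by the same amount `d`. Shifted votes
    are clipped to [0, 1], which is exactly where the method breaks down.
    """
    n = len(district_votes)
    curve = []
    for d in swings:
        shifted = [min(1.0, max(0.0, v + d)) for v in district_votes]
        avg_vote = sum(shifted) / n
        seat_share = sum(1 for v in shifted if v > 0.5) / n
        curve.append((avg_vote, seat_share))
    return curve


votes = [0.30, 0.42, 0.48, 0.55, 0.70]  # made-up district votes
curve = uniform_swing_curve(votes, [-0.10, 0.0, 0.10])
```

With these illustrative numbers, a ten-point swing toward the Democrats moves the average district vote from .49 to .59 and the seat share from 2 of 5 to 4 of 5.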
Before describing our stochastic model, we give some exploratory data summaries. We are interested in the distribution of district vote across a state. Figure 3 shows a stem-and-leaf plot of the district votes for the contested elections in Ohio in 1972. This pattern of two main humps with irregular outliers is typical of recent U.S. legislative elections. We identify the two humps with Democratic and Republican "safe seats," and we identify the irregular pattern with the irregular influences of geography on election districts and individual candidates on election results. Sometimes such a plot for an election shows only one main hump in the middle; this corresponds to a competitive system with few safe seats.
Finally, we would like to know how much partisan voting patterns persist from election to election. As an example of this, Figure 4 shows a scatterplot of district vote proportions for contested elections in Ohio in 1972 and 1974. (Each point on the plot represents one district.) Note that district votes clearly do not move exactly according to "uniform partisan swing"; if they did, all the points would fall precisely on a single line with slope 1. Instead, the points in Figure 4 are scattered around a straight line with slope 1 and intercept equal to the statewide vote swing. We interpret the residual standard deviation in this figure to be within-district random variation about the statewide average vote swing. (A nonuniform shift would be apparent if the points in Fig. 4 fit a clearly nonlinear pattern or no pattern at all.)

A PROBABILISTIC MODEL
To avoid problems with vote proportions near 0 or 1, we work with the logit of district votes in contested elections. We label $v_{it}$ as the Democratic vote in district $i$ and election $t$, and $u_{it} = \mathrm{logit}(v_{it}) = \ln[v_{it}/(1 - v_{it})]$ for contested elections. (For uncontested elections, $u_{it}$ is the logit of the unobserved effective Democratic vote. This will be dealt with in Sec. 5.1.) Our linear model, fit to a single state, is

$$u_{it} \sim N(\gamma_i + \delta_t, \sigma^2), \qquad (2)$$

where $\gamma_i$ is a district effect, $\delta_t$ is a statewide election effect, and the Normal distributions are independent. We assume, therefore, that vote swings about the statewide mean are spatially independent across districts. More information about individual districts might enable one to better characterize district-level vote swings. Unfortunately, these data have not been collected, and it would be quite difficult to do so. Modeling districts with additional information such as spatial correlation or covariates, if they were available, would probably yield more accurate estimates of the seats-votes curve. Omitting this unavailable information is unlikely to systematically bias our results.
From the logit effective vote proportions $u_t = (u_{1t}, \ldots, u_{nt})$ for an election $t$, we define the aggregate Democratic proportions of votes and seats:

$$V_t = \frac{1}{n}\sum_{i=1}^{n} v_{it}, \qquad S_t = \frac{1}{n}\sum_{i=1}^{n} s_{it}, \qquad (3)$$

where $v_{it} = \mathrm{logit}^{-1}(u_{it})$, and $s_{it} = 1$ if $v_{it} > .5$ and 0 otherwise. We consider the vector $\gamma = (\gamma_1, \ldots, \gamma_n)$, along with the variance $\sigma^2$, to identify an "electoral system." We will summarize this system by the seats-votes curve $E(S_t \mid V_t, \gamma)$, its variance $\mathrm{var}(S_t \mid V_t, \gamma)$, and functions of these such as the bias and responsiveness functions. Since the elements of $\gamma$ remain unknown, we model them as random effects by letting the $\gamma_i$'s be distributed as a three-point Normal-mixture distribution with a prior distribution, all described in Section 5.2. We then average over our uncertainty in $\gamma$ as represented by this distribution.
The foregoing model is applied to a single observed statewide election, labeled $t = 0$, with observations $u_{i0}$ ($i = 1, \ldots, n$) and the assignment $\delta_0 = 0$. This assignment is arbitrary and does not affect our estimates of the seats-votes curve. If an arbitrary constant were added to each effective district vote $u_{i0}$, our results would not change. A family of "hypothetical election" results $u_t$ is defined by the linear model, applied to a range of statewide vote shifts $\delta_t$. This assumption that most electoral districts respond approximately as the statewide total does is widely accepted in the political science literature (Butler 1951; Niemi and Fett 1986), although it has not been formalized statistically. Our data, such as those in Figure 4, are consistent with this pattern. This is also consistent with our assumption in Equation (2) of no interaction between $\gamma_i$ and $\delta_t$.
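To make the family of "hypothetical elections" concrete, the following sketch draws logit district votes as Normal around a district effect plus a statewide shift, then aggregates them into vote and seat proportions. It assumes the model has the form $u_{it} \sim N(\gamma_i + \delta_t, \sigma^2)$; the district effects are made up, and the function name `simulate_election` is hypothetical.

```python
import math
import random


def logit(v):
    return math.log(v / (1.0 - v))


def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))


def simulate_election(gamma, delta, sigma, rng):
    """Draw u_it ~ N(gamma_i + delta, sigma^2); return (V_t, S_t)."""
    u = [rng.gauss(g + delta, sigma) for g in gamma]
    votes = sum(inv_logit(x) for x in u) / len(u)   # average district vote
    seats = sum(1 for x in u if x > 0.0) / len(u)   # share of seats with vote > .5
    return votes, seats


rng = random.Random(0)
# Made-up district effects on the logit scale.
gamma = [logit(v) for v in (0.30, 0.42, 0.48, 0.55, 0.70)]
sigma = 0.22  # illustrative value on the logit scale
results = {d: simulate_election(gamma, d, sigma, rng) for d in (-0.5, 0.0, 0.5)}
```

Setting `sigma` to 0 in this sketch reduces it to uniform partisan swing on the logit scale.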
We apply this model to our data in four steps.
1. Preliminary Estimation. With data from several consecutive elections, we estimate the global parameters of the model. These include $\sigma^2$ and uncontested effective vote parameters $\mu_{un}$ and $\sigma_{un}$, described in Section 5.
2. Bayesian Estimation for a Single Election. We condition on the data $u_0 = (u_{i0};\ i = 1, \ldots, n)$ from a single election to sample from the posterior distribution $P(\gamma \mid u_0)$ of the vector $\gamma$. This Bayesian estimation uses the parameters determined in the previous step.
3. The Seats-Votes Curve. We average over $P(\gamma \mid u_0)$ to estimate the posterior seats-votes curve:

$$E(S_t \mid V_t, u_0) = E[\,E(S_t \mid V_t, \gamma) \mid u_0\,]. \qquad (4)$$

(We allow $V_t$ to range from 0 to 1 by allowing $\delta_t$ to range from $-\infty$ to $\infty$ on the logit scale.) We estimate the expected variance of results across hypothetical elections:

$$E[\,\mathrm{var}(S_t \mid V_t, \gamma) \mid u_0\,]. \qquad (5)$$

We also estimate uncertainty in the seats-votes curve due to our uncertainty in $\gamma$:

$$\mathrm{var}[\,E(S_t \mid V_t, \gamma) \mid u_0\,]. \qquad (6)$$

4. Summaries. From the estimated seats-votes curve (4) and related conditional expectations, we estimate bias and responsiveness summaries of the definitions in (1): the average bias between V = .45 and V = .55, and the average responsiveness between V = .45 and V = .55 (7). We define these summaries from V = .45 to V = .55. This is a convenient range, symmetric about .5, within which most statewide votes fall. We calculate the posterior mean and variance of these summaries.
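The summaries in step 4 can be illustrated numerically. The sketch below rests on two assumptions that are ours, not confirmed by the text: responsiveness at V is taken as the slope of the expected seats-votes curve, and bias at V as half the deviation from symmetry, {S(V) - [1 - S(1 - V)]}/2. The curve `seats_votes` (logit of seats linear in logit of votes) is only a stand-in for an estimated curve; it is not the paper's model.

```python
import math


def seats_votes(v, rho=3.0, lam=0.0):
    """Illustrative curve: logit(S) = lam + rho * logit(V). Not the paper's model."""
    x = lam + rho * math.log(v / (1.0 - v))
    return 1.0 / (1.0 + math.exp(-x))


def average_responsiveness(curve, lo=0.45, hi=0.55):
    """Average slope of the curve between lo and hi (assumed definition)."""
    return (curve(hi) - curve(lo)) / (hi - lo)


def average_bias(curve, lo=0.45, hi=0.55, steps=100):
    """Average of {S(V) - [1 - S(1 - V)]} / 2 on a midpoint grid (assumed definition)."""
    total = 0.0
    for k in range(steps):
        v = lo + (k + 0.5) * (hi - lo) / steps
        total += (curve(v) - (1.0 - curve(1.0 - v))) / 2.0
    return total / steps


fair = average_bias(seats_votes)                          # symmetric curve
tilted = average_bias(lambda v: seats_votes(v, lam=0.4))  # curve shifted toward Democrats
proportional = average_responsiveness(lambda v: v)        # S = V
steep = average_responsiveness(seats_votes)               # highly responsive near V = .5
```

Under these assumed definitions, a symmetric curve has zero bias, and proportional representation has responsiveness 1.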

Election-to-Election Variability
Our linear model creates hypothetical district election results $u_{it}$ from the district effects $\gamma_i$ by adding a constant shift $\delta_t$ to the mean in every district. From here, we add the variability in (2); this "unexplained" variance $\sigma^2$ determines the scope of the electoral system identified with the family of hypothetical elections. Setting $\sigma^2 = 0$, for example, causes the district effects to be exactly identified: $\gamma_i = u_{i0}$. This assumption of "uniform partisan swing" on the logit scale cannot hope to fit more than a single statewide election.
We estimate $\sigma^2$ from a model of the variances in real district-level election results, across time. We use the following conceptual model: (variance between two elections, $Y$ years apart) = (variance due to randomness in individual elections) + (variance due to changes in the underlying electoral system).
In this framework, the first term on the right side of this equality is $2\sigma^2$; we imagine the second quantity to be roughly proportional to $Y$. Note that, from (2), the difference $u_{it_1} - u_{it_2}$ has variance $2\sigma^2$ if their two Normal distributions are independent.
For each state, we calculate the sample variance $s^2_{t_1 t_2}$ of the change in district vote between election years $t_1$ and $t_2$, for each pair of election years within the same districting plan; that is, we do not track district votes across redistricting. We then fit a linear regression of the values $s^2_{t_1 t_2}$ as a function of the time differences $(t_2 - t_1)$. For each state, our estimate of $2\sigma^2$ is just the estimate of the constant term in this regression, with the estimate of the regression slope pooled across the three states. This yields estimates of $\sigma$ (on the logit scale) as .22, .19, and .22 for Ohio, Connecticut, and Wisconsin, respectively, each with a standard error of estimation of .02.
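The regression step can be sketched as follows for a single state. This is a simplified illustration: the sample variances are made up, the slope is fit for one state rather than pooled across three, and the function name `fit_line` is hypothetical. The intercept of the fitted line is read as an estimate of twice the election-to-election variance.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up sample variances of changes in logit district vote,
# paired with the number of years between the two elections.
year_gaps = [2, 2, 2, 4, 4, 6]
variances = [0.105, 0.095, 0.100, 0.108, 0.112, 0.120]

intercept, slope = fit_line(year_gaps, variances)
sigma_hat = math.sqrt(intercept / 2.0)  # intercept estimates 2 * sigma^2
```

With these illustrative inputs the intercept is .09, giving an estimate of about .21 for the standard deviation on the logit scale.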

The Distribution of District Effects $\gamma_i$
We need to estimate the vector $\gamma$ of district effects and our uncertainty in it. Embedding $\gamma$ in a lower-dimensional probabilistic model allows us to estimate these $n$ district effects from the $n$ data points $u_{i0}$; we can also then conveniently summarize our results in a posterior distribution.
We consider the district effects to be drawn from a mixture of three Normal distributions, identified by an eight-dimensional parameter $\theta = (\mu_j, \rho_j^2 - \sigma^2, \lambda_j;\ j = 1, 2, 3)$ of means, variances, and mixture proportions, with the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$. These three humps are meant to fit plots like Figure 3, with areas of Democratic strength, areas of Republican strength, and some districts that fit no clear pattern. The parameter $\rho_j^2$ is the variance of the $j$th Normal distribution in the density of observed district vote proportions $u_{i0}$; $(\rho_j^2 - \sigma^2)$ is the variance of the $j$th Normal distribution in the density of expectations $\gamma_i$.
The method of maximum likelihood is inadequate to estimate these eight parameters, since the likelihood function is unbounded. Therefore, we give the eight parameters a prior distribution and move to Bayesian estimation. It is mathematically convenient, and substantively sufficient, to choose a family conjugate to an $N(\gamma_i, \sigma^2)$ distribution, with a prior distribution on $(\mu_j, \rho_j^2 - \sigma^2)$ for $j = 1, 2, 3$ and a Dirichlet prior distribution on $(\lambda_1, \lambda_2, \lambda_3)$ (8).
Table 2 specifies these distributions; we have chosen these hyperparameters based on our substantive knowledge and from inspection of stem-and-leaf plots like Figure 3 for many statewide elections (King and Gelman in press). When possible, we approximate to make prior assumptions about $\theta$ vague rather than overly restrictive. Note that the prior distribution for $\gamma_i$ is symmetric about 0, hence treating the political parties equally. We allow the parameters $\gamma$ and $\theta$ to change each election year.
Finally, we truncate this distribution so that $(\rho_j^2 - \sigma^2) \ge 0$ for $j = 1, 2, 3$.

Uncontested Elections
For an uncontested Democratic district election, we approximate the uncertainty in the effective vote by the information in the stem-and-leaf plot of Figure 1. We then fit this to a Normal density on the logit scale; that is, for each uncontested seat $i$,

$$u_{it} \sim N(\mu_{un}, \sigma_{un}^2).$$

Our data yield the estimates $(\hat{\mu}_{un}, \hat{\sigma}_{un}) = (.74, .57)$. Assuming this distribution to be independent of $u_{it}$ in Equation (2), we get another Normal distribution for the uncontested district effects:

$$\gamma_i \sim N(\mu_{un}, \sigma_{un}^2 - \sigma^2), \qquad (9)$$

where $\sigma_{un}^2 > \sigma^2$. We then truncate this distribution to be all-positive, so that an uncontested seat will always favor the winning party. We also symmetrically define $\gamma_i$ for a Republican uncontested district to be distributed as $N(-\mu_{un}, \sigma_{un}^2 - \sigma^2)$, truncated to be negative. (Recall that 0 on the logit scale is .5 on the votes scale.)
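A truncated Normal draw of this kind can be sketched with simple rejection sampling. The sampler below is our own illustration, not the authors' procedure: the constants 0.74 and 0.57 are our reading of the estimates reported in the text and should be treated as assumptions, and the function name `sample_uncontested_effect` is hypothetical.

```python
import math
import random

# Assumed values for the uncontested-vote mean and standard deviation (logit scale).
MU_UN, SIGMA_UN = 0.74, 0.57


def sample_uncontested_effect(sigma, democratic_winner, rng):
    """Draw a district effect for an uncontested seat.

    Normal with mean MU_UN and variance SIGMA_UN^2 - sigma^2, truncated to be
    positive; the sign is flipped for a Republican winner (the mirror image).
    """
    sd = math.sqrt(SIGMA_UN ** 2 - sigma ** 2)  # requires SIGMA_UN > sigma
    while True:
        g = rng.gauss(MU_UN, sd)
        if g > 0.0:  # reject draws on the losing party's side
            return g if democratic_winner else -g


rng = random.Random(3)
dem_draws = [sample_uncontested_effect(0.22, True, rng) for _ in range(200)]
rep_draws = [sample_uncontested_effect(0.22, False, rng) for _ in range(200)]
```

Rejection is cheap here because most of the untruncated distribution already lies on the winner's side of zero.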

BAYESIAN ESTIMATION FOR A SINGLE ELECTION
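We summarize posterior distributions by sampling, in the following order, from the posterior distributions of the hyperparameters $\theta$, the district effects $\gamma$, and the hypothetical election results $u_t$. Together, these steps amount to sampling from the desired posterior distribution of election results. (All of these distributions are of course conditional on the parameters specified in Sec. 5.)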

Averaging Over Uncertainty in $\theta$
The likelihood function $P(u_0 \mid \theta)$ is the product of $n$ independent densities: $u_{i0} \sim \text{Normal-mixture}(\mu_j, \rho_j^2, \lambda_j;\ j = 1, 2, 3)$. The posterior density $P(\theta \mid u_0)$ is cumbersome, because of the Normal-mixture terms in the likelihood. Direct sampling or numerical integration over this eight-dimensional distribution seems impossible. With a Normal likelihood, however, simulation of $\theta$ would be easy. We exploit this possibility through the data augmentation algorithm of Tanner and Wong (1987).

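Next, we sample from $P(\theta \mid u_0)$ in two steps, using the intermediate variable $s = (s_1, \ldots, s_n)$, where $s_i$ indicates the mixture component to which district $i$ belongs: (1) sample $s$ from $P(s \mid u_0)$, and (2) sample $\theta$ from $P(\theta \mid s, u_0)$.
Step 1 is intractable as stated but would be easy if $\theta$ were known, because

$$P(s_i = j \mid \theta, u_0) = \frac{\lambda_j \phi_{ij}}{\sum_{k=1}^{3} \lambda_k \phi_{ik}}, \qquad i = 1, \ldots, n,$$

where

$$\phi_{ij} = \frac{1}{\rho_j}\, \phi\!\left(\frac{u_{i0} - \mu_j}{\rho_j}\right)$$

and $\phi$ is the standard Normal density function. In our application of the data augmentation algorithm, we simulate a single random sample $\theta^*$ from $P(\theta \mid u_0)$, as follows.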
1. Choose a reasonable starting point for $\theta^*$. We use the posterior maximum of $P(\theta \mid u_0)$, which we estimate by the EM algorithm (Dempster, Laird, and Rubin 1977), again treating $s$ as unobserved data.
2. Repeat the following steps a number of times: (a) sample $s^*$ from $P(s \mid \theta = \theta^*, u_0)$ and (b) sample $\theta^*$ from $P(\theta \mid s = s^*, u_0)$. For our data, the distribution of simulated values $\theta^*$ appears to converge after 10 iterations.
Increasing the number of iterations did not noticeably change the distribution of simulated values of $\theta^*$ or our final results.
Iterations of this procedure yield approximately independent random samples from the posterior distribution of $\theta$. We found that 50 iterations provided sufficient precision.
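Step (a) of the data augmentation loop can be sketched as below: for a fixed parameter value, each district's component indicator is drawn with probability proportional to the mixture weight times the Normal density of its observed logit vote. The parameter values and data are made up, components are indexed 0 to 2, and the function names are hypothetical. Step (b), the draw of the parameters given the indicators, depends on the prior in Table 2 and is not attempted here.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probs(u, mus, rhos, lams):
    """Probability that a district with logit vote u belongs to each component."""
    w = [lam * normal_pdf(u, mu, rho) for mu, rho, lam in zip(mus, rhos, lams)]
    total = sum(w)
    return [x / total for x in w]


def sample_indicators(u0, mus, rhos, lams, rng):
    """Step (a): draw each indicator independently from its membership probabilities."""
    draws = []
    for u in u0:
        p = membership_probs(u, mus, rhos, lams)
        draws.append(rng.choices([0, 1, 2], weights=p)[0])
    return draws


# Made-up parameter value: a Republican hump, a Democratic hump,
# and a wide third component for districts that fit no clear pattern.
mus, rhos, lams = [-0.4, 0.5, 0.0], [0.2, 0.2, 1.0], [0.45, 0.45, 0.10]
rng = random.Random(1)
s_star = sample_indicators([-0.45, 0.55, 2.0], mus, rhos, lams, rng)
```

In this example the outlying district at 2.0 on the logit scale is assigned to the wide third component with probability very close to 1.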

Averaging Over Uncertainty in $\gamma$
We can factor the conditional posterior density as follows:

$$P(\gamma_i \mid \theta, u_{i0}) \propto P(u_{i0} \mid \gamma_i)\, P(\gamma_i \mid \theta).$$

The first factor here is just the Normal error density from the model (2), and the second factor is the Normal-mixture density parameterized by $\theta$. Their product yields a new Normal-mixture density with easily calculated parameters for each district; we sample from these independent distributions.
For each uncontested district, we simulate $\gamma_i$ from the truncated Normal distribution (9). We combine these with the simulated values $\gamma_i$ for contested districts to get a sample vector $\gamma$ from its posterior distribution.
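The "easily calculated parameters" of the posterior mixture for a contested district follow from standard Normal-Normal conjugacy. The sketch below assumes a prior on the district effect that is a mixture of Normals with component variances equal to the observed-vote variance minus the error variance, and a Normal error with known variance; the numbers are made up and the function name `posterior_mixture` is hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mixture(u, sigma2, mus, rho2s, lams):
    """Posterior mixture for a district effect given its observed logit vote u.

    Prior: mixture of N(mu_j, rho2_j - sigma2) with weights lam_j.
    Error: u given the district effect is N(effect, sigma2).
    Returns a list of (weight, mean, variance), one entry per component.
    """
    weights, comps = [], []
    for mu, rho2, lam in zip(mus, rho2s, lams):
        omega2 = rho2 - sigma2                          # prior variance in component j
        weights.append(lam * normal_pdf(u, mu, rho2))   # marginal density of u in component j
        post_var = 1.0 / (1.0 / omega2 + 1.0 / sigma2)  # precisions add
        post_mean = post_var * (mu / omega2 + u / sigma2)
        comps.append((post_mean, post_var))
    total = sum(weights)
    return [(w / total, m, v) for w, (m, v) in zip(weights, comps)]


# Made-up three-component example for one district with observed logit vote 0.3.
post = posterior_mixture(0.3, 0.05, [-0.4, 0.5, 0.0], [0.09, 0.09, 1.0], [0.45, 0.45, 0.10])
```

Each posterior component mean is a precision-weighted compromise between the observed logit vote and the component's prior mean.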

Averaging Over $u_t$
To estimate the seats-votes curve and its variability, we first approximate the first two moments of the joint conditional distribution $P(V_t, S_t \mid \gamma, \delta_t)$, for several values of $\delta_t$. Figure 5 provides an intuitive sense of our model and sampling procedure by plotting several simulated values $u_{it}$ for $\delta_t = 0$, as a function of observed district votes $u_{i0}$, for Ohio in 1972. Note the assumed distribution of effective votes for the uncontested districts.
The aggregate votes and seats are averages [Eqs. (3)] of their district-level counterparts $v_{it}$ and $s_{it}$, which in turn depend on $\gamma_i$ and $\delta_t$ only through their mean $\alpha_{it} = \gamma_i + \delta_t$. Thus the desired conditional moments can be expressed in terms of district-level expectations of $v_{it}$, $s_{it}$, $v_{it}^2$, and $v_{it} s_{it}$, given $\alpha_{it}$. Some of the foregoing integrals are immediately evaluated through the standard Normal distribution function $\Phi$; we calculate the rest by approximating the inverse logit function $e^u/(1 + e^u)$ by a third-degree polynomial in $u$.
We now approximate the seats-votes curve $E(S \mid V)$ versus $V$ by the function defined by $E(S_t \mid \alpha_t)$ versus $E(V_t \mid \alpha_t)$, implicitly parameterized by $\alpha_t$ (or, equivalently, by the scalar $\delta_t$). Similarly, we approximate the variance as follows:

$$\mathrm{var}(S_t \mid V_t) \approx \mathrm{var}(S_t \mid \alpha_t) - \frac{\mathrm{cov}(V_t, S_t \mid \alpha_t)^2}{\mathrm{var}(V_t \mid \alpha_t)}.$$

This variance depends on $V_t$ and is parameterized by $\delta_t$ in the foregoing expression. The formula would be exactly correct if $S_t$ and $V_t$ were jointly Normally distributed, and it is a reasonable approximation for our problem.
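The conditional-variance approximation can be checked numerically with a small simulation. Note the swap of technique: the paper evaluates the required moments analytically (with a polynomial approximation to the inverse logit), whereas this sketch estimates them by Monte Carlo from simulated hypothetical elections. The district means are made up and the function names are hypothetical.

```python
import math
import random


def simulate_vs(alpha, sigma, rng):
    """One hypothetical election: return (V, S) for district means alpha."""
    u = [rng.gauss(a, sigma) for a in alpha]
    v = sum(1.0 / (1.0 + math.exp(-x)) for x in u) / len(u)
    s = sum(1 for x in u if x > 0.0) / len(u)
    return v, s


def conditional_seat_variance(pairs):
    """Return (var(S) - cov(V, S)^2 / var(V), var(S)) from simulated (V, S) pairs."""
    n = len(pairs)
    mv = sum(v for v, _ in pairs) / n
    ms = sum(s for _, s in pairs) / n
    var_v = sum((v - mv) ** 2 for v, _ in pairs) / (n - 1)
    var_s = sum((s - ms) ** 2 for _, s in pairs) / (n - 1)
    cov = sum((v - mv) * (s - ms) for v, s in pairs) / (n - 1)
    return var_s - cov ** 2 / var_v, var_s


rng = random.Random(2)
alpha = [-0.8, -0.3, -0.05, 0.1, 0.4, 0.9]  # made-up district means on the logit scale
pairs = [simulate_vs(alpha, 0.22, rng) for _ in range(2000)]
cond_var, marg_var = conditional_seat_variance(pairs)
```

Conditioning on the vote can only reduce the variance of the seat share, so `cond_var` should not exceed `marg_var`.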

Calculating Summaries
Finally, we simulate several vectors $\gamma$ from the posterior density $P(\gamma \mid u_0)$. Each of these samples determines an electoral system, for which we approximate the seats-votes curve and its variance, as described previously. From the seats-votes curve, we calculate the bias and responsiveness of the system between 45% and 55% [Eqs. (7)]. Finally, we estimate the bias and responsiveness of the true electoral system, and our uncertainty in these quantities, with the sample mean and variance of these values, over the many independent samples of $\gamma$.
All computations were done in the Gauss computer language on an IBM PS/2.

RESULTS
The procedure described in Section 6 produces estimates of an electoral system from the results of a single statewide election. This includes estimates of the seats-votes curve, its variability, and summaries such as the bias and responsiveness functions. Our model assumes that district votes move in an approximately uniform manner as the statewide vote totals change. Because of the lack of information, we assume the absence of spatial correlation. Finally, we assume that the district votes roughly follow a three-hump distribution specified by our family of prior distributions. Within these constraints, our model is quite general and fits recent legislative electoral data quite well.
An example of the complete results appears in Figure 6. We calculate bias and responsiveness results for each election from 1968 to 1980, using Formula (7).
The results for all seven years in Ohio appear in Figure 7, where responsiveness is plotted by partisan bias. Pooled standard error estimates appear in the lower left of the figure. The black square marks 1968, a year of moderate responsiveness but with an extreme bias favoring the Republicans. The next square is 1970, which is close to and within two standard errors of 1968. In 1971, the Democrats controlled the redistricting process, dramatically affecting Ohio's electoral system: the dotted line drawn between 1970 and 1972 marks a large change in bias. The other change in the figure is a noticeable trend after redistricting toward lower responsiveness.
The changes in Connecticut's electoral system are portrayed in Figure 8. All of the years in Connecticut have electoral systems that are quite responsive, particularly compared with Ohio. In 1968 and 1970, Connecticut had essentially no partisan bias. The 1971 redistricting was controlled by the Republicans, and their effect in biasing the system in their favor seems quite dramatic, again much beyond what one would expect due to random variability. This dramatic effect seems ephemeral, however, since over the course of the rest of the decade the electoral system worked its way back to just about where it began. The Republican gerrymanderers in Connecticut were obviously not as successful as their Democratic counterparts in Ohio. We speculate that the pattern of incumbency retirements accounts for this difference, particularly since the Watergate landslide in 1974 helped to defeat many Republican state legislators.
Figure 9 portrays Wisconsin's electoral system. Since neither party controlled both houses of the state legislature, Wisconsin was redistricted by a bipartisan agreement between the parties. Redistricting thus has a quite predictable non-effect on the system: the change from 1970 to 1972 is no greater than most other changes between consecutive elections in this graph. Political scientists have speculated that bipartisan redistricters primarily try to protect incumbents; with fewer seats of both parties vulnerable to electoral swings, this would decrease responsiveness (Mayhew 1971). Surprisingly, Wisconsin's responsiveness changes no more across redistricting than between any other two consecutive elections. Of course, responsiveness in Wisconsin started from a low base; perhaps redistricters could not reduce responsiveness any further due to the geographic pattern of voters in the state.
When controlling the redistricting process, partisans have successfully biased the electoral system in their favor, at least in the short term. A glance at Figures 7-9 shows that redistricting had no systematic effect on responsiveness in any of the three states.
All previous seats-votes models have been either deterministic, entirely theoretical, or averaged over many elections. Some have ignored partisan bias and either fit responsiveness or fixed it to the value of 3.0; other models have assumed the electoral system to be constant over several elections. We explicitly model variability and generate estimates and standard errors of bias and responsiveness for each statewide election. A comparison of the changes between elections with the standard errors in Figures 7-9 leads us to reject deterministic models and those with constant bias and responsiveness.

[Received November 1988. Revised September 1989.]

Table 1. Votes Received by Democrats and Republicans in Ohio Legislative House Districts, 1972 and 1974 (columns: District, Democrat, Republican).

Figure 1. Stem-and-Leaf Plot of the Proportion of the Vote Received by a Party in a Contested District Election, Immediately Preceding an Election in Which That Party Was Unopposed in That District.

Figure 3. Stem-and-Leaf Plot of the Democratic Proportion of the Two-Party Vote in Contested District Elections in Ohio, 1972.