Equalizing Outcomes and Equalizing Opportunities: Optimal Taxation When Children's Abilities Depend on Parents' Resources

Empirical research suggests that parents’ economic resources affect their children’s future earnings abilities. Optimal tax policy therefore treats future ability distributions as endogenous to current taxes. We model this endogeneity, calibrate the model to match estimates of the intergenerational transmission of earnings ability in the United States, and use the model to simulate such an optimal policy numerically. The optimal policy in this context is more redistributive toward low-income parents than existing U.S. tax policy. It also increases the probability that low-income children move up the economic ladder, generating a present-value welfare gain of one and three-quarters percent of consumption in our baseline case.


Introduction
Economists have long recognized that parents' resources and investment in their children may be key determinants of their children's outcomes (Arleen Leibowitz 1974; Gary Becker and Nigel Tomes 1976; Becker 1981; Becker and Tomes 1986). This paper is motivated by recent evidence that increasing the disposable incomes of poor parents raises the performance of their children on tests of cognitive ability. That finding suggests that current tax policy may affect the future distribution of underlying income-earning abilities in the taxpayer population, the key determinant of how difficult a tradeoff between efficiency and equality society will face in the future. The dominant modern model of optimal taxation is unable to take this effect into account, as it assumes that the distribution of ability is entirely exogenous. Our paper is an analytical and numerical exploration of the implications for optimal policy of relaxing this assumption.
First, we generalize a standard dynamic Mirrleesian optimal tax model to include the effect of parental disposable income on children's abilities. In the standard model of James Mirrlees (1971), ability is exogenously given. In our model, a child's ability depends on three components: parental ability, which is exogenous to the parent and child; parental disposable income, which is endogenously chosen by parents given the tax system; and a stochastic shock. These components imply that the process generating children's skills in our model is partly exogenous, partly endogenous, and stochastic. By combining these components, our model introduces a novel element to the recent literature on dynamic optimal taxation that seeks, among other goals, to capture the complex process by which society's ability distribution is determined. 1
Using this model, we derive analytical conditions that reveal the key effects of endogenous ability on optimal intratemporal and intertemporal policy. On the intratemporal margin, we find contradictory forces at play. First, marginal income tax rates are lower on parents whose economic resources matter more for their children's expected abilities. Lower marginal tax rates encourage greater parental earnings and disposable income, and because of endogenous ability, these parents thus produce higher-skilled children from whom society will be able to collect more tax revenue.
Evidence suggests the impact of parental resources on child skills is largest among parents with low incomes, so this force is likely to lead to lower marginal tax rates on low incomes relative to high incomes. Second, if low-ability parents enjoy a high relative return to earning disposable income (due to its larger effects on their children), incentive problems are relaxed relative to a setting with exogenous ability, so marginal distortions on low earners can be reduced (thus reinforcing the first factor). Third, and working in the opposite direction, lower marginal tax rates on low incomes will in expectation differentially benefit low earners in prior generations because low-income parents are more likely than high-income parents to have low-ability children. This differential benefit increases the temptation for high-ability ancestors to feign low ability by earning less and accepting the greater probability of having low-ability descendants. In doing so, it worsens the distortionary effects of marginal taxes on effort. This factor works against the others, acting as a force for higher marginal tax rates at low incomes. The relative strength of these forces determines how optimal marginal tax rates differ from a conventional policy.
On the intertemporal margin, we derive a condition showing that the allocation of resources should equalize the cost of raising welfare across generations, taking into account not only the marginal utilities of individuals in each generation (as in a conventional model), but also the effects of the current distribution of resources on prior generations' incentives and future generations' tax payments and utility levels. As a result, welfare-maximizing policy takes advantage of its potential to shape the ability distribution of future generations. For example, suppose hypothetically that the ability distribution is stable across generations under an existing tax policy. A conventional optimal policy model would recommend that generations be treated similarly, as each generation resembles the next. Our model may recommend a different approach. Namely, the optimal policy in this case borrows from future generations to fund greater after-tax income for parents in the current generation.
Together, optimizing policy along these intra- and inter-generational margins can generate an upward trajectory for the ability distribution across generations, generating more productive future populations and greater welfare overall.
Second, we calibrate our model to empirical evidence and solve numerically for the optimal policy. The model calibration requires empirical estimates of key statistics describing the transmission of ability across generations under an existing tax policy. To generate these estimates, we study the effect of policy changes in the U.S. Earned Income Tax Credit (EITC) on the ability levels of taxpayers' children. Our empirical approach adapts the strategy of Gordon Dahl and Lance Lochner (forthcoming) in order to generate estimates relevant to the calibration exercise we perform. 2 Specifically, dividing matched parents and children in the National Longitudinal Survey of Youth (NLSY) and Children of the National Longitudinal Survey of Youth (CNLSY) into four equally-sized ability categories each, we estimate the effect of parents' after-tax income on the probability that parents in each wage category have children in each category, and we calculate the transition matrices between ability categories across generations. Then, using a smooth approximation of Laurence Kotlikoff and David Rapson's (2007) estimates of effective marginal tax rates in the United States as the status quo tax policy, we find the values of the model's parameters that yield a model output that best matches the target statistics, when optimizing households take that policy as given.
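The transition matrices mentioned in this paragraph can be illustrated with a small sketch. It only shows the mechanical step of sorting matched parents and children into four equally sized categories and tabulating transition shares; it does not reproduce the estimation of the effect of after-tax income. The data are synthetic, and the function names and the 0.5 persistence coefficient are assumptions made purely for illustration.

```python
import random

def quartile_labels(values):
    """Assign each value to one of four equally sized categories (0 = lowest)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    labels = [0] * len(values)
    for rank, k in enumerate(order):
        labels[k] = min(3, rank * 4 // len(values))
    return labels

def transition_matrix(parent_scores, child_scores):
    """Entry [i][j]: share of category-i parents whose child is in category j."""
    p_cat = quartile_labels(parent_scores)
    c_cat = quartile_labels(child_scores)
    counts = [[0] * 4 for _ in range(4)]
    for i, j in zip(p_cat, c_cat):
        counts[i][j] += 1
    return [[n / sum(row) for n in row] for row in counts]

# Synthetic, purely illustrative data (not NLSY/CNLSY):
# child score = 0.5 * parent score + independent noise.
rng = random.Random(0)
parents = [rng.gauss(0, 1) for _ in range(4000)]
children = [0.5 * p + rng.gauss(0, 1) for p in parents]

P = transition_matrix(parents, children)
for row in P:
    print([round(x, 2) for x in row])
```

Each row of `P` sums to one, and with positive persistence the diagonal entries tend to exceed the off-diagonal corners.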
We use the calibrated model to simulate Utilitarian-optimal policy, and we find that the pattern of optimal average and marginal tax rates is very different from the status quo. The optimal policy redistributes substantially more toward low-ability parents and earlier generations than does the status quo policy. The increase in redistribution toward low earners is driven by the Utilitarian objective assumed in the conventional optimal policy model. Nevertheless, the increase in redistribution generates an upward shift in the mean ability level across generations relative to the status quo, with a smaller share of the population having lower abilities and a larger share having higher abilities. We calculate the increase in aggregate welfare due only to the improved evolution of the ability distribution. We find that the gain is equivalent to a 1.75 percent permanent increase in disposable income in our baseline case, i.e., an increase of one and three-quarters percent in disposable income for all generations.
This paper introduces a new element to the active literature in dynamic optimal taxation. Following the original contribution of Mikhail Golosov, Narayana Kocherlakota, and Aleh Tsyvinski (2003), most work in this area has considered the impact of stochastic and exogenous skill processes on the optimal taxation of an individual over his lifetime. 3 Emmanuel Farhi and Iván Werning (2010) extend that approach to characterize optimal taxation across generations, noting in their opening sentences that "One of the biggest risks in life is the family one is born into. We partly inherit the luck, good or bad, of our parents through the wealth they accumulate." Their important analysis assumed, however, that children's skills are independent of their parents' abilities and their parents' economic resources, leaving unaddressed a core part of the "family risk" that is their focus. 4 We take up the complementary analysis. That is, we analyze optimal tax policy when the skill distribution of one generation depends on the skill distribution and the choices of the previous generation (subject to stochastic shocks). Because we allow the skill distribution to be endogenously determined, our paper is closely related to another body of work that extends the original dynamic optimal tax literature by allowing individuals' choices to affect their own ability levels (see Casey Rothschild and Florian Scheuer 2011 or Michael Best and Henrik Kleven 2013, for example). 5
The core conceptual contribution of this paper is to take into account the dynamic interaction between exogenous and endogenous components of skill heterogeneity. 6 We consider how choices by parents affect the abilities taken as given by their children, and how these abilities in turn affect the set of choices available to children. This interaction is a central factor in policy design, in that it is the crux of the tradeoff between redistributing to the poor later (i.e. equalizing the distribution of outcomes) and investing in their skills now (i.e. equalizing the distribution of opportunities). Our findings suggest one way in which society might increase equality of both outcomes and opportunities. That is, if future skill levels among the poor can be increased through current transfers, the net benefit of those transfers to society will be increased.
Though its application is most apparent across generations, the interaction between natural ability and human capital investments is also relevant for issues such as the design of life-cycle tax and training policies and social insurance programs. 7
The paper proceeds as follows. Section 1 describes the model. Section 2 derives analytical conditions that describe the optimal policy both within and across generations. Section 3 calibrates the model to existing U.S. tax policy and new empirical evidence on the transmission of ability across generations. Section 4 uses the calibrated model to simulate and characterize both the structure and welfare implications of optimal tax policy in our context. Section 5 concludes. An Appendix contains details of the analytical and empirical results.
5 Other examples include the following. Marek Kapicka (2006a, 2006b) allows a deterministic skill process to be endogenous. Borys Grochulski and Tomasz Piskorski (2010) allow a population of identical agents to choose a human capital investment, the output and depreciation of which are stochastic, thus combining stochasticity with a form of endogeneity. Dan Anderberg (2009) extends that approach by allowing for heterogeneous ability shocks, the effects of which on earnings can be magnified or reduced by human capital investment undertaken by identical agents before the ability shocks are realized.
6 Kapicka (2006a,b) has heterogeneity in natural ability, but each type is fixed for life, and all types share the same human capital production function. Grochulski and Piskorski (2010) have no heterogeneity outside of shocks to the human capital production function, the returns to which are therefore not dependent on natural ability. Anderberg (2009) has human capital and an exogenous shock interact, but human capital investment decisions are made by agents before their ability heterogeneity is realized.
7 Modeling this interaction is challenging, however, and one technical contribution of this paper is a novel formal simplification of the dynamics of the endogenous ability distribution. Rather than having parental resources directly affect the levels of children's abilities, we locate the effects of parental resources on the distribution of children across a fixed set of abilities. In combination with history-independence, the natural assumption that taxes on individuals do not depend on the income of their parents or children, our use of a fixed set of abilities with an endogenous distribution rather than an endogenous set of abilities substantially simplifies the computations of the optimal policy. The alternative approach, in which types vary continuously with parental resources, means that a planner has to specify allocations for all possible deviation paths. This technique may prove useful in other contexts.

Model
The formal model largely follows the standard setup of modern dynamic Mirrleesian analysis. Individuals obtain utility from consumption and disutility from exerting work effort. Each individual has an unobservable ability to earn income, which he or she combines with an unobservable level of work effort to determine pre-tax income, which is observable. The social planner designs a tax system that generates a mapping from pre-tax income to disposable income. Individuals optimally choose work effort knowing this tax system, and thereby produce income, pay taxes, and enjoy the remaining disposable income as consumption. We assume no intergenerational transfers, so disposable income equals consumption for each generation.
The intergenerational focus of this paper requires some additional structure. 8 Formally, denote with $p^j(w^i_t, c^{i'}_t)$ the probability that an individual of generation $t+1$ is of type $j$ given that her parent (in generation $t$) was of type $i$ and had the disposable income $c$ of type $i'$.
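To make the role of the transition probabilities concrete, the following sketch evolves a type distribution one generation forward using a made-up functional form for $p^j(w^i, c)$. The softmax form, the parameter names `persistence` and `income_effect`, and all numbers are hypothetical assumptions chosen for illustration; they are not the calibrated model.

```python
import math

def child_type_probs(parent_type, c, n_types=4, persistence=1.0, income_effect=0.5):
    """Hypothetical p^j(w^i, c): probabilities over child types j for a
    parent of type i = parent_type with disposable income c > 0."""
    # Score rises with closeness to the parent's type; higher c tilts mass upward.
    scores = [-persistence * abs(j - parent_type) + income_effect * math.log(c) * j
              for j in range(n_types)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def next_distribution(pi, consumption):
    """pi_{t+1}^j = sum_i pi_t^i * p^j(w^i, c_t^i)."""
    n = len(pi)
    nxt = [0.0] * n
    for i in range(n):
        probs = child_type_probs(i, consumption[i], n)
        for j in range(n):
            nxt[j] += pi[i] * probs[j]
    return nxt

def mean_type(dist):
    """Average type index under a distribution over types 0..n-1."""
    return sum(j * p for j, p in enumerate(dist))

pi_t = [0.25, 0.25, 0.25, 0.25]
base = next_distribution(pi_t, [1.0, 1.5, 2.0, 3.0])
more_for_lowest = next_distribution(pi_t, [1.5, 1.5, 2.0, 3.0])
print(round(mean_type(base), 3), round(mean_type(more_for_lowest), 3))
```

Under these assumptions, raising the disposable income of the lowest type shifts next generation's distribution toward higher types, which is the channel the model is built around.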

Planner's problem
As in standard optimal tax analysis, we model the social planner as specifying a menu of allocations of earned income $y$ and disposable income $c$. We describe here the planner's problem in the case in which children's ability may depend on parents' resources (but not on parents' time allocation); we later explore the results in the case in which children's ability also depends on parents' time allocation. By the Revelation Principle we can restrict attention to menus in which the planner intends a specific $(y^i_t, c^i_t)$ bundle for each type $i$ in each generation $t$. These allocations may differ across generations. The planner's objective is to maximize social welfare. Following the standard approach, we define social welfare as the present-value utility of the population of families starting from the first generation; that is, the objective is Utilitarian. The planner's maximization problem is constrained in two ways: first, feasibility, specifying that disposable income must be funded by output; second, incentive compatibility, specifying that individuals choose optimally among the offered bundles (i.e. maximize their own utility taking the tax system as given).
We also impose the constraint that taxes may depend on only the current generation's characteristics and choices. In other words, taxes are restricted to be independent of history and cannot depend on the income of the taxpayer's parents or children. This restriction is standard, as well as convenient, in a variety of dynamic optimal tax contexts such as the analysis of human capital. Moreover, history independence captures the explicit tax system in a realistic way: no tax system does or, we conjecture, ever will levy taxes on a child that depend in any direct way on that child's parents' characteristics. There seems to be a normative aversion to such history-dependence across generations, so we impose history independence on the policy here.
Some aspects of policy, such as subsidies for children's education like 529 education savings plans, may seem to violate our assumption of history independence because they condition policy on parents' resources. The bulk of these incentives lie outside of and are small relative to the overall history-independent tax and transfer system. More importantly, these policies do not condition on the income of the child, a necessary component of history-dependent policies. To see this, note that an optimal history-dependent tax policy in this model would condition redistributive transfers to low earners in one generation on whether their parents were low or high earners. In particular, it would reduce transfers to those low earners whose parents were high ability, thereby discouraging those parents from underinvesting in their children. 9 Policies such as 529 savings plans, even if conditioned on parental income levels, are qualitatively different, because they do not depend on the child's earnings. In fact, these policies are similar to the supplements to parental resources that our results recommend. Of course, our assumption of history independence is nevertheless a simplification, and understanding the impact of realistic intergenerational history dependence would be a valuable avenue for future research.
Formally, the planner's problem is as follows (expression (1)): [...] where $U^i_t$, the present-value expected utility of a family with generation-$t$ parents of type $i$, is defined recursively as (expression (2)): [...] This is maximized subject to feasibility (expression (3)): [...] where $R$ is an exogenous revenue requirement, and $R^i_t$ is the expected present value of all current and future tax revenue of a family with parents of type $i$, defined recursively as (expression (4)): [...] and incentive compatibility for each generation (expression (5)): [...] for all generations $t$ and types $i, i'$, where $U^{i'|i}_t$ denotes the utility of a generation-$t$ parent of type $i$ claiming to be type $i'$.
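As a reading aid, here is a minimal sketch of one way the five objects described in this paragraph (objective, recursive family utility, feasibility, recursive revenue, incentive compatibility) could be written. The separable utility $u(c) - v(y/w)$, the discount factor $\beta$, the initial type shares $\pi^i_1$, and the discounting of revenue at $\beta$ are assumptions introduced here for illustration; the source's exact expressions may differ.

```latex
% Sketch only: u, v, \beta, and \pi_1^i are assumed notation, not taken from the source.
\begin{align}
  & \max_{\{y^i_t,\, c^i_t\}} \; \sum_i \pi^i_1 \, U^i_1
    && \text{(objective)} \\
  & U^i_t = u(c^i_t) - v\!\left(\frac{y^i_t}{w^i_t}\right)
      + \beta \sum_j p^j(w^i_t, c^i_t)\, U^j_{t+1}
    && \text{(family utility)} \\
  & \sum_i \pi^i_1 \, R^i_1 \;\ge\; R
    && \text{(feasibility)} \\
  & R^i_t = y^i_t - c^i_t + \beta \sum_j p^j(w^i_t, c^i_t)\, R^j_{t+1}
    && \text{(family revenue)} \\
  & U^i_t \;\ge\; U^{i'|i}_t
      = u(c^{i'}_t) - v\!\left(\frac{y^{i'}_t}{w^i_t}\right)
      + \beta \sum_j p^j(w^i_t, c^{i'}_t)\, U^j_{t+1}
    && \text{(incentive compatibility)}
\end{align}
```

In this sketch a type-$i$ parent who claims to be type $i'$ receives the bundle $(y^{i'}_t, c^{i'}_t)$ but keeps her own wage $w^i_t$, so both her effort cost and her children's type probabilities $p^j(w^i_t, c^{i'}_t)$ reflect her true type.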

Limitations
Some apparent limitations of the setup deserve clarification.
First, while the setup has the same measure of parental resources serve as the quantity of consumption in the parent's utility function and the input to the child's ability production function, we are not asserting that the way in which parental disposable income is used is irrelevant to their child's ability. Rather, we are guided not only by tractability but also by the data. Our empirical evidence concerns the effect on a child's ability of transfers to parents; we have no data on how those transfers were allocated. In order to calibrate to this evidence, our model must also leave the allocation of these transfers unspecified. We use the term disposable income, rather than consumption, to make this aspect of our analysis clear. In principle, one could attempt to use the limited available data on the division of parental expenditure into consumption and investment in children's abilities to model more subtle optimal policies. Identifying the separate effects of these categories of expenditure on children's ability would not be possible using our data and identification strategy, as studying the causal effect of the EITC on different categories of consumption is severely limited by empirical power issues (Gelber and Mitchell 2012). Moreover, the appeal of more subtle policy that distinguished between these categories would be diminished by incentives for (largely unobservable) misreporting of spending across categories.
Second, we assume in the problem above that the allocation of parental time has no effect on children's abilities. We later explore the case in which children's ability depends on both parental income and parents' hours worked. If parents work more, they could spend less time with their children. This in principle could either worsen children's outcomes (if, say, parents teach children skills in their non-work time) or could improve children's outcomes (if, say, parents' increased work serves as a role model for children's work in school). We find that, on net, parental time allocation has only a small effect on our baseline results. However, as we discuss, our empirical estimate of the effect of parents' hours worked on children's ability is more suggestive than our estimate of the effect of parental income.
Third, only tax policy is modeled in this paper, but that does not imply that other policies play no role. Our empirical estimates take as given the existing set of non-tax policies and institutions, such as schools, that have effects on children's abilities (including effects that may interact with the tax system). Our model implicitly assumes that these policies and institutions are held constant as taxes vary, again an assumption we make to match the empirical evidence to which we calibrate the model.

Analysis of optimal policy
Our analysis of the planner's problem in expressions (1) through (5) generates two results. First, we characterize the distortion to an individual's choice of how much to earn, the classic subject of optimal tax analyses since Mirrlees (1971). Second, we derive a necessary condition on optimal allocations across generations that modifies the conventional model's recommendation in an intuitive but powerful way.

Optimal marginal distortion to earned income
The classic object of study in optimal tax models is the marginal tax rate, or the distortion to the individual's marginal choice between disposable income and leisure. Formally, the ratio of the marginal disutility of the labor needed to earn a unit of income to the marginal utility of consuming that income equals one if an individual sets the marginal disutility of labor equal to the marginal utility of consuming the income that labor earns. Any factor reducing the marginal utility of earnings (such as a positive marginal tax rate) causes this ratio to be less than one, distorting the individual's choice of labor effort.
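A small numerical sketch can illustrate this ratio. It assumes log utility of consumption, an isoelastic disutility of labor, and a linear tax at rate `tau` with an optional lump-sum `transfer`; these functional forms and all names are assumptions for illustration, not the paper's specification. The worker's first-order condition is solved by bisection, and the ratio of the marginal disutility of labor to the marginal utility of the income it earns comes out equal to one minus the tax rate.

```python
def marginal_ratio(w, tau, eps=0.5, transfer=0.0):
    """Ratio v'(l) / (w * u'(c)) at the worker's optimum.

    Assumed forms (illustrative only): u(c) = log(c),
    v(l) = l**(1 + 1/eps) / (1 + 1/eps), c = (1 - tau) * w * l + transfer.
    """
    def foc(l):
        # Marginal benefit of labor minus marginal cost; decreasing in l.
        c = (1 - tau) * w * l + transfer
        return (1 - tau) * w / c - l ** (1 / eps)

    lo, hi = 1e-9, 10.0
    for _ in range(200):            # bisection on the first-order condition
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    l = 0.5 * (lo + hi)
    c = (1 - tau) * w * l + transfer
    marginal_disutility = l ** (1 / eps)
    marginal_utility_of_income = w * (1.0 / c)
    return marginal_disutility / marginal_utility_of_income

for tau in (0.0, 0.3, 0.5):
    print(tau, round(marginal_ratio(w=2.0, tau=tau), 6))
```

With no tax the ratio is one; a positive marginal rate pushes it below one, which is the sense in which the text speaks of a distortion.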
In the model above, in the absence of taxes, parent $i$ in generation $t$ would solve her own planning problem. Formally, she would choose how much income to earn to maximize her own utility subject to a personal feasibility constraint and given the expectation that her descendants, whose abilities are determined by the production function $p^j(\cdot)$ [...] and $y^i_t$ imply a distortion to a parent's private choice. 10
Lemma 1 (Intratemporal Distortion): Let [...] where [...] where [...] is the probability that a generation-$t$ descendant of parent type $i$ from an earlier generation is of type $j$ and [...]
[...] gain from raising the disposable income of parent $j$. In particular, the gain is the weighted present value sum of net revenues obtained across types over time, with the weight on type $k$ in generation $t+1$ representing the increase in probability that children of parent type $j$ will be type $k$ when $c^j_t$ is increased slightly. The planner values that revenue gain, while the parent does not. Intuitively, larger $A^j_t$ means that the planner generates greater net revenue gain from having the parent obtain a larger disposable income, implying that optimal policy entails a smaller downward distortion (or larger upward distortion) to parent $j$'s effort. We call this factor the "revenue effect." To the extent that the marginal effect of additional parental disposable income on children's abilities is larger for lower-ability parents, this revenue effect will be greater at low incomes, and smaller marginal taxes 11 at low incomes will be optimal.
Second, suppose the planner cannot observe ability, but parental resources have no effect on children's abilities; in other words, the conventional Mirrlees model. In that case, $\partial p^k(w^j_t, c^j_t)/\partial c^j_t = 0$, so $A^j_t = 1$. Then, the wedge would be simply $[\ldots]^{j|j'}_t$, and the optimal distortion is driven by binding incentive constraints in the current generation. Note that $C^j$ [...] is independent of parental ability, then $C^j_t$ is unchanged from the standard model, while if $\partial^2 p^k$ [...] granted to low-income families are of less value to high-ability (and thus high-income) parents, high-income parents will be less tempted in this model to claim low ability than in a standard model, so smaller distortions at low incomes will be required.
11 Throughout this section, we use the terminology of distortions and marginal taxes interchangeably. This assumes an implementation of the distortions through an explicit tax system, which is straightforward in our setting without capital.
12 The term $D^j_{t+s}$ is unchanged from the conventional model, unless parental work effort enters the child's ability production function. We explore that case below.
13 Technically, this holds if the weighted sum of these effects, weighted by eventual child type, is larger.
The "ancestor incentive effect" relates to the value of $B^j_t$, which measures how an increase in $c^j_t$ affects the incentive problems of taxing earlier generations who can affect the probability that their descendants are of type $j$. For example, suppose $w^{i'} > w^i$, so that type $i'$ is the higher type in the earlier generation, and $w^{j'} > w^j$, so that type $j'$ is the higher type in generation $t$. In this case, [...] $> 0$, so that $B^j_t$ is smaller, and the optimal marginal distortion is larger, for low-skilled types in a model with endogenous ability than in a model without. Intuitively, a smaller distortion on a low-skilled type raises the temptation for previous generations to work less and produce low-skilled descendants. The same logic holds in reverse: if $j$ is a high skill type, [...] $> 0$. Then, $B^j_t$ is larger and the marginal distortion is smaller for high-skilled types. Intuitively, we should decrease the marginal distortion on type $j$ if doing so reduces earlier generations' incentive problems. In this way, the ancestor incentive effect pushes against the revenue and relative return effects, serving to increase marginal taxes at low incomes.
In the end, as this discussion and the previously-mentioned endogeneity of the allocations in these expressions suggest, the effect on optimal distortions of introducing endogenous ability is ambiguous. To get a sense for this ambiguity, consider the case of a low-ability parent. If parental resources have greater marginal effects on the children of low-skilled parents, then $A$ and $C$ are likely to be larger for these parents, reducing the optimal distortion due to the revenue and relative return effects. At the same time, $B$ is likely to be smaller because increasing this low-skilled parent's resources makes it harder to incentivize previous generations to exert effort, increasing the optimal distortion due to the ancestor incentive effect. On net, the optimal distortion could be smaller or larger than in the conventional model.
Further intuition can be obtained by examining the case of only two ability types. In the case of two ability types, only one of the incentive constraints will bind within any given generation, allowing us to write result (7) more concisely for each ability type. We provide those expressions in the Appendix.
One of the lessons of Lemma 1 is that a two-period version of the model in this paper would obscure key aspects of the optimal policy problem. To see this, consider the two novel terms in the lemma, $A^j_t$ and $B^j_t$. $A^j_t$ depends on how a marginal increase in current disposable income affects the tax revenue raised from all future generations. $B^j_t$ depends on the incentive constraint multipliers for all previous generations. Any two-period model will neglect one of these two channels.

Allocations across generations
We now turn to analyzing intertemporal allocations. In a conventional model, the planner's first-order condition for $c^j_t$ can be shown to equal: [...] Summing across types and combining with the same condition for generation $t+1$ immediately yields a condition on allocations across generations, expression (12): [...]
This condition, parallel to the Symmetric Inverse Euler Equation in Weinzierl (2011), shows that the optimal allocation equalizes the cost, in disposable income units, of raising social welfare across generations. A version of it also applies to optimal tagging, such as in N. Gregory Mankiw and Weinzierl (2010).
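For orientation, a condition of the kind described here might take the following form. This is a sketch under assumed notation: $\pi^j_t$ for the population share of type $j$ in generation $t$ and $u'$ for the marginal utility of disposable income are introduced here, and the sketch assumes the planner's discount factor matches the rate at which resources can be moved across generations, so that no additional factor appears. The source's exact expression may differ.

```latex
% Sketch only: \pi^j_t and u' are assumed notation, not taken from the source.
\begin{equation}
  \sum_j \frac{\pi^j_t}{u'(c^j_t)}
  \;=\;
  \sum_k \frac{\pi^k_{t+1}}{u'(c^k_{t+1})}
\end{equation}
```

Each side is the population-weighted average of inverse marginal utilities in a generation, that is, the cost in disposable income units of raising that generation's welfare by one unit. With $u(c) = \log c$, for example, $1/u'(c) = c$, so the sketch would equate mean disposable income across the two generations.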
With endogenous ability, expression (12) may not hold. Instead, a modified version of it applies, which we state in the following proposition and derive in the Appendix. 14
Proposition 1: The solution to the Planner's Problem satisfies expression (13): [...]
To understand Proposition 1 intuitively, recall the meaning of the Symmetric Inverse Euler Equation in expression (12), namely that the cost of raising social welfare through transfers to one generation must be the same for all generations. Proposition 1 is the same condition, but in the more complicated context of this model economy. In a conventional model, the average inverse marginal utility of disposable income in a generation determines the cost of raising welfare through transfers to a generation. In our model, that cost also depends on three novel factors that we now discuss.
First, if the transfer raises individual $j$'s investment in her children's abilities, resulting in increased tax revenue from future generations, these revenue gains offset the costs of the transfer. Formally, this factor is captured in the expression $\sum_k \left[\partial p^k(w^j_t, c^j_t)/\partial c^j_t\right] R^k_{t+1}$, and it is closely related to the revenue effect identified in the discussion of Lemma 1. This expression is the present value of the net change in future taxes paid by individual $j$'s children when $c^j_t$ increases.
Second, if the transfer raises individual $j$'s investment in her children's abilities, resulting in increased utilities for future generations, these welfare gains augment any direct changes in utility from the transfer. Formally, this factor is captured in the expression $\sum$ [...] This expression is the present value, per additional unit of utility for individual $j$, of the increase in utility enjoyed by individual $j$'s children when $c^j_t$ increases.
Third, both the relative return and ancestor incentive effects from the discussion of Lemma 1 [...] and incentive constraints do not bind in preceding generations (i.e., $[\ldots]^{i'|i} = 0$ for all preceding generations and types $i, i'$). If, instead, transfers to a generation relax incentive constraints 15 that were preventing low-income parents from having the disposable income to make relatively high-return investments in their children, the expression $[\ldots]_t$ is less than one. Similarly, if transfers to a generation relax incentive constraints that bind on ancestors whose offspring are relatively common in the recipient generation, the expression $[\ldots]_t$ is less than one. The smaller is $[\ldots]_t$, the larger is the optimal transfer to generation $t$. Intuitively, the more that binding incentive constraints are preventing investments in children in either the current or preceding generations, the more the planner wants to use transfers to relax those incentive constraints.
As one might expect, the overall implications of these three novel factors for optimal policy are theoretically ambiguous; to build intuition for their effects, consider a specific, empirically plausible scenario. Namely, suppose that mean ability is stable over time and the effects of parental resources on a child's ability are largest at lower skill levels. 16 Conventional policy designed to satisfy the expression (12) would treat generations symmetrically, and those allocations would satisfy equation (13). However, that conventional policy fails to take advantage of the endogeneity of the ability distribution.
Consider, instead, a policy that transfers resources from generation $t+1$ to generation $t$ and, in particular, increases the resources available to the low-ability workers in generation $t$. Such a policy would violate the conventional expression (12), as it would lower the marginal utilities of disposable income for generation $t$ and raise them for generation $t+1$, increasing the left-hand side and decreasing the right-hand side of (12). Intuitively, the conventional perspective implies that the relative cost of raising welfare under such a policy is too high in the recipient generation $t$; the resources ought to stay with the future generation.
Such a policy is consistent with the true optimal policy condition (13), however, because of endogenous ability. To see why, note that the policy will increase the population proportion of higher-ability workers in generation t + 1. This shift in the distribution of types in generation t + 1 will put greater weights on workers with larger inverse marginal utilities of disposable income and smaller gains in future revenue and utility for their descendants from marginal resources. As a result, transfers from generation t + 1 to generation t increase the cost of raising social welfare in generation t + 1. This offsets what seemed to be a problem, namely that those transfers increased the cost of raising social welfare in generation t by directly lowering marginal utilities of income. Mathematically, then, this policy will increase both the left-hand and right-hand sides of (13). Therefore, equation (13) may be satisfied with a policy that treats generations asymmetrically and generates greater welfare. In other words, transfers from future to earlier generations generate gains for all generations: early generations gain from having higher disposable incomes, and future generations gain from having improved ability distributions.

15 That is, in the usual way: high-ability parents are tempted to claim lower ability to obtain a more generous tax treatment.
This example implies that an optimal policy making use of the endogeneity of the ability distribution may differ from the conventionally optimal policy. While result (13) does not prove that such a superior policy equilibrium exists, the simulations of Section 4 show that the scenario described above fits the empirical evidence from the United States, and that the potential welfare gains from such a policy are substantial.

Labor as an input to children's ability
As we discuss above, it is also possible that children's ability could depend on parents' hours worked.
If children's ability depends on their parents' hours worked, the planner's problem includes the dependence of $p(\cdot)$ on labor effort (or, equivalently, time not devoted to labor effort).

Problem 2 Planner's Problem with Parental Labor as an Input to Child Ability
The planner's objective is maximized subject to feasibility, where R is an exogenous revenue requirement, and to incentive compatibility for each generation, for all generations t and types i, i', where $U_t^{i'|i}$ denotes the utility obtained by an individual of type i when claiming to be type i'. Simplifying as in the model without labor effort in the ability production function, we obtain the following Lemma parallel to the first.
Note that the division of the intratemporal result into a wedge and an expression equal to what the parent would choose continues to hold in this setting, with the parent's optimum now reflecting the role of labor effort in the child ability production function. The main differences once parental labor effort enters the child ability production function are as follows (recall that parents internalize the direct effect of their time allocation on their children's abilities, as in the expression for the parental optimum). Assume extra parental time at work is detrimental to child ability. First, the term $A_t^j$ now captures that extra parental effort will lower future revenues, so the optimal downward distortion to labor effort is larger (i.e., the term $A_t^j$ is smaller). Second, if extra parental time at work is more detrimental to child ability for low-ability parents, then $D_t^j$ will be smaller. This reduces the optimal distortion on lower types. Intuitively, incentive constraints are looser with this effect, because high-skilled parents gain relatively less from the lower labor effort requirements they would enjoy if they claimed a low income. In the Appendix, we show that including parental time as modeled here has only small effects on the results from our baseline quantitative analysis, which we describe in the next Section.

Model calibration under existing U.S. tax policy
In this section, we calibrate the model of Section 1 to estimates of the effect of parental resources on children's ability under existing U.S. tax policy. We focus our calibration on matching empirical estimates of statistics related to the transmission of ability across generations under the status quo tax policy. In particular, we minimize the distance (i.e., the sum of squared deviations) between the model's output and the empirical estimates of the marginal effects of parental resources on their children's abilities, the transition matrix between generations, and the expected log wage within generations.

Empirical estimates of the target statistics
We adapt to our framework the empirical work from a recent major study of parents' taxes and children's outcomes. Dahl and Lochner (forthcoming) study the effect of expansions of the EITC in the 1990s on children's test score outcomes. 17 Rather than calibrating our model using a cross-section of data, we use a modified version of the Dahl and Lochner empirical strategy in order to generate more credible estimates of the causal effect of parental income on child ability (as estimating such credible effects is the focus of their study). Their study examines a specific context, and we must generalize outside of the specific features of this context with caution. While recognizing this caveat, we choose to examine this context because we believe that it represents one of the best available opportunities to study the effect of tax policy toward parents on children's outcomes in the United States. We refer readers to their paper for a full description of their empirical strategy and its motivation, but we briefly describe it here, often borrowing from their description of it.
17 However, they have not estimated the effect of parents' disposable income on children's wage rates, in large part because linking the income of children's parents when the children were young to children's wage outcomes when they have grown into adults requires a long panel of data in which all of these variables are linked. This coincidence of data is unlikely in circumstances with suitable exogenous variation in parents' disposable income. In fact, our paper suggests a new empirical object of interest that should be studied in future work: the effect of parents' disposable income on children's wages.

The size of the EITC, which is a refundable tax credit primarily benefitting low- and middle-income families, depends on earned income and the number of qualifying children. The EITC tax schedule has three regions. Over the "phase-in" range, a percentage of earnings is transferred to individuals. Over the "plateau" region, an individual receives the maximum credit, after which the credit is reduced (eventually to zero) in the "phase-out" region. Near the period studied in this paper, the EITC was expanded substantially in the tax acts of 1986, 1990, and 1993. The largest expansion of the EITC was in 1993. This reform increased the additional maximum benefit for taxpayers with two or more children, which reached $1,400 in 1996. The phase-in rate for the lowest-income recipients increased from 18.5% to 34% for families with one child and from 19.5% to 40% for families with two or more children. The outcome measures come from the Peabody Individual Achievement Tests (PIAT), which measure oral reading ability, mathematics ability, word recognition ability, and reading comprehension. Dahl and Lochner's instrumental variables estimates suggest that a $1,000 increase in family income raises math and reading test scores by about 6% of a standard deviation.
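The three-region schedule described above can be sketched as a piecewise-linear function. This is only an illustration of the shape of such a credit: the function name and the parameter values used in the example are hypothetical and are not the statutory values for any year.

```python
def eitc_credit(earnings, phase_in_rate, max_credit, phase_out_start, phase_out_rate):
    """Piecewise-linear credit with phase-in, plateau, and phase-out regions."""
    if earnings <= 0:
        return 0.0
    # Phase-in: the credit grows in proportion to earnings until it reaches the maximum.
    credit = min(phase_in_rate * earnings, max_credit)
    # Phase-out: beyond the threshold, the credit is reduced, eventually to zero.
    if earnings > phase_out_start:
        credit -= phase_out_rate * (earnings - phase_out_start)
    return max(credit, 0.0)


# Hypothetical schedule, for illustration only.
schedule = dict(phase_in_rate=0.40, max_credit=3600.0,
                phase_out_start=12000.0, phase_out_rate=0.20)
credits = [eitc_credit(y, **schedule) for y in (5000, 10000, 20000, 40000)]
```

In this hypothetical schedule, the first earnings level falls in the phase-in range, the second on the plateau, the third in the phase-out range, and the fourth beyond the point where the credit reaches zero.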
We estimate a model similar to Dahl and Lochner's, using the same basic sample of data they use (described more fully in their paper and in the Appendix), but we use it to obtain a slightly different empirical object. Motivated by our model above and simulation below, we estimate the effect that income has on the probability that a parent of given ability type produces a child of a given ability type (controlling for the child's lagged ability type). We present summary statistics in Appendix Table 1 and run preliminary regressions (shown and discussed in detail in the Appendix to demonstrate the viability of our approach) in Appendix Tables 2 and 3. In our main specification, we divide parents into four wage (ability) categories $\{P_i\}_{i=1}^{4}$ and divide children into four test score categories $\{C_i\}_{i=1}^{4}$. Each category comprises one quartile of the sample distribution of wages or test scores, respectively, with subscript i indicating the quartile of the distribution, where i = 1 is the lowest quartile. Because there are four parent types, we estimate four separate regressions, in each of which the dependent variable is a dummy that equals one when the child has ability in the i-th category. We classify parents into wage types by ranking them according to their average wage over the full sample period.
In choosing the number of categories, we take into account competing technical and conceptual considerations: more categories will give the calibration more targets as well as better describe the true heterogeneity of the population and, therefore, the potential gains from optimal policy; but too many categories will prevent the regressions in the empirical estimation from having enough positive values of the dependent variable to yield meaningful results. It turns out that using too few categories fails to provide enough empirical targets for the calibration exercise to converge on a best set of parameter values: this factor requires us to use at least four categories. Elsewhere in the Appendix, we also show results with five and ten categories. Those results show heterogeneity in tax rates at a finer level of disaggregation at the cost of a substantial loss of power in the empirical estimates. Our analyses with five and ten types yield results similar to those of our benchmark four-type model. The Appendix describes our specification and data in detail.
In Appendix Table 4, for each regression, we show the estimated marginal effects of parental resources on their children's abilities, with standard errors in parentheses. The signs of the coefficients in Appendix Table 4 generally conform to expectations: higher parental income predominantly increases the probability that a child is high-ability (i.e., in the third or fourth quartile) and decreases the probability that a child is low-ability (i.e., in the first or second quartile), with the "correct" sign of the relationship occurring in 13 out of 16 regressions. Moreover, of the three cases in which the relationship takes the "wrong" sign, only one point estimate shows a non-negligible effect, meaning that a 1% increase in parental income changes the child's probability of being in a given category by more than 0.1 percentage point. The regressions in Appendix Table 5 additionally control for the first difference of parent hours worked and show a very small impact of parent hours on child ability, as we discuss further in the Appendix.
In addition to the estimates in Appendix Table 4, our calibration targets the elements of the empirical ability transition matrix between generations and the expected log wage within a generation. Using the same dataset and definition of types as in the analysis just described, we can readily generate the transition matrix by calculating the fraction of the sample from each parent wage category who began the sample period with the child test score in each category. The results are in Appendix Table 6. 19 The mean log wage is also readily calculated, as the average of the log of the four ability levels shown above, to be 2.07.
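The calculation of an empirical transition matrix from categorized data can be sketched as follows. This is a minimal illustration, assuming each observation is a pair of zero-based category indices (parent wage quartile, child test score quartile); the function name and the toy data are hypothetical and are not drawn from the NLSY sample.

```python
def transition_matrix(parent_types, child_types, n_types=4):
    """Row j, column k: share of children of type-j parents who fall in category k."""
    counts = [[0] * n_types for _ in range(n_types)]
    for j, k in zip(parent_types, child_types):
        counts[j][k] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Leave a row of zeros if no parent of this type appears in the data.
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix


# Toy data, for illustration only.
parents = [0, 0, 1, 1, 2, 3]
children = [0, 1, 1, 1, 3, 3]
matrix = transition_matrix(parents, children)
```

Each row of the result sums to one whenever that parent type is observed, so the rows can be read as conditional distributions over child categories.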

Model specification
We next describe how the model produces quantities corresponding to these target statistics, and we specify some components of the model required for simulation.
The quantities corresponding to the targeted statistics are generated by the model as follows.
In the planner's problem, the production function for a child's ability was left unspecified. Here, we impose a particular, tractable form 20 for the expected ability of the child of a parent of type j. Expression (20) shows that the child's expected ability is a function of the parent's ability, a fixed "reference" ability level, and the parent's disposable income. The child's expected ability is influenced by the parent's ability $w_t^j$ relative to the fixed ability level w, indicating mean reversion in characteristics transmitted across generations (consistent with the empirical evidence on income, e.g., Steven Haider and Gary Solon 2006). This log-linear functional form concisely captures the basic forces at work in this model determining the transmission of ability across generations. Namely, it allows us to adjust the role of parental ability in determining a child's ability through the persistence parameter. It also allows us to vary the relative importance of this channel and a second channel, parental resources, by adjusting the weights on parental ability and on parental resources. Note that the dependence of the weight on parental disposable income on the parental ability type j establishes a direct connection between the exogenous and endogenous components of the ability production function.
The specification in expression (20) imposes no restrictions on whether the marginal value of parental resources is increasing or decreasing in the child's innate ability. In particular, our assumed production function for child ability allows the weight on parental resources to vary with parental type j in expression (20). Depending on how this weight varies with j (a relationship we will estimate in our simulations), the marginal value of parental resources may increase, be constant, decrease, or exhibit complex nonlinearities as innate ability increases. 21 We translate the expected ability in expression (20) into an ability distribution for the population of children of parents of type j with disposable income $c_t^{j'}$ by assuming that ability is distributed lognormally with a common variance, as in expression (21). The ability distribution over the income range relevant to this paper is commonly calibrated as lognormal (e.g., Tuomala 1990). This variance represents an exogenous, stochastic shock to child ability common across parent types.
The simulations of this model will use a discrete distribution of abilities, consistent with the model described in Section 1, whereas expressions (20) and (21) appear to produce continuous ability distributions. To classify individuals into I discrete types, we define fixed ranges of w that correspond to each type $i \in I$. By "fixed," we mean that the boundary values of w that determine whether an individual is assigned wage $w^i$ or $w^{i+1}$ are exogenously given. With these fixed ranges, we can translate the distribution of ability for a given child implied by expression (21) into transition probabilities among types across generations. 22 Applying this procedure, we can generate the probabilities of each child type k in generation t + 1, conditional on parent type j and disposable income $c_t^j$, for all parent and child types. This structure also enables us to calculate the marginal effects of parental disposable income as the increase in the probability of a given child type caused by an increase of one percent in a given parent type's disposable income. Formally, to compute the marginal effect of $c^j$, we calculate the semi-elasticity of the probability of each child type with respect to parental disposable income.
That is, we calculate the change in the probability of each child type associated with an incremental increase in the log of parental resources $c^j$.

21 We do not estimate this production function directly using our empirical approach because our empirical approach relies on a fixed effects specification, which would difference out parent ability. Our regression specification estimates a coefficient on parental income that is comparable to the coefficient on parental income in (20).

22 An example may help clarify the procedure. Suppose I = 2, so that there are two ability types. Denote the fixed wage level that separates types 1 and 2 as w. A mother of type j expects her child to have, on average, the ability $E[\ln w_2 \mid w_1^j, c_1^j]$ as defined by expression (20). In reality, her child's ability is a random variable distributed normally with mean $E[\ln w_2 \mid w_1^j, c_1^j]$ and the variance of the shock. The probability that her child's ability ends up in the lower half of the full distribution of wages across all children is, therefore, the value at w of the cumulative distribution function implied by this normal distribution.
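The translation from a continuous lognormal distribution into discrete type probabilities, and the numerical semi-elasticity, can be sketched as follows. This is a minimal illustration under stated assumptions: the expected log ability is taken as a given input rather than computed from expression (20), it is assumed to rise linearly with log parental income at a rate `income_weight`, and the dispersion argument `sd` is treated as a standard deviation. All function names, cutoffs, and numerical inputs are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def type_probabilities(expected_log_ability, sd, log_cutoffs):
    """Probability that a child lands in each discrete type.

    log_cutoffs are the fixed, increasing boundaries (in logs) between types.
    """
    cdf_at_cutoffs = [normal_cdf(c, expected_log_ability, sd) for c in log_cutoffs]
    edges = [0.0] + cdf_at_cutoffs + [1.0]
    return [upper - lower for lower, upper in zip(edges[:-1], edges[1:])]


def semi_elasticities(expected_log_ability, income_weight, sd, log_cutoffs, step=1e-4):
    """Change in each type's probability per unit change in log parental income.

    Assumption: expected log ability rises by income_weight per unit of log income.
    Uses a central finite difference with the given step in log income.
    """
    shift = 0.5 * step * income_weight
    low = type_probabilities(expected_log_ability - shift, sd, log_cutoffs)
    high = type_probabilities(expected_log_ability + shift, sd, log_cutoffs)
    return [(h - l) / step for l, h in zip(low, high)]


# Hypothetical inputs, for illustration only.
cutoffs = [math.log(5.0), math.log(8.0), math.log(14.0)]
probs = type_probabilities(2.0, 0.76, cutoffs)
effects = semi_elasticities(2.0, 0.10, 0.76, cutoffs)
```

Multiplying each entry of `effects` by 0.01 gives the approximate change in probability from a one percent increase in parental disposable income. Because the probabilities always sum to one, the marginal effects across child types sum to zero.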
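Expressions (20) and (21) indicate that the model calibration will search over values of the following parameters: the persistence parameter, the weight on parental ability, the type-specific weights on parental resources, and the dispersion of the shock, such that with I = 4 there are seven values to estimate.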
As a baseline case, we will impose a value of 0.5 for the parameter controlling the transmission of ability across generations. This assumption is based on the voluminous evidence surveyed in Marcus Feldman, Sarah Otto, and Freddy Christiansen (2000). 23 We show the robustness of our results to this choice in the Appendix. We also impose a value of 0.76 for the dispersion parameter of the ability shock, which we calculate using data on wages from the NLSY sample. This leaves five parameters to be chosen by this calibration.
Finally, before proceeding with the calibration, we specify the tax system facing individuals, the utility function those individuals maximize, and the set of ability types. For the status quo tax system, we draw on Kotlikoff and Rapson (2007). Note that our data do not allow us to extend our calibration directly to higher incomes, a limitation that could, in principle, affect our results because both the existing and optimal tax policies would redistribute substantial resources from higher earners. We show an extension of our analysis in the Appendix that suggests our results are robust to this potential concern.
The government's tax system also includes a grant to all individuals, which is constant across generations, as are tax rates. As in the feasibility constraint on the planner, expression (3), the government's budget is balanced in present value, where we set the discount factor to 1.00, reflecting no discounting of utility across generations. In the Appendix, we show that our results are robust to a modest degree of discounting, but note that there is no growth in this economy, so any discounting reflects solely a preference for the utility of earlier generations.
The individual's current-generation utility takes a separable, isoelastic form with three parameters: one controls the concavity of utility from disposable income, a second controls the elasticity of labor supply, and a third is a taste parameter affecting the level of labor effort. Again, we choose this functional form for the sake of tractability and because it helps in illustrating the key features of the model in a straightforward way. We set the concavity parameter to 2 and the labor supply parameter to 3 to be consistent with mainstream estimates of these parameters (which implies that the Frisch elasticity of labor supply is 1/2). We choose a value of 2.5 for the taste parameter so that hours worked in the simulation approximately match the average labor supply in the population. 24 Finally, guided by the empirical analysis discussed above, we assume ability comes in I = 4 fixed types (roughly interpretable as the hourly wage): 25 $w_t^i \in \{3.44, 6.30, 9.42, 19.57\}$ for all $t = 1, 2, \ldots, T$. The probability distribution across those types is uniform in the first generation but is endogenously determined in the model for subsequent generations. 26
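As a quick arithmetic check, the mean log wage of 2.07 used as a calibration target can be reproduced from the four ability levels, assuming equal weights on the four types (as with quartiles or the uniform first-generation distribution). The variable names below are illustrative.

```python
import math

# The four fixed ability levels (roughly interpretable as hourly wages).
ability_levels = [3.44, 6.30, 9.42, 19.57]

# Equal weights across types: an assumption matching a uniform distribution.
weights = [0.25] * len(ability_levels)

mean_log_wage = sum(p * math.log(w) for p, w in zip(weights, ability_levels))
print(round(mean_log_wage, 2))  # prints 2.07
```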

Calibration Results
To calibrate the model, we minimize a weighted sum of squared errors, where the targets are the marginal effects and transition matrix shown in Appendix Tables 4 and 6 as well as the mean log wage. We weight the squared errors by the inverse of the targets' standard errors, which has the effect of putting much greater weight on the more precisely estimated transition matrix elements and the mean log wage. We use ten generations (T = 10) in the simulations, allowing for several generations surrounding the middle (fifth-to-sixth) generation that we use as the target for the calibration exercise. We show robustness to this choice in the Appendix. Table 1 shows the parameter values chosen by the simulation. Recall that the two weight parameters govern the two channels, ability and economic resources, through which parents affect their child's ability. The product of the persistence parameter and the ability weight gives the weight on parental ability in expected child ability, while the resource weight gives the (parental type-specific) weight on parental resources. The monotonically declining values of the resource weights in Table 1 suggest that parental resources play a greater role among lower-ability parents, consistent with the empirical evidence. 27 Key moments determining the estimates of the resource weights are the coefficients on parent income in determining child ability from Appendix Table 4. Key moments determining both the estimates of the resource weights and the estimate of the ability weight are the elements of the transition matrix of parent ability to child ability in Appendix Table 6, as these determine the combined role that parent ability and parent resources play in determining child ability.
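The weighting scheme described above can be sketched as a small objective function. This is only an illustration: the grid search, the toy model, and all numerical values are hypothetical stand-ins, since the text does not specify the minimization routine, and the actual model output comes from the full simulation.

```python
def weighted_sse(model_values, targets, standard_errors):
    """Sum of squared errors, each weighted by the inverse of its target's standard error."""
    return sum((m - t) ** 2 / se
               for m, t, se in zip(model_values, targets, standard_errors))


def calibrate(model, candidates, targets, standard_errors):
    """Return the candidate parameter vector with the smallest weighted objective."""
    return min(candidates,
               key=lambda params: weighted_sse(model(params), targets, standard_errors))


# Hypothetical toy model mapping two parameters to two target statistics.
def toy_model(params):
    a, b = params
    return [a, a + b]


grid = [(a / 10, b / 10) for a in range(11) for b in range(11)]
best = calibrate(toy_model, grid, targets=[0.3, 0.8], standard_errors=[0.01, 0.20])
```

With these weights, a given deviation from the first (precisely estimated) target costs twenty times as much as the same deviation from the second, which mirrors how the transition matrix and mean log wage dominate the noisier marginal effects.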
The simulation does well in matching the empirical targets for which the data are most informative, namely the transition matrix and mean log wage. The simulation yields marginal effects of parental resources that differ substantially from the data, as Appendix Table 7 shows for the fifth (middle) generation of the simulation. The calibrated status quo marginal effects exhibit a pattern much closer to what intuition would suggest (negative for lower child types and positive for higher child types) than do the estimated effects in the data. This is not surprising, however, given the statistical insignificance of the empirical estimates and their often-unexpected signs. Table 2 shows that the simulation closely matches the data for the transition matrix between generations (we show the transition between the fifth and sixth generations of the simulation as an illustrative example). 28 Finally, the simulation matches the mean log wage, 2.07.
As we show in the Appendix, these results are robust to varying time discounting, the number of generations T, the assumed persistence of type across generations, and the number of types I.
27 Løken, Mogstad, and Wiswall (2012) find a larger effect of parental income on child achievement among lower-income families than among higher-income families. Consistent with these findings, we find that within each parent ability level the effect of parental income on child achievement is concave.

28 The intergenerational correlation of ranks of income in Chetty et al. (2014) is only 0.34, suggesting substantial intergenerational mobility. In the Appendix, we show that if the parameter controlling the intergenerational correlation of ability is higher, the gains from optimal policy are increased.

Optimal Policy
In this section, we simulate a many-period version of the planner's problem using the calibration from the previous subsection. We characterize optimal policy by comparing it to the status quo policy used in that calibration. Table 3 shows average and marginal tax rates for each type under the optimal and status quo policies. 29 Average tax rates are calculated as the ratio $(y - c)/y$. For marginal tax rates, we compare the marginal tax rates imposed by the status quo policy to the marginal tax rates that would implement the optimal allocation. The latter are the wedges that distort individuals' choices of labor effort. In the discussion of Lemma 1, we showed that the wedge for a parent of type i in generation t can be written in terms of the components defined above in expressions (8), (9), (10), and (11).

Table 3 shows that the optimal policy has very different average and marginal tax rates than the status quo. The optimal policy is substantially more redistributive, generating large transfers to low-skilled parents. This result is due entirely to the redistributive preferences of the social planner. 30 Nevertheless, these redistributive transfers generate an improved ability distribution by capitalizing on the gap between the impact of increased disposable income on the children of low-ability parents and high-ability parents. Optimal policy imposes larger marginal distortions (other than on the highest type) to make the allocations for lower types less attractive to those with higher ability, who expect to have children with higher ability on average.
The optimal policy also adjusts intertemporal allocations to capitalize on the endogeneity of ability, as was suggested in the discussion of Proposition 1. Table 4 reports the difference between the planner's "budget balance" as a share of aggregate income in each generation under the optimal policy and under the status quo policy. In other words, it is the additional average tax rate assessed on each generation by the planner, relative to a balanced budget as assumed in the status quo.

Table 4. Optimal policy minus status quo policy: planner's budget balance as a share of aggregate income (percent)
Generation:    1     2     3     4     5     6     7     8     9     10
Difference:  -5.1  -0.7  -0.9  -0.9  -0.9  -0.9  -0.9  -0.8  -1.0  11.3

Table 4 shows that the optimal policy borrows from future generations to fund greater investment in the skills of the current generation relative to the status quo. Of course, our model abstracts from many features of the economy, notably capital as a factor of production, some of which may make deficit-financed investment in children less appealing. However, the key point illustrated by Table 4 is that society can benefit by having later generations contribute, through higher taxes, to improving the ability distribution generated by earlier generations. 31

These differences in tax policy affect the evolution of the ability distribution. We report the transition matrices for types across generations under the optimal and status quo policies. Table 5 repeats the transition matrix from Table 2 for the calibrated status quo model and compares it to the transition matrix under the optimal policy. The optimal policy enables a substantially greater share of the children of lower-skilled parents to move up the skill ladder than does the status quo policy. The optimal policy has much smaller, but non-negative, effects on the prospects of the children of higher-skilled parents.
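The link between a transition matrix and the evolution of the ability distribution can be sketched as a simple iteration. This is a simplified illustration: it holds the transition matrix fixed across generations, whereas in the model the transition probabilities depend on the disposable incomes that policy generates. The matrix entries below are hypothetical and are not the values in Table 5.

```python
def next_distribution(shares, transition):
    """Population shares of child types, given parent shares and a transition matrix."""
    n_child = len(transition[0])
    return [sum(shares[j] * transition[j][k] for j in range(len(shares)))
            for k in range(n_child)]


def simulate(shares, transition, generations):
    """Path of type shares over the given number of generations."""
    path = [list(shares)]
    for _ in range(generations - 1):
        path.append(next_distribution(path[-1], transition))
    return path


# Hypothetical transition matrix (rows: parent type, columns: child type).
transition = [
    [0.35, 0.30, 0.22, 0.13],
    [0.30, 0.30, 0.25, 0.15],
    [0.20, 0.25, 0.30, 0.25],
    [0.10, 0.15, 0.25, 0.50],
]
path = simulate([0.25] * 4, transition, generations=10)
```

Running the same iteration with two different matrices, one per policy, gives a rough way to compare how the share of each type would evolve from a common uniform starting distribution.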
Intuitively, Tables 3 and 4 show that the optimal policy takes resources from higher-skilled parents and later generations to support lower-skilled parents and earlier generations without worsening the prospects of children of the higher-skilled. This moves resources from those for whom the effect of resources on a child's ability is lower to those for whom it is higher (that is, from smaller to greater values of the type-specific weight on parental resources). As these transition matrices imply, the evolution of the ability distribution is different under the optimal and the status quo policies. Figure 2 shows the ability distribution in the fifth generation under the two policies, which closely resembles the distribution in all generations after the initial one. This figure shows the substantial shift toward a higher ability distribution under the optimal policy that results from the greater progressivity of the optimal policy; the optimal policy leads to 1.8 percent fewer individuals of the lowest type and 1.6 percent more of the highest type. Welfare is much higher under the optimal policy, and it is more equitably distributed. In fact, the welfare gain of moving from the status quo policy to the optimal policy is enormous: it is equivalent to a 21 percent permanent increase in disposable income. But this very large gain is predominantly driven by something other than the effect of policy on the ability distribution. In particular, the optimal policy's Utilitarian foundation places a high value on income equality, so the greater redistribution to low-skilled parents under the optimal policy than under the status quo policy generates most of this large estimated increase in welfare. Because we may be interested in the importance of the endogenous ability channel alone in generating welfare gains, we consider the following thought experiment.
Suppose that the status quo model were granted the distribution of abilities generated by the optimal model for all generations; we call this the "adjusted status quo." Suppose further that we hold fixed the within-period utility levels of all individuals in the status quo model, but we calculate the total welfare for the economy given the adjusted status quo ability distributions. This will generate a greater level of welfare. Now, returning to the status quo tax policy's ability distributions, we calculate the factor by which disposable income would have to rise in the status quo model to reach the welfare of the adjusted status quo. This factor is a measure of the welfare gain due solely to the optimal policy's effects on the ability distribution over time. Similar factors can be calculated for each type of first-generation parent, as well, indicating how the welfare gains through this channel are shared. Table 6 shows the results for the baseline case of ten generations. As these results show, the optimal policy has the potential to generate a welfare gain equivalent to one and three-quarters percent of aggregate disposable income simply by shifting the ability distribution over time. The gains are somewhat larger among low-skilled parents, as would be expected. Nevertheless, high-skilled parents gain substantially, as the efficiency gains and greater equality accruing to future generations raise the current generation's present-value welfare. Gains for future generations follow the same patterns.
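The thought experiment above can be sketched numerically. This is a stylized illustration under several assumptions that are not taken from the text: utility from disposable income is given a specific isoelastic form with concavity parameter 2, each type's disposable income and labor disutility are constant across generations, welfare is an undiscounted sum over generations, and the gain is assumed to lie between 0 and 100 percent. All names and numbers are hypothetical.

```python
def crra_utility(c, gamma=2.0):
    """Isoelastic utility of disposable income (assumed form; requires gamma != 1)."""
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)


def welfare(shares_by_generation, income, labor_disutility, scale=1.0, gamma=2.0):
    """Undiscounted sum over generations of expected utility, with income scaled."""
    total = 0.0
    for shares in shares_by_generation:
        for share, c, v in zip(shares, income, labor_disutility):
            total += share * (crra_utility(scale * c, gamma) - v)
    return total


def equivalent_income_gain(status_quo_shares, adjusted_shares, income,
                           labor_disutility, gamma=2.0):
    """Proportional rise in income that lifts status quo welfare to the adjusted level."""
    target = welfare(adjusted_shares, income, labor_disutility, 1.0, gamma)
    low, high = 1.0, 2.0  # assumes the gain lies between 0 and 100 percent
    for _ in range(60):  # bisection on the scale factor
        mid = 0.5 * (low + high)
        if welfare(status_quo_shares, income, labor_disutility, mid, gamma) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high) - 1.0


# Hypothetical two-type, one-generation example.
gain = equivalent_income_gain([[0.5, 0.5]], [[0.4, 0.6]], [1.0, 2.0], [0.0, 0.0])
```

In this toy example, shifting ten percent of the population from the low type to the high type, while holding each type's utility fixed, is worth roughly a seven percent proportional increase in disposable income under the status quo shares.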
In the Appendix, we explore the robustness of these baseline results to variation in time discounting, the number of generations T, the assumed persistence of type across generations, and the number of types I. The qualitative and quantitative lessons of the baseline analysis prove to be robust. In particular, optimal policy that takes advantage of endogenous ability is more redistributive than the status quo, shifts resources from future to earlier generations, generates an upward shift in the ability distribution, and yields a sizeable welfare gain.

Conclusion
In this paper, we explore the possibility that equalizing individuals' economic outcomes may help to equalize their children's opportunities: that is, when poor parents have more disposable income, their children's performance improves and they have greater opportunity to succeed. We study the effect that this intergenerational connection has on optimal tax policy, which will take advantage of this relationship to shape the ability distribution over time. But exactly how it will do so depends on complex interactions between natural ability and the returns to investment in human capital. Ours is the first paper we know of to model this complexity and derive policy implications.
We characterize conditions describing optimal tax policy when children's abilities depend on both inherited characteristics and parental (financial) resources. On the intratemporal margin, we highlight competing effects of this endogeneity. If parental resources have greater marginal effects on the children of low-skilled parents, then optimal distortions may be smaller at low incomes because of their positive effects on overall tax revenues and the incentives of high-skilled parents. On the other hand, the benefit of larger distortions at low incomes in encouraging preceding generations to invest in their children's ability pushes in the other direction. In the end, the implications for optimal marginal distortions are ambiguous. On the intertemporal margin, we show that optimality requires a more sophisticated understanding of the cost of raising social welfare through transfers across generations, in particular including the effects of one generation's resources on future generations' tax payments and utilities.
We calibrate our model to microeconometric evidence on the transmission of skills and new estimates of the effects of increases in disposable income on a child's ability, which we obtain by analyzing panel data from the NLSY in the United States. We then simulate optimal policy in this calibrated model and compare it to an estimated version of the existing U.S. tax code. The schedules of optimal average and marginal tax rates are very different from those in existing policy, as the optimal policy is substantially more redistributive and shifts the ability distribution up over time. This shift in the ability distribution generates a welfare gain equivalent to 1.75 percent of total disposable income in perpetuity, with larger gains for the poor. Even higher-skilled members of the current generation gain substantially, however, as the gains in efficiency and equality in future generations raise the current generation's present-value welfare.
Of course, future research may be able to improve our understanding of the tax policy studied in this paper. For example, a panel dataset of sufficient duration to link data on parents' and children's wages would allow estimates of the intergenerational effect of parental income on parent-child wage transitions. Incorporating other dimensions of parental influence is another natural next step. We have shown (in the Appendix) that parental leisure versus work time does not seem to exert an important influence in this case, but one might study how the composition of parents' available resources (i.e., as disposable income or in-kind, such as education) affects the results. Such analyses may have implications for a broader class of policies that, like the taxes in this paper, could be used to affect, rather than merely respond to, the dynamics of the ability distribution.