Measuring productive vocabulary of toddlers in low-income families: concurrent and predictive validity of three sources of data

This study examined parental report as a source of information about toddlers’ productive vocabulary in 105 low-income families living in either urban or rural communities. Parental report using the MacArthur Communicative Development Inventory – Short Form (CDI) at child age 2;0 was compared to concurrent spontaneous speech measures and standardized language assessments, and the utility of each source of data for predicting receptive vocabulary at age 3;0 (Peabody Picture Vocabulary Test) was evaluated. Relations between language measures of interest and background variables such as maternal age, education, and race/ethnicity were also considered. Results showed that for the sample as a whole, parental report was moderately associated with other language measures at age 2;0 and accounted for unique variance in PPVT at age 3;0, controlling for child language skills derived from a standard cognitive assessment. However, predictive validity differed by community, being stronger in the rural than in the urban community. Implications of significant differences in background characteristics of mothers in the two sites are discussed.
Remarkable variation is observed across children in the rates at which they acquire vocabulary over the first three years of life. Such variation characterizes vocabulary acquisition of toddlers in middle-class families (Fenson, Dale, Reznick, Bates, Thal & Pethick, 1994), in demographically diverse households (Feldman, Dollaghan, Campbell, Kurs-Lasky, Janosky & Paradise, 2000), and in predominantly low-income homes (U.S. Department of Health and Human Services, 2001; Tamis-LeMonda, Bornstein & Baumwell, 2001; Pan, Rowe, Singer & Snow, 2003; Spier, Tamis-LeMonda, Pan & Rowe, 2003). Given the importance of language development in general, and vocabulary specifically, for later academic achievement (Snow, Burns & Griffin, 1998), understanding the course and pace of vocabulary development during the first three years of life is of interest to both researchers and practitioners. With a few notable exceptions (e.g. Roberts, Burchinal & Durham, 1999; Feldman et al., 2000), most research to date has focused on children from middle-class families or on small groups of children from working-class or low-income families (e.g. Hart & Risley, 1995).
A primary methodological challenge in this area is identifying sensitive, reliable measures of child vocabulary derivable from assessment tools that are cost-effective and whose administration is minimally intrusive for children and families. Given the latter concerns, and building on the honoured tradition of diary studies in child language research, parental report measures have much to offer. To the extent that parents of infants and toddlers are tuned in to what their children say, they have a rich database to draw on in characterizing children's language, both because they observe and interact with their children in a variety of contexts on a daily basis and because they may be better able to understand their own child's less than perfectly articulated speech. Contextual factors, as well as the child's own health and attentional state, are also less likely to influence assessment via parental report, as compared to standardized, interviewer-administered instruments or observed spontaneous speech samples.
At the same time, concerns have been raised about the validity of parental report, particularly with respect to minority and low-income families (Arriaga, Fenson, Cronan & Pethick, 1998; Roberts et al., 1999; Feldman et al., 2000). One means of assessing the validity of parental report is by examining its concurrent and predictive association with other measures of vocabulary, language, or cognitive development. Thus far, such examination in children from racially and socioeconomically diverse families has been confined to associations between parental report and standardized measures. In the study presented here, we triangulate these sorts of data with data on children's spontaneous speech.
A second approach to assessing the validity of parental report is to examine the extent to which patterns of variation in children's vocabulary as measured by parental report conform to patterns of variation based on other sources of information and/or those reported in the literature. For example, maternal education has consistently been shown to relate positively to child vocabulary using a variety of indices (e.g. spontaneous speech, standardized measures of receptive/expressive vocabulary). One would expect, then, to find the same sort of relationship between maternal education and children's vocabulary based on parental report. Thus, in addition to concurrent and predictive associations between parental report and other measures, we examine here the extent to which patterns of association between various background variables and children's vocabulary as measured by parental report, spontaneous speech, and standardized measures conform to what has been reported in the literature in general. Of course, in making such comparisons it is important to keep in mind that most existing literature is based on children from white, middle-class families, a point to which we return in our discussion.
Before turning to these questions, let us first consider briefly some of the advantages and disadvantages of the three types of data on children's vocabulary around age 2;0 utilized in the current study.

Parent report: the MacArthur Communicative Development Inventory
Among the most widely used parental report measures is the MacArthur Communicative Development Inventory (CDI; Fenson et al., 1994; Fenson, Pethick, Renda, Cox, Dale & Reznick, 2000). The CDI is available both in long and short forms for infants aged 0;8 to 1;4 and toddlers aged 1;4 to 2;6. Early work on the original long-form CDI demonstrated moderate to strong associations between middle-class parents' report of two-year-old children's vocabulary production and concurrent measures of children's spontaneous vocabulary use (Fenson et al., 1994). Corkum & Dunham (1996) report moderate associations between maternal report and lexical word types in spontaneous speech of children aged 1;6 from middle-class families; similarly, Dunham & Dunham (1992), studying a sample of middle- to upper-middle-class families, found moderately strong associations (r=0.71-0.72) between children's CDI scores and spontaneous lexical production at age 2;0. Finally, Ring & Fenson (2000) report a moderately strong association (r=0.78) between CDI expressive vocabulary scores and child performance on an expressive vocabulary laboratory task administered to children between the ages of 1;8 and 2;6.
In addition to concurrent validity, the CDI demonstrates good predictive validity for children from white, middle-income families (Dale, Bates, Reznick & Morisset, 1989). For example, Corkum & Dunham (1996) report a correlation of 0.45 between scores on the short form CDI at age 2;0 and children's verbal IQ two years later. Reese & Read (2000), studying a somewhat more diverse sample of New Zealand children and using a New Zealand version of the CDI, also report correlations in the range of 0.43-0.50 between CDI vocabulary scores at ages 1;9 and 2;1 and standard measures of children's expressive and receptive vocabulary at ages 2;8 and 3;4.
However, as noted above, some researchers have raised questions about the concurrent validity of the CDI for racial minority and low-income children. Using a 50-word version of the CDI, Roberts & colleagues (1999) found that some low-income African-American parents appeared to underreport their children's early vocabulary, relative to other standardized language measures administered concurrently. Interestingly, however, when scores the researchers deemed 'questionable' were omitted, a significant child gender effect emerged, implying that some African-American mothers may overestimate boys' vocabulary. Gender differences aside, the researchers caution that the CDI and its norms may be inappropriate for low-income African-American families. Feldman and colleagues (2000) have also questioned the validity of the CDI for certain sociodemographic groups. They investigated the relationship between maternal education and parent report of children's language at ages 1;0 and 2;0 provided by a large sample of parents, 42% of whom were Medicaid recipients. Rather than the positive association between maternal education and child vocabulary the research literature might predict, they found that maternal education was NEGATIVELY associated with children's receptive and productive CDI scores at age 1;0. However, of note for the current discussion is that the expected positive association between maternal education and child vocabulary production was found subsequently for the same sample at child age 2;0 (see also Reese & Read, 2000). Thus, beyond the earliest stages of children's language development, parent report appears congruent with other sources of information. Neither the study by Roberts & colleagues nor that by Feldman and colleagues included data on children's spontaneous speech, and neither provided longitudinal data beyond toddlerhood to allow examination of the value of parental report of children's vocabulary use for predicting later vocabulary of children in low-income families. Thus there remains a need for more information on both the concurrent and predictive validity of parent report on children's vocabulary production in these populations.

Spontaneous speech samples
Spontaneous speech samples provide a somewhat different look at children's language skills, but have their own advantages and disadvantages. One advantage is that spontaneous speech samples provide some sense of how the child actually uses language in interaction with an adult, usually a parent or other familiar individual, and thus are potentially more ecologically valid. Much of the existing literature on child language development, including research on the size and growth of children's vocabularies (e.g. Huttenlocher, Haight, Bryk, Seltzer & Lyons, 1991; Bornstein, Haynes & Painter, 1998), is based at least in part on spontaneous speech samples. However, contextual variables such as setting, materials, and interlocutor verbal style may influence children's production (Yont, Snow & Vernon-Feagans, 2003). Furthermore, the time required to obtain, transcribe and analyse speech samples generally limits the length of speech sample that can be analysed, the number of children who can be studied, and to some extent, the populations easily available for study. These constraints may explain in part why there is a dearth of information on the spontaneous speech of infants and toddlers from low-income families. To our knowledge, there are no published studies comparing parent report of children's vocabulary and spontaneous speech of children from low-income families.

Standardized measures of language
Finally, standardized measures of language such as the Peabody Picture Vocabulary Test (PPVT-III; Dunn & Dunn, 1997), and language measures derived from more general cognitive assessments such as the Bayley Scales of Infant Development (Bayley, 1993), provide a third source of information about children's language development. Such instruments offer a basis for interpreting scores of individual children or groups of children, relative to a particular norming sample, but have the disadvantage of requiring the infant or toddler to interact with an unfamiliar adult and to engage in activities that may be novel and decontextualized, relative to the child's everyday communicative activities.

Factors associated with variability in child vocabulary
Sources of variability in vocabulary size and vocabulary growth rate during the toddler years include both environmental and child factors. Family socioeconomic status has consistently been shown to relate positively to children's vocabulary size and lexical production. Generally this association is demonstrated by comparing across socioeconomic groups differing in parental education, occupation or income (e.g. Hart & Risley, 1995; Lawrence & Shipley, 1996; Dollaghan, Campbell, Paradise, Feldman, Janosky, Pitcairn & Kurs-Lasky, 1999; Hoff, 2003).
More recently, attention has begun to focus on variability in child vocabulary growth and use WITHIN low-income samples and on covariates of the observed variation (Arriaga et al., 1998; Pan & Rowe, 1999; Roberts et al., 1999; Pan et al., 2003; Spier et al., 2003). The current study seeks to build on this work by examining potential relationships between selected background variables (maternal age, education, and race; child gender and birth order) and measures of child vocabulary/language at age 2;0, as well as identifying predictors of children's receptive vocabulary at age 3;0.

Maternal age, race, and education
Children of younger mothers tend to have lower vocabulary scores by age 3;0, although the relationship between maternal age and child vocabulary appears to be stronger for white than minority children (Moore & Snyder, 1991). In the present study, we were interested in whether the relationships between child vocabulary and maternal age and race/ethnicity reported in the literature would be observable at age 2;0 in parental report and confirmed at age 3;0 in PPVT scores.
As Turley (2003) points out, younger mothers differ from older mothers in multiple ways that may affect child development. One such factor thought to be of particular importance for children's vocabulary and cognitive outcomes is maternal educational attainment. Hoff-Ginsberg & Lerner (1999), for example, showed that maternal education effects on diversity of child vocabulary use are observable at the upper ends of the education distribution. They found that children of high school-educated parents produced fewer different words in conversation with their mothers than did children of college-educated parents. Similarly, Dollaghan & colleagues (1999), studying a more sociodemographically diverse group of 241 three-year-old children, found significant linear trends across three education levels (less than high school diploma, high school diploma, more than high school diploma) for the number of different words children produced spontaneously in 15 minutes of conversational interaction with their caregivers. As noted above, however, some researchers have not found maternal education to be positively related to parental report of children's vocabulary, particularly in early toddlerhood (Feldman et al., 2000). In the current study, therefore, we thought it important to examine potential relationships between maternal education and child vocabulary measures.

Child gender and birth order
The effects of child variables such as gender and birth order are less consistently reported in the literature. For example, a slight advantage for girls at the earliest stages of vocabulary development has been demonstrated using a variety of types of data (e.g. Fenson et al., 1994, and Reese & Read, 2000, using parental report; Bornstein et al., 1998, using maternal report, spontaneous production, and standardized measures; Huttenlocher et al., 1991 and Morisset, Barnard & Booth, 1995, using spontaneous language production). In all cases, the advantage is quite small and short-lived. Nonetheless, the CDI does provide separate norms for boys' and girls' vocabulary. Similarly, advantages for firstborn over later-born children have been reported occasionally in the literature (Goldfield & Reznick, 1990; Hoff-Ginsberg, 1998; cf. Reese & Read, 2000). Given the still limited information available on sources of variability in the language development of children from low-income families, child gender and birth order were also of interest.
To summarize, the current study was undertaken with the goal of using evidence from observed vocabulary use by children in spontaneous speech, as well as standardized measures of language/vocabulary, to better understand the concurrent and predictive validity of parental report of vocabulary in children from low-income (i.e. welfare-eligible) families. In particular, we sought to compare concurrent parental report of children's expressive vocabulary with children's observed spontaneous vocabulary use and with their language development more generally as measured by standardized cognitive assessments. The study focused on these concurrent measures at child age 2;0, a point in development by which parental report in other samples has been shown to be reasonably accurate. We also investigated the relationship of parental report to background factors, in particular maternal education, given the somewhat mixed results with respect to low-income families reported in the literature. Finally, we examine the extent to which children's receptive vocabulary outcomes at age 3;0 can be predicted by each of the three measures of children's language a year earlier. Specific research questions addressed, then, are:
1. How closely associated are measures based on parental report, spontaneous language, and structured assessments of children's language at age 2;0?
2. How closely associated are these language measures with maternal education and other demographic and child factors?
3. How predictive of child receptive vocabulary at age 3;0 is each of these language measures, controlling for child and maternal demographic factors?
METHOD

Participants
One hundred and five mother-child dyads were drawn from a larger sample of approximately 3000 low-income families participating in a national study of the effects of Early Head Start. Families were recruited into the study when they applied for Early Head Start services either during the mothers' pregnancy or before the target child's first birthday. Eligibility for Early Head Start, and thus inclusion in the study, was based on meeting the income criterion for public financial assistance. The 105 families in the current study were drawn from two sites, 51 from a rural county in New England and 54 from an urban inner-city area in the Northeast. Criteria for inclusion in the study were: mothers were fluent in English and indicated that their child was a monolingual speaker of English or English-dominant; mothers and children resided together through child age 2;0; and complete child assessment data, parent report of children's vocabulary, and spontaneous speech measures based on parent-child videotaped interaction were available at child ages 2;0 and 3;0.
Mothers ranged in age from 14 to 41 years at the time of their children's births (M=23;4 years). The mothers from the urban group tended to be younger (M=20 years) than those from the rural group (M=26 years). All of the urban mothers were ethnic minorities (56%, n=30, black non-Hispanic; 41%, n=22, Hispanic; and 3%, n=1, of mixed heritage). Most (92%, n=47) of the rural mothers were white, non-Hispanic, with the remaining mothers describing themselves as black, non-Hispanic (2%, n=1), Hispanic (4%, n=2) or of mixed heritage (2%, n=1). At the time of the child's second birthday, 54% (n=29) of the urban mothers and 20% (n=10) of the rural mothers had less than a high school education, 18% (n=10) of the urban mothers and 51% (n=26) of the rural mothers had a high school degree or GED, and 28% (n=15) of the urban mothers and 29% (n=15) of the rural mothers had some education beyond high school. To summarize, samples from the two sites differed in maternal age, levels of educational attainment, and race/ethnicity, all variables that might be expected to influence either child language development, parental report on children's vocabulary, or both.

Procedure and measures
Families participating in the larger study were seen and/or interviewed on multiple occasions. The language data considered in the current study were collected when children were approximately 2;0 and 3;0. During the home visit at child age 2;0, primary caregivers (in all cases mothers) were asked to report on their children's language using the MacArthur Communicative Development Inventory – Toddler Short Form (CDI; Fenson et al., 2000). Mother-child dyads were also videotaped during the same visit for 10 minutes as they interacted around a book and age-appropriate toys provided by the researcher. Finally, children were assessed using the Bayley Scales of Infant Development, from which a Bayley language score was constructed (U.S. Department of Health and Human Services, 2001) (see Dale et al., 1989, for a similar approach comparing CDI scores and language subscores from the Bayley Scales of Infant Development).
The MacArthur CDI – Toddler Short Form includes a checklist of 100 words drawn from the original longer version (Fenson, Dale, Reznick, Thal, Bates, Hartung, Pethick & Reilly, 1993). Parents are asked to check those words that their child has begun to say. Possible scores range from 0 to 100. The authors report overall correlation between the short and full toddler forms as 0.99 (0.98 with age partialled out). The norming sample for the short forms included only children whose primary language was English, although approximately 14% lived in bilingual households. The authors characterize the norming sample as being 'skewed away from the lower end of the sociometric distribution' (Fenson et al., 2000, p. 104), thus raising some concerns about potential limitations in applicability to low-income samples. Reliability assessed by means of Cronbach's coefficient alpha was 0.99. The variable of interest in the current study was children's raw vocabulary production scores (CDI).
Word types and tokens were drawn from 10-minute videotaped observations of mother-child semi-structured free play in the home at child age 2;0. Parent-child dyads were given three bags, the first containing a picture book and the next two containing age-appropriate toys, and parents were asked to play with their child as they normally would, beginning with the bag containing the book. Pacing and transition from one bag to the next were determined by the parent and child. Videotaped interaction of parents and children was transcribed using the conventions of the Child Language Data Exchange System (CHILDES; MacWhinney, 2000). The unit of transcription was the utterance, defined as talk by one speaker bounded either by transition in speaker, by grammatical closure and/or by a pause of more than two seconds. Transcripts were verified either by a second transcriber or by the same transcriber after a period of at least two weeks. Automated computer analyses of the transcripts using the facilities of the CHILDES system yielded the number of word types (i.e. number of different, intelligible word roots) and number of word tokens (i.e. total number of intelligible words) produced by the child. Morphological variants of a single word root (e.g. dog, doggie, doggies; read, reading) were considered to constitute a single word type. Word frequency lists were examined visually to eliminate any inconsistencies in spelling/transcription. The resulting number of word types (TYPES) produced provided a measure of diversity of observed vocabulary use. Total words produced (TOKENS) was included as a measure of child volubility and to investigate the possibility that parents might be responding to overall child talkativeness, rather than to children's vocabulary use per se.
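The distinction between word types and word tokens can be illustrated with a minimal sketch. It is not the CHILDES procedure used in the study: the root map, the treatment of `xxx` as an unintelligible-speech placeholder, and the example utterances are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical root map collapsing morphological variants onto one word root.
# The study relied on CHILDES tools for this step; these entries are illustrative.
ROOT_MAP = {"doggie": "dog", "doggies": "dog", "dogs": "dog", "reading": "read"}

# Placeholder assumed here to mark unintelligible speech in a transcript.
UNINTELLIGIBLE = {"xxx"}


def count_types_and_tokens(utterances, root_map=ROOT_MAP):
    """Return (types, tokens) for a list of child utterances given as strings."""
    roots = Counter()
    for utterance in utterances:
        for word in utterance.lower().split():
            if word in UNINTELLIGIBLE:
                continue  # only intelligible words are counted
            roots[root_map.get(word, word)] += 1
    # types = number of different roots; tokens = total intelligible words
    return len(roots), sum(roots.values())


# Invented utterances, not data from the study.
child_utterances = ["doggie read", "more doggies", "xxx reading book"]
types, tokens = count_types_and_tokens(child_utterances)
print(types, tokens)
```

In this sketch a child who repeats the same few roots accumulates tokens without adding types, which is the contrast between volubility and vocabulary diversity that the two measures are meant to capture.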
The Bayley language factor (BAYLEY_LANG) was constructed based on factor analysis conducted by Boller (U.S. Department of Health and Human Services, 2001) of 42 Bayley Scales of Infant Development items appropriate for children ages 1;11 to 2;4. The factor analysis of responses of 1739 children participating in the larger evaluation of Early Head Start yielded a factor made up of 12 language items. Six of the items require the child to understand or produce lexical items, while the remaining six require syntactic and/or conversational skills (see Appendix A for all 12 items).
At age 3;0, children were assessed using the PPVT-III, an untimed test of receptive vocabulary. In the current study, raw scores were converted to age-adjusted, standardized scores based on the published norms. Although we might have chosen to assess productive vocabulary at age 3;0, receptive vocabulary was the outcome measure of choice because it offered comparability with ongoing large-scale studies to assess low-income children's school readiness (e.g. U.S. Department of Health and Human Services, 2002) and because of its relationship to later reading achievement for children across the socioeconomic spectrum (Dickinson & Tabors, 2001; Scarborough, 2001).

RESULTS
Our results are organized around three sets of findings. First, we provide descriptive information on the various measures used to assess child language ability at ages 2;0 and 3;0 and present results of analyses examining associations among these measures. Second, relationships between language measures at both ages and other family demographics such as maternal age and education, child gender and birth order are presented. Finally, we provide findings from multiple regression analyses determining the extent to which the various child language measures at age 2;0 predict children's receptive language skills at age 3;0, controlling for child and demographic factors.

Child language descriptives
As shown in Table 1, within this low-income sample there was large variation in children's language skills at age 2;0 as assessed through parental report (CDI), spontaneous speech (TYPES, TOKENS), and the Bayley language factor (BAYLEY_LANG). Children's receptive vocabularies at age 3;0 also varied widely (PPVT).
Associations among language measures at child age 2;0
Correlational analyses were undertaken to determine how closely associated measures of parental report, spontaneous language, and structured assessments were at child age 2;0 and how these measures related to receptive vocabulary at age 3;0. Table 2 presents the results of these analyses for the combined urban and rural sample.[1] The results show significant positive relationships among all measures, ranging from relatively weak to quite strong. The strongest associations were between the two spontaneous speech measures (TYPES and TOKENS). This relationship was expected, as the number of types is a function of the number of tokens. Of the two spontaneous speech measures, word types is more strongly associated with both the cognitive assessment language factor and maternal report, suggesting that the number of different words produced by a child is a better index of child language skills than the sheer talkativeness of the child. The maternal report measure (CDI) is most strongly related to the language factor derived from the Bayley (BAYLEY_LANG), and all associations between the three variables CDI, TYPES, and BAYLEY_LANG are of similar moderate strength (r ranging from 0.49-0.66, p<0.001), suggesting that to some extent they each measure the same child language abilities.
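For readers less familiar with the statistic, the correlations discussed above are Pearson product-moment coefficients. The following is a minimal sketch of how such a coefficient is computed; the function name and the five pairs of scores are invented for illustration and are not values from Table 2.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Invented scores for five children; NOT data from the study.
cdi = [10, 25, 40, 55, 70]
types = [5, 12, 9, 20, 24]
print(round(pearson_r(cdi, types), 2))
```

The coefficient ranges from -1 to 1, so values such as those reported for CDI, TYPES, and BAYLEY_LANG indicate measures that overlap substantially without being interchangeable.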
Table 2 also shows that each language measure at child age 2;0 was positively related to the PPVT at age 3;0. The strongest of these relationships are between the CDI and PPVT (r=0.50, p<0.001), and between the Bayley language factor and the PPVT (r=0.50, p<0.001). These initial results provide support for using multivariate methods to determine which child language measures at age 2;0 are the best predictors of child language comprehension at age 3;0, controlling for relevant demographic factors.
[1] All associations were also statistically significant when urban and rural samples were examined separately.

Relationships between demographic variables and child language measures
Demographic variables of interest were child gender and birth order; maternal age, level of education, and race/ethnicity; and urban vs. rural site residence. We conducted analyses of covariance (ANCOVA) between each categorical demographic variable and the language measures at age 2;0 as well as the PPVT receptive language scores at age 3;0, with child exact age as a covariate.[2] Least-squares means are reported to adjust for average child age and due to the unbalanced data for some demographic measures. Controlling for child age, neither child gender nor birth order was related to any of the language measures at 2;0 or to the PPVT at age 3;0. There was a tendency for child PPVT scores to differ based on maternal education level (F(2, 102)=2.94, p<0.06). As noted above, maternal education level was categorized as (1) less than high school diploma; (2) high school diploma or GED; and (3) some schooling beyond high school. Post hoc tests showed that children of mothers with less than a high school degree scored significantly lower on the PPVT than children of mothers with some schooling beyond high school (p<0.05). Comparisons between white, black, and Hispanic families revealed several differences related to race/ethnicity. Controlling for child age, the CDI differed by race/ethnicity (F(3, 98)=4.28, p<0.05), with post hoc tests indicating that white mothers reported their children as having larger productive vocabularies than black (p<0.05) and Hispanic mothers (p<0.01).
White children also scored significantly higher than their black and Hispanic counterparts on the PPVT (F(2, 99)=4.11, p<0.05). There were no differences between black and Hispanic children's scores on the CDI or the PPVT. Partial correlations between maternal age and the language measures, controlling for child age, revealed a positive relationship with the PPVT (p<0.01). That is, children of older mothers did better on the PPVT at age 3;0 than children of younger mothers. Finally, urban vs. rural site differences were evident for the CDI (F(2, 102)=8.52, p<0.01) and PPVT (F(1, 103)=11.15, p<0.01), with the rural children scoring higher on average than the urban children on both measures. In this study, the race/ethnicity and site variables are very similar measures, given that none of the urban sample was white and only a few of the rural sample were non-white. However, as noted above, the two sites also differed in maternal age and educational level as well as race/ethnicity. In sum, our results suggest that at child age 2;0, white and rural mothers report their children to have larger vocabularies than non-white and urban mothers. At child age 3;0, white mothers, rural mothers, older mothers, and mothers with more than a high school education have children with larger receptive vocabularies than their non-white, urban, younger, less-than-high-school-educated counterparts. Given that each language measure at age 2;0 was significantly positively related to children's receptive vocabulary at age 3;0, our next step was to use multiple regression analyses to determine the extent to which language measures at age 2;0 predict receptive vocabulary at age 3;0, controlling for child and family demographic variables. Because site residence and race/ethnicity were measuring somewhat the same thing and could not both be used as predictors in the same multivariate models due to covariance, we retained site residence for the multivariate analyses presented here. However, we also looked at the role of race/ethnicity by means of dummy variables (white, black and Hispanic) to try to determine whether effects were due to urban vs. rural residence or race/ethnicity.
[2] Child exact age at PPVT was taken into consideration through the normed scoring procedure and was thus not also controlled for in analyses.
Predicting receptive vocabulary at age 3;0
We began our multiple regression model fitting process by starting with the language measures most strongly associated with PPVT scores, namely, CDI and BAYLEY_LANG. We then entered these two predictors in one model to determine their combined effect. Next we examined whether any of the other language measures explained additional variance in PPVT after controlling for children's CDI and Bayley language factor scores. Finally, we tested for interaction effects.
Table 3 presents our model-building process using language measures and controls to predict PPVT. Model 1 shows the results of the simple regression model with CDI as a predictor. The CDI explains approximately 25% of the variance in PPVT scores. Model 2 shows the results of the simple regression model with BAYLEY_LANG as a predictor. BAYLEY_LANG also explains approximately 25% of the variance in PPVT scores. Together in the same model (Model 3), however, CDI and BAYLEY_LANG combine to explain 32.6% of the variance in PPVT scores. Thus, the measures are not collinear, as each explains some unique variance in PPVT scores. Neither word types nor tokens were significant when included in a model already containing CDI and BAYLEY_LANG. In Model 4 we added all of the controls that were previously shown to have significant relationships with PPVT (i.e. maternal education, age, and site). In this model, none of the control predictors remained predictive of PPVT with BAYLEY_LANG and CDI in the model, although the R-squared statistic increased approximately 6% upon the inclusion of the controls.
Next we investigated whether there were interaction effects between SITE and the other variables in the model. We anticipated that there might be an interaction between CDI and SITE, because we had learned previously that the scores on the CDI differed significantly by site, and therefore the role of the CDI as a predictor of PPVT might also differ by site. All interactions were tested and, as hypothesized, the interaction between CDI and SITE was a significant predictor (Model 5). Finally, as a last step, we removed the non-significant controls from the model (Model 6).
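The model-building sequence described above (single predictors, then both language measures, then a SITE interaction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the generating coefficients are arbitrary, SITE is coded 1 for rural and 0 for urban, and the helper names `solve` and `fit_ols` are hypothetical. It uses ordinary least squares through the normal equations and is not the software or data used in the study.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Regress y on an intercept plus the given predictor columns.

    Returns (coefficients, R-squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the study data); all values are arbitrary.
rng = random.Random(0)
n = 105
site = [1.0 if i < 50 else 0.0 for i in range(n)]  # assumed coding: 1 = rural
cdi = [rng.uniform(0, 100) for _ in range(n)]
bayley = [rng.uniform(0, 6) for _ in range(n)]
ppvt = [60 + 0.1 * c + 2.0 * b + 0.3 * c * s + rng.gauss(0, 5)
        for c, b, s in zip(cdi, bayley, site)]
cdi_x_site = [c * s for c, s in zip(cdi, site)]

# Nested models in the spirit of Models 1, 3 and 6.
_, r2_cdi = fit_ols([cdi], ppvt)
_, r2_both = fit_ols([cdi, bayley], ppvt)
beta, r2_full = fit_ols([cdi, bayley, site, cdi_x_site], ppvt)

# With this coding, the CDI slope is beta[1] in the urban group
# and beta[1] + beta[4] in the rural group.
simple_slope_urban = beta[1]
simple_slope_rural = beta[1] + beta[4]

print(f"R2 CDI only: {r2_cdi:.3f}")
print(f"R2 CDI + Bayley: {r2_both:.3f}")
print(f"R2 with SITE and CDI x SITE: {r2_full:.3f}")
```

Because each model nests the previous one, R-squared cannot decrease as predictors are added; whether an added term is worth keeping is a separate question that the study addressed with significance tests.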
TABLE 3. Regression models predicting children's receptive vocabulary skills at age 3;0 (PPVT) on the basis of child language measures at age 2;0, controlling for demographic variables (n=105)
In separate analyses we looked at the role of race/ethnicity. When we included a white/non-white dummy variable instead of SITE in the same models, the findings were similar in that there was a white × CDI interaction, and the parallel Model 6 for the white analysis resulted in an R-squared statistic of 39.69% vs. 40.54% for the Site analysis. Using a Hispanic vs. non-Hispanic dummy variable also resulted in similar findings, with a slightly smaller R-squared statistic (35.65%). A black dummy variable did not prove significant. Controlling for SITE and the other variables in Model 6, neither Hispanic nor black were significant additional predictors. Therefore, we are unable to tease apart the role of urban vs. rural site and race/ethnicity in this study due to the make-up of the samples. Model 6 in Table 3 is therefore our best or final model for predicting variation in three-year-old children's receptive vocabulary skills with language measures at age 2;0 and controls. Examination of residuals from this final model showed that no assumptions were violated. Regarding the role of the Bayley language factor in predicting PPVT, this model tells us that controlling for CDI scores, urban vs. rural site (or race/ethnicity), and the interaction between site and the CDI, for every additional point on the Bayley language factor at age 2;0, children, on average, do approximately 1.5 points better on the PPVT a year later. Understanding the role of the CDI is a bit more complicated and is more easily explained through the use of a visual aid. Figure 1 shows the differing relationship between the CDI and the PPVT for children from the rural vs.
urban site controlling for Bayley language scores. The dark black line shows the effect of the CDI in the urban, predominantly non-white sample, whereas the dashed gray line represents the effect of the CDI for the rural, predominantly white sample. The effect of the CDI on PPVT is positive in both sites, but the magnitude of the effect is larger in the rural, predominantly white sample than the urban, predominantly non-white sample. Thus, controlling for age 2;0 Bayley language factor scores, the role of maternal report (CDI) at age 2;0 in predicting children's receptive vocabularies (PPVT) a year later is stronger for the white, rural sample than for the non-white urban sample.
Fig. 1. Effect of the CDI at age 2;0 on PPVT scores at age 3;0 for children in urban and rural sites, controlling for Bayley language factor scores at age 2;0 (n=105). Legend: Urban, Rural.
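The site-specific effect of the CDI can be made explicit by writing out the form of the final model. The notation below is a sketch under stated assumptions: the coefficient symbols are ours, SITE is assumed to be coded 1 for rural and 0 for urban (the coding is not given in the text), and only the approximate Bayley coefficient of 1.5 comes from the results reported above.

```latex
\begin{align*}
\text{PPVT}_i &= \beta_0 + \beta_1\,\text{CDI}_i + \beta_2\,\text{BAYLEY\_LANG}_i
                + \beta_3\,\text{SITE}_i
                + \beta_4\,(\text{CDI}_i \times \text{SITE}_i) + \varepsilon_i \\[4pt]
\text{Urban } (\text{SITE}=0):\quad
  \frac{\partial\,\text{PPVT}}{\partial\,\text{CDI}} &= \beta_1 \\
\text{Rural } (\text{SITE}=1):\quad
  \frac{\partial\,\text{PPVT}}{\partial\,\text{CDI}} &= \beta_1 + \beta_4
\end{align*}
```

Under this coding, a positive CDI effect in both sites corresponds to \(\beta_1 > 0\) and \(\beta_1 + \beta_4 > 0\), and the steeper rural line corresponds to \(\beta_4 > 0\). The reported Bayley effect corresponds to \(\beta_2 \approx 1.5\), which does not vary by site because the model contains no Bayley-by-site interaction.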

DISCUSSION
Despite broad consensus as to the importance of children's early vocabulary development, methods of assessing the language abilities of toddlers remain problematic. Recognizing that standardized measures administered by a stranger are likely to underestimate children's true abilities, researchers and practitioners alike have looked to parental report and/or spontaneous speech samples as sources of information. Each of these, too, has its advantages and limitations. Parents enjoy access to a rich database of child vocabulary production in ecologically valid and varied settings. However, parents differ in the beliefs they hold about children's development in general and their language development in particular (e.g. Goodnow & Collins, 1990); they likely also differ in the extent to which they attend to children's early verbal production. Production of items parents deem remarkable may be more salient and more accessible to recall than are less unusual items. Even the proportion of particular word types can vary in parental report and in child spontaneous speech (Pine, Lieven & Rowland, 1996). Furthermore, lengthy checklists may try respondents' patience, while brief ones may undersample children's vocabulary. Spontaneous speech samples have the advantage of reflecting children's vocabulary use for authentic communicative purposes without children's production being filtered through parental recall. However, spontaneous samples tend to be brief (10-15 minutes in length) and are sensitive to contextual variables such as setting, activity, and interlocutor speech style (Yont et al., 2003). Moreover, the number of different words children produce is influenced by how much talk they themselves produce overall (Richards, 1987). Finally, the process of collecting, transcribing and analysing speech samples is costly and often difficult, if not unfeasible, in large-scale studies.
Thus, a key question for research as well as intervention is how closely associated these various indices of children's vocabulary are. The question is particularly pressing as it applies to low-income children's language development, both because as a group such children are at risk for language-related academic difficulties and because our research base in this area is still quite limited. This study undertook to examine the degree of association between three measures of low-income toddlers' language at age 2;0 and the relative value of each in predicting receptive vocabulary measured at age 3;0.
Results from the present study showed that all concurrent measures of two-year-olds' vocabulary/language were moderately to strongly positively associated. Parental report was more closely related to concurrent observed vocabulary use than to child talkativeness, suggesting that parental report is indeed a measure of child vocabulary use and not simply a global assessment of the child's verbal production. At the same time, parental report on the CDI Short Form in this sample did not appear to be as highly associated with observed word types or with structured assessments as has been reported for middle-class children (Dunham & Dunham, 1992; Corkum & Dunham, 1996; Ring & Fenson, 2000). These results warranted further exploration of factors potentially associated with maternal report.
In the current sample, maternal education and children's CDI scores at age 2;0 were not significantly associated. This lack of association is not consistent with the findings of Feldman et al. (2000) either for children about age 1;0 (negative relationship with maternal education) or for those a year older (positive relationship). On the other hand, in the current sample, the lack of association between maternal education and CDI scores was also observed for the other three language measures (word types, tokens, and Bayley language factor), suggesting that parental report is congruent with other indices of vocabulary at this age and for this group of families. By age 3;0, a trend toward positive association between maternal education and children's receptive vocabulary was observed. It appears, then, that the relationship between maternal education and child vocabulary in toddlerhood may not be a very robust or stable one (at least over the range of education represented here), though the possibility remains that a continuous measure of maternal education, rather than the categorical one employed here, might yield more variation. In other work with the subsample of rural families studied here, we have found maternal language and literacy skills to be more predictive than maternal education of child vocabulary growth over the first three years of life (Pan et al., 2003). Maternal age showed a somewhat parallel pattern, in that it was not associated with any language measures at child age 2;0, but was positively associated with receptive language a year later.
There does seem to be some cause for cautious interpretation of parental report by families differing in race/ethnicity. White mothers reported higher vocabulary scores for their two-year-old children than did black or Hispanic mothers, possibly reflecting either overestimation by white mothers or underestimation by black and Hispanic mothers. However, the case for over- or underestimation is somewhat weakened by the additional race/ethnicity-related differences in receptive skills as assessed with the PPVT-III at age 3;0. These findings suggest several possible interpretations. One is that the PPVT-III is not equally valid for minority and non-minority children, even though the current version is judged an improvement over earlier forms, at least with respect to African-American children (Washington & Craig, 1999). A second has to do with validity concerns regarding assessment of children growing up in bilingual households, as was the case for 41% of children living in the urban site in this study. For those children, the PPVT-III alone may underestimate receptive vocabularies. Even though mothers in this study all described their children as monolingual speakers of English, some children may have had substantial receptive skills in the home language that were not tapped by testing in English alone (Pearson, Fernandez & Oller, 1993). Yet another possibility is that the race/ethnicity-related differences validly reflect subtle racial/ethnic differences in parenting beliefs about the importance of child language and vocabulary, or about the pace at which language unfolds. Such differences in parenting beliefs may be first observable in parental report.
The results of this study point to the importance of using multiple sources of data in assessing the vocabulary of children in low-income families when possible, and suggest that, in the absence of resources for collecting/analysing spontaneous speech samples or administering standardized language/cognitive assessments, maternal report can offer useful information about toddlers' vocabulary. At the same time, extreme caution is warranted in extending interpretation of results based on one racial/ethnic group of families to other groups. In particular, more research is needed with white and Hispanic children from low-income families, to complement previous research that has tended to focus more on African-American families (e.g. Roberts et al., 1999).
Our third question asked how predictive each of the data sources was of child receptive vocabulary at age 3;0. Although all the language measures at age 2;0 predicted PPVT scores a year later, correlations for the observed language measures were modest, whereas those for parental report and the Bayley language factor were somewhat stronger. These findings regarding the predictive validity of parent report in a low-income sample are comparable to those reported by Reese & Read (2000) for a sociodemographically diverse New Zealand sample. Unlike Reese & Read, however, we did not find that predictive validity of the CDI for later receptive vocabulary differed as a function of maternal education. These divergent findings may reflect differences in the distribution of maternal education levels in the two studies. Nearly half of the Reese & Read rural sample had post-secondary education, compared to about 30% of mothers in the current sample. Thus, it is possible that their education findings reflect differences across socioeconomic groups, whereas ours reflect within-group variation.
Like the CDI, observer ratings of children's language at age 2;0 (the Bayley language factor) are predictive of children's receptive vocabulary skills a year later. Parental report and observer ratings appear to predict largely, though not entirely, overlapping variance. Measures of children's observed production (word types, tokens) did not contribute additional variance, perhaps due to brevity of the spontaneous speech samples analysed, or contextual factors. These results are in keeping with those of Corkum & Dunham (1996), who found that neither number of child word types nor word tokens observed at age 2;0 accounted for unique variance in child verbal IQ scores at age 4;0.
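The degree of overlap between parental report and observer ratings can be roughed out from the R-squared values reported for Models 1 to 3. The arithmetic below is only an approximation, since the single-predictor figures were each given as "approximately 25%", and it partitions explained variance in the usual way for two predictors rather than reproducing any calculation from the original analysis.

```latex
\begin{align*}
\text{unique to CDI} &\approx R^2_{\text{Model 3}} - R^2_{\text{Model 2}}
  \approx 32.6\% - 25\% = 7.6\% \\
\text{unique to BAYLEY\_LANG} &\approx R^2_{\text{Model 3}} - R^2_{\text{Model 1}}
  \approx 32.6\% - 25\% = 7.6\% \\
\text{shared} &\approx R^2_{\text{Model 1}} + R^2_{\text{Model 2}} - R^2_{\text{Model 3}}
  \approx 25\% + 25\% - 32.6\% = 17.4\%
\end{align*}
```

On these approximate figures, roughly half of the variance explained by the two measures together is shared, with a smaller portion unique to each measure.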
Finally, let us consider the differential predictive value of parental report across sites found in this study. Our results indicated that the CDI seemed particularly sensitive to site and/or race/ethnicity effects. On the positive side, then, this suggests that parental report may offer an early hint at variability less discernible in other sources of information. At the same time, its superior predictive value for children in the rural site raises other intriguing questions that cannot be answered based on the findings of this study alone. What might account for the observed differences? What cultural values are responsible for differences in the ways parents in the two sites reported child language? Above we suggested one possibility, having to do with other languages spoken in the home. Another might have to do with parenting beliefs and values. An obvious limitation of the current study is that it is not possible to isolate the source(s) of variation across sites, given that families in the two sites, while all low-income, differed significantly in other key demographic characteristics, in particular maternal age, education, and race/ethnicity. These three factors, and perhaps others (e.g. father presence in the home), thus deserve further investigation with a larger sample in which confounding effects can be teased apart. Future work would also benefit from examination of parenting beliefs about language development held by subgroups of low-income parents who complete such inventories about their child's language skills. By triangulating data from parent report, spontaneous speech measures, and standardized assessments with a sizeable sample of low-income families, the current study provides a solid point of departure for such work.
Item 126 Child names three objects (ball, picture book, cup, spoon, pencil)
Item 127 Child uses a three-word sentence (observed)
Item 129 Child makes a contingent utterance (observed)
Item 133 Child names five objects in photo (shoe, dog, cup, house, clock, book, fish, star, leaf, car)
Item 136 Child poses question (observed)
Item 142 Child produces multiple-word utterance in response to picture book
TABLE 2. Correlation matrix showing simple estimated correlation coefficients (Pearson's r) between child language measures at age 2;0 and PPVT at age 3;0