The Youth Self Report: Applicability and Validity Across Younger and Older Youths

The Youth Self Report (YSR) is a widely used measure of youth emotional and behavioral problems. Although the YSR was designed for youths ages 11 to 18, no studies have systematically evaluated whether youths younger than age 11 can make valid reports on this measure. This study thus examined the reliability and validity of the YSR scales scores for younger (ages 7–10; n = 184) and older (ages 11–14; n = 147) youths. Results demonstrated that younger youths were able to provide reliable reports on the YSR broad band (Internalizing, Externalizing) scales, though less so on the narrow band scales. Across all scales, the externalizing scales performed more favorably than the internalizing scales among both younger and older youth. Younger youths’ DSM-oriented scales corresponded significantly with DSM diagnoses.

The Youth Self Report (YSR; Achenbach & Rescorla, 2001) is a prominent and widely used youth self-report measure for the assessment of emotional and behavioral problems among youth ages 11 to 18. Despite the wide usage of the YSR, a notable gap in its evidence base is that few studies have assessed the reliability and validity of the YSR scale scores for youths younger than 11 years old. This is an important gap to fill given that some researchers have already begun administering the YSR to younger youths (below the intended age range of 11-18 years; Kolko & Kazdin, 1991; Yeh & Weisz, 2001). Demonstrating more conclusive psychometric support of the YSR with younger youth samples would also provide the field with an empirically supported assessment tool with broadened applicability to enhance child assessment practices in both research and clinical contexts.
Previous studies have examined this question with other youth self-report measures. For example, Muris, Meesters, Eijkelenboom, and Vincken (2004) examined the psychometric properties of the Strengths and Difficulties Questionnaire (SDQ) and found general support for this measure for use among younger youths (ages 8-10), although the scale was originally intended for use with youths ages 11 to 17. A few studies have also evaluated the psychometric properties of YSR scales among youths younger than 11 years old. Kolko and Kazdin (1991) pilot tested the YSR among younger youth and reported that 6-year-olds were only "somewhat familiar" with the five items related to medical or physical conditions (p. 538). Kolko and Kazdin (1993) also administered the YSR to children ages 6 to 13 and found no differences between younger (6-9 years old) and older (10-13 years old) youths with respect to parent-child and teacher-child agreement on the YSR Internalizing, Externalizing, and Total Problems scales. Yeh and Weisz (2001) also reported no differences between younger (ages 7-10) and older (ages 11-18) youths in the coefficient alpha values and test-retest reliability estimates of the YSR syndrome scales.
Despite these initial explorations, no study has thoroughly or systematically examined the YSR scales across multiple psychometric domains with younger samples. Such studies are needed given that the YSR continues to be used with youth younger than 11 years old (e.g., McCarthy & Weisz, 2002; Treutler & Epkins, 2003). Additional questions regarding the validity of younger youths' YSR reports remain unanswered (e.g., factor structure, concurrent validity), and thorough psychometric investigations specific to the YSR are needed before researchers and clinicians begin widely using the YSR among younger samples. It remains unknown, for instance, whether younger youths can provide reliable and valid reports on both the YSR broad band and narrow band scales.

THE PRESENT STUDY
The current study examined the psychometric properties of both younger (ages 7-10; n = 184) and older (ages 11-14; n = 147) youths' reports along the following dimensions: (a) factor structure, (b) scale reliability, (c) concurrent validity, and (d) parent-child agreement. Within each domain, we examined whether the test statistics of the younger group met general cutoff criteria for adequate reporting as well as whether their test statistics were significantly different from those of the older group.
We hypothesized that the younger youths' reports would be associated with model fit indices in acceptable ranges, as previous studies have demonstrated that younger youths' reports on internalizing and externalizing measures were associated with adequate model fit indices (e.g., Muris et al., 2004). Regarding scale reliability, Yeh and Weisz (2001) previously examined the broad band and syndrome scales, reporting .76 as the average (internal consistency) alpha value among their younger youth sample and no significant difference from the average alpha in their older group. We thus predicted that the Cronbach's alpha coefficients of the younger group's scales would not be significantly less than those of the older group. We could not make specific hypotheses regarding the concurrent validity of younger youths' reports on the Diagnostic and Statistical Manual of Mental Disorders (DSM)-oriented scales given the mixed findings pertaining to their performance in the literature (cf. Ferdinand, 2008; Vreugdenhil, van den Brink, Ferdinand, Wouters, & Doreleijers, 2006). Based on Achenbach, McConaughy, and Howell's (1987) meta-analysis and Meyer and colleagues' (2001) review of parent-child agreement on psychosocial problems, we hypothesized that the older youth would evidence significant parent-child correlation coefficients in the range of .20 to .25. We further hypothesized that parent-child agreement for the younger group would show significantly smaller correlation coefficients than the older group given the generally lower parent-child agreement findings among younger youths (e.g., Edelbrock, Costello, Dulcan, Conover, & Kalas, 1986; Grills & Ollendick, 2003).

METHOD
Participants
Youths in the present sample were drawn from 333 consecutively referred children and adolescents ages 7 to 14 who were seeking treatment in community clinic settings in Hawaii and Massachusetts for problems related to anxiety, depression, and/or conduct problems. Criteria for selection into the present study included having available YSR data. All 333 consecutively referred youth had available YSR data. To help ensure that all YSRs represented valid reports with sufficient data, inclusion into the study also required each YSR measure to have no more than eight problem items missing, as recommended by the measure's developers (Achenbach & Rescorla, 2001). Two participants were excluded due to having more than eight missing YSR items, yielding a final sample size of 331 youths. We computed Child Behavior Checklist (CBCL) scales only if the CBCL also had eight or fewer missing items.
Information on the total number of diagnoses in our sample appears in Table 1. Youth ages ranged from 7 to 14 years (M = 10.6, SD = 1.7), and caregiver ages ranged from 21 to 78 (M = 41.2, SD = 9.7). Youths from the two clinics generally did not differ.¹ Additional youth and primary caregiver demographic information appears in Table 2.
Children's Interview for Psychiatric Syndromes, Child Version (ChIPS; Fristad et al., 1998; Teare, Fristad, Weller, Weller, & Salmon, 1998). The ChIPS is a semistructured interview designed to be administered to youth ages 6 to 18 years old. The interview screens for 20 different Axis I disorders and is based on the DSM-IV (American Psychiatric Association, 1994) classification criteria. Content validity, concurrent validity, and inter-rater agreement of the ChIPS have been demonstrated in previous studies in clinical and community samples (e.g., Fristad et al., 1998; Teare et al., 1998).
YSR (Achenbach & Rescorla, 2001). The YSR is a self-report questionnaire developed to assess problems in youth ages 11 to 18. The 119 items on the YSR are rated as 0 (not true), 1 (somewhat or sometimes true), or 2 (very true or often true). The YSR developers intended it to be completed by youth with a mental age of 10 and fifth-grade reading skills (Achenbach & Rescorla, 2001).² Validity and reliability of the YSR broad band, syndrome, and DSM-oriented scales have been documented, and extensive normative data are available for children ages 11 to 18 (Achenbach & Rescorla, 2001). We used raw scores for all analyses.
Table 1 note: N = 331. Primary = a child's primary diagnosis; Anywhere = a diagnosis that appears anywhere in a child's diagnostic profile; PTSD = posttraumatic stress disorder; NOS = not otherwise specified; ADHD = Attention-Deficit/Hyperactivity Disorder; PDD = Pervasive Developmental Disorder; Other includes substance abuse, substance dependence, enuresis, trichotillomania. Diagnostic data were missing for three younger youths and one older youth. Therefore, the total number of primary disorders (including no diagnosis) does not sum to the total sample size of 331.

Procedure
Legal guardians of all participating youths underwent standardized Institutional Review Board-approved notice of privacy and consent procedures prior to any data collection. Following consent provided at the initial meeting with the youths and their caregivers, the youths and caregivers filled out questionnaires including the YSR and CBCL. Youths also participated in the ChIPS structured interview conducted by assessors who were clinical psychology doctoral students and bachelor-level trained staff.³ Assessors were blind to the YSR and CBCL scores while formulating diagnoses.

Data Analyses
Data preparation. Although missing data levels were low in our sample (80.5% and 81.6% of the 331 participants had no missing YSR and CBCL items, respectively; and 12.3% and 12.7% had only 1 missing YSR and CBCL item, respectively), missing data were handled using the Missing Value Analysis module of SPSS 15.0 (SPSS, 2006).⁴ To help ensure that all YSR and CBCL subscales were valid, we calculated each subscale only if fewer than 20% of its items were missing (cf. Ebesutani, Bernstein, Nakamura, Chorpita, & Weisz, 2010).
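The two screening rules described in the text (a form is retained only with eight or fewer missing problem items; a subscale is scored only when fewer than 20% of its items are missing) can be sketched as follows. This is a minimal illustration rather than the SPSS procedure used in the study; the function names and the use of None to mark a missing response are assumptions made for the example.

```python
from typing import Optional, Sequence

MAX_MISSING_ITEMS = 8         # form-level rule attributed to the YSR manual
MAX_MISSING_FRACTION = 0.20   # subscale-level rule: must be strictly below this


def form_is_usable(responses: Sequence[Optional[int]]) -> bool:
    """Return True if the form has no more than eight missing problem items."""
    n_missing = sum(1 for r in responses if r is None)
    return n_missing <= MAX_MISSING_ITEMS


def subscale_is_scorable(item_responses: Sequence[Optional[int]]) -> bool:
    """Return True if fewer than 20% of the subscale's items are missing."""
    if not item_responses:
        return False
    n_missing = sum(1 for r in item_responses if r is None)
    return n_missing / len(item_responses) < MAX_MISSING_FRACTION
```

Under this sketch, a five-item subscale with one missing response (exactly 20%) would not be scored, because the rule requires fewer than 20% missing.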
Confirmatory factor analysis. We explored the model fit of the YSR narrow and broad band scales using both younger and older subsamples. We conducted confirmatory factor analysis (CFA) using LISREL 8.8. We used the comparative fit index (CFI; Bentler, 1990) and the root mean square error of approximation (RMSEA; Steiger, 1990) statistics to evaluate model fit. CFI values of .90 or greater (Bentler, 1990) and RMSEA values of .08 or lower (Browne & Cudeck, 1993) suggest good model fit. We then conducted a multisample CFA to assess the degree to which the DSM-oriented scales were invariant across younger and older youths with respect to factor form and other related model parameters (i.e., factor loadings, factor correlations, error variance).
² Although the YSR developers intended the YSR to be completed by youth with a mental age of 10 and fifth-grade reading skills, analysis of the YSR items via the Flesch-Kincaid readability scale (Flesch, 1951) yielded a Flesch Reading Ease score of 100.0 and a Flesch-Kincaid Grade Level score of 0.6. Flesch Reading Ease scores of 90 to 100 indicate easily understandable items for an average 11-year-old student, and the Flesch-Kincaid Grade Level score corresponds to approximately Grade 1 reading level. These results thus indicate that the YSR items are highly readable, even among children younger than 11 years old.
³ Although interrater reliability data of these structured interviews were not gathered, assessors in the present study were trained to reliability using the ChIPS. Becoming trained to reliability involved (a) observation of three ChIPS interviews conducted by trained assessors, (b) conducting a series of five ChIPS interviews while being observed by a criterion-trained assessor, (c) matching the experienced assessor on all clinical diagnoses in three of the five interviews, and (d) matching the experienced interviewer on the Clinical Severity Ratings (CSRs) within at least 1 point on all diagnoses given. CSRs are ratings provided by the assessor that range from 0 to 10 and indicate the clinical severity of each disorder. CSRs of 5 or higher indicate clinically significant severity for each disorder.
⁴ Notably, missing item values can be a sign that items were not understood by the respondent (e.g., the youth). We thus examined the number of missing items specific to the younger and older youths in the present study. The number of missing YSR items was low for both the younger and older groups. Specifically, the percentages of younger and older youths with missing YSR items were as follows: no missing YSR items = 80% and 85%, respectively; one missing item = 13% and 11%, respectively; two missing items = 3% and 1%, respectively; three to eight missing items = 4% and 3%, respectively. Both younger and older youths thus had comparable (low) levels of missing data.
Scale reliability. We evaluated the reliability of the younger and older youths' reports on each of the YSR scales by estimating internal consistency via Cronbach's alpha coefficients. We used .80 as the cutoff for acceptable reliability, as recommended by Nunnally and Bernstein (1994) for scale scores intended for use in clinical settings. Differences in internal consistency between groups were evaluated via F tests for Cronbach's alphas from independent samples (Feldt, 1969; Feldt, Woodruff, & Salih, 1987), adjusting the p value criterion to less than .003 (.05/17) to control for Type 1 error rates. As a basis for comparison, we also computed Cronbach's alpha coefficients for the narrow and broad band CBCL scales among the younger and older groups.
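The reliability analysis can be illustrated with a short sketch. It computes Cronbach's alpha from a respondents-by-items score matrix and forms the ratio statistic used to compare two independent alphas. The function names are hypothetical, and the form of the statistic, W = (1 - alpha_2) / (1 - alpha_1) referred to an F distribution with (n_1 - 1, n_2 - 1) degrees of freedom, reflects how Feldt's (1969) test is commonly presented and is stated here as an assumption, not as the exact computation run in the study. The p value lookup is omitted because it requires an F distribution routine outside the Python standard library.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = len(scores[0])  # number of items on the scale
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def feldt_w(alpha_1: float, n_1: int,
            alpha_2: float, n_2: int) -> Tuple[float, int, int]:
    """Ratio statistic for two independent alphas and its assumed F degrees of freedom."""
    w = (1 - alpha_2) / (1 - alpha_1)
    return w, n_1 - 1, n_2 - 1


# Bonferroni-style criterion described in the text: .05 spread over 17 comparisons.
ADJUSTED_P_CRITERION = 0.05 / 17
```

With 17 comparisons, the adjusted criterion evaluates to roughly .0029, which the text rounds to .003.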

Concurrent validity. We used an analysis of variance and receiver operating characteristic (ROC) analyses to examine the degree of correspondence of younger youths' reports on the DSM-oriented scales with related DSM diagnoses. For the ROC analyses, Area Under the Curve (AUC) values indicate the degree to which an indicator predicts binary classification status (e.g., presence/absence of a diagnosis). AUC values may be interpreted as follows: AUC of .50 to .70, poor; .70 to .80, fair; .80 to .90, good; .90 to 1.00, excellent (cf. Ferdinand, 2008). We also compared the relative performance of the younger and older youths' reports via z test comparisons of AUC values (p value criterion adjusted to <.003 [.05/16] to control for Type 1 error rates).
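As an illustration of the AUC logic, the sketch below computes AUC as the probability that a randomly chosen youth with the diagnosis scores higher on the scale than a randomly chosen youth without it (ties counted as one half), and then maps the value onto the interpretive bands listed in the text. The function names are hypothetical, and assigning each boundary value (.70, .80, .90) to the higher band is an assumption, since the bands as stated share their endpoints.

```python
from typing import Sequence


def auc(diagnosed_scores: Sequence[float],
        undiagnosed_scores: Sequence[float]) -> float:
    """AUC as the share of diagnosed/undiagnosed pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for d in diagnosed_scores:
        for u in undiagnosed_scores:
            if d > u:
                wins += 1.0
            elif d == u:
                wins += 0.5
    return wins / (len(diagnosed_scores) * len(undiagnosed_scores))


def auc_label(value: float) -> str:
    """Map an AUC value onto the poor/fair/good/excellent bands."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    if value >= 0.50:
        return "poor"
    return "below chance"
```

This pairwise formulation is equivalent to the area under the empirical ROC curve and makes the chance level of .50 easy to see: if scale scores carried no diagnostic information, half of the pairs would be ranked correctly.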
Correlational analyses. Last, we examined parent-child agreement⁵ of the younger youths compared to older youths. We used Fisher's z tests to examine differences in (independent) correlations between groups (p value criterion adjusted to <.003 [.05/16] to control for Type 1 error rates). To determine significance of individual correlations, we used the significance level of p < .01.
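The comparison of independent correlations can be sketched with the standard Fisher r-to-z procedure. The function name is hypothetical, and the two-tailed p value is an assumption about how the test was applied; the text does not state the directionality used.

```python
import math
from typing import Tuple


def fisher_z_test(r_1: float, n_1: int, r_2: float, n_2: int) -> Tuple[float, float]:
    """z statistic and two-tailed p value for the difference between two independent correlations."""
    z_1 = math.atanh(r_1)  # Fisher r-to-z transformation
    z_2 = math.atanh(r_2)
    standard_error = math.sqrt(1.0 / (n_1 - 3) + 1.0 / (n_2 - 3))
    z = (z_1 - z_2) / standard_error
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed
```

A difference would be flagged under the adjusted criterion only when the returned p value falls below .05/16.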

Factor Structure Across Younger and Older Youths
Adequate model fit was demonstrated among the younger and older samples for the six-factor DSM-oriented scales (younger: RMSEA = .068, CFI = .87; older: RMSEA = .070, CFI = .87), and the eight-factor syndrome scales (younger: RMSEA = .077, CFI = .80; older: RMSEA = .070, CFI = .74). The multisample CFA solution evidenced support for "equal form" of the six-factor DSM-oriented problems model across younger and older groups (i.e., multisample RMSEA = .069). Further, allowing correlations between factors to be freely estimated did not significantly improve fit compared to specifying all factor correlation pairs to be equal across younger and older groups, freely estimated model χ²(2620) = 4809.91; constrained model χ²(2641) = 4845.82; difference χ²(21) = 35.91, p > .01, suggesting that the correlations between factors are generally equal across groups. Overall, the YSR scales evidenced supportive factorial validity across both younger and older youths.
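The nested-model comparison reported above is a chi-square difference test. The sketch below recomputes the difference from the two reported chi-square values and evaluates it against the reported 21 degrees of freedom. The helper chi2_sf is an illustrative standard-library implementation of the chi-square upper-tail probability (a series expansion of the regularized incomplete gamma function); it is an assumption for this example and not the routine LISREL uses.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, half_x).
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return 1.0 - lower


# Values reported in the text; the df difference of 21 is taken as reported.
chi2_free = 4809.91
chi2_constrained = 4845.82
delta_chi2 = chi2_constrained - chi2_free
delta_df = 21
p_value = chi2_sf(delta_chi2, delta_df)
```

The resulting p value lies above the .01 criterion used in the text, consistent with the conclusion that freeing the factor correlations did not significantly improve fit.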

Internal Consistency Across Younger and Older Youths
The Cronbach's alpha values associated with reports on the YSR and CBCL specific to younger and older youths appear in Tables 3 and 4, respectively. Results revealed that the YSR narrow band scales did not achieve adequate levels of reliability (α < .80) among the younger group, whereas the older group performed much better with respect to this benchmark. The younger youths' YSR broad band internalizing and externalizing scale scores, however, did meet the benchmark for acceptable reliability (α = .88, α = .88, respectively), supporting the reliability of the broad band scale scores for application with younger youth in clinical settings. This is an important finding, particularly as Muris and colleagues (2004) found that younger youths (ages 8-10) were not able to provide reliable reports on the SDQ scales, including the Total Difficulties scale (α = .76).

Concurrent Validity
Anxiety Problems scale. As seen in Table 5, both younger and older youths' reports on the Anxiety Problems scale were able to discriminate anxious youths from non-anxious youths, as evidenced by significant F tests and AUC values significantly greater than chance level (i.e., AUC > .50). However, AUC values for the younger group fell in the "poor" range, whereas the AUC values for the older group fell in the "fair" range. AUC values between groups did not significantly differ.
⁵ As some CBCL DSM-oriented, syndrome, and broad band scales contain additional items not present on the YSR (i.e., the CBCL DSM-oriented Conduct Problem scale includes two more items than the YSR DSM-oriented Conduct Problem scale; the CBCL Internalizing scale includes one more item than the YSR Internalizing scale; the CBCL Externalizing scale includes three more items than the YSR Externalizing scale; five of the eight CBCL Syndrome scales include one to three more items than the corresponding YSR Syndrome scales), we rescored these CBCL scales excluding the nonoverlapping items. We then used these rescored CBCL scales (based on YSR/CBCL overlapping items only) in the correlational analyses, so as to eliminate bias toward lower correspondence due to the additional CBCL items.
EBESUTANI ET AL.
Attention Deficit/Hyperactivity (ADH) Problems scale. As seen in Table 5, younger youths' reports on the ADH Problems scale were able to discriminate youths with ADHD diagnoses from youths without ADHD. AUC values for both the younger and older groups fell in the "fair" range and did not significantly differ.
Oppositional Problems scale. As seen in Table 5, younger youths' reports on the Oppositional Problems scale were able to discriminate youths with diagnoses of oppositional defiant disorder (ODD) from youths without ODD, as well as youths with any disruptive behavior diagnosis (i.e., ODD, Conduct Disorder [CD], or disruptive behavior disorder not otherwise specified) from youths without any disruptive behavior diagnosis. AUC values for the younger group fell in the "fair" range and did not significantly differ from the older group.
Affective, Conduct, and Somatic Problems scales. Given that there were insufficient numbers of youths diagnosed with CD (younger, n = 7; older, n = 26), affective disorders (younger, n = 22; older, n = 17), and somatic disorders (n = 0), concurrent validity analyses were omitted for the corresponding scales.⁶

Parent-Child Agreement Across Younger and Older Youths
Results of the parent-child agreement analyses across younger and older youths appear in Table 6 and revealed that younger youths' parent-child agreement correlation coefficients were nonsignificant (p > .01) for nearly all internalizing scales but significant for some externalizing scales. The older youths evidenced significant parent-child correlations on both internalizing and externalizing scales, although primarily among the externalizing scales. These results are consistent with previous findings that parent-child agreement is worse among younger youth (Grills & Ollendick, 2003) and is greater for externalizing problems (e.g., Christensen, Margolin, & Sullaway, 1992). It is worth noting, however, that the low reliability associated with the younger youths' reports likely attenuated their parent-child agreement correlation coefficients relative to the older youths.
⁶ Despite having insufficient power for these analyses, we conducted these analyses on the younger and older subsamples for illustrative purposes. The seven younger youths with CD scored higher on the DSM-oriented Conduct Problems scale (M = 9.43, SD = 3.87) than the 174 younger youths without CD (M = 2.67, SD = 3.32), t(179) = 5.38, p < .001. The 26 older youths with CD also scored higher on the DSM-oriented Conduct Problems scale (M = 9.17, SD = 4.50) than the 119 older youths without CD (M = 3.40, SD = 3.08), t(143) = 7.91, p < .001. With respect to the DSM-oriented Affective Problems scale, the 22 younger youths with any affective disorder (i.e., major depressive disorder, dysthymic disorder, mood disorder not otherwise specified) scored higher on this scale (M = 7.77, SD = 3.46) than the 159 younger youths without affective disorders (M = 5.24, SD = 3.89), t(179) = 2.91, p = .004. The 17 older youths with any affective disorder also scored higher on the DSM-oriented Affective Problems scale (M = 7.65, SD = 2.96) than the 129 older youths without affective disorders (M = 4.27, SD = 4.07), t(144) = 3.31, p = .001.
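The attenuation point can be made concrete with the classical correction for attenuation, under which an observed correlation equals the correlation between true scores multiplied by the square root of the product of the two measures' reliabilities. The sketch below is illustrative only: the function name is hypothetical, the numeric values in the usage line are invented for the example and are not results from this study, and the correction was not applied in the analyses reported here.

```python
import math


def disattenuated_correlation(r_observed: float,
                              reliability_x: float,
                              reliability_y: float) -> float:
    """Classical correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: an observed agreement of .20 with reliabilities of .64 and .81.
example = disattenuated_correlation(0.20, 0.64, 0.81)
```

Holding the true-score correlation fixed, a scale with lower reliability yields a smaller observed correlation, which is why lower alphas in the younger group would be expected to pull their parent-child agreement coefficients downward.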
Several limitations to the current study should be noted. First, reliability of the younger and older youths' scale scores was estimated via Cronbach's alpha coefficients. The addition of test-retest data for both younger and older youths would have provided an additional statistic with which to estimate reliability. In addition, concurrent validity estimates may have been inflated because the concurrent validity analyses were based on the same informant (i.e., both the YSR and ChIPS diagnostic data were derived from youth reports only). Future studies should also investigate the degree to which these reliability and validity statistics differ among older adolescent samples. Normative data for younger youths should also ideally be gathered, particularly for the broad band scales, to further increase the clinical utility of the YSR scales.

Implications for Research, Policy, and Practice
Despite these limitations, practitioners and researchers seeking empirically supported assessment tools for younger youth may administer and interpret the YSR broad band internalizing and externalizing scales. Caution should be exercised, however, if narrow band scores are interpreted, given the lower reliability evidenced in the present study. More research is needed to better understand the psychometric properties of the narrow band scales among younger samples, particularly given that they evidenced promising results in other domains (i.e., factor structure, concurrent validity).
Table 5 note: All Area Under the Curve (AUC) values were significantly greater than .50, p < .001. DSM = Diagnostic and Statistical Manual of Mental Disorders; ADH = Attention Deficit/Hyperactivity; ROC = receiver operating characteristic; ANOVA = analysis of variance; SAD/GAD/SPEC = youths with separation anxiety disorder, generalized anxiety disorder, and/or specific phobia; Any Anxiety = youths with separation anxiety disorder, generalized anxiety disorder, specific phobia, obsessive-compulsive disorder, posttraumatic stress disorder, panic disorder, social phobia, and/or anxiety disorder not otherwise specified; ADHD-PI/PH/C = youths with ADHD-PI, ADHD-PH, or ADHD-C; Any ADHD = youths with ADHD-PI, ADHD-PH, ADHD-C, or ADHD-NOS; ODD = oppositional defiant disorder; Any Disruptive = youths with oppositional defiant disorder, conduct disorder, or disruptive behavior disorder not otherwise specified.
Given that the YSR was designed to be completed by youth with fifth-grade reading skills, practitioners and researchers administering the YSR to younger youths should be prepared to provide assistance to children who have difficulty understanding items. The ASEBA manual (Achenbach et al., 2001) provides guidelines for "respondents who cannot complete forms independently" (p. 6), indicating that interviewers may read the questions to youths and record their responses for them.