Using the Oaxaca-Blinder Decomposition Technique to Analyze Learning Outcomes Changes over Time: An Application to Indonesia’s Results in PISA Mathematics

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations.


Introduction
The Oaxaca-Blinder technique was originally used in labor economics to decompose earnings gaps and to estimate the level of discrimination. For earnings differentials, multivariate regression analysis allows for the simulation of alternative outcomes and the decomposition of gross differentials. The decomposition method was popularized in the economics literature by Oaxaca (1973) and Blinder (1973). It was used earlier in sociology (Siegel 1965; Duncan 1968), and before that in demography (Kitagawa 1955). Although in economics it was first used to analyze the determinants of male/female earnings differentials, the technique has since been used to analyze ethnic earnings differentials, public/private sector earnings differentials, and earnings differentials by socioeconomic background, to test the screening hypothesis, and to test the effectiveness of job training programs, among other uses. It has also been applied to other social issues, including education, where it can be used to assess how much of a gap is due to differences in characteristics (explained variation) and how much is due to differences in the returns to those characteristics, which may reflect policy or system changes (unexplained variation).
We apply the decomposition technique to analyze the increase in Indonesia's score in PISA mathematics. The test score increase is assessed in relation to family, student, school, and institutional characteristics. The gap over time is decomposed into its constituent components based on the estimation of cognitive achievement production functions. The decomposition results suggest that almost the entire test score increase is attributable to changes in the returns to characteristics, mostly related to student age, rather than to changes in the characteristics themselves. However, we find that an adequate supply of teachers also plays a role in test score changes.

Indonesia has participated in PISA, the OECD's Programme for International Student Assessment (an internationally standardized assessment administered to 15-year-olds in schools), since its first round in 2000. There have been two subsequent rounds, in 2003 and 2006. Over time, Indonesia has maintained a steady score in science, with 393, 395, and 393 points in 2000, 2003, and 2006, respectively. The average score among OECD countries is 500 points, and the standard deviation is 100 points. Indonesian students have steadily improved their reading scores, from 371 in 2000 to 382 in 2003 and 393 in 2006, an increase of about 10 points, or a respectable 0.10 of a standard deviation, in each round. In math, there was no improvement between 2000 and 2003 (scores of 367 and 360 points), but there was a dramatic improvement in 2006, to 391 points: an increase of 0.30 of a standard deviation, or almost one full school year equivalent, in just three years.

Figure 1 shows how the change occurred. In 2003, 80 percent of Indonesian students scored at the lowest levels, level 1 and level -1. These are very low achievement levels, effectively denoting functional illiteracy. A typical student at level 1 or -1 may be able to read words but will not be able to decode the information they contain. By 2006, the share of students scoring at level -1 had decreased drastically, while the proportions at higher levels went up. Nevertheless, there were very few students at level 4 and above (and none at all at levels 5 and 6).

Figure 1: Level of Proficiency in Math, Indonesia
Most developing countries score at the bottom of the scale in most international achievement tests. Until recently, there were very few, if any, examples of developing countries that had achieved significant improvements on these tests. Critics argue that the international development community has focused almost exclusively on increasing enrollment in the education sector and has ignored the need for that education to be of adequate quality. Indonesia, however, is a rare case of a developing country that has achieved some progress.
To find out what lay behind Indonesia's exceptional improvement in 2006, we looked at how family, student, school, and institutional inputs may have affected the increase in the math test scores of 15-year-olds. We decomposed the increase over time into its constituent components using the traditional Oaxaca-Blinder method, based on the estimation of a cognitive achievement production function. Our decomposition results suggest that almost all of the test score increase was unexplained or, in other words, was due to changes in the returns to characteristics rather than to changes in the characteristics themselves. We found that most of the positive change was due to the increased returns over time to the variable representing a student's age, which varied only by months in this case (PISA is administered to a randomly selected sample of students who are between 15 years and 3 months and 16 years and 2 months old at the time of the test).

Empirical evidence on education production functions exists for both developed countries (for example, Hanushek 1986 and 2002) and developing countries (for example, Glewwe 2002). Previous empirical studies do not always agree on which school and family inputs improve children's achievement. For example, there is some disagreement about the roles played by schooling inputs such as class size, teacher experience, and teacher education, and by family inputs such as the mother's employment. For a survey of the related literature, see Todd and Wolpin (2003).

Although a child's achievement is inherently individual in nature, a large body of evidence points to the existence of persistence effects in educational achievement across generations (Fertig 2003; Fertig and Schmidt 2002; Currie and Thomas 1999). Consequently, it is necessary to control for the characteristics of individual students as well as for their family backgrounds. Similarly, it is necessary to control for the characteristics of the school environment as well as its institutional arrangements. Recent evidence from the literature on early test score differentials suggests that differences in children's cognitive ability across families appear at an early age, tend to persist, and may even widen with age. In general, "good" families promote cognitive, social, and behavioral skills, while "bad" families do not. This is important in determining which policy interventions can be successful (Carneiro and Heckman 2003). Evidence also suggests that socioeconomic and family background variables, such as the education levels of a student's parents and the number of books a child has, are very important determinants of test scores at early ages (Fryer and Levitt 2002).

Methodology and Estimation
Our first step was to specify and estimate cognitive achievement production functions that relate student achievement to individual, family, school, and institutional inputs. We then decomposed the over-time test score change into an explained component (accounting for student, family, school, and institutional characteristics) and an "unexplained" component (the efficiency with which the country is able to convert characteristics into student learning outcomes as measured by test scores), using the traditional Oaxaca-Blinder decomposition method (Oaxaca 1973; Blinder 1973).
The model specification that we used to estimate the production function for cognitive achievement is as follows:

$$T_{ija} = T(A_{ija}, F_{ija}, S_{ija}, I_{ija}) + \varepsilon_{ija} \qquad (1)$$

where $T_{ija}$ is the observed test score (from the PISA math test) of student $i$ in household $j$ at time $a$ (the time of the test), $A_{ija}$ is a vector of individual student characteristics, $F_{ija}$ is a vector of parent inputs, $S_{ija}$ is a vector of school-related inputs, $I_{ija}$ is a vector of the school's institutional characteristics, and $\varepsilon_{ija}$ is an additive error that includes all omitted variables, including those that relate to the history of past inputs, endowed mental capacity, and measurement error. Todd and Wolpin (2003) discuss in detail the assumptions that justify this specification, in which the achievement test score depends solely on contemporaneous measures of family, school, and other inputs. These assumptions are that (i) current input measures capture the entire history of inputs or, alternatively, only contemporaneous inputs matter, and (ii) contemporaneous inputs are unrelated to endowed mental capacity. The linear specification (after dropping subscript $a$) is given by:

$$T_{ij} = \beta_0 + \beta_1 A_{ij} + \beta_2 F_{ij} + \beta_3 S_{ij} + \beta_4 I_{ij} + \varepsilon_{ij} \qquad (2)$$

where $\beta_0$ to $\beta_4$ are the coefficients to be estimated.

The standard procedure for analyzing the determinants of test score differences over time is to estimate equations relating test scores to observed characteristics separately for each year. Writing $T$ for the standardized test score and $X_i$ for the vector of student, family, school, and institutional characteristics in year $i$, the observed test score differential can be decomposed into two components: the first reflects differences between the two years in the mean characteristics (the explained component), and the second reflects differences in the estimated returns to the same characteristics (the unexplained component). This second component, while more difficult to interpret in the present context than in an earnings gap decomposition, may have more than one explanation. The first and most obvious explanation is that the unexplained portion of the test score increase may reflect unobserved family characteristics that are correlated with achievement over time, possibly related to household wealth. A second possible explanation is that, given that enrollments are rising over time in Indonesia and more students from disadvantaged backgrounds are entering the school system, teachers may pre-judge these students as underachievers and therefore apply different teaching standards to them than to other students (Ferguson 1998). A third explanation may be that different cohorts of students do not reap the same benefits from equivalent school and classroom resources. Finally, the differences in the returns may reflect the impact over time of past reforms that both increased school enrollments and helped to improve the quality of school inputs in Indonesia.
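To make the two components concrete, here is a minimal Python sketch of a twofold decomposition estimated by weighted least squares on simulated data. It assumes, because the text does not say, that the 2003 coefficients serve as the reference, so the explained part is $(\bar{X}_{2006} - \bar{X}_{2003})\hat{\beta}_{2003}$ and the unexplained part is $\bar{X}_{2006}(\hat{\beta}_{2006} - \hat{\beta}_{2003})$. The function names, the single stand-in characteristic, and the simulated weights are hypothetical.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares; X must already contain an intercept column."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def twofold_decomposition(X_a, y_a, w_a, X_b, y_b, w_b):
    """Split mean(y_b) - mean(y_a) into explained and unexplained parts.

    ASSUMPTION: the year-a (here 2003) coefficients are the reference;
    the paper does not say which reference it used.
    Returns the total gap and per-variable contributions.
    """
    b_a = wls(X_a, y_a, w_a)
    b_b = wls(X_b, y_b, w_b)
    xbar_a = np.average(X_a, axis=0, weights=w_a)
    xbar_b = np.average(X_b, axis=0, weights=w_b)
    explained = (xbar_b - xbar_a) * b_a      # change in mean characteristics
    unexplained = xbar_b * (b_b - b_a)       # change in returns (incl. intercept)
    gap = np.average(y_b, weights=w_b) - np.average(y_a, weights=w_a)
    return gap, explained, unexplained


rng = np.random.default_rng(0)


def simulate(n, x_mean, intercept, slope):
    """Hypothetical data: one characteristic plus noise and random weights."""
    x = rng.normal(x_mean, 1.0, n)
    y = intercept + slope * x + rng.normal(0.0, 50.0, n)
    X = np.column_stack([np.ones(n), x])
    w = rng.uniform(0.5, 2.0, n)   # stand-in for PISA student weights
    return X, y, w


X03, y03, w03 = simulate(2000, 0.0, 360.0, 10.0)
X06, y06, w06 = simulate(2000, 0.1, 385.0, 12.0)
gap, explained, unexplained = twofold_decomposition(X03, y03, w03, X06, y06, w06)
print(f"gap {gap:.1f} = explained {explained.sum():.1f} + unexplained {unexplained.sum():.1f}")
```

Because each regression includes an intercept, the weighted mean residual is zero in each year, so the two parts sum exactly to the observed gap; summing the per-variable entries instead of inspecting them gives the aggregate split.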
Some of these coefficient estimates may be subject to biases. For example, if a school characteristic is correlated with unobserved family characteristics that influence achievement (such as family wealth and parents' motivation), then the estimated effect of attending a school with such characteristics may be biased.

Modified Decomposition
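An alternative decomposition is possible using a modified Oaxaca-Blinder method, in which the unexplained part of the test score differential is captured by a year indicator, $D^{2006}$, taking the value 1 for 2006 and 0 otherwise (2003). Consider a production function for cognitive achievement estimated on the pooled 2003 and 2006 samples:

$$T_{ij} = \beta_0 + \beta_1 D^{2006}_{ij} + \beta_2 A_{ij} + \beta_3 F_{ij} + \beta_4 S_{ij} + \beta_5 I_{ij} + \varepsilon_{ij} \qquad (4)$$

where $D^{2006}_{ij}$ is a dummy variable equal to 1 if the test was taken in 2006 and 0 otherwise.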
In a modified Oaxaca decomposition of the test score gap with a linear specification, the difference in mean test scores between 2006 and 2003 students is given by:

$$\bar{T}_{2006} - \bar{T}_{2003} = \beta_1 + \beta_2 (\bar{A}_{2006} - \bar{A}_{2003}) + \beta_3 (\bar{F}_{2006} - \bar{F}_{2003}) + \beta_4 (\bar{S}_{2006} - \bar{S}_{2003}) + \beta_5 (\bar{I}_{2006} - \bar{I}_{2003}) \qquad (5)$$

where the coefficient $\beta_1$ is an estimate of the portion of the change that remains after accounting for the differences in mean characteristics.
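A minimal sketch of equation (5) on simulated data, assuming a single hypothetical characteristic `x` stands in for the vectors $A$, $F$, $S$, and $I$, and ignoring sampling weights. Because the pooled regression contains both an intercept and the year indicator, the OLS residuals average to zero within each year, so the mean gap equals $\beta_1$ plus $\beta_2$ times the change in the mean of `x` exactly. The last line expresses each part as a share of the total gap.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate(n, x_mean, intercept, slope):
    """Hypothetical single characteristic x and test score y for one year."""
    x = rng.normal(x_mean, 1.0, n)
    y = intercept + slope * x + rng.normal(0.0, 50.0, n)
    return x, y


x03, y03 = simulate(1500, 0.0, 360.0, 10.0)
x06, y06 = simulate(1500, 0.2, 385.0, 10.0)

# Pool both years and add the year indicator D2006 (1 = 2006, 0 = 2003).
x = np.concatenate([x03, x06])
y = np.concatenate([y03, y06])
d2006 = np.concatenate([np.zeros(x03.size), np.ones(x06.size)])
X = np.column_stack([np.ones(y.size), d2006, x])   # coefficients: beta_0, beta_1, beta_2
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

gap = y06.mean() - y03.mean()
explained = b2 * (x06.mean() - x03.mean())   # beta_2 times change in mean of x
unexplained = b1                              # coefficient on the year indicator
print(f"gap {gap:.1f}: explained {explained / gap:.1%}, unexplained {unexplained / gap:.1%}")
```

Unlike the twofold version, a single pooled coefficient vector weights the change in characteristics here, so there is no choice of reference year to make.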
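The proportions that are explained and unexplained are then:

$$\frac{\beta_2 (\bar{A}_{2006} - \bar{A}_{2003}) + \beta_3 (\bar{F}_{2006} - \bar{F}_{2003}) + \beta_4 (\bar{S}_{2006} - \bar{S}_{2003}) + \beta_5 (\bar{I}_{2006} - \bar{I}_{2003})}{\bar{T}_{2006} - \bar{T}_{2003}} \ \text{(explained)}, \qquad \frac{\beta_1}{\bar{T}_{2006} - \bar{T}_{2003}} \ \text{(unexplained)}$$

While test scores and individual and family information are at the individual level, school resources and other school-related inputs are at the school level. In choosing the estimation method, we recognized that observed test scores can be expected to be correlated within schools because of clustering effects. Therefore, the assumption that the disturbances are independently and identically distributed with constant conditional variance did not hold. As a result, we estimated the equations by OLS with standard errors clustered at the school level.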
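Estimating OLS with clustered standard errors leaves the point estimates identical to ordinary OLS and changes only the estimated standard errors. Below is a minimal numpy sketch of the cluster-robust (sandwich) variance with the commonly used CR1 finite-sample correction, applied to simulated students nested in hypothetical schools that share a school-level shock. It ignores sampling weights, and the paper does not say which software or correction it used.

```python
import numpy as np


def ols_cluster_se(X, y, clusters):
    """OLS point estimates with cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    groups = np.unique(clusters)
    g = len(groups)
    meat = np.zeros((k, k))
    for c in groups:
        in_c = clusters == c
        score = X[in_c].T @ resid[in_c]   # sum of x_i * u_i within cluster c
        meat += np.outer(score, score)
    correction = g / (g - 1) * (n - 1) / (n - k)
    vcov = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))


def ols_iid_se(X, y):
    """Conventional OLS standard errors assuming i.i.d. errors (for comparison)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * XtX_inv))


# Simulated data: 100 hypothetical schools with 30 students each.
rng = np.random.default_rng(2)
n_schools, per_school = 100, 30
school = np.repeat(np.arange(n_schools), per_school)
school_input = rng.normal(0, 1, n_schools)[school]   # school-level characteristic
school_shock = rng.normal(0, 40, n_schools)[school]  # unobserved, shared within school
y = 380 + 15 * school_input + school_shock + rng.normal(0, 50, len(school))
X = np.column_stack([np.ones(len(y)), school_input])

beta, se_cluster = ols_cluster_se(X, y, school)
se_iid = ols_iid_se(X, y)
print("beta:", beta.round(2))
print("clustered SE:", se_cluster.round(2), " i.i.d. SE:", se_iid.round(2))
```

With a school-level regressor and a shared school shock, the i.i.d. formula typically understates the uncertainty, which is the motivation the text gives for clustering.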

Data
PISA is an international assessment initiated by the OECD. It assesses 15-year-olds in each participating country in three main subject areas: reading, mathematics, and scientific literacy. We focused on the results for Indonesia in mathematics in the 2003 and 2006 assessments. We did not include information for 2000, even though it was available, because the sample was very different. For instance, the 2000 dataset has far fewer observations on parents' education than the 2003 and 2006 datasets: while there were 8,828 and 9,292 observations in 2003 and 2006, respectively, the 2000 sample contained only 2,777. In short, we do not believe that the 2000 sample is comparable with the subsequent rounds.
Instead of testing the knowledge and skills specified in the national curricula of the participating countries, PISA aims to test students' ability to apply the knowledge they have acquired in the three subject areas to real-life situations. The target population consists of students between the ages of 15 years and 3 months and 16 years and 2 months who are enrolled in seventh grade or higher. Indonesia uses a two-stage sampling frame with a cluster design. We applied weights to the data at the student level. PISA standardizes the data for OECD countries, setting the mean at 500 points and the standard deviation at 100. Thus, the OECD mean and standard deviation serve as the benchmarks for the other participating countries.
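Because PISA scores sit on a scale with an OECD mean of 500 and a standard deviation of 100, point differences convert directly into standard-deviation units, and sample statistics should use the student weights. A small Python illustration: the scores and weights in the first example are hypothetical, while the second uses Indonesia's 2003 and 2006 math means quoted in the introduction.

```python
import numpy as np

OECD_SD = 100.0  # PISA scale: OECD standard deviation (the mean is 500)


def weighted_mean(scores, weights):
    """Mean score using student-level sampling weights."""
    return float(np.average(scores, weights=weights))


def points_to_sd(points, sd=OECD_SD):
    """Express a score difference in OECD standard-deviation units."""
    return points / sd


# Hypothetical scores; each weight is the number of students a record represents.
scores = np.array([350.0, 400.0, 450.0])
weights = np.array([1.0, 2.0, 1.0])
print(weighted_mean(scores, weights))   # 400.0

# Indonesia's mean math scores: 360 (2003) and 391 (2006).
print(points_to_sd(391 - 360))          # 0.31
```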

Description of the Sample
The means for the variables that we used to analyze the determinants of learning are presented in Table 1. The PISA data contain missing values among the family background variables, and Table 1 shows where those missing values occur. While some might choose to impute the missing values, we decided not to do so in this case. Therefore, if an observation had a missing value for any variable, we dropped it from our analysis entirely.
We realize that deleting cases with missing values carries risks, as demonstrated by Little and Rubin (1987). Case deletion assumes that the deleted cases occur at random and are a relatively small, representative proportion of the entire dataset. However, this may not be the case: the missing data may follow a pattern and cannot safely be assumed to reflect randomness. In such circumstances, deletion can introduce substantial bias into the study. Moreover, the loss in sample size can appreciably diminish the statistical power of the analysis.
As a rule of thumb, if a variable has more than 5 percent missing values, it is advisable not to delete cases, and many researchers are much more stringent than this (Little and Rubin, 1987).
Deleting incomplete cases has its attractions, chiefly simplicity, but it discards information. This approach also ignores possible systematic differences between complete and incomplete cases, so the resulting inferences may not apply to the population represented by all cases, especially when the number of complete cases is small. Techniques for imputing missing values range from correlation-based and single imputation methods to multiple imputation (Rubin 1987). However, very few of our variables had more than 5 percent of their values missing. Overall, the sample for 2003 dropped from 10,761 students to 8,828, and the sample for 2006 went down from 10,647 to 9,293.
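The screening and deletion steps described above can be sketched in plain Python: compute the share of missing values per variable, flag variables above the 5 percent rule of thumb, and keep only complete cases. The records and variable names are hypothetical; the last lines apply the same arithmetic to the sample sizes reported in the text.

```python
def missing_shares(records, variables):
    """Share of records with a missing (None) value for each variable."""
    n = len(records)
    return {v: sum(r.get(v) is None for r in records) / n for v in variables}


def listwise_delete(records, variables):
    """Keep only records that are complete on every analysis variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical student records; None marks a missing value.
records = [
    {"score": 380, "mother_educ": 9, "books": 2},
    {"score": 410, "mother_educ": None, "books": 3},
    {"score": 355, "mother_educ": 6, "books": None},
    {"score": 402, "mother_educ": 12, "books": 1},
]
variables = ["score", "mother_educ", "books"]

shares = missing_shares(records, variables)
flagged = [v for v, s in shares.items() if s > 0.05]   # 5 percent rule of thumb
complete = listwise_delete(records, variables)
print(shares, flagged, len(complete))

# Overall share of cases dropped, using the sample sizes reported in the text.
dropped_2003 = 1 - 8_828 / 10_761
dropped_2006 = 1 - 9_293 / 10_647
print(f"{dropped_2003:.1%} dropped in 2003, {dropped_2006:.1%} in 2006")
```

Even when each variable individually falls below the 5 percent threshold, deletion accumulates across variables: the reported sample sizes imply that about 18 percent of 2003 cases and about 13 percent of 2006 cases were dropped.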
The mean scores associated with each characteristic increased over time (Table 2). The scores for students whose mothers had a university education were much higher in 2006 than in 2003, by more than half a standard deviation. Speaking the same language at home as is used in school was associated with a larger score advantage in 2006 than in 2003. The largest increase was for children with at least one computer at home: a 66-point increase, or the equivalent of two years of learning.
Another important change is the score associated with the school autonomy variable "the school determines pedagogy." In 2003, schools that did not determine their own pedagogy scored higher than those that had autonomy over their pedagogy, but by 2006 the opposite was true. The association between gender and math scores also changed over time. In 2003, there was little difference in overall scores between boys and girls, but by 2006, boys scored 17 points higher than girls in math.

Given that we did not impute, we knew there was a possibility that our analysis would be biased. To minimize this risk, we examined mean scores by variable for two samples in each year (see Annex Table 1). One was the regression sample, which excluded observations with any missing value, and the other was the full PISA sample. Despite the (small) number of observations dropped because of missing values, the regression sample was not very different from the full PISA sample in terms of outcomes. The mean math scores by characteristic differed between the two samples by as little as 1 point in some cases and by no more than 10 points in others.
Overall, the scores differed by an average of only 4 points, which is small on a scale with a mean of 500 and a standard deviation of 100. When we examined the differences in means between the two samples, it became apparent that the regression sample was more urban and had a larger share of public school students in both years, particularly in 2003. However, the overall mean test score of the regression sample was very similar to that of the full sample. We therefore concluded that the regression sample was not appreciably biased in terms of outcomes.
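The comparison described above, mean scores by characteristic in the regression sample versus the full PISA sample, can be sketched as follows. The records are hypothetical, the means are unweighted for brevity, and the function names are illustrative.

```python
from collections import defaultdict


def mean_by_group(records, group_var, score_var="score"):
    """Unweighted mean score for each value of a grouping variable."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        g = r.get(group_var)
        if g is None:
            continue
        sums[g] += r[score_var]
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def mean_abs_difference(full, regression, group_var):
    """Average absolute gap in group mean scores between two samples."""
    a = mean_by_group(full, group_var)
    b = mean_by_group(regression, group_var)
    common = a.keys() & b.keys()
    return sum(abs(a[g] - b[g]) for g in common) / len(common)


# Hypothetical full sample and its complete-case (regression) subsample.
full = [
    {"score": 380, "rural": 1},
    {"score": 400, "rural": 1},
    {"score": 420, "rural": 0},
    {"score": 440, "rural": 0},
]
regression = [full[0], full[2], full[3]]
print(mean_abs_difference(full, regression, "rural"))   # 5.0
```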

Regression Results
There are significant premiums associated with attending a public school and with attending a school that was able to determine its own pedagogy or, in other words, had been granted school autonomy (Table 3). The relative performance of private and public schools in Indonesia is debated. James et al. (1996) found that private schools in Indonesia were better managed than public schools, and they argued that private management is more efficient than public management in achieving academic quality. There is also some evidence that private funding increases efficiency, whether schools are publicly or privately managed. Bedi and Garg (2000) examined the effectiveness of public and private schools in Indonesia using the labor market earnings of their graduates as the measure. Controlling for observable personal characteristics and school selection, they found that graduates of private secondary schools performed better in the labor market than their peers from public secondary schools, contrary to the widely held belief in Indonesia that public secondary schools are superior. Suryadarma et al. (2006) compared public primary schools with a smaller sample of private primary schools. They found that, on average, students in private schools performed marginally better academically than their counterparts in public schools, but the only statistically significant difference was in mathematics. The mean differences were slight: less than three points on a 0-100 scale, or 0.11 standard deviations. This suggests that the differences in performance between public and private schools may not be very large. Newhouse and Beegle (2005) evaluated the impact of school type on the academic achievement of junior secondary school students (grades 7 to 9). They found, after controlling for a variety of other characteristics, that students who graduated from public junior secondary schools scored 0.15 to 0.30 standard deviations higher on the national exit exam than comparable privately schooled peers. This finding was robust to OLS, fixed-effects, and instrumental variable estimation strategies. The authors also found that students attending Muslim private schools, including Madrassahs, fared no worse academically on average than students attending secular private schools. The authors argued that the results provided indirect evidence that higher-quality inputs at public junior secondary schools than at private schools of the same level promote higher test scores.
In our samples, an adequate supply of teachers was associated with higher test scores in 2006. The coefficient for 2003 was not statistically different from zero, whereas the coefficient for 2006 was significant. We also found that the higher the percentage of students in the school who had repeated a grade, the lower the scores, and this negative effect was significant. Living in a rural area had a negative effect, although fewer students lived in rural areas in the 2006 sample than in the 2003 sample, and the coefficient was slightly less negative. The negative effect of being female actually increased in 2006. Parental education had some unexpected effects in 2003: in the case of the mother's education, only having a mother with upper secondary schooling had a positive effect. By 2006, all of the signs had become positive, with having a mother with secondary schooling having the largest effect. Having a large number of books at home was associated with a large positive coefficient in 2003, but by 2006 this variable was no longer significant. However, having a computer at home had a large and significant positive effect, which grew in 2006. Overall, for both years, the samples were large (8,391 students in 2003 and 8,660 in 2006).

Duflo (2001) studied the effects of the INPRES school construction program, which the Indonesian government carried out between 1973 and 1978, by combining differences among regions in the number of schools built with differences among cohorts of students induced by the timing of the program. Her research suggested that each primary school constructed per 1,000 children led to an average increase of 0.12 to 0.19 years of education, as well as to a 1.5 to 2.7 percent increase in wages, for the affected cohorts. This implies economic returns to education ranging from 6.8 to 10.6 percent. This large increase in school places very likely had a positive effect on the schooling outcomes of successive generations, including the 2006 class.

Figure 2 shows that the change over time represented a shift of students toward higher achievement levels and less inequality between the highest- and lowest-achieving students.

Before looking at the results of the decomposition, we examined over-time changes in characteristics and returns. Overall, there were few changes in characteristics. The adequate supply of teachers variable rose considerably, by 24 percentage points, and the returns associated with it increased significantly as well. The percentage of grade repeaters in a school declined significantly, and the penalty associated with repetition also fell. More importantly, there had been a change in the schooling profile of the parents. More parents had primary, lower secondary, and upper secondary schooling in 2006 than in 2003, and the proportion of parents with a university education had gone down. This was probably the result of two trends: first, the level of education of adult Indonesians has been rising steadily over time, increasing the proportion of parents with primary and secondary schooling (instead of none); and second, student access to secondary schooling has been expanding, reducing the proportion of parents with a university education in the sample. The returns to mothers' education went up at most levels of education except upper secondary, where they were already very high in 2003. Meanwhile, there were small declines in fathers' educational attainment at all levels.
The detailed decomposition is presented in Annex Table 2, and Table 4 summarizes the results. Almost all of the difference is "unexplained," which, in an over-time decomposition of test scores, means that most of the increase is due to higher returns to characteristics. Simply put, Indonesia in 2006 was able to convert the same characteristics into higher levels of learning than in 2003. As Annex Table 2 shows, two other characteristics are important in explaining the differences in test scores between 2003 and 2006. An adequate supply of teachers, in terms of both endowments and coefficients, contributed positively to the increase in test scores between 2003 and 2006. Also, in the unexplained part of the decomposition, the coefficient associated with being female changed from -7.57 in 2003 to -18.6 in 2006. The results of the alternative decomposition (equation 4) are presented in Annex Table 3, and the overall results are presented in Table 6. They are in line with the results of the more traditional decomposition.

Discussion
It is very impressive that Indonesia was able to achieve such a gain in math test scores given the increased enrollment of disadvantaged children in the school system (see Figure 3). As enrollments in lower secondary schooling continue to increase, more and more students from families with less well-educated parents are entering the school system. For example, in 2003, the average level of education attained by the fathers of the 15-year-old students was 9.26 years, and for their mothers it was 8.30 years. By 2006, the average for fathers had fallen to 9.09 years and for mothers to 8.16 years.

Table 7 presents results using the variables available in the 2006 dataset. The more institutional variables we included in our analysis, the more interesting the findings that emerged. Among other things, the school's ability to fire teachers, an indicator of school autonomy, was significant. Also, when a school had to compete with other schools in the vicinity, the effect was large, positive, and significant. Parental involvement in formulating the school budget was also positive and significant. Public schools retained their large advantage. An adequate supply of math teachers played a positive role in the determination of test scores, while grade repetition had a small negative effect. The level of the mother's schooling was significant. Doing math work in class was also important.

Despite the impressive gains made by Indonesian students in math in the 2006 PISA, Indonesia still has a long way to go to improve its academic standing. In 2006, almost three-quarters of 15-year-olds scored at level 1 or below. Too few students scored at levels 2 and 3, and only a negligible number scored at level 4 or above. Understanding why the scores increased in 2006 should help the Government of Indonesia build on its strengths and make further improvements in the future.

Conclusions
In the 2006 PISA, Indonesia's math score increased by 30 points, or 0.3 of a standard deviation, in just three years. We explored the reasons behind this increase among Indonesia's 15-year-old students in relation to various family, student, school, and institutional inputs. We decomposed the change over time into its constituent components using the traditional Oaxaca-Blinder method, based on the estimation of a cognitive achievement production function. Our decomposition results suggest that almost all of the test score increase was unexplained or, in other words, was due to changes in the returns to characteristics rather than to changes in the characteristics themselves. To put it another way, Indonesia was able to educate its students better in 2006 than in 2003, regardless of their characteristics.

Figure 2: Distribution of Test Scores over Time

Figure 3: Changes in Parental Education and Students' Scores over Time

Table 2: PISA 2003-2006, Mean Math Scores by Selected Characteristics
Results
The purpose of these decompositions was to investigate what changes may have occurred over time that would help explain the 30-point increase in math scores in Indonesia between 2003 and 2006. It seems likely that the 2006 score was partly the result of reforms, policies, strategies, and interventions that were put in place years, even a generation, ago. For example, between 1973 and 1978, the Indonesian government carried out one of the largest school construction programs on record (the INPRES program).
The regression samples represent 1.5 million students in 2003 and 1.8 million in 2006. The 2006 model fits the data better, with an R-squared of 0.35, compared with 0.26 for 2003.

Table 4: Decomposition of Math Scores over Time (as percentage of total test score differential)
The bulk of the overall difference arising from changes in the returns to characteristics was due to student characteristics. That is, for a given set of student characteristics, Indonesian schools were better able to convert those factors into higher levels of learning in 2006 than in 2003. This is an important finding, since more and more children enter the lower secondary school system with every passing year. As shown above, most of the new entrants come from poorer backgrounds and from homes with parents who have less schooling. According to the 2006 PISA math scores, Indonesia was better able to educate students regardless of their age. Math scores for 2003 and 2006 by mean age are presented in Table 5. In both 2003 and 2006, the student of average age achieved the mean score for the country overall. In 2003, there was not much variation in the ages of students within one standard deviation of the mean age: with a mean age of 15.71 years and a standard deviation of 0.27 years, these students were between about 15½ and almost 16 years of age. In 2006, the mean age was 15.78 years and the standard deviation was 0.29 years, so the range was about 15½ to just over 16 years of age. Scores were higher for all age groups in 2006 but also varied more than in 2003, with 16-year-olds averaging 400 points.

Table 6: Alternative Decomposition: Determinants of PISA Differentials
Sources: PISA 2003 and 2006; authors' calculations.

The main explanation for the change in test scores between 2003 and 2006 is a year fixed effect, which accounts for an increase of 19.5 points in the score. The observable characteristics contributed only marginally to the change in test scores between 2003 and 2006. It is noteworthy that the characteristics of institutions and students made positive contributions to the change in test scores, whereas school and family characteristics made negative contributions.

Annex Table 2: Decomposition of PISA Scores for Indonesia, 2003-2006
Program for International Student Assessment (PISA) 2003 and 2006