A New Vision for High-Quality Preschool Curriculum (2024)


9 Examining Variation in Curriculum Effects

Decision-makers are rarely interested in evidence that applies only to the specific samples, conditions, and outcomes included in research studies. Rather, they are interested in applying study results to predict—or to generalize—findings to their own populations, settings, and outcomes of interest. Generalizing findings requires (1) study results that are internally valid and replicable; (2) curricula, study samples, contexts, and outcomes that are representative of the conditions to be generalized over; and (3) knowledge about the extent to which findings may vary and why they vary (Shadish, Cook, & Campbell, 2002; Cronbach, 1982; Stuart et al., 2011; Tipton, 2012; Tipton & Olsen, 2018).

As described in Chapter 2, the evidence base on preschool curricula includes findings from many rigorous, internally valid evaluations, some of which have been replicated over multiple studies. Yet gaps remain in the types of curricula, study samples, contexts, and outcomes represented in the existing literature. These gaps include questions about the effects of culturally responsive curricula; the effects of curricula in different settings, including family child-care or for-profit contexts; the effects on less commonly studied child outcomes, such as problem solving and curiosity, children's positive racial identity, and multilingual learners' growth in their home language; and the effects of other widely adopted instructional approaches to preschool, such as Montessori.

Emerging research on preschool curriculum also suggests variation in curriculum effectiveness. Given the diversity of early childhood conditions under which curricula are delivered, research about curriculum effectiveness should describe the specific contexts, settings, and populations under which the curriculum effect was observed. This is because study findings are often dependent on the contexts and unstated theoretical assumptions under which the study was designed and carried out. For example, a curriculum may have a small average effect in improving students' early literacy skills, but the magnitude of the effect may be much larger (or smaller) in different preschool settings (e.g., Head Start vs. child-care center), with more experienced educators delivering the curriculum, or for students with different home language experiences. Effects may also vary by the type of study design (experimental versus quasi-experimental).

When curriculum effects vary substantially due to differences in student characteristics, in how the curriculum was delivered, in the settings under which the curriculum is delivered, and in the outcomes examined, findings observed under one specific set of circumstances will often not be observed under a different set of circumstances. This implies that different curriculum approaches may be more (or less) appropriate for addressing students' developmental and learning goals. In these cases, the program administrator's goal is not to select a preschool curriculum that is the most effective "on average," but to identify the curriculum that is most effective for their specific students and context. For example, program administrators in tribal communities may seek curriculum approaches that teach children academic and social-emotional skills in ways that are culturally responsive and well aligned to the norms, mores, and goals of their community and families; educators who have children with disabilities in their classrooms may need curricula that address the unique learning needs of their specific students. The evidence base for a new vision of preschool curriculum, therefore, must address the multiple and complex ways in which children, educators, and their environments interact with curriculum and its delivery. And yet, generating evidence to understand variation in curriculum effects remains elusive and challenging.

This chapter focuses on research and methodological issues that arise in designing and implementing studies for evaluating variation in curriculum effectiveness. The committee first presents a framework for understanding why curriculum effects may vary across different studies and contexts (Steiner, Wong, & Anglin, 2019). The framework describes both programmatic and study-related factors that may be critical for determining the size of curriculum effects. Next, the chapter discusses challenges with identifying sources of effect variation within studies and across multiple studies. Finally, we present conclusions drawn from a review of the literature on curriculum effectiveness.

IDENTIFYING SOURCES OF EFFECT VARIATION

The overarching mission of many researchers of early childhood education is to identify curricula, practices, and policies that improve students' life outcomes. When curriculum effects replicate over multiple studies with diverse settings, populations, intervention deliveries, and contexts, researchers and decision-makers have increased confidence that findings are likely to generalize to their specific population or context of interest. However, when study findings are fragile, hard to replicate, and not robust across different populations of interest, the external validity of these findings is questioned.

All scientific conclusions about curriculum effectiveness are based on study findings drawn from samples of participants with specific settings, treatment protocols, and materials. The challenge arises when, in interpreting these findings, researchers fail to specify for whom and what the curriculum effects are intended to represent, as well as any potential constraints on generality that may either amplify or dampen the size of curriculum effects. In the absence of such information, program administrators may assume that study conclusions apply broadly to any sample or context. But when study findings fail to replicate, the trustworthiness of the findings is questioned, and their value for evidence-based decision-making is doubted. Given the diversity in contexts and settings for how preschool curricula are implemented and delivered—and the potential variation in students' responses to different curriculum approaches—a research agenda that supports a new vision of preschool curriculum must seek to understand the extent to which curriculum effects are robust across different contexts, settings, and populations. And, in cases where effect variation is observed, the evidence base must identify why effects varied in order to understand the populations and conditions under which findings do—and do not—generalize.

There are multiple reasons why curriculum effects may differ across studies and study conditions. Steiner, Wong, and Anglin (2019) propose a framework for describing key sources of effect variation across different populations, contexts, and studies. The framework demonstrates that the size of an intervention effect depends on programmatic considerations as well as study design characteristics. Programmatic considerations include variations in the curriculum or intervention, the condition against which it is being compared, and the outcomes for which curriculum effectiveness is being measured. They also include differences in student and setting characteristics that may interact with the size of the curriculum effect. When curriculum effects fail to replicate because of programmatic differences across studies, researchers conclude that curriculum effects will likely not generalize to other student populations, contexts, settings, outcomes, and treatment deliveries (Rubin, 1981).

Study design characteristics include methodological decisions that may affect the size and precision of an effect. When results are compared from studies with different design characteristics, they may differ because researchers' choices in methodology yielded different conclusions about curriculum effects. For example, one study effect may be obtained from a randomized controlled trial while another is obtained from an observational study. Curriculum effects across the two studies may differ because the latter suffers from selection bias, raising concerns about the validity of the study findings. Study findings may also fail to replicate because one or both studies lack statistical power for producing precise estimates of the curriculum effect. Decision-makers are often not interested in study findings that fail to replicate because of methodology choices. However, in cases where both programmatic and study design characteristics vary simultaneously, it may be impossible to disentangle why study findings failed to replicate. Do curriculum effects differ across studies and study conditions because effects are not generalizable, or because of study design choices? It may be impossible to know. The challenge arises when a researcher or decision-maker mistakenly interprets incongruent study findings to conclude that a curriculum effect fails to generalize when the true reason may be that one of the studies lacked statistical power to detect curriculum effects.

A central goal for a research agenda supporting a new vision for preschool curriculum is to examine the extent to which curriculum effects are generalizable—and when they are not—across variations in student populations, contexts, settings, and outcomes. This report describes both programmatic and study design reasons why curriculum effects may vary. Chapter 2 summarizes the empirical literature examining programmatic reasons why curriculum effects vary—including differences in curriculum type, in the outcomes used for assessing effectiveness, in the students (and their backgrounds, knowledge, and experiences) participating in the curriculum, in the characteristics of teachers using the curriculum, in the preschool setting in which the curriculum is delivered, and in macro conditions that may interact with how the curriculum is delivered (e.g., funding for preschool, state licensure requirements for preschool teachers). These characteristics may amplify or dampen the size of the curriculum effect and may interact in complex ways that affect the effectiveness of a curriculum (Tefera et al., 2018). For example, a curriculum may be especially effective for students with disabilities in public preschool settings but less so for children without disabilities in private child-care centers. Understanding the extent to which curriculum effects vary by programmatic features is critical for determining the generalizability of study findings.
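To make this point concrete, the short simulation below is a minimal sketch: the subgroups, effect sizes, and sample size are hypothetical and are not drawn from any study discussed in this chapter. It shows how an overall average effect can mask substantively different effects for two groups of children.

```python
# Illustrative simulation (hypothetical numbers, not from any study cited here):
# an "average" curriculum effect can mask very different effects across settings.
import numpy as np

rng = np.random.default_rng(0)
n = 4000

setting = rng.integers(0, 2, size=n)   # hypothetical moderator: 0 = setting A, 1 = setting B
treated = rng.integers(0, 2, size=n)   # 1 = assigned to the new curriculum

# Assumed "true" effects in standard deviation units: sizable in A, near zero in B
true_effect = np.where(setting == 0, 0.30, 0.05)
outcome = treated * true_effect + rng.normal(0.0, 1.0, size=n)

def diff_in_means(mask):
    """Treated-minus-control mean outcome among the children selected by mask."""
    return outcome[mask & (treated == 1)].mean() - outcome[mask & (treated == 0)].mean()

print("overall effect  :", round(diff_in_means(np.ones(n, dtype=bool)), 2))
print("setting A effect:", round(diff_in_means(setting == 0), 2))
print("setting B effect:", round(diff_in_means(setting == 1), 2))
```

A decision-maker who saw only the overall estimate would miss that, in this invented example, the curriculum appears to do little for children in the second setting.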

In the following section, we describe study design reasons why curriculum findings may differ across studies and study conditions. They include differences in the treatment–control contrast for evaluating the curriculum, the research design used for identifying and estimating the curriculum effect, and the size of the sample used to evaluate the effect. While these characteristics are not usually investigated to assess the generalizability of study findings, they are related to the feasibility, logistics, and ethics of conducting a study. Therefore, they are of special consideration for researchers and funders of studies on preschool curriculum.

Contrast Condition for Evaluating Curriculum

Curriculum effects are determined by comparing the average outcomes for students who participated in the curriculum with outcomes for students who did not. As such, the activities children engage with in both the curriculum and the control condition can have a substantial impact on the size and direction of the effect. In preschool curriculum studies, children in the control condition may participate in a wide variety of activities. For example, they may be learning from a curriculum that teachers used prior to the introduction of the new curriculum, they may be engaged in an online activity on a computer or tablet, or they may not be enrolled in a preschool program at all. Usually, the control condition includes the learning activities, experiences, and instruction that the child would have received had they not participated in the curriculum under investigation. Given that these circumstances can vary widely across preschool settings—and that curriculum effects are determined by comparing outcomes for students who did and did not participate in the curriculum—understanding what activities occurred in the control condition is critical for interpreting curriculum effects.

In general, studies with strong treatment contrasts—with more distinct intervention and control group differences—will produce larger effects than studies with weaker treatment contrasts. For example, Duncan and Magnuson (2013) noted that programs evaluated before 1980 produced substantially larger effects than those evaluated later. They argue that one explanation for the decline in effects is that the "counterfactual conditions for children in the control group studies have improved substantially" (p. 114). In more recent samples, children in the control group were much more likely to attend center-based care programs and were likely to experience higher-quality home environments with more educated mothers (Duncan & Magnuson, 2013). Reanalysis of data from the Head Start Impact Study also concluded that the overall average effect masked substantial variation in Head Start effects that was related to differences in the control conditions (Morris et al., 2018)—here, the authors found evidence of sustained impacts for Head Start when the control group consisted of children who stayed home and did not attend center-based care.
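The arithmetic behind a treatment contrast can be illustrated with a minimal sketch; all of the numbers below are invented for illustration and do not come from the studies cited above.

```python
# Hypothetical arithmetic: the same curriculum, measured against two different
# counterfactual conditions, yields two different "effects."
treated_mean = 0.50               # mean score for children receiving the curriculum

weak_counterfactual_mean = 0.10   # e.g., no structured alternative in the control group
strong_counterfactual_mean = 0.35 # e.g., an established curriculum already in use

effect_vs_weak = treated_mean - weak_counterfactual_mean      # 0.40
effect_vs_strong = treated_mean - strong_counterfactual_mean  # 0.15

print(f"Effect against the weak contrast:   {effect_vs_weak:.2f} SD")
print(f"Effect against the strong contrast: {effect_vs_strong:.2f} SD")
```

The curriculum itself is identical in both comparisons; only the counterfactual condition changed.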

Finally, the Preschool Curriculum Evaluation Research (PCER) Program supported a series of experimental evaluations in the early 2000s examining the relative performance of curricular approaches (Preschool Curriculum Evaluation Research Consortium, 2008). A recent meta-analysis of the PCER data examined the performance of different curricular approaches against alternative counterfactuals (Jenkins et al., 2018). The authors looked at the performance of content-specific curricula in reading and math versus what they described as whole-child-focused curricular approaches, such as HighScope and Creative Curriculum, and locally developed curricula.28 Overall, the authors concluded that content-specific curricula produced larger effects on targeted outcomes than did whole-child approaches, and that whole-child approaches did not yield student-level effects that were reliably different from locally developed curricula. However, the original PCER evaluation studies were conducted 20 years ago, and the curricula represented in the control condition have been revised since then.

28 Some of the curricula evaluated as part of the 2008 PCER studies have undergone revisions in the time since these evaluations were conducted. As such, the versions currently used in classrooms may differ from those evaluated as part of these studies.

Research Design for Identifying Effects

Research design describes the methodological approach used for determining curriculum effectiveness. Most research designs involve the comparison of outcomes for one curriculum condition versus those obtained for an alternative condition. Students, teachers, or centers may be randomly assigned to different curriculum conditions, or they may be asked to select their own conditions. When participants select their own curriculum conditions, researchers may use statistical adjustment procedures to compare outcomes for students and classrooms that are observationally similar. The goal here is to ensure that differences observed in outcomes are the result of exposure to different curriculum approaches and not of other differences between groups.

Research design features are important if the choice in methodology contributes to the size of the curriculum effect. For example, in a preschool evaluation where the curriculum is not randomly assigned but is selected by center directors, researchers may be concerned that center directors are more likely to choose the curriculum being assessed because children in their centers are at risk for low academic achievement. By comparing outcomes for students in centers that selected the curriculum with those in centers that did not, the curriculum may appear ineffective—or even to have negative effects—because students enrolled in the intervention centers were at greater risk for low achievement than students in comparison centers. Although the researcher may use statistical procedures to ensure that both groups of children appear observationally similar, children across the two groups may also differ in ways that are unobserved by the researcher. In these cases, it can be difficult to differentiate why children in the curriculum condition exhibited lower outcome scores than those in the control condition—was it because the curriculum was ineffective, or because there were other unobserved differences between children in the two groups?
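The selection problem described above can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis of any real evaluation: the effect size, the strength of selection, and the covariates are all assumptions made for illustration.

```python
# Illustrative simulation (hypothetical values): when centers serving children at
# greater risk are more likely to adopt a curriculum, a naive comparison understates
# its effect; adjusting for an observed risk measure helps, but bias from an
# unobserved difference remains.
import numpy as np

rng = np.random.default_rng(1)
n = 5000

observed_risk = rng.normal(0, 1, n)    # e.g., a baseline measure the researcher can see
unobserved_need = rng.normal(0, 1, n)  # differences the researcher never measures

# Centers serving higher-risk children are more likely to select the curriculum
p_select = 1 / (1 + np.exp(-(0.8 * observed_risk + 0.8 * unobserved_need)))
treated = rng.random(n) < p_select

true_effect = 0.25  # assumed "true" benefit of the curriculum, in SD units
outcome = (true_effect * treated - 0.5 * observed_risk
           - 0.5 * unobserved_need + rng.normal(0, 1, n))

naive = outcome[treated].mean() - outcome[~treated].mean()

# Adjust for the observed covariate with a linear regression: outcome ~ treated + observed_risk
X = np.column_stack([np.ones(n), treated.astype(float), observed_risk])
adjusted = np.linalg.lstsq(X, outcome, rcond=None)[0][1]

print(f"assumed true effect: {true_effect:.2f}")
print(f"naive difference   : {naive:.2f}")     # biased downward by selection
print(f"covariate-adjusted : {adjusted:.2f}")  # closer, but still biased by unobserved_need
```

Because the adjustment can only use what the researcher observes, the adjusted estimate moves toward the assumed true effect but does not fully recover it.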

The modern program evaluation literature prioritizes clear interpretations of intervention effects—or the internal validity of a study—as the sine qua non for high-quality, rigorous evaluations of program or policy effects (Campbell & Stanley, 1963). This is in part because empirical evaluations of methods have shown that, compared with experimental approaches, nonexperimental methods can yield badly biased—or incorrect—results about intervention effectiveness (LaLonde, 1986; Fraker & Maynard, 1987; Wong et al., 2018). One benefit of experimental approaches, therefore, is that they yield causally interpretable effects when the assumptions of the research design are met. Moreover, when deviations from the planned research design do occur—deviations that may introduce bias in the intervention effect—they can often be detected by the researcher. Deviations from the planned research design may include differential attrition across intervention conditions, the inclusion of additional interventions that occur simultaneously with the introduction of the curriculum, or failure to comply with randomly assigned intervention conditions. For these reasons, most evidence registries (e.g., the What Works Clearinghouse; Blueprints for Healthy Youth Development) have minimum requirements for inclusion that are related to the quality and implementation of the research design.

To date, the committee is unaware of any studies that have compared the magnitude of curriculum effects by research design. In a broader meta-analysis of 84 studies of early care and education curriculum effects, the difference in effect sizes between evaluations in which interventions were randomly and nonrandomly assigned was not statistically significant (0.25 standard deviations for randomized controlled trials versus 0.19 standard deviations for nonexperiments; Duncan & Magnuson, 2013). However, both experimental and quasi-experimental designs had to have more than 10 participants in each condition and less than 50 percent attrition to be included in the meta-analysis, and quasi-experimental effects were limited to those estimated using repeated measures approaches (e.g., change models; difference-in-differences models), regression discontinuity, propensity score matching, and instrumental variable approaches.

Sample Size

Finally, studies with small samples produce effect estimates that are less precise. In these cases, it can be difficult to detect effect variation at all, much less identify sources of the effect variation. In addition to producing imprecise effect estimates, small-sample studies may include participants and conditions that are not representative of the populations ultimately intended to receive the intervention or curriculum. For example, intervention conditions may be administered by the researcher or developer and delivered under controlled settings, making the study more akin to a laboratory trial than field research. Participants, aware of their involvement in a small, novel intervention, may also respond differently than they would have had they been involved in a scaled-up version of the intervention with many participants. In Duncan and Magnuson's (2013) review of program impacts in early care and education, small-sample studies tended to have larger impacts, but these studies were also more likely to involve researcher-developed programs and to have been conducted prior to 1980. For all of these reasons, studies with small samples may be most informative when they can be synthesized with effect estimates from other study efforts.
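A rough calculation shows why precision falls quickly as samples shrink. The sketch below assumes a simple student-level randomized comparison with an outcome scaled in standard deviation units; the sample sizes are hypothetical.

```python
# Rough illustration of how precision depends on sample size. Assumes a simple
# student-level randomized comparison with equal group sizes and an outcome
# scaled to have a standard deviation of 1 (hypothetical setup).
import math

def se_of_effect(n_per_group, sd=1.0):
    """Approximate standard error of a treatment-control difference in means."""
    return sd * math.sqrt(2.0 / n_per_group)

for n in (20, 50, 200, 1000):
    se = se_of_effect(n)
    # Rule of thumb: the minimum detectable effect with 80% power at two-sided
    # alpha = .05 is roughly 2.8 standard errors.
    print(f"n per group = {n:4d}   SE ~ {se:.2f}   MDES ~ {2.8 * se:.2f} SD")
```

Under these assumptions, a study with a few dozen children per condition can reliably detect only very large effects, and subgroup contrasts within such a study are even less precise.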

WITHIN- AND BETWEEN-STUDY APPROACHES FOR EXAMINING SOURCES OF EFFECT VARIATION

In the research literature, programmatic and study design features are sometimes described as moderators of intervention effects. Moderators may be examined by comparing curriculum effects for different subgroups of participants within the same study (within-study approaches) or by comparing effects across multiple studies with different participants, settings, and sometimes research methodologies (between-study approaches). The former approach offers the benefit of comparing effects for different subgroups of participants within the same study, who usually have observations on the same measures and have likely undergone similar study procedures (Bloom & Michalopoulos, 2013). Thus, if differential effects are observed between groups of participants, the researcher may have more confidence that effect heterogeneity is due to differences between subgroups of students and not to other extraneous, study-related characteristics. However, within-study comparisons of effects are limited because studies often do not have sufficient sample sizes for detecting differential effects for subgroups of participants and settings (Sabol et al., 2022; Spybrook, Kelcey, & Dong, 2016; Tipton, 2021). And, in the absence of strong theory guiding moderator analyses, researchers may be prone to conducting multiple moderator tests and reporting only statistically significant results. The challenge here is that these effects may be significant due to chance, resulting in misleading conclusions about moderator effects.
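A within-study moderator analysis is typically estimated as a treatment-by-subgroup interaction. The sketch below uses simulated data with hypothetical variable names and effect sizes; it also illustrates the multiple-testing concern raised above.

```python
# Sketch of a within-study moderator analysis (hypothetical data): estimate a
# treatment-by-subgroup interaction, and note how screening many moderators
# inflates the chance of a spurious "significant" result.
import numpy as np

rng = np.random.default_rng(2)
n = 600  # a modest single-study sample

treated = rng.integers(0, 2, n)
dual_language = rng.integers(0, 2, n)  # hypothetical moderator
outcome = 0.20 * treated + 0.15 * treated * dual_language + rng.normal(0, 1, n)

# Outcome ~ intercept + treated + moderator + treated:moderator
X = np.column_stack([np.ones(n), treated, dual_language, treated * dual_language])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
resid = outcome - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(f"interaction estimate = {beta[3]:.2f}, SE = {se[3]:.2f}")
# With n = 600 the interaction SE is large relative to a 0.15 SD difference,
# so this moderator test is underpowered.

# Multiple-testing caution: screening k unrelated moderators at alpha = .05 gives
# a 1 - 0.95**k chance of at least one false positive even when no moderation exists.
for k in (1, 5, 10, 20):
    print(f"k = {k:2d} moderators -> P(at least one false positive) = {1 - 0.95**k:.2f}")
```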
An alternative approach for detecting sources of effect heterogeneity is to compare results across different studies with variations in populations, settings, intervention conditions, and outcomes. The studies included in such a review often have been screened to meet criteria for yielding interpretable results, including a valid and well-implemented research design. For example, the What Works Clearinghouse applies methodological requirements to education evaluation studies for inclusion in its evidence registry. The Clearinghouse prioritizes study results with strong internal validity, such as those from experimental or well-implemented quasi-experimental designs. When results from multiple studies with similar interventions and outcomes are available, the What Works Clearinghouse uses meta-analytic approaches to examine the overall average effect of the intervention, as well as evidence for effect heterogeneity.

Prioritizing strong internal validity in evaluation studies, however, can introduce biases into the evidence base for summarizing intervention effectiveness. Although experimental designs are viewed as the gold-standard approach for yielding unbiased intervention effects, these approaches require intervention conditions that can be manipulated or randomly assigned to participants (Shadish, Cook, & Campbell, 2002). Promising curricular approaches that are not easily evaluated by random assignment (or quasi-experimental approaches) may be omitted from the evidence base. For example, a systems-based policy reform, or a curriculum that is designed for a specific tribal community and cannot be randomly assigned, may be excluded from the evidence registry. Criteria for including study findings in an evidence base require consideration of both the internal and the external validity of studies—in terms of the representativeness of the interventions, contexts, and populations included in the findings (Imai, King, & Stuart, 2008).

The focus on internal validity may also overshadow other concerns with study quality, including the construct validity of the intervention and comparison conditions. For example, the researcher's interpretation and understanding of intervention components may not be well aligned with the participants' experience and understanding of the curriculum or program, or with the contexts in which the curriculum was delivered. Intervention effects also can only be determined for constructs and outcome domains that can be reliably and validly measured, which may be challenging in preschool studies that often require direct assessments of young children.

Outcome measures may not adequately represent all of the domain areas that are critical for healthy development; they may also fail to fully capture the learning and growth of children from marginalized communities, especially those whose home languages differ from what is represented on the assessment. Finally, intervention effects can only be obtained for samples and settings that are accessible to researchers. Study samples are recruited for a number of different reasons (Tipton et al., 2021; Tipton & Olsen, 2022). They may be locationally convenient, logistically feasible, and/or financially reasonable, but they are rarely obtained using random—or even purposive—sampling from a well-defined population of units, treatments, outcomes, settings, and times. Children from marginalized communities, or those belonging to low-incidence disability groups, may be underrepresented in study samples, potentially limiting the generalizability of study results. If different types of curricula are effective for underrepresented children, the results will not be reflected in the evidence base.

Study effects may be averaged in meta-analysis without clarifying what, whom, where, or when the study effects represent (Schauer & Hedges, 2021). In meta-analysis, with enough study effects, the researcher may examine whether variations in curricular approaches, participant characteristics, and settings—as well as study design characteristics—are related to the size and direction of intervention effects. These relationships can be modeled as factors related to scientific and study design conditions (and their interactions) in a series of meta-regressions of effects. The approach allows researchers to observe and test the robustness of effects across different programmatic and study design features, as well as to begin to formulate hypotheses about the conditions under which effects may or may not vary. However, even in meta-analysis, it is often unclear how the researcher can best interpret these associations and whether these factors causally moderate the size of the intervention effect. Moreover, these approaches do not allow all sources of variation to be tested simultaneously.
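A between-study moderator analysis is often carried out as a meta-regression of study effect sizes on study-level features. The sketch below is a simplified fixed-effect version with invented effect sizes and a single coded feature; applied syntheses would generally fit random-effects models and report uncertainty in the moderator estimate.

```python
# Sketch of a fixed-effect meta-regression (hypothetical effect sizes and features):
# regress study effect sizes on a study-level feature, weighting each study by the
# inverse of its squared standard error.
import numpy as np

# Hypothetical: effect size (SD units), its standard error, and a coded feature
# (1 = content-specific curriculum, 0 = whole-child curriculum) for eight studies.
effect = np.array([0.35, 0.28, 0.40, 0.22, 0.10, 0.05, 0.12, 0.08])
se     = np.array([0.12, 0.15, 0.10, 0.18, 0.11, 0.14, 0.16, 0.12])
content_specific = np.array([1, 1, 1, 1, 0, 0, 0, 0])

w = 1.0 / se**2                                  # inverse-variance weights
X = np.column_stack([np.ones(len(effect)), content_specific])

# Weighted least squares computed as ordinary least squares on transformed data
Xw = X * np.sqrt(w)[:, None]
yw = effect * np.sqrt(w)
beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print(f"mean effect, whole-child studies       : {beta[0]:.2f} SD")
print(f"difference for content-specific studies: {beta[1]:+.2f} SD")
```

Even when such a moderator coefficient is large, it remains an association across studies; as the text notes, it cannot by itself establish why effects differed.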
To consider challenges with identifying sources of variation across multiple but coordinated studies, the committee examined experimental results produced by the PCER program, which was funded by the Institute of Education Sciences. The goal of this initiative was to provide rigorous evidence about the efficacy of available preschool curricula. The initiative funded 12 research teams from across the country to experimentally evaluate 14 preschool curricula using a common set of measures (one curriculum—Creative Curriculum—was evaluated twice by two different research teams, such that there were 15 evaluation studies of curricula). Starting in fall 2003, the study's sample included predominantly low-income children enrolled in Head Start programs, state prekindergarten programs, or private child-care centers. Outcomes for students' skills (reading, phonological awareness, language development, mathematics knowledge, and behavior) were examined at the end of the preschool and kindergarten years. Researchers also examined classroom-level outcomes, including measures of classroom quality, teacher–child interaction, and instructional quality. Results were analyzed and reported separately for each outcome and study because each team had its own sampling plan and randomization scheme.

The final PCER report presented mixed results for both the student- and classroom-level outcomes (Preschool Curriculum Evaluation Research Consortium, 2008). While eight of the curricula had statistically significant impacts on classroom-level measures, seven did not. Two curricula showed significant impacts on at least some of the student-level measures at the end of the preschool year, while 13 did not have any statistically significant effects. By the end of the kindergarten year, three curricula demonstrated positive effects on at least some student-level outcomes, while 11 had no impacts and one had negative impacts.

PCER was funded with the goal of providing decision-makers with definitive evidence for choosing preschool curricula. The initiative required that curricula be evaluated using random assignment and include samples of children and programs that were of interest to decision-makers. The programs included Head Start, state prekindergarten, and private child-care centers in urban, rural, and suburban locations. The initiative also included standardized measures for assessing outcomes at the student and classroom levels, as well as for reporting curriculum fidelity and contamination of intervention conditions and for assessing participant response rates and attrition. Finally, the effort included independent evaluations of curricula conducted by twelve research teams, with technical support for conducting the studies from two contract research firms. Given the well-defined target population, standardized methods for data collection, and experimental design, the PCER initiative represents the acme of field evaluation methods for informing evidence-based decision making. And yet, why did the PCER effort not yield more conclusive evidence for guiding curriculum choice?

One issue was the lack of statistical power for individual studies to detect significant effects. Random assignment occurred at the classroom or program level, with group sample sizes ranging between 11 and 40 clusters per evaluation study (the median group-level sample size was 18 classrooms or programs). Research teams reported minimum detectable effect sizes that ranged from 0.34 to 0.69 across composite student outcome measures, suggesting that individual studies were mostly underpowered for detecting statistically significant effects unless the magnitude of the effects was larger than about a third of a standard deviation (Preschool Curriculum Evaluation Research Consortium, 2008). The lack of statistically significant findings was therefore perhaps not surprising.
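The power problem can be seen in a back-of-the-envelope minimum detectable effect size (MDES) calculation for a cluster-randomized design. The intraclass correlation, cluster size, and lack of covariate adjustment below are illustrative assumptions rather than figures reported by the PCER teams.

```python
# Back-of-the-envelope MDES for a two-arm cluster-randomized design with equal arms,
# no covariate adjustment, and an outcome with total variance 1. The ICC and cluster
# size are assumptions chosen for illustration only.
import math

def mdes_cluster_rct(n_clusters, n_per_cluster, icc, multiplier=2.8):
    """Approximate MDES in SD units.

    multiplier ~ 2.8 corresponds to 80% power at two-sided alpha = .05 (large df).
    """
    var_effect = 4.0 * (icc + (1.0 - icc) / n_per_cluster) / n_clusters
    return multiplier * math.sqrt(var_effect)

for j in (11, 18, 40):
    print(f"{j:2d} clusters, 15 children each, ICC = 0.15 -> "
          f"MDES ~ {mdes_cluster_rct(j, 15, 0.15):.2f} SD")
```

Adjusting for baseline covariates such as pretest scores would lower these values, which is broadly in line with the 0.34 to 0.69 range the PCER teams reported.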

Variations in study design characteristics also challenged the interpretation of results. Across the 15 evaluation studies (of 14 curricula), there were substantial differences in the comparison conditions for assessing curriculum effects, the preschool settings in which the evaluations occurred, the location of sites, and the training of teachers on curriculum materials. For example, the evaluation of Project Construct found no statistically significant effects on student-level outcomes. The evaluation of DLM Early Childhood Express with Open Court Reading Pre-K, however, found statistically significant effects on student-level outcomes in reading, phonological awareness, and language. Explaining why there are different effects across the two curricula is more challenging. One reason may be that DLM Early Childhood Express with Open Court Reading is a more effective curriculum than Project Construct. Another reason could be that the teacher-developed materials in the control condition for Project Construct were more effective than the materials used by teachers in the control condition in the DLM Early Childhood Express study. Curriculum effects may also vary by preschool setting—the DLM evaluation took place in public prekindergarten classrooms in Florida, and the Project Construct evaluation took place in private child-care centers in Missouri.

To address some of the challenges with interpreting results from the PCER initiative, Jenkins et al. (2018) reanalyzed results from the 2008 PCER study through a meta-analysis. By combining effect estimates across the 15 curriculum studies, the Jenkins et al. team was able to address some of the ambiguity in conclusions due to weak statistical power for the individual studies; they also explored one hypothesis about why effects may have varied across studies. To conduct their analyses, Jenkins and colleagues compared curriculum effects according to four different treatment–control contrasts in the PCER initiative: (1) literacy-focused curricula versus HighScope and Creative Curriculum, (2) literacy-focused curricula versus locally (or teacher-) developed curricula, (3) mathematics-focused curricula versus HighScope and Creative Curriculum, and (4) Creative Curriculum versus locally developed curricula. Overall, the authors concluded that, compared with Creative Curriculum and HighScope, the literacy- and mathematics-focused curricula showed evidence of improving student-level outcomes; they also concluded that there was not much evidence that Creative Curriculum and HighScope improved students' school-readiness skills more than teacher- or locally developed curriculum approaches.

However, as discussed previously, the curriculum studies varied in multiple ways besides the treatment contrast investigated by Jenkins et al. (2018). If the type of treatment–control contrast covaried with other setting characteristics (including the type of preschool setting and the fidelity of curriculum implementation), it may be difficult to make definitive conclusions about why these effects differed. As such, while post hoc approaches such as meta-analysis allow the researcher to explore and disentangle various predictors of effect variation, these analyses cannot definitively point to the "cause" of why curriculum effects varied, nor is it possible to separate whether multiple sources of variation produce differential effects simultaneously. For example, it may be possible that the effectiveness of different types of curriculum approaches varies by the type of preschool program in which they are delivered and by the types of children enrolled in the program. Making such a conclusion would require prospective research designs that intentionally vary multiple systematic sources of effect variation.

REPRESENTING SOURCES OF EFFECT HETEROGENEITY IN RESEARCH STUDIES FOR GENERALIZED FINDINGS

The goal of identifying what works under what conditions and for whom is not a new initiative in education research or in the evaluation of pre-K curricula. Given the diversity in settings, populations, and conditions under which pre-K curricula can be delivered, there is an intense desire to understand the extent to which and why curriculum effects vary. To address these concerns, IES introduced the Standards for Excellence in Education Research (SEER) in 2019, encouraging researchers to begin identifying the conditions under which intervention effects are generated. Specifically, SEER asked grant recipients to specify intervention components, document the treatment implementation and contrast, and take steps to facilitate generalization of study findings. The reasoning here was that it is difficult to identify sources of effect heterogeneity—even as correlational relationships—when it is unclear what the effects themselves represent. In its 2022 review of IES's work, the National Academies recommended that the agency prioritize the funding of studies to understand the extent to which intervention effects vary and that it begin to identify sources of effect variation.
The preschool evaluation literature also calls for researchers to characterize and understand the extent to which intervention effects vary (NASEM, 2022).

CONCLUSION

Evidence about curriculum effectiveness is a central issue for the consideration of quality. Despite broad-based agreement among researchers and funders that understanding sources of effect heterogeneity is important for evidence-based decision making, the evidence on curriculum effectiveness often falls short of achieving these goals. The prior section described challenges that researchers face in understanding sources of effect variation. Results from individual studies—even large-scale, multisite trials—are often underpowered for detecting and testing treatment effect variation (Sabol et al., 2022). In cases where results from multiple studies are combined, such as in a meta-analysis, it may be difficult to interpret the synthesized findings because individual study results may represent different populations, contexts, settings, and outcomes that are not well understood by the meta-analyst and reader. Even when multiple curriculum evaluation studies are planned and conducted in coordinated ways—as in the PCER study—it may be difficult for researchers to understand and disentangle why effects differed across studies, given the multiple sources of effect variation that occurred simultaneously.

Data and quantitative and qualitative methods are needed to describe the rich contextual experiences of how preschool curricula are implemented and delivered, as are new analytic methods to examine and describe variations in effects. Ideally, evidence generated using these methods would:

• accurately and reliably represent children's skills and knowledge, regardless of their cultural background, language, and abilities;
• represent curricula—and curriculum components—that are feasible and desirable for delivery in real-world settings and compatible with the goals and objectives of the educators and program administrators who select the curriculum; and
• employ methods and study design features to identify programmatic factors that moderate curriculum effectiveness.

Because effectiveness is determined by comparing outcomes for children participating in the curriculum with outcomes obtained from an alternative condition, it is crucial that comparisons represent conditions that program administrators, educators, and parents are also likely to face. Moreover, high-quality teaching requires that educators be responsive to the dynamic and individual needs of children in their classrooms, so adaptations from curriculum materials are likely to occur. Study findings that are informed by an understanding of how the curriculum was delivered in real-world settings, and of the extent to which deviations occurred from the intended protocols for the intervention and comparison conditions, can provide valuable insights on effectiveness. The issue, then, is how researchers should carry out a research agenda that addresses the evolving needs of a diverse early childhood education landscape. The future research agenda described in Chapter 10 of this report highlights three areas of work needed to support such a research endeavor.

REFERENCES

Bloom, H. S., & Michalopoulos, C. (2013). When is the story in the subgroups? Strategies for interpreting and reporting intervention effects for subgroups. Prevention Science: The Official Journal of the Society for Prevention Research, 14(2), 179–188. https://doi.org/10.1007/s11121-010-0198-x

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp. 171–246). Chicago, IL: Rand McNally.

Clements, D. H., Sarama, J., Layzer, C., & Unlu, F. (2023). Implementation of a scale-up model in early childhood: Long-term impacts on mathematics achievement. Journal for Research in Mathematics Education, 54(1), 64–88.

Cronbach, L. J. (1982). In praise of uncertainty. New Directions for Program Evaluation, 1982(15), 49–58. https://doi.org/10.1002/ev.1310

Cruz, R. A., Kulkarni, S. S., & Firestone, A. R. (2021). A QuantCrit analysis of context, discipline, special education, and disproportionality. AERA Open, 7, 23328584211041354. https://doi.org/10.1177/23328584211041354

Duncan, G. J., & Magnuson, K. (2013). Investing in preschool programs. Journal of Economic Perspectives, 27(2), 109–132. https://doi.org/10.1257/jep.27.2.109

Fraker, T., & Maynard, R. (1987). The adequacy of comparison group designs for evaluations of employment-related programs. Journal of Human Resources, 22(2), 194–227. https://doi.org/10.2307/145902

Garcia, N. M., López, N., & Vélez, V. N. (2018). QuantCrit: Rectifying quantitative methods through critical race theory. Race Ethnicity and Education, 21(2), 149–157. https://doi.org/10.1080/13613324.2017.1377675

Gillborn, D., Warmington, P., & Demack, S. (2018). QuantCrit: Education, policy, 'Big Data' and principles for a critical race theory of statistics. Race Ethnicity and Education, 21(2), 158–179. https://doi.org/10.1080/13613324.2017.1377417

Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 171(2), 481–502. https://doi.org/10.1111/j.1467-985X.2007.00527.x

Jenkins, J. M., Duncan, G. J., Auger, A., Bitler, M., Domina, T., & Burchinal, M. (2018). Boosting school readiness: Should preschool teachers target skills or the whole child? Economics of Education Review, 65, 107–125. https://doi.org/10.1016/j.econedurev.2018.05.001

Jenkins, J. M., Whitaker, A. A., Nguyen, T., & Yu, W. (2019). Distinctions without a difference? Preschool curricula and children's development. Journal of Research on Educational Effectiveness, 12(3), 514–549. https://doi.org/10.1080/19345747.2019.1631420

LaLonde, R. (1986). Evaluating the econometric evaluations of training with experimental data. The American Economic Review, 76, 604–620. http://www.jstor.org/stable/1806062

Maier, M., Hsueh, J., Somers, M., & Burchinal, M. (2022). Does classroom quality promote preschoolers' learning? A conceptual framework for evaluating the impact of classroom quality on child outcomes. New York, NY: MDRC. https://www.mdrc.org/sites/default/files/VIQI_Conceptual_Framework_Brief_508.pdf

Morris, P. A., Connors, M., Friedman-Krauss, A., McCoy, D. C., Weiland, C., Feller, A., Page, L., Bloom, H., & Yoshikawa, H. (2018). New findings on impact variation from the Head Start Impact Study: Informing the scale-up of early childhood programs. AERA Open, 4(2). https://doi.org/10.1177/2332858418769287

National Academies of Sciences, Engineering, and Medicine. (2022). The future of education research at IES: Advancing an equity-oriented science. Washington, DC: The National Academies Press. https://doi.org/10.17226/26428

Phillips, D. A., Lipsey, M. W., Dodge, K. A., Haskins, R., Bassok, D., Burchinal, M. R., Duncan, G. J., Dynarski, M., Magnuson, K. A., & Weiland, C. (2017). Puzzling it out: The current state of scientific knowledge on pre-kindergarten effects; a consensus statement. In The current state of scientific knowledge on pre-kindergarten effects (pp. 19–30). Washington, DC: Brookings Institution. https://www.brookings.edu/wp-content/uploads/2017/04/consensus-statement_final.pdf

Preschool Curriculum Evaluation Research Consortium. (2008). Effects of preschool curriculum programs on school readiness (NCER 2008-2009). Washington, DC: National Center for Education Research, Institute of Education Sciences, U.S. Department of Education.

Sabol, T. J., McCoy, D., Gonzalez, K., Miratrix, L., Hedges, L., Spybrook, J. K., & Weiland, C. (2022). Exploring treatment impact heterogeneity across sites: Challenges and opportunities for early childhood researchers. Early Childhood Research Quarterly, 58, 14–26. https://doi.org/10.1016/j.ecresq.2021.07.005

Schauer, J. M., & Hedges, L. V. (2021). Reconsidering statistical methods for assessing replication. Psychological Methods, 26(1), 127–139. https://doi.org/10.1037/met0000302

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

Slavin, R. (2019). Developer- and researcher-made measures. https://robertslavinsblog.wordpress.com/2019/10/24/developer-and-researcher-made-measures/

Spybrook, J., Kelcey, B., & Dong, N. (2016). Power analyses for detecting treatment by moderator effects in cluster randomized trials. Journal of Educational and Behavioral Statistics.

Steiner, P. M., Wong, V. C., & Anglin, K. (2019). A causal replication framework for designing and assessing replication efforts. Zeitschrift für Psychologie, 227(4), 280–292. https://doi.org/10.1027/2151-2604/a000385

Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(2), 369–386. https://doi.org/10.1111/j.1467-985X.2010.00673.x

Tefera, A. A., Powers, J. M., & Fischman, G. E. (2018). Intersectionality in education: A conceptual aspiration and research imperative. Review of Research in Education, 42(1), vii–xvii. https://doi.org/10.3102/0091732X18768504

Tipton, E. (2012). Improving generalizations from experiments using propensity score subclassification. Journal of Educational and Behavioral Statistics, 38(3), 239–266. https://doi.org/10.3102/1076998612441947

Tipton, E. (2021). Beyond generalization of the ATE: Designing randomized trials to understand treatment effect heterogeneity. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(2), 504–521. https://doi.org/10.1111/rssa.12629

Tipton, E., & Olsen, R. B. (2018). A review of statistical methods for generalizing from evaluations of educational interventions. Educational Researcher, 47(8), 516–524. https://doi.org/10.3102/0013189X18781522

Tipton, E., & Olsen, R. B. (2022). Enhancing the generalizability of impact studies in education (NCEE 2022-003). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. https://files.eric.ed.gov/fulltext/ED617445.pdf

Tipton, E., Spybrook, J., Fitzgerald, K. G., Wang, Q., & Davidson, C. (2021). Toward a system of evidence for all: Current practices and future opportunities in 37 randomized trials. Educational Researcher, 50(3), 145–156.

Wong, V. C., Steiner, P. M., & Anglin, K. L. (2018). What can be learned from empirical evaluations of nonexperimental methods? Evaluation Review, 42(2), 147–175. https://doi.org/10.1177/0193841X18776870

Yoshikawa, H., Weiland, C., Brooks-Gunn, J., Burchinal, M. R., Espinosa, L. M., Gormley, W. T., Ludwig, J., Magnuson, K. A., Phillips, D. A., & Zaslow, M. J. (2013). Investing in our future: The evidence base on preschool education. New York: Foundation for Child Development; Ann Arbor, MI: Society for Research in Child Development. https://www.fcd-us.org/assets/2016/04/Evidence-Base-on-Preschool-Education-FINAL.pdf

Young, J., & Young, J. (2022). Decoding the data dichotomy: Applying QuantCrit to understand racially conscious intersectional meta-analytic research. International Journal of Research & Method in Education, 45(4), 381–396. https://doi.org/10.1080/1743727X.2022.2093847
