Evaluation 2017: From Learning to Action


Current Issues in Randomized Experiments

Session Number: 1412
Track: Design and Analysis of Experiments
Session Type: Multipaper
Tags: Evaluation Design, Evaluation methods, experimental analysis, experimental design, experimental evaluation
Session Chair: Laura Peck [Principal Scientist - Abt Associates]
Discussant: Naomi Goldstein [Director - Administration for Children and Families]
Presenter 1: Diana Epstein [Senior Evidence Analyst]
Presenter 2: Robert Olsen [President - Rob Olsen LLC]
Presenter 3: Stephen H. Bell [Senior Fellow & Principal Scientist - Abt Associates Inc.]
Presentation 1 Additional Author: Jacob Alex Klerman
Presentation 2 Additional Author: Elizabeth Tipton
Time: Nov 11, 2017 (08:00 AM - 09:00 AM)
Room: Washington 5

Abstract 1 Title: When is a Program Ready for Rigorous Impact Evaluation?
Presentation Abstract 1:

A review of the program evaluation literature suggests that, when rigorously evaluated, many apparently plausible programs are found to have at best small impacts not commensurate with their cost, and often no positive impacts at all. Further, many programs with positive results in an initial rigorous impact evaluation fail to demonstrate the same effectiveness in a second evaluation with an expanded population or at multiple sites. We argue that moving to an impact evaluation before a program is ready is a partial cause of the low success rates of such evaluations, and we propose a “falsifiable logic model” framework in response. Under this framework, evaluation starts with a process evaluation that compares a program’s intermediate outcomes in the treatment group against the concrete expected outcomes implied by a well-articulated logic model. Funders and policymakers could then save time and money by screening out programs that are unlikely to show impacts and therefore not ready for a rigorous impact evaluation.


This presentation will explain how the falsifiable logic model complements the contemporary view of rigorous impact evaluation as a tollgate: programs are developed, those that “pass” the falsifiable logic model are tested via rigorous impact evaluation, and only those that pass this tollgate proceed to broader rollout. If the intermediate steps of the logic model were tested first, it seems likely that many programs would fail to satisfy their own logic models and therefore fail to clear all the tollgate steps. Given that reality, what should a funder do next? Should it invest in refining a program so that it might satisfy its own logic model and ultimately show positive evaluation results, or simply abandon it in favor of other options? This presentation will lay out criteria for making that decision, which may ultimately lead to more programs that show positive results when rigorously evaluated.

Abstract 2 Title: Statistical Methods for Improved External Validity in Impact Evaluations
Presentation Abstract 2:

Impact evaluations need internal validity to justify causal claims about the program or intervention being evaluated. But they also need external validity to ensure that the causal estimates are applicable outside the study. With both internal validity and external validity, evaluation results can be used to predict the impacts of future policy decisions. While the development of research methods for impact evaluations has focused primarily on internal validity, the field has turned to developing improved methods to bolster and assess the external validity of impact evaluations and their findings.

This paper provides a review of the methods that have been developed to improve and assess the external validity of impact evaluations, especially randomized trials, when they are conducted in a sample that is not representative of the population of policy interest. While these methods can be equally applied to quasi-experimental evaluations, methods development in this area has largely been motivated by randomized trials because they are considered by many to be the “gold standard” for internal validity, but they are typically conducted in convenience samples that may be unrepresentative of the population of interest—and thus they may have questionable external validity.

In this paper, we review three types of methods. First, we review methods for improved evaluation design. Most of these methods are focused on how samples are selected for impact evaluations, including probability sampling (Olsen and Orr, 2016) and balanced, nonrandom sampling designed to match the characteristics of the sample to the characteristics of the population of policy interest (Tipton et al., 2014). Second, we review methods for assessing the external validity of the evaluation, given the sample that was selected. These methods include approaches to comparing the sample and population on observed characteristics that may moderate the impact of the treatment (e.g., Tipton, 2014; Stuart et al., 2017); they also include approaches that account for unobserved differences between the sample and population—but require additional assumptions and data from other evaluations (Bell et al., 2016). Third, we review methods for impact analysis that adjust for differences between the study sample and the population—and extrapolate from the sample to the population. These methods include standard linear regression models with treatment effect interactions, more flexible modelling approaches (e.g., Chipman, George, and McCulloch, 2010), propensity score methods for reweighting (Cole and Stuart, 2010; Stuart et al., 2011), and propensity score methods for subclassification (Tipton, 2013).
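The reweighting approach in the third family of methods can be illustrated with a small sketch on synthetic data. This is not code from the paper: the data-generating process, variable names, and sample sizes are all invented for illustration. The idea, following the general logic of Cole and Stuart (2010), is to model the probability of being in the study sample versus the population frame given observed covariates, then weight sample members by the odds of non-membership so the reweighted sample mirrors the population.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical setup: the treatment impact varies with a covariate x,
# and the convenience sample over-represents high-x units relative to
# the population of policy interest.
n_pop, n_samp = 5000, 1000
x_pop = rng.uniform(0, 1, n_pop)       # population covariate (uniform)
x_samp = rng.beta(4, 2, n_samp)        # convenience sample skews high

# True impact is 2*x, so the in-sample ATE overstates the population ATE
# (population ATE = 2 * E[x] = 1.0 under the uniform distribution).
t = rng.integers(0, 2, n_samp)         # random assignment within the sample
y = 2 * x_samp * t + rng.normal(0, 0.5, n_samp)

# Step 1: model sample membership given x, pooling sample and population.
X = np.concatenate([x_samp, x_pop]).reshape(-1, 1)
s = np.concatenate([np.ones(n_samp), np.zeros(n_pop)])
p = LogisticRegression().fit(X, s).predict_proba(x_samp.reshape(-1, 1))[:, 1]

# Step 2: weight sample units by the odds of NOT being sampled, so the
# reweighted sample's covariate distribution approximates the population's.
w = (1 - p) / p

# Step 3: the weighted experimental contrast targets the population ATE.
ate_sample = y[t == 1].mean() - y[t == 0].mean()
ate_reweighted = (np.average(y[t == 1], weights=w[t == 1])
                  - np.average(y[t == 0], weights=w[t == 0]))
```

In this stylized setup the unweighted sample ATE is pulled toward the high-x units, while the reweighted estimate moves back toward the population value; the same machinery extends to multiple covariates by widening the membership model.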

Abstract 3 Title: Disentangling How Multiple Mediators Affect Social Program Impacts “Inside the Black Box” by Using Multivariate ASPES
Presentation Abstract 3:

Evaluators and policy makers often wish to identify the features of multi-faceted social programs that serve to enhance program impacts on participants. What is it inside a program’s “black box” intervention that generates favorable results, and which intervention components are not helping? Studies that randomly assign participants to receive varied intervention features provide the best basis for disentangling how multiple program components mediate impact magnitude. But most social experiments create a single treatment group (and a program-excluded control group) whose members then—post-random assignment—sort into patterns of participation in the program’s various components, or receive varying dosages of a given component. If researchers could identify within the control group counterparts to each of these endogenously-determined subpopulations of the treatment group, then experimental estimates of impacts on the latter could be produced. But the subpopulations of interest are not observable in the control group.

Analysis of symmetrically-predicted endogenous subgroups, or ASPES, provides a means of approximating this result, preserving the strengths of a randomized experiment to estimate how a discrete or continuous program feature affects impact magnitude. A substantial literature exists on ASPES, for both discrete (e.g., Bell & Peck, 2013; Peck, 2003, 2013) and continuous (e.g., Moulton et al., 2016; Peck, 2003) program feature mediators. With all of these tools, variation in impact inside the black box can be examined only one program feature at a time. Thus, when the discrete version of ASPES shows a larger impact on young children in Head Start classrooms with high rather than low academic content (Peck & Bell, 2014; Peck, Bell & Grindall, in progress), one has to wonder whether other aspects of the children’s Head Start experience correlated with academic content cause better child development outcomes in their own right (e.g., the extent of teacher-child interactions, the physical arrangement of and materials in the classroom). Making this determination requires an analytical tool that applies the symmetric-prediction approach of ASPES to multiple potential mediators of impact simultaneously.

To that end, the current paper extends both the discrete and continuous versions of ASPES to simultaneous analysis of multiple mediators—creating a technique termed “multivariate analysis of symmetrically-predicted endogenous subgroups,” or “multivariate ASPES.” The design of the new methods is presented, along with an explanation of their properties and consideration of the challenges of applying them to real-world data. If future research applies these new analysis methods to randomized evaluation data, it will be able to capitalize on a study’s overall random assignment design while disentangling the distinct roles of each of multiple program features in influencing program impacts. It will also substantially reduce the risk of confounding the influence of omitted program features with the measured influence of an included feature, a risk that arises when applying conventional ASPES.
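The core symmetric-prediction step that multivariate ASPES builds on can be sketched as follows. This is a stylized illustration, not the authors' implementation: the data-generating process, the single binary mediator, and the simple median split are all invented for the example, and real applications use split-sample or cross-fitted prediction rather than the in-sample fit shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4000
X = rng.normal(size=(n, 3))                  # baseline covariates
T = rng.integers(0, 2, n)                    # random assignment
# Endogenous feature (e.g., a high-content classroom): observed only in
# the treatment group, and driven in part by the first baseline covariate.
m = ((X[:, 0] + rng.normal(0, 1, n)) > 0).astype(float)
m_obs = np.where(T == 1, m, np.nan)          # unobservable for controls
# Outcome: the program moves outcomes only for units exposed to the feature.
y = 0.5 * X[:, 0] + 1.0 * T * m + rng.normal(0, 1, n)

# Step 1: predict feature take-up from baselines using treatment-group data
# (a linear probability model here, for simplicity).
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A[T == 1], m_obs[T == 1], rcond=None)

# Step 2: apply the SAME prediction model symmetrically to treatment and
# control members alike, so subgroup membership depends only on baselines
# and randomization is preserved within each predicted subgroup.
m_hat = A @ beta
high = m_hat > np.median(m_hat)

# Step 3: experimental treatment-control contrasts within each predicted
# subgroup recover how the feature mediates impact magnitude.
impact_high = y[(T == 1) & high].mean() - y[(T == 0) & high].mean()
impact_low = y[(T == 1) & ~high].mean() - y[(T == 0) & ~high].mean()
```

Because predicted subgroup membership uses only pre-random-assignment data, the within-subgroup contrasts remain experimental; the multivariate extension described in the paper repeats this logic for several mediators at once, which is what guards against attributing one feature's influence to a correlated, omitted feature.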

Theme: Learning What Works and Why
Audience Level: All Audiences

Session Abstract (150 words): 

In 2016, New Directions for Evaluation published issue #152 on the topic of “Social Experiments in Practice.” This multipaper session brings together some of the contributors to that work and offers a review and update of key topics, including the ideal timing of rigorous evaluation, advances in thinking about external validity, and new extensions to endogenous subgroup analysis as a way of delving inside the so-called “black box.”
