The Problem with Outcome-Oriented Evaluations

Imagine I watch two poker players play two tournaments each. In their first tournaments, Player A wins $1,200 and Player B loses $800. In her second tournament, Player A pockets another $1,000, while Player B loses $1,100 more. Would it be a good decision for me to sit down at a table and model my play after Player A?

For many people, the answer to this question – no – is counterintuitive. I watched Player A and Player B play two tournaments each, and their results were very different; haven’t I seen enough to conclude that Player A is the better poker player? Yet poker involves a considerable amount of luck, so skilled and unskilled players alike can experience a wide range of short- and long-term results. As Nate Silver writes in The Signal and the Noise, I could monitor each player’s winnings over a year of full-time play and still not know whether either of them was any good at poker; it would be entirely plausible for a “very good limit hold ’em player” to “have lost $35,000” during that time. Instead of focusing on the desired outcome of their play – making money – I should mimic the player whose strategies will, over time, increase the likelihood of future winnings. As Silver writes,

When we play poker, we control our decision-making process but not how the cards come down. If you correctly detect an opponent’s bluff, but he gets a lucky card and wins the hand anyway, you should be pleased rather than angry, because you played the hand as well as you could. The irony is that by being less focused on your results, you may achieve better ones.
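To see how easily luck can swamp skill over a handful of tournaments, here is a minimal Monte Carlo sketch in Python. Every number in it is a hypothetical assumption chosen for illustration, not a figure from Silver or from real poker data: a skilled player who averages +$100 per tournament, a weaker player who averages -$100, and results for both that are normally distributed with a $1,500 standard deviation. The function estimates how often the weaker player finishes ahead.

```python
import random

# Hypothetical assumptions for illustration (not real poker data):
# both players' per-tournament results are normally distributed with a large
# $1,500 standard deviation, but the skilled player averages +$100 and the
# weaker player -$100.
SKILLED_MEAN = 100.0
WEAK_MEAN = -100.0
STD_DEV = 1500.0


def total_winnings(mean: float, n_tournaments: int, rng: random.Random) -> float:
    """Sum n independent tournament results drawn from the assumed distribution."""
    return sum(rng.gauss(mean, STD_DEV) for _ in range(n_tournaments))


def prob_weaker_player_ahead(n_tournaments: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate how often the weaker player ends up ahead of the skilled player."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    ahead = 0
    for _ in range(trials):
        skilled = total_winnings(SKILLED_MEAN, n_tournaments, rng)
        weak = total_winnings(WEAK_MEAN, n_tournaments, rng)
        if weak > skilled:
            ahead += 1
    return ahead / trials


for n in (2, 250):
    share = prob_weaker_player_ahead(n)
    print(f"{n} tournaments: weaker player ahead in {share:.1%} of simulations")
```

Under these assumptions, the weaker player finishes ahead in a little under half of the two-tournament simulations but in only a small fraction of the 250-tournament simulations. The normal distribution is a simplification (real tournament payouts are heavily skewed), so treat this as an illustration of the variance argument rather than a model of actual poker.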

As Silver recommends for poker and Teach For America recommends to its corps members, we should always focus on our “locus of control.” For example, I have frequently criticized Barack Obama for his approach to the Affordable Care Act. While I am unhappy that the health care bill did not include a public option, I wouldn’t blame Obama if he had genuinely tried to pass such a bill and failed because of an obstinate Congress. My critique lies instead with the President’s deceptive work against a more progressive bill – while politicians don’t always control policy outcomes, they do control their actions. As another example, college applicants should not judge their success by whether colleges accept them; they should evaluate themselves on what they control: the work they put into high school and their applications. Likewise, great football coaches recognize that they should judge their teams not on their won-loss records, but on each player’s successful execution of assigned responsibilities. Smart decisions and strong performance do not always beget good results; the more factors that lie between our actions and the desired outcome, the less the outcome can tell us about the quality of those actions.
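The claim that more intervening factors weaken what an outcome can tell us can be made precise under a deliberately simple, assumed model: suppose an outcome is the sum of the quality of our own actions and k outside factors, all independent and all with the same variance. The correlation between our actions and the outcome is then 1/sqrt(k + 1), which shrinks as k grows. This Python sketch checks that formula by simulation; the model, the function names, and the parameter values are illustrative assumptions, not anything drawn from the sources cited here.

```python
import math
import random
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def outcome_correlation(k_factors: int, n: int = 20_000, seed: int = 1) -> float:
    """Simulate outcome = action quality + k outside factors; return corr(action, outcome)."""
    rng = random.Random(seed)
    actions, outcomes = [], []
    for _ in range(n):
        action = rng.gauss(0.0, 1.0)  # quality of what we control
        noise = sum(rng.gauss(0.0, 1.0) for _ in range(k_factors))  # what we don't
        actions.append(action)
        outcomes.append(action + noise)
    return pearson(actions, outcomes)


for k in (0, 1, 4, 9):
    print(f"k={k}: simulated {outcome_correlation(k):.3f}, "
          f"theory {1 / math.sqrt(k + 1):.3f}")
```

The equal-variance, independent-factor setup is the simplest case; if outside factors are larger than our own contribution, the correlation falls even faster.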

Most education reformers and policymakers, unfortunately, still fail to recognize this basic tenet of probabilistic reasoning, a fact underscored in recent conversations between Jack Schneider (a current professor and one of the best high school teachers I’ve ever had) and Michelle Rhee. We implement teacher and school accountability metrics that focus heavily on student outcomes without realizing that outcomes alone are not a valid measure of educator performance. As the American Statistical Association’s (ASA’s) recent statement on value-added modeling (VAM) clearly states, “teachers account for about 1% to 14% of the variability in [student] test scores” and “[e]ffects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.” Paul Bruno astutely notes that the ASA’s statement is an indictment of the way VAM is used rather than of the idea of VAM itself; even so, little correlation currently exists between VAM results and effective teaching. As I’ve mentioned before, research on both student and teacher incentives suggests that rewards and consequences based on outcomes don’t work. When we use student outcome data to assign credit or blame to educators, we may close good schools, demoralize and dismiss good teachers, and ultimately undermine the likelihood of achieving the student outcomes we want.

Better policy would focus on school and teacher inputs. For example, we should agree on a set of clear and specific best teaching practices (flexible enough to accommodate different teaching styles) on which to base teacher evaluations. Similarly, college counselors should give applicants guidance about the components of a strong application, and football coaches should focus on their players’ decision-making and execution of blocking, tackling, route-running, and other techniques.

[Figure: Input/Output Graphic]

When we evaluate schools on student outcomes, we reward (and punish) them for factors they don’t directly control. A fairer and more intelligent approach would evaluate the actions schools take in pursuit of better student outcomes, not the outcomes themselves.

Outcomes are, of course, incredibly important to monitor and consider when selecting effective inputs. Through a process called Bayesian analysis, we can use outcomes to continually update our assessments of whether our strategies are working. If we observe little correlation between successful implementation of our identified best teaching practices and student growth for five consecutive years, for instance, we may want to revisit our definition of best practices. A college counselor whose top students are consistently rejected from Ivy League schools should reconsider the advice he gives his students on their applications. Similarly, if a football team suffers through losing season after losing season despite players’ successful completion of their assigned responsibilities, the team should probably overhaul its strategy.
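A minimal sketch of that kind of Bayesian updating, in Python, under a deliberately simplified setup: we start out 80% confident that our identified best practices help students, and each year we record only whether we saw a clear link between implementation and student growth. The likelihoods (a 70% chance of seeing a link in a given year if the practices work, 30% if they don’t) and the treatment of years as independent evidence are hypothetical assumptions for illustration.

```python
from typing import Iterable

# Hypothetical likelihoods (assumptions for illustration only): in a year when
# our best practices really help, we assume a 70% chance of seeing a clear
# link between implementation and student growth; if they don't help, 30%.
P_LINK_IF_WORKS = 0.7
P_LINK_IF_NOT = 0.3


def update(prior: float, saw_link: bool) -> float:
    """Apply Bayes' rule once to P(our practices work)."""
    like_works = P_LINK_IF_WORKS if saw_link else 1 - P_LINK_IF_WORKS
    like_not = P_LINK_IF_NOT if saw_link else 1 - P_LINK_IF_NOT
    evidence = prior * like_works + (1 - prior) * like_not
    return prior * like_works / evidence


def belief_after(prior: float, yearly_links: Iterable[bool]) -> float:
    """Update the belief year by year, treating years as independent evidence."""
    belief = prior
    for saw_link in yearly_links:
        belief = update(belief, saw_link)
    return belief


print(f"{belief_after(0.8, [False]):.2f}")      # one disappointing year
print(f"{belief_after(0.8, [False] * 5):.2f}")  # five disappointing years in a row
```

With these numbers, one disappointing year lowers our confidence from 80% to roughly 63%, while five disappointing years in a row drop it to roughly 5%, a decline large enough to justify revisiting our definition of best practices.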

The current use of student outcome data to make high-stakes decisions in education, however, flies in the face of these principles. Until we shift our measures of school and teacher performance from student outputs to school and teacher inputs, we will unfortunately continue to make bad policy decisions that simultaneously alienate educators and undermine the very outcomes we are trying to achieve.

Update: A version of this piece appeared in Valerie Strauss’s column in The Washington Post on Sunday, May 25.



8 responses to “The Problem with Outcome-Oriented Evaluations”

  1. Well said, Ben; thank you.

    Mr. Obama and other politicians lie about what they are really “for” in policy, and we can tell that what they say is contrived bullshit (thank you, Professor Frankfurt, for making that an academic term with your bestseller, “On Bullshit”) because what they do is so far from what they say. This raises a question about the pattern we see in so-called “health care,” education policy, war policy (“self-defense?!” Really?), and so many other areas (TSA genital groping is ok under the 4th Amendment??):

    What are these so-called “leaders” really up to? In education, are we being manipulated into calling public education a “failure” to put schools into private hands? Aren’t there similar and powerful questions in about 100 other areas that seem really important?

    This seems like a real-world critical thinking final exam for humanity. I’m glad Ben Spielberg is on our team!

  2. Thanks, Carl! I think evaluations based on actions can help us differentiate between empty rhetoric and espoused positions that might lead to better policy outcomes.

  3. Ewen

    Totally agree with you, Ben. It does raise the question of how to actually measure those inputs across a broad spectrum. Test scores are often used because of their “objectivity,” but with vastly different student populations and school environments, how do you compare teacher performance across districts? Is it possible? Is it even important? Just curious as to your thoughts.

    • You raise an excellent question. If we believe education is a right (I do, and I think most people do), then we should support standardized criteria for excellence as a way to provide equal access to that right.

      We are beginning to calibrate our understanding of measures of performance in San Jose Unified. Our administrators and consulting teachers are all receiving the same training on how to assess the Teacher Quality Panel’s (TQP’s) new standards for areas of practice, and the teachers, coaches, and administrators who participated in our EDI Progress Monitoring initiative this year also calibrated our assessments of teaching practices on SJUSD’s EDI rubric. Calibrating on nuanced rubrics takes time and money and is a lot harder than checking boxes and judging people on student outcomes, but the approach is much fairer and significantly more likely to improve educational experiences for future students. Doing something similar across districts would involve far more coordination and investment, of course, but the payoff would definitely be worth the cost, and I think it’s very doable if we can shift our thinking away from outputs and toward teacher and school inputs.

      Hope that answers your question. Thanks for the great comment!

  4. Josh

    Great article, Ben. Use of the poker example at the beginning was very helpful, and I always like counterintuitive results. It seems to me that the problem with most current trends in education policy is that they focus exclusively (or almost exclusively) on outcomes. Your emphasis on inputs is really important, but you also note that outcomes should still be considered. So it is not so much a total shift from one to the other as a shift in emphasis. Do you agree? We also need a better way of measuring outcomes – they need to be risk-adjusted, measured holistically, and tracked over both the short and long term.

    • I agree that it’s a shift in emphasis. We definitely shouldn’t ignore outcomes; they are our ultimate goal and we should, as you mention, monitor a broad range of them. As I wrote above, however, their use should be confined to revising our estimates of the probabilities that identified best practices are appropriate. We otherwise risk rewarding and punishing institutions and individuals for luck and/or extraneous factors instead of for job performance. Input-based evaluations are both fairer and more likely to foster growth than outcome-based evaluations.

      Thanks for the comment!

  5. David B. Cohen

    Reblogged this on InterACT and commented:
    Teacher evaluation and value-added measures have both been topics of frequent consideration here at InterACT. This re-blogged post by Ben Spielberg makes some excellent points about both, and provides a fine explanation of why we need to be cautious about relying on outcomes to evaluate teaching. Ben writes at where his bio notes: “Ben is currently a math instructional coach at two traditional public schools (one middle school and one high school) in San Jose, CA. He partners with all the teachers in both schools’ math departments to deliver high-quality instruction to students.”
