Eric Lerum and I Debate Teacher Evaluation and the Role of Anti-Poverty Work (Part 2)

Published by

Ben Spielberg

August 11, 2014

StudentsFirst Vice President Eric Lerum and I recently began debating the use of standardized test scores in high stakes decision-making. I argued in a recent blog post that we should instead evaluate teachers on what they directly control – their actions. Our conversation, which began to touch on additional interesting topics, is continued below.

Click here to read Part 1 of the conversation.

Lerum: To finish the outcomes discussion – measuring teachers by the actions they take is itself measuring an input. What do we learn from evaluating how hard a teacher tries? And is that enough to evaluate teacher performance? Shouldn’t performance be at least somewhat related to the results the teacher gets, independent of how hard she tries? If I put in lots of hours learning how to cook, assembling the perfect recipes, buying the best ingredients, and then even more hours in the kitchen – but the meal I prepare doesn’t taste good and nobody likes it, am I a good cook?

Regarding your use of probability theory and VAM – the problem I have with your analysis there is that VAM is not used to raise student achievement. So using it – even improperly – should not have a direct effect on student achievement. What VAM is used for is determining a teacher’s impact on student achievement, and thereby identifying which teachers are more likely to raise student achievement based on their past ability to do so. So even if you want to apply probability theory and even if you’re right, at best what you’re saying is that we’re unlikely to be able to use it to identify those teachers accurately on an ongoing basis. The larger point that is made repeatedly is that because outside factors play a larger overall role in impacting student achievement, we should not focus on teacher effectiveness and instead solve for these other factors. This is a key disconnect in the education reform debate. Reformers believe that focusing on things like teacher quality and focusing on improving circumstances for children outside of school need not be mutually exclusive. Teacher quality is still very important, as Shankerblog notes. Improving teacher quality and then doing everything we can to ensure students have access to great teachers does not conflict at all with efforts to eliminate poverty. In fact, I would view them as complementary. But critics of these reforms use this argument to say that one should come before the other – that because these other things play larger roles, we should focus our efforts there. That is misguided, I think – we can do both simultaneously. And as importantly in terms of the debate, no reformer that I know suggests that we should only focus on teacher quality or choice or whatever at the expense or exclusion of something else, like poverty reduction or improving health care.

If you’re interested in catching up on class size research, I highly recommend the paper published by Matt Chingos at Brookings, found here with follow-up here. To be clear about my position on class size, however: I’m not against smaller class sizes. If school leaders determine that is an effective way to improve instruction and student achievement in their school, they should utilize that approach. But it’s not the best approach for every school, every class, every teacher, or every child. And thus, state policy should reflect that. Mandating class size limits or restrictions makes no sense. It ties the hands of administrators who may choose to staff their schools differently and use their resources differently. It hinders innovation for educators who may want to teach larger classes in order to configure their classrooms differently, leverage technology or team teaching, etc. Why not instead leave decisions about staffing to school leaders and their educators?

The performance framework for San Jose seems pretty straightforward. I’m curious how you measure #2 (whether teachers know the subjects) – are those through rigorous content exams or some other kind of check?

I think a solid evaluation system would include measures using indicators like these. But you would also need actual student learning/growth data to validate whether those things are working – as you say, “student outcome results should take care of themselves.” You need a measure to confirm that.

I honestly think my short response to all of this would be that there’s nothing in the policies we advocate for that prevents what you’re talking about. And we advocate for meaningful evaluations being used for feedback and professional development – those are critical elements of bills we try to move in states. But as a state-level policy advocacy organization, we don’t advocate for specific models or types of evaluations. We believe certain elements need to be there, but we wouldn’t be advocating for states to adopt the San Jose model or any other specifically – that’s just not what policy advocacy is. So I think there’s just general confusion about that – that simply because you don’t hear us saying to build a model with the components you’re looking for, that must mean we don’t support it. In fact, we’re focused on policy at a level higher than the district level, and design and implementation of programs isn’t in our wheelhouse.

Spielberg: I believe you discuss three very important questions, each one of which deserves some attention:

1) Given that student outcomes are primarily determined by factors unrelated to teaching quality, can and should people still work on improving teacher effectiveness?

Yes! While teaching quality accounts for, at most, a small percentage of the opportunity gap, teacher effectiveness is still very important. Your characterization of reform critics is a common misconception; everyone I’ve ever spoken with believes we can work on addressing poverty and improving schools simultaneously. Especially since we decided to have this conversation to talk about how to measure teacher performance, I’m not sure why you think I’d argue that “we should not focus on teacher effectiveness.” I am critiquing the quality of some of StudentsFirst’s recommendations – they are unlikely to improve teacher effectiveness and have serious negative consequences – not the topic of reform itself. I recommend we pursue policy solutions more likely to improve our schools.

Critics of reform do have a legitimate issue with the way education reformers discuss poverty, however. Education research’s clearest conclusion is that poverty explains inequality significantly better than school-related factors. Reformers often pay lip-service to the importance of poverty and then erroneously imply an equivalence between the impact of anti-poverty initiatives and education reforms. They suggest that there’s far more class mobility in the United States than actually exists. This suggestion harms low-income students.

As an example, consider the controversy that surrounded New York mayor Bill de Blasio several months ago. De Blasio was a huge proponent of measures to reduce income inequality, helped reform stop-and-frisk laws that unfairly targeted minorities, had fought to institute universal pre-K, and had shown himself in nearly every other arena to fight for underprivileged populations. While it would have been perfectly reasonable for StudentsFirst to disagree with him about the three charter co-locations (out of seventeen) that he rejected, StudentsFirst’s insinuation that de Blasio’s position was “down with good schools” was dishonest, especially since a comprehensive assessment of de Blasio’s policies would have indisputably given him high marks on helping low-income students. At the same time, StudentsFirst aligns itself with corporate philanthropists and politicians, like the Waltons and Chris Christie, who actively exploit the poor and undermine anti-poverty efforts. This alignment allows wealthy interests to masquerade as advocates for low-income students while they work behind the scenes to deprive poor students of basic services. Critics argue that organizations like StudentsFirst have chosen the wrong allies and enemies.

I wholeheartedly agree that anti-poverty initiatives and smart education reforms are complementary. I’d just like to see StudentsFirst speak honestly about the relative impact of both. I’d also love to see you hold donors and politicians accountable for their overall impact on students in low-income communities. Then reformers and critics of reform alike could stop accusing each other of pursuing “adult interests” and focus instead on the important work of improving our schools.

2) How can we use student outcome data to evaluate whether an input-based teacher evaluation system has identified the right teaching inputs?

This concept was the one we originally set out to discuss. I’d love to focus on it in subsequent posts if that works for you (though I’d love to revisit the other topics in a different conversation if you’re interested).

I’m glad we agree that “a solid evaluation system would include [teacher input-based] measures…like [the ones used in San Jose Unified].” I also completely agree with you that we need to use student outcome data “to validate whether those things are working.” That’s exactly the use of student outcome data I recommend. Though cooks probably have a lot more control over outcomes than teachers, we can use your cooking analogy to discuss how Bayesian analysis works.

We’d need to first estimate the probability that a given input – let’s say, following a specific recipe – is the best path to a desired outcome (a meal that tastes delicious). This probability is called our “prior.” Let’s then assume that the situation you describe occurs – a cook follows the recipe perfectly and the food turns out poorly. We’d need to estimate two additional probabilities. First, we’d need to know the probability the food would have turned out badly if our original prediction was correct and the recipe was a good one. Second, we’d need the probability that the food would have turned out poorly if our original prediction was incorrect and the recipe was actually a bad one. Once we had those estimates, we could use a very simple formula (Bayes’ rule) to calculate an updated probability that the input – the recipe – is a good one. Were this probability sufficiently low, we would throw out the recipe and pick a new one for the next meal. We would, however, identify the cook as an excellent recipe-follower.
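
As a rough sketch of that update, here is the calculation in a few lines of Python. The three probabilities are made-up numbers chosen purely for illustration, and the function and variable names are hypothetical; nothing here comes from real data about recipes or teaching.

```python
def posterior_good(prior_good, p_bad_meal_if_good, p_bad_meal_if_bad):
    """Bayes' rule: probability the recipe is good, given one bad meal."""
    # Total probability of seeing a bad meal under either hypothesis.
    evidence = (prior_good * p_bad_meal_if_good
                + (1 - prior_good) * p_bad_meal_if_bad)
    return prior_good * p_bad_meal_if_good / evidence


# Illustrative assumptions only (not estimates from any study):
prior = 0.8              # our prior belief that the recipe is a good one
p_bad_given_good = 0.1   # chance of a bad meal even if the recipe is good
p_bad_given_bad = 0.7    # chance of a bad meal if the recipe is bad

updated = posterior_good(prior, p_bad_given_good, p_bad_given_bad)
print(round(updated, 3))
```

With these particular assumed numbers, a single bad meal cuts our confidence in the recipe by more than half, while saying nothing negative about the cook who followed it.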

This approach has several advantages over the alternative (evaluating the cook primarily on the taste of the food). Most obviously, it accurately captures the cook’s performance. The cook clearly did an excellent job doing what both you and he thought was a good idea – following this specific recipe – and can therefore be expected to do a good job following other recipes in the future. If we punished him, we’d be sending the message that his actual performance matters less than having good luck, and if we fired him, we’d be depriving ourselves of a potentially great cook. Additionally, it’s not the cook’s fault that we picked the wrong cooking strategy, so it’s unethical to punish him for doing everything we asked him to do.

Just as importantly, this approach would help us identify the strategies most likely to lead to better meals in the long run. We might not catch the problem with the recipe if we incorrectly attribute the meal’s taste to the cook’s performance – we might end up continuously hiring and firing a bunch of great cooks before we realize that the recipe is bad. If we instead focus on the cook’s locus of control – following the recipe – and use Bayesian analysis, we will more quickly discover the best recipes and retain more cooks with recipe-following skills. Judging cooks on their ability to execute inputs and using outcomes to evaluate the validity of the inputs would, over time, increase the quality of our meals.
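
To illustrate how evidence about a recipe could accumulate over several meals, here is a minimal sketch that applies Bayes’ rule once per meal. It assumes every meal was cooked by someone who followed the recipe well, that meals are independent given the recipe’s quality, and that the likelihoods are the made-up values shown; the names are hypothetical.

```python
def update(prior_good, meal_was_bad, p_bad_if_good=0.1, p_bad_if_bad=0.7):
    """One Bayesian update of P(recipe is good) after a well-executed meal."""
    # Likelihood of the observed meal under each hypothesis about the recipe.
    like_good = p_bad_if_good if meal_was_bad else 1 - p_bad_if_good
    like_bad = p_bad_if_bad if meal_was_bad else 1 - p_bad_if_bad
    evidence = prior_good * like_good + (1 - prior_good) * like_bad
    return prior_good * like_good / evidence


belief = 0.8        # assumed starting prior that the recipe is good
history = [belief]
# Assumed outcomes for four meals: True means the meal tasted bad.
for meal_was_bad in [True, True, False, True]:
    belief = update(belief, meal_was_bad)
    history.append(belief)

print([round(b, 3) for b in history])
```

Under these assumptions the belief in the recipe falls quickly after repeated bad meals and recovers only partly after a good one, so the evidence ends up counting against the recipe and not against any individual cook.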

Let’s now imagine the analogous situation for teachers. Suppose a school adopts blended learning as its instructional framework, and suppose a teacher executes the school’s blended learning model perfectly. However, the teacher’s value-added (VAM) results aren’t particularly high. Should we punish the teacher? The answer, quite clearly, is no; unless the teacher was bad at something we forgot to identify as an effective teaching practice, none of the explanations for the low scores has anything to do with the teacher’s performance. Just as with cooking, we might not catch a real problem with a given teaching approach if we incorrectly attribute outcome data to a teacher’s performance – we might end up continuously hiring and firing a bunch of great teachers based on random error, a problem with an instructional framework, or a problem with VAM methodology.

The improper use of student outcome data in high-stakes decision-making has negative consequences for students precisely because of this incorrect attribution. Making VAM a defined percentage of teacher evaluations leads to employment decisions based on inaccurate perceptions of teacher quality. Typical VAM usage also makes it harder for us to identify successful teaching practices. If we instead focus on teachers’ locus of control – effective execution of teacher practices – and use Bayesian analysis, we will more quickly discover the best teaching strategies and retain more teachers who can execute teaching strategies effectively. Judging teachers on their ability to execute inputs and using outcomes to evaluate the validity of the inputs would, over time, increase the likelihood of student success.

3) As “a state-level policy advocacy organization,” what is the scope of StudentsFirst’s work?

You wrote that StudentsFirst “[doesn’t] advocate for specific models or types of evaluations” but believes “certain elements need to be there.” One of the elements you recommend is “evaluating teachers based on evidence of student results.” This recommendation has translated into your support for the use of standardized test scores as a defined percentage of teacher evaluations. I was not recommending that you ask states to adopt San Jose Unified’s evaluation framework (as an aside, the component you ask about deals mostly with planning and, among other things, uses lesson plans, teacher-created materials, and assessments as evidence) or that you recommend across-the-board class size reduction (thanks for clarifying your position on that, by the way – I look forward to reading the pieces you linked). Instead, since probability theory and research suggest it isn’t likely to improve teacher performance, I recommend that StudentsFirst discontinue its push to make standardized test scores a percentage of evaluations. You could instead advocate for evaluation systems that clearly define good teacher practices, hold teachers accountable for implementing good practices, and use student outcomes in Bayesian analysis to evaluate the validity of the defined practices. This approach would increase the likelihood of achieving your stated organizational goals.

Thanks again for engaging in such an in-depth conversation. I think more superficial correspondence often misses the nuance in these issues, and I am excited that you and I are getting the opportunity to both identify common ground and discuss our concerns.

Click here to read Part 3a of the conversation, which focuses back on the evaluation debate.

Click here to read Part 3b of the conversation, which focuses on how reformers and other educators talk about poverty.

5 responses to “Eric Lerum and I Debate Teacher Evaluation and the Role of Anti-Poverty Work (Part 2)”

David B. Cohen

August 11, 2014

Great exchange here, Ben and Eric – I’ve read every word so far, and appreciate that we’re getting past sound bites and assumptions about each other’s positions. I hope Eric takes note of the disconnect Ben noted regarding Students First advocacy. If we discount some of the internet noise and one-off tweets from frustrated people, no serious debate about these issues includes people saying poverty must be solved before we move on to improving schools. However, judging by Students First policy initiatives, its political spending, its funding sources and allies, it seems entirely fair to me to argue that SF shows little or no interest in addressing poverty up front – only through the assumption that well-educated children will eventually escape poverty. If I’m to believe that SF really cares about addressing the effects of poverty, I’d love to be shown examples where SF has advocated for policies that promote children’s health and wellbeing – which might mean supporting the kids’ families – perhaps by providing better access to health care, counseling, affordable housing, public transportation, etc. Has Students First ever put out a statement criticizing a school system or state for closing libraries or health clinics, for reducing counseling staff, etc.? We know SF opposes seniority in times of layoffs, but has SF ever proposed that a state maintain or raise taxes or fees that would provide the revenue to avoid the layoffs?

1. Ben Spielberg
  
  August 11, 2014
  
  Thanks, David, for the great comment. I look forward to hearing Eric’s response. I also would love to see more of an interest from education reformers in addressing the primary factors that could help students in low-income communities. However, as I wrote in a discussion with NYCUrbanEd in response to an earlier post: “I think it’s perfectly acceptable for organizations and/or individuals to focus their professional advocacy on 10-15% of a problem as long as they do two things: acknowledge the limitations of their advocacy in addressing the problem and avoid undermining other (often more important) efforts to ameliorate the problem.” StudentsFirst, unfortunately, does not speak honestly about the relative impact of education reforms and poverty conditions. They also actively undermine anti-poverty efforts by supporting the campaigns of politicians who make things extremely difficult for poor kids. I don’t necessarily have to see them take the positions you mention (though I think they absolutely should, and that it would be silly not to), but their stated commitment to low-income kids will continue to ring hollow for most people until they address these two issues (a point I plan to make when we return to this discussion in a later post).
  
2. Eric Lerum
  
  August 12, 2014
  
  Thanks for the comment/questions, David. To answer them directly, no, we have not advocated on issues outside of education and/or schools – there needs to be some direct connection for us to weigh in. That’s not to say that those aren’t important or that you wouldn’t find a majority of folks working for us and in ed reform generally in support of them – I believe you would. But for us, it’s important to be single-issue focused. It comes at a cost, not the least of which is being subject to criticism like this. But I honestly believe, having worked in a policy-making and implementation capacity for a decade before doing this work, that there is a need for both approaches. Diluting your message and your focus – however good the reasons may be – does just that. I think that’s been part of the problem in maintaining momentum for true change – folks who care about kids’ education (on both sides of the reform debate) are often too quick to focus on other issues or allow other concerns and priorities to come in. We focus on the policy levers we think are right for change – and we try very hard to stay disciplined about that.
  
  As for advocating for raising taxes – nope, haven’t done that either. There are lots of reasons for that – from my perspective, the main one is that I’m not an expert on tax policy. I have no idea what tax structure – what revenue model – is best for any given jurisdiction and I have no interest in figuring that out. A state or district may not need to raise taxes – it could simply be a matter of reallocating their current spending and prioritizing education more. Or maybe not. I don’t know. Instead, we do advocate strongly for spending whatever resources they choose to allocate toward education in smarter ways. That means not spending on things that don’t work. It means providing greater flexibility for spending decisions at the school level, closer to classrooms and educators using the resources. And it means concentrating funding with students who have greater needs. For example, we helped OH revise its funding formula last year to create a weighted student funding formula that gave more resources to low-income kids.
  
  One last note – since reformers are always tagged with who they associate with and what they’ve done previously – during my time in DC in the Deputy Mayor’s office – in concert with the reforms Michelle instituted at DCPS – we created an integrated coordinating structure for interagency collaboration for the first time in DC. We created a school-based mental health program and piloted a half-dozen evidence-based mental/social/emotional health programs aimed at addressing exactly the needs you’re mentioning in your post. It was an integral part of our work, and therefore the reform in DC. It never gets talked about. It didn’t get any attention. But it was real and meaningful.
  
Dave aka Mr. Math Teacher

August 11, 2014

Reblogged this on Reflections of a Second-career Math Teacher and commented:

Re-blogging and commenting in one fell swoop…
************************************************
Hi Ben. It is good to see you advocate for reliable measures of teacher performance, rather than blindly placing trust in a noisy statistical estimator such as VAM, as advocated by StudentsFirst. Your stewardship of a teaching profession whose intent is to achieve maximal student outcomes is admirable.

At the same time, I cringed when you adopted the ‘cook’ example first offered up by Eric. In doing so, I believe you inadvertently oversimplified the act of teaching to following a district-prescribed teaching method, which in the limit approaches a script.

While you did not create the ‘cook’ analogy, running with it and narrowing the task to ‘following a recipe,’ while advantageous for making your point on how to make inferences using conditional probability (Bayesian analysis), feeds directly into many reformers’ narratives that teachers simply need to follow a scripted lesson. Nothing could be further from the truth.

I’ve excerpted below the two sections of your response to your question 2) that made me cringe when I read them. In these, it appears that you conflate a school’s “instructional framework” with “[effective] teacher practices,” where the former is simply a limited, possible example of the latter depending upon the specific circumstances of any learning environment. Scripted instructional methods and/or content are anathema to teaching a diverse student body. I hope you agree.

*****************************************
2) How can we use student outcome data to evaluate whether an input-based teacher evaluation system has identified the right teaching inputs?***

Excerpt 1:
“Suppose a school adopts blended learning as its instructional framework, and suppose a teacher executes the school’s blended learning model perfectly.”

Excerpt 2:
“If we instead focus on teachers’ locus of control – effective execution of teacher practices – and use Bayesian analysis, we will more quickly discover the best teaching strategies and retain more teachers who can execute teaching strategies effectively. Judging teachers on their ability to execute inputs and using outcomes to evaluate the validity of the inputs would, over time, increase the likelihood of student success.”
*****************************************

If I were to rewrite your last sentence in excerpt 2, it would state: “Judging teachers on their ability to select dynamically from an array of effective inputs suited to the specifics of a given learning situation and using outcomes to evaluate the validity of the situationally selected inputs would, over time, increase the likelihood of student success.” In my view, rather than cook as proxy for a teacher in this portion of the debate, chef is more apropos.

Lastly, my interpretations, and concerns, may be moot if your intent was simply to illustrate how Bayesian analysis, using Eric’s example of a cook, could be used to determine the effectiveness, or ineffectiveness, of actions on outcomes. As such, I hope that you do not believe a teacher’s locus of control consists of “executing” a prescribed method or framework.

I hope all is well as you settle into your new surroundings in the Washington, DC metro area.

Dave

1. Ben Spielberg
  
  August 11, 2014
  
  Hi Dave,
  
  The points you raise are excellent ones, and I did not spend enough time clarifying the limitations of the cook analogy (I think you hit on one of the major ones). I ran with the analogy primarily to explain how Bayesian analysis works; I would never recommend a scripted curriculum. In my original post on the subject, I noted that best practices would have to be defined so that they were “sufficiently flexible to allow for different teaching styles,” but this point is probably worth making clearer. I love your rewrite of my last sentence in excerpt 2 and your suggestion of using the “chef” instead of the “cook.” In excerpt 1, I did not mean to imply that the conception of blended learning (or any other framework) would be rigid. Instead, I meant to argue solely that, if a teacher and evaluator agree that a given approach is a worthwhile one, it is the execution of that approach that should be considered primarily in the evaluation.
  
  Thanks for the thoughts; I really appreciate them and will definitely keep them in mind as Eric and I continue the conversation.
  
  Ben
  

34justice