Doug Lemov's field notes

Reflections on teaching, literacy, coaching, and practice.

05.20.13
Scoring in the Lab: Surprisingly Revealing

For the past year or two we’ve been working closely with a major teacher training organization to help them develop successful new teachers faster and more effectively. They work in more than a dozen cities and are the most data-driven organization we know of.

One of the things we recently learned is particularly fascinating for everyone whose work involves trying to develop talent.

The organization redesigned its summer training program around teaching techniques. They wanted new teachers to achieve better mastery of the key skills that would assure their success, and to see whether more persistent and effective practice could be the tool to develop those skills.

During the summer they taught and practiced a variety of TLaC skills: What to Do, Positive Framing, Strong Voice, etc. They leaned heavily towards behavioral and cultural skills on the premise that if they could get their first-year teachers to build productive learning environments, it would buy them time to develop the more complete skill set necessary for rigorous learning. Their data told them that if teachers didn’t build a vibrant and orderly classroom culture in the first six months of teaching, they’d likely never achieve success.

After the session on each technique (typically a day or two in length), they scored each candidate on the quality of their implementation. This scoring was done “in the lab”: that is, candidates were scored on how they demonstrated a technique (say, What to Do) in a series of practice activities and role plays.

At the end of the summer, candidates were also scored on a final lesson they taught to kids. Rather than measuring performance in drills, this score was based on a more realistic setting: the lesson had real content and real kids.

Then they tracked teacher data over the course of the year: principal evaluations, observations by trained neutral evaluators, and MET student survey data. (Test score data too, but it isn’t available yet.)

They correlated these initial measures with end-of-year outcomes for more than 500 teachers (a rough sketch of this step follows the list):

1) the teacher’s selection score when they were chosen for the program: a measure of professionalism, insight, and preparedness based on interviews, data such as grades, and writing

2) the teacher’s practice score on 4 or 5 specific techniques

3) the teacher’s teaching sample score on 5-7 techniques
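
As a rough illustration of that correlation step (my sketch, not the organization’s actual analysis), here is what it might look like in Python, assuming the scores live in a table with hypothetical column names:

```python
# Minimal sketch: correlate each training measure with an end-of-year outcome.
# The CSV file and all column names below are hypothetical, for illustration only.
import pandas as pd
from scipy.stats import pearsonr

teachers = pd.read_csv("teacher_scores.csv")  # one row per teacher, ~500 rows

outcome = teachers["end_of_year_evaluation"]  # e.g., a principal evaluation score
for measure in ["selection_score", "practice_score", "teaching_sample_score"]:
    r, p = pearsonr(teachers[measure], outcome)  # Pearson r and its p-value
    print(f"{measure}: r = {r:.2f}, p = {p:.4f}")
```

With roughly 500 teachers, even a modest r of .2 clears p < 0.001, which lines up with the figures in the comments below.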

“We almost never see durable effects when we do this kind of correlation,” the head of the organization told me. “It’s almost impossible to trace the bread crumbs of differences in teacher effectiveness back to anything you do in training or selection.” By the time the end of the year rolls around, it’s all washed out. “Any correlation at all is a surprise.”

But this year one of the three measures yielded a strong correlation. The strongest they’ve seen.  Want to guess which one?

No, not the selection score: no correlation at all there. And surprisingly, no correlation at all on the scrimmage score either, the real lesson with real kids. But the scores based on technical proficiency in practice, even though the setting was highly “unrealistic,” produced remarkable correlations.

Two techniques were particularly strong: What to Do and Control the Game.

First question: why those techniques above all others? My own theory is that they reward diligence. They’re pretty focused. If you decide you want to learn them and you’re decent at incorporating new ideas, you can. So what scores on those techniques reveal is someone who develops skills decently well and conceives of teaching as a craft at which you build proficiency by mastering approaches and methods, rather than by assuming a mindset, say, or a philosophy.

Second question: why were practice scores so revealing, and what does it mean? This one I want to throw out to you to reflect on. Generally we evaluate people in the most realistic settings. But those settings, it turns out, introduce all sorts of variables and extraneous factors beyond someone’s capacity to master skills. So it raises questions for me about whether “scrimmages” really are the best way to predict future performance, or whether we ought to be more serious about scoring people “in the lab,” that is, in a practice-like setting. Any thoughts?

2 Responses to “Scoring in the Lab: Surprisingly Revealing”

  1. James Cryan
    May 22, 2013 at 4:04 pm

    Doug – this is fascinating, thanks for sharing. How strong was the correlation?

    • Doug_Lemov
      May 22, 2013 at 9:07 pm

      Hi, James. The strongest correlations were .2 with p < 0.001. Not huge, but also not insignificant. I’m sure you know this, but a bit of context for others’ sake: a typical correlation between a major league hitter’s batting average from one year to the next is about .3.
