No More Marking? Our experience using Comparative Judgement

I first came across comparative judgement and Chris Wheadon’s No More Marking website about three years ago, when it was very much in its infancy. For some reason, I didn’t recognise its potential; I saw more drawbacks to collaborative assessment than benefits. What I hadn’t properly considered were the significant flaws in existing methods for assessing students’ written work – issues of bias, the illusion of objective evaluation against scoring rubrics, etc. Nor had I fully appreciated the central premise that underpins comparative judgement: that human beings deal more in relative comparisons than in absolute judgements.
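The statistical idea behind turning many of those relative comparisons into a single ranked order is usually a pairwise-comparison model such as Bradley–Terry, where each script gets a score and the probability of one script "winning" a comparison depends on the ratio of the two scores. As a rough illustration only – the data below is invented and this is not No More Marking’s actual implementation – a minimal fitting sketch looks like this:

```python
def bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry scores from (winner, loser) pairs.

    comparisons: list of (winner_index, loser_index) tuples, one per
    pairwise judgement. Returns a list of scores; higher = judged better.
    """
    scores = [1.0] * n_items
    for _ in range(iters):
        new_scores = []
        for i in range(n_items):
            wins = sum(1 for w, _ in comparisons if w == i)
            # Sum 1/(s_i + s_opponent) over every comparison involving item i
            denom = sum(1.0 / (scores[i] + scores[w if l == i else l])
                        for w, l in comparisons if i in (w, l))
            new_scores.append(wins / denom if denom else scores[i])
        total = sum(new_scores) or 1.0
        scores = [s * n_items / total for s in new_scores]  # fix the scale
    return scores

# Invented data: four essay scripts, eight pairwise judgements.
judgements = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1), (1, 2)]
scores = bradley_terry(4, judgements)
ranking = sorted(range(4), key=lambda i: -scores[i])
```

The point of the sketch is simply that no judge ever assigns a mark: the scaled scores, and hence the rank order, emerge from the accumulated comparisons.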

The significant benefits of using comparative judgement are much more obvious to me now, not just for English, but for other subject areas too. Whilst it is not without its issues (see below), the more I use comparative judgement, and the accompanying assessment tools on the ever-improving No More Marking site, the more I think it can really help increase the reliability of assessing certain pieces of work, as well as make a big difference in reducing teacher workload. There are other potential benefits too, such as opportunities for collaborative professional learning, getting better at understanding what makes a good piece of work, and quickly seeing different strengths and weaknesses across a cohort.

Most of the examples I have read about of schools using comparative judgement tend to focus on the assessment of writing – facets of effective composition, such as control, organisation and style. An obvious example is Daisy Christodoulou’s pioneering work with Chris Wheadon, which is extremely useful in showing how to use comparative judgement at scale, as well as demonstrating how it can lead to greater reliability than teacher judgement and more conventional forms of standardisation. Comparative judgement of small pieces of written work is also at the heart of the FFT’s English Proof of Progress test that many schools, including ours, are using to measure the progress of their KS3 students and to cross reference against their own emerging assessment models.

This is all well and good, and I would imagine that even comparative judgement’s staunchest detractors can see that it has something to offer the process of assessing for things like style and technical accuracy. What I think is less well documented, though, is how comparative judgement can support the assessment of other areas of the English curriculum, such as longer pieces of analytical writing. This is because it’s much harder to use comparative judgement in this way. Yet, within my department, and probably for other secondary school departments too, this is what we are interested in right now: learning how comparative judgement might support the process of marking the ever-increasing number of essays that our students are writing at both GCSE and A Level. Essays that we want to assess, and to assess reliably and quickly.

Unlike the assessment of writing, though, where it is possible to quickly read a piece of writing and make an instinctive judgement about its relative quality and accuracy, I think that analytical responses are much more problematic. For a start, judges must be well versed in the text or texts being written about. This is not an insurmountable hurdle, since many teachers in a department teach the same text, and one would hope that most English teachers are au fait enough with texts on a GCSE syllabus to pass judgement on a piece of analysis. That said, knowledge of the text and of the focus of the analysis – such as the extent to which contextual links play a role – is much more of a factor in the collaborative assessment of reading than of writing, which makes it harder to enlist additional judges and therefore more time-consuming to make comparative judgements.

Trialling Comparative Judgement

We have now used comparative judgement in the English department on three separate occasions, most recently to assess a year 11 literature mock question on Dr Jekyll and Mr Hyde. Whereas last year we focused on experimenting with the process and getting used to marking in such a different way, this year we have increased our use of comparative judgement with the longer-term aim of making it a key component of our overall assessment portfolio. Rather than blindly replacing the old with the new, however, which is certainly tempting when you think you can see the benefits from the outset, we are mindful that we need to tread carefully.

As a result, we have set up a controlled trial to try to get some objective feedback to check against our hunches. The trial essentially consists of splitting our GCSE cohort into two groups. All students will sit 5 literature assessments throughout the course of the year, with one group having their assessments marked using comparative judgement, and the other through the more traditional method of applying a mark scheme followed by a process of moderation. Using a combination of quantitative and qualitative methods, we hope to ascertain the effect, if any, of using comparative judgement on student learning, but also, more importantly, its impact on teacher workload. Admittedly, such evaluation is flawed, but we hope that it will at least make us better informed when we come to make a decision later on about whether to adopt comparative judgement more widely.
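The split itself is easy to do reproducibly. As a minimal sketch – the student IDs are hypothetical and any real trial would want to stratify by prior attainment – a seeded random assignment keeps the two arms equal in size and auditable:

```python
import random

def assign_arms(students, seed=2024):
    """Randomly split a cohort into two equal-sized trial arms.

    A fixed seed makes the split reproducible, so the same assignment
    can be regenerated for each of the assessments during the year.
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"comparative_judgement": sorted(shuffled[:half]),
            "traditional_marking": sorted(shuffled[half:])}

# Hypothetical cohort of 120 student IDs.
cohort = [f"S{i:03d}" for i in range(120)]
arms = assign_arms(cohort)
```

Recording the seed alongside the results means the grouping can be checked later if any of the comparisons between the two arms look suspicious.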

Issues and solutions

The impact of poor handwriting on grade scores is not a new phenomenon. I remember when I was a GCSE exam marker: I would much prefer reading legible scripts and curse the ones I had to spend time deciphering. Obviously, I tried not to let students’ poor handwriting get in the way of making my judgements, but the reality was it probably did, even if the only bias was that the additional time meant I saw flaws more clearly. When you are marking your own students’ essays – as with the usual way we mark our internal assessments – you get used to those students with tricky handwriting, and learn how to decipher their meaning, perhaps unconsciously giving them the benefit of the doubt because you know what they meant.

It’s even harder to avoid handwriting bias with comparative judgement, particularly when you are encouraged to make quick judgements and you are reading lots of scanned photocopied scripts off a computer screen. Poor handwriting was clearly a factor behind some of the anomalous results from our recent session. Several teachers noted how hard it was to properly read some essays, and a deeper examination of the worst offenders showed that the mean time of judgements on them was much longer than on those that were easier on the eye. Most of these essays also scored badly. Conversely, almost all the best essays had the neatest, most legible pen work. On closer inspection, however, a significant number were clearly in the wrong band.
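This kind of check is straightforward to automate if the judging platform exports decision times per comparison. As an illustrative sketch only – the timings below are invented, and the export format will vary – scripts whose median judgement time sits well above the cohort norm can be flagged for a second look:

```python
from statistics import median

def flag_slow_scripts(times_by_script, ratio=1.5):
    """Return script IDs whose median judgement time is well above the norm.

    times_by_script maps a script ID to the seconds each judge spent on a
    comparison involving that script. A script is flagged when its median
    time exceeds `ratio` times the overall median across all judgements.
    """
    all_times = [t for times in times_by_script.values() for t in times]
    overall = median(all_times)
    return sorted(script for script, times in times_by_script.items()
                  if median(times) > ratio * overall)

# Invented timings (seconds per judgement); script "C" is the slow outlier.
timings = {"A": [8, 9, 10], "B": [9, 10, 11],
           "C": [25, 30, 28], "D": [10, 9, 8]}
flagged = flag_slow_scripts(timings)
```

Flagged scripts are exactly the ones worth re-reading on paper, since long decision times are a plausible proxy for illegibility rather than quality.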

It would be wrong to suggest that all the anomalies we found after interrogating the ranked order of essays were entirely down to issues of handwriting. There were a number of administrative failures, such as students writing on the wrong part of barcoded paper and some of the scans uploaded back to front, which gave the impression that some students had not written very much at all, or only in fragments. These are technical issues, and can easily be ironed out the more we get to grips with the approach. That is the whole point of taking things slowly and learning from trial and error.

Aside from issues of handwriting and administration, a number of other anomalies remained. Some of these apparent errors turned out to be completely right: students that teachers had expected to score highly had not written a good essay, and students who had not really been expected to gain high marks did much better than anticipated. With our usual approach – teachers marking their own classes with some subsequent moderation – I suspect that some of these surprising results would not have been apparent. Other anomalies were just plain wrong, which I would love to illustrate, but our uploaded scripts are no longer available on the new No More Marking website. We still haven’t got to the bottom of why a significant number of these scripts were placed in completely the wrong order or bands. Some error is inevitable, of course, but the question is probably more about whether comparative judgement has created these errors, or whether they were always there and comparative judgement has just brought them to light.

I hope to be able to answer this question as the year goes on.

Next steps:

  • Brief teachers on issues of bias with poor handwriting and halo effect of neat work
  • Emphasise to students the importance of taking care with their handwriting
  • Standardised instructions and conditions for all students taking the tests
  • Teacher standardisation session using exemplar work from previous session
  • Clearer focus on the criteria for judgements
  • Previous responses used as anchors in the judging session
  • Divide up marking sessions: 1) an initial collaborative judging to iron out issues, identify interesting or salient features of students’ work and to check teacher reliability, etc. 2) Independent judging session/s another time to avoid issues of fatigue and cognitive overload
  • Investigate significant anomalies and identify possible factors influencing judgements
  • Use insights into student work to inform subsequent teaching

Disciplined enquiry, or how to get better at getting better


How do you know what to do to improve your teaching? And if you can identify what you need to do to get better, how do you know whether what you are doing to try and improve is actually making a difference where it really matters: in developing your students’ learning?

I think there are probably five main sources available to teachers to help them identify areas for their improvement. These are the data on their students’ outcomes, feedback from their colleagues, feedback from their students, research evidence into what works and where, and, finally, their reflections about their practice.

Each of these sources can be extremely useful, providing teachers with valuable insights into where they might need to focus. Equally, they can all be very unhelpful, giving unreliable feedback on areas of strength and weakness, particularly where limitations and nuances are not fully understood, or where potential improvement tools are used as performance measures.

Perhaps the best approach is to take a number of these sources of feedback together, increasing the likelihood of identifying genuine areas for improvement. In subsequent posts, I hope to outline a framework that harnesses these feedback mechanisms into a clear and systematic structure, but for now I want to focus on exploring just one means of self-improvement: getting better at being you.

In many respects, you are both the best source of feedback and the worst source of feedback; you can be wise and foolish in equal measure! The problem is that, whilst you are undoubtedly the one who spends the most time with your students and the one who thinks the most carefully about how to help them improve, you are also extremely prone to bias and flawed thinking, which can make it hard for you to trust your judgements, especially in relation to developing your own practice.

Others have written extensively about human fallibility and the dangers of trusting instinct. Daniel Kahneman’s Thinking, Fast and Slow, David Didau’s What If Everything You Knew About Education Was Wrong? and David McRaney’s You Are Not So Smart all provide excellent insights into how we humans routinely get things wrong. It is clear, then, that we need to understand and respect our cognitive limitations and avoid thinking we know what works just because it feels right. Instinct is not enough. That said, I believe we can be useful sources of feedback in relation to improving our own teaching, particularly if we can learn how to reduce the impact of our biases and can get better at being more objective.

What is disciplined enquiry?

Honing the skills of restrained reflection is the hallmark of a disciplined enquirer, and disciplined enquiry is what I have come to think is probably the best way we can grow and develop as a profession. Like many terms in education, disciplined enquiry means lots of different things to lots of different people. For me, it represents the intersection between the science and the craft of teaching, and involves a systematic approach that encourages teachers to ‘think hard’ about their improvement and to make use of the best available evidence to inform their decision-making. My definition of a disciplined enquirer tries to capture this complexity:

A disciplined enquirer draws upon internal and external experience – they operate as both subject and object in relation to improving their own practice. Through a systematic framework a disciplined enquirer develops the ability to limit the impact of bias, whilst learning how to become more attuned to interpreting the complexity of the classroom, such as appreciating the role of emotions, the impact of actions and the nature of relationships. Over time, and through deliberate noticing, they become increasingly sensitive to patterns of behaviour, learning how to react better in the moment and how to make better decisions in the future.

Understanding how we make decisions

Perhaps the first step to becoming a disciplined enquirer is to recognise the nature of decision-making itself. Kahneman’s model of system one and system two thinking is instructive here. System one thinking describes the way we use mental shortcuts to quickly make sense of complex phenomena and to give us the appearance of coherence and control, whereas the system two model uses a more methodical and analytical approach to decision-making, where we take our time to review and weigh up choices. The trade-off between the two modes is time and effort. The result is that busy teachers come to rely more and more on quick, instinctive system one thinking over the slower, more deliberate system two model, which can lead to mistakes.

As well as understanding how we make decisions and how we react to given situations, a disciplined enquirer needs to appreciate the way that we gain insights in the first place, since it is the opening up of new ways of seeing that we are ultimately looking for in order to help us improve our practice. It seems to me that if we know the conditions under which we are more likely to learn something new, whether about our teaching, our students’ learning or any other aspect of the classroom environment, then we are better able to take steps to recreate these conditions and harness them when they manifest.

In Seeing What Others Don’t, Gary Klein uses a triple-path model to illustrate the ways in which we commonly reach such new insights. Klein’s model challenges the widely held notion of eureka moments, where inspiration or epiphany follows long periods of gestation. From studying decision-making in naturalistic conditions, Klein suggests there are three main triggers that typically lead to new insights – contradiction, connection, and creative desperation. These triggers, working on their own or in combination, shift or supplant the existing anchors that we ordinarily rely upon to make decisions. An anchor is a belief or story that gives us a sense of coherence and informs the decisions that we make, often without us even realising.


In some respects, Klein’s anchors resemble the idea of mental shortcuts, or heuristics, in Kahneman’s model of system one thinking. The anchor and the heuristic both guide action, usually subconsciously, and both can prevent us from seeing things clearly. Whilst we need heuristics (or anchors) to make our daily lives manageable – getting from A to B, for instance, without endlessly checking the route – for more complex decision making, such as that which constitutes classroom teaching, they can often lead us to make mistakes or develop false notions of what works. Disciplined enquiry should therefore seek to find ways to engage system two thinking, and to consciously trigger the cultivation of better anchors to help us improve our decision-making.

There are a number of steps that can help achieve this end. The diagram below gives an idea of what this might look like in practice. None of the suggestions are a panacea – it is surprisingly difficult to shift our thinking in relation to our deeply held values and beliefs – but they are an attempt to provide some sense of how we could get better at not only making decisions, but also of being aware of the reasons why we are making those decisions in the first place. The goal for disciplined enquiry is, then, to try to find ways to override system one intuition, and activate system two consideration.


Identifying inconsistency

One example Klein uses to illustrate the trigger of identifying inconsistency is the case of an American police officer who, whilst following a new car, is struck by the strange behaviour of the man in the passenger seat. The car is otherwise being driven normally, but the officer notices the passenger appear to stub a cigarette out on the seat. What he witnesses is at odds with his understanding of what people normally do when riding as passengers in new cars. As a result, he decides to pull the car over – an action that leads to an arrest when it turns out that the car has in fact been stolen.

There are several ways a disciplined enquirer can set out to deliberately create this kind of inconsistency of thought – the sort of cognitive dissonance that might lead to a useful new insight into an aspect of pedagogy. One obvious way is to actively seek out alternative views or dissenting voices. Rather than always being surrounded by likeminded opinions, whether online or in the staffroom, teachers wishing to improve their practice should spend time listening to the views of those with contrary positions. This approach helps to avoid groupthink and fosters the kind of self-questioning that might shed light on an area of practice previously hidden.

Spotting coincidence

Unlike the trigger of identifying inconsistency, the trigger of spotting coincidence is about looking for similarities and patterns between phenomena and using these revealed relationships to build new insights. One of Klein’s examples of how spotting coincidence can change understanding and lead to meaningful changes in practice involves the American physician Michael Gottlieb. After noticing connections between the symptoms of a number of his homosexual patients in the early 1980s, Gottlieb began to realise that what he was dealing with was something very different from, and far more important than, anything he had previously experienced. His insights led him to publish the first announcement of the AIDS epidemic.

There are two crucial aspects of this story in respect of disciplined enquiry. The first is that Gottlieb’s insight didn’t happen overnight. It was a slow process over a long period of time, involving the gradual noticing of patterns that could not initially be attributed to something already known. Too often we teachers try to make too many changes to our practice too quickly, without understanding or assessing their impact. The second important point is how Gottlieb retained his focus – he didn’t just notice something once, think it was interesting and then move on; instead he relentlessly pursued an emerging pattern, consciously noting down his observations, until he could formulate them into something more concrete and usable.

One of the key things that leads to developing new insights is thus a combination of time and deliberate attention: being alive to the possibility that two or three things that have something in common may lead to something more meaningful, or they may not. As the name suggests, disciplined enquiry involves disciplined focus, something so often overlooked in education in the scramble to share untested best practice. It is far better to isolate one or two variables in the classroom and look to notice their impact on student learning, than to proceed on a whim.

Escaping an impasse

Perhaps the most poignant story in Klein’s book is that of a group of smokejumpers who were parachuted into the hills of Montana in 1949 in an attempt to control a large forest fire that was spreading quickly. The firefighters were soon caught in the fire themselves as it moved swiftly up the grassy hillside. The men tried to outrun the flames, but sadly only two of the original 15 made it to the top. The other 13 could not run fast enough and were consumed by the onrushing fire.

One of the two men to survive was Wagner Dodge who, like the others, initially tried to outrun the flames, but, unlike the others, realised that this wasn’t going to work and that unless he did something different he would die. His quick-thinking insight was to set fire to a patch of grass ahead of him, thus creating an area of safety where he could stand with the fire deprived of its fuel. In a moment of literal life and death decision-making, Dodge had arrived at a creative solution that had unfortunately passed his friends by. Out of desperation, Dodge had discarded his intuition (to run) and thought hard about a radical solution (to cut off the fire’s fuel source).

Obviously, as important as teaching is, it is not really a profession that rests on life or death decisions. That said, there are aspects of the story of the Montana smokejumpers, in particular the counterintuitive actions of Wagner Dodge, that a disciplined enquirer can learn from in an effort to increase their chances of generating new insights. Foremost amongst those lessons is the way that a fixed condition – in this case the fire sweeping up the hillside – forced Dodge to focus on the other variables open to him. It may be that self-imposed limitations, such as deadlines, parameters for recording reflections or routines of practice, rather than stifling thinking, may actually encourage new ways of seeing. Being forced to consider all possibilities, including rejecting existing ideas and beliefs, could enhance our ability to make greater sense of student interaction or learning. After all, the famous Pomodoro Technique is largely predicated on the notion that short bursts of focused, time-bound thinking produce much better results than longer, drawn-out periods of study.

Disciplined enquiry is not easy and does make demands on what is already a very demanding job. That said, if there is a framework and culture that supports disciplined enquiry and makes the systematic study of one or two areas of improvement routine, then I think it could be a powerful means of both individual teacher and whole school improvement. What this framework might look like will be the subject of my next post.

‘Without Contraries there is no progression’: or 7 principles for pairing words with images


William Blake was a visionary poet. As a four-year-old he claimed to have seen God at his window and angels walking around the fields of London. As an adult visions continued to come to Blake, notably when he claimed to see the ‘world in a grain of sand / And a Heaven in a wild flower’. Blake’s visions inspired his poetic vision, particularly his illustrated children’s books, Songs of Innocence and Experience. In these deceptively child-like poems, Blake saw through the systems of corruption behind social injustice and spoke for the poor and the weak at the margins of an increasingly mechanised world.

But there is another sense in which Blake can be considered visionary – in his innovative etched relief printing technique, which combined powerful words with vivid images. Whilst many are familiar with lines from London, The Chimney Sweeper and The Garden of Love, fewer have probably seen the accompanying illustrations. Those who have not contemplated the totality of Blake’s inked copperplates, painstakingly coloured by hand, have not really experienced the wholeness of his work. For in his beautiful marriage of poetry and art, Blake enhanced the meanings of his writings through his illustrations whilst appealing to our innate pleasure in combining sensory experience, centuries before dual coding was even a thing.

One wonders, then, what Blake would make of technology today, and the array of software that, in theory, makes it possible to do in minutes what it would have taken him countless nights straining under candlelight to produce. I am, of course, referring to the possibilities afforded to the modern-day teacher to combine words and images to convey meaning, and I say in theory because in practice it seems to me that not enough of us know how to properly harness the benefit of pairing words with images to help improve our students’ learning. Whether through ignorance (I doubt dual coding is included in a lot of training) or lack of basic technical know-how (almost certainly not included), many teachers risk missing out on the additional boost to learning that adding graphics to text or audio explanations offers.

Now I know there are some who question the value of using software tools like PowerPoint in the first place, wary of how they can stifle teaching, or add an unnecessary burden to an already burdensome workload. These concerns are valid and fair. My interest, however, is not so much about the efficacy of technology use per se – I use PowerPoint if I feel it will enhance my students’ learning and avoid it if I think there are better, more economical means. This post is more about how to take advantage of the way our minds take in information, regardless of whether we teach through the medium of slides, lesson handouts or simply writing on a whiteboard and talking to our students.

What Blake intuited with his etched relief printing method, we now know a bit more about through scientific study: namely, that we can encode information from two different sources or channels at the same time. Handled correctly, rather than compromising our understanding or overloading our limited working memory capacity, the simultaneous presentation of visual information with verbal information (the spoken word or the written word converted into sound) can significantly increase our ability to learn. Such an approach, known as the modality principle, increases the chances of integrating new information into existing schema by exploiting natural cognitive processes. But even if it is the written words that are paired with images – and thus processed through the same channel – research demonstrates that done the right way, combining them still gives a significant boost to learning.

The phrases ‘handled correctly’ and ‘done the right way’ are important. Like many lessons learned from research and applied in the classroom, it is not quite as simple as it often appears to apply theory to practice. In this case, adding an image to a slide or putting a graphic on a handout sounds like something everybody does already, doesn’t it? The size, placement and type of graphic, however, really matters, as does the moment in which it is deployed, the purpose of the teaching point itself and, of course, the make-up of the learners and their levels of prior knowledge. It is also important to be mindful about what we mean by words, as to our minds the spoken word (audio) and the written word (text) are not the same thing, and different rules apply to each. Too often these nuances are missed, and the result is poorly designed materials that not only run the risk of looking naff, but also fail to improve student learning or, worse still, have a negative impact on understanding.

Here is an example of where the pairing of words and images was probably not all that helpful in terms of improving student understanding. It might be entertaining, but it is unlikely to make a difference to students’ understanding of income tax, or of Einstein for that matter!


Principles for combining words and images in teaching

Below are 7 principles that I think should help teachers make better use of visual aids in their teaching. Some are concerned with the arrangement of images, whilst others focus more on the types of graphics used and how students can engage with them in different ways to maximise their learning. It is worth pointing out that whilst I have tried to give enough detail to help teachers think and plan their use of visuals, I have inevitably still had to omit a great deal of nuance to prevent the post from becoming too unwieldy!

  1. Pair words and images

Generally speaking, you can improve your students’ learning by adding visuals to written text that you display on slides, handouts or on the whiteboard. This principle is particularly effective for students with low levels of prior knowledge, who benefit the most from the combination of words and visuals to build coherent mental models.

  2. Choose images carefully

Just choosing any old graphic is unlikely to boost student learning. In fact, in many cases a poorly selected visual accompaniment to a verbal explanation might actually depress learning. The main point here is to avoid: unnecessary visuals that increase extraneous demands on students’ limited working memory capacity or that distract them from their learning (e.g. school logos); well-intentioned but irrelevant pictures (e.g. an actor when explaining tragedy); and images designed simply to entertain or amuse.

  3. Keep images near text

It is generally better to place your visuals next to, or as near to, any written text as possible, and to avoid the situation where students have to wait for the next slide, or turn over the page, to find the graphic that goes with the words you are explaining. This is not only incredibly frustrating, but also significantly increases the demands placed on working memory.

  4. Keep visuals simple

The temptation when selecting images is to find the most impressive or realistic visual example possible. However, studies repeatedly show that simple visuals are best at communicating the teaching point clearly and concisely. Implementing this principle may be made more difficult by the fact that students often favour elaborate and realistic visuals, even though these do not necessarily improve their learning. Unless the point you are trying to communicate demands it, though, go for simplicity over complexity, such as by opting for a flat two-dimensional image over a more elaborate 3D option.

  5. Strip out the text

In some circumstances, it is better just to avoid any words on or around an image and instead simply explain the visuals verbally. This is particularly true when the images are of concepts that are pretty self-explanatory, such as introducing a picture of a new vocabulary item. In these instances, having to read additional words whilst listening to an explanation puts an additional burden on the same visual-spatial channel. The same is generally true of other visuals and graphics like diagrams, where it is best to reduce any accompanying text. In situations where the diagram or explanation is lengthy or complex, however, the addition of some text is helpful in reducing cognitive load by providing memory prompts.

  6. Get students drawing

Studies have found a significant boost to learning when students are asked to produce drawings whilst reading textual information – much more so than with other, more common means of engagement such as writing summaries. It is important to ensure drawings are accurate, either by providing regular feedback on them or by showing what an accurate completed picture looks like in the first place. Cognitive load can be further eased by providing partially completed drawings, such as the opening stages of a timeline or some of the connections in a causal relationship diagram. This method is particularly effective when the learning relies on complex problem solving.

  1. Show the connections


Use relevant visuals to help illustrate relationships and connections in your lesson material. The table below highlights the four main types of explanatory visual, and gives examples of what each might look like in practice.

Four types of explanatory visuals (from: Evidence-Based Training Methods)

  • Organizational – illustrates qualitative relationships in the content. Examples: a tree diagram; concept maps.
  • Relational – summarizes quantitative data. Examples: a pie chart; colour on a map to indicate temperatures.
  • Transformational – depicts change in time or space. Examples: an animation of how equipment works; a series of line drawings with arrows to illustrate blood flow.
  • Interpretative – makes abstract ideas concrete or represents principles. Examples: a simulation of feature changes caused by gene alterations; an animation of molecular movement with changes in temperature.

In his 2009 paper ‘Research-Based Principles for Designing Multimedia Instruction’, Richard E. Mayer notes that ‘people learn more deeply from words and graphics than from words alone.’ Mayer’s assertion is obviously concerned with optimising the way the mind takes in and stores new information in an effort to improve learning, in our case for the students in our classes. Whilst Blake’s intentions for combining words with images were in many respects very different – more aesthetic and more political – he too understood the importance of harnessing the fullness of our sensory inputs to learn new things.

“If the doors of perception were cleansed every thing would appear to man as it is, Infinite. For man has closed himself up, till he sees all things thro’ narrow chinks of his cavern.”

William Blake, The Marriage of Heaven and Hell

Thanks for reading.


Clark, R. (2015) Evidence-Based Training Methods

Clark, R., F. Nguyen and J. Sweller (2006) Efficiency in Learning: Evidence-Based Guidelines to Manage Cognitive Load

Mayer, R.E. (2009) Research-Based Principles for Designing Multimedia Instruction


What’s in a word?


Ever picked up a class, in say year 10 or year 11, and been surprised that they don’t seem to know some pretty fundamental terms relating to your subject? For an English teacher, those words might include ‘metaphor’, ‘simile’, ‘juxtaposition’ or perhaps even ‘fronted adverbial’ – the kinds of subject-specific terminology you would hope, nay expect, students to have pretty much nailed down by the time they are 14, 15 or 16 years old. I have experienced this many times in different schools.

This is most definitely not about bashing KS3 or KS2 colleagues. I’m fairly sure that when I taught more year 7 and year 8 classes, I encountered the same thing, and I’m guessing that a year 5 or year 6 teacher probably experiences something similar too. Likewise, it wouldn’t surprise me if the teachers who inherit my classes find themselves having to explain the same core terminology again that I thought I had successfully taught the previous year. It’s a seemingly endless cycle.

But why does this happen? Whilst the forgetting curve is inevitably partly to blame, I suspect a fuller explanation lies in the way most of us approach teaching vocabulary and in some of the assumptions we routinely make about what our students know. This is possibly more prevalent in the secondary setting, where, depending on the subject you teach, it can be very difficult to have an accurate grasp of your students’ vocabulary levels. Yet it is crucial that we do, because it is those tier three words that carry the fundamental ideas and concepts upon which other knowledge and skills are then built.

It is not hard to understand why we often make assumptions about students’ vocabulary knowledge, or any other kind of knowledge for that matter. For a start, words like metaphor and simile are supposedly covered at KS1. My 7-year-old daughter, for instance, can provide an example of a simile and a rudimentary definition because she has recently been doing poetry in class. But I think there’s a big difference between covering a word in a unit that is then subsequently assessed, and really knowing that word more fully, including its intricate web of conceptual links and associations. Coverage is quite clearly very different from learning, but the two can be too easily conflated.

The other probable reason why we teachers assume too much about our students’ knowledge base is that we are usually so reliant on very unreliable proxies to make our inferences. Too often, assessments of students’ linguistic fidelity are bound up in rubrics designed to assess more generic skills. As a result, we can fall into the trap of assuming that a level 4 in this, or 85% in that, corresponds to a certain level of understanding of the subject more generally, including all its attendant terminology. The upshot is that we can end up building on sand if we assume that our students are secure with core subject concepts (and the words that encapsulate them) when they are not.

If a teacher realises that her class doesn’t really know what an image or imagery is – as I have found every year of teaching question two of the iGCSE! – she will decide to teach that concept and how to apply it correctly in context. The problem is that by the time the next year rolls around, that understanding has often disappeared, and a new teacher comes along and finds herself in the same position. What I think is needed is a clearly-defined vocabulary programme, detailing exactly what subject-specific terminology students are expected to have learnt and by what stage. Without such a system, and a reliable form of assessment to underpin it, we run the risk of continuing to make unfounded assumptions about what students know and of continually wasting our time re-teaching the same thing.

As part of our school’s initial attempt to construct such a coherent, school-wide vocabulary sequence, I have been carrying out some exploratory work with our current year 7. I have designed some core knowledge quizzes aimed at getting a more accurate picture of what specifically our students do and do not know across a range of subjects, including history, art, geography, music and religion. A key component of these quizzes is vocabulary, so there are questions asking for definitions of foundational terms like ‘primary colour’ or ‘portrait’ in art, ‘monarch’ or ‘civilisation’ in history, and ‘island’ or ‘hemisphere’ in geography.

It is early days at the moment, and I may blog in more detail in the future, but for now I want to share with you one small insight gleaned from the process so far, which in many respects perfectly illustrates some of the issues I have touched upon above. The definition in question is ‘island’, a word I’m sure most would expect the majority of 11 and 12 year olds to be able to define. Everyone knows what an island is, right? So you might think, yet the students’ responses seem to suggest otherwise. As you can see from the definitions supplied by one year 7 tutor group, we might need to seriously challenge our assumptions about our students and their levels of understanding, and think more carefully about what it means to truly know a word.

Correct responses:

  • ‘Land surrounded by water’
  • ‘Land that is surrounded by sea on all four sides’
  • ‘It is a bit of land surrounded by water’
  • ‘A bit of land surrounded by water’
  • ‘A piece of land surrounded by water’
  • ‘An area surrounded by a sea of ocean’
  • ‘A piece of land surrounded by water’
  • ‘A piece of land surrounded by water’

Mostly correct responses:

  • ‘A large or small part of land not connected to anything’
  • ‘A broken piece of land’
  • ‘A small place where, people or animals may live, but is surrounded by the ocean’
  • ‘A part of land away from a country’
  • ‘A big or small place covered with sand and surrounded by the ocean’

Not really correct responses:

  • ‘Land, flat land’
  • ‘An abandoned land’
  • ‘Some small land on the sea’
  • ‘A small place in the ocean’
  • ‘A piece of land covered by water’
  • ‘Is a place in a country near / on a beach’
  • ‘A tiny bit of land’
  • ‘Surface’
  • ‘An island is a cut off region of land that has little civilisation’
  • ‘A piece of land’
  • ‘A place’
  • ‘A big bit of rock formed after volcanic eruption under water’

For me, the crucial aspect of understanding an island is that it is a piece of land surrounded by water on all sides. Whilst we may quibble about other aspects of a successful definition, these strike me as the defining features of what makes an island an island, particularly for children at this age. Based on this admittedly loose definition, 9 students out of 30 got the correct answer, 5 got the benefit of the doubt, whilst 12 didn’t really manage to define an island successfully. 4 students did not provide an answer at all. What this means is that, even if we say that 2 of the students who failed to answer could define the word island correctly if asked more directly, around 50% of the class still either cannot articulate their understanding of what an island is, or harbour some pretty serious misconceptions about it. Neither of these situations is desirable.
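For those who like to see the working, the tally above can be sanity-checked with a few lines of code. This is just a sketch of the arithmetic: the category counts are the ones reported in this post, and subtracting two of the non-responders mirrors the benefit-of-the-doubt assumption made in the paragraph.

```python
# Tally of one year 7 tutor group's responses to 'define the term island'
counts = {
    "correct": 9,
    "mostly correct": 5,
    "not really correct": 12,
    "no answer": 4,
}

total = sum(counts.values())
assert total == 30  # the whole tutor group is accounted for

# Assume 2 of the 4 non-responders could define the word if asked directly
insecure = counts["not really correct"] + counts["no answer"] - 2
proportion = insecure / total

print(f"{insecure}/{total} students insecure on 'island' (~{proportion:.0%})")
# → 14/30 students insecure on 'island' (~47%)
```

So even on this generous reading, roughly half the class cannot yet define the word.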

Now, you could argue that this doesn’t really matter, or that were I to probe the students more thoroughly with more specific questions than simply, ‘define the term island’, the results would be different. Maybe; maybe not. If, however, we park that for the moment and look at some of the student responses in more detail we begin to see a couple of important, and I would argue potentially damaging, misconceptions about the nature of islands. The first of these seems to be that islands are ‘abandoned’ places that are ‘cut off’ from civilisation. Perhaps adventure stories in films and books lead to this particular misunderstanding. Then there is the suggestion that islands are ‘tiny bits of land’ or ‘small places’. Again, popular depictions of island settings may create this association.

Whilst in many cases islands are indeed small and remote places cut off from people and more recognisable signs of society, there are also plenty of very good examples of islands that are not, such as the very place in which we live. If you stop and think about the implications of this for a minute, you realise they are potentially quite significant. For instance, if you don’t understand that islands are surrounded by water, you might not fully appreciate the challenges and opportunities this might pose for a group of people who live on one. Likewise, without a foundational grasp of the nature of islands, you may not fully understand the importance of Britain’s island status, in relation to both its history and its present, such as its ongoing relationship with the rest of the European continent.

It is for these reasons and those that I have written about before that I think the priority for developing student vocabulary, particularly in our school context, is improving the quality of teaching in relation to tier three, not tier two, words, at least in the initial stages of building a school-wide approach. When you consider what it really means to know a word you see how important it is to teach subject-specific vocabulary as well as you possibly can at the first time of asking, so as to try and avoid having to repeat the process every year ad nauseam. Obviously, improving students’ wider academic vocabularies is extremely important too, and it may well be possible to do this simultaneously given the time and support necessary to do it justice. Perhaps, though, it is better to leverage the effects of solid, consistent tier three terminology teaching first and then scale up to address the wider language gap later on.

After all, ‘no man is an island’.

Evaluating CPD: hard but not impossible


At a time of shrinking budgets, there is a need for reliable formative and summative feedback about the efficacy of professional learning. We cannot simply assume that a school’s CPD programme, however well intentioned or well received, necessarily deserves to continue. If it is not having an impact on student outcomes, whether in the narrowest sense of achievement or more broadly across other competencies, then it has, at the very least, to be called into question. It may well be that other forms of professional learning are more effective, or perhaps, as some would argue, that no CPD at all would have more impact, freeing up busy teachers to plan and mark better. If you have no way of knowing, then you may be wasting valuable time and resources on the wrong thing.

The problem is that whilst we may well agree that evaluating the impact of training on student outcomes is important, it is far from straightforward to measure this impact in a robust and efficient way. I know how hard it is because we have spent the past few years trying to figure out how to do evaluation better. I don’t think we have cracked it – far from it – but with the support of organisations like the fantastic Teacher Development Trust, we are getting closer to understanding what successful evaluation looks like and how to align our systems and practices so they are congruent with the content and aims of our professional learning.

There are a number of theoretical models for evaluating professional development, all of which have benefits and flaws. Kirkpatrick’s (1959, 1977, 1978) model from the world of business offers four types of evaluation. Despite criticisms of it, such as its failure to consider the wider cultural factors of the organisation and its assumptions about causality between the levels, it provides a useful framework for thinking about what should go into effective evaluation. Likewise, although it runs counter to what we know about effective CPD, namely having a clear sense of intended outcomes, Scriven’s (1972) notion of goal-free evaluation also has its place, making room within the evaluation process for identifying a range of impact outcomes, whether originally intended or not.

My favourite model for evaluating CPD, however, is Guskey’s (2000) hierarchy of five levels of impact. In this model the five levels are arranged hierarchically, each one increasing in complexity. The final two levels – including the last one, which looks at the impact of professional learning on student outcomes – are the hardest to achieve, which no doubt explains why so many schools, including my own, have not done them terribly well. In many respects Guskey’s model bears similarities to Kirkpatrick’s framework, but crucially it adds an extra level of evaluation, one that looks at impact at an organisational level, which is useful for trying to make sure that the aims of a school’s CPD programme are not undermined elsewhere by its culture or systems.

In the rest of this post, I will briefly outline each of the five levels in Guskey’s model and then explain what practices we are currently undertaking within each to improve the evaluation of our professional development. This is very much still a work in progress, so any feedback received would help us make further refinements moving forward.

  1. Reaction quality – evaluates how staff feel about the quality of their professional learning

In many respects, this area of evaluation is quite soft: basing evaluation on whether participants liked or disliked specific activities, rather than objectively evaluating their impact where it counts, has been rightfully challenged as weak. I do think, however, that it is still important to include some element of qualitative staff feedback within the overall evaluation process, particularly if suggestions can be acted upon easily to increase buy-in.

To this end, we send out a reaction quality survey after every short-form CPD session. It has only two sections. The first asks participants to evaluate the extent to which the session objectives have been met, whilst the second invites more ‘goal-free’ reaction feedback by asking what was learned and what participants would like to see included or amended in future sessions.


  2. Learning evaluation – measures knowledge, skills and attitudes acquired through training

This aspect of evaluation is linked in with our appraisal process. I have already written about the changes to our appraisal this year, which have gone down well so far, with enhancements to follow after feedback. Essentially, all teachers, classroom support staff and non-teaching staff identify two main goals: the first is a subject (or department/role) target orientated towards developing a specific aspect of pedagogy, practice or knowledge, whilst the second is a learning question, allowing for enquiries into the more nebulous and complex aspects of improvement that lie at the heart of our daily practice.

The subject goal is supported by departments or teams during their fortnightly subject CPD time. For instance, a couple of science teachers seeking to improve their modelling might work together using IRIS lesson observation equipment, or a group of religious studies teachers might run seminars during department pedagogy time on the knowledge required to teach their new specifications. The enquiry question is supported by the wider CPD programme, the bulk of which takes place in learning communities that are selected during the appraisal process and aim to provide the necessary input and ongoing support.

The evaluation itself comes in two parts. The first is a professional audit, which we instigated for the first time last year and will revisit in the summer term to see the extent to which knowledge has changed. The second part is built into the appraisal process, where, through a combination of a learning journal, voluntary targeted observations and professional dialogue, colleagues can demonstrate the new knowledge and insights they have acquired in their department training or through participation in their learning community.

The model is based upon a number of sources, including the helpful lesson study enquiry cycle put together by the Teacher Development Trust. Both interim and annual appraisals provide opportunities for meaningful discussions about individual development, as well as for the evaluation of individual and aggregated professional learning. This is not so much about holding individuals to account, but rather about fostering an ethos of continual improvement and gaining insight into which training adds value and which doesn’t.

  3. Organisational evaluation – assesses the support and ethos of the organisation

This third level of evaluation in Guskey’s model represents the missing part of Kirkpatrick’s framework – evaluation of school ethos and support for CPD. As Guskey observes, it would be ridiculous for an individual teacher or group of teachers to receive high-quality training that they understand in theory and agree with in principle, but cannot put into practice because of ‘organizational practices that are incompatible with implementation efforts’.

The problem, however, with assessing the support and ethos across a whole school, and evaluating whether it is aligned with the objectives and content of the professional learning programme, is that it requires an objective, external voice – the ‘critical friend’ cited in recent reports into effective teaching and professional development. Fortunately, we are members of the Teacher Development Trust, and one of the benefits of membership of their network is a regular external audit of CPD. Unlike other brands of external judgement, this one is supportive and helpful, in both the summative and, more importantly, the formative sense. This post from TeacherToolkit provides a useful insight into one school’s experience of the TDT audit.

The audit is split into seven categories, with three levels of award for elements within these categories: Gold, Silver and Bronze. In assessing the overall quality of professional learning, it canvasses the views of all teaching and non-teaching staff. This is done via a pre-visit survey and then through extended interviews with a cross-section of staff on the day of the evaluation, which is peer reviewed with another member of the network. What I particularly like about the TDT audit is the way it provides rigorous external feedback on what is working and what requires improvement. There is no spurious judgement, but rather crucial feedback about what staff think of their own school’s CPD and a cool appraisal of whether or not its culture and practices enable new learning to be enacted.

  4. Behaviour evaluation – focuses on changes in behaviours as a result of training received

Professional development cannot really be considered to have been successful if the day-to-day behaviours of teachers have not changed. As we all know, this usually takes a great deal of time. Even small changes in practice, such as trying to avoid talking whilst students are working, can take a great deal of practice and feedback. Focused observations are a useful support in this process and can be requested by individuals who want feedback on how their behaviours have changed and what they may wish to change in the future. These observations are agreed at the outset and are purely developmental.

Perhaps the most reliable and useful source of ongoing evaluation of a teacher’s behavioural change in the classroom is the students themselves. Next year, we intend to introduce student evaluations, which again are not designed to catch staff out but rather to give teachers useful feedback on the one or two identified areas of change that they have been deliberately working on, for either their subject goal or their learning question. It was too soon to introduce this year, particularly as we wanted to be careful about making sure that student evaluations are embraced, not feared.

  5. Results evaluation – assesses the impact of professional development on outcomes

At the outset of the appraisal in early October, teachers identify specific classes, groups of students and aspects of their classroom teaching or their students’ learning that they want to change as a consequence of their professional learning. This identification of outcomes is a structured and supported process, which not only looks back at previous examined and non-examined results, but also looks forward to future curriculum and timetable challenges. We no longer set arbitrary performance targets, but do seek to establish clearly-defined outcomes in relation to student learning. Again, the TDT resources have proven a very useful guide.

The intention for this year is to look closely at the impact of bespoke department and school-wide professional development on specified student outcomes. There may be some mileage in considering this in the aggregate too, but we are very much aware that much of the nuance is lost in such a process. It may be possible in the future to align the goals of individual classroom contexts more closely with those at department or whole-school level, but this is very much something for the future. This is by no means a flawless approach, but it does get much closer to evaluating the thread between teacher growth and student achievement. David Weston, Chief Executive of the Teacher Development Trust, provides a different, more immediate way of building evaluation into professional development with this wonderful worked example of a group of science teachers working on a common problem.

As I have already stressed, ours is still very much a work in progress. I do think, however, that we are much further along in understanding the importance of evaluation in relation to professional development, and what this might look like in practice.

Thanks for reading.


Bates, R. (2004) ‘A critical analysis of evaluation practice: the Kirkpatrick model and the principle of beneficence’

Creemers, B., L. Kyriakides and P. Antoniou (2013) Teacher Professional Development for Improving Quality of Teaching

Guskey, T. (2000) Evaluating Professional Development

Scriven, M. (1991) ‘Prose and Cons about Goal-Free Evaluation’
