Principles of Great Assessment #1: Assessment Design


This is the first in a short series of posts on our school’s emerging principles of assessment, which are split into three categories – principles of assessment design; principles of ethics and fairness; and principles for improving reliability and validity. My hope in sharing these principles is to help others develop greater assessment literacy, and to gain constructive feedback on our work so that we can improve and refine our model in the future.

In putting together these assessment principles and an accompanying CPD programme aimed at middle leaders, I have drawn heavily on a number of writers and speakers on assessment, notably Dylan Wiliam, Daniel Koretz, Daisy Christodoulou, Rob Coe and Stuart Kime. All of them have a great ability to convey difficult concepts (I only got a C grade in maths, after all) in a clear, accessible and, most importantly, practical way. I would very much recommend following up their work to deepen your understanding of what truly makes great assessment.

1. Align assessments with the curriculum


In many respects, this first principle seems pretty obvious. I doubt many teachers deliberately set out to create and administer assessments that are not aligned with their curriculum. And yet, for a myriad of different reasons, this does seem to happen, with the result that students sit assessments that are not directly sampling the content and skills of the intended curriculum. In these cases the results achieved, and the ability to draw any useful inferences from them, are largely redundant. If the assessment is not assessing the things that were supposed to have been taught, it is almost certainly a waste of time – not only for the students sitting the test, but for the teachers marking it as well.

Several factors can affect the extent to which an assessment is aligned with the curriculum and are important considerations for those responsible for setting assessments. The first is the issue of accountability. Where accountability is unreasonably high and a culture of fear exists, those writing assessments might be tempted to narrow down the focus to cover the ‘most important’ or ‘most visible’ knowledge and skills that drive that accountability. In such cases, assessment ceases to provide any useful inferences about knowledge and understanding.

Assessment can also become detached from the curriculum when that curriculum is not delineated clearly enough from the outset. If there is no coherent, well-sequenced articulation of the knowledge and skills that students are to learn, then any assessment will be misaligned, however hard the writer tries to make it valid. A clear, well-structured and shared understanding of the intended curriculum is vital for the enacted curriculum to be successful, and for any assessment of individual and collective attainment to be purposeful.

A final explanation for the divorce of curriculum from assessment is the knowledge and understanding of the person writing the assessment in the first place. To write an assessment that can produce valid inferences requires a solid understanding of the curriculum aims, as well as of the most valid and reliable means of assessing them. Speaking for myself, I know that I have got a lot better at writing assessments that are properly aligned with the curriculum the more I have understood the links between the two and how to go about bridging them.

2. Define the purpose of an assessment first

 Depending on how you view it, there are essentially two main functions of assessment. The first, and probably most important, purpose is as a formative tool to support teaching and learning in the classroom. Examples might include a teacher setting a diagnostic test at the beginning of a new unit to find out what students already know so their teaching can be adapted accordingly. Formative assessment, or responsive teaching, is an integral part of teaching and learning and should be used to identify potential gaps in understanding or misconceptions that can be subsequently addressed.

The second main function of assessment is summative. Whereas examination bodies certify student achievement, in the school context the functions of summative assessment might include assigning students to different groupings based upon perceived attainment, providing inferences to support the reporting of progress home to parents, or the identification of areas of underperformance in need of further support. Dylan Wiliam separates out this accountability function from the summative process, calling it the ‘evaluative’ purpose.

Whether the assessment is designed to support summative or formative inferences is not really the point. What matters is that the purpose or function of the assessment is made clear, and that the inferences it is intended to produce are widely understood. In this sense, the function of the assessment determines its form. A class test intended to diagnose student understanding of recently taught material will likely look very different from a larger-scale summative assessment designed to draw inferences about whether knowledge and skills have been learnt over a longer period of time. Form therefore follows function.

3. Include items that test understanding across the construct continuum

Many of us think about assessment in the reductive terms of specific questions or units, as if performance on question 1 of Paper 2 were actually a thing worthy of study in and of itself. Assessment should be about approximating student competence in the constructs of the curriculum. A construct can be defined as the abstract conception of a trait or characteristic, such as mathematical or reading ability. Direct constructs are tangible physical traits like height and weight, measured using verifiable methods and stated units of measurement. Unfortunately for us teachers, most educational assessment deals with indirect constructs that cannot be measured in such easily understood units. Instead, they are estimated from questions that we think indicate competency, and that stand in for the thing we cannot measure directly.

Within many indirect constructs, such as writing or reading ability, there is likely to be a continuum of achievement. Within the construct of reading, for instance, some students will be able to read with greater fluency and/or understanding than others. A good summative assessment therefore needs to differentiate between these differing levels of performance and, through the questions set, define what it means to be at the top, middle or bottom of that continuum. In this light, one of the functions of assessment has to be to estimate the position of learners on a continuum. We need to know this to evaluate the relative impact or efficacy of our curricula, and to understand how our students are progressing within them.
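To make the idea of locating students on a continuum concrete, here is a minimal sketch – my own illustration rather than part of our assessment model, with invented names and marks – showing how raw scores from a single assessment can be converted into standardised positions (z-scores and percentile ranks) within a cohort.

```python
# A minimal, hypothetical sketch: placing students on a construct continuum
# by converting raw assessment scores into standardised (z) scores and
# percentile ranks. Names and data are invented for illustration.
from statistics import mean, pstdev

raw_scores = {"Amina": 42, "Ben": 35, "Chloe": 51, "Dev": 28}  # marks out of 60

avg = mean(raw_scores.values())
spread = pstdev(raw_scores.values())

positions = {
    name: {
        "z_score": round((score - avg) / spread, 2),   # distance from the cohort mean
        "percentile": round(
            100 * sum(s <= score for s in raw_scores.values()) / len(raw_scores)
        ),
    }
    for name, score in raw_scores.items()
}

print(positions)  # e.g. Chloe sits towards the top of the continuum, Dev towards the bottom
```

This is obviously a crude stand-in for a proper scaling model, but it captures the basic point: the useful information is a student’s relative position on the construct, not the raw mark itself.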


4. Include items that reflect the types of construct knowledge

Some of the assessments we use do not adequately reflect the range of knowledge and skills of the subjects they are assessing. Perhaps the format of terminal examinations has had too much negative influence on the way we think about our subjects and design assessments for them. In my first few years of teaching, I experienced considerable cognitive dissonance between my understanding of English and the way that it was conceived of within the profession. I knew my own education was based on reading lots of books, and then lots more books about those books, but everything I was confronted with as a new teacher – schemes of work, the literacy strategy, the national curriculum, exam papers – led me to believe that I should really be thinking of English in terms of skills like inference, deduction and analysis.

English is certainly not alone here, with history, geography and religious studies all suffering from a similar identity crisis. This widespread misconception of what constitutes expertise and how that expertise is gained probably explains, at least in part, why so many schools have been unable to envisage a viable alternative to levels. Like me, many of the people responsible for creating something new have themselves been infected by errors from the past and have found it difficult to see clearly that one of the big problems with levels was the way they misrepresented the very nature of subjects. And if you don’t fully understand or appreciate what progression looks like in your subject, any assessment you design will be flawed.

Daisy Christodoulou’s Making Good Progress is a helpful corrective, in particular her deliberate practice model of skill acquisition, which is extremely useful in explaining the manner in which different types of declarative and procedural knowledge can go into perfecting a more complex overarching skill. Similarly, Michael Fordham’s many posts on substantive and disciplinary knowledge, and how these might be mapped on to a history progression model are both interesting and instructive. Kris Boulton’s series of posts (inspired by some of Michael’s previous thinking) are also well worth a look. They consider the extent to which different subjects contain more substantive or disciplinary knowledge, and are useful points of reference for those seeking to understand how best to conceive of their subject and, in turn, design assessments that assess the range of underlying forms of knowledge.


5. Use the most appropriate format for the purpose of the assessment

The format of an assessment should be determined by its purpose. Typically, subjects are associated with certain formats: in English, essay tasks are quite common, whilst in maths and science, short exercises with right and wrong answers are more the norm. But as Dylan Wiliam suggests, although ‘it is common for different kinds of approaches to be associated with different subjects…there is no reason why this should be so.’ Wiliam draws a useful distinction between two modes of assessment: a marks-for-style approach (English, history, PE, art, etc.), where students gain marks for how well they complete a task, and a degree-of-difficulty approach (maths, science), where students gain marks for how far they progress in a task. It is entirely possible for subjects like English to employ degree-of-difficulty assessment tasks, such as multiple-choice questions, and for maths to set marks-for-style assessments, as this example of comparative judgement in maths clearly demonstrates.


In most cases, the purpose of assessment in the classroom will be formative and so designed to facilitate improvements to student learning. In such instances, where the final skill has not yet been perfected but is still very much a work in progress, it is unlikely that the optimal interim assessment format will be the same as the final assessment format. For example, a teacher who sets out to teach her students to construct well-written, logical and well-supported essays by the end of the year is unlikely to set essays every time she wants to infer her students’ progress towards that end goal. Instead, she will probably set short comprehension questions to check their understanding of the content that will go into the essay, or administer tests of their ability to deploy sequencing vocabulary effectively. In each case, the assessment reflects the inferences about student understanding the teacher is trying to draw, without confusing or conflating them with other things.

In the next post, I will outline our principles of assessment in relation to ethics and fairness. As I have repeatedly made clear, my intention is to help contribute towards a better understanding of assessment within the profession. I welcome anyone who wants to comment on our principles, or to critique anything that I have written, since this will help me to get a better understanding of assessment myself, and make sure the assessments that we ask our students to sit are as purposeful as possible.

Thanks for reading.

 

 

Principles of Great Assessment: Increasing the Signal and Reducing the Noise


After the government abolished National Curriculum levels, there was a great deal of initial rejoicing from both primary and secondary teachers about the death of a flawed system of assessment. Many, including myself, delighted in the freedom afforded to schools to design their own assessment systems anew. At the time I had already been working on a model of assessment for KS3 English – the Elements of Assessment – and believed that the new freedoms were a positive step in improving the use of assessment in schools.

Whilst I still think that the decision to abolish levels was correct, I am no longer quite so sure about the manner and timing in which they were removed. Since picking up responsibility for assessment across the school, I have come to realise just how damaging it was for schools to have to invent their own alternatives to levels without anywhere near enough assessment expertise to do so well. Inevitably, many schools simply recreated levels under a different name, or retreated into the misguided safety of the flight path approach.

I would like to think that our current KS3 assessment model, the Elements of Expectation, has the potential to be a genuine improvement on National Curriculum levels, supporting learning and providing reliable summative feedback on student progress at sensible points in the calendar. Even though it is in its third year, however, it is still not quite right. One of the things that I think is holding us back is our lack of assessment literacy. I am probably one of the more informed staff members on assessment, but most of what I know has been self-taught from reading some books and hearing a few people talk.

This year, in an effort to do something about this situation and to finally get our KS3 model closer to what we want, we have run some extensive professional development on assessment. Originally, I had intended to send some colleagues to Evidence Based Education’s inaugural Assessment Academy. It looks superb and represents an excellent opportunity to learn much more about assessment. But when it became clear budget constraints would make this difficult, we decided to set up and run our own in-house version: not as good (obviously) and inevitably rough around the edges, but good enough, I think, for our KS3 Co-ordinators and heads of subjects to develop the expertise they need to improve their use of assessment with our students.

The CPD is iterative and runs throughout the course of the year. So far, we have established a set of assessment principles that we will use to guide the way we design, administer and interpret assessments in the future. In the main, these principles apply to the use of medium to large-scale assessments, where the inferences drawn will be used to inform relatively big decisions, such as proposed intervention, student groupings, predictions, reporting progress, etc. Assessment as a learning event is pretty well understood by most of our teachers and is already a feature of many of our classrooms, so our focus is more on improving the validity and reliability of our summative inferences.

I thought it might be useful and timely to share these principles over a series of posts, especially as a lot of people still seem to be struggling, like us, to create something better and more sustainable than levels. The release of Daisy Christodoulou’s book Making Good Progress has undoubtedly been a great and timely help, and I intend it to provide some impetus to our sessions going forward, as we look to turn some of the theory we covered before Christmas into something practical and useful. This excellent little resource from Evidence Based Education is an indication of some of the fantastic work out there on improving assessment literacy. I hope I can add a little more in my next few posts.

If we are going to take the time and the trouble to get our students to sit assessments, then we want to make sure that the information is as reliable and valid as possible, and that we don’t try and ask our assessments to do too much. The first in my series of blogs will be on our principles of assessment design, with the other two on ethics and fairness and then, finally, reliability and validity.

All constructive feedback welcome!

The Future of Assessment for Learning


Making Good Progress is an important book and should be required reading for anyone involved in designing, administering or interpreting assessments involving children. Given the significant changes to the assessment and reporting landscape at every level, notably in the secondary context at KS3, this book is a timely read, and for my money it is the most helpful guide to designing effective formative and summative assessment models currently available to teachers.

I’ve heard Daisy speak at various education events over the years, and it is interesting to see how many of these individual talks have fed into the development of this book. Making Good Progress is a coherent and highly convincing argument for re-evaluating our existing understanding and approach to formative assessment and for moving away from the widespread practice of using formative assessment for summative purposes.

Life after Levels

From what I can tell, schools have responded to the abolishment of levels in three main ways. The first is business as usual: maintaining the use of levels – and thus ignoring the manifold problems associated with their misapplication – or recreating levels under another name. Such amended approaches appear to recognise the flaws of levels and offer something different, but in reality they too often end up representing the same thing, changing numbers to letters or something else equally fatuous. In many respects our first iteration after levels – the Elements of Assessment – fell foul of some of these same mistakes.

The second response to life after levels is the mastery-inspired model of assessment. In this approach subjects identify learning objectives for a student to master over the course of a year. This approach, which usually includes mapping out these myriad goals on a spreadsheet, appears more attractive in theory – what is to be learned is clearly articulated and not bundled up into a grade or prose descriptor – but in practice can prove equally unreliable and particularly unwieldy to maintain. Often the micro goals are watered down versions of the final assessment, not carefully broken down components of complex skills.

The final approach is the popular flight path model. This comes in various forms, but generally tends to focus on working backwards from GCSE grades to provide a clear ‘path’ from year 7 to year 11. I can understand the allure of this, and appreciate how such a model appears to offer school leaders a neat and tidy solution to levels. The problem is that learning is not this straightforward, and introducing the language of GCSE at year 7 seems to me to entirely miss the point of what assessment can and should be at this point of a child’s education – some five years before any terminal exam is to be sat!

As you read Daisy’s fantastic book, it becomes clear how all of these approaches to assessment are in one way or another fundamentally flawed: none of them really address the two underlying problems that ultimately did for levels, namely the tendency for interim (or formative) assessment to always look like the final task, and for assessment to happily double up for formative and summative purposes. Making Good Progress destroys these widely held beliefs, albeit in the kind and sympathetic manner of a former teacher who understands how all this mess came to pass.

Generic Skill versus Deliberate Practice

In chapter five Daisy takes up what, from my experience, is the biggest barrier to improvement in the use of assessment in schools: how teachers conceive of their subjects in the first place. Daisy carefully unpicks the misconception that initial tests should reflect the same format as the final assessment. She outlines two very different methods of skill acquisition that account for how interim assessments are constructed – the generic skill method (where skills are seen as transferable and are practised in a form close to their final version) and the deliberate practice method (where practice is deliberate and focused, and may look quite different in nature from the final version).

In the generic skill model, an interim assessment, such as a test of reading ability in English, will look very similar to the final assessment of reading at the end of the course – an essay or an extended piece of analysis in a GCSE exam, for instance. This approach, however, completely misunderstands how students learn large and complex domains like reading, and prevents the interim assessment from being used formatively because it bundles up the many different facets of the domain and hides them in vague prose descriptors.

Daisy calls the alternative to this the deliberate practice model. Informed by the work of Anders Ericsson, this view of skill acquisition respects the limitations of working memory and recognises that complex skills are learnt by breaking down the whole skill into its constituent parts in an effort to build up the mental models that enable expertise. In this model very few, if any, practice tasks look like the final assessment. Sports coaches and music teachers have long understood the importance of this method, isolating specific areas of their domain for deliberate practice. As Daisy notes: ‘The aim of performance is to use mental models. The aim of learning is to create them.’

These two distinct approaches to skill development have significant consequences for the design and implementation of assessment in the classroom. If you are a history teacher and you teach in accordance with the generic model of skill acquisition, you will tend to set your students essays when you want to check their understanding of historical enquiry. You may get the illusion of progress through your summative judgements – an emerging student might appear to become a secure student from one assessment to the next – but neither you nor your students will really be any the wiser about what, if anything, has improved or, more to the point, what needs to be improved in the future.

Another history teacher might share the same desire to teach her students to write coherent historical essays. This teacher, however, knows this is an incredibly complex skill that requires sophisticated mental models underpinned by a breadth and depth of historical knowledge. She isolates these specific areas and targets them for dedicated practice. When she checks for understanding, she sets tests that reflect these micro components, such as a timeline task to show students’ understanding of chronology, or a series of multiple-choice questions designed to ascertain their understanding of causality. Extended writing comes later, when the mental models are secure. For now, the results from these tasks provide useful, precise formative feedback.

Koretz and Wiliam

For much of the book, Daisy draws on the work of Daniel Koretz and Dylan Wiliam to support her arguments. Koretz’s Measuring Up is another great book, which outlines the design and purpose of standardised testing and how to interpret examination results in a sensible way. Wiliam’s work is equally instructive, in particular his SSAT pamphlet Principled Assessment Design, which is a helpful technical guide for school leaders on designing reliable and valid school assessments.

Making Good Progress complements both these other works, and together the three books tell you everything you need to know about how to construct valid, reliable and ethical assessments. Like Koretz and Wiliam, Daisy considers the key technical assessment concepts of reliability and validity, and similarly exposes the uses and abuses of assessment, which she does in such a way that makes the need to assess better seem urgent and necessary. What it also offers, however, in particular through the deliberate practice paradigm, is the means through which to improve assessment and to link it to a coherent progression model of learning.

If I had one minor criticism of Making Good Progress, it would be that the closing chapters that outline this coherent model of curriculum and assessment are perhaps a little idealistic. Whilst the arguments for more widespread use of textbooks to support a coherent model of progression are sound, and the idea of creating banks of subject-specific diagnostic questions for formative assessment purposes makes complete sense, the chances of either of these things happening any time soon seem to me rather remote. Both require significant agreement amongst teachers on the nature of their disciplines, some kind of consensus around skill acquisition (as Daisy notes herself, the generic skill method is pervasive) and for schools to work together systematically. Oh, and stacks of investment too. None of these things seem likely in the current education climate.

One much bigger criticism of the book, which I really must take Daisy to task about, is that it was not written several years earlier. Whilst I get that it may have taken her a while to formulate her ideas, and perhaps a good few months more to write them out, it still seems pretty remiss of her not to have co-ordinated better with the DfE. Had Making Good Progress been published in 2013 when the abolishment of National Curriculum levels was first announced (perhaps in a Waterstones 3-for-2 offer with Koretz and Wiliam), then I think that I, along with a number of other teachers, would not have wasted quite so much time and effort floundering around in the dark, trying to design something better than what went before, but often failing miserably.

Making Good Progress is a truly great read, and though its ostensible focus is on improving the use of formative assessment in schools, it covers a great deal of other ground in order to lay out the evidence to support the arguments. I enjoyed Daisy’s book immensely and commend it to anyone in the profession in any way involved with assessment, which is pretty much everyone!

Cleverlands – A Smart Read


Lucy Crehan’s Cleverlands is a great read. As well as providing a fantastic overview of the workings of many of the world’s leading education systems, Cleverlands also offers a unique insight into the culture and people who live and breathe the systems on a daily basis – the parents, the teachers, and the students themselves, all of whom, in one way or another, are asking the same fundamental questions about education: what should young people learn, and how can we make education better and fairer for all?

After working for three years in a London secondary school teaching science, Crehan wanted to find out more about why some countries seemed to perform better with their educational outcomes than others, at least according to PISA assessment scores. She embarked on a journey that took in most of the world’s prominent education jurisdictions – the usual suspects such as Singapore, Finland and Japan – with the aim of getting to the heart of the reality behind the statistics of national comparison data. The result is this fantastic book, written by someone who clearly understands that headlines only ever tell part of the story, and who has a keen eye for the nuance of research data, which we know is too often appropriated by those looking for quick fixes and easy answers.

Cleverlands is organised into 18 perfectly weighted chapters that each focus on exploring an aspect of a particular educational system. One of the things that make this such a pleasurable read is the clarity of Crehan’s writing, and in particular her effortless blend of travelogue, considered analysis and opinion. One moment we’re inside the home of one of the many teachers who agree to house her during her travels, to show her around their schools and to act as her interpreter, and the next we’re taking a step back to review the bigger picture, looking at the research, or learning about a country’s social and cultural history.  Throughout I felt cheered by the essential kindness of strangers, and by the way that so many teachers around the world were willing to help Crehan on the back of just a few speculative emails, which she herself admits were rather optimistic and the potential actions of a ‘lunatic’.

There is something to learn from each of the countries under examination. Not so much in terms of directly taking any of the ideas or approaches described and blindly applying them to a different classroom, school or even system – the book is clear that, despite what some politicians might think, it’s a bit more complicated than that – but more in the sense that the insights the book offers into the lives of others enable a greater understanding of ideas, beliefs and practices much closer to home. In many respects, I found myself thinking how much we fall short in comparison to our international colleagues, and I don’t mean in PISA scores, which often don’t reveal the complete picture.

Compared to the very best education systems our system does not appear to be very systematic at all, at least where it really matters: in developing great teachers, in raising the status of the profession and in giving the time and resource necessary to genuinely improve educational outcomes. Whether or not you like the Singaporean approach to widespread streaming (I suspect you won’t and to be fair, neither it seems do the Singaporeans), you have to admire the fact they have a coherent plan, one that a great deal of thought went into producing. Time after time what emerges from each of the stories of educational success from China to Canada is the notion of coherence and joined-up thinking. There are drawbacks, caveats and nuance aplenty, but at least the world’s leading education nations have a strategy, whereas all we seem to have is fracture, self-interest and free market chaos.

Depending on how much you read about teaching or follow education policy in the media, there will be bits of Cleverlands that you will probably already know about, or at the very least with which you will be quite familiar. For instance, Japan’s large class sizes, high levels of parental engagement and collaborative teaching practices will be common knowledge to most who seek out this book in the first place. Likewise, the triumphs of the Finnish system that led to the outstanding results of the 2006 PISA report are well documented, in particular the high standard of teacher training, the prestige of the profession in society and the role of high-quality textbooks in ensuring curriculum coherence. Familiar too will be the backlash against this success and the supposed fall of the Finnish star in recent years.

But even within the familiar, there are surprises and lesser-known, but nevertheless fascinating, observations. For instance, the significant changes in demographics that Finland has faced in the last 20 or so years were news to me, as was their heavy investment in a multi-disciplinary approach to tackling welfare issues early in a child’s education. Crehan describes the weekly meetings that take place in Finnish schools between education specialists and class teachers to discuss individual students and devise plans to tackle their social and academic needs. Whilst there are, admittedly, signs of the all-too-recognisable bureaucracy here, as Crehan rightly points out, it’s ultimately the right approach at the right time. Whereas the Finns look to act on disparity and need early on, in this country we tend to put ‘interventions into place that attempt to deal with a symptom of a problem, rather than its underlying cause.’ Too little, too late, in other words.

Cleverlands is published by Unbound using a crowd-funding model, where readers who like the sound of the book’s synopsis contribute to its production. Judging by how quickly Crehan reached her target, it’s clear that there is a lot of interest in this kind of well-informed, well-written educational voyeurism. Whilst there are similarish books on the market – I’m thinking of Amanda Ripley’s The Smartest Kids in the World – nothing I’ve read quite manages to achieve the same happy balance between human sentiment and cool analysis. Clearly Crehan’s previous incarnation as a teacher has helped her to focus on the things that we want to know and to present them in an engaging way that leaves you feeling better informed, if a little frustrated at the continued failings and short-sightedness of our own not-so-clever land. This really is a smart read – buy a copy as soon as you can.

No More Marking? Our experience using Comparative Judgement


I first came across comparative judgement and Chris Wheadon’s No More Marking website about three years ago, when it was very much in its infancy. For some reason, I didn’t recognise its potential; I saw more drawbacks to collaborative assessment than benefits. What I hadn’t properly considered were the significant flaws in existing methods for assessing students’ written work – issues of bias, the illusion of objective evaluation against scoring rubrics, etc. I also didn’t fully appreciate the central premise that underpins comparative judgement: that human beings deal more in relative comparisons than absolute definites.

The significant benefits of using comparative judgement are much more obvious to me now, not just for English, but for other subject areas too. Whilst it is not without its issues (see below), the more I use comparative judgement, and the accompanying assessment tools on the ever-improving No More Marking site, the more I think it can really help increase the reliability of assessing certain pieces of work, as well as make a big difference in reducing teacher workload. There are other potential benefits too, such as opportunities for collaborative professional learning, getting better at understanding what makes a good piece of work, and quickly seeing different strengths and weaknesses across a cohort.
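For anyone curious about the mechanics behind this kind of judging, the sketch below shows one way a pile of pairwise ‘which is better?’ judgements can be turned into a single quality scale and a rank order – a Bradley-Terry-style model fitted by simple gradient ascent. This is my own illustrative assumption about how such scales can be built, not necessarily No More Marking’s actual algorithm, and the function name and example data are invented.

```python
# Minimal sketch: turning pairwise "which essay is better?" judgements into a
# single quality scale using a Bradley-Terry-style model (illustrative only).
import math
from collections import defaultdict

def fit_bradley_terry(judgements, n_iter=500, lr=0.01):
    """judgements: list of (winner, loser) pairs from a judging session."""
    scripts = {s for pair in judgements for s in pair}
    theta = {s: 0.0 for s in scripts}              # estimated quality per script
    for _ in range(n_iter):
        grad = defaultdict(float)
        for winner, loser in judgements:
            # probability the model currently assigns to the observed outcome
            p_win = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for s in scripts:
            theta[s] += lr * grad[s]
        mean_theta = sum(theta.values()) / len(theta)
        theta = {s: v - mean_theta for s, v in theta.items()}  # centre the scale
    return theta

# Invented example: script A beats B twice, B beats C once, A beats C once
scores = fit_bradley_terry([("A", "B"), ("A", "B"), ("B", "C"), ("A", "C")])
print(sorted(scores, key=scores.get, reverse=True))  # expected order: A, B, C
```

The point is that no judge ever has to assign an absolute mark; the scale emerges from the accumulated comparisons.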

Most of the examples I have read about of schools using comparative judgement tend to focus on the assessment of writing – facets of effective composition, such as control, organisation and style. An obvious example is Daisy Christodoulou’s pioneering work with Chris Wheadon, which is extremely useful in showing how to use comparative judgement at scale, as well as demonstrating how it can lead to greater reliability than teacher judgement and more conventional forms of standardisation. Comparative judgement of small pieces of written work is also at the heart of the FFT’s English Proof of Progress test that many schools, including ours, are using to measure the progress of their KS3 students and to cross-reference against their own emerging assessment models.

This is all well and good, and I would imagine that even comparative judgement’s staunchest detractors can see that it has something to offer the process of assessing for things like style and technical accuracy. What I think is less well documented, though, is how comparative judgement can support the assessment of other areas of the English curriculum, such as longer pieces of analytical writing. This is because it’s much harder to use comparative judgement in this way. Yet, within my department, and probably for other secondary school departments too, this is what we are interested in right now: learning how comparative judgement might support the process of marking the ever-increasing number of essays that our students are writing at both GCSE and A Level – essays that we want to assess reliably and quickly.

Unlike the assessment of writing, though, where it is possible to quickly read a piece and make an instinctive judgement about its relative quality and accuracy, I think that analytical responses are much more problematic. For a start, judges must be well versed in the text or texts being written about. This is not an insurmountable hurdle, since many teachers in a department teach the same text, and one would hope that most English teachers are au fait enough with the texts on a GCSE syllabus to pass judgement on a piece of analysis. That said, knowledge of the text and knowledge of the focus of the analysis – such as the extent to which contextual links play a role – are much more of a factor in the comparative judgement of reading than of writing, which makes it harder to enlist additional judges and more time-consuming to make the judgements.

Trialling Comparative Judgement

We have now used comparative judgement in the English department on three separate occasions, most recently to assess a year 11 literature mock question on Dr Jekyll and Mr Hyde. Whereas last year we focused on experimenting with the process and getting used to marking in such a different way, this year we have increased our use of comparative judgement with the longer-term aim of making it a key component of our overall assessment portfolio. Rather than blindly replacing the old with the new, however, which is certainly tempting when you think you can see the benefits from the outset, we are mindful that we need to tread carefully.

As a result we have set up a controlled trial to try and get some objective feedback to check against our hunches. The trial essentially consists of splitting our GCSE cohort into two groups. All students will sit 5 literature assessments throughout the course of the year, with one group having their assessments marked using comparative judgement, and the other through the more traditional method of applying a mark scheme followed by a process of moderation. Using a combination of quantitative and qualitative methods, we hope to ascertain the effect, if any, of using comparative judgement on student learning, but also, more importantly, its impact on teacher workload. Admittedly, such evaluation is flawed, but we hope that it will at least make us better informed when we come to make a decision later on about whether to adopt comparative judgement more widely.
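To give a flavour of the quantitative side, the sketch below compares marking time per script between the two arms using a Welch t-test (via scipy). The figures, variable names and the choice of test are all invented for illustration; this is one plausible analysis, not the trial’s agreed analysis plan.

```python
# Hypothetical sketch: comparing marking time per script (minutes) between the
# comparative-judgement arm and the traditional mark-scheme arm of the trial.
# Data and numbers are invented for illustration; requires scipy.
from statistics import mean
from scipy import stats

cj_minutes = [2.1, 1.8, 2.4, 2.0, 1.7, 2.2, 1.9]           # time per script, CJ arm
traditional_minutes = [6.5, 7.2, 5.9, 6.8, 7.5, 6.1, 6.9]  # time per script, mark-scheme arm

# Welch's t-test (does not assume equal variances between the two arms)
t_stat, p_value = stats.ttest_ind(cj_minutes, traditional_minutes, equal_var=False)

print(f"CJ mean: {mean(cj_minutes):.1f} min, "
      f"traditional mean: {mean(traditional_minutes):.1f} min, p = {p_value:.4f}")
```

Alongside numbers like these, the qualitative side – teacher interviews and reflections on the judging sessions – should tell us whether any time saved comes at the cost of confidence in the results.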

Issues and solutions

The impact of poor handwriting on grade scores is not a new phenomenon. I remember when I was a GCSE exam marker: I much preferred reading legible scripts and cursed the ones I had to spend time deciphering. Obviously, I tried not to let students’ poor handwriting get in the way of making my judgements, but the reality is it probably did, even if the only bias was that the additional time spent deciphering meant I saw flaws more clearly. When you are marking your own students’ essays – as with the usual way we mark our internal assessments – you get used to those students with tricky handwriting, and learn how to decipher their meaning, perhaps unconsciously giving them the benefit of the doubt because you know what they meant.

It’s even harder to avoid handwriting bias with comparative judgement, particularly when you are encouraged to make quick judgements and you are reading lots of scanned photocopied scripts off a computer screen. Poor handwriting was clearly a factor behind some of the anomalous results from our recent session. Several teachers noted how hard it was to properly read some essays, and a deeper examination of the worst offenders showed that the mean time of judgements on them was much longer than those that were easier on the eye. Most of these essays also scored badly. Conversely, almost all the best essays had the neatest, most legible pen work. Under closer inspection, however, a significant number were clearly in the wrong band.
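A rough sketch of the kind of check described above – flagging scripts whose mean judgement time is unusually long, as a crude proxy for poor legibility – might look like the following. The script IDs, timings and threshold are all invented for illustration.

```python
# Hypothetical sketch: flag scripts with unusually long mean judgement times,
# as candidates to re-check for handwriting/legibility problems.
from statistics import mean, pstdev

# mean seconds per judgement for each script, e.g. exported from a judging session
judgement_times = {"S01": 48, "S02": 51, "S03": 47, "S04": 96, "S05": 50, "S06": 88}

avg = mean(judgement_times.values())
spread = pstdev(judgement_times.values())

# arbitrary threshold: more than one standard deviation above the mean
slow_scripts = [s for s, t in judgement_times.items() if t > avg + spread]
print(slow_scripts)  # e.g. ['S04', 'S06'] - re-read these alongside the scanned originals
```

Scripts flagged in this way can then be re-read against their scanned originals to see whether legibility, rather than quality, explains their position in the rank order.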

It would be wrong to suggest that all the anomalies we found after interrogating the ranked order of essays were entirely down to issues of handwriting. There were a number of administrative failures, such as students writing on the wrong part of the barcoded paper and some of the scans being uploaded back to front, which gave the impression that some students had not written very much at all, or only in fragments. These are technical issues, and can easily be ironed out the more we get to grips with the approach. That is the whole point of taking things slowly and learning from trial and error.

Aside from issues of handwriting and administration, a number of other anomalies remained. Some of these apparent errors turned out not to be errors at all: students whom teachers had expected to score highly had not written good essays, and students who had not really been expected to gain high marks did much better than anticipated. With our usual approach – teachers marking their own classes with some subsequent moderation – I suspect that some of these surprising results would not have been apparent. Other anomalies were just plain wrong, which I would love to illustrate, but our uploaded scripts are no longer available on the new No More Marking website. We still haven’t got to the bottom of why a significant number of these scripts were placed in completely the wrong order or bands. Some error is inevitable, of course, but the question is probably more about whether comparative judgement has created these errors, or whether they were always there and comparative judgement has just brought them to light.

I hope to be able to answer this question as the year goes on.

Next steps:

  • Brief teachers on issues of bias with poor handwriting and halo effect of neat work
  • Emphasise to students the importance of taking care with their handwriting
  • Standardised instructions and conditions for all students taking the tests
  • Teacher standardisation session using exemplar work from previous session
  • Clearer focus on the criteria for judgements
  • Previous responses used as anchors in the judging session
  • Divide up marking sessions: 1) an initial collaborative judging session to iron out issues, identify interesting or salient features of students’ work and check teacher reliability; 2) independent judging session(s) at another time to avoid issues of fatigue and cognitive overload
  • Investigate significant anomalies and identify possible factors affecting judgements
  • Use insights into student work to inform subsequent teaching