Principles of Great Assessment #2 Validity and Fairness


This is the second of a three-part series on the principles of great assessment. In my last post I focused on some principles of assessment design. This post outlines the principles that relate to ideas of validity and fairness.* As I have repeatedly stressed, I do not consider myself to be an expert in the field of assessment, so I am more than happy to accept constructive feedback to help me learn and to improve upon the understanding of assessment that we have already developed as a school. My hope is that these posts will help others to learn a bit more about assessment, and for the assessments that students sit to be as purposeful and supportive of their learning as possible.

So, here are my principles of great assessment 6-10.

6. Regularly review assessments in light of student responses

Validity in assessment is extremely important. For Daniel Koretz it is ‘the single most important criterion for evaluating achievement testing.’ Often when teachers talk about an assessment being valid or invalid, they are using the term incorrectly. In assessment, validity means something very different from what it means in everyday language. Validity is not a property of a test, but rather of the inferences that an assessment is designed to produce. As Lee Cronbach observes, ‘One validates not a test but an interpretation of data arising from a specified procedure’ (Cronbach, 1971).

There is therefore no such thing as a valid or invalid assessment. A maths assessment with a high reading demand might provide valid inferences for students with a high reading age, but invalid inferences for students with low reading ages. The same test can therefore provide both valid and invalid inferences depending on its intended purpose, which links back to the second assessment principle: the purpose of the assessment must be set and agreed from the outset. Validity is thus specific to particular uses in particular contexts and is not an ‘all or nothing’ judgement but rather a matter of degree and application.

If you understand that validity applies to the inferences that assessments provide, then you should be able to appreciate why it is so important to make sure that an assessment gives as valid inferences about student achievement as possible, particularly when there are significant consequences attached for the students taking it, like attainment grouping. There are two main threats to achieving this validity: construct under-representation and construct irrelevance. Construct under-representation refers to when a measure fails to capture important aspects of the construct, whilst construct irrelevance refers to when a measure is influenced by things other than the construct, as in the example of high reading demand in a maths assessment.

There are a number of practical steps that teachers can take to help reduce these threats to validity and, in turn, to increase the validity of the inferences provided by their assessments. Some are fairly obvious and can be implemented with little difficulty, whilst others require a bit more technical know-how and/or a well-designed systematic approach that provides teachers with the time and space needed to design and review their assessments on a regular basis.

Here are some practical steps educators can take:

Review assessment items collaboratively before a new assessment is sat

Badly constructed assessment items create noise and can lead to students guessing the answer. Where possible, it is therefore worth spending some time and effort upfront, reviewing items in a forthcoming summative assessment before they go live so that any glaring errors around the wording can be amended, and any unnecessary information can be removed. Aside from making that assessment more likely to generate valid inferences, such an approach has the added advantage of training those less confident in assessment design in some of the ways of making assessments better and more fit for purpose. In an ideal world, an important assessment should be piloted first to provide some indication of issues with items, and the likely spread of results across an ability profile. This will not always be possible.

Check questions for cues and contextual nudges

Another closely linked problem, and another potential threat to validity, is flawed question phrasing that inadvertently reveals the answer, or provides students with enough contextual cueing to narrow down their responses based on semantic or grammatical fit. In the example item from a PE assessment below, for instance, the phrasing of the question, namely the grammatical construction of the words and phrases around the gaps, makes anaerobic and aerobic the more likely candidates for the correct answer. They are adjectives which precede nouns, whilst the rest of the options are all nouns and would sound odd to a native speaker – a noun followed by a noun. A student might select anaerobic and aerobic, not because they necessarily know the correct answer, but because these options sound correct in accordance with the syntactical cues provided. This is a threat to validity in that the inference is perhaps more about grammatical knowledge than understanding of bodily processes.

Example: The PE department have designed an end of unit assessment to check students’ understanding of respiratory systems. It includes the following types of item.

Task: use two of the following words to complete the passage below

Anaerobic, Energy, Circulation, Metabolism, Aerobic 

When the body is at rest this is ______ respiration. As you exercise you breathe harder and deeper and the heart beats faster to get oxygen to the muscles. When exercising very hard, the heart cannot get enough oxygen to the muscles. Respiration becomes _______.

Interrogate questions for construct irrelevance

If the purpose of an assessment has been clearly established from the outset and that assessment has been clearly aligned to the constructs within the curriculum, then a group of subject professionals working together should be able to identify items where things other than the construct are being assessed. Obvious examples are high reading demands that get in the way of assessments of mathematical or scientific ability, but sometimes the problem might be harder to detect, as with the example below. To some, this item might seem fairly innocuous, but on closer inspection it becomes clear that it is not assessing vocabulary knowledge as purported, but rather spelling ability. Whilst it may be desirable for students to spell words correctly, inferences about word knowledge would not be possible from an assessment with these kinds of items in it.

Example: The English department designs an assessment to measure students’ vocabulary skills. The assessment consists of 40 items like the following:

Task: In all of the ________________ of packing into a new house, Sandra forgot about washing the baby.

  1. Excitement
  2. Excetmint
  3. Excitemant
  4. Excitmint

7. Standardise assessments that lead to important decisions

Teachers generally understand the importance of making sure that students sit final examinations in an exam hall under the same conditions as everyone else taking the test. Mock examinations tend to replicate these conditions, because teachers and school leaders want the inferences provided by them to be as valid and fair as possible. For all manner of reasons, though, this insistence on standardised conditions for test takers is less rigorously adhered to lower down the school, even though some of the decisions based upon such tests in years 7 and 8 arguably carry much more significance for students than any terminal examination.

I know that I have been guilty of not properly understanding the importance of standardising test conditions. On more than one occasion I have set an end of unit or end of term assessment as a cover activity, thinking that it was ideal work because it would take students the whole lesson to complete and they would need to work in silence. I hadn’t appreciated that assessment is a bit more complicated than that, even for something like an end of unit test. I hadn’t considered, for instance, that it mattered whether students got the full hour, or more likely 50 minutes if the test was set by a cover supervisor who had to spend valuable time settling the class. I hadn’t taken on board that it would make a difference if my class sat the assessment in the afternoon, and the class next door completed theirs bright and early in the morning.

It may well be that my students would have scored exactly the same whether or not I was present, whether they sat the test in the morning or in the afternoon, or whether they had 50 minutes or the full hour. The point is that I could not be sure, and that if one or more of my students would have scored significantly higher (or lower) under different circumstances, then their results would have provided invalid inferences about their understanding. If they were then placed in a higher or lower group as a result, or I reported home to their parents some erroneous information about their test scores, which possibly affected their motivation or self-efficacy, then you could suggest that I had acted unethically.

8. Important decisions are made on the basis of more than one assessment

Imagine you are looking to recruit a new head of science. Now imagine the even more unlikely scenario that you have received a strong field of applicants which, I appreciate, is a bit of a stretch of the imagination in the current recruitment climate. With such a strong field for such an important post, a school would be unlikely to decide whom to appoint based upon the inferences provided by one single measure, such as an application letter, a taught lesson or an interview. More likely, they would triangulate all these different inferences about each candidate’s suitability for the role when making their decision, and even then they would be crossing their fingers that they had made the right choice.

A similar principle is at work when making important decisions on the back of student assessment results, such as deciding which group to place students in the following term, identifying which individuals need additional support, or determining how much, if any, progress to report home to parents. In each of these cases, as with the head of science example, it would be wise to draw upon multiple inferences in order to make a more informed decision. This is not to advocate an exponential increase in the number of tests students sit, but rather to recognise that when the stakes are high, it is important to make sure the information we use is as valid as possible. Cross-referencing examinations is one way of achieving this, particularly given the practical difficulties of standardising assessments previously discussed.

9. Timing of assessment is determined by purpose and professional judgement

The purpose of an assessment informs its timing. Whilst this makes perfect sense in the abstract, in practice there are many challenges to making it happen. In Principled Assessment Design, Dylan Wiliam notes how it is relatively straightforward to create assessments that are highly sensitive to instruction if what is taught is not hard to teach and learn. For example, if all I wanted to teach my students in English was vocabulary, and I set up a test that assessed them on the 20 or so words that I had recently taught them, it would be highly likely that the test would show rapid improvements in their understanding of these words. But as we all know, teaching is about much more than learning a few words. It involves complex cognitive processes and vast webs of interconnected knowledge, all of which take a considerable amount of time to teach, and in turn to assess.


It seems that the distinction between learning and performance is becoming increasingly well understood, though perhaps in terms of curriculum and assessment its widespread application to the classroom is taking longer to take hold. The reality for many established schools is that it is difficult to construct a coherent curriculum, assessment and pedagogical model across a whole school that embraces the full implications of the difference between learning and performance. It is hard enough to get some colleagues to fully appreciate the distinction, and its many nuances, so indoctrinated are they by years of the wrong kind of impetus. Added to this, whilst there is general agreement that assessing performance can be unhelpful and misleading, there is no real consensus on the optimal time to assess for learning. We know that assessing soon after teaching is flawed, but not exactly when to capture longer-term learning. Compromise is probably inevitable.

What all this means in practical terms is that schools have to work within their localised constraints, including issues of timetabling, levels of understanding amongst staff and, crucially, the time and resources to enact the theory once it is known and understood. Teacher workload must also be taken into account when deciding upon the timing of assessments, recognising certain pinch points in the year and building a coherent assessment timetable that respects the distinction between learning and performance, builds in opportunities to respond to (perceived) gaps in understanding and spreads out the emotional and physical demands on staff and students. Not easy, at all.

10. Identify the range of evidence required to support inferences about achievement

Tim Oates’ oft-quoted advice to avoid assessing ‘everything that moves, just the key concepts’ is important to bear in mind, not just for those responsible for assessment, but also for those who design the curricula with which those assessments are aligned. Despite the freedoms afforded by the removal of levels and the greater autonomy possible with academy status, many of us have still found it hard to narrow down what we teach to what is manageable and most important. We find it difficult in practice to sacrifice breadth in the interests of depth, particularly where we feel passionately that so much is important for students to learn. I know it has taken several years for our curriculum leaders to truly reconcile themselves to the need to strip out some content and focus on teaching the most important material to mastery.

Once these ‘key concepts’ have been isolated and agreed, the next step is to make sure that any assessments cover the breadth and depth required to gain valid inferences about student achievement of them. I think the diagram below, which I used in my previous blog, is helpful in illustrating how assessment designers should be guided by both the types of knowledge and skills that exist within the construct (the vertical axis) and the levels of achievement across each component, i.e. the continuum (the horizontal axis). This will likely look very different in some subjects, but it nevertheless provides a useful conceptual framework for thinking about the breadth and depth of items required to support valid inferences about levels of attainment of the key concepts.

[Diagram: types of knowledge and skills within the construct (vertical axis) plotted against levels of achievement along the continuum (horizontal axis)]

My next post, which I must admit I am dreading writing and releasing for public consumption, will focus on trying to articulate a set of principles around the very thorny and complicated area of assessment reliability. I think I am going to need a couple of weeks or so to make sure that I do it justice!

Thanks for reading!


* I am aware the numbering of the principles on the image does not match the numbering in my post. That’s because the image is a draft document.


Principles of Great Assessment #1 Assessment Design


This is the first in a short series of posts on our school’s emerging principles of assessment, which are split into three categories – principles of assessment design; principles of ethics and fairness; and principles for improving reliability and validity. My hope in sharing these principles of assessment is to help others develop greater assessment literacy, and to gain constructive feedback on our work to help us improve and refine our model in the future.

In putting together these assessment principles and an accompanying CPD programme aimed at middle leaders, I have drawn heavily on a number of writers and speakers on assessment, notably Dylan Wiliam, Daniel Koretz, Daisy Christodoulou, Rob Coe and Stuart Kime. All of them have a great ability to convey difficult concepts (I only got a C grade in maths, after all) in a clear, accessible and, most importantly, practical way. I would very much recommend following up their work to deepen your understanding of what truly makes great assessment.

1. Align assessments with the curriculum


In many respects, this first principle seems pretty obvious. I doubt many teachers deliberately set out to create and administer assessments that are not aligned with their curriculum. And yet, for a myriad of different reasons, this does seem to happen, with the result that students sit assessments that do not directly sample the content and skills of the intended curriculum. In these cases the results achieved are largely meaningless, and little useful inference can be drawn from them. If the assessment is not assessing the things that were supposed to have been taught, it is almost certainly a waste of time – not only for the students sitting the test, but for the teachers marking it as well.

Several factors can affect the extent to which an assessment is aligned with the curriculum and are important considerations for those responsible for setting assessments. The first is the issue of accountability. Where accountability is unreasonably high and a culture of fear exists, those writing assessments might be tempted to narrow down the focus to cover the ‘most important’ or ‘most visible’ knowledge and skills that drive that accountability. In such cases, assessment ceases to provide any useful inferences about knowledge and understanding.

Assessment can also become detached from the curriculum when that curriculum is not delineated clearly enough from the outset. If there is not a coherent, well-sequenced articulation of the knowledge and skills that students are to learn, then any assessment will always be misaligned, however hard its writer tries to produce valid inferences. A clear, well-structured and shared understanding of the intended curriculum is vital for the enacted curriculum to be successful, and for any assessment of individual and collective attainment to be purposeful.

A final explanation for the divorce of curriculum from assessment is the knowledge and understanding of the person writing the assessment in the first place. To write an assessment that can produce valid inferences requires a solid understanding of the curriculum aims, as well as the most valid and reliable means of assessing them. Speaking for myself, I know that I have got a lot better at writing assessments that are properly aligned with the curriculum the more I have understood the links between the two and how to go about bridging them.

2. Define the purpose of an assessment first

Depending on how you view it, there are essentially two main functions of assessment. The first, and probably most important, purpose is as a formative tool to support teaching and learning in the classroom. Examples might include a teacher setting a diagnostic test at the beginning of a new unit to find out what students already know so their teaching can be adapted accordingly. Formative assessment, or responsive teaching, is an integral part of teaching and learning and should be used to identify potential gaps in understanding or misconceptions that can be subsequently addressed.

The second main function of assessment is summative. Whereas examination bodies certify student achievement, in the school context the functions of summative assessment might include assigning students to different groupings based upon perceived attainment, providing inferences to support the reporting of progress home to parents, or the identification of areas of underperformance in need of further support. Dylan Wiliam separates out this accountability function from the summative process, calling it the ‘evaluative’ purpose.

Whether the assessment is designed to support summative or formative inferences is not really the point. What matters here is that the purpose or function of the assessment, and the inferences it is intended to produce, are made clear to and widely understood by all. In this sense, the function of the assessment determines its form. A class test intended to diagnose student understanding of recently taught material will likely look very different from a larger-scale summative assessment designed to draw inferences about whether knowledge and skills have been learnt over a longer period of time. Form therefore follows function.

3. Include items that test understanding across the construct continuum

Many of us think about assessment in the reductive terms of specific questions or units, as if performance on question 1 of Paper 2 were actually a thing worthy of study in and of itself. Assessment should be about approximating student competence in the constructs of the curriculum. A construct can be defined as the abstract conception of a trait or characteristic, such as mathematical or reading ability. Direct constructs are tangible physical traits like height and weight, measured using verifiable methods and stated units of measurement. Unfortunately for us teachers, most educational assessment deals with indirect constructs that cannot be measured in such easily understood units. Instead, they are estimated through questions that we think indicate competency, and that stand in for the thing we cannot measure directly.

Within many indirect constructs, such as writing or reading ability, there is likely to be a continuum of possible achievement. Within the construct of reading, for instance, some students will be able to read with greater fluency and/or understanding than others. A good summative assessment therefore needs to differentiate between these differing levels of performance and, through the questions set, define what it means to be at the top, middle or bottom of that continuum. In this light, one of the functions of assessment has to be to estimate the position of learners on a continuum. We need to know this to evaluate the relative impact or efficacy of our curricula, and to understand how our students are progressing within it.


4. Include items that reflect the types of construct knowledge

Some of the assessments we use do not adequately reflect the range of knowledge and skills of the subjects they are assessing. Perhaps the format of terminal examinations has had too much negative influence on the way we think about our subjects and design assessments for them. In my first few years of teaching, I experienced considerable cognitive dissonance between my understanding of English and the way that it was conceived of within the profession. I knew my own education was based on reading lots of books, and then lots more books about those books, but everything I was confronted with as a new teacher – schemes of work, the literacy strategy, the national curriculum, exam papers – led me to believe that I should really be thinking of English in terms of skills like inference, deduction and analysis.

English is certainly not alone here, with history, geography and religious studies all suffering from a similar identity crisis. This widespread misconception of what constitutes expertise and how that expertise is gained probably explains, at least in part, why so many schools have been unable to envisage a viable alternative to levels. Like me, many of the people responsible for creating something new have themselves been infected by errors from the past and have found it difficult to see clearly that one of the big problems with levels was the way they misrepresented the very nature of subjects. And if you don’t fully understand or appreciate what progression looks like in your subject, any assessment you design will be flawed.

Daisy Christodoulou’s Making Good Progress is a helpful corrective, in particular her deliberate practice model of skill acquisition, which is extremely useful in explaining the manner in which different types of declarative and procedural knowledge can go into perfecting a more complex overarching skill. Similarly, Michael Fordham’s many posts on substantive and disciplinary knowledge, and how these might be mapped on to a history progression model are both interesting and instructive. Kris Boulton’s series of posts (inspired by some of Michael’s previous thinking) are also well worth a look. They consider the extent to which different subjects contain more substantive or disciplinary knowledge, and are useful points of reference for those seeking to understand how best to conceive of their subject and, in turn, design assessments that assess the range of underlying forms of knowledge.


5. Use the most appropriate format for the purpose of the assessment

The format of an assessment should be determined by its purpose. Typically, subjects are associated with certain formats. So, in English essay tasks are quite common, whilst in maths and science short exercises with right and wrong answers are more the norm. But as Dylan Wiliam suggests, although ‘it is common for different kinds of approaches to be associated with different subjects…there is no reason why this should be so.’ Wiliam draws a useful distinction between two modes of assessment: a marks for style approach (English, history, PE, art, etc.), where students gain marks for how well they complete a task, and a degree of difficulty approach (maths, science), where students gain marks for how far they progress in a task. It is entirely possible for subjects like English to employ degree of difficulty assessment tasks, such as multiple-choice questions, and for maths to set marks for style assessments, as this example of comparative judgement in maths clearly demonstrates.


In most cases, the purpose of assessment in the classroom will be formative and so designed to facilitate improvements to student learning. In such instances, where the final skill has not yet been perfected but is still very much a work in progress, it is unlikely that the optimal interim assessment format will be the same as the final assessment format. For example, a teacher who sets out to teach her students to construct well-written, logical and well-supported essays by the end of the year is unlikely to set essays every time she wants to infer her students’ progress towards that desired end goal. Instead, she will probably set short comprehension questions to check their understanding of the content that will go into the essay, or administer tests of their ability to deploy sequencing vocabulary effectively. In each of these cases, the assessment reflects the inferences about student understanding that the teacher is trying to draw, without confusing or conflating them with other things.

In the next post, I will outline our principles of assessment in relation to ethics and fairness. As I have repeatedly made clear, my intention is to help contribute towards a better understanding of assessment within the profession. I welcome anyone who wants to comment on our principles, or to critique anything that I have written, since this will help me to get a better understanding of assessment myself, and make sure the assessments that we ask our students to sit are as purposeful as possible.

Thanks for reading.



The Future of Assessment for Learning


Making Good Progress is an important book and should be required reading for anyone involved in designing, administering or interpreting assessments involving children. Given the significant changes to the assessment and reporting landscape at every level, notably in the secondary context at KS3, this book is a timely read, and for my money it is the most helpful guide to designing effective formative and summative assessment models currently available to teachers.

I’ve heard Daisy speak at various education events over the years, and it is interesting to see how many of these individual talks have fed into the development of this book. Making Good Progress is a coherent and highly convincing argument for re-evaluating our existing understanding and approach to formative assessment and for moving away from the widespread practice of using formative assessment for summative purposes.

Life after Levels

From what I can tell, schools have responded to the abolition of levels in three main ways. The first is business as usual: maintaining the use of levels – and thus ignoring the manifold problems associated with their misapplication – or recreating levels under another name. Such amended approaches appear to recognise the flaws of levels and offer something different, but in reality too often they end up simply representing the same thing, changing numbers to letters or something else equally fatuous. In many respects our first iteration after levels – the Elements of Assessment – fell foul of some of these same mistakes.

The second response to life after levels is the mastery-inspired model of assessment. In this approach subjects identify learning objectives for a student to master over the course of a year. This approach, which usually includes mapping out these myriad goals on a spreadsheet, appears more attractive in theory – what is to be learned is clearly articulated and not bundled up into a grade or prose descriptor – but in practice can prove equally unreliable and particularly unwieldy to maintain. Often the micro goals are watered down versions of the final assessment, not carefully broken down components of complex skills.

The final approach is the popular flight path model. This comes in various forms, but generally tends to focus on working backwards from GCSE grades to provide a clear ‘path’ from year 7 to year 11. I can understand the allure of this, and appreciate how such a model appears to offer school leaders a neat and tidy solution to levels. The problem is that learning is not this straightforward, and introducing the language of GCSE at year 7 seems to me to entirely miss the point of what assessment can and should be at this point of a child’s education – some five years before any terminal exam is to be sat!

As you read Daisy’s fantastic book, it becomes clear how all of these approaches to assessment are in one way or another fundamentally flawed: none of them really address the two underlying problems that ultimately did for levels, namely the tendency for interim (or formative) assessment to always look like the final task, and for assessment to happily double up for formative and summative purposes. Making Good Progress destroys these widely held beliefs, albeit in the kind and sympathetic manner of a former teacher who understands how all this mess came to pass.

Generic Skill versus Deliberate Practice

In chapter five Daisy takes up what, from my experience, is the biggest barrier to improvement in the use of assessment in schools: how teachers conceive of their subjects in the first place. Daisy carefully unpicks the misconception that initial tests should reflect the same format as the final assessment. She outlines two very different methods of skill acquisition that account for how interim assessments are constructed – the generic skill method (where skills are seen as transferable and are practised in a form close to their final version) and the deliberate practice method (where practice is deliberate and focused, and may look different in nature to the final version).

In the generic skill model, an interim assessment, such as a test of reading ability in English, will look very similar to the final assessment of reading at the end of the course, an essay or an extended piece of analysis in a GCSE exam, for instance. This approach, however, completely misunderstands how students learn such large and complex domains as reading, and prevents the interim assessment from being used formatively because it bundles up the many different facets of the domain and hides them in vague prose descriptors.

The alternative to this model, Daisy calls the deliberate practice model. Informed by the work of Anders Ericsson, this view of skill acquisition respects the limitations of working memory and recognises how complex skills are learnt by breaking down the whole skill into its constituent parts in an effort to build up the mental models that enable expertise. In this model very little, if any, practice tasks look like the final assessment. Sports coaches and music teachers have long understood the importance of this method, isolating specific areas of their domain for deliberate practice. As Daisy notes: ‘The aim of performance is to use mental models. The aim of learning is to create them.’

These two distinct approaches to skill development have significant consequences for the design and implementation of assessment in the classroom. If you are a history teacher and you teach in accordance with the generic model of skill acquisition, you will tend to set your students essays when you want to check their understanding of historical enquiry. You may get the illusion of progress through your summative judgements (an ‘emerging’ student might appear to become a ‘secure’ student from one assessment to the next), but neither you nor your students will really be any the wiser about what, if anything, has improved or, more to the point, what needs to be improved in the future.

Another history teacher might share the same desire to teach her students to write coherent historical essays. This teacher, however, knows this is an incredibly complex skill that requires sophisticated mental models underpinned by a breadth and depth of historical knowledge. This teacher isolates these specific areas and targets them for dedicated practice. When she checks for understanding, she sets tests that reflect these micro components, such as a timeline task to show students’ understanding of chronology, or a series of multiple-choice questions designed to ascertain their understanding of causality. Extended writing comes later, when the mental models are secure. For now, the results from these tasks provide useful, precise formative feedback.

Koretz and Wiliam

For much of the book, Daisy draws on the work of Daniel Koretz and Dylan Wiliam to support her arguments. Koretz’s Measuring Up is another great book, which outlines the design and purpose of standardised testing and how to interpret examination results in a sensible way. Wiliam’s work is equally instructive, in particular his SSAT pamphlet Principled Assessment Design, which is a helpful technical guide for school leaders on designing reliable and valid school assessments.

Making Good Progress complements both these other works, and together the three books tell you everything you need to know about how to construct valid, reliable and ethical assessments. Like Koretz and Wiliam, Daisy considers the key technical assessment concepts of reliability and validity, and similarly exposes the uses and abuses of assessment, which she does in such a way that makes the need to assess better seem urgent and necessary. What it also offers, however, in particular through the deliberate practice paradigm, is the means through which to improve assessment and to link it to a coherent progression model of learning.

If I had one minor criticism of Making Good Progress, it would be that the closing chapters that outline this coherent model of curriculum and assessment are perhaps a little idealistic. Whilst the arguments for more widespread use of textbooks to support a coherent model of progression are sound, and the idea to create banks of subject-specific diagnostic questions for formative assessment purposes makes complete sense, the chances of either of these things happening any time soon seems to me rather remote. Both require significant agreement amongst teachers on the nature of their disciplines, some kind of consensus around skill acquisition (as Daisy notes herself, the generic skill method is pervasive) and for schools to systematically work together. Oh, and stacks of investment too. None of these things seem likely in the current education climate.

One much bigger criticism of the book, which I really must take Daisy to task about, is that it was not written several years earlier. Whilst I get that it may have taken her a while to formulate her ideas, and perhaps a good few months more to write them out, it still seems pretty remiss of her not to have co-ordinated better with the DfE. Had Making Good Progress been published in 2013 when the abolition of National Curriculum levels was first announced (perhaps in a Waterstones 3 for 2 offer with Koretz and Wiliam), then I think that I, along with a number of other teachers, would not have wasted quite so much time and effort floundering around in the dark, trying to design something better than what went before, but often failing miserably.

Making Good Progress is a truly great read, and though its ostensible focus is on improving the use of formative assessment in schools, it covers a great deal of other ground in order to lay out the evidence to support the arguments. I enjoyed Daisy’s book immensely and commend it to anyone in the profession in any way involved with assessment, which is pretty much everyone!

ResearchEd 2015 – a sharper focus on what works


I didn’t go to the inaugural ResearchEd conference in 2013. I did, however, get a flavour of the day from my Twitter feed and then, subsequently, from the videoed sessions kindly put together after the event. Ben Goldacre set the tone for an enthusiastic response towards the idea of teaching as more of an evidence-based profession. This desire for a greater research base to inform educational decision-making was perhaps understandable: for too long teachers had been subjected to every unsubstantiated whim of national policy makers, and had borne the brunt of the workload generated by over-exuberant SLTs eager to demonstrate how they were meeting the latest outstanding criteria laid out by Ofsted. Research provided a possible means to challenge the status quo and build something better from the ground up.

As is often the case, early enthusiasm can be followed by cynicism and doubt. And so it came to pass that at last year’s conference, which I was lucky enough to attend, there was a definite air of caution towards the trumpeting of research in education. If Ben Goldacre symbolised ResearchEd13’s rallying cry for education to become more like the medical profession, then Dylan Wiliam struck a more cautious note about teaching and evidence becoming bedfellows. His provocatively titled talk, ‘Why teaching will never be a research-based profession, and why that’s a Good Thing’, carried a message heard in several other sessions I attended that day. Whilst most teachers generally felt an evidence-informed profession was a desirable thing – that was why they were there! – there was uncertainty about exactly what form it should take.

At Saturday’s fantastic third annual conference, held at South Hampstead High School in North West London, it felt like some of these doubts from last year had gone away, or rather that, at least in the sessions I went to or heard about, there was a growing confidence about how research could play a successful (and practical) role within education, one that also takes account of the legitimate concerns articulated previously: the lack of replicated findings, the prevalence of poorly constructed studies, compatibility with craft knowledge, and so on. As distinct as each session was, it seemed that the different facets of the profession had begun to work out what their relationship to research and evidence should be, and that there was something close to consistency across these different stakeholders. More importantly, there was greater clarity on the practical benefits of research for teachers in the classroom and for school leaders looking to establish the conditions for great teaching in their schools.

I certainly picked up something valuable from every presentation, an idea or approach that I can and will take back to my school and apply in my context. I’ve picked out four of the sessions that I attended and identified what I took away from the session, which is likely to be idiosyncratic and reflect my own school’s concerns.

Can we learn anything from ‘top performing’ education systems? Lucy Crehan

I was gutted to miss Lucy speak at the Festival of Education earlier in the year. I had dragged @teachertweeks half way across the school to hear Lucy, only to meet with a shut door and a queue of disgruntled edu-punters. I was not going to make the same mistake again, and so arrived early to get a seat. Lucy’s talk concentrated on the ways in which international evidence is often used incorrectly by governments to make policy decisions. Lucy has visited a number of countries with different education contexts, including Finland, Canada, New Zealand and Singapore, and spent time digging beneath the surface headlines to find out how their systems really work. Her accounts of different jurisdictions are fascinating and will no doubt make a great read when she gets round to writing them up in her crowd-sourced book, Cleverlands. There were many interesting observations about both the nature of different education settings and how data from these environments can be and has been misapplied by governments looking simplistically at other systems for solutions to their own educational problems.

# Takeaway point 1

Lucy talked about the amount of early intervention used in high-performing systems that operate benchmark standards for student achievement, intervention that often comes before the achievement gap gets too wide to do anything about. There are no higher standards (or objectives) for higher-attaining students, though in reality they do learn, do and know more than their weaker peers. This struck a chord with me and helped me think more about how we get to grips with intervention whilst there is still time.

Hack your own teacher-researcher career – Becky Allen

I think it is fair to say that Becky is not really a big fan of everyone becoming teacher researchers; she believes ‘almost all teachers should never do education research’ – it is just too small scale and often too poorly constructed to provide anything of any real value to the wider profession. In truth, she is not against the notion of the enquiring practitioner per se, just that it may not be the best use of a teacher’s time, and that it is never going to be the kind of thing that brings about system-wide change or the type of research that her organisation Education Datalab is interested in looking at. What Becky does like, and speaks passionately about, is big data: the size and scale of studies that can and should be used across systems to provide teachers with an evidence base to help them inform their practice. Becky’s talk outlined how those intent on contributing towards this kind of research could go about it, via avenues open to teachers today that were not available to her when she was forced to take a sabbatical to pursue further studies in research. Becky is an inspiring individual, though I tend to look more favourably on the potential of individual disciplined enquiry than she does.

# Takeaway point 2

The 10-step process that Becky used to explain how a willing teacher could get more involved in large-scale research is a fantastic sequence that I will definitely draw upon to help develop the lead learners at my school, and to encourage more teachers to engage with research in all its different forms. We may not be conducting large-scale studies here, but we can get better at our own classroom enquiry.


The swimming pool and the marathon: prioritizing cognitive and character development – Eric Kalenze

Personally, I thought Eric was a bit of a star turn. Not only was his talk excellent, but his passion for the event itself and his obvious amazement at what Tom and Helene have created over the last few years was a clear reminder of the extraordinary power of the ResearchEd initiative. Sometimes it takes an outsider to help you appreciate exactly what you have got right in front of you, even if it means that you have to unwittingly be part of a group selfie to prove to the folks back home that you exist! Eric’s talk drew upon one of the chapters of his recent book, Education is Upside Down, which, though I have not yet started it, I am reliably informed is excellent. It is clear from Eric that the American education system is facing many of the challenges that are only too familiar to those working on these shores, such as the overcorrection of curriculum time towards teaching soft skills caused by the Dweck-Duckworth juggernaut. Eric skilfully explored the pitfalls (and lack of evidence) of trying to teach non-cognitive skills in an isolated way, reminding us that even the notion of certain character traits may itself be suspect. For example, how many people have ever cleared their drive of snow out of intrinsic motivation? Eric hasn’t. His drive remains snow-free to keep his wife happy and his marriage intact.

# Takeaway point 3

I learnt a lot from Eric about how a warm smile and an upbeat manner can allow you to get in some pretty devastating critiques without ever appearing polemical or dismissive. His talk also strengthened my understanding that the best way to teach non-cognitive skills is through cognitive activities, and that this is not an either/or situation – that we ‘have to embrace both poles…in full knowledge of the essential contradictions’ between cognitive and non-cognitive outcomes.

Exam marking and re-marking: what do we know and how should we use what we know? Amanda Spielman

This time last year I was Amanda Spielman’s bitch! Let me explain. Amanda is a meticulous presenter and likes to continually move backwards and forwards across her in-depth slides to highlight comparisons, show trends and rearticulate previous points in light of new information presented. Without a clicker, she enlists a nearby manual alternative. Last time round it was me; this year the honour fell to one of her colleagues at Ofqual. For the love of God, someone please give this woman a clicker for future events. Daisy Christodoulou has already written about the complexity of this session, and it is certainly true that you need to be on it throughout her talk or you lose the thread of her carefully constructed points. I must admit, whilst she lost me at some parts, I nevertheless understood the main thrust of her presentation: that the exam marking system and the Enquiries About Results (EARs) process, as it stands, is much more robust than anecdotal accounts often suggest and, unless we are prepared to invest absurd amounts of time and resources, is probably as reliable and valid as it gets. Amanda outlined some fairly extensive research that Ofqual have conducted into different EAR methods. Even up against single- or double-blind remarking models, the current EAR model held up strongly, showing the same or greater reliability.

# Takeaway point 4

Perhaps not practical, but I did take away from this session the view that oversight of the assessment process is in much better hands than we are often led to believe. It is clear that the profession does not have anywhere near enough understanding of assessment procedures and how exactly the whole process works. It was reassuring, though not without its problems, to learn that the Ofqual research into EAR processes revealed an unconscious bias amongst examiners against lowering students’ grades on remark, perhaps influenced by the importance of the grade for the students’ future. Much is being done behind the scenes to make exam outcomes as valid and accurate as possible; we need to understand that if the assessment system continues to assess the range of skills and understanding we value, it probably can never truly be perfect.


I have to confess I didn’t stay to the end. I have made a vow to myself that this year I am going to strike a better work/life balance. Getting home to eat with my family won over seeing the likes of Jack Marwood, Sam Freedman and, of course, Tom Bennett.

To be honest, I am not sure how much more I could have gained from this fantastic event, which seems to be continuing to go from strength to strength.

Thank you, Tom and Helene.