Principles of Great Assessment #1 Assessment Design

This is the first in a short series of posts on our school’s emerging principles of assessment, which are split into three categories – principles of assessment design; principles of ethics and fairness; and principles for improving reliability and validity. My hope in sharing these principles is to help others develop greater assessment literacy, and to gain constructive feedback on our work so that we can improve and refine our model in the future.

In putting together these assessment principles and an accompanying CPD programme aimed at middle leaders, I have drawn heavily on a number of writers and speakers on assessment, notably Dylan Wiliam, Daniel Koretz, Daisy Christodoulou, Rob Coe and Stuart Kime. All of them have a great ability to convey difficult concepts (I only got a C grade in maths, after all) in a clear, accessible and, most importantly, practical way. I would very much recommend following up their work to deepen your understanding of what truly makes great assessment.

1. Align assessments with the curriculum

In many respects, this first principle seems pretty obvious. I doubt many teachers deliberately set out to create and administer assessments that are not aligned with their curriculum. And yet, for a myriad of reasons, this does seem to happen, with the result that students sit assessments that do not directly sample the content and skills of the intended curriculum. In these cases the results achieved, and any inferences drawn from them, are largely meaningless. If the assessment is not assessing the things that were supposed to have been taught, it is almost certainly a waste of time – not only for the students sitting the test, but for the teachers marking it as well.

Several factors can affect the extent to which an assessment is aligned with the curriculum and are important considerations for those responsible for setting assessments. The first is the issue of accountability. Where accountability is unreasonably high and a culture of fear exists, those writing assessments might be tempted to narrow down the focus to cover the ‘most important’ or ‘most visible’ knowledge and skills that drive that accountability. In such cases, assessment ceases to provide any useful inferences about knowledge and understanding.

Assessment can also become detached from the curriculum when that curriculum is not delineated clearly enough from the outset. If there is not a coherent, well-sequenced articulation of the knowledge and skills that students are to learn, then any assessment will always be misaligned, however hard someone tries to make the purpose of the assessment valid. A clear, well structured and shared understanding of the intended curriculum is vital for the enacted curriculum to be successful, and for any assessment of individual and collective attainment to be purposeful.

A final explanation for the divorce of curriculum from assessment is the knowledge and understanding of the person writing the assessment in the first place. To write an assessment that can produce valid inferences requires a solid understanding of the curriculum aims, as well as the most valid and reliable means of assessing them. Speaking for myself, I know that I have got a lot better at writing assessments that are properly aligned with the curriculum the more I have understood the links between the two and how to go about bridging them.

2. Define the purpose of an assessment first

Depending on how you view it, there are essentially two main functions of assessment. The first, and probably most important, purpose is as a formative tool to support teaching and learning in the classroom. Examples might include a teacher setting a diagnostic test at the beginning of a new unit to find out what students already know so their teaching can be adapted accordingly. Formative assessment, or responsive teaching, is an integral part of teaching and learning and should be used to identify potential gaps in understanding or misconceptions that can be subsequently addressed.

The second main function of assessment is summative. Whereas examination bodies certify student achievement, in the school context the functions of summative assessment might include assigning students to different groupings based upon perceived attainment, providing inferences to support the reporting of progress home to parents, or the identification of areas of underperformance in need of further support. Dylan Wiliam separates out this accountability function from the summative process, calling it the ‘evaluative’ purpose.

Whether the assessment is designed to support summative or formative inferences is not really the point. What matters is that the purpose of the assessment is made clear, and that the inferences it is intended to produce are widely understood. In this sense, the function of the assessment determines its form. A class test intended to diagnose student understanding of recently taught material will likely look very different from a larger-scale summative assessment designed to draw inferences about whether knowledge and skills have been learnt over a longer period of time. Form therefore follows function.

3. Include items that test understanding across the construct continuum

Many of us think about assessment in the reductive terms of specific questions or units, as if performance on question 1 of Paper 2 were actually a thing worthy of study in and of itself. Assessment should be about approximating student competence in the constructs of the curriculum. A construct can be defined as the abstract conception of a trait or characteristic, such as mathematical or reading ability. Direct constructs are tangible physical traits, such as height and weight, that can be measured using verifiable methods and stated units of measurement. Unfortunately for us teachers, most educational assessment deals with indirect constructs that cannot be measured in such easily understood units. Instead, they are estimated from questions that we think indicate competency, and that stand in for the thing that we cannot measure directly.

Within many indirect constructs, such as writing or reading ability, there is likely to be a continuum of achievement. So within the construct of reading, for instance, some students will be able to read with greater fluency and/or understanding than others. A good summative assessment therefore needs to differentiate between these differing levels of performance and, through the questions set, define what it means to be at the top, middle or bottom of that continuum. In this light, one of the functions of assessment has to be to estimate the position of learners on a continuum. We need to know this to evaluate the relative impact or efficacy of our curricula, and to understand how our students are progressing within them.

4. Include items that reflect the types of construct knowledge

Some of the assessments we use do not adequately reflect the range of knowledge and skills of the subjects they are assessing. Perhaps the format of terminal examinations has had too much negative influence on the way we think about our subjects and design assessments for them. In my first few years of teaching, I experienced considerable cognitive dissonance between my understanding of English and the way that it was conceived of within the profession. I knew my own education was based on reading lots of books, and then lots more books about those books, but everything I was confronted with as a new teacher – schemes of work, the literacy strategy, the national curriculum, exam papers – led me to believe that I should really be thinking of English in terms of skills like inference, deduction and analysis.

English is certainly not alone here, with history, geography and religious studies all suffering from a similar identity crisis. This widespread misconception of what constitutes expertise and how that expertise is gained probably explains, at least in part, why so many schools have been unable to envisage a viable alternative to levels. Like me, many of the people responsible for creating something new have themselves been infected by errors from the past and have found it difficult to see clearly that one of the big problems with levels was the way they misrepresented the very nature of subjects. And if you don’t fully understand or appreciate what progression looks like in your subject, any assessment you design will be flawed.

Daisy Christodoulou’s Making Good Progress is a helpful corrective, in particular her deliberate practice model of skill acquisition, which is extremely useful in explaining the manner in which different types of declarative and procedural knowledge can go into perfecting a more complex overarching skill. Similarly, Michael Fordham’s many posts on substantive and disciplinary knowledge, and how these might be mapped on to a history progression model are both interesting and instructive. Kris Boulton’s series of posts (inspired by some of Michael’s previous thinking) are also well worth a look. They consider the extent to which different subjects contain more substantive or disciplinary knowledge, and are useful points of reference for those seeking to understand how best to conceive of their subject and, in turn, design assessments that assess the range of underlying forms of knowledge.

5. Use the most appropriate format for the purpose of the assessment

The format of an assessment should be determined by its purpose. Typically, subjects are associated with certain formats. So, in English, essay tasks are quite common, whilst in maths and science, short exercises with right and wrong answers are more the norm. But as Dylan Wiliam suggests, although ‘it is common for different kinds of approaches to be associated with different subjects…there is no reason why this should be so.’ Wiliam draws a useful distinction between two modes of assessment: a ‘marks for style’ approach (English, history, PE, art, etc.), where students gain marks for how well they complete a task, and a ‘degree of difficulty’ approach (maths, science), where students gain marks for how far they progress in a task. It is entirely possible for subjects like English to employ marks-for-difficulty assessment tasks, such as multiple-choice questions, and for maths to set marks-for-style assessments, as this example of comparative judgement in maths clearly demonstrates.

In most cases, the purpose of assessment in the classroom will be formative and so designed to facilitate improvements to student learning. In such instances, where the final skill has not yet been perfected but is still very much a work in progress, it is unlikely that the optimal interim assessment format will be the same as the final assessment format. For example, a teacher who sets out to teach her students by the end of the year to construct well written, logical and well supported essays is unlikely to set essays every time she wants to infer her students’ progress towards that desired end goal. Instead, she will probably set short comprehension questions to check their understanding of the content that will go into the essay, or administer tests on their ability to deploy sequencing vocabulary effectively. In each case, the assessment targets the specific inferences about student understanding the teacher is trying to draw, without confusing or conflating them with other things.

In the next post, I will outline our principles of assessment in relation to ethics and fairness. As I have repeatedly made clear, my intention is to help contribute towards a better understanding of assessment within the profession. I welcome anyone who wants to comment on our principles, or to critique anything that I have written, since this will help me to get a better understanding of assessment myself, and make sure the assessments that we ask our students to sit are as purposeful as possible.

Thanks for reading.



ResearchED Brighton: inside out not bottom up

I have been to several ResearchEd events, but I have to say that I thought yesterday’s conference in Brighton was the best one, at least in terms of the amount and quality of ideas I took away with me. The high standard of the speakers certainly helped, as did the deliberate decision to make the event more intimate. It really did make a difference to be able to ask questions of the speakers and to share reflections during breaks. Once again, a big well done and thank you to Tom Bennett and Hélène Galdin-O’Shea, and to the University of Brighton hosts for offering up such a splendid and amenable venue.

If previous ResearchED events have been characterised by a bottom-up approach to the use of research in schools, yesterday seemed to be more about working from the inside out – a slightly nuanced adjustment to the metaphor of grassroots teacher professional development that I think better captures the way in which inquiry – in all its different guises – helps to grow the individual and, in turn, develop the organisation. However you frame the metaphor of what’s going on in educational circles at the moment, these events sure do beat the stale training days in expensive hotels of yesteryear.

The keynote session was delivered by the charismatic figure of Daniel Muijs. His very pertinent presentation was about the extent to which it is possible to reliably measure teacher effectiveness. Drawing upon a range of international research, including some of his own as well as the large-scale study into measuring teacher effectiveness conducted by the Bill and Melinda Gates Foundation, Muijs outlined the complex issues surrounding evaluating the performance of teachers. It was very clear that whilst every measure has advantages, these often come at a considerable cost and lead to many significant undesirable consequences.

Whilst the negative effects of using lesson observation for summative judgements are legion, Muijs did outline some of the ways in which it is possible to make them more effective, particularly if you are willing to invest the time, care and resource necessary to develop a coherent framework, such as the Charlotte Danielson model, and to train observers adequately on how to use it effectively. Even then, for observation to meet adequate standards of reliability and validity, somewhere between 6 and 12 observations per teacher per year are required. I doubt there are many schools up and down the country willing or able to invest that much resource into observing every member of staff throughout the course of the year. The conclusion was that whilst some kind of balance of measures is probably best, this is still far, far from being perfect.

I was glad I stayed in the main hall for the next session, even though that meant missing out on what I later heard was an excellent session by Becky Allen on avoiding some of the pitfalls of testing, tracking and targets. In the main lecture hall Louise Bamfield and Paul Foster introduced the Research Rich Schools website, the result of an initiative from the National College for Teaching and Leadership, which commissioned a group of teaching school alliances to develop a research and development framework tool in collaboration with the RSA. I haven’t had a chance to properly investigate the site yet, but it promises to be an excellent resource, not only for designated Research Leads, but more broadly for teachers and organisations interested in developing their engagement with research and inquiry a stage further. The different levels of emerging, expanding and embedding seem helpful for supporting schools that are at different phases of development.

The next session was led by Andy Tharby on the ways in which his school, Durrington, have formed a partnership with Brighton University to support their teachers in running robust small-scale research projects. Originally the talk was to be co-presented by Brian Marsh, the school’s ‘critical friend’ from the university and, from what I gathered, a great bloke and fantastic storyteller. Unfortunately, Brian had to pull out at the last minute, but Andy carried on undeterred. Perhaps I am a little biased – I rate Andy’s blog and think he is excellent company – but it was really interesting to learn how his school are building up their engagement with research by matching it at different levels to teacher interest and expertise. Whilst he admits it is still in its embryonic stage, the many benefits of having a professional researcher to support, challenge and guide classroom teachers in conducting their own classroom inquiry were clear.

I don’t usually think of educational conferences in terms of their comedy value, but James Mannion’s presentation was a hoot! A combination of his own humorous and engaging style and the benefits of a smaller, more interactive audience made this session both informative and enjoyable. James has spent the past six months or so working on developing an efficient and meaningful way to bridge the gap between educational research and classroom practice. He believes that ‘all teachers should systematically be engaged with professional inquiry’ and has developed a platform for this to happen. The Praxis pilot platform, ‘launched’ at the previous Research Leads conference in Cambridge, provides an excellent online space for teachers to upload their own research inquiries, where they can then be shared and critiqued by others.

What I particularly like about James’s project is the way in which he has thought extremely carefully about how to make the whole process as efficient and as user-friendly as possible. There is an inquiry planner which follows a helpful format for thinking about and organising small-scale research.

  • Title
  • Context
  • Research Question(s)
  • Brief literature review
  • Avenue of inquiry
  • Research methods (how are you going to collect data?)
  • Findings / analysis
  • Conclusions
  • Evaluation

Whilst I am not fully convinced about the overall aim of getting all teachers to be systematically engaged with professional inquiry (perhaps I simply need to know more about the terms of this statement), I find the sentiment behind it laudable and the effort expended on the project nothing short of remarkable. I can already think of several ways of incorporating James’s platform into the professional inquiry options on offer at my school. James will probably disagree, but I do see value in having a continuum of research options available for classroom teachers to engage with as part of their professional development. For James, the word Praxis, defined by Freire as ‘reflection and action upon the world, in order to transform it’, has much less baggage in educational circles than concepts like Lesson Study, practitioner-led research and disciplined inquiry. I am not so sure, and as Nick Rose pointed out, if anything it carries more of a trace of Marxist ideology. Anyway, for some, the small-scale, teacher-friendly Praxis model will be great; for others, the models implied by the terms ‘disciplined inquiry’ and ‘lesson study’ may be more appropriate. Perhaps it is all semantics.

My day ended with Nick Rose’s wonderful session on the different research tools he has developed to better facilitate teacher inquiry. In his role as research lead and leader of the coaching programme at his school, Nick has produced a number of excellent resources to better support the coaching process and help teachers to better understand what is going on in their classrooms. Some of these tools, all of which Nick stressed were for formative purposes only, included a classroom climate log, the use of student surveys and structured prompts to encourage focused self-reflection on targeted areas of professional development.

For me, Nick’s session provided a lovely counterpoint to the findings about lesson observation made in Daniel Muijs’s keynote, namely with regards to the different possibilities afforded to the profession by using observation as a formative practitioner tool rather than a high-stakes judgement mechanism. I liked many of the structured observation protocols Nick has developed on the back of Rob Coe’s work in relation to ‘thinking hard’ about subject content and poor proxies for learning. It was clear how these teaching and learning behaviours could be used as more proximate indicators of learning than the ones more commonly associated with the Ofsted framework, particularly within a supportive coaching framework.

Those of you familiar with Nick’s fantastic blog, Evidence into Practice, will already know that Nick is an astute and incredibly meticulous thinker. His real life presentation style is equally impressive and I came out of his session with my head bursting with ideas. I can’t remember being so intellectually stretched by the complexity and range of ideas on offer in a session before, so when Nick announced at the end that ‘he has only just got started with this work’, I joined with everyone else in spontaneous laughter. Has there ever been such an example of ironic self-deprecation before? Probably not.

This was a wonderful day with wonderful people.

Thank you to all at ResearchED.