Principles of Great Assessment #2 Validity and Fairness


This is the second of a three-part series on the principles of great assessment. In my last post I focused on some principles of assessment design. This post outlines the principles that relate to validity and fairness.* As I have repeatedly stressed, I do not consider myself an expert in the field of assessment, so I am more than happy to accept constructive feedback to help me learn and to improve upon the understanding of assessment that we have already developed as a school. My hope is that these posts will help others to learn a bit more about assessment, and for the assessments that students sit to be as purposeful and supportive of their learning as possible.

So, here are my principles of great assessment 6-10.

6. Regularly review assessments in light of student responses

Validity in assessment is extremely important. For Daniel Koretz it is ‘the single most important criterion for evaluating achievement testing.’ Often when teachers talk about an assessment being valid or invalid, they are using the term incorrectly. In assessment, validity means something very different from what it means in everyday language. Validity is not a property of a test, but rather of the inferences that an assessment is designed to produce. As Lee Cronbach observes, ‘One validates not a test but an interpretation of data arising from a specified procedure’ (Cronbach, 1971).

There is therefore no such thing as a valid or invalid assessment. A maths assessment with a high reading age might be considered to provide valid inferences for students with a high reading age, but invalid inferences for students with low reading ages. The same test can therefore provide both valid and invalid inferences depending on its intended purpose, which links back to the second assessment principle: the purpose of the assessment must be set and agreed from the outset. Validity is thus specific to particular uses in particular contexts and is not an ‘all or nothing’ judgement but rather a matter of degree and application.

If you understand that validity applies to the inferences that assessments provide, then you should be able to appreciate why it is so important to make sure that an assessment gives as valid inferences about student achievement as possible, particularly when there are significant consequences attached for the students taking them, like attainment grouping. There are two main threats to achieving this validity: construct under-representation and construct irrelevance. Construct under-representation refers to when a measure fails to capture important aspects of the construct, whilst construct irrelevance refers to when a measure is influenced by things other than the construct itself, as in the example of the high reading age in a maths assessment.

There are a number of practical steps that teachers can take to help reduce these threats to validity and, in turn, to increase the validity of the inferences provided by their assessments. Some are fairly obvious and can be implemented with little difficulty, whilst others require a bit more technical know-how and/or a well-designed systematic approach that provides teachers with the time and space needed to design and review their assessments on a regular basis.

Here are some practical steps educators can take:

Review assessment items collaboratively before a new assessment is sat

Badly constructed assessment items create noise and can lead to students guessing the answer. Where possible, it is therefore worth spending some time and effort upfront, reviewing items in a forthcoming summative assessment before they go live so that any glaring errors around the wording can be amended, and any unnecessary information can be removed. Aside from making that assessment more likely to generate valid inferences, such an approach has the added advantage of training those less confident in assessment design in some of the ways of making assessments better and more fit for purpose. In an ideal world, an important assessment should be piloted first to provide some indication of issues with items, and the likely spread of results across an ability profile. This will not always be possible.

Check questions for cues and contextual nudges

Another closely linked problem, and another potential threat to validity, is flawed question phrasing that inadvertently reveals the answer, or provides students with enough contextual cueing to narrow down their responses on the basis of semantic or grammatical fit. In the example item from a PE assessment below, for instance, the phrasing of the question, namely the grammatical construction of the words and phrases around the gaps, makes anaerobic and aerobic the more likely candidates for the correct answer. They are adjectives which precede nouns, whilst the rest of the options are all nouns, and a noun followed by a noun would sound odd to a native speaker. A student might select anaerobic and aerobic, not because they necessarily know the correct answer, but because they sound correct in accordance with the syntactical cues provided. This is a threat to validity in that the inference is perhaps more about grammatical knowledge than understanding of bodily processes.

Example: The PE department have designed an end of unit assessment to check students’ understanding of respiratory systems. It includes the following types of item.

Task: use two of the following words to complete the passage below

Anaerobic, Energy, Circulation, Metabolism, Aerobic 

When the body is at rest this is ______ respiration. As you exercise you breathe harder and deeper and the heart beats faster to get oxygen to the muscles. When exercising very hard, the heart cannot get enough oxygen to the muscles. Respiration becomes _______.

Interrogate questions for construct irrelevance

If the purpose of an assessment has been clearly established from the outset and that assessment has been clearly aligned to the constructs within the curriculum, then a group of subject professionals working together should be able to identify items where things other than the construct are being assessed. Obvious examples are high reading ages that get in the way of assessments of mathematical or scientific ability, but sometimes it might be harder to detect, as with the example below. To some, this item might seem fairly innocuous, but on closer inspection it becomes clear that it is not assessing vocabulary knowledge as purported, but rather spelling ability. Whilst it may be desirable for students to spell words correctly, inferences about word knowledge would not be possible from an assessment with these kinds of items in it.

Example: The English department designs an assessment to measure students’ vocabulary skills. The assessment consists of 40 items like the following:

Task: In all of the ________________ of packing into a new house, Sandra forgot about washing the baby.

  1. Excitement
  2. Excetmint
  3. Excitemant
  4. Excitmint

7. Standardise assessments that lead to important decisions

Teachers generally understand the importance of making sure that students sit final examinations in an exam hall under the same conditions as everyone else taking the test. Mock examinations tend to replicate these conditions, because teachers and school leaders want the inferences provided by them to be as valid and fair as possible. For all manner of reasons, though, this insistence on standardised conditions for test takers is less rigorously adhered to lower down the school, even though some of the decisions based upon such tests in years 7 and 8 arguably carry much more significance for students than any terminal examination.

I know that I have been guilty of not properly understanding the importance of standardising test conditions. On more than one occasion I have set an end of unit or term assessment as a cover activity, thinking that it was ideal work because it would take students the whole lesson to complete and they would need to work in silence. I hadn’t appreciated how assessment is a bit more complicated than that, even for something like an end of unit test. I hadn’t considered, for instance, that it mattered whether students got the full hour, or more likely 50 minutes if it was set by a cover supervisor who had to spend valuable time settling the class. I hadn’t taken on board that it would make a difference if my class sat the assessment in the afternoon, and the class next door completed theirs bright and early in the morning.

It may well be that my students would have scored exactly the same whether or not I was present, whether they sat the test in the morning or in the afternoon, or whether they had 50 minutes or the full hour. The point is that I could not be sure, and that if one or more of my students would have scored significantly higher (or lower) under different circumstances, then their results would have provided invalid inferences about their understanding. If they were then placed in a higher or lower group as a result, or I reported home to their parents some erroneous information about their test scores, which possibly affected their motivation or self-efficacy, then you could suggest that I had acted unethically.

8. Important decisions are made on the basis of more than one assessment

Imagine you are looking to recruit a new head of science. Now imagine the even more unlikely scenario that you have received a strong field of applicants, which, I appreciate, is a bit of a stretch of the imagination in the current recruitment climate. With such a strong field for such an important post, a school would be unlikely to decide whom to appoint based upon the inferences provided by one single measure, such as an application letter, a taught lesson or an interview. More likely, they would triangulate all these different inferences about each candidate’s suitability for the role when making their decision, and even then they would be crossing their fingers that they had made the right choice.

A similar principle is at work when making important decisions on the back of student assessment results, such as which group to place students in the following term, which individuals need additional support, or how much progress, if any, to report home to parents. In each of these cases, as with the head of science example, it would be wise to draw upon multiple inferences in order to make a more informed decision. This is not to advocate an exponential increase in the number of tests students sit, but rather to recognise that when the stakes are high, it is important to make sure the information we use is as valid as possible. Cross-referencing examinations is one way of achieving this, particularly given the practical difficulties of standardising assessments previously discussed.
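As a rough sketch of what triangulating results might look like in practice (this is my own illustration, not a method from any of the writers mentioned here; the function names and the 1.0 ‘disagreement’ threshold are invented for the example): put each assessment’s raw marks on a common scale by standardising against the cohort, then check whether the resulting z-scores broadly agree before acting on them.

```python
from statistics import mean

def standardise(mark, cohort_mean, cohort_sd):
    """Express a raw mark as a z-score relative to its cohort."""
    return (mark - cohort_mean) / cohort_sd

def triangulate(results, disagreement_threshold=1.0):
    """Combine one student's results from several assessments.

    results: list of (mark, cohort_mean, cohort_sd) tuples, one per assessment.
    Returns the average z-score and a flag indicating whether the assessments
    disagree enough that the evidence should be reviewed before any
    high-stakes decision (grouping, reporting, intervention).
    """
    zs = [standardise(m, mu, sd) for m, mu, sd in results]
    spread = max(zs) - min(zs)
    return mean(zs), spread > disagreement_threshold

# One student's marks on three assessments, with each cohort's mean and sd.
results = [
    (68, 60, 8),    # end-of-unit test
    (55, 50, 10),   # standardised mock
    (72, 65, 7),    # cross-referenced paper
]
avg_z, needs_review = triangulate(results)
# z-scores are 1.0, 0.5 and 1.0: broadly consistent, so needs_review is False
```

If the three assessments pointed in very different directions instead, that would be a cue to gather more evidence rather than act on any single result.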

9. Timing of assessment is determined by purpose and professional judgement

The purpose of an assessment informs its timing. Whilst this makes perfect sense in the abstract, in practice there are many challenges to making this happen. In Principled Assessment Design, Dylan Wiliam notes how it is relatively straightforward to create assessments which are highly sensitive to instruction if what is taught is not hard to teach and learn. For example, if all I wanted to teach my students in English was vocabulary, and I set up a test that assessed them on the 20 or so words that I had recently taught them, it would be highly likely that the test would show rapid improvements in their understanding of these words. But as we all know, teaching is about much more than just learning a few words. It involves complex cognitive processes and vast webs of interconnected knowledge, all of which take a considerable amount of time to teach, and in turn to assess.


It seems that the distinction between learning and performance is becoming increasingly well understood, though perhaps in terms of curriculum and assessment its widespread application to the classroom is taking longer to take hold. The reality for many established schools is that it is difficult to construct a coherent curriculum, assessment and pedagogical model across a whole school that embraces the full implications of the difference between learning and performance. It is hard enough to get some colleagues to fully appreciate the distinction, and its many nuances, so indoctrinated are they by years of the wrong kind of impetus. Added to this, whilst there is general agreement that assessing performance can be unhelpful and misleading, there is no real consensus on the optimal time to assess for learning. We know that assessing soon after teaching is flawed, but not exactly when to assess to capture longer-term learning. Compromise is probably inevitable.

What all this means in practical terms is that schools have to work within their localised constraints, including issues of timetabling, levels of understanding amongst staff and, crucially, the time and resources to enact the theory once it is known and understood. Teacher workload must also be taken into account when deciding upon the timing of assessments: recognising certain pinch points in the year and building a coherent assessment timetable that respects the division between learning and performance, builds in opportunities to respond to (perceived) gaps in understanding and spreads out the emotional and physical demands on staff and students. Not easy, at all.

10. Identify the range of evidence required to support inferences about achievement

Tim Oates’ oft-quoted advice to avoid assessing ‘everything that moves, just the key concepts’ is important to bear in mind, not just for those responsible for assessment, but also for those who design the curricula with which those assessments are aligned. Despite the freedoms afforded by the removal of levels and the greater autonomy possible with academy status, many of us have still found it hard to narrow down what we teach to what is manageable and most important. We find it difficult in practice to sacrifice breadth in the interests of depth, particularly where we feel passionately that so much is important for students to learn. I know it has taken several years for our curriculum leaders to truly reconcile themselves to the need to strip out some content and focus on teaching the most important material to mastery.

Once these ‘key concepts’ have been isolated and agreed, the next step is to make sure that any assessments cover the breadth and depth required to gain valid inferences about student achievement of them. I think the diagram below, which I used in my previous blog, is helpful in illustrating how assessment designers should be guided by both the types of knowledge and skills that exist within the construct (the vertical axis) and the levels of achievement across each component, i.e. the continuum (horizontal axis). This will likely look very different in some subjects, but it nevertheless provides a useful conceptual framework for thinking about the breadth and depth of items required to support valid inferences about levels of attainment of the key concepts.

[Diagram: types of knowledge and skills (vertical axis) plotted against the construct continuum (horizontal axis)]

My next post, which I must admit I am dreading writing and releasing for public consumption, will focus on trying to articulate a set of principles around the very thorny and complicated area of assessment reliability. I think I am going to need a couple of weeks or so to make sure that I do it justice!

Thanks for reading!


* I am aware the numbering of the principles on the image does not match the numbering in my post. That’s because the image is a draft document.


Principles of Great Assessment #1 Assessment Design


This is the first in a short series of posts on our school’s emerging principles of assessment, which are split into three categories – principles of assessment design; principles of ethics and fairness; and principles for improving reliability and validity. My hope in sharing these principles of assessment is to help others develop greater assessment literacy, and to gain constructive feedback on our work to help us improve and refine our model in the future.

In putting together these assessment principles and an accompanying CPD programme aimed at middle leaders, I have drawn heavily on a number of writers and speakers on assessment, notably Dylan Wiliam, Daniel Koretz, Daisy Christodoulou, Rob Coe and Stuart Kime. All of these have a great ability to convey difficult concepts (I only got a C grade in maths, after all) in a clear, accessible and, most importantly, practical way. I would very much recommend following up their work to deepen your understanding of what truly makes great assessment.

1. Align assessments with the curriculum


In many respects, this first principle seems pretty obvious. I doubt many teachers deliberately set out to create and administer assessments that are not aligned with their curriculum. And yet, for a myriad of different reasons, this does seem to happen, with the result that students sit assessments that are not directly sampling the content and skills of the intended curriculum. In these cases the results achieved, and the ability to draw any useful inferences from them, are largely redundant. If the assessment is not assessing the things that were supposed to have been taught, it is almost certainly a waste of time – not only for the students sitting the test, but for the teachers marking it as well.

Several factors can affect the extent to which an assessment is aligned with the curriculum and are important considerations for those responsible for setting assessments. The first is the issue of accountability. Where accountability is unreasonably high and a culture of fear exists, those writing assessments might be tempted to narrow down the focus to cover the ‘most important’ or ‘most visible’ knowledge and skills that drive that accountability. In such cases, assessment ceases to provide any useful inferences about knowledge and understanding.

Assessment can also become detached from the curriculum when that curriculum is not delineated clearly enough from the outset. If there is not a coherent, well-sequenced articulation of the knowledge and skills that students are to learn, then any assessment will always be misaligned, however hard someone tries to make the inferences it provides valid. A clear, well structured and shared understanding of the intended curriculum is vital for the enacted curriculum to be successful, and for any assessment of individual and collective attainment to be purposeful.

A final explanation for the divorce of curriculum from assessment is the knowledge and understanding of the person writing the assessment in the first place. To write an assessment that can produce valid inferences requires a solid understanding of the curriculum aims, as well as the most valid and reliable means of assessing them. Speaking for myself, I know that I have got a lot better at writing assessments that are properly aligned with curriculum the more I have understood the links between the two and how to go about bridging them.

2. Define the purpose of an assessment first

 Depending on how you view it, there are essentially two main functions of assessment. The first, and probably most important, purpose is as a formative tool to support teaching and learning in the classroom. Examples might include a teacher setting a diagnostic test at the beginning of a new unit to find out what students already know so their teaching can be adapted accordingly. Formative assessment, or responsive teaching, is an integral part of teaching and learning and should be used to identify potential gaps in understanding or misconceptions that can be subsequently addressed.

The second main function of assessment is summative. Whereas examination bodies certify student achievement, in the school context the functions of summative assessment might include assigning students to different groupings based upon perceived attainment, providing inferences to support the reporting of progress home to parents, or the identification of areas of underperformance in need of further support. Dylan Wiliam separates out this accountability function from the summative process, calling it the ‘evaluative’ purpose.

Whether the assessment is designed to support summative or formative inferences is not really the point. What matters here is that the purpose or function of the assessment is made clear to all and that the inferences the assessment is intended to produce are widely understood by all. In this sense, the function of the assessment determines its form. A class test intended to diagnose student understanding of recently taught material will likely look very different from a larger scale summative assessment designed to draw inferences about whether knowledge and skills have been learnt over a longer period of time. Form therefore follows function.

3. Include items that test understanding across the construct continuum

Many of us think about assessment in the reductive terms of specific questions or units, as if performance on question 1 of Paper 2 were actually a thing worthy of study in and of itself. Assessment should be about approximating student competence in the constructs of the curriculum. A construct can be defined as the abstract conception of a trait or characteristic, such as mathematical or reading ability. Direct constructs measure tangible physical traits like height and weight and are calculated using verifiable methods and stated units of measurement. Unfortunately for us teachers, most educational assessment assesses indirect constructs that cannot be directly measured in such easily understood units. Instead, they are estimated from questions that we think indicate competence, and that stand in for the thing that we cannot measure directly.

Within many indirect constructs, such as writing or reading ability, there is likely to be a continuum of possible achievement. So within the construct of reading, for instance, some students will be able to read with greater fluency and/or understanding than others. A good summative assessment therefore needs to differentiate between these differing levels of performance and, through the questions set, define what it means to be at the top, middle or bottom of that continuum. In this light, one of the functions of assessment has to be to estimate the position of learners on a continuum. We need to know this to evaluate the relative impact or efficacy of our curricula, and to understand how our students are progressing within it.


4. Include items that reflect the types of construct knowledge

Some of the assessments we use do not adequately reflect the range of knowledge and skills of the subjects they are assessing. Perhaps the format of terminal examinations has had too much negative influence on the way we think about our subjects and design assessments for them. In my first few years of teaching, I experienced considerable cognitive dissonance between my understanding of English and the way that it was conceived of within the profession. I knew my own education was based on reading lots of books, and then lots more books about those books, but everything I was confronted with as a new teacher – schemes of work, the literacy strategy, the national curriculum, exam papers – led me to believe that I should really be thinking of English in terms of skills like inference, deduction and analysis.

English is certainly not alone here, with history, geography and religious studies all suffering from a similar identity crisis. This widespread misconception of what constitutes expertise and how that expertise is gained probably explains, at least in part, why so many schools have been unable to envisage a viable alternative to levels. Like me, many of the people responsible for creating something new have themselves been infected by errors from the past and have found it difficult to see clearly that one of the big problems with levels was the way they misrepresented the very nature of subjects. And if you don’t fully understand or appreciate what progression looks like in your subject, any assessment you design will be flawed.

Daisy Christodoulou’s Making Good Progress is a helpful corrective, in particular her deliberate practice model of skill acquisition, which is extremely useful in explaining the manner in which different types of declarative and procedural knowledge can go into perfecting a more complex overarching skill. Similarly, Michael Fordham’s many posts on substantive and disciplinary knowledge, and how these might be mapped on to a history progression model are both interesting and instructive. Kris Boulton’s series of posts (inspired by some of Michael’s previous thinking) are also well worth a look. They consider the extent to which different subjects contain more substantive or disciplinary knowledge, and are useful points of reference for those seeking to understand how best to conceive of their subject and, in turn, design assessments that assess the range of underlying forms of knowledge.


5. Use the most appropriate format for the purpose of the assessment

 The format of an assessment should be determined by its purpose. Typically, subjects are associated with certain formats. So, in English essay tasks are quite common, whilst in maths and science, short exercises where there are right and wrong answers are more the norm. But as Dylan Wiliam suggests, although ‘it is common for different kinds of approaches to be associated with different subjects…there is no reason why this should be so.’ Wiliam draws a useful distinction between two modes of assessment: a marks for style approach (English, history, PE, Art, etc.), where students gain marks for how well they complete a task, and a degree of difficulty approach (maths, science), where students gain marks for how well they progress in a task. It is entirely possible for subjects like English to employ marks for difficulty assessment tasks, such as multiple choice questions, and maths to set marks for style assessments, as this example of comparative judgement in maths clearly demonstrates.


In most cases, the purpose of assessment in the classroom will be formative and so designed to facilitate improvements to student learning. In such instances, where the final skill has not yet been perfected but is still very much a work in progress, it is unlikely that the optimal interim assessment format will be the same as the final assessment format. For example, a teacher who sets out to teach her students by the end of the year to construct well written, logical and well supported essays is unlikely to set essays every time she wants to infer her students’ progress towards that desired end goal. Instead, she will probably set short comprehension questions to check their understanding of the content that will go into the essay, or administer tests on their ability to deploy sequencing vocabulary effectively. In each of these cases, the assessment reflects the inferences about student understanding that the teacher is trying to draw, without confusing or conflating them with other things.

In the next post, I will outline our principles of assessment in relation to ethics and fairness. As I have repeatedly made clear, my intention is to help contribute towards a better understanding of assessment within the profession. I welcome anyone who wants to comment on our principles, or to critique anything that I have written, since this will help me to get a better understanding of assessment myself, and make sure the assessments that we ask our students to sit are as purposeful as possible.

Thanks for reading.



Principles of Great Assessment: Increasing the Signal and Reducing the Noise


After the government abolished National Curriculum levels, there was a great deal of initial rejoicing from both primary and secondary teachers about the death of a flawed system of assessment. Many, including myself, delighted in the freedom afforded to schools to design their own assessment systems anew. At the time I had already been working on a model of assessment for KS3 English – the Elements of Assessment – and believed that the new freedoms were a positive step in improving the use of assessment in schools.

Whilst I still think that the decision to abolish levels was correct, I am no longer quite so sure about the manner and timing in which they were removed. Since picking up responsibility for assessment across the school, I have come to realise just how damaging it was for schools to have to invent their own alternatives to levels without anywhere near enough assessment expertise to do so well. Inevitably, many schools simply recreated levels under a different name, or retreated into the misguided safety of the flight path approach.

I would like to think that our current KS3 assessment model, the Elements of Expectation, has the potential to be a genuine improvement on National Curriculum levels, supporting learning and providing reliable summative feedback on student progress at sensible points in the calendar. Even though it is in its third year, however, it is still not quite right. One of the things that I think is holding us back is our lack of assessment literacy. I am probably one of the more informed staff members on assessment, but most of what I know has been self-taught from reading some books and hearing a few people talk.

This year, in an effort to do something about this situation and to finally get our KS3 model closer to what we want, we have run some extensive professional development on assessment. Originally, I had intended to send some colleagues to Evidence Based Education’s inaugural Assessment Academy. It looks superb and represents an excellent opportunity to learn much more about assessment. But when it became clear budget constraints would make this difficult, we decided to set up and run our own in-house version: not as good (obviously) and inevitably rough around the edges, but good enough, I think, for our KS3 Co-ordinators and heads of subjects to develop the expertise they need to improve their use of assessment with our students.

The CPD is iterative and runs throughout the course of the year. So far, we have established a set of assessment principles that we will use to guide the way we design, administer and interpret assessments in the future. In the main, these principles apply to the use of medium to large-scale assessments, where the inferences drawn will be used to inform relatively big decisions, such as proposed intervention, student groupings, predictions, reporting progress, etc. Assessment as a learning event is pretty well understood by most of our teachers and is already a feature of many of our classrooms, so our focus is more on improving the validity and reliability of our summative inferences.

I thought it might be useful and timely to share these principles over a series of posts, especially as a lot of people still seem to be struggling, like us, to create something better and more sustainable than levels. The release of Daisy Christodoulou’s book Making Good Progress has undoubtedly been a great and timely help, and I intend it to provide some impetus to our sessions going forward, as we look to implement some of the theory we covered before Christmas into something practical and useful. This excellent little resource from Evidence Based Education is an indication of some of the fantastic work out there on improving assessment literacy. I hope I can add a little more in my next few posts.

If we are going to take the time and the trouble to get our students to sit assessments, then we want to make sure that the information is as reliable and valid as possible, and that we don’t try and ask our assessments to do too much. The first in my series of blogs will be on our principles of assessment design, with the other two on ethics and fairness and then, finally, reliability and validity.

All constructive feedback welcome!

Cleverlands – A Smart Read


Lucy Crehan’s Cleverlands is a great read. As well as providing a fantastic overview of the workings of many of the world’s leading education systems, Cleverlands also offers a unique insight into the culture and people who live and breathe those systems on a daily basis – the parents, the teachers, and the students themselves, all of whom, in one way or another, are asking the same fundamental questions about education: what should young people learn, and how can we make education better and fairer for all?

After working for three years in a London secondary school teaching science, Crehan wanted to find out more about why some countries seemed to perform better with their educational outcomes than others, at least according to PISA assessment scores. She embarked on a journey that took in most of the world’s prominent education jurisdictions – the usual suspects such as Singapore, Finland and Japan – with the aim of getting to the heart of the reality behind the statistics of national comparison data. The result is this fantastic book, written by someone who clearly understands that headlines only ever tell part of the story, and who has a keen eye for the nuance of research data, which we know is too often appropriated by those looking for quick fixes and easy answers.

Cleverlands is organised into 18 perfectly weighted chapters that each focus on exploring an aspect of a particular educational system. One of the things that makes this such a pleasurable read is the clarity of Crehan’s writing, and in particular her effortless blend of travelogue, considered analysis and opinion. One moment we’re inside the home of one of the many teachers who agree to house her during her travels, to show her around their schools and to act as her interpreters, and the next we’re taking a step back to review the bigger picture, looking at the research, or learning about a country’s social and cultural history. Throughout I felt cheered by the essential kindness of strangers, and by the way that so many teachers around the world were willing to help Crehan on the back of just a few speculative emails, which she herself admits were rather optimistic and the potential actions of a ‘lunatic’.

There is something to learn from each of the countries under examination. Not so much in terms of directly taking any of the ideas or approaches being described and blindly applying them to a different classroom, school or even system – the book is clear that, despite what some politicians might think, it’s a bit more complicated than that – but more in the sense that the insights the book offers into the lives of others enable a greater understanding of ideas, beliefs and practices much closer to home. In many respects, I found myself thinking how much we fall short in comparison to our international colleagues, and I don’t mean in PISA scores, which often don’t reveal the complete picture.

Compared to the very best education systems, ours does not appear to be very systematic at all, at least where it really matters: in developing great teachers, in raising the status of the profession and in giving the time and resource necessary to genuinely improve educational outcomes. Whether or not you like the Singaporean approach to widespread streaming (I suspect you won’t and, to be fair, neither it seems do the Singaporeans), you have to admire the fact they have a coherent plan, one that a great deal of thought went into producing. Time after time what emerges from each of the stories of educational success, from China to Canada, is the notion of coherence and joined-up thinking. There are drawbacks, caveats and nuance aplenty, but at least the world’s leading education nations have a strategy, whereas all we seem to have is fracture, self-interest and free market chaos.

Depending on how much you read about teaching or follow education policy in the media, there will be bits of Cleverlands that you will probably already know about, or at the very least with which you will be quite familiar. For instance, Japan’s large class sizes, high levels of parental engagement and collaborative teaching practices will be common knowledge to most who will seek out this book in the first place. Likewise the triumphs of the Finnish system that led to the outstanding results of the 2006 PISA report are well documented, in particular the high standard of teacher training, the prestige of the profession in society and the role of high quality textbooks in ensuring curriculum coherence. Familiar too will be the backlash against this success and the supposed fall of the Finnish star in recent years.

But even within the familiar, there are surprises and lesser known, but nevertheless fascinating, observations. For instance, the significant changes in demographics that Finland has faced in the last 20 or so years were news to me, as was their heavy investment in a multi-disciplinary approach to tackling welfare issues early on in a child’s education. Crehan describes the weekly meetings that take place in Finnish schools between education specialists and class teachers to discuss individual students and devise plans to tackle their social and academic needs. Whilst there are, admittedly, signs of the all too recognisable bureaucracy here, as Crehan rightly points out, it’s ultimately the right approach at the right time. Whereas the Finns look to act on disparity and need early on, in this country we tend to put ‘interventions into place that attempt to deal with a symptom of a problem, rather than its underlying cause.’ Too little, too late, in other words.

Cleverlands is published by Unbound using a crowd-funding model, where readers who like the sound of the book’s synopsis contribute to its production. Judging by how quickly Crehan reached her target, it’s clear that there is a lot of interest in this kind of well-informed, well-written educational voyeurism. Whilst there are similar books on the market – I’m thinking of Amanda Ripley’s The Smartest Kids in the World – nothing I’ve read quite manages to achieve the same happy balance between human sentiment and cool analysis. Clearly Crehan’s previous incarnation as a teacher has helped her to focus on the things that we want to know, and to present them in such an engaging way that you are left feeling better informed, if also slightly frustrated at the continued failings and short-sightedness of our own not-so-clever land. This really is a smart read – buy a copy as soon as you can.

Disciplined enquiry, or how to get better at getting better


How do you know what to do to improve your teaching? And if you can identify what you need to do to get better, how do you know whether what you are doing to try and improve is actually making a difference where it really matters: in developing your students’ learning?

I think there are probably five main sources available to teachers to help them identify areas for their improvement. These are the data on their students’ outcomes, feedback from their colleagues, feedback from their students, research evidence into what works and where, and, finally, their own reflections on their practice.

Each of these sources can be extremely useful, providing teachers with valuable insights into where they might need to focus. Equally, they can all be very unhelpful, giving unreliable feedback on areas of strength and weakness, particularly where limitations and nuances are not fully understood, or where potential improvement tools are used as performance measures.

Perhaps the best approach is to take a number of these sources of feedback together, increasing the likelihood of identifying genuine areas for improvement. In subsequent posts, I hope to outline a framework that harnesses these feedback mechanisms into a clear and systematic structure, but for now I want to focus on exploring just one means of self-improvement: getting better at being you.

In many respects, you are both the best source of feedback and the worst source of feedback; you can be wise and foolish in equal measure! The problem is that, whilst you are undoubtedly the one who spends the most time with your students and the one who thinks the most carefully about how to help them improve, you are also extremely prone to bias and flawed thinking, which can make it hard for you to trust your judgements, especially in relation to developing your own practice.

Others have written extensively about human fallibility and the dangers of trusting instinct. Daniel Kahneman’s Thinking, Fast and Slow, David Didau’s What If Everything You Knew About Education Was Wrong? and David McRaney’s You Are Not So Smart all provide excellent insights into how we humans routinely get things wrong. It is clear, then, that we need to understand and respect our cognitive limitations and avoid thinking we know what works just because it feels right. Instinct is not enough. That said, I believe we can be useful sources of feedback in relation to improving our own teaching, particularly if we can learn how to reduce the impact of our biases and can get better at being more objective.

What is disciplined enquiry?

Honing the skills of restrained reflection is the hallmark of a disciplined enquirer, and disciplined enquiry is what I have come to think is probably the best way we can grow and develop as a profession. Like many terms in education, disciplined enquiry means lots of different things to lots of different people. For me, it represents the intersection between the science and the craft of teaching, and involves a systematic approach that encourages teachers to ‘think hard’ about their improvement and to make use of the best available evidence to inform their decision-making. My definition of a disciplined enquirer tries to capture this complexity:

A disciplined enquirer draws upon internal and external experience – they operate as both subject and object in relation to improving their own practice. Through a systematic framework a disciplined enquirer develops the ability to limit the impact of bias, whilst learning how to become more attuned to interpreting the complexity of the classroom, such as appreciating the role of emotions, the impact of actions and the nature of relationships. Over time, and through deliberate noticing, they become increasingly sensitive to patterns of behaviour, learning how to react better in the moment and how to make better decisions in the future.

Understanding how we make decisions

Perhaps the first step to becoming a disciplined enquirer is to recognise the nature of decision-making itself. Kahneman’s model of system one and system two thinking is instructive here. System one thinking describes the way we use mental shortcuts to quickly make sense of complex phenomena and to give us the appearance of coherence and control, whereas the system two model uses a more methodical and analytical approach to decision-making, where we take our time to review and weigh up choices. The trade-off between the two modes is time and effort. The result is that busy teachers come to rely more and more on quick, instinctive system one thinking over the slower, more deliberate system two model, which can lead to mistakes.

As well as understanding how we make decisions and how we react to given situations, a disciplined enquirer needs to appreciate the way that we gain insights in the first place, since it is the opening up of new ways of seeing that we are ultimately looking for in order to help us improve our practice. It seems to me that if we know the conditions under which we are more likely to learn something new, whether about our teaching, our students’ learning or any other aspect of the classroom environment, then we are better able to take steps to recreate these conditions and harness them when they manifest.

In Seeing What Others Don’t, Gary Klein uses a triple-path model to illustrate the ways in which we commonly reach such new insights. Klein’s model challenges the widely held notion of eureka moments, where inspiration or epiphany follows long periods of gestation. From studying decision-making in naturalistic conditions, Klein suggests there are three main triggers that typically lead to new insights – contradiction, connection, and creative desperation. These triggers, working on their own or in combination, shift or supplant the existing anchors that we ordinarily rely upon to make decisions. An anchor is a belief or story that gives us a sense of coherence and informs the decisions that we make, often without us even realising.


In some respects, Klein’s anchors resemble the idea of mental shortcuts, or heuristics, in Kahneman’s model of system one thinking. The anchor and the heuristic both guide action, usually subconsciously, and both can prevent us from seeing things clearly. Whilst we need heuristics (or anchors) to make our daily lives manageable – getting from A to B, for instance, without endlessly checking the route – for more complex decision making, such as that which constitutes classroom teaching, they can often lead us to make mistakes or develop false notions of what works. Disciplined enquiry should therefore seek to find ways to engage system two thinking, and to consciously trigger the cultivation of better anchors to help us improve our decision-making.

There are a number of steps that can help achieve this end. The diagram below gives an idea of what this might look like in practice. None of the suggestions is a panacea – it is surprisingly difficult to shift our thinking in relation to our deeply held values and beliefs – but they are an attempt to provide some sense of how we could get better at not only making decisions, but also at being aware of the reasons why we are making those decisions in the first place. The goal of disciplined enquiry is, then, to try to find ways to override system one intuition, and activate system two consideration.


Identifying inconsistency

One example Klein uses to illustrate the trigger of identifying inconsistency is the case of an American police officer who, whilst following a new car, is struck by the strange behaviour of the man in the passenger seat. Following the car, which is otherwise being driven normally, the officer notices the passenger appear to stub a cigarette out on the seat. What he witnesses is at odds with his understanding of what people normally do when riding as passengers in new cars. As a result he decides to pull the car over – an action that leads to an arrest, when it turns out that the car has in fact been stolen.

There are several ways a disciplined enquirer can set out to deliberately create this kind of inconsistency of thought – the sort of cognitive dissonance that might lead to a useful new insight into an aspect of pedagogy. One obvious way is to actively seek out alternative views or dissenting voices. Rather than always being surrounded by likeminded opinions, whether online or in the staffroom, teachers wishing to improve their practice should spend time listening to the views of those with contrary positions. This approach helps to avoid groupthink and fosters the kind of self-questioning that might shed light on an area of practice previously hidden.

Spotting coincidence

Unlike the trigger of identifying inconsistency, the trigger of spotting coincidence is about looking for similarities and patterns between phenomena and using these revealed relationships to build new insights. One of Klein’s examples of how spotting coincidence can change understanding and lead to meaningful changes in practice involves the American physician, Michael Gottlieb. After noticing connections between the symptoms of a number of his homosexual patients in the early 1980s, Gottlieb began to realise that what he was dealing with was something very different from, and far more important than, anything he had previously experienced. His insights led him to publish the first announcement of the AIDS epidemic.

There are two crucial aspects of this story in respect of disciplined enquiry. The first is that Gottlieb’s insight didn’t happen overnight. It was a slow process over a long period of time, involving the gradual noticing of patterns that could not initially be attributed to something already known. Too often we teachers try to make too many changes to our practice too quickly, without understanding or assessing their impact. The second important point is how Gottlieb retained his focus – he didn’t just notice something once, think it was interesting and then move on; instead he relentlessly pursued an emerging pattern, consciously noting down his observations, until he could formulate them into something more concrete and usable.

One of the key things that leads to developing new insights is thus a combination of time and deliberate attention: being alive to the possibility that two or three things that have something in common may lead to something more meaningful (or they may not). As the name suggests, disciplined enquiry involves disciplined focus, something so often overlooked in education in the scramble to share untested best practice. It is far better to isolate one or two variables in the classroom and look to notice their impact on student learning than to proceed on a whim.

Escaping an impasse

Perhaps the most poignant story in Klein’s book is that of a group of smokejumpers who were parachuted into the hills of Montana in 1949 in an attempt to control a large forest fire that was spreading quickly. The firefighters were soon caught in the fire themselves, which was moving swiftly up the grassy hillside. The men tried to outrun the flames, but sadly only two of the original 15 made it over the ridge to safety. Most of the others could not run fast enough and were consumed by the onrushing flames.

Another man to survive was Wagner Dodge who, like the others, initially tried to outrun the flames, but, unlike the others, realised that this wasn’t going to work and that unless he did something different he would die. His quick-thinking insight was to set fire to a patch of grass ahead of him, thus creating an area of safety where he could stand with the fire deprived of its fuel. In a moment of literal life and death decision-making, Dodge had arrived at a creative solution that had unfortunately passed his friends by. Out of desperation, Dodge had discarded his intuition (to run), and thought hard about a radical solution (to cut off the fire’s fuel source).

Obviously, as important as teaching is, it is not really a profession that rests on life or death decisions. That said, there are aspects of the story of the Montana smokejumpers, in particular the counterintuitive actions of Wagner Dodge, that a disciplined enquirer can learn from in an effort to increase their chances of generating new insights. Foremost amongst those lessons is the way that a fixed condition – in this case the fire sweeping up the hillside – forced Dodge to focus on the other variables open to him. It may be that self-imposed limitations, such as deadlines, parameters for recording reflections or routines of practice, rather than stifle thinking, may actually encourage new ways of seeing. Being forced to consider all possibilities, including rejecting existing ideas and beliefs, could enhance our ability to make greater sense of student interaction or learning. After all, the famous Pomodoro Technique is largely predicated on the notion that short bursts of focused, time-bound thinking produce much better results than longer, drawn out periods of study.

Disciplined enquiry is not easy and does make demands on what is already a very demanding job. That said, if there is a framework and culture that supports disciplined enquiry and makes the systematic study of one or two areas of improvement routine, then I think it could be a powerful means of both individual teacher and whole school improvement. What this framework might look like will be the subject of my next post.