
Using comparative statistical analysis to improve educational assessment

I’ve always wondered why education does not use comparative statistical analysis to triangulate the results of teacher-generated assessment and externally benchmarked exams. After this summer’s fiasco, I have the answer. The question is, can we learn from it? What went wrong, and can we use algorithms to improve assessment going forward? I think so; others don’t. Most people are either for teacher-generated assessment or against it, preferring exams. Some see algorithms as the enemy of education. In this blog, I argue that we need both types of assessment and professionally written algorithms.

So, what went wrong? I doubt the Government lacks competent statisticians. I expect they are way better at stats than I am; at least, I hope so. I am guessing that the problem was political and centred on three things: the algorithm was too ambitious, it was not subjected to the oversight of teachers and, finally, its missing component was the exam itself. Without an exam result, the algorithm has no way of knowing how an individual would have performed in a nationally benchmarked exam at the time it would have been sat. Comparing the performance of the current cohort against previous years is always going to generate too many anomalies for comfort.

Why were shortcuts taken? We know that ministers wanted to ensure the integrity of the data and standardise it against previous years. I guess that a rigorous approach to such a process would have taken too long to implement. So, the result was a complete fudge.

Ofqual designed the algorithm to resolve the issue of grade inflation by teachers. I won’t beat about the bush; it is an issue. Teachers, under pressure from senior leaders, inflate grades. Professionalism has nothing to do with it; we are all subjective. Professional judgement needs triangulating with nationally benchmarked data. The reverse is also true. After all, teachers mark the exams, and an algorithm is then applied to ensure statistical integrity. What’s the difference?

Working on the assumption that the circumstances arising from COVID-19 will not happen again (fingers crossed), here is how we can improve assessment by using comparative statistical techniques to triangulate data generated by teachers with data from exams.

First, what is comparative analysis? Well, it is a statistical approach in which statisticians compare two or more datasets to determine how consistent they are with one another and to establish an equilibrium. A competently written algorithm could identify outliers and statistical anomalies, along with a whole host of other information. For example, does one institution assess more generously than another?
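To make that last question concrete, here is a minimal sketch of how such a check might look. Everything in it is an assumption for illustration: the school names, the grades, the 1 to 9 scale, the function names and the one-grade threshold are all made up, and a real algorithm would need far more care (cohort size, subject mix, confidence intervals and so on). It simply compares each institution's average gap between teacher grade and exam grade with the typical gap across all institutions.

```python
from statistics import mean, median

# Hypothetical records: (institution, teacher_grade, exam_grade) on a 1-9 scale.
records = [
    ("School A", 6, 6), ("School A", 5, 5), ("School A", 7, 6),
    ("School B", 8, 6), ("School B", 7, 5), ("School B", 9, 7),
    ("School C", 4, 4), ("School C", 6, 6), ("School C", 5, 6),
    ("School D", 6, 6), ("School D", 5, 5), ("School D", 7, 7),
]


def mean_gap_by_institution(rows):
    """Average (teacher grade - exam grade) for each institution."""
    gaps = {}
    for institution, teacher, exam in rows:
        gaps.setdefault(institution, []).append(teacher - exam)
    return {name: mean(values) for name, values in gaps.items()}


def flag_generous(rows, threshold=1.0):
    """Institutions whose mean gap exceeds the median institutional gap
    by more than `threshold` grades (an arbitrary, illustrative cut-off)."""
    gaps = mean_gap_by_institution(rows)
    typical = median(gaps.values())
    return sorted(name for name, gap in gaps.items() if gap - typical > threshold)


print(flag_generous(records))
```

With this made-up data only School B is flagged, because its teacher grades sit two grades above its exam grades while the other schools are within a fraction of a grade. A flag is a prompt for a conversation, not a verdict.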

Big brother? Maybe, but it is better than what we have now. Take supply and demand as an example. In economics, comparative statics compares old and new equilibrium positions. When the supply of and demand for a widget are in balance, the price remains stable. If demand increases and supply remains the same, then the price of the widget increases.

This is possibly an overly simplistic analogy, as education is not economics. However, if one set of data starts to diverge from the other, reasons can be sought for the change. For example, demands from educational leaders for improved results may inflate teacher assessments in a way that is not subsequently reflected in nationally benchmarked exams.
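A rough sketch of what spotting that kind of divergence might involve follows. The yearly averages, the function names and the tolerance of 0.15 of a grade are all hypothetical, chosen only to illustrate the idea of watching how the gap between the two datasets moves from one year to the next.

```python
# Hypothetical national averages per year: (year, teacher mean, exam mean).
yearly = [
    (2015, 5.0, 5.0),
    (2016, 5.1, 5.0),
    (2017, 5.3, 5.1),
    (2018, 5.6, 5.1),
]


def gap_changes(series):
    """Year-on-year change in the gap between teacher and exam averages."""
    gaps = [(year, round(teacher - exam, 2)) for year, teacher, exam in series]
    return [
        (year, round(gap - previous_gap, 2))
        for (_, previous_gap), (year, gap) in zip(gaps, gaps[1:])
    ]


def years_to_investigate(series, tolerance=0.15):
    """Years in which the gap moved by more than `tolerance` of a grade.
    The tolerance is an illustrative assumption, not a recommended value."""
    return [year for year, change in gap_changes(series) if abs(change) > tolerance]


print(gap_changes(yearly))
print(years_to_investigate(yearly))
```

In this invented series the gap creeps up by a tenth of a grade in 2016 and 2017, then jumps by three tenths in 2018, so 2018 is the year someone would want to ask questions about. The code cannot say why the gap moved; that is a job for people.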

So far, so straightforward. Other factors can cause problems for policymakers and increase their exposure to criticism. Ill-considered changes to nationally benchmarked exams or the teacher-assessed curriculum could cause significant variance in the data. Educationalists would probably argue that such oversight of policy can only be a good thing.

It would take time. In the meantime, grades would remain consistent (using the currently established approaches), with the raw data used to establish an equilibrium position reflecting systemic improvement (or otherwise).

The approach has real benefits. Teachers can acknowledge creativity and consistency in their assessments, while externally moderated exams would reflect the ability to work under pressure in a controlled environment. Once the results are aggregated and distributed, teachers can review the algorithm’s output and make judgements on outliers, either adjusting or accepting them depending on the circumstances.

The process would also provide rich data for researchers to identify where teacher-generated data differs from exam data and why. It could lead to improvements in exams or perhaps recognition of how the two sets of data complement one another. Further refinements to the aggregated data could provide a fairer reflection of student performance.

Finally, Ofqual could challenge a lack of institutional competence or integrity. Over a period, properly managed creative conflict between educational institutions and Ofqual could lead to improved communication and, more importantly, better assessment.

In summary, in the absence of externally verified, nationally benchmarked metrics, educational leaders can make decisions using proxies, which often conform to their assumptions and prejudices. Additionally, corners are cut and thinly veiled hacks are used to beat the system. None of this benefits the talented teacher.

Externally verified, nationally benchmarked exams suffer from similar issues. Besides, they only assess students at one point in time under controlled conditions. Using both sets of data with professionally written and tested algorithms would add value and rigour to student assessment.

Sadly, this summer’s problems have, yet again, badly let down teachers. I guess the underlying issue is the reluctance of policymakers to subject their policies to independent scrutiny. Longitudinal data that interrogates policy success, or otherwise, is high risk. It is far easier for policymakers to misuse Ofsted data than to have high-quality data, designed by statisticians, available for scrutiny by academics and others.

We need both teacher-assessed and nationally benchmarked data to increase the validity of assessment in education. Change is required, but it requires political will and expertise. I, for one, won’t be holding my breath waiting for it to happen.
