Here’s what is going wrong with ‘evidence-based’ policies and practices in schools in Australia

An academic’s job is, quite often, to name what others might not see. Scholars of school reform in particular are used to seeing paradoxes and ironies. The contradictions we come across are a source of intellectual intrigue, theoretical development and, at times, humour. But the point of naming them in our work is often a fairly simple attempt to get policy actors and teachers to see what they might not see when they are in the midst of their daily work. After all, one of the advantages of being in ‘the Ivory Tower’ is having the opportunity to see larger, longer-term patterns of human behaviour.

This blog is an attempt to continue this line of endeavour. Here I would like to point out some contradictions in current public rhetoric about the relationship between educational research and schooling – focusing on teaching practices and curriculum for the moment.

The call for ‘evidence-based’ practice in schools

By now we have all seen repeated calls for policy and practice to be ‘evidence-based’. On the one hand, this is common sense – a call to restrain the well-known tendency of educational reforms to fervently push one fad after another, based mostly on beliefs and normative appeals (that is, messages that indicate what one should or should not do in a certain situation). And let’s be honest, these often get tangled in party political debates – between ostensible conservatives and supposed progressives. The reality is that both sides are guilty of pushing reforms with either no serious empirical bases or half-baked re-interpretations of research – and both claim authority based on that ‘research.’ Of course, not all high-quality research is empirical – nor should it all be – but the appeal to evidence as a way of moving beyond stalemate is not without merit. Calling for empirical adjudication or verification does provide a pathway to establish more secure bases for justifying what reforms and practices ought to be implemented.

We already know of ways in which empirical analysis can move educational reform further, because we can name very common educational practices for which we have ample evidence that their effects are not what advocates intended. For example, there is ample evidence that NAPLAN has been implemented in a manner that directly contradicts what some of its advocates intended: the empirical experience has been that NAPLAN has become far more high-stakes than intended and has narrowed the curriculum, a consequence its early advocates said would not happen. (Never mind that many of us predicted this. That’s another story.) This is an example of where empirical research can serve the vital role of assessing the difference between intended and experienced results.

Good research can turn into zealous advocacy

So on a general level, the case for evidence-based practice has a definite value. But let’s not over-extend this general appeal, because we also have plenty of experience of seeing good research turn into zealous advocacy with dubious intent and consequence. The current over-extensions of the empirical appeal have led paradigmatic warriors to push the authority of their work well beyond its actual capacity to inform educational practice. Here, let me name two forms of this over-extension.

Synthetic reviews

Take the contemporary appeal to summarise studies of specific practices as a means of deciphering which practices offer the most promise in practice. (This is called a ‘synthetic review’. John Hattie’s well-known work would be an example). There are, of course, many ways to conduct synthetic reviews of previous research – but we all know the statistical appeal of meta-analyses, based on one form or another of aggregating effect sizes reported in research, has come to dominate the minds of many Australian educators (without a lot of reflection on the strengths and weaknesses of different forms of reviews).

So if we take the stock standard effect size compilation exercise as authoritative, let us also note the obvious constraints implied in that exercise. First, to do that work, all included previous studies have to have measured an outcome that is seen to be the same outcome. This implies that outcome is a) actually valuable and b) sufficiently consistent to be consistently measured. Since most research that fits this bill has already bought the ideology behind standardised measures of educational achievement, that’s its strongest footing. And it is good for that. These forms of analysis are also often not only about teaching, since the practices summarised often are much more than just teaching, but include pre-packaged curriculum as well (e.g. direct instruction research assumes previously set, given curriculum is being implemented).

Now just think about how many times you have seen someone say this or that practice has this or that effect size without also mentioning the very restricted nature of the studied ‘cause’ and measured outcome.

Simply ask ‘effect on what?’ and you have a clear idea of just how limited such meta-analyses actually are.
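For readers who have not seen the arithmetic behind an ‘effect size compilation’, here is a minimal sketch in Python. It assumes the common textbook approach: a standardised mean difference (Cohen’s d) for each study, then a fixed-effect, inverse-variance weighted average. The function names and the three studies are hypothetical, made up purely for illustration; they are not drawn from any real review, and actual meta-analyses vary considerably in method.

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardised mean difference between a treated and a control group."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd

def d_variance(d, n_t, n_c):
    """Common large-sample approximation to the sampling variance of d."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))

def pooled_effect(studies):
    """Fixed-effect (inverse-variance weighted) average of effect sizes.

    `studies` is a list of (d, n_treated, n_control) tuples.
    """
    weights = [1 / d_variance(d, n_t, n_c) for d, n_t, n_c in studies]
    total = sum(w * d for w, (d, _, _) in zip(weights, studies))
    return total / sum(weights)

# HYPOTHETICAL studies, invented for illustration only.
# Each reports test scores for a treated and a control group.
studies = [
    (cohens_d(55, 50, 10, 10, 30, 30), 30, 30),
    (cohens_d(52, 50, 12, 11, 120, 115), 120, 115),
    (cohens_d(61, 54, 15, 14, 45, 50), 45, 50),
]

print(round(pooled_effect(studies), 3))
```

Notice what the calculation never sees: what the test measured, what curriculum came packaged with the practice, or where the study took place. The single number at the end only answers ‘effect on what?’ if every study going in measured the same thing.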

Randomised Control Trials

Also keep in mind what this form of research can actually tell us about new innovations: nothing directly. This last point applies doubly to the now ubiquitous calls for Randomised Control Trials (RCTs). By definition, RCTs cannot tell us what the effect of an innovation will be simply because that innovation has to already be in place to do an RCT at all. And to be firm on the methodology, we don’t need just one RCT per innovation, but several – so that meta-analyses can be conducted based on replication studies.

This isn’t an argument against meta-analyses and RCTs, but an appeal to be sensible about what we think we can learn from such necessary research endeavours.

Both of these forms of analysis are fundamentally committed to rigorously studying single cause-effect relationships, of the X leads to Y form, since the most rigorous empirical assessment of causality in this tradition is based on isolating the designed cause – the X of interest – from the effects of everything else. This is how you specify just what needs to be randomised. Although RCTs in education are built from the tradition of educational psychology that sought to examine generalised claims about all of humanity, where randomisation was needed at the individual student level, most reform applications of RCTs will randomise whatever unit of analysis best fits the intended reform. Common contemporary forms of this application will randomise teachers or schools into this or that innovation. The point of that randomisation is to find effects that are independent of the differences between whatever is randomised.

Research shows what has happened, not what will happen

The point of replications is to guard against known human flaws (biases, mistakes, etc) and to examine the effect of contexts. This is where our language about what research ‘says’ needs to be much more precise than what we typically see in news editorials and on Twitter. For example, when phonics advocates say ‘rigorous empirical research has shown phonics program X leads to effect Y’, don’t forget the background presumptions. What that research may have shown is that when phonics program X was implemented in a systemic study, the outcomes measured were Y. What this means is that the claims which can reasonably be drawn from such research are far more limited than zealous advocates hope. That research studied what happened, not what will happen.

Such research does NOT say anything about whether or not that program, when transplanted into a new context, will have the same effect. You have to be pretty sure the contexts are sufficiently similar to make that presumption. (Personally I am quite sceptical about crossing national boundaries with reforms, especially into Australia.)

Fidelity of implementation studies and instruments

More importantly, such studies cannot say anything about whether or not reform X can actually be implemented with sufficient ‘fidelity’ to expect the intended outcome. This reality is precisely why researchers seeking the ‘gold standard’ of research are now producing voluminous ‘fidelity of implementation’ studies and instruments. The Gates Foundation has funded many of these in the US, and I see intended publications from them all the time in my editorial role. Essentially fidelity of implementation measures attempt to estimate the degree to which the new program has been implemented as intended, often by analysing direct evidence of the implementation.

Each time I see one of these studies, it raises the question: ‘If the intent of the reform is to produce the qualities identified in the fidelity of implementation instruments, doesn’t the need for those instruments suggest the reform isn’t readily implemented?’ And why not use the fidelity of implementation instrument itself if that’s what you really think has the effect? For a nice critique and re-framing of this issue see Tony Bryk’s Fidelity of Implementation: Is It the Right Concept?

The reality of ‘evidence-based’ policy

This is where the overall structure of the current push for evidence-based practices becomes most obvious. The fundamental paradox of current educational policy is that most of it is intended to centrally pre-determine what practices occur in local sites, what teachers do (and don’t do) – and yet the policy claims this will lead to the most advanced, innovative curriculum and teaching. It won’t. It can’t.

What it can do is provide a solid basis of knowledge for teachers to know and use in their own professional judgements about what is the best thing to do with their students on any given day. It might help convince schools and teachers to give up on historical practices and debates we are pretty confident won’t work. But what will work depends entirely on the innovation, professional judgement and, as Paul Brock once put it, nous of all educators.

James Ladwig is Associate Professor in the School of Education at the University of Newcastle and co-editor of the American Educational Research Journal. He is internationally recognised for his expertise in educational research and school reform.

Find James’ latest work in Limits to Evidence-Based Learning of Educational Science, in Hall, Quinn and Gollnick (Eds) The Wiley Handbook of Teaching and Learning published by Wiley-Blackwell, New York (in press).

James is on Twitter @jgladwig