NAPLAN: Where have we come from – where to from here?

By Sally Larsen

With the shift to a new reporting system and the advice from ACARA that the NAPLAN measurement scale and time series have been reset, now is as good a time as any to rethink what useful insights can be gleaned from a national assessment program.

The 2023 national NAPLAN results were released last week, accompanied by more than the usual fanfare and an overabundance of misleading news stories. Altering NAPLAN reporting from ten bands to four proficiency levels, thereby reducing the number of categories students’ results fall into, has caused a reasonable amount of confusion amongst public commentators, and provided many excuses to once again proclaim the demise of the Australian education system.

Moving NAPLAN to Term 1, with all tests online (except Year 3 writing) seems to have had only minimal impact on the turnaround of results.

The delay between the assessments and the release of results has limited the usefulness of the data for schools since NAPLAN began. Added to this, there are compelling arguments that NAPLAN is not a good individual student assessment, shouldn’t be used as an individual diagnostic test, and is probably too far removed from classroom learning to be a reliable indicator of which specific teaching methods should be preferred.

But if NAPLAN isn’t good for identifying individual students’ strengths and weaknesses, thereby informing teacher practices, what is it good for?

My view is that NAPLAN is uniquely powerful in its capacity to track population achievement patterns over time, and can provide good insights into how basic skills develop from childhood through to adolescence. However, it’s important that the methods used to analyse longitudinal data are evaluated and interrogated to ensure that conclusions drawn from these types of analyses are robust and defensible.

Australian governments are increasingly interested in students’ progress at school, rather than just their performance at any one time-point. The second Gonski review (2018) was titled Through Growth to Achievement. In a similar vein, the Alice Springs (Mparntwe) Education Declaration (2019) signed by all state, territory and federal education ministers, argued,

“Literacy and numeracy remain critical and must also be assessed to ensure learning growth is understood, tracked and further supported” (p.13, my italics)

Tracking progress over time should provide information about where students start and how fast they progress, and ideally, allow insights into whether policy changes at the system or state level have any influence on students’ growth.

However, mandating a population assessment designed to track student growth does not always translate to consistent information or clear policy directions – particularly when there are so many stakeholders determined to interpret NAPLAN results through their own lens.

One recent example of contradictory information arising from NAPLAN relates to whether students who start with poor literacy and numeracy results in Year 3 fall further behind as they progress through school. This phenomenon is known as the Matthew Effect. Notwithstanding widespread perceptions that underachieving students make less progress in literacy and numeracy over their school years compared with higher-achieving students, our new research found no evidence of Matthew Effects in NAPLAN data from NSW and Victoria.

In fact, we found the opposite pattern. Students who started with the poorest NAPLAN reading comprehension and numeracy test results in Year 3 had the fastest growth to Year 9. Students who started with the highest achievement largely maintained their position but made less progress.

Our results are the opposite of those reported in an influential Grattan Institute report published in 2016. That report used NAPLAN data from Victoria and showed that the gap in ‘years of learning’ widened over time. Importantly, the report applied a transformation to NAPLAN data before mapping growth overall and comparing the achievement of different groups of students.

After the data transformation, the Grattan Report found:

“Low achieving students fall ever further back. Low achievers in Year 3 are an extra year behind high achievers by Year 9. They are two years eight months behind in Year 3, and three years eight months behind by Year 9.” (p.2)

How do we reconcile this finding with our research? My conclusion is that these opposing findings are essentially due to different data analysis decisions.

Without the transformation applied in the Grattan Report, the variance in NAPLAN scale scores at the population level decreases between Year 3 and Year 9. This means that there’s less difference between the lowest and highest achieving students in NAPLAN scores by Year 9. Shrinking variance over time can be a feature of horizontally equated Rasch-scaled assessments – and it is a limitation of our research, noted in the paper.
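A small simulation makes this mechanism concrete. All the numbers below (means, standard deviations, the Year 3–Year 9 correlation) are entirely hypothetical, not real NAPLAN parameters; the point is only that when the population spread shrinks between two time-points, students who start low must, on average, gain more scale points than students who start high:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical scale parameters: the mean rises from Year 3 to Year 9
# while the population SD shrinks (as on a horizontally equated scale).
mu3, sd3 = 400.0, 90.0
mu9, sd9 = 580.0, 65.0
r = 0.75  # hypothetical Year 3 / Year 9 correlation

y3 = rng.normal(mu3, sd3, n)
# Year 9 scores consistent with the chosen correlation and shrunken SD
noise = rng.normal(0.0, sd9 * np.sqrt(1 - r**2), n)
y9 = mu9 + r * (sd9 / sd3) * (y3 - mu3) + noise

gain = y9 - y3
low = y3 < np.quantile(y3, 0.25)   # lowest-starting quartile
high = y3 > np.quantile(y3, 0.75)  # highest-starting quartile

print(f"mean gain, low starters:  {gain[low].mean():.1f}")
print(f"mean gain, high starters: {gain[high].mean():.1f}")
```

Because the regression slope of Year 9 on Year 3 scores (r × sd9/sd3) is below one, the low-starting quartile shows the larger mean gain – the same pattern as in the untransformed NAPLAN scores.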

The Grattan technical report outlines other limitations of NAPLAN scores. These were appropriately addressed in the analytic strategy of our paper, which included modelling the decelerating growth curves, accounting for problems with missing data, allowing for heterogeneity in starting points and rates of progress, and modelling measurement error. The latent growth model design we used is well suited to research questions about development, and to the type of data generated by NAPLAN assessments.
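To illustrate two of the data features named above – decelerating growth and student-level variation in starting point and rate of progress – here is a toy numpy sketch. It is not our actual model or data (all parameters are invented), just a simulation of that kind of data, with the built-in deceleration recovered from the wave means:

```python
import numpy as np

rng = np.random.default_rng(7)
n_students = 2000
waves = np.array([0.0, 1.0, 2.0, 3.0])  # Years 3, 5, 7, 9

# Heterogeneity: each student has their own starting point and growth rate
intercepts = rng.normal(400, 80, n_students)
slopes = rng.normal(70, 15, n_students)

# Decelerating growth: a negative quadratic term shrinks gains at each
# successive wave; per-occasion noise stands in for measurement error.
decel = -8.0
scores = (intercepts[:, None]
          + slopes[:, None] * waves
          + decel * waves**2
          + rng.normal(0, 25, (n_students, len(waves))))

# Fitting a quadratic to the wave means recovers the deceleration
coefs = np.polyfit(waves, scores.mean(axis=0), deg=2)
print(f"estimated curvature: {coefs[0]:.1f}")  # close to the built-in -8
```

A latent growth model does considerably more than this (it estimates the intercept and slope variances and separates measurement error from true change), but the simulation shows the shape of the data such models are designed for.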

In my view, the nature of the Rasch scores generated by the NAPLAN testing process does not require a score transformation to model growth in population samples. Rasch scaled scores do not need to be transformed into ‘years of progress’ – and indeed doing so may only muddy the waters.

For example, I don’t think it makes sense to say that a child is at a Year 1 level in reading comprehension based on NAPLAN because the skills that comprise literacy are theoretically different at Year 1 compared with Year 3. We already make a pretty strong assumption with NAPLAN that the tests measure the same theoretical construct from Year 3 to Year 9. Extrapolating outside these boundaries is not something I would recommend.

Nonetheless, the key takeaway from the Grattan report – that “Low achieving students fall ever further back” (p.2) – has had far-reaching implications. Governments rely on this information when defining the scope of educational reviews (of which there are many) and when making recommendations about such things as teacher training (which they do periodically). Indeed, the method proposed in the Grattan report was the one used by a recent Productivity Commission report, which subsequently influenced several Federal government education reviews. Other researchers use the data transformation in their own work, when they could instead use the original scores and interpret standard deviations for group-based comparisons.
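As a sketch of that alternative, group gaps can be reported as standardised mean differences (Cohen’s d) on the original scale scores, which automatically accounts for the changing spread of the scale across year levels. The group means and SDs below are made up purely for illustration:

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardised mean difference using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1)
                  + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
# Hypothetical scale scores for two groups at Year 3 and Year 9:
# the raw-point gap narrows (50 -> 35), but so does the scale's spread.
group_a_y3 = rng.normal(430, 85, 5000)
group_b_y3 = rng.normal(380, 85, 5000)
group_a_y9 = rng.normal(600, 60, 5000)
group_b_y9 = rng.normal(565, 60, 5000)

print(f"Year 3 gap: {cohens_d(group_a_y3, group_b_y3):.2f} SD")
print(f"Year 9 gap: {cohens_d(group_a_y9, group_b_y9):.2f} SD")
```

In this invented example the gap in raw points shrinks while the gap in standard-deviation units stays roughly constant – which is exactly the kind of distinction that a conversion to ‘years of learning’ can obscure.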

Recommendations that carry so much weight at a policy level should be underpinned by robustly defended data analysis choices. Unfortunately, the limitations of an analytic strategy can often be lost because stakeholders want takeaway points, not statistical debates. What this example shows is that data analysis decisions can (annoyingly) lead to opposing conclusions about important topics.

Where to from here

Regardless of which interpretation is closer to reality, NAPLAN 2023 represents something of a new beginning for national assessments in Australia. The key change is that from 2023 the time series for NAPLAN has been reset. This means that schools and states technically should not compare this year’s results with those of previous years.

The transition to computer adaptive assessments is also now complete. Ideally this should ensure more precision in assessing the achievement of students at both ends of the distribution – a limitation of the original paper-based tests.

Whether the growth patterns observed in the old NAPLAN will remain in the new iteration is not clear: we’ll have to wait until 2029, when the 2023 Year 3s are in Year 9, to replicate our research.

Sally Larsen is a Lecturer in Learning, Teaching and Inclusive Education at the University of New England. Her research is in the area of reading and maths development across the primary and early secondary school years in Australia, including investigating patterns of growth in NAPLAN assessment data. She is interested in educational measurement and quantitative methods in social and educational research. You can find her on Twitter @SallyLars_27



