PISA tests

Serious flaws in how PISA measured student behaviour and how Australian media reported the results

International student performance test results can spark a media frenzy around the world. Results and rankings published by the Organisation for Economic Co-operation and Development (OECD) are scrutinised with forensic intensity, and any ranking that is not an improvement is usually labelled a ‘problem’ by the politicians and media of the country involved. Much time, energy and media space is spent trying to find solutions to such problems.

It is a circus that visits Australia regularly.

We saw it all last December when the latest Programme for International Student Assessment (PISA) results were published. We were treated to headlines such as ‘Pisa results: Australian students’ science, maths and reading in long-term decline’ from the Australian edition of the Guardian.

In March a follow-up report was published by the Australian Council for Educational Research (ACER) highlighting key aspects of the test results from an Australian perspective.

Australian mainstream media immediately zeroed in on one small part of the latter report dealing with classroom ‘disciplinary climate’. The headlines once again damned Australian schools, for example, ‘Education: Up to half of students in Australian classrooms unable to learn because of “noise and disorder”’ from The Daily Telegraph and ‘Australian students among worst behaved in the developed world’ from The Australian.

This is pretty dramatic stuff. Not only do the test results apparently tell us the standard of Australian education is on the decline, but they also show that Australian classrooms are in chaos.

As these OECD test results inform our policy makers and contribute to the growing belief in our community that our education system is in crisis, I believe the methods used to derive the information should be scrutinised carefully. I am also very interested in how the media reports OECD findings.

Over the past few years, many researchers have raised questions about whether the PISA tests really do tell us much about education standards. In this blog I want to focus on the efficacy of some of the research connected to the PISA tests, specifically that relating to classroom discipline, and examine the way our media handled the information that was released.

To start we need to look closely at what the PISA tests measure, how the testing is done and how classroom discipline was included in the latest results.

What is PISA and how was classroom discipline included?

PISA is an OECD-administered test of the performance of students aged 15 years in Mathematical Literacy, Science Literacy and Reading Literacy. It has been conducted every three years since 2000, with the most recent tests undertaken in 2015 and the results published in December 2016. In 2015, 72 countries participated in the tests, which are two hours in length and are taken by a stratified sample of students in each country. In Australia in 2015, about 750 schools and 14,500 students were involved in the PISA tests.

How ‘classroom disciplinary climate’ was involved in PISA testing

During the PISA testing process, other data are gathered to flesh out a full picture of some of the contextual and resource factors influencing student learning. Thus in 2015, principals were asked to respond to questions about school management, school climate, school resources and so on; and student perspectives were gleaned from a range of questions and responses relating to Science, which was the major domain in 2015. These questions focused on such matters as classroom environment, truancy, classroom disciplinary climate, and motivation and interest in Science.

All these data are used to produce ‘key findings’ in relation to school learning environment, equity, and student attitudes to Science. Such findings emerge after multiple cross correlations are made between PISA scores, student and schools’ socio-economic status, and the data drawn from responses to questionnaires. They are written up in volumes of OECD reports, replete with charts, scatter plots and tables.

In 2015 students were asked to respond to statements related to classroom discipline. They were asked: ‘How often do these things happen in your science classes?’

  • Students don’t listen to what the teacher says
  • There is noise and disorder
  • The teacher has to wait a long time for the students to quieten down
  • Students cannot work well
  • Students don’t start working for a long time after the lesson begins.

Then, for each of the five statements, students had to tick one of the boxes on a four-point scale: (a) never or hardly ever; (b) in some lessons; (c) in most lessons; or (d) in all lessons.

Problems with the PISA process and interpretation of data

Even before we look at what is done with the results of the questions posed in PISA about classroom discipline, alarm bells would be ringing for many educators reading this blog.

No rationale for what is a good classroom environment

For a start, the five statements listed above are based on some unexplained pedagogical assumptions. They imply that a ‘disciplined’ classroom environment is one that is quiet and teacher directed, but there is no rationale provided for why such a view has been adopted. Nor is it explained why the five features of such an environment have been selected above other possible features. They are simply named as the arbiters of ‘disciplinary climate’ in schools.

Problem of possible interpretation

However, let’s accept for the moment that the five statements represent a contemporary view of classroom disciplinary climate. The next problem is one of interpretation. Is it not possible that students from across 72 countries might understand some of these statements differently? Might it not be that the diversity of languages and cultures of so many countries produces some varying interpretations of what is meant by the statements, for example that:

  • for some students, ‘don’t listen to what the teacher says’, might mean ‘I don’t listen’ or for others ‘they don’t listen’; or that students have completely different interpretations of ‘not listening’;
  • what constitutes ‘noise and disorder’ in one context/culture might differ from another;
  • for different students, a teacher ‘waiting a long time’ for quiet might vary from 10 seconds to 10 minutes;
  • ‘students cannot work well’ might be interpreted by some as ‘I cannot work well’ and by others as ‘they cannot work well’; or that some interpret ‘work well’ to refer to the quality of work rather than the capacity to undertake that work; and so on.

These possible difficulties appear not to trouble the designers. From this point on, certainty enters the equation.

Statisticians standardise the questionable data gathered

The five questionnaire items are inverted and standardised with a mean of 0 and a standard deviation of 1, to define the index of disciplinary climate in science classes. Students’ views on how conducive classrooms are to learning are then combined to develop a composite index – a measurement of the disciplinary climate in their schools. Positive values on this index indicate more positive levels of disciplinary climate in science classes.
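As a rough illustration of the kind of transformation described, here is a minimal sketch in Python. It assumes a simple z-score standardisation of averaged item scores; the OECD's actual scaling model is more elaborate, and the response data below are invented:

```python
import statistics

# Hypothetical responses for the five disciplinary-climate items,
# one row per student: 1 = "never or hardly ever" ... 4 = "in all lessons".
responses = [
    [1, 2, 1, 1, 2],
    [4, 3, 4, 3, 3],
    [2, 2, 3, 2, 1],
]

# Step 1: invert, so that higher values indicate a MORE positive climate.
inverted = [[5 - r for r in row] for row in responses]

# Step 2: combine the five items into one raw score per student.
raw = [sum(row) / len(row) for row in inverted]

# Step 3: standardise to mean 0 and standard deviation 1.
mu = statistics.mean(raw)
sigma = statistics.stdev(raw)
index = [(x - mu) / sigma for x in raw]

print(index)  # positive values = more positive disciplinary climate
```

Even this toy version makes the key point: the index is built entirely out of the students' self-reported tick-box answers, so any ambiguity in how those questions are understood flows straight through to the numbers.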

Once combined, the next step is to construct a table purporting to show the disciplinary climate in the science classes of 15 year olds in each country. The table comprises an alphabetical list of countries, with the mean index score listed alongside each country, so allowing for easy comparison. This is followed by a series of tables containing overall disciplinary climate scores broken down by each of the disciplinary ‘problems’, correlated with such factors as performance in the PISA Science test, schools’ and students’ socio-economic profiles, type of school (e.g. public or private), location (urban or rural) and so on.
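The first of those tables amounts to little more than the following sketch, here using the four mean index scores quoted in the ACER comparison (the layout is my own illustration, not the OECD's actual table format):

```python
# Mean disciplinary climate index scores for a few countries,
# as reported in the ACER comparison (higher = more positive climate).
means = {
    "Japan": 0.83,
    "Hong Kong (China)": 0.35,
    "New Zealand": -0.15,
    "Australia": -0.19,
}

# Alphabetical listing with the score alongside, for easy comparison.
for country in sorted(means):
    print(f"{country:20s} {means[country]:+.2f}")
```

The simplicity of the presentation is part of the problem: a single signed number per country invites ranking, while hiding every caveat about how the underlying responses were gathered and interpreted.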

ACER reports the results ‘from an Australian perspective’

The ACER report summarises these research findings from an Australian perspective. First, it compares Australia’s ‘mean disciplinary climate index score’ to selected comparison cities/countries such as Hong Kong, Singapore, Japan, and Finland. It reports that:

Students in Japan had the highest levels of positive disciplinary climate in science classes with a mean index score of 0.83, followed by students in Hong Kong (China) (mean index score: 0.35). Students in Australia and New Zealand reported the lowest levels of positive disciplinary climate in their science classes with mean index scores of –0.19 and –0.15 respectively, which were significantly lower than the OECD average of 0.00 (Thomson, De Bortoli and Underwood, 2017, p. 277).

Then the ACER report compares scores within Australia by State and Territory; by ‘disciplinary problem’; and by socio-economic background. The report concludes that:

Even in the more advantaged schools, almost one third of students reported that in most or every lesson, students don’t listen to what the teacher says. One third of students in more advantaged schools and one half of the students in lower socioeconomic schools also reported that there is noise and disorder in the classroom (Thomson et al, 2017, p. 280).

What can we make of this research?

You will note from the description above that a number of caveats would need to be placed on the research outcomes. First, the data relate to a quite specific cohort of students, who are 15 years of age, and are based only on science classes. That is, the research findings cannot be used to generalise about other subjects in the same year level, let alone about primary and/or secondary schooling.

Second, there are some questions about the classroom disciplinary data that call into question the certainty with which the numbers are calculated and compared. These relate to student motivation in answering the questions, and to the differing interpretations by people from many different cultures about the meaning of the same words and phrases.

Third, there are well-documented problems related to the data with which the questionnaire responses are cross-correlated, such as the validity of the PISA test scores.

In short, it may well be that discipline is a problem in Australian schools, but this research cannot provide us with that information. Surely the most one can say is that the results might point to the need for more extended research. But far from a measured response, the media fed the findings into the continuing narrative about falling standards in Australian education.

The media plays a pivotal role

When ACER released its report, the headlines and associated commentary once again damned Australian schools. Here is the daily paper from my hometown of Adelaide.

Disorder the order of the day for Aussie schools (Advertiser, 15/3/2017)

Australian school students are significantly rowdier and less disciplined than those overseas, research has found. An ACER report, released today, says half the students in disadvantaged schools nationally, and a third of students in advantaged schools, reported ‘noise and disorder’ in most or all of their classes…. In December, the Advertiser reported the (PISA) test results showed the academic abilities of Australian students were in ‘absolute decline’. Now the school discipline results show Australian schools performed considerably worse than the average across OECD nations…. Federal Education Minister Simon Birmingham said the testing showed that there was ‘essentially no relationship between spending per student and outcomes. This research demonstrates that more money spent within a school doesn’t automatically buy you better discipline, engagement or ambition’, he said (Williams, Advertiser 15/3/17).

Mainstream newspapers all over the country repeated the same messages. Once again, media commentators and politicians had fodder for a fresh round of teacher bashing.

Let’s look at what is happening here:

  • The mainstream press have broadened the research findings to encompass not just 15 year old students in science classrooms, but ALL students (primary and secondary) across ALL subject areas;
  • The research report findings have been picked up without any mention of some of the difficulties associated with conducting such research across so many cultures and countries. The numbers are treated with reverence, and the findings as the immutable ‘truth’;
  • The mainstream press have cherry-picked negative results to get a headline, ignoring findings in the same ACER report showing, for example, that Australia is well above the OECD average in terms of the interest students have in their learning in Science, and the level of teacher support they receive;
  • Key politicians began to use the research findings as a justification for not spending more money on education, and to blame schools and students for the ‘classroom chaos’.


These errors and omissions reinforce the narrative being promulgated in mainstream media and by politicians and current policy makers that standards in Australian education are in serious decline. If such judgments are being made on the basis of flawed data reported in a flawed way by the media, they contribute to a misdiagnosis of the causes of identified problems, and to the wrong policy directions being set.

The information that is garnered from the PISA process every three years may have the potential to contribute to policy making. But if PISA is to be used as a key arbiter of educational quality, then we need to ensure that its methodology is subjected to critical scrutiny. And politicians and policy makers alike need to look beyond the simplistic and often downright wrong media reporting of PISA results.


Alan Reid is Professor Emeritus of Education at the University of South Australia. Professor Reid’s research interests include educational policy, curriculum change, social justice and education, citizenship education and the history and politics of public education. He has published widely in these areas and gives many talks and papers to professional groups, nationally and internationally. These include a number of named Lectures and Orations, including the Radford Lecture (AARE); the Fritz Duras Memorial Lecture (ACHPER); the Selby-Smith Oration (ACE); the Hedley Beare Oration (ACE -NT); the Phillip Hughes Oration (ACE – ACT); the Garth Boomer Memorial Lecture (ACSA); and the national conference of the AEU.

Why condemning international tests is a distraction, and what we really should be worried about

Governments all around the world seem to be influenced by the international rankings of students by the Organisation for Economic Co-operation and Development (OECD). Australia is no different. Famously, in 2012 our then prime minister, Julia Gillard, set a goal for our students to rank in the world’s “top 5 by 2025”.

Increasingly, educators have been questioning the validity of these rankings and asking why policy makers pay so much attention to them. One critique getting a lot of attention is by Stanford University’s Professor Martin Carnoy, who recently concluded that the OECD’s Program for International Student Assessment (PISA) rankings are misleading.

According to Carnoy they are misleading because of (1) differences between countries in students’ access to resources; (2) questions about the validity of using PISA results for international comparisons; (3) larger-than-acknowledged test score errors; and (4) the fact that the Shanghai educational system scores, held up as a model for the rest of the world, are based on a subset of students that is not representative of the Shanghai student population as a whole. Carnoy was speaking specifically about the PISA results and the US, but his opinions would also be valid for Australia.

Tests do distort policy and practice. Most educators, and many parents, these days understand that. There is no evidence that standardised tests, or any league tables generated as a result, improve student achievement!

But there are other lessons coming from the PISA rankings that perhaps the US, and other countries such as Australia, should be paying attention to. How many times does it have to be pointed out that “socio-economic disadvantage has a notable impact on student performance”? The PISA 2009 report also asserted that local funding of education exacerbates inequality and “may be the single most important factor for the US”. But it seems impossible for Americans to come to grips with these findings.

Meanwhile, here in Australia the divide in schooling is growing. The widespread, entrenched inequality of educational outcomes and opportunities is further exacerbated by economic trends. Tom Bentley and Ciannon Cazaley from Victoria University’s Mitchell Institute say: “As a result, there is a mismatch between the learning needs of students and schools, and the current capabilities of education systems”.

I contend that, while there is much about the OECD rankings that is wrong, condemning these international tests is a distraction.

Yes there are problems with PISA

Of the four reasons Carnoy gives for the PISA scores being misleading, the first two, about differing access to resources and about validity, can simply be accepted.

But PISA reports are not just analyses of test scores! Conclusions drawn from the test data and from information supplied by school principals agree with many other reports. Using test scores to construct league tables frequently ignores the variances attached to the means, and media analyses and political responses often show a less than adequate understanding of basic statistics or of the education literature. Carnoy also details the unreliable way in which the Shanghai sample was constructed by PISA.

The important point is this: were the results from Shanghai, Japan or Singapore to be ignored, the conclusions which can reasonably be drawn from the remaining data would still be a challenge to the practices adopted by the US and other countries participating in PISA.

Indeed, ignoring international tests altogether would still leave the conclusions from other studies showing what contributes positively to student achievement: school leadership that supports teacher quality, a demanding curriculum, rich experiences, attention to those having difficulty, awareness of the socioeconomic background and diversity of the class, and good community relations.

Carnoy recommends changes in the interpretation of results of international tests: relating changes over time to family academic resource (FAR) factors; making micro-data available to allow further analysis; and appointing independent academics to PISA’s decision-making board. Last, he recommends that policymakers focus more on differences in achievement gains between the US states than on differences between the US and other countries.


Bad education reform in the US is not the result of reacting to OECD rankings

The practices of educational administrators, politicians and corporate donors to education reform in the US are not based on international test scores and analyses. They are derived from ideological beliefs about “the market” and how people behave. The private sector is considered good and the public sector bad; government intrusion into the community is held to be excessive and to inhibit self-reliance. The beliefs are that financial incentives drive superior performance, that competition improves quality and that choice (of school) is a democratic right.

In this scenario, of course, teachers’ unions are seen as very bad and their influence as something that needs to be removed. Rich donors, whose donations cost public money because of tax offsets (as pointed out by Diane Ravitch in her comprehensive criticism of US education policy, “The Death and Life of the Great American School System”), relentlessly pursue that objective through campaigns for elections to school boards.

None of these beliefs is based on valid research, any more than is climate change denialism or Bishop Ussher’s views on the origin of the Earth. To blame PISA is at best a distraction.

The main response by US authorities in recent times to assertions about student achievement has been the creation of more charter schools, independently run though publicly (and privately) financed. Recently, the Common Core State Standards Initiative, which makes testing easier, has dominated reform, all as part of “Race to the Top”, endorsed by President Obama and overseen by Education Secretary Arne Duncan. The failure of charter schools has been well demonstrated; David Zyngier describes them as a flawed idea, and there is a revolt against the Common Core Standards led by Diane Ravitch.

In most countries, the educational performance of students attending independent schools is due mainly to student selection and retention. When controlled for socioeconomic factors, any superior performance compared with students from public schools disappears. More money spent on those schools delivers relatively little extra gain. The very important University of Queensland study by Son Nghiem and colleagues revealed that what mattered was a suite of factors relating to the home environment.


Inequality and education

Carnoy admits the importance of the home environment and the resources made available by parents. He points out that education analysts in the US “pay close attention to the level and trends of test scores and their relationship with socioeconomic factors, that is inequality”. Indeed they do: David Berliner, in his Letter to the President, Diane Ravitch, Jean Anyon and many others have shown that policies and practices in the US do not address poverty’s impact: US practices are not like those of successful countries.

Finnish teachers would not stay long in US schools according to Finnish expert Pasi Sahlberg in “What if Finland’s great teachers taught in U.S. schools?”.


Education Achievement and Economic Performance

Carnoy points to the study by the Hoover Institution’s Eric Hanushek and others, which argued that average national math test scores are the single best predictor of national economic growth, at least for the years 1960–2010. The poor US student test scores therefore threaten US economic superiority!

The economic success of the US (as measured by GDP and similar metrics) and its record of Nobel prizes seem to deny this prospect. Diane Ravitch argued there was no relationship between test scores and economic productivity. Researcher and writer on education reform Norman Eng pointed out that learning in school is narrow, detached and contrived, whereas work, especially in highly skilled jobs, is active, cross-disciplinary and “out-of-the-box”.

There is indeed a poor relationship between test scores and wages, and rewarding teachers on the basis of their students’ scores likely misdirects resources, according to Nobel prize-winner James Heckman and colleagues.

But there is more. Michael Teitelbaum says his research shows that claims of looming problems due to a shortage of quality graduates in STEM subjects are a myth. US high-tech companies employ large numbers of foreign students with US-earned PhDs on H-1 visas, who tend to be locked into jobs at lower salaries. They are not smarter than available US workers and do not bring talents not otherwise available. They are just cheaper! They register fewer patents and their PhDs are from less prestigious universities.

In any event, as reviews by Simon Marginson, now of University College London, and by Andrew Delbanco, Director of American Studies at Columbia University, have recently pointed out, too often universities do not enrol students on merit but choose them from more prestigious colleges: access is decreasing as the quality of teaching declines and fees increase. Fewer graduates means higher salaries and widening economic divergence.


Putting PISA in its place

International programs like PISA are a snapshot of the achievement of a sample of 15 year olds on questions testing their comprehension of writing, reading, mathematics and science. Why not accept them for that? Americans obsess over their ranking in the world and use any excuse to deny the contrary: consider health care. Just as well they don’t play Rugby Union.

Advantaged families in many countries also obsess over test scores achieved by the schools their children attend or might attend and spend huge sums of money moving house to be near “better” schools. They devote time to pushing their children to complete excessive amounts of homework and engage in so much other activity that they have little time for simply developing the ability to form relationships with others.

Standardised tests pander to the myth of accountability. They diminish creativity and give undue weight to a few subjects, notwithstanding the importance of those subjects. But privileging literacy and numeracy marginalises other skills. Large numbers of people who had difficulty with basic subjects, or even with the school experience in general, have gone on to be successful in science, the arts and other domains. Some were dyslexic and some autistic, and often they were discriminated against at school.


Here’s what we really should be worried about

The US is a country where education researchers contribute high-quality research, which most US policy makers ignore. What is also ignored is early childhood, a time of far greater importance than the school years, a time when inequality has the most impact on families and therefore on children. The implications continue to be ignored in the US, and to an extent in Australia.

In Australia, privileging independent schools wastes taxpayer money, while disadvantaged children in poor suburbs and country areas suffer poorly supported teachers and inadequate resources. So do Indigenous children, who in the Northern Territory are subject to Direct Instruction, whose own languages are marginalised, and for whom boarding schools are pitched as the solution. Inequality increases.

Meanwhile Prime Minister Turnbull wonders whether we can afford to fully implement the Gonski reforms whilst asserting that Australia is a fair country, and Opposition leader Shorten notes weekend penalty rates mean parents can afford to send their children to private schools!


Des Griffin is Gerard Krefft Fellow at the Australian Museum, Sydney where he was director from 1976 to 1998. He graduated from Victoria University of Wellington and the University of Tasmania in marine biology. He is interested in museums and arts organisations, the environment and science, organisational dynamics, especially leadership and governance and in education. He was founding president of Museums Australia, the single association representing museum people from 1993 to 1996. He was appointed a Member of the Order of Australia (AM) in 1990 and elected a Fellow of the Royal Society of New South Wales in June 2014. He writes at www.desgriffin.com.

Des is the author of Education Reform: The Unwinding of Intelligence and Creativity (Springer, 2014). The book and individual chapters can be downloaded from the Springer site which contains abstracts of the chapters. The book can be purchased from the site or from booksellers such as Fishpond.