The cancellation of NAPLAN testing for this year came as no surprise. We all have enough to worry about right now, but it is not too soon to start thinking about the shape of post-COVID Australia. The process of collating a reform program has already started. The performance of our school system should come high on this list. NAPLAN is one of the most prominent measures of school performance in Australia but attracts a near-constant barrage of criticism. We should take the opportunity of this assessment “gap year” to review its future.
A key objective of NAPLAN is to provide diagnostic feedback about the strengths and weaknesses of individual students. A further objective is to provide comparative school results to help parents make an informed choice of school. These are worthy objectives. What, as they say, could possibly go wrong?
NAPLAN is not perfect—there is no perfect test—but every year it attracts a battery of commentary about the frequency of testing, anxiety for students, claims of “teaching to the test”, and over-emphasis on cognitive rather than non-cognitive skills. The teachers’ organisations often give the impression they will always oppose NAPLAN because they simply don’t like the idea of accountability in our schools. It is hard work finding anything but anecdotal comment in the annual pile-on, but it would be a mistake to dismiss the criticism. Much of it is misplaced, but the risk is that constant adverse commentary will provoke “reform” that emasculates the key purpose of measuring learning outcomes.
If we are to develop worthwhile NAPLAN reform we need a drastic rethink about how the results are presented. In this era of home schooling, the first step is to teach ourselves the distinct roles of high-stakes and low-stakes assessment.
Learning assessments such as NAPLAN are relatively new. This may sound bizarre. Almost everyone who has been to school is thoroughly familiar with exams. Examinations go back to medieval Europe, to classical Hindu society in India, and many centuries in China. Shakespeare’s “whining schoolboy … creeping like snail unwillingly to school”, would have been acquainted with exams.
The examinations familiar to us over the centuries have two key limitations. First, they do not tell us the specific skills or knowledge that a student has learned relative to a specified set of educational goals. Second, they often have as a major objective the selection of students for further study, and the student's mark may be scored at least partly in relation to what other students have achieved (known as norm referencing). Traditional exams are typically both high-stakes and norm-referenced. Such exams may have severe consequences for students but tell us little about their specific skills.
To rectify this we need a different style of testing, known as criterion-based learning assessment. NAPLAN is such an assessment. Student grades in NAPLAN are not determined relative to other students: they are established by comparing a student’s achievements with clearly stated criteria for learning outcomes. For example, we might define Level 2 Reading as knowing how to locate text expressed in short, repetitive sentences unaided by pictures. Now we know the (rather limited) reading skills of a student scoring Level 2.
Criterion-based assessments such as NAPLAN are generally low-stakes. Of course, the results matter to students and teachers, but they are not influenced by the performance of other students. It is not a competition between them and there are no significant adverse consequences. The emphasis is on identifying learning problems by providing reliable diagnostic evidence.
As soon as we set out the differences between low stakes and high stakes we begin to see the problem with NAPLAN. With the major objective of providing diagnostic feedback to individual students, NAPLAN is a low-stakes assessment. But NAPLAN has a second objective, which is to provide comparative school results. Those results are obtained by aggregating each school’s student scores and then ranking the schools.
Now we have high stakes with a vengeance. When the overall scores are used to rank schools, there are potentially serious consequences for a school’s reputation, enrolments and funding. Those individual student results no longer matter purely to the parents and teachers: they become a vital input into the school’s overall academic performance. Inevitably, the high-stakes demand of school ranking puts pressure on teachers. That in turn transfers to students and parents. Helicopter parents hovering over their kids’ performance add to pressures from the school in what was never intended to be an anxiety-inducing assessment.
The high-stakes comparative school rankings permeate the entire process, to the point where NAPLAN as a whole becomes a high-stakes exercise. The low-stakes diagnostic results are swept up into the high-stakes presentation of school rankings like partners in an incompatible marriage. Little wonder that each year brings so much noisy criticism.
How, then, can NAPLAN be reformed in a way that preserves its valuable student and school results? Several countries, Singapore among them, have reduced the frequency of testing. This makes good sense in Singapore's famously competitive school system but has little relevance for Australia. Students here sit NAPLAN only four times in the first nine years of schooling. That hardly amounts to excess, even with time for class preparation.
There are occasional suggestions for moving to a sample rather than universal coverage. Testing only a proportion of students and schools would mean that some students are denied the benefits of individual diagnosis. Schools outside the sample would miss out on vital information about their teaching performance. It is a perfect example of a “cure” worse than the problem.
There should be no question of jettisoning either of the two key NAPLAN objectives. Providing diagnostic feedback about individual students is self-evidently crucial. And the fact that comparative school rankings have high-stakes consequences for schools does not alter the fact that such consequences (and therefore incentives to improve) are vital in meeting the second NAPLAN objective of providing comparative information for parents.
No single proposal for reform will quieten all the criticism. Criticism from those hostile to testing will continue as long as both low-stakes and high-stakes outcomes draw upon the same set of assessments. But the disjuncture between low-stakes and high-stakes functions suggests that a sensible step would be to separate those functions to the extent possible. At present, both these roles are carried out by a single agency, the Australian Curriculum, Assessment and Reporting Authority (ACARA). ACARA has developed considerable expertise in low-stakes assessment. It has, however, carried out much less work on analysis of the composite school rankings. The school rankings are published with little analysis of why schools differ in their performance.
If ACARA were to retain responsibility for individual student results, it could concentrate on its strengths: the development of NAPLAN scale scores, the National Minimum Standard (which is set very low), the further development of online assessment, and coverage of the tests in relation to the curriculum.
The Commonwealth government is now in the throes of establishing the National Evidence Institute for education. The institute has so far attracted little attention, no doubt obscured by the pandemic, but it could play a vital role in raising the performance of Australia’s schools. The institute’s purpose is to drive improvements in teaching practice, school systems and policies by identifying worldwide research and good practice that translate into practical tools for our own classrooms. Its establishment provides the perfect opportunity to think of it as the new home for work on the high-stakes school rankings.
If the institute takes responsibility for the school rankings, both institute and schools will benefit. The institute will have a massive database of specifically Australian performance data. The school rankings will still appear, but instead of a simple list, those rankings can be analysed by, and related to, performance-enhancing factors. It will be the perfect vehicle for keeping the work grounded in the practicalities of “what works”, rather than getting lost chasing the wilder ideas so fashionable in education. From the schools’ point of view, they will have confidence that the practices they are being urged to adopt have been tested not just in the research literature but in the measured performance of Australian schools.
Ken Gannicott is a former Professor of Education at Wollongong University.