The Testing Controversy and the Creativity Gap

I have written to both the federal Education Minister Mr Dan Tehan and the NSW Education Minister Ms Sarah Mitchell as well as to both newspapers that I read regularly, The Australian and The Daily Telegraph, alerting them to the problems with international PISA testing and the continued good performance of our 15-year-olds on our own NAPLAN testing at Year 9 (see “Drop Pisa, Keep NAPLAN,” Quadrant, May 2020).  I had no reply from the government ministers other than a form-letter brushoff from Tehan’s office, and Mitchell’s office did not even acknowledge receipt.  At the same time, Australian newspaper editors will not publish any letters that go against their columnists’ mantra that “our students are failing in international tests” and that our young people’s school performance is getting worse.

These myths are apparently taken as gospel by our so-called educational testing experts: Professor Geoff Masters, CEO of the Australian Council for Educational Research (ACER), which administers PISA in Australia, and Professor Barry McGaw, previous head of, and still measurement adviser to, the Australian Curriculum, Assessment and Reporting Authority (ACARA), which administers NAPLAN.  NAPLAN adviser McGaw has recently dropped his support of NAPLAN testing and proposed that it be replaced by a completely new and yet-to-be-devised Australian National Standardised Assessment (ANSA) test.  This would be an indefensible solution to a non-existent problem.

In this follow-up article, I describe four solutions that are far more defensible than anything proposed so far.  They can be summarized as follows: (1) Withdraw permanently from international PISA testing before the next round is due in 2022; (2) Leave the NAPLAN Numeracy, Literacy, and Writing tests exactly as they are, but move them to the end of Years 3, 5, 7, and 9, where in Year 9 they can serve as a possible exit exam for students who lack the intellectual ability to realistically try for university and who should instead be paid by the government to pursue TAFE-type training for a nationally useful trade; (3) At Year 12, use the SAT I test for university entry; and (4) Add to the year-end Year 9 NAPLAN tests a Creative Ability Test, necessary in order to identify and governmentally support those students with high creative talent who, regardless of their IQ level, could help turn around Australia’s dangerously falling record of innovation.  I’ve used the same bracketed numbers in what follows to highlight each recommendation.

(1) Let’s get rid of PISA testing.  PISA testing is held only every three years – and next time after a four-year gap due to the coronavirus pandemic – which means that a majority of our Year 9 students, two-thirds at present and three-quarters next time, do not take it anyway!  Moreover, as I pointed out in my earlier article, PISA testing is badly flawed.  Each student sitting for the test is presented with a different set and a different number of items due to PISA’s use of a complex and secretive statistical procedure called “imputation,” or “adaptive testing,” whereby each student’s answers to the initial questions determine the difficulty level of the questions he or she is presented with for the rest of the test.  What this means is that students within a given country actually answer different items, and students from different countries answer different items as well.  This not only causes within-country PISA performance to fluctuate wildly over the three-year, or now four-year, testing intervals but also renders between-country comparisons spurious.

Australian education researchers are either ignorant of these problems with PISA testing or have chosen to ignore them.  (And be warned that I am about to question the academic competence of all the educational researchers the government regards as experts, and it is high time someone did.)  Professor Geoff Masters, described in The Australian (January 3, 2014) as “one of the country’s most respected academics in this field,” heads the Australian Council for Educational Research, ACER, which administers PISA testing in Australia.  He favors the item-changing adaptive testing method used by the OECD for PISA because his few refereed academic publications are based on it, stemming from the research he did almost 50 years ago for his 1972 PhD thesis at the University of Chicago.  Masters is also an advocate – along with businessman David Gonski, principal author of the government’s highly controversial paid-for review of educational policy in Australian schools – of U.S. psychology academic Carol Dweck’s ethically and empirically dubious “growth mindset theory,” wherein students are told the lie that research shows intelligence is malleable and that by telling themselves they are smart they can do better on tests and exams.

None of the ACER researchers, including Masters at the top, have any sort of acceptable research publication record in refereed academic journals and, as we will see shortly, none of the new ANSA test proposers do either, and the same goes for our purported education experts prominent in the media.

(2) NAPLAN testing should be retained as is, but with a timing change to the end of the year.  Before explaining why, I would like to respond to the shockingly ill-informed proposal, released to the media on August 28 by Queensland government Education Minister Ms Grace Grace, titled “Expert review suggests NAPLAN be replaced.”  This proposal, commissioned and paid for by the state governments of Queensland, Victoria, and New South Wales and the territory government of the ACT, would have to be agreed to by all state and territory governments as well as the federal government before it could be adopted.

The proposal argues that NAPLAN should be scrapped and replaced by a yet-to-be-devised “Australian National Standardised Assessment” (ANSA) test.  This is a naive proposal, and I suggest that it has gotten as far as it has only because of the research ignorance of those who wrote it and the poor staff advice given to the four government education ministers who have so far endorsed it.  According to the media release, the expert review of NAPLAN and the proposal of the new ANSA test were the work of “renowned education experts”: the aforementioned Professor Barry McGaw of the University of Melbourne, Professor Bill Louden of the University of Western Australia, and Professor Claire Wyatt-Smith of the Australian Catholic University.  Renowned education experts?  Like Professor Masters, Professor McGaw has at one time or another held high-level positions in all three organizations – the OECD, which runs PISA; the ACER, which administers PISA testing in Australia; and the ACARA, which administers NAPLAN.  McGaw also obtained his PhD in the US in 1972, in his case from the University of Illinois, and therefore should have been well trained in research, yet he has zero refereed publications listed on the main academic publication database, Google Scholar.  Professor Louden, recently retired, has a minuscule 170 Google Scholar citations from a very poor lifetime total of 17 refereed journal articles.  Professor Wyatt-Smith, the only reasonably research-active member of this threesome, has a modest 2,500 or so citations on Google Scholar and very little experience in educational measurement.

The McGaw-led proposal is likely to escape scrutiny in the media because Australia’s few media-active academic education experts are no better qualified than its proposers.  Here are two of the more qualified media commentators.  Dr. Rachel Wilson of the University of Sydney, described in The Daily Telegraph (July 16, 2020) as “a school assessment expert,” has fewer than 1,000 lifetime Google Scholar citations and mostly publishes in international journals rather than in Australian journals where her work might be seen by Australia’s educators.  Dr. Jennifer Buckingham, Senior Research Fellow in the privately funded Centre for Independent Studies and a regular contributor to The Daily Telegraph, also has fewer than 1,000 Google Scholar citations.  To give an idea of just how low 1,000 citations is, consider that Australia’s (and probably the world’s) most prolific education researcher, Professor Herbert Marsh, a research professor at ACU and at Oxford University, has over 130,000 Google Scholar citations, and that Australia’s second-most published education researcher, the recently retired Professor John Sweller of UNSW, has 42,000.  Incidentally, both these researchers have received large amounts of government funding for their research over the years and I am curious as to why they have not stepped forward, retired or not, on the PISA-NAPLAN and now ANSA issue.  Criticism of education research should not have to be left to a mere business and psychology professor like myself.

The McGaw-led proposal of August 28 is a big worry.  There are at least five major problems with it.  The first problem is the statement by Ms Grace that “It is clear that the current NAPLAN testing is not world’s best practice” when NAPLAN clearly is world’s best practice.  NAPLAN is identical in content – though appropriately lower in level of difficulty for Year 9 down to Year 3 – to the world’s most widely used test for university entrance, the Scholastic Aptitude Test, now simply called the SAT, which in the U.S. is typically taken late in the final year of high school, or early in the year after leaving, in time to apply for university entrance before the U.S. academic year begins in September.  The SAT has recently been adopted in Australia by the University of Melbourne.  (To see the fascinating scientific and political evolution of the SAT you need go no further than Wikipedia, “SAT,” last edited June 5, 2020.)  Both the SAT and the NAPLAN tests – although with today’s ubiquitous political correctness they are not now labeled as such – are essentially IQ tests.  They test the applicant’s mathematical (numeracy) IQ and verbal (literacy) IQ, and the equally weighted sum of the two provides a very good measure of overall IQ.  The content of the Year 9 NAPLAN tests and the two SAT tests is identical.  The NAPLAN Year 9 Numeracy test focuses on arithmetic, algebra, and geometry, with some attention to statistics and probability; the SAT Math test focuses on exactly the same four learning areas.  The NAPLAN Year 9 Literacy test focuses on reading comprehension, correct use of language, and quality of written expression; the SAT Reading & Writing test focuses on exactly these three areas.
The NAPLAN tests and SAT tests are identical even in their predominant use of the 4-option multiple-choice format, which the NAPLAN Year-9 Practice Booklet (Athanasou & Defterious, Sydney: Pascal Press, 2017) correctly states “is by far the preferred option for large-scale testing,” adding that “the most competent students do well whatever the form of assessment and those students well below the minimum will struggle with whatever method of assessment is used.”  Multiple-choice test scoring is preferable because it is far more objective than the subjective scoring of open-ended answers and, with machine scoring used exclusively these days, provides much faster results.

The ignoring of IQ – also called “general mental ability” – by Australia’s educational researchers (and policymakers) is nothing short of scandalous.  They should read what is to my mind the definitive review of the importance of IQ, Linda Gottfredson’s plainly written article “Why g matters: The complexity of everyday life,” which can be found online in the academic journal Intelligence (1997, volume 24, issue 1, pages 79-132).  Spearman’s famous “g” ability – IQ in common parlance – is defined by Gottfredson as “the ability for reasoning, problem-solving, and decision-making,” and her analysis is by far the best available because it is based on the results of very broad IQ testing, namely, in personnel selection for all job levels in the private sector in the U.S. and for recruitment into the U.S. military.  I will summarize the main findings here.  Firstly, IQ as measured in the private sector by tests such as the Wechsler or the very similar U.S. Employment Service General Aptitude Test Battery, and as measured in the military sector by the Armed Forces Qualifying Test or its forerunner the Army Alpha test, is the single best predictor of job performance at all levels of employment.  For civilian jobs, IQ has an average correlation with objective job performance, on a scale of 0 minimum to 1.00 maximum, of .75, and an average correlation with supervisor-rated job performance of .47 (supervisors aren’t that good at judging job performance but their ratings are often the only thing that matters for the employee!).  For military jobs, which are mostly less complex except at the very top level, the respective correlations are .53 and .24.  Secondly, IQ’s prediction of job performance increases with the complexity of the job, defined by Gottfredson as the ability to identify and deal with problem situations quickly, to learn and recall job-related information, and to reason and make judgments.
Using the more realistic supervisory ratings, the predictive correlation of IQ for performance in high complexity jobs (e.g., IT programmer, lawyer, accountant, high-level business executive) is .58; for medium complexity jobs (e.g., radiologist, high school teacher, auto mechanic) it is .51; for low complexity jobs (e.g., assembler, machine operator, forklift driver) it is .40; and for very low complexity jobs (e.g., spot welder, nurse’s aide, janitor) it is .23.  Note that high school teaching is rated as a medium complexity job and the recommended minimum IQ for hiring high school teachers in the U.S. is 112, which translates to roughly the top fifth of the population, just as it would here in Australia.  An IQ of 110, representing the top 25% of the population, would seem to be the very minimum needed to effectively teach high school Maths or English.  Thirdly, IQ is strongly related to trainability – except for very low complexity jobs where training is hardly needed and probably wouldn’t “take” anyway.  The correlation of IQ with trainability is slightly higher for medium complexity jobs, .57, than low complexity jobs, .54, and is slightly lower for high complexity jobs, .50, in which a high level of training is usually reached before getting the job.  IQ is ideal for selection within job complexity levels because, for a particular job type, the relatively higher-IQ employee will likely be more trainable and a better performer.  Lastly, of interest is that, apart from in wartime when selection standards have to be lowered, the U.S. Navy requires a minimum IQ of 91, the Air Force 88, and the Army 85, noting that the threshold for borderline mental retardation is an IQ of 75.  Any way you cut it, you cannot ignore IQ.
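These IQ-to-percentile conversions follow directly from the conventional IQ scale (mean 100, standard deviation 15) and the normal curve, and readers can check them for themselves.  Here is a minimal sketch in Python; the function name `top_percent` is mine, purely for illustration:

```python
import math

def top_percent(iq, mean=100.0, sd=15.0):
    """Percentage of the population at or above a given IQ score,
    assuming IQ is normally distributed with the stated mean and SD."""
    z = (iq - mean) / sd
    # Upper-tail probability of the standard normal distribution,
    # computed via the complementary error function.
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))

# The cutoffs discussed in the text: selective-school entry (120),
# the U.S. minimum for hiring high school teachers (112), and the
# suggested minimum for teaching high school Maths or English (110).
for iq in (120, 112, 110):
    print(f"IQ {iq}: top {top_percent(iq):.1f}% of the population")
```

On this scale an IQ of 110 corresponds to the top 25 per cent or so, 120 to the top 9 per cent, and 112 to roughly the top fifth; different tests are normed slightly differently, which is why quoted percentages can vary by a point or two.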

The SAT total score, combining SAT Math and SAT Verbal, has a very high correlation, .86, with the “gold standard” Wechsler-tested IQ score (Wikipedia, 2020).  Since NAPLAN, as revealed above, covers the same content as the SAT, it is likely that a combined NAPLAN Numeracy and Literacy score would also provide a suitably accurate measure of IQ.  The Australian government actually uses IQ testing to determine who gets into our selective high schools.  In NSW, for instance – the states and territories vary – the child hoping to enter a selective school requires an IQ score of 120+, representing the top 9% of the IQ distribution.  The NSW Department of Education has until now used the Wechsler IQ test, specifically the WISC-V, for this purpose, but according to the Department’s website, NSW, for some unstated reason, is in the process of moving away from the WISC.  Looking at the bigger picture, one may ask why the government considers IQ testing valid for evaluating the ability of our brightest high schoolers but not the ability of the rest.  IQ is much more important than other factors, such as household socio-economic status, in determining school achievement.  As a startling example of this, in the 2012 PISA testing round – and we can probably accept PISA results here because we are comparing within the same year – the children of Shanghai’s garbage collectors (the U.S. term “sanitation engineers” would be a kinder one) had a higher PISA Maths average score, 592, than the children of Australia’s IT professionals, 555 (The Australian, February 21, 2014).  This is not surprising given that the average young person’s IQ in Shanghai is 108 according to most estimates, about 10 IQ points higher than that of young people here.  This IQ superiority may be due to the fact, demonstrated to me by one of my PhD students who grew up in China, that the Chinese have a better method of teaching arithmetic than we do.
The major difference is that multiplication is taught as equivalent to addition, and division as equivalent to subtraction; for instance, 2 × 3 = 6 has to be learned vacuously by rote by Western students, whereas Chinese students learn it as 2 + 2 + 2, which you can almost see happening and is far more concrete.  The Chinese have an advantage for reading, too.  This is because spelling is much easier in Chinese; for example, the abstract four-letter English word “good” is reduced to a single concrete character in Chinese, one which combines, fascinatingly to those of us in the gender-obsessed world of the West, the symbol for “woman” with the symbol for “child”; and the hard-to-spell 11-letter word “triceratops” becomes just the three characters “three,” “horn,” “dinosaur.”  It is little wonder that the Chinese outperform us not only in mathematics but in language comprehension as well.

The second problem with the McGaw-led review is that it falsely claims that NAPLAN testing (and presumably the new proposed ANSA test) encourages teachers to “teach to the test,” as though this were an effective method of improving students’ scores.  SAT researchers have studied this question since 1946, when the then-called Scholastic Aptitude Test was first introduced (Wikipedia, 2020).  What they found was that private tutoring and coaching, the most intensive and expensive available method of “teaching to the test,” will improve SAT Math scores by, on average, just over 3%, and improve SAT Verbal scores by less than 2%.  This hardly seems worth the cost unless your son or daughter is just below the cutoff of a particular university’s required entrance score (noting that the SAT is offered four times a year, twice a year in Australia, and can be re-sat as many times as you wish, but that repeat test-taking improves scores by only about 1% on average and that you might even do worse on a repeated test).  For NAPLAN, normal classroom teaching with practice questions should be enough to provide the only real benefit of “teaching to the test” – which is that practice will help the student overcome unfamiliarity with test procedures and get used to the various specific forms of question-and-answer presentation.  This of course should reduce test anxiety and also cut down the time spent nervously trying to understand how to answer the questions.  A not unimportant side issue here is this: if teachers are not teaching Maths and English, then just what are they teaching?  I think we all know the answer to that!
As a break from Maths and English, and in place of practically useless and quickly forgotten subjects such as history, geography, economics, and foreign languages – where many of the students who take language classes already speak the language at home anyway – all students would be much better off doing an hour or so of active group sport several days a week, given that children get far less exercise outside of school than they used to.  Elementary civics might also be worth visiting occasionally before children leave school and become old enough to vote.  Civility, on the other hand, can only be taught indirectly, by parents and teachers serving as role models.

The third problem with the McGaw-led review is the proposal authors’ belief that you can teach “critical thinking” – something that reportedly would be tested in the new ANSA test.  But critical thinking – or “reasoning” as it used to be called – is the fundamental process that underlies IQ, and, unlike mere facts and purely procedural knowledge, it cannot be taught.  Critical thinking or reasoning ability is a skill that the smarter students pick up automatically and there is plenty of research evidence from failed attempts to teach it showing that this is the reality of the situation.  Like “growth mindset theory” favored by Gonski and Masters, this idea of trying to teach “critical thinking” is a waste of teachers’ resources and time.

The fourth problem with the McGaw-led review is the proposal to place a greater emphasis on science, by which is undoubtedly meant the so-called STEM subjects of science, technology, engineering, and maths.  However, IQ is already the single strongest predictor of performance in STEM subjects at university regardless of whether you did science subjects in high school (see D. Lubinski, Behavior Genetics, 2009).  At primary school, what is known as home science – food choice and preparation, cleaning, use of appliances, hygiene, grooming, and childcare – would not go astray given the high incidence of family breakups and the sad state of parenting in many households these days.  At high school, students need only be taught the basic amount of physics, chemistry, and biology needed for them to be able to function safely in everyday life, although intellectually curious students are undoubtedly going to look for more advanced coverage of these three subjects in Years 11 and 12.

The last problem with the McGaw-led review is its talk of moving the Year 3, Year 5, and Year 7 testing to the beginning of those years, and the Year 9 testing to the beginning of Year 10.  This is dangerous because the teacher’s beginning-of-year perception of how “bright” the student is will inevitably bias his or her marking in the more subjective areas.

My recommendations for which tests should be given, and when, are as follows.  We should retain the existing NAPLAN tests at Years 3, 5, 7, and 9 but move them to the end of the year.  We should also drop the between-school “NAPLAN competition” and instead report the results for the individual child over the four NAPLAN years privately to the school and to the child’s parent(s) or carer(s).  We should expect the individual child’s NAPLAN scores in Numeracy and Literacy to stay pretty constant over Years 3, 5, 7, and 9, recognizing that the NAPLAN tests are designed to increase in difficulty as the child goes through school.  Any marked gains, therefore, could be attributed to the teaching, and any marked declines could indicate either a health problem or simply that the child is slacking off because of bigger worries than schoolwork.

At the end of Year 9, we should allow these mostly 15-year-old students, on the basis of their NAPLAN Numeracy and Literacy scores, to leave school if they so wish, provided they undertake training in a useful trade.  At present, only 72% of public school students go on to Year 12 to attempt the HSC, and only 45% of Aboriginal and Torres Strait Islander students do so (The Daily Telegraph, June 14, 2020).  Let’s be truthful.  The ability to go on to university is largely determined by IQ, and there is very little that an extra two years of school, often endured with great difficulty and frequent failure, will do to change that.  While on this topic, it has been reported in the press recently (The Sunday Telegraph, September 13, 2020, Letters to the Editor) that the Government’s severe cuts to TAFE mean that only about $6,500 a year is spent on the average full-time Vocational Education and Training, VET, student, compared with about $40,500 spent on the average full-time university student.  This clearly discriminatory policy obviously has to change.

(3) At the end of Year 12, I recommend that for those students who wish to try for university entry we supplement or even replace the HSC with the SAT.  We should use specifically the SAT I, which includes only the Maths and Verbal tests, and not the more recently added SAT II, which examines performance in individual subjects.  Adopting the internationally used SAT I, as so far only the University of Melbourne has done, will at last give Australia a valid screener for university entry.

(4) Here is my most economically and socially vital recommendation.  At the end of Year 9, we should introduce and require a NAPLAN Creative Ability Test.  Australia as a nation has an embarrassingly poor recent track record of innovation in manufacturing, production, business, and also services – we are now predominantly a service economy with our increasing emphasis on tourism and foreign student education, not to mention barbers and coffee shops.  Here are some of the alarming statistics on genuine innovation, where innovation is defined as the implementation of new or significantly improved products, services, processes, and marketing activities (UNESCO Institute for Statistics, Summary Report of the 2015 UIS Innovation Data Collection, March 2017).  We have fallen out of the top 10 countries in terms of patents applied for per capita and fallen out of the top 10 countries in patents applied for relative to national GDP (World Intellectual Property Indicators, Wikipedia, May 7, 2020).  South Korea, China, and Japan lead us on these two statistics by a country mile.  And, suspiciously to my mind with its record of industrial espionage, China now accounts for 46% of all new patent applications worldwide (WIPO, 2020).  Other indicators (UNESCO, 2017) reveal Australia’s poor support of innovation.  Only 21% of Australia’s self-named innovative manufacturing firms report conducting in-house R&D, compared with 82% of innovative manufacturing firms in South Korea, 59% in China, and 56% in Japan.  Just 28% of Australia’s self-designated innovative manufacturing firms conduct specific training for innovation, compared with 44% in China, 37% in South Korea, and 37% in Japan.  And, on the output side, only 27% of self-designated innovative manufacturing firms in Australia reported introducing an innovation to market in 2015, compared with 37% in Japan, and 37% in South Korea – though, suggesting a lower proportion of successful innovations, only 23% in China.

One reason for Australia’s poor innovation record is our government’s well-known chronic reluctance to financially support innovation, and the consequent need for many of our best innovators to take their ideas overseas, mostly to the U.S.  The other reason is our failure to identify, in the high school years, our young people with hidden creative ability.

I immodestly and lamentably point out that I am the only active researcher in Australia who has any sort of research publication record in the field of creativity and creative idea generation.  Creativity has been an important field of study for me ever since my first academic publication – an article summarizing my University of Western Australia 1965 Bachelor of Psychology Honours thesis research on IQ and creativity as predictors of Year 9 school achievement (Taft & Rossiter, Psychological Reports, 1966).  My interest in creativity has continued through my career as a business professor and consultant in the U.S. and then after returning home to Australia.  Meanwhile, we have non-experts parading as experts on creative thinking, such as the Australian Council for Educational Research’s Senior Research Fellow Ray Philpot (“Assessing creative thinking,” ACER News, September 11, 2019) – ACER being the PISA promoters, you will remember – who does not have even a single Google Scholar-listed publication!  Emeritus Professor Ronald Taft, recently turned 100 and a founding member of the Australian Psychological Society, who was my University of Western Australia thesis supervisor, designed a highly valid test of creative ability back in 1964, based on the work of the famous University of Southern California psychology professor J.P. Guilford, which I had the benefit of using for my thesis research.  Scores on this test were found to be completely unrelated to the students’ Wechsler-tested Year 9 IQ scores.  Moreover, creative ability scores were found to be completely unrelated to school achievement scores.  What this means – and Taft and I were the first to find it – is that highly creative students are largely being missed in the traditional school system.

Introducing a NAPLAN Creative Ability Test together with the regular NAPLAN tests at the end of Year 9 would give much-overdue recognition to those highly creative children who are not necessarily doing well in school but who, if identified early, encouraged, and rewarded for their special talent, could no doubt help raise our innovation performance in every job category in Australia.  This is a great opportunity and the Australian government cannot afford to miss it.

Professor John R. Rossiter, AM, is an expert in social science measurement.  He is an Honorary Professorial Fellow in the Faculty of Business at the University of Wollongong and an Adjunct Research Professor in the School of Psychology at Charles Sturt University.
