Computer Adaptive Tests: what’s not to like?

By Angela Hopkins, NFER Head of Assessment Services

Wednesday 24 November 2021

Current assessment and accountability arrangements in primary schools have been examined in detail by the EDSK (Education and Skills) think tank's report, Making Progress: The future of assessment. A key recommendation is that England should move to online adaptive tests to track the performance and progress of primary-aged pupils.

(Computer) Adaptive Testing, sometimes referred to as personalised testing, is a topic of growing interest, especially given the transitions made in Scotland, Wales and Australia to various forms of adaptive assessment. In this blog we take an objective look at both the benefits and the challenges of an adaptive testing model.

The defining feature of a computer adaptive test (CAT) is that it selects questions for the test taker as they go along, based on how well they have performed on the questions they've answered so far. By adapting in this way, the test can present the test taker with a set of questions targeted to their ability. Most test takers would come away having scored around half marks on a test that feels 'about right' for them.
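To make that mechanism concrete, here is a minimal sketch of the adapt-as-you-go loop, assuming a simple Rasch (one-parameter) item response model. Everything in it – the item bank structure, the respond callback, the fixed-step ability update – is an illustrative simplification, not a description of any operational CAT:

```python
import math
import random

def rasch_prob(theta, difficulty):
    """Probability of a correct response under a Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def next_item(item_bank, theta, administered):
    """Pick the unused item whose difficulty is closest to the current
    ability estimate: questions pitched near theta tell us the most."""
    candidates = [item for item in item_bank if item["id"] not in administered]
    return min(candidates, key=lambda item: abs(item["difficulty"] - theta))

def run_cat(item_bank, respond, n_items=10, step=0.5):
    """Administer n_items adaptively. 'respond' is a callback returning
    True/False for the pupil's answer. The fixed-step ability update is
    a deliberate simplification of real ability estimation."""
    theta = 0.0                # start at the middle of the scale
    administered = set()
    for _ in range(n_items):
        item = next_item(item_bank, theta, administered)
        administered.add(item["id"])
        theta += step if respond(item) else -step  # up on success, down on failure
    return theta

# Simulate a pupil with a hidden 'true' ability of 0.8 logits.
bank = [{"id": i, "difficulty": (i - 15) / 5.0} for i in range(30)]
pupil_ability = 0.8
estimate = run_cat(bank, lambda item: random.random() < rasch_prob(pupil_ability, item["difficulty"]))
print(f"final ability estimate: {estimate:+.1f}")
```

Operational systems replace the fixed step with maximum-likelihood or Bayesian ability estimation and add content-balancing and item-exposure controls, but the underlying adapt-as-you-go principle is the same.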

The concept of adaptive testing has been around for some time. Indeed, tiering, as used in many GCSE exams when they were introduced in the late 1980s, is a fairly basic form of adaptive testing. Online assessment capability, however, presents the opportunity for greater use of this approach through a CAT. Driven by algorithms, CATs enable a more sophisticated form of adaptive testing and open up the possibility of use at a national level.

There are numerous benefits associated with a computer adaptive system. One of the most important of these is the pupil's experience. Rather than taking a test in which they potentially cannot answer many of the questions or, conversely, find many of them too easy, the pupil is directed through a series of mainly targeted questions. This helps to spare the pupil any sense of inadequacy or boredom. The test length can also be reduced, as the use of targeted questions enables stakeholders to make inferences about the pupil's knowledge and understanding of the curriculum from fewer questions than a linear test needs. Similarly, a test comprising questions which are increasingly targeted at the pupil's ability level will provide a greater degree of precision than a test designed for pupils of all abilities.
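This precision claim has a simple statistical basis: in item response theory, a question tells us most about a pupil whose ability sits close to its difficulty. A rough illustration, again under an assumed Rasch model and with made-up ability and difficulty values:

```python
import math

def item_information(theta, difficulty):
    """Fisher information of a Rasch item, p * (1 - p): it peaks when
    the question's difficulty matches the pupil's ability."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)

theta = 1.0  # an assumed pupil ability on the logit scale
for difficulty in (-2.0, 0.0, 1.0, 3.0):
    info = item_information(theta, difficulty)
    print(f"difficulty {difficulty:+.1f}: information {info:.3f}")
# The well-targeted item (difficulty +1.0) yields 0.250; the badly
# mistargeted one (+3.0) yields about 0.105. Test information is the
# sum over items, and the standard error is 1 / sqrt(information),
# so targeted questions buy precision -- or an equally precise shorter test.
```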

Other benefits include automated scoring and immediacy of results, avoiding the need for post-test marking arrangements and checks on marker accuracy. With their higher levels of precision, CATs can also support a system where the focus is more on tracking learners' progress over an educational phase, with potentially more frequent but shorter assessments. A move to more targeted assessments also enables a move to testing when ready, rather than testing everyone at the same time. This does have implications for reporting outcomes, however, as explained later.

In summary, CATs potentially offer a more reliable, proportionate and user-friendly test.

As enthusiasm for CATs grows – as seen in the EDSK report and ASCL's recent Blueprint, A Great Education For Every Child – it's important that key stakeholders in the assessment system reflect on all the implications of an adaptive system.

For a country which has grown very used to the notion of linear assessments, with raw scores and grade boundaries, CATs would require a shift to understanding outcomes reported on a scale determined by an algorithm, with much less transparency. For a start, pupils taking the assessment would not receive a raw or total score as a result. Their performance would be described as a position on an ability scale, although it could be translated into a grade or accompanied by a proficiency description. It is the job of test developers and psychometricians to ensure this metric can be understood in terms of the knowledge, skills and understanding the pupil has demonstrated.
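To show what 'a position on an ability scale' might look like in practice, here is a purely illustrative sketch, once more assuming a Rasch model. The grid search is a crude stand-in for proper estimation, and the cut-offs and band labels are invented for the example rather than drawn from any national framework:

```python
import math

def log_likelihood(theta, responses):
    """Log-likelihood of a response pattern under a Rasch model;
    'responses' is a list of (item_difficulty, answered_correctly) pairs."""
    total = 0.0
    for difficulty, correct in responses:
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_theta(responses):
    """Maximum-likelihood ability estimate via a crude grid search
    over -4.0 to +4.0 logits in steps of 0.1."""
    grid = [x / 10.0 for x in range(-40, 41)]
    return max(grid, key=lambda theta: log_likelihood(theta, responses))

def describe(theta):
    """Translate a scale position into a hypothetical proficiency band.
    The cut-offs and labels here are invented for this example."""
    if theta <= -1.0:
        return "working towards the standard"
    if theta <= 1.0:
        return "working at the standard"
    return "working beyond the standard"

responses = [(-1.5, True), (-0.5, True), (0.5, True), (1.0, False), (2.0, False)]
theta = estimate_theta(responses)
print(f"scale position {theta:+.1f} logits -> {describe(theta)}")
```

Even this toy example illustrates the reporting challenge: a position expressed in logits means little to a parent until it is translated into a description or a grade.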

A revised national accountability system based on CATs, as proposed in both the EDSK and the ASCL reports, would be possible to introduce, but the 'results' and outputs would be complicated. For many stakeholders, including parents and pupils, it could feel less transparent than the current arrangement and lead to frustration and a lack of engagement. Any move to an adaptive system would therefore benefit from an effective communications campaign to support the transition. In addition, no accountability system is immune to having an unintended influence on schools' behaviours. For example, if testing windows were to become more flexible, as seen in some other countries which use adaptive testing, it may create a tension between timing the test to best meet the needs of the school (in relation to published outputs) and timing it to meet individual pupil needs.

CATs could potentially make it harder for teachers to use test and question-level data to inform learning at a whole class level, because pupils will have tackled different questions. Careful consideration of how outcomes from CATs are reported is, therefore, important to ensure the information can support teaching.

A CAT could also potentially limit the learner’s opportunity to show their true ability. For example, if they were to enter uncharacteristic responses to earlier questions, they may not be presented with more demanding or more accessible questions later on. Similarly, some learners may not perform in a linear fashion. For example, they may be able to access some fairly challenging concepts in maths, yet struggle with more basic questions.

A CAT needs to comprise questions which can be scored in real time using automated scoring. This restricts the types of questions that can be presented to learners and would require a significant change to many assessments currently in use in England, including national curriculum tests. While it is possible to test some aspects of higher-order skills and understanding through selected-response questions, doing so limits the scope of what is assessed and, as a result, tests could provide less valid evidence of attainment and progress.

Behind the scenes, any CAT system needs a very large bank of questions to function effectively. There must be enough questions to cover the test content and the full ability range of candidates adequately. The bank must also avoid being open to allegations of memorisation or predictability of questions on certain topics – a risk that increases if pupils are tested more frequently. It may seem like a straightforward task, but developing large numbers of high-quality questions with fine gradations of challenge is demanding, and it could make the set-up costs of the assessment very high.

It goes without saying that for a CAT to work effectively, it must be rigorously piloted to ensure the assessment will function as intended and produce reliable data about each question and each test taker. In addition, regardless of the device used, and when or where the test is taken, the learner experience should be the same.

CATs carry many benefits, especially in relation to improving pupils' experience. However, such a significant change to the assessment system, and the particular challenges of a move to CATs, should not be underestimated. The complex development process would need to be carefully managed to ensure it results in valid and reliable measurements which are informative for teachers and other stakeholders.