How are tests developed?

NFER has been developing assessments for over 70 years for a variety of sponsors, including national governments. We also produce our own suite of optional assessments in reading, mathematics, grammar, punctuation and spelling, called NFER Tests.

NFER’s test development processes are underpinned by in-depth research into what works in assessment and by close work with schools across the country, which helps ensure that tests are pitched at an appropriate level of difficulty, use accessible language and include relevant, suitable contexts. In addition to informal trials in schools, our summer tests undergo thorough standard setting with large and diverse groups of experienced teachers to ensure they are rigorous and reliable. The NFER Tests range has also been standardised on a sample of over 60,000 pupils to ensure accurate and robust outcomes.

There are five main stages in the test development cycle:

Test specification
Item writing
Trialling
Standardisation
Standard setting

Test specification – During this stage, the purpose of the test and the intended test-takers are considered in order to draw up a specification of the required characteristics, outputs and development methodology for the assessment.

Item writing – Test questions (items) are then written and reviewed to ensure that they map to the relevant curriculum, or adequately represent the particular skills, knowledge or understanding the test is intended to assess.

Trialling – Questions are trialled in schools to establish how pupils interact with them, to inform the development of the mark schemes and to identify any aspects of the questions that could be improved or amended.

Standardisation – Once the questions have been finalised, and before the test goes live, the test is trialled with a large, statistically representative sample of pupils in order to provide national benchmarks. An explanation of standardised and age-standardised scores can be found here.

Standard setting (where appropriate) – A process, drawing on both qualitative and quantitative data, that is used to define the threshold levels of achievement (cut scores or grade boundaries) or the required proficiency on an assessment. For example, standard setting can establish which raw score on the test corresponds to the ‘expected standard’ for a pupil.

Test development processes vary between developers, and some are more robust than others, so it is always worth asking about the development process when considering commercial materials.

For more on the effective use of assessment, head over to the NFER Assessment Hub where you'll find a host of free guidance and resources. You can also sign up to our monthly assessment newsletter for exclusive assessment-related content delivered direct to your inbox.

For more information on NFER’s popular range of termly standardised assessments for key stages 1 and 2, visit www.nfer.ac.uk/tests.