A to Z of Assessment

Sometimes a good old glossary of terms can be the most valuable resource when getting to grips with a new area, so below you can find definitions for some of the most common terms used in relation to assessment.

In the UK, education is a devolved issue with each of the countries of the United Kingdom having separate educational policies and practice. The terms defined in this glossary include some generic terms and some that reflect specific aspects of practice in England and Wales.

Terms given in italics have a separate entry in the glossary.

Accountability – process of evaluating school performance on the basis of pupil performance measures.

Age-related expectations – standards that define what is expected of a pupil by a specified age or year group.

Age-standardised scores – standardised scores that account for the age of a pupil.

Analyse School Performance (ASP) system – replaced RAISEonline in England in 2017 and provides detailed performance analysis to schools.

Assessment coordinator/assessment leader – the member of school staff who leads on assessment.

Assessment for learning – any assessment activity that guides further learning. See Formative assessment.

Assessment literacy – knowledge and understanding of good assessment practice.

Baseline assessment – used to establish a point from which future measurements and predictions can be calculated. By comparing a pupil's performance with their baseline assessment, attainment and progress can be monitored.

Comparative judgement – an assessment technique which involves comparing a series of pairs of pieces of work side-by-side to establish a measurement scale. For each pair, the judge focuses on identifying which of the two pieces is of higher quality.

Computer-adaptive test – a computer-based test that presents easier or more difficult tasks based on previous answers (i.e. it adapts to the test-taker's performance). Sometimes referred to as ‘CAT tests’ or ‘personalised tests’.
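
The adapting behaviour described above can be illustrated with a deliberately simplified rule: move up one difficulty level after a correct answer and down one after an incorrect answer. This is only a sketch. The 1 to 10 difficulty scale, the starting level and the function names are assumptions made for illustration, and real computer-adaptive tests typically use more sophisticated statistical models to choose the next question.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Step up one level after a correct answer, down one after an incorrect one."""
    step = 1 if was_correct else -1
    # Keep the level inside the assumed 1-10 difficulty scale.
    return max(lowest, min(highest, current + step))


def difficulties_presented(answers, start=5):
    """Return the difficulty level of each question shown to the test-taker.

    answers is a list of booleans: True if that question was answered correctly.
    """
    level = start
    presented = []
    for was_correct in answers:
        presented.append(level)
        level = next_difficulty(level, was_correct)
    return presented


# Hypothetical pupil: two correct answers, one incorrect, then one correct.
print(difficulties_presented([True, True, False, True]))  # [5, 6, 7, 6]
```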

Computer-based assessment – a test, task or other assessment activity completed on a computer, sometimes, but not necessarily, delivered online. Also known as E-Assessment.

Contextual value added – a value added measure of the effectiveness of a school or the progress made by individual pupils, taking account of pupils’ starting points (prior attainment) and school and pupil contextual factors (such as the number of pupils in the school eligible for free school meals or the number of pupils for whom English is not their first language).

Criterion-referenced assessment – an assessment where pupils are assessed against a criterion or set of criteria rather than in comparison with the performance of other pupils. The criteria represent a level of expertise or mastery of skills or knowledge (in theory, all pupils could meet or fail the criteria). The driving test is an example of a criterion-referenced test.

Curriculum test – an assessment of what children have learned, rather than a test of skill. The assessment is based upon content from a pre-defined curriculum or programme of study.

Data (assessment) – outputs from assessment, such as test results, that can be collected and recorded for reference or analysis.

Diagnostic assessment/diagnostic test – an assessment that allows a teacher to determine pupils’ individual strengths, weaknesses, knowledge, and skills prior to a new period of learning.

Early Years Foundation Stage Profile (England) / Foundation Phase Profile (Wales) – an assessment used to summarise outcomes and progress at the end of the foundation stage / phase of education.

E-assessment – a test, task or other assessment activity completed on a computer, sometimes, but not necessarily, delivered online. Also known as Computer-based assessment.

Evaluation – gathering evidence from a range of sources that inform on performance and using this to support improvements in practice.

Formative assessment – assessment activities that show teachers where pupils are in their learning and help teachers decide what to teach next (sometimes referred to as Assessment for Learning).

High-stakes testing – typically refers to national assessment programmes. The term ‘high-stakes’ refers to the significance of the outcomes for individual pupils (e.g. secondary selection tests) or for schools (accountability judgements about school performance).

League tables – see performance tables.

National curriculum tests (NCTs) – statutory tests developed by the Standards and Testing Agency (STA) for primary schools in England (often referred to as SATs). Pupils take externally-marked national curriculum tests in reading, grammar, punctuation and spelling, and mathematics at the end of key stage 2 (end of year 6). Pupils take national curriculum tests to support teacher assessment at the end of key stage 1 (end of year 2). One of the purposes of the tests is to judge school performance.

National Literacy and Numeracy Framework (LNF) – a framework designed to help teachers in Wales to embed literacy and numeracy into all subjects for learners aged 5 to 14 and to monitor, assess and report on individual learner performance.

National reading and numeracy tests (Wales) – national tests of reading and numeracy for pupils in years 2 to 9. The tests provide an age-standardised score and a progress measure. Currently transitioning to online adaptive tests. The tests are not used for school accountability purposes.

National Reference Test – Ofqual introduced the National Reference Test (NRT) in 2017 to provide additional information to support the future awarding of GCSEs in England. The information, from English and mathematics tests, is considered each year by Ofqual and the exam boards prior to GCSEs being awarded. The NRT provides information on changes in performance over time and is based on results from a nationally representative sample of students who take the test shortly before taking their GCSEs.

National School Categorisation – a school performance / improvement system used in Wales, to show how well a school is performing. It takes into consideration how effectively the school is led and managed, the quality of learning and teaching, and the level of support and challenge it needs to do better. Each primary and secondary school is placed into one of four colour-coded support categories which trigger a tailored support package.

Norm-referenced assessment – an assessment where pupils are assessed in comparison with the performance of other pupils (e.g. standardised tests). ‘Norm’ groups (i.e. the samples with which pupils are compared) are usually ones in which the scores are distributed in a ‘normal’ or ‘bell-shaped’ curve.

Off-the-shelf tests – commercially published tests which can be purchased by schools. Some – but not all – will be standardised tests.

Optional tests – non-statutory tests used to assess pupils’ knowledge, understanding and learning.

P scales – descriptions of levels of achievement for pupils with special educational needs. P scales allow progress to be measured in small steps. In England, P scales are used to report statutory assessment outcomes at the end of key stages 1 and 2 for pupils who are working below the pre-key stage standards. (The government has said it will remove the requirement on schools to assess pupils using the P-levels from 2018-19 for subject-specific learning.)

Peer-assessment – process of pupils taking responsibility for assessing the work of their peers.

Percentile ranks – refer to a pupil’s performance in relation to the ‘norm’ (i.e. other pupils who took the same test). If a pupil’s percentile rank is 80, it means that he/she achieved a score equal to or better than 80 percent of the pupils who took the test.
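
The calculation behind a percentile rank can be sketched in a few lines. This follows the "equal to or better than" definition given above. The function name and the scores are made up for illustration, and published tests normally derive percentile ranks from a standardisation sample rather than from a single class.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are less than or equal to score."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)


# Hypothetical raw scores for ten pupils who took the same test.
scores = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34]

# Eight of the ten scores are 27 or lower, so the percentile rank is 80.
print(percentile_rank(27, scores))  # 80.0
```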

Performance tables (England) – also known as league tables. These summarise how well a school is doing, using a number of performance measures, including the percentage of pupils achieving the ‘expected standard’ in reading, writing and mathematics; pupils’ average scaled scores in reading and mathematics and pupils’ average progress in reading, writing and mathematics at the end of key stage 2.

Phonics screening check (England) – an assessment designed for use at the end of year 1 to confirm whether pupils have learnt phonic decoding to an appropriate standard. The check consists of 20 real words and 20 pseudo-words that a pupil reads aloud to the teacher.

(Interim) pre-key stage standards – for the statutory assessment of pupils (in England) who are working below the overall standard of national curriculum tests and are engaged in subject specific learning. See also P scales.

Progress 8 – a cohort measure that aims to capture the progress pupils make from the end of primary school to the end of key stage 4 in England. It is a type of value added measure, which means that pupils’ results are compared to the results of other pupils nationally with similar prior attainment. Every increase in every grade a pupil achieves attracts additional credit in the performance tables.

Pupil premium (England) / pupil development grant (Wales) – additional funding given to schools so that they can support disadvantaged pupils and close the attainment gap between them and their peers. For Early Years there is an equivalent fund: EYPP and EYPDG respectively.

RAISEonline (closed 2017) – an online assessment information package for schools in England. It gave detailed pupil assessment data, which schools used to evaluate their performance. Replaced by Analyse School Performance (ASP).

Raw score – the actual mark or score obtained by a pupil in a test. This may be converted to a Standardised score.

Reception Baseline Assessment – a baseline assessment of children’s attainment (primarily early literacy and numeracy skills) on entry to a reception class. A new statutory assessment is currently in development for introduction in reception by autumn 2020. Its purpose will be to act as the starting point for calculating progress when pupils reach the end of key stage 2 in England.

Reliability – a statistical measure of internal consistency in a test. A high reliability value (the maximum is 1) suggests that the questions in a test are related and therefore assessing the same thing. Other measures of reliability exist, e.g. indicating test stability and whether different versions of a test are interchangeable (i.e. the extent to which repeating the test or using different versions of a test would give similar results).
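
One widely used internal-consistency statistic is Cronbach's alpha. The entry above does not name a specific statistic, so using alpha here is an assumption, and the function name and marks below are hypothetical. The sketch takes one row of question marks per pupil and returns a value whose maximum is 1.

```python
from statistics import variance


def cronbach_alpha(marks_by_pupil):
    """Cronbach's alpha for a table of marks: one row per pupil, one column per question."""
    n_questions = len(marks_by_pupil[0])
    if n_questions < 2 or len(marks_by_pupil) < 2:
        raise ValueError("need at least two questions and two pupils")

    # Variance of the marks on each question, summed across questions.
    columns = list(zip(*marks_by_pupil))
    question_variance_sum = sum(variance(col) for col in columns)

    # Variance of pupils' total scores.
    total_variance = variance([sum(row) for row in marks_by_pupil])
    if total_variance == 0:
        raise ValueError("all pupils have the same total score")

    return (n_questions / (n_questions - 1)) * (1 - question_variance_sum / total_variance)


# Hypothetical marks for five pupils on a four-question test (1 = correct, 0 = incorrect).
marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(marks), 2))
```

A value close to 1 suggests that pupils who do well on one question tend to do well on the others.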

Representative sample – a sample group that accurately represents the composition of the population group of interest. Representative samples of test-takers are used in assessment trials so that valid and reliable conclusions can be drawn about the general population of test-takers.

SATs – see national curriculum tests.

Scaled score – converted from raw scores, scaled scores enable accurate comparisons of performance over time. In national curriculum tests a scaled score of 100 always represents the expected standard on the test. The NCT scaled scores are different from standardised scores (see below), where 100 represents the average at the time of standardisation.

Self-assessment – process of pupils taking responsibility for assessing their own work.

Standard deviation – a measure that is used to quantify the amount of variation or dispersion of a set of data values. In assessment, it indicates how far, on average, pupils’ test scores lie from the average score.
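
As a rough illustration, the standard deviation of a set of test scores can be worked out as follows. Strictly speaking, it is the square root of the average squared distance from the mean. The scores are made up, and the sketch treats them as the whole group of interest (a population standard deviation) rather than a sample.

```python
import math
from statistics import pstdev


def standard_deviation(scores):
    """Population standard deviation: root of the mean squared distance from the mean."""
    if not scores:
        raise ValueError("scores must not be empty")
    mean = sum(scores) / len(scores)
    squared_gaps = [(s - mean) ** 2 for s in scores]
    return math.sqrt(sum(squared_gaps) / len(scores))


# Hypothetical test scores for eight pupils; the mean score is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(scores))  # 2.0
print(pstdev(scores))              # the standard library's equivalent calculation
```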

Standardisation – the process of making something conform to a standard. In assessment, tests can be standardised through trials on a large nationally representative sample so a pupil’s performance (see Standardised score) can be benchmarked against the national average, and meaningfully compared with other pupils and standardised scores from other tests.

Standardised score – a score that is converted onto a common scale so that the achievement of pupils can be compared directly. Useful for comparing attainment of pupils who took different versions of a test or for monitoring relative progress. Age-standardised scores allow for differences in the age of pupils taking the same test.
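
A minimal sketch of the idea of a common scale is shown below. It assumes a scale with an average of 100 and a standard deviation of 15, and a simple linear conversion. Both are assumptions made for illustration: published tests usually provide conversion tables built from their standardisation sample, and the figures used here are hypothetical.

```python
def standardised_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a common scale using the norm group's mean and SD.

    scale_mean and scale_sd are assumed defaults, not values taken from any
    particular published test.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    distance_from_mean = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * distance_from_mean


# Hypothetical norm group: mean raw score 24, standard deviation 6.
print(standardised_score(30, norm_mean=24, norm_sd=6))  # 115.0
print(standardised_score(24, norm_mean=24, norm_sd=6))  # 100.0
```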

Standardised tests – published tests that have been trialled on a nationally representative sample and that can therefore be used for benchmarking individual / group performance against the national average. Schools can choose whether or not to use these tests and how frequently. For example, NFER publishes termly standardised tests in reading, mathematics, grammar and punctuation and spelling for years 3 to 5 and tests of reading and mathematics for years 1 and 2.

Standard setting – the process of establishing cut scores on an assessment. This helps to create categories such as pass / fail or grades 9-1. A variety of statistical and judgemental methods can be used to set standards.

Statutory assessments – assessments that schools are legally obliged to carry out (e.g. at the end of a key stage).

Success criteria – the evidence a teacher looks for when deciding whether a pupil has successfully learned something. For example, for a pupil beginning to learn about using full stops, the teacher’s success criteria might be that the pupil should use a full stop at the end of all sentences in a piece of writing that is one page long.

Summative assessment – a type of assessment used at the end of a topic, year or phase of education to show what pupils have learned.

Target – defines what pupils will aim to achieve next. Targets may be short term or longer term. They may be set for individual pupils, for groups, or for a whole class.

Task – a practical activity used for assessment purposes. It may be observed by a teacher who will take notes on how the pupil completes the task.

Teacher assessment frameworks – guidance from DfE in England to be used to support statutory teacher assessment judgements at the end of key stages 1 and 2.

Test – a series of questions that a pupil answers on his or her own, without help. Generally answered on paper or on a computer.

Validity – reflects the extent to which an assessment enables correct inferences to be made about a pupil’s knowledge, understanding and skills in the subject being assessed; i.e. the extent to which the test is appropriate for a given population and purpose. Validity judgements include a consideration of the extent to which the test assesses the skill or knowledge targeted, not other irrelevant characteristics, and the consistency and accuracy of its measurement (i.e. reliability is an essential requirement).

Value added – a measure of progress based purely on prior attainment. It takes account of the ‘expected’ progress of pupils with similar prior attainment and allows comparisons across the key stage for a whole year group (see also contextual value added).
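
In very simplified form, a value added calculation compares each pupil's result with the average result of pupils who had similar prior attainment. The sketch below is not the method used in any official measure. The prior attainment bands, pupil names and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def value_added(pupils):
    """Return each pupil's outcome minus the mean outcome of pupils in the same band.

    pupils is a list of (name, prior_attainment_band, outcome) tuples.
    """
    outcomes_by_band = defaultdict(list)
    for _, band, outcome in pupils:
        outcomes_by_band[band].append(outcome)

    # The 'expected' outcome for a band is the average achieved by that band.
    expected = {band: mean(values) for band, values in outcomes_by_band.items()}

    return {name: outcome - expected[band] for name, band, outcome in pupils}


# Hypothetical pupils grouped into two prior attainment bands.
pupils = [
    ("Pupil A", "lower", 40),
    ("Pupil B", "lower", 50),
    ("Pupil C", "higher", 70),
    ("Pupil D", "higher", 80),
]
print(value_added(pupils))
```

A positive figure means the pupil did better than pupils with similar starting points; a negative figure means they did less well.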

Z-score – a measure of how many standard deviations a pupil’s raw score is below or above the mean raw score. Z-scores can highlight how far a pupil’s raw score is from the average raw score of test-takers.
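
To make this concrete, the sketch below converts a set of raw scores into z-scores. The scores are hypothetical, and the group is treated as the whole population of test-takers when working out the standard deviation.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Return, for each raw score, how many standard deviations it is from the mean."""
    average = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all scores are identical, so z-scores are undefined")
    return [(score - average) / sd for score in raw_scores]


# Hypothetical raw scores: the mean is 50 and the standard deviation is 4.
raw = [44, 48, 48, 48, 50, 50, 54, 58]

for score, z in zip(raw, z_scores(raw)):
    # Negative values are below the mean, positive values are above it.
    print(score, round(z, 2))
```

Here a raw score of 58 is two standard deviations above the mean, and a raw score of 44 is one and a half standard deviations below it.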

For more on the effective use of assessment, head over to the NFER Assessment Hub where you'll find a host of free guidance and resources. You can also sign up to our monthly assessment newsletter for exclusive assessment-related content delivered direct to your inbox.

For more information on NFER’s popular range of termly standardised assessments for key stage 1 and 2, visit www.nfer.ac.uk/tests.