Centre for Assessment
What do test scores mean?
Many people will remember test scores from their school days, such as '7 out of 10' for a primary school spelling test or '63%' for one of their secondary school exams. Such scores are readily understandable and useful in indicating what proportion of the total marks a person has gained, but they take no account of factors such as how hard the test is, where a person stands in relation to other people, or the margin of error in the test score. Nor, in a school test such as mathematics or English, would such a score tell us how well the pupil is performing against National Curriculum measures.
Many professionally produced tests, including most of those constructed by NFER, give outcomes that are different from simple proportions or percentages. The following types of score or measure account for many of the outcomes of educational or psychometric tests:
Standardised scores and percentile ranks are directly related. Both enable test-takers to be compared with a large, nationally representative sample that has taken the test prior to publication. The standardised score is on a scale that can be readily compared and combined with standardised scores from other tests; the percentile rank gives a rank ordering of that score based on the population as a whole.
Standardised scores are more useful measures than raw scores (the number of questions answered correctly), and there are three reasons why such scores are normally used.
1) In order to place test-takers' scores on a readily understandable scale
One way to make a test score such as 43 out of 60 more readily understandable would be to convert it to a percentage (72 per cent to the nearest whole number). However, the percentage on its own is not related to (a) the average score of all the test-takers, or (b) how spread out their scores are. On the other hand, standardised scores are related to both these statistics. Usually, tests are standardised so that the average, nationally standardised score automatically comes out as 100, irrespective of the difficulty of the test, and so it is easy to see whether a test-taker is above or below the national average.
The measure of the spread of scores is called the 'standard deviation' and this is usually set to 15 for educational attainment and ability tests, and for many occupational tests. This means that, irrespective of the difficulty of the test, about 68 per cent of the test-takers in the national sample will have a standardised score within 15 points of the average (between 85 and 115), and about 95 per cent will have a standardised score within two standard deviations (30 points) of the average (between 70 and 130). These examples come from a frequency distribution known as 'the normal distribution', which is shown in the figure below.
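The conversion behind this can be sketched in a few lines of Python. This is an illustration only: in practice the mean and standard deviation would come from the national standardisation sample, whereas here (and the raw scores themselves) they are invented.

```python
import statistics

def standardise(raw_scores, mean_raw=None, sd_raw=None):
    """Convert raw scores to standardised scores (mean 100, SD 15).

    In practice the mean and SD come from a large national
    standardisation sample; here, purely for illustration, they are
    estimated from the scores supplied.
    """
    if mean_raw is None:
        mean_raw = statistics.mean(raw_scores)
    if sd_raw is None:
        sd_raw = statistics.pstdev(raw_scores)
    return [100 + 15 * (raw - mean_raw) / sd_raw for raw in raw_scores]

# Invented raw scores out of 60 for five test-takers:
print([round(ss) for ss in standardise([31, 38, 43, 47, 52])])
```

Whatever the difficulty of the test, the resulting scores have an average of 100 and a standard deviation of 15, so a score above or below 100 immediately shows standing relative to the average.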
2) In educational tests, so that an allowance can be made for the different ages of the pupils
In a typical class in England and Wales, it is usual that most pupils are born between 1st September in one year and 31st August of the following year, which means that the oldest pupils are very nearly 12 months older than the youngest. Almost invariably in ability tests taken in the primary and early secondary years, older pupils achieve slightly higher raw scores than younger pupils. However, standardised scores are derived in such a way that the ages of the pupils are taken into account by comparing a pupil only with others of the same age (in years and months). An older pupil may in fact gain a higher raw score than a younger pupil, but have a lower standardised score. This is because the older pupil is being compared with other older pupils in the reference group and has a lower performance relative to his or her own age group.
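The idea of comparing a pupil only with others of the same age can be sketched as below. This is a simplified illustration with invented data: real tests derive norms from large national samples and smoothed norms tables, not the tiny same-age groups used here.

```python
import statistics

def age_standardise(records):
    """records: a list of (age_in_months, raw_score) pairs.

    Each pupil is standardised only against pupils of the same age,
    so an older pupil can gain a higher raw score than a younger
    pupil yet receive a lower standardised score.
    """
    by_age = {}
    for age, raw in records:
        by_age.setdefault(age, []).append(raw)
    results = []
    for age, raw in records:
        peers = by_age[age]
        mean = statistics.mean(peers)
        sd = statistics.pstdev(peers)
        results.append(100 + 15 * (raw - mean) / sd if sd else 100.0)
    return results

# Invented data: the pupil aged 132 months with a raw score of 50
# beats the pupil aged 121 months with a raw score of 40 on marks,
# but ranks lower within his or her own age group.
scores = age_standardise([(121, 20), (121, 30), (121, 40),
                          (132, 40), (132, 50), (132, 60)])
```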
3) So that scores from more than one test can be meaningfully compared or added together
Standardised scores from most educational tests cover the same range from 70 to 140. Hence a pupil's standing in, say, mathematics and English can be compared directly using standardised scores. Similarly, should a teacher wish to add together scores from more than one test, for example in order to obtain a simple overall measure of attainment, they can be meaningfully combined if standardised scores are used, whereas it is not meaningful to add together raw scores from tests of different length or difficulty.
In occupational tests, the use of standardised scores enables the organisation to compare directly or add together sub-test scores or scores from different tests in a battery.
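The point can be illustrated with invented figures for two pupils taking a long mathematics test and a shorter, harder English test. Adding the raw marks favours whoever happened to sit the longer test well, whereas averaging standardised scores keeps both subjects on the same scale.

```python
# Invented figures; "ss" = standardised score on the usual
# mean-100, SD-15 scale.
pupils = {
    "Asha": {"maths_raw": 62, "maths_ss": 104, "english_raw": 30, "english_ss": 118},
    "Ben":  {"maths_raw": 70, "maths_ss": 112, "english_raw": 25, "english_ss": 101},
}

for name, p in pupils.items():
    raw_total = p["maths_raw"] + p["english_raw"]       # not meaningful across tests
    overall_ss = (p["maths_ss"] + p["english_ss"]) / 2  # meaningful: one common scale
    print(name, raw_total, overall_ss)
```

In this invented example Ben has the higher raw total (95 marks to 92) but Asha has the higher overall standardised score (111.0 to 106.5), because the raw total over-weights the longer mathematics test.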
Recording a test-taker's percentile rank enables his or her performance to be compared very clearly with those in the national standardisation sample. The percentile rank of a test-taker is defined as the percentage of test-takers in the sample who gained a score at the same level or below that of the test-taker's score. Performance at the 25th percentile, for example, indicates a standardised score that is as good as, or better than, the standardised scores of 25 per cent of the sample. This information may be useful when, for example, reporting school test scores to parents.
There is, in fact, a fixed relationship between standardised scores and percentile ranks when the same average score and standard deviation are used. The table below shows the relationship for tests that employ an average standardised score of 100 and a standard deviation of 15.
The table below shows conversion of standardised scores to percentile ranks (example only).
* SS = Standardised Score; PR = Percentile rank
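Because the relationship is fixed by the normal distribution, the conversion can also be computed directly. The sketch below assumes, as above, an average standardised score of 100 and a standard deviation of 15.

```python
import math

def percentile_rank(ss, mean=100.0, sd=15.0):
    """Percentile rank implied by a standardised score under the
    normal distribution (the fixed relationship described above)."""
    z = (ss - mean) / sd
    # Normal cumulative distribution, expressed via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for ss in (70, 85, 100, 115, 130):
    print(ss, round(percentile_rank(ss)))
```

A standardised score of 100 corresponds to the 50th percentile, and scores one standard deviation either side of the average (85 and 115) to roughly the 16th and 84th percentiles.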
There are, though, some disadvantages to using percentiles. First, since they denote positions in a rank order, they cannot be meaningfully combined to find averages for groups of test-takers, such as a school or a class. Second, as can be seen in the table, percentile ranks change very rapidly in the centre of the score range: a small increase in standardised score can lead to a big jump in a test-taker's percentile rank, simply because of the bunching of scores in the middle of the range. Comparisons between test-takers in terms of percentiles should therefore be made very cautiously. Care should also be taken when considering the change in percentile rank of one test-taker on different occasions or on different tests.