Confidence bands

If we measured an adult's height with the same tape-measure every day for one week, we would not expect the values to differ from one day to the next. The only sources of variability would be the care with which the person doing the measuring aligned the tape-measure and the way in which they read a value from its scale. Test scores, by contrast, do not provide such fixed and exact measures of a test-taker's ability or attainment. It is important to appreciate that, however carefully tests are constructed, an element of error is likely to appear in the results they produce. This does not imply that a mistake has been made in scoring the tests, but rather that the score a test-taker achieves can vary within a few points around the 'true score'. The true score is the hypothetical perfect measurement of the person's ability that would be obtained if he or she had taken a very long test with no outside influences such as fatigue or learning. The extent to which a test produces consistent estimates of an individual's true score is called the reliability of the test.

To allow for and quantify this margin of error, a 'confidence interval' or 'confidence band' can be calculated for any particular test score. The band is a function of the 'standard error of measurement' of the test, which combines two properties of a set of scores obtained from a number of test-takers. These are (1) the reliability of the test and (2) the 'standard deviation' of the scores, a measure of how spread out the scores are as a result of different test-takers having differing abilities.
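As an illustration of how these two quantities can combine, the sketch below uses a formulation common in classical test theory: the standard error of measurement is taken as the standard deviation multiplied by the square root of (1 minus the reliability), and the band is the score plus or minus a normal-distribution multiplier times that value. This is an assumption made for illustration, not necessarily the method used for any particular test. The standard deviation of 15 and reliability of 0.94 are hypothetical figures, and the function names are invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, reliability):
    """Assumed classical formula: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def symmetric_band(score, sd, reliability, confidence=0.90):
    """Return (lower, upper) limits of a symmetrical confidence band."""
    # Two-sided multiplier from the standard normal distribution,
    # roughly 1.645 for 90 per cent and 1.96 for 95 per cent.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * standard_error_of_measurement(sd, reliability)
    return score - half_width, score + half_width


# Hypothetical test: standard deviation 15, reliability 0.94.
lower, upper = symmetric_band(100, sd=15, reliability=0.94)
print(f"{lower:.1f} to {upper:.1f}")  # approximately 94.0 to 106.0
```

With these hypothetical figures the half-width comes out at about 6 points. A less reliable test, or one with more widely spread scores, would give a wider band.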

Confidence bands vary in size from test to test. A typical 90 per cent confidence band for the mean (average) score of 100 might be obtained by adding 6 and subtracting 6, so there would be a 9 out of 10 (90 per cent) chance that the test-taker's true score lies within the band 94 to 106. As an illustration, in a group of 30 test-takers, three would be expected to have true scores that fall outside their confidence bands. However, we would not know the identities of these three test-takers.
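The arithmetic behind the group of 30 can be sketched in a few lines. The function name is hypothetical, and the result is an expected (long-run average) count only; in any single group the actual number could be higher or lower by chance.

```python
def expected_outside_band(group_size, confidence_percent):
    """Expected number of test-takers whose true score lies outside the band.

    confidence_percent is given as a whole-number percentage (e.g. 90)
    so that the arithmetic stays exact for round figures.
    """
    if not 0 <= confidence_percent <= 100:
        raise ValueError("confidence_percent must lie between 0 and 100")
    return group_size * (100 - confidence_percent) / 100


# 90 per cent band, group of 30: 10 per cent of 30 test-takers.
print(expected_outside_band(30, 90))  # 3.0
```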

There is, though, a complicating factor in using confidence bands: they are symmetrical only for scores close to the mean (average) score of 100. For high and low scores, the confidence band tends to be pulled back toward the average score for the test. The extent to which this can happen for a test is shown in the table below. The numbers given are those that should be added to and subtracted from scores in the different standardised score ranges from 70 to 140 in order to form the 90 per cent confidence band. In this particular example, for a test-taker scoring 111, the band should be formed by adding 5 and subtracting 7. The limits of the 90 per cent confidence band would therefore be 104 and 116.


Standardised      To form confidence band
score range       Add       Subtract

70 to 76          8         4
77 to 80          7         4
81 to 90          7         5
91 to 94          6         5
95 to 105         6         6
106 to 109        5         6
110 to 119        5         7
120 to 124        4         7
125 to 134        4         8
135 to 138        3         8
139 to 140        3         9

Numbers that should be added to and subtracted from a standardised score for a hypothetical test to form a 90 per cent confidence band.

Although 90 per cent is a numerically convenient level, its use is somewhat arbitrary statistically; the size of the confidence band simply reflects the level of certainty with which the test user wishes to operate. The other commonly used level of confidence is 95 per cent, for which only 3 out of a group of 60 test-takers would be expected to have true scores falling outside the band.
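Applying the table above amounts to a simple lookup, which the following sketch illustrates. The values are copied from the hypothetical table and apply only to that example; the names BAND_TABLE and confidence_band are invented here.

```python
# (lowest score, highest score, amount to add, amount to subtract),
# copied from the hypothetical 90 per cent table above.
BAND_TABLE = [
    (70, 76, 8, 4),
    (77, 80, 7, 4),
    (81, 90, 7, 5),
    (91, 94, 6, 5),
    (95, 105, 6, 6),
    (106, 109, 5, 6),
    (110, 119, 5, 7),
    (120, 124, 4, 7),
    (125, 134, 4, 8),
    (135, 138, 3, 8),
    (139, 140, 3, 9),
]


def confidence_band(score):
    """Return (lower, upper) limits of the 90 per cent band for a score."""
    for lowest, highest, add, subtract in BAND_TABLE:
        if lowest <= score <= highest:
            return score - subtract, score + add
    raise ValueError("score is outside the tabulated range 70 to 140")


print(confidence_band(111))  # (104, 116)
print(confidence_band(100))  # (94, 106)
```

A score of 111 gives the limits 104 and 116, matching the worked example in the text, while a score of 100 gives the symmetrical band 94 to 106.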


