How do you know your tests are reliable?

It is one thing to administer a test of, say, numerical ability to a group of college students; it is another thing to have confidence in using the results that emerge. What if the results happened to show that students with poor GCSE maths grades actually did quite well on the test? What if the test was administered to the same students on a subsequent occasion with very different results? What if the test questions appeared to be more scientific than mathematical? All of these situations would provide grounds for doubting whether results from the test could be trusted as accurate indices of students’ numerical abilities.

Test developers, such as the NFER, need to be sure, and need to be seen to be sure, that their tests do not fall foul of these kinds of criticisms. We therefore need to be proactive in demonstrating how trustworthy results from our tests are likely to be. Unfortunately, there is no single experiment that could prove conclusively whether a particular test will yield trustworthy results in future applications. However, a wide range of conceptual and empirical techniques have been developed over the years to provide at least partial evidence. These have traditionally been discussed under two headings: reliability and validity.

There are no absolutes when dealing with psychological or educational tests, and no test will ever be perfect. For this reason, evidence of reliability and validity has to be weighed up by the person who intends to use the results of the test. An essential consideration is the likelihood that conclusions from the reliability and validity experiments (presented by the test developer) will generalise to the situation in which the test will be used. Ultimately, the onus is on the test user to decide whether the results that they will generate are likely to be valid enough or reliable enough for the purpose that they have in mind.


