Centre for Assessment
A discussion of validity
Although validity is a broad notion that tends to defy precise definition, it can usefully be described as the extent to which test scores are appropriate for their intended uses. The central facet of validity, integral to every intended use of a test, concerns the characteristic, or construct, that the test is supposed to assess.
Construct-related validity concerns the extent to which scores on a particular test represent the actual distribution of the characteristic that the test is supposed to assess; that is, the extent to which test scores accurately represent the amount of characteristic X possessed by each person tested. The first question is how much confidence can be placed in the construct itself: is it meaningful in common-sense and psychological terms? Answering this might require locating the construct firmly within a theoretical framework of academic psychology. From an empirical perspective, reliability evidence is important in establishing construct-related validity. Similarly, where scores on sub-tests are aggregated to form overall scores, evidence of correlation between the sub-tests is desirable.
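The reliability evidence mentioned above is often summarised by an internal-consistency statistic such as Cronbach's alpha. The following is a minimal sketch of that calculation; the item scores are entirely hypothetical and serve only to illustrate the arithmetic.

```python
# A minimal sketch of Cronbach's alpha, a common internal-consistency
# reliability statistic. All scores below are hypothetical.
def cronbach_alpha(scores):
    """Estimate reliability from an item-score matrix (rows = candidates,
    columns = items) using alpha = (k/(k-1)) * (1 - sum(item vars)/total var)."""
    k = len(scores[0])  # number of items

    def variance(xs):
        # Sample variance (divisor n - 1).
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores for five candidates on four items.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(scores), 2))  # prints 0.94
```

A high value (conventionally above about 0.7 or 0.8) indicates that the items behave consistently, which supports, though does not by itself establish, the claim that they measure a single construct.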
Criterion-related validity concerns the relationship between scores on the test in question and on other assessments of performance. This might be investigated from a concurrent or from a predictive standpoint.
Concurrent validity asks how well test performance matches expectations based upon evidence already available from other relevant sources. For example, it was expected that scores on the NFER’s First Graduate Assessment test would bear at least some relationship to the UCAS point scores of the students upon whom the test was trialled. Correlations of 0.36, 0.28 and 0.22 for the verbal, numerical and abstract components respectively supported this expectation. Of course, as the underlying constructs were not precisely the same, very high correlations would have been undesirable. Slightly different concurrent evidence supported the validity of the NFER’s Critical Reasoning Tests. Here, analysis of variance (ANOVA) was used to demonstrate that higher scores on the verbal and numerical sub-tests were associated with more O-level or GCSE ‘passes’.
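The correlations quoted above are Pearson product-moment coefficients. The sketch below shows how such a concurrent-validity check could be computed; the scores are invented for illustration and are not NFER trial data.

```python
# A minimal sketch of a concurrent-validity check: correlating trial-test
# scores with an existing measure (here, hypothetical UCAS-style points).
# All data are illustrative, not real trial results.
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

test_scores = [52, 61, 47, 70, 58, 43]  # hypothetical trial-test scores
ucas_points = [18, 24, 20, 26, 22, 16]  # hypothetical UCAS point scores
print(round(pearson_r(test_scores, ucas_points), 2))
```

As the text notes, a moderate positive coefficient is what one hopes to see here: a near-zero value would undermine the validity claim, while a near-perfect one would suggest the new test measures nothing beyond the existing criterion.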
Predictive validity asks how well test performance predicts a future criterion. For example, if the primary purpose of GCE A-level examinations is to provide evidence for university selectors, then UCAS point scores ought to predict university performance with reasonable accuracy. Unfortunately, predictive validity often takes a number of years to establish, which means that it is reported by test developers less frequently than it should be. Yet it is one of the most important sources of validity evidence, as tests are so frequently used as tools for prediction.
Content-related validity concerns the subject matter of a test. If test scores are to be used primarily to certify a person’s level of attainment in an area, then it is particularly important to demonstrate that the items on each test are both relevant to that area (content relevance) and sample the different aspects of that area adequately (content coverage). These concerns are of central importance when the NFER designs national curriculum tests for the Qualifications and Curriculum Authority (QCA).
Finally, while these more conceptual aspects of validity are paramount, the consequential aspects of validity must not be forgotten. These concern the intended and unintended consequences of using test scores for particular purposes. One of the most frequently discussed concerns is the backwash effect that high-stakes school assessments can have on the curriculum. When schools are judged on the performance of their pupils in a limited range of subjects, teachers can be tempted to spend more time teaching those subjects at the expense of others.