The validity of a pre-hire assessment is the extent to which the assessment is well-grounded in research and corresponds accurately to the real-world dimensions it claims to represent; in short, validity is the degree to which a test measures what it is supposed to measure.
Types of Validity
Content validity is evaluated by showing how well the content of a test samples the class of situations or subject matter about which conclusions are to be drawn, or how representative the test sample is of the universe of generalization for which it is intended. Example of content validity: The GRE Advanced Test in Psychology should adequately and proportionately represent the different fields of psychology. Example of lack of content validity: A teacher gives an exam over chapters he or she has not covered in class. Unlike most of the other forms of validity, content validity cannot be measured by a statistic; it is usually assessed in terms of expert opinion.
Criterion-related validity is evaluated by comparing the test scores with one or more external variables (called criteria) considered to provide a direct measure of the characteristic or behavior in question. Example: self-esteem correlating with GPA.
Predictive validity indicates the extent to which an individual’s future level on a criterion is predicted from prior test performance. Or, the extent to which future levels on a construct are predicted from present construct scores. Example: Using the GRE-Verbal scores of college seniors to predict their later graduate school GPA.
Concurrent validity indicates the extent to which the test scores estimate an individual’s present standing on the criterion. Or, the extent to which a construct is related to another construct or criterion when both are measured at the present time. Example: Need-for-achievement score correlating with GPA, both measured now.
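To make the criterion-related logic concrete, here is a minimal sketch in Python of a predictive-validity analysis: a predictor score is correlated with a criterion collected later. The numbers and variable names (gre_verbal, grad_gpa) are hypothetical, not real data; a concurrent-validity analysis has exactly the same form, with both variables measured at the same time.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical predictor scores collected in the senior year of college
gre_verbal = np.array([150, 155, 160, 148, 165, 158, 152, 170, 145, 162])
# Hypothetical graduate-school GPAs collected later (the criterion)
grad_gpa = np.array([3.1, 3.4, 3.6, 3.0, 3.8, 3.5, 3.2, 3.9, 2.9, 3.7])

# The test-criterion correlation is the validity coefficient
r, p = pearsonr(gre_verbal, grad_gpa)
print(f"Predictive validity coefficient r = {r:.2f} (p = {p:.3f})")
```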
Construct validity is evaluated by investigating what qualities a test measures, that is, by determining the degree to which certain explanatory concepts or constructs account for performance on the test. This is the “big cheese” of validity and can be seen as incorporating all other forms of validity evidence. In principle, there is a complete theory surrounding a construct, every link of which is empirically verified in construct validation. Construct validation requires the integration of many studies.
Convergent validity is evaluated by the degree to which different (hopefully independent) methods of measuring a construct are related and produce similar results. A good metaphor here is a legal trial, where the different forms of evidence (e.g., eyewitness testimony, blood samples, fingerprints, fibers) converge on the same result and lead to a common conclusion. Example: Self-reported extroversion is related to extroversion as reported by a spouse or as rated by an observer.
Discriminant validity is evaluated by the degree to which a construct is discriminable from (e.g., uncorrelated or only weakly correlated with) and non-redundant with other related constructs. Example: Your new measure of self-esteem can be differentiated statistically from other established measures of self-esteem (for example, by showing moderate to low correlations with cognate constructs, different validity patterns, and incremental validity).
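The sketch below shows how convergent and discriminant evidence appear together in a correlation matrix: two methods of measuring the same trait (self-reported and spouse-reported extroversion) correlate strongly with each other, while a different construct (self-esteem) correlates only weakly with both. The data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two methods of measuring the same construct (convergent evidence expected)
extro_self = rng.normal(size=n)
extro_spouse = 0.8 * extro_self + 0.6 * rng.normal(size=n)

# A different construct (discriminant evidence expected: low correlations)
self_esteem = 0.2 * extro_self + rng.normal(size=n)

R = np.corrcoef([extro_self, extro_spouse, self_esteem])
labels = ["extro_self", "extro_spouse", "self_esteem"]
for name, row in zip(labels, R):
    print(f"{name:>12}: " + "  ".join(f"{r:5.2f}" for r in row))
```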
Incremental validity refers to the degree to which a construct (or variable) significantly adds unique variance to the prediction of some other construct or criterion. Example: In a hierarchical regression equation, your new measure of self-esteem adds unique variance to the prediction of Teacher Rating of Competence after the Coopersmith Self-Esteem scale has already been entered into the equation.
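A minimal sketch of that hierarchical-regression logic, assuming simulated data: fit the criterion on the established measure alone, then add the new measure and examine the change in R-squared. The variable names (coopersmith, new_scale, teacher_rating) are placeholders for the example; in practice the increment would also be tested with an F test.

```python
import numpy as np

def r_squared(predictors, criterion):
    """R^2 from an ordinary least-squares fit with an intercept term."""
    X = np.column_stack([np.ones(len(criterion)), predictors])
    beta, *_ = np.linalg.lstsq(X, criterion, rcond=None)
    residuals = criterion - X @ beta
    return 1.0 - residuals.var() / criterion.var()

rng = np.random.default_rng(1)
n = 150
coopersmith = rng.normal(size=n)                     # established self-esteem scale
new_scale = 0.5 * coopersmith + rng.normal(size=n)   # new self-esteem measure
teacher_rating = 0.4 * coopersmith + 0.3 * new_scale + rng.normal(size=n)  # criterion

# Step 1: established measure only; Step 2: add the new measure
r2_step1 = r_squared(coopersmith[:, None], teacher_rating)
r2_step2 = r_squared(np.column_stack([coopersmith, new_scale]), teacher_rating)
print(f"R^2 step 1 = {r2_step1:.3f}, step 2 = {r2_step2:.3f}, "
      f"delta R^2 = {r2_step2 - r2_step1:.3f}")
```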
Known-group validation refers to predicting and verifying differences on a construct as a function of group membership, where there is a high degree of a priori consensus about between-group differences on levels of the construct. For example, we would predict and expect to find mean differences on the construct “Attitudes toward Abortion” between “Pro-Choice” and “Pro-Life” groups. In fact, if we did not find a whopping t-test difference, we might suspect that something was wrong with our measurement of Attitudes toward Abortion. Or, we might establish known-group validation for a measure of schizophrenia by comparing residents of a psychiatric hospital with the general population.
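Here is a minimal sketch of a known-groups analysis, using simulated scores on a hypothetical attitudes-toward-abortion scale: the two groups are compared with an independent-samples t test, and a large difference in the predicted direction is the validity evidence.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
# Simulated scores on a hypothetical attitudes-toward-abortion scale
pro_choice = rng.normal(loc=75, scale=10, size=60)
pro_life = rng.normal(loc=35, scale=10, size=60)

# A large, significant difference in the predicted direction supports the measure;
# failing to find one would cast doubt on what the scale is actually measuring.
t, p = ttest_ind(pro_choice, pro_life)
print(f"t = {t:.1f}, p = {p:.2g}")
```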