“Don’t judge a book by its cover.” That’s good advice about not evaluating people merely by the way they look to you. I suggest we all take it.
But…when it comes to evaluating data collection tools, things are different. When we ask the question, “Does this questionnaire, interview, or measurement instrument look like it measures what it is supposed to measure?”, we are legitimately judging a book (instrument) by its cover (appearance). We call that judgment face validity. In other words, the tool appears to us on its face to measure what it is designed to measure.
For example, items on the well-established Beck Depression Inventory (BDI) cover a range of symptoms, such as sadness, pessimism, feelings of failure, loss of pleasure, guilt, crying, and so on. If you read all BDI items, you could reasonably conclude just by looking at them that those items do indeed measure depression. That judgment is made without the benefit of statistics, and thus you are judging that book (the BDI) by its cover (how it appears to you). That is face validity.
Face validity is only one of four types of data collection tool validity.
In research, tool validity is defined as how well a research tool measures what it is designed to measure. The four broad types of validity are: a) face, b) content, c) construct, and d) criterion-related validity. And make no mistake, face validity is the weakest of the four. Nonetheless, it makes a good starting point. Just don’t stop there; you will need one or more of its three statistical cousins (content, construct, and criterion-related validity) to have a strong data collection tool.
And…referring back to the BDI example…the BDI probably looks valid because its validity has also been verified by the other, statistical types of validity.
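To make those statistical cousins a bit more concrete, here is a minimal sketch of criterion-related validity, assuming a hypothetical new depression questionnaire and entirely fabricated scores: we correlate the new tool’s scores with an established criterion (the BDI) using SciPy’s pearsonr.

```python
# Hypothetical sketch of criterion-related validity: correlate a new
# depression questionnaire with an established criterion (BDI scores).
# All numbers below are fabricated for illustration only.
from scipy.stats import pearsonr

# Scores for the same 10 participants on both instruments.
new_tool_scores = [12, 25, 7, 30, 18, 22, 5, 27, 15, 20]
bdi_scores = [14, 28, 9, 33, 17, 24, 6, 29, 13, 21]

# Pearson correlation: how closely do the two sets of scores track each other?
r, p_value = pearsonr(new_tool_scores, bdi_scores)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```

A strong, statistically significant correlation with the criterion would be numerical evidence that the new tool measures depression, which is something no amount of looking at the items (face validity) can establish on its own.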
Thoughts about why we need face validity at all?
