Content Validity: Expert Judgment Required

For accurate study data, you need a tool that correctly & comprehensively measures the outcome of interest (the concept). If a tool measures your outcome of interest accurately, it has strong validity. If it measures that outcome consistently, it has high reliability.

For now, let’s focus on validity.

Again, validity is how well a research tool measures what it is intended to measure. 

The four (4) types of validity are 1) face, 2) content, 3) construct, & 4) criterion-related. Click here to read my blog on face validity–the weakest type. Now, let’s step it up a notch to content validity.

Content validity is the comprehensiveness of a data collection tool. In other words, does the instrument include items that measure all aspects of the concept you are studying–whether that concept is professional quality of life, drug toxicity, spiritual health, pain, or something else.

When you find a tool that you want to use, look for documented content validity. Content validity means that the tool creators:

  1) adopted a specific definition of the concept they want to measure,
  2) generated a list of all possible items from a review of the literature and/or other sources,
  3) gave both their definition and item list to 3-5+ experts on the topic, &
  4) asked those experts to independently rate how well each item represents the adopted concept definition. Often experts are asked to evaluate item clarity as well.

When a majority of the expert panel agrees that an item matches the definition, then that item becomes part of the new tool. Items without agreement are tossed. Experts may also edit items or add items to the list, and the tool creator may choose to submit edited and new items to the whole expert panel for evaluation.

Optionally, tool creators may calculate a content validity index (CVI) for individual items and/or for the tool as a whole, but content validity still rests on experts’ judgment. Some tool authors are simply more comfortable having a number to represent that judgment. An acceptable CVI is ≥ 0.78; the “≥” means “greater than or equal to.” (Click here for more on item & scale CVIs.)
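To make the arithmetic concrete, here is a minimal Python sketch of how an item-level CVI (I-CVI) and a scale-level average CVI (S-CVI/Ave) are commonly computed. The 4-point relevance scale and every rating below are hypothetical, purely for illustration–not data from any real expert panel:

```python
# Minimal sketch of content validity index (CVI) calculations.
# Assumes each expert rates each item's relevance on a 4-point scale
# (1 = not relevant ... 4 = highly relevant); ratings of 3 or 4 count
# as "relevant." All ratings below are made up for illustration.

# ratings[item] -> one relevance rating per expert (5 experts here)
ratings = {
    "Item 1": [4, 3, 4, 4, 3],
    "Item 2": [4, 4, 3, 2, 4],
    "Item 3": [2, 3, 1, 2, 3],
}

def item_cvi(expert_ratings):
    """I-CVI: proportion of experts rating the item 3 or 4."""
    relevant = sum(1 for r in expert_ratings if r >= 3)
    return relevant / len(expert_ratings)

i_cvis = {item: item_cvi(r) for item, r in ratings.items()}

# S-CVI/Ave: the average of the item-level CVIs across the whole tool.
s_cvi_ave = sum(i_cvis.values()) / len(i_cvis)

for item, cvi in i_cvis.items():
    flag = "keep" if cvi >= 0.78 else "revise or drop"
    print(f"{item}: I-CVI = {cvi:.2f} ({flag})")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
```

Running this flags Item 3 (I-CVI = 0.40) for revision or removal, mirroring the expert-agreement step described above: the number simply summarizes the panel’s judgment.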

When reading a research article, you might see content validity reported for the tool. Here’s an example: “Content…validity of the nurse and patient [Spiritual Health] Inventories…[was] based on literature review [and] expert panel input…. Using a religious-existential needs framework, 59 items for the nurse SHI were identified from the literature with the assistance of a panel of theology and psychology experts…. Parallel patient items were developed, and a series of testing and revisions was completed resulting in two 31-item tools” (Highfield, 1992, p. 4).

For more, check out this quick explanation of content validity: a 3-minute YouTube video. If you are trying to establish content validity for your own new tool, consult a mentor and a research text like Polit & Beck’s Nursing research: Generating and assessing evidence for nursing practice.

Critical thinking: What is the difference between face and content validity? How are they alike? (Hint: check out the video.) What other questions do you have?

Face Validity: Judging a Book by Its Cover

“Don’t judge a book by its cover.” That’s good advice about not evaluating persons merely by the way they look to you. I suggest we all take it.

But…when it comes to evaluating data collection tools, things are different. When we ask the question, “Does this questionnaire, interview, or measurement instrument look like it measures what it is supposed to measure?” then we are legitimately judging a book (the instrument) by its cover (its appearance). We call that judgment face validity. In other words, the tool appears to us on its face to measure what it is designed to measure.

For example, items on the well-established Beck Depression Inventory (BDI) cover a range of symptoms, such as sadness, pessimism, feelings of failure, loss of pleasure, guilt, crying, and so on. If you read all BDI items, you could reasonably conclude just by looking at them that those items do indeed measure depression. That judgment is made without the benefit of statistics, and thus you are judging that book (the BDI) by its cover (how it appears to you). That is face validity.

Face validity is only one of four types of data collection tool validity.

In research, tool validity is defined as how well a research tool measures what it is designed to measure. The four broad types of validity are: a) face, b) content, c) construct, and d) criterion-related validity. And make no mistake, face validity is the weakest of the four. Nonetheless, it makes a good starting point. Just don’t stop there; you will need one or more of its three stronger cousins–content, construct, and criterion-related–to have a strong data collection tool.

And…referring back to the BDI example: the BDI probably looks valid because its validity has been verified through those other, stronger types of validity.

Thoughts about why we need face validity at all?

On Target All the Time and Every Time!

“Measure twice. Cut once!” goes the old carpenter’s adage. Why? Because measuring accurately means you’ll get the outcomes you want!

Same in research. A consistent and accurate measurement gives you results you can trust. Whether an instrument measures something consistently is called reliability. Whether it measures accurately is called validity. So, before you use a tool, check its reported reliability and validity.

A good resource for understanding the concepts of reliability (consistency) and validity (accuracy) of research tools is https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/. Below are the quoted Key Takeaways:

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
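To make those reliability criteria concrete, here is a minimal Python sketch of two of the statistics named above: test-retest reliability, often quantified as a Pearson correlation between time-1 and time-2 scores, and internal consistency, often quantified with Cronbach’s alpha. All of the scores below are invented, purely for illustration:

```python
# Minimal sketch of two common reliability statistics.
# All scores below are hypothetical, for illustration only.
from statistics import mean, pvariance

def pearson_r(x, y):
    """Test-retest reliability: correlation of time-1 and time-2 scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(item_scores):
    """Internal consistency: item_scores[i] = each person's score on item i."""
    k = len(item_scores)
    sum_item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-person totals
    return (k / (k - 1)) * (1 - sum_item_vars / pvariance(totals))

# The same five people measured twice, two weeks apart (hypothetical).
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
print(f"Test-retest r = {pearson_r(time1, time2):.2f}")

# Three items answered by the same five people (hypothetical).
items = [
    [3, 4, 2, 5, 4],
    [2, 4, 2, 5, 5],
    [3, 5, 1, 4, 4],
]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

In this made-up example both values come out high (r ≈ 0.97, alpha ≈ 0.92), which is what you hope to see reported for a tool before you use it.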