Testing the Test (or an intro to “Does the measurement measure up?”)

When reading a research article, you may be tempted only to read the Introduction & Background, then go straight to the Discussion, Implications, and Conclusions at the end. You skip all those pesky, procedures, numbers, and p levels in the Methods & Results sections.

Perhaps you are intimidated by all those “research-y” words like content validity, construct validity, test-retest reliability, and Cronbach’s alpha because they just aren’t part of your vocabulary….YET!

WHY should you care about those terms, you ask? Well…let’s start with an example. If your bathroom scale erratically measured your weight each a.m., you probably would toss it and find a more reliable and valid bathroom scale. The quality of the data from that old bathroom scale would be useless in learning how much you weighed. Similarly in research, the researcher wants useful outcome data. And to get that quality data the person must collect it with a measurement instrument that consistently (reliably) measures what it claims to measure (validity). A good research instrument is reliable and valid. So is a good bathroom scale.

Let’s start super-basic: Researchers collect data to answer their research question using an instrument. That test or tool might be a written questionnaire, interview questions, an EKG machine, an observation checklist, or something else. And whatever instrument the researcher uses needs to give them correct data answers.

For example, if I want to collect BP data to find out how a new med is working, I need a BP cuff that collects systolic and diastolic BP without a lot of artifacts or interference. That accuracy in measuring BP only is called instrument validity. Then if I take your BP 3 times in a row, I should get basically the same answer and that consistency is called instrument reliability. I must also use the cuff as intended–correct cuff size and placement–in order to get quality data that reflects the subject’s actual BP.

The same thing is true with questionnaires or other measurement tools. A researcher must use an instrument for the intended purpose and in the correct way. For example, a good stress scale should give me accurate data about a person’s stress level (not their pain, depression, or anxiety)–in other words it should have instrument validity. It should measure stress without a lot of artifacts or interference from other states of mind.

NO instrument is 100% valid–it’s a matter of degree. To the extent that a stress scale measures stress, it is valid. To the extent that it also measures other things besides stress–and it will–it is less valid. The question you should ask is, “How valid is the instrument?” often on a 0 to 1 scale with 1 being unachievable perfection. The same issue and question applies to reliability.

Reliability & validity are interdependent. An instrument that yields inconsistent results under the same circumstances cannot be valid (accurate). Or, an instrument can consistently (reliably) measure the wrong thing–that is, it can measure something other than what the researcher intended to measure. Research instruments need both strong reliability AND validity to be most useful; they need to measure the outcome variable of interest consistently.

Valid for a specific purpose: Researchers must also use measurement instruments as intended. First, instruments are often validated for use with a particular population; they may not be valid for measuring the same variable in other populations. For example, different cultures, genders, professions, and ages may respond differently to the same question. Second, instruments may be valid in predicting certain outcomes (e.g., SAT & ACT have higher validity in predicting NCLEX success than does GPA). As Sullivan (2011) wrote: “Determining validity can be viewed as constructing an evidence-based argument regarding how well a tool measures what it is supposed to do. Evidence can be assembled to support, or not support, a specific use of the assessment tool.”

In summary….

  1. Instrument validity = how accurate the tool is in measuring a particular variable
  2. Instrument reliability = how consistently the tool measures whatever it measures

Fun Practice: In your own words relate the following article excerpt to the concept of validity? “To assess content validity [of the Moral Distress Scale], 10 nurses were asked to provide comments on grammar, use of appropriate words, proper placement of phrases, and appropriate scoring. From p.3, Ghafouri et al. (2021). Psychometrics of the moral distress scale in Iranian mental health nurses. BMC Nursing. https://doi.org/10.1186/s12912-021-00674-4

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.