Category Archives: Reliability & validity

Content Validity: Expert Judgment Required

For accurate study data, you need a tool that correctly and comprehensively measures the outcome of interest (concept). If a tool measures your outcome of interest accurately, it has strong validity. If it measures that outcome consistently, it has high reliability.

For now, let’s focus on validity.

Again, validity is how well a research tool measures what it is intended to measure. 

The four types of validity are 1) face, 2) content, 3) construct, and 4) criterion-related. Click here to read my blog on face validity, the weakest type. Now, let's step it up a notch to content validity.

Content validity is the comprehensiveness of a data collection survey tool. In other words, does the instrument include items that measure all aspects of the thing (concept) you are studying–whether that thing be professional quality of life, drug toxicity, spiritual health, pain, or something else.

When you find a tool that you want to use, look for documented content validity. Content validity means that the tool creators:

  1. adopted a specific definition of the concept they want to measure,
  2. generated a list of all possible items from a review of the literature and/or other sources,
  3. gave both their definition and item list to 3-5+ experts on the topic, and
  4. asked those experts to independently rate how well (or whether) each item represents the adopted concept definition. Often experts are asked to evaluate item clarity as well.

When a majority of the expert panel agrees that an item matches the definition, then that item becomes part of the new tool. Items without agreement are tossed. Experts may also edit items or add items to the list, and the tool creator may choose to submit edited and new items to the whole expert panel for evaluation.

Optionally, tool creators may statistically calculate a content validity index (CVI) for items and/or for the tool as a whole, but content validity is still based on experts' judgment. Some tool authors are simply more comfortable with a number to represent that judgment. An acceptable CVI is ≥ 0.78; the "≥" means "greater than or equal to." (Click here for more on item & scale CVIs.)
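The CVI arithmetic is simple enough to sketch. In the commonly used approach described by Polit & Beck, each expert rates an item's relevance on a 1-4 scale; the item-level CVI (I-CVI) is the proportion of experts rating it 3 or 4, and one scale-level CVI (S-CVI/Ave) is the average of the I-CVIs. Here is a minimal sketch in Python; the expert ratings are invented for illustration:

```python
def item_cvi(ratings, relevant=(3, 4)):
    """Item-level CVI: proportion of experts rating the item relevant.

    Experts typically rate relevance 1-4; ratings of 3 or 4 count
    as "relevant."
    """
    return sum(r in relevant for r in ratings) / len(ratings)

def scale_cvi_ave(all_ratings):
    """Scale-level CVI (averaging method): the mean of the item CVIs."""
    cvis = [item_cvi(r) for r in all_ratings]
    return sum(cvis) / len(cvis)

# Five experts rate three candidate items on a 1-4 relevance scale
ratings = [
    [4, 4, 3, 4, 3],  # item 1: 5 of 5 relevant -> I-CVI = 1.00 (keep)
    [4, 3, 2, 4, 3],  # item 2: 4 of 5 relevant -> I-CVI = 0.80 (keep)
    [2, 1, 3, 2, 4],  # item 3: 2 of 5 relevant -> I-CVI = 0.40 (toss)
]
print([round(item_cvi(r), 2) for r in ratings])  # [1.0, 0.8, 0.4]
print(round(scale_cvi_ave(ratings), 2))          # 0.73
```

With item 3 tossed, the remaining items would clear the ≥ 0.78 benchmark; the whole scale here (0.73) would not, which is exactly why low-agreement items get dropped.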

When reading a research article, you might see content validity reported for the tool. Here's an example: "Content…validity of the nurse and patient [Spiritual Health] Inventories…[was] based on literature review [and] expert panel input….Using a religious-existential needs framework, 59 items for the nurse SHI were identified from the literature with the assistance of a panel of theology and psychology experts…. Parallel patient items were developed, and a series of testing and revisions was completed resulting in two 31-item tools" (Highfield, 1992, p. 4).

For more, check out this quick explanation of content validity: a 3-minute YouTube video. If you are trying to establish content validity for your own new tool, consult a mentor and a research text like Polit & Beck's Nursing Research: Generating and Assessing Evidence for Nursing Practice.

Critical thinking: What is the difference between face and content validity? How are they alike? (Hint: check out the video.) What other questions do you have?

Face Validity: Judging a book by its cover

“Don’t judge a book by its cover.” That’s good advice about not evaluating persons merely by the way they look to you. I suggest we all take it.

But…when it comes to evaluating data collection tools, things are different. When we ask the question, "Does this questionnaire, interview, or measurement instrument look like it measures what it is supposed to measure?" then we are legitimately judging a book (instrument) by its cover (appearance). We call that judgment face validity. In other words, the tool appears to us on its face to measure what it is designed to measure.

For example, items on the well-established Beck Depression Inventory (BDI) cover a range of symptoms, such as sadness, pessimism, feelings of failure, loss of pleasure, guilt, crying, and so on. If you read all BDI items, you could reasonably conclude just by looking at them that those items do indeed measure depression. That judgment is made without the benefit of statistics, and thus you are judging that book (the BDI) by its cover (how it appears to you). That is face validity.

Face validity is only one of four types of data collection tool validity.

In research, tool validity is defined as how well a research tool measures what it is designed to measure. The four broad types of validity are: a) face, b) content, c) construct, and d) criterion-related validity. And make no mistake, face validity is the weakest of the four. Nonetheless, it makes a good starting point. Just don't stop there; you will need one or more of its three statistical validity cousins (content, construct, and criterion-related) to have a strong data collection tool.

And, referring back to the BDI example: the BDI probably looks valid because its validity has been verified by those other, stronger types of validity testing.

Thoughts about why we need face validity at all?

Essentials for Clinical Researchers

[Note: bonus 20% book discount from publisher. See flyer below.]

My 2025 book, Doing Research, is a user-friendly guide, not a comprehensive text. Chapter 1 gives a dozen tips to get started, Chapter 2 defines research, and Chapters 3-9 focus on planning. The remaining Chapters 10-12 guide you through challenges of conducting a study, getting answers from the data, and sharing with others what you learned. Italicized key terms are defined in the glossary, and a bibliography lists additional resources.

New book: “Doing Research: A Practical Guide”

Author: Martha “Marty” E. Farrar Highfield

NOW AVAILABLE ELECTRONICALLY & SOON IN PRINT.

CHECK OUT: https://link.springer.com/book/10.1007/978-3-031-79044-7

This book provides a step-by-step summary of how to do clinical research. It explains what research is and isn’t, where to begin and end, and the meaning of key terms. A project planning worksheet is included and can be used as readers work their way through the book in developing a research protocol. The purpose of this book is to empower curious clinicians who want data-based answers.

Doing Research is a concise, user-friendly guide to conducting research, rather than a comprehensive research text. The book contains 12 main chapters followed by the protocol worksheet. Chapter 1 offers a dozen tips to get started, Chapter 2 defines research, and Chapters 3-9 focus on planning. Chapters 10-12 then guide readers through challenges of conducting a study, getting answers from the data, and disseminating results. Useful key points, tips, and alerts are strewn throughout the book to advise and encourage readers.

Testing the Test (or an intro to “Does the measurement measure up?”)

When reading a research article, you may be tempted to read only the Introduction & Background, then go straight to the Discussion, Implications, and Conclusions at the end, skipping all those pesky procedures, numbers, and p levels in the Methods & Results sections.

Perhaps you are intimidated by all those “research-y” words like content validity, construct validity, test-retest reliability, and Cronbach’s alpha because they just aren’t part of your vocabulary….YET!

WHY should you care about those terms, you ask? Well…let's start with an example. If your bathroom scale measured your weight erratically each a.m., you would probably toss it and find a more reliable and valid bathroom scale. The data from that old bathroom scale would be useless in learning how much you weighed. Similarly, in research, the researcher wants useful outcome data, and to get that quality data the researcher must collect it with a measurement instrument that consistently (reliably) measures what it claims to measure (validity). A good research instrument is reliable and valid. So is a good bathroom scale.

Let’s start super-basic: Researchers collect data to answer their research question using an instrument. That test or tool might be a written questionnaire, interview questions, an EKG machine, an observation checklist, or something else. And whatever instrument the researcher uses needs to give them correct data answers.

For example, if I want to collect BP data to find out how a new med is working, I need a BP cuff that collects systolic and diastolic BP without a lot of artifacts or interference. That accuracy in measuring BP only is called instrument validity. Then if I take your BP 3 times in a row, I should get basically the same answer and that consistency is called instrument reliability. I must also use the cuff as intended–correct cuff size and placement–in order to get quality data that reflects the subject’s actual BP.

The same thing is true with questionnaires or other measurement tools. A researcher must use an instrument for the intended purpose and in the correct way. For example, a good stress scale should give me accurate data about a person’s stress level (not their pain, depression, or anxiety)–in other words it should have instrument validity. It should measure stress without a lot of artifacts or interference from other states of mind.

NO instrument is 100% valid–it’s a matter of degree. To the extent that a stress scale measures stress, it is valid. To the extent that it also measures other things besides stress–and it will–it is less valid. The question you should ask is, “How valid is the instrument?” often on a 0 to 1 scale with 1 being unachievable perfection. The same issue and question applies to reliability.

Reliability & validity are interdependent. An instrument that yields inconsistent results under the same circumstances cannot be valid (accurate). Conversely, an instrument can consistently (reliably) measure the wrong thing–that is, it can measure something other than what the researcher intended to measure. Research instruments need both strong reliability AND validity to be most useful; they need to measure the outcome variable of interest both accurately and consistently.

Valid for a specific purpose: Researchers must also use measurement instruments as intended. First, instruments are often validated for use with a particular population; they may not be valid for measuring the same variable in other populations. For example, different cultures, genders, professions, and ages may respond differently to the same question. Second, instruments may be valid in predicting certain outcomes (e.g., SAT & ACT have higher validity in predicting NCLEX success than does GPA). As Sullivan (2011) wrote: “Determining validity can be viewed as constructing an evidence-based argument regarding how well a tool measures what it is supposed to do. Evidence can be assembled to support, or not support, a specific use of the assessment tool.”

In summary….

  1. Instrument validity = how accurate the tool is in measuring a particular variable
  2. Instrument reliability = how consistently the tool measures whatever it measures
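One common way to put a number on reliability is test-retest: give the same tool to the same people on two occasions and correlate the two sets of scores. A Pearson correlation near 1 suggests the tool measures consistently. Here is a rough sketch in Python; the stress scores below are hypothetical, made up for illustration:

```python
def pearson_r(x, y):
    """Pearson correlation: a common test-retest reliability estimate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical stress-scale scores from ten people, two weeks apart
time1 = [12, 18, 25, 30, 14, 22, 28, 16, 20, 26]
time2 = [13, 17, 27, 29, 15, 21, 27, 18, 19, 25]
print(round(pearson_r(time1, time2), 2))  # → 0.98
```

A test-retest r of about 0.98, as here, would suggest very consistent measurement; researchers often treat roughly 0.70+ as acceptable for a new tool.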

Fun practice: In your own words, relate the following article excerpt to the concept of validity: "To assess content validity [of the Moral Distress Scale], 10 nurses were asked to provide comments on grammar, use of appropriate words, proper placement of phrases, and appropriate scoring" (p. 3). From Ghafouri et al. (2021). Psychometrics of the moral distress scale in Iranian mental health nurses. BMC Nursing. https://doi.org/10.1186/s12912-021-00674-4

On target all the time and every time!

“Measure twice. Cut once!” goes the old carpenter adage. Why? Because measuring accurately means you’ll get the outcomes you want!

Same in research. A consistent and accurate measurement will get you the outcomes you want to know. Whether an instrument measures something consistently is called reliability. Whether it measures accurately is called validity. So, before you use a tool, check for its reported reliability and validity.

A good resource for understanding the concepts of reliability (consistency) and validity (accuracy) of research tools is at https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/ Below are quoted Key Takeaways:

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
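Internal consistency, one of the reliability criteria in the takeaways above, is usually reported as Cronbach's alpha: alpha = k/(k−1) × (1 − sum of item variances / variance of total scores), where k is the number of items. A small Python sketch, using invented questionnaire responses:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    `items` is a list of item-score lists, one list per item, with
    the same respondents in the same order in every list.
    """
    k = len(items)               # number of items
    n = len(items[0])            # number of respondents

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Each respondent's total score across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

# Three items answered by five respondents (hypothetical scores)
items = [
    [3, 4, 2, 5, 3],
    [3, 5, 2, 4, 3],
    [2, 4, 1, 5, 3],
]
print(round(cronbach_alpha(items), 2))  # → 0.94
```

Because these three invented items rise and fall together across respondents, alpha comes out high; items that did not "hang together" would drag it down.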

Research Words of the Week: Reliability & Validity

Reliability & validity are terms that refer to the consistency and accuracy of a quantitative measurement tool: a questionnaire, technical device, ruler, or any other measuring device. Together they mean that the outcome measure can be trusted and is relatively error free.

  • Reliability – This means that the instrument measures CONSISTENTLY.
  • Validity – This means that the instrument measures ACCURATELY. In other words, it measures what it is supposed to measure and not something else.

For example: If your bathroom scale measures weight and only weight (e.g., it doesn't measure BP or stress), it is a valid measure of weight; you might say it has high validity. If your bathroom scale gives you essentially the same reading when you step on and off of it several times, it is measuring weight reliably, or consistently; you might say it has high reliability.

“Please answer!” – How to increase the odds in your favor when it comes to questionnaires

Self-report by participants is one of the most common ways that researchers collect data, yet it is fraught with problems. Some worries for researchers are: "Will participants be honest, or will they say what they think I want to hear?" "Will they understand the questions correctly?" "Will those who respond (as opposed to those who don't respond) have unique ways of thinking, so that my respondents do not represent everyone well?" And a BIG worry: "Will they even fill out and return the questionnaire?"

One way to solve at least the latter two problems is to increase the response rate, and Edwards et al. (2009, July 8) reviewed randomized trials to learn how to do just that!

If you want to improve your questionnaire response rates, check it out! Here is Edwards et al.'s plain language summary as published in the Cochrane Database of Systematic Reviews, where you can read the entire report.

Methods to increase response to postal and electronic questionnaires

Postal and electronic questionnaires are a relatively inexpensive way to collect information from people for research purposes. If people do not reply (so-called 'non-responders'), the research results will tend to be less accurate. This systematic review found several ways to increase response. People can be contacted before they are sent a postal questionnaire. Postal questionnaires can be sent by first class post or recorded delivery, and a stamped-return envelope can be provided. Questionnaires, letters and e-mails can be made more personal, and preferably kept short. Incentives can be offered, for example, a small amount of money with a postal questionnaire. One or more reminders can be sent with a copy of the questionnaire to people who do not reply.

 

Critical/reflective thinking: Imagine that you were asked to participate in a survey. Which of these strategies do you think would motivate or remind you to respond, and why?

For more info read the full report: Methods to increase response to postal and electronic questionnaires

 

Consistency wins! High reliability = zero harm

“What’s important is not where an organization begins its patient safety journey, but instead the degree to which it exhibits a relentless commitment to improvement.” – TJC, 2016, p. 68

The path to zero harm, according to TJC, begins with high reliability. Reliability in research = consistency. TJC says that for zero harm we as providers must be consistent in these ways:

  • Never be satisfied with your safety record. Always be alert for danger
  • Be alert for early signs of potential danger. Don’t oversimplify your observations
  • Recognize that small changes in the organization may have longer-range or unintended effects
  • Commit to resilience so that when errors do happen, you bounce back quickly
  • When confronted by a threat, put its resolution in the hands of those with the most expertise in that area

Using evidence in practice can be part of our “relentless commitment to improvement,” especially when coupled with the five actions above, and can support zero harm to patients. That evidence can come from research, from process improvement, from evaluation of clinical innovations, or from experts.

For more, read TJC’s High Reliability: The Path to Zero Harm online at http://www.jointcommission.org/assets/1/18/HC_Exec_article.pdf

Making research accessible to RNs