Category Archives: Reliability & validity

Testing the Test (or an intro to “Does the measurement measure up?”)

When reading a research article, you may be tempted to read only the Introduction & Background, then go straight to the Discussion, Implications, and Conclusions at the end. You skip all those pesky procedures, numbers, and p levels in the Methods & Results sections.

Perhaps you are intimidated by all those “research-y” words like content validity, construct validity, test-retest reliability, and Cronbach’s alpha because they just aren’t part of your vocabulary….YET!

WHY should you care about those terms, you ask? Well…let’s start with an example. If your bathroom scale measured your weight erratically each a.m., you probably would toss it and find a more reliable and valid bathroom scale. The data from that old scale would be useless in learning how much you weighed. Similarly in research, the researcher wants useful outcome data. To get that quality data, the researcher must collect it with a measurement instrument that measures consistently (reliability) and measures what it claims to measure (validity). A good research instrument is reliable and valid. So is a good bathroom scale.

Let’s start super-basic: Researchers collect data to answer their research question using an instrument. That test or tool might be a written questionnaire, interview questions, an EKG machine, an observation checklist, or something else. Whatever instrument the researcher uses, it needs to give them accurate data.

For example, if I want to collect BP data to find out how a new med is working, I need a BP cuff that measures systolic and diastolic BP without a lot of artifacts or interference. That accuracy in measuring BP (and only BP) is called instrument validity. Then, if I take your BP 3 times in a row, I should get basically the same answer; that consistency is called instrument reliability. I must also use the cuff as intended–correct cuff size and placement–in order to get quality data that reflect the subject’s actual BP.
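To make that concrete, here is a minimal sketch (in Python) of checking consistency across repeated readings. The readings and the “acceptable spread” cutoff are invented for illustration, not taken from any clinical standard:

```python
import statistics

# Three hypothetical systolic BP readings (mmHg) taken in a row
# on the same subject with the same cuff.
readings = [118, 121, 119]

mean_bp = statistics.mean(readings)
spread = statistics.stdev(readings)  # sample standard deviation

print(f"Mean systolic BP: {mean_bp:.1f} mmHg")
print(f"Spread across repeats: {spread:.1f} mmHg")

# A small spread relative to the mean suggests the cuff measures
# consistently (reliability). The 5 mmHg cutoff is illustrative only.
if spread < 5:
    print("Readings are consistent -- the cuff looks reliable.")
else:
    print("Readings vary widely -- question the cuff's reliability.")
```

Note that a tight spread speaks only to reliability; a cuff that reads 20 mmHg too high every single time would pass this check while still lacking validity.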

The same thing is true with questionnaires or other measurement tools. A researcher must use an instrument for the intended purpose and in the correct way. For example, a good stress scale should give me accurate data about a person’s stress level (not their pain, depression, or anxiety)–in other words, it should have instrument validity. It should measure stress without a lot of artifacts or interference from other states of mind.

NO instrument is 100% valid–it’s a matter of degree. To the extent that a stress scale measures stress, it is valid. To the extent that it also measures other things besides stress–and it will–it is less valid. So the question to ask is not “Is the instrument valid?” but “How valid is the instrument?” Validity is often reported on a 0 to 1 scale, with 1 being unachievable perfection. The same issue and question apply to reliability.
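Cronbach’s alpha, one of those “research-y” words from earlier, is exactly this kind of 0-to-1 coefficient: it estimates how consistently a questionnaire’s items measure the same underlying thing (internal consistency). Here is a minimal sketch of the standard formula, using made-up responses to a hypothetical 3-item stress scale:

```python
# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
# The 4 respondents x 3 items below are invented for illustration.
scores = [
    [4, 5, 4],  # respondent 1's answers to items 1-3
    [2, 2, 3],  # respondent 2
    [5, 4, 5],  # respondent 3
    [3, 3, 2],  # respondent 4
]

def variance(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(scores[0])                                  # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])  # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # ~0.89 here; closer to 1 = more consistent
```

A commonly cited rule of thumb is that an alpha of roughly 0.70 or higher is acceptable for an established scale, though what counts as “good enough” depends on the stakes of the measurement.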

Reliability & validity are interdependent. An instrument that yields inconsistent results under the same circumstances cannot be valid (accurate). Conversely, an instrument can consistently (reliably) measure the wrong thing–that is, something other than what the researcher intended to measure. Research instruments need both strong reliability AND validity to be most useful; they need to measure the outcome variable of interest accurately and consistently.

Valid for a specific purpose: Researchers must also use measurement instruments as intended. First, instruments are often validated for use with a particular population; they may not be valid for measuring the same variable in other populations. For example, different cultures, genders, professions, and ages may respond differently to the same question. Second, instruments may be valid in predicting certain outcomes (e.g., SAT & ACT have higher validity in predicting NCLEX success than does GPA). As Sullivan (2011) wrote: “Determining validity can be viewed as constructing an evidence-based argument regarding how well a tool measures what it is supposed to do. Evidence can be assembled to support, or not support, a specific use of the assessment tool.”

In summary….

  1. Instrument validity = how accurate the tool is in measuring a particular variable
  2. Instrument reliability = how consistently the tool measures whatever it measures

Fun Practice: In your own words, how does the following article excerpt relate to the concept of validity? “To assess content validity [of the Moral Distress Scale], 10 nurses were asked to provide comments on grammar, use of appropriate words, proper placement of phrases, and appropriate scoring” (p. 3). From Ghafouri et al. (2021). Psychometrics of the moral distress scale in Iranian mental health nurses. BMC Nursing. https://doi.org/10.1186/s12912-021-00674-4

On Target all the time and every time!

“Measure twice. Cut once!” goes the old carpenter’s adage. Why? Because measuring accurately means you’ll get the outcomes you want!

Same in research. Consistent and accurate measurement will get you the outcome data you want. Whether an instrument measures something consistently is called reliability. Whether it measures accurately is called validity. So, before you use a tool, check its reported reliability and validity.

A good resource for understanding the concepts of reliability (consistency) and validity (accuracy) of research tools is at https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/. Below are quoted Key Takeaways, followed by a short worked example:

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
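To see what “consistency across time” (test-retest reliability) looks like in numbers, here is a minimal sketch: the same participants complete the same scale twice, and the correlation between the two sets of scores serves as the reliability estimate. The scores are invented for illustration:

```python
import statistics  # statistics.correlation requires Python 3.10+

# Hypothetical total scores for 5 participants on the same scale,
# administered two weeks apart.
time1 = [10, 14, 8, 12, 16]
time2 = [11, 13, 9, 12, 15]

# Pearson's r between the two administrations estimates test-retest
# reliability (near 0 = no consistency, near 1 = highly consistent).
r = statistics.correlation(time1, time2)
print(f"Test-retest reliability (Pearson's r) = {r:.2f}")  # ~0.99 here
```

Interrater reliability can be estimated the same way, by correlating scores from two raters instead of two time points (more refined statistics such as Cohen’s kappa exist for categorical ratings).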

Research Words of the Week: Reliability & Validity

Reliability & validity are terms that refer to the consistency and accuracy of a quantitative measurement instrument–a questionnaire, a technical device, a ruler, or any other measuring device. Together they mean that the outcome measure can be trusted and is relatively error free.

  • Reliability – This means that the instrument measures CONSISTENTLY.
  • Validity – This means that the instrument measures ACCURATELY. In other words, it measures what it is supposed to measure and not something else.

For example: If your bathroom scale measures weight, then it is a valid measure of weight (e.g., it doesn’t measure BP or stress). You might say it has high validity. If your bathroom scale shows the same weight each time you step on and off of it several times, then it is measuring weight reliably, or consistently, and you might say it has high reliability.

“Please answer!” – How to increase the odds in your favor when it comes to questionnaires

Self-report by participants is one of the most common ways that researchers collect data, yet it is fraught with problems. Some worries for researchers are: “Will participants be honest or will they say what they think I want to hear?” “Will they understand the questions correctly?” “Will those who respond (as opposed to those who don’t respond) have unique ways of thinking so that my respondents do not represent everyone well?” and a BIG worry “Will they even fill out and return the questionnaire?”

One way to solve at least the latter 2 problems is to increase the response rate, and Edwards et al. (2009) reviewed randomized trials to learn how to do just that!

If you want to improve your questionnaire response rates, check it out!  Here is Edwards et al.’s plain language summary as published in Cochrane Database of Systematic Reviews, where you can read the entire report.

Methods to increase response to postal and electronic questionnaires

Postal and electronic questionnaires are a relatively inexpensive way to collect information from people for research purposes. If people do not reply (so called ‘non-responders’), the research results will tend to be less accurate. This systematic review found several ways to increase response. People can be contacted before they are sent a postal questionnaire. Postal questionnaires can be sent by first class post or recorded delivery, and a stamped-return envelope can be provided. Questionnaires, letters and e-mails can be made more personal, and preferably kept short. Incentives can be offered, for example, a small amount of money with a postal questionnaire. One or more reminders can be sent with a copy of the questionnaire to people who do not reply.


Critical/reflective thinking: Imagine that you were asked to participate in a survey. Which of these strategies do you think would motivate or remind you to respond, and why?

For more info read the full report: Methods to increase response to postal and electronic questionnaires


Consistency wins! High reliability = Zero harm

“What’s important is not where an organization begins its patient safety journey, but instead the degree to which it exhibits a relentless commitment to improvement.” – TJC, 2016, p. 68

The path to zero harm, according to TJC, begins with high reliability. Reliability in research = consistency. TJC says that to reach zero harm, we as providers must be consistent in these ways:

  • Never be satisfied with your safety record. Always be alert for danger.
  • Be alert for early signs of potential danger. Don’t oversimplify your observations.
  • Notice small changes in the organization; they may have longer-range or unintended effects.
  • Commit to resilience so that when errors do happen, you bounce back quickly.
  • When confronted by a threat, put its resolution in the hands of those with the most expertise in that area.

Using evidence in practice can be part of our “relentless commitment to improvement,” especially when coupled with the 5 actions above, and it can support zero harm to patients. That evidence can come from research, from process improvement, from evaluation of clinical innovations, or from experts.

For more, read TJC’s High Reliability: The Path to Zero Harm online at http://www.jointcommission.org/assets/1/18/HC_Exec_article.pdf

