
Construct Validity: Taking it to the next level

Collecting data is tricky. Data collection tools, like questionnaires, measure research study outcomes more or less well. A tool’s level of validity is how comprehensively & accurately the tool measures what it is supposed to measure (like stress, hope, etc.); and reliability is how consistently it measures what it is supposed to measure. We’ve all had the experience of a weight-measuring bathroom scale breaking bad and changing our weight each time we step on it. That scale has validity, but no reliability. (See earlier post “On Target all the time and every time.”)

Tools are more or less reliable & valid; none are perfect.

Validity is like hitting the right outcome target, and there are four (4) types of validity: 1) Face validity, 2) Content validity, 3) Construct validity, & 4) Criterion-related validity. Earlier posts focused on face & content validity as linked above. This blog focuses on #3: construct validity.

Construct validity is the level of tool accuracy & can be established by these statistical approaches: a) convergent validity, b) discriminant/divergent validity, c) known groups, and d) factor analysis. For each of these, subjects (Ss) complete the measurement tool, & results are analyzed.

To illustrate, let’s assume we have a new pain data collection tool. In convergent validity, the same group of Ss complete the new pain tool and an already established pain tool (like self-report on a 1-10 scale). Convergent construct validity exists when there is a positive correlation between the results from both tools. Scores on both tools should be similar for convergent validity.

For discriminant (or divergent) validity, a single group of Ss complete the new pain tool and an established tool that measures the “opposite” or a dissimilar concept, such as feeling comfortable. Divergent validity of the new tool is revealed when there is no or low correlation between results from these 2 dissimilar tools. That’s a good thing! We should expect a big difference because the tools are measuring very different things. Pain & feeling comfortable should be very different in the same person at the same time for divergent validity.
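If you like to see the numbers, here is a minimal Python sketch of the convergent & divergent checks. The scores and tool names (new_pain, established_pain, comfort) are invented for illustration only, not data from any real instrument:

```python
# Hypothetical illustration of convergent & divergent construct validity.
# All scores below are invented; "new_pain", "established_pain", and
# "comfort" are placeholder names, not real instruments or datasets.
from scipy.stats import pearsonr

# The same 8 subjects (Ss) complete all three tools.
new_pain         = [2, 5, 7, 3, 8, 6, 4, 9]   # new pain tool
established_pain = [3, 5, 8, 2, 9, 6, 5, 8]   # established 1-10 self-report
comfort          = [5, 7, 4, 6, 5, 8, 3, 6]   # established comfort tool

r_convergent, _ = pearsonr(new_pain, established_pain)
r_divergent, _  = pearsonr(new_pain, comfort)

print(f"Convergent r = {r_convergent:.2f}")   # expect a strong positive correlation
print(f"Divergent  r = {r_divergent:.2f}")    # expect no or low correlation
```

With these made-up numbers, the new tool correlates strongly with the established pain tool (convergent validity) and only weakly with the comfort tool (divergent validity).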

Known-groups validity means that we compare scores from subjects who exhibit & from those who do NOT exhibit what our tool is supposed to measure. For example, a group in pain and a group who are known to be in complete Zen and NOT in pain may fill out the new pain tool. Scores of those in pain should obviously be very different from scores of the “Zenned-out” group who are NOT in pain. If the two groups’ average scores are compared (using ANOVA or a t-test), those means should be very different. These differences between groups = known-groups construct validity.
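A minimal sketch of that known-groups comparison, again with invented scores, might use an independent-samples t-test:

```python
# Hypothetical known-groups check: pain group vs. "Zenned-out" no-pain group.
# Scores are invented for illustration only.
from scipy.stats import ttest_ind

pain_group    = [7, 8, 6, 9, 7, 8]   # new pain tool scores from Ss known to be in pain
no_pain_group = [1, 2, 2, 1, 3, 2]   # scores from Ss known NOT to be in pain

t_stat, p_value = ttest_ind(pain_group, no_pain_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A large t and a small p (e.g., p < .05) mean the group averages differ,
# supporting known-groups construct validity.
```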


Finally, a single group of subjects (Ss) may complete the instrument, and statistical factor analysis testing is done. Factor analysis arranges items into groups of similar items. The researcher examines each group of items (a factor) and labels it. In our fictitious pain tool example, factor analysis may group items into three (3) main factors that the researcher labels as “physical aspects of pain,” “psychological aspects of pain,” and “disruption of relationships.”
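Here is a structure-only sketch of how such an exploratory factor analysis might be run in Python. The 100 x 9 response matrix below is random placeholder data; with a real pain tool, it would hold subjects’ actual item scores, and the researcher would inspect and label the resulting factors:

```python
# Structure-only sketch of exploratory factor analysis for a hypothetical
# 9-item pain tool; the responses below are random placeholder data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(100, 9))   # 100 Ss x 9 items, rated 1-5

fa = FactorAnalysis(n_components=3)             # ask for 3 factors
fa.fit(responses)

# Each row of loadings is one factor; items that load highly together form a
# group the researcher then labels (e.g., "physical aspects of pain").
loadings = fa.components_
for i, factor in enumerate(loadings, start=1):
    top_items = np.argsort(np.abs(factor))[::-1][:3] + 1
    print(f"Factor {i}: items loading most strongly -> {top_items.tolist()}")
```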

FOR MORE INFO: Check out Highfield, M.E.F. (2025). Select Data Collection Tool. In: Doing Research. Springer, Cham. https://doi.org/10.1007/978-3-031-79044-7_8

CRITICAL THINKING EXERCISE: Read this Google AI overview to test out your renewed/new knowledge of construct validity. See any now-familiar ideas?

Pain scale construct validity is established when instruments (e.g., VAS, NRS, FPS-R) accurately measure the theoretical, multi-dimensional concept of pain—intensity, affect, and interference—rather than just a physical sensation. Evidence shows strong convergence between these tools (r=0.82–0.95), confirming they measure similar constructs. 

Convergent Validity: High correlations exist between different, established pain scales (e.g., Numerical Rating Scale (NRS) and Visual Analogue Scale (VAS)), indicating they measure the same construct.

Discriminant Validity: Pain scales show lower, non-significant correlations with unrelated variables (e.g., age, irrelevant behavioral factors), proving they specifically measure pain, not general distress.

Dimensionality: Construct validity in tools like the Brief Pain Inventory (BPI) is confirmed through factor analysis, which differentiates between pain intensity and pain interference.

Content Validity: Expert Judgment Required

For accurate study data, you need a tool that correctly & comprehensively measures the outcome of interest (concept). If a tool measures your outcome of interest accurately it has strong validity. If it measures that outcome consistently, it has high reliability.

For now, let’s focus on validity.

Again, validity is how well a research tool measures what it is intended to measure. 

The four (4) types of validity are 1) face, 2) content, 3) construct, & 4) criterion-related. Click here to read my blog on face validity–the weakest type. Now, let’s step it up a notch to content validity.

Content validity is the comprehensiveness of a data collection tool. In other words, does the instrument include items that measure all aspects of the thing (concept) you are studying–whether that thing be professional quality of life, drug toxicity, spiritual health, pain, or something else?

When you find a tool that you want to use, look for documented content validity. Content validity means that the tool creators:

1) adopted a specific definition of the concept they want to measure,
2) generated a list of all possible items from a review of literature and/or other sources,
3) gave both their definition and item list to 3-5+ experts on the topic, &
4) asked those experts independently to rate how well each item represents the adopted concept definition (or not). Often experts are asked to evaluate item clarity as well.

When a majority of the expert panel agrees that an item matches the definition, then that item becomes part of the new tool. Items without agreement are tossed. Experts may also edit items or add items to the list, and the tool creator may choose to submit edited and new items to the whole expert panel for evaluation.

Optionally, tool creators may statistically calculate a content validity index (CVI) for items and/or for the tool as a whole, but content validity is still based on experts’ judgment. Some tool authors are just more comfortable with having a number to represent that judgment. An acceptable CVI is ≥ 0.78; the “≥” means “greater than or equal to.” (Click here for more on item & scale CVIs.)
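For the curious, here is a tiny Python sketch of how an item-level CVI (I-CVI) and an averaged scale-level CVI (S-CVI/Ave) can be calculated. The expert ratings are invented, and the common convention of counting relevance ratings of 3 or 4 (on a 1-4 scale) as agreement is assumed:

```python
# Hypothetical CVI calculation. Each of 5 experts rates each item's relevance
# on a 1-4 scale; ratings of 3 or 4 count as "relevant" (a common convention).
ratings = {                       # item -> invented ratings from 5 experts
    "item 1": [4, 4, 3, 4, 3],
    "item 2": [4, 3, 4, 4, 4],
    "item 3": [2, 3, 1, 2, 3],
}

i_cvis = {}
for item, scores in ratings.items():
    relevant = sum(1 for s in scores if s >= 3)
    i_cvis[item] = relevant / len(scores)        # item-level CVI (I-CVI)
    print(f"{item}: I-CVI = {i_cvis[item]:.2f}")

s_cvi_ave = sum(i_cvis.values()) / len(i_cvis)   # scale-level CVI, averaging method
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
# Items with an I-CVI below the chosen cutoff (e.g., 0.78) would be revised or dropped.
```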

When reading a research article, you might see content validity reported for the tool. Here’s an example: “Content…validity of the nurse and patient [Spiritual Health] Inventories…[was] based on literature review [and] expert panel input….Using a religious-existential needs framework, 59 items for the nurse SHI were identified from the literature with the assistance of a panel of theology and psychology experts…. Parallel patient items were developed, and a series of testing and revisions was completed resulting in two 31-item tools” (p. 4, Highfield, 1992).

For more, check out this quick explanation of content validity: 3-minute YouTube video. If you are trying to establish content validity for your own new tool, consult a mentor and a research text like Polit & Beck’s Nursing research: Generating and assessing evidence for nursing practice.

Critical thinking: What is the difference between face and content validity? How are they alike? (Hint: check out the video.) What other questions do you have?