Tag Archives: Methodologic research

Face Validity: Judging a book by its cover

“Don’t judge a book by its cover.” That’s good advice about not evaluating persons merely by the way they look to you. I suggest we all take it.

But…when it comes to evaluating data collection tools, things are different. When we ask the question, “Does this questionnaire, interview, or measurement instrument look like it measures what it is supposed to measure?”, we are legitimately judging a book (instrument) by its cover (appearance). We call that judgment face validity. In other words, the tool appears to us on its face to measure what it is designed to measure.

For example, items on the well-established Beck Depression Inventory (BDI) cover a range of symptoms, such as sadness, pessimism, feelings of failure, loss of pleasure, guilt, crying, and so on. If you read all BDI items, you could reasonably conclude just by looking at them that those items do indeed measure depression. That judgment is made without the benefit of statistics, and thus you are judging that book (the BDI) by its cover (how it appears to you). That is face validity.

Face validity is only one of four types of data collection tool validity.

In research, tool validity is defined as how well a research tool measures what it is designed to measure. The four broad types of validity are: a) face, b) content, c) construct, and d) criterion-related validity. And make no mistake, face validity is the weakest of the four. Nonetheless, it makes a good starting point. Just don’t stop there; you will need one or more of its three statistical validity cousins–content, construct, and criterion-related–to have a strong data collection tool.
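To make the “statistical cousins” concrete: criterion-related validity is often quantified as the correlation between scores on the new tool and scores on an established criterion (“gold standard”) measure. Here is a minimal sketch with made-up scores; the data and function name are illustrative only, not from any actual validation study.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical data: six respondents scored on a new questionnaire
# and on an established criterion measure of the same construct.
new_tool  = [10, 14, 18, 22, 25, 31]
criterion = [12, 13, 20, 21, 27, 30]

r = pearson_r(new_tool, criterion)
# A strong positive r suggests the new tool tracks the criterion,
# which is evidence of criterion-related validity.
```

In practice, researchers would use many more respondents and report the coefficient (and often a confidence interval) in the tool’s validation literature.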

And…referring back to the BDI example…the BDI probably looks valid because its validity has been verified by the other types of validity.

Thoughts about why we need face validity at all?

On Target all the time and every time!

“Measure twice. Cut once!” goes the old carpenter adage. Why? Because measuring accurately means you’ll get the outcomes you want!

Same in research. Consistent and accurate measurement will get you the outcomes you want to know about. Whether an instrument measures something consistently is called reliability. Whether it measures accurately is called validity. So, before you use a tool, check its reported reliability and validity.

A good resource for understanding the concepts of reliability (consistency) and validity (accuracy) of research tools is at https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/. Below are its quoted Key Takeaways:

  • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
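One of the reliability facets named above, internal consistency, is commonly summarized with Cronbach’s alpha, which compares the variance of individual items with the variance of the total score. Below is a minimal sketch assuming made-up questionnaire data; the numbers are invented purely for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: one list per item, each holding every respondent's
    score on that item (all lists the same length).
    """
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]  # per-respondent totals
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Hypothetical data: three questionnaire items answered by five people.
items = [
    [3, 4, 4, 5, 2],
    [3, 5, 4, 4, 2],
    [2, 4, 5, 5, 3],
]

alpha = cronbach_alpha(items)
# Alpha near 1 means the items rise and fall together across
# respondents, i.e., the scale is internally consistent.
```

Test–retest and interrater reliability are assessed differently (typically with correlation coefficients or agreement statistics across time points or raters), but the logic is the same: consistency first, then validity.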

Pilot Studies: Look before you leap! (a priori vs. post hoc)

Why does it matter if a study is labeled a “pilot”?

SHORT ANSWER: Because a pilot is about testing research methods, not about answering research questions.

If a project has “pilot” in the title, then you as a reader should expect a study that examines whether certain research methods work (methodologic research). Methods include things like timing of data collection, sampling strategies, length of questionnaire, and so on. Pilots suggest which methods will effectively answer researchers’ questions. Advance prep in methods makes for a smooth research landing.

Small sample = Pilot? A PILOT is defined by study goals and design–not sample size. Of course pilots typically have small samples, but a small sample does not a pilot study make. Sometimes journals may tempt researchers to call their study a pilot because of its small sample. Don’t go there. Doing so means making after-the-fact, post hoc changes that were not the original, a priori goals and design.

Practical problems? If researchers label a study a “pilot” after it is completed (post hoc), they raise practical and ethical issues. At a practical level, researchers must create feasibility questions and answers after the fact (see NIH), and they should drop the data analysis that answers their original research questions.

Ethics? Relabeling ethically requires researchers either 1) to claim they planned something that they didn’t or 2) to take additional action. Additional action may mean complete transparency about the change and seeking modification of the original human subjects committee approvals. One example of a human subjects issue: you informed your subjects that their data would answer a particular research question, and now you want to use their data to answer something else–methods questions!

Options? You can simply learn from your small study and go for a bigger one, improving your methods along the way. Some journals will consider publishing innovative studies even when they are small.

Look first, then leap: Better to look a priori, before leaping. If you think you might have trouble with your methods, design a pilot. If you make the unpleasant discovery that your methods didn’t work as you hoped, you can 1) disseminate your results anyway or 2) rethink the ethical and practical issues.

Who’s with me? The National Institutes of Health agree: https://nccih.nih.gov/grants/whatnccihfunds/pilot_studies. NIH notes that common misuses of “pilots” include determining safety, intervention efficacy, and effect size.

Who disagrees? McGrath argues that clinical pilots MAY test safety and efficacy, as well as feasibility. (See McGrath, J. M. (2013). Not all studies with small samples are pilot studies, Journal of Perinatal & Neonatal Nursing, 27(4): 281-283. doi: 10.1097/01.JPN.0000437186.01731.bc )