Category Archives: research methods

Testing the Test (or an intro to “Does the measurement measure up?”)

When reading a research article, you may be tempted to read only the Introduction & Background, then go straight to the Discussion, Implications, and Conclusions at the end. You skip all those pesky procedures, numbers, and p levels in the Methods & Results sections.

Perhaps you are intimidated by all those “research-y” words like content validity, construct validity, test-retest reliability, and Cronbach’s alpha because they just aren’t part of your vocabulary….YET!

WHY should you care about those terms, you ask? Well…let’s start with an example. If your bathroom scale gave you an erratic weight each a.m., you would probably toss it and find a more reliable and valid bathroom scale. The data from that old bathroom scale would be useless for learning how much you weigh. Similarly, in research, the researcher wants useful outcome data. And to get quality data, the researcher must collect it with a measurement instrument that consistently (reliably) measures what it claims to measure (validity). A good research instrument is reliable and valid. So is a good bathroom scale.

Let’s start super-basic: Researchers collect data to answer their research question using an instrument. That test or tool might be a written questionnaire, interview questions, an EKG machine, an observation checklist, or something else. And whatever instrument the researcher uses needs to give them accurate data.

For example, if I want to collect BP data to find out how a new med is working, I need a BP cuff that measures systolic and diastolic BP without a lot of artifacts or interference. That accuracy in measuring BP, and only BP, is called instrument validity. Then if I take your BP 3 times in a row, I should get basically the same answer; that consistency is called instrument reliability. I must also use the cuff as intended (correct cuff size and placement) to get quality data that reflect the subject’s actual BP.

The same thing is true with questionnaires or other measurement tools. A researcher must use an instrument for the intended purpose and in the correct way. For example, a good stress scale should give me accurate data about a person’s stress level (not their pain, depression, or anxiety)–in other words it should have instrument validity. It should measure stress without a lot of artifacts or interference from other states of mind.

NO instrument is 100% valid; it’s a matter of degree. To the extent that a stress scale measures stress, it is valid. To the extent that it also measures other things besides stress (and it will), it is less valid. The question you should ask is, “How valid is the instrument?” The answer is often reported as a coefficient on a 0 to 1 scale, with 1 being unachievable perfection. The same issue and question apply to reliability.
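Validity coefficients are often correlations. Here is a minimal Python sketch, assuming one common kind of validity evidence (criterion-related validity): scores from a new stress scale are correlated with scores from an established stress measure for the same people. The scores and the names `new_stress_scale` and `criterion_measure` are hypothetical, and the correlation is computed by hand so it runs on any Python 3 version.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same 6 people on two measures
new_stress_scale = [12, 18, 25, 30, 22, 15]   # the instrument being validated
criterion_measure = [10, 20, 24, 33, 21, 14]  # an established stress measure

r = pearson_r(new_stress_scale, criterion_measure)
print(f"validity coefficient r = {r:.2f}")
```

A value near 1 would suggest the new scale tracks the established measure closely; with real instruments and real people, coefficients are lower than this made-up example.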

Reliability & validity are interdependent. An instrument that yields inconsistent results under the same circumstances cannot be valid (accurate). On the other hand, an instrument can consistently (reliably) measure the wrong thing; that is, it can be reliable without being valid because it measures something other than what the researcher intended to measure. Research instruments need both strong reliability AND validity to be most useful: they need to measure the outcome variable of interest accurately and consistently.
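To make the difference concrete, here is a small sketch using the bathroom scale example. The readings and the 150 lb "true" weight are made up, and using bias (average error) and standard deviation (spread) as stand-ins for validity and reliability is a simplifying assumption for illustration.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 150.0  # hypothetical known weight in lb (e.g., from a calibrated reference)

# Hypothetical readings from two bathroom scales, same person, same morning
consistent_but_off = [155.1, 154.9, 155.0, 155.2, 154.8]  # reliable, but not valid
erratic = [142.0, 158.5, 149.0, 161.0, 144.5]              # not reliable, so not valid

def describe(readings, true_value):
    """Return (bias, spread).
    bias = average error vs. the true value (a validity problem if large)
    spread = standard deviation of repeated readings (a reliability problem if large)"""
    bias = mean(readings) - true_value
    spread = stdev(readings)
    return bias, spread

for name, readings in [("consistent_but_off", consistent_but_off), ("erratic", erratic)]:
    bias, spread = describe(readings, TRUE_WEIGHT)
    print(f"{name}: bias = {bias:+.1f} lb, spread (SD) = {spread:.1f} lb")
```

The first scale is very consistent (spread well under 1 lb) but always about 5 lb heavy: reliable, not valid. The second scale's readings swing by many pounds, so no single morning's reading can be trusted, even though its average happens to land near the truth.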

Valid for a specific purpose: Researchers must also use measurement instruments as intended. First, instruments are often validated for use with a particular population; they may not be valid for measuring the same variable in other populations. For example, different cultures, genders, professions, and ages may respond differently to the same question. Second, instruments may be valid in predicting certain outcomes (e.g., SAT & ACT have higher validity in predicting NCLEX success than does GPA). As Sullivan (2011) wrote: “Determining validity can be viewed as constructing an evidence-based argument regarding how well a tool measures what it is supposed to do. Evidence can be assembled to support, or not support, a specific use of the assessment tool.”

In summary….

  1. Instrument validity = how accurate the tool is in measuring a particular variable
  2. Instrument reliability = how consistently the tool measures whatever it measures
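The Cronbach’s alpha mentioned at the start of this post is one widely used reliability coefficient for internal consistency (do the items of a scale hang together?). Below is a minimal sketch of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), using made-up answers from 5 people to a hypothetical 4-item stress scale.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list per respondent, each holding that person's item scores
    (same items, same order for everyone)."""
    k = len(responses[0])                      # number of items
    if k < 2:
        raise ValueError("need at least 2 items")
    items = list(zip(*responses))              # regroup: one tuple of scores per item
    item_variances = sum(variance(item) for item in items)
    total_scores = [sum(person) for person in responses]
    return (k / (k - 1)) * (1 - item_variances / variance(total_scores))

# Hypothetical answers from 5 people (1 = never ... 5 = always)
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```

The sketch uses sample variances throughout; using population variances instead gives the same ratio, so alpha does not change.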

Fun Practice: In your own words, how does the following article excerpt relate to the concept of validity? “To assess content validity [of the Moral Distress Scale], 10 nurses were asked to provide comments on grammar, use of appropriate words, proper placement of phrases, and appropriate scoring.” From p. 3, Ghafouri et al. (2021). Psychometrics of the moral distress scale in Iranian mental health nurses. BMC Nursing. https://doi.org/10.1186/s12912-021-00674-4

Is It 2? Or 3?

Credible sources often disagree on technicalities, sometimes including how to classify research designs. Some argue that there are only 2 categories of research design:

  1. True experiments. True experiments have 3 elements: 1) randomization to groups, 2) a control group, and 3) an intervention; and
  2. Non-experiments. Non-experiments may have only 1 or none of those 3 elements.
Within-subject Control Group

Fundamentally, I agree with the above. But what about designs that include an intervention and a control group, but Not randomization?

Those may be called quasi-experiments; the most often performed quasi-experiment is pre/post testing of a single group. The control group is the subjects at baseline, and the experimental group is the same subjects after they receive a treatment or intervention. That means the control group is a within-subjects control group (as opposed to a between-groups control). Quasi-experiments can be used to test cause-and-effect hypotheses when an experiment may not be feasible or ethical.
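The arithmetic behind a single-group pre/post comparison can be sketched in a few lines of Python. This is a minimal illustration with made-up pain scores, assuming a paired t statistic as the analysis (other analyses are possible): because each subject serves as their own control, we analyze each person's change score rather than comparing two separate groups.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pain scores (0-10) for the same 6 patients
baseline = [7, 6, 8, 5, 7, 6]  # before the intervention: each patient is their own control
post = [5, 5, 6, 4, 6, 4]      # after the intervention

# Within-subject change: each patient is compared only with themselves
changes = [after - before for before, after in zip(baseline, post)]

mean_change = mean(changes)
# Paired t statistic: mean change divided by its standard error
t_stat = mean_change / (stdev(changes) / sqrt(len(changes)))

print(f"changes = {changes}")
print(f"mean change = {mean_change:.2f}, paired t = {t_stat:.2f} (df = {len(changes) - 1})")
```

A negative mean change means scores went down after the intervention. Getting a p-value requires the t distribution (SciPy, for example, offers `scipy.stats.ttest_rel` for paired data). And a change score alone cannot rule out threats to validity such as maturation of subjects.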

One might even argue that a strength of pre/post quasi-experiments is that we do Not have to Assume that the control and experimental groups are equivalent, an assumption we make about subjects randomized (randomly assigned) to a control or experimental group. Instead, the control and experimental groups are exactly equivalent because they are the same persons (barring maturation of subjects and similar threats to validity that are also true of experiments).

I think using the term quasi-experiments makes it clear that persons in the study receive an intervention. Adding “pre/post” means that the researcher is using a single group as their own controls:

Baseline -> Intervention -> Post

I prefer to use the term non-experimental to mean a) descriptive studies (ones that just describe the situation) and b) correlational studies (ones without an intervention that look at whether one factor is related to another).
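The two classification schemes can be written as simple decision rules. This is only a sketch: the hypothetical `Design` fields mirror the 3 elements of a true experiment, and treating any design that lacks one of them as a non-experiment under the 2-category view is an assumption about how that view would handle the in-between cases.

```python
from dataclasses import dataclass

@dataclass
class Design:
    randomization: bool
    control_group: bool  # a between-groups OR within-subject (pre/post) control
    intervention: bool

def classify_two(d: Design) -> str:
    """2-category view: only designs with all 3 elements are experiments."""
    if d.randomization and d.control_group and d.intervention:
        return "true experiment"
    return "non-experiment"

def classify_three(d: Design) -> str:
    """3-category view: an intervention plus a control, without randomization,
    is a quasi-experiment."""
    if d.randomization and d.control_group and d.intervention:
        return "true experiment"
    if d.intervention and d.control_group:
        return "quasi-experiment"
    return "non-experiment"  # e.g., descriptive or correlational studies

pre_post = Design(randomization=False, control_group=True, intervention=True)
print(classify_two(pre_post), "|", classify_three(pre_post))
```

Running both rules on a single-group pre/post design shows exactly where the two schemes part ways.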

What do you think? 2? or 3?

A practical place to start

Enrolled in an MSN….and wondering what to do for an evidence-based clinical project?

Recently a former student contacted me about that very question. Part of my response to her is below:

“One good place to start, if you are flexible on your topic, is to look through Cochrane Reviews, Joanna Briggs Institute, AHRQ Clinical Practice Guidelines, or similar sources for very strong evidence on a particular topic and then work to move that evidence into practice in some way. (For example, right now I’m involved in a project using evidence from a Cochrane review on the benefits of music listening, not music therapy, in improving patient outcomes like pain, mood, & opioid use.)

Once you narrow the topic, it will get easier. Also, you can apply only the best evidence you have, so if there isn’t much research or other evidence about the topic, you might have to tackle the problem from a different angle,” or pick an area where there IS enough evidence to apply.

Blessings! -Dr.H

Pilot Studies: Look before you leap! (a priori vs. post hoc)

Why does it matter if a study is labeled a “pilot”?

SHORT ANSWER: …Because a pilot is about testing research methods…not about answering research questions.

If a project has “pilot” in the title, then you as a reader should expect a study that examines whether certain research methods work (methodologic research). Methods include things like timing of data collection, sampling strategies, length of questionnaire, and so on. Pilots suggest which methods will effectively answer researchers’ questions. Advance prep in methods makes for a smooth research landing.

Small sample = Pilot? A PILOT is defined by study goals and design, not sample size. Of course pilots typically have small samples, but a small sample does not a pilot study make. Sometimes journals may tempt researchers to call their study a pilot because of a small sample. Don’t go there. Doing so means making after-the-fact (post hoc) changes to the original, a priori goals and design.

Practical problems? If researchers label a study a “pilot” after it is completed (post hoc), they raise practical & ethical issues. At a practical level, the researchers must then create feasibility questions & answers after the fact (see NIH), and they should drop any data analysis that answers their original research questions.

Ethics? Ethically, relabeling requires researchers either 1) to say they planned something that they didn’t or 2) to take additional action. Additional action may be complete transparency about the change and seeking modification of the original human subjects committee approvals. One example of a human subjects issue: you informed your subjects that their data would answer a particular research question, and now you want to use their data to answer something else (methods questions!).

Options? You can just learn from your small study and go for a bigger one, including improving methods. Some journals will consider publication of innovative studies even when small.

Look first, then leap: Better to look a priori, before leaping. If you think you might have trouble with your methods, design a pilot. If you made the unpleasant discovery that your methods didn’t work as you hoped, you can 1) disseminate your results anyway or 2) rethink ethical and practical issues.

Who’s with me? The National Institutes of Health agree: https://nccih.nih.gov/grants/whatnccihfunds/pilot_studies . NIH notes that common misuses of “pilots” include trying to determine safety, intervention efficacy, and effect size.

Who disagrees? McGrath argues that clinical pilots MAY test safety and efficacy, as well as feasibility. (See McGrath, J. M. (2013). Not all studies with small samples are pilot studies. Journal of Perinatal & Neonatal Nursing, 27(4), 281-283. doi: 10.1097/01.JPN.0000437186.01731.bc)

Trial Balloons & Pilot Studies

A pilot study is to research what a trial balloon is to politics

In politics, a trial balloon is a law or policy idea communicated via the media to see how the intended audience reacts to it. A trial balloon does not answer the question, “Would this policy (or law) work?” Instead, a trial balloon answers questions like “Which people hate the idea of the policy/law, even if it would work?” or “What problems might enacting it create?” In other words, a trial balloon answers questions that a politician wants answered BEFORE implementing a policy, so that the policy or law can be tweaked and successfully put in place.


In research, a pilot study is sort of like a trial balloon. It is “a small-scale test of the methods and procedures” of a planned full-scale study (Porta, Dictionary of Epidemiology, 5th edition, 2008). A pilot study answers questions that we want to know BEFORE doing a larger study, so that we can tweak the study plan and have a successful full-scale research project. A pilot study does NOT answer research questions or hypotheses, such as “Does this intervention work?”  Instead a pilot study answers the question “Are these research procedures workable?”

A pilot study asks & answers questions such as: “Can I recruit my target population? Can the treatments be delivered per protocol? Are study conditions acceptable to participants?” and so on. A pilot study should have specific, measurable benchmarks for feasibility testing. For example, if the pilot is finding out whether subjects will adhere to the study, then adherence might be defined as “70 percent of participants in each [group] will attend at least 8 of 12 scheduled group sessions.” Sample size is based on practical criteria such as budget, participant flow, and the number needed to answer feasibility questions (i.e., questions about whether the study is workable).
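The adherence benchmark quoted above can be checked with simple counts. Here is a minimal sketch with hypothetical attendance data for two groups of 10 (the group names and numbers are made up). It only computes descriptive proportions and compares them to the 70% benchmark, with no hypothesis test, in keeping with what a pilot is for.

```python
# Benchmark from the quoted example: 70% of participants in each group
# attend at least 8 of 12 scheduled group sessions
MIN_SESSIONS = 8
TARGET_PROPORTION = 0.70

# Hypothetical pilot data: sessions attended (out of 12) by each participant
attendance = {
    "intervention": [12, 9, 8, 11, 6, 10, 8, 7, 12, 9],
    "control": [8, 5, 10, 12, 7, 9, 4, 8, 11, 6],
}

def adherence_rate(sessions_attended, min_sessions=MIN_SESSIONS):
    """Proportion of participants who attended at least min_sessions."""
    adherent = sum(1 for s in sessions_attended if s >= min_sessions)
    return adherent / len(sessions_attended)

for group, sessions in attendance.items():
    rate = adherence_rate(sessions)
    verdict = "met" if rate >= TARGET_PROPORTION else "NOT met"
    print(f"{group}: {rate:.0%} adherent -> benchmark {verdict}")
```

A "NOT met" result is still useful pilot information: it tells the researcher to rethink scheduling, reminders, or session count before the full-scale study.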

A pilot study does NOT:

  • Test hypotheses (even preliminarily)
  • Use inferential statistics
  • Assess safety of a treatment
  • Estimate effect size
  • Demonstrate safety of an intervention

A pilot study is not just a small study.

Next blog: Why this matters!!

For more info read the source of all quotes in this blog: Pilot Studies: Common Uses and Misuses @ https://nccih.nih.gov/grants/whatnccihfunds/pilot_studies

Of Mice and Cheese: Research with Non-equivalent Groups

Reposting. Enjoy the review. -Dr.H

Discovering Your Inner Scientist

Last week’s blog focused on the strongest types of evidence that you might find when trying to solve a clinical problem. These are: #1 Systematic reviews, Meta-analyses, or Evidence-based clinical practice guidelines based on systematic review of RCTs; & #2 Randomized controlled trials. (For levels of evidence from strongest to weakest, see blog “I like my coffee (and my evidence) strong!”)

So after the two strongest levels of evidence, what is the next strongest? Level #3 is controlled trials without randomization (sometimes called quasi-experimental studies).

Here’s an example of a controlled trial without randomization: I take two groups of mice and test two types of cheese to find out which one mice like best. I do NOT randomly assign the mice to groups. The experimental group #1 loved Swiss cheese, & the control group #2 refused to eat the cheddar. I assume confidently that mice LOVE Swiss cheese…


“Please answer….” (cont.)

What do people HATE about online surveys? If you want to improve your response rates, check out SurveyMonkey Eric V’s (May 2017) post, “Eliminate survey fatigue: Fix 3 things your respondents hate.”

For more info: Check out my earlier post “Please Answer!”

Filtered vs. Unfiltered: What do these terms mean?

Are we talking cigarettes? Water? Coffee? Other? Yes, other. In this case, we’re talking about what is sometimes called “filtered” or “unfiltered” literature in the evidence-based medicine pyramid of research evidence. (I have more than one issue with this particular pyramid as a representation of all evidence, but for right now let’s look at filtered information & unfiltered information.) Pyramid source: Wikimedia Commons

Filtered evidence is considered stronger, meaning that we can be more confident that literature from this category better supports cause and effect. I agree.

Unfiltered evidence (usually single studies, etc.) is considered weaker, meaning that we must be more cautious about its accuracy in representing reality. I agree.

But, “Is unfiltered information really unfiltered?”  No filtering at all? My qualified answer is, “No.”   Argue with me if you like.

My opinion: If the “unfiltered” article is a primary-source research study that has a strong design and is published in a peer-reviewed journal, then it has already been filtered by multiple expert peer reviewers just to make it to publication.

Thus, when discussing filtered vs. unfiltered one should be very clear on what those terms mean and do not mean.

Critical Thinking: When filtered literature (systematic reviews & critically appraised topics & articles) has been filtered by one individual, is that superior to unfiltered literature in terms of introducing bias? What if the “filtered” evidence is 7 years old and a primary, “unfiltered” source(s) from this year has different findings? What is the relationship between “filtered” and “unfiltered”? After all, “unfiltered” evidence forms the base of the pyramid, so what does that mean?

For more info: For peer review, the lower-level filtering of single studies, consider 1) its advantages (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4975196/) and 2) its potential flaws (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1420798/)

“Please answer!” – How to increase the odds in your favor when it comes to questionnaires

Self-report by participants is one of the most common ways that researchers collect data, yet it is fraught with problems. Some worries for researchers are: “Will participants be honest or will they say what they think I want to hear?” “Will they understand the questions correctly?” “Will those who respond (as opposed to those who don’t respond) have unique ways of thinking so that my respondents do not represent everyone well?” and a BIG worry, “Will they even fill out and return the questionnaire?”

One way to solve at least the latter 2 problems is to increase the response rate, and Edwards et al. (2009, July 8) reviewed randomized trials to learn how to do just that!!

If you want to improve your questionnaire response rates, check it out!  Here is Edwards et al.’s plain language summary as published in Cochrane Database of Systematic Reviews, where you can read the entire report.

Methods to increase response to postal and electronic questionnaires

Postal and electronic questionnaires are a relatively inexpensive way to collect information from people for research purposes. If people do not reply (so called ‘non-responders’), the research results will tend to be less accurate. This systematic review found several ways to increase response. People can be contacted before they are sent a postal questionnaire. Postal questionnaires can be sent by first class post or recorded delivery, and a stamped-return envelope can be provided. Questionnaires, letters and e-mails can be made more personal, and preferably kept short. Incentives can be offered, for example, a small amount of money with a postal questionnaire. One or more reminders can be sent with a copy of the questionnaire to people who do not reply.

 

Critical/reflective thinking:  Imagine that you were asked to participate in a survey.  Which of these strategies do you think would motivate or remind you to respond and why?

For more info read the full report: Methods to increase response to postal and electronic questionnaires

 

“Should you? Can you?”

Quasi-experiments are a lot of work, yet they don’t have the same scientific power to show cause and effect as randomized controlled trials (RCTs) do. An RCT would provide better support for any hypothesis that X causes Y. [As a quick review of what quasi-experimental versus RCT studies are, see “Of Mice & Cheese” and/or “Out of Control (Groups).”]

So why do quasi-experimental studies at all?  Why not always do RCTs when we are testing cause and effect?  Here are 3 reasons:

#1 Sometimes ETHICALLY the researcher canNOT randomly assign subjects to a control and an experimental group. If the researcher wants to compare health outcomes of smokers with non-smokers, the researcher cannot assign some people to smoke and others not to smoke! Why? Because we already know that smoking has significant harmful effects. (Of course, in a dictatorship, a researcher could use the police to assign people to smoke or not smoke, but I don’t think we wanna go there.)

#2 Sometimes PHYSICALLY the researcher canNOT randomly assign subjects to control & experimental groups. If the researcher wants to compare health outcomes of individuals from different countries, it is physically impossible to assign country of origin.

#3 Sometimes FINANCIALLY the researcher canNOT afford to assign subjects randomly to control & experimental groups. It costs $ & time to get a list of subjects and then assign them to control & experimental groups using a random numbers table or by drawing names from a hat.
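The assignment step itself can be done in a few lines of code, although the cost of getting the list of subjects in the first place remains. Here is a minimal Python sketch of simple randomization (the computer version of drawing names from a hat), with hypothetical subject IDs and a hypothetical function name `randomize`. Real trials often use other schemes, such as block or stratified randomization.

```python
import random

def randomize(subjects, seed=None):
    """Randomly assign subjects to 'control' or 'experimental'.
    Like drawing names from a hat: shuffle the list, then split it in half."""
    rng = random.Random(seed)   # a fixed seed makes the allocation reproducible for auditing
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}

# Hypothetical subject IDs S01..S10
subjects = [f"S{i:02d}" for i in range(1, 11)]
groups = randomize(subjects, seed=2024)
print(groups)
```

Because every subject has the same chance of landing in either group, differences between the groups (other than the intervention) are left to chance rather than to the researcher's choices.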

Thus, researchers are sometimes left with little alternative but to do a quasi-experiment as the next best thing to an RCT, and then discuss its limitations in their research reports.

Critical Thinking: You read a research study in which a researcher recruits the first 100 patients on a surgical ward during the January-March quarter as a control group. Then the researcher recruits the next 100 patients on that same surgical ward April-June for the experimental group. With the experimental group, the staff uses a new, standardized pain script for better pain communication. Then the pain communication outcomes of the two groups are compared statistically.

  • Is this a quasi-experiment or a randomized controlled trial (RCT)?
  • What factors (variables) might be the same among control & experimental groups in this study?
  • What factors (variables) might be different between control & experimental groups that might affect study outcomes?
  • How could you design an ethical & possible RCT that would overcome the problems with this study?
  • Why might you choose to do the study the same way that this researcher did?

For more info: see “Of Mice & Cheese” and/or “Out of Control (Groups).”

Making research accessible to RNs
