We are increasingly inundated with media reports purporting to prove health benefits, lack of benefits, or hazards for medications, nutraceuticals, diets, exercise equipment, and a whole host of other things. This week’s reports about a study that failed to find expected benefits from low-fat diets are a good example.
Most newspapers, fitness magazines, and health-related internet sites have health experts (like me) who write about “what’s new in health.” These reports, often couched in scientific terms, can sound too good to be true, because, well, they often are too good to be true.
So, what’s a consumer to do? How can you cut through the baloney and determine what is good science, really good science, or just plain junk? Here is my suggestion. Learn about study design. “Oh no,” you are thinking. “She isn’t really going to write about study design on a PEERtrainer blog is she? I’m outta here…”
Hey, wait. I’ll make this as painless as possible. I promise. Please, take the time to read this. As your teachers used to say, it will be on the test. And, there’s the potential additional benefit of being able to astonish your friends when you tell them you don’t believe the $200 ab cruncher they are considering buying really works, because it has not been subjected to a rigorous randomized controlled trial.
Ok, here we go…
Most of the studies you read about are trying to determine if there is a relationship, usually a causal link, between one thing (the intervention or exposure) and another (the outcome or result). For example, the researchers may want to learn if drinking coffee causes cancer.
What are good study designs?
The best study design for determining if something is causally related to an outcome (i.e., the intervention was at least partially responsible for the observed changes) is a randomized controlled trial (RCT). Participants are randomly assigned to one or more types of interventions. At a minimum, there is a group receiving the intervention being studied (for example, taking a certain medication) and another group that does not receive the intervention. The latter group is called the control or comparison group.
There are lots of ways to randomize participants. One of the simplest is to assign every other person enrolled in the study to the intervention group, with the rest of the folks serving as controls (strictly speaking, alternating like this is only quasi-random, but it gives you the idea). If the people randomly assigned to the intervention group are determined to be similar in most ways to those randomly assigned to the control group, scientists then assume that the major difference between the two groups was whether or not they got the intervention. Any difference in outcomes between the two groups may well be causally related to the intervention.
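For readers who like to see things concretely, here is a minimal sketch (in Python, with made-up participant names) of what true random assignment looks like, as opposed to the every-other-person shortcut:

```python
import random

def randomize(participants, seed=None):
    """Randomly split a roster into an intervention group and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original roster is untouched
    rng.shuffle(shuffled)          # every possible ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (intervention, control)

# Hypothetical roster of study volunteers
roster = [f"participant_{i}" for i in range(1, 11)]
intervention, control = randomize(roster, seed=42)
print(len(intervention), len(control))  # 5 5
```

Because neither the volunteers nor the researchers choose who lands in which group, any unknown trait (motivation, mug glaze, you name it) tends to spread evenly across both groups as the study gets larger.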
Here’s the rub. Researchers can’t always be certain that the people in the intervention group are, in fact, similar to those in the control group. They will ask a lot of questions about an individual’s behavior, medications, and exposures. They may also take a lot of measurements, like blood pressure, weight, etc. They then perform an analysis to determine if the groups are statistically similar.
However, no matter how many characteristics they study, there is always the chance that the groups are dissimilar in important ways that the researchers simply didn’t know to ask about. For example, the coffee drinkers who got cancer could have been drinking their coffee out of mugs with a cancer-producing substance in the ceramic glaze. In this hypothetical example, it is the glaze, not the coffee, that caused the cancer. The larger the group of people studied and the better matched they are on relevant characteristics, the more likely it is that the studied exposure is causally related to the study outcome.
Here’s another example: 600 twenty-somethings attending a university are recruited to participate in a study on weight loss. Every other person to sign up is assigned to take the study medication, a new diet pill. They become the intervention group. The others are assigned to the control (comparison) group. They are told to take a pill that looks identical to the diet pill, but it has no active ingredients (i.e., a placebo). After 5 months, the study group has lost an average of 25 pounds apiece, while the controls have neither gained nor lost weight. The study is reported as having demonstrated that the pill seems to cause weight loss, at least in individuals with characteristics similar to the twenty-somethings in the study.
A key to RCTs is that the participants have no say in determining which group assignment they receive. When possible, it is best if neither the participants in the trial nor the scientists evaluating the outcome data know which group any given individual is in until after all of the results are in—that is called a “double-blind” study. In some cases, that will not be possible. How do you mask whether someone is in a yoga class or not? RCTs that are not blinded are still better than studies without control (comparison) groups at determining causal links between treatments and outcomes.
Non-randomized controlled studies
Non-randomized controlled studies look at one or more groups who, in a non-random way, either participate or don’t participate in the intervention being studied. Researchers then try to determine if key differences in outcome exist between those who received the study intervention (the study group) and those who didn’t (the control group).
These types of studies frequently have an important flaw called selection bias.
To illustrate what I mean, let’s say we want to study the impact of yoga on body weight. We go to a yoga studio and sign up 100 new yoga practitioners (the intervention group), and we compare them to 100 people we found drinking coffee at the Starbucks next door (the comparison group). We measure everyone’s weight initially and again after 12 months in the study. We find that all of the yoga practitioners lost a significant amount of weight. The Starbucks crowd not only didn’t lose weight, they gained an average of 5 lbs apiece (oh, those frappuccinos). Does that mean that yoga was responsible for the weight loss?
The answer is that we simply don’t know, because it is possible that the group that attended yoga classes was more motivated to lose weight than the group hanging out at Starbucks. Perhaps it wasn’t the yoga that caused the weight loss; rather, the key difference between the study group and the controls was motivation.
Another study design that is commonly used is called a pre-post study. Measurements of interest are taken from every individual in the study both before and after the intervention. The problem with this type of study is that you cannot determine if something other than the study intervention was what really caused the outcomes.
Again, let me illustrate. Let’s say this time we go to a yoga studio in Arkansas. We enroll the first 100 new customers in the study. Again, we measure weight before and after 12 months of yoga practice. And again, we find a substantial weight loss. Did yoga cause the weight loss? Again, we don’t really know. Perhaps everyone in Arkansas is losing weight because of Governor Huckabee’s new statewide focus on fitness. We simply wouldn’t know, because we did not have a control group for comparison.
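To see why the missing control group matters, here is a toy simulation (all numbers invented for illustration). In this pretend world, everyone in the state is losing about 5 lbs a year thanks to the statewide fitness push, whether they practice yoga or not. A pre-post study credits yoga with the loss; a controlled comparison reveals that yoga added nothing:

```python
import random

random.seed(0)

STATEWIDE_TREND = -5.0  # assumed weight change (lbs/year) for everyone in the state
YOGA_EFFECT = 0.0       # in this toy world, yoga itself does nothing

def year_of_weight_change(doing_yoga):
    """Simulated 12-month weight change for one person, in lbs."""
    effect = STATEWIDE_TREND + (YOGA_EFFECT if doing_yoga else 0.0)
    return effect + random.gauss(0, 1)  # a little person-to-person variation

yoga_group = [year_of_weight_change(True) for _ in range(100)]
control_group = [year_of_weight_change(False) for _ in range(100)]

# A pre-post study only sees the yoga group's average change...
pre_post_result = sum(yoga_group) / len(yoga_group)
# ...while a controlled study compares it against the controls' change.
controlled_result = pre_post_result - sum(control_group) / len(control_group)

print(f"Pre-post study sees:   {pre_post_result:+.1f} lbs")   # roughly -5
print(f"Controlled study sees: {controlled_result:+.1f} lbs") # roughly 0
```

The pre-post design reports a dramatic weight loss that had nothing to do with yoga; only the comparison against controls exposes that.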
So, there you go: study design in a nutshell. Here are the take-home points:
- Proving that something is causally linked to something else requires a control group for comparison.
- Comparing individuals similar in (almost) every way except for whether or not they are exposed to the study intervention is a key to determining causality.
- Studies with large numbers of people are almost always better than studies with small numbers of people in determining causality.
- Failure to randomize groups to intervention vs. no intervention risks introducing biases, such as selection bias, that weaken the strength of the results.
- Pre-post studies, although very popular, usually fail to prove that any changes seen were due to the study intervention. Something else that occurred during the study, but wasn’t measured, could have caused the change in the population at large; without a comparison group, the researchers would simply miss it.
- Ecological studies (which compare whole populations) and cross-sectional studies (which take a single snapshot in time) give us hints that things may be related, but more rigorous study designs are necessary to prove it.
There are some other important things to consider when trying to determine how much you can rely on the results of a study. They include factors like magnitude of the differences between the groups and whether the sample size was large enough to detect a subtle, but nevertheless, important difference. Don’t worry though. I’ll try to point out these problems in future blogs that discuss the results of specific studies.
See, that wasn’t so bad, was it?