You will hear about and read research your entire career. Headlines report on positive treatment results, and families come in asking about them. So how should we evaluate research, and what makes a study good or flawed?
Published On: 12/17/2021
Duration: 17 minutes, 41 seconds
References: Database for reporting adverse events
Joshua Feder, MD, and Mara Goverman, LCSW, have disclosed no relevant financial or other interests in any commercial companies pertaining to this educational activity.
Dr. Feder: You will hear about and read research your entire career. Headlines report on positive treatment results, and families come in asking about them. So how should we evaluate research, and what makes a study good or flawed? In this episode, Mara and I will discuss the components of high-quality research and the red flags associated with studies of lower quality. This episode will also help you be better prepared to talk with families about research studies and results that can affect treatment decisions.
Welcome to The Carlat Psychiatry Podcast.
This is a special episode from the child psychiatry team.
I’m Dr. Josh Feder, the Editor-in-Chief of The Carlat Child Psychiatry Report and co-author of The Child Medication Fact Book for Psychiatric Practice.
Mara: And I’m Mara Goverman, a Licensed Clinical Social Worker in Southern California with a private practice.
There can be many different outcomes reported within a single study. And it can be difficult to discern which outcomes carry the most weight. So Dr. Feder, which outcomes should we focus on the most in a particular study?
Dr. Feder: Randomized controlled trials, or RCTs, typically use several outcomes. Researchers declare a single outcome as the primary outcome before the study starts. This prevents researchers from cherry-picking a positive outcome after the data have been examined, then declaring it as the primary outcome.
And it’s important to note that a study is designed specifically to assess the primary outcome. This is a key point, because the parameters of a study’s design, such as the sample size, duration of treatment, number of follow-up evaluations, and many others, are chosen primarily to evaluate the primary outcome measure.
Mara: Additionally, you should examine all secondary outcomes. Common secondary measures include patient self-reports, quality of life, levels of functioning in school/work/family life – these measures provide valuable information and should be closely considered along with clinician-rated symptom scales.
Dr. Feder: There may be different results across these outcomes. For instance, antidepressants have been shown to provide small benefits on depression rating scales for youth, but yield no benefits on depression self-reports and quality of life measures compared to placebo.
Mara: It is easy to get excited about published positive treatment results, but further research may or may not support the initial findings. For industry studies, replication from a non-industry team is important. For therapy studies, replication of positive results by a separate research group is important.
Dr. Feder: Two good RCTs of the same treatment for the same indication that both show positive results are much more powerful than a single trial. How often does this happen in child and adolescent psychopharmacology? Not often. Some treatments have demonstrated consistently poor results; for example, there are multiple studies showing no significant effect of desvenlafaxine or paroxetine on depression in youth.
Mara: Now let’s dive into the pros and cons of different study designs, and the caveats associated with different designs and outcome variables.
In open-label studies, both researchers and patients know what the treatment is and there is no control group. Positive results can be misleading because the patients may improve due to positive expectations or the natural course of their illness—rather than as a result of the treatment. Thus, open-label studies may offer preliminary hope but not solid evidence of efficacy.
Dr. Feder: RCTs are a better way to evaluate a treatment’s effects – but they must be examined closely. RCTs are supposed to be double-blind, with participants and clinical raters unaware of who receives which intervention until the study is over. But drug side effects can “unblind” such studies and bias their results. Drugs with more obvious physical or psychological effects lead to greater unblinding; for instance, high-dose olanzapine is more likely to cause unblinding than a low dose of fluoxetine.
Mara: Also, the same clinical raters usually evaluate both efficacy and adverse events. A rater who notes drug-specific adverse events (or a lack of adverse events) may guess which patients are receiving the active treatment, defeating the purpose of the RCT.
And researchers often declare that a “treatment works” if it has a statistically significant benefit over placebo. But what exactly does “statistical significance” mean?
Dr. Feder: In an RCT, suppose an antidepressant outperforms placebo by two points on a depression rating scale. Based on scores obtained on the rating scale, statistical calculations generate a p-value. The p-value is the probability of obtaining a result at least as large as the one observed (medication outperforming placebo by two points or more) if the null hypothesis, which claims that there is no treatment effect, were true.
Mara: In other words, if the drug really had no effect (ie, the null hypothesis is true), what are the odds that the study would find at least a two-point benefit for the drug? If the p-value is less than .05 (5 percent), the result is deemed statistically significant.
Dr. Feder: It does not necessarily mean that the result is important! Among other things, the size of the sample is influential… a very small treatment benefit may be statistically significant in a large study.
Mara: Statistical significance gives some confidence that there is a treatment effect. It is an important first step, but it’s not the final word on treatment efficacy.
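To make the sample-size point concrete, here is a minimal sketch in Python. It assumes a simple normal (z-test) approximation with two equal-sized groups, and the numbers (a 2-point drug-placebo difference on a scale with a standard deviation of 10) are hypothetical, not taken from any study discussed in this episode. The function name two_sided_p_value is our own.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Approximate two-sided p-value for a difference in means between two
    equal-sized groups, using a normal (z-test) approximation.
    Assumes both groups share the same standard deviation `sd`."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail probability of a standard normal
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical example: the same 2-point benefit, tested in trials of different sizes
for n in (20, 100, 500):
    p = two_sided_p_value(mean_diff=2.0, sd=10.0, n_per_group=n)
    print(f"n per group = {n:3d}  p = {p:.4f}  significant at .05: {p < 0.05}")
```

Under these assumptions, the identical 2-point benefit falls short of significance with 20 or 100 participants per group and crosses the .05 threshold only in the largest trial, even though the size of the benefit never changes.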
Dr. Feder: What we really want to know is the effect size – the magnitude of the treatment effect. For standardized mean differences such as Cohen’s d, the common convention is that 0.20 = small, 0.50 = medium, and 0.80 = large. In psychiatry, effective treatments nearly always generate small to medium effects (compared to placebo).
Mara: It is now standard practice to report effect size in treatment studies. A study that fails to report effect size may be hiding a minimal treatment benefit.
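As a rough illustration of how an effect size might be computed, the sketch below calculates a standardized mean difference (Cohen's d, the measure the 0.2/0.5/0.8 benchmarks are usually applied to) using a pooled standard deviation. The improvement scores are made up for illustration, and the function names cohens_d and label are our own.

```python
import math
from statistics import mean, variance

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) +
                  (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)

def label(d):
    """Map an effect size onto the conventional small/medium/large benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical improvement scores (points of symptom reduction) per participant
drug = [12, 9, 15, 10, 14, 8, 13, 11]
placebo = [10, 8, 13, 9, 12, 7, 11, 10]

d = cohens_d(drug, placebo)
print(f"d = {d:.2f} ({label(d)})")
```

With these invented scores the drug group improves 1.5 points more than placebo on average, which works out to a d of about 0.67, a medium effect by the usual convention.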
Clinicians often prefer categorical outcomes, like response and remission. Is there anything we should look out for when studies report categorical outcomes?
Dr. Feder: Categorical outcomes sound great, but aren’t always impressive. Categories have arbitrary cutoff points. For instance, in autism, we can look at a total ADOS score or a change from the severe to moderate range, moderate to mild, etc. But this is tricky – if a patient’s score goes from the low end of “severe” to the very high end of “moderate”, that is not necessarily clinically meaningful. Pay attention to total scores along with any categorical outcomes.
Mara: And it's especially important to check the number needed to treat, or NNT, and the number needed to harm, also known as the NNH, for categorical outcomes. For instance, an NNT of 8 for “response” means 8 patients would need to be treated to gain one additional response that would not have occurred if all 8 patients had taken placebo.
Dr. Feder: The NNT/NNH refers to the number of people who would need to receive treatment in order to gain one additional positive outcome (NNT) or negative outcome (NNH) over what would have occurred if all participants received placebo.
Mara: The lower the NNT, the better. An NNT of 5 is often considered impressive and NNTs of more than 10 are often considered unimpressive, but there is no firm consensus on this.
Dr. Feder: For NNH, the acceptable range might vary from 10 to 100 depending on the severity of the side effect. For severe side effects such as Stevens-Johnson syndrome, you want to see a much higher number.
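The arithmetic behind NNT and NNH is the reciprocal of the difference in event rates between drug and placebo, conventionally rounded up to a whole patient. The sketch below assumes that standard formula; the trial counts are hypothetical and the helper name number_needed is our own.

```python
from fractions import Fraction
import math

def number_needed(events_treatment, n_treatment, events_placebo, n_placebo):
    """Return 1 / (treatment event rate - placebo event rate), rounded up.
    Pass responders to get an NNT, or patients with a side effect to get an NNH.
    Fractions keep the arithmetic exact so rounding is not thrown off by floats."""
    risk_difference = (Fraction(events_treatment, n_treatment)
                       - Fraction(events_placebo, n_placebo))
    if risk_difference <= 0:
        raise ValueError("event rate on treatment must exceed the placebo rate")
    return math.ceil(1 / risk_difference)

# Hypothetical trial with 80 patients per arm
nnt = number_needed(40, 80, 30, 80)  # responders: 50% on drug vs 37.5% on placebo
nnh = number_needed(6, 80, 2, 80)    # side effect: 7.5% on drug vs 2.5% on placebo
print(f"NNT = {nnt}, NNH = {nnh}")
```

In this made-up trial the NNT is 8 and the NNH is 20: treating 8 patients yields one extra response, while treating 20 yields one extra case of the side effect. Whether that trade-off is acceptable depends on how severe the side effect is.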
Mara: Most RCTs last only a few weeks, while child development is measured in years. One cannot assume long-lasting benefits based on a positive short-term RCT. Most studies of long-term drug efficacy inappropriately use a randomized discontinuation design.
Dr. Feder: These studies start with only participants who have responded to the drug in the short term. By random assignment, some participants are (usually abruptly) switched to placebo while others continue to take the drug. This conflates a) drug withdrawal effects among those switched to placebo with b) treatment efficacy in those who keep taking the drug. The worse the drug discontinuation effects, the worse the placebo group performs after being taken off medication – and the better those who stay on medication seem in comparison.
Mara: A better test of long-term effects is to simply lengthen a short-term placebo-controlled RCT, but this is expensive and might reduce the hoped-for positive findings of the researchers, so it is rarely done.
Speaking of things that might shine a negative light on a particular drug in a study, let’s talk about side effects. Is there anything we should keep an eye out for, when it comes to the reporting of adverse events in a study?
Dr. Feder: Definitely! Ideally, RCTs should accurately detect adverse events. While weight and some lab measures are usually reliably assessed in RCTs, most adverse events are assessed vaguely. For example, until recently, there was little attempt to ask specific questions regarding suicidality in most treatment studies, leading to an underreporting of such events.
Mara: Adverse events must be systematically assessed, otherwise studies may be unable to detect them. Also, researchers often don’t report all recorded adverse events in journal articles.
Dr. Feder: Yeah, not only is this highly unethical, but it can have seriously dangerous implications for clinical practice. Underreporting of serious adverse events is systemic, as highlighted in a 2014 cross-sectional study of 244 trial summaries associated with published research articles. These summaries were from trials that evaluated the efficacy of six antidepressant and antipsychotic drugs. Of the 1608 drug-related serious adverse events presented in the trial summaries, about 43% did not make it into the associated articles.
And this is shocking! According to this study, the majority of deaths and suicides, about 62% and 53% of cases, respectively, were not reported in the published articles. So it’s critical that you keep an eye out for how adverse events are recorded and reported.
Mara: News journals and morning talk shows love covering stories on how “a new study found the treatment that cures disease X.” And it’s easy to get all excited about these treatments, which is why it’s pretty common for patients and families to ask us about certain treatments or interventions that they heard about on the news. We want our patients to be actively involved in our treatment plans, but we also need to formulate and discuss treatments that are supported by high-quality research. So how can we help families understand that much of the popular press coverage of research is misleading, without sounding cynical or disrespectful?
Dr. Feder: We need to listen respectfully, then describe in a calm and neutral manner the hope that an open-label study provides, and explain that controlled trials are needed to know whether a treatment is truly helpful.
Mara: For example, in the recent CCPR Jan/Feb/March 2021 issue, we talked with Dr. Aaron Besterman about how results from pharmacogenomic testing, while interesting, are unlikely in most cases to lead to changes in good treatment.
Dr. Feder: To recap, Mara and I will provide you with a quick and easy clinical research checklist that you should use when evaluating a study. Here we go:
Is the study an RCT or an open-label trial?
Mara: If it’s an open label design, then the natural course of illness, placebo effect, and researcher biases may have caused improvement, rather than the treatment itself.
Dr. Feder: If it’s an RCT, is everyone blinded to which treatment the patient received? Note that drug side effects might unblind the treatment.
Mara: For each efficacy outcome, are the results statistically significant? And what is the treatment’s effect size? Again, a small effect size is 0.2, a medium effect size is 0.5, and a large effect size is 0.8. Did the study include secondary outcomes like parent reports, self-reports, improvements in daily functioning or changes in quality of life?
Dr. Feder: Remember that categorical outcomes (eg, remission, response) are usually based on arbitrary cutoff scores. Always consider categorical outcomes in the context of rating scale scores and other outcomes.
What is the number needed to treat (NNT) for these outcomes?
And if an adverse event is not systematically measured, it is likely underreported.
Mara: When assessing whether a drug is effective over the long term, you should know that randomly reassigning some patients from drug to placebo may cause drug discontinuation effects and invalidate the comparison.
Dr. Feder: As a bottom-line message, it’s imperative that you pay attention to the quality of the study and the effect sizes of particular treatments. Educate families about how you use your professional judgement to give them truly evidence-based recommendations.
The clinical update will be available for subscribers to read in The Carlat Child Psychiatry Report. Hopefully people check it out. Subscribers get print issues in the mail and email notifications when new issues are available on the website. Subscriptions also come with full access to all the articles on the website and CME credits.
Mara: And everything from Carlat Publishing is independently researched and produced. There’s no funding from the pharmaceutical industry.
Dr. Feder: Yes, the newsletters and books we produce depend entirely on reader support. There are no ads and our authors don’t receive industry funding. That helps us to bring you unbiased information you can trust.
Mara: Go to www.thecarlatreport.com to sign up. You can get a full subscription to any of our four newsletters for $30 off using the coupon code LISTENER.
As always, thanks for listening and have a great day!