Glen Spielmans, PhD. Professor of Psychology, Metropolitan State University, St. Paul, MN.
Dr. Spielmans has no financial relationships with companies related to this material.
CCPR: Why should clinicians critically evaluate the research studies behind the treatments they use?
Dr. Spielmans: Reviewing research takes time and skill. Is a study relevant to your patients? Are there biases in the design or in how the results are reported? Results may be overhyped, so clinicians need to evaluate the outcomes for themselves. Leucovorin, for example, is touted as a treatment for autistic symptoms; however, the largest placebo-controlled study to date was recently retracted after alert readers noted major numerical discrepancies (www.tinyurl.com/m4ehh8z3). This leaves little evidence of leucovorin’s efficacy.
CCPR: How can clinicians identify good research?
Dr. Spielmans: If a study’s results seem too good to be true, they probably are. Look for replicated results, especially from independent research groups. To identify good studies, use the PICOT method: Population, Intervention, Comparison, Outcomes, and Time frame. Starting with Population: Is it relevant? Clinical trials often exclude people with anxiety disorders, eating disorders, substance use problems, or suicidality, but our patients rarely present with a depression-only profile (Blanco C et al, Pediatrics 2017;140(6):e20161701). Wouldn’t we like to know if a treatment works for our suicidal patients with complex diagnostic pictures? If the population isn’t relevant, maybe the study isn’t either. Also note that medication trials frequently exclude prior non-responders to the drug or its class, likely inflating efficacy estimates: “This drug works, except for people who didn’t respond to it.”
CCPR: What about the Intervention itself?
Dr. Spielmans: Ask yourself whether the intervention represents what you do in practice. Suppose you are a cognitive behavioral therapy (CBT) therapist, and you read a study showing “CBT” was efficacious for depression. Examine what the study therapists did. If they focused on depressive thoughts but you focus on behavioral activation, then the study is not very relevant to your work. Just because you subscribe to a particular “brand” of therapy doesn’t mean all studies on that brand are relevant.
CCPR: What about the Comparison part?
Dr. Spielmans: When a study reports that treatment was effective, I ask: “Compared to what?” Doing nothing? It’s not very informative when a treatment is better than a waitlist. We hope medications beat placebo, but remember that clinicians also harness the placebo effect: in open-label studies, when half of the people get better, much of that improvement is probably placebo response. So, look for head-to-head comparisons and watch for bias. One cleverly titled study, “Why Olanzapine Beats Risperidone, Risperidone Beats Quetiapine, and Quetiapine Beats Olanzapine,” examined head-to-head trials of second-generation antipsychotics, finding that 90% of studies declared victory for the sponsored drug (Heres S et al, Am J Psychiatry 2006;163(2):185–194).
CCPR: Many autism studies use waitlist controls rather than active comparators. What does the research show, and why should clinicians approach these studies with caution?
Dr. Spielmans: This is the problem with most ABA studies. Parents arrive feeling hopeless after failed interventions, making them primed for what Jerome Frank called “re-moralization.” The high-intensity treatment program lifts morale, which can shift how parents perceive and rate their child’s behavior. Meanwhile, waitlist controls feel worse for waiting, skewing their ratings in the opposite direction. Compounding this, many ABA studies are single-case or non-experimental, observing changes during treatment without any independent control group. Sandbank’s sophisticated meta-analysis excluded such studies when comparing effect sizes across traditional ABA, naturalistic developmental behavioral intervention (NDBI), and developmental relationship-based intervention (DRBI), finding moderate effect sizes for NDBI and DRBI. Despite ABA’s reputation, most ABA studies had poor designs and weaker evidence than developmental approaches for treating autism (Sandbank M et al, BMJ 2023;383:e076733).
CCPR: What are we looking for in the Outcomes of a study?
Dr. Spielmans: Most clinical trials use symptom rating scales, but there’s more to a person than those scores. We overvalue decades-old rating scales without really thinking (Kazdin AE, Am Psychol 2006;61(1):42–49; Fried EI et al, Nat Rev Psychol 2022;1(6):358–368). Does a five-point change on a scale mean a person’s relationships or school functioning are better? Most scales don’t measure those things. Assessments of quality of life and functioning in key areas (social, school, family, etc) are important but underutilized.
CCPR: We also need to track side effects. Is there an illustrative study on that?
Dr. Spielmans: The infamous Study 329 is a great example (Keller MB et al, J Am Acad Child Adolesc Psychiatry 2001;40(7):762–772). This randomized trial reported that paroxetine was efficacious and safe, though it was neither (McHenry LB and Jureidini JN, Account Res 2008;15(3):152–167). After years of pressure, paroxetine’s sponsor released detailed safety and efficacy data, which were independently analyzed (Le Noury J et al, BMJ 2015;351:h4320). The original JAACAP paper reported that 5 of 93 paroxetine patients experienced suicidal/self-injurious behaviors compared to 1 of 87 on placebo, euphemizing these events as “emotional lability.” Re-analysis based on careful review of individual participant records found that 11 of 93 paroxetine patients experienced suicidal/self-injurious behaviors compared to 1 or 2 of 87 on placebo. At least two participants on paroxetine who were hospitalized after threatening suicide were not even coded as “emotionally labile.”
CCPR: What’s the lesson here?
Dr. Spielmans: On one hand, published research is the foundation of evidence-based medicine. On the other, published research often excludes key safety data. Sadly, poor safety data reporting à la Study 329 is common (Hughes S et al, BMJ Open 2014;4:e005535). In fall 2025, JAACAP finally attached a vague expression of concern to the Study 329 paper, 24 years after initial publication, despite major concerns being raised almost immediately after the paper appeared. The lesson here is that you need to be extra wary of industry-sponsored research.
CCPR: Are there data reporting standards for clinical trials?
Dr. Spielmans: An internal Eli Lilly document described a goal to “mine existing data to generate and publish findings ... support[ing] the reasons to believe the brand promise of olanzapine” (Spielmans GI and Parry PI, J Bioeth Inq 2010;7:13–29), and internal documents from other companies reveal a similar playbook. But what if the data don’t support the brand promise? The voluntary Consolidated Standards of Reporting Trials (CONSORT) sets a widely accepted standard for clinical trial reporting. A primary outcome measure must be declared before the trial begins, preventing researchers from analyzing data first and then “picking a winning outcome.” All efficacy and safety outcomes should follow a predetermined statistical analysis plan and be fully reported. Most medical journals claim to follow CONSORT, yet many articles fall short, and journals often refuse to publish letters pointing out the discrepancy (Jones CW et al, BMC Med 2015;13:282; Goldacre B et al, Trials 2019;20(1):118). In practice, CONSORT compliance often provides a veneer of credibility over the same poor reporting that has long plagued clinical trials. This makes critical reading essential. To vet an article, find its National Clinical Trial (NCT) number in the abstract or methods section and enter it at clinicaltrials.gov to review the study protocol and history of changes, checking whether all outcomes were reported. It’s extra work, but it can meaningfully raise (or lower) your confidence in a study’s results.
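Editor’s note: For readers who want to automate that registry lookup, below is a minimal Python sketch. It assumes the ClinicalTrials.gov v2 API endpoint (https://clinicaltrials.gov/api/v2/studies/{nctId}) and its JSON field names (protocolSection, outcomesModule, primaryOutcomes); verify both against the current API documentation. The NCT number in the example is a placeholder, not a real trial.

import json
import urllib.request

def fetch_registered_outcomes(nct_id: str) -> dict:
    # Pull the current registry record for one trial from ClinicalTrials.gov.
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    with urllib.request.urlopen(url) as resp:
        study = json.load(resp)
    # The registered outcome measures live in the protocol section.
    outcomes = study.get("protocolSection", {}).get("outcomesModule", {})
    return {
        "primary": [o.get("measure") for o in outcomes.get("primaryOutcomes", [])],
        "secondary": [o.get("measure") for o in outcomes.get("secondaryOutcomes", [])],
    }

# Compare these registered outcomes against what the published paper reports.
print(fetch_registered_outcomes("NCT00000000"))  # placeholder NCT number

If a paper’s reported primary outcome doesn’t match what was registered before the trial began, that is exactly the outcome switching CONSORT is meant to prevent.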
CCPR: And, finally, what about the Time frames for studies?
Dr. Spielmans: Depressed people are usually better after a few months if you do nothing. So, when a treatment study reports that most people were better in a few months, I’d say, “They should have been anyway.” That will vary with severity, co-occurring problems, and the condition being studied. Look for controlled studies with long (eg, one-year) outcomes. Also, don’t assume that improvements made in a short-term clinical trial carry forward to months or years later. Symptoms often recur after a study is completed. Taken together, PICOT gives you a practical five-part checklist you can apply to almost any study before acting on its findings.
“Don’t get excited about any individual study. If something seems too good to be true, it probably is. Look for replicated results, especially by a different group of researchers.”
Glen Spielmans, PhD
CCPR: Which reports catch your attention?
Dr. Spielmans: Positive case studies report innovations that can serve as a springboard for systematic research. But I want to see more case studies that say, “I tried this thing, and it didn’t work.” One review found that 95% of published case studies reported positive results; if only we achieved such outcomes in the real world (Albrecht J et al, J Clin Epidemiol 2005;58(12):1227–1232)! Side note: I think clinicians should collect data on patient outcomes for the sake of accountability and quality improvement. We all want to think we’re effective, but this should be tracked objectively. Look for clinical trials whose participants are similar to your patients, that report data fully, and that use reasonable controls (placebo is nice, but a well-implemented established alternative treatment is better).
CCPR: What about groups of clinical studies, such as meta-analyses?
Dr. Spielmans: People tend to think of meta-analysis as wizard math, but it is essentially a weighted average of effect sizes across studies. Start by checking for heterogeneity: If studies are too different, lumping them together may not make sense. For example, one CBT study might focus on challenging negative thoughts while another emphasizes behavioral activation. Combining them inflates the sample and statistical power, making effects easier to find across 30 studies than across 3, but quantity doesn’t equal quality. Individual studies may have poor outcome assessor blinding, questionable randomization, or other sources of bias.
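Editor’s note: For readers who want the arithmetic, the simplest (fixed-effect) pooled estimate is an inverse-variance weighted average of the k individual study effects, and Cochran’s Q with the I² statistic quantifies the heterogeneity Dr. Spielmans describes. This is standard textbook notation, not a formula from the interview:

\[
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \,\hat{\theta}_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{\mathrm{Var}(\hat{\theta}_i)}
\]
\[
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,
\qquad I^2 = \max\!\left(0,\ \frac{Q-(k-1)}{Q}\right) \times 100\%
\]

A common rule of thumb treats I² above roughly 50% as substantial heterogeneity, ie, a warning that the pooled average may be blending studies that don’t belong together.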
CCPR: Besides Sandbank, which rocked the autism world, what other important meta-analyses are out there in child psychiatry?
Dr. Spielmans: There are few relevant controlled trials for child psychiatry interventions, limiting the availability and informativeness of meta-analyses. In the late 1990s and early 2000s, a few positive SSRI trials for youth depression were published, but publication was highly selective; once researchers accessed unpublished data, meta-analyses found minimal to no effects for SSRIs and SNRIs (Jureidini JN et al, BMJ 2004;328(7444):879–883; Whittington CJ et al, Lancet 2004;363(9418):1341–1345). While fluoxetine appeared to outperform other SSRIs, subsequent studies have raised questions about that finding (Plöderl M et al, J Clin Epidemiol 2026;189:112016). (Editor’s note: At Carlat we recognize that antidepressants have little impact on most kids, and we favor fluoxetine if we are going to try using one.)
CCPR: How are studies of therapy similar to medication studies?
Dr. Spielmans: Many therapy studies use placebo therapy (reflective listening, no homework, no advice) to hold psychotherapy to the same standard as medication. These studies typically show that real therapy outperforms the placebo, but questions remain about therapist training and supervision quality. More informative are head-to-head comparisons of psychotherapies, designed and implemented by experts in each modality being studied.
CCPR: How does all this lead to umbrella reviews?
Dr. Spielmans: Umbrella reviews are meta-analyses of meta-analyses (or of systematic reviews when the outcomes aren’t easily quantifiable). Say we’re looking at prevention of depression, anxiety, and substance use. These are related but separate entities. An umbrella review can combine them to look at prevention of mental health problems in general. Some umbrella reviews combine only the biggest or best meta-analysis on each topic, because several meta-analyses on the same topic often overlap substantially in the individual studies they include. Another approach is to gather the individual studies from all existing meta-analyses, but that is a lot more work.
CCPR: How do we talk with families about research, especially with current challenges to mental health care in public forums?
Dr. Spielmans: Use a PICOT approach and check whether the studies are relevant to the family. Say you’ve got a kid with autism, ADHD, and depression. That’s not unusual. How many clinical trials do we have on this combination? Not many, if any. But consider generalizing from narrower studies. You might interpret such studies as a glass half full or glass half empty—but being explicit with families about that uncertainty and framing it as an evolving evidence base, rather than a failing of science, can build trust instead of undermining it.
CCPR: Thank you for your time, Dr. Spielmans.