
Statistical Significance: What Does it Really Mean?

February 1, 2007
From The Carlat Psychiatry Report

You won’t get very far into any journal before you start reading about statistical significance and its close sibling, the 95% confidence interval. But what do these terms mean, and how do they help us draw conclusions about studies?

Because this topic can be numbingly dry in the abstract, we’ll illustrate these basic concepts by reviewing the results of a paper chosen from the current issue of the American Journal of Psychiatry (Eranti S et al, A Randomized, Controlled Trial With 6-Month Follow-Up of Repetitive Transcranial Magnetic Stimulation and Electroconvulsive Therapy for Severe Depression, Am J Psychiatry 2007;164:73-81).

From the title alone, you can ascertain that the study design is one of the better ones, that is, a randomized, controlled trial comparing ECT with rTMS for the treatment of severe depression. In this study, a total of 46 patients with severe depression were randomly assigned to either a 15-day course of rTMS (N=24) or a standard course of ECT (N=22). If you peer into the “Method” section, however, you will discover that it is neither double-blinded nor placebo-controlled. Not a perfect design, but a pretty good one.

Next, let’s go directly to the “Results” section (page 75), focusing specifically on the subheading “Primary Outcome:”

“Post hoc tests showed that the end-of-treatment HAM-D scores were significantly lower in the ECT group than in the rTMS group (F=10.89, df=1, 45, 95% CI for difference=3.40 to 14.05, p=0.002), demonstrating a strong standardized effect size of 1.44.”

It’s a mouthful, to be sure, but with some basic concepts in statistics you’ll be able to whip through such verbiage in no time. Let’s deconstruct the findings.

1. “Post hoc tests showed that the end-of-treatment HAM-D scores were significantly lower in the ECT group than in the rTMS group (F=10.89, df=1, 45, 95% CI for difference=3.40 to 14.05, p=0.002)….”

This means that statistics done after the results were tallied (“post hoc”) showed that patients who received ECT ended up with an average Hamilton depression score that was lower (meaning less depressed) than those patients who received rTMS. The stuff in parentheses is there to prove that this difference was “statistically significant.” Skip to the end of those numbers, and you see that “p=0.002.” Translation: the probability that this result might have occurred by chance alone (and therefore, is not a “real” finding) is 2 out of 1000, or 0.002, or only 0.2%. The standard cut-off point for statistical significance is p=0.05, or a 5% probability that the results occurred by chance, so the results of this study are particularly “robust.”
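
As a quick check on this arithmetic, the reported p-value can be reproduced from the F statistic and its degrees of freedom. Here is a minimal sketch in Python (using scipy; the numbers are the ones reported in the paper):

    from scipy.stats import f

    # F statistic and degrees of freedom as reported in the paper
    F_value = 10.89
    df_between, df_within = 1, 45

    # The p-value is the probability of seeing an F at least this
    # large by chance alone if the two treatments were truly equal
    p = f.sf(F_value, df_between, df_within)
    print(f"p = {p:.3f}")  # roughly 0.002, matching the published result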

You will often see studies in which results are reported like this: “the difference between Drug A and Drug B showed a trend toward statistical significance (p=0.06).” This means that the results didn’t quite meet the crucial 0.05 threshold, but they came close. Why is 5% the magic number? The choice was, in truth, somewhat arbitrary. In 1926, R. A. Fisher, one of the fathers of modern statistics, wrote an article in which he argued that it was “convenient” to choose this cut-off point, for a variety of reasons having to do with standard deviations and the like (for more information, see Dallal GE, The Little Handbook of Statistical Practice, posted on the web at http://www.tufts.edu/~gdallal/LHSP.HTM). This number has stood the test of time throughout all the scientific disciplines. Why? Because it has some intuitive appeal.

Look at it this way: Before we accept a finding as scientific fact, we want to be pretty certain that it didn’t occur through some coincidence of random factors. But how certain is “pretty certain?” Would 80% certainty (p=0.2) be enough for you? Probably not. Most doctors would not feel comfortable basing important treatment decisions on only an 80% certainty that a treatment is effective. Much better would be 99% certainty (p=0.01), but if that were the required threshold we would have very little to offer our patients. It just so happens that 95% certainty has felt “right” to scientists through the last 50 years or so. Of course it’s arbitrary, but if we don’t agree on some threshold, we open ourselves up to researchers creating their own threshold values depending on how strongly they want to push acceptance of their data (some still do this anyway). Because the scientific community has settled upon p=0.05, the term “statistical significance” has a certain, well, significance!

That being said, you, as a reader and clinician, have every right to look at a study reporting p=0.06 and say to yourself, “There’s only a 6/100 chance that this was a coincidental finding. It may not meet the 0.05 threshold, but, at least in this clinical situation, that’s good enough for me, so I think I’ll try this treatment.”

What about those other numbers? “F=10.89” means that the “F value” is 10.89. The F value is computed from the difference between the HAM-D scores in the two treatment groups (it’s a bit more involved than this, because this difference is divided by a factor to correct for variation in individual scores, but for purposes of basic understanding, we don’t need to get into that). Clearly, the higher the F-value, the more of a difference there is between the groups, and the more likely it is that this difference will be statistically significant.

You’ll often see this kind of statistic referred to as “analysis of variance,” and now you can see why it’s called that: it’s the analysis of the variance, or difference, between the averages of two treatment groups.
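
To make the “analysis of variance” idea concrete, here is a minimal sketch using made-up end-of-treatment HAM-D scores for two small groups (hypothetical numbers, not the study’s data):

    from scipy.stats import f_oneway

    # Hypothetical end-of-treatment HAM-D scores (not the study's raw data)
    ect_scores = [4, 12, 7, 16, 9, 15]
    rtms_scores = [12, 22, 15, 25, 17, 20]

    # f_oneway compares the variation between the two group means
    # to the variation among patients within each group
    F, p = f_oneway(ect_scores, rtms_scores)
    print(f"F = {F:.2f}, p = {p:.3f}")  # roughly F = 8.6, p = 0.015 here

The bigger the gap between the group averages relative to the scatter within each group, the larger F gets, and the smaller the p-value.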

The “df” in the extract means “degrees of freedom,” an arcane statistical term; the first number here equals the number of treatment groups minus one, and the second is based on the number of patients in the study. Believe me, you don’t want to know more than this.

What about the “95% CI for difference=3.40 to 14.05”? This refers to the 95% confidence interval for the difference in HAM-D scores between the two treatment groups (not for the F value itself). It means that we can be 95% confident that the actual difference in average HAM-D scores between the two groups is somewhere between 3.40 and 14.05. That’s a large range, to be sure, but the key point here is that we’re 95% certain that the difference is no less than 3.4. And there’s a good chance that the difference is more than that.
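
For the curious, here is one common recipe for such an interval: the difference in means plus or minus a t-based margin of error. The summary statistics below are hypothetical stand-ins (the paper’s exact analysis may have differed), chosen only to show the mechanics:

    from scipy.stats import t

    # Hypothetical summary statistics (not taken from the paper)
    mean_diff = 8.7   # difference in mean HAM-D scores between groups
    se_diff = 2.6     # standard error of that difference
    df = 44           # degrees of freedom for the comparison

    # 95% CI: difference +/- critical t value times the standard error
    t_crit = t.ppf(0.975, df)
    lower = mean_diff - t_crit * se_diff
    upper = mean_diff + t_crit * se_diff
    print(f"95% CI: {lower:.2f} to {upper:.2f}")  # about 3.5 to 13.9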

2. “…demonstrating a strong standardized effect size of 1.44.”

Knowing that the apparent advantage of ECT over rTMS in these patients was statistically significant is all well and good. But how do we get a handle on measuring how strong this advantage was? This is where “effect size” comes into play. The effect size measures how large a statistically significant difference is, in standardized units. To calculate it, you divide the difference in the outcome measure between the two treatment groups by the standard deviation. (Sorry, I’m not going to define standard deviation, since understanding it is not crucial for a basic understanding of effect size.)

If the effect size is 0, this implies that the mean score for the treatment group was the same as the comparison group, ie, no effect at all. And just as obviously, the higher the effect size, the stronger the effect of treatment. Here are the standard benchmarks: effect sizes of 0 to 0.3 represent little to no effect, 0.3 to 0.6 a small effect, 0.6 to 0.8 a moderate effect and 0.8 or greater a strong effect. As you can see, the effect size in this study, 1.44, was very strong, meaning that ECT was strongly superior to rTMS in these patients.
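
To see the calculation in action, here is a minimal sketch of the standardized effect size (often called Cohen’s d): the difference in group means divided by the pooled standard deviation. The scores are the same hypothetical ones used above, not the study’s data:

    import statistics

    # Hypothetical end-of-treatment HAM-D scores (not the study's raw data)
    ect_scores = [4, 12, 7, 16, 9, 15]
    rtms_scores = [12, 22, 15, 25, 17, 20]

    mean_diff = statistics.mean(rtms_scores) - statistics.mean(ect_scores)

    # Pooled standard deviation of the two groups
    n1, n2 = len(ect_scores), len(rtms_scores)
    s1, s2 = statistics.stdev(ect_scores), statistics.stdev(rtms_scores)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5

    effect_size = mean_diff / pooled_sd
    print(f"effect size = {effect_size:.2f}")  # about 1.7, a strong effect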

By the way, resist the temptation to make up your mind after reading a single study. In this case, it turns out that several other studies have compared rTMS and ECT; some have replicated the findings of this study (Janicak PG, et al, Biol Psychiatry 2002;51:659-667), while others have reported that rTMS is just as good as ECT (Grunhaus L, Biol Psychiatry 2000;47:314-324). Authors usually discuss such discrepancies in the discussion section, and the usual explanation is that the other studies were deficient in some way!

TCPR Verdict:

Statistical Significance: Tame the numbers!

 