Fragility Index


  • The Fragility Index is the minimum number of patients whose status would have to change from a nonevent to an event to turn a statistically significant result into a non-significant one
  • The smaller the Fragility Index, the more fragile the trial’s outcome
  • The Fragility Index is a useful metric for demonstrating how easily statistical significance based on a threshold P-value may be overturned
  • Much of the published medical literature, especially in critical care, is built upon ‘statistically fragile’ trials


Threshold p-values are widely used in the medical literature to determine statistical significance despite important limitations

  • results with similar P-values do not indicate a similar likelihood of being real if there are large differences in the size of the trials or number of events in the trials being compared
  • when one P-value is just above and one just below the threshold value (eg, P = 0.051 and P = 0.049), the latter, but not the former, is typically interpreted as indicating a real treatment effect despite the minimal absolute difference between the two P-values

95% Confidence Intervals have similar problems to threshold p-values

  • they are often viewed dichotomously as indicating significance if they do not cross 1 (the null value for ratio measures such as relative risk or odds ratio)
  • smaller, more fragile trials can have tighter 95% CIs that are more distant from 1 than larger, less fragile trials


Fragility Index can be calculated as follows (from Ridgeon et al, 2016):

  • trial results are arranged in a two-by-two contingency table
  • an event is iteratively added to the group with the smaller number of events (while removing a nonevent from the same group to keep the total group size constant) until the P-value produced by the Fisher exact test equals or exceeds 0.05
  • The number of events added to reach this threshold is the Fragility Index
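
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used by Ridgeon et al: the function names, the hand-rolled two-sided Fisher exact test, and the choice to return None for a trial that is not significant to begin with (or that runs out of nonevents to convert) are all assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x events in row 1,
        # kept as an exact integer so that ties are compared exactly.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)


def fragility_index(events_1, total_1, events_2, total_2, alpha=0.05):
    """Number of events added to the lower-event group before p >= alpha.

    Returns None if the result is not significant to begin with, or if the
    group runs out of nonevents to convert (assumptions of this sketch).
    """
    # Orient the table so that "low" is the group with fewer events.
    if events_1 <= events_2:
        low, n_low, high, n_high = events_1, total_1, events_2, total_2
    else:
        low, n_low, high, n_high = events_2, total_2, events_1, total_1

    def p_value(low_events):
        return fisher_exact_two_sided(
            low_events, n_low - low_events, high, n_high - high
        )

    if p_value(low) >= alpha:
        return None
    added = 0
    # Convert one nonevent to an event at a time; group size stays fixed.
    while p_value(low + added) < alpha:
        if low + added == n_low:
            return None
        added += 1
    return added


# Hypothetical trial: 1/20 events with treatment vs 10/20 with control.
print(fragility_index(1, 20, 10, 20))  # 3
```

In this made-up example the P-value rises from roughly 0.003 to roughly 0.096 after three patients in the treatment group are switched from nonevent to event, so the Fragility Index is 3.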


Ridgeon et al, 2016

  • The authors attempted to calculate the fragility index for all multicenter randomized controlled trials (MCRCTs) in critical care medicine reporting mortality; they found 56 MCRCTs that met their criteria
  • Findings
    • The median fragility index was 2 (interquartile range, 1-3.5)
    • greater than 40% of trials had a fragility index of less than or equal to 1
    • 12.5% of trials reported loss to follow-up greater than their fragility index
    • Fragility index was positively correlated with trial sample size (larger trials were less fragile) and negatively correlated with the reported p value (trials with higher p values were more fragile)
    • An overview of the 56 eligible MCRCTs is available in one of the online supplements
  • The authors conclude that
    • findings in critical care trials often depend on a small number of events
    • critical care clinicians should be wary of basing decisions on trials with a low fragility index.
    • fragility index should be reported for future trials in critical care to aid interpretation and decision making by clinicians


Walsh et al, 2014

  • The authors calculated the Fragility Index for 399 eligible RCTs in high-impact medical journals that reported a statistically significant result for at least one dichotomous or time-to-event outcome in the abstract
  • The journals included were: NEJM, The Lancet, JAMA, BMJ and Annals of Internal Medicine
  • Findings
    • the RCTs had:
      • median sample size of 682 patients (range: 15–112,604)
      • median of 112 events (range: 8–5,142)
    • 53% reported a P-value <0.01
    • median Fragility Index was 8 (range: 0–109)
    • 25% had a Fragility Index of 3 or less
    • In 53% of trials, the Fragility Index was less than the number of patients lost to follow-up
  • Commentary:
    • note that the trials included in this study were not necessarily multi-center studies and were not restricted to having mortality as a statistically significant outcome
  • Conclusion:
    • The statistical significance of RCTs in major medical journals often hinges on the outcomes of a small number of events, suggesting that the results are ‘fragile’
    • This is supported by high rates of medical reversal when trials are repeated or subsequent larger, multi-center trials are performed


This section is based on a discussion with Paul Young:

Interpretation of the Fragility Index, and the importance of loss to follow-up, should be taken in context


  • The NICE-SUGAR trial had a Fragility Index of 11 and 82 patients were lost to follow-up
    • the conclusion was measured in that the authors only stated that intensive insulin therapy is not better than conventional insulin therapy and may be harmful
    • the number of events that would need to change to make this interpretation incorrect is very large, i.e. significance would have to swing in the opposite direction
  • The CRASH-2 trial had a Fragility Index of 48 and 84 patients were lost to follow-up
    • the loss to follow-up is one of a number of issues that weakens the strong drive to translate the findings of this study into clinical practice
    • other issues are that only 3% of CRASH-2 patients came from countries with modern trauma centers, and that the thromboembolic risk in trauma patients in these centers is likely to be high

Overall, loss to follow-up appears to be less of a problem in critical care trials than in non-critical care trials published in high-impact journals.

References and links


Journal articles

  • Feinstein AR. The unit fragility index: an additional appraisal of “statistical significance” for a contrast of two proportions. Journal of clinical epidemiology. 43(2):201-9. 1990.
  • Ridgeon EE, Young PJ, Bellomo R, Mucchetti M, Lembo R, Landoni G. The Fragility Index in Multicenter Randomized Controlled Critical Care Trials. Critical care medicine. 2016.
  • Walsh M, Srinathan SK, McAuley DF. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. Journal of clinical epidemiology. 67(6):622-8. 2014.
