ChatGPT Health performance in a structured test of triage recommendations

Abstract

ChatGPT Health launched in January 2026 as OpenAI’s consumer health tool, reaching millions of users. Here, we conducted a structured stress test of triage recommendations using 60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions (960 total responses). Performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at clinical extremes: non-urgent presentations (35%) and emergency conditions (48%). Among gold-standard emergencies, the system under-triaged 52% of cases, directing patients with diabetic ketoacidosis and impending respiratory failure to 24–48-hour evaluation rather than the emergency department, while correctly triaging classical emergencies such as stroke and anaphylaxis. When family or friends minimized symptoms (anchoring bias), triage recommendations shifted significantly in edge cases (OR 11.7, 95% CI 3.7-36.6), with the majority of shifts toward less urgent care. Crisis intervention messages activated unpredictably across suicidal ideation presentations, firing more when patients described no specific method than when they did. Patient race, gender, and barriers to care showed no significant effects, though confidence intervals did not exclude clinically meaningful differences. Our findings reveal missed high-risk emergencies and inconsistent activation of crisis safeguards, raising safety concerns that warrant prospective validation before consumer-scale deployment of artificial intelligence triage systems.
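The design arithmetic and the reported odds ratio can be sanity-checked with a short sketch. The abstract does not say which factors make up the 16 conditions; the four binary factors below (race, gender, barriers to care, symptom minimization) are an assumption chosen only because 2^4 = 16, and their level labels are hypothetical. The interval check likewise assumes the confidence interval was built symmetrically on the log-odds scale, which the abstract does not state. Under that assumption the geometric mean of the bounds should land close to the point estimate, and here it comes out at roughly 11.6 against the reported 11.7.

```python
import math
from itertools import product

N_VIGNETTES = 60  # clinician-authored vignettes, from the abstract

# ASSUMPTION: four binary factors giving 2**4 = 16 conditions.
# Factor names and level labels are hypothetical, not taken from the paper.
FACTORS = {
    "race": ("level_a", "level_b"),
    "gender": ("level_a", "level_b"),
    "barriers_to_care": ("absent", "present"),
    "symptoms_minimized": ("no", "yes"),
}

# Full factorial crossing of all factor levels.
conditions = list(product(*FACTORS.values()))
total_responses = N_VIGNETTES * len(conditions)


def log_scale_midpoint(lower: float, upper: float) -> float:
    """Geometric mean of the CI bounds (midpoint on the log scale)."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)


def log_scale_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a symmetric 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


if __name__ == "__main__":
    print(len(conditions), total_responses)
    # Reported: OR 11.7, 95% CI 3.7-36.6
    print(round(log_scale_midpoint(3.7, 36.6), 2))
    print(round(log_scale_se(3.7, 36.6), 3))
```

The wide interval reflects a large implied standard error on the log scale, which fits the abstract's restriction of this estimate to edge cases.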

Author information

Author notes

  1. These authors contributed equally: Eyal Klang, Girish N. Nadkarni.

Authors and Affiliations

  1. The Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai and Mount Sinai Health System, New York, NY, USA

    Ashwin Ramaswamy, Alvira Tyagi, Alexis E. Te, Steven A. Kaplan, Ashutosh K. Tewari & Michael A. Gorin

  2. Department of Medicine, NYC Health + Hospitals / Elmhurst, Icahn School of Medicine at Mount Sinai and Mount Sinai Health System, New York, NY, USA

    Hannah Hugo

  3. The Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai and Mount Sinai Health System, New York, NY, USA

    Joy Jiang, Pushkala Jayaraman, Joshua Lampert, Robert Freeman, Ankit Sakhuja, Bilal Naved, Alexander W. Charney, Mahmud Omar, Michael A. Gorin, Eyal Klang & Girish N. Nadkarni

  4. The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai and Mount Sinai Health System, New York, NY, USA

    Joy Jiang, Pushkala Jayaraman, Mateen Jangda, Ankit Sakhuja, Bilal Naved, Alexander W. Charney & Girish N. Nadkarni

  5. University of Miami Miller School of Medicine, Miami, FL, USA

    Mateen Jangda

  6. Department of Emergency Medicine, Icahn School of Medicine at Mount Sinai and Mount Sinai Health System, New York, NY, USA

    Nicholas Gavin

  7. The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai and Mount Sinai Health System, New York, NY, USA

    Ankit Sakhuja, Bilal Naved, Alexander W. Charney, Mahmud Omar, Eyal Klang & Girish N. Nadkarni

Authors

  1. Ashwin Ramaswamy
  2. Alvira Tyagi
  3. Hannah Hugo
  4. Joy Jiang
  5. Pushkala Jayaraman
  6. Mateen Jangda
  7. Alexis E. Te
  8. Steven A. Kaplan
  9. Joshua Lampert
  10. Robert Freeman
  11. Nicholas Gavin
  12. Ashutosh K. Tewari
  13. Ankit Sakhuja
  14. Bilal Naved
  15. Alexander W. Charney
  16. Mahmud Omar
  17. Michael A. Gorin
  18. Eyal Klang
  19. Girish N. Nadkarni

Corresponding authors

Correspondence to Ashwin Ramaswamy or Girish N. Nadkarni.

About this article

Cite this article

Ramaswamy, A., Tyagi, A., Hugo, H. et al. ChatGPT Health performance in a structured test of triage recommendations. Nat Med (2026). https://doi.org/10.1038/s41591-026-04297-7

  • DOI: https://doi.org/10.1038/s41591-026-04297-7