Medical AI LLM Tracker – August 2025
Dr. Tang, Lee Akay
As Dr. Tang reported from WAIC 2025, the conversation in healthcare AI has shifted decisively from potential to proven impact. Across hospitals, clinics, and health systems, AI models are no longer experiments; they are embedded in workflows, delivering measurable improvements in efficiency, accuracy, and patient experience.
For healthcare executives and physicians, this shift creates both opportunity and complexity. The number of available large language models (LLMs) is growing fast, but only a small fraction has demonstrated operational success in real-world clinical environments. Selecting the right model, and knowing when to switch, requires more than benchmark scores; it requires verified deployment data and a framework designed for flexibility in a fluid market.
To address this need, IDC has created the Operational Medical AI LLM Tracker, an objective, continuously updated ranking of models based on real-world performance, integration readiness, and safety, with each result tagged with a clear confidence rating. This is our first published analysis, and it will be updated monthly as new deployments, metrics, and evidence emerge.
Methodology & Confidence Ratings
IDC’s Medical AI LLM Tracker ranks models based on deployment presence, deployment quality, safety & compliance, integration readiness, and open-source flexibility.
Confidence ratings:
High – Peer-reviewed or official hospital/government data with independent corroboration
Medium – Official announcement plus a corroborating source
Low – Single-source or vendor-only claim
Metrics labeled “Pending” are awaiting independent verification and will be updated when validated data is available.
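For teams that want to mirror the tracker's structure in their own evaluation tooling, the sketch below shows one way an entry could be represented in Python. It is purely illustrative: the TrackerEntry dataclass, its field names, and the Confidence enum are our assumptions for this example, not IDC's actual schema or scoring code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    """Evidence tiers mirroring the tracker's confidence ratings."""
    HIGH = "High"      # peer-reviewed or official data with independent corroboration
    MEDIUM = "Medium"  # official announcement plus a corroborating source
    LOW = "Low"        # single-source or vendor-only claim


@dataclass
class TrackerEntry:
    """One hypothetical tracker row; a missing quality score is reported as 'Pending'."""
    model: str
    deployment_presence: str            # e.g. "High", "Medium", "Low"
    deployment_quality: Optional[str]   # None until KPI data is independently verified
    confidence: Confidence
    notes: str = ""

    def quality_label(self) -> str:
        # Unverified metrics surface as "Pending KPI data" rather than a score
        return self.deployment_quality or "Pending KPI data"


# Example entry based on the August 2025 rankings below
dax = TrackerEntry(
    model="DAX Copilot",
    deployment_presence="High",
    deployment_quality="24% lower documentation time, 77-81% clinician satisfaction",
    confidence=Confidence.HIGH,
    notes="Most robust published evidence of measurable workflow impact.",
)
print(dax.model, "|", dax.quality_label(), "|", dax.confidence.value)
```

The key design point this illustrates is that unverified metrics stay explicitly "Pending" rather than being scored, so a model's rank can change as validated data arrives.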

Medical AI LLM Rankings – August 2025
1. DAX Copilot
Deployment Presence: High
Deployment Quality: High – 24% ↓ documentation time, 17% ↓ after-hours charting, 77–81% clinician satisfaction (Northwestern Medicine, Overlake Medical Center)
Confidence: High
IDC insights: Most robust published evidence of measurable workflow impact.
2. MedGo
Deployment Presence: High – Shanghai East Hospital, 15 Pudong community health centers (CHCs), hospitals in Jiangsu and Shanxi
Deployment Quality: Pending KPI data – No publicly verified data on clinician time saved, satisfaction rates, or error-rate changes
Confidence: Medium
IDC insights: Strong breadth of deployment in China; waiting on quantifiable impact data.
3. DeepSeek (R1 in medical deployments)
Deployment Presence: High – PKU Third Hospital, Beijing Friendship, Beijing Tsinghua Changgung, 100+ hospitals reported
Deployment Quality: Pending KPI data – No published clinician or workflow metrics yet
Confidence: Medium
IDC insights: Open-weight model with wide hospital adoption; verification of outcomes is the next step.
Additional Ranked Models
| Model | Deployment Presence | Deployment Quality | Confidence |
| --- | --- | --- | --- |
| Med-PaLM 2 | Medium | Pending KPI data | Medium |
| HuatuoGPT / HuatuoGPT-II | Low | N/A | Low |
| Hippocratic AI | Medium | Pending clinician workload/time & readmission data | Medium |
Models to Watch (non-scored, emerging contenders)
Qilin-Med-VL
LLaVA-Med
BioGPT-X
Tongyi-Qianwen-Med
Claude-Med
Want the complete tracker? Subscribe to IDC’s monthly update and get the full Operational Medical LLM Tracker with all confidence ratings and data sources.