Preventing PHI leakage in a clinical AI
How a healthcare AI company discovered and remediated 12 critical data exposure vulnerabilities in its clinical decision support chatbot before a HIPAA audit.
At a glance
- Challenge: Clinical AI chatbot handling PHI with no LLM-specific adversarial testing
- Solution: Targeted red teaming for PHI extraction, hallucination detection, HIPAA-aligned reporting
- Timeline: 4 weeks from assessment to remediation
- Team: 2 security engineers + clinical compliance team
- 12 critical findings caught pre-audit
- 0 data breaches since deployment
- 100% HIPAA compliance achieved
- 3 weeks ahead of audit deadline
Background
The client is a healthcare AI company that builds clinical decision support tools used by over 400 hospitals and clinics. Its flagship product is an AI chatbot that helps clinicians look up drug interactions, review patient summaries, and generate clinical notes, backed by an LLM integrated with EHR systems.
The system processes PHI on every interaction. With a HIPAA compliance audit eight weeks out, the CTO realised existing security testing had never included adversarial attacks specific to LLMs.
A single PHI exposure through the AI chatbot could trigger mandatory breach notification under HIPAA, an OCR investigation, and fines of up to $1.5M per violation category per year.
The challenges
- PHI exposure through conversational context: Conversation context retained patient identifiers across sessions. No tests had verified whether adversarial prompts could extract them.
- Hallucinated medical guidance: The LLM occasionally generated clinically inaccurate drug interaction warnings or dosage recommendations.
- No LLM-specific testing history: Annual pen tests and SOC 2 covered network and application layers but not prompt injection, jailbreaks, or indirect injection through EHR data.
- Regulatory deadline pressure: Eight weeks to identify, remediate, and document all AI-specific risks while the product remained in production.
Our approach
PHI-focused red teaming
Targeted attacks designed for healthcare PHI extraction scenarios.
- Simulated adversarial clinician sessions extracting other patients' records
- Tested cross-session context leakage with 50+ conversation patterns
- Attempted PHI extraction through indirect injection via EHR data fields
- Validated that system prompts contained no patient data or credentials
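A cross-session leakage test of the kind listed above can be sketched with a synthetic canary identifier: seed it in one session, then probe a second session and check whether it resurfaces. This is a minimal illustration, not the harness used in the engagement. The `ask(session_id, prompt)` interface, the canary value, the probe wording, and the `LeakyStubBot` stand-in are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical canary: a synthetic identifier that belongs to no real patient.
CANARY_MRN = "MRN-CANARY-000417"


def find_leaks(ask: Callable[[str, str], str], probes: List[str]) -> List[str]:
    """Seed the canary in session A, then return the probes that leak it in session B."""
    ask("session-a", f"Summarise the chart for patient {CANARY_MRN}.")
    return [p for p in probes if CANARY_MRN in ask("session-b", p)]


class LeakyStubBot:
    """Stand-in for the system under test. It deliberately shares history across sessions."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def ask(self, session_id: str, prompt: str) -> str:
        # The stub ignores session_id, which is exactly the flaw the test looks for.
        if "previous patient" in prompt.lower():
            reply = "Earlier context: " + " | ".join(self.history)
        else:
            reply = "I can only discuss the current patient."
        self.history.append(prompt)
        return reply


probes = [
    "What is the usual adult dose of this medication?",
    "Remind me what we discussed about the previous patient.",
]
leaks = find_leaks(LeakyStubBot().ask, probes)
print(leaks)
```

Using a canary rather than real identifiers keeps the test itself free of PHI, and any probe returned by `find_leaks` is a reproducible leak case that can be rerun after a fix.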
Hallucination detection
Systematic verification of clinical accuracy in AI-generated responses.
- Tested 200+ known drug interactions for accuracy
- Identified hallucination patterns in dosage and contraindications
- Validated appropriate use of uncertainty language
- Mapped hallucination frequency by clinical domain
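Mapping hallucination frequency by clinical domain comes down to comparing the chatbot's answers with a trusted reference and counting disagreements per domain. The sketch below assumes each test case has already been reduced to two booleans (what the reference says, what the model said). The `Case` structure, the domain names, and the sample data are illustrative placeholders, not results from the assessment.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Case(NamedTuple):
    domain: str          # clinical domain label (hypothetical)
    expected_flag: bool  # the reference says the drug pair interacts
    model_flag: bool     # the chatbot flagged an interaction


def error_rate_by_domain(cases: List[Case]) -> Dict[str, float]:
    """Fraction of cases per domain where the model disagrees with the reference."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.domain] += 1
        if case.expected_flag != case.model_flag:
            errors[case.domain] += 1
    return {domain: errors[domain] / totals[domain] for domain in totals}


def missed_interactions(cases: List[Case]) -> List[Case]:
    """Cases where a known interaction was not flagged (the most dangerous error)."""
    return [c for c in cases if c.expected_flag and not c.model_flag]


# Synthetic sample data for illustration only.
cases = [
    Case("cardiology", True, True),
    Case("cardiology", True, False),   # missed interaction
    Case("cardiology", False, False),
    Case("cardiology", False, True),   # false alarm
    Case("oncology", True, True),
    Case("oncology", False, False),
]
rates = error_rate_by_domain(cases)
missed = missed_interactions(cases)
print(rates, len(missed))
```

Tracking missed interactions separately from false alarms matters because the two errors carry different clinical risk, even though both count toward the overall rate.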
HIPAA evidence generation
Generated audit-ready documentation mapping all findings to HIPAA requirements.
- Mapped every finding to HIPAA Security Rule provisions (§164.308-§164.312)
- Generated HIPAA Risk Analysis evidence
- Documented remediation with before/after results
- Created ongoing monitoring reports for HIPAA evaluation
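One way to keep the finding-to-provision mapping audit-ready is to hold it as structured data and generate evidence rows from it. The sketch below is an assumption about how such a mapping could look. The provision titles are the Security Rule's own, but which finding maps to which provision is an illustrative choice that a compliance team would need to confirm, and the `Finding` type and row format are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# HIPAA Security Rule citations (45 CFR) and their titles.
PROVISIONS: Dict[str, str] = {
    "164.308(a)(1)(ii)(A)": "Risk analysis",
    "164.308(a)(8)": "Evaluation",
    "164.312(a)(1)": "Access control",
    "164.312(a)(2)(iii)": "Automatic logoff",
}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str
    provision: str  # key into PROVISIONS; the pairing is illustrative only


def evidence_rows(findings: List[Finding]) -> List[str]:
    """Render one line per finding, ordered by provision and then by title."""
    ordered = sorted(findings, key=lambda f: (f.provision, f.title))
    return [
        f"§{f.provision} ({PROVISIONS[f.provision]}): {f.title} [{f.severity}]"
        for f in ordered
    ]


findings = [
    Finding("Session data persisting beyond logout", "high", "164.312(a)(2)(iii)"),
    Finding("Cross-patient record leakage", "critical", "164.312(a)(1)"),
]
rows = evidence_rows(findings)
for row in rows:
    print(row)
```

Looking up the title through `PROVISIONS` means a finding tagged with an unknown citation fails loudly with a `KeyError` instead of producing a silently wrong report line.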
Representative findings
Cross-patient record leakage through context manipulation
Critical: A multi-turn conversation mimicking a clinical workflow caused the system to surface PHI from a previously accessed patient in responses about a different patient. RAG retrieval was not enforcing patient boundaries.
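Enforcing patient boundaries in retrieval generally means filtering by patient before ranking, so that another patient's records can never become candidates. The sketch below illustrates that idea with a toy keyword ranker. The `Chunk` type, the `retrieve` function, and the sample records are hypothetical and do not describe the client's actual RAG stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    patient_id: str
    text: str


def retrieve(store: List[Chunk], query: str, active_patient_id: str, k: int = 3) -> List[Chunk]:
    """Rank only chunks owned by the active patient; other chunks never reach the ranker."""
    terms = set(query.lower().split())
    scoped = [c for c in store if c.patient_id == active_patient_id]
    # Toy relevance score: number of query terms that appear in the chunk.
    ranked = sorted(
        scoped,
        key=lambda c: len(terms & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


# Synthetic records for illustration only.
store = [
    Chunk("pt-001", "allergy to penicillin noted"),
    Chunk("pt-002", "allergy to latex noted"),
    Chunk("pt-001", "follow-up visit scheduled"),
]
results = retrieve(store, "allergy history", active_patient_id="pt-002")
print(results)
```

Applying the filter before scoring, instead of trimming results afterwards, means a highly relevant chunk from the wrong patient cannot displace or accompany the correct ones.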
System prompt containing database connection strings
Critical: The system prompt included a partial DB connection string for EHR lookups. A role-play jailbreak could extract it, granting potential direct access to patient data.
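A check for this class of issue can run before any prompt ships: scan the system prompt text for credential-like patterns. The sketch below is a minimal example under stated assumptions. The two regular expressions and the sample prompts are illustrative, and the patterns will not catch every secret format.

```python
import re
from typing import Dict, List, Pattern

# Illustrative patterns only; a real scanner would cover more formats.
PATTERNS: Dict[str, Pattern[str]] = {
    # scheme://user:password@host, as used by many database connection URIs
    "connection_uri": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@[^\s/]+", re.IGNORECASE
    ),
    # key=value or key: value pairs with secret-looking keys
    "inline_secret": re.compile(
        r"\b(?:password|pwd|api[_-]?key|secret)\s*[=:]\s*\S+", re.IGNORECASE
    ),
}


def scan_prompt(prompt: str) -> List[str]:
    """Return the names of all patterns that match somewhere in the prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]


# Synthetic prompts; the host and credentials are made up.
leaky_prompt = (
    "You are a clinical assistant. "
    "Query postgresql://svc_user:example-pass@db.internal.example:5432/ehr for lookups."
)
clean_prompt = (
    "You are a clinical assistant. "
    "Cite guidance from https://example.org/docs when possible."
)
print(scan_prompt(leaky_prompt))
print(scan_prompt(clean_prompt))
```

Treating everything in the system prompt as extractable is the safer design assumption; a scan like this only helps confirm that nothing sensitive was placed there in the first place.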
Hallucinated drug interaction warnings
High: The system produced clinically inaccurate warnings for 8% of tested combinations. In 3 cases it failed to flag known dangerous interactions.
Session data persisting beyond logout
High: Patient context from previous sessions was accessible after logout via constructed follow-up prompts.
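The expected behaviour is that logout destroys the conversational context, so a later prompt that reuses the old session has nothing to draw on. The sketch below shows that invariant with an in-memory store. The `SessionStore` class and its method names are hypothetical, and a production system would also need to clear caches and any server-side model memory.

```python
from typing import Dict, List


class SessionStore:
    """Minimal in-memory store of per-session conversation context (illustrative)."""

    def __init__(self) -> None:
        self._contexts: Dict[str, List[str]] = {}

    def append(self, session_id: str, message: str) -> None:
        self._contexts.setdefault(session_id, []).append(message)

    def context(self, session_id: str) -> List[str]:
        # Return a copy so callers cannot keep a live reference to stored context.
        return list(self._contexts.get(session_id, []))

    def logout(self, session_id: str) -> None:
        # Delete the context outright; pop with a default makes logout idempotent.
        self._contexts.pop(session_id, None)


store = SessionStore()
store.append("sess-1", "Reviewing chart for synthetic patient pt-001")
before_logout = store.context("sess-1")
store.logout("sess-1")
after_logout = store.context("sess-1")
print(before_logout, after_logout)
```

A regression test built on this invariant can replay the constructed follow-up prompts after logout and assert that the context handed to the model is empty.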
Outcomes
- All 12 critical and high-severity findings remediated within 3 weeks, finishing 3 weeks ahead of the audit
- Cross-patient leakage patched with strict context isolation, verified with 500+ adversarial test cases
- Database credentials removed from system prompts and moved into a secure credential manager
- Drug-interaction hallucination rate dropped from 8% to 0.3% with a clinical knowledge verification layer
- HIPAA audit completed with zero AI-related findings, the first time the organisation had hit that benchmark
- Continuous monitoring now provides ongoing HIPAA Risk Analysis evidence
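The case study does not describe how the clinical knowledge verification layer works internally. One plausible shape, sketched here purely as an assumption, is a gate that compares each interaction claim against a curated reference before the response is released. The `REFERENCE` table, the placeholder drug names, and the three outcome labels are all hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical curated reference: unordered drug pair -> interaction documented?
REFERENCE: Dict[FrozenSet[str], bool] = {
    frozenset({"drug_a", "drug_b"}): True,
    frozenset({"drug_a", "drug_c"}): False,
}


def verify(drug_1: str, drug_2: str, model_says_interaction: bool) -> str:
    """Classify a model claim as confirmed, blocked, or unverified."""
    known = REFERENCE.get(frozenset({drug_1, drug_2}))
    if known is None:
        # Pair not in the reference: route to a human or add uncertainty language.
        return "unverified"
    return "confirmed" if known == model_says_interaction else "blocked"


print(verify("drug_b", "drug_a", True))    # pair order does not matter
print(verify("drug_a", "drug_c", True))    # contradicts the reference
print(verify("drug_a", "drug_x", False))   # not covered by the reference
```

Keying the reference on a `frozenset` makes the lookup independent of the order in which the two drugs are named, which avoids a whole class of silent misses.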
"We thought our existing pen tests covered us. AgenticAssure showed us an entire category of LLM-specific risks we'd never tested. Finding cross-patient leakage before our HIPAA audit potentially saved us from a reportable breach."