Preventing PHI leakage in a clinical AI

Background

A healthcare AI company building clinical decision support tools used by over 400 hospitals and clinics. The flagship product is an AI chatbot that helps clinicians look up drug interactions, review patient summaries, and generate clinical notes, backed by an LLM integrated with EHR systems.

The system processes PHI on every interaction. With a HIPAA compliance audit eight weeks out, the CTO realised existing security testing had never included adversarial attacks specific to LLMs.

A single PHI exposure through the AI chatbot could trigger mandatory breach notification under HIPAA, OCR investigation, and fines up to $1.5M per violation category.

The challenges

PHI exposure through conversational context Conversation context retained patient identifiers across sessions. No tests had verified whether adversarial prompts could extract them.
Hallucinated medical guidance The LLM occasionally generated clinically inaccurate drug interaction warnings or dosage recommendations.
No LLM-specific testing history Annual pen tests and SOC 2 covered network and application layers but not prompt injection, jailbreaks, or indirect injection through EHR data.
Regulatory deadline pressure Eight weeks to identify, remediate, and document all AI-specific risks while the product remained in production.

Our approach

PHI-focused red teaming

Targeted attacks designed for healthcare PHI extraction scenarios.

Simulated adversarial clinician sessions extracting other patients' records
Tested cross-session context leakage with 50+ conversation patterns
Attempted PHI extraction through indirect injection via EHR data fields
Validated that system prompts contained no patient data or credentials

Hallucination detection

Systematic verification of clinical accuracy in AI-generated responses.

Tested 200+ known drug interactions for accuracy
Identified hallucination patterns in dosage and contraindications
Validated appropriate use of uncertainty language
Mapped hallucination frequency by clinical domain

HIPAA evidence generation

Generated audit-ready documentation mapping all findings to HIPAA requirements.

Mapped every finding to HIPAA Security Rule provisions (§164.308-§164.312)
Generated HIPAA Risk Analysis evidence
Documented remediation with before/after results
Created ongoing monitoring reports for HIPAA evaluation

Representative findings

Cross-patient record leakage through context manipulation

critical

A multi-turn conversation mimicking a clinical workflow caused the system to surface PHI from a previously-accessed patient in responses about a different patient. RAG retrieval was not enforcing patient boundaries.

System prompt containing database connection strings

critical

The system prompt included a partial DB connection string for EHR lookups. A role-play jailbreak could extract it, granting potential direct access to patient data.

Hallucinated drug interaction warnings

high

Clinically inaccurate warnings for 8% of tested combinations. In 3 cases the system failed to flag known dangerous interactions.

Session data persisting beyond logout