De-identifying Patient Data, Part 1

In healthcare IT, no matter where you work, you’re responsible for protecting patient data. Many countries have regulatory frameworks that address patient privacy and the use of health information. In the US, HIPAA regulates the use of PHI (protected health information). In Canada, the law is called PIPEDA (Personal Information Protection and Electronic Documents Act). PIPEDA regulates the use of consumer data in a number of industries, not just healthcare. A few Canadian provinces also have their own privacy legislation in place.

Regulations aside, data breaches reportedly cost healthcare organizations a staggering $6 billion annually in the US alone.

So how do you protect patient data? Let’s home in on one data protection technique: de-identification. Data de-identification is essentially a way to mask or replace personally identifiable information (PII) and protected health information (PHI). HL7 analysts occasionally need to share or redistribute HL7 production data. One common use case is porting realistic data to a test system or staging area.
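
To make "mask or replace" concrete, here is a minimal Python sketch that swaps placeholder values into a pipe-delimited HL7 v2 message. The `PID_REPLACEMENTS` table, the `mask_pid` function, and the sample message are hypothetical names and data chosen for illustration. The sketch assumes the default `|` field separator and carriage-return segment terminator, and it only touches two PID fields, so it is nowhere near a complete de-identification.

```python
# Hypothetical table: 1-based PID field number -> placeholder value.
PID_REPLACEMENTS = {
    5: "DOE^JANE",       # PID-5, patient name
    19: "000-00-0000",   # PID-19, Social Security number
}


def mask_pid(message: str) -> str:
    """Replace selected PID fields with placeholders; leave other segments alone."""
    masked_segments = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            for position, placeholder in PID_REPLACEMENTS.items():
                # Only overwrite fields that exist and actually hold a value.
                if position < len(fields) and fields[position]:
                    fields[position] = placeholder
        masked_segments.append("|".join(fields))
    return "\r".join(masked_segments)


# Made-up sample message, not real patient data.
sample = (
    "MSH|^~\\&|ADT|HOSP|LAB|HOSP|202001011200||ADT^A01|1|P|2.3\r"
    "PID|1||12345^^^HOSP||SMITH^JOHN||19700101|M"
)
masked = mask_pid(sample)
```

Note that the medical record number in PID-3 and the birth date in PID-7 are left untouched here, which is exactly the kind of gap a full identifier inventory is meant to catch.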

So what do you need to know in order to de-identify HL7 log data?

  1. To begin with, you’ll need to list the sensitive data identifiers you’re dealing with. The Department of Health and Human Services (HHS) provides a HIPAA Privacy Rule booklet (PDF) that lists the 18 HIPAA identifiers. Each identifier is a category of data you need to protect. The list goes way beyond names, addresses, Social Security numbers, and health plan numbers. You’ll also need to pay attention to device identifiers and even IP addresses. Ensure that your de-identification technique covers all 18 identifiers.
  2. To be safe, use techniques that don’t permit re-identification. For example, replace values outright rather than applying a reversible encoding, and don’t keep a lookup table that maps masked values back to the originals.
  3. Make sure you map identifiers to HL7 segments and fields. The mapping will vary from one system to the next. You’ll want to be able to trace which message components will be affected by changes before you hit that OK button (or the equivalent) on your de-identification tool.
  4. Ensure the data remains useful. One of the issues with traditional randomization techniques is that scrambled data may not be plausible. Overall meaning in the message flow should be preserved. You don’t want to be able to identify patient John Smith, but you do want to make sure he isn’t discharged before he’s admitted, so the order and spacing of events in the patient’s record should stay consistent.
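
Points 3 and 4 above can be sketched in a few lines of Python. Everything named here (`FIELD_MAP`, `affected_fields`, `shift_date`, `deidentify`, and the sample message) is a hypothetical illustration, not the API of any particular tool. The sketch assumes pipe-delimited HL7 v2 messages, dates with at least YYYYMMDD precision, and a deliberately incomplete field map (it ignores MSH timestamps, addresses, phone numbers, and more). It shifts every date for a given patient by the same random number of days, so an admission still precedes its discharge, and it keeps the offsets only in memory so they can be thrown away after the run. Whether shifted dates are acceptable at all depends on the de-identification method your organization follows.

```python
import secrets
from datetime import datetime, timedelta

# Hypothetical map: (segment, 1-based field number) -> kind of data held there.
FIELD_MAP = {
    ("PID", 3): "id",     # patient identifier
    ("PID", 5): "name",   # patient name
    ("PID", 7): "date",   # date of birth
    ("PV1", 44): "date",  # admit date/time
    ("PV1", 45): "date",  # discharge date/time
}


def affected_fields(message):
    """Dry run: report which populated fields the map would change."""
    hits = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        for (name, pos), kind in FIELD_MAP.items():
            if fields[0] == name and pos < len(fields) and fields[pos]:
                hits.append((f"{name}-{pos}", kind, fields[pos]))
    return hits


def shift_date(value, days):
    """Move the YYYYMMDD part back by `days`; keep any time suffix unchanged."""
    shifted = datetime.strptime(value[:8], "%Y%m%d") - timedelta(days=days)
    return shifted.strftime("%Y%m%d") + value[8:]


def deidentify(message, patients):
    """Apply FIELD_MAP, reusing one offset and alias per patient in `patients`."""
    segments = [segment.split("|") for segment in message.split("\r")]
    patient_id = next((f[3] for f in segments if f[0] == "PID" and len(f) > 3), "")
    if patient_id not in patients:
        patients[patient_id] = {
            "days": secrets.randbelow(365) + 1,
            "alias": "P" + secrets.token_hex(4),
        }
    patient = patients[patient_id]
    replacements = {"id": patient["alias"], "name": "DOE^JANE"}
    for fields in segments:
        for (name, pos), kind in FIELD_MAP.items():
            if fields[0] == name and pos < len(fields) and fields[pos]:
                if kind == "date":
                    fields[pos] = shift_date(fields[pos], patient["days"])
                else:
                    fields[pos] = replacements[kind]
    return "\r".join("|".join(fields) for fields in segments)


# Made-up sample: a discharge message with admit and discharge times in PV1.
pv1 = ["PV1"] + [""] * 45
pv1[44], pv1[45] = "202001031015", "202001051630"
sample = "\r".join([
    "MSH|^~\\&|ADT|HOSP|LAB|HOSP|202001051700||ADT^A03|2|P|2.3",
    "PID|1||12345||SMITH^JOHN||19700101|M",
    "|".join(pv1),
])

patients = {}  # in-memory only; discard after the run
report = affected_fields(sample)
clean = deidentify(sample, patients)
```

Running `affected_fields` first gives you the list of components that will change before anything is rewritten. Because the `patients` dictionary is reused across messages, later messages for the same patient get the same alias and the same date offset, which keeps the message flow coherent.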

Your Comments

We’ve just touched the tip of the de-identification iceberg here. Are there other issues we should be keeping an eye out for? Let everyone know in the comments.