What is De-identification?


De-identification is a technique to mask or remove protected health information (PHI) from sensitive patient data. If you need to use production HL7 data, you’ll need to find a way to protect PHI. De-identification is a good choice when you need to:

  • Troubleshoot an interface
  • Populate a test system
  • Gather data for analytics

Why de-identify?

There are a number of reasons you should consider de-identification.

1. Leverage the richness of production data

When you test a system or an interface, realistic data lets you cover the most realistic test scenarios. What is “realistic data”? It means the customized fields, Z-segments, and code sets that your health system (or your customer) really uses. De-identifying production data gives you exactly that. Learn more about HL7 testing in Chapter 9 of the HL7 Survival Guide.

2. Support HIPAA, protect patient identifiers

Under HIPAA, you need to protect 18 types of information that could potentially identify a patient. These cover a range of items: names, locations, phone numbers, Social Security numbers, even medical device numbers. Before you use the data, make sure these identifiers are protected. De-identification removes the data for you and, in some cases, replaces it with realistic dummy data.

3. Safeguard data

Even if you sign and enforce Business Associate agreements, data in transit (via email or on a laptop) is at risk. Reduce the risk by de-identifying data within the Covered Entity’s system before moving it to the Business Associate (BA).

How to de-identify HL7 data

You have some options when it comes to de-identifying data.

Manual removal
Some analysts de-identify data manually by loading messages in a text editor then scanning, removing, and replacing data in HL7 fields, components, and sub-components. This works if you’ve got a small batch to process – say 10 messages or fewer – and you can get someone to check over your work to make sure you haven’t missed a data element.

Custom scripts
Some developers write custom scripts to deal with a batch of messages. If you have the time and the skills, this can be a good option; however, once you count the time to both write and test a script for a complex de-identification, this option becomes expensive.
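
To give a sense of what such a script involves, here is a minimal Python sketch. It assumes pipe-delimited HL7 v2 messages with carriage-return segment separators and the default encoding characters. The function name `deidentify_message`, the `PID_REPLACEMENTS` table, and the sample message are made up for illustration, and the table covers only a handful of PID fields, not all 18 HIPAA identifier types.

```python
# Hypothetical dummy values keyed by PID field number.
# Illustrative subset only: PID-7 (date of birth) and every
# other segment are left untouched in this sketch.
PID_REPLACEMENTS = {
    3: "000000^^^HOSP^MR",                # patient identifier (MRN)
    5: "DOE^JANE",                        # patient name
    11: "123 MAIN ST^^ANYTOWN^ZZ^00000",  # address
    13: "555-555-0100",                   # home phone
    19: "000-00-0000",                    # SSN
}


def deidentify_message(message: str) -> str:
    """Replace selected PID fields in a pipe-delimited HL7 v2 message."""
    cleaned = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            for number, dummy in PID_REPLACEMENTS.items():
                # Only overwrite fields that are present and populated.
                if number < len(fields) and fields[number]:
                    fields[number] = dummy
        cleaned.append("|".join(fields))
    return "\r".join(cleaned)


# Made-up sample message, not real patient data.
sample = "\r".join([
    "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|202001011200||ADT^A01|MSG0001|P|2.3",
    "PID|1||12345^^^HOSP^MR||SMITH^JOHN||19400315|M|||"
    "1 ELM ST^^BOSTON^MA^02101||617-555-1234||||||123-45-6789",
    "PV1|1|I",
])

result = deidentify_message(sample)
print(result.replace("\r", "\n"))
```

Even this small example hints at where the time goes: a real script also has to handle repeating fields, identifiers in other segments (NK1, GT1, IN1, Z-segments), and free-text notes, and every one of those cases needs testing.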

De-identification software
Finally, there are teams that use de-identification software to get the job done. If you’re shopping for software, look for these capabilities:

  • Easy to use – if it takes a week to set up the software and train people on it, that’s going to slow you down. Keep looking.
  • Ability to remove data and also replace it with random yet realistic data.
  • The data replacement should be flexible. For instance, if you’re replacing dates of birth in data for patients over 65, make sure the replacement data generated by the software doesn’t turn them into NICU patients.
  • Tracking: keep a record of your de-identifications, including which fields and segments were affected, when the de-identification took place, and who did it.
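
As a rough illustration of the last two points, the Python sketch below replaces a date of birth with a random date in the same birth year, so the patient's age changes by less than a year, and builds a simple tracking record. The function names, the record layout, and the user name are hypothetical; commercial software will have its own approach.

```python
import random
from datetime import date, datetime, timedelta, timezone


def randomize_within_year(dob, rng):
    """Return a random YYYYMMDD date in the same year as dob (also YYYYMMDD)."""
    year = datetime.strptime(dob, "%Y%m%d").year
    first_day = date(year, 1, 1)
    # 365 or 366, depending on whether the birth year is a leap year.
    days_in_year = (date(year + 1, 1, 1) - first_day).days
    new_date = first_day + timedelta(days=rng.randrange(days_in_year))
    return new_date.strftime("%Y%m%d")


def audit_record(user, fields):
    """Describe one de-identification run: when, who, and which fields."""
    return {
        "performed_at": datetime.now(timezone.utc).isoformat(),
        "performed_by": user,
        "fields": list(fields),
    }


rng = random.Random()
new_dob = randomize_within_year("19400315", rng)  # made-up date of birth
record = audit_record("analyst1", ["PID-7"])      # hypothetical user name
print(new_dob, record)
```

Keeping the birth year is only one possible rule. The point is that the replacement logic should be something you can adjust to fit the data set, and that every run leaves a record behind.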

De-identification Software Free Trial

De-identify HL7 data in minutes. Download a Cloak software trial now.