Cloak

Introducing Caristix™ Cloak Software

Caristix Cloak is designed to help interface analysts and engineers accurately de-identify HL7 data, covering all 18 HIPAA identifiers. Data can then be safely shared for such purposes as porting realistic data to a test system or staging area, providing realistic sample HL7 messages for interface scoping, and providing data for clinical and financial analytics.

Cloak software provides the following features and functionality:

Maps identifiers to fields and segments
Maintains useful data, preserving the patient’s overall record
Traces impacted message components
Produces a de-identification process report

System Requirements

First Steps

The easiest way to get your feet wet with Cloak is to read about how to de-identify HL7 Messages.

Getting started

Install and Register Caristix Cloak Software

Install Caristix Cloak by clicking on the installation file (.msi file) you received.
Launch the software, and fill out the Email, First Name, Last Name and Organization fields in the registration form.
Click the Activate button. If you have a trial version, you will need to purchase an annual license to continue using Cloak after the end of the trial period.

Managing Files

File Types in Cloak

De-id Rules File

The de-identification settings in the Fields and Data Types tabs can be saved and reused. Cloak loads the last used de-id rules file when the program is opened. The first time the program is opened, it uses the default de-id rules file, which contains the fields and data types that cover the 18 HIPAA identifiers that must be de-identified.

HL7 Messages

These are the HL7 messages you load in Cloak for de-identification.

File Menu

New – Start a new de-identification process.
Open
- Default De-Id Rules – Replace the existing de-identification rules with the default de-identification rules that cover the HIPAA Privacy Rule.
- De-Id Rules – Browse to and open an existing de-id rules file.
- Messages – Browse to and open a log file to de-identify. Alternatively, messages can be extracted from a database or a Connector (click the Database tab and select a Data Source).
Save De-Id Rules – Save the current de-id rules
Save De-Id Rules as… – Save a rules file for reuse
Configure Cloak WebService – Configure the Cloak WebService stand-alone application (contact sales@caristix.com for more information)
De-identify – After review, de-identify an open log file
Close – Close Caristix Cloak

De-Identification Concepts

Protecting Patient Data

One of the most important issues in healthcare IT is the protection of patient data. Regulation addresses patient privacy and the use of health information in many countries. In the US, HIPAA regulates the use of PHI (protected health information).

While protecting patient data, HL7 analysts need to share or redistribute HL7 production data for such purposes as porting realistic data to a test system or staging area, providing realistic sample HL7 messages for interface scoping, and providing data for clinical and financial analytics.

The Department of Health and Human Services (HHS) provides a HIPAA Privacy Rule booklet (PDF) that highlights the 18 criteria that can be used to identify patients. All 18 identifiers are categories of data that must be protected. Besides easily recognized personal information, care must be given to protect device identifiers and even IP addresses. De-identification techniques must cover all 18 identifiers.

Definitions

De-identification or Anonymization

This term refers to removing or masking protected information. The de-identification removes identifiers from a data set so that information can no longer be linked to a specific individual. In terms of health care information, all identifiers are removed from the information set including both personally identifiable information (PII) and protected health information (PHI).

Pseudonymization

As a subset of de-identification, pseudonymization replaces data elements with new identifiers. After that substitution, the initial subject cannot be associated with the data set. In terms of health care information, patient information can be pseudonymized by replacing patient-identifying data with completely unrelated data, resulting in a new patient profile. The data appears complete and the data context is preserved, while patient information is completely protected.

Re-identification

A pseudonymized data set can be restored to its original state through re-identification. In re-identifying data, a reverse mapping structure (constructed as the data was pseudonymized) is applied. As an example, a pseudonymized data set could be sent for processing to an external system. Once that processed information is returned, the data could be re-identified and pushed to the correct patient file.
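The relationship between pseudonymization and re-identification can be illustrated with a minimal Python sketch. It is not how Cloak is implemented; the class name, the method names, and the PATIENT-000001 pseudonym format are assumptions made for illustration only.

```python
import itertools


class Pseudonymizer:
    """Replace identifiers with stand-in values and remember the reverse mapping."""

    def __init__(self):
        self._forward = {}   # original value -> pseudonym
        self._reverse = {}   # pseudonym -> original value
        self._counter = itertools.count(1)

    def pseudonymize(self, original):
        # Reuse the pseudonym for a value already seen, so records stay linked.
        if original not in self._forward:
            pseudonym = f"PATIENT-{next(self._counter):06d}"  # hypothetical format
            self._forward[original] = pseudonym
            self._reverse[pseudonym] = original
        return self._forward[original]

    def reidentify(self, pseudonym):
        # Apply the reverse mapping built while the data was pseudonymized.
        return self._reverse[pseudonym]


p = Pseudonymizer()
alias = p.pseudonymize("SMITH^JOHN")
print(alias)                # PATIENT-000001
print(p.reidentify(alias))  # SMITH^JOHN
```

Without the reverse mapping, the pseudonymized value cannot be traced back to the patient, which is why the mapping itself has to be protected as carefully as the original data.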

Identifiers

Identifiers are data elements that can directly identify individuals. These include name, email address, telephone number, home address, social security number, and medical card number, among others. Two identifiers may be needed to identify a unique individual.

Quasi-identifiers

Data elements of this type do not directly identify an individual but may provide enough information to narrow the potential of identifying a specific individual. Gender, date of birth and zip/postal code have been studied extensively in this context. There is a dependent relationship between quasi-identifiers and the type of data set of which they are a part. As an example, if all members of a data set are male, gender cannot be a meaningful quasi-identifier. In addition, quasi-identifiers are categorical in nature with a finite set of discrete values. It’s relatively easy to search for individuals using quasi-identifiers.

Non-identifiers

Non-identifiers may contain an individual’s personal information but aren’t helpful in reconstructing the initial information. For example, an indicator of an allergy to pollen would be a non-identifying data element. The incidence of such an allergy is extremely high in the general population. Therefore this factor is not a good discriminator among individuals. Again, non-identifiers are dependent on data sets. In the right context, they may be used to identify an individual.

De-identifying HL7 Messages

Overview

De-identification in Cloak works as follows:

Loading HL7 Messages

Load the HL7 message that requires de-identification:

On the menu bar, go to File, Open, Messages…
In the HL7 Log dialog box that opens, click the plus (+) sign.
Navigate to your log file and choose the file to be opened. Add more files by clicking the plus (+) sign.
Click Open. The file name will appear highlighted in the HL7 Log dialog box.
Click Next to continue.

The log is loaded in the Messages tab. The tab also indicates the number of messages in the viewing pane and the total number of messages in the file you loaded. The Original pane displays the log you loaded while the De-identified pane displays the de-identified log. The split screens scroll synchronously so that the data displayed is mirrored in the original and de-identified logs.

Resize vertically to change the quantity of data displayed in the viewing pane. Place the pointer on the line dividing the two panes and drag the window to increase or decrease its size. Click Hide and Show buttons to hide or view panes as needed.

The fields and data types set for de-identification are highlighted in red for easy visibility.

De-identification Settings

On the left side of the screen are the de-identification settings listed under the Fields and Data Types tabs. Cloak loads settings to cover the 18 HIPAA identifiers by default.

Fields Tab

Checkbox: the checkmark indicates an active rule. Uncheck to deactivate a rule.
SEGMENT: select a segment.
FIELD: select a field.
COMPONENT: select a component, if needed.
SUBCOMPONENT: select a subcomponent, if needed.
ID: sets the primary key. This lets you set the data element used to link several messages to the same patient and keep the clinical history unchanged. Check it for any field that uniquely identifies the patient. For instance, if the patient is identified using PID.3.1 and PID.3.4, make sure there is a rule for each of those 2 fields and check both as being ID. In the same way, if you want to use patient name, gender and date of birth as the patient identifier, make sure the ID check box is checked for all of those fields. Unchecking it would change how data is generated (a new patient would be created for each message).
GENERATOR TYPE: sets the data source for de-identified values. Learn about Generator Type settings.
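The effect of the ID column can be pictured with a small Python sketch: the values of all ID-flagged fields form a key, and every message with the same key receives the same replacement patient. This is an illustration under stated assumptions, not Cloak's actual logic; get_field, PatientLinker, the PATIENT0001 format, and the sample messages are hypothetical.

```python
def get_field(message, segment, field, component=None):
    """Return a field (or component) from a pipe-delimited HL7 v2 message.

    Note: this simple indexing does not apply to MSH, whose field
    numbering is offset by one because MSH-1 is the field separator itself.
    """
    for line in message.splitlines():
        parts = line.split("|")
        if parts[0] != segment:
            continue
        value = parts[field] if field < len(parts) else ""
        if component is not None:
            components = value.split("^")
            value = components[component - 1] if component <= len(components) else ""
        return value
    return ""


class PatientLinker:
    """Give every message with the same ID key the same replacement patient."""

    def __init__(self, id_rules):
        self.id_rules = id_rules   # list of (segment, field, component) flagged as ID
        self._patients = {}        # ID key -> replacement patient label

    def replacement_for(self, message):
        key = tuple(get_field(message, *rule) for rule in self.id_rules)
        if key not in self._patients:
            self._patients[key] = f"PATIENT{len(self._patients) + 1:04d}"
        return self._patients[key]


admit = "MSH|^~\\&|SYSTEM-A\nPID|0001||0126271^^^HOSP||SMITH^JOHN"
update = "MSH|^~\\&|SYSTEM-A\nPID|0001||0126271^^^HOSP||SMITH^JOHNNY"
other = "MSH|^~\\&|SYSTEM-A\nPID|0001||0999999^^^HOSP||DOE^JANE"

linker = PatientLinker([("PID", 3, 1), ("PID", 3, 4)])
print(linker.replacement_for(admit))   # PATIENT0001
print(linker.replacement_for(update))  # PATIENT0001 (same PID.3.1 and PID.3.4)
print(linker.replacement_for(other))   # PATIENT0002
```

With an empty list of ID rules the key would carry no patient information, which corresponds to the situation described above where no field is checked as ID and messages can no longer be linked to a patient.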

Data Types Tab

Checkbox: the checkmark indicates an active rule. Uncheck to deactivate.
DATATYPE: select a datatype.
COMPONENT: select a component.
GENERATOR: sets the data source for de-identified values. Learn about Generator settings.

Add or Remove Rules

To add a de-identification rule under Fields or Data Types:

Click the plus sign at the bottom of the list of selectors. A new line will appear.
Edit using dropdowns in each column.

To remove a setting, click the trashcan at the end of the line.

View Example and Save a De-identified HL7 Log File

Once you have created and configured all the selectors applicable to the HL7 log to be de-identified, click View Example at the bottom of the left hand pane. A preview of the de-identified log file will appear. Scroll through the log in the viewing panes to verify the potential results of the de-identification process.

Once reviewed and after applying any changes:

Click De-identify at the bottom of the left hand screen to the right of View Example to save the de-identified log file.
A Save Results dialogue box will appear with a number of options. Click the appropriate radio button for that log file. Those options include:
- Save the file with the initial file structure
- Divide the file into smaller chunks of a specific size in MB or number of messages
- Not to split the file but to keep it all in one file
Click Save and browse to the location to store the file. Click Save.
A window will open tracking the progress of the process.
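The option to divide the output by number of messages can be sketched in Python as follows. The sketch assumes that every message in the log starts with an MSH segment; the function names are hypothetical and the code does not reflect how Cloak writes its files.

```python
def split_messages(log_text):
    """Split an HL7 log into messages, assuming each message starts with MSH."""
    messages, current = [], []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between messages
        if line.startswith("MSH") and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)
    return ["\n".join(segments) for segments in messages]


def chunk_messages(messages, messages_per_file):
    """Group messages into lists of at most messages_per_file items."""
    return [messages[i:i + messages_per_file]
            for i in range(0, len(messages), messages_per_file)]


log = "MSH|^~\\&|A\nPID|1\nMSH|^~\\&|B\nPID|2\nMSH|^~\\&|C\nPID|3\n"
chunks = chunk_messages(split_messages(log), 2)
print([len(chunk) for chunk in chunks])  # [2, 1]
```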

Results Summary

Once saved, a De-identification Process Report dialogue box will open asking if you wish to create a de-identification process report. Click Yes or No. If Yes is clicked, you will be prompted to choose a location to save the generated PDF and to give a name to the file. Click Save and the file will be saved to the specified location. The PDF of the De-identification Process Summary will open on your desktop for review.

Once a set of selectors have been chosen for the de-identification of a log file, that set can be saved for reuse.

In the drop-down menu under File in the upper menu bar, click Save De-Id Rules.
Choose a location and fill in a file name for the settings and click Save.

Once a log file has been opened, the saved de-identification rules can be applied by clicking Open, then De-Id Rules, from the drop-down menu under File in the top menu bar.

Generators

Generators refer to the data sources used to set de-identification values in Cloak.

Generator	Recommended Use
String	Insert a randomly generated string or static value. You can set the length and other parameters.
Boolean	Insert a Boolean value (true or false).
Numeric	Insert a randomly generated number. You can set the length, decimals and other parameters.
Date Time	Insert a randomly generated date-time value. You can set the range, time unit, format, and other parameters.
Table	Pull data from HL7-related tables stored in one of your profiles, useful for coded fields.
SQL Query	Pull data from a database based on an SQL query. You’ll be able to configure a database connection.
Text	Pull random de-identification data from a text file — for instance, a list of names. Several file formats can be used: txt, csv, etc
Excel	Pull random de-identification data from an Excel 2007 or later spreadsheet — for instance, a list of names, addresses, and cities.
Use Original Value	Keep the field as-is. No de-identification rules will be applied.
Copy Another Field	Copy the contents of another field.
Unstructured Data	Find and replace sensitive data in free text fields — for instance, find and replace a patient’s last name in physician notes.

Generator Settings

Each generator has its own settings, which you can edit from the Value Generator tab. Click on the generator name to navigate to the setting details.

Advanced Mode

Allows you to use more than one generator for a single field, edit the output format or preformat the source value. You can also set preconditions to conditionally apply the de-identification rule.

Preformat Value

(Only available in Advanced Mode)

Use this to format the original value before it is processed.

This is useful for generators that include the original value or ID fields. Here are two usage examples:

a) In an unstructured data field, you may wish to remove a value that is not contained elsewhere (not already cloaked in another field):

If you know the field may contain a reference to an ID defined as ‘ID-999999’, you would:

1. Cloak the field using an Unstructured Data generator.

2. Set the following preformat for the unstructured data:

Find what:

ID-\d+     (Search for a text, anywhere in the field value, starting with 'ID-' and followed by one or more numbers.)

Replace by:

ID-XXXX     (We set a static text to hide the ID but still keep the context of the text.)
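The effect of this preformat can be checked outside Cloak with a short Python sketch. The preformat function and the sample note are hypothetical; only the Find what and Replace by values come from the example above.

```python
import re


def preformat(value, find_what, replace_by):
    """Apply a find-and-replace regular expression to a field value."""
    return re.sub(find_what, replace_by, value)


note = "Patient transferred, see chart ID-999999 for details"
print(preformat(note, r"ID-\d+", "ID-XXXX"))
# Patient transferred, see chart ID-XXXX for details
```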

b) If you have the same patient ID number in two systems, but formatted differently, you could format them so that both systems change to the same ID format and can both be recognized as the same patient. Having the same ID will provide continuity of the message flow for a patient (messages will be cloaked using the same fake data):

If, for example, PID.2 is defined like this for the two systems:

First system: ID-123456

Second system: 123-456

You would need to:

1. Set the field PID.2 as an ID (by checking the ID column).

2. Define two preformats like this:

Find what:

^ID-(?<ID_Number>\d+)$      (We find an exact match for the format and set the numbers only in a group variable named 'ID_Number')

Replace by:

${ID_Number}    (We set only the number, removing the superfluous text)

Find what:

^(?<ID_Number_Part_1>\d+)-(?<ID_Number_Part_2>\d+)$      (Find an exact match for the format and set the numbers on each side of the hyphen in group variables named 'ID_Number_Part_1' and 'ID_Number_Part_2')

Replace by:

${ID_Number_Part_1}${ID_Number_Part_2}     (Only the number, remove the superfluous text)

Now both systems will treat PID.2 as being ‘123456’ and match and cloak the messages properly as being the same patient.
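The two preformats can be tried out in Python, with one caveat: Python's re module writes named groups as (?P<name>...) and group references as \g<name>, so the patterns below are translations of the ones shown above, not the exact text to enter in Cloak. The normalize_id function is hypothetical, and the first pattern as written expects the 'ID-' prefix.

```python
import re

# Each pair is (Find what, Replace by), translated to Python's regex syntax.
PREFORMATS = [
    (r"^ID-(?P<ID_Number>\d+)$", r"\g<ID_Number>"),
    (r"^(?P<ID_Number_Part_1>\d+)-(?P<ID_Number_Part_2>\d+)$",
     r"\g<ID_Number_Part_1>\g<ID_Number_Part_2>"),
]


def normalize_id(value):
    """Apply every preformat in order; patterns that do not match leave the value as-is."""
    for find_what, replace_by in PREFORMATS:
        value = re.sub(find_what, replace_by, value)
    return value


print(normalize_id("ID-123456"))  # 123456
print(normalize_id("123-456"))    # 123456
```

Because both inputs normalize to the same value, a lookup keyed on the normalized ID would treat the two messages as belonging to the same patient.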

String

This generator creates a character string. It can produce random strings, a static value, or Lorem Ipsum text.

How to use the “String” generator to create random value:

Check the Random option.
Set the minimum length of the strings you want to generate. The minimum value for this configuration is 0. A string with a length of 0 is equivalent to an empty string.
Set the maximum length of the strings you want to generate.
Include lowercase letters (a to z characters).
Include uppercase letters (A to Z characters).
Include digits (0 to 9 characters).
Include special characters. This allows you to include any character you want.
Include random blanks. Including random blanks generates empty strings among the values for use in the field or data type.

How to use the “String” generator to set a static value:

Check the Static option.
Set the static value to be inserted.

How to use the “String” generator to set a Lorem Ipsum text:

Check the Lorem Ipsum option.
Set the minimum length of the strings you want to generate. The minimum value for this configuration is 0. A string with a length of 0 is equivalent to an empty string.
Set the maximum length of the strings you want to generate.
Include random blanks. Including random blanks means that you generate empty strings among the values for use in the field or data type.

Example #1:

Generated Values

Random
Minimum length: 0
Maximum length: 5
Include random blanks: checked

XDZ

VOJHZ

BFAR

Example #2:

Generated Values

Static
Static Value: MyNewValue

MyNewValue
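For readers who want to see how the Random options interact, here is a rough Python equivalent. It is only a sketch: the function name, the parameter names, and the blank_rate probability are assumptions, since the documentation does not say how often random blanks are produced.

```python
import random
import string


def random_string(rng, min_length, max_length, lowercase=False, uppercase=True,
                  digits=False, special="", blank_rate=0.0):
    """Build a random string from the selected character classes."""
    # blank_rate is a hypothetical stand-in for the "Include random blanks" option.
    if blank_rate and rng.random() < blank_rate:
        return ""
    alphabet = special
    if lowercase:
        alphabet += string.ascii_lowercase
    if uppercase:
        alphabet += string.ascii_uppercase
    if digits:
        alphabet += string.digits
    if not alphabet:
        raise ValueError("select at least one character class")
    length = rng.randint(min_length, max_length)  # both limits are inclusive
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)  # fixed seed so the run is repeatable
values = [random_string(rng, 0, 5, blank_rate=0.2) for _ in range(5)]
print(values)
```

A minimum length of 0 can yield an empty string even when random blanks are not included, which matches the note above about strings of length 0.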

Boolean

This generator creates a Boolean (True or False) value.

How to use the Boolean generator:

Random values
- Generate True or False value randomly.
- Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Sequential list
- Generate a sequence of True, False, True, False, True, etc.
- Start new list. Always start the sequence with True.
- Continue from previous list. If you run the De-Identification and it ends with True, the next time it will start with False.

Example #1:

Generated Values

Random values
Include random blanks: unchecked

True

False

True

False
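The difference between Start new list and Continue from previous list can be sketched in Python. The SequentialBoolean class and its run method are hypothetical names used for illustration; how Cloak stores the position between runs is not described here.

```python
class SequentialBoolean:
    """Alternate True, False, True, ... and remember where the last run stopped."""

    def __init__(self):
        self._next = True

    def run(self, count, start_new_list=True):
        if start_new_list:
            self._next = True  # "Start new list": always begin with True
        values = []
        for _ in range(count):
            values.append(self._next)
            self._next = not self._next
        return values


generator = SequentialBoolean()
first = generator.run(3)                         # ends with True
second = generator.run(2, start_new_list=False)  # continues, so starts with False
third = generator.run(2)                         # starts over with True
print(first, second, third)
# [True, False, True] [False, True] [True, False]
```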

Numeric

This generator creates a number.

How to use the “Numeric” generator:

Random values
- Randomly generate values between minimum and maximum limits.
- Decimal. Set the decimal precision of the generated value. For example, a precision of 2 generates values with 2 decimals (such as 3.75).
- Include random blanks. Including random blanks generates empty strings among the values for use in the field or data type.

Sequential list
- Generate a sequence of 0, 1, 2, 3, etc.
- Decimal. Set the decimal precision of the generated value. For example, a precision of 2 generates values with 2 decimals (such as 3.75).
- Increment by. The step to use between each generation. You can use a negative value.
- Start new list. Always start with the minimum limit or the maximum limit if you’re using a negative increment.
- Continue from previous list. If you run the De-Identification and it ends with 13, the next time it will start with 14.

Example #1:

Generated Values

Sequential list
Between: 10 and 1000
Decimal: 2
Increment by: 5
Start new list

10.34

15.2

20.85

25.39

30.12

Example #2:

Generated Values

Random values
Between: 10 and 1000
Decimal: 0
Include random blanks: unchecked

353

942

359

626

967

Date time

This generator creates date and time values.

How to use the “Date time” generator:

Random values
- Randomly generate values in a range between minimum and maximum limits of time unit (second, minute, etc.)
- Based on.
  - Now. Will use the current date time as a reference.
  - Actual field value. Will use the date time value from the field in the original message.
  - A specific date. You can specify a date and time to use as a reference.
- Date format. Set the format of the new date-time. Note that you have a choice of formats. You can also enter your own format manually.
- Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Sequential list
- Generate a sequence of date-time like: 2013-12-12, 2013-12-13, 2013-12-14, 2013-12-15, etc.
- Based on.
  - Now. Will use the current date time as a reference.
  - Actual field value. Will use the date time value from the field in the original message.
  - A specific date. You can specify a date and time to use as a reference.
- Date format. Set the format of the new date-time. Note that you have a choice of formats. You can also enter your own format manually.
- Increment by. The step to use between each generation. You can use a negative value and set a time unit (second, minute, etc.)
- Start new list. Always start with the minimum limit or the maximum limit if you’re using a negative increment.
- Continue from previous list. If you run the De-Identification and it ends with 2013-12-13, the next time it will start with 2013-12-14.

Example #1:

Generated Values

Description

Random values
In a range between: 10 and 1000 Day
Based on: A specific date (2012-01-01 00:00:00)
Date format: yyyyMMdd
Include random blanks: unchecked

20120318	Reference date + 77 days
20120614	Reference date + 165 days
20140102	Reference date + 732 days
20120212	Reference date + 42 days
20130508	Reference date + 493 days

Example #2:

Generated Values

Description

Sequential list
In a range between: 0 and 1440 Minute
Based on: A specific date (2012-01-01 09:15:30)
Date format: yyyyMMddHHmmss
Increment by: 15 minutes

20120101091530	Reference date + 0 minutes
20120101093030	Reference date + 15 minutes
20120101094530	Reference date + 30 minutes
20120101100030	Reference date + 45 minutes
20120101101530	Reference date + 60 minutes

Example #3:

Generated Values

Description

Sequential list
In a range between: 0 and 30 Minute
Based on: A specific date (2012-01-01 09:15:30)
Date format: yyyyMMddHHmmss
Increment by: 10 minutes

20120101091530	Reference date + 0 minutes
20120101092530	Reference date + 10 minutes
20120101093530	Reference date + 20 minutes
20120101094530	Reference date + 30 minutes
20120101091530	Reference date + 0 minutes

When the generator exceeds the maximum value (30), the sequence is reset starting at the minimum value (0).
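The reset behavior in Example #3 can be reproduced with a short Python sketch. The function name and parameters are hypothetical, offsets are expressed in minutes, only positive increments are handled, and the Cloak format yyyyMMddHHmmss is written in Python's strftime notation as %Y%m%d%H%M%S.

```python
from datetime import datetime, timedelta


def sequential_datetimes(reference, minimum, maximum, increment, count,
                         fmt="%Y%m%d%H%M%S"):
    """Generate date-times at increasing minute offsets, wrapping past the maximum."""
    values = []
    offset = minimum
    for _ in range(count):
        values.append((reference + timedelta(minutes=offset)).strftime(fmt))
        offset += increment
        if offset > maximum:
            offset = minimum  # sequence is reset to the minimum value
    return values


reference = datetime(2012, 1, 1, 9, 15, 30)
for value in sequential_datetimes(reference, 0, 30, 10, 5):
    print(value)
# 20120101091530
# 20120101092530
# 20120101093530
# 20120101094530
# 20120101091530
```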

Example #4: Manipulate date of birth

Original field Value

Generated Value

Random values
In a range between: -3650 and 3650 Day
Based on: Actual field value
Date format: yyyyMMdd
Include random blanks: unchecked

19130113	19110213
19900909	20000812
19850909	19870514
19601020	19650218
19800317	19880617
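Example #4 shifts each date of birth by a random number of days relative to the original value. A minimal Python sketch of that idea follows; shift_date is a hypothetical name, the seed is fixed only to make the run repeatable, and the output will not match the values in the table above because the offsets are random.

```python
import random
from datetime import datetime, timedelta


def shift_date(original, min_days, max_days, rng, fmt="%Y%m%d"):
    """Move a date by a random number of days within [min_days, max_days]."""
    parsed = datetime.strptime(original, fmt)
    shifted = parsed + timedelta(days=rng.randint(min_days, max_days))
    return shifted.strftime(fmt)


rng = random.Random(7)
births = ["19130113", "19900909", "19850909", "19601020", "19800317"]
shifted = [shift_date(dob, -3650, 3650, rng) for dob in births]
for before, after in zip(births, shifted):
    print(before, after)
```

A range of plus or minus 3650 days keeps the generated date within roughly ten years of the original, so the patient's approximate age group is not preserved exactly but remains plausible.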

HL7 Table

This generator pulls data from HL7-related tables stored in a profile. Read how to set the profile.

How to configure the generator to use the appropriate HL7 table:

Random values
- Randomly generate values from an HL7 table.
- Source. Select the profile containing the table.
- Table. Select a table from which the value will be generated.
- To access the table content, click on the Edit Table button. If you change the table content, the new table content will appear in the profile you select.
- Restrict to values between. Will only use table entries that are within the specified limits.
- Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Sequential list
- Generate a sequence of values starting with the first table entry.
- Source. Select the profile containing the table.
- Table. Select a table from which the value will be generated.
- To access the table content, click on the Edit Table button. If you change the table content, the new table content will appear in the profile you select.
- Restrict to values between. Will only use table entries that are within the specified limits.
- Start new list. Always start with the first entry of the table.
- Continue from previous list. If you run the De-Identification and it ends with the 13th entry, the next time it will start with the 14th one.

Example #1:

Generated Values

Random values
Table: 0001 – Administrative Sex
Restrict to values between: 1 and 1 characters
Include random blanks: unchecked

Example #2:

Generated Values

Sequential list
Table: 0001 – Administrative Sex
Start new list

SQL Query

This generator pulls data from an SQL-accessible database.

How to configure this generator to use SQL query results as de-identified values:

Select a database connection. If no database connections are configured, click Connections… to set up a connection.
Enter the SQL query. You can use the embedded Query Builder to help you build the query.
Restrict to values between. Will only use values that are within the specified limits.
Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Example #1:

Generated Values

Connection: Connection1
Query: SELECT name FROM employees
Restrict to values between: 1 and 20 characters
Include random blanks: Unchecked

John Smith

Jane Doe

Road Runner

The Coyote

Tweety Bird

Text File

This generator pulls data from a text file (*.txt, *.csv, etc.).

How to configure this generator to use text file content:

Random values
- Randomly generate values from a text file.
- File. Specify the source of the text file. Use the Browse… button to select a file.
- Column. Specify the column id to use (in case of a character delimited file, ex: *.csv)
- Column delimiter. The character that separates each column in the text file.
- First/Last rows. Specify the rows to get data.
- Between character position. Will only use characters that are within the specified positions.
- Restrict to values between. Will only use values that are within the specified limits.
- Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Sequential list
- Generate a sequence of values from a text file starting with the first row.
- File. Specify the source of the text file. Use the Browse… button to select a file.
- Column. Specify the column id to use (in case of a character delimited file, ex: *.csv)
- Column delimiter. The character that separates each column in the text file.
- First/Last rows. Specify the rows to get data.
- Between character position. Will only use characters that are within the specified positions.
- Restrict to values between. Will only use values that are within the specified limits.
- Start new list. Always start with the first row in the text file.
- Continue from previous list. If you run the De-Identification and it ends with the 13th entry, the next time it will start with the 14th one.

Note: If more than one field is configured using the same text file, the same line will be used within the same message. In other words, you can use a text file to ensure several values will be used together. This can be useful when linking a city with a zip code or a first name with a gender.

The examples below use the following content in a file C:\MyDocuments\myFile.txt

1,Road Runner,M,ACME,Anycity,12345
2,The Coyote,M,ACME,Anycity,12345
3,Sylvester The Cat,M,ACME,Anycity,12345
4,Tweety Bird,M,ACME,Anycity,12345
5,John Smith,M,,Anothercity,98765
6,Jane Doe,F,,Anothercity,98765

Example #1:

Generated Values

Random values
File: C:\MyDocuments\myFile.txt
Column: 2
Delimiter: ,
Restrict to values between: 1 and 20 characters
Include random blanks: Unchecked

John Smith

Jane Doe

Road Runner

The Coyote

Tweety Bird

Example #2:

Generated Values

Sequential list
File: C:\MyDocuments\myFile.txt
Column: 3
Delimiter: ,
Restrict to values between: 1 and 20 characters
Start new list
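The note about several fields sharing the same line of a text file can be illustrated in Python with the sample content shown earlier in this section. The function names and the field labels ("name", "sex", "city", "zip") are hypothetical; column numbers are 1-based, as in the Column setting.

```python
import csv
import io
import random

# Same content as the sample text file used in the examples of this section.
FILE_CONTENT = """1,Road Runner,M,ACME,Anycity,12345
2,The Coyote,M,ACME,Anycity,12345
3,Sylvester The Cat,M,ACME,Anycity,12345
4,Tweety Bird,M,ACME,Anycity,12345
5,John Smith,M,,Anothercity,98765
6,Jane Doe,F,,Anothercity,98765
"""


def load_rows(text, delimiter=","):
    """Parse delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def pick_linked_values(rows, columns, rng):
    """Pick one row per message and read every configured column from that row."""
    row = rng.choice(rows)
    return {label: row[index - 1] for label, index in columns.items()}


rows = load_rows(FILE_CONTENT)
columns = {"name": 2, "sex": 3, "city": 5, "zip": 6}
values = pick_linked_values(rows, columns, random.Random(1))
print(values)
```

Because every value comes from the same row, a city is always paired with its own zip code, which is the linking behavior the note describes.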

Excel file

This generator pulls data from an Excel 2007+ file (*.xlsx).

How to configure the generator to use Excel file content:

Random values
- Randomly generate values from an Excel file.
- File. Specify the source of the Excel file. Use the Browse… button to select a file.
- Worksheet. Specify the Worksheet to use.
- Column. Specify the column to use.
- First/Last rows. Specify the rows to get data.
- Restrict to values between. Will only use values that are within the specified limits.
- Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Sequential list
- Generate a sequence of values from an Excel file starting with the first row.
- File. Specify the source of the Excel file. Use the Browse… button to select a file.
- Worksheet. Specify the Worksheet to use.
- Column. Specify the column to use.
- First/Last rows. Specify the rows to get data.
- Restrict to values between. Will only use values that are within the specified limits.
- Start new list. Always start with the first row in the Excel file.
- Continue from previous list. If you run the De-Identification and it ends with the 13th entry, the next time it will start with the 14th one.

Note: If more than one field is configured using the same worksheet, the same row will be applied across a message. In other words, you can use an Excel file to ensure that several values will be used together. This can be useful when linking a city with a zip code or a first name with a gender.

The examples below use the following content from a file named C:\MyDocuments\myExcelFile.xlsx

1	Road Runner	M	ACME	Anycity	12345
2	The Coyote	M	ACME	Anycity	12345
3	Sylvester The Cat	M	ACME	Anycity	12345
4	Tweety Bird	M	ACME	Anycity	12345
5	Jane Doe	F		Anothercity	98765
6	John Smith	M		Anothercity	98765

Example #1:

Generated Values

Random values
File: C:\MyDocuments\myExcelFile.xlsx
Worksheet: TheFirstSheet
Column: 2
Restrict to values between: 1 and 20 characters
Include random blanks: Unchecked

John Smith

Jane Doe

Road Runner

The Coyote

Tweety Bird

Example #2:

Generated Values

Sequential list
File: C:\MyDocuments\myExcelFile.xlsx
Worksheet: TheFirstSheet
Column: 3
Restrict to values between: 1 and 20 characters
Start new list

Use original value

This generator is to be used when you don’t want a data element to be changed. Here are four use case examples.

Use Case #1

Make sure this field is not changed.
Make sure the field is listed in the De-identification Process Report to underline that the field was not de-identified.

Use Case #2

De-identify all fields with XPN data type except for the attending doctor

MSH|^~&|SYSTEM-A|1 |||20100404210829||ADT^A01|20100404000000645509|P|2.3|||||CA|ASCII
EVN|A01|201004042108||129|Interface^HL7 Interface|201004042106
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
PV1|0001|I|2C^2322^2322-0^1^^^^^3|1|50386||1083278^MCFEE,MIKE^^^^^||||||||||1083278^MCFEE, MIKE|1|50386|1||||||||||||||||||||||||201004042106||||||||
PV2||||||||||||||||||||||N

If the data type Extended Person Name (XPN) is part of the list of data types to de-identify, you might need to preserve some of the fields using this data type.

Data Type	Component	Generator
XPN	2 – Given Name	Excel File
FN	1 – Surname	Excel File

Segment	Field	Component	Subcomponent	ID	Generator
PV1	7 – Attending Doctor				Use Original Value

Using this configuration, you would make sure all names are de-identified except the attending doctor’s name.

MSH|^~&|SYSTEM-A|1 |||20100404210829||ADT^A01|20100404000000645509|P|2.3|||||CA|ASCII
EVN|A01|201004042108||129|Interface^Johnson|201004042106
PID|0001|ID53572812^^^|0126271^^^^^1||Johnson^Deborah||195307280000|M|Johnson^Deborah||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
PV1|0001|I|2C^2322^2322-0^1^^^^^3|1|50386||1083278^MCFEE, MIKE^^^^^||||||||||1083278^Johnson|1|50386|1||||||||||||||||||||||||201004042106||||||||
PV2||||||||||||||||||||||N

Use Case #3

Prevent de-identifying a field that is defined as an ID

Field IDs must have a generator associated with them but, if for some reason you prefer having the original value, you can set this to avoid any changes in that value.

Use Case #4

Re-use the original data and combine it with other generators

In Advanced Mode, you can de-identify the original value by specifying several generators, but you could also include the original value to combine it with other generated values.

Copy Another Field

This generator replicates the value from another de-identified field.

How to use the “Copy Another Field” generator:

Add a new de-id rule by right-clicking the field to de-identify.
Select the Copy Another Field generator.
Set the Segment, Field, Component and SubComponent of the source field.
The source field can be any other field present in the message.

Example 1: copy the replacement MRN value from PID.2 to ZCA.3

Unstructured Data

Sensitive data can be found in unstructured data (free text) such as clinician notes or other narrative text. Most of the data within an unstructured field is not sensitive, but there are times when it might contain data elements you want to protect.

This generator will replace any piece of information found in another message field that is set for de-identification.

Example #1

In the following message, the name of the patient is mentioned in the patient update note (NTE.3).

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr. Smith provided new phone numbers

If the patient name (PID.5.1 field) is listed among the de-identification rules, you can configure a new field to detect the patient name within NTE.3

Segment	Field	Component	Subcomponent	ID	Generator
PID	5 – Patient Name	1 – Family Name			Excel File
NTE	3 – Comment				Unstructured Data

Using these settings, the de-identified message will look like this:

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||Doe^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr. Doe provided new phone numbers
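
The idea behind Example #1 can be sketched in JavaScript. This is only an illustration, not Cloak's matching logic: replaceInFreeText and the replacement table are hypothetical, and the whole-word, case-insensitive match is an assumption based on the example, where SMITH in PID.5.1 matches Smith in the note.

```javascript
// Illustrative sketch only: replaces known original values inside free text.
// The mapping is hypothetical; in Example #1 it would come from the
// PID.5.1 rule (SMITH -> Doe).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function replaceInFreeText(text, replacements) {
  let result = text;
  for (const [original, replacement] of Object.entries(replacements)) {
    if (original === "") continue; // an empty value would match everywhere
    // Whole-word, case-insensitive match so "Smith" matches "SMITH".
    const pattern = new RegExp("\\b" + escapeRegExp(original) + "\\b", "gi");
    // A function replacement avoids special handling of "$" in the new value.
    result = result.replace(pattern, () => replacement);
  }
  return result;
}

const note = "Mr. Smith provided new phone numbers";
console.log(replaceInFreeText(note, { SMITH: "Doe" }));
// Mr. Doe provided new phone numbers
```

Matching on word boundaries keeps a value such as SMITH from being replaced inside a longer word.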

Example #2

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr Smith ( ID53572812 ) provided new phone numbers
NTE|||Mr Smith also provided a new address

If the patient ID (PID.2 field) is listed among the de-identification rules, you can configure a new field to detect the patient ID within NTE.3.

Segment	Field	Component	Subcomponent	ID	Generator
PID	2 – Patient ID				Numeric
PID	5 – Patient Name	1 – Family Name			Excel File
NTE	3 – Comment				Unstructured Data

Using these settings, the de-identified message will look like this:

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|123459876^^^|0126271^^^^^1||Doe^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr Doe (123459876) provided new phone numbers
NTE|||Mr Doe also provided a new address

Example #3

Sometimes, a field may be Base64-encoded, as seen below.

In the above message, the decoded value of the NTE.3 field is “Mr. Smith provided new phone numbers”. To detect and de-identify the patient’s name, in addition to including the patient name (PID.5.1 field) into the de-identification rules, you need to tick the “Decode message field from base 64 format before De-Identifying” checkbox. This will decode the field, de-identify it, and then re-encode it into Base64.

Segment	Field	Component	Subcomponent	ID	Generator
PID	5 – Patient Name	1 – Family Name			Excel File
NTE	3 – Comment				Unstructured Data

Using these settings, the de-identified message will look like this:

The decoded value of the above NTE.3 field is “Mr. Doe provided new phone numbers.”
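
The decode, de-identify, re-encode sequence described in Example #3 can be sketched as follows. This is a minimal Node.js illustration, not Cloak's implementation: deIdentifyBase64Field is a hypothetical helper, and the encoded note is built in the sketch itself because the original encoded message is not reproduced here.

```javascript
// Illustrative sketch only (Node.js): decode a Base64 field, replace a known
// value, and re-encode the result. Not Cloak's implementation.
function deIdentifyBase64Field(encoded, original, replacement) {
  const decoded = Buffer.from(encoded, "base64").toString("utf8");
  // split/join replaces every exact occurrence without regex escaping.
  const cleaned = decoded.split(original).join(replacement);
  return Buffer.from(cleaned, "utf8").toString("base64");
}

const encodedNote = Buffer.from("Mr. Smith provided new phone numbers", "utf8").toString("base64");
const deIdentifiedNote = deIdentifyBase64Field(encodedNote, "Smith", "Doe");
console.log(Buffer.from(deIdentifiedNote, "base64").toString("utf8"));
// Mr. Doe provided new phone numbers
```

Without the decoding step, the name would never be found, because the Base64 text of the field does not contain the characters of the name.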

Preconditions

When creating a de-identification rule, you can optionally create and apply a precondition to decide whether or not to apply the rule to a given field. Preconditions are scripts that are written with our JavaScript API.

You can add a precondition to an existing de-identification rule by going into Advanced Mode and selecting “Add Precondition.” This will open a window that allows you to write the script for the precondition and to test it by supplying messages in the Test Data window.

To decide whether or not the precondition is satisfied, use the callback() method. The callback() method accepts a boolean (true or false) which determines whether or not the precondition is satisfied. If the precondition is satisfied, the de-identification rule will be applied to the given field. If it is not satisfied, the de-identification rule will not be applied.

HL7 Messaging De-Identification Context

During HL7 message de-identification, the JavaScript engine context is updated, allowing you to access the current element being de-identified. The context has the following properties you can refer to:

profile: Allows you to fetch data from the profile. See the Profile object definition.
message: Allows you to access the message being de-identified and any of its properties or methods. See the Message object definition.
segment: Allows you to access the current segment being de-identified and any of its properties or methods. See the Segment object definition.
field: Allows you to access the current field being de-identified and any of its properties or methods. See the Field object definition.
component: Allows you to access the current component being de-identified and any of its properties or methods. See the Component object definition.
subComponent: Allows you to access the current sub-component being de-identified. See the SubComponent object definition.
dataType: Allows you to access the current data-type instance being de-identified if de-identifying using a data-type de-identification rule. The data-type can be any Field, Component or SubComponent.

Example

The following is an example of a precondition.

Suppose that you only wanted to apply a de-identification rule to the PID.3.1 – ID Number component in a message if the ID was a medical record number; in other words, if the value of the PID.3.5 – Identifier Type Code component was “MR”. The precondition you’d use would look like this:

var patientIdTypeCode = context.field.get('5');
callback(patientIdTypeCode == 'MR');

Here, the precondition context’s field is the PID.3 – Patient Identifier List field.
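
To reason about a precondition outside Cloak, you could run it against simple stand-ins. In the sketch below, makeContext and runPrecondition are hypothetical helpers written for illustration; only context.field.get('5') and callback(...) come from the example above, and the sketch assumes get returns the component value as a string.

```javascript
// Illustrative test harness only. makeContext and runPrecondition are
// hypothetical stand-ins for what Cloak supplies at run time; the real
// context object is richer than this.
function makeContext(components) {
  return {
    field: {
      // Assumption: get('5') returns the value of component 5 as a string.
      get: (position) => components[position] ?? "",
    },
  };
}

function runPrecondition(precondition, context) {
  let satisfied = false;
  precondition(context, (value) => {
    satisfied = Boolean(value);
  });
  return satisfied;
}

// The precondition from the example above, wrapped in a function.
function isMedicalRecordNumber(context, callback) {
  var patientIdTypeCode = context.field.get("5");
  callback(patientIdTypeCode == "MR");
}

const mrField = makeContext({ 1: "0126271", 5: "MR" });
const otherField = makeContext({ 1: "0126271", 5: "SS" });
console.log(runPrecondition(isMedicalRecordNumber, mrField)); // true
console.log(runPrecondition(isMedicalRecordNumber, otherField)); // false
```

In Cloak itself, the Test Data window serves this purpose; the harness only shows how the boolean passed to callback() decides whether the rule applies.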


Command Line

Caristix Cloak allows you to de-identify HL7 messages from the command line, which lets you automate operations such as data conversion, de-identification, and test execution. To do so, use the CloakConsole executable located in the software’s installation folder (typically C:\Program Files (x86)\Caristix\Caristix Cloak).

You can open a command prompt and type the following command to get a list of available commands:
CloakConsole.exe help

To get help on a particular command, type:
CloakConsole.exe help <command-name>

De-Identify-XML

This command will de-identify HL7v2-XML messages.

To get help with this command, type: CloakConsole.exe help De-Identify-XML

C:\Program Files (x86)\Caristix\Caristix Cloak>CloakConsole.exe help De-Identify-XML

** De-Identify-Xml **

e.g. De-Identify-Xml C:\first-document.xml D:\second-document.xml -de <or> -DeIdentificationRules "C:\My DeIdentification rules.cxdx" [-cp <or> -ConformanceProfile "C:\HL7Reference\CCD (Continuity of Care).cxpx"] [-pi <or> -PersistentIdentities "D:\persistence-xml.dic"] [-r <or> -Results "D:\results\"] [-lp <or> -LogsFilePath "C:\logs.txt"]

Source files : The documents to De-Identify (can also be folders).
-de required : DeIdentification rules file path.
-cp [optional] : Conformance Profile file path.
-pi [optional] : Persisted identities file path (if the file already exists, the context will be loaded from it).
-r [optional] : Result folder path. The value has to be a folder [default: .\Results].
-lp [optional] : Logs file path.
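
For automation, the command can be assembled and launched from a script. The Node.js sketch below is one possible approach, not an official wrapper: buildDeIdentifyXmlArgs and runCloak are hypothetical helpers, the flags are taken from the help output above, and every path is a placeholder.

```javascript
// Illustrative sketch only (Node.js). The flags come from the help output
// above; every path below is a hypothetical placeholder.
const { spawnSync } = require("child_process");

function buildDeIdentifyXmlArgs({ sources, rules, profile, identities, results, logs }) {
  if (!sources || sources.length === 0) throw new Error("At least one source is required");
  if (!rules) throw new Error("-de (de-identification rules file) is required");
  const args = ["De-Identify-Xml", ...sources, "-de", rules];
  if (profile) args.push("-cp", profile);
  if (identities) args.push("-pi", identities);
  if (results) args.push("-r", results);
  if (logs) args.push("-lp", logs);
  return args;
}

// Launches CloakConsole.exe; call this only where Cloak is installed.
function runCloak(executablePath, args) {
  const run = spawnSync(executablePath, args, { stdio: "inherit" });
  return run.status;
}

const cloakArgs = buildDeIdentifyXmlArgs({
  sources: ["C:\\first-document.xml"],
  rules: "C:\\My DeIdentification rules.cxdx",
  results: "D:\\results",
});
console.log(cloakArgs.join(" "));
```

Passing the arguments as an array lets spawnSync handle paths that contain spaces, so no manual quoting is needed.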

Options

Cloak Options

Cloak has a number of options that can be set. From the main menu bar, click Tools, then Options. In the Options dialog box that opens, the settings are grouped into categories: Reference Profile, Windows Service Settings, Delimiters, Settings and Preferences.

Reference Profile

These settings allow the use of HL7 reference profiles to parse logs. Open the Reference Profile tab.

Click the checkbox for Use Reference Profile.
Under the Profiles tab, click and highlight the HL7 reference file to be used from the list and click OK. This will change the reference file used in Cloak.
To load an alternate library, click Browse to navigate to the location of the file. Choose the location and file name and click Open. The file path will be referenced when returning to the Reference Profile tab.
Click OK to save the settings.

Delimiters

These settings allow the addition of specific delimiters to the log file to assist with manageability and readability. They include:

Use message beginning delimiter:
- Open the Preferences tab.
- Click the checkbox to select this category.
- Type the delimiter to be used in the text box.
- Click the checkbox to choose the location for the delimiter.
  - Beginning of file or Use custom regex.

Use message ending delimiter.
- Click the checkbox to select this category.
- Enter the delimiter to be used in the text box.
- Click the checkbox to choose the location for the delimiter.
  - Beginning of file or Use custom regex.

Use segment ending delimiter.
- Click the checkbox to select this category.
- Type the delimiter to be used in the text box.
- Click the checkbox to choose the location for the delimiter.
  - End of line or Use custom regex.

Click OK to save the delimiters.
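
As a rough picture of what a custom regex for the message beginning delimiter does, the JavaScript sketch below splits a log wherever a line starts with a given pattern. It reflects an assumption about the general idea, not Cloak's parsing code; splitMessages and the sample log are hypothetical.

```javascript
// Illustrative sketch only: split a log into messages at every line that
// starts with the given pattern. Not Cloak's parsing code.
function splitMessages(log, beginPattern) {
  // The lookahead keeps the delimiter text at the start of each message.
  const boundary = new RegExp("(?=^(?:" + beginPattern.source + "))", "m");
  return log
    .split(boundary)
    .map((message) => message.trim())
    .filter((message) => message.length > 0);
}

const sampleLog = [
  "MSH|^~\\&|SYSTEM-A|1|||20100404210829||ADT^A08|1|P|2.3",
  "PID|0001|ID53572812",
  "MSH|^~\\&|SYSTEM-A|1|||20100404210830||ADT^A08|2|P|2.3",
  "PID|0001|ID99999999",
].join("\n");

const messages = splitMessages(sampleLog, /MSH\|/);
console.log(messages.length); // 2
```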

Settings

Generate value on empty field
- This will populate every field assigned by a rule even if the original value is empty or missing.

Include leading zeros in numeric identifiers
- This allows Cloak to ignore leading zeros in patient identifiers and consider them as numeric values.

Re-apply rules and replacement data across multiple files
- When unchecked, replacement patient identity and mapping to actual patient are destroyed as soon as the de-identification process ends. This maximizes security, as without this information, data cannot be re-identified in any way.
- When checked, replacement patient identity and mappings are saved (in the file configured). This file will be reused the next time you de-identify messages, and patient data will be replaced by the same replacement patient identity. In other words, if Joe Smith was replaced by John Doe the first time, checking this option would mean Joe Smith would be replaced again by John Doe, and so on until you uncheck this option.

Click OK to save the settings.
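
The behaviour of the “Re-apply rules and replacement data across multiple files” option can be pictured as a persisted lookup table. The Node.js sketch below is an illustration only: ReplacementDictionary is hypothetical, and the JSON file it writes is an assumed format, not the format of Cloak's own dictionary file.

```javascript
// Illustrative sketch only (Node.js): a persisted original -> replacement map.
// The JSON format is an assumption, not Cloak's dictionary file format.
const fs = require("fs");
const os = require("os");
const path = require("path");

class ReplacementDictionary {
  constructor(filePath) {
    this.filePath = filePath;
    // Load earlier mappings when the file already exists.
    this.entries = fs.existsSync(filePath)
      ? new Map(Object.entries(JSON.parse(fs.readFileSync(filePath, "utf8"))))
      : new Map();
  }

  // Returns the stored replacement, or stores and returns a newly generated one.
  replacementFor(original, generate) {
    if (!this.entries.has(original)) this.entries.set(original, generate());
    return this.entries.get(original);
  }

  save() {
    fs.writeFileSync(this.filePath, JSON.stringify(Object.fromEntries(this.entries)));
  }
}

// A temporary folder keeps the sketch from touching real data.
const dictionaryPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "deid-")), "HL7Deid.json");

const firstRun = new ReplacementDictionary(dictionaryPath);
const firstReplacement = firstRun.replacementFor("Joe Smith", () => "John Doe");
firstRun.save();

// A later run reloads the file, so the generator is not called again.
const secondRun = new ReplacementDictionary(dictionaryPath);
const secondReplacement = secondRun.replacementFor("Joe Smith", () => "Someone Else");
console.log(firstReplacement, "/", secondReplacement); // John Doe / John Doe
```

A file like this links real identities to their replacements, which is why leaving the option unchecked is the more secure choice.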

Preferences

Check for updates upon startup.

Every time you start Cloak, the software will check for available updates. You can manually check for updates by going to Help, Check for Updates.

Show tips

Displays information boxes that provide guidance on Cloak features. To hide a tip permanently, click the close button. Restore all hidden tips with the “Reset hidden tips” link.

Show Did You Know

Every time you start Cloak, the software will show a ‘Did you know’ article.

Click OK to save the Preferences.

How To / Tutorial

DE-Identifying HL7 Messages

To help you understand how to use Caristix™ Cloak, see it in action in this video. A transcript is below to help you follow the steps.

Transcript

Welcome to the “De-Identifying HL7 Messages” tutorial. This will show you how to use Caristix Workgroup to remove PHI from a stack of HL7 messages.

The application replaces PHI with new patient data generated at run-time, keeping patient history but removing any link with the actual patients.

To get started, let’s open the de-identification module and load a file containing HL7 messages. Messages could also be loaded from a database or directly from your interface engine if you have the connector installed.

Open HL7 v2.x messages you want to de-identify:

Click FILE → Open → Messages… → +Add…

Choose the files containing the messages. If it is saved on your computer, click Browse My Computer.

The chosen file will be added to the file list.

Click Next > to load the file content.

Your message will appear in the Original section and an example of your message de-identified will appear in the De-identified section.

(0:35) All de-identified data in messages is in red so you can see the actual message and the result.

(0:41) The application comes with a set of de-identification rules. They cover all standard HL7 fields that HIPAA identifies as containing sensitive data. If messages contain customized fields or Z-segments, go ahead and customize the rules.

If needed, you can modify the de-identification rules. Look at this video if you need help.

Once all rules are configured the way you want, click View Example. You can see an example of the result in the De-identified section. If anything is not as expected in the result, continue customizing the rules.

Set the dictionary:

Click TOOLS → Options… → Settings → Enable Re-apply rules and replacement data across multiple files.

You can create as many dictionaries as needed. For this tutorial, let’s create a new dictionary called HL7Deid. Replace the file name with: C:\ProgramData\Caristix\Caristix Cloak\Temp\HL7Deid.dic

(0:58) Once the de-identification rules are set, it’s time to launch the process so all messages are de-identified and stored in files. At the end of processing, an audit PDF file documenting the settings used for de-identification can also be created if needed.

Click OK → De-identify. → Choose where to save the result. Click Browse My Computer to save it onto your computer. → OK → Yes if you want to create a De-identify Process Report in PDF.

(1:14) This ends the “De-Identifying HL7 Messages” introduction tutorial. If you have any questions, feel free to contact us. We love questions and feedback!

Thanks for watching
