Cloak

Introducing Caristix™ Cloak Software

Caristix Cloak is designed to help interface analysts and engineers accurately de-identify HL7 data, covering all 18 HIPAA identifiers. Data can then be safely shared for such purposes as porting realistic data to a test system or staging area, providing realistic sample HL7 messages for interface scoping, and providing data for clinical and financial analytics.

Cloak software provides the following features and functionality:

System Requirements

First Steps

The easiest way to get your feet wet with Cloak is to read about how to de-identify HL7 Messages.

Table of Contents

Getting started

Install and Register Caristix Cloak Software

  • Install Caristix Cloak by clicking on the installation file (.msi file) you received.
  • Launch the software, and fill out the Email, First Name, Last Name, and Organization fields in the registration form.
  • Click the Activate button.

If you have a trial version, you will need to purchase an annual license to continue using Cloak after the end of the trial period.

Managing Files

File Types in Cloak

De-id Rules File

The de-identification settings in the Fields and Data Types tabs can be saved and reused. Cloak loads the last used de-id rules file when the program is opened. The first time you open Cloak, the default de-id rules are used; they contain the fields and data types that cover the 18 HIPAA identifiers that must be de-identified.

HL7 Messages

These are the HL7 messages you load in Cloak for de-identification.

File Menu

  • New – Start a new de-identification process. 
  • Open
    • Default De-Id Rules – Replace the existing de-identification rules with the default de-identification rules that cover the HIPAA Privacy Rule.
    • De-Id Rules – Browse to and open an existing de-id rules file.
    • Messages – Browse to and open a log file to de-identify. Alternatively, messages can be extracted from a database or a Connector (click the Database tab and select a Data Source).
  • Save De-Id Rules – Save the current de-id rules
  • Save De-Id Rules as… – Save a rules file for reuse
  • Configure Cloak WebService – Configure the Cloak WebService stand-alone application (contact sales@caristix.com for more information)
  • De-identify – After review, de-identify an open log file
  • Close – Close Caristix Cloak

De-Identification Concepts

Protecting Patient Data

One of the most important issues in healthcare IT is the protection of patient data. Regulation addresses patient privacy and the use of health information in many countries. In the US, HIPAA regulates the use of PHI (protected health information).

While protecting patient data, HL7 analysts need to share or redistribute HL7 production data for such purposes as porting realistic data to a test system or staging area, providing realistic sample HL7 messages for interface scoping, and providing data for clinical and financial analytics.

The Department of Health and Human Services (HHS) provides a HIPAA Privacy Rule booklet (PDF) that highlights the 18 criteria that can be used to identify patients. All 18 identifiers are categories of data that must be protected. Besides easily recognized personal information, care must be given to protect device identifiers and even IP addresses. De-identification techniques must cover all 18 identifiers.

Definitions

De-identification or Anonymization

This term refers to removing or masking protected information. De-identification removes identifiers from a data set so that information can no longer be linked to a specific individual. For health care information, all identifiers are removed from the information set, including both personally identifiable information (PII) and protected health information (PHI).

Pseudonymization

As a subset of de-identification, pseudonymization replaces data elements with new identifiers. After that substitution, the initial subject cannot be associated with the data set. For health care information, patient information can be pseudonymized by replacing patient-identifying data with completely unrelated data, resulting in a new patient profile. The data appears complete and the data context is preserved while patient information is completely protected.

Re-identification

A pseudonymized data set can be restored to its original state through re-identification. In re-identifying data, a reverse mapping structure (constructed as the data was pseudonymized) is applied. As an example, a pseudonymized data set could be sent for processing to an external system. Once that processed information is returned, the data could be re-identified and pushed to the correct patient file.
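
The relationship between pseudonymization and re-identification can be illustrated with a minimal Python sketch. It is not how Cloak is implemented; the `Pseudonymizer` class, its method names, and the `PATIENT-000001` pseudonym format are all hypothetical and chosen only to show a reverse mapping being built while values are replaced.

```python
from itertools import count


class Pseudonymizer:
    """Replace identifiers with stand-in values and remember the reverse mapping."""

    def __init__(self):
        self._forward = {}   # original value -> pseudonym
        self._reverse = {}   # pseudonym -> original value
        self._sequence = count(1)

    def pseudonymize(self, value):
        # Reuse the pseudonym for a value already seen so records stay linked.
        if value not in self._forward:
            pseudonym = f"PATIENT-{next(self._sequence):06d}"
            self._forward[value] = pseudonym
            self._reverse[pseudonym] = value
        return self._forward[value]

    def reidentify(self, pseudonym):
        # Apply the reverse mapping built during pseudonymization.
        return self._reverse[pseudonym]


p = Pseudonymizer()
alias = p.pseudonymize("SMITH^JOHN")
restored = p.reidentify(alias)
```

Without the reverse mapping, the pseudonymized data cannot be restored, which is why such a mapping has to be protected as carefully as the original data.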

Identifiers

Identifiers are data elements that can directly identify individuals. These include name, email address, telephone number, home address, social security number, and medical card number, among others. Two identifiers may be needed to identify a unique individual.

Quasi-identifiers

Data elements of this type do not directly identify an individual but may provide enough information to narrow down the identity of a specific individual. Gender, date of birth, and zip/postal code have been studied extensively in this context. There is a dependent relationship between quasi-identifiers and the type of data set of which they are a part. As an example, if all members of a data set are male, gender cannot be a meaningful quasi-identifier. In addition, quasi-identifiers are categorical in nature with a finite set of discrete values. It’s relatively easy to search for individuals using quasi-identifiers.

Non-identifiers

Non-identifiers may contain an individual’s personal information but aren’t helpful in reconstructing the initial information. For example, an indicator of an allergy to pollen would be a non-identifying data element. The incidence of such an allergy is extremely high in the general population. Therefore this factor is not a good discriminator among individuals. Again, non-identifiers are dependent on data sets. In the right context, they may be used to identify an individual.

De-identifying HL7 Messages

Overview

De-identification in Cloak works as follows:

[Figure: De-identifying HL7 messages]

Loading HL7 Messages

Load the HL7 message that requires de-identification:

  • On the menu bar, go to File > Open > Messages…
  • In the HL7 Log dialog box that opens, click the plus (+) sign.
  • Navigate to your log file and choose the file to be opened. Add more files by clicking the plus (+) sign.
  • Click Open. The file name will appear highlighted in the HL7 Log dialog box.
  • Click Next to continue.

The log is loaded in the Messages tab. The tab also indicates the number of messages in the viewing pane and the total number of messages in the file you loaded. The Original pane displays the log you loaded while the De-identified pane displays the de-identified log. The split screens scroll synchronously so that the data displayed is mirrored in the original and de-identified logs.

Resize vertically to change the quantity of data displayed in the viewing pane. Place the pointer on the line dividing the two panes and drag the window to increase or decrease its size. Click Hide and Show buttons to hide or view panes as needed.

The fields and data types set for de-identification are highlighted in red for easy visibility.

De-identification Settings

On the left side of the screen are the de-identification settings listed under the Fields and Data Types tabs. Cloak loads settings to cover the 18 HIPAA identifiers by default.

Fields Tab

  • Checkbox: the checkmark indicates an active rule. Uncheck to deactivate a rule.
  • SEGMENT: select a segment.
  • FIELD: select a field.
  • COMPONENT: select a component, if needed.
  • SUBCOMPONENT: select a subcomponent, if needed.
  • ID: sets the primary key. This lets you set the data element used to link several messages to the same patient and keep the clinical history unchanged. Check it for any field that uniquely identifies the patient. For instance, if the patient is identified using PID.3.1 and PID.3.4, make sure there is a rule for each of those two fields and check both as ID. In the same way, if you want to use patient name, gender, and date of birth as the patient identifier, make sure the ID check box is checked for all of those fields. Unchecking it changes how data is generated (a new patient is created for each message).
  • GENERATOR TYPE: sets the data source for de-identified values. Learn about Generator Type settings.
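
The effect of the ID column can be pictured with a short Python sketch. This is an illustration under assumptions, not Cloak's internal logic: messages are represented as flat dictionaries keyed by field path, and `make_cloaker`, the `FAKE000001` ID format, and the replacement name are hypothetical.

```python
from itertools import count


def make_cloaker(id_fields):
    """Return a function that cloaks messages, reusing fake data per patient key."""
    fake_by_key = {}      # tuple of ID-field values -> fake patient data
    sequence = count(1)

    def cloak(message):
        # The patient key is built from every field flagged as ID.
        key = tuple(message.get(field, "") for field in id_fields)
        if key not in fake_by_key:
            n = next(sequence)
            fake_by_key[key] = {
                "PID.3.1": f"FAKE{n:06d}",
                "PID.5": f"TESTPATIENT{n}^CLOAKED",
            }
        # Overwrite the sensitive fields; other fields are left untouched.
        return {**message, **fake_by_key[key]}

    return cloak


cloak = make_cloaker(id_fields=["PID.3.1", "PID.3.4"])
first = cloak({"PID.3.1": "53572812", "PID.3.4": "HOSP-A", "PID.5": "SMITH^JOHN"})
second = cloak({"PID.3.1": "53572812", "PID.3.4": "HOSP-A", "PID.5": "SMITH^JOHN"})
other = cloak({"PID.3.1": "99999999", "PID.3.4": "HOSP-A", "PID.5": "DOE^JANE"})
```

Here `first` and `second` receive the same fake patient because their ID fields match, while `other` receives a different one. With no ID fields, every message would need its own fake patient, which corresponds to the unchecked behavior described above.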

Data Types Tab

  • Checkbox: the checkmark indicates an active rule. Uncheck to deactivate.
  • DATATYPE: select a datatype.
  • COMPONENT: select a component.
  • GENERATOR: sets the data source for de-identified values. Learn about Generator settings.

Add or Remove Rules

To add a de-identification rule under Fields or Data Types:

  • Click the plus sign at the bottom of the list of selectors. A new line will appear.
  • Edit using dropdowns in each column.

To remove a setting, click the trashcan at the end of the line.

View Example and Save a De-identified HL7 Log File

Once you have created and configured all the selectors applicable to the HL7 log to be de-identified, click View Example at the bottom of the left hand pane. A preview of the de-identified log file will appear. Scroll through the log in the viewing panes to verify the potential results of the de-identification process.

Once reviewed and after applying any changes:

  • Click De-identify at the bottom of the left hand screen to the right of View Example to save the de-identified log file.
  • The Save Results dialog box will appear with a number of options. Click the appropriate radio button for that log file. The options are:
    • Save the file with the initial file structure
    • Divide the file into smaller chunks of a specific size in MB or number of messages
    • Keep everything in a single file without splitting it
  • Click Save and browse to the location to store the file. Click Save.
  • A window will open tracking the progress of the process.

Results Summary

Once saved, a De-identification Process Report dialogue box will open asking if you wish to create a de-identification process report. Click Yes or No. If Yes is clicked, you will be prompted to choose a location to save the generated PDF and to give a name to the file. Click Save and the file will be saved to the specified location. The PDF of the De-identification Process Summary will open on your desktop for review.

Once a set of selectors has been chosen for the de-identification of a log file, that set can be saved for reuse.

  • On the File menu, click Save De-Id Rules.
  • Choose a location, enter a file name for the settings, and click Save.

Once a log file has been opened, the saved de-identification rules can be applied by clicking Open > De-Id Rules on the File menu.

Generators

Generators are the data sources used to set de-identification values in Cloak.

Generator – Recommended Use

  • String – Insert a randomly generated string or static value. You can set the length and other parameters.
  • Boolean – Insert a Boolean value (true or false).
  • Numeric – Insert a randomly generated number. You can set the length, decimals, and other parameters.
  • Date Time – Insert a randomly generated date-time value. You can set the range, time unit, format, and other parameters.
  • Table – Pull data from HL7-related tables stored in one of your profiles. Useful for coded fields.
  • SQL Query – Pull data from a database based on an SQL query. You’ll be able to configure a database connection.
  • Text – Pull random de-identification data from a text file, for instance a list of names. Several file formats can be used: txt, csv, etc.
  • Excel – Pull random de-identification data from an Excel 2007 or later spreadsheet, for instance a list of names, addresses, and cities.
  • Use Original Value – Keep the field as-is. No de-identification rules will be applied.
  • Copy Another Field – Copy the contents of another field.
  • Unstructured Data – Find and replace sensitive data in free text fields, for instance a patient’s last name in physician notes.

Generator Settings

Each generator has its own settings, which you can edit from the Value Generator tab. Click on the generator name to navigate to the setting details.

Advanced Mode

Advanced Mode allows you to use more than one generator for a single field, edit the output format, or preformat the source value. You can also set preconditions to conditionally apply the de-identification rule.

Preformat Value

(Only available in Advanced Mode)

Use this to format the original value before it is processed.

This is useful for generators that include the original value or ID fields. Here are two usage examples:

a) In an unstructured data field, you may wish to remove a value that is not contained elsewhere (not already cloaked in another field):

If you know the field may contain a reference to an ID defined as ‘ID-999999’, you would:

1. Cloak the field using an Unstructured Data generator.

2. Set the following preformat for the unstructured data:

Find what:

ID-\d+     (Search for a text, anywhere in the field value, starting with 'ID-' and followed by one or more numbers.)

Replace by:

ID-XXXX     (We set a static text to hide the ID but still keep the context of the text.)

b) If you have the same patient ID number in two systems, but formatted differently, you could format them so that both systems change to the same ID format and can both be recognized as the same patient. Having the same ID will provide continuity of the message flow for a patient (messages will be cloaked using the same fake data):

If, for example, PID.2 is defined like this for the two systems:

First system: ID-123456
Second system: 123-456

You would need to:

1. Set the field PID.2 as an ID (by checking the ID column).

2. Define two preformats like this:

Find what:

^ID-(?<ID_Number>\d+)$      (Find an exact match for the format and capture only the digits in a group named 'ID_Number'.)

Replace by:

${ID_Number}    (Keep only the number, removing the superfluous text.)

Find what:

^(?<ID_Number_Part_1>\d+)-(?<ID_Number_Part_2>\d+)$      (Find an exact match for the format and capture the digits in two groups named 'ID_Number_Part_1' and 'ID_Number_Part_2'.)

Replace by:

${ID_Number_Part_1}${ID_Number_Part_2}     (Keep only the digits, removing the hyphen.)

Now both systems will treat PID.2 as being ‘123456’ and match and cloak the messages properly as being the same patient.
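
To check patterns like these before entering them in Cloak, you could try them in a scripting language. The sketch below uses Python's `re` module as a stand-in. Note that Python writes named groups as `(?P<name>...)` and references them as `\g<name>`, whereas the patterns above use `(?<name>...)` and `${name}`; the function names and sample values are hypothetical.

```python
import re


def mask_ids(text):
    """Example a): hide an ID anywhere in free text but keep the context."""
    return re.sub(r"ID-\d+", "ID-XXXX", text)


# Example b): one (pattern, replacement) pair per source-system format.
PREFORMATS = [
    (re.compile(r"^ID-(?P<ID_Number>\d+)$"), r"\g<ID_Number>"),
    (
        re.compile(r"^(?P<ID_Number_Part_1>\d+)-(?P<ID_Number_Part_2>\d+)$"),
        r"\g<ID_Number_Part_1>\g<ID_Number_Part_2>",
    ),
]


def preformat(value):
    """Apply each preformat in turn; a pattern that does not match leaves the value as-is."""
    for pattern, replacement in PREFORMATS:
        value = pattern.sub(replacement, value)
    return value


masked = mask_ids("Chart reference ID-999999 reviewed.")
first_system = preformat("ID-123456")
second_system = preformat("123-456")
```

Both `first_system` and `second_system` come out as `123456`, so a key built from the preformatted value is the same for the two systems.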

String

This generator creates a character string: a randomly generated string, a static value, or Lorem Ipsum text.

How to use the “String” generator to create random value:

  • Check the Random option.
  • Set the minimum length of the strings you want to generate. The minimum value for this configuration is 0. A string with a length of 0 is equivalent to an empty string.
  • Set the maximum length of the strings you want to generate.
  • Include lowercase letters (a to z characters).
  • Include uppercase letters (A to Z characters).
  • Include digits (0 to 9 characters).
  • Include special characters. This allows you to include any character you want.
  • Include random blanks. Including random blanks generates empty strings among the values for use in the field or data type.

How to use the “String” generator to set a static value:

  • Check the Static option.
  • Set the static value to be inserted.

How to use the “String” generator to set a Lorem Ipsum text:

  • Check the Lorem Ipsum option.
  • Set the minimum length of the strings you want to generate. The minimum value for this configuration is 0. A string with a length of 0 is equivalent to an empty string.
  • Set the maximum length of the strings you want to generate.
  • Include random blanks. Including random blanks means that you generate empty strings among the values for use in the field or data type.

Example #1:
  • Random
  • Minimum length: 0
  • Maximum length: 5
  • Include random blanks: checked

Generated values:
  • XDZ
  • VOJHZ
  • (empty string)
  • BFAR

Example #2:
  • Static
  • Static Value: MyNewValue

Generated values:
  • MyNewValue
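
The Random options can be approximated with a few lines of Python. This is a sketch of the general technique rather than Cloak's generator: the function name, parameter names, and the `blank_rate` probability are assumptions, since the manual does not say how often blanks are produced.

```python
import random
import string


def random_string(rng, min_len=0, max_len=5, lowercase=False, uppercase=True,
                  digits=False, special="", blank_rate=0.0):
    """Build one random string from the enabled character classes."""
    # Optionally emit an empty value, mimicking "Include random blanks".
    if rng.random() < blank_rate:
        return ""
    alphabet = special
    if lowercase:
        alphabet += string.ascii_lowercase
    if uppercase:
        alphabet += string.ascii_uppercase
    if digits:
        alphabet += string.digits
    if not alphabet:
        raise ValueError("enable at least one character class")
    # The length is drawn between the minimum and maximum, inclusive.
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)  # seeded only to make the sketch repeatable
values = [random_string(rng, min_len=0, max_len=5, blank_rate=0.2) for _ in range(4)]
print(values)
```

With a minimum length of 0, an empty string can also appear when the drawn length is 0, independently of the random-blanks option.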

Boolean

This generator creates a Boolean (True or False) value.

 

How to use the Boolean generator:

  • Random values
    • Generate True or False value randomly.
    • Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.
  • Sequential list
    • Generate a sequence of True, False, True, False, True, etc.
    • Start new list. Always start the sequence with True.
    • Continue from previous list. If you run the de-identification and it ends with True, the next time it will start with False.

 

Example #1:
  • Random values
  • Include random blanks: unchecked

Generated values:
  • True
  • True
  • False
  • True
  • False



Numeric

This generator creates a number.

How to use the “Numeric” generator:

  • Random values
    • Randomly generate values between minimum and maximum limits.
    • Decimal. Set the decimal precision of the generated value. For example, a precision of 2 generates values with two decimals (such as 3.75).
    • Include random blanks. Including random blanks generates empty strings among the values for use in the field or data type.
  • Sequential list
    • Generate a sequence of 0, 1, 2, 3, etc.
    • Decimal. Set the decimal precision of the generated value. For example, a precision of 2 generates values with two decimals (such as 3.75).
    • Increment by. The step to use between each generation. You can use a negative value.
    • Start new list. Always start with the minimum limit, or with the maximum limit if you’re using a negative increment.
    • Continue from previous list. If you run the de-identification and it ends with 13, the next time it will start with 14.

Example #1:
  • Sequential list
  • Between: 10 and 1000
  • Decimal: 2
  • Increment by: 5
  • Start new list

Generated values:
  • 10.34
  • 15.2
  • 20.85
  • 25.39
  • 30.12

Example #2:
  • Random values
  • Between: 10 and 1000
  • Decimal: 0
  • Include random blanks: unchecked

Generated values:
  • 353
  • 942
  • 359
  • 626
  • 967



Date time

This generator creates date and time values.

How to use the “Date time” generator:

  • Random values
    • Randomly generate values in a range between minimum and maximum limits of time unit (second, minute, etc.)
    • Based on.
      • Now. Will use the current date time as a reference.
      • Actual field value. Will use the date time value from the field in the original message.
      • A specific date. You can specify a date and time to use as a reference.
    • Date format. Set the format of the new date-time. Note that you have a choice of formats. You can also enter your own format manually.
    • Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.
  • Sequential list
    • Generate a sequence of date-time like: 2013-12-12, 2013-12-13, 2013-12-14, 2013-12-15, etc.
    • Based on.
      • Now. Will use the current date time as a reference.
      • Actual field value. Will use the date time value from the field in the original message.
      • A specific date. You can specify a date and time to use as a reference.
    • Date format. Set the format of the new date-time. Note that you have a choice of formats. You can also enter your own format manually.
    • Increment by. The step to use between each generation. You can use a negative value and set a time unit (second, minute, etc.)
    • Start new list. Always start with the minimum limit, or with the maximum limit if you’re using a negative increment.
    • Continue from previous list. If you run the de-identification and it ends with 2013-12-13, the next time it will start with 2013-12-14.

Example #1:
  • Random values
  • In a range between: 10 and 1000 Day
  • Based on: A specific date (2012-01-01 00:00:00)
  • Date format: yyyyMMdd
  • Include random blanks: unchecked

Generated values:
  • 20120318 (reference date + 77 days)
  • 20120614 (reference date + 165 days)
  • 20140102 (reference date + 732 days)
  • 20120212 (reference date + 42 days)
  • 20130508 (reference date + 493 days)

Example #2:
  • Sequential list
  • In a range between: 0 and 1440 Minute
  • Based on: A specific date (2012-01-01 09:15:30)
  • Date format: yyyyMMddHHmmss
  • Increment by: 15 minutes

Generated values:
  • 20120101091530 (reference date + 0 minutes)
  • 20120101093030 (reference date + 15 minutes)
  • 20120101094530 (reference date + 30 minutes)
  • 20120101100030 (reference date + 45 minutes)
  • 20120101101530 (reference date + 60 minutes)

Example #3:
  • Sequential list
  • In a range between: 0 and 30 Minute
  • Based on: A specific date (2012-01-01 09:15:30)
  • Date format: yyyyMMddHHmmss
  • Increment by: 10 minutes

Generated values:
  • 20120101091530 (reference date + 0 minutes)
  • 20120101092530 (reference date + 10 minutes)
  • 20120101093530 (reference date + 20 minutes)
  • 20120101094530 (reference date + 30 minutes)
  • 20120101091530 (reference date + 0 minutes)

When the generator exceeds the maximum value (30), the sequence is reset starting at the minimum value (0).
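
The reset behavior can be reproduced with a small Python generator. This is a sketch under assumptions: the function and parameter names are hypothetical, only positive increments are handled, and the yyyyMMddHHmmss format is written as the `strftime` pattern `%Y%m%d%H%M%S`.

```python
from datetime import datetime, timedelta
from itertools import islice


def sequential_datetimes(reference, minimum, maximum, step,
                         unit="minutes", fmt="%Y%m%d%H%M%S"):
    """Yield reference + offset, wrapping to the minimum once the maximum is passed."""
    offset = minimum
    while True:
        yield (reference + timedelta(**{unit: offset})).strftime(fmt)
        offset += step
        if offset > maximum:
            # The range is exhausted: restart the sequence at the minimum.
            offset = minimum


reference = datetime(2012, 1, 1, 9, 15, 30)
values = list(islice(sequential_datetimes(reference, 0, 30, 10), 5))
print(values)
```

With the settings of Example #3 (range 0 to 30 minutes, increment 10), the fifth value returns to the reference date.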

Example #4: Manipulate date of birth
  • Random values
  • In a range between: -3650 and 3650 Day
  • Based on: Actual field value
  • Date format: yyyyMMdd
  • Include random blanks: unchecked

Original field value → Generated value:
  • 19130113 → 19110213
  • 19900909 → 20000812
  • 19850909 → 19870514
  • 19601020 → 19650218
  • 19800317 → 19880617
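
Shifting a date of birth by a random offset from its original value, as in Example #4, can be sketched in Python as follows. The function name and defaults are hypothetical, and yyyyMMdd is written as the `strptime`/`strftime` pattern `%Y%m%d`.

```python
import random
from datetime import datetime, timedelta


def shift_date(value, rng, min_days=-3650, max_days=3650, fmt="%Y%m%d"):
    """Move a date by a random number of days within [min_days, max_days]."""
    original = datetime.strptime(value, fmt)
    offset = rng.randint(min_days, max_days)  # both limits are inclusive
    return (original + timedelta(days=offset)).strftime(fmt)


rng = random.Random(1)  # seeded only to make the sketch repeatable
shifted = shift_date("19850909", rng)
print(shifted)
```

A range of roughly ten years on either side hides the exact birth date, but the result can land in a different age group than the original, which may matter for clinical test scenarios.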



HL7 Table

This generator pulls data from HL7-related tables stored in a profile. Read how to set the profile.

How to configure the generator to use the appropriate HL7 table:

  • Random values
    • Randomly generate values from an HL7 table.
    • Source. Select the profile containing the table.
    • Table. Select a table from which the value will be generated.
    • To access the table content, click on the Edit Table button. If you change the table content, the new table content will appear in the profile you select.
    • Restrict to values between. Will only use table entries that are within the specified limits.
    • Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.
  • Sequential list
    • Generate a sequence of values starting with the first table entry.
    • Source. Select the profile containing the table.
    • Table. Select a table from which the value will be generated.
    • To access the table content, click on the Edit Table button. If you change the table content, the new table content will appear in the profile you select.
    • Restrict to values between. Will only use table entries that are within the specified limits.
    • Start new list. Always start with the first entry of the table.
    • Continue from previous list. If you run the de-identification and it ends with the 13th entry, the next time it will start with the 14th one.

Example #1:
  • Random values
  • Table: 0001 – Administrative Sex
  • Restrict to values between: 1 and 1 characters
  • Include random blanks: unchecked

Generated values:
  • N
  • M
  • A
  • F
  • A

Example #2:
  • Sequential list
  • Table: 0001 – Administrative Sex
  • Start new list

Generated values:
  • A
  • F
  • M
  • N
  • O



SQL Query

This generator pulls data from an SQL-accessible database.

How to configure this generator to use SQL query results as de-identified values:

  • Select a database connection. If no database connections are configured, click Connections… to set up a connection.
  • Enter the SQL query. You can use the embedded Query Builder to help you build the query.
  • Restrict to values between. Will only use values that are within the specified limits.
  • Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.

Example #1:
  • Connection: Connection1
  • Query: SELECT name FROM employees
  • Restrict to values between: 1 and 20 characters
  • Include random blanks: Unchecked

Generated values:
  • John Smith
  • Jane Doe
  • Road Runner
  • The Coyote
  • Tweety Bird
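
The same idea can be tried outside Cloak with Python's built-in `sqlite3` module. This sketch uses an in-memory database in place of a configured connection; the table contents and the `candidate_values` helper are hypothetical, and the length limits are interpreted as a character count, as in the example above.

```python
import random
import sqlite3

# In-memory database standing in for a configured connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)",
    [("John Smith",), ("Jane Doe",), ("Road Runner",), ("The Coyote",),
     ("Tweety Bird",), ("A name that is far longer than twenty characters",)],
)


def candidate_values(connection, query, min_len=1, max_len=20):
    """Run the query and keep first-column values whose length is within the limits."""
    rows = connection.execute(query).fetchall()
    return [str(row[0]) for row in rows
            if row[0] is not None and min_len <= len(str(row[0])) <= max_len]


values = candidate_values(conn, "SELECT name FROM employees")
rng = random.Random(0)  # seeded only to make the sketch repeatable
picked = [rng.choice(values) for _ in range(5)]
print(picked)
```

The overlong name is filtered out before any value is drawn, so it can never appear in a de-identified field.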



Text File

This generator pulls data from a text file (*.txt, *.csv, etc.).

How to configure this generator to use text file content:

  • Random values
    • Randomly generate values from a text file.
    • File. Specify the source of the text file. Use the Browse… button to select a file.
    • Column. Specify the column id to use (for a character-delimited file, e.g. *.csv).
    • Column delimiter. The character that separates each column in the text file.
    • First/Last rows. Specify the range of rows from which to read data.
    • Between character position. Will only use characters that are within the specified positions.
    • Restrict to values between. Will only use values that are within the specified limits.
    • Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.
  • Sequential list
    • Generate a sequence of values from a text file starting with the first row.
    • File. Specify the source of the text file. Use the Browse… button to select a file.
    • Column. Specify the column id to use (for a character-delimited file, e.g. *.csv).
    • Column delimiter. The character that separates each column in the text file.
    • First/Last rows. Specify the range of rows from which to read data.
    • Between character position. Will only use characters that are within the specified positions.
    • Restrict to values between. Will only use values that are within the specified limits.
    • Start new list. Always start with the first row in the text file.
    • Continue from previous list. If you run the de-identification and it ends with the 13th entry, the next time it will start with the 14th one.

Note: If more than one field is configured using the same text file, the same line will be used within the same message. In other words, you can use a text file to ensure several values will be used together. This can be useful when linking a city with a zip code or a first name with a gender.

The examples below use the following content in a file C:\MyDocuments\myFile.txt

1,Road Runner,M,ACME,Anycity,12345
2,The Coyote,M,ACME,Anycity,12345
3,Sylvester The Cat,M,ACME,Anycity,12345
4,Tweety Bird,M,ACME,Anycity,12345
5,John Smith,M,,Anothercity,98765
6,Jane Doe,F,,Anothercity,98765

 

Example #1:
  • Random values
  • File: C:\MyDocuments\myFile.txt
  • Column: 2
  • Delimiter: ,
  • Restrict to values between: 1 and 20 characters
  • Include random blanks: Unchecked

Generated values:
  • John Smith
  • Jane Doe
  • Road Runner
  • The Coyote
  • Tweety Bird

Example #2:
  • Sequential list
  • File: C:\MyDocuments\myFile.txt
  • Column: 3
  • Delimiter: ,
  • Restrict to values between: 1 and 20 characters
  • Start new list

Generated values:
  • M
  • M
  • M
  • M
  • M
  • F
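
The note about several fields sharing one line can be pictured with a short Python sketch. It is an illustration only: the file content is embedded as a string, columns are counted from 1 as in the examples, and `load_rows`, `cloak_message`, and the choice of target fields are hypothetical.

```python
import csv
import io
import random

FILE_CONTENT = """1,Road Runner,M,ACME,Anycity,12345
2,The Coyote,M,ACME,Anycity,12345
3,Sylvester The Cat,M,ACME,Anycity,12345
4,Tweety Bird,M,ACME,Anycity,12345
5,John Smith,M,,Anothercity,98765
6,Jane Doe,F,,Anothercity,98765
"""


def load_rows(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of columns)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def cloak_message(rows, column_by_field, rng):
    """Pick ONE row per message so every field drawn from the file stays consistent."""
    row = rng.choice(rows)
    # Columns are 1-based in the rule settings, 0-based in the parsed row.
    return {field: row[column - 1] for field, column in column_by_field.items()}


rows = load_rows(FILE_CONTENT)
rng = random.Random(7)  # seeded only to make the sketch repeatable
values = cloak_message(rows, {"PID.5": 2, "PID.8": 3, "PID.11.3": 5, "PID.11.5": 6}, rng)
print(values)
```

Because all four fields are read from the same row, the city always comes with its matching zip code and the name with its matching gender.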



Excel file

This generator pulls data from an Excel 2007+ file (*.xlsx).

How to configure the generator to use Excel file content:

  • Random values
    • Randomly generate values from an Excel file.
    • File. Specify the source of the Excel file. Use the Browse… button to select a file.
    • Worksheet. Specify the Worksheet to use.
    • Column. Specify the column to use.
    • First/Last rows. Specify the rows to get data.
    • Restrict to values between. Will only use values that are within the specified limits.
    • Include random blanks. Allowing random blanks will mean that you generate empty strings among the values for use in the field or data type.
  • Sequential list
    • Generate a sequence of values from an Excel file starting with the first row.
    • File. Specify the source of the Excel file. Use the Browse… button to select a file.
    • Worksheet. Specify the Worksheet to use.
    • Column. Specify the column to use.
    • First/Last rows. Specify the range of rows from which to read data.
    • Restrict to values between. Will only use values that are within the specified limits.
    • Start new list. Always start with the first row in the Excel file.
    • Continue from previous list. If you run the de-identification and it ends with the 13th entry, the next time it will start with the 14th one.

Note: If more than one field is configured using the same worksheet, the same row will be applied across a message. In other words, you can use an Excel file to ensure that several values will be used together. This can be useful when link a city with a zip code or a first name with a gender.

The examples below use the following content from a file named C:\MyDocuments\myExcelFile.xlsx

1 | Road Runner | M | ACME | Anycity | 12345
2 | The Coyote | M | ACME | Anycity | 12345
3 | Sylvester The Cat | M | ACME | Anycity | 12345
4 | Tweety Bird | M | ACME | Anycity | 12345
5 | Jane Doe | F | | Anothercity | 98765
6 | John Smith | M | | Anothercity | 98765

 

Example #1:
  • Random values
  • File: C:\MyDocuments\myExcelFile.xlsx
  • Worksheet: TheFirstSheet
  • Column: 2
  • Restrict to values between: 1 and 20 characters
  • Include random blanks: Unchecked
Generated Values:
John Smith
Jane Doe
Road Runner
The Coyote
Tweety Bird
Example #2:
  • Sequential list
  • File: C:\MyDocuments\myExcelFile.xlsx
  • Worksheet: TheFirstSheet
  • Column: 3
  • Restrict to values between: 1 and 20 characters
  • Start new list
Generated Values:
M
M
M
M
F
M
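
The difference between "Start new list" and "Continue from previous list" can be sketched as follows. This is a minimal illustration under stated assumptions, not Cloak's code: `sequentialList` is a hypothetical helper, and wrapping around at the end of the list is an assumption of the sketch.

```javascript
// Column 3 of the sample worksheet shown above (rows 1 to 6).
const genderColumn = ["M", "M", "M", "M", "F", "M"];

// Walks the list in order. `startIndex` is 0 for "Start new list"; for
// "Continue from previous list" it is the position reached by the previous run.
function sequentialList(values, startIndex = 0) {
  let index = startIndex;
  return {
    next() {
      const value = values[index % values.length]; // wrap-around is an assumption
      index += 1;
      return value;
    },
    position() {
      return index;
    },
  };
}

// First run: starts with the first row and consumes three values.
const firstRun = sequentialList(genderColumn);
const firstValues = [firstRun.next(), firstRun.next(), firstRun.next()];

// Second run, continuing from the previous list: resumes at the fourth row.
const secondRun = sequentialList(genderColumn, firstRun.position());
const secondValues = [secondRun.next(), secondRun.next(), secondRun.next()];

console.log(firstValues, secondValues);
```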



Use original value

Use this generator when you don’t want a data element to be changed. Here are a few use case examples.

Use Case #1

Use Case #2

  • De-identify all fields with XPN data type except for the attending doctor

MSH|^~&|SYSTEM-A|1 |||20100404210829||ADT^A01|20100404000000645509|P|2.3|||||CA|ASCII
EVN|A01|201004042108||129|Interface^HL7 Interface|201004042106
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
PV1|0001|I|2C^2322^2322-0^1^^^^^3|1|50386||1083278^MCFEE,MIKE^^^^^||||||||||1083278^MCFEE,MIKE|1|50386|1||||||||||||||||||||||||201004042106||||||||
PV2||||||||||||||||||||||N

 

If the data type Extended Person Name (XPN) is part of the list of data types to de-identify, you might need to preserve some of the fields using this data type.

Data Type | Component | Generator
XPN | 2 – Given Name | Excel File
FN | 1 – Surname | Excel File

 

Segment | Field | Component | Subcomponent | ID | Generator
PV1 | 7 – Attending Doctor | | | | Use Original Value

 

Using this configuration, you would make sure all names are de-identified except the attending doctor’s name.

MSH|^~&|SYSTEM-A|1 |||20100404210829||ADT^A01|20100404000000645509|P|2.3|||||CA|ASCII
EVN|A01|201004042108||129|Interface^Johnson|201004042106
PID|0001|ID53572812^^^|0126271^^^^^1||Johnson^Deborah||195307280000|M|Johnson^Deborah||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
PV1|0001|I|2C^2322^2322-0^1^^^^^3|1|50386||1083278^MCFEE,MIKE^^^^^||||||||||1083278^Johnson|1|50386|1||||||||||||||||||||||||201004042106||||||||
PV2||||||||||||||||||||||N

 

Use Case #3

  • Prevent de-identifying a field that is defined as an ID

    Fields defined as IDs must have a generator associated with them. If, for some reason, you prefer to keep the original value, select this generator so that the value is left unchanged.

Use Case #4

  • Re-use the original data and combine it with other generators

    In Advanced Mode, you can de-identify the original value by specifying several generators, but you could also include the original value to combine it with other generated values.



Copy Another Field

This generator replicates the value from another de-identified field.

How to use the “Copy Another Field” generator:

  • Add a new de-id rule by right-clicking the field to de-identify.
  • Select the Copy Another Field generator.
  • Set the Segment, Field, Component and SubComponent of the source field.
  • The source field can be any other field present in the message.

Example 1: copy the replacement MRN value from PID.2 to ZCA.3
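
The effect of this rule can be sketched on a pipe-delimited message. The snippet is a simplified illustration, not how Cloak works internally: `getField`, `setField` and `copyField` are hypothetical helpers, the two segments are made-up sample data, and components and subcomponents are ignored.

```javascript
// Read a field from a pipe-delimited HL7 v2 segment. Not valid for MSH,
// whose field numbering is offset by the field separator.
function getField(segment, fieldNumber) {
  return segment.split("|")[fieldNumber] ?? "";
}

// Write a field, padding the segment if it has fewer fields than needed.
function setField(segment, fieldNumber, value) {
  const fields = segment.split("|");
  while (fields.length <= fieldNumber) fields.push("");
  fields[fieldNumber] = value;
  return fields.join("|");
}

// Copy the value of the source field into the target field.
function copyField(segments, from, to) {
  const source = segments.find((s) => s.startsWith(from.segment + "|"));
  if (!source) return segments;
  const value = getField(source, from.field);
  return segments.map((s) =>
    s.startsWith(to.segment + "|") ? setField(s, to.field, value) : s
  );
}

// Hypothetical fragment: PID.2 already holds the replacement MRN,
// while ZCA.3 still holds the original one.
const deIdentified = ["PID|0001|123459876", "ZCA|1||ID53572812"];
const copied = copyField(
  deIdentified,
  { segment: "PID", field: 2 },
  { segment: "ZCA", field: 3 }
);
console.log(copied.join("\n"));
```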


 

Unstructured Data

Sensitive data can be found in unstructured data (free text) such as clinician notes or other narrative text. Most of the data within an unstructured field is not sensitive, but there are times when it might contain data elements you want to protect.

 

This generator scans the free text for any value found in another message field that is set for de-identification, and replaces each occurrence with the replacement value generated for that field.

Example #1

In the following message, the name of the patient is mentioned in the patient update note (NTE.3).

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr. Smith provided new phone numbers


If the patient name (PID.5.1 field) is listed among the de-identification rules, you can configure a new field to detect the patient name within NTE.3.

Segment | Field | Component | Subcomponent | ID | Generator
PID | 5 – Patient Name | 1 – Family Name | | | Excel File
NTE | 3 – Comment | | | | Unstructured Data

 

Using these settings, the de-identified message will look like this:

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||Doe^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr. Doe provided new phone numbers
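
As a rough sketch of the idea, the replacement can be thought of as a search for each original value in the free text. The code below is an assumption-laden illustration rather than Cloak's algorithm: `scrubFreeText` is a hypothetical function, and the case-insensitive whole-word matching is a choice made for this sketch.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// `replacements` maps each original value (taken from the fields that have
// de-identification rules) to the replacement value generated for it.
function scrubFreeText(text, replacements) {
  let result = text;
  for (const [original, replacement] of replacements) {
    // Case-insensitive whole-word match: an assumption of this sketch.
    const pattern = new RegExp(`\\b${escapeRegExp(original)}\\b`, "gi");
    // A function replacement avoids special handling of "$" in the new value.
    result = result.replace(pattern, () => replacement);
  }
  return result;
}

// PID.5.1 "SMITH" was replaced by "Doe", so the note is updated accordingly.
const scrubbedNote = scrubFreeText(
  "Mr. Smith provided new phone numbers",
  new Map([["SMITH", "Doe"]])
);
console.log(scrubbedNote);
```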

 

Example #2

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr Smith (ID53572812) provided new phone numbers
NTE|||Mr Smith also provided a new address

 

If the patient ID (PID.2 field) is listed among the de-identification rules, you can configure a new field to detect the patient ID within NTE.3.

Segment | Field | Component | Subcomponent | ID | Generator
PID | 2 – Patient ID | | | | Numeric
PID | 5 – Patient Name | 1 – Family Name | | | Excel File
NTE | 3 – Comment | | | | Unstructured Data

 

Using these settings, the de-identified message will look like this:

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|123459876^^^|0126271^^^^^1||Doe^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||Mr Doe (123459876) provided new phone numbers
NTE|||Mr Doe also provided a new address

 

Example #3

Sometimes, a field may be Base64-encoded, as seen below.

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||SMITH^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||TXIuIFNtaXRoIHByb3ZpZGVkIG5ldyBwaG9uZSBudW1iZXJz


In the above message, the decoded value of the NTE.3 field is “Mr. Smith provided new phone numbers”. To detect and de-identify the patient’s name, in addition to including the patient name (PID.5.1 field) into the de-identification rules, you need to tick the “Decode message field from base 64 format before De-Identifying” checkbox. This will decode the field, de-identify it, and then re-encode it into Base64.

Segment | Field | Component | Subcomponent | ID | Generator
PID | 5 – Patient Name | 1 – Family Name | | | Excel File
NTE | 3 – Comment | | | | Unstructured Data

 

Using these settings, the de-identified message will look like this:

MSH|^~&|SYSTEM-A|1|||20100404210829||ADT^A08|20100404000000645509|P|2.3|||||CA|ASCII
PID|0001|ID53572812^^^|0126271^^^^^1||Doe^JOHN||195307280000|M|SMITH^JOHN||1 FIFTH AVENUE^NEW YORK^NEW YORK^^33333^USA^P^53052^16||(555)555-5555|(555)555-5555|^|2|||238898464|||||||||C1||N
NTE|||TXIuIERvZSBwcm92aWRlZCBuZXcgcGhvbmUgbnVtYmVycw==

 

The decoded value of the above NTE.3 field is “Mr. Doe provided new phone numbers.”
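
The decode, de-identify and re-encode sequence from Example #3 can be reproduced with a few lines of Node.js. This is only a sketch of the sequence: `scrubBase64Field` is a hypothetical name and the literal search-and-replace stands in for the actual de-identification step.

```javascript
// Decode a Base64 field, replace the original value, then re-encode it.
// Uses the Node.js Buffer API.
function scrubBase64Field(encoded, original, replacement) {
  const decoded = Buffer.from(encoded, "base64").toString("utf8");
  // Simple literal replacement of every occurrence (stand-in for de-identification).
  const scrubbed = decoded.split(original).join(replacement);
  return Buffer.from(scrubbed, "utf8").toString("base64");
}

// NTE.3 value from the original message in Example #3.
const reEncoded = scrubBase64Field(
  "TXIuIFNtaXRoIHByb3ZpZGVkIG5ldyBwaG9uZSBudW1iZXJz",
  "Smith",
  "Doe"
);
console.log(reEncoded);
```

The result matches the NTE.3 value of the de-identified message in Example #3.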

Preconditions

When creating a de-identification rule, you can optionally create and apply a precondition to decide whether or not to apply the rule to a given field. Preconditions are scripts that are written with our JavaScript API.

You can add a precondition to an existing de-identification rule by going into Advanced Mode and selecting “Add Precondition.” This will open a window that allows you to write the script for the precondition and to test it by supplying messages in the Test Data window.

To decide whether or not the precondition is satisfied, use the callback() method. The callback() method accepts a boolean (true or false) which determines whether or not the precondition is satisfied. If the precondition is satisfied, the de-identification rule will be applied to the given field. If it is not satisfied, the de-identification rule will not be applied.

 

HL7 Messaging De-Identification Context

During HL7 message de-identification, the JavaScript engine context is updated, allowing you to access the current element being de-identified. The context has the following properties:

  • profile: Allows you to fetch data from the profile. See the Profile object definition.
  • message: Allows you to access the message being de-identified and any of its properties or methods. See the Message object definition.
  • segment: Allows you to access the current segment being de-identified and any of its properties or methods. See the Segment object definition.
  • field: Allows you to access the current field being de-identified and any of its properties or methods. See the Field object definition.
  • component: Allows you to access the current component being de-identified and any of its properties or methods. See the Component object definition.
  • subComponent: Allows you to access the current sub-component being de-identified. See the SubComponent object definition.
  • dataType: Allows you to access the current data type instance being de-identified when a data type de-identification rule is being applied. The data type can be any Field, Component or SubComponent.

 

Example

The following is an example of a precondition.

Suppose that you only wanted to apply a de-identification rule to the PID.3.1 – ID Number component in a message if the ID was a medical record number. In other words, if the value of the PID.3.5 – Identifier Type Code component was “MR”. The precondition you’d use would look like this:
 

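// Read the PID.3.5 - Identifier Type Code component of the current field.
var patientIdTypeCode = context.field.get('5');
// Apply the rule only when the identifier is a medical record number.
callback(patientIdTypeCode == 'MR');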

 
Here, the precondition context’s field is the PID.3 – Patient Identifier List field.
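
To reason about such a script outside of Cloak, it can be run against a mock context. The harness below is purely illustrative: `makeField` and `runPrecondition` are hypothetical stand-ins and are not part of the Caristix JavaScript API. Only `context.field.get()` and `callback()` are taken from the example above.

```javascript
// Stand-in for the field object: splits a raw PID.3 value into components.
// This is NOT the Caristix API, just enough to exercise the script.
function makeField(rawValue) {
  const components = rawValue.split("^");
  return { get: (position) => components[Number(position) - 1] ?? "" };
}

// Runs a precondition against a mock context and returns the boolean
// that the script passes to callback().
function runPrecondition(script, rawFieldValue) {
  let satisfied = false;
  const context = { field: makeField(rawFieldValue) };
  script(context, (result) => {
    satisfied = Boolean(result);
  });
  return satisfied;
}

// The precondition from the example, wrapped in a function for testing.
const isMedicalRecordNumber = (context, callback) => {
  var patientIdTypeCode = context.field.get('5');
  callback(patientIdTypeCode == 'MR');
};

// Hypothetical PID.3 values: the first is typed "MR", the second "PI".
const applies = runPrecondition(isMedicalRecordNumber, "0126271^^^^MR");
const skipped = runPrecondition(isMedicalRecordNumber, "0126271^^^^PI");
console.log(applies, skipped);
```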


Command Line

Caristix Cloak allows you to de-identify HL7 messages from the command line, so you can automate operations such as data conversion, de-identification and test execution. To do so, use the CloakConsole executable located in the software’s installation folder (typically C:\Program Files (x86)\Caristix\Caristix Cloak).

You can open a command prompt and type the following command to get a list of available commands
CloakConsole.exe help

To get help on a particular command, type
CloakConsole.exe help <command-name>

De-Identify-XML

This command will de-identify HL7v2-XML messages.

To get help with this command, type: CloakConsole.exe help De-Identify-XML

C:\Program Files (x86)\Caristix\Caristix Cloak>CloakConsole.exe help De-Identify-XML

** De-Identify-Xml **

e.g. De-Identify-Xml C:\first-document.xml D:\second-document.xml -de <or> -DeIdentificationRules "C:\My DeIdentification rules.cxdx" [-cp <or> -ConformanceProfile "C:\HL7Reference\CCD (Continuity of Care).cxpx"] [-pi <or> -PersistentIdentities "D:\persistence-xml.dic"] [-r <or> -Results "D:\results\"] [-lp <or> -LogsFilePath "C:\logs.txt"]

Source files : The documents to De-Identify (can also be folders).
-de required : DeIdentification rules file path.
-cp [optional] : Conformance Profile file path.
-pi [optional] : Persisted identities file path (if the file already exists, the context will be loaded from it).
-r [optional] : Result folder path. The value has to be a folder [default: .\Results].
-lp [optional] : Logs file path.
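
When the command is called from a script, the argument list can be assembled programmatically. The following Node.js sketch only builds the arguments using the flags listed in the help text above. `buildDeIdentifyXmlArgs` is a hypothetical helper and the file paths are placeholders.

```javascript
// Build the argument list for the De-Identify-Xml command.
// Only the short flags shown in the help output are used.
function buildDeIdentifyXmlArgs({ sources, rules, profile, identities, results, logs }) {
  if (!sources || sources.length === 0) {
    throw new Error("At least one source file or folder is required");
  }
  if (!rules) {
    throw new Error("A de-identification rules file (-de) is required");
  }
  const args = ["De-Identify-Xml", ...sources, "-de", rules];
  if (profile) args.push("-cp", profile);       // conformance profile (optional)
  if (identities) args.push("-pi", identities); // persisted identities (optional)
  if (results) args.push("-r", results);        // result folder (optional)
  if (logs) args.push("-lp", logs);             // logs file (optional)
  return args;
}

// Placeholder paths, for illustration only.
const cloakArgs = buildDeIdentifyXmlArgs({
  sources: ["C:\\first-document.xml"],
  rules: "C:\\My DeIdentification rules.cxdx",
  results: "D:\\results",
});
console.log(cloakArgs);
```

The resulting array could then be passed to Node's `child_process.execFile` together with the path to CloakConsole.exe. Passing an array rather than a single string avoids having to quote paths that contain spaces.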

Options

Cloak Options

Cloak has a number of options that can be set. From the main menu bar, click Tools, then Options. The Options dialog box that opens contains the following categories: Reference Profile, Windows Service Settings, Delimiters, Settings and Preferences.


Reference Profile

These settings allow the use of HL7 reference profiles to parse logs. Open the Reference Profile tab.

  • Click the checkbox for Use Reference Profile.
  • Under the Profiles tab, click and highlight the HL7 reference file to be used from the list and click OK. This will change the reference file used in Cloak.
  • To load an alternate library, click Browse to navigate to the location of the file. Choose the location and file name and click Open. The file path will be referenced when returning to the Reference Profile tab.
  • Click OK to save the settings.

Delimiters

These settings allow the addition of specific delimiters to the log file to assist with manageability and readability. They include:

  • Use message beginning delimiter:
    • Open the Preferences tab.
    • Click the checkbox to select this category.
    • Type the delimiter to be used in the text box.
    • Click the checkbox to choose the location for the delimiter.
      • Beginning of file or Use custom regex.
  • Use message ending delimiter.
    • Click the checkbox to select this category.
    • Enter the delimiter to be used in the text box.
    • Click the checkbox to choose the location for the delimiter.
      • Beginning of file or Use custom regex.
  • Use segment ending delimiter.
    • Click the checkbox to select this category.
    • Type the delimiter to be used in the text box.
    • Click the checkbox to choose the location for the delimiter.
      • End of line or Use custom regex.

Click OK to save the delimiters.

Settings

  • Generate value on empty field
    • This will populate every field assigned by a rule even if the original value is empty or missing.
  • Include leading zeros in numeric identifiers
    • This allows Cloak to ignore leading zeros in patient identifiers and consider them as numeric values
  • Re-apply rules and replacement data across multiple files
    • When unchecked, the replacement patient identity and its mapping to the actual patient are destroyed as soon as the de-identification process ends. This maximizes security: without this information, data cannot be re-identified in any way.
    • When checked, replacement patient identities and mappings are saved (in the file configured). This file will be reused the next time you de-identify messages, and patient data will be replaced by the same replacement patient identity. In other words, if Joe Smith was replaced by John Doe the first time, Joe Smith will be replaced by John Doe again, and so on until you uncheck this option.

Click OK to save the settings.
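
The effect of the "Re-apply rules and replacement data across multiple files" option can be pictured as a lookup table that is either kept or discarded between runs. The sketch below is conceptual only and says nothing about Cloak's actual file format: `replacementFor`, `saveMapping` and `loadMapping` are hypothetical names, and a JSON string stands in for the saved file.

```javascript
// Return the replacement for `original`, generating one the first time it is seen.
function replacementFor(mapping, original, generate) {
  if (!mapping.has(original)) mapping.set(original, generate());
  return mapping.get(original);
}

// Saving and reloading the mapping is what makes replacements repeatable.
const saveMapping = (mapping) => JSON.stringify([...mapping]);
const loadMapping = (text) => new Map(text ? JSON.parse(text) : []);

// First run: Joe Smith receives a new identity, and the mapping is saved.
const run1 = loadMapping("");
const firstName = replacementFor(run1, "Joe Smith", () => "John Doe");
const savedMapping = saveMapping(run1);

// Next run with the option checked: the saved mapping is reloaded,
// so the same replacement is used again.
const run2 = loadMapping(savedMapping);
const secondName = replacementFor(run2, "Joe Smith", () => "Someone Else");

// Next run with the option unchecked: the mapping starts empty,
// so a fresh replacement is generated.
const run3 = loadMapping("");
const thirdName = replacementFor(run3, "Joe Smith", () => "Someone Else");

console.log(firstName, secondName, thirdName);
```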

Preferences

Check for updates upon startup.

  • Every time you start Cloak, the software will check for available updates. You can also check for updates manually by going to Help → Check for Updates.

Show tips

  • Displays information boxes that provide guidance on Cloak features. To hide a tip permanently, click the close button. Restore all hidden tips with the “Reset hidden tips” link.

Show Did You Know

  • Every time you start Cloak, the software will show a ‘Did you know’ article.

 

Click OK to save the Preferences.

How To / Tutorial

De-Identifying HL7 Messages

To help you understand how to use Caristix™ Cloak, see it in action in this video. A transcript is below to help you follow the steps.

Transcript

Welcome to the “De-Identifying HL7 Messages” tutorial. This will show you how to use Caristix Workgroup to remove PHI from a stack of HL7 messages.

The application replaces PHI with new patient data generated at run time, keeping patient history but removing any link to the actual patients.

To get started, let’s open the de-identification module and load a file containing HL7 messages. Messages could also be loaded from a database or directly from your interface engine if you have the connector installed.

Open HL7 v2.x messages you want to de-identify:

Click FILE → Open → Messages… → +Add…

Choose the files containing the messages. If it is saved on your computer, click Browse My Computer.

The chosen file will be added to the file list.

Click Next > to load the file content.

Your message will appear in the Original section and an example of your message de-identified will appear in the De-identified section.

(0:35) All de-identified data in messages is in red so you can see the actual message and the result.

(0:41) The application comes with a set of de-identification rules. It covers all standard HL7 fields that HIPAA identifies as containing sensitive data. If your messages contain customized fields or Z-segments, go ahead and customize the rules.

If needed, you can modify the de-identification rules. Look at this video if you need help.

Once all the rules are configured the way you want, click View Example. You can see an example of the result in the De-identified section. If anything in the result is not as expected, continue customizing the rules.

Set the dictionary:

Click TOOLS → Options… → Settings → Enable Re-apply rules and replacement data across multiple files.

You can create as many dictionaries as needed. For this tutorial, let’s create a new dictionary called HL7Deid. Replace the file name with: C:\ProgramData\Caristix\Caristix Cloak\Temp\HL7Deid.dic

(0:58) Once the de-identification rules are set, it’s time to launch the process so that all messages are de-identified and stored in files. At the end of the processing, an audit PDF file can also be created if needed, documenting all the settings used for de-identification.

Click OK → De-identify. → Choose where to save the result. Click Browse My Computer to save it onto your computer. → OK → Yes if you want to create a De-identify Process Report in PDF.

(1:14) This ends the “De-Identifying HL7 Messages” introduction tutorial. If you have any questions, feel free to contact us. We love questions and feedback!

Thanks for watching