Generators

Generators refer to the data sources used to set de-identification values in Workgroup.

Generator: Recommended Use
String: Insert a randomly generated string or static value. You can set the length and other parameters.
Boolean: Insert a Boolean value (true or false).
Numeric: Insert a randomly generated number. You can set the length, decimals, and other parameters.
Date Time: Insert a randomly generated date-time value. You can set the range, time unit, format, and other parameters.
Table: Pull data from HL7-related tables stored in one of your profiles, useful for coded fields.
SQL Query: Pull data from a database based on an SQL query. You’ll be able to configure a database connection.
Text: Pull random de-identification data from a text file (for instance, a list of names). Several file formats can be used: txt, csv, etc.
Excel: Pull random de-identification data from an Excel 2007 or later spreadsheet (for instance, a list of names, addresses, and cities).
Use Original Value: Keep the field as-is. No de-identification rules will be applied.
Copy Another Field: Copy the contents of another field.
Unstructured Data: Find and replace sensitive data in free text fields (for instance, find and replace a patient’s last name in physician notes).
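
As a rough illustration of the idea behind the Unstructured Data generator (a conceptual sketch in Python, not Workgroup's actual implementation), the snippet below replaces a known last name wherever it appears as a whole word in a free-text note. The function name scrub_free_text, the sample note, and the replacement value are all hypothetical.

```python
import re

def scrub_free_text(note: str, sensitive_value: str, replacement: str) -> str:
    """Replace every whole-word occurrence of sensitive_value in note."""
    # re.escape keeps characters such as "." or "-" in a name from being
    # read as regex syntax; \b limits matches to whole words.
    pattern = re.compile(r"\b" + re.escape(sensitive_value) + r"\b", re.IGNORECASE)
    # A function replacement inserts the text literally (no backslash handling).
    return pattern.sub(lambda _match: replacement, note)

note = "Mr. Smith reports mild pain. SMITH was advised to rest."
print(scrub_free_text(note, "Smith", "Doe"))
# Mr. Doe reports mild pain. Doe was advised to rest.
```

The whole-word boundary is a design choice for this sketch: it keeps a name such as "Smithson" from being partially rewritten.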

Generator Settings

Each generator has its own settings, which you can edit from the Value Generator tab. Click on the generator name to navigate to the setting details.

Advanced Mode

Advanced Mode lets you use more than one generator for a single field, edit the output format, or preformat the source value. You can also set preconditions so that the de-identification rule is applied only when they are met.

Preformat Value

(Only available in Advanced Mode)

Use this to format the original value before it is processed.

This is useful for generators that include the original value or ID fields. Here are two usage examples:

a) In an unstructured data field, you may wish to remove a value that is not contained elsewhere (not already cloaked in another field):

If you know the field may contain a reference to an ID defined as ‘ID-999999’, you would:

1. Cloak the field using an Unstructured Data generator.

2. Set the following preformat for the unstructured data:

Find what:

ID-\d+     (Search for text, anywhere in the field value, that starts with 'ID-' and is followed by one or more digits.)

Replace by:

ID-XXXX     (We set a static text to hide the ID but still keep the context of the text.)
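
To see what this find-and-replace pair does to a value, here is a minimal Python sketch that applies the same pattern and replacement with the standard re module. The function name preformat_id and the sample sentence are hypothetical; Workgroup applies the preformat itself, so this only mirrors the effect.

```python
import re

def preformat_id(value: str) -> str:
    """Mask any 'ID-' followed by digits, keeping the surrounding text."""
    # Same "Find what" pattern and "Replace by" text as in the example above.
    return re.sub(r"ID-\d+", "ID-XXXX", value)

print(preformat_id("Patient transferred, see chart ID-999999 for history."))
# Patient transferred, see chart ID-XXXX for history.
```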

b) If you have the same patient ID number in two systems, but formatted differently, you could format them so that both systems change to the same ID format and can both be recognized as the same patient. Having the same ID will provide continuity of the message flow for a patient (messages will be cloaked using the same fake data):

If, for example, PID.2 is defined like this for the two systems:

First system: ID-123456
Second system: 123-456

You would need to:

1. Set the field PID.2 as an ID (by checking the ID column).

2. Define two preformats like this:

Find what:

^ID-(?<ID_Number>\d+)$      (Find an exact match for the format and capture only the digits in a group named 'ID_Number'.)

Replace by:

${ID_Number}    (We set only the number, removing the superfluous text)

Find what:

^(?<ID_Number_Part_1>\d+)-(?<ID_Number_Part_2>\d+)$      (Find an exact match for the format and capture the digits on each side of the hyphen in groups named 'ID_Number_Part_1' and 'ID_Number_Part_2'.)

Replace by:

${ID_Number_Part_1}${ID_Number_Part_2}     (Only the number, remove the superfluous text)

Now both systems will treat PID.2 as being ‘123456’ and match and cloak the messages properly as being the same patient.
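
The effect of the two preformats can be checked outside Workgroup with a short Python sketch. It assumes the first system's values look like ID-123456, which is the form the first pattern matches. Python writes named groups as (?P<name>...) and references them as \g<name>, so the patterns are translated from the (?<name>...) and ${name} syntax shown above. The helper name preformat is hypothetical, and applying the rules in sequence is an assumption of this sketch, not a statement about how Workgroup orders them.

```python
import re

# The two preformat rules from the example, translated to Python syntax.
PREFORMATS = [
    (re.compile(r"^ID-(?P<ID_Number>\d+)$"), r"\g<ID_Number>"),
    (
        re.compile(r"^(?P<ID_Number_Part_1>\d+)-(?P<ID_Number_Part_2>\d+)$"),
        r"\g<ID_Number_Part_1>\g<ID_Number_Part_2>",
    ),
]

def preformat(value: str) -> str:
    """Apply each rule in turn; a rule that does not match leaves the value unchanged."""
    for pattern, replacement in PREFORMATS:
        value = pattern.sub(replacement, value)
    return value

print(preformat("ID-123456"))  # 123456
print(preformat("123-456"))    # 123456
```

Because both inputs reduce to the same string, a lookup keyed on the preformatted value would treat them as one patient.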