Methods for De-Identification

Methods for De-Identification

De-identification reduces the risk of linking information in the data set to specific individuals by removing personal identifiers from data. There is more than one way to de-identify data and none of them are perfect. When choosing a method for de-identification, you should also consider its constraints and the additional safeguards that might be needed to pick up the slack.

Methods for De-Identification

Pseudonymization

The first method on deck is pseudonymization. This method replaces personal identifiers in a data set with pseudonyms. As a result, this technique limits the likelihood of using that data being to re-identify the individuals the data is referencing.

Method Limitations:
  • Since this method only replaces the personal identifiers, additional information stored separately (like an algorithm or key) can re-identify pseudonymized data.
  • In order to use pseudonymization correctly, you must remove or mask combinations and indirect identifiers.
Examples of Pseudonymization:
  • Patient 123
  • UUIDs

 

Anonymization

Next, we have anonymization. Anonymization permanently and completely removes personal identifiers so that the data subject can no longer be identified directly or indirectly.

Method Limitations:
  • This method still leaves the contextual information intact, so cross-referencing anonymized data with other sources can de-anonymize the data.
Examples of Anonymization:
  • The anonymized data can be used to build psychological profiles for marketing purposes.

 

K-Anonymization

K-Anonymization removes all identifiable data from the data set. In doing so, this method guarantees that the data remains practically useful without re-identifying the individuals.

Method Limitations:
  • K-Anonymization cannot anonymize highly dimensional data sets.
  • Since this method suppresses the identifiable data, it can skew results.
Examples of K-Anonymization:
  • Password keepers and Have I Been Pwned? use this method.

 

Non-Anonymizable Data

Individual biology and activity patterns are so unique that the data is impossible to anonymize. Since it cannot exist without identifiers, this type of data is an exception to de-identification.

Examples of Non-Anonymizable Data:
  • Biometric data
  • Location history
  • Contents of chat logs
  • VR scans

 

Do you have a question about how to de-identify data to reduce the risk of linking specific individuals to data? Do you need guidance or training on your general data security practices? Contact us for a free consultation to see if Gazelle Consulting’s customized compliance services are right for you.

Nav close