What are anonymised, pseudonymised and identifiable personal data?

The GDPR applies to the processing of “personal data”. If data is considered personal then the GDPR places specific legal obligations on the controller of that data. If data is not personal (i.e. if it never related to a person or if it has since been anonymised) then the GDPR does not apply.

Personal data

Also known as “identifiable data”. According to the Information Commissioner’s Office (ICO), this is “any information relating to an identifiable natural person (data subject) who can be directly or indirectly identified in particular by reference to an identifier”.

This definition provides for a wide range of personal identifiers to constitute personal data, including name, address, identification number, location data or online identifier.

In the field of medical research, some commonly encountered identifiers, in addition to name and address, are NHS number, date of birth and date of death. Certain medical conditions could also be considered identifiers if they are very rare.

Pseudonymised data

Also known as “de-identification”, pseudonymisation is the process of separating data from direct identifiers so that discovering the identity of an individual is not possible without additional data. We do this with an artificially created identifier that we refer to as a “study number”. The resulting dataset is called “pseudonymised” or “de-identified” data.

When our data is pseudonymised, we do not hold patient identifiers; we only hold the clinical data needed for our research (e.g. symptoms, diagnoses, clinical examinations, outcomes, cancers and mortality information) and the study number of the individual. This makes the pseudonymised data held by the CSPRG effectively anonymous to our research team. The identifiable data (e.g. name, NHS number, address) and the study number may be held by our data providers, such as the NHS hospitals responsible for the individual’s care, NHS Digital and the National Cancer Registration and Analysis Service.
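The separation described above can be illustrated with a minimal sketch. It is not the CSPRG's actual process: the function name `pseudonymise`, the field names and the example records are all hypothetical, and the way study numbers are generated here (random hexadecimal strings) is an assumption made purely for illustration. The sketch splits each record into a link table (identifiers plus study number, of the kind a data provider might keep) and a pseudonymised dataset (clinical data plus study number only).

```python
import secrets

# Hypothetical field names treated as direct identifiers in this sketch.
IDENTIFIER_FIELDS = ("name", "nhs_number", "address")


def pseudonymise(records, identifier_fields=IDENTIFIER_FIELDS):
    """Split records into a link table and a pseudonymised dataset.

    link_table:    identifiers + study number (stays with the data provider)
    research_data: remaining fields + study number (no direct identifiers)
    """
    link_table = []
    research_data = []
    used = set()
    for record in records:
        # Draw a random study number; retry in the unlikely event of a clash.
        study_number = secrets.token_hex(8)
        while study_number in used:
            study_number = secrets.token_hex(8)
        used.add(study_number)

        identifiers = {k: v for k, v in record.items() if k in identifier_fields}
        clinical = {k: v for k, v in record.items() if k not in identifier_fields}

        link_table.append({"study_number": study_number, **identifiers})
        research_data.append({"study_number": study_number, **clinical})
    return link_table, research_data


# Entirely fictitious example records.
records = [
    {"name": "Patient A", "nhs_number": "000 000 0000",
     "address": "1 Example Street", "diagnosis": "example diagnosis 1"},
    {"name": "Patient B", "nhs_number": "000 000 0001",
     "address": "2 Example Street", "diagnosis": "example diagnosis 2"},
]

link_table, research_data = pseudonymise(records)
```

In this sketch, `research_data` contains only the study number and the clinical fields, so re-identifying an individual would require the separate `link_table`. Note that removing direct identifiers in this way does not by itself deal with indirect identifiers such as a very rare condition.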

The GDPR considers pseudonymisation to be one of several privacy-enhancing techniques that can be used to reduce the risk of re-identification. Although pseudonymised data may be hard to re-identify, it is not exempt from the GDPR.

Anonymised data

Anonymised data is data that cannot be used to identify individuals and is not linked to any individual, not even by study number. The GDPR does not apply to anonymised information.

Total anonymisation is an extremely high bar. The ICO therefore does not require anonymisation to be perfect, only that the risk of re-identification be made remote.

Special category data

According to the ICO, “Special category data is personal data which the GDPR says is more sensitive, and so needs more protection. In order to lawfully process special category data, controllers must identify both a lawful basis under Article 6 and a separate condition for processing special category data under Article 9.”

The GDPR lists the special categories of data in Article 9. They include political opinions, religious beliefs, trade union membership, genetic data, biometric data, data concerning health and data concerning a natural person’s sex life or sexual orientation.

As a medical research group, much of the data we hold is special category data.