Conducting Quantitative Risk Assessments for Anonymized Datasets and Documents: What This Means for Sponsors and Patient Privacy

CANTON, Mich. (10/2/2022) – Anonymizing clinical trial data and documents is now more important than ever.
In recent years, clinical trial data sharing has become a requirement as part of the regulatory process for EMA and Health Canada.
The personal information of trial participants must now be anonymized before it is shared, whether for regulatory requirements or voluntary data sharing, to protect the participants’ privacy and stay compliant with privacy protection laws.
Due to evolving technology and the availability of clinical trial data in various forms, there is always a concern about the re-identification of trial participants, even with data anonymization.
Therefore, assessing the inherent risk of re-identifying a trial participant in the shared data is required.
Estimating the risk of re-identification in an anonymized dataset
Risk can be defined as the probability of re-identifying a trial participant. Estimating risk means determining the probability that an intruder would discover the correct identity of a single record.
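The re-identification probability depends on the number of participants sharing the same identifiers across the dataset. For a record in a group of k participants with identical identifier values (an equivalence class), the probability of a correct match is 1/k.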
The risk level (maximum or average) that needs to be considered is determined by how the data is being shared. You should consider maximum risk when the data is being shared publicly without any security controls and average risk when the data is being shared through a secured portal with security controls.
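As an illustration only, the maximum and average risk described above can be sketched in a few lines of Python. The function name, the field names (age_band, sex), and the toy records are assumptions made for this example and do not come from any MMS tool. The sketch treats each record's risk as one divided by the size of its equivalence class.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) for a list of dict records."""
    # An equivalence class is the set of records that share the same
    # values on every quasi-identifier.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    # A record in a class of size k is matched correctly with probability 1/k.
    per_record = [1.0 / class_sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)


# Hypothetical toy data: five participants, two quasi-identifiers.
participants = [
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "50-59", "sex": "M"},
    {"age_band": "50-59", "sex": "M"},
]

max_risk, avg_risk = reidentification_risk(participants, ["age_band", "sex"])
print(f"maximum risk: {max_risk:.2f}")  # 0.50, driven by the class of two
print(f"average risk: {avg_risk:.2f}")  # 0.40, mean of the per-record risks
```

In this sketch, the maximum risk is set by the smallest equivalence class, which is why it is the more conservative figure for public release, while the average risk reflects the dataset as a whole.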
There are quite a few precedents for what can be considered an acceptable amount of risk. These precedents have been used for many decades, are consistent internationally, and have persisted over time.
Managing re-identification risk means:
(1) selecting an appropriate risk metric (e.g., k-anonymity, l-diversity, t-closeness),
(2) selecting an appropriate threshold (the industry standard is to set the threshold at 0.09), and
(3) measuring the risk in the actual clinical trial dataset or documents that will be disclosed.
Once a threshold has been determined, the actual probability of re-identification is measured in the dataset.
If the probability is higher than the threshold, the data need to be transformed. These transformations may include further grouping of values into broader equivalence classes and/or data redaction (for documents).
Otherwise, the dataset can be declared to have an acceptable risk level for re-identification.
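The measure, compare, and transform loop can be sketched as follows. This is a minimal example under stated assumptions, not a description of any production workflow: the only transformation shown is widening age bands, the candidate band widths (5, 10, 20 years) are arbitrary, maximum risk is used as the metric, and all function and field names are hypothetical. The 0.09 threshold is the one cited in the text.

```python
from collections import Counter

THRESHOLD = 0.09  # threshold cited in the text; adjust to the sharing context


def max_risk(records, quasi_identifiers):
    """Maximum risk: 1 / size of the smallest equivalence class."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return 1.0 / min(Counter(keys).values())


def band_age(records, width):
    """Replace each exact age with a band such as '40-59'."""
    banded = []
    for r in records:
        low = (r["age"] // width) * width
        banded.append({**r, "age": f"{low}-{low + width - 1}"})
    return banded


def anonymize(records, quasi_identifiers, widths=(5, 10, 20), threshold=THRESHOLD):
    """Widen age bands until the measured risk is at or below the threshold."""
    risk = max_risk(records, quasi_identifiers)
    if risk <= threshold:
        return records, risk, None  # acceptable without transformation
    for width in widths:
        candidate = band_age(records, width)
        risk = max_risk(candidate, quasi_identifiers)
        if risk <= threshold:
            return candidate, risk, width
    raise ValueError("risk still above threshold; further transformation needed")


# Hypothetical data: 40 participants aged 40 to 79, all with the same sex value.
participants = [{"age": age, "sex": "F"} for age in range(40, 80)]
shared, risk, width = anonymize(participants, ["age", "sex"])
print(width, risk)
```

With these toy records, exact ages make every participant unique, and 5-year and 10-year bands still leave classes of at most ten records (risk 0.1 or higher), so the loop settles on 20-year bands. Real datasets have several quasi-identifiers, and the choice of which one to generalize, and how far, is a trade-off against data utility that this sketch does not address.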
What about anonymized documents?
Work is ongoing within the industry to establish standards for quantitative risk assessment of anonymized and/or redacted documents.
At MMS, we have created a template in which the quantitative re-identification risk assessment includes a conservative threshold factor based on the uniqueness of the data in the document as compared to the underlying dataset. Each variable is weighted by the number of unique values in the dataset equivalence group divided by the number of participants in the document.
This methodology incorporates the number and uniqueness of the data in the document, compared to the overall dataset, in adjusting the overall risk of re-identification of the participants in the document.
The future of risk assessments
We continue to monitor research and industry trends associated with quantitative risk assessment. Our experts enhance and adjust our efforts in this area to provide cutting-edge solutions to quantify the risk of re-identification of clinical trial datasets and documents.
By: Veera Thota, Principal Statistical Programmer, and Harry Haber, Senior Principal Biostatistician
Learn more about MMS anonymization services here.
If you have questions about risk assessments or anonymizing data or documents, email info@mmsholdings.com for more information.