What is it about?

Imagine providing three pieces of data about yourself in a survey: your date of birth, sex, and zip code. With this information, could someone uniquely identify you? Surprisingly, research by Latanya Sweeney suggests that the answer is yes for most people. (Find out if it’s true for you here: https://cpg.doc.ic.ac.uk/observatory/explore.) This process of uniquely identifying a person from indirect information (e.g., gender, birthdate, race/ethnicity) is called re-identification, and it highlights a tension between two goals that researchers often juggle. On the one hand, researchers working with human participants aim to make their work transparent and reproducible by sharing data publicly. On the other hand, they aim to protect participants’ privacy, which public data sharing can compromise. In this paper, we offer guidance and tools to support data sharing without jeopardizing participants’ privacy. Specifically, we provide a pipeline for quantifying re-identification risk and introduce two open-source algorithms that can reduce re-identification risk while maintaining a dataset’s quality for future use.
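To make the idea of quantifying re-identification risk concrete, here is a minimal Python sketch. It is not the pipeline or either algorithm from the paper; it only illustrates one common way to think about risk, namely counting how many records share each combination of indirect identifiers. The function names (`equivalence_class_sizes`, `risk_summary`), the field names, and the toy records are all hypothetical and chosen for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risk_summary(records, quasi_identifiers):
    """Summarize uniqueness in a dataset (illustrative only, not the paper's method)."""
    if not records:
        raise ValueError("records must not be empty")
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    # A record is unique if no other record has the same combination of values.
    n_unique = sum(1 for count in sizes.values() if count == 1)
    return {
        "k": min(sizes.values()),  # size of the smallest group of matching records
        "unique_records": n_unique,
        "share_unique": n_unique / len(records),
    }


# Hypothetical toy data: two participants share all three values, two do not.
records = [
    {"birthdate": "1990-01-15", "sex": "F", "zip": "02138"},
    {"birthdate": "1990-01-15", "sex": "F", "zip": "02138"},
    {"birthdate": "1985-07-02", "sex": "M", "zip": "02139"},
    {"birthdate": "1972-11-30", "sex": "F", "zip": "02140"},
]
summary = risk_summary(records, ["birthdate", "sex", "zip"])
print(summary)
```

In this toy example, any participant whose combination of values appears only once could in principle be singled out by someone who knows those three facts about them. Coarsening a field (for example, keeping only the birth year) would be one way to make groups larger, at some cost to the detail available for later analysis.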


Why is it important?

This work is relevant to both researchers and the general public. For researchers, it provides guidance on engaging in open science practices without jeopardizing participant privacy. For the public, it sheds light on a potentially unforeseen consequence of data sharing and outlines how researchers can share their data while safeguarding privacy. Taken together, these contributions demonstrate how open science and privacy aims can be satisfied in tandem, so that researchers can share their work transparently and research participants can be confident that their identities will not inadvertently be revealed.

Perspectives

As campaigns about data privacy (e.g., https://youtu.be/NOXK4EVFmJY) become increasingly prevalent, I wondered how concerns about re-identification might influence participants’ behavior and the research process more broadly. Will participants provide inaccurate demographic data to protect their privacy? Will university ethics boards require researchers to reduce the amount of demographic information collected? In this paper, we show that pursuing open science and protecting participant privacy can be reconciled through innovative anonymization solutions.

Kirsten Morehouse
Harvard University

Read the Original

This page is a summary of: Responsible data sharing: Identifying and remedying possible re-identification of human participants, American Psychologist, May 2024, American Psychological Association (APA), DOI: 10.1037/amp0001346.
