What is it about?

We crowdsourced and published a first-of-its-kind open dataset containing Amazon purchase histories from 2018 to 2022 for more than 5,000 US consumers, along with their sociodemographics. This paper is not about the dataset itself (readers can learn more about it at https://www.nature.com/articles/s41597-024-03329-6) but about the data collection process. To collect the data, we developed a method that prioritizes user consent, and we embedded an experiment in our data collection tool to study what affects participants’ likelihood of sharing their data for open research. Our paper presents the results of that experiment, along with our data collection method, as a resource for future data crowdsourcing efforts.

Why is it important?

Data generated by users on digital platforms are a crucial resource for advocates and researchers interested in uncovering digital inequities, auditing algorithms, and understanding human behavior. The CSCW and HCI communities have relied on such data sources, yet access to them is often restricted and is becoming increasingly limited or expensive. Our project demonstrates how data crowdsourcing can offer researchers an alternative way to access valuable data while also prioritizing user consent.

Read the Original

This page is a summary of: Insights from an Experiment Crowdsourcing Data from Thousands of US Amazon Users: The importance of transparency, money, and data use, Proceedings of the ACM on Human-Computer Interaction, November 2024, ACM (Association for Computing Machinery), DOI: 10.1145/3687005.