Synthetic Survey Data Generation and Evaluation

Yanru Jiang; Siyu Liang; Junwon Choi

doi:10.1145/3690624.3709421

What is it about?

Survey data is widely used in social science research. However, sharing it has been hampered by privacy risks and by the coarse de-identification techniques applied so far. In this study, we tackle this challenge by systematically evaluating four common synthetic data models (Synthpop, CTGAN, REaLTabFormer, and TVAE) across three key dimensions: utility, fidelity, and privacy.
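
To make the fidelity and privacy dimensions concrete, here is a minimal sketch in Python. It is not the paper's evaluation pipeline: the two measures (total variation distance on one column's category shares, and the share of synthetic rows that exactly copy a real row) are simple stand-ins chosen for illustration, and the function names and toy survey rows are hypothetical. Utility, the third dimension, would typically be checked by fitting a downstream model and is left out here.

```python
from collections import Counter


def marginal_tvd(real, synth, column):
    """Total variation distance between one column's category shares.

    0.0 means the real and synthetic marginals are identical; 1.0 means
    they share no categories. Used here as an illustrative fidelity check.
    """
    real_counts = Counter(row[column] for row in real)
    synth_counts = Counter(row[column] for row in synth)
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / len(real) - synth_counts[c] / len(synth))
        for c in categories
    )


def exact_match_rate(real, synth):
    """Share of synthetic rows that copy some real row exactly.

    A crude, illustrative privacy signal: higher values mean more real
    respondents are reproduced verbatim in the synthetic data.
    """
    real_rows = {tuple(sorted(row.items())) for row in real}
    copies = sum(tuple(sorted(row.items())) in real_rows for row in synth)
    return copies / len(synth)


# Toy survey responses, invented for illustration only.
real = [
    {"age_group": "18-29", "trust": "high"},
    {"age_group": "30-44", "trust": "low"},
    {"age_group": "30-44", "trust": "high"},
    {"age_group": "45+", "trust": "low"},
]
synth = [
    {"age_group": "18-29", "trust": "low"},
    {"age_group": "30-44", "trust": "low"},
    {"age_group": "45+", "trust": "high"},
    {"age_group": "45+", "trust": "low"},
]

fidelity_gap = marginal_tvd(real, synth, "age_group")
copy_rate = exact_match_rate(real, synth)
print(f"age_group TVD: {fidelity_gap:.2f}, exact-copy rate: {copy_rate:.2f}")
```

A lower distance suggests better fidelity and a lower copy rate suggests less disclosure risk, which is one way the trade-off between the two dimensions can show up in practice.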


Why is it important?

Our findings reveal that each model has distinct strengths: Synthpop excels in general utility, CTGAN prioritizes privacy, and REaLTabFormer and TVAE perform best in downstream applications. We recommend that future researchers choose a generative method by weighing the trade-offs among performance on each evaluation dimension, training size, data type, and computational infrastructure.

Perspectives

This paper introduces an end-to-end pipeline to streamline and standardize synthetic data generation and evaluation for survey researchers. We hope to provide a practical guide on the strengths and limitations of these methods regarding social science survey data.
Yanru Jiang
University of California Los Angeles

This page is a summary of: Synthetic Survey Data Generation and Evaluation, July 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3690624.3709421.

Contributors

The following have contributed to this page

Yanru Jiang
University of California Los Angeles
