What is it about?

This research focuses on attributing audio deepfakes to their source rather than simply detecting whether they are real or fake. It introduces a two-level framework that first identifies the generation technology used to create a synthetic voice and then recognizes the specific AI model responsible. By analyzing subtle acoustic patterns with a shared neural encoder and attention mechanisms, the framework improves forensic reliability under real-world conditions, including previously unseen attacks. This structured approach helps investigators and researchers trace the origins of fake audio with high accuracy.
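To make the two-level idea concrete, here is a minimal PyTorch sketch of the structure described above: a shared encoder over acoustic frames, attention pooling, and two classification heads (one for the generation technology, one for the specific model). This is an illustration of the concept only, not the paper's actual architecture; the layer sizes, the class counts (n_techs, n_models), and the use of mel-spectrogram frames as input are all assumptions made for the example.

import torch
import torch.nn as nn

class TwoLevelAttributor(nn.Module):
    """Shared encoder with attention pooling feeding two heads:
    one for the generation technology, one for the specific model."""

    def __init__(self, n_mels=80, hidden=128, n_techs=4, n_models=12):
        super().__init__()
        # Shared encoder: maps each spectrogram frame to a hidden embedding.
        self.encoder = nn.Sequential(
            nn.Linear(n_mels, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Attention pooling: learns which frames matter for attribution.
        self.attn = nn.Linear(hidden, 1)
        # Level 1: which generation technology (e.g. TTS vs. voice conversion).
        self.tech_head = nn.Linear(hidden, n_techs)
        # Level 2: which specific AI model produced the audio.
        self.model_head = nn.Linear(hidden, n_models)

    def forward(self, mel):                      # mel: (batch, frames, n_mels)
        h = self.encoder(mel)                    # (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights per frame
        pooled = (w * h).sum(dim=1)              # (batch, hidden)
        return self.tech_head(pooled), self.model_head(pooled)

# Toy usage: a batch of 2 clips, 300 frames, 80 mel bins each.
logits_tech, logits_model = TwoLevelAttributor()(torch.randn(2, 300, 80))
print(logits_tech.shape, logits_model.shape)    # (2, 4) and (2, 12)

Sharing one encoder across both prediction levels is what lets the coarse technology decision and the fine-grained model decision reinforce each other, which is the core of the framework's design.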

Why is it important?

Audio deepfakes can be used for scams, disinformation, or to undermine trust in digital communications. While detecting whether audio is real or fake is important, understanding where a fake comes from is crucial for forensic analysis, accountability, and policy responses. By attributing synthetic audio to both the generation technology and the specific AI model behind it, this work supports more reliable investigations, helps expose malicious actors, and contributes to building transparent and trustworthy media ecosystems.

Perspectives

Future work will explore extending the framework to new types of generative models and family-level attribution, enabling broader coverage of emerging technologies. We also plan to combine audio and visual modalities for multi-modal deepfake attribution, and to integrate the framework into real-world forensic workflows. These developments could support law enforcement, journalists, and regulators in tracing and understanding increasingly sophisticated synthetic media.

Andrea Di Pierno
IMT School for Advanced Studies

Read the Original

This page is a summary of: Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework, October 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3746265.3759668.