What is it about?
Implicit neural representations (INRs) have gained prominence for efficiently encoding multimedia data, yet their application to audio signals remains limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture built on learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest log-spectral distance (1.29) and the highest Perceptual Evaluation of Speech Quality score (3.57) for 1.5 s audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and a 60.87% improvement in SI-SNR. These results establish KAN as a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks. The source code is available at https://github.com/gmum/fewsound.
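To make the core idea concrete, here is a minimal sketch of a KAN-style layer used as an audio INR. All names and design choices below (the `KANLayer` class, the Gaussian radial-basis parameterization of the learnable activations, the input range) are illustrative assumptions, not the paper's actual implementation: a KAN places a learnable univariate function on each edge of the network, and an audio INR maps a time coordinate to a sample amplitude.

```python
import numpy as np

# Illustrative sketch only: a KAN layer where each edge (i -> j) carries a
# learnable univariate function phi_ij, modeled here as a weighted sum of
# fixed Gaussian radial basis functions (the paper may use a different basis).
class KANLayer:
    def __init__(self, in_dim, out_dim, num_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        # Learnable weights, one set of basis coefficients per edge:
        # shape (out_dim, in_dim, num_basis)
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, num_basis))
        # Fixed RBF centers spread over an assumed input range [-1, 1]
        self.centers = np.linspace(-1.0, 1.0, num_basis)
        self.width = 2.0 / num_basis

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate every basis function at every input: (batch, in_dim, basis)
        basis = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # Output j sums phi_ij(x_i) over inputs i -> (batch, out_dim)
        return np.einsum('bik,oik->bo', basis, self.coef)

# As an INR, the network maps a time coordinate t to an amplitude sample.
t = np.linspace(-1.0, 1.0, 16).reshape(-1, 1)  # (16, 1) time grid
layer = KANLayer(in_dim=1, out_dim=1)
amplitude = layer.forward(t)                   # (16, 1) predicted samples
print(amplitude.shape)
```

Training such a model means fitting the basis coefficients so the predicted amplitudes match a target waveform; in the FewSound setting, a hypernetwork would instead generate or update these coefficients from a short audio example.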
Why is it important?
This work is important because it introduces Kolmogorov-Arnold Networks (KANs) as a new class of implicit neural representations (INRs) specifically tailored to audio signals, an area where INRs have been underexplored. Unlike traditional INR models that rely on fixed activation functions, KANs use learnable activation functions, enabling more adaptive and precise encoding of complex acoustic patterns. This innovation leads to measurable gains in perceptual quality and reconstruction accuracy. Moreover, by integrating KANs into a hypernetwork framework (FewSound), the study demonstrates substantial improvements over the state of the art in few-shot audio representation, with a 60.87% gain in SI-SNR and a 33.3% reduction in MSE. The work is timely, as it aligns with the growing demand for efficient, scalable, high-fidelity neural audio representations in speech synthesis, coding, and multimodal AI applications.
Perspectives
Writing this article was a truly enjoyable experience, as it brought together collaborators with diverse expertise in signal processing and neural modeling. The process deepened our shared understanding of how implicit neural representations can extend beyond images and into the complex domain of audio. Personally, I found great satisfaction in seeing theoretical ideas about Kolmogorov-Arnold Networks transform into practical, high-performing models.
Patryk Marszałek
Jagiellonian University in Kraków
Read the Original
This page is a summary of: As Good as It KAN Get: High-Fidelity Audio Representation, November 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746252.3761405.
You can read the full text: