What is it about?
Implicit neural representations (INRs) have gained prominence for efficiently encoding multimedia data, yet their application to audio signals remains limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture built on learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest log-spectral distance (1.29) and the highest Perceptual Evaluation of Speech Quality score (3.57) for 1.5 s audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and a 60.87% improvement in SI-SNR. These results establish KAN as a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks. The source code is available at https://github.com/gmum/fewsound.
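To make the core idea concrete, here is a minimal sketch of a KAN-style layer used as an audio INR. All names and design choices below (the `KANLayer` class, the Gaussian radial-basis parameterization of the learnable activations, the input range) are illustrative assumptions, not the paper's actual implementation: a KAN places a learnable univariate function on each edge of the network, and an audio INR maps a time coordinate to a sample amplitude.

```python
import numpy as np

# Illustrative sketch only: a KAN layer where each edge (i -> j) carries a
# learnable univariate function phi_ij, modeled here as a weighted sum of
# fixed Gaussian radial basis functions (the paper may use a different basis).
class KANLayer:
    def __init__(self, in_dim, out_dim, num_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        # Learnable weights, one set of basis coefficients per edge:
        # shape (out_dim, in_dim, num_basis)
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, num_basis))
        # Fixed RBF centers spread over an assumed input range [-1, 1]
        self.centers = np.linspace(-1.0, 1.0, num_basis)
        self.width = 2.0 / num_basis

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate every basis function at every input: (batch, in_dim, basis)
        basis = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # Output j sums phi_ij(x_i) over inputs i -> (batch, out_dim)
        return np.einsum('bik,oik->bo', basis, self.coef)

# As an INR, the network maps a time coordinate t to an amplitude sample.
t = np.linspace(-1.0, 1.0, 16).reshape(-1, 1)  # (16, 1) time grid
layer = KANLayer(in_dim=1, out_dim=1)
amplitude = layer.forward(t)                   # (16, 1) predicted samples
print(amplitude.shape)
```

Training such a model means fitting the basis coefficients so the predicted amplitudes match a target waveform; in the FewSound setting, a hypernetwork would instead generate or update these coefficients from a short audio example.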
Why is it important?
This work is important because it introduces Kolmogorov-Arnold Networks (KANs) as a new class of implicit neural representations (INRs) specifically tailored to audio signals, an area where INRs have been underexplored. Unlike traditional INR models that rely on fixed activation functions, KANs use learnable activation functions, enabling more adaptive and precise encoding of complex acoustic patterns. This innovation leads to measurable gains in perceptual quality and reconstruction accuracy. Moreover, by integrating KANs into a hypernetwork framework (FewSound), the study demonstrates substantial improvements over the state of the art in few-shot audio representation, with a 60.87% gain in SI-SNR and a 33.3% reduction in MSE. The work is timely, as it aligns with the growing demand for efficient, scalable, high-fidelity neural audio representations in speech synthesis, coding, and multimodal AI applications.
Perspectives
Writing this article was a truly enjoyable experience, as it brought together collaborators with diverse expertise in signal processing and neural modeling. The process deepened our shared understanding of how implicit neural representations can extend beyond images and into the complex domain of audio. Personally, I found great satisfaction in seeing theoretical ideas about Kolmogorov-Arnold Networks transform into practical, high-performing models.
Patryk Marszałek
Jagiellonian University in Kraków
Read the Original
This page is a summary of: As Good as It KAN Get: High-Fidelity Audio Representation, November 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746252.3761405.
You can read the full text: