LAFUFU: Latent Acoustic Features for Ultra-Fast Utterance Restoration

Utterance restoration is an automated voice processing task whose goal is to recreate high-fidelity speech from imperfect original recordings affected by diverse distortions. In recent years, generative diffusion models have proven remarkably effective in this domain, achieving leading performance on various benchmarks. However, their computational demands make them impractical on edge devices or in real-time scenarios. In this paper we introduce LAFUFU, a novel approach to the utterance restoration problem that leverages latent-space acoustic representations. Rather than working directly with raw audio inputs, our method operates on compact, information-dense features extracted by a dedicated pretrained encoder network. This yields a multifold improvement in inference speed without compromising output integrity. We also show that, under equivalent time constraints, LAFUFU produces higher-quality restored utterances than classical non-latent alternatives, as evidenced by its competitive performance on the EARS-WHAM and EARS-Reverb frontier benchmarks. These results highlight representation learning as a key enabler for unlocking the potential of generative diffusion in audio applications and suggest that further progress is achievable along this research direction.
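
The speed argument above can be illustrated with a toy cost model. This is a sketch under stated assumptions, not LAFUFU's implementation: it assumes each reverse-diffusion step costs time proportional to the number of time frames the network processes, it ignores feature width and the encoder/decoder passes, and the sampling rate, clip length, step count and strides (128 for a spectral baseline, 512 for a latent encoder) are hypothetical values not taken from the paper. All names (`diffusion_cost`, `steps_within_budget`, and so on) are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CostReport:
    """Toy cost of one restoration run, counted in frame-steps."""
    frames_per_step: int
    steps: int

    @property
    def total(self) -> int:
        return self.frames_per_step * self.steps


def num_frames(num_samples: int, stride: int) -> int:
    """Frames left after downsampling by `stride` (ceiling division)."""
    return -(-num_samples // stride)


def diffusion_cost(num_samples: int, stride: int, steps: int) -> CostReport:
    """Assumption: every reverse-diffusion step touches every frame once."""
    return CostReport(num_frames(num_samples, stride), steps)


def steps_within_budget(budget: int, frames_per_step: int) -> int:
    """How many whole steps fit into a fixed frame-step budget."""
    return budget // frames_per_step


# Hypothetical settings, not taken from the paper.
SAMPLE_RATE = 16_000  # samples per second
CLIP_SECONDS = 4
STEPS = 30

n = SAMPLE_RATE * CLIP_SECONDS                         # 64000 samples
spectral = diffusion_cost(n, stride=128, steps=STEPS)  # e.g. an STFT hop of 128
latent = diffusion_cost(n, stride=512, steps=STEPS)    # e.g. an encoder with total stride 512

speedup = spectral.total / latent.total
affordable_steps = steps_within_budget(spectral.total, latent.frames_per_step)

print(f"spectral: {spectral.total} frame-steps")  # 15000
print(f"latent:   {latent.total} frame-steps")    # 3750
print(f"toy speedup: {speedup:.1f}x")             # 4.0x
print(f"latent steps affordable in the spectral budget: {affordable_steps}")  # 120
```

Under these toy numbers, compressing the time axis four times more aggressively cuts per-step work fourfold, and the spectral baseline's budget would pay for 120 latent steps instead of 30. This is one way to read the equal-time-budget comparison in the abstract; real speedups depend on the actual architectures and encoder costs.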

Radosław Łazarz

Samsung R&D Poland

AGH University

Mateusz Wosik

Samsung R&D Poland

Mikołaj Pudo

Samsung R&D Poland

Urszula Krywalska

Samsung R&D Poland

Adam Cieślak

Samsung R&D Poland

[Figure: Reverb ESTOI performance comparison, plotting LAFUFU against other methods]

Audio Examples

Listen to LAFUFU's speech restoration results across different speakers and distortion types

EARS-Reverb: speech restoration examples with reverberation distortion

EARS-WHAM: speech restoration examples with background noise distortion

Resources

Paper

Read the full paper with detailed methodology and analysis

Code

Access the complete implementation and pretrained models

Citation

@misc{lafufu2025,
  title={LAFUFU: Latent Acoustic Features for Ultra-Fast Utterance Restoration},
  author={Rados{\l}aw {\L}azarz and Mateusz Wosik and Miko{\l}aj Pudo and Urszula Krywalska and Adam Cie{\'s}lak},
  year={2025},
  url={https://github.com/SamsungLabs/LAFUFU}
}