Diffusion-Based Audio Inpainting

Moliner, Eloi; Välimäki, Vesa

doi:10.17743/jaes.2022.0129

arXiv:2305.15266 (eess)

[Submitted on 24 May 2023 (v1), last revised 10 Jan 2025 (this version, v3)]

Abstract: Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most existing methods produce plausible reconstructions when the gaps are short but struggle to reconstruct gaps longer than about 100 ms. This paper explores recent advancements in deep learning, and particularly diffusion models, for the task of audio inpainting. The proposed method uses an unconditionally trained generative model, which can be conditioned in a zero-shot fashion for audio inpainting and is able to regenerate gaps of any size. An improved deep neural network architecture based on the constant-Q transform, which allows the model to exploit pitch-equivariant symmetries in audio, is also presented. The performance of the proposed algorithm is evaluated through objective and subjective metrics for the task of reconstructing short to mid-sized gaps of up to 300 ms. The results of a formal listening test show that the proposed method performs comparably to the baselines for short gaps, such as 50 ms, while retaining good audio quality and outperforming the baselines for wider gaps of up to 300 ms. The method presented in this paper can be applied to restoring sound recordings that suffer from severe local disturbances or dropouts that must be reconstructed.
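
To make the problem setup in the abstract concrete, the following is a minimal sketch of how a gap can be described with a binary mask and how known samples can be kept while only the gap is filled in. It is not the authors' algorithm: the function names `gap_mask` and `apply_data_consistency`, the 44.1 kHz sampling rate, and the simple sample-replacement step are all assumptions made for illustration, and the diffusion model itself is not shown.

```python
def gap_mask(num_samples, gap_start, gap_ms, sample_rate):
    """Return a list with 1 for known samples and 0 for missing ones.

    The names and the mask convention are illustrative assumptions.
    """
    gap_len = round(gap_ms * sample_rate / 1000)  # gap length in samples
    gap_end = min(gap_start + gap_len, num_samples)
    return [0 if gap_start <= n < gap_end else 1 for n in range(num_samples)]


def apply_data_consistency(observed, candidate, mask):
    """Keep observed samples where mask is 1 and candidate samples where it is 0.

    `candidate` stands in for the output of any generative model; a
    hypothetical placeholder, not the method described in the paper.
    """
    return [m * y + (1 - m) * x for y, x, m in zip(observed, candidate, mask)]


if __name__ == "__main__":
    sample_rate = 44100  # assumed sampling rate in Hz
    for gap_ms in (50, 100, 300):  # gap lengths mentioned in the abstract
        mask = gap_mask(sample_rate, 1000, gap_ms, sample_rate)
        print(gap_ms, "ms gap =", mask.count(0), "missing samples")
```

At the assumed sampling rate, a 300 ms gap corresponds to several thousand consecutive missing samples, which gives a sense of why longer gaps are harder to fill than a 50 ms one.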

Comments:	Submitted for publication to the Journal of the Audio Engineering Society on January 30th, 2023
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2305.15266 [eess.AS]
	(or arXiv:2305.15266v3 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2305.15266
Journal reference:	Journal of the Audio Engineering Society 72, no. 3 (2024): 100-113
Related DOI:	https://doi.org/10.17743/jaes.2022.0129

Submission history

From: Eloi Moliner
[v1] Wed, 24 May 2023 15:52:11 UTC (7,393 KB)
[v2] Thu, 14 Sep 2023 20:16:56 UTC (9,091 KB)
[v3] Fri, 10 Jan 2025 13:07:40 UTC (4,592 KB)
