DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization (Interspeech 2025)
by Geonyoung Lee*, Geonhee Han*, Paul Hongsuck Seo
*Equal contribution
This is the official repository for our Interspeech 2025 paper:
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization.
We propose a novel training-free framework that enables zero-shot language-queried audio source separation by repurposing pretrained text-to-audio diffusion models. DGMO refines magnitude spectrogram masks at test time, guided by diffusion-generated references.
DGMO consists of two key modules:
- Reference Generation: Uses DDIM inversion to generate query-conditioned audio references with pretrained diffusion models.
- Mask Optimization: Learns a spectrogram mask aligned to the reference, enabling faithful extraction of the target sound from the input mixture.
Unlike traditional LASS approaches, DGMO requires no training and generalizes across datasets with only test-time optimization.
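To make the mask-optimization idea concrete, here is a minimal NumPy sketch of the core objective: learn a mask `m` in [0, 1] so that `m * X` (masked mixture magnitude) matches a reference magnitude `R`. This is an illustration only; the actual DGMO implementation operates on diffusion-generated references with its own loss and optimizer, and the function name and shapes below are our assumptions, not the repository's API.

```python
import numpy as np

def optimize_mask(mixture_mag, reference_mag, steps=500, lr=0.05):
    """Projected gradient descent on ||m * X - R||^2 with m clipped to [0, 1].

    mixture_mag (X) and reference_mag (R) are magnitude spectrograms
    of shape (freq_bins, frames). Hypothetical helper for illustration.
    """
    mask = np.full_like(mixture_mag, 0.5)   # start from a neutral mask
    for _ in range(steps):
        residual = mask * mixture_mag - reference_mag
        grad = 2.0 * residual * mixture_mag  # d/dm of the squared error
        mask -= lr * grad
        np.clip(mask, 0.0, 1.0, out=mask)    # keep the mask a valid gain
    return mask

# Toy example: the "reference" keeps only the top two frequency bins.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.0, size=(4, 8))      # mixture magnitude
R = np.zeros_like(X)
R[:2] = X[:2]                               # target source occupies bins 0-1
m = optimize_mask(X, R)
separated = m * X                           # masked spectrogram of the target
```

In the real pipeline, the optimized mask is applied to the mixture's magnitude spectrogram and the result is inverted back to a waveform with the mixture phase.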
We recommend using conda to create a clean environment:
```bash
conda create -n dgmo python=3.10 -y
conda activate dgmo
pip install -r requirements.txt
```

Make sure you have `ffmpeg` installed if you work with audio files:

```bash
conda install -c conda-forge ffmpeg
```

You can perform source separation using DGMO with a simple shell script.
Modify the following variables in the script:
```bash
# inference.sh

# Input mixture path
MIX_PATH="./data/samples/dog_barking_and_cat_meowing.wav"

# Text queries (e.g., sources you want to extract)
TEXTS=("dog barking" "cat meowing")
```

Each text query corresponds to a target sound to be separated.
Run the script as follows:

```bash
bash inference.sh
```

This will:

- Run DGMO inference for each query
- Save the separated audio as `.wav` files
- Create a timestamped directory for organized output (e.g., `./results/run_20250607_170502/`)
Our implementation builds on several open-source projects including AudioLDM, Auffusion, and Peekaboo. We sincerely thank the authors for their contributions.
This project includes components licensed under CC BY-NC-SA 4.0.
See LICENSE for full terms.
Specifically, we incorporate ideas and/or pretrained models from the projects listed above.
🔒 Note: This project is for non-commercial research and educational use only, as required by the licenses of the incorporated models.
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{lee25g_interspeech,
  title     = {{DGMO}: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization},
  author    = {Geonyoung Lee and Geonhee Han and Paul Hongsuck Seo},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {4983--4987},
  doi       = {10.21437/Interspeech.2025-840},
  issn      = {2958-1796},
}
```


