
Korea University MIIL Interspeech 2025

DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization (Interspeech 2025)

[Paper] [Project Page]

by Geonyoung Lee*, Geonhee Han*, Paul Hongsuck Seo
*Equal contribution

This is the official repository for our Interspeech 2025 paper:
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization.

We propose a training-free framework that enables zero-shot language-queried audio source separation by repurposing pretrained text-to-audio diffusion models. DGMO refines magnitude spectrogram masks at test time, guided by diffusion-generated references.


Overview

DGMO Diagram

DGMO consists of two key modules:

  • Reference Generation: Uses DDIM inversion to generate query-conditioned audio references with pretrained diffusion models.
  • Mask Optimization: Learns a spectrogram mask aligned to the reference, enabling faithful extraction of the target sound from the input mixture.

Unlike traditional LASS (language-queried audio source separation) approaches, DGMO requires no task-specific training and generalizes across datasets using test-time optimization alone.
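The mask-optimization idea above can be illustrated with a minimal numpy sketch. This is a toy example, not the repository's implementation: the `optimize_mask` helper, the plain gradient-descent loop, and the squared-error objective are all assumptions made for illustration, and real DGMO guidance comes from diffusion-generated references rather than a known clean source.

```python
# Toy sketch: given a mixture magnitude spectrogram X and a reference
# magnitude R, learn a mask M in [0, 1] so that M * X approximates R.
import numpy as np

def optimize_mask(X, R, steps=500, lr=0.1):
    """Gradient descent on ||sigmoid(W) * X - R||^2 w.r.t. logits W."""
    W = np.zeros_like(X)                      # mask logits; sigmoid(0) = 0.5
    for _ in range(steps):
        M = 1.0 / (1.0 + np.exp(-W))          # sigmoid keeps mask in [0, 1]
        err = M * X - R                       # elementwise reconstruction error
        grad = 2.0 * err * X * M * (1.0 - M)  # chain rule through the sigmoid
        W -= lr * grad
    return 1.0 / (1.0 + np.exp(-W))

# Toy data: mixture of two "sources"; the reference equals one of them.
rng = np.random.default_rng(0)
A, B = rng.random((64, 32)), rng.random((64, 32))
X = A + B
M = optimize_mask(X, A)
print(np.abs(M * X - A).mean())  # small residual after optimization
```

In the actual method, `R` is produced by DDIM inversion with a pretrained text-to-audio diffusion model, and the optimized mask is applied to the mixture spectrogram before waveform reconstruction.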


Installation

We recommend using conda to create a clean environment:

conda create -n dgmo python=3.10 -y
conda activate dgmo
pip install -r requirements.txt

Make sure ffmpeg is installed, since the pipeline reads and writes audio files:

conda install -c conda-forge ffmpeg

Inference

You can perform source separation using DGMO with a simple shell script.

Step 1: Set Up inference.sh

Modify the following variables in the script:

# inference.sh

# Input mixture path
MIX_PATH="./data/samples/dog_barking_and_cat_meowing.wav"

# Text queries (e.g., sources you want to extract)
TEXTS=("dog barking" "cat meowing")

Each text query corresponds to a target sound to be separated.

Step 2: Run Inference

Run the script as follows:

bash inference.sh

This will:

  • Run DGMO inference for each query
  • Save the separated audio as .wav files
  • Create a timestamped directory for organized output (e.g., ./results/run_20250607_170502/)
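The timestamped directory naming shown above (`run_YYYYMMDD_HHMMSS`) can be reproduced with a few lines of Python. The `make_run_dir` helper below is hypothetical, written only to illustrate the naming scheme; it is not part of this repository:

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(root="./results"):
    """Create a timestamped directory like ./results/run_20250607_170502."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(root) / f"run_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir

out = make_run_dir()
print(out)  # e.g. results/run_20250607_170502
```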

Acknowledgement

Our implementation builds on several open-source projects including AudioLDM, Auffusion, and Peekaboo. We sincerely thank the authors for their contributions.

Attribution

This project includes components licensed under CC BY-NC-SA 4.0.
See LICENSE for full terms.

Specifically, we incorporate ideas and/or pretrained models from AudioLDM, Auffusion, and Peekaboo.

🔒 Note: This project is for non-commercial research and educational use only, as required by the licenses of the incorporated models.


Citation

If you find our work useful in your research, please consider citing:

@inproceedings{lee25g_interspeech,
  title     = {{DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization}},
  author    = {Geonyoung Lee and Geonhee Han and Paul Hongsuck Seo},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {4983--4987},
  doi       = {10.21437/Interspeech.2025-840},
  issn      = {2958-1796},
}
