This is a Python re-implementation of the spectral clustering algorithm in the paper Speaker Diarization with LSTM.
This is not the original implementation used by the paper.
Specifically, this implementation uses the K-Means from scikit-learn, which does NOT support custom distance measures such as cosine distance.
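As a side note on the cosine-distance limitation above, here is a minimal sketch of a general identity, not a description of what this package does internally. For L2-normalized vectors, squared Euclidean distance equals twice the cosine distance, so a Euclidean-only method ranks pairs of unit-length vectors the same way cosine distance would. The helper name l2_normalize and the random data are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(X):
    """Scale each row of X to unit L2 norm (hypothetical helper)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return X / np.maximum(norms, 1e-12)


# Random stand-in data; real inputs would be speaker embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
Xn = l2_normalize(X)

# Pairwise cosine similarity of the normalized rows.
cosine_sim = Xn @ Xn.T

# Pairwise squared Euclidean distance of the normalized rows.
diff = Xn[:, None, :] - Xn[None, :, :]
sq_euclidean = np.sum(diff ** 2, axis=-1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b) = 2 * cosine_distance.
identity_holds = np.allclose(sq_euclidean, 2.0 * (1.0 - cosine_sim))
print(identity_holds)
```

Whether normalizing the inputs this way is appropriate depends on the embeddings in use; the sketch only demonstrates the identity.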
This package depends on:
- numpy
- scipy
- scikit-learn
Install the package by:
pip3 install spectralcluster
or
python3 -m pip install spectralcluster
Simply use the predict() method of class SpectralClusterer to perform
spectral clustering:
from spectralcluster import SpectralClusterer
clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
    p_percentile=0.95,
    gaussian_blur_sigma=1)
labels = clusterer.predict(X)
The input X is a numpy array of shape (n_samples, n_features),
and the returned labels is a numpy array of shape (n_samples,).
For the complete list of parameters of the clusterer, see
spectralcluster/spectral_clusterer.py.
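To make the expected input shape concrete, the following sketch builds a synthetic X with numpy only. The helper make_synthetic_embeddings, the two-cluster layout, and the noise level are assumptions for illustration. Real inputs would be speaker embeddings, and the sketch does not call the package itself.

```python
import numpy as np


def make_synthetic_embeddings(n_per_cluster=50, n_features=16, seed=0):
    """Return (X, true_labels) for two well-separated synthetic clusters.

    Hypothetical helper for illustration; not part of spectralcluster.
    """
    rng = np.random.default_rng(seed)
    # Two cluster centers along different coordinate axes.
    centers = np.zeros((2, n_features))
    centers[0, 0] = 1.0
    centers[1, 1] = 1.0
    # Add small Gaussian noise around each center.
    X = np.concatenate([
        center + 0.05 * rng.standard_normal((n_per_cluster, n_features))
        for center in centers
    ])
    true_labels = np.repeat(np.arange(2), n_per_cluster)
    return X, true_labels


X, true_labels = make_synthetic_embeddings()
print(X.shape)            # (100, 16), i.e. (n_samples, n_features)
print(true_labels.shape)  # (100,), i.e. (n_samples,)

# With the package installed, X can be passed to the clusterer from the
# usage snippet above:
#   labels = clusterer.predict(X)
```

The reference labels are only there so the predicted labels can be compared against a known grouping. Cluster indices may be permuted relative to true_labels.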
Our paper is cited as:
@inproceedings{wang2018speaker,
title={Speaker diarization with {LSTM}},
author={Wang, Quan and Downey, Carlton and Wan, Li and Mansfield, Philip Andrew and Moreno, Ignacio Lopez},
booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={5239--5243},
year={2018},
organization={IEEE}
}
Our new speaker diarization systems are now fully supervised, powered by uis-rnn. See the Google AI Blog post for details.
To learn more about speaker diarization, here is a curated list of resources: awesome-diarization.
