Introduction

Open-Unmix - A Reference Implementation for Music Source Separation

Open-Unmix is a deep neural network reference implementation for music source separation, aimed at researchers, audio engineers, and artists. It provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass, and the remaining other instruments.

Although open-unmix reaches state-of-the-art separation performance as of September 2019 (see Evaluation), its design choices favored simplicity over performance, to keep the code clear and to let it serve as a baseline for future research. The results are comparable to or better than those of UHL1/UHL2, which obtained the best performance among all systems trained on MUSDB18 in the SiSEC 2018 evaluation campaign. We designed the code to allow researchers to reproduce existing results, quickly develop new architectures, and add their own data for training and testing. We favored framework-specific implementations over a monolithic repository with common code for all frameworks.

The model is available for three different frameworks. However, the PyTorch implementation serves as the reference version and includes pre-trained networks trained on the MUSDB18 dataset.

Paper

Open-unmix is presented in a paper that has been published in the Journal of Open Source Software. You may download the paper PDF here.

If you use open-unmix for your research, please cite it through the references below.

Design Choices

The design choices made for Open-Unmix seek to reconcile two somewhat contradictory objectives. The first is state-of-the-art performance; the second is to remain easily understandable, so that the code can serve as a basis for research that improves performance in the future. In the past, many researchers faced difficulties in pre- and post-processing that could have been avoided by sharing domain knowledge. Our aim was thus to design a system that allows researchers to focus on A) new representations and B) new architectures.

Framework specific vs. framework agnostic

We chose PyTorch to serve as the reference implementation due to its balance between simplicity and modularity. Furthermore, we have already ported the core model to NNabla and plan to release a port for TensorFlow 2.0 once the framework is released. Note that the ports will not include pre-trained models, as we cannot guarantee that the ports yield identical results, thus leaving a single baseline model for researchers to compare against.

"MNIST-like"

Keeping in mind that the learning curve can be quite steep in audio processing, we did our best to make Open-Unmix:

  • simple to extend: the pre/post-processing, data loading, training, and model parts of the code are isolated and easy to replace or update. In particular, a specific effort was made to make it easy to replace the model.
  • not a package: the software is composed of largely independent and self-contained parts, keeping it easy to use and easy to change.
  • hackable (MNIST-like): because we want machine-learning experts to be able to try out music separation easily, we did our best to stick to the philosophy of baseline implementations for this community. In particular, Open-Unmix mimics the famous MNIST example, including the ability to instantly start training on a dataset that is automatically downloaded (see the example after this list).
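
For example, a training run can be started with a single command. The flag shown below is an assumption for illustration; check python train.py --help for the actual interface:

python train.py --target vocals   # hypothetical flags; see train.py --help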

Reproducible

Releasing Open-Unmix is first and foremost an attempt to provide a reliable implementation that sticks to established programming practices, such as those proposed in (McFee et al. 2018). In particular:

  • reproducible code: everything is provided to exactly reproduce our experiments and display our results.
  • pre-trained models: we provide pre-trained weights that allow a user to use the model right away or fine-tune it on user-provided data (Stöter and Liutkus 2019a, 2019b).
  • tests: the release includes unit and regression tests, useful to organize future open collaboration through pull requests.

Using the PyTorch version

For installation we recommend using the Anaconda Python distribution. To create a conda environment for open-unmix, simply run:

conda env create -f environment-X.yml

where X is one of [cpu-linux, gpu-linux-cuda10, cpu-osx], depending on your system. For now, Windows support has not been tested.
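
For example, on a Linux machine without a GPU (the name of the environment to activate is defined inside the yml file; the name below is an assumption):

conda env create -f environment-cpu-linux.yml
conda activate open-unmix-cpu   # hypothetical name; check the yml file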

We provide two pre-trained models:

  • umxhq (default) is trained on MUSDB18-HQ, which comprises the same tracks as MUSDB18 but uncompressed. This allows the model to output separated signals with the full bandwidth of 22050 Hz.

  • umx is trained on the regular MUSDB18, which is bandwidth-limited to 16 kHz due to AAC compression. This model should be used for comparison with other (older) methods evaluated in SiSEC18.

To separate audio files (wav, flac, or ogg, but not mp3), just run:

python test.py input_file.wav --model umxhq

A more detailed list of the parameters used for the separation is given in the inference.md document. We also provide a Jupyter notebook on Google Colab to experiment with open-unmix and separate files online without any local installation.

Torch.hub

The pre-trained models can be loaded in other PyTorch-based projects using torch.hub.load:

torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq', target='vocals')
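
As a minimal sketch, the loaded model can then be applied to audio tensors. The shapes and I/O conventions below (stereo waveform in, magnitude spectrogram of the target out) are assumptions for illustration; check the repository documentation for the exact API:

import torch

# Load the pre-trained vocals model (weights are downloaded on first use).
model = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq', target='vocals')
model.eval()

# Assumed input convention: a batch of stereo waveforms of shape
# (nb_samples, nb_channels, nb_timesteps), here one second at 44.1 kHz.
audio = torch.rand(1, 2, 44100)
with torch.no_grad():
    # Assumed output: a magnitude spectrogram estimate of the target source.
    vocals_estimate = model(audio)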

Evaluation using museval

To perform evaluation in comparison to other SiSEC systems, you need to install the museval package using

pip install museval

and then run the evaluation using

python eval.py --outdir /path/to/musdb/estimates --evaldir /path/to/museval/results
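
Alternatively, single tracks can be evaluated programmatically. The following is a minimal sketch using the musdb and museval packages; the root path is a placeholder and the estimates are dummies for illustration only:

import musdb
import museval

# Load the MUSDB18 test set (the root path is a placeholder).
mus = musdb.DB(root='/path/to/musdb', subsets='test')
track = mus.tracks[0]

# Dummy estimates for illustration: replace with your separated signals,
# arrays of shape (nb_samples, nb_channels) matching track.audio.
estimates = {
    'vocals': track.audio * 0.5,
    'accompaniment': track.audio * 0.5,
}

scores = museval.eval_mus_track(track, estimates, output_dir='/path/to/museval/results')
print(scores)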

Contribute / Support

open-unmix is a community-focused project; we therefore encourage the community to submit bug fixes and requests for technical support through GitHub issues. For more details on how to contribute, please follow our CONTRIBUTING.md.

For support and help, please use the Gitter chat or the Google Groups forum.

References

If you use open-unmix for your research – Cite Open-Unmix
@article{stoter19,
  author        = {F.-R. St{\"o}ter and
                   S. Uhlich and
                   A. Liutkus and
                   Y. Mitsufuji},
  title         = {Open-Unmix - A Reference Implementation
                   for Music Source Separation},
  journal       = {Journal of Open Source Software},
  year          = 2019,
  doi           = {10.21105/joss.01667},
  url           = {https://doi.org/10.21105/joss.01667}
}

If you use the MUSDB dataset for your research - Cite the MUSDB18 Dataset

@misc{MUSDB18,
  author       = {Zafar Rafii and
                  Antoine Liutkus and
                  Fabian-Robert St{\"o}ter and
                  Stylianos Ioannis Mimilakis and
                  Rachel Bittner},
  title        = {The {MUSDB18} corpus for music separation},
  month        = dec,
  year         = 2017,
  doi          = {10.5281/zenodo.1117372},
  url          = {https://doi.org/10.5281/zenodo.1117372}
}

If you compare your results with SiSEC 2018 participants - Cite the SiSEC 2018 LVA/ICA Paper

@inproceedings{SiSEC18,
  author       = {Fabian-Robert St{\"o}ter and
                  Antoine Liutkus and
                  Nobutaka Ito},
  title        = {The 2018 Signal Separation Evaluation Campaign},
  booktitle    = {Latent Variable Analysis and Signal Separation},
  year         = 2018,
  pages        = {293--305}
}

⚠️ Please note that the official acronym for open-unmix is UMX.

Authors

Fabian-Robert Stöter, Antoine Liutkus, Inria and LIRMM, Montpellier, France

License

MIT

Funding

This work was partly supported by the research programme KAMoulox (ANR-15-CE38-0003-01), funded by ANR, the French state agency for research.