# Datasets
The MASS dataset formed the core content of the early Signal Separation Evaluation Campaigns (SiSEC) (Vincent, Araki, and Bofill 2009), which evaluate the quality of various music separation methods. SiSEC has always had a strong focus on separating vocals from accompaniment. For a long time, vocal separation methods were computationally very demanding, and separating excerpts only a few seconds long was already considered extremely challenging.
In the following years, new datasets were proposed that improved on the MASS dataset in several respects, such as the number of tracks, their duration, and whether full-length stereo tracks are provided. They are summarized in the table below:
Dataset | Year | Tracks | Track duration (s) | Full tracks / stereo? |
---|---|---|---|---|
MASS | 2008 | 9 | 16 | ❌ / ✔️ |
MIR-1K | 2010 | 1,000 | ≈8 | ❌ / ❌ |
QUASI | 2011 | 5 | ≈206 | ✔️ / ✔️ |
ccMixter | 2014 | 50 | ≈231 | ✔️ / ✔️ |
MedleyDB | 2014 | 63 | ≈206 | ✔️ / ✔️ |
iKala | 2015 | 206 | 30 | ❌ / ❌ |
sigsep DSD100 | 2015 | 100 | ≈251 | ✔️ / ✔️ |
sigsep MUSDB18 | 2017 | 150 | ≈236 | ✔️ / ✔️ |
sigsep MUSDB18-HQ | 2019 | 150 | ≈236 | ✔️ / ✔️ |