# Datasets

The MASS dataset formed the core content of the early Signal Separation Evaluation Campaigns (SiSEC) (Vincent, Araki, and Bofill 2009), which evaluate the quality of music separation methods. SiSEC has always had a strong focus on separating vocals from accompaniment. For a long time, vocal separation methods were computationally demanding, and separating even excerpts only a few seconds long was considered extremely challenging.

In the following years, new datasets were proposed that improved on MASS in several respects, such as the number of tracks, the track duration, and the availability of full-length stereo songs. They are summarized in the table below:

| Dataset | Year | Tracks | Track duration (s), mean ± std | Full songs? | Stereo? |
|---|---|---|---|---|---|
| MASS | 2008 | 9 | 16 ± 7 | ❌ | ✔️ |
| MIR-1K | 2010 | 1,000 | 8 ± 8 | ❌ | ❌ |
| QUASI | 2011 | 5 | 206 ± 21 | ✔️ | ✔️ |
| ccMixter | 2014 | 50 | 231 ± 77 | ✔️ | ✔️ |
| MedleyDB | 2014 | 63 | 206 ± 121 | ✔️ | ✔️ |
| iKala | 2015 | 206 | 30 | ❌ | ❌ |
| sigsep DSD100 | 2015 | 100 | 251 ± 60 | ✔️ | ✔️ |
| sigsep MUSDB18 | 2017 | 150 | 236 ± 95 | ✔️ | ✔️ |
| sigsep MUSDB18-HQ | 2019 | 150 | 236 ± 95 | ✔️ | ✔️ |