# Datasets

The MASS dataset formed the core of the early Signal Separation Evaluation Campaigns (SiSEC) (Vincent, Araki, and Bofill 2009), which evaluate the quality of music separation methods. SiSEC has always placed a strong focus on separating vocals from accompaniment. For a long time, vocal separation methods were so computationally demanding that separating excerpts of only a few seconds was already considered extremely challenging.

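In the following years, new datasets were proposed that improved on the MASS dataset in several respects, such as the number of tracks, the track length, and whether full-length stereo mixtures are provided. They are summarized in the table below: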

| Dataset | Year | Tracks | Track duration (s, mean ± std) | Full-length / stereo? |
|---|---|---|---|---|
| MASS | 2008 | 9 | 16 ± 7 | ❌ / ✔️ |
| MIR-1K | 2010 | 1,000 | 8 ± 8 | ❌ / ❌ |
| QUASI | 2011 | 5 | 206 ± 21 | ✔️ / ✔️ |
| ccMixter | 2014 | 50 | 231 ± 77 | ✔️ / ✔️ |
| MedleyDB | 2014 | 63 | 206 ± 121 | ✔️ / ✔️ |
| iKala | 2015 | 206 | 30 | ❌ / ❌ |
| sigsep DSD100 | 2015 | 100 | 251 ± 60 | ✔️ / ✔️ |
| sigsep MUSDB18 | 2017 | 150 | 236 ± 95 | ✔️ / ✔️ |
| sigsep MUSDB18-HQ | 2019 | 150 | 236 ± 95 | ✔️ / ✔️ |
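To make the growth in scale more concrete, the snippet below estimates the total amount of audio in each dataset as the number of tracks times the mean track duration. This is a rough sketch: the figures are transcribed by hand from the table, and because only mean durations are listed, the totals are approximations rather than exact dataset sizes. The names `DATASETS` and `approx_total_hours` are hypothetical and chosen for illustration only.

```python
# Rough estimate of total audio per dataset: tracks x mean track duration.
# Values are transcribed from the dataset table; durations are MEAN values
# in seconds, so the resulting totals are approximations only.
DATASETS = [
    # (name, year, number of tracks, mean track duration in seconds)
    ("MASS", 2008, 9, 16),
    ("MIR-1K", 2010, 1000, 8),
    ("QUASI", 2011, 5, 206),
    ("ccMixter", 2014, 50, 231),
    ("MedleyDB", 2014, 63, 206),
    ("iKala", 2015, 206, 30),
    ("DSD100", 2015, 100, 251),
    ("MUSDB18", 2017, 150, 236),
    ("MUSDB18-HQ", 2019, 150, 236),
]


def approx_total_hours(tracks: int, mean_duration_s: float) -> float:
    """Estimate the total audio length in hours from track count and mean duration."""
    return tracks * mean_duration_s / 3600.0


if __name__ == "__main__":
    # Print one line per dataset with its estimated total duration.
    for name, year, tracks, mean_s in DATASETS:
        print(f"{name:<11} {year}  ~{approx_total_hours(tracks, mean_s):5.2f} h")
```

With these numbers, MUSDB18 comes out at roughly 9.8 hours of audio, compared with about 0.04 hours (144 seconds) for MASS, an increase of roughly 250 times.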