Apply it to the mixture, reconstruct, and listen to the result
Introduction: filtering
Introduction: a brief history
The big picture
Z. Rafii et al. "An Overview of Lead and Accompaniment Separation in Music."
IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.8 (2018): 1307-1335.
A brief history: model-driven methods
Harmonicity for the lead
Based on pitch detection
Works best on clean voices
Suffers from "metallic" artifacts
A brief history: model-driven methods
Redundancy for the accompaniment: NMF
Spectral templates
Low-rank assumptions
Poor generalization
A brief history: model-driven methods
Redundancy for the accompaniment: RPCA
Models the accompaniment as low-rank
Treats vocals as unstructured
Strong interference in general
A brief history: model-driven methods
Redundancy for the accompaniment: REPET
Assumes the music is repetitive
Assumes the vocals are not
Solos leak into the vocal estimate
A brief history: model-driven methods
Modeling both lead and accompaniment: the source-filter model
SDR: Source-to-Distortion Ratio. Overall error in the estimate.
SIR: Source-to-Interference Ratio. Presence of the other sources.
SAR: Source-to-Artifacts Ratio. Amount of artificial noise. (Formal definitions below.)
E. Vincent et al. "Performance measurement in blind audio source separation."
IEEE Transactions on Audio, Speech, and Language Processing 14.4 (2006): 1462-1469.
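For reference, the decomposition these metrics are built on (from Vincent et al., 2006): the estimate is split into a target part and three error terms, and each ratio compares their energies:

$$
\hat{s} = s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}}
$$

$$
\mathrm{SDR} = 10 \log_{10} \frac{\|s_{\text{target}}\|^2}{\|e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}}\|^2}, \quad
\mathrm{SIR} = 10 \log_{10} \frac{\|s_{\text{target}}\|^2}{\|e_{\text{interf}}\|^2}, \quad
\mathrm{SAR} = 10 \log_{10} \frac{\|s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}}\|^2}{\|e_{\text{artif}}\|^2}
$$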
museval (BSS Eval v4)
Matching filters computed over whole tracks, for a better fit
About 10x faster
F. Stöter et al. "The 2018 Signal Separation Evaluation Campaign."
LVA/ICA 2018.
Introduction: evaluating quality
Hands-on
Loop over some musdb tracks
Evaluate our separation system on musdb (see the sketch below)
Compare to the state of the art (SiSEC18)
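A minimal sketch of this loop with the musdb and museval packages (the `separate` function is a hypothetical placeholder for the system under test):

```python
import musdb
import museval

def separate(audio, rate):
    # hypothetical placeholder: a real system would return its own
    # estimates, one (nb_samples, nb_channels) array per target
    return {"vocals": 0.5 * audio, "accompaniment": 0.5 * audio}

# download=True fetches the 7-second preview version of MUSDB18
mus = musdb.DB(download=True, subsets="test")

for track in mus.tracks[:5]:
    estimates = separate(track.audio, track.rate)
    scores = museval.eval_mus_track(track, estimates)  # BSS Eval v4
    print(track.name)
    print(scores)
```

The per-track scores can then be compared against the SiSEC18 submissions, which were evaluated with the same museval toolbox.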
Introduction
Deep neural networks
Y. LeCun et al. "Deep learning." Nature 521.7553 (2015): 436-444.
Introduction: deep neural networks
Basic fully connected layer
Introduction: deep neural networks
Basic fully connected network
Introduction: deep neural networks
A typical deep network
Cascading linear and non-linear operations increases expressive power
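A minimal PyTorch sketch of such a cascade (layer sizes are arbitrary choices for illustration, not the tutorial's exact model):

```python
import torch
import torch.nn as nn

n_bins = 2049  # e.g. a 4096-point FFT gives 2049 frequency bins (assumption)

# fully connected layers cascaded with non-linearities
model = nn.Sequential(
    nn.Linear(n_bins, 512),   # basic fully connected layer: y = Wx + b
    nn.ReLU(),                # non-linearity between the linear maps
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, n_bins),
    nn.Sigmoid(),             # output a soft mask in [0, 1] per bin
)

x = torch.rand(16, n_bins)    # a batch of 16 magnitude spectrogram frames
mask = model(x)               # same shape as x, values in [0, 1]
```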
N. Srivastava et al. "Dropout: a simple way to prevent neural networks from overfitting." JMLR 15.1 (2014): 1929-1958.
Training: regularization with dropout
Regularization makes things worse
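For reference, a sketch of where dropout would sit in such a network (shown for illustration only, since the finding here is that it degrades separation quality):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2049, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes half the activations during training
    nn.Linear(512, 2049),
    nn.Sigmoid(),
)
```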
Training: sampling strategy
Non-unique tracks in batch
Not all samples per epoch
Training: sampling strategy
Unique tracks in batch
Not all samples per epoch
Training: sampling strategy
Non-unique tracks in batch
All samples per epoch
Training: sampling strategy
Unique tracks in batch
All samples per epoch
Training: sampling strategy
Hands-on: hierarchical sampling
Implement the 4 strategies with pescador
Apply them to spectrograms (see the sketch below)
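A minimal sketch of hierarchical sampling with pescador: one streamer per track yields random spectrogram patches, and a mux draws from a few active tracks at a time. Names like `track_specs` and the patch size are assumptions for illustration:

```python
import numpy as np
import pescador

def patches(spec, patch_len=128):
    """Yield random spectrogram patches from a single track, forever."""
    while True:
        start = np.random.randint(0, spec.shape[1] - patch_len)
        yield {"X": spec[:, start:start + patch_len]}

# stand-in data: one (n_bins, n_frames) spectrogram per track
track_specs = [np.random.rand(2049, 1000) for _ in range(10)]

streamers = [pescador.Streamer(patches, spec) for spec in track_specs]

# mode="with_replacement" allows non-unique tracks within a batch;
# other modes ("exhaustive", "single_active") give the other strategies
mux = pescador.StochasticMux(streamers, n_active=4, rate=16,
                             mode="with_replacement")

stream = mux.iterate()
batch = np.stack([next(stream)["X"] for _ in range(16)])  # (16, 2049, 128)
```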
Training: sampling strategy
Unique tracks per batch is slower
All samples per epoch is faster
B. Recht. "Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences." Technical report (2012).
Training: data augmentation
Basic augmentation: overlap samples within each track (sketched below)
There are more advanced strategies
S. Uhlich et al. "Improving music source separation based on deep neural networks through data augmentation and network blending." ICASSP 2017.
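A sketch of simple source-level augmentations in the spirit of Uhlich et al. (random per-source gains and channel swaps before remixing); the exact ranges are arbitrary choices here:

```python
import numpy as np

def augment(sources, rng=np.random):
    """sources: dict mapping name -> (nb_samples, 2) stereo array."""
    out = {}
    for name, audio in sources.items():
        audio = rng.uniform(0.25, 1.25) * audio   # random per-source gain
        if rng.rand() < 0.5:                      # random channel swap
            audio = audio[:, ::-1]
        out[name] = audio
    mix = sum(out.values())  # the training mixture is rebuilt from the
    return mix, out          # augmented sources, so mix = sum of targets
```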
Training: data augmentation
Basic augmentation helps a bit (+0.5 dB)
Not shown: adding new tracks helps even more!
Open Source Unmix (OSU) models
Outline
Testing
Representation
Mono filter tricks
Multichannel Gaussian model
The multichannel Wiener filter
Testing: evaluation
Testing: representations
The first source of poor results: inverse STFT!
Verify perfect reconstruction
Better: use established libraries such as librosa or scipy (see the check below)
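A quick perfect-reconstruction check with scipy (window and hop here are assumptions; any COLA-satisfying pair works):

```python
import numpy as np
from scipy import signal

x = np.random.randn(44100)  # one second of noise at 44.1 kHz

f, t, X = signal.stft(x, nperseg=4096, noverlap=3072)
_, x_hat = signal.istft(X, nperseg=4096, noverlap=3072)

# istft may return a padded signal; compare on the common support
n = min(len(x), len(x_hat))
print(np.allclose(x[:n], x_hat[:n], atol=1e-10))  # True: perfect reconstruction
```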
Testing: mono filter tricks
Logit filters
If the mask is 0.8... just put 1
If the mask is 0.2... just put 0
Cheap interference reduction
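A sketch of such a logit-style filter: squash the soft mask through a steep logistic centered at 0.5, pushing values like 0.8 towards 1 and 0.2 towards 0 (the slope `alpha` is an arbitrary choice):

```python
import numpy as np

def logit_filter(mask, alpha=20.0):
    """Sharpen a soft mask in [0, 1] towards binary values."""
    return 1.0 / (1.0 + np.exp(-alpha * (mask - 0.5)))

print(logit_filter(np.array([0.2, 0.5, 0.8])))  # ~[0.002, 0.5, 0.998]
```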
Multichannel Gaussian model
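In standard local Gaussian model notation (the symbols are assumptions consistent with the slides: $\mathbf{x}$ the mixture STFT, $v_j$ the power spectral density of source $j$, $\mathbf{R}_j$ its spatial covariance):

$$
\mathbf{x}(f,t) = \sum_j \mathbf{s}_j(f,t), \qquad
\mathbf{s}_j(f,t) \sim \mathcal{N}_c\!\left(\mathbf{0},\; v_j(f,t)\,\mathbf{R}_j(f)\right)
$$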
Testing: the multichannel Wiener filter
Sources and mixtures are jointly Gaussian
We observe the mix: what can we say about the sources?
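Under this model the posterior of each source given the mix is again Gaussian, and its mean is the multichannel Wiener filter estimate:

$$
\hat{\mathbf{s}}_j(f,t) = v_j(f,t)\,\mathbf{R}_j(f)\left[\sum_k v_k(f,t)\,\mathbf{R}_k(f)\right]^{-1}\mathbf{x}(f,t)
$$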
Testing: the Expectation-Maximization algorithm
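A numpy sketch of one EM-style refinement pass under the model above: re-estimate each source's PSD and spatial covariance from the current estimates, then re-apply the multichannel Wiener filter. Shapes and the statistics updates are illustrative assumptions, not the tutorial's exact code:

```python
import numpy as np

def em_iteration(x, s_hat, eps=1e-10):
    """x: (F, T, C) complex mix STFT; s_hat: list of (F, T, C) estimates."""
    F, T, C = x.shape
    v, R = [], []
    for s in s_hat:
        # spatial covariance averaged over time, trace-normalized to C
        Rj = np.einsum("ftc,ftd->fcd", s, s.conj()) / T
        trace = np.trace(Rj, axis1=1, axis2=2).real[:, None, None] + eps
        Rj = C * Rj / trace
        # time-varying power spectral density
        vj = np.einsum("ftc,ftc->ft", s, s.conj()).real / C
        v.append(vj)
        R.append(Rj)
    # mixture covariance per (f, t): sum_j v_j R_j, regularized
    Cx = sum(vj[..., None, None] * Rj[:, None] for vj, Rj in zip(v, R))
    inv_Cx = np.linalg.inv(Cx + eps * np.eye(C))
    # multichannel Wiener filter for each source
    return [np.einsum("ftcd,ftd->ftc",
                      vj[..., None, None] * Rj[:, None] @ inv_Cx, x)
            for vj, Rj in zip(v, R)]
```

Iterating this a few times, starting from the network's estimates, is what produces the SIR/SAR trade-off reported next.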
Testing: evaluation
Iterations improve SIR
$\Rightarrow$ interference is greatly reduced
Iterations worsen SAR
$\Rightarrow$ distortion is introduced
The logit filter has good SIR
$\Rightarrow$ cheap interference reduction
Testing: evaluation
Outline
Conclusion
Resulting baseline
What was kept out
What is promising
Ending remarks
Conclusion: the Open Source Unmix
Conclusion: what was kept out
Exotic representations
Alternative structures
The convolutional neural network (CNN)
The U-Net
The MMDenseNet
Deep clustering
Generative approaches
Generative adversarial nets
(Variational) auto encoders
Full grid search over parameters (fund us!)
Advanced data augmentation (naive: +0.3 dB SDR)
Conclusion: what is promising
More data
More data
Even more data
Did we mention more data?
New approaches
Structures with more parameters work better...
Better signal processing helps
Engineering
We got a 3 dB SDR improvement with no publishable contribution
$\Rightarrow$ evaluating the real impact of a contribution is difficult
Conclusion: ending remarks
Convergence of signal processing, probability theory, and deep learning