Vocal Source Separation

Results from our ICASSP paper are presented here.

First, a track on which our method performs well. This track (45412_chorus from the iKala dataset) achieves a Normalised Source to Distortion Ratio (NSDR) of 13.9254 dB for the sung voice and 12.4505 dB for the musical accompaniment, around 1 standard deviation above our mean performance over the dataset:

Mix


Predicted Vocals

Predicted Music
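NSDR measures how much a separated source improves on simply using the mixture as the estimate: NSDR = SDR(estimate, reference) - SDR(mixture, reference). The sketch below illustrates that definition in Python. It is a simplification and not the evaluation code behind these results. The helper names `sdr` and `nsdr` are hypothetical, and SDR here is the plain ratio of reference energy to error energy, whereas the BSS Eval SDR usually reported on iKala also allows a short distortion filter on the reference before measuring the error.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Simplified SDR in dB: reference energy over error energy.

    Assumption: unlike BSS Eval, no distortion filter is applied to the
    reference, so any difference between estimate and reference counts as error.
    """
    error = reference - estimate
    return float(10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps)))


def nsdr(reference: np.ndarray, estimate: np.ndarray, mixture: np.ndarray) -> float:
    """Normalised SDR: gain of the estimate over the unprocessed mixture."""
    return sdr(reference, estimate) - sdr(reference, mixture)


# Toy example with synthetic signals (not iKala audio).
rng = np.random.default_rng(0)
vocals = rng.standard_normal(16000)
music = rng.standard_normal(16000)
mixture = vocals + music
vocal_estimate = vocals + 0.1 * music  # 10% accompaniment leakage into the vocals

print(f"Vocal NSDR: {nsdr(vocals, vocal_estimate, mixture):.2f} dB")
```

In this toy case the error energy of the estimate is 1/100 of the mixture's error energy, so NSDR comes out at about 20 dB regardless of the vocal-to-music energy ratio. Passing the mixture itself as the estimate gives an NSDR of 0 dB.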

Now for an average case. This track (21056_chorus) achieves performance roughly equal to our mean performance across the dataset: Vocal NSDR = 9.4209 dB, Music NSDR = 9.7477 dB.

Mix

Predicted Vocals

Predicted Music

Finally, a track on which we do less well. This track (21056_chorus) achieves performance roughly 1 standard deviation below our mean performance across the dataset: Vocal NSDR = 5.7311 dB, Music NSDR = 6.5855 dB.

Mix

Predicted Vocals

Predicted Music
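Each example above is placed by how far its NSDR lies from the dataset mean, measured in standard deviations. A minimal sketch of that placement follows, assuming a population standard deviation (the page does not say which estimator was used) and a hypothetical helper `z_score`. The numbers in the usage line are toy values, not results from the paper.

```python
import numpy as np


def z_score(track_nsdr: float, dataset_nsdrs) -> float:
    """Distance of one track's NSDR from the dataset mean, in standard deviations.

    Assumption: population standard deviation (ddof=0).
    """
    values = np.asarray(dataset_nsdrs, dtype=float)
    return float((track_nsdr - values.mean()) / values.std(ddof=0))


# Toy per-track NSDR values in dB (illustrative only).
toy_nsdrs = [1.0, 2.0, 3.0]
print(round(z_score(3.0, toy_nsdrs), 4))
```

A value near +1 corresponds to the first example, near 0 to the average case, and near -1 to the last example.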