The scores below are taken either from the respective papers or from the SiSEC18 evaluation results, and we report the median SDR (in dB). Note that no extra data is used in our training procedure. For a fair comparison, we only compare against methods trained without data augmentation. We also provide some demos.
Models | Bass | Drums | Other | Vocals | AVG. |
---|---|---|---|---|---|
IRM oracle | 7.12 | 8.45 | 7.85 | 9.43 | 8.21 |
Wave-U-Net [paper] [code] | 3.21 | 4.22 | 2.25 | 3.25 | 3.23 |
UMX [paper] [code] | 5.23 | 5.73 | 4.02 | 6.32 | 5.33 |
Meta-TasNet [paper] [code] | 5.58 | 5.91 | 4.19 | 6.40 | 5.52 |
MMDenseLSTM [paper] | 5.16 | 6.41 | 4.15 | 6.60 | 5.58 |
Sams-Net [paper] | 5.25 | 6.63 | 4.09 | 6.61 | 5.65 |
X-UMX [paper] [code] | 5.43 | 6.47 | 4.64 | 6.61 | 5.79 |
Conv-TasNet [paper] | 6.53 | 6.23 | 4.26 | 6.21 | 5.81 |
LaSAFT [paper] [code] | 5.63 | 5.68 | 4.87 | 7.33 | 5.88 |
Spleeter [paper] [code] | 5.51 | 6.71 | 4.02 | 6.86 | 5.91 |
D3Net [paper] | 5.25 | 7.01 | 4.53 | 7.24 | 6.01 |
DEMUCS [paper] [code] | 7.01 | 6.86 | 4.42 | 6.84 | 6.28 |
Ours (CDEHTCN) | 7.92 | 7.33 | 4.92 | 7.37 | 6.89 |
We also compare our proposed CDEHTCN [1] with the previous methods on two other metrics: SAR (sources-to-artifacts ratio) and SIR (source-to-interferences ratio).
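As a rough reference for how such numbers are obtained, the sketch below uses the `museval` package (the BSS Eval v4 implementation behind the SiSEC18 campaign) to compute frame-wise SDR/SIR/SAR for one track and reduce them to per-source medians. This is a minimal illustration rather than our exact evaluation script; the array shapes, source ordering, and 1-second window settings are assumptions following the museval defaults.

```python
import numpy as np
import museval  # BSS Eval v4 implementation used in the SiSEC18 campaign


def median_scores(references, estimates, rate=44100):
    """Frame-wise BSS Eval scores reduced to per-source medians for one track.

    references, estimates: arrays of shape (nsrc, nsamples, nchannels),
    with sources in a fixed order, e.g. [bass, drums, other, vocals].
    """
    # 1-second windows and hops, as in the SiSEC18 evaluation settings
    sdr, isr, sir, sar = museval.evaluate(references, estimates, win=rate, hop=rate)
    # Median over frames; NaN frames (e.g. silent segments) are ignored
    return {
        "SDR": np.nanmedian(sdr, axis=1),
        "SIR": np.nanmedian(sir, axis=1),
        "SAR": np.nanmedian(sar, axis=1),
    }
```

For table-level numbers, these per-track medians are further aggregated over the test set (SiSEC18 reports the median over tracks).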
[1] Y. Hu, Y. Chen, W. Yang, et al., "Hierarchic Temporal Convolutional Network With Cross-Domain Encoder for Music Source Separation," IEEE Signal Processing Letters, vol. 29, pp. 1517-1521, 2022. [paper]