This repository contains the code, labels, and metadata for the AVASpeech-SMAD dataset, presented as a late-breaking demo at ISMIR 2021.
AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence (arXiv:2111.01320)
- Yun-Ning Hung, Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife, Kelian Li, Pavan Seshadri, Junyoung Lee
@article{avaspeechSMAD,
  title   = {AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence},
  author  = {Hung, Yun-Ning and Watcharasupat, Karn N. and Wu, Chih-Wei and Orife, Iroro and Li, Kelian and Seshadri, Pavan and Lee, Junyoung},
  year    = {2021},
  journal = {arXiv preprint arXiv:2111.01320}
}
Download audio:
- Install youtube-dl
- Run the download script:
  python3 process.py
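As an illustration of what the download step involves, the sketch below builds a youtube-dl invocation per clip. The output directory, WAV format, and URL pattern are assumptions for illustration; process.py may use different options.

```python
# Sketch: build the youtube-dl command that extracts audio for one clip.
# The video ID, output directory, and audio format here are illustrative
# assumptions, not necessarily the options process.py actually uses.

def build_command(video_id: str, out_dir: str = "audio") -> list:
    """Return a youtube-dl invocation extracting WAV audio for `video_id`."""
    return [
        "youtube-dl",
        "--extract-audio",          # keep the audio track only
        "--audio-format", "wav",    # decode to WAV
        "--output", f"{out_dir}/{video_id}.%(ext)s",
        f"https://www.youtube.com/watch?v={video_id}",
    ]

# Example: subprocess.run(build_command("SOME_VIDEO_ID"), check=True)
```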
Labels (in labels/):
- Speech labels: from the original AVA-Speech dataset [1]
- Music labels: manually created by the authors
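The exact label file layout is not specified above; assuming each file is a CSV of (start_time, end_time, label) rows with times in seconds, a minimal loader might look like this. The three-column format is a guess for illustration, not the repository's documented spec.

```python
import csv
from typing import List, Tuple

# Assumed layout: one "start,end,label" row per event, times in seconds.
# This is an illustrative assumption about the files in labels/.

def load_labels(path: str) -> List[Tuple[float, float, str]]:
    """Read strongly-labelled (start, end, label) events from a CSV file."""
    events = []
    with open(path, newline="") as f:
        for start, end, label in csv.reader(f):
            events.append((float(start), float(end), label.strip()))
    return events
```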
Benchmark results from existing models [2, 3] (in evaluation/):
Statistics:
- statistic.csv: music, speech, and overlap label percentages for each song
- distribution/: distribution of music, speech, and overlap label percentages over the entire dataset
- process.py: code to download the audio and compute the statistics
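For reference, the per-song music/speech/overlap percentages can be estimated frame-wise from the strong labels. The (start, end, label) event format and the "music"/"speech" label names below are assumptions for illustration; statistic.csv holds the values actually shipped with the dataset.

```python
# Sketch: frame-wise music/speech/overlap percentages from strong labels.
# Event format and label names are illustrative assumptions.

def percentages(events, duration, hop=0.01):
    """Return (music %, speech %, overlap %) over `duration` seconds,
    rasterised onto `hop`-second frames."""
    n = int(round(duration / hop))
    music = [False] * n
    speech = [False] * n
    for start, end, label in events:
        if label not in ("music", "speech"):
            continue  # ignore any other label names
        track = music if label == "music" else speech
        for i in range(int(round(start / hop)), min(n, int(round(end / hop)))):
            track[i] = True
    pct = lambda flags: 100.0 * sum(flags) / n
    overlap = [m and s for m, s in zip(music, speech)]
    return pct(music), pct(speech), pct(overlap)
```

For example, `percentages([(0.0, 5.0, "music"), (3.0, 8.0, "speech")], 10.0)` gives 50% music, 50% speech, and 20% overlap (the 3-5 s region carries both labels).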
Sounds labelled as music:
- Pitched sounds with more than one note
- Singing voice
- Idents (short broadcast jingles)
- Melodic ringtones
- Multiple instrumental sounds played simultaneously
- Any rhythmic sequence of musical elements (a moving melody, or drums/percussion)
Sounds not labelled as music:
- Ambient sound effects (e.g., low-frequency sounds)
- Pitched sounds with only one note (no moving melody)
- Traditional phone bell rings or buzzes with no apparent musical elements
Sounds labelled as speech:
- Human voice in any language
- Oh (considered speech, as in "Oh my!" or "Oh no!")
Sounds not labelled as speech:
- Singing with lyrics
- Sighing
- Screaming
- Laughing
- Ah, Hm, Uh-hum, Uh, Err
- Groaning, moaning, heavy breathing
[1] S. Chaudhuri, J. Roth, D. P. Ellis, A. Gallagher, L. Kaver, R. Marvin, C. Pantofaru, N. Reale, L. G. Reid, K. Wilson et al., "AVA-speech: A densely labeled dataset of speech activity in movies," in Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018.
[2] S. Venkatesh, D. Moffat, and E. R. Miranda, "Investigating the effects of training set synthesis for audio segmentation of radio broadcast," Electronics, vol. 10, no. 7, p. 827, 2021.
[3] D. Doukhan, E. Lechapt, M. Evrard, and J. Carrive, "INA's MIREX 2018 music and speech detection system," in 14th Music Information Retrieval Evaluation eXchange, 2018.
Contact: Yun-Ning (Amy) Hung