biboamy/AVASpeech_Music_Labels

AVASpeech - SMAD

This repository contains the code, labels, and metadata for the AVASpeech-SMAD dataset, presented as a late-breaking demo at ISMIR 2021.

Citation

AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence [arXiv]

- Yun-Ning Hung, Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife, Kelian Li, Pavan Seshadri, Junyoung Lee

@article{avaspeechSMAD,
  title     = {AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence},
  author    = {Hung, Yun-Ning and Watcharasupat, Karn and Wu, Chih-Wei and Orife, Iroro and Li, Kelian and Seshadri, Pavan and Lee, Junyoung},
  year      = {2021},
  journal={arXiv preprint arXiv:2111.01320}
}

Dataset

  1. Download audio:

    • Install youtube-dl
    • Run the download script: python3 process.py
  2. Labels (in labels/):

    • Speech labels: from original AVASpeech dataset [1]
    • Music labels: manually created by the authors
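
For readers who want to see what a youtube-dl based download step can look like, here is a minimal sketch. It is not taken from process.py, which may work differently; the function names, the output directory, and the choice of WAV output are assumptions made for illustration. Audio extraction with youtube-dl also requires ffmpeg (or avconv) to be installed.

```python
import subprocess


def build_download_command(video_id, out_dir="audio"):
    """Build a youtube-dl command that extracts the audio track of one video.

    `video_id` and `out_dir` are hypothetical parameters, not names used by
    the repository's process.py.
    """
    url = f"https://www.youtube.com/watch?v={video_id}"
    return [
        "youtube-dl",
        "--extract-audio",           # keep only the audio stream
        "--audio-format", "wav",     # assumed target format
        "--output", f"{out_dir}/{video_id}.%(ext)s",
        url,
    ]


def download(video_id, out_dir="audio"):
    """Run the command; raises CalledProcessError if youtube-dl fails."""
    subprocess.run(build_download_command(video_id, out_dir), check=True)
```

Calling `download("VIDEO_ID")` with a real video ID would write `audio/VIDEO_ID.wav`, provided the video is still available.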

Metadata

  1. Benchmark results from existing models (in evaluation/):

    • inaSpeechSegmenter [3]: results derived from this repo
    • synth-audio-seg [2]: results derived from this repo
  2. Statistics

    • statistic.csv: percentage of music, speech, and overlap labels for each song.
    • distribution/: distribution of the music, speech, and overlap label percentages over the entire dataset
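
The per-file percentages described above can be computed from segment annotations roughly as follows. This is a sketch under stated assumptions, not the code in process.py: it assumes each label track is a list of (start, end) times in seconds, that segments within one track do not overlap each other, and that the clip length is known. All function names are hypothetical.

```python
def total_duration(intervals):
    """Total time covered by the intervals, merging any that overlap."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def intersect(a, b):
    """Intervals during which both tracks are active.

    Assumes the segments within each track are non-overlapping.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever segment finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def label_percentages(music, speech, clip_length):
    """Percentage of the clip labelled music, speech, and both at once."""
    overlap = intersect(music, speech)
    return {
        "music": 100.0 * total_duration(music) / clip_length,
        "speech": 100.0 * total_duration(speech) / clip_length,
        "overlap": 100.0 * total_duration(overlap) / clip_length,
    }
```

For example, with music at 0-10 s and 20-30 s, speech at 5-25 s, and a 40 s clip, this gives 50% music, 50% speech, and 25% overlap.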

Code

  1. process.py: code to download the audio and calculate the statistics

Definition of Speech and Music

Music

  • Pitched sounds with more than one note
  • Singing voice
  • Ident
  • Melodic ringtone
  • Multiple instrumental sounds played simultaneously
  • Any rhythmic sequence of musical elements (moving melody, or drums/percussion)

Non Music

  • Ambient sound effect (e.g., low frequency sound)
  • Pitched sound with only one note (no moving melody)
  • Traditional phone bell ring or buzz with no apparent musical elements

Speech

  • Human voice in different languages
  • "Oh" (considered speech, as in "Oh my!" or "Oh no!")
  • Singing with lyrics

Non Speech

  • Sighing
  • Screaming
  • Laughing
  • Ah, Hm, Uh-hum, Uh, Err
  • Groaning, moaning, heavy breathing
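
Because the two definitions are independent, a segment can be both speech and music at once (singing with lyrics appears under both). One way to represent this for a detector is a pair of binary labels per frame. The sketch below assumes (start, end) segment lists in seconds and a hop size of 0.5 s; neither the representation nor the function name comes from the repository.

```python
def frame_labels(music, speech, clip_length, hop=0.5):
    """Return one (music, speech) pair of 0/1 flags per frame.

    A frame is marked active if its centre falls inside a segment.
    """
    n_frames = int(clip_length / hop)
    frames = []
    for k in range(n_frames):
        t = (k + 0.5) * hop  # centre of frame k
        is_music = any(start <= t < end for start, end in music)
        is_speech = any(start <= t < end for start, end in speech)
        frames.append((int(is_music), int(is_speech)))
    return frames
```

A frame labelled (1, 1) marks co-occurring speech and music, and (0, 0) marks a frame with neither.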

Reference

[1] S. Chaudhuri, J. Roth, D. P. Ellis, A. Gallagher, L. Kaver, R. Marvin, C. Pantofaru, N. Reale, L. G. Reid, K. Wilson et al., “AVA-speech: A densely labeled dataset of speech activity in movies,” in Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018.

[2] S. Venkatesh, D. Moffat, and E. R. Miranda, “Investigating the effects of training set synthesis for audio segmentation of radio broadcast,” Electronics, vol. 10, no. 7, p. 827, 2021.

[3] D. Doukhan, E. Lechapt, M. Evrard, and J. Carrive, “INA’s MIREX 2018 music and speech detection system,” in 14th Music Information Retrieval Evaluation eXchange, 2018.

Contact

Yun-Ning (Amy) Hung

[email protected]
