
high_volubility.py: chunk sizes #125

Open
GladB opened this issue Jun 3, 2019 · 3 comments
GladB commented Jun 3, 2019

It is possible to modify the size of the chunks for the first, second and third extraction steps with the --chunk_sizes argument, however the new_onset_*_minutes function used to compute the onsets to extract those chunks only exists for 2 (second extraction step) and 5 (third extraction step) minutes:

  • in new_onsets_two_minutes the new onset seems to be later than the given onset, meaning the extracted chunk, which was supposed to be centered on the smaller chunk, actually starts in the middle of it. Is that on purpose?
  • in new_onsets_five_minutes the new onset is 2 minutes before the current onset, no matter the length asked for (if the --chunk_sizes argument were [10.0, 120.0, 120.0], none of the chunks would contain the data based on which they were ranked)

I don't know which behavior was expected, but the second one at least can conflict with what the script is supposed to output.
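If new_onsets_five_minutes really subtracts a fixed 2 minutes, the worst case described above can be checked in a few lines. This is a sketch of the reported behavior, not the script's code; buggy_new_onset and the onset value are hypothetical:

```python
# Hypothetical stand-in for the reported behavior of new_onsets_five_minutes:
# the new onset is always 120 s before the current one, whatever size was asked for.
def buggy_new_onset(onset):
    return onset - 120.0

prev_size = 120.0   # size of the ranked chunk (--chunk_sizes [10.0, 120.0, 120.0])
new_size = 120.0    # size requested for the next extraction step
onset = 600.0       # onset of a ranked 120 s chunk (illustrative value)

new_onset = buggy_new_onset(onset)  # 480.0
# Overlap between the new chunk [new_onset, new_onset + new_size)
# and the ranked chunk [onset, onset + prev_size):
overlap = max(0.0, min(onset + prev_size, new_onset + new_size) - max(onset, new_onset))
# overlap == 0.0 -> the new chunk contains none of the data it was ranked on
```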

@fmetze
Contributor

fmetze commented Jun 4, 2019

@alecristia - you will know best how this is supposed to work?

@alecristia
Collaborator

No, sorry - and the second one looks like a bug, so I'm tagging Marvin

fmetze added a commit that referenced this issue Jun 18, 2019
…ally wrt #122 and #125. may still need improvements to cmd line parameters, expected behavior, or documentation/ code match
@MarvinLvn
Collaborator

MarvinLvn commented Jun 20, 2019

It is possible to modify the size of the chunks for the first, second and third extraction steps with the --chunk_sizes argument, however the new_onset_*_minutes function used to compute the onsets to extract those chunks only exists for 2 (second extraction step) and 5 (third extraction step) minutes.

Yep, this function needs the previous list of onsets. Therefore, the first one must be computed differently, with the select_onsets function.

in new_onsets_two_minutes the new onset seems to be later than the given onset, meaning the extracted chunk which was supposed to be centered on the smaller chunks starts in the middle of the smaller chunk, is that on purpose?

With the following parameters:

a) a wav file of 3000 seconds
b) --chunk_sizes 10.0 120.0 300.0
c) --nb_chunks 2
d) --step 600

  1. We compute the onsets of the 10 seconds chunks (each of them being separated by 600 sec), these onsets are :
    [145.0, 745.0, 1345.0, 1945.0, 2545.0]

  2. We keep the nb_chunks * 2 of them that contain the most speech, and we compute the onsets of the new chunks (sorted by amount of speech):
    [690.0, 90.0, 1290.0, 2490.0]

745 became 690, 145 became 90, etc.

  3. We keep the nb_chunks of them that contain the most speech, and we compute the onsets of the new chunks (the ones that will be returned by the script):
    [600.0, 0.0]

690 became 600, 90 became 0, etc.

Going back to the first list of onsets ([145.0, 745.0, 1345.0, 1945.0, 2545.0]), we see that we chose:

  1. The second one, whose first-step chunk started at 745 (centered at 750).
  2. The first one, whose first-step chunk started at 145.0 (centered at 150).
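The recomputation implied by the numbers above is center-preserving: each new onset keeps the previous chunk's midpoint, clamped at 0. This is a minimal sketch of that rule; recenter_onsets is a hypothetical helper, not the actual new_onsets_* functions from high_volubility.py:

```python
def recenter_onsets(onsets, old_size, new_size):
    """Grow each chunk around its current center: the new onset keeps the
    old chunk's midpoint, clamped so it never starts before the file."""
    return [max(0.0, o + old_size / 2.0 - new_size / 2.0) for o in onsets]

# Step 2: 120 s chunks centered on the same points as the ranked 10 s chunks
step2 = recenter_onsets([745.0, 145.0, 1345.0, 2545.0], 10.0, 120.0)
# step2 == [690.0, 90.0, 1290.0, 2490.0]

# Step 3: 300 s chunks around the two best of those
step3 = recenter_onsets(step2[:2], 120.0, 300.0)
# step3 == [600.0, 0.0]
```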

in new_onsets_five_minutes the new onset is 2 minutes before the current onset, no matter the length asked for (it could be that the --chunk_sizes argument was [10.0, 120.0, 120.0] and then none of the chunks would contain the data based on which they were ranked)

The bugs you are describing might have been fixed by this commit.
