Detect speech pauses; Out of Memory Crash #14

jonathanglasmeyer · 2014-10-25T16:55:52Z

I'm not entirely sure if this is the best place to ask this kind of question, so please point me to a better place if there is one.

We are currently using the Sphinx4 Long Aligner with some success for a subtitling project at the University of Hamburg.

Today was the first time that I tried it successfully "in the field".
I took the transcription and this video from the CCC Congress and aligned the 35-minute video (I mean, of course, the WAV converted according to your instructions) in ~88 min with the Sphinx Long Aligner, which I think is pretty good. (You can see the manually optimized results on the linked video page.)

Right now the biggest problem for this application is pauses in speech. The words are always placed directly next to each other, even if there are long pauses between them. This means a lot of manual dragging around of the results. Long story short: is there an option to turn on speech pause detection?

Also, a second, smaller problem: when I try the Aligner with audio longer than 50 min, it fails with an Out of Memory error at the liveCMN stage after about 2 h (the Java VM has a 7 GB limit). Is there a way to change this?

Thanks for your help and your great work, which enables us to subtitle the CCC videos an order of magnitude faster.

nshmyrev · 2014-10-25T19:57:42Z

Hi Jonathan

Thanks for using CMUSphinx.

Could you please elaborate more on this problem with pauses? I'm not sure I get it.

Also, please share the problematic files where you have issues with the aligner.

Thank you.

jonathanglasmeyer · 2014-10-25T22:11:08Z

Hi,
Say the speaker makes a longer pause. Then this pause isn't represented in the timing information of the last word before the pause and the first word after it: they are aligned as though they were directly next to each other.

An example where it failed with the same error on two PCs is this audio (https://transfer.sh/fd21Z/datamining.wav) with this transcription.

The Aligner runs for ~45 min and then hangs at the same position in the logging output (it just stands still for more than 60 min):

.
.
.

INFO: Skipping text range due to a high density [and]
Oct 25, 2014 8:55:49 PM edu.cmu.sphinx.api.SpeechAligner align
INFO: Aligning frame 0:15580 to text [id, like, to, introduce, our, speaker, here, patrick, here, has, made, a, carrer, of, datamining, for, good, prosecuting, war, crimes, got, a, conviction, in, his, own, country, gouatemala, thank] range edu.cmu.sphinx.util.Range@61dfee2f
20:55:49.086 INFO dictionary           Loading dictionary from: jar:file:/home/jwerner/dev/prosub/modules/aligner/sphinx4-samples.jar!/edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d
20:55:49.175 INFO dictionary           Loading filler dictionary from: jar:file:/home/jwerner/dev/prosub/modules/aligner/sphinx4-samples.jar!/edu/cmu/sphinx/models/acoustic/wsj/noisedict
20:55:49.176 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'carrer'
20:55:49.176 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'datamining'
20:55:49.177 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'gouatemala'
20:55:49.597 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'carrer'
20:55:49.598 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'datamining'
20:55:49.598 INFO dictionary           The dictionary is missing a phonetic transcription for the word 'gouatemala'
20:55:49.608 INFO lexTreeLinguist      Max CI Units 50
20:55:49.608 INFO lexTreeLinguist      Unit table size 125000
20:55:49.640 INFO liveCMN              15.56 -0.79 -1.05 -0.39 -0.27 -0.12 -0.13 -0.16 -0.17 -0.15 -0.19 -0.16 -0.16 
20:55:49.684 INFO liveCMN              13.81 -0.78 -0.88 -0.39 -0.23 -0.10 -0.10 -0.12 -0.16 -0.15 -0.16 -0.15 -0.14 
20:55:49.823 INFO liveCMN              11.58 -0.66 -0.60 -0.30 -0.12 -0.03 -0.02 -0.07 -0.12 -0.12 -0.12 -0.11 -0.13 
20:55:50.114 INFO liveCMN              11.15 -0.74 -0.72 -0.33 -0.12 0.00 -0.01 -0.05 -0.11 -0.06 -0.10 -0.11 -0.10 
20:55:50.729 INFO liveCMN              12.25 -0.87 -0.85 -0.39 -0.17 -0.03 -0.06 -0.06 -0.11 -0.07 -0.10 -0.11 -0.12 
20:55:51.236 INFO liveCMN              13.46 -0.75 -0.88 -0.39 -0.15 -0.05 -0.07 -0.06 -0.11 -0.10 -0.12 -0.13 -0.13

So here it is probably not an Out of Memory problem, but some other kind of problem.

Could this be correlated to bad quality of the transcription?

mbait · 2014-10-25T22:54:23Z

Hi Jonathan,

> Then this pause isn't represented in the timing information of the last
> word before the pause and the first word after the pause -- they are
> aligned as though they would be directly next to each other.

It still isn't clear what your expected and actual outputs are.

Sincerely, Alexander

jonathanglasmeyer · 2014-10-26T07:21:40Z

Ok, let me rephrase it with an example:
Say you have two words A and B, with the following real start and stop times (in seconds):
A start=2, stop=2.2
B start=4, stop=4.2

So you have a speech pause between 2.2 and 4.
We would like to have this pause represented in the alignment.

But the actual alignment looks, for example, like this:
A start=2, stop=2.2
B start=2.2, stop=4.2
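
Until there is a built-in option, a post-processing pass might reduce the manual dragging. The following is only a rough sketch of that idea, not a Sphinx4 feature: the `Word` class, the `trimLongWords` method, and both thresholds are made up for illustration, and it assumes the pause is always absorbed at the start of the word that follows it, as in the example above. Times are in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

public class PauseTrimmer {
    /** One aligned word with start and end times in milliseconds (hypothetical type). */
    static final class Word {
        final String text;
        final long startMs;
        final long endMs;

        Word(String text, long startMs, long endMs) {
            this.text = text;
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    /**
     * Returns a copy of the alignment in which every word longer than
     * maxWordMs is shortened to assumedWordMs by moving its start time
     * forward. The end time is kept, so the freed span becomes a pause.
     */
    static List<Word> trimLongWords(List<Word> words, long maxWordMs, long assumedWordMs) {
        List<Word> out = new ArrayList<>();
        for (Word w : words) {
            long duration = w.endMs - w.startMs;
            if (duration > maxWordMs) {
                // Implausibly long word: assume leading silence was absorbed into it.
                out.add(new Word(w.text, w.endMs - assumedWordMs, w.endMs));
            } else {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Word> aligned = new ArrayList<>();
        aligned.add(new Word("A", 2000, 2200));
        aligned.add(new Word("B", 2200, 4200)); // the 1.8 s pause is hidden inside B
        for (Word w : trimLongWords(aligned, 1000, 200)) {
            System.out.println(w.text + " start=" + w.startMs + " stop=" + w.endMs);
        }
    }
}
```

With a 1000 ms limit and an assumed word length of 200 ms, B moves from start=2200 to start=4000, which matches the expected alignment above. A fixed assumed length is crude; a real fix would need the aligner itself to model the silence.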

nshmyrev · 2014-10-26T20:14:28Z

I can take a look.

By the way, for better alignment quality you should use the en-us generic acoustic model:

http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download
