Hardware requirements to train Kaldi models #72
Comments
@pguyot, can you share your split CPU/GPU scripts? How many CPU cores and how much memory are required per GPU? What other issues have you seen while training French models? I found the .ipa files to have missing entries, the quality in the transcripts is wrong, and the CNTRL sentence import hangs. What else can I expect, and how can I help? I can use 3 machines with 28 cores each; do you have a way to split the work over multiple PCs?
@OleksandrChekmez, I ran the small German model on 50 hours of audio with 28 cores and one 1080 Ti card in ~24 hours. Memory usage was ~16 GB.
@joazoa Sorry if my previous message was unclear about CPU/GPU requirements. I have been renting a VM with a GPU, and I found out that the GPU is required too early by script
It is not used until stage 1 of train.py, which is invoked in stage "11":
The rest is CPU- or I/O-bound (mostly CPU). Too many cores can be a waste of computing power, as Kaldi splits the data into jobs and some jobs can take significantly longer than others (eventually n-1 cores are waiting for a single core to finish). You can set the number of jobs as printed out from this line:
My script is just an adaptation of
I've been working on the French model, which we may discuss in another thread. My patches may require a careful review for reproducibility, and I am very glad you are trying!
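To illustrate the point above about cores waiting on the slowest job, here is a minimal sketch. It is not Kaldi's scheduler: the function `simulate`, the greedy assignment, and the job durations are all made up for illustration. It only shows how one long job caps the wall-clock time and drags down core utilization.

```python
import heapq


def simulate(job_minutes, cores):
    """Hypothetical model: each job is handed to whichever core frees up first.

    Returns (wall_clock_minutes, utilization), where utilization is the
    fraction of the total core-time that was spent doing actual work.
    """
    free_at = [0.0] * cores          # time at which each core becomes idle
    heapq.heapify(free_at)
    for duration in job_minutes:
        start = heapq.heappop(free_at)            # earliest available core
        heapq.heappush(free_at, start + duration)
    wall = max(free_at)                           # the last core to finish
    utilization = sum(job_minutes) / (wall * cores)
    return wall, utilization


if __name__ == "__main__":
    # Invented durations: three short jobs and one straggler.
    jobs = [10, 10, 10, 40]
    for cores in (2, 4):
        wall, util = simulate(jobs, cores)
        print(f"{cores} cores: {wall:.0f} min wall clock, {util:.0%} utilization")
```

With these invented numbers, going from 2 to 4 cores only shortens the run from 50 to 40 minutes, while utilization drops from 70% to about 44%, because three cores sit idle waiting for the 40-minute job.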
Indeed, the quality flag of transcripts is ignored, as the verbatim transcripts are not stored in tokenized form in the CSVs. This may or may not be a good idea, but does it prevent you from using the standard script? Considering parallelization on several boxes:
Hello, I noticed during a test run for German that the GPU was not used until epoch 1 of 10; I probably spent half a day debugging why my CUDA wasn't working until I let it run a bit longer :) I will try to document the use of multiple GPUs, and maybe Slurm usage, once I get to that stage with the French model. I will leave a comment for everything French-related in the other ticket.
Dear Guenter,
It would be very helpful to know the hardware requirements, to avoid running out of RAM, disk space or GPU memory and wasting time trying to train on a computer that is not powerful enough.
I understand that there may be no well-defined requirements, and that everything depends on the corpora used, the configs, etc.
But would you mind at least sharing your hardware spec, so we can understand what was enough to build the kaldi-generic-en-tdnn_f model, and how much time it took?
Thank you!