I've successfully trained a model using this repository, and it works well for basic inference. However, at inference time there doesn't seem to be an option to adjust parameters like pitch and duration, which is something I can do in the software's UI.
Could you explain what technology or techniques the software uses to achieve this? Is there a specific part of the codebase that handles these adjustments, or does it rely on an external library or tool?
Hi, this repo basically follows the structure of FastSpeech2, so the paper is a good place to start.
According to the paper, duration and pitch are predicted at inference time by two Variance Predictor modules, which in this repo are defined here. Each predictor takes a sequence of hidden text representations and predicts duration/pitch values of the same length.
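Roughly, the variance adaptor in FastSpeech2-style models looks like the sketch below. This is a toy reconstruction, not this repo's code: the names (`VarianceControl`, `duration_predictor`, `pitch_predictor`, `length_regulate`, the bin/embedding setup) are all illustrative, and real variance predictors are small conv stacks rather than a single `Linear`. The `d_control` / `p_control` scalars mark where a UI slider would plug in.

```python
import torch
import torch.nn as nn

class VarianceControl(nn.Module):
    """Toy FastSpeech2-style variance adaptor; every name here is illustrative."""
    def __init__(self, hidden=256, n_pitch_bins=256):
        super().__init__()
        # Real variance predictors are small conv stacks; Linear keeps the sketch short.
        self.duration_predictor = nn.Linear(hidden, 1)
        self.pitch_predictor = nn.Linear(hidden, 1)
        # Continuous pitch is bucketized into bins and embedded back into hidden space.
        self.register_buffer("pitch_bins", torch.linspace(-3.0, 3.0, n_pitch_bins - 1))
        self.pitch_embedding = nn.Embedding(n_pitch_bins, hidden)

    def forward(self, x, d_control=1.0, p_control=1.0):
        # x: (batch, text_len, hidden) encoder output
        log_dur = self.duration_predictor(x).squeeze(-1)        # (B, T)
        # Predictors usually work in log space; convert, then scale for tempo control.
        dur = torch.clamp((torch.exp(log_dur) - 1.0) * d_control, min=0.0).round().long()

        pitch = self.pitch_predictor(x).squeeze(-1) * p_control # (B, T)
        x = x + self.pitch_embedding(torch.bucketize(pitch, self.pitch_bins))

        return length_regulate(x, dur)                          # frame-level output

def length_regulate(x, dur):
    """Repeat each token's hidden vector dur[i] times along the time axis."""
    expanded = [torch.repeat_interleave(seq, d, dim=0) for seq, d in zip(x, dur)]
    return nn.utils.rnn.pad_sequence(expanded, batch_first=True)
```

Raising `d_control` lengthens every phoneme (slower speech), while `p_control` scales the predicted pitch contour before it is embedded and added back to the hidden states.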
I don't know exactly how your trained checkpoint is organized, but you can split the model at the predictors, intercept the predicted duration and pitch values, adjust them, and feed the results back to the decoder; see the sketch below.
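If your checkpoint is a standard PyTorch module, one low-friction way to do that interception is a forward hook: the hook's return value replaces the submodule's output, so you can rescale pitch without touching the model source. The attribute path `model.variance_adaptor.pitch_predictor` and the `model.infer` entry point below are assumptions; inspect `model.named_modules()` to find the real names in your checkpoint.

```python
import torch

def scale_output(factor):
    """Forward hook that multiplies a predictor's output by `factor`."""
    def hook(module, inputs, output):
        return output * factor  # returning a value replaces the module's output
    return hook

# Assumed attribute path; list model.named_modules() to locate the real predictor.
handle = model.variance_adaptor.pitch_predictor.register_forward_hook(
    scale_output(1.2))          # raise the predicted pitch contour by 20%

with torch.no_grad():
    mel = model.infer(text_ids) # assumed inference entry point

handle.remove()                 # restore the original behavior afterwards
```

One caveat: many FastSpeech2 implementations predict log-duration, so a multiplicative hook on the duration predictor would scale in log space; for a linear tempo change, apply the factor to `exp(output)` inside the hook, or adjust the durations just before the length regulator instead.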