Adjusting Pitch and Duration in Inference Process. #21

Open
Skaiyin opened this issue Feb 9, 2024 · 1 comment
Comments

Skaiyin commented Feb 9, 2024

Hello,

I've successfully trained a model using this repository, and it works great for basic inference tasks. However, I noticed that during the inference process, there doesn't seem to be an option to adjust parameters like pitch and duration, which is something I can do within the software's UI.

Could you please explain what technology or techniques are being used to achieve this functionality in the software? Is there a specific part of the codebase that handles these adjustments, or is it utilizing an external library or tool?

@Patchethium

Hi, this repo basically follows the structure of FastSpeech2, so the paper is a good place to get some background.

According to the paper, duration and pitch are predicted at inference time by two Variance Predictor modules, which in this repo are defined here. Each predictor takes the sequence of hidden text representations and predicts a duration or pitch sequence of the same length.

I don't know exactly how your trained checkpoint is organized, but you can split the model at the predictors, intercept the predicted duration and pitch, adjust them, and pass them on to the decoder.
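A rough sketch of the "intercept, adjust, pass on" step, in plain Python so it runs without a checkpoint. Everything here is an assumption for illustration: the helper names (`scale_durations`, `shift_pitch`, `length_regulate`) are hypothetical and not from this repo, durations are assumed to be integer frame counts per phoneme, and pitch is assumed to be F0 in Hz with 0 meaning unvoiced. If your predictor outputs log-F0 or normalized pitch, the shift would need to be an addition or be applied after de-normalizing.

```python
def scale_durations(durations, speed=1.0):
    """Scale per-phoneme frame counts; speed > 1 is faster, speed < 1 is slower."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Keep at least one frame for every phoneme that originally had one.
    return [max(1, round(d / speed)) if d > 0 else 0 for d in durations]


def shift_pitch(pitch_hz, semitones=0.0):
    """Shift F0 values (assumed to be in Hz) by a number of semitones."""
    factor = 2.0 ** (semitones / 12.0)
    # Unvoiced entries (0 Hz) stay at 0 because 0 * factor == 0.
    return [p * factor for p in pitch_hz]


def length_regulate(hidden, durations):
    """Repeat each phoneme-level vector durations[i] times, FastSpeech 2 style."""
    frames = []
    for vec, d in zip(hidden, durations):
        frames.extend([vec] * d)
    return frames


# Toy stand-ins for what the encoder and the two predictors would return.
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # one vector per phoneme
pred_dur = [2, 4, 3]                            # frames per phoneme
pred_pitch = [220.0, 0.0, 110.0]                # F0 in Hz, 0 = unvoiced

# Adjust the intercepted values before they reach the decoder.
new_dur = scale_durations(pred_dur, speed=2.0)       # roughly twice as fast
new_pitch = shift_pitch(pred_pitch, semitones=12.0)  # one octave up
frames = length_regulate(hidden, new_dur)            # frame-level decoder input
```

In a real model the same adjustments would be applied to the tensors coming out of the predictors, and the expanded sequence (plus whatever pitch embedding the model uses) would then go to the decoder as usual.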
