I've successfully trained a model using this repository, and it works well for basic inference. However, at inference time there doesn't seem to be an option to adjust parameters like pitch and duration, which is something I can do in the software's UI.
Could you explain what technology or techniques the software uses to achieve this? Is there a specific part of the codebase that handles these adjustments, or does it rely on an external library or tool?
Hi, this repo basically follows the structure of FastSpeech2, so the paper is a good place to start.
According to the paper, duration and pitch are predicted at inference time by two Variance Predictor modules, which in this repo are defined here. Each predictor takes a sequence of hidden text representations and predicts duration/pitch values of the same length.
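Roughly, the variance adaptor in FastSpeech2-style models looks like the sketch below. This is a toy reconstruction, not this repo's code: the names (`VarianceControl`, `duration_predictor`, `pitch_predictor`, `length_regulate`, the bin/embedding setup) are all illustrative, and real variance predictors are small conv stacks rather than a single `Linear`. The `d_control` / `p_control` scalars mark where a UI slider would plug in.

```python
import torch
import torch.nn as nn

class VarianceControl(nn.Module):
    """Toy FastSpeech2-style variance adaptor; every name here is illustrative."""
    def __init__(self, hidden=256, n_pitch_bins=256):
        super().__init__()
        # Real variance predictors are small conv stacks; Linear keeps the sketch short.
        self.duration_predictor = nn.Linear(hidden, 1)
        self.pitch_predictor = nn.Linear(hidden, 1)
        # Continuous pitch is bucketized into bins and embedded back into hidden space.
        self.register_buffer("pitch_bins", torch.linspace(-3.0, 3.0, n_pitch_bins - 1))
        self.pitch_embedding = nn.Embedding(n_pitch_bins, hidden)

    def forward(self, x, d_control=1.0, p_control=1.0):
        # x: (batch, text_len, hidden) encoder output
        log_dur = self.duration_predictor(x).squeeze(-1)        # (B, T)
        # Predictors usually work in log space; convert, then scale for tempo control.
        dur = torch.clamp((torch.exp(log_dur) - 1.0) * d_control, min=0.0).round().long()

        pitch = self.pitch_predictor(x).squeeze(-1) * p_control # (B, T)
        x = x + self.pitch_embedding(torch.bucketize(pitch, self.pitch_bins))

        return length_regulate(x, dur)                          # frame-level output

def length_regulate(x, dur):
    """Repeat each token's hidden vector dur[i] times along the time axis."""
    expanded = [torch.repeat_interleave(seq, d, dim=0) for seq, d in zip(x, dur)]
    return nn.utils.rnn.pad_sequence(expanded, batch_first=True)
```

Raising `d_control` lengthens every phoneme (slower speech), while `p_control` scales the predicted pitch contour before it is embedded and added back to the hidden states.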
I don't know exactly how your trained checkpoint is organized, but you can split the model at the predictors, intercept the predicted duration and pitch values, adjust them, and feed the results back to the decoder; see the sketch below.
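If your checkpoint is a standard PyTorch module, one low-friction way to do that interception is a forward hook: the hook's return value replaces the submodule's output, so you can rescale pitch without touching the model source. The attribute path `model.variance_adaptor.pitch_predictor` and the `model.infer` entry point below are assumptions; inspect `model.named_modules()` to find the real names in your checkpoint.

```python
import torch

def scale_output(factor):
    """Forward hook that multiplies a predictor's output by `factor`."""
    def hook(module, inputs, output):
        return output * factor  # returning a value replaces the module's output
    return hook

# Assumed attribute path; list model.named_modules() to locate the real predictor.
handle = model.variance_adaptor.pitch_predictor.register_forward_hook(
    scale_output(1.2))          # raise the predicted pitch contour by 20%

with torch.no_grad():
    mel = model.infer(text_ids) # assumed inference entry point

handle.remove()                 # restore the original behavior afterwards
```

One caveat: many FastSpeech2 implementations predict log-duration, so a multiplicative hook on the duration predictor would scale in log space; for a linear tempo change, apply the factor to `exp(output)` inside the hook, or adjust the durations just before the length regulator instead.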