Duration / Speech Impl #4

bkodes · 2023-05-20T15:21:52Z

Reference: PR Comment

Ideas and approaches for incorporating duration / speech priors,
e.g. along the lines of the NaturalSpeech 2 implementation.

bkodes · 2023-05-20T15:36:56Z

@lucidrains just creating this discussion to jot ideas as we have them

seastar105 · 2023-05-29T10:33:05Z

InstructTTS used the variance adaptor from FastSpeech 2 as its pitch/duration predictor, together with its own style encoder. That model is also a non-autoregressive (NAR) model built on a neural audio codec; it generates discrete tokens with discrete diffusion.

Could just using a standard text encoder + pitch/duration predictor with a prompt work nicely? It seems much more doable than training a good text-to-semantic transformer.
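
To make the duration idea concrete, here is a rough sketch of the length-regulation step that a duration predictor feeds into: each token-level encoder state is repeated for its predicted number of frames. Everything here is illustrative. The function names `durations_from_log` and `length_regulate` are hypothetical, and the log(1 + d) output convention is an assumption about how the predictor is trained, not something taken from InstructTTS or this repo.

```python
import math
from typing import List, Sequence


def durations_from_log(log_durations: Sequence[float]) -> List[int]:
    """Turn predicted log-domain durations into integer frame counts.

    Assumes the predictor was trained to output log(1 + d); this
    convention is an assumption for the sketch.
    """
    return [max(0, round(math.exp(x) - 1.0)) for x in log_durations]


def length_regulate(
    token_states: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Expand token-level states to frame level by repeating each state.

    token_states[i] is repeated durations[i] times; a duration of 0
    drops that token from the frame-level sequence.
    """
    if len(token_states) != len(durations):
        raise ValueError("need exactly one duration per token state")
    frames: List[List[float]] = []
    for state, duration in zip(token_states, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the state so frames do not alias the same list object.
        frames.extend(list(state) for _ in range(duration))
    return frames


if __name__ == "__main__":
    # Three toy "phoneme" states with 2-dimensional features.
    states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    durs = durations_from_log([math.log(3), math.log(1), math.log(2)])
    frames = length_regulate(states, durs)
    print(durs, len(frames))
```

In a real model the states would be tensors and the repeat would be batched, and a pitch embedding would typically be added to the expanded frames, but the alignment logic is the same.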
