Add pitch range information in metas.json #60

Open · 9 commits to merge into base: main

Patchethium

As discussed here, I'd like to add the optimal pitch range $(\mu - 3\sigma, \mu + 3\sigma)$ to each character's metas.json. The engine can then read it and provide it to the GUI for downstream tasks, such as limiting the range of the tuning sliders for finer-grained control.

Note that for whisper styles I use $\mu = 0$ and $\sigma = 0$.
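For illustration, here is a minimal sketch of how such a range could be computed from synthesized pitch values. It is not the code in this PR: the function name `pitch_range`, the choice to skip unvoiced frames (values of 0), and the use of the population standard deviation are all assumptions, and the unit of the pitch values is left open.

```python
import statistics


def pitch_range(f0_values):
    """Return (mu - 3*sigma, mu + 3*sigma) over the voiced frames.

    Assumption: unvoiced frames are encoded as 0 and are ignored.
    If there are no voiced frames (e.g. a whisper style), mu and sigma
    are both taken to be 0, so the range collapses to (0.0, 0.0).
    """
    voiced = [f for f in f0_values if f > 0]
    if not voiced:
        mu, sigma = 0.0, 0.0
    else:
        mu = statistics.fmean(voiced)
        # Population standard deviation; a sample estimate would also work.
        sigma = statistics.pstdev(voiced)
    return (mu - 3 * sigma, mu + 3 * sigma)


# Hypothetical input: two voiced frames and one unvoiced frame.
low, high = pitch_range([4.0, 6.0, 0.0])
```

The resulting `low` and `high` values are what would be stored per style; the exact key names in metas.json are not shown here because they are defined by the PR's diff, not by this sketch.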

@Hiroshiba
Member

Thank you for your patience 🙇

There are a few points I'd like to discuss about your proposal. First, the idea of defining the pitch range as the mean ± 3σ of the synthesis results is excellent. Another option would be to set a low/high range based on the dataset. It is worth considering which method is better. After merging, we can experiment with different definitions of the range and adjust as we go.

I'm also a bit unsure about the data structure, since we want to store a range not only for pitch but also for volume. This affects the API as well, so I'd like to discuss it on the engine side.
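To make the data-structure question concrete, here is one possible shape, sketched under assumptions: a single range type reused for every tunable parameter. The names `ValueRange`, `low`, `high`, and `clamp`, the dictionary keys, and all numbers below are hypothetical placeholders, not a proposal for the actual engine API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ValueRange:
    """A generic closed range, usable for pitch, volume, or other parameters."""

    low: float
    high: float

    def clamp(self, value: float) -> float:
        """Limit a slider value to this range."""
        return min(max(value, self.low), self.high)


# Placeholder numbers only; real values would come from metas.json.
ranges = {
    "pitch": ValueRange(low=5.0, high=6.5),
    "volume": ValueRange(low=0.0, high=2.0),
}

# What a JSON-serializable form of the same data could look like.
serializable = {name: asdict(r) for name, r in ranges.items()}
```

Keeping the range type independent of the parameter name means a volume range can be added later without changing the structure used for pitch.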

