diff --git a/CONTRIBUTING.en.md b/CONTRIBUTING.en.md new file mode 100644 index 000000000..188b43c41 --- /dev/null +++ b/CONTRIBUTING.en.md @@ -0,0 +1,418 @@ +# Contributor's Guide + +[Japanese](./CONTRIBUTING.md) + +VOICEVOX ENGINE is an open-source project. This project is actively developed, and its results are reflected in the production version of VOICEVOX. VOICEVOX ENGINE welcomes contributions from the community. +This guide provides information to assist contributors, including development policies, pull request procedures, and review processes. + +## Table of Contents + +You can check the guidelines for VOICEVOX ENGINE policies from the following: + +- [Development Governance](#development-governance) +- [Versioning](#versioning) +- [Branch Strategy](#branch-strategy) +- [Pull Requests](#pull-requests) +- [Reviews](#reviews) +- [Bugs](#bugs) +- [Feature Improvements](#feature-improvements) +- [Static Analysis](#static-analysis) +- [Tests](#testing) +- [License](#license) + +You can check the guidelines according to the contributor's purpose from the following: + +- [Submitting a Pull Request](#submitting-a-pull-request) +- Bugs + - [Finding Bugs](#finding-bugs) + - [Reporting Bugs](#reporting-bugs) + - [Fixing Bugs](#fixing-bugs) +- Feature Improvements + - [Finding Feature Improvement Tasks](#finding-feature-improvement-tasks) + - [Requesting Features](#requesting-features) + - [Implementing Feature Improvements](#implementing-feature-improvements) +- [Setting Up the Environment](#setting-up-the-environment) +- [Running the Code](#running-the-code) + +You can check frequently used commands for development from the following: + +- [Installing Dependencies](#installing-dependencies) +- [Running Without Voice Library](#running-without-voice-library) +- Packages + - [Adding Packages](#adding-packages) + - [Updating Packages](#updating-packages) + - [Reflecting Package Information to pip requirements.txt 
File](#reflecting-package-information-to-pip-requirementstxt-file) +- Static Analysis + - [Checking for Typos](#checking-for-typos) + - [Running Static Analysis in Batch](#running-static-analysis-in-batch) +- Tests + - [Testing Code](#testing-code) + - [Updating Snapshots](#updating-snapshots) + - [Diagnosing Vulnerabilities](#diagnosing-vulnerabilities) + +## Development Governance + +VOICEVOX ENGINE conducts open development based on GitHub. +We accept feature requests, bug reports, and questions from the community through GitHub Issues. We also welcome pull requests. When creating a pull request to resolve an issue, we recommend either informing on the issue side that you've started working on it, or initially creating a Draft pull request to avoid working on the same issue as someone else. + +To facilitate more casual development, we have discussions and chats on the [VOICEVOX Unofficial Discord Server](https://discord.gg/WMwWetrzuh). Feel free to join us. + +## Versioning + +We adopt semantic versioning. +At this stage, the major version is 0, and we allow minor updates that include breaking changes. We update the minor version for major feature additions and changes, and the patch version for bug fixes and character additions. + +You can check the summary of changes for each version in the [Releases](https://github.com/VOICEVOX/voicevox_engine/releases). + +## Branch Strategy + +We adopt GitHub Flow with release branches as our branch strategy. +Pull requests are basically merged into the `master` branch. As an exception, at the time of updating the production version of VOICEVOX, a release branch `release-X.Y` is prepared, temporarily branching from `master`. Commits necessary for the release are made to `release-X.Y`, and releases are made from this branch. Hotfixes immediately after release are first merged into `release-X.Y`, and after the release, the entire branch is merged into `master`. 
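As an illustration of the naming scheme above, the sketch below derives a `release-X.Y` branch name from a version string. It assumes that `X.Y` corresponds to the major and minor version numbers, which the text implies but does not state. `release_branch_name` is a hypothetical helper, not a script in this repository.

```bash
# Hypothetical helper: map a semantic version to its release branch name.
release_branch_name() {
  local version="$1"
  # Drop the last dot-separated field (the patch number): 0.15.0 -> 0.15
  echo "release-${version%.*}"
}

# Example (maintainers): cut the release branch from master.
# git switch -c "$(release_branch_name 0.15.0)" master
```

Contributors normally do not need this, since ordinary pull requests target `master`.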
+ +## Pull Requests + +All code changes are made through pull requests. +Pull requests are managed collectively on [GitHub Pull requests](https://github.com/VOICEVOX/voicevox_engine/pulls) and merged after [review](#reviews). VOICEVOX ENGINE welcomes pull requests from the community. + +### Submitting a Pull Request + +You can create a pull request by following these steps: + +- Set up the [development environment](#setting-up-the-environment) +- Fork this repository and create a branch for your pull request from the `master` branch +- [Install the dependencies](#installing-dependencies) +- (Optional) [Install the voice library](#installing-the-voice-library) +- [Edit the code](#editing-code) +- [Run static analysis in batch](#running-static-analysis-in-batch) ([Type checking](#type-checking), [Linting](#linting), [Formatting](#formatting)) +- [Test the code](#testing-code) +- Push the branch to remote and create a pull request to this repository + +## Reviews + +All pull requests are merged after review. +Reviews are conducted openly on [GitHub Pull requests](https://github.com/VOICEVOX/voicevox_engine/pulls), and anyone in the community can participate in the form of comments. After review, it will be merged into the `master` (or `release-X.Y`) branch. Merging requires approval from the VOICEVOX team. + +## Bugs + +We use GitHub Issues to centrally manage bugs. + +### Finding Bugs + +You can access the list of known bugs by [filtering with the `bug` label](https://github.com/VOICEVOX/voicevox_engine/issues?q=is%3Aissue+is%3Aopen+label%3Abug). The status of bug fixes can be checked in each bug's issue. + +### Reporting Bugs + +If you find a bug that is not in the list of known bugs (new bug), you can report it on GitHub Issues. VOICEVOX ENGINE welcomes reports of new bugs. + +### Fixing Bugs + +Bug fixes are discussed on the Issue and fixed using pull requests. 
The procedure for creating a pull request is guided in "[Submitting a Pull Request](#submitting-a-pull-request)". VOICEVOX ENGINE welcomes pull requests that fix bugs. + +## Feature Improvements + +We use GitHub Issues to centrally manage feature improvements. + +### Finding Feature Improvement Tasks + +You can access the list of new feature additions and specification changes by [filtering with the `feature improvement` label](https://github.com/VOICEVOX/voicevox_engine/issues?q=is%3Aissue+is%3Aopen+label%3A機能向上). The implementation status of feature improvements can be checked in each feature improvement's issue. + +### Requesting Features + +If you have a feature improvement proposal that is not in the existing proposal list, you can propose it on GitHub Issues. VOICEVOX ENGINE welcomes feature improvement proposals. + +### Implementing Feature Improvements + +Feature improvements are discussed on the Issue and implemented using pull requests. The procedure for creating a pull request is guided in "[Submitting a Pull Request](#submitting-a-pull-request)". VOICEVOX ENGINE welcomes pull requests that implement feature improvements. + +## Setting Up the Environment + +It is developed using `Python 3.11.9`. +To install, you will need C/C++ compilers and CMake for each OS. + +### Installing Dependencies + +You can install the dependencies by running the following commands in the shell: + +```bash +# Install execution, development, and test environments +python -m pip install -r requirements.txt -r requirements-dev.txt -r requirements-build.txt + +# Install git hook +pre-commit install -t pre-push +``` + +## Voice Library + +The OSS version of VOICEVOX ENGINE does not include the voice library of the product version of VOICEVOX, so voice synthesis is a mock version. + +The voice library of the product version of VOICEVOX can be installed by following the terms of use and using one of the following procedures. 
This allows you to synthesize product version character voices such as "Zundamon". + +### Installing the Voice Library + +The voice library can be installed using one of the following procedures: + +#### Installing the Voice Library Using the Product Version of VOICEVOX + +You can use the voice library by installing the product version of VOICEVOX. +Please follow the [VOICEVOX official website](https://voicevox.hiroshiba.jp/) to install the software. + +#### Installing the Voice Library Using the Product Version of VOICEVOX CORE + +You can use the voice library by installing the product version of VOICEVOX CORE. +The necessary files will be prepared by the following commands: + +```bash +# Define variables for CORE variation (e.g., VOICEVOX CORE v0.15.0 CPU version for x64 Linux machines) +VERSION="0.15.0"; OS="linux"; ARCHITECTURE="x64"; PROCESSOR="cpu"; + +# Download and extract CORE +CORENAME="voicevox_core-${OS}-${ARCHITECTURE}-${PROCESSOR}-${VERSION}" +curl -L "https://github.com/VOICEVOX/voicevox_core/releases/download/${VERSION}/${CORENAME}.zip" -o "${CORENAME}.zip" +unzip "${CORENAME}.zip" +``` + +The CORE variation variables can be specified with the following values: + +- `VERSION`: voicevox_core version (e.g., `0.15.0`) +- `OS`: OS type (`windows` | `osx` | `linux`) +- `ARCHITECTURE`: CPU architecture (`x86` | `x64` | `arm64`) +- `PROCESSOR`: Processor type (`cpu` | `gpu` | `cuda` | `directml`) + +The latest release can be found [here](https://github.com/VOICEVOX/voicevox_core/releases/latest). + +## Running the Code + +Running VOICEVOX ENGINE will start an HTTP server. 
+Check the details of command-line arguments with the following command: + +```bash +python run.py --help +``` + +### Running Without Voice Library + +If you haven't installed the voice library or want to use lightweight mock voice synthesis, you can run the engine by executing the following command in the shell: + +```bash +python run.py --enable_mock +``` + +### Running Using the Product Version of VOICEVOX as Voice Library + +```bash +VOICEVOX_DIR="C:/path/to/VOICEVOX/vv-engine" # Path to ENGINE in the product version VOICEVOX directory +python run.py --voicevox_dir=$VOICEVOX_DIR +``` + +### Running Using the Product Version of VOICEVOX CORE as Voice Library + +```bash +VOICELIB_DIR_1="C:/path/to/core_1"; VOICELIB_DIR_2="C:/path/to/core_2"; # Path to the product version VOICEVOX CORE directory +python run.py --voicelib_dir=$VOICELIB_DIR_1 --voicelib_dir=$VOICELIB_DIR_2 +``` + +### Changing Log to UTF8 + +```bash +python run.py --output_log_utf8 +# or +VV_OUTPUT_LOG_UTF8=1 python run.py +``` + +## Editing Code + +### Packages + +We manage packages using `poetry`. We also generate `requirements-*.txt` files for `pip` users. +Dependency packages must have licenses that "do not conflict with the voice library's license even when integrated with the voice library through building". 
+The acceptability of major licenses is as follows: + +- MIT/Apache/BSD-3: OK +- LGPL: OK (because it's dynamically separated from the core) +- GPL: Not OK (because it requires disclosure of all related code) + +#### Adding Packages + +```bash +poetry add package_name # Replace package_name with the name of the package to add +poetry add --group dev package_name # Adding development dependencies +poetry add --group build package_name # Adding build dependencies +``` + +#### Updating Packages + +```bash +poetry update package_name # Replace package_name with the name of the package to update +poetry update # Update all +``` + +#### Reflecting Package Information to pip requirements.txt File + +```bash +poetry export --without-hashes -o requirements.txt # If you update this, you need to update the two below as well. +poetry export --without-hashes --with dev -o requirements-dev.txt +poetry export --without-hashes --with build -o requirements-build.txt +``` + +## Static Analysis + +### Type Checking + +We employ type checking. +The goal is to improve safety, and we use `mypy` as the checker. + +For running type checks, refer to the "[Running Static Analysis in Batch](#running-static-analysis-in-batch)" section. + +### Linting + +We employ automatic linting. +The goal is to improve safety, and we use `flake8` and `isort` as linters. + +For running linters, refer to the "[Running Static Analysis in Batch](#running-static-analysis-in-batch)" section. + +### Formatting + +We employ automatic code formatting. +The goal is to improve readability, and we use `black` as the formatter. + +For running the formatter, refer to the "[Running Static Analysis in Batch](#running-static-analysis-in-batch)" section. + +Note that we currently do not employ automatic document formatting. Maintainers periodically format using `prettier`. + +### Typo Checking + +We employ typo checking. +The goal is to improve readability, and we use [`typos`](https://github.com/crate-ci/typos) as the checker. 
If there are false positives or files that should be excluded from checking, please edit `pyproject.toml` according to the [configuration file explanation](https://github.com/crate-ci/typos#false-positives). +For local installation of `typos`, please refer to the official documentation according to your environment. If local installation is difficult, please refer to the results of `typos` automatically executed by GitHub Actions during pull requests. + +#### Checking for Typos + +Execute the following command in the shell to check for typos: + +```bash +typos +``` + +### Running Static Analysis in Batch + +Execute the following command in the shell to run static analysis ([type checking](#type-checking), [linting](#linting), [formatting](#formatting)) in batch. +Automatic corrections will be made where possible. + +```bash +pysen run format lint +``` + +## Testing + +We employ automated testing. +To aim for long-term stable development, we have enriched both unit tests and End-to-End tests, and we also adopt snapshot tests to guarantee the invariance of values. We use `pytest` as the test runner. + +### Testing Code + +Execute the following command in the shell to run tests: + +```bash +python -m pytest +``` + +### Updating Snapshots + +When code changes alter expected output values, it may be necessary to update snapshots. +Execute the following command in the shell to update snapshots: + +```bash +python -m pytest --snapshot-update +``` + +### Diagnosing Vulnerabilities + +We ensure the safety of dependency packages through vulnerability diagnosis using `safety`. +Execute the following command in the shell to diagnose vulnerabilities: + +```bash +safety check -r requirements.txt -r requirements-dev.txt -r requirements-build.txt +``` + +## Building + +The build created by this method differs from what is publicly released. Also, for GPU usage, additional libraries such as cuDNN, CUDA, or DirectML are required. 
+ +```bash +OUTPUT_LICENSE_JSON_PATH=licenses.json \ +bash tools/create_venv_and_generate_licenses.bash + +# For mock build +pyinstaller --noconfirm run.spec + +# For product version build +CORE_MODEL_DIR_PATH="/path/to/core_model" \ +LIBCORE_PATH="/path/to/libcore" \ +LIBONNXRUNTIME_PATH="/path/to/libonnxruntime" \ +pyinstaller --noconfirm run.spec +``` + +TODO: Describe Docker version build procedure based on GitHub Actions + +### Building with GitHub Actions + +You can build by turning on Actions in your forked repository and triggering `build-engine-package.yml` with workflow_dispatch. +The artifacts will be uploaded to Releases. + +### Checking the API Documentation + +The [API Documentation](https://voicevox.github.io/voicevox_engine/api/) (actual file is `docs/api/index.html`) is automatically updated. +You can manually create the API documentation with the following command: + +```bash +PYTHONPATH=. python tools/make_docs.py +``` + +## GitHub Actions + +### Variables + +| name | description | +| :----------------- | :-------------------- | +| DOCKERHUB_USERNAME | Docker Hub username | + +### Secrets + +| name | description | +| :-------------- | :-------------------------------------------------------------------- | +| DOCKERHUB_TOKEN | [Docker Hub access token](https://hub.docker.com/settings/security) | + +## Issue + +Please report bugs, feature requests, improvement suggestions, and questions in the Issue section. + +### Issue Status + +VOICEVOX ENGINE organizes issue status transitions as follows: +Each status corresponds to a GitHub `status: XX` label (e.g., [`status: seeking implementer`](https://github.com/VOICEVOX/voicevox_engine/labels/状態:実装者募集)). 
+ +```mermaid +--- +title: Issue Status Transition Diagram v1.0 +--- +stateDiagram-v2 + [*] --> NecessityDiscussion : issue open + state opened { + NecessityDiscussion --> Design + Design --> SeekingImplementer + SeekingImplementer --> Implementation : Start declaration + } + opened --> not_planned : NoGo decision + not_planned --> [*] : issue close + Implementation --> resolved : Pull request merge + resolved --> [*] : issue close + opened --> Roadmap : Stagnation + Roadmap --> opened +``` + +NOTE: The decision to roadmap should be made when an issue has stagnated for 30 days in `NecessityDiscussion`, or 180 days in `Design`, `SeekingImplementer`, or `Implementation`. Support should also be considered during `Implementation` stagnation. + +## License + +This is a dual license of LGPL v3 and another license that does not require source code disclosure. +If you want to obtain the other license, please contact Hiho. +X account: [@hiho_karuta](https://x.com/hiho_karuta) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 3e67d3480..e85eea3dd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,5 +1,7 @@ # 貢献者ガイド +[English](./CONTRIBUTING.en.md) + VOICEVOX ENGINE はオープンソースプロジェクトです。本プロジェクトは活発に開発されており、その成果は製品版 VOICEVOX へも反映されています。VOICEVOX ENGINE はコミュニティの皆さんからのコントリビューションを歓迎しています。 本ガイドは開発方針・プルリクエスト手順・レビュープロセスなど、コントリビュータの皆さんの一助となる情報を提供します。 diff --git a/README.en.md b/README.en.md new file mode 100644 index 000000000..bdf195353 --- /dev/null +++ b/README.en.md @@ -0,0 +1,653 @@ +# VOICEVOX ENGINE + +[![build](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-package.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-package.yml) +[![releases](https://img.shields.io/github/v/release/VOICEVOX/voicevox_engine)](https://github.com/VOICEVOX/voicevox_engine/releases) +[![discord](https://img.shields.io/discord/879570910208733277?color=5865f2&label=&logo=discord&logoColor=ffffff)](https://discord.gg/WMwWetrzuh) + 
+[![test](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/test.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/test.yml) +[![Coverage Status](https://coveralls.io/repos/github/VOICEVOX/voicevox_engine/badge.svg)](https://coveralls.io/github/VOICEVOX/voicevox_engine) + +[![build-docker](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-container.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-container.yml) +[![docker](https://img.shields.io/docker/pulls/voicevox/voicevox_engine)](https://hub.docker.com/r/voicevox/voicevox_engine) + +[Japanese](./README.md) + +This is the engine for [VOICEVOX](https://voicevox.hiroshiba.jp/). +It's essentially an HTTP server, so you can perform text-to-speech synthesis by sending requests. + +(The editor is [VOICEVOX](https://github.com/VOICEVOX/voicevox/), +the core is [VOICEVOX CORE](https://github.com/VOICEVOX/voicevox_core/), +and the overall structure is detailed [here](https://github.com/VOICEVOX/voicevox/blob/main/docs/%E5%85%A8%E4%BD%93%E6%A7%8B%E6%88%90.md).) + +## Table of Contents + +Here are guides tailored to your specific needs: + +- [User Guide](#user-guide): For those who want to perform text-to-speech synthesis +- [Contributor's Guide](#contributors-guide): For those who want to contribute to the project +- [Developer Guide](#developer-guide): For those who want to utilize the code + +## User Guide + +### Download + +Please download the corresponding engine from [here](https://github.com/VOICEVOX/voicevox_engine/releases/latest). + +### API Documentation + +Please refer to the [API Documentation](https://voicevox.github.io/voicevox_engine/api/). + +You can also access the documentation for the running engine by visiting http://127.0.0.1:50021/docs while the VOICEVOX engine or editor is running. 
+For future plans and other information, you may find [Collaboration with VOICEVOX Text-to-Speech Engine](./docs/Integration_with_VOICEVOX_Speech_Synthesis_Engine.en.md) helpful. + +### Docker Image + +#### CPU + +```bash +docker pull voicevox/voicevox_engine:cpu-ubuntu20.04-latest +docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-ubuntu20.04-latest +``` + +#### GPU + +```bash +docker pull voicevox/voicevox_engine:nvidia-ubuntu20.04-latest +docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-ubuntu20.04-latest +``` + +##### Troubleshooting + +When using the GPU version, errors may occur depending on the environment. In such cases, adding `--runtime=nvidia` to the `docker run` command may resolve the issue. + +### Sample Code for Text-to-Speech Synthesis via HTTP Request + +```bash +echo -n "Hello, welcome to the world of speech synthesis" >text.txt + +curl -s \ + -X POST \ + "127.0.0.1:50021/audio_query?speaker=1"\ + --get --data-urlencode text@text.txt \ + > query.json + +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/synthesis?speaker=1" \ + > audio.wav +``` + +The generated audio has a somewhat unusual sampling rate of 24000Hz, which may not be playable on some audio players. + +The value specified for `speaker` is the `style_id` obtained from the `/speakers` endpoint. It's named `speaker` for compatibility reasons. + +### Sample Code for Adjusting Speech + +You can adjust the speech by editing the parameters of the query for speech synthesis obtained from `/audio_query`. + +For example, let's try to increase the speech speed by 1.5 times. 
+ +```bash +echo -n "Hello, welcome to the world of speech synthesis" >text.txt + +curl -s \ + -X POST \ + "127.0.0.1:50021/audio_query?speaker=1" \ + --get --data-urlencode text@text.txt \ + > query.json + +# Use sed to change the value of speedScale to 1.5 +sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json + +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/synthesis?speaker=1" \ + > audio_fast.wav +``` + +### Retrieving and Modifying Pronunciation with AquesTalk-like Notation + +#### AquesTalk-like Notation + + + +"**AquesTalk-like Notation**" is a notation that specifies pronunciation using only katakana and symbols. It differs slightly from [the original AquesTalk notation](https://www.a-quest.com/archive/manual/siyo_onseikigou.pdf). +AquesTalk-like Notation follows these rules: + +- All kana are written in katakana +- Accent phrases are separated by `/` or `、`. A silent interval is inserted only when separated by `、`. +- Placing `_` before a kana makes that kana unvoiced +- Accent position is specified with `'`. Each accent phrase must have one accent position specified. +- Adding `?` (full-width) at the end of an accent phrase allows for interrogative pronunciation + +#### Sample Code for AquesTalk-like Notation + +The response from `/audio_query` includes the pronunciation determined by the engine, described in [AquesTalk-like Notation](#aquestalk-like-notation). +By modifying this, you can control the reading and accent of the speech. + +```bash +# Write the text you want to be read in utf-8 to text.txt +echo -n "Deep learning is not a panacea" >text.txt + +curl -s \ + -X POST \ + "127.0.0.1:50021/audio_query?speaker=1" \ + --get --data-urlencode text@text.txt \ + > query.json + +cat query.json | grep -o -E "\"kana\":\".*\"" +# Result... 
"kana":"ディ'イプ/ラ'アニングワ/パンノオヤクデワアリマセ'ン" + +# We want it to be read as "ディイプラ'アニングワ/パンノ'オヤクデワ/アリマセ'ン", so +# Get the intonation with is_kana=true and save it to newphrases.json +echo -n "ディイプラ'アニングワ/パンノ'オヤクデワ/アリマセ'ン" > kana.txt +curl -s \ + -X POST \ + "127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \ + --get --data-urlencode text@kana.txt \ + > newphrases.json + +# Replace the content of "accent_phrases" in query.json with the content of newphrases.json +cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json + +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @newquery.json \ + "127.0.0.1:50021/synthesis?speaker=1" \ + > audio.wav +``` + +### User Dictionary Feature + +You can reference, add, edit, and delete words in the user dictionary via the API. + +#### Reference + +You can get a list of the user dictionary by sending a GET request to `/user_dict`. + +```bash +curl -s -X GET "127.0.0.1:50021/user_dict" +``` + +#### Adding Words + +You can add words to the user dictionary by sending a POST request to `/user_dict_word`. +The following URL parameters are required: + +- surface (the word to be registered in the dictionary) +- pronunciation (katakana reading) +- accent_type (accent nucleus position, integer) + +For the accent nucleus position, this text might be helpful. +The number part that is marked with ○ is the accent nucleus position. +https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html + +If successful, the return value will be a string of the UUID assigned to the word. + +```bash +surface="test" +pronunciation="テスト" +accent_type="1" + +curl -s -X POST "127.0.0.1:50021/user_dict_word" \ + --get \ + --data-urlencode "surface=$surface" \ + --data-urlencode "pronunciation=$pronunciation" \ + --data-urlencode "accent_type=$accent_type" +``` + +#### Editing Words + +You can edit words in the user dictionary by sending a PUT request to `/user_dict_word/{word_uuid}`. 
+The following URL parameters are required: + +- surface (the word to be registered in the dictionary) +- pronunciation (katakana reading) +- accent_type (accent nucleus position, integer) + +The word_uuid can be confirmed when adding a word or by referencing the user dictionary. +If successful, the return value will be `204 No Content`. + +```bash +surface="test2" +pronunciation="テストツー" +accent_type="2" +# Please replace word_uuid according to your environment +word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d" + +curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \ + --get \ + --data-urlencode "surface=$surface" \ + --data-urlencode "pronunciation=$pronunciation" \ + --data-urlencode "accent_type=$accent_type" +``` + +#### Deleting Words + +You can delete words from the user dictionary by sending a DELETE request to `/user_dict_word/{word_uuid}`. + +The word_uuid can be confirmed when adding a word or by referencing the user dictionary. +If successful, the return value will be `204 No Content`. + +```bash +# Please replace word_uuid according to your environment +word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d" + +curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid" +``` + +#### Importing & Exporting Dictionary + +You can import and export the user dictionary in the "User Dictionary Export & Import" section of the engine's [settings page](http://127.0.0.1:50021/setting). + +You can also import and export the user dictionary via API. +Use `POST /import_user_dict` for importing and `GET /user_dict` for exporting. +For details on arguments, etc., please refer to the API documentation. + +### About Preset Feature + +You can use presets for characters, speech speed, etc. by editing `presets.yaml` in the user directory. 
+ +```bash +echo -n "By effectively utilizing presets, third parties can use the same settings" >text.txt + +# Get preset information +curl -s -X GET "127.0.0.1:50021/presets" > presets.json + +preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g') +style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g') + +# Get query for voice synthesis +curl -s \ + -X POST \ + "127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id"\ + --get --data-urlencode text@text.txt \ + > query.json + +# Voice synthesis +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/synthesis?speaker=$style_id" \ + > audio.wav +``` + +- `speaker_uuid` can be confirmed with `/speakers` +- `id` must not be duplicated +- Changes to the file will be reflected in the engine after the engine is started + +### Sample Code for Morphing with 2 Types of Styles + +`/synthesis_morphing` generates morphed audio based on voices synthesized in two different styles. + +```bash +echo -n "By using morphing, you can mix two types of voices." 
> text.txt + +curl -s \ + -X POST \ + "127.0.0.1:50021/audio_query?speaker=8"\ + --get --data-urlencode text@text.txt \ + > query.json + +# Synthesis result in the original style +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/synthesis?speaker=8" \ + > audio.wav + +export MORPH_RATE=0.5 + +# Note that it takes time because it involves voice synthesis for two styles + voice analysis by WORLD +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \ + > audio.wav + +export MORPH_RATE=0.9 + +# If query, base_speaker, and target_speaker are the same, cache is used, so it's generated relatively quickly +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \ + > audio.wav +``` + +### Sample Code for Retrieving Additional Character Information + +This code retrieves portrait.png from the additional information. +(Using [jq](https://stedolan.github.io/jq/) to parse JSON.) + +```bash +curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \ + | jq -r ".portrait" \ + | base64 -d \ + > portrait.png +``` + +### Cancellable Voice Synthesis + +`/cancellable_synthesis` immediately releases computational resources when the connection is cut. +(With `/synthesis`, voice synthesis calculation continues to the end even if the connection is cut.) +This API is an experimental feature and is not enabled unless the `--enable_cancellable_synthesis` argument is specified when starting the engine. +The parameters required for voice synthesis are the same as for `/synthesis`. 
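A rough sketch of how a client might exercise this endpoint follows. The request has the same shape as the `/synthesis` calls above, and curl's `--max-time` option is used to drop the connection early, which is the situation in which the engine is described as releasing its resources. The helper names `cancellable_synthesis_url` and `synthesize_with_timeout` are hypothetical, and the sketch assumes the engine was started with `--enable_cancellable_synthesis` on the default address.

```bash
# Hypothetical helpers for illustration (not part of VOICEVOX ENGINE).
ENGINE_ADDRESS="127.0.0.1:50021"

# Build the request URL for a given style_id.
cancellable_synthesis_url() {
  local style_id="$1"
  echo "${ENGINE_ADDRESS}/cancellable_synthesis?speaker=${style_id}"
}

# POST a query file and give up after timeout_sec seconds.
# When the limit is hit, curl closes the connection and exits with code 28.
synthesize_with_timeout() {
  local style_id="$1" query_file="$2" output_file="$3" timeout_sec="$4"
  curl -s \
    --max-time "$timeout_sec" \
    -H "Content-Type: application/json" \
    -X POST \
    -d @"$query_file" \
    "$(cancellable_synthesis_url "$style_id")" \
    > "$output_file"
}

# Example (requires a running engine and a query.json from /audio_query):
# synthesize_with_timeout 1 query.json audio.wav 3 || echo "request was cut off"
```

The 3-second limit in the example is arbitrary. A real client would more likely cancel in response to a user action than a fixed timeout.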
+ +### Sample Code for Song Synthesis via HTTP Request + +```bash +echo -n '{ + "notes": [ + { "key": null, "frame_length": 15, "lyric": "" }, + { "key": 60, "frame_length": 45, "lyric": "ド" }, + { "key": 62, "frame_length": 45, "lyric": "レ" }, + { "key": 64, "frame_length": 45, "lyric": "ミ" }, + { "key": null, "frame_length": 15, "lyric": "" } + ] +}' > score.json + +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @score.json \ + "127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \ + > query.json + +curl -s \ + -H "Content-Type: application/json" \ + -X POST \ + -d @query.json \ + "127.0.0.1:50021/frame_synthesis?speaker=3001" \ + > audio.wav +``` + +The `key` in the score is the MIDI number. +`lyric` is the lyrics, and any string can be specified, but some engines may return an error for strings other than one mora in hiragana or katakana. +The default frame rate is 93.75Hz, which can be obtained from `frame_rate` in the engine manifest. +The first note must be silent. + +The `speaker` that can be specified in `/sing_frame_audio_query` is the `style_id` of styles with type `sing` or `singing_teacher` that can be obtained from `/singers`. +The `speaker` that can be specified in `/frame_synthesis` is the `style_id` of styles with type `frame_decode` that can be obtained from `/singers`. +The argument is named `speaker` for consistency with other APIs. + +It's also possible to specify different styles for `/sing_frame_audio_query` and `/frame_synthesis`. + +### CORS Settings + +For security protection, VOICEVOX is set to only accept requests from `localhost`, `127.0.0.1`, `app://`, or no Origin. +Therefore, responses may not be received from some third-party applications. +As a workaround, we provide a UI that can be configured from the engine. + +#### Configuration Method + +1. Access the engine's [settings page](http://127.0.0.1:50021/setting). +2. Change or add settings according to the application you're using. +3. Press the save button to confirm the changes. +4. 
Restarting the engine is necessary to apply the settings. Please restart as needed. + +### Disabling APIs that Modify Data + +By specifying the runtime argument `--disable_mutable_api` or setting the environment variable `VV_DISABLE_MUTABLE_API=1`, you can disable APIs that modify engine settings, dictionaries, etc. + +### Character Encoding + +The character encoding for all requests and responses is UTF-8. + +### Other Arguments + +Arguments can be specified when starting the engine. For more details, check the help with the `-h` argument. + +```bash +$ python run.py -h + +usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock] [--enable_cancellable_synthesis] + [--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}] + [--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api] + +This is the engine for VOICEVOX. + +options: + -h, --help show this help message and exit + --host HOST The host address to accept connections. + --port PORT The port number to accept connections. + --use_gpu Enables voice synthesis using GPU. + --voicevox_dir VOICEVOX_DIR + The directory path for VOICEVOX. + --voicelib_dir VOICELIB_DIR + The directory path for VOICEVOX CORE. + --runtime_dir RUNTIME_DIR + The directory path for libraries used by VOICEVOX CORE. + --enable_mock Performs voice synthesis with a mock without using VOICEVOX CORE. + --enable_cancellable_synthesis + Enables cancellation of voice synthesis midway. + --init_processes INIT_PROCESSES + The number of processes to generate during initialization of the cancellable_synthesis feature. + --load_all_models Loads all voice synthesis models at startup. + --cpu_num_threads CPU_NUM_THREADS + The number of threads for voice synthesis. 
If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is not empty and not a number, it will exit with an error. + --output_log_utf8 Outputs logs in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If the value of VV_OUTPUT_LOG_UTF8 is 1, it's UTF-8; if 0 or empty, or if the value doesn't exist, it's automatically determined by the environment. + --cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps} + CORS permission mode. Can specify either all or localapps. all allows everything. localapps limits the cross-origin resource sharing policy to app://. and localhost-related. Other origins can be added with the allow_origin option. Default is localapps. This option takes precedence over the settings file specified by --setting_file. + --allow_origin [ALLOW_ORIGIN ...] + Specifies allowed origins. Multiple can be specified by separating with spaces. This option takes precedence over the settings file specified by --setting_file. + --setting_file SETTING_FILE + Can specify a settings file. + --preset_file PRESET_FILE + Can specify a preset file. If not specified, it searches for presets.yaml in the environment variable VV_PRESET_FILE and the user directory in that order. + --disable_mutable_api + Disables APIs that modify static data of the engine, such as dictionary registration and setting changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. If the value of VV_DISABLE_MUTABLE_API is 1, it's disabled; if 0 or empty, or if the value doesn't exist, it's ignored. +``` + +### Update + +Delete all files in the engine directory and replace them with new ones. + +## Contributor's Guide + +VOICEVOX ENGINE welcomes your contributions! +For details, please see [CONTRIBUTING.en.md](./CONTRIBUTING.en.md). 
+We also have discussions and casual chats on the [VOICEVOX Unofficial Discord Server](https://discord.gg/WMwWetrzuh). Feel free to join us. + +When creating a pull request to resolve an issue, we recommend either informing on the issue side that you've started working on it, or initially creating a Draft pull request to avoid working on the same issue as someone else. + +## Developer's Guide + +### Environment Setup + +It is developed using `Python 3.11.9`. +To install, you'll need C/C++ compilers and CMake for each OS. + +```bash +# Install runtime environment +python -m pip install -r requirements.txt + +# Install development environment, test environment, and build environment +python -m pip install -r requirements-dev.txt -r requirements-build.txt +``` + +### Execution + +For details on command-line arguments, check with the following command: + +```bash +python run.py --help +``` + +```bash +# Start server with production version of VOICEVOX +VOICEVOX_DIR="C:/path/to/voicevox" # Path to production version VOICEVOX directory +python run.py --voicevox_dir=$VOICEVOX_DIR +``` + + + +```bash +# Start server with mock +python run.py --enable_mock +``` + +```bash +# Change log to UTF8 +python run.py --output_log_utf8 +# Or VV_OUTPUT_LOG_UTF8=1 python run.py +``` + +#### Specifying CPU Thread Count + +If CPU thread count is not specified, half of the logical core count is used. (For most CPUs, this is half of the total processing power) +If you're running on IaaS or a dedicated server, and want to adjust the processing power used by the engine, you can achieve this by specifying the CPU thread count. + +- Specify with runtime argument + ```bash + python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4 + ``` +- Specify with environment variable + ```bash + export VV_CPU_NUM_THREADS=4 + python run.py --voicevox_dir=$VOICEVOX_DIR + ``` + +#### Using Past Versions of the Core + +It's possible to use VOICEVOX Core 0.5.4 or later. 
+Support for libtorch version core on Mac is not available. + +##### Specifying Past Binaries + +By specifying the directory of the production version VOICEVOX or pre-compiled engine with the `--voicevox_dir` argument, that version of the core will be used. + +```bash +python run.py --voicevox_dir="/path/to/voicevox" +``` + +On Mac, specifying `DYLD_LIBRARY_PATH` is necessary. + +```bash +DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/voicevox" +``` + +##### Directly Specifying Voice Library + +Specify the directory of the unzipped [VOICEVOX Core zip file](https://github.com/VOICEVOX/voicevox_core/releases) with the `--voicelib_dir` argument. +Also, specify the directory of [libtorch](https://pytorch.org/) or [onnxruntime](https://github.com/microsoft/onnxruntime) (shared library) with the `--runtime_dir` argument according to the core version. +However, if libtorch and onnxruntime are in the system's search path, specifying the `--runtime_dir` argument is unnecessary. +The `--voicelib_dir` and `--runtime_dir` arguments can be used multiple times. +When specifying the core version in the API endpoint, use the `core_version` argument. (If not specified, the latest core will be used) + +```bash +python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx" +``` + +On Mac, specifying `DYLD_LIBRARY_PATH` is necessary instead of the `--runtime_dir` argument. + +```bash +DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core" +``` + +##### Placing in User Directory + +Voice libraries in the following directories are automatically loaded: + +- Built version: `/voicevox-engine/core_libraries/` +- Python version: `/voicevox-engine-dev/core_libraries/` + +`` varies depending on the OS. 
+ +- Windows: `C:\Users\\AppData\Local\` +- macOS: `/Users//Library/Application\ Support/` +- Linux: `/home//.local/share/` + +### Build + +Local building is possible through packaging with `pyinstaller` and containerization with Dockerfile. +For detailed procedures, please see [Contributor's Guide#Build](./CONTRIBUTING.en.md#build). + +If using GitHub, you can build using GitHub Actions in your forked repository. +Turn ON Actions and start `build-engine-package.yml` with workflow_dispatch to build. +The artifacts will be uploaded to Releases. +For GitHub Actions settings necessary for building, please see [Contributor's Guide#GitHub Actions](./CONTRIBUTING.md#github-actions). + +### Testing and Static Analysis + +Testing with `pytest` and static analysis with various linters are possible. +For detailed procedures, please see [Contributor's Guide#Testing](./CONTRIBUTING.md#testing) and [Contributor's Guide#Static Analysis](./CONTRIBUTING.md#static-analysis). + +### Dependencies + +Dependencies are managed with `poetry`. Also, there are license restrictions on the dependent libraries that can be introduced. +For details, please see [Contributor's Guide#Packages](./CONTRIBUTING.md#packages). + +### About Multi-Engine Feature + +In the VOICEVOX editor, you can start multiple engines simultaneously. +By using this feature, you can run your own voice synthesis engine or existing voice synthesis engines on the VOICEVOX editor. + + + +
+ +#### How the Multi-Engine Feature Works + +The multi-engine feature is realized by starting multiple Web APIs of engines compliant with the VOICEVOX API on different ports and handling them uniformly. +The editor starts each engine via executable binary and individually manages settings and states by binding them to EngineID. + +#### How to Support the Multi-Engine Feature + +Support is possible by creating an executable binary that starts a VOICEVOX API compliant engine. +The easiest way is to fork the VOICEVOX ENGINE repository and modify some of its functions. + +The points to modify are engine information, character information, and voice synthesis. + +The engine information is managed in the manifest file (`engine_manifest.json`) in the root directory. +A manifest file in this format is required for VOICEVOX API compliant engines. +Please check the information in the manifest file and change it as appropriate. +Depending on the voice synthesis method, it may not be possible to have the same functions as VOICEVOX, such as morphing. +In that case, please change the information in `supported_features` in the manifest file as appropriate. + +Character information is managed in files in the `resources/character_info` directory. +Dummy icons and such are prepared, so please change them as appropriate. + +Voice synthesis is performed in `voicevox_engine/tts_pipeline/tts_engine.py`. +In the VOICEVOX API, voice synthesis is realized by the engine creating an initial value for the voice synthesis query `AudioQuery` and returning it to the user, the user editing the query as needed, and then the engine synthesizing voice according to the query. +Query creation is done at the `/audio_query` endpoint, and voice synthesis is done at the `/synthesis` endpoint. At minimum, supporting these two is sufficient to be compliant with the VOICEVOX API. + +#### How to Distribute Multi-Engine Feature Compatible Engines + +We recommend distributing as a VVPP file. 
+VVPP stands for "VOICEVOX Plugin Package"; it is essentially a ZIP file of a directory containing the built engine and related files. +If you change the extension to `.vvpp`, it can be installed in the VOICEVOX editor with a double-click. + +The editor unzips the received VVPP file to the local disk, then locates files according to the `engine_manifest.json` in the root. +If the package does not load properly in the VOICEVOX editor, refer to the editor's error log. + +A large `xxx.vvpp` can also be split and distributed as sequentially numbered files such as `xxx.0.vvppp`. +This is useful when the file is too large to distribute easily. +The `vvpp` and `vvppp` files needed for installation are listed in the `vvpp.txt` file. + +
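The packaging step described above can be sketched in Python as follows. This is a minimal sketch, assuming the built engine lives in a single directory with `engine_manifest.json` at its root; `pack_vvpp` is a hypothetical helper written for this example, not a tool shipped with VOICEVOX ENGINE, and it does not cover splitting into `vvppp` files.

```python
import zipfile
from pathlib import Path


def pack_vvpp(engine_dir: Path, output_path: Path) -> Path:
    """Zip the contents of engine_dir so engine_manifest.json sits at the archive root."""
    if not (engine_dir / "engine_manifest.json").is_file():
        raise FileNotFoundError("engine_manifest.json must be in the root of engine_dir")
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(engine_dir.rglob("*")):
            # Skip directories and the archive itself if it is written inside engine_dir.
            if path.is_file() and path.resolve() != output_path.resolve():
                # Store paths relative to engine_dir, using "/" separators.
                archive.write(path, path.relative_to(engine_dir).as_posix())
    return output_path
```

For example, `pack_vvpp(Path("dist/my_engine"), Path("my_engine.vvpp"))` would produce a file with the `.vvpp` extension directly; both paths here are placeholders.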
+ +## Case Introductions + +**[voicevox-client](https://github.com/voicevox-client) [@voicevox-client](https://github.com/voicevox-client)** ・・・ API wrappers for various languages for VOICEVOX ENGINE + +## License + +Dual license of LGPL v3 and another license that doesn't require source code disclosure. +If you want to obtain the other license, please ask Hiho. +X account: [@hiho_karuta](https://x.com/hiho_karuta) diff --git a/README.md b/README.md index e16c9fad8..97bf2e843 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,8 @@ [![build-docker](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-container.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-container.yml) [![docker](https://img.shields.io/docker/pulls/voicevox/voicevox_engine)](https://hub.docker.com/r/voicevox/voicevox_engine) +[English](./CONTRIBUTING.en.md) + [VOICEVOX](https://voicevox.hiroshiba.jp/) のエンジンです。 実態は HTTP サーバーなので、リクエストを送信すればテキスト音声合成できます。 diff --git a/docs/Glossary.en.md b/docs/Glossary.en.md new file mode 100644 index 000000000..fba0df9f7 --- /dev/null +++ b/docs/Glossary.en.md @@ -0,0 +1,42 @@ +# Glossary + +This document explains terms used within VOICEVOX ENGINE. +We plan to add more terms over time. Feel free to send pull requests with additions! + + + +## Domain Terms + +TODO: Explain terms that are presented to users + +## Engine-related + +TODO + +## OpenJTalk-related + +### full-context label + +Data obtained from analyzing sentence structure, gathered for each phoneme, or a collection thereof. +Contains information such as which phoneme, which mora position, which accent phrase position, etc. +This is an HTS concept. + +### Label: label + +Refers to the full context label of a single phoneme. +This is a VOICEVOX-specific definition (within OpenJTalk, "label" refers to the full context label). + +### Context: context + +Refers to a single element within a full context. +This is a VOICEVOX-specific definition. 
+ +### feature + +A label converted into a single-line string. +This is an OpenJTalk concept. diff --git a/docs/Integration_with_VOICEVOX_Speech_Synthesis_Engine.en.md b/docs/Integration_with_VOICEVOX_Speech_Synthesis_Engine.en.md new file mode 100644 index 000000000..3ac34f2d7 --- /dev/null +++ b/docs/Integration_with_VOICEVOX_Speech_Synthesis_Engine.en.md @@ -0,0 +1,10 @@ +# Integration with VOICEVOX Speech Synthesis Engine + +Here's a brief note introducing our development policies. + +- Even as versions increase, we plan to maintain the ability to perform speech synthesis by directly POSTing the values returned from `/audio_query` to `/synthesis` + - While `AudioQuery` parameters will increase, we'll ensure that default values generate similar output to previous versions + - We'll maintain backward compatibility by allowing older versions of `AudioQuery` to be POSTed directly to `/synthesis` in newer versions +- Voice styles have been implemented since version 0.7. Style information can be obtained from `/speakers` and `/singers` + - Speech synthesis can be performed as before by specifying the `style_id` from the style information in the `speaker` parameter + - The parameter name remains `speaker` for compatibility reasons diff --git a/docs/Resource_File_URLs_and_Filemap.en.md b/docs/Resource_File_URLs_and_Filemap.en.md new file mode 100644 index 000000000..bdc88c0f4 --- /dev/null +++ b/docs/Resource_File_URLs_and_Filemap.en.md @@ -0,0 +1,72 @@ +# Specifications for Resource File URLs + +VOICEVOX ENGINE returns some resource files as URLs. +If the URL remains the same even after updating a resource file, caching might prevent fetching the new resource. +To prevent this, we include the hash value of the resource file in the URL, ensuring the URL changes with each resource modification. + +ResourceManager manages the correspondence between files and their hashes. +filemap.json is a file that pre-maps files to their hashes. 
+generate_filemap.py creates the filemap.json. + +## ResourceManager + +Resource files listed in `filemap.json` can be registered. +If `create_filemap_if_not_exist` is set to `True` during initialization, directories without `filemap.json` can be registered. + +For detailed specifications, please refer to the ResourceManager documentation and implementation. + +## filemap.json + +The keys in `filemap.json` are relative paths from the registration directory to the resource files. +For compatibility, the path separator must be `/`. + +The values are strings, such as hashes, that uniquely identify the registered files. +`generate_filemap.py` generates sha256 hashes. + +### Example + +#### Directory Structure + +``` +Registration Directory/ +├── filemap.json +├── dir_1/ +│ ├── registered_file.png +│ ├── samples/ +│ │ └── registered_file.wav +│ └── unregistered_file1.txt +└── dir_2/ + ├── registered_file.png + ├── samples/ + │ └── registered_file.wav + └── unregistered_file1.txt +``` + +#### filemap.json + +```json +{ + "dir_1/registered_file.png": "HASH-1", + "dir_1/samples/registered_file.wav": "HASH-2", + "dir_2/registered_file.png": "HASH-3", + "dir_2/samples/registered_file.wav": "HASH-4" +} +``` + +## generate_filemap.py + +This script generates `filemap.json`. +By default, it only registers png and wav files. + +### Example + +```bash +python tools/generate_filemap.py --target_dir resources/character_info +``` + +Example of registering jpg files in addition to png and wav + +```bash +python tools/generate_filemap.py --target_dir resources/character_info \ + --target_suffix png --target_suffix wav --target_suffix jpg +```
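
To make the `filemap.json` format concrete, here is a minimal Python sketch that builds such a mapping: keys are `/`-separated paths relative to the registration directory, and values are sha256 hashes. It is not the actual implementation of `generate_filemap.py`; the function names `build_filemap` and `write_filemap` are assumptions for this example.

```python
import hashlib
import json
from pathlib import Path


def build_filemap(target_dir: Path, suffixes: tuple[str, ...] = ("png", "wav")) -> dict[str, str]:
    """Map each matching file's relative path ("/" separators) to its SHA-256 hex digest."""
    filemap: dict[str, str] = {}
    for path in sorted(target_dir.rglob("*")):
        if path.is_file() and path.suffix.lstrip(".") in suffixes:
            # as_posix() yields "/" separators regardless of the OS.
            key = path.relative_to(target_dir).as_posix()
            filemap[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return filemap


def write_filemap(target_dir: Path) -> None:
    """Write the mapping to filemap.json inside target_dir."""
    (target_dir / "filemap.json").write_text(
        json.dumps(build_filemap(target_dir), indent=2), encoding="utf-8"
    )
```

Because the hash changes whenever the file content changes, a URL that embeds it also changes, which is what avoids stale cached resources.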