[#146] Fix mbrola voices being loaded for espeak backend #173

Open
wants to merge 5 commits into
base: master
from

Conversation

reubenadams

Summary: Implements the fix suggested in issue #146 by skipping voices starting with 'mb' in the EspeakWrapper.

As far as I understand it, the phonemizer uses the first voice that matches the IETF language tag, which for some languages (Japanese, for example) is an MBROLA voice. For a Windows user who cannot install MBROLA and therefore does not have the espeak-mbrola backend, this leads to the error RuntimeError: failed to load voice "ja".

The fix in issue #146 suggests simply skipping any voices starting with 'mb' in the available_voices method of the EspeakWrapper class, which is what I've done.

This is my first PR, so I've probably not implemented this in the ideal way. Sorry about that!

@mmmaat
Collaborator

mmmaat commented Aug 23, 2024

Thanks for the PR. As you can see, this breaks some tests (it actually hides the mbrola voices completely and breaks the mbrola backend on Mac and Linux).

Can you please try the following and let me know whether the tests pass (just run pytest from the root phonemizer directory) and whether Japanese works as expected?

  1. Delete your changes in EspeakWrapper.available_voices
  2. Replace the function EspeakBackend.supported_languages with the following:
    @classmethod
    def supported_languages(cls):
        return {
            voice.language: voice.name
            for voice in EspeakWrapper().available_voices()
            # ignore mbrola voices causing a bug on windows (see #146)
            if 'mb/' not in voice.identifier}

For me (on Linux) all the tests pass, and it should fix the bug on Windows by ignoring mbrola voices when using the espeak backend.
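To make the effect of that filter concrete, here is a minimal sketch that runs the same dict comprehension over stand-in data. The `Voice` tuple and the sample voices are assumptions made up for illustration; they are not the objects or values that `EspeakWrapper().available_voices()` really returns.

```python
from collections import namedtuple

# Hypothetical stand-in for the voice objects returned by available_voices()
Voice = namedtuple("Voice", ["name", "language", "identifier"])


def supported_languages(voices):
    """Map language code -> voice name, skipping mbrola voices."""
    return {
        voice.language: voice.name
        for voice in voices
        # same condition as in the suggested change
        if "mb/" not in voice.identifier
    }


# Illustrative sample data only (POSIX-style identifiers)
voices = [
    Voice("japanese-mbrola-1", "ja", "mb/mb-jp1"),
    Voice("Japanese", "ja", "jpx/ja"),
    Voice("English_(America)", "en-us", "gmw/en-US"),
]

result = supported_languages(voices)
```

With this sample list, the mbrola entry is dropped and "ja" maps to the remaining non-mbrola voice.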

@reubenadams
Author

reubenadams commented Aug 23, 2024

I've done as you requested. I get three errors, all because I don't have festival installed:

[image: screenshot of the test errors]

I have not installed festival because I found the installation instructions linked from the documentation confusing: http://www.festvox.org/docs/manual-2.4.0/festival_6.html#Installation.

Unfortunately the error for Japanese has returned: RuntimeError: failed to load voice "ja". I thought this might be because of the forward slash in your suggested condition if 'mb/' not in voice.identifier, but changing it to a backslash (or omitting the slash) does not resolve the issue. When I run it in debug mode I see that voice_code = 'ja' as expected, but voice_name = 'mb\mb-jp1', which is odd because I thought we had excluded the mbrola voices. Note that this is the case even if I reverse or omit the slash.

I'm afraid I don't know what to try next. Any ideas?
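On the separator question specifically, a check that works for both 'mb/mb-jp1' and 'mb\mb-jp1' could look like the sketch below. The helper name `is_mbrola_identifier` is hypothetical and not part of phonemizer. As reported above, changing the slash did not make the error go away, so this only addresses the separator concern and is not a fix in itself.

```python
def is_mbrola_identifier(identifier):
    """Return True if a voice identifier points into the 'mb' voice folder."""
    # Normalise Windows-style backslashes so one check covers both layouts
    return identifier.replace("\\", "/").startswith("mb/")


windows_style = "mb\\mb-jp1"  # the value seen in the debugger on Windows
posix_style = "mb/mb-jp1"

# The literal substring test from the suggestion misses the Windows form
substring_check_windows = "mb/" in windows_style
substring_check_posix = "mb/" in posix_style
```

Using `startswith` on the normalised identifier also avoids matching an unrelated identifier that merely contains the letters "mb" somewhere else.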

@reubenadams
Author

Okay, I think I may have figured out why your solution didn't fix RuntimeError: failed to load voice "ja". The traceback points to self._espeak.set_voice(language) in the EspeakBackend. EspeakBackend and EspeakMbrolaBackend both inherit from BaseEspeakBackend, whose __init__ method sets self._espeak = EspeakWrapper(). But the set_voice method of EspeakWrapper is not sensitive to whether the backend is an EspeakBackend or an EspeakMbrolaBackend:

    def set_voice(self, voice_code):
        """Setup the voice to use for phonemization

        Parameters
        ----------
        voice_code (str) : Must be a valid language code that is actually
          supported by espeak

        Raises
        ------
        RuntimeError if the required voice cannot be initialized

        """
        if 'mb' in voice_code:
            # this is an mbrola voice code. Select the voice by using
            # identifier in the format 'mb/{voice_code}'
            available = {
                voice.identifier[3:]: voice.identifier
                for voice in self.available_voices('mbrola')}
        else:
            # this are espeak voices. Select the voice using it's attached
            # language code. Consider only the first voice of a given code as
            # they are sorted by relevancy
            available = {}
            for voice in self.available_voices():
                if voice.language not in available:
                    available[voice.language] = voice.identifier

        try:
            voice_name = available[voice_code]
        except KeyError:
            raise RuntimeError(f'invalid voice code "{voice_code}"') from None

        if self._espeak.set_voice_by_name(voice_name.encode('utf8')) != 0:
            raise RuntimeError(  # pragma: nocover
                f'failed to load voice "{voice_code}"')

        voice = self._get_voice()
        if not voice:  # pragma: nocover
            raise RuntimeError(f'failed to load voice "{voice_code}"')
        self._voice = voice

So when I run

import phonemizer
print(phonemizer.phonemize("ほたる", language="ja", backend="espeak"))

it goes into the else block and picks out the first voice in self.available_voices() for each language, which for language="ja" is 'ja': 'mb\\mb-jp1'. This then triggers

        if self._espeak.set_voice_by_name(voice_name.encode('utf8')) != 0:
            raise RuntimeError(  # pragma: nocover
                f'failed to load voice "{voice_code}"')

in the set_voice method above.
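The selection step can be reproduced in isolation. This is a minimal sketch that copies the loop from the else block and feeds it stand-in data; the `Voice` tuple and the voice list (including its ordering, with the mbrola voice first for "ja") are assumptions chosen to mirror what the debugger showed, not real espeak output.

```python
from collections import namedtuple

# Hypothetical stand-in for the voice objects returned by available_voices()
Voice = namedtuple("Voice", ["language", "identifier"])


def first_voice_per_language(voices):
    """Keep only the first identifier seen for each language code."""
    available = {}
    for voice in voices:
        if voice.language not in available:
            available[voice.language] = voice.identifier
    return available


# Assumed ordering: the mbrola voice is listed first for Japanese
voices = [
    Voice("ja", "mb\\mb-jp1"),
    Voice("ja", "jpx\\ja"),
    Voice("en-us", "gmw\\en-US"),
]

selected = first_voice_per_language(voices)
```

Because nothing in the loop filters on the identifier, the "ja" key ends up bound to the mbrola voice regardless of what supported_languages reports.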

I can think of three approaches to solving this:

  1. In the constructor for EspeakBackend and EspeakMbrolaBackend, pass the name of the backend (mbrola or espeak) to the parent class so the set_voice method above knows to exclude mbrola voices if it was passed backend=espeak. I could do this, but I suspect that on Linux EspeakBackend should collect mbrola voices if mbrola is installed? I'm a bit confused about the relationship between mbrola and espeak voices; are mbrola voices a subset of espeak voices?
  2. In the else block, exclude mbrola voices if mbrola is not installed. I'm afraid I don't know how to do this.
  3. Dig into the available_voices method of the EspeakWrapper, perhaps checking there whether mbrola is installed.

If the first approach makes sense then I can do it, but I suspect it's still on the wrong track. Otherwise I think I will have to leave this PR as I'm not confident I can finish it without help!
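For reference, the first approach could be sketched roughly as follows. Everything here is hypothetical: the `include_mbrola` flag, the `select_voices` and `is_mbrola` helpers, and the `Voice` tuple are illustration only and do not exist in phonemizer. Whether the espeak backend should ever pick up mbrola voices is the open question raised above, so this sketch takes no position on it.

```python
from collections import namedtuple

# Hypothetical stand-in for the voice objects returned by available_voices()
Voice = namedtuple("Voice", ["language", "identifier"])


def is_mbrola(voice):
    """True if the identifier lives in the 'mb' folder, with either separator."""
    return voice.identifier.replace("\\", "/").startswith("mb/")


def select_voices(voices, include_mbrola):
    """First voice per language, optionally skipping mbrola voices.

    `include_mbrola` is an assumed flag that the backend would pass down.
    """
    available = {}
    for voice in voices:
        if not include_mbrola and is_mbrola(voice):
            continue
        # setdefault keeps the first identifier seen for each language
        available.setdefault(voice.language, voice.identifier)
    return available


# Illustrative sample data only
voices = [
    Voice("ja", "mb\\mb-jp1"),
    Voice("ja", "jpx\\ja"),
    Voice("en-us", "gmw\\en-US"),
]

espeak_only = select_voices(voices, include_mbrola=False)
with_mbrola = select_voices(voices, include_mbrola=True)
```

With the flag off, "ja" resolves to the non-mbrola identifier; with it on, the behaviour matches the current else block.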

2 participants