speechbrain.inference.TTS module
Specifies the inference interfaces for Text-To-Speech (TTS) modules.
- Authors
Aku Rouhe 2021
Peter Plantinga 2021
Loren Lugosch 2020
Mirco Ravanelli 2020
Titouan Parcollet 2021
Abdel Heba 2021
Andreas Nautsch 2022, 2023
Pooneh Mousavi 2023
Sylvain de Langen 2023
Adel Moumen 2023
Pradnya Kandarkar 2023
Summary
Classes
FastSpeech2 | A ready-to-use wrapper for Fastspeech2 (text -> mel_spec). |
FastSpeech2InternalAlignment | A ready-to-use wrapper for Fastspeech2 with internal alignment (text -> mel_spec). |
MSTacotron2 | A ready-to-use wrapper for zero-shot multi-speaker Tacotron2. |
Tacotron2 | A ready-to-use wrapper for Tacotron2 (text -> mel_spec). |
Reference
- class speechbrain.inference.TTS.Tacotron2(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use wrapper for Tacotron2 (text -> mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir=tmpdir_tts)
>>> mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
>>> items = [
...   "A quick brown fox jumped over the lazy dog",
...   "How much wood would a woodchuck chuck?",
...   "Never odd or even"
... ]
>>> mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_output)
- HPARAMS_NEEDED = ['model', 'text_to_sequence']
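When `encode_batch` processes items of different lengths, the outputs are padded to the longest item in the batch. A minimal sketch of trimming each output back to its true length, assuming the returned lengths hold the per-item frame counts (check the tensor shapes returned by your SpeechBrain version before relying on this):

```python
# Sketch: trimming padded batch outputs to their true lengths.
# Assumes mel outputs are padded to the longest item and that the
# lengths list holds the true number of frames per item (an assumption;
# verify against the shapes your checkpoint actually returns).

def trim_batch(mel_outputs, mel_lengths):
    """Return per-item outputs with padding frames removed.

    mel_outputs: list of per-item frame sequences, padded to equal length
    mel_lengths: true frame count for each item
    """
    return [frames[:length] for frames, length in zip(mel_outputs, mel_lengths)]

# Toy stand-in for a padded batch of three items (frames shown as ints):
padded = [[1, 2, 3, 0, 0], [4, 5, 6, 7, 0], [8, 9, 0, 0, 0]]
lengths = [3, 4, 2]
trimmed = trim_batch(padded, lengths)
```

The same slicing pattern applies to torch tensors (`frames[:length]` on the time dimension).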
- class speechbrain.inference.TTS.MSTacotron2(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use wrapper for zero-shot multi-speaker Tacotron2. For voice cloning: (text, reference audio) -> (mel_spec). For generating a random speaker voice: (text) -> (mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> mstacotron2 = MSTacotron2.from_hparams(source="speechbrain/tts-mstacotron2-libritts", savedir=tmpdir_tts)
>>> # Sample rate of the reference audio must be greater or equal to the sample rate of the speaker embedding model
>>> reference_audio_path = "tests/samples/single-mic/example1.wav"
>>> input_text = "Mary had a little lamb."
>>> mel_output, mel_length, alignment = mstacotron2.clone_voice(input_text, reference_audio_path)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-22050Hz", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_output, mel_length, alignment = mstacotron2.clone_voice(input_text, reference_audio_path)
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_output)
>>> # For generating a random speaker voice, use the following
>>> mel_output, mel_length, alignment = mstacotron2.generate_random_voice(input_text)
- HPARAMS_NEEDED = ['model']
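The example above notes that the reference audio's sample rate must be greater than or equal to that of the speaker embedding model. A minimal pure-Python sketch of what upsampling entails, using linear interpolation; in practice a proper resampler (e.g. torchaudio's `Resample` transform) should be preferred, as naive interpolation does not band-limit the signal:

```python
# Sketch: upsampling a reference clip by linear interpolation so its
# sample rate meets the speaker embedding model's requirement.
# Illustrative only; use a real resampler for actual audio.

def resample_linear(samples, src_rate, dst_rate):
    """Linearly interpolate `samples` from src_rate to dst_rate."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

# Upsample a short 16 kHz clip to 22.05 kHz:
clip_16k = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
clip_22k = resample_linear(clip_16k, 16000, 22050)
```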
- class speechbrain.inference.TTS.FastSpeech2(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use wrapper for Fastspeech2 (text -> mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> fastspeech2 = FastSpeech2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir=tmpdir_tts)
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> items = [
...   "A quick brown fox jumped over the lazy dog",
...   "How much wood would a woodchuck chuck?",
...   "Never odd or even"
... ]
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(items)
>>>
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_outputs)
- HPARAMS_NEEDED = ['spn_predictor', 'model', 'input_encoder']
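FastSpeech2 predicts a duration (frame count) per input token, and the sum of these gives the length of the mel spectrogram. A sketch of estimating the synthesized audio length from those durations; the hop length of 256 samples and 22050 Hz sample rate below are typical LJSpeech values assumed for illustration, not read from the model, so confirm them against the hparams of the checkpoint you load:

```python
# Sketch: estimating synthesized audio duration from FastSpeech2's
# predicted per-token durations. HOP_LENGTH and SAMPLE_RATE are assumed
# (typical LJSpeech values); verify against your checkpoint's hparams.

HOP_LENGTH = 256      # samples advanced per mel frame (assumed)
SAMPLE_RATE = 22050   # Hz (assumed)

def estimated_seconds(durations):
    """Total frames = sum of per-token frame counts; convert to seconds."""
    total_frames = sum(durations)
    return total_frames * HOP_LENGTH / SAMPLE_RATE

# e.g. a predicted duration of 5 frames for each of 20 tokens:
secs = estimated_seconds([5] * 20)
```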
- class speechbrain.inference.TTS.FastSpeech2InternalAlignment(*args, **kwargs)[source]
Bases:
Pretrained
A ready-to-use wrapper for Fastspeech2 with internal alignment (text -> mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> fastspeech2 = FastSpeech2InternalAlignment.from_hparams(source="speechbrain/tts-fastspeech2-internal-alignment-ljspeech", savedir=tmpdir_tts)
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> items = [
...   "A quick brown fox jumped over the lazy dog",
...   "How much wood would a woodchuck chuck?",
...   "Never odd or even"
... ]
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(items)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_outputs)
- HPARAMS_NEEDED = ['model', 'input_encoder']
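The examples above stop at `decode_batch`, which yields waveform tensors. With SpeechBrain one would typically write them out with `torchaudio.save`; as a standard-library-only illustration of what that entails, here is a sketch that writes mono float samples in [-1, 1] to a 16-bit PCM WAV file (the sample rate must match the vocoder checkpoint, and flattening the tensor to a 1-D sample list is left to the caller):

```python
# Sketch: writing a mono float waveform (values in [-1, 1]) to a 16-bit
# PCM WAV file using only the standard library.
import struct
import wave

def save_wav(path, samples, sample_rate=22050):
    """Clamp floats to [-1, 1], scale to int16, and write a mono WAV."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)        # mono
        f.setsampwidth(2)        # 16-bit PCM
        f.setframerate(sample_rate)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(pcm)

save_wav("synthesized.wav", [0.0, 0.25, -0.25, 1.0, -1.0])
```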