speechbrain.inference.vocoders module

Specifies the inference interfaces for Text-To-Speech (TTS) modules.

Authors

Aku Rouhe 2021
Peter Plantinga 2021
Loren Lugosch 2020
Mirco Ravanelli 2020
Titouan Parcollet 2021
Abdel Heba 2021
Andreas Nautsch 2022, 2023
Pooneh Mousavi 2023
Sylvain de Langen 2023
Adel Moumen 2023
Pradnya Kandarkar 2023

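Summary

Classes

`DiffWaveVocoder`	A ready-to-use inference wrapper for DiffWave as a vocoder. The wrapper supports locally-conditional generation: mel_spec -> waveform.
`HIFIGAN`	A ready-to-use wrapper for HiFiGAN (mel_spec -> waveform).
`UnitHIFIGAN`	A ready-to-use wrapper for Unit HiFiGAN (discrete units -> waveform).

Reference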

class speechbrain.inference.vocoders.HIFIGAN(*args, **kwargs)[source]

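Bases: Pretrained

A ready-to-use wrapper for HiFiGAN (mel_spec -> waveform).

Parameters:

*args (tuple) – Positional arguments forwarded to the Pretrained parent class.
**kwargs (dict) – Keyword arguments forwarded to the Pretrained parent class.

Example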

>>> import torch
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> mel_specs = torch.rand(2, 80, 298)
>>> waveforms = hifi_gan.decode_batch(mel_specs)
>>> # You can use the vocoder coupled with a TTS system
>>> # Initialize TTS (tacotron2)
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> from speechbrain.inference.TTS import Tacotron2
>>> tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir=tmpdir_tts)
>>> # Running the TTS
>>> mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_output)

HPARAMS_NEEDED = ['generator']

decode_batch(spectrogram, mel_lens=None, hop_len=None)[source]

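Computes waveforms from a batch of mel-spectrograms.

Parameters:

spectrogram (torch.Tensor) – Batch of mel-spectrograms [batch, mels, time].
mel_lens (torch.Tensor) – Lengths of the mel-spectrograms in the batch; can be obtained from the output of Tacotron/FastSpeech.
hop_len (int) – Hop length used for mel-spectrogram extraction; should be the same value as in the .yaml file.

Returns:

waveforms – Batch of generated waveforms [batch, 1, time].

Return type:

torch.Tensor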
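
When spectrograms of different lengths are batched by hand rather than taken from a TTS model, `mel_lens` has to be built alongside the padded batch. The following is a minimal sketch of that step, not SpeechBrain code: `pad_mels` is a hypothetical helper, zero is an assumed padding value, and the 80 mel bins and hop length of 256 are illustrative values only.

```python
import torch
import torch.nn.functional as F


def pad_mels(mels):
    """Pad a list of [mels, time] tensors into one [batch, mels, max_time] batch.

    Hypothetical helper: returns the padded batch and the original lengths
    (in frames), which play the role of `mel_lens`.
    """
    mel_lens = torch.tensor([m.shape[-1] for m in mels])
    max_len = int(mel_lens.max())
    # F.pad with (0, n) appends n zeros on the right of the last (time) axis.
    padded = [F.pad(m, (0, max_len - m.shape[-1])) for m in mels]
    return torch.stack(padded), mel_lens


mels = [torch.rand(80, 120), torch.rand(80, 90)]
batch, mel_lens = pad_mels(mels)
print(batch.shape, mel_lens.tolist())

# With a loaded vocoder, the pair could then be passed on, for example:
# waveforms = hifi_gan.decode_batch(batch, mel_lens=mel_lens, hop_len=256)
```

The padded frames are what produce the spurious audio that `mel_lens` and `hop_len` allow the vocoder to mask.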

mask_noise(waveform, mel_lens, hop_len)[source]

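Masks the noise caused by padding during batch inference.

Parameters:

waveform (torch.Tensor) – Batch of generated waveforms [batch, 1, time].
mel_lens (torch.Tensor) – Lengths of the mel-spectrograms in the batch; can be obtained from the output of Tacotron/FastSpeech.
hop_len (int) – Hop length used for mel-spectrogram extraction; same value as in the .yaml file.

Returns:

waveform – Batch of waveforms without padded noise [batch, 1, time].

Return type:

torch.Tensor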
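
The idea behind the masking can be illustrated in plain PyTorch. This is a sketch under the assumption that each mel frame corresponds to `hop_len` audio samples, so item `i` has `mel_lens[i] * hop_len` valid samples; `mask_padding_noise` is a hypothetical stand-in and is not claimed to match the library's implementation.

```python
import torch


def mask_padding_noise(waveform, mel_lens, hop_len):
    """Zero out samples beyond each item's valid length.

    waveform: [batch, 1, time]; mel_lens: [batch] lengths in frames.
    Returns a new tensor; the input is not modified in place.
    """
    # Number of valid audio samples per item, shaped [batch, 1].
    valid = (mel_lens * hop_len).unsqueeze(1)
    # Sample positions along the time axis, shaped [1, time].
    positions = torch.arange(waveform.shape[-1], device=waveform.device).unsqueeze(0)
    # True where a sample lies inside the valid region, shaped [batch, time].
    mask = positions < valid
    return waveform * mask.unsqueeze(1).to(waveform.dtype)


wav = torch.ones(2, 1, 8)
masked = mask_padding_noise(wav, torch.tensor([4, 2]), hop_len=2)
print(masked[1, 0].tolist())
```

In this toy call the first item spans all 8 samples, while the second has only 2 frames and therefore keeps its first 4 samples.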

decode_spectrogram(spectrogram)[source]

Computes a waveform from a single mel-spectrogram.

Parameters:

spectrogram (torch.Tensor) – Mel-spectrogram [mels, time].

Returns:

waveform (torch.Tensor) – Waveform [1, time].
The audio can be saved with:
>>> import torch
>>> import torchaudio
>>> waveform = torch.rand(1, 666666)
>>> sample_rate = 22050
>>> torchaudio.save(str(getfixture('tmpdir') / "test.wav"), waveform, sample_rate)

forward(spectrogram)[source]: Decodes the input spectrograms.

class speechbrain.inference.vocoders.DiffWaveVocoder(*args, **kwargs)[source]

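Bases: Pretrained

A ready-to-use inference wrapper for DiffWave as a vocoder. The wrapper supports the following generative task:

Locally-conditional generation: mel_spec -> waveform

Parameters:

*args (tuple) – Positional arguments forwarded to the Pretrained parent class.
**kwargs (dict) – Keyword arguments forwarded to the Pretrained parent class.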

HPARAMS_NEEDED = ['diffusion']

decode_batch(mel, hop_len, mel_lens=None, fast_sampling=False, fast_sampling_noise_schedule=None)[source]

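Generates waveforms from spectrograms.

Parameters:

mel (torch.Tensor) – Spectrogram [batch, mels, time].
hop_len (int) – Hop length used during mel-spectrogram extraction; should be the same value as in the .yaml file. Used to determine the output waveform length and to mask the noise for the vocoding task.
mel_lens (torch.Tensor) – Lengths of the mel-spectrograms in the batch, used to mask the noise caused by padding; can be obtained from the output of Tacotron/FastSpeech.
fast_sampling (bool) – Whether to use fast sampling.
fast_sampling_noise_schedule (list) – Noise schedule used for fast sampling.

Returns:

waveforms – Batch of generated waveforms [batch, 1, time].

Return type:

torch.Tensor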
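
The arguments listed above can be combined as follows. This is a sketch of the calling convention only: `vocode_fast` is a hypothetical helper, the hop length of 256 is an assumed value that must match the .yaml file of the model actually in use, and the six-step noise schedule is an assumed example rather than a value confirmed by this page.

```python
def vocode_fast(vocoder, mel, hop_len=256, mel_lens=None):
    """Call decode_batch with fast sampling enabled.

    `vocoder` is expected to be a loaded DiffWaveVocoder (or any object
    exposing the same decode_batch signature); `mel` is [batch, mels, time].
    """
    # Assumed short noise schedule; replace it with one suited to your model.
    noise_schedule = [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5]
    return vocoder.decode_batch(
        mel,
        hop_len,
        mel_lens=mel_lens,
        fast_sampling=True,
        fast_sampling_noise_schedule=noise_schedule,
    )
```

Passing `mel_lens` is optional here; when it is provided, the padded tail of each waveform can be masked as described for `mask_noise`.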

mask_noise(waveform, mel_lens, hop_len)[source]

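Masks the noise caused by padding during batch inference.

Parameters:

waveform (torch.Tensor) – Batch of generated waveforms [batch, 1, time].
mel_lens (torch.Tensor) – Lengths of the mel-spectrograms in the batch; can be obtained from the output of Tacotron/FastSpeech.
hop_len (int) – Hop length used for mel-spectrogram extraction; same value as in the .yaml file.

Returns:

waveform – Batch of waveforms without padded noise [batch, 1, time].

Return type:

torch.Tensor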

decode_spectrogram(spectrogram, hop_len, fast_sampling=False, fast_sampling_noise_schedule=None)[source]

Computes a waveform from a single mel-spectrogram.

Parameters:

spectrogram (torch.Tensor) – Mel-spectrogram [mels, time].
hop_len (int) – Hop length used for mel-spectrogram extraction; same value as in the .yaml file.
fast_sampling (bool) – Whether to use fast sampling.
fast_sampling_noise_schedule (list) – Noise schedule used for fast sampling.

Returns:

waveform (torch.Tensor) – Waveform [1, time].
The audio can be saved with:
>>> import torch
>>> import torchaudio
>>> waveform = torch.rand(1, 666666)
>>> sample_rate = 22050
>>> torchaudio.save(str(getfixture('tmpdir') / "test.wav"), waveform, sample_rate)

forward(spectrogram)[source]: Decodes the input spectrograms.

class speechbrain.inference.vocoders.UnitHIFIGAN(*args, **kwargs)[source]

Bases: Pretrained

A ready-to-use wrapper for Unit HiFiGAN (discrete units -> waveform).

Parameters:

*args (tuple) – See Pretrained.
**kwargs (dict) – See Pretrained.

Example

>>> import torch
>>> from speechbrain.inference.vocoders import UnitHIFIGAN
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> hifi_gan = UnitHIFIGAN.from_hparams(source="speechbrain/hifigan-hubert-l1-3-7-12-18-23-k1000-LibriTTS", savedir=tmpdir_vocoder)
>>> codes = torch.randint(0, 99, (100, 1))
>>> waveform = hifi_gan.decode_unit(codes)

HPARAMS_NEEDED = ['generator']

decode_batch(units, spk=None)[source]

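Computes waveforms from a batch of discrete units.

Parameters:

units (torch.Tensor) – Batch of discrete units [batch, codes].
spk (torch.Tensor) – Batch of speaker embeddings [batch, spk_dim].

Returns:

waveforms – Batch of generated waveforms [batch, 1, time].

Return type:

torch.Tensor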

decode_unit(units, spk=None)[source]

Computes a waveform from a single sequence of discrete units.

Parameters:

units (torch.Tensor) – Codes [time].
spk (torch.Tensor) – Speaker embedding [spk_dim].

Returns:

waveform – Waveform [1, time].

Return type:

torch.Tensor

forward(units, spk=None)[source]: Decodes the input units.