speechbrain.nnet.diffusion module

An implementation of denoising diffusion

https://arxiv.org/pdf/2006.11239.pdf

Certain parts are adopted from or inspired by denoising-diffusion-pytorch https://github.com/lucidrains/denoising-diffusion-pytorch

Authors

Artem Ploujnikov 2022

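Summary

Classes

`DenoisingDiffusion`	An implementation of a classic Denoising Diffusion Probabilistic Model (DDPM)
`Diffuser`	A base diffusion implementation
`DiffusionTrainSample`	DiffusionTrainSample(pred, noise, noisy_sample)
`GaussianNoise`	Adds ordinary Gaussian noise
`LatentDiffusion`	A latent diffusion wrapper.
`LatentDiffusionTrainSample`	LatentDiffusionTrainSample(diffusion, autoencoder)
`LengthMaskedGaussianNoise`	Gaussian noise applied to padded samples.

Functions

`sample_timesteps`	Returns a random sample of timesteps as a 1-D tensor (one dimension only)

Reference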

class speechbrain.nnet.diffusion.Diffuser(model, timesteps, noise=None)[source]

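Bases: Module

A base diffusion implementation

Parameters:

model (nn.Module) – the underlying model
timesteps (int) – the number of timesteps
noise (callable|str) – the noise function/module to use

The following predefined types of noise are provided:
"gaussian": Gaussian noise, applied to the whole sample
"length_masked_gaussian": Gaussian noise applied only to the parts of the sample that are not padding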

distort(x, timesteps=None)[source]

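Adds noise to a batch of data

Parameters:

x (torch.Tensor) – the original data sample
timesteps (torch.Tensor) – a 1-D integer tensor whose length equals the number of items in the batch x, where each entry is the timestep number for the corresponding item. If omitted, timesteps will be randomly sampled.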

train_sample(x, timesteps=None, condition=None, **kwargs)[source]

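Creates a sample for the training loop with a corresponding target

Parameters:

x (torch.Tensor) – the original data sample
timesteps (torch.Tensor) – a 1-D integer tensor whose length equals the number of items in the batch x, where each entry is the timestep number for the corresponding item. If omitted, timesteps will be randomly sampled.
condition (torch.Tensor) – the condition used for conditional generation. Should be omitted during unconditional generation.
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

pred (torch.Tensor) – the model output: the predicted noise
noise (torch.Tensor) – the noise being applied
noisy_sample (torch.Tensor) – the sample with the noise applied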
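
The returned triple is what a noise-prediction training step consumes. The following is a minimal sketch of such a step, assuming a mean-squared-error objective between the predicted and the applied noise. The small convolution and the simplified distortion are hypothetical stand-ins for the values that train_sample() would return, not the library's code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the noise-prediction model
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)

x = torch.randn(4, 1, 8, 8)      # clean batch
noise = torch.randn_like(x)      # the applied noise (regression target)
noisy_sample = x + noise         # simplified distortion, for illustration only
pred = model(noisy_sample)       # model output: the predicted noise

# Noise-prediction objective: mean squared error between pred and noise
loss = nn.functional.mse_loss(pred, noise)
loss.backward()
```

In an actual training loop, pred, noise and noisy_sample would come from a single train_sample(x) call and the loss would be followed by an optimizer step.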

sample(shape, **kwargs)[source]

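Generates a sample of the shape given by the shape argument

Parameters:

shape (enumerable) – the shape of the sample to generate
**kwargs (dict) – arguments to forward to the underlying model.

forward(x, timesteps=None)[source]

Computes the forward pass, which calls distort()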

class speechbrain.nnet.diffusion.DenoisingDiffusion(model, timesteps=None, noise=None, beta_start=None, beta_end=None, sample_min=None, sample_max=None, show_progress=False)[source]

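Bases: Diffuser

An implementation of a classic Denoising Diffusion Probabilistic Model (DDPM)

Parameters:

model (nn.Module) – the underlying model
timesteps (int) – the number of timesteps
noise (str|nn.Module) – the type of noise being used; "gaussian" will produce standard Gaussian noise
beta_start (float) – the value of the "beta" parameter at the beginning of the process (see the paper)
beta_end (float) – the value of the "beta" parameter at the end of the process
sample_min (float) – the lower bound used to clip the output
sample_max (float) – the upper bound used to clip the output
show_progress (bool) – whether to show progress during inference

Example

>>> import torch
>>> from speechbrain.nnet.diffusion import DenoisingDiffusion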
>>> from speechbrain.nnet.unet import UNetModel
>>> unet = UNetModel(
...     in_channels=1,
...     model_channels=16,
...     norm_num_groups=4,
...     out_channels=1,
...     num_res_blocks=1,
...     attention_resolutions=[]
... )
>>> diff = DenoisingDiffusion(
...     model=unet,
...     timesteps=5
... )
>>> x = torch.randn(4, 1, 64, 64)
>>> pred, noise, noisy_sample = diff.train_sample(x)
>>> pred.shape
torch.Size([4, 1, 64, 64])
>>> noise.shape
torch.Size([4, 1, 64, 64])
>>> noisy_sample.shape
torch.Size([4, 1, 64, 64])
>>> sample = diff.sample((2, 1, 64, 64))
>>> sample.shape
torch.Size([2, 1, 64, 64])

compute_coefficients()[source]

Computes the diffusion coefficients (alphas and betas)
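
To make the coefficients concrete, here is a minimal sketch that assumes a linear beta schedule between beta_start and beta_end, as in the DDPM paper. The helper name linear_beta_schedule is hypothetical, and the default values are the ones reported in the paper, not necessarily the library's defaults.

```python
import torch


def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: linearly spaced betas and the derived alphas."""
    betas = torch.linspace(beta_start, beta_end, timesteps)
    alphas = 1.0 - betas
    # Cumulative product: the fraction of the original signal that
    # survives after t noising steps
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    return betas, alphas, alphas_cumprod


betas, alphas, alphas_cumprod = linear_beta_schedule(timesteps=5)
```

The cumulative product decreases monotonically, so later timesteps retain less of the original signal.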

distort(x, noise=None, timesteps=None, **kwargs)[source]

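Adds noise to the sample in a forward diffusion process.

Parameters:

x (torch.Tensor) – a data sample of 2 or more dimensions, with the first dimension representing the batch
noise (torch.Tensor) – the noise to add
timesteps (torch.Tensor) – a 1-D integer tensor whose length equals the number of items in the batch x, where each entry is the timestep number for the corresponding item. If omitted, timesteps will be randomly sampled.
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

result – a tensor of the same dimensions as x

Return type:

torch.Tensor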
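
In the DDPM paper the forward process has a closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise, where abar_t is the cumulative product of the alphas. The sketch below illustrates that formula under the assumption of a linear beta schedule. The helper q_sample is hypothetical and is not the library's implementation.

```python
import torch


def q_sample(x, noise, timesteps, alphas_cumprod):
    """Hypothetical helper: closed-form forward diffusion for a batch."""
    # Pick one cumulative alpha per batch item and reshape it to
    # (batch, 1, 1, ...) so that it broadcasts over the other dimensions
    abar = alphas_cumprod[timesteps].view(-1, *([1] * (x.dim() - 1)))
    return abar.sqrt() * x + (1.0 - abar).sqrt() * noise


betas = torch.linspace(1e-4, 0.02, 5)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

x = torch.randn(4, 1, 8, 8)
noise = torch.randn_like(x)
timesteps = torch.tensor([0, 1, 3, 4])  # one timestep per batch item
noisy = q_sample(x, noise, timesteps, alphas_cumprod)
```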

sample(shape, **kwargs)[source]

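Generates a sample of the shape given by the shape argument

Parameters:

shape (enumerable) – the shape of the sample to generate
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

result – the generated sample

Return type:

torch.Tensor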

sample_step(sample, timestep, **kwargs)[source]

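Processes a single timestep of the sampling process

Parameters:

sample (torch.Tensor) – the sample for the following timestep
timestep (int) – the timestep number
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

predicted_sample – the predicted sample (denoised by one step)

Return type:

torch.Tensor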
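
For orientation, a single reverse step can be sketched after Algorithm 2 of the DDPM paper, here with the variance choice sigma_t^2 = beta_t. The names ddpm_step and zero_model are hypothetical, the model call signature model(sample, t) is an assumption, and the library's own step (for example its output clipping) may differ.

```python
import torch


def ddpm_step(model, sample, timestep, betas, alphas_cumprod):
    """Hypothetical helper: one reverse (denoising) step."""
    beta_t = betas[timestep]
    alpha_t = 1.0 - beta_t
    abar_t = alphas_cumprod[timestep]
    # The same timestep is used for every item in the batch
    t = torch.full((sample.size(0),), timestep, dtype=torch.long)
    with torch.no_grad():
        pred_noise = model(sample, t)
    # Posterior mean: remove the predicted noise, then rescale
    mean = (sample - beta_t / (1.0 - abar_t).sqrt() * pred_noise) / alpha_t.sqrt()
    if timestep == 0:
        return mean  # no noise is added at the final step
    return mean + beta_t.sqrt() * torch.randn_like(sample)


def zero_model(sample, t):
    """Hypothetical model that always predicts zero noise."""
    return torch.zeros_like(sample)


betas = torch.linspace(1e-4, 0.02, 5)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# Sampling starts from pure noise and walks the timesteps backwards
sample = torch.randn(2, 1, 8, 8)
for timestep in reversed(range(5)):
    sample = ddpm_step(zero_model, sample, timestep, betas, alphas_cumprod)
```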

class speechbrain.nnet.diffusion.LatentDiffusion(autoencoder, diffusion, latent_downsample_factor=None, latent_pad_dim=1)[source]

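Bases: Module

A latent diffusion wrapper. Latent diffusion is denoising diffusion applied to a latent space instead of the original data space.

Parameters:

autoencoder (speechbrain.nnet.autoencoders.Autoencoder) – an autoencoder converting the original space to a latent space
diffusion (speechbrain.nnet.diffusion.Diffuser) – a diffusion wrapper
latent_downsample_factor (int) – the factor by which the latent space dimensions need to be divisible. This is useful if the underlying model of the diffusion wrapper is based on a UNet-like architecture, where the inputs are progressively downsampled and upsampled by factors of two.
latent_pad_dim (int|list[int]) – the dimension(s) along which the latent space will be padded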
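
The divisibility requirement can be illustrated with a small padding routine. This is only a sketch of the idea: the helper pad_to_multiple is hypothetical, it assumes zero padding on the right and a non-negative dimension index, and the library may pad differently.

```python
import torch
import torch.nn.functional as F


def pad_to_multiple(x, factor, dim):
    """Hypothetical helper: zero-pads x along dim up to a multiple of factor."""
    remainder = x.size(dim) % factor
    if remainder == 0:
        return x
    # F.pad lists (left, right) pairs starting from the LAST dimension,
    # so the trailing dimensions get (0, 0) before the target dimension
    pad = [0, 0] * (x.dim() - dim - 1) + [0, factor - remainder]
    return F.pad(x, pad)


latent = torch.randn(4, 1, 18, 16)
padded = pad_to_multiple(latent, factor=4, dim=2)  # 18 is padded up to 20
```

A UNet that halves the resolution twice needs each padded dimension to be divisible by 4, which is why a factor of 4 is used here.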

Example

>>> from speechbrain.nnet.diffusion import DenoisingDiffusion, LatentDiffusion
>>> import torch
>>> from torch import nn
>>> from speechbrain.nnet.CNN import Conv2d
>>> from speechbrain.nnet.autoencoders import NormalizingAutoencoder
>>> from speechbrain.nnet.unet import UNetModel

Set up a simple autoencoder (a real autoencoder would be a deep neural network)

>>> ae_enc = Conv2d(
...     kernel_size=3,
...     stride=4,
...     in_channels=1,
...     out_channels=1,
...     skip_transpose=True,
... )
>>> ae_dec = nn.ConvTranspose2d(
...     kernel_size=3,
...     stride=4,
...     in_channels=1,
...     out_channels=1,
...     output_padding=1
... )
>>> ae = NormalizingAutoencoder(
...     encoder=ae_enc,
...     decoder=ae_dec,
... )

Construct a diffusion model with a UNet architecture

>>> unet = UNetModel(
...     in_channels=1,
...     model_channels=16,
...     norm_num_groups=4,
...     out_channels=1,
...     num_res_blocks=1,
...     attention_resolutions=[]
... )
>>> diff = DenoisingDiffusion(
...     model=unet,
...     timesteps=5
... )
>>> latent_diff = LatentDiffusion(
...     autoencoder=ae,
...     diffusion=diff,
...     latent_downsample_factor=4,
...     latent_pad_dim=2
... )
>>> x = torch.randn(4, 1, 64, 64)
>>> latent_sample = latent_diff.train_sample_latent(x)
>>> diff_sample, ae_sample = latent_sample
>>> pred, noise, noisy_sample = diff_sample
>>> pred.shape
torch.Size([4, 1, 16, 16])
>>> noise.shape
torch.Size([4, 1, 16, 16])
>>> noisy_sample.shape
torch.Size([4, 1, 16, 16])
>>> ae_sample.latent.shape
torch.Size([4, 1, 16, 16])

Create some samples (the shape given should be the shape of the latent space)

>>> sample = latent_diff.sample((2, 1, 16, 16))
>>> sample.shape
torch.Size([2, 1, 64, 64])

train_sample(x, **kwargs)[source]

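Creates a sample for the training loop with a corresponding target

Parameters:

x (torch.Tensor) – the original data sample
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

pred (torch.Tensor) – the model output: the predicted noise
noise (torch.Tensor) – the noise being applied
noisy_sample (torch.Tensor) – the sample with the noise applied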

train_sample_latent(x, **kwargs)[source]

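Returns a train sample together with the autoencoder output. This can be used to jointly train the diffusion model and the autoencoder.

Parameters:

x (torch.Tensor) – the original data sample
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

The training sample.

Return type:

LatentDiffusionTrainSample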

distort(x)[source]

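Adds noise to the sample in a forward diffusion process.

Parameters:

x (torch.Tensor) – a data sample of 2 or more dimensions, with the first dimension representing the batch

Returns:

result – a tensor of the same dimensions as x

Return type:

torch.Tensor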

sample(shape)[source]

Obtains a sample from the diffusion model

Parameters:

shape (torch.Tensor)

Returns:

sample – the sample of the specified shape

Return type:

torch.Tensor

speechbrain.nnet.diffusion.sample_timesteps(x, num_timesteps)[source]

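Returns a random sample of timesteps as a 1-D tensor (one dimension only)

Parameters:

x (torch.Tensor) – a tensor of samples of any dimension
num_timesteps (int) – the total number of timesteps

Returns:

A random sample of timesteps.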
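
The described behavior can be approximated in a few lines. The function sample_timesteps_sketch below is a hypothetical re-implementation for illustration, assuming one uniformly drawn integer timestep per batch item. It is not the library's code.

```python
import torch


def sample_timesteps_sketch(x, num_timesteps):
    """Hypothetical helper: one random integer timestep per batch item."""
    # Values are drawn uniformly from [0, num_timesteps)
    return torch.randint(num_timesteps, (x.size(0),), device=x.device)


x = torch.randn(4, 1, 64, 64)
timesteps = sample_timesteps_sketch(x, num_timesteps=5)
```

Only the batch dimension of x matters here; the remaining dimensions do not affect the result.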

class speechbrain.nnet.diffusion.GaussianNoise(*args, **kwargs)[source]

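Bases: Module

Adds ordinary Gaussian noise

forward(sample, **kwargs)[source]

Forward pass

Parameters:

sample (torch.Tensor) – the original sample
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

Noise of the same shape as the sample.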

class speechbrain.nnet.diffusion.LengthMaskedGaussianNoise(length_dim=1)[source]

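Bases: Module

Gaussian noise applied to padded samples. No noise is added to positions that are part of the padding.

Parameters:

length_dim (int) – the time dimension along which the lengths apply

forward(sample, length=None, **kwargs)[source]

Creates Gaussian noise. If a tensor of lengths is provided, no noise is added to the padding positions.

Parameters:

sample (torch.Tensor) – a batch of data
length (torch.Tensor) – relative lengths
**kwargs (dict) – arguments to forward to the underlying model.

Returns:

Gaussian noise of the same shape as the sample.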
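
The masking idea can be sketched as follows. The helper length_masked_noise is hypothetical, and it assumes that relative lengths are fractions of the longest item in the batch. The library's actual masking logic may differ.

```python
import torch


def length_masked_noise(sample, length=None, length_dim=1):
    """Hypothetical helper: Gaussian noise zeroed out at padding positions."""
    noise = torch.randn_like(sample)
    if length is None:
        return noise
    max_len = sample.size(length_dim)
    # Convert relative lengths (fractions of max_len) to absolute lengths
    abs_len = (length * max_len).round().long()
    # mask[b, t] is True where position t lies inside item b
    positions = torch.arange(max_len, device=sample.device)
    mask = positions[None, :] < abs_len[:, None]
    # Reshape the mask so that it broadcasts over the remaining dimensions
    shape = [1] * sample.dim()
    shape[0], shape[length_dim] = sample.size(0), max_len
    return noise * mask.view(shape)


sample = torch.zeros(2, 10, 3)       # (batch, time, features)
length = torch.tensor([1.0, 0.5])    # the second item is half padding
noise = length_masked_noise(sample, length)
```

With these inputs, the last five time steps of the second item receive no noise.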

class speechbrain.nnet.diffusion.DiffusionTrainSample(pred, noise, noisy_sample)

Bases: tuple

noise: Alias for field number 1

noisy_sample: Alias for field number 2

pred: Alias for field number 0

class speechbrain.nnet.diffusion.LatentDiffusionTrainSample(diffusion, autoencoder)

Bases: tuple

autoencoder: Alias for field number 1

diffusion: Alias for field number 0