speechbrain.lobes.models.Cnn14 module

This file implements the CNN14 model from https://arxiv.org/abs/1912.10211.

Authors * Cem Subakan 2022 * Francesco Paissan 2022

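Summary

Classes

`CNN14PSI`	Estimates a saliency mask in the mel domain.
`CNN14PSI_stft`	Estimates a saliency map in the STFT domain from the classifier representations.
`Cnn14`	Implements the Cnn14 model from https://arxiv.org/abs/1912.10211.
`ConvBlock`	Implements the convolutional block used in CNN14.

Functions

`init_bn`	Initializes a Batchnorm layer.
`init_layer`	Initializes a Linear or Convolutional layer.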

Reference

speechbrain.lobes.models.Cnn14.init_layer(layer)[source]: Initializes a Linear or Convolutional layer.

speechbrain.lobes.models.Cnn14.init_bn(bn)[source]: Initializes a Batchnorm layer.

class speechbrain.lobes.models.Cnn14.ConvBlock(in_channels, out_channels, norm_type)[source]

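Bases: Module

This class implements the convolutional block used in CNN14.

Parameters:

in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
norm_type (str in ['bn', 'in', 'ln']) – Type of normalization

Example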

>>> convblock = ConvBlock(10, 20, 'ln')
>>> x = torch.rand(5, 10, 20, 30)
>>> y = convblock(x)
>>> print(y.shape)
torch.Size([5, 20, 10, 15])

init_weight()[source]: Initializes the convolutional and batchnorm layers of the model.

forward(x, pool_size=(2, 2), pool_type='avg')[source]

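Forward pass of the convolutional block in CNN14.

Parameters:

x (torch.Tensor) – Input tensor of shape B x C_in x D1 x D2, where B = batch size, C_in = number of input channels, D1 = first spatial dimension, D2 = second spatial dimension
pool_size (tuple with integer values) – Pooling size at each layer
pool_type (str in ['max', 'avg', 'avg+max']) – Type of pooling

Returns:

The output of one convolutional block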
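
The shape change in the class example, from (5, 10, 20, 30) to (5, 20, 10, 15), can be reproduced with simple arithmetic. The following is a minimal sketch in plain Python, so it runs without PyTorch. It assumes that the convolutions leave D1 and D2 unchanged and that pooling divides each spatial dimension by the matching `pool_size` entry, rounding down. `conv_block_output_shape` is a hypothetical helper for illustration and is not part of SpeechBrain.

```python
def conv_block_output_shape(input_shape, out_channels, pool_size=(2, 2)):
    """Predict the output shape of a ConvBlock for a (B, C_in, D1, D2) input.

    Assumptions (not confirmed by this reference): the convolutions keep
    D1 and D2 unchanged, and pooling floors D1 / pool_size[0] and
    D2 / pool_size[1].
    """
    batch, _, d1, d2 = input_shape
    return (batch, out_channels, d1 // pool_size[0], d2 // pool_size[1])


# Same sizes as the ConvBlock(10, 20, 'ln') example above.
print(conv_block_output_shape((5, 10, 20, 30), out_channels=20))  # (5, 20, 10, 15)
```

Under these assumptions, a larger `pool_size` shrinks the spatial dimensions faster, while the channel count depends only on `out_channels`.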

class speechbrain.lobes.models.Cnn14.Cnn14(mel_bins, emb_dim, norm_type='bn', return_reps=False, l2i=False)[source]

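Bases: Module

This class implements the Cnn14 model from https://arxiv.org/abs/1912.10211.

Parameters:

mel_bins (int) – Number of mel frequency bins in the input
emb_dim (int) – Dimension of the output embeddings
norm_type (str in ['bn', 'in', 'ln']) – Type of normalization
return_reps (bool (default=False)) – If True, the model also returns intermediate representations for interpretation
l2i (bool) – If True, removes one of the outputs.

Example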

>>> cnn14 = Cnn14(120, 256)
>>> x = torch.rand(3, 400, 120)
>>> h = cnn14.forward(x)
>>> print(h.shape)
torch.Size([3, 1, 256])

init_weight()[source]: Initializes the batch norm layers of the model.

forward(x)[source]

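Forward pass of the CNN14 encoder.

Parameters:

x (torch.Tensor) – Input tensor of shape B x C_in x D1 x D2, where B = batch size, C_in = number of input channels, D1 = first spatial dimension, D2 = second spatial dimension. Note that the class example passes a 3-D tensor of shape (3, 400, 120), whose last dimension equals mel_bins.

Returns:

The output of the CNN14 encoder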

class speechbrain.lobes.models.Cnn14.CNN14PSI(dim=128)[source]

Bases: Module

This class estimates a saliency mask in the mel domain.

Parameters:

dim (int) – Dimension of the embeddings

Example

>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI(2048)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 1, 201, 80])

forward(hs, labels=None)[source]

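Forward step. Estimates the saliency map from the classifier representations.

Parameters:

hs (torch.Tensor) – Classifier representations.
labels (None) – Unused.

Returns:

xhat – Estimated saliency map (before the sigmoid)

Return type:

torch.Tensor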
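
Because `xhat` is returned before the sigmoid, its values are unbounded and need to be squashed into (0, 1) before they can act as a mask. The sketch below illustrates that step with nested Python lists standing in for tensors. How the mask is used downstream is not stated in this reference, so the element-wise multiplication with the mel spectrogram is an assumption, and `apply_saliency_mask` is a hypothetical helper.

```python
import math


def sigmoid(value):
    """Numerically stable logistic function for a single float."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


def apply_saliency_mask(xhat, mel):
    """Turn pre-sigmoid saliency values into a (0, 1) mask and apply it.

    Assumption: the mask multiplies the mel spectrogram element-wise.
    Both arguments are 2-D lists of floats with the same shape.
    """
    return [
        [sigmoid(s) * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(xhat, mel)
    ]


xhat = [[0.0, 4.0], [-4.0, 0.0]]  # toy pre-sigmoid saliency values
mel = [[1.0, 2.0], [3.0, 4.0]]    # toy mel-spectrogram magnitudes
masked = apply_saliency_mask(xhat, mel)
print(masked[0][0], masked[1][1])  # 0.5 2.0
```

With real tensors the same operation would be `torch.sigmoid(xhat) * mel`, provided the two tensors have broadcast-compatible shapes.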

class speechbrain.lobes.models.Cnn14.CNN14PSI_stft(dim=128, outdim=1)[source]

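Bases: Module

This class estimates a saliency map in the STFT domain from the classifier representations.

Parameters:

dim (int) – Dimension of the input representations.
outdim (int) – Number of output channels in the saliency map.

Example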

>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft(2048, 1)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 1, 201, 513])

forward(hs)[source]

Forward step to estimate the saliency map.

Parameters:

hs (torch.Tensor) – Classifier representations.

Returns:

xhat – Estimate of the saliency map

Return type:

torch.Tensor
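
The last dimension of 513 in the class example matches the number of frequency bins in a one-sided STFT computed with a 1024-point FFT. That FFT size is an inference from the output shape and is not stated in this reference. The sketch below shows the relationship, with `onesided_stft_bins` as a hypothetical helper.

```python
def onesided_stft_bins(n_fft):
    """Number of frequency bins kept by a one-sided STFT of size n_fft."""
    return n_fft // 2 + 1


# Assumption: n_fft = 1024, chosen because it reproduces the 513 bins
# seen in the CNN14PSI_stft example output shape (2, 1, 201, 513).
print(onesided_stft_bins(1024))  # 513
```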