speechbrain.lobes.models.L2I module

This file implements the classes and functions needed for the Listen-to-Interpret (L2I) interpretation method from https://arxiv.org/abs/2202.11479v2

Authors: Cem Subakan 2022, Francesco Paissan 2022

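Summary

Classes

`CNN14PSI_stft`	This class estimates a saliency map in the STFT domain, given classifier representations.
`CNN14PSI_stft_2d`	This class estimates the NMF activations used to create a saliency map with the L2I framework.
`NMFDecoderAudio`	This class implements an NMF decoder.
`NMFEncoder`	This class implements an NMF encoder with a convolutional network.
`Psi`	Convolutional layers to estimate NMF activations from classifier representations.
`PsiOptimized`	Convolutional layers to estimate NMF activations from classifier representations, optimized for log-spectra.
`Theta`	This class implements a linear classifier on top of NMF activations.

Functions

`weights_init`	Applies Xavier initialization to network weights.

Reference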

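class speechbrain.lobes.models.L2I.Psi(n_comp=100, T=431, in_emb_dims=[2048, 1024, 512])

Bases: Module

Convolutional layers to estimate NMF activations from classifier representations.

Parameters:

n_comp (int) – Number of NMF components (or equivalently, the number of output neurons at each time step).
T (int) – The target length along the time dimension.
in_emb_dims (list of int) – A length-3 list giving the channel dimension of each input representation. It must match the number of channels in the input classifier representations, and the last entry should be the smallest.

Example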

>>> inp = [torch.ones(2, 150, 6, 2), torch.ones(2, 100, 6, 2), torch.ones(2, 50, 12, 5)]
>>> psi = Psi(n_comp=100, T=120, in_emb_dims=[150, 100, 50])
>>> h = psi(inp)
>>> print(h.shape)
torch.Size([2, 100, 120])

forward(inp)

Returns the NMF time activations given the classifier activations.

Parameters: inp (list) – A length-3 list of classifier input representations.
Returns: The NMF time activations.
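The constraints on in_emb_dims can be checked before a Psi module is built. The following is a minimal, dependency-free sketch; check_in_emb_dims is a hypothetical helper written for illustration and is not part of SpeechBrain.

```python
def check_in_emb_dims(in_emb_dims, channel_counts):
    """Hypothetical helper: validate the documented constraints on in_emb_dims.

    in_emb_dims    : the list that would be passed to Psi
    channel_counts : channel dimension (axis 1) of each classifier representation
    """
    if len(in_emb_dims) != 3:
        raise ValueError("in_emb_dims must have exactly 3 entries")
    if list(in_emb_dims) != list(channel_counts):
        raise ValueError("in_emb_dims must match the channel counts of the inputs")
    if in_emb_dims[-1] != min(in_emb_dims):
        raise ValueError("the last entry of in_emb_dims should be the smallest")
    return True


# Channel counts taken from the example above, whose inputs have shapes
# (2, 150, 6, 2), (2, 100, 6, 2) and (2, 50, 12, 5).
ok = check_in_emb_dims([150, 100, 50], channel_counts=[150, 100, 50])
print(ok)
```

Whether Psi itself performs such checks is not stated here, so validating the list up front is a defensive choice rather than a requirement.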

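class speechbrain.lobes.models.L2I.NMFDecoderAudio(n_comp=100, n_freq=513, device='cuda')

Bases: Module

This class implements an NMF decoder.

Parameters:

n_comp (int) – Number of NMF components.
n_freq (int) – Number of frequency bins in the NMF dictionary.
device (str) – The device on which to run the model.

Example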

>>> NMF_dec = NMFDecoderAudio(20, 210, device='cpu')
>>> H = torch.rand(1, 20, 150)
>>> Xhat = NMF_dec.forward(H)
>>> print(Xhat.shape)
torch.Size([1, 210, 150])

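forward(H)

The forward pass for NMF given the activations H.

Parameters:

H (torch.Tensor) – The activation tensor of shape B x n_comp x T, where B = batch size, n_comp = number of NMF components, and T = number of time points.

Returns:

output – The NMF outputs.

Return type:

torch.Tensor

return_W(): Returns the NMF dictionary.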
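To make the shapes concrete, NMF decoding can be read as a product of a dictionary W (n_freq x n_comp) with the activations H (n_comp x T). The sketch below uses plain Python lists for a single, unbatched example. The name nmf_decode and the clamping of both factors to non-negative values are assumptions of this sketch, not a statement of how NMFDecoderAudio is implemented.

```python
def relu(v):
    """Clamp a scalar to be non-negative."""
    return max(v, 0.0)


def nmf_decode(W, H):
    """Hypothetical sketch: Xhat = relu(W) @ relu(H) for one example.

    W : n_freq x n_comp dictionary (nested lists)
    H : n_comp x T activations (nested lists)
    Returns an n_freq x T nested list.
    """
    n_freq, n_comp, n_time = len(W), len(H), len(H[0])
    return [
        [
            sum(relu(W[f][k]) * relu(H[k][t]) for k in range(n_comp))
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]


W = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]           # n_freq=3, n_comp=2
H = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, -1.0]]  # n_comp=2, T=4
Xhat = nmf_decode(W, H)
print(len(Xhat), len(Xhat[0]))  # 3 4
```

The output has one row per frequency bin and one column per time point, which mirrors the n_freq x T trailing dimensions in the example above.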

speechbrain.lobes.models.L2I.weights_init(m)

Applies Xavier initialization to network weights.

Parameters: m (nn.Module) – The module to initialize.
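For reference, the uniform variant of Xavier (Glorot) initialization draws each weight from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)). The sketch below illustrates that formula with the standard library only. Which variant, gain, and layer types weights_init actually uses is not stated here, so the uniform variant and gain=1.0 are assumptions, and both function names are hypothetical.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a of the uniform distribution U(-a, a) used by Xavier init."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(fan_in, fan_out, gain=1.0, rng=None):
    """Hypothetical sketch: sample a fan_out x fan_in weight matrix."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    bound = xavier_uniform_bound(fan_in, fan_out, gain)
    return [
        [rng.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


bound = xavier_uniform_bound(fan_in=100, fan_out=50)
weights = xavier_uniform(fan_in=100, fan_out=50)
print(len(weights), len(weights[0]))  # 50 100
```

With fan_in=100 and fan_out=50 the bound is sqrt(6 / 150) = 0.2, so every sampled weight lies in [-0.2, 0.2].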

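class speechbrain.lobes.models.L2I.PsiOptimized(dim=128, K=100, numclasses=50, use_adapter=False, adapter_reduce_dim=True)

Bases: Module

Convolutional layers to estimate NMF activations from classifier representations, optimized for log-spectra.

Parameters:

dim (int) – Dimension of the hidden representations (input to the classifier).
K (int) – Number of NMF components (or equivalently, the number of output neurons at each time step).
numclasses (int) – Number of possible classes.
use_adapter (bool) – True to learn an adapter for the latent representations.
adapter_reduce_dim (bool) – True if the adapter should compress the latent representations.

Example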

>>> inp = torch.randn(1, 256, 26, 32)
>>> psi = PsiOptimized(dim=256, K=100, use_adapter=False, adapter_reduce_dim=False)
>>> h, inp_ad= psi(inp)
>>> print(h.shape, inp_ad.shape)
torch.Size([1, 1, 417, 100]) torch.Size([1, 256, 26, 32])

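forward(hs)

Computes the forward step.

Parameters: hs (torch.Tensor) – Latent representations (input to the classifier). Expected shape `torch.Size([B, C, H, W])`.
Returns: The NMF activations, of shape `torch.Size([B, 1, T, 100])`, and the adapted representations.
Return type: torch.Tensor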

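class speechbrain.lobes.models.L2I.Theta(n_comp=100, T=431, num_classes=50)

Bases: Module

This class implements a linear classifier on top of NMF activations.

Parameters:

n_comp (int) – Number of NMF components.
T (int) – Number of time points in the NMF activations.
num_classes (int) – Number of classes that the classifier works with.

Example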

>>> theta = Theta(30, 120, 50)
>>> H = torch.rand(1, 30, 120)
>>> c_hat = theta.forward(H)
>>> print(c_hat.shape)
torch.Size([1, 50])

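forward(H)

We first collapse the time axis, and then pass the result through the linear layer.

Parameters:

H (torch.Tensor) – The activation tensor of shape B x n_comp x T, where B = batch size, n_comp = number of NMF components, and T = number of time points.

Returns:

theta_out – The classifier output.

Return type:

torch.Tensor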
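The two steps described for this forward pass (collapse the time axis, then apply a linear layer) can be sketched for a single example with plain Python lists. Averaging over time is an assumption made here for illustration; Theta may collapse the time axis differently, for instance with learned weights. The name theta_forward is hypothetical.

```python
def theta_forward(H, weight):
    """Hypothetical sketch of a linear classifier on NMF activations.

    H      : n_comp x T activations (nested lists)
    weight : num_classes x n_comp linear weights (nested lists)
    Returns a list with one score per class.
    """
    # Step 1: collapse the time axis (mean over T is an assumption).
    pooled = [sum(row) / len(row) for row in H]
    # Step 2: linear layer without bias, one dot product per class.
    return [sum(w * p for w, p in zip(w_row, pooled)) for w_row in weight]


H = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]     # n_comp=3, T=2
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # num_classes=2
c_hat = theta_forward(H, weight)
print(len(c_hat))  # 2
```

The result has one entry per class, matching the trailing dimension of the output in the example above.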

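class speechbrain.lobes.models.L2I.NMFEncoder(n_freq, n_comp)

Bases: Module

This class implements an NMF encoder with a convolutional network.

Parameters:

n_freq (int) – Number of frequency bins in the NMF dictionary.
n_comp (int) – Number of NMF components.

Example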

>>> nmfencoder = NMFEncoder(513, 100)
>>> X = torch.rand(1, 513, 240)
>>> Hhat = nmfencoder(X)
>>> print(Hhat.shape)
torch.Size([1, 100, 240])

forward(X)

Parameters:

X (torch.Tensor) – The input spectrogram tensor of shape B x n_freq x T, where B = batch size, n_freq = nfft for the input spectrogram, and T = number of time points.

Returns:

The NMF encoded outputs.

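class speechbrain.lobes.models.L2I.CNN14PSI_stft(dim=128, K=100)

Bases: Module

This class estimates a saliency map in the STFT domain, given classifier representations.

Parameters:

dim (int) – Dimensionality of the input representations.
K (int) – Defines the number of output channels in the saliency map.

Example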

>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft(2048, 20)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 20, 207])

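forward(hs, labels=None)

Forward step. Estimates the NMF activations used to obtain the saliency mask.

Parameters:

hs (torch.Tensor) – The classifier's representations.
labels (torch.Tensor) – Predicted labels for the classifier's representations.

Returns:

xhat – The estimated NMF activation coefficients.

Return type:

torch.Tensor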

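class speechbrain.lobes.models.L2I.CNN14PSI_stft_2d(dim=128, K=100)

Bases: Module

This class estimates the NMF activations used to create a saliency map with the L2I framework.

Parameters:

dim (int) – Dimensionality of the input representations.
K (int) – Defines the number of output channels in the saliency map.

Example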

>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft_2d(2048, 20)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 20, 207])

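forward(hs, labels=None)

Forward step. Estimates the NMF activations used to obtain the saliency mask.

Parameters:

hs (torch.Tensor) – The classifier's representations.
labels (torch.Tensor) – Predicted labels for the classifier's representations.

Returns:

xhat – The estimated NMF activation coefficients.

Return type:

torch.Tensor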