speechbrain.nnet.losses module

Losses for training neural networks.

Authors

Mirco Ravanelli 2020
Samuele Cornell 2020
Hwidong Na 2020
Yan Gao 2020
Titouan Parcollet 2020

Summary

Classes

`AdditiveAngularMargin`	An implementation of Additive Angular Margin (AAM), proposed in the paper '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)
`AngularMargin`	An implementation of Angular Margin (AM), proposed in the paper '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)
`AutoencoderLoss`	An implementation of a standard (non-variational) autoencoder loss
`AutoencoderLossDetails`	AutoencoderLossDetails(loss, rec_loss)
`ContrastiveLoss`	Contrastive loss as used in wav2vec2.
`Laplacian`	Computes the Laplacian of image-like data
`LaplacianVarianceLoss`	The Laplacian variance loss, used to penalize blurriness in image-like data such as spectrograms.
`LogSoftmaxWrapper`
`PitWrapper`	Permutation-invariant wrapper that allows Permutation Invariant Training (PIT) with existing losses.
`VariationalAutoencoderLoss`	The variational autoencoder loss, with support for length masking
`VariationalAutoencoderLossDetails`	VariationalAutoencoderLossDetails(loss, rec_loss, dist_loss, weighted_dist_loss)

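Functions

`bce_loss`	Computes the binary cross-entropy (BCE) loss.
`cal_si_snr`	Calculates the SI-SNR.
`cal_snr`	Calculates the binaural SNR.
`ce_kd`	A simple version of distillation for the cross-entropy loss.
`classification_error`	Computes the classification error at the frame or batch level.
`compute_length_mask`	Computes a length mask for the specified data shape
`compute_masked_loss`	Computes the true average loss of a set of waveforms of unequal length.
`ctc_loss`	CTC loss.
`ctc_loss_kd`	Knowledge distillation for the CTC loss.
`distance_diff_loss`	A loss function for cases where a model outputs an arbitrary probability distribution over a discrete variable on an interval scale (such as the length of a sequence), while the ground truth is the precise value of that variable in a data sample.
`get_mask`
`get_si_snr_with_pitwrapper`	Wraps the si_snr computation with the speechbrain pit-wrapper.
`get_snr_with_pitwrapper`	Wraps the snr computation with the speechbrain pit-wrapper.
`kldiv_loss`	Computes the KL-divergence error at the batch level.
`l1_loss`	Computes the true L1 loss, accounting for length differences.
`mse_loss`	Computes the true mean squared error, accounting for length differences.
`nll_loss`	Computes the negative log-likelihood loss.
`nll_loss_kd`	Knowledge distillation for the negative log-likelihood loss.
`reduce_loss`	Performs the specified reduction of the raw loss value
`transducer_loss`	Transducer loss; see `speechbrain/nnet/loss/transducer_loss.py`.
`truncate`	Ensures that predictions and targets are the same length.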

Reference

speechbrain.nnet.losses.transducer_loss(logits, targets, input_lens, target_lens, blank_index, reduction='mean', use_torchaudio=True)[source]

Transducer loss; see speechbrain/nnet/loss/transducer_loss.py.

Parameters:

logits (torch.Tensor) – Predicted tensor, of shape [batch, maxT, maxU, num_labels].
targets (torch.Tensor) – Target tensor, without any blanks, of shape [batch, target_len].
input_lens (torch.Tensor) – Length of each utterance.
target_lens (torch.Tensor) – Length of each target sequence.
blank_index (int) – The location of the blank symbol among the label indices.
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ | ‘batchmean’ | ‘sum’.
use_torchaudio (bool) – If True, use the Transducer loss implementation from torchaudio; otherwise, use the SpeechBrain Numba implementation.

Return type:

The computed Transducer loss.

class speechbrain.nnet.losses.PitWrapper(base_loss)[source]

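Bases: Module

Permutation-invariant wrapper that allows Permutation Invariant Training (PIT) with existing losses.

Permutation invariance is computed over the sources/classes axis, which is assumed to be the rightmost dimension: the predictions and targets tensors are assumed to have shape [batch, …, channels, sources].

Parameters:: base_loss (function) – Base loss function, e.g., torch.nn.MSELoss. It is assumed to take two arguments, predictions and targets, and to perform no reduction. (If a PyTorch loss is used, the user must specify reduction="none".)

Example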

>>> pit_mse = PitWrapper(nn.MSELoss(reduction="none"))
>>> targets = torch.rand((2, 32, 4))
>>> p = (3, 0, 2, 1)
>>> predictions = targets[..., p]
>>> loss, opt_p = pit_mse(predictions, targets)
>>> loss
tensor([0., 0.])

reorder_tensor(tensor, p)[source]

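Parameters:

tensor (torch.Tensor) – Tensor to reorder according to the optimal permutation, of shape [batch, …, sources].
p (list of tuples) – List of optimal permutations, e.g., for batch=2 and n_sources=3, [(0, 1, 2), (0, 2, 1)].

Returns:

reordered – The tensor reordered according to the permutation p.

Return type: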

torch.Tensor

forward(preds, targets)[source]

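Parameters:

preds (torch.Tensor) – Network predictions tensor, of shape [batch, channels, …, sources].
targets (torch.Tensor) – Target tensor, of shape [batch, channels, …, sources].

Returns:

loss (torch.Tensor) – Permutation-invariant loss for the current examples, a tensor of shape [batch].
perms (list) – List of indices of the optimal permutation of the inputs over the sources, e.g., for three sources and two examples per batch, [(0, 1, 2), (2, 1, 0)].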
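The permutation search behind PIT can be illustrated with a minimal pure-Python sketch. It is not SpeechBrain's implementation: it handles a single example, represents each source as a plain list, and uses the convention (chosen for this sketch only) that `perm[i]` is the index of the prediction assigned to target `i`. The names `mse` and `pit_loss` are hypothetical.

```python
from itertools import permutations


def mse(pred, target):
    """Mean squared error between two equal-length lists (one source)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pit_loss(preds, targets, base_loss=mse):
    """Return (best_loss, best_perm) over all assignments of predictions to targets.

    preds, targets: lists of sources, each source a list of floats.
    best_perm[i] is the index of the prediction matched to targets[i].
    """
    n_sources = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # Average the base loss over sources for this candidate assignment.
        loss = sum(base_loss(preds[p], targets[i]) for i, p in enumerate(perm)) / n_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


targets = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
preds = [targets[2], targets[0], targets[1]]  # the same sources, shuffled
print(pit_loss(preds, targets))  # (0.0, (1, 2, 0))
```

Enumerating all n! permutations is only practical for a small number of sources, which is the usual setting for speech separation.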

speechbrain.nnet.losses.ctc_loss(log_probs, targets, input_lens, target_lens, blank_index, reduction='mean')[source]

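CTC loss.

Parameters:

log_probs (torch.Tensor) – Predicted tensor, of shape [batch, time, chars].
targets (torch.Tensor) – Target tensor, without any blanks, of shape [batch, target_len].
input_lens (torch.Tensor) – Length of each utterance.
target_lens (torch.Tensor) – Length of each target sequence.
blank_index (int) – The location of the blank symbol among the character indices.
reduction (str) – The reduction to apply to the output: ‘mean’, ‘sum’, ‘batch’, ‘batchmean’, ‘none’. See PyTorch for ‘mean’, ‘sum’, ‘none’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.

Return type:

The computed CTC loss.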
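To make the quantity behind a CTC loss concrete, the sketch below computes the CTC negative log-likelihood of one label sequence with the standard forward algorithm and checks it against brute-force enumeration of all frame-level paths. It is an illustration only: it works with plain probabilities rather than the `log_probs` tensor that ctc_loss takes, handles a single utterance, and assumes the input is long enough to emit the labels. Production implementations work in log space to avoid underflow. Both function names are hypothetical.

```python
import math
from itertools import product


def ctc_nll(probs, labels, blank=0):
    """CTC negative log-likelihood via the forward algorithm.

    probs: T x V nested list of per-frame probabilities (rows sum to 1).
    labels: target sequence without blanks.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[s]: probability of all partial paths ending at ext[s] in the current frame
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # move to the next symbol
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood)


def ctc_nll_brute_force(probs, labels, blank=0):
    """Reference: sum over every frame-level path that collapses to `labels`."""
    T, V = len(probs), len(probs[0])
    likelihood = 0.0
    for path in product(range(V), repeat=T):
        collapsed, prev = [], None
        for sym in path:  # merge repeats, then drop blanks
            if sym != prev and sym != blank:
                collapsed.append(sym)
            prev = sym
        if collapsed == list(labels):
            p = 1.0
            for t, sym in enumerate(path):
                p *= probs[t][sym]
            likelihood += p
    return -math.log(likelihood)


probs = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
print(ctc_nll(probs, [1, 2]), ctc_nll_brute_force(probs, [1, 2]))
```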

speechbrain.nnet.losses.l1_loss(predictions, targets, length=None, allowed_len_diff=3, reduction='mean')[source]

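Computes the true L1 loss, accounting for length differences.

Parameters:

predictions (torch.Tensor) – Predicted tensor, of shape [batch, time, *].
targets (torch.Tensor) – Target tensor, of the same size as the predicted tensor.
length (torch.Tensor) – Length of each utterance, used to compute the true error with a mask.
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.

Return type:

The computed L1 loss.

Example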

>>> probs = torch.tensor([[0.9, 0.1, 0.1, 0.9]])
>>> l1_loss(probs, torch.tensor([[1., 0., 0., 1.]]))
tensor(0.1000)

speechbrain.nnet.losses.mse_loss(predictions, targets, length=None, allowed_len_diff=3, reduction='mean')[source]

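Computes the true mean squared error, accounting for length differences.

Parameters:

predictions (torch.Tensor) – Predicted tensor, of shape [batch, time, *].
targets (torch.Tensor) – Target tensor, of the same size as the predicted tensor.
length (torch.Tensor) – Length of each utterance, used to compute the true error with a mask.
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.

Return type:

The computed MSE loss.

Example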

>>> probs = torch.tensor([[0.9, 0.1, 0.1, 0.9]])
>>> mse_loss(probs, torch.tensor([[1., 0., 0., 1.]]))
tensor(0.0100)

speechbrain.nnet.losses.classification_error(probabilities, targets, length=None, allowed_len_diff=3, reduction='mean')[source]

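Computes the classification error at the frame or batch level.

Parameters:

probabilities (torch.Tensor) – The posterior probabilities, of shape [batch, prob] or [batch, frames, prob].
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.

Return type:

The computed classification error.

Example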

>>> probs = torch.tensor([[[0.9, 0.1], [0.1, 0.9]]])
>>> classification_error(probs, torch.tensor([1, 1]))
tensor(0.5000)

speechbrain.nnet.losses.nll_loss(log_probabilities, targets, length=None, label_smoothing=0.0, allowed_len_diff=3, weight=None, reduction='mean')[source]

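Computes the negative log-likelihood loss.

Parameters:

log_probabilities (torch.Tensor) – The probabilities after applying log. Format is [batch, log_p] or [batch, frames, log_p].
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
label_smoothing (float) – The amount of smoothing to apply to the labels (default 0.0, no smoothing).
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.
weight (torch.Tensor) – A manual rescaling weight given to each class. If given, it must be a tensor of size C.
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.

Return type:

The computed NLL loss.

Example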

>>> probs = torch.tensor([[0.9, 0.1], [0.1, 0.9]])
>>> nll_loss(torch.log(probs), torch.tensor([1, 1]))
tensor(1.2040)

speechbrain.nnet.losses.bce_loss(inputs, targets, length=None, weight=None, pos_weight=None, reduction='mean', allowed_len_diff=3, label_smoothing=0.0)[source]

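Computes the binary cross-entropy (BCE) loss. It also applies the sigmoid function directly (this improves numerical stability).

Parameters:

inputs (torch.Tensor) – The output before applying the final sigmoid (i.e., the logits). Format is [batch[, 1]?] or [batch, frames[, 1]?]. (Works with or without a singleton dimension at the end.)
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
weight (torch.Tensor) – A manual rescaling weight; if provided, it is repeated to match the shape of the input tensor.
pos_weight (torch.Tensor) – A weight for positive examples. Must be a vector whose length equals the number of classes.
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.
label_smoothing (float) – The amount of smoothing to apply to the labels (default 0.0, no smoothing).

Return type:

The computed BCE loss.

Example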

>>> inputs = torch.tensor([10.0, -6.0])
>>> targets = torch.tensor([1, 0])
>>> bce_loss(inputs, targets)
tensor(0.0013)
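Applying the sigmoid inside the loss allows the standard numerically stable rewrite of binary cross-entropy on logits, max(x, 0) - x*y + log(1 + exp(-|x|)), in which exp() can never overflow. The plain-Python sketch below uses that textbook formulation (it is not code taken from SpeechBrain, and it ignores `weight`, `pos_weight`, masking and label smoothing); with a plain mean it reproduces the value in the bce_loss example. The function name is hypothetical.

```python
import math


def bce_with_logits(logit, target):
    """BCE on a raw logit, equal to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


inputs = [10.0, -6.0]
targets = [1.0, 0.0]
loss = sum(bce_with_logits(x, y) for x, y in zip(inputs, targets)) / len(inputs)
print(round(loss, 4))  # 0.0013

# A naive sigmoid-then-log version would hit log(0) here; the rewrite stays finite.
print(bce_with_logits(1000.0, 0.0))  # 1000.0
```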

speechbrain.nnet.losses.kldiv_loss(log_probabilities, targets, length=None, label_smoothing=0.0, allowed_len_diff=3, pad_idx=0, reduction='mean')[source]

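Computes the KL-divergence error at the batch level. This loss applies label smoothing directly to the targets.

Parameters:

log_probabilities (torch.Tensor) – The log posterior probabilities, of shape [batch, prob] or [batch, frames, prob].
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
label_smoothing (float) – The amount of smoothing to apply to the labels (default 0.0, no smoothing).
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.
pad_idx (int) – Entries with this value are treated as padding.
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size.

Return type:

The computed KL-divergence loss.

Example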

>>> probs = torch.tensor([[0.9, 0.1], [0.1, 0.9]])
>>> kldiv_loss(torch.log(probs), torch.tensor([1, 1]))
tensor(1.2040)

speechbrain.nnet.losses.distance_diff_loss(predictions, targets, length=None, beta=0.25, max_weight=100.0, reduction='mean')[source]

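A loss function for cases where a model outputs an arbitrary probability distribution over a discrete variable on an interval scale (such as the length of a sequence), while the ground truth is the precise value of that variable in a data sample.

The loss is defined as loss_i = p_i * (exp(beta * |i - y|) - 1).

This loss can also be used when the outputs are not probabilities, as long as high values close to the ground-truth position and low values away from it are desired.

Parameters:

predictions (torch.Tensor) – A (batch x max_len) tensor in which each element is a probability, weight or some other value at that position
targets (torch.Tensor) – A 1-D tensor in which each element is the ground truth
length (torch.Tensor) – The lengths (for masking in padded batches)
beta (torch.Tensor) – A hyperparameter controlling the penalties. With a higher beta, penalties increase faster
max_weight (torch.Tensor) – The maximum distance weight (for numerical stability in long sequences)
reduction (str) – Options are ‘mean’, ‘batch’, ‘batchmean’, ‘sum’. See PyTorch for ‘mean’, ‘sum’. The ‘batch’ option returns the loss of each item in the batch; ‘batchmean’ returns the sum divided by the batch size

Return type:

The masked loss.

Example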

>>> predictions = torch.tensor(
...    [[0.25, 0.5, 0.25, 0.0],
...     [0.05, 0.05, 0.9, 0.0],
...     [8.0, 0.10, 0.05, 0.05]]
... )
>>> targets = torch.tensor([2., 3., 1.])
>>> length = torch.tensor([.75, .75, 1.])
>>> loss = distance_diff_loss(predictions, targets, length)
>>> loss
tensor(0.2967)
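The value in the example above can be reproduced by hand. The plain-Python sketch below assumes two details that this page does not state explicitly: that a relative `length` is turned into a number of valid positions by rounding `length * max_len`, and that the ‘mean’ reduction divides the masked sum by the number of valid positions. Under those assumptions it yields 0.2967; it is an illustration, not the library's code, and the function name is hypothetical.

```python
import math


def distance_diff_loss_sketch(predictions, targets, length, beta=0.25, max_weight=100.0):
    """Masked mean of p_i * (exp(beta * |i - y|) - 1) over the valid positions."""
    total, count = 0.0, 0
    for row, y, rel_len in zip(predictions, targets, length):
        n_valid = round(rel_len * len(row))  # relative length -> number of positions
        for i, p in enumerate(row[:n_valid]):
            # The distance weight is 0 at the target and grows exponentially away from it.
            weight = min(math.exp(beta * abs(i - y)) - 1.0, max_weight)
            total += p * weight
            count += 1
    return total / count


predictions = [
    [0.25, 0.5, 0.25, 0.0],
    [0.05, 0.05, 0.9, 0.0],
    [8.0, 0.10, 0.05, 0.05],
]
targets = [2.0, 3.0, 1.0]
length = [0.75, 0.75, 1.0]
print(round(distance_diff_loss_sketch(predictions, targets, length), 4))  # 0.2967
```

Because the weight is exactly 0 at the target position, a distribution that puts all its mass on the ground truth gets a loss of 0.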

speechbrain.nnet.losses.truncate(predictions, targets, allowed_len_diff=3)[source]

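Ensures that predictions and targets are the same length.

Parameters:

predictions (torch.Tensor) – First tensor for checking length.
targets (torch.Tensor) – Second tensor for checking length.
allowed_len_diff (int) – Length difference that is tolerated before an exception is raised.

Returns:

predictions (torch.Tensor)
targets (torch.Tensor) – The inputs, truncated so that both have the same length.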
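The contract described above (trim the longer input, but fail when the mismatch is too large) can be sketched on plain 1-D lists. In the library the length presumably refers to the time dimension of batched tensors, and the exception type used here (ValueError) is an assumption; the function name is hypothetical.

```python
def truncate_sketch(predictions, targets, allowed_len_diff=3):
    """Trim the longer sequence so both have the same length.

    Raises ValueError when the lengths differ by more than allowed_len_diff.
    """
    len_diff = len(predictions) - len(targets)
    if abs(len_diff) > allowed_len_diff:
        raise ValueError(
            f"Length difference {abs(len_diff)} exceeds allowed_len_diff={allowed_len_diff}"
        )
    n = min(len(predictions), len(targets))
    return predictions[:n], targets[:n]


print(truncate_sketch([1, 2, 3, 4, 5], [1, 2, 3]))  # ([1, 2, 3], [1, 2, 3])
```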

speechbrain.nnet.losses.compute_masked_loss(loss_fn, predictions, targets, length=None, label_smoothing=0.0, mask_shape='targets', reduction='mean')[source]

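Computes the true average loss of a set of waveforms of unequal length.

Parameters:

loss_fn (function) – A function for computing the loss. It should take only two arguments, predictions and targets, and should return all the individual losses rather than a reduction (e.g., reduction="none").
predictions (torch.Tensor) – First argument to the loss function.
targets (torch.Tensor) – Second argument to the loss function.
length (torch.Tensor) – Length of each utterance, used to compute the mask. If None, the global average is computed and returned.
label_smoothing (float) – The proportion of label smoothing. Should only be used for the NLL loss. Ref: Regularizing Neural Networks by Penalizing Confident Output Distributions. https://arxiv.org/abs/1701.06548
mask_shape (str) – The shape of the mask. The default is "targets", which makes the mask the same shape as the targets. Other options are "predictions" and "loss", which use the shape of the predictions and of the unreduced loss, respectively; these are useful for loss functions whose outputs do not match the shape of the targets.
reduction (str) – One of ‘mean’, ‘batch’, ‘batchmean’, ‘none’, where ‘mean’ returns a single value, ‘batch’ returns one value per item in the batch, ‘batchmean’ is the sum divided by batch_size, and ‘none’ returns all values.

Return type:

The masked loss.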

speechbrain.nnet.losses.compute_length_mask(data, length=None, len_dim=1)[source]

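Computes a length mask for the specified data shape

Parameters:

data (torch.Tensor) – The data whose shape determines the mask
length (torch.Tensor) – The lengths of the corresponding data samples
len_dim (int) – The length dimension (defaults to 1)

Returns:

mask – The mask

Return type:

torch.Tensor

Example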

>>> data = torch.arange(5)[None, :, None].repeat(3, 1, 2)
>>> data += torch.arange(1, 4)[:, None, None]
>>> data *= torch.arange(1, 3)[None, None, :]
>>> data
tensor([[[ 1,  2],
         [ 2,  4],
         [ 3,  6],
         [ 4,  8],
         [ 5, 10]],

        [[ 2,  4],
         [ 3,  6],
         [ 4,  8],
         [ 5, 10],
         [ 6, 12]],

        [[ 3,  6],
         [ 4,  8],
         [ 5, 10],
         [ 6, 12],
         [ 7, 14]]])
>>> compute_length_mask(data, torch.tensor([1., .4, .8]))
tensor([[[1, 1],
         [1, 1],
         [1, 1],
         [1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1],
         [0, 0],
         [0, 0],
         [0, 0]],

        [[1, 1],
         [1, 1],
         [1, 1],
         [1, 1],
         [0, 0]]])
>>> compute_length_mask(data, torch.tensor([.5, 1., .5]), len_dim=2)
tensor([[[1, 0],
         [1, 0],
         [1, 0],
         [1, 0],
         [1, 0]],

        [[1, 1],
         [1, 1],
         [1, 1],
         [1, 1],
         [1, 1]],

        [[1, 0],
         [1, 0],
         [1, 0],
         [1, 0],
         [1, 0]]])
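Along the length dimension, the masks above mark the first round(length × size) positions of each sample as valid; the library then repeats the mask across the remaining dimensions. A minimal plain-Python sketch of the per-sample part follows. It assumes rounding to the nearest integer (Python's round() rounds exact halves to even); how the library treats such ties is not documented here. The function name is hypothetical.

```python
def length_mask_sketch(max_len, rel_lengths):
    """One 0/1 row per batch item: 1 for the first round(rel * max_len) positions."""
    mask = []
    for rel in rel_lengths:
        n_valid = round(rel * max_len)
        mask.append([1 if i < n_valid else 0 for i in range(max_len)])
    return mask


# Same relative lengths as the first example above (length dimension of size 5).
print(length_mask_sketch(5, [1.0, 0.4, 0.8]))
# [[1, 1, 1, 1, 1], [1, 1, 0, 0, 0], [1, 1, 1, 1, 0]]
```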

speechbrain.nnet.losses.reduce_loss(loss, mask, reduction='mean', label_smoothing=0.0, predictions=None, targets=None)[source]

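Performs the specified reduction of the raw loss value

Parameters:

loss (torch.Tensor) – The raw, unreduced loss values (e.g., as returned by a loss function called with reduction="none").
mask (torch.Tensor) – Mask to apply before computing the loss.
reduction (str) – One of ‘mean’, ‘batch’, ‘batchmean’, ‘none’, where ‘mean’ returns a single value, ‘batch’ returns one value per item in the batch, ‘batchmean’ is the sum divided by batch_size, and ‘none’ returns all values.
label_smoothing (float) – The proportion of label smoothing. Should only be used for the NLL loss. Ref: Regularizing Neural Networks by Penalizing Confident Output Distributions. https://arxiv.org/abs/1701.06548
predictions (torch.Tensor) – First argument to the loss function. Only required when label smoothing is applied.
targets (torch.Tensor) – Second argument to the loss function. Only required when label smoothing is applied.

Return type:

The reduced loss.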
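The reduction options can be made concrete on a small [batch][time] grid. The sketch below assumes that ‘mean’ divides the masked sum by the number of valid positions, ‘batchmean’ divides it by the batch size, and ‘batch’ averages each item over its own valid positions; label smoothing is not modelled. It is plain Python rather than tensors, and the function name is hypothetical.

```python
def reduce_sketch(loss, mask, reduction="mean"):
    """Reduce a [batch][time] loss grid after zeroing out masked positions."""
    masked = [[l * m for l, m in zip(lrow, mrow)] for lrow, mrow in zip(loss, mask)]
    batch_size = len(loss)
    if reduction == "mean":  # single value: average over all valid positions
        return sum(map(sum, masked)) / sum(map(sum, mask))
    if reduction == "batchmean":  # single value: masked total / batch size
        return sum(map(sum, masked)) / batch_size
    if reduction == "batch":  # one value per item: average over its valid positions
        return [sum(row) / sum(mrow) for row, mrow in zip(masked, mask)]
    if reduction == "none":  # every (masked) value
        return masked
    raise ValueError(f"unknown reduction: {reduction}")


loss = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[1, 1, 0], [1, 1, 1]]  # the last frame of the first item is padding
for reduction in ("mean", "batchmean", "batch"):
    print(reduction, reduce_sketch(loss, mask, reduction))
# mean 3.6 / batchmean 9.0 / batch [1.5, 5.0]
```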

speechbrain.nnet.losses.get_si_snr_with_pitwrapper(source, estimate_source)[source]

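This function wraps the si_snr computation with the speechbrain pit-wrapper.

Parameters:

source (torch.Tensor) – Shape [B, T, C], where B is the batch size, T is the length of the sources, and C is the number of sources; this ordering makes the loss compatible with the PitWrapper class.
estimate_source (torch.Tensor) – The estimated sources, of shape [B, T, C].

Returns:

loss – The computed SNR

Return type:

torch.Tensor

Example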

>>> x = torch.arange(600).reshape(3, 100, 2)
>>> xhat = x[:, :, (1, 0)]
>>> si_snr = -get_si_snr_with_pitwrapper(x, xhat)
>>> print(si_snr)
tensor([135.2284, 135.2284, 135.2284])

speechbrain.nnet.losses.get_snr_with_pitwrapper(source, estimate_source)[source]

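This function wraps the snr computation with the speechbrain pit-wrapper.

Parameters:

source (torch.Tensor) – Shape [T, E, B, C], where B is the batch size, T is the length of the sources, E is the binaural channels, and C is the number of sources; this ordering makes the loss compatible with the PitWrapper class.
estimate_source (torch.Tensor) – The estimated sources, of shape [T, E, B, C].

Returns:

loss – The computed SNR

Return type:

torch.Tensor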

speechbrain.nnet.losses.cal_si_snr(source, estimate_source)[source]

Calculates the SI-SNR.

Parameters:

source (torch.Tensor) – Shape [T, B, C], where B is the batch size, T is the length of the sources, and C is the number of sources; this ordering makes the loss compatible with the PitWrapper class.
estimate_source (torch.Tensor) – The estimated sources, of shape [T, B, C].

Returns:

The calculated SI-SNR.

Example

>>> import numpy as np
>>> x = torch.Tensor([[1, 0], [123, 45], [34, 5], [2312, 421]])
>>> xhat = x[:, (1, 0)]
>>> x = x.unsqueeze(-1).repeat(1, 1, 2)
>>> xhat = xhat.unsqueeze(1).repeat(1, 2, 1)
>>> si_snr = -cal_si_snr(x, xhat)
>>> print(si_snr)
tensor([[[ 25.2142, 144.1789],
         [130.9283,  25.2142]]])
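For reference, the commonly used SI-SNR definition (remove the means, project the estimate onto the source, compare the energy of that projection with the energy of the residual) can be sketched in plain Python for one pair of 1-D signals. The `eps` guards follow the usual formulation; SpeechBrain's exact epsilon handling, its batching over [T, B, C] and its sign convention are not reproduced. The doctests negate the return value of cal_si_snr, which suggests the library returns the negative SI-SNR so that it can be minimized. The function name is hypothetical.

```python
import math


def si_snr(source, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    n = len(source)
    # Zero-mean both signals so a DC offset does not affect the score.
    s_mean, e_mean = sum(source) / n, sum(estimate) / n
    s = [v - s_mean for v in source]
    e = [v - e_mean for v in estimate]
    # Projection of the estimate onto the source: (<e, s> / ||s||^2) * s
    scale = sum(a * b for a, b in zip(e, s)) / (sum(a * a for a in s) + eps)
    s_target = [scale * a for a in s]
    noise = [b - t for b, t in zip(e, s_target)]
    ratio = sum(t * t for t in s_target) / (sum(v * v for v in noise) + eps)
    return 10 * math.log10(ratio + eps)


source = [1.0, -2.0, 3.0, -4.0]
estimate = [1.1, -1.9, 2.8, -4.2]
print(si_snr(source, estimate))
print(si_snr(source, [3 * v for v in estimate]))  # (almost) identical: rescaling is ignored
```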

speechbrain.nnet.losses.cal_snr(source, estimate_source)[source]

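Calculates the binaural SNR.

Parameters:

source (torch.Tensor) – Shape [T, E, B, C], where B is the batch size, T is the length of the sources, E is the binaural channels, and C is the number of sources; this ordering makes the loss compatible with the PitWrapper class.
estimate_source (torch.Tensor) – The estimated sources, of shape [T, E, B, C].

Return type:

The binaural SNR.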

speechbrain.nnet.losses.get_mask(source, source_lengths)[source]

Parameters:

source (torch.Tensor) – Shape [T, B, C]
source_lengths (torch.Tensor) – Shape [B]

Returns:

mask – Shape [T, B, 1]

Return type:

torch.Tensor

Example

>>> source = torch.randn(4, 3, 2)
>>> source_lengths = torch.Tensor([2, 1, 4]).int()
>>> mask = get_mask(source, source_lengths)
>>> print(mask)
tensor([[[1.],
         [1.],
         [1.]],

        [[1.],
         [0.],
         [1.]],

        [[0.],
         [0.],
         [1.]],

        [[0.],
         [0.],
         [1.]]])

class speechbrain.nnet.losses.AngularMargin(margin=0.0, scale=1.0)[source]

Bases: Module

An implementation of Angular Margin (AM), proposed in the paper '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)

Parameters:

margin (float) – The margin for cosine similarity
scale (float) – The scale for cosine similarity

Example

>>> pred = AngularMargin()
>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> targets = torch.tensor([ [1., 0.], [0., 1.], [ 1., 0.], [0.,  1.] ])
>>> predictions = pred(outputs, targets)
>>> predictions[:,0] > predictions[:,1]
tensor([ True, False,  True, False])

forward(outputs, targets)[source]

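Computes AM between two tensors

Parameters:

outputs (torch.Tensor) – The outputs, of shape [N, C]; cosine similarity is required.
targets (torch.Tensor) – The targets, of shape [N, C], to which the margin is applied.

Returns:

predictions

Return type: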

torch.Tensor
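A minimal plain-Python sketch of what an angular-margin layer does, assuming the usual AM-Softmax formulation: subtract `margin` from the cosine of the target class only (selected by the one-hot `targets`), then multiply everything by `scale`. With the defaults (margin 0, scale 1) it leaves the cosines unchanged, which agrees with the example above; the function name is hypothetical.

```python
def angular_margin(cosines, one_hot, margin=0.0, scale=1.0):
    """AM logits: scale * (cos - margin) for the target class, scale * cos otherwise."""
    return [
        [scale * (c - margin * t) for c, t in zip(crow, trow)]
        for crow, trow in zip(cosines, one_hot)
    ]


outputs = [[1.0, -1.0], [-1.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
logits = angular_margin(outputs, targets, margin=0.2, scale=30.0)
print(logits[2])  # approximately [21.0, 3.0]: the target logit is pushed down by 30 * 0.2
```

Penalizing only the target class forces the network to separate classes by at least the margin in cosine space before the softmax considers them well classified.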

class speechbrain.nnet.losses.AdditiveAngularMargin(margin=0.0, scale=1.0, easy_margin=False)[source]

Bases: AngularMargin

An implementation of Additive Angular Margin (AAM), proposed in the paper '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)

Parameters:

margin (float) – The margin for cosine similarity.
scale (float) – The scale for cosine similarity.
easy_margin (bool)

Example

>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> targets = torch.tensor([ [1., 0.], [0., 1.], [ 1., 0.], [0.,  1.] ])
>>> pred = AdditiveAngularMargin()
>>> predictions = pred(outputs, targets)
>>> predictions[:,0] > predictions[:,1]
tensor([ True, False,  True, False])

forward(outputs, targets)[source]

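Computes AAM between two tensors

Parameters:

outputs (torch.Tensor) – The outputs, of shape [N, C]; cosine similarity is required.
targets (torch.Tensor) – The targets, of shape [N, C], to which the margin is applied.

Returns:

predictions

Return type: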

torch.Tensor
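Where AM subtracts the margin from the cosine, AAM adds it to the angle: the target-class cosine cos(θ) is replaced by cos(θ + m) before scaling. The plain-Python sketch below shows only that core idea. It deliberately omits the `easy_margin` behaviour and any fallback an implementation may use when θ + m exceeds π, so it is a simplified illustration, not SpeechBrain's code; the function name is hypothetical.

```python
import math


def additive_angular_margin(cosines, one_hot, margin=0.0, scale=1.0):
    """Replace cos(theta) by cos(theta + margin) for the target class, then scale."""
    logits = []
    for crow, trow in zip(cosines, one_hot):
        row = []
        for c, t in zip(crow, trow):
            c = max(-1.0, min(1.0, c))  # keep acos() inside its domain
            phi = math.cos(math.acos(c) + margin)  # cos(theta + m)
            row.append(scale * (t * phi + (1.0 - t) * c))
        logits.append(row)
    return logits


outputs = [[0.9, 0.1], [0.1, 0.9]]
targets = [[1.0, 0.0], [0.0, 1.0]]
print(additive_angular_margin(outputs, targets, margin=0.2, scale=30.0))
```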

class speechbrain.nnet.losses.LogSoftmaxWrapper(loss_fn)[source]

Bases: Module

Parameters:: loss_fn (Callable) – The LogSoftmax function to wrap.

Example

>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> outputs = outputs.unsqueeze(1)
>>> targets = torch.tensor([ [0], [1], [0], [1] ])
>>> log_prob = LogSoftmaxWrapper(nn.Identity())
>>> loss = log_prob(outputs, targets)
>>> 0 <= loss < 1
tensor(True)
>>> log_prob = LogSoftmaxWrapper(AngularMargin(margin=0.2, scale=32))
>>> loss = log_prob(outputs, targets)
>>> 0 <= loss < 1
tensor(True)
>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> log_prob = LogSoftmaxWrapper(AdditiveAngularMargin(margin=0.3, scale=32))
>>> loss = log_prob(outputs, targets)
>>> 0 <= loss < 1
tensor(True)

forward(outputs, targets, length=None)[source]

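Parameters:

outputs (torch.Tensor) – Network output tensor, of shape [batch, 1, outdim].
targets (torch.Tensor) – Target tensor, of shape [batch, 1].
length (torch.Tensor) – The lengths of the corresponding inputs.

Returns:

loss – Loss for the current examples.

Return type: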

torch.Tensor
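One plausible reading of the example above: the targets are turned into one-hot vectors, the wrapped function transforms the logits (identity, AM, AAM, ...), log-softmax is applied, and the negative log-probability of the target class is averaged over the batch. The plain-Python sketch below follows that reading; averaging over examples is an assumption, and the names are hypothetical. For the identity case it gives about 0.249, consistent with the `0 <= loss < 1` checks in the example.

```python
import math


def log_softmax(row):
    m = max(row)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]


def log_softmax_nll(outputs, targets, margin_fn=None):
    """Mean negative log-probability of the target class after an optional logit transform."""
    total = 0.0
    for row, target in zip(outputs, targets):
        one_hot = [1.0 if k == target else 0.0 for k in range(len(row))]
        logits = margin_fn([row], [one_hot])[0] if margin_fn else row
        total -= log_softmax(logits)[target]
    return total / len(outputs)


outputs = [[1.0, -1.0], [-1.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
targets = [0, 1, 0, 1]
print(round(log_softmax_nll(outputs, targets), 4))  # 0.249
```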

speechbrain.nnet.losses.ctc_loss_kd(log_probs, targets, input_lens, blank_index, device)[source]

Knowledge distillation for the CTC loss.

Reference

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. https://arxiv.org/abs/2005.09310

Parameters:

log_probs (torch.Tensor) – Predicted tensor from the student model, of shape [batch, time, chars].
targets (torch.Tensor) – Predicted tensor from a single teacher model, of shape [batch, time, chars].
input_lens (torch.Tensor) – Length of each utterance.
blank_index (int) – The location of the blank symbol among the character indices.
device (str) – Device used for the computation.

Return type:

The computed CTC loss.

speechbrain.nnet.losses.ce_kd(inp, target)[source]

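A simple version of distillation for the cross-entropy loss.

Parameters:

inp (torch.Tensor) – Probabilities from the student model, of shape [batch_size * length, feature].
target (torch.Tensor) – Probabilities from the teacher model, of shape [batch_size * length, feature].

Return type:

The distilled outputs.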

speechbrain.nnet.losses.nll_loss_kd(probabilities, targets, rel_lab_lengths)[source]

Knowledge distillation for the negative log-likelihood loss.

Reference

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. https://arxiv.org/abs/2005.09310

Parameters:

probabilities (torch.Tensor) – The predicted probabilities from the student model. Format is [batch, frames, p].
targets (torch.Tensor) – The target probabilities from the teacher model. Format is [batch, frames, p].
rel_lab_lengths (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.

Return type:

The computed NLL KD loss.

Example

>>> probabilities = torch.tensor([[[0.8, 0.2], [0.2, 0.8]]])
>>> targets = torch.tensor([[[0.9, 0.1], [0.1, 0.9]]])
>>> rel_lab_lengths = torch.tensor([1.])
>>> nll_loss_kd(probabilities, targets, rel_lab_lengths)
tensor(-0.7400)
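The example value tensor(-0.7400) is consistent with computing, for every valid frame, the term -Σ_k t_k · p_k between the teacher targets t and the student values p, and then averaging over valid frames: here each frame gives -(0.9 × 0.8 + 0.1 × 0.2) = -0.74. That term is a true cross-entropy only when the student values are log-probabilities, which is why plain probabilities produce a negative number. The sketch below reproduces this arithmetic under the assumption that relative lengths are rounded to a frame count; it is not the library's code, and the name is hypothetical.

```python
def nll_loss_kd_sketch(student, teacher, rel_lengths):
    """Average of -sum_k teacher[k] * student[k] over the valid frames of each utterance."""
    total, n_frames = 0.0, 0
    for s_utt, t_utt, rel in zip(student, teacher, rel_lengths):
        n_valid = round(rel * len(t_utt))  # relative length -> number of frames
        for s_frame, t_frame in zip(s_utt[:n_valid], t_utt[:n_valid]):
            total += -sum(t * s for t, s in zip(t_frame, s_frame))
            n_frames += 1
    return total / n_frames


probabilities = [[[0.8, 0.2], [0.2, 0.8]]]
targets = [[[0.9, 0.1], [0.1, 0.9]]]
print(round(nll_loss_kd_sketch(probabilities, targets, [1.0]), 4))  # -0.74
```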

class speechbrain.nnet.losses.ContrastiveLoss(logit_temp)[source]

Bases: Module

Contrastive loss as used in wav2vec2.

Reference

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations https://arxiv.org/abs/2006.11477

Parameters:

logit_temp (torch.Float) – A temperature by which the logits are divided.

forward(x, y, negs)[source]

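Computes the contrastive loss.

Parameters:

x (torch.Tensor) – Encoded embeddings, of shape (B, T, C).
y (torch.Tensor) – Feature-extractor target embeddings, of shape (B, T, C).
negs (torch.Tensor) – Negative embeddings from the feature extractor, of shape (N, B, T, C), where N is the number of negatives. These can be obtained with our sample_negatives function (see lobes/wav2vec2).

Returns:

loss (torch.Tensor) – The computed loss
accuracy (torch.Tensor) – The computed accuracy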
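The wav2vec 2.0 objective scores each encoded frame against its true target and a set of negatives using cosine similarity divided by a temperature, then applies cross-entropy with the true target as the correct class. The plain-Python sketch below follows that description for a single utterance. Averaging over time steps, the absence of masking, and ignoring negatives that happen to equal the positive are simplifying assumptions; the names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss_sketch(x, y, negs, logit_temp=0.1):
    """x, y: T x C lists; negs: N x T x C list. Returns (mean loss, accuracy)."""
    total_loss, correct = 0.0, 0
    for t in range(len(x)):
        # Index 0 holds the positive target; the rest are the negatives.
        logits = [cosine(x[t], y[t]) / logit_temp]
        logits += [cosine(x[t], neg[t]) / logit_temp for neg in negs]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
        total_loss += log_sum_exp - logits[0]  # cross-entropy with class 0 as the label
        correct += int(logits.index(m) == 0)  # did the positive get the highest score?
    return total_loss / len(x), correct / len(x)


x = [[1.0, 0.0], [0.0, 1.0]]
y = [[1.0, 0.0], [0.0, 1.0]]
negs = [[[0.0, 1.0], [1.0, 0.0]]]  # one negative per frame, orthogonal to the positive
loss, acc = contrastive_loss_sketch(x, y, negs)
print(loss, acc)  # a loss close to 0 and an accuracy of 1.0
```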

class speechbrain.nnet.losses.VariationalAutoencoderLoss(rec_loss=None, len_dim=1, dist_loss_weight=0.001)[source]

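Bases: Module

The variational autoencoder loss, with support for length masking

From Autoencoding Variational Bayes: https://arxiv.org/pdf/1312.6114.pdf

Parameters:

rec_loss (callable) – A function or module to compute the reconstruction loss
len_dim (int) – The dimension to be used for the length, if encoding variable-length sequences
dist_loss_weight (float) – The relative weight of the distribution loss (K-L divergence)

Example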

>>> from speechbrain.nnet.autoencoders import VariationalAutoencoderOutput
>>> vae_loss = VariationalAutoencoderLoss(dist_loss_weight=0.5)
>>> predictions = VariationalAutoencoderOutput(
...     rec=torch.tensor(
...         [[0.8, 1.0],
...          [1.2, 0.6],
...          [0.4, 1.4]]
...         ),
...     mean=torch.tensor(
...         [[0.5, 1.0],
...          [1.5, 1.0],
...          [1.0, 1.4]],
...         ),
...     log_var=torch.tensor(
...         [[0.0, -0.2],
...          [2.0, -2.0],
...          [0.2,  0.4]],
...         ),
...     latent=torch.randn(3, 1),
...     latent_sample=torch.randn(3, 1),
...     latent_length=torch.tensor([1., 1., 1.]),
... )
>>> targets = torch.tensor(
...     [[0.9, 1.1],
...      [1.4, 0.6],
...      [0.2, 1.4]]
... )
>>> loss = vae_loss(predictions, targets)
>>> loss
tensor(1.1264)
>>> details = vae_loss.details(predictions, targets)
>>> details
VariationalAutoencoderLossDetails(loss=tensor(1.1264),
                                  rec_loss=tensor(0.0333),
                                  dist_loss=tensor(2.1861),
                                  weighted_dist_loss=tensor(1.0930))
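All four fields in the example above can be reproduced with a short plain-Python sketch under three assumptions: the default reconstruction loss is MSE, the distribution loss is the closed-form KL divergence between the diagonal Gaussian N(mean, exp(log_var)) and a standard normal, summed over latent dimensions, and ‘batchmean’ means a sum divided by the batch size. Length masking is omitted, and this is not SpeechBrain's implementation; the function name is hypothetical.

```python
import math


def vae_loss_sketch(rec, targets, mean, log_var, dist_loss_weight=0.001):
    """Return (loss, rec_loss, dist_loss, weighted_dist_loss), each reduced as 'batchmean'."""
    batch_size = len(targets)
    # Reconstruction term: total squared error divided by the batch size.
    rec_loss = sum(
        (r - t) ** 2 for r_row, t_row in zip(rec, targets) for r, t in zip(r_row, t_row)
    ) / batch_size
    # KL(N(mu, exp(lv)) || N(0, 1)) = -0.5 * (1 + lv - mu^2 - exp(lv)), per latent dimension.
    dist_loss = sum(
        -0.5 * (1.0 + lv - mu ** 2 - math.exp(lv))
        for mu_row, lv_row in zip(mean, log_var)
        for mu, lv in zip(mu_row, lv_row)
    ) / batch_size
    weighted = dist_loss_weight * dist_loss
    return rec_loss + weighted, rec_loss, dist_loss, weighted


rec = [[0.8, 1.0], [1.2, 0.6], [0.4, 1.4]]
targets = [[0.9, 1.1], [1.4, 0.6], [0.2, 1.4]]
mean = [[0.5, 1.0], [1.5, 1.0], [1.0, 1.4]]
log_var = [[0.0, -0.2], [2.0, -2.0], [0.2, 0.4]]
print([round(v, 4) for v in vae_loss_sketch(rec, targets, mean, log_var, dist_loss_weight=0.5)])
# [1.1264, 0.0333, 2.1861, 1.093]
```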

forward(predictions, targets, length=None, reduction='batchmean')[source]

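Computes the forward pass

Parameters:

predictions (speechbrain.nnet.autoencoders.VariationalAutoencoderOutput) – The variational autoencoder output
targets (torch.Tensor) – The reconstruction targets
length (torch.Tensor) – The length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply; the default is "batchmean"

Returns:

loss – The VAE loss (reconstruction + K-L divergence)

Return type: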

torch.Tensor

details(predictions, targets, length=None, reduction='batchmean')[source]

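Gets detailed information about the loss (useful for plotting, logs, etc.)

Parameters:

predictions (speechbrain.nnet.autoencoders.VariationalAutoencoderOutput) – The variational autoencoder output (or a tuple of rec, mean, log_var)
targets (torch.Tensor) – Targets for the reconstruction loss
length (torch.Tensor) – The length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply; the default is "batchmean"

Returns:

details – A named tuple with the following fields:

loss: torch.Tensor: the combined loss
rec_loss: torch.Tensor: the reconstruction loss
dist_loss: torch.Tensor: the distribution loss (K-L divergence), raw value
weighted_dist_loss: torch.Tensor: the weighted value of the distribution loss, as used in the combined loss

Return type:

VariationalAutoencoderLossDetails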

class speechbrain.nnet.losses.AutoencoderLoss(rec_loss=None, len_dim=1)[source]

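Bases: Module

An implementation of a standard (non-variational) autoencoder loss

Parameters:

rec_loss (callable) – The callable used to compute the reconstruction loss
len_dim (int) – The index of the dimension used as the length

Example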

>>> from speechbrain.nnet.autoencoders import AutoencoderOutput
>>> ae_loss = AutoencoderLoss()
>>> rec = torch.tensor(
...   [[0.8, 1.0],
...    [1.2, 0.6],
...    [0.4, 1.4]]
... )
>>> predictions = AutoencoderOutput(
...     rec=rec,
...     latent=torch.randn(3, 1),
...     latent_length=torch.tensor([1., 1.])
... )
>>> targets = torch.tensor(
...     [[0.9, 1.1],
...      [1.4, 0.6],
...      [0.2, 1.4]]
... )
>>> ae_loss(predictions, targets)
tensor(0.0333)
>>> ae_loss.details(predictions, targets)
AutoencoderLossDetails(loss=tensor(0.0333), rec_loss=tensor(0.0333))

forward(predictions, targets, length=None, reduction='batchmean')[source]

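Computes the autoencoder loss

Parameters:

predictions (speechbrain.nnet.autoencoders.AutoencoderOutput) – The autoencoder output
targets (torch.Tensor) – Targets for the reconstruction loss
length (torch.Tensor) – The length of each sample, used to compute the true error with a mask
reduction (str) – The type of reduction to apply; the default is "batchmean"

Return type:

The computed loss.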

details(predictions, targets, length=None, reduction='batchmean')[source]

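Gets detailed information about the loss (useful for plotting, logs, etc.)

This method is provided mainly so that this loss can be used interchangeably with more complex autoencoder losses, such as the VAE loss.

Parameters:

predictions (speechbrain.nnet.autoencoders.AutoencoderOutput) – The autoencoder output
targets (torch.Tensor) – Targets for the reconstruction loss
length (torch.Tensor) – The length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply; the default is "batchmean"

Returns:

details – A named tuple with the following fields:

loss: torch.Tensor: the combined loss
rec_loss: torch.Tensor: the reconstruction loss

Return type:

AutoencoderLossDetails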

class speechbrain.nnet.losses.VariationalAutoencoderLossDetails(loss, rec_loss, dist_loss, weighted_dist_loss)

Bases: tuple

dist_loss: Alias for field number 2

loss: Alias for field number 0

rec_loss: Alias for field number 1

weighted_dist_loss: Alias for field number 3

class speechbrain.nnet.losses.AutoencoderLossDetails(loss, rec_loss)

Bases: tuple

loss: Alias for field number 0

rec_loss: Alias for field number 1

class speechbrain.nnet.losses.Laplacian(kernel_size, dtype=torch.float32)[source]

Bases: Module

Computes the Laplacian of image-like data

Parameters:

kernel_size (int) – The size of the Laplacian kernel
dtype (torch.dtype) – The data type (optional)

Example

>>> lap = Laplacian(3)
>>> lap.get_kernel()
tensor([[[[-1., -1., -1.],
          [-1.,  8., -1.],
          [-1., -1., -1.]]]])
>>> data = torch.eye(6) + torch.eye(6).flip(0)
>>> data
tensor([[1., 0., 0., 0., 0., 1.],
        [0., 1., 0., 0., 1., 0.],
        [0., 0., 1., 1., 0., 0.],
        [0., 0., 1., 1., 0., 0.],
        [0., 1., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0., 1.]])
>>> lap(data.unsqueeze(0))
tensor([[[ 6., -3., -3.,  6.],
         [-3.,  4.,  4., -3.],
         [-3.,  4.,  4., -3.],
         [ 6., -3., -3.,  6.]]])

get_kernel()[source]: 计算 Laplacian 核

forward(data)[source]

计算图像状数据的 Laplacian

参数:: data (torch.Tensor) – 包含图像状数据的 (B x C x W x H) 或 (B x C x H x W) 张量
返回类型:: 变换后的输出。

class speechbrain.nnet.losses.LaplacianVarianceLoss(kernel_size=3, len_dim=1)[source]

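Bases: Module

The Laplacian variance loss, used to penalize blurriness in image-like data such as spectrograms.

The loss value is the negative of the variance, because a higher variance indicates a sharper image.

Parameters:

kernel_size (int) – The size of the Laplacian kernel
len_dim (int) – The dimension to be used as the length

Example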

>>> lap_loss = LaplacianVarianceLoss(3)
>>> data = torch.ones(6, 6).unsqueeze(0)
>>> data
tensor([[[1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.]]])
>>> lap_loss(data)
tensor(-0.)
>>> data = (
...     torch.eye(6) + torch.eye(6).flip(0)
... ).unsqueeze(0)
>>> data
tensor([[[1., 0., 0., 0., 0., 1.],
         [0., 1., 0., 0., 1., 0.],
         [0., 0., 1., 1., 0., 0.],
         [0., 0., 1., 1., 0., 0.],
         [0., 1., 0., 0., 1., 0.],
         [1., 0., 0., 0., 0., 1.]]])
>>> lap_loss(data)
tensor(-17.6000)

forward(predictions, length=None, reduction=None)[source]

Computes the Laplacian loss

Parameters:

predictions (torch.Tensor) – A (B x C x W x H) or (B x C x H x W) tensor
length (torch.Tensor) – The lengths of the corresponding inputs.
reduction (str) – "batch" or None

Returns:

loss – The loss value

Return type:

torch.Tensor
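Both the Laplacian and the variance-based loss can be reproduced on the "X" pattern used in the examples with a plain-Python sketch. It assumes that the kernel generalizes as all -1 with k² - 1 at the center (this is only confirmed for k = 3 by the kernel shown earlier), that filtering uses no padding (suggested by the 6 x 6 input producing a 4 x 4 output), and that the variance is the unbiased sample variance, which is what yields -17.6. Batching, channels and length masking are omitted; the names are hypothetical.

```python
import statistics


def laplacian_kernel(kernel_size):
    """All -1 with kernel_size**2 - 1 at the center (so every kernel sums to 0)."""
    kernel = [[-1.0] * kernel_size for _ in range(kernel_size)]
    center = kernel_size // 2
    kernel[center][center] = kernel_size ** 2 - 1.0
    return kernel


def laplacian(image, kernel_size=3):
    """'Valid' 2-D filtering of one image: the output shrinks by kernel_size - 1 per axis."""
    kernel = laplacian_kernel(kernel_size)
    height, width = len(image), len(image[0])
    out = []
    for i in range(height - kernel_size + 1):
        row = []
        for j in range(width - kernel_size + 1):
            row.append(sum(
                kernel[di][dj] * image[i + di][j + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            ))
        out.append(row)
    return out


def laplacian_variance_loss(image, kernel_size=3):
    """Negative sample variance of the Laplacian response: sharper images give a lower loss."""
    values = [v for row in laplacian(image, kernel_size) for v in row]
    return -statistics.variance(values)


# The "X" pattern (identity plus its mirror) used in the examples above.
x_pattern = [[1.0 if j in (i, 5 - i) else 0.0 for j in range(6)] for i in range(6)]
print(laplacian(x_pattern))  # [[6.0, -3.0, -3.0, 6.0], [-3.0, 4.0, 4.0, -3.0], ...]
print(laplacian_variance_loss(x_pattern))  # -17.6
```

A constant image has a zero Laplacian everywhere, hence zero variance and a loss of 0, matching the all-ones example.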