speechbrain.nnet.loss.stoi_loss module

Library for computing STOI. Reference: "End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks", TASLP, 2018.

Authors: Szu-Wei, Fu 2020

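Summary

Functions

`removeSilentFrames`	Removes silent frames from the STOI computation.
`stoi_loss`	Computes the STOI score and returns -1 * that score.
`thirdoct`	Returns the 1/3 octave band matrix.

Reference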

speechbrain.nnet.loss.stoi_loss.thirdoct(fs, nfft, num_bands, min_freq)[source]

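Returns the 1/3 octave band matrix.

Parameters:

fs (int) – Sampling rate.
nfft (int) – FFT size.
num_bands (int) – Number of 1/3 octave bands.
min_freq (int) – Center frequency of the lowest 1/3 octave band.

Returns:

obm – Octave band matrix.

Return type:

tensor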
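To make the band matrix more concrete, the following is a minimal pure-Python sketch of how such a matrix can be built from the standard one-third octave definitions (band centers at `min_freq * 2**(k/3)`, edges one sixth of an octave on either side). The function name `third_octave_matrix` is hypothetical, and the edge-snapping rule is an assumption for illustration; this is not the SpeechBrain implementation, which returns a tensor.

```python
def third_octave_matrix(fs, nfft, num_bands, min_freq):
    """Hypothetical sketch: 0/1 matrix mapping FFT bins to 1/3 octave bands."""
    n_bins = nfft // 2 + 1
    # Frequency in Hz of each one-sided FFT bin.
    freqs = [i * fs / nfft for i in range(n_bins)]
    obm = [[0.0] * n_bins for _ in range(num_bands)]
    for k in range(num_bands):
        # Band edges sit one sixth of an octave below and above the center.
        low = min_freq * 2.0 ** ((2 * k - 1) / 6)
        high = min_freq * 2.0 ** ((2 * k + 1) / 6)
        # Assumption: snap each edge to the nearest FFT bin.
        lo_bin = min(range(n_bins), key=lambda i: abs(freqs[i] - low))
        hi_bin = min(range(n_bins), key=lambda i: abs(freqs[i] - high))
        for i in range(lo_bin, hi_bin):
            obm[k][i] = 1.0
    return obm


# Settings commonly used with STOI: 10 kHz audio, 512-point FFT,
# 15 bands, lowest center frequency 150 Hz.
obm = third_octave_matrix(fs=10000, nfft=512, num_bands=15, min_freq=150)
print(len(obm), len(obm[0]))
```

Multiplying this matrix with a power spectrum of `nfft // 2 + 1` bins sums the energy that falls into each band, which is the role the octave band matrix plays in the STOI computation.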

speechbrain.nnet.loss.stoi_loss.removeSilentFrames(x, y, dyn_range=40, N=256, K=128)[source]

Removes silent frames from the STOI computation.

This helper is part of the STOI computation, which can be used as a loss function for training with SGD-based updates.

Parameters:

x (torch.Tensor) – The clean (reference) waveform.
y (torch.Tensor) – The degraded (enhanced) waveform.
dyn_range (int) – Dynamic range used for the mask computation.
N (int) – Window length.
K (int) – Step size.

Returns:

A list of 2 elements: x and y with the silent frames removed.
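The idea behind the parameters above can be sketched in pure Python: frames of the clean signal whose energy falls more than `dyn_range` dB below the loudest frame are treated as silent, and the same frames are dropped from both signals. The function name `drop_silent_frames` is hypothetical, and this simplified version works on plain lists and returns the kept frames without windowing or overlap-add resynthesis, so it is an illustration under those assumptions rather than the library's implementation.

```python
import math


def drop_silent_frames(x, y, dyn_range=40, N=256, K=128):
    """Hypothetical sketch: keep frames of x and y where x is not silent."""
    starts = range(0, len(x) - N + 1, K)
    x_frames = [x[s:s + N] for s in starts]
    y_frames = [y[s:s + N] for s in starts]
    eps = 1e-12  # avoids log10(0) on all-zero frames
    # Energy of each clean frame in dB.
    energy_db = [
        20 * math.log10(math.sqrt(sum(v * v for v in f)) + eps)
        for f in x_frames
    ]
    # A frame is kept if it lies within dyn_range dB of the loudest frame.
    threshold = max(energy_db) - dyn_range
    keep = [e > threshold for e in energy_db]
    x_kept = [f for f, k in zip(x_frames, keep) if k]
    y_kept = [f for f, k in zip(y_frames, keep) if k]
    return [x_kept, y_kept]


# A tone followed by digital silence; y is a slightly offset copy of x.
tone = [math.sin(0.1 * n) for n in range(1024)]
x = tone + [0.0] * 1024
y = [v + 0.01 for v in x]
x_kept, y_kept = drop_silent_frames(x, y)
print(len(x_kept), len(y_kept))
```

The mask is computed from the clean signal only, so the two outputs stay aligned frame by frame even when the degraded signal contains noise during the silent stretch.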

speechbrain.nnet.loss.stoi_loss.stoi_loss(y_pred_batch, y_true_batch, lens, reduction='mean')[source]

Computes the STOI score and returns -1 * that score.

This function can be used as a loss function for training with SGD-based updates.

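Parameters:

y_pred_batch (torch.Tensor) – The degraded (enhanced) waveforms.
y_true_batch (torch.Tensor) – The clean (reference) waveforms.
lens (torch.Tensor) – The relative lengths of the waveforms within the batch.
reduction (str) – The type of reduction to use ("mean" or "batch").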

Returns:

The computed STOI loss.

Example

>>> import torch
>>> from speechbrain.nnet.loss.stoi_loss import stoi_loss
>>> a = torch.sin(torch.arange(16000, dtype=torch.float32)).unsqueeze(0)
>>> b = a + 0.001
>>> -stoi_loss(b, a, torch.ones(1))
tensor(0.7...)
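The `lens` and `reduction` parameters can be illustrated with a small pure-Python sketch. Both helper names are hypothetical, and the behavior shown rests on assumptions: that relative lengths are scaled by the padded batch length and truncated to an integer, that "mean" averages the negated scores, and that "batch" returns one negated score per utterance. It is not the library's code.

```python
def absolute_lengths(lens, max_len):
    """Hypothetical helper: relative lengths in (0, 1] to sample counts."""
    # Assumption: truncation toward zero, as with int().
    return [int(rel * max_len) for rel in lens]


def negated_score_loss(scores, reduction="mean"):
    """Hypothetical helper: turn per-utterance scores into a loss to minimize."""
    losses = [-s for s in scores]
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "batch":
        return losses
    raise ValueError(f"unsupported reduction: {reduction!r}")


# Two utterances padded to 16000 samples; the second is half as long.
lengths = absolute_lengths([1.0, 0.5], max_len=16000)
mean_loss = negated_score_loss([0.8, 0.6], reduction="mean")
batch_loss = negated_score_loss([0.8, 0.6], reduction="batch")
print(lengths, mean_loss, batch_loss)
```

Negating the score matters because a higher STOI score means better intelligibility, while an optimizer minimizes its objective.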