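speechbrain.lobes.models.conv_tasnet module

Implementation of a popular speech separation model.

Summary

Classes

`ChannelwiseLayerNorm`	Channel-wise layer normalization (cLN).
`Chomp1d`	This class cuts out a portion of the signal from the end.
`Decoder`	This class implements the decoder for the ConvTasnet.
`DepthwiseSeparableConv`	Building block for the temporal blocks of Masknet in ConvTasNet.
`Encoder`	This class learns the adaptive frontend for the ConvTasnet model.
`GlobalLayerNorm`	Global layer normalization (gLN).
`MaskNet`
`TemporalBlock`	The conv1d compound layers used in Masknet.
`TemporalBlocksSequential`	A wrapper that replicates the temporal-block layer.

Functions

`choose_norm`	This function returns the chosen normalization type.

Reference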

class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)[source]

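Bases: Module

This class learns the adaptive frontend for the ConvTasnet model.

Parameters:

L (int) – The filter kernel size. Must be an odd number.
N (int) – Number of dimensions at the output of the adaptive frontend.

Example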

>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])

forward(mixture)[source]

Parameters: mixture (torch.Tensor) – Tensor of shape [M, T], where M is the batch size and T is the number of samples.
Returns: mixture_w – Tensor of shape [M, K, N], where K = (T-L)/(L/2)+1 = 2T/L-1.
Return type: torch.Tensor
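
As a quick check of the frame-count formula, the sketch below evaluates both of its forms for an even L. It assumes an unpadded convolution with kernel size L and stride L/2; `num_frames` is a hypothetical helper written for this illustration, not part of SpeechBrain. Note that the doctest above returns K = 20 for T = 100 and L = 11, which the formula does not reproduce, so the actual layer presumably pads its input and the formula is best read as an approximation.

```python
def num_frames(T, L):
    """Frames produced by an unpadded 1-D convolution (kernel L, stride L // 2)."""
    stride = L // 2
    return (T - L) // stride + 1


# Illustrative values: L is even and L/2 divides T, so both forms agree exactly.
T, L = 100, 8
K = num_frames(T, L)
print(K)                    # 24
print(K == 2 * T // L - 1)  # True
```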

class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)[source]

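Bases: Module

This class implements the decoder for the ConvTasnet.

The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.

Parameters:

L (int) – Number of bases to use when reconstructing.
N (int) – Input size.

Example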

>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> decoder = Decoder(L, N)
>>> mixture_hat = decoder(mixture_w, est_mask)
>>> mixture_hat.shape
torch.Size([10, 404, 2])

forward(mixture_w, est_mask)[source]

Parameters:

mixture_w (torch.Tensor) – Tensor of shape [M, K, N].
est_mask (torch.Tensor) – Tensor of shape [M, K, C, N].

Returns:

est_source – Tensor of shape [M, T, C].

Return type:

torch.Tensor
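
The output length in the Decoder example (T = 404 for K = 100 frames with L = 8) is what overlap-add reconstruction with a hop of L/2 would give: T = (K - 1) * L/2 + L. The pure-Python sketch below illustrates that operation under this assumption; `overlap_and_add` is an illustrative helper, not the library's implementation.

```python
def overlap_and_add(frames, hop):
    """Sum equal-length frames, placing each one `hop` samples after the previous."""
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out


L, K = 8, 100  # values taken from the Decoder example
frames = [[1.0] * L for _ in range(K)]
signal = overlap_and_add(frames, hop=L // 2)
print(len(signal))  # 404
```

Away from the two ends, every output sample receives contributions from two overlapping frames.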

class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)[source]

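Bases: Sequential

A wrapper that replicates the temporal-block layer.

Parameters:

input_shape (tuple) – Expected shape of the input.
H (int) – The number of intermediate channels.
P (int) – The kernel size in the convolutions.
R (int) – The number of times to replicate the multilayer temporal blocks.
X (int) – The number of temporal-block layers with different dilations.
norm_type (str) – The type of normalization, one of ['gLN', 'cLN'].
causal (bool) – Whether to use causal or non-causal convolutions, one of [True, False].

Example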

>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> TemporalBlocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, 'gLN', False
... )
>>> y = TemporalBlocks(x)
>>> y.shape
torch.Size([14, 100, 10])
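
To make the roles of P, R, and X concrete, the sketch below computes a dilation schedule and the resulting receptive field. It assumes the dilation doubles with each of the X blocks and restarts at 1 in each of the R repeats, as described in the Conv-TasNet paper; this page does not state the schedule, and both helper names are hypothetical. The receptive-field formula also assumes a stride of 1 in every block.

```python
def dilation_schedule(R, X):
    """Dilations 1, 2, 4, ... for X blocks, repeated R times (assumed schedule)."""
    return [2 ** x for _ in range(R) for x in range(X)]


def receptive_field(P, R, X):
    """Input frames seen by one output frame for stride-1 kernels of size P."""
    return 1 + (P - 1) * sum(dilation_schedule(R, X))


P, R, X = 5, 2, 3  # values from the example above
print(dilation_schedule(R, X))   # [1, 2, 4, 1, 2, 4]
print(receptive_field(P, R, X))  # 57
```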

class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')[source]

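Bases: Module

Parameters:

N (int) – Number of filters in the autoencoder.
B (int) – Number of channels in the bottleneck 1 × 1 convolution block.
H (int) – Number of channels in the convolutional blocks.
P (int) – Kernel size in the convolutional blocks.
X (int) – Number of convolutional blocks in each repeat.
R (int) – Number of repeats.
C (int) – Number of speakers.
norm_type (str) – One of BN, gLN, cLN.
causal (bool) – Causal or non-causal.
mask_nonlinear (str) – Which nonlinear function to use to generate the mask, one of ['softmax', 'relu'].

Example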

>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> masknet = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = masknet(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])

forward(mixture_w)[source]

Keep this API the same as TasNet.

Parameters: mixture_w (torch.Tensor) – Tensor of shape [M, N, K], where M is the batch size.
Returns: est_mask – Tensor of shape [C, M, N, K].
Return type: torch.Tensor

class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]

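Bases: Module

The conv1d compound layers used in Masknet.

Parameters:

input_shape (tuple) – Expected shape of the input.
out_channels (int) – The number of intermediate channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in the convolutional layers.
padding (str) – The type of padding in the convolutional layers (same, valid, causal). If "valid", no padding is performed.
dilation (int) – Amount of dilation in the convolutional layers.
norm_type (str) – The type of normalization, one of ['gLN', 'cLN'].
causal (bool) – Whether to use causal or non-causal convolutions, one of [True, False].

Example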

>>> x = torch.randn(14, 100, 10)
>>> temporal_block = TemporalBlock(x.shape, 10, 11, 1, 'same', 1)
>>> y = temporal_block(x)
>>> y.shape
torch.Size([14, 100, 10])

forward(x)[source]

Parameters: x (torch.Tensor) – Tensor of shape [M, K, B].
Returns: x – Tensor of shape [M, K, B].
Return type: torch.Tensor

class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]

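Bases: Sequential

Building block for the temporal blocks of Masknet in ConvTasNet.

Parameters:

input_shape (tuple) – Expected shape of the input.
out_channels (int) – Number of output channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in the convolutional layers.
padding (str) – The type of padding in the convolutional layers (same, valid, causal). If "valid", no padding is performed.
dilation (int) – Amount of dilation in the convolutional layers.
norm_type (str) – The type of normalization, one of ['gLN', 'cLN'].
causal (bool) – Whether to use causal or non-causal convolutions, one of [True, False].

Example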

>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, 'same', 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
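
The main appeal of a depthwise separable convolution is its lower weight count. The sketch below compares it with a standard 1-D convolution, assuming the usual factorization into a per-channel (depthwise) filter followed by a 1 × 1 (pointwise) convolution. Biases and any normalization or activation parameters inside the actual block are ignored, and the helper names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise conv followed by a pointwise 1x1 conv (bias ignored)."""
    depthwise = c_in * k      # one k-tap filter per input channel
    pointwise = c_in * c_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 10, 10, 11  # sizes from the example above
print(standard_conv_params(c_in, c_out, k))        # 1100
print(depthwise_separable_params(c_in, c_out, k))  # 210
```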

class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)[source]

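Bases: Module

This class cuts out a portion of the signal from the end.

It is written as a class so that it can be incorporated into a sequential wrapper.

Parameters: chomp_size (int) – The size of the portion to discard (in samples).

Example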

>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])

forward(x)[source]

Parameters: x (torch.Tensor) – Tensor of shape [M, Kpad, H].
Returns: x – Tensor of shape [M, K, H].
Return type: torch.Tensor
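
A common reason to trim the tail of a signal is to make a symmetrically padded convolution causal. This use is an assumption here, since the page does not say where Chomp1d is applied. The pure-Python sketch below pads both ends by (kernel_size - 1) * dilation, convolves, and then discards the last `pad` outputs, so that each output depends only on current and past inputs. `conv1d` and `chomp` are illustrative helpers, not SpeechBrain functions.

```python
def conv1d(signal, kernel, dilation=1):
    """Unpadded ("valid") dilated 1-D correlation over a list of floats."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


def chomp(signal, chomp_size):
    """Drop the last `chomp_size` samples (chomp_size must be positive)."""
    return signal[:-chomp_size]


kernel, dilation = [0.5, 0.25, 0.25], 2
pad = (len(kernel) - 1) * dilation      # 4 samples on each side
x = [float(i) for i in range(10)]
padded = [0.0] * pad + x + [0.0] * pad  # symmetric padding, length 18
y = chomp(conv1d(padded, kernel, dilation), pad)
print(len(y))  # 10, same length as x
```

Here `y[t]` is a weighted sum of `x[t - 4]`, `x[t - 2]`, and `x[t]`, so changing a later input sample leaves earlier outputs untouched.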

speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)[source]

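This function returns the chosen normalization type.

Parameters:

norm_type (str) – One of ['gLN', 'cLN', 'batchnorm'].
channel_size (int) – Number of channels.

Return type:

Constructed layer of the chosen type

Example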

>>> choose_norm('gLN', 10)
GlobalLayerNorm()

class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)[source]

Bases: Module

Channel-wise layer normalization (cLN).

Parameters: channel_size (int) – Number of channels in the normalization dimension (the third dimension).

Example

>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])

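reset_parameters()[source]: Resets the parameters.

forward(y)[source]

Parameters: y – Tensor of shape [M, K, N], where M is the batch size, N is the channel size, and K is the length.
Returns: cLN_y – Tensor of shape [M, K, N].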
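
The following pure-Python sketch shows what channel-wise normalization computes on a nested list of shape [M, K, N]: each frame is normalized across its N channels independently of every other frame. It is a simplified illustration, assuming a biased variance, `eps = 1e-8`, and a fixed gain of 1 and bias of 0 in place of the learnable parameters; the function name is hypothetical.

```python
import math


def channelwise_layer_norm(y, eps=1e-8):
    """Normalize each frame y[m][k] across its N channels (gain 1, bias 0)."""
    out = []
    for batch_item in y:
        normalized_item = []
        for frame in batch_item:
            mean = sum(frame) / len(frame)
            var = sum((v - mean) ** 2 for v in frame) / len(frame)
            scale = math.sqrt(var + eps)
            normalized_item.append([(v - mean) / scale for v in frame])
        out.append(normalized_item)
    return out


y = [[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]]  # shape [M=1, K=2, N=3]
cln_y = channelwise_layer_norm(y)
print(cln_y[0][0])
print(cln_y[0][1])
```

Both frames come out nearly identical, because each one is rescaled using only its own statistics.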

class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)[source]

Bases: Module

Global layer normalization (gLN).

Parameters: channel_size (int) – Number of channels in the third dimension.

Example

>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])

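reset_parameters()[source]: Resets the parameters.

forward(y)[source]

Parameters: y (torch.Tensor) – Tensor of shape [M, K, N], where M is the batch size, N is the channel size, and K is the length.
Returns: gLN_y – Tensor of shape [M, K, N].
Return type: torch.Tensor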
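
For contrast with the channel-wise variant, the sketch below normalizes each batch item using a single mean and variance computed over both the K and N dimensions. As before, this is a simplified pure-Python illustration: the biased variance, `eps = 1e-8`, and the omission of the learnable gain and bias are assumptions, and the function name is hypothetical.

```python
import math


def global_layer_norm(y, eps=1e-8):
    """Normalize each batch item over all K frames and N channels (gain 1, bias 0)."""
    out = []
    for batch_item in y:
        values = [v for frame in batch_item for v in frame]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = math.sqrt(var + eps)
        out.append([[(v - mean) / scale for v in frame] for frame in batch_item])
    return out


y = [[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]]  # shape [M=1, K=2, N=3]
gln_y = global_layer_norm(y)
print(gln_y[0][0])
print(gln_y[0][1])
```

Because the statistics are shared across frames, the quieter first frame stays entirely below zero after normalization instead of being rescaled on its own.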