speechbrain.lobes.models.convolution module

This module assembles a convolutional (depthwise) encoder with or without residual connections.

Authors

Jianyuan Zhong 2020
Titouan Parcollet 2023
Gianfranco Dumoulin Bertucci 2025

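Summary

Classes

`ConvBlock`	An implementation of a convolution block with 1d or 2d convolutions (depthwise).
`ConvolutionFrontEnd`	A module that assembles a convolutional (depthwise) encoder with or without residual connections.
`ConvolutionalSpatialGatingUnit`	This module implements the CSGU as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".

Reference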

class speechbrain.lobes.models.convolution.ConvolutionalSpatialGatingUnit(input_size: int, kernel_size: int = 31, dropout: float = 0.0, use_linear_after_conv: bool = False, activation: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Identity'>)[source]

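Bases: Module

This module implements the CSGU as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".

The code is heavily inspired by the original ESPNet implementation.

Parameters:

input_size (int) – Size of the feature (channel) dimension.
kernel_size (int, optional (default=31)) – Size of the convolution kernel.
dropout (float, optional (default=0.0)) – Dropout rate applied to the output.
use_linear_after_conv (bool, optional (default=False)) – If True, a linear transformation of size input_size//2 is applied.
activation (Type[torch.nn.Module], optional (default=torch.nn.Identity)) – Activation function used on the gate.

Example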

>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionalSpatialGatingUnit(input_size=x.shape[-1])
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 5])
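
The halved last dimension in the example above follows from the gating scheme: the features are split into two halves along the channel axis, one half is filtered over time by a depthwise convolution, and the two halves are multiplied elementwise. The pure-Python sketch below illustrates only that idea on a toy input. It is not the SpeechBrain implementation: the helper names `depthwise_conv_time` and `spatial_gating` are hypothetical, and the sketch deliberately omits normalization, the optional linear layer, the activation, and dropout.

```python
def depthwise_conv_time(frames, kernels):
    """Per-channel 1D convolution along time with zero 'same' padding."""
    num_frames, num_channels = len(frames), len(frames[0])
    pad = (len(kernels[0]) - 1) // 2
    out = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            acc = 0.0
            for k, w in enumerate(kernels[c]):
                src = t + k - pad
                if 0 <= src < num_frames:  # positions outside the input count as zero
                    acc += w * frames[src][c]
            row.append(acc)
        out.append(row)
    return out


def spatial_gating(frames, kernels):
    """Split channels in half, filter the gate half over time, then multiply."""
    half = len(frames[0]) // 2
    content = [f[:half] for f in frames]
    gate = depthwise_conv_time([f[half:] for f in frames], kernels)
    return [[a * g for a, g in zip(fr, gr)] for fr, gr in zip(content, gate)]


# Toy input: T=4 frames with D=4 features each (assumed values for illustration).
x = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
# One size-3 kernel per gate channel; [0, 1, 0] passes the gate through unchanged.
kernels = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
out = spatial_gating(x, kernels)
print(len(out), len(out[0]))  # 4 2
```

With the pass-through kernels, every output frame equals the first half of the input multiplied by the second half, so the output keeps T=4 frames but only D/2=2 features.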

forward(x)[source]

Parameters: x (torch.Tensor) – Input tensor, shape (B, T, D).
Returns: out – The processed output.
Return type: torch.Tensor

class speechbrain.lobes.models.convolution.ConvolutionFrontEnd(input_shape: ~typing.Iterable, num_blocks: int = 3, num_layers_per_block: int = 5, out_channels: ~typing.List[int] = [128, 256, 512], kernel_sizes: ~typing.List[int] = [3, 3, 3], strides: ~typing.List[int] = [1, 2, 2], dilations: ~typing.List[int] = [1, 1, 1], residuals: ~typing.List[bool] = [True, True, True], conv_module: ~typing.Type[~torch.nn.modules.module.Module] = <class 'speechbrain.nnet.CNN.Conv2d'>, activation: ~typing.Callable = <class 'torch.nn.modules.activation.LeakyReLU'>, norm: ~typing.Type[~torch.nn.modules.module.Module] | None = <class 'speechbrain.nnet.normalization.LayerNorm'>, dropout: float = 0.1, conv_bias: bool = True, padding: ~typing.Literal['same', 'valid', 'causal'] = 'same', conv_init: str | None = None)[source]

Bases: Sequential

This module assembles a convolutional (depthwise) encoder with or without residual connections.

Parameters:

input_shape (Iterable) – Expected shape of the input tensor.
num_blocks (int, optional (default=3)) – Number of blocks.
num_layers_per_block (int, optional (default=5)) – Number of convolution layers in each block.
out_channels (List[int], optional (default=[128, 256, 512])) – Number of output channels for each block.
kernel_sizes (List[int], optional (default=[3, 3, 3])) – Kernel size of the convolution blocks.
strides (List[int], optional (default=[1, 2, 2])) – Striding factor for each block; the stride is applied at the last layer.
dilations (List[int], optional (default=[1, 1, 1])) – Dilation factor for each block.
residuals (List[bool], optional (default=[True, True, True])) – Whether to apply a residual connection at each block.
conv_module (Type[torch.nn.Module], optional (default=sb.nnet.Conv2d)) – Class used to construct the convolution layers.
activation (Callable, optional (default=torch.nn.LeakyReLU)) – Activation function for each block.
norm (Optional[Type[torch.nn.Module]] (default=LayerNorm)) – Normalization used to regularize the model.
dropout (float, optional (default=0.1)) – Dropout probability.
conv_bias (bool, optional (default=True)) – Whether to add a bias term to the convolution layers.
padding (Literal["same", "valid", "causal"], optional (default="same")) – Type of padding to apply.
conv_init (Optional[str], optional (default=None=zeros)) – Type of initialization to use for the convolution layers.

Example

>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionFrontEnd(input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 8, 3, 512])
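
The output shape in the example above can be reproduced with simple arithmetic. The sketch below is not SpeechBrain code: `frontend_output_shape` is a hypothetical helper that assumes each block's stride acts on both the time and feature axes, and that "same" padding yields a length of ceil(L / stride). Those assumptions agree with the documented shape for this input but have not been checked against other configurations.

```python
import math


def frontend_output_shape(input_shape, out_channels=(128, 256, 512), strides=(1, 2, 2)):
    """Estimate the (batch, time, features, channels) shape after all blocks.

    Assumption: every block shrinks both the time and feature axes by its
    stride, rounding up, and the last block sets the channel count.
    """
    batch, time, feats = input_shape
    for stride in strides:
        time = math.ceil(time / stride)
        feats = math.ceil(feats / stride)
    return (batch, time, feats, out_channels[-1])


# Defaults from the signature above: time 30 -> 30 -> 15 -> 8, features 10 -> 10 -> 5 -> 3.
shape = frontend_output_shape((8, 30, 10))
print(shape)  # (8, 8, 3, 512)
```

Under these assumptions, the default strides [1, 2, 2] reduce the time resolution by roughly a factor of four, which is worth accounting for when computing sequence lengths downstream.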

get_filter_properties() → FilterProperties[source]

class speechbrain.lobes.models.convolution.ConvBlock(num_layers: int, out_channels: int, input_shape: ~typing.Iterable, kernel_size: int = 3, stride: int = 1, dilation: int = 1, residual: bool = False, conv_module: ~typing.Type[~torch.nn.modules.module.Module] = <class 'speechbrain.nnet.CNN.Conv2d'>, activation: ~typing.Callable = <class 'torch.nn.modules.activation.LeakyReLU'>, norm: ~typing.Type[~torch.nn.modules.module.Module] | None = None, dropout: float = 0.1, conv_bias: bool = True, padding: ~typing.Literal['same', 'valid', 'causal'] = 'same', conv_init: str | None = None)[source]

Bases: Module

An implementation of a convolution block with 1d or 2d convolutions (depthwise).

Parameters:

num_layers (int) – Number of depthwise convolution layers in this block.
input_shape (Iterable) – Expected shape of the input tensor.
out_channels (int) – Number of output channels of this model.
kernel_size (int, optional (default=3)) – Kernel size of the convolution layers.
stride (int, optional (default=1)) – Striding factor for this block.
dilation (int, optional (default=1)) – Dilation factor.
residual (bool, optional (default=False)) – If True, a residual connection is added.
conv_module (Type[torch.nn.Module], optional (default=sb.nnet.Conv2d)) – Class used to construct the convolution layers.
activation (Callable, optional (default=torch.nn.LeakyReLU)) – Activation function for this block.
norm (Optional[Type[torch.nn.Module]] (default=None)) – Normalization used to regularize the model.
dropout (float, optional (default=0.1)) – Probability of zeroing an output element.
conv_bias (bool, optional (default=True)) – Whether to add a bias term to the convolution layers.
padding (Literal["same", "valid", "causal"], optional (default="same")) – Type of padding to add.
conv_init (Optional[str], optional (default=None=zeros)) – Type of initialization to use for the convolution layers.

Example

>>> x = torch.rand((8, 30, 10))
>>> conv = ConvBlock(2, 16, input_shape=x.shape)
>>> out = conv(x)
>>> x.shape
torch.Size([8, 30, 10])

forward(x)[source]

Processes the input tensor x and returns an output tensor.
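
The padding argument accepted by ConvBlock and ConvolutionFrontEnd takes the values "same", "valid", and "causal". As a rough illustration of how these modes conventionally differ, the sketch below computes the zero padding and resulting length for a stride-1 1D convolution. It uses the textbook definitions, with hypothetical helpers `padding_amounts` and `output_length`; how SpeechBrain's convolution layers handle edge cases such as even kernel sizes is an assumption not covered here.

```python
def padding_amounts(mode, kernel_size, dilation=1):
    """Return (left, right) zero padding for a stride-1 1D convolution."""
    span = (kernel_size - 1) * dilation  # extra samples the kernel reaches over
    if mode == "valid":
        return (0, 0)  # no padding: the output is shorter than the input
    if mode == "same":
        return (span // 2, span - span // 2)  # split evenly: length preserved
    if mode == "causal":
        return (span, 0)  # pad only the past: no look-ahead into future frames
    raise ValueError(f"unknown padding mode: {mode}")


def output_length(length, mode, kernel_size, dilation=1):
    """Length of the output for an input of the given length."""
    left, right = padding_amounts(mode, kernel_size, dilation)
    return length + left + right - (kernel_size - 1) * dilation


for mode in ("same", "valid", "causal"):
    print(mode, padding_amounts(mode, 3), output_length(30, mode, 3))
# same (1, 1) 30
# valid (0, 0) 28
# causal (2, 0) 30
```

Both "same" and "causal" preserve the sequence length here; the difference is that "causal" places all of the padding before the first frame, so each output depends only on current and past inputs.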