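speechbrain.processing.vocal_features module

Functions for analyzing vocal characteristics: jitter, shimmer, HNR (harmonics-to-noise ratio), and GNE (glottal-to-noise excitation ratio).

These functions are typically used to analyze dysphonic voices with more traditional (i.e. non-deep-learning) approaches. They are often useful as a baseline, e.g. for pathology detection. Inspired by PRAAT.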

Authors

Peter Plantinga, 2024

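Summary

Functions

`autocorrelate`	Generate autocorrelation scores using circular convolution.
`compute_autocorr_features`	Compute features based on autocorrelation.
`compute_cross_correlation`	Compute the correlation between two sets of frames.
`compute_gne`	An algorithm for GNE computation from the original paper.
`compute_hilbert_envelopes`	Compute the Hilbert envelopes of the signal in a specific frequency band, using FFT.
`compute_periodic_features`	Function to compute periodic features: jitter and shimmer.
`compute_spectral_features`	Compute statistics over spectral frames, such as flux, skew, spread, and flatness.
`inverse_filter`	Perform inverse filtering on frames to estimate the glottal pulse train.
`spec_norm`	Normalize the given value by the spectrum.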

Reference

speechbrain.processing.vocal_features.compute_autocorr_features(frames, min_lag, max_lag, neighbors=5)[source]

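Compute features based on autocorrelation.

Parameters:

frames (torch.Tensor) – The audio frames to be evaluated for autocorrelation, shape [batch, frame, sample].
min_lag (int) – The minimum number of samples to consider for a potential period length.
max_lag (int) – The maximum number of samples to consider for a potential period length.
neighbors (int) – The number of neighbors to use for the rolling median, which helps avoid octave errors.

Returns:

harmonicity (torch.Tensor) – The highest autocorrelation score relative to the 0-lag score. Used to compute HNR.
best_lags (torch.Tensor) – The lag corresponding to the highest autocorrelation score, an estimate of the period length.

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import compute_autocorr_features
>>> audio = torch.rand(1, 16000)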
>>> frames = audio.unfold(-1, 800, 200)
>>> frames.shape
torch.Size([1, 77, 800])
>>> harmonicity, best_lags = compute_autocorr_features(frames, 100, 200)
>>> harmonicity.shape
torch.Size([1, 77])
>>> best_lags.shape
torch.Size([1, 77])
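
The lag arguments and the two return values can be related to more familiar units. The sketch below is not part of the library: the helper names are hypothetical, and the HNR conversion is the common Praat-style formula 10·log10(h / (1 − h)), assumed here rather than taken from the module's source.

```python
import math


def lag_bounds(min_f0_hz, max_f0_hz, sample_rate):
    """Convert a pitch search range in Hz to (min_lag, max_lag) in samples."""
    # The highest pitch has the shortest period, hence the swapped order.
    return round(sample_rate / max_f0_hz), round(sample_rate / min_f0_hz)


def lag_to_f0(lag, sample_rate):
    """Convert an estimated period length in samples to a frequency in Hz."""
    return sample_rate / lag


def harmonicity_to_hnr_db(harmonicity, eps=1e-10):
    """Assumed conversion: ratio of harmonic to noise energy, in decibels."""
    h = min(max(harmonicity, eps), 1.0 - eps)  # keep the log argument finite
    return 10.0 * math.log10(h / (1.0 - h))


# Lags of 100 and 200 samples at 16 kHz cover pitches from 80 Hz to 160 Hz.
min_lag, max_lag = lag_bounds(80, 160, 16000)
```

Under this assumed formula, a harmonicity of 0.5 maps to 0 dB (equal harmonic and noise energy), and values close to 1 map to large positive HNR.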

speechbrain.processing.vocal_features.autocorrelate(frames)[source]

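Generate autocorrelation scores using circular convolution.

Parameters: frames (torch.Tensor) – The audio frames to be evaluated for autocorrelation, shape [batch, frame, sample].
Returns: autocorrelation – The ratio of the best candidate lag's autocorrelation score to the theoretical maximum autocorrelation score at lag 0. Normalized by the autocorrelation score of the window.
Return type: torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import autocorrelate
>>> audio = torch.rand(1, 16000)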
>>> frames = audio.unfold(-1, 800, 200)
>>> frames.shape
torch.Size([1, 77, 800])
>>> autocorrelation = autocorrelate(frames)
>>> autocorrelation.shape
torch.Size([1, 77, 401])
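
One reading of "normalized by the autocorrelation score of the window" is the correction used in Praat, where the autocorrelation of the windowed signal is divided by the autocorrelation of the window itself. The pure-Python sketch below illustrates that idea on a single frame. It uses a plain (non-circular) sum and a Hann window, both of which are assumptions, so it should not be taken as the library's implementation.

```python
import math


def autocorr(x, max_lag):
    """Linear autocorrelation for lags 0..max_lag, scaled so lag 0 equals 1."""
    r = [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]
    return [v / r[0] for v in r]


n, period = 200, 20
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
signal = [math.sin(2 * math.pi * i / period) for i in range(n)]
windowed = [s * w for s, w in zip(signal, window)]

raw = autocorr(windowed, n // 2)           # biased downward by the window taper
window_scores = autocorr(window, n // 2)   # how much the taper alone costs
corrected = [a / b for a, b in zip(raw, window_scores)]

# Search only lags 10..30 so that the second period (lag 40) is excluded.
best_lag = max(range(10, 31), key=lambda k: corrected[k])
```

For this perfectly periodic input the corrected score at the true period is close to 1, while the uncorrected score is noticeably lower. Lag 40 would score about as high as lag 20, which is the kind of octave ambiguity a lag search has to guard against.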

speechbrain.processing.vocal_features.compute_periodic_features(frames, best_lags, neighbors=4)[source]

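Function to compute periodic features: jitter and shimmer.

Parameters:

frames (torch.Tensor) – The framed audio to use for feature computation, dims [batch, frame, sample].
best_lags (torch.Tensor) – The estimated period length for each frame, dims [batch, frame].
neighbors (int) – The number of neighbors to use in the comparison.

Returns:

jitter (torch.Tensor) – The average absolute deviation in period length over the frame.
shimmer (torch.Tensor) – The average absolute deviation in amplitude over the frame.

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import compute_autocorr_features
>>> from speechbrain.processing.vocal_features import compute_periodic_features
>>> audio = torch.rand(1, 16000)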
>>> frames = audio.unfold(-1, 800, 200)
>>> frames.shape
torch.Size([1, 77, 800])
>>> harmonicity, best_lags = compute_autocorr_features(frames, 100, 200)
>>> jitter, shimmer = compute_periodic_features(frames, best_lags)
>>> jitter.shape
torch.Size([1, 77])
>>> shimmer.shape
torch.Size([1, 77])
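
To make "average absolute deviation" concrete, the sketch below computes Praat-style local jitter and shimmer from a hand-written list of period lengths and peak amplitudes. The helper name and the input values are hypothetical, and comparing only adjacent periods is an assumption; the library function compares against `neighbors` surrounding periods instead.

```python
def relative_mean_abs_diff(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical measurements for four consecutive glottal periods.
periods = [100, 102, 100, 102]             # period lengths in samples
peak_amplitudes = [0.50, 0.45, 0.50, 0.45]  # peak amplitude of each period

jitter = relative_mean_abs_diff(periods)           # period-length perturbation
shimmer = relative_mean_abs_diff(peak_amplitudes)  # amplitude perturbation
```

Both values are relative, so a jitter of about 0.02 here means consecutive periods differ by roughly 2% of the mean period.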

speechbrain.processing.vocal_features.compute_spectral_features(spectrum, eps=1e-10)[source]

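Compute statistics over spectral frames, such as flux, skew, spread, and flatness.

Reference page for the computed values: https://www.mathworks.com/help/audio/ug/spectral-descriptors.html

Parameters:

spectrum (torch.Tensor) – The spectrum to use for feature computation, dims [batch, frame, freq].
eps (float) – A small value to avoid division by 0.

Returns:

features –

A [batch, frame, 8] tensor of spectral features for each frame:

centroid: The mean of the spectrum.
spread: The standard deviation of the spectrum.
skew: The spectral balance.
kurtosis: The spectral tailedness.
entropy: The peakiness of the spectrum.
flatness: The ratio of the geometric mean to the arithmetic mean.
crest: The ratio of the spectral maximum to the arithmetic mean.
flux: The average squared difference between one spectral value and its successor.

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import compute_spectral_features
>>> audio = torch.rand(1, 16000)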
>>> window_size = 800
>>> frames = audio.unfold(-1, window_size, 200)
>>> frames.shape
torch.Size([1, 77, 800])
>>> hann = torch.hann_window(window_size).view(1, 1, -1)
>>> windowed_frames = frames * hann
>>> spectrum = torch.abs(torch.fft.rfft(windowed_frames))
>>> spectral_features = compute_spectral_features(spectrum)
>>> spectral_features.shape
torch.Size([1, 77, 8])
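
Four of the listed descriptors are simple enough to compute by hand for a single frame. The following sketch follows the textbook definitions and treats the magnitude spectrum as a distribution over bin indices. Measuring frequency in bin indices rather than Hz, and the exact placement of `eps`, are assumptions and may differ from what the library does.

```python
import math


def spectral_stats(spectrum, eps=1e-10):
    """Centroid, spread, flatness and crest of one magnitude spectrum."""
    total = sum(spectrum) + eps
    bins = range(len(spectrum))
    # Magnitude-weighted mean and standard deviation of the bin index.
    centroid = sum(k * s for k, s in zip(bins, spectrum)) / total
    spread = math.sqrt(
        sum((k - centroid) ** 2 * s for k, s in zip(bins, spectrum)) / total
    )
    arith_mean = sum(spectrum) / len(spectrum) + eps
    geo_mean = math.exp(sum(math.log(s + eps) for s in spectrum) / len(spectrum))
    return {
        "centroid": centroid,
        "spread": spread,
        "flatness": geo_mean / arith_mean,
        "crest": max(spectrum) / arith_mean,
    }


flat = spectral_stats([1.0, 1.0, 1.0, 1.0])   # noise-like: flatness near 1
peaky = spectral_stats([0.0, 0.0, 4.0, 0.0])  # tone-like: flatness near 0
```

A flat spectrum gives flatness and crest near 1, while a single spectral peak drives flatness toward 0 and raises crest to the number of bins.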

speechbrain.processing.vocal_features.spec_norm(value, spectrum, eps=1e-10)[source]

Normalize the given value by the spectrum.

speechbrain.processing.vocal_features.compute_gne(audio, sample_rate=16000, bandwidth=1000, fshift=300, frame_len=0.03, hop_len=0.01)[source]

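An algorithm for GNE computation from the original paper.

See "Glottal-to-Noise Excitation Ratio - a New Measure for Describing Pathological Voices" by D. Michaelis, T. Gramss, and H. W. Strube.

The algorithm divides the signal into frequency bands and compares the correlation between the bands. High correlation indicates relatively low noise in the signal, whereas lower correlation could be a sign of pathology in the vocal signal.

Godino-Llorente et al. explore the goodness of the bandwidth and frequency shift parameters in "The Effectiveness of the Glottal to Noise Excitation Ratio for the Screening of Voice Disorders"; the defaults here are the ones recommended in that work.

Parameters:

audio (torch.Tensor) – The batched audio signal to use for GNE computation, [batch, sample].
sample_rate (float) – The sample rate of the input audio.
bandwidth (float) – The width of the frequency bands used for computing correlation.
fshift (float) – The shift between frequency bands used for computing correlation.
frame_len (float) – The length of each analysis frame, in seconds.
hop_len (float) – The length of time between the start of each analysis frame, in seconds.

Returns:

gne – The glottal-to-noise excitation ratio for each frame of the audio signal.

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import compute_gne
>>> sample_rate = 16000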
>>> audio = torch.rand(1, sample_rate) # 1s of audio
>>> gne = compute_gne(audio, sample_rate=sample_rate)
>>> gne.shape
torch.Size([1, 98])
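
The frame count in the example follows from `frame_len` and `hop_len`. A minimal sketch of the arithmetic, assuming frames are taken without padding (the helper name is hypothetical, and whether the function changes the sample rate internally is not stated here):

```python
def num_frames(num_samples, sample_rate, frame_len, hop_len):
    """Number of full frames that fit in the signal when no padding is applied."""
    frame_samples = round(frame_len * sample_rate)
    hop_samples = round(hop_len * sample_rate)
    return (num_samples - frame_samples) // hop_samples + 1


# One second of 16 kHz audio with 30 ms frames and a 10 ms hop.
n = num_frames(16000, 16000, frame_len=0.03, hop_len=0.01)
```

With these settings each frame is 480 samples and the hop is 160 samples, which yields the 98 frames shown above. The same count results for 10000 samples framed as 300 samples with a hop of 100.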

speechbrain.processing.vocal_features.inverse_filter(frames, lpc_order=13)[source]

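Perform inverse filtering on frames to estimate the glottal pulse train.

Uses the autocorrelation method and linear predictive coding (LPC). Algorithm from https://course.ece.cmu.edu/~ece792/handouts/RS_Chap_LPC.pdf

Parameters:

frames (torch.Tensor) – The audio frames to filter using the inverse filter.
lpc_order (int) – The size of the filter to compute and use on the frames.

Returns:

filtered_frames – The frames after the inverse filter is applied.

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import inverse_filter
>>> audio = torch.rand(1, 10000)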
>>> frames = audio.unfold(-1, 300, 100)
>>> frames.shape
torch.Size([1, 98, 300])
>>> filtered_frames = inverse_filter(frames)
>>> filtered_frames.shape
torch.Size([1, 98, 300])
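
The autocorrelation method of LPC can be sketched for a single frame in pure Python: estimate the autocorrelation, solve the normal equations with the Levinson-Durbin recursion, and apply the resulting all-zero filter. This illustrates the general technique only; the function names are hypothetical, and the library's batched implementation may differ in windowing and numerical details.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a_1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    error = r[0]  # prediction error energy; must be non-zero (non-silent frame)
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a


def inverse_filter_frame(x, order):
    """Return the LPC coefficients and the residual e[n] = sum_j a_j * x[n - j]."""
    a = levinson_durbin(autocorrelation(x, order), order)
    residual = [
        sum(a[j] * x[n - j] for j in range(min(n, order) + 1)) for n in range(len(x))
    ]
    return a, residual


# A single decaying pulse: the impulse response of the filter 1 / (1 - 0.9 z^-1).
frame = [0.9 ** n for n in range(200)]
coeffs, residual = inverse_filter_frame(frame, order=2)
```

For this synthetic frame the recovered coefficients are approximately [1, -0.9, 0], and the residual collapses to a single impulse at the first sample, which is the excitation that produced the frame.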

speechbrain.processing.vocal_features.compute_hilbert_envelopes(frames, center_freq, bandwidth=1000, sample_rate=10000)[source]

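Compute the Hilbert envelopes of the signal in a specific frequency band, using FFT.

Parameters:

frames (torch.Tensor) – A set of frames from a signal for which to compute envelopes.
center_freq (float) – The target frequency for the envelope.
bandwidth (float) – The size of the band to use for the envelope.
sample_rate (float) – The number of samples per second in the frame signals.

Returns:

envelopes – The computed envelopes.

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import compute_hilbert_envelopes
>>> audio = torch.rand(1, 10000)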
>>> frames = audio.unfold(-1, 300, 100)
>>> frames.shape
torch.Size([1, 98, 300])
>>> envelope = compute_hilbert_envelopes(frames, 1000)
>>> envelope.shape
torch.Size([1, 98, 300])
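
A band-limited Hilbert envelope can be obtained in the frequency domain: keep only the positive-frequency bins inside the band, double them, transform back, and take the magnitude. The sketch below does this for one frame with a naive DFT from the standard library. The rectangular band mask and the function names are assumptions; the library may shape the band differently.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def band_envelope(frame, center_freq, bandwidth=1000, sample_rate=10000):
    """Envelope of the part of `frame` that lies within the given band."""
    n = len(frame)
    spectrum = dft(frame)
    low, high = center_freq - bandwidth / 2, center_freq + bandwidth / 2
    analytic = [0j] * n
    # Positive-frequency bins only, excluding DC and Nyquist.
    for k in range(1, (n + 1) // 2):
        if low <= k * sample_rate / n <= high:
            analytic[k] = 2 * spectrum[k]
    return [abs(v) for v in dft(analytic, inverse=True)]


# A 1 kHz cosine sampled at 10 kHz, exactly 30 cycles in 300 samples.
frame = [math.cos(2 * math.pi * 1000 * t / 10000) for t in range(300)]
in_band = band_envelope(frame, center_freq=1000)
out_of_band = band_envelope(frame, center_freq=3000)
```

Because the tone has constant amplitude, its envelope in the band around 1 kHz is flat at 1, and the band around 3 kHz contains essentially nothing.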

speechbrain.processing.vocal_features.compute_cross_correlation(frames_a, frames_b, width=None)[source]

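Compute the correlation between two sets of frames.

Parameters:

frames_a (torch.Tensor)
frames_b (torch.Tensor) – The two sets of frames to compare using cross-correlation, shape [batch, frame, sample].
width (int, default is None) – The number of samples before and after 0 lag. A width of 3 returns 7 results. If None, 0 lag is placed first and the result has 1/2 the original length + 1 entries, a good default for autocorrelation because it contains no repeated values.

Returns:

The cross-correlation between frames_a and frames_b.

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.vocal_features import compute_cross_correlation
>>> frames = torch.arange(10).view(1, 1, -1).float()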
>>> compute_cross_correlation(frames, frames, width=3)
tensor([[[0.6316, 0.7193, 0.8421, 1.0000, 0.8421, 0.7193, 0.6316]]])
>>> compute_cross_correlation(frames, frames)
tensor([[[1.0000, 0.8421, 0.7193, 0.6316, 0.5789, 0.5614]]])
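
The values printed in the example above are consistent with a circular cross-correlation normalized by the geometric mean of the two frame energies. The pure-Python sketch below reproduces them for a single frame. The normalization for unequal inputs and the sign convention of the lag are assumptions inferred from this one example, not taken from the library source.

```python
import math


def circular_cross_correlation(a, b, width=None):
    """Normalized circular cross-correlation of two equal-length sequences."""
    n = len(a)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))

    def score(lag):
        # Wrap around the end of the frame instead of zero-padding.
        return sum(a[i] * b[(i + lag) % n] for i in range(n)) / norm

    # width=None: lags 0..n/2; otherwise lags -width..width centered on 0.
    lags = range(n // 2 + 1) if width is None else range(-width, width + 1)
    return [score(lag) for lag in lags]


frame = [float(i) for i in range(10)]
half = circular_cross_correlation(frame, frame)
centered = circular_cross_correlation(frame, frame, width=3)
```

With `width=None` the result has 10 // 2 + 1 = 6 entries starting at lag 0, and with `width=3` it has 2 * 3 + 1 = 7 entries symmetric around lag 0.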