speechbrain.processing.multi_mic module
Multi-microphone components.
This library contains functions for multi-microphone signal processing.
Example
>>> import torch
>>>
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, SrpPhat, Music
>>> from speechbrain.processing.multi_mic import DelaySum, Mvdr, Gev
>>>
>>> xs_speech = read_audio(
... 'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise_diff = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise_diff = xs_noise_diff.unsqueeze(0)
>>> xs_noise_loc = read_audio('tests/samples/multi-mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise_loc = xs_noise_loc.unsqueeze(0)
>>> fs = 16000 # sampling rate
>>> ss = xs_speech
>>> nn_diff = 0.05 * xs_noise_diff
>>> nn_loc = 0.05 * xs_noise_loc
>>> xs_diffused_noise = ss + nn_diff
>>> xs_localized_noise = ss + nn_loc
>>> # Delay-and-Sum Beamforming with GCC-PHAT localization
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs_diffused_noise)
>>> Ns = stft(nn_diff)
>>> XXs = cov(Xs)
>>> NNs = cov(Ns)
>>> tdoas = gccphat(XXs)
>>> Ys_ds = delaysum(Xs, tdoas)
>>> ys_ds = istft(Ys_ds)
>>> # Mvdr Beamforming with SRP-PHAT localization
>>> mvdr = Mvdr()
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> srpphat = SrpPhat(mics=mics)
>>> doas = srpphat(XXs)
>>> Ys_mvdr = mvdr(Xs, NNs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr = istft(Ys_mvdr)
>>> # Mvdr Beamforming with MUSIC localization
>>> music = Music(mics=mics)
>>> doas = music(XXs)
>>> Ys_mvdr2 = mvdr(Xs, NNs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr2 = istft(Ys_mvdr2)
>>> # GeV Beamforming
>>> gev = Gev()
>>> Xs = stft(xs_localized_noise)
>>> Ss = stft(ss)
>>> Ns = stft(nn_loc)
>>> SSs = cov(Ss)
>>> NNs = cov(Ns)
>>> Ys_gev = gev(Xs, SSs, NNs)
>>> ys_gev = istft(Ys_gev)
- Authors
William Aris
Francois Grondin
Summary
Classes

Covariance – Computes the covariance matrices of the signals.
DelaySum – Performs delay-and-sum beamforming by using the TDOAs and the first channel as a reference.
GccPhat – Generalized cross-correlation with phase transform (GCC-PHAT) localization.
Gev – Generalized eigenvalue decomposition (GEV) beamforming.
Music – Multiple signal classification (MUSIC) localization.
Mvdr – Performs minimum variance distortionless response (MVDR) beamforming by using an input signal in the frequency domain, its covariance matrices and TDOAs (to compute a steering vector).
SrpPhat – Steered response power with phase transform (SRP-PHAT) localization.

Functions

doas2taus – Converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples).
sphere – Generates cartesian coordinates (xyz) for a set of points forming a 3D sphere.
steering – Computes a steering vector by using the time differences of arrival for each channel (in samples) and the number of bins (n_fft).
tdoas2taus – Selects the TDOAs of each channel and puts them in a tensor.
Reference
- class speechbrain.processing.multi_mic.Covariance(average=True)[source]
Bases:
Module
Computes the covariance matrices of the signals.
- Parameters:
average (bool) – Informs the module if it should return an average (computed on the time dimension) of the covariance matrices. The default value is True.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> xs = xs_speech + 0.05 * xs_noise
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>>
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> XXs.shape
torch.Size([1, 1001, 201, 2, 10])
- forward(Xs)[source]
This method uses the utility function _cov to compute the covariance matrices. Therefore, the result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics + n_pairs).
The order on the last dimension corresponds to the triu_indices of a square matrix. For instance, with 4 channels, the order is: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, XXs[..., 0] corresponds to channels (0, 0) and XXs[..., 1] corresponds to channels (0, 1).
Parameters:
- Xs : torch.Tensor
A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
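The pair ordering on the last dimension can be reproduced with torch.triu_indices; a quick check for the 4-channel case described above:
>>> import torch
>>> idx = torch.triu_indices(4, 4)
>>> list(zip(idx[0].tolist(), idx[1].tolist()))
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]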
- class speechbrain.processing.multi_mic.DelaySum[source]
Bases:
Module
Performs delay-and-sum beamforming by using the TDOAs and the first channel as a reference.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>>
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys = delaysum(Xs, tdoas)
>>> ys = istft(Ys)
- forward(Xs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]
This method computes a steering vector from the TDOAs/DOAs and then calls the utility function _delaysum to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1).
- Parameters:
Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
localization_tensor (torch.Tensor) – A tensor containing either time differences of arrival (TDOAs) (in samples) for each time step or directions of arrival (DOAs) (xyz coordinates in meters). If localization_tensor represents TDOAs, its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, its format is (batch, time_steps, 3).
doa_mode (bool) – The user needs to set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is False.
mics (torch.Tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format: (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.
fs (int) – The sample rate in Hertz of the signals. This parameter is only mandatory when localization_tensor represents DOAs.
c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.
- Returns:
Ys
- Return type:
torch.Tensor
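For reference, a standard frequency-domain formulation of delay-and-sum (a sketch of the idea, not necessarily the exact code path of _delaysum): with steering components built from the delays, the output averages the re-aligned channels:

$$A_m[k] = e^{-j 2\pi k \tau_m / N}, \qquad Y[k] = \frac{1}{M} \sum_{m=1}^{M} A_m[k]^{*}\, X_m[k],$$

where N is the FFT size, M = n_mics, and \tau_m is the delay of channel m relative to the first channel.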
- class speechbrain.processing.multi_mic.Mvdr(eps=1e-20)[source]
Bases:
Module
Performs minimum variance distortionless response (MVDR) beamforming by using an input signal in the frequency domain, its covariance matrices and TDOAs (to compute a steering vector).
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, Mvdr
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> mvdr = Mvdr()
>>> istft = ISTFT(sample_rate=fs)
>>>
>>> Xs = stft(xs)
>>> Ns = stft(xs_noise)
>>> XXs = cov(Xs)
>>> NNs = cov(Ns)
>>> tdoas = gccphat(XXs)
>>> Ys = mvdr(Xs, NNs, tdoas)
>>> ys = istft(Ys)
- forward(Xs, NNs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]
This method computes a steering vector before performing beamforming with the utility function _mvdr. The result has the following format: (batch, time_step, n_fft, 2, 1).
- Parameters:
Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
NNs (torch.Tensor) – The covariance matrices of the noise signal. The tensor must have the following format: (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
localization_tensor (torch.Tensor) – A tensor containing either time differences of arrival (TDOAs) (in samples) for each time step or directions of arrival (DOAs) (xyz coordinates in meters). If localization_tensor represents TDOAs, its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, its format is (batch, time_steps, 3).
doa_mode (bool) – The user needs to set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is False.
mics (torch.Tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format: (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.
fs (int) – The sample rate in Hertz of the signals. This parameter is only mandatory when localization_tensor represents DOAs.
c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.
- Returns:
Ys
- Return type:
torch.Tensor
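For intuition, the textbook MVDR solution (a sketch of the criterion, not necessarily the exact code path of _mvdr): per frequency bin, the weights minimize the output noise power subject to a distortionless constraint in the steering direction,

$$\mathbf{w}[k] = \frac{\boldsymbol{\Phi}_{NN}^{-1}[k]\,\mathbf{a}[k]}{\mathbf{a}[k]^{H}\,\boldsymbol{\Phi}_{NN}^{-1}[k]\,\mathbf{a}[k]}, \qquad Y[k] = \mathbf{w}[k]^{H}\,\mathbf{X}[k],$$

where a[k] is the steering vector derived from the TDOAs/DOAs and \Phi_{NN} is the noise covariance (NNs).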
- class speechbrain.processing.multi_mic.Gev[source]
Bases:
Module
Generalized eigenvalue decomposition (GEV) beamforming.
Example
>>> from speechbrain.dataio.dataio import read_audio
>>> import torch
>>>
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Gev
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> fs = 16000
>>> ss = xs_speech
>>> nn = 0.05 * xs_noise
>>> xs = ss + nn
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gev = Gev()
>>> istft = ISTFT(sample_rate=fs)
>>>
>>> Ss = stft(ss)
>>> Nn = stft(nn)
>>> Xs = stft(xs)
>>>
>>> SSs = cov(Ss)
>>> NNs = cov(Nn)
>>>
>>> Ys = gev(Xs, SSs, NNs)
>>> ys = istft(Ys)
- forward(Xs, SSs, NNs)[source]
This method performs generalized eigenvalue decomposition beamforming with the utility function _gev. Therefore, the result has the following format: (batch, time_step, n_fft, 2, 1).
- Parameters:
Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
SSs (torch.Tensor) – The covariance matrices of the target signal. The tensor must have the following format: (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
NNs (torch.Tensor) – The covariance matrices of the noise signal. The tensor must have the following format: (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
- Returns:
Ys
- Return type:
torch.Tensor
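For intuition, the classical GEV criterion (a sketch, not necessarily the exact implementation): per frequency bin, the weights maximize the output signal-to-noise ratio,

$$\mathbf{w} = \arg\max_{\mathbf{w}} \frac{\mathbf{w}^{H}\boldsymbol{\Phi}_{SS}\,\mathbf{w}}{\mathbf{w}^{H}\boldsymbol{\Phi}_{NN}\,\mathbf{w}},$$

i.e. the principal eigenvector of the generalized eigenvalue problem $\boldsymbol{\Phi}_{SS}\mathbf{w} = \lambda\,\boldsymbol{\Phi}_{NN}\mathbf{w}$, built from the speech (SSs) and noise (NNs) covariance matrices.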
- class speechbrain.processing.multi_mic.GccPhat(tdoa_max=None, eps=1e-20)[source]
Bases:
Module
Generalized cross-correlation with phase transform (GCC-PHAT) localization.
- Parameters:
tdoa_max (int) – Specifies a range to search for delays. For instance, if tdoa_max = 10, the search is restricted to delays between -10 and 10 samples. This parameter is optional and its default value is None, in which case the full range allowed by the FFT size is searched.
eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
- forward(XXs)[source]
Performs generalized cross-correlation with phase transform localization by using the utility function _gcc_phat, extracting the delays (in samples), and then applying quadratic interpolation to improve the accuracy. The result has the format: (batch, time_steps, n_mics + n_pairs).
The order on the last dimension corresponds to the triu_indices of a square matrix. For instance, with 4 channels, the order is: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, delays[..., 0] corresponds to channels (0, 0) and delays[..., 1] corresponds to channels (0, 1).
Parameters:
- XXs : torch.Tensor
The covariance matrices of the input signal. The tensor must have the following format: (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
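As a point of reference, the textbook GCC-PHAT pipeline (a sketch of the technique, not necessarily the exact code path of _gcc_phat): the cross-spectrum of a channel pair (i, j) is whitened by its magnitude,

$$\hat{R}_{ij}[k] = \frac{X_i[k]\, X_j^{*}[k]}{\lvert X_i[k]\, X_j^{*}[k] \rvert + \epsilon},$$

the delay is the argmax of its inverse transform $r_{ij}$, and the integer peak $d$ is refined by quadratic (parabolic) interpolation:

$$\hat{\delta}_{ij} = d + \frac{r_{ij}[d-1] - r_{ij}[d+1]}{2\left(r_{ij}[d-1] - 2\,r_{ij}[d] + r_{ij}[d+1]\right)}.$$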
- class speechbrain.processing.multi_mic.SrpPhat(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20)[source]
Bases:
Module
Steered response power with phase transform (SRP-PHAT) localization.
- Parameters:
mics (torch.Tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format: (n_mics, 3).
space (string) – If this parameter is set to 'sphere', the localization is done in 3D by searching over a sphere of possible DOAs. If it is set to 'circle', the search is done in 2D over a circle. By default, this parameter is set to 'sphere'. Note: the 'circle' option is not implemented yet.
sample_rate (int) – The sample rate in Hertz of the signals to perform SRP-PHAT on. By default, this parameter is set to 16000 Hz.
speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.
eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import SrpPhat
>>>
>>> xs_speech = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1,ss2), dim=0)
>>> ns = torch.cat((ns1,ns2), dim=0)
>>> xs = torch.cat((xs1,xs2), dim=0)
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> srpphat = SrpPhat(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = srpphat(XXs)
- forward(XXs)[source]
Performs SRP-PHAT localization on the signals by computing steering vectors and then extracting the DOAs with the utility function _srp_phat. The result is a tensor containing the directions of arrival (xyz coordinates in meters, pointing towards the sound source). The output tensor has the format (batch, time_steps, 3).
This localization method uses the Global Coherence Field (GCF): https://www.researchgate.net/publication/221491705_Speaker_localization_based_on_oriented_global_coherence_field
- Parameters:
XXs (torch.Tensor) – The covariance matrices of the input signal. The tensor must have the following format: (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
- Returns:
doas
- Return type:
torch.Tensor
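For intuition, the steered response power for a candidate DOA q (a sketch of the criterion, assuming PHAT-whitened cross-spectra $\hat{R}_{ij}$ and per-channel delays $\tau$ obtained as in doas2taus):

$$E(\mathbf{q}) = \sum_{(i,j)} \sum_{k} \mathrm{Re}\!\left\{ \hat{R}_{ij}[k]\; e^{\,j 2\pi k (\tau_j(\mathbf{q}) - \tau_i(\mathbf{q}))/N} \right\}, \qquad \hat{\mathbf{q}} = \arg\max_{\mathbf{q}} E(\mathbf{q}),$$

with the maximization carried out over the grid of candidate points generated by sphere().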
- class speechbrain.processing.multi_mic.Music(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20, n_sig=1)[source]
Bases:
Module
Multiple signal classification (MUSIC) localization.
- Parameters:
mics (torch.Tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format: (n_mics, 3).
space (string) – If this parameter is set to 'sphere', the localization is done in 3D by searching over a sphere of possible DOAs. If it is set to 'circle', the search is done in 2D over a circle. By default, this parameter is set to 'sphere'. Note: the 'circle' option is not implemented yet.
sample_rate (int) – The sample rate in Hertz of the signals to perform MUSIC on. By default, this parameter is set to 16000 Hz.
speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.
eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.
n_sig (int) – An estimate of the number of sound sources. The default value is one source.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Music
>>>
>>> xs_speech = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1,ss2), dim=0)
>>> ns = torch.cat((ns1,ns2), dim=0)
>>> xs = torch.cat((xs1,xs2), dim=0)
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> music = Music(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = music(XXs)
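For intuition, the classical MUSIC criterion (a sketch of the technique, not necessarily the exact implementation): an eigendecomposition of the covariance matrix splits it into a signal subspace spanned by the n_sig dominant eigenvectors and a noise subspace $\mathbf{E}_N$ spanned by the rest; the pseudospectrum

$$P(\mathbf{q}) = \frac{1}{\mathbf{a}(\mathbf{q})^{H}\, \mathbf{E}_N \mathbf{E}_N^{H}\, \mathbf{a}(\mathbf{q})}$$

peaks where the steering vector a(q) is orthogonal to the noise subspace, and the DOA estimate is its argmax over the candidate grid.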
- speechbrain.processing.multi_mic.doas2taus(doas, mics, fs, c=343.0)[source]
This function converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples). The result has the following format: (batch, time_steps, n_mics).
- Parameters:
doas (torch.Tensor) – The directions of arrival expressed as cartesian coordinates (xyz) in meters. The tensor must have the following format: (batch, time_steps, 3).
mics (torch.Tensor) – The cartesian position (xyz) in meters of each microphone. The tensor must have the following format: (n_mics, 3).
fs (int) – The sample rate in Hertz of the signals.
c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.
- Returns:
taus
- Return type:
torch.Tensor
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.multi_mic import sphere, doas2taus
>>>
>>> xs = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs = xs.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> doas = sphere()
>>> taus = doas2taus(doas, mics, fs)
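Under the far-field (plane-wave) assumption implied by unit-radius DOA vectors such as those from sphere(), the conversion reduces to a scaled projection of each DOA onto the microphone positions (a sketch, not necessarily the exact implementation):

$$\tau_m = \frac{f_s}{c}\, \mathbf{u} \cdot \mathbf{p}_m,$$

where u is the DOA unit vector and p_m the position of microphone m; in tensor form this amounts to (fs / c) * torch.matmul(doas, mics.T).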
- speechbrain.processing.multi_mic.tdoas2taus(tdoas)[source]
This function selects the TDOAs of each channel and puts them in a tensor. The result has the following format: (batch, time_steps, n_mics).
- Parameters:
tdoas (torch.Tensor) – The time differences of arrival (TDOAs) (in samples) for each time step. The tensor has the format (batch, time_steps, n_mics + n_pairs).
- Returns:
taus
- Return type:
torch.Tensor
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0)
>>> fs = 16000
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>>
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
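Given the triu ordering documented for Covariance and GccPhat, the pairs involving the reference channel 0, i.e. (0, 0), (0, 1), ..., (0, n_mics - 1), occupy the first n_mics slots of the last dimension, so the selection plausibly reduces to a slice (a sketch continuing the example above, not necessarily the exact implementation):
>>> n_mics = 4
>>> taus_sketch = tdoas[..., :n_mics]  # same shape as taus: (batch, time_steps, n_mics)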
- speechbrain.processing.multi_mic.steering(taus, n_fft)[source]
This function computes a steering vector by using the time differences of arrival for each channel (in samples) and the number of bins (n_fft). The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
Parameters:
- taus : torch.Tensor
The time differences of arrival for each channel. The tensor must have the following format: (batch, time_steps, n_mics).
- n_fft : int
The number of bins resulting from the STFT. It is assumed that the parameter "onesided" of the STFT was set to True.
Example
>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus, steering
>>>
>>> xs_speech = read_audio(
...     'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>>
>>> Xs = stft(xs)
>>> n_fft = Xs.shape[2]
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
>>> As = steering(taus, n_fft)
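The underlying relation is the standard one-sided steering vector (a sketch, assuming the formats above): for bin k = 0, ..., N/2 and channel m,

$$A_m[k] = e^{-j 2\pi k \tau_m / N},$$

with the real and imaginary parts stored along the size-2 dimension of the output.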
- speechbrain.processing.multi_mic.sphere(levels_count=4)[source]
This function generates cartesian coordinates (xyz) for a set of points forming a 3D sphere. The coordinates are expressed in meters and can be used as DOAs. The result has the following format: (n_points, 3).
- Parameters:
levels_count (int) –
A number proportional to the number of points that the user wants to generate.
If levels_count = 1, the sphere will have 42 points.
If levels_count = 2, the sphere will have 162 points.
If levels_count = 3, the sphere will have 642 points.
If levels_count = 4, the sphere will have 2562 points.
If levels_count = 5, the sphere will have 10242 points.
...
By default, levels_count is set to 4.
- Returns:
pts – The list of xyz points on the sphere.
- Return type:
torch.Tensor
Example
>>> import torch
>>> from speechbrain.processing.multi_mic import sphere
>>> doas = sphere()
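The point counts listed above follow the subdivided-icosahedron pattern, so the grid size can be computed directly from the parameter:

$$n_{\text{points}} = 10 \cdot 4^{\,\text{levels\_count}} + 2,$$

which gives 42, 162, 642, 2562 and 10242 points for levels_count = 1 to 5.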