speechbrain.lobes.models.g2p.homograph module
Tools for homograph disambiguation

Authors: Artem Ploujnikov 2021

Summary

Classes
SubsequenceExtractor: A utility class to help extract subsequences out of a batch of sequences
SubsequenceLoss: A loss function for a specific word in the output, used in the homograph disambiguation task
Reference
- class speechbrain.lobes.models.g2p.homograph.SubsequenceLoss(seq_cost, word_separator=0, word_separator_base=0)[source]
Bases: Module

A loss function for a specific word in the output, used in the homograph disambiguation task. The approach is as follows:

1. Arrange only the target words from the original batch into a single tensor
2. Find the word index of each target word
3. Compute the beginnings and endings of words in the predicted sequences

The assumption is that the model has been trained well enough to identify word boundaries with a simple argmax, without having to perform a beam search (see the sketch after the example below). Important: this loss is intended for fine-tuning only; the model should already be able to predict word boundaries correctly.
- Parameters:
  seq_cost (callable) – the loss to be applied to the extracted subsequences
  word_separator (int) – the index of the word separator token
  word_separator_base (int) – the index of the word separator used in unprocessed (base) targets, if different

Example
>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
>>> from speechbrain.nnet.losses import nll_loss
>>> loss = SubsequenceLoss(
...     seq_cost=nll_loss
... )
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> loss_value = loss(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
>>> loss_value
tensor(-0.8000)
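To make steps 2 and 3 above concrete, here is a minimal sketch of the boundary logic, an illustration rather than the class's internal code, using the default separator index 0 and a greedy argmax decoding:

>>> import torch
>>> preds = torch.tensor([[1, 2, 0, 1, 3, 0, 2, 1, 0]])  # argmax over p_seq (greedy decoding)
>>> separator_mask = preds == 0                           # word separator positions
>>> # The running count of separators acts as a word index at each position,
>>> # from which a target word's start and end can be located.
>>> separator_mask.cumsum(dim=-1)
tensor([[0, 0, 1, 1, 1, 2, 2, 2, 3]])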
- property word_separator
The word separator being used
- property word_separator_base
The word separator used in unprocessed (base) sequences
- forward(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_lens_base=None)[source]
Computes the subsequence loss
- Parameters:
  phns (torch.Tensor) – the phoneme tensor (batch x length)
  phn_lens (torch.Tensor) – the phoneme length tensor
  p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phonemes)
  subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)
  subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
  phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)
  phn_lens_base (torch.Tensor) – the phoneme lengths (not preprocessed)
- Returns:
  loss – the loss tensor
- Return type:
  torch.Tensor
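For orientation, the sketch below shows one way forward() could be wired into a fine-tuning step; model, optimizer, and the batch fields are hypothetical placeholders, not part of this module:

from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
from speechbrain.nnet.losses import nll_loss

loss_fn = SubsequenceLoss(seq_cost=nll_loss)

def finetune_step(model, optimizer, batch):
    # `model` and the batch fields are hypothetical placeholders
    p_seq = model(batch.graphemes)  # (batch x length x phonemes)
    loss = loss_fn(
        batch.phns,
        batch.phn_lens,
        p_seq,
        batch.subsequence_phn_start,
        batch.subsequence_phn_end,
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()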
- class speechbrain.lobes.models.g2p.homograph.SubsequenceExtractor(word_separator=0, word_separator_base=None)[source]
Bases: object

A utility class to help extract subsequences out of a batch of sequences
- Parameters:
  word_separator (int) – the index of the word separator token
  word_separator_base (int) – the word separator used in unprocessed (base) sequences; defaults to word_separator when None

Example
>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
>>> extractor = SubsequenceExtractor()
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> extractor.extract_seq(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
(tensor([[[0., 1., 0., 0.],
          [0., 0., 0., 1.],
          [0., 0., 0., 0.]],
         [[0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 0.]]]), tensor([[1., 3., 0.],
        [1., 2., 0.]]), tensor([0.6667, 1.0000]))
- extract_seq(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_base_lens=None)[source]
Extracts the subsequence from the complete sequences
- Parameters:
  phns (torch.Tensor) – the phoneme tensor (batch x length)
  phn_lens (torch.Tensor) – the phoneme length tensor
  p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phonemes)
  subsequence_phn_start (torch.Tensor) – the beginning of the target subsequence (i.e. the homograph)
  subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
  phns_base (torch.Tensor) – the phoneme tensor (not preprocessed)
  phn_base_lens (torch.Tensor) – the phoneme lengths (not preprocessed)
- Returns:
  p_seq_subsequence (torch.Tensor) – the output subsequence (of probabilities)
  phns_subsequence (torch.Tensor) – the target subsequence
  subsequence_lengths (torch.Tensor) – the subsequence lengths, expressed as a fraction of the tensor's last dimension
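The three return values line up with what a sequence cost expects; the following is a minimal self-contained sketch, roughly mirroring what SubsequenceLoss does internally (the one-hot p_seq stands in for real model probabilities):

import torch
from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
from speechbrain.nnet.losses import nll_loss

extractor = SubsequenceExtractor()
phns = torch.Tensor([[1, 2, 0, 1, 3, 0, 2, 1, 0]])
phn_lens = torch.IntTensor([8])
p_seq = torch.eye(4)[phns.long()]  # one-hot stand-in for model output (batch x length x phonemes)
start, end = torch.IntTensor([3]), torch.IntTensor([5])

p_seq_sub, phns_sub, sub_lens = extractor.extract_seq(
    phns, phn_lens, p_seq, start, end
)
loss = nll_loss(p_seq_sub, phns_sub, length=sub_lens)  # cost applied only to the target words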