speechbrain.decoders.scorer module
Token scorer abstractions and specifications.
- Authors
Adel Moumen 2022, 2023
Sung-Lin Yeh 2021
Summary
Classes
- BaseRescorerInterface: A scorer abstraction to be inherited by other rescoring approaches for beam search.
- BaseScorerInterface: A scorer abstraction to be inherited by other scoring approaches for beam search.
- CTCScorer: A wrapper of CTCPrefixScore based on the BaseScorerInterface.
- CoverageScorer: A coverage penalty scorer to prevent looping of hypotheses, where `coverage` is the cumulated attention probability vector.
- HuggingFaceLMRescorer: A wrapper of HuggingFace's TransformerLM based on the BaseRescorerInterface.
- KenLMScorer: KenLM N-gram scorer.
- LengthScorer: A length rewarding scorer.
- RNNLMRescorer: A wrapper of RNNLM based on the BaseRescorerInterface.
- RNNLMScorer: A wrapper of RNNLM based on the BaseScorerInterface.
- RescorerBuilder: Builds a rescorer instance for beam search.
- ScorerBuilder: Builds a scorer instance for beam search.
- TransformerLMRescorer: A wrapper of TransformerLM based on the BaseRescorerInterface.
- TransformerLMScorer: A wrapper of TransformerLM based on the BaseScorerInterface.
Reference
- class speechbrain.decoders.scorer.BaseScorerInterface[source]
Bases:
object
A scorer abstraction to be inherited by other scoring approaches for beam search.
A scorer is a module that scores tokens in the vocabulary based on the current timestep input and the previous scorer states. It can be used to score the full vocabulary (i.e., full scorers) or a pruned subset of tokens (i.e., partial scorers) to avoid computational overhead. In the latter case, the partial scorers are called after the full scorers; they only score the top-k candidate tokens (i.e., the pruned subset) extracted from the full scorers. The top-k candidates are extracted based on the beam size and the scorer_beam_scale, so that the number of candidates is int(beam_size * scorer_beam_scale). This is useful when the full scorers are computationally expensive (e.g., the KenLM scorer).
Inherit this class to implement your own scorer compatible with speechbrain.decoders.seq2seq.S2SBeamSearcher().
- See also
speechbrain.decoders.scorer.CTCPrefixScorer
speechbrain.decoders.scorer.RNNLMScorer
speechbrain.decoders.scorer.TransformerLMScorer
speechbrain.decoders.scorer.KenLMScorer
speechbrain.decoders.scorer.CoverageScorer
speechbrain.decoders.scorer.LengthScorer
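As an illustration, here is a minimal sketch of a custom full scorer (the UniformScorer below is hypothetical, not part of SpeechBrain; it assigns the same log-probability to every token, so it leaves the beam ranking unchanged):

import math
import torch
from speechbrain.decoders.scorer import BaseScorerInterface

class UniformScorer(BaseScorerInterface):
    """Hypothetical scorer: gives every token log(1/vocab_size)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def score(self, inp_tokens, memory, candidates, attn):
        # inp_tokens: (batch_size x beam_size,)
        n_bh = inp_tokens.size(0)
        scores = torch.full(
            (n_bh, self.vocab_size),
            -math.log(self.vocab_size),
            device=inp_tokens.device,
        )
        return scores, None  # stateless: no memory to carry over

    def permute_mem(self, memory, index):
        # nothing to reorder when beams are permuted
        return memory

    def reset_mem(self, x, enc_lens):
        # nothing to initialize at the start of decoding
        return None

Following the naming convention of the built-in scorers, the matching weight key in ScorerBuilder would presumably be 'uniform' (the class name lowercased, without the 'Scorer' suffix).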
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the information of the current timestep.
A score is a tensor of shape (batch_size x beam_size, vocab_size). It is the log probability of the next token given the current timestep input and the previous scorer states.
It can score the pruned top-k candidates to prevent computational overhead, or score the full vocabulary when candidates is None.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
torch.Tensor – (batch_size x beam_size, vocab_size), the scores of the next tokens.
memory (No limit) – The memory variables input for this timestep.
- class speechbrain.decoders.scorer.CTCScorer(ctc_fc, blank_index, eos_index, ctc_window_size=0)[source]
-
A wrapper of CTCPrefixScore based on the BaseScorerInterface.
This scorer is used to provide the CTC label-synchronous scores of the next input tokens. The implementation is based on https://www.merl.com/publications/docs/TR2017-190.pdf.
- See also
speechbrain.decoders.scorer.CTCPrefixScore
- Parameters:
ctc_fc (torch.nn.Module) – An output linear layer for CTC.
blank_index (int) – The index of the blank token.
eos_index (int) – The index of the end-of-sequence (eos) token.
ctc_window_size (int) – Compute the CTC scores over the time frames using windowing based on attention peaks. If 0, no windowing is applied. (default: 0)
Example
>>> import torch
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR
>>> from speechbrain.decoders import S2STransformerBeamSearcher, CTCScorer, ScorerBuilder
>>> batch_size=8
>>> n_channels=6
>>> input_size=40
>>> d_model=128
>>> tgt_vocab=140
>>> src = torch.rand([batch_size, n_channels, input_size])
>>> tgt = torch.randint(0, tgt_vocab, [batch_size, n_channels])
>>> net = TransformerASR(
...     tgt_vocab, input_size, d_model, 8, 1, 1, 1024, activation=torch.nn.GELU
... )
>>> ctc_lin = Linear(input_shape=(1, 40, d_model), n_neurons=tgt_vocab)
>>> lin = Linear(input_shape=(1, 40, d_model), n_neurons=tgt_vocab)
>>> eos_index = 2
>>> ctc_scorer = CTCScorer(
...     ctc_fc=ctc_lin,
...     blank_index=0,
...     eos_index=eos_index,
... )
>>> scorer = ScorerBuilder(
...     full_scorers=[ctc_scorer],
...     weights={'ctc': 1.0}
... )
>>> searcher = S2STransformerBeamSearcher(
...     modules=[net, lin],
...     bos_index=1,
...     eos_index=eos_index,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     using_eos_threshold=False,
...     beam_size=7,
...     temperature=1.15,
...     scorer=scorer
... )
>>> enc, dec = net.forward(src, tgt)
>>> hyps, _, _, _ = searcher(enc, torch.ones(batch_size))
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the CTC scores computed over the time frames.
- See also
speechbrain.decoders.scorer.CTCPrefixScore
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
scores (torch.Tensor)
memory
- class speechbrain.decoders.scorer.RNNLMScorer(language_model, temperature=1.0)[source]
-
A wrapper of RNNLM based on the BaseScorerInterface.
The RNNLMScorer is used to provide the RNNLM scores of the next input tokens based on the current timestep input and the previous scorer states.
- Parameters:
language_model (torch.nn.Module) – An RNN-based language model.
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being smoother with T>1 and sharper with T<1. (default: 1.0)
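To make the effect of temperature concrete, here is a small standalone sketch (the logits are illustrative values, not from a real model):

import torch
logits = torch.tensor([2.0, 1.0, 0.1])
for T in (0.5, 1.0, 2.0):
    # dividing logits by T before the softmax reshapes the distribution
    probs = torch.softmax(logits / T, dim=-1)
    print(T, probs)  # T>1 flattens the distribution, T<1 sharpens it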
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
>>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> emb = torch.nn.Embedding(
...     embedding_dim=input_size,
...     num_embeddings=vocab_size,
... )
>>> d_model=7
>>> dec = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="content",
...     hidden_size=3,
...     attn_dim=3,
...     num_layers=1,
...     enc_dim=d_model,
...     input_size=input_size,
... )
>>> n_channels=3
>>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
>>> lm_weight = 0.4
>>> lm_model = RNNLM(
...     embedding_dim=d_model,
...     output_neurons=vocab_size,
...     dropout=0.0,
...     rnn_neurons=128,
...     dnn_neurons=64,
...     return_hidden=True,
... )
>>> rnnlm_scorer = RNNLMScorer(
...     language_model=lm_model,
...     temperature=1.25,
... )
>>> scorer = ScorerBuilder(
...     full_scorers=[rnnlm_scorer],
...     weights={'rnnlm': lm_weight}
... )
>>> beam_size=5
>>> searcher = S2SRNNBeamSearcher(
...     embedding=emb,
...     decoder=dec,
...     linear=seq_lin,
...     bos_index=1,
...     eos_index=2,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     topk=2,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.25,
...     scorer=scorer
... )
>>> batch_size=2
>>> enc = torch.rand([batch_size, n_channels, d_model])
>>> wav_len = torch.ones([batch_size])
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the RNNLM scores computed over the previous tokens.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
log_probs (torch.Tensor) – The output probabilities.
hs (torch.Tensor) – The LM hidden states.
- class speechbrain.decoders.scorer.TransformerLMScorer(language_model, temperature=1.0)[source]
-
A wrapper of TransformerLM based on the BaseScorerInterface.
The TransformerLMScorer is used to provide the TransformerLM scores of the next input tokens based on the current timestep input and the previous scorer states.
- Parameters:
language_model (torch.nn.Module) – A Transformer-based language model.
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being smoother with T>1 and sharper with T<1. (default: 1.0)
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> from speechbrain.decoders import S2STransformerBeamSearcher, TransformerLMScorer, CTCScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> d_model=128
>>> net = TransformerASR(
...     tgt_vocab=vocab_size,
...     input_size=input_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=1,
...     d_ffn=256,
...     activation=torch.nn.GELU
... )
>>> lm_model = TransformerLM(
...     vocab=vocab_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=0,
...     d_ffn=256,
...     activation=torch.nn.GELU,
... )
>>> n_channels=6
>>> ctc_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> seq_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> eos_index = 2
>>> ctc_scorer = CTCScorer(
...     ctc_fc=ctc_lin,
...     blank_index=0,
...     eos_index=eos_index,
... )
>>> transformerlm_scorer = TransformerLMScorer(
...     language_model=lm_model,
...     temperature=1.15,
... )
>>> ctc_weight_decode=0.4
>>> lm_weight=0.6
>>> scorer = ScorerBuilder(
...     full_scorers=[transformerlm_scorer, ctc_scorer],
...     weights={'transformerlm': lm_weight, 'ctc': ctc_weight_decode}
... )
>>> beam_size=5
>>> searcher = S2STransformerBeamSearcher(
...     modules=[net, seq_lin],
...     bos_index=1,
...     eos_index=eos_index,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.15,
...     scorer=scorer
... )
>>> batch_size=2
>>> wav_len = torch.ones([batch_size])
>>> src = torch.rand([batch_size, n_channels, input_size])
>>> tgt = torch.randint(0, vocab_size, [batch_size, n_channels])
>>> enc, dec = net.forward(src, tgt)
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the TransformerLM scores computed over the previous tokens.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
log_probs (torch.Tensor)
memory
- class speechbrain.decoders.scorer.KenLMScorer(lm_path, vocab_size, token_list)[source]
-
KenLM N-gram scorer.
This scorer is based on KenLM, a fast and efficient N-gram language model toolkit. It is used to provide the N-gram scores of the next input tokens.
This scorer depends on the KenLM package, which can be installed with the following command:
> pip install https://github.com/kpu/kenlm/archive/master.zip
- Parameters:
lm_path (str) – The path of the n-gram model.
vocab_size (int) – The total number of tokens.
token_list (list) – The token set.
Note: the KenLM scorer is computationally expensive. It is recommended to use it as a partial scorer, scoring the top-k candidates instead of the full vocabulary set.
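For intuition, KenLM itself can also be queried directly; a minimal sketch (the model path and the tokenized sentence are placeholders, and the kenlm package is assumed installed as above):

import kenlm

model = kenlm.Model("path/to/kenlm_model.arpa")  # or a .bin binary model
# KenLM returns a log10 probability for a whitespace-tokenized sentence.
print(model.score("a b c", bos=True, eos=True))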
Example
# >>> from speechbrain.nnet.linear import Linear
# >>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
# >>> from speechbrain.decoders import S2SRNNBeamSearcher, KenLMScorer, ScorerBuilder
# >>> input_size=17
# >>> vocab_size=11
# >>> lm_path='path/to/kenlm_model.arpa' # or .bin
# >>> token_list=['<pad>', '<bos>', '<eos>', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
# >>> emb = torch.nn.Embedding(
# ...     embedding_dim=input_size,
# ...     num_embeddings=vocab_size,
# ... )
# >>> d_model=7
# >>> dec = AttentionalRNNDecoder(
# ...     rnn_type="gru",
# ...     attn_type="content",
# ...     hidden_size=3,
# ...     attn_dim=3,
# ...     num_layers=1,
# ...     enc_dim=d_model,
# ...     input_size=input_size,
# ... )
# >>> n_channels=3
# >>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
# >>> kenlm_weight = 0.4
# >>> kenlm_model = KenLMScorer(
# ...     lm_path=lm_path,
# ...     vocab_size=vocab_size,
# ...     token_list=token_list,
# ... )
# >>> scorer = ScorerBuilder(
# ...     full_scorers=[kenlm_model],
# ...     weights={'kenlm': kenlm_weight}
# ... )
# >>> beam_size=5
# >>> searcher = S2SRNNBeamSearcher(
# ...     embedding=emb,
# ...     decoder=dec,
# ...     linear=seq_lin,
# ...     bos_index=1,
# ...     eos_index=2,
# ...     min_decode_ratio=0.0,
# ...     max_decode_ratio=1.0,
# ...     topk=2,
# ...     using_eos_threshold=False,
# ...     beam_size=beam_size,
# ...     temperature=1.25,
# ...     scorer=scorer
# ... )
# >>> batch_size=2
# >>> enc = torch.rand([batch_size, n_channels, d_model])
# >>> wav_len = torch.ones([batch_size])
# >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the n-gram scores.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
scores (torch.Tensor)
(new_memory, new_scoring_table) (tuple)
- class speechbrain.decoders.scorer.CoverageScorer(vocab_size, threshold=0.5)[source]
-
A coverage penalty scorer to prevent looping of hypotheses, where `coverage` is the cumulated attention probability vector.
Reference: https://arxiv.org/pdf/1612.02695.pdf
- Parameters:
vocab_size (int) – The total number of tokens.
threshold (float) – The penalty increases when the coverage of a frame exceeds this threshold. (default: 0.5)
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
>>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, CoverageScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> emb = torch.nn.Embedding(
...     num_embeddings=vocab_size,
...     embedding_dim=input_size
... )
>>> d_model=7
>>> dec = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="content",
...     hidden_size=3,
...     attn_dim=3,
...     num_layers=1,
...     enc_dim=d_model,
...     input_size=input_size,
... )
>>> n_channels=3
>>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
>>> lm_weight = 0.4
>>> coverage_penalty = 1.0
>>> lm_model = RNNLM(
...     embedding_dim=d_model,
...     output_neurons=vocab_size,
...     dropout=0.0,
...     rnn_neurons=128,
...     dnn_neurons=64,
...     return_hidden=True,
... )
>>> rnnlm_scorer = RNNLMScorer(
...     language_model=lm_model,
...     temperature=1.25,
... )
>>> coverage_scorer = CoverageScorer(vocab_size=vocab_size)
>>> scorer = ScorerBuilder(
...     full_scorers=[rnnlm_scorer, coverage_scorer],
...     weights={'rnnlm': lm_weight, 'coverage': coverage_penalty}
... )
>>> beam_size=5
>>> searcher = S2SRNNBeamSearcher(
...     embedding=emb,
...     decoder=dec,
...     linear=seq_lin,
...     bos_index=1,
...     eos_index=2,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     topk=2,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.25,
...     scorer=scorer
... )
>>> batch_size=2
>>> enc = torch.rand([batch_size, n_channels, d_model])
>>> wav_len = torch.ones([batch_size])
>>> hyps, _, _, _ = searcher(enc, wav_len)
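For intuition, here is a sketch of the coverage idea from the referenced paper (not SpeechBrain's exact implementation): frames whose cumulated attention exceeds the threshold count as covered, so beams that keep re-attending the same frames stop gaining coverage.

import torch

def coverage_count(cum_attn, threshold=0.5):
    # cum_attn: (batch_size x beam_size, n_frames), attention summed over steps
    return (cum_attn > threshold).float().sum(dim=-1)

# toy check: a beam that spreads attention covers more frames than one that loops
spread = torch.tensor([[0.9, 0.8, 0.7]])  # attends a new frame at each step
looped = torch.tensor([[2.4, 0.0, 0.0]])  # keeps attending frame 0
print(coverage_count(spread), coverage_count(looped))  # tensor([3.]) tensor([1.])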
- score(inp_tokens, coverage, candidates, attn)[source]
This method scores the new beams based on the coverage scorer.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
coverage (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
score (torch.Tensor)
coverage
- class speechbrain.decoders.scorer.LengthScorer(vocab_size)[source]
-
A length rewarding scorer.
The LengthScorer is used to provide a length reward score. It prevents beam search from favoring short hypotheses.
Note: length_normalization is not compatible with this scorer. Make sure to set it to False when using LengthScorer.
- Parameters:
vocab_size (int) – The total number of tokens.
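The reward itself is simple; a sketch of the idea (illustrative, not SpeechBrain's exact implementation): every candidate token receives the same constant bonus at each step, so the accumulated reward grows linearly with hypothesis length and is scaled by the 'length' weight in ScorerBuilder.

import torch

def length_reward(inp_tokens, vocab_size):
    # one unit of reward per decoding step, for every candidate token
    n_bh = inp_tokens.size(0)  # batch_size x beam_size
    return torch.ones(n_bh, vocab_size, device=inp_tokens.device)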
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
>>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, CoverageScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> emb = torch.nn.Embedding(
...     num_embeddings=vocab_size,
...     embedding_dim=input_size
... )
>>> d_model=7
>>> dec = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="content",
...     hidden_size=3,
...     attn_dim=3,
...     num_layers=1,
...     enc_dim=d_model,
...     input_size=input_size,
... )
>>> n_channels=3
>>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
>>> lm_weight = 0.4
>>> length_weight = 1.0
>>> lm_model = RNNLM(
...     embedding_dim=d_model,
...     output_neurons=vocab_size,
...     dropout=0.0,
...     rnn_neurons=128,
...     dnn_neurons=64,
...     return_hidden=True,
... )
>>> rnnlm_scorer = RNNLMScorer(
...     language_model=lm_model,
...     temperature=1.25,
... )
>>> length_scorer = LengthScorer(vocab_size=vocab_size)
>>> scorer = ScorerBuilder(
...     full_scorers=[rnnlm_scorer, length_scorer],
...     weights={'rnnlm': lm_weight, 'length': length_weight}
... )
>>> beam_size=5
>>> searcher = S2SRNNBeamSearcher(
...     embedding=emb,
...     decoder=dec,
...     linear=seq_lin,
...     bos_index=1,
...     eos_index=2,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     topk=2,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.25,
...     length_normalization=False,
...     scorer=scorer
... )
>>> batch_size=2
>>> enc = torch.rand([batch_size, n_channels, d_model])
>>> wav_len = torch.ones([batch_size])
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the length scorer.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidate tokens to score after the full scorers. If None, the scorer scores the full vocabulary.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
- Returns:
torch.Tensor – The scores.
None
- class speechbrain.decoders.scorer.ScorerBuilder(weights={}, full_scorers=[], partial_scorers=[], scorer_beam_scale=2)[source]
Bases:
object
Builds a scorer instance for beam search.
The ScorerBuilder class is responsible for building the scorer instance for beam search. It takes weights for full and partial scorers, along with instances of the full and partial scorer classes. It combines the scorers according to the specified weights and provides methods for scoring tokens, permuting scorer memory, and resetting scorer memory.
See speechbrain.decoders.seq2seq.S2SBeamSearcher()
- Parameters:
weights (dict) – Weights of the specified full/partial scorers.
full_scorers (list) – Scorers that score over the full vocabulary set.
partial_scorers (list) – Scorers that score over the pruned tokens to prevent computational overhead. Partial scorers are called after the full scorers.
scorer_beam_scale (float) – Scale that determines the number of pruned tokens: int(beam_size * scorer_beam_scale). (default: 2)
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> from speechbrain.decoders import S2STransformerBeamSearcher, TransformerLMScorer, CoverageScorer, CTCScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> d_model=128
>>> net = TransformerASR(
...     tgt_vocab=vocab_size,
...     input_size=input_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=1,
...     d_ffn=256,
...     activation=torch.nn.GELU
... )
>>> lm_model = TransformerLM(
...     vocab=vocab_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=0,
...     d_ffn=256,
...     activation=torch.nn.GELU,
... )
>>> n_channels=6
>>> ctc_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> seq_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> eos_index = 2
>>> ctc_scorer = CTCScorer(
...     ctc_fc=ctc_lin,
...     blank_index=0,
...     eos_index=eos_index,
... )
>>> transformerlm_scorer = TransformerLMScorer(
...     language_model=lm_model,
...     temperature=1.15,
... )
>>> coverage_scorer = CoverageScorer(vocab_size=vocab_size)
>>> ctc_weight_decode=0.4
>>> lm_weight=0.6
>>> coverage_penalty = 1.0
>>> scorer = ScorerBuilder(
...     full_scorers=[transformerlm_scorer, coverage_scorer],
...     partial_scorers=[ctc_scorer],
...     weights={'transformerlm': lm_weight, 'ctc': ctc_weight_decode, 'coverage': coverage_penalty}
... )
>>> beam_size=5
>>> searcher = S2STransformerBeamSearcher(
...     modules=[net, seq_lin],
...     bos_index=1,
...     eos_index=eos_index,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     topk=3,
...     temperature=1.15,
...     scorer=scorer
... )
>>> batch_size=2
>>> wav_len = torch.ones([batch_size])
>>> src = torch.rand([batch_size, n_channels, input_size])
>>> tgt = torch.randint(0, vocab_size, [batch_size, n_channels])
>>> enc, dec = net.forward(src, tgt)
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, attn, log_probs, beam_size)[source]
This method scores the tokens in the vocabulary based on the defined full and partial scorers. The scores are added to the log-probs used by beam search.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (dict[str, scorer states]) – The scorer states for this timestep.
attn (torch.Tensor) – The attention weights, used in CoverageScorer or CTCScorer.
log_probs (torch.Tensor) – (batch_size x beam_size, vocab_size). The log-probs at the current timestep.
beam_size (int) – The beam size.
- Returns:
log_probs (torch.Tensor) – (batch_size x beam_size, vocab_size). The log-probs updated by the scorers.
new_memory (dict[str, scorer states]) – The updated states of the scorers.
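Conceptually, the combination for the full scorers looks like the following sketch (illustrative, not the actual implementation; the argument names mirror the method signature):

def combine_full_scorers(log_probs, inp_tokens, memory, attn, full_scorers, weights):
    # full_scorers: dict mapping scorer names to BaseScorerInterface instances
    for name, scorer in full_scorers.items():
        scores, memory[name] = scorer.score(inp_tokens, memory[name], None, attn)
        log_probs = log_probs + weights[name] * scores  # (batch x beam, vocab)
    # partial scorers are then called on the top-k candidates extracted from
    # the updated log_probs, and their weighted scores are added the same way
    return log_probs, memory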
- class speechbrain.decoders.scorer.BaseRescorerInterface[source]
-
A scorer abstraction to be inherited by other rescoring approaches for beam search.
In this approach, a neural network assigns scores to potential text transcripts: the beam search decoding process produces a collection of top-K hypotheses, and these candidates are then sent to a language model (LM) for ranking. The LM assigns a score to each candidate.
The final score is computed as follows:
score = beam_search_score + lm_weight * rescorer_score
- See also
speechbrain.decoders.scorer.RNNLMRescorer
speechbrain.decoders.scorer.TransformerLMRescorer
speechbrain.decoders.scorer.HuggingFaceLMRescorer
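A numeric sketch of this combination (all values are hypothetical):

beam_search_scores = [-2.0, -2.1, -2.5]  # scores of the top-K hypotheses
rescorer_scores = [-17.9, -26.1, -25.1]  # log-probs assigned by the LM
lm_weight = 0.5
final_scores = [b + lm_weight * r for b, r in zip(beam_search_scores, rescorer_scores)]
print(final_scores)  # [-10.95, -15.15, -15.05] -> the first hypothesis stays best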
- class speechbrain.decoders.scorer.RNNLMRescorer(language_model, tokenizer, device='cuda', temperature=1.0, bos_index=0, eos_index=0, pad_index=0)[source]
-
A wrapper of RNNLM based on the BaseRescorerInterface.
- Parameters:
language_model (torch.nn.Module) – An RNN-based language model.
tokenizer (SentencePieceProcessor) – A SentencePiece tokenizer.
device (str) – The device on which to run the scorer. (default: "cuda")
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being smoother with T>1 and sharper with T<1. (default: 1.0)
bos_index (int) – The index of the beginning-of-sequence (bos) token.
eos_index (int) – The index of the end-of-sequence (eos) token.
pad_index (int) – The index of the padding token.
Note
This class is intended to be used with a pretrained RNNLM model. Please see: https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech
By default, this model uses a SentencePiece tokenizer.
Example
>>> import torch
>>> from sentencepiece import SentencePieceProcessor
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.utils.parameter_transfer import Pretrainer
>>> source = "speechbrain/asr-crdnn-rnnlm-librispeech"
>>> lm_model_path = source + "/lm.ckpt"
>>> tokenizer_path = source + "/tokenizer.ckpt"
>>> # define your tokenizer and RNNLM from the HF hub
>>> tokenizer = SentencePieceProcessor()
>>> lm_model = RNNLM(
...     output_neurons = 1000,
...     embedding_dim = 128,
...     activation = torch.nn.LeakyReLU,
...     dropout = 0.0,
...     rnn_layers = 2,
...     rnn_neurons = 2048,
...     dnn_blocks = 1,
...     dnn_neurons = 512,
...     return_hidden = True,
... )
>>> pretrainer = Pretrainer(
...     collect_in = getfixture("tmp_path"),
...     loadables = {
...         "lm" : lm_model,
...         "tokenizer" : tokenizer,
...     },
...     paths = {
...         "lm" : lm_model_path,
...         "tokenizer" : tokenizer_path,
...     })
>>> _ = pretrainer.collect_files()
>>> pretrainer.load_collected()
>>> from speechbrain.decoders.scorer import RNNLMRescorer, RescorerBuilder
>>> rnnlm_rescorer = RNNLMRescorer(
...     language_model = lm_model,
...     tokenizer = tokenizer,
...     temperature = 1.0,
...     bos_index = 0,
...     eos_index = 0,
...     pad_index = 0,
... )
>>> # Define a rescorer builder
>>> rescorer = RescorerBuilder(
...     rescorers=[rnnlm_rescorer],
...     weights={"rnnlm":1.0}
... )
>>> # topk hyps
>>> topk_hyps = [["HELLO", "HE LLO", "H E L L O"]]
>>> topk_scores = [[-2, -2, -2]]
>>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores)
>>> # NOTE: the returned hypotheses are already sorted by score.
>>> rescored_hyps
[['HELLO', 'H E L L O', 'HE LLO']]
>>> # NOTE: as we are returning log-probs, the closer the score is to 0, the better.
>>> rescored_scores
[[-17.863974571228027, -25.12890625, -26.075977325439453]]
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device given in the constructor.
- Parameters:
device (str) – The device to move the scorer to.
- class speechbrain.decoders.scorer.TransformerLMRescorer(language_model, tokenizer, device='cuda', temperature=1.0, bos_index=0, eos_index=0, pad_index=0)[source]
-
A wrapper of TransformerLM based on the BaseRescorerInterface.
- Parameters:
language_model (torch.nn.Module) – A Transformer-based language model.
tokenizer (SentencePieceProcessor) – A SentencePiece tokenizer.
device (str) – The device on which to run the scorer. (default: "cuda")
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being smoother with T>1 and sharper with T<1. (default: 1.0)
bos_index (int) – The index of the beginning-of-sequence (bos) token.
eos_index (int) – The index of the end-of-sequence (eos) token.
pad_index (int) – The index of the padding token.
Note
This class is intended to be used with a pretrained TransformerLM model. Please see: https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech
By default, this model uses a SentencePiece tokenizer.
Example
>>> import torch
>>> from sentencepiece import SentencePieceProcessor
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> from speechbrain.utils.parameter_transfer import Pretrainer
>>> source = "speechbrain/asr-transformer-transformerlm-librispeech"
>>> lm_model_path = source + "/lm.ckpt"
>>> tokenizer_path = source + "/tokenizer.ckpt"
>>> tokenizer = SentencePieceProcessor()
>>> lm_model = TransformerLM(
...     vocab=5000,
...     d_model=768,
...     nhead=12,
...     num_encoder_layers=12,
...     num_decoder_layers=0,
...     d_ffn=3072,
...     dropout=0.0,
...     activation=torch.nn.GELU,
...     normalize_before=False,
... )
>>> pretrainer = Pretrainer(
...     collect_in = getfixture("tmp_path"),
...     loadables={
...         "lm": lm_model,
...         "tokenizer": tokenizer,
...     },
...     paths={
...         "lm": lm_model_path,
...         "tokenizer": tokenizer_path,
...     }
... )
>>> _ = pretrainer.collect_files()
>>> pretrainer.load_collected()
>>> from speechbrain.decoders.scorer import TransformerLMRescorer, RescorerBuilder
>>> transformerlm_rescorer = TransformerLMRescorer(
...     language_model=lm_model,
...     tokenizer=tokenizer,
...     temperature=1.0,
...     bos_index=1,
...     eos_index=2,
...     pad_index=0,
... )
>>> rescorer = RescorerBuilder(
...     rescorers=[transformerlm_rescorer],
...     weights={"transformerlm": 1.0}
... )
>>> topk_hyps = [["HELLO", "HE LLO", "H E L L O"]]
>>> topk_scores = [[-2, -2, -2]]
>>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores)
>>> # NOTE: the returned hypotheses are already sorted by score.
>>> rescored_hyps
[['HELLO', 'H E L L O', 'HE LLO']]
>>> # NOTE: as we are returning log-probs, the closer the score is to 0, the better.
>>> rescored_scores
[[-17.863974571228027, -25.12890625, -26.075977325439453]]
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device given in the constructor.
This method is dynamically called in the recipes when stage is equal to TEST.
- Parameters:
device (str) – The device to move the scorer to.
- class speechbrain.decoders.scorer.HuggingFaceLMRescorer(model_name, device='cuda')[source]
-
A wrapper of HuggingFace's TransformerLM based on the BaseRescorerInterface.
- Parameters:
model_name (str) – The name of the pretrained HuggingFace model to load (e.g., "gpt2-medium").
device (str) – The device on which to run the scorer. (default: "cuda")
Example
>>> from speechbrain.decoders.scorer import HuggingFaceLMRescorer, RescorerBuilder
>>> source = "gpt2-medium"
>>> huggingfacelm_rescorer = HuggingFaceLMRescorer(
...     model_name=source,
... )
>>> rescorer = RescorerBuilder(
...     rescorers=[huggingfacelm_rescorer],
...     weights={"huggingfacelm": 1.0}
... )
>>> topk_hyps = [["Hello everyone.", "Hell o every one.", "Hello every one"]]
>>> topk_scores = [[-2, -2, -2]]
>>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores)
>>> # NOTE: the returned hypotheses are already sorted by score.
>>> rescored_hyps
[['Hello everyone.', 'Hello every one', 'Hell o every one.']]
>>> # NOTE: as we are returning log-probs, the closer the score is to 0, the better.
>>> rescored_scores
[[-20.03631591796875, -27.615638732910156, -42.662353515625]]
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device given in the constructor.
This method is dynamically called in the recipes when stage is equal to TEST.
- Parameters:
device (str) – The device to move the scorer to.