fastNLP.modules.encoder

class fastNLP.modules.encoder.ConvolutionCharEncoder(char_emb_size=50, feature_maps=(40, 30, 30), kernels=(1, 3, 5), initial_method=None)

Alias: fastNLP.modules.ConvolutionCharEncoder, fastNLP.modules.encoder.ConvolutionCharEncoder

Character-level convolutional encoder.

__init__(char_emb_size=50, feature_maps=(40, 30, 30), kernels=(1, 3, 5), initial_method=None)

Parameters:

char_emb_size (int) -- Dimension of the character-level embedding. Default: 50. Example: with 26 characters, each embedded as a 50-dimensional vector, the input vector dimension is 50.
feature_maps (tuple) -- A tuple of ints. Its length is the number of character-level convolutions; the `i`-th int is the number of filters (output feature maps) of the `i`-th convolution.
kernels (tuple) -- A tuple of ints. Its length is the number of character-level convolutions; the `i`-th int is the kernel size of the `i`-th convolution.
initial_method -- Parameter initialization method. Defaults to `xavier normal`.

forward(x)

Parameters:	x (torch.Tensor) -- `[batch_size * sent_length, word_length, char_emb_size]` embeddings of the input characters
Returns:	torch.Tensor: result of the convolutions, of shape [batch_size * sent_length, sum(feature_maps), 1]
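
The shape bookkeeping for this encoder can be sketched in plain Python. The helper name below is hypothetical and not part of fastNLP; it only restates the documented output shape, with one row per word and the channels of all convolutions concatenated.

```python
def conv_char_encoder_output_shape(batch_size, sent_length, feature_maps=(40, 30, 30)):
    """Documented output shape of ConvolutionCharEncoder.forward (hypothetical helper)."""
    # Every word of every sentence is encoded separately, hence batch_size * sent_length rows.
    # Each convolution contributes feature_maps[i] channels, and the results are concatenated.
    return (batch_size * sent_length, sum(feature_maps), 1)


print(conv_char_encoder_output_shape(batch_size=4, sent_length=10))
```

With the default feature_maps, each word is therefore represented by 100 channels.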

class fastNLP.modules.encoder.LSTMCharEncoder(char_emb_size=50, hidden_size=None, initial_method=None)

Alias: fastNLP.modules.LSTMCharEncoder, fastNLP.modules.encoder.LSTMCharEncoder

Character-level encoder based on an LSTM.

__init__(char_emb_size=50, hidden_size=None, initial_method=None)

Parameters:

char_emb_size (int) -- Dimension of the character-level embedding. Default: 50. Example: with 26 characters, each embedded as a 50-dimensional vector, the input vector dimension is 50.
hidden_size (int) -- Size of the LSTM hidden layer. Defaults to the character embedding dimension.
initial_method -- Parameter initialization method. Defaults to `xavier normal`.

forward(x)

Parameters:	x (torch.Tensor) -- `[n_batch*n_word, word_length, char_emb_size]` embeddings of the input characters
Returns:	torch.Tensor: [n_batch*n_word, char_emb_size] result of the LSTM encoding

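class fastNLP.modules.encoder.ConvMaxpool(in_channels, out_channels, kernel_sizes, activation='relu')

Alias: fastNLP.modules.ConvMaxpool, fastNLP.modules.encoder.ConvMaxpool

A layer that combines convolution and max-pooling. Given an input of shape batch_size x max_len x input_size, it returns a matrix of shape batch_size x sum(out_channels). Internally, the input is first convolved by a CNN, then passed through the activation layer, and finally max-pooled along the length (max_len) dimension, which yields one vector representation per sample.

__init__(in_channels, out_channels, kernel_sizes, activation='relu')

Parameters:

in_channels (int) -- Number of input channels, usually the embedding dimension or the output dimension of an encoder.
out_channels (int, tuple(int)) -- Number of output channels. If a tuple is given, its length must match that of kernel_sizes.
kernel_sizes (int, tuple(int)) -- Kernel size used for each group of output channels.
activation (str) -- The convolution result passes through this activation before max-pooling. Supported: relu, sigmoid, tanh.

forward(x, mask=None)

Parameters:

x (torch.FloatTensor) -- batch_size x max_len x input_size, usually the values obtained after embedding
mask -- batch_size x max_len, with 0 at padded positions. It does not affect the convolution; the max-pool never selects a padded position.

Returns:	batch_size x sum(out_channels) matrix, one vector per sample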
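
The convolution, activation, max-pool pipeline described above can be sketched for a single sample in plain Python. This is an illustration under stated assumptions rather than the library's code: the function name is hypothetical, a single kernel size is used, and there is no padding, no bias, and no mask handling.

```python
def conv_maxpool(x, kernels, kernel_size):
    """x: max_len x input_size (one sample).
    kernels: out_channels filters, each kernel_size x input_size.
    Returns a list of out_channels floats (hypothetical helper)."""
    max_len = len(x)
    features = []
    for w in kernels:
        responses = []
        # Slide the filter over every window of kernel_size consecutive positions.
        for start in range(max_len - kernel_size + 1):
            total = 0.0
            for offset in range(kernel_size):
                for d, value in enumerate(x[start + offset]):
                    total += w[offset][d] * value
            responses.append(max(total, 0.0))  # ReLU activation
        features.append(max(responses))  # max-pool over the length dimension
    return features


x = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]   # max_len=3, input_size=2
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],              # sums all values in a 2-position window
    [[-1.0, 0.0], [0.0, -1.0]],            # negative response on this input
]
print(conv_maxpool(x, kernels, kernel_size=2))  # [6.0, 0.0]
```

Whatever the value of max_len, the result has one entry per filter, which is why the layer produces a fixed-size vector per sample.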

class fastNLP.modules.encoder.LSTM(input_size, hidden_size=100, num_layers=1, dropout=0.0, batch_first=True, bidirectional=False, bias=True)

Alias: fastNLP.modules.LSTM, fastNLP.modules.encoder.LSTM

LSTM module, a lightweight wrapper around the PyTorch LSTM. When seq_len is provided, pack_padded_sequence is used automatically. The forget-gate bias is initialized to 1 by default, and the module handles the issues that arise when an LSTM is used under DataParallel.

__init__(input_size, hidden_size=100, num_layers=1, dropout=0.0, batch_first=True, bidirectional=False, bias=True)

Parameters:

input_size -- Feature dimension of the input x
hidden_size -- Feature dimension of the hidden state h. If bidirectional is True, the output dimension is hidden_size*2
num_layers -- Number of RNN layers. Default: 1
dropout -- Dropout probability between layers. Default: 0
bidirectional -- If True, use a bidirectional RNN. Default: False
batch_first -- If True, input and output tensors have shape (batch, seq, feature). Default: True
bias -- If False, the model does not use bias. Default: True

forward(x, seq_len=None, h0=None, c0=None)

Parameters:

x -- [batch, seq_len, input_size] input sequence
seq_len -- [batch, ] sequence lengths. If `None`, all inputs are treated as having the same length. Default: `None`
h0 -- [batch, hidden_size] initial hidden state. If `None`, an all-zero vector is used. Default: `None`
c0 -- [batch, hidden_size] initial cell state. If `None`, an all-zero vector is used. Default: `None`

Returns:	(output, (ht, ct)). output: [batch, seq_len, hidden_size*num_direction] output sequence; ht, ct: [num_layers*num_direction, batch, hidden_size] hidden and cell states at the last time step.
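
The output shapes follow directly from the constructor arguments. The helper below is a hypothetical illustration, not a fastNLP function; it assumes batch_first=True and takes num_direction to be 2 for a bidirectional LSTM and 1 otherwise.

```python
def lstm_output_shapes(batch, seq_len, hidden_size=100, num_layers=1, bidirectional=False):
    """Return (output_shape, state_shape) as documented for LSTM.forward (hypothetical helper)."""
    num_direction = 2 if bidirectional else 1
    # Forward and backward hidden states are concatenated along the feature axis.
    output_shape = (batch, seq_len, hidden_size * num_direction)
    # ht and ct share this shape: one state per layer and direction.
    state_shape = (num_layers * num_direction, batch, hidden_size)
    return output_shape, state_shape


print(lstm_output_shapes(batch=8, seq_len=20, hidden_size=100, num_layers=2, bidirectional=True))
```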

class fastNLP.modules.encoder.StarTransformer(hidden_size, num_layers, num_head, head_dim, dropout=0.1, max_len=None)

Alias: fastNLP.modules.StarTransformer, fastNLP.modules.encoder.StarTransformer

The encoder part of Star-Transformer. Takes a 3-D text input and returns a text encoding of the same length.

paper: https://arxiv.org/abs/1902.09113

__init__(hidden_size, num_layers, num_head, head_dim, dropout=0.1, max_len=None)

Parameters:

hidden_size (int) -- Size of the input dimension, which is also the size of the output dimension.
num_layers (int) -- Number of Star-Transformer layers.
num_head (int) -- Number of heads.
head_dim (int) -- Dimension of each head.
dropout (float) -- Dropout probability. Default: 0.1
max_len (int) -- int or None. If an int, it is the maximum length of the input sequence, and the model adds a position embedding to the input. If `None`, the position embedding step is skipped. Default: None

forward(data, mask)

Parameters:

data (FloatTensor) -- [batch, length, hidden] input sequence
mask (ByteTensor) -- [batch, length] padding mask of the input sequence: 0 where there is no content (padding), 1 elsewhere

Returns:

[batch, length, hidden] encoded output sequence

[batch, hidden] global relay node, see the paper for details
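
The padding-mask convention used here (1 for real tokens, 0 for padding) can be built from sequence lengths. The function below is a hypothetical plain-Python sketch that returns nested lists rather than the ByteTensor the module expects.

```python
def make_padding_mask(seq_lens, max_len=None):
    """Build a [batch, length] 0/1 mask from per-sample lengths (hypothetical helper)."""
    if max_len is None:
        max_len = max(seq_lens)
    # Position pos of a sample is real content when pos < its length.
    return [[1 if pos < n else 0 for pos in range(max_len)] for n in seq_lens]


print(make_padding_mask([3, 1]))  # [[1, 1, 1], [1, 0, 0]]
```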

class fastNLP.modules.encoder.TransformerEncoder(num_layers, **kargs)

Alias: fastNLP.modules.TransformerEncoder, fastNLP.modules.encoder.TransformerEncoder

The encoder module of the Transformer, without the embedding layer.

__init__(num_layers, **kargs)

Parameters:

num_layers (int) -- Number of Transformer layers.
model_size (int) -- Size of the input dimension, which is also the size of the output dimension.
inner_size (int) -- Hidden size of the FFN layer.
key_size (int) -- Dimension of each head.
value_size (int) -- Dimension of the value in each head.
num_head (int) -- Number of heads.
dropout (float) -- Dropout probability. Default: 0.1

forward(x, seq_mask=None)

Parameters:

x -- [batch, seq_len, model_size] input sequence
seq_mask -- [batch, seq_len] padding mask of the input sequence. If `None`, an all-ones mask is generated. Default: `None`

Returns:	[batch, seq_len, model_size] output sequence

class fastNLP.modules.encoder.VarRNN(*args, **kwargs)

Base class: fastNLP.modules.VarRNNBase

Alias: fastNLP.modules.VarRNN, fastNLP.modules.encoder.VarRNN

Variational Dropout RNN. Reference paper: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (Yarin Gal and Zoubin Ghahramani, 2016)

__init__(*args, **kwargs)

Parameters:

input_size -- Feature dimension of the input x
hidden_size -- Feature dimension of the hidden state h
num_layers -- Number of RNN layers. Default: 1
bias -- If False, the model does not use bias. Default: True
batch_first -- If True, input and output tensors have shape (batch, seq, feature). Default: False
input_dropout -- Dropout probability applied to the input. Default: 0
hidden_dropout -- Dropout probability applied to each hidden state. Default: 0
bidirectional -- If True, use a bidirectional RNN. Default: False
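
The idea behind variational dropout in the cited paper is that one dropout mask is sampled per sequence and reused at every time step, instead of being resampled at each step. The sketch below illustrates this for a single sample in plain Python. The function name and the inverted-dropout scaling by 1 / (1 - p) are assumptions of this example, not details taken from fastNLP.

```python
import random


def variational_dropout(sequence, p, rng):
    """sequence: seq_len x feature (one sample).
    Draws ONE mask and applies it at every time step (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    feature = len(sequence[0])
    # Kept features are scaled by 1 / keep so the expected value is unchanged (assumed convention).
    mask = [(1.0 / keep if rng.random() < keep else 0.0) for _ in range(feature)]
    dropped = [[value * m for value, m in zip(step, mask)] for step in sequence]
    return dropped, mask


rng = random.Random(0)
out, mask = variational_dropout([[1.0, 1.0, 1.0, 1.0]] * 3, p=0.5, rng=rng)
print(mask)
print(out[0] == out[1] == out[2])  # the same features are dropped at every step
```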

class fastNLP.modules.encoder.VarLSTM(*args, **kwargs)

Base class: fastNLP.modules.VarRNNBase

Alias: fastNLP.modules.VarLSTM, fastNLP.modules.encoder.VarLSTM

Variational Dropout LSTM. Reference paper: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (Yarin Gal and Zoubin Ghahramani, 2016)

__init__(*args, **kwargs)

Parameters:

input_size -- Feature dimension of the input x
hidden_size -- Feature dimension of the hidden state h
num_layers -- Number of RNN layers. Default: 1
bias -- If False, the model does not use bias. Default: True
batch_first -- If True, input and output tensors have shape (batch, seq, feature). Default: False
input_dropout -- Dropout probability applied to the input. Default: 0
hidden_dropout -- Dropout probability applied to each hidden state. Default: 0
bidirectional -- If True, use a bidirectional LSTM. Default: False

class fastNLP.modules.encoder.VarGRU(*args, **kwargs)

Base class: fastNLP.modules.VarRNNBase

Alias: fastNLP.modules.VarGRU, fastNLP.modules.encoder.VarGRU

Variational Dropout GRU. Reference paper: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (Yarin Gal and Zoubin Ghahramani, 2016)

__init__(*args, **kwargs)

Parameters:

input_size -- Feature dimension of the input x
hidden_size -- Feature dimension of the hidden state h
num_layers -- Number of RNN layers. Default: 1
bias -- If False, the model does not use bias. Default: True
batch_first -- If True, input and output tensors have shape (batch, seq, feature). Default: False
input_dropout -- Dropout probability applied to the input. Default: 0
hidden_dropout -- Dropout probability applied to each hidden state. Default: 0
bidirectional -- If True, use a bidirectional GRU. Default: False

class fastNLP.modules.encoder.MaxPool(stride=None, padding=0, dilation=1, dimension=1, kernel_size=None, ceil_mode=False)

Alias: fastNLP.modules.MaxPool, fastNLP.modules.encoder.MaxPool

Max-pooling module.

__init__(stride=None, padding=0, dilation=1, dimension=1, kernel_size=None, ceil_mode=False)

Parameters:

stride -- Step size of the window. Defaults to kernel_size
padding -- Padding applied. Defaults to 0
dilation -- Controls the spacing between the elements within the window
dimension -- Dimensionality of the max pooling; 1, 2 and 3 are supported
kernel_size -- Window size of the max pooling. Defaults to the last k dimensions of the tensor, where k is dimension
ceil_mode --

class fastNLP.modules.encoder.MaxPoolWithMask

Alias: fastNLP.modules.MaxPoolWithMask, fastNLP.modules.encoder.MaxPoolWithMask

Max pooling with a mask matrix. Positions whose mask value is 0 are ignored during max-pooling.

forward(tensor, mask, dim=1)

Parameters:

tensor (torch.FloatTensor) -- [batch_size, seq_len, channels] input tensor
mask (torch.LongTensor) -- [batch_size, seq_len] 0/1 mask matrix
dim (int) -- Dimension along which to apply max pooling

Returns:
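
Masked max pooling can be illustrated in plain Python as follows. The function is a hypothetical sketch that pools over the seq_len dimension (dim=1) and assumes the result has shape [batch_size, channels]; it does not reproduce how the library applies the mask internally.

```python
def max_pool_with_mask(tensor, mask):
    """tensor: batch_size x seq_len x channels; mask: batch_size x seq_len (0/1).
    Returns batch_size x channels, ignoring masked positions (hypothetical helper)."""
    pooled = []
    for sample, sample_mask in zip(tensor, mask):
        # Keep only the time steps whose mask value is 1.
        valid = [step for step, m in zip(sample, sample_mask) if m == 1]
        if not valid:
            raise ValueError("every position of a sample is masked out")
        channels = len(sample[0])
        pooled.append([max(step[c] for step in valid) for c in range(channels)])
    return pooled


tensor = [[[1.0, 5.0], [2.0, 3.0], [9.0, 9.0]]]  # the last step is padding
mask = [[1, 1, 0]]
print(max_pool_with_mask(tensor, mask))  # [[2.0, 5.0]]
```

The padded step holds the largest values, yet it never contributes to the result.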

class fastNLP.modules.encoder.KMaxPool(k=1)

Alias: fastNLP.modules.KMaxPool, fastNLP.modules.encoder.KMaxPool

K max-pooling module.

forward(x)

Parameters:	x (torch.Tensor) -- [N, C, L] input tensor
Returns:	torch.Tensor x: [N, C*k] result of the k-max pooling
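
K-max pooling keeps the k largest values of each channel along the length dimension and flattens them into one vector per sample. The sketch below is a hypothetical plain-Python illustration; returning the kept values in descending order is an assumption of this example, and the library may order them differently.

```python
def k_max_pool(x, k=1):
    """x: N x C x L nested lists. Returns N x (C*k) (hypothetical helper)."""
    pooled = []
    for sample in x:
        row = []
        for channel in sample:
            # Keep the k largest activations of this channel.
            row.extend(sorted(channel, reverse=True)[:k])
        pooled.append(row)
    return pooled


x = [[[1, 4, 2], [7, 0, 3]]]  # N=1, C=2, L=3
print(k_max_pool(x, k=2))     # [[4, 2, 7, 3]]
```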

class fastNLP.modules.encoder.AvgPool(stride=None, padding=0)

Alias: fastNLP.modules.AvgPool, fastNLP.modules.encoder.AvgPool

Given an input of shape [batch_size, max_len, hidden_size], applies average pooling over the last dimension. The output has shape [batch_size, hidden_size].

forward(x)

Parameters:	x (torch.Tensor) -- [N, C, L] input tensor
Returns:	torch.Tensor x: [N, C] result of the average pooling

class fastNLP.modules.encoder.AvgPoolWithMask

Alias: fastNLP.modules.AvgPoolWithMask, fastNLP.modules.encoder.AvgPoolWithMask

Given an input of shape [batch_size, max_len, hidden_size], applies average pooling over the max_len dimension (dim=1 by default). The output has shape [batch_size, hidden_size]. Only positions whose mask value is 1 are taken into account.

forward(tensor, mask, dim=1)

Parameters:

tensor (torch.FloatTensor) -- [batch_size, seq_len, channels] input tensor
mask (torch.LongTensor) -- [batch_size, seq_len] 0/1 mask matrix
dim (int) -- Dimension along which to apply the pooling

Returns:
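
Masked average pooling divides by the number of unmasked positions rather than by seq_len. A minimal plain-Python sketch follows; the function name is hypothetical, and it pools over dim=1 only.

```python
def avg_pool_with_mask(tensor, mask):
    """tensor: batch_size x seq_len x channels; mask: batch_size x seq_len (0/1).
    Returns batch_size x channels, averaging only unmasked steps (hypothetical helper)."""
    pooled = []
    for sample, sample_mask in zip(tensor, mask):
        count = sum(sample_mask)
        if count == 0:
            raise ValueError("every position of a sample is masked out")
        channels = len(sample[0])
        pooled.append([
            # Masked steps are multiplied by 0 and so drop out of the sum.
            sum(step[c] * m for step, m in zip(sample, sample_mask)) / count
            for c in range(channels)
        ])
    return pooled


tensor = [[[1.0, 4.0], [3.0, 8.0], [100.0, 100.0]]]  # the last step is padding
mask = [[1, 1, 0]]
print(avg_pool_with_mask(tensor, mask))  # [[2.0, 6.0]]
```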

class fastNLP.modules.encoder.MultiHeadAttention(input_size, key_size, value_size, num_head, dropout=0.1)

Alias: fastNLP.modules.MultiHeadAttention, fastNLP.modules.encoder.MultiHeadAttention

The MultiHeadAttention used in the Transformer.

__init__(input_size, key_size, value_size, num_head, dropout=0.1)

Parameters:

input_size -- int, size of the input dimension, which is also the size of the output dimension.
key_size -- int, dimension of each head.
value_size -- int, dimension of the value in each head.
num_head -- int, number of heads.
dropout -- float.

forward(Q, K, V, atte_mask_out=None)

Parameters:

Q -- [batch, seq_len_q, model_size]
K -- [batch, seq_len_k, model_size]
V -- [batch, seq_len_k, model_size]
atte_mask_out (called seq_mask in the docstring) -- [batch, seq_len]

class fastNLP.modules.encoder.BiAttention

Alias: fastNLP.modules.BiAttention, fastNLP.modules.encoder.BiAttention

Bi Attention module

Given two sequences of vectors \(a_i\) and \(b_j\), of lengths \(\ell_a\) and \(\ell_b\), the BiAttention module computes the attention result with the following formulas:

\[\begin{array}{l} e_{ij} = a_i^{\mathrm{T}} b_j \\ \hat{a}_i = \sum_{j=1}^{\ell_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_b} \exp(e_{ik})} b_j \\ \hat{b}_j = \sum_{i=1}^{\ell_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_a} \exp(e_{kj})} a_i \end{array}\]

forward(premise_batch, premise_mask, hypothesis_batch, hypothesis_mask)

Parameters:

premise_batch (torch.Tensor) -- [batch_size, a_seq_len, hidden_size]
premise_mask (torch.Tensor) -- [batch_size, a_seq_len]
hypothesis_batch (torch.Tensor) -- [batch_size, b_seq_len, hidden_size]
hypothesis_mask (torch.Tensor) -- [batch_size, b_seq_len]

Returns:

torch.Tensor attended_premises: [batch_size, a_seq_len, hidden_size]
torch.Tensor attended_hypotheses: [batch_size, b_seq_len, hidden_size]
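
The attention computation can be sketched for a single premise/hypothesis pair in plain Python. This is an illustration under assumptions, not the module's code: the function names are hypothetical, padding masks are ignored, and the scores \(e_{ij}\) are normalized over j for each attended premise vector and over i for each attended hypothesis vector, which is the usual soft-alignment reading.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bi_attention(a, b):
    """a: l_a x hidden, b: l_b x hidden (one pair, no padding).
    Returns (a_hat, b_hat) with the same shapes as a and b (hypothetical helper)."""
    hidden = len(a[0])
    # e[i][j] is the dot product of a_i and b_j.
    e = [[sum(x * y for x, y in zip(a_i, b_j)) for b_j in b] for a_i in a]
    a_hat = []
    for i in range(len(a)):
        w = softmax(e[i])  # weights over the positions of b
        a_hat.append([sum(w[j] * b[j][d] for j in range(len(b))) for d in range(hidden)])
    b_hat = []
    for j in range(len(b)):
        w = softmax([e[i][j] for i in range(len(a))])  # weights over the positions of a
        b_hat.append([sum(w[i] * a[i][d] for i in range(len(a))) for d in range(hidden)])
    return a_hat, b_hat


a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0]]
a_hat, b_hat = bi_attention(a, b)
print(a_hat)
print(b_hat)
```

Each attended vector is a convex combination of the vectors of the other sequence, so a_hat has as many rows as a and b_hat as many rows as b.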

class fastNLP.modules.encoder.SelfAttention(input_size, attention_unit=300, attention_hops=10, drop=0.5, initial_method=None)

Alias: fastNLP.modules.SelfAttention, fastNLP.modules.encoder.SelfAttention

A Self Attention Module based on the paper A structured self-attentive sentence embedding.

__init__(input_size, attention_unit=300, attention_hops=10, drop=0.5, initial_method=None)

Parameters:

input_size (int) -- Hidden dimension of the input tensor
attention_unit (int) -- Hidden dimension of the output tensor
attention_hops (int) --
drop (float) -- Dropout probability. Defaults to 0.5
initial_method (str) -- Parameter initialization method

forward(input, input_origin)

Parameters:

input (torch.Tensor) -- [batch_size, seq_len, hidden_size] the matrix to apply attention to
input_origin (torch.Tensor) -- [batch_size, seq_len] matrix of the original token indices, including the padded part

Returns:

torch.Tensor output1: [batch_size, multi-head, hidden_size] the input matrix after the attention operation
torch.Tensor output2: [1] the attention penalty term, a scalar
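
The penalty term in the cited paper is the squared Frobenius norm of \(AA^{\mathrm{T}} - I\), where each row of \(A\) is the attention distribution of one hop; it encourages different hops to attend to different positions. The sketch below computes it for a single attention matrix in plain Python. The function name is hypothetical, and the exact normalization used by fastNLP (for example a square root or averaging over the batch) may differ from this sketch.

```python
def attention_penalty(A):
    """A: hops x seq_len attention matrix.
    Returns the squared Frobenius norm of A A^T - I (hypothetical helper)."""
    hops = len(A)
    total = 0.0
    for r in range(hops):
        for c in range(hops):
            # Entry (r, c) of A A^T is the dot product of rows r and c.
            dot = sum(x * y for x, y in zip(A[r], A[c]))
            diff = dot - (1.0 if r == c else 0.0)
            total += diff * diff
    return total


print(attention_penalty([[1.0, 0.0], [0.0, 1.0]]))  # 0.0: hops focus on different tokens
print(attention_penalty([[1.0, 0.0], [1.0, 0.0]]))  # 2.0: both hops focus on the same token
```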