Standardize on 'teacher forcing' rather than 'input feeding' which is… (#769)

Summary:
Input feeding generally refers to a slightly different concept
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769

Differential Revision: D16491898

Pulled By: myleott

fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287
Author: Myle Ott, 2019-07-25 07:19:44 -07:00 (committed by Facebook Github Bot)
Parent: 3d764a3dc6
Commit: 8835d93cf0
8 changed files with 20 additions and 21 deletions


@@ -285,7 +285,7 @@ following contents::
             max_source_positions=self.args.max_positions,
             max_target_positions=1,
             # Since our target is a single class label, there's no need for
-            # input feeding. If we set this to ``True`` then our Model's
+            # teacher forcing. If we set this to ``True`` then our Model's
             # ``forward()`` method would receive an additional argument called
             # *prev_output_tokens* that would contain a shifted version of the
             # target sequence.


@@ -125,9 +125,9 @@ Decoder
 Our Decoder will predict the next word, conditioned on the Encoder's final
 hidden state and an embedded representation of the previous target word -- which
-is sometimes called *input feeding* or *teacher forcing*. More specifically,
-we'll use a :class:`torch.nn.LSTM` to produce a sequence of hidden states that
-we'll project to the size of the output vocabulary to predict each target word.
+is sometimes called *teacher forcing*. More specifically, we'll use a
+:class:`torch.nn.LSTM` to produce a sequence of hidden states that we'll project
+to the size of the output vocabulary to predict each target word.
 ::
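
As context for the renamed term: teacher forcing means the decoder is trained on the ground-truth previous target word rather than on its own prediction. A minimal sketch of such an LSTM decoder (hypothetical names and dimensions, not the tutorial's actual listing) might look like::

    import torch
    import torch.nn as nn

    class ToyLSTMDecoder(nn.Module):
        """Sketch: predict each target word from the encoder's final hidden
        state and the embedded previous (ground-truth) target word."""

        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # The previous-word embedding is concatenated with the encoder state.
            self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
            self.output_proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, prev_output_tokens, encoder_hidden):
            # prev_output_tokens: (batch, tgt_len), the shifted target sequence
            # encoder_hidden: (batch, hidden_dim), the encoder's final hidden state
            bsz, tgt_len = prev_output_tokens.size()
            x = self.embed(prev_output_tokens)                   # (batch, tgt_len, embed_dim)
            enc = encoder_hidden.unsqueeze(1).expand(bsz, tgt_len, -1)
            x = torch.cat([x, enc], dim=2)
            out, _ = self.lstm(x)                                # (batch, tgt_len, hidden_dim)
            return self.output_proj(out)                         # (batch, tgt_len, vocab_size)
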
@@ -171,7 +171,7 @@ we'll project to the size of the output vocabulary to predict each target word.
         """
         Args:
             prev_output_tokens (LongTensor): previous decoder outputs of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
             encoder_out (Tensor, optional): output from the encoder, used for
                 encoder-side attention
@@ -387,8 +387,8 @@ previous hidden states.
 In fairseq this is called :ref:`Incremental decoding`. Incremental decoding is a
 special mode at inference time where the Model only receives a single timestep
-of input corresponding to the immediately previous output token (for input
-feeding) and must produce the next output incrementally. Thus the model must
+of input corresponding to the immediately previous output token (for teacher
+forcing) and must produce the next output incrementally. Thus the model must
 cache any long-term state that is needed about the sequence, e.g., hidden
 states, convolutional states, etc.
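
To make that contrast concrete, a rough greedy-decoding loop under this scheme feeds back only the most recent output token each step and carries recurrent state in a cache. The `decoder.step` method and its signature here are assumptions for illustration, not fairseq's API::

    import torch

    def greedy_decode(decoder, encoder_hidden, bos_idx, eos_idx, max_len=100):
        """Sketch: generate one token at a time, reusing cached decoder state."""
        tokens = [bos_idx]
        cache = None  # long-term state, e.g. LSTM hidden/cell states
        for _ in range(max_len):
            prev = torch.tensor([[tokens[-1]]])          # single timestep: (1, 1)
            logits, cache = decoder.step(prev, encoder_hidden, cache)
            next_tok = logits[:, -1].argmax(dim=-1).item()
            tokens.append(next_tok)
            if next_tok == eos_idx:
                break
        return tokens
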


@@ -88,8 +88,7 @@ class LanguagePairDataset(FairseqDataset):
         shuffle (bool, optional): shuffle dataset elements before batching
             (default: True).
         input_feeding (bool, optional): create a shifted version of the targets
-            to be passed into the model for input feeding/teacher forcing
-            (default: True).
+            to be passed into the model for teacher forcing (default: True).
         remove_eos_from_source (bool, optional): if set, removes eos from end
             of source if it's present (default: False).
         append_eos_to_target (bool, optional): if set, appends eos to end of
@@ -167,10 +166,10 @@ class LanguagePairDataset(FairseqDataset):
                 - `src_lengths` (LongTensor): 1D Tensor of the unpadded
                   lengths of each source sentence of shape `(bsz)`
                 - `prev_output_tokens` (LongTensor): a padded 2D Tensor of
-                  tokens in the target sentence, shifted right by one position
-                  for input feeding/teacher forcing, of shape `(bsz,
-                  tgt_len)`. This key will not be present if *input_feeding*
-                  is ``False``. Padding will appear on the left if
+                  tokens in the target sentence, shifted right by one
+                  position for teacher forcing, of shape `(bsz, tgt_len)`.
+                  This key will not be present if *input_feeding* is
+                  ``False``. Padding will appear on the left if
                   *left_pad_target* is ``True``.
                 - `target` (LongTensor): a padded 2D Tensor of tokens in the
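
The "shifted right by one position" wording above is easiest to see on a single unpadded target sequence: by the usual convention the sentence-final EOS becomes the first decoder input, so the decoder at step t is fed target token t-1 (the real collater additionally handles padding and left-padding). A small sketch::

    import torch

    def shift_right(target, eos_idx):
        """Sketch for one unpadded target sequence: move the final EOS to the front."""
        return torch.cat([target.new_tensor([eos_idx]), target[:-1]])

    # target:             [w1, w2, w3, eos]
    # prev_output_tokens: [eos, w1, w2, w3]
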


@@ -22,7 +22,7 @@ class FairseqDecoder(nn.Module):
         """
         Args:
             prev_output_tokens (LongTensor): shifted output tokens of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
             encoder_out (dict, optional): output from the encoder, used for
                 encoder-side attention


@@ -13,7 +13,7 @@ class FairseqIncrementalDecoder(FairseqDecoder):
     Incremental decoding is a special mode at inference time where the Model
     only receives a single timestep of input corresponding to the previous
-    output token (for input feeding) and must produce the next output
+    output token (for teacher forcing) and must produce the next output
     *incrementally*. Thus the model must cache any long-term state that is
     needed about the sequence, e.g., hidden states, convolutional states, etc.
@@ -37,7 +37,7 @@ class FairseqIncrementalDecoder(FairseqDecoder):
         """
         Args:
             prev_output_tokens (LongTensor): shifted output tokens of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
             encoder_out (dict, optional): output from the encoder, used for
                 encoder-side attention
             incremental_state (dict, optional): dictionary used for storing
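
One way to picture what ends up in `incremental_state` is a cache of recurrent state that the decoder reads and writes between calls. The class below is a hypothetical sketch of that pattern, not fairseq's implementation::

    import torch.nn as nn

    class ToyIncrementalDecoder(nn.Module):
        """Sketch: cache recurrent state across calls via an incremental_state dict."""

        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.output_proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, prev_output_tokens, incremental_state=None):
            if incremental_state is not None:
                # Inference: only the newest timestep matters; reuse the
                # hidden/cell states cached from the previous call.
                prev_output_tokens = prev_output_tokens[:, -1:]
                hidden = incremental_state.get("rnn_state")
            else:
                hidden = None  # training: full teacher-forced sequence, no cache
            x = self.embed(prev_output_tokens)
            out, hidden = self.rnn(x, hidden)
            if incremental_state is not None:
                incremental_state["rnn_state"] = hidden  # cache for the next step
            return self.output_proj(out)
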


@@ -202,8 +202,8 @@ class FairseqEncoderDecoderModel(BaseFairseqModel):
         Run the forward pass for an encoder-decoder model.
         First feed a batch of source tokens through the encoder. Then, feed the
-        encoder output and previous decoder outputs (i.e., input feeding/teacher
-        forcing) to the decoder to produce the next outputs::
+        encoder output and previous decoder outputs (i.e., teacher forcing) to
+        the decoder to produce the next outputs::
             encoder_out = self.encoder(src_tokens, src_lengths)
             return self.decoder(prev_output_tokens, encoder_out)
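
Putting the pieces together, a teacher-forced training step does roughly the following. The batch keys mirror the dataset docstring above, but the helper itself is illustrative (it also assumes the model returns a `(batch, tgt_len, vocab)` tensor of logits), not part of this codebase::

    import torch.nn.functional as F

    def training_step(model, batch, pad_idx):
        """Sketch: teacher-forced forward pass plus token-level cross-entropy."""
        net_input = batch["net_input"]
        logits = model(
            net_input["src_tokens"],
            net_input["src_lengths"],
            net_input["prev_output_tokens"],  # shifted targets (teacher forcing)
        )
        # Each position predicts the *unshifted* target token at that position.
        return F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            batch["target"].view(-1),
            ignore_index=pad_idx,
        )
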
@@ -213,7 +213,7 @@ class FairseqEncoderDecoderModel(BaseFairseqModel):
                 `(batch, src_len)`
             src_lengths (LongTensor): source sentence lengths of shape `(batch)`
             prev_output_tokens (LongTensor): previous decoder outputs of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
         Returns:
             tuple:


@@ -345,7 +345,7 @@ class LightConvDecoder(FairseqIncrementalDecoder):
         """
         Args:
             prev_output_tokens (LongTensor): previous decoder outputs of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
             encoder_out (Tensor, optional): output from the encoder, used for
                 encoder-side attention
             incremental_state (dict): dictionary used for storing state during


@@ -370,7 +370,7 @@ class TransformerDecoder(FairseqIncrementalDecoder):
         """
         Args:
             prev_output_tokens (LongTensor): previous decoder outputs of shape
-                `(batch, tgt_len)`, for input feeding/teacher forcing
+                `(batch, tgt_len)`, for teacher forcing
             encoder_out (Tensor, optional): output from the encoder, used for
                 encoder-side attention
             incremental_state (dict): dictionary used for storing state during