Gregor Koehler / Oct 22 2017
Building a Translator - A Sequence to Sequence Learning Intro in Keras (CPU)
Prerequisites:
- basic understanding of a Recurrent Neural Network cell called LSTM (there is a great blog post on RNNs and LSTMs written by Christopher Olah)
- basic understanding of ML concepts (if need be, see other articles on the topic, e.g. Getting Started with Python, MNIST Handwritten Digit Classification, Image Classification with Keras)
conda install -qy -c anaconda tensorflow h5py
pip install keras
from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np
Download the dataset (Spanish-English here), taken from this source.
spa.txt
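Each line of spa.txt holds an English sentence and its Spanish translation, separated by a tab. A minimal parsing sketch (the names num_samples, input_texts and target_texts are illustrative, not fixed by this article; the tab and newline characters double as start- and end-of-sequence markers for the decoder):

num_samples = 10000  # illustrative cap, keeps the CPU run short
input_texts, target_texts = [], []
with open('spa.txt', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')[:2]
    input_texts.append(input_text)
    # '\t' serves as the start-of-sequence and '\n' as the end-of-sequence character.
    target_texts.append('\t' + target_text + '\n')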
Some info:
- Because the training process and the inference process (decoding sentences) are quite different, we use separate models for each, although they share the same inner layers.
On Canonical Sequence to Sequence Learning:
- more general since input and output sequences don't have to have the same length
- entire input sequence is needed to start predicting the target
- usually simply referred to as 'Seq2Seq' learning (leaving out the canonical)
Training Mode (sketched in code after this list):
- "Encoder"
- is a RNN layer (or a stack thereof) to process the input sequence
- returns internal state of the RNN (output is discarded)
- internal state serves as "context", or "conditioning", of the decoder
- "Decoder"
- is another RNN layer (or a stack thereof) trained to predict the next characters (or, more generally, tokens) of the target sequence, given the previous characters of the target sequence
- aims to turn target sequences into the same sequences but offset by one timestep into the future (called "teacher forcing")
- uses the internal state of the encoder as its initial state, thus obtaining information about what it is supposed to generate
- effectively learns to generate targets[t+1] given targets[t], conditioned on the input sequence by using the encoder internal state
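A minimal sketch of this training setup with the Keras functional API, closely following the official Keras seq2seq example (latent_dim is chosen here for illustration; num_encoder_tokens and num_decoder_tokens are assumed to come from the data preparation in step 2):

latent_dim = 256  # assumed size of the LSTM hidden state

# Encoder: process the input sequence and keep only its internal states.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]  # the "context" handed to the decoder

# Decoder: predict the next character, conditioned on the encoder states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Training model: [input sequence, target sequence] -> target sequence shifted by one timestep.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)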
Inference Mode (sketched in code after this list):
- encode the input sequence into state vectors (internal state)
- start with a target sequence of size 1 ('start-of-sequence' character)
- feed state vectors and start char to decoder to produce predictions for the next character
- sample the next char using these predictions (often simply by using argmax)
- append the sampled char to the target sequence
- repeat until we hit the 'end-of-sequence' character or reach the character limit
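A sketch of this decoding procedure, reusing the layers from the training sketch above (the dictionaries target_token_index and reverse_target_char_index as well as max_decoder_seq_length are assumed to come from the data preparation in step 2):

# Inference models reuse the trained layers.
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

def decode_sequence(input_seq):
    # Encode the input sentence into the LSTM state vectors.
    states_value = encoder_model.predict(input_seq)

    # Start with a target sequence containing only the start-of-sequence character ('\t').
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.0

    decoded_sentence = ''
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample the next character (greedy argmax) and append it.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Stop at the end-of-sequence character or when the length limit is hit.
        if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
            break

        # Feed the sampled character and the updated states back into the decoder.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0
        states_value = [h, c]
    return decoded_sentence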

1. Section Title
2. Turn the Sentences into 3 Numpy Arrays
These numpy arrays are encoder_input_data, decoder_input_data and decoder_target_data. encoder_input_data is a 3D array of shape (num_pairs, max_english_sentence_length, num_english_characters) containing a one-hot vectorization of the English sentences. decoder_input_data is a 3D array of shape (num_pairs, max_spanish_sentence_length, num_spanish_characters) containing a one-hot vectorization of the Spanish sentences. decoder_target_data is the same as decoder_input_data but offset by one timestep: decoder_target_data[:, t, :] will be the same as decoder_input_data[:, t + 1, :].
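A sketch of this vectorization, building on the input_texts / target_texts lists from the parsing sketch above (variable names follow the Keras example: num_encoder_tokens plays the role of num_english_characters, max_encoder_seq_length that of max_english_sentence_length, and so on):

# Build character vocabularies and look-up tables.
input_characters = sorted(set(''.join(input_texts)))
target_characters = sorted(set(''.join(target_texts)))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

input_token_index = {char: i for i, char in enumerate(input_characters)}
target_token_index = {char: i for i, char in enumerate(target_characters)}
reverse_target_char_index = {i: char for char, i in target_token_index.items()}

# One-hot encode the sentence pairs into the three arrays.
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.0
    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.0
        if t > 0:
            # decoder_target_data is decoder_input_data shifted one timestep ahead.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.0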
3. Train a basic LSTM-based Seq2Seq Model
... in order to predict decoder_target_data given encoder_input_data and decoder_input_data. Our model uses teacher forcing.
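A sketch of compiling and fitting that model (optimizer, batch size, epoch count and validation split are illustrative choices for a quick CPU run, not prescribed by this article):

# Teacher forcing: decoder_input_data goes in, the one-step-shifted
# decoder_target_data is the prediction target.
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64,
          epochs=20,  # illustrative; more epochs give better translations
          validation_split=0.2)
model.save('s2s.h5')  # h5py was installed above for exactly this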
4. Test the Model by Decoding Some Sentences
I.e. turn samples from encoder_input_data into corresponding samples from decoder_target_data.
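For example, decoding a handful of training samples with the decode_sequence sketch from the inference section might look like this:

# Decode a few training samples and compare with the original inputs.
for seq_index in range(10):
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)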