Gregor Koehler / Oct 22 2017

Building a Translator - A Sequence to Sequence Learning Intro in Keras (CPU)

Gregor Koehler, adapted from F. Chollet's blog post

Prerequisites:

conda install -qy -c anaconda tensorflow h5py
pip install keras

from keras.models import Model
from keras.layers import Input, LSTM, Dense

import numpy as np

Download the dataset (Spanish-English here), taken from this source.

Some info:

  • Because the training process and the inference process (decoding sentences) are quite different, we use separate models for each, although they all share the same inner layers, so the weights learned during training are reused at inference time.

On Canonical Sequence to Sequence Learning:

  • more general, since the input and output sequences don't need to have the same length
  • the entire input sequence is needed before prediction of the target can start
  • usually simply referred to as 'Seq2Seq' learning (dropping the 'canonical')

Training Mode:

  • "Encoder"
  • is a RNN layer (or a stack thereof) to process the input sequence
  • returns internal state of the RNN (output is discarded)
  • internal state serves as "context", or "conditioning", of the decoder
  • "Decoder"
  • is another RNN layer (or a stack thereof) trained to predict the next characters(or other sequence?) of the target sequence, given previous characters of the target sequence
  • aims to turn target sequences into the same sequences but offset by one timestep into the future (called "teacher forcing")
  • as initial state uses the internal state of the encoder, thus obtaining information on what it's supposed to generate
  • effectively learns to generate targets[t+1] given targets[t], conditioned on the input sequence by using the encoder internal state
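
To make the training-mode setup concrete, below is a minimal sketch of such an encoder-decoder pair in Keras, built from the Input, LSTM and Dense layers imported above and following the structure of Chollet's character-level example. The latent dimension and the two token counts are illustrative placeholders; the real counts come out of the vectorization step further down.

# Illustrative sizes -- the vectorization step below computes the real values.
latent_dim = 256          # dimensionality of the LSTM's internal state
num_encoder_tokens = 71   # placeholder: number of distinct input characters
num_decoder_tokens = 93   # placeholder: number of distinct target characters

# Encoder: process the input sequence, keep only its final internal state.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]  # the "context" handed to the decoder

# Decoder: predict the next target character, conditioned on the encoder
# state via initial_state; during training it sees the true previous
# characters (teacher forcing).
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# The training model maps [encoder_input_data, decoder_input_data]
# to decoder_target_data (the target sequence offset by one timestep).
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)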

Inference Mode:

  • encode the input sequence into state vectors (the encoder's internal state)
  • start with a target sequence of size 1 (just the 'start-of-sequence' character)
  • feed the state vectors and the start character to the decoder to produce predictions for the next character
  • sample the next character from these predictions (often simply by taking the argmax)
  • append the sampled character to the target sequence
  • repeat until the 'end-of-sequence' character is produced or a character limit is reached (the inference models used in this loop are sketched right after this list)
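
The corresponding inference setup can be sketched by reusing the layers and tensors from the training sketch above, so the weights learned during training are shared. The decoder is wrapped in a second model that takes its previous states as explicit inputs, which lets it run one timestep at a time.

# Inference encoder: input sequence -> internal state vectors.
encoder_model = Model(encoder_inputs, encoder_states)

# Inference decoder: (previous character, previous states) ->
# (next-character probabilities, updated states).
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)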

1.
Turn the Sentences into 3 Numpy Arrays

These numpy arrays are encoder_input_data, decoder_input_data and decoder_target_data (a vectorization sketch follows this list).

  • encoder_input_data is a 3D array of shape (num_pairs, max_english_sentence_length, num_english_characters) containing a one-hot vectorization of the English sentences.
  • decoder_input_data is a 3D array of shape (num_pairs, max_spanish_sentence_length, num_spanish_characters) containing a one-hot vectorization of the Spanish sentences.
  • decoder_target_data is the same as decoder_input_data, but offset by one timestep: decoder_target_data[:, t, :] will be the same as decoder_input_data[:, t + 1, :].
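
A minimal sketch of this vectorization, assuming the downloaded pairs sit in a tab-separated file named 'spa.txt' (the file name is an assumption) and adopting the convention from Chollet's example of using '\t' as the start-of-sequence and '\n' as the end-of-sequence character for the target texts:

num_samples = 10000  # cap the number of pairs for a quick CPU run (assumption)

# Read tab-separated pairs; '\t' marks start-of-sequence and '\n'
# end-of-sequence for the target texts.
input_texts, target_texts = [], []
input_characters, target_characters = set(), set()
with open('spa.txt', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')[:2]
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    input_characters.update(input_text)
    target_characters.update(target_text)

input_characters = sorted(input_characters)
target_characters = sorted(target_characters)
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

input_token_index = {char: i for i, char in enumerate(input_characters)}
target_token_index = {char: i for i, char in enumerate(target_characters)}

# One-hot encode into arrays of shape (num_pairs, max_seq_length, num_tokens).
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.0
    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.0
        if t > 0:
            # decoder_target_data is ahead of decoder_input_data by one timestep
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.0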

2.
Train a basic LSTM-based Seq2Seq Model

... in order to predict decoder_target_data given encoder_input_data and decoder_input_data. Our model uses teacher forcing.
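
A sketch of the training call itself, assuming the training model from the earlier sketch has been rebuilt with the token counts produced by the vectorization step; batch size, epoch count, validation split and the weights file name are illustrative choices, not the notebook's settings:

batch_size = 64   # illustrative -- tune for the available CPU budget
epochs = 100      # illustrative

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Saving the weights requires h5py (installed in the setup cell above).
model.save('s2s.h5')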

3.
Test the Model by Decoding Some Sentences

I.e. turn samples from encoder_input_data into corresponding samples from decoder_target_data.
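
A sketch of that decoding loop, assuming the inference models (encoder_model, decoder_model), the token indices and the '\t'/'\n' start and end markers from the sketches above:

# Reverse lookup: from one-hot index back to the target character.
reverse_target_char_index = {i: char for char, i in target_token_index.items()}

def decode_sequence(input_seq):
    # Encode the input sentence into the encoder's internal state.
    states_value = encoder_model.predict(input_seq)

    # Start with a target sequence holding only the start-of-sequence char.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.0

    decoded_sentence = ''
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample the next character (greedy argmax) and append it.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Stop at the end-of-sequence character or a length limit.
        if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
            break

        # Feed the sampled character and the updated states back in.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0
        states_value = [h, c]

    return decoded_sentence

# Example: decode the first training sentence.
print(decode_sequence(encoder_input_data[0:1]))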