Gregor Koehler / Oct 22 2017

Building a Translator - A Sequence to Sequence Learning Intro in Keras (CPU)

Gregor Koehler, adapted from F. Chollet's blog post

Prerequisites:

conda install -qy -c anaconda tensorflow h5py
pip install keras

from keras.models import Model
from keras.layers import Input, LSTM, Dense

import numpy as np

Download the dataset (Spanish-English here), taken from this source.

Some info:

  • Because the training process and the inference process (decoding sentences) are quite different, we use separate models for each, although they all share the same inner layers, so the weights learned during training are reused at inference time.

On Canonical Sequence to Sequence Learning:

  • more general, since the input and output sequences don't need to have the same length
  • the entire input sequence is needed before prediction of the target can start
  • usually simply referred to as 'Seq2Seq' learning (dropping the 'canonical')

Training Mode:

  • "Encoder"
  • is a RNN layer (or a stack thereof) to process the input sequence
  • returns internal state of the RNN (output is discarded)
  • internal state serves as "context", or "conditioning", of the decoder
  • "Decoder"
  • is another RNN layer (or a stack thereof) trained to predict the next characters(or other sequence?) of the target sequence, given previous characters of the target sequence
  • aims to turn target sequences into the same sequences but offset by one timestep into the future (called "teacher forcing")
  • as initial state uses the internal state of the encoder, thus obtaining information on what it's supposed to generate
  • effectively learns to generate targets[t+1] given targets[t], conditioned on the input sequence by using the encoder internal state
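
To make the training-mode setup concrete, below is a minimal sketch of such an encoder-decoder pair in Keras, built from the Input, LSTM and Dense layers imported above and following the structure of Chollet's character-level example. The latent dimension and the two token counts are illustrative placeholders; the real counts come out of the vectorization step further down.

# Illustrative sizes -- the vectorization step below computes the real values.
latent_dim = 256          # dimensionality of the LSTM's internal state
num_encoder_tokens = 71   # placeholder: number of distinct input characters
num_decoder_tokens = 93   # placeholder: number of distinct target characters

# Encoder: process the input sequence, keep only its final internal state.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]  # the "context" handed to the decoder

# Decoder: predict the next target character, conditioned on the encoder
# state via initial_state; during training it sees the true previous
# characters (teacher forcing).
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# The training model maps [encoder_input_data, decoder_input_data]
# to decoder_target_data (the target sequence offset by one timestep).
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)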

Inference Mode:

  • encode the input sequence into state vectors (the encoder's internal state)
  • start with a target sequence of size 1 (just the 'start-of-sequence' character)
  • feed the state vectors and the start character to the decoder to produce predictions for the next character
  • sample the next character from these predictions (often simply by taking the argmax)
  • append the sampled character to the target sequence
  • repeat until the 'end-of-sequence' character is produced or a character limit is reached (the inference models used in this loop are sketched right after this list)
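
The corresponding inference setup can be sketched by reusing the layers and tensors from the training sketch above, so the weights learned during training are shared. The decoder is wrapped in a second model that takes its previous states as explicit inputs, which lets it run one timestep at a time.

# Inference encoder: input sequence -> internal state vectors.
encoder_model = Model(encoder_inputs, encoder_states)

# Inference decoder: (previous character, previous states) ->
# (next-character probabilities, updated states).
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)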

1.
Turn the Sentences into 3 Numpy Arrays

These numpy arrays are encoder_input_data, decoder_input_data and decoder_target_data (a vectorization sketch follows this list).

  • encoder_input_data is a 3D array of shape (num_pairs, max_english_sentence_length, num_english_characters) containing a one-hot vectorization of the English sentences.
  • decoder_input_data is a 3D array of shape (num_pairs, max_spanish_sentence_length, num_spanish_characters) containing a one-hot vectorization of the Spanish sentences.
  • decoder_target_data is the same as decoder_input_data, but offset by one timestep: decoder_target_data[:, t, :] will be the same as decoder_input_data[:, t + 1, :].
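
A minimal sketch of this vectorization, assuming the downloaded pairs sit in a tab-separated file named 'spa.txt' (the file name is an assumption) and adopting the convention from Chollet's example of using '\t' as the start-of-sequence and '\n' as the end-of-sequence character for the target texts:

num_samples = 10000  # cap the number of pairs for a quick CPU run (assumption)

# Read tab-separated pairs; '\t' marks start-of-sequence and '\n'
# end-of-sequence for the target texts.
input_texts, target_texts = [], []
input_characters, target_characters = set(), set()
with open('spa.txt', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')[:2]
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    input_characters.update(input_text)
    target_characters.update(target_text)

input_characters = sorted(input_characters)
target_characters = sorted(target_characters)
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

input_token_index = {char: i for i, char in enumerate(input_characters)}
target_token_index = {char: i for i, char in enumerate(target_characters)}

# One-hot encode into arrays of shape (num_pairs, max_seq_length, num_tokens).
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.0
    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.0
        if t > 0:
            # decoder_target_data is ahead of decoder_input_data by one timestep
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.0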

2.
Train a basic LSTM-based Seq2Seq Model

... in order to predict decoder_target_data given encoder_input_data and decoder_input_data. Our model uses teacher forcing.
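
A sketch of the training call itself, assuming the training model from the earlier sketch has been rebuilt with the token counts produced by the vectorization step; batch size, epoch count, validation split and the weights file name are illustrative choices, not the notebook's settings:

batch_size = 64   # illustrative -- tune for the available CPU budget
epochs = 100      # illustrative

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Saving the weights requires h5py (installed in the setup cell above).
model.save('s2s.h5')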

3.
Test the Model by Decoding Some Sentences

I.e. turn samples from encoder_input_data into corresponding samples from decoder_target_data.
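
A sketch of that decoding loop, assuming the inference models (encoder_model, decoder_model), the token indices and the '\t'/'\n' start and end markers from the sketches above:

# Reverse lookup: from one-hot index back to the target character.
reverse_target_char_index = {i: char for char, i in target_token_index.items()}

def decode_sequence(input_seq):
    # Encode the input sentence into the encoder's internal state.
    states_value = encoder_model.predict(input_seq)

    # Start with a target sequence holding only the start-of-sequence char.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.0

    decoded_sentence = ''
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample the next character (greedy argmax) and append it.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Stop at the end-of-sequence character or a length limit.
        if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
            break

        # Feed the sampled character and the updated states back in.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0
        states_value = [h, c]

    return decoded_sentence

# Example: decode the first training sentence.
print(decode_sequence(encoder_input_data[0:1]))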