JSoC 2019: Reinforcement Learning Environments for Julia

Introduction

Hi. I'm Kartikey and this is my first blogpost since my proposal was selected for Julia Season of Code 2019. My project for the summer is adding a variety of Reinforcement Learning (RL) environments to the Julia programming language. It comes under Julia's machine learning spearhead, Flux.

Over the course of the summer of 2019, I'll be adding various RL environments for the language. While Julia is a great language for ML, and the Flux stack makes deep learning a breeze, the lack of dedicated RL environments, similar to OpenAI's acclaimed gym package, is apparent to the RL community. A lot of good work is being done in the field right now, and RL has started to generate renewed interest in the ML community. If there are proper tools for researchers and RL enthusiasts, the language may be adopted early on by the community, and that may solidify its status as the go-to language for all RL-related work. This is where my project comes in. Guiding me on my quest are mentors Dhairya Gandhi and Elliot Saba and community member Tejan Karmali.

A lot of my work is inspired by OpenAI's gym package. The package has environments from various categories. These include...

Classic control environments- These are control problems from classic RL literature.
Toy text environments- A collection of environments that serve as an easy starting point.
Algorithmic problems- The objective here is to learn to mimic computations.
Atari based environments- The goal is to achieve a high score in Atari 2600 games.
2D games, such as Moon Lander, Bipedal Walker and Car Racing.
Environments that require the MuJoCo backend.
Unit Test environments for CNNs and RNNs.

In addition to the aforementioned environments, I also plan on adding a differentiable NES emulator, which brings a whole slew of classic NES games on which to train a Julian RL agent.

A big goal is to make the environments differentiable. Automatic differentiation (AD), which is at the heart of differentiable programming, is the next big thing in ML. It not only makes taking derivatives easy, but is also far more accurate than more traditional methods such as numerical finite differences. Since ML is all about calculating gradients when tuning the model, this has made AD and differentiable programming the center of attention for next-gen ML work. The Flux stack also makes use of these techniques, and this makes having differentiable environments even more desirable.
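To make the accuracy claim a little more concrete, here is a minimal sketch of forward-mode AD using dual numbers, written in plain Julia. This is only an illustration under my own assumptions: the `Dual` type, the `derivative` helper and the test function `f` are made up for this example, and this is not how Flux or its AD backend is implemented.

```julia
# Toy forward-mode AD with dual numbers (illustrative only, not Flux's machinery).
struct Dual
    val::Float64   # value of the expression
    der::Float64   # derivative carried alongside the value
end

# Each arithmetic rule propagates the derivative (sum, product and chain rules).
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(c::Real, a::Dual) = Dual(c * a.val, c * a.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Seed the input with derivative 1 and read the derivative off the result.
derivative(f, x::Real) = f(Dual(x, 1.0)).der

# Hypothetical test function: f(x) = x^2 + 3 sin(x)
f(x) = x * x + 3 * sin(x)

x0 = 0.5
exact = 2 * x0 + 3 * cos(x0)            # analytic derivative, for comparison
ad = derivative(f, x0)                  # forward-mode AD
h = 1e-4
finite = (f(x0 + h) - f(x0)) / h        # forward finite difference

println("AD error:                ", abs(ad - exact))
println("Finite-difference error: ", abs(finite - exact))
```

The finite-difference estimate carries a truncation error that depends on the step size `h`, while the dual-number result follows the exact differentiation rules and is limited only by floating-point rounding.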

Demonstration

The work on the project has already begun and is slowly taking shape in the Gym.jl package. The project was started by members Tejan and Neethu Mariya Joy (@Roboneet), who added the initial set of environments. Over the weeks, we've improved the package by adding representations of the Discrete, Box, Tuple, Multi-Binary spaces and their derivatives. We've also added a registry system for handling the environments, and we also have a working rendering system, thanks to the efforts of community member Kyle Daruwalla (@darsnack).

Disclaimer

Before we begin, I should mention that for this demonstration, I'm using a modified version of the package. The original version that is available on GitHub can render environments three ways-

Output an RGB array
Render in Juno's Plot Pane
Render in a Gtk generated window

Importing the #master version of Gym.jl gives an error because Gtk cannot be properly initialised on Nextjournal's servers. To showcase the package in this article, I had to remove the Gtk dependency.

The code in this post can be executed. However, initialising the Julia runtime and importing the package for the first time may take a few minutes.

Loading environments

I've tried to keep things similar to the way they are in Pythonland. While designing the registry, I kept the interface very close to that of OpenAI's gym package, so that moving from gym to Gym.jl is effortless. To that end, we have the make() function, which can be used to initialise new environment configurations. It accepts a String, which is the name and version of the environment we want. We can also specify the rendering mode for the environment as a Symbol.

:human_pane for rendering in Juno's Plot Pane
:human_window for rendering in a popup window
:rgb for an RGB array

Not specifying a rendering mode defaults to the window option.

using Gym

env = make("CartPole-v0", :human_pane)

EnvWrapper(false, 0, 0, true, 195, 200, CartPoleEnv, CairoCtx(CartPoleDrawParams(0x00000190, 0x00000258, 4.8, 125.0, 0x00000064, 10.0, 125.0, 50.0, 30.0), CairoSurfaceBase{UInt32}(Ptr{Nothing} @0x000000000442b2d0, 600.0, 400.0)))

Adding new environment configurations is easy: they can be registered using the register() function. The function accepts the name of the environment, the name of the environment's struct as a Symbol, and the path to the file where it is defined. Additional arguments, such as the maximum number of episode steps or the reward threshold, can be supplied as keyword arguments. For example, the CartPole environment mentioned above is registered in the following manner:

register("CartPole-v0",
         :CartPoleEnv,
         "/classic_control/CartPole.jl",
         max_episode_steps=200,
         reward_threshold=195.0)

Executing the above cell will result in an error, because this configuration of the CartPole environment has already been registered.
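For readers curious how a registry like this can hang together, below is a small self-contained sketch of the idea. Everything in it (`EnvSpec`, `TOY_REGISTRY`, `toy_register`, `toy_make`, `DummyEnv`) is a hypothetical name invented for this post and does not mirror Gym.jl's internals. It only shows the general pattern of storing a configuration under a name and refusing duplicates.

```julia
# Toy registry sketch; all names are hypothetical and not Gym.jl's actual internals.
struct EnvSpec
    constructor::Any              # anything callable that builds the environment
    max_episode_steps::Int
    reward_threshold::Float64
end

const TOY_REGISTRY = Dict{String,EnvSpec}()

function toy_register(id::String, constructor;
                      max_episode_steps::Int = typemax(Int),
                      reward_threshold::Float64 = Inf)
    # Registering the same name twice is treated as an error.
    haskey(TOY_REGISTRY, id) && error("Environment $(id) is already registered")
    TOY_REGISTRY[id] = EnvSpec(constructor, max_episode_steps, reward_threshold)
    return nothing
end

function toy_make(id::String)
    haskey(TOY_REGISTRY, id) || error("Unknown environment $(id)")
    spec = TOY_REGISTRY[id]
    # Bundle the fresh environment with its metadata, like a simple wrapper.
    return (env = spec.constructor(), spec = spec)
end

struct DummyEnv end   # stand-in for a real environment struct

toy_register("Dummy-v0", DummyEnv; max_episode_steps = 200, reward_threshold = 195.0)
wrapped = toy_make("Dummy-v0")
println(wrapped.spec.max_episode_steps)

# A second registration under the same name fails.
duplicate_failed = false
try
    toy_register("Dummy-v0", DummyEnv)
catch err
    global duplicate_failed = true
end
println("Duplicate registration rejected: ", duplicate_failed)
```

Keeping the metadata next to the constructor means the wrapper returned by the make-style function can enforce things like the step limit without the environment itself knowing about them.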

Available Spaces

The Spaces have been added to provide a little structure to the Gym. All the spaces are subtypes of an abstract type called AbstractSpace. We have the following spaces...

Box
Discrete
MultiBinary
TupleSpace
DictSpace
MultiDiscrete

A sample from a space can be drawn by using the sample() function. For example...

my_box_space = Gym.Space.Box([-1.1,-2.0], [2.6,4.0], Float32) # Box(low::Array, high::Array, dtype::DataType)
sample(my_box_space)

2-element Array{Float32,1}:
  1.4229
 -0.936795
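The "abstract type plus one sampling method per space" design can be sketched in a few lines of plain Julia. The names below (`ToySpace`, `ToyBox`, `ToyDiscrete`, `toy_sample`, `toy_contains`) are hypothetical and chosen so they don't clash with the package. The 1-based action numbering in `ToyDiscrete` is an assumption of this sketch, not a statement about Gym.jl.

```julia
using Random

# Hypothetical miniature of a space hierarchy; not Gym.jl's actual types.
abstract type ToySpace end

struct ToyBox{T<:AbstractFloat} <: ToySpace
    low::Vector{T}    # per-dimension lower bounds
    high::Vector{T}   # per-dimension upper bounds
end

struct ToyDiscrete <: ToySpace
    n::Int            # assumption of this sketch: valid values are 1:n
end

# One method per space type; multiple dispatch picks the right one.
toy_sample(rng::AbstractRNG, s::ToyBox{T}) where {T} =
    s.low .+ rand(rng, T, length(s.low)) .* (s.high .- s.low)
toy_sample(rng::AbstractRNG, s::ToyDiscrete) = rand(rng, 1:s.n)

# Membership checks are handy for validating actions before stepping.
toy_contains(s::ToyBox, x) =
    length(x) == length(s.low) && all(s.low .<= x) && all(x .<= s.high)
toy_contains(s::ToyDiscrete, x::Integer) = 1 <= x <= s.n

rng = MersenneTwister(42)
box = ToyBox(Float32[-1.1, -2.0], Float32[2.6, 4.0])
point = toy_sample(rng, box)
action = toy_sample(rng, ToyDiscrete(2))

println(point, " in box? ", toy_contains(box, point))
println(action, " is a valid action? ", toy_contains(ToyDiscrete(2), action))
```

Because every space answers the same two questions (give me a sample, is this value valid), code that drives an environment does not need to know which concrete space it is dealing with.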

Progressing through the timesteps

The make() function returns an environment that is wrapped in a wrapper. This wrapper neatly binds the environment with its metadata. To step through the environment, we pass our action to the step!() function in a format that agrees with the environment's action space. In the case of a newly generated environment instance, we must first call the reset!() function to generate an initial state.

env = make("CartPole-v0", :human_pane)
reset!(env)

4-element Array{Float32,1}:
 -0.0453082
 -0.0477357
  0.0103336
  0.0169484

CartPole has a Discrete(2) action space, which denotes the two directions in which force can be applied to the cart. The environment and the action are passed to the step!() function to apply that action. In the case of CartPole, the function returns the updated environment state, the reward, a boolean value that tells us whether the simulation has terminated, and any additional information, if present.

# We can access the environment inside the wrapper through the wrapper's `_env` field
next_state, reward, done, _ = step!(env, sample(env._env.action_space))
println("New state is $(next_state)")
println("The reward is $(reward)")
println("Is the simulation complete? $(done)")
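To show the whole reset-then-step cycle in one place without needing the package, here is a sketch with a made-up environment. `LineWalkEnv`, `toy_reset!`, `toy_step!` and `run_episode!` are hypothetical names for this example only. The four-element return value simply imitates the (state, reward, done, info) convention described above, and the `max_steps` cap plays the role that a maximum number of episode steps would.

```julia
# Hypothetical toy environment: walk on a line until |position| reaches 3.
mutable struct LineWalkEnv
    position::Int
end
LineWalkEnv() = LineWalkEnv(0)

function toy_reset!(env::LineWalkEnv)
    env.position = 0
    return env.position               # initial state
end

function toy_step!(env::LineWalkEnv, action::Int)
    action in (1, 2) || error("action must be 1 (left) or 2 (right)")
    env.position += (action == 1 ? -1 : 1)
    done = abs(env.position) >= 3     # episode ends at either boundary
    reward = 1.0                      # one unit of reward per step taken
    return env.position, reward, done, Dict{Symbol,Any}()
end

# Run one episode with a policy (a function from state to action).
function run_episode!(env, policy; max_steps = 200)
    state = toy_reset!(env)
    total_reward = 0.0
    steps = 0
    for _ in 1:max_steps
        state, reward, done, _ = toy_step!(env, policy(state))
        total_reward += reward
        steps += 1
        done && break
    end
    return total_reward, steps
end

# A trivial policy that always moves right reaches the boundary in three steps.
total, steps = run_episode!(LineWalkEnv(), state -> 2)
println("Total reward: $(total) after $(steps) steps")
```

Swapping the constant policy for a random one, or eventually for a trained agent, only changes the function passed to `run_episode!`; the loop itself stays the same.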

Rendering

To render the current state of an environment, we call the render!() function. As mentioned above, we currently have three ways to render environments- :human_pane, :human_window and :rgb.

env = make("CartPole-v0", :human_pane)
reset!(env)
render!(env)

After some time steps, we call render!() again to see the difference...

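# Pre-sample up to 1000 random actions from the action space
actions = [sample(env._env.action_space) for i = 1:1000]
i = 1
done = false
reset!(env)
# Step until the episode terminates or we run out of actions
while i <= length(actions) && !done
    global i, done
    _, _, done, _ = step!(env, actions[i])
    i += 1
end
render!(env)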

The presence of only static images may give the impression that the rendering is incapable of animation, but that is far from the truth. While Nextjournal is undoubtedly the best platform for writing this blogpost, I am unable to show animations that are generated by the running script. But the following short video should give you an idea of what to expect...

List of available environments

As of right now, there are three environments in the Gym:

CartPole
Pendulum
Continuous_MountainCar

I am already working on the renderers for Pendulum and Continuous_MountainCar, and they should be out soon! :)

Wrap up and concluding thoughts

This has been a short (or as short as my sleep-deprived brain could manage) rundown of what my project is. I've explained where I want it to go, and I've explained, as best as I can, where it is right now.

I would like to conclude this post by wishing my fellow GSoC and JSoC participants the best of luck for their projects. Everyone has their projects, their deadlines, and their plans of attack.

Well then, let's get this party started, shall we?