JSoC 2019: Reinforcement Learning Environments for Julia
Introduction
Hi. I'm Kartikey, and this is my first blogpost since my proposal was selected for Julia Season of Code 2019. My project for the summer is adding a variety of Reinforcement Learning environments for the Julia programming language, under Julia's machine learning spearhead, Flux.
Over the course of the summer of 2019, I'll be adding various RL environments for the language. While Julia is a great language for ML, and the Flux stack makes deep learning a breeze, the lack of dedicated RL environments, similar to OpenAI's acclaimed gym package, is apparent to the RL community. A lot of good work is being done in the field right now, and RL has started to generate renewed interest in the ML community. If there are proper tools for researchers and RL enthusiasts, the language may be adopted early on by the community, and that may solidify its status as the go-to language for all RL-related work. This is where my project comes in. Guiding me on my quest are mentors Dhairya Gandhi and Elliot Saba and community member Tejan Karmali.
A lot of my work is inspired by OpenAI's gym package. The package has environments from various categories. These include...
- Classic control environments: control problems from classic RL literature
- Toy text environments: a collection of environments that serve as an easy starting point
- Algorithmic problems: the objective here is to learn to mimic computations
- Atari-based environments: the goal is to achieve a high score in Atari 2600 games
- 2D games, such as Lunar Lander, Bipedal Walker and Car Racing
- Environments that require the MuJoCo backend
- Unit test environments for CNNs and RNNs
In addition to the aforementioned environments, I also plan on adding a differentiable NES emulator, which brings with it a whole slew of classic NES games to train a Julian RL agent on.
A big goal is to make the environments differentiable. Automatic Differentiation (AD), which is at the heart of differentiable programming, is the next big thing to happen to ML: it makes taking derivatives simple, and far more accurate than traditional numerical methods. Since ML is all about computing gradients to tune a model, AD and differentiable programming have become the centre of attention for next-gen ML work. The Flux stack makes heavy use of these techniques, which makes having differentiable environments even more desirable.
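Gym.jl's environments aren't wired into the differentiable stack yet, but to make the idea concrete, here is a minimal sketch of what AD buys you, assuming Zygote (the source-to-source AD engine behind Flux's differentiable programming push) is installed:

using Zygote

# An exact derivative, with no finite-difference approximation involved
f(x) = 3x^2 + 2x
Zygote.gradient(f, 5.0)   # (32.0,)

Once an environment's dynamics are written as plain Julia functions, the same machinery can, in principle, push gradients straight through a simulation step.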
Demonstration
The work on the project has already begun and is slowly taking shape in the Gym.jl package. The project was started by members Tejan and Neethu Mariya Joy (@Roboneet), who added the initial set of environments. Over the weeks, we've improved the package by adding representations of the Discrete, Box, Tuple and Multi-Binary spaces and their derivatives. We've also added a registry system for handling the environments, and we have a working rendering system, thanks to the efforts of community member Kyle Daruwalla (@darsnack).
Disclaimer
Before we begin, I should mention that for this demonstration I'm using a modified version of the package. The original version available on GitHub can render environments in three ways:
- Output an RGB array
- Render in Juno's Plot Pane
- Render in a Gtk generated window
Importing the #master version of Gym.jl gives an error because Gtk cannot be properly initialised on Nextjournal's servers. To showcase the package in this article, I had to remove the Gtk dependency.
The code in this post can be executed. However, initialising the Julia runtime and importing the package for the first time may take a few minutes.
Loading environments
Effort has been put into keeping things similar to the way they are in Pythonland. While designing the registry, I've tried to keep the interface very close to OpenAI's gym package, so that moving from gym to Gym.jl is effortless. To that end, we have the make() function, which can be used to initialise new environment configurations. It accepts a String, which is the name and version of the environment we want. We can also specify the rendering mode for the environment as a symbol...
- :human_pane for rendering in Juno's Plot Pane
- :human_window for rendering in a popup window
- :rgb for an RGB array
Not specifying a rendering mode defaults to the window option.
using Gym
env = make("CartPole-v0", :human_pane)
Environments can also be registered using the register() function. Adding environment configurations is easy. The function accepts the name of the environment, the name of the environment's struct as a Symbol, and the path to the file where it is defined. Additional arguments, such as the maximum number of episode steps, the reward threshold etc., can be set using keyword arguments. For example, the CartPole environment mentioned above is registered in the following manner:
register("CartPole-v0", :CartPoleEnv, "/classic_control/CartPole.jl", max_episode_steps=200, reward_threshold=195.0)
Executing the above cell will result in an error, because this configuration of the CartPole environment has already been registered.
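To give an idea of what registering your own environment would look like, here is a hypothetical sketch (the environment name, struct name and file path below are made-up placeholders, not part of the package):

# "MyEnv-v0", :MyEnvStruct and the path are illustrative only
register("MyEnv-v0", :MyEnvStruct, "/custom/MyEnv.jl", max_episode_steps=500, reward_threshold=100.0)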
Available Spaces
The spaces have been added to provide a little structure to the Gym. All the spaces are subtypes of an abstract type called AbstractSpace. We have the following spaces...
- Box
- Discrete
- MultiBinary
- TupleSpace
- DictSpace
- MultiDiscrete
A sample from a space can be drawn using the sample() function. For example...
my_box_space = Gym.Space.Box([-1.1,-2.0], [2.6,4.0], Float32) # Box(low::Array, high::Array, dtype::DataType)
sample(my_box_space)
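Sampling from the other spaces works the same way. As a quick sketch (assuming the Discrete constructor lives under the same Gym.Space module and takes the number of actions, as in CartPole's Discrete(2) action space):

my_discrete_space = Gym.Space.Discrete(2) # Discrete(n), as used for CartPole's action space
sample(my_discrete_space)                 # returns one of the two possible actions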
Progressing through the timesteps
The make() function returns the environment wrapped in a wrapper that neatly bundles the environment with its metadata. To step through the environment, we pass our action to the step!() function in a format that agrees with the environment's action space. For a newly created environment instance, we must first call the reset!() function to generate an initial state.
env = make("CartPole-v0", :human_pane) reset!(env)
CartPole has a Discrete(2) action space, which denotes the two directions in which force can be applied to the cart. The environment configuration and the action are passed to the step!() function to apply that action. In the case of CartPole, the function returns the updated environment state, the reward, a boolean value that tells whether the simulation has terminated, and any additional information, if present.
# We can access the environment in the wrapper by using the `_env` field of the wrapper
next_state, reward, done, _ = step!(env, sample(env._env.action_space))
println("New state is $(next_state)")
println("The reward is $(reward)")
println("Is the simulation complete? $(done)")
Rendering
To render the current state of an environment, the render!() function is called. As mentioned above, we currently have three ways to render environments: :human_pane, :human_window and :rgb.
env = make("CartPole-v0", :human_pane) reset!(env) render!(env)
After some time steps, we call render!() again to see the difference...
actions = [sample(env._env.action_space) for i=1:1000]
i = 1
done = false
reset!(env)
while i <= length(actions) && !done
    global i, done
    a, b, done, d = step!(env, actions[i])
    i += 1
end
render!(env)
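If all you want is raw pixels, say for feeding frames into a convolutional network, the :rgb mode is meant for that. A hedged sketch, assuming render!() hands the array back in that mode:

env_rgb = make("CartPole-v0", :rgb)
reset!(env_rgb)
frame = render!(env_rgb)  # expected to return an RGB array instead of drawing to a pane or window
size(frame)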
The presence of only static images may give the impression that the rendering is incapable of animation, but that is far from the truth. While Nextjournal is undoubtedly the best platform for writing this blogpost, I am unable to show animations that are generated by the running script. But the following short video should give you an idea of what to expect...
List of available environments
As of right now, there are three environments in Gym.jl...
- CartPole
- Pendulum
- Continuous_MountainCar
I am already working on the Pendulum and Continuous_MountainCar renderers, and they should be out soon! :)
Wrap up and concluding thoughts
This has been a short (or as short as my sleep-deprived brain could manage) rundown of what my project is. I've explained where I want it to go, and I've explained, as best as I can, where it is right now.
I would like to conclude this post by wishing my fellow GSoC and JSoC participants the best of luck for their projects. Everyone has their projects, their deadlines, and their plans of attack.
Well then, let's get this party started, shall we?