JSoC'19: A Summary
Hello! This is the second and final blog post, in which I describe all the work I did under the Julia Season of Contributions in the summer of 2019. Thanks for this wonderful opportunity, and a big thank you to my mentors, Dhairya Gandhi and Elliot Saba, and to the entire Julia community.
While I had intended to work solely on the Gym.jl package when I started off four months ago, I ended up doing, and learning, much more than I had initially anticipated. I will break down my progress month by month.
June
June started off with the end of my semester. The first week was a bit hectic because I had to move my stuff around and relocate, but I would sit down with my work at night, when things got quiet. I used this time to fix a few bugs that had not been caught during review in the pre-JSoC period. After that I moved on to writing the algorithmic environments of the Gym. Agents can learn to imitate computation from these environments. Here's an example with the Copy environment...
```julia
using Gym

env = make("Copy-v0", :human)
Gym.reset!(env)
Gym.render!(env)
```
I then finished off the remaining classic control environments. I added the MountainCar environment, added the rendering for MountainCar and ContinuousMountainCar, and fixed the rendering for the PendulumEnv. Cairo, the graphics library I use in Gym.jl, is a bit tricky to work with. I had trouble figuring out how to rotate the cart along the track, so that part isn't smooth yet and still needs work.
env = make("MountainCar-v0", :human_pane) Gym.reset!(env) Gym.render!(env)
I also worked on the Atari environments. I started by reading the Arcade Learning Environment (ALE) reference manual and then turned to ArcadeLearningEnvironment.jl, which to my knowledge hadn't been updated since the v0.6 days, intending to bring it up to the standards we have in v1.1. Fortunately, JuliaReinforcementLearning already had a fork that was v1.0 compatible, which was nice. I went over it nonetheless, added docstrings, and fixed a few minor errors here and there. Soon after that I was able to add an interface to the ALE in the Gym and define the environments. June ended with me finishing up the Atari environments. Here's a small demonstration...
```julia
using Gym, GymSpaces

env = make("PongNoFrameskip-v4", :human_pane)
Gym.reset!(env)
action_space = env.action_space
for i = 1:200
    Gym.step!(env, sample(action_space))
end
Gym.render!(env)
```
July
As I finished up with the Atari environments, I wanted to test them out and make sure that they function properly. I decided that the best way to do that would be to train an agent on them. I selected the DDQN algorithm because it is one of the seminal works in RL and there is an abundance of implementations to refer to. There's also a working implementation in Julia, but it had to be updated, which was all the more reason for selecting this algorithm. It took me some time to read and understand the literature, and I started off with a basic implementation which, to no one's surprise, didn't work. At this point, I decided to shelve it and come back to it at a later time. More on that later...
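For context, the heart of DDQN is how it computes the learning target: the online network chooses the greedy next action, while a slowly updated target network evaluates it. Here's a minimal sketch of that computation with Flux; the network sizes, names, and batch layout are illustrative and not my actual agent code.

```julia
using Flux

# Toy CartPole-sized networks; `online_net` and `target_net` are illustrative names.
online_net = Chain(Dense(4, 32, relu), Dense(32, 2))
target_net = deepcopy(online_net)

γ = 0.99f0  # discount factor

# `next_states` is a (features × batch) matrix; `rewards` and `dones` are vectors.
function ddqn_targets(rewards, next_states, dones)
    q_online = online_net(next_states)                    # online net picks the greedy action...
    best = [argmax(q_online[:, i]) for i in 1:size(q_online, 2)]
    q_target = target_net(next_states)                    # ...target net evaluates it
    q_next = [q_target[best[i], i] for i in eachindex(best)]
    return rewards .+ γ .* q_next .* (1 .- dones)         # no bootstrap on terminal states
end
```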
Around this time, my mentor, Dhairya Gandhi, asked me if I'd be up for a project involving a differentiable Pong environment. This was my first encounter with Zygote.jl. Writing the environment itself was easy; the challenge was making it differentiable. I had to read Zygote's docs and articles on automatic differentiation. I finally got down to it and, after some trial and error, got something working. But it had a few bugs related to the underlying IRTools used by Zygote. It took a few workarounds to get past them, but now we have something that works. Of course, I'm not done with it yet. I can still think of ways to improve on the work I've done, and hopefully I'll be able to get to it very soon.
```julia
using DiffPong

env = Env()
DiffPong.reset!(env)
for _ = 1:100
    state, reward, done, _ = DiffPong.step!(env, rand(1:3))
end
DiffPong.render(env) |> display
```
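The demo above just steps the environment with random actions; the interesting part is that Zygote can push gradients through the dynamics themselves. As a toy illustration of that idea (this is not DiffPong's actual API, just a self-contained example), here is Zygote differentiating the outcome of a small simulation loop with respect to its input:

```julia
using Zygote

# A toy ballistic "environment": final height after `steps` of Euler integration.
function simulate(v0; steps = 100, dt = 0.01)
    x, v = 0.0, v0
    for _ in 1:steps
        v -= 9.8 * dt    # gravity
        x += v * dt      # integrate position
    end
    return x
end

# Gradient of the final height w.r.t. the initial velocity, flowing through the loop.
g, = Zygote.gradient(simulate, 5.0)
```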
Next up were the NES environments. For these I had a dynamic library for an NES emulator that I wanted to use, but it wasn't as fully realised as ALE. For example, ALE handles checking whether the ROM is corrupt, loading the ROM properly, generating environment instances, working out the rewards, and so on; these tasks are handled by ALE, not the user. That wasn't the case with the NES emulator I was working with. It was essentially just an emulator, and it was my job to make stable, functioning environments out of it. I hit my first obstacle almost immediately after I began: random SEGFAULTs. It took me about half a week to overcome this problem. I had to reread the docs that describe Julia's C interoperability. I looked through the code again and identified the bug; I had used a `Cchar` when I should have used a `Cwchar_t` - a subtle mistake that I had previously overlooked. After that, I breezed through the NES environment definition and its related functions.
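To illustrate that class of mistake (using libc's `wcslen` rather than the emulator's actual API), the C type declared in a `ccall` has to match exactly; handing the C side a single-byte `Cchar` buffer where it expects wide characters is enough to send it reading into garbage memory:

```julia
# libc's wcslen expects a NUL-terminated array of wide characters (Cwchar_t).
# Julia's Cwstring handles the transcoding; declaring the argument as a plain
# Cchar string here would pass a mismatched buffer to the C side.
len = ccall(:wcslen, Csize_t, (Cwstring,), "mario")
@assert len == 5
```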
The next step was writing the Super Mario Bros levels. I hit my next roadblock when trying to obtain the game screen from the screen buffer. It took me some time to figure out, but I did find the problem: I was reading data from the wrong locations in the emulator's RAM. With this fixed, the NES emulator and the SMB environments were quickly finished.
```julia
using NES

environment_type = "smb"
action_type = :RIGHT_ONLY
env = SMBEnv(environment_type, action_type)
current_state = NES.reset!(env)
done = false
actions = [env.action_map |> keys |> rand for _ = 1:200]
for action ∈ actions
    global done, current_state
    current_state, reward, done, info = NES.step!(env, action)
    done && break
end
NES.render(env) |> display
```
August
As August came around, my new term in college began, so I haven't been as active as I had hoped to be.
I started August with the intention of getting the Lunar Lander and Bipedal Walker done. I had picked the RigidBodyDynamics.jl package for that purpose, but it turned out to require a certain physics background that I don't currently have, so I decided to come back to it at a later point in time. It was around this time that I decided to take a break from the Gym work and try something else. I figured it would be a great time to go back to the DDQN agent and pick up where I had previously left off. With a refreshed mind and a new outlook, I made major changes to the agent code, essentially rewriting it from the ground up. This turned out to be a good thing, as it made me aware of a few things I had missed in the environment code, which I quickly fixed. I tried numerous approaches, each of which improved my agent bit by bit, but still not enough to reach convergence. I tried almost everything I could think of to fix the bug that eludes me, but to no avail. After sitting with it for almost a week and a half, I decided to shelve it yet again and think of other ideas to try until the day I come back to it.
A couple of weeks ago, I was asked to write adjoints for the `cumsum` and `cumprod` functions for Zygote. I spent the first day catching up and studying the groundwork that had already been laid. I figured I should start easy, which meant starting with a GPU implementation of `reverse` that supported the `dims` keyword and worked on multi-dimensional arrays. Working on the DDQN agent had already given me some experience with CuArrays, and I used that to study the basics of GPU programming (something I had wanted to try for quite some time). I wrote a very simple implementation that performs index transformations by treating a multi-dimensional array as a 1D array through unravelled indices. That was a quarter of the problem solved. The next two quarters involved writing the adjoints themselves. `cumsum` was easy; `cumprod` would also have been easy, but it had a few edge cases that had to be dealt with separately. The remaining part of the problem involves rewriting the CUDA kernels for prefix scan in CuArrays.jl. The current implementation uses the Hillis-Steele algorithm and global memory, which makes it very slow and unoptimised. I plan on rewriting the kernels using the Blelloch algorithm and shared memory, which should lead to a performance speedup.
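For the record, the idea behind the `cumsum` adjoint is pleasantly symmetric: the pullback of a cumulative sum is a reversed cumulative sum of the incoming gradient along the same dimension (which is exactly why a `dims`-aware `reverse` was needed on the GPU first). Here is a simplified sketch of that rule, not the exact code that ended up in Zygote:

```julia
using Zygote

# Forward pass plus hand-written pullback for cumsum along `dims`
# (a simplified sketch, not the actual Zygote adjoint definition).
function cumsum_pullback(xs; dims)
    y = cumsum(xs; dims = dims)
    # Each input element contributes to every later output, so the gradient
    # accumulates from the opposite end: reverse, cumsum, reverse again.
    back(Δ) = reverse(cumsum(reverse(Δ; dims = dims); dims = dims); dims = dims)
    return y, back
end

# Quick check against Zygote on a small example:
x = rand(3, 4)
y, back = cumsum_pullback(x; dims = 2)
Δ = ones(size(y))
@assert back(Δ) ≈ Zygote.gradient(x -> sum(cumsum(x; dims = 2)), x)[1]
```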
I finished a major bulk of the toy text environments for the Gym a few days ago. I still have to write their rendering and register them in the Gym's registry, but that should be easy, so I'm not particularly sweating it at the moment.
Recently, I decided to collaborate with a few community members to work on a Differentiable Neural Computer, which is based on a paper published by DeepMind back in 2016. It's a very interesting project and if anybody wants in on that, they are more than welcome :)
Conclusion and parting thoughts
This is a summary of almost everything I did during my 3-month-long summer break. As someone who barely had a GitHub presence in March, and who was intimidated by open source projects because of just how massive they tend to be, I'd say I've grown to love the whole idea of open source. During this time, I learned a lot about Julia, ML, and other disciplines as well. I have already chalked out plans for future projects (all of them in Julia :P ) and I plan on hanging around in Julia's Slack for a long time.
Until next time.