A fresh approach to N-Body problems
N-Body problems describe the motion of a system of N mutually interacting particles or bodies. They initially appeared in astronomy, as a way of describing the motion of the planets, but the same formalism can also be applied in the case of molecular dynamics or other systems where we have entities interacting via potentials. The simplest case of N=2 has analytical solutions, but starting from N=3, there are no general analytical solutions and thus numerical methods are required.
Mathematically, this implies solving a system of ordinary differential equations, constructed by computing the forces on each particle of the system. Since in the most cases of interest the above mentioned N is very large, this results in a large system of ordinary differential equations (ODEs) that need to be solved. This in turn generates constraints on the performance of the numerical methods that we can use to solve the said system.
Moreover, in most cases, the equations of motion of interest have a special structure that preserves some invariants, such as the total energy of the system. Thus we need a method that has good conservation properties and good performance. For this reasons, in practice, the most widely used numerical method is the Verlet integrator (or variations such as the leapfrog method).
In Julia, the answer to solving large system of ODEs is to use the differential equations integrators provided by the SciML ecosystem. If we do not want to construct the system of equations ourselves, we can use on of the dedicated N-Body problem packages in Julia to simplify our task. One of them is the NBodySimulator.jl package, which can use the vast number of integrators in the SciML ecosystem.
One question that arises naturally is if there are other numerical methods that could be used besides the Verlet method. To answer this question, I started benchmarking several integrators in order to see how they behave. As a test problem, I chose one of the examples form the NBodySimulator.jl package dealing with a liquid argon simulation. The benchmark can be seen in the SciMLBenchmarks.jl repository here.
As the benchmark requires a very long runtime and significant computing power, it would be counter-productive to reproduce them here, so I will continue only with the main results.
When we solve numerically a system of ODEs, we are implicitly converting a continuous dynamical system into a discrete one. It is important to note that the mathematical properties of the continuous system are not necessarily the same as the ones of its discrete version. Thus if we want to preserve the qualitative features of the system, we should choose an integration method that gives the discrete version of the system properties as close as possible to the original ones.
In the case of classical mechanics, examples of such mathematical properties that we want to preserve, are energy conservation and the symplectic structure. There is a family of integrators called symplectic integrators which is of interest as preserving the symplectic structure indirectly helps with energy conservation.
Comparing most of the symplectic methods in OrdinaryDiffEq.jl (see the complete list here) in terms of runtime vs energy error gives the following result.
As we can see, the low order methods generally perform better than the high order ones in this case. We can also see that CalvoSanz4 and McAte2 are competitive with the Verlet method.
Besides symplectic integrators, there is an other family of integrators specialized on second order ODEs called Runge-Kutta-Nyström integrators, which can be more efficient in certain conditions as they have adaptive time stepping, but they do not conserve energy that well.
Comparing them to the best symplectic integrators from the previous benchmark gives
As we can see above, the symplectic methods perform significantly better on energy error, but they are slower.
During the development of these benchmarks, I discovered another Julia package that was targeting N-body simulations, named Molly.jl. This package was built for molecular dynamics and thus was more advanced than NBodySimulator.jl in those aspects. After a discussion with my JSoC mentor, I decided to start working on this package (together with its author) to optimize and enhance it in order to become a future successor of NBodySimulator.jl
In order see the performance impact of the optimizations we will consider two test problems (taken from the tests of the package): one concerning a Lennard-Jones gas and another considering more complex interactions, a peptide simulation.
The first part of the benchmark measures the complete simulation time and the second part measures the time for one
accelerations function call (which is the most important part in a time step).
In order to see the performance evolution, I locally ran the above code on the commits mentioned in the comments and recorded the results (the absolute values are not relevant in this case).
The first data point is before my first pull request to Molly.jl, the second one is after some changes aimed at eliminating dynamic dispatch and type instability. As we can observe, these were the most significant performance improvements.
The next step was to improve the testing infrastructure by checking energy conservation during simulations. This required adding the potentials besides the forces for each type of interaction. I also made some performance improvements targeting non-bonded interactions (which can be seen in the speedup in the Lennard-Jones test). This gives the situation at the third data point.
The last data point is from an ongoing PR of Joe Greener (@jgreener64) which aims at making Molly work on GPUs and also being differentiable via Zygote.jl.
In this section I will detail more about the optimizations that I implemented in Molly. The first round of optimizations was aimed at eliminating dynamic dispatch, as this was occurring during the acceleration function call, which is one of the most called functions during the simulation. This was achieved by ensuring that the
Simulation structure had concrete types for its fields. This helps type inference to determine the types cases like
s.atoms and thus eliminates dynamic dispatch.
Another optimization was part of the energy function PR and was aimed at the
vector function, which computes the distance between two coordinates by taking into account periodic boundary conditions. Since almost all potentials and forces depend on the distance between particles, this is one of the most important functions to optimize during the simulation. The version in the package (at v0.1.0) uses broadcasting to compute the result.
Since Molly uses
StaticArrays to represent coordinates for performance (in the CPU case), we can add a new method specialized on
StaticArrays to add a performance optimization while retaining the generic fallback. How could such an optimization look like?
The simplest way to optimize the above code would be to directly hardcode the resulting
But this will not be as general as the previous definition. If we would like to also use this optimization in the case of 2D simulations, we would have to write another method. What could we do to keep generality in this case?
The answer is to use the compiler to write the code for us. This can be done via metaprogramming, more specifically with generated functions.
We can use similar techniques for other optimizations such as eliminating
if checks in code by encoding information in type parameters. For example, there is an option to check if the force between two particles is too big and truncate it if necessarily. This can help in the case of incorrectly chosen initial conditions that put particles too close.
The above show the use of optionally generated functions to eliminate the
if check completely from the code if the
F type parameter is
Comparing with NBodySimulator.jl
After optimizing the Molly.jl code, an question that appears is how does it compare with NBodySimulator.jl. To this end, we can take a test problem and simulate it with both packages to see the differences.
We will consider the
NBodySimulator setup as the reference one and create a similar one in
Molly. We will benchmark both the total simulation time and the acceleration computation in one step.
acceleration (generic function with 1 method)
We can now initialize the setups and benchmark the accelerations
As we can see, the
Molly version is over 2x faster than the
NBodySimulator one. Let us now look at the full simulation time.
In this case, the
NBodySimulator version almost 2x is faster. This indicates that the integration is more efficient, which was expected as the package uses the high performance integrators from
OrdinaryDiffEq. A future integration of these integrators with
Molly is planned.
The energy function PR added the possibility to record the energy of the system during its evolution. While computing the energy of the system seems quite simple at first sight, a more subtle problem appears when we take a closer look at the potentials.
The total potential energy of a system is given as a sum of the individual inter-particle potentials
formula not implemented
The forces acting on the particles are given by
formula not implemented
In the case of the Lennard-Jones potential, the inter-particle potential is given by
formula not implemented
and the forces are given by
formula not implemented
As the potential, and thus also the force decreases rapidly with the distance, in almost every implementation of the Lennard-Jones force calculation there is a cutoff radius beyond which the force is set to 0.
While this sounds like a very sensible approach, it introduces a discontinuity in the force function and it requires us to also modify the potential, as beyond the cutoff radius the force would be 0, but the derivative of the unmodified potential is not. One way to truncate the potential is to shift the potential by its cutoff value.
formula not implemented
This way the potential function is continuous and the relation between forces and potentials is satisfied. This truncation method is called shifted potential cutoff.
Another option is to shift the force in order to make it continuous
formula not implemented
This requires a more complicated change in the potential in order to satisfy the relation between them. This method is called the shifted force cutoff. The continuity of the force is desirable as it may give better energy conservation properties as shown in Toxvaerd 2011.
There are also more complicated truncation methods that interpolate between the original potential and 0, but we will consider those two for the moment.
The truncation approximations that we use can significantly alter the qualitative features of the simulation as shown in many articles in the molecular dynamics literature (Fitzner 2017, van der Spoel 2006 and others).
After the brief theoretical review above, we can focus on the implementation of cutoffs in
Molly. As the truncation approximation can be theoretically applied to any sufficiently rapidly decaying potential, it is important to decouple the potential implementation from the cutoff approximation.
In the general case, the cutoffs all share a number of points at which the force expression changes. If we have no cutoff, that number is 0, if we have a shifted force or potential cutoff, then that number is 1. In more complex cases we can have 2 points at which the force expression changes.
Because of the nature of the cutoff, we must exit the function early after the cutoff radius. This imposes that we
return from inside the force function corresponding to the potential. In order to decouple the cutoff approximation we can use the following
This way we can have any number of cutoffs approximations for the same potentials without needing to rewrite the code each time. A similar technique is used for the potential.
For the cutoff implementations we now just have to dispatch on the cutoff type
As we can see, with this design the relationship between the force and the potential is enforced by dispatching on the same type in both the
force_cutoff function and the
As a further performance optimization, we could also use optionally generated functions in the force implementation to insert only the code corresponding to the desired number of cutoff points at compile time.
As next steps for this project, I would like to help integrate
Molly with the integrators in
OrdinaryDiffEq. To this end I created the ParticleAccelerations.jl package which will factor out the summation algorithms for forces and potentials and provide a way to decouple the summation implementations from the interaction implementations (which will live in
Molly). An interesting aspect to tackle would be to implement Ewald summation, which could improve the efficiency of the summation algorithm.