# Beginning of Julia season of contribution

## Introduction

Hello everyone!

I have been accepted to "Julia season of contribution" and I could not be happier. I will spend the following months developing a new package for "Accelerating optimization via machine learning with different surrogate models".

Sounds cool, right? Well, it is! The idea behind this approach is quite intuitive: suppose we have a function that is expensive to evaluate, such as the solution of a differential equation model like *Lotka-Volterra*, an *air pollution model*, or the *2D Brusselator semilinear heat equation*. Instead of optimizing the expensive function directly, we can fit a cheap surrogate model on a few of its evaluations and work with that instead.
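To make the idea concrete, here is a toy sketch in Julia (all names are mine, nothing here is the package's API): pretend `sin` is an expensive simulation, sample it a few times, and fit a cheap quadratic surrogate by least squares.

```julia
expensive_f(x) = sin(x)            # stand-in for a costly simulation

xs = collect(0.0:0.5:3.0)          # a handful of sample points
ys = expensive_f.(xs)

# Least-squares fit of a quadratic: ys ≈ c[1] + c[2]*x + c[3]*x^2
V = [xs .^ 0 xs .^ 1 xs .^ 2]      # Vandermonde-style design matrix
c = V \ ys

surrogate(x) = c[1] + c[2] * x + c[3] * x^2

# The surrogate is cheap to evaluate and close to f near the samples.
println(abs(surrogate(1.25) - expensive_f(1.25)))
```

Once such a surrogate exists, any optimizer can query it thousands of times at essentially no cost.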

## Many surrogates to choose from!

I am going to mainly focus on the following surrogates: **Response surfaces**, **Machine learning based surrogates** and **Gaussian Processes**.

I will give a **very** high level explanation of what those are. If you are interested in mathematical details you can take a look at my proposal right here on my GitHub.

### Response surfaces

Response surfaces are polynomial interpolations "polluted" with other functions called *basis functions*.

There are many such basis functions; I am going to focus on a handful of them, the last being *Kriging*. Kriging stands out from the others because it has a statistical interpretation: we can compute an interpolant as well as a measure of its possible error. We can then develop search methods that put extra emphasis on sampling where this error is high.

### Machine learning based surrogates

I will mainly focus on *Random forest regression* algorithms and *Radial Basis Neural Networks*. For the former, there is already the XGBoost library that does the heavy lifting, so it should be quite easy to integrate that with DifferentialEquations.jl.

For the latter, the main idea is to interpolate with the following function:

$$\hat{f}(x) = \sum_{i=1}^{n} w_i \, \varphi(\lVert x - x_i \rVert)$$

where the $x_i$ are the sampled points, $\varphi$ is a radial basis function, and the $w_i$ are weights to be determined.

After some analysis on functionals, it can be proved that the weights are found by solving the following linear system:

$$A w = y, \qquad A_{ij} = \varphi(\lVert x_i - x_j \rVert)$$

where $y$ collects the sampled function values; equivalently, this can be viewed as a *minimization* problem solved with least-squares regression.
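The linear system above is only a few lines of Julia. Here is a minimal toy sketch (my own code, not the planned package API), using a Gaussian basis $\varphi(r) = e^{-r^2}$:

```julia
using LinearAlgebra

φ(r) = exp(-r^2)                       # one possible radial basis function

xs = [0.0, 0.5, 1.0, 1.5, 2.0]         # sample sites
ys = cos.(xs)                          # sampled values of the target

# Interpolation matrix A_ij = φ(‖x_i − x_j‖), then weights w = A \ y
A = [φ(abs(xi - xj)) for xi in xs, xj in xs]
w = A \ ys

rbf(x) = sum(w[i] * φ(abs(x - xs[i])) for i in eachindex(xs))

# The interpolant reproduces the data at the nodes and is smooth in between.
println(rbf(1.0) - cos(1.0))
```

In practice the matrix can be ill-conditioned, so real implementations add regularization or use more robust solvers, but the structure is exactly the one in the equations above.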

### Gaussian process (Bonus)

A Gaussian process (GP) is a collection of indexed random variables such that every finite collection of those random variables has a multivariate normal distribution.

The distribution is fully characterized by its mean function and its covariance function.

They are useful for regression because not only do we get a predicted value in an unknown region, but also a confidence level for that prediction.
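A tiny GP regression sketch makes the "prediction plus confidence" point concrete. This is toy code of my own (not a package API), using the standard squared-exponential covariance and the textbook noise-free posterior formulas:

```julia
using LinearAlgebra

k(x, y; ℓ = 1.0) = exp(-(x - y)^2 / (2ℓ^2))   # squared-exponential kernel

xs = [0.0, 1.0, 2.0, 3.0]                     # training inputs
ys = sin.(xs)                                 # training targets

K  = [k(xi, xj) for xi in xs, xj in xs] + 1e-8I   # jitter for stability
x⋆ = 1.5                                      # query point
k⋆ = [k(x⋆, xi) for xi in xs]

μ⋆  = dot(k⋆, K \ ys)                # posterior mean: the prediction
σ²⋆ = k(x⋆, x⋆) - dot(k⋆, K \ k⋆)    # posterior variance: the uncertainty

println((μ⋆, σ²⋆))
```

The variance `σ²⋆` is exactly the quantity an error-aware search method can exploit: sample next where it is largest.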

## Plans for the next two weeks

I will initialize the package and start coding *Response surfaces*. I feel it is better to start now and solve problems along the way than to keep reading docs without actually coding myself.
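Initializing the skeleton is a one-liner with Julia's built-in package manager (the package name below is just a placeholder, not the final name):

```julia
using Pkg
Pkg.generate("MySurrogates")   # creates Project.toml and src/MySurrogates.jl
```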

Happy coding!

Ludovico