Hello everyone! It’s been a while since my last post, but I am back from my vacation and ready to keep on testing new ML modules. In this post, we will be testing two basic optimization techniques for neural networks.
If this is the first time you visit my blog: this series of articles aims to create a Machine Learning model that predicts how long an average visitor will wait in line at a Disney or Universal park before they can ride their favorite rollercoaster, and, with this information, optimize their visit by scheduling each park on the day it is less crowded.
Before jumping directly into what dropout is, we will quickly look at how to save a trained model to a file, because we obviously don’t want to retrain the model every time we want to use it.
Save a model in .h5
Saving a model to a file will not only save us time when reusing our trained model; we will also see later on that, with a model file, we can import it into online tools and improve our pipeline. I am not going to give out any spoilers, but get ready to migrate your environment to the cloud.
Luckily for us, saving a model to a file is extremely easy. Please feel free to look at my Google Colab to see, step by step, how I trained the model and which libraries I imported to create the file.
Before we begin with the process, please make sure you have installed the required package by running the following command:
pip install h5py
Once installed, we can train the model and perform all the hyperparameter tuning needed beforehand. For this exercise, I will train my MLP model 100 times with scrambled (reshuffled) data, evaluate each trained model, and keep the highest accuracy so far in a local variable. Only if a new model has higher accuracy than the previous best will the function overwrite the model file and update the variable containing the performance metric.
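The full code lives in the Colab linked at the end; below is only a minimal sketch of that loop, where build_mlp(), X, and y are placeholders (my assumptions) standing in for the real model-building function and dataset:

from sklearn.model_selection import train_test_split

best_accuracy = 0.0  # highest validation accuracy seen so far

for i in range(100):
    # reshuffle ("scramble") the data on every run
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    model = build_mlp()                      # placeholder: returns a compiled Keras MLP
    model.fit(X_train, y_train, epochs=50, verbose=0)

    _, accuracy = model.evaluate(X_val, y_val, verbose=0)
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        model.save("MLP_HP.h5")              # overwrite the file only when the model improves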
Pretty simple so far. As you can see, the magic element here is the .save() call, which defines the name of the file and also provides a location for it; it is really no different from writing out a .csv file.
Now, the moment of truth: let’s read that file and use it to make predictions.
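The exact cell is in the Colab; here is a minimal sketch of it (the four normalized feature values are made up for illustration):

import numpy as np
from tensorflow.keras.models import load_model

model = load_model("MLP_HP.h5")                # read the saved model from disk

# a synthetic input, with values already normalized like the training data
sample = np.array([[0.35, 0.80, 0.10, 0.55]])

print(model.predict(sample))                   # ask the network for a prediction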
The code is simple, but let me explain a couple of details that might be relevant for this exercise. First, I read the model named MLP_HP; then, I create a synthetic input for my model, taking into consideration that my values are normalized. Finally, I call the .predict() function and obtain a prediction from my neural network.
As easy as winning a match against the Ottawa Senators!
Dropout Regularization
This concept is extremely easy to understand. As its name suggests, this technique drops some neurons in every iteration of the training. To make this a little more visual, let’s use an image and a step-by-step walkthrough.
Efficient Chaotic Imperialist Competitive Algorithm with Dropout Strategy for Global Optimization — Scientific Figure on ResearchGate. Available from:
https://www.researchgate.net/figure/Dropout-Strategy-a-A-standard-neural-network-b-Applying-dropout-to-the-neural_fig3_340700034 [accessed 3 Aug, 2022]
Let’s imagine our neural network is this simple; it has only 4 layers:
One input layer
Two hidden layers with 5 neurons each
One output layer with one neuron
First, we need to define the value of the dropout rate. This is a value between 0 and 1, and it defines the probability that a single neuron is dropped in a given loop. So, for example, if we set this value to 0.5, each neuron “enters a draw”: it has a 50% chance of surviving and being used in the network, but also a 50% chance of being dropped. If we raise this parameter to, say, 0.9, each neuron has a 90% chance of being dropped, which will drastically reduce the number of neurons used in every loop. This may or may not be beneficial; it all depends on our architecture.
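If you want to see the rate in action, here is a tiny sanity check (not part of my pipeline) that applies Keras’ Dropout layer directly to a vector of ones:

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)   # each value has a 50% chance of being dropped
x = np.ones((1, 10), dtype="float32")

print(layer(x, training=True))    # roughly half the values become 0 (survivors are rescaled)
print(layer(x, training=False))   # at prediction time nothing is dropped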
After updating the weights of the neurons for the first time using backpropagation, we start using the dropout from the second loop onwards. In this second iteration, all our neurons are evaluated and some of them are simply excluded from the analysis: they do not take part in the backpropagation update of the weights. It is totally random which neurons take part in the analysis and which ones are dropped, but this is exactly what helps our model improve its performance and, above all, what helps against overfitting.
In every loop, each neuron is evaluated and it is decided whether it lives or dies for that part of the process, so each training loop is done with a different architecture, derived from the general one that we defined.
For this exercise, I created a class with two different ways to apply dropout. In code, dropout is treated as if it were one more layer, so technically we can define a dropout layer for every layer we define. My two functions are opposites: the first one uses only one dropout layer, attached to the first layer; in the second one, there is one dropout layer for each layer defined previously. In the whole class, all dropout layers share the same rate parameter; a sketch of both variants is shown below.
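The actual class is in the Colab; a minimal sketch of the two variants, assuming the small architecture from the figure above (the class name MLPWithDropout and the input dimension n_features are placeholders), could look like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

class MLPWithDropout:
    def __init__(self, n_features, rate):
        self.n_features = n_features
        self.rate = rate  # the same dropout rate is shared by every dropout layer

    def single_dropout(self):
        # variant 1: only one dropout layer, attached to the first hidden layer
        return Sequential([
            Dense(5, activation="relu", input_shape=(self.n_features,)),
            Dropout(self.rate),
            Dense(5, activation="relu"),
            Dense(1),
        ])

    def dropout_after_every_layer(self):
        # variant 2: one dropout layer after each hidden layer
        return Sequential([
            Dense(5, activation="relu", input_shape=(self.n_features,)),
            Dropout(self.rate),
            Dense(5, activation="relu"),
            Dropout(self.rate),
            Dense(1),
        ])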
The next step of the analysis will be… YES! You guessed correctly: which value of dropout is the best? Will it be 0.5, the most common choice? Let’s find out.
My code is extremely easy: I simply set an array of dropout rates and evaluate my function with each value.
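As a rough sketch, reusing the MLPWithDropout class sketched above (the list of rates, the training arrays, and the epoch count are placeholders; the exact values I used are in the Colab), the loop could look like this:

dropout_rates = [0.1, 0.3, 0.5, 0.7, 0.9]

for rate in dropout_rates:
    model = MLPWithDropout(n_features=X_train.shape[1], rate=rate).dropout_after_every_layer()
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        epochs=50, verbose=0)
    print(rate, history.history["val_loss"][-1])   # compare the final validation loss per rate

Let’s take a look at some results: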
MLP with dropout rate = 0.1
You can take a complete look at all the results in my Colab file. As a summary, I observed that the higher the dropout rate, the worse the results. In plain words: “the more likely it is for a neuron to be dropped, the worse my model behaves.” This is because my neural network is relatively small, and when you shrink the effective size of the network, its performance goes down; this technique pays off better on denser networks. One thing we can appreciate is that, when using 0.1 as the dropout rate, the curves are smoother, which means there is less overfitting in my model compared to my previous post, where I used no dropout.
MLP without dropout
As you can see, the blue line, which represents the training error, is far from the orange line, which represents the validation error. When we compare this to the result of using dropout, both lines are closer together.
This is all for my post; I hope you have enjoyed it and hopefully learned something new. I was planning on covering Batch Normalization as well, but I will do it once the technique actually improves my metrics. Thank you if you made it this far. Here is the link to my Google Colab, where you can see the complete code and a small example of how to add batch normalization to it.
Google Colab notebook: https://colab.research.google.com/drive/1RHQZk_p7ASXS-v9kTFUS5vxy2mDFFEyo?usp=sharing