Part 16: Discovering Recurrent Neural Networks and Multivariate Time Series

Rodrigo Ledesma
Aug 31, 2022

It’s been a while since my last article, mainly because this was only my second time working with Recurrent Neural Networks. We will dig deeper into what they are and why they are helpful, but generally speaking, they are better suited than other algorithms to analyzing data that has continuity and dependency over time.

If this is your first time visiting my blog, the purpose of this series of articles is to create a Machine Learning model that predicts how long an average visitor will wait in line at a Disney or Universal park before they can ride their favorite rollercoaster, and to use this information to optimize their visit by scheduling their day around the times when their favorite parks are less crowded.

In this article, we will explore the characteristics of a Recurrent Neural Network (RNN), and I will show you how to use one for a less common kind of prediction. I will not use the usual examples of analyzing a sequence of words or a music data stream (not that they are bad examples, but you can find hundreds of posts on those topics). I want to focus on a different side of the process. If you have been following my last posts, you know that what I am analyzing is a time series: the data I have collected over the months describes the behavior of the Disney and Universal rides’ waiting times through 28 features. Feeding 28 different features into an RNN is tricky, and this is the main reason why I took my time to get the results I wanted from this experiment. So, ladies and gentlemen, let’s get our hands dirty.

What is a Recurrent Neural Network?

Let’s first focus on the word “recurrent”: it comes from recurrence, which means to happen again (according to the Cambridge Dictionary). So the magic behind a recurrent neural network lies in analyzing not only the present data but also the data that has already been analyzed (the past data at time t-1). Let’s get more serious:

A standard (feedforward) neural network treats each input as independent of the others. In contrast, RNNs are specifically designed to analyze how each data point depends on its predecessors. They do this with a “feedback loop”: at every step, the network combines the current data point with the result it computed for the previous data point. This is what creates the idea of memory. Some types of RNNs are trained to keep certain past states for a limited number of steps and to discard them once they become irrelevant.
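For readers who like to see the feedback loop written out, here is the textbook form of one step of a simple (Elman-style) RNN. The notation is my own, not from any specific library: x_t is the input at time t, h_t is the hidden state (the network’s “memory”), W_x, W_h and W_y are weight matrices, and b and c are biases.

```latex
\begin{aligned}
h_t &= \tanh\left(W_x x_t + W_h h_{t-1} + b\right), \qquad h_0 = \mathbf{0} \\
\hat{y}_t &= W_y h_t + c
\end{aligned}
```

The W_h h_{t-1} term is the feedback loop: it is how the result for the previous data point enters the computation for the current one.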

Let’s look at an example. As I mentioned in the introduction, you can find tons of posts that use RNNs for natural language processing. I will not do that here because my objective is different, but I will use this example to illustrate how RNNs differ from a regular neural network.

Natural Language Processing with RNNs

Let’s imagine we are trying to predict whether a sentence contains bad language or sensitive content that should not appear on a children’s web page, and we have two different sentences:

“I prayed God for his health”
“He is so damn hot oh my God”

As there might be kids reading my posts, I did not write the sentence I had in mind, but you can imagine it. Now, suppose we use the technique called “bag of words”: we assign a number to each word, ignore the order in which the words appear, and train our network to learn that whenever the word “God” appears, the sentence is safe. Then the second sentence will be misclassified. The same problem appears in the opposite direction: if the network learns that “God” signals unsafe content, the first sentence will be flagged.
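Here is a tiny plain-Python sketch of why a bag of words cannot tell these sentences apart: it only counts words, so the position of “God” relative to “prayed” or “damn” is lost. This is a simplified illustration (lowercasing plus whitespace tokenization, no real classifier); the helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count the words in a sentence; word order is thrown away."""
    return Counter(sentence.lower().split())

a = bag_of_words("I prayed God for his health")
b = bag_of_words("He is so damn hot oh my God")

# Both bags contain the feature "god", and nothing in either bag records
# which words came before it.
print(a["god"], b["god"])  # 1 1

# Shuffling the words of a sentence does not change its bag at all.
print(bag_of_words("God prayed I for his health") == a)  # True
```

An RNN reads the words one at a time, in order, so “prayed” and “damn” leave different traces in its memory by the time it reaches “God”.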

Recurrent neural networks take the previous words in the sentence into account: when they see the word “prayed” before “God”, they produce a different output than for the second sentence, where the words “damn” and “hot” come before “God”. Both sentences contain the word “God”, and yet they have completely opposite meanings.

The magic behind RNNs lies in the recurrence. In the image above, you can see how the input X goes into h and produces an output, but this output is also fed back and reused together with the next input. This is what makes RNNs so special: if we are analyzing a time series, the output computed at step i becomes an input at step i+1. This is what gives the network its memory.
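To make the idea of memory concrete, here is a minimal NumPy sketch of a simple (Elman) RNN forward pass, using the standard update h_t = tanh(W_x x_t + W_h h_{t-1} + b). The weights are hand-picked toy values (one feature, one hidden unit), not anything trained; the point is only to show that an input at the first step still affects the state several steps later.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a simple (Elman) RNN over a sequence.

    xs has shape (timesteps, n_features). Returns the hidden state after
    every step, shape (timesteps, n_hidden). Each step computes
    h_t = tanh(W_x @ x_t + W_h @ h_{t-1} + b).
    """
    h = np.zeros(W_h.shape[0])                 # h_0: empty memory
    states = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # the previous state is fed back in
        states.append(h)
    return np.stack(states)

# Toy weights: 1 input feature, 1 hidden unit.
W_x = np.array([[0.5]])
W_h = np.array([[0.9]])
b = np.zeros(1)

quiet = np.zeros((5, 1))      # five steps of zeros
spike = quiet.copy()
spike[0, 0] = 1.0             # only the FIRST input differs

h_a = rnn_forward(quiet, W_x, W_h, b)
h_b = rnn_forward(spike, W_x, W_h, b)

# h_a stays at exactly 0; h_b's last state is still about 0.26, even though
# the last four inputs are identical: step 0 was carried forward.
print(h_a[-1], h_b[-1])
```

Notice that h_b shrinks a little at every step. This fading memory is exactly what more elaborate cells such as the LSTM are designed to control.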

When we go deeper into the different types of RNNs, we will study their specific architectures. For now, I want to let you know that I have been looking forward to this point since I started working with my data, because my hypothesis is that the past state of the data directly influences the future outcome. More specifically, if 5 minutes ago it was sunny and the wait to ride Splash Mountain was 30 minutes, and it is still sunny now, there should not be a big change. On the other hand, if the park just opened, the wait grew from 10 to 50 minutes in 15 minutes, and it is a hot day in Florida, then the wait will quite possibly keep increasing, following its recent behavior.

Analyzing the properties of a Time-Series

If we look up the term “time series” in the dictionary (I know, nobody does that anymore, we use Google LOL), we will find that it is a series of data points that occur in successive order over a period of time. For example, you will find lots of articles about time series analysis for predicting stock prices or the temperature. In our case, we will use the time series to predict the waiting time in line for a rollercoaster. There are some important concepts we must analyze to understand the patterns and behavior of our time series:

Seasonality: This concept refers to cyclic variations that occur at regular time intervals. For example, in amusement parks, the time in line peaks in the mornings and on holidays. For this analysis, we can simply plot the data and inspect it visually.
Trend: This refers to the long-term direction of the data. The concept is widely known from stock market predictions: if a stock is currently going up, certain conditions must be met for it to stop, so its general trend is to keep going up.
Cyclicity: This term refers to the up-and-down patterns the data presents which, unlike seasonality, do not repeat over a fixed period.
Stationarity: Our data is said to be stationary if its mean, standard deviation, and autocorrelation all remain constant over time. Luckily, we do not need to perform these calculations manually: for multivariate data, we can use the Johansen cointegration test, which returns, among other statistics, the eigenvalues in the form of an array. The article I followed treats the series as stationary if the eigenvalues are less than one; note, however, that Johansen eigenvalues always lie between 0 and 1, so a more informative reading compares the trace statistics the test also returns against its critical values. (https://towardsdatascience.com/time-series-analysis-on-multivariate-data-in-tensorflow-2f0591088502)
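Before running any formal test, a quick sanity check for the “constant mean and standard deviation” part of the definition is to compare the two halves of a series. The sketch below is a crude heuristic of my own, not the Johansen test and not a formal stationarity test (it says nothing about autocorrelation, for example), and the synthetic series are made-up stand-ins for wait times.

```python
import numpy as np

def half_stats(series):
    """Mean and standard deviation of the first and second half of a 1-D series."""
    first, second = np.array_split(np.asarray(series, dtype=float), 2)
    return (first.mean(), first.std()), (second.mean(), second.std())

def drifts(series, tol=0.1):
    """Crude check: does the mean or the spread change noticeably between halves?

    Differences are measured relative to the overall standard deviation.
    A quick heuristic only, not a formal stationarity test.
    """
    (m1, s1), (m2, s2) = half_stats(series)
    scale = max(float(np.std(series)), 1e-12)
    return abs(m1 - m2) / scale > tol or abs(s1 - s2) / scale > tol

t = np.arange(240)                          # 240 time steps
seasonal = np.sin(2 * np.pi * t / 24)       # a cycle repeating every 24 steps, no trend
trending = seasonal + 0.05 * t              # the same cycle plus an upward trend

print(drifts(seasonal), drifts(trending))   # False True
```

The seasonal series is not flagged because each half contains whole cycles that average out, while the added trend shifts the mean of the second half by about 6 units and is caught immediately.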

Preparing our data for a multivariate time-series regression with an RNN

Let’s get our hands dirty. In our last post, we used MLPs to perform a classification task and saved our models. In this post, we cannot reuse the previous dataframe, because now the target variable must keep its actual numeric values instead of being grouped into classes. So let’s clean and transform our data step by step:

Let’s break down the code quickly. In lines 1 to 11, we download the data and then delete all the attractions except the one we are interested in (Harry Potter and the Forbidden Journey). Then I rename the variables so that each one has a unique number.

I am not going to show this, but the wait-time column contains a mix of numbers and strings: when there is a mechanical problem or the park is closed, instead of an integer it displays the string “Closed”. As you know, ML models cannot work with raw strings, so we need to either drop them or cast them to numbers. Since I do not intend to teach my model to predict when the ride will be closed, I drop these rows, which is what lines 13 to 17 do.
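The original code is not shown here, so this is only a minimal pandas sketch of the same idea, assuming a hypothetical column named `wait_time`. Rather than matching the string “Closed” explicitly, it coerces the column to numbers and drops whatever fails to convert.

```python
import pandas as pd

def drop_closed(df, column="wait_time"):
    """Convert wait times to numbers and drop rows where the ride was "Closed".

    pd.to_numeric(errors="coerce") turns any non-numeric entry (such as
    "Closed") into NaN, and dropna then removes those rows.
    """
    out = df.copy()
    out[column] = pd.to_numeric(out[column], errors="coerce")
    return out.dropna(subset=[column]).reset_index(drop=True)

raw = pd.DataFrame({"wait_time": [30, "Closed", "45", 10]})
print(drop_closed(raw)["wait_time"].tolist())  # [30.0, 45.0, 10.0]
```

One design note: coercion also drops any other unexpected string in the column, which is usually what you want, but it is worth counting the dropped rows once to make sure nothing surprising disappeared.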

Next, we need to handle the date variable. As you can see, it is a string separated by “/”, and for our analysis it is better to have each part as an independent feature, which is what the last lines of the previous code do.
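A minimal pandas sketch of that split, assuming a hypothetical column named `date` in US month/day/year order (the post does not say which order the dataset uses, so check yours before reusing this):

```python
import pandas as pd

def split_date(df, column="date"):
    """Split a 'month/day/year' string column into three integer columns."""
    parts = df[column].str.split("/", expand=True).astype(int)
    out = df.drop(columns=[column])
    out["month"] = parts[0]
    out["day"] = parts[1]
    out["year"] = parts[2]
    return out

df = pd.DataFrame({"date": ["7/4/2022", "12/25/2021"], "wait_time": [45, 60]})
print(split_date(df)[["month", "day", "year"]].values.tolist())
# [[7, 4, 2022], [12, 25, 2021]]
```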

One Hot Encoding

We have covered this step in other posts, but in this case, the weather report variable contains a set of strings that indicate the weather condition. As this is a categorical variable, we analyzed all the possible encoding methods and concluded that one-hot encoding was the best option, so here we create the encoding and concatenate it with the general dataframe.
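As a sketch of what the encoding-plus-concatenation step can look like with pandas, assuming a hypothetical `weather` column (the real column name and categories in the dataset may differ):

```python
import pandas as pd

def one_hot(df, column="weather"):
    """Replace a categorical column with one 0/1 indicator column per category."""
    dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
    return pd.concat([df.drop(columns=[column]), dummies], axis=1)

df = pd.DataFrame({"weather": ["Sunny", "Rain", "Sunny"], "wait_time": [30, 20, 45]})
print(one_hot(df).columns.tolist())  # ['wait_time', 'weather_Rain', 'weather_Sunny']
```

Encoding the whole dataframe before the train/test split, as the post does, guarantees that both splits end up with the same indicator columns even if some weather condition only appears in one of them.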

Splitting the dataframe into train and test sets

This is very important. In other projects and exercises, we might be able to shuffle all the data and split it randomly. But in this case we cannot. Why?

A time series is a set of data points that occur in successive order. If we introduce any kind of randomness, the order is compromised and the model cannot learn the dependencies between consecutive points. So, how are we going to split the data? This will not take much time: let’s concatenate the year and the month of our data to create a feature that does not repeat across years and that we can reliably split on.

Lines 1 and 2 of the code above are the most relevant, as they perform the split. For this exercise, I decided to use two months for testing and the rest of the data for training, which gives roughly an 80–20 ratio.
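Since the original split code is not shown, here is a minimal pandas sketch of a chronological split on a year-month key, assuming hypothetical `year` and `month` columns. Instead of concatenating strings, it builds the key as year * 100 + month (for example 202207), which sorts in time order even across a year boundary.

```python
import pandas as pd

def split_by_last_months(df, n_test_months=2):
    """Chronological split: the last n_test_months year-month periods form the test set."""
    year_month = df["year"] * 100 + df["month"]     # e.g. 2021-12 -> 202112
    test_keys = sorted(year_month.unique())[-n_test_months:]
    is_test = year_month.isin(test_keys)
    return df[~is_test], df[is_test]

df = pd.DataFrame({"year": [2021, 2022, 2022, 2022, 2022],
                   "month": [12, 1, 2, 3, 3],
                   "wait_time": [40, 35, 50, 45, 60]})
train, test = split_by_last_months(df, n_test_months=2)
print(len(train), len(test))  # 2 3
```

Every training row comes strictly before every test row, which is the property a random split would break.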

Afterward, I drop the variables I no longer need, such as the year-month feature, and normalize the X values of the dataset.
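The post does not say which normalization was used, so the following is only an assumption: a min-max scaling sketch in NumPy whose statistics are learned from the training rows alone and then reused on the test rows, so that nothing from the test months leaks into training. scikit-learn’s `MinMaxScaler` implements the same idea.

```python
import numpy as np

def fit_minmax(X_train):
    """Learn per-column minimum and range from the TRAINING rows only."""
    X_train = np.asarray(X_train, dtype=float)
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0          # constant columns: avoid division by zero
    return lo, span

def apply_minmax(X, lo, span):
    """Scale any split with the training statistics."""
    return (np.asarray(X, dtype=float) - lo) / span

X_train = np.array([[0, 10], [5, 20], [10, 30]])
X_test = np.array([[20, 10]])
lo, span = fit_minmax(X_train)
print(apply_minmax(X_train, lo, span).tolist())  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(apply_minmax(X_test, lo, span).tolist())   # [[2.0, 0.0]]
```

Test values can fall outside [0, 1]; that is expected and simply means the test months contain values the training months never reached.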

Transforming the dimensions of our data to fit an RNN

This is one of the most crucial parts of the pipeline. Since we are dealing with a time series, we cannot feed the data to the network the way we did with all the other models. An RNN requires a 3-dimensional array as input. If we do not reshape our data, the function will raise an error, and those errors are a little difficult to read and understand.

Our network requires the data to have the following shape, where the middle dimension is the number of time steps in each sample (here, a single step):

[number of samples, 1, number of features]

So in order to reshape our array and prepare it for ingestion, we will be using the following code:
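A minimal NumPy sketch of that reshape, assuming the features are already in a 2-D array (or a DataFrame, which `np.asarray` converts) of shape (samples, features). The function name is hypothetical.

```python
import numpy as np

def to_rnn_input(X):
    """Reshape a 2-D (samples, features) array into (samples, 1, features).

    The middle axis is the number of time steps per sample; here each
    sample is a single step.
    """
    X = np.asarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {X.shape}")
    return X.reshape((X.shape[0], 1, X.shape[1]))

X_train = np.arange(12).reshape(4, 3)     # toy data: 4 samples, 3 features
print(to_rnn_input(X_train).shape)        # (4, 1, 3)
```

Only X needs the extra axis; the regression target stays a 1-D array with one value per sample.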

Creating the architecture of our RNN

As this is the first RNN we are modeling, I am not going to do any parameter tuning or try out different layer configurations. I used the example from this post: https://towardsdatascience.com/time-series-analysis-on-multivariate-data-in-tensorflow-2f0591088502

It uses an LSTM layer (a type of RNN we will analyze later on) followed by a dropout layer. Here is how we will create the model:

As you can see, we have a sequential model with an LSTM of 250 units as the first layer, then the dropout layer and, since we are doing regression, a final dense layer with a single output neuron. When we fit the model, we train it for only 50 epochs with a batch size of 32, and, importantly, we set the shuffle parameter to False to preserve the order of the time series. After training, we plot the loss curves to see how well the model predicts new data.
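A quick way to sanity-check this architecture is to count its trainable weights. The sketch below uses the standard formula for a Keras LSTM layer with default settings: four gates, each with an input kernel, a recurrent kernel, and a bias. The helper names are hypothetical (not Keras APIs), and `n_features = 28` is an assumption taken from the 28 features mentioned earlier, since the exact count after one-hot encoding is not shown.

```python
def lstm_param_count(units, n_features):
    """Trainable weights of an LSTM layer with biases.

    Each of the 4 gates (input, forget, cell candidate, output) has an input
    kernel (n_features x units), a recurrent kernel (units x units) and a
    bias vector (units).
    """
    return 4 * (units * n_features + units * units + units)

def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

n_features = 28   # assumption: the real count after one-hot encoding may differ
total = lstm_param_count(250, n_features) + dense_param_count(250, 1)  # dropout adds no weights
print(total)  # 279251
```

If the real feature count differs, only the input-kernel term changes. In Keras, `model.summary()` reports these counts directly, so it is an easy way to confirm the model was built as intended.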

Let me clarify that this does not mean our RNN has reached good predictive accuracy; I am only demonstrating that it works. Finally, I will plot the predicted time series against the actual values and compare the shape of both.

As I have stated before, the goal of this post is not to show off the great and magical job that RNNs do with data, but to give an introduction and show how to correctly configure and run the algorithm. In the last image, we can see that the model has a very hard time predicting waits above 70 minutes, but it learned the general pattern.
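To put a number on the “hard time above 70 minutes” observation instead of only eyeballing the plot, one option is to compute the mean absolute error separately for long and short waits. This is a sketch of my own, not part of the original pipeline; the toy arrays below are made up and are not results from this model.

```python
import numpy as np

def mae_above_below(y_true, y_pred, threshold=70.0):
    """Mean absolute error for waits at/below and above `threshold` minutes."""
    y_true = np.asarray(y_true, dtype=float).ravel()   # ravel handles (n, 1) model outputs
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    abs_err = np.abs(y_true - y_pred)
    high = y_true > threshold
    below = abs_err[~high].mean() if (~high).any() else float("nan")
    above = abs_err[high].mean() if high.any() else float("nan")
    return float(below), float(above)

# Toy numbers, not results from the post:
actual = [30, 40, 80, 90]
predicted = [32, 38, 60, 65]
print(mae_above_below(actual, predicted))  # (2.0, 22.5)
```

Tracking these two numbers across future experiments makes it easy to see whether parameter tuning actually helps with the long waits.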

This is everything for today! Thank you so much for reading. Next time, we will dig deeper into the architecture and tune some parameters to improve the results.