
Part 2: Using APIs to get weather conditions and a scheduler

  • Writer: Rodrigo Ledesma
  • May 15, 2022
  • 5 min read

Updated: May 18, 2022


In our last article, we talked about how to obtain the Disney and Universal waiting times using a technique called web scraping. If we use the code given in the last section and collect the waiting times list, we will get something similar to this file:



One problem pops out immediately: those are only results, and there are no variables to analyze. That is what we will be doing today. So let’s brainstorm and think about what information can influence how long a visitor waits in line for a ride or a show.

  • Rain

  • Holidays

  • Cold

  • Extreme heat

  • Hurricanes (it is Florida)

  • Time of the day

  • Covid-19 restrictions

Most of them are easy to get; for example, if we want to know the temperature and the weather conditions, we can visit a weather page such as The Weather Channel’s site. But for our project it is more important to automate the consumption of this information and repeat the query periodically. In order to do so, let’s consume the information from an API.


There are some excellent articles that explain in technical detail what an API is, how it works, and when we should use one. I will give you a brief explanation and a simple way to understand it, so if you already know what an API is, please feel free to skip the next section and head directly to II-Data consumption using APIs.


I-What is an API?


Imagine that you are at a restaurant and you want to order some food. If you talk directly to the chefs or cooks, they might have trouble understanding what you want, because their jargon is different from the everyday language you speak. So how are we supposed to order food, then? With the help of a waiter: waiters have been specially trained to understand the vocabulary and the processes the cooks use, but they also understand the language guests use, so they can act as a bridge of communication between both worlds.


Now let’s get more technical. Imagine that you have a backend application that stores data in a cloud-based environment, but you don’t have time to learn its programming language because you have 20 other projects apart from that one. That’s where APIs come in handy: they read what is in the backend and deliver the result of a query as a JSON file, which can be easily manipulated and understood. So you don’t have to worry about what the hell is going on behind the stage; you only ask the API for information and wait until your beautiful JSON package arrives with your information.


An API is considered a software intermediary and it allows two technologies to interact or communicate.
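
To make this concrete, here is a trimmed-down sketch of the kind of JSON an API like the one we will use below can return. The values are made up for illustration, and real responses contain many more fields:

data = {
    "name": "Orlando",
    "main": {"temp": 30.2, "humidity": 70, "pressure": 1013},
    "weather": [{"main": "Clouds", "description": "scattered clouds"}],
}
# once the JSON is in your hands, reading a value is just a dictionary lookup
print(data["main"]["temp"])   # 30.2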




II-Data consumption using APIs

Luckily for us, Python has a couple of user-friendly libraries specially designed to interact with external APIs. In this case, I used ‘requests’. But using the correct library is only a tiny step in the process, because we also need a reliable source of information from which to extract our variables. For that, I decided to use openweathermap.org. This web page offers services for consuming large amounts of data, and if you make fewer than 10,000 queries to the API per day, the service is free (this was one of the main reasons why I chose it).


Now you will need to create an account on the platform and explain why you want to use their service (research, commercial, personal…). Once the account is created, you will have to generate an API key that will give you access to the data.
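
A small habit worth adopting (not required for this tutorial): keep the key out of your script and read it from an environment variable instead. The variable name OPENWEATHER_API_KEY below is just an example I made up; use whatever name you export in your shell.

import os

# read the key from an environment variable instead of hard-coding it
# (OPENWEATHER_API_KEY is a hypothetical name; export it before running the script)
API_KEY = os.environ.get("OPENWEATHER_API_KEY")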


Moving on to Python, let’s take a look at how we make the initial configuration:


# importing the requests library
import requests

CITY = "Orlando"
API_KEY = "fa79...c11"  # your personal OpenWeatherMap key (truncated here)
# building the URL for the current-weather endpoint
URL = f"http://api.openweathermap.org/data/2.5/weather?q={CITY},US&appid={API_KEY}&units=metric"

# HTTP request
response = requests.get(URL)
if response.status_code == 200:
    # getting the data in JSON format
    data = response.json()
    # getting the main dict block
    main = data['main']
    # getting the temperature
    temperature = main['temp']
    # getting the humidity
    humidity = main['humidity']
    # getting the pressure
    pressure = main['pressure']
    # weather report (the condition description)
    report = data['weather']
else:
    # showing the error message
    print("Error in the HTTP request")

The code is quite simple, but let’s see what it does step by step. First, it gathers information only for the city of Orlando, Florida, US. If you want to use a different city, please read the information given on the web page: they cover almost every country, but they might not have the specific city you are looking for. Afterwards, we paste the key into the URL variable, and we create a response object using the requests library to tell our code to fetch the information from the given URL using our credentials.


The if statement analyzes the primary response of the API call, which is the status code. This tells us whether there was a problem with the HTTP request. As I am not an expert in the field, I will not go deeper into it, but the important part is that if the code is anything other than 200, we have a communication problem and we need to let the user know; that’s why we print an error message.
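
If you want to see why a call failed instead of a generic message, a small variation (reusing the same response object from the snippet above) is to print the code and reason the server sent back:

if response.status_code != 200:
    # e.g. "401 Unauthorized" when the API key is wrong or missing
    print("Error in the HTTP request:", response.status_code, response.reason)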


As the last step, we read the JSON object returned by the API and keep the variables that are relevant to us; in this case, I will store temperature, humidity, pressure, and report. The first three are pretty self-explanatory, but the last one (report) is actually a categorical variable, as it contains information in string format about the weather, for example “cloudy” or “clear sky”.
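
Keep in mind that data['weather'] comes back as a list of dictionaries, not a plain string. Here is a small sketch of how the categorical value can be pulled out; the field names follow OpenWeatherMap’s current-weather response, but double-check them against your own output:

# 'report' looks like [{"main": "Clouds", "description": "scattered clouds", ...}]
report = data['weather']
# keep only the human-readable condition string
condition = report[0]['description'] if report else None
print(condition)   # e.g. "scattered clouds"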


To manage this information, I used Pandas and appended everything to a DataFrame, just to debug the process; this is how the final document looks. Please note that I added some other variables that are actually manual, such as “feriado” (which means holiday) and the time and date variables.
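
For anyone who wants to reproduce that file, here is a minimal sketch of how one row could be assembled and appended to a .csv. The column names (including the manual “feriado” flag) are just my guess at a reasonable layout, and the snippet assumes the variables from the code block above plus the condition string extracted earlier:

import os
from datetime import datetime
import pandas as pd

# hypothetical column layout; 'feriado' (holiday) is filled in by hand
row = pd.DataFrame([{
    "date": datetime.now().strftime("%Y-%m-%d"),
    "time": datetime.now().strftime("%H:%M"),
    "temperature": temperature,
    "humidity": humidity,
    "pressure": pressure,
    "report": condition,
    "feriado": 0,
}])

# append the row to a running .csv, writing the header only the first time
csv_path = "weather_log.csv"
row.to_csv(csv_path, mode="a", header=not os.path.exists(csv_path), index=False)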



One helpful tip I will leave here is how to obtain information constantly without having to run any scripts by hand. What do I mean? For this project I decided to obtain information every 5 minutes. This means that every five minutes my script will do the scraping, create a DataFrame with all the information, concatenate the results, and store them in a .csv file. There are different ways to program this automation in Python, even using tools from the Windows environment. I decided to go with a Python library that schedules a job (function) to run every x amount of time. The library’s name is “schedule”. Pretty simple, isn’t it? Let’s look at an example of how we can use it in a .py file:


import time
import schedule

# this function will contain whatever needs to be periodically repeated
def job():
    print("hello")

schedule.every(10).minutes.do(job)  # our function will run every 10 mins

# this while loop is an infinite loop, in charge of running our job indefinitely every ten minutes
while True:
    schedule.run_pending()
    time.sleep(1)

Now, please be aware that the 10 minutes will not always be precise, because depending on the power of your machine or virtual environment, the function will take different amounts of time to run. So please don’t expect your code to run exactly every 10 or 5 minutes.
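
To close the loop, here is a rough sketch of how the scheduler and the weather call from this post can be combined into one collection job. In the real project the body of collect() also does the scraping and appends the combined row to the .csv; here it only queries the API, and the truncated key is the same placeholder as above:

import time
import requests
import schedule

URL = "http://api.openweathermap.org/data/2.5/weather?q=Orlando,US&appid=fa79...c11&units=metric"

def collect():
    # one collection cycle: in the full project this also scrapes the parks
    # and appends the combined row to the .csv
    response = requests.get(URL)
    if response.status_code == 200:
        print(response.json()["main"]["temp"])
    else:
        print("Error in the HTTP request")

schedule.every(5).minutes.do(collect)  # the cadence used in this project

while True:
    schedule.run_pending()
    time.sleep(1)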


So far we have learned how to perform web scraping, how to use an API to obtain information, and how to schedule a function to run periodically and autonomously. In the next post, we will start with analysis more related to ML and DS, such as data cleansing and variable correlation.

 
 
 
