IBM Watson AutoAI has recently introduced a new beta feature: time series support. It is as easy as a walk in the park: all you need to do is drag and drop your time series data, then sit back and relax while the best model is prepared for you.
In this story I will show how easily the IBM AutoAI Python API can be applied to COVID-19 data to predict confirmed cases for the next few days.
To work with AutoAI for time series, you need a Watson Machine Learning service instance (included with the free plan). Watson Machine Learning provides a Python interface via the ibm-watson-machine-learning package (available on PyPI). You can install the package by running the following pip command:
pip install ibm-watson-machine-learning
Next, you need to provide authentication information to initialise the Python client.
from ibm_watson_machine_learning import APIClient
client = APIClient(credentials)
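The credentials object passed to APIClient is a plain dictionary. Below is a minimal sketch of building it, assuming the service URL and API key are kept in environment variables. The variable names WML_URL and WML_APIKEY and the helper load_wml_credentials are hypothetical and chosen only for this example; "url" and "apikey" are the keys the client reads for IBM Cloud.

```python
import os


def load_wml_credentials(env=None):
    """Return the credentials dictionary for APIClient.

    WML_URL and WML_APIKEY are hypothetical variable names used in this sketch.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("WML_URL", "WML_APIKEY") if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {"url": env["WML_URL"], "apikey": env["WML_APIKEY"]}


# Usage, once both variables are set:
# credentials = load_wml_credentials()
# client = APIClient(credentials)
```

Keeping the API key out of the notebook makes it safer to share the code.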
The time series data
The prepared data set contains the date and daily_cases columns. The daily_cases column contains the number of confirmed COVID-19 cases in Poland on a particular day. The data set tracks confirmed cases from January 22, 2020 to March 28, 2021. Toward the end of that period, we started to observe a dramatic increase in daily cases here in Poland (around 30k per day).
Here is a visualisation of this data set, prepared using the plotly package.
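A chart like this can be produced with a few lines of code. The sketch below assumes the data set is a CSV file with a header row, ISO-formatted dates (YYYY-MM-DD), and integer counts; the helper names are hypothetical, and the three sample rows are placeholders that only show the expected layout, not real case counts.

```python
import csv
import io
from datetime import date


def load_daily_cases(csv_file):
    """Read the date and daily_cases columns from an open CSV file."""
    dates, cases = [], []
    for row in csv.DictReader(csv_file):
        dates.append(date.fromisoformat(row["date"]))
        cases.append(int(row["daily_cases"]))
    return dates, cases


def plot_daily_cases(dates, cases):
    """Build a line chart of daily cases (plotly is imported only when plotting)."""
    import plotly.graph_objects as go

    fig = go.Figure(go.Scatter(x=dates, y=cases, mode="lines", name="daily_cases"))
    fig.update_layout(xaxis_title="date", yaxis_title="daily_cases")
    return fig


# Placeholder rows (not real data), only to show the expected file layout.
sample = io.StringIO("date,daily_cases\n2021-03-26,10\n2021-03-27,20\n2021-03-28,30\n")
dates, cases = load_daily_cases(sample)
# plot_daily_cases(dates, cases).show()
```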
Now we use the Python client to upload the data to Cloud Object Storage, to make it available for AutoAI.
AutoAI for time series
Using the Python API we can easily define the AutoAI experiment for time series data. We need to define the following parameters for our experiment’s optimizer:
name - experiment name
prediction_type - problem type
prediction_columns - indices of target columns
timestamp_column_name - index of the date and time column
forecast_window - number of days to be predicted
holdout_size - number of holdout records
scoring - optimization metric
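The parameters above can be sketched as a call to the optimizer factory of the package's AutoAI class. This is a hedged sketch, not the original experiment definition: the experiment name and holdout size are assumptions, the columns are referenced by name here (switch to integer indices if your package version expects them), scoring is left at its default, and space_id stands for your deployment space ID.

```python
# The 7-day forecast window comes from the text.
# Everything marked "assumed" is illustrative only.
OPTIMIZER_PARAMS = {
    "name": "COVID-19 daily cases forecast",  # assumed experiment name
    "prediction_columns": ["daily_cases"],    # assumed: column given by name
    "timestamp_column_name": "date",          # assumed: column given by name
    "forecast_window": 7,                     # predict the next 7 days
    "holdout_size": 14,                       # assumed number of holdout records
}


def build_optimizer(credentials, space_id, params=OPTIMIZER_PARAMS):
    """Create the AutoAI optimizer for a forecasting experiment."""
    # Imported lazily so the parameters can be inspected without the package.
    from ibm_watson_machine_learning.experiment import AutoAI

    experiment = AutoAI(credentials, space_id=space_id)
    return experiment.optimizer(
        prediction_type=AutoAI.PredictionType.FORECASTING,
        **params,
    )
```

Keeping the parameters in one dictionary makes it easy to log them alongside the results or to rerun the experiment with a different forecast window.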
Now, call the fit() method to start the training job.
As soon as training is completed, we can list all models found for us by AutoAI.
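Training and listing the results can be sketched as follows. The helper name train_and_summarize is hypothetical; optimizer is assumed to be the AutoAI optimizer defined earlier, and training_data_reference the data connection that points at the uploaded file.

```python
def train_and_summarize(optimizer, training_data_reference):
    """Run the experiment and return (run_details, summary)."""
    run_details = optimizer.fit(
        training_data_reference=[training_data_reference],
        background_mode=False,  # block until the training job finishes
    )
    summary = optimizer.summary()  # one entry per pipeline found by AutoAI
    return run_details, summary
```

With background_mode=False the call returns only after the job has ended, so the summary can be requested right away.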
We can also retrieve the details of each pipeline through the same Python API.
The details of each pipeline include data for visualization. Below is a simple comparison of observed vs. predicted values on the holdout data set.
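Besides a chart, the same comparison can be summarised in a single number. The sketch below computes the symmetric mean absolute percentage error (sMAPE) between observed and predicted holdout values. It is a generic illustration, not necessarily the metric used in this experiment, and the function name smape is my own.

```python
def smape(observed, predicted):
    """Symmetric mean absolute percentage error, in percent (0 is a perfect fit)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, forecast in zip(observed, predicted):
        denominator = (abs(actual) + abs(forecast)) / 2
        # When both values are zero the error for that day is zero as well.
        if denominator:
            total += abs(forecast - actual) / denominator
    return 100.0 * total / len(observed)
```

A lower value means the predicted curve follows the observed one more closely, which makes it easy to rank pipelines on the same holdout set.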
Pipeline_6 is the best model returned by AutoAI, so we will use it for deployment and scoring.
Deployment and scoring
In this section we will deploy the best pipeline as a web service. Then we will use this web service’s scoring endpoint to obtain predictions for the next 7 days.
Now that our deployment has been created, we can request predictions from its scoring endpoint.
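One way to call the scoring endpoint is through the generic deployments interface of the Python client. The sketch below wraps rows in the "input_data" envelope used by Watson Machine Learning online scoring. Which fields the time series deployment expects, and how many recent observations it needs, are assumptions to verify against your own deployment; both helper names are hypothetical.

```python
def build_scoring_payload(fields, rows):
    """Wrap rows of values in the online scoring envelope."""
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }


def request_forecast(client, deployment_id, payload):
    """Send the payload to the deployment's scoring endpoint."""
    return client.deployments.score(deployment_id, payload)
```

Here client is the APIClient created at the beginning, and deployment_id identifies the web service created from the best pipeline.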
We receive a list of predicted confirmed cases in Poland for the following week. Let’s visualize that.
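Before plotting, the predicted values need dates. The sketch below assumes the predictions are a flat list with one value per day, starting the day after the last observation (March 28, 2021). The helper names are hypothetical.

```python
from datetime import date, timedelta


def forecast_dates(last_observed, horizon):
    """Return the calendar days covered by a forecast of `horizon` days."""
    return [last_observed + timedelta(days=offset) for offset in range(1, horizon + 1)]


def peak_day(dates, predictions):
    """Return (date, value) of the highest predicted daily count."""
    index = max(range(len(predictions)), key=predictions.__getitem__)
    return dates[index], predictions[index]


# The 7-day window follows the last day in the data set.
future_dates = forecast_dates(date(2021, 3, 28), 7)
```

The dates can then be used as the x-axis of the chart, and peak_day(future_dates, predictions) points at the day with the highest forecast.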
Based on the prediction we can expect a significant peak in 3–4 days.