Predicting COVID19 cases with AutoAI time series API

Lukasz Cmielowski, PhD
Apr 23, 2021

IBM Watson AutoAI has recently introduced a new beta feature: time series support. It is as easy as a walk in the park: all you need to do is drag and drop your time series data, then sit back and relax while the best model is prepared for you.

In this story I will show how easily the IBM AutoAI Python API can be applied to COVID19 data to predict confirmed cases for the next few days.

Setup

To work with AutoAI for time series you need a Watson Machine Learning service instance (included with the free plan). Watson Machine Learning provides a Python interface via the ibm-watson-machine-learning package (available on PyPI). You can install the package by running the following pip command:

pip install ibm-watson-machine-learning

Next, you need to provide authentication information to initialise the Python client.

from ibm_watson_machine_learning import APIClient
client = APIClient(credentials)
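The credentials object is not defined in the snippet above. Here is a minimal sketch of how it might be assembled, assuming an IBM Cloud API key and a regional endpoint URL are stored in environment variables. The variable names WML_APIKEY and WML_URL, the default us-south endpoint and the helper names are my assumptions for illustration, not part of the original notebook.

```python
import os


def build_credentials(env=os.environ):
    """Assemble the credentials dict passed to APIClient.

    WML_APIKEY and WML_URL are hypothetical environment variable names.
    """
    apikey = env.get("WML_APIKEY")
    if not apikey:
        raise ValueError("Set WML_APIKEY to your IBM Cloud API key")
    return {
        # Regional endpoint; us-south is used here only as an example default.
        "url": env.get("WML_URL", "https://us-south.ml.cloud.ibm.com"),
        "apikey": apikey,
    }


def make_client(credentials):
    # Imported lazily so build_credentials() works without the package installed.
    from ibm_watson_machine_learning import APIClient
    return APIClient(credentials)
```

Keeping the key in the environment rather than in the notebook avoids committing it together with the code.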

The time series data

The prepared data set contains the date and daily_cases columns. The daily_cases column contains the number of confirmed COVID19 cases in Poland on a particular day. The data set tracks confirmed cases from January 22, 2020 through March 28, 2021. Toward the end of that period we started to observe a dramatic increase in daily cases here in Poland (around 30k per day).

Here is a visualisation of this data set, prepared using the plotly package.
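Before training it can be worth checking that the series has one row per day. The sketch below assumes a CSV with a header row, ISO formatted dates (YYYY-MM-DD) and one record per calendar day. Neither the file format nor the helper name find_missing_days comes from the original notebook, and the sample rows are placeholders, not real figures.

```python
import csv
import io
from datetime import date, timedelta

FIRST_DAY = date(2020, 1, 22)  # first day covered by the data set
LAST_DAY = date(2021, 3, 28)   # last day covered by the data set


def find_missing_days(csv_text, first_day=FIRST_DAY, last_day=LAST_DAY):
    """Return the days in [first_day, last_day] that have no row in the CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {date.fromisoformat(row["date"]) for row in reader}
    n_days = (last_day - first_day).days + 1
    expected = (first_day + timedelta(days=i) for i in range(n_days))
    return [day for day in expected if day not in seen]


# Placeholder rows for illustration only; the counts are NOT real figures.
sample = "date,daily_cases\n2020-01-22,0\n2020-01-24,0\n"
print(find_missing_days(sample, last_day=date(2020, 1, 24)))
```

Gaps found this way could be filled or flagged before the data is handed to AutoAI.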

Now we use the Python client to upload the data to Cloud Object Storage, which makes it available to AutoAI.

AutoAI for time series

Using the Python API we can easily define the AutoAI experiment for time series data. We need to define the following parameters for our experiment’s optimizer:

  • name - experiment name
  • prediction_type - problem type
  • prediction_columns - indices of target columns
  • timestamp_column_name - date and time column index
  • forecast_window - number of days to be predicted
  • holdout_size - number of holdout records
  • scoring - optimization metric
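The parameters above could be passed to the optimizer roughly as follows. This is a sketch under assumptions, not the original notebook code: the experiment name and the holdout size are hypothetical, the column indices assume the column order date, daily_cases, and scoring is left out because the article does not say which metric was used.

```python
def forecasting_optimizer_params():
    """Keyword arguments for the optimizer, mirroring the list above."""
    return {
        "name": "COVID19 daily cases forecast",  # hypothetical experiment name
        "prediction_columns": [1],                # assumed index of daily_cases
        "timestamp_column_name": 0,               # assumed index of date
        "forecast_window": 7,                     # predict the next 7 days
        "holdout_size": 14,                       # hypothetical value
    }


def create_optimizer(credentials, project_id):
    # Imported lazily so the parameter helper works without the package installed.
    from ibm_watson_machine_learning.experiment import AutoAI

    experiment = AutoAI(credentials, project_id=project_id)
    return experiment.optimizer(
        prediction_type=AutoAI.PredictionType.FORECASTING,
        **forecasting_optimizer_params(),
    )
```

Keeping the plain values in one dictionary makes it easy to log the configuration next to the results of each run.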

Now, call the fit() method to start the training job.

As soon as training is completed, we can list all models found for us by AutoAI.

We can retrieve the details of each pipeline by calling the get_pipeline_details() method.
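The three steps just described (training, listing the models, fetching pipeline details) can be sketched as one helper. It assumes that optimizer is the object returned by the experiment's optimizer() call and that training_data_connection points at the uploaded data. The helper name train_and_inspect is mine, and the pipeline name is taken from the result reported later in this story.

```python
def train_and_inspect(optimizer, training_data_connection, pipeline_name="Pipeline_6"):
    """Run the training job, then return the pipeline table and one pipeline's details."""
    # background_mode=False blocks until the training job has finished.
    optimizer.fit(
        training_data_reference=[training_data_connection],
        background_mode=False,
    )
    # summary() lists the pipelines found by AutoAI with their metrics.
    summary = optimizer.summary()
    # Details of a single pipeline, selected by name.
    details = optimizer.get_pipeline_details(pipeline_name=pipeline_name)
    return summary, details
```

Running the job in blocking mode keeps the notebook simple; for long experiments, background mode with polling may be a better fit.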

The details of each pipeline contain data for visualization. Below is a simple comparison of observed vs. predicted values on the holdout data set.

Pipeline_6 is the best model returned by AutoAI, so we will use it for deployment and scoring.
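Besides plotting, the same comparison can be summarised numerically. The sketch below computes the mean absolute error and the root mean squared error for two equally long lists of observed and predicted values. It assumes those lists have already been extracted from the pipeline details (the exact structure of the details is not shown here), and the numbers in the usage line are placeholders, not results from the experiment.

```python
import math


def holdout_errors(observed, predicted):
    """Return MAE and RMSE for paired observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    diffs = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mae": mae, "rmse": rmse}


# Placeholder values for illustration only.
print(holdout_errors([10, 20, 30], [12, 18, 33]))
```

RMSE penalises large misses more strongly than MAE, which matters when a forecast has to capture a sharp peak.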

Deployment and scoring

In this section we will deploy the best pipeline as a web service. Then we will use this web service’s scoring endpoint to obtain predictions for the next 7 days.

Now that the deployment has been created, we can ask for predictions using the score() method.

We receive a list of predicted confirmed cases in Poland for the following week. Let’s visualize that.
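A rough sketch of the deployment and scoring steps follows. It assumes that service is a WebService object from ibm_watson_machine_learning.deployment (for example created as WebService(credentials, source_space_id=space_id)), that run_id identifies the finished AutoAI training run, and that payload holds the most recent observations. The helper name, the deployment name and the shape of the payload are assumptions, not the original notebook code.

```python
def deploy_and_score(service, run_id, payload, model_name="Pipeline_6"):
    """Deploy the chosen pipeline as a web service and request predictions."""
    service.create(
        model=model_name,                            # pipeline picked in the previous section
        deployment_name="COVID19 forecast service",  # hypothetical deployment name
        experiment_run_id=run_id,                    # assumed id of the training run
    )
    # score() sends the payload to the scoring endpoint and returns the response.
    return service.score(payload=payload)
```

Passing the service object in, rather than creating it inside the function, keeps the credentials handling in one place.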
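To plot the forecast, each predicted value first needs the date it refers to. A small sketch, assuming the scoring response has already been flattened into a plain list with one number per day (the raw response structure is not shown here) and that the forecast starts on the day after the last observation, March 28, 2021. The helper name label_forecast is mine.

```python
from datetime import date, timedelta


def label_forecast(predictions, last_observed=date(2021, 3, 28)):
    """Pair each predicted value with the calendar day it refers to."""
    return [
        (last_observed + timedelta(days=offset), value)
        for offset, value in enumerate(predictions, start=1)
    ]
```

For a 7 day forecast window this yields the dates from March 29 to April 4, 2021, which can be used as the x axis of the plot.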

Based on the prediction we can expect a significant peak in 3–4 days.

Go to IBM Cloud and check this new feature out.
You can also find sample AutoAI notebooks here.

Lukasz Cmielowski, PhD

Senior Technical Staff Member at IBM, responsible for AutoAI (AutoML).