Peeking behind the curtain with IBM Watson AutoAI Python Client

Lukasz Cmielowski, PhD
Mar 22, 2021

Written by: Lukasz Cmielowski, PhD, Yair Schiff & Przemyslaw Czuba

Imagine you are at your favorite restaurant. You just finished eating your favorite dish. Yes, exactly, the one with the secret sauce. Suddenly you look up from your empty plate and see the chef standing next to your table. She tells you to please follow her. As you make your way towards the back of the restaurant the anticipation grows. Finally the chef swings open the doors to the kitchen, waves her hand for you to come in, and says “let me show you how the secret sauce is made!”

With the IBM Watson Machine Learning Python API client, our team is inviting you into our kitchen to see how we make our very special AutoAI One Button Machine (OBM) sauce. Although it might not be as aromatic as the story above, we are happy to announce the new beta version of the AutoAI OBM SDK notebooks, which give users greater control over the state-of-the-art machine learning pipeline generation available in AutoAI. This new beta release lets users directly access the API calls that underlie AutoAI and OBM through a Python client, allowing for greater control and customization of the data pre-processing, model generation, and evaluation processes.

If you have ever worked with multiple data sets, then you know that data joining and preprocessing can be very time-consuming tasks. Now AutoAI can do this for you automatically. The Watson Studio UI offers support for multi-data-source joins and feature engineering. OBM provides a rich set of aggregation functions to generate features for many types of data, including transactional data such as ATM transactions, time series such as sales quantities, and event sequences such as click-through events.
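To make the idea of aggregation-based feature generation concrete, here is a minimal pandas sketch of the kind of transformation OBM automates. The table and column names are hypothetical, and this is an illustration of the concept, not the OBM implementation:

```python
import pandas as pd

# Hypothetical ATM-style transactional data; names are illustrative only.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [50.0, 20.0, 100.0, 30.0, 70.0],
})

# OBM-style aggregation: summarize each customer's transactions
# into fixed-width features suitable for model training.
features = transactions.groupby("customer_id")["amount"].agg(
    amount_sum="sum", amount_mean="mean", amount_count="count"
).reset_index()

print(features)
```

OBM applies many such aggregations automatically and keeps the ones that help the downstream model.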

But what if you are not a fan of the UI? What if you need a Python interface to do your work programmatically?

If that is the case, then we have something for you: an auto-generated AutoAI notebook with injected Python API commands that mirrors the UI experiment. You do not need to hunt for Python API documentation or examples of how to write the script or notebook. It is already there; just use it. Let's look at how this works.

Watson Studio

Anonymous outdoor equipment purchase data for machine learning examples.

In the example below, anonymized outdoor equipment purchase data is used. The company would like to predict the quantity of sports products required. Using Watson Studio AutoAI, we can drag and drop data sets and then define data joins. We have five data sets for which we define the join keys.
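Conceptually, each join definition pairs two tables on a declared key. A minimal pandas sketch of what such a join produces; the table and column names here are made up for illustration and are not the actual data sets:

```python
import pandas as pd

# Stand-ins for two of the joined tables; names/keys are assumptions.
products = pd.DataFrame({
    "product_id": [10, 11],
    "product_line": ["Camping", "Golf"],
})
purchases = pd.DataFrame({
    "purchase_id": [1, 2, 3],
    "product_id": [10, 10, 11],
    "quantity": [3, 1, 2],
})

# Join on the declared key, as in the AutoAI data join definition.
joined = purchases.merge(products, on="product_id", how="left")
print(joined)
```

AutoAI performs the equivalent joins across all five data sets before feature engineering begins.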

Data joins definition

Next, we trigger the regression experiment, since the target (product quantity) is a numeric value.

Experiment configuration for multi data sets

If you want to use the notebook, use the “Save as notebook” option as shown in the second image below.

Completed AutoAI experiment
Generated AutoAI experiment notebook (Save as notebook link)

AutoAI experiment notebook

Watson Studio automatically opens the AutoAI notebook using the linked runtime environment (Python 3.7). The notebook contains the following sections:

  1. Setup
    — Package installation
    — Watson Machine Learning connection
  2. Experiment configuration
    — Experiment metadata
  3. Working with completed AutoAI experiment
    — Get fitted AutoAI optimizer
    — Pipelines comparison
    — Get pipeline as scikit-learn pipeline model
    — Inspect pipeline
    — Visualize pipeline model
    — Preview pipeline model as Python code
  4. Deploy and Score
    — Working with spaces
  5. Running AutoAI experiment with Python SDK
  6. Clean up
  7. Next steps

The setup and configuration sections provide all the steps required to prepare the environment. The Working with completed AutoAI experiment section shows how to explore the finished AutoAI experiment via the AutoAI Python API. The Deploy and Score section provides the API commands to create a batch deployment of the selected pipeline and score new data. Finally, the Running AutoAI experiment with Python SDK section contains Python API code that allows you to re-run the experiment programmatically.

Working with a completed AutoAI experiment

After connecting to the fitted pipeline optimizer, we can preview the contents of the preprocessing pipeline (data joins and feature engineering).

It is also possible to compare all trained pipelines. The AutoAI Python API returns a pandas DataFrame with all metrics.
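Because the comparison arrives as an ordinary pandas DataFrame, ranking and filtering pipelines takes a single line of pandas. The sketch below mocks up such a DataFrame; the pipeline names and metric column are illustrative and not the exact columns the API returns:

```python
import pandas as pd

# Mock of the kind of DataFrame the pipeline comparison returns;
# column names here are stand-ins, not the actual API output schema.
summary = pd.DataFrame({
    "pipeline": ["Pipeline_1", "Pipeline_2", "Pipeline_3"],
    "holdout_rmse": [12.4, 10.9, 11.7],
}).set_index("pipeline")

# Rank pipelines by the holdout metric and pick the best one.
best = summary.sort_values("holdout_rmse").index[0]
print(best)
```

Any pandas operation (filtering, plotting, exporting to CSV) then applies directly to the comparison results.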

For people interested in feature importance, this option is also available.

Feature importance

To provide full transparency into AutoAI pipelines, we can easily visualize the selected model using the “visualize” method. The pipeline contains the preprocessing (join) pipeline plus the machine learning pipeline steps. It ends with an estimator; in the example below it is RandomForestRegressor.
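Since the retrieved model is a regular scikit-learn pipeline, the same kind of inspection works with plain scikit-learn tooling. Below is a small stand-in, with made-up steps and synthetic data rather than the actual AutoAI pipeline, that likewise ends in a RandomForestRegressor:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the joined, feature-engineered data set.
X, y = make_regression(n_samples=50, n_features=4, random_state=0)

# A toy pipeline: preprocessing step(s) followed by the estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("estimator", RandomForestRegressor(n_estimators=10, random_state=0)),
])
pipeline.fit(X, y)

# Walking the steps is the plain-scikit-learn view of what
# the "visualize" method renders graphically.
for name, step in pipeline.steps:
    print(name, type(step).__name__)
```

Each named step can also be pulled out individually (e.g. `pipeline.named_steps["estimator"]`) for deeper inspection.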

To get the join and feature engineered data set you can use the following method:

obm_output_df = pipeline_optimizer.get_preprocessed_data_connection().read()

There is also an option to pretty_print the exact transformations applied at each step. The printed code illustrates the whole process, giving 100% transparency.

Running an AutoAI experiment with Python SDK

This section provides all the commands needed to run exactly the same experiment via the Python API (instead of the UI). It recreates the data join graph definition, the data sources, and the optimizer (experiment) configuration. It provides a scikit-learn-like API to “fit” the optimizer and work with the results.

Data join graph via Python API
Experiment definition via Python API
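To give a feel for the configuration the generated notebook assembles, here is a sketch of an experiment metadata block. The field names and values are approximations of the generated-notebook pattern, not the exact schema; your own auto-generated notebook injects the precise values for your experiment:

```python
# Sketch of experiment configuration, modeled on the generated notebook.
# All names and values below are illustrative assumptions.
experiment_metadata = dict(
    prediction_type="regression",          # predicting product quantity
    prediction_column="quantity",          # hypothetical target column name
    scoring="neg_root_mean_squared_error", # optimization metric
    holdout_size=0.1,                      # fraction of data held out
)
```

The notebook then builds the optimizer from this metadata and calls its scikit-learn-like `fit` method on the referenced training data.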

As you can see, AutoAI not only finds the best model for you but also provides the Python API code to make integration and automation easy.

I hope you have enjoyed learning about the AI secret sauce. Please go to IBM Cloud and check this new feature out. You can also find sample AutoAI notebooks here.


Lukasz Cmielowski, PhD

Senior Technical Staff Member at IBM, responsible for AutoAI (AutoML).