Some time ago I wrote a story about how to predict incorrect bug fixes. The full story, “Adoption of machine learning to software failure prediction”, can be found here. Long story short: we adopted a binary classification algorithm to predict whether a bug had been fixed correctly or incorrectly. Code change sets predicted as incorrect required the development (QA) team’s attention; those predicted as correctly fixed were closed automatically. The adopted solution required data science knowledge.
Data science expertise was required to:
- Preprocess data coming from source code and bug repositories such as GitHub,
- Build a machine learning pipeline using the Spark MLlib library,
- Train the model,
- Evaluate the model and decide whether it is accurate and reliable enough to use.
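To make these manual steps concrete, here is a minimal sketch of that workflow. The original story used Spark MLlib; this analogous version uses scikit-learn (the library AutoAI itself exports to later on), and the data and column names are made up for illustration:

```python
# A hypothetical, minimal version of the manual ML workflow:
# preprocess -> build pipeline -> train -> evaluate.
# (The original story used Spark MLlib; feature names here are invented.)
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for features extracted from code changes and bug records
data = pd.DataFrame({
    "lines_changed": [10, 250, 3, 120, 40, 500, 8, 75],
    "files_touched": [1, 12, 1, 6, 2, 20, 1, 4],
    "Reopened":      [0, 1, 0, 1, 0, 1, 0, 0],  # target column
})

X = data.drop(columns="Reopened")
y = data["Reopened"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Preprocessing and estimator combined into one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# Evaluate: is the model accurate enough to rely on?
print("accuracy:", accuracy_score(y_test, pipe.predict(X_test)))
```

Every one of these choices (scaler, estimator, split, metric) is a decision that normally requires data science judgement.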
For many development teams this is a blocker: there is a skill gap that cannot be easily addressed. Is that still the case?
Automated AI can easily fill this gap and unblock machine learning adoption for development teams. An example of such automation is AutoAI. It is capable of:
- Data preparation
- Model development
- Feature engineering
- Hyper-parameter optimisation
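To give a feel for the last item on that list, here is what hyper-parameter optimisation looks like when done by hand with scikit-learn’s GridSearchCV. This is purely illustrative of the concept; AutoAI runs its own internal optimiser rather than a plain grid search:

```python
# Illustrative only: manual hyper-parameter optimisation with scikit-learn.
# AutoAI performs this search for you with its own optimiser.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Try a small grid of candidate settings, scored with cross-validation
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```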
So, let’s do an experiment: replace the machine learning knowledge from the original story with AutoAI.
After creating a project in Watson Studio, we define a new AutoAI experiment by providing an experiment name and an associated Watson Machine Learning instance. You can also choose the computation configuration for your experiment.
In the next step we provide the training data set extracted from the GitHub repository. You can find a detailed description of how the training data was created in the prior story.
The data set contains 35 columns, including the target column “Reopened”, which needs to be selected in the next window as the prediction column for the experiment.
As we can see, AutoAI has automatically detected the machine learning problem type we are interested in. Binary classification is exactly what we need: we want to predict whether a particular bug fix will lead to the bug being reopened. Our target variable, Reopened, can be either TRUE or FALSE, and this is what we want to predict for unverified bug fixes.
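How can a tool detect the problem type on its own? One plausible approach, sketched below, is to inspect the distinct values of the prediction column: exactly two values suggests binary classification. This logic is my simplified guess, not AutoAI’s actual implementation:

```python
# A simplified guess at problem-type detection: look at the distinct values
# of the prediction column. (Illustrative; not AutoAI's real logic.)
import pandas as pd

def infer_problem_type(target: pd.Series) -> str:
    values = target.dropna().unique()
    if len(values) == 2:
        return "binary classification"
    if target.dtype == object or len(values) <= 20:
        return "multiclass classification"
    return "regression"

reopened = pd.Series([True, False, False, True, False])
print(infer_problem_type(reopened))  # binary classification
```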
That’s all we need to provide. Just trigger “Run experiment” and observe the progress. In the next stage, AutoAI splits the data into training data and holdout data. As we can see on the screen below, the default split is 90:10.
The holdout data set takes part in neither the training nor the tuning process. It is used only for validation of the final models.
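The 90:10 split AutoAI applies by default is equivalent to reserving 10% of the rows before any training happens, for example (AutoAI does this for you; the sketch below just reproduces the idea with scikit-learn):

```python
# Reproducing AutoAI's default 90:10 training/holdout split with scikit-learn.
# The holdout rows are set aside and never seen during training or tuning.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)  # 100 toy records
y = np.arange(100) % 2              # toy binary labels

X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.10, random_state=42)

print(len(X_train), len(X_holdout))  # 90 10
```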
In the next step the training process starts. AutoAI explores a space of different features and estimators and returns the best pipelines it finds. Cross-validation is used to evaluate the trained models.
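Cross-validation means the training data is split into k folds, and each fold takes a turn as the validation set while the model trains on the rest; the scores are then averaged. A minimal scikit-learn sketch of the technique (AutoAI handles this internally):

```python
# Cross-validation in a nutshell: k train/validate rounds, averaged.
# (Sketch of the evaluation technique, not AutoAI's internals.)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=150, n_features=8, random_state=1)

# 5 folds -> 5 scores, one per held-out fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```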
As the experiment progresses, new pipeline models are returned to us. Each has a scorer value, accuracy in our case, and the pipelines are ranked by it. The best model found for our use case is the XGB Classifier model with feature engineering (FE) and hyper-parameter optimisation (HPO).
The accuracy of the trained model is 0.982, which is extraordinary. We can also check the other evaluation metrics calculated for that model. The accuracy on the holdout data set is 100% (1.0).
We can also preview the confusion matrix for this model. As we can see, all records were classified correctly.
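As a reminder of how to read a confusion matrix: correct predictions land on the diagonal and misclassifications land off it. The toy labels below are invented, not our actual holdout results, but they show the all-correct pattern the matrix above displays:

```python
# What a confusion matrix summarises: true labels down the rows, predicted
# labels across the columns. (Toy example, not the real holdout data.)
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0]  # every record classified correctly

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 0]
#  [0 3]]   <- all counts on the diagonal, no misclassifications
```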
In less than 15 minutes, and without data science skills, we obtained a well-performing machine learning model that can boost our QA team’s capabilities.
AutoAI also allows us to preview model information, feature transformations and feature importance.
If needed, we can export the pipeline definition as source code (scikit-learn) in a notebook. We can also store the model in the Watson Machine Learning service to create a web service and integrate it with an external system or application.
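For the web service integration, a hedged sketch of what the scoring request might look like. The payload shape follows my recollection of the Watson Machine Learning v4 scoring API (`input_data` with `fields`/`values`); the URL and feature names are placeholders, so consult your deployment details for the real values:

```python
# Hedged sketch of scoring a deployed model over the web service.
# URL, token and feature names are placeholders; the payload shape is my
# recollection of the WML v4 API, so verify against the service docs.
import json

scoring_url = "https://<region>.ml.cloud.ibm.com/ml/v4/deployments/<id>/predictions"  # placeholder

payload = {
    "input_data": [{
        "fields": ["lines_changed", "files_touched"],  # hypothetical features
        "values": [[120, 6]],                          # one new bug fix
    }]
}

# In a real integration you would POST this JSON with an IAM bearer token:
# requests.post(scoring_url, json=payload,
#               headers={"Authorization": f"Bearer {token}"})
print(json.dumps(payload))
```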
That’s basically all. We have a web service we can use to make predictions for new incoming fixes. The predictions, and their probabilities, can be used to enrich bug records and to support further prioritisation and analysis of QA tasks.