Applying machine learning to software failure prediction
For any software development organisation, the cost of defect verification is very high. The process is not always straightforward, or even feasible, and often requires following very specific use cases or replicating complex customer environments, which consumes a lot of time and expertise.
Consider a test team that verifies a few hundred bugs per year. Whether they come from the field or are found in-house, verifying a single defect takes at least two hours. For 250 defects, that adds up to at least 500 hours, or more than 60 eight-hour working days spent on nothing else.
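The effort estimate can be reproduced with a few lines of arithmetic. This sketch assumes an eight-hour working day (our assumption, not a figure from the article); with shorter effective verification days, the number of days grows accordingly.

```python
def verification_effort(defects, hours_per_defect, hours_per_day=8.0):
    """Return (total_hours, working_days) needed to verify `defects` bugs.

    hours_per_day=8.0 is an assumption: an eight-hour working day.
    """
    total_hours = defects * hours_per_defect
    return total_hours, total_hours / hours_per_day


hours, days = verification_effort(250, 2)
print(f"{hours} hours, {days} working days")  # prints "500 hours, 62.5 working days"
```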
Additionally, the process demands great care to ensure that bug fixes do not introduce regressions in already supported functionality, which can easily happen.
Goal of the project:
The goal of this project is to build an analytics solution that reduces the cost of defect verification by predicting which defects were fixed incorrectly and need to be revisited by the development team.
- Based on historical data (i.e. previously verified defects), create a training data set.
- Train a predictive model with a classification algorithm that meets the defined quality criteria.
- The model should be exposed as an online REST service, so that we can easily call it to request predictions for new incoming defects.
- Defect records stored in the code repository are automatically updated with the scoring result: the prediction and its probability.
- Defects should be sorted by prediction probability, so that QA engineers can start by testing the bugs with the highest probability of an incorrect fix.
- Once new bugs have been verified and their label value is known (correctly fixed / incorrectly fixed), the records are marked as new training data and stored in the feedback data store.
- The feedback data store is used to evaluate the quality of the served model, automatically retrain it, and redeploy the new model version. The goal of such a continuous learning system is to keep the quality of the exposed model as high as possible.
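The routing rule behind the last two bullets can be sketched in plain Python. This is a minimal illustration, not Watson Machine Learning code: it assumes each bug record is a dict whose “Reopened” value (the target field used for training) is None until a test engineer has verified the fix. The record layout and the function name `route_records` are hypothetical.

```python
def route_records(bugs):
    """Split bug records into (to_score, feedback).

    Hypothetical schema: "Reopened" is None (or absent) until the fix has been
    verified, then True (incorrectly fixed) or False (correctly fixed).
    """
    to_score, feedback = [], []
    for bug in bugs:
        if bug.get("Reopened") is None:
            to_score.append(bug)   # label unknown: send to the scoring service
        else:
            feedback.append(bug)   # label known: new training data for the feedback store
    return to_score, feedback


bugs = [
    {"id": 1, "Reopened": None},
    {"id": 2, "Reopened": True},
    {"id": 3, "Reopened": False},
    {"id": 4},
]
to_score, feedback = route_records(bugs)
```

Checking `is None` rather than truthiness matters here: a verified bug with `"Reopened": False` is valid feedback and must not be sent back for scoring.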
To solve the problem, we used the following technologies:
- Watson Studio with the Watson Machine Learning service on IBM Cloud;
- For predictive model creation, we used Jupyter Notebooks and Spark MLlib served by Watson Studio (Spark notebook runtime);
- For model deployment, monitoring and retraining, we used the Continuous Learning System of the Watson Machine Learning offering.
The detailed workflow is shown below.
Input data information.
Each reported bug contains a set of fields that must be filled in by the bug creator, the bug resolver (software developer) and the bug validator (test engineer) (see figure 3 and table 1). These fields are used to create training data (used to train the model) as well as scoring data (used to get predictions).
Creation of training data set.
Each bug verified by a test engineer carries information that tells us whether it was fixed correctly: the “Reopened” field. “Reopened” is our target column; we want to predict its value [false, true] for bugs that have not been verified yet. Verified bugs are used as training data because their target value is known. This is called supervised learning, since we supervise the learning process by providing target values in the training data set (for a more detailed definition, please refer to ).
Based on the data gathered from verified bugs, we create a training data set consisting of feature columns and a target column. The data needs to be pre-processed (transformed) to be compatible with Spark MLlib. The training data preparation process is shown in the chart below.
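A minimal sketch of turning verified bug records into labelled training rows, in plain Python, before the data is loaded into a Spark DataFrame. The feature field names (Severity, Component, Priority) are hypothetical placeholders for the fields listed in table 1, and the numeric 1.0/0.0 label encoding is our assumption; the original pipeline may instead index the target string with StringIndexer.

```python
# Hypothetical feature columns; replace with the real bug-record fields.
FEATURE_FIELDS = ["Severity", "Component", "Priority"]
# Target column named in the article.
TARGET_FIELD = "Reopened"


def to_training_rows(bugs):
    """Keep only verified bugs and return feature columns plus a numeric label."""
    rows = []
    for bug in bugs:
        reopened = bug.get(TARGET_FIELD)
        if reopened is None:
            continue  # not verified yet: target unknown, so unusable for training
        row = {field: bug.get(field) for field in FEATURE_FIELDS}
        row["label"] = 1.0 if reopened else 0.0  # assumed encoding: 1.0 = incorrectly fixed
        rows.append(row)
    return rows


bugs = [
    {"Severity": "High", "Component": "UI", "Priority": "P1", "Reopened": True},
    {"Severity": "Low", "Component": "DB", "Priority": "P3", "Reopened": False},
    {"Severity": "Low", "Component": "DB", "Priority": "P2", "Reopened": None},
]
rows = to_training_rows(bugs)
```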
Building Spark MLlib pipeline.
“In machine learning, it is common to run a sequence of algorithms to process and learn from data.”
“MLlib represents such a workflow as a Pipeline, which consists of a sequence of PipelineStages (Transformers and Estimators).”
Our simple machine learning pipeline consists of StringIndexer and VectorAssembler stages for feature preparation, followed by RandomForestClassifier as the estimator.
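In PySpark these stages are `pyspark.ml.feature.StringIndexer`, `pyspark.ml.feature.VectorAssembler` and `pyspark.ml.classification.RandomForestClassifier`, chained with `pyspark.ml.Pipeline(stages=[...])`. The snippet below is not Spark code: it mimics in plain Python what the two feature-preparation stages compute, so the transformation can be followed without a Spark runtime. It assumes StringIndexer's default `stringOrderType="frequencyDesc"`, with ties broken alphabetically as in recent Spark versions. The column names are hypothetical, and real VectorAssembler output is a Spark `Vector`, not a Python list.

```python
from collections import Counter


def fit_string_indexer(values):
    """Map each distinct string to a float index, most frequent value first.

    Mirrors StringIndexer's default "frequencyDesc" ordering; ties are broken
    alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


def assemble(row, input_cols):
    """Concatenate numeric columns into one feature vector, like VectorAssembler."""
    return [float(row[col]) for col in input_cols]


# Hypothetical categorical bug fields.
rows = [
    {"Severity": "Low", "Component": "UI"},
    {"Severity": "High", "Component": "DB"},
    {"Severity": "Low", "Component": "DB"},
]
indexers = {col: fit_string_indexer([r[col] for r in rows]) for col in ("Severity", "Component")}
for r in rows:
    for col, mapping in indexers.items():
        r[col + "_idx"] = mapping[r[col]]  # add the indexed column next to the original
features = [assemble(r, ["Severity_idx", "Component_idx"]) for r in rows]
```

Indexing is necessary because the tree-based classifier works on numeric features, and assembling them into a single vector column matches the single `featuresCol` that Spark MLlib estimators expect.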
Train and evaluate model.
To train and evaluate our model, we split the training data set into two subsets: train (to train the model) and test (to evaluate the quality of the trained model). Evaluation on the test data set showed 87% accuracy.
Note: For simplicity, this article skips model tuning.
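In Spark the split is typically done with `DataFrame.randomSplit` and accuracy is computed with `MulticlassClassificationEvaluator(metricName="accuracy")`. The article does not state the split ratio; the 80/20 split and the seed below are assumptions. This plain-Python analogue shows the logic. Unlike `randomSplit`, which samples rows independently and yields only approximate subset sizes, it cuts a shuffled copy at an exact position.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


train, test = train_test_split(list(range(100)))
```

Keeping the test subset out of training is what makes the reported accuracy an estimate of performance on bugs the model has not seen.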
Model deployment on Cloud with Watson Machine Learning.
To publish and deploy the model, we used watson-machine-learning-client, available on PyPI.
At this stage our model is stored in the Watson Machine Learning repository on IBM Cloud. Now we can create an online deployment of the model and score new data records.
Once the online deployment is created, let’s print the scoring endpoint.
You can use the method below to send a test scoring request to the deployed model.
In the example above, the predicted value in the scoring response is “true”, which means that the bug has been fixed incorrectly. The probability of that prediction is ~67%.
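A hedged sketch of such a scoring call, using only the Python standard library rather than watson-machine-learning-client. The endpoint URL and token are placeholders, and both the payload layout (`{"fields": ..., "values": ...}`) and the response columns “prediction” and “probability” are assumptions about the scoring format, so check them against your deployment. The request is built but not sent, so the example runs offline.

```python
import json
import urllib.request


def build_scoring_request(endpoint, token, fields, rows):
    """Build (but do not send) a POST request to an online scoring endpoint.

    Assumption: the service accepts {"fields": [...], "values": [[...], ...]}
    and a bearer token in the Authorization header.
    """
    payload = {"fields": fields, "values": rows}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def parse_scoring_response(response):
    """Return (prediction, probability_of_prediction) for every scored row.

    Assumption: the response has "fields" and "values", with "prediction"
    (a class index) and "probability" (one value per class) among the fields.
    """
    fields = response["fields"]
    pred_idx, prob_idx = fields.index("prediction"), fields.index("probability")
    results = []
    for row in response["values"]:
        prediction = row[pred_idx]
        probability = row[prob_idx][int(prediction)]  # probability of the predicted class
        results.append((prediction, probability))
    return results


# Placeholder endpoint and token; hypothetical feature field.
request = build_scoring_request("https://example.com/score", "tok", ["Severity"], [["High"]])
```

To actually send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`. Which label a class index such as 1.0 stands for depends on how the target column was indexed during training.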
Enablement of Continuous Learning System for deployed model.
The IBM Watson™ Machine Learning service includes a continuous learning system. Continuous learning systems provide automated monitoring of model performance, retraining, and redeployment to maintain the quality of predictions.
The next figure shows an example run of the system. The original model quality was above the specified threshold (the threshold is shown as the red line, model quality as the blue line). In the 1st and 2nd iterations of the system, the deployed model was evaluated on feedback data. The evaluation showed degraded model accuracy (blue line below red). Because the accuracy fell below the specified threshold, the retraining process was triggered. A new model version was trained using the training and feedback data stores. Because the new version showed accuracy above the threshold, it was redeployed, replacing the old model version and restoring the quality of predictions.
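The decision logic described above can be summarised as one evaluate / retrain / redeploy iteration. This is a conceptual sketch, not the Watson Machine Learning implementation: `evaluate` and `retrain` are hypothetical callables standing in for evaluation on feedback data and retraining on the training and feedback data stores. Keeping the old model when the retrained one is still below the threshold is our assumption, since the article does not cover that case.

```python
from typing import Callable, TypeVar

Model = TypeVar("Model")


def continuous_learning_iteration(
    deployed: Model,
    evaluate: Callable[[Model], float],  # hypothetical: accuracy on feedback data
    retrain: Callable[[], Model],        # hypothetical: train on training + feedback data
    threshold: float,
) -> Model:
    """Run one evaluation cycle and return the model that should be served."""
    if evaluate(deployed) >= threshold:
        return deployed   # quality is acceptable: keep serving the current model
    candidate = retrain()  # quality dropped below the threshold: retrain
    if evaluate(candidate) >= threshold:
        return candidate  # new version meets the threshold: redeploy it
    return deployed       # assumption: keep the old model if retraining did not help


# Hypothetical accuracies per model version on the current feedback data.
accuracies = {"v1": 0.90, "v2": 0.80, "v3": 0.88}
served = continuous_learning_iteration("v2", accuracies.get, lambda: "v3", 0.85)
```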
Save prediction results in bug record.
The prediction result and its probability are saved in the bug records as two additional fields. This makes it easy to sort and validate the list of bugs awaiting verification.
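A minimal sketch of this last step. It assumes scoring results arrive as (prediction, probability) pairs in the same order as the bugs, a binary model where class 1.0 means “Reopened” = true, and hypothetical field names Prediction and PredictionProbability. The queue is ordered by the estimated probability of an incorrect fix, so the riskiest bugs come first.

```python
def annotate_and_sort(bugs, results):
    """Attach scoring results to bug records and order them for verification."""
    if len(bugs) != len(results):
        raise ValueError("expected one scoring result per bug")
    annotated = []
    for bug, (prediction, probability) in zip(bugs, results):
        record = dict(bug)  # copy so the input records are left untouched
        record["Prediction"] = prediction              # hypothetical field name
        record["PredictionProbability"] = probability  # hypothetical field name
        # Binary model assumed: P(incorrect fix) is the probability itself when
        # class 1.0 (reopened) is predicted, and its complement otherwise.
        p_incorrect = probability if prediction == 1.0 else 1.0 - probability
        annotated.append((p_incorrect, record))
    annotated.sort(key=lambda item: item[0], reverse=True)  # riskiest first
    return [record for _, record in annotated]


bugs = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
results = [(0.0, 0.9), (1.0, 0.67), (0.0, 0.55)]
queue = annotate_and_sort(bugs, results)
```

Sorting on the probability of an incorrect fix, rather than on the raw probability field, keeps a confidently "correctly fixed" bug from rising to the top of the queue.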