Adoption of machine learning for software failure prediction

Defect life cycle.
  1. Based on historical data (i.e., previously verified defects), create a training data set.
  2. Train a predictive model using a classification algorithm, ensuring the model meets the defined quality criteria.
  3. Expose the model as an online REST service, so that prediction requests for new incoming defects can easily be made.
  4. Defect records stored in the code repository are automatically updated with the scoring result: the prediction and its probability.
  5. Sort defects by prediction probability, so that QA engineers can start by testing the bugs with the highest probability of an incorrect fix.
  6. Once new bugs are verified and the label value is known (correctly fixed / incorrectly fixed), such records are marked as new training data and stored in a feedback data store.
  7. The feedback data store is used to evaluate the quality of the served model, automatically retrain the deployed model, and finally redeploy a new model version. The goal of such a continuous learning system is to keep the quality of the exposed model as high as possible.
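Steps 4-6 above can be sketched in plain Python. All names here (`Defect`, `qa_queue`, the `lines_changed` feature) are illustrative assumptions rather than the article's actual schema, and the toy risk heuristic merely stands in for the real model behind the REST scoring service:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical defect record; field names are assumptions, not the real schema.
@dataclass
class Defect:
    defect_id: str
    features: dict
    prediction: Optional[str] = None      # "correctly fixed" / "incorrectly fixed"
    probability: float = 0.0              # confidence of the prediction
    verified_label: Optional[str] = None  # set by QA after verification

def score(defect: Defect) -> Defect:
    """Stand-in for a call to the online REST scoring service."""
    # Toy heuristic instead of a trained model: bigger change -> riskier fix.
    risk = min(defect.features.get("lines_changed", 0) / 100.0, 1.0)
    defect.prediction = "incorrectly fixed" if risk >= 0.5 else "correctly fixed"
    defect.probability = risk if risk >= 0.5 else 1.0 - risk
    return defect

def qa_queue(defects):
    """Steps 4-5: score records, keep suspected bad fixes, riskiest first."""
    scored = [score(d) for d in defects]
    risky = [d for d in scored if d.prediction == "incorrectly fixed"]
    return sorted(risky, key=lambda d: d.probability, reverse=True)

def collect_feedback(defects):
    """Step 6: verified records (label now known) become new training data."""
    return [d for d in defects if d.verified_label is not None]
```

In a real deployment `score` would call the model's REST endpoint, and the list returned by `collect_feedback` would be appended to the feedback data store that drives retraining.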
Flow chart of the solution.
List of fields (features) used for model training and scoring.
Sample anonymized defect records.
Training data preparation.
Schema of training data.
Code snippet for Spark MLlib pipeline building.
Code snippet showing model evaluation.
Sample prediction result (number of features simplified for this example).
Screenshot of learning system configuration window.
Screenshot of evaluation events chart.



Lukasz Cmielowski, PhD

Senior Technical Staff Member at IBM, responsible for AutoAI (AutoML).