GSoC Project Proposal: GlitchWitcher: AI-assisted Bug Prediction
GlitchWitcher: AI-assisted Bug Prediction
Description
Defects exist in source code. Some are easily found during code review; others are revealed by proper unit or integration testing. But even before code is reviewed or executed during testing, there are other ways to detect where bugs may be lurking.
The intent of this project is to trial and compare two approaches to predicting where defects live in source code. Many research papers discuss algorithms for finding defects; this project will focus on implementing two of them.
Approach 1: Predicting Faults from Cached History
This first approach is a relatively simple, inexpensive technique for predicting where bugs live. It is outlined in this research paper [1]. We would revive an earlier prototype called BugTool [2], a utility that applies the BugCache/FixCache algorithm to selected GitHub repositories and returns a score for each file in the repository, as per the algorithm outlined in the paper. Files with the highest hit rates are the most likely to contain defects, so they are the most important to cover thoroughly with tests.
This phase of the project is not expected to take very long; it is mainly a warm-up for analyzing source code and for integrating a new verification 'check' into the workflow of a GitHub repository. The utility would report the 10 files most likely to contain defects.
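The core of FixCache can be sketched in a few lines. The sketch below is a simplified illustration, not the BugTool implementation: it keeps a fixed-size LRU cache of files, counts a "hit" whenever a bug-fix commit touches a cached file, and loads the fixed file into the cache on a miss. The real algorithm in [1] also preloads large/recently changed files and fetches co-changed files; those refinements are omitted here, and the function name is our own.

```python
from collections import OrderedDict

def fixcache_hit_rates(commits, cache_size):
    """Toy FixCache. `commits` is a chronological list of
    (files_changed, is_bug_fix) tuples. Returns per-file hit counts
    for bug-fix commits, using LRU eviction."""
    cache = OrderedDict()   # file -> None, ordered oldest-to-newest
    hits = {}
    for files, is_bug_fix in commits:
        for f in files:
            if is_bug_fix:
                if f in cache:
                    hits[f] = hits.get(f, 0) + 1   # hit: file was already cached
                # hit or miss, (re)load the faulty file as most recent
                cache.pop(f, None)
                cache[f] = None
                while len(cache) > cache_size:
                    cache.popitem(last=False)      # evict least recently used
            elif f in cache:
                cache.move_to_end(f)               # refresh recency on a normal change
    return hits
```

Files would then be ranked by hit count and the top 10 reported, mirroring the utility's intended output.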
Approach 2: Reconstruction Error Probability Distribution (REPD) model
The second approach utilizes the anomaly detection/classification model outlined in this research paper [3] to categorize code as defective or non-defective. This approach is much more involved. Section 3 of the paper describes the model in use, while section 4 describes the methodology. The authors train against datasets from the NASA ESDS Data Metrics project [4].
As part of this project, participants are asked to reproduce the REPD model described in the paper, apply it to the data used by the researchers to see whether similar results are found, and then apply it to a separate C/C++ code base such as OpenJ9 [5] or OpenJDK [6] (or both).
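To make the REPD idea concrete, here is a heavily simplified sketch of its classification step. The model in [3] trains an autoencoder, then fits a probability distribution to each class's reconstruction errors and labels new examples by comparing likelihoods. In this sketch, a PCA projection stands in for the autoencoder and plain Gaussians stand in for the paper's fitted distributions, purely to keep it dependency-free; all function names are illustrative, not from the paper's code.

```python
import numpy as np

def reconstruction_errors(X, X_train_clean, k=1):
    """Stand-in reconstructor: project onto the top-k principal
    components of the non-defective training data and return the L2
    reconstruction error per row (an autoencoder in the real model)."""
    mu = X_train_clean.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train_clean - mu, full_matrices=False)
    P = Vt[:k]                      # top-k principal directions
    Xc = X - mu
    return np.linalg.norm(Xc - Xc @ P.T @ P, axis=1)

def fit_repd(errors_clean, errors_defective):
    """Fit one error distribution per class (Gaussians here; the
    paper fits richer distributions to the reconstruction errors)."""
    fit = lambda e: (e.mean(), e.std() + 1e-9)
    return fit(errors_clean), fit(errors_defective)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def is_defective(error, params_clean, params_defective):
    """Label an example defective when its reconstruction error is more
    likely under the defective-class distribution than the clean one."""
    return gaussian_pdf(error, *params_defective) > gaussian_pdf(error, *params_clean)
```

Reproducing the paper would mean swapping the PCA stand-in for the autoencoder and distribution choices described in sections 3 and 4, then feeding in the NASA datasets.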
Ideally, one outcome of this project would be a comparison of the results of Approach 1 versus Approach 2 when applied against the same codebase. A second outcome of this work would be to incorporate an interim verification check against a source code repository, perhaps run on the cadence of every new tag being applied.
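One simple way to quantify such a comparison is the overlap between the two approaches' top-N risk rankings. The helper below is a sketch we propose for illustration (the function name and metric choice are ours, not from either paper): it computes the overlap coefficient between the top-N files of two rankings.

```python
def top_n_overlap(scores_a, scores_b, n=10):
    """Overlap coefficient between the top-n files of two risk
    rankings. scores_a/scores_b map file path -> risk score,
    where a higher score means a riskier file."""
    top = lambda s: set(sorted(s, key=s.get, reverse=True)[:n])
    a, b = top(scores_a), top(scores_b)
    return len(a & b) / max(1, min(len(a), len(b)))
```

An overlap near 1.0 would suggest the two approaches flag largely the same files; a low overlap would suggest they capture complementary signals.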
Reference Links
[1] https://web.cs.ucdavis.edu/~devanbu/teaching/289/Schedule_files/Kim-Predicting.pdf
[2] https://github.com/adoptium/aqa-test-tools/tree/master/BugPredict/BugTool
[3] https://www.sciencedirect.com/science/article/abs/pii/S0164121220301138
[4] https://www.earthdata.nasa.gov/about/data-metrics
[5] https://github.com/eclipse-openj9/openj9
[6] https://github.com/adoptium/jdk
Links to Eclipse Projects / Repositories
https://projects.eclipse.org/projects/adoptium.aqavit
https://projects.eclipse.org/projects/adoptium.temurin
https://projects.eclipse.org/projects/technology.openj9
https://github.com/adoptium/aqa-tests
https://github.com/adoptium/aqa-test-tools
https://github.com/eclipse-openj9/openj9
https://github.com/adoptium/jdk (mirror of upstream repository)
Expected outcomes
- Trialing 2 different approaches (implemented as static analysis 'utilities') to predict source code defects in a given source code base
- A comparison of the 2 approaches (do they identify the same files in a code base as 'most likely' containing bugs?)
- An additional way to flag areas of code that need more scrutiny during code reviews and a greater emphasis during testing
- A verification check (or workflow, a.k.a. GlitchWitcher) that runs these static analysis utilities against pull requests in a repository
Skills required/preferred
- Languages & Frameworks: Python (for ML and automation), Git APIs, NLP libraries (e.g., spaCy, BERT, GPT-based models). Awareness of different classifiers (Gaussian Naive Bayes, logistic regression, k-nearest neighbors, decision tree, and hybrid SMOTE-ensemble) and statistical analysis will be helpful.
- CI/CD Integration: GitHub Actions, Jenkins
- Database & Storage: MongoDB (or PostgreSQL/MySQL) for storing historical build data and test results.
- Deployment: integration with current development workflows and pipelines
Project size
350 hours
Possible mentors:
- Lan Xia lan_xia@ca.ibm.com
- Longyu Zhang longyu.zhang@ibm.com
- Shelley Lambert slambert@redhat.com
Rating
medium - hard