ICIMICE 2023

On Using Instance Hardness Measures to Select Training Data for Software Defect Prediction
Benyamin Langgu Sinaga (1, 2, *), Sabrina Ahmad (2), Zuraida Abal Abas (2)

1). Department of Informatics, Universitas Atma Jaya Yogyakarta, Jalan Babarsari 43, Yogyakarta, Indonesia 55281
2). Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal Melaka, Malaysia
*) benyamin.sinaga[at]uajy.ac.id


Abstract

Software defect prediction models are a popular way to let software quality assurance teams focus their testing on the most defect-prone modules. However, training a prediction model directly on cross-project datasets yields unsatisfactory predictive performance, so the selection of training data is critical. Most training data selection operates at the instance level, using kNN with the Euclidean distance to measure the similarity between source and target data. Such an approach is susceptible to noise. Defect datasets are complex because of class imbalance, noisy instances, and class overlap, yet selection criteria are predominantly based on the distance between the source and target datasets and ignore these complexity-related factors. As a result, several machine learning algorithms underperform. This study proposes a filter that selects training data instances while taking these complexity factors into account. The filter is built from four instance hardness measures related to complexity factors of defect datasets, namely noisy instances and class overlap among instances in cross-project data. The proposed approach was evaluated using 14 datasets and six classification algorithms. The findings indicate that using instance hardness measures for data selection can improve the performance of defect prediction models.
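
To make the idea of a hardness-based filter concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the four instance hardness measures it uses, so this example assumes one well-known measure, k-Disagreeing Neighbors (kDN): the fraction of an instance's k nearest neighbors that carry a different class label. The function names `kdn` and `hardness_filter`, the choice of `k=3`, and the `threshold=0.5` cutoff are all illustrative assumptions.

```python
import math


def kdn(X, y, k=3):
    """Return the k-Disagreeing Neighbors score of every instance.

    The score is the fraction of an instance's k nearest neighbors
    (Euclidean distance) whose label differs from its own.
    High scores suggest noisy or class-overlapping instances.
    """
    scores = []
    for i, xi in enumerate(X):
        # Distance from instance i to every other instance, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        if not neighbors:
            # A single-instance dataset has no neighbors to disagree with.
            scores.append(0.0)
            continue
        disagree = sum(1 for j in neighbors if y[j] != y[i])
        scores.append(disagree / len(neighbors))
    return scores


def hardness_filter(X, y, k=3, threshold=0.5):
    """Keep only instances whose kDN score is at most `threshold`.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    scores = kdn(X, y, k)
    keep = [i for i, s in enumerate(scores) if s <= threshold]
    return [X[i] for i in keep], [y[i] for i in keep]
```

In a cross-project setting, such a filter would be applied to the source-project instances before training, so that instances whose neighborhoods are dominated by the opposite class are dropped from the training set. A filter combining several hardness measures, as the abstract describes, would need a rule for aggregating their scores; that rule is not specified here.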

Keywords: Training data selection, software defect prediction, instance hardness measures

Topic: Machine Learning and Deep Learning

Corresponding Author: Benyamin Langgu Sinaga
