Comparison of Support Vector Machine of Unbalanced Microarray Data Classification using The Resampling Method: Synthetic Minority Oversampling Technique (SMOTE) and Radial Based Oversampling (RBO)
Diana Nurlaily (a*), Irhamah (b), Santi Wulan Purnami (b)
a) Study Program of Statistics, Institut Teknologi Kalimantan, Kampus ITK-Karang Joang, Balikpapan, 76127, Indonesia.
*diana.nurlaily[at]lecturer.itk.ac.id
b) Department of Statistics, Institut Teknologi Sepuluh Nopember, Kampus ITS-Sukolilo Surabaya 60111, Indonesia.
Abstract
Microarray data contain hundreds to thousands of observable genes. The type most commonly used in research is the DNA microarray, which is used to determine the level of gene expression and the gene sequence in a sample. Such data capture differences in gene expression across tissue and cell samples, which can help in diagnosing disease or distinguishing between certain types of tumors. Microarray datasets are typically high-dimensional and imbalanced, and these characteristics can lead to inaccurate classification predictions. This study compares two resampling methods, the Synthetic Minority Oversampling Technique (SMOTE) and Radial Based Oversampling (RBO), for microarray data classification with a Support Vector Machine (SVM). The data used are breast cancer and lymphoma datasets, which differ in imbalance ratio and number of variables. The analysis found that the SVM model performed better with SMOTE than with RBO.
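As an illustration of the resampling idea behind SMOTE, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard formulation, in which each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy two-feature data are hypothetical and chosen only for this example; RBO and the SVM classifier are not covered here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; assumes Euclidean distance.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 3 minority samples, 6 majority samples, two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 6
new_points = smote(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_points
```

After oversampling, the minority class has as many samples as the majority class, and a classifier such as an SVM would then be trained on the balanced set. In practice the oversampling should be applied to the training portion only, so that synthetic points do not leak into the evaluation data.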