Handling imbalanced dataset using ensemble methods on poverty datasets in Bengkulu Province
Winalia Agwil, Dian Agustina, Herlin Fransiska, Nurul Hidayati

Program Studi Statistika, FMIPA Universitas Bengkulu


Abstract

Poverty is a global issue. This is evident in the Sustainable Development Goals (SDGs), which place poverty at the top of the priority list. Poverty reduction will aid in the resolution of other global issues such as hunger, health, prosperity, education, clean water, and sanitation. Dealing with poverty promptly and effectively requires an investigation of the characteristics of impoverished households, so that specific programs can be developed based on those characteristics. A classification tree, such as the Classification and Regression Tree (CART), is one of the statistical methods that can be used. However, CART is a weak classifier when the dataset is imbalanced; in this case, the proportions of poor and non-poor households are unequal. Because of this imbalance, the minority class is classified less accurately. To address this problem, an ensemble technique is needed. Numerous ensemble methods can be employed, including the boosting algorithm and the random forest (RF) algorithm. The best classifier was selected based on its sensitivity. The results revealed that the ensemble technique outperforms the CART method in terms of accuracy.
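
As a rough illustration of why sensitivity, rather than overall accuracy, is a reasonable selection criterion on an imbalanced dataset, the following minimal Python sketch compares the two measures on toy labels. The labels, the 10/90 class split, and both sets of predictions are hypothetical and are not taken from the Bengkulu data; the function names are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred, positive=1):
    """Share of all observations that are classified correctly."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + fn + fp + tn)


def sensitivity(y_true, y_pred, positive=1):
    """Share of the positive (minority) class that is detected."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels: 1 = poor household (minority), 0 = non-poor.
y_true = [1] * 10 + [0] * 90

# Hypothetical classifier A: always predicts the majority class.
pred_majority = [0] * 100

# Hypothetical classifier B: detects 7 of the 10 poor households,
# at the cost of 15 false alarms among the non-poor households.
pred_minority_aware = [1] * 7 + [0] * 3 + [1] * 15 + [0] * 75

for name, pred in [("A", pred_majority), ("B", pred_minority_aware)]:
    print(name, accuracy(y_true, pred), sensitivity(y_true, pred))
```

In this toy setting, classifier A has the higher accuracy although it never identifies a poor household, while classifier B has lower accuracy but much higher sensitivity, which is the quantity of interest when the goal is to find the minority class.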

Keywords: Poverty, Classification, CART, Ensemble, Accuracy

Topic: Mathematics

MaSEIS 2021 Conference | Conference Management System