Estimation of Missing Values in Economic Issues Using Cluster-based Multiple Imputation
Mety Agustini (1,2), Dedy Dwi Prastyo (1)
1) Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
2) Badan Pusat Statistik Provinsi Kepulauan Bangka Belitung, Pangkalpinang, Indonesia
Abstract
Missing values commonly occur during data collection, including in economic surveys and censuses. They not only reduce the amount of data available for analysis but also compromise the statistical power of the research. Deleting records with missing values distorts estimation, producing overestimated or underestimated values; it also introduces significant bias into the results and reduces the efficiency of the estimates. This research proposes an efficient imputation method for dealing with incomplete data. K-means clustering, a common algorithm for discovering similarity or dissimilarity among units, is used to group the units. Using multivariate imputation by chained equations (MICE), incomplete data are imputed with plausible values generated from the data that do not contain missing values. Extensive experiments demonstrate the efficacy of the proposed method in the imputation task. Model accuracy is evaluated quantitatively and graphically, using plots of plausible values, regression, and density for the imputed datasets.
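To make the idea of cluster-based multiple imputation concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the clustering and imputation steps are combined, so the sketch assumes that units are clustered on a fully observed auxiliary variable and that each missing value is filled from units in the same cluster. For brevity it replaces the chained-equation models of MICE with random draws from observed same-cluster donors, which is a simpler stand-in. The function names `kmeans` and `cluster_multiple_impute`, the parameters, and the toy data are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def cluster_multiple_impute(rows, aux_cols, target_col, k=2, m=5, seed=0):
    """Return m completed copies of rows; None marks a missing target value.

    Assumption: the columns in aux_cols are fully observed and drive the
    clustering. Missing targets are drawn from same-cluster observed donors
    (a stand-in for MICE, not MICE itself).
    """
    rng = random.Random(seed)
    labels = kmeans([tuple(r[c] for c in aux_cols) for r in rows], k, seed=seed)
    completed = []
    for _ in range(m):
        copy = [list(r) for r in rows]
        for i, r in enumerate(copy):
            if r[target_col] is None:
                donors = [d[target_col] for d, lab in zip(rows, labels)
                          if lab == labels[i] and d[target_col] is not None]
                if not donors:  # empty cluster: fall back to all observed values
                    donors = [d[target_col] for d in rows
                              if d[target_col] is not None]
                r[target_col] = rng.choice(donors)
        completed.append(copy)
    return completed


# Toy data (made up): column 0 is an observed auxiliary variable,
# column 1 is the variable with missing values.
rows = [
    (1.0, 10.0), (1.2, 12.0), (0.8, 9.0), (1.1, None),
    (10.0, 100.0), (10.5, 104.0), (9.5, 97.0), (10.2, None),
]
imputations = cluster_multiple_impute(rows, aux_cols=[0], target_col=1, k=2, m=5)

# Pool the point estimate across the m datasets by averaging the
# per-dataset means.
per_dataset_means = [sum(r[1] for r in d) / len(d) for d in imputations]
pooled_mean = sum(per_dataset_means) / len(per_dataset_means)
print(pooled_mean)
```

Because donors are restricted to the same cluster, the unit with a small auxiliary value only receives values observed among similar units, and likewise for the unit with a large auxiliary value. A fuller version would replace the donor draw with per-variable regression models cycled over all incomplete variables, as in MICE.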