Reduction of Data Dimensions Using PCA and SVD with Case Studies of ITB Tracer Study Data
Dina Prariesa (a), Udjianna Sekteria Pasaribu (b*), Utriweni Mukhaiyar (b)
a) Doctoral Program in Mathematics, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Jalan Ganesha 10, Bandung 40132, Indonesia
b) Statistics Research Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Jalan Ganesha 10, Bandung 40132, Indonesia
*Corresponding Author: udjianna[at]math.itb.ac.id
Abstract
As big data grows, techniques for reducing dataset dimensions are becoming increasingly important in statistical analysis. Researchers and data analysts often face the task of reducing a high-dimensional data set to a lower-dimensional one without significant loss of information. The most common technique for this is Principal Component Analysis (PCA). PCA retains as much of the variation in the data set as possible by transforming the original variables into a new set of variables, the principal components. These components are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables. In this paper, we carry out a theoretical study of data dimension reduction using PCA and the closely related Singular Value Decomposition (SVD), through which PCA can be computed. We then apply these two techniques, and the relationship between them, to the tracer study data of ITB alumni, specifically from the study programs in the Faculty of Mathematics and Natural Sciences (FMIPA), the School of Pharmacy (SF), and the School of Life Sciences and Technology (SITH).
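The relationship between PCA and SVD can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It runs on synthetic correlated data, which is an assumption; the ITB tracer study data are not reproduced here. The function names `pca_eig` and `pca_svd` are hypothetical. The sketch computes PCA twice: once from the eigendecomposition of the sample covariance matrix and once from the SVD of the centered data matrix. It then checks that the component variances, loadings (up to sign), and scores agree.

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the sample covariance matrix (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    S = np.cov(Xc, rowvar=False)             # p x p sample covariance, divisor n - 1
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so explained variance is descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :k]             # scores on the first k principal components
    return eigvals, eigvecs[:, :k], scores

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix, Xc = U diag(s) Vt (hypothetical helper)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values come out descending
    eigvals = s**2 / (n - 1)                 # covariance eigenvalues from singular values
    scores = U[:, :k] * s[:k]                # equals Xc @ Vt[:k].T
    return eigvals, Vt[:k].T, scores

# Synthetic correlated data (an assumption, standing in for real survey variables)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

k = 2
ev_eig, V_eig, Z_eig = pca_eig(X, k)
ev_svd, V_svd, Z_svd = pca_svd(X, k)

print(np.allclose(ev_eig, ev_svd))                  # same component variances
print(np.allclose(np.abs(V_eig), np.abs(V_svd)))    # same loadings up to column sign
print(np.allclose(np.abs(Z_eig), np.abs(Z_svd)))    # same scores up to column sign
print(ev_svd[:k].sum() / ev_svd.sum())              # share of total variance kept by k components
```

The agreement follows from the factorization itself. If the centered data matrix is $X_c = U \Sigma V^\top$, then the sample covariance matrix is $S = X_c^\top X_c/(n-1) = V\,\Sigma^2/(n-1)\,V^\top$. The right singular vectors are therefore the principal axes, and the component variances are $\lambda_i = \sigma_i^2/(n-1)$. The SVD route avoids forming $S$ explicitly, which is generally preferred for numerical stability.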
Keywords: Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Tracer Study