
    Kernel Choice for Unsupervised Kernel Methods

    View/Open: Md. Ashad Alam (27.79Mb)
    Date: 2014-09
    Author: Alam, Md. Ashad
    URI: http://localhost:8080/xmlui/handle/123456789/762
    Collections: Ph.D. Thesis
    Abstract
    In kernel methods, choosing a suitable kernel is indispensable for favorable results. While cross-validation is a useful method for choosing the kernel and its parameters in supervised learning, such as with support vector machines, no well-founded methods have been established in general for unsupervised learning. We focus on kernel principal component analysis (kernel PCA) and kernel canonical correlation analysis (kernel CCA), the nonlinear extensions of principal component analysis (PCA) and canonical correlation analysis (CCA), respectively. Both methods have been used effectively for extracting nonlinear features and reducing dimensionality. As kernel methods, kernel PCA and kernel CCA also suffer from the problem of kernel choice. Although cross-validation is a popular method for choosing hyperparameters, it is not straightforwardly applicable to choosing the kernel and the number of components in kernel PCA and kernel CCA. It is therefore important to develop a well-founded method for choosing the hyperparameters of these unsupervised methods. In kernel PCA, cross-validation cannot be applied directly to hyperparameter choice because the norms induced by different kernels are not comparable. The first goal of the dissertation is to propose a method for choosing the hyperparameters of kernel PCA (the kernel and the number of components) based on cross-validation of the reconstruction errors of pre-images in the original space, where errors under different kernels are comparable. Experimental results on synthesized and real-world datasets demonstrate that the proposed method successfully selects an appropriate kernel and number of components in kernel PCA, in terms of both visualization and classification errors on the principal components. The results imply that the proposed method enables the automatic design of hyperparameters in kernel PCA. In recent years, the influence function of kernel PCA and a robust kernel PCA have been theoretically derived.
    One observation of that analysis is that kernel PCA with a bounded kernel, such as the Gaussian kernel, is robust in the sense that its influence function does not diverge, while for kernel PCA with an unbounded kernel, for example a polynomial kernel, the influence function goes to infinity. This can be understood from the boundedness of the data mapped into the feature space by a bounded kernel. While this result was shown for kernel PCA rather than kernel CCA, it is reasonable to expect that kernel CCA with a bounded kernel is also robust. This consideration motivates empirical studies on the robustness of kernel CCA. It is essential to know how kernel CCA is affected by outliers and to develop measures of its accuracy. We therefore study a number of conventional robust estimators and kernel CCA with different kernel functions but fixed kernel parameters. The second goal of the dissertation is to discuss five canonical correlation coefficients and investigate their robustness using the influence function, sensitivity curve, qualitative robustness index, and breakdown point on different types of simulated datasets. The final goal of the dissertation is to expose the limitations of cross-validation for kernel CCA and to propose a new regularization approach that overcomes them. As we demonstrate for Gaussian kernels, the cross-validation error for kernel CCA tends to decrease as the bandwidth parameter of the kernel decreases, which yields inappropriate features in which all the data are concentrated in a few points. This is caused by the ill-posedness of kernel CCA under cross-validation. To solve this problem, we propose constraints on the fourth-order moments of the canonical variables in addition to their variances.
    Experiments on synthesized and real-world datasets, including human action recognition for a robot, demonstrate that the proposed higher-order regularized kernel CCA can be applied effectively with cross-validation to find appropriate kernel and regularization parameters.
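
    The pre-image-based idea in the first goal above can be sketched in code. The following is a minimal illustration (not the thesis implementation), assuming scikit-learn: candidate kernels are compared by cross-validated reconstruction error of pre-images in the original input space, where errors under different kernels are directly comparable. The kernel grid, component count, and dataset are illustrative choices.

    ```python
    # Sketch: choose a kernel for kernel PCA by cross-validated pre-image
    # reconstruction error in the ORIGINAL space (errors are comparable there,
    # unlike feature-space norms under different kernels).
    # Illustrative only; not the thesis code.
    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.model_selection import KFold

    def cv_reconstruction_error(X, kernel, n_components, **kernel_params):
        """Mean squared pre-image reconstruction error over 5 CV folds."""
        errors = []
        for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                         random_state=0).split(X):
            kpca = KernelPCA(n_components=n_components, kernel=kernel,
                             fit_inverse_transform=True, **kernel_params)
            kpca.fit(X[train_idx])
            # Map held-out points to components, then back to input space.
            X_rec = kpca.inverse_transform(kpca.transform(X[test_idx]))
            errors.append(np.mean((X[test_idx] - X_rec) ** 2))
        return float(np.mean(errors))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))  # toy data; stands in for a real dataset
    candidates = [("rbf", {"gamma": 0.5}),
                  ("rbf", {"gamma": 2.0}),
                  ("poly", {"degree": 3})]
    scores = {f"{k}:{p}": cv_reconstruction_error(X, k, 3, **p)
              for k, p in candidates}
    best = min(scores, key=scores.get)  # kernel with lowest CV error
    ```

    The same loop can be run over the number of components as well, selecting the pair (kernel, components) with the smallest cross-validated error.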
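
    For context on the kernel CCA discussion above, a minimal sketch of standard regularized kernel CCA follows. This is the textbook formulation (largest singular value of the regularized, centered Gram-matrix product), not the thesis's higher-order regularized variant; the Gaussian bandwidth `gamma` and regularization `kappa` are illustrative values.

    ```python
    # Sketch: standard regularized kernel CCA with Gaussian kernels.
    # The leading canonical correlation is the top singular value of
    # (Kx + n*kappa*I)^{-1} Kx Ky (Ky + n*kappa*I)^{-1} on centered Grams.
    # Illustrative only; not the thesis's higher-order regularized method.
    import numpy as np

    def gram_rbf(X, gamma):
        """Gaussian (RBF) Gram matrix exp(-gamma * ||x_i - x_j||^2)."""
        sq = np.sum(X ** 2, axis=1)
        D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-gamma * D)

    def center(K):
        """Double-center a Gram matrix: H K H with H = I - 11'/n."""
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H

    def kernel_cca_rho(X, Y, gamma=0.5, kappa=0.1):
        """Leading canonical correlation of regularized kernel CCA."""
        n = X.shape[0]
        Kx = center(gram_rbf(X, gamma))
        Ky = center(gram_rbf(Y, gamma))
        Rx = Kx + n * kappa * np.eye(n)
        Ry = Ky + n * kappa * np.eye(n)
        # Rx^{-1} Kx and Ry^{-1} Ky are symmetric contractions, so rho <= 1.
        M = np.linalg.solve(Rx, Kx) @ np.linalg.solve(Ry, Ky)
        return float(np.linalg.svd(M, compute_uv=False)[0])

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(60, 1))                       # shared latent signal
    X = np.hstack([Z, rng.normal(size=(60, 1))])
    Y = np.hstack([np.sin(Z), rng.normal(size=(60, 1))])
    rho = kernel_cca_rho(X, Y)
    ```

    As the abstract notes, naively cross-validating `gamma` in this formulation drives the bandwidth toward degenerate values; the thesis's fourth-order moment constraints are what make the cross-validated choice well-posed.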

    Copyright © 2022 Central Library, HSTU
    Contact Us | Send Feedback
    Customized by: Interlink Technologies Ltd.