Kernel Choice for Unsupervised Kernel Methods
Abstract
In kernel methods, choosing a suitable kernel is indispensable for favorable results.
While cross-validation is a useful method for choosing the kernel and its parameters in supervised learning such as support vector machines, no well-founded methods
have been established in general for unsupervised learning. We focus on kernel principal
component analysis (kernel PCA) and kernel canonical correlation analysis (kernel CCA),
the nonlinear extensions of principal component analysis (PCA) and canonical
correlation analysis (CCA), respectively. Both of these methods have been used effectively
for extracting nonlinear features and reducing dimensionality.
As kernel methods, kernel PCA and kernel CCA also suffer from the problem of kernel
choice. Although cross-validation is a popular method for choosing hyperparameters, it is
not straightforwardly applicable to choosing the kernel and the number of components in kernel
PCA and kernel CCA. It is important, therefore, to develop a well-founded method for choosing
the hyperparameters of these unsupervised methods.
In kernel PCA, cross-validation cannot be used directly for choosing hyperparameters
because the norms given by different kernels are not comparable. The first goal of the dissertation is to propose a method for choosing the hyperparameters of kernel PCA (the kernel and the
number of components) based on cross-validation of the reconstruction errors of pre-images
in the original space, which are comparable across kernels. Experimental results on synthesized and real-world
datasets demonstrate that the proposed method successfully selects an appropriate kernel
and number of components in kernel PCA in terms of visualization and of classification
errors on the principal components. The results imply that the proposed method enables
the automatic design of hyperparameters in kernel PCA.
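The idea can be sketched with scikit-learn, whose `KernelPCA` learns an approximate pre-image map when `fit_inverse_transform=True`. This is a minimal illustration of cross-validating input-space reconstruction errors, not the dissertation's implementation; the candidate grid and the synthetic data are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))  # placeholder data

def cv_preimage_error(X, kernel, n_components, gamma=None, n_splits=5):
    # Cross-validated mean squared reconstruction error of pre-images.
    # Because the error lives in the original input space, it is
    # comparable across different kernels, unlike feature-space norms.
    errs = []
    for train, test in KFold(n_splits=n_splits, shuffle=True,
                             random_state=0).split(X):
        kpca = KernelPCA(n_components=n_components, kernel=kernel,
                         gamma=gamma, fit_inverse_transform=True)
        kpca.fit(X[train])
        # Project held-out points onto the components, then map back
        # to approximate pre-images in the input space.
        X_rec = kpca.inverse_transform(kpca.transform(X[test]))
        errs.append(np.mean((X[test] - X_rec) ** 2))
    return float(np.mean(errs))

# Hypothetical candidate grid of (kernel, n_components, gamma);
# the candidate with the smallest held-out error is selected.
candidates = [("rbf", 2, 0.1), ("rbf", 3, 1.0), ("poly", 2, None)]
best = min(candidates, key=lambda c: cv_preimage_error(X, c[0], c[1], c[2]))
print(best)
```

In practice the grid would cover the kernel family, its parameters, and the number of components jointly, exactly the hyperparameters the dissertation targets.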
In recent years, the influence function of kernel PCA and a robust kernel PCA have
been derived theoretically. One observation from that analysis is that kernel PCA with a bounded
kernel, such as the Gaussian, is robust in the sense that the influence function does not diverge,
while for kernel PCA with an unbounded kernel, such as the polynomial, the influence function goes to infinity. This can be understood from the boundedness of the data mapped
into the feature space by a bounded kernel. Although this result concerns kernel PCA rather
than kernel CCA, it is reasonable to expect that kernel CCA with a bounded kernel is also
robust. This consideration motivates our empirical studies on the robustness of
kernel CCA. It is essential to know how kernel CCA is affected by outliers and to develop
measures of accuracy. We therefore study a number of conventional robust
estimators and kernel CCA with different kernel functions but fixed kernel parameters.
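The boundedness argument admits a one-line numerical illustration: the feature-space norm of a mapped point is the square root of k(x, x), which is constant for a Gaussian kernel but grows without bound for a polynomial kernel. The following is a generic sketch, not code from the dissertation.

```python
import numpy as np

def gauss_kxx(x, sigma=1.0):
    # For the Gaussian kernel, k(x, x) = exp(-||x - x||^2 / (2 sigma^2)) = 1
    # for every x, so the feature-space norm of any point is 1.
    return 1.0

def poly_kxx(x, degree=3, c=1.0):
    # For the polynomial kernel, k(x, x) = (<x, x> + c)^degree grows
    # without bound as ||x|| grows, so an outlier has unbounded leverage.
    return (float(np.dot(x, x)) + c) ** degree

for scale in [1.0, 10.0, 100.0]:
    x = scale * np.ones(2)
    print(scale, gauss_kxx(x), poly_kxx(x))
```

This is exactly why an outlier pushed far from the bulk of the data cannot distort Gaussian-kernel PCA arbitrarily, while it can dominate polynomial-kernel PCA.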
The second goal of the dissertation is to discuss five canonical correlation coefficients
and to investigate their robustness via the influence function, the sensitivity curve, the
qualitative robustness index, and the breakdown point on different types of simulated datasets.
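As an illustration of one of these diagnostics, an empirical sensitivity curve for the first (linear) canonical correlation can be computed by adding a single contaminating point and scaling the resulting change in the estimate by the sample size. The QR/SVD route below is a standard way to compute canonical correlations, used here only as a hedged sketch; it is not necessarily the estimator studied in the dissertation.

```python
import numpy as np

def first_cancorr(X, Y):
    # First canonical correlation: singular values of Qx^T Qy, where
    # Qx, Qy come from QR factorizations of the centered data blocks.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(s[0])

def sensitivity_curve(X, Y, zx, zy):
    # SC_n(z) = n * (T(sample with z added) - T(sample)) for statistic T.
    n = X.shape[0]
    base = first_cancorr(X, Y)
    cont = first_cancorr(np.vstack([X, zx]), np.vstack([Y, zy]))
    return n * (cont - base)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
Y = X @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((100, 2))
print(first_cancorr(X, Y))  # high, since Y depends strongly on X
print(sensitivity_curve(X, Y, 10 * np.ones((1, 3)), -10 * np.ones((1, 2))))
```

Plotting the sensitivity curve over a grid of contamination points z shows whether a single outlier can move the estimate arbitrarily, which is the empirical counterpart of an unbounded influence function.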
The final goal of the dissertation is to examine the limitations of cross-validation for
kernel CCA and to propose a new regularization approach that overcomes them. As we
demonstrate for Gaussian kernels, the cross-validation errors for kernel CCA tend to
decrease as the bandwidth parameter of the kernel decreases, which yields inappropriate
features in which all the data are concentrated at a few points. This is caused by
the ill-posedness of kernel CCA under cross-validation. To solve this problem, we
propose constraints on the fourth-order moments of the canonical variables in addition
to their variances. Experiments on synthesized and real-world datasets, including human
action recognition for a robot, demonstrate that the proposed higher-order regularized kernel
CCA can be applied effectively with cross-validation to find appropriate kernel and
regularization parameters.
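The bandwidth degeneracy has a simple Gram-matrix picture: as the Gaussian bandwidth shrinks, the Gram matrix approaches the identity, so kernel CCA can align any two samples nearly perfectly regardless of any true dependence, and cross-validation rewards this. A minimal numerical sketch of that limit, on synthetic data not taken from the dissertation:

```python
import numpy as np

def gaussian_gram(X, sigma):
    # Pairwise squared distances, then the Gaussian (RBF) Gram matrix.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
offs = []
for sigma in [1.0, 0.1, 0.01]:
    K = gaussian_gram(X, sigma)
    # Largest deviation from the identity; the diagonal is always 1,
    # and every off-diagonal entry shrinks as sigma decreases.
    offs.append(np.abs(K - np.eye(len(X))).max())
    print(sigma, offs[-1])
```

With K close to the identity, the canonical variables can match any target pattern, which is exactly the degenerate, concentrated feature behavior the fourth-order moment constraints are designed to penalize.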