dc.description | Methods using a positive definite kernel (PDK), known as kernel methods, play an
increasingly prominent role in solving various problems in statistical machine learning, such as
web design, pattern recognition, human action recognition for a robot, computational protein
function prediction, remote sensing data analysis, and many other research fields. Due to the
kernel trick and the reproducing property, we can use linear techniques in feature spaces without
knowing the explicit form of either the feature map or the feature space. Kernel methods offer
versatile tools to process, analyze, and compare many types of data, and achieve state-of-the-art
performance. Nowadays, PDKs have become a popular tool in most branches of statistical machine
learning, e.g., supervised learning, unsupervised learning, reinforcement learning,
non-parametric inference, and so on. Many kernel methods have been proposed, including the
support vector machine (SVM, Boser et al., 1992), kernel ridge regression (KRR, Saunders et al.,
1998), kernel principal component analysis (kernel PCA, Schölkopf et al., 1998), kernel canonical
correlation analysis (kernel CCA, Akaho, 2001; Bach and Jordan, 2002), Bayesian inference with
positive definite kernels (kernel Bayes' rule, Fukumizu et al., 2013), gradient-based kernel
dimension reduction for regression (gKDR, Fukumizu and Leng, 2014), the kernel two-sample test
(Gretton et al., 2012), and so on. | en_US |
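As a minimal illustration of the kernel trick mentioned above (not taken from the dissertation;
data and the degree-2 polynomial kernel are chosen only for concreteness), the kernel value can be
computed directly from inner products in the input space, matching the inner product under an
explicit feature map that a linear algorithm would otherwise need:

    import numpy as np

    # Explicit feature map for the homogeneous polynomial kernel of degree 2 on R^2:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so that <phi(x), phi(y)> = (x . y)^2.
    def phi(x):
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    x = np.array([1.0, 2.0])
    y = np.array([3.0, 0.5])

    kernel_value = (x @ y) ** 2          # kernel trick: inner product in input space, squared
    explicit_value = phi(x) @ phi(y)     # inner product computed in the feature space

    print(np.isclose(kernel_value, explicit_value))  # True

Linear techniques then operate on the Gram matrix of such kernel values, never forming phi
explicitly.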
dc.description.abstract | In kernel methods, choosing a suitable kernel is indispensable for
favorable results. While cross-validation is a useful method for choosing the kernel and its
parameters in supervised learning such as support vector machines, no well-founded methods have
been established in general for unsupervised learning. We focus on kernel principal component
analysis (kernel PCA) and kernel canonical correlation analysis (kernel CCA), which are nonlinear
extensions of principal component analysis (PCA) and canonical correlation analysis (CCA),
respectively. Both methods have been used effectively for extracting nonlinear features and
reducing dimensionality.
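For reference, kernel CCA is commonly formulated as in Bach and Jordan (2002); with RKHSs
$\mathcal{H}_X$, $\mathcal{H}_Y$ and a regularization coefficient $\varepsilon_n > 0$ (the
notation here is generic, not necessarily the dissertation's):

\[
\rho \;=\; \max_{f \in \mathcal{H}_X,\; g \in \mathcal{H}_Y}
\frac{\widehat{\operatorname{Cov}}[f(X),\, g(Y)]}
{\bigl(\widehat{\operatorname{Var}}[f(X)] + \varepsilon_n \lVert f \rVert_{\mathcal{H}_X}^2\bigr)^{1/2}
 \bigl(\widehat{\operatorname{Var}}[g(Y)] + \varepsilon_n \lVert g \rVert_{\mathcal{H}_Y}^2\bigr)^{1/2}}.
\]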
As kernel methods, kernel PCA and kernel CCA also suffer from the problem of kernel choice.
Although cross-validation is a popular method for choosing hyperparameters, it is not
straightforwardly applicable to choosing the kernel and the number of components in kernel PCA
and kernel CCA. It is thus important to develop a well-founded method for choosing the
hyperparameters of these unsupervised methods.
In kernel PCA, cross-validation cannot be used directly for choosing hyperparameters, because the
norms given by different kernels are not comparable. The first goal of the dissertation is to
propose a method for choosing the hyperparameters of kernel PCA (the kernel and the number of
components) based on cross-validation of the reconstruction errors of pre-images in the original
space, which are comparable across kernels. Experimental results on synthesized and real-world
datasets demonstrate that the proposed method successfully selects an appropriate kernel and
number of components in kernel PCA in terms of visualization and classification errors on the
principal components. The results imply that the proposed method enables the automatic design of
hyperparameters in kernel PCA.
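A minimal sketch of this idea (not the dissertation's exact procedure) can be written with
scikit-learn's KernelPCA, whose fit_inverse_transform option learns an approximate pre-image map;
the toy data and the grid of candidate values below are purely illustrative:

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.model_selection import KFold

    def cv_preimage_error(X, gamma, n_components, n_splits=5):
        """Mean reconstruction error of pre-images, measured in the original space."""
        errors = []
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        for train_idx, val_idx in kf.split(X):
            kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma,
                             fit_inverse_transform=True)  # learns a pre-image map
            kpca.fit(X[train_idx])
            X_val = X[val_idx]
            X_rec = kpca.inverse_transform(kpca.transform(X_val))  # pre-images
            errors.append(np.mean(np.sum((X_val - X_rec) ** 2, axis=1)))
        return np.mean(errors)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))  # toy data; replace with a real dataset
    grid = [(g, k) for g in (0.01, 0.1, 1.0) for k in (1, 2, 3)]
    best = min(grid, key=lambda p: cv_preimage_error(X, *p))
    print("selected (gamma, n_components):", best)

Because every candidate is scored by a squared error in the same input space, the scores are
comparable across kernels and across numbers of components.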
In recent years, the influence function of kernel PCA and a robust kernel PCA have been derived
theoretically. One observation from that analysis is that kernel PCA with a bounded kernel, such
as the Gaussian kernel, is robust in the sense that the influence function does not diverge,
while for kernel PCA with an unbounded kernel, for example a polynomial kernel, the influence
function goes to infinity. This can be understood from the boundedness of the data transformed
into the feature space by a bounded kernel. While this result is for kernel PCA rather than
kernel CCA, it is reasonable to expect that kernel CCA with a bounded kernel is also robust. This
consideration motivates us to conduct empirical studies on the robustness of kernel CCA. It is
essential to know how kernel CCA is affected by outliers and to develop measures of accuracy. We
therefore study a number of conventional robust estimators and kernel CCA with different kernel
functions but fixed kernel parameters.
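A rough empirical probe of the bounded-versus-unbounded contrast (again illustrative, with
arbitrary settings, and for kernel PCA rather than kernel CCA) tracks how much the embedding of
the clean points moves when a single outlier of growing magnitude is added:

    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 2))  # clean toy sample

    def first_component_shift(kernel_kwargs, magnitude):
        """Shift of the clean points' first kernel principal component
        caused by one outlier placed at (magnitude, magnitude)."""
        base = KernelPCA(n_components=1, **kernel_kwargs).fit_transform(X)[:, 0]
        X_out = np.vstack([X, [magnitude, magnitude]])
        pert = KernelPCA(n_components=1, **kernel_kwargs).fit_transform(X_out)[:100, 0]
        pert = pert * np.sign(pert @ base)  # eigenvectors are sign-indeterminate
        return np.linalg.norm(pert - base)

    for m in (5.0, 50.0, 500.0):
        gauss = first_component_shift(dict(kernel="rbf", gamma=0.5), m)
        poly = first_component_shift(dict(kernel="poly", degree=3), m)
        print(f"outlier {m:6.0f}:  Gaussian {gauss:.3f}   polynomial {poly:.3f}")

With the Gaussian kernel the distant outlier has nearly zero kernel value against the clean
points, so the shift stays bounded, whereas the polynomial embedding is increasingly dominated by
the outlier.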
The second goal of the dissertation is to discuss five canonical correlation coefficients and to
investigate their robustness by means of the influence function, sensitivity curve, qualitative
robustness index, and breakdown point, using different types of simulated datasets.
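One of these diagnostics, the sensitivity curve (up to the usual (n+1)-versus-n normalization),
can be sketched for the classical first canonical correlation coefficient as follows; the data
and contamination points are hypothetical:

    import numpy as np

    def first_canonical_corr(X, Y):
        """Largest canonical correlation via the standard SVD formulation."""
        Xc, Yc = X - X.mean(0), Y - Y.mean(0)
        Sxx = Xc.T @ Xc / len(X)
        Syy = Yc.T @ Yc / len(Y)
        Sxy = Xc.T @ Yc / len(X)
        def inv_sqrt(S):  # S^{-1/2} for a symmetric positive definite matrix
            w, V = np.linalg.eigh(S)
            return V @ np.diag(w ** -0.5) @ V.T
        # rho = largest singular value of Sxx^{-1/2} Sxy Syy^{-1/2}
        return np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)[0]

    rng = np.random.default_rng(2)
    n = 200
    X = rng.standard_normal((n, 2))
    Y = X @ rng.standard_normal((2, 2)) + 0.5 * rng.standard_normal((n, 2))
    rho = first_canonical_corr(X, Y)

    # Sensitivity curve: n * (estimate with one added point z - estimate without).
    for scale in (1.0, 10.0, 100.0):
        zx, zy = scale * np.ones((1, 2)), -scale * np.ones((1, 2))
        rho_z = first_canonical_corr(np.vstack([X, zx]), np.vstack([Y, zy]))
        print(f"z-scale {scale:6.1f}:  SC = {n * (rho_z - rho):+.3f}")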
The final goal of the dissertation is to expose the limitations of cross-validation for kernel
CCA, and to propose a new regularization approach that overcomes them. As we demonstrate for
Gaussian kernels, the cross-validation errors for kernel CCA tend to decrease as the bandwidth
parameter of the kernel decreases, which yields inappropriate features with all the data
concentrated at a few points. This is caused by the ill-posedness of kernel CCA under
cross-validation. To solve this problem, we propose to impose constraints on the fourth-order
moments of the canonical variables in addition to the variances, as sketched below. Experiments
on synthesized and real-world datasets, including human action recognition for a robot,
demonstrate that the proposed higher-order regularized kernel CCA can be applied effectively with
cross-validation to find appropriate kernel and regularization parameters. | en_US |
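A schematic version of the higher-order constraint described above might read as follows; the
precise formulation (including whether the fourth-moment condition enters as a hard constraint
with a bound $c > 0$, as written here, or as a penalty) is given in the dissertation body, so
this is only an illustrative sketch:

\[
\max_{f,\, g}\ \widehat{\operatorname{Cov}}[f(X),\, g(Y)]
\quad \text{s.t.} \quad
\widehat{\operatorname{Var}}[f(X)] = \widehat{\operatorname{Var}}[g(Y)] = 1,
\qquad
\widehat{\mathbb{E}}\bigl[f(X)^4\bigr] \le c,\quad
\widehat{\mathbb{E}}\bigl[g(Y)^4\bigr] \le c.
\]

Bounding the fourth moments rules out the degenerate solutions in which the canonical variables
concentrate almost all of their mass at a few points while still satisfying the unit-variance
constraint.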