Adaptive Kernel Kalman Filter

Abstract—This paper presents a novel model-based Bayesian filter called the adaptive kernel Kalman filter (AKKF). The proposed filter approximates the arbitrary probability density functions (PDFs) of hidden states as empirical kernel mean embeddings (KMEs) in reproducing kernel Hilbert spaces (RKHSs). Specifically, particles are generated and updated in the data space to capture the properties of the dynamical system model, while the corresponding kernel weight vector and matrix associated with the particles' feature mappings are predicted and updated in the RKHS based on the kernel Kalman rule (KKR). We illustrate and confirm the advantages of our approach through simulations, offering detailed comparisons with the unscented Kalman filter (UKF), particle filter (PF) and Gaussian particle filter (GPF) algorithms.

The unscented Kalman filter (UKF), an alternative to the EKF, was proposed in [4] and uses a weighted set of deterministic particles (so-called sigma points) in the state space to approximate the state distribution. Compared with the EKF, the UKF can significantly improve the accuracy of the approximations, but divergence can still occur since both filters essentially approximate the state distributions as Gaussian. A more general solution to the non-linear Bayesian filtering problem can be found in the bootstrap particle filter (PF) proposed in [5], in which the hidden state distributions are represented by a weighted set of random particles. Resampling is a necessary step in the bootstrap PF which increases complexity and is hard to parallelize [6]. To avoid the need for resampling, specific implementations of the bootstrap PF have been proposed that further approximate the hidden state distribution at each time step with a Gaussian, such as the Gaussian particle filter (GPF) [6] and the Gauss-Hermite filter [7].
Different from the approaches above, a number of works have used the recently formulated kernel Bayes rule (KBR) to develop data-driven Bayesian filters based on kernel mean embeddings (KMEs) [2], [8]. Here, the unknown measurement model is inferred from prior training data. Owing to the properties of KMEs, these methods can effectively deal with problems that involve unknown models or strongly non-linear structures [9]. However, the feature space for the kernel embeddings remains restricted to the feature space defined by the training data set. Therefore, the performance of these filters relies heavily on there being sufficient similarity between the training data and the test data [10].
Inspired by the KBR [8] and the kernel Kalman rule (KKR) [11], we explore the potential of KMEs within fully model-based filters and introduce a new hybrid filter called the adaptive kernel Kalman filter (AKKF). The main contributions of this paper can be summarized as follows:
• We derive a new model-based Bayesian filter that is a hybrid of kernel-based methods and PFs, in which both the prediction and posterior distributions are embedded into a kernel feature space, but the known measurement and transition operators are used to calculate the update rules. This is in contrast to the PF, where the prediction and posterior distributions are calculated through empirical PDF estimates in the data space.
• The proposed filter avoids the problematic resampling step of most PFs. In passing, we also highlight a missing link between the UKF sigma-point method and the kernel conditional embedding operator.
The rest of the paper is set out as follows. The KME and KKR are reviewed in Section II. Section III presents the proposed AKKF. Simulation results for the bearing-only tracking (BOT) problem are presented in Section IV, and conclusions are drawn in Section V.

II. Preliminaries
In this section, we briefly review the frameworks of the KME and the data-driven KKR; see [8] and [11] for details.

A. Kernel Mean Embedding
A reproducing kernel Hilbert space (RKHS), denoted as $\mathcal{H}_x$, on the data space $\mathcal{X}$ with a kernel function $k_x(x, x')$ is defined as a Hilbert space of functions with the inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}_x}$ that has some additional properties [8]. The KME approach represents a conditional distribution $P(X|y)$ by an element in the RKHS as
$\mu_{X|y} = \mathbb{E}\left[\varphi_x(X) \mid y\right] = \int_{\mathcal{X}} \varphi_x(x)\,\mathrm{d}P(x|y),$ (1)
where $\varphi_x(x) \in \mathcal{H}_x$ represents the feature mapping of $x$ in the RKHS $\mathcal{H}_x$ for all $x \in \mathcal{X}$, and $\mu_{X|y}$ is a family of points, each indexed by fixing $Y$ to a particular value $y$. By defining the conditional operator $\mathcal{C}_{X|Y}$ as the linear operator which takes the feature mapping of a fixed value $y$ as the input and outputs the corresponding conditional KME [10], the KME of the conditional distribution defined in (1) is, under certain conditions, calculated as
$\mu_{X|y} = \mathcal{C}_{X|Y}\,\varphi_y(y), \qquad \mathcal{C}_{X|Y} = \mathcal{C}_{XY}\left(\mathcal{C}_{YY} + \lambda I\right)^{-1}.$ (2)
Here, $\mathcal{C}_{XY}$ and $\mathcal{C}_{YY}$ represent the covariance operators in the tensor-product feature spaces $\mathcal{H}_x \otimes \mathcal{H}_y$ and $\mathcal{H}_y \otimes \mathcal{H}_y$, respectively, and $\lambda$ is a regularization parameter that ensures the inverse is well defined. If, instead of access to the true underlying distribution required by (1), an empirical estimate of the PDF is available through a particle representation, the KMEs can be estimated directly from these particles. Hence, given a sample set $\{(x^{\{1\}}, y^{\{1\}}), \ldots, (x^{\{M\}}, y^{\{M\}})\}$ drawn from $P(X, Y)$, with feature mappings $\Phi := [\varphi_x(x^{\{1\}}), \ldots, \varphi_x(x^{\{M\}})]$ and $\Upsilon := [\varphi_y(y^{\{1\}}), \ldots, \varphi_y(y^{\{M\}})]$, the estimate of the conditional embedding operator $\hat{\mathcal{C}}_{X|Y}$ is obtained as a linear regression in the RKHS [12], as shown in the illustration in Fig. 1. The empirical KME of the conditional distribution is then calculated by a linear algebra operation as
$\hat{\mu}_{X|y} = \hat{\mathcal{C}}_{X|Y}\,\varphi_y(y) = \Phi\left(G_{YY} + \lambda M I\right)^{-1} \Upsilon^{\mathrm{T}} \varphi_y(y) = \Phi w.$ (3)
Here, $G_{YY} = \Upsilon^{\mathrm{T}} \Upsilon$ is the Gram matrix for the samples from the observation variable $Y$, and the input test variable is $y \in \mathcal{Y}$.
The kernel weight vector $w = [w^{\{1\}}, \ldots, w^{\{M\}}]^{\mathrm{T}}$ includes $M$ non-uniform weights and is calculated from the vector of kernel functions $G_{:,y} = [k_y(y^{\{1\}}, y), \ldots, k_y(y^{\{M\}}, y)]^{\mathrm{T}}$ as $w = (G_{YY} + \lambda M I)^{-1} G_{:,y}$. In summary, an empirical KME can represent a PDF over a basis in the RKHS with the corresponding weight vector, which has the advantages of low computational cost and low sample complexity.

Fig. 1: The KME of the conditional distribution $P(X|y)$ is embedded as a point in the kernel feature space as $\mu_{X|y} = \int_{\mathcal{X}} \varphi_x(x)\,\mathrm{d}P(x|y)$. Given training data sampled from $P(X, Y)$, the empirical KME of $P(X|y)$ is approximated as a linear operation in the RKHS, i.e., $\hat{\mu}_{X|y} = \hat{\mathcal{C}}_{X|Y}\varphi_y(y) = \Phi w$. Legend: $\cdot$: samples, $\times$: empirical KME, $*$: KME.
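To make the linear-algebra view of (3) concrete, the following is a minimal Python sketch of the empirical conditional KME. The Gaussian kernel choice and all function names are our own illustration, not taken from the paper.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    """Gaussian kernel Gram matrix between column-sample arrays A (d x M) and B (d x L)."""
    d2 = np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2.0 * A.T @ B
    return np.exp(-d2 / (2.0 * sigma**2))

def conditional_kme_weights(Y, y, lam=1e-3):
    """Kernel weight vector w = (G_YY + lam*M*I)^(-1) G_{:,y} for mu_{X|y} = Phi w, cf. (3)."""
    M = Y.shape[1]
    G_yy = rbf(Y, Y)                       # Gram matrix of the observation samples
    G_y = rbf(Y, y.reshape(-1, 1))         # kernel vector k_y(y^{i}, y)
    return np.linalg.solve(G_yy + lam * M * np.eye(M), G_y).ravel()

# Usage: estimate E[X | y] as a weighted sum of the state samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(1, 200))
Y = X + 0.1 * rng.normal(size=(1, 200))
w = conditional_kme_weights(Y, np.array([0.5]))
print("E[X|y=0.5] ~", (X @ w).item())
```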

B. Kernel Kalman Rule
The KKR was proposed in [11] as a recursive least-squares estimator for KMEs of posterior distributions. In the empirical KKR [11], the mean embedding and covariance operator are predicted and updated in a manner similar to a conventional KF, but relying on a training data set of state transition pairs and corresponding observations. The estimate of the preceding state is given by the KME $\hat{\mu}^{+}_{x_{n-1}}$ and the covariance operator $\hat{\mathcal{C}}^{+}_{x_{n-1}, x_{n-1}}$. Based on the derivations in [11], the kernel Kalman filter prediction and update steps consist of the following:
$\hat{\mu}^{-}_{x_n} = \hat{\mathcal{C}}_{X|X}\,\hat{\mu}^{+}_{x_{n-1}},$ (4)
$\hat{\mathcal{C}}^{-}_{x_n, x_n} = \hat{\mathcal{C}}_{X|X}\,\hat{\mathcal{C}}^{+}_{x_{n-1}, x_{n-1}}\,\hat{\mathcal{C}}_{X|X}^{\mathrm{T}} + \mathcal{V},$ (5)
$\hat{\mu}^{+}_{x_n} = \hat{\mu}^{-}_{x_n} + \mathcal{Q}_n\left(\varphi_y(y_n) - \hat{\mathcal{C}}_{Y|X}\,\hat{\mu}^{-}_{x_n}\right),$ (6)
$\hat{\mathcal{C}}^{+}_{x_n, x_n} = \hat{\mathcal{C}}^{-}_{x_n, x_n} - \mathcal{Q}_n\,\hat{\mathcal{C}}_{Y|X}\,\hat{\mathcal{C}}^{-}_{x_n, x_n}.$ (7)
Here, the conditional embedding operators for the distributions $P(X|X)$ and $P(Y|X)$, represented by $\hat{\mathcal{C}}_{X|X}$ and $\hat{\mathcal{C}}_{Y|X}$, are calculated from the training data as
$\hat{\mathcal{C}}_{X|X} = \Phi'\left(K_{xx} + \lambda_K I\right)^{-1} \Phi^{\mathrm{T}}, \qquad \hat{\mathcal{C}}_{Y|X} = \Upsilon\left(K_{xx} + \lambda_K I\right)^{-1} \Phi^{\mathrm{T}},$ (8)
where $\Phi'$ collects the feature mappings of the successor states. The covariance of the transition residual is represented as $\mathcal{V}$, and the kernel Kalman gain operator $\mathcal{Q}_n$ is given by [11]
$\mathcal{Q}_n = \hat{\mathcal{C}}^{-}_{x_n, x_n}\,\hat{\mathcal{C}}_{Y|X}^{\mathrm{T}}\left(\hat{\mathcal{C}}_{Y|X}\,\hat{\mathcal{C}}^{-}_{x_n, x_n}\,\hat{\mathcal{C}}_{Y|X}^{\mathrm{T}} + R\right)^{-1},$ (9)
where $R$ is the covariance of the residual of the observation operator.
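For intuition, the sketch below expresses one KKR prediction and update purely in weight space, i.e., over a fixed sample basis, using the finite-sample simplifications adopted later in this paper (a small regularizer and $R \approx \kappa I$). The function names and the simple residual model are our assumptions, not the notation of [11].

```python
import numpy as np

def predict_weights(m_post, S_post, K_xx, V, lam=1e-3):
    """Weight-space prediction, cf. (4)-(5): m^- = T m^+, S^- = T S^+ T^T + V,
    where T = (K_xx + lam*I)^(-1) K_xx expresses the embedding in the successor basis."""
    M = K_xx.shape[0]
    T = np.linalg.solve(K_xx + lam * np.eye(M), K_xx)
    return T @ m_post, T @ S_post @ T.T + V

def update_weights(m_pred, S_pred, G_yy, g_y, kappa=1e-3):
    """Weight-space update, cf. (6)-(7) with gain Q = S^- (G_yy S^- + kappa*I)^(-1)."""
    M = G_yy.shape[0]
    Q = S_pred @ np.linalg.inv(G_yy @ S_pred + kappa * np.eye(M))
    m_post = m_pred + Q @ (g_y - G_yy @ m_pred)   # innovation on the kernel vector
    S_post = S_pred - Q @ G_yy @ S_pred           # covariance weight shrinkage
    return m_post, S_post
```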
It should be noted that the existing filters based on the KKR are fully data driven and are therefore of use when the dynamical state-space model (DSM) is not available and the test data has high similarity to the training data. Data-driven KKR filters have been used for tracking problems that include the position estimate of a target which follows rotation in a circle or oscillatory rotation [11]. However, data-driven filters are only effective when the training data provides a good description of the current state, and will fail when the target moves out of the training space. To mitigate this shortcoming, we present a new type of kernel Kalman filter defined for model-based scenarios in Section III.

III. Adaptive Kernel Kalman Filter
Inspired by the data-driven KKR [11] and the PF, the proposed adaptive kernel Kalman filter aims to combine the benefits of both. The proposed AKKF is executed in both the data space and the kernel feature space. In the kernel feature space, the kernel weight vector and positive definite weight matrix are estimated using the KKR, which requires an embedding of the state update function to propagate the estimate from time n − 1 to time n; an embedding of the measurement function is then used to update the prior estimate at time n to the posterior estimate at time n. In the data space, the embeddings of the state process and measurement functions are obtained as follows: a proposal distribution is generated using information from the kernel space at time n − 1, which is then propagated through the non-linear DSM.
The following subsections derive the proposed AKKF; the implementation is summarized in Algorithm I.

A. Posterior Approximation and Proposal Particles at Time n − 1

Assume that at time n − 1 the particles and their feature mappings are given as $\{x_{n-1}^{\{i\}}\}_{i=1}^{M}$ and $\Phi_{n-1} = [\varphi_x(x_{n-1}^{\{1\}}), \ldots, \varphi_x(x_{n-1}^{\{M\}})]$, respectively. Given also the previous weight vector $w^{+}_{n-1}$ and positive definite weight matrix $S^{+}_{n-1}$, the empirical KME and covariance operator for the posterior $p(x_{n-1}|y_{1:n-1})$ are
$\hat{\mu}^{+}_{x_{n-1}} = \Phi_{n-1} w^{+}_{n-1},$ (10)
$\hat{\mathcal{C}}^{+}_{x_{n-1} x_{n-1}} = \Phi_{n-1} S^{+}_{n-1} \Phi_{n-1}^{\mathrm{T}},$ (11)
where the feature mapping is calculated as in [11]. Specifically, suppose $x_{n-1} = [x_{n-1,1}, \ldots, x_{n-1,d}]^{\mathrm{T}}$ is a $d$-dimensional vector and the quadratic kernel
$k_x(x, x') = \left(x^{\mathrm{T}} x' + 1\right)^2$ (12)
is utilized. Then, its feature mapping is [13]
$\varphi_x(x) = \left[x_1^2, \ldots, x_d^2, \sqrt{2}\,x_1 x_2, \ldots, \sqrt{2}\,x_{d-1} x_d, \sqrt{2}\,x_1, \ldots, \sqrt{2}\,x_d, 1\right]^{\mathrm{T}}.$ (13)
Therefore, from (10), the empirical KME $\hat{\mu}^{+}_{x_{n-1}}$ is represented in terms of the expectations of $X^2_{n-1}$ and $X_{n-1}$. These moments, $\mathbb{E}(X_{n-1})$ and $\mathbb{E}(X^2_{n-1})$, are then extracted from $\hat{\mu}^{+}_{x_{n-1}}$ and passed to the data space.
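As a concrete illustration of the explicit quadratic feature map in (13) and of the moment extraction just described, here is a minimal Python sketch; the function names are ours and the moments are read directly off the particle weights.

```python
import numpy as np

def quad_features(x):
    """Feature map of the quadratic kernel k(x, x') = (x^T x' + 1)^2 for a d-vector x:
    squares, scaled cross terms, scaled linear terms, and a constant, cf. (13)."""
    d = x.shape[0]
    cross = [np.sqrt(2.0) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate([x**2, np.array(cross), np.sqrt(2.0) * x, [1.0]])

def kme_moments(particles, w):
    """E(X) and E(X^2) implied by the embedding Phi w over the particle set (d x M):
    the linear and squared feature coordinates are weighted sums over the particles."""
    mean = particles @ w            # E(X)
    second = (particles**2) @ w     # E(X^2), used for the proposal covariance
    return mean, second
```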
As pointed out in [14], the approximation of a Gaussian distribution is easier to realize than the approximation of an arbitrary non-linear function. Hence, the proposed AKKF uses a new weighted sample representation, called proposal particles, to approximate the KME so that it can be exactly propagated through the non-linearity.

Algorithm I (recoverable steps of one recursion, for n = 1, ..., N): Update: first, in the data space, $y_n^{\{i\}} = h(x_n^{\{i\}}, v_n^{\{i\}})$; second, in the kernel feature space with basis $\Phi_n$, $w^{+}_n = w^{-}_n + Q_n(G_{:,y_n} - G_{yy} w^{-}_n)$, $S^{+}_n = S^{-}_n - Q_n G_{yy} S^{-}_n$, and $\hat{\mu}_{x_n} = \Phi_n w^{+}_n$. Proposal particle draw: first, in the data space, draw the proposal particles; second, in the kernel feature space with basis $\Psi_n$, $\tilde{w}^{+}_n = \Gamma_n w^{+}_n$ and $\tilde{S}^{+}_n = \Gamma_n S^{+}_n \Gamma_n^{\mathrm{T}}$.

The proposal particles are generated according to the importance distribution as
$\tilde{x}_{n-1}^{\{i\}} \sim \mathcal{N}\left(\mathbb{E}(X_{n-1}), \mathrm{Cov}(X_{n-1})\right), \quad i = 1, \ldots, M,$ (14)
where the mean and covariance are constructed from the moments extracted from $\hat{\mu}^{+}_{x_{n-1}}$. The feature mappings of the proposal particles are represented as $\Psi_{n-1} = [\varphi_x(\tilde{x}_{n-1}^{\{1\}}), \ldots, \varphi_x(\tilde{x}_{n-1}^{\{M\}})]$. Then, the posterior distribution $p(x_{n-1}|y_{1:n-1})$ can also be embedded using the new basis $\Psi_{n-1}$, and the weight vector and covariance operator are therefore transformed into $\Psi_{n-1}$ as
$\hat{\mu}^{+}_{x_{n-1}} = \Psi_{n-1} \tilde{w}^{+}_{n-1},$ (15)
$\hat{\mathcal{C}}^{+}_{x_{n-1} x_{n-1}} = \Psi_{n-1} \tilde{S}^{+}_{n-1} \Psi_{n-1}^{\mathrm{T}}.$ (16)
Substituting (15) into (10) and (16) into (11), respectively, the proposal kernel weight vector $\tilde{w}^{+}_{n-1}$ and matrix $\tilde{S}^{+}_{n-1}$ are calculated as
$\Gamma_{n-1} = \left(\tilde{K}_{\tilde{x}\tilde{x}} + \lambda_{\tilde{K}} I\right)^{-1} \tilde{K}_{\tilde{x}x},$ (17)
$\tilde{w}^{+}_{n-1} = \Gamma_{n-1} w^{+}_{n-1}, \qquad \tilde{S}^{+}_{n-1} = \Gamma_{n-1} S^{+}_{n-1} \Gamma_{n-1}^{\mathrm{T}},$ (18)
where $\Gamma_{n-1}$, defined in (17), represents the change of basis from $\Phi_{n-1}$ to $\Psi_{n-1}$, $\tilde{K}_{\tilde{x}\tilde{x}} = \Psi_{n-1}^{\mathrm{T}} \Psi_{n-1}$ represents the Gram matrix of the proposal particles at time n − 1, $\tilde{K}_{\tilde{x}x} = \Psi_{n-1}^{\mathrm{T}} \Phi_{n-1}$ is the kernel matrix between the particles and the proposal particles at time n − 1, and $\lambda_{\tilde{K}}$ is the regularization parameter used to modify $\tilde{K}_{\tilde{x}\tilde{x}}$.
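A small Python sketch of the change-of-basis computation in (17)-(18) follows; the helper name and the regularization default are illustrative assumptions.

```python
import numpy as np

def change_of_basis(K_pp, K_po, lam=1e-3):
    """Gamma = (K_pp + lam*I)^(-1) K_po, cf. (17), where K_pp = Psi^T Psi is the
    proposal-particle Gram matrix and K_po = Psi^T Phi couples the two particle sets."""
    M = K_pp.shape[0]
    return np.linalg.solve(K_pp + lam * np.eye(M), K_po)

# The posterior weights are then re-expressed in the proposal basis, cf. (18):
#   w_tilde = Gamma @ w_plus
#   S_tilde = Gamma @ S_plus @ Gamma.T
```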

B. Prediction from Time n − 1 to Time n
The proposal particles at time n − 1 are propagated through the process function to obtain the prediction particles, i.e.,
$x_n^{\{i\}} = f\left(\tilde{x}_{n-1}^{\{i\}}, u_n^{\{i\}}\right), \quad i = 1, \ldots, M,$ (19)
where $u_n^{\{i\}}$ represents a process noise sample drawn from the process noise distribution. Then, the transition probability $p(x_n|x_{n-1})$ is embedded using the new basis defined by the feature mappings of the prediction particles $\Phi_n = [\varphi_x(x_n^{\{1\}}), \ldots, \varphi_x(x_n^{\{M\}})]$, and is approximated as
$\hat{\mu}^{-}_{x_n} = \hat{\mathcal{C}}_{x_n|x_{n-1}}\,\hat{\mu}^{+}_{x_{n-1}} = \Phi_n w^{-}_n,$ (20)
where $w^{-}_n$ is the prior kernel weight vector and $\hat{\mathcal{C}}_{x_n|x_{n-1}}$ represents the empirical transition operator. Next, the empirical predictive covariance operator $\hat{\mathcal{C}}^{-}_{x_n x_n}$ with the corresponding prior kernel weight matrix $S^{-}_n$ is computed as
$\hat{\mathcal{C}}^{-}_{x_n x_n} = \hat{\mathcal{C}}_{x_n|x_{n-1}}\,\hat{\mathcal{C}}^{+}_{x_{n-1} x_{n-1}}\,\hat{\mathcal{C}}_{x_n|x_{n-1}}^{\mathrm{T}} + \mathcal{V}_n = \Phi_n S^{-}_n \Phi_n^{\mathrm{T}},$ (21)
$\mathcal{V}_n = \Phi_n V_n \Phi_n^{\mathrm{T}},$ (22)
where $\mathcal{V}_n$ represents the transition residual operator and $V_n$ is the finite matrix representation of $\mathcal{V}_n$. Based on (20)-(22), the prior $w^{-}_n$ and $S^{-}_n$ are calculated as
$w^{-}_n = \tilde{w}^{+}_{n-1}, \qquad S^{-}_n = \tilde{S}^{+}_{n-1} + V_n.$ (23)

C. Update at Time n

The observation particles are updated based on the observation model as
$y_n^{\{i\}} = h\left(x_n^{\{i\}}, v_n^{\{i\}}\right),$ (24)
where $v_n^{\{i\}}$ represents a measurement noise sample drawn from the measurement noise distribution. The kernel mappings of the observation particles in the kernel feature space are $\Upsilon_n = [\varphi_y(y_n^{\{1\}}), \ldots, \varphi_y(y_n^{\{M\}})]$. The posterior KME and the corresponding covariance operator are calculated as [11]
$\hat{\mu}^{+}_{x_n} = \Phi_n w^{+}_n,$ (25)
$\hat{\mathcal{C}}^{+}_{x_n x_n} = \Phi_n S^{+}_n \Phi_n^{\mathrm{T}},$ (26)
where $w^{+}_n$ and $S^{+}_n$ represent the posterior kernel weight vector and matrix, respectively. The kernel Kalman gain operator, denoted as $\mathcal{Q}_n$, is derived by minimizing the residual error of the posterior estimate, i.e.,
$\mathcal{Q}_n = \arg\min_{\mathcal{Q}} \mathrm{Tr}\,\hat{\mathcal{C}}^{+}_{x_n x_n}.$ (27)
According to the derivations in [11], $\mathcal{Q}_n$ is calculated as
$\mathcal{Q}_n = \hat{\mathcal{C}}^{-}_{x_n x_n}\,\hat{\mathcal{C}}_{y_n|x_n}^{\mathrm{T}}\left(\hat{\mathcal{C}}_{y_n|x_n}\,\hat{\mathcal{C}}^{-}_{x_n x_n}\,\hat{\mathcal{C}}_{y_n|x_n}^{\mathrm{T}} + R\right)^{-1},$ (28)
where $R$ is the covariance matrix of the observation operator residual and is approximated as $R = \kappa I$. The empirical likelihood operator is calculated as
$\hat{\mathcal{C}}_{y_n|x_n} = \hat{\mathcal{C}}_{y_n x_n}\,\hat{\mathcal{C}}_{x_n x_n}^{-1} = \Upsilon_n\left(K_{xx} + \lambda_K I\right)^{-1} \Phi_n^{\mathrm{T}}.$ (29)
Here, the Gram matrix of the prediction particles at time n is calculated as $K_{xx} = \Phi_n^{\mathrm{T}} \Phi_n$, and $\lambda_K$ is the regularization parameter used to modify $K_{xx}$. In this paper, $\lambda_K$ is set to 0. Substituting (21) and (29) into (28), $\mathcal{Q}_n$ can be calculated as
$\mathcal{Q}_n = \Phi_n S^{-}_n\left(G_{yy} S^{-}_n + \kappa I\right)^{-1} \Upsilon_n^{\mathrm{T}} = \Phi_n Q_n \Upsilon_n^{\mathrm{T}},$ (30)
where $Q_n$ is the finite matrix representation of $\mathcal{Q}_n$ in terms of the current basis $\Phi_n$, and the Gram matrix of the observation particles at time n is $G_{yy} = \Upsilon_n^{\mathrm{T}} \Upsilon_n$. Then, the updated KME and covariance operator are given by
$\hat{\mu}^{+}_{x_n} = \hat{\mu}^{-}_{x_n} + \mathcal{Q}_n\left(\varphi_y(y_n) - \hat{\mathcal{C}}_{y_n|x_n}\,\hat{\mu}^{-}_{x_n}\right),$ (31)
$\hat{\mathcal{C}}^{+}_{x_n x_n} = \hat{\mathcal{C}}^{-}_{x_n x_n} - \mathcal{Q}_n\,\hat{\mathcal{C}}_{y_n|x_n}\,\hat{\mathcal{C}}^{-}_{x_n x_n},$ (32)
where the kernel vector of the measurement at time n is $G_{:,y_n} = \Upsilon_n^{\mathrm{T}} \varphi_y(y_n)$. Based on the derivations above, the kernel weight vector and matrix are finally updated as
$w^{+}_n = w^{-}_n + Q_n\left(G_{:,y_n} - G_{yy} w^{-}_n\right), \qquad S^{+}_n = S^{-}_n - Q_n G_{yy} S^{-}_n.$ (33)
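Putting the prediction (19)-(23) and update (24)-(33) together, the following Python sketch performs one AKKF recursion in weight space. It is a minimal illustration under the simplifications stated above ($\lambda_K = 0$ replaced by direct use of the Gram matrices, $R = \kappa I$); f, h and kernel are user-supplied callables with noise sampling folded in, and all names are our own.

```python
import numpy as np

def akkf_step(xp, w_prop, S_prop, f, h, y_n, kernel, V, kappa=1e-3):
    """One AKKF recursion. xp: proposal particles (d x M);
    w_prop, S_prop: posterior weights expressed in the proposal basis."""
    M = xp.shape[1]
    # Prediction in the data space: propagate proposal particles, cf. (19).
    x_pred = f(xp)
    w_minus = w_prop                  # prior weight vector, cf. (23)
    S_minus = S_prop + V              # prior weight matrix, cf. (23)
    # Update: observation particles in the data space, cf. (24).
    y_pred = h(x_pred)
    G_yy = kernel(y_pred, y_pred)                       # Gram matrix of observation particles
    g_y = kernel(y_pred, y_n.reshape(-1, 1)).ravel()    # kernel vector G_{:,y_n}
    # Finite-dimensional gain, cf. (30).
    Q = S_minus @ np.linalg.inv(G_yy @ S_minus + kappa * np.eye(M))
    # Weight updates, cf. (33).
    w_plus = w_minus + Q @ (g_y - G_yy @ w_minus)
    S_plus = S_minus - Q @ G_yy @ S_minus
    return x_pred, w_plus, S_plus
```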

IV. Simulation Results
Bearing-only tracking (BOT) is one of the fundamental problems in target tracking systems. In this section, we report the tracking performance of different filters applied to BOT problems of a single target using a single sensor in a 2-D space. The corresponding dynamical state-space model (DSM) is described by the equations
$x_n = F x_{n-1} + \Gamma u_n, \qquad y_n = \arctan\left(\eta_n / \xi_n\right) + v_n,$
where $F$ and $\Gamma$ are the near-constant-velocity transition and noise-input matrices of [5], [6], and $v_n$ is the measurement noise. Here, n represents the time index with n = 1, \ldots, N. The hidden states are $x_n = [\xi_n, \dot{\xi}_n, \eta_n, \dot{\eta}_n]^{\mathrm{T}}$, where $(\xi_n, \eta_n)$ and $(\dot{\xi}_n, \dot{\eta}_n)$ represent the target position and the corresponding velocity along the X-axis and Y-axis, and $y_n$ is the corresponding bearing observation. The process noise $u_n$ follows the Gaussian distribution $u_n \sim \mathcal{N}(0, \sigma_u^2 I_2)$ with $\sigma_u = 0.001$. Following [6], the prior distribution for the initial state is specified as $x_0 \sim \mathcal{N}(\mu_0, P_0)$ with $\mu_0 = [-0.05, 0.001, 0.7, -0.05]^{\mathrm{T}}$ and $P_0$ as given in [6]. The particle numbers used in the PF, GPF and AKKF are 50, while the benchmark performance is given by a PF with 2000 particles. Fig. 4 and Fig. 5 show the average logarithmic mean square error (LMSE) obtained over 100 random realizations of trajectory-1 and trajectory-2, respectively, as a function of the particle number M. From the simulation results, we can draw the following conclusions. First, for BOT problems, the tracking performance of the PF, GPF and AKKF is clearly better than that of the UKF, which diverges for trajectory-2, as shown in Fig. 3. Second, the proposed AKKF shows a significant improvement over the PF and GPF for small particle numbers, as shown in Fig. 4 and Fig. 5.
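For concreteness, a sketch of the BOT state-space model follows, assuming the standard near-constant-velocity matrices used with this example in [5], [6]; the measurement-noise level sigma_v and the trajectory length N are illustrative assumptions, not values restated from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed near-constant-velocity matrices (unit sampling interval), per [5], [6].
F = np.array([[1., 1., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 0., 1.]])                 # state transition
G = np.array([[0.5, 0.], [1., 0.],
              [0., 0.5], [0., 1.]])              # process-noise input
sigma_u, sigma_v, N = 0.001, 0.005, 24           # sigma_v and N are assumed values
x = np.array([-0.05, 0.001, 0.7, -0.05])         # started at the prior mean mu_0
traj, bearings = [], []
for n in range(N):
    x = F @ x + G @ (sigma_u * rng.standard_normal(2))            # process step
    y = np.arctan2(x[2], x[0]) + sigma_v * rng.standard_normal()  # bearing measurement
    traj.append(x.copy())
    bearings.append(y)
```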

V. Conclusions
This paper has presented a novel model-based kernel Kalman filter. By embedding the probability distributions into kernel spaces, more feature information of the hidden states and observations can be captured and retained. As a result, the proposed AKKF outperforms existing algorithms when applied to a BOT problem.