Adaptive algorithms for sparse nonlinear channel estimation

In this paper, we consider the estimation of sparse nonlinear communication channels. Transmission over the channels is represented by sparse Volterra models that incorporate the effect of Power Amplifiers. Channel estimation is performed by compressive sensing methods. Efficient algorithms are proposed based on Kalman filtering and Expectation Maximization. Simulation studies confirm that the proposed algorithms achieve significant performance gains in comparison to the conventional non-sparse methods.


I. INTRODUCTION
Channel nonlinearities are mainly due to Power Amplifiers (PAs). PAs located at an access point of a downlink channel (base stations in cellular systems and repeaters for satellite links) often operate close to saturation in order to achieve power efficiency. The models employed in the description of PAs are either static (memoryless) or dynamic (models with memory).
Wireless communication channels are characterized by time-varying multipath propagation effects. Quite often in practice, several reflections reach the receiver at different time instants. These reflections arrive at the receiver with longer delay than the first group. Hence, the wireless channel is modeled by sparse fading rays separated by long runs of zero samples and thus admits a sparse representation [1]. The sparseness characteristic is preserved when the PA representation is also described by a sparse model [2]. Recent experimental results reported in [2] indicate better performance if sparse nonlinear models are employed for the representation of the PA. Moreover, the time-varying nature of wireless channels suggests the use of adaptive algorithms that minimize transmission delays and take advantage of parameter sparsity. Thus, compressive sensing provides a promising framework for such developments. Adaptive algorithms for sparse channel estimation are developed in [3,4]. In [3] two different sparsity constraints are incorporated into the quadratic cost function of the LMS algorithm, to take into account the sparse channel coefficient vector. An $\ell_1$-regularized RLS-type algorithm, based on a low-complexity Expectation-Maximization scheme, is derived in [4].
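In this paper we focus on adaptive estimation of sparse nonlinear communication channels. Adaptation is carried out by recursive algorithms that combine Expectation-Maximization and Kalman filtering. The expectation step is carried out by Kalman filtering, while the maximization step corresponds to a soft-thresholding function due to the $\ell_1$ regularization. The rest of the paper is organized as follows. Nonlinear channels and sparse channel estimation are discussed in Section II. The proposed algorithms for adaptive tracking of sparse nonlinear channels are given in Section III. Simulation results are presented in Section IV. Conclusions are discussed in Section V.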

II. SPARSE NONLINEAR CHANNEL ESTIMATION
In what follows, power amplifier nonlinear models are incorporated into the study of two important channels: 1) the satellite link, and 2) the multipath wireless channel. In both cases, the overall communication channel is represented by a baseband Volterra series.

A. Nonlinear channel models
In satellite digital transmission, both the earth station and the satellite repeater employ power amplifiers. The satellite amplifier operates near saturation due to limited power resources and hence behaves in a nonlinear fashion. The satellite link is represented by the block diagram of Fig. 1. The LTI filter with impulse response $g_1$ describes the cascade of all linear operations preceding the power amplifier. Likewise, the LTI filter $g_2$ represents the cascade of all linear devices following the nonlinearity.
An analysis of the above system for static power amplifiers is provided by Benedetto and Biglieri [5]. Let us next consider power amplifiers with memory described by Volterra models. To reduce the computational complexity we shall follow standard practice [2,6] and confine our study to diagonal Volterra models. Straightforward calculations lead to the baseband Volterra model (1), whose term of order $2p+1$ is an integral over the delays $\tau_{1:2p+1}$ of products of delayed copies of the input, $p$ of which appear conjugated as $x^{*}(t-\tau_j)$.
In most cases the filter $g_1$ performs a specific function (for instance pulse shaping) and hence is known. Since in this paper we deal with channel estimation using known inputs, we may with no loss of generality assume that the input signal is the output of $g_1$. In this case the Volterra representation from signal $x(t)$ to signal $r(t)$ becomes simpler and is given by Eq. (2). The same expression also represents the multipath channel. In this case the modulated signal is amplified by a power amplifier and then transmitted through the wireless medium. The received waveform is the superposition of weighted and delayed versions of the signal resulting from the various multipaths, plus additive white Gaussian noise. We shall assume that the different nonzero fading rays arrive at the receiver at different time instants and that they vary slowly with time and frequency. Hence the wireless channel becomes a frequency-selective channel [1] and is described by an impulse response of the form $\sum_{i=1}^{N} a_i\,\delta(t-\tau_i)$ (3), where $N$ is the number of paths, $a_i$ is the attenuation along path $i$ and $\tau_i$ is the clustered delay.

B. Sparse channel estimation
The transmission systems described in the previous section operate in continuous time. Discrete Volterra forms result when the modulation at the transmitter and the sampling device at the receiver are taken into account. We shall consider memoryless modulation schemes of the form (4), in which the sequence $s_i$ consists of i.i.d. (discrete) complex-valued random variables and $T_s$ denotes the symbol period. Substituting $x(t)$ from Eq. (4) into (1) yields the discrete baseband Volterra model [5], which can be expressed as a linear regression. Let us define the vector of shifted input symbols and its $i$-fold Kronecker product. The Kronecker product contains all products of order $2p+1$ of the input with $p$ conjugate copies. The output can be written in the linear regression form
$$ y_i = x_i^T h + v_i, \qquad 1 \le i \le n. \qquad (5) $$
If we stack $n$ successive samples in column format we obtain
$$ \mathbf{y}_n = X_n h + \mathbf{v}_n. \qquad (6) $$
Eq. (6) provides a noisy representation of a block of successive received samples in terms of the columns of $X_n$ (also referred to as the dictionary), which are formed by products of shifted symbol sequences. The above representation is sparse and hence recovery of the vector $h$ can be accomplished by compressed sensing methods.
Next we consider sparsity. It is well documented in the literature that parsimonious models are highly desirable in the representation of PAs with memory. In fact it has been experimentally observed [2] that sparse diagonal Volterra models provide enhanced performance in comparison to the full model. Furthermore, a physical justification of sparsity for the multipath channel is given in [1]. The sparsity of the kernel of order $2p+1$ is at most $s_k \times s_m$, where $s_k$ is the sparsity of the PA and $s_m$ is the sparsity of the multipath coefficients. Similar observations hold for the satellite channel. It thus follows that the vector $h$ in Eq. (6) is sparse.
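The linear regression structure can be illustrated with a minimal Python sketch. It assumes a third-order diagonal model in which only the linear terms and the diagonal cubic terms $s\,|s|^2$ are retained, both with the same memory; the function names `volterra_regressor` and `channel_output` are hypothetical and are not taken from the paper.

```python
from typing import List


def volterra_regressor(s: List[complex], n: int, memory: int) -> List[complex]:
    """Regressor x_n for an assumed third-order diagonal Volterra model.

    Assumption: only the linear terms s[n-k] and the diagonal cubic terms
    s[n-k] * |s[n-k]|**2 are kept, each with `memory` taps.
    Symbols before time 0 are taken to be zero.
    """
    taps = [s[n - k] if n - k >= 0 else 0j for k in range(memory)]
    # Diagonal cubic term: two plain copies and one conjugate copy of the symbol.
    cubic = [t * abs(t) ** 2 for t in taps]
    return taps + cubic


def channel_output(s: List[complex], h: List[complex], memory: int) -> List[complex]:
    """Noise-free outputs y_n = x_n^T h for every time index n."""
    out = []
    for n in range(len(s)):
        x = volterra_regressor(s, n, memory)
        out.append(sum(xi * hi for xi, hi in zip(x, h)))
    return out
```

Stacking the regressors row by row gives a dictionary of the kind denoted $X_n$ in the text; a sparse $h$ then selects only a few of its columns.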
Recovery of the locations, the magnitudes and the nonlinear coefficients of $h$ can be accomplished by the convex program (7), which minimizes the total squared error of Eq. (6) plus $\gamma$ times the $\ell_1$-norm of $h$. The $\ell_1$-norm provides a convex relaxation of the $\ell_0$ quasi-norm. The scalar parameter $\gamma$ provides a trade-off between sparsity and total squared error. The optimization problem (7) has been widely studied from the perspective of compressive sensing (see, for instance, [7]).

III. ADAPTIVE ALGORITHMS
Since the parameter vector $h$ changes with time we need a model that captures the corresponding dynamics. A popular technique in the adaptive filtering literature is to describe parameter variation by the first-order model [8]
$$ h_n = h_{n-1} + q_n. \qquad (8) $$
$\Lambda_0$ denotes the support set of $h_0$, i.e. the set of indices of its nonzero coefficients. The noise term $q_n$ is zero outside $\Lambda_0$ and zero-mean Gaussian inside $\Lambda_0$ with diagonal covariance matrix
$$ R_{n,\Lambda_0} = \mathrm{diag}\big(\sigma_{q_1}^2(n), \ldots, \sigma_{q_d}^2(n)\big), $$
where $d$ is the $\ell_0$ norm of $h_0$. The variances $\{\sigma_{q_i}^2(n)\}_{i=1}^{d}$ are in general allowed to vary with time. The stochastic processes $v$, $q$ and the random variable $h_0$ are mutually independent.

Table 1. Algorithms for sparse nonlinear channel identification: (a) EM-Kalman, (b) EM-RLS, (c) sparse LMS. Initialization: $P_0 = \delta^{-1} I$ with $\delta$ constant; the recursions run for $n = 1, 2, \ldots$
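The random-walk model with noise confined to the support can be simulated with a short Python sketch. It assumes a single standard deviation `sigma_q` shared by all active coefficients and circular complex Gaussian increments; the function name `sparse_random_walk` is hypothetical.

```python
import random
from typing import List


def sparse_random_walk(h0: List[complex], sigma_q: float, steps: int,
                       seed: int = 0) -> List[List[complex]]:
    """Simulate h_n = h_{n-1} + q_n with q_n nonzero only on the support of h0.

    Assumption: every active coefficient shares the same standard deviation
    sigma_q, and q_n is circular complex Gaussian.
    """
    rng = random.Random(seed)
    support = [i for i, v in enumerate(h0) if v != 0]
    h = list(h0)
    path = [list(h)]
    for _ in range(steps):
        for i in support:
            # Real and imaginary parts each carry half of the variance.
            q = complex(rng.gauss(0.0, sigma_q), rng.gauss(0.0, sigma_q)) / 2 ** 0.5
            h[i] += q
        path.append(list(h))
    return path
```

Because the increments vanish outside the support, every vector along the path has the same zero pattern as $h_0$, which is the property the adaptive algorithms exploit.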
We next incorporate Eq. (8) and the convex program (7) into the Expectation-Maximization (EM) framework. The resulting adaptive algorithms employ only one iteration per time update to limit the computational cost. Let $\theta = h_0$ be the vector of unknown parameters. Note that under the Gaussian assumption postulated above, minimization of (7) is equivalent to the maximization of the log-likelihood $\log p(\mathbf{y}_n \mid \theta)$ penalized by an $\ell_1$ term.
To apply the Expectation-Maximization method we have to specify the complete and incomplete data. The vector $h_n$ at time $n$ is taken to represent the complete data vector, whereas $\mathbf{y}_{n-1}$ accounts for the incomplete data [9]. In this context the conditional density $p(h_n \mid \mathbf{y}_{n-1})$ plays a major role. This density is Gaussian and is therefore fully characterized by its mean and covariance. Under broad conditions the maximizer of the incomplete likelihood is obtained by maximizing the complete likelihood function through successive application of the following two steps:
E-step: computes the conditional expectation of the complete log-likelihood (the Q-function).
M-step: maximizes the Q-function minus the $\ell_1$-penalty with respect to $\theta$.
The Q-function then takes the form of a $\theta$-dependent term, which involves the parameter $\psi_n$, plus a constant that incorporates all terms that do not involve $\theta$ and hence do not affect the maximization. The parameter $\psi_n$ is recursively computed by the Kalman filter [8], see Table 1(a) steps 1-3, which in the special case of the time-varying random-walk model of Eq. (8) takes an RLS-type appearance. Note that $\varepsilon_n$, in Table 1, denotes the prediction error given by $\varepsilon_n = y_n - x_n^T h_{n-1}$. Maximization of the Q-function leads to the soft-thresholding function. This operation shrinks the coefficients whose magnitude exceeds the threshold and sets the remaining ones to zero.
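The soft-thresholding operation can be written in a few lines of Python. This is a generic sketch for complex coefficients; the threshold value `thresh` is a free parameter here, since its exact expression in terms of $\gamma$ and the noise variance is not reproduced in the text above.

```python
from typing import List


def soft_threshold(psi: List[complex], thresh: float) -> List[complex]:
    """Complex soft thresholding: shrink magnitudes by `thresh`, keep the phase.

    Entries whose magnitude does not exceed `thresh` are set to zero.
    """
    if thresh < 0:
        raise ValueError("thresh must be non-negative")
    out = []
    for z in psi:
        mag = abs(z)
        # z * (1 - thresh/mag) equals (z/|z|) * (|z| - thresh).
        out.append(z * (1 - thresh / mag) if mag > thresh else 0j)
    return out
```

For instance, with a threshold of 1 the entry $3+4j$ (magnitude 5) is scaled to magnitude 4, while an entry of magnitude 0.5 is set to zero.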
EM-Kalman filter. The Kalman filter computes $h_n$ under the assumption that the variances $\sigma_v^2$ and $\{\sigma_{q_i}^2(n)\}_{i=1}^{d}$ are known. The noise variances can be estimated in various ways. One method is to use the Maximum Likelihood estimates. These estimates can be obtained by maximizing the Q-function.
Alternatively, under the assumption that the state noise covariance is $R_{n,\Lambda_0} = r_n I$, both noise disturbances can be estimated adaptively. Smoothed estimates of the state and observation noise can be obtained according to steps 5 and 6 of Table 1(a), respectively, where $\alpha$ is a smoothing parameter and $R(x)$ is the ramp function ($R(x) = x$ if $x \ge 0$ and 0 otherwise). These two methods for online estimation of the noise disturbances are due to Jazwinski [10].
EM-RLS filter. The recursive procedure for the determination of the Kalman filter in the case of the random-walk model, Eq. (8), resembles the RLS algorithm. In fact, the RLS can be viewed as a special form of Table 1(a) which provides an alternative for the estimation of the noise variances.
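An RLS-type recursion followed by thresholding can be sketched in Python as below. This is not a transcription of Table 1(b): it assumes the standard exponentially weighted RLS update with forgetting factor `lam`, applies a soft threshold with a hypothetical fixed value `thresh` after each update, and uses the prediction error $\varepsilon_n = y_n - x_n^T h_{n-1}$ defined in the text. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[complex]]


def shrink(z: complex, thresh: float) -> complex:
    """Soft threshold of one complex coefficient."""
    mag = abs(z)
    return z * (1 - thresh / mag) if mag > thresh else 0j


def em_rls_step(h: List[complex], P: Matrix, x: List[complex], y: complex,
                lam: float = 0.99, thresh: float = 0.0
                ) -> Tuple[List[complex], Matrix]:
    """One time update: RLS correction, then soft thresholding (assumed form)."""
    m = len(h)
    eps = y - sum(xi * hi for xi, hi in zip(x, h))            # prediction error
    # P x^* (the conjugate follows from the model y = x^T h).
    Pxc = [sum(P[i][j] * x[j].conjugate() for j in range(m)) for i in range(m)]
    denom = lam + sum(x[i] * Pxc[i] for i in range(m))        # lam + x^T P x^*
    k = [v / denom for v in Pxc]                              # gain vector
    xTP = [sum(x[i] * P[i][j] for i in range(m)) for j in range(m)]  # x^T P
    P_new = [[(P[i][j] - k[i] * xTP[j]) / lam for j in range(m)]
             for i in range(m)]
    psi = [hi + ki * eps for hi, ki in zip(h, k)]             # unthresholded update
    return [shrink(z, thresh) for z in psi], P_new
```

With `thresh=0` the step reduces to conventional RLS, which makes it easy to compare the sparse and non-sparse variants on the same data.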
The RLS filter is given by steps 1-3 of Table 1(b) [8].

Sparse LMS filter. For the purposes of the simulations presented in the next section we discuss the LMS variant developed in [3]. LMS minimizes a convex cost function of the prediction error signal $\varepsilon_n$ plus an $\ell_1$ penalty. The update equation which minimizes the cost function is given in step 1a of Table 1(c). The authors in [3] replace the $\ell_1$-norm penalty by the log-sum penalty function. Hence, the resulting update equation for this cost function becomes step 1b of Table 1(c). The log-sum penalty function has the potential of being more sparsity-encouraging since it better approximates the non-convex $\ell_0$-norm.
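The two LMS variants can be illustrated by the following Python sketch. It is an assumption-laden illustration rather than a copy of Table 1(c): the $\ell_1$ penalty is taken to contribute a zero-attracting term `rho * sgn(h)`, and the log-sum penalty a reweighted term `rho * sgn(h) / (1 + eps_log * |h|)`. The names `rho` and `eps_log` are hypothetical; only the gradient term $\mu x_n^{*} \varepsilon_n$ is visible in the surviving table fragments.

```python
from typing import List, Optional, Tuple


def sparse_lms_step(h: List[complex], x: List[complex], y: complex,
                    mu: float = 0.05, rho: float = 0.0,
                    eps_log: Optional[float] = None
                    ) -> Tuple[List[complex], complex]:
    """One sparse LMS update (assumed zero-attracting form).

    eps_log=None uses the l1 attractor; a number selects the log-sum attractor.
    """
    err = y - sum(xi * hi for xi, hi in zip(x, h))   # prediction error
    h_new = []
    for hi, xi in zip(h, x):
        mag = abs(hi)
        sgn = hi / mag if mag > 0 else 0j            # complex sign, 0 at the origin
        if eps_log is None:
            attract = rho * sgn                      # l1 penalty
        else:
            attract = rho * sgn / (1 + eps_log * mag)  # log-sum penalty
        h_new.append(hi + mu * xi.conjugate() * err - attract)
    return h_new, err
```

In this form the log-sum attractor pulls small coefficients toward zero almost as strongly as the $\ell_1$ attractor but leaves large coefficients nearly untouched, which is one way to read the claim that it better approximates the $\ell_0$-norm.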

IV. SIMULATIONS
Experiments were conducted on the multipath channel setup of Eq. (2). The algorithms were run for 2000 iterations and averaged over 50 Monte Carlo runs to reduce realization dependency. In all experiments the output sequence is disturbed by additive white Gaussian noise for various SNR levels ranging from 7 to 27 dB. The Normalized Mean Square Error, defined as $\mathrm{NMSE} = 10 \log_{10}\big( E[\|\hat{h} - h\|_2^2] / E[\|h\|_2^2] \big)$, was used to assess performance. The NMSE is computed after 500 iterations so that all algorithms have converged.
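The performance measure can be computed as in the Python sketch below. It assumes that the error energy is normalized by the energy of the true coefficient vector and that expectations are replaced by sums over Monte Carlo runs; the function name `nmse_db` is hypothetical.

```python
import math
from typing import List, Sequence


def nmse_db(h_hat_runs: Sequence[List[complex]],
            h_true_runs: Sequence[List[complex]]) -> float:
    """NMSE in dB over a set of runs (assumed normalization by ||h||^2)."""
    num = 0.0
    den = 0.0
    for h_hat, h in zip(h_hat_runs, h_true_runs):
        num += sum(abs(a - b) ** 2 for a, b in zip(h_hat, h))  # error energy
        den += sum(abs(b) ** 2 for b in h)                     # channel energy
    return 10.0 * math.log10(num / den)
```

A value of about -27 dB, for example, means that the error energy is roughly 0.2% of the energy of the true coefficients.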
A third-order channel model was used to test the derived algorithms. The wireless channel taps for the linear and cubic part were generated by sparse Rayleigh fading rays. All rays are assumed to fade at the same Doppler frequency of $f_D = 80$ Hz with sampling period $T_s = 0.8\,\mu$s. The linear and the cubic part have equal memory size $M_1 = M_3 = 50$ and the support consists of 2 randomly selected elements for each part. The input signal is drawn from a complex Gaussian distribution $\mathcal{CN}(0, 1/4)$. We observe that the EM-Kalman and EM-RLS algorithms provide gains of 7 dB and 5 dB respectively, over the corresponding conventional non-sparse algorithms.
The choice of the parameters $\gamma$, $\lambda$ that were used to compare the performance of the sparse algorithms is summarized in Table 2 for various SNR levels. The additional parameters required for the LMS are set to $\mu = 5 \times 10^{-2}$ and = 10. For the EM-Kalman and the EM-RLS, the initial noise variance is set to $\sigma_0^2 = 0.01\,\sigma_x^2$. It must be noted that, due to the nature of the soft-thresholding step, the identified $h_n$ has many zero entries. This allows the EM-Kalman and EM-RLS algorithms to be implemented in a low-complexity fashion similar to the approach taken in [4]. Thus, the EM-Kalman and EM-RLS algorithms introduce complexity gains as well as NMSE performance gains over the non-sparse methods.

V. CONCLUSIONS
In this paper, sparse approximations have been studied for nonlinear channel estimation. Adaptive algorithms combining Expectation-Maximization and Kalman filtering were developed and tested by simulations. Significant performance gains were achieved in comparison to the conventional non-sparse methods.

Fig. 1. Digital satellite link.

Table 2. Choice of parameters for the sparse algorithms.