# MMSE Equalizer Design Optimization for Wireline SerDes Applications

Alireza Akbarpour Bazargani, *Graduate Student Member, IEEE*, Hossein Shakiba, *Senior Member, IEEE*, and David A. Johns, *Life Fellow, IEEE*

Abstract—This paper presents analytical equations for optimizing feedforward equalizer (FFE) and decision feedback equalizer (DFE) parameters in a wireline receiver to speed up system-level design and simulations. A minimum mean square error (MMSE)-based approach is applied to the receiver model, and a set of equations is developed to co-optimize FFE and DFE taps. The equations account for the noise sources in wireline links, including the sampling clock jitter, and for the effect of noise correlations on the equalizer parameters. For sampling clock jitter, two separate models are developed to distinguish between sampling for discrete-time and continuous-time FFEs (pre- and post-FFE sampling). The translation of jitter noise to voltage noise, where the jitter can be either white or correlated, is then carefully investigated. Later, the developed model is modified to generate different variants of the MMSE-based approach for use in various practical scenarios a designer may face. These include equalizer design for maximum likelihood sequence estimation (MLSE)-based receivers and equalizer design with bounded DFE tap magnitude to control undesired side effects such as error propagation. Finally, the use of "tap skipping" to save FFE hardware resources is investigated. The accuracy of the models and the performance of each method are verified through simulations and by comparison against LMS adaptation loops.

Index Terms—Equalization, feedforward equalizer (FFE), decision feedback equalizer (DFE), wireline, minimum mean square error (MMSE), SerDes, LMS algorithm.

### I. Introduction

Least mean squares (LMS) based adaptation engines are widely used in practical implementations of wireline receivers to adapt the coefficients of the feedforward equalizer (FFE) and decision feedback equalizer (DFE). Currently, the LMS algorithm is also used in system-level simulations, where it takes up significant simulation time. This simulation time can be especially troublesome during the initial architecture design phase when one wants to search the

Manuscript received 4 July 2023; revised 3 October 2023; accepted 25 October 2023. This work was supported in part by the National Sciences and Engineering Research Council of Canada (NSERC) and in part by Huawei Canada's contributions. This article was recommended by Associate Editor K. Moez. (Corresponding author: Alireza Akbarpour Bazargani.)

Alireza Akbarpour Bazargani and David A. Johns are with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Department of Electrical Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: ara.bazargani@gmail.com).

Hossein Shakiba is with Huawei Technologies Company Ltd., Markham, ON L3R 5A4, Canada.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2023.3328807.

Digital Object Identifier 10.1109/TCSI.2023.3328807

design space over a number of key system parameters. Key system parameters for wireline applications include the various channels, the required Continuous-Time Linear Equalizer (CTLE) boost, the number of FFE/DFE taps and the required range of tap magnitudes.

The purpose of this paper is to present a set of analytical equations and methods that directly calculate the FFE/DFE tap coefficients. These tap coefficients are the same as what would be obtained by running an LMS algorithm simulation. As a result, this approach can be thought of as an additional tool for system-level designers that allows them to eliminate the need for running the LMS algorithm while searching through the initial design space.

Deriving analytic equations to obtain optimum equalizer coefficients has been addressed in different contexts [1], [2], [3], [4]. However, these equations have been developed for general communication links rather than specifically for wireline links and, as a result, they miss details that are important in wireline links. For example, the equations assume white channel noise at the input of the FFE, while in a wireline receiver the channel noise spectral density is shaped by a CTLE, which introduces correlations between noise samples. Another limitation of the current analysis methods is that they do not consider jitter noise or are not directly applicable when a DFE is used. Finally, the current optimization methods do not offer any solution to the practical limitations that a designer may face while designing the link architecture (such as limiting the size of the individual DFE taps to contain error propagation).

In this work, we develop a comprehensive set of equations for co-optimizing FFE and DFE that addresses the limitations of prior works and equips the designer with tools and methods for equalizer design in different realistic scenarios. In Section II, we first review the equations used to design a feedforward equalizer based on the Minimum Mean Square Error (MMSE) criterion [4]. We then extend these equations to support the co-optimization of the DFE and FFE in wireline receivers in the presence of channel noise filtered by a CTLE and other noise sources in a wireline link. Then, the derived equations are validated by showing the match between calculations and the results obtained by LMS-based time-domain simulations. In Section III, the effect of sampling clock jitter is taken into account in detail, where the sampling location may be either after a CTLE or after a continuous-time FFE. The accuracy of the proposed jitter model is also verified

1549-8328 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.



Fig. 1. Receiver employing a continuous-time adaptive linear combiner.

using time-domain simulations. In Section IV, equations are further extended to extract methods applicable in different design scenarios, like DFE design in the presence of error propagation, FFE design for MLSE application, and "tap skipping" to save FFE hardware resources. Finally, the paper is concluded and summarized in Section V.

### II. MMSE EQUALIZER DESIGN

### A. Background

We start by going over the work presented in [4] where a continuous-time Adaptive Linear Combiner (ALC) is used as a CTLE to equalize the received signal, as shown in Fig. 1. Derived equations in [4] provide coefficients of the ALC, resulting in minimum mean square error. Following is a summary of the main equations for the system shown in Fig. 1 incorporating an *N*-tap ALC.

Let us consider a sampling frequency of $f_s = 1/T$, a sampling time offset of $t_0$ to ensure optimum sampling phase, and $c_j(t) = p(t) * x_j(t)$ being negligible for $t \ge L' \times T$, where $L'$ is the maximum length of $c_j(t)$ after sampling.<sup>1</sup> Also, the transmit sequence is independent and identically distributed (i.i.d.) with a power of $\sigma_a^2$. Then, defining the error as $e(n) = d(n) - y(n)$, it is shown in [4] that the Mean-Squared-Error (MSE) can be calculated as:

$$E\{|e(n)|^2\} = \sigma_a^2 (\mathbf{C}\mathbf{w} - \mathbf{h}_{\delta})^T (\mathbf{C}\mathbf{w} - \mathbf{h}_{\delta}) + \mathbf{w}^T \mathbf{M} \mathbf{w} \tag{1}$$

In this equation, C is the sampled pulse response matrix at the channel output with elements:

$$\mathbf{C}_{ij} = c_j((i-1)T + t_0), \quad i = 1 \text{ to } L', \; j = 1 \text{ to } N \tag{2}$$

**M** is the noise correlation matrix expressing cross-correlations between noise samples at the different tap outputs (the $x_i(t)$ outputs in Fig. 1), with elements:

$$\mathbf{M}_{ij} = E\left\{\eta_i\left(n\right) \times \eta_j\left(n\right)\right\} \tag{3}$$

where:

$$\eta_i(n) = n(t) * x_i(t) |_{t=nT+t_0} \tag{4}$$

and i, j = 1 to N.

<sup>1</sup>In this context, \* denotes convolution operation.

 $\mathbf{h}_{\delta}^{T}$  is a  $1 \times L'$  vector with the  $\delta^{th}$  component equal to 1 and all other elements equal to zero, representing the fully-equalized desired response at the output of the equalizer.

Finally, **w** is the vector of equalizer coefficients  $w_1, w_2, \ldots, w_N$ .

For MMSE performance, [4] has calculated the optimum equalizer coefficients,  $\mathbf{w}_{OPT}$ :

$$\mathbf{w}_{OPT} = \mathbf{A}^{-1} \mathbf{C}^T \mathbf{h}_{\delta} \tag{5}$$

where:

$$\mathbf{A} = \mathbf{C}^T \mathbf{C} + \frac{1}{\sigma_a^2} \mathbf{M} \tag{6}$$

Also, the corresponding MMSE is found to be:

$$MMSE = \sigma_a^2 \mathbf{h}_{\delta}^T (\mathbf{I} - \mathbf{C} \mathbf{A}^{-1} \mathbf{C}^T) \mathbf{h}_{\delta} \tag{7}$$

and is a function of  $\delta$ . The optimum  $\delta$  is:

$$\delta_{opt} = \arg\min_{\delta}\{[\mathbf{I} - \mathbf{C}\mathbf{A}^{-1}\mathbf{C}^T]_{\delta,\delta}\} \tag{8}$$

which corresponds to the minimum diagonal element of  $(\mathbf{I} - \mathbf{C}\mathbf{A}^{-1}\mathbf{C}^T)$  [2], [4].
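As a brief sketch of where (5)-(7) come from (a standard quadratic minimization, assuming $\mathbf{M}$ is symmetric, as any correlation matrix is), setting the gradient of (1) with respect to $\mathbf{w}$ to zero gives:

$$\begin{aligned} \nabla_{\mathbf{w}} E\{|e(n)|^2\} &= 2\sigma_a^2 \mathbf{C}^T(\mathbf{C}\mathbf{w} - \mathbf{h}_{\delta}) + 2\mathbf{M}\mathbf{w} = \mathbf{0} \\ \Rightarrow \quad \left(\mathbf{C}^T\mathbf{C} + \tfrac{1}{\sigma_a^2}\mathbf{M}\right)\mathbf{w}_{OPT} &= \mathbf{C}^T\mathbf{h}_{\delta} \end{aligned}$$

which is (5) with $\mathbf{A}$ as in (6). Substituting back into (1) and using $\mathbf{w}_{OPT}^T \mathbf{A} \mathbf{w}_{OPT} = \mathbf{w}_{OPT}^T \mathbf{C}^T \mathbf{h}_{\delta}$:

$$\begin{aligned} MMSE &= \sigma_a^2\left(\mathbf{w}_{OPT}^T \mathbf{A}\mathbf{w}_{OPT} - 2\mathbf{w}_{OPT}^T\mathbf{C}^T\mathbf{h}_{\delta} + \mathbf{h}_{\delta}^T\mathbf{h}_{\delta}\right) \\ &= \sigma_a^2\left(\mathbf{h}_{\delta}^T\mathbf{h}_{\delta} - \mathbf{h}_{\delta}^T\mathbf{C}\mathbf{A}^{-1}\mathbf{C}^T\mathbf{h}_{\delta}\right) = \sigma_a^2 \mathbf{h}_{\delta}^T(\mathbf{I} - \mathbf{C}\mathbf{A}^{-1}\mathbf{C}^T)\mathbf{h}_{\delta} \end{aligned}$$

Because $\mathbf{h}_{\delta}$ selects a single element, this is $\sigma_a^2$ times the $\delta^{th}$ diagonal entry of $(\mathbf{I} - \mathbf{C}\mathbf{A}^{-1}\mathbf{C}^T)$, which is why (8) reduces to a search over the diagonal.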

It is worth mentioning that the right-hand side of (1) is indeed a power sum of two separate contributors to the error: residual ISI with a power of:

$$\sigma_{ISI}^2 = \sigma_a^2 \left( \mathbf{C} \mathbf{w} - \mathbf{h}_{\delta} \right)^T \left( \mathbf{C} \mathbf{w} - \mathbf{h}_{\delta} \right) \tag{9}$$

and additive noise with a power of:

$$\sigma_n^2 = \mathbf{w}^T \mathbf{M} \mathbf{w} \tag{10}$$
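The following is a minimal NumPy sketch of (5)-(10), not code from [4] or from this paper. The function names `mmse_ffe` and `best_delay`, the 3-sample pulse, and the white-noise matrix are all illustrative assumptions.

```python
import numpy as np

def mmse_ffe(C, M, sigma_a2, delta):
    """Evaluate (5)-(7), (9) and (10) for one delay (delta is 1-based)."""
    h_delta = np.zeros(C.shape[0])
    h_delta[delta - 1] = 1.0
    A = C.T @ C + M / sigma_a2                       # (6)
    w = np.linalg.solve(A, C.T @ h_delta)            # (5)
    resid = C @ w - h_delta
    isi = sigma_a2 * (resid @ resid)                 # (9) residual ISI power
    noise = w @ M @ w                                # (10) output noise power
    mmse = sigma_a2 * (h_delta @ (h_delta - C @ w))  # (7)
    return w, mmse, isi, noise

def best_delay(C, M, sigma_a2):
    """(8): 1-based index of the smallest diagonal entry of I - C A^-1 C^T."""
    A = C.T @ C + M / sigma_a2
    P = np.eye(C.shape[0]) - C @ np.linalg.solve(A, C.T)
    return int(np.argmin(np.diag(P))) + 1

# Hypothetical 3-sample pulse, 3-tap equalizer, white noise (assumed values).
h = np.array([0.2, 1.0, 0.4])
N = 3
C = np.zeros((h.size + N - 1, N))
for j in range(N):
    C[j:j + h.size, j] = h          # column j is the pulse delayed by j samples
M = 0.01 * np.eye(N)

delta = best_delay(C, M, sigma_a2=1.0)
w, mmse, isi, noise = mmse_ffe(C, M, 1.0, delta)
print(delta, w, mmse, isi + noise)
```

At the optimum, the value returned for (7) should agree with the sum of (9) and (10), which gives a quick self-check of an implementation.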

With the above as a starting point, the purpose of this paper is to extend the analysis to cover wireline solutions that usually employ CTLE, FFE, and DFE in their receivers, as well as to accommodate other sources of noise that are present in these links.

### B. Reformatting to FFE-Based SerDes Model

We start the extension of the work in [4] by comparing the system of Fig. 1 with the FFE-based SerDes model shown in Fig. 2. From a signal perspective, considering an ideal sampler, the output y(n) of these two systems would be identical if $x_i(t) = h_{ctle}(t) * \delta(t - (i - 1)T)$, where $\delta(t)$ is the Dirac delta function. So, each $x_i$ in Fig. 1 corresponds to a delayed version of the CTLE impulse response, i.e.:

$$x_i(t) = h_{ctle}(t - (i - 1)T) \tag{11}$$

By defining link pulse response, h(t), as:

$$h(t) = h_{tx}(t) * h_{ch}(t) * h_{ctle}(t) \tag{12}$$

and considering  $p(t) = h_{tx}(t) * h_{ch}(t)$  we have:

$$\begin{aligned} c_{j}(t) &= p(t) * x_{j}(t) \\ &= h_{tx}(t) * h_{ch}(t) * h_{ctle}(t) * \delta(t - (j - 1)T) \\ &= h(t - (j - 1)T) \end{aligned}$$

Now, (2) becomes:

$$\mathbf{C}_{ij} = h ((i-1) T + t_0 - (j-1) T) \tag{13}$$



Fig. 2. Wireline SerDes with FFE.

In wireline SerDes applications, it is general practice to define the channel, the equalizer, and the transmitter filter in the frequency domain [7]. So, extracting h(t) is straightforward using the Inverse Fast Fourier Transform (IFFT). Suppose the sampled link pulse response $h_n = h(nT + t_0)$ is limited to L samples from $h_0$ to $h_{L-1}$. As a result, $\mathbf{C}$ will be a matrix of size $(L + N - 1) \times N$ as:

$$\mathbf{C} = \begin{bmatrix} h_0 & 0 & 0 & \dots & 0 & 0 \\ h_1 & h_0 & 0 & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ h_{L-1} & h_{L-2} & \ddots & \ddots & 0 & 0 \\ 0 & h_{L-1} & \ddots & \ddots & h_0 & 0 \\ \vdots & 0 & \ddots & \ddots & h_1 & h_0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 0 & h_{L-1} & h_{L-2} \\ 0 & 0 & \dots & 0 & 0 & h_{L-1} \end{bmatrix}_{(L+N-1)\times N}$$

Note that $L + N - 1$ equals $L'$ in (2). Also, $h_{tx}$ in (12) absorbs any possible transmitter FFE. Since the pulse response used in the construction of $\mathbf{C}$ is affected by both the Tx filter and the CTLE, the MMSE equations calculate the optimum FFE taps corresponding to pre-set Tx FFE and Rx CTLE equalizer configurations. Finding the optimum setting of these equalizers is beyond the scope of this paper.
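The Toeplitz structure of $\mathbf{C}$ in (13) can be sketched in a few lines of NumPy. The helper name `pulse_matrix` and the sample values below are hypothetical; the only property relied on is that column $j$ holds the sampled pulse delayed by $j-1$ symbols, so that $\mathbf{C}\mathbf{w}$ is the discrete convolution of the pulse with the tap vector.

```python
import numpy as np

def pulse_matrix(h, n_taps):
    """Build the (L + N - 1) x N matrix C of (13) from pulse samples h_0..h_{L-1}."""
    h = np.asarray(h, dtype=float)
    L = h.size
    C = np.zeros((L + n_taps - 1, n_taps))
    for j in range(n_taps):
        C[j:j + L, j] = h            # column j: pulse delayed by j samples
    return C

# Illustrative values only.
h = [0.1, 1.0, 0.5, 0.2]
w = np.array([-0.1, 1.0, -0.3])
C = pulse_matrix(h, n_taps=3)
print(C.shape)                        # (L + N - 1, N)
print(np.allclose(C @ w, np.convolve(h, w)))
```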

### C. Noise Correlation Matrix

In contrast to the work in [4], multiple noise sources contribute to the noise correlation matrix in wireline transceivers. The main noise sources in a wireline transceiver chain are illustrated in Fig. 2, including transmitter noise $(n_{tx})$, channel noise $(n_{ch})$, crosstalk noise $(n_{xtalk})$ and Analog-to-Digital Converter (ADC) quantization noise $(n_Q)$. $n_{ch}$ also absorbs the CTLE circuit noise $(n_{ctle})$ when input referred. Moreover, the sampling clock jitter contributes to the total noise at the sampler output.<sup>2</sup>

<sup>2</sup>For the sake of simplicity, only the channel noise is shown in the upcoming figures of this paper.

Having multiple sources of noise $n_k(t)$, where k represents one of the noise sources above, we define the extended noise correlation matrix $\overline{\mathbf{M}}$ as:

$$\overline{\mathbf{M}}_{ij} = E\{ [\eta_{1,i}(n) + \eta_{2,i}(n) + \dots + \eta_{k,i}(n)] \times [\eta_{1,j}(n) + \eta_{2,j}(n) + \dots + \eta_{k,j}(n)] \} \tag{14}$$

where:

$$\eta_{k,i}(n) = [n_k(t) * h_{k,i}(t)]_{t=nT+t_0} \tag{15}$$

and $h_{k,i}(t)$ is the impulse response from noise source k to equalizer tap i. Since, in a practical system, the noise sources are mutually uncorrelated ($E\{[n_i(t)n_j(t)]_{t=nT+t_0}\}=0$ for $i \neq j$), $\overline{\mathbf{M}}_{ij}$ can be simplified as:

$$\overline{\mathbf{M}}_{ij} = E\{ [\eta_{1,i}(t)\eta_{1,j}(t) + \eta_{2,i}(t)\eta_{2,j}(t) + \dots + \eta_{k,i}(t)\eta_{k,j}(t)]_{t=nT+t_0} \} \tag{16}$$

In other words,  $\overline{\mathbf{M}}$  is the superposition of individual noise correlation matrices associated with each noise source:

$$\overline{\mathbf{M}} = \mathbf{M}_{tx} + \mathbf{M}_{ch} + \dots \tag{17}$$

where each $\mathbf{M}_x$ is created using (3).

It is worth mentioning that using  $\overline{\mathbf{M}}$  in (10) we can obtain the total noise power at the output of the FFE with a tap vector of  $\mathbf{w}$ .

For channel noise,  $n_{ch}(t)$ , shown in Fig. 2, (15) can be written as:

$$\begin{aligned} \eta_i(t) &= n_{ch}(t) * h_{ctle}(t) * \delta(t - (i - 1)T) \\ &= n_{ch, f}(t - (i - 1)T) \end{aligned}$$

Here we have defined filtered channel noise  $n_{ch,f}(t) = n_{ch}(t) * h_{ctle}(t)$ . Now  $\mathbf{M}_{ch}$ , the channel noise correlation matrix, can be derived using (3) as follows:

$$\begin{aligned} \mathbf{M}_{ch,ij} &= E\{[n_{ch,f}(t-(i-1)T) \times n_{ch,f}(t-(j-1)T)]_{t=nT+t_0}\} \\ &= R_{n_{ch,f}}((i-j)T) \end{aligned} \tag{18}$$

where $R_{n_{ch,f}}(\tau)$ denotes the autocorrelation of the filtered noise, $n_{ch,f}(t)$. Given the noise PSD, $K_0$, and the frequency response of the CTLE, $H_{ctle}$, $R_{n_{ch,f}}(\tau)$ can be calculated using the Wiener–Khinchin theorem:

$$R_{n_{ch,f}}(\tau) = K_0 \int_{-\infty}^{\infty} |H_{ctle}(f)|^2 e^{i2\pi f \tau} df \tag{19}$$



Fig. 3. Incorporation of DFE in wireline SerDes.

This way, channel noise can be easily characterized since the CTLE transfer function typically is well-defined in the frequency domain. A similar procedure applies for creating  $\mathbf{M}_{tx}$ , the transmitter noise correlation matrix, assuming white transmitter noise  $n_{tx}$ .
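A possible numerical treatment of (18) and (19) is sketched below. The function names are hypothetical, and a first-order low-pass response stands in for the CTLE purely because its autocorrelation has a closed form, $K_0 \pi f_c e^{-2\pi f_c |\tau|}$, against which the Riemann sum can be checked; it is not the CTLE used in this paper.

```python
import numpy as np

def noise_autocorr(H2, freqs, taus, K0=1.0):
    """Approximate (19) by a Riemann sum.

    H2 holds |H_ctle(f)|^2 on a uniform two-sided grid `freqs`. Because
    |H|^2 is even in f, only the cosine part of the exponential survives.
    """
    df = freqs[1] - freqs[0]
    return np.array([K0 * np.sum(H2 * np.cos(2 * np.pi * freqs * tau)) * df
                     for tau in taus])

def channel_noise_matrix(R, n_taps):
    """(18): M_ch[i, j] = R(|i - j| T), with R[k] the autocorrelation at lag k*T."""
    lags = np.abs(np.subtract.outer(np.arange(n_taps), np.arange(n_taps)))
    return np.asarray(R)[lags]

# Stand-in filter (assumption): first-order low-pass with corner fc.
T = 1.0
fc = 0.25 / T
freqs = np.linspace(-250.0, 250.0, 500_001)
H2 = 1.0 / (1.0 + (freqs / fc) ** 2)
taus = np.arange(4) * T

R = noise_autocorr(H2, freqs, taus)
M_ch = channel_noise_matrix(R, n_taps=4)
print(R / R[0])                      # correlation coefficients at lags 0..3 UI
```

The integration limits and grid spacing are design choices: the grid must extend far enough that the tail of $|H_{ctle}(f)|^2$ is negligible and be fine enough to resolve the cosine at the largest lag of interest.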

For white quantization noise, following (3) and (15),  $\mathbf{M}_Q$ , the quantization noise correlation matrix, can be simplified as:

$$\mathbf{M}_Q = \sigma_n^2 \times \mathbf{I} \tag{20}$$

where  $\sigma_n^2$  and **I** are quantization noise power and identity matrix, respectively.

One important source of noise in wireline links is crosstalk. Crosstalk noise is often caused by several neighbouring far-end and near-end aggressors, each contributing through its own parasitic path with its own frequency response. Being data sources for other links, these aggressors possess PSDs similar to that of the main source and are usually mutually uncorrelated. As a result, the approach used above for the CTLE-filtered channel noise can be applied to each aggressor to calculate its correlation matrix, and the individual matrices can be added to obtain the overall crosstalk correlation matrix, $\mathbf{M}_{xtalk}$.

We will analyse the jitter noise in detail in Section III.

### D. Extension to DFE (MMSE-DFE)

Fig. 3 shows the receiver side of a wireline link incorporating both FFE and DFE. We will extend the equations in [4] so that they can provide both FFE and DFE tap values. To do so, we must adapt (1) to the structure shown in Fig. 3.

$\mathbf{Cw}$ in (1) represents the equalized sampled pulse response at the FFE output. It is compared with $\mathbf{h}_{\delta}$, the desired FFE output pulse response, and the difference constitutes the residual ISI. The first term on the right-hand side of (1) represents the ISI power, and the second term gives the noise power contributed by the noise sources. The $\delta^{th}$ element of $(\mathbf{Cw} - \mathbf{h}_{\delta})$ corresponds to the pulse response main cursor. In the presence of an M-tap DFE, the M cursors after the main cursor are eliminated by the DFE, so the ISI associated with these post-cursors should not be included in the ISI power calculation. Accordingly, the corresponding rows in the FFE output target vector should be released and no longer set to zero. To achieve this, we set the corresponding rows of $\mathbf{C}$ (rows $\delta + 1$ to $\delta + M$) to zero and name the resultant matrix $\mathbf{C}_M$ (modified $\mathbf{C}$). Fig. 4 illustrates the creation of $\mathbf{C}_M$ for an example system with a 4-tap FFE and a 2-tap DFE. With this modification, cursors $\delta + 1$ to $\delta + M$ of the resultant equalized pulse response, $\mathbf{C}_M \mathbf{w}$, equal zero regardless of $\mathbf{w}$. Now we can use $\mathbf{C}_M$ in (5) and (6) to calculate $\mathbf{w}_{OPT}$. This new set of coefficients is the optimum for an FFE that no longer attempts to minimize the ISI from cursors $\delta+1$ to $\delta+M$ in isolation, but rather leaves them to be removed by the DFE. The DFE coefficients, $\mathbf{b}^T = [b_1, b_2, \ldots, b_M]$, can then be calculated as:

$$\mathbf{b} = [\mathbf{C}\mathbf{w}]_{\delta+1 \text{ to } \delta+M} \tag{21}$$

In other words, **b** equals the non-zero elements of  $(\mathbf{C} - \mathbf{C}_M)\mathbf{w}$ . This method is an analytical solution for co-optimizing FFE and DFE tap values based on the MMSE criterion.

To summarize, to find the optimum coefficients in an MMSE sense, we make use of equations (5)-(7) with the new values for **A**, **C**, **M** as shown above with the DFE values found from (21).

It is worth mentioning that for a DFE system, (8) cannot be used directly to find the delay that yields the MMSE. This is because the construction of $\mathbf{C}_M$, as shown in Fig. 4, depends on the location of the output main cursor and hence on the FFE delay. In other words, $\mathbf{C}_M$ is different for each delay value, so a sweep of $\delta$ is needed to find the optimum delay. Sweeping the delay is also used in the practical optimization of the equalizer in current state-of-the-art receivers during link start-up, where it is managed by higher-level optimization engines such as a genetic algorithm.
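The MMSE-DFE procedure and the delay sweep described above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the function names, the short pulse, the white-noise matrix, and the choice to sweep $\delta$ over every row that leaves room for all M DFE rows are assumptions made here.

```python
import numpy as np

def mmse_ffe_dfe(h, n_ffe, n_dfe, M, sigma_a2, delta):
    """Co-optimize FFE taps w and DFE taps b for one delay (delta is 1-based)."""
    h = np.asarray(h, dtype=float)
    L = h.size
    C = np.zeros((L + n_ffe - 1, n_ffe))
    for j in range(n_ffe):
        C[j:j + L, j] = h
    C_M = C.copy()
    C_M[delta:delta + n_dfe, :] = 0.0           # zero rows delta+1 .. delta+M
    h_delta = np.zeros(C.shape[0])
    h_delta[delta - 1] = 1.0
    A = C_M.T @ C_M + M / sigma_a2              # (6) with C_M
    w = np.linalg.solve(A, C_M.T @ h_delta)     # (5) with C_M
    b = (C @ w)[delta:delta + n_dfe]            # (21)
    mse = sigma_a2 * (h_delta @ (h_delta - C_M @ w))   # (7) with C_M
    return w, b, mse

def sweep_delay(h, n_ffe, n_dfe, M, sigma_a2):
    """Evaluate every admissible delta and keep the one with the lowest MSE."""
    n_rows = len(h) + n_ffe - 1
    results = {d: mmse_ffe_dfe(h, n_ffe, n_dfe, M, sigma_a2, d)
               for d in range(1, n_rows - n_dfe + 1)}
    best = min(results, key=lambda d: results[d][2])
    return best, results[best]

# Hypothetical pulse and noise level (assumed values).
h = [0.1, 1.0, 0.5, 0.2]
M = 1e-3 * np.eye(3)
best, (w, b, mse) = sweep_delay(h, n_ffe=3, n_dfe=1, M=M, sigma_a2=1.0)
print(best, w, b, mse)
```

Each delay requires only one small linear solve, so the sweep remains far cheaper than running an adaptation loop per candidate delay.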

### E. Validation of Equations

A simulation set-up incorporating an LMS adaptation loop in a PAM-4 transceiver is used to validate the accuracy of the equations derived. Transmitted PAM-4 symbols are from a constellation of (-1, -1/3, 1/3, 1) generated using a PRBS source. The test channel introduces 32-dB loss at Nyquist frequency and is followed by a CTLE. To keep the CTLE boosting in a practical range [5], [8], [9], 13-dB of the channel loss is equalized using the CTLE and the remaining has been left for the following equalizers (FFE-DFE). To better represent a realistic scenario, the CTLE consists of two zeros. However, the optimization of the CTLE is beyond the scope of this work. The pulse response at the output of the CTLE (input of the FFE) is shown in Fig. 5, which is truncated

$$\underbrace{\begin{bmatrix} h_0 & 0 & 0 & 0 \\ h_1 & h_0 & 0 & 0 \\ h_2 & h_1 & h_0 & 0 \\ h_3 & h_2 & h_1 & h_0 \\ h_4 & h_3 & h_2 & h_1 \\ 0 & h_4 & h_3 & h_2 \\ 0 & 0 & h_4 & h_3 \\ 0 & 0 & 0 & h_4 \end{bmatrix}}_{\mathbf{C}} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ ? \\ ? \\ 0 \\ 0 \\ 0 \end{bmatrix} \begin{array}{l} \phantom{0} \\ \phantom{0} \\ \leftarrow \text{Main cursor} \\ \leftarrow \text{1st post-cursor (DFE tap 1)} \\ \leftarrow \text{2nd post-cursor (DFE tap 2)} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{array}$$

$$\underbrace{\begin{bmatrix} h_0 & 0 & 0 & 0 \\ h_1 & h_0 & 0 & 0 \\ h_2 & h_1 & h_0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & h_4 & h_3 & h_2 \\ 0 & 0 & h_4 & h_3 \\ 0 & 0 & 0 & h_4 \end{bmatrix}}_{\mathbf{C}_M} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \begin{array}{l} \phantom{0} \\ \phantom{0} \\ \leftarrow \text{Main cursor} \\ \leftarrow \text{Row of 1st post-cursor (DFE tap 1), set to zero} \\ \leftarrow \text{Row of 2nd post-cursor (DFE tap 2), set to zero} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{array}$$

Fig. 4. Modification of C matrix in MMSE-DFE (N = 4 and L = 5).



Fig. 5. Test pulse response.

TABLE I
TEST PULSE RESPONSE SAMPLES

| Cursor     | 1     | 2    | 3     | 4    | 5     |
|------------|-------|------|-------|------|-------|
| Value [mV] | -2.7  | 7.1  | 364.2 | 1000 | 468.2 |
| Cursor     | 6     | 7    | 8     | 9    | 10    |
| Value [mV] | 194.1 | 43.3 | 120.1 | 47.4 | 19.7  |
| Cursor     | 11    | 12   | 13    | 14   | 15    |
| Value [mV] | 51.4  | 9.4  | 8.1   | 18.9 | -1.9  |
| Cursor     | 16    | 17   | 18    | 19   | 20    |
| Value [mV] | 3.2   | 2.1  | -4.6  | -3.0 | -3.8  |

TABLE II
CORRELATION COEFFICIENTS

| $\tau$ [UI]             | 0 | +/-1    | +/-2    | +/-3   | +/-4    | +/-5    |
|-------------------------|---|---------|---------|--------|---------|---------|
| Correlation coefficient | 1 | -0.3764 | -0.0049 | 0.0003 | -0.0028 | -0.0018 |

to 20 Unit-Intervals (UI) and amplified to have a peak value of 1. Corresponding pulse samples are listed in Table I.

Using (19), noise correlation coefficients at the output of the CTLE are calculated as  $R_{n_{ch,f}}(nT)/R_{n_{ch,f}}(0)$ . A plot of correlation coefficients is shown in Fig. 6 and the corresponding values are listed in Table II. These noise correlation coefficients are multiplied by the intended noise power at the FFE input to obtain different noise levels.
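As an illustration of the scaling step just described, the sketch below builds a channel-noise correlation matrix for a 10-tap FFE from the Table II coefficients and a chosen input noise level. The helper name is hypothetical, and treating the coefficients beyond $\pm 5$ UI (which Table II does not list) as zero is an assumption made here. The PAM-4 symbol power follows from the constellation $(-1, -1/3, 1/3, 1)$ with equiprobable symbols.

```python
import numpy as np

# Symbol power of the PAM-4 constellation used in the test set-up.
pam4 = np.array([-1.0, -1 / 3, 1 / 3, 1.0])
sigma_a2 = np.mean(pam4 ** 2)        # 5/9 for equiprobable levels

# Correlation coefficients at lags 0..5 UI, copied from Table II.
rho = np.array([1.0, -0.3764, -0.0049, 0.0003, -0.0028, -0.0018])

def noise_matrix_from_rho(rho, noise_rms, n_taps):
    """Scale correlation coefficients by the noise power and arrange them as in (18).

    Lags not covered by `rho` are assumed uncorrelated (coefficient 0).
    """
    rho_ext = np.zeros(n_taps)
    k = min(n_taps, len(rho))
    rho_ext[:k] = rho[:k]
    lags = np.abs(np.subtract.outer(np.arange(n_taps), np.arange(n_taps)))
    return noise_rms ** 2 * rho_ext[lags]

M_ch = noise_matrix_from_rho(rho, noise_rms=30e-3, n_taps=10)
print(sigma_a2, M_ch[0, :3])
```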

Current state-of-the-art receivers operating on links with high insertion loss (beyond 30 dB) incorporate more



Fig. 6. Correlation coefficients at CTLE output.

than 15 taps of FFE followed by 1 to 2 taps of DFE to compensate for the higher sensitivity of PAM-4 signalling to ISI [5], [9], [10], [11], [12], [13]. However, in our test set-up, the equalizer is realized as a 10-tap FFE and a 3-tap DFE. A longer DFE is used here to put more stress on the optimization process. Also, for our test channel, increasing the number of FFE taps results in multiple taps with magnitudes near zero, so we did not push the FFE length beyond 10. Sweeping the FFE delay from 1 to 10 and applying (5) and (6) shows that a delay of 5 (having $w_6$ as the main tap) minimizes the MSE while keeping the DFE tap magnitudes below one.

Fig. 7 shows the adaptation over $2 \times 10^6$ samples with an LMS step size of 0.001 and noise of $30 \text{mV}_{rms}$ at the input of the FFE.<sup>3</sup> As we can see, the taps converge to the optimum levels calculated through the MMSE equations, confirming the accuracy of the equations. The corresponding error signal, defined as the difference between the slicer's output and input, is shown in Fig. 8. According to Fig. 8, the LMS loop settles properly within the simulation runtime, and the error settles to a $49 \text{mV}_{rms}$ level, which is the same value found through analysis using (7).
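For readers who want to reproduce this kind of comparison, a minimal LMS loop for an FFE followed by a DFE might look like the sketch below. It is not the simulation set-up used in this paper: it runs in training mode (the transmitted symbols are used both as the desired signal and as DFE feedback), it uses a hypothetical 6-sample pulse with white noise, and the step size and run length are chosen only to keep the example fast.

```python
import numpy as np

def lms_ffe_dfe(x, a, n_ffe, n_dfe, delta, mu):
    """Training-mode LMS for FFE taps w and DFE taps b.

    x     : samples at the FFE input
    a     : transmitted symbols (used as desired signal and DFE feedback)
    delta : 1-based index of the output main cursor, as in h_delta
    """
    w = np.zeros(n_ffe)
    b = np.zeros(n_dfe)
    err = np.zeros(len(x))
    start = max(n_ffe - 1, delta + n_dfe - 1)
    for k in range(start, len(x)):
        x_vec = x[k - n_ffe + 1:k + 1][::-1]                    # x(k), x(k-1), ...
        d = a[k - (delta - 1)]                                   # desired symbol
        d_prev = a[k - delta - n_dfe + 1:k - delta + 1][::-1]    # d(k-1) .. d(k-M)
        e = d - (w @ x_vec - b @ d_prev)
        w += mu * e * x_vec          # stochastic-gradient step on the FFE taps
        b -= mu * e * d_prev         # stochastic-gradient step on the DFE taps
        err[k] = e
    return w, b, err

rng = np.random.default_rng(0)
levels = np.array([-1.0, -1 / 3, 1 / 3, 1.0])           # PAM-4 constellation
n_sym = 40_000
a = rng.choice(levels, size=n_sym)
h = np.array([0.05, 0.3, 1.0, 0.45, 0.2, 0.1])          # hypothetical pulse
x = np.convolve(a, h)[:n_sym] + 0.03 * rng.standard_normal(n_sym)

w, b, err = lms_ffe_dfe(x, a, n_ffe=5, n_dfe=2, delta=4, mu=0.005)
print(w, b, np.sqrt(np.mean(err[-2000:] ** 2)))
```

The averaged taps from such a loop can then be compared against the analytic solution for the same pulse, noise matrix, and $\delta$.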

The equalizer tap values are listed in Table III. For the LMS case, average tap values are recorded over the last

<sup>&</sup>lt;sup>3</sup>Note that for simplicity, the pulse response at the FFE input (CTLE output) is amplified to obtain a main cursor value of 1. The actual pulse response at the CTLE output has a 142mV main cursor. The actual noise level (integrated noise at the output of the CTLE) before amplification is 4.26mV.



Fig. 7. LMS vs Analytic calculation, a) FFE taps 4 to 9, b) DFE taps.



Fig. 8. LMS error settling behaviour with  $30\text{mV}_{rms}$  noise at the FFE input. After convergence, error settles to  $49\text{mV}_{rms}$ .

1000 iterations. Table III also contains the results for the case of $60\text{mV}_{rms}$ noise at the equalizer input and the breakdown of the MSE contributors at the FFE output.

For FFE input noise of $30\text{mV}_{rms}$ and $60\text{mV}_{rms}$, the corresponding ISI at the output of the equalizer is $19\text{mV}_{rms}$ and $41\text{mV}_{rms}$, respectively. The higher ISI in the latter case indicates that the equalizer designed using the MMSE criterion strikes an optimum noise/ISI balance: when the input noise increases, it makes less effort (less equalization) to minimize the ISI in order to avoid excessive noise boosting by the FFE.

Furthermore, as the input noise level increases, more equalization is pushed from the FFE toward the DFE, which can be inferred from the larger DFE taps for the case of $60\text{mV}_{rms}$ noise in Table III and also from the equalized pulse responses shown in Fig. 9. Fig. 9 shows the equalized pulse responses after the FFE and DFE summation nodes for two different noise levels. The sharper transition and larger post-cursor undershoot of the post-FFE pulse response shown in Fig. 9a indicate the higher equalization introduced by the FFE stage at lower noise levels, compared to the pulse response shown in Fig. 9b, which is wider and has a larger $1^{\text{st}}$ post-cursor. The FFE, being a linear equalizer, amplifies the in-band



TABLE III
ANALYTIC CALCULATIONS VS. LMS RESULTS

| Input noise      | $30\,\text{mV}_{rms}$ | $30\,\text{mV}_{rms}$ | $60\,\text{mV}_{rms}$ | $60\,\text{mV}_{rms}$ |
|------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Method           | Analytic              | LMS                   | Analytic              | LMS                   |
| FFE tap 1        | -0.010                | -0.009                | -0.010                | -0.010                |
| FFE tap 2        | 0.030                 | 0.030                 | 0.026                 | 0.026                 |
| FFE tap 3        | -0.077                | -0.076                | -0.061                | -0.061                |
| FFE tap 4        | 0.199                 | 0.198                 | 0.162                 | 0.162                 |
| FFE tap 5        | -0.492                | -0.491                | -0.421                | -0.422                |
| FFE tap 6        | 1.146                 | 1.147                 | 1.014                 | 1.017                 |
| FFE tap 7        | 0.109                 | 0.108                 | 0.378                 | 0.371                 |
| FFE tap 8        | 0.045                 | 0.040                 | 0.057                 | 0.042                 |
| FFE tap 9        | -0.406                | -0.406                | -0.251                | -0.257                |
| FFE tap 10       | 0.053                 | 0.054                 | -0.032                | -0.026                |
| DFE tap 1        | 0.565                 | 0.561                 | 0.791                 | 0.778                 |
| DFE tap 2        | 0.170                 | 0.165                 | 0.338                 | 0.317                 |
| DFE tap 3        | -0.344                | -0.347                | -0.161                | -0.174                |
| FFE output noise | $45\,\text{mV}_{rms}$ | $45\,\text{mV}_{rms}$ | $74\,\text{mV}_{rms}$ | $74\,\text{mV}_{rms}$ |
| Residual ISI     | $19\,\text{mV}_{rms}$ | $19\,\text{mV}_{rms}$ | $41\,\text{mV}_{rms}$ | $41\,\text{mV}_{rms}$ |
| MSE              | $49\,\text{mV}_{rms}$ | $49\,\text{mV}_{rms}$ | $85\,\text{mV}_{rms}$ | $85\,\text{mV}_{rms}$ |

noise along with the signal, known as noise boosting. Pushing more equalization toward DFE happens to reduce the excessive output noise since the DFE, in contrast to FFE, does not suffer from noise boosting. This shift of more equalization to DFE, however, prompts error propagation as will be explained later.

### III. JITTER NOISE MODEL AND SAMPLER LOCATION

The sampling noise introduced by the clock jitter depends on the sampler location in the receiver chain. Due to advancements in DSP technologies, current state-of-the-art wireline receivers use an ADC to sample and quantize the signal after the CTLE [4], [5], [6], [9], [10], [11], [12], [13]. Then all the time-domain equalizations including FFE and DFE are implemented in the DSP. Nevertheless, we characterize both pre-FFE sampling and post-FFE sampling approaches. This is to extend the applicability of the analysis to implementations where for various reasons, such as cost optimization and increasing ADC dynamic range utilization, FFE is partially or entirely implemented outside the DSP and in the analog





Fig. 9. Equalized pulse response for two different noise powers at the equalizer input, a) $30\text{mV}_{rms}$, b) $60\text{mV}_{rms}$.

continuous-time front-end [4], [5], [14], [15]. In the former case, noise from the sampling jitter is added to the signal before the equalizer (see Fig. 3) and introduces a new component to the noise correlation matrix. In the case of post-FFE sampling, as shown in Fig. 10a, the sampling occurs after the equalizer, where the continuous-time FFE (CT-FFE) has already shaped the pulse response. Since the translation of jitter to voltage noise depends on the shape of the pulse response, the sampler noise power can be controlled by the equalizer. However, this noise is added after the equalizer, whereas the developed MMSE equations accommodate noise added before the FFE/DFE. By input-referring the sampler noise of the CT-FFE, we will be able to use the MMSE equations to optimize the equalizer while taking post-FFE sampler noise into account.

### A. Pre-FFE Sampling

Considering h(t) from (12), the received signal at the input of the sampler (CTLE output) in Fig. 3 is:

$$v(t) = \sum_{i} a_i h(t - iT) \tag{22}$$

where  $a_i$  represents transmitted data. With a random jitter of  $\Delta_k$  at sampling time t = kT, the sampler output will be:

$$v_{\Delta}(k) = v(kT + \Delta_k) = \sum_{i} a_i h((k-i)T + \Delta_k) \tag{23}$$

Approximating (23) with its first two terms of the Taylor series expansion we have:

$$\begin{aligned} v_{\Delta}(k) &\approx \sum_{i} a_{i}h((k-i)T) + \Delta_{k} \sum_{i} a_{i}h'((k-i)T) \\ &= v_{k} + n_{k} \end{aligned} \tag{24}$$

In the above equation, h'(.) denotes the derivative of the pulse response,  $v_k$  is the ideal sample value, and  $n_k$  is the noise introduced by sampler jitter at t = kT. Considering  $h'_{k-i} = \frac{dh(t)}{dt}|_{t=(k-i)T}$  we have:

$$n_k = n(kT) = \Delta_k \sum_i a_i h'_{k-i} \tag{25}$$

Since the noise samples at the FFE multipliers input are delayed versions of  $n_k$ , we have:

$$\eta_m(k) = n_{k-m+1} = \Delta_{k-m+1} \sum_i a_i h'_{k-i-m+1} \tag{26}$$

Now the noise correlation matrix can be calculated by incorporating (26) into (3) and considering the jitter as a stationary process as follows:

$$\mathbf{M}_{mn} = E\left\{\eta_m\left(k\right) \times \eta_n\left(k\right)\right\} = E\left\{n_k \times n_{k-(n-m)}\right\} \tag{27}$$

Continuing calculations for the right-hand side of (27), we have:

$$\begin{aligned} E\{n_{k} n_{k-l}\} &= E\left\{ \left[ \Delta_{k} \sum_{i} a_{i} h'_{k-i} \right] \left[ \Delta_{k-l} \sum_{j} a_{j} h'_{k-l-j} \right] \right\} \\ &= E\left\{ \Delta_{k} \Delta_{k-l} \sum_{i} \sum_{j} a_{i} a_{j} h'_{k-i} h'_{k-l-j} \right\} \\ &= E\left\{ \Delta_{k} \Delta_{k-l} \right\} \sum_{i} \sum_{j} E\left\{ a_{i} a_{j} \right\} h'_{k-i} h'_{k-l-j} \end{aligned} \tag{28}$$

where:

$$l = n - m \tag{29}$$

For i.i.d source symbols  $a_i$ , and random uncorrelated jitter  $\Delta_k$  (later we will discuss correlated jitter), we have:

$$E\left\{a_i a_j\right\} = \begin{cases} \sigma_a^2 & i = j\\ 0 & else \end{cases} \tag{30}$$

$$E\left\{\Delta_{k}\Delta_{k-l}\right\} = \begin{cases} \sigma_{J}^{2} & l = 0\\ 0 & else \end{cases} \tag{31}$$

where  $\sigma_a^2$  and  $\sigma_J^2$  are i.i.d source power and random jitter power, respectively. So, (28) can be simplified as:

$$E\{n_k n_{k-l}\} = \begin{cases} \sigma_J^2 \sigma_a^2 \sum_i {h_i'}^2 & l = 0\\ 0 & else \end{cases} \tag{32}$$

The above equation indicates that when sampling is performed before the FFE with uncorrelated sampling clock jitter, the noise introduced by the translation of sampling jitter to voltage will also be uncorrelated (which will not hold for the case of post-FFE sampling).

Fig. 10. Continuous-time FFE model: a) sampled at the equalizer output, b) equivalent model for jitter analysis.

Finally, for the uncorrelated noise, the noise correlation matrix can be created similar to what was done in (20) for the quantization noise, in which  $\sigma_n^2$  is:

$$\sigma_n^2 = \sigma_J^2 \sigma_a^2 \sum_i {h_i'}^2 \tag{33}$$

where  $\sigma_n^2$  is the power of the voltage noise introduced by translating the sampling clock jitter to voltage noise, considering h(t) as the pulse response at the sampler location.
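The uncorrelated-noise result of (32)-(33) lends itself to a quick Monte Carlo check. The sketch below assumes a short, hypothetical set of baud-spaced derivative samples `h_prime` and a jitter standard deviation of 50 mUI (both illustrative, not taken from the paper), and compares the simulated autocorrelation of $n_k$ with the closed-form power.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical baud-spaced pulse-response derivatives h'_i (per UI)
h_prime = np.array([0.05, 0.70, -0.10, -0.45, -0.20, 0.05])
sigma_j = 0.05                                   # jitter std in UI (assumed)
levels = np.array([-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0])
sigma_a2 = np.mean(levels**2)                    # i.i.d. PAM-4 symbol power

n_sym = 200_000
a = rng.choice(levels, size=n_sym)
delta = rng.normal(0.0, sigma_j, size=n_sym)     # uncorrelated jitter, eq. (31)

# n_k = Delta_k * sum_i a_i h'_{k-i}, eq. (25)
slope = np.convolve(a, h_prime, mode="full")[:n_sym]
n = delta * slope


def autocorr(x, lag):
    """Sample estimate of E{x_k x_{k-lag}}."""
    return np.mean(x[lag:] * x[: len(x) - lag])


sigma_n2_formula = sigma_j**2 * sigma_a2 * np.sum(h_prime**2)   # eq. (33)
r = np.array([autocorr(n, lag) for lag in range(4)])
print("closed form, lag 0:", sigma_n2_formula)
print("simulated, lags 0..3:", r)
```

The lag-0 estimate should agree with (33) to within the Monte Carlo error, while the estimates at non-zero lags should be close to zero.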

In the case of having correlated clock jitter, following (28), the noise correlation matrix becomes:

$$\mathbf{M}_{mn} = E\{\Delta_k \Delta_{k-l}\} \sigma_a^2 \sum_i h_i' h_{i-l}', \qquad l = m - n \tag{34}$$

Incorporating (34) into (5)-(6) provides us with the optimum equalizer parameters in the presence of sampling clock jitter. Also, the final contribution of jitter-induced noise at the slicer input (FFE output) can be calculated using (10).
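As an illustration of (34), the sketch below builds the pre-FFE noise correlation matrix for a given jitter autocorrelation. The function name, the derivative samples, and the AR(1)-style jitter autocorrelation $\sigma_J^2 \rho^{|l|}$ are assumptions made only for this example; the paper does not prescribe a particular correlated-jitter model here.

```python
import numpy as np


def jitter_noise_matrix_pre_ffe(h_prime, r_jitter, sigma_a2, n_taps):
    """Noise correlation matrix of eq. (34) for pre-FFE sampling.

    h_prime  : baud-spaced pulse-response derivatives h'_i
    r_jitter : callable, r_jitter(l) = E{Delta_k Delta_{k-l}}
    """
    h_prime = np.asarray(h_prime, dtype=float)
    # Autocorrelation sum_i h'_i h'_{i-l}; zero lag sits at index len - 1
    acf = np.correlate(h_prime, h_prime, mode="full")
    zero_lag = len(h_prime) - 1
    M = np.zeros((n_taps, n_taps))
    for m in range(n_taps):
        for n in range(n_taps):
            l = m - n
            r_h = acf[zero_lag + l] if abs(l) <= zero_lag else 0.0
            M[m, n] = r_jitter(l) * sigma_a2 * r_h
    return M


sigma_j, rho = 0.05, 0.6       # hypothetical jitter std (UI) and AR(1) factor
h_prime = [0.05, 0.70, -0.10, -0.45, -0.20, 0.05]   # hypothetical derivatives
sigma_a2 = 5.0 / 9.0           # PAM-4 symbol power


def r_white(l):
    """Uncorrelated jitter, eq. (31)."""
    return sigma_j**2 if l == 0 else 0.0


def r_ar1(l):
    """Assumed correlated jitter model (illustrative only)."""
    return sigma_j**2 * rho ** abs(l)


M_white = jitter_noise_matrix_pre_ffe(h_prime, r_white, sigma_a2, n_taps=5)
M_corr = jitter_noise_matrix_pre_ffe(h_prime, r_ar1, sigma_a2, n_taps=5)
print(np.round(M_white, 6))
print(np.round(M_corr, 6))
```

With white jitter the matrix collapses to the diagonal form implied by (32), while the correlated model fills in the off-diagonal entries without changing the diagonal.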

### B. Post-FFE Sampling

In the case of a CT-FFE implementation, where the sampling occurs after a continuous-time linear combiner's summation node, one can first derive the pulse response at the output of the CT-FFE. Then (33) can be used to determine jitter-induced noise power. The noise obtained in this manner can significantly differ from when the jitter is translated to the voltage noise at the CT-FFE input. As an extreme case example, one can consider a fully equalized pulse response at the CT-FFE output for which the derivatives at sampling points are zero. Hence the resultant jitter-induced noise power would be zero. So, it makes sense to discriminate between pre-FFE and post-FFE sampling approaches when trying to find optimum equalizer parameters.

Post-FFE sampling is illustrated in Fig. 10(a). In this case, although the jitter-induced noise is added after the CT-FFE, it is still affected by the equalizer parameters. On the other hand, in MMSE equations, noise sources are characterized within the noise correlation matrix, **M**, the elements of which are the cross-correlations between noise samples at the input of the CT-FFE multipliers. So, we should find an input-referred equivalent for the jitter noise.

To push the noise source into the optimization chain, the sampler in Fig. 10(a) can conceptually be moved to the input of the CT-FFE multipliers, as shown in Fig. 10(b). Note that an exact replica of the sampling clock triggers all the samplers in Fig. 10(b). Also, signals at the input of samplers are delayed versions of v(t) in (22). Therefore, applying the same steps as (23) and (24) on these signals, we can calculate the noise at the input of multiplier m as:

$$\eta_m(k) = \Delta_k \sum_i a_i h'_{k-i-m+1} \tag{35}$$

The main difference between (26) and (35) is that in the latter the jitter term, $\Delta_k$, is independent of m. Considering the jitter as a stationary process, substituting (35) into (27) and applying (29):

$$\begin{aligned} \mathbf{M}_{mn} &= E\left\{ \left[ \Delta_k \sum_{i} a_i h'_{k-i} \right] \left[ \Delta_k \sum_{j} a_j h'_{k-l-j} \right] \right\} \\ &= E\left\{ \Delta_k \Delta_k \right\} \sum_{i} \sum_{j} E\left\{ a_i a_j \right\} h'_{k-i} h'_{k-l-j} \end{aligned} \tag{36}$$

Finally, by taking (30) and (31) into account,  $\mathbf{M}_{mn}$  is calculated as:

$$\mathbf{M}_{mn} = \sigma_J^2 \sigma_a^2 \sum_i h_i' h_{i-l}', \qquad l = m - n \tag{37}$$

The final equation shows that in the case of post-FFE sampling, the jitter-induced noise can be transferred to the equalizer input. The power of this input-referred noise is the same as that of the jitter-induced noise in the case of pre-FFE sampling, but the noise samples are correlated through the overall pulse response of the link, h(n) (even if the sampling clock jitter is assumed to be uncorrelated). Also, (36) indicates that the jitter correlations of the sampling clock source ( $E\{\Delta_k\Delta_{k-l}\}$ ) do not contribute to the construction of **M** (and consequently to the power of the noise induced by the sampler). In other words, (37) remains intact whether the sampling clock jitter is correlated or not. This is in contrast with (34) for the case of pre-FFE sampling, in which non-zero values of $E\{\Delta_k\Delta_{k-l}\}$ for $l \neq 0$ alter the elements of the noise correlation matrix.
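A short consistency check links (37) to the output-referred view described at the start of this subsection. It assumes that the output noise power of (10) has the quadratic form $\mathbf{w}^T\mathbf{M}\mathbf{w}$ (an assumption here, since (10) is not repeated in this section) and introduces the symbol $g'_j$ only for this check. Summing (35) over the FFE taps gives:

$$\sum_m w_m \eta_m(k) = \Delta_k \sum_i a_i g'_{k-i}, \qquad g'_j = \sum_m w_m h'_{j-m+1}$$

Applying (30) and $E\{\Delta_k^2\} = \sigma_J^2$, with the jitter independent of the data, the output noise power becomes:

$$\sigma_{n,out}^2 = \mathbf{w}^T \mathbf{M} \mathbf{w} = \sigma_J^2 \sigma_a^2 \sum_j {g'_j}^2$$

Since $g'_j$ is the baud-spaced derivative of the pulse response at the CT-FFE output, this is (33) evaluated on the equalized pulse. In particular, it vanishes when all derivatives of the equalized pulse at the sampling points are zero, which matches the extreme case discussed earlier.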

TABLE IV
TEST PULSE RESPONSE NORMALIZED DERIVATIVES (dV/dt/baud)\*

| Cursor          | 1     | 2     | 3   | 4     | 5     |
|-----------------|-------|-------|-----|-------|-------|
| Derivatives[mV] | 4.2   | 65.1  | 763 | -16.4 | -509  |
| Cursor          | 6     | 7     | 8   | 9     | 10    |
| Derivatives[mV] | -238  | 35.2  | 5.3 | -97.2 | 50.5  |
| Cursor          | 11    | 12    | 13  | 14    | 15    |
| Derivatives[mV] | -23.5 | -31   | 25  | -17.1 | -10.6 |
| Cursor          | 16    | 17    | 18  | 19    | 20    |
| Derivatives[mV] | -11.1 | -12.2 | 1.3 | 2.2   | -4.7  |

<sup>\*</sup>To obtain absolute derivatives, these values must be scaled by baud-rate (56Gsps).

### C. Models Validation

Earlier, we justified the accuracy of MMSE equations in obtaining optimum equalizer tap weights. So, here we just demonstrate the accuracy of our jitter models by comparing the analytical calculation of jitter-noise against the time-domain simulation results.

Again, we use the pulse response of Fig. 5 in a PAM-4 modulation scheme. The corresponding pulse derivatives at the sampling points (normalized to the baud rate) are listed in Table IV. Assuming a random jitter standard deviation of $\sigma_J = 100\,\text{mUI}$ and the pulse response derivatives of Table IV, the standard deviation of the noise introduced by sampling clock jitter at the output of the CTLE (input of the equalizer) can be calculated using (33). With the PAM-4 constellation of [-1, -1/3, 1/3, 1], (33) results in $\sigma_n = 71.5\,\text{mV}_{rms}$.

Considering a 5-tap FFE after the CTLE with taps of [-0.075, 0.229, -0.574, 1.386, -0.523], the jitter-induced noise power is calculated for the two cases of pre- and post-FFE sampling. For pre-FFE sampling, the noise correlation matrix, $\mathbf{M}$, is created by incorporating $\sigma_n = 71.5\,\text{mV}_{rms}$ into (20). Then, using (10), the contribution of jitter noise at the FFE output is calculated as $\sigma_{n,out} = 114.9\,\text{mV}_{rms}$.

For post-FFE sampling, **M** is created by substituting $\sigma_J = 100\,\text{mUI}$, $\sigma_a^2 = 5/9$, and the values listed in Table IV into (37). In this case, (10) results in $\sigma_{n,out} = 88\,\text{mV}_{rms}$.

To justify the obtained values for  $\sigma_{n,out}$ , the corresponding time-domain model for each scenario (pre- and post-FFE sampling) is realized in MATLAB SIMULINK. Then, the FFE output signal is compared to a reference jitter-less system to obtain the error component due to the jitter, and the RMS value of the error is used to validate the calculation results. The error signal has a general form of the steady-state part of Figure 8, but with different magnitude. According to the time-domain simulation results in MATLAB SIMULINK, an error of 113.8mV<sub>rms</sub> and 89.1mV<sub>rms</sub> is obtained for the case of pre- and post-FFE sampling, respectively, which are closely matched to the calculated values for  $\sigma_{n,out}$  using derived equations (114.9mV<sub>rms</sub> and 88mV<sub>rms</sub>).
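The analytical numbers above can be re-derived from Table IV with a few lines of code. This is a sketch rather than the authors' script: it assumes that (10) evaluates the output noise power as $\mathbf{w}^T\mathbf{M}\mathbf{w}$ and that (20) yields a diagonal matrix $\sigma_n^2\mathbf{I}$ for uncorrelated noise, neither of which is restated in this section.

```python
import numpy as np

# Table IV: normalized pulse-response derivatives, in mV per UI
h_prime_mv = np.array([
    4.2, 65.1, 763.0, -16.4, -509.0,
    -238.0, 35.2, 5.3, -97.2, 50.5,
    -23.5, -31.0, 25.0, -17.1, -10.6,
    -11.1, -12.2, 1.3, 2.2, -4.7,
])
w = np.array([-0.075, 0.229, -0.574, 1.386, -0.523])   # 5-tap FFE from the text
sigma_j = 0.1                                           # 100 mUI
levels = np.array([-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0])
sigma_a2 = np.mean(levels**2)                           # PAM-4 symbol power, 5/9

# Eq. (33): jitter-induced noise at the sampler (CTLE output), in mV rms
sigma_n = sigma_j * np.sqrt(sigma_a2 * np.sum(h_prime_mv**2))

# Pre-FFE sampling: uncorrelated noise, M = sigma_n^2 * I (assumed form of (20))
M_pre = sigma_n**2 * np.eye(len(w))

# Post-FFE sampling: Toeplitz M from eq. (37), with l = m - n
acf = np.correlate(h_prime_mv, h_prime_mv, mode="full")
zero_lag = len(h_prime_mv) - 1
idx = np.arange(len(w))
lags = idx[:, None] - idx[None, :]
M_post = sigma_j**2 * sigma_a2 * acf[zero_lag + lags]

# Assumed quadratic form for the output noise power of eq. (10)
sigma_out_pre = np.sqrt(w @ M_pre @ w)
sigma_out_post = np.sqrt(w @ M_post @ w)
print(f"sigma_n         = {sigma_n:.1f} mV rms")
print(f"pre-FFE output  = {sigma_out_pre:.1f} mV rms")
print(f"post-FFE output = {sigma_out_post:.1f} mV rms")
```

Because the table entries and tap weights are rounded, the computed values should be expected to land close to, rather than exactly on, the quoted 71.5 mV, 114.9 mV, and 88 mV.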

### IV. APPLICATION TO SOME DESIGN VARIANTS

### A. Pre-Determined (PD) MMSE

MMSE-DFE directly provides the DFE tap values that result in minimum noise and ISI power (maximum SNR). However, in DFE-based receivers, the translation of SNR to BER is not straightforward, since an erroneous decision made by the DFE can propagate in the DFE feedback loop and result in a burst of errors [16], referred to as DFE error propagation. Error propagation can significantly impact performance, and it worsens as the DFE tap size increases because stronger feedback is more likely to preserve the error in the loop. Therefore, if the DFE taps obtained from MMSE-DFE become too large, the equalizer would not be at its overall optimum operating condition.

Analysis of the exact relationship between DFE tap size and the system BER in the presence of error propagation is beyond the scope of this paper [16], [17]. However, studies show that in most cases keeping the DFE tap size bounded to certain values is enough to minimize the impact of error propagation [5], [18]. Although we cannot bound the tap size using closed-form MMSE equations, we will show how this goal can be achieved with a minimal number of iterations.

With MMSE equations, we can control the exact value of DFE taps. To do so, the FFE needs to be optimized to generate a certain number of post-cursors with predetermined values. This can be achieved by altering the desired FFE output vector  $\mathbf{h}_{\delta}$ . As mentioned before, the  $\delta^{th}$  element of  $\mathbf{h}_{\delta}$  corresponds to the output pulse main cursor, and anything afterward is considered a post-cursor. So, specific post-cursor values can be achieved by setting the corresponding elements of  $\mathbf{h}_{\delta}$  to the desired values as:

$$\mathbf{h}_{\delta}^{\mathbf{T}} = [0, \dots, 0, 1, b_1, b_2, \dots, b_M, 0, \dots] \tag{38}$$

where $b_i$ denotes the corresponding value of the $i^{th}$ post-cursor. Incorporating (38) into the MMSE equations provides the MMSE solution constrained to outputting a pulse response with a specific set of post-cursors. We refer to the above method as Pre-Determined post-cursor MMSE (PD-MMSE). Note that in this context, we use the terms "DFE tap" and "post-cursor" interchangeably, as the DFE tap values are equal to the pulse response post-cursors at the input of the DFE.
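The construction of the target vector in (38) can be sketched in code. Since (5)-(6) are not repeated in this section, the example substitutes a generic Wiener (regularized least-squares) solution that minimizes $\sigma_a^2\|\mathbf{C}\mathbf{w}-\mathbf{h}_\delta\|^2 + \mathbf{w}^T\mathbf{M}\mathbf{w}$; its scaling may differ from the paper's equations. The pulse samples, function names, and the target post-cursor of 0.5 are hypothetical.

```python
import numpy as np


def conv_matrix(h, n_taps):
    """Convolution matrix C so that C @ w is the pulse at the FFE output."""
    h = np.asarray(h, dtype=float)
    C = np.zeros((len(h) + n_taps - 1, n_taps))
    for m in range(n_taps):
        C[m : m + len(h), m] = h
    return C


def pd_mmse(h, n_taps, delta, post_cursors, sigma_a2, M):
    """FFE taps for a target pulse with pre-determined post-cursors, eq. (38).

    Uses a generic Wiener solution as a stand-in for (5)-(6).
    delta is the 0-based index of the main cursor at the FFE output.
    """
    C = conv_matrix(h, n_taps)
    h_delta = np.zeros(C.shape[0])
    h_delta[delta] = 1.0                                   # main cursor
    h_delta[delta + 1 : delta + 1 + len(post_cursors)] = post_cursors
    A = sigma_a2 * C.T @ C + M
    w = np.linalg.solve(A, sigma_a2 * C.T @ h_delta)
    residual = C @ w - h_delta                             # residual ISI
    mse = sigma_a2 * residual @ residual + w @ M @ w       # ISI + noise power
    return w, h_delta, mse


h = [0.08, 0.55, 1.0, 0.62, 0.28, 0.10]   # hypothetical pulse, main cursor at index 2
n_taps = 5
delta = 2 + 2                              # pulse main cursor + FFE main tap (0-based)
sigma_a2 = 5.0 / 9.0                       # PAM-4 symbol power
M = (0.03**2) * np.eye(n_taps)             # white noise, 30 mV rms at the FFE input
w, h_delta, mse = pd_mmse(h, n_taps, delta, [0.5], sigma_a2, M)
print("target :", h_delta)
print("w      :", np.round(w, 3))
print("MSE    :", mse)
```

Passing `post_cursors=[1.0]` to the same function would target a 1 + D response, and a longer list would pre-determine several DFE taps at once.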

Fig. 11 shows the variation of SNR versus DFE tap size for a 5-tap FFE and 1-tap DFE equalizer where the FFE tap values are found using PD-MMSE.<sup>4</sup> The optimum tap size of 0.65 is equal to the value that is directly calculated using the MMSE-DFE method. According to the SNR profile in Fig. 11, the SNR increases monotonically as the DFE tap moves from 0 to the optimum tap size of 0.65. As this is the common behaviour for any single-tap DFE system, one can conclude that presetting the DFE tap to any specific value less than the optimum size implicitly corresponds to bounding the DFE tap size. With the above observations, the process of designing an equalizer that includes a single-tap DFE with consideration of error propagation is summarized in the flowchart of Fig. 12. Precise determination of the DFE tap threshold in Fig. 12 requires post-FEC BER analysis [17]; however, [19] lists practical values for this threshold for various applications/standards developed for different channel losses.

 $^4$ For all remaining examples in this paper we have used the pulse response of Fig. 5 considering  $30\text{mV}_{rms}$  noise at equalizer input and PAM-4 modulation.
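The single-tap sweep and bounding step can be illustrated as follows. This is one possible reading of the procedure, not a transcription of the flowchart in Fig. 12: it uses a hypothetical pulse response, a generic Wiener solution as a stand-in for (5)-(6), an assumed SNR definition of $\sigma_a^2/\text{MSE}$, and a hypothetical tap limit of 0.4.

```python
import numpy as np

h = np.array([0.08, 0.55, 1.0, 0.62, 0.28, 0.10])  # hypothetical pulse response
n_taps, delta = 5, 4             # 5-tap FFE, 0-based main-cursor index at the output
sigma_a2 = 5.0 / 9.0             # PAM-4 symbol power
M = (0.03**2) * np.eye(n_taps)   # white noise, 30 mV rms at the FFE input

C = np.zeros((len(h) + n_taps - 1, n_taps))
for m in range(n_taps):
    C[m : m + len(h), m] = h


def solve(C_used, target):
    """Generic Wiener solution (stand-in for (5)-(6)) and its MSE."""
    A = sigma_a2 * C_used.T @ C_used + M
    w = np.linalg.solve(A, sigma_a2 * C_used.T @ target)
    r = C_used @ w - target
    return w, sigma_a2 * r @ r + w @ M @ w


base = np.zeros(C.shape[0])
base[delta] = 1.0

# Unconstrained MMSE-DFE: the first post-cursor is left free (its row is zeroed)
C_free = C.copy()
C_free[delta + 1, :] = 0.0
w_opt, mse_opt = solve(C_free, base)
b_opt = (C @ w_opt)[delta + 1]   # DFE tap = first post-cursor at the FFE output

# PD-MMSE sweep of the preset DFE tap
sweep = np.linspace(0.0, 1.0, 101)
mse = np.empty_like(sweep)
for n, b in enumerate(sweep):
    target = base.copy()
    target[delta + 1] = b
    _, mse[n] = solve(C, target)
snr_db = 10.0 * np.log10(sigma_a2 / mse)   # assumed SNR definition

# Bounding step: keep the MMSE-DFE tap unless it exceeds the (hypothetical) limit
tap_limit = 0.4
b_used = float(np.clip(b_opt, -tap_limit, tap_limit))
target = base.copy()
target[delta + 1] = b_used
w_used, mse_used = solve(C, target)
print(f"optimum DFE tap {b_opt:.3f}, bounded tap {b_used:.3f}")
print(f"SNR loss from bounding: {10.0 * np.log10(mse_used / mse_opt):.2f} dB")
```

Under this formulation the MSE is a convex quadratic function of the preset tap, which is why the SNR rises monotonically up to the MMSE-DFE optimum and a single clipping step is sufficient for one DFE tap.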

TABLE V
EQUALIZER PARAMETERS CONSIDERING THE JITTER EFFECT AND LOCATION OF THE SAMPLER

|                   | FFE taps |      |      |       | DFE taps |       |       |       | $\sigma_{\mathbf{n},\mathbf{Out}}$ |       |                          |
|-------------------|----------|------|------|-------|----------|-------|-------|-------|------------------------------------|-------|--------------------------|
| Pre-FFE Sampling  | -0.26    | 0.84 | 0.72 | -0.14 | -0.32    | -1    | -0.23 | 0.247 | 0.06                               | -0.05 | $84.7 \text{mV}_{rms}$   |
| Post-FFE Sampling | -0.29    | 0.93 | 0.48 | -0.04 | -0.20    | -0.85 | -0.28 | 0.12  | -0.02                              | -0.06 | $90.0 \mathrm{mV}_{rms}$ |



Fig. 11. SNR vs. DFE tap using PD-MMSE.

### B. Partially Pre-Determined (PPD) MMSE

Although finding the optimum performance condition using PD-MMSE is relatively straightforward for one DFE tap, as soon as the number of DFE taps goes beyond one, it grows in complexity because of the need for multi-dimensional sweep of tap values to find a combination of taps that meets both the MMSE criterion and tap size limits.

For a multi-tap DFE system, instead of sweeping all taps using the PD-MMSE method directly, it is possible to change the magnitude of all taps by manipulating the size of only one of the DFE taps. For example, we can pre-determine the 1<sup>st</sup> DFE tap (assuming it is the largest tap) and let the optimum value of the remaining tap(s) be determined using an approach similar to that applied for MMSE-DFE design. This way, the combination of the predetermined tap and the calculated one(s) always meets the MMSE criterion (constrained to the predetermined tap). To do so, the $(\delta + 1)^{th}$ element of $\mathbf{h}_{\delta}$ in (38) is set to a value in the sweep range of the 1<sup>st</sup> DFE tap. Simultaneously, the rows of **C** corresponding to the other DFE taps (taps two and higher) are set to zero to construct $\mathbf{C}_M$. This process is demonstrated in Fig. 13 for a 4-tap FFE and 3-tap DFE system. We call this method Partially Pre-Determined post-cursor MMSE (PPD-MMSE).
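The row-zeroing step can be sketched for a 4-tap FFE and 3-tap DFE. The example assumes a hypothetical pulse response and again uses a generic Wiener solution in place of (5)-(6); the function name `ppd_mmse` and all numeric values are illustrative.

```python
import numpy as np


def ppd_mmse(h, n_taps, delta, first_tap, n_dfe, sigma_a2, M):
    """PPD-MMSE sketch: 1st DFE tap preset, remaining DFE taps left free.

    delta is the 0-based main-cursor index at the FFE output.
    """
    h = np.asarray(h, dtype=float)
    C = np.zeros((len(h) + n_taps - 1, n_taps))
    for m in range(n_taps):
        C[m : m + len(h), m] = h
    h_delta = np.zeros(C.shape[0])
    h_delta[delta] = 1.0
    h_delta[delta + 1] = first_tap                  # pre-determined 1st DFE tap
    C_M = C.copy()
    C_M[delta + 2 : delta + 1 + n_dfe, :] = 0.0     # rows of free DFE taps 2..n_dfe
    A = sigma_a2 * C_M.T @ C_M + M
    w = np.linalg.solve(A, sigma_a2 * C_M.T @ h_delta)
    out = C @ w                                     # equalized pulse at FFE output
    dfe = np.concatenate(([first_tap], out[delta + 2 : delta + 1 + n_dfe]))
    r = C_M @ w - h_delta
    mse = sigma_a2 * r @ r + w @ M @ w
    return w, dfe, mse


h = [0.05, 0.40, 1.0, 0.70, 0.45, 0.25, 0.12, 0.05]   # hypothetical pulse response
sigma_a2 = 5.0 / 9.0
M = (0.03**2) * np.eye(4)

sweep = np.linspace(0.0, 1.0, 21)
results = [ppd_mmse(h, 4, 4, b1, 3, sigma_a2, M) for b1 in sweep]
mses = np.array([r[2] for r in results])
taps23 = np.array([r[1][1:] for r in results])        # resulting 2nd and 3rd taps
best = int(np.argmin(mses))
print(f"best preset 1st tap in sweep: {sweep[best]:.2f}")
print("2nd/3rd taps at that point  :", np.round(taps23[best], 3))
```

Only a one-dimensional sweep is needed, because taps two and three are re-optimized for every value of the first tap.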

Fig. 14 illustrates the result of applying PPD-MMSE to a 10-tap FFE and 3-tap DFE equalizer example in which the 1<sup>st</sup> DFE tap is swept from 0 to 1, and the corresponding values of the remaining taps and the SNR are calculated and plotted. This plot shows that in this example the 2<sup>nd</sup> and 3<sup>rd</sup> taps are reduced by decreasing the 1<sup>st</sup> tap. In other words, by sweeping the 1<sup>st</sup> tap, we can indirectly modify the magnitude of the remaining taps. However, there are crossover points for small values of the 1<sup>st</sup> tap, where the magnitudes of the 2<sup>nd</sup> and 3<sup>rd</sup> taps become larger than that of the 1<sup>st</sup> tap. Fig. 14 also shows that, as with PD-MMSE, the SNR increases monotonically as the size of the 1<sup>st</sup> tap changes from 0 to its optimum value of 0.87. Again, we may conclude that presetting the 1<sup>st</sup> DFE tap to any specific value less than the optimum size implicitly corresponds to bounding the 1<sup>st</sup> DFE tap size, and by avoiding the aforementioned crossover points this can be extended to bounding all remaining DFE taps. However, in practice, the limiting magnitudes for the 2<sup>nd</sup> and 3<sup>rd</sup> taps are not necessarily the same as that of the 1<sup>st</sup> one, and are usually smaller [19]. So, it is possible that applying PPD-MMSE to the 1<sup>st</sup> tap does not result in a situation in which all DFE taps meet their limits. For example, let us assume taps 1-3 should be limited to 0.4, 0.3 and 0.2, respectively. According to Fig. 14, no optimum combination of DFE taps meets these limits simultaneously: limiting the 1<sup>st</sup> tap to 0.4 results in optimum values of 0 and -0.43 for taps 2 and 3 (the 3<sup>rd</sup> tap exceeding its limit). In this case, PPD-MMSE can be extended to limit the 1<sup>st</sup> and 3<sup>rd</sup> tap magnitudes to 0.4 and 0.2, respectively, which results in an optimum value of 0.16 for the 2<sup>nd</sup> tap, meeting the requirement.

Fig. 12. Bounding single-tap DFE size for error propagation considerations.

TABLE VI
10-TAP FFE, 3-TAP DFE EQUALIZER PARAMETERS USING MMSE-DFE

| FFE taps         |      |       |       |       |       | DFE taps |      |       |
|------------------|------|-------|-------|-------|-------|----------|------|-------|
| $w_1 - w_5$ :    | 0.02 | -0.07 | 0.18  | -0.43 | 1.00  | 0.87     | 0.37 | -0.21 |
| $w_6 - w_{10}$ : | 0.45 | 0.10  | -0.36 | 0.05  | -0.06 |          |      |       |

It should be noted that in the above demonstration we applied the predetermination to the 1<sup>st</sup> DFE tap because in a properly designed equalizer with multi-tap DFE, the 1<sup>st</sup> tap often has the highest magnitude and the most contribution to error propagation. Even if the CTLE over-equalizes the input signal, which may result in a pulse response with other post-cursors greater than the 1<sup>st</sup> one, this over-equalization will be cancelled by the FFE preceding the DFE. Nevertheless, in extreme cases, same approach can be applied to predetermine the most significant DFE tap first. Also, there is no limit on the number of DFE taps when applying PPD-MMSE. The selected combination of FFE-DFE tap is just an example with a moderate complexity to demonstrate the process.

Finally, the tap values for the MMSE-optimized 10-tap FFE and 3-tap DFE equalizer are calculated using MMSE-DFE and listed in Table VI. The matching between the DFE tap values in Table VI and the numbers annotated in Fig. 14 at maximum SNR (minimum MSE) indicates that PPD-MMSE indeed provides the optimum MMSE solution constrained to the size of the 1<sup>st</sup> DFE tap. Note that for this example, this optimum result is achieved while avoiding a 3-dimensional sweep.

Fig. 13. Modification of C matrix in PPD MMSE method.

Fig. 14. Variation of DFE taps and SNR in PPD-MMSE.

An alternative to DFE in wireline receivers is the application of Maximum Likelihood Sequence Estimation (MLSE) [20], [21], [22], [23], as shown in Fig. 15. In this case we look for an optimum partially equalized pulse response. MLSE enhances the received signal SNR at the expense of complexity. In general, MLSE can be of higher order and flexibility, with a target of the form $1 + \alpha D + \beta D^2 + \dots$, where $\alpha$, $\beta$, ... denote the 1<sup>st</sup>, 2<sup>nd</sup>, ... post-cursors of the input pulse to the sequence detector, respectively. However, due to the rapid increase in the complexity of higher-order MLSE, DFE may still be the preferred choice when several noticeable post-cursors remain. A commonly used MLSE configuration is 1 + D, in which the input pulse to the sequence detector is equalized to have only one post-cursor with the same magnitude as the main cursor. Although finding the optimum MLSE configuration is beyond the scope of this paper, PD-MMSE can be used for designing the equalizer in front of the MLSE, targeting a specific set of post-cursors. We applied this method to design an example 5-tap FFE followed by a 1 + D MLSE, considering the channel of Fig. 5, PAM-4 signalling, and $30\,\text{mV}_{rms}$ noise at the FFE input. Calculations are straightforward and the results are summarized in Table VII. In this example, because a post-cursor as large as the main one is targeted, two FFE taps (the 3<sup>rd</sup> and 4<sup>th</sup> taps) have relatively large values. Even though the 1+D MLSE may not be as good as the optimum $1+\alpha D$, it is still preferred in terms of implementation complexity.

TABLE VII

| FFE taps |        |       |       |        | $\alpha$ |
|----------|--------|-------|-------|--------|----------|
| 0.085    | -0.314 | 0.805 | 0.856 | -0.520 | 1        |

### C. FFE Tap-Skipping

It is well known that a DFE does not suffer from the noise enhancement of an FFE [24]. It has also been demonstrated that at high data rates, the DFE feedback loop imposes a timing bottleneck that is difficult to overcome in a straightforward implementation. As a result, speculative DFEs have been widely used in high-speed wireline receivers [5], [9], [11], [12], [13]. The problem then, however, is that the implementation complexity grows exponentially with the number of taps. As a result, most implementations have converged to a combination of a short DFE and a long FFE with an overlap window of post-cursor locations.

Given the limited available hardware resources and the noise advantage of the DFE, it may be wise in some applications to leave the overlapped cursors to the DFE and re-target the FFE resources to equalize the cursors outside this window. For example, in a system containing a 4-tap FFE followed by a 1-tap DFE, we can extend the FFE window to 5 taps while the tap corresponding to the first post-cursor is set to zero (still having only four multipliers). We call this method FFE tap-skipping and its optimization process Tap-Skipping MMSE-DFE (TS MMSE-DFE).

According to (9), having the $k^{th}$ element of $\mathbf{w}$ equal to zero eliminates the contribution of the $k^{th}$ column of $\mathbf{C}$ to the total noise power. A similar argument applies to the $k^{th}$ column and row of the noise correlation matrix, given by (10). So, to mimic having zero FFE coefficients, we define $\mathbf{w}_M$, $\mathbf{C}_M$ and $\mathbf{M}_M$ as follows: $\mathbf{w}_M$ is a truncated version of the FFE taps, in which the taps that overlap in position with the DFE taps are removed. $\mathbf{C}_M$ is a reduced-size version of $\mathbf{C}$ in which the columns corresponding to the overlapping taps are removed and the rows corresponding to the overlapping taps are set to zero. Finally, $\mathbf{M}_M$ is created by removing both the columns and rows of $\mathbf{M}$ that correspond to the overlapping taps. In Fig. 16 and Fig. 17, this process is illustrated for an example 4-tap FFE and 1-tap DFE equalizer, where $\mathbf{w}_3$ is considered as the FFE main tap. Here, the FFE window size is extended to 5 taps by inserting a new tap of zero at the position of $\mathbf{w}_4$ (corresponding to the first post-cursor). With these modifications, (5)-(7) can be used to obtain $\mathbf{w}_{M,opt}$. Finally, $\mathbf{w}$ can be reconstructed by embedding zeros back in the locations of the overlapping taps.

Fig. 15. MLSE-based wireline transceiver.

TABLE VIII
FFE TAP SKIPPING VS CONVENTIONAL MMSE-DFE

|             | FFE taps |        |      |        |        | DFE tap | Error                   | SNR (dB) |
|-------------|----------|--------|------|--------|--------|---------|-------------------------|----------|
| TS MMSE-DFE | 0.130    | -0.454 | 1.16 | 0      | -0.196 | -0.389  | $125 \mathrm{mV}_{rms}$ | 15.5     |
| MMSE-DFE    | 0.147    | -0.517 | 1.33 | -0.426 | N/A    | -0.102  | $148 \mathrm{mV}_{rms}$ | 14.1     |

Fig. 16. Modification of C and w in FFE tap skipping.

Fig. 17. Modification of M in FFE tap skipping.
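The construction of $\mathbf{w}_M$, $\mathbf{C}_M$ and $\mathbf{M}_M$ can be sketched as follows for a 5-position FFE window with four active multipliers and a 1-tap DFE. The pulse response and the function name `ts_mmse_dfe` are hypothetical, and a generic Wiener solution is used as a stand-in for (5)-(7).

```python
import numpy as np


def ts_mmse_dfe(h, n_window, skip, delta, n_dfe, sigma_a2, M):
    """Tap-skipping sketch: FFE window of n_window positions, taps in `skip` forced to 0.

    skip  : 0-based FFE tap positions that overlap the DFE and are removed
    delta : 0-based main-cursor index at the FFE output
    """
    h = np.asarray(h, dtype=float)
    C = np.zeros((len(h) + n_window - 1, n_window))
    for m in range(n_window):
        C[m : m + len(h), m] = h
    keep = np.setdiff1d(np.arange(n_window), skip)
    C_M = C[:, keep].copy()                        # remove columns of skipped taps
    C_M[delta + 1 : delta + 1 + n_dfe, :] = 0.0    # rows handled by the DFE set to zero
    M_M = M[np.ix_(keep, keep)]                    # remove rows and columns of M
    h_delta = np.zeros(C.shape[0])
    h_delta[delta] = 1.0
    A = sigma_a2 * C_M.T @ C_M + M_M
    w_M = np.linalg.solve(A, sigma_a2 * C_M.T @ h_delta)
    w = np.zeros(n_window)
    w[keep] = w_M                                  # embed zeros back
    dfe = (C @ w)[delta + 1 : delta + 1 + n_dfe]   # post-cursors left to the DFE
    r = C_M @ w_M - h_delta
    mse = sigma_a2 * r @ r + w_M @ M_M @ w_M
    return w, dfe, mse


h = [0.05, 0.40, 1.0, 0.70, 0.45, 0.25, 0.12, 0.05]   # hypothetical pulse response
sigma_a2 = 5.0 / 9.0
M = (0.03**2) * np.eye(5)
# Main tap w_3 (index 2), skipped tap w_4 (index 3), pulse main cursor at index 2
w_ts, dfe_ts, mse_ts = ts_mmse_dfe(h, 5, np.array([3]), 2 + 2, 1, sigma_a2, M)
# Conventional 4-tap FFE: same window with the last position unused
w_cv, dfe_cv, mse_cv = ts_mmse_dfe(h, 5, np.array([4]), 2 + 2, 1, sigma_a2, M)
print("tap-skipping taps:", np.round(w_ts, 3), "MSE:", mse_ts)
print("conventional taps:", np.round(w_cv, 3), "MSE:", mse_cv)
```

Which of the two configurations gives the lower MSE depends on the channel, which is the point made in the comparison that follows.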

Table VIII summarizes the results of applying TS MMSE-DFE to a 4-tap FFE and 1-tap DFE for the pulse response of Fig. 5 with $30\text{mV}_{rms}$ noise at the equalizer input. Table VIII also contains the results for a conventional equalizer implementation that has overlap between the FFE and DFE taps, designed using the MMSE-DFE method. Here, the FFE main tap is kept fixed at 3 for both cases. Comparing the results shows a 1.4dB improvement in SNR with almost no additional hardware (same number of active taps and only one more delay stage). This improvement has been achieved because in this case extending the reach of the FFE taps by one covers one more noticeable cursor according to Fig. 5.

The effectiveness of TS MMSE-DFE is not always guaranteed. In fact, by applying TS MMSE-DFE, we slightly lose control of the post-cursor ISI in favour of extending the FFE window by adding intermediate zero taps. Depending on the channel characteristics, extending the FFE window may sometimes not offer enough improvement to compensate for this loss of ISI control. To make this clearer, Fig. 18 shows the variation of SNR versus FFE length for the equalizer of the above example, where sharp SNR transitions are observed when moving from a 3-tap to a 4-tap FFE and from a 6-tap to a 7-tap FFE. Following the above explanation, we should expect a higher SNR improvement when applying TS MMSE-DFE to the 3-tap and 6-tap FFEs compared to the other cases. This is consistent with the results shown in Table IX, which summarizes a comparison between MMSE-DFE and TS MMSE-DFE while sweeping the number of FFE taps. As expected, Table IX shows the highest SNR improvement of 2.2dB (from 16.1dB to 18.3dB) when applying TS MMSE-DFE to the 6-tap FFE. Note that in this case the resultant SNR of 18.3dB lands between 16.1dB and 18.5dB (the SNRs of the conventional 6-tap and 7-tap FFEs). Fig. 18 also shows that the sharp SNR transitions happen for shorter FFEs, and the plot of SNR vs FFE length flattens as the FFE length increases. So, we should generally expect a higher SNR gain from TS MMSE-DFE for shorter FFEs, as long as those sharp SNR transitions are within the scope of the available FFE hardware (number of available multipliers). However, depending on the channel characteristics, flat transitions can sometimes happen in the short-FFE region of the SNR plot. This happens to be the case for our example when moving from a 5-tap to a 6-tap FFE in Fig. 18. As a result, according to Table IX, for the case of the 5-tap FFE there is no SNR improvement when applying tap skipping. In fact, in this particular situation, there is even a slight 0.1dB degradation due to the loss of control over post-cursor ISI.

Fig. 18. Variation of SNR vs FFE length for the equalizer with 1-tap DFE.

TABLE IX
MMSE-DFE VS TS MMSE-DFE

| FFE taps | DFE taps | MMSE-DFE main tap | MMSE-DFE SNR (dB) | TS MMSE-DFE main tap | TS MMSE-DFE SNR (dB) |
|----------|----------|-------------------|-------------------|----------------------|----------------------|
| 3        | 1        | 2                 | 12.7              | 2                    | 14                   |
| 4        | 1        | 2                 | 14.7              | 3                    | 15.5                 |
| 5        | 1        | 3                 | 15.9              | 4                    | 15.8                 |
| 6        | 1        | 4                 | 16.1              | 3                    | 18.3                 |
| 7        | 1        | 3                 | 18.5              | 4                    | 19                   |
| 10       | 1        | 3                 | 20.6              | 4                    | 20.8                 |
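The per-length SNR gain discussed above can be tabulated directly from the entries of Table IX. The short sketch below only performs that subtraction; the dictionary names are illustrative.

```python
# Table IX: SNR in dB with a 1-tap DFE, keyed by the number of FFE taps
snr_mmse_dfe = {3: 12.7, 4: 14.7, 5: 15.9, 6: 16.1, 7: 18.5, 10: 20.6}
snr_ts = {3: 14.0, 4: 15.5, 5: 15.8, 6: 18.3, 7: 19.0, 10: 20.8}

# Gain of tap skipping over the conventional design, rounded to table precision
gain_db = {n: round(snr_ts[n] - snr_mmse_dfe[n], 1) for n in snr_mmse_dfe}
best = max(gain_db, key=gain_db.get)

for n_ffe, gain in gain_db.items():
    print(f"{n_ffe:2d}-tap FFE: {gain:+.1f} dB")
print(f"largest gain with the {best}-tap FFE")
```

The largest gains appear for the 3-tap and 6-tap FFEs, the two lengths that sit just before the sharp SNR transitions in Fig. 18, while the 5-tap case shows a small loss.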

### V. CONCLUSION

This paper presents a set of analytical equations to calculate optimum FFE/DFE parameters in a wireline receiver. The equations can co-optimize the FFE and DFE, taking into consideration the noise colouring effect as well as sampling jitter. Two separate approaches were presented to consider the sampling clock jitter, one for pre-FFE sampling for discrete-time FFEs, and another for post-FFE sampling for continuous-time FFEs. The equations were further manipulated, and methods were proposed for different design variants of wireline receivers. PD-MMSE can design equalizers with pre-determined post-cursors at the FFE output (or predetermined DFE taps). The main applications of this method are FFE designs optimized to work with a fixed-tap DFE or MLSE. PD-MMSE can also be used to limit the tap sizes of the DFE to reasonable ranges dictated by DFE error propagation. The final method, TS MMSE-DFE, can enhance equalizer performance where limited hardware resources (FFE multipliers) are available. It was shown that, depending on the channel condition, the equalizer performance can be improved by removing the overlap between the FFE and DFE taps and reusing the hardware to extend the FFE window size. Finally, it should be emphasized that the method proposed in this work is not meant to replace the LMS adaptation loop, which is typically employed in the final product; rather, it provides a quick way to calculate the adaptation results for fast and easy system-level design, simulation, and verification.

### REFERENCES

- [1] R. D. Gitlin, J. F. Hayes, and S. B. Weinstein, *Principles of Data Communication*. New York, NY, USA: Plenum, 1992.
- [2] R. Johnson, P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas, "Blind equalization using the constant modulus criterion: A review," *Proc. IEEE*, vol. 86, no. 10, pp. 1927–1950, Dec. 1998.
- [3] C. R. Johnson et al., "On fractionally-spaced equalizer design for digital microwave radio channels," in *Proc. Conf. Rec. 29th Asilomar Conf. Signals, Syst. Comput.*, vol. 1, 1995, pp. 290–294, doi: 10.1109/ACSSC.1995.540558.
- [4] S. Pavan, "Power and area-efficient adaptive equalization at microwave frequencies," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 6, pp. 1412–1420, Jul. 2008, doi: 10.1109/TCSI.2008.918149.
- [5] H. Lin et al., "ADC-DSP-based 10-to-112-Gb/s multi-standard receiver in 7-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 56, no. 4, pp. 1265–1277, Apr. 2021, doi: 10.1109/JSSC.2021.3051109.
- [6] J. Bailey et al., "A 112-Gb/s PAM-4 low-power nine-tap sliding-block DFE in a 7-nm FinFET wireline receiver," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 32–43, Jan. 2022, doi: 10.1109/JSSC.2021.3109167.
- [7] S. Kiran et al., "Modeling of ADC-based serial link receivers with embedded and digital equalization," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 9, no. 3, pp. 536–548, Mar. 2019, doi: 10.1109/TCPMT.2018.2853080.
- [8] S. Kiran et al., "A 56 GHz receiver analog front end for 224 Gb/s PAM-4 SerDes in 10 nm CMOS," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2021, pp. 1–2, doi: 10.23919/VLSICircuits52068.2021.9492471.
- [9] Y. Segal et al., "A 1.41 pJ/b 224 Gb/s PAM-4 SerDes receiver with 31 dB loss compensation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2022, pp. 114–116, doi: 10.1109/ISSCC42614.2022.9731794.
- [10] M.-A. LaCroix et al., "A 116 Gb/s DSP-based wireline transceiver in 7 nm CMOS achieving 6 pJ/b at 45 dB loss in PAM-4/duo-PAM-4 and 52 dB in PAM-2," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2021, pp. 132–134, doi: 10.1109/ISSCC42613.2021.9366030.
- [11] D. Xu et al., "A scalable adaptive ADC/DSP-based 1.25-to-56 Gbps/112 Gbps high-speed transceiver architecture using decision-directed MMSE CDR in 16 nm and 7 nm," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2021, pp. 134–136, doi: 10.1109/ISSCC42613.2021.9366063.
- [12] Z. Guo et al., "A 112.5 Gb/s ADC-DSP-based PAM-4 long-reach transceiver with >50 dB channel loss in 5 nm Fin-FET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2022, pp. 116–118, doi: 10.1109/ISSCC42614.2022.9731650.
- [13] H. Park et al., "A 4.63 pJ/b 112 Gb/s DSP-based PAM-4 transceiver for a large-scale switch in 5 nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2023, pp. 5–7, doi: 10.1109/ISSCC42615.2023.10067613.
- [14] J. Sewter and A. C. Carusone, "A 3-tap FIR filter with cascaded distributed tap amplifiers for equalization up to 40 Gb/s in 0.18-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1919–1929, Aug. 2006, doi: 10.1109/JSSC.2006.875293.
- [15] O. Eshet, A. Ran, A. Mezer, Y. Hadar, D. Lazar, and M. Moyal, "An adaptive 4-tap analog FIR equalizer for 10-Gb/s over backplane serial link receiver," in *Proc. 34th Eur. Solid-State Circuits Conf. (ESSCIRC)*, Edinburgh, U.K., Sep. 2008, pp. 178–181, doi: 10.1109/ESSCIRC.2008.4681821.
- [16] W. Feller, *An Introduction to Probability Theory and Its Applications*, vol. 1, 2nd ed. New York, NY, USA: Wiley, 1957.
- [17] M. Yang, S. Shahramian, H. Shakiba, H. Wong, P. Krotnev, and A. C. Carusone, "Statistical BER analysis of wireline links with nonbinary linear block codes subject to DFE error propagation," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 1, pp. 284–297, Jan. 2020, doi: 10.1109/TCSI.2019.2943569.
- [18] Y. C. Lu, H. Wong, and D. Tonietto, "DFE error propagation characteristics in real 56 Gbps PAM4 high-speed links with pre-coding and impact on the FEC performance," in *Proc. DesignCon*, 2017, pp. 9–10.
- [19] Optical Internetworking Forum (OIF). (Dec. 2022). *Common Electrical I/O (CEI)—Electrical and Jitter Interoperability Agreements for 6G+ bps, 11G+ bps, 25G+ bps, 56G+ bps and 112G+ bps I/O*. Accessed: Jun. 20, 2023. [Online]. Available: https://www.oiforum.com/wp-content/uploads/OIF-CEI-5.0.pdf
- [20] G. Forney, "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," *IEEE Trans. Inf. Theory*, vol. IT-18, no. 3, pp. 363–378, May 1972, doi: 10.1109/TIT.1972.1054829.
- [21] G. D. Forney, "The Viterbi algorithm," *Proc. IEEE*, vol. 61, no. 3, pp. 268–278, Mar. 1973.
- [22] M. H. Shakiba, "Analog Viterbi detection for partial-response signaling," Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, ON, Canada, 1997.
- [23] P. J. Black and T. H.-Y. Meng, "A 1-Gb/s, four-state, sliding block Viterbi decoder," *IEEE J. Solid-State Circuits*, vol. 32, no. 6, pp. 797–805, Jun. 1997.
- [24] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, "Equalizers for high-speed serial links," *Int. J. High Speed Electron. Syst.*, vol. 15, no. 2, pp. 429–458, Jun. 2005.



Alireza Akbarpour Bazargani (Graduate Student Member, IEEE) received the B.Sc. degree in electrical engineering from the K. N. Toosi University of Technology, Tehran, Iran, in 2014, and the M.Sc. degree in electrical engineering from the Amirkabir University of Technology (Tehran Polytechnic), Tehran, in 2017. He is currently pursuing the Ph.D. degree in electrical engineering with the University of Toronto, Toronto, ON, Canada, with a focus on high-speed integrated circuit design for wireline communications. From 2014 to 2017, he was with the Integrated Circuits Design Laboratory, Amirkabir University of Technology, where he conducted research on high-speed sigma-delta modulator design. His research interests include analog and mixed-signal integrated circuits, data converters, wireline communications, and biomedical circuits.



Hossein Shakiba (Senior Member, IEEE) received the B.Sc. and M.Sc. degrees in electrical engineering from the Department of Electrical and Computer Engineering, Isfahan University of Technology, Iran, in 1985 and 1989, respectively, and the Ph.D. degree in electrical engineering from the Department of Electrical and Computer Engineering, University of Toronto, Canada, in 1997. He has over 35 years of teaching, research, design, and management experience in analog circuit and system design for various applications, with a focus on wireline communication, in both industry and academia. He is currently collaborating with the wireline industry on the modelling, analysis, and standardization of emerging links. He is also involved in conducting research with universities and co-supervises several graduate students. He has received several awards, including the 1999 IEEE Darlington Award.



David A. Johns (Life Fellow, IEEE) received the B.A.Sc., M.A.Sc., and Ph.D. degrees from the University of Toronto, Canada, in 1980, 1983, and 1989, respectively. In 1988, he was hired at the University of Toronto, where he is currently a Full Professor. He has ongoing research programs in the general area of analog integrated circuits. His research work has resulted in more than 80 publications. In addition to his academic experience, he has spent a number of years in the semiconductor industry and was a Co-Founder of Snowbush Microelectronics, an IP company specializing in SerDes technology. In 2007, Gennum acquired Snowbush, and thanks to Snowbush's presence, Toronto has become one of the leading global hubs for SerDes innovation and activity. He received the 1999 IEEE Darlington Award. He served as a Guest Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS and an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS. His homepage is located at https://www.eecg.utoronto.ca/johns.