# An Adaptation Engine for a 2x Blind ADC-Based CDR in 65 nm CMOS

Behrooz Abiri, Ali Sheikholeslami, Senior Member, IEEE, Hirotaka Tamura, Senior Member, IEEE, and Masaya Kibune

Abstract—This paper proposes an adaptation engine for a $2\times$ blind sampling ADC-based receiver. The proposed adaptation engine uses a triangular desired waveform, instead of two fixed desired levels, to shape the equalizer output in spite of the blind nature of the sampling. The measured results confirm that the adaptation engine restores a 5 Gb/s eye subjected to 13 dB of attenuation at the Nyquist frequency to an equivalent of 320 mV of vertical opening. The receiver consumes 192 mW, of which 78 mW is used by the digital CDR.

Index Terms—Adaptation, ADC, blind sampling, CDR, DFE, equalizer.

#### I. INTRODUCTION

THE increasing demand for higher data rates through legacy backplane channels with limited bandwidth has introduced severe signal degradation due to inter-symbol interference (ISI) in the received signal. To recover data from this severely degraded signal, high equalization levels are required [1]. While analog equalization can be used in binary CDR's, as shown in Fig. 1(a), the use of an ADC as the sampler provides another layer of equalization in the digital domain. The combined analog and digital equalization can be used to recover data from higher-attenuation channels (Fig. 1(b)). Digital equalizers are easy to design and are portable across technology nodes because they can be implemented in RTL. In addition, digital equalizers consume less power as technology advances and are more robust to PVT variations.

As shown in Fig. 2, the sampling clock in ADC-based receivers could either track the phase of the incoming data by  $CK_{REC}$  or it could ignore the phase when a blind (asynchronous) clock,  $CK_{Blind}$ , is used. In a phase tracking system, as shown in Fig. 2(a), a digital phase detector compares the phase of the incoming data with the phase of the sampling clock. A low pass filter then sends digital control bits to a digitally-controlled oscillator (DCO) or a phase-interpolator (PI) in order to adjust the phase of the sampling clock [2]. In this system, there is a feedback loop containing both digital and analog components and, as a result, the delay of the feedback

Manuscript received April 27, 2011; revised July 19, 2011; accepted August 18, 2011. Date of publication October 28, 2011; date of current version November 23, 2011. This paper was approved by Guest Editor Miki Moyal.

B. Abiri and A. Sheikholeslami are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4 (e-mail: behrooz@eecg.utoronto.ca; ali@eecg.utoronto.ca).

H. Tamura and M. Kibune are with Fujitsu Laboratories Limited, Kawasaki 211-8588, Japan.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2011.2169183



Fig. 1. Block diagram of (a) a typical binary CDR and (b) an ADC-based CDR.



Fig. 2. ADC sampling method in digital CDR: (a) Phase-tracked clocking; (b) blind (asynchronous) sampling.

loop plays an important role in the stability of the system [3]. During the design, the delays of both the digital and the analog blocks in the loop must be taken into account, which complicates the mixed-signal design. On the other hand, a blind sampling CDR [4], as shown in Fig. 2(b), eliminates the feedback path and hence is unconditionally stable. This allows the ADC and the remaining digital building blocks to be designed independently.

As mentioned earlier, the main advantage of an ADC-based CDR is in the availability of extra equalization in the digital domain. This extra equalization can be done either as a feed-forward equalizer (FFE) or a decision-feedback equalizer (DFE). An FFE [5] boosts both the signal and the noise at high frequencies. This noise, in the case of ADC-based CDR, includes the ADC quantization noise that may limit the performance. A DFE for blind sampling CDR is proposed in [6] to address this noise enhancement. In [6], the DFE coefficients are obtained manually by measuring the pulse response of the channel and



Fig. 3. Complete block diagram of the blind sampling ADC-based receiver.



Fig. 4. Four-phase clock divider.

subtracting it from a desired pulse response, where the latter is defined so as not to contain any ISI. This approach, however, does not lend itself easily to adaptation unless the data communication is interrupted or initiated by a training sequence so as to obtain the channel pulse response. To overcome this limitation, we proposed in [7] an adaptive DFE in which the DFE coefficients are obtained during data transmission, i.e., without interruption by a training sequence. We have further integrated this adaptive DFE with the rest of the building blocks to demonstrate a complete receiver, as shown in Fig. 3.

We explain the details of the ADC design [5], the feed-forward CDR [4], and the DFE as it was presented in [6] in the background section. The details of the proposed adaptive DFE will be discussed in Section III, followed by simulation and measurement results in Section IV and conclusions in Section V.

## II. BACKGROUND

## A. ADC

Flash ADC's are known to have a higher conversion rate than other ADC architectures. The implemented CDR requires a sampling rate of 10 GS/s, which is provided by four time-interleaved 5-bit flash ADC's, each sampling at 2.5 GS/s. The ADC sampling clocks are generated by a 4-phase clock divider (Fig. 4), which is driven by an external 5 GHz clock source. The divider is the only CML component in the system.
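The rates quoted above are mutually consistent, as the following arithmetic suggests. The 100 ps sample spacing is derived here rather than stated in the text, and it assumes the four interleaved ADC's sample at evenly spaced phases:

$$
f_s = 4 \times 2.5\ \text{GS/s} = 10\ \text{GS/s} = 2 \times 5\ \text{Gb/s}, \qquad T_s = \frac{1}{f_s} = 100\ \text{ps} = \tfrac{1}{2}\ \text{UI}.
$$

Under the same assumption, each ADC sees a 400 ps clock period, with the four phases offset from one another by 100 ps.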

While time interleaving increases the aggregate sampling rate, it reduces the input bandwidth of the ADC as it increases the input capacitance of the ADC. To reduce the input capacitance of each ADC, an interpolating flash ADC [8] was used in this design to reduce the number of pre-amplifiers (PA) that load the input node. Fig. 5 shows an overall block diagram of



Fig. 5. Interpolating flash ADC (PA: pre-amplifier, L: latch).



Fig. 6. Clocked amplifier used in ADC pre-amplifier.

an interpolating flash ADC. The PA's amplify the difference between the input signal and the reference voltages. For a typical 5-bit flash ADC, a total of 31 PA's are required at the front-end. In this interpolating design, we instead use a total of 17 PA's, relying on a resistive ladder to generate the remaining 14 levels. The PA and resistive-ladder outputs are then latched and sent to a thermometer-to-binary encoder.
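The comparator count and the final encoding step can be illustrated with a short behavioral sketch. It assumes an ideal comparator bank with uniformly spaced references over a hypothetical input range of -1 to 1, and it uses a plain ones-counter as the encoder; the paper does not describe its encoder, so `thermometer_code` and `thermometer_to_binary` are illustrative names only.

```python
N_BITS = 5
N_LEVELS = 2**N_BITS - 1           # 31 decision levels for a 5-bit flash ADC
N_PA = 17                          # pre-amplifiers used in the described design
N_INTERPOLATED = N_LEVELS - N_PA   # levels obtained by interpolation


def thermometer_code(vin, vmin=-1.0, vmax=1.0):
    """Ideal comparator bank: bit k is 1 when vin exceeds reference k.

    The input range and uniform reference spacing are assumptions.
    """
    step = (vmax - vmin) / (N_LEVELS + 1)
    refs = [vmin + step * (k + 1) for k in range(N_LEVELS)]
    return [1 if vin > r else 0 for r in refs]


def thermometer_to_binary(code):
    """Ones-counting encoder: the output is the number of asserted comparators."""
    return sum(code)
```

A ones-counter is only one possible encoder; it is shown here because it tolerates isolated bubble errors in the thermometer code, not because the source states that it was used.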

It is desirable for a PA to have a high gain, as this reduces the effect of latch offset and the probability of metastability [9]. The gain offered by a continuous-time PA is not sufficient for high-speed applications due to the inherent trade-off posed by the gain-bandwidth product of the PA [10]. Instead, we use a Strong-Arm regenerative PA, as shown in Fig. 6, in which the overdrive recovery is improved by resetting the previous state of the amplifier.

In an interpolating flash ADC, the PA's must be linear; otherwise the interpolated values will not correspond to the correct intermediate reference voltages. The implemented regenerative PA has a high gain and its output will easily enter into a nonlinear region. To demonstrate this point, Fig. 7(a) shows the outputs of two adjacent PA's, PA(N) and PA(N + 2), when the input voltage lies between their two input reference voltages, but is closer to  $V_{ref}(N)$ . As a result, the output of PA(N),  $V_O(N)$ , has a smaller slope magnitude compared to that of PA(N+2),  $V_O(N+2)$ . The difference in slope causes the interpolated voltage,  $V'_{OI}(N+1)$ , to initially become negative (which is the expected correct value), then move towards zero, as the



Fig. 7. Reference voltage generation by interpolation (a) without the interpolating amplifier; (b) with the interpolating amplifier.



Fig. 8. Proposed interpolating flash ADC with clocked pre-amplifier (PA), interpolating amplifier (IA), and Latch (L).

outputs of the PA's saturate. This would be an incorrect interpolated value and may send the following latch into a metastable state.

To overcome this problem, we have added another regenerative amplifier, denoted by IA in Fig. 8, with its sampling aperture occurring after the aperture of the PA's and before their outputs settle. The timing diagram of this modified structure is shown in Fig. 7(b). The IA performs interpolation by amplifying the transient output difference of two PA's while it is still valid. The same clock that triggers the PA's also triggers the IA's. The amplifying window of the IA's is delayed with respect to that of the PA's by reducing the size of $M_1$ and $M_2$ (Fig. 6) in the IA's relative to the corresponding transistors in the PA's.

## B. Feed-Forward Blind-Sampling CDR

A feed-forward blind sampling CDR was implemented similar to that in [4]. Fig. 9(a) shows a simplified block diagram of the CDR, where the ADC samples the data at twice the data rate and a digital phase detector calculates the sampling phase of the ADC with respect to the incoming data. Fig. 9(b) shows the method of phase recovery using linear interpolation [4]. The samples are first arranged in groups of three, with one sample being shared between two adjacent groups. The position of a possible zero-crossing with respect to the first sample of the group, $\Phi_X$, is calculated using linear interpolation. A digital low-pass filter averages these instantaneous zero-crossings and produces an average phase, $\Phi_{AVG}$. Depending on $\Phi_X$ and $\Phi_{AVG}$, the sliced value of one of S<sub>0</sub>, S<sub>1</sub>, or S<sub>2</sub> is selected as the recovered bit.
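A minimal sketch of the interpolation step, assuming samples spaced 0.5 UI apart and a straight line drawn between adjacent samples. The function names are illustrative and not taken from [4]; returning `None` for a group without a sign change is also an assumption, and the low-pass filtering that produces $\Phi_{AVG}$ is not modeled.

```python
SAMPLE_SPACING_UI = 0.5  # 2x sampling: adjacent samples are half a UI apart


def crossing_fraction(a, b):
    """Fraction (0..1) of the way from sample a to sample b where the
    straight line through them crosses zero, or None if there is no crossing."""
    if a == b or a * b > 0:
        return None
    return abs(a) / (abs(a) + abs(b))


def zero_crossing_phase(s0, s1, s2):
    """Position of the first zero-crossing in a three-sample group,
    measured in UI from the first sample (the role played by Phi_X)."""
    for index, (a, b) in enumerate(((s0, s1), (s1, s2))):
        frac = crossing_fraction(a, b)
        if frac is not None:
            return SAMPLE_SPACING_UI * (index + frac)
    return None
```

For example, samples of -3, 1, and 5 place the crossing three quarters of the way from the first to the second sample, i.e. 0.375 UI after the first sample.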

The linear interpolation in the PD requires smooth data transitions for accurate phase recovery. While a 5 dB or more loss in typical channels is sufficient for this purpose, an anti-aliasing filter [11] has to be integrated with the receiver for shorter channels or when the CDR operates at lower data rates where channel attenuation drops significantly.

A frequency offset between the transmitter data rate and the receiver sampling rate will cause the samples to drift in the UI. Whenever the sampling phase moves one UI forward (backward), one sample needs to be inserted (dropped). The CDR produces a signal, $N_{valid}$, which is sent to the FIFO to add or drop the extra bit (refer to Fig. 3). In this paper, the FIFO data is read out at the exact rate of the incoming data and, hence, the FIFO never overflows or underflows. In a commercial product, flow control in the data link layer is needed to adjust data throughput so that the FIFO does not overflow or underflow.
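To give a sense of how often such an insert or drop event occurs, consider a fractional frequency offset $\delta$ between the transmitter and the receiver, so that the sampling phase drifts by $\delta$ UI per UI. The 50 ppm value below is the offset used in the simulations of Section IV; treating the drift as uniform is a simplifying assumption.

$$
N_{slip} = \frac{1}{\delta} = \frac{1}{50 \times 10^{-6}} = 2 \times 10^{4}\ \text{UI}, \qquad t_{slip} = N_{slip} \times 200\ \text{ps} = 4\ \mu\text{s}.
$$

At 5 Gb/s, this corresponds to roughly one added or dropped bit every 4 $\mu$s.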



Fig. 9. (a) Block diagram of a feed-forward blind sampling CDR and (b) phase  $(\Phi_X)$  recovery from digital samples.



Fig. 10. (a) DFE in phase tracking CDR. (b) DFE in a blind sampling CDR.

## C. DFE for Blind-Sampling CDR

The structure of digital DFE depends on the sampling scheme. In a phase tracking CDR, the sampling is performed at the eye-center of incoming data. This fixed sampling phase implies that the main cursor and the first post-cursor ISI are fixed for a given channel, thus the ISI replica generation block is simply providing a constant DFE coefficient, as illustrated in Fig. 10(a). On the other hand, in a blind sampling CDR, as the sampling phase sweeps the UI, the value of main cursor and first post-cursor ISI change, as shown in Fig. 10(b). This implies that the ISI replica generation should take into account the sampling phase and dynamically change the DFE coefficients according to the sampling phase.

To address the variable DFE coefficients of different sampling phases, the authors in [6] propose dividing the UI into eight bins (as shown in Fig. 11(a)) and choosing an appropriate DFE coefficient from a look-up table based on where the sampling phase falls within one UI. Fig. 11(b) shows the simplified full-rate implementation of DFE for the CDR as proposed in [6]. As can be seen from this figure, two look-up tables produce the phase-dependent DFE coefficients for even and odd



Fig. 11. Selection of DFE coefficients based on sampling phase. (a) DFE coefficients shown on pulse response; (b) full-rate implementation using a look-up table [6].

samples based on $\Phi_{AVG}$. Since the samples are half a UI apart, the corresponding DFE coefficients are shifted by four in the look-up table. The CDR uses three-sample windows to calculate the sampling phase. The samples are arranged such that two of the samples correspond to the current UI $(b_n)$ while the other corresponds to the previous UI $(b_{n-1})$. Hence, for the implementation of a 1-tap DFE, both $b_{n-1}$ and $b_{n-2}$ are required to remove the first post-cursor ISI from $b_n$ and $b_{n-1}$, respectively.
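The look-up step can be sketched as follows. This is an illustration under stated assumptions rather than the circuit of [6]: the phase is taken as a fraction of a UI, the bins are uniform, and the assignment of the unshifted entry to the "even" sample (and the shifted one to the "odd" sample) is a guess at the convention.

```python
N_BINS = 8  # the UI is divided into eight phase bins


def phase_to_bin(phi_avg_ui):
    """Map an average phase, expressed in UI, to one of the eight bins."""
    return int((phi_avg_ui % 1.0) * N_BINS) % N_BINS


def select_coefficients(lut, phi_avg_ui):
    """Return the DFE coefficients for two samples half a UI apart.

    The second sample is 0.5 UI away from the first, which corresponds
    to a shift of four entries in an eight-entry table.
    """
    k = phase_to_bin(phi_avg_ui)
    return lut[k], lut[(k + N_BINS // 2) % N_BINS]
```

With a table indexed 0 to 7, an average phase of 0.3 UI falls in bin 2, so the two samples would use entries 2 and 6.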

The measurements in [6] show that the manual 1-tap DFE is only capable of equalizing up to 13.3 dB of attenuation. For typical channels with higher attenuation, a 2-tap DFE or a combination of the 1-tap DFE with a linear equalizer should be used. Theoretically, the DFE combined with the FFE presented in [5] is capable of equalizing channels with up to 28 dB of attenuation.

#### III. PROPOSED ADAPTATION ENGINE

Fig. 12 shows the block diagram of the conventional LMS adaptation engine for a phase-tracking CDR. In this diagram,  $rx_n$  represents the received signal at a discrete time n, which corresponds to the center of the  $n^{\text{th}}$  UI. Similarly,  $s_n$  and  $b_n$  represent the equalized signal and the recovered bits corresponding to the same  $n^{th}$  interval. The core of the adaptive engine consists of subtracting  $s_n$ , the equalized signal, from a reference level,  $d_{ref}$ , to produce an error signal,  $e_n$ . This error signal is then correlated with the previous recovered bit,  $b_{n-1}$ , to produce the DFE coefficient, c, for a 1-tap DFE.
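The loop described above can be sketched behaviorally. The model below is a toy: the channel has a main cursor `h0` and a single post-cursor `h1` (both hypothetical parameters), the data is random and noise-free, and the error is defined as $e_n = d_{ref} b_n - s_n$. The minus sign in the update follows from that error definition together with the assumption that the DFE subtracts $c \cdot b_{n-1}$; a different sign convention would flip it.

```python
import random


def adapt_one_tap_dfe(h0, h1, d_ref, gain=0.01, n_bits=5000, seed=1):
    """Adapt a single DFE coefficient c by correlating the error with the
    previous recovered bit (toy model, sampling at the eye center)."""
    rng = random.Random(seed)
    c = 0.0
    prev_tx = 1   # previous transmitted bit (sets the ISI)
    prev_rx = 1   # previous recovered bit (drives the DFE feedback)
    for _ in range(n_bits):
        bit = rng.choice((-1, 1))
        rx = h0 * bit + h1 * prev_tx       # received sample with one post-cursor
        s = rx - c * prev_rx               # equalized sample
        decision = 1 if s >= 0 else -1     # recovered bit
        e = d_ref * decision - s           # error against the reference level
        c -= gain * e * prev_rx            # LMS update of the coefficient
        prev_tx, prev_rx = bit, decision
    return c
```

In this model, calling `adapt_one_tap_dfe(1.0, 0.3, 1.0)` drives `c` toward the post-cursor value of 0.3, since the correlation of the error with the previous bit vanishes only when the residual ISI is zero.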

If we limit the channel to the one that produces only one post-cursor ISI, the  $rx_n$  can take one of four values as depicted in Fig. 13(a). After the ISI is removed, the equalized signal,  $s_n$ ,



Fig. 12. Conventional LMS adaptation engine in phase tracking CDR.



Fig. 13. One-tap DFE equalization in (a) phase tracking, and (b) blind sampling CDR. Note that in the blind case, the samples assume four levels corresponding to four different desired levels after equalization.

can only take one of two values. In fact, these two values are used as the reference voltage in Fig. 12. For a CDR with a blind clock, on the other hand, the choice of $d_{ref}$ is more complicated, as illustrated in Fig. 13(b). In this case, the sampling clock is not phase aligned to the center of the UI, and hence the equalized signal, at the sampling phase, may assume any of four possible values. The two values corresponding to no transition do not depend on the sampling phase, whereas the two that correspond to data transitions do. With transition filtering, the adaptation engine can use either set as the desired levels. However, in this design we only use the phase-dependent desired levels, because proper operation of the phase detector requires equalization of the edge samples. The phase-dependent desired levels provide a reference for the samples near the zero-crossings and thus can guide the adaptation engine to equalize those samples.

To accommodate these phase-dependent desired levels, we propose the modified LMS engine shown in Fig. 14. The $d_{ref}$ generator block in this diagram produces a desired level corresponding to the sampling phase. The only remaining problem is that the DFE coefficient has to change with the sampling



Fig. 14. Proposed LMS adaptation engine for blind-sampling CDR.

phase and if the adaptation speed is lower than the rate at which the sampling phase is changing, then the adaptation may not converge to its final value for that phase. To resolve this issue, eight registers are used to store the DFE coefficients as in [6] but updated dynamically. At each sampling phase, only the corresponding DFE coefficient will be updated. In this way, each coefficient will reach its final value corresponding to that sampling phase. This may require several passes of sampling phase through that phase bin.

Fig. 15 shows the detailed implementation of adaptation engine. To reduce adaptation area and power overhead, only two consecutive ADC samples,  $S_{8,9}$ , are used. Based on these samples and  $\Phi_{AVG}$ , the desired waveform generator block produces phase-dependent desired levels,  $d_{ref-1,2}$ , which correspond to sampling phases 1/2 UI apart.  $d_{ref-1,2}$  are then compared with corresponding equalized samples,  $S_{EQ-8,9}$ . The resulting errors are multiplied by adaptation loop gain, g, and the previous recovered bit. DFE coefficient updates are produced after a transition filtering that removes errors not corresponding to data transitions. Two 1:8 DMUX use  $\Phi_{AVG}$  to select two accumulators that store the corresponding DFE coefficients to be updated. The DFE coefficient select block (DCS) then selects the two DFE coefficients,  $c_{1,2}$ , that are used in the DFE adders.
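The per-bin update with transition filtering can be sketched as a toy simulation. Everything numerical here is hypothetical: each phase bin `k` is given its own main cursor `h0[k]`, post-cursor `h1[k]`, and desired level `d_ref[k]`; the transmitted bits stand in for the recovered bits; and the sampling phase simply dwells in each bin in turn. Only the structure (error times gain times previous bit, gated by a transition, accumulated into the register selected by the phase bin) follows the description above.

```python
import random

N_BINS = 8


def adapt_binned(h0, h1, d_ref, gain=0.02, passes=40, dwell=200, seed=2):
    """Adapt one DFE coefficient per phase bin, updating only on transitions."""
    rng = random.Random(seed)
    coeff = [0.0] * N_BINS               # one accumulator per phase bin
    prev = 1
    for _ in range(passes):              # several passes through the UI
        for k in range(N_BINS):          # sampling phase drifts bin by bin
            for _ in range(dwell):
                bit = rng.choice((-1, 1))
                s_eq = h0[k] * bit + (h1[k] - coeff[k]) * prev
                if bit != prev:          # transition filter
                    err = d_ref[k] * bit - s_eq
                    coeff[k] -= gain * err * prev   # update selected register only
                prev = bit
    return coeff
```

In this toy model each coefficient settles where the equalized transition sample, `h0[k] - h1[k] + coeff[k]`, equals the desired level for that bin, which is one way to read the statement that the desired waveform shapes the equalizer output.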

The shape of the desired waveform can be derived from an equalized eye by dividing UI into 8 bins and then averaging the samples that fall in each bin. One drawback of this averaging scheme is the extra hardware required to store and update the desired waveform. Another drawback is the formation of interacting adaptation and desired waveform generation loops which can cause unpredictable behavior. As an example, if the adaptation starts with zero initial conditions, the eye opening at the output of DFE would be small, producing in turn small desired levels. As a result, the adaptation will not be able to work properly and the eye opening will not improve.

Another way to produce the desired waveform is to use a fixed pre-defined shape with adjustable amplitude to accommodate different input power levels. A triangular waveform is a suitable candidate because it is consistent with the linear interpolation performed by the PD. In other words, if the adaptation converges perfectly so that the equalized eye becomes diamond-shaped, then the error in the PD due to linear interpolation should be minimal.

It is possible to merge the two methods described above to produce the desired levels. First, we let the engine adapt based on a pre-defined desired waveform and then switch to the averaging technique. Fig. 16 compares the performance of this combined approach against that of a triangular waveform only. It can



Fig. 15. Detailed block diagram of proposed adaptation engine.



Fig. 16. Simulated high-frequency jitter tolerance comparison of desired waveforms, generated based on triangular and averaging technique with (a) added random jitter (RJ) to receiver clock and (b) added random noise ( $V_n$ ) to the received signal. (Sinusoidal jitter frequency: 170 MHz, BER <  $10^{-6}$ , RJ and  $V_n$  have Gaussian distribution for BER <  $10^{-6}$ , used channel has 10 dB loss at Nyquist frequency.)

be seen that the receiver jitter tolerance is better with the averaging scheme whenever high levels of random noise or jitter are added to the received signal or the receiver clock, respectively.



Fig. 17. Block diagram of desired waveform generator.

In the actual implementation, we used a triangular desired waveform because of its simplicity and lower overhead compared to the averaging method. The desired waveform generator is shown in Fig. 17. Two dynamic look-up tables calculate the desired levels for the $2\times$ samples that are 1/2 UI apart, based on a stored triangular waveform and $\Phi_{AVG}$. The height of the triangular waveform is adjusted based on the incoming data amplitude. The ADC samples that are closer to the center of the eye are rectified and averaged to produce an approximation of the incoming data amplitude.
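A minimal sketch of such a generator is given below. It assumes that phase zero coincides with a data edge, that the triangle peaks at mid-UI, and that levels are evaluated at the centers of eight uniform bins; the paper does not give the stored table, so these choices and the function names are illustrative.

```python
N_BINS = 8


def triangular_level(bin_index, amplitude):
    """Desired magnitude at the center of a phase bin: small near the
    data edges and largest at mid-UI (phase 0 is assumed to be an edge)."""
    phase = (bin_index + 0.5) / N_BINS            # bin center in UI
    return amplitude * (1.0 - abs(2.0 * phase - 1.0))


def desired_levels(phase_bin, amplitude):
    """Desired levels for two samples half a UI (four bins) apart."""
    other = (phase_bin + N_BINS // 2) % N_BINS
    return triangular_level(phase_bin, amplitude), triangular_level(other, amplitude)


def estimate_amplitude(center_samples):
    """Rectify and average samples taken near the eye center."""
    return sum(abs(s) for s in center_samples) / len(center_samples)
```

For an amplitude of 8, a sample in bin 0 would get a desired level of 1 and its partner in bin 4 a level of 7. Under these assumptions the two levels of a pair always sum to the amplitude, as expected for a diamond-shaped eye.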

The limited bandwidth and nonlinearity of the analog front-end (AFE) and the quantization noise of the ADC may adversely affect the adaptation or equalization. The bandwidth limitation of the AFE can be absorbed into the channel loss, thus it will only reduce the equalization range of 1-tap DFE. Both nonlinearity and quantization noise can be represented with additive noise and as a result they can also reduce the equalization range as they degrade the received signal on top of ISI degradation. For a random bit sequence, the adaptation loop, however, remains almost unaffected because it finds the DFE coefficients by correlating equalization error with the previous bit and averages out any high speed uncorrelated variation caused by the quantization noise and nonlinearity.

#### IV. SIMULATION AND MEASUREMENT RESULTS

The channels used in measurement consist of two FR4 daughter cards with 5-inch traces each and a backplane with

TABLE I

COMPARISON OF BER, HORIZONTAL AND VERTICAL EYE OPENING, BEFORE AND AFTER ADAPTATION FOR THE CHANNELS WITH $S_{21}$ SHOWN IN FIG. 18(c)

| Channel att. (dB) | BER (before EQ) | Horiz. eye opening (before EQ) | Vert. eye opening (before EQ) | BER (after EQ) | Horiz. eye opening (after EQ) | Vert. eye opening (after EQ) |
|-------------------|-----------------|--------------------------------|-------------------------------|----------------|-------------------------------|------------------------------|
| 6.9               | $< 10^{-6}$     | 0.552 UI                       | 320 mV                        | $< 10^{-6}$    | 0.6185 UI                     | 288 mV                       |
| 9.4               | $< 10^{-6}$     | 0.474 UI                       | 224 mV                        | $< 10^{-6}$    | 0.6538 UI                     | 272 mV                       |
| 10.9              | $2.25 \times 10^{-4}$ | 0.365 UI                 | 96 mV                         | $< 10^{-6}$    | 0.568 UI                      | 200 mV                       |
| 12.4              | 0.0019          | 0.245 UI                       | 80 mV                         | $< 10^{-6}$    | 0.475 UI                      | 192 mV                       |
| 14.9              | 0.169           | 0                              | 0                             | 0.0046         | 0.176 UI                      | 40 mV                        |
| 19.8              | 0.381           | 0                              | 0                             | 0.164          | 0                             | 0                            |
| 22.9              | 0.46            | 0                              | 0                             | 0.323          | 0                             | 0                            |



Fig. 18. Channel insertion loss for (a) a 26-inch and (b) a 34-inch FR4 channel. (c)  $S_{21}$  of the channels used in simulation for Table I.

adjustable trace length. The total lengths of the FR4 channels are 26 inches and 34 inches, corresponding to insertion losses of 9.9 dB and 13.3 dB at the Nyquist frequency of 2.5 GHz (Fig. 18(a) and (b)).

The functional simulations were performed in Simulink using event-driven modeling [12] to increase simulation speed.



Fig. 19. Simulated jitter tolerance comparison of adaptive DFE (this work) versus manual DFE (based on [6]).

The pulse responses of the channels, extracted from measured S-parameters, were used in the simulation to emulate channel attenuation. The effect of the adaptive 1-tap DFE on the vertical and horizontal eye openings of the received signal and on the BER of the receiver for different channels is presented in Table I. Although the 1-tap DFE is not able to open the eye for the lossiest channels, the adaptation still improves the BER. Fig. 19 compares the simulated jitter tolerance of the receiver with the adaptive DFE (this work) against that with the manual DFE (based on [6]). In both simulations, the target BER is $10^{-6}$ (as contrasted with $10^{-12}$ in measurements) and PRBS7 is used. A frequency offset of 50 ppm is introduced between the receiver and transmitter clock frequencies to emulate blind sampling. In addition, Gaussian random jitter of 0.17 UI<sub>PP</sub> and 0.23 UI<sub>PP</sub> is introduced to the transmitter and the receiver clocks, respectively. The simulation results confirm that adaptation is achieved with little or no loss in performance (jitter tolerance) in the 34-inch channel.

To find the limit of adaptation, the manual DFE coefficients were swept for a given channel, and the sets of coefficients that reduced the receiver BER to less than $10^{-6}$ were compared to the adapted coefficients of the adaptive DFE. It was observed that the manual 1-tap DFE is able to reach the target BER for channels with up to 14.8 dB of attenuation, whereas at that attenuation the adaptive DFE, in spite of the convergence of its coefficients, was unable to achieve the target BER. Although the adaptive DFE falls behind the manual DFE by 1.5 dB, it automatically provides DFE coefficients that are otherwise quite time-consuming to find.

The receiver test chip was implemented in Fujitsu's 65 nm CMOS process. The die photo is shown in Fig. 20. The ADC





Fig. 21. Measurement setup.

and the digital CDR including all the test structures occupy an area of  $400 \times 490 \ \mu\text{m}^2$  and  $420 \times 640 \ \mu\text{m}^2$  respectively.
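As a quick consistency check, the two block areas add up to the total reported in the performance summary (Table II); the conversion to mm$^2$ and the sum are worked out here, not quoted from the text:

$$
400 \times 490\ \mu\text{m}^2 = 0.196\ \text{mm}^2, \qquad 420 \times 640\ \mu\text{m}^2 \approx 0.269\ \text{mm}^2, \qquad 0.196 + 0.269 \approx 0.46\ \text{mm}^2.
$$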

A simplified measurement setup is shown in Fig. 21. A Centellax board generating PRBS7 at 5 Gb/s was used as the data source. The output amplitude of the PRBS generator did not cover the ADC's input range; therefore, we used a wideband amplifier with a gain of 7 dB after the PRBS generator. Based on the on-chip PRBS checker, the receiver operates at 5 Gb/s with BER < $10^{-12}$.

Fig. 22 shows the reconstructed eye diagrams of received data before and after the 1-tap DFE equalization. A small frequency offset between the receiver and the transmitter was used so that the sampling points sweep the UI. The samples from the ADC and DFE were extracted and post-processed to produce the eye diagrams. For the 34-inch channel, the adaptive DFE is able to open the otherwise closed eye of the received data by 320 mV.

The learning curves of the DFE coefficients are shown in Fig. 23. Coefficients 1 to 4 are shown in the first row and coefficients 5 to 8 in the second row. It can be seen that the DFE coefficients converge in around 80 $\mu$s. The implemented adaptation engine uses 2 out of 16 ADC samples to perform the adaptation. The adaptation speed can be increased by utilizing more samples at the expense of more hardware and power consumption. Increasing the adaptation loop gain can also speed up the adaptation; however, this may cause the coefficients to drift whenever a non-random bit sequence is received.
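The adaptation time can be put in terms of unit intervals with a rough calculation. The second step assumes that each 16-sample block spans 8 UI at $2\times$ sampling, which is an inference from the sampling rate rather than a figure stated in the text:

$$
80\ \mu\text{s} \times 5\ \text{Gb/s} = 4 \times 10^{5}\ \text{UI}, \qquad \frac{4 \times 10^{5}\ \text{UI}}{8\ \text{UI/block}} = 5 \times 10^{4}\ \text{blocks}.
$$

Under that assumption, the engine sees at most about 50,000 sample pairs during convergence, before transition filtering, and these are spread over the eight phase bins.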

The measurement results of receiver jitter tolerance for BER < $10^{-12}$ are plotted and compared with simulation results in Fig. 24. Sinusoidal jitter was applied to the transmitted data by modulating the clock frequency of the PRBS board. An Agilent E8257D signal generator was used as the clock source; the maximum modulation frequency that this signal generator supports is 8 MHz, so the jitter tolerance measurement was limited to this frequency. It can be seen that at 8 MHz the receiver tolerates 0.29 UI<sub>PP</sub> and 0.2 UI<sub>PP</sub> of sinusoidal jitter



[Eye-diagram figure panel (a): 26-inch Tyco channel (-9.9 dB at 2.5 GHz). Legible annotations: 191 mV<sub>PP,diff</sub>, 94 ps (0.47 UI), 50.0 ps/div time base; scales of 32 mV<sub>diff</sub>/LSB and 16 mV<sub>diff</sub>/LSB (EQ output); eye opening = 26 LSB = 416 mV.]



Fig. 23. Measured learning curves.

for the 26-inch and the 34-inch channels, respectively. Finally, a performance summary is presented in Table II.

#### V. CONCLUSION

An adaptive DFE for a  $2\times$  blind sampling ADC-based CDR was described. The adaptation engine which provides the DFE coefficients uses phase-dependent desired levels for adaptation. A triangular waveform was used as the ideal reference waveform to guide the adaptation. While the CDR cannot provide error-free operation at 5 Gb/s for the 34-inch FR4 channel without equalization, it does provide a jitter tolerance of 0.2 UI<sub>PP</sub> with BER <  $10^{-12}$  after adaptive equalization. The receiver consumes 192 mW, out of which, 114 mW is

[Eye-diagram figure panel: 34-inch Tyco channel (-13.3 dB at 2.5 GHz). Legible annotations: 41.2 mV<sub>PP,diff</sub>, 46 ps (0.23 UI); scale of 32 mV<sub>diff</sub>/LSB; eye opening = 20 LSB = 320 mV.]



Fig. 24. Simulated and measured jitter tolerance for (a) 26-inch and (b) 34-inch FR4 channels.

TABLE II

PERFORMANCE SUMMARY

| Parameter                                        | Value                 |
|--------------------------------------------------|-----------------------|
| Technology                                       | 65 nm CMOS            |
| Data rate                                        | 5 Gb/s                |
| Area                                             | $0.46\ \text{mm}^2$   |
| Channel attenuation                              | 13.3 dB               |
| Adaptation time                                  | $80\ \mu\text{s}$     |
| High-freq. jitter tolerance                      | $0.2\ \text{UI}_{PP}$ |
| ADC + input buffer + 4:16 DMUX power consumption | 114 mW                |
| Digital CDR power consumption                    | 78 mW                 |

consumed by the flash ADC and 78 mW by the digital blocks. It is possible to reduce the overall power consumption by using fractional sampling architectures [13] or by reducing the ADC power consumption using different ADC architectures such as SAR.

#### ACKNOWLEDGMENT

The authors would like to thank Ravi Shivnaraine for his help with the measurements.

## REFERENCES

- [1] Y. Hidaka, W. Gai, T. Horie, J. H. Jiang, Y. Koyanagi, and H. Osone, "A 4-channel 1.25–10.3 Gb/s backplane transceiver macro with 35 dB equalizer and sign-based zero-forcing adaptive control," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3547–3559, Dec. 2009.

- [2] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, "A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2007, pp. 436–591.
- [3] J. Bergmans, "Effect of loop delay on stability of discrete-time PLL," *IEEE Trans. Circuits Syst. I, Fundam. Theory Applicat.*, vol. 42, no. 4, pp. 229–231, Apr. 1995.
- [4] O. Tyshchenko, A. Sheikholeslami, H. Tamura, M. Kibune, H. Yamaguchi, and J. Ogawa, "A 5-Gb/s ADC-based feed-forward CDR in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 6, pp. 1091–1098, Jun. 2010.
- [5] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto, K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito, H. Ishida, and K. Gotoh, "A 5 Gb/s transceiver with an ADC-based feedforward CDR and CMA adaptive equalizer in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2010, pp. 168–169.
- [6] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, H. Tamura, and M. Kibune, "A 5 Gb/s speculative DFE for 2 × blind ADC-based receivers in 65-nm CMOS," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 2010, pp. 69–70.
- [7] B. Abiri, A. Sheikholeslami, H. Tamura, and M. Kibune, "A 5 Gb/s adaptive DFE for 2 × blind ADC-based CDR in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2011, pp. 436–438.
- [8] R. van de Grift, I. Rutten, and M. van der Veen, "An 8-bit video ADC incorporating folding and interpolation techniques," *IEEE J. Solid-State Circuits*, vol. SC-22, no. 6, pp. 944–953, Dec. 1987.
- [9] C. Mangelsdorf, "A 400-MHz input flash converter with error correction," *IEEE J. Solid-State Circuits*, vol. 25, no. 1, pp. 184–191, Feb. 1990.
- [10] M. Choi and A. Abidi, "A 6-b 1.3-Gsample/s A/D converter in 0.35 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1847–1858, Dec. 2001.
- [11] T. Tahmoureszadeh, S. Sarvari, A. Sheikholeslami, H. Tamura, Y. Tomita, and M. Kibune, "A combined anti-aliasing filter and 2-tap FFE in 65-nm CMOS for 2 × blind 2–10 Gb/s ADC-based receivers," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, 2010, pp. 1–4.
- [12] M. van Ierssel, H. Yamaguchi, A. Sheikholeslami, H. Tamura, and W. Walker, "Event-driven modeling of CDR jitter induced by power-supply noise, finite decision-circuit bandwidth, and channel ISI," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 5, pp. 1306–1315, May 2008.
- [13] O. Tyshchenko, A. Sheikholeslami, H. Tamura, Y. Tomita, H. Yamaguchi, M. Kibune, and T. Yamamoto, "A fractional-sampling-rate ADC-based CDR with feedforward architecture in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2010, pp. 166–167.



**Behrooz Abiri** received the B.S. degree with honors in electrical engineering from Sharif University of Technology, Tehran, Iran, and the M.S. degree from the University of Toronto, Toronto, ON, Canada, in 2008 and 2011, respectively. He is currently pursuing the Ph.D. degree in electrical engineering at the California Institute of Technology (Caltech), Pasadena, CA.

His research interests include integrated circuits for high-speed chip-to-chip and optical interconnects, as well as RF and millimeter-wave transceivers.

Mr. Abiri has been awarded the Edward S. Rogers scholarship (2009–2011) and the Caltech PhD Fellowship (2011–2012). He is also the Gold Medal winner of the 16th National Physics Olympiad, Iran, and the 35th International Physics Olympiad, South Korea.



**Ali Sheikholeslami** (S'98–M'99–SM'02) received the B.Sc. degree from Shiraz University, Shiraz, Iran, in 1990, and the M.A.Sc. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1994 and 1999, respectively, all in electrical and computer engineering.

In 1999, he joined the Department of Electrical and Computer Engineering, University of Toronto, where he is currently a Professor. His research interests are in the areas of analog and digital integrated circuits, high-speed signaling, and VLSI memory design. He currently supervises two active research groups in the areas of high-speed signaling and VLSI memories. He has collaborated with industry on various VLSI design research projects in the past few years, including work with Nortel and Mosaid, Canada, and with Fujitsu Labs of Japan and America. He spent his 2005–2006 research sabbatical year with Fujitsu Labs of Japan and Fujitsu Labs of America.

Dr. Sheikholeslami served on the Memory Subcommittee of the IEEE International Solid-State Circuits Conference (ISSCC) from 2001 to 2004, and on the Technology Directions Subcommittee of the same conference from 2002 to 2005. He currently serves on the Wireline Subcommittee of ISSCC and on the executive committee of the same conference. He presented a tutorial on ferroelectric memory design at ISSCC 2002 and a tutorial on high-speed signaling at ISSCC 2008. He is an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS. He was the program chair for the 34th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2004) held in Toronto, Canada. He is a registered professional engineer in the province of Ontario, Canada.

Dr. Sheikholeslami has received the Best Professor of the Year Award four times (in 2000, 2002, 2005, and 2007) by the popular vote of the undergraduate students in the Department of Electrical and Computer Engineering, University of Toronto. He received the 2005–2006 Early Career Teaching Award and the 2010 Faculty Teaching Award, both from the Faculty of Applied Science and Engineering at the University of Toronto, in "Recognition of Superb Accomplishment in Teaching".



**Hirotaka Tamura** (M'02–SM'10) received the B.S., M.S., and Ph.D. degrees in electronic engineering from Tokyo University, Tokyo, Japan, in 1977, 1979, and 1982, respectively.

In 1982, he joined Fujitsu Laboratories, Ltd., Kawasaki, Japan, where he was engaged in research on Josephson devices and other exploratory devices. In 1995, he moved into the area of CMOS circuit design. After working on multi-gigabit DRAMs and ferroelectric nonvolatile memories, he got involved in CMOS high-speed signaling. His current interest covers the circuit topology and architecture of high-speed CMOS interfaces.



**Masaya Kibune** was born in Kanagawa, Japan, in 1973. He received the B.S. and M.S. degrees in applied physics from Tokyo University, Tokyo, Japan, in 1996 and 1998, respectively.

In 1998, he joined Fujitsu Laboratories, Ltd., Kanagawa, Japan. He has been engaged in research and design of high-speed IO with CMOS.