An 81Gb/s, 1.2V TIALA-Retimer in Standard 65nm CMOS

Shahriar Shahramian¹, Anthony Chan Carusone¹, Peter Schvan², Sorin P. Voinigescu¹
¹ The Edward S. Rogers Sr. Department of ECE, University of Toronto, Toronto, ON, Canada
² Nortel Networks, Ottawa, ON, Canada

Abstract- This paper describes the fastest full-rate retiming circuit reported to date in any semiconductor technology. By combining low- and high-V_T MOSFETs on the data and clock paths, respectively, and CMOS-inverter-based transimpedance amplifiers as low-noise, broadband preamplifiers, record speed is achieved from a 1.2V supply. The power consumption of the 81GHz latch is only 9.6mW. On-wafer measurements demonstrate correct full-rate retiming up to 81Gb/s with reduced jitter and improved rise/fall times.

I. INTRODUCTION

With $f_T$ and $f_{MAX}$ values exceeding those of all other commercial semiconductor technologies, CMOS offers the potential for low-voltage operation and high levels of integration in emerging wireline applications with serial data rates of up to 110Gb/s. Although 100GHz static frequency dividers have been reported in SOI CMOS [1], the speed of CMOS retiming flip-flops continues to lag that achieved in SiGe BiCMOS and III-V technologies. For example, full-rate retimers have been demonstrated up to 80Gb/s from a -5.7V supply in InP HEMT technology [2], at 48Gb/s from 2.5V in SiGe BiCMOS [3], and at 40Gb/s from a 1.2V supply in 90nm CMOS [4]. This paper reports for the first time that, by adapting the circuit topologies to low-voltage nanoscale CMOS and by appropriately sizing, biasing and laying out the circuit cells, a record 81Gb/s transimpedance-limiting-amplifier (TIALA) and retimer circuit can be fabricated in a 65nm general-purpose and low-power (GPLP) CMOS process with multiple-threshold-voltage devices.

II. CIRCUIT DESIGN

The circuit block diagram is shown in Fig. 1. The data path comprises a broadband low-noise TIA, four pseudo-differential gain stages and four differential CML inverters. The CML inverters provide single-ended-to-differential conversion and boost the signal, without degrading the SNR, before it reaches the retiming flip-flop. The clock signal is applied to the flip-flop through a broadband distribution network with a transformer input, and the flip-flop outputs are taken off-chip via a chain of output drivers.

A. Data Path

The TIA takes advantage of the quintessential digital circuit, the CMOS inverter, to achieve a single-ended gain of 7dB over a 3-dB bandwidth of 80GHz. The gate width ratio of the p-MOS and n-MOS transistors is set approximately equal to the inverse of the ratios of their $f_T$, $I_{ON}$, and characteristic current densities [4]. The p-MOSFET and n-MOSFET are biased at 0.1mA/µm and 0.2mA/µm, respectively, which loosely correspond to the optimum noise-figure and maximum-power-gain current densities of each device. The total gate periphery and gate capacitance of the p-MOS and n-MOS devices ensure that the noise impedance of the TIA is also matched to 50Ω over a broad band. Finally, the CMOS inverter layout exploits the equal number of gate fingers in the p-MOS and n-MOS transistors to minimize interconnect footprint and layout parasitic capacitance.
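The current-density part of this sizing rule can be checked with a short calculation. The sketch below assumes that the inverter's only DC path runs through the p-MOS and n-MOS in series, so both carry the same drain current and W_p/W_n = J_n/J_p = 0.2/0.1 = 2. The n-MOS width and finger count are hypothetical placeholders, not values from this design.

```python
"""Sizing sketch for a CMOS-inverter TIA (illustrative, not the fabricated design).

ASSUMPTION: the p-MOS and n-MOS carry the same DC drain current, so
I_D = J_N * W_n = J_P * W_p  and therefore  W_p / W_n = J_N / J_P.
"""

J_N = 0.2  # n-MOS bias current density from the text, mA/um
J_P = 0.1  # p-MOS bias current density from the text, mA/um


def pmos_width_um(w_n_um: float, j_n: float = J_N, j_p: float = J_P) -> float:
    """p-MOS gate width (um) that carries the same current as an n-MOS of width w_n_um."""
    return w_n_um * j_n / j_p


def bias_current_ma(w_n_um: float, j_n: float = J_N) -> float:
    """Inverter DC current (mA) set by the n-MOS width and its current density."""
    return w_n_um * j_n


if __name__ == "__main__":
    w_n = 20.0      # HYPOTHETICAL n-MOS total gate width, um
    n_fingers = 20  # HYPOTHETICAL finger count, equal for both devices as in the text
    w_p = pmos_width_um(w_n)
    print(f"W_p/W_n = {w_p / w_n:.1f}, I_D = {bias_current_ma(w_n):.1f} mA")
    # With equal finger counts, each p-MOS finger is twice as wide as an n-MOS finger.
    print(f"finger width: n-MOS {w_n / n_fingers:.2f} um, p-MOS {w_p / n_fingers:.2f} um")
```

The rule in [4] also weighs the $f_T$ and $I_{ON}$ ratios; this sketch captures only the current-density term.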

Fig. 1: Retimer block diagram
The role of the pseudo-differential limiting-amplifier stages, whose schematic is shown in Fig. 2, is to provide at least 20dB of gain to overcome flip-flop metastability for TIA input signals as low as 10mVpp, even when the input is applied single-ended. Since the pseudo-differential TIALA has no common-mode rejection, a chain of differential n-MOS CML inverters (Fig. 3) with inductive series-shunt peaking and active tail current sources is inserted between the TIALA and the flip-flop. This combination of a pseudo-differential TIALA and a differential CML inverter chain ensures that sensitivity and bandwidth are maximized even for single-ended inputs. All signal-path n-MOSFETs are biased at 0.2mA/µm and have a drawn gate length of 60nm and a physical gate length of 45nm. The simulated small-signal bandwidth of the TIALA, after extraction of RC layout parasitics, is 45GHz, limited by the limiting-amplifier chain.
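The 20dB requirement translates into a voltage-swing budget. The sketch below assumes the small-signal regime (before the stages limit) and, purely for illustration, splits the gain equally over the four limiting stages; the per-stage gain is not given in the text.

```python
"""Small-signal swing budget for the limiting-amplifier chain (illustrative)."""


def db_to_voltage_ratio(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio (20*log10 convention)."""
    return 10 ** (gain_db / 20)


def output_swing_mvpp(v_in_mvpp: float, stage_gains_db: list[float]) -> float:
    """Output swing of cascaded stages; gains in dB add, so convert once at the end.

    Valid only while every stage stays linear, i.e. below its limiting level.
    """
    return v_in_mvpp * db_to_voltage_ratio(sum(stage_gains_db))


if __name__ == "__main__":
    la_stages_db = [5.0] * 4  # HYPOTHETICAL equal split of the 20dB minimum
    tia_db = 7.0              # single-ended TIA gain from the text
    print(f"LA only : {output_swing_mvpp(10.0, la_stages_db):.0f} mVpp")
    print(f"TIA + LA: {output_swing_mvpp(10.0, [tia_db] + la_stages_db):.0f} mVpp")
```

With a 10mVpp input, the 20dB floor alone corresponds to about 100mVpp ahead of the CML inverter chain.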

B. Clock Distribution

The clock distribution network (Fig. 4) features a broadband planar transformer for single-ended-to-differential conversion. A series-shunt network (C_M, L_A) matches the transformer input to 50Ω. The clock path uses the same CMOS TIA as the data path. By AC-coupling the TIA outputs to a chain of tuned differential pairs with active current sources, 6dB of differential gain is achieved over a broad frequency band extending from 50GHz to 85GHz. The final stage of the clock distribution network operates from a 0.8V supply, and its outputs directly bias the clock-path transistors of the flip-flop.

C. Flip-flop

The schematic of the flip-flop, shown in Fig. 5, is similar to that in [3]. Using high-V_T devices on the clock path and low-V_T devices on the data path allows the latch to operate at 81Gb/s from a 1.2V supply. Each latch draws 8mA.
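The per-latch current ties directly to the 9.6mW per latch reported in Section IV. The sketch below assumes the flip-flop is a master-slave pair of two identical latches (the text refers to "each latch" but does not state the count) and compares the result with the 200mW total and the TIALA and clock-network figures from Section IV.

```python
"""Power bookkeeping for the CML flip-flop (illustrative)."""

SUPPLY_V = 1.2          # V, from the text
LATCH_CURRENT_MA = 8.0  # per latch, from the text
N_LATCHES = 2           # ASSUMPTION: master-slave flip-flop built from two latches


def power_mw(current_ma: float, supply_v: float = SUPPLY_V) -> float:
    """DC power (mW) drawn by a block biased at current_ma from supply_v."""
    return current_ma * supply_v


def unitemized_mw(total_mw: float, itemized_mw: list[float]) -> float:
    """Part of the total supply power not covered by the itemized blocks."""
    return total_mw - sum(itemized_mw)


if __name__ == "__main__":
    latch = power_mw(LATCH_CURRENT_MA)
    flip_flop = N_LATCHES * latch
    # 48mW (TIALA) and 30mW (clock distribution) are the Section IV figures.
    rest = unitemized_mw(200.0, [flip_flop, 48.0, 30.0])
    print(f"latch {latch:.1f} mW, flip-flop {flip_flop:.1f} mW, not itemized {rest:.1f} mW")
```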

In this design, the layout of each block is optimized by interdigitating transistors and merging those with common sources or drains in a single well. For instance, the latch layout uses interdigitated clock-pair and data-path transistors with gate fingers of 0.8µm and 1.6µm, respectively. This technique, shown in Fig. 6, results in a compact latch layout that minimizes interconnect capacitance and transistor mismatch. Such layout techniques are particularly important in nanoscale CMOS, where interconnect parasitic capacitances can approach the intrinsic capacitance of the device.

D. Output Driver

The output driver consists of a chain of four scaled CML inverters with inductive peaking and a maximum fan-out of 1.5. The final stage produces a swing of 0.4Vpp per side into 50Ω loads.
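A rough driver budget can be derived from the 0.4Vpp swing. The sketch assumes a 50Ω on-chip back-termination on each side (not stated in the text), so the fully switched tail current flows into 25Ω per side, and assumes every stage is scaled by exactly the maximum fan-out of 1.5. The resulting currents are illustrative, not the design values.

```python
"""Tail-current estimate for a tapered CML output-driver chain (illustrative)."""

R_LOAD_OHM = 50.0       # external load per side, from the text
R_BACK_TERM_OHM = 50.0  # ASSUMPTION: on-chip back-termination per side
SWING_VPP = 0.4         # single-ended swing per side, from the text
FAN_OUT = 1.5           # ASSUMPTION: each stage is exactly 1.5x its predecessor
N_STAGES = 4            # from the text


def tail_current_ma(swing_vpp: float, r_load: float = R_LOAD_OHM,
                    r_term: float = R_BACK_TERM_OHM) -> float:
    """Tail current (mA) that develops swing_vpp across r_load in parallel with r_term."""
    r_eff = r_load * r_term / (r_load + r_term)
    return 1e3 * swing_vpp / r_eff


def stage_currents_ma(final_ma: float, fan_out: float = FAN_OUT,
                      n_stages: int = N_STAGES) -> list[float]:
    """Tail currents from the first to the last stage for a geometric taper."""
    return [final_ma / fan_out ** (n_stages - 1 - k) for k in range(n_stages)]


if __name__ == "__main__":
    last = tail_current_ma(SWING_VPP)
    print(", ".join(f"{i:.2f}" for i in stage_currents_ma(last)), "mA")
```

Under these assumptions the last stage needs about 16mA of tail current, and the geometric taper keeps each stage's load within the stated fan-out.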

III. FABRICATION

The circuit was fabricated in a standard digital 65nm GPLP CMOS process with a 7-layer metal back-end. The measured $f_T$ and $f_{MAX}$ of an 80×60nm×1µm n-MOSFET are 170GHz and 250GHz, respectively, at $V_{DS} = 0.6V$. Fig. 7 shows the die photo, which measures 0.57mm×1.2mm including pads.

IV. MEASUREMENT RESULTS

The power consumption of the entire circuit is 200mW from a 1.2V supply, with 9.6mW, 48mW and 30mW consumed by each latch, the TIALA and the clock distribution network, respectively. The retimer was tested on-wafer with a 2^7-1 PRBS signal generated by the transmitter described in [5]. The block diagram of the test setup for 81Gb/s operation is shown in Fig. 8. A 75-100GHz Millitech ×6 multiplier generates the 81GHz clock signal. The multiplier is driven by an RF source at one sixth of the clock frequency, which also triggers the precision time-base module of the Agilent DCA-J oscilloscope. The 90Gb/s transmitter requires a half-rate clock signal, which is provided by a second RF source. The two RF sources are synchronized via a 10MHz reference signal. A mechanical phase shifter aligns the retimer clock signal with the PRBS transmitter data. The half-rate clock is also passed through a divide-by-four circuit to generate a 1/8-rate clock, which in turn triggers the pattern-lock module of the DCA-J oscilloscope.
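The clock frequencies in the setup follow directly from the bit rate. A minimal sketch of that frequency plan, using the ×6 multiplier, the half-rate transmitter clock and the divide-by-four trigger path described above; the function name and dictionary keys are hypothetical.

```python
"""Frequency plan of the full-rate test setup (illustrative)."""

MULTIPLIER = 6                 # x6 multiplier, from the text
DIVIDER = 4                    # divide-by-four in the trigger path, from the text
MULT_BAND_GHZ = (75.0, 100.0)  # multiplier output band, from the text


def frequency_plan_ghz(bit_rate_gbps: float) -> dict[str, float]:
    """Return the clock frequencies (GHz) needed for a full-rate test at bit_rate_gbps."""
    retimer_clock = bit_rate_gbps  # full-rate retiming: clock frequency equals bit rate
    lo, hi = MULT_BAND_GHZ
    if not lo <= retimer_clock <= hi:
        raise ValueError("retimer clock is outside the multiplier band")
    half_rate = retimer_clock / 2  # transmitter clock from the second RF source
    return {
        "retimer_clock": retimer_clock,
        "rf_source": retimer_clock / MULTIPLIER,  # also triggers the DCA-J time base
        "tx_half_rate": half_rate,
        "pattern_trigger": half_rate / DIVIDER,   # 1/8-rate pattern-lock trigger
    }


if __name__ == "__main__":
    for rate in (75.0, 78.0, 81.0):
        print(rate, frequency_plan_ghz(rate))
```

At 81Gb/s this gives a 13.5GHz RF source, a 40.5GHz transmitter clock and a 10.125GHz pattern-lock trigger.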

The PRBS generator was mounted on a separate probe station. One side of its differential output was connected to the TIALA-retimer under test via a 24-inch, 50GHz cable. The other side was connected to the oscilloscope via an identical cable, allowing the oscilloscope to capture the input and output signals of the TIALA-retimer simultaneously. All measurements were performed single-ended and are shown in Figs. 9-14.

Fig. 5: Flip-flop schematic.

Fig. 6: Optimized layout of the CML latch.

Fig. 7: Chip micrograph.

Fig. 8: Block diagram of the test setup.

Fig. 9: Measured input and output eye diagrams at 75Gb/s. The input eye height/amplitude is 5mVpp/40mVpp.
The minimum input eye amplitude is 40mVpp at 75Gb/s, 60mVpp at 78Gb/s and 80mVpp at 81Gb/s. The improvement in rise time, eye opening and jitter (e.g., from 2.5ps RMS to 380fs RMS in Fig. 8) is evident in all cases. In the absence of an 80Gb/s BERT, the pattern-lock feature of the DCA-J was used to verify the correctness of the input and output bit streams (Figs. 10 and 14).
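Pattern lock confirms that the captured stream repeats the expected sequence. A software analogue is a self-synchronizing PRBS checker, sketched below under the assumption that the 2^7-1 sequence uses the widely used polynomial x^7 + x^6 + 1; the text does not state which polynomial the transmitter in [5] implements, and the function names are hypothetical.

```python
"""Self-synchronizing check of a 2^7-1 PRBS bit stream (illustrative).

ASSUMPTION: the sequence obeys the PRBS7 polynomial x^7 + x^6 + 1,
i.e. every bit equals the XOR of the bits 6 and 7 positions earlier.
"""

ORDER = 7
PERIOD = 2**ORDER - 1  # 127 bits


def prbs7(n_bits: int, seed: tuple[int, ...] = (1,) * ORDER) -> list[int]:
    """Generate n_bits of PRBS7 starting from a non-zero 7-bit seed."""
    if len(seed) != ORDER or not any(seed):
        raise ValueError("seed must hold 7 bits, not all zero")
    seq = list(seed)
    while len(seq) < n_bits:
        seq.append(seq[-7] ^ seq[-6])
    return seq[:n_bits]


def prbs7_mismatches(bits: list[int]) -> int:
    """Count bits that disagree with the prediction from the preceding 7 bits.

    The first 7 received bits seed the checker, so no phase alignment is needed.
    A single flipped bit shows up as up to 3 mismatches (error multiplication).
    """
    if len(bits) < ORDER:
        raise ValueError("need at least 7 bits")
    return sum(bits[n] != (bits[n - 7] ^ bits[n - 6]) for n in range(ORDER, len(bits)))


if __name__ == "__main__":
    stream = prbs7(2 * PERIOD)
    print("repeats every 127 bits:", stream[:PERIOD] == stream[PERIOD:])
    corrupted = stream.copy()
    corrupted[50] ^= 1  # inject one bit error
    print("mismatches:", prbs7_mismatches(stream), prbs7_mismatches(corrupted))
```

Unlike a BERT, this only counts mismatches in a captured record; it does not give a statistically meaningful error rate.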

V. CONCLUSIONS

A record full-rate TIALA-retimer capable of operating with input data rates up to 81Gb/s is reported. The latch exhibits a figure of merit of 118.5µW/Gb/s. The output eye diagrams show rise/fall times better than 6ps and RMS jitter below 380fs. These results demonstrate that 65nm and 45nm CMOS technologies can compete with SiGe and InP technologies in next-generation 80-110Gb/s serial wireline transceivers.
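The 118.5µW/Gb/s figure of merit quoted above equals the 9.6mW latch power from Section IV divided by the 81Gb/s data rate; since 1µW/(Gb/s) is 1fJ/bit, it can also be read as about 118.5fJ/bit. A one-function check, with a hypothetical function name:

```python
"""Energy-per-bit figure of merit (illustrative check of the quoted number)."""


def fom_uw_per_gbps(power_mw: float, bit_rate_gbps: float) -> float:
    """Power per unit data rate in uW/(Gb/s), numerically equal to fJ/bit."""
    return 1e3 * power_mw / bit_rate_gbps


if __name__ == "__main__":
    print(f"{fom_uw_per_gbps(9.6, 81.0):.1f} uW/(Gb/s)")
```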

We wish to acknowledge Nortel for funding and chip fabrication.


Fig. 10: Correct input and output 2^7-1 PRBS patterns at 75Gb/s corresponding to Fig. 9.

Fig. 11: Measured input and output eye diagrams at 78Gb/s. The input eye height/amplitude is 12mVpp/60mVpp.

Fig. 12: Measured input and output eye diagrams at 81Gb/s. The input eye height is 63mVpp.

Fig. 13: Measured input and output eye diagrams at 81Gb/s. The input eye height/amplitude is 15mVpp/80mVpp.

Fig. 14: Correct input and output 2^7-1 PRBS patterns at 81Gb/s corresponding to Fig. 13.