An 8B/10B Channel Encoder for AC-coupled High-speed IO Interface

1. Introduction

Modern trends in chip-to-chip communication have moved toward GHz-speed serial links using clock recovery techniques. The adoption of clock recovery architectures requires DC-balanced data streams. As a result, system architects [1],[2] have developed encoding algorithms that convert a regular 8-bit data stream into a 10-bit DC-balanced, run-length-limited data pattern. In this project we implement a single-channel 8B/10B encoder following the PCI-Express® encoding requirements.

2. High-level General Design Flow

Figure 2.1 of the appendix shows our high-level design flow. The design process is divided among three teams: front end, back end, and custom. Dashed lines represent work done by the same team or person. In this flow, all three teams can start work at roughly the same time, reducing development time. The next sections describe in detail how each team iterated through this flow.

3. Architecture, RTL, Synthesis

As indicated in Figure 2.1, the first step in the design process is defining the top-level architecture. This includes the core block functionality and the I/O-to-core interface, taking into account system-level timing, area, and technology limits. Figure 3.1 summarizes the proposed architecture. The encoder is implemented using the CMC 0.35 µm wcells standard-cell library, targeting a frequency of 250 MHz.

3.1 RTL Code

The encoder core logic is composed of five blocks. Verilog RTL code was first written and verified for each block individually. Once all blocks passed low-level verification, the entire core RTL was assembled and functionally verified again. The 8-bit input word is divided into groups of 3 and 5 bits. This pair is then encoded into a 4-bit/6-bit pair, depending on the running disparity and run length of the previously encoded data. Table 3.1 describes the functionality of each block:

Block	Description
enc_k	Determines whether the current input is a valid command code when the k input is asserted. Also outputs the encoded 4B/6B code for valid input commands.
enc_d	Encodes the input 8-bit word into 4B/6B data streams.
enc_flip	Determines whether the encoding algorithm allows DBI (data bit inversion). Determines whether the running disparity (RD) needs to be flipped after transmission of the data stream. Outputs other control signals: reset, current RD value.
S3	State machine stage. Keeps track of the current RD. Muxes between the enc_k and enc_d 4B/6B outputs. Determines whether data needs to be inverted if DBI is enabled. Determines whether the invalid_k output needs to be asserted.
S4	Output flop stage. Sends out data or inverted data depending on the S3 output. Allows output data retiming in case it is necessary to meet system-level timing.

Table 3.1 - Core block descriptions
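The word split and running-disparity bookkeeping described in this section can be illustrated with a short Python sketch. It is not the project's Verilog RTL: the function names (`split_byte`, `disparity`, `next_rd`, `max_run_length`) are hypothetical, the 5-bit group is assumed to be the low bits of the byte, and the RD update uses the common simplification that a code word with nonzero disparity flips the running disparity while a balanced word leaves it unchanged. The two 10-bit patterns are the standard K28.5 (RD- form) and D21.5 code words, written in transmission order.

```python
def split_byte(byte):
    """Split an 8-bit word into its 5-bit (low) and 3-bit (high) groups."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return byte & 0x1F, byte >> 5


def disparity(code):
    """Number of ones minus number of zeros in a code word given as a bit string."""
    ones = code.count("1")
    return ones - (len(code) - ones)


def next_rd(rd, code):
    """Running disparity (-1 or +1) after transmitting `code`.

    Assumed simplified rule: a nonzero-disparity word flips RD,
    a balanced word leaves it unchanged.
    """
    return rd if disparity(code) == 0 else -rd


def max_run_length(bits):
    """Length of the longest run of identical bits in a bit string."""
    if not bits:
        return 0
    longest = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


K28_5_RD_MINUS = "0011111010"  # K28.5 as sent when the current RD is negative
D21_5 = "1010101010"           # D21.5, balanced, same pattern for either RD

low5, high3 = split_byte(0xBC)    # 0xBC is the byte value of K28.5
rd = -1                           # assumed starting running disparity
rd = next_rd(rd, K28_5_RD_MINUS)  # disparity +2, so RD flips to +1
rd = next_rd(rd, D21_5)           # balanced word, RD stays at +1
```

Tracking RD as a single signed flag mirrors the role of the S3 state machine stage in Table 3.1, which only needs to remember whether the stream so far is running positive or negative.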

3.2 IO Clocking Architecture

Since the 8B/10B encoder ASIC operates synchronously, the clock/data relationship at its input and output ports, relative to the respective input and output clocks, has to be considered at an early design stage. For ease of system integration, the data hold time was chosen to be “zero” at the IO interface. As a result, the only requirement for system integration with other integrated circuits is that the input data edge never arrives earlier than the clock edge. Figure 3.2 shows the core-to-pad interface.
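The zero-hold-time requirement reduces to a single inequality per input pad. The following Python sketch illustrates it; the function names (`hold_slack_ps`, `meets_hold`) and the example arrival times are hypothetical and are not measurements from this design.

```python
def hold_slack_ps(data_edge_ps, clock_edge_ps, hold_ps=0):
    """Hold slack in ps: how long after (clock edge + hold time) the data changes."""
    return data_edge_ps - (clock_edge_ps + hold_ps)


def meets_hold(data_edge_ps, clock_edge_ps, hold_ps=0):
    """True when the data edge does not arrive earlier than the hold window allows."""
    return hold_slack_ps(data_edge_ps, clock_edge_ps, hold_ps) >= 0


# Illustrative values only: clock edge at t = 0 ps, zero hold time.
print(meets_hold(data_edge_ps=100, clock_edge_ps=0))  # data changes after the clock edge
print(meets_hold(data_edge_ps=0, clock_edge_ps=0))    # data changes exactly at the clock edge
print(meets_hold(data_edge_ps=-50, clock_edge_ps=0))  # data changes before the clock edge
```

With `hold_ps` fixed at zero, the check passes exactly when the data edge coincides with or follows the clock edge, which is the integration rule stated above.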

3.3 Synthesis

Once the architecture is solidified, the RTL code is synthesized with preliminary constraints. These constraints are then refined during synthesis to obtain the best design with respect to area, timing, and power requirements. A flat (rather than hierarchical) synthesis approach was chosen because it met our targets with more margin and simplified the setting of design constraints. Once synthesis is complete, a physical Verilog netlist is exported from the Synopsys design environment. This netlist, along with the synthesis constraints file, is then used to place and route the design in First Encounter.

4. PR flow: placement, clock tree, routing and timing closure

This section describes the basic place-and-route (P&R) flow used to design the encoder. The entire flow is detailed in the following seven subsections, from placement of the IO pads to generating GDS for DRC/LVS. Figure 4.1 illustrates our P&R flow. There are four main inputs: the gate-level netlist, the design timing constraints, the physical library of all standard cells and I/Os (LEF), and the timing library (lib). The final output is the placed-and-routed GDS. The Padout spreadsheet (Figure 5.1) maintained by the IO team was used to place IO pads around the ring as indicated.

4.2 Power Grid Design

The power grid of our design uses Metal1, Metal2, and Metal3. The number of stripes is sufficient to distribute power consistently throughout the entire core area with minimal IR drop at the chip center. The following table describes the layout of the power grid.

Metal	Orientation	Width	Power/ground stripe spacing	Set-to-set spacing
M1	Horizontal	5.8 µm	15.8 µm	43.2 µm
M2	Vertical	10 µm	1.5 µm	72 µm
M3	Horizontal	10 µm	11.6 µm	86.4 µm

4.3 Standard Cell Placement

The encoder consists of 878 instances. We used the First Encounter placement engine to place the design and achieved 65% placement utilization. Figure 4.2 is a snapshot of our cell placement.

4.4 Clock Tree Synthesis

The clock signal comes from a clock pad and then branches in two: one branch (clk_in) drives the flops, and the other (clk_in_delay) goes to all input pads. The clk_in_delay clock is a delayed version of clk_in, used to satisfy the IO timing constraints. First Encounter Clock Tree Synthesis (CTS) is used to generate both clock trees. The following table summarizes the characteristics of the two clock trees. Figure 4.3 shows the clock tree structures after CTS.

Clock name	Flops/buffers	Insertion delay	Skew	Transition
clk_in	54/96	1.4 to 1.6 ns	200.6 ps	392 ps
clk_in_delay	10/91	3.25 to 3.35 ns	100 ps	217 ps
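As a consistency check on these figures, the skew of each tree should be close to the spread of its insertion-delay range. The Python sketch below recomputes it from the tabulated values. It also estimates how far the clk_in_delay sinks lag the clk_in sinks, on the assumption (not stated in the report) that both insertion delays are measured from the same clock pad. The helper names are hypothetical.

```python
# Insertion-delay ranges (ns) and reported skew (ps), copied from the clock tree table.
CLOCK_TREES = {
    "clk_in": {"min_ns": 1.4, "max_ns": 1.6, "reported_skew_ps": 200.6},
    "clk_in_delay": {"min_ns": 3.25, "max_ns": 3.35, "reported_skew_ps": 100.0},
}


def skew_from_range_ps(tree):
    """Skew implied by the insertion-delay range, in picoseconds."""
    return (tree["max_ns"] - tree["min_ns"]) * 1000.0


def relative_delay_ns(late, early):
    """(min, max) lag of the `late` tree's sinks behind the `early` tree's sinks, in ns."""
    return (late["min_ns"] - early["max_ns"], late["max_ns"] - early["min_ns"])


for name, tree in CLOCK_TREES.items():
    implied = skew_from_range_ps(tree)
    print(f"{name}: implied skew {implied:.1f} ps, reported {tree['reported_skew_ps']} ps")

lag_min, lag_max = relative_delay_ns(CLOCK_TREES["clk_in_delay"], CLOCK_TREES["clk_in"])
print(f"clk_in_delay lags clk_in by {lag_min:.2f} to {lag_max:.2f} ns")
```

The insertion delays in the table are rounded, so the implied skew for clk_in differs from the reported 200.6 ps by a fraction of a picosecond.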

4.5 Routing

First Encounter NanoRoute is used to route the encoder. The design has minimal congestion, so routing completes with zero violations.

4.6 Timing Closure and ECO

Since there is no timing model for our IO pads, setup and hold timing can only be verified by simulation on the full-chip GDS. The design is extracted and checked for clock and data transition times as follows:

Clock transition:	392 ps (worst case)
Data transition:	807 ps (worst case)

Some buffers were upsized or inserted during the ECO stages to speed up some transitions. The overall timing is good.
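To put the worst-case transition times in context, they can be expressed as a fraction of the clock period at the 250 MHz target frequency from Section 3. The Python sketch below does this arithmetic; the names are hypothetical, and no pass/fail threshold is implied because the report does not state one.

```python
TARGET_FREQ_MHZ = 250  # target frequency stated in Section 3

# Worst-case transition times (ps) from the timing closure check.
WORST_CASE_TRANSITION_PS = {"clock": 392, "data": 807}


def period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1_000_000 / freq_mhz


def fraction_of_period(transition_ps, freq_mhz=TARGET_FREQ_MHZ):
    """Transition time as a fraction of one clock period."""
    return transition_ps / period_ps(freq_mhz)


for name, t_ps in WORST_CASE_TRANSITION_PS.items():
    print(f"{name}: {t_ps} ps is {fraction_of_period(t_ps):.1%} of the clock period")
```

At 250 MHz the period is 4 ns, so the clock transition takes roughly a tenth of the period and the data transition roughly a fifth.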

4.7 GDS Generation

The physical database is exported from Encounter as GDS. This GDS contains all routing metals, vias, and references to standard cells. It is then streamed into Cadence Virtuoso along with the standard-cell GDS for final DRC/LVS verification.

5. IO design: high speed IO

5.1 Full-Custom IO Design Flow

The design of the input/output circuit for the 8B/10B encoder ASIC began with the target specifications and layout planning at the chip level. Figure 5.1 in the appendix illustrates the planning of the IO padout, the chip power/ground-to-signal ratio, and the signal ordering.

5.3 IO Drivers

The IO drivers are specified to drive 2 kΩ, 20 pF loads using the LVTTL signaling standard as specified in JEDEC JESD8-B (Interface Standard for Nominal 3 V/3.3 V Supply Digital Integrated Circuits). The output drivers are designed to drive a resistive load in order to accommodate subsequent stages with bidirectional IO pads, which may present resistive loading. Driving the specified loads, the final design demonstrates rise and fall times of 646 ps and 469 ps, respectively.

5.4 IO Receivers

The clock receiver and the data receiver differ in that the clock receiver includes a Schmitt trigger for noise immunity, whereas the data receiver includes a master-slave flip-flop to synchronize incoming data, timed relative to the off-chip clock, to the on-chip clock.

5.5 ESD Structures

All IO pads have p+/nwell p-diodes and n+/psub n-diodes as protection against positive and negative ESD hits. In addition, the input pads have series resistors of approximately 200 Ω (each constructed as a parallel combination of a pdiff resistor and an ndiff resistor) for additional input protection.

5.6 IO Ring layout design

For a modular design, each inline-bonded IO pad has the same basic structure in a common instance, consisting of the bond site, horizontal VDD/VSS rails, ESD diodes, and guard-ring isolation between the core logic and the output driver/input receiver logic. A common corner pad connects the four sides of the IO ring seamlessly. The IO ring instances and layout can be found in Figures 5.2 and 5.3, respectively.

The LEF (Library Exchange Format) for each individual IO pad was exported from its abstract view, to be used by the back-end team for automated place-and-route. The IO ring was fully DRC- and LVS-clean prior to full-chip integration.

6. Top-level Integration

Chip integration is the final stage of the design, merging the auto-routed core logic, the full-custom IO ring, and any additionally required blocks such as a seal ring. The chip-level instance view and layout view are shown in Figures 6.1 and 6.2.

In order to perform LVS, the Verilog netlist for the core logic was imported into Cadence as schematics constructed from standard cells. The signals interfacing with the IO ring were identified, and IO ports were created for them. The physical view of the core logic was imported from a GDS, with a stream layer map table providing the layer definitions.

Manual power tap connections were inserted to tap the IO power grid onto the chip core power grid. Finally, top level pin stamps were drawn to identify top level IO signals.

The IO ring and the core were individually LVS-cleaned prior to top-level integration; the LVS reports are shown in the appendix.

Because the IO and the core logic require two different LVS methodologies (flat LVS vs. macrolvs), a complete full-chip LVS could not be performed. However, the respective IO and core interfacing signals were checked to be connected properly.

7. Appendix

7.1 Figures

Figure 2.1 - Top Level Design Flow

Figure 3.1 - Encoder Core Blocks

Figure 3.2 - I/O to Core interface

Figure 4.1 - Place and Route Flow

Figure 4.2 - Placed Std Cells

Figure 4.3 - Placed and routed clock tree

Figure 5.1 - Snapshot of Padout Spreadsheet

Figure 5.2 - I/O Ring instance view

Figure 5.3 - I/O Ring Layout View

Figure 6.1 - Full Chip instance View

Figure 6.2 - Full Chip Layout View

7.2 I/O Ring LVS Reports

Running simulation in directory: "/proj/fc/devel_rfung/meng/ECE1388/LVS".

*WARNING* Attached technology library _wcells does not exist.

Design library has been temporarily attached to default technology library.

Please attach an existing technology library to the design library cmosp35diode.

Or add the attached technology library _wcells in cds.lib.

*WARNING* techOpenTechFile: unable to open file techfile.cds in library cdsDefTechLib in r mode

*WARNING* techPcellEvalTrigger: Internal error since tfCnt is equal to 0

Begin netlist: Dec 20 00:42:25 2004

view name list = ("auLvs" "extracted" "gate.sch" "cmos.sch")

stop name list = ("auLvs")

library name = "Encoder8B10BLib"

cell name = "IORING"

view name = "extracted"

globals lib = "basic"

Running Artist Flat Netlisting ...

End netlist: Dec 20 00:42:41 2004

Moving original netlist to extNetlist

Removing parasitic components from netlist

presistors removed: 0

pcapacitors removed: 0

pinductors removed: 0

pdiodes removed: 0

trans lines removed: 0

6203 nodes merged into 6203 nodes

Begin netlist: Dec 20 00:42:43 2004

view name list = ("auLvs" "schematic" "gate.sch" "cmos.sch")

stop name list = ("auLvs")

library name = "Encoder8B10BLib"

cell name = "IORING"

view name = "schematic"

globals lib = "basic"

Running Artist Flat Netlisting ...

End netlist: Dec 20 00:42:54 2004

Moving original netlist to extNetlist

Removing parasitic components from netlist

presistors removed: 0

pcapacitors removed: 0

pinductors removed: 0

pdiodes removed: 0

trans lines removed: 0

7303 nodes merged into 7303 nodes

Running netlist comparison program: LVS

Begin comparison: Dec 20 00:42:56 2004

@(#)$CDS: LVS version 5.0.0 01/31/2004 20:15 (intelibm12) $

Warning: Devices on a command "permuteDevice" that are not present in netlist:

"capacitor".

Warning: Devices on a command "parameterMatchType" that are not present in netlist:

"capacitor".

1328 net-list ambiguities were resolved by random selection.

The net-lists match.

	layout	schematic
instances
un-matched	0	0
rewired	0	0
size errors	0	0
pruned	0	0
active	14018	11878
total	14018	11878
nets
un-matched	0	0
merged	0	0
pruned	0	0
active	6203	6203
total	6203	6203
terminals
un-matched	0	0
matched but different type	0	0
total	69	69

End comparison: Dec 20 00:43:05 2004

Comparison program completed successfully.

7.3 Core LVS Reports

@(#)$CDS: LVS version 5.0.0 01/31/2004 20:15 (intelibm12) $

Command line: /proj/stfs1_vol9/local_user/vendor_tools/cadence.5033-1/tools.lnx86/dfII/bin/32bit/LVS -dir /proj/fc/devel_rfung/meng/ECE1388/LVS -l -s -f -t /proj/fc/devel_rfung/meng/ECE1388/LVS/layout /proj/fc/devel_rfung/meng/ECE1388/LVS/schematic

Like matching is enabled.

Net swapping is enabled.

Fixed device checking is enabled.

Using terminal names as correspondence points.

Net-list summary for /proj/fc/devel_rfung/meng/ECE1388/LVS/layout/netlist

count

934 nets

37 terminals

1 wdkrp_2

3 wand2_1

2 wdkrp_4

1 wand2_2

2 wdtkrp_2

20 wand2_4

1 wdtkrp_4

2 wnor2_2

4 wnor2_4

13 wbuf_1

16 wdp_4

11 wbuf_2

13 wand3_4

74 winv_1

154 wbuf_4

37 winv_2

4 wnand2_1

10 wnand2_2

252 winv_4

23 wnand2_4

1 wand4_1

15 wxor2_2

3 wand4_4

3 wdtp_2

24 wdtp_4

1 wnand3_2

1 wnand3_4

5 wor2_1

29 wcd_8

21 wor2_2

56 wor2_4

5 wmux2_4

1 wnand4_4

4 wor3_1

6 wor3_4

4 wcd_12

1 wdtkrsp_2

4 wcd_16

2 wdtkrsp_4

2 wdkrsp_2

1 wor4_1

2 wor4_2

Net-list summary for /proj/fc/devel_rfung/meng/ECE1388/LVS/schematic/netlist

count

934 nets

37 terminals

1 wdkrp_2

3 wand2_1

2 wdkrp_4

1 wand2_2

2 wdtkrp_2

20 wand2_4

1 wdtkrp_4

2 wnor2_2

4 wnor2_4

13 wbuf_1

16 wdp_4

11 wbuf_2

13 wand3_4

74 winv_1

154 wbuf_4

37 winv_2

4 wnand2_1

10 wnand2_2

252 winv_4

23 wnand2_4

1 wand4_1

15 wxor2_2

3 wand4_4

3 wdtp_2

24 wdtp_4

1 wnand3_2

1 wnand3_4

5 wor2_1

29 wcd_8

21 wor2_2

56 wor2_4

5 wmux2_4

1 wnand4_4

4 wor3_1

6 wor3_4

4 wcd_12

1 wdtkrsp_2

4 wcd_16

2 wdtkrsp_4

2 wdkrsp_2

1 wor4_1

2 wor4_2

Terminal correspondence points

1 VDD!

2 VSS!

3 clk_in

4 clk_in_delay__L40_N0

5 clk_in_delay__L40_N1

6 clk_in_delay__L40_N2

7 clk_in_delay__L40_N3

8 clk_in_delay__L40_N4

9 clk_in_delay__L40_N5

10 clk_in_delay__L40_N6

11 clk_in_delay__L40_N7

12 clk_out__L14_N0

13 clk_out__L14_N1

14 clk_out__L14_N2

15 clk_out__L14_N3

16 clk_out__L14_N4

17 data0_net

18 data1_net

19 data2_net

20 data3_net

21 data4_net

22 data5_net

23 data6_net

24 data7_net

25 invalid_k_net

26 k_net

27 rst_net

28 tx_data0_net

29 tx_data1_net

30 tx_data2_net

31 tx_data3_net

32 tx_data4_net

33 tx_data5_net

34 tx_data6_net

35 tx_data7_net

36 tx_data8_net

37 tx_data9_net

The net-lists match.

	layout	schematic
instances
un-matched	0	0
rewired	0	0
size errors	0	0
pruned	0	0
active	834	834
total	834	834
nets
un-matched	0	0
merged	0	0
pruned	0	0
active	934	934
total	934	934
terminals
un-matched	0	0
matched but different type	0	0
total	37	37

Probe files from /proj/fc/devel_rfung/meng/ECE1388/LVS/schematic

devbad.out:

netbad.out:

mergenet.out:

termbad.out:

prunenet.out:

prunedev.out:

audit.out:

Probe files from /proj/fc/devel_rfung/meng/ECE1388/LVS/layout

devbad.out:

netbad.out:

mergenet.out:

termbad.out:

prunenet.out:

prunedev.out:

audit.out:
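As a sanity check on the core LVS report, the per-cell counts in the net-list summary should add up to the instance total in the comparison table (834). The Python sketch below performs that sum. The counts are copied from the log above, and `parse_cell_counts` is a hypothetical helper written for this check, not part of any Cadence tool.

```python
# "count cell_name" pairs copied from the core net-list summary in Section 7.3.
NETLIST_SUMMARY = """
1 wdkrp_2    3 wand2_1    2 wdkrp_4    1 wand2_2    2 wdtkrp_2   20 wand2_4
1 wdtkrp_4   2 wnor2_2    4 wnor2_4    13 wbuf_1    16 wdp_4     11 wbuf_2
13 wand3_4   74 winv_1    154 wbuf_4   37 winv_2    4 wnand2_1   10 wnand2_2
252 winv_4   23 wnand2_4  1 wand4_1    15 wxor2_2   3 wand4_4    3 wdtp_2
24 wdtp_4    1 wnand3_2   1 wnand3_4   5 wor2_1     29 wcd_8     21 wor2_2
56 wor2_4    5 wmux2_4    1 wnand4_4   4 wor3_1     6 wor3_4     4 wcd_12
1 wdtkrsp_2  4 wcd_16     2 wdtkrsp_4  2 wdkrsp_2   1 wor4_1     2 wor4_2
"""

REPORTED_INSTANCE_TOTAL = 834  # "total" row under "instances" in the comparison table


def parse_cell_counts(text):
    """Return {cell_name: count} from whitespace-separated 'count name' pairs."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (count/name pairs)")
    return {name: int(count) for count, name in zip(tokens[0::2], tokens[1::2])}


cell_counts = parse_cell_counts(NETLIST_SUMMARY)
total_instances = sum(cell_counts.values())
print(f"{len(cell_counts)} cell types, {total_instances} instances")
```

The same check applies to both the layout and schematic summaries, since the two lists in the report are identical.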

8. References

[1] Actel Corporation. "Implementing an 8b/10b Encoder/Decoder for Gigabit Ethernet in the Actel SX FPGA Family". 955 East Arques Avenue, Sunnyvale, California 94086, USA. Web: www.actel.com

[2] Widmer, Albert X. "8B/10B Encoding and Decoding for High Speed Applications". IBM T.J. Watson Research Center, 1101 Kitchawan Rd., Route 134, Yorktown Heights, NY 10598-0218