An 8B/10B Channel Encoder for AC-coupled High-speed IO Interface

1. Introduction

Modern trends in chip-to-chip communication have moved towards GHz-speed serial links that use clock recovery techniques. The adoption of clock recovery architectures requires DC-balanced data streams. As a result, system architects [1],[2] have developed encoding algorithms that convert a regular 8-bit data stream into a 10-bit DC-balanced, run-length-limited data pattern. In this project we implement a single-channel 8B/10B encoder following the PCI-Express® encoding requirements.
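The two properties named above, DC balance and limited run length, can be checked on any bit pattern. The following is a minimal sketch and not part of the encoder itself; the helper names are hypothetical, and the K28.5 comma symbol (in both polarities) is used only as a familiar 8B/10B test pattern.

```python
def disparity(bits: str) -> int:
    """Return the number of ones minus the number of zeros in a bit string."""
    return bits.count("1") - bits.count("0")


def longest_run(bits: str) -> int:
    """Return the length of the longest run of identical consecutive bits."""
    longest = 0
    current = 0
    previous = None
    for bit in bits:
        # Extend the current run if the bit repeats, otherwise start a new run.
        current = current + 1 if bit == previous else 1
        longest = max(longest, current)
        previous = bit
    return longest


# K28.5 comma symbol in its two polarities (bit order abcdei fghj).
K28_5_RD_NEG = "0011111010"  # sent when the running disparity is negative
K28_5_RD_POS = "1100000101"  # sent when the running disparity is positive

# Alternating the two polarities keeps the stream DC balanced.
stream = K28_5_RD_NEG + K28_5_RD_POS

if __name__ == "__main__":
    print("disparity of stream:", disparity(stream))
    print("longest run in stream:", longest_run(stream))
```

Each 10-bit symbol on its own has a disparity of +2 or -2, but the pair cancels out, which is the behaviour the running-disparity logic described later is meant to enforce.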

2. High-level General Design Flow

Figure 2.1 of the appendix shows our high-level design flow. The design process is divided among three teams: front end, back end and custom. Dashed lines represent work done by the same team/person. In this flow, all three teams can start work at roughly the same time, reducing development time. The next sections describe in detail how each team iterated through this flow.

3. Architecture, RTL, Synthesis

As indicated in Figure 2.1, the first step in the design process is defining the top-level architecture. This includes the core block functionality and the I/O-to-core interface, taking into account system-level timing, area and technology limits. Figure 3.1 summarizes the proposed architecture. The encoder is implemented using the CMC 0.35 um wcells standard cell library, targeting a frequency of 250 MHz.

3.1 RTL Code

The encoder core logic is composed of 5 blocks. Verilog RTL code was first written and verified for each block individually. Once all blocks passed low-level verification, the entire core RTL netlist was assembled and functionally verified again. The 8-bit input word is divided into a 3-bit group and a 5-bit group. These are then encoded into a 4-bit and a 6-bit group respectively, depending on the running disparity and run length of the previously encoded data. Table 3.1 describes the functionality of each block:

Block      Description
enc_k      Determines whether the current data word is a valid command code when the k input is asserted. Also outputs the encoded 4B/6B groups for valid input commands.
enc_d      Encodes the input 8-bit word into 4B/6B data groups.
enc_flip   Determines whether the encoding algorithm allows DBI (data bit inversion). Determines whether the running disparity (RD) needs to be flipped after transmission of the data word. Outputs other control signals for reset and the current RD value.
S3         State machine stage. Keeps track of the current RD. Muxes between the enc_k and enc_d 4B/6B outputs. Determines whether data needs to be inverted if DBI is enabled. Determines whether the invalid_k output needs to be asserted.
S4         Output flop stage. Sends out data or inverted data depending on the S3 output. Allows output data retiming in case it is necessary to meet system-level timing.


Table 3.1 - Core block descriptions
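The word split and the running-disparity bookkeeping described above can be sketched in a few lines. This is an illustrative Python model rather than the project's Verilog: the function names are hypothetical, the bit assignment (low five bits to the 6-bit group, high three bits to the 4-bit group) follows the usual 8B/10B convention, and the RD rule assumes only valid code groups are transmitted.

```python
def split_byte(byte: int) -> tuple[int, int]:
    """Split an 8-bit word into its 5-bit (low) and 3-bit (high) groups."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("input must be an 8-bit value")
    low5 = byte & 0x1F          # bits 4..0, encoded into the 6-bit group
    high3 = (byte >> 5) & 0x07  # bits 7..5, encoded into the 4-bit group
    return low5, high3


def code_group_name(byte: int, k: bool = False) -> str:
    """Return the conventional Dx.y / Kx.y name for an input word."""
    x, y = split_byte(byte)
    return f"{'K' if k else 'D'}{x}.{y}"


def next_running_disparity(rd: int, code_disparity: int) -> int:
    """Update the running disparity (-1 or +1) after one 10-bit code group.

    A balanced code group (disparity 0) leaves RD unchanged; an unbalanced
    one (disparity +2 or -2) flips it. Assumes a valid code group was sent.
    """
    return rd if code_disparity == 0 else -rd


if __name__ == "__main__":
    print(split_byte(0xBC))
    print(code_group_name(0xBC, k=True))
    print(code_group_name(0x4A))
```

For example, the input word 0xBC with k asserted splits into the groups 28 and 5, which is the well-known K28.5 comma character.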

3.2 IO Clocking Architecture

Since the 8B/10B encoder ASIC operates synchronously, the clock/data relationship at its input and output ports, relative to the respective input and output clocks, has to be considered at an early design stage. For ease of system integration, the data hold time was chosen to be zero at the IO interface. As a result, the only requirement for integration with other integrated circuits is that the input data edge never arrives earlier than the clock edge. Figure 3.2 shows the core-to-pad interface.
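The zero-hold-time requirement reduces to a simple comparison of edge times. The sketch below is an illustration only: the function names and the example edge times are hypothetical, and only the 250 MHz target and the zero hold time come from this report.

```python
CLOCK_MHZ = 250.0                # target frequency from this report
PERIOD_NS = 1000.0 / CLOCK_MHZ   # clock period in nanoseconds


def hold_slack_ns(data_edge_ns: float, clock_edge_ns: float,
                  hold_ns: float = 0.0) -> float:
    """Return the hold slack at an input pad.

    The slack is how much later than (clock edge + hold time) the input
    data changes. With the zero hold time chosen here, it is simply the
    data edge time minus the clock edge time.
    """
    return (data_edge_ns - clock_edge_ns) - hold_ns


def meets_hold(data_edge_ns: float, clock_edge_ns: float,
               hold_ns: float = 0.0) -> bool:
    """True when the data edge does not arrive earlier than allowed."""
    return hold_slack_ns(data_edge_ns, clock_edge_ns, hold_ns) >= 0.0


if __name__ == "__main__":
    print("clock period (ns):", PERIOD_NS)
    # Hypothetical edge times, in ns, relative to an arbitrary reference.
    print("data 0.3 ns after clock:", meets_hold(0.3, 0.0))
    print("data 0.1 ns before clock:", meets_hold(-0.1, 0.0))
```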

3.3 Synthesis

Once the architecture is solidified, the RTL code is synthesized with preliminary constraints. These constraints are further tuned during synthesis to obtain the best design for the area, timing and power requirements. A flat (rather than hierarchical) synthesis approach was chosen since it met our targets with more margin and made the design constraints easier to set. Once synthesis is complete, a physical Verilog netlist is exported from the Synopsys design environment. This netlist, along with the synthesis constraints file, is then used to place and route the design in First Encounter.

4. PR flow: placement, clock tree, routing and timing closure

This section describes the basic place-and-route (P&R) flow used to design the encoder. The flow is detailed in the following subsections, from placement of IO pads to generating GDS for DRC/LVS. Figure 4.1 illustrates our P&R flow. There are four main inputs: the gate-level netlist, the design timing constraints, the physical library of all standard cells and I/Os (LEF) and the timing library (LIB). The final output is the placed-and-routed GDS. The padout spreadsheet (Figure 5.1) maintained by the IO team was used to place IO pads around the ring as indicated.

4.2 Power Grid Design

 The power grid structure of our design consists of Metal1, Metal2 and Metal3. The number of stripes is sufficient to have power consistently distributed throughout the entire core area and have minimal IR drop at the chip center. The following table describes the layout of the power grid.

Metal   Orientation   Width    Pwr&Gnd stripes spacing   Set to set spacing
M1      Horizontal    5.8um    15.8um                    43.2um
M2      Vertical      10um     1.5um                     72um
M3      Horizontal    10um     11.6um                    86.4um

4.3 Standard Cell Placement 

The encoder consists of 878 instances. We used the First Encounter placement engine to place the design, achieving 65% placement utilization. Figure 4.2 is a snapshot of our cell placement.

4.4 Clock Tree Synthesis

The clock signal enters through a clock pad and then branches into two nets: clk_in drives the flops, and clk_in_delay goes to all input pads. The clk_in_delay clock is a delayed version of clk_in, used to satisfy IO timing constraints. First Encounter Clock Tree Synthesis (CTS) is used to generate both clock trees. The following table summarizes the characteristics of the two clock trees. Figure 4.3 shows our clock tree structures after CTS.

Clock Name     flops/buffers   Insertion delay    Skew       Transition
clk_in         54/96           1.4 -> 1.6 ns      200.6 ps   392 ps
clk_in_delay   10/91           3.25 -> 3.35 ns    100 ps     217 ps
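The skew column can be cross-checked against the insertion delay ranges, since skew is the spread between the latest and earliest clock arrival. The sketch below uses only the rounded figures from the table; the function names are hypothetical, and the relative delay between the two trees is derived here for illustration rather than reported by the tool.

```python
# Insertion delay ranges (min, max) in ns, as reported in the table above.
CLOCK_TREES = {
    "clk_in": (1.4, 1.6),
    "clk_in_delay": (3.25, 3.35),
}


def skew_ps(min_ns: float, max_ns: float) -> float:
    """Skew implied by an insertion delay range, in picoseconds."""
    return (max_ns - min_ns) * 1000.0


def relative_delay_ns(delayed: tuple[float, float],
                      reference: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest delay of one clock tree relative to another."""
    return (delayed[0] - reference[1], delayed[1] - reference[0])


if __name__ == "__main__":
    for name, (lo, hi) in CLOCK_TREES.items():
        print(f"{name}: implied skew of about {skew_ps(lo, hi):.0f} ps")
    lo, hi = relative_delay_ns(CLOCK_TREES["clk_in_delay"],
                               CLOCK_TREES["clk_in"])
    print(f"clk_in_delay lags clk_in by about {lo:.2f} to {hi:.2f} ns")
```

The rounded ranges give about 200 ps for clk_in and 100 ps for clk_in_delay, in line with the reported 200.6 ps and 100 ps.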

4.5 Routing

First Encounter NanoRoute is used to route the encoder. The design has minimal congestion, so routing completes with zero violations.

4.6 Timing Closure and ECO

Since there is no timing model for our IO pads, setup and hold timing can only be verified by simulation on the full-chip GDS. The design is extracted and checked for clock and data transition times as follows:

Clock transition: 392 ps (worst case)

Data transition: 807 ps (worst case)

During the ECO stages, some buffers were upsized and others inserted to speed up slow transitions. The overall timing is good.
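To put the worst-case transition times in context, they can be expressed as a fraction of the clock period at the 250 MHz target. This is a small illustrative calculation, not a sign-off criterion; the function name is hypothetical and no pass/fail threshold is implied.

```python
TARGET_MHZ = 250.0               # target frequency from this report
PERIOD_PS = 1e6 / TARGET_MHZ     # clock period in picoseconds

# Worst-case transition times reported above, in picoseconds.
WORST_CASE_TRANSITION_PS = {"clock": 392.0, "data": 807.0}


def fraction_of_period(transition_ps: float,
                       period_ps: float = PERIOD_PS) -> float:
    """Return a transition time as a fraction of the clock period."""
    return transition_ps / period_ps


if __name__ == "__main__":
    for name, t_ps in WORST_CASE_TRANSITION_PS.items():
        print(f"{name}: {t_ps:.0f} ps is "
              f"{fraction_of_period(t_ps):.1%} of the clock period")
```

The clock and data transitions come out at roughly 10% and 20% of the 4 ns period respectively.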

4.7 GDS Generation

The physical database is converted to GDS from First Encounter. This GDS contains all routing metals, vias and references to standard cells. It is then streamed into Cadence Virtuoso along with the standard cell GDS for final DRC/LVS verification.

5. IO design: high speed IO

5.1 Full-Custom IO Design Flow

The design of the input/output circuit for the 8B/10B encoder ASIC began with the target specifications, and layout planning at the chip level. Figure 5.1 in the appendix illustrates the planning of the IO pad out, the chip power/ground to signal ratio, and signal ordering.

5.3 IO Drivers

The IO drivers are specified to drive 2 kΩ, 20 pF loads using the LVTTL signaling standard specified in JEDEC JESD8-B (Interface Standard for Nominal 3V/3.3V Supply Digital Integrated Circuits). The output drivers are designed to drive a resistive load in order to accommodate subsequent stages with bi-directional IO pads, which could present resistive loading. Driving the specified loads, the final design demonstrates rise and fall times of 646 ps and 469 ps respectively.

5.4 IO Receivers

The clock receiver and data receiver differ in that the clock receiver uses a Schmitt trigger for noise immunity, whereas the data receiver uses a master-slave flip-flop to resynchronize data from the off-chip clock to the on-chip clock.

5.5 ESD Structures

All IO pads have p+/nwell p-diodes and n+/psub n-diodes as their protection against positive and negative ESD hits. In addition, the input pads have series resistors of approximately 200 Ω (constructed using a parallel combination of a pdiff resistor and an ndiff resistor) for additional input protection.

5.6 IO Ring layout design

For a modular design, each inline-bonded IO pad shares the same basic structure in a common instance, consisting of the bond site, horizontal VDD/VSS rails, ESD diodes, and guard-ring isolation between the core logic and the output driver/input receiver logic. A common corner pad connects the four sides of the IO pads seamlessly. The IO ring instances and layout are shown in Figures 5.2 and 5.3 respectively.

The LEF (Library Exchange Format) for each individual IO pad was exported from its abstract view, to be used by the back-end team in automated place and route. The IO ring was fully DRC- and LVS-clean prior to full-chip integration.

6. Top-level Integration

Chip integration is the final stage of the design, merging the auto-routed core logic, the full-custom IO ring, and any additionally required blocks such as a seal ring. The chip-level instance view and layout view are shown in Figures 6.1 and 6.2.

In order to perform LVS, the Verilog netlist for the core logic was imported into Cadence as schematics built from standard cells. The signals interfacing with the IO ring were identified and IO ports were created for them. The physical view of the core logic was imported from GDS, with a stream layer map table providing the layer definitions.

Manual power tap connections were inserted to tap the IO power grid onto the chip core power grid. Finally, top level pin stamps were drawn to identify top level IO signals.

The IO ring and the core were individually LVS-cleaned prior to top-level integration; the LVS reports are shown in the appendix.

Because the IO and core logic require two different LVS methodologies (flat LVS vs. macrolvs), a complete full-chip LVS could not be performed. However, the respective IO and core interfacing signals were checked to be connected properly.

7. Appendix

7.1 Figures


Figure 2.1  - Top Level Design Flow


Figure 3.1  - Encoder Core Blocks


Figure 3.2  - I/O to Core interface


Figure 4.1  - Place and Route Flow


Figure 4.2  - Placed Std Cells


Figure 4.3  - Placed and routed clock tree


Figure 5.1  - Snapshot of Padout Spreadsheet


Figure 5.2  - I/O Ring instance view


Figure 5.3  - I/O Ring Layout View


Figure 6.1  - Full Chip instance View


Figure 6.2  - Full Chip Layout View

7.2 I/O Ring LVS Reports


Running simulation in directory: "/proj/fc/devel_rfung/meng/ECE1388/LVS".

 

*WARNING* Attached technology library _wcells does not exist.

         Design library has been temporarily attached to default technology library.

         Please attach an existing technology library to the design library cmosp35diode.

         Or add the attached technology library _wcells in cds.lib.

*WARNING* techOpenTechFile: unable to open file techfile.cds in library cdsDefTechLib in r mode

*WARNING* techPcellEvalTrigger: Internal error since tfCnt is equal to 0

*WARNING* techPcellEvalTrigger: Internal error since tfCnt is equal to 0

*WARNING* techPcellEvalTrigger: Internal error since tfCnt is equal to 0

*WARNING* techPcellEvalTrigger: Internal error since tfCnt is equal to 0

 

 

Begin netlist:    Dec 20 00:42:25 2004

       view name list       = ("auLvs" "extracted" "gate.sch" "cmos.sch")

       stop name list       = ("auLvs")

       library name  = "Encoder8B10BLib"

       cell name     = "IORING"

       view name     = "extracted"

       globals lib   = "basic"

Running Artist Flat Netlisting ...

End netlist:    Dec 20 00:42:41 2004

 

Moving original netlist to extNetlist

Removing parasitic components from netlist

       presistors removed:  0

       pcapacitors removed: 0

       pinductors removed:  0

       pdiodes removed:     0

       trans lines removed: 0

       6203 nodes merged into 6203 nodes

 

 

Begin netlist:    Dec 20 00:42:43 2004

       view name list       = ("auLvs" "schematic" "gate.sch" "cmos.sch")

       stop name list       = ("auLvs")

       library name  = "Encoder8B10BLib"

       cell name     = "IORING"

       view name     = "schematic"

       globals lib   = "basic"

Running Artist Flat Netlisting ...

End netlist:    Dec 20 00:42:54 2004

 

Moving original netlist to extNetlist

Removing parasitic components from netlist

       presistors removed:  0

       pcapacitors removed: 0

       pinductors removed:  0

       pdiodes removed:     0

       trans lines removed: 0

       7303 nodes merged into 7303 nodes

 

Running netlist comparison program:  LVS

Begin comparison:    Dec 20 00:42:56 2004

@(#)$CDS: LVS version 5.0.0 01/31/2004 20:15 (intelibm12) $

Warning: Devices on a command "permuteDevice" that are not present in netlist:

         "capacitor".

Warning: Devices on a command "parameterMatchType" that are not present in netlist:

         "capacitor".

 

1328 net-list ambiguities were resolved by random selection.

 

The net-lists match.

 

                          layout  schematic

                           instances

       un-matched           0      0

       rewired                    0      0

       size errors          0      0

       pruned               0      0

       active               14018  11878

       total                14018  11878

 

                             nets

       un-matched           0      0

       merged               0      0

       pruned               0      0

       active               6203   6203

       total                6203   6203

 

                           terminals

       un-matched           0      0

       matched but

       different type             0      0

       total                69     69

End comparison:      Dec 20 00:43:05 2004

 

 

Comparison program completed successfully.


7.3 Core LVS Reports

@(#)$CDS: LVS version 5.0.0 01/31/2004 20:15 (intelibm12) $

 

Command line: /proj/stfs1_vol9/local_user/vendor_tools/cadence.5033-1/tools.lnx86/dfII/bin/32bit/LVS -dir /proj/fc/devel_rfung/meng/ECE1388/LVS -l -s -f -t /proj/fc/devel_rfung/meng/ECE1388/LVS/layout /proj/fc/devel_rfung/meng/ECE1388/LVS/schematic

Like matching is enabled.

Net swapping is enabled.

Fixed device checking is enabled.

Using terminal names as correspondence points.

 

    Net-list summary for /proj/fc/devel_rfung/meng/ECE1388/LVS/layout/netlist

       count

       934           nets

       37            terminals

       1             wdkrp_2

       3             wand2_1

       2             wdkrp_4

       1             wand2_2

       2             wdtkrp_2

       20            wand2_4

       1             wdtkrp_4

       2             wnor2_2

       4             wnor2_4

       13            wbuf_1

       16            wdp_4

       11            wbuf_2

       13            wand3_4

       74            winv_1

       154           wbuf_4

       37            winv_2

       4             wnand2_1

       10            wnand2_2

       252           winv_4

       23            wnand2_4

       1             wand4_1

       15            wxor2_2

       3             wand4_4

       3             wdtp_2

       24            wdtp_4

       1             wnand3_2

       1             wnand3_4

       5             wor2_1

       29            wcd_8

       21            wor2_2

       56            wor2_4

       5             wmux2_4

       1             wnand4_4

       4             wor3_1

       6             wor3_4

       4             wcd_12

       1             wdtkrsp_2

       4             wcd_16

       2             wdtkrsp_4

       2             wdkrsp_2

       1             wor4_1

       2             wor4_2

 

    Net-list summary for /proj/fc/devel_rfung/meng/ECE1388/LVS/schematic/netlist

       count

       934           nets

       37            terminals

       1             wdkrp_2

       3             wand2_1

       2             wdkrp_4

       1             wand2_2

       2             wdtkrp_2

       20            wand2_4

       1             wdtkrp_4

       2             wnor2_2

       4             wnor2_4

       13            wbuf_1

       16            wdp_4

       11            wbuf_2

       13            wand3_4

       74            winv_1

       154           wbuf_4

       37            winv_2

       4             wnand2_1

       10            wnand2_2

       252           winv_4

       23            wnand2_4

       1             wand4_1

       15            wxor2_2

       3             wand4_4

       3             wdtp_2

       24            wdtp_4

       1             wnand3_2

       1             wnand3_4

       5             wor2_1

       29            wcd_8

       21            wor2_2

       56            wor2_4

       5             wmux2_4

       1             wnand4_4

       4             wor3_1

       6             wor3_4

       4             wcd_12

       1             wdtkrsp_2

       4             wcd_16

       2             wdtkrsp_4

       2             wdkrsp_2

       1             wor4_1

       2             wor4_2

 

 

    Terminal correspondence points

         1    VDD!

         2    VSS!

         3    clk_in

         4    clk_in_delay__L40_N0

         5    clk_in_delay__L40_N1

         6    clk_in_delay__L40_N2

         7    clk_in_delay__L40_N3

         8    clk_in_delay__L40_N4

         9    clk_in_delay__L40_N5

        10    clk_in_delay__L40_N6

        11    clk_in_delay__L40_N7

        12    clk_out__L14_N0

        13    clk_out__L14_N1

        14    clk_out__L14_N2

        15    clk_out__L14_N3

        16    clk_out__L14_N4

        17    data0_net

        18    data1_net

        19    data2_net

        20    data3_net

        21    data4_net

        22    data5_net

        23    data6_net

        24    data7_net

        25    invalid_k_net

        26    k_net

        27    rst_net

        28    tx_data0_net

        29    tx_data1_net

        30    tx_data2_net

        31    tx_data3_net

        32    tx_data4_net

        33    tx_data5_net

        34    tx_data6_net

        35    tx_data7_net

        36    tx_data8_net

        37    tx_data9_net

 

The net-lists match.

 

                          layout  schematic

                           instances

       un-matched           0      0

       rewired                    0      0

       size errors          0      0

       pruned               0      0

       active               834    834

       total                834    834

 

                             nets

       un-matched           0      0

       merged               0      0

       pruned               0      0

       active               934    934

       total                934    934

 

                           terminals

       un-matched           0      0

       matched but

       different type             0      0

       total                37     37

 

 

Probe files from /proj/fc/devel_rfung/meng/ECE1388/LVS/schematic

 

devbad.out:

 

netbad.out:

 

mergenet.out:

 

termbad.out:

 

prunenet.out:

 

prunedev.out:

 

audit.out:

 

 

Probe files from /proj/fc/devel_rfung/meng/ECE1388/LVS/layout

 

devbad.out:

 

netbad.out:

 

mergenet.out:

 

termbad.out:

 

prunenet.out:

 

prunedev.out:

 

audit.out:


 

8. References

 

[1] Actel Corporation. "Implementing an 8b/10b Encoder/Decoder for Gigabit Ethernet in the Actel SX FPGA Family". Actel Corporation, 955 East Arques Avenue, Sunnyvale, California 94086, USA. Web: www.actel.com

[2] Widmer, Albert X. "8B/10B Encoding and Decoding for High Speed Applications". IBM T.J. Watson Research Center, 1101 Kitchawan Rd., Route 134, Yorktown Heights, NY 10598-0218, USA.