Contents

Home

About...

Major Upgrades
The LEB Architecture
Placement
Routing
Support Tools
Generating LEB Netlists
Results
Placement
Routing
References
References
Appendix
Placement Pictures
Architecture Generation Pictures

VPR Placement Support for LEBs

The goal of placement support for LEBs was to upgrade the VPR wire-length (bounding box) driven placer such that it could support any number of LEB columns where all LEB locations were the same width and height. This was successfully accomplished with full graphics support as shown in the Placement Pictures.

Basic Approach

We applied the same placement heuristics applied to IOs and CLBs to place LEBs. Thus, LEBs are added to the anneal process in VPR. The annealer works by randomly selecting blocks to move and swap that block to a random location that is obeys the Dlimit constraint. Furthermore, when choosing the random location, you must find a location that can support the current block type (as specified in the clb[][].type field). Previously, the annealer only had conditions for IOs and CLBs. We updated the annealer to support LEBs. In cases where the random block selected is an LEB, a valid LEB location is found for swapping. Like CLBs and IOs, the LEB swap location is chosen at random and must obey the radius limit constraints, Dlimit, at the current temperature. LEB column locations are stored in an additional data structure for fast access (specified in the leb_cols data structure). Once the swap is made, the change is cost is calculated. Based on the current temperature and change in cost, it is either accepted or rejected. Since the placement is driven by the wire-length estimate (bounding box cost), the change in cost is simply the total wire-length estimate before the swap minus the wire-length estimate after the swap. As stated in the Introduction, swaps that increase the overall wire-length estimate may still be accepted in order to achieve hill-climbing.

Since LEBs are not uniform width or height, this complicates the wire-length estimate and move radius constraints. In the following sections we explain this issue and some possible solutions.

Additional Heuristics

Radius Limiter for Moves

As described in [Betz1997], blocks are restricted in movement in the x and y direction by a radius limit referred as Dlimit as illustrated in previous sections. Since CLBs are unit width and height, Dlimit represents the maximum number of CLB positions between a source location and its destination. For LEBs, this is no longer the case where the LEB is not necessarily unit width or unit height, and LEB columns are not adjacent to each other. One possibility is to ignore this and treat the Dlimit uniformly between CLBs and LEBs. However, this may potentially cause LEBs to get "stuck" in column early in the placement stage. For example, consider the following figure where there exist two LEB columns on the chip.


Figure 1. LEB Dlimit restriction.

Although Dlimit spans several CLB locations, it does not reach the closest LEB column, thus the LEB is "stuck" in its current column which may be sub-optimal. This potentially may not be a problem, but we propose two approaches to get around this condition if placement quality is hurt because of this restriction.

One possibility is to relax the Dlimit constraint in some cases. When selecting the location to swap to, we pick any (x,y) location that obeys the Dlimit constraint, just as in the case of CLBs. If this location is not a valid LEB location, we find the closest LEB location. When searching for the closest LEB location, we ignore the Dlimit constraint. This method prevents an LEB from getting "stuck" in an LEB column early in the placement; however, once Dlimit becomes smaller than half the length between two adjacent LEB columns, an LEB will again get trapped in its current column.

Our second method prevents an LEB from getting trapped in any column throughout the placement process. In this case, we scale Dlimit by the distance between two adjacent LEB columns for the x direction, and by the LEB height in the y direction. Thus, LEBs will always have freedom to switch between columns and neighbouring LEBs. For example, in the following figure, Dlimit=1, but the valid search region for LEB swaps spans an entire LEB column to column width and LEB height.


Figure 2. Scaling of Dlimit for LEB moves.

Pin Locations for Wire-Length Estimation

VPR uses net bounding boxes as an estimation of wire-length. In order to calculate the bounding box, the pin locations of a net need to be identified. For CLBs, there is the assumption that all pins are centered in the middle of the CLB. This is a fair approximation since all CLBs are unit height and width. LEBs, however, can vary in height and width and thus there is more flexibility in pin locations. We use two methods to approximate the location of pins: assume all pins are found in the bottom-left corner of the LEB; assume the worst case where connections will span the farthest distance between two LEBs. In the worst case condition, this will cause the bounding box to encapsulate the entire LEB. For example, consider the simple case where the two LEBs are connected in the following figure. The pin locations are assumed to be in the locations which will cause the pins to span the farthest distance, and thus the bounding box will encapsulate both LEBs.


Figure 1. Illustration of a worst case bounding box for two connected LEBs.

The worse case bounding box estimate required several changes to the VPR bounding box calculator. VPR uses edge detection to speed up bounding box calculations. This is useful when the number of pins on a net is fairly large (more than 4 pins). Edge detection works by storing the current bounding box edge locations and the number of blocks found at each edge. The benefit of edge detection allows bounding box updates to occur without having to visit all pin connections of the net. Only the blocks currently being moved and the number of blocks at a given edge need to be considered. This is illustrated in Figure 2a. Here, any movement indicated by the arrows will require an update to the bounding box size. Note that we only need to consider the current block being moved in this illustration. During the worse case bounding box estimation, having multi-width and height LEBs complicates edge detection since the size of the blocks needs to be accounted for as shown in Figure 2b.


(a) Edge detection using uniform blocks

(b) Edge detection using non-uniform blocks
Figure 2. Edge detection scheme used in VPR.

Test Approach

In order to ensure our LEB upgrades did not degrade placement quality, we ran several tests on 23 MCNC benchmark circuits. First, we placed each circuit with CLBs only and compared the cost against an LEB based architecture with three LEB columns and 5% LEB content in the netlist. When generating LEBs, we select random CLB pairs and join them to form one LEB. This process is described in detail here, The following figure illustrates this architecture using one of the smaller MCNC benchmark circuits.


Figure 3. Diagram of circuit i9 with the addition of LEBs to the circuit netlist.

The motivation of this first test is to ensure that a small number of LEBs and LEB columns will not degrade the placement quality by orders of magnitude. Since we create LEBs by joining random CLB pairs, placing LEBs becomes equivalent to restricting placement of subset of CLBs to LEB columns and forcing CLB pairs to be placed adjacent to each other. Restricting CLB movement will generally harm placement, thus we expect a reasonable degradation to the placement bounding box cost. However, assuming the LEB moves are assessed properly, this degradation will not be extreme. The following figure illustrates the comparison of our LEB supported architecture against our CLB only architecture. Here, the final bounding box costs are compared where the LEB cost is divided by the CLB only cost. As the figure shows, the placement quality degrades by approximately 7%.


Figure 4. Comparison of the LEB supported architecture against the CLB only architecture.

Following our CLB to LEB comparison, we performed another check. Here we converted single CLBs directly into LEBs. This contrasts with the previous conversion that converted CLB pairs into LEBs. Both of these conversion techniques are described in detail here. We defined an architecture containing LEBs of unit width and height LEBs and LEB columns spanned every other column. Thus, half of the (x,y) FPGA contained LEB locations and the other half contained CLB locations where both location types were spread uniformly across the FPGA. Using this architecture, we created two netlists for each benchmark circuit: one where 98% of the CLBs were converted to LEBs, and one where 2% of the CLBs were converted LEBs. These netlists were both placed with our upgraded LEB placer. Since the FPGA architecture specified a uniform density of CLB and LEB locations with equivalent pin and size characteristics, we expect the placement quality of the LEB dominated netlists to be similar to the CLB dominated netlists. The following figure proves that this was the case where on average the placement difference was less than 0.1%.


Figure 5. Comparison of CLB dominated netlist placement over LEB dominated netlist placement.

Results

In all of our tests, we use a basic LEB architecture consisting of three uniformly spread LEB columns. LEBs have a height of two and width of one. For all netlist, 5% of their CLBs were converted into LEBs where CLB pairs were used to create one LEB. The CLBs consist of four 4-LUTs with 10 inputs per cluster (N=4 I=10). The LEBs consists of two CLBs with 20 inputs, 8 outputs, and 2 global clocks inputs (i.e. double the pins on one CLB). The following table illustrates our base costs used in all of our comparisons. Here, the LEBs were placed using the strict Dlimit constraint where LEBs could not violate this constraint during the move selection. Also, the basic bounding box estimation is used where LEB net pins are assumed to connect to the bottom right location of the LEB.

Circuit Name

Cost

5xp1

1.53787

C6288

31.7029

alu4

134.549

apex2

213.35

apex4

138.231

bigkey

169.947

clma

721.724

des

137.148

diffeq

71.4736

dsip

165.354

elliptic

248.9

ex1010

379.428

ex5p

117.27

frisc

304.211

i9

18.5126

misex3

135.463

pdc

715.321

s298

186.162

s38417

308.435

s38584.1

352.236

seq

143.866

spla

185.734

Table 1. Base bounding box cost used in results for benchmark circuits.

In the first set of experiments, the effect of violating the Dlimit constraint is explored as explained in the pervious section. Here, after a random location is found for an LEB swap, the closest LEB column is selected as its swap locations. As the figure shows, on average, this does not affect the placement quality.



Figure 6. Effect of violating the Dlimit constraint when finding swap locations.

The following figure illustrates the effect of scaling Dlimit by the LEB height and LEB column to column distance. As in the previous experiment, this seems to only marginally affect the placement quality where the cost reduces by 0.2%. This improvement seems insignificant and could be attributed to noise.


Figure 7. Effect of scaling Dlimit by the LEB height and LEB column to column distance.

The following figure illustrates the effect of altering the bounding box cost to the worse case metric as described in previous sections. Again, the effect is marginal where we get an average improvement of 0.2% in the bounding box cost. The negligible effect is attributed to the small LEB dimensions which are 2x1. Since the LEBs are close to the CLB dimensions, the effect of switching between the worse case and basic bounding box estimate for LEBs will be insignificant. This motivates the use of larger LEB dimensions to see the effect of various heuristics.


Figure 8. The effect of using the worse case bounding box estimate for LEBs.

In the next set of experiments, the LEB architecture was modified to use large LEBs that were 3x2 in dimension. An illustration of this architecture is shown in the Appendix. The following figure illustrates the effect of increasing the LEB size on the bounding box cost. As expected, the bounding box cost increases where the cost increases by 9% on average.


Figure 9. Effect of increasing the LEB dimensions on the overall bounding box cost.

The following experiments are compared against the basic LEB placement using 3x2 LEBs. Here we see the effect of scaling the Dlimit and using the worse case bounding box estimate. In both cases, only a marginal effect on placement is viewed.


Figure 10. The effect of scaling the Dlimit on the bounding box cost.

Figure 11. The effect of using the worse case bounding box estimate on overall bounding box cost.

In the previous graph, the effect of the worse case bounding box estimate had negligible effect on the average final bounding box cost. This could be due to the small number (5%) of LEBs present in the netlists. Thus, in the following experiment, we increased the LEB count and converted 20% of all CLBs in a netlist into LEBs. As before, large LEBs were used with dimensions 3x2. The results were more substantial where the worse case bounding box estimate increased the bounding box cost by 1% when compared to the basic bounding box estimate.


Figure 11. Effect of the worse case bounding box estimate on the final bounding box cost when 20% of all CLBs in the netlist are converted into 3x2 LEBs.

Discussion

The results in the previous section show that the worse case bounding box cost does not help in reducing the final bounding box cost. This estimate is probably overly pessimistic, thus harms the final placement. Furthermore, since the router attempts to reduce wire-length, the worse case pin connections are probably unrealistic and if applied to the timing driven placer, would most likely be harmful to performance.

The effect of manipulating or violating the Dlimit constraint appears to have no average effect on the final bounding box cost. This is probably due to two contributing factors. Firstly, blocks that violate the Dlimit constraint will generally begin to degrade the placement quality. Secondly, the heuristic only affects Dlimit when it becomes smaller than the LEB column to column distance. This generally only occurs at medium to low temperatures. At these temperatures, the probability of accepting a bad move is much lower than at high temperatures. Thus, even if LEBs are allowed to violate the Dlimit due to scaling or searching for the closest LEB location, the chance of those moves being accepted are low and will not affect the final placement quality.

Program Usage

Please contact Andrew Ling if you wish to have a copy of the VPR placer with bounding box driven LEB support. The following are the additional VPR flags needed to turn on the various heuristics described previously.

Flag

Description

-use_rlim

Strictly obey the Dlimit constraint for LEBs.

Default – After a LEB swap location is found, ignore the Dlimit constraint and find the closest valid LEB column as the swap location.

-use_leb_radius

Scale Dlimit by the LEB height and LEB column to column distance.

Default – do not scale Dlimit.

-use_worse_case

Use the worse case bounding box for LEBs.

Default – assume all the LEB pin connections are found at the bottom left corner of the LEB.

Table 2. VPR LEB placement options.

Last updated Dec 2005