|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
VPR
Placement Support for LEBs The goal of placement support for LEBs was to upgrade the VPR wire-length (bounding box) driven placer such that it could support any number of LEB columns where all LEB locations were the same width and height. This was successfully accomplished with full graphics support as shown in the Placement Pictures. Basic ApproachWe applied the same placement heuristics applied to IOs and CLBs to place LEBs. Thus, LEBs are added to the anneal process in VPR. The annealer works by randomly selecting blocks to move and swap that block to a random location that is obeys the Dlimit constraint. Furthermore, when choosing the random location, you must find a location that can support the current block type (as specified in the clb[][].type field). Previously, the annealer only had conditions for IOs and CLBs. We updated the annealer to support LEBs. In cases where the random block selected is an LEB, a valid LEB location is found for swapping. Like CLBs and IOs, the LEB swap location is chosen at random and must obey the radius limit constraints, Dlimit, at the current temperature. LEB column locations are stored in an additional data structure for fast access (specified in the leb_cols data structure). Once the swap is made, the change is cost is calculated. Based on the current temperature and change in cost, it is either accepted or rejected. Since the placement is driven by the wire-length estimate (bounding box cost), the change in cost is simply the total wire-length estimate before the swap minus the wire-length estimate after the swap. As stated in the Introduction, swaps that increase the overall wire-length estimate may still be accepted in order to achieve hill-climbing. Since LEBs are not uniform width or height, this complicates the wire-length estimate and move radius constraints. In the following sections we explain this issue and some possible solutions. Additional HeuristicsRadius Limiter for MovesAs described in [Betz1997], blocks are restricted
in movement in the x and y direction by a radius limit referred as
Dlimit as illustrated in previous sections. Since CLBs are unit
width and height, Dlimit represents the maximum number
of CLB positions between a source location and its destination. For
LEBs, this is no longer the case where the LEB is not necessarily unit width
or unit height, and LEB columns are not adjacent to each other. One
possibility is to ignore this and treat the Dlimit
uniformly between CLBs and LEBs. However, this may potentially cause
LEBs to get "stuck" in column early in the placement stage. For example,
consider the following figure where there exist two LEB columns on the
chip. ![]() Figure 1. LEB Dlimit restriction. One possibility is to relax the Dlimit constraint in some cases. When selecting the location to swap to, we pick any (x,y) location that obeys the Dlimit constraint, just as in the case of CLBs. If this location is not a valid LEB location, we find the closest LEB location. When searching for the closest LEB location, we ignore the Dlimit constraint. This method prevents an LEB from getting "stuck" in an LEB column early in the placement; however, once Dlimit becomes smaller than half the length between two adjacent LEB columns, an LEB will again get trapped in its current column.
Our second method prevents an LEB from getting trapped in any column
throughout the placement process. In this case, we scale
Dlimit by the distance between two adjacent LEB
columns for the x direction, and by the LEB height in the y direction.
Thus, LEBs will always have freedom to switch between columns and
neighbouring LEBs. For example, in the following figure,
Dlimit=1, but the valid search region for LEB swaps spans
an entire LEB column to column width and LEB height. ![]() Figure 2. Scaling of Dlimit for LEB moves. Pin Locations for Wire-Length Estimation
VPR uses net bounding boxes as an estimation of wire-length.
In order to calculate the bounding box, the pin locations of a net need to be identified.
For CLBs, there is the assumption that all pins are centered in the
middle of the CLB. This is a fair approximation since all CLBs are unit
height and width. LEBs, however, can vary in height and width and thus
there is more flexibility in pin locations. We use two methods to
approximate the location of pins: assume all pins are found in the
bottom-left corner of the LEB; assume the worst case where connections will
span the farthest distance between two LEBs. In the worst case
condition, this will cause the bounding box to encapsulate the entire
LEB. For example, consider the simple case where the two LEBs are
connected in the following figure. The pin locations are assumed to
be in the locations which will cause the pins to span the farthest
distance, and thus the bounding box will encapsulate both LEBs. ![]() Figure 1. Illustration of a worst case bounding box for two connected LEBs.
The worse case bounding box estimate required several changes to the VPR
bounding box calculator. VPR uses edge detection to speed up bounding
box calculations. This is useful when the number of pins on a net is
fairly large (more than 4 pins). Edge detection works by storing the
current bounding box edge locations and the number of blocks found at
each edge.
The benefit of edge detection allows bounding box updates to occur
without having to visit all pin connections of the net. Only the blocks
currently being moved and the number of blocks at a given edge need to
be considered.
This is illustrated in Figure 2a. Here, any movement indicated by the
arrows will require an update to the bounding box size. Note that we
only need to consider the current block being moved in this illustration.
During the
worse case bounding box estimation, having
multi-width and height LEBs complicates edge detection
since the size of the
blocks needs to be accounted for as shown in Figure 2b.
![]() (a) Edge detection using uniform blocks ![]() (b) Edge detection using non-uniform blocks Figure 2. Edge detection scheme used in VPR. Test Approach
In order to ensure our LEB upgrades did not degrade placement quality,
we ran several tests on 23 MCNC benchmark circuits. First, we placed
each circuit with CLBs only and compared the cost against an LEB based
architecture with three LEB columns and 5% LEB content in the netlist.
When generating LEBs, we select random CLB pairs and join them to form
one LEB.
This process is described in detail here,
The following figure illustrates this architecture using one of the
smaller MCNC benchmark circuits.
Figure 3. Diagram of circuit i9 with the addition of LEBs to the circuit netlist.
Figure 4. Comparison of the LEB supported architecture against the CLB only architecture.
Figure 5. Comparison of CLB dominated netlist placement over LEB dominated netlist placement. ResultsIn all of our tests, we use a basic LEB architecture consisting of three uniformly spread LEB columns. LEBs have a height of two and width of one. For all netlist, 5% of their CLBs were converted into LEBs where CLB pairs were used to create one LEB. The CLBs consist of four 4-LUTs with 10 inputs per cluster (N=4 I=10). The LEBs consists of two CLBs with 20 inputs, 8 outputs, and 2 global clocks inputs (i.e. double the pins on one CLB). The following table illustrates our base costs used in all of our comparisons. Here, the LEBs were placed using the strict Dlimit constraint where LEBs could not violate this constraint during the move selection. Also, the basic bounding box estimation is used where LEB net pins are assumed to connect to the bottom right location of the LEB.
In the first set of experiments, the effect of violating the Dlimit constraint is explored as explained in the pervious section. Here, after a random location is found for an LEB swap, the closest LEB column is selected as its swap locations. As the figure shows, on average, this does not affect the placement quality.
Figure 6. Effect of violating the Dlimit constraint when finding swap locations.
The following figure illustrates the effect of scaling
Dlimit by the LEB height and LEB column to column
distance. As in the previous experiment, this seems to only
marginally affect the placement quality where the cost reduces by 0.2%.
This improvement seems insignificant and could be attributed to
noise.
Figure 7. Effect of scaling Dlimit by the LEB height and LEB column to column distance.
The following figure illustrates the effect of altering the bounding box
cost to the worse case metric as described in previous sections. Again, the effect is marginal where we
get an average improvement of 0.2% in the bounding box cost. The
negligible
effect is attributed to the small LEB dimensions which are 2x1. Since
the LEBs are close to the CLB dimensions, the effect of switching
between the worse case and basic bounding box estimate for LEBs will be
insignificant. This motivates the use of larger LEB dimensions to see
the effect of various heuristics.
Figure 8. The effect of using the worse case bounding box estimate for LEBs.
In the next set of experiments, the LEB architecture was modified to use
large LEBs that were 3x2 in dimension. An illustration of this
architecture is shown in the Appendix. The
following figure illustrates the effect of increasing the LEB size on
the bounding box cost. As expected, the bounding box cost increases
where the cost increases by 9% on average.
Figure 9. Effect of increasing the LEB dimensions on the overall bounding box cost.
The following experiments are compared against the basic LEB placement
using 3x2 LEBs. Here we see the effect of scaling the
Dlimit and using the worse case bounding box estimate.
In both cases, only a marginal effect on placement is viewed.
Figure 10. The effect of scaling the Dlimit on the bounding box cost.
Figure 11. The effect of using the worse case bounding box estimate on overall bounding box cost.
In the previous graph, the effect of the worse case bounding box
estimate had negligible effect on the average final bounding box cost.
This could be due to the small number (5%) of LEBs present in the
netlists. Thus, in the following experiment, we increased the LEB count
and converted 20% of all CLBs in a netlist into LEBs. As before, large
LEBs were used with dimensions 3x2. The results were more substantial
where the worse case bounding box estimate increased the bounding box
cost by 1% when compared to the basic bounding box estimate.
Figure 11. Effect of the worse case bounding box estimate on the final bounding box cost when 20% of all CLBs in the netlist are converted into 3x2 LEBs. DiscussionThe results in the previous section show that the worse case bounding box cost does not help in reducing the final bounding box cost. This estimate is probably overly pessimistic, thus harms the final placement. Furthermore, since the router attempts to reduce wire-length, the worse case pin connections are probably unrealistic and if applied to the timing driven placer, would most likely be harmful to performance. The effect of manipulating or violating the Dlimit constraint appears to have no average effect on the final bounding box cost. This is probably due to two contributing factors. Firstly, blocks that violate the Dlimit constraint will generally begin to degrade the placement quality. Secondly, the heuristic only affects Dlimit when it becomes smaller than the LEB column to column distance. This generally only occurs at medium to low temperatures. At these temperatures, the probability of accepting a bad move is much lower than at high temperatures. Thus, even if LEBs are allowed to violate the Dlimit due to scaling or searching for the closest LEB location, the chance of those moves being accepted are low and will not affect the final placement quality. Program UsagePlease contact Andrew Ling if you wish to have a copy of the VPR placer with bounding box driven LEB support. The following are the additional VPR flags needed to turn on the various heuristics described previously.
Last updated Dec 2005 |