Software Simulator

Supported Systems

  • Visual C++ 2008
  • Linux-like systems

The simulator is located at sim/.

Command

sim <config_file> <ready_delta> <sim_time_steps> [Option]

Note that sim_time_steps is the number of time steps during which measurement flits are generated. If a warmup phase is specified (NumWarmupSteps in the config file), the warmup cycles run first and are not counted as measurement cycles.

Options

  • v: verbose
  • V <T>: verbose after sim_time >= T
  • -legacy: Configuration file is of legacy format

Structure

Class Node defines the interface of a “Node” and is inherited by the classes TrafficGen, Router, and FlitQueue (currently not used). Class Interconnect implements the on-chip network.

Each clock cycle, the simulator drives the interconnect first and then the nodes by calling the tick() function of each object.
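
The per-cycle ordering described above can be sketched as follows. This is only an illustration in Python, not the simulator's own code: the Tickable class, the run() helper, and the time argument passed to tick() are all assumptions made for the example.

```python
class Tickable:
    """Hypothetical stand-in for any object that exposes tick()."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def tick(self, now):
        # Record the call so the ordering can be inspected afterwards.
        self.log.append((now, self.name))


def run(interconnect, nodes, num_steps):
    """Each step: tick the interconnect first, then every node."""
    for now in range(num_steps):
        interconnect.tick(now)
        for node in nodes:
            node.tick(now)


log = []
ic = Tickable("interconnect", log)
nodes = [Tickable("router0", log), Tickable("tg0", log)]
run(ic, nodes, 2)
```

After the run, log lists the interconnect ahead of both nodes within each time step.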

Router

The Router class models the on-chip router (without the input buffers). It uses a static deterministic routing table to forward flits. The table is indexed by flit destination, and each entry contains the next hop node address and the port and VC number to use. A Router can have a variable number of ports (defined in the topology file), and each port is full duplex.
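
A destination-indexed table of this kind might look like the sketch below. The RouteEntry and RoutingTable names are hypothetical; only the three fields per entry (next hop, port, VC) come from the description above. The sample entries follow the "Route Switch 0" lines of the example topology in the Configuration section.

```python
from typing import NamedTuple


class RouteEntry(NamedTuple):
    next_hop: int  # address of the next node on the path
    port: int      # output port to use
    vc: int        # virtual channel to use on that port


class RoutingTable:
    """Static, deterministic table indexed by flit destination (illustrative)."""

    def __init__(self):
        self._entries = {}

    def add(self, dest, next_hop, port, vc):
        self._entries[dest] = RouteEntry(next_hop, port, vc)

    def lookup(self, dest):
        # One fixed answer per destination: no adaptive choice is made.
        return self._entries[dest]


table = RoutingTable()
table.add(dest=0, next_hop=0, port=0, vc=0)  # local delivery on port 0
table.add(dest=1, next_hop=1, port=3, vc=1)  # Switch 0:3 leads to Switch 1
```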

RouterMesh

The RouterMesh subclass models an unpipelined mesh-element (5 ports, variable VCs). Incoming flits are queued according to the respective input port and virtual channel. Every clock cycle, this Router picks one flit among all input ports/VCs and routes it if the requested output port has available credits. Arbitration priority among the input ports is based on a rotating priority scheme, which prevents starvation.
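
One common way to realize a rotating priority scheme is a round-robin pointer that moves just past the most recent winner. The sketch below assumes that variant and flattens all input port/VC queues into a single list; the RotatingArbiter class and the can_send callback (standing in for the credit check) are hypothetical names, and the actual arbitration code may differ.

```python
from collections import deque


class RotatingArbiter:
    """Grant at most one input queue per cycle, rotating the priority."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.next_start = 0  # queue with the highest priority this cycle

    def pick(self, queues, can_send):
        """Return the index of the granted queue, or None if nothing can go."""
        for offset in range(self.num_inputs):
            idx = (self.next_start + offset) % self.num_inputs
            if queues[idx] and can_send(queues[idx][0]):
                # The winner becomes lowest priority next cycle.
                self.next_start = (idx + 1) % self.num_inputs
                return idx
        return None


queues = [deque(["a0", "a1"]), deque(), deque(["c0"])]
arb = RotatingArbiter(len(queues))
order = []
for _ in range(3):
    idx = arb.pick(queues, lambda flit: True)  # assume credits are available
    order.append(queues[idx].popleft())
```

Because the pointer advances past each winner, the second queue with traffic is served before the first queue gets a second turn, which is what prevents starvation.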

The Router uses a counter to track credits for each output port/VC. While this makes credit-based flow control simpler to implement, this element cannot be used to build any topology that requires a Router with more than 5 ports.
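
A per-port, per-VC credit counter can be sketched as below. The CreditCounter class and its method names are assumptions for illustration; the initial credit count would normally match the downstream buffer size, which is not specified here.

```python
class CreditCounter:
    """Track available credits for every (output port, VC) pair."""

    def __init__(self, num_ports, num_vcs, initial_credits):
        self._credits = [[initial_credits] * num_vcs for _ in range(num_ports)]

    def available(self, port, vc):
        return self._credits[port][vc] > 0

    def consume(self, port, vc):
        # Called when a flit is sent on (port, vc).
        if self._credits[port][vc] <= 0:
            raise RuntimeError("no credit left on this port/VC")
        self._credits[port][vc] -= 1

    def restore(self, port, vc):
        # Called when the downstream node returns a credit.
        self._credits[port][vc] += 1


credits = CreditCounter(num_ports=5, num_vcs=2, initial_credits=1)
credits.consume(3, 1)
blocked = not credits.available(3, 1)  # port 3 / VC 1 is now out of credits
credits.restore(3, 1)
```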

RouterMeshFull

A RouterMesh object can be configured to expose all its output ports to the global interconnect. This is used to isolate the effect of serializing the input ports and serializing the output ports. Right now this configuration can only be used with the DualCrossbarRTPort interconnect.

RouterMeshIdeal

The RouterMeshIdeal subclass models an unpipelined mesh-element (5 ports, variable VCs) that can route one flit from every input port in one clock cycle, provided there are sufficient credits. A separate output buffer is used for each output port. This model sets the lower bound on the clock/timestep performance given a specific interconnect.

Traffic Generator

The TrafficGen class models a number of synthetic traffic injection processes. For now, only the Bernoulli process with a fixed flit size and send-to address is implemented. Each TG has a default Router; it usually connects to port 0 on this Router.
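
A Bernoulli injection process makes an independent decision every time step, injecting with a fixed probability. The sketch below illustrates the idea; the BernoulliSource class, its parameters, and the dictionary used to represent a flit are assumptions for the example, not the TrafficGen interface.

```python
import random


class BernoulliSource:
    """Inject one flit per time step with probability `rate`."""

    def __init__(self, rate, dest, flit_size, seed=0):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be a probability")
        self.rate = rate
        self.dest = dest            # fixed send-to address
        self.flit_size = flit_size  # fixed size
        self._rng = random.Random(seed)

    def tick(self, now):
        # Independent coin flip every step: the defining Bernoulli property.
        if self._rng.random() < self.rate:
            return {"dest": self.dest, "size": self.flit_size, "created": now}
        return None


source = BernoulliSource(rate=0.25, dest=7, flit_size=1)
injected = [f for f in (source.tick(t) for t in range(10000)) if f is not None]
```

Over many steps the fraction of steps with an injection approaches the configured rate.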

Flit Queue

The FlitQueue class models the input buffers of nodes. It has configurable latency, bandwidth (number of flits per time step) and buffer size. By modeling the input buffers separately, we can construct nodes with a variable number of ports by connecting multiple FlitQueues to a Router.

In addition to the flit FIFO, each FQ also has a credit channel. This models the side-band channel between Routers that carries the flow control credits. This channel is simpler than the flit channel because it does not model bandwidth: credits only traverse a single hop, so the credit channel has at most one credit to deliver per time step. The flit channel and credit channel are independent, and each can make a delivery every clock cycle.

Virtual channels are modeled using separate FIFOs within a FlitQueue (each FQ represents a physical channel) instead of multiple FQs because bandwidth is a physical channel property. For example, if a physical channel with bandwidth 1 carries two VCs, then between the two VCs only 1 flit can pass through the FQ each time step.
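
The bandwidth-sharing example above can be sketched with one FIFO per VC and a single per-step delivery budget. This is a simplified illustration: FlitQueueSketch is a hypothetical name, the round-robin choice between VCs is an assumption, and latency, buffer size, and the credit channel are left out.

```python
from collections import deque


class FlitQueueSketch:
    """One physical channel: per-VC FIFOs that share a single bandwidth budget."""

    def __init__(self, num_vcs, bandwidth):
        self.vcs = [deque() for _ in range(num_vcs)]
        self.bandwidth = bandwidth  # flits per time step, across all VCs
        self._next_vc = 0           # round-robin pointer (assumed policy)

    def push(self, vc, flit):
        self.vcs[vc].append(flit)

    def tick(self):
        """Deliver at most `bandwidth` flits in total during this time step."""
        delivered = []
        empty_seen = 0  # consecutive empty VCs visited
        while len(delivered) < self.bandwidth and empty_seen < len(self.vcs):
            vc = self._next_vc
            self._next_vc = (vc + 1) % len(self.vcs)
            if self.vcs[vc]:
                delivered.append(self.vcs[vc].popleft())
                empty_seen = 0
            else:
                empty_seen += 1
        return delivered


fq = FlitQueueSketch(num_vcs=2, bandwidth=1)
fq.push(0, "a")
fq.push(1, "b")
steps = [fq.tick(), fq.tick(), fq.tick()]
```

With bandwidth 1, the two VCs take turns: only one flit leaves the queue in each time step even though both VCs have a flit ready.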

Flit

The Flit class represents the traffic in the network. Flits are injected by Traffic Generators and traverse several Routers until they reach the destination TG.

The Flit timestamp keeps track of when the next event for this Flit should be processed.

Configuration

We extend the SimFlex topology file to configure our simulator. For now, only the subset of SimFlex topology directives relevant to our simulator is supported; all other tokens are ignored. An example of this reduced set is shown below (adapted from 16node-torus.topology from DSMFlex.OoO).

Configuration files are located at sim/config/.

# Basic Switch/Node connections
NumNodes 16
NumSwitches 16
SwitchPorts 5              # Not used for now
SwitchBandwidth 4          # Not used for now

Top Node 0 -> Switch 0:0
Top Node 1 -> Switch 1:0
...
Top Node 15 -> Switch 15:0

# Topology for a 16 node TORUS
Top Switch 0:3 -> Switch 1:1
Top Switch 0:4 -> Switch 4:2
Top Switch 1:3 -> Switch 2:1
Top Switch 1:4 -> Switch 5:2
...
Top Switch 14:4 -> Switch 2:2
Top Switch 15:3 -> Switch 12:1
Top Switch 15:4 -> Switch 3:2

# Deadlock-free routing tables

# Switch 0 -> * { DestPort : VC }
Route Switch 0 -> 0 { 0:0 } 
Route Switch 0 -> 1 { 3:1 } 
...
Route Switch 0 -> 14 { 1:1 } 
Route Switch 0 -> 15 { 1:1 } 

...

# Switch 15 -> *
Route Switch 15 -> 0 { 3:0 }
Route Switch 15 -> 1 { 1:0 }
...
Route Switch 15 -> 14 { 1:0 }
Route Switch 15 -> 15 { 0:0 }

We added the following directives to specify the interconnect model, traffic pattern, and simulation time. These should be inserted at the beginning of the config file, before the SimFlex directives. The Traffic Pattern lines are generated by scripts.

I ideal
# I bus
# I dualbus
# I crossbar
# I dualthing3pipe 5

SwitchType ideal            # Specifies type of Router: ideal, finite
NumNodes 16                 # This has to appear before specifying the traffic pattern

# Traffic pattern (type, source node -> dest node, size, interval)
# Constant-rate permutation traffic
Traffic CBR Node 0 -> Node 7 1000 10.0
Traffic CBR Node 1 -> Node 5 1000 1.0
...
Traffic CBR Node 14 -> Node 10 1000 5.0
Traffic CBR Node 15 -> Node 2 1000 10.0

# Number of warmup cycles to run before starting measurement
NumWarmupSteps 300
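
As an illustration of the Traffic directive layout (type, source node -> dest node, size, interval), the sketch below parses one such line in Python. The parse_traffic function and the regular expression are assumptions written for this example; they are not the simulator's parser.

```python
import re

# Layout taken from the comment in the config: type, src -> dst, size, interval.
TRAFFIC_RE = re.compile(
    r"^Traffic\s+(\w+)\s+Node\s+(\d+)\s*->\s*Node\s+(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)$"
)


def parse_traffic(line):
    """Return a dict for a Traffic line, or None for any other line."""
    text = line.split("#", 1)[0].strip()  # drop trailing comments
    match = TRAFFIC_RE.match(text)
    if match is None:
        return None
    kind, src, dst, size, interval = match.groups()
    return {
        "type": kind,
        "src": int(src),
        "dst": int(dst),
        "size": int(size),
        "interval": float(interval),
    }


entry = parse_traffic("Traffic CBR Node 0 -> Node 7 1000 10.0")
```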

Legacy Support

The legacy config file format is supported and can be used with the -legacy command line switch. An example is shown below:

I ideal
T 0 1 6 30 88 30 88 30 2      # addr2, def_router, interval, psize, OQ latency, bandwidth, IQ ..., sendto
T 2 1 7 30 40 30 40 30 0
R 1 2 (0 0) (2 2)             # addr2, # of entries, (dest0, port0), (dest1, port1), ...
Q 88 30 3 3                   # latency, bandwidth, next_hop, addr2

A single letter (T/R/Q) indicates the type of the node, followed by a sequence of numbers for the node parameters. A single-line comment is marked by a #.
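
A minimal sketch of reading one legacy node line is shown below, assuming that parentheses only group numbers and can be ignored when tokenizing. The parse_legacy_line function is hypothetical and handles only the T/R/Q node lines, not the interconnect line.

```python
import re


def parse_legacy_line(line):
    """Return (type_letter, numbers) for a T/R/Q line, else None."""
    text = line.split("#", 1)[0].strip()  # everything after '#' is a comment
    if not text:
        return None
    kind, _, rest = text.partition(" ")
    if kind not in ("T", "R", "Q"):
        return None
    # Parentheses are treated as plain separators in this sketch.
    numbers = [int(tok) for tok in re.findall(r"-?\d+", rest)]
    return kind, numbers


router = parse_legacy_line("R 1 2 (0 0) (2 2)   # addr2, # of entries, ...")
queue = parse_legacy_line("Q 88 30 3 3")
```

How the numbers map to node parameters follows the per-line comments in the example above.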

The interconnect is specified using the following format. Only one interconnect should be specified in each config file.

I ideal
I bus
I dualbus
I crossbar
I dualthing3pipe 5            # Max delivery delay is 5 time steps

Only the ideal Router (infinite switching bandwidth) is supported in legacy mode.

Scripts

A set of helper scripts is located at sim/scripts/.

permutation.pl

Generate constant-rate permutation traffic for the given number of nodes.
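
The idea behind permutation traffic, where every node sends to exactly one destination and every node receives from exactly one source, can be sketched as follows. This Python function is not a port of permutation.pl: its name, its parameters, the use of one size and interval for all streams, and the fact that a node may be mapped to itself are all assumptions of the sketch.

```python
import random


def permutation_traffic(num_nodes, size, interval, seed=0):
    """Emit one 'Traffic CBR' line per node for a random permutation."""
    rng = random.Random(seed)
    dests = list(range(num_nodes))
    rng.shuffle(dests)  # each destination appears exactly once
    return [
        f"Traffic CBR Node {src} -> Node {dst} {size} {interval}"
        for src, dst in enumerate(dests)
    ]


lines = permutation_traffic(16, size=1000, interval=10.0)
```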

run_sweep.pl

Sweep simulation parameters (such as interconnect, Router architecture, or traffic load). For a sweep config file as shown below, 4 different experiments (2 arch parameters x 2 traffic loads) will be generated and run.

sim_exec           $(PROJ_HOME)/sim/sim.exe
topology_file      16node-torus.bare.topology
traffic_file       16node-torus.traffic
sim_length         100000

parse_script       $(PROJ_HOME)/sim/script/parse_out.pl

sweep_arch (
  ideal, mesh_ideal
  dualcrossbar, mesh_ideal
);

sweep_traffic (
  size 2, interval 4
  size 2, interval 5
);
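
The experiment count follows from taking the cross product of the two sweep lists. The sketch below reproduces that expansion for the config above; it assumes each sweep_arch row is an (interconnect, Router type) pair, and the expand_sweep function and dictionary keys are hypothetical names, not part of run_sweep.pl.

```python
from itertools import product

# Values copied from the sweep config above.
sweep_arch = [("ideal", "mesh_ideal"), ("dualcrossbar", "mesh_ideal")]
sweep_traffic = [{"size": 2, "interval": 4}, {"size": 2, "interval": 5}]


def expand_sweep(arch_points, traffic_points):
    """Return one experiment per (architecture, traffic load) combination."""
    return [
        {"interconnect": ic, "router": router, **traffic}
        for (ic, router), traffic in product(arch_points, traffic_points)
    ]


experiments = expand_sweep(sweep_arch, sweep_traffic)
```

Here experiments holds 2 x 2 = 4 entries, one per simulation run.
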

parse_out.pl

Parses metrics out of the simulation output. It is used by the run_sweep.pl script to generate experiment results.

Benchmarks

We use a simple benchmark (16node-torus.topology) to evaluate the design choices made in the simulator. It consists of a 16-node torus network with constant-rate permutation traffic. The traffic pattern is shown below. In the future we need to create more benchmarks.

16node-torus network with CBR permutation traffic

Verification

We compare the results obtained from the software simulator to booksim (PPIN) to verify the correctness of our simulator.

Random permutation traffic

This traffic saturates at flit rate = 0.5 because some links are shared between two traffic streams. We start to see saturation behaviour when flit rate >= 0.4 (80% capacity) (figure1). Before saturation, the average packet latencies measured by booksim and icsim agree to within 0.65%. After saturation, both simulators project a similar trend of increasing latency, but the actual latency can differ by up to 25% (at flit rate = 0.45). One possible cause is the increased probability of “collision” (i.e. a string of packets issued in consecutive cycles), which causes longer packet latencies. If this is the case, we would still expect a similar distribution of packet latencies between the two simulators. This is verified in figure3 below.
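
The saturation point and the 80% figure can be checked with a short calculation. It assumes each link carries at most one flit per time step, which is not stated explicitly above; the variable names are illustrative.

```python
link_capacity = 1.0    # flits per time step per link (assumed)
streams_per_link = 2   # some links are shared by two traffic streams

# Each stream on a shared link can use at most half of the link.
saturation_rate = link_capacity / streams_per_link

# Fraction of that capacity used at flit rate 0.4.
utilization = 0.4 / saturation_rate
```

This gives a saturation rate of 0.5 flits per time step per stream and a utilization of 0.8 at flit rate 0.4.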

In both cases, the simulator is warmed up for 15,000 cycles and measurement flits are generated for about 20,000 cycles.

Sweeping the packet size at fixed flit rate (=0.4, before saturation) produces similar results in both booksim and icsim (figure4). This gives us confidence that icsim behaves correctly for bursty traffic as well.

simulator.txt · Last modified: 2009/10/20 11:57 by danyao