The Plan

The core of the project is the interconnection network as described on the start page. On top of that, we propose to build a cache simulator that allows the study of things such as cache coherence protocols for large CMPs. This calls for a three-tiered system: a processor model, a cache simulator, and the interconnection network simulator.

The plan is to first develop a software performance model of the interconnection network simulator in order to evaluate design decisions (e.g., is the ECE1373 on-chip interconnect architecture scalable to hundreds of nodes? How hard is it to model the router in detail?). We will also use this model to work out the interfaces between the processor model (which can be a host PC or a full-system simulator), the cache simulator, and the interconnection network simulator.

We need to define the API between the cache simulator and the interconnection network simulator, so that development of these two parts can proceed in parallel.

Standalone Interconnection Network Simulator

With the stand-alone interconnection network simulator, the processor model and the cache simulator can be implemented in software on a host PC or modeled by a full-system simulator. This external “driver” provides packets to the interconnection network simulator, and receives back timing information for each event.
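The driver-facing interface described above can be sketched as follows. This is only an illustration under stated assumptions: the names Packet, Timing, ToyMeshNetwork and send are hypothetical, and the 2D mesh with XY routing, fixed per-hop latency and no contention is a placeholder, not the ECE1373 interconnect or the real simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """What the external driver hands to the network simulator (hypothetical)."""
    src: int           # injecting node id
    dst: int           # destination node id
    size_flits: int    # packet length in flits
    inject_cycle: int  # cycle at which the driver injects the packet


@dataclass(frozen=True)
class Timing:
    """What the network simulator reports back for one packet (hypothetical)."""
    packet: Packet
    arrive_cycle: int


class ToyMeshNetwork:
    """Placeholder model: 2D mesh, XY routing, no contention (assumption)."""

    def __init__(self, width: int, height: int, hop_latency: int = 1):
        self.width = width
        self.height = height
        self.hop_latency = hop_latency

    def _coords(self, node: int) -> tuple[int, int]:
        if not 0 <= node < self.width * self.height:
            raise ValueError(f"node {node} is outside the mesh")
        return node % self.width, node // self.width

    def send(self, packet: Packet) -> Timing:
        sx, sy = self._coords(packet.src)
        dx, dy = self._coords(packet.dst)
        hops = abs(sx - dx) + abs(sy - dy)
        # Head flit pays the per-hop latency; the remaining flits add
        # one cycle each of serialization delay.
        latency = hops * self.hop_latency + (packet.size_flits - 1)
        return Timing(packet, packet.inject_cycle + latency)


net = ToyMeshNetwork(width=4, height=4, hop_latency=2)
timing = net.send(Packet(src=0, dst=15, size_flits=4, inject_cycle=100))
print(timing.arrive_cycle)
```

The point of the sketch is the shape of the exchange (packets in, per-packet timing out); a real model would replace the latency formula with the router and link behaviour being studied.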

Cache Simulator

The cache simulator sits between the processor model and the interconnection network simulator. The design of the cache simulator is orthogonal to that of the interconnect simulator, so long as it is compatible with the interface exported by the latter.

The processor sends memory request messages to the cache simulator, which in turn sends network packets to the interconnection network.
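As a rough illustration of this flow, the sketch below turns memory requests into network packets on a cache miss. Everything in it is an assumption made for the example: the names MemRequest, NetPacket and ToyCache, the direct-mapped organization, the GETS/GETX message kinds, and the block-interleaved choice of home node. It tracks no coherence state, so it is not a coherence protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRequest:
    """Memory request sent by the processor model (hypothetical)."""
    core: int
    addr: int
    is_write: bool


@dataclass(frozen=True)
class NetPacket:
    """Packet the cache simulator hands to the network (hypothetical)."""
    src: int
    dst: int
    kind: str  # "GETS" for read misses, "GETX" for write misses
    addr: int  # block-aligned address


class ToyCache:
    """Direct-mapped, tag-only cache for one core (assumption)."""

    def __init__(self, core: int, num_sets: int = 64,
                 block_bytes: int = 64, num_nodes: int = 16):
        self.core = core
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.num_nodes = num_nodes
        self.tags: dict[int, int] = {}  # set index -> resident tag

    def access(self, req: MemRequest) -> list[NetPacket]:
        block = req.addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        if self.tags.get(index) == tag:
            return []  # hit: nothing goes onto the network
        self.tags[index] = tag  # fill on miss, evicting silently
        home = block % self.num_nodes  # block-interleaved home node
        kind = "GETX" if req.is_write else "GETS"
        return [NetPacket(self.core, home, kind, block * self.block_bytes)]


cache = ToyCache(core=3)
print(cache.access(MemRequest(core=3, addr=0x1040, is_write=False)))  # miss
print(cache.access(MemRequest(core=3, addr=0x1048, is_write=False)))  # hit
```

Because the cache only emits packets and never looks inside the network, it stays independent of the interconnect design as long as the packet format is agreed on.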

Other Design Issues

FPGA Platform

Two options for now:

  • Xilinx ML507 board
  • The other board with the FPGA on the FSB (talk to Greg or Prof. Chow)

ISA

The interconnection network simulator should not depend on a specific ISA, but we want to make sure it works with x86 processors (host or simulated).

To-Do's

Keep this list of issues to investigate up to date. Results should be logged here, or on a separate page linked from here.

  1. :DONE: What is the aggregated bandwidth required between the processors and the interconnection network simulator?
    • SimFlex data from Jason (bandwidth.xls) shows that, for a variety of workloads, the average number of messages (control and data) injected into the network per core per cycle is 0.05. Assuming that we use 64-bit message descriptors and that the on-chip network is simulated at 10 MHz, this translates into a bandwidth requirement of 3.86 MB/core/s. The theoretical bandwidths of the common interfaces are listed below:
      • PCIe: v1.x 250 MB/s per lane; v2.0 500 MB/s per lane
      • USB 2.0: 60 MB/s
      • FSB (on Nallatech V5 Xeon accelerator module): 8GB/s peak
  2. :TODO: Software performance model of the interconnection network simulator
    • How many nodes can we pack into a single FPGA?
    • Is the on-chip interconnect scalable to more nodes?
  3. :TODO: What interface to use between the simulator on the FPGA and the host PC? FSB? PCIe? USB?
  4. :TODO: How does the data flow in a multi-processor system?
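
The bandwidth estimate in item 1 can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted there; the function names are made up for the example, and the core counts ignore protocol overhead and any traffic returning to the host, so they are optimistic upper bounds. With the rounded average of 0.05 the result is 4.0 MB/core/s (about 3.81 MiB/core/s), close to the 3.86 quoted above; the small gap may come from rounding of the average or from the unit convention, and the sketch does not try to reproduce it.

```python
from fractions import Fraction

MSGS_PER_CORE_PER_CYCLE = Fraction("0.05")  # average from bandwidth.xls
DESCRIPTOR_BYTES = 8                        # 64-bit message descriptor
SIM_CLOCK_HZ = 10_000_000                   # network simulated at 10 MHz


def per_core_bytes_per_s() -> Fraction:
    """Bytes per second that one simulated core injects on average."""
    return MSGS_PER_CORE_PER_CYCLE * DESCRIPTOR_BYTES * SIM_CLOCK_HZ


def cores_supported(interface_bytes_per_s: int) -> int:
    """Upper bound on cores one interface can feed, ignoring overhead."""
    return interface_bytes_per_s // per_core_bytes_per_s()


# Theoretical peak rates from the list in item 1, in bytes per second.
INTERFACES = {
    "PCIe v1.x (per lane)": 250_000_000,
    "PCIe v2.0 (per lane)": 500_000_000,
    "USB": 60_000_000,
    "FSB": 8_000_000_000,
}

bw = per_core_bytes_per_s()
print(f"per core: {float(bw) / 1e6:.2f} MB/s = {float(bw) / 2**20:.2f} MiB/s")
for name, rate in INTERFACES.items():
    print(f"{name}: up to {cores_supported(rate)} cores")
```

Exact fractions are used instead of floats so that the floor divisions are not affected by rounding.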

A list of requirements on different aspects of the project.

Notes

  1. Event-queue based approach may be better for latency tolerance
    • Parallel event dispatch
  2. Network visualization
  3. Specify the properties of the network and optimize for the application/underlying fabric
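
The event-queue idea in note 1 can be sketched as a priority queue ordered by cycle. This is a minimal software illustration, not the planned FPGA design: the EventQueue class and its methods are hypothetical names. Events that share a cycle are popped as one batch; the sketch runs the batch sequentially, but those are the events a parallel dispatcher could issue together.

```python
import heapq
import itertools


class EventQueue:
    """Discrete-event queue ordered by cycle (hypothetical sketch)."""

    def __init__(self):
        self._heap = []
        # Unique sequence number: breaks ties in insertion order and
        # keeps the heap from ever comparing handler objects.
        self._seq = itertools.count()

    def schedule(self, cycle, handler, *args):
        heapq.heappush(self._heap, (cycle, next(self._seq), handler, args))

    def pop_batch(self):
        """Remove and return (cycle, events) for the earliest pending cycle."""
        cycle = self._heap[0][0]
        batch = []
        while self._heap and self._heap[0][0] == cycle:
            _, _, handler, args = heapq.heappop(self._heap)
            batch.append((handler, args))
        return cycle, batch

    def run(self):
        while self._heap:
            cycle, batch = self.pop_batch()
            # Events in one batch share a timestamp; they are dispatched
            # one after another here for simplicity.
            for handler, args in batch:
                handler(self, cycle, *args)


log = []


def deliver(queue, cycle, name):
    log.append((cycle, name))


q = EventQueue()
q.schedule(5, deliver, "b")
q.schedule(2, deliver, "a")
q.schedule(5, deliver, "c")
q.run()
print(log)
```

Handlers receive the queue itself, so they can schedule follow-up events (for example, the next hop of a packet) at a later cycle.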
notes.txt · Last modified: 2009/07/30 16:11 by danyao