FPGA-Aided Microarchitecture Exploration

An increasing number of modern digital systems use interconnection networks as their communication fabric. Simulation is an essential tool for evaluating the performance of interconnection networks, but detailed simulation in software is often slow. Since a network has little global state, and most nodes in the network are quasi-independent of one another, there is an opportunity for hardware acceleration.

Interconnection Network Simulator on FPGA

The goal is to build a timing simulator to study the behaviour of the kind of interconnection networks used in chip multiprocessors (CMPs). Modern on-chip interconnection networks are packet-switched and consist of a network of routers. The simulator will model such a network, with a focus on the details of the pipelines and buffers of the on-chip routers. The simulator accepts packet traffic from an external source (a host computer or a full-system simulator) and simulates the traversal of the packets through the network.

The simulator provides tools to measure various network performance metrics, such as aggregate throughput and average end-to-end latency.
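As a rough illustration of these two metrics, the sketch below computes them from a list of per-packet records. The record fields (`inject_cycle`, `eject_cycle`, `num_flits`) and the function names are assumptions made for this example, not part of the simulator's specification.

```python
from dataclasses import dataclass

@dataclass
class PacketRecord:
    # Hypothetical per-packet log entry; field names are illustrative only.
    inject_cycle: int  # cycle at which the packet entered the network
    eject_cycle: int   # cycle at which the packet left the network
    num_flits: int     # packet length in flits

def average_latency(records):
    """Mean end-to-end latency in cycles over all delivered packets."""
    return sum(r.eject_cycle - r.inject_cycle for r in records) / len(records)

def aggregate_throughput(records, total_cycles):
    """Flits delivered per cycle, summed over the whole network."""
    return sum(r.num_flits for r in records) / total_cycles

records = [
    PacketRecord(inject_cycle=0, eject_cycle=10, num_flits=4),
    PacketRecord(inject_cycle=5, eject_cycle=25, num_flits=4),
    PacketRecord(inject_cycle=8, eject_cycle=20, num_flits=2),
]
print(average_latency(records))
print(aggregate_throughput(records, total_cycles=25))
```

In hardware, the same quantities could be accumulated with running counters rather than a stored log; the list here only keeps the arithmetic easy to follow.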

A tentative Project Plan.

Goals

Performance

Function

The interconnection network simulator should model the functionality of the following components:

Interface

The simulator should provide an interface for interacting with an external processor or memory system simulator. The external simulator can run either on an embedded processor or on an off-chip CPU. It provides stimulus traffic to the interconnection network simulator and receives latency numbers in return.

Configurability

Related Work

Research Survey

Architecture

Router

Traffic Generation

On-Chip Interconnection Network

On-chip interconnect

Our on-chip interconnect is similar to the one in the Packet Network Simulator project. This architecture allows arbitrary communication patterns between the on-chip nodes (Routers, Traffic Generators, etc.), so we can simulate different topologies without changing the simulator hardware.
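One way to picture this is to treat the target topology as a table that the generic interconnect consults when forwarding a flit. The sketch below is a software analogy under that assumption; the port numbering, the table layout, and the function names are hypothetical and are not taken from the actual hardware design.

```python
def ring_links(n):
    """Link table for a unidirectional ring: output port 0 of router i feeds router i+1."""
    return {(i, 0): ((i + 1) % n, 0) for i in range(n)}

def mesh_links(width, height):
    """Link table for a 2D mesh. Assumed ports: 0=east, 1=west, 2=north, 3=south."""
    links = {}
    for y in range(height):
        for x in range(width):
            node = y * width + x
            if x + 1 < width:  # horizontal link to the eastern neighbour
                links[(node, 0)] = (node + 1, 1)
                links[(node + 1, 1)] = (node, 0)
            if y + 1 < height:  # vertical link to the southern neighbour
                links[(node, 3)] = (node + width, 2)
                links[(node + width, 2)] = (node, 3)
    return links

def deliver(links, src_node, src_port):
    """Generic forwarding step: find where a flit leaving (node, port) arrives."""
    return links[(src_node, src_port)]

ring = ring_links(4)
mesh = mesh_links(2, 2)
print(deliver(ring, 3, 0))  # wraps around to router 0
print(deliver(mesh, 0, 3))  # router 0's south port reaches router 2
```

Switching from the ring to the mesh changes only the table contents; `deliver` itself is untouched, which mirrors the idea of changing topology without changing the simulator hardware.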

HDL

HDL specifications of the simulator components

Stuff

Design Questions

  1. What's the performance penalty for using a generic on-chip network to simulate a specific topology instead of building the target topology directly?
    • Roughly a 50% speed reduction, as measured in the Network-Simulator-on-Chip project 1)
  2. Which full-system simulator to use?
    • Need to analyze the performance bottleneck in a full-system simulator
  3. How to synchronize timing between external processor simulator and the interconnect simulator?
  4. Is there any merit in not assuming unit bandwidth and latency for on-chip links?
    • As long as everything operates in units of flit-cycles, there is no benefit in assuming non-unit bandwidth and latency. Links to memory controllers are also on-chip.
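To make the unit-bandwidth, unit-latency assumption from question 4 concrete, here is a minimal sketch of such a link modelled in flit-cycles. The class and method names are hypothetical; the real simulator would express this as a register in HDL.

```python
class UnitLink:
    """Link with unit bandwidth (one flit per flit-cycle) and unit latency (one flit-cycle)."""

    def __init__(self):
        self._in_flight = None  # flit written by the sender during the current cycle
        self.output = None      # flit visible to the receiver during the current cycle

    def send(self, flit):
        # Unit bandwidth: at most one flit may be accepted per flit-cycle.
        if self._in_flight is not None:
            raise RuntimeError("link already carries a flit this cycle")
        self._in_flight = flit

    def tick(self):
        """Advance one flit-cycle: the flit sent last cycle becomes visible."""
        self.output = self._in_flight
        self._in_flight = None

link = UnitLink()
link.send("head")
link.tick()
first = link.output   # flit sent in the previous cycle
link.send("body")
link.tick()
second = link.output
link.tick()
third = link.output   # nothing was sent, so the link is idle
```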
1) http://www.stuffedcow.net/bits/On-Chip_Network ECE1373 Network-Simulator-on-Chip project