Research Overview
My primary research interests lie in on-chip interconnection networks and cache coherence protocols for many-core architectures. My current research re-examines and challenges design assumptions that hold for traditional shared-memory multiprocessors but may not carry over to chip multiprocessors. As many of these design choices migrate on-chip, it is worthwhile to examine their suitability and to develop novel solutions that are attractive in the unique environment of a many-core architecture. The goal of this research is threefold: to carefully characterize communication requirements (as dictated by the coherence protocol and software), to design the interconnection network to better serve the coherence protocol, and to improve cache coherence protocols and high-level communication mechanisms so that they better leverage the functionality of the on-chip interconnection network.
For an overview of and introduction to on-chip network research, I refer the interested reader to the Synthesis Lecture on Computer Architecture, On-Chip Networks.
Current Projects
Simulation Acceleration
Modern multi-cores and systems-on-chip increasingly use packet-switched networks-on-chip (NoCs) to meet the growing demand for on-chip communication bandwidth as more cores are incorporated into each chip. NoC designs are sensitive to many parameters, such as topology, buffer sizes, routing algorithms, and flow control mechanisms, so detailed NoC simulation is essential for accurate full-system evaluation. We are exploring various techniques to reduce simulation time and improve simulation fidelity. First, we propose DART, a fast and flexible FPGA-based NoC simulation architecture. Rather than laying the NoC out in hardware on the FPGA, as previous approaches do, our design virtualizes the NoC by mapping its components onto a generic NoC simulation engine composed of a fully-connected collection of fundamental components (e.g., routers and flit queues). This approach has two main advantages: (i) because the FPGA implementation is decoupled from the simulated NoC, the engine can simulate any NoC; and (ii) any NoC can be mapped onto the engine without resynthesizing the FPGA design, a step that can be time-consuming for large designs. Second, we are exploring new evaluation methodologies that will allow early-stage analysis of the impact and requirements of cache coherence protocols for on-chip networks.
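The benefit of virtualizing the NoC can be illustrated with a small software analogy. The sketch below is not DART's FPGA design; it is a hypothetical Python model (the names NocEngine, ring_table and mesh_table are made up for illustration) in which a fixed pool of router slots and flit queues, any of which can forward to any other, simulates whatever topology is described by a next-hop table loaded at run time. Switching from a ring to a mesh only swaps the table, which is the software counterpart of mapping a new NoC without resynthesis. It tracks hop counts only and ignores link latency, virtual channels and flow control.

```python
from collections import deque


class NocEngine:
    """Hypothetical software model of a virtualized NoC simulation engine.

    The engine owns a fixed pool of router slots, each with one input flit
    queue, and any slot can forward into any other slot's queue (a fully
    connected fabric). The simulated topology is only a next-hop table that
    is loaded at run time, so mapping a different NoC means loading a new
    table, not rebuilding the engine.
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.queues = [deque() for _ in range(num_slots)]
        self.next_hop = {}

    def load_topology(self, next_hop):
        # next_hop[(router, dest)] -> router that receives the flit next.
        for (r, _), n in next_hop.items():
            if not (0 <= r < self.num_slots and 0 <= n < self.num_slots):
                raise ValueError("topology needs more router slots than the engine has")
        self.next_hop = dict(next_hop)
        for q in self.queues:
            q.clear()

    def inject(self, src, dest, flit_id):
        self.queues[src].append((dest, flit_id, 0))  # (destination, id, hops so far)

    def run(self, max_cycles=10_000):
        """Advance every router in lockstep; return {flit_id: hop count}."""
        delivered = {}
        for _ in range(max_cycles):
            if not any(self.queues):
                return delivered
            # Phase 1: each router removes at most one flit from its queue.
            moves = [(r, q.popleft()) for r, q in enumerate(self.queues) if q]
            # Phase 2: deliver or forward, so no flit moves twice in one cycle.
            for r, (dest, fid, hops) in moves:
                if dest == r:
                    delivered[fid] = hops
                else:
                    self.queues[self.next_hop[(r, dest)]].append((dest, fid, hops + 1))
        raise RuntimeError("network did not drain within max_cycles")


def ring_table(n):
    # Unidirectional ring: every router forwards to (r + 1) mod n.
    return {(r, d): (r + 1) % n for r in range(n) for d in range(n) if r != d}


def mesh_table(k):
    # k x k mesh with dimension-order (XY) routing; router id = y * k + x.
    table = {}
    for r in range(k * k):
        rx, ry = r % k, r // k
        for d in range(k * k):
            dx, dy = d % k, d // k
            if rx != dx:
                table[(r, d)] = ry * k + rx + (1 if dx > rx else -1)
            elif ry != dy:
                table[(r, d)] = (ry + (1 if dy > ry else -1)) * k + rx
    return table


engine = NocEngine(num_slots=16)
engine.load_topology(ring_table(16))
engine.inject(0, 15, "a")
print(engine.run())  # {'a': 15}
engine.load_topology(mesh_table(4))  # same engine, new topology
engine.inject(0, 15, "a")
print(engine.run())  # {'a': 6}
```

The engine itself never changes between the two runs; only the data describing the topology does, which is the property that lets one engine stand in for many NoCs.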

Writing efficient, high-performance parallel programs is a major challenge to the adoption of many-core architectures. Communication consumes a significant fraction of on-chip resources and can become a bottleneck when scaling programs. Using the Intel Single-chip Cloud Computer (SCC), we are exploring the impact of programming models and algorithms on communication. This chip can be programmed using message passing; various communication libraries and primitives are being developed to leverage its on-chip network and exploit the computational capacity of the device.
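As an example of how the choice of algorithm shapes communication, consider broadcasting data from one core to all 48 cores of the Single-chip Cloud Computer using only point-to-point messages. The sketch below is a hypothetical schedule generator (binomial_broadcast is an illustrative name, not part of any SCC library) for a binomial-tree broadcast, assuming each core can send one message per round and ignoring mesh distance and buffer contention. A naive loop at the root serializes 47 sends; the tree still sends 47 messages but completes in ceil(log2 48) = 6 rounds.

```python
def binomial_broadcast(num_cores, root=0):
    """Schedule a one-to-all broadcast using only point-to-point messages.

    Returns a list of rounds; each round is a list of (sender, receiver)
    pairs that can proceed in parallel. Core ids are taken relative to the
    root, and in every round each core that already holds the data forwards
    it to the core `informed` positions away, doubling the informed set.
    """
    rounds = []
    informed = 1  # relative ids 0 .. informed-1 already hold the data
    while informed < num_cores:
        pairs = []
        for rel in range(informed):
            dst = rel + informed
            if dst < num_cores:
                pairs.append(((root + rel) % num_cores, (root + dst) % num_cores))
        rounds.append(pairs)
        informed *= 2
    return rounds


schedule = binomial_broadcast(48)
print(len(schedule))                  # 6 rounds instead of 47 serialized sends
print(sum(len(r) for r in schedule))  # 47 messages in total
```

The total message count is unchanged; what the tree buys is parallelism, since later rounds spread the sending work across many cores instead of the root alone.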

Semantically-rich Interconnection Networks
Today's on-chip interconnection networks are largely oblivious to the needs of the components they connect; they serve the sole purpose of shuffling bits around the die. In this project, we propose to embed additional functionality in the interconnection network. In particular, we are providing hardware support within the network for communication primitives such as multicasting, which is used in cache coherence protocols and leveraged by programming models such as MPI. By handling these communication primitives more efficiently in hardware, we improve performance and reduce the dynamic power consumption of the on-chip network. Furthermore, we are exploring on-chip network designs that can effectively match the demands of different applications.
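One way to see why in-network multicast can reduce dynamic power is to count link traversals, a rough proxy for network dynamic energy. The sketch below assumes a 2D mesh with dimension-order (XY) routing and routers that replicate a flit where routes to different destinations diverge; the function names are hypothetical and the model is not the router design used in this project. Because the XY routes from a single source form a tree, counting the distinct links they use gives the traversals of a tree-based multicast, while sending one unicast copy per destination pays for every route in full.

```python
def xy_route_links(src, dst):
    """Directed links on the dimension-order (XY) route from src to dst.

    Nodes are (x, y) mesh coordinates; the route corrects X first, then Y.
    """
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src, dests):
    # The source injects one copy per destination, so every route is paid in full.
    return sum(len(xy_route_links(src, d)) for d in dests)


def tree_multicast_link_traversals(src, dests):
    # Routers replicate the flit where XY routes diverge, so a link shared
    # by several destinations carries the flit only once.
    links = set()
    for d in dests:
        links.update(xy_route_links(src, d))
    return len(links)


src = (0, 0)
dests = [(3, 1), (3, 2), (3, 3)]
print(unicast_link_traversals(src, dests))         # 15
print(tree_multicast_link_traversals(src, dests))  # 6
```

The savings grow with the amount of route sharing, which is why destinations clustered along a common path (as in many coherence multicasts to sharers) benefit most.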
On-Chip Network Support for Server Consolidation and Workload Isolation
As architectures scale to many cores, it becomes increasingly difficult to scale individual programs to fully utilize the available cores. As a result, multiple workloads are being consolidated onto a single chip to maximize utilization. Existing routing algorithms, both deterministic and adaptive, largely overlook the issues associated with workload consolidation. Ideally, the performance of each application should be the same whether it runs in isolation or is co-scheduled with other applications. Significant research has focused on maintaining isolation and effectively sharing on-chip resources such as caches and memory controllers. We recently proposed DBAR, a destination-based adaptive routing scheme. DBAR dynamically filters network congestion information to prevent the traffic patterns and congestion of one workload from impacting the routing decisions of a separate workload.
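The effect of destination-based filtering can be sketched in a few lines. This is a simplified illustration, not DBAR's actual mechanism: it assumes a k x k mesh, a per-router congestion estimate already available at the deciding router, and an additive cost, and all function names are hypothetical. For a minimal route that may go either in X or in Y, the filtered version only counts congestion at routers between the current router and the destination's coordinate, while the unfiltered baseline counts every router up to the mesh edge and can therefore react to congestion created by another workload beyond the destination.

```python
def nodes_between(cur, dst, axis, k):
    # Destination-based filter: only routers after cur, up to dst's
    # coordinate on this axis, are counted (k is unused here; it keeps
    # the same signature as nodes_to_edge).
    step = 1 if dst[axis] > cur[axis] else -1
    pos, nodes = list(cur), []
    while pos[axis] != dst[axis]:
        pos[axis] += step
        nodes.append(tuple(pos))
    return nodes


def nodes_to_edge(cur, dst, axis, k):
    # Unfiltered baseline: every router up to the mesh edge in dst's direction.
    step = 1 if dst[axis] > cur[axis] else -1
    pos, nodes = list(cur), []
    while 0 <= pos[axis] + step < k:
        pos[axis] += step
        nodes.append(tuple(pos))
    return nodes


def choose_output(cur, dst, congestion, k, nodes_fn):
    """Pick 'x' or 'y' for a minimal route on a k x k mesh ('local' at dst)."""
    if cur == dst:
        return "local"
    if cur[0] == dst[0]:
        return "y"  # only one minimal direction remains
    if cur[1] == dst[1]:
        return "x"
    cost = {
        name: sum(congestion.get(n, 0) for n in nodes_fn(cur, dst, axis, k))
        for name, axis in (("x", 0), ("y", 1))
    }
    return "x" if cost["x"] <= cost["y"] else "y"


# Workload A routes (0,0) -> (2,2); workload B congests routers (5,0)..(7,0).
congestion = {(5, 0): 5, (6, 0): 5, (7, 0): 5, (0, 1): 1}
print(choose_output((0, 0), (2, 2), congestion, 8, nodes_to_edge))  # -> y
print(choose_output((0, 0), (2, 2), congestion, 8, nodes_between))  # -> x
```

In the example, workload B's congestion lies beyond workload A's destination column, so the filtered decision ignores it and takes the uncongested X path, whereas the unfiltered decision detours into Y because of traffic A would never encounter.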
Additional Projects
In addition to these projects, I am currently recruiting outstanding Master's and PhD students to work on new projects. These projects explore various aspects of on-chip network design and optimization. If you have applied to graduate school at the University of Toronto and feel your interests align with mine, please email me. You are more likely to receive a response if you can demonstrate that you have read at least one of my papers. E-mails that address me as "Dear Sir:" will be ignored. Bonus points if you figure out that my full last name is "Enright Jerger".
We are grateful for the funding and in-kind contributions for these projects provided by the following: Natural Sciences and Engineering Research Council (NSERC), Connaught Foundation, University of Toronto, Canada Foundation for Innovation (CFI), Intel, Qualcomm and AMD.
Last updated: March 2011
