Abstract
The performance of any computer system is a function of the efficiency
of its architecture, which determines how many basic operations are
required to execute a given program, and the efficiency of its
implementation, which determines how much time each of those basic
operations takes. Reconfigurable computing systems can provide
extremely efficient architectures by implementing application-specific
operations in reconfigurable hardware, but this architectural efficiency
comes at a significant cost in operation latency and other metrics of
hardware efficiency. This gap in implementation efficiency can be seen
in the disparity between the clock rates of current microprocessors (2+
GHz) and FPGA-based systems (typically on the order of 200 MHz), and
will continue to grow as wire delays become an ever larger
component of overall cycle times unless changes are made to the
architectures of reconfigurable systems.
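The trade-off described above can be made concrete with a rough back-of-the-envelope sketch. It treats execution time as the number of basic operations multiplied by the time each one takes, and uses the clock rates quoted above (2 GHz and 200 MHz). The function names, the one-operation-per-cycle simplification, and the operation counts are illustrative assumptions, not figures from the talk.

```python
def execution_time(op_count: float, clock_hz: float, cycles_per_op: float = 1.0) -> float:
    """Execution time in seconds: operations x cycles per operation / clock rate.

    cycles_per_op defaults to 1.0, which is a simplifying assumption.
    """
    return op_count * cycles_per_op / clock_hz


def break_even_op_ratio(fast_clock_hz: float, slow_clock_hz: float) -> float:
    """Factor by which the slower-clocked system must reduce its operation
    count just to match the faster-clocked one (same cycles per operation)."""
    return fast_clock_hz / slow_clock_hz


CPU_HZ = 2e9     # microprocessor clock rate quoted in the abstract
FPGA_HZ = 200e6  # typical FPGA-based system clock rate quoted in the abstract

# The reconfigurable system must need this many times fewer operations to break even.
ratio = break_even_op_ratio(CPU_HZ, FPGA_HZ)

# Hypothetical program: 1e9 operations on the microprocessor versus
# 1e8 application-specific operations on the reconfigurable hardware.
cpu_time = execution_time(1e9, CPU_HZ)
fpga_time = execution_time(1e8, FPGA_HZ)
```

Under these assumptions, the reconfigurable system only comes out ahead once its application-specific operations reduce the operation count by more than the clock-rate ratio, which is why raising reconfigurable clock rates matters.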
In this talk, I present the architecture of the Amalgam clustered
programmable-reconfigurable processor, which was designed to allow both
its programmable and reconfigurable components to operate at high clock
rates when implemented in future fabrication processes. An Amalgam
processor consists of four programmable and four reconfigurable clusters
that communicate with each other and the memory system via an on-chip
network. Amalgam's reconfigurable clusters limit wire lengths by
dividing their reconfigurable logic into four segments that are
interleaved with portions of the cluster's register file, and support
pipelining of long wire delays across multiple clock cycles. A
register-based inter-cluster communication mechanism allows fast
transfers of data between clusters, while queues on each cluster's
network interface reduce synchronization overhead.
In far-future fabrication processes, Amalgam's support for pipelining in
its reconfigurable clusters allows them to operate at clock rates up to
70% higher than unpipelined versions of the reconfigurable clusters.
This increase in reconfigurable cluster clock rates, combined with
Amalgam's support for fine-grained parallelism and low synchronization
overheads, allows Amalgam to maintain a 2.5x performance advantage over
an 8-processor CMP across a wide range of fabrication processes. Amalgam's
reconfigurable cluster design also greatly simplifies the process of
mapping algorithms onto pipelined reconfigurable logic by allowing the
pipeline depth of the algorithm to be determined independently of the
pipeline depth of the hardware, overcoming a difficulty seen in previous
pipelined reconfigurable logic architectures.
Biography
Nicholas Carter has been an Assistant Professor at the University of Illinois at Urbana-Champaign since 1999. Prior to that, he was a graduate student at the Massachusetts Institute of Technology, where he was the memory system architect on the M-Machine project. His research interests focus on reconfigurable computing and computing using non-silicon devices.