The NUMAchine Multiprocessor
Z. Vranesic, S. Brown, M. Stumm, S. Caranci, A. Grbic, R. Grindley,
M. Gusat, O. Krieger, G. Lemieux, K. Loveless, N. Manjikian, Z. Zilic,
T. Abdelrahman, B. Gamsa, P. Pereira, K. Sevcik, A. Elkateeb, S. Srbljic
Department of Electrical and Computer Engineering
Department of Computer Science
University of Toronto
Toronto, Ontario, Canada M5S 1A4
Wed Jun 28 18:35:07 EDT 1995
Abstract:
NUMAchine is a cache-coherent shared-memory multiprocessor designed to
be high-performance, cost-effective, modular, and easy to program for
efficient parallel
execution. Processors, caches, and memory are distributed across a
number of stations interconnected by a hierarchy of unidirectional
bit-parallel rings. The simplicity of the interconnection network
permits the use of wide datapaths at each node, and a novel scheme for
routing packets between stations enables high-speed operation of the
rings in order to reduce latency. The ring hierarchy provides useful
features, such as efficient multicasting and order-preserving message
transfers, which are exploited by the cache coherence protocol for
low-latency invalidation of shared data. The hardware is
designed so that cache coherence traffic is restricted to
localized sections of the machine whenever possible. NUMAchine is optimized
for applications with good locality, and
system software is designed to maximize locality. Results from detailed
behavioral simulations to evaluate architectural tradeoffs indicate
that a prototype implementation will perform well for a variety of
parallel applications.
Stephen D. Brown
Wed Jun 28 18:34:27 EDT 1995