The NUMAchine Multiprocessor


Z. Vranesic, S. Brown, M. Stumm, S. Caranci, A. Grbic, R. Grindley,
M. Gusat, O. Krieger, G. Lemieux, K. Loveless, N. Manjikian, Z. Zilic,
T. Abdelrahman, B. Gamsa, P. Pereira, K. Sevcik, A. Elkateeb, S. Srbljic

Department of Electrical and Computer Engineering
Department of Computer Science
University of Toronto
Toronto, Ontario, Canada M5S 1A4

Wed Jun 28 18:35:07 EDT 1995


NUMAchine is a cache-coherent shared-memory multiprocessor designed to be high-performance, cost-effective, modular, and easy to program for efficient parallel execution. Processors, caches, and memory are distributed across a number of stations interconnected by a hierarchy of unidirectional bit-parallel rings. The simplicity of the interconnection network permits the use of wide datapaths at each node, and a novel scheme for routing packets between stations enables high-speed operation of the rings, which reduces latency. The ring hierarchy provides useful features, such as efficient multicasting and order-preserving message transfers, which the cache coherence protocol exploits for low-latency invalidation of shared data. The hardware is designed so that cache coherence traffic is restricted to localized sections of the machine whenever possible. NUMAchine is optimized for applications with good locality, and the system software is designed to maximize locality. Results from detailed behavioral simulations, carried out to evaluate architectural tradeoffs, indicate that a prototype implementation will perform well for a variety of parallel applications.

Stephen D. Brown
Wed Jun 28 18:34:27 EDT 1995