As seen from a processor within a station, the NUMAchine memory hierarchy has four levels. The closest level is the processor's primary on-chip cache. Next comes its external secondary SRAM cache. The third level is the DRAM located in the same station. This level comprises the memory module(s) holding the physical address range assigned to the station, and the station's network cache, which caches data whose home memory is in a remote station. The final level consists of the memory modules in all remote stations.
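To make the four levels concrete, the following minimal sketch reports which level would supply a given cache line to a processor. It rests on several assumptions that are not part of the NUMAchine description: each station is taken to own one contiguous physical address range of size `STATION_RANGE`, the line size `LINE_SIZE` is arbitrary, each cache is modelled as a plain set of line numbers, and coherence state is ignored. The helper names `home_station` and `resolve_level` are hypothetical.

```python
from __future__ import annotations

# Hypothetical parameters chosen only for illustration; they are not
# taken from the NUMAchine prototype.
LINE_SIZE = 64            # bytes per cache line
STATION_RANGE = 1 << 28   # bytes of physical address space owned by each station


def home_station(addr: int) -> int:
    """Assumed mapping: each station owns one contiguous address range."""
    return addr // STATION_RANGE


def resolve_level(addr: int, station: int,
                  primary: set[int], secondary: set[int],
                  network_cache: set[int]) -> tuple[int, str]:
    """Return (level, source) for the closest level holding addr's line.

    Each cache is a set of resident line numbers. Coherence is ignored:
    a line is assumed to be valid wherever it is found.
    """
    line = addr // LINE_SIZE
    if line in primary:
        return 1, "primary cache"
    if line in secondary:
        return 2, "secondary cache"
    home = home_station(addr)
    if home == station:
        # Level 3, local half: this station is the line's home.
        return 3, "local memory module"
    if line in network_cache:
        # Level 3, remote half: a copy of remote data held in this station.
        return 3, "network cache"
    return 4, f"memory module in station {home}"


# Usage: a processor in station 0 touching a line homed in station 2.
remote_addr = 2 * STATION_RANGE + 0x40
print(resolve_level(remote_addr, 0, set(), set(), set()))
# (4, 'memory module in station 2')
print(resolve_level(remote_addr, 0, set(), set(), {remote_addr // LINE_SIZE}))
# (3, 'network cache')
```

The sketch checks the local home memory before the network cache because, as described above, the network cache holds only data whose home is in a remote station, so a locally homed line would never be found there.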
Within each station, processor modules share a centralized memory over the station bus. This arrangement centralizes the cache coherence mechanisms within a station, which simplifies the design of the memory system. Furthermore, separating the processors from the memory allows processor technology to be upgraded without affecting the rest of the system.
Each station's network cache serves two related purposes: it caches data whose home memory is in a remote station, and it keeps coherence operations on that remote data within the station, as far as the coherence protocol allows. In addition, the network cache reduces network traffic by serving as a target for multicasts of remote data and by combining multiple outstanding requests from the station for the same remote cache line. For simplicity, the network cache in our prototype machine is direct-mapped. Its design does not enforce inclusion of the data held in the station's processor caches. However, because the network cache is at least as large as the combined secondary caches of the station's processors, inclusion will usually hold in practice.
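The direct-mapped organization and the combining of outstanding requests can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the prototype's design: the class `NetworkCache`, its `request` and `fill` methods, and the 64-byte `LINE_SIZE` are hypothetical, and coherence states, multicast delivery and write-backs are deliberately omitted. A miss to a remote line sends one network request; later misses to the same line while that request is outstanding are combined with it instead of generating more traffic.

```python
from __future__ import annotations

LINE_SIZE = 64  # hypothetical line size in bytes


class NetworkCache:
    """Toy direct-mapped network cache with request combining.

    Models only residency and outstanding requests; coherence states,
    multicasts and write-backs are not modelled.
    """

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        # Direct-mapped: exactly one tag slot per set.
        self.tags: list[int | None] = [None] * num_sets
        # Remote line number -> requesters waiting on the single
        # outstanding network request for that line.
        self.pending: dict[int, list[str]] = {}

    def _index_tag(self, line: int) -> tuple[int, int]:
        return line % self.num_sets, line // self.num_sets

    def request(self, addr: int, requester: str) -> str:
        """Handle a request from within the station for a remote line."""
        line = addr // LINE_SIZE
        index, tag = self._index_tag(line)
        if self.tags[index] == tag:
            return "hit"
        if line in self.pending:
            # Combine with the request already in flight for this line.
            self.pending[line].append(requester)
            return "combined"
        self.pending[line] = [requester]
        return "sent"  # exactly one request goes out on the network

    def fill(self, addr: int) -> list[str]:
        """Install a returned line; return the requesters to be answered."""
        line = addr // LINE_SIZE
        index, tag = self._index_tag(line)
        # Overwrites any conflicting line in this set. Processor caches are
        # not invalidated here, since inclusion is not enforced.
        self.tags[index] = tag
        return self.pending.pop(line, [])


nc = NetworkCache(num_sets=4)
print(nc.request(0x1000, "P0"))  # sent
print(nc.request(0x1008, "P1"))  # combined (same 64-byte line)
print(nc.fill(0x1000))           # ['P0', 'P1']
print(nc.request(0x1000, "P2"))  # hit
```

Because the cache is direct-mapped, a later fill of a line such as `0x1100` (which maps to the same set in this toy configuration) silently evicts `0x1000`. This is why inclusion can only be expected "usually" rather than guaranteed, even when the network cache is large.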