The NUMAchine rings connect a number of nodes with unidirectional links that operate synchronously using a slotted-ring protocol. Each slot carries one packet and advances from node to node every clock cycle. The ring interface at each node contains a bidirectional link to a station or to another ring. To place a packet onto the ring, the ring interface waits for an empty slot. After removing a packet from the ring, the ring interface sends an empty slot to the next node.
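The slot discipline described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of the NUMAchine hardware: the function name `simulate_ring`, the `(destination, payload)` packet tuple, and the per-node send queues are hypothetical and chosen only for illustration. It models one slot per node, insertion only into an empty slot, and the rule that a slot freed by a removal is passed on empty rather than refilled in the same cycle.

```python
from collections import deque


def simulate_ring(num_nodes, pending, max_cycles=1000):
    """Simulate a unidirectional slotted ring (illustrative model only).

    pending maps a node index to a list of (destination, payload) tuples
    waiting to be sent. Returns a list of (cycle, destination, payload)
    tuples in delivery order.
    """
    slots = [None] * num_nodes  # slots[i] is the slot currently at node i
    queues = {node: deque(items) for node, items in pending.items()}
    delivered = []

    for cycle in range(max_cycles):
        for node in range(num_nodes):
            packet = slots[node]
            if packet is not None and packet[0] == node:
                # Remove the packet; the freed slot goes to the next node empty.
                delivered.append((cycle, node, packet[1]))
                slots[node] = None
            elif packet is None and queues.get(node):
                # An empty slot has arrived, so a waiting packet may be placed.
                slots[node] = queues[node].popleft()

        # Stop once nothing is in flight and nothing is waiting.
        if all(s is None for s in slots) and not any(queues.values()):
            break

        # Every slot advances one node per clock cycle.
        slots = [slots[-1]] + slots[:-1]

    return delivered
```

For example, `simulate_ring(4, {0: [(2, "a"), (2, "b")]})` places the two packets in consecutive slots at node 0 and delivers them to node 2 on consecutive cycles.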
Packets are used to transfer requests and responses between stations. A single transfer may consist of one or more packets, and may be of several types: cached and uncached reads and writes, multicasts, block transfers, invalidation and intervention requests, interrupts, and negative acknowledgements. All data transfers that do not include the contents of a cache line or a block require only a single packet. Cache line and block transfers require multiple packets. Since these packets are not necessarily in consecutive slots, they are assigned an identifier to enable reassembling the cache lines or blocks at the destination station.
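Reassembly of a multi-packet transfer by identifier can be sketched as follows. The text states only that the packets of a transfer share an identifier; the `index` and `count` fields, the `Packet` and `Reassembler` names, and the byte-string payloads are assumptions introduced here for illustration and do not describe the actual NUMAchine packet format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    transfer_id: int  # identifier shared by all packets of one transfer
    index: int        # hypothetical position of this packet in the transfer
    count: int        # hypothetical total number of packets in the transfer
    payload: bytes


class Reassembler:
    """Collect packets per transfer identifier until a transfer is complete."""

    def __init__(self):
        self._partial = {}  # transfer_id -> {index: payload}

    def receive(self, packet):
        """Return the reassembled data once complete, otherwise None."""
        parts = self._partial.setdefault(packet.transfer_id, {})
        parts[packet.index] = packet.payload
        if len(parts) < packet.count:
            return None
        del self._partial[packet.transfer_id]
        return b"".join(parts[i] for i in range(packet.count))
```

Because partial transfers are keyed by identifier, packets from different transfers may arrive interleaved, which matches the observation that the packets of one transfer need not occupy consecutive slots.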
The routing of packets through the NUMAchine ring hierarchy begins and ends at stations in the lowest level of the ring hierarchy. The unidirectional ring topology guarantees a unique routing path between any two stations. Station addresses are specified in packets by means of routing masks. Each level in the hierarchy has a corresponding bit field in the routing mask, and the number of bits in each field equals the number of links to the lower level. For example, a two-level system consisting of a central ring connected to 4 local rings, with each local ring connected to 4 stations, requires two 4-bit fields in the routing mask: one field specifies a particular ring, and the other specifies a station on that ring. The routing of packets through the levels of the hierarchy is determined by setting bits in the appropriate fields of the routing mask. Since a single field is used for each level of the hierarchy, the number of bits needed for routing grows logarithmically with the size of the system when the number of links per ring is fixed; the example above needs 8 bits for 16 stations, and adding a third level of the same width would need 12 bits for 64 stations. In addition to specifying the path of packets through the ring hierarchy, the routing masks are also used in maintaining status information needed for the cache coherence protocol; the routing bits identify the locations which may have a copy of each cache line. The small size of the routing mask limits the storage cost for this status information.
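The two-level example can be made concrete with a short sketch. The bit layout used here (ring field in the upper 4 bits, station field in the lower 4 bits) and the names `make_mask`, `format_mask`, and `mask_bits` are assumptions for illustration; the text does not specify how the fields are ordered within the mask.

```python
FANOUT = 4  # links to the lower level per ring, as in the example above


def make_mask(ring, station):
    """Build a point-to-point routing mask with one bit set per field."""
    if not (0 <= ring < FANOUT and 0 <= station < FANOUT):
        raise ValueError("ring and station must be in range(FANOUT)")
    ring_field = 1 << ring
    station_field = 1 << station
    # Assumed layout: ring field in the high bits, station field in the low bits.
    return (ring_field << FANOUT) | station_field


def format_mask(mask):
    """Render the mask as 'ring-field station-field' in binary."""
    low = mask & ((1 << FANOUT) - 1)
    high = mask >> FANOUT
    return f"{high:0{FANOUT}b} {low:0{FANOUT}b}"


def mask_bits(levels, fanout=FANOUT):
    """Mask width: one fanout-bit field per level of the hierarchy."""
    return levels * fanout


print(format_mask(make_mask(2, 3)))  # station 3 on ring 2
for levels in (2, 3, 4):
    # Stations grow as fanout**levels while the mask grows as fanout*levels.
    print(FANOUT ** levels, "stations need", mask_bits(levels), "mask bits")
```

The loop contrasts the number of stations with the mask width, which is the sense in which the mask grows logarithmically with system size for a fixed field width.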
Figure 3: An example of an inexact routing mask.
When only one bit is set in each field of the routing mask, it uniquely identifies a single station for point-to-point communication. Multicast communication to more than one station is enabled by OR-ing bit masks for multiple destinations. As a result, more than one bit may be set in each field. Since a single field is used for each level, rather than individual fields for each ring at a given level, setting more than one bit per field may specify more stations than actually required. This is illustrated in Figure 3, which shows that when the bitmasks that specify station 0 on ring 0 and station 1 on ring 1 are OR'd, station 1 on ring 0 and station 0 on ring 1 also receive the message. The imprecise nature of the routing bits results in some packets being routed to more stations than necessary. However, the extra traffic generated under normal conditions (i.e., where data locality exists) is small, and it represents a good tradeoff for the savings obtained. These savings are in the number of bits needed per packet and, more importantly, in the number of coherence status bits needed per cache line.
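The inexactness shown in Figure 3 can be reproduced in a few lines. This is a sketch under assumed conventions: the 8-bit layout (ring field in the upper 4 bits, station field in the lower 4 bits) and the helper names `make_mask` and `addressed_stations` are hypothetical and are not taken from the NUMAchine design.

```python
FIELD_BITS = 4  # width of each field in the two-level example


def make_mask(ring, station):
    """One-hot ring field in the high bits, one-hot station field in the low bits."""
    return ((1 << ring) << FIELD_BITS) | (1 << station)


def addressed_stations(mask):
    """Return every (ring, station) pair selected by the mask."""
    ring_field = mask >> FIELD_BITS
    station_field = mask & ((1 << FIELD_BITS) - 1)
    return {
        (ring, station)
        for ring in range(FIELD_BITS)
        for station in range(FIELD_BITS)
        if (ring_field >> ring) & 1 and (station_field >> station) & 1
    }


wanted = {(0, 0), (1, 1)}  # station 0 on ring 0 and station 1 on ring 1
mask = 0
for ring, station in wanted:
    mask |= make_mask(ring, station)  # multicast mask is the OR of the two

reached = addressed_stations(mask)
extra = reached - wanted  # stations addressed although not requested
print(sorted(reached))
print(sorted(extra))
```

Because the mask selects the cross product of the set ring bits and the set station bits, the two unintended stations (station 1 on ring 0 and station 0 on ring 1) appear in `extra`.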
The rules for routing packets in the ring hierarchy using the routing mask are simple. An ascending packet has at least one bit set in the field corresponding to the next higher level, and ring interfaces to higher-level rings always switch these packets up to the next level. Once the highest level specified by the routing mask is reached, the packet must descend. At each ring interface connected to a lower level of the hierarchy, the packet is switched down to the lower level if the bit corresponding to the downward link is set in the routing mask. A copy of the packet may also be passed to the next ring interface at the same level if more than one bit is set in the same field. When a packet is switched downward to a lower level, all bits in the higher-level field are cleared. The simplicity of this scheme permits a high-speed implementation, since only one field of the routing mask is involved in the routing decision at each ring interface.
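The routing rules can be sketched for the two-level case as follows. Several details are assumptions made for illustration and are not stated in the text: the 8-bit layout (ring field in the upper 4 bits, station field in the lower 4 bits), the convention that an all-zero ring field means the packet stays on its local ring, and the function name `route`.

```python
FIELD_BITS = 4
FIELD_MASK = (1 << FIELD_BITS) - 1


def route(mask, source_ring):
    """Return the set of (ring, station) pairs that receive the packet."""
    ring_field = (mask >> FIELD_BITS) & FIELD_MASK

    if ring_field == 0:
        # Assumed convention: no higher-level bit set, so the packet
        # never ascends and is handled on its local ring.
        target_rings = [source_ring]
    else:
        # Ascending packet: switched up to the central ring, then switched
        # down at each ring interface whose bit is set in the ring field.
        target_rings = [r for r in range(FIELD_BITS) if (ring_field >> r) & 1]

    delivered = set()
    for ring in target_rings:
        # Switching down clears the higher-level field, so only the
        # station field takes part in the decision on the local ring.
        local_mask = mask & FIELD_MASK
        for station in range(FIELD_BITS):
            if (local_mask >> station) & 1:
                delivered.add((ring, station))
    return delivered


print(sorted(route(0b00110011, source_ring=2)))  # multicast via the central ring
print(sorted(route(0b00000100, source_ring=3)))  # stays on local ring 3
```

Each step inspects a single field of the mask, which mirrors the observation that only one field is involved in the routing decision at each ring interface.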
Figure 4: Two-level NUMAchine cache coherence protocol.