Consider the case where many processors are spinning on a data element (e.g., a eureka variable) and some processor writes that data. With a write-invalidate protocol, when the processor modifies the data, all the shared copies are invalidated. Hence, data accessed in this fashion incurs both a large latency to make the modification and contention at the memory module when the spinning processors obtain a new copy. With the above control, software can instead temporarily bypass the hardware coherence, modifying shared data and multicasting it to the affected network caches without first invalidating the shared copies.
In particular, the system software interacts with the hardware to: 1) obtain the routing mask of network caches at stations caching the data, 2) lock the cache line to ensure that additional stations are not granted access to it, 3) modify the state of the cache line in the secondary cache to dirty, 4) modify the contents of the cache line in the secondary cache, and 5) multicast the cache line using the routing mask obtained earlier. When the update arrives at a network cache, the network cache invalidates any copies in local secondary caches. When the update arrives at memory, the cache line is unlocked.
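The five steps above can be sketched as a small software model. This is only an illustrative simulation, not the actual system software or hardware interface: the class names (`NetworkCache`, `SecondaryCache`, `HomeMemory`), the function `software_multicast_update`, and the representation of the routing mask as a set of station numbers are all hypothetical. The sketch also assumes that memory takes the new data when the update arrives; the text states only that the line is unlocked at that point.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class NetworkCache:
    """Hypothetical per-station network cache, reduced to one cache line."""
    value: Optional[int] = None
    local_copies_valid: bool = False  # copies held in local secondary caches

    def receive_update(self, value: int) -> None:
        # On arrival of an update, take the new data and invalidate
        # any copies in the station's local secondary caches.
        self.value = value
        self.local_copies_valid = False


@dataclass
class SecondaryCache:
    """Hypothetical model of the writing processor's secondary cache."""
    value: Optional[int] = None
    state: str = "shared"


@dataclass
class HomeMemory:
    """Hypothetical home memory module for the line."""
    value: int = 0
    locked: bool = False
    routing_mask: Set[int] = field(default_factory=set)  # stations caching the line

    def receive_update(self, value: int) -> None:
        # Assumption: memory stores the new data on arrival. The text
        # only says the cache line is unlocked at this point.
        self.value = value
        self.locked = False


def software_multicast_update(
    memory: HomeMemory,
    writer: SecondaryCache,
    network_caches: Dict[int, NetworkCache],
    new_value: int,
) -> Set[int]:
    mask = set(memory.routing_mask)   # 1) obtain the routing mask
    memory.locked = True              # 2) lock the line against new sharers
    writer.state = "dirty"            # 3) mark the line dirty in the secondary cache
    writer.value = new_value          # 4) modify the line's contents
    for station in sorted(mask):      # 5) multicast using the saved mask
        network_caches[station].receive_update(new_value)
    memory.receive_update(new_value)  # arrival at memory unlocks the line
    return mask
```

In this model, calling `software_multicast_update` with a routing mask of `{0, 2}` updates only the network caches at stations 0 and 2 and leaves any other station untouched, which mirrors the point of the scheme: sharers receive the new data directly instead of being invalidated and then contending at the memory module to re-fetch it.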