Performance of the Coherence Protocol




As mentioned at the beginning of this section, the coherence protocol was designed under the assumption that certain cases would occur only infrequently. To assess the validity of that assumption, we measure the frequency of those cases here.

The first case involves the inexactness of the filter mask. An "old" write permission request that has been travelling through the network for some time can reach memory after earlier requests have already invalidated the requester's shared copy. Because the filter mask is not precise, the memory module may erroneously believe that the requester still has a shared copy, in which case it responds with only an acknowledgement, granting ownership to the requester. The requester then sees the ownership grant but does not have valid data, so it must send a special write request to memory indicating that data must be returned. This scheme is an optimistic design, in that a memory module always assumes that a requester has correct data in spite of the ambiguous directory information. The alternative would be for the memory module to assume that the requester does not have valid data whenever such ambiguity arises; data would then always be sent, which is wasteful unless that data is almost always needed. Our simulation results indicate that the optimistic choice is the right one: across all the applications and all system geometries (representing hundreds of millions of requests to memory), only 4 such special requests were ever sent. This result is a manifestation of the well-known property of multiprocessor systems that a given cache line is almost always shared by one, two, or all processors, and very rarely by some number in between; the chance that three stations share a line in just the right way for the optimistic assumption to fail is small.
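To make the filter-mask ambiguity concrete, the Python sketch below models an imprecise mask together with the optimistic memory response. The encoding is an assumption for illustration only: each station is named by a hypothetical (ring, slot) pair, and the mask keeps one bit-set of rings and one of slots, so it matches every pair in their cross product. All class and function names are hypothetical and are not taken from the actual hardware.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FilterMask:
    """Hypothetical imprecise directory mask: one bit-set of rings and one
    bit-set of slots. Membership is their cross product, so it can report
    stations that were never added."""
    rings: int = 0
    slots: int = 0

    def add(self, ring: int, slot: int) -> None:
        self.rings |= 1 << ring
        self.slots |= 1 << slot

    def may_contain(self, ring: int, slot: int) -> bool:
        return bool((self.rings >> ring) & 1) and bool((self.slots >> slot) & 1)


def memory_response(mask: FilterMask, requester: Tuple[int, int]) -> str:
    """Optimistic policy: if the mask says the requester may still hold a
    shared copy, grant ownership with an acknowledgement only."""
    return "ack_only" if mask.may_contain(*requester) else "ack_with_data"


def requester_action(response: str, has_valid_copy: bool) -> str:
    """Ownership without data and without a valid local copy: the requester
    must issue the special request asking memory to return the data."""
    if response == "ack_only" and not has_valid_copy:
        return "send_special_request"
    return "done"


if __name__ == "__main__":
    # Station (0, 0)'s shared copy was invalidated; the line is now shared
    # by stations (0, 1) and (1, 0). Their cross product covers (0, 0).
    mask = FilterMask()
    mask.add(0, 1)
    mask.add(1, 0)
    resp = memory_response(mask, (0, 0))
    print(resp, requester_action(resp, has_valid_copy=False))
    # prints: ack_only send_special_request
```

The failing case needs three stations: two current sharers whose combined mask happens to cover a third station whose copy was invalidated. If the line were shared by only one or two stations on the same ring, or by all stations, the mask would not introduce this false match for a requester that no longer holds the line.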

  
Table 3: Percentage of local requests to the NC that result in a false remote request being sent to memory.

The second case of interest arises from the direct-mapped organization of the network caches (NCs). The network cache can lose directory information when its entries are replaced by other requests. The most costly consequence of this design arises when data has been made dirty locally on a station, but this information is subsequently thrown out of the NC. A request for this line now misses and is sent to memory, which sends the request back, indicating that its filter mask shows that the local station already has the data, in LV state. At this point the NC performs the intervention that it could have done immediately if the directory information had not been lost. We call these types of misses false remote requests. Again, the simulations show that this case happens very infrequently. Table 3 gives the percentage of all local requests that end up generating false remote requests. Only for one application, FMM, does the percentage approach 1%.
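The following Python sketch, under simplifying assumptions, shows how a direct-mapped NC can lose directory information and turn what would have been a local intervention into a false remote request, and computes the same kind of percentage reported in Table 3. The names (`DirectMappedNC`, `false_remote_percentage`) and the "dirty"/"shared" labels are hypothetical; memory's knowledge is modeled as an exact set of lines held dirty on this station (the real memory uses its filter mask), and only one station is modeled.

```python
from typing import Dict, List, Optional, Tuple


class DirectMappedNC:
    """Hypothetical direct-mapped network cache holding per-line state."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.entries: Dict[int, Tuple[int, str]] = {}  # set -> (line, state)

    def lookup(self, line: int) -> Optional[str]:
        entry = self.entries.get(line % self.num_sets)
        return entry[1] if entry is not None and entry[0] == line else None

    def install(self, line: int, state: str) -> None:
        # Direct mapped: overwrites whatever line occupied this set,
        # discarding its directory information.
        self.entries[line % self.num_sets] = (line, state)


def false_remote_percentage(trace: List[Tuple[int, str]], num_sets: int) -> float:
    """trace: (line, op) pairs from one station, op in {"read", "write"}.
    Returns the percentage of local requests that become false remote
    requests. Memory's knowledge is modeled as an exact set of lines this
    station holds dirty (an assumption; other stations are not modeled)."""
    nc = DirectMappedNC(num_sets)
    dirty_here = set()
    false_remote = 0
    for line, op in trace:
        if nc.lookup(line) is None and line in dirty_here:
            # NC miss, but memory says the data is already on this station:
            # the request is bounced back and the NC intervenes locally.
            false_remote += 1
        if op == "write" or line in dirty_here:
            dirty_here.add(line)
            nc.install(line, "dirty")
        elif nc.lookup(line) is None:
            nc.install(line, "shared")  # ordinary remote fetch
    return 100.0 * false_remote / len(trace) if trace else 0.0


trace = [(1, "write"), (5, "read"), (1, "read")]
print(false_remote_percentage(trace, num_sets=4))  # 1 of 3 requests: about 33.3
```

With four sets, lines 1 and 5 collide in set 1, so installing line 5 discards the record that line 1 is dirty locally, and the next request for line 1 becomes a false remote request. With `num_sets=8` the two lines no longer collide and the same trace produces none, which illustrates why such misses are rare when the NC is reasonably large relative to the working set.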

Both of the above cases arise from a loss of information in the coherence protocol. (In one case it is imprecision in the directory bits; in the other it is the wholesale loss of all local directory information.) The conclusion is that full state and directory information is not necessary for an efficient cache coherence protocol. The cases in which the protocol chose simplicity over efficiency occur rarely enough that overall performance is not affected.






Stephen D. Brown
Wed Jun 28 18:34:27 EDT 1995