This document summarizes special circuitry that should exist
in other parts of NUMAchine to help improve monitoring on the
processor card.
Processor Support
- secondary cache is hooked up to support split Instr/Data caches
- provide copy of secondary cache signals:
- SCAddr[17,14:1]
- SCAddr[0] is not needed
- SCTag[24:22], tag state bits
- SCTag[18:0], tag upper address portion, is not yet required
(would like it, but I don't have enough pins right now). stay tuned.
- don't care about SCTag[21:19] (pidx)
- SCTCS*
- SCDCS*
- SCWr*
- provide copy of IvdAck* (see External Agent Support below)
- provide STATUS signals to monitor chips
- provide CORRECT clock to STATUS chips (screws up when processor logic clock changes), and probably another copy to COUNTER chips
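To make the SCTag split above concrete, here is a minimal sketch of how monitor-side software might pull the fields out of a captured tag value. The bit positions come from the list above. The helper names (field, decode_sctag) are hypothetical, and the model assumes SCTag is captured as a single 25-bit integer.

```python
# Field positions follow the list above; all names here are hypothetical.
STATE_SHIFT, STATE_WIDTH = 22, 3   # SCTag[24:22], tag state bits
PIDX_SHIFT, PIDX_WIDTH = 19, 3     # SCTag[21:19], pidx (not needed by the monitor)
ADDR_SHIFT, ADDR_WIDTH = 0, 19     # SCTag[18:0], tag upper address portion


def field(value, shift, width):
    """Return the width-bit field of value whose lowest bit is at position shift."""
    return (value >> shift) & ((1 << width) - 1)


def decode_sctag(sctag):
    """Split a captured SCTag value into the fields the monitor cares about.

    pidx is deliberately dropped, matching the "don't care" note above.
    """
    return {
        "state": field(sctag, STATE_SHIFT, STATE_WIDTH),
        "tag_addr": field(sctag, ADDR_SHIFT, ADDR_WIDTH),
    }
```

For example, decode_sctag((0b101 << 22) | 0x12345) yields state 5 and tag_addr 0x12345, whatever the pidx bits happen to be.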
External Agent Support
- accept PhaseID[3..0] value from monitor, add it into the address space
in bits Addr[52..49]
- provide local bus handshake lines
- sends LB_RDn, LB_WRn, LB_RDYn, LB_DTn, LB_DWn (double-word accesses)
- only LB_DWn needs to be valid for entire duration of the operation
- maybe LB_RDn and LB_WRn should be valid for entire duration too, in
case the external agent wants to cancel a read/write (e.g., due to processor
reset)
- receives LB_Validn, LB_Busyn
- uncached (word and dword) writes to local bus from OTHER PROCESSORS must be
handled as LOCAL BUS WRITES by the EXTERNAL AGENT. The EA state machine reads
the header (and data) info out of the FIFOs and hands them over as a LB access.
(this is for inter-CPU interrupts, barriers, and writing to (ie, zeroing)
monitoring registers, mostly via multicast)
- provide copy of IvdAck* (which goes to processor, see Processor Support above)
- provide copy of FIFO handshake signals, so I can infer which
FIFO transactions are occurring, e.g.:
- are these correct?
- fifo_read
- fifo_write
- header -- asserted when reading/writing header of packet (end-of-data bit in CMD?)
- provide copy of NUMAchine CMD bits
- send outgoing 4-bit PhaseID from monitor
- return incoming 4-bit PhaseID to monitor
- support NACK counting -- give monitor either (TBD at a later date):
- NACK_COUNT bus (4 bits?), NACK_COUNT_VALID signal (asserted for 1 cycle)
- (preferred) NACK_RECEIVED signal (asserted for 1 cycle for each nack) and
ACK_RECEIVED signal (asserted for 1 cycle when ack response finally received)
- return to monitor the StateInfo from memory/network cache at the same
time as a (response) packet header is being placed from the FIFO into the EA latches:
- Global bit -- asserted when hits memory in global state
- Valid bit -- asserted when hits memory in valid state
- NC_Hit -- asserted when response comes from network cache
(no ring traffic was performed)
- provide bits for the monitor at the same time as a (response) packet header is being
placed from the FIFO into the EA latches:
- exclusive read -- asserted if the original request was a READ_EXCLUSIVE or UPGRADE
- remote read -- asserted if the request was to a remote (off-station)
home memory address
- provide to monitor a copy of the filter masks for all incoming invalidates.
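Two of the items above lend themselves to a small software model: placing PhaseID[3..0] into Addr[52..49], and the preferred NACK counting scheme (NACK_RECEIVED / ACK_RECEIVED pulses). The sketch below is only an illustration. The function names are hypothetical, overwriting any previous value in Addr[52..49] is an assumption, and the per-cycle tuple format fed to count_nacks is invented for the example.

```python
PHASE_SHIFT = 49   # Addr[49] holds PhaseID[0]
PHASE_MASK = 0xF   # PhaseID[3..0]


def tag_address(addr, phase_id):
    """Return addr with phase_id placed in Addr[52..49].

    Assumption: any value already in those bits is overwritten.
    """
    if not 0 <= phase_id <= PHASE_MASK:
        raise ValueError("PhaseID must fit in 4 bits")
    cleared = addr & ~(PHASE_MASK << PHASE_SHIFT)
    return cleared | (phase_id << PHASE_SHIFT)


def phase_of(addr):
    """Recover the 4-bit PhaseID from Addr[52..49]."""
    return (addr >> PHASE_SHIFT) & PHASE_MASK


def count_nacks(cycles):
    """Model the NACK_RECEIVED / ACK_RECEIVED option.

    cycles is an iterable of (nack_received, ack_received) pairs, one per
    clock. Yields the number of NACKs seen before each ACK.
    """
    nacks = 0
    for nack_received, ack_received in cycles:
        if nack_received:
            nacks += 1
        if ack_received:
            yield nacks
            nacks = 0
```

With this model, a transaction that is nacked twice and then acked produces a count of 2, and a transaction acked on the first try produces 0.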
Memory Card Support
- keep 4-bit PhaseID from processor request with each transaction
- return 3 bits of StateInfo in processor response packets:
- Global bit -- asserted when hits memory in global state
- Valid bit -- asserted when hits memory in valid state
- NC_Hit -- always zero
- place a MEMBUSY signal on the bus, so local processors know
that the memory is servicing a request. this MEMBUSY signal
should be maskable based upon number of REQ in FIFO or
number of PACKETS (+DATA) in FIFO. this can be used by local
processor card to count the number of times it goes to main
memory but the memory was busy, indicating a potentially longer
wait. note that with 2 memory cards there must be 2 MEMBUSY lines.
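A rough model of the MEMBUSY idea, for illustration only. It reads "maskable" as a programmable threshold on the number of requests in the FIFO, which is an assumption. The names membusy and count_busy_requests are hypothetical.

```python
def membusy(req_in_fifo, threshold):
    """MEMBUSY model: asserted once the FIFO holds at least threshold requests.

    Assumption: the mask behaves as a simple occupancy threshold.
    """
    return req_in_fifo >= threshold


def count_busy_requests(occupancies, threshold):
    """Count how many memory requests found the memory busy.

    occupancies holds the FIFO occupancy observed at the moment each
    request from the local processor was issued.
    """
    return sum(1 for occupancy in occupancies if membusy(occupancy, threshold))
```

With 2 memory cards, the processor card would keep one such count per MEMBUSY line.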
Network Cache Support
- keep 4-bit PhaseID from processor request with each transaction
- return 3 bits of StateInfo in processor response packets:
- Global bit -- asserted when hits memory in global state
- Valid bit -- asserted when hits memory in valid state
- NC_Hit -- asserted when transaction HITS in network cache
(no ring traffic was performed)
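The 3 bits of StateInfo could be decoded on the monitor side roughly as follows. The bit order is NOT specified in these notes; the sketch assumes bit 2 = Global, bit 1 = Valid, bit 0 = NC_Hit purely for illustration. The names StateInfo and decode_state_info are hypothetical.

```python
from collections import namedtuple

# Assumed bit order (not given in the notes): Global, Valid, NC_Hit from MSB to LSB.
GLOBAL_BIT = 0b100
VALID_BIT = 0b010
NC_HIT_BIT = 0b001

StateInfo = namedtuple("StateInfo", ["global_bit", "valid_bit", "nc_hit"])


def decode_state_info(bits):
    """Unpack a 3-bit StateInfo value into named booleans."""
    return StateInfo(
        global_bit=bool(bits & GLOBAL_BIT),
        valid_bit=bool(bits & VALID_BIT),
        nc_hit=bool(bits & NC_HIT_BIT),
    )
```

Under this assumed layout, a memory card response always decodes with nc_hit false, since the memory card drives NC_Hit to zero.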
Ring Interface Support
- keep 4-bit PhaseID from processor request with each transaction
- keep 3-bit StateInfo in processor response packets
- place 2 busy signals on the bus. one is LOCALBUSY, the other is
GLOBALBUSY. LOCALBUSY is a copy of the full/empty bit passing by
on the local ring. for GLOBALBUSY, an extra bit is allocated on
the local ring. this bit is a copy of the full/empty bit on the
global ring and is set/cleared by the inter-ring interface. ring
utilization can then be measured at the processor. software can
use the information to control network traffic -- number of issued
prefetches or block transfers, for instance.
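As a sketch of how software might use the busy signals, the following estimates ring utilization as the fraction of sampled slots whose full/empty bit was set, then applies a simple throttle. The sampling format, the function names, and the 0.75 limit are all assumptions made for the example.

```python
def ring_utilization(busy_samples):
    """Fraction of sampled ring slots that were full.

    busy_samples is a sequence of LOCALBUSY (or GLOBALBUSY) values,
    one per sampled slot. Returns 0.0 when there are no samples.
    """
    samples = list(busy_samples)
    if not samples:
        return 0.0
    return sum(1 for sample in samples if sample) / len(samples)


def prefetch_allowed(utilization, limit=0.75):
    """Hypothetical throttle: issue prefetches only while the ring is below limit."""
    return utilization < limit
```

For instance, samples of 1, 0, 1, 1 give a utilization of 0.75, which this throttle treats as too busy to issue further prefetches.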