Efficient Multi-Ported Memories for FPGAs

A better way to build concurrent-access memories on FPGAs

As FPGAs continue to increase in transistor density, designers are using them to build larger and more complex systems-on-chip that require frequent sharing, communication, queueing, and synchronization among distributed functional units and compute nodes. These functions boil down to FIFOs and register files, which can both be implemented using multi-ported memories.

In this work we propose a new design for true multi-ported memories that capitalizes on FPGA block RAMs while providing:

  1. substantially better area scaling than a pure logic-based approach
  2. higher frequencies than the multipumping approach
  3. true random access from all ports without contention

The key to our approach is a form of indirection through a structure called the Live Value Table (LVT), which is itself a small multi-ported memory implemented in reconfigurable logic. Essentially, the LVT allows a banked design to behave like a true multi-ported design by directing reads to appropriate banks based on which bank holds the most recent or "live" write value.
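The indirection can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the hardware design itself: the class name `LVTMemory`, the choice of one bank per write port, and the rule that reads see the state from before the current cycle's writes are all assumptions made for illustration.

```python
class LVTMemory:
    """Behavioral sketch of a banked memory steered by a Live Value Table.

    Assumptions (not taken from the text above): one bank per write port,
    and reads observe the state from before the current cycle's writes.
    """

    def __init__(self, depth, num_write_ports, num_read_ports):
        self.depth = depth
        self.num_write_ports = num_write_ports
        self.num_read_ports = num_read_ports
        # Each write port owns one bank, so writes never contend for a bank.
        self.banks = [[0] * depth for _ in range(num_write_ports)]
        # The LVT stores, per address, the number of the bank holding the
        # most recent ("live") value. It stores bank numbers, not data.
        self.lvt = [0] * depth

    def cycle(self, writes, reads):
        """Perform one clock cycle.

        writes: {write_port: (addr, data)}
        reads:  {read_port: addr}
        Returns {read_port: data}.
        """
        # Reads consult the LVT to select the bank holding the live value.
        results = {
            port: self.banks[self.lvt[addr]][addr]
            for port, addr in reads.items()
        }
        # Each write goes to its own bank and updates the LVT entry.
        # Two writes to the same address in one cycle are left undefined
        # here; in this model the last one iterated simply wins.
        for port, (addr, data) in writes.items():
            self.banks[port][addr] = data
            self.lvt[addr] = port
        return results
```

For example, after write port 0 stores a value at address 5 and write port 1 later overwrites it, bank 0 still holds the stale value, but the LVT entry for address 5 names bank 1, so every read port returns the newer value.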

An LVT-based design is more efficient, even though the LVT itself is implemented purely in logic elements, because the LVT is much narrower than the actual memory banks: it holds only bank numbers rather than full data values. The lines that are decoded and multiplexed are therefore also much narrower, and hence more efficiently placed and routed. An LVT-based design also leverages block RAMs, which implement bulk memory more efficiently, and has an operating frequency closer to that of the block RAMs themselves.
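A rough back-of-the-envelope calculation shows how narrow the LVT is. The sketch below assumes an LVT entry needs just enough bits to name one bank, with one bank per write port; the helper names and the choice of 4 write ports are hypothetical and are not a configuration reported here.

```python
def lvt_entry_bits(num_banks):
    """Bits needed to name one of num_banks banks (0 if there is only one)."""
    return (num_banks - 1).bit_length()


def storage_bits(depth, data_width, num_banks):
    """Return (LVT bits, data bits in a single copy of one bank)."""
    lvt_bits = depth * lvt_entry_bits(num_banks)
    bank_bits = depth * data_width
    return lvt_bits, bank_bits


# Hypothetical configuration: 256 entries, 32-bit data, 4 banks.
lvt_bits, bank_bits = storage_bits(depth=256, data_width=32, num_banks=4)
print(lvt_bits, bank_bits, bank_bits // lvt_bits)  # prints: 512 8192 16
```

Under these assumptions each LVT entry is 2 bits wide against 32 bits of data, so the logic that decodes and multiplexes LVT entries handles signals 16 times narrower than the data path.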

Additionally, LVT-based design and multipumping are complementary: we show that with multipumping we can reduce the area of an LVT-based design at the cost of halving its maximum operating frequency. With these techniques we can support soft solutions for multi-ported memories without requiring expensive hardware block RAMs that have more than two ports.
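The frequency cost of multipumping follows from time-sharing: one physical port runs on a faster internal clock and serves several logical ports in turn, so the externally visible clock is the internal clock divided by the pumping factor. The following is a minimal behavioral sketch of that idea only. The class name `MultipumpedPort`, the operation encoding, and the in-order servicing of operations within an external cycle are assumptions, not details of our design.

```python
class MultipumpedPort:
    """One physical memory port time-shared by `factor` logical ports.

    Assumption: operations within one external cycle are serviced in the
    order given, one per internal clock cycle.
    """

    def __init__(self, depth, factor=2):
        self.mem = [0] * depth
        self.factor = factor
        self.external_cycles = 0
        self.internal_cycles = 0

    def external_cycle(self, ops):
        """Service up to `factor` operations in one external cycle.

        Each op is ("w", addr, data) or ("r", addr).
        Returns one result per op: None for writes, the data for reads.
        """
        if len(ops) > self.factor:
            raise ValueError("more operations than internal time slots")
        results = []
        for op in ops:
            if op[0] == "w":
                _, addr, data = op
                self.mem[addr] = data
                results.append(None)
            else:
                _, addr = op
                results.append(self.mem[addr])
        # One external cycle always spans `factor` internal cycles, which
        # is why the external clock runs at 1/factor of the internal clock.
        self.external_cycles += 1
        self.internal_cycles += self.factor
        return results
```

With a factor of 2, two logical ports share one physical port, which saves hardware but means the memory can accept new requests at only half the rate of its internal clock.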

For example, the charts below show the area and speed of three 32-bit-wide multi-ported memories on an Altera Stratix III FPGA: LVT-based using M9K block RAMs, LVT-based using MLABs, and a pure logic (Pure-ALM) approach. At a depth of 256 elements, our LVT-M9K solution has 84% less area and 43% less delay than a pure logic implementation:


Last Updated April 28, 2010