Octavo: An FPGA-Centric Processor Family

(a snapshot of work in progress)

Overlay processor architectures allow FPGAs to be programmed by non-experts using existing software tools, but prior designs have mainly been based on the architecture of their ASIC predecessors, which suffer a performance penalty on FPGAs.

In this work we develop a new processor architecture that from the beginning accounts for and exploits the predefined widths, depths, maximum operating frequencies, and other discretizations and limits of the underlying FPGA components. The result is Octavo, a ten-pipeline-stage eight-threaded processor that operates at the Block RAM maximum of 550MHz on a Stratix IV FPGA.

We name our processor architecture Octavo for nominally having eight thread contexts: an octavo is a booklet made from a printed page folded three times to produce eight leaves, or 16 pages. However, Octavo is really a processor family, since it is highly parameterizable in its datapath and memory width, memory depth, and number of supported thread contexts. This parameterization allows us to search for optimal configurations that maximize FPGA resource utilization and clock frequency.

In this work we ask the fundamental question: How do FPGAs want to compute? A more exact (but less memorable) phrasing of this question is: What processor architecture best fits the underlying structures and discretizations of an FPGA?

We guide our investigation with the following goals for a processor design:

  1. To support a highly-threaded data-parallel programming model, similar to OpenCL.
  2. To run at the maximum operating frequency allowed by the particular FPGA resources used (e.g., BRAMs).
  3. To have high performance, i.e., not only a high operating frequency but also a reasonable instruction count and number of processor cycles per instruction.
  4. To never stall due to hazards (such as control or data dependences).
  5. To strive for simplicity and minimalism, rather than inherit all of the features of an existing processor design/ISA.
  6. To match underlying FPGA structures; for example, to discover the most effective width for data elements for both datapaths and storage, as opposed to defaulting to the conventional 32-bit width.
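
Goals 1 and 4 fit together in a simple way, which the following minimal Python sketch illustrates. It is a generic model of strict round-robin multithreading, not a description of Octavo's actual pipeline: the names `hazard_free`, `simulate`, and `result_latency` are hypothetical, and the model assumes one instruction issues per cycle in fixed thread order and that a result becomes usable `result_latency` cycles after issue.

```python
def hazard_free(num_threads: int, result_latency: int) -> bool:
    """Return True if round-robin issue alone hides the result latency.

    Under strict round-robin, a given thread issues once every
    num_threads cycles, so its previous result is ready in time
    whenever result_latency <= num_threads.
    """
    return result_latency <= num_threads


def simulate(num_threads: int, result_latency: int, cycles: int) -> int:
    """Count issue slots where a thread's previous result is not yet ready.

    Each such slot would need a stall or forwarding logic in a real
    design. This is an illustrative model only (an assumption, not
    the processor's actual control logic).
    """
    ready_at = [0] * num_threads  # cycle when each thread's last result is usable
    violations = 0
    for cycle in range(cycles):
        thread = cycle % num_threads  # strict round-robin thread selection
        if cycle < ready_at[thread]:
            violations += 1
        ready_at[thread] = cycle + result_latency
    return violations
```

In this model, `simulate(8, 8, 64)` reports no violations, whereas running fewer threads than the result latency (for example `simulate(4, 8, 64)`) reports one at nearly every issue slot. How Octavo's pipeline stages map onto `result_latency` is not specified here.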

In this paper we focus on the architecture of a single Octavo core and provide the following four contributions:

  1. we present the design process leading to Octavo, an 8-stage multithreaded processor family that operates at up to 550MHz on a Stratix IV FPGA;
  2. we demonstrate the utility of self-loop characterization for reasoning about the pipelining requirements of processor components on FPGAs;
  3. we present a design for a fast multiplier, consisting of two half-pumped DSP blocks, which overcomes hardware timing and CAD limitations;
  4. we present the design space of Octavo configurations of varying datapath and memory widths, memory depths, and number of pipeline stages.
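
To make the idea of matching widths and depths to FPGA structures concrete, here is a small Python sketch that estimates how many block RAMs a memory of a given shape occupies and how fully it uses them. It is not the authors' tooling. The list of block modes is an assumption based on the commonly documented aspect ratios of the Stratix IV M9K block and should be checked against the device handbook; `blocks_needed` and `utilization` are hypothetical helper names, and a real CAD tool may map memories differently.

```python
import math

# Assumed (depth, width) modes of one 9-Kbit M9K block. This list is an
# assumption for illustration; verify it against the device handbook.
M9K_MODES = [
    (8192, 1), (4096, 2), (2048, 4), (1024, 8), (1024, 9),
    (512, 16), (512, 18), (256, 32), (256, 36),
]
M9K_BITS = 9216  # total bits per block, including the ninth (parity) bits


def blocks_needed(depth: int, width: int) -> int:
    """Fewest blocks needed, tiling one mode in both width and depth."""
    return min(
        math.ceil(width / w) * math.ceil(depth / d)
        for d, w in M9K_MODES
    )


def utilization(depth: int, width: int) -> float:
    """Fraction of the allocated block RAM bits that the memory uses."""
    return depth * width / (blocks_needed(depth, width) * M9K_BITS)


if __name__ == "__main__":
    # Sweep a few hypothetical widths at a fixed depth of 1024 words.
    for width in (32, 36, 37):
        print(width, blocks_needed(1024, width), round(utilization(1024, width), 3))
```

Under these assumptions, a 1024-word memory uses the same number of blocks at 32 bits as at 36 bits, so the conventional 32-bit width leaves one ninth of the allocated bits unused, while a 37-bit width spills into an extra block.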


People

Publications

Links

(forthcoming publication of proceedings)

Download

(forthcoming, after a few refinements)
Last Updated February 28, 2012