Overlay processor architectures allow non-experts to program FPGAs using existing software tools, but prior designs have mainly followed the architectures of their ASIC predecessors and therefore suffer a performance penalty on FPGAs.
In this work we develop a new processor architecture that, from the outset, accounts for and exploits the predefined widths, depths, maximum operating frequencies, and other discretizations and limits of the underlying FPGA components. The result is Octavo, an eight-threaded processor with a ten-stage pipeline that operates at the Block RAM maximum frequency of 550 MHz on a Stratix IV FPGA.
We name our processor architecture Octavo for nominally having eight thread contexts (an octavo is a booklet made from a printed page folded three times to produce eight leaves, or 16 pages). However, Octavo is really a processor family, since it is highly parameterizable in terms of its datapath and memory width, memory depth, and number of supported thread contexts. This parameterization allows us to search for optimal configurations that maximize FPGA resource utilization and clock frequency.
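To make the idea of a parameterized family concrete, the following minimal Python sketch enumerates candidate configurations over the three parameters named above. It is only an illustration: the class `OctavoConfig`, the helper functions, and every numeric value in the example sweep are hypothetical placeholders, not configurations taken from this work. Measuring clock frequency and resource utilization for each candidate would require synthesizing it, which is outside the sketch.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class OctavoConfig:
    """One hypothetical point in the configuration space."""
    word_width: int       # datapath and memory width, in bits
    memory_depth: int     # number of words per memory
    thread_contexts: int  # number of supported thread contexts


def enumerate_configs(widths, depths, threads):
    """Return every combination of the given parameter values."""
    return [OctavoConfig(w, d, t) for w, d, t in product(widths, depths, threads)]


def memory_bits(cfg):
    """Storage needed by one memory of this configuration, in bits."""
    return cfg.word_width * cfg.memory_depth


# Placeholder sweep values, chosen only to illustrate the enumeration.
configs = enumerate_configs(widths=(18, 36, 72), depths=(256, 1024), threads=(8, 16))
```

Each entry of `configs` could then be handed to a synthesis flow, and the results compared to find the configurations that best fit the target device.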
In this work we ask the fundamental question: How do FPGAs want to compute? A more exact (but less memorable) phrasing of this question is: What processor architecture best fits the underlying structures and discretizations of an FPGA?
We guide our investigation with the following goals for a processor design:
In this paper we focus on the architecture of a single Octavo core and provide the following four contributions: