Parallelizing FPGA CAD Using Transactional Memory

Steven Birk, University of Toronto

To capitalize on the growing abundance of multicore hardware, FPGA vendors have begun to parallelize the most important algorithms in their CAD software. However, parallelization is a painstaking and hence expensive process that limits the number of algorithms that can be cost-effectively parallelized. Transactional Memory (TM) promises an easier, optimistic alternative to locks for critical sections in threaded code---allowing programmers to avoid deadlocks and data races, and also allowing critical sections to execute in parallel as long as they access independent data. In this paper we summarize our recent work on using TM to parallelize simulated-annealing-based placement for FPGAs. In particular, we use a software-TM (TinySTM) to parallelize the placement phase of Versatile Place and Route (VPR) 5.0.2. With TM we very quickly produced a parallel and correct version of the software, allowing us to focus on incrementally tuning performance. We describe our experiences in tuning both the STM and CAD software for each other, and the interesting algorithmic trade-offs that exist. In the end we find that transactionalized placement has the potential for scalable performance: our non-deterministic implementation achieves self-relative speedups over a single thread of 1.84x, 3.73x and 7.56x at 2,4, and 8 threads (respectively) with little quality impact. However, hardware support for TM is likely required to overcome the overheads of STM.