### NetTM: Faster and Easier Synchronization for Soft Multicores via Transactional Memory



Martin Labrecque, Prof. Greg Steffan, <u>University of Toronto</u>

FPGA, February 27th 2011



FPGAs in Telecommunications:

- Present in most high-end routers
- More than 40% of FPGA market



Deep packet inspection requires: software + CPUs



Our goal: implement those cores directly in the FPGA



8 threads?



Write 1 program, run on all threads!

Released online: **Google** netfpga+netthreads

### Ideal scenario:

Packets are data-independent and are processed in parallel



### Reality:

Programmers need to insert locks **in case** there is a dependence
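A minimal sketch of the "lock in case" pattern, for illustration only. It uses POSIX mutexes as a stand-in for whatever lock primitive the soft multicore provides; the flow table, `N_FLOWS`, and `account_packet` are hypothetical names, not part of NetThreads or NetTM.

```c
#include <pthread.h>
#include <stdint.h>

#define N_FLOWS 64  /* hypothetical table size */

/* Shared per-flow byte counters, updated by every packet-processing thread. */
static uint32_t flow_bytes[N_FLOWS];

/* One global lock guards the whole table. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every packet takes the same lock, even when two packets belong to
 * different flows and would never touch the same counter. */
void account_packet(uint32_t flow_id, uint32_t len)
{
    pthread_mutex_lock(&table_lock);
    flow_bytes[flow_id % N_FLOWS] += len;
    pthread_mutex_unlock(&table_lock);
}
```

Two threads handling packets for flows 3 and 4 serialize on `table_lock` here although their updates are independent; the lock is only needed when both packets map to the same entry.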



Experimental result: synchronizing packet-processing threads with fine- or medium-grained global locks is overly conservative 80-90% of the time [ANCS'10]

### Transactional memory

Data-independent packets are processed in parallel
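The idea can be sketched in software, under stated assumptions: each packet's critical section is treated as a transaction that records the addresses it read and wrote, and two transactions may proceed in parallel unless one wrote an address the other accessed. The `txn_t` layout, `MAX_ACCESSES`, and `txn_conflict` are hypothetical; the actual hardware mechanism may track and compare accesses differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACCESSES 16  /* hypothetical bound, for illustration */

/* Addresses touched by one packet's critical section. */
typedef struct {
    uint32_t reads[MAX_ACCESSES];
    size_t   n_reads;
    uint32_t writes[MAX_ACCESSES];
    size_t   n_writes;
} txn_t;

static bool contains(const uint32_t *set, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == addr)
            return true;
    return false;
}

/* Two transactions conflict when either one wrote an address
 * that the other one read or wrote. Read-read sharing is fine. */
bool txn_conflict(const txn_t *a, const txn_t *b)
{
    for (size_t i = 0; i < a->n_writes; i++)
        if (contains(b->reads, b->n_reads, a->writes[i]) ||
            contains(b->writes, b->n_writes, a->writes[i]))
            return true;
    for (size_t i = 0; i < b->n_writes; i++)
        if (contains(a->reads, a->n_reads, b->writes[i]))
            return true;
    return false;
}
```

When `txn_conflict` is false, both packets can commit without ever having waited on each other; only a true conflict forces one of them to abort and retry.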













- 1K words of speculative writes buffered per thread
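A software sketch of what a bounded per-thread speculative write buffer does, assuming writes are held back until commit and discarded on abort. Only the 1K-word capacity comes from the slide; the `wbuf_t` layout, the linear search, and the behavior on overflow (the write is rejected) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WBUF_WORDS 1024  /* 1K words per thread, as stated on the slide */

typedef struct {
    uint32_t addr[WBUF_WORDS];  /* word index into memory */
    uint32_t data[WBUF_WORDS];  /* speculative value for that word */
    size_t   used;
} wbuf_t;

/* Buffer a speculative write; returns false if the buffer is full
 * (assumed policy: the caller must then abort or fall back). */
bool wbuf_write(wbuf_t *b, uint32_t addr, uint32_t value)
{
    for (size_t i = 0; i < b->used; i++) {
        if (b->addr[i] == addr) {   /* overwrite an earlier speculative value */
            b->data[i] = value;
            return true;
        }
    }
    if (b->used == WBUF_WORDS)
        return false;
    b->addr[b->used] = addr;
    b->data[b->used] = value;
    b->used++;
    return true;
}

/* A thread sees its own speculative writes first, otherwise memory. */
uint32_t wbuf_read(const wbuf_t *b, const uint32_t *mem, uint32_t addr)
{
    for (size_t i = 0; i < b->used; i++)
        if (b->addr[i] == addr)
            return b->data[i];
    return mem[addr];
}

/* Commit makes all buffered writes visible; abort simply drops them. */
void wbuf_commit(wbuf_t *b, uint32_t *mem)
{
    for (size_t i = 0; i < b->used; i++)
        mem[b->addr[i]] = b->data[i];
    b->used = 0;
}

void wbuf_abort(wbuf_t *b)
{
    b->used = 0;
}
```

Because memory is untouched until `wbuf_commit`, aborting a transaction costs nothing more than resetting `used`.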



- 1st HTM implementation tightly integrated with soft processors
- Supports conventional locks and TM without code modification!
- Can extract optimistic parallelism across packets
  - Improves benchmark throughput: +6%, +54%, +57%
- Coarse critical sections and deadlock avoidance simplify programming
- Processor and conflict detection integration works well on FPGA

Future work: scale to more cores on newer FPGA/NetFPGA!

<u>NetTM</u> and <u>NetThreads</u> available online

<u>Google</u>: netfpga+netthreads

martinL@eecg.utoronto.ca