Complex analytics of the vast amount of data collected via social media, cell phones, ubiquitous smart sensors, and satellites is likely to be the biggest economic driver for the IT industry over the next decade. For many "Big Data" applications, the limiting factor in performance is often the transportation of large amounts of data from hard disks to where it can be processed, i.e., DRAM. We will present BlueDBM, an architecture for a scalable distributed flash store that is designed to overcome this limitation in two ways. First, the architecture provides high-performance, high-capacity, scalable random-access storage. It achieves high throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. Second, it permits some computation near the data via an FPGA-based programmable flash controller. We will present preliminary results on accelerating complex queries using a BlueDBM system consisting of 20 nodes and up to 20TB of flash.

Biography: Arvind is the Johnson Professor of Computer Science and Engineering at MIT. Arvind’s group, in collaboration with Motorola, built the Monsoon dataflow machines and their associated software in the late eighties. In 2000, Arvind started Sandburst, which was sold to Broadcom in 2006. In 2003, Arvind co-founded Bluespec Inc., an EDA company to produce a set of tools for high-level synthesis. In 2001, Dr. R. S. Nikhil and Arvind published the book "Implicit Parallel Programming in pH". Arvind's current research focus is on enabling rapid development of embedded systems.

Arvind is a Fellow of IEEE and ACM, and a member of the National Academy of Engineering and the American Academy of Arts and Sciences.

Data centers are a highly competitive environment that demands high performance and energy efficiency and, in many cases, low latency. Custom hardware can provide significant improvements over conventional microprocessors on those metrics. Microsoft has been investigating the use of reconfigurable logic, in the form of field-programmable gate arrays (FPGAs), to accelerate its data centers. In this talk, I will describe some of our efforts in this area.

Biography: Derek Chiou is a Principal Architect at Microsoft, where he co-leads a team working on FPGAs for data center applications, and an associate professor at The University of Texas at Austin. His research areas are FPGA acceleration, high performance computer simulation, rapid system design, computer architecture, parallel computing, Internet router architecture, and network processors. Before going to UT, Dr. Chiou was a system architect and led the performance modeling team at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M. and S.B. degrees in Electrical Engineering and Computer Science from MIT.

Biography: Smith received the Bachelor of Applied Science degree from the Division of Engineering Physics (now Division of Engineering Science) in 1954, the M.A.Sc. in electrical engineering in 1956, and the Ph.D. in Physics in 1960, all from the University of Toronto. In 1961 he joined the University of Illinois as an Assistant Professor, where he became Chief Engineer of Illiac II, and then of Illiac III, and attained the rank of Associate Professor. In 1965 he returned to the University of Toronto, where he is currently Professor Emeritus. Smith's publications are in the areas of electronic circuits, computer architecture, multiple-valued logic, instrumentation, sensors, machine vision, neural networks, computer music, human factors and human-computer interfaces, and databases. He is the co-author of Microelectronic Circuits (with A.S. Sedra), which is now in its seventh edition (2015). Prof. Smith is a Life Fellow of the IEEE.

ASAP 2015 Program

Monday - July 27, 2015

Keynote Chair: Deshanand Singh, Altera Corp.

Arvind, "BlueDBM: A Multi-access, Distributed Flash Store for Big Data Analytics"

Session M1: Architecture and Technologies 1

Chair: Russell Tessier, University of Massachusetts

Regular Papers:

Cecilia Gonzalez-Alvarez, Jennifer Sartor, Carlos Alvarez, Daniel Jimenez-Gonzalez and Lieven Eeckhout, "Automatic Design of Domain-Specific Instructions for Low-Power Processors" (BEST STUDENT PAPER AWARD!)

Nachiket Kapre, "Custom FPGA-based Soft-Processors for Sparse Graph Acceleration"

Raphael Polig, Heiner Giefers and Walter Stechele, "A Soft-Core Processor Array for Relational Operators"

Short Papers:

Nasim Farahini and Ahmed Hemani, "Atomic Stream Computation Unit based on Micro-thread Level Parallelism"

Tanvir Ahmed and Yuko Hara-Azumi, "Timing Speculation-Aware Instruction Set Extension for Resource-Constrained Embedded Systems"

Prof. Kenneth C. Smith, "Tales of the Illiac – Anecdotes from the Design of Revolutionary Early Supercomputers"

Session M2: Application Acceleration 1

Chair: Sean Wagner, IBM

Regular Papers:

Nolan Denman, Mandana Amiri, Kevin Bandura, Liam Connor, Matt Dobbs, Mateus Fandino, Mark Halpern, Adam Hincks, Gary Hinshaw, Carolin Hofer, Peter Klages, Kiyoshi Masui, Juan Mena Parra, Laura Newburgh, Andre Recnik, Richard Shaw, Kris Sigurdson, Kendrick Smith and Keith Vanderlinde, "A GPU-based Correlator X-engine Implemented on the CHIME Pathfinder"

Nitin Gawande, Joseph Manzano, Antonino Tumeo, Nathan Tallent, Darren Kerbyson and Adolfy Hoisie, "Power and Performance Trade-offs for Space Time Adaptive Processing"

Tahsin Reza, Aaron Zimmer, Parwant Ghuman, Tanuj Kr Aasawat and Matei Ripeanu, "Accelerating Persistent Scatterer Pixel Selection for InSAR Processing"

Short Paper:

Andre Recnik, Kevin Bandura, Nolan Denman, Adam D. Hincks, Gary Hinshaw, Peter Klages, Ue-Li Pen and Keith Vanderlinde, "An Efficient Real-time Data Pipeline for the CHIME Pathfinder Radio Telescope X-Engine"

Break and Poster Session 1

Ross Thompson and James Stine, "An IEEE 754 Double-Precision Floating-Point Multiplier for Denormalized and Normalized Floating-Point Numbers"

Wei He and Dirmanto Jap, "Dual-Rail Active Protection System against Side-Channel Analysis in FPGAs"

Tung Hoang Thanh, Amirali Shambayati, Henry Hoffmann and Andrew A. Chien, "Does Arithmetic Logic Dominate Data Movement? A Systematic Comparison of Energy-Efficiency for FFT Accelerators"

Bingzhe Li, M. Hassan Najafi and David Lilja, "An FPGA Implementation of a Restricted Boltzmann Machine Classifier Using Stochastic Bit Streams"

Mehmet Ali Arslan, Flavius Gruian and Krzysztof Kuchcinski, "Application-Set Driven Exploration for Custom Processor Architectures"

Abdelhamid Dine, Abdelhafid Elouardi, Bastien Vincke and Samir Bouaziz, "Speeding up Graph-based SLAM Algorithm: a GPU-based Heterogeneous Architecture Study"

Session M3: Arithmetic

Chair: Qiang Wang, Huawei America

Regular Papers:

Hugues de Lassus Saint-Geniès, David Defour and Guillaume Revy, "Range Reduction Based on Pythagorean Triples for Trigonometric Function Evaluation"

Yongchao Liu and Bertil Schmidt, "LightSpMV: Faster CSR-based Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs" (BEST PAPER AWARD!)

Ran Zheng, Wei Wang, Hai Jin, Song Wu, Yong Chen and Han Jiang, "GPU-based Multifrontal Optimizing Method in Sparse Cholesky Factorization"

Opening Day Reception on Patio of University of Toronto Faculty Club

Tuesday - July 28, 2015
Keynote 2

Derek Chiou, "Accelerating Data Centers with Reconfigurable Logic"

Session T1: Architecture and Technologies 2

Chair: Vaughn Betz, University of Toronto

Regular Paper:

Seiichi Tade, Hiroki Matsutani, Hideharu Amano and Michihiro Koibuchi, "A Metamorphotic Network-on-Chip for Various Types of Parallel Applications"

Short Papers:

Ming-Ju Wu, Yan-Ting Chen and Chun-Jen Tsai, "Dynamic Pipeline-Partitioned Video Decoding on Symmetric Stream Multiprocessors"

Ran Wang, Jie Han, Bruce Cockburn and Duncan Elliott, "Stochastic Circuit Design and Performance Evaluation of Vector Quantization"

Glenn Cowan, Kevin Cushon and Warren Gross, "Mixed-Signal Implementation of Differential Decoding using Binary Message Passing Algorithms"

Introduction Presentation to ASAP 2016

Lunch Break

Session T2: Crypto/Security

Chair: Wenjing Rao, University of Illinois at Chicago

Regular Papers:

Mihai Maruseac, Gabriel Ghinita, Ming Ouyang and Razvan Rughinis, "Hardware Acceleration of Private Information Retrieval Protocols Using GPUs"

Moon Sung Lee, Yongje Lee, Jung Hee Cheon and Yunheung Paek, "Accelerating Bootstrapping in FHEW using GPUs"

Tedy Thomas, Arman Pouraghily, Kekai Hu, Russell Tessier and Tilman Wolf, "Multi-Task Support for Security-Enabled Embedded Processors"

Short Papers:

Pei Luo, Liwei Zhang, Yunsi Fei and A. Adam Ding, "Towards Secure Cryptographic Software Implementation Against Side-Channel Power Analysis Attacks"

Paulo Martins, Leonel Sousa, Julien Eynard and Jean-Claude Bajard, "Programmable RNS Lattice-Based Parallel Cryptographic Decryption"

Break and Poster Session 2

Xin Fang, Pei Luo, Yunsi Fei and Miriam Leeser, "Balance Power Leakage to Fight Against Side-Channel Analysis at Gate Level in FPGAs"

Jie Tang, Chen Liu and Jean-Luc Gaudiot, "How can Garbage Collection be Energy Efficient By Dynamic Offloading?"

Zhinan Cheng, Xi Li, Beilei Sun, Ce Gao and Jiachen Song, "Automatic Frame Rate-Based DVFS Of Game"

Rodrigo Devigo, Liana Duenha, Rodolfo Azevedo and Ricardo Santos, "MultiExplorer: A Tool Set for MultiCore System-on-Chip Design Exploration"

Vincenzo Catania, Andrea Mineo, Salvatore Monteleone, Maurizio Palesi and Davide Patti, "Noxim: An Open, Extensible and Cycle-accurate Network on Chip Simulator"

Peter Klages, Kevin Bandura, Nolan Denman, Andre Recnik, Jonathan Sievers and Keith Vanderlinde, "GPU Kernels for High-Speed 4-Bit Astrophysical Data Processing"

Session T3: Tools and Design Methodologies

Chair: Warren Gross, McGill University

Regular Papers:

Moritz Schmid, Oliver Reiche, Frank Hannig and Jürgen Teich, "Loop Coarsening in C-based High-Level Synthesis"

Dylan Rudolph and Greg Stitt, "An Interpolation-Based Approach to Multi-Parameter Performance Modeling for Heterogeneous Systems"

Erkan Diken, Martin O'Riordan, Roel Jordans, Lech Jozwiak, Henk Corporaal and David Moloney, "Mixed-Length SIMD Code Generation for VLIW Architectures with Multiple Native Vector-Widths"

Short Paper:

Kenneth Hill, Stefan Craciun, Alan George and Herman Lam, "Comparative Analysis of OpenCL vs. HDL with Image-Processing Kernels on Stratix-V FPGA"

Boarding for Boat

Boat Departs

Wednesday - July 29, 2015

Session W1: Fault Tolerance

Chair: Nachiket Kapre, Nanyang Technological University

Regular Papers:

Alexandru Tanase, Michael Witterauf, Jürgen Teich, Frank Hannig and Vahid Lari, "On-Demand Fault-Tolerant Loop Processing on Massively Parallel Processor Arrays"

Aniruddha Shastri, Greg Stitt and Eduardo Riccio, "A Scheduling and Binding Heuristic for High-Level Synthesis of Fault-Tolerant FPGA Applications"

Session W2: Application Acceleration 2 and Power

Chair: Andy Ye, Ryerson University

Regular Papers:

Andreea Ingrid Funie, Paul Grigoras, Pavel Burovskiy, Wayne Luk and Mark Salmon, "Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies"

Hamed Tabkhi, Majid Sabbagh and Gunar Schirner, "An Efficient Architecture Solution for Low-Power Real-Time Background Subtraction"

Shijie Zhou, Yun Qu and Viktor Prasanna, "Large-scale packet classification on FPGA"

Short Papers:

Andrew Wong, Saied Hemati and Warren Gross, "Efficient Implementation of Structured Long Block-Length LDPC Codes"

Andrea Sanny, Yi-Hua Edward Yang and Viktor Prasanna, "Energy Optimization of Parallel k-Means Clustering Algorithm on FPGA"

