  * P. Judd, J. Albericio, Andreas Moshovos, Stripes: Bit-Serial Deep Learning Computing, Computer Architecture Letters, accepted April 2016, appeared Aug 2016.
  * P. Judd, J. Albericio, T. Hetherington, T. Aamodt, Andreas Moshovos, Stripes: Bit-Serial Deep Learning Computing, ACM/IEEE International Conference on Microarchitecture, Oct 2016.
  * A. Delmas Lascorz, S. Sharify, P. Judd, A. Moshovos, TARTAN: Accelerating Fully-Connected and Convolutional Layers in Deep Neural Networks, [[https://openreview.net/forum?id=Hy-lMNqex|OpenReview]], Oct 2016. [[https://arxiv.org/abs/1707.09068|Arxiv]].
  
**Bit-Pragmatic Deep Learning Computing:** We observe that //on average more than 90% of the work done when multiplying activations and weights in Convolutional Neural Networks is ineffectual//. In Pragmatic, execution time depends only on the essential bit content of the runtime values for convolutional layers. Even after reducing precision to the absolute minimum, there are still many ineffectual computations; Pragmatic eliminates any remaining computations that are certainly ineffectual. It boosts both energy efficiency and performance over all previous proposals. Compared to DaDianNao it improves performance by more than 4x, and for narrower configurations performance approaches nearly 8x over an equivalent DaDianNao configuration.
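
To make "essential bit content" concrete, here is a minimal Python sketch, assuming unsigned 16-bit fixed-point activations; the synthetic data and counting scheme are illustrative assumptions, not the paper's hardware mechanism.

<code python>
import numpy as np

def essential_bits(values):
    """Count the 1 bits in each value; only these trigger work in Pragmatic."""
    return np.array([bin(int(v)).count("1") for v in values])

# Hypothetical activations: real CNN activations are mostly small or zero,
# which is what drives the ineffectual fraction above 90% in the paper.
rng = np.random.default_rng(0)
acts = np.minimum(rng.geometric(0.01, size=1000), 2**16 - 1).astype(np.uint16)

bit_parallel_work = acts.size * 16           # a 16-bit multiplier touches every bit
essential_work = essential_bits(acts).sum()  # Pragmatic processes only the 1 bits
print(f"ineffectual fraction: {1 - essential_work / bit_parallel_work:.1%}")
</code>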
** Dynamic Stripes **: It has been known that the precision activations need can be tailored per network layer, and several hardware approaches exploit this precision variability to boost performance and energy efficiency. Here we show that much is left on the table by assigning precisions at the layer level: in practice the precisions vary with the input and at a much finer granularity. An accelerator only needs to consider as many activations as it can process per cycle. In the work below we show how to adapt precisions at runtime at that processing granularity (see the sketch after the reference below). We also show how to boost performance and energy efficiency for fully-connected layers.
  
  * Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos, [[https://arxiv.org/abs/1706.00504|Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks, arxiv.]]
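
A minimal Python sketch of per-group precision detection, assuming unsigned 16-bit activations and a group size of 16 (one group per cycle); the names, sizes, and synthetic data are illustrative assumptions.

<code python>
import numpy as np

def bits_needed(group):
    """Precision the group actually needs: bits of its largest value."""
    return max(int(group.max()).bit_length(), 1)

rng = np.random.default_rng(1)
acts = rng.integers(0, 2**10, size=256, dtype=np.uint16)  # hypothetical layer slice

per_layer = bits_needed(acts)                               # one precision for everything
per_group = [bits_needed(g) for g in acts.reshape(-1, 16)]  # one per 16-activation group

# In a Stripes-like bit-serial engine cycles scale with precision, so the gap
# between the two numbers below is performance left on the table.
print("per-layer bits:", per_layer, "| mean per-group bits:", round(float(np.mean(per_group)), 1))
</code>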
  
** LOOM: An Accelerator for Embedded Devices **: When compute needs are modest, the design described below exploits both activation and weight precisions (a back-of-the-envelope sketch follows the references).
  * S. Sharify, A. Delmas Lascorz, P. Judd and Andreas Moshovos, [[https://arxiv.org/abs/1706.07853|Arxiv]]
  * S. Sharify, A. Delmas, P. Judd, K. Siu, and A. Moshovos, Design Automation Conference, June 2018 (here we use dynamic precision detection and also study the effects of off-chip traffic and on-chip buffering).
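
Why exploiting both precisions pays off, as a rough Python sketch: in an engine that is bit-serial in both operands, work scales with the product of the two precisions rather than with 16 x 16. The specific precisions below are illustrative assumptions.

<code python>
def both_precision_speedup(pa, pw, baseline_bits=16):
    """Ideal speedup when cycles scale with pa * pw instead of 16 * 16."""
    return (baseline_bits * baseline_bits) / (pa * pw)

# e.g., a hypothetical layer needing 8-bit activations and 4-bit weights:
print(both_precision_speedup(8, 4))    # 8.0x in the ideal case
print(both_precision_speedup(16, 16))  # 1.0x: no benefit at full precision
</code>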

** DPRed: ** This builds on the Dynamic Stripes work and shows that dynamic, per-group precision adaptation can yield benefits. Stay tuned for an update (a packing sketch follows the reference):
  * A. Delmas, S. Sharify, P. Judd, M. Nikolic, A. Moshovos, DPRed: Making Typical Activation Values Matter In Deep Learning Computing, [[https://arxiv.org/abs/1804.06732|Arxiv]]
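
One way such per-group precisions could be exploited, as a minimal Python sketch: pack each group of 16 activations at the precision its largest value needs, plus a small per-group precision field. This illustrates the idea only, under assumed parameters; it is not DPRed's actual encoding.

<code python>
import numpy as np

def packed_bits(acts, group=16, header=4):
    """Storage if each group keeps only the bits its largest value needs."""
    total = 0
    for g in acts.reshape(-1, group):
        p = max(int(g.max()).bit_length(), 1)
        total += group * p + header  # payload plus a per-group precision field
    return total

rng = np.random.default_rng(2)
acts = rng.integers(0, 2**8, size=4096, dtype=np.uint16)  # hypothetical activations

print("fixed 16-bit storage:", acts.size * 16, "bits")
print("per-group packed storage:", packed_bits(acts), "bits")
</code>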
  
** Bit-Tactical: ** Exploiting weight sparsity when it is available.
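
The opportunity in a few lines of Python: only the nonzero weights contribute to a dot product, so the zero ones can be skipped entirely. A minimal sketch with made-up sparsity; it illustrates the opportunity, not the design's hardware scheduling.

<code python>
import numpy as np

def sparse_dot(weights, acts):
    """Dot product that performs MACs only for nonzero (effectual) weights."""
    nz = np.flatnonzero(weights)
    return float(weights[nz] @ acts[nz]), len(nz)

rng = np.random.default_rng(3)
w = rng.normal(size=64) * (rng.random(64) < 0.4)  # ~60% of weights pruned to zero
a = rng.normal(size=64)

out, macs = sparse_dot(w, a)
print(f"MACs performed: {macs}/64")
</code>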