==== Warning :) ====

I have not written a simulator for the accelerator in question, so I do not have strong opinions on what needs to be done and how, nor do I have experience with the problems that may arise in this specific case. I do, however, have experience with processor simulators, which I will happily share as needed. So approach this more as a "how would you go about developing this simulator" exercise.

At the end, what I care about most is that you document your approach, your thinking, the problems you encountered and the solutions you developed, along with a commentary on what you would do next if you had more time and on the lessons learned.
==== Goal: ====
DaDianNao (from here on referred to as DaDN) consisted of 16 tiles and a 4MB Activation Memory (AM) (referred to as Neuron Memory in the publication). The AM provided 16 activations per cycle, which it broadcast to all 16 tiles. These input activations were temporarily buffered in an input buffer (NBin) per tile. Each tile contained 16 filter lanes, each processing 16 weight (called synapse in the publication) and activation pairs. A local 2MB per-tile eDRAM Weight Memory (WM -- Synapse Buffer in the original publication) provided the 256 weights each tile needed per cycle. For each filter lane there were 16 multipliers feeding into a 16-input adder tree. The resulting sum passed through an activation function and was finally stored into an output buffer (NBout) prior to eventually being written back to AM. Writes to AM were performed by at most one tile per cycle and were for 16 output activations.
In summary, a complete model would include the following: tiles, WMs (per tile), AM, NBin (per tile), NBout (per tile), external memory, the link between the AM and the tiles, the link between each WM and its tile, and the link(s) between external memory and the WMs and the AM. At a bare minimum, the simulator will have to model the tiles and the accesses to/from the WMs and the AM. Finally, there is the control unit, the one that instructs all the others what to do; the DaDN paper has a description of a simple control scheme. A minimal sketch of how the tile datapath might be modeled is shown below.
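To make this concrete, here is a small, purely illustrative Python sketch of how one might model a single DaDN tile for one cycle: 16 filter lanes, each multiplying 16 weight/activation pairs and reducing them with an adder tree. All names (''Tile'', ''step'', the counters) are my own placeholders, not part of any provided code, and this is only one of many reasonable ways to structure the model.

<code python>
# Illustrative sketch only: one DaDN tile processing one cycle's worth of
# inputs (16 activations broadcast from AM via NBin, and 16 lanes x 16
# weights read from this tile's WM). Names are hypothetical.

LANES = 16           # filter lanes per tile
MULTS_PER_LANE = 16  # weight/activation pairs processed per lane per cycle

class Tile:
    def __init__(self):
        self.wm_reads = 0                    # count WM accesses (256 weights/cycle)
        self.partial_sums = [0] * LANES      # accumulators that would feed NBout

    def step(self, activations, weights):
        """activations: 16 values from AM (via NBin).
        weights: LANES x MULTS_PER_LANE values from this tile's WM."""
        assert len(activations) == MULTS_PER_LANE
        self.wm_reads += 1
        for lane in range(LANES):
            # 16 multipliers followed by a 16-input adder tree
            products = [w * a for w, a in zip(weights[lane], activations)]
            self.partial_sums[lane] += sum(products)
        return self.partial_sums

# Example: one cycle with dummy data; each lane accumulates 16.
tile = Tile()
acts = [1] * MULTS_PER_LANE
wts = [[1] * MULTS_PER_LANE for _ in range(LANES)]
print(tile.step(acts, wts))
</code>

A fuller simulator would wrap 16 such tiles and add NBin/NBout buffering, the activation function, AM write arbitration (at most one tile writes 16 output activations per cycle), the external-memory links, and the control unit that sequences all of this.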
==== What to implement the simulator in ====
==== Sample Inputs/Networks ====

**This will be revised soon. Ignore for the time being.**
Milos Nikolic was kind enough to prepare the following sample inputs, which you can use to test your simulator. We will clarify some of the information below later on. For the time being this should be sufficient to get you started.
These are, in order, only the convolution and fully connected layers. I used 7 over the fixed point for the result.
[[https:// ]]
Here are two python scripts that can read the above files:
{{ : }}
Here's further info from Milos:
The values are all stored as 16-bit ints (if you load them with numpy, you will get ints). If you look at a value in binary, the first n+1 bits are the integer part, while the rest are the fraction bits, where n is the precision I included in the lists. For LeNet layer one, input activations will be 2.14 and weights 1.15.
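As a purely illustrative example of this interpretation, the snippet below converts a stored 16-bit integer to its real value, assuming an i.f format such as 2.14 (i integer bits, f fraction bits, 16 bits total). The function name and the example value are my own, not part of the provided scripts.

<code python>
import numpy as np

def fixed_to_float(raw, int_bits, frac_bits):
    """Interpret a stored 16-bit integer as an int_bits.frac_bits fixed-point
    value (e.g. 2.14 for LeNet layer-one activations). Sketch only."""
    assert int_bits + frac_bits == 16
    return np.int16(raw) / float(1 << frac_bits)

# e.g. a raw value of 0x2000 in 2.14 format is 0.5
print(fixed_to_float(0x2000, 2, 14))  # -> 0.5
</code>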
+ | |||
+ | |||
+ | |||
The npy files contain Python dictionaries with the values; the layer names are the keys. After loading a file into a variable var, each layer's parameters are accessed as var['layer name'], which will be a 4D array. I never used it, but there is a git repo with code to load npy files in C/C++ (https:// ).
+ | |||
+ | |||
The numbers should be interpreted as fixed-point values with the following formats. The layers are in the same order as the prototxt and the diagram, and include only the convolution and inner product layers.
Lenet layers: conv1 conv2 ip1 ip2

Lenet activations: 2.14

Lenet weights: 1.15

Nin layers: conv1 cccp1

Nin activations: 11.5

Nin weights: 1.15
Activations show the format of the input into the layer. The bias and output of the layer should follow the format of the following layer. Weights follow the format of the current layer.
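Putting the last two points together, here is a small, hypothetical helper showing how one might requantize a layer's real-valued output back to the following layer's activation format. The function name and the example numbers are my own; use the per-network formats listed above for the actual fraction-bit counts.

<code python>
def requantize(value_float, frac_bits):
    """Convert a real-valued result back to a 16-bit fixed-point integer
    with the given number of fraction bits (saturating). Sketch only."""
    raw = int(round(value_float * (1 << frac_bits)))
    return max(-32768, min(32767, raw))

# Hypothetical example: an output feeding a layer whose input activations
# use the 2.14 format (14 fraction bits).
print(requantize(0.3125, 14))  # -> 5120, i.e. 0.3125 * 2**14
</code>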