Our primary goal is to get the most out of the FPGAs in terms of speed.
    We could build a purely combinational circuit, but this would be very slow, because the clock period would have to cover the delay through all 16 stages.
    The first idea for a speedup is pipelining. By inserting a pipeline register after each stage, we get roughly a 16-fold increase in clock speed. Each pipeline register needs 64 bits for the data and another 56 bits for the key. Unfortunately, this requires too many resources and does not fit in one FPGA, so we had to find a way to optimize the circuit.
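    To make the register cost concrete, the following minimal sketch adds up the flip-flops of a fully pipelined design. It assumes that every one of the 16 stages carries its own 64-bit data register and 56-bit key register; the function name and constants are illustrative and not taken from the actual design.

```python
# Register-cost estimate for a fully pipelined design.
# Assumption: each stage has its own data register and its own key register.
DATA_BITS = 64   # width of the data register per stage
KEY_BITS = 56    # width of the key register per stage
STAGES = 16      # number of pipeline stages


def pipeline_register_bits(stages=STAGES, data_bits=DATA_BITS, key_bits=KEY_BITS):
    """Return (data, key, total) flip-flop counts over all pipeline stages."""
    data = stages * data_bits
    key = stages * key_bits
    return data, key, data + key


data_ffs, key_ffs, total_ffs = pipeline_register_bits()
print(f"data registers: {data_ffs} flip-flops")   # 16 * 64 = 1024
print(f"key registers:  {key_ffs} flip-flops")    # 16 * 56 = 896
print(f"total:          {total_ffs} flip-flops")  # 1920
```

    Under this assumption the key registers account for almost half of the pipeline flip-flops, which makes them a natural target for optimization.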
    Note that each pipeline stage processes one key, so at any time the key pipeline holds 16 consecutive keys. In effect, the key registers do nothing more than store the previous 16 keys.
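    The behaviour of the key pipeline can be illustrated with a small software model. This is only a sketch: it assumes that the keys come from a counter that advances by one per clock cycle, and it ignores any per-stage transformation of the key. The names `simulate_key_pipeline` and `first_key` are hypothetical.

```python
from collections import deque

STAGES = 16  # number of pipeline stages


def simulate_key_pipeline(first_key, cycles, stages=STAGES):
    """Model the key registers as a shift register of depth `stages`.

    Assumption: a counter feeds one new key per clock cycle into stage 0,
    and every older key moves one stage further down the pipeline.
    Returns the register contents, stage 0 first, after `cycles` clocks.
    """
    pipeline = deque(maxlen=stages)  # the oldest key drops out at the far end
    for cycle in range(cycles):
        pipeline.appendleft(first_key + cycle)
    return list(pipeline)


keys = simulate_key_pipeline(first_key=1000, cycles=20)
print(keys)  # stage 0 holds the newest key, stage 15 the oldest
```

    Once the pipeline is full, neighbouring stages always differ by exactly one, so the contents of all 16 key registers follow from the newest key alone.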
