Sigmoid/Tanh Module¶
Concept¶
The idea is to have a Tanh/Sigmoid compute block as an EltWise_Type within the Element_Wise operators.
The main logic computes Tanh first. When EltWise_Type == ELTWISE_SIG, the Sigmoid output is derived from the Tanh result using \(sigmoid(x) = \frac{1 + tanh(x/2)}{2}\).
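For reference, Sigmoid can be obtained from Tanh because of a standard identity. It follows directly from the definitions of the two functions, with no assumptions about the hardware:
\[tanh(x/2) = \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}} = \frac{1 - e^{-x}}{1 + e^{-x}}\]
\[\frac{1 + tanh(x/2)}{2} = \frac{1}{2} \cdot \frac{(1 + e^{-x}) + (1 - e^{-x})}{1 + e^{-x}} = \frac{1}{1 + e^{-x}} = sigmoid(x)\]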
Module Descriptions¶
Top Tanh/Sigmoid¶
The input is fed from the element_wise_op module after it has been scaled by Eltwise_Scale.
The top module checks the i_tanh_sigmoid signal and, if it is high (i.e., when EltWise_Type == ELTWISE_SIG), computes \(input / 2\).
The input is then fed to the tanh_interpolator_engine. Because the interpolator engine inherently operates on unsigned integers, a 2’s complement is first applied to the input if its MSB is high.
A final 2’s complement is then applied to the interpolator engine’s output, based on the MSB of its corresponding input data. This works because \(tanh(x)\) is an odd function, i.e.,
\[tanh(-x) = -tanh(x)\]
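The sign handling described above can be sketched as a small bit-accurate Python model. This is only an illustration, not the RTL: the 32-bit width is assumed from the 32’d0 constant used elsewhere in this page, and `twos_complement`, `apply_odd_symmetry`, and the `unsigned_engine` callback are hypothetical names standing in for the real modules.

```python
WIDTH = 32                  # assumed data width
MASK = (1 << WIDTH) - 1


def twos_complement(value: int) -> int:
    """Negate a WIDTH-bit word (invert all bits, add one, wrap)."""
    return (~value + 1) & MASK


def apply_odd_symmetry(raw: int, unsigned_engine) -> int:
    """Evaluate an odd function on a signed word using an unsigned-only engine."""
    negative = (raw >> (WIDTH - 1)) & 1          # MSB of the input word
    magnitude = twos_complement(raw) if negative else raw
    result = unsigned_engine(magnitude) & MASK   # the engine only ever sees |x|
    return twos_complement(result) if negative else result
```

With a toy engine such as `lambda m: m // 2`, the word encoding -6 (`0xFFFFFFFA`) is mapped to the word encoding -3 (`0xFFFFFFFD`), while +6 is mapped to +3, which is the behaviour expected of an odd function.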
Tanh Interpolator Engine¶
This module combines the input_range_decoder and the interpolator_engine.
It also inserts the delays needed to keep the accompanying signals aligned with the interpolated output.
Input Range Decoder¶
The range decoder compares the input value with DATA_SAMPLE_MAX and determines the compute region.
It then generates the control signal for the interpolation region.
If the input value is greater than DATA_SAMPLE_MAX, the input is clamped to the DATA_SAMPLE_MAX value before being fed to the interpolator.
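A minimal sketch of the clamping behaviour, assuming the control signal can be modelled as a single "saturated" flag. The function name `range_decode` and the flag are illustrative, and the actual value of DATA_SAMPLE_MAX is not given here, so it is passed in as a parameter.

```python
def range_decode(x: int, data_sample_max: int):
    """Return (value passed to the interpolator, saturated flag)."""
    saturated = x > data_sample_max              # input lies beyond the table range
    clamped = data_sample_max if saturated else x
    return clamped, saturated
```

An input equal to the maximum is passed through unchanged; only strictly larger inputs are clamped.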
Interpolator Engine¶
This is the core Tanh compute block. The compute methodology includes:
Reading precomputed values from interpolation_points.txt, slope_values.txt, and tanh_values.txt to enable faster approximation without computing tanh directly.
Index calculation depends on the input data and is performed as follows:
Determine the LUT segment using the difference between the input sample and data_sample_min.
Multiply the segment offset with the step size N / (DATA_SAMPLE_MAX - DATA_SAMPLE_MIN).
Compute the segment index using:
\[\text{segment\_index} = \frac{x - x_{\min} + l/2}{l}\]
where \(l = \frac{x_{\max} - x_{\min}}{n}\).
x = input to the interpolator_engine
\(x_{\min}\) = data_sample_min (32’d0 in this case)
n = Number of interpolation points (128 in this case)
Compute the scaled value using
\[\text{scaled\_data} = x_{\text{diff}} * slope\_value\]
where \(x_{\text{diff}} = x - x_{\text{sample}}\).
The interpolated value is obtained by adding \(\text{scaled\_data}\) and the corresponding scaled tanh_value.
This gives the Tanh output, which is converted to Sigmoid by:
\[sigmoid(x) = \frac{1 + tanh(x/2)}{2}\]
when EltWise_Type == ELTWISE_SIG.
NOTE: The Sigmoid/Tanh output is calculated using data scaled to fp16. Hence, the actual hardware compute becomes:
\[sigmoid(x) = \left(65535 + tanh(x/2)\right) \text{ >>> } 1\]
when EltWise_Type == ELTWISE_SIG.
The output is sent back to element_wise_op, which proceeds to quantization.
Element_Wise operations use fp_cast. It is hardcoded to use 16 bits for Sigmoid/Tanh and 10 bits for other operations.
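The steps above can be sketched as a floating-point Python reference model. Several details are assumptions made only for illustration: the table range `X_MAX = 4.0`, the use of `n + 1` table entries so that the rounded index at the top of the range stays valid, and slope values taken as the derivative \(1 - tanh^2(x)\) at each interpolation point. The three lists stand in for the .txt files, whose actual contents and format are not described here, and the hardware works on scaled integers rather than floats.

```python
import math

N = 128                      # number of interpolation points (from the text)
X_MIN = 0.0                  # data_sample_min
X_MAX = 4.0                  # assumed table range, not taken from the design
L = (X_MAX - X_MIN) / N      # segment length l
SCALE = 65535                # 16-bit value used for 1.0 in the hardware formula

# Stand-ins for interpolation_points.txt, tanh_values.txt and slope_values.txt.
POINTS = [X_MIN + i * L for i in range(N + 1)]
TANH = [math.tanh(p) for p in POINTS]
SLOPE = [1.0 - t * t for t in TANH]      # assumed: derivative of tanh at each point


def tanh_interp(x: float) -> float:
    """Piecewise-linear tanh for X_MIN <= x <= X_MAX."""
    idx = int((x - X_MIN + L / 2) / L)   # segment index, rounds to the nearest point
    x_diff = x - POINTS[idx]             # x - x_sample, may be negative
    scaled_data = x_diff * SLOPE[idx]
    return TANH[idx] + scaled_data


def sigmoid_fixed(x: float) -> int:
    """Sigmoid as a 16-bit scaled integer: (65535 + tanh(x/2)) >> 1."""
    half = x / 2                                     # input / 2 for ELTWISE_SIG
    t = tanh_interp(min(abs(half), X_MAX))           # clamp, then unsigned engine
    t_q = round(t * SCALE)                           # tanh scaled to 16 bits
    if half < 0:
        t_q = -t_q                                   # restore the sign (odd function)
    return (SCALE + t_q) >> 1                        # shift right by one = divide by 2
```

Under these assumptions, `tanh_interp` returns the stored table value exactly at each interpolation point and stays close to `math.tanh` in between, and `sigmoid_fixed(x) / SCALE` approximates \(1 / (1 + e^{-x})\).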
References¶
[1] Improving Neural Network Efficiency Using Piecewise Linear Approximation of Activation Functions
Instruction Integration¶
A macro has been added in arch_param.vh to enable dynamic hardware generation of the Sigmoid/Tanh
EltWise_Type.
Instructions.vh has also been updated to include the new EltWise types.