AI Chips Must Get The Floating-Point Math Right
By Sergio Marchese, Semiconductor Engineering
Most AI chips and hardware accelerators that power machine learning (ML) and deep learning (DL) applications include floating-point units (FPUs). The algorithms used in neural networks today are largely built on multiplication and addition of floating-point values, and the results often need to be rescaled to suit different value ranges and application needs. Modern FPGAs such as Intel Arria-10 and Xilinx Everest include floating-point units in their DSP slices that can be leveraged to optimize classification, detection, and image recognition tasks. Convolutional neural networks (CNNs) are popular for computer vision applications and are computationally demanding. The workload of a single convolution layer may involve deeply nested loops, with a multiply-accumulate operation at the innermost level.
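To make the nested-loop structure concrete, here is a minimal sketch of a direct convolution layer in plain Python. It is an illustration only, not how any particular accelerator implements it: the function name conv2d_naive, the data layout (channels, rows, columns), and the choice of stride 1 with no padding are all assumptions made for this example.

```python
def conv2d_naive(inputs, weights):
    """Direct convolution (CNN-style cross-correlation).

    inputs:  inputs[c][y][x]      (input channel, row, column)
    weights: weights[k][c][r][s]  (output channel, input channel, kernel row, kernel column)
    Assumes stride 1 and no padding.
    """
    in_ch = len(inputs)
    in_h, in_w = len(inputs[0]), len(inputs[0][0])
    out_ch = len(weights)
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1

    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_ch)]
    for k in range(out_ch):                    # output feature maps
        for y in range(out_h):                 # output rows
            for x in range(out_w):             # output columns
                acc = 0.0
                for c in range(in_ch):         # input channels
                    for r in range(k_h):       # kernel rows
                        for s in range(k_w):   # kernel columns
                            # One floating-point multiply-accumulate per iteration
                            acc += inputs[c][y + r][x + s] * weights[k][c][r][s]
                out[k][y][x] = acc
    return out


image = [[[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]]          # one 3x3 input channel
kernel = [[[[1.0, 0.0],
            [0.0, 1.0]]]]            # one 2x2 filter
print(conv2d_naive(image, kernel))   # [[[6.0, 8.0], [12.0, 14.0]]]
```

Every pass through the innermost loop is one multiply and one add, which is why the throughput and correctness of the multiply-add datapath matter so much for this workload.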
Reducing power consumption and silicon area is a crucial goal. In many cases, half precision (16 bits) is sufficient for AI platforms. Lower precisions, such as 12 bits or 8 bits, have also been demonstrated to adequately support certain applications of CNNs. Implementing CNNs on embedded devices poses even tougher requirements on storage area and power consumption. Low-precision fixed-point representations of CNN weights and activations may be an option. But, as argued in this paper, using floating-point numbers to represent weights may result in significantly more efficient hardware implementations. Fused multiply-add (FMA) operations, where rounding is applied only once, to the final result, rather than after the multiplication and again after the addition, may provide additional accuracy and performance improvements.
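The effect of rounding once instead of twice can be illustrated with a small sketch. Because a native FMA call is not available in every Python version, the fused result is emulated here with exact rational arithmetic followed by a single rounding to double precision; the helper names mul_add_unfused and mul_add_fused are hypothetical, and the operands are chosen specifically to expose the difference.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is rounded again.
    return a * b + c


def mul_add_fused(a, b, c):
    # Emulated FMA: compute a*b + c exactly, then round once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which is not representable as a double
# and rounds to 1.0, so the unfused result loses the small term entirely.
print(mul_add_unfused(a, b, c))                 # 0.0
print(mul_add_fused(a, b, c) == -(2.0 ** -60))  # True
```

In hardware the fused datapath keeps the full-width product internally before the addition, which is what the exact Fraction arithmetic stands in for in this sketch.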
Floating-point representations of real numbers have significant advantages over fixed-point. For example, given a fixed bit width for binary encoding, floating-point formats cover a much wider range of values while keeping relative precision roughly constant across that range. However, FPUs are much harder to implement in hardware than integer or fixed-point arithmetic. The IEEE 754 standard defines many corner-case scenarios and special values, such as +0, -0, signed infinities, and NaN (not a number). Moreover, there are four rounding modes that binary implementations must support (roundTowardZero, roundTiesToEven, roundTowardPositive and roundTowardNegative), as well as five exception flags (invalid operation, division by zero, inexact result, underflow and overflow).
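A short sketch can make both points tangible for a 16-bit word: the difference in range between floating-point and fixed-point, and the behavior of the special values. It uses the IEEE 754 half-precision support in Python's standard struct module. The Q8.8 fixed-point format (8 integer bits, 8 fractional bits) and the helper names are assumptions chosen only for comparison.

```python
import math
import struct


def half_from_bits(bits):
    """Interpret a 16-bit pattern as an IEEE 754 half-precision (binary16) value."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def q8_8_from_bits(bits):
    """Interpret the same 16 bits as signed Q8.8 fixed point (illustrative format)."""
    signed = bits - 0x10000 if bits & 0x8000 else bits
    return signed / 256.0


# Range covered by 16 bits in each format
print(half_from_bits(0x7BFF))   # largest finite half-precision value: 65504.0
print(half_from_bits(0x0001))   # smallest positive subnormal: 2**-24
print(q8_8_from_bits(0x7FFF))   # largest Q8.8 value: 127.99609375
print(q8_8_from_bits(0x0001))   # Q8.8 resolution: 0.00390625

# Special values that an FPU must handle correctly
neg_zero = half_from_bits(0x8000)
pos_inf = half_from_bits(0x7C00)
nan = half_from_bits(0x7E00)

print(neg_zero == 0.0)                    # True: -0 compares equal to +0
print(math.copysign(1.0, neg_zero))       # -1.0: but the sign bit is preserved
print(math.isnan(pos_inf - pos_inf))      # True: infinity minus infinity is an invalid operation
print(nan == nan)                         # False: NaN is unordered, even with itself
```

Each of these cases is trivial in software but corresponds to dedicated logic in a hardware FPU, and each must behave correctly under every rounding mode and raise the right exception flags.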