Overview

Convolutional neural networks are a class of Machine Learning algorithms that excel at identifying/representing patterns in 2D structures such as images. CNN inherit the computational complexity that ML algorithms are known for.

The compute required for CNN may not be “complex” with a lot of interacting elements but it is vast. Accelerating CNNs is an interesting and challenging problem as a result. It requires handling memory bandwidth diligently and designing efficient architectures that fit on a small FPGA and do not occupy a lot of resources.

Vaaman and the edge

Vaaman is a single-board computer with an FPGA with it. The CPU and the FPGA are connected through the MIPI interface. This connection allows a special type of computation where the CPU may perform computations that it is good at (Data Marshalling, running an operating system, controlling other processors etc.) and a co-processor (in this case FPGA, can perform massively parallel, high-throughput demanding computations). This is the core essence of vaaman (and of Heterogeneous Computing).

Where Gati Fits

Gati is the set of software-hardware libraries/programs that enable/accelerate CNN (as of now) applications. The following document describes the internal architecture of Gati.