Hardware Integration of a Neural Network onto FPGA and its Application to Sound Source Localization
Abstract
Sound Source Localization algorithms estimate the Direction of Arrival of one or multiple (possibly moving) sound sources in 3-D space. Potential applications include acoustic mapping of the environment, spatial filtering of relevant sources out of acoustic clutter and noise, and headset-free human-robot speech interaction. Deep Learning algorithms have been widely used for this task, with many solutions relying on Feedforward Neural Networks, convolutional kernels, or attention mechanisms derived from the Transformer. In this paper, the hardware implementation of a Deep Learning model for Sound Source Localization is proposed on an FPGA evaluation board (Digilent Zybo-7020). This enables Artificial Intelligence inference directly on an FPGA with acceptable performance (70% accuracy for an error below 10°) through an energy-efficient, fully hardware implementation, without resorting to conventional processing units (e.g., CPU, GPU, or DSP) or to a soft-core processor with hardwired accelerators/co-processors.