
Extending a Moldable Computer Architecture to Accelerate DL Inference on FPGA

Mariotti, Mirko; Bianchini, Giulio; Neri, Igor; Ciangottini, Diego; Storchi, Loriano
2025

Abstract

Over the past years, the field of Machine Learning (ML) and Deep Learning (DL) has seen strong developments in both software and hardware, driven by the rise of specialized devices. One of the biggest challenges in this field is the inference phase, where the trained model makes predictions on unseen data. Although computationally powerful, traditional computing architectures face limitations in efficiently handling inference requests, especially from an energy point of view. For this reason, the need arose to find alternative hardware solutions, and among these are Field Programmable Gate Arrays (FPGAs): their key feature of being reconfigurable, combined with parallel processing capability, low latency, and low power consumption, makes these devices uniquely suited to accelerating inference tasks. In this paper, we present a novel approach to accelerating the inference phase of a multi-layer perceptron (MLP) using the BondMachine framework, an open-source framework for the design of hardware accelerators for FPGAs. An analysis of latency, energy consumption, and resource usage, together with comparisons against standard architectures and other FPGA approaches, is presented, highlighting the strengths and critical points of the proposed solution. The present work represents an exploratory study to validate the proposed methodology on MLP architectures, establishing a crucial foundation for future work on scalability and the acceleration of more complex neural network models.
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1615987
Citations
  • PMC: ND
  • Scopus: 0
  • Web of Science: 0