In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k\hbox{-}{\rm{bit}}$, $(k\geq 2)$, sequence using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of $2$. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most $w^2$. We adopt a VLSI delay model where the “length” of a bus is proportional with the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k\hbox{-}{\rm{bit}}$ binary sequence in the time of $2k-2$ broadcasts, while the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Quite remarkably, in spite of the fact that our hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an “ideal” design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^k\hbox{-}{\rm{bit}}$ binary sequence in the time of $3k+\lceil\log_w k\rceil -3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k+\lceil\log_w k\rceil -5$ broadcasts.

Scalable Hardware-Algorithms for Binary Prefix Sums

PINOTTI, Maria Cristina;
2000

Abstract

In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k\hbox{-}{\rm{bit}}$, $(k\geq 2)$, sequence using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of $2$. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most $w^2$. We adopt a VLSI delay model where the “length” of a bus is proportional with the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k\hbox{-}{\rm{bit}}$ binary sequence in the time of $2k-2$ broadcasts, while the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Quite remarkably, in spite of the fact that our hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an “ideal” design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^k\hbox{-}{\rm{bit}}$ binary sequence in the time of $3k+\lceil\log_w k\rceil -3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k+\lceil\log_w k\rceil -5$ broadcasts.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11391/157882
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 20
  • ???jsp.display-item.citation.isi??? 14
social impact