Large-scale Computing Frameworks: Experiments and Guidelines

Montecchiani F.
2023

Abstract

Large-scale computing frameworks are key technologies for meeting the computational requirements of massive data analysis. In particular, Apache Spark has emerged as the de facto standard for big data analytics after Hadoop's MapReduce, while tools such as Dask and Ray can greatly boost the performance of Python applications in distributed environments. The goal of this paper is to study the performance of these three frameworks in a common experimental setting. We focus on cloud-native architectures, which merge the benefits of big data and cloud computing. Rather than high-level features such as ML models, we consider simple data processing operations, which are common ingredients of more complex pipelines. As a byproduct of our experiments, we offer a set of guidelines for the development of cloud-native data processing applications.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1569366