Privacy-Preserving Federated Learning Approach for Distributed Malware Attacks With Intermittent Clients and Image Representation
Mostarda, Leonardo
2024
Abstract
Deep learning approaches for detecting malware attacks are effective when trained on data from a single organizational network. However, it is challenging to develop a trustworthy distributed malware detection model that employs diverse training data from multiple sources, primarily because of privacy concerns and the lack of a standardized dataset. Growing confidentiality requirements and cyberattacks call for reliable and scalable distributed data privacy technologies. In addition, imbalanced datasets and remote clients make these systems difficult to administer. In this study, we propose a privacy-preserving distributed malware detection system with intermittent clients that uses a deep CNN-based federated learning approach. The raw malware binaries are converted into color images so that their visual features can be studied. Cybersecurity organizations may hold imbalanced feature sets, and their clients may interact only intermittently with the global server. A data augmentation method is used to balance the malware data for the local training process. A deep CNN architecture uses the augmented features to perform local training and generate Local Model Updates (LMUs). The LMUs are then sent to the global server for model aggregation. The global server collects the LMUs received from the various clients and generates a Global Model Update (GMU). The GMU is then sent back to the remote clients to refresh their LMUs locally. Experiments demonstrate that the proposed approach achieves competitive results in addressing two issues: intermittent clients and imbalanced data. These results should motivate cybersecurity companies to work together and use their rich private data to quickly develop an effective distributed malware detection model.
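The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract specifies neither the byte-to-pixel encoding nor the aggregation rule, so everything here is an assumption made for illustration: three consecutive bytes are mapped to one RGB pixel, aggregation is a sample-weighted average in the style of federated averaging, and `local_train` is a toy stand-in for the CNN training step. All function names, the `target` and `n_samples` fields, and the image width are hypothetical.

```python
def bytes_to_rgb_image(data: bytes, width: int = 4):
    """Map raw bytes to rows of (R, G, B) pixels, three bytes per pixel.

    Assumed encoding; the abstract does not say how binaries become images.
    """
    pad = (-len(data)) % (3 * width)  # zero-pad so the last row is complete
    padded = data + bytes(pad)
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


def local_train(gmu, client):
    """Toy stand-in for local CNN training: step halfway toward a client target."""
    return [g + 0.5 * (t - g) for g, t in zip(gmu, client["target"])]


def aggregate(lmus):
    """Sample-weighted average of (weights, n_samples) pairs (assumed rule)."""
    total = sum(n for _, n in lmus)
    dim = len(lmus[0][0])
    return [sum(w[i] * n for w, n in lmus) / total for i in range(dim)]


def federated_round(gmu, clients, available):
    """Run one round in which only the clients named in `available` report."""
    lmus = [
        (local_train(gmu, c), c["n_samples"])
        for name, c in clients.items()
        if name in available
    ]
    # If no client is reachable this round, keep the previous global model.
    return aggregate(lmus) if lmus else gmu


clients = {
    "a": {"target": [1.0, 1.0], "n_samples": 10},
    "b": {"target": [3.0, 3.0], "n_samples": 30},
}
gmu = federated_round([0.0, 0.0], clients, available={"a", "b"})
```

In this sketch an intermittent client is one that is missing from `available` in a given round; the server aggregates whatever LMUs arrive and leaves the GMU unchanged when none do. Data augmentation for class balancing would sit inside the local training step and is omitted here.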