
Generative Models for Source Code: Fine-Tuning Techniques for Structured Pattern Learning

Valentina Franzoni (Supervision); Alfredo Milani (Supervision)
2024

Abstract

This study addresses the problem of automatically generating source code that is not only functional but also well-structured, readable, and maintainable. Existing generative models for source code often produce functional code but lack consistency in structure and adherence to coding standards, both of which are essential for integration into existing development projects and for long-term software maintenance. The proposed methodology applies transfer learning to the DeepSeek Coder model, refining the pre-trained model so that the generated code satisfies additional structuring constraints. By training the model on specific code structures, including a dataset with Italian comments, the methodology ensures that the generated code meets both functional requirements and the pre-defined coding standards. Experimental results, evaluated using the perplexity metric, demonstrate the effectiveness of the proposed approach in reducing errors and, ultimately, improving software development quality.
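The perplexity metric used for evaluation can be sketched as follows: perplexity is the exponential of the mean negative log-likelihood the model assigns to each token, so lower values mean the model finds the code less "surprising". This is a minimal, self-contained illustration of the metric itself, not the study's evaluation pipeline; the function name and inputs are illustrative.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities the model assigned to
    each token of the evaluated sequence (illustrative input format).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Sanity check: a model assigning uniform probability over a
# 50-token vocabulary has perplexity exactly 50, regardless of length.
uniform = [math.log(1 / 50)] * 8
print(round(perplexity(uniform)))  # 50
```

In practice the per-token log-probabilities come from the language model's output distribution over the evaluation corpus; the metric is model-agnostic, which is what makes it suitable for comparing a fine-tuned model against its pre-trained baseline.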
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1586573