This paper addresses the emerging threat of indirect prompt injection, a technique in which malicious agents embed prompts into seemingly innocuous text to manipulate the behaviour and output of Generative Large Language Models (LLMs). As LLMs become more popular and commonly used in everyday activities, this type of attack poses serious concerns about their responses. Without knowledge and protection, processes that depend on them may not be reliable. We present and analyse real-world high-risk cases, most notably in the scenario of a comparative analysis of Curriculum Vitae documents. In this scenario, prompt injection is used to mislead the human resources manager who uses LLMs to support personnel selection. This risk is also becoming increasingly relevant in educational contexts where LLMs are used for activities such as automated essay review, tutoring, and content generation, potentially enabling subtle forms of manipulation and misconduct. The hidden prompt subtly alters the behaviour of the generative model, steering its output away from the intended results of the user’s LLM prompt. We analyze the structure of these attacks, evaluate the vulnerability and resilience of popular LLMs, and suggest potential countermeasures. We conclude by discussing the broader implications, evolving risks, and opportunities for securing LLM-based workflows.

Indirect prompt injection in large language models

Franzoni, Valentina
Supervision
;
Florindi, Emanuele
Membro del Collaboration Group
2026

Abstract

This paper addresses the emerging threat of indirect prompt injection, a technique in which malicious agents embed prompts into seemingly innocuous text to manipulate the behaviour and output of Generative Large Language Models (LLMs). As LLMs become more popular and commonly used in everyday activities, this type of attack poses serious concerns about their responses. Without knowledge and protection, processes that depend on them may not be reliable. We present and analyse real-world high-risk cases, most notably in the scenario of a comparative analysis of Curriculum Vitae documents. In this scenario, prompt injection is used to mislead the human resources manager who uses LLMs to support personnel selection. This risk is also becoming increasingly relevant in educational contexts where LLMs are used for activities such as automated essay review, tutoring, and content generation, potentially enabling subtle forms of manipulation and misconduct. The hidden prompt subtly alters the behaviour of the generative model, steering its output away from the intended results of the user’s LLM prompt. We analyze the structure of these attacks, evaluate the vulnerability and resilience of popular LLMs, and suggest potential countermeasures. We conclude by discussing the broader implications, evolving risks, and opportunities for securing LLM-based workflows.
2026
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11391/1626314
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact