IRIS - Res&Arch Institutional Research Information System - Research & Archive

Indirect prompt injection in large language models

Franzoni, Valentina (Supervision); Florindi, Emanuele (Member of the Collaboration Group)

2026

Abstract

This paper addresses the emerging threat of indirect prompt injection, a technique in which malicious agents embed prompts in seemingly innocuous text to manipulate the behaviour and output of generative Large Language Models (LLMs). As LLMs become more popular and more commonly used in everyday activities, this type of attack raises serious concerns about the trustworthiness of their responses: without awareness of the attack and protection against it, processes that depend on LLMs may not be reliable. We present and analyse real-world high-risk cases, most notably a comparative analysis of Curriculum Vitae documents, in which prompt injection is used to mislead a human resources manager who uses LLMs to support personnel selection. The hidden prompt subtly alters the behaviour of the generative model, steering its output away from the result intended by the user’s own prompt. The risk is also becoming increasingly relevant in educational contexts, where LLMs are used for activities such as automated essay review, tutoring, and content generation, potentially enabling subtle forms of manipulation and misconduct. We analyse the structure of these attacks, evaluate the vulnerability and resilience of popular LLMs, and suggest potential countermeasures. We conclude by discussing the broader implications, evolving risks, and opportunities for securing LLM-based workflows.

Year: 2026

Journal in which the work is published: NEURAL COMPUTING & APPLICATIONS

Appears in types: 1.1 Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1626314
