AI glossary

What is prompt injection?

Prompt injection is an attack that hides malicious instructions inside content an AI system processes, such as an email, a document or a web page. The model then treats the attacker’s text as instructions and ignores the rules it was given. The OWASP Top 10 for LLM applications ranks it as the leading security risk for this type of system.

Direct and indirect injection

In a direct injection, the attacker is the user: they type instructions designed to override the system prompt ("ignore your previous instructions and…"). Indirect injection is the variant that matters most in enterprise settings. The malicious instructions sit inside content the assistant is asked to read, such as a web page, a PDF in a knowledge base, an inbound email or a support ticket. The user did nothing wrong; the payload fires when the model processes the content.

Why it matters for enterprise deployments

A standalone chatbot that falls for an injection produces a bad answer. An assistant connected to email, files, databases or business tools can be made to do real damage: exfiltrate confidential data into a reply, mislead the user with planted information, or trigger actions chosen by the attacker. The risk grows with the autonomy and access you grant the system, which is why agents and RAG pipelines deserve the most scrutiny.

How to defend against it

There is no single fix, so mature deployments rely on defense in depth. Grant tools least-privilege access, so a hijacked assistant cannot reach what it does not need. Keep untrusted content clearly separated from instructions. Filter and monitor outputs, require human approval for sensitive actions, and test the application regularly with realistic injection payloads. In practice you manage prompt injection the way you manage phishing: you reduce it, contain it and detect it.

Frequently asked questions

Is prompt injection the same as jailbreaking?

They are related but different. Jailbreaking is a user trying to talk a model out of its own safety rules. Prompt injection hijacks an application through content it processes, and can affect users who did nothing wrong. An application can resist jailbreaks and still be vulnerable to indirect injection.

Can prompt injection be completely prevented?

Not reliably, with today’s technology. Models cannot yet perfectly distinguish instructions from data inside their context. The realistic goal is to make exploitation hard, limit the blast radius with least-privilege design, and detect attempts quickly.

Are RAG systems and AI agents affected?

They are the most exposed. RAG pipelines feed external documents straight into the model’s context, and agents act on what they read. Permission-aware retrieval, constrained tool access and human approval gates are essential controls for both.

Related terms

Deploy AI with confidence

Code75 implements production AI across enterprise teams, with the security testing and governance to match. You will talk to an engineer.

Book a call Write to us