What is prompt injection?
Prompt injection is an attack that hides malicious instructions inside content an AI system processes, such as an email, a document or a web page. The model then treats the attacker’s text as instructions and ignores the rules it was given. The OWASP Top 10 for LLM applications ranks it as the leading security risk for this type of system.
Direct and indirect injection
In a direct injection, the attacker is the user: they type instructions designed to override the system prompt ("ignore your previous instructions and…"). Indirect injection is the variant that matters most in enterprise settings. The malicious instructions sit inside content the assistant is asked to read, such as a web page, a PDF in a knowledge base, an inbound email or a support ticket. The user did nothing wrong; the payload fires when the model processes the content.
Why it matters for enterprise deployments
A standalone chatbot that falls for an injection produces a bad answer. An assistant connected to email, files, databases or business tools can be made to do real damage: exfiltrate confidential data into a reply, mislead the user with planted information, or trigger actions chosen by the attacker. The risk grows with the autonomy and access you grant the system, which is why agents and RAG pipelines deserve the most scrutiny.
How to defend against it
There is no single fix, so mature deployments rely on defense in depth. Grant tools least-privilege access, so a hijacked assistant cannot reach what it does not need. Keep untrusted content clearly separated from instructions. Filter and monitor outputs, require human approval for sensitive actions, and test the application regularly with realistic injection payloads. In practice you manage prompt injection the way you manage phishing: you reduce it, contain it and detect it.
Is prompt injection the same as jailbreaking?
They are related but different. Jailbreaking is a user trying to talk a model out of its own safety rules. Prompt injection hijacks an application through content it processes, and can affect users who did nothing wrong. An application can resist jailbreaks and still be vulnerable to indirect injection.
Can prompt injection be completely prevented?
Not reliably, with today’s technology. Models cannot yet perfectly distinguish instructions from data inside their context. The realistic goal is to make exploitation hard, limit the blast radius with least-privilege design, and detect attempts quickly.
Are RAG systems and AI agents affected?
They are the most exposed. RAG pipelines feed external documents straight into the model’s context, and agents act on what they read. Permission-aware retrieval, constrained tool access and human approval gates are essential controls for both.
- LLM red teamingLLM red teaming is the structured, adversarial testing of AI systems. Testers deliberately attack a model or AI application with jailbreaks, prompt-injection payloads, data-extraction attempts and abuse scenarios in order to find failures before real users or attackers do.
- Retrieval-augmented generation (RAG)Retrieval-augmented generation (RAG) is a technique that connects a language model to your own knowledge sources. When a user asks a question, the system first retrieves the most relevant documents, then passes them to the model alongside the question. The answer is grounded in your data instead of relying only on what the model learned during training.
- Shadow AIShadow AI is the use of AI tools inside an organisation without the knowledge or approval of IT and security teams. Typical examples include employees pasting company data into personal chatbot accounts, unvetted AI browser extensions, and AI features wired into business workflows outside any oversight.
Deploy AI with confidence
Code75 implements production AI across enterprise teams, with the security testing and governance to match. You will talk to an engineer.