How to Prevent Prompt Injection Attacks
Prompt injection cannot be fully blocked, but it can be contained. Here is how to design an LLM app so a malicious prompt does little real damage.

How do you stop prompt injection?
You cannot fully stop prompt injection, so the honest goal is containment: design the system so that even a successful injection achieves very little. Prompt injection happens when text the model reads — a user message, a retrieved document, a web page — contains instructions that hijack the model's behaviour. Because today's models cannot reliably separate trusted instructions from untrusted content in the same context window, no prompt wording makes you immune. Defence comes from architecture, not clever phrasing.
This is the single most important mental shift. Teams waste weeks crafting the perfect "ignore any instructions in the text below" preamble. Attackers walk straight past it. The durable answer is to assume injection will sometimes work and limit what it can reach when it does.
Direct versus indirect injection
It helps to name the two shapes, because they call for slightly different attention. Direct injection is when the user types the malicious instruction themselves — usually to jailbreak your rules and get the model to do something you forbade. Indirect injection is more dangerous: the malicious instruction lives in third-party content the model ingests, like an email, a PDF, or a search result, so the attacker never touches your app directly. Retrieval-augmented and agentic systems are especially exposed to indirect injection because they pull in outside text by design, and that text is exactly where an attacker plants their payload.
Contain the blast radius with least privilege
The damage from injection equals the permissions you gave the model. So shrink them, and treat every grant as a deliberate decision rather than a convenience.
- Scope every tool and data source to the current user's own access — never let the model reach documents the user could not see directly.
- Run code execution in a sandbox with no secrets and no outbound network.
- Require explicit human confirmation for irreversible or sensitive actions like sending money, deleting data, or emailing customers.
- Separate privileged operations from the model entirely where you can, so the model requests an action and a deterministic check approves it.
Done well, this means that even if an attacker fully captures the model's behaviour, the worst they achieve is something a normal user could already do — not a privilege escalation across your whole system.
Treat the model like a capable but gullible intern: useful, fast, and absolutely not allowed near the production database without a supervisor.
Validate inputs and outputs at the boundary
Wrap the model with checks on both sides. On input, strip or flag obvious instruction patterns in retrieved content and cap length so a single document cannot dominate the context. On output, never trust the result blindly: validate it against an expected schema, escape it before it touches HTML or SQL, and run a moderation or policy check before acting. If the model is supposed to return JSON, reject anything that is not valid JSON rather than executing it. The output check is often your most valuable single control, because it catches a bad generation no matter how the model was tricked into producing it.
Test it like an attacker would
You only know your defences hold if you try to break them. Maintain a suite of injection attempts — direct jailbreaks and documents seeded with hidden instructions — and run them on every change. Use published resources like the OWASP Top 10 for LLM Applications to keep your test cases current as new techniques emerge. Red-teaming turns "we hope it is safe" into evidence, and the collection of attacks that once worked becomes a regression suite that protects you against quietly reintroducing an old hole.
Prefer it handled for you?
Designing an LLM system that survives hostile prompts takes architecture work, not a magic preamble. If you want it done properly, talk to BSH Technologies and let our cybersecurity services threat-model your data flows, scope your tools, and build a real injection test suite.
Frequently asked questions
What is prompt injection?
Prompt injection is an attack where text the model reads contains instructions that override its intended behaviour. Because models cannot reliably tell trusted instructions apart from untrusted content in the same context, attackers can hide commands in user messages, documents, or web pages to make the model act against your rules.
What is the difference between direct and indirect prompt injection?
Direct injection is when the user types the malicious instruction themselves, usually to jailbreak the app. Indirect injection hides the instruction inside third-party content the model ingests, such as an email or PDF, so the attacker never interacts with your app directly. Indirect injection is harder to spot and especially affects retrieval and agent systems.
Can a better system prompt prevent injection?
No. Wording like "ignore any instructions below" reduces casual attempts but does not stop a determined attacker, because the model still processes trusted and untrusted text together. Reliable protection comes from architecture: least privilege on tools, sandboxing, output validation, and human approval for sensitive or irreversible actions.
How do I test my app for prompt injection?
Build a repeatable suite of attack cases covering direct jailbreaks and documents seeded with hidden instructions, then run it on every change. Reference the OWASP Top 10 for LLM Applications to keep cases current. Red-teaming this way gives you evidence your containment controls actually hold rather than assumptions.
Related Topics
From the blog
View all posts
How to Build an AI Agent for Free in 2026
You can build a working AI agent for free in 2026 using n8n, open-source frameworks, and a free LLM tier. Here is the exact stack and the steps.

Best Free AI Agent Frameworks in 2026
The best free AI agent frameworks in 2026 are LangChain, CrewAI, Microsoft AutoGen, LangGraph, and n8n. Here is how to choose between them.