LLMs as Weapons: A New Era of Cyber Threats
The rapid development of Artificial Intelligence, particularly Large Language Models (LLMs) like …

Evasion Attacks on LLMs: A BSI Guide to Defending Against Prompt Injections and Jailbreaks
Large Language Models (LLMs) have become established in many areas from customer support to software development, but they also bring new security risks. A growing and subtle threat is posed by so-called Evasion Attacks. In these attacks, adversaries attempt to manipulate the model during operation to provoke undesirable or dangerous behaviors. In the literature, these attacks are often referred to as (indirect) Prompt Injections, Jailbreaks, or Adversarial Attacks.
The main issue: LLMs are designed to flexibly respond to a wide range of inputs, which increases their attack surface. The goal of a successful attack is to bypass security restrictions, which can lead to the generation of malicious content, exfiltration of sensitive data, or system disruption.
Evasion Attacks can be categorized based on their mechanism into two main categories:
| Attack Category | Description | Examples |
|---|---|---|
| Coherent Text | Uses semantically and syntactically correct instructions to directly or indirectly push the LLM out of its role. | Naive Attack, Context-Ignoring Attack, Role Play Attack, Multi-Turn Manipulation (gradual influence over multiple interactions). |
| Incoherent Text | Utilizes strings or arbitrary compositions incomprehensible to humans to achieve unpredictable or targeted behavior. | Escape Character Attacks, Obfuscation Attack (e.g., Base64 encoding), Adversarial Suffix Attack (appending seemingly random but deliberately crafted strings). |
Hiding the Attack: Attackers also use Attack Steganography to conceal their malicious instructions, for example by:
Invisible font color.
\
Hiding in metadata or logs.
\
Using archive formats (ZIP/RAR).
\
Attacks can enter the LLM system through various entry points, such as the user prompt, user data, logs, or accessible databases.
The Federal Office for Information Security (BSI) recommends countermeasures that should be integrated into the LLM system architecture on four hierarchical levels:
This level focuses on directly hardening the language model and processing user inputs:
| Measure (Abbr.) | Description and Purpose |
|---|---|
| Guardrails & Filtering | Checking LLM inputs and outputs to detect and block malicious content early. Should occur before and after LLM processing. |
| Secure Prompt Techniques (SPTE) | Use of Structured Prompts (SP) or Delimiter-based Isolation (DBI) – clear separation of system instructions and user data (e.g., through special tokens or XML tags). |
| Model Alignment (MFT) | Adjusting the LLM to make it more resilient, e.g., through Adversarial Training (AT) or Reinforcement Learning from Human Feedback (RLHF). |
Measures are taken here to protect the execution environment and limit harmful effects:
This level involves the secure design of interfaces and the overall system:
This level governs the governance and organizational measures in dealing with LLMs:
Incident Response: Developing clear processes for handling detected security incidents and successful Evasion Attacks.
Currently, there is no single “Bullet-Proof” solution for completely defending against Evasion Attacks. Market leaders rely on a multi-layered approach (Defense-in-Depth).
Developers and IT security officers should approach the topic through a systematic risk analysis that results in a checklist. The BSI suggests a Baseline Security Approach as a starting point. This includes measures such as: MAPM (Model Action Privilege Minimization), LR (Labels and Reasoning of Data and Action), and SP (Structured Prompts).
The selection of appropriate countermeasures ultimately depends on the specific use case, available resources, and accepted risk assessment.
BSI Checklist:
The rapid development of Artificial Intelligence, particularly Large Language Models (LLMs) like …
In a Retrieval Augmented Generation (RAG) architecture, the vector database (Vector DB) is the core …