Skip to main content

Command Palette

Search for a command to run...

PortSwigger's Insights: Understanding Web LLM Attacks

Updated
8 min read
PortSwigger's Insights: Understanding Web LLM Attacks
N
Here, I deep-dive into the emerging world of AI Security, documenting my technical journey and sharing cutting-edge insights with the community. This space is born of a passion for writing—though my pen has previously explored completely different domains—and connects my experience in Web Pentesting with an academic background in cybersecurity that, early on, bridged the gap between security and AI.

PortSwigger has taken an important step towards understanding LLM attacks. I studied this topic and wrote down the key points to better understand it.

1. Fundamental Concepts

Learn a little more about LLMs!

Hackers have recently become interested in LLMs because they have been used as virtual assistants, translators, analyzers, etc., and they have to use natural language as input. For this reason, input validation is very challenging in LLMs. LLMs respond by predicting sequences of words, so they can be fooled or prone to hallucinate.

Do you agree that LLMs can act as a proxy for attackers?!

If you are an attacker and do not have direct access to the LLM's prompt, training set, APIs, other users' data, or systems, that is not a problem! The LLM has access to them! If the LLM integration has been insecurely developed, you can exploit it to attack those underlying systems.

Anatomy of the LLM's Relationship with APIs and Security Challenges

The main challenge arises when we allow LLMs to interact with the outside world—a concept called Tool Use or Function Calling. In this case, the model, in response to the user, provides information in the form of a data structure (such as JSON) to the application to call an external API. The security risk is that the model performs actions on behalf of the user that the user may not be aware of.

💡 "At a high level, attacking an LLM integration is often similar to exploiting a server-side request forgery (SSRF) vulnerability. In both cases, an attacker is abusing a server-side system to launch attacks on a separate component that is not directly accessible."

Therefore, implementing a human-in-the-loop (HITL) verification step before executing sensitive APIs is a security imperative.

2. PortSwigger Methodology: 3 steps to detecting LLM vulnerabilities

  1. Identify the LLM's inputs, including both direct inputs (such as user prompts) and indirect inputs (such as external web pages or emails).

  2. Determine exactly what data and backend APIs the LLM has access to.

  3. Probe this new attack surface to uncover functional and logical vulnerabilities.

3. Core Web LLM Vulnerabilities & OWASP Mapping

In LLM attacks, prompt injection is always a key vector. Imagine an insecure LLM has access to a sensitive API like "delete-account", and attackers can call it by sending a crafted prompt to delete a specific user.

To weaponize this vector across different scenarios, attackers exploit various flaws in the LLM's architecture, data supply chain, and integration points. According to PortSwigger and the OWASP Top 10 for LLMs, these core vulnerabilities include:

1. Training Data Poisoning (OWASP LLM01)

The Core Vulnerability: This is a supply-chain attack vector targeting the pre-training or fine-tuning phase (unlike runtime prompt injections). Attackers deliberately manipulate the external data sources a model relies on to compromise its overall integrity.

  • The Attack Impact: By corrupting the dataset, attackers can implant backdoors or induce structural biases. This forces the LLM to provide intentionally incorrect, malicious, or highly misleading responses when triggered by specific keywords.

  • Root Causes:

    • Untrusted Data Sourcing: Training models on unverified third-party data, untrusted repositories, or public forums scraped without strict authentication.

    • Over-Extended Dataset Scope: Giving data scrapers too broad a scope, making it impossible to audit individual assets and allowing attackers to easily introduce poisoned data into the training pipeline.

2. Excessive Agency (OWASP LLM02)

  • This refers to a situation in which an LLM has access to APIs that can access sensitive information and can be persuaded to use those APIs unsafely. This enables attackers to push the LLM beyond its intended scope and launch attacks via its APIs (e.g., triggering a delete-account action).

3. Path Traversal (OWASP LLM02)

  • Tricking the LLM into using its file-access tools to read or write sensitive system files (by using shortcuts like ../ to escape the allowed folder).

4. Indirect Prompt Injection (OWASP LLM03)

  • Hijacking the LLM's behavior via third-party content (like web pages or emails). The way an LLM is integrated into a website significantly affects how easy it is to exploit this. When integrated correctly, an LLM can "understand" that it should ignore instructions from within an external web page.

💡"Indirect prompt injection often enables web LLM attacks on other users."

  • Common Bypass Techniques for Secure LLMs:

  • Fake Markup: Confusing the LLM by using fake markup in the indirect prompt.

  • Fake User Responses: Embedding simulated user or system responses within the untrusted content to trick the model into following subsequent malicious commands.

5. Insecure Output Handling (OWASP LLM06)

  • Failing to sanitize the model's output, which can lead to traditional web flaws such as XSS or CSRF when rendered by the application. Sometimes, prompt injection is just the entry point in an attack chain. For example, if an LLM integration suffers from an "Insecure Output Handling" vulnerability, the attacker may succeed in extracting private information simply by crafting a malicious prompt.

6. Training Data Leakage (OWASP LLM07)

  • The Core Vulnerability: Due to their probabilistic nature, LLMs exhibit a tendency to memorize and reproduce unique patterns from their training datasets when prompted with specific contextual anchors.

  • Attack Vector (Text Completion): Attackers bypass standard safety guardrails by avoiding direct questions. Instead, they exploit the model's auto-complete behavior using partial phrases or predictive contexts (e.g., "The production database credential for Carlos is: ").

  • Root Causes:

  • Flawed Data Scrubbing: Failure to fully sanitize or redact sensitive user information (like API tokens, PII, or internal logs) from the data store before it undergoes fine-tuning loops.

  • Insecure Output Filtering: Lack of robust, post-processing semantic filters to analyze and block the model's output before it is rendered to the client interface.

4. Defending Against Web LLM Attacks

Mitigating LLM vulnerabilities requires a defense-in-depth approach across APIs, data pipelines, and prompt architectures:

Treat LLM-Facing APIs as Publicly Accessible:

  • Enforce strict backend API access controls, ensuring every call requires proper authentication.

  • Never expect the LLM to self-police; all authorization limits must be strictly handled by the underlying applications the LLM communicates with.

Protect the Data Supply Chain (Don't Feed LLMs Sensitive Data):

  • Avoid feeding sensitive or confidential data to integrated LLMs.

  • Apply robust sanitization and scrubbing techniques to the model’s training and fine-tuning datasets.

  • Only feed data to the model that your lowest-privileged user is authorized to access. Any data consumed by the model could potentially be leaked to an end user.

  • Limit the model's access to external data sources and enforce strict access controls across the entire data supply chain.

  • Regularly audit and test the model to discover what sensitive information it might have memorized.

Never Rely on Prompting to Block Attacks:

  • Do not depend on system prompts or defensive instructions (e.g., "Do not reveal the password") as a primary security boundary. Attackers can almost always circumvent these restrictions using sophisticated prompt injection techniques.

5. Hands-On Experience & Key Recommendations

What I Learned from Solving the "Web LLM Attacks" Labs on PortSwigger:

  • Lab 1: I solved the first lab using only social engineering! Interestingly, the official solution used a more technical approach, which I honestly think was unnecessary.

  • Lab 2: The output of an LLM is just as important as its input. All responses must be validated and sanitized before being passed to another system or displayed to the user. Also, an important point here is that the function name in the comment must be written exactly as it is in the LLM's configuration for the model to recognize and execute it.

  • Lab 3: The third lab was a little tricky. In this scenario, breaking out of the brackets (syntax breakout) was the absolute key point.

  • Lab 4: I haven't had the chance to complete the final lab yet! It is currently on my to-do list, and I will update this section with my notes as soon as I crack it.

A Note to James Kettle & the PortSwigger Team: Feedback & Recommendations

I noticed that the LLM in these labs doesn't remember previous messages. If you ask a follow-up question based on its last answer, it cannot follow the conversation and completely resets the chat. You have to give it all the data from scratch for every single prompt.

I think this is a functional design flaw in the labs, making the LLM act like a stateless system rather than a real chatbot. Fixing this would make the challenges feel much more like the real world.

Additionally, please design more labs for this topic! The Web LLM attack landscape is growing fast, and having more complex challenges would be amazing for the community. Thank you for your attention.

What's Next?

In another section, PortSwigger covers AI-powered web application scanners. Surprise! These AI-powered scanners introduce a brand-new attack surface. It’s great!

Following their roadmap, I will dive deep into this topic—along with PortSwigger's 4 hands-on labs—in a separate, dedicated post.

Stay tuned!

Resource: https://portswigger.net/web-security/llm-attacks

R

I can't write a specific comment without seeing the actual article content—postText and externalArticleText are both empty in the provided context. I'd need the article text to identify a particular point to react to.

N

Hey there, it looks like your automated comment system/AI tool is breaking on this layout, which is why it outputted that empty context error. You might want to check your scraper configuration for Hashnode articles.