Table of contents
Demystifying the OWASP Top 10: A Comprehensive Guide for LLM (Language Learning Models)
The rush of interest in Large Language Models (LLMs) following the release of mass-market pre-trained chatbots in late 2022 has been astounding—businesses seeking to capitalise on LLMs' promise increasingly integrate them into their operations and client-facing solutions. However, the rapid use of LLMs has overtaken the development of robust security protocols, leaving many applications vulnerable to high-risk vulnerabilities.
There was no centralised resource addressing these security risks in LLMs. Developers were left with dispersed resources since they were unfamiliar with the specific hazards connected with LLMs, and OWASP's purpose seemed a logical fit to assist in pushing for safer use of this technology.
LLM01: Prompt Injection
LLMs can be manipulated by attackers via forged inputs, causing them to carry out the attacker's plans. This can be accomplished directly by adversarially pressing the system prompt or indirectly by altered external inputs, which could result in data exfiltration, social engineering, and other concerns.
Examples
Direct prompt injections overwrite system prompts
Indirect prompt injections hijack the conversation context
A user employs an LLM to summarize a webpage containing an indirect prompt injection
Prevention
Enforce privilege control on LLM access to backend systems
Implement humans in the loop for extensible functionality
Segregate external content from user prompts
Establish trust boundaries between the LLM, external sources, and extensible functionality.
Attack Scenarios
An attacker provides a direct prompt injection to an LLM-based support chatbot
An attacker embeds an indirect prompt injection in a webpage
A user employs an LLM to summarize a webpage containing an indirect prompt injection.
LLM02: Insecure Output Handling
Insecure Output Handling is a vulnerability that occurs when a downstream component takes large language model (LLM) output without properly scrutinising it. This can result in XSS and CSRF attacks in web browsers, as well as SSRF, privilege escalation, and remote code execution on backend systems.
Examples
LLM output is entered directly into a system shell or similar function, resulting in remote code execution
JavaScript or Markdown is generated by the LLM and returned to a user, resulting in XSS.
Prevention
Apply proper input validation on responses coming from the model to backend functions
Encode output coming from the model back to users to mitigate undesired code interpretations
Attack Scenarios
An application directly passes the LLM-generated response into an internal function responsible for executing system commands without proper validation
A user utilizes a website summarizer tool powered by an LLM to generate a concise summary of an article, which includes a prompt injection
An LLM allows users to craft SQL queries for a backend database through a chat-like feature.
LLM03: Training Data Poisoning
Manipulation of the data or the fine-tuning process to introduce vulnerabilities, backdoors, or biases that could undermine the model's security, effectiveness, or ethical behaviour is referred to as training data poisoning. This increases the likelihood of performance deterioration, downstream software exploitation, and reputational damage.
Examples
A malicious actor creates inaccurate or malicious documents targeted at a model’s training data
The model trains using falsified information or unverified data which is reflected in output.
Prevention
Verify the legitimacy of targeted data sources during both the training and fine-tuning stages
Craft different models via separate training data for different use-cases
Use strict vetting or input filters for specific training data or categories of data sources
Attack Scenarios
Output can mislead users of the application leading to biased opinions
A malicious user of the application may try to influence and inject toxic data into the model
A malicious actor or competitor creates inaccurate or falsified information targeted at a model’s training data
The vulnerability Prompt Injection could be an attack vector to this vulnerability if insufficient sanitization and filtering are performed
LLM04: Model Denial of Service
Manipulation of the data or the fine-tuning process to introduce vulnerabilities, backdoors, or biases that could undermine the model's security, effectiveness, or ethical behaviour is referred to as training data poisoning. This increases the likelihood of performance deterioration, downstream software exploitation, and reputational damage.
Examples
Posing queries that lead to recurring resource usage through high volume generation of tasks in a queue
ending unusually resource-consuming queries
Continuous input overflow: An attacker sends a stream of input to the LLM that exceeds its context window
Prevention
Implement input validation and sanitization to ensure input adheres to defined limits, and cap resource use per request or step.
Enforce API rate limits to restrict the number of requests an individual user or IP can make
Attack Scenarios
Attackers send multiple requests to a hosted model that are difficult and costly for it to process
A piece of text on a webpage is encountered while an LLM-drive tool is collecting information to respond to benign a query.
The attacker overwhelms the LLM with input that exceeds its context window.
LLM05: Supply Chain Vulnerabilities
Supply chain flaws in LLMs can jeopardise training data, ML models, and deployment platforms, resulting in skewed findings, security breaches, and total system failures. Such flaws might be caused by outdated software, vulnerable pre-trained models, tainted training data, and insecure plugin designs.
Examples
Using outdated third-party packages
Fine-tuning with a vulnerable pre-trained model
Training using poisoned crowd-sourced data
Utilizing deprecated, unmaintained models
Lack of visibility into the supply chain is.
Prevention
Vet data sources and use independently-audited security systems
Use trusted plugins tested for your requirements
Apply MLOps best practices for own models
Use model and code signing for external models
Implement monitoring for vulnerabilities and maintain a patching policy
Regularly review supplier security and access
Attack Scenarios
Attackers exploit a vulnerable Python library
An attacker tricks developers via a compromised PyPi package
Publicly available models are poisoned to spread misinformation
A compromised supplier employee steals IP
An LLM operator changes T&Cs to misuse application data.
LLM06: Sensitive Information Disclosure
LLM apps may mistakenly reveal sensitive information, proprietary algorithms, or confidential data, resulting in unauthorised access, intellectual property theft, and privacy violations. LLM applications should use data sanitization, create proper usage controls, and limit the types of data returned by the LLM to reduce these risks.
Examples
Incomplete filtering of sensitive data in responses
Overfitting or memorizing sensitive data during training
Unintended disclosure of confidential information due to errors
Prevention
Use data sanitization and scrubbing techniques
Implement robust input validation and sanitization
Limit access to external data sources
Apply the rule of least privilege when training models
Maintain a secure supply chain and strict access control.
Attack Scenarios
Legitimate users exposed to other user data via LLM
Crafted prompts used to bypass input filters and reveal sensitive data
Personal data leaked into the model via training data increases risk.
LLM07: Insecure Plugin Design
Due to weak access constraints and faulty input validation, plugins might be vulnerable to malicious requests, which can result in negative outcomes such as data exfiltration, remote code execution, and privilege escalation. To prevent exploitation, developers must use strong security techniques such as rigorous parameterized inputs and safe access control principles.
Examples
Plugins accepting all parameters in a single text field or raw SQL or programming statements;
Authentication without explicit authorization to a particular plugin;
Plugins treating all LLM content as user-created and performing actions without additional authorization.
Prevention
Enforce strict parameterized input and perform type and range checks;
Conduct thorough inspections and tests including SAST, DAST, and IAST;
Use appropriate authentication identities and API Keys for authorization and access control;
Require manual user authorization for actions taken by sensitive plugins
Attack Scenarios
Attackers craft requests to inject their own content with controlled domains;
The attacker exploits a plugin accepting free-form input to perform data exfiltration or privilege escalation;
The attacker stages a SQL attack via a plugin accepting SQL WHERE clauses as advanced filters.
LLM08: Excessive Agency
Excessive Agency is a vulnerability in LLM-based systems caused by over-functionality, excessive permissions, or too much autonomy. To avoid this, developers must restrict plugin functionality, rights, and autonomy to the absolute minimum, log user authorisation, demand human approval for all operations, and implement authorization in downstream systems.
Examples
An LLM agent accesses unnecessary functions from a plugin
An LLM plugin fails to filter unnecessary input instructions
A plugin possesses unneeded permissions on other systems
An LLM plugin accesses downstream systems with high-privileged identities.
Prevention
An LLM agent accesses unnecessary functions from a plugin
An LLM plugin fails to filter unnecessary input instructions
A plugin possesses unneeded permissions on other systems
An LLM plugin accesses downstream systems with high-privileged identities.
Attack Scenarios
- An LLM-based personal assistant app with excessive permissions and autonomy is tricked by a malicious email into sending spam. This could be prevented by limiting functionality, and permissions, requiring user approval, or implementing rate limiting.
LLM09: Overreliance
Overreliance on LLMs can have major repercussions, including disinformation, legal concerns, and security flaws.
It happens when an LLM is trusted to make crucial decisions or create information without proper scrutiny or confirmation.
Examples
LLM provides incorrect information
LLM generates nonsensical text
LLM suggests insecure code
Inadequate risk communication from LLM providers
Prevention
Regular monitoring and review of LLM outputs
Cross-check LLM output with trusted sources
Enhance the model with fine-tuning or embeddings
Implement automatic validation mechanisms
Break tasks into manageable subtasks
Clearly communicate LLM risks and limitations
Establish secure coding practices in development environments.
Attack Scenarios
AI fed misleading info leading to disinformation
AI's code suggestions introduce security vulnerabilities
The developer unknowingly integrates a malicious package suggested by AI.
LLM10: Training Data Poisoning
Unauthorised access to and exfiltration of LLM models is LLM model theft, putting economic loss, reputation damage, and unauthorised access to sensitive data at risk. To safeguard these models, strong security measures are required.
Examples
The attacker gains unauthorized access to the LLM model
Disgruntled employee leaks model artefacts
Attacker crafts input to collect model outputs
Side-channel attack to extract model info
Use of stolen model for adversarial attacks.
Prevention
Implement strong access controls, authentication, and monitor/audit access logs regularly
Implement rate limiting of API calls
Watermarking framework in LLM lifecycle
Automate MLOps deployment with governance.
Attack Scenarios
Unauthorized access to LLM repository for data theft
Leaked model artefacts by disgruntled employee
Creation of a shadow model through API queries
Data leaks due to supply-chain control failure7 B Side-channel attack to retrieve model information.